Kang ZHAO Jinian BIAN Sheqin DONG Yang SONG Satoshi GOTO
To improve the computation efficiency of the application specific instruction-set processor (ASIP), a strategy of hardware/software collaborative design is usually utilized. In this process, the auto-customization of specific instruction set has always been a key part to support the automated design of ASIP. The key issue of this problem is how to effectively reduce the huge exponential exploration space in the instruction identification process. To address this issue, we first formulate it as a feasible sub-graph enumeration problem under multiple constraints, and then propose a fast instruction identification algorithm based on a new model called basic convex pattern (BCP). The kernel technique in this algorithm is the transformation from the graph exploration to the formula-based computations. The experimental results have indicated that the proposed algorithm has a distinct reduction in the execution time.
Chih-Chien WANG Shu-Chen CHANG
Recent developments in information technology have made it easy for people to "chat" online with others in real time, and many do so regularly. "Virtual" relationships can be attractive, especially for people with social interaction problems in the "real world". This study examines the influence on online chat dependency of three dimensions of social anxiety: general social situation fear, negative evaluation fear, and novel social situation fear. Participants of this study were 454 college students. The survey results show that negative evaluation fear and general social situation fear are relative to online chat dependency, while novel social situation fear does not seem to be a relevant factor.
Sumek WISAYATAKSIN Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA
We propose a low cost and stand-alone platform-based SoC for H.264/AVC decoder, whose target is practical mobile applications such as a handheld video player. Both low cost and stand-alone solutions are particularly emphasized. The SoC, consisting of RISC core and decoder core, has advantages in terms of flexibility, testability and various I/O interfaces. For decoder core design, the proposed H.264/AVC coprocessor in the SoC employs a new block pipelining scheme instead of a conventional macroblock or a hybrid one, which greatly contribute to reducing drastically the size of the core and its pipelining buffer. In addition, the decoder schedule is optimized to block level which is easy to be programmed. Actually, the core size is reduced to 138 KGate with 3.5 kbyte memory. In our practical development, a single external SDRAM is sufficient for both reference frame buffer and display buffer. Various peripheral interfaces such as a compact flash, a digital broadcast receiver and a LCD driver are also provided on a chip.
Tomokazu YONEDA Kimihiko MASUDA Hideo FUJIWARA
This paper presents a power-constrained test scheduling method for multi-clock domain SoCs that consist of cores operating at different clock frequencies during test. In the proposed method, we utilize virtual TAM to solve the frequency gaps between cores and the ATE. Moreover, we present a technique to reduce power consumption of cores during test while the test time of the cores remain the same or increase a little by using virtual TAM. Experimental results show the effectiveness of the proposed method.
Thomas Edison YU Tomokazu YONEDA Danella ZHAO Hideo FUJIWARA
The rapid advancement of VLSI technology has made it possible for chip designers and manufacturers to embed the components of a whole system onto a single chip, called System-on-Chip or SoC. SoCs make use of pre-designed modules, called IP-cores, which provide faster design time and quicker time-to-market. Furthermore, SoCs that operate at multiple clock domains and very low power requirements are being utilized in the latest communications, networking and signal processing devices. As a result, the testing of SoCs and multi-clock domain embedded cores under power constraints has been rapidly gaining importance. In this research, a novel method for designing power-aware test wrappers for embedded cores with multiple clock domains is presented. By effectively partitioning the various clock domains, we are able to increase the solution space of possible test schedules for the core. Since previous methods were limited to concurrently testing all the clock domains, we effectively remove this limitation by making use of bandwidth conversion, multiple shift frequencies and properly gating the clock signals to control the shift activity of various core logic elements. The combination of the above techniques gains us greater flexibility when determining an optimal test schedule under very tight power constraints. Furthermore, since it is computationally intensive to search the entire expanded solution space for the possible test schedules, we propose a heuristic 3-D bin packing algorithm to determine the optimal wrapper architecture and test schedule while minimizing the test time under power and bandwidth constraints.
Kazuya HARAGUCHI Mutsunori YAGIURA Endre BOROS Toshihide IBARAKI
We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing "deceptive" good patterns; in a data set of a small size, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, in order to remove such deceptive good patterns, the data set should contain a greater number of examples than that at which a random data set contains few good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
Chia-Chun TSAI Jan-Ou WU Trong-Yen LEE
This study has demonstrated that the clock tree construction in an SoC should be expanded to consider the intrinsic delay and skew of each IP's clock sink. A novel algorithm, called GDME, is proposed to combine grey relational clustering and DME approach for solving the problem of clock tree construction. Grey relational analysis can cluster the best pair of clock sinks and that guide a tapping point search for a DME algorithm for constructing a clock tree with zero skew and minimal delay. Experimentally, the proposed algorithm always obtains an RC- or RLC-based clock tree with zero skew and minimal delay for all the test cases and benchmarks. Experimental results demonstrate that the GDME improves up to 3.74% for total average in terms of total wire length compared with other DME algorithms. Furthermore, our results for the zero-skew RLC-based clock trees compared with Hspice are 0.017% and 0.2% lower for absolute average in terms of skew and delay, respectively.
Hiroki SHIMANO Fukashi MORISHITA Katsumi DOSAKA Kazutami ARIMOTO
The advanced-DFM (Design For Manufacturability) RAM provides the solution for the limitation of SRAM voltage scaling down and the countermeasure of the process fluctuations. The characteristics of this RAM are the voltage scalability (@0.6 V operation) with wide operating margin and the reliability of long data retention time. The memory cell consists of 2 Cell/bit with the complementary dynamic memory operation and has the 1 Cell/bit test mode for the accelerated screening against the marginal cells. The GND bitline pre-charge sensing scheme and SSW (Sense Synchronized Write) peripheral circuit technologies are also adopted for the low voltage and DFV (Dynamic Frequency and Voltage) controllable SoC which will be strongly required from the many kinds of applications. This RAM supports the DFM functions with both good cell/bit for advanced process technologies and the voltage scalable SoC memory platform.
In this paper, a new method for clustering of players in order to analyze games in soccer videos is proposed. The proposed method classifies players who are closely related in terms of soccer tactics into one group. Considering soccer tactics, the players in one group are located near each other. For this reason, the Euclidean distance between the players is an effective measurement for the clustering of players. However, the distance is not sufficient to extract tactics-based groups. Therefore, we utilize a modified version of the community extraction method, which finds community structure by dividing a non-directed graph. The use of this method in addition to the distance enables accurate clustering of players.
Md. Anwarul ABEDIN Yuki TANAKA Ali AHMADI Shogo SAKAKIBARA Tetsushi KOIDE Hans Jurgen MATTAUSCH
The realization of k-nearest-matches search capability in fully-parallel mixed digital-analog associative memories by a sequential autonomous search mode is reported. The proposed concept and circuit implementation can be applied with all types of distance measures such as Hamming, Manhattan or Euclidean distance search, and the k value can be freely selected during operation. A test chip for concept verification has been designed in 0.35 µm CMOS technology with two-poly, three-metal layers, realizes k-nearest-matches Euclidean distance search and consumes 5.12 mm2 of the chip area for 64 reference patterns each with 16 units of 5-bit.
Over the past ten years, the demand for low-cost, low-power, and small form-factor portable wireless devices has led to the integration of RF transceivers on the same silicon as digital processors to form wireless systems-on-a-chip. This paper describes the challenges in designing CMOS systems-on-a-chip for wireless communications. RF transceiver building blocks for signal amplification, frequency translation, and frequency selectivity are examined with special emphasis on low noise amplifiers, power amplifiers, mixers, and frequency synthesizers. System-on-a-chip integration issues such as leakage currents of digital logic, calibration techniques, and noise coupling are also discussed.
Koichiro NOGUCHI Takushi HASHIDA Makoto NAGATA
A highly multi-channel on-chip signal monitor and off-chip waveform acquisition processor established analog circuit diagnosis against environmental disturbances in SoC. An array of 53 distributed probes followed by a single shared waveform acquisition kernel is embedded in a 0.18-µm CMOS experimental on-chip test bench. In combination with the off-chip processor materialized in FPGA and a host PC, fully automated on-chip waveform monitoring achieves high-throughput data acquisition of 300 ms per sample point with adaptive 10-bit timing and voltage resolutions at a minimum LSB of 100 ps and 400 µV, respectively. Analog signals of interest in a 1.5-bit conversion stage of a pipeline ADC were evaluated in terms of their response to substrate noises that globally existed in a chip. On-chip diagnosis derives in-depth findings relating to dynamic, large-signal, and sensitive behaviors of analog circuits in a real SoC environment, far beyond simulations with inevitably limited capacity.
Yasue YAMAMOTO Masanori SHIRAHAMA Toshiaki KAWASAKI Ryuji NISHIHARA Shinichi SUMI Yasuhiro AGATA Hirohito KIKUKAWA Hiroyuki YAMAUCHI
A novel PND (PMOS-NMOS-Depletion MOS) technology for a single poly gate non-volatile memory cell design has been reported for the first time. This technology features memory cell design with a differential cell architecture which enables to provide the higher performance for the key specifications such as programming time, erasing time, and endurance characteristics. This memory cell consists of 3-Transistors, PMOS, NMOS, and Depletion MOS transistors (hereafter PND). The DMOS in this cell is used for the tunneling device in the erasing operation, while the NMOS and the PMOS are used for the tunneling device and the coupling capacitor in the programming operation, respectively. The proposed PND design can allow lower applied voltage of the erase-gate (EG) and control-gate (CG) in the erasing and the programming operations so that the endurance characteristics can be improved because the DMOS suppresses the potential of floating-gate (FG) and hence the effective potential difference between the EG and the FG can be increased in the erasing operation. Based on the measured data, it can be expected that the erasing speed of the PND cell can be 125-fold faster than that of our previously reported work (PN type). Therefore, high performance and high reliability CMOS non-volatile memory without any additional process can be realized using this proposed PND technology.
Hiroyuki KOBAYASHI Nobuto ONO Takashi SATO Jiro IWAI Hidenari NAKASHIMA Takaaki OKUMURA Masanori HASHIMOTO
With the recent advance of process technology shrinking, process parameter variation has become one of the major issues in SoC designs, especially for timing convergence. Recently, Statistical Static Timing Analysis (SSTA) has been proposed as a promising solution to consider the process parameter variation but it has not been widely used yet. For estimating the delay yield, designers have to know and understand the accuracy of SSTA. However, the accuracy has not been thoroughly studied from a practical point of view. This paper proposes two metrics to measure the pessimism/optimism of SSTA; the first corresponds to yield estimation error, and the second examines delay estimation error. We apply the metrics for a problem which has been widely discussed in SSTA community, that is, normal-distribution approximation of max operation. We also apply the proposed metrics for benchmark circuits and discuss about a potential problem originating from normal-distribution approximation. Our metrics indicate that the appropriateness of the approximation depends on not only given input distributions but also the target yield of the product, which is an important message for SSTA users.
Yoshihide KOMATSU Koichiro ISHIBASHI Makoto NAGATA
This paper describes a method of reducing substrate noise and random variability utilizing a self-adjusted forward body bias (SA-FBB) circuit. To achieve this, we designed a test chip (130 nm CMOS 3-well) that contained an on-chip oscilloscope for detecting dynamic noise from various frequency noise sources, and another test chip (90 nm CMOS 2-well) that contained 10-M transistors for measuring random variability tendencies. Under SA-FBB conditions, it reduced noise by 35.3-69.8% and reduced random variability σ (Ids) by 23.2-57.9%.
Masahiko OMURA Toshiki KANAMOTO Michiko TSUKAMOTO Mitsutoshi SHIROTA Takashi NAKAJIMA Masayuki TERAI
This paper proposes a new efficient method of characterizing a memory compiler in order to reduce the computation time and remove human error. The new features that make our method greatly efficient are the following three points: (1) high-speed circuit simulation of the whole memory module using a hierarchical LPE (Layout Parasitic Extractor) and a hierarchical circuit simulator, (2) automatic generation of circuit simulation input data from corresponding parameterized description termed the template file, and (3) carefully selected environmental conditions of circuit level simulator and minimizing the number of runs of it. We demonstrate the effectiveness of the proposed method by application to the single-port SRAM generators using 90 nm CMOS technology.
Gi-Ho PARK Kil-Whan LEE Tack-Don HAN Shin-Dug KIM
This paper presents a dual data cache system structure, called a cooperative cache system, that is designed as a low power cache structure for embedded processors. The cooperative cache system consists of two caches, i.e., a direct-mapped temporal oriented cache (TOC) and a four-way set-associative spatial oriented cache (SOC). The cooperative cache system achieves improvement in performance and reduction in power consumption by virtue of the structural characteristics of the two caches designed inherently to help each other. An evaluation chip of an embedded processor having the cooperative cache system is manufactured by Samsung Electronics Co. with 0.25 µm 4-metal process technology.
Jan-Ou WU Chia-Chun TSAI Chung-Chieh KUO Trong-Yen LEE
In nature an unbalanced clock tree exists in a SoC because the clock sinks of IPs have distinct input capacitive loads and internal delays. The construction of a bottom-up RLC clock tree with minimal clock delay and zero skew is crucial to ensure good SoC performance. This study proves that an RLC clock tree construction always has no zero skew owing to skew upward propagation. Specifically, this study proposes the insertion of two unit-size buffers associated with the binary search for a tapping point into each pair of subtrees to interrupt the non-zero skew upward propagation. This technique enables reliable construction of a buffered RLC clock tree with zero skew. The effectiveness of the proposed approach is demonstrated by assessing benchmarks.
Masanori SANO Ichiro YAMADA Hideki SUMIYOSHI Nobuyuki YAGI
We describe an online method for selecting and annotating highlight scenes in soccer matches being televised. The stadium crowd noise and the play-by-play announcer's voice are used as input signals. Candidate scenes for highlights are extracted from the crowd noise by dynamic thresholding and spectral envelope analysis. Using a dynamic threshold solves the problem in conventional methods of how to determine an appropriate threshold. Semantic-meaning information about the kind of play and the related team and player is extracted from the announcer's commentary by using domain-based rules. The information extracted from the two types of audio input is integrated to generate segment-metadata of highlight scenes. Application of the method to six professional soccer games has confirmed its effectiveness.
Toshiki KANAMOTO Tatsuhiko IKEDA Akira TSUCHIYA Hidetoshi ONODERA Masanori HASHIMOTO
This paper proposes a simple yet sufficient Si-substrate modeling for interconnect resistance and inductance extraction. The proposed modeling expresses Si-substrate as four filaments in a filament-based extractor. Although the number of filaments is small, extracted loop inductances and resistances show accurate frequency dependence resulting from the proximity effect. We experimentally prove the accuracy using FEM (Finite Element Method) based simulations of electromagnetic fields. We also show a method to determine optimal size of the four filaments. The proposed model realizes substrate-aware extraction in SoC design flow.