Keyword Search Result

[Keyword] grain (79 hits)


  • Data Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

    Xinning LIU  Chen MEI  Peng CAO  Min ZHU  Longxing SHI  

    PAPER-Design Methodology

    E95-D No:2

    This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse-grain reconfigurable architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Units), which are used to speed up control-intensive and data-intensive tasks, respectively. The parallel computing capability and flexibility of REMUS-II make it an excellent candidate for processing multimedia applications, which require a large number of memory accesses. In this paper, we specifically optimize the data flow to deal with performance-hazardous and energy-hungry memory accesses in order to meet the bandwidth requirements of parallel computing. The RPU internal memory can work in multiple modes, such as a 2D-access mode and a transformation mode, according to different multimedia access patterns. This novel design improves performance by up to 26% compared to traditional on-chip memory. Meanwhile, a block buffer is implemented to optimize the off-chip data flow by reducing off-chip memory accesses by up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30 fps for H.264 High Profile @ Level 4 and High Level MPEG2 at a 200 MHz clock frequency. REMUS-II is implemented in 23.7 mm² of silicon in a TSMC 65 nm logic process with a 400 MHz maximum working frequency.
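
    To put the "1080p@30 fps at 200 MHz" figure in perspective, the short calculation below (not taken from the paper) derives the average cycle budget per pixel and per 16x16 macroblock. It assumes 1920 x 1080 luma samples per frame and ignores chroma, the split of work between the two RPUs, and any pipelining, so it is only a rough sanity check.

```python
# Back-of-the-envelope cycle budget for 1080p@30 fps at 200 MHz.
# Assumption: 1920x1080 luma samples per frame. (H.264 actually codes
# 1080p as 1920x1088, padded to whole 16x16 macroblocks, which makes
# the real budget slightly tighter.)
WIDTH, HEIGHT = 1920, 1080
FPS = 30
CLOCK_HZ = 200_000_000

pixels_per_second = WIDTH * HEIGHT * FPS
cycles_per_pixel = CLOCK_HZ / pixels_per_second

# The same budget expressed per 16x16 macroblock (256 luma samples).
MB_SIZE = 16 * 16
macroblocks_per_second = pixels_per_second / MB_SIZE
cycles_per_macroblock = CLOCK_HZ / macroblocks_per_second

print(f"{pixels_per_second:,} pixels/s")
print(f"{cycles_per_pixel:.2f} cycles per pixel")
print(f"{cycles_per_macroblock:.0f} cycles per macroblock")
```

    A budget of roughly three cycles per pixel is why the abstract stresses memory bandwidth: every extra off-chip access per pixel eats directly into it.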

  • Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture

    Shouyi YIN  Chongyong YIN  Leibo LIU  Min ZHU  Shaojun WEI  

    PAPER-Design Methodology

    E95-D No:2

    Coarse-grained reconfigurable architecture (CGRA) combines the performance of application-specific integrated circuits (ASICs) with the flexibility of general-purpose processors (GPPs), making it a promising solution for embedded systems. With the increasing complexity of reconfigurable resources (processing elements, routing cells, I/O blocks, etc.), the reconfiguration cost is becoming the performance bottleneck. Most of the reconfiguration cost comes from the frequent memory read/write operations that transfer the configuration context from main memory to the context buffer. To improve overall performance, it is therefore critical to reduce the amount of configuration context. In this paper, we propose a configuration context reduction method for CGRA. The proposed method exploits the structural correlation of the computation tasks mapped onto the CGRA and reduces the redundancy in the configuration context. Experimental results show that the proposed method can reduce the configuration context size by up to 71% on average and speed up execution by up to 68%. The proposed method does not depend on any architectural feature and can be applied to CGRAs of arbitrary architecture.
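
    As a generic illustration of why redundancy in configuration context matters, the sketch below dictionary-encodes configuration words that repeat across contexts: identical words are stored once and each context keeps only short indices. This is a hypothetical technique chosen for illustration, not the authors' method (which exploits the structural correlation of mapped tasks); the functions compress_contexts, size_bits and raw_bits, and the 32-bit word width, are all assumptions.

```python
from typing import Dict, List, Tuple

WORD_BITS = 32  # hypothetical width of one per-PE configuration word


def compress_contexts(contexts: List[List[int]]) -> Tuple[List[int], List[List[int]]]:
    """Store each distinct configuration word once; contexts keep indices.

    Hypothetical illustration: each context is a list of per-PE words.
    """
    table: List[int] = []
    index_of: Dict[int, int] = {}
    encoded: List[List[int]] = []
    for ctx in contexts:
        row = []
        for word in ctx:
            if word not in index_of:
                index_of[word] = len(table)
                table.append(word)
            row.append(index_of[word])
        encoded.append(row)
    return table, encoded


def size_bits(table: List[int], encoded: List[List[int]]) -> int:
    """Encoded size: the shared word table plus fixed-width indices."""
    index_bits = max(1, (len(table) - 1).bit_length())
    return len(table) * WORD_BITS + sum(len(r) for r in encoded) * index_bits


def raw_bits(contexts: List[List[int]]) -> int:
    """Uncompressed size: every PE stores a full configuration word."""
    return sum(len(ctx) for ctx in contexts) * WORD_BITS


contexts = [[0xA1, 0xA1, 0xB2, 0xB2], [0xA1, 0xB2, 0xB2, 0xC3]]
table, encoded = compress_contexts(contexts)
print(raw_bits(contexts), "->", size_bits(table, encoded), "bits")
```

    The gain depends entirely on how often words repeat, which is the kind of redundancy the paper's structural-correlation analysis is meant to expose.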

  • Iterative Synthesis Methods Estimating Programmable-Wire Congestion in a Dynamically Reconfigurable Processor

    Takao TOI  Takumi OKAMOTO  Toru AWASHIMA  Kazutoshi WAKABAYASHI  Hideharu AMANO  

    PAPER-High-Level Synthesis and System-Level Design

    E94-A No:12

    Iterative synthesis methods that take wire congestion into account are proposed for a multi-context dynamically reconfigurable processor (DRP) with a large number of processing elements (PEs) and programmable-wire connections. Although complex data-paths can be synthesized using the programmable wires, their delay is long, especially when the wire connections are congested. We propose two iterative synthesis techniques between a high-level synthesizer (HLS) and the place & route tool to reduce these long wire delays. First, we feed back the wire delays for each context to the scheduler in the HLS. The experimental results showed that the critical-path delay was shortened by 21% on average for applications with timing closure problems. Second, we skip routing and estimate wire delays based on the congestion. This shortened the synthesis time to 1/3, at the cost of degrading the delay improvement rate by two points on average.

  • SAWSDL Service Discovery Based on Fine-Grained Data Semantics

    Dengping WEI  Ting WANG  Ji WANG  


    E94-D No:3

    To improve the effectiveness of SAWSDL service discovery, this paper proposes a novel discovery method for SAWSDL services based on the matchmaking of so-called fine-grained data semantics, which are defined as sets of atomic elements with built-in data types. The fine-grained data semantics can be obtained by a transformation algorithm that decomposes message-level parameters into sets of atomic elements, taking into account the characteristics of the SAWSDL service structure and its semantic annotations. A matchmaking algorithm for fine-grained data semantics is then proposed, which avoids complex and expensive structural matching at the message level. Because the fine-grained data semantics are transparent to the specific data structure of message-level parameters, they help to successfully match similar Web services whose parameters have different data structures. Moreover, a comprehensive measure is proposed that considers several important components of SAWSDL service descriptions together. Finally, the method is evaluated on the SAWSDL service discovery test collection SAWSDL-TC2 and compared with other SAWSDL matchmakers. The experimental results show that our method improves the effectiveness of SAWSDL service discovery with a low average query response time. The results imply that fine-grained parameters are well suited to representing the data semantics of SAWSDL services, especially when the data structures of parameters are not important for semantics.
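
    The core idea, decomposing nested message-level parameters into atomic elements so that differently structured parameters can still match, can be sketched as follows. This is a hypothetical simplification: parameters are modelled as nested dicts whose leaves are XML Schema built-in type names, leaf names stand in for the semantic annotations the real method uses, and a plain Jaccard overlap stands in for the paper's matchmaking measure. The names atomic_elements and match_score are illustrative only.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical model of a message-level parameter: a nested dict whose
# leaves are built-in type names such as "xsd:string".
Schema = Dict[str, Union[str, "Schema"]]


def atomic_elements(schema: Schema) -> Set[Tuple[str, str]]:
    """Flatten a nested parameter into (leaf name, built-in type) atoms."""
    atoms: Set[Tuple[str, str]] = set()
    for name, value in schema.items():
        if isinstance(value, dict):
            atoms |= atomic_elements(value)  # descend into complex types
        else:
            atoms.add((name, value))
    return atoms


def match_score(request: Schema, offer: Schema) -> float:
    """Jaccard overlap of the two atom sets (1.0 means identical atoms)."""
    a, b = atomic_elements(request), atomic_elements(offer)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Same atoms, different nesting: structural matching would fail here.
request = {"address": {"street": "xsd:string", "city": "xsd:string"},
           "zip": "xsd:string"}
offer = {"street": "xsd:string", "city": "xsd:string", "zip": "xsd:string"}
print(match_score(request, offer))
```

    Because the comparison works on flattened atoms, the nesting of "address" in the request does not prevent a full match, which mirrors the abstract's point about structure transparency.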

  • An Instruction Mapping Scheme for FU Array Accelerator

    Kazuhiro YOSHIMURA  Takuya IWAKAMI  Takashi NAKADA  Jun YAO  Hajime SHIMADA  Yasuhiko NAKASHIMA  

    PAPER-Computer System

    E94-D No:2

    We recently proposed using a Linear Array Pipeline Processor (LAPP) to improve energy efficiency for various workloads, such as image processing, while maintaining programmability by executing VLIW code. In this paper, we propose an instruction mapping scheme for LAPP in which a mapper fits the VLIW code onto the functional units (FUs), fully exploiting the array execution of the FUs and the bypass networks. The mapping can be completed over multiple cycles during a data prefetch before the array execution of the FUs. According to an HDL-based implementation, the hardware required for the mapping scheme is 84% of the cost of a baseline method. In addition, the proposed mapper can further help to shrink the array stage; our results show that the combination requires 88% of the area of the baseline model.

  • Logic-In-Control-Architecture-Based Reconfigurable VLSI Using Multiple-Valued Differential-Pair Circuits

    Nobuaki OKADA  Michitaka KAMEYAMA  

    PAPER-Application of Multiple-Valued VLSI

    E93-D No:8

    A fine-grain bit-serial multiple-valued reconfigurable VLSI based on a logic-in-control architecture is proposed for effective use of the hardware resources. In the logic-in-control architecture, the control circuits can be merged with the arithmetic/logic circuits, and both the control and the arithmetic/logic circuits are constructed from one or more logic blocks. To implement the control circuit, each state in the state transition diagram is allocated to a single logic block, which reduces the complexity of the interconnections between logic blocks. The fine-grain logic block is implemented using multiple-valued current-mode circuit technology. In the fine-grain logic block, an arbitrary 3-variable binary function can be programmed using one multiplexer and two universal literal circuits. Three-variable binary functions are used to implement the control circuit. Moreover, the same hardware resources can be used to construct a bit-serial adder, because the full-adder sum and carry can be realized by programming the universal literal circuit. Therefore, the logic block can be effectively reconfigured for both arithmetic/logic and control circuits. It is shown that the hardware complexity of the control circuit in the proposed reconfigurable VLSI is lower than that of a control circuit based on a typical sequential circuit in a conventional FPGA and in the fine-grain field-programmable VLSIs reported to date.
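
    The logic behind "one multiplexer and two universal literal circuits" can be checked with a purely functional model: a universal literal outputs 1 when its multiple-valued input lies in a programmed set, and a 3-variable function splits into a 2:1 multiplexer on x choosing between two functions of (y, z) (Shannon expansion). The full-adder outputs likewise become literals of the superposed sum a + b + cin. The weighted encoding 2*y + z used below is an assumption chosen so that any 2-variable function is representable; the sketch says nothing about the actual current-mode circuit.

```python
from itertools import product


def universal_literal(x: int, accept: frozenset) -> int:
    """Output 1 when the multiple-valued input x lies in the programmed set."""
    return 1 if x in accept else 0


# Full adder: sum and carry as literals of the 4-valued input a + b + cin.
SUM_SET = frozenset({1, 3})
CARRY_SET = frozenset({2, 3})


def full_adder(a: int, b: int, cin: int):
    s = a + b + cin  # superposed multiple-valued signal, 0..3
    return universal_literal(s, SUM_SET), universal_literal(s, CARRY_SET)


def three_var_function(truth_table):
    """Program f(x, y, z) as a mux on x over two literals of 2*y + z.

    truth_table maps each (x, y, z) tuple to 0 or 1.
    """
    def accept_set(x_val):
        return frozenset(2 * y + z for y, z in product((0, 1), repeat=2)
                         if truth_table[(x_val, y, z)])

    lo, hi = accept_set(0), accept_set(1)

    def f(x: int, y: int, z: int) -> int:
        w = 2 * y + z  # assumed weighted encoding of (y, z)
        return universal_literal(w, hi) if x else universal_literal(w, lo)

    return f
```

    Exhaustively checking all 256 truth tables (as the test does) confirms that two programmable literals plus a multiplexer cover every 3-variable binary function under this encoding.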

  • Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture

    Cao LIANG  Xinming HUANG  

    PAPER-Integrated Electronics

    E93-C No:3

    The Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires a parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted at stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, providing high-performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT architecture targeted at multi-core computing platforms. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves energy-efficiency gains of 2.1 and 28.9 times compared with FPGA and DSP implementations, respectively.
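
    For readers unfamiliar with why the FFT parallelizes well, the sketch below is the classic recursive radix-2 decimation-in-time Cooley-Tukey FFT, checked against a direct DFT. The even- and odd-indexed halves are independent subproblems joined by butterflies, which is the structure a parallel mapping distributes across processing elements. This is a textbook reference model, not the SmartCell data-flow mapping described in the paper.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # independent half-size transforms:
    odd = fft(x[1::2])   # natural units of parallel work
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly with twiddle factor W_n^k = exp(-2*pi*i*k/n).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
print([round(abs(v), 3) for v in fft(signal)])
```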

  • Pipelining a Multi-Mode SHA-384/512 Core with High Area Performance Rate

    Anh-Tuan HOANG  Katsuhiro YAMAZAKI  Shigeru OYANAGI  

    PAPER-VLSI Systems

    E92-D No:10

    The Secure Hash Algorithm 512 (SHA-512), which is used to verify the integrity of a message, involves iterative computation on the data. The long computation delay of these iterations limits the overall throughput of the system and makes the computation difficult to pipeline. We describe a way to pipeline the computation using fine-grained pipelining with balanced critical paths. In this method, one critical path is broken into two stages using data forwarding, and the other critical path is broken into three stages using computation postponement. The resulting critical paths all have two adder layers with some data movement and are thus balanced. The method also allows the number of registers to be reduced. In addition, the similarity between SHA-384 and SHA-512 is exploited in a multi-mode design that can generate a message digest for both versions with the same throughput and only a small increase in hardware size. Experimental results show that our implementation achieved not only the best area performance rate (throughput divided by area) but also a higher throughput than almost all related work.
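
    To see where the "adder layers" come from, the sketch below models one unpipelined SHA-512 compression round using the functions defined in FIPS 180-4. Computing T1 alone chains four 64-bit additions, and the new a and e words add more on top; these are the long paths a pipelined design has to break up and balance. SHA-384 uses the same round function and differs only in its initial hash values and in truncating the digest to 384 bits, which is what makes a low-overhead multi-mode core possible. This is a plain reference model, not the paper's pipelined datapath.

```python
MASK64 = (1 << 64) - 1


def rotr(x: int, n: int) -> int:
    """64-bit rotate right."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def big_sigma0(a: int) -> int:
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)


def big_sigma1(e: int) -> int:
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)


def ch(e: int, f: int, g: int) -> int:
    return (e & f) ^ (~e & MASK64 & g)


def maj(a: int, b: int, c: int) -> int:
    return (a & b) ^ (a & c) ^ (b & c)


def sha512_round(state, k_t: int, w_t: int):
    """One SHA-512 round. state = (a, b, c, d, e, f, g, h), 64-bit words;
    k_t is the round constant, w_t the message-schedule word."""
    a, b, c, d, e, f, g, h = state
    # T1 is a chain of four additions: h + S1(e) + Ch(e,f,g) + K_t + W_t.
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k_t + w_t) & MASK64
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK64
    # New a needs T1 + T2; new e needs d + T1: the two critical outputs.
    return ((t1 + t2) & MASK64, a, b, c, (d + t1) & MASK64, e, f, g)
```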

  • Fine-Grain Multiple-Valued Reconfigurable VLSI Using Series-Gating Differential-Pair Circuits and Its Evaluation

    Nobuaki OKADA  Michitaka KAMEYAMA  


    E91-C No:9

    A fine-grain reconfigurable VLSI for various applications, including arithmetic operations, is developed. In a fine-grain architecture, it is important to define a cell function that leads to high utilization of the logic block and a reduction of the switch block. From this point of view, a universal-literal-based multiple-valued cell suitable for bit-serial reconfigurable computation is proposed. A series-gating differential-pair circuit is effectively employed to implement the full-adder sum circuit and a universal literal circuit, so a simple logic block can be constructed using this circuit technology. Moreover, interconnection complexity can be reduced by using multiple-valued signaling, in which the serial data bits are superposed with a start signal that indicates the head of each word. Differential-pair circuits are also effectively employed for current-output replication, which enables high-speed signaling to adjacent cells. The evaluation is based on a 90 nm CMOS design rule, and it is shown that the area of the proposed cell is reduced to 78% of that of the CMOS implementation. Moreover, its area-time product is 92% of that of the CMOS implementation, while the delay time is increased by 18%.

  • Measurement and Evaluation of Submillimeter-Wave Antenna Quasioptical Feed System by a Phase-Retrieval Method in the 640-GHz Band

    Takeshi MANABE  Tomo FUKAMI  Toshiyuki NISHIBORI  Kazuo MIZUKOSHI  Satoshi OCHIAI  


    E91-B No:6

    A phase-retrieval method is applied to the quasioptical feed system of the offset Cassegrain antenna of the Superconducting Submillimeter-Wave Limb-Emission Sounder (JEM/SMILES), to be aboard the International Space Station, in order to evaluate the beam alignment by estimating the phase pattern from measurements of the beam amplitude pattern. As a result, the phase-retrieval method is demonstrated to be effective for measuring and evaluating the quasioptical antenna feed system. It is also demonstrated that the far-field radiation pattern of the antenna main reflector can be estimated from the phase-retrieved beam pattern of the feed system.

  • Filtering in Generalized Signal-Dependent Noise Model Using Covariance Information


    PAPER-Digital Signal Processing

    E91-A No:3

    In this paper, we propose a recursive filtering algorithm to restore monochromatic images corrupted by general signal-dependent additive noise. The equation describing the image field is assumed to be unavailable, so the filtering algorithm is derived from the information provided by the covariance functions of the signal and of the noise affecting the measurement equation, together with the fourth-order moments of the signal. The proposed algorithm is obtained by an innovation approach, which provides a simple derivation of the least mean-squared error linear estimators. The grey level at each spatial coordinate is estimated taking into account the grey levels located on the same row as the pixel to be estimated. The proposed filtering algorithm is applied to restore images affected by general signal-dependent additive noise.

  • Design and Evaluation of a Massively Parallel Processor Based on Matrix Architecture

    Toru SHIMIZU  Masami NAKAJIMA  Masahiro KAINAGA  


    E89-C No:11

    This paper describes the design and evaluation of a massively parallel processor based on the Matrix architecture, which is suitable for portable multimedia applications. The proposed architecture achieves 40 GOPS of 16-bit fixed-point additions at a 200 MHz clock frequency with 250 mW power dissipation. In addition, a 1-Mbit SRAM for data registers and 2,048 2-bit processing elements connected by a flexible switching network are integrated in 3.1 mm² in 90 nm low-power CMOS technology. The energy-efficient Matrix architecture supports 2,048-way parallel operations and the programmable functions required for multimedia SoCs.
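
    The headline numbers can be cross-checked with a little arithmetic (ours, not the paper's): 40 GOPS at 200 MHz means 200 16-bit additions complete per cycle across the array, i.e. each of the 2,048 PEs finishes one addition every 10.24 cycles on average, and the efficiency works out to 160 GOPS/W. A 2-bit PE needs at least 16 / 2 = 8 bit-slice steps per 16-bit add; the abstract does not say what accounts for the remaining cycles, so any explanation (operand movement, control) would be an assumption.

```python
PES = 2048
CLOCK_HZ = 200e6
OPS_PER_S = 40e9      # 16-bit fixed-point additions per second (40 GOPS)
POWER_W = 0.250
PE_WIDTH_BITS = 2
OPERAND_BITS = 16

adds_per_cycle = OPS_PER_S / CLOCK_HZ            # whole array, per clock
cycles_per_add_per_pe = PES / adds_per_cycle     # average per PE
gops_per_watt = OPS_PER_S / POWER_W / 1e9
min_slices = OPERAND_BITS // PE_WIDTH_BITS       # lower bound for bit-serial add

print(f"{adds_per_cycle:.0f} adds/cycle, "
      f"{cycles_per_add_per_pe:.2f} cycles/add/PE (>= {min_slices}), "
      f"{gops_per_watt:.0f} GOPS/W")
```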

  • Role of Hydrogen in Polycrystalline Si by Excimer Laser Annealing

    Naoya KAWAMOTO  Naoto MATSUO  Atsushi MASUDA  Yoshitaka KITAMON  Hideki MATSUMURA  Yasunori HARADA  Tadaki MIYOSHI  Hiroki HAMADA  

    PAPER-Semiconductor Materials and Devices

    E88-C No:2

    The role of hydrogen in the Si film during excimer laser annealing (ELA) has been successfully studied using a novel sample structure consisting of stacked a-Si and SiN films. The hydrogen content of the Si films during ELA is varied by preparing samples whose SiN films, deposited by catalytic CVD (Cat-CVD), have hydrogen contents of 2.3-8.2 at.%. At low hydrogen concentrations in the Si film, the grain size increases as the hydrogen concentration decreases, and the internal stress of the film decreases as the shot number increases. At high hydrogen concentrations in the Si film, hydrogen burst was observed at 500 mJ/cm², and the dependence of the internal stress on the shot number becomes weak even at 318 mJ/cm². These phenomena can basically be understood using the secondary grain growth mechanism that we have proposed.

  • Dynamically Reconfigurable Processor Implemented with IPFlex's DAPDNA Technology

    Takayuki SUGAWARA  Keisuke IDE  Tomoyoshi SATO  


    E87-D No:8

    The DAPDNA®-2 is the world's first general-purpose dynamically reconfigurable processor for commercial use. It is a dual-core processor consisting of a custom RISC core called the Digital Application Processor (DAP) and a two-dimensional array of dynamically reconfigurable processing elements referred to as the Distributed Network Architecture (DNA). The DAP has a 32-bit instruction set architecture with an 8 KB instruction cache and an 8 KB data cache that can be accessed in one clock cycle. It has an interrupt control function to detect the completion of data processing in the DNA-Matrix. The DNA-Matrix has different types of data processing elements, such as ALU, delay, and memory elements, to process fully parallel computations. The DNA-Matrix includes 32 independent 16 KB high-speed SRAM elements (512 KB in total). Despite its parallel computational capability, the DNA-Matrix can be synchronized with the DAP and work at the same clock frequency. The processor operates at 166 MHz and is fabricated in a 0.11 µm CMOS process. Up to 16 DAPDNA-2 devices can be connected directly with linear scalability in processing performance, provided the bandwidth requirement is within the maximum communication speed between DNAs, which is 32 Gbps. The DAPDNA-2 performs at a level two orders of magnitude higher than conventional high-performance processors.

  • Effects of Various Rare Earth Sesquioxide Additives on Grain Growth in Millimeter-Wave Sintered Silicon Nitride Ceramics

    Masayuki HIROTA  Maria-Cecilia VALECILLOS  Manuel E. BRITO  Kiyoshi HIRAO  Motohiro TORIYAMA  

    PAPER-Millimeter-Wave Heating

    E86-C No:12

    Using various rare earth sesquioxides as additives, silicon nitride (Si3N4) samples were sintered at 1700 °C for 4 h by millimeter-wave heating in an applicator fed by a 28 GHz gyrotron source under a nitrogen pressure of 0.1 MPa. A comparative study of the densification, grain growth behavior, and mechanical properties of silicon nitride fabricated by millimeter-wave and conventional sintering was carried out. Bulk densities were measured by Archimedes' technique. Except for the Eu2O3-containing sample, all samples were densified to relative densities above 97.0%. The microstructure of the specimens was analyzed by scanning electron microscopy (SEM) and transmission electron microscopy (TEM). To quantitatively investigate the effect of millimeter-wave heating on grain growth, image analysis was carried out on the grains in the specimens. Fracture toughness was determined by the indentation-fracture method (IF method) in accordance with Japanese Industrial Standards (JIS). Fully dense millimeter-wave sintered silicon nitride with a bimodal microstructure exhibited higher fracture toughness than materials processed by conventional heating techniques. The results indicate that millimeter-wave sintering is more effective than conventional heating in enhancing grain growth and producing the bimodal microstructure. It was also confirmed that localized temperature runaway, depending on the sintering additives, can occur under millimeter-wave heating.

  • High-Resolution Beam Profiler for Engineering Laterally-Grown Grain Morphology

    Masayuki JYUMONJI  Yoshinobu KIMURA  Masato HIRAMATSU  Yukio TANIGUCHI  Masakiyo MATSUMURA  


    E86-C No:11

    A two-dimensional laser beam profiler has been developed that can measure the intensity distribution of a single shot of an excimer-laser beam on a sample surface, not only from the macroscopic viewpoint but also from the microscopic viewpoint, which is important for excimer-laser-triggered lateral large-grain growth of Si. A resolution as fine as 0.4 µm was obtained with a field of view as large as 30 µm × 30 µm. The effects of homogenizers, phase-shifters, and their combination on beam profiles were quantitatively investigated using this apparatus. The relationship between the microscopic beam profile and the surface morphology of laterally grown grains was also examined.

  • Effects of Grain Size and Orientation on Magnetic Properties of CoCrPt/Ti Films for Perpendicular Magnetic Recording

    Pyungwoo JANG  Sooyoul HONG  


    E86-C No:9

    Several 2 nm seed layers were sputtered to increase the coercivity (Hc) and anisotropy (Ku) of CoCrPt/Ti perpendicular recording media. Among them, a 2 nm Ag seed layer was very effective in increasing the Hc of (Co78Cr22)100-xPtx/Ti (x = 14, 20). The effect was more pronounced when the (Co78Cr22)100-xPtx/Ti became thinner. In addition, α [= 4π(dM/dH) at H = Hc] decreased when the Ag layer was used. The film thickness below which the Ag seed layer was effective decreased with decreasing Pt content. However, the Ag seed layer did not promote the (0002) texture of the Ti and CoCrPt layers. Domain size was reduced when the Ag seed layer was used. The effects of the Ag seed layer are thought to be due to a change in the exchange constant of the grains, for which the grain boundary plays an important role. The effects of film thickness and Pt content can also be explained successfully by the variation of the exchange constant due to the grain boundary. Some experimental evidence, as well as a crude model for the exchange constant variation, is given.

  • Effects of Grain Size Distribution in Recording Layer on SNR and Thermal Stability in Double Layered Perpendicular Media

    Sung Chul LEE  Young Wook TAHK  Taek Dong LEE  


    E86-C No:9

    In this work, micromagnetic simulations of the writing and reading processes in a perpendicular recording system including a single-pole head and recording media with a soft underlayer (SUL) have been performed. The noise contribution from the recording layer increased with increasing grain size distribution in the recording layer, whereas that from the soft underlayer remained almost constant at a given linear density. Details of the noise from the soft underlayer are discussed. The long-time-scale thermal decay of the recorded bits was also investigated using the Langevin equation and the time-temperature scaling method. It was found that, at a linear density of 1058 kfci, a narrower grain size distribution in the recording layer, even at the same average grain size, is more important for thermal decay than expected.

  • Multigrain Parallel Processing on Compiler Cooperative OSCAR Chip Multiprocessor Architecture

    Keiji KIMURA  Takeshi KODAKA  Motoki OBATA  Hironori KASAHARA  

    PAPER-Architecture and Algorithms

    E86-C No:4

    This paper describes multigrain parallel processing on the OSCAR (Optimally SCheduled Advanced multiprocessoR) chip multiprocessor architecture. The compiler-cooperative OSCAR chip multiprocessor architecture aims to provide a scalable, cost-effective chip multiprocessor with high effective performance that is easy to use thanks to compiler support. The OSCAR chip multiprocessor architecture integrates simple single-issue processors, each with a distributed shared data memory for optimal use of data locality across different loops and for fine-grain data transfer and synchronization, a local data memory for private data recognized by the compiler, and a compiler-controllable data transfer unit that overlaps data transfer with computation to hide the data transfer overhead. The OSCAR chip multiprocessor and the OSCAR multigrain parallelizing compiler have been developed simultaneously. The performance of multigrain parallel processing on the OSCAR chip multiprocessor architecture is evaluated using the SPEC fp 2000/95 benchmark suites. When a microSPARC-like single-issue core is used, the OSCAR chip multiprocessor architecture with four processors achieves speedups over a single processor of 2.36 times for fpppp, 2.64 times for su2cor, 2.88 times for turb3d, 2.98 times for hydro2d, 3.84 times for tomcatv, 3.84 times for mgrid, and 3.97 times for swim.
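
    One common way to read four-processor speedups like these is to convert them into parallel efficiency and the Karp-Flatt metric, e = (1/S - 1/p) / (1 - 1/p), the experimentally determined serial fraction. The sketch below applies both to the speedups quoted above; this analysis is ours, not the paper's, and the metric lumps genuinely serial code together with parallelization overhead, so it only hints at where each benchmark loses efficiency.

```python
# Four-processor speedups over one processor, as quoted in the abstract.
SPEEDUPS_4PE = {
    "fpppp": 2.36, "su2cor": 2.64, "turb3d": 2.88, "hydro2d": 2.98,
    "tomcatv": 3.84, "mgrid": 3.84, "swim": 3.97,
}
P = 4


def efficiency(speedup: float, p: int = P) -> float:
    """Parallel efficiency E = S / p."""
    return speedup / p


def karp_flatt(speedup: float, p: int = P) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1 / speedup - 1 / p) / (1 - 1 / p)


for name, s in SPEEDUPS_4PE.items():
    print(f"{name:8s} S={s:.2f}  E={efficiency(s):.1%}  e={karp_flatt(s):.3f}")
```

    By this measure swim behaves almost perfectly parallel (e of about 0.003), while fpppp's lower speedup corresponds to an effective serial fraction of roughly 0.23.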

  • Excimer-Laser-Induced Zone-Melting-Recrystallization of Silicon Thin Films on Large Glass Substrates and Its Application to TFTs

    Hiromichi TAKAOKA  Yoshinobu SATOU  Takaomi SUZUKI  Takuya SASAKI  Hiroshi TANABE  Hiroshi HAYAMA  

    PAPER-Active Matrix Displays

    E85-C No:11

    We have successfully produced laterally grown grains on large (300 × 350 mm) glass substrates by means of a newly developed excimer laser crystallization system that features a high-precision mask stage and an auto-focusing system. The original grains were produced with a steep beam edge, and their lateral growth was extended by repeated irradiation and translation. TFTs fabricated with these extended grains had mobilities that remained almost constant at 270 cm²/Vs (n-ch. TFTs) and 230 cm²/Vs (p-ch. TFTs) over a wide range of laser fluence (400-600 mJ/cm²).
