
Keyword Search Result

[Keyword] grain(79hit)

21-40hit(79hit)

  • High Quality Pentacene Film Formation on N-Doped LaB6 Donor Layer

    Yasutaka MAEDA  Shun-ichiro OHMI  Tetsuya GOTO  Tadahiro OHMI  

     
    PAPER

      Vol:
    E99-C No:5
      Page(s):
    535-540

    In this research, we have investigated the deposition condition of pentacene film on nitrogen doped (N-doped) LaB6 donor layer for larger grain growth at the channel region for bottom-contact type pentacene-based organic field-effect transistors (OFETs) to improve the device characteristics. Source and drain bottom-contacts of Al were patterned and 2nm-thick N-doped LaB6 donor layer was deposited on the SiO2/Si(100) back-gate structure. The dendritic grain growth of pentacene larger than 10µm without lamellar grain growth was demonstrated when the deposition temperature and rate were 100°C and 0.5nm/min, respectively. Furthermore, it was found that the dendritic grain growth was realized at the boundary region of bottom-contact as well as channel region.

  • Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators

    Yoshikazu INAGAKI  Shinya TAKAMAEDA-YAMAZAKI  Jun YAO  Yasuhiko NAKASHIMA  

     
    PAPER-Architecture

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2141-2149

    The Energy-aware Multi-mode Accelerator eXtension [24],[25] (EMAX) is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computations, big data, and image processing as well as low-power consumption. However, before mapping algorithms on the accelerator, application developers require sufficient knowledge of the hardware organization and specially designed instructions. They also need significant effort to tune the code for improving execution efficiency when no well-designed compiler or library is available. To address this problem, we focus on library support for stencil (nearest-neighbor) computations that represent a class of algorithms commonly used in many partial differential equation (PDE) solvers. In this research, we address the following topics: (1) system configuration, features, and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data to be read from the main memory; (3) performance evaluation of the library for PDE solvers. With the features of a library that can reuse the local data across the outer loop iterations and map many instructions by unrolling the outer loops, the amount of data to be read from the main memory is significantly reduced to a minimum of 1/7 compared with a hand-tuned code. In addition, the stencil library reduced the execution time 23% more than a general-purpose processor.
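To make the term "stencil (nearest-neighbor) computation" in the abstract above concrete, here is a minimal pure-Python sketch of one sweep of a 3D stencil. The 7-point averaging kernel, the function name `stencil_sweep_3d`, and the nested-list grid are assumptions chosen for illustration; this is not the EMAX library or its instruction mapping.

```python
def stencil_sweep_3d(grid):
    """Apply one 7-point averaging stencil sweep to a 3D grid (nested lists).

    Boundary cells are copied unchanged; each interior cell becomes the
    mean of itself and its six face neighbours. Illustrative kernel only.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Copy the grid so that reads always see the previous iteration's values.
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                total = (
                    grid[z][y][x]
                    + grid[z - 1][y][x] + grid[z + 1][y][x]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z][y][x - 1] + grid[z][y][x + 1]
                )
                out[z][y][x] = total / 7.0
    return out
```

Each interior point reads neighbours that adjacent points also read, which is why reusing locally held data across loop iterations, as the abstract describes, can cut main-memory traffic.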

  • An Error Correction Scheme through Time Redundancy for Enhancing Persistent Soft-Error Tolerance of CGRAs

    Takashi IMAGAWA  Masayuki HIROMOTO  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-Integrated Electronics

      Vol:
    E98-C No:7
      Page(s):
    741-750

    Time redundancy is sometimes the only option for enhancing circuit reliability when the circuit area is severely restricted. In this paper, a time-redundant error-correction scheme, which is particularly suitable for coarse-grained reconfigurable arrays (CGRAs), is proposed. It judges the correctness of the executions by comparing the results of two identical runs. Once a mismatch is found, the second run is terminated immediately to start the third run, under the assumption that the errors tend to persist in many applications, so that the correct result can be selected from the three runs. The circuit area and reliability of the proposed method are compared with a straightforward implementation of time-redundancy and a selective triple modular redundancy (TMR). A case study on a CGRA revealed that the area of the proposed method is 1% larger than that of the implementation for the selective TMR. The study also shows the proposed scheme is up to 2.6x more reliable than the full-TMR when the persistent error is predominant.
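The compare-then-rerun idea in the abstract above can be sketched in software as follows. This is a simplified illustration under stated assumptions: the function name `run_with_time_redundancy` is hypothetical, and the sketch compares only final results, whereas the scheme described in the abstract detects a mismatch during execution and aborts the second run early.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(task: Callable[[], T]) -> T:
    """Run a task twice; on mismatch, run a third time and take the majority.

    Simplified sketch: results are compared only after each run completes.
    """
    first = task()
    second = task()
    if first == second:
        # The two runs agree, so the result is accepted without a third run.
        return first
    # Mismatch: a third run acts as the tie-breaker.
    third = task()
    if third == first:
        return first
    if third == second:
        return second
    raise RuntimeError("no two runs agree; the error cannot be corrected")
```

In the error-free case the cost is two runs rather than three, which is where this approach can save time compared with always running three times and voting.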

  • Mapping Multi-Level Loop Nests onto CGRAs Using Polyhedral Optimizations

    Dajiang LIU  Shouyi YIN  Leibo LIU  Shaojun WEI  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1419-1430

    The coarse-grained reconfigurable architecture (CGRA) is a promising computing platform that provides both high performance and high power-efficiency. The computation-intensive portions of an application (e.g. loop nests) are often mapped onto CGRA for acceleration. However, mapping loop nests onto CGRA efficiently is quite a challenge due to the special characteristics of CGRA. To optimize the mapping of loop nests onto CGRA, this paper makes three contributions: i) Establishing a precise performance model of mapping loop nests onto CGRA, ii) Formulating the loop nests mapping as a nonlinear optimization problem based on the polyhedral model, iii) Extracting an efficient heuristic algorithm and building a complete flow of mapping loop nests onto CGRA (PolyMAP). Experimental results on most kernels of the PolyBench and real-life applications show that our proposed approach can improve the performance of the kernels by 27% on average, as compared to the state-of-the-art methods. The runtime complexity of our approach is also acceptable.

  • A Participating Fine-Grained Cloud Computing Platform with In-Network Guidance

    Kento NISHII  Yosuke TANIGAWA  Hideki TODE  

     
    PAPER-Network

      Vol:
    E98-B No:6
      Page(s):
    1008-1017

    What should be the ultimate form of the cloud computing environment? The solution should have two important features: “Fine-Granularity” and “Participation.” To realize an attractive and feasible solution with these features, we propose a “participating fine-grained cloud computing platform” that a large number of personal or small-company resource suppliers participate in, configure and provide cloud computing on. This enables users to be supplied with smaller units of resources such as computing, memory, content, and applications, in comparison with the traditional Infrastructure as a Service (IaaS). Furthermore, to search for nearby resources efficiently among the many available on the platform, we also propose Resource Breadcrumbs (RBC) as a key technology of our proposed platform to provide in-network guidance capability autonomously for users' queries. RBC allows supplier-nodes to distribute guidance information directed to themselves with dedicated control messages; in addition, the information can be logged along the trail of message from supplier to user. With this distributed information, users can autonomously locate nearby resources. Distributed management also reduces computational load on the central database and enables a participating fine-grained cloud platform at lower cost.

  • Novel Phased Array-Fed Dual-Reflector Antenna with Different Orthogonal Cross-Section by Imaging Reflector Antenna and Ring-Focus Cassegrain Antenna

    Michio TAKIKAWA  Yoshio INASAWA  Hiroaki MIYASHITA  Izuru NAITO  

     
    PAPER

      Vol:
    E98-C No:1
      Page(s):
    8-15

    We propose a novel phased array-fed dual-reflector antenna that reduces performance degradation caused by multiple reflection. The marked feature of the proposed configuration is that different reflector profiles are employed for the two orthogonal directions. The reflector profile in the beam-scanning section (vertical section) is set to an imaging reflector configuration, while the profile in the orthogonal non-beam-scanning section (horizontal section) is set to a ring-focus Cassegrain antenna configuration. In order to compare the proposed antenna with the conventional antenna in which multiple reflection was problematic, we designed a prototype antenna of the same size, and verified the validity of the proposed antenna. The verification showed that the gain at the designed central frequency increased by 0.4 dB, and the ripple of the gain frequency properties that was produced by multiple reflection decreased by 1.1 dB. These results demonstrated the validity of the proposed antenna.

  • Implementation of Voltage-Mode/Current-Mode Hybrid Circuits for a Low-Power Fine-Grain Reconfigurable VLSI

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E97-C No:10
      Page(s):
    1028-1035

    This paper proposes low-power voltage-mode/current-mode hybrid circuits to realize an arbitrary two-variable logic function and a full-adder function. The voltage and current mode can be selected for low-power operations at low and high frequency, respectively, according to speed requirement. An nMOS pass transistor network is shared to realize voltage switching and current steering for the voltage- and current-mode operations, respectively, which leads to high utilization of the hardware resources. As a result, when the operating frequency is more than 1.15 GHz, the current mode of the hybrid logic circuit is more power-efficient than the voltage mode. Otherwise, the voltage mode is more power-efficient. The power consumption of the hybrid two-variable logic circuit is lower than that of the conventional two-input look-up table (LUT) using CMOS transmission gates, when the operating frequency is more than 800 MHz. The delay and area of the hybrid two-variable logic circuit are increased by only 7% and 13%, respectively.

  • Multiple-Valued Fine-Grain Reconfigurable VLSI Using a Global Tree Local X-Net Network

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-VLSI Architecture

      Vol:
    E97-D No:9
      Page(s):
    2278-2285

    A global tree local X-net network (GTLX) is introduced to realize high-performance data transfer in a multiple-valued fine-grain reconfigurable VLSI (MVFG-RVLSI). A global pipelined tree network is utilized to realize high-performance long-distance bit-parallel data transfer. Moreover, a logic-in-memory architecture is employed to resolve the data transfer bottleneck between a block data memory and a cell. A local X-net network is utilized to realize simple interconnections and compact switch blocks for eight-nearest-neighbor data transfer. Moreover, multiple-valued signaling is utilized to improve the utilization of the X-net network, where two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. To evaluate the MVFG-RVLSI, a fast Fourier transform (FFT) operation is mapped onto a previous MVFG-RVLSI using only the X-net network and the MVFG-RVLSI using the GTLX. As a result, the computation time, the power consumption and the transistor count of the MVFG-RVLSI using the GTLX are reduced by 25%, 36% and 56%, respectively, in comparison with those of the MVFG-RVLSI using only the X-net network.

  • Enhanced Film Grain Noise Removal and Synthesis for High Fidelity Video Coding

    Inseong HWANG  Jinwoo JEONG  Sungjei KIM  Jangwon CHOI  Yoonsik CHOE  

     
    PAPER-Image

      Vol:
    E96-A No:11
      Page(s):
    2253-2264

    In this paper, we propose a novel technique for film grain noise removal and synthesis that can be adopted in high fidelity video coding. Film grain noise enhances the natural appearance of high fidelity video; therefore, it should be preserved. However, film grain noise is a burden to typical video compression systems because it has relatively large energy levels in the high frequency region. In order to improve the coding performance while preserving film grain noise, we propose film grain noise removal in the pre-processing step and film grain noise synthesis in the post-processing step. In the pre-processing step, the film grain noise is removed by using temporal and inter-color correlations. Specifically, color image denoising using inter-color prediction provides good denoising performance in the noise-concentrated B plane, because film grain noise has inter-color correlation in the RGB domain. In the post-processing step, we present a noise model to generate noise that is close to the actual noise in terms of a couple of observed statistical properties, such as the inter-color correlation and power of the film grain noise. The results show that the coding gain of the denoised video is higher than for previous works, while the visual quality of the final reconstructed video is well preserved.

  • Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC

    Gugang GAO  Peng CAO  Jun YANG  Longxing SHI  

     
    PAPER-Application

      Vol:
    E96-D No:8
      Page(s):
    1654-1666

    One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to efficiently map applications. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called ‘Hybrid partitioning’, for mapping an H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining good features of data partitioning and task partitioning, our methodology mainly consists of three levels from top to bottom: (1) hybrid task pipeline based on slice and macroblock (MB) level; (2) MB row-level data parallelism; (3) sub-MB level parallelism method. Further, on the sub-MB level, we propose a few mapping strategies such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4×4, and parallel processing order for deblocking. With our mapping strategies, we improved the algorithm's performance on REMUS-II. For example, with a luma 16×16 MB, the Hybrid VBSMC achieves 4 times greater performance than VBSMC and 2.2 times greater performance than the fixed 4×4 partition approach. Finally, we achieve 1080p@33fps H.264 high-profile (HiP)@level 4.1 decoding when the working frequency of REMUS-II is 200 MHz. Compared with typical hardware platforms, we can achieve better performance, area, and flexibility. For example, our performance achieves approximately 175% improvement over that of a commercial CGRA processor XPP-III while only using 70% of its area.

  • A Multiple-Valued Reconfigurable VLSI Architecture Using Binary-Controlled Differential-Pair Circuits

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E96-C No:8
      Page(s):
    1083-1093

    This paper presents a fine-grain bit-serial reconfigurable VLSI architecture using multiple-valued switch blocks and binary logic modules. Multiple-valued signaling is utilized to implement a compact switch block. A binary-controlled current-steering technique is introduced, utilizing a programmable three-level differential-pair circuit to implement a high-performance low-power arbitrary two-variable binary function, and increase the noise margins in comparison with the quaternary-controlled differential-pair circuit. A current-source sharing technique between a series-gating differential-pair circuit and a current-mode D-latch is proposed to reduce the current source count and improve the speed. It is demonstrated that the power consumption and the delay of the proposed multiple-valued cell based on the binary-controlled current-steering technique and the current-source-sharing technique are reduced to 63% and 72%, respectively, in comparison with those of a previous multiple-valued cell.

  • Field Slack Assessment for Predictive Fault Avoidance on Coarse-Grained Reconfigurable Devices

    Toshihiro KAMEDA  Hiroaki KONOURA  Dawood ALNAJJAR  Yukio MITSUYAMA  Masanori HASHIMOTO  Takao ONOYE  

     
    PAPER-Test and Verification

      Vol:
    E96-D No:8
      Page(s):
    1624-1631

    This paper proposes a procedure for avoiding delay faults in the field with slack assessment during standby time. The proposed procedure performs path delay testing and checks if the slack is larger than a threshold value using selectable delay embedded in basic elements (BE). If the slack is smaller than the threshold, a pair of BEs to be replaced, which maximizes the path slack, is identified. Experimental results with two application circuits mapped on a coarse-grained architecture show that for aging-induced delay degradation a small threshold slack, which is less than 1 ps in a test case, is enough to ensure the delay fault prediction.

  • A Bit-Serial Reconfigurable VLSI Based on a Multiple-Valued X-Net Data Transfer Scheme

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Computer System

      Vol:
    E96-D No:7
      Page(s):
    1449-1456

    A multiple-valued data transfer scheme using X-net is proposed to realize a compact bit-serial reconfigurable VLSI (BS-RVLSI). In the multiple-valued data transfer scheme using X-net, two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. One cell composed of a logic block and a switch block is connected to four adjacent cross points by four one-bit switches so that the complexity of the switch block is reduced to 50% in comparison with the cell of a BS-RVLSI using an eight nearest-neighbor mesh network (8-NNM). In the logic block, threshold logic circuits are used to perform threshold operations, and then their binary dual-rail voltage outputs enter a binary logic module which can be programmed to realize an arbitrary two-variable binary function or a bit-serial adder. As a result, the configuration memory count and transistor count of the proposed multiple-valued cell are reduced to 34% and 58%, respectively, in comparison with those of an equivalent CMOS cell. Moreover, its power consumption for an arbitrary 2-variable binary function becomes 67% at 800 MHz under the condition of the same delay time.

  • Fine-Grained Run-Time Power Gating through Co-optimization of Circuit, Architecture, and System Software Design Open Access

    Hiroshi NAKAMURA  Weihan WANG  Yuya OHTA  Kimiyoshi USAMI  Hideharu AMANO  Masaaki KONDO  Mitaro NAMIKI  

     
    INVITED PAPER

      Vol:
    E96-C No:4
      Page(s):
    404-412

    Power consumption has recently emerged as a first class design constraint in system LSI designs. In particular, leakage power has come to occupy a large part of the total power consumption. Therefore, reduction of leakage power is indispensable for efficient design of high-performance system LSIs. Since 2006, we have carried out a research project called “Innovative Power Control for Ultra Low-Power and High-Performance System LSIs”, supported by Japan Science and Technology Agency as a CREST research program. One of the major objectives of this project is reducing the leakage power consumption of system LSIs by innovative power control through tight cooperation and co-optimization of circuit technology, architecture, and system software designs. In this project, we focused on power gating as a circuit technique for reducing leakage power. Temporal granularity is one of the most important issues in power gating. Thus, we have developed a series of Geysers as proof-of-concept CPUs which provide several mechanisms of fine-grained run-time power gating. In this paper, we describe their concept and design, and explain why co-optimization of different design layers is important. Then, three kinds of power gating implementations and their evaluation are presented from the viewpoint of power saving and temporal granularity.

  • Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip

    Hung K. NGUYEN  Peng CAO  Xue-Xiang WANG  Jun YANG  Longxing SHI  Min ZHU  Leibo LIU  Shaojun WEI  

     
    PAPER-Computer System

      Vol:
    E96-D No:3
      Page(s):
    601-615

    REMUS-II (REconfigurable MUltimedia System 2) is a coarse-grained dynamically reconfigurable computing system for multimedia and communication baseband processing. This paper proposes a real-time H.264 baseline profile encoder on REMUS-II. First, we propose an overall mapping flow for mapping algorithms onto the platform of REMUS-II system and then illustrate it by implementing the H.264 encoder. Second, parallel and pipelining techniques are considered for fully exploiting the abundant computing resources of REMUS-II, thus increasing total computing throughput and addressing the high computational complexity of the H.264 encoder. In addition, some data-reuse schemes are used to increase the data-reuse ratio and therefore reduce the required data bandwidth. Third, we propose a scheduling scheme to manage run-time reconfiguration of the system. The scheduling is also responsible for synchronizing the data communication between tasks and handling conflicts between hardware resources. Experimental results show that the REMUS-MB (REMUS-II version for mobile applications) system can perform real-time H.264/AVC baseline profile encoding. The encoder can encode CIF@30 fps video sequences with two reference frames and maximum search range of [-16,15]. The implementation, thereby, can be applied to handheld devices targeted at mobile multimedia applications. The platform of REMUS-MB system is designed and synthesized by using TSMC 65 nm low power technology. The die size of REMUS-MB is 13.97 mm². REMUS-MB consumes, on average, about 100 mW while working at 166 MHz. To our knowledge, this is the first implementation in the literature of an H.264 encoding algorithm on a coarse-grained dynamically reconfigurable computing system.

  • A Data Prefetch and Reuse Strategy for Coarse-Grained Reconfigurable Architectures

    Wei GE  Zhi QI  Yue DU  Lu MA  Longxing SHI  

     
    PAPER-Computer System

      Vol:
    E96-D No:3
      Page(s):
    616-623

    Coarse-Grained Reconfigurable Architectures (CGRAs) have been proposed as new choices for enhancing parallel processing capability. Data transfer throughput between the Reconfigurable Cell Array (RCA) and on-chip local memory is usually the main performance bottleneck of CGRAs. In order to relieve this bottleneck, we propose a novel data transfer strategy called Heuristic Data Prefetch and Reuse (HDPR), for the first time in the case of explicit CGRAs. The HDPR strategy provides not only the flexible data access schedule but also the high data throughput needed to realize fast pipelined implementations of various loop kernels. To improve the data utilization efficiency, a dual-bank cache-like data reuse structure is proposed. Furthermore, a heuristic data prefetch is also introduced to decrease the data access latency. Experimental results demonstrate that, compared with conventional explicit data transfer strategies, our work achieves an average speedup of 1.73 times at the expense of only a 5.86% increase in area.

  • High-Tc Superconducting Electronic Devices Based on YBCO Step-Edge Grain Boundary Junctions Open Access

    Shane T. KEENAN  Jia DU  Emma E. MITCHELL  Simon K. H. LAM  John C. MACFARLANE  Chris J. LEWIS  Keith E. LESLIE  Cathy P. FOLEY  

     
    INVITED PAPER

      Vol:
    E96-C No:3
      Page(s):
    298-306

    We outline a number of high temperature superconducting Josephson junction-based devices including superconducting quantum interference devices (SQUIDs) developed for a wide range of applications including geophysical exploration, magnetic anomaly detection, terahertz (THz) imaging and microwave communications. All these devices are based on our patented technology for fabricating YBCO step-edge junctions on MgO substrates. Key features for the successful application of devices based on this technology are good stability, long-term reliability, low noise and the inherent flexibility of locating junctions anywhere on a substrate.

  • Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

    Bo LIU  Peng CAO  Min ZHU  Jun YANG  Leibo LIU  Shaojun WEI  Longxing SHI  

     
    PAPER-Computer System

      Vol:
    E95-D No:7
      Page(s):
    1858-1871

    This paper presents a novel architecture design to optimize the reconfiguration process of a coarse-grained reconfigurable architecture (CGRA) called Reconfigurable Multimedia System II (REMUS-II). In REMUS-II, the tasks in multimedia applications are divided into two parts: computing-intensive tasks and control-intensive tasks. REMUS-II contains two Reconfigurable Processor Units (RPUs) for accelerating computing-intensive tasks and a Micro-Processor Unit (µPU) for accelerating control-intensive tasks. As a large-scale CGRA, REMUS-II can provide satisfying solutions in terms of both efficiency and flexibility. This feature makes REMUS-II well-suited for video processing, where higher flexibility requirements are posed and a lot of computation tasks are involved. To meet the high requirement of the dynamic reconfiguration performance for multimedia applications, the reconfiguration architecture of REMUS-II should be well designed. To optimize the reconfiguration architecture of REMUS-II, a hierarchical configuration storage structure and a 3-stage reconfiguration processing structure are proposed. Furthermore, several optimization methods for configuration reuse are also introduced to further improve the performance of the reconfiguration process. The optimization methods include two aspects: the multi-target reconfiguration method and the configuration caching strategies. Experimental results showed that, with the proposed reconfiguration architecture, the performance of the reconfiguration process is improved by a factor of 4. Based on RTL simulation, REMUS-II can support the 1080p@32 fps of H.264 HiP@Level4 and 1080p@40 fps High-level MPEG-2 stream decoding at the clock frequency of 200 MHz. The proposed REMUS-II system has been implemented on a TSMC 65 nm process. The die size is 23.7 mm² and the estimated on-chip dynamic power is 620 mW.

  • Growth Mechanism of Pentacene on HfON Gate Insulator and Its Effect on Electrical Properties of Organic Field-Effect Transistors

    Min LIAO  Hiroshi ISHIWARA  Shun-ichiro OHMI  

     
    PAPER

      Vol:
    E95-C No:5
      Page(s):
    885-890

    Pentacene-based organic field-effect transistors (OFETs) with SiO2 and HfON gate insulators have been fabricated, and the effect of gate insulator on the electrical properties of pentacene-based OFETs and the microstructures of pentacene films were investigated. It was found that the grain size for pentacene film deposited on HfON gate insulator is larger than that for pentacene film deposited on SiO2 gate insulator. Due to the larger grain size, pentacene-based OFET with HfON gate insulator shows better electrical properties compared to pentacene-based OFET with SiO2 gate insulator. Meanwhile, low-temperature (such as 140°C) fabricated pentacene-based OFET with HfON gate insulator was also investigated. The OFET fabricated at 140°C shows a small subthreshold swing of 0.14 V/decade, a large on/off current ratio of 4×10⁴, a threshold voltage of -0.65 V, and a hole mobility of 0.33 cm²/Vs at an operating voltage of -2 V.

  • Low Power Nonvolatile Counter Unit with Fine-Grained Power Gating

    Shuta TOGASHI  Takashi OHSAWA  Tetsuo ENDOH  

     
    PAPER

      Vol:
    E95-C No:5
      Page(s):
    854-859

    In this paper, we propose a new low power nonvolatile counter unit based on Magnetic Tunnel Junction (MTJ) with fine-grained power gating. The proposed counter unit consists of only a single latch with two MTJs. We verify the basic operation and estimate the power consumption of the proposed counter unit. The operating power consumption of the proposed nonvolatile counter unit is smaller than the conventional one below 140 kHz. The power of the proposed unit is 74.6% smaller than the conventional one at low frequency.
