Author Search Result

[Author] Yoshio MATSUDA(22hit)

1-20hit(22hit)

  • A Divided/Pausing Bitline Sensing Scheme (DIPS) for ULSI DRAM Core

    Hideto HIDAKA  Yoshio MATSUDA  Kazuyasu FUJISHIMA  

     
    LETTER-Integrated Circuits

    Vol: E73-E No:11  Page(s): 1852-1854

    A new DRAM bitline architecture, called the Divided/Pausing Bitline Sensing Scheme (DIPS), is proposed for DRAM cores at the 64-Mbit level and beyond. This architecture eliminates the inter-bitline coupling noise and realizes a high-speed sensing operation.

  • A Line-Mode Test with Data Register for ULSI Memory Architecture

    Tsukasa OOISHI  Masaki TSUKUDE  Kazutami ARIMOTO  Yoshio MATSUDA  Kazuyasu FUJISHIMA  

     
    PAPER-DRAM

    Vol: E76-C No:11  Page(s): 1595-1603

    We propose an advanced hyper-parallel testing method, which we call the Advanced Line-mode Test (ALT), that improves the line-mode test method by adding data inversion registers. This testing method has the same testing capability as the conventional bit-by-bit and multi-bit test (MBT) methods, because it enables the application of highly sensitive and practical test patterns under the hyper-parallel condition. The testing time for fixed data patterns (all-0/1, checkerboard, etc.) is reduced to 1/1900. Moreover, the ALT is applicable to continuous patterns (march, walking, etc.). The ALT, improved from the line-mode test with registers and comparators (LTR), is applicable to most test patterns, reduces the testing time remarkably, and is suitable for ULSI memories.

  • A VGA 30-fps Realtime Optical-Flow Processor Core for Moving Picture Recognition

    Yuichiro MURACHI  Yuki FUKUYAMA  Ryo YAMAMOTO  Junichi MIYAKOSHI  Hiroshi KAWAGUCHI  Hajime ISHIHARA  Masayuki MIYAMA  Yoshio MATSUDA  Masahiko YOSHIMOTO  

     
    PAPER

    Vol: E91-C No:4  Page(s): 457-464

    This paper describes an optical-flow processor core for real-time video recognition. The processor is based on the Pyramidal Lucas and Kanade (PLK) algorithm. It features a smaller chip area, higher pixel rate, and higher accuracy than conventional optical-flow processors. Introducing search range limitation and the Kalman filter to the original PLK algorithm improves the optical-flow accuracy and reduces the processor hardware cost. Furthermore, window interleaving and window overlap methods reduce the necessary clock frequency of the processor by 70%, allowing low-power operation. We first verified the PLK algorithm and architecture with a prototype FPGA implementation. Then, we designed a VLSI processor that can handle a VGA 30-fps image sequence at a clock frequency of 332 MHz. The core size and power consumption are estimated at 3.50 × 3.00 mm² and 600 mW, respectively, in a 90-nm process technology.

  • A VGA 30 fps Affine Motion Model Estimation VLSI for Real-Time Video Segmentation

    Yoshiki YUNBE  Masayuki MIYAMA  Yoshio MATSUDA  

     
    PAPER-Computer System

    Vol: E93-D No:12  Page(s): 3284-3293

    This paper describes an affine motion estimation processor for real-time video segmentation. The processor estimates the dominant motion of a target region with affine parameters. The processor is based on the Pseudo-M-estimator algorithm. Introducing an image division method and a binary weight method to the original algorithm reduces data traffic and hardware costs. A pixel sampling method is proposed that reduces the clock frequency by 50%. The pixel pipeline architecture and a frame overlap method double the throughput. The processor was prototyped on an FPGA; its function and performance were subsequently verified. It was also implemented as an ASIC. The core size is 5.0 × 5.0 mm² in a 0.18 µm process with standard-cell technology. The ASIC can accommodate VGA 30 fps video at a 120 MHz clock frequency.

  • A Field Programmable Sequencer and Memory with Middle Grained Programmability Optimized for MCU Peripherals

    Yoshifumi KAWAMURA  Naoya OKADA  Yoshio MATSUDA  Tetsuya MATSUMURA  Hiroshi MAKINO  Kazutami ARIMOTO  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E99-A No:5  Page(s): 917-928

    A Field Programmable Sequencer and Memory (FPSM), which is a programmable unit exclusively optimized for peripherals on a microcontroller unit, is proposed. The FPSM functions not only as the peripherals but also as the standard built-in memory. The FPSM provides easier programmability with a smaller area overhead, especially when compared with the FPGA. The FPSM is implemented on the FPGA, and the programmability and performance for basic peripherals such as an 8-bit counter and 8-bit-accuracy Pulse Width Modulation are emulated on the FPGA. Furthermore, the FPSM core with a 4-Kbit SRAM is fabricated in 0.18 µm 5-metal CMOS process technology. The FPSM occupies half the area of the FPGA, and its power consumption is less than one-fifth.

  • Novel VLIW Code Compaction Method for a 3D Geometry Processor

    Hiroaki SUZUKI  Hiroyuki KAWAI  Hiroshi MAKINO  Yoshio MATSUDA  

     
    PAPER-Digital Signal Processing

    Vol: E84-A No:11  Page(s): 2885-2893

    A VLIW (Very Long Instruction Word) architecture with a new code compaction method has been proposed. For a 3D-geometry processor, we consider two types of 2-issue VLIW architectures, the floating-point execution accelerating VLIW (FP-VLIW) and the data-move enhancing VLIW (MV-VLIW) architectures, as expansions of a Single-Streaming Single Instruction, Multiple Data (SS-SIMD) architecture. To solve the code bloat problem that is common to VLIW architectures, the proposed method compacts the original code into VLIW code with software tools and decompacts the VLIW code with a simple on-chip hardware decompactor composed of an instruction swap circuit. Speeds and code densities of the two VLIWs with the code compaction are compared to those of the SS-SIMD with the same instruction set and the same building blocks. The FP-VLIW shows the fastest speed performance in the evaluation results of the viewperf CDRS-03 benchmark programs. It is 36% faster than the SS-SIMD used as reference. The proposed compaction method maintains 95% of the code density of the SS-SIMD. One test program shows that the code density of the MV-VLIW is higher than that of the SS-SIMD. This result demonstrates that the merit of compacting nops can be greater than the VLIW penalty. The FP-VLIW architecture with the code compaction achieves 1.36 times the speed performance without significant code-density deterioration.

  • The Maximum Operating Region in SiGe HBTs for RF Power Amplifiers

    Akira INOUE  Shigenori NAKATSUKA  Takahide ISHIKAWA  Yoshio MATSUDA  

     
    PAPER-Active Devices and Circuits

    Vol: E87-C No:5  Page(s): 714-719

    The maximum operating region of a SiGe HBT has been experimentally investigated by a direct microwave waveform measurement. Dynamic RF load lines are used as a probe to detect the limit of the RF operation. For the first time, it is found that SiGe HBTs operate beyond the conventional BVceo, while GaAs HBTs cannot survive at that voltage. The conventional BVceo limits the average Vc of the maximum load lines, but has no influence on the peak voltage. Another BVceo measured with a voltage generator is proposed to represent the irreversible avalanche breakdown instead of the conventional one. A pulsed breakdown measurement is also performed to reveal the time constant of the phenomena.

  • A 300 MHz Dual Port Palette RAM Using Port Swap Architecture

    Yasunobu NAKASE  Koichiro MASHIKO  Yoshio MATSUDA  Takeshi TOKUDA  

     
    PAPER-Electronic Circuits

    Vol: E81-C No:9  Page(s): 1484-1490

    This paper proposes a dual port color palette SRAM using a single bit line cell. Since the single bit line cell consists of fewer bit lines and transistors than standard dual port cells, it is able to reduce the area. However, the cell has had a problem in writing a high level. The port swap architecture solves the problem without any special mechanism such as a bootstrap. In the architecture, the two bit lines are assigned to the read/write MPU port and the read-only pixel port, respectively. When writing a low level, the MPU port uses its pre-assigned bit line. On the other hand, when writing a high level, the MPU port uses the bit line assigned to the pixel port by a swap operation. During the swapping, the pixel port continues the read operation by using the bit line assigned to the MPU port. A color palette using this architecture is fabricated with a 0.5 µm CMOS process technology. The memory cell size is reduced by up to 43% compared with standard dual port cells. The color palette is able to supply the pixel data at 300 MHz at a supply voltage of 3.3 V. This speed is enough to support the highest-resolution practical monitors in the world.

  • A Complete Charge Recycling TCAM with Checkerboard Array Arrangement for Low Power Applications

    Katsumi DOSAKA  Daisuke OGAWA  Takahito KUSUMOTO  Masayuki MIYAMA  Yoshio MATSUDA  

     
    PAPER-Integrated Electronics

    Vol: E93-C No:5  Page(s): 685-695

    An architecture for a low-power Ternary Content Addressable Memory (TCAM) is proposed. The TCAM is a powerful engine for search and sort processing, but it has two serious problems: large power consumption and large power line noise. To solve these problems, we have developed a charge recycling scheme for match lines and search lines. A combination of the newly introduced PMOS CAM cell together with the conventional NMOS CAM cell realizes match line charge recycling. A checkerboard arrangement of the NMOS and the PMOS cell array enables search line charge recycling. By using these technologies, the power consumption of the TCAM can be reduced to 50% of conventional designs, and as a result, the power line noise is also reduced. An experimental chip has been fabricated in a 180-nm 6-metal process. The power consumption of this chip is 6.3 fJ/bit/search, which is half that of the conventional scheme.

  • A 1.5-V 250-MHz to 3.0-V 622-MHz Operation CMOS Phase-Locked Loop with Precharge Type Phase-Frequency Detector

    Harufusa KONDOH  Hiromi NOTANI  Tsutomu YOSHIMURA  Hiroshi SHIBATA  Yoshio MATSUDA  

     
    PAPER-Digital Circuits

    Vol: E78-C No:4  Page(s): 381-388

    A new approach that implements a simple, high-speed phase detector with precharge logic is presented. The minimum detectable phase difference is 40 psec, which is less than half that of conventional detectors. A current mode ring oscillator with a complementary-input bias generator has also been developed to enhance the dynamic range of the VCO under a low supply voltage. A fully CMOS PLL was designed using 0.5-µm technology. By virtue of this simple, fast detector, a wide operation range of 250 MHz at 1.5 V to 622 MHz at 3.0 V was achieved in simulation.

  • Design and Implementation of 176-MHz WXGA 30-fps Real-Time Optical Flow Processor

    Yu SUZUKI  Masato ITO  Satoshi KANDA  Kousuke IMAMURA  Yoshio MATSUDA  Tetsuya MATSUMURA  

     
    PAPER

    Vol: E100-A No:12  Page(s): 2888-2900

    This paper describes the design and implementation of a real-time optical flow processor using a single field-programmable gate array (FPGA) chip. By introducing the modified initial flow generation method, the successive over-relaxation (SOR) method for both layers, the optimization of the reciprocal operation method, and the image division method, it is now possible to both reduce hardware requirements and improve flow accuracy. Additionally, by introducing a pipeline structure to this processor, a high-throughput hardware implementation could be achieved. Total logic cell (LC) amounts and processor memory capacity are reduced by about 8% and 16%, respectively, compared to our previous hierarchical optical flow estimation (HOE) processor. The results of our evaluation confirm that this processor can perform real-time optical flow processing of 30 fps wide extended graphics array (WXGA) video at 175.7 MHz with a single FPGA.

  • An Efficient Self-Timed Queue Architecture for ATM Switch LSIs

    Harufusa KONDOH  Hideaki YAMANAKA  Masahiko ISHIWAKI  Yoshio MATSUDA  Masao NAKAYA  

     
    PAPER-Multimedia System LSIs

    Vol: E77-C No:12  Page(s): 1865-1872

    A new approach to implementing queues for controlling ATM switch LSIs is presented. In many conventional architectures, external FIFOs are provided for each output link and used to manage the address of the buffer in an ATM switch. We reduce the number of FIFOs by using a self-timed queue with a search circuit that finds the earliest entry for each output link. Using this architecture, the number of FIFOs is reduced to 1/N, where N is the switch size. Delay priority and multicasting can be supported without doubling the number of queues. This new queue can also be utilized as an ATM switch by itself. An evaluation chip was fabricated using 0.5-µm CMOS process technology. An inter-stage transfer speed of over 500 MHz and a cycle time of over 125 MHz were obtained. This performance is enough for a 622-Mbps 16 × 16 ATM switch.

  • High-Performance Super-Resolution via Patch-Based Deep Neural Network for Real-Time Implementation

    Reo AOKI  Kousuke IMAMURA  Akihiro HIRANO  Yoshio MATSUDA  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2018/08/20  Vol: E101-D No:11  Page(s): 2808-2817

    The super-resolution convolutional neural network (SRCNN) is widely known as a state-of-the-art method for achieving single-image super-resolution. However, performance problems such as jaggy and ringing artifacts exist in SRCNN. Moreover, in order to realize a real-time upconverting system for high-resolution video streams such as 4K/8K 60 fps, problems such as processing delay and implementation cost remain. In the present paper, we propose a high-performance super-resolution method via a patch-based deep neural network (SR-PDNN) rather than a convolutional neural network (CNN). Despite its very simple end-to-end learning system, the SR-PDNN achieves higher performance than the conventional CNN-based approach. In addition, this system is suitable for ultra-low-delay video processing by hardware implementation using an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).

  • A Shared Multibuffer Architecture for High-Speed ATM Switch LSIs

    Harufusa KONDOH  Hiromi NOTANI  Hideaki YAMANAKA  Keiichi HIGASHITANI  Hirotaka SAITO  Isamu HAYASHI  Yoshio MATSUDA  Kazuyoshi OSHIMA  Masao NAKAYA  

     
    PAPER-Improved Binary Digital Architectures

    Vol: E76-C No:7  Page(s): 1094-1101

    A new shared multibuffer architecture for high-speed ATM (Asynchronous Transfer Mode) switch LSIs is described. Multiple buffer memories are located between two crosspoint switches. By controlling the input-side crosspoint switch so as to equalize the utilization rate of each buffer memory, these multiple buffer memories can be treated as a single large shared buffer memory. High utilization efficiency of buffer memory can thus be achieved, and the cell loss ratio is minimized. By accessing the buffer memories in parallel via crosspoint switches, the time required to access the buffer memories is greatly reduced. This feature enables high-speed operation of the switch. The shared multibuffer architecture was implemented in a switch LSI using 0.8-µm BiCMOS process technology. Experimental results revealed that this chip can operate at more than 125 MHz. Eight bit-sliced switch LSIs operating at 78 MHz form a 622-Mb/s 8 × 8 ATM switching system with a buffer size of 1,024 ATM cells. Power consumption of the switch LSI was 3 W.

  • A Low Standby Current DSP Core Using Improved ABC-MT-CMOS with Charge Pump Circuit

    Hiromi NOTANI  Masayuki KOYAMA  Ryuji MANO  Hiroshi MAKINO  Yoshio MATSUDA  Osamu TOMISAWA  Shuhei IWADE  

     
    PAPER-Circuit Design

    Vol: E86-C No:4  Page(s): 597-603

    A 64-bit 100-MHz multimedia DSP core has been designed using 0.15-µm CMOS technology. An improved Auto-Backgate-Controlled MT-CMOS (ABC-MT-CMOS) circuit with a charge pump is adopted to suppress the standby leakage current. The dynamic active current of the whole chip was simulated to optimize the size of a switch for power supply control. The DSP core chip, which integrates 300-kgate logic, a 64-kbyte SRAM, and a charge pump circuit, has an 8-µA standby leakage current. The reduction rate is 1/250.

  • A 250 MHz Dual Port Cursor RAM Using Dynamic Data Alignment Architecture

    Yasunobu NAKASE  Hiroyuki KONO  Yoshio MATSUDA  Hisanori HAMANO  

     
    PAPER-Electronic Circuits

    Vol: E81-C No:11  Page(s): 1750-1756

    Cursor RAMs have been composed of two memory planes. A cursor pattern is stored in these planes with 2-bit data depth. While the pixel port requires data from both planes at the same time, the MPU port accesses only one of the planes at a time. Since the address space is defined differently between the ports, conventional cursor RAMs could not deal with these different access methods in real time. This paper proposes a dual port cursor RAM with a dynamic data alignment architecture. The architecture processes the different access methods in real time and eliminates a large amount of control circuitry. Conventional cursor RAMs have been organized with a single port memory because dual port memory cells have been large. We have applied the port swap architecture, which has reduced the cell size. The control block is further simplified because the controller no longer emulates a dual port memory. The cursor RAM with these architectures is fabricated with a double-metal 0.5 µm CMOS process technology. The active area is 1.5 × 1.6 mm² including a couple of shift registers and a control block. It operates at up to 263 MHz at a supply voltage of 3.3 V.

  • A Cost-Effective 1T-4MTJ Embedded MRAM Architecture with Voltage Offset Self-Reference Sensing Scheme for IoT Applications

    Masanori HAYASHIKOSHI  Hiroaki TANIZAKI  Yasumitsu MURAI  Takaharu TSUJI  Kiyoshi KAWABATA  Koji NII  Hideyuki NODA  Hiroyuki KONDO  Yoshio MATSUDA  Hideto HIDAKA  

     
    PAPER

    Vol: E102-C No:4  Page(s): 287-295

    A 1-Transistor 4-Magnetic Tunnel Junction (1T-4MTJ) memory cell has been proposed for field-type Magnetic Random Access Memory (MRAM). The proposed 1T-4MTJ memory cell array achieves 44% higher density than that of the conventional 1T-1MTJ thanks to the common access transistor structure in a 4-bit memory cell. A self-reference sensing scheme that can read out with write-back in four clock cycles has also been proposed. Furthermore, we add an estimation that considers sense amplifier variation and show that the 1T-4MTJ cell configuration is the best solution for IoT applications. A 1-Mbit MRAM test chip is designed and fabricated successfully using a 130-nm CMOS process. By applying the 1T-4MTJ high-density cell and partially embedding the wordline driver peripheral into the cell array, the 1-Mbit macro size is 4.04 mm², which is 35.7% smaller than the conventional one. Measured data show that the read access time is 55 ns at a typical supply voltage of 1.5 V and 25°C. Combining conventional high-speed 1T-1MTJ caches with the proposed high-density 1T-4MTJ user memories is an effective on-chip hierarchical non-volatile memory solution to be implemented in low-power MCUs and SoCs for IoT applications.

  • Physical Design Methodology for On-Chip 64-Mb DRAM MPEG-2 Encoding with a Multimedia Processor

    Hidehiro TAKATA  Rei AKIYAMA  Tadao YAMANAKA  Haruyuki OHKUMA  Yasue SUETSUGU  Toshihiro KANAOKA  Satoshi KUMAKI  Kazuya ISHIHARA  Atsuo HANAMI  Tetsuya MATSUMURA  Tetsuya WATANABE  Yoshihide AJIOKA  Yoshio MATSUDA  Syuhei IWADE  

     
    PAPER-Product Designs

    Vol: E85-C No:2  Page(s): 368-374

    An on-chip, 64-Mb, embedded, DRAM MPEG-2 encoder LSI with a multimedia processor has been developed. To implement this large-scale and high-speed LSI, we have developed the hierarchical skew control of multi-clocks, with timing verification, in which cross-talk noise is considered, and simple measures taken against the IR drop in the power lines through decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been successfully attained and verified, the cross-talk noise has been considered, and, in addition, it has become possible to restrain the IR drop to 166 mV in the 162 MHz operation block.

  • Shared Multibuffer ATM Switches with Hierarchical Queueing and Multicast Functions

    Hideaki YAMANAKA  Hirotaka SAITO  Hirotoshi YAMADA  Harufusa KONDOH  Hiromi NOTANI  Yoshio MATSUDA  Kazuyoshi OSHIMA  

     
    PAPER-Switching and Communication Processing

    Vol: E79-B No:8  Page(s): 1109-1120

    A new ATM switch architecture, named shared multibuffering, offers great advantages in memory access speed for a large switch and in the overall size of buffer memories needed to achieve excellent cell-loss performance. We have developed a 622-Mb/s 8 × 8 shared multibuffer ATM switch with multicast functions and hierarchical queueing functions to accommodate 156-Mb/s, 622-Mb/s and 2.4-Gb/s interfaces. Implementation of the shared multibuffer ATM switch is described with respect to the four sorts of 0.8-µm BiCMOS LSIs and ATM switch boards. The switch board/type-1, with C1-LSI, effectively accommodates 156-Mb/s and 622-Mb/s interfaces, which is suitable for an ATM access system. The switch board/type-2, with C2-LSI, can provide multicast functions and accommodate a 2.4-Gb/s interface. By using four switch boards, it is possible to apply them to a 2.4-Gb/s ATM loop system.

  • Mechanism of Bit Line Mode Soft Error for DRAM

    Mikio ASAKURA  Yoshio MATSUDA  Katsuhiro TSUKAMOTO  Kazuyasu FUJISHIMA  Tsutomu YOSHIHARA  

     
    LETTER-Semiconductor Devices

    Vol: E70-E No:11  Page(s): 1060-1061

    This letter reports a charge collection experiment on alpha-particle-induced carriers in the cell arrays of a 1 Mb DRAM. The results indicate that this experiment is effective for estimating the soft error rate of VLSI memories with various kinds of structures.
