The search functionality is under construction.

Author Search Result

[Author] Hiroaki KUNIEDA(45hit)

1-20hit(45hit)

  • Automatic Synthesis of a Serial Input Multiprocessor Array

    Dongji LI  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E79-A No:12
      Page(s):
    2097-2105

    Memory Sharing Processor Array (MSPA) architecture has been developed as an effective array processing architecture for both reduced data storages and increased processor cell utilization efficiency [1]. In this paper, the MSPA design methodology is extended to the VLSI synthesis of a serial input processor array (Pa). Then, a new bit-serial input multiplier and a new data serial input matrix multiplier are derived from the new PA. These multipliers are superior to the conventional multipliers by their smaller number of logic-gate count.

  • A New Approach for Datapath Synthesis of Application Specific Instruction Processor

    Kyung-Sik JANG  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E80-A No:8
      Page(s):
    1478-1488

    In this paper, a systematic method which synthesizes the datapath of Application Specific Instruction Processor (ASIP) is proposed. The behavioral description of application is written in instruction code defined on abstract machine. We introduce register transfer graph (RTG) to represent instructions and synthesis constraint tree to select the combinations of synthesis constraints to explore design space along area and performance axis. The high performance is achieved by scheduling micro-operations of instruction in out-of-order. The practical datapath is synthesized by considering connection geometry as well as the maximum utilization of hardware resources. To reduce connection cost, data transfer paths are minimized by replacing an inefficient data transfer path with its bypass route. The feasibility of the proposed synthesis method is verified with several experimental instruction sequences.

  • A High Level Design of Reconfigurable and High-Performance ASIP Engine for Image Signal Processing

    Hsuan-Chun LIAO  Mochamad ASRI  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E95-A No:12
      Page(s):
    2373-2383

    Emerging image and video applications and conventional MPSoC architectures encounter drastically increasing performance and flexibility requirements. In order to display high quality images, large amount of image processing needs to be carried out. These image processing algorithms are nonstandard and vary case by case, and it is difficult to achieve real time processing by using general purpose processors or DSP. In this paper, we present two reconfigurable Application Specific Instruction-set Processors (ASIP) which can perform several image processing algorithms by using the same processor architecture. These ASIPs can achieve performance similar to DSP; meanwhile, while the area is considerably smaller than DSP and slightly bigger than conventional RISC processor. 1D ASIP can perform 16 times higher compared to a RISC processor, and 2D ASIP can perform 3 to 7 times higher compared to a RISC processor.

  • A Multiprocessor SoC Architecture with Efficient Communication Infrastructure and Advanced Compiler Support for Easy Application Development

    Mohammad ZALFANY URFIANTO  Tsuyoshi ISSHIKI  Arif ULLAH KHAN  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:4
      Page(s):
    1185-1196

    This paper presents a Multiprocessor System-on-Chips (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, an efficient inter-communication between processing elements with minimum overhead is implemented. A host-interface is designed to integrate the existing RISC core to the multiprocessor-array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as a processing element for MPSoC architectures designed using our framework.

  • A Cost-Effective Network for Very Large ATM Cross-Connects--The Delta Network with Expanded Middle Stages--

    Takashi SHIMIZU  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E77-B No:11
      Page(s):
    1429-1436

    This paper presents a cost-effective network for very large ATM cross-connects. In order to develop it, we propose the delta network with expanded middle stages. This proposed network is the intermediate network between a nonblocking network and the delta network with respect to the cost of hardware and internal blocking probability. Using this network, we explore the tradeoff between the cost and internal blocking probability, and derive the optimum configuration under temporarily deviating traffic. Internal blocking occurs when input traffic temporarily deviates from its average value. However, we cannot evaluate the internal blocking probability by using conventional traffic models. In this paper, we adopt temporarily deviating traffic such that all traffic is described as the superposition of the paths which are defined by traffic parameters. As can easily be seen, the path corresponds to virtual path (VP) or virtual channel (VC). Therefore, we believe that our model describes actual traffic more exactly than conventional models do. We show that the optimum configuration is the proposed network whose expansion ratio γ=3 when the maximum number of paths that can be accommodated in one link is greater than 22. This network achieves the internal blocking probability of 10-10. As an example of this network, we show that the proposed network of size 7272 is constructed with only 40% of the hardware required by the nonblocking network.

  • Hybrid Minutiae Descriptor for Narrow Fingerprint Verification

    Zhiqiang HU  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-Pattern Recognition

      Pubricized:
    2016/12/12
      Vol:
    E100-D No:3
      Page(s):
    546-555

    Narrow swipe sensor based systems have drawn more and more attention in recent years. However, the size of captured image is significantly smaller than that obtained from the traditional area fingerprint sensor. Under this condition the available minutiae number is also limited. Therefore, only employing minutiae with the standard associated feature can hardly achieve high verification accuracy. To solve this problem, we present a novel Hybrid Minutiae Descriptor (HMD) which consists of two modules. The first one: Minutiae Ridge-Valley Orientation Descriptor captures the orientation information around minutia and also the trace points located at associated ridge and valley. The second one: Gabor Binary Code extracts and codes the image patch around minutiae. The proposed HMD enhances the representation capability of minutiae feature, and can be matched very efficiently. Experiments conducted over public databases and the database captured by the narrow swipe sensor show that this innovative method gives rise to significant improvements in reducing FRR (False Reject Rate) and EER (Equal Error Rate).

  • A Design of High Performance Parallel Architecture and Communication for Multi-ASIP Based Image Processing Engine

    Hsuan-Chun LIAO  Mochamad ASRI  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1222-1235

    Image processing engine is crucial for generating high quality images in video system. As Application Specific Integrated Circuit (ASIC) is dedicated for specific standards, Application Specific Instruction-set Processor (ASIP) which provides high flexibility and high performance seems to have more advantages in supporting the nonstandard pre/post image processing in video system. In our previous work, we have designed some ASIPs that can perform several image processing algorithms with a reconfigurable datapath. ASIP is as efficient as DSP, but its area is considerably smaller than DSP. As the resolution of image and the complexity of processing increase, the performance requirement also increases accordingly. In this paper, we presents a novel multi ASIP based image processing unit (IPU) which can provide sufficient performance for the emerging very-high-resolution applications. In order to provide a high performance image processing engine, we propose several new techniques and architecture such as multi block-pipes architecture, pixel direct transmission and boundary pixel write-through. Multi block-pipes architecture has flexible scalability in supporting a various ranges of resolution, which ranges from low resolution to high resolution. The boundary pixel write-through technique provides high efficient parallel processing, and pixel direct transmission technique is implemented in each ASIP to further reduce the data transmission time. Cycle-accurate SystemC simulations are performed, and the experimental results show that the maximum bandwidth of the proposed communication approach can achieve up to 1580 Mbyte/s at 400 MHz. Moreover, communication overhead can be reduced about a maximum of 88% compared to our previous works.

  • System-MSPA Design of H.263+ Video Encoder/Decoder LSI for Videotelephony Applications

    Chawalit HONSAWEK  Kazuhito ITO  Tomohiko OHTSUKA  Trio ADIONO  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design

      Vol:
    E84-A No:11
      Page(s):
    2614-2622

    In this paper, a LSI design for video encoder and decoder for H.263+ video compression is presented. LSI operates under clock frequency of 27 MHz to compress QCIF (176144 pixels) at the frame rate of 30 frame per second. The core size is 4.6 4.6 mm2 in a 0.35 µm process. The architecture is based on bus connected heterogeneous dedicated modules, named as System-MSPA architecture. It employs the fast and small-chip-area dedicated modules in lower level and controls them by employing the slow and flexible programmable device and an external DRAM. Design results in success to achieve real time encoder in quite compact size without losing flexibility and expand ability. Real time emulation and easy test capability with external PC is also implemented.

  • Practical Orientation Field Estimation for Embedded Fingerprint Recognition Systems

    Yukun LIU  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-Pattern Recognition

      Vol:
    E94-D No:9
      Page(s):
    1792-1799

    As a global feature of fingerprint patterns, the Orientation Field (OF) plays an important role in fingerprint recognition systems. This paper proposes a fast binary pattern based orientation estimation with nearest-neighbor search, which can reduce the computational complexity greatly. We also propose a classified post processing with adaptive averaging strategy to increase the accuracy of the estimated OF. Experimental results confirm that the proposed method can satisfy the strict requirements of the embedded applications over the conventional approaches.

  • Low Cost SoC Design of H.264/AVC Decoder for Handheld Video Player

    Sumek WISAYATAKSIN  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:4
      Page(s):
    1197-1205

    We propose a low cost and stand-alone platform-based SoC for H.264/AVC decoder, whose target is practical mobile applications such as a handheld video player. Both low cost and stand-alone solutions are particularly emphasized. The SoC, consisting of RISC core and decoder core, has advantages in terms of flexibility, testability and various I/O interfaces. For decoder core design, the proposed H.264/AVC coprocessor in the SoC employs a new block pipelining scheme instead of a conventional macroblock or a hybrid one, which greatly contribute to reducing drastically the size of the core and its pipelining buffer. In addition, the decoder schedule is optimized to block level which is easy to be programmed. Actually, the core size is reduced to 138 KGate with 3.5 kbyte memory. In our practical development, a single external SDRAM is sufficient for both reference frame buffer and display buffer. Various peripheral interfaces such as a compact flash, a digital broadcast receiver and a LCD driver are also provided on a chip.

  • Entropy Decoding Processor for Modern Multimedia Applications

    Sumek WISAYATAKSIN  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E92-A No:12
      Page(s):
    3248-3257

    An entropy decoding engine plays an important role in modern multimedia decoders. Previous researches that focused on the decoding performance paid a considerable attention to only one parameter such as the data parsing speed, but they did not consider the performance caused by a table configuration time and memory size. In this paper, we developed a novel method of entropy decoding based on the two step group matching scheme. Our approach achieves the high performance on both data parsing speed and configuration time with small memory needed. We also deployed our decoding scheme to implement an entropy decoding processor, which performs operations based on normal processor instructions and VLD instructions for decoding variable length codes. Several extended VLD instructions are prepared to increase the bitstream parsing process in modern multimedia applications. This processor provides a solution with software flexibility and hardware high speed for stand-alone entropy decoding engines. The VLSI hardware is designed by the Language for Instruction Set Architecture (LISA) with 23 Kgates and 110 MHz maximum clock frequency under TSMC 0.18 µm technology. The experimental simulations revealed that proposed processor achieves the higher performance and suitable for many practical applications such as MPEG-2, MPEG-4, H.264/AVC and AAC.

  • FOREWORD

    Hiroaki KUNIEDA  

     
    FOREWORD

      Vol:
    E80-A No:10
      Page(s):
    1741-1741
  • Two Dimensional Space Partition Recursive Filtering Algorithm on Rectangular Processor Array

    Yoshinori TAKEUCHI  Hiroaki KUNIEDA  

     
    PAPER-Digital Signal Processing

      Vol:
    E74-A No:1
      Page(s):
    42-48

    This paper studies the method of parallel processing for two dimensional recursive filters on a multiprocessor system. Conventional recursive filterings are sequential and iterative local, i.e. global processing. We decompose their global processings into space partition processings with a few global communications. We derive an efficient parallel algorithm for two dimensional recursive filterings using Roesser's model and investigate their speed-up, rate, efficiency and degradation.

  • Memory Sharing Processor Array (MSPA) Architecture

    Dongju LI  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E79-A No:12
      Page(s):
    2086-2096

    In this paper, a design of a new processor array architecture with effective data storage schemes which meets the practical requirement of a reduced number of processor elements is proposed. Its design method is shown to be drastically simpler than the popular systolic arrays. This processor array which we call Memory Sharing Processor Array (MSPA) consists of a processor array, several memory units, and some address generation hardware units used to minimize the number of I/O ports. MSPA architecture with its design methodology tries to overcome overlapping data storages, idle processing time and I/O bottleneck problems, which mostly degrade the performance of systolic architecture. It has practical advantages over the systolic array in the view of area-efficiency, high throughput and practical input schemes.

  • Design Optimization of VLSI Array Processor Architecture for Window Image Processing

    Dongju LI  Li JIANG  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E82-A No:8
      Page(s):
    1475-1484

    In this paper, we present a novel architecture named as Window-MSPA architecture which targets to window operations in image processing. We have previously developed a Memory Sharing Processor Array (MSPA) for fast array processing with regular iterative algorithms. Window-MSPA tries to optimize the data I/O ports and the number of processing elements so as to reduce hardware cost. The input scheme of image data is restricted to row by row input which simplifies the I/O architecture. Under this practical I/O restriction, the fastest processings are achieved. In this paper, we present the general Window-MSPA design methodology for wide variety of applications. As an practical application, we have already reported the design of MP@HL MPEG2 Motion Estimator LSI. Design formulas for Window-MSPA architecture are given for various size of window operations in image processing. Thus, the derived architecture is flexible enough to satisfy user's requirement for either area or speed.

  • Bits Truncation Adapteve Pyramid Algorithm for Motion Estimation of MPEG2

    Li JIANG  Kazuhito ITO  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E80-A No:8
      Page(s):
    1438-1445

    In this paper, a new bits truncation adaptive pyramid (BTAP) algorithm for motion estimation is presented. The method employs bits truncation of the gray level from 8bits to much less bits in the searching algorithm. Compared with conventional fast block matching algorithms, this method drastically improves speed for motion estimation of reduced gray-level images and preserves reasonable performance and algorithm reliability. Bits truncation concept is well combined with hierarchical pyramid algorithm in order to truncate adaptively according to image characteristics. The computation complexity is much less than that of pyramid algorithm and 3-Step motion estimation algorithm because of bit-truncated searbh and low overhead adaptation. Nevertheless, the PSNR property is also comparable with these two algorithms for various video sequences.

  • A New FPGA Architecture for High Performance Bit-Serial Pipeline Datapath

    Akihisa OHTA  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E83-A No:8
      Page(s):
    1663-1672

    In this paper, we present our work on the design of a new FPGA architecture targeted for high-performance bit-serial pipeline datapath. Bit-parallel systems require large amount of routing resource which is especially critical in using FPGAs. Their device utilization and operation frequency become low because of large routing penalty. Whereas bit-serial circuits are very efficient in routing, therefore are able to achieve a very high logic utilization. Our proposed FPGA architecture is designed taking into account the structure of bit-serial circuits to optimize the logic and routing architecture. Our FPGA guarantees near 100% logic utilization with a straightforward place and route tool due to high routability of bit-serial circuits and simple routing interconnect architecture. The FPGA chip core which we designed consists of around 200k transistors on 3.5 mm square substrate using 0.5 µm 2-metal CMOS process technology.

  • Narrow Fingerprint Template Synthesis by Clustering Minutiae Descriptors

    Zhiqiang HU  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-Pattern Recognition

      Pubricized:
    2017/03/08
      Vol:
    E100-D No:6
      Page(s):
    1290-1302

    Narrow swipe sensor has been widely used in embedded systems such as smart-phone. However, the size of captured image is much smaller than that obtained by the traditional area sensor. Therefore, the limited template coverage is the performance bottleneck of such kind of systems. Aiming to increase the geometry coverage of templates, a novel fingerprint template feature synthesis scheme is proposed in the present study. This method could synthesis multiple input fingerprints into a wider template by clustering the minutiae descriptors. The proposed method consists of two modules. Firstly, a user behavior-based Registration Pattern Inspection (RPI) algorithm is proposed to select the qualified candidates. Secondly, an iterative clustering algorithm Modified Fuzzy C-Means (MFCM) is proposed to process the large amount of minutiae descriptors and then generate the final template. Experiments conducted over swipe fingerprint database validate that this innovative method gives rise to significant improvements in reducing FRR (False Reject Rate) and EER (Equal Error Rate).

  • Register-Based Process Virtual Machine Acceleration Using Hardware Extension with Hybrid Execution

    Surachai THONGKAEW  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2505-2518

    The Process Virtual Machine (VM) is typical software that runs applications inside operating systems. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware, operating system and allows bytecodes (portable code) to be executed in the same way on any other platforms. The Process VMs are implemented using an interpreter to interpret bytecode instead of direct execution of host machine codes. Thus, the bytecode execution is slower than those of the compiled programming language execution. Several techniques including our previous paper, the “Fetch/Decode Hardware Extension”, have been proposed to speed up the interpretation of Process VMs. In this paper, we propose an additional methodology, the “Hardware Extension with Hybrid Execution” to further enhance the performance of Process VMs interpretation and focus on Register-based model. This new technique provides an additional decoder which can classify bytecodes into either simple or complex instructions. With “Hybrid Execution”, the simple instruction will be directly executed on hardware of native processor. The complex instruction will be emulated by the “extra optimized bytecode software handler” of native processor. In order to eliminate the overheads of retrieving and storing operand on memory, we utilize the physical registers instead of (low address) virtual registers. Moreover, the combination of 3 techniques: Delay scheduling, Mode predictor HW and Branch/goto controller can eliminate all of the switching mode overheads between native mode and bytecode mode. The experimental results show the improvements of execution speed on the Arithmetic instructions, loop & conditional instructions and method invocation & return instructions can be achieved up to 16.9x, 16.1x and 3.1x respectively. The approximate size of the proposed hardware extension is 0.04mm2 (or equivalent to 14.81k gates) and consumes an additional power of only 0.24mW. The stated results are obtained from logic synthesis using the TSMC 90nm technology @ 200MHz.

  • Fast Fingerprint Classification Based on Direction Pattern

    Jinqing QI  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-Image/Visual Signal Processing

      Vol:
    E87-A No:8
      Page(s):
    1887-1892

    A new and fast fingerprint classification method based on direction patterns is presented in this paper. This method is developed to be applicable to today's embedded fingerprint authentication system, in which small area sensors are widely used. Direction patterns are well treated in the direction map at block level, where each block consists of 88 pixels. It is demonstrated that the search of directions pattern in specific area, generally called as pattern area, is able to classify fingerprints clearly and quickly. With our algorithm, the classification accuracy of 89% is achieved over 4000 images in the NIST-4 database, slightly lower than the conventional approaches. However, the classification speed is improved tremendously up to about 10 times as fast as conventional singular point approaches.

1-20hit(45hit)