The search functionality is under construction.

Keyword Search Result

[Keyword] ASIC(66hit)

1-20hit(66hit)

  • Template-Based Design Optimization for Selecting Pairing-Friendly Curve Parameters

    Momoko FUKUDA  Makoto IKEDA  

     
    PAPER-VLSI Design Technology and CAD

      Pubricized:
    2023/08/31
      Vol:
    E107-A No:3
      Page(s):
    549-556

    We have realized a design automation platform of hardware accelerator for pairing operation over multiple elliptic curve parameters. Pairing operation is one of the fundamental operations to realize functional encryption. However, known as a computational complexity-heavy algorithm. Also because there have been not yet identified standard parameters, we need to choose curve parameters based on the required security level and affordable hardware resources. To explore this design optimization for each curve parameter is essential. In this research, we have realized an automated design platform for pairing hardware for such purposes. Optimization results show almost equivalent to those prior-art designs by hand.

  • Flexible and Energy-Efficient Crypto-Processor for Arbitrary Input Length Processing in Blockchain-Based IoT Applications

    Vu-Trung-Duong LE  Hoai-Luan PHAM  Thi-Hong TRAN  Yasuhiko NAKASHIMA  

     
    PAPER

      Pubricized:
    2023/09/04
      Vol:
    E107-A No:3
      Page(s):
    319-330

    Blockchain-based Internet of Things (IoT) applications require flexible, fast, and low-power hashing hardware to ensure IoT data integrity and maintain blockchain network confidentiality. However, existing hashing hardware poses challenges in achieving high performance and low power and limits flexibility to compute multiple hash functions with different message lengths. This paper introduces the flexible and energy-efficient crypto-processor (FECP) to achieve high flexibility, high speed, and low power with high hardware efficiency for blockchain-based IoT applications. To achieve these goals, three new techniques are proposed, namely the crypto arithmetic logic unit (Crypto-ALU), dual buffering extension (DBE), and local data memory (LDM) scheduler. The experiments on ASIC show that the FECP can perform various hash functions with a power consumption of 0.239-0.676W, a throughput of 10.2-3.35Gbps, energy efficiency of 4.44-14.01Gbps/W, and support up to 8916-bit message input. Compared to state-of-art works, the proposed FECP is 1.65-4.49 times, 1.73-21.19 times, and 1.48-17.58 times better in throughput, energy efficiency, and energy-delay product (EDP), respectively.

  • High Speed ASIC Architectures for Aggregate Signature over BLS12-381

    Kaoru MASADA  Ryohei NAKAYAMA  Makoto IKEDA  

     
    BRIEF PAPER

      Pubricized:
    2022/11/29
      Vol:
    E106-C No:6
      Page(s):
    331-334

    BLS signature is an elliptic curve cryptography with an attractive feature that signatures can be aggregated and shortened. We have designed two ASIC architectures for hashing to the elliptic curve and pairing to minimize the latency. Also, the designs are optimized for BLS12-381, a relatively new and safe curve.

  • Hardware-Accelerated Secured Naïve Bayesian Filter Based on Partially Homomorphic Encryption

    Song BIAN  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Cryptography and Information Security

      Vol:
    E102-A No:2
      Page(s):
    430-439

    In this work, we provide the first practical secure email filtering scheme based on homomorphic encryption. Specifically, we construct a secure naïve Bayesian filter (SNBF) using the Paillier scheme, a partially homomorphic encryption (PHE) scheme. We first show that SNBF can be implemented with only the additive homomorphism, thus eliminating the need to employ expensive fully homomorphic schemes. In addition, the design space for specialized hardware architecture realizing SNBF is explored. We utilize a recursive Karatsuba Montgomery structure to accelerate the homomorphic operations, where multiplication of 2048-bit integers are carried out. Through the experiment, both software and hardware versions of the SNBF are implemented. On software, 104-105x runtime and 103x storage reduction are achieved by SNBF, when compared to existing fully homomorphic approaches. By instantiating the designed hardware for SNBF, a further 33x runtime and 1919x power reduction are achieved. The proposed hardware implementation classifies an average-length email in under 0.5s, which is much more practical than existing solutions.

  • An ASIC Crypto Processor for 254-Bit Prime-Field Pairing Featuring Programmable Arithmetic Core Optimized for Quadratic Extension Field

    Hiromitsu AWANO  Tadayuki ICHIHASHI  Makoto IKEDA  

     
    PAPER

      Vol:
    E102-A No:1
      Page(s):
    56-64

    An ASIC crypto processor optimized for the 254-bit prime-field optimal-ate pairing over Barreto-Naehrig (BN) curve is proposed. The data path of the proposed crypto processor is designed to compute five Fp2 operations, a multiplication, three addition/subtractions, and an inversion, simultaneously. We further propose a design methodology to automate the instruction scheduling by using a combinatorial optimization solver, with which the total cycle count is reduced to 1/2 compared with ever reported. The proposed crypto processor is designed and fabricated by using a 65nm silicon-on-thin-box (SOTB) CMOS process. The chip measurement result shows that the fabricated chip successfully computes a pairing in 0.185ms when a typical operating voltage of 1.20V is applied, which corresponds to 2.8× speed up compared to the current state-of-the-art pairing implementation on ASIC platform.

  • Low Latency 256-bit $mathbb{F}_p$ ECDSA Signature Generation Crypto Processor

    Shotaro SUGIYAMA  Hiromitsu AWANO  Makoto IKEDA  

     
    PAPER

      Vol:
    E101-A No:12
      Page(s):
    2290-2296

    A 256-bit $mathbb{F}_p$ ECDSA crypto processor featuring low latency, low energy consumption and capability of changing the Elliptic curve parameters is designed and fabricated in SOTB 65nm CMOS process. We have demonstrated the lowest ever reported signature generation time of 31.3 μs at 238MHz clock frequency. Energy consumption is 3.28 μJ/signature-generation, which is same as the lowest reported till date. We have also derived addition formulae on Elliptic curve useful for reduce the number of registers and operation cycles.

  • Optimal Design of Notch Filter with Principal Basic Vectors in Subspace

    Jinguang HAO  Gang WANG  Lili WANG  Honggang WANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E101-A No:4
      Page(s):
    723-726

    In this paper, an optimal method is proposed to design sparse-coefficient notch filters with principal basic vectors in the column space of a matrix constituted with frequency samples. The proposed scheme can perform in two stages. At the first stage, the principal vectors can be determined in the least-squares sense. At the second stage, with some components of the principal vectors, the notch filter design is formulated as a linear optimization problem according to the desired specifications. Optimal results can form sparse coefficients of the notch filter by solving the linear optimization problem. The simulation results show that the proposed scheme can achieve better performance in designing a sparse-coefficient notch filter of small order compared with other methods such as the equiripple method, the orthogonal matching pursuit based scheme and the L1-norm based method.

  • The ASIC Implementation of SM3 Hash Algorithm for High Throughput

    Xiaojing DU  Shuguo LI  

     
    LETTER-Cryptography and Information Security

      Vol:
    E99-A No:7
      Page(s):
    1481-1487

    SM3 is a hash function standard defined by China. Unlike SHA-1 and SHA-2, it is hard for SM3 to speed up the throughput because it has more complicated compression function than other hash algorithm. In this paper, we propose a 4-round-in-1 structure to reduce the number of rounds, and a logical simplifying to move 3 adders and 3 XOR gates from critical path to the non-critical path. Based in SMIC 65nm CMOS technology, the throughput of SM3 can achieve 6.54Gbps which is higher than that of the reported designs.

  • PAC-k: A Parallel Aho-Corasick String Matching Approach on Graphic Processing Units Using Non-Overlapped Threads

    ThienLuan HO  Seung-Rohk OH  HyunJin KIM  

     
    PAPER-Network Management/Operation

      Vol:
    E99-B No:7
      Page(s):
    1523-1531

    A parallel Aho-Corasick (AC) approach, named PAC-k, is proposed for string matching in deep packet inspection (DPI). The proposed approach adopts graphic processing units (GPUs) to perform the string matching in parallel for high throughput. In parallel string matching, the boundary detection problem happens when a pattern is matched across chunks. The PAC-k approach solves the boundary detection problem because the number of characters to be scanned by a thread can reach the longest pattern length. An input string is divided into multiple sub-chunks with k characters. By adopting the new starting position in each sub-chunk for the failure transition, the required number of threads is reduced by a factor of k. Therefore, the overhead of terminating and reassigning threads is also decreased. In order to avoid the unnecessary overlapped scanning with multiple threads, a checking procedure is proposed that decides whether a new starting position is in the sub-chunk. In the experiments with target patterns from Snort and realistic input strings from DEFCON, throughputs are enhanced greatly compared to those of previous AC-based string matching approaches.

  • A Design Methodology for Positioning Sub-Platform on Smartphone Based LBS

    Tetsuya MANABE  Takaaki HASEGAWA  

     
    PAPER

      Vol:
    E99-A No:1
      Page(s):
    297-309

    This paper presents a design methodology for positioning sub-platform from the viewpoint of positioning for smartphone-based location-based services (LBS). To achieve this, we analyze a mechanism of positioning error generation including principles of positioning sub-systems and structure of smartphones. Specifically, we carry out the experiments of smartphone positioning performance evaluation by the smartphone basic API (Application Programming Interface) and by the wireless LAN in various environments. Then, we describe the importance of considering three layers as follows: 1) the lower layer that caused by positioning sub-systems, e.g., GPS, wireless LAN, mobile base stations, and so on; 2) the middle layer that caused by functions provided from the platform such as Android and iOS; 3) the upper layer that caused by operation algorithm of applications on the platform.

  • A Robust Speech Communication into Smart Info-Media System

    Yoshikazu MIYANAGA  Wataru TAKAHASHI  Shingo YOSHIZAWA  

     
    INVITED PAPER

      Vol:
    E96-A No:11
      Page(s):
    2074-2080

    This paper introduces our developed noise robust speech communication techniques and describes its implementation to a smart info-media system, i.e., a small robot. Our designed speech communication system consists of automatic speech detection, recognition, and rejection. By using automatic speech detection and recognition, an observed speech waveform can be recognized without a manual trigger. In addition, using speech rejection, this system only accepts registered speech phrases and rejects any other words. In other words, although an arbitrary input speech waveform can be fed into this system and recognized, the system responds only to the registered speech phrases. The developed noise robust speech processing can reduce various noises in many environments. In addition to the design of noise robust speech recognition, the LSI design of this system has been introduced. By using the design of speech recognition application specific IC (ASIC), we can simultaneously realize low power consumption and real-time processing. This paper describes the LSI architecture of this system and its performances in some field experiments. In terms of current speech recognition accuracy, the system can realize 85-99% under 0-20dB SNR and echo environments.

  • Design a Fast CAM-Based Exact Pattern Matching System on FPGA and 0.18µm CMOS Process

    Duc-Hung LE  Katsumi INOUE  Cong-Kha PHAM  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E96-A No:9
      Page(s):
    1883-1888

    A CAM-based matching system for fast exact pattern matching is implemented on a hardware system with FPGA and ASIC. The system has a simple structure, and does not employ any Central Processor Unit (CPU) as well as complicated computations. We take advantage of Content Addressable Memory (CAM) which has an ability of parallel multi-match mode for designing the system. The system is applied to fast pattern matching with various required search patterns without using search principles. In this paper, the authors present a CAM-based system for fast exact pattern matching on 2-D data.

  • An ASIC Design Support Tool Set for Non-pipelined Asynchronous Circuits with Bundled-Data Implementation

    Minoru IIZUKA  Naohiro HAMADA  Hiroshi SAITO  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    482-491

    This paper proposes an ASIC design support tool set for non-pipelined asynchronous circuits with bundled-data implementation. This tool set consists of seven tools to automate design processes of bundled-data implementation such as the generation of design constraints, timing verification, and delay adjustment considering a given latency constraint. With the proposed design flow which combines the proposed tool set and commercial CAD tools, most of design processes from an RTL model is fully automated. In the experiments, to show the effectiveness of energy consumption in bundled-data implementation compared to synchronous counterpart, this paper synthesizes several circuits with a latency constraint which is generated from the synchronous counterpart with the minimum clock cycle time.

  • Via Programmable Structured ASIC Architecture “VPEX3” and CAD Design System

    Ryohei HORI  Taisuke UEOKA  Taku OTANI  Masaya YOSHIKAWA  Takeshi FUJINO  

     
    PAPER-Physical Level Design

      Vol:
    E95-A No:12
      Page(s):
    2182-2190

    A low-cost and low-power via-programmable structured ASIC architecture named “VPEX3” and a VPEX3-specific CAD system are developed. In the VPEX3 architecture, which is an improved version of the old VPEX and VPEX2 architectures, an arbitrary logic function including sequential logic can be programmed by three via layers. The logic elements (LEs) of VPEX3 are 60% smaller than those of the previous VPEX2, which can be programmed by two via layers. In this paper, we describe a global architecture named Logic Array Block (LAB) composed of LE matrices. The clock lines are buffered in the buffering region on the left and right sides of LAB. Next, a VPEX3-specific CAD system utilizing an academic placement tool named “CAPO” and the “FGR” global router is developed. Since these tools are originally designed for ASICs, we developed CAD tools for supporting a structured ASIC architecture. In particular, we developed a detailed router that assigns via positions on the via-programmable routing fabric. Our CAD system successfully converts the HDL design to GDS-II data format including via-1, 2, 3 layouts, and the successful verification of LVS and DRC on GDSII is achieved. The performance of the VPEX3 architecture and the CAD system is evaluated using ISCAS benchmark circuits. The developed CAD system is used to successfully design a test chip composed of 130110 LEs.

  • Improved Via-Programmable Structured ASIC VPEX3 and Its Evaluation

    Ryohei HORI  Tatsuya KITAMORI  Taisuke UEOKA  Masaya YOSHIKAWA  Takeshi FUJINO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:9
      Page(s):
    1518-1528

    Various kinds of structured ASICs have been proposed that can customize logic functions using a few photomasks, which decreases the initial cost, especially that of expensive photo-masks. In the past, we have developed a via programmable structured ASIC “VPEX2” (Via Programmable logic device using EXclusive-or array) that is capable of changing logics on 2 via (the 1st and 3rd via) layers. The logic element (LE) of VPEX2 is composed of EXOR gate and 2 NOT gates. However, “VPEX2” architecture has the two important penalty, the area penalty is 5-6 times that of the ASIC and wiring congestion by detouring wires to avoid I/O terminals. In this paper, we propose a new architecture “VPEX3” in order to achieve the practical structures. In VPEX3, we applied three techniques for decrease area penalty and higher wiring efficiency: (1) LE area is reduced approximately 60% by omitting 1 NOT gate on a LE and the gate width reduction, (2) the kinds of configurable logic function on a single LE is increased from 13 to 22 by introducing “flexible AOI gate technique” and (3) flexible I/O terminal by introducing 2nd via as a programmable layers. Furthermore, the delay model for via programmable wiring is necessary in order to evaluate via programmable wiring architecture compared to standard cell ASIC. We extracted wiring delay characteristics from the ring oscillator test circuit using both of normal wiring and via-programmable wiring. These three new architectures and via programmable wiring-delay-model revealed that an area-delay product of “VPEX3” is as small as twice that of ASIC. Chip-cost estimation among FPGA, “VPEX2”, “VPEX3” and ASIC revealed that the “VPEX3” is the most cost-effective architecture for Systems-on-chips (SoCs) whose production volume is from one thousand to several tens of thousands units.

  • Compact Architecture for ASIC and FPGA Implementation of the KASUMI Block Cipher

    Dai YAMAMOTO  Kouichi ITOH  Jun YAJIMA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E94-A No:12
      Page(s):
    2628-2638

    Compact design is very important for embedded systems such as wireless sensor nodes, RFID tags and mobile devices because of their limited hardware (H/W) resources. This paper proposes a compact H/W implementation for the KASUMI block cipher, which is the 3GPP standard encryption algorithm. In [8] and [9], Yamamoto et al. proposed a method of reducing the register size for the MISTY1 FO function (YYI-08), and implemented very compact MISTY1 H/W. In this paper we aim to implement the smallest KASUMI H/W to date by applying a YYI-08 configuration to KASUMI, whose FO function has a similar structure to that of MISTY1. However, we discovered that straightforward application of YYI-08 raises problems. We therefore propose a new YYI-08 configuration improved for KASUMI and the compact H/W architecture. The new YYI-08 configuration consists of new FL function calculation schemes and a suitable calculation order. According to our logic synthesis on a 0.11-µm ASIC process, the gate size is 2.99 K gates, which, to our knowledge, is the smallest to date.

  • Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture

    Cao LIANG  Xinming HUANG  

     
    PAPER-Integrated Electronics

      Vol:
    E93-C No:3
      Page(s):
    407-415

    Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted for stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, which is able to provide high performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT architecture targeted for multi-core platforms computing systems. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves the energy efficiency gains of 2.1 and 28.9 when compared with FPGA and DSP implementations.

  • Compact Architecture for ASIC Implementation of the MISTY1 Block Cipher

    Dai YAMAMOTO  Jun YAJIMA  Kouichi ITOH  

     
    PAPER-Symmetric Cryptography

      Vol:
    E93-A No:1
      Page(s):
    3-12

    This paper proposes a compact hardware (H/W) implementation for the MISTY1 block cipher, which is one of the ISO/IEC 18033-3 standard encryption algorithms. In designing the compact H/W, we focused on optimizing the implementation of FO/FI/FL functions, which are the main components of MISTY1. For this optimization, we propose three new methods; reducing temporary registers for the FO function, shortening the critical path for the FI function, and merging the FL/FL-1 functions. According to our logic synthesis on a 0.18-µm CMOS standard cell library based on our proposed methods, the gate size is 3.4 Kgates, which is the smallest as far as we know.

  • An Efficient Signature Matching Scheme for Mobile Security

    Ruhui ZHANG  Makoto IWATA  

     
    PAPER-Network Management/Operation

      Vol:
    E91-B No:10
      Page(s):
    3251-3261

    The development of network technology reveals the clear trend that mobile devices will soon be equipped with more and more network-based functions and services. This increase also results in more intrusions and attacks on mobile devices; therefore, mobile security mechanisms are becoming indispensable. In this paper, we propose a novel signature matching scheme for mobile security. This scheme not only emphasizes a small resource requirement and an optimal scan speed, which are both important for resource-limited mobile devices, but also focuses on practical features such as stable performance, fast signature set updates and hardware implementation. This scheme is based on the finite state machine (FSM) approach widely used for string matching. An SRAM-based two-level finite state machine (TFSM) solution is introduced to utilize the unbalanced transition distribution in the original FSM to decrease the memory requirement, and to shorten the critical path of the single-FSM solution. By adjusting the boundary of the two FSMs, optimum memory usage and throughput are obtainable. The hardware circuit of our scheme is designed and evaluated by both FPGA and ASIC technology. The result of FPGA evaluation shows that 2,168 unique patterns with a total of 32,776 characters are stored in 177.75 KB SelectRAM blocks of Xilinx XC4VLX40 FPGA and a 3.0 Gbps throughput is achieved. The result of ASIC evaluation with 180 nm-CMOS library shows a throughput of over 4.5 Gbps with 132 KB of SRAM. Because of the small amount of memory and logic cell requirements, as well as the scalability of our scheme, higher performance is achieved by instantiating several signature matching engines when more resources are provided.

  • Parallel Architecture for 2-D Discrete Wavelet Transform with Low Energy Consumption

    Nozomi ISHIHARA  Koki ABE  

     
    PAPER-Digital Signal Processing

      Vol:
    E91-A No:8
      Page(s):
    2068-2075

    A novel two-dimensional discrete wavelet transform (2-DDWT) parallel architecture for higher throughput and lower energy consumption is proposed. The proposed architecture fully exploits full-page burst accesses of DRAM and minimizes the number of DRAM activate and precharge operations. Simulation results revealed that the architecture reduces the number of clock cycles for DRAM memory accesses as well as the DRAM power consumption with moderate cost of internal memory. Evaluation of the VLSI implementation of the architecture showed that the throughput of wavelet filtering was increased by parallelizing row filtering with a minimum area cost, thereby enabling DRAM full-page burst accesses to be exploited.

1-20hit(66hit)