IEICE global.ieice.org Site

Keyword Search Result

[Keyword] coprocessor(5hit)

1-5hit

Design and Implement of High Performance Crypto Coprocessor
Shice NI Yong DOU Kai CHEN Jie ZHOU

LETTER-Algorithms and Data Structures

Vol:
E97-A No:4
Page(s):
989-990
This letter proposes a novel high performance crypto coprocessor that relies on Reconfigurable Cryptographic Blocks. We implement the prototype of the coprocessor on Xilinx FPGA chip. And the pipelining technique is adopted to realize data paralleling. The results show that the coprocessor, running at 189MHz, outperforms the software-based SSL protocol.
A K-Means-Based Multi-Prototype High-Speed Learning System with FPGA-Implemented Coprocessor for 1-NN Searching
Fengwei AN Tetsushi KOIDE Hans Jürgen MATTAUSCH

PAPER-Biocybernetics, Neurocomputing

Vol:
E95-D No:9
Page(s):
2327-2338
In this paper, we propose a hardware solution for overcoming the problem of high computational demands in a nearest neighbor (NN) based multi-prototype learning system. The multiple prototypes are obtained by a high-speed K-means clustering algorithm utilizing a concept of software-hardware cooperation that takes advantage of the flexibility of the software and the efficiency of the hardware. The one nearest neighbor (1-NN) classifier is used to recognize an object by searching for the nearest Euclidean distance among the prototypes. The major deficiency in conventional implementations for both K-means and 1-NN is the high computational demand of the nearest neighbor searching. This deficiency is resolved by an FPGA-implemented coprocessor that is a VLSI circuit for searching the nearest Euclidean distance. The coprocessor requires 12.9% logic elements and 58% block memory bits of an Altera Stratix III E110 FPGA device. The hardware communicates with the software by a PCI Express (4) local-bus-compatible interface. We benchmark our learning system against the popular case of handwritten digit recognition in which abundant previous works for comparison are available. In the case of the MNIST database, we could attain the most efficient accuracy rate of 97.91% with 930 prototypes, the learning speed of 1.310-4 s/sample and the classification speed of 3.9410-8 s/character.
Montgomery Multiplication with Twice the Bit-Length of Multipliers
Masayuki YOSHINO Katsuyuki OKEYA Camille VUILLAUME

PAPER-Implementation

Vol:
E91-A No:1
Page(s):
203-210
We present a novel approach for computing 2n-bit Montgomery multiplications with n-bit hardware Montgomery multipliers. Smartcards are usually equipped with such hardware Montgomery multipliers; however, due to progresses in factoring algorithms, the recommended bit length of public-key schemes such as RSA is steadily increasing, making the hardware quickly obsolete. Thanks to our double-size technique, one can re-use the existing hardware while keeping pace with the latest security requirements. Unlike the other double-size techniques which rely on classical n-bit modular multipliers, our idea is tailored to take advantage of n-bit Montgomery multipliers. Thus, our technique increases the perenniality of existing products without compromises in terms of security.
Synthesis of Application-Specific Coprocessor for Core-Based ASIC Design
Dae-Hyun LEE In-Cheol PARK Chong-Min KYUNG

PAPER-VLSI Design Technology and CAD

Vol:
E84-A No:2
Page(s):
604-613
This paper presents an efficient approach for a hardware/software partitioning problem: synthesis of an application-specific coprocessor which accelerates an embedded software running on a main processor. Given a set of data flow graphs (DFGs), most of previous hardware/software partitioning approaches have focused on mapping DFGs to hardware or software. Their common weaknesses are that 1) they ignore various implementation alternatives in realizing DFGs as hardware based on the assumption that only a single hardware implementation exists for a DFG, and that 2) they don't consider the effect of merging on hardware area when synthesizing a coprocessor by merging DFGs. To deal with the first issue, we formulate both the mapping of DFGs to hardware or software and the selection of the appropriate hardware implementation for each DFG as a single integer programming problem, and then apply an iterative algorithm based on the Kernighan and Lin's heuristic to solve the problem. To reduce the CPU time, we have devised data structures that quickly calculate costs of hardware implementations. To deal with the second issue, our method links DFGs with dummy nodes to produce a single large DFG, and then synthesizes a target coprocessor by globally scheduling the DFG and allocating its datapath. Experimental results demonstrate that our approach outperforms the previous approach based on genetic algorithm (GA) in both the coprocessor area and the CPU time.
An Algorithm for the K-Selection Problem Using Special-Purpose Sorters
Heung-Shik KIM Jong-Soo PARK Myunghwan KIM

PAPER-Algorithm and Computational Complexity

Vol:
E75-D No:5
Page(s):
704-708
An algorithm is presented for selecting the k-th smallest element of a totally ordered (but not sorted) set of n elements, 1kn, in the case that a special-purpose sorter is used as a coprocessor. When the pipeline merge sorter is used as the special-purpose sorter, we analyze the comparison complexity of the algorithm for the given capacity of the sorter. The comparison complexity of the algorithm is 1.4167no(n), provided that the capacity of the sorter is 256 elements. The comparison complexity of the algorithm decreases as the capacity of the sorter increases.