Hiroaki AKUTSU Ko ARAI
Lanxi LIU Pengpeng YANG Suwen DU Sani M. ABDULLAHI
Xiaoguang TU Zhi HE Gui FU Jianhua LIU Mian ZHONG Chao ZHOU Xia LEI Juhang YIN Yi HUANG Yu WANG
Yingying LU Cheng LU Yuan ZONG Feng ZHOU Chuangao TANG
Jialong LI Takuto YAMAUCHI Takanori HIRANO Jinyu CAI Kenji TEI
Wei LEI Yue ZHANG Hanfeng XIE Zebin CHEN Zengping CHEN Weixing LI
David CLARINO Naoya ASADA Atsushi MATSUO Shigeru YAMASHITA
Takashi YOKOTA Kanemitsu OOTSU
Xiaokang Jin Benben Huang Hao Sheng Yao Wu
Tomoki MIYAMOTO
Ken WATANABE Katsuhide FUJITA
Masashi UNOKI Kai LI Anuwat CHAIWONGYEN Quoc-Huy NGUYEN Khalid ZAMAN
Takaharu TSUBOYAMA Ryota TAKAHASHI Motoi IWATA Koichi KISE
Chi ZHANG Li TAO Toshihiko YAMASAKI
Ann Jelyn TIEMPO Yong-Jin JEONG
Haruhisa KATO Yoshitaka KIDANI Kei KAWAMURA
Jiakun LI Jiajian LI Yanjun SHI Hui LIAN Haifan WU
Gyuyeong KIM
Hyun KWON Jun LEE
Fan LI Enze YANG Chao LI Shuoyan LIU Haodong WANG
Guangjin Ouyang Yong Guo Yu Lu Fang He
Yuyao LIU Qingyong LI Shi BAO Wen WANG
Cong PANG Ye NI Jia Ming CHENG Lin ZHOU Li ZHAO
Nikolay FEDOROV Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Yukasa MURAKAMI Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Kazuya KAKIZAKI Kazuto FUKUCHI Jun SAKUMA
Yitong WANG Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO
Waqas NAWAZ Muhammad UZAIR Kifayat ULLAH KHAN Iram FATIMA
Haeyoung Lee
Ji XI Pengxu JIANG Yue XIE Wei JIANG Hao DING
Weiwei JING Zhonghua LI
Sena LEE Chaeyoung KIM Hoorin PARK
Akira ITO Yoshiaki TAKAHASHI
Rindo NAKANISHI Yoshiaki TAKATA Hiroyuki SEKI
Chuzo IWAMOTO Ryo TAKAISHI
Chih-Ping Wang Duen-Ren Liu
Yuya TAKADA Rikuto MOCHIDA Miya NAKAJIMA Syun-suke KADOYA Daisuke SANO Tsuyoshi KATO
Yi Huo Yun Ge
Rikuto MOCHIDA Miya NAKAJIMA Haruki ONO Takahiro ANDO Tsuyoshi KATO
Koichi FUJII Tomomi MATSUI
Yaotong SONG Zhipeng LIU Zhiming ZHANG Jun TANG Zhenyu LEI Shangce GAO
Souhei TAKAGI Takuya KOJIMA Hideharu AMANO Morihiro KUGA Masahiro IIDA
Jun ZHOU Masaaki KONDO
Tetsuya MANABE Wataru UNUMA
Kazuyuki AMANO
Takumi SHIOTA Tonan KAMATA Ryuhei UEHARA
Hitoshi MURAKAMI Yutaro YAMAGUCHI
Jingjing Liu Chuanyang Liu Yiquan Wu Zuo Sun
Zhenglong YANG Weihao DENG Guozhong WANG Tao FAN Yixi LUO
Yoshiaki TAKATA Akira ONISHI Ryoma SENDA Hiroyuki SEKI
Dinesh DAULTANI Masayuki TANAKA Masatoshi OKUTOMI Kazuki ENDO
Kento KIMURA Tomohiro HARAMIISHI Kazuyuki AMANO Shin-ichi NAKANO
Ryotaro MITSUBOSHI Kohei HATANO Eiji TAKIMOTO
Genta INOUE Daiki OKONOGI Satoru JIMBO Thiem Van CHU Masato MOTOMURA Kazushi KAWAMURA
Hikaru USAMI Yusuke KAMEDA
Yinan YANG
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI
Fengshan ZHAO Qin LIU Takeshi IKENAGA
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Ming PAN
Katsumi TAKAHASHI Hiroai ASAMI Katsuto NAKAJIMA Masahiro IIDA
We designed an FPGA-based parallel machine called "RASH"(Reconfigurable Architecture based on Scalable Hardware) for high speed and flexible signal/data processing. Cryptanalysis is one of the killer applications for FPGA-based machines because huge amounts of logical and/or simple arithmetic operations are required and FPGA is suitable for this. One of the well-known activities in cryptanalysis is the DES (Data Encryption Standard) cracking contest conducted by RSA Data Security. TMTO (Time-Memory Trade-Off) Cryptanalysis is a practical method to dramatically shorten the time for key search when plaintext is given in advance. A string of ASCII characters is used as the key much like a password. The ASCII character is 7-bit character and is changed to 96 kinds of value. The 56-bit DES key is given with a string of 8 ASCII characters. Although the DES key has 64 trillion(=256) possibilities, the key that is given with a string has only 6.4 trillion(=968) possibilities. Therefore, we improve TMTO cryptanalysis so that we search only the limited key by ASCII characters and reduce the quantity of computation. In this paper, we demonstrate how TMTO cryptanalysis for limited key is well suited to our FPGA-based RASH machine. By limiting the key to a string, DES key will be found at 80% probability within 45 minutes after ciphertext is given on 10 units of RASH. The precomputation before starting key search takes 3 weeks on the same RASH configuration.
Tomonori YOKOYAMA Naoyuki IZU Jun-ichiro TSUCHIYA Konosuke WATANABE Hideharu AMANO Tomohiro KUDOH
A reconfigurable network interface called RHiNET-2/NI0 is developed for parallel processing of PCs distributed within one or more floors of a building. Two configurations: the HS (High Speed) configuration with only a high-speed primitive and the DSM (Distributed Shared Memory) configuration which supports sophisticated primitives can be selected by the network requirement. From the empirical evaluation, it appears that the HS configuration markedly improves the latency of data transfer compared with traditional network interfaces. On the other hand, the DSM configuration executes sophisticated primitives for distributed shared memory more than twice as fast as that of software implementation.
Shuichi ICHIKAWA Shoji YAMAMOTO
Although the subgraph isomorphism problem has various important applications, it is generally NP-complete and difficult to solve. Though a custom computing circuit can reduce the execution time substantially, it requires considerable hardware resources and is inapplicable to large problems. This paper examines the feasibility of data dependent designs, which are particularly suitable to a Field Programmable Gate Array (FPGA). The data dependent approach drastically reduces hardware requirements. For graphs of 32 vertices, the average logic scale of data dependent circuits is only 5% of the corresponding data independent circuit. The data dependent circuit is estimated to be maximally 460 times faster than the software. Even if the circuit generation time is included, a data dependent circuit is estimated to be 2.04 times faster than software for graphs of 32 vertices. The performance gain would increase for larger graphs.
Jacir L. BORDIM Yasuaki ITO Koji NAKANO
The main contribution of this paper is to present an FPGA-based implementation of an instance-specific hardware which accelerates the CKY (Cocke-Kasami-Younger) parsing for context-free grammars. Given a context-free grammar G and a string x, the CKY parsing determines whether G derives x. We have developed a hardware generator that creates a Verilog HDL source to perform the CKY parsing for any given context-free grammar G. The generated source is embedded in an FPGA using the design software provided by the FPGA vendor. We evaluated the instance-specific hardware, generated by our hardware generator, using a timing analyzer and tested it using the Altera FPGAs. The generated hardware attains a speed-up factor of approximately 750 over the software CKY parsing algorithm.
The main contribution of this paper is to present an image retrieval system using FPGAs. Given a template image T and a database of a number of images I1, I2,
John A. WILLIAMS Anwar S. DAWOOD Stephen J. VISSER
In this paper we present reconfigurable computing as a compelling choice of computing platform for real-time, onboard processing for satellite applications. In particular, we discuss the use of reconfigurable computing in the context of a real-time remote sensing system, providing motivation for such a system and describing attributes of reconfigurable computing that support it as the technology of choice. The High Performance Computing (HPC-I) payload, designed and developed for the Australian scientific satellite FedSat, is introduced as a demonstration of onboard processing in space using reconfigurable logic. We present an overview of the real-time remote sensing system architecture, and describe the design and implementation of three remote sensing algorithms in HPC-I for cloud masking, wildfire detection and volcanic plume detection. Finally, results from simulation and testing are presented which show very promising performance in terms of data throughput and detection capabilities.
Kazuya TANIGAWA Tetsuo HIRONAKA Akira KOJIMA Noriyoshi YOSHIDA
Reconfigurable architectures have been focused for its potential on achieving high performance by reconfiguring special purpose circuits for a target application and its flexibility due to its ability of reconfiguring. We have set our sights on use of a reconfigurable architecture as a general-purpose computer by extending the advantageous properties of the architecture. To achieve the goal, a generalized execution model for reconfigurable architecture is required, so we have proposed an Ideal PARallel Structure (I-PARS) execution model. In the I-PARS execution model, any programs based on its model has no restriction depending on hardware structures based on a specific reconfigurable processor, which makes it easier to develop software. Further, we have proposed a PARS architecture which executes programs based on the I-PARS execution model effectively. The PARS architecture has a large reconfigurable part for highly parallel execution, which utilizes parallelism described on the I-PARS execution model. For effective utilization of the reconfigurable part in the PARS architecture, it has an ability to reconfigure and execute operations simultaneously in one cycle. Further, the PARS architecture supports branch operations to introduce control flow in an execution on the architecture, which makes it possible to skip an execution which does not produce a valid result. In this paper, we introduce the detailed structure of an implemented prototype processor based on the PARS architecture. In the implementation, 420,377 CMOS transistors were used, which was only 3.8% of the number of transistors used in the UltraSPARC-III in logic circuits. Additionally, we evaluated the performance of the prototype processor by using some benchmark programs. From the evaluation results, we found that the prototype processor could achieve nearly the same performance and be implemented with extremely the less number of transistors compared with UltraSPARC-III 750MHz.
Masaki UNO Yuichiro SHIBATA Hideharu AMANO
WASMII is a virtual hardware system that executes dataflow algorithms using a dynamically reconfigurable multi-context device with a data driven control mechanism. Although the effectiveness of the system has been evaluated through simulations and using an emulator, implementation of WASMII was infeasible due to the unavailability of such a device. However, the first prototype of a practical dynamically reconfigurable multi-context device called DRL has been developed by NEC, and we developed a reconfigurable test bed using four sample DRL chips. On this board, we have implemented and executed some simple applications of WASMII mechanism. Evaluation results show that the performance of the parallel implementation of WASMII is almost twice as that of a PC with a CPU based on the corresponding technology.
Jun'ichiro TAKEMOTO Toshihiro GOTO Yuichiro SHIBATA Kiyoshi OGURI
In this paper, the efficient structure of an LUT (look-up table) for an asynchronous reconfigurable PCA (Plastic Cell Architecture) device is investigated. A total of 15 types of implementation alternatives for LUTs are evaluated and compared in an empirical manner in which full custom layout design is developed and simulated. The evaluation results show that by introducing transmission gates in memory cells in an LUT, read time can be improved by 14.3% at the cost of 13.6% area increase compared to a conventional speed oriented implementation. It is also shown that use of transmission gates reduces 6.4% of area and 19.2% of read time against a conventional area oriented LUT implementation.
Hideyuki ITO Ryusuke KONISHI Hiroshi NAKADA Kiyoshi OGURI Minoru INAMORI Akira NAGOYA
This paper describes the realization of a dynamically reconfigurable logic LSI based on a novel parallel computer architecture. The key point of the architecture is its dual-structured cell array which enables dynamic and autonomous reconfiguration of the logic circuits. The LSI was completed by successfully introducing two specific features: fully asynchronous logic circuits and a homogeneous structure, only LUTs are used.
Takuya OKAMOTO Takafumi YUASA Tomonori IZUMI Takao ONOYE Yukihiro NAKAMURA
A configurable device "PCA-Chip2" implements the concept of Plastic Cell Architecture, which is an extension of programmable logic devices. This paper presents basic design tools for the PCA-Chip2 as the first step to develop the total design environment. Given a C description of a target function, configuration data for PCA-Chip2 is automatically generated by the tools. Trial designs by the tools are also presented to demonstrate the practicability of the proposed approach.
Kousuke KATAYAMA Atsushi IWATA
In this paper, we propose a novel pulse-coupled neural network (PCNN) simulator using a programmable gate array (PGA) technique. The simulator is composed of modified phase-locked loops (PLLs) and a programmable gate array (PGA). The PLL, which is modified by the addition of multiple inputs and multiple feedbacks, works as a neuron. The PGA, which controls the network connection, works as nodes of dendritic trees. This simulator, which has 16 neurons and 32
Isamu KAJITANI Masaya IWATA Nobuyuki OTSU Tetsuya HIGUCHI
This paper presents a new reconfigurable hardware paradigm, called evolvable hardware (EHW), and its application to the biomedical engineering problem of an artificial hand controller. Evolvable hardware is based on the idea of combining a reconfigurable hardware device with an artificial intelligence robust search technique called genetic algorithms (GAs) to execute reconfiguration autonomously. The first version of the EHW chip was designed in 1998, and this paper describes the latest improvements to the EHW chip, as well as outlining its architecture and the hardware implementation of the GA operations. Execution speed for genetic operations is shown to be about 38.7 times faster with the hardware implementation than with software program running on an AMD Athlon processor (1.2GHz). As an application of the EHW chip, this paper introduces a controller for a multi-functional prosthetic-hand, and presents experimental data in which a practical myoelectric pattern classification rate of 97.8% was achieved through the application of the EHW chip.
Spyridon BLIONAS Konstantinos MASSELOS Chrissavgi DRE Christos DROSOS Fragkiskos IEROMNIMON Dimitris METAFAS Thanasis PAGONIS Aristodemos PNEVMATIKAKIS Anna TATSAKI Theodor TRIMIS Adamandios VONTZALIDIS
In this paper the development of the prototyping platform of a partly reconfigurable System-on-Chip (SoC) for wireless LANs, is described. It is designed to realize both HIPERLAN/2 and IEEE 802.11a wireless LAN systems. The current version of the system includes Mobile Terminal and AP functionality only for indoor use. Future firmware versions (configurations for its reconfigurable part) will upgrade system's functionality to allow its operation in outdoor environments and in wireless point-to-point links. The target System-on-Chip implementation platform will include instruction set processor cores, ASIC blocks and embedded reconfigurable blocks to achieve an optimal balance between implementation efficiency (area, power, performance) and flexibility. The system's prototype is developed on the ARM integrator platform and all firmware versions will be verified before ASIC prototyping.
Takanori HAYASHIDA Kazuaki MURAKAMI
Online profiling methodologies are studied for exploiting dynamic optimization. On a dynamic optimizable system with online profilers, it has to get accurate profile in early step of the program execution for effective execution. However, for getting more effective profile by online profiling, it has to satisfy "Rapidness" and "Accuracy". They are conflicted requirements. Therefore, it has to choose trade-off point at implementation. We focused into online Hot Instruction Sequence (HIS) profiler to exploit reconfigurable functional units. To circumstantiate the effectiveness of online HIS profiling, we build some evaluation models for experimental evaluation. Our profiler models are SC/DM, SC/FA and JC/DM. These models have different policy of event counting and table lookup. Our event counting policies are simple-counting or jumble-counting. On the other hand, table lookup policies are direct-map or full-associative. In our experimental evaluation, SC/FA and JC/DM models scored higher accuracy than SC/DM. The JC/DM model is able to implement by lower cost for table lookup, but it scored high accuracy comparable to SC/FA.
Yasunori NAGATA Masao MUKAIDONO
Some of the recent digital systems have a serious clock skew problem due to huge hardware implementation and high-speed operation in VLSI's. To overcome this problem, clock distribution techniques and, more notably, asynchronous system design methodologies have been investigated. Since the latest asynchronous digital systems use two-rail logic with two-phase data transfer manner, more than two-fold hardware is required in comparison with the synchronous system. In this article, we present a design of asynchronous digital system which is based on B-ternary logic that can process binary data. The system which is based on speed-independent mode consists of data-path and its controller. Then we provide B-ternary two-phase binary data processing in the data-path and its control procedure with hand-shake protocol. To implement the system some functional elements are presented, that is, a ternary-in/binary-out register with request/acknowledge circuits and a control unit. These functional elements are fabricated with ternary NOR, NAND, INV gates and ternary-in/binary-out D-FF (D-elements). The B-ternary based asynchronous circuit has less interconnections, achives race-free operations and makes use of conventional binary powerful design tools. Particularly, we extend the speed-independent delay model to relativity delays in order to reduce hardware overhead of checking memory stability in the system. As a concrete example, a carry-completion type asynchronous adder system is demonstrated under extended speed-independent mode to show the validity of the extension.
Katsuhiko METSUGI Kazuaki MURAKAMI
TLSP (Thread-Level Speculative Parallel processing) architecture is a growing processor architecture. Parallelism of a program executed on this architecture is ruled by the combination of techniques which relax data dependences. In this paper, we evaluate the limits of parallelism of the TLSP architecture by using abstract machine models. We have three major results. First, if we use solely each technique which relaxes data dependences, "renaming" has a large effect on the TLSP architecture. Second, combinatorial use of "memory disambiguation" and "renaming" leads to huge parallelism. Third, constant effects are obtained by concurrent use of "value prediction" and other techniques.
Xiang-Yan ZENG Yen-Wei CHEN Zensho NAKAO Hanqing LU
We propose a novel pixel pattern-based approach for texture classification, which is independent of the variance of illumination. Gray scale images are first transformed into pattern maps in which edges and lines, used for characterizing texture information, are classified by pattern matching. We employ principal component analysis (PCA) which is widely applied to feature extraction. We use the basis functions learned through PCA as templates for pattern matching. Using PCA pattern maps, the feature vector is comprised of the numbers of the pixels belonging to a specific pattern. The effectiveness of the new feature is demonstrated by applications to the image retrievals of the Brodatz texture database. Comparisons with multichannel and multiresolution features indicate that the new feature is quite time saving, free of the influence of illumination, and has comparable accuracy.
Ilyas POTAMITIS Nikos FAKOTAKIS George KOKKINAKIS
Our work introduces a speech enhancement algorithm that modifies on-line the spectral representation of degraded speech to approximate the spectral coefficients of high quality speech. The proposed framework is based on the application of Discrete Fourier Transform (DFT) to a large ensemble of clean speech frames and the estimation of parametric, heavy-tail non-Gaussian probability distributions for the spectral magnitude. Each clean spectral band possesses a unique pdf. This is selected according to the smallest Kullback-Leibler divergence between each candidate heavy-tail pdf and the non-parametric pdf of the magnitude of each spectral band of the clean ensemble. The parameters of the distributions are derived by Maximum Likelihood Estimation (MLE). A maximum a-posteriori (MAP) formulation of the degraded spectral bands leads to soft threshold functions, optimally derived from the statistics of each spectral band and effectively reducing white and slowly varying coloured Gaussian noise. We evaluate the new algorithm on the task of improving the quality of speech perception as well as Automatic Speech Recognition (ASR) and demonstrate its robustness at SNRs as low as 0 dB.
Conny GUNADI Hiroyuki SHIMIZU Kazuya KODAMA Kiyoharu AIZAWA
Construction of large-scale virtual environment is gaining more attentions for its applications in virtual mall, virtual sightseeing, tele-presence, etc. This paper presents a framework for building a realistic virtual environment from geometry-based approach. We propose an algorithm to construct a realistic 3-D model from multi-view range data and multi-view texture images. The proposed method tries to adopt the result of region segmentation of range images in some phases of the modeling process. It is shown that the relations obtained from region segmentation are quite effective in improving the result of registration as well as mesh merging.
Kiminori SATO Nan HE Yukitoshi TAKAHASHI
Partial face images, e.g., eyes, nose, and ear images are significant for face recognition. In this paper, we present a method for partial face extraction and recognition based on Radial Basis Function (RBF) networks. Focus has been centered on using ear images because they are not influenced by facial expression, and the influences of aging are negligible. Original human side face image with 320
Hironori OKII Takashi UOZUMI Koichi ONO Hong YAN
In this paper, we propose a new computer-aided diagnosis system which can extract specific features from hematoxylin and eosin (HE)-stained breast tumor images and evaluate the type of tumor using artificial organisms. The gene of the artificial organisms is defined by three kinds of texture features, which can evaluate the specific features of the tumor region in the image. The artificial organisms move around in the image and investigate their environmental conditions during the searching process. When the target pixel is regarded as a tumor region, the organism obtains energy and produces offspring; organisms in other regions lose energy and die. The searching process is iterated until the 30th generation; as a result, tumor regions are filled with artificial organisms. Whether the detected tumor is benign or malignant is evaluated based on the combination of selected genes. The method developed was applied to 27 test cases and the distinction between benign and malignant tumors by the artificial organisms was successful in about 90% of tumor images. In this diagnosis support system, the combination of genes, which represents specific features of detected tumor region, is selected automatically for each tumor image during the searching process.
In many distributed systems, tokens are fundamental tools to manage resources shared by processes. Thus, monitoring tokens has become a significant problem in developing the distributed programs. This paper formulates the problems of monitoring tokens in terms of detecting the special global predicates, called summative global predicates. In this paper, several algorithms to detect various summative global predicates are developed and their time complexities are discussed.
We propose a method for camera calibration based on image registration. This method registers two images; one is a real image captured by a camera with a calibration object with known shape and texture, and the other is a synthetic image containing the object. The proposed method estimates the parameters of the rotation and translation of the object by using the depth information of the synthetic image. The Gauss-Newton method is used to minimize the residuals of intensities of the two images. The proposed method does not depend on initial values of the minimization, and is applicable to images with much noise. Experimental results using real images demonstrate the robustness against initial state and noise on the image.