Da XIAO Lvyin YANG Chuanyi LIU Bin SUN Shihui ZHENG
Provable Data Possession (PDP) schemes enable users to efficiently check the integrity of their data in the cloud. Support for massive and dynamic data sets and adaptability to third-party auditing are two key factors that affect the practicality of existing PDP schemes. We propose a secure and efficient PDP system called IDPA-MF-PDP, which exploits the characteristics of real-world cloud storage environments. The cost of auditing massive and dynamic data sets is dramatically reduced by a multiple-file PDP scheme (MF-PDP) that is based on the data update patterns of cloud storage. The deployment and operational costs of third-party auditing, as well as information leakage risks, are reduced by an auditing framework based on integrated data possession auditors (DPAs), instantiated by trusted hardware and tamper-evident audit logs. Interaction protocols between the user, the cloud server, and the DPA integrate MF-PDP with the auditing framework. Analytical and experimental results demonstrate that IDPA-MF-PDP provides the same level of security as the original PDP scheme while reducing the computation and communication overhead on the DPA from linear in the size of the data to nearly constant. The performance of the system is bounded by disk I/O capacity.
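The move from linear to near-constant DPA overhead rests on the spot-checking idea of the original PDP scheme: the auditor challenges a fixed number of randomly chosen blocks instead of reading the whole data set. The sketch below computes that challenge size. It assumes blocks are sampled independently with replacement (a simplification), and the function names are hypothetical, not part of IDPA-MF-PDP.

```python
import math


def detection_probability(corrupted_fraction: float, challenged_blocks: int) -> float:
    """Probability that at least one challenged block is corrupted when
    `challenged_blocks` blocks are sampled independently (with replacement)
    and a fraction `corrupted_fraction` of all blocks is corrupted."""
    return 1.0 - (1.0 - corrupted_fraction) ** challenged_blocks


def blocks_needed(corrupted_fraction: float, target_probability: float) -> int:
    """Smallest challenge size c with 1 - (1 - t)^c >= target_probability.
    Note that the total number of blocks in the file does not appear."""
    return math.ceil(math.log(1.0 - target_probability)
                     / math.log(1.0 - corrupted_fraction))


print(blocks_needed(0.01, 0.99))
```

With 1% of blocks corrupted, about 459 challenged blocks give a 99% detection probability whether the data set holds ten thousand blocks or ten million, which is why the audit cost need not grow with data size.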
Atsushi OOKA Shingo ATA Kazunari INOUE Masayuki MURATA
Content-centric networking (CCN) is an innovative network architecture that is being considered as a successor to the Internet. In recent years, CCN has received increasing attention worldwide because its novel technologies (e.g., caching, multicast, and request aggregation) and its communication based on names that act as addresses for content have the potential to resolve various problems facing the Internet. Implementing these technologies, however, requires routers with performance far superior to that of today's Internet routers. Although many researchers have proposed individual router components, such as caching and name lookup mechanisms, there are few router-level designs that incorporate all the necessary components. The design and evaluation of a complete router is the primary contribution of this paper. We provide a concrete hardware design for a router model that uses three basic tables, namely the forwarding information base (FIB), the pending interest table (PIT), and the content store (CS), and incorporates two entities that we propose. One is the name lookup entity, which looks up a name address in content-addressable memory within a few cycles by using a Bloom filter; the other is the interest count entity, which counts the interest packets requesting particular content and selects content worth caching. Our contributions are (1) presenting an algorithm suited to looking up and matching name addresses in CCN communication, (2) proposing a method of processing CCN packets that achieves high throughput and very low latency, and (3) demonstrating feasible performance and cost on the basis of a concrete hardware design using distributed content-addressable memory.
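The abstract does not spell out how the Bloom filter and the CAM interact. A common arrangement, assumed here, is to screen each candidate name prefix with a Bloom filter so that only prefixes that may be stored reach the more expensive CAM search. The Python sketch below models the CAM with a dictionary and performs longest-prefix matching on '/'-separated names; `BloomFilter`, `NameLookup`, and the filter sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit array with k hash functions derived from SHA-256."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class NameLookup:
    """Longest-prefix match on CCN names. A dict stands in for the CAM;
    the Bloom filter screens out prefixes that are definitely absent."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.cam = {}  # hypothetical CAM stand-in: prefix -> outgoing face
        self.cam_queries = 0

    def insert(self, prefix: str, face: int) -> None:
        self.bloom.add(prefix)
        self.cam[prefix] = face

    def lookup(self, name: str):
        components = [c for c in name.split("/") if c]
        for length in range(len(components), 0, -1):  # longest prefix first
            prefix = "/" + "/".join(components[:length])
            if not self.bloom.might_contain(prefix):
                continue  # definitely not stored: skip the CAM access
            self.cam_queries += 1
            if prefix in self.cam:
                return self.cam[prefix]
        return None
```

A Bloom filter never gives false negatives, so skipping a prefix it rejects is always safe; a false positive only costs one wasted CAM access.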
A new trigger circuit based on an up/down counter is proposed. The trigger circuit consists of an up/down counter and a pulse conversion circuit. Compared with a trigger circuit based on a 32-bit counter, the proposed trigger circuit occupies less area and consumes less power, and its trigger process can be reversed, which increases the controllability of the Trojan.
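The abstract gives no counter width, threshold, or event definitions. The behavioural model below assumes a small saturating counter that steps up on an activating event and down on a deactivating event, with the pulse conversion stage emitting a single pulse on the first cycle the count reaches the threshold. Counting back down below the threshold re-arms the trigger, which is one way to read the claim that the trigger process can be reversed. All names and parameters are hypothetical.

```python
class UpDownTrigger:
    """Behavioural model of an up/down-counter trigger (hypothetical parameters)."""

    def __init__(self, width: int = 4, threshold: int = 10):
        self.max_count = (1 << width) - 1  # saturating counter of `width` bits
        self.threshold = threshold
        self.count = 0
        self.was_active = False

    def step(self, up: bool, down: bool) -> bool:
        """Advance one clock cycle; return True for a single-cycle trigger pulse."""
        if up and not down and self.count < self.max_count:
            self.count += 1
        elif down and not up and self.count > 0:
            self.count -= 1  # the reverse direction that a plain counter lacks
        active = self.count >= self.threshold
        # Pulse conversion: fire only on the cycle the threshold is first reached.
        pulse = active and not self.was_active
        self.was_active = active
        return pulse
```

A 4-bit counter is used here only to contrast with the 32-bit counter mentioned above; the real width would be a design parameter.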
Naoki MIURA Akihiko MIYAZAKI Junichi KATO Nobuyuki TANAKA Satoshi SHIGEMATSU Masami URANO Mamoru NAKANISHI Tsugumichi SHIBATA
A 10-gigabit Ethernet passive optical network (10G-EPON) is a promising candidate for next-generation access networks. A protocol processor for 10G-EPON needs not only to achieve 10-Gbps throughput but also to offer protocol extendibility for various potential services. However, conventional protocol processors cannot install additional protocols after chip fabrication because of their hardware-based architecture. This paper presents a software-hardware cooperative protocol processor for 10G-EPON that provides protocol extendibility. To achieve software-hardware cooperation, the protocol processor employs a new software-hardware partitioning technique driven by the timing requirements of 10G-EPON, together with a software-hardware interface circuit that uses an event FIFO to absorb the performance difference between software and hardware. The fabricated chip with this protocol processor operates correctly in cooperative mode and can accept newly standardized protocols. This protocol processor enables network operators to adaptively install additional protocols for their own services.
Takehiko AMAKI Masanori HASHIMOTO Takao ONOYE
This paper presents an oscillator-based true random number generator (TRNG) that dynamically unbiases the 0/1 probability. The proposed TRNG automatically adjusts the duty cycle of a fast oscillator to 50% and generates unbiased random numbers despite process variation and dynamic temperature fluctuation. A prototype chip of the proposed TRNG was fabricated in a 65 nm CMOS process. Measurement results show that the developed duty cycle monitor obtained the probability of ‘1’ 4,100 times faster than conventional output bit observation, or estimated the probability with 70 times higher accuracy. The proposed TRNG adjusted the probability of ‘1’ to within 50±0.07% in five chips over the temperature range of 0°C to 75°C. Consequently, the proposed TRNG passed the NIST and DIEHARD tests at 7.5 Mbps with an area of 6,670 µm².
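The calibration can be pictured as a feedback loop: measure the duty cycle, compare it with 50%, and nudge a trim code. The sketch below is a deterministic model under stated assumptions, not the chip's circuit: the duty cycle monitor is treated as an ideal measurement, each trim step shifts the duty cycle by a fixed hypothetical amount, and the ±0.07% band from the measurement results is used as the stopping condition.

```python
def calibrate_duty_cycle(intrinsic_duty: float, step: float = 0.0005,
                         tolerance: float = 0.0007, max_iters: int = 1000):
    """Return (trim_code, final_duty).

    Assumptions (hypothetical): the monitor reports the current duty cycle
    exactly, and each LSB of the trim code shifts the duty cycle by `step`.
    """
    code = 0
    for _ in range(max_iters):
        duty = intrinsic_duty + code * step  # monitor reading
        error = duty - 0.5
        if abs(error) <= tolerance:
            return code, duty
        code += -1 if error > 0 else 1       # move toward 50%
    raise RuntimeError("duty cycle did not converge")


print(calibrate_duty_cycle(0.53))
```

Keeping the step smaller than twice the tolerance guarantees the loop can land inside the tolerance band instead of oscillating around 50%; rerunning the loop as temperature drifts models the dynamic adjustment.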
Mahmoud BAKHSHIZADEH Ali JAHANIAN
Hardware Trojans, and unwanted hardware modifications in general, are regarded as a major challenge in many commercial and security-critical applications. The detection and prevention of hardware Trojans has therefore become an important requirement in such systems. In this paper, a new concept, the Trojan Vulnerability Map, is introduced to model the immunity of various regions of a hardware design against hardware attacks. Placement and routing algorithms are then proposed that use the Trojan Vulnerability Map to improve the immunity of the hardware. Experimental results show that the proposed placement and routing algorithms reduce hardware vulnerability by 25.65% and 4.08%, respectively. These benefits come at the cost of negligible total wire length and delay overhead.
Takayuki AKAMINE Mohamad Sofian ABU TALIP Yasunori OSANA Naoyuki FUJITA Hideharu AMANO
Computational fluid dynamics (CFD) is an important tool for designing aircraft components. FaSTAR (Fast Aerodynamics Routines) is one of the most recent CFD packages and has various subroutines. However, its irregular and complicated data structure makes FaSTAR difficult to execute on parallel machines because of memory access problems. A reconfigurable platform based on field programmable gate arrays (FPGAs) is a promising approach to accelerating memory-bottlenecked applications like FaSTAR. However, even with hardware execution, many pipeline stalls can occur due to read-after-write (RAW) data hazards. Moreover, it is difficult to predict when such stalls will occur because FaSTAR uses an unstructured mesh. To eliminate this problem, we developed an out-of-order mechanism that permutes the data order so as to prevent RAW hazards. It uses an execution monitor and a wait buffer: the former identifies the state of the computation units, and the latter temporarily stores data waiting to be processed in the computation units. This out-of-order mechanism can be applied to various types of computations with data dependencies by changing the number of execution monitors and wait buffers in accordance with the equations used in the target computation, and an out-of-order system can be reconfigured by changing these parameters automatically. Applying the proposed mechanism to five subroutines in FaSTAR reduced the number of stalls to less than 1% of that without the mechanism. The mechanism achieved a 2.6-fold speedup over in-order execution and a 2.9-fold speedup over software execution on an Intel Core 2 Duo processor, with a reasonable amount of overhead.
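The effect of the wait buffer can be seen in a small cycle-level model. The Python sketch below assumes one accumulation pipeline of fixed depth that accepts one read-modify-write per cycle; the set of in-flight nodes plays the role of the execution monitor. It compares in-order issue, which stalls whenever the next node is still in the pipeline, with out-of-order issue that parks conflicting data in the wait buffer. The function name, latency, and buffer size are hypothetical.

```python
from collections import deque


def simulate(indices, latency=4, out_of_order=True, buffer_size=8):
    """Issue one read-modify-write per cycle to the mesh node in `indices`.
    Returns (cycles, stalls). An operation may not issue while another
    operation on the same node is still in the pipeline (RAW hazard)."""
    stream = deque(indices)
    wait_buffer = []     # parked operations whose node was busy
    in_flight = deque()  # execution monitor: (node, write-back cycle)
    cycle = stalls = 0
    while stream or wait_buffer or in_flight:
        # Retire operations whose results are written back this cycle.
        while in_flight and in_flight[0][1] <= cycle:
            in_flight.popleft()
        busy = {node for node, _ in in_flight}
        issued = None
        if out_of_order:
            # Prefer the oldest parked operation whose node is now free.
            ready = next((i for i, node in enumerate(wait_buffer)
                          if node not in busy), None)
            if ready is not None:
                issued = wait_buffer.pop(ready)
            # Otherwise pull from the stream, parking operations that would hazard.
            while issued is None and stream and len(wait_buffer) < buffer_size:
                node = stream.popleft()
                if node in busy:
                    wait_buffer.append(node)
                else:
                    issued = node
        elif stream and stream[0] not in busy:
            issued = stream.popleft()  # in-order: only the head may issue
        if issued is not None:
            in_flight.append((issued, cycle + latency))
        elif stream or wait_buffer:
            stalls += 1
        cycle += 1
    return cycle, stalls
```

For the sequence [0, 0, 1, 2, 3, 4] with a four-stage pipeline, in-order issue stalls for three cycles while the second update to node 0 waits, whereas out-of-order issue fills those cycles with the independent updates.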
Qingyi GU Abdullah AL NOMAN Tadayoshi AOYAMA Takeshi TAKAKI Idaku ISHII
In this paper, we present a high frame rate (HFR) vision system that can automatically control its exposure time by executing brightness histogram-based image processing in real time at a high frame rate. Our aim is to obtain high-quality HFR images for robust image processing of high-speed phenomena even under dynamically changing illumination, such as lamps flickering at 100 Hz, corresponding to an AC power supply at 50/60 Hz. Our vision system can simultaneously calculate a 256-bin brightness histogram of an 8-bit gray image of 512×512 pixels at 2000 fps by implementing a brightness histogram calculation circuit module as parallel hardware logic on an FPGA-based high-speed vision platform. Based on this HFR brightness histogram calculation, our method realizes automatic exposure (AE) control of 512×512 images at 2000 fps using our proposed AE algorithm. The proposed AE algorithm maximizes the number of pixels in the effective range of the brightness histogram, excluding excessively dark and bright pixels, to improve the dynamic range of the captured image without over- or under-exposure. The effectiveness of our HFR system with AE control is evaluated through experiments on several scenes with illumination flickering at 100 Hz, which is too fast for the human eye to see.
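The AE criterion, maximizing the number of pixels inside the effective brightness range, can be stated in a few lines. The sketch below evaluates it over a list of candidate exposures, which only illustrates the objective; the paper's real-time update rule at 2000 fps is not described in the abstract. The bounds 16 and 239 and the `capture` callback are hypothetical.

```python
def brightness_histogram(pixels, bins=256):
    """256-bin histogram of 8-bit gray levels."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist


def effective_pixel_count(hist, low, high):
    """Number of pixels whose brightness lies in [low, high]."""
    return sum(hist[low:high + 1])


def choose_exposure(capture, exposures, low=16, high=239):
    """Return the candidate exposure whose image has the most pixels in [low, high].
    `capture(exposure)` is a hypothetical callback returning 8-bit pixel values."""
    return max(exposures,
               key=lambda e: effective_pixel_count(
                   brightness_histogram(capture(e)), low, high))
```

Pixels below `low` count as under-exposed and pixels above `high` as over-exposed, so the chosen exposure is the one that wastes the fewest pixels at either end of the histogram.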
Mingfu XUE Wei LIU Aiqun HU Youdong WANG
Hardware Trojans (HTs) have emerged as an impending security threat to hardware systems. However, conventional functional tests fail to detect HTs because Trojans are triggered by rare events. Most existing side-channel-based HT detection techniques simply compare and analyze a circuit's parameters and offer no signal calibration or error correction, so they are vulnerable to the large process variations (PV) and noise of modern nanotechnology, which can completely mask a Trojan's contribution to the circuit. This paper presents a novel HT detection method based on a subspace technique that can detect tiny HT characteristics under large PV and noise. First, we formulate HT detection as a weak signal detection problem and model it as a feature extraction problem. We then propose a novel subspace HT detection technique based on a time-domain constrained estimator. We prove that the weak HT can be distinguished from variations and noise through particular subspace projections and analysis of the reconstructed clean signal. The reconstructed clean signal produced by the proposed algorithm can also be used for accurate parameter estimation of circuits, e.g., power estimation. The proposed technique is a general method that related HT detection schemes can use to eliminate noise and PV. Both simulations on benchmarks and hardware implementation validations on FPGA boards show the effectiveness and high sensitivity of the new HT detection technique.
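The abstract's central idea is that a Trojan's contribution can be separated from variation and noise by projecting onto suitable subspaces. A generic way to express this, shown below in place of the paper's time-domain constrained estimator, is to span a subspace with Trojan-free (golden) traces and measure the energy a test trace leaves outside that subspace. The functions and the threshold are hypothetical; real traces would need many samples and a statistically chosen threshold.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(golden_traces, eps=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of Trojan-free traces."""
    basis = []
    for trace in golden_traces:
        w = list(trace)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip linearly dependent traces
            basis.append([wi / norm for wi in w])
    return basis


def residual_energy(trace, basis):
    """Energy of the part of `trace` orthogonal to the golden subspace."""
    r = list(trace)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return dot(r, r)


def is_suspicious(trace, basis, threshold):
    """Flag a trace whose out-of-subspace energy exceeds the threshold."""
    return residual_energy(trace, basis) > threshold
```

Any combination of the golden traces, however large, projects to zero residual, so only a component in a new direction raises the statistic.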
Duc-Hung LE Tran-Bao-Thuong CAO Katsumi INOUE Cong-Kha PHAM
In this paper, the authors present a CAM-based Information Detection Hardware System for fast exact and approximate image matching on 2-D data using an FPGA. The proposed system can potentially be applied to fast image matching with various required search patterns, without using search principles. In designing the system, we take advantage of content-addressable memory (CAM), which supports a parallel multi-match mode and is implemented using dual-port RAM blocks. The system has a simple structure and does not employ a central processing unit (CPU) or complicated computations.
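The abstract does not say how a 2-D search pattern is mapped onto the CAM. The sketch below assumes one simple scheme: the image pixels are stored in a CAM, a single multi-match search on the pattern's top-left pixel returns every candidate anchor at once, and each candidate is then verified, with a mismatch budget for approximate matching. The `CAM` class is a software stand-in and all names are hypothetical.

```python
class CAM:
    """Software stand-in for a CAM: a search returns every address holding the key."""

    def __init__(self, words):
        self.index = {}
        for addr, word in enumerate(words):
            self.index.setdefault(word, []).append(addr)

    def search(self, key):
        return self.index.get(key, [])  # multi-match: all addresses at once


def match_2d(image, pattern, max_mismatches=0):
    """Return (row, col) of every placement of `pattern` in `image` with at
    most `max_mismatches` differing pixels (0 means exact matching)."""
    h, w = len(image), len(image[0])
    ph, pw = len(pattern), len(pattern[0])
    cam = CAM([px for row in image for px in row])  # row-major storage
    hits = []
    for addr in cam.search(pattern[0][0]):
        y, x = divmod(addr, w)
        if y + ph > h or x + pw > w:
            continue  # pattern would run off the image
        mismatches = sum(image[y + dy][x + dx] != pattern[dy][dx]
                         for dy in range(ph) for dx in range(pw))
        if mismatches <= max_mismatches:
            hits.append((y, x))
    return hits
```

Anchoring on one pixel means the top-left pixel itself must match exactly even in approximate mode; a different decomposition would be needed to tolerate errors there.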
Yuto NAKANO Kazuhide FUKUSHIMA Shinsaku KIYOMOTO Tsukasa ISHIGURO Yutaka MIYAKE Toshiaki TANAKA Kouichi SAKURAI
KCipher-2 is a word-oriented stream cipher and an ISO/IEC 18033 standard. It is listed as a CRYPTREC cryptographic algorithm for Japanese governmental use. It consists of two feedback shift registers and a non-linear function. The size of each register in KCipher-2 is 32 bits, and the non-linear function mainly applies 32-bit operations; therefore, KCipher-2 can be implemented efficiently in software. SNOW-family stream ciphers are also word-oriented stream ciphers, and their high performance has already been demonstrated. We propose optimised implementations of KCipher-2 and compare their performance with that of the SNOW family and other eSTREAM portfolio ciphers. The fastest algorithm is SNOW 2.0, and KCipher-2 is the second fastest despite its complicated irregular clocking mechanism. However, KCipher-2 is the fastest of the feasible algorithms, as SNOW 2.0 has been shown to have a security flaw. We also optimise the hardware implementation for the Virtex-5 field-programmable gate array (FPGA) and present two implementations. The first is a fairly straightforward optimisation and achieves 16,153 Mbps with 732 slices. In the second, we duplicate the non-linear function by exploiting the structural advantage of KCipher-2 and achieve 17,354 Mbps with 813 slices. Our implementation of KCipher-2 is around three times faster than those of the SNOW family, and its efficiency, measured as throughput/area (Mbps/slice), is 3.6 times better than that of SNOW 2.0 and 8.5 times better than that of SNOW 3G. Synthesis was performed using Xilinx ISE version 12.4.
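The throughput/area metric can be checked directly from the two Virtex-5 results reported above. The sketch only reproduces that arithmetic; comparing against SNOW would need their figures, which the abstract does not give.

```python
# (throughput in Mbps, area in slices) for the two KCipher-2 implementations
IMPLEMENTATIONS = {
    "straightforward": (16_153, 732),
    "duplicated non-linear function": (17_354, 813),
}


def efficiency(throughput_mbps: float, slices: int) -> float:
    """Throughput/Area in Mbps per slice."""
    return throughput_mbps / slices


for name, (mbps, slices) in IMPLEMENTATIONS.items():
    print(f"{name}: {efficiency(mbps, slices):.2f} Mbps/slice")
```

This gives about 22.07 Mbps/slice for the first implementation and about 21.35 Mbps/slice for the second, so duplicating the non-linear function buys roughly 7% more throughput at roughly 3% lower efficiency.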
Tisheng ZHANG Hongping ZHANG Yalong BAN Kunlun YAN Xiaoji NIU Jingnan LIU
A deeply-coupled system can feed INS information into a GNSS receiver, so that signal tracking precision under dynamic conditions can be improved by reducing the tracking loop bandwidth without losing tracking reliability. In contrast to vector-based deep integration, scalar-based GNSS/INS deep integration is a relatively simple and practical architecture in which the individual DLLs and PLLs are retained. Because implementing a deeply-coupled system requires modifying the firmware of a commercial hardware GNSS receiver, very few studies on hardware-based deep integration have been reported, especially from academic institutions. This implementation complexity has impeded the development of deeply-coupled GNSS receivers. This paper introduces a scalar-based MEMS IMU/GNSS deeply-coupled system built on an integrated embedded hardware platform for real-time implementation. The design of the deeply-coupled technologies is described, including the system architecture, the model of the inertial-aided tracking loop, and the analysis of the relevant tracking errors. Implementation issues, including the platform structure, real-time optimization, and generation of aiding information, are discussed as well. The performance of the inertial-aided tracking loop and the final navigation solution of the developed deeply-coupled system are tested in dynamic road test scenarios created by a hardware GNSS/INS simulator with GPS L1 C/A signals and low-level MEMS IMU analog signal outputs. The dynamic tests show that the inertial-aided PLL allows a much narrower tracking loop bandwidth (e.g., 3 Hz) under dynamic scenarios, whereas the non-aided loop loses lock with such a narrow bandwidth once maneuvering begins. The dynamic zero-baseline tests show that Doppler observation errors can be reduced by more than 50% with the inertial-aided tracking loop. The corresponding navigation results also show that deep integration significantly improves velocity precision.
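The aiding information for a scalar deeply-coupled PLL is typically the carrier Doppler predicted from the INS velocity, which the loop then only has to correct by a small residual; this is what allows a bandwidth as narrow as 3 Hz. The sketch below uses the standard line-of-sight projection for GPS L1 and assumes the satellite velocity and the line-of-sight unit vector are already known. Receiver clock drift and satellite clock terms are ignored, and the function name is hypothetical.

```python
C = 299_792_458.0         # speed of light, m/s
F_L1 = 1_575.42e6         # GPS L1 carrier frequency, Hz
WAVELENGTH_L1 = C / F_L1  # about 0.1903 m


def aiding_doppler(receiver_vel, satellite_vel, los_unit):
    """Carrier Doppler (Hz) predicted from velocities (m/s, same frame) and the
    unit line-of-sight vector pointing from receiver to satellite.
    Positive Doppler means the range is shrinking (approaching)."""
    relative = [s - r for s, r in zip(satellite_vel, receiver_vel)]
    range_rate = sum(v * u for v, u in zip(relative, los_unit))
    return -range_rate / WAVELENGTH_L1
```

Moving 1 m/s toward a satellite shifts the L1 carrier by about +5.26 Hz, so even modest vehicle dynamics would push an unaided 3 Hz loop out of lock.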
Takahiro SUZUKI Takeshi IKENAGA
Scale-Invariant Feature Transform (SIFT) has recently attracted attention in computer vision as a robust keypoint detection algorithm that is invariant to scale, rotation, and illumination changes. However, its computational complexity is too high for practical real-time applications. This paper proposes a low-complexity keypoint extraction algorithm based on the SIFT descriptor and the use of a database, together with its real-time hardware implementation for Full-HD video. The proposed algorithm computes the SIFT descriptor at keypoints obtained by corner detection and selects a scale from the database. The keypoint detection and descriptor computation modules can be parallelized in hardware because, in the proposed algorithm, they do not depend on each other, in contrast to SIFT, where the descriptor depends on the computed scale. The processing time of descriptor computation in this hardware is independent of the number of keypoints because descriptor generation is pipelined at the pixel level. Evaluation results show that the proposed algorithm in software is 12 times faster than SIFT. Moreover, the proposed hardware on an FPGA is 427 times faster than SIFT and 61 times faster than the proposed algorithm in software. The proposed hardware performs keypoint extraction and matching at 60 fps for Full-HD video.
Shan-Chun KUO Hong-Yuan JHENG Fan-Chieh CHENG Shanq-Jang RUAN
In this letter, we present an inverse discrete cosine transform (IDCT) design for an energy-efficient watermarking mechanism based on DS-CDMA that achieves significant energy and area reductions. By exploiting the converged set of input data values through precomputation, the proposed one-dimensional IDCT is multiplierless hardware that differs from the Loeffler architecture and offers low complexity and low power consumption. Experimental results show that our design reduces energy consumption by 85.2% and area by 58.6%. Various spectral and spatial attacks were also tested to corroborate its robustness.
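The precomputation idea can be sketched in software: if the IDCT inputs can only take a few values, every coefficient-value product can be stored in advance and the transform reduces to look-ups and additions. The sketch assumes an 8-point orthonormal 1-D IDCT and a hypothetical input set {-1, 0, 1}; the abstract states neither the transform size nor the actual value set, and this is not the authors' circuit.

```python
import math

N = 8
VALUE_SET = (-1, 0, 1)  # hypothetical converged input alphabet


def basis(n, k):
    """Orthonormal DCT-III (IDCT) basis value for output n and coefficient k."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def idct_direct(X):
    """Reference IDCT with explicit multiplications."""
    return [sum(X[k] * basis(n, k) for k in range(N)) for n in range(N)]


# Precompute every product that can occur; at run time only look-ups and adds remain.
TABLE = {(n, k, v): v * basis(n, k)
         for n in range(N) for k in range(N) for v in VALUE_SET}


def idct_lookup(X):
    """Multiplierless IDCT: valid only when every X[k] is in VALUE_SET."""
    return [sum(TABLE[(n, k, X[k])] for k in range(N)) for n in range(N)]
```

In hardware the table entries become fixed constants selected by the input value, which is where the multipliers disappear; the table grows with the size of the value set, so the approach pays off only when that set is small.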
Kazuhiko MITSUYAMA Tetsuomi IKEDA Tomoaki OHTSUKI
Multiple-input multiple-output (MIMO) systems with antenna selection are practical in that they can alleviate the computational complexity at the receiver and achieve good reception performance. Channel correlation, not just carrier-to-noise ratio (CNR), has a great impact on reception performance in MIMO channels. We propose a practical receive antenna subset selection algorithm with reduced complexity that uses the condition number of the partial channel matrix and a predetermined CNR threshold. This paper describes the algorithm and its performance evaluation by both computer simulation and indoor experiments using a prototype receiver and received signals obtained in an actual mobile outdoor experiment. The results confirm that our proposed method provides good bit error rate performance by setting the CNR threshold properly.
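The abstract names the two ingredients of the selection rule but not how they are combined. The sketch below assumes one plausible combination for choosing 2 of several receive antennas: discard pairs containing an antenna whose CNR is below the threshold, then take the remaining pair whose 2×2 channel matrix has the smallest condition number. The rule, the fallback, and all names are hypothetical.

```python
import itertools
import math


def condition_number_2x2(a, b, c, d):
    """2-norm condition number of [[a, b], [c, d]] (complex entries allowed)."""
    t = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2  # trace of A^H A
    det = abs(a * d - b * c)                                   # |det A|
    if det == 0:
        return math.inf
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    # Eigenvalues of A^H A are (t +/- disc) / 2 and their product is |det A|^2,
    # so sigma_max / sigma_min = (t + disc) / (2 |det A|).
    return (t + disc) / (2 * det)


def select_antennas(H, cnr_db, threshold_db):
    """Pick 2 receive antennas (rows of H, each [h_1, h_2]).
    Hypothetical rule: among pairs whose CNRs both meet the threshold, take the
    best-conditioned pair; if none qualifies, take the two strongest antennas."""
    pairs = [p for p in itertools.combinations(range(len(H)), 2)
             if all(cnr_db[i] >= threshold_db for i in p)]
    if not pairs:
        strongest = sorted(range(len(H)), key=lambda i: cnr_db[i], reverse=True)[:2]
        return tuple(sorted(strongest))
    return min(pairs, key=lambda p: condition_number_2x2(*H[p[0]], *H[p[1]]))
```

A small condition number means the two spatial streams are well separated; the CNR threshold keeps the rule from picking a well-conditioned pair whose antennas receive too little signal.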
This paper proposes an enhanced feature detection method for the OFDM signals of digital TV (DTV) standards, namely Digital Video Broadcasting-Terrestrial (DVB-T) and Integrated Services Digital Broadcasting-Terrestrial (ISDB-T). The proposed method exploits a property of the time-domain sliding correlation between DTV signals and the pilots inserted into OFDM symbols. Some correlation outputs, called correlation peaks here, are much larger than the rest, and the distances between their positions in the correlation output sequence remain constant regardless of the timing of the received DTV signal. The proposed method then derives a sensing test statistic with improved SNR by aggregating the correlation peaks according to their positions. The performance of the proposed method is evaluated by both computer simulation and hardware implementation. Simulation results for DVB-T detection verify that the proposed method achieves better sensing performance than the optimal conventional sensing method: it reduces sampling time by about 25% for the same sensing performance while increasing computational complexity by only around 0.0001%. Hardware measurements further verify that the proposed method can accurately detect ISDB-T at an SNR as low as -14.5 dB using 8 OFDM symbol durations of samples.
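Because only the spacing between peaks is fixed, not their absolute positions, a detector can slide a comb of known spacing over the correlation magnitudes and keep the best sum. The sketch below shows this aggregation; the spacing, the number of peaks, and the function name are hypothetical, and the real spacing would follow from the DVB-T or ISDB-T pilot pattern.

```python
def aggregated_statistic(corr_mag, spacing, num_peaks):
    """Maximum over candidate offsets of the sum of `num_peaks` correlation
    magnitudes spaced `spacing` samples apart (num_peaks >= 1).
    The offset search covers the unknown timing of the received signal."""
    best = 0.0
    for offset in range(spacing):
        positions = range(offset, offset + spacing * num_peaks, spacing)
        if positions[-1] >= len(corr_mag):
            break  # later offsets would run past the end as well
        best = max(best, sum(corr_mag[p] for p in positions))
    return best
```

The statistic is then compared with a threshold chosen for the desired false-alarm rate, as with any energy-style detector.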
Jun GAO Minxuan ZHANG Zuocheng XING Chaochao FENG
This paper proposes the Reduced Explicitly Parallel Instruction Computing Processor (REPICP), an independently designed 64-bit general-purpose microprocessor. Based on the EPIC architecture, the REPICP overcomes the disadvantages of hardware-based superscalar and software-based Very Long Instruction Word (VLIW) designs and relies on cooperation between the compiler and hardware to enhance instruction-level parallelism (ILP). In the REPICP, we propose the Optimized Lock-Step execution Model (OLSM) and an instruction control pipeline method. We also propose several innovative reduction methods to optimize the design. The REPICP is fabricated in an Artisan 0.13 µm Nominal 1P8M process with 57 M transistors. The die size of the REPICP is 100 mm² (10 mm × 10 mm), and it consumes only 12 W when running at 300 MHz.
Mohamad Sofian ABU TALIP Takayuki AKAMINE Yasunori OSANA Naoyuki FUJITA Hideharu AMANO
Computational fluid dynamics (CFD) is used as a common design tool in the aerospace industry. UPACS, a CFD package, is convenient for users because a customized simulator can be built simply by selecting the desired functions. The problem is its computation speed, which is difficult to enhance with clusters because of its complex memory access patterns. As an economical solution, FPGA-based accelerators are a promising candidate. However, UPACS as a whole is too large to be implemented on a small number of FPGAs. For cost-efficient implementation, this paper proposes partial reconfiguration, which dynamically loads only the required functions. The MUSCL scheme, which is used frequently in UPACS, is selected as the target, and partial reconfiguration is applied to the flux limiter functions (FLFs) in MUSCL. Four FLFs are implemented for Turbulence MUSCL (TMUSCL) and eight for Convection MUSCL (CMUSCL). All FLFs are developed independently and separated from the top MUSCL module. At start-up, only the required FLFs are selected and deployed in the system without interfering with the other modules. This implementation reduced resource utilization by 44% to 63% and total power consumption by 33%. Configuration is 34 times faster than with the full reconfiguration method. All implemented functions achieved a speedup of at least 17 times over the software implementation.
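The limiter choice is what makes the FLFs interchangeable. The abstract does not list which limiters UPACS uses, so the sketch below takes four textbook limiters as examples, together with the usual MUSCL gradient ratio. Selecting entries from a dictionary at start-up is only a loose software analogy of loading the required FLF modules by partial reconfiguration; names are hypothetical.

```python
# Textbook flux limiters psi(r), where r is the ratio of consecutive gradients.
LIMITERS = {
    "minmod": lambda r: max(0.0, min(1.0, r)),
    "superbee": lambda r: max(0.0, min(2.0 * r, 1.0), min(r, 2.0)),
    "van_leer": lambda r: (r + abs(r)) / (1.0 + abs(r)),
    "monotonized_central": lambda r: max(0.0, min(2.0 * r, 0.5 * (1.0 + r), 2.0)),
}


def gradient_ratio(u_prev, u, u_next, eps=1e-12):
    """r_i = (u_i - u_{i-1}) / (u_{i+1} - u_i), guarded against division by zero."""
    denom = u_next - u
    if abs(denom) < eps:
        denom = eps if denom >= 0 else -eps
    return (u - u_prev) / denom


def configure(required):
    """Keep only the limiters a run needs (analogy to loading only required FLFs)."""
    return {name: LIMITERS[name] for name in required}
```

All four limiters return 1 on smooth data (r = 1) and 0 at extrema (r <= 0), which is why they can be swapped without changing the surrounding MUSCL module.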
Fengwei AN Tetsushi KOIDE Hans Jürgen MATTAUSCH
In this paper, we propose a hardware solution to the high computational demands of a nearest neighbor (NN) based multi-prototype learning system. The multiple prototypes are obtained by a high-speed K-means clustering algorithm that uses software-hardware cooperation, taking advantage of the flexibility of software and the efficiency of hardware. The one-nearest-neighbor (1-NN) classifier recognizes an object by searching for the prototype at the smallest Euclidean distance. The major deficiency of conventional implementations of both K-means and 1-NN is the high computational demand of nearest neighbor searching. This deficiency is resolved by an FPGA-implemented coprocessor, a VLSI circuit that searches for the nearest Euclidean distance. The coprocessor requires 12.9% of the logic elements and 58% of the block memory bits of an Altera Stratix III E110 FPGA device. The hardware communicates with the software through a PCI Express (×4) local-bus-compatible interface. We benchmark our learning system on the popular task of handwritten digit recognition, for which abundant previous work is available for comparison. On the MNIST database, we attained the most efficient accuracy rate of 97.91% with 930 prototypes, a learning speed of 1.3×10⁻⁴ s/sample, and a classification speed of 3.94×10⁻⁸ s/character.
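The operation the coprocessor accelerates is a nearest-prototype search. A plain software reference is sketched below; it compares squared Euclidean distances, which yields the same nearest prototype as the true distance and avoids a square root. Whether the coprocessor uses this shortcut is not stated in the abstract, and the names are hypothetical. The same search serves the K-means assignment step, where each sample is assigned to its nearest centroid.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(sample, prototypes):
    """Index of the prototype closest to `sample` in Euclidean distance
    (the square root is omitted because it does not change the arg-min)."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(sample, prototypes[i]))


def classify(sample, prototypes, labels):
    """1-NN classification over the prototype set."""
    return labels[nearest_prototype(sample, prototypes)]
```

The search is linear in the number of prototypes and in the feature dimension for every query, which is the cost that grows with 930 prototypes and motivates moving it into parallel hardware.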
Keiichi MIZUTANI Takehiro MIYAMOTO Kei SAKAGUCHI Kiyomichi ARAKI
This paper develops the first prototype hardware for a TDD two-way multi-hop relay network with MIMO network coding. Because conventional wireless multi-hop relay networks suffer from low data rates, TDD two-way multi-hop relay networks have recently been studied as a way to realize high data rates. In these networks, forward and backward streams are spatially multiplexed by using interference cancellation techniques such as MIMO beamforming or MIMO network coding. In this paper, a demonstration system for the TDD two-way multi-hop relay network with MIMO network coding (hereafter called the 2-way relay network) is developed using the prototype hardware. In the demonstration system, each transmitter performs network-coded broadcast and each receiver performs MIMO multiple access. Using the demonstration system, network throughput is measured in an indoor environment to prove the feasibility and effectiveness of the 2-way relay network. The results show that the 2-way relay network can achieve high network throughput approaching the theoretical upper bound, even in the low average end-to-end SNR region where the throughput of the direct link degrades severely. These results demonstrate the feasibility and effectiveness of the 2-way relay network in a real indoor environment.
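MIMO network coding combines the two streams in the signal domain, but the reason one relay broadcast can serve both directions is easiest to see with bit-level XOR network coding, a simpler substitute used here purely for illustration. The sketch assumes equal-length packets; function names are hypothetical.

```python
def relay_broadcast(packet_a: bytes, packet_b: bytes) -> bytes:
    """Relay combines the forward and backward packets into one broadcast."""
    return bytes(x ^ y for x, y in zip(packet_a, packet_b))


def decode(coded: bytes, own_packet: bytes) -> bytes:
    """Each end node removes its own packet to recover the other side's packet."""
    return bytes(x ^ y for x, y in zip(coded, own_packet))
```

Without coding, the relay would need two separate transmissions (one per direction); with it, a single broadcast delivers both, which is the source of the throughput gain in two-way relaying.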