Sangjin KIM Jihwan LIM Jaehong HAN Heekuck OH
In an RFID search protocol, a reader uses a designated query instead of the unspecified query commonly used in RFID authentication protocols. Due to this fundamental difference, techniques used in RFID authentication protocols may not be suitable for RFID search protocols. Tan et al.'s protocol, however, is based on techniques used in previous works, such as using random values. In this paper, we propose two RFID search protocols, one based on static ID and the other based on dynamic ID, neither of which requires additional measures to satisfy the security requirements of RFID protocols. We achieve this by using counters.
Yi WAN Takuya ASAKA Tatsuro TAKAHASHI
In P2P content distribution systems, there are many cases in which the content can be classified into hierarchically organized categories. In this paper, we propose a hybrid overlay network design suitable for such content called Pastry/NSHCC (Pastry for Non-Strictly Hierarchically Categorized Content). The semantic information of classification hierarchies of the content can be utilized regardless of whether they are in a strict tree structure or not. By doing so, the search scope can be restrained to any granularity, and the number of query messages also decreases while maintaining keyword searching availability. Through simulation, we showed that the proposed method provides better performance and lower overhead than unstructured overlays exploiting the same semantic information.
This letter proposes a Cell-based Hybrid Index (CHI) for energy-conserving k Nearest Neighbor search on air. The proposed CHI provides global knowledge of the data distribution for fast determination of the search space and local knowledge for efficient pruning of data items. Simulations show that CHI outperforms the existing indexing schemes in terms of tuning time and energy efficiency. With respect to access time, it outperforms all of them except the distributed indexing scheme, which is optimized for access time.
Reconfigurable architectures are one of the most promising solutions satisfying both performance and flexibility. However, reconfiguration overhead in those architectures makes them inappropriate for repetitive reconfigurations. In this paper, we introduce a configuration sharing technique to reduce reconfiguration overhead between similar applications using static partial reconfiguration. Compared to traditional resource sharing, which configures multiple temporal partitions simultaneously and employs a time-multiplexing technique, the proposed configuration sharing reconfigures a device incrementally as an application changes and requires a backend adaptation to reuse configurations between applications. Adopting a data-flow intermediate representation, our compiler framework extends a min-cut placer and a negotiation-based router to deal with configuration sharing. The results show that the framework reduces configuration time by 20% at the expense of 1.9% of computation time on average.
Xiang ZHANG Ping LU Hongbin SUO Qingwei ZHAO Yonghong YAN
In this letter, a recently proposed clustering algorithm named affinity propagation is introduced for the task of speaker clustering. This novel algorithm exhibits fast execution speed and finds clusters with low error. However, experiments show that the speaker purity of affinity propagation is not satisfactory. Thus, we propose a hybrid approach that combines affinity propagation with agglomerative hierarchical clustering to improve the clustering performance. Experiments show that, compared with traditional agglomerative hierarchical clustering, the hybrid method achieves better performance on the test corpora.
Tein-Yaw CHUNG Fong-Ching YUAN Yung-Mu CHEN Baw-Jhiune LIU
Transparently selecting a proper network connection for voice communication will be a fundamental requirement in future multi-mode heterogeneous wireless networks. This paper presents a smart session selection (S3) scheme to meet this requirement. Instead of selecting the best access network as in the conventional Always Best Connected (ABC) paradigm, S3 enables users to select the best network connection, which consists of a source and destination access network pair, to satisfy quality constraints and users' preferences. To support S3, we develop a user profile to specify network connection priority. Meanwhile, the IP multimedia subsystem (IMS) is extended to make smart decisions for users. Finally, the Analytic Hierarchy Process (AHP) is used to recommend a network connection with the assistance of the user profile and IMS signaling. An example is given to show that AHP can successfully select a good network connection that fulfills the requirements of users.
Liang CHEN Le JIN Feng HE Hanwen CHENG Lenan WU
In next-generation mobile multimedia communications, different wireless access networks are expected to cooperate. However, choosing an optimal transmission path in this scenario is a challenging task. This paper focuses on the problem of selecting the optimal access network for multicast services in cooperative mobile and broadcasting networks. An algorithm is proposed that considers multiple decision factors and multiple optimization objectives. An analytic hierarchy process (AHP) method is applied to schedule the service queue, and an artificial neural network (ANN) is used to improve the flexibility of the algorithm. Simulation results show that, by applying the AHP method, a group of weight ratios can be obtained to improve the performance of multiple objectives. The ANN method is also effective in adaptively adjusting the weight ratios when a new user waiting threshold is generated.
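As a rough illustration of how AHP turns pairwise judgments into weight ratios, the following minimal sketch uses the common row geometric-mean approximation of the principal eigenvector, together with Saaty's consistency ratio. The function names, the three criteria, and the comparison values are assumptions made for this example; they are not taken from the paper.

```python
import math

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Estimate lambda_max from A*w and return CR = CI / RI (n in 3..5)."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical criteria: delay, cost, signal strength.
# Entry [i][j] says how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)
print([round(w, 3) for w in weights], round(cr, 4))
```

A consistency ratio below 0.1 is conventionally taken to mean that the pairwise judgments are coherent enough to use.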
Thomas Edison YU Tomokazu YONEDA Krishnendu CHAKRABARTY Hideo FUJIWARA
Rapid advances in semiconductor manufacturing technology have led to higher chip power densities, which places greater emphasis on packaging and temperature control during testing. For system-on-chips, peak power-based scheduling algorithms have been used to optimize tests under specified power constraints. However, imposing power constraints does not always solve the problem of overheating due to the non-uniform distribution of power across the chip. This paper presents a TAM/Wrapper co-design methodology for system-on-chips that ensures thermal safety while still optimizing the test schedule. The method combines a simplified thermal-cost model with a traditional bin-packing algorithm to minimize test time while satisfying temperature constraints. Furthermore, for temperature checking, thermal simulation is done using cycle-accurate power profiles for more realistic results. Experiments show that even a minimal sacrifice in test time can yield a considerable decrease in test temperature as well as the possibility of further lowering temperatures beyond those achieved using traditional power-based test scheduling.
Ignacio ALGREDO-BADILLO Claudia FEREGRINO-URIBE Rene CUMPLIDO Miguel MORALES-SANDOVAL
MD5 is a cryptographic algorithm used for authentication. When implemented in hardware, its performance is affected by the data dependency of the iterative compression function. In this paper, a new functional description is proposed with the aim of achieving higher throughput by means of reducing the critical path and latency. This description can be used in similar structures of other hash algorithms, such as SHA-1, SHA-2 and RIPEMD-160, which have comparable data dependence. The proposed MD5 hardware architecture achieves a high throughput/area ratio. Results of implementation in an FPGA are presented and discussed, as well as comparisons against related works.
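The data dependency mentioned above can be seen in a plain software reference of the MD5 compression loop as specified in RFC 1321: each of the 64 steps needs the register values produced by the previous step, so the steps cannot simply run in parallel. The sketch below is only such a reference, not the functional description or hardware architecture proposed in the paper; the helper names `rotl` and `md5_hex` are chosen for this example.

```python
import math
import struct

# Per-step left-rotation amounts and additive constants from RFC 1321.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
CONSTS = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK

def md5_hex(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # Padding: a single 1 bit, zeros up to 56 mod 64 bytes, then the length.
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack("<Q", bit_len)
    for offset in range(0, len(message), 64):
        words = struct.unpack("<16I", message[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + CONSTS[i] + words[g]) & MASK
            # The new b depends on the old a, b, c and d: this chain of
            # additions and the rotation forms the critical path of one step.
            a, d, c = d, c, b
            b = (b + rotl(f, SHIFTS[i])) & MASK
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0).hex()

print(md5_hex(b"abc"))
```

Only the update of `b` involves real computation in each step; the other three registers are shifted copies, which is why the step's additions and rotation determine the critical path.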
Yasuhiro KOBAYASHI Masanori HARIYAMA Michitaka KAMEYAMA
Hierarchical approaches using multi-resolution images are well-known techniques for reducing the amount of computation without degrading quality. One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. The complexity of the interconnection network mainly depends on memory allocation, which maps pixels onto memory modules and determines the required number of memory modules. This paper presents a memory allocation method to minimize the number of memory modules for image processing using multi-resolution images. For efficient search, the proposed method exploits the regularity of window-type image processing. A practical example demonstrates that the number of memory modules is reduced to less than 14% of that of conventional methods.
Hanieh AMIRSHAHI Satoshi KONDO Koichi ITO Takafumi AOKI
In this paper, we propose an image completion algorithm which takes advantage of the countless number of images available on Internet photo sharing sites to replace occlusions in an input image. The algorithm 1) automatically selects the most suitable images from a database of downloaded images and 2) seamlessly completes the input image using the selected images with minimal user intervention. Experimental results on input images captured at various locations and scene conditions demonstrate the effectiveness of the proposed technique in seamlessly reconstructing user-defined occlusions.
Point Pattern Matching (PPM) is an essential problem in many image analysis and computer vision tasks. This paper presents a two-stage algorithm for the PPM problem using ellipse fitting and dual Hilbert scans. In the first matching stage, transformation parameters are coarsely estimated by using four node points of ellipses which are fitted by Weighted Least Square Fitting (WLSF). Then, Hilbert scans are used in two ways in the second matching stage: they are applied to the similarity measure and they are also used for search space reduction. The similarity measure, named Hilbert Scanning Distance (HSD), can be computed quickly by converting the 2-D coordinates of 2-D points into 1-D space information using a Hilbert scan. In addition, the N-D search space can be converted to a 1-D search space sequence by an N-D Hilbert scan, and an efficient search strategy is proposed on the 1-D search space sequence. In the experiments, we use both simulated point set data and real fingerprint images to evaluate the performance of our algorithm, and our algorithm gives satisfactory results in both accuracy and efficiency.
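The conversion of 2-D coordinates into 1-D positions along a Hilbert scan can be sketched with the widely used iterative bit-manipulation algorithm. This is a generic illustration only: the function name `hilbert_index` is an assumption for this example, the grid side must be a power of two, and the paper's HSD measure and N-D scan are not reproduced here.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its position
    along the Hilbert curve, an integer in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order a small, made-up point set along the scan of a 4 x 4 grid.
points = [(3, 0), (0, 2), (1, 1), (0, 0)]
ordered = sorted(points, key=lambda p: hilbert_index(4, p[0], p[1]))
print(ordered)
```

Because consecutive positions on the curve are always neighboring cells, points that are close in the 1-D sequence are also close in the plane, which is what makes a 1-D comparison of point sets meaningful.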
Masanori HARIYAMA Shota ISHIHARA Michitaka KAMEYAMA
This paper presents a novel asynchronous architecture for field-programmable gate arrays (FPGAs) to reduce power consumption. In the dynamic power consumption of conventional FPGAs, the power consumed by the switch blocks and clock distribution is dominant, since FPGAs have complex switch blocks and a large number of registers for high programmability. To reduce the power consumption of the switch blocks and clock distribution, an asynchronous bit-serial architecture is proposed. To ensure correct operation independent of data-path lengths, we use level-encoded dual-rail encoding and propose an area-efficient implementation of it. The proposed field-programmable VLSI (FPVLSI) is implemented in a 90 nm CMOS technology. The delay and the power consumption of the proposed FPVLSI are respectively 61% and 58% of those of 4-phase dual-rail encoding, which is the most common delay-insensitive encoding.
Ming-Der SHIEH Tai-Ping WANG Chien-Ming WU
We present a systematic and efficient way of managing the path metric memory and simplifying its connection network to the add-compare-select unit (ACSU) for Viterbi decoder (VD) design. Using the derived equations for memory partition and add-compare-select (ACS) arrangement together with the extended in-place scheduling scheme proposed in this work, we can increase the memory bandwidth for conflict-free path metric accesses with hardwired interconnection between the path metric memory and the ACSU. Compared with existing work, the developed architecture possesses the following advantages: (1) Each partitioned memory bank can be treated as a local memory of a specific processing element inside the ACSU, with hardwired interconnection, so that the interconnect complexity is reduced significantly. (2) The partitioned memory banks can be merged into only two pseudo-banks regardless of the number of adopted ACS processing elements. This not only greatly simplifies the design of the address generation unit, but also reduces the physical size of the required memory. (3) The implementation can be accomplished in a systematic way with regular and simple control circuitry. Experimental results demonstrate the effectiveness of the developed architecture, and the benefit will be more apparent for convolutional codes with large memory order.
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Takayuki GYOHTEN Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents an architecture that integrates content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way of processing the repeated arithmetic operation types in multimedia applications. The architecture proposed in this paper additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed architecture can consequently realize efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the frequently used JPEG image-compression application show that the necessary number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The measured performance in Mpixel/mm2 is better by factors of 3.3 and 4.4 than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
Chuzo IWAMOTO Harumasa YONEDA Kenichi MORITA Katsunobu IMAI
We present a tight time-hierarchy theorem for nondeterministic cellular automata by using a recursive padding argument. It is shown that, if t2(n) is a time-constructible function and t2(n) grows faster than t1(n+1), then there exists a language which can be accepted by a t2(n)-time nondeterministic cellular automaton but not by any t1(n)-time nondeterministic cellular automaton.
Modern microprocessors achieve high application performance at an acceptable level of power dissipation. In terms of the power-performance trade-off, the instruction window is particularly important. This is because enlarging the window size achieves high performance, but naive scaling of the conventional instruction window can severely increase the complexity and power consumption. In this paper, we propose low-power instruction window techniques for contemporary microprocessors. First, the small reorder buffer (SROB) reduces power dissipation by deferred allocation and early release. The deferred allocation delays the SROB allocation of instructions until all their data dependencies are resolved. Then, the instructions are executed in program order and they are released faster from the SROB. This results in higher resource utilization and lower power consumption. Second, we replace a conventional issue queue with a direct lookup table (DLT) with an efficient tag translation technique. The translation scheme resolves the instruction dependency, especially for the case of one producer to multiple consumers. The efficiency of the translation scheme stems from the fact that the vast majority of instruction dependencies exist within a basic block. Experimental results show that our proposed design reduces the power consumption significantly for SPEC2000 benchmarks.
Kang ZHAO Jinian BIAN Sheqin DONG Yang SONG Satoshi GOTO
Programming a multiprocessor system-on-chip (MPSoC) requires partitioning sequential reference programs onto multiple processors running in parallel. However, designers still need to partition the code manually due to the lack of automated partition techniques. To settle this issue, this paper proposes a partition exploration algorithm based on search space smoothing techniques, and implements the proposed method using a commercial extensible processor (Xtensa LX2 processor from Tensilica Inc.). We have verified the feasibility of the algorithm by implementing the MPEG2 benchmark on the Xtensa-based two-processor system. The final experimental results indicate a performance improvement by a factor of at least 1.6 compared to the single-processor system.
Shuichi SAKAI Masahiro GOSHIMA Hidetsugu IRIE
This paper presents a processor architecture which provides a much higher level of dependability than current ones. Its features are: (1) fault tolerance and secure processing are integrated into a modern superscalar VLSI processor; (2) light-weight, effective soft-error tolerant mechanisms are proposed and evaluated; (3) timing errors on random logic and registers are prevented by low-overhead mechanisms; (4) program behavior is hidden from the outer world by the proposed address translation methods; (5) information leakage can be avoided by attaching policy tags to all data and monitoring them for each instruction execution; (6) injection attacks are avoided with much higher accuracy than in current systems by providing tag tracking; (7) the overall structure of the dependable processor is proposed with a dependability manager which controls the detection of illegal conditions and recovers to the normal mode; and (8) an FPGA-based testbed system is developed where the system clock and the voltage are intentionally varied for experiments. The paper presents the fundamental scheme for dependability, the elemental technologies for dependability, and the whole architecture of the ultra dependable processor. After showing them, the paper concludes with future work.
Ruicheng DAI Degui CHEN Xingwen LI Chunping NIU Weixiong TONG Honggang XIANG
The gas-puffer effect has an important influence on the interruption capability of a molded case circuit breaker (MCCB). In this paper, on the basis of a simplified model of an arc chamber with a single break, the effect of the back-volume of an arc-quenching chamber on arc behavior in an MCCB is investigated. First, using a 2-D optical-fiber arc-motion measurement system, experiments are performed to study the effect of back-volume on the arc motion and gas pressure in an arc-quenching chamber. We demonstrate that the smaller the back-volume of the arc-quenching chamber, the higher the pressure and the better the arc motion. Then, corresponding to the above experiments, the gas pressure inside the arc-quenching chamber is calculated using the integral conservation equation. The simulation results are consistent with the experimental results.