Shan-Chun KUO Hong-Yuan JHENG Fan-Chieh CHENG Shanq-Jang RUAN
In this letter, a design of inverse discrete cosine transform for energy-efficient watermarking mechanism based on DS-CDMA with significant energy and area reduction is presented. Taking advantage of converged input data value set as a precomputation concept, the proposed one-dimensional IDCT is a multiplierless hardware which differs from Loeffler architecture and has benefits of low complexity and low power consumption. The experimental results show that our design can reduce 85.2% energy consumption and 58.6% area. Various spectrum and spatial attacks are also tested to corroborate the robustness.
Takeshi OKUMOTO Kumpei YOSHIKAWA Makoto NAGATA
An effective supply voltage monitor evaluates dynamic variation of (Vdd-Vss) within power rails of integrated circuits on a die. The monitor occupies an area of as small as 10.8 14.5 µm2 and is followed by backend digitizing circuits, both using 3.3 V thick oxide transistors in a 65 nm CMOS technology for covering all power domains from core circuits to peripheral I/O rings. A prototype demonstrates capturing of effective supply voltage waveforms in digital (shift registers) as well as in analog (4 bit Flash ADC) circuits.
Hiroshi NAKAMURA Weihan WANG Yuya OHTA Kimiyoshi USAMI Hideharu AMANO Masaaki KONDO Mitaro NAMIKI
Power consumption has recently emerged as a first class design constraint in system LSI designs. Specially, leakage power has occupied a large part of the total power consumption. Therefore, reduction of leakage power is indispensable for efficient design of high-performance system LSIs. Since 2006, we have carried out a research project called “Innovative Power Control for Ultra Low-Power and High-Performance System LSIs”, supported by Japan Science and Technology Agency as a CREST research program. One of the major objectives of this project is reducing the leakage power consumption of system LSIs by innovative power control through tight cooperation and co-optimization of circuit technology, architecture, and system software designs. In this project, we focused on power gating as a circuit technique for reducing leakage power. Temporal granularity is one of the most important issue in power gating. Thus, we have developed a series of Geysers as proof-of-concept CPUs which provide several mechanisms of fine-grained run-time power gating. In this paper, we describe their concept and design, and explain why co-optimization of different design layers are important. Then, three kinds of power gating implementations and their evaluation are presented from the view point of power saving and temporal granularity.
Jin Ho HAN Jong Hwan PARK Dong Hoon LEE
Broadcast encryption scheme with personalized messages (BEPM) is a new primitive that allows a broadcaster to encrypt both a common message and individual messages. BEPM is necessary in applications where individual messages include information related to user's privacy. Recently, Fujii et al. suggested a BEPM that is extended from a public key broadcast encryption (PKBE) scheme by Boneh, Gentry, and Waters. In this paper, we point out that 1) Conditional Access System using Fujii et al.'s BEPM should be revised in a way that decryption algorithm takes as input public key as well, and 2) performance analysis of Fujii et al.'s BEPM should be done depending on whether the public key is transmitted along with ciphertext or stored into user's device. Finally, we propose a new BEPM that is transmission-efficient, while preserving O(1) user storage cost. Our construction is based on a PKBE scheme suggested by Park, Kim, Sung, and Lee, which is also considered as being one of the best PKBE schemes.
Masashi NOMURA Shigemasa TAKAI
In this paper, we study decentralized supervisory control of timed discrete event systems, where we adopt the OR rule for fusing local enablement decisions and the AND rule for fusing local enforcement decisions. For any specification language satisfying a certain assumption, we propose a method for constructing a decentralized supervisor that achieves its sublanguage. The proposed method does not require computing the achieved sublanguage.
Alberto CALIXTO SIMON Saul E. POMARES HERNANDEZ Jose Roberto PEREZ CRUZ Pilar GOMEZ-GIL Khalil DRIRA
Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and secondly, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on the application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the amount of overhead per message and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that attacks the problem of message overhead. For this, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated and the result shows that the algorithm is scalable since the message overhead presents an under-linear growth as the number of processes and/or the message density increase.
Hiroyuki NOZAKA Tomisato MIURA Zhongxi ZHENG
Objective: The virtual slides are high-magnification whole digital images of histopathological tissue sections. The existing virtual slide system, which is optimized for scanning flat and smooth plane slides such as histopathological paraffin-embedded tissue sections, but is unsuitable for scanning irregular plane slides such as cytological smear slides. This study aims to develop a virtual slide system suitable for cytopathology slide scanning and to evaluate the effectiveness of multi-focus image fusion (MF) in cytopathological diagnosis. Study Design: We developed a multi-layer virtual slide scanning system with MF technology. Tumors for this study were collected from 21 patients diagnosed with primary breast cancer. After surgical extraction, smear slide for cytopathological diagnosis were manufactured by the conventional stamp method, fine needle aspiration method (FNA), and tissue washing method. The stamp slides were fixed in 95% ethanol. FNA and tissue washing samples were fixed in CytoRich RED Preservative Fluid, a liquid-based cytopathology (LBC). These slides were stained with Papanicolaou stain, and scanned by virtual slide system. To evaluate the suitability of MF technology in cytopathological diagnosis, we compared single focus (SF) virtual slide with MF virtual slide. Cytopathological evaluation was carried out by 5 pathologists and cytotechnologists. Results: The virtual slide system with MF provided better results than the conventional SF virtual slide system with regard to viewing inside cell clusters and image file size. Liquid-based cytology was more suitable than the stamp method for virtual slides with MF. Conclusion: The virtual slide system with MF is a useful technique for the digitization in cytopathology, and this technology could be applied to tele-cytology and e-learning by virtual slide system.
Joon-Young CHOI Hongju KIM Soonman KWON
We address the global asymptotic stability of FAST TCP, especially considering cross traffics, time-varying network feedback delay, and queuing delay dynamics at link. Exploiting the inherent dynamic property of FAST TCP, we construct two sequences that represent the lower and upper bound variations of the congestion window in time. By showing that the sequences converge to the equilibrium point of the congestion window, we establish that FAST TCP in itself is globally asymptotically stable without any specific conditions on the tuning parameter α or the update gain γ.
Xu LI Kai LU Xiaoping WANG Bin DAI Xu ZHOU
Existing large-scale systems suffer from various hardware/software failures, motivating the research of fault-tolerance techniques. Checkpoint-restart techniques are widely applied fault-tolerance approaches, especially in scientific computing systems. However, the overhead of checkpoint largely influences the overall system performance. Recently, the emerging byte-addressable, persistent memory technologies, such as phase change memory (PCM), make it possible to implement checkpointing in arbitrary data granularity. However, the impact of data granularity on the checkpointing cost has not been fully addressed. In this paper, we investigate how data granularity influences the performance of a checkpoint system. Further, we design and implement a high-performance checkpoint system named AG-ckpt. AG-ckpt is a hybrid-granularity incremental checkpointing scheme through: (1) low-cost modified-memory detection and (2) fine-grained memory duplication. Moreover, we also formulize the performance-granularity relationship of checkpointing systems through a mathematical model, and further obtain the optimum solutions. We conduct the experiments through several typical benchmarks to verify the performance gain of our design. Compared to conventional incremental checkpoint, our results show that AG-ckpt can reduce checkpoint data amount up to 50% and provide a speedup of 1.2x-1.3x on checkpoint efficiency.
Kazunori HAYASHI Masaaki NAGAHARA Toshiyuki TANAKA
This survey provides a brief introduction to compressed sensing as well as several major algorithms to solve it and its various applications to communications systems. We firstly review linear simultaneous equations as ill-posed inverse problems, since the idea of compressed sensing could be best understood in the context of the linear equations. Then, we consider the problem of compressed sensing as an underdetermined linear system with a prior information that the true solution is sparse, and explain the sparse signal recovery based on
Hung K. NGUYEN Peng CAO Xue-Xiang WANG Jun YANG Longxing SHI Min ZHU Leibo LIU Shaojun WEI
REMUS-II (REconfigurable MUltimedia System 2) is a coarse-grained dynamically reconfigurable computing system for multimedia and communication baseband processing. This paper proposes a real-time H.264 baseline profile encoder on REMUS-II. First, we propose an overall mapping flow for mapping algorithms onto the platform of REMUS-II system and then illustrate it by implementing the H.264 encoder. Second, parallel and pipelining techniques are considered for fully exploiting the abundant computing resources of REMUS-II, thus increasing total computing throughput and solving high computational complexity of H.264 encoder. Besides, some data-reuse schemes are also used to increase data-reuse ratio and therefore reduce the required data bandwidth. Third, we propose a scheduling scheme to manage run-time reconfiguration of the system. The scheduling is also responsible for synchronizing the data communication between tasks and handling conflict between hardware resources. Experimental results prove that the REMUS-MB (REMUS-II version for mobile applications) system can perform a real-time H.264/AVC baseline profile encoder. The encoder can encode CIF@30 fps video sequences with two reference frames and maximum search range of [-16,15]. The implementation, thereby, can be applied to handheld devices targeted at mobile multimedia applications. The platform of REMUS-MB system is designed and synthesized by using TSMC 65 nm low power technology. The die size of REMUS-MB is 13.97 mm2. REMUS-MB consumes, on average, about 100 mW while working at 166 MHz. To my knowledge, in the literature this is the first implementation of H.264 encoding algorithm on a coarse-grained dynamically reconfigurable computing system.
Yang YU Shiro HANDA Fumihito SASAMORI Osamu TAKYU
In this paper, through extrinsic information transfer (EXIT) band chart analysis, an adaptive iterative decoding approach (AIDA) is proposed to reduce the iterative decoding complexity and delay for finite-length differentially encoded Low-density parity-check (DE-LDPC) coded systems with multiple-symbol differential detection (MSDD). The proposed AIDA can adaptively adjust the observation window size (OWS) of the MSDD soft-input soft-output demodulator (SISOD) and the outer iteration number of the iterative decoder (consisting of the MSDD SISOD and the LDPC decoder) instead of setting fixed values for the two parameters of the considered systems. The performance of AIDA depends on its stopping criterion (SC) which is used to terminate the iterative decoding before reaching the maximum outer iteration number. Many SCs have been proposed; however, these approaches focus on turbo coded systems, and it has been proven that they do not well suit for LDPC coded systems. To solve this problem, a new SC called differential mutual information (DMI) criterion, which can track the convergence status of the iterative decoding, is proposed; it is based on tracking the difference of the output mutual information of the LDPC decoder between two consecutive outer iterations of the considered systems. AIDA using the DMI criterion can adaptively adjust the out iteration number and OWS according to the convergence situation of the iterative decoding. Simulation results show that compared with using the existing SCs, AIDA using the DMI criterion can further reduce the decoding complexity and delay, and its performance is not affected by a change in the LDPC code and transmission channel parameters.
For simply-typed term rewriting systems (STRSs) and higher-order rewrite systems (HRSs) a la Nipkow, we proposed a method for proving termination, namely the static dependency pair method. The method combines the dependency pair method introduced for first-order rewrite systems with the notion of strong computability introduced for typed λ-calculi. This method analyzes a static recursive structure based on definition dependency. By solving suitable constraints generated by the analysis, we can prove termination. In this paper, we extend the method to rewriting systems for functional programs (RFPs) with product, algebraic data, and ML-polymorphic types. Although the type system in STRSs contains only product and simple types and the type system in HRSs contains only simple types, our RFPs allow product types, type constructors (algebraic data types), and type variables (ML-polymorphic types). Hence, our RFPs are more representative of existing functional programs than STRSs and HRSs. Therefore, our result makes a large contribution to applying theoretical rewriting techniques to actual problems, that is, to proving the termination of existing functional programs.
Compressive sensing enables quite lower sampling rate compared with Nyquist sampling. As long as the signal is sparsity in some basis, the random sampling with CS can be employed. In order to make CS applied in the practice, the Analog to Information Converter (AIC) should be involved. Based on the Limited Random Sequence (LRS) modulation, the AIC with LRS can be designed with high performance according to the fixed sparsity. However, if the sparsity of the signal varies with time, the original AIC with LRS is not efficient. In this paper, the adaptive AIC which adapts its scheme of LRS according to the variation of the sparsity is proposed and the prototype system is designed. Due to the adaption of the AIC with the scheme of LRS, the sampling rate can be further reduced. The simulation results confirm the performance of the proposed adaptive AIC scheme. The prototype system can successfully fulfil the random sampling and adapt to the variation of sparsity, which verify and consolidate the validity and feasibility for the future implementation of adaptive AIC on chip.
Ramesh K. POKHAREL Xin LIU Dayang A.A. MAT Ruibing DONG Haruichi KANAYA Keiji YOSHIDA
This paper presents the design of a second-order and a fourth-order bandpass filter (BPF) for 60 GHz millimeter-wave applications in 0.18 µm CMOS technology. The proposed on-chip BPFs employ the folded open loop structure designed on pattern ground shields. The adoption of a folded structure and utilization of multiple transmission zeros in the stopband permit the compact size and high selectivity for the BPF. Moreover, the pattern ground shields obviously slow down the guided waves which enable further reduction in the physical length of the resonator, and this, in turn, results in improvement of the insertion losses. A very good agreement between the electromagnetic (EM) simulations and measurement results has been achieved. As a result, the second-order BPF has the center frequency of 57.5 GHz, insertion loss of 2.77 dB, bandwidth of 14 GHz, return loss less than 27.5 dB and chip size of 650 µm810 µm (including bonding pads) while the fourth-order BPF has the center frequency of 57 GHz, insertion loss of 3.06 dB, bandwidth of 12 GHz, return loss less than 30 dB with chip size of 905 µm810 µm (including bonding pads).
Luong Pham VAN Hoyoung LEE Jaehwan KIM Byeungwoo JEON
Blocking artifacts are introduced in many block-based coding systems, and its reduction can significantly improve the subjective quality of compressed video. The H.264/AVC uses an in-loop deblocking filter to remove the blocking artifacts. The filter considers some coding conditions in its adaptive deblocking filtering such as coded block pattern (CBP), motion vector, macroblock type, etc. for inter-predicted blocks, however, it does not consider much for intra-coded blocks. In this paper, we utilize the human visual system (HVS) characteristic and the local characteristic of image blocks to modify the boundary strength (BS) of the intra-deblocking filter in order to gain improvement in the subjective quality and also to reduce the complexity in filtering intra coded slices. In addition, we propose a low-complexity deblocking method which utilizes the correlation between vertical and horizontal boundaries of a block in inter coded slices. Experimental results show that our proposed method achieves not only significant gain in the subjective quality but also some PSNR gain, and reduces the computational complexity of the deblocking filter by 36.23% on average.
Koji KAJIWARA Tatsushi YAMASAKI
In this paper, we propose an optimal supervisory control method for discrete event systems (DESs) that have different preferences. In our previous work, we proposed an optimal supervisory control method based on reinforcement learning. In this paper, we extend it and consider a system that consists of several local systems. This system is modeled by a decentralized DES (DDES) that consists of local DESs, and is supervised by a central supervisor. In addition, we consider that the supervisor and each local DES have their own preferences. Each preference is represented by a preference function. We introduce the new value function based on the preference functions. Then, we propose the learning method of the optimal supervisor based on reinforcement learning for the DDESs. The supervisor learns how to assign the control pattern so as to maximize the value function for the DDES. The proposed method shows the general framework of optimal supervisory control for the DDES that consists of several local systems with different preferences. We show the efficiency of the proposed method through a computer simulation.
Jiwon JANG Seil JEON Younghan KIM
Flow mobility is an emerging technology to support flexible network selection for an application flow and to spread concentrated load to less-overloaded Internet access. Network-based flow mobility (FMO) does not require a massive amount of software logic and system resources on the mobile node. Under this approach, there are two kinds of modes available: network-initiated and user-initiated. Network-initiated FMO decides the best access network suited for a specific flow, but the decisions depend on the operator's policy. Therefore, it has limitations in supporting the user's preference and private network selection. In the user-initiated mode, users can hand off specific flow so that information of both the current user's preference and the conditions of private network are reflected. User-initiated flow mobility method has not been what should be specified and how it can be supported with the protocol sequence for real deployment. This paper extends Internet Exchange Key v2 (IKEv2) and an Attach request message to support user-initiated FMO when a flow moves between 3G and Wi-Fi access. Through performance analysis, we confirm that user-initiated FMO is superior to network-initiated FMO in terms of signaling overhead and handover latency costs.
In this note we suggest a new parallelizable mode of operation for message authentication codes (MACs). The new MAC algorithm iterates a pseudo-random function (PRF) FK:{0,1}m → {0,1}n, where K is a key and m,n are positive integers such that m ≥ 2n. The new construction is an improvement over a sequential MAC algorithm presented at FSE2008, solving positively an open problem posed in the paper – the new mode is capable of fully parallel execution while achieving rate-1 efficiency and “full n-bit” security. Interestingly enough, PMAC-like parallel structure, rather than CBC-like serial iteration, has beneficial side effects on security. That is, the new construction is provided with a more straightforward security proof and with an even better (“
Chizu MATSUMOTO Yuichi HAMAMURA Michinobu NAKAO Kaname YAMASAKI Yoshikazu SAITO Shun'ichi KANEKO
Repairing embedded memories (e-memories) on an advanced system-on-chip (SoC) product is a key technique used to improve product yield. However, increasing the die area of SoC products equipped with various types of e-memories on the die is an issue. A fuse scheme can be used to resolve this issue. However, several fuse schemes that have been proposed to decrease the die area result in an increased repair time. Therefore, in this paper, we propose a novel fuse scheme that decreases both die area and repair time. Moreover, our approach is applied to a 65 nm SoC product. The results indicate that the proposed fuse scheme effectively decreases the die area and repair time of advanced SoC products.