RXv2 is the new generation of Renesas's processor architecture for microcontrollers with high-capacity flash memory. An enhanced instruction set and a pipeline structure with an advanced fetch unit (AFU) provide an effective balance between low power consumption and high processing performance. Enhanced instructions, such as DSP functions and floating-point operations, and a five-stage dual-issue pipeline synergistically boost the performance of digital signal applications. The RXv2 processor delivers 1.9 to 3.7 times the cycle performance of the RXv1 in these applications. The reduction in the number of flash memory accesses achieved by the AFU is the dominant factor in reducing power consumption. The AFU of RXv2 benefits from adopting a branch target cache, which occupies a comparatively smaller area than typical cache systems. High code density delivers low power consumption by reducing instruction memory bandwidth. The implementation of RXv2 delivers up to a 46% reduction in static code size and up to a 30% reduction in dynamic code size relative to RISC architectures. RXv2 reaches 4.0 CoreMark/MHz and operates at up to 240MHz. The RXv2 processor delivers approximately 2.2 to 5.7 times the power efficiency of the RXv1. The RXv2 microprocessor achieves the best possible computing performance in various applications, such as building automation, medical, motor control, e-metering, and home appliances, which demand higher memory capacity, frequency, and processing performance.
Keisuke OKUNO Toshihiro KONISHI Shintaro IZUMI Masahiko YOSHIMOTO Hiroshi KAWAGUCHI
We present a low-jitter design for a 10-bit second-order frequency shift oscillator time-to-digital converter (FSOTDC). We analyze the relation between performance and FSOTDC parameters and provide insight to support the design of the FSOTDC. Results show that oscillator jitter limits the FSOTDC resolution, particularly during the first stage. To estimate and design an FSOTDC, the frequency shift oscillator requires an inverter of a certain size. In a standard 65-nm CMOS process, an SNDR of 64dB is achievable at an input signal frequency of 10kHz and a sampling clock of 2MHz. Measurements of the test chip match the analyses.
Shinnosuke YOSHIDA Youhua SHI Masao YANAGISAWA Nozomu TOGAWA
As process technologies advance, timing-error correction techniques have become increasingly important. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points, of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) at the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus incur too much area overhead. How to determine these check points is therefore very important. In this paper, we propose an effective STEPC insertion algorithm that minimizes area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.
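The max-flow and min-cut idea mentioned in this abstract can be illustrated with a small sketch. It is not the authors' formulation: it assumes, purely for illustration, that the circuit is given as a directed graph, that one checker per gate is the cost to minimize, and that every input-to-output path passes through at least one internal gate. The function name `min_vertex_cut` and the toy netlist are hypothetical.

```python
from collections import defaultdict, deque

def min_vertex_cut(edges, sources, sinks):
    """Smallest set of internal nodes that lies on every source-to-sink path.

    Illustrative model only: each internal node is split into an 'in' and an
    'out' half joined by a unit-capacity edge, and a max-flow (Edmonds-Karp)
    is computed; the saturated unit edges on the cut are the chosen nodes.
    """
    nodes = {v for e in edges for v in e}
    terminals = set(sources) | set(sinks)
    big = len(nodes) + 1                      # stands in for "uncuttable"
    cap = defaultdict(lambda: defaultdict(int))
    for v in nodes:
        cap[(v, "in")][(v, "out")] = big if v in terminals else 1
    for u, v in edges:
        cap[(u, "out")][(v, "in")] = big
    src, snk = "SRC", "SNK"                   # super source and super sink
    for s in sources:
        cap[src][(s, "in")] = big
    for t in sinks:
        cap[(t, "out")][snk] = big

    def reachable_from_source():
        """BFS over edges with remaining capacity; returns the parent map."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if snk not in parent:
            break
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck, v = big, snk
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = snk
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # A node is on the cut if its 'in' half is reachable but its 'out' is not.
    return {v for v in nodes
            if (v, "in") in parent and (v, "out") not in parent}

# Toy netlist: two inputs feed gate g1, which fans out to g2 and g3.
toy_edges = [("a", "g1"), ("b", "g1"), ("g1", "g2"), ("g1", "g3"),
             ("g2", "o1"), ("g3", "o2")]
cut = min_vertex_cut(toy_edges, sources=["a", "b"], sinks=["o1", "o2"])
```

In this toy netlist a single checker at `g1` covers both output paths, whereas placing one checker per output path would need two.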
Shuping ZHANG Jinjia ZHOU Dajiang ZHOU Shinji KIMURA Satoshi GOTO
Motion estimation (ME) is a key encoding component of almost all modern video coding standards. ME contributes significantly to video coding efficiency, but it also consumes the most power of any component in a video encoder. In this paper, an ME processor with a 3D stacked memory architecture is proposed to reduce memory and core power consumption. First, a memory die is designed and stacked with the ME die. By adding face-to-face (F2F) pads and through-silicon-via (TSV) definitions, 2D electronic design automation (EDA) tools can be extended to support the proposed 3D stacking architecture. Moreover, a special memory controller is applied to control data transmission and timing between the memory die and the ME processor die. Finally, a 3D physical design is completed for the entire system. This design includes TSV/F2F placement, floor plan optimization, and power network generation. Compared to 2D technology, the number of input/output (IO) pins is reduced by 77%. After optimizing the floor plan of the processor die and memory die, the routing wire lengths are reduced by 13.4% and 50%, respectively. The stacked static random access memory contributes most of the power reduction in this work. The simulation results show that the design can support real-time 720p @ 60fps encoding at 8MHz using less than 65mW of power, which is much better than the state-of-the-art ME processor.
Hirotoshi HONMA Yoko NAKAJIMA Yuta IGARASHI Shigeru MASUYAMA
A hinge vertex is a vertex in an undirected graph whose removal makes the distance between some two other vertices longer than before. Identifying hinge vertices in a graph can help detect critical nodes in communication network systems, which is useful for making them more stable. For finding them, an O(n^3) time algorithm was developed for a simple graph, and linear-time algorithms were developed for interval and permutation graphs, respectively. Recently, the maximum detour hinge vertex problem was defined by Honma et al. For a hinge vertex u in a graph, the detour degree of u is the largest distance between any pair of vertices x and y adjacent to u after u is removed. A hinge vertex with the largest detour degree in G is defined as the maximum detour hinge vertex of G. This problem is motivated by practical applications, such as network stabilization with a limited cost, i.e., by enhancing the reliability of the maximum detour hinge vertex, the stability of the network is much improved. We previously developed an O(n^2) time algorithm for solving this problem on an interval graph. In this study, we propose an algorithm that identifies the maximum detour hinge vertex on a permutation graph in O(n^2) time, where n is the number of vertices in the graph.
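The definitions of hinge vertex and detour degree can be made concrete with a brute-force sketch. This is not the O(n^2) permutation-graph algorithm of the abstract; it is a plain breadth-first-search check on an arbitrary small graph, and the function names and the example graph are assumptions made for illustration. A vertex u is treated as a hinge vertex when removing it lengthens the distance between some pair of its neighbours.

```python
from collections import deque
from itertools import combinations

INF = float("inf")

def bfs_dist(adj, src, removed=None):
    """Shortest-path distances from src, ignoring the vertex `removed`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w != removed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def detour_degree(adj, u):
    """Largest post-removal distance between two neighbours of u whose
    distance grows when u is removed; 0 if u is not a hinge vertex."""
    best = 0
    for x, y in combinations(sorted(adj[u]), 2):
        before = bfs_dist(adj, x).get(y, INF)
        after = bfs_dist(adj, x, removed=u).get(y, INF)
        if after > before:
            best = max(best, after)
    return best

def max_detour_hinge_vertex(adj):
    """Vertex with the largest detour degree, or None if no hinge vertex exists."""
    u = max(adj, key=lambda v: detour_degree(adj, v))
    return u if detour_degree(adj, u) > 0 else None

# Example: a 6-cycle 0-1-2-3-4-5-0 with the extra chord 1-5.
graph = {
    0: {1, 5},
    1: {0, 2, 5},
    2: {1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {0, 1, 4},
}
degrees = {v: detour_degree(graph, v) for v in graph}
```

In this example vertex 0 is not a hinge vertex because its two neighbours are adjacent, while removing vertex 1 stretches the distance between vertices 0 and 2 from 2 to 4.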
Dieu-Huong VU Yuki CHIBA Kenro YATAKE Toshiaki AOKI
Verification of a design with respect to its requirement specification is important to prevent errors before constructing an actual implementation. Existing works focus on verification where the specifications are described using temporal logics or using the same languages as those used to describe the designs. Our work considers cases where the specifications and the designs are described using different languages. To verify such cases, we propose a framework to check whether a design conforms to its specification based on their simulation relation. Specifically, we define the semantics of the specifications and the designs commonly as labelled transition systems (LTSs). We adopt LTSs because they can represent information about the system and the actions that the system may perform, as well as the effects of these actions. Then, we check whether a design conforms to its specification based on the simulation relation of their LTSs. In this paper, we present our framework for the verification of reactive systems, and we present the case where the specifications and the designs are described in Event-B and Promela/Spin, respectively. We also present two case studies with the results of several experiments to illustrate the applicability of our framework to practical systems.
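A simulation check between two LTSs can be sketched as follows. This is a generic greatest-fixed-point computation, not the framework of the abstract: the dictionary encoding of an LTS, the function name `is_simulated_by`, the direction of the check (the specification simulates the design), and the vending-machine example are all assumptions made for illustration. The translation from Event-B or Promela into LTSs is not modelled.

```python
def is_simulated_by(design, spec, design_init, spec_init):
    """Return True if the spec LTS simulates the design LTS.

    Each LTS is a dict mapping a state to a set of (action, next_state)
    pairs. Start from all state pairs and repeatedly drop any pair (d, s)
    where the design state d has a move that the spec state s cannot match.
    """
    relation = {(d, s) for d in design for s in spec}
    changed = True
    while changed:
        changed = False
        for d, s in list(relation):
            for action, d_next in design[d]:
                matched = any(
                    action == spec_action and (d_next, s_next) in relation
                    for spec_action, s_next in spec[s]
                )
                if not matched:
                    relation.discard((d, s))
                    changed = True
                    break
    return (design_init, spec_init) in relation

# Hypothetical specification: after a coin, either tea or coffee is allowed.
spec_lts = {
    "s0": {("coin", "s1")},
    "s1": {("tea", "s0"), ("coffee", "s0")},
}
# Hypothetical design: after a coin, only tea is served.
design_lts = {
    "d0": {("coin", "d1")},
    "d1": {("tea", "d0")},
}
conforms = is_simulated_by(design_lts, spec_lts, "d0", "s0")
```

Here every move of the design can be matched by the specification, so the check succeeds; swapping the two roles fails because the design cannot match the `coffee` move.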
YoungKyu JANG Changnoh YOON Ik-Joon CHANG Jinsang KIM
Parameter variations in nanometer process technology are one of the major design challenges. They increase the delay on the critical path and may change the logic level of internal nodes. The basic concept for solving these problems at the circuit level, design-for-variability (DFV), is to add an error-handling circuit to conventional circuits so that they are robust to nanometer-related variations. State-of-the-art variation-aware flip-flops have mainly evolved from aggressive dynamic voltage and frequency scaling (DVFS)-based low-power application systems, which handle errors due to the scaled supply voltage. However, they only detect timing errors and cannot correct them. We propose a variation-aware flip-flop that can detect and correct timing errors efficiently. The experimental results show that the proposed variation-aware flip-flop is more robust and consumes less power than the existing approaches.
Yinan SUN Yongpan LIU Zhibo WANG Huazhong YANG
Function speculation design with error recovery mechanisms is quite promising due to its high performance and low area overhead. Previous work has focused on two-stage function speculation and thus lacks a systematic way to address the challenges of the multistage function speculation approach. This paper proposes multistage function speculation with adaptive predictors and applies it in a novel adder. We derived analytical performance and area models for the design and validated them in our experiments. Based on those models, a general methodology is presented to guide design optimization. Both analytical proofs and experimental results on the fabricated chips show that the proposed adder's delay and area have a logarithmic and a linear relationship with its bit width, respectively. Compared with the DesignWare IP, the proposed adder provides the same performance with 6-17% area reduction under different bit lengths.
Koichi KISE Shinichiro OMACHI Seiichi UCHIDA Masakazu IWAMURA Marcus LIWICKI
This paper reviews several attempts to redesign a conventional communication medium, i.e., characters, to enrich their functions by using data-embedding techniques. For example, characters are redesigned to have better machine-readability even under various geometric distortions by embedding a geometric invariant into each character image to represent the class label of the character. Another example is to embed various information into a handwriting trajectory by using a new pen device, called a data-embedding pen. An experimental result showed that we can embed 32-bit information into a handwritten line of 5 cm length by using the pen device. In addition to those applications, we also discuss the relationship between data-embedding and pattern recognition from a theoretical point of view. Several theories indicate that if we have appropriate supplementary information by data-embedding, we can enhance pattern recognition performance up to 100%.
Sumaru NIIDA Satoshi UEMURA Shigehiro ANO
With the rapid growth of high-performance ICT (information and communication technology) devices such as smartphones and tablet PCs, multitasking has become one of the popular ways of using mobile devices. Users have adopted multitasking because it reduces their dissatisfaction with waiting time and makes effective use of time by switching their attention from the waiting process to other content. This is a good solution to the problem of waiting; however, it may cause another problem, namely an increase in traffic volume due to multiple applications running simultaneously. Thus, an effective method to control throughput adapted to the multitasking situation is required. This paper proposes a transmission rate control method for web browsing that takes multitasking behavior into account and quantitatively demonstrates the effect of the service through two different field experiments. The main contribution of this paper is to present a service design process for a new transmission rate control that takes into account human-network interaction based on the human-centered approach. We show through a field trial using a testbed that the degree of satisfaction with waiting time did not degrade even when the throughput of the background task was reduced by 40%.
Yiqiang SHENG Atsushi TAKAHASHI
This paper investigates a novel high-performance heuristic algorithm, named the relay-race algorithm (RRA), which was proposed to approach a globally optimal solution to NP-hard problems by exploring similar locally optimal solutions more efficiently within a shorter runtime. RRA includes three basic parts: rough search, focusing search, and relay. The rough search is designed to get over small hills in the solution space and to approach a local optimal solution as fast as possible. The focusing search is designed to get as close as possible to the local optimal solution. The relay is designed to escape from the local optimal solution in only one step while maintaining search continuity. As a typical application, the multi-objective placement problem in physical design optimization is solved by the proposed RRA. Experiments confirm that the computational performance is considerably improved. RRA achieves an overall Pareto improvement of two conflicting objectives: power consumption and maximal delay. RRA has potential applications in improving existing search methods for harder problems.
Ittetsu TANIGUCHI Junya KAIDA Takuji HIEDA Yuko HARA-AZUMI Hiroyuki TOMIYAMA
This paper studies techniques for mapping multiple applications onto embedded many-core SoCs. The mapping techniques proposed in this paper are static, which means that the mapping is decided at design time. The mapping techniques take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Additionally, the proposed static mapping supports dynamic application switching, which means that the applications mapped onto the same cores are switched from one to another at runtime. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. Experimental results show the effectiveness of the proposed techniques.
Mahmoud BAKHSHIZADEH Ali JAHANIAN
Hardware Trojans and other kinds of unwanted hardware modification are regarded as a major challenge in many commercial and secure applications. Detection and prevention of hardware Trojans have emerged as an important requirement in such systems. In this paper, a new concept, the Trojan Vulnerability Map, is introduced to model the immunity of various regions of hardware against hardware attacks. Then, placement and routing algorithms are proposed to improve the immunity of hardware using the Trojan Vulnerability Map. Experimental results show that the proposed placement and routing algorithms reduce the hardware vulnerability by 25.65% and 4.08%, respectively. These benefits come at the cost of negligible total wire length and delay overhead.
Tuan Hung NGUYEN Hiroshi SATO Yoshio KOYANAGI Hisashi MORISHITA
This study presents a proposal for the space-saving design of built-in antennas for handset terminals based on the concept of requisite design antenna volume. By investigating the relation between the antenna input characteristic and the electric near-field around the antenna element and surrounding components inside the terminal, and then evaluating the requisite design antenna volume, we propose the most effective deployment for both the antenna and surrounding components. The results show that our simple proposal can reduce the minimum space that the antenna element actually requires for stable operation inside the terminal by about 17% and 31.75% in the single-band designs for the cellular 2GHz band (1920-2170MHz) and 800MHz band (830-880MHz), respectively. In the dual-band design, we verify that it can reduce the antenna space by about 35.18% and completely cover the two above cellular bands with good antenna performance.
Yohei KATAYAMA Takehito YAMAMOTO Yukio TSUKISHIMA Kazuhisa YAMADA Noriyuki TAKAHASHI Atsushi TAKAHARA Akihiro NAKAO
Due to recent network service market trends, network infrastructure providers must make their network infrastructures tolerant of network service complexity and swift at providing new network services. To achieve this, we first make a design decision for the single-domain network infrastructure: we use network virtualization, separate network service control and management from the network infrastructure, and leave resource connectivity control and management in the network infrastructure. As a result, the infrastructure can remain simple while network services can become complex and be provided quickly. Following this decision, we construct an architecture for the network infrastructure and a network management model. The management model defines a slice as being determined by abstracted resource requirements and restructures the roles and planes from the viewpoint of network infrastructure usability, so that network service requesters can manage network resources freely and swiftly in an abstract manner within the authority that the network infrastructure operator grants. We give the details of our design and implementation of a network virtualization management system based on the model. We deployed and evaluated the management system on the Japanese national R&E testbed (JGN-X) to confirm the feasibility of its design, and we discuss room for improvement in terms of response time and scalability towards practical use. We also investigated certain cases of sophisticated network functions to confirm that the infrastructure can accept these functions without having to be modified.
This paper proposes low-power voltage-mode/current-mode hybrid circuits to realize an arbitrary two-variable logic function and a full-adder function. The voltage and current modes can be selected for low-power operation at low and high frequencies, respectively, according to the speed requirement. An nMOS pass transistor network is shared to realize voltage switching and current steering for the voltage- and current-mode operations, respectively, which leads to high utilization of the hardware resources. As a result, when the operating frequency is more than 1.15GHz, the current mode of the hybrid logic circuit is more power-efficient than the voltage mode. Otherwise, the voltage mode is more power-efficient. The power consumption of the hybrid two-variable logic circuit is lower than that of the conventional two-input look-up table (LUT) using CMOS transmission gates when the operating frequency is more than 800MHz. The delay and area of the hybrid two-variable logic circuit are increased by only 7% and 13%, respectively.
We propose a method for finding an appropriate setting of a pay-per-performance payment system to prevent participation of insincere workers in crowdsourcing. Crowdsourcing enables fast and low-cost accomplishment of tasks; however, insincere workers prevent the task requester from obtaining high-quality results. Instead of a fixed payment system, the pay-per-performance payment system is promising for excluding insincere workers. However, it is difficult to learn what settings are better, and a naive payment setting may cause unsatisfactory outcomes. To overcome these drawbacks, we propose a method for calculating the expected payments for sincere and insincere workers, and then clarifying the conditions in the payment setting in which sincere workers are willing to choose a task, while insincere workers are not willing to choose the task. We evaluated the proposed method by conducting several experiments on tweet labeling tasks in Amazon Mechanical Turk. The results suggest that the pay-per-performance system is useful for preventing participation of insincere workers.
Because dielectrics between active layers have low thermal conductivities, there is a demand to reduce the temperature increase in three-dimensional integrated circuits (3D ICs). This paper demonstrates that, in the design of 3D ICs, different layer assignments often lead to different temperature increases. Based on this observation, we are motivated to perform temperature-aware layer assignment. Our work includes two parts. Firstly, an integer linear programming (ILP) approach that guarantees a minimum temperature increase is proposed. Secondly, a polynomial-time heuristic algorithm that reduces the temperature increase is proposed. Compared with the previous work, which does not take the temperature increase into account, the experimental results show that both our ILP approach and our heuristic algorithm produce a significant reduction in the temperature increase with a very small area overhead.
Maximizing network lifetime and optimizing aggregate system utility are important but usually conflicting goals in wireless multi-hop networks. To address this trade-off, we present a matrix game-theoretic cross-layer optimization formulation to jointly maximize the diverse objectives in such networks with network coding. To this end, we introduce a cross-layer formulation of general network utility maximization (NUM) that accommodates routing, scheduling, and stream control from different layers in the coded networks. Specifically, for the scheduling problem and the objective function involved, we develop a matrix game with the strategy sets of the players corresponding to hyperlink and transmission mode, and design multiple payoffs specific to lifetime and system utility, respectively. In particular, with the inherent merit that a matrix game can be solved with mathematical programming, our cross-layer programming formulation benefits from both game-based and NUM-based approaches at the same time by combining the programming model for the matrix game with that for the other layers in a consistent framework. Finally, our numerical experiments quantitatively exemplify the possible performance trade-offs with respect to the two variants developed on the multiple objectives in question while qualitatively exhibiting the differences between the framework and other related works.
Hoon RYU Jung-Lok YU Duseok JIN Jun-Hyung LEE Dukyun NAM Jongsuk LEE Kumwon CHO Hee-Jung BYUN Okhwan BYEON
We discuss a new high-performance computing service (HPCS) platform that has been developed to provide domain-neutral computing service with governmental support from the “EDucation-research Integration through Simulation On the Net” (EDISON) project. With a first focus on technical features, we not only present in-depth explanations of the implementation details, but also describe the strengths of the EDISON platform against the successful nanoHUB.org gateway. To validate the performance and utility of the platform, we provide benchmarking results for the resource virtualization framework, and demonstrate the stability and promptness of the EDISON platform in processing simulation requests by analyzing several statistical datasets obtained from a three-month trial service in the initiative area of computational nanoelectronics. We firmly believe that this work provides a good opportunity for understanding the science gateway project that is ongoing for the first time in the Republic of Korea, and that the technical details presented here can serve as a useful guideline for any potential designs of HPCS platforms.