IEICE global.ieice.org Site

Keyword Search Result

[Keyword] low(1940hit)

401-420hit(1940hit)

Low-Power Motion Estimation Processor with 3D Stacked Memory
Shuping ZHANG Jinjia ZHOU Dajiang ZHOU Shinji KIMURA Satoshi GOTO

PAPER

Vol:
E98-A No:7
Page(s):
1431-1441
Motion estimation (ME) is a key encoding component of almost all modern video coding standards. ME contributes significantly to video coding efficiency, but it also consumes the most power of any component in a video encoder. In this paper, an ME processor with a 3D stacked memory architecture is proposed to reduce memory and core power consumption. First, a memory die is designed and stacked with the ME die. By adding face-to-face (F2F) pads and through-silicon-via (TSV) definitions, 2D electronic design automation (EDA) tools can be extended to support the proposed 3D stacking architecture. Moreover, a special memory controller is applied to control data transmission and timing between the memory die and the ME processor die. Finally, a 3D physical design is completed for the entire system. This design includes TSV/F2F placement, floor plan optimization, and power network generation. Compared to 2D technology, the number of input/output (IO) pins is reduced by 77%. After optimizing the floor plan of the processor die and memory die, the routing wire lengths are reduced by 13.4% and 50%, respectively. The stacked static random access memory contributes the largest share of the power reduction in this work. The simulation results show that the design can support real-time 720p @ 60fps encoding at 8MHz using less than 65mW of power, which is considerably better than state-of-the-art ME processors.
Address Order Violation Detection with Parallel Counting Bloom Filters
Naruki KURATA Ryota SHIOYA Masahiro GOSHIMA Shuichi SAKAI

PAPER

Vol:
E98-C No:7
Page(s):
580-593
To eliminate CAMs from the load/store queues, several techniques that detect memory access order violations with hash filters composed of RAMs have been proposed. This paper proposes a technique with parallel counting Bloom filters (PCBF). A Bloom filter has extremely low false positive rates owing to multiple hash functions. Although some existing studies claim to use Bloom filters, none of them mentions multiple hash functions. This paper also addresses the problem arising from the variety of access sizes of load/store instructions. The evaluation results show that our technique, with only 2720-bit Bloom filters, achieves a relative IPC of 99.0% while the area and power consumption are greatly reduced to 14.3% and 22.0%, respectively, of those of a conventional model with CAMs. The filter is much smaller than typical branch predictors.
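The role of multiple hash functions and counters can be illustrated with a generic counting Bloom filter. This is a minimal software sketch, not the paper's PCBF hardware: the class name, the counter count, the number of hash functions, and the use of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Generic counting Bloom filter (illustrative; not the paper's PCBF design)."""

    def __init__(self, num_counters=64, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, address):
        # Assumption: derive k hash functions by salting one hash with its number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % len(self.counters)

    def insert(self, address):
        # Increment every counter selected by the k hash functions.
        for idx in self._indexes(address):
            self.counters[idx] += 1

    def remove(self, address):
        # Counters (rather than single bits) make deletion possible.
        for idx in self._indexes(address):
            self.counters[idx] -= 1

    def may_contain(self, address):
        # True may be a false positive; False is always correct.
        return all(self.counters[idx] > 0 for idx in self._indexes(address))


cbf = CountingBloomFilter()
cbf.insert(0x1000)                 # e.g. an in-flight access to this address
hit_while_present = cbf.may_contain(0x1000)
cbf.remove(0x1000)                 # the access leaves the window
hit_after_removal = cbf.may_contain(0x1000)
```

A query reports a possible match only when all selected counters are non-zero, which is why adding hash functions lowers the false positive rate compared with a single-hash filter of the same size.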
A Perpetuum Mobile 32bit CPU on 65nm SOTB CMOS Technology with Reverse-Body-Bias Assisted Sleep Mode
Koichiro ISHIBASHI Nobuyuki SUGII Shiro KAMOHARA Kimiyoshi USAMI Hideharu AMANO Kazutoshi KOBAYASHI Cong-Kha PHAM

PAPER

Vol:
E98-C No:7
Page(s):
536-543
A 32bit CPU is presented that can operate for more than 15 years on a 220mAh Li battery, or perpetually with an energy harvester powered by indoor light. The CPU was fabricated using 65nm SOTB (Silicon on Thin Buried oxide) CMOS technology, where the gate length is 60nm and the BOX layer thickness is 10nm. The threshold voltage was designed to be as low as 0.19V so that the CPU operates in the above-threshold region even at supply voltages as low as 0.22V. A large reverse body bias of up to -2.5V can be applied to the bodies of SOTB devices without increasing the gate-induced drain leakage current, which reduces the sleep current of the CPU. The CPU operated at 14MHz and 0.35V with the lowest energy of 13.4 pJ/cycle. A sleep current of 0.14µA was obtained at 0.35V with a body bias voltage of -2.5V. These characteristics are suitable for new applications such as energy harvesting sensor network systems and long-lasting wearable computers.
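To put the figures in the abstract into proportion, the following back-of-the-envelope sketch converts them into an average current budget and into active and sleep power. It assumes the full 220mAh is usable and ignores battery self-discharge and regulator losses; the function names are hypothetical and the calculation is not taken from the paper.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length in hours


def average_current_budget_uA(capacity_mAh, years):
    """Average current (in µA) that drains the given capacity over the given time."""
    return capacity_mAh * 1000.0 / (years * HOURS_PER_YEAR)


def active_power_uW(energy_pJ_per_cycle, clock_MHz):
    """pJ/cycle times MHz gives µW, since 1e-12 J * 1e6 1/s = 1e-6 W."""
    return energy_pJ_per_cycle * clock_MHz


def sleep_power_nW(sleep_current_uA, supply_V):
    """µA times V gives µW; multiply by 1000 for nW."""
    return sleep_current_uA * supply_V * 1000.0


budget = average_current_budget_uA(220, 15)   # battery figures from the abstract
active = active_power_uW(13.4, 14)            # 13.4 pJ/cycle at 14 MHz
sleep = sleep_power_nW(0.14, 0.35)            # 0.14 µA at 0.35 V

print(f"average current budget: {budget:.2f} uA")
print(f"active power:           {active:.1f} uW")
print(f"sleep power:            {sleep:.1f} nW")
```

Under these assumptions the 15-year budget is roughly 1.67 µA on average, active operation draws about 187.6 µW, and sleep draws about 49 nW, which suggests why the low sleep current matters for long battery life.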
Layout Dependent Effect-Aware Leakage Current Reduction and Its Application to Low-Power SAR-ADC
Gong CHEN Yu ZHANG Qing DONG Ming-Yu LI Shigetoshi NAKATAKE

PAPER

Vol:
E98-A No:7
Page(s):
1442-1454
As semiconductor manufacturing processes scale down, the leakage current of CMOS circuits is becoming a dominant contributor to power dissipation. This paper provides an efficient leakage current reduction (LCR) technique for low-power and low-frequency circuit designs in terms of design rules and layout parameters related to layout dependent effects. We address the LCR technique for both analog and digital circuits, and present a design case in which the LCR technique is applied to a successive-approximation-register (SAR) analog-to-digital converter (ADC), which typically employs both analog and digital transistors. In post-layout simulation results obtained with HSPICE, an SAR-ADC with the LCR technique achieves a total power consumption of 38.6 nW. Compared with the design without the LCR technique, we attain about a 30% reduction in total energy.
A Forward/Reverse Body Bias Generator with Wide Supply-Range down to Threshold Voltage
Norihiro KAMAE Akira TSUCHIYA Hidetoshi ONODERA

PAPER

Vol:
E98-C No:6
Page(s):
504-511
A forward/reverse body bias generator (BBG) that operates over a wide supply range is proposed. Fine-grained body biasing (FGBB) is effective in reducing variability and increasing energy efficiency in digital LSIs. Since FGBB requires a number of BBGs to be implemented, a simple design is preferred. We propose a BBG with charge pumps for reverse body bias; the BBG operates over a wide supply range from 0.5 V to 1.2 V. The layout of the BBG was designed in a cell-based flow with an AES core and fabricated in a 65 nm CMOS process. The area of the AES core is 0.22 mm² and the area overhead of the BBG is 2.3%. A demonstration of the AES core shows successful operation with supply voltages from 0.5 V to 1.2 V, which enables a reduction of power dissipation of, for example, 17% at 400 MHz operation.
A Constant-Current-Controlled Class-C Voltage-Controlled Oscillator using Self-Adjusting Replica Bias Circuit
Teerachot SIRIBURANON Wei DENG Kenichi OKADA Akira MATSUZAWA

PAPER

Vol:
E98-C No:6
Page(s):
471-479
This paper presents a constant-current-controlled class-C VCO using a self-adjusting replica bias circuit. The proposed class-C VCO is more suitable for real-life applications than a traditional approach because it maintains a constant current, which makes its phase noise performance more robust against variation of the gate bias of the cross-coupled pair, and it does so without the amplitude modulation issue. The proposed VCO is implemented in a 180 nm CMOS process. It achieves a tuning range of 4.8-4.9 GHz with a phase noise of -121 dBc/Hz at 1 MHz offset. The power consumption of the core oscillators is 4.8 mW, and an FoM of -189 dBc/Hz is achieved.
New Construction of Optimal p²-Ary Low Correlation Zone Sequence Sets
Yubo LI Kai LIU Chengqian XU

PAPER-Information Theory

Vol:
E98-A No:6
Page(s):
1288-1294
In this correspondence, a generic method of constructing optimal p²-ary low correlation zone (LCZ) sequence sets is proposed. First, p²-ary column sequence sets are constructed; then p²-ary LCZ sequence sets with parameters (p^n-1, p^m-1, (p^n-1)/(p^m-1), 1) are constructed by using the column sequences and an interleaving technique. The resultant p²-ary LCZ sequence sets are optimal with respect to the Tang-Fan-Matsufuji bound.
Variable Data-Flow Graph for Lightweight Program Slicing and Visualization
Yu KASHIMA Takashi ISHIO Shogo ETSUDA Katsuro INOUE

PAPER-Software Engineering

Publicized:
2015/03/17
Vol:
E98-D No:6
Page(s):
1194-1205
To understand the behavior of a program, developers often need to read source code fragments in various modules. System-dependence-graph-based (SDG) program slicing is a good candidate for supporting the investigation of data-flow paths among modules, as SDG is capable of showing the data-dependence of focused program elements. However, this technique has two problems. First, constructing an SDG requires heavyweight analysis, so SDG is not suitable for daily use. Second, the results of SDG-based program slicing are difficult to visualize, as they contain many vertices. In this research, we propose variable data-flow graphs (VDFG) for use in program slicing techniques. In contrast to SDG, VDFG is created by lightweight analysis because several approximations are used. Furthermore, we propose using the fractal value to visualize a VDFG-based program slice in order to reduce the graph complexity for visualization purposes. We performed three experiments that demonstrate the accuracy of VDFG program slicing with fractal value, the size of a visualized program slice, and the effectiveness of our tool for source code reading.
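Slicing over a data-flow graph can be pictured as backward reachability from a variable of interest. The sketch below is a generic illustration under that assumption; it does not reproduce the VDFG construction, its approximations, or the fractal-value filtering, and the edge list and vertex names are hypothetical.

```python
from collections import deque


def backward_slice(dataflow_edges, criterion):
    """Return every vertex whose value can flow into `criterion`.

    `dataflow_edges` is a list of (source, destination) pairs meaning
    "the value of source flows into destination".
    """
    predecessors = {}
    for src, dst in dataflow_edges:
        predecessors.setdefault(dst, set()).add(src)

    visited = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, ()):
            if pred not in visited:
                visited.add(pred)
                queue.append(pred)
    return visited


# Hypothetical graph: a -> b -> c -> d, plus an unrelated flow x -> y.
edges = [("a", "b"), ("b", "c"), ("x", "y"), ("c", "d")]
slice_for_c = backward_slice(edges, "c")
```

Here the slice for `c` contains `a`, `b`, and `c`, while the unrelated vertices `x` and `y` and the downstream vertex `d` are excluded.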
Optimization Methods for Nop-Shadows Typestate Analysis
Chengsong WANG Xiaoguang MAO Yan LEI Peng ZHANG

PAPER-Dependable Computing

Publicized:
2015/02/23
Vol:
E98-D No:6
Page(s):
1213-1227
In recent years, hybrid typestate analysis has been proposed to eliminate unnecessary monitoring instrumentations for runtime monitors at compile-time. Nop-shadows Analysis (NSA) is one of these hybrid typestate analyses. Before generating residual monitors, NSA performs a data-flow analysis that is intra-procedural, flow-sensitive, and partially context-sensitive to improve runtime performance. Although NSA is precise, there are some cases in which it has little effect. In this paper, we propose three optimizations to further improve the precision of NSA. The first two optimizations try to filter interferential states of objects when determining whether a monitoring instrumentation is necessary. The third optimization refines the inter-procedural data-flow analysis induced by method invocations. We have integrated our optimizations into Clara and conducted extensive experiments on the DaCapo benchmark. The experimental results demonstrate that our first two optimizations can further remove unnecessary instrumentations after the original NSA in more than half of the cases, without a significant overhead. In addition, all the instrumentations can be removed in two cases, which implies that the program satisfies the typestate property and needs no runtime monitoring. It comes as a surprise to us that the third optimization is effective in only 8.7% of the cases. Finally, we analyze the experimental results and discuss the reasons why our optimizations fail to further eliminate unnecessary instrumentations in some special situations.
A 32-kHz Real-Time Clock Oscillator with On-Chip PVT Variation Compensation Circuit for Ultra-Low Power MCUs
Keishi TSUBAKI Tetsuya HIROSE Nobutaka KUROKI Masahiro NUMA

PAPER-Integrated Electronics

Vol:
E98-C No:5
Page(s):
446-453
This paper proposes an ultra-low power fully on-chip CMOS relaxation oscillator (ROSC) for a real-time clock application. The proposed ROSC employs a circuit that compensates for the comparator's non-idealities caused by offset voltage and delay time. The ROSC can generate a stable 32-kHz oscillation clock frequency without increasing power dissipation by using a low reference voltage and employing a novel compensation architecture for comparators. Measurement results in a 0.18-µm CMOS process demonstrated that the circuit can generate a stable clock frequency of 32.55 kHz with low power dissipation of 472 nW at a 1.8-V power supply. The measured line regulation and temperature coefficient were 1.1%/V and 120 ppm/°C, respectively.
Low Complexity Centralized Scheduling Scheme for Downlink CoMP
Jing WANG Satoshi NAGATA Lan CHEN Huiling JIANG

PAPER-Wireless Communication Technologies

Vol:
E98-B No:5
Page(s):
940-948
Coordinated multi-point (CoMP) transmission and reception is a promising technique for interference mitigation in cellular systems. The scheduling algorithm for CoMP has a significant impact on the network processing complexity and performance. Performing exhaustive search permits centralized scheduling and thus the optimal global solution; however, it incurs a high level of computational complexity and may be impractical or lead to high cost as well as network instability. In order to provide a more realistic scheduling method while balancing performance and complexity, we propose a low complexity centralized scheduling scheme that adaptively selects users for single-cell transmission or different CoMP scheme transmission to maximize the system weighted sum capacity. We evaluate the computational complexity and system-level simulation performance in this paper. Compared to the optimal scheduling method with exhaustive search, the proposed scheme has a much lower complexity level and achieves near optimal performance.
A Low Power and Hardware Efficient Syndrome Key Equation Solver Architecture and Its Folding with Pipelining
Kazuhito ITO

PAPER-VLSI Design Technology and CAD

Vol:
E98-A No:5
Page(s):
1058-1066
Syndrome key equation solution is one of the important processes in the decoding of Reed-Solomon codes. This paper proposes a low power key equation solver (KES) architecture in which the power consumption is reduced by decreasing the required number of multiplications without degrading the decoding throughput and latency. The proposed method employs a smaller number of multipliers than a conventional low power KES architecture. The critical path in the proposed KES circuit is minimized so that operation at a high clock frequency is possible. A low power folded KES architecture is also proposed to further reduce the hardware complexity by executing folded operations in a pipelined manner with a slight increase in decoding latency.
Discriminative Dictionary Learning with Low-Rank Error Model for Robust Crater Recognition
An LIU Maoyin CHEN Donghua ZHOU

LETTER-Image Recognition, Computer Vision

Publicized:
2015/02/18
Vol:
E98-D No:5
Page(s):
1116-1119
Robust crater recognition is a research focus in deep space exploration missions, and sparse representation methods can achieve desirable robustness and accuracy. To cope with the damage and noise incurred by complex topography and varied illumination in planetary images, a robust crater recognition approach is proposed based on dictionary learning with a low-rank error correction model in a sparse representation framework. In this approach, a compact and discriminative dictionary is learned from all the training images. A low-rank error correction term is introduced into the dictionary learning to deal with gross error and corruption. Experimental results on crater images show that the proposed method achieves competitive performance in both recognition accuracy and efficiency.
WBAN Energy Efficiency and Dependability Improvement Utilizing Wake-Up Receiver (Open Access)
Juha PETÄJÄJÄRVI Heikki KARVONEN Konstantin MIKHAYLOV Aarno PÄRSSINEN Matti HÄMÄLÄINEN Jari IINATTI

INVITED PAPER

Vol:
E98-B No:4
Page(s):
535-542
This paper discusses the prospects of using a wake-up receiver (WUR) in wireless body area network (WBAN) applications with event-driven data transfers. First, we compare the energy efficiency of the WUR-based WBAN and the IEEE 802.15.6 compliant WBAN based on a duty-cycled medium access control protocol. Then, we review the architectures of state-of-the-art WURs and discuss their suitability for WBANs. The presented results clearly show that the radio frequency envelope detection based architecture features the lowest power consumption at the cost of sensitivity. The other architectures are capable of providing better sensitivity, but consume more power. Finally, we propose a design modification that enables a WUR to receive control commands in addition to the wake-up signals. The presented results reveal that use of this feature does not require complex modifications of the current architectures, yet improves energy efficiency and latency for transfers of small data blocks.
Low-Power Wiring Method for Band-Limited Signals in CMOS Logic Circuits by Segmentation Coding with Pseudo-Majority Voting
Katsuhiko UEDA Zuiko RIKUHASHI Kentaro HAYASHI Hiroomi HIKAWA

PAPER-Electronic Circuits

Vol:
E98-C No:4
Page(s):
356-363
It is important to reduce the power consumption of complementary metal oxide semiconductor (CMOS) logic circuits, especially those used in mobile devices. A CMOS logic circuit consists of metal-oxide-semiconductor field-effect transistors (MOSFETs), which consume electrical power dynamically when they charge and discharge load capacitance that is connected to their output. Load capacitance mainly exists in wiring or buses, and transitions between logic 0 and logic 1 cause these charges and discharges. Many methods have been proposed to reduce these transitions. One novel method (called segmentation coding) has recently been proposed that reduces power consumption of CMOS buses carrying band-limited signals, such as audio data. It improves performance by employing dedicated encoders for the upper and lower bits of transmitted data, in which the transition characteristics of band-limited signals are utilized. However, it uses a conventional majority voting circuit in the encoder for lower bits, and the circuit uses many adders to count the number of 1s to calculate the Hamming distance between the transmitted data. This paper proposes segmentation coding with pseudo-majority voting. The proposed pseudo-majority voting circuit counts the number of 1s with fewer circuit resources than the conventional circuit by further utilizing the transition characteristics of band-limited signals. The effectiveness of the proposed method was demonstrated through computer simulations and experiments.
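The link between majority voting, Hamming distance, and bus transitions can be seen in conventional bus-invert coding, sketched below. This is the classic technique named plainly, not the paper's segmentation coding or its pseudo-majority voter; the function names, the 8-bit width, and the initial bus state of zero are assumptions for illustration.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Send each word inverted when that toggles fewer bus lines.

    Returns a list of (bus_value, invert_flag) pairs.
    """
    mask = (1 << width) - 1
    prev_bus = 0  # assumption: the bus starts at all zeros
    encoded = []
    for word in words:
        # Majority vote: would more than half of the lines toggle?
        if hamming_distance(prev_bus, word) > width // 2:
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width=8):
    """Undo the inversion using the flag line."""
    mask = (1 << width) - 1
    return [bus ^ mask if invert else bus for bus, invert in encoded]


def count_transitions(values):
    """Total number of line toggles when values are sent in order from 0."""
    total, prev = 0, 0
    for value in values:
        total += hamming_distance(prev, value)
        prev = value
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words)
raw_toggles = count_transitions(words)
coded_toggles = (count_transitions([bus for bus, _ in encoded])
                 + count_transitions([flag for _, flag in encoded]))
```

For this worst-case pattern the uncoded bus toggles 24 times, while the coded bus toggles only 3 times, all on the invert flag line. Counting the 1s of the XOR is the step that costs adders in hardware, which is the part the abstract says the pseudo-majority voter simplifies.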
An Adaptation of Proxy Mobile IPv6 to OpenFlow Architecture over Software Defined Networking
Seong-Mun KIM Hyon-Young CHOI Youn-Hee HAN Sung-Gi MIN

PAPER-Network

Vol:
E98-B No:4
Page(s):
596-606
In this paper, Proxy Mobile IPv6 (PMIPv6), which is a network-based mobility management protocol, is adapted to the OpenFlow architecture. Mobility-related signaling is generally performed by network entities on behalf of a mobile node, but in standard PMIPv6, the control and data packets are delivered and processed over the same network entities, which prevents the separation of the control and the data planes. In addition, IP tunneling inherent to PMIPv6 imposes excessive overhead for the network entities. In order to adapt PMIPv6 to the OpenFlow architecture, the mobility management function is separated from the PMIPv6 components, and components are reconstructed to take advantage of the offerings of the OpenFlow architecture. The components configure the flow table of the switches located in a path, which comprise the OpenFlow controller. Mobility-related signaling can then be performed at the dedicated secure channel, and all of the data packets can be sent normally in accordance with the flow table of the OpenFlow switches. Consequently, the proposed scheme eliminates IP tunneling when user traffic is forwarded and separates the data and the control planes. The performance analysis revealed that the proposed scheme can outperform PMIPv6 in terms of the signaling cost, packet delivery cost, and handover latency.
Preliminary Study of Electrical Contact Behaviors of Au-plated Material at Super Low Making/Breaking Velocity
Wanbin REN Shengjun XUE Hongxu ZHI Guofu ZHAI

PAPER-Electromechanical Devices and Components

Vol:
E98-C No:4
Page(s):
364-370
This paper presents the electrical contact behaviors of Au-plated material under super low making and breaking velocity conditions, using our newly designed test rig. The fundamental phenomena in the curves of contact voltage and contact force versus piezoactuator displacement were investigated under a load current of 1 A and a velocity of 50 nm/s. From the repeated experimental results, we found that the adhesion phenomena during the unloading process are closely correlated with the initial contact stage in the loading process. Furthermore, a mathematical model of the variation of contact force in loading is built, and on this basis the physical mechanism of adhesion and the principal factors for gold-plated materials are discussed. Finally, the physical process of the molten bridge in the absence of mechanical contact is also analyzed in detail.
A New Approach to Identify User Authentication Methods toward SSH Dictionary Attack Detection
Akihiro SATOH Yutaka NAKAMURA Takeshi IKENAGA

PAPER-Authentication

Publicized:
2014/12/04
Vol:
E98-D No:4
Page(s):
760-768
A dictionary attack against SSH is a common security threat. Many methods rely on network traffic to detect SSH dictionary attacks because the connections of remote login, file transfer, and TCP/IP forwarding are visibly distinct from those of attacks. However, these methods incorrectly judge the connections of automated operation tasks as those of attacks due to their mutual similarities. In this paper, we propose a new approach to identify user authentication methods on SSH connections and to remove connections that employ non-keystroke based authentication. This approach is based on two perspectives: (1) an SSH dictionary attack targets a host that provides keystroke based authentication; and (2) automated tasks through SSH need to support non-keystroke based authentication. Keystroke based authentication relies on a character string that is input by a human; in contrast, non-keystroke based authentication relies on information other than a character string. We evaluated the effectiveness of our approach through experiments on real network traffic at the edges in four campus networks, and the experimental results showed that our approach provides high identification accuracy with only a few errors.
Two Sufficient Conditions on Refactorizability of Acyclic Extended Free Choice Workflow Nets to Acyclic Well-Structured Workflow Nets and Their Application
Ichiro TOYOSHIMA Shingo YAMAGUCHI Yuki MURAKAMI

PAPER

Vol:
E98-A No:2
Page(s):
635-644
A workflow net (WF-net for short) is a Petri net which represents a workflow. There are two important subclasses of WF-nets: extended free choice (EFC for short) and well-structured (WS for short). It is known that most actual workflows can be modeled as EFC WF-nets, and that acyclic WS is a subclass of acyclic EFC but has more analysis methods. A sound acyclic EFC WF-net may be transformed to an acyclic WS WF-net without changing the observable behavior of the net. Such a transformation is called refactoring. In this paper, we tackle the acyclic EFC WF-net refactorizability problem, which is to decide whether a given sound acyclic EFC WF-net is refactorable to an acyclic WS WF-net. We give two sufficient conditions for the problem, and construct refactoring procedures based on the conditions. Furthermore, we apply the procedures to a sample workflow, and confirm the usefulness of the procedures for enhancing the readability and the analysis power of acyclic EFC WF-nets.
Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration
Stewart DENHOLM Hiroaki INOUE Takashi TAKENAKA Tobias BECKER Wayne LUK

PAPER-Application

Publicized:
2014/11/19
Vol:
E98-D No:2
Page(s):
288-297
Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. Latencies of 6ns and 5.25ns are obtained for low latency messages. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.
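The basic idea of A/B arbitration can be sketched in software as forwarding the first copy of each sequence number and dropping the later duplicate. This is a simplified illustration of a latency-first policy, not the paper's FPGA architecture: the tuple layout, the function name, and the assumption that messages carry increasing sequence numbers are hypothetical, and the windowing used for the high reliability mode is not modelled.

```python
def arbitrate_low_latency(arrivals):
    """Forward the first copy of each message seen on either feed.

    `arrivals` is an iterable of (feed, sequence_number, payload) tuples
    in arrival order. A message is forwarded only if its sequence number
    is higher than any forwarded so far, so duplicates and late copies
    are dropped without waiting for gaps to be filled.
    """
    last_forwarded = 0
    for feed, seq, payload in arrivals:
        if seq > last_forwarded:
            last_forwarded = seq
            yield feed, seq, payload


# Hypothetical arrival order: feed A loses message 2, feed B delivers it.
arrivals = [
    ("A", 1, "msg1"),
    ("B", 1, "msg1"),   # duplicate, dropped
    ("B", 2, "msg2"),   # only copy of message 2
    ("A", 3, "msg3"),
    ("B", 3, "msg3"),   # duplicate, dropped
]
forwarded = list(arbitrate_low_latency(arrivals))
```

In this example the output stream is complete even though feed A dropped a message. A reliability-first policy would instead buffer out-of-order messages for some window before giving up on a gap, trading latency for completeness.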
