Woo-Jae KIM Jong-Pil YOON Joo-Young BAEK Young-Joo SUH
In this paper, we focus on resource allocation schemes for minimizing the energy consumption of subscriber stations (SSs) in uplink flows of the IEEE 802.16 OFDMA systems. The resource allocation schemes assign subcarriers, powers, and data rates to each SS based on the measured signal to noise ratio (SNR) of the uplink channel and predefined modulation and coding scheme as system parameters. Previous research efforts to optimize resource allocation focus on the rate and throughput maximizations, and develop suboptimal heuristic algorithms. However, this paper intends to reduce the energy consumption of SSs by considering the relationship between energy efficiency and resource allocation. In order to clearly formulate the relationship, we use the Multiple Choice Knapsack (MCK) problem, which is proved to be an NP-hard problem. We propose two heuristic schemes to solve the NP-hard problem, which adaptively use the modulation and coding scheme, defined in the IEEE 802.16 OFDMA systems to minimize the required transmission power of each SS. Our simulation results show that the proposed schemes can reduce the energy consumption by up to 53% compared to the channel state information (CSI) scheme, which determines the modulation and coding level only considering the channel state information.
In modern VLSI physical design, huge integration scale necessitates hierarchical design and IP reuse to cope with design complexity. Besides, interconnect delay becomes dominant to overall circuit performance. These critical factors require some modules to be placed along designated boundaries to effectively facilitate hierarchical design and interconnection optimization related problems. In this paper, boundary constraints of general floorplan are solved smoothly based on the novel representation Single-Sequence (SS). Necessary and sufficient conditions of rooms along specified boundaries of a floorplan are proposed and proved. By assigning constrained modules to proper boundary rooms, our proposed algorithm always guarantees a feasible SS code with appropriate boundary constraints in each perturbation. Time complexity of the proposed algorithm is O(n). Experimental results on MCNC benchmarks show effectiveness and efficiency of the proposed method.
Masataka MIYAKE Daisuke HORI Norio SADACHIKA Uwe FELDMANN Mitiko MIURA-MATTAUSCH Hans Jurgen MATTAUSCH Takahiro IIZUKA Kazuya MATSUZAWA Yasuyuki SAHARA Teruhiko HOSHIDA Toshiro TSUKADA
We analyze the carrier dynamics in MOSFETs under low-voltage operation. For this purpose the displacement (charging/discharging) current, induced during switching operations is studied experimentally and theoretically for a 90 nm CMOS technology. It is found that the experimental transient characteristics can only be well reproduced in the circuit simulation of low voltage applications by considering the carrier-transit delay in the compact MOSFET model. Long carrier transit delay under the low voltage switching-on operation results in long duration of the displacement current flow. On the other hand, the switching-off characteristics are independent of the bias condition.
Hai VU Tomio ECHIGO Ryusuke SAGAWA Keiko YAGI Masatsugu SHIBA Kazuhide HIGUCHI Tetsuo ARAKAWA Yasushi YAGI
Interpretations by physicians of capsule endoscopy image sequences captured over periods of 7-8 hours usually require 45 to 120 minutes of extreme concentration. This paper describes a novel method to reduce diagnostic time by automatically controlling the display frame rate. Unlike existing techniques, this method displays original images with no skipping of frames. The sequence can be played at a high frame rate in stable regions to save time. Then, in regions with rough changes, the speed is decreased to more conveniently ascertain suspicious findings. To realize such a system, cue information about the disparity of consecutive frames, including color similarity and motion displacements is extracted. A decision tree utilizes these features to classify the states of the image acquisitions. For each classified state, the delay time between frames is calculated by parametric functions. A scheme selecting the optimal parameters set determined from assessments by physicians is deployed. Experiments involved clinical evaluations to investigate the effectiveness of this method compared to a standard-view using an existing system. Results from logged action based analysis show that compared with an existing system the proposed method reduced diagnostic time to around 32.5 7 minutes per full sequence while the number of abnormalities found was similar. As well, physicians needed less effort because of the systems efficient operability. The results of the evaluations should convince physicians that they can safely use this method and obtain reduced diagnostic times.
Natsumi HIRAKAWA Kunihiro FUJIYOSHI
Symmetry constrains are the constraints that the given cells should be placed symmetrically in design of analog ICs. We use O-tree to represent placements and propose a decoding algorithm which can obtain one of the minimum placements satisfying the constraints. The decoding algorithm uses linear programming, which is too much time consuming. Therefore we propose a graph based method to recognize if there exists no placement satisfying both the given symmetry and O-tree constraints, and use the method before application of linear programming. The effectiveness of the proposed method was shown by computational experiments.
Ming-Der SHIEH Tai-Ping WANG Chien-Ming WU
We present a systematic and efficient way of managing the path metric memory and simplifying its connection network to the add_compare_select unit (ACSU) for Viterbi decoder (VD) design. Using the derived equations for memory partition and add-compare-select (ACS) arrangement together with the extended in-place scheduling scheme proposed in this work, we can increase the memory bandwidth for conflict-free path metric accesses with hardwired interconnection between the path metric memory and ACSU. Compared with the existing work, the developed architecture possesses the following advantages: (1) Each partitioned memory bank can be treated as a local memory of a specific processing element, inside the ACSU, with hardwired interconnection, so that the interconnect complexity is reduced significantly. (2) The partitioned memory banks can be merged into only two pseudo-banks regardless of the number of adopted ACS processing elements. This not only greatly simplifies the design of address generation unit, but also makes smaller the physical size of required memory. (3) The implementation can be accomplished in a systematic way with regular and simple controlling circuitry. Experimental results demonstrate the effectiveness of the developed architecture and the benefit will be more apparent for convolutional codes with large memory order.
Tetsuo ASANO Shinnya BITOU Mitsuo MOTOKI Nobuaki USUI
This paper presents an algorithm for rotating a subimage in place without using any extra working array. Due to this constraint, we have to overwrite pixel values by interpolated values. Key ideas are local reliability test which determines whether interpolation at a pixel is carried out correctly without using interpolated values, and lazy interpolation which stores interpolated values in a region which is never used for output images and then fills in interpolated values after safety is guaranteed. It is shown that linear interpolation is always safely implemented. An extension to cubic interpolation is also discussed.
Liangpeng GUO Yici CAI Qiang ZHOU Xianlong HONG
Multiple supply voltage (MSV) is an effective scheme to achieve low power. Recent works in MSV are based on physical level and aim at reducing physical overheads, but all of them do not consider level converter, which is one of the most important issues in dual-vdd design. In this work, a logic and layout aware methodology and related algorithms combining voltage assignment and placement are proposed to minimize the number of level converters and to implement voltage islands with minimal physical overheads. Experimental results show that our approach uses much fewer level converters (reduced by 83.23% on average) and improves the power savings by 16% on average compared to the previous approach [1]. Furthermore, the methodology is able to produce feasible placement with a small impact to traditional placement goals.
Subjects' episodic memory performance is not simply reflected by eye movements. We use a 'theta phase coding' model of the hippocampus to predict subjects' memory performance from their eye movements. Results demonstrate the ability of the model to predict subjects' memory performance. These studies provide a novel approach to computational modeling in the human-machine interface.
Xu ZHANG Xiaohong JIANG Susumu HORIGUCHI
Three dimensional (3D) integrated circuits (ICs) have the potential to significantly enhance VLSI chip performance, functionality and device packing density. Interconnects delay and signal integrity issues are critical in chip design. In this paper, we extend the idea of redundant via insertion of conventional 2D ICs and propose an approach for vias insertion/placement in 3D ICs to minimize the propagation delay of interconnects with the consideration of signal integrity. The simulation results based on a 65 nm CMOS technology demonstrate that our approach in general can result in a 9% improvement in average delay and a 26% decrease in reflection coefficient. It is also shown that the proposed approach can be more effective for interconnects delay improvement when it is integrated with the buffer insertion in 3D ICs.
Han-Suh KOO Yong-Joon JEON Byeong-Moon JEON
This letter proposes a motion information inferring scheme for multi-view video coding motivated by the idea that the aspect of motion vector between the corresponding positions in the neighboring view pair is quite similar. The proposed method infers the motion information from the corresponding macroblock in the neighboring view after RD optimization with the existing prediction modes. This letter presents evaluation showing that the method significantly enhances the efficiency especially at high bit rates.
Areeyata SRIPETCH Poompat SAENGUDOMLERT
In a power grid used to distribute electricity, optical fibers can be inserted inside overhead ground wires to form an optical network infrastructure for data communications. Dense wavelength division multiplexing (DWDM)-based optical networks present a promising approach to achieve a scalable backbone network for power grids. This paper proposes a complete optimization procedure for optical network designs based on an existing power grid. We design a network as a subgraph of the power grid and divide the network topology into two layers: backbone and access networks. The design procedure includes physical topology design, routing and wavelength assignment (RWA) and optical amplifier placement. We formulate the problem of topology design into two steps: selecting the concentrator nodes and their node members, and finding the connections among concentrators subject to the two-connectivity constraint on the backbone topology. Selection and connection of concentrators are done using integer linear programming (ILP). For RWA and optical amplifier placement problem, we solve these two problems together since they are closely related. Since the ILP for solving these two problems becomes intractable with increasing network size, we propose a simulated annealing approach. We choose a neighborhood structure based on path-switching operations using k shortest paths for each source and destination pair. The optimal number of optical amplifiers is solved based on local search among these neighbors. We solve and present some numerical results for several randomly generated power grid topologies.
Moon Ho LEE Alexander DUDIN Alexy SHABAN Subash Shree POKHREL Wen Ping MA
Formulae required for accurate approximate calculation of transition probabilities of embedded Markov chain for single-server queues of the GI/ M/1,GI/M/1/K,M/G/1,M/G/1/K type with heavy-tail lognormal distribution of inter-arrival or service time are given.
The research on displacement vector detection has gained increasing attention in recent years. However, no relationship between displacement vectors and the outlines of objects in motion has been established. We describe a new method of detecting displacement vectors through edge segment detection by emphasizing the correlation between displacement vectors and their outlines. Specifically, after detecting an edge segment, the direction of motion of the edge segment can be inferred through the variation in the values of the Laplacian-Gaussian filter at the position near the edge segment before and after the motion. Then, by observing the degrees of displacement before and after the motion, the displacement vector can be calculated. The accuracy compared to other methods of displacement vector detection demonstrates the feasibility of this method.
Weixiang SHEN Yici CAI Xianlong HONG Jiang HU
As power consumption of the clock tree dominates over 40% of the total power in modern high performance VLSI designs, measures must be taken to keep it under control. One of the most effective methods is based on clock gating to shut off the clock when the modules are idle. However, previous works on gated clock tree power minimization are mostly focused on clock routing and the improvements are often limited by the given registers placement. The purpose of this work is to navigate the registers during placement to further reduce the clock tree power based on clock gating. Our method performs activity-aware register clustering that reduces the clock tree power not only by clumping the registers into a smaller area, but also by pulling the registers with the similar activity patterns closely to shut off the clock more time for the resultant subtrees. In order to reduce the impact of signal nets wirelength and power due to register clustering, we apply the timing and activity based net weighting in [14], which reduces the nets switching power by assigning a combination of activity and timing weights to the nets with higher switching rates or more critical timing. To tradeoff the power dissipated by the clock tree and the control signal, we extend the idea of local ungating in [6] and propose an algorithm of gate control signal optimization, which still sets the gate enable signal high if a register is active for a number of consecutive clock cycles. Experimental results on a set of MCNC benchmarks show that our approach is able to reduce the power and total wirelength of clock tree greatly with minimal overheads.
While web proxy caching is a widely deployed technique, the performance of a proxy cache is limited by the local storage. Some studies have addressed this limitation by using the residual resources of clients via a p2p method and have achieved a very high hit rate. However, these approaches treat web objects as homogeneous objects and there is no consideration of various web characteristics. Consequently, the byte hit rate of the system is limited, external bandwidth is wasted, and perceived user latency is increased. The present paper suggests an efficient p2p based web caching technique that manages objects with different policies so as to exploit the characteristics of web objects, such as size and temporal locality. Small objects are stored alone whereas large objects are stored by dividing them into numerous small blocks, which are distributed in clients. On a proxy cache, header blocks of large objects take the place of objects themselves and smaller objects are cached. This technique increases the hit rate. Unlike a web cache, which evicts large objects as soon as possible in the case where clients fulfill the role of backup storage, large objects are given higher priority than small objects in the proposed approach. This maximizes the effect of hits for large objects and thereby increases the byte hit rate. Furthermore, we construct simple latency models for various p2p based web caching systems and analyze the effects of the proposed policies on these systems. We then examine the performances of the efficient policies via a trace driven simulation. The results demonstrate that the proposed techniques effectively enhance web cache performance, including hit rate, byte hit rate, and response time.
Yici CAI Bin LIU Qiang ZHOU Xianlong HONG
The voltage island style has been widely accepted as an effective way to design low power high performance chips. This paper proposes an automated voltage island generation flow in standard cell based designs. Two important objectives in voltage island designs are addressed in this flow: 1) reducing power dissipation under given performance constraints; 2) reducing implementation overheads, mainly layout overheads caused by cell clustering to form islands. The first objective is handled with timing and power driven netweighting and timing analysis in voltage assignment. For the second objective, we propose layout aware voltage assignment, i.e., voltage assignment during placement. We iteratively perform the following to adjustments: adjustment on voltage assignment to facilitate voltage island generation, and adjustment on cell locations to cluster cells in voltage islands. These iterations lead to a flow featured with tightly integrated voltage assignment and cell placement. Experimental results have demonstrated the advantages of our approach.
Taisuke KAZAMA Makoto IKEDA Kunihiro ASADA
We propose a shot reduction technique of character projection (CP) Electron Beam Direct Writing (EBDW) using combined cell stencil (CCS) or the advanced process technology. CP EBDW is expected both to reduce mask costs and to realize quick turn around time. One of major issue of the conventional CP EBDW, however, is a throughput of lithography. The throughput is determined by numbers of shots, which are proportional to numbers of cell instances in LSIs. The conventional shot reduction techniques focus on optimization of cell stencil extraction, without any modifications on designed LSI mask patterns. The proposed technique employs the proposed combined cell stencil, with proposed modified design flow, for further shot reduction. We demonstrate 22.4% shot reduction within 4.3% area increase for a microprocessor and 28.6% shot reduction for IWLS benchmarks compared with the conventional technique.
Mitsuru TOMONO Masaki NAKANISHI Shigeru YAMASHITA Kazuo NAKAJIMA Katsumasa WATANABE
In a partially reconfigurable FPGA of the future, arbitrary portions of its logic resources and interconnection networks will be reconfigured without affecting the other parts. Multiple tasks will be mapped and executed concurrently in such an FPGA. Efficient execution of the tasks using the limited resources of the FPGA will necessitate effective resource management. A number of online FPGA placement methods have recently been proposed for such an FPGA. However, they cannot handle I/O communications of the tasks. Taking such I/O communications into consideration, we introduce a new approach to online FPGA placement. We present an algorithm for placing each arriving task in an empty area so as to complete all the tasks efficiently. We develop two fitting strategies to effectively handle I/O communications of the tasks. Our experimental results show that properly weighted combinations of these and two other previously proposed strategies enable this algorithm to run very fast and make an effective placement of the tasks. In fact, we show that the overhead associated with the use of this algorithm is negligible as compared to the total execution time of the tasks.
A multiple-places reservation discipline is studied in a discrete-time priority queueing system. We obtain the joint distribution of system state, from which the delays of high and low priority packets are derived. Comparison is made with the cases of FIFO, single-place reservation discipline and HOL priority.