Let f and g be two maps from a set E into a set F such that f(x) g(x) for every x in E. Sahili [8] has shown that, if min {|f-1(z)|,|g-1(z)|}≤ n for each z∈ F, then E can be partitioned into at most 2n+1 sets E1,..., E2n+1 such that f(Ei)∩ g(Ei)=
Temperature-tracking is becoming of paramount importance in modern electronic design automation tools. In this paper, we present a deterministic thermal placement algorithm for standard cell based layout which can lead to a smooth temperature distribution over the die. It is mainly based on Fiduccia-Mattheyses partition scheme and a former substrate thermal model that can convert the known temperature constraints into the corresponding power distribution constraints. Moreover, a kind of force-directed heuristic based on cells' power consumption is introduced in the above process. Experimental results demonstrate a comparatively uniform temperature distribution and show a reduction of the maximal temperature on the die.
Satoshi TAOKA Kazuya WATANABE Toshimasa WATANABE
Let G = (D ∪ S,E) be an undirected graph with a vertex set D ∪ S and an (undirected) edge set E, where the vertex set is partitioned into two subsets, a demand vertex set D and a supply vertex set S. We assume that D ≠
CheolHong KIM SungWoo CHUNG ChuShik JHON
Energy efficiency of cache memories is crucial in designing embedded processors. Reducing energy consumption in the instruction cache is especially important, since the instruction cache consumes a significant portion of total processor energy. This paper proposes a new instruction cache architecture, named Partitioned Instruction Cache (PI-Cache), for reducing dynamic energy consumption in the instruction cache by partitioning it to smaller (less power-consuming) sub-caches. When the proposed PI-Cache is accessed, only one sub-cache is accessed by utilizing the temporal/spatial locality of applications. In the meantime, other sub-caches are not accessed, leading to dynamic energy reduction. The PI-Cache also reduces dynamic energy consumption by eliminating the energy consumed in tag lookup and comparison. Moreover, the performance gap between the conventional instruction cache and the proposed PI-Cache becomes little when the physical cache access time is considered. We evaluated the energy efficiency by running a cycle accurate simulator, SimpleScalar, with power parameters obtained from CACTI. Simulation results show that the PI-Cache improves the energy-delay product by 20%-54% compared to the conventional direct-mapped instruction cache.
Yuichi NAKAMURA Ko YOSHIKAWA Takeshi YOSHIMURA
This paper describes a novel engineering change order (ECO) design method for large-scale, high performance LSIs, based on a patchwork-like partitioning technique. In conventional design methods, even when only small changes are made to the design after the placement and routing process, a whole re-layout must be done, and this is very time consuming. Using the proposed method, we can partition the design into several parts after logic synthesis. When design changes occur in HDL, only the parts related to the changes need to be redesigned. The netlist for the changed design remains almost the same as the original, except for the small changed parts. For partitioning, we used multiple-fan-out-points as partition borders. An experimental evaluation of our method showed that when a small change was made in the RTL description, the revised circuit part had only about 87 gates on average. This greatly reduces the re-layout time required for implementing an ECO. In actual commercial designs in which several design changes are required, it takes only one day to redesign.
Sung-Hwan JUNG Jung-Wan HONG Chang-Hoon LIE
An adaptive service framework is expected to support real-time multimedia services in wirless/mobile cellular networks with various classes of traffic and diverse bandwidth requirements. Quality of service (QoS) provisioning in an adaptive framework is another challenging consideration, such as quantifying the level of bandwidth degradation of an ongoing calls and guaranteeing stable QoS levels. Considering both the period and the depth of degradation, the degradation area ratio (DAR) represents the average ratio of a call's degradation and is one of the meaningful measures for adaptive service in call level analysis. In this paper, analytical models for estimating the DAR and finding the optimal control parameters are presented in multi-class traffic call management situations. In complete partitioning capacity based threshold-type call admission control (CAC), a one-dimensional Markov chain with an absorbing state is proposed for estimating the DAR in each traffic class. We formulate a two-leveled optimization problem minimizing the total blocking probabilities subject to QoS requirements and present the procedures required in finding the optimal capacities and threshold values by using modified dynamic programming. In complete sharing capacity based threshold-type CAC, the multidimensional Markov model is approximately reduced to a one-dimensional model in order to reduce complexity and hence calculation time. The reduced model is compared with multidimensional Markov model in numerical examples. The optimization problem is formulated minimizing the total blocking probabilities subject to QoS requirements and the optimal threshold parameters are found by using a genetic algorithm. Performance of two adopted admission policies in adaptive framework situations is illustrated by numerical results.
Susu JIANG Kentaro IKEMOTO Ryuji KOHNO
We introduce a differential space-time block code (DSTBC) integrated with trellis coded modulation with two transmit antennas. Our scheme enables transmission of DSTBC encoded symbols as trellis metric rather than concatenating an outer code. Unlike conventional DSTBC, different transmit symbol phase rotations are used for each transmit antenna in order to obtain more options for trellis branch. The set partitioning for proposed codes is derived as well. The decoder computes decision statistic using Viterbi Algorithm with different number of states undergoing Rayleigh fading channels. This approach can provide full diversity gain as well as coding gain simultaneously remaining full transmit rate, which cannot be obtained by conventional DSTBC.
Aryuanto SOETEDJO Koichi YAMADA
Traffic sign recognition usually consists of two stages: detection and classification. In this paper, we describe the classification stage using the ring-partitioned method. The proposed method uses a specified grayscale image in the pre-processing step and ring-partitioned matching in the matching step. The method does not need carefully prepared many samples of traffic sign images for the training process, alternatively only the standard traffic signs are used as the reference images. The experimental results show the effectiveness of the method in the matching of occluded, rotated, and illumination problems of the traffic sign images with the fast computation time.
Jing WANG Naoya NITTA Hiroyuki SEKI
A distributed network-oriented Intrusion Detection System (IDS) is a mechanism which detects misuse accesses to an intra-network by distributed IDSs on the network with decomposed attack scenarios. However, there are only ad hoc algorithms for determining a deployment of distributed IDSs and a partition of the attack scenarios. In this paper, we formally define this problem as the IDS partition deployment problem and design an efficient algorithm for a simplified version of the problem by graph theoretical techniques.
Tso-Bing JUANG Shen-Fu HSIAO Ming-Yu TSAI Jenq-Shiun JAN
In this paper, a cell-driven multiplier generator is developed that can produce high-performance gate-level netlists for multiplier-related arithmetic functional units, including multipliers, multiplier and accumulators (MAC) and dot product calculator. The generator optimizes the speed/area performance both in the partial product compression and in the final addition stage for the specified process technology. In addition to the conventional CMOS full adder cells, we have also designed fast compression elements based on pass-transistor logic for further performance improvement of the generated multipliers. Simulation results show that our proposed generator could produce better multiplier-related functional units compared to those generated using Synopsys Designware library or other previously proposed approaches.
Let G = (V,E) be a connected graph such that each edge e ∈ E and each vertex v ∈ V are weighted by nonnegative reals w(e) and h(v), respectively. Let r be a vertex designated as a root, and p be a positive integer. The minmax rooted-subtree cover problem (MRSC) asks to find a partition X = {X1,X2,...,Xp of V and a set of p subtrees T1,T2,...,Tp such that each Ti contains Xi∪{r} so as to minimize the maximum cost of the subtrees, where the cost of Ti is defined to be the sum of the weights of edges in Ti and the weights of vertices in Xi. Similarly, the minmax rooted-cycle cover problem (MRCC) asks to find a partition X = {X1,X2,...,Xp} of V and a set of p cycles C1,C2,...,Cp such that Ci contains Xi∪{r} so as to minimize the maximum cost of the cycles, where the cost of Ci is defined analogously with the MRSC. In this paper, we first propose a (3-2/(p+1))-approximation algorithm to the MRSC with a general graph G, and we then give a (6-4/(p+1))-approximation algorithm to the MRCC with a metric (G,w).
Hun-Chen CHEN Tian-Sheuan CHANG Jiun-In GUO Chein-Wei JEN
This paper presents a long length discrete Hartley transform (DHT) design with a new hardware efficient distributed arithmetic (DA) approach. The new DA design approach not only explores the constant property of coefficients as the conventional DA, but also exploits its cyclic property. To efficiently apply this approach to long length DHT, we first decompose the long length DHT algorithm to short ones using the prime factor algorithm (PFA), and further reformulate it by using Agarwal-Cooley algorithm such that all the partitioned short DHT still consists of the cyclic property. Besides, we also exploit the scheme of computation sharing on the content of ROM to reduce the hardware cost with the trade-off in slowing down the computing speeds. Comparing with the existing designs shows that the proposed design can reduce the area-delay product from 52% to 91% according to a 0.35 µm CMOS cell library.
Hideki KAWAZU Jumpei UCHIDA Yuichiro MIYAOKA Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
A b-bit SIMD functional unit has n k-bit sub-functional units in itself, where b = k n. It can execute n-parallel k-bit operations. However, all the b-bit functional units in a processor core do not necessarily execute n-parallel operations. Depending on an application program, some of them just execute n/2-parallel operations or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 k-bit sub-functional units or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces sub-operation parallelism of a SIMD functional unit while the timing constraint of execution time satisfied. Thereby, we can finally find a processor core with small area under the given timing constraint. We expect that we can obtain processor core configurations of smaller area in the same timing constraint rather than a conventional system. The promising experimental results are also shown.
Xiaoqing WEN Seiji KAJIHARA Hideo TAMAMOTO Kewal K. SALUJA Kozo KINOSHITA
This paper presents a novel approach to improving the IDDQ-based diagnosability of a CMOS circuit by dividing the circuit into independent partitions and using a separate power supply for each partition. This technique makes it possible to implement multiple IDDQ measurement points, resulting in improved IDDQ-based diagnosability. The paper formalizes the problem of partitioning a circuit for this purpose and proposes optimal and heuristic based solutions. The effectiveness of the proposed approach is demonstrated through experimental results.
Daebum CHOI Vladimir SHIN Jun IL AHN Byung-Ha AHN
This paper considers the problem of recursive filtering for linear discrete-time systems with uncertain observation. A new approximate adaptive filter with a parallel structure is herein proposed. It is based on the optimal mean square combination of arbitrary number of correlated estimates which is also derived. The equation for error covariance characterizing the mean-square accuracy of the new filter is derived. In consequence of parallel structure of the filtering equations the parallel computers can be used for their design. It is shown that this filter is very effective for multisensor systems containing different types of sensors. A practical implementation issue to consider this filter is also addressed. Example demonstrates the accuracy and efficiency of the proposed filter.
Current centralized restoration schemes are unsuitable for the increase of scale and complexity of networks. A novel distributed network partition scheme is proposed in this paper. In this scheme, a large-scale network can be partitioned into some wheellike sub-networks with nuclear nodes. In wheellike sub-networks, ring links and spoke links could provide reciprocal safeguard. Based on such structure, different distributed restoration schemes can be combined for failure restoration. The proposed partition approach has been implemented through computer simulation, and it was tested on practical national-scale optical networks. The simulation result shows that this scheme is practicable and effectual.
Hyuk-Choon KWON Won-Seok JANG Sang-Kook HAN
We have proposed and experimentally demonstrated a novel WDM-PON downstream optical link. It is composed of a wavelength-locked FP-LD with a spectrally-sliced FP-LD as an external-injection optical source and operated as directly-modulating in a downstream-traffic transmitter. The downstream transmissions at 622 Mbps and 2.5 Gbps were performed for four channels over 25 km. The proposed WDM-PON downstream transmitter can be expanded up to eight channels by controlling an external-injection optical source of a spectrally-sliced FP-LD. Also, the transmitter has facility of multi-channel selection by controlling temperature. We verified the potential of the transmitter in WDM-PON optical link.
This paper presents an efficient scaling-simulation algorithm that simulates operations of the reconfigurable mesh (RM) of size n n using the mesh with multiple partitioned buses (MMPB) of size m m (m < n). The RM and the MMPB are the two-dimensional mesh-connected computers equipped with broadcasting buses. The broadcasting buses of the RM can be used to dynamically obtain various interconnection patterns among the processors during the execution of programs, while those of the MMPB are placed only to every row and column and are statically partitioned in advance by a fixed length. We show that the RM of size n n can be simulated in steps by the MMPB of size m m (m < n), where L is the number of broadcasting buses in each row/column of the simulating MMPB. Although the time-complexity of our algorithm is less efficient than that of the fastest RM scaling-simulation algorithm, the simulating model of our algorithm is the MMPB model where the bus-reconfiguration is not allowed.
Herng-Jer LEE Ming-Hong LAI Chia-Chi CHU Wu-Shiung FENG
A new moment computation technique for general lumped R(L)C interconnect circuits with multiple resistor loops is proposed. Using the concept of tearing, a lumped R(L)C network can be partitioned into a spanning tree and several resistor links. The contributions of network moments from each tree and the corresponding links can be determined independently. By combining the conventional moment computation algorithms and the reduced ordered binary decision diagram (ROBDD), the proposed method can compute system moments efficiently. Experimental results have demonstrate that the proposed method can indeed obtain accurate moments and is more efficient than the conventional approach.
Jumpei UCHIDA Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
This paper proposes a thread partitioning algorithm in low power high-level synthesis. The algorithm is applied to high-level synthesis systems. In the systems, we can describe parallel behaving circuit blocks (threads) explicitly. First it focuses on a local register file RF in a thread. It partitions a thread into two sub-threads, one of which has RF and the other does not have RF. The partitioned sub-threads need to be synchronized with each other to keep the data dependency of the original thread. Since the partitioned sub-threads have waiting time for synchronization, gated clocks can be applied to each sub-thread. Then we can synthesize a low power circuit with a low area overhead, compared to the original circuit. Experimental results demonstrate effectiveness and efficiency of the algorithm.