Vasily G. MOSHNYAGA Hiroshi TSUJI
Due to a large capacitance and enormous access rate, caches dissipate about a third of the total energy consumed by today's processors. In this paper we present a new architectural technique to reduce energy consumption in caches. Unlike previous approaches, which have focused on lowering cache capacitance and the number of accesses, our method exploits a new freedom in cache design, namely the voltage per access. Since in modern block-buffered caches, the loading capacitance operated on block-hit is much less than the capacitance operated on miss, the given clock cycle time is inefficiently utilized during the hit. We propose to trade-off this unused time with the supply voltage, lowering the voltage level on the hit and increasing it on the miss. Experiments show that the approach can half the cache energy dissipation without large performance and area overhead.
First, we give a graph-theoretic formalization for the spare assignment problems for two cases of reconfiguring NN mesh-connected processor arrays with spares on a diagonal line in the array or two orthogonal lines at the edges of the array. Second, we discuss the problems for minimizing the numbers of "dangerous processors" for the cases. Here, a dangerous processor is a nonfaulty one for which there remains no spare processor to be assigned if it becomes faulty, without modifying the spare assignments to other faulty processors. The problem for the latter case, originally presented by Melhem, has already been discussed and solved by the O(N2) algorithm in [3], but it's procedure is very complicated. Using the above graph-theoretic formalization, we give efficient plain algorithms for minimizing the numbers of dangerous processors by which the problems for both the cases can be solved in O(N) time.
Shuji TSUKIYAMA Masakazu TANAKA Masahiro FUKUI
In this paper, we present a new algorithm for statistical static timing analysis of a CMOS combinatorial circuit, which takes correlations into account to improve accuracy of the distribution of the maximum delay of the circuit. The correlations treated in the algorithm are not only the one between distributions of arrival times of input signals to a logic gate but also correlation between switching delays of a logic gate and correlation between interconnect delays of a net. We model each delay by a normal distribution, and use a normal distribution of two stochastic variables with a coefficient of correlation for computing the maximum of two delays. Since the algorithm takes the correlation into account, the time complexity is O(m2) in the worst-case, where m is the number of edges of the graph representing a given circuit. But, for real combinatorial circuits, the complexity is expected to be less than this.
Kazuaki TAKAHASHI Ushio SANGAWA Suguru FUJITA Michiaki MATSUO Takeharu URABE Hiroshi OGURA Hiroyuki YABUKI
We propose a three-dimensional structure on a planar substrate employing micromachining technology. A low-loss suspended structure incorporating a BCB membrane employing deep trench etching technology has been newly proposed. A micromachined suspended line structure using BCB membrane film enables us to realize a low loss planar resonator, which achieved an unloaded quality factor (Q-factor) of more than 280 at 60 GHz. We design low-loss filters and antennas built into silicon in a 60 GHz band. A low-loss filter realizes an insertion loss of 1.0 dB at 60 GHz and a patch antenna obtains a 3% bandwidth. In addition, we demonstrate a 60 GHz receiver front-end IC incorporating the planar filter and the antenna, and obtain good results. These techniques enable us to integrate various functions into a compact package even in millimeter-wave bands.
Kiminori IRIYAMA Shunsuke IHARA
We study the reliability functions or the minimum r-achievable rates of the lossless coding for the general sources in the sense of Han-Verdu, where r means the exponent of the error probability. Han has obtained formulas for the minimum r-achievable rates of the general sources. Our aim is to give alternative expressions for the minimum r-achievable rates. Our result seems to be a natural extension of the known results for the stationary memoryless sources and Markov sources.
Carlos G. PUNTONET Ali MANSOUR
This paper presents a new adaptive blind separation of sources (BSS) method for linear and non-linear mixtures. The sources are assumed to be statistically independent with non-uniform and symmetrical PDF. The algorithm is based on both simulated annealing and density estimation methods using a neural network. Considering the properties of the vectorial spaces of sources and mixtures, and using some linearization in the mixture space, the new method is derived. Finally, the main characteristics of the method are simplicity and the fast convergence experimentally validated by the separation of many kinds of signals, such as speech or biomedical data.
Basabi CHAKRABORTY Goutam CHAKRABORTY
Feature subset selection basically depends on the design of a criterion function to measure the effectiveness of a particular feature or a feature subset and the selection of a search strategy to find out the best feature subset. Lots of techniques have been developed so far which are mainly categorized into classifier independent filter approaches and classifier dependant wrapper approaches. Wrapper approaches produce good results but are computationally unattractive specially when nonlinear neural classifiers with complex learning algorithms are used. The present work proposes a hybrid two step approach for finding out the best feature subset from a large feature set in which a fuzzy set theoretic measure for assessing the goodness of a feature is used in conjunction with a multilayer perceptron (MLP) or fractal neural network (FNN) classifier to take advantage of both the approaches. Though the process does not guarantee absolute optimality, the selected feature subset produces near optimal results for practical purposes. The process is less time consuming and computationally light compared to any neural network classifier based sequential feature subset selection technique. The proposed algorithm has been simulated with two different data sets to justify its effectiveness.
Jeong-Joon LEE Kyu-Young WHANG Yang-Sae MOON Eui-Kyung HONG
Web caching has become an important problem when addressing the performance issues in Web applications. The expiration time of the Web data item is useful a piece of information for performance enhancement in Web caching. In this paper, we introduce the notion of the effective reference probability that incorporates the effect of expiration time for Web caching. For a formal approach, we propose the continuous independent reference model extending the existing independent reference model. Based on this model, we define formally the effective reference probability and derive it theoretically. By simply replacing the reference probability in the existing cache replacement algorithms with the effective reference probability, we can take the effect of expiration time into account. The results of performance experiments show that the replacement algorithms using the effective reference probability always outperform existing ones. In particular, when the cache fraction is 0.05 and data update is comparatively frequent (i.e., the update frequency is more than 1/10 of the reference frequency), the performance is enhanced by more than 30% in LRU-2 and 13% in Aggarwal's method. The results show that the effective reference probability significantly enhances the performance of Web caching when the expiration time is given.
Seiichiro TANI Toshiaki MIYAZAKI
Network caches reduce network traffic as well as user response time. When implementing network caches, the object replacement problem is one of the core problems; The problem is to determine which objects should be evicted from a cache when there is insufficient space. This paper first formalizes the problem and gives a simple but sufficient condition for deterministic online algorithms to be competitive. Based on the condition, a general framework to make a non-competitive algorithm competitive is constructed. As an application of the framework, an online algorithm, called Competitive_SIZE, is proposed. Both event-driven and trace-driven simulations show that Competitive_SIZE is better than previously proposed algorithms such as LRU (Least Recently Used).
The formulation of the process of analog system design has been done on the basis of the control theory application. This approach generalizes the design process and produces different design trajectories inside the same optimization procedure. The problem of the optimal design algorithm construction is defined as the minimal-time problem of the control theory. The main equations for the proposed design methodology were elaborated. These equations include the special control functions that are introduced artificially to generalize the design problem. Optimal dependencies of the control functions give the possibility to reduce the total computer design time. This idea was tested with different optimization algorithms of the design process. Numerical results of some simple electronic circuit design demonstrate the efficiency of the proposed approach. These examples show that the traditional design strategy is not time-optimal and the potential computer time gain of the optimal design strategy increases when the size and complexity of the system increase.
This paper investigates the stochastic property of the packet destinations and proposes an address generation algorithm which is applicable for describing various Internet access patterns. We assume that a stochastic process of Internet access satisfies the stationary condition and derive the fundamental structure of the address generation algorithm. Pseudo IP-address sequence generated from our algorithm gives dependable cache performance and reproduces the results obtained from trace-driven simulation. The proposed algorithm is applicable not only to the destination IP address but also to the destination URLs of packets, and is useful for simulation studies of Internet performance, Web caching, DNS, and so on.
Myint Myint SEIN Hiromitsu HAMA
This paper presents an accurate method for finding the 3D control points of the B-Spline curves. This method can automatically fit a set of data points with piecewise geometrically continuous cubic B-Spline curves. Iterating algorithm has been used for finding the 2D control points. And a new approach for shape reconstruction based on the control points of the curves on the object's surface is proposed. B-Spline patch, the extension of the B-Spline curves to surface, provides recovering the shape of the object in 2D approach. The 3D control points of the cubic B-Spline curves are computed from the factor decomposition of the measurement matrix of 2D control points. The multiple object approach is also proposed to reconstruct the 3D shape of each curves of an object. Some experiments are demonstrated to confirm the effectiveness of our proposed method.
Hyokyung BAHN Yong Hyeon SHIN Kern KOH
A new Web cache sharing scheme is presented. Our method reduces the duplicated copies of the same objects in global shared Web caches, even though the hot working set of each local cache can be duplicated. Experimental results show that the proposed scheme outperforms existing sharing schemes.
Sun-Mo KIM Jung-Woo LEE Soo-Haeng LEE Sang-Bang CHOI
Cache memories are small fast memories used to temporarily hold the contents of main memory that are likely to be referenced by processors so as to reduce instruction and data access time. In study of cache performance, most of previous works have employed simulation-based methods. However, that kind of researches cannot precisely explain the obtained results. Moreover, when a new processor is designed, huge simulations must be performed again with several different parameters. This research classifies cache structures for superscalar processors into four types, and then represents analytical model of instruction fetch process for each cache type considering various kinds of architectural parameters such as the frequency of branch instructions in program, cache miss rate, cache miss penalty, branch misprediction frequency, and branch misprediction penalty, and etc. To prove the correctness of the proposed models, we performed extensive simulations and compared the results with the analytical models. Simulation results showed that the proposed model can estimate the expected instruction fetch rate accurately within 10% error in most cases. This paper shows that the increase of cache misses reduces the instruction fetch rate more severely than that of branch misprediction does. The model is also able to provide exact relationship between cache miss and branch misprediction for the instruction fetch analysis. The proposed model can explain the causes of performance degradation that cannot be uncovered by the simulation method only.
Byung In MOON Dong Ryul RYU Jong Wook HONG Tae Young LEE Sangook MOON Yong Surk LEE
We have designed a 32-bit RISC microprocessor with 16-/32-bit fixed-point DSP functionality. This processor, called YD-RISC, combines both general-purpose microprocessor and digital signal processor (DSP) functionality using the reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operation, digital signal processing (DSP) and memory access. They operate in parallel in order to remove stall cycles after DSP or load/store instructions, which usually need one or more issue latency cycles in addition to the first issue cycle. High performance was achieved with these parallel functional units while adopting a sophisticated five-stage pipeline structure. The pipelined DSP unit can execute one 32-bit multiply-accumulate (MAC) or 16-bit complex multiply instruction every one or two cycles through two 17-b 17-b multipliers and an operand examination logic circuit. Power-saving techniques such as power-down mode and disabling execution blocks allow low power consumption. In the design of this processor, we use logic synthesis and automatic place-and-route. This top-down approach shortens design time, while a high clock frequency is achieved by refining the processor architecture.
Jooyong KIM Hyokyung BAHN Kern KOH
Caching at the Web proxy server plays an important role in reducing the response time, the network traffic, and the load of Web servers. Many recent studies have proposed and examined the replacement and consistency policies for the proxy cache, which plays a central role in the performance of caching components. For better performance, they exploit various metadata of Web objects, such as the reference count, reference time, and modification time information of past behaviors, to estimate the re-reference likelihood and freshness of the objects. However, all of these known to the authors use the metadata only when the actual object is in the cache. We observed from various proxy traces that about 20-30% of clients' requests incurred only the validity checks of cached objects without transferring actual objects from the proxy server. In this case, only the metadata are necessary at the proxy server. This paper proposes a proxy cache consistency policy that uses the metadata even for absent objects. These include the time information of evicted objects from the cache and those out of the header-only replies from Web servers. Trace-driven simulations with public proxy cache traces show that our policy reduces the response time and the number of connections to Web servers significantly.
Junnosuke MORIYA Tetsuro NISHINO
In the filed of cognitive psychology, simple recurrent networks are used for modeling the natural language processing in the human brain. For example, Elman experimentally showed that the simple recurrent networks can predict the right-most word in sentential forms of a particular grammar which can generate compound sentences with high probability. Concerning these results, it is natural to ask whether the computational capability of the simple recurrent networks is sufficient to recognize natural languages. In this paper, we assume that the range of a function computed at each gate of a simple recurrent network is a finite set. This is a quite realistic assumption, because we cannot physically implement a gate whose range is an infinite set. Then, we define equivalence relations between simple recurrent networks and Mealy machines or Moore machines, which are finite automata with output. Then, under our assumption, we show (1) a construction of a Mealy machine which simulates a given simple recurrent network, and (2) a construction of a simple recurrent network which simulates a given Moore machine. From these two constructions, we can conclude that the computational capability of the simple recurrent networks is equal to that of finite automata with output under our assumption.
Takeru AMANO Fumio KOYAMA Nobuhiko NISHIYAMA Akihiro MATSUTANI Kenichi IGA
A novel temperature insensitive wavelength filter consisting of GaAlAs/GaAs distributed Bragg reflectors (DBRs) has been demonstrated. This micromachined DBR is mechanically tuned by differential thermal expansion. The strain-induced displacement of one mirror can generate wavelength tuning and trimming functions with an adjustable temperature dependence. We succeeded in the control of temperature dependence in this micromachined semiconductor filter by properly designing a vertical cavity structure. The achieved temperature dependence was as small as +0.01 nm/K, which is one-tenth of that of conventional semiconductor based optical filters. Also, a wavelength trimming of over 20 nm was demonstrated after completing the device fabrication. In addition, we demonstrated a 4 4 multiple wavelength micromachined vertical cavity filter array. The multi-wavelength filter array with a wavelength span of 45 nm was fabricated by partially etching off a GaAs wavelength control layer loaded on the top surface of device.
Takeru AMANO Fumio KOYAMA Nobuhiko NISHIYAMA Akihiro MATSUTANI Kenichi IGA
A novel temperature insensitive wavelength filter consisting of GaAlAs/GaAs distributed Bragg reflectors (DBRs) has been demonstrated. This micromachined DBR is mechanically tuned by differential thermal expansion. The strain-induced displacement of one mirror can generate wavelength tuning and trimming functions with an adjustable temperature dependence. We succeeded in the control of temperature dependence in this micromachined semiconductor filter by properly designing a vertical cavity structure. The achieved temperature dependence was as small as +0.01 nm/K, which is one-tenth of that of conventional semiconductor based optical filters. Also, a wavelength trimming of over 20 nm was demonstrated after completing the device fabrication. In addition, we demonstrated a 4 4 multiple wavelength micromachined vertical cavity filter array. The multi-wavelength filter array with a wavelength span of 45 nm was fabricated by partially etching off a GaAs wavelength control layer loaded on the top surface of device.
David Chee Kheong SIEW Gang FENG
The problem of finding a minimum-cast multicast tree (Steiner tree) is known as NP complete. Heuristic based algorithms for this problem to achieve good performance are usually time-consuming. In this paper, we propose a new strategy called tree-caching for efficient multicast connection setup in connection-oriented networks. In this scheme, the tree topologies that have been computed are cached in a database of the source nodes. This can reduce the connection establishment time for subsequent connection requests which have some common multicast members, by an efficient reuse of cached trees without having to re-run a multicast routing algorithm for the whole group. This method can provide an efficient way to eliminate, when ever possible, the expensive tree computation algorithm that has to be performed in setting up a multicast connection. We first formulate the problem of tree-caching and then propose a tree-caching algorithm to reduce the complexity of the tree computations when a new connection is to be established. Through simulations, we find that the proposed tree-caching strategy performs very well and can significantly reduce the computation complexity for setting up multicast connections.