Dynamic instruction window resizing (DIWR) is a scheme that effectively exploits both memory-level parallelism and instruction-level parallelism by configuring the instruction window size appropriately for exploiting each parallelism. Although a previous study has shown that the DIWR processor achieves a significant speedup, power consumption has not been explored. The power consumption is increased in DIWR because the instruction window resources are enlarged in memory-intensive phases. If the power consumption exceeds the power budget determined by certain requirements, the DIWR processor must save power and thus, the performance previously presented cannot be achieved. In this paper, we explore to what extent the DIWR processor can achieve improved performance for a given power budget, assuming that dynamic voltage and frequency scaling (DVFS) is introduced as a power saving technique. Evaluation results using the SPEC2006 benchmark programs show that the DIWR processor, even with a constrained power budget, achieves a speedup over the conventional processor over a wide range of given power budgets. At the most important power budget point, i.e., when the power a conventional processor consumes without any power constraint is supplied, DIWR achieves a 16% speedup.
Two-step physical register deallocation (TSD) is an architectural scheme that enhances memory-level parallelism (MLP) by pre-executing instructions. Ideally, TSD allows exploitation of MLP under an unlimited number of physical registers, and consequently only a small register file is needed for MLP. In practice, however, the amount of MLP exploitable is limited, because there are cases where either 1) pre-execution is not performed; or 2) the timing of pre-execution is delayed. Both are due to data dependencies among the pre-executed instructions. This paper proposes the use of value prediction to solve these problems. This paper proposes the use of value prediction to solve these problems. Evaluation results using the SPECfp2000 benchmark confirm that the proposed scheme with value prediction for predicting addresses achieves equivalent IPC, with a smaller register file, to the previous TSD scheme. The reduction rate of the register file size is 21%.
Out-of-order superscalar processors rename register numbers to remove false dependencies between instructions. A renaming logic for register renaming is a high-cost module in a superscalar processor, and it consumes considerable energy. A renamed trace cache (RTC) was proposed for reducing the energy consumption of a renaming logic. An RTC caches and reuses renamed operands, and thus, register renaming can be omitted on RTC hits. However, conventional RTCs suffer from several performance, energy consumption, and hardware overhead problems. We propose a semi-global renamed trace cache (SGRTC) that caches only renamed operands that are short distance from producers outside traces, and solves the problems of conventional RTCs. Evaluation results show that SGRTC achieves 64% lower energy consumption for renaming with a 0.2% performance overhead as compared to a conventional processor.
Hiroshi TOYAO Noriaki ANDO Takashi HARADA
A novel approach is proposed for miniaturizing the unit cell size of electromagnetic bandgap (EBG) structures that suppress power plane noise. In this approach, open stubs are introduced into the shunt circuits of these EBG structures. Since the stub length determines the resonant frequencies of the shunt circuit, the proposed structures can maintain the bandgaps at lower frequencies without increasing the unit cell size. The bandgap frequencies were estimated by dispersion analysis based on the Bloch theorem and full-wave simulations. Sample boards of the proposed EBG structures were fabricated with a unit cell size of 2.1 mm. Highly suppressed noise propagation over the estimated frequency range of 1.9-3.6 GHz including the 2.4-GHz wireless-LAN band was experimentally demonstrated.
Kyohei YAMAGUCHI Yuya KORA Hideki ANDO
This paper evaluates the delay of the issue queue in a superscalar processor to aid microarchitectural design, where quick quantification of the complexity of the issue queue is needed to consider the tradeoff between clock cycle time and instructions per cycle. Our study covers two aspects. First, we introduce banking tag RAM, which comprises the issue queue, to reduce the delay. Unlike normal RAM, this is not straightforward, because of the uniqueness of the issue queue organization. Second, we explore and identify the correct critical path in the issue queue. In a previous study, the critical path of each component in the issue queue was summed to obtain the issue queue delay, but this does not give the correct delay of the issue queue, because the critical paths of the components are not connected logically. In the evaluation assuming 32-nm LSI technology, we obtained the delays of issue queues with eight to 128 entries. The process of banking tag RAM and identifying the correct critical path reduces the delay by up to 20% and 23% for 4- and 8-issue widths, respectively, compared with not banking tag RAM and simply summing the critical path delay of each component.
Masami SHISHIBORI Kazuaki ANDO Yuuichirou KASHIWAGI Jun-ichi AOE
Natural language interface systems can accept more unrestricted queries from users than other systems, however it is impossible to understand erroneous sentences which include the syntax errors, unknown words and misspelling. In order to realize the superior natural language interface, the automatic error correction for erroneous sentences is one of problems to be solved. The method to apply the LR parsing strategies is one of the famous approaches as the robust error recovery scheme. This method is able to obtain a high correction accuracy, however it takes a great deal of time to parse the sentence, such that it becomes a very important task to improve the time-cost. In this paper, we propose the method to improve the time efficiency, keeping the correction accuracy of the traditional method. This method makes use of a new parsing table that denotes the states to be transited after accepting each symbol. By using this table, the symbol which is allocated just after the error position can be utilized for selecting correction symbols, as a result, the number of candidates produced on the correction process is reduced, and fast system can be realized. The experiment results, using 1,050 sentences including error characters, show that this method can correct error points 69 times faster than the traditional method, also keep the same correction accuracy as the traditional method.
Yoshiaki ANDO Ning GUAN Ken'ichiro YASHIRO Sumio OHKAWA
Excitation of magnetostatic surface waves by slot line transducers is analyzed by using the integral kernel expansion method. The Fourier integral for the current density is derived in terms of an unknown normal component of the magnetic flux density in a slot region. The integral kernel is expanded into a series of orthogonal polynomials and then applying Galerkin's method to the resulting equation yields a system of linear equations for the unknown coefficients. Comparison of a numerical result by the present method with an experiment is in good agreement.
Yoshiaki ANDO Masashi HAYAKAWA
The perfect matched layer (PML) is formulated for the use in the constrained interpolation profile (CIP) method. Numerical results are presented to examine the performance of the proposed formulation of the PML in the case of two-dimensional TM wave. The results show that the proposed methods suppress the reflection effectively in comparison with the natural absorbing boundary condition of the CIP method. We have two methods to formulate the PML, and it is shown that the both methods have equal characteristics.
Ki ANDO Yuki HASEGAWA Hitoshi MAEKAWA Teruaki KATSUBE
The bioelectric potential of plants is generated by ion concentration difference between inside and outside of plant cells. It has been reported that the bioelectric potential of leaves changes at the beginning of steady irradiation and intensity of the potential response increases with the photosynthetic rate. Although it has been reported that photosynthesis is accelerated by blinking irradiation, the potential response under the blinking irradiation have not been fully clarified. In this study, we measured the bioelectric potential and CO2 consumption of plants under various types of the blinking irradiation. This result showed that the potential response under the blinking irradiation has various behaviors and intensity of the response related to photosynthetic rate. We conclude that our method is suitable for monitoring the biological activity of plants such as photosynthesis.
Hideyuki ANDO Maki SUGIMOTO Taro MAEDA
There has recently been considerable interest in research on wearable non-grounded force display. However, there have been no developments for the communication of nonverbal information (ex. tennis and golf swing). We propose a small and lightweight wearable force display to present motion timing and direction. The display outputs a torque using rotational moment and mechanical brakes. We explain the principle of this device, and describe an actual measurement of the torque and torque sensitivity experiments.
Ryota SHIOYA Ryo TAKAMI Masahiro GOSHIMA Hideki ANDO
Out-of-order superscalar processors have high performance but consume a large amount of energy for dynamic instruction scheduling. We propose a front-end execution architecture (FXA) for improving the energy efficiency of out-of-order superscalar processors. FXA has two execution units: an out-of-order execution unit (OXU) and an in-order execution unit (IXU). The OXU is the execution core of a common out-of-order superscalar processor. In contrast, the IXU consists only of functional units and a bypass network only. The IXU is placed at the processor front end and executes instructions in order. The IXU functions as a filter for the OXU. Fetched instructions are first fed to the IXU, and the instructions are executed in order if they are ready to execute. The instructions executed in the IXU are removed from the instruction pipeline and are not executed in the OXU. The IXU does not include dynamic scheduling logic, and thus its energy consumption is low. Evaluation results show that FXA can execute more than 50% of the instructions by using IXU, thereby making it possible to shrink the energy-consuming OXU without incurring performance degradation. As a result, FXA achieves both high performance and low energy consumption. We evaluated FXA and compared it with conventional out-of-order/in-order superscalar processors after ARM big.LITTLE architecture. The results show that FXA achieves performance improvements of 7.4% on geometric mean in SPECCPU INT 2006 benchmark suite relative to a conventional superscalar processor (big), while reducing the energy consumption by 17% in the entire processor. The performance/energy ratio (the inverse of the energy-delay product) of FXA is 25% higher than that of a conventional superscalar processor (big) and 27% higher than that of a conventional in-order superscalar processor (LITTLE).
Eisuke HARAGUCHI Hitomi ONO Junya NISHIOKA Toshiyuki ANDO Masateru NAGASE Akira AKAISHI Takashi TAKAHASHI
To provide a satellite communication system with high reliability for social infrastructure, building flexible beam adapting to change of communication traffic is necessary. Optical Beam Forming Network has the capability of broadband transmission and small light construction. However, in space environment, there are concerns that the reception efficiency is reduced by the relative phase error of receiving signal among antenna elements with temperature fluctuation. To prevent this, we control relative phase among received signals with optical phase locked loop. In this paper, we propose the active optical phased array system using multi dither heterodyning technique for receiving OBF, and present experimental results under temperature fluctuation. We evaluated the stability of relative phase among 3 elements for temperature fluctuation at multiplexer from -15 to 45, and checked the stability of PLL among 3 elements.
Yoshinao ISOBE Nobuhiko MIYAMOTO Noriaki ANDO Yutaka OIWA
In this paper, we demonstrate that a formal approach is effective for improving reliability of cooperative robot designs, where the control logics are expressed in concurrent FSMs (Finite State Machines), especially in accordance with the standard FSM4RTC (FSM for Robotic Technology Components), by a case study of cooperative transport robots. In the case study, FSMs are modeled in the formal specification language CSP (Communicating Sequential Processes) and checked by the model-checking tool FDR, where we show techniques for modeling and verification of cooperative robots implemented with the help of the RTM (Robotic Technology Middleware).
Yoshiaki ANDO Yusuke TAKAHASHI
This paper presents an application of the constained interpolation profile basis set (CIP-BS) method to electromagnetic fields analyses. Electromagnetic fields can be expanded in terms of multi-dimensional CIP basis functions, and the Galerkin method can then be applied to obtain a system of linear equations. In the present study, we focus on a two-dimensional problem with TMz polarization. In order to examine the precision of the CIP-BS method, TE202 resonant mode in a rectangular cavity is analyzed. The numerical results show that CIP-BS method has better performance than the finite-difference time-domain (FDTD) method when the time step is small. Then an absorbing boundary condition based on the perfectly matched layer (PML) is formulated, and the absorption performance is demonstrated. Finally, the propagation in an inhomogeneous medium is computed by using the proposed method, and it is observed that in the CIP-BS method, smooth variation of material constants is effectively formulated without additional computational costs, and that accurate results are obtained in comparison with the FDTD method even if the permittivity is high.
Yuya KORA Kyohei YAMAGUCHI Hideki ANDO
Single-thread performance has not improved much over the past few years, despite an ever increasing transistor budget. One of the reasons for this is that there is a speed gap between the processor and main memory, known as the memory wall. A promising method to overcome this memory wall is aggressive out-of-order execution by extensively enlarging the instruction window resources to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Although pipelining the resources solves this problem, it in turn prevents instruction-level parallelism (ILP) from being exploited because issuing instructions requires multiple clock cycles. This paper proposed a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploiting MLP. Conversely, if the scheme predicts that less MLP is available, that is, ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over that of a conventional processor, with additional cost of only 6% of the area of the conventional processor core or 3% of that of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (energy-delay product).
Yasutaka MATSUDA Ryota SHIOYA Hideki ANDO
The high energy consumption of current processors causes several problems, including a limited clock frequency, short battery lifetime, and reduced device reliability. It is therefore important to reduce the energy consumption of the processor. Among resources in a processor, the issue queue (IQ) is a large consumer of energy, much of which is consumed by the wakeup logic. Within the wakeup logic, the tag comparison that checks source operand readiness consumes a significant amount of energy. This paper proposes an energy reduction scheme for tag comparison, called double-stage tag comparison. This scheme first compares the lower bits of the tag and then, only if these match, compares the higher bits. Because the energy consumption of tag comparison is roughly proportional to the total number of bits compared, energy is saved by reducing this number. However, this sequential comparison increases the delay of the IQ, thereby increasing the clock cycle time. Although this can be avoided by allocating an extra cycle to the issue operation, this in turn degrades the IPC. To avoid IPC degradation, we reconfigure a small number of entries in the IQ, where several oldest instructions that are likely to have an adverse effect on performance reside, to a single stage for tag comparison. Our evaluation results for SPEC2017 benchmark programs show that the double-stage tag comparison achieves on average a 21% reduction in the energy consumed by the wakeup logic (15% when including the overhead) with only 3.0% performance degradation.
Yoshiaki ANDO Hiroyuki SAITO Masashi HAYAKAWA
A total-field/scattered-field (TF/SF) boundary which is commonly used in the finite-difference time-domain (FDTD) method to illuminate scatterers by plane waves, is developed for use in the constrained interpolation profile (CIP) method. By taking the numerical dispersion into account, the nearly perfect TF/SF boundary can be achieved, which allows us to calculate incident fields containing high frequency components without fictitious scattered fields. First of all, we formulate the TF/SF boundary in the CIP scheme. The numerical dispersion relation is then reviewed. Finally the numerical dispersion is implemented in the TF/SF boundary to estimate deformed incident fields. The performance of the nearly perfect TF/SF boundary is examined by measuring leaked fields in the SF region, and the proposed method drastically diminish the leakage compared with the simple TF/SF boundary.
Yuki ANDO Seiya SHIBATA Shinya HONDA Hiroyuki TOMIYAMA Hiroaki TAKADA
We present a hardware sharing method for design space exploration of multi-processor embedded systems. In our prior work, we had developed a system-level design tool named SystemBuilder which automatically synthesizes target implementation of a system from a functional description. In this work, we have extended SystemBuilder so that it can automatically synthesize an area-efficient implementation which shares a hardware module among different applications. With SystemBuilder, designers only need to enable an option in order to share a hardware module. The designers, therefore, can easily explore a design space including hardware sharing in short time. A case study shows the effectiveness of the hardware sharing on design space exploration.
Takuya KOJIMA Naoki ANDO Hayate OKUHARA Ng. Anh Vu DOAN Hideharu AMANO
Variable Pipeline Cool Mega Array (VPCMA) is a low power Coarse Grained Reconfigurable Architecture (CGRA) based on the concept of CMA (Cool Mega Array). It provides a pipeline structure in the PE array that can be configured so as to fit target algorithms and required performance. Also, VPCMA uses the Silicon On Thin Buried oxide (SOTB) technology, a type of Fully Depleted Silicon On Insulator (FDSOI), so it is possible to control its body bias voltage to provide a balance between performance and leakage power. In this paper, we study the optimization of the VPCMA body bias while considering simultaneously its variable pipeline structure. Through evaluations, we can observe that it is possible to achieve an average reduction of energy consumption, for the studied applications, of 17.75% and 10.49% when compared to respectively the zero bias (without body bias control) and the uniform (control of the whole PE array) cases, while respecting performance constraints. Besides, it is observed that, with appropriate body bias control, it is possible to extend the possible performance, hence enabling broader trade-off analyzes between consumption and performance. Considering the dynamic power as well as the static power, more appropriate pipeline structure and body bias voltage can be obtained. In addition, when the control of VDD is integrated, higher performance can be achieved with a steady increase of the power. These promising results show that applying an adequate optimization technique for the body bias control while simultaneously considering pipeline structures can not only enable further power reduction than previous methods, but also allow more trade-off analysis possibilities.
Yoshiaki ANDO Ning GUAN Ken'ichiro YASHIRO Sumio OHKAWA
Excitation of magnetostatic surface waves by coplanar waveguide transducers is analyzed by using the integral kernel expansion method. The Fourier integral for the current density is derived in terms of an unknown normal component of the magnetic flux density on slot region of a coplanar waveguide. The integral kernel is expanded into a series of Legendre polynomials and then applying Galerkin's method to the unknown field reduces the Fourier integral to a system of linear equations for the unknown coefficients. In this process, we should take into account the edge conditions which show nonreciprocal characteristics depending on frequency. The present method shows excellent agreement with experiments.