IEICE global.ieice.org Site

Author Search Result

[Author] Ki ANDO (24 hits)

1-20 of 24 hits

Performance of Dynamic Instruction Window Resizing for a Given Power Budget under DVFS Control
Hideki ANDO Ryota SHIOYA

PAPER-Computer System

Publicized: 2015/11/12
Vol: E99-D No: 2
Page(s): 341-350
Dynamic instruction window resizing (DIWR) is a scheme that effectively exploits both memory-level parallelism and instruction-level parallelism by configuring the instruction window size appropriately for exploiting each parallelism. Although a previous study has shown that the DIWR processor achieves a significant speedup, power consumption has not been explored. The power consumption is increased in DIWR because the instruction window resources are enlarged in memory-intensive phases. If the power consumption exceeds the power budget determined by certain requirements, the DIWR processor must save power and thus, the performance previously presented cannot be achieved. In this paper, we explore to what extent the DIWR processor can achieve improved performance for a given power budget, assuming that dynamic voltage and frequency scaling (DVFS) is introduced as a power saving technique. Evaluation results using the SPEC2006 benchmark programs show that the DIWR processor, even with a constrained power budget, achieves a speedup over the conventional processor over a wide range of given power budgets. At the most important power budget point, i.e., when the power a conventional processor consumes without any power constraint is supplied, DIWR achieves a 16% speedup.
Register File Size Reduction through Instruction Pre-Execution Incorporating Value Prediction
Yusuke TANAKA Hideki ANDO

PAPER-Computer System

Vol: E93-D No: 12
Page(s): 3294-3305
Two-step physical register deallocation (TSD) is an architectural scheme that enhances memory-level parallelism (MLP) by pre-executing instructions. Ideally, TSD allows exploitation of MLP under an unlimited number of physical registers, and consequently only a small register file is needed for MLP. In practice, however, the amount of MLP exploitable is limited, because there are cases where either 1) pre-execution is not performed; or 2) the timing of pre-execution is delayed. Both are due to data dependencies among the pre-executed instructions. This paper proposes the use of value prediction to solve these problems. Evaluation results using the SPECfp2000 benchmark confirm that the proposed scheme with value prediction for predicting addresses achieves equivalent IPC, with a smaller register file, to the previous TSD scheme. The reduction rate of the register file size is 21%.
Improvement of Renamed Trace Cache through the Reduction of Dependent Path Length for High Energy Efficiency
Ryota SHIOYA Hideki ANDO

PAPER-Computer System

Publicized: 2015/12/04
Vol: E99-D No: 3
Page(s): 630-640
Out-of-order superscalar processors rename register numbers to remove false dependencies between instructions. The renaming logic for register renaming is a high-cost module in a superscalar processor, and it consumes considerable energy. A renamed trace cache (RTC) was proposed for reducing the energy consumption of the renaming logic. An RTC caches and reuses renamed operands, and thus, register renaming can be omitted on RTC hits. However, conventional RTCs suffer from several performance, energy consumption, and hardware overhead problems. We propose a semi-global renamed trace cache (SGRTC) that caches only renamed operands that are a short distance from producers outside traces, and solves the problems of conventional RTCs. Evaluation results show that SGRTC achieves 64% lower energy consumption for renaming with a 0.2% performance overhead as compared to a conventional processor.
Electromagnetic Bandgap (EBG) Structures Using Open Stubs to Suppress Power Plane Noise
Hiroshi TOYAO Noriaki ANDO Takashi HARADA

PAPER-PCB and Circuit Design for EMI Control

Vol: E93-B No: 7
Page(s): 1754-1759
A novel approach is proposed for miniaturizing the unit cell size of electromagnetic bandgap (EBG) structures that suppress power plane noise. In this approach, open stubs are introduced into the shunt circuits of these EBG structures. Since the stub length determines the resonant frequencies of the shunt circuit, the proposed structures can maintain the bandgaps at lower frequencies without increasing the unit cell size. The bandgap frequencies were estimated by dispersion analysis based on the Bloch theorem and full-wave simulations. Sample boards of the proposed EBG structures were fabricated with a unit cell size of 2.1 mm. Highly suppressed noise propagation over the estimated frequency range of 1.9-3.6 GHz including the 2.4-GHz wireless-LAN band was experimentally demonstrated.
Delay Evaluation of Issue Queue in Superscalar Processors with Banking Tag RAM and Correct Critical Path Identification
Kyohei YAMAGUCHI Yuya KORA Hideki ANDO

PAPER-Computer System

Vol: E95-D No: 9
Page(s): 2235-2246
This paper evaluates the delay of the issue queue in a superscalar processor to aid microarchitectural design, where quick quantification of the complexity of the issue queue is needed to consider the tradeoff between clock cycle time and instructions per cycle. Our study covers two aspects. First, we introduce banking tag RAM, which comprises the issue queue, to reduce the delay. Unlike normal RAM, this is not straightforward, because of the uniqueness of the issue queue organization. Second, we explore and identify the correct critical path in the issue queue. In a previous study, the critical path of each component in the issue queue was summed to obtain the issue queue delay, but this does not give the correct delay of the issue queue, because the critical paths of the components are not connected logically. In the evaluation assuming 32-nm LSI technology, we obtained the delays of issue queues with eight to 128 entries. The process of banking tag RAM and identifying the correct critical path reduces the delay by up to 20% and 23% for 4- and 8-issue widths, respectively, compared with not banking tag RAM and simply summing the critical path delay of each component.
A Fast Correction Method for Erroneous Sentences Using the LR Parsing
Masami SHISHIBORI Kazuaki ANDO Yuuichirou KASHIWAGI Jun-ichi AOE

PAPER-Natural Language Processing

Vol: E83-D No: 9
Page(s): 1797-1804
Natural language interface systems can accept less restricted queries from users than other systems; however, they cannot understand erroneous sentences that include syntax errors, unknown words, and misspellings. To realize a superior natural language interface, automatic correction of erroneous sentences is one of the problems to be solved. Applying LR parsing strategies is a well-known approach to robust error recovery. This method achieves high correction accuracy, but it takes a great deal of time to parse a sentence, so reducing the time cost is an important task. In this paper, we propose a method that improves time efficiency while keeping the correction accuracy of the traditional method. The method uses a new parsing table that denotes the states to which the parser transitions after accepting each symbol. With this table, the symbol located just after the error position can be used to select correction symbols; as a result, the number of candidates produced during the correction process is reduced, and a fast system can be realized. Experimental results using 1,050 sentences containing erroneous characters show that this method corrects error points 69 times faster than the traditional method while keeping the same correction accuracy.
Excitation of Magnetostatic Surface Waves by Slot Line Transducers
Yoshiaki ANDO Ning GUAN Ken'ichiro YASHIRO Sumio OHKAWA

PAPER-Passive Devices and Circuits

Vol: E82-C No: 7
Page(s): 1123-1128
Excitation of magnetostatic surface waves by slot line transducers is analyzed by using the integral kernel expansion method. The Fourier integral for the current density is derived in terms of an unknown normal component of the magnetic flux density in a slot region. The integral kernel is expanded into a series of orthogonal polynomials, and applying Galerkin's method to the resulting equation then yields a system of linear equations for the unknown coefficients. A numerical result obtained by the present method is in good agreement with an experiment.
Implementation of the Perfect Matched Layer to the CIP Method
Yoshiaki ANDO Masashi HAYAKAWA

LETTER-Electromagnetic Theory

Vol: E89-C No: 5
Page(s): 645-648
The perfect matched layer (PML) is formulated for use in the constrained interpolation profile (CIP) method. Numerical results are presented to examine the performance of the proposed formulation of the PML in the case of a two-dimensional TM wave. The results show that the proposed methods suppress the reflection effectively in comparison with the natural absorbing boundary condition of the CIP method. We have two methods to formulate the PML, and it is shown that both methods have equal characteristics.
Analyzing Bioelectric Potential Response of Plants Related to Photosynthesis under Blinking Irradiation
Ki ANDO Yuki HASEGAWA Hitoshi MAEKAWA Teruaki KATSUBE

PAPER-Bioelectronics

Vol: E91-C No: 12
Page(s): 1905-1909
The bioelectric potential of plants is generated by the ion concentration difference between the inside and outside of plant cells. It has been reported that the bioelectric potential of leaves changes at the beginning of steady irradiation and that the intensity of the potential response increases with the photosynthetic rate. Although it has been reported that photosynthesis is accelerated by blinking irradiation, the potential response under blinking irradiation has not been fully clarified. In this study, we measured the bioelectric potential and CO2 consumption of plants under various types of blinking irradiation. The results showed that the potential response under blinking irradiation exhibits various behaviors and that the intensity of the response is related to the photosynthetic rate. We conclude that our method is suitable for monitoring the biological activity of plants such as photosynthesis.
Wearable Moment Display Device for Nonverbal Communications
Hideyuki ANDO Maki SUGIMOTO Taro MAEDA

PAPER

Vol: E87-D No: 6
Page(s): 1354-1360
There has recently been considerable interest in research on wearable non-grounded force displays. However, there have been no developments for the communication of nonverbal information (e.g., tennis and golf swings). We propose a small and lightweight wearable force display to present motion timing and direction. The display outputs a torque using rotational moment and mechanical brakes. We explain the principle of this device, and describe an actual measurement of the torque and torque sensitivity experiments.
FXA: Executing Instructions in Front-End for Energy Efficiency
Ryota SHIOYA Ryo TAKAMI Masahiro GOSHIMA Hideki ANDO

PAPER-Computer System

Publicized: 2016/01/06
Vol: E99-D No: 4
Page(s): 1092-1107
Out-of-order superscalar processors have high performance but consume a large amount of energy for dynamic instruction scheduling. We propose a front-end execution architecture (FXA) for improving the energy efficiency of out-of-order superscalar processors. FXA has two execution units: an out-of-order execution unit (OXU) and an in-order execution unit (IXU). The OXU is the execution core of a common out-of-order superscalar processor. In contrast, the IXU consists only of functional units and a bypass network. The IXU is placed at the processor front end and executes instructions in order. The IXU functions as a filter for the OXU. Fetched instructions are first fed to the IXU, and the instructions are executed in order if they are ready to execute. The instructions executed in the IXU are removed from the instruction pipeline and are not executed in the OXU. The IXU does not include dynamic scheduling logic, and thus its energy consumption is low. Evaluation results show that FXA can execute more than 50% of the instructions by using the IXU, thereby making it possible to shrink the energy-consuming OXU without incurring performance degradation. As a result, FXA achieves both high performance and low energy consumption. We evaluated FXA and compared it with conventional out-of-order/in-order superscalar processors modeled after the ARM big.LITTLE architecture. The results show that FXA achieves a performance improvement of 7.4% on geometric mean in the SPECCPU INT 2006 benchmark suite relative to a conventional superscalar processor (big), while reducing the energy consumption by 17% in the entire processor. The performance/energy ratio (the inverse of the energy-delay product) of FXA is 25% higher than that of a conventional superscalar processor (big) and 27% higher than that of a conventional in-order superscalar processor (LITTLE).
Optical Phased Array Using Multi Dither Heterodyning Technique for Receiving Optical Beam Former
Eisuke HARAGUCHI Hitomi ONO Junya NISHIOKA Toshiyuki ANDO Masateru NAGASE Akira AKAISHI Takashi TAKAHASHI

PAPER

Vol: E99-B No: 10
Page(s): 2128-2135
To provide a highly reliable satellite communication system for social infrastructure, it is necessary to form flexible beams that adapt to changes in communication traffic. An optical beam forming network offers broadband transmission and a small, lightweight construction. However, in the space environment, there are concerns that reception efficiency is reduced by relative phase errors of the received signals among antenna elements caused by temperature fluctuation. To prevent this, we control the relative phase among received signals with an optical phase locked loop. In this paper, we propose an active optical phased array system using a multi dither heterodyning technique for a receiving OBF, and present experimental results under temperature fluctuation. We evaluated the stability of the relative phase among 3 elements for temperature fluctuation at the multiplexer from -15 to 45, and checked the stability of the PLL among 3 elements.
Formal Modeling and Verification of Concurrent FSMs: Case Study on Event-Based Cooperative Transport Robots
Yoshinao ISOBE Nobuhiko MIYAMOTO Noriaki ANDO Yutaka OIWA

PAPER

Publicized: 2021/07/08
Vol: E104-D No: 10
Page(s): 1515-1532
In this paper, we demonstrate that a formal approach is effective for improving reliability of cooperative robot designs, where the control logics are expressed in concurrent FSMs (Finite State Machines), especially in accordance with the standard FSM4RTC (FSM for Robotic Technology Components), by a case study of cooperative transport robots. In the case study, FSMs are modeled in the formal specification language CSP (Communicating Sequential Processes) and checked by the model-checking tool FDR, where we show techniques for modeling and verification of cooperative robots implemented with the help of the RTM (Robotic Technology Middleware).
CIP Basis Set Method for Electromagnetic Simulation
Yoshiaki ANDO Yusuke TAKAHASHI

PAPER-Numerical Techniques

Vol: E97-C No: 1
Page(s): 26-32
This paper presents an application of the constrained interpolation profile basis set (CIP-BS) method to electromagnetic field analyses. Electromagnetic fields can be expanded in terms of multi-dimensional CIP basis functions, and the Galerkin method can then be applied to obtain a system of linear equations. In the present study, we focus on a two-dimensional problem with TMz polarization. In order to examine the precision of the CIP-BS method, the TE202 resonant mode in a rectangular cavity is analyzed. The numerical results show that the CIP-BS method has better performance than the finite-difference time-domain (FDTD) method when the time step is small. Then an absorbing boundary condition based on the perfectly matched layer (PML) is formulated, and the absorption performance is demonstrated. Finally, the propagation in an inhomogeneous medium is computed by using the proposed method, and it is observed that in the CIP-BS method, smooth variation of material constants is effectively formulated without additional computational costs, and that accurate results are obtained in comparison with the FDTD method even if the permittivity is high.
MLP-Aware Dynamic Instruction Window Resizing in Superscalar Processors for Adaptively Exploiting Available Parallelism
Yuya KORA Kyohei YAMAGUCHI Hideki ANDO

PAPER-Computer System

Publicized: 2014/09/22
Vol: E97-D No: 12
Page(s): 3110-3123
Single-thread performance has not improved much over the past few years, despite an ever increasing transistor budget. One of the reasons for this is that there is a speed gap between the processor and main memory, known as the memory wall. A promising method to overcome this memory wall is aggressive out-of-order execution by extensively enlarging the instruction window resources to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Although pipelining the resources solves this problem, it in turn prevents instruction-level parallelism (ILP) from being exploited because issuing instructions requires multiple clock cycles. This paper proposes a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploiting MLP. Conversely, if the scheme predicts that less MLP is available, that is, ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over that of a conventional processor, with additional cost of only 6% of the area of the conventional processor core or 3% of that of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (energy-delay product).
Reducing Energy Consumption of Wakeup Logic through Double-Stage Tag Comparison
Yasutaka MATSUDA Ryota SHIOYA Hideki ANDO

PAPER-Computer System

Publicized: 2021/11/02
Vol: E105-D No: 2
Page(s): 320-332
The high energy consumption of current processors causes several problems, including a limited clock frequency, short battery lifetime, and reduced device reliability. It is therefore important to reduce the energy consumption of the processor. Among resources in a processor, the issue queue (IQ) is a large consumer of energy, much of which is consumed by the wakeup logic. Within the wakeup logic, the tag comparison that checks source operand readiness consumes a significant amount of energy. This paper proposes an energy reduction scheme for tag comparison, called double-stage tag comparison. This scheme first compares the lower bits of the tag and then, only if these match, compares the higher bits. Because the energy consumption of tag comparison is roughly proportional to the total number of bits compared, energy is saved by reducing this number. However, this sequential comparison increases the delay of the IQ, thereby increasing the clock cycle time. Although this can be avoided by allocating an extra cycle to the issue operation, this in turn degrades the IPC. To avoid IPC degradation, we reconfigure a small number of entries in the IQ, where several oldest instructions that are likely to have an adverse effect on performance reside, to a single stage for tag comparison. Our evaluation results for SPEC2017 benchmark programs show that the double-stage tag comparison achieves on average a 21% reduction in the energy consumed by the wakeup logic (15% when including the overhead) with only 3.0% performance degradation.
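The comparison order described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the function name, the bit widths, and the use of "bits compared" as a stand-in for energy are assumptions made for illustration only, following the abstract's statement that energy is roughly proportional to the number of bits compared.

```python
def double_stage_match(tag_a: int, tag_b: int, low_bits: int, total_bits: int):
    """Compare two tags in two stages (hypothetical helper, not from the paper).

    Returns (match, bits_compared). bits_compared is an assumed proxy for
    the energy spent on this comparison.
    """
    low_mask = (1 << low_bits) - 1
    # Stage 1: compare only the lower bits.
    if (tag_a & low_mask) != (tag_b & low_mask):
        # Mismatch found early; the higher bits are never compared.
        return False, low_bits
    # Stage 2: the lower bits matched, so compare the remaining higher bits.
    high_mask = (1 << (total_bits - low_bits)) - 1
    high_a = (tag_a >> low_bits) & high_mask
    high_b = (tag_b >> low_bits) & high_mask
    return high_a == high_b, total_bits


if __name__ == "__main__":
    # 8-bit tags split into 4 lower and 4 higher bits (assumed widths).
    print(double_stage_match(0b10110101, 0b10110101, 4, 8))  # full match
    print(double_stage_match(0b10110101, 0b10110100, 4, 8))  # lower bits differ
    print(double_stage_match(0b00010101, 0b10110101, 4, 8))  # only higher bits differ
```

In this sketch, a mismatch in the lower bits costs only low_bits comparisons, which is where the saving would come from; the sequential second stage is what the abstract identifies as the source of added delay.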
A Nearly Perfect Total-Field/Scattered-Field Boundary for the One-Dimensional CIP Method
Yoshiaki ANDO Hiroyuki SAITO Masashi HAYAKAWA

PAPER-Electromagnetic Theory

Vol: E91-C No: 10
Page(s): 1677-1683
A total-field/scattered-field (TF/SF) boundary, which is commonly used in the finite-difference time-domain (FDTD) method to illuminate scatterers by plane waves, is developed for use in the constrained interpolation profile (CIP) method. By taking the numerical dispersion into account, the nearly perfect TF/SF boundary can be achieved, which allows us to calculate incident fields containing high frequency components without fictitious scattered fields. First of all, we formulate the TF/SF boundary in the CIP scheme. The numerical dispersion relation is then reviewed. Finally, the numerical dispersion is implemented in the TF/SF boundary to estimate deformed incident fields. The performance of the nearly perfect TF/SF boundary is examined by measuring leaked fields in the SF region, and the proposed method drastically diminishes the leakage compared with the simple TF/SF boundary.
Automatic Communication Synthesis with Hardware Sharing for Multi-Processor SoC Design
Yuki ANDO Seiya SHIBATA Shinya HONDA Hiroyuki TOMIYAMA Hiroaki TAKADA

PAPER-High-Level Synthesis and System-Level Design

Vol: E93-A No: 12
Page(s): 2509-2516
We present a hardware sharing method for design space exploration of multi-processor embedded systems. In our prior work, we developed a system-level design tool named SystemBuilder, which automatically synthesizes a target implementation of a system from a functional description. In this work, we have extended SystemBuilder so that it can automatically synthesize an area-efficient implementation which shares a hardware module among different applications. With SystemBuilder, designers only need to enable an option in order to share a hardware module. The designers, therefore, can easily explore a design space including hardware sharing in a short time. A case study shows the effectiveness of the hardware sharing on design space exploration.
Optimization of Body Biasing for Variable Pipelined Coarse-Grained Reconfigurable Architectures
Takuya KOJIMA Naoki ANDO Hayate OKUHARA Ng. Anh Vu DOAN Hideharu AMANO

PAPER-Computer System

Publicized: 2018/03/09
Vol: E101-D No: 6
Page(s): 1532-1540
Variable Pipeline Cool Mega Array (VPCMA) is a low power Coarse Grained Reconfigurable Architecture (CGRA) based on the concept of CMA (Cool Mega Array). It provides a pipeline structure in the PE array that can be configured so as to fit target algorithms and required performance. Also, VPCMA uses the Silicon On Thin Buried oxide (SOTB) technology, a type of Fully Depleted Silicon On Insulator (FDSOI), so it is possible to control its body bias voltage to provide a balance between performance and leakage power. In this paper, we study the optimization of the VPCMA body bias while simultaneously considering its variable pipeline structure. Through evaluations, we observe that, for the studied applications, it is possible to achieve an average reduction in energy consumption of 17.75% and 10.49% compared with the zero bias (without body bias control) and the uniform (control of the whole PE array) cases, respectively, while respecting performance constraints. Besides, it is observed that, with appropriate body bias control, it is possible to extend the possible performance, hence enabling broader trade-off analyses between consumption and performance. Considering the dynamic power as well as the static power, a more appropriate pipeline structure and body bias voltage can be obtained. In addition, when the control of VDD is integrated, higher performance can be achieved with a steady increase of the power. These promising results show that applying an adequate optimization technique for the body bias control while simultaneously considering pipeline structures can not only enable further power reduction than previous methods, but also allow more trade-off analysis possibilities.
Excitation of Magnetostatic Surface Wave by Coplanar Waveguide Transducers
Yoshiaki ANDO Ning GUAN Ken'ichiro YASHIRO Sumio OHKAWA

PAPER-Electromagnetic Theory

Vol: E81-C No: 12
Page(s): 1942-1947
Excitation of magnetostatic surface waves by coplanar waveguide transducers is analyzed by using the integral kernel expansion method. The Fourier integral for the current density is derived in terms of an unknown normal component of the magnetic flux density in the slot region of a coplanar waveguide. The integral kernel is expanded into a series of Legendre polynomials, and applying Galerkin's method to the unknown field then reduces the Fourier integral to a system of linear equations for the unknown coefficients. In this process, we should take into account the edge conditions, which show nonreciprocal characteristics depending on frequency. The present method shows excellent agreement with experiments.
