Masami SHISHIBORI Kazuaki ANDO Yuuichirou KASHIWAGI Jun-ichi AOE
Natural language interface systems accept less restricted queries from users than other systems; however, they cannot understand erroneous sentences that contain syntax errors, unknown words, or misspellings. To realize a superior natural language interface, automatic correction of erroneous sentences is one of the problems to be solved. Applying LR parsing strategies is a well-known approach to robust error recovery. This approach achieves high correction accuracy, but parsing a sentence takes a great deal of time, so reducing the time cost is an important task. In this paper, we propose a method that improves time efficiency while keeping the correction accuracy of the traditional method. The method uses a new parsing table that denotes the states reached after accepting each symbol. With this table, the symbol located just after the error position can be used to select correction symbols; as a result, the number of candidates produced in the correction process is reduced and a fast system can be realized. Experimental results on 1,050 sentences containing erroneous characters show that this method corrects error points 69 times faster than the traditional method while keeping the same correction accuracy.
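The pruning idea in the abstract above can be illustrated with a toy sketch. This is not the authors' parsing table: the `TRANSITIONS` dictionary, its states, and its symbols are hypothetical, reduce actions of a real LR parser are omitted, and the error is assumed to be a single symbol that must be replaced. The sketch only shows how looking at the symbol just after the error position can shrink the candidate set.

```python
# Hypothetical table: state -> {symbol: state reached after accepting it}.
# A real LR table also has reduce actions; they are omitted in this sketch.
TRANSITIONS = {
    0: {"det": 1, "noun": 2},
    1: {"noun": 2, "adj": 1},
    2: {"verb": 3},
    3: {"det": 4, "noun": 5},
    4: {"noun": 5},
    5: {},
}


def candidates_without_lookahead(state):
    """Every symbol acceptable in `state` is a possible correction."""
    return sorted(TRANSITIONS[state])


def candidates_with_lookahead(state, next_symbol):
    """Keep only corrections after which `next_symbol` can be accepted."""
    return sorted(
        symbol
        for symbol, reached in TRANSITIONS[state].items()
        if next_symbol in TRANSITIONS[reached]
    )


if __name__ == "__main__":
    # The symbol read in state 0 is erroneous; the symbol after it is "verb".
    print(candidates_without_lookahead(0))
    print(candidates_with_lookahead(0, "verb"))
```

In this toy table, "det" is discarded because no "verb" can follow it, so only one candidate remains to be tried.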
Yoshiaki ANDO Ning GUAN Ken'ichiro YASHIRO Sumio OHKAWA
Excitation of magnetostatic surface waves by slot line transducers is analyzed by using the integral kernel expansion method. The Fourier integral for the current density is derived in terms of an unknown normal component of the magnetic flux density in a slot region. The integral kernel is expanded into a series of orthogonal polynomials, and applying Galerkin's method to the resulting equation then yields a system of linear equations for the unknown coefficients. A numerical result obtained by the present method is in good agreement with an experiment.
Yoshiaki ANDO Masashi HAYAKAWA
The perfectly matched layer (PML) is formulated for use in the constrained interpolation profile (CIP) method. Numerical results are presented to examine the performance of the proposed PML formulation for a two-dimensional TM wave. The results show that the proposed methods suppress reflection effectively in comparison with the natural absorbing boundary condition of the CIP method. Two methods of formulating the PML are presented, and both are shown to have equal characteristics.
Ki ANDO Yuki HASEGAWA Hitoshi MAEKAWA Teruaki KATSUBE
The bioelectric potential of plants is generated by the difference in ion concentration between the inside and outside of plant cells. It has been reported that the bioelectric potential of leaves changes at the beginning of steady irradiation and that the intensity of the potential response increases with the photosynthetic rate. Although it has been reported that photosynthesis is accelerated by blinking irradiation, the potential response under blinking irradiation has not been fully clarified. In this study, we measured the bioelectric potential and CO2 consumption of plants under various types of blinking irradiation. The results showed that the potential response under blinking irradiation exhibits various behaviors and that the intensity of the response is related to the photosynthetic rate. We conclude that our method is suitable for monitoring the biological activity of plants, such as photosynthesis.
Hideyuki ANDO Maki SUGIMOTO Taro MAEDA
There has recently been considerable interest in research on wearable non-grounded force displays. However, none has been developed for communicating nonverbal information (e.g., tennis and golf swings). We propose a small and lightweight wearable force display that presents motion timing and direction. The display outputs a torque using rotational moment and mechanical brakes. We explain the principle of this device and describe actual torque measurements and torque sensitivity experiments.
Ryota SHIOYA Ryo TAKAMI Masahiro GOSHIMA Hideki ANDO
Out-of-order superscalar processors have high performance but consume a large amount of energy for dynamic instruction scheduling. We propose a front-end execution architecture (FXA) for improving the energy efficiency of out-of-order superscalar processors. FXA has two execution units: an out-of-order execution unit (OXU) and an in-order execution unit (IXU). The OXU is the execution core of a common out-of-order superscalar processor. In contrast, the IXU consists only of functional units and a bypass network. The IXU is placed at the processor front end and executes instructions in order. The IXU functions as a filter for the OXU. Fetched instructions are first fed to the IXU, and they are executed in order if they are ready to execute. The instructions executed in the IXU are removed from the instruction pipeline and are not executed in the OXU. The IXU does not include dynamic scheduling logic, and thus its energy consumption is low. Evaluation results show that FXA can execute more than 50% of the instructions by using the IXU, thereby making it possible to shrink the energy-consuming OXU without incurring performance degradation. As a result, FXA achieves both high performance and low energy consumption. We evaluated FXA and compared it with conventional out-of-order and in-order superscalar processors modeled after the ARM big.LITTLE architecture. The results show that FXA achieves a performance improvement of 7.4% in geometric mean on the SPECCPU INT 2006 benchmark suite relative to a conventional superscalar processor (big), while reducing the energy consumption of the entire processor by 17%. The performance/energy ratio (the inverse of the energy-delay product) of FXA is 25% higher than that of a conventional superscalar processor (big) and 27% higher than that of a conventional in-order superscalar processor (LITTLE).
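The filtering role of the IXU described above can be sketched as a toy model. It rests on simplifying assumptions that are not in the abstract: every instruction completes instantly, an instruction is "ready" when all of its source registers hold values, and the `Instr` type, register names, and `ixu_filter` function are hypothetical. It is an illustration of the idea, not the FXA pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str     # destination register
    srcs: tuple   # source registers


def ixu_filter(instrs, ready_regs):
    """Split a fetched instruction stream between the IXU and the OXU.

    Instructions are visited in program order. One whose sources are all
    ready is executed in the IXU and leaves the pipeline; any other is
    passed on to the OXU, and its destination becomes unavailable to
    younger instructions in the IXU.
    """
    ready = set(ready_regs)
    ixu, oxu = [], []
    for ins in instrs:
        if all(src in ready for src in ins.srcs):
            ixu.append(ins.name)
            ready.add(ins.dest)
        else:
            oxu.append(ins.name)
            ready.discard(ins.dest)
    return ixu, oxu


if __name__ == "__main__":
    program = [
        Instr("add", "r3", ("r1", "r2")),
        Instr("mul", "r5", ("r3", "r9")),  # r9 is not available yet
        Instr("sub", "r6", ("r5", "r1")),  # depends on the deferred mul
        Instr("or", "r7", ("r3", "r2")),
    ]
    print(ixu_filter(program, {"r1", "r2"}))
```

In this example, half of the instructions never reach the OXU, which is the effect the abstract relies on to shrink the out-of-order core.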
Eisuke HARAGUCHI Hitomi ONO Junya NISHIOKA Toshiyuki ANDO Masateru NAGASE Akira AKAISHI Takashi TAKAHASHI
To provide a highly reliable satellite communication system for social infrastructure, it is necessary to form flexible beams that adapt to changes in communication traffic. An optical beam forming (OBF) network offers broadband transmission and a small, lightweight structure. However, in the space environment, there are concerns that reception efficiency is reduced by relative phase errors in the received signals among antenna elements caused by temperature fluctuation. To prevent this, we control the relative phase among the received signals with an optical phase locked loop (PLL). In this paper, we propose an active optical phased array system using a multi-dither heterodyning technique for receiving OBF, and present experimental results under temperature fluctuation. We evaluated the stability of the relative phase among 3 elements under temperature fluctuation at the multiplexer from -15 to 45, and checked the stability of the PLL among the 3 elements.
Yoshinao ISOBE Nobuhiko MIYAMOTO Noriaki ANDO Yutaka OIWA
In this paper, we demonstrate through a case study of cooperative transport robots that a formal approach is effective for improving the reliability of cooperative robot designs whose control logic is expressed as concurrent FSMs (Finite State Machines), especially in accordance with the FSM4RTC (FSM for Robotic Technology Components) standard. In the case study, the FSMs are modeled in the formal specification language CSP (Communicating Sequential Processes) and checked with the model-checking tool FDR, and we show techniques for modeling and verifying cooperative robots implemented with the help of RTM (Robotic Technology Middleware).
Yoshiaki ANDO Yusuke TAKAHASHI
This paper presents an application of the constrained interpolation profile basis set (CIP-BS) method to electromagnetic field analyses. Electromagnetic fields can be expanded in terms of multi-dimensional CIP basis functions, and the Galerkin method can then be applied to obtain a system of linear equations. In the present study, we focus on a two-dimensional problem with TMz polarization. In order to examine the precision of the CIP-BS method, the TE202 resonant mode in a rectangular cavity is analyzed. The numerical results show that the CIP-BS method performs better than the finite-difference time-domain (FDTD) method when the time step is small. An absorbing boundary condition based on the perfectly matched layer (PML) is then formulated, and its absorption performance is demonstrated. Finally, propagation in an inhomogeneous medium is computed by using the proposed method. It is observed that, in the CIP-BS method, smooth variation of material constants is effectively formulated without additional computational cost, and that accurate results are obtained in comparison with the FDTD method even if the permittivity is high.
Yuya KORA Kyohei YAMAGUCHI Hideki ANDO
Single-thread performance has not improved much over the past few years, despite an ever-increasing transistor budget. One of the reasons for this is the speed gap between the processor and main memory, known as the memory wall. A promising method to overcome this memory wall is aggressive out-of-order execution that extensively enlarges the instruction window resources to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Although pipelining the resources solves this problem, it in turn prevents instruction-level parallelism (ILP) from being exploited because issuing instructions requires multiple clock cycles. This paper proposes a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploiting MLP. Conversely, if the scheme predicts that less MLP is available, that is, ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over that of a conventional processor, with an additional cost of only 6% of the area of the conventional processor core or 3% of that of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (energy-delay product).
Yasutaka MATSUDA Ryota SHIOYA Hideki ANDO
The high energy consumption of current processors causes several problems, including a limited clock frequency, short battery lifetime, and reduced device reliability. It is therefore important to reduce the energy consumption of the processor. Among the resources in a processor, the issue queue (IQ) is a large consumer of energy, much of which is consumed by the wakeup logic. Within the wakeup logic, the tag comparison that checks source operand readiness consumes a significant amount of energy. This paper proposes an energy reduction scheme for tag comparison, called double-stage tag comparison. This scheme first compares the lower bits of the tag and then, only if these match, compares the higher bits. Because the energy consumption of tag comparison is roughly proportional to the total number of bits compared, energy is saved by reducing this number. However, this sequential comparison increases the delay of the IQ, thereby increasing the clock cycle time. Although this can be avoided by allocating an extra cycle to the issue operation, this in turn degrades the IPC. To avoid IPC degradation, we reconfigure a small number of entries in the IQ, where the several oldest instructions that are likely to have an adverse effect on performance reside, to use a single stage for tag comparison. Our evaluation results for SPEC2017 benchmark programs show that the double-stage tag comparison achieves on average a 21% reduction in the energy consumed by the wakeup logic (15% when including the overhead) with only 3.0% performance degradation.
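The double-stage comparison described above can be sketched in software as follows. The 8-bit tag width, the 3-bit lower part, and the function names are assumptions chosen for illustration; the abstract does not state them, and the real mechanism is a hardware comparator rather than a function call. The sketch counts compared bits as a stand-in for energy, following the abstract's statement that energy is roughly proportional to that count.

```python
def double_stage_match(tag_a, tag_b, tag_bits=8, low_bits=3):
    """Return (match, bits_compared) for a two-stage tag comparison.

    Stage 1 compares only the lower `low_bits` bits. Stage 2 compares the
    remaining higher bits, and runs only if stage 1 matched.
    """
    low_mask = (1 << low_bits) - 1
    if (tag_a & low_mask) != (tag_b & low_mask):
        return False, low_bits              # stage 2 is skipped
    match = (tag_a >> low_bits) == (tag_b >> low_bits)
    return match, tag_bits                  # both stages ran


def total_bits_compared(dest_tag, source_tags, tag_bits=8, low_bits=3):
    """Sum the bits compared when `dest_tag` is checked against each source tag."""
    return sum(
        double_stage_match(dest_tag, tag, tag_bits, low_bits)[1]
        for tag in source_tags
    )


if __name__ == "__main__":
    broadcast = 0b10110101
    waiting = [0b10110101, 0b10110100, 0b00110101, 0b00000000]
    print(total_bits_compared(broadcast, waiting))  # two-stage comparison
    print(len(waiting) * 8)                         # single-stage baseline
```

Most source tags in a queue do not match a given broadcast tag, so in this model most comparisons stop after the first stage.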
Yoshiaki ANDO Hiroyuki SAITO Masashi HAYAKAWA
A total-field/scattered-field (TF/SF) boundary, which is commonly used in the finite-difference time-domain (FDTD) method to illuminate scatterers by plane waves, is developed for use in the constrained interpolation profile (CIP) method. By taking the numerical dispersion into account, a nearly perfect TF/SF boundary can be achieved, which allows us to calculate incident fields containing high-frequency components without fictitious scattered fields. First, we formulate the TF/SF boundary in the CIP scheme. The numerical dispersion relation is then reviewed. Finally, the numerical dispersion is implemented in the TF/SF boundary to estimate deformed incident fields. The performance of the nearly perfect TF/SF boundary is examined by measuring leaked fields in the SF region, and the proposed method drastically diminishes the leakage compared with the simple TF/SF boundary.
Yuki ANDO Seiya SHIBATA Shinya HONDA Hiroyuki TOMIYAMA Hiroaki TAKADA
We present a hardware sharing method for design space exploration of multi-processor embedded systems. In our prior work, we developed a system-level design tool named SystemBuilder, which automatically synthesizes a target implementation of a system from a functional description. In this work, we have extended SystemBuilder so that it can automatically synthesize an area-efficient implementation that shares a hardware module among different applications. With SystemBuilder, designers only need to enable an option in order to share a hardware module. Designers can therefore easily explore a design space that includes hardware sharing in a short time. A case study shows the effectiveness of hardware sharing in design space exploration.
Takuya KOJIMA Naoki ANDO Hayate OKUHARA Ng. Anh Vu DOAN Hideharu AMANO
Variable Pipeline Cool Mega Array (VPCMA) is a low-power Coarse Grained Reconfigurable Architecture (CGRA) based on the concept of CMA (Cool Mega Array). It provides a pipeline structure in the PE array that can be configured to fit target algorithms and required performance. VPCMA also uses the Silicon On Thin Buried oxide (SOTB) technology, a type of Fully Depleted Silicon On Insulator (FDSOI), so its body bias voltage can be controlled to balance performance and leakage power. In this paper, we study the optimization of the VPCMA body bias while simultaneously considering its variable pipeline structure. Through evaluations, we observe that, for the studied applications, an average energy reduction of 17.75% and 10.49% can be achieved compared to the zero bias (without body bias control) and the uniform (control of the whole PE array) cases, respectively, while respecting performance constraints. It is also observed that, with appropriate body bias control, the achievable performance range can be extended, enabling broader trade-off analyses between consumption and performance. By considering the dynamic power as well as the static power, a more appropriate pipeline structure and body bias voltage can be obtained. In addition, when the control of VDD is integrated, higher performance can be achieved with a steady increase of the power. These promising results show that applying an adequate optimization technique for the body bias control while simultaneously considering pipeline structures not only enables further power reduction than previous methods, but also allows more trade-off analysis possibilities.
Yoshiaki ANDO Ning GUAN Ken'ichiro YASHIRO Sumio OHKAWA
Excitation of magnetostatic surface waves by coplanar waveguide transducers is analyzed by using the integral kernel expansion method. The Fourier integral for the current density is derived in terms of an unknown normal component of the magnetic flux density in the slot region of a coplanar waveguide. The integral kernel is expanded into a series of Legendre polynomials, and applying Galerkin's method to the unknown field then reduces the Fourier integral to a system of linear equations for the unknown coefficients. In this process, the edge conditions, which show nonreciprocal characteristics depending on frequency, must be taken into account. The present method shows excellent agreement with experiments.
Tomoaki ANDO Vasily G. MOSHNYAGA Koji HASHIMOTO
This paper introduces a new FPGA design of a user-monitoring system for power management of a PC display. From the camera readings, the system detects whether or not the user is looking at the screen and produces signals to control the display backlight. The system provides over 88% eye detection accuracy at an image processing rate of 8 frames/s. We describe the new eye-tracking algorithm and hardware, and present the results of their experimental evaluation in a prototype display power management system.
A total-field/scattered-field (TF/SF) boundary for the constrained interpolation profile (CIP) method is proposed for multi-dimensional electromagnetic problems. Incident fields are added to or subtracted from the update equations in order to satisfy the advection equations to which Maxwell's equations are reduced by means of directional splitting. Modified incident fields are introduced to take into account the electromagnetic fields after advection. The developed TF/SF boundary is examined numerically, and the results show that it performs well. Finally, we apply the proposed TF/SF boundary to a scattering problem, which is solved successfully.
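For readers unfamiliar with the CIP method, the following minimal sketch shows one CIP update for the one-dimensional advection equation df/dt + u df/dx = 0, the kind of equation to which the abstract says Maxwell's equations are reduced. It is a textbook-style illustration under stated assumptions (constant u > 0, uniform periodic grid, hypothetical function and variable names) and does not include the TF/SF boundary or anything specific to the authors' formulation. The scheme stores both the profile f and its spatial derivative g, fits a cubic between each grid point and its upwind neighbour, and shifts that cubic by u*dt.

```python
def cip_advect_step(f, g, u, dx, dt):
    """Advance f and its derivative g by one CIP step (u > 0, periodic grid)."""
    n = len(f)
    xi = -u * dt   # departure point, measured from grid point i
    d = -dx        # position of the upwind neighbour, measured from grid point i
    f_new = [0.0] * n
    g_new = [0.0] * n
    for i in range(n):
        iup = (i - 1) % n  # upwind neighbour with periodic wrap-around
        # Cubic F(X) = a X^3 + b X^2 + g[i] X + f[i] matching f and g at X = 0 and X = d.
        a = (g[i] + g[iup]) / d**2 + 2.0 * (f[i] - f[iup]) / d**3
        b = 3.0 * (f[iup] - f[i]) / d**2 - (2.0 * g[i] + g[iup]) / d
        f_new[i] = ((a * xi + b) * xi + g[i]) * xi + f[i]
        g_new[i] = (3.0 * a * xi + 2.0 * b) * xi + g[i]
    return f_new, g_new


if __name__ == "__main__":
    # A small triangular pulse with a matching piecewise derivative.
    f0 = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    g0 = [0.0, 0.5, 1.0, 0.0, -1.0, -0.5, 0.0, 0.0]
    f1, g1 = cip_advect_step(f0, g0, u=1.0, dx=1.0, dt=0.5)
    print(f1)
```

When u*dt equals dx, the departure point coincides with the upwind grid point, so the step reduces to an exact shift by one cell; this gives a simple sanity check for the coefficients.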
Hideki ANDO Chikako NAKANISHI Hirohisa MACHIDA Tetsuya HARA Masao NAKAYA
Superscalar processors improve performance by exploiting instruction-level parallelism (ILP). The ILP within a basic block is, however, not sufficient for gaining substantial speedup on non-numerical applications. Instructions across branches must be executed in parallel to improve performance dramatically; that is, speculative execution is strongly required. Boosting is a general solution for achieving speculative execution. Boosting labels an instruction to be speculatively executed, and the hardware handles its side effects. This paper describes an efficient implementation of boosting in terms of cost/performance trade-offs. Our implementation policy is beneficial with respect to code scheduling heuristics, the penalties imposed by code duplication to maintain program semantics, and area cost. This paper also describes a branch scheme that minimizes the branch penalty. Branch delay imposes crucial penalties on the performance of superscalar processors, since multiple delay slots exist even in a single delay cycle. Our scheme fetches both sequential and target instructions and selects one of them on a branch, so that no delay cycle is imposed. This scheme is realized by a combination of static code movement and hardware support. As a result, we reduce the branch penalty at a small cost. Simulation results show that our ideas are highly effective in improving the performance of a superscalar processor.
Kazunaga HYODO Kengo IWAMOTO Hideki ANDO
Instruction pre-execution is an effective way to prefetch data. We previously proposed an instruction pre-execution scheme, which we call two-step physical register deallocation (TSD). The TSD realizes pre-execution by exploiting the difference between the amount of instruction-level parallelism available with an unlimited number of physical registers and that available with an actual number of physical registers. Although the previous TSD study successfully improved performance, its energy consumption remains inefficient. This is because, to maximize the performance improvement, it attempts to pre-execute as many instructions as possible, regardless of whether they contribute significantly to load latency reduction. This paper presents a scheme that improves the energy efficiency of the TSD by pre-executing only those instructions that provide a large benefit. Our evaluation results using the SPECfp2000 benchmark show that our scheme reduces the dynamic pre-executed instruction count by 76% compared with the original scheme. This reduction saves 7% of the energy consumption of the execution core, with 2% overhead. Performance degrades by 2% compared with that of the original scheme, but is still 15% higher than that of the normal processor without the TSD.
Dynamic instruction window resizing (DIWR) is a scheme that effectively exploits both memory-level parallelism and instruction-level parallelism by configuring the instruction window size appropriately for each type of parallelism. Although a previous study has shown that the DIWR processor achieves a significant speedup, its power consumption has not been explored. Power consumption increases in DIWR because the instruction window resources are enlarged in memory-intensive phases. If the power consumption exceeds the power budget determined by certain requirements, the DIWR processor must save power, and thus the previously presented performance cannot be achieved. In this paper, we explore to what extent the DIWR processor can achieve improved performance for a given power budget, assuming that dynamic voltage and frequency scaling (DVFS) is introduced as a power saving technique. Evaluation results using the SPEC2006 benchmark programs show that the DIWR processor, even with a constrained power budget, achieves a speedup over the conventional processor across a wide range of power budgets. At the most important power budget point, i.e., when the power that a conventional processor consumes without any power constraint is supplied, DIWR achieves a 16% speedup.