The search functionality is under construction.

Keyword Search Result

[Keyword] acceleration (53 hits)

Hits 21-40 of 53

  • Accelerated Widely-Linear Signal Detection by Polynomials for Over-Loaded Large-Scale MIMO Systems

    Qian DENG  Li GUO  Chao DONG  Jiaru LIN  Xueyan CHEN  

     
    PAPER-Antennas and Propagation

      Publicized:
    2017/07/13
      Vol:
    E101-B No:1
      Page(s):
    185-194

    In this paper, we propose a low-complexity widely-linear minimum mean square error (WL-MMSE) signal detection based on the Chebyshev polynomials accelerated symmetric successive over relaxation (SSORcheb) algorithm for uplink (UL) over-loaded large-scale multiple-input multiple-output (MIMO) systems. The technique of utilizing Chebyshev acceleration not only speeds up the convergence rate significantly and maximizes the data throughput, but also reduces the cost. By utilizing the random matrix theory, we present good estimates for the Chebyshev acceleration parameters of the proposed signal detection in real large-scale MIMO systems. Simulation results demonstrate that the new WL-SSORcheb-MMSE detection not only outperforms the recently proposed linear iterative detection and the optimal polynomial expansion (PE) WL-MMSE detection, but also achieves a performance close to the exact WL-MMSE detection. Additionally, the proposed detection offers superior sum rate and bit error rate (BER) performance compared to the precision MMSE detection with substantially fewer arithmetic operations in a short coherence time. Therefore, the proposed detection can satisfy the high-density and high-mobility requirements of some of the emerging wireless networks, such as high-mobility Internet of Things (IoT) networks.

  • A Survey of Efficient Ray-Tracing Techniques for Mobile Radio Propagation Analysis Open Access

    Tetsuro IMAI  

     
    INVITED SURVEY PAPER-Antennas and Propagation

      Publicized:
    2016/12/01
      Vol:
    E100-B No:5
      Page(s):
    666-679

    With the advances in computer processing that have yielded an enormous increase in performance, numerical analytical approaches based on electromagnetic theory have recently been applied to mobile radio propagation analysis. One such approach is the ray-tracing method based on geometrical optics and the uniform geometrical theory of diffraction. In this paper, ray-tracing techniques that have been proposed in order to improve computational accuracy and speed are surveyed. First, imaging and ray-launching methods are described and their extended methods are surveyed as novel fundamental ray-tracing techniques. Next, various ray-tracing acceleration techniques are surveyed and categorized into three approaches, i.e., deterministic, heuristic, and brute force. Then, hybrid methods are surveyed such as those employing Physical optics, the Effective Roughness model, and the Finite-Difference Time-Domain method that have been proposed in order to improve analysis accuracy.

  • FPGA Hardware Acceleration of a Phylogenetic Tree Reconstruction with Maximum Parsimony Algorithm

    Henry BLOCK  Tsutomu MARUYAMA  

     
    PAPER-Computer System

      Publicized:
    2016/11/14
      Vol:
    E100-D No:2
      Page(s):
    256-264

    In this paper, we present an FPGA hardware implementation for a phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Progressive Neighborhood and the Indirect Calculation of Tree Lengths method. This method is widely used for the acceleration of the phylogenetic tree reconstruction algorithm in software. In our implementation, we define a tree structure and accelerate the search by parallel and pipeline processing. We show results for eight real-world biological datasets. We compare execution times against our previous hardware approach, and TNT, the fastest available parsimony program, which is also accelerated by the Indirect Calculation of Tree Lengths method. Acceleration rates of 34 to 45 per rearrangement, and 2 to 6 for the whole search, are obtained against our previous hardware approach. Acceleration rates of 2 to 36 per rearrangement, and 18 to 112 for the whole search, are obtained against TNT.

  • An Efficient Soft Shadow Mapping for Area Lights in Various Shapes and Colors

    Youngjae CHUN  Kyoungsu OH  

     
    LETTER-Computer Graphics

      Publicized:
    2016/11/11
      Vol:
    E100-D No:2
      Page(s):
    396-400

    Shadow is an important effect that makes virtual 3D scenes more realistic. In this paper, we propose a fast and correct soft shadow generation method for area lights of various shapes and colors. To conduct efficient as well as accurate visibility tests, we exploit the complexity of shadow and area light color.

  • Lower Trunk Acceleration Signals Reflect Fall Risk During Walking

    Yoshitaka OTANI  Osamu AOKI  Tomohiro HIROTA  Hiroshi ANDO  

     
    LETTER

      Publicized:
    2016/04/01
      Vol:
    E99-D No:6
      Page(s):
    1482-1484

    The purpose of this study is to make available a fall risk assessment for stroke patients during walking using an accelerometer. We assessed gait parameters, normalized root mean squared acceleration (NRMSA) and Berg Balance Scale (BBS) values. Walking dynamics were better reflected in terms of the risk of falls during walking by NRMSA compared to the BBS.

  • Register-Based Process Virtual Machine Acceleration Using Hardware Extension with Hybrid Execution

    Surachai THONGKAEW  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2505-2518

    The Process Virtual Machine (VM) is typical software that runs applications inside operating systems. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware, operating system and allows bytecodes (portable code) to be executed in the same way on any other platforms. The Process VMs are implemented using an interpreter to interpret bytecode instead of direct execution of host machine codes. Thus, the bytecode execution is slower than those of the compiled programming language execution. Several techniques including our previous paper, the “Fetch/Decode Hardware Extension”, have been proposed to speed up the interpretation of Process VMs. In this paper, we propose an additional methodology, the “Hardware Extension with Hybrid Execution” to further enhance the performance of Process VMs interpretation and focus on Register-based model. This new technique provides an additional decoder which can classify bytecodes into either simple or complex instructions. With “Hybrid Execution”, the simple instruction will be directly executed on hardware of native processor. The complex instruction will be emulated by the “extra optimized bytecode software handler” of native processor. In order to eliminate the overheads of retrieving and storing operand on memory, we utilize the physical registers instead of (low address) virtual registers. Moreover, the combination of 3 techniques: Delay scheduling, Mode predictor HW and Branch/goto controller can eliminate all of the switching mode overheads between native mode and bytecode mode. The experimental results show the improvements of execution speed on the Arithmetic instructions, loop & conditional instructions and method invocation & return instructions can be achieved up to 16.9x, 16.1x and 3.1x respectively. The approximate size of the proposed hardware extension is 0.04 mm² (or equivalent to 14.81k gates) and consumes an additional power of only 0.24 mW. The stated results are obtained from logic synthesis using the TSMC 90nm technology @ 200MHz.

  • Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration

    Stewart DENHOLM  Hiroaki INOUE  Takashi TAKENAKA  Tobias BECKER  Wayne LUK  

     
    PAPER-Application

      Publicized:
    2014/11/19
      Vol:
    E98-D No:2
      Page(s):
    288-297

    Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. 6ns and 5.25ns are obtained for low latency messages. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.
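The low-latency arbitration mode described in this abstract can be illustrated with a minimal software sketch. It assumes, beyond what the abstract states, that every message carries a monotonically increasing sequence number and that both feeds publish the same numbering; the function name `arbitrate_low_latency` and the tuple layout are hypothetical, and the sketch does not reproduce the paper's FPGA architecture or its windowing methods.

```python
def arbitrate_low_latency(arrivals):
    """Forward the first copy of each message seen on either feed.

    arrivals: iterable of (feed, seq, payload) tuples in arrival order.
    Assumption (not from the abstract): seq is a per-message sequence
    number shared by the A and B feeds.
    """
    highest_forwarded = None
    forwarded = []
    for feed, seq, payload in arrivals:
        # A message is new only if its sequence number exceeds every
        # sequence number forwarded so far; later copies are dropped.
        if highest_forwarded is None or seq > highest_forwarded:
            forwarded.append((seq, payload))
            highest_forwarded = seq
    return forwarded


# Feed A loses message 2; feed B supplies it.
arrivals = [
    ("A", 1, "x"),
    ("B", 1, "x"),
    ("B", 2, "y"),
    ("A", 3, "z"),
    ("B", 3, "z"),
]
result = arbitrate_low_latency(arrivals)
```

This variant never waits for a late message, so a message that arrives after a higher-numbered one has been forwarded is discarded; a reliability-oriented mode would instead buffer within some window before giving up on the gap.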

  • Activity Recognition Based on an Accelerometer in a Smartphone Using an FFT-Based New Feature and Fusion Methods

    Yang XUE  Yaoquan HU  Lianwen JIN  

     
    LETTER-Human-computer Interaction

      Vol:
    E97-D No:8
      Page(s):
    2182-2186

    With the development of personal electronic equipment, the use of a smartphone with a tri-axial accelerometer to detect human physical activity is becoming popular. In this paper, we propose a new feature based on FFT for activity recognition from tri-axial acceleration signals. To improve the classification performance, two fusion methods, minimal distance optimization (MDO) and variance contribution ranking (VCR), are proposed. The new proposed feature achieves a recognition rate of 92.41%, which outperforms six traditional time- or frequency-domain features. Furthermore, the proposed fusion methods effectively improve the recognition rates. In particular, the average accuracy based on class fusion VCR (CFVCR) is 97.01%, which results in an improvement in accuracy of 4.14% compared with the results without any fusion. Experiments confirm the effectiveness of the new proposed feature and fusion methods.

  • Evaluation of Basic Dynamical Parameters in Printed Circuit Board — Mass, Force, and Acceleration —

    Shin-ichi WADA  Koichiro SAWA  

     
    PAPER

      Vol:
    E96-C No:9
      Page(s):
    1165-1172

    The authors have developed a mechanism that applies real vibration to electrical contacts by hammering oscillation in the vertical direction similar to that in real cases, and they have studied the effects of micro-oscillation on the contacts using the mechanism. It is shown that the performance of the hammering oscillation mechanism (HOM) for measuring acceleration and force is superior to that of other methods in terms of the stability of data. Using the mechanism, much simpler and more practical protocols are proposed for evaluating acceleration, force, and mass using only the measured acceleration. It is also indicated that the relationship between the inertial force generated by the hammering oscillation mechanism and the frictional force in electrical devices attached on a board is related to one of the causes of the degradation of electrical contacts under the effect of external micro-oscillation.

  • Response-Time Acceleration of a Frontend Amplifier for High Output Impedance Sensors

    Kamel MARS  Shoji KAWAHITO  

     
    PAPER-Electronic Circuits

      Vol:
    E95-C No:9
      Page(s):
    1543-1548

    This paper presents a response time acceleration technique in a high-gain capacitive-feedback frontend amplifier (FA) for high output impedance sensors. Using an auxiliary amplifier as a unity-gain buffer, a sample-and-hold capacitor which is used for band-limiting and sampling the FA output is driven at the beginning of the transient response to make the response faster and then it is re-charged directly by the FA output. A condition and parameters for the response time acceleration using this technique while maintaining the noise level unaffected are discussed. Theoretical analysis and simulation results show that the response time can be less than half of the case without the acceleration technique for the specified settling error of less than 0.5%.

  • Discrimination between Upstairs and Downstairs Based on Accelerometer

    Yang XUE  Lianwen JIN  

     
    LETTER

      Vol:
    E94-D No:6
      Page(s):
    1173-1177

    An algorithm for the discrimination between human upstairs and downstairs using a tri-axial accelerometer is presented in this paper, which consists of vertical acceleration calibration, extraction of two kinds of features (Interquartile Range and Wavelet Energy), effective feature subset selection with the wrapper approach, and SVM classification. The proposed algorithm can recognize upstairs and downstairs with 95.64% average accuracy for different sensor locations, i.e. located on the subject's waist belt, in the trousers pocket, and in the shirt pocket. Even for the mixed data from all sensor locations, the average recognition accuracy can reach 94.84%. Experimental results have successfully validated the effectiveness of the proposed method.

  • Efficient Combination of Likelihood Recycling and Batch Calculation for Fast Acoustic Likelihood Calculation

    Atsunori OGAWA  Satoshi TAKAHASHI  Atsushi NAKAMURA  

     
    PAPER-Speech and Hearing

      Vol:
    E94-D No:3
      Page(s):
    648-658

    This paper proposes an efficient combination of state likelihood recycling and batch state likelihood calculation for accelerating acoustic likelihood calculation in an HMM-based speech recognizer. Recycling and batch calculation are each based on different technical approaches, i.e. the former is a purely algorithmic technique while the latter fully exploits computer architecture. To accelerate the recognition process further by combining them efficiently, we introduce conditional fast processing and acoustic backing-off. Conditional fast processing is based on two criteria. The first potential activity criterion is used to control not only the recycling of state likelihoods at the current frame but also the precalculation of state likelihoods for several succeeding frames. The second reliability criterion and acoustic backing-off are used to control the choice of recycled or batch calculated state likelihoods when they are contradictory in the combination and to prevent word accuracies from degrading. Large vocabulary spontaneous speech recognition experiments using four different CPU machines under two environmental conditions showed that, compared with the baseline recognizer, recycling and batch calculation, our combined acceleration technique further reduced both the acoustic likelihood calculation time and the total recognition time. We also performed detailed analyses to reveal each technique's acceleration and environmental dependency mechanisms by classifying types of state likelihoods and counting each of them. The analysis results confirmed the effectiveness of the combined acceleration technique.

  • Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

    Yuma MUNEKAWA  Fumihiko INO  Kenichi HAGIHARA  

     
    PAPER-Parallel and Distributed Architecture

      Vol:
    E93-D No:6
      Page(s):
    1479-1488

    This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
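For readers unfamiliar with the algorithm being accelerated, the following is a minimal CPU sketch of Smith-Waterman local alignment scoring, together with the usual cell-updates-per-second arithmetic. It is not the paper's CUDA method: the scoring values (`match=2`, `mismatch=-1`, linear `gap=-1`) and the function names are illustrative assumptions, and a production search would typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, since the
    score alone does not require a traceback.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # Each cell is the maximum of: restart (0), extend diagonally,
            # or open a gap in either sequence.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_length, database_residues, seconds):
    """Giga cell updates per second: one cell per (query, database) residue pair."""
    return query_length * database_residues / seconds / 1e9
```

Every cell depends only on its left, upper, and upper-left neighbours, which is the data dependency that GPU implementations exploit when parallelizing across database sequences or anti-diagonals.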

  • Acceleration of Genetic Programming by Hierarchical Structure Learning: A Case Study on Image Recognition Program Synthesis

    Ukrit WATCHAREERUETAI  Tetsuya MATSUMOTO  Noboru OHNISHI  Hiroaki KUDO  Yoshinori TAKEUCHI  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Vol:
    E92-D No:10
      Page(s):
    2094-2102

    We propose a learning strategy for acceleration in learning speed of genetic programming (GP), named hierarchical structure GP (HSGP). The HSGP exploits multiple learning nodes (LNs) which are connected in a hierarchical structure, e.g., a binary tree. Each LN runs conventional evolutionary process to evolve its own population, and sends the evolved population into the connected higher-level LN. The lower-level LN evolves the population with a smaller subset of training data. The higher-level LN then integrates the evolved population from the connected lower-level LNs together, and evolves the integrated population further by using a larger subset of training data. In HSGP, evolutionary processes are sequentially executed from the bottom-level LNs to the top-level LN which evolves with the entire training data. In the experiments, we adopt conventional GPs and the HSGPs to evolve image recognition programs for given training images. The results show that the use of hierarchical structure learning can significantly improve learning speed of GPs. To achieve the same performance, the HSGPs need only 30-40% of the computation cost needed by conventional GPs.

  • Accelerating Relaxation Using Dynamic Error Prediction

    Hong Bo CHE  Jin Wook KIM  Tae Il BAE  Young Hwan KIM  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E92-A No:2
      Page(s):
    648-651

    A new acceleration scheme that decreases the number of required iterations in relaxation methodology is proposed. The proposed scheme uses dynamic error prediction of an improved approximation to the solution during an iterative computation. The proposed scheme's application to circuit simulations required an average of 67.3% fewer iterations compared to un-accelerated relaxation methods.

  • Convergence Acceleration of Iterative Signal Detection for MIMO System with Belief Propagation

    Satoshi GOUNAI  Tomoaki OHTSUKI  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E91-B No:8
      Page(s):
    2640-2647

    In multiple-input multiple-output (MIMO) wireless systems, the receiver must extract each transmitted signal from received signals. Iterative signal detection with belief propagation (BP) can improve the error rate performance, by increasing the number of detection and decoding iterations in MIMO systems. This number of iterations is, however, limited in actual systems because each additional iteration increases latency, receiver size, and so on. This paper proposes a convergence acceleration technique that can achieve better error rate performance with fewer iterations than the conventional iterative signal detection. Since the Log-Likelihood Ratio (LLR) of one bit propagates to all other bits with BP, improving some LLRs improves overall decoder performance. In our proposal, all the coded bits are divided into groups and only one group is detected in each iterative signal detection whereas in the conventional approach, each iterative signal detection run processes all coded bits, simultaneously. Our proposal increases the frequency of initial LLR update by increasing the number of iterative signal detections and decreasing the number of coded bits that the receiver detects in one iterative signal detection. Computer simulations show that our proposal achieves better error rate performance with fewer detection and decoding iterations than the conventional approach.

  • Analysis of the Behavior of Cuprous Oxide by Acceleration Test for Evaluation of Heat and Fire Phenomena of Imperfectly Connected Electrical Terminal

    Yoichi AOYAMA  Hisa NUMA  Ryo FUJITA  

     
    PAPER-Contact Phenomena

      Vol:
    E90-C No:7
      Page(s):
    1398-1404

    To evaluate heat and fire phenomena caused by accumulated microslide motion on an imperfectly connected electrical terminal, an acceleration test method using a vibrator was developed. The process from the generation of CuO to that of Cu₂O has been reproduced. The influence of current is investigated, and it is found that as current increases, CuO generation time T1 and Cu₂O generation time T2 decrease for pure copper; however, when current exceeds 3 A, we could not produce CuO or Cu₂O. The contact resistances of a Cu terminal and wire, compared with the terminal material were investigated in terms of the effects of current and ambient temperature.

  • Multigrid Optimization Method Applied to Electromagnetic Inverse Scattering Problem

    Mitsuru TANAKA  Kazuki YANO  Hiroyuki YOSHIDA  Atsushi KUSUNOKI  

     
    PAPER-Inverse Problems

      Vol:
    E90-C No:2
      Page(s):
    320-326

    An iterative reconstruction algorithm for accelerating the estimation of the complex relative permittivity of a cylindrical dielectric object based on the multigrid optimization method (MGOM) is presented. A cost functional is defined by the norm of a difference between the scattered electric fields measured and calculated for an estimated contrast function, which is expressed as a function of the complex relative permittivity of the object. Then the electromagnetic inverse scattering problem can be treated as an optimization problem where the contrast function is determined by minimizing the cost functional. We apply the conjugate gradient method (CGM) and the frequency-hopping technique (FHT) to the minimization of the cost functional, and also employ the multigrid method (MGM) with a V-cycle to accelerate the rate of convergence for getting the reconstructed profile. The reconstruction scheme is called the multigrid optimization method. Computer simulations are performed for lossy and inhomogeneous dielectric circular cylinders by using single-frequency or multifrequency scattering data. The numerical results demonstrate that the rate of convergence of the proposed method is much faster than that of the conventional CGM for both noise-free and noisy cases.

  • Accelerating Database Processing at Database-Driven Web Sites

    Seunglak CHOI  Jinwon LEE  Su Myeon KIM  Junehwa SONG  Yoon-Joon LEE  

     
    PAPER-Contents Technology and Web Information Systems

      Vol:
    E89-D No:11
      Page(s):
    2724-2738

    Most commercial Web sites dynamically generate their contents through a three-tier server architecture composed of a Web server, an application server, and a database server. In such an architecture, the database server easily becomes a bottleneck to the overall performance. In this paper, we propose WDBAccel, a high-performance database server accelerator that significantly improves the throughput of database processing. WDBAccel eliminates costly, complex query processing needed to obtain query results by reusing the results from previous queries for subsequent queries. This differentiates WDBAccel from other database cache systems, which employ traditional query processing. WDBAccel further improves its performance by fully utilizing main memory as the primary storage. This paper presents the design and implementation of the WDBAccel as well as the results of performance evaluation with a prototype.

  • JPEG 2000 Encoding Method for Reducing Tiling Artifacts

    Masayuki HASHIMOTO  Kenji MATSUO  Atsushi KOIKE  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E88-D No:12
      Page(s):
    2839-2848

    This paper proposes an effective JPEG 2000 encoding method for reducing tiling artifacts, which cause one of the biggest problems in JPEG 2000 encoders. Symmetric pixel extension is generally thought to be the main factor in causing artifacts. However this paper shows that differences in quantization accuracy between tiles are a more significant reason for tiling artifacts at middle or low bit rates. This paper also proposes an algorithm that predicts whether tiling artifacts will occur at a tile boundary in the rate control process and that locally improves quantization accuracy by the original post quantization control. This paper further proposes a method for reducing processing time which is yet another serious problem in the JPEG 2000 encoder. The method works by predicting truncation points using the entropy of wavelet transform coefficients prior to the arithmetic coding. These encoding methods require no additional processing in the decoder. The experiments confirmed that tiling artifacts were greatly reduced and that the coding process was considerably accelerated.
