The search functionality is under construction.

Keyword Search Result

[Keyword] arc(1309hit)

221-240hit(1309hit)

  • Wavelet Pyramid Based Multi-Resolution Bilateral Motion Estimation for Frame Rate Up-Conversion

    Ran LI  Hongbing LIU  Jie CHEN  Zongliang GAN  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2015/06/03
      Vol:
    E99-D No:1
      Page(s):
    208-218

    Conventional bilateral motion estimation (BME) for motion-compensated frame rate up-conversion (MC-FRUC) avoids the problem of overlapped areas and holes but usually produces many inaccurate motion vectors (MVs), because 1) the MV of an object between the previous and following frames often has no temporal symmetry with respect to the target block of the interpolated frame and 2) repetitive patterns in video frames cause mismatches due to the lack of the interpolated block. In this paper, a new BME algorithm with low computational complexity is proposed to resolve these problems. The proposed algorithm incorporates multi-resolution search into BME, since it can easily utilize the MV consistency between two adjacent pyramid levels and spatially neighboring MVs to correct the inaccurate MVs resulting from the lack of temporal symmetry while guaranteeing low computational cost. In addition, the multi-resolution search uses the fast wavelet transform to construct the wavelet pyramid, which not only guarantees low computational complexity but also preserves the high-frequency components of the image at each level during sub-sampling. The high-frequency components are used to regularize the traditional block matching criterion, reducing the probability of mismatch in BME. Experiments show that the proposed algorithm significantly improves both the objective and subjective quality of the interpolated frame with low computational complexity and provides better performance than existing BME algorithms.

  • Design and Evaluation of a Configurable Query Processing Hardware for Data Streams

    Yasin OGE  Masato YOSHIMI  Takefumi MIYOSHI  Hideyuki KAWASHIMA  Hidetsugu IRIE  Tsutomu YOSHINAGA  

     
    PAPER-Computer System

      Publicized:
    2015/09/14
      Vol:
    E98-D No:12
      Page(s):
    2207-2217

    In this paper, we propose Configurable Query Processing Hardware (CQPH), an FPGA-based accelerator for continuous query processing over data streams. CQPH is a highly optimized and minimal-overhead execution engine designed to deliver real-time response for high-volume data streams. Unlike most of the other FPGA-based approaches, CQPH provides on-the-fly configurability for multiple queries with its own dynamic configuration mechanism. With a dedicated query compiler, SQL-like queries can be easily configured into CQPH at run time. CQPH supports continuous queries including selection, group-by operation and sliding-window aggregation with a large number of overlapping sliding windows. As a proof of concept, a prototype of CQPH is implemented on an FPGA platform for a case study. Evaluation results indicate that a given query can be configured within just a few microseconds, and the prototype implementation of CQPH can process over 150 million tuples per second with a latency of less than a microsecond. Results also indicate that CQPH provides linear scalability to increase its flexibility (i.e., on-the-fly configurability) without sacrificing performance (i.e., maximum allowable clock speed).

  • Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators

    Yoshikazu INAGAKI  Shinya TAKAMAEDA-YAMAZAKI  Jun YAO  Yasuhiko NAKASHIMA  

     
    PAPER-Architecture

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2141-2149

    The Energy-aware Multi-mode Accelerator eXtension [24],[25] (EMAX) is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computations, big data, and image processing as well as low power consumption. However, before mapping algorithms onto the accelerator, application developers require sufficient knowledge of the hardware organization and its specially designed instructions. They also need to spend significant effort tuning the code to improve execution efficiency when no well-designed compiler or library is available. To address this problem, we focus on library support for stencil (nearest-neighbor) computations, which represent a class of algorithms commonly used in many partial differential equation (PDE) solvers. In this research, we address the following topics: (1) system configuration, features, and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data to be read from the main memory; (3) performance evaluation of the library for PDE solvers. With the features of a library that can reuse the local data across the outer loop iterations and map many instructions by unrolling the outer loops, the amount of data to be read from the main memory is significantly reduced to a minimum of 1/7 compared with a hand-tuned code. In addition, the stencil library reduced the execution time 23% more than a general-purpose processor.

  • A Routing-Based Mobility Management Scheme for IoT Devices in Wireless Mobile Networks Open Access

    Masanori ISHINO  Yuki KOIZUMI  Toru HASEGAWA  

     
    PAPER

      Vol:
    E98-B No:12
      Page(s):
    2376-2381

    Internet of Things (IoT) devices, which differ from traditional mobile devices such as cellular phones in their mobility and communication patterns, have emerged as a new type of mobile device. A strict mobility management scheme that provides highly mobile devices with seamless access is over-engineered for managing the mobility of IoT devices. We revisit current mobility management schemes for wireless mobile networks based on identifier/locator separation. In this paper, we focus on IoT communication patterns and propose a new routing-based mobility scheme for them. Our scheme adopts a routing information aggregation scheme that uses a Bloom filter as the data structure for storing routing information. We clarify the effectiveness of our scheme in IoT environments with a large number of IoT devices, and discuss its deployment issues.
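
    As a rough illustration of why a Bloom filter suits routing information aggregation, the sketch below stores device identifiers in a fixed-size bit array and merges two filters with a bitwise OR. It is a generic Bloom filter, not the scheme from the paper: the class name, the bit-array size, the number of hash functions, and the SHA-256-based hashing are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter; sizes and hashing are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # False positives are possible; false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # Aggregation: OR-ing two filters covers the union of both sets.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share parameters")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Hypothetical usage: two downstream links, aggregated at an upstream router.
link_a = BloomFilter()
link_a.add("sensor-17")
link_b = BloomFilter()
link_b.add("meter-42")
upstream = link_a.union(link_b)
print("sensor-17" in upstream, "meter-42" in upstream)
```

    The memory footprint stays constant no matter how many identifiers are added, at the price of a false-positive rate that grows as the filter fills.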

  • Ultrasmall: A Tiny Soft Processor Architecture with Multi-Bit Serial Datapaths for FPGAs

    Shinya TAKAMAEDA-YAMAZAKI  Hiroshi NAKATSUKA  Yuichiro TANAKA  Kenji KISE  

     
    PAPER-Architecture

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2150-2158

    Soft processors are widely used in FPGA-based embedded computing systems. For such purposes, efficiency in resource utilization is as important as high performance. This paper proposes Ultrasmall, a new soft processor architecture for FPGAs. Ultrasmall supports a subset of the MIPS-I instruction set architecture and employs an area-efficient microarchitecture to reduce the use of FPGA resources. While supporting the original 32-bit ISA, Ultrasmall uses a 2-bit serial ALU for all of its operations. This approach significantly reduces resource utilization at the cost of increased performance overhead. In addition to these device-independent optimizations, we applied several device-dependent optimizations for Xilinx Spartan-3E FPGAs using 4-input lookup tables (LUTs). Optimizations using specific primitives aggressively reduce the number of occupied slices. Our evaluation results show that Ultrasmall occupies only 84% of the resources used by the previous small soft processor. In addition to the reduction in utilized resources, Ultrasmall achieves 2.9 times higher performance than the previous approach.

  • High Performance VLSI Architecture of H.265/HEVC Intra Prediction for 8K UHDTV Video Decoder

    Jianbin ZHOU  Dajiang ZHOU  Shihao WANG  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2519-2527

    8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding can significantly enhance video compression efficiency, at the expense of increased computational complexity compared with H.264. For intra prediction in real-time 8K UHDTV H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture comprising two techniques in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT-based Reference Sample Fetching Scheme (LUT-RSFS), which reduces the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second is Hybrid Block Reordering and Data Forwarding (HBRDF), which minimizes the idle time and eliminates the dependency between TUs by creating 3 data forwarding paths. It achieves a hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operating frequency of 260MHz, with a gate count of 217.8K for the 8-bit design and 251.1K for the 10-bit design. The proposed VLSI architecture can support 4320p@120fps H.265 intra decoding (8-bit or 10-bit), with all 35 intra prediction modes and prediction unit sizes ranging from 4×4 to 64×64.

  • Data-Transfer-Aware Design of an FPGA-Based Heterogeneous Multicore Platform with Custom Accelerators

    Yasuhiro TAKEI  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E98-A No:12
      Page(s):
    2658-2669

    For an FPGA-based heterogeneous multicore platform, we present a design methodology that reduces the total processing time by taking data transfer into account. The reconfigurability of recent FPGAs with hard CPU cores allows us to realize a single-chip heterogeneous processor optimized for a given application. The major problem in designing such heterogeneous processors is data transfer between CPU cores and accelerator cores. The total processing time, including data transfers, is modeled considering the overlap of computation time and data-transfer time, and optimal design parameters are searched for.

  • Top-Down Visual Attention Estimation Using Spatially Localized Activation Based on Linear Separability of Visual Features

    Takatsugu HIRAYAMA  Toshiya OHIRA  Kenji MASE  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2015/09/10
      Vol:
    E98-D No:12
      Page(s):
    2308-2316

    Intelligent information systems captivate people's attention. Examples of such systems include driving support vehicles capable of sensing driver state and communication robots capable of interacting with humans. Modeling how people search for visual information is indispensable for designing these kinds of systems. In this paper, we focus on human visual attention, which is closely related to visual search behavior. We propose a computational model to estimate human visual attention during a visual target search task. Existing models estimate visual attention using the ratio between a representative value of a visual feature of a target stimulus and that of distractors or background. These models, however, often cannot achieve good performance for difficult search tasks that require a sequential spotlighting process. For such tasks, the linear separability effect of a visual feature distribution should be considered. Hence, we introduce this effect into spatially localized activation. Concretely, our top-down model estimates target-specific visual attention using Fisher's variance ratio between the visual feature distribution of a local region in the field of view and that of a target stimulus. We confirm the effectiveness of our computational model through a visual search experiment.
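
    The abstract does not spell out the ratio, so the sketch below uses one common one-dimensional form of Fisher's ratio, the squared difference of the two means divided by the sum of the two variances, as an assumption. The function name and the sample feature values are hypothetical, and how the paper maps this ratio to an attention value is not shown here.

```python
from statistics import fmean, pvariance


def fisher_ratio(local_features, target_features):
    """Separation of two 1-D feature samples: (m1 - m2)^2 / (v1 + v2)."""
    mean_gap = fmean(local_features) - fmean(target_features)
    spread = pvariance(local_features) + pvariance(target_features)
    if spread == 0:
        raise ValueError("both samples have zero variance")
    return mean_gap ** 2 / spread


# Hypothetical feature values (e.g. hue) for a local region and the target.
region = [0.0, 2.0]
target = [4.0, 6.0]
print(fisher_ratio(region, target))
```

    A ratio near zero means the two distributions overlap heavily, so the local region is hard to separate linearly from the target; a large ratio means the two are easy to tell apart.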

  • Fast Repairing from Large-Scale Failure Using Hierarchical SDN Controllers

    Shohei KAMAMURA  Hiroshi YAMAMOTO  Kouichi GENDA  Yuki KOIZUMI  Shin'ichi ARAKAWA  Masayuki MURATA  

     
    PAPER-Network

      Vol:
    E98-B No:11
      Page(s):
    2269-2279

    This paper proposes fast repair methods that use hierarchical software-defined network controllers to recover from massive failures in a large-scale IP over wavelength-division multiplexing network. The network consists of multiple domains, and a slave controller is deployed in each domain. While each slave controller configures transport paths in its domain, the master controller manages end-to-end paths, which are established across multiple domains. For fast repair of intra-domain paths by the slave controllers, we define the optimization problem of path configuration order and propose a heuristic method that minimizes the repair time needed to move from a disrupted state to a suboptimal state. For fast repair of end-to-end paths through multiple domains, we also propose a network abstraction method that efficiently manages the entire network. Evaluation results suggest that repair within a few minutes can be achieved by applying the proposed methods to a repair scenario in which multiple links and nodes fail in a 10,000-node network.

  • Efficient Anchor Graph Hashing with Data-Dependent Anchor Selection

    Hiroaki TAKEBE  Yusuke UEHARA  Seiichi UCHIDA  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2015/08/17
      Vol:
    E98-D No:11
      Page(s):
    2030-2033

    Anchor graph hashing (AGH) is a promising hashing method for nearest neighbor (NN) search. AGH realizes efficient search by generating and utilizing a small number of points that are called anchors. In this paper, we propose a method for improving AGH, which considers data distribution in a similarity space and selects suitable anchors by performing principal component analysis (PCA) in the similarity space.

  • HTTP Traffic Classification Based on Hierarchical Signature Structure

    Sung-Ho YOON  Jun-Sang PARK  Ji-Hyeok CHOI  Youngjoon WON  Myung-Sup KIM  

     
    LETTER-Information Network

      Publicized:
    2015/08/19
      Vol:
    E98-D No:11
      Page(s):
    1994-1997

    Considering diversified HTTP types, the performance bottleneck of signature-based classification must be resolved. We define a signature model classifying the traffic in multiple dimensions and suggest a hierarchical signature structure to remove signature redundancy and minimize search space. Our experiments on campus traffic demonstrated 1.8 times faster processing speed than the Aho-Corasick matching algorithm in Snort.

  • Scalable Hardware Winner-Take-All Neural Network with DPLL

    Masaki AZUMA  Hiroomi HIKAWA  

     
    PAPER-Biocybernetics, Neurocomputing

      Publicized:
    2015/07/21
      Vol:
    E98-D No:10
      Page(s):
    1838-1846

    Neural networks are widely used in various fields due to their superior learning abilities. This paper proposes a hardware winner-take-all neural network (WTANN) that employs a new winner-take-all (WTA) circuit with phase-modulated pulse signals and digital phase-locked loops (DPLLs). The system uses DPLL as a computing element, so all input values are expressed by phases of rectangular signals. The proposed WTA circuit employs a simple winner search circuit. The proposed WTANN architecture is described by very high speed integrated circuit (VHSIC) hardware description language (VHDL), and its feasibility was tested and verified through simulations and experiments. Conventional WTA takes a global winner search approach, in which vector distances are collected from all neurons and compared. In contrast, the WTA in the proposed system is carried out locally by a distributed winner search circuit among neurons. Therefore, no global communication channels with a wide bandwidth between the winner search module and each neuron are required. Furthermore, the proposed WTANN can easily extend the system scale, merely by increasing the number of neurons. The circuit size and speed were then evaluated by applying the VHDL description to a logic synthesis tool and experiments using a field programmable gate array (FPGA). Vector classifications with WTANN using two kinds of data sets, Iris and Wine, were carried out in VHDL simulations. The results revealed that the proposed WTANN achieved valid learning.
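
    For orientation, the conventional global winner search that the abstract contrasts against can be sketched in software as follows. This is not the DPLL-based circuit from the paper: the function names are hypothetical, and the update rule is the textbook competitive-learning step, which is an assumption because the abstract does not state the learning rule.

```python
def winner_take_all(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # Global search: every neuron's distance is collected and compared.
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def train_step(weights, x, lr=0.1):
    """Move only the winning neuron toward the input (assumed update rule)."""
    win = winner_take_all(weights, x)
    weights[win] = [wi + lr * (xi - wi) for wi, xi in zip(weights[win], x)]
    return win


# Hypothetical two-neuron network with 2-D inputs.
neurons = [[0.0, 0.0], [1.0, 1.0]]
print(winner_take_all(neurons, [0.9, 0.8]))
```

    The `min` over all neurons is the step that needs every distance in one place, which is the global communication the proposed distributed winner search is designed to avoid.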

  • Comparisons on Arc Behavior and Contact Performance between Cu and Cu-Mo Alloys in a Bridge-Type Contact System

    Xue ZHOU  Mo CHEN  Guofu ZHAI  

     
    PAPER

      Vol:
    E98-C No:9
      Page(s):
    904-911

    Cu-Mo alloy inherits not only high electrical and thermal conductivity from Cu but also high hardness from Mo, which makes it a promising material for electrical contact applications. In this paper, the arc and erosion characteristics of Cu-Mo contacts are studied with a bridge-type contact high-speed break mechanism under a DC 270 V/200 A load condition. In each experiment group, 2500 break operations are carried out. During every break operation, a high-speed AD card records the voltage and current signals of the arc, a high-speed camera records the arcing process, and the temperatures of the contacts and the arc are acquired by thermocouple and spectrometer, respectively. The mass and contact resistance of the contacts are measured before and after every experiment group. In addition, photographs of the contact surface are taken by SEM to help analyze the erosion characteristics. The comparison between Cu-Mo contacts and Cu contacts indicates that although Cu contacts have better electrical and thermal conductivity, Cu-Mo contacts can lower the arc temperature to prevent thermal breakdown, are more resistant to ablation, and have a longer life span.

  • Cooperative Communication Using the DF Protocol in the Hierarchical Modulation

    Sung-Bok CHOI  Eui-Hak LEE  Jung-In BAIK  Young-Hwan YOU  Hyoung-Kyu SONG  

     
    LETTER-Communication Theory and Signals

      Vol:
    E98-A No:9
      Page(s):
    1990-1994

    To improve the BER performance of conventional cooperative communication, this letter proposes an efficient method for improving reliability that uses hierarchical modulation with both a high priority (HP) layer and a low priority (LP) layer. To achieve more reliable transmission, the proposed method additionally uses the error correction capability of Reed-Solomon (RS) codes. The simulation results show that the proposed method can transmit data more reliably than the basic RS coded decode-and-forward (DF) method.

  • Influences of Contact Opening Speeds on Break Arc Behaviors of AgSnO2 Contact Pairs in DC Inductive Load Conditions

    Makoto HASEGAWA  

     
    BRIEF PAPER

      Vol:
    E98-C No:9
      Page(s):
    923-927

    Break operations of DC inductive (L=20mH) load currents up to about 5A with 14V were conducted in air with AgSnO2 contact pairs under different contact opening speeds, first up to 20mm/s and then to 200mm/s. Average break arc duration at each current level was calculated under the respective opening speeds. While break arc durations became shorter with increases in the opening speeds at larger current levels, such reduction tendencies were less significant with an increase of the contact opening speed from 20mm/s to 200mm/s, even when operated to break a load current of 5A. Both load current levels and contact opening speed levels seem to exhibit certain roles for realizing arc shortening effects.

  • Hardware Architecture of the Fast Mode Decision Algorithm for H.265/HEVC

    Wenjun ZHAO  Takao ONOYE  Tian SONG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E98-A No:8
      Page(s):
    1787-1795

    In this paper, a dedicated hardware architecture for the Fast Mode Decision (FMD) algorithms presented in our previous work is proposed. This architecture is designed as an embedded mode dispatch module. With this module, some unnecessary modes can be skipped or the mode decision process can be terminated in advance. In order to maintain high compatibility, the FMD algorithms are jointly designed as a single module that can be easily embedded into a common video codec for H.265/HEVC. The input and output interfaces between the proposed module and other parts of the codec are designed based on a simple but effective protocol. Hardware synthesis results on FPGA demonstrate that the proposed architecture achieves a maximum frequency of about 193 MHz with less than 1% of the total resources consumed. Moreover, the proposed module can improve the overall throughput.

  • A Novel Beam Search Method in Millimeter-Wave Access Networks for 5G Mobile Communications

    Shunsuke FUJIO  Chimato KOIKE  Dai KIMURA  

     
    PAPER

      Vol:
    E98-B No:8
      Page(s):
    1456-1464

    The fifth generation (5G) mobile communication technologies are attracting a lot of attention in terms of accommodating the huge traffic expected in the future. Millimeter wave communications, which utilize wide frequency bands, are attracting attention for the realization of the high capacity required in the 5G era. In millimeter wave communications, beamforming with massive antennas is expected to play a very important role in compensating for the large propagation loss of millimeter waves. Because massive beamforming yields narrow beams, the search for the optimal beam could have considerable impact on the system. In this paper, we propose a new beam search method that can significantly reduce the load of beam search while keeping the beamforming gain almost the same as that of the conventional method. The proposed method consists of three stages: creation of a set of candidate beams in the first stage, selection of an initial beam in the second stage, and refinement of the selected beam in the third stage. In the first stage, the created set of candidate beams contains beams of various widths instead of beams of a uniform width, which reduces the number of candidate beams in the set. Here, we leverage the property that the fluctuation of millimeter wave propagation loss is spatially and temporally small because there are fewer multipaths, so the propagation loss correlates strongly with the user location. With the reduced set of candidate beams, the beam search time can be shortened in the second stage. The beam refinement in the third stage then increases the beamforming gain and thereby the user throughput. To confirm the effects of the proposed beam search method, we conduct system-level simulations using a propagation model for millimeter wave communications proposed by MiWEBA, an international project between Europe and Japan. The results show that the proposed beam search method can reduce the number of candidate beams and can therefore shorten the beam search time by about 39% without any degradation in outage probability compared with a conventional method.

  • A Cooking-Step Scheduling Algorithm with Guidance System for Homemade Cooking

    Yukiko MATSUSHIMA  Nobuo FUNABIKI  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2015/05/18
      Vol:
    E98-D No:8
      Page(s):
    1439-1448

    Homemade cooking plays a key role in a healthy and cost-efficient life. Unfortunately, preparing multiple dishes is generally time-consuming. In this paper, an algorithm is proposed to minimize the cooking time by scheduling the cooking-steps of multiple dishes. The cooking procedure of a dish is divided into a sequence of six types of cooking-steps to account for the constraints on cooks and cooking utensils in a kitchen. A cooking model is presented to optimize the cooking-step schedule and estimate the cooking time for a given starting order of dishes under various constraints on cooks and utensils. Then, a high-quality schedule is sought by repeatedly generating a new order and applying the model, based on exhaustive search and simulated annealing. Our simulation results and cooking experiments confirm the effectiveness of our proposal.

  • A Near-Threshold Cell-Based All-Digital PLL with Hierarchical Band-Selection G-DCO for Fast Lock-In and Low-Power Applications

    Chia-Wen CHANG  Yuan-Hua CHU  Shyh-Jye JOU  

     
    PAPER-Integrated Electronics

      Vol:
    E98-C No:8
      Page(s):
    882-891

    This paper presents a cell-based all-digital phase-locked loop (ADPLL) with a hierarchical gated digitally controlled oscillator (G-DCO) for low-voltage operation, wide frequency range, and low power consumption. In addition, a new time-domain hierarchical frequency estimation algorithm (HFEA) for frequency acquisition is proposed to estimate the output frequency in 1.5MF (MF = 3 in this paper) cycles, and this fast lock-in time is suitable for dynamic voltage frequency scaling (DVFS) systems. A hierarchical G-DCO is proposed to work at low supply voltage to reduce the power consumption and at the same time to achieve wide frequency range and precise frequency resolution. The core area of the proposed ADPLL is 0.02635 mm². In the near-threshold region (VDD = 0.36 V), the proposed ADPLL dissipates only 68.2 µW and has an RMS period jitter of 1.25% UI at 60 MHz output clock frequency. Under 0.5 V VDD operation, the proposed ADPLL dissipates 404.2 µW at 400 MHz. The fast lock-in time of 4.489 µs and the low jitter performance below 0.5% UI at 400 MHz output clock frequency make the proposed ADPLL suitable for event-driven or DVFS applications.

  • A High-Level Synthesis Algorithm with Inter-Island Distance Based Operation Chainings for RDR Architectures

    Kotaro TERADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1366-1375

    In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm for RDR architectures that uses operation chaining to reduce the overall latency. Our algorithm consists of three steps. The first step enumerates candidate operations for chaining. The second step introduces the maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on the RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for the RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.
