
Keyword Search Result

[Keyword] ATI(18690hit)

3381-3400hit(18690hit)

  • A Bipartite Graph-Based Ranking Approach to Query Subtopics Diversification Focused on Word Embedding Features

    Md Zia ULLAH  Masaki AONO  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2016/09/05
      Vol:
    E99-D No:12
      Page(s):
    3090-3100

    Web search queries are usually vague or ambiguous, and often carry multiple intents; different users may issue the same query with different search intents. Understanding these intents by mining the subtopics underlying a query has gained much interest in recent years. Query suggestions provided by search engines reflect some intents of the original query; however, suggested queries are often noisy and contain groups of alternative queries with similar meanings. Identifying the subtopics that cover the possible intents behind a query is therefore a formidable task. Moreover, because both the query and its subtopics are short, it is challenging to estimate the similarity between a pair of short texts and rank them accordingly. In this paper, we propose a method for mining and ranking subtopics that introduces multiple semantic and content-aware features, a bipartite graph-based ranking (BGR) method, and a similarity function for short texts. Given a query, we aggregate the queries suggested by search engines as candidate subtopics and estimate their relevance to the given query by modeling a bipartite graph over word-embedding and content-aware features. To estimate the similarity between two short texts, we propose a Jensen-Shannon divergence based similarity function computed from the probability distributions of the terms in the top documents retrieved from a search engine. A diversified ranked list of subtopics covering the possible intents of a query is assembled by balancing relevance and novelty. We evaluated our method on the NTCIR-10 INTENT-2 and NTCIR-12 IMINE-2 subtopic mining test collections. The proposed method outperforms the baselines, known related methods, and the official participants of the INTENT-2 and IMINE-2 competitions.
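    As a concrete illustration of the Jensen-Shannon divergence based similarity described in this abstract, the sketch below builds unigram distributions from the top retrieved documents of two short texts and converts their divergence into a similarity score. It is a minimal Python sketch under assumed inputs (the token lists and function names are hypothetical), not the authors' implementation or feature set.

      import math
      from collections import Counter

      def term_distribution(docs):
          """Unigram distribution over the terms of the documents retrieved
          for a short text (a query or a candidate subtopic)."""
          counts = Counter()
          for doc in docs:
              counts.update(doc)
          total = sum(counts.values())
          return {term: c / total for term, c in counts.items()}

      def js_similarity(p, q):
          """Similarity derived from the Jensen-Shannon divergence (log base 2,
          so the divergence lies in [0, 1]); 1.0 means identical distributions."""
          vocab = set(p) | set(q)
          m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}
          def kl(a):
              return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0.0)
          return 1.0 - (0.5 * kl(p) + 0.5 * kl(q))

      # Hypothetical top retrieved documents (as token lists) for a query and a subtopic.
      query_docs = [["jaguar", "car", "dealer"], ["jaguar", "price", "car"]]
      subtopic_docs = [["jaguar", "car", "review"], ["jaguar", "dealer", "used"]]
      print(round(js_similarity(term_distribution(query_docs),
                                term_distribution(subtopic_docs)), 3))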

  • An FPGA Implementation for a Flexible-Length-Arithmetic Processor Employing the FDFM Processor Core Approach

    Tatsuya KAWAMOTO  Xin ZHOU  Jacir L. BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2901-2910

    Algorithms requiring fast manipulation of multiple-length numbers are usually implemented in hardware. However, hardware implementation, using an HDL (Hardware Description Language) for instance, is a laborious task, and the quality of the solution relies heavily on the designer's expertise. The main contribution of this work is to present a flexible-length-arithmetic processor based on the FDFM (Few DSP slices and Few Memory blocks) approach that supports arithmetic operations on multiple-length numbers using FPGAs (Field-Programmable Gate Arrays). The proposed processor has been implemented on the Xilinx Virtex-6 FPGA. The arithmetic instructions of the proposed processor architecture include addition, subtraction, and multiplication of integers exceeding 64 bits. To reduce the burden of implementing algorithms directly on the FPGA, applications requiring multiple-length arithmetic operations are written in a C-like language and translated into a machine program. The machine program is then transferred to and executed on the proposed architecture. A 2048-bit RSA encryption/decryption implementation has been used to assess the merit of the proposed approach. Experimental results show that a 2048-bit RSA encryption on the proposed architecture takes only 2.2 times longer than a direct FPGA implementation. Furthermore, by employing multiple FDFM cores for the same task, the computing time is reduced considerably.
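    For readers unfamiliar with multiple-length arithmetic, the following Python sketch shows the schoolbook limb-by-limb multiplication that such a processor carries out in hardware; it is a software illustration under an assumed 64-bit limb size, not the FDFM processor's instruction set or datapath.

      def mul_multilength(a_limbs, b_limbs, base_bits=64):
          """Schoolbook multiplication of two multiple-length integers stored
          as little-endian lists of base-2**64 limbs."""
          mask = (1 << base_bits) - 1
          result = [0] * (len(a_limbs) + len(b_limbs))
          for i, a in enumerate(a_limbs):
              carry = 0
              for j, b in enumerate(b_limbs):
                  t = result[i + j] + a * b + carry
                  result[i + j] = t & mask
                  carry = t >> base_bits
              result[i + len(b_limbs)] += carry
          return result

      def to_limbs(x, base_bits=64):
          limbs = []
          while x:
              limbs.append(x & ((1 << base_bits) - 1))
              x >>= base_bits
          return limbs or [0]

      def from_limbs(limbs, base_bits=64):
          return sum(l << (i * base_bits) for i, l in enumerate(limbs))

      # Check against Python's built-in arbitrary-precision arithmetic.
      x, y = (1 << 100) + 12345, (1 << 90) + 67890
      assert from_limbs(mul_multilength(to_limbs(x), to_limbs(y))) == x * y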

  • Cognition-Aware Summarization of Photos Representing Events

    Bei LIU  Makoto P. KATO  Katsumi TANAKA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2016/09/01
      Vol:
    E99-D No:12
      Page(s):
    3140-3153

    Photo summarization technology for summarizing a photo collection is usually oriented toward the users who own the collection. However, people's interest in sharing photos with others highlights the importance of cognition-aware summarization, in which viewers can easily recognize the exact event that the photos represent. In this research, we address the problem of cognition-aware summarization of photos representing events, and propose to solve it, and thereby improve the perceptual quality of a photo set, by proactively preventing the misrecognition that a photo set might cause. Three types of neighbor events that can cause misrecognition are discussed in this paper, namely sub-events, super-events, and sibling-events. We analyze the causes of these misrecognitions and then propose three criteria to prevent them. A combination of the criteria is used to generate a summarization that represents an event with several photos. Our approach was empirically evaluated on photos from Flickr using their visual features and related tags. The results indicate the effectiveness of our proposed methods in comparison with a baseline method.

  • A Mobile Agent Based Distributed Variational Bayesian Algorithm for Flow and Speed Estimation in a Traffic System

    Mohiyeddin MOZAFFARI  Behrouz SAFARINEJADIAN  

     
    PAPER-Sensor network

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2934-2942

    This paper presents a mobile agent based distributed variational Bayesian (MABDVB) algorithm for density estimation in sensor networks. It is assumed that the sensor measurements can be statistically modeled by a common Gaussian mixture model. In the proposed algorithm, mobile agents move along routes through the network and compute local sufficient statistics from the local measurements. The global sufficient statistics are then updated using these local sufficient statistics, and this procedure is repeated until convergence is reached. The parameters of the density function are then approximated from the global sufficient statistics. Convergence of the proposed method is also studied analytically, and it is shown that the estimated parameters eventually converge to their true values. Finally, the proposed algorithm is applied to one-dimensional and two-dimensional data sets to demonstrate its promising performance.
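    The following Python sketch illustrates the flavor of the sufficient-statistics exchange: each node computes local statistics of a Gaussian mixture from its own measurements, and a visiting agent accumulates them into global statistics used to refresh the parameters. It is an EM-style simplification with hypothetical data and names, not the variational Bayesian updates of the MABDVB algorithm.

      import numpy as np

      def local_sufficient_stats(x, weights, means, variances):
          """Local sufficient statistics of a 1-D Gaussian mixture for the
          measurements x held by one sensor node (responsibility-weighted sums)."""
          x = np.asarray(x)[:, None]                      # (n, 1)
          mu = np.asarray(means)[None, :]                 # (1, k)
          var = np.asarray(variances)[None, :]
          log_p = (np.log(weights)[None, :]
                   - 0.5 * np.log(2 * np.pi * var)
                   - 0.5 * (x - mu) ** 2 / var)
          r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
          r /= r.sum(axis=1, keepdims=True)               # responsibilities
          return r.sum(axis=0), (r * x).sum(axis=0), (r * x ** 2).sum(axis=0)

      # A mobile agent visits the nodes on its route and accumulates global statistics.
      rng = np.random.default_rng(0)
      nodes = [rng.normal(0, 1, 50), rng.normal(5, 1, 50)]    # per-node measurements
      w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 4.0]), np.array([1.0, 1.0])
      N = X = X2 = 0.0
      for data in nodes:
          n_k, s_k, s2_k = local_sufficient_stats(data, w, mu, var)
          N, X, X2 = N + n_k, X + s_k, X2 + s2_k
      w, mu = N / N.sum(), X / N                              # global parameter update
      var = X2 / N - mu ** 2
      print(mu.round(2), var.round(2))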

  • Performance Optimization of Light-Field Applications on GPU

    Yuttakon YUTTAKONKIT  Shinya TAKAMAEDA-YAMAZAKI  Yasuhiko NAKASHIMA  

     
    PAPER-Computer System

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    3072-3081

    Light-field image processing has been widely employed in many areas, from mobile devices to manufacturing applications. The fundamental process of extracting usable information requires significant computation on high-resolution raw image data. A graphics processing unit (GPU) is used to exploit the data parallelism, as in general image processing applications. However, the sparse memory access pattern of these applications reduces the performance of GPU devices for both systematic and algorithmic reasons. We therefore propose an optimization technique that redesigns the memory access pattern of the applications to alleviate the memory bottleneck of the rendering application and to increase the data reusability of the depth extraction application. We evaluated our optimized implementations against state-of-the-art implementations on several GPUs, with all implementations optimally configured for each specific device. Our proposed optimization increased the performance of the rendering application on a GTX-780 GPU by 30%, and of the depth extraction application on GTX-780 and GTX-980 GPUs by 82% and 18%, respectively, compared with the original implementations.

  • A Feasibility Study of DSP-Enabled Cancellation of Random Phase Noise Caused by Optical Coherent Transceivers in Next-Generation Optical Access Systems

    Sang-Yuep KIM  Jun-ichi KANI  Hideaki KIMURA  

     
    PAPER-Fiber-Optic Transmission for Communications

      Publicized:
    2016/06/28
      Vol:
    E99-B No:12
      Page(s):
    2574-2582

    This paper presents a scheme that digitally cancels the unwanted phase components generated by the transmitter's laser and the receiver's local-oscillator laser; such components place a substantial limit on the performance of coherent transceivers monolithically integrated with lasers in a photonic integrated circuit (PIC). Our cancellation proposal adopts an orthogonal-polarization approach to provide a reference that is uncorrelated with the data signal. We elaborate on the principle of our proposal and its digital signal processing (DSP) algorithm. Experiments on a VCSEL with a linewidth of approximately 300MHz verify that our proposal can overcome the inherent phase-noise limitations indicated by simulations and experiments. Our cancellation algorithm, in conjunction with CMA-based polarization control, is demonstrated and evaluated to confirm the feasibility of our proposal. The greatly relaxed laser-linewidth requirement offers a significant benefit in offsetting the technical and cost requirements of coherent transceiver PICs with lasers. Therefore, our cancellation proposal is an enabling technology for the successful deployment of future coherent-based passive optical network (PON) systems.
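    The sketch below is a toy baseband illustration of the cancellation idea: a reference carried on the orthogonal polarization experiences the same transmitter/LO phase noise as the data, so de-rotating the received data by the reference's smoothed phase removes the common phase noise. The signal model, filter length, and noise levels are assumptions for illustration, not the authors' DSP algorithm.

      import numpy as np

      rng = np.random.default_rng(1)
      n = 4000
      # QPSK data and a constant reference tone (sent on the orthogonal polarization).
      symbols = rng.integers(0, 4, n)
      data = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))
      ref = np.ones(n, dtype=complex)

      # Common random-walk phase noise from the transmitter laser and the LO.
      phase_noise = np.cumsum(rng.normal(0, 0.05, n))
      rx_data = data * np.exp(1j * phase_noise)
      rx_ref = (ref * np.exp(1j * phase_noise)
                + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n)))

      # Cancel the common phase by de-rotating with the smoothed reference phase.
      est_phase = np.angle(np.convolve(rx_ref, np.ones(16) / 16, mode="same"))
      recovered = rx_data * np.exp(-1j * est_phase)

      decided = np.round((np.angle(recovered) - np.pi / 4) / (np.pi / 2)) % 4
      print("symbol error rate:", np.mean(decided != symbols))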

  • Lossless Data Compression via Substring Enumeration for k-th Order Markov Sources with a Finite Alphabet

    Ken-ichi IWATA  Mitsuharu ARIMURA  

     
    PAPER-Source Coding and Data Compression

      Vol:
    E99-A No:12
      Page(s):
    2130-2135

    A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound of the codeword length of the proposed method is presented. We analyze the worst case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet if the length n of a source string tends to infinity.

  • Fast Live Migration for IO-Intensive VMs with Parallel and Adaptive Transfer of Page Cache via SAN

    Soramichi AKIYAMA  Takahiro HIROFUCHI  Ryousei TAKANO  Shinichi HONIDEN  

     
    PAPER-Operating system

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    3024-3034

    Live migration plays an important role in improving the efficiency of cloud data centers by enabling virtual machines (VMs) to be dynamically relocated without disrupting the services running on them. Although many studies have proposed mechanisms to accelerate live migration, IO-intensive VMs still suffer from a long total migration time due to their large amount of page cache. Existing studies on this problem either force the guest OS to delete the page cache before a migration or do not consider the dynamic characteristics of cloud data centers. We propose a parallel and adaptive transfer of page cache for migrating IO-intensive VMs that (1) does not delete the page cache and is still fast, by utilizing the storage area network of a data center, and (2) achieves the shortest total migration time without tuning hand-crafted parameters. Experiments show that our method reduces the total migration time of IO-intensive VMs by up to 33.9%.

  • Asymptotic Optimality of QPSK Faster-than-Nyquist Signaling in Massive MIMO Systems

    Keigo TAKEUCHI  

     
    PAPER-Communication Theory and Systems

      Vol:
    E99-A No:12
      Page(s):
    2192-2201

    Faster-than-Nyquist (FTN) signaling is investigated for quasi-static flat fading massive multiple-input multiple-output (MIMO) systems. In FTN signaling, pulse trains are sent at a symbol rate higher than the Nyquist rate to increase the transmission rate. As a result, inter-symbol interference occurs inevitably for flat fading channels. This paper assesses the information-theoretically achievable rate of MIMO FTN signaling based on the optimum joint equalization and multiuser detection. The replica method developed in statistical physics is used to evaluate the achievable rate in the large-system limit, where the dimensions of input and output signals tend to infinity at the same rate. An analytical expression of the achievable rate is derived for general modulation schemes in the large-system limit. It is shown that FTN signaling does not improve the channel capacity of massive MIMO systems, and that FTN signaling with quadrature phase-shift keying achieves the channel capacity for all signal-to-noise ratios as the symbol period tends to zero.

  • Low Overhead Design of Power Reconfigurable FPGA with Fine-Grained Body Biasing on 65-nm SOTB CMOS Technology

    Masakazu HIOKI  Hanpei KOIKE  

     
    PAPER-Computer System

      Publicized:
    2016/09/13
      Vol:
    E99-D No:12
      Page(s):
    3082-3089

    A Field-Programmable Gate Array (FPGA) with fine-grained body biasing achieves satisfactory static power reduction. However, such an FPGA incurs high overhead because additional body bias selectors and electrical isolation regions are needed to program the threshold voltage (Vt) of elemental circuits such as the MUXes, buffers, and LUTs in the FPGA. In this paper, a low-overhead design of an FPGA with fine-grained body biasing is described. The FPGA is designed and fabricated on 65-nm SOTB CMOS technology. By adopting a customized design rule whose reliability is verified by TEGs and by downsizing the body bias selector, the FPGA tile area is reduced by 39% compared with the conventional design, resulting in 900 FPGA tiles with 4,4000 programmable Vt regions. In addition, the chip performance is evaluated by implementing a 32-bit binary counter over the supply voltage range from 0.5V to 1.2V. The counter circuit operates at 72MHz and 14MHz at supply voltages of 1.2V and 0.5V, respectively. In the best case, a static power saving of 80% in the elemental circuits of the FPGA is achieved at a 0.5-V supply voltage with a 0.5-V reverse body bias voltage. For the whole chip, including the configuration memory and body bias selectors in addition to the elemental circuits, an effective static power reduction of around 30% is maintained by applying a 0.3-V reverse body bias voltage at each supply voltage.

  • A Fast MER Enumeration Algorithm for Online Task Placement on Reconfigurable FPGAs

    Tieyuan PAN  Lian ZENG  Yasuhiro TAKASHIMA  Takahiro WATANABE  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2412-2424

    In this paper, we propose a fast Maximal Empty Rectangle (MER) enumeration algorithm for online task placement on reconfigurable Field-Programmable Gate Arrays (FPGAs). On the assumption that each task uses a rectangle-shaped region of resources, the proposed algorithm manages the free space on the FPGA with an MER list. When a task is assigned or removed, a series of MERs are selected and cut into segments according to the task and its assignment location. By processing these segments, the MER list can be updated quickly with low memory consumption. Based on a proof of the upper bound on the number of MERs on the FPGA, we analyze both the time and space complexity of the proposed algorithm. The efficiency of the proposed algorithm is verified by experiments.
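    As a point of reference for what the algorithm maintains, the brute-force Python sketch below enumerates the maximal empty rectangles of a small occupancy grid; it clarifies the definition of an MER but is deliberately naive, not the paper's fast incremental update of the MER list.

      from itertools import product

      def maximal_empty_rectangles(grid):
          """Enumerate maximal empty rectangles, returned as (top, left, bottom,
          right) with inclusive bounds; truthy cells of the grid are occupied."""
          rows, cols = len(grid), len(grid[0])
          def empty(t, l, b, r):
              return all(not grid[i][j]
                         for i in range(t, b + 1) for j in range(l, r + 1))
          rects = [(t, l, b, r)
                   for t, b in product(range(rows), repeat=2) if t <= b
                   for l, r in product(range(cols), repeat=2) if l <= r
                   if empty(t, l, b, r)]
          def inside(a, b):   # rectangle a contained in a different rectangle b
              return (a != b and b[0] <= a[0] and b[1] <= a[1]
                      and a[2] <= b[2] and a[3] <= b[3])
          return [a for a in rects if not any(inside(a, b) for b in rects)]

      grid = [[0, 0, 1, 0],
              [0, 0, 0, 0],
              [1, 0, 0, 0]]
      for mer in maximal_empty_rectangles(grid):
          print(mer)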

  • Non-Uniform Clock Mesh Synthesis with Clock Gating and Register Clustering

    Wei-Kai CHENG  Jui-Hung HUNG  Yi-Hsuan CHIU  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2388-2397

    As the complexity of chip designs increases, reducing both power consumption and clock skew has become a crucial research topic in clock network synthesis. Among the various clock network synthesis approaches, a clock tree consumes less power than a clock mesh structure. In contrast, a clock mesh is more tolerant of process variation and hence satisfies the clock skew constraint more easily. An effective way to reduce the power consumption of a clock mesh network is to minimize the wire capacitance of the stub wires. In addition, integrating clock gating and register clustering techniques into the clock mesh network can further reduce dynamic power consumption. In this paper, under both the enable timing constraint and the clock skew constraint, we propose a methodology that reduces the switching capacitance through non-uniform clock mesh synthesis, clock gate insertion, and register clustering. In comparison with clock mesh synthesis and clock gating applied individually, experimental results show that our methodology improves both clock skew and switching capacitance efficiently.

  • A Peer-to-Peer Content-Distribution Scheme Resilient to Key Leakage

    Tatsuyuki MATSUSHITA  Shinji YAMANAKA  Fangming ZHAO  

     
    PAPER-Distributed system

      Pubricized:
    2016/08/25
      Vol:
    E99-D No:12
      Page(s):
    2956-2967

    Peer-to-peer (P2P) networks have attracted increasing attention in the distribution of large-volume and frequently accessed content. In this paper, we mainly consider the problem of key leakage in secure P2P content distribution. In secure content distribution, content is encrypted so that only legitimate users can access the content. Usually, users (peers) cannot be fully trusted in a P2P network because malicious ones might leak their decryption keys. If the redistribution of decryption keys occurs, copyright holders may incur great losses caused by free riders who access content without purchasing it. To decrease the damage caused by the key leakage, the individualization of encrypted content is necessary. The individualization means that a different (set of) decryption key(s) is required for each user to access content. In this paper, we propose a P2P content distribution scheme resilient to the key leakage that achieves the individualization of encrypted content. We show the feasibility of our scheme by conducting a large-scale P2P experiment in a real network.

  • Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes

    Yuechao LU  Fumihiko INO  Kenichi HAGIHARA  

     
    PAPER-Computer System

      Publicized:
    2016/09/05
      Vol:
    E99-D No:12
      Page(s):
    3060-3071

    This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy for hiding file input/output (I/O) time with GPU execution and data transfer times. We implement our proposed method on NVIDIA's latest Maxwell architecture and provide tuning guidelines for adjusting the execution parameters, which include the granularity and shape of thread blocks as well as the granularity of I/O data to be streamed through the pipeline, which maximizes reconstruction performance. Our experimental results show that it took less than three minutes to reconstruct a 2048³-voxel volume from 1200 2048²-pixel projection images on a single GPU; this translates to a speedup of approximately 1.47 as compared to the previous method.
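    The pipelining strategy in item (3) amounts to classic double buffering: prefetch the next chunk of projections while the current chunk is being back-projected. The Python sketch below shows that overlap with a one-worker thread pool; load_chunk and backproject are hypothetical stand-ins for the file I/O and the GPU kernel, not the paper's implementation.

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor

      def load_chunk(i):
          """Stand-in for reading one chunk of projection images from disk."""
          return np.full((16, 64, 64), float(i))

      def backproject(chunk, volume):
          """Stand-in for the GPU back-projection of one chunk."""
          volume += chunk.sum(axis=0)

      volume = np.zeros((64, 64))
      n_chunks = 8
      with ThreadPoolExecutor(max_workers=1) as io:
          pending = io.submit(load_chunk, 0)              # prefetch the first chunk
          for i in range(n_chunks):
              chunk = pending.result()                    # wait for chunk i if needed
              if i + 1 < n_chunks:
                  pending = io.submit(load_chunk, i + 1)  # overlap next read with compute
              backproject(chunk, volume)
      print(volume[0, 0])                                 # 16 * (0 + 1 + ... + 7) = 448.0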

  • Auto-Radiometric Calibration in Photometric Stereo

    Wiennat MONGKULMANN  Takahiro OKABE  Yoichi SATO  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2016/09/01
      Vol:
    E99-D No:12
      Page(s):
    3154-3164

    We propose a framework to perform auto-radiometric calibration in photometric stereo methods to estimate surface orientations of an object from a sequence of images taken using a radiometrically uncalibrated camera under varying illumination conditions. Our proposed framework allows the simultaneous estimation of surface normals and radiometric responses, and as a result can avoid cumbersome and time-consuming radiometric calibration. The key idea of our framework is to use the consistency between the irradiance values converted from pixel values by using the inverse response function and those computed from the surface normals. Consequently, a linear optimization problem is formulated to estimate the surface normals and the response function simultaneously. Finally, experiments on both synthetic and real images demonstrate that our framework enables photometric stereo methods to accurately estimate surface normals even when the images are captured using cameras with unknown and nonlinear response functions.
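    For context, the sketch below shows the classic calibrated-response, Lambertian photometric stereo step on which such methods build: per-pixel least squares recovers albedo-scaled normals from images taken under known light directions. The simultaneous estimation of the radiometric response function, which is this paper's contribution, is omitted, and the data here are synthetic.

      import numpy as np

      def photometric_stereo(images, light_dirs):
          """Per-pixel least-squares surface normals under the Lambertian model
          I = L @ (albedo * n); images: (m, h, w), light_dirs: (m, 3) unit vectors."""
          m, h, w = images.shape
          L = np.asarray(light_dirs)                   # (m, 3)
          I = images.reshape(m, -1)                    # (m, h*w)
          G, *_ = np.linalg.lstsq(L, I, rcond=None)    # (3, h*w): albedo * normal
          albedo = np.linalg.norm(G, axis=0)
          normals = G / np.maximum(albedo, 1e-12)
          return normals.T.reshape(h, w, 3), albedo.reshape(h, w)

      # Synthetic check: a flat surface with normals pointing along +z.
      h = w = 8
      n_true = np.dstack([np.zeros((h, w)), np.zeros((h, w)), np.ones((h, w))])
      L = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866]])
      images = np.einsum("md,hwd->mhw", L, n_true)
      n_est, _ = photometric_stereo(images, L)
      print(np.allclose(n_est, n_true, atol=1e-6))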

  • A Bit-Write-Reducing and Error-Correcting Code Generation Method by Clustering ECC Codewords for Non-Volatile Memories

    Tatsuro KOJO  Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2398-2411

    Non-volatile memories are attracting attention as a promising alternative in memory design. However, the data stored in them may still be corrupted by crosstalk and radiation; the data can be restored by using error-correcting codes, which require extra bits to correct bit errors. Furthermore, non-volatile memories consume ten to a hundred times more energy than normal memories when writing bits, so when they are configured with error-correcting codes it is essential to reduce the number of written bits. In this paper, we propose a method to generate a bit-write-reducing code with error-correcting ability. We first pick an error-correcting code that can correct t-bit errors. We cluster its codewords and generate a cluster graph satisfying the S-bit flip conditions. We then assign a data value to each cluster; in other words, we generate a one-to-many mapping from each data value to the codewords in its cluster. We prove that, if the cluster graph is a complete graph, every data value in a memory cell can be rewritten into any other data value by flipping at most S bits while preserving the t-bit error-correcting ability. We further propose an efficient method to cluster the error-correcting codewords. Experimental results show that the bit-write-reducing and error-correcting codes generated by our proposed method efficiently reduce energy consumption. This paper proposes the first theoretically near-optimal bit-write-reducing code with error-correcting ability based on efficient coding theory.
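    The sketch below illustrates the one-to-many mapping idea on a toy scale: the sixteen codewords of a Hamming(7,4) code are grouped into two clusters by their first data bit, so a single stored bit has eight valid encodings, and the write logic picks the codeword in the target cluster that flips the fewest bits of the current cell contents. The grouping is a hypothetical clustering chosen for illustration, not the cluster-graph construction proposed in the paper.

      def hamming(a, b):
          return bin(a ^ b).count("1")

      # Hamming(7,4) codewords grouped by their first data bit; any member of a
      # cluster is a valid encoding of that one-bit data value.
      clusters = {
          0: [0b0000000, 0b0001111, 0b0010011, 0b0011100,
              0b0100101, 0b0101010, 0b0110110, 0b0111001],
          1: [0b1000110, 0b1001001, 0b1010101, 0b1011010,
              0b1100011, 0b1101100, 0b1110000, 0b1111111],
      }

      def write(current_word, data):
          """Pick the codeword in data's cluster that flips the fewest cell bits."""
          return min(clusters[data], key=lambda cw: hamming(cw, current_word))

      def read(word):
          """Decode to the data value of the nearest codeword."""
          return min(clusters, key=lambda d: min(hamming(cw, word) for cw in clusters[d]))

      cell = 0b0001111                 # currently encodes data 0
      cell = write(cell, 1)            # rewrite to data 1 with few bit flips
      print(bin(cell), read(cell))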

  • Signal Power Estimation Based on Orthogonal Projection and Oblique Projection

    Norisato SUGA  Toshihiro FURUKAWA  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:12
      Page(s):
    2571-2575

    In this letter, we present a new signal power estimation method based on subspace projection. This work mainly contributes to the SINR estimation problem, in which signal power estimation is performed implicitly or explicitly. The difference between our method and conventional methods on this topic is the exploitation of the subspace structure of the signals that constitute the observed signal. As tools for the subspace operations, we apply orthogonal projection and oblique projection, which can extract the desired parameters. In the proposed scheme, the statistics of the observed signal projected by these projections are used to estimate the parameters.
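    As a small numerical illustration of the two projectors, the Python sketch below models the observation as a desired component in the range of H plus interference in the range of S, forms the orthogonal projector onto a subspace and the oblique projector onto range(H) along range(S), and estimates the desired-signal power from the obliquely projected observation. The signal model and dimensions are assumptions for illustration, not the letter's exact estimator.

      import numpy as np

      def orth_proj(A):
          """Orthogonal projector onto range(A)."""
          return A @ np.linalg.pinv(A)

      def oblique_proj(H, S):
          """Oblique projector onto range(H) along range(S):
          E = H (H^T P_S_perp H)^(-1) H^T P_S_perp."""
          P_S_perp = np.eye(S.shape[0]) - orth_proj(S)
          return H @ np.linalg.pinv(P_S_perp @ H) @ P_S_perp

      rng = np.random.default_rng(2)
      n, p, q = 64, 2, 3
      H = rng.normal(size=(n, p))          # subspace of the desired signal
      S = rng.normal(size=(n, q))          # subspace of the interference
      x = H @ rng.normal(size=p) + S @ rng.normal(size=q) + 0.01 * rng.normal(size=n)

      desired = oblique_proj(H, S) @ x     # interference rejected, desired part kept
      print("estimated desired-signal power:", np.dot(desired, desired) / n)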

  • GPU-Accelerated Bulk Execution of Multiple-Length Multiplication with Warp-Synchronous Programming Technique

    Takumi HONDA  Yasuaki ITO  Koji NAKANO  

     
    PAPER-GPU computing

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    3004-3012

    In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea of our GPU implementation is to adopt a warp-synchronous programming technique. We assign each multiple-length multiplication to one warp, which consists of 32 threads. In parallel processing with multiple threads, it is usually costly to synchronize the execution of threads and to communicate among them. With the warp-synchronous programming technique, however, the execution of threads in a warp can be synchronized instruction by instruction, without any barrier synchronization operations. Also, inter-thread communication can be performed by warp shuffle functions without accessing shared memory. The experimental results show that our GPU implementation on an NVIDIA GeForce GTX 980 attains a speed-up factor of 52 for 1024-bit multiple-length multiplication over a sequential CPU implementation. Moreover, we use this 1024-bit multiple-length multiplication as a subroutine for larger bit lengths, and the GPU implementation attains a speed-up factor of 21 for 65536-bit multiple-length multiplication.

  • Low Complexity Reed-Solomon Decoder Design with Pipelined Recursive Euclidean Algorithm

    Kazuhito ITO  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2453-2462

    A Reed-Solomon (RS) decoder is designed based on a pipelined recursive Euclidean algorithm for the key equation solution. While the Euclidean algorithm uses fewer Galois-field multipliers than the modified Euclidean (ME) and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, it requires division between two elements of the Galois field. By implementing the division with a multi-cycle Galois inverter and a serial Galois multiplier, the proposed key equation solver architecture achieves lower complexity than the conventional ME and RiBM based architectures. The proposed RS(255,239) decoder reduces the hardware complexity by 25.9% with a 6.5% increase in decoding latency.

  • On the Computational Complexity of the Linear Solvability of Information Flow Problems with Hierarchy Constraint

    Yuki TAKEDA  Yuichi KAJI  Minoru ITO  

     
    PAPER-Networks and Network Coding

      Vol:
    E99-A No:12
      Page(s):
    2211-2217

    An information flow problem is a graph-theoretical formalization of the transportation of information over a complicated network. It is known that linear network codes play an essential role in a certain type of information flow problem, but it is not clearly understood how much linear network codes contribute to other types of information flow problems. One basic problem concerning this aspect is the linear solvability of information flow problems, that is, deciding whether there is a linear network code that solves a given information flow problem. Lehman et al. characterized the linear solvability of information flow problems in terms of constraints on the sets of source and sink nodes. As an extension of Lehman's investigation, this study introduces a hierarchy constraint on messages and discusses the computational complexity of the linear solvability of information flow problems with hierarchy constraints. Nine classes of problems are newly defined and classified into one of the three categories that were discovered by Lehman et al.

3381-3400hit(18690hit)