
Keyword Search Result

[Keyword] PU (3318 hits)

Results 341-360 of 3318

  • Impulse Noise Removal of Digital Image Considering Local Line Structure

    Shi BAO  Go TANAKA  

     
    LETTER-Image

      Vol:
    E102-A No:12
      Page(s):
    1915-1919

    For impulse noise removal from a digital image, most existing methods cannot repair line structures in the input image. In this letter, a method that considers the local line structure is proposed. In order to judge the direction of the line structure, adjacent lines are considered. The effectiveness of the proposed filter is shown by experiments.
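
    The abstract does not give the detector itself; as a rough illustration of the general idea (judging a line direction from neighboring pixels and restoring a noisy pixel along that direction), a minimal directional-median sketch in Python is shown below. The window size, the four candidate directions, and the naive salt-and-pepper detector are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def directional_impulse_filter(img, win=2):
    """Toy directional median filter: illustrative only, not the authors' method."""
    out = img.astype(np.float64)
    noisy = (img == 0) | (img == 255)          # naive salt-and-pepper detector (assumption)
    dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]   # horizontal, vertical, two diagonals
    h, w = img.shape
    for y, x in zip(*np.where(noisy)):
        best = None
        for dy, dx in dirs:
            vals = []
            for k in range(-win, win + 1):
                if k == 0:
                    continue
                yy, xx = y + k * dy, x + k * dx
                if 0 <= yy < h and 0 <= xx < w and not noisy[yy, xx]:
                    vals.append(float(img[yy, xx]))
            if len(vals) >= 2:
                spread = max(vals) - min(vals)  # small spread suggests a line along this direction
                if best is None or spread < best[0]:
                    best = (spread, vals)
        if best is not None:
            out[y, x] = np.median(best[1])      # repair along the most line-like direction
    return out.astype(img.dtype)

demo = np.full((5, 5), 128, dtype=np.uint8)
demo[2, 2] = 255                                # one impulse-corrupted pixel
print(directional_impulse_filter(demo)[2, 2])   # repaired to 128
```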

  • Memory Efficient Load Balancing for Distributed Large-Scale Volume Rendering Using a Two-Layered Group Structure

    Marcus WALLDEN  Stefano MARKIDIS  Masao OKITA  Fumihiko INO  

     
    PAPER-Computer Graphics

      Publicized:
    2019/09/09
      Vol:
    E102-D No:12
      Page(s):
    2306-2316

    We propose a novel compositing pipeline and a dynamic load balancing technique for volume rendering which utilizes a two-layered group structure to achieve effective and scalable load balancing. The technique enables each process to render data from non-contiguous regions of the volume with minimal impact on the total render time. We demonstrate the effectiveness of the proposed technique by performing a set of experiments on a modern GPU cluster. The experiments show that using the technique results in up to a 35.7% lower worst-case memory usage as compared to a dynamic k-d tree load balancing technique, whilst simultaneously achieving similar or higher render performance. The proposed technique was also able to lower the amount of transferred data during the load balancing stage by up to 72.2%. The technique has the potential to be used in many scenarios where other dynamic load balancing techniques have proved to be inadequate, such as during large-scale visualization.

  • Ternary Convolutional Codes with Optimum Distance Spectrum

    Shungo MIYAGI  Motohiko ISAKA  

     
    LETTER-Coding Theory

      Vol:
    E102-A No:12
      Page(s):
    1688-1690

    This letter presents ternary convolutional codes and their punctured codes with optimum distance spectrum.

  • On-Chip Cache Architecture Exploiting Hybrid Memory Structures for Near-Threshold Computing

    Hongjie XU  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  

     
    PAPER

      Vol:
    E102-A No:12
      Page(s):
    1741-1750

    This paper focuses on the power-area trade-off axis of memory systems. In contrast to the power-performance-area trade-off applied to traditional high-performance caches, this paper targets the edge-processing environment, which is becoming more and more important in the Internet of Things (IoT) era. A new power-oriented trade-off is proposed for on-chip cache architecture. As a case study, this paper exploits the good energy efficiency of Standard-Cell Memory (SCM) operating in the near-threshold voltage region and the good area efficiency of Static Random Access Memory (SRAM). A hybrid 2-level on-chip cache structure is first introduced as a replacement for the 6T-SRAM L0 cache to save energy. This paper proposes a method for finding the best capacity combination of SCM and SRAM that minimizes the energy consumption of the hybrid cache under a specific cache area constraint. A simulation using a 65-nm process technology shows that up to 80% of the energy consumption is reduced without increasing the die area by replacing the conventional SRAM instruction cache with the hybrid 2-level cache. The result shows that energy consumption can be reduced if the area constraint for the proposed hybrid cache system is less than the area equivalent to an 8kB SRAM. If the target operating frequency is less than 100MHz, energy reduction can be achieved, which implies that the proposed cache system is suitable for low-power systems where a moderate processing speed is required.
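
    The paper's energy model and constraint formulation are not reproduced in the abstract; the sketch below only illustrates the kind of exhaustive search one could run to pick an SCM/SRAM capacity split that minimizes an energy estimate under an area budget. The miss-rate, energy, and area models are hypothetical placeholders, not the models used in the paper.

```python
import math

# Hypothetical cost models (placeholders, not the paper's models): larger capacities
# cost more area but reduce misses to the next, more energy-hungry level.
def miss_rate(kb):
    return 1.0 if kb == 0 else min(1.0, 1.0 / math.sqrt(kb))   # toy sqrt-rule miss model

def energy_per_access(scm_kb, sram_kb, e_scm=0.2, e_sram=1.0, e_dram=50.0):
    return e_scm + miss_rate(scm_kb) * (e_sram + miss_rate(sram_kb) * e_dram)

def area(scm_kb, sram_kb, a_scm=3.0, a_sram=1.0):
    return a_scm * scm_kb + a_sram * sram_kb    # SCM assumed larger per bit than 6T SRAM

def best_split(area_budget, candidates=range(0, 65, 2)):
    """Exhaustively search SCM (L0) / SRAM (L1) capacity pairs under an area constraint."""
    feasible = [(energy_per_access(s, r), s, r)
                for s in candidates for r in candidates
                if area(s, r) <= area_budget]
    return min(feasible)                        # (energy estimate, SCM kB, SRAM kB)

print(best_split(area_budget=64.0))
```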

  • Representative Spatial Selection and Temporal Combination for 60fps Real-Time 3D Tracking of Twelve Volleyball Players on GPU

    Xina CHENG  Yiming ZHAO  Takeshi IKENAGA  

     
    PAPER-Image

      Vol:
    E102-A No:12
      Page(s):
    1882-1890

    Real-time 3D player tracking plays an important role in sports analysis, especially for live sports broadcasting services, which have a strict limitation on processing time. For these kinds of applications, the 3D trajectories of players contribute to high-level game analysis such as tactic analysis and to commercial applications such as TV content, so a real-time implementation of 3D player tracking is expected. In order to achieve real-time processing of 60fps videos with high accuracy (that is, a processing time of less than 16.67ms per frame), the factors that limit the processing time of the target algorithm include: 1) the large image area of each player; 2) repeated processing of multiple players in multiple views; 3) the complex calculation of the observation algorithm. To deal with the above challenges, this paper proposes a representative spatial selection and temporal combination based real-time implementation of multi-view volleyball player tracking on a GPU device. First, representative spatial pixel selection, which detects the pixels that best represent an image region in order to scale down the image spatially, reduces the number of processed pixels. Second, representative temporal likelihood combination shares observation calculations by using the temporal correlation between images so that the number of complex calculations is reduced. The experiments are based on videos of the Final and Semi-Final Games of the 2014 Japan Inter High School Games of Men's Volleyball in Tokyo Metropolitan Gymnasium. On a GeForce GTX 1080Ti GPU, the tracking system achieves real-time performance on 60fps videos and keeps the tracking accuracy higher than 97%.

  • Interworking Layer of Distributed MQTT Brokers

    Ryohei BANNO  Jingyu SUN  Susumu TAKEUCHI  Kazuyuki SHUDO  

     
    PAPER-Information Network

      Publicized:
    2019/07/30
      Vol:
    E102-D No:12
      Page(s):
    2281-2294

    MQTT is one of the promising protocols for various kinds of data exchange in IoT environments. Typically, those environments have a characteristic called “edge-heavy”, which means that things at the network edge generate a massive volume of data with high locality. For handling such edge-heavy data, an architecture that places multiple MQTT brokers at the network edges and makes them cooperate with each other is quite effective. It can provide higher throughput and lower latency, as well as reduce the consumption of cloud resources. However, under this kind of architecture, heterogeneity can be a vital issue: the appropriate MQTT broker product may vary with the environment of each network edge, yet different products can hardly cooperate because the MQTT specification provides no interoperability between brokers. In this paper, we propose the Interworking Layer of Distributed MQTT brokers (ILDM), which enables arbitrary kinds of MQTT brokers to cooperate with each other. ILDM, designed as a generic mechanism independent of any specific cooperation algorithm, provides APIs to facilitate the development of a variety of algorithms. Using these APIs, we also present two basic cooperation algorithms. To evaluate the usefulness of ILDM, we introduce a benchmark system that can be used for both a single broker and multiple brokers. Experimental results show that the throughput of five brokers running together via ILDM is up to 4.3 times higher than that of a single broker.
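
    ILDM's actual APIs are not shown in the abstract; purely as a point of reference, the snippet below shows the most naive way to make two heterogeneous MQTT brokers cooperate, by bridging messages from one to the other with the Eclipse paho-mqtt client (1.x-style constructor). The host names, ports, and topic filter are placeholders, and this is not the ILDM mechanism itself.

```python
# Naive one-way bridge between two MQTT brokers (illustration only, not ILDM).
import paho.mqtt.client as mqtt

EDGE_BROKER = ("edge-broker.local", 1883)    # placeholder addresses
PEER_BROKER = ("peer-broker.local", 1883)

peer = mqtt.Client(client_id="bridge-out")
peer.connect(*PEER_BROKER)
peer.loop_start()

def on_message(client, userdata, msg):
    # Re-publish every message received on the edge broker to the peer broker.
    peer.publish(msg.topic, msg.payload, qos=msg.qos, retain=msg.retain)

edge = mqtt.Client(client_id="bridge-in")
edge.on_message = on_message
edge.connect(*EDGE_BROKER)
edge.subscribe("sensors/#", qos=1)           # placeholder topic filter
edge.loop_forever()
```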

  • A Lightweight Method to Evaluate Effect of Approximate Memory with Hardware Performance Monitors

    Soramichi AKIYAMA  

     
    PAPER-Computer System

      Publicized:
    2019/09/02
      Vol:
    E102-D No:12
      Page(s):
    2354-2365

    The latency and the energy consumption of DRAM are serious concerns because (1) the latency has not improved much for decades and (2) recent machines have a huge capacity of main memory. Device-level studies reduce them by shortening the wait time of DRAM internal operations so that they finish faster and consume less energy. Applying these techniques aggressively to achieve approximate memory is a promising direction for further reducing the overhead, given that many data-center applications today are to some extent robust to bit-flips. To advance research on approximate memory, it is necessary to evaluate its effect on applications so that both researchers and potential users of approximate memory can investigate how it affects realistic applications. However, hardware simulators are too slow to run workloads repeatedly with different parameters. To this end, we propose a lightweight method to evaluate the effect of approximate memory. The idea is to count the number of DRAM internal operations that occur on the approximate data of applications and to calculate the probability of bit-flips from this count, instead of using heavyweight simulators. The evaluation shows that our system is 3 orders of magnitude faster than cycle-accurate simulators, and we also give case studies evaluating the effect of approximate memory on some realistic applications.
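
    The counting mechanism in the paper relies on hardware performance monitors; the arithmetic at its core, turning an operation count and a per-operation flip probability into an expected number of bit-flips, can be illustrated in a few lines. The probability value and region size below are made-up numbers for illustration, not measurements from the paper.

```python
# Estimate bit-flips in an approximate-memory region from a DRAM operation count.
# All constants are illustrative assumptions, not the paper's parameters.

def expected_bit_flips(op_count, region_bits, p_flip_per_op_per_bit):
    """Expected flips if each counted DRAM internal operation (e.g., a shortened
    restore) independently flips each bit of the region with a small probability."""
    p_never_flips = (1.0 - p_flip_per_op_per_bit) ** op_count
    return region_bits * (1.0 - p_never_flips)

# Example: 1e6 counted operations on a 64 MiB approximate region.
print(expected_bit_flips(op_count=1_000_000,
                         region_bits=64 * 2**20 * 8,
                         p_flip_per_op_per_bit=1e-12))
```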

  • Accelerating the Smith-Waterman Algorithm Using the Bitwise Parallel Bulk Computation Technique on the GPU

    Takahiro NISHIMURA  Jacir Luiz BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2019/07/09
      Vol:
    E102-D No:12
      Page(s):
    2400-2408

    The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. Bulk execution supports fine-grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution of this work is to present a Bitwise Parallel Bulk Computation (BPBC) technique to accelerate the Smith-Waterman Algorithm (SWA) with the affine gap penalty. Our idea is to convert this computation into a circuit simulation using the BPBC technique so as to compute multiple instances simultaneously. The proposed BPBC technique for the SWA has been implemented on the GPU and CPU. Experimental results show that the proposed BPBC for the SWA accelerates the computation by over 646 times compared to a single-CPU implementation and by 6.9 times compared to a multi-core CPU implementation with 160 threads.
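
    The BPBC circuit formulation is the paper's contribution and is not reproduced here; for reference, the plain scalar Smith-Waterman recurrence with an affine gap penalty (the Gotoh formulation) that the bulk computation parallelizes looks like the sketch below. The scoring parameters are arbitrary example values.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Scalar reference for local alignment with an affine gap penalty (Gotoh)."""
    n, m = len(a), len(b)
    NEG = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]   # best score of an alignment ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]   # best score with a gap in sequence a
    F = [[NEG] * (m + 1) for _ in range(n + 1)]   # best score with a gap in sequence b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

print(smith_waterman_affine("GATTACA", "GCATGCU"))
```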

  • Target-Adapted Subspace Learning for Cross-Corpus Speech Emotion Recognition

    Xiuzhen CHEN  Xiaoyan ZHOU  Cheng LU  Yuan ZONG  Wenming ZHENG  Chuangao TANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/08/26
      Vol:
    E102-D No:12
      Page(s):
    2632-2636

    For cross-corpus speech emotion recognition (SER), how to obtain an effective feature representation that eliminates the discrepancy between the feature distributions of the source and target domains is a crucial issue. In this paper, we propose a Target-adapted Subspace Learning (TaSL) method for cross-corpus SER. The TaSL method tries to find a projection subspace in which the features regress the labels more accurately and the gap between the feature distributions of the target and source domains is bridged effectively. Then, in order to obtain a more optimal projection matrix, ℓ1-norm and ℓ2,1-norm penalties are added to the different regularization terms, respectively. Finally, we conduct extensive experiments on three public corpora, EmoDB, eNTERFACE, and AFEW 4.0. The experimental results show that our proposed method achieves better performance than the state-of-the-art methods on cross-corpus SER tasks.
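
    The abstract does not give the exact objective; written in generic form, a target-adapted subspace learning objective with an ℓ1 and an ℓ2,1 regularizer might look like the following, where P is the projection matrix, X_s and Y_s are the labeled source features and labels, X_t are the unlabeled target features, and d(·,·) is some distribution-discrepancy measure. The specific loss and discrepancy terms are assumptions for illustration, not the paper's formulation.

```latex
\[
\min_{P}\;
  \underbrace{\lVert Y_s - P^{\top} X_s \rVert_F^{2}}_{\text{label regression}}
  + \lambda_1\, \underbrace{d\!\left(P^{\top} X_s,\, P^{\top} X_t\right)}_{\text{source--target discrepancy}}
  + \lambda_2 \lVert P \rVert_{1}
  + \lambda_3 \lVert P \rVert_{2,1}
\]
```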

  • Simulation Study of Low-Latency Network Model with Orchestrator in MEC Open Access

    Krittin INTHARAWIJITR  Katsuyoshi IIDA  Hiroyuki KOGA  Katsunori YAMAOKA  

     
    PAPER-Network

      Publicized:
    2019/05/16
      Vol:
    E102-B No:11
      Page(s):
    2139-2150

    Most latency-sensitive mobile applications depend on computational resources provided by a cloud computing service. The problem with relying on cloud computing is that, sometimes, the physical locations of cloud servers are distant from mobile users and the communication latency is long. As a result, the concept of a distributed cloud service, called mobile edge computing (MEC), is being introduced in the 5G network. However, MEC can reduce only the communication latency; the computing latency in MEC must also be considered to satisfy the total latency required by services. In this research, we study the impact of both latencies in the MEC architecture with regard to latency-sensitive services. We also consider a centralized model, in which a controller manages flows between users and mobile edge resources, to analyze MEC in a practical architecture. Simulations show that the interval and the controller latency introduce some blocking and errors in the system. However, a permissive system that relaxes the latency constraints and chooses an edge server by the lowest total latency can improve system performance considerably.
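
    The selection policy described above reduces, at its core, to picking the edge server with the lowest sum of communication and computing latency and blocking the request if even that sum violates the service's latency budget. A minimal sketch of that decision is shown below; the latency fields and values are illustrative assumptions, not the paper's simulation model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EdgeServer:
    name: str
    comm_latency_ms: float      # user-to-edge network latency (assumed known)
    compute_latency_ms: float   # estimated processing latency at current load

def select_edge(servers: List[EdgeServer], budget_ms: float) -> Optional[EdgeServer]:
    """Pick the server with the lowest total latency; None means the request is blocked."""
    best = min(servers, key=lambda s: s.comm_latency_ms + s.compute_latency_ms)
    total = best.comm_latency_ms + best.compute_latency_ms
    return best if total <= budget_ms else None

servers = [EdgeServer("edge-A", 2.0, 7.5), EdgeServer("edge-B", 5.0, 3.0)]
print(select_edge(servers, budget_ms=10.0))
```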

  • Performance Improvement of the Catastrophic CPM Scheme with New Split-Merged MNSED

    Richard Hsin-Hsyong YANG  Chia-Kun LEE  Shiunn-Jang CHERN  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Publicized:
    2019/05/16
      Vol:
    E102-B No:11
      Page(s):
    2091-2103

    Continuous phase modulation (CPM) is a very attractive digital modulation scheme, with a constant-envelope feature and high efficiency in meeting power and bandwidth requirements. CPM signals with pairs of input sequences that differ in an infinite number of positions and map into pairs of transmitted signals with finite Euclidean distance (ED) are called catastrophic. In a CPM scheme, data sequences that have the catastrophic property are called catastrophic sequences; they are periodic difference data patterns. Catastrophic sequences usually have a shorter merger length, and the corresponding minimum normalized squared ED (MNSED) is smaller and below the distance bound. Two important CPM schemes, viz., the LREC and LRC schemes, are known to be catastrophic in most cases and have poor overall power and bandwidth performance. In the literature, it has been shown that the probability of generating such catastrophic sequences is negligible; therefore, the asymptotic error performance (AEP) of those well-known catastrophic CPM schemes, evaluated with the corresponding MNSED over AWGN channels, might be too pessimistic. To deal with this problem in the AWGN channel, this paper presents a new split-merged MNSED and provides criteria to explore which conventional catastrophic CPM schemes could effectively increase the length of mergers with split-merged non-periodic events. For comparison, we investigate the exact power and bandwidth performance of LREC and LRC CPM for the same bandwidth occupancy. Computer simulation results verify that the AEP evaluated with the split-merged MNSED can achieve up to 3dB gain over the conventional approach.

  • Parameter Estimation of Fractional Bandlimited LFM Signals Based on Orthogonal Matching Pursuit Open Access

    Xiaomin LI  Huali WANG  Zhangkai LUO  

     
    PAPER-Digital Signal Processing

      Vol:
    E102-A No:11
      Page(s):
    1448-1456

    Parameter estimation theories for LFM signals have been developed thanks to the advantages of the fractional Fourier transform (FrFT). Traditional estimation methods in the fractional Fourier domain (FrFD) are mostly based on a two-dimensional search, which leads to a trade-off between estimation performance and complexity. In order to solve this problem, we introduce orthogonal matching pursuit (OMP) into the FrFD and propose a modified optimization method to estimate the initial and final frequencies of fractional bandlimited LFM signals. In this algorithm, the differentiated fractional spectrum, which is used to form the observation matrix in OMP, is derived from the analytical spectrum formulations of the LFM signal. Then, based on the facts that the LFM signal has an approximately rectangular spectrum in the FrFD and that the correlation between the LFM signal and the observation matrix reaches its maximum at the edges of the spectrum (see Sect.3.3 for details), the edge spectrum information can be extracted by OMP. Finally, the estimates of the initial and final frequencies are obtained by multiplying the edge information by the sampling frequency resolution. The proposed method avoids reconstruction and the traditional peak-searching procedure, and only two iterations are needed; thus, the computational complexity is much lower than that of existing methods. Meanwhile, since the vectors at the initial and final frequency points both have larger moduli, the estimates are closer to the actual values and better normalized root mean squared error (NRMSE) performance can be achieved. Both theoretical analysis and simulation results demonstrate that the proposed algorithm has a relatively low complexity and a higher estimation precision than search-based and reconstruction-based algorithms.
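
    The observation matrix built from the differentiated fractional spectrum is specific to the paper; the final step described above, mapping the two atom indices selected by OMP to frequency estimates through the sampling frequency resolution, is generic and can be sketched as follows. The two-iteration OMP below operates on an arbitrary dictionary Phi and is only an illustration of the selection-and-scaling idea, not the paper's code.

```python
import numpy as np

def omp_two_edges(y, Phi, freq_resolution):
    """Pick the two dictionary atoms most correlated with the measurement and
    map their indices to frequency estimates (illustrative sketch only)."""
    residual = y.astype(complex)
    chosen = []
    for _ in range(2):                                   # only two iterations are needed
        scores = np.abs(Phi.conj().T @ residual)
        scores[chosen] = 0                               # do not pick the same atom twice
        chosen.append(int(np.argmax(scores)))
        A = Phi[:, chosen]                               # least-squares fit on chosen atoms
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    lo, hi = sorted(chosen)
    return lo * freq_resolution, hi * freq_resolution    # initial and final frequency estimates

# Trivial demo with an identity dictionary: atoms 3 and 9 dominate the measurement.
Phi = np.eye(16)
y = np.zeros(16); y[3] = 1.0; y[9] = 0.8
print(omp_two_edges(y, Phi, freq_resolution=100.0))      # -> (300.0, 900.0)
```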

  • Amplification Characteristics of a Phase-Sensitive Amplifier of a Chirped Optical Pulse

    Kyo INOUE  

     
    PAPER-Lasers, Quantum Electronics

      Publicized:
    2019/06/07
      Vol:
    E102-C No:11
      Page(s):
    818-824

    Phase-sensitive amplification (PSA) has unique properties, such as the quantum-limited noise figure of 0 dB and the phase clamping effect. This study investigates PSA characteristics when a chirped pulse is incident. The signal gain, the output waveform, and the noise figure for an optical pulse having been chirped through chromatic dispersion or self-phase modulation before amplification are analyzed. The results indicate that the amplification properties for a chirped pulse are different from those of a non-chirped pulse, such that the signal gain is small, the waveform is distorted, and the noise figure is degraded.

  • Optimized Charge Pump and Nonlinear Phase Frequency Detector for a Ka-Band Phase-Locked Loop in 90-nm CMOS Process

    Lu TANG  Zhigong WANG  Tiantian FAN  Faen LIU  Changchun ZHANG  

     
    PAPER-Electronic Circuits

      Publicized:
    2019/06/07
      Vol:
    E102-C No:11
      Page(s):
    825-832

    In this paper, an improved charge pump (CP) and a modified nonlinear phase frequency detector (PFD) are designed and fabricated in a 90-nm CMOS process. The CP is optimized with a combination of circuit techniques, such as a pedestal error cancellation scheme, to eliminate charge injection and other non-ideal characteristics. The nonlinear PFD is based on a modified circuit topology that enhances the acquisition capability of the PLL. The optimized CP and nonlinear PFD are integrated into a Ka-band PLL. The measured output current mismatch ratio of the improved CP is less than 1% when the output voltage Vout fluctuates between 0.2 and 1.1V from a 1.2V power supply. The measured phase error detection range of the modified nonlinear PFD is between -2π and 2π. Owing to the modified CP and PFD, the measured reference spur of the Ka-band PLL frequency synthesizer containing them is only -56.409dBc at 30GHz in the locked state.

  • NP-Completeness of Fill-a-Pix and ΣP2-Completeness of Its Fewest Clues Problem

    Yuta HIGUCHI  Kei KIMURA  

     
    PAPER-Algorithms and Data Structures

      Vol:
    E102-A No:11
      Page(s):
    1490-1496

    Fill-a-Pix is a pencil-and-paper puzzle that has been popular worldwide since it was announced by Conceptis in 2003. It provides a rectangular grid of squares that must be filled in to create a picture. Precisely, we are given a rectangular grid of squares, some of which have an integer from 0 to 9 in them, and our task is to paint some squares black so that every square with an integer has that number of painted squares around it, including the square itself. Despite its popularity, the computational complexity of Fill-a-Pix has not been known. In this paper, we show that the puzzle is NP-complete, ASP-complete, and #P-complete via a parsimonious reduction from the Boolean satisfiability problem. We also consider the fewest clues problem of Fill-a-Pix, where the fewest clues problem was recently introduced by Demaine et al. for analyzing the computational complexity of designing “good” puzzles. We show that the fewest clues problem of Fill-a-Pix is Σ2P-complete.
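
    The reduction itself is not sketched in the abstract; the puzzle rule it relies on is easy to state in code: a painting is valid if and only if every clue equals the number of painted squares in the 3x3 block centered on it (including the clue's own square). A small verifier, illustrative only:

```python
def valid_fill_a_pix(clues, painted):
    """clues[r][c] is an int 0..9 or None; painted[r][c] is True if the square is black."""
    rows, cols = len(clues), len(clues[0])
    for r in range(rows):
        for c in range(cols):
            if clues[r][c] is None:
                continue
            count = sum(painted[rr][cc]
                        for rr in range(max(0, r - 1), min(rows, r + 2))
                        for cc in range(max(0, c - 1), min(cols, c + 2)))
            if count != clues[r][c]:
                return False
    return True

# 2x2 example: the clue 4 forces every square in its neighbourhood to be painted.
print(valid_fill_a_pix([[4, None], [None, None]],
                       [[True, True], [True, True]]))
```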

  • Enhanced Selected Mapping for Impulsive Noise Blanking in Multi-Carrier Power-Line Communication Systems Open Access

    Tomoya KAGEYAMA  Osamu MUTA  Haris GACANIN  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2019/05/16
      Vol:
    E102-B No:11
      Page(s):
    2174-2182

    In this paper, we propose an enhanced selected mapping (e-SLM) technique to improve the performance of OFDM-PLC systems under impulsive noise. At the transmitter, the best transmit sequence is selected from among the possible candidates so as to minimize the weighted sum of the transmit signal peak power and the estimated receive signal peak power, where the received signal peak power is estimated at the transmitter using channel state information (CSI). At the receiver, nonlinear blanking is applied to keep the impulsive noise under a given threshold, where the impulsive-noise detection accuracy is improved by the proposed e-SLM. We evaluate the probability of false alarms raised by impulsive noise detection and the bit error rate (BER) of an OFDM-PLC system using the proposed e-SLM. The results show the effectiveness of the proposed method in OFDM-PLC systems compared with the conventional blanking technique.
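
    The abstract describes the selection metric verbally; a schematic version of the transmitter-side selection (minimize a weighted sum of the transmit-side and CSI-estimated receive-side peak powers over SLM phase-rotation candidates) and of receiver-side blanking is sketched below. The weight, the random phase candidates, and the flat per-subcarrier channel model are assumptions for illustration, not the paper's parameters.

```python
import numpy as np
rng = np.random.default_rng(0)

def select_candidate(freq_symbols, channel, n_candidates=8, weight=0.5):
    """SLM-style selection: pick the phase rotation minimizing a weighted sum of
    transmit and (CSI-predicted) receive peak powers. Illustrative sketch only."""
    best = None
    for _ in range(n_candidates):
        phases = np.exp(1j * rng.uniform(0, 2 * np.pi, freq_symbols.size))
        tx = np.fft.ifft(freq_symbols * phases)            # candidate transmit waveform
        rx = np.fft.ifft(freq_symbols * phases * channel)  # receive waveform predicted from CSI
        metric = weight * np.max(np.abs(tx) ** 2) + (1 - weight) * np.max(np.abs(rx) ** 2)
        if best is None or metric < best[0]:
            best = (metric, tx)
    return best[1]

def blank(received, threshold):
    """Nonlinear blanking: zero out samples whose magnitude exceeds the threshold."""
    out = received.copy()
    out[np.abs(out) > threshold] = 0
    return out

# Tiny demo with random QPSK symbols and a flat unit channel (placeholders).
symbols = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
print(np.max(np.abs(select_candidate(symbols, channel=np.ones(64)))))
```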

  • Accelerating Stochastic Simulations on GPUs Using OpenCL

    Pilsung KANG  

     
    LETTER-Fundamentals of Information Systems

      Publicized:
    2019/07/23
      Vol:
    E102-D No:11
      Page(s):
    2253-2256

    Since first introduced in 2008 with the 1.0 specification, OpenCL has steadily evolved over the decade to increase its support for heterogeneous parallel systems. In this paper, we accelerate stochastic simulation of biochemical reaction networks on modern GPUs (graphics processing units) by means of the OpenCL programming language. In implementing the OpenCL version of the stochastic simulation algorithm, we carefully apply its data-parallel execution model to optimize the performance provided by the underlying hardware parallelism of the modern GPUs. To evaluate our OpenCL implementation of the stochastic simulation algorithm, we perform a comparative analysis in terms of the performance using the CPU-based cluster implementation and the NVidia CUDA implementation. In addition to the initial report on the performance of OpenCL on GPUs, we also discuss applicability and programmability of OpenCL in the context of GPU-based scientific computing.
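
    The OpenCL kernels themselves are not given in the abstract; for context, the sequential stochastic simulation algorithm (Gillespie's direct method) that such GPU implementations typically parallelize across many independent trajectories looks like the sketch below. The two-species toy reaction network and rate constants are assumptions for illustration.

```python
import numpy as np
rng = np.random.default_rng(1)

def gillespie(x0, stoich, rates, t_end):
    """Gillespie direct method for a small reaction network (CPU reference sketch)."""
    x, t = np.array(x0, dtype=float), 0.0
    while t < t_end:
        # Mass-action propensities, hard-coded here for the toy network A -> B and B -> A.
        a = np.array([rates[0] * x[0], rates[1] * x[1]])
        a0 = a.sum()
        if a0 == 0:
            break
        t += rng.exponential(1.0 / a0)              # time to the next reaction
        r = rng.choice(len(a), p=a / a0)            # which reaction fires
        x += stoich[r]
    return x

# Two-species toy model: A <-> B with 100 initial A molecules.
print(gillespie([100, 0], stoich=[np.array([-1, 1]), np.array([1, -1])],
                rates=[0.5, 0.3], t_end=10.0))
```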

  • ORRIS: Throughput Optimization for Backscatter Link on Physical and MAC Layers

    Jumin ZHAO  Yanxia LI  Dengao LI  Hao WU  Biaokai ZHU  

     
    PAPER-Multimedia Systems for Communications

      Publicized:
    2019/04/05
      Vol:
    E102-B No:10
      Page(s):
    2082-2090

    Unlike Radio Frequency Identification (RFID), emerging Computational RFID (CRFID) integrates an RF front-end and an MCU with multiple sensors. CRFIDs need to transmit data while within interrogator range, so when a tag moves rapidly or the contact duration with the interrogator is limited, the sensor data collected by the CRFID must be transferred to the interrogator quickly. In this paper, we focus on throughput optimization for the backscatter link, taking both the physical and medium access control (MAC) layers into consideration, and put forward a scheme called ORRIS. On the physical layer, we propose the Cluster Gather Degree (CGD) indicator, which measures the clustering degree of the signal in the IQ domain. CGD is then used as the criterion to adaptively adjust the rate encoding mode and link frequency, thereby achieving adaptive-rate transmission. On the MAC layer, based on the idea of asynchronous transfer, we utilize the number of clusters in the IQ domain to select the optimal Q value whenever possible, so as to achieve burst or bulk data transmission. Experiments and analyses in static and mobile scenarios show that our proposal has significantly better mean throughput than BLINK or CARA, which demonstrates the effectiveness of our scheme.

  • Analysis of Relevant Quality Metrics and Physical Parameters in Softness Perception and Assessment System

    Zhiyu SHAO  Juan WU  Qiangqiang OUYANG  

     
    PAPER-Rehabilitation Engineering and Assistive Technology

      Publicized:
    2019/06/11
      Vol:
    E102-D No:10
      Page(s):
    2013-2024

    Many quality metrics have been proposed for compliance perception to assess haptic device performance and perceived results. Perceived compliance may be influenced by factors such as object properties, experimental conditions, and human perceptual habits. In this paper, an analysis of softness perception was conducted to find out which quality metrics dominate in the compliance perception system and how they correlate with perception results, by expressing these metrics in terms of basic physical parameters that characterize these factors. Based on three psychophysical experiments, just noticeable differences (JNDs) for the perceived softness of combinations of different stiffness coefficients and damping levels rendered by haptic devices were analyzed. Data recorded during the interaction process were also analyzed. Preliminary experimental results show that the discrimination ability of softness perception changes with the ratio of damping to stiffness when subjects explore at their habitual speed. The analysis indicates that the quality metrics Rate-hardness, Extended Rate-hardness, and the ratio of damping to stiffness correlate highly with the perceived results. Further analysis shows that parameters reflecting object properties (stiffness, damping), experimental conditions (force bandwidth), and human perceptual habits (initial speed, maximum force change rate) lead to changes in these quality metrics, which then produce different perceptual feelings and finally result in changes in discrimination ability. The findings in this paper may provide a better understanding of softness perception and useful guidance for the improvement of haptic and teleoperation devices.

  • A Taxonomy of Secure Two-Party Comparison Protocols and Efficient Constructions

    Nuttapong ATTRAPADUNG  Goichiro HANAOKA  Shinsaku KIYOMOTO  Tomoaki MIMOTO  Jacob C. N. SCHULDT  

     
    PAPER-Cryptography and Information Security

      Vol:
    E102-A No:9
      Page(s):
    1048-1060

    Secure two-party comparison plays a crucial role in many privacy-preserving applications, such as privacy-preserving data mining and machine learning. In particular, the available comparison protocols with the appropriate input/output configuration have a significant impact on the performance of these applications. In this paper, we first describe a taxonomy of secure two-party comparison protocols which allows us to describe the different configurations used for these protocols in a systematic manner. This taxonomy leads to a total of 216 types of comparison protocols. We then describe conversions among these types. While these conversions are based on known techniques and have explicitly or implicitly been considered previously, we show that a combination of these conversion techniques can be used to convert a perhaps less-known two-party comparison protocol by Nergiz et al. (IEEE SocialCom 2010) into a very efficient protocol in a configuration where the two parties hold shares of the values being compared and obtain a share of the comparison result. This setting is often used in multi-party computation protocols, and hence in many privacy-preserving applications as well. We furthermore implement the protocol and measure its performance. Our measurements suggest that the protocol outperforms the previously proposed protocols for this input/output configuration when off-line pre-computation is not permitted.
