The search functionality is under construction.

Keyword Search Result

[Keyword] PU(3318hit)

621-640hit(3318hit)

  • Design, Analysis and Implementation of Pulse Generator by CMOS Flipped on Glass for Low Power UWB-IR

    Parit KANJANAVIROJKUL  Nguyen NGOC MAI-KHANH  Tetsuya IIZUKA  Toru NAKURA  Kunihiro ASADA  

     
    PAPER

      Vol:
    E100-A No:1
      Page(s):
    200-209

    This paper discusses a pulse generator implemented by CMOS flipped on a glass substrate, aimed at low-power applications with a low duty cycle. The pulse generator can theoretically generate a pulse at a frequency near and beyond Fmax. It also features a quick starting time and zero stand-by power. By using a simplified circuit model, analytical expressions for Q factor, energy conversion efficiency, output energy, and oscillation frequency of the pulse generator are derived. Pulse generator prototypes are designed on a 0.18 μm CMOS chip flipped over a transmission line resonator on a glass substrate. Measurement results of two different prototypes confirm the feasibility of the proposed circuit and the analytical model.

  • Multi-Divisible On-Line/Off-Line Encryptions

    Dan YAMAMOTO  Wakaha OGATA  

     
    PAPER

      Vol:
    E100-A No:1
      Page(s):
    91-102

    We present a new notion of public-key encryption, called multi-divisible on-line/off-line encryptions, in which partial ciphertexts can be computed and made publicly available for the recipients before the recipients' public key and/or the plaintexts are determined. We formalize its syntax and define several security notions with regard to the level of divisibility, the number of users, and the number of encryption (challenge) queries per user. Furthermore, we show implications and separations between these security notions and classify them into three categories. We also present concrete multi-divisible on-line/off-line encryption schemes. The schemes allow the computationally-restricted and/or bandwidth-restricted devices to transmit ciphertexts with low computational overhead and/or low-bandwidth network.

  • Computationally Secure Verifiable Secret Sharing Scheme for Distributing Many Secrets

    Wakaha OGATA  Toshinori ARAKI  

     
    PAPER

      Vol:
    E100-A No:1
      Page(s):
    103-114

    Many researchers have studied computationally secure (verifiable) secret sharing schemes which distribute multiple secrets with a bulletin board. However, the security definition is ambiguous in many of the past articles. In this paper, we first review existing schemes based on formal definitions of indistinguishability of secrets, verifiability of consistency, and cheater-detectability. We then propose a new secret sharing scheme, which is the first scheme with indistinguishability of secrets, verifiability, and cheater-detectability, and which allows secrets to be shared with arbitrary access structures. Further, our scheme is provably secure under well-known computational assumptions.

  • A 8 Phases 192MHz Crystal-Less Clock Generator with PVT Calibration

    Ting-Chou LU  Ming-Dou KER  Hsiao-Wen ZAN  Jen-Chieh LIU  Yu LEE  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E100-A No:1
      Page(s):
    275-282

    A multi-phase crystal-less clock generator (MPCLCG) with a process-voltage-temperature (PVT) calibration circuit is proposed. It operates at 192 MHz with 8-phase outputs, and is implemented in a 0.18µm CMOS process for digital power management systems. A temperature calibrated circuit is proposed to align operational frequency under process and supply voltage variations. It occupies an area of 65µm × 75µm and consumes 1.1mW with the power supply of 1.8V. Temperature coefficient (TC) is 69.5ppm/°C from 0 to 100°C, and 2-point calibration is applied to calibrate PVT variation. The measured period jitter is 4.58 ps RMS and 34.55 ps peak-to-peak (P2P jitter) at 192MHz within 12.67k-hits. At 192MHz, it shows a 1-MHz-offset phase noise of -102dBc/Hz. Phase-to-phase errors and duty cycle errors are less than 5.5% and 4.3%, respectively.

  • A Novel Compressed Sensing-Based Channel Estimation Method for OFDM System

    Liping XIAO  Zhibo LIANG  Kai LIU  

     
    LETTER-Communication Theory and Signals

      Vol:
    E100-A No:1
      Page(s):
    322-326

    Multipath matching pursuit (MMP) is a new reconstruction algorithm based on compressed sensing (CS). In this letter, we apply the MMP algorithm to channel estimation in orthogonal frequency division multiplexing (OFDM) communication systems, and then propose an improved MMP algorithm. The improved method adjusts the number of children generated by candidates, which can greatly reduce the complexity. The simulation results demonstrate that the improved method can reduce the running time while preserving the performance of channel estimation.

  • Efficient Search for High-Rate Punctured Convolutional Codes Using Dual Codes

    Sen MORIYA  Kana KIKUCHI  Hiroshi SASANO  

     
    PAPER-Coding Theory and Techniques

      Vol:
    E99-A No:12
      Page(s):
    2162-2169

    In this study, we consider techniques to search for high-rate punctured convolutional code (PCC) encoders using dual code encoders. A low-rate R=1/n convolutional code (CC) has a dual code that is identical to a PCC with rate R=(n-1)/n. This implies that a rate R=1/n convolutional code encoder can assist in searches for high-rate PCC encoders. On the other hand, we can derive a rate R=1/n CC encoder from good PCC encoders with rate R=(n-1)/n using dual code encoders. This paper proposes a method to obtain improved high-rate PCC encoders, using exhaustive search results of PCC encoders with rate R=1/3 original encoders, and dual code encoders. We also show some PCC encoders obtained by searches that utilized our method.

  • On the Computational Complexity of the Linear Solvability of Information Flow Problems with Hierarchy Constraint

    Yuki TAKEDA  Yuichi KAJI  Minoru ITO  

     
    PAPER-Networks and Network Coding

      Vol:
    E99-A No:12
      Page(s):
    2211-2217

    An information flow problem is a graph-theoretical formalization of the transportation of information over a complicated network. It is known that a linear network code plays an essential role in a certain type of information flow problem, but it is not clearly understood how much linear network codes contribute to other types of information flow problems. One basic problem concerning this aspect is the linear solvability of information flow problems, which is to decide if there is a linear network code that is a solution to the given information flow problem. Lehman et al. characterize the linear solvability of information flow problems in terms of constraints on the sets of source and sink nodes. As an extension of Lehman's investigation, this study introduces a hierarchy constraint of messages, and discusses the computational complexity of the linear solvability of information flow problems with the hierarchy constraints. Nine classes of problems are newly defined, and each is classified into one of three categories that were discovered by Lehman et al.

  • Reliability-Security Tradeoff for Secure Transmission with Untrusted Relays

    Dechuan CHEN  Weiwei YANG  Jianwei HU  Yueming CAI  Xin LIU  

     
    LETTER-Communication Theory and Signals

      Vol:
    E99-A No:12
      Page(s):
    2597-2599

    In this paper, we identify the tradeoff between security and reliability in the amplify-and-forward (AF) distributed beamforming (DBF) cooperative network with K untrusted relays. In particular, we derive the closed-form expressions for the connection outage probability (COP), the secrecy outage probability (SOP), the tradeoff relationship, and the secrecy throughput. Analytical and simulation results demonstrate that increasing K leads to the enhancement of the reliability performance, but the degradation of the security performance. This tradeoff also means that there exists an optimal K maximizing the secrecy throughput.

  • Two-Level Popularity-Oriented Cache Replacement Policy for Video Delivery over CCN

    Haipeng LI  Hidenori NAKAZATO  

     
    PAPER

      Vol:
    E99-B No:12
      Page(s):
    2532-2540

    We introduce a novel cache replacement policy to improve the entire network performance of video delivery over content-centric networking (CCN). In the case of the CCN structure, we argue that: 1) for video multiplexing scenario, general cache strategies that ignore the intrinsic linear time characteristic of video requests are unable to make better use of the cache resources, and 2) it is inadequate to simply extend the existing research conclusions of file-oriented popularity to chunk-by-chunk popularity, which are widely used in CCN. Unlike previous works in this field, the proposed policy in this study, named two-level popularity-oriented time-to-hold cache replacement policy (TLP-TTH), is designed on the basis of the following principles. Firstly, the proposed cache replacement strategy is customized for video delivery by carefully considering the essential auto-correlated request feature of video chunks within a video file. Furthermore, the popularity in video delivery is subdivided into two levels, namely chunk-level access probability and file-level popularity, in order to efficiently utilize cache resources. We evaluated the proposed policy in both a hierarchical topology and a real network based hybrid topology, and took viewers departure into consideration as well. The results validate that for video delivery over CCN, TLP-TTH policy improves the network performance from several aspects. In particular, we observed that the proposed policy not only increases the cache hit ratio at the edge of the network but the cache utilization at the intermediate routers is also improved markedly. Further, with respect to the video popularity variation scenario, the cache hit ratio of TLP-TTH policy responds sensitively to maintain efficient cache utilization.

  • A Highly Efficient Switched-Capacitor Voltage Boost Converter with Nano-Watt MPPT Controller for Low-Voltage Energy Harvesting

    Toshihiro OZAKI  Tetsuya HIROSE  Takahiro NAGAI  Keishi TSUBAKI  Nobutaka KUROKI  Masahiro NUMA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2491-2499

    This paper presents a fully integrated voltage boost converter consisting of a charge pump (CP) and maximum power point tracking (MPPT) controller for ultra-low power energy harvesting. The converter is based on a conventional CP circuit and can deliver a wide range of load current by using nMOS and pMOS driver circuits for highly efficient charge transfer operation. The MPPT controller we propose dissipates nano-watt power to extract maximum power regardless of the harvester's power generation conditions and load current. The measurement results demonstrated that the circuit converted a 0.49-V input to a 1.46-V output with 73% power conversion efficiency when the output power was 348µW. The circuit can operate at an extremely low input voltage of 0.21V.

  • GPU-Accelerated Bulk Execution of Multiple-Length Multiplication with Warp-Synchronous Programming Technique

    Takumi HONDA  Yasuaki ITO  Koji NAKANO  

     
    PAPER-GPU computing

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    3004-3012

    In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea of our GPU implementation is to adopt a warp-synchronous programming technique. We assign each multiple-length multiplication to one warp that consists of 32 threads. In parallel processing using multiple threads, usually, it is costly to synchronize execution of threads and communicate within threads. In warp-synchronous programming technique, however, execution of threads in a warp can be synchronized instruction by instruction without any barrier synchronous operations. Also, inter-thread communication can be performed by warp shuffle functions without accessing shared memory. The experimental results show that our GPU implementation on NVIDIA GeForce GTX 980 attains a speed-up factor of 52 for 1024-bit multiple-length multiplication over the sequential CPU implementation. Moreover, we use this 1024-bit multiple-length multiplication for larger size of bits as a sub-routine. The GPU implementation attains a speed-up factor of 21 for 65536-bit multiple-length multiplication.

  • Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes

    Yuechao LU  Fumihiko INO  Kenichi HAGIHARA  

     
    PAPER-Computer System

      Publicized:
    2016/09/05
      Vol:
    E99-D No:12
      Page(s):
    3060-3071

    This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy for hiding file input/output (I/O) time with GPU execution and data transfer times. We implement our proposed method on NVIDIA's latest Maxwell architecture and provide tuning guidelines for adjusting the execution parameters, which include the granularity and shape of thread blocks as well as the granularity of I/O data to be streamed through the pipeline, which maximizes reconstruction performance. Our experimental results show that it took less than three minutes to reconstruct a $2048^3$-voxel volume from 1200 $2048^2$-pixel projection images on a single GPU; this translates to a speedup of approximately 1.47 as compared to the previous method.

  • Asymptotic Optimality of QPSK Faster-than-Nyquist Signaling in Massive MIMO Systems

    Keigo TAKEUCHI  

     
    PAPER-Communication Theory and Systems

      Vol:
    E99-A No:12
      Page(s):
    2192-2201

    Faster-than-Nyquist (FTN) signaling is investigated for quasi-static flat fading massive multiple-input multiple-output (MIMO) systems. In FTN signaling, pulse trains are sent at a symbol rate higher than the Nyquist rate to increase the transmission rate. As a result, inter-symbol interference occurs inevitably for flat fading channels. This paper assesses the information-theoretically achievable rate of MIMO FTN signaling based on the optimum joint equalization and multiuser detection. The replica method developed in statistical physics is used to evaluate the achievable rate in the large-system limit, where the dimensions of input and output signals tend to infinity at the same rate. An analytical expression of the achievable rate is derived for general modulation schemes in the large-system limit. It is shown that FTN signaling does not improve the channel capacity of massive MIMO systems, and that FTN signaling with quadrature phase-shift keying achieves the channel capacity for all signal-to-noise ratios as the symbol period tends to zero.

  • Comparing Performance of Hierarchical Identity-Based Signature Schemes

    Peixin CHEN  Yilun WU  Jinshu SU  Xiaofeng WANG  

     
    LETTER-Information Network

      Publicized:
    2016/09/01
      Vol:
    E99-D No:12
      Page(s):
    3181-3184

    The key escrow problem and high computational cost are the two major problems that hinder the wider adoption of hierarchical identity-based signature (HIBS) schemes. HIBS schemes with either the escrow-free (EF) or the online/offline (OO) model have been proved secure in our previous work. However, few EF or OO schemes have been evaluated experimentally. In this letter, several EF/OO HIBS schemes are considered. We study the algorithmic complexity of the schemes both theoretically and experimentally. Scheme performance and practicability of the EF and OO models are discussed.

  • Blind Identification of Multichannel Systems Based on Sparse Bayesian Learning

    Kai ZHANG  Hongyi YU  Yunpeng HU  Zhixiang SHEN  Siyu TAO  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2016/06/28
      Vol:
    E99-B No:12
      Page(s):
    2614-2622

    Reliable wireless communication often requires accurate knowledge of the underlying multipath channels. Numerous measurement campaigns have shown that physical multipath channels tend to exhibit a sparse structure. Conventional blind channel identification (BCI) strategies such as the least squares, which are known to be optimal under the assumption of rich multipath channels, are ill-suited to exploiting the inherent sparse nature of multipath channels. Recently, l1-norm regularized least-squares-type approaches have been proposed to address this problem with a single parameter governing all coefficients, which is equivalent to maximum a posteriori probability estimation with a Laplacian prior for the channel coefficients. Since the Laplacian prior is not conjugate to the Gaussian likelihood, no closed form of Bayesian inference is possible. Following a different approach, this paper deals with blind channel identification of a single-input multiple-output (SIMO) system based on sparse Bayesian learning (SBL). The inherent sparse nature of wireless multipath channels is exploited by incorporating a transformative cross relation formulation into a general Bayesian framework, in which the filter coefficients are governed by independent scalar parameters. A fast iterative Bayesian inference method is then applied to the proposed model for obtaining sparse solutions, which completely eliminates the need for the computationally costly parameter fine tuning that is necessary in the l1-norm regularization method. Simulation results are provided to demonstrate the superior effectiveness of the proposed channel estimation algorithm over the conventional least squares (LS) scheme as well as the l1-norm regularization method. It is shown that the proposed algorithm exhibits superior estimation performance compared to both LS and l1-norm regularization methods.

  • An Inductive Method to Select Simulation Points

    MinSeong CHOI  Takashi FUKUDA  Masahiro GOSHIMA  Shuichi SAKAI  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2891-2900

    The time taken for processor simulation can be drastically reduced by selecting simulation points, which are dynamic sections obtained from the simulation result of processors. The overall behavior of the program can be estimated by simulating only these sections. Existing methods for selecting simulation points, such as SimPoint, are deductive and based on the idea that dynamic sections executing the same static section of the program are of the same phase. However, there are counterexamples to this idea. This paper proposes an inductive method, which selects simulation points from the results obtained by pre-simulating several processors with distinctive microarchitectures, based on the assumption that sections in which all the distinctive processors have similar instructions per cycle (IPC) values are of the same phase. We evaluated the first 100G instructions of SPEC 2006 programs. Our method achieved an IPC estimation error of approximately 0.1% by simulating approximately 0.05% of the 100G instructions.

  • A Memory-Access-Efficient Implementation for Computing the Approximate String Matching Algorithm on GPUs

    Lucas Saad Nogueira NUNES  Jacir Luiz BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER-GPU computing

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2995-3003

    The closeness of a match is an important measure with a number of practical applications, including computational biology, signal processing and text retrieval. The approximate string matching (ASM) problem asks to find a substring of string Y of length n that is most similar to string X of length m. It is well known that the ASM can be solved by a dynamic programming technique that computes a table of size m×n. The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU. The proposed GPU implementation relies on warp shuffle instructions, which are used to accelerate the communication between threads without resorting to shared memory access. Despite the fact that O(mn) memory access operations are necessary to access all elements of a table with size m×n, the proposed implementation performs only $O(\frac{mn}{w})$ memory access operations, where w is the warp size. Experimental results carried out on a GeForce GTX 980 GPU show that the proposed implementation, called w-SCAN, provides a speed-up of over twofold in computing the ASM as compared to another prominent alternative.
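
    The dynamic programming formulation mentioned in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' w-SCAN GPU implementation; it assumes unit-cost edit distance as the similarity measure, and the function name `asm_distance` is hypothetical. Only two rows of the m×n table are kept in memory at a time.

    ```python
    def asm_distance(x: str, y: str) -> int:
        """Smallest edit distance between x and any substring of y.

        Assumption: unit costs for insertion, deletion, and substitution.
        """
        m, n = len(x), len(y)
        # Row 0 is all zeros: a match may start at any position of y for free.
        prev = [0] * (n + 1)
        for i in range(1, m + 1):
            # Column 0: matching i characters of x against nothing costs i.
            curr = [i] + [0] * n
            for j in range(1, n + 1):
                cost = 0 if x[i - 1] == y[j - 1] else 1
                curr[j] = min(
                    prev[j] + 1,         # delete x[i-1]
                    curr[j - 1] + 1,     # insert y[j-1]
                    prev[j - 1] + cost,  # match or substitute
                )
            prev = curr
        # The best match may end at any position of y.
        return min(prev)
    ```

    Each cell depends only on its left, upper, and upper-left neighbors, which is the dependency pattern a parallel implementation has to respect.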

  • Fully Parallelized LZW Decompression for CUDA-Enabled GPUs

    Shunji FUNASAKA  Koji NAKANO  Yasuaki ITO  

     
    PAPER-GPU computing

      Publicized:
    2016/08/25
      Vol:
    E99-D No:12
      Page(s):
    2986-2994

    The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it in a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize it. We first present a work-optimal parallel LZW decompression algorithm on the CREW-PRAM (Concurrent-Read Exclusive-Write Parallel Random Access Machine), which is a standard theoretical parallel computing model with a shared memory. We then go on to present an efficient implementation of this parallel algorithm on a GPU. The experimental results show that our GPU implementation performs LZW decompression in 1.15 milliseconds for a gray scale TIFF image with 4096×3072 pixels stored in the global memory of GeForce GTX 980. On the other hand, sequential LZW decompression for the same image stored in the main memory of Intel Core i7 CPU takes 50.1 milliseconds. Thus, our parallel LZW decompression on the global memory of the GPU is 43.6 times faster than a sequential LZW decompression on the main memory of the CPU for this image. To show the applicability of our GPU implementation for LZW decompression, we evaluated the SSD-GPU data loading time for three scenarios. The experimental results show that the scenario using our LZW decompression on the GPU is faster than the others.
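
    The sequential dependency described in this abstract, where each code read extends the dictionary used to decode later codes, can be seen in a minimal sketch of textbook LZW decompression. This is not the authors' parallel algorithm, and it makes simplifying assumptions: codes arrive as a list of integers, the dictionary starts with the 256 single-byte strings, and TIFF-specific details such as clear codes and variable code widths are omitted. The function name `lzw_decompress` is hypothetical.

    ```python
    from typing import List


    def lzw_decompress(codes: List[int]) -> bytes:
        """Decode a list of LZW codes (simplified, no clear/end codes)."""
        # Initial dictionary: one entry per possible byte value.
        table = [bytes([i]) for i in range(256)]
        if not codes:
            return b""
        prev = table[codes[0]]
        out = bytearray(prev)
        for code in codes[1:]:
            if code < len(table):
                entry = table[code]
            elif code == len(table):
                # The code refers to the entry about to be created.
                entry = prev + prev[:1]
            else:
                raise ValueError("invalid LZW code")
            out += entry
            # New entry: previous string plus first byte of the current one.
            table.append(prev + entry[:1])
            prev = entry
        return bytes(out)
    ```

    Because every new table entry is built from the previously decoded string, a straightforward loop like this cannot process codes independently, which is why parallelizing it is not easy.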

  • A 60mV-3V Wide-Input-Voltage-Range Boost Converter with Amplitude-Regulated Oscillator for Energy Harvesting

    Hiroyuki NAKAMOTO  Hong GAO  Hiroshi YAMAZAKI  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2483-2490

    This paper presents a wide-input-voltage-range and high-efficiency boost converter that is assisted by a transformer-based oscillator. The oscillator can provide a sufficient amount of power to drive a following switched-inductor boost converter at low voltages. Moreover, it adopts a novel amplitude-regulation circuit (ARC) without using high power-consuming protective devices to suppress the expansion of the oscillation amplitude at high input voltages. Therefore, it can avoid over-voltage problems without sacrificing the power efficiency. Additionally, a power-down circuit (PDC) is implemented to turn off the oscillator when the boost converter can be driven by its own output power, thus eliminating the power consumption of the oscillator and improving the power efficiency. We implemented the ARC and the PDC with discrete components rather than one-chip integration for the proof of concept. The experimental results showed that the proposed circuit can operate from an input voltage of 60mV to 3V while maintaining a high peak efficiency of up to 92%. To the best of our knowledge, this converter provides a wider input range in comparison with the previously published converters. We are convinced that the proposed approach of inserting an appropriate start-up circuit into a commercial converter will be effective for rapid design proposals that respond promptly to customer needs for Internet of things (IoT) devices with energy harvesters.

  • Performance Optimization of Light-Field Applications on GPU

    Yuttakon YUTTAKONKIT  Shinya TAKAMAEDA-YAMAZAKI  Yasuhiko NAKASHIMA  

     
    PAPER-Computer System

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    3072-3081

    Light-field image processing has been widely employed in many areas, from mobile devices to manufacturing applications. The fundamental process to extract the usable information requires significant computation with high-resolution raw image data. A graphics processing unit (GPU) is used to exploit the data parallelism as in general image processing applications. However, the sparse memory access pattern of the applications reduced the performance of GPU devices for both systematic and algorithmic reasons. Thus, we propose an optimization technique which redesigns the memory access pattern of the applications to alleviate the memory bottleneck of rendering application and to increase the data reusability for depth extraction application. We evaluated our optimized implementations with the state-of-the-art algorithm implementations on several GPUs where all implementations were optimally configured for each specific device. Our proposed optimization increased the performance of rendering application on GTX-780 GPU by 30% and depth extraction application on GTX-780 and GTX-980 GPUs by 82% and 18%, respectively, compared with the original implementations.
