The search functionality is under construction.

Keyword Search Result

[Keyword] IDT(386hit)


  • Measuring SET Pulse Widths in pMOSFETs and nMOSFETs Separately by Heavy Ion and Neutron Irradiation Open Access

    Jun FURUTA  Shotaro SUGITANI  Ryuichi NAKAJIMA  Takafumi ITO  Kazutoshi KOBAYASHI  

    PAPER-Semiconductor Materials and Devices

    E107-C No:9

    Radiation-induced temporal errors become a significant issue for circuit reliability. We measured the pulse widths of radiation-induced single event transients (SETs) from pMOSFETs and nMOSFETs separately. Test results show that heavy-ion induced SET rates of nMOSFETs were twice as high as those of pMOSFETs and that neutron-induced SETs occurred only in nMOSFETs. It was confirmed that the SET distribution from inverter chains can be estimated using the SET distribution from pMOSFETs and nMOSFETs by considering the difference in load capacitance of the measurement circuits.

  • On a Spectral Lower Bound of Treewidth

    Tatsuya GIMA  Tesshu HANAKA  Kohei NORO  Hirotaka ONO  Yota OTACHI  


    E107-D No:3

    In this letter, we present a new lower bound for the treewidth of a graph in terms of the second smallest eigenvalue of its Laplacian matrix. Our bound slightly improves the lower bound given by Chandran and Subramanian [Inf. Process. Lett., 87 (2003)].

  • Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV

    Siyi HU  Makiko ITO  Takahide YOSHIKAWA  Yuan HE  Hiroshi NAKAMURA  Masaaki KONDO  


    E106-D No:12

    Widely adopted by machine learning and graph processing applications nowadays, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite having efficient storage options against sparsity (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both integer and floating-point data used in SpMV kernels are handled plainly without any necessary pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence conditions in some typical scientific computation benchmarks (based on SpMV kernels) will not be degraded when adopting lower precision floating-point data. Based on these findings, in this work, we propose a simple yet effective data compression scheme that can be extended to general purpose computing architectures or, preferably, HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. Besides, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of final results.
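
    For readers unfamiliar with the CSR format mentioned in this abstract, the following is a minimal sketch of an uncompressed CSR SpMV kernel. It is not the compression scheme proposed in the paper; the function name `csr_spmv` and the array names `values`, `col_idx`, and `row_ptr` are illustrative assumptions. Each nonzero costs one value load and one index load for a single multiply-add, which is why such kernels tend to be limited by memory bandwidth.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    (All names are illustrative, not taken from the paper.)
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Every nonzero needs one value load and one index load.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[10, 0, 0],
#      [ 0, 0, 20],
#      [30, 0, 40]]
values = [10.0, 20.0, 30.0, 40.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```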

  • A New SIDGS-Based Tunable BPF Design Method with Controllable Bandwidth

    Weiyu ZHOU  Koji WADA  

    PAPER-Microwaves, Millimeter-Waves

    E106-C No:10

    This paper provides a new method to implement substrate integrated defected ground structure (SIDGS)-based bandpass filter (BPF) with adjustable frequency and controllable bandwidth. Compared with previous literature, this method implements a new SIDGS-like resonator capable of tunable frequency in the same plane as the slotted line using a varactor diode, increasing the design flexibility. In addition, the method solves the problem that the tunable BPF constituted by the SIDGS resonator cannot control the bandwidth by introducing a T-shaped non-resonant unit. The theoretical design method and the structural design are shown. Moreover, the configured structure is fabricated and measured to show the validity of the design method in this paper.

  • Optimizing Edge-Cloud Cooperation for Machine Learning Accuracy Considering Transmission Latency and Bandwidth Congestion Open Access

    Kengo TAJIRI  Ryoichi KAWAHARA  Yoichi MATSUO  

    PAPER-Network Management/Operation

    E106-B No:9

    Machine learning (ML) has been used for various tasks in network operations in recent years. However, since the scale of networks has grown and the amount of data generated has increased, it has been increasingly difficult for network operators to conduct their tasks with a single server using ML. Thus, ML with edge-cloud cooperation has been attracting attention for efficiently processing and analyzing a large amount of data. In the edge-cloud cooperation setting, although transmission latency, bandwidth congestion, and accuracy of tasks using ML depend on the load balance of processing data with edge servers and a cloud server in edge-cloud cooperation, the relationship is too complex to estimate. In this paper, we focus on monitoring anomalous traffic as an example of ML tasks for network operations and formulate transmission latency, bandwidth congestion, and the accuracy of the task with edge-cloud cooperation considering the ratio of the amount of data preprocessed in edge servers to that in a cloud server. Moreover, we formulate an optimization problem under constraints for transmission latency and bandwidth congestion to select the proper ratio by using our formulation. By solving our optimization problem, the optimal load balance between edge servers and a cloud server can be selected, and the accuracy of anomalous traffic monitoring can be estimated. Our formulation and optimization framework can be used for other ML tasks by considering the generating distribution of data and the type of an ML model. In accordance with our formulation, we simulated the optimal load balance of edge-cloud cooperation in a topology that mimicked a Japanese network and conducted an anomalous traffic detection experiment by using real traffic data to compare the estimated accuracy based on our formulation and the actual accuracy based on the experiment.

  • Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform

    Van-Cam NGUYEN  Yasuhiko NAKASHIMA  

    PAPER-Computer System

    E106-D No:6

    Many deep convolutional neural network (CNN) inference accelerators on the field-programmable gate array (FPGA) platform have been widely adopted due to their low power consumption and high performance. In this paper, we develop the following to improve performance and power efficiency. First, we use a high bandwidth memory (HBM) to expand the bandwidth of data transmission between the off-chip memory and the accelerator. Second, a fully-pipelined design, which consists of pipelined inter-layer computation and a pipelined computation engine, is implemented to decrease idle time among layers. Third, a multi-core architecture with shared-dual buffers is designed to reduce off-chip memory access and maximize the throughput. We designed the proposed accelerator on the Xilinx Alveo U280 platform with in-depth Verilog HDL, rather than the high-level synthesis used in previous works, and explored the VGG-16 model to verify the system during our experiment. With a similar accelerator architecture, the experimental results demonstrate that the memory bandwidth of HBM is 13.2× better than DDR4. Compared with other accelerators in terms of throughput, our accelerator is 1.9×/1.65×/11.9× better than FPGA+HBM2 based/low batch size (4) GPGPU/low batch size (4) CPU. Compared with the previous DDR+FPGA/DDR+GPGPU/DDR+CPU based accelerators in terms of power efficiency, our proposed system provides 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement with the large-scale CNN model.

  • A Beam Search Method with Adaptive Beam Width Control Based on Area Size for Initial Access

    Takuto ARAI  Daisei UCHIDA  Tatsuhiko IWAKUNI  Shuki WAI  Naoki KITA  

    PAPER-Wireless Communication Technologies

    E106-B No:4

    High gain antennas with narrow-beamforming are required to compensate for the high propagation loss expected in high frequency bands such as the millimeter wave and sub-terahertz wave bands, which are promising for achieving extremely high speeds and capacity. However, using narrow-beamforming for initial access (IA) beam search in all directions incurs an excessive overhead. Using wide-beamforming can reduce the overhead for IA but it also shrinks the coverage area due to the lower beamforming gain. Here, it is assumed that there are some situations in which the required coverage distance differs depending on the direction from the antenna. For example, the distance to a floor for a ceiling-mounted antenna varies depending on the direction, and the distance to the obstruction becomes the required coverage distance for an antenna installation design that assumes line-of-sight. In this paper, we propose a novel IA beam search scheme with adaptive beam width control based on the distance to shield obstacles in each direction. Simulations and experiments show that the proposed method reduces the overhead by 20%-50% without shrinking the coverage area in shield environments compared to exhaustive beam search with narrow-beamforming.

  • Band Characteristics of a Polarization Splitter with Circular Cores and Hollow Pits

    Midori NAGASAKA  Taiki ARAKAWA  Yutaro MOCHIDA  Kazunori KAMEDA  Shinichi FURUKAWA  


    E106-C No:4

    In this study, we discuss a structure that realizes a wideband polarization splitter comprising fiber 1 with a single core and fiber 2 with circular pits, which touch the top and bottom of a single core. The refractive index profile of the W type was adopted in the core of fiber 1 to realize the wideband. We compared the maximum bandwidth of BW-15 (bandwidth at an extinction ratio of -15dB) for the W type obtained in this study with those (our previous results) of BW-15 for the step and graded types with cores and pits at the same location; this comparison clarified that the maximum bandwidth of BW-15 for the W type is 5.22 and 4.96 times wider than those of step and graded types, respectively. Furthermore, the device length at the maximum bandwidth improved, becoming slightly shorter. The main results of the FPS in this study are all obtained by numerical analysis based on our proposed MM-DM (a method that combines the multipole method and the difference method for the inhomogeneous region). Our MM-DM is a quite reliable method for high accuracy analysis of the FPS composed of inhomogeneous circular regions.

  • An Efficient Combined Bit-Width Reducing Method for Ising Models

    Yuta YACHI  Masashi TAWADA  Nozomu TOGAWA  

    PAPER-Fundamentals of Information Systems

    E106-D No:4

    Annealing machines such as quantum annealing machines and semiconductor-based annealing machines have been attracting attention as an efficient computing alternative for solving combinatorial optimization problems. They solve original combinatorial optimization problems by transforming them into a data structure called an Ising model. At that time, the bit-widths of the coefficients of the Ising model have to be kept within the range that an annealing machine can deal with. However, by reducing the Ising-model bit-widths, its minimum energy state, or ground state, may become different from that of the original one, and hence the targeted combinatorial optimization problem cannot be well solved. This paper proposes an effective method for reducing Ising model's bit-widths. The proposed method is composed of two processes: First, given an Ising model with large coefficient bit-widths, the shift method is applied to reduce its bit-widths roughly. Second, the spin-adding method is applied to further reduce its bit-widths to those that annealing machines can deal with. Without adding too many extra spins, we efficiently reduce the coefficient bit-widths of the original Ising model. Furthermore, the ground state before and after reducing the coefficient bit-widths is not much changed in most of the practical cases. Experimental evaluations demonstrate the effectiveness of the proposed method, compared to existing methods.

  • DAG-Pathwidth: Graph Algorithmic Analyses of DAG-Type Blockchain Networks

    Shoji KASAHARA  Jun KAWAHARA  Shin-ichi MINATO  Jumpei MORI  


    E106-D No:3

    This paper analyzes a blockchain network forming a directed acyclic graph (DAG), called a DAG-type blockchain, from the viewpoint of graph algorithm theory. To use a DAG-type blockchain, NP-hard graph optimization problems on the DAG are required to be solved. Although various problems for undirected and directed graphs can be efficiently solved by using the notions of graph parameters, these currently known parameters are meaningless for DAGs, which implies that it is hopeless to design efficient algorithms based on the parameters for such problems. In this work, we propose a novel graph parameter for directed graphs called a DAG-pathwidth, which represents the closeness to a directed path. This is an extension of the pathwidth, a well-known graph parameter for undirected graphs. We analyze the features of the DAG-pathwidth and prove that computing the DAG-pathwidth of a DAG (directed graph in general) is NP-complete. Finally, we propose an efficient algorithm for a variant of the maximum k-independent set problem for the DAG-type blockchain when the DAG-pathwidth of the input graph is small.

  • A Satisfiability Algorithm for Deterministic Width-2 Branching Programs Open Access

    Tomu MAKITA  Atsuki NAGAO  Tatsuki OKADA  Kazuhisa SETO  Junichi TERUYAMA  

    PAPER-Algorithms and Data Structures

    E105-A No:9

    A branching program is a well-studied model of computation and a representation for Boolean functions. It is a directed acyclic graph with a unique root node, some accepting nodes, and some rejecting nodes. Except for the accepting and rejecting nodes, each node has a label with a variable and each outgoing edge of the node has a label with a 0/1 assignment of the variable. The satisfiability problem for branching programs is, given a branching program with n variables and m nodes, to determine if there exists some assignment that activates a consistent path from the root to an accepting node. The width of a branching program is the maximum number of nodes at any level. The satisfiability problem for width-2 branching programs is known to be NP-complete. In this paper, we present a satisfiability algorithm for width-2 branching programs with n variables and cn nodes, and show that its running time is poly(n)·2^((1-µ(c))n), where µ(c)=1/2^(O(c log c)). Our algorithm consists of two phases. First, we transform a given width-2 branching program to a set of some structured formulas that consist of AND and Exclusive-OR gates. Then, we check the satisfiability of these formulas by a greedy restriction method depending on the frequency of the occurrence of variables.
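
    The problem definition in this abstract can be made concrete with a small sketch. The code below is the exhaustive 2^n baseline, not the algorithm of the paper; the dictionary encoding, the single `ACCEPT`/`REJECT` terminal labels, and the function names are assumptions made for illustration. Following one assignment through the graph always traces a consistent path, so trying all assignments decides satisfiability.

```python
from itertools import product

ACCEPT, REJECT = "accept", "reject"  # illustrative terminal labels


def evaluate(program, root, assignment):
    """Follow the path that `assignment` activates, starting at `root`.

    `program` maps a node name to (variable index, successor on 0,
    successor on 1). This encoding is an assumption for illustration.
    """
    node = root
    while node not in (ACCEPT, REJECT):
        var, if_zero, if_one = program[node]
        node = if_one if assignment[var] else if_zero
    return node == ACCEPT


def satisfying_assignment(program, root, num_vars):
    """Exhaustive search over all 2^n assignments (baseline only)."""
    for bits in product((0, 1), repeat=num_vars):
        if evaluate(program, root, bits):
            return bits
    return None


# Width-2 program computing x0 XOR x1.
xor_program = {
    "r": (0, "a", "b"),
    "a": (1, REJECT, ACCEPT),
    "b": (1, ACCEPT, REJECT),
}

# Reads x0 twice; an accepting path exists in the graph, but every such
# path is inconsistent (it needs x0 = 0 and x0 = 1), so it is unsatisfiable.
contradiction = {
    "r": (0, "a", "b"),
    "a": (0, REJECT, ACCEPT),
    "b": (0, ACCEPT, REJECT),
}

xor_result = satisfying_assignment(xor_program, "r", 2)
contradiction_result = satisfying_assignment(contradiction, "r", 1)
```

    The second program shows why the definition asks for a consistent path: reachability of an accepting node in the graph alone is not enough.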

  • Artificial Bandwidth Extension for Lower Bandwidth Using Sinusoidal Synthesis based on First Formant Location

    Yuya HOSODA  Arata KAWAMURA  Youji IIGUNI  

    PAPER-Engineering Acoustics

    E105-A No:4

    The narrow bandwidth limitation of 300-3400Hz on the public switching telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower bandwidth of 50-300Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on YIN algorithm, where a threshold processing avoids the false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of 300-600Hz. In this case, since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure to avoid attenuating and overemphasizing by increasing the weight when the first formant location is lower, and vice versa. Consequently, the subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth extended speech signal.
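
    As a rough illustration of sinusoidal synthesis with a harmonic structure, the sketch below sums sinusoids at the multiples of a fundamental frequency that fall inside the missing 50-300 Hz band. It is only a sketch under simplifying assumptions: the fundamental frequency is given rather than detected, and every harmonic gets the same amplitude, whereas the abstract describes an amplitude derived from the weighted 300-600 Hz energy and the first formant location. The function and parameter names are hypothetical.

```python
import math


def synthesize_low_band(f0, amplitude, sample_rate, num_samples,
                        band=(50.0, 300.0)):
    """Sum sinusoids at the harmonics of `f0` that lie inside `band`.

    Uniform `amplitude` per harmonic is a simplifying assumption; the
    paper's formant-based weighting is not reproduced here.
    """
    lo, hi = band
    harmonics = [k * f0 for k in range(1, int(hi // f0) + 1)
                 if lo <= k * f0 <= hi]
    signal = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(math.sin(2 * math.pi * f * t) for f in harmonics)
        signal.append(amplitude * sample)
    return harmonics, signal


# A 100 Hz fundamental contributes harmonics at 100, 200, and 300 Hz.
harmonics, signal = synthesize_low_band(
    f0=100.0, amplitude=0.1, sample_rate=8000, num_samples=160)
```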

  • An Overflow/Underflow-Free Fixed-Point Bit-Width Optimization Method for OS-ELM Digital Circuit Open Access

    Mineto TSUKADA  Hiroki MATSUTANI  


    E105-A No:3

    Currently there has been increasing demand for real-time training on resource-limited IoT devices such as smart sensors, which realizes standalone online adaptation for streaming data without data transfers to remote servers. OS-ELM (Online Sequential Extreme Learning Machine) has been one of the promising neural-network-based online algorithms for on-chip learning because it can perform online training at low computational cost and is easy to implement as a digital circuit. Existing OS-ELM digital circuits employ a fixed-point data format and the bit-widths are often manually tuned; however, this may cause overflow or underflow, which can lead to unexpected behavior of the circuit. For on-chip learning systems, an overflow/underflow-free design has a great impact since online training is continuously performed and the intervals of intermediate variables will dynamically change as time goes by. In this paper, we propose an overflow/underflow-free bit-width optimization method for fixed-point digital circuits of OS-ELM. Experimental results show that our method realizes overflow/underflow-free OS-ELM digital circuits with 1.0x - 1.5x more area cost compared to the baseline simulation method where overflow or underflow can happen.

  • Low-Power Design Methodology of Voltage Over-Scalable Circuit with Critical Path Isolation and Bit-Width Scaling Open Access



    E105-A No:3

    This work proposes a design methodology that saves the power dissipation under voltage over-scaling (VOS) operation. The key idea of the proposed design methodology is to combine critical path isolation (CPI) and bit-width scaling (BWS) under the constraint of computational quality, e.g., Peak Signal-to-Noise Ratio (PSNR) in the image processing domain. Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly restrict the power saving effect. On the other hand, the proposed methodology tries to reduce both intrinsic and non-intrinsic CPs. Therefore, our design dramatically reduces the supply voltage and power dissipation while satisfying the quality constraint. Moreover, for reducing co-design exploration space, the proposed methodology utilizes the exclusiveness of the paths targeted by CPI and BWS, where CPI aims at reducing the minimum supply voltage of non-intrinsic CP, and BWS focuses on intrinsic CPs in arithmetic units. From this key exclusiveness, the proposed design splits the simultaneous optimization problem into three sub-problems; (1) the determination of bit-width reduction, (2) the timing optimization for non-intrinsic CPs, and (3) investigating the minimum supply voltage of the BWS and CPI-applied circuit under quality constraint, for reducing power dissipation. Thanks to the problem splitting, the proposed methodology can efficiently find quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible, and they significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design saves the power dissipation by 42.7% with an image processing workload and by 51.2% with a neural network inference workload.

  • Recent Progress on High Output Power, High Frequency and Wide Bandwidth GaN Power Amplifiers Open Access

    Masaru SATO  Yoshitaka NIIDA  Atsushi YAMADA  Junji KOTANI  Shiro OZAKI  Toshihiro OHKI  Naoya OKAMOTO  Norikazu NAKAMURA  


    E104-C No:10

    This paper presents recent progress on high frequency and wide bandwidth GaN high power amplifiers (PAs) that are usable for high-data-rate wireless communications and modern radar systems. The key devices and design techniques for PAs are described in this paper. The results of the state-of-the-art GaN PAs for microwave to millimeter-wave applications and the design methodology for ultra-wideband GaN PAs are shown. In order to realize high output power density, InAlGaN/GaN HEMTs were employed. An output power density of 14.8 W/mm in S-band was achieved, which is 1.5 times higher than that of the conventional AlGaN/GaN HEMTs. This technique was applied to the millimeter-wave GaN PAs, and the measured power density at 96 GHz was 3 W/mm. The modified Angelov model was employed for a millimeter-wave design. A W-band GaN MMIC achieved a maximum Pout of 1.15 W under CW operation. The PA with a Lange coupler achieved 2.6 W at 94 GHz. The authors also developed a wideband PA. A power combiner with an impedance transformation function based on the transmission line transformer (TLT) technique was adopted for the wideband PA design. The fabricated PA exhibited an average Pout of 233 W and an average PAE of 42% in the frequency range of 0.5 GHz to 2.1 GHz.

  • Analysis and Design of Continuous-Time Comparator Open Access

    Takahiro MIKI  


    E104-C No:10

    Applications of continuous-time (CT) comparator include relaxation oscillators, pulse width modulators, and so on. CT comparator receives a differential input and outputs a strobe ideally when the differential input crosses zero. Unlike the DT comparators with positive feedback circuit, amplifiers consuming static power must be employed in CT comparators to amplify the input signal. Therefore, minimization of comparator delay under the constraint of power consumption often becomes an issue. This paper analyzes transient behavior of a CT comparator. Using “constant delay approximation”, the comparator delay is derived as a function of input slew rate, number of stages of the preamplifier, and device parameters in each block. This paper also discusses optimum design of the CT comparator. The condition for minimum comparator delay is derived with keeping power consumption constant. The results include that the optimum DC gain of the preamplifier is e to e^3 per stage depending on the element which dominates load capacitance of the preamplifier.

  • A High-Speed PWM-Modulated Transceiver Network for Closed-Loop Channel Topology

    Kyongsu LEE  Jae-Yoon SIM  


    E104-C No:7

    This paper proposes pulse-width modulated (PWM) signaling[1] to send clock and data over a pair of channels for an in-vehicle network where a closed chain of point-to-point (P2P) interconnections between electronic control units (ECUs) has been established. To improve the detection speed and margin of the proposed receiver, we also propose a novel clock and data recovery (CDR) scheme with a 0.5 unit-interval (UI) tuning range and a PWM generator utilizing 10 equally-spaced phases. The feasibility of the proposed system has been proved by successfully detecting 1.25 Gb/s data delivered via 3 ECUs and inter-channels in 180 nm CMOS technology. Compared to the previous study, the proposed system achieved better efficiency in terms of power, cost, and reliability.

  • Acquisition of the Width of a Virtual Body through Collision Avoidance Trials

    Yoshiaki SAITO  Kazumasa KAWASHIMA  Masahito HIRAKAWA  

    PAPER-Human-computer Interaction

    E104-D No:5

    The progress of immersive technology enables researchers and developers to construct work spaces that are freed from real-world constraints. This has motivated us to investigate the role of the human body. In this research, we examine human cognitive behaviors in obtaining an understanding of the width of their virtual body through simple yet meaningful experiments using virtual reality (VR). In the experiments, participants were modeled as an invisible board, and a spherical object was thrown at the participants to provide information for exploring the width of their invisible body. Audio and visual feedback were provided when the object came into contact with the board (body). We first explored how precisely the participants perceived the virtual body width. Next, we examined how the body perception was generated and changed as the trial proceeded when the participants tried to move right or left actively for the avoidance of collision with approaching objects. The results of the experiments indicated that the participants could become successful in avoiding collision within a limited number of trials (14 at most) under the experimental conditions. It was also found that they postponed deciding how much they should move at the beginning and then started taking evasive action earlier as they became aware of the virtual body.

  • A Low Complexity CFO Estimation Method for UFMC Systems

    Hui ZHANG  Bin SHENG  Pengcheng ZHU  

    PAPER-Wireless Communication Technologies

    E104-B No:2

    Universal filtered multicarrier (UFMC) systems offer a flexibility of filtering sub-bands with arbitrary bandwidth to suppress out-of-band (OoB) emission, while keeping the orthogonality between subcarriers in one sub-band. Oscillator discrepancies between the transmitter and receiver induce carrier frequency offset (CFO) in practical systems. In this paper, we propose a novel CFO estimation method for UFMC systems that has very low computational complexity and can then be used in practical systems. In order to fully exploit the coherence bandwidth of the channel, the training symbols are designed to have several identical segments in the frequency domain. As a result, the integral part of CFO can be estimated by simply determining the correlation between received signal and the training symbol. Simulation results show that the proposed method can achieve almost the same performance as an existing method and even a better performance in channels that have small decay parameter values. The proposed method can also be used in other multicarrier systems, such as orthogonal frequency division multiplexing (OFDM).

  • NOMA-Based Optimal Multiplexing for Multiple Downlink Service Channels to Maximize Integrated System Throughput Open Access

    Teruaki SHIKUMA  Yasuaki YUDA  Kenichi HIGUCHI  

    PAPER-Wireless Communication Technologies

    E103-B No:11

    We propose a novel non-orthogonal multiple access (NOMA)-based optimal multiplexing method for multiple downlink service channels to maximize the integrated system throughput. In the fifth generation (5G) mobile communication system, the support of various wireless communication services such as massive machine-type communications (mMTC), ultra-reliable low latency communications (URLLC), and enhanced mobile broadband (eMBB) is expected. These services will serve different numbers of terminals and have different requirements regarding the spectrum efficiency and fairness among terminals. Furthermore, different operators may have different policies regarding the overall spectrum efficiency and fairness among services. Therefore, efficient radio resource allocation is essential during the multiplexing of multiple downlink service channels considering these requirements. The proposed method achieves better system performance than the conventional orthogonal multiple access (OMA)-based multiplexing method thanks to the wider transmission bandwidth per terminal and inter-terminal interference cancellation using a successive interference canceller (SIC). Computer simulation results reveal that the effectiveness of the proposed method is especially significant when the system prioritizes the fairness among terminals (including fairness among services).
