The search functionality is under construction.

Keyword Search Result

[Keyword] PAR(2741hit)

81-100hit(2741hit)

  • SeCAM: Tightly Accelerate the Image Explanation via Region-Based Segmentation

    Phong X. NGUYEN  Hung Q. CAO  Khang V. T. NGUYEN  Hung NGUYEN  Takehisa YAIRI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2022/05/11
      Vol:
    E105-D No:8
      Page(s):
    1401-1417

    In recent years, artificial intelligence has increasingly been applied in many different fields, with a profound and direct impact on human life. This raises the need to understand how models arrive at their predictions. Since most current high-precision models are black boxes, neither AI scientists nor end-users fully understand what happens inside them. Consequently, many algorithms have been developed to explain AI models, especially for image classification in computer vision, such as LIME, CAM, and GradCAM. However, these algorithms still have limitations, such as LIME's long execution time and the lack of concreteness and clarity in CAM's explanations. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM). This method combines the advantages of the algorithms above while simultaneously overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, InceptionV3, and VGG16, on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data set. Outstanding results were achieved: the algorithm met all the requirements for a specific explanation in a remarkably short time.
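    A minimal sketch of the general idea of explaining a prediction at the level of segmented regions rather than individual pixels: a CAM heatmap is averaged inside each segment, and the highest-scoring segments are kept as the explanation. The `cam` and `labels` arrays are hypothetical toy inputs (in practice the labels might come from a superpixel algorithm), and this is not the authors' SeCAM implementation.

```python
from collections import defaultdict


def segment_scores(cam, labels):
    """Average the CAM activation inside each segment (region)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for cam_row, label_row in zip(cam, labels):
        for value, segment in zip(cam_row, label_row):
            total[segment] += value
            count[segment] += 1
    return {segment: total[segment] / count[segment] for segment in total}


def explanation_mask(cam, labels, top_k=1):
    """Keep the top_k segments with the highest mean activation as a 0/1 mask."""
    scores = segment_scores(cam, labels)
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return [[1 if segment in keep else 0 for segment in row] for row in labels]


# Hypothetical 3x4 CAM heatmap and a segmentation map with three regions.
cam = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.2],
]
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]
print(explanation_mask(cam, labels, top_k=1))
# [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
```

    Scoring whole regions means the explanation is as many evaluations as there are segments, not pixels, which is one plausible reading of why a segmentation-based approach can be fast.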

  • LDPC Codes for Communication Systems: Coding Theoretic Perspective Open Access

    Takayuki NOZAKI  Motohiko ISAKA  

     
    INVITED SURVEY PAPER-Fundamental Theories for Communications

      Publicized:
    2022/02/10
      Vol:
    E105-B No:8
      Page(s):
    894-905

    Low-density parity-check (LDPC) codes are widely used in communication systems for their high error-correcting performance. This survey introduces the elements of LDPC codes: decoding algorithms, code construction, encoding algorithms, and several classes of LDPC codes.
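    As an illustration of the simplest kind of LDPC decoding algorithm, the sketch below implements hard-decision bit flipping: it computes the syndrome, and flips every bit for which a majority of its parity checks are unsatisfied. The 4x6 parity-check matrix `H` is a hypothetical toy (far too small to be a practical LDPC code), and the survey's own presentation of decoding may differ.

```python
def syndrome(H, word):
    """Parity checks of word against H over GF(2); all zeros means a valid codeword."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def bit_flip_decode(H, received, max_iters=20):
    """Hard-decision bit-flipping decoder. Returns (word, success)."""
    word = list(received)
    n = len(word)
    col_weight = [sum(row[j] for row in H) for j in range(n)]
    for _ in range(max_iters):
        s = syndrome(H, word)
        if not any(s):
            return word, True
        # Number of unsatisfied checks each bit participates in.
        unsat = [sum(H[i][j] * s[i] for i in range(len(H))) for j in range(n)]
        # Flip bits for which a strict majority of their checks fail.
        flips = [j for j in range(n) if 2 * unsat[j] > col_weight[j]]
        if not flips:
            break
        for j in flips:
            word[j] ^= 1
    return word, not any(syndrome(H, word))


# Toy parity-check matrix: every column has weight 2, every row weight 3.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
# Codeword [0, 0, 1, 0, 1, 1] received with its first bit in error.
print(bit_flip_decode(H, [1, 0, 1, 0, 1, 1]))  # ([0, 0, 1, 0, 1, 1], True)
```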

  • Blind Signal Separation for Array Radar Measurement Using Mathematical Model of Pulse Wave Propagation Open Access

    Takuya SAKAMOTO  

     
    PAPER-Sensing

      Publicized:
    2022/02/18
      Vol:
    E105-B No:8
      Page(s):
    981-989

    This paper presents a novel blind signal separation method for the measurement of pulse waves at multiple body positions using an array radar system. The proposed method is based on a mathematical model of pulse wave propagation. The model relies on three factors: (1) a small displacement approximation, (2) beam pattern orthogonality, and (3) an impulse response model of pulse waves. The separation of radar echoes is formulated as an optimization problem, and the associated objective function is established using the mathematical model. We evaluate the performance of the proposed method using measured radar data from participants lying in a prone position. The accuracy of the proposed method, in terms of estimating the body displacements, is measured using reference data taken from laser displacement sensors. The average estimation errors are found to be 10-21% smaller than those of conventional methods. These results indicate the effectiveness of the proposed method for achieving noncontact measurements of the displacements of multiple body positions.

  • Minimal Paths in a Bicube

    Masaaki OKADA  Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2022/04/22
      Vol:
    E105-D No:8
      Page(s):
    1383-1392

    The rapidly increasing demand for high-performance computation has driven extensive research on massively parallel systems. An interconnection network in a massively parallel system connects a huge number of processing elements so that they can cooperate on tasks by communicating with one another. By regarding processing elements and the links between pairs of them as nodes and edges, respectively, many communication and routing problems in an interconnection network reduce to problems in graph theory. Many topologies have been proposed for the interconnection networks of massively parallel systems. The hypercube is a very popular topology with many variants. The bicube is one such topology: it can interconnect the same number of nodes with the same degree as the hypercube, while its diameter is almost half that of the hypercube. In addition, the bicube is node-symmetric. Hence, we focus on the bicube and propose an algorithm that gives a minimal (shortest) path between an arbitrary pair of nodes. We prove the correctness of the algorithm and demonstrate its execution.
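    The abstract does not define the bicube, so the sketch below shows only the hypercube baseline it is compared against, not the proposed bicube algorithm. In an n-dimensional hypercube, nodes are n-bit addresses and neighbors differ in one bit, so a minimal path flips, one at a time, the bits in which the source and destination differ (dimension-order routing). Path length equals the Hamming distance, and the diameter is n.

```python
def hypercube_minimal_path(src, dst, n):
    """Minimal path from src to dst in an n-dimensional hypercube.

    Nodes are n-bit integers; neighbors differ in exactly one bit.
    Bits are corrected in increasing dimension order (e-cube routing).
    """
    if not (0 <= src < 2 ** n and 0 <= dst < 2 ** n):
        raise ValueError("node out of range")
    path = [src]
    current = src
    diff = src ^ dst
    for dim in range(n):
        if (diff >> dim) & 1:
            current ^= 1 << dim
            path.append(current)
    return path


path = hypercube_minimal_path(0b0000, 0b1011, 4)
print([format(v, "04b") for v in path])
# ['0000', '0001', '0011', '1011']
```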

  • On a Cup-Stacking Concept in Repetitive Collective Communication

    Takashi YOKOTA  Kanemitsu OOTSU  Shun KOJIMA  

     
    LETTER-Computer System

      Publicized:
    2022/04/15
      Vol:
    E105-D No:7
      Page(s):
    1325-1329

    Parallel computing essentially consists of computation and communication, and in many cases communication performance is vital. Many parallel applications use collective communications, which often dominate the performance of parallel execution. This paper focuses on collective communication performance as a way to speed up parallel execution. First, we present experimental results showing that splitting a session of collective communication into small portions (slices) can enable more efficient communication. Then, based on these results, we propose a new concept, cup-stacking, together with a genetic-algorithm-based methodology. Preliminary evaluation results show the effectiveness of the proposed method.

  • A Large-Scale SCMA Codebook Optimization and Codeword Allocation Method

    Shiqing QIAN  Wenping GE  Yongxing ZHANG  Pengju ZHANG  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2021/12/24
      Vol:
    E105-B No:7
      Page(s):
    788-796

    Sparse code division multiple access (SCMA) is a non-orthogonal multiple access (NOMA) technology that can improve frequency band utilization and allow many users to share a small number of resource elements (REs). This paper uses modulation based on lattice theory to develop a systematic construction procedure for SCMA codebooks in Gaussian channel environments that achieves near-optimal designs, especially for large-scale SCMA parameters. However, with large-scale SCMA parameters, the mother constellation (MC) points overlap; this is resolved by partial dimensions transformation (PDT). More importantly, we consider the upper bound on the error probability of signal transmission over AWGN channels and design a codeword allocation method that reduces inter-symbol interference (ISI) on the same RE. Simulation results, verified with two different message passing algorithms (MPAs) under different codebook sizes and overload rates, show that the proposed codebook achieves a significantly lower bit error rate (BER) than the reference codebooks, while its convergence time does not exceed theirs.
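    In SCMA each user spreads its codeword over only a few of the K resource elements, which is what lets J > K users share them. A minimal sketch, assuming the frequently used regular configuration of K = 4 REs, J = 6 users, and N = 2 nonzero REs per codeword (illustrative values, not taken from the paper), builds the RE-user indicator matrix that defines the factor graph on which MPA detection runs, and computes the overload rate J/K.

```python
from itertools import combinations


def scma_factor_graph(num_res, res_per_user):
    """RE-user indicator matrix F for a regular SCMA system.

    Each user (column) occupies a distinct combination of res_per_user REs,
    so there are C(num_res, res_per_user) users; F[k][j] = 1 if user j uses RE k.
    """
    users = list(combinations(range(num_res), res_per_user))
    return [[1 if k in user else 0 for user in users] for k in range(num_res)]


F = scma_factor_graph(num_res=4, res_per_user=2)
num_users = len(F[0])
overload = num_users / len(F)
for row in F:
    print(row)
print("users per RE:", [sum(row) for row in F])
print("overload: {:.0%}".format(overload))  # overload: 150%
```

    Each RE row has weight 3, so an MPA detector at each RE node only has to consider the combinations of three colliding users.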

  • High Accuracy Test Techniques with Fine Pattern Generator and Ramp Test Circuit for CMOS Image Sensor

    Fukashi MORISHITA  Wataru SAITO  Norihito KATO  Yoichi IIZUKA  Masao ITO  

     
    PAPER

      Publicized:
    2022/01/14
      Vol:
    E105-C No:7
      Page(s):
    316-323

    This paper proposes novel test techniques for high-accuracy measurement of the ADCs and ramp generator on a CMOS image sensor (CIS) chip. The test circuit for the ADCs has a dual path and can act as a multi-functional fine pattern generator that defines an arbitrary input for each column, so that CIS-specific characteristics can be evaluated electrically. The test circuit for the ramp generator realizes an on-chip current cell test and rejects current cell failures within 1 LSB accuracy. We fabricated a test sensor in a 55 nm CIS process and measured the IP characteristics. The measured results show an INL of 14.6 LSB, crosstalk of 14.9 LSB, and column interference noise of 5.4 LSB, which agree with the designed values. Using this technique, we confirmed that accurate ADC measurement can be realized without being affected by the ambiguity of the optical input.
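    Figures such as the INL above are commonly derived from measured code-transition levels. A minimal sketch of that calculation, assuming an endpoint-fit definition of the ideal LSB and hypothetical transition voltages (not the paper's data or method):

```python
def dnl_inl(transitions):
    """DNL and INL (in LSB) from measured ADC code-transition voltages.

    transitions[k] is the input at which the output changes from code k to k+1.
    The ideal LSB is taken from a straight line through the two endpoints.
    """
    n = len(transitions)
    lsb = (transitions[-1] - transitions[0]) / (n - 1)
    widths = [b - a for a, b in zip(transitions, transitions[1:])]
    dnl = [w / lsb - 1 for w in widths]
    inl = [(t - (transitions[0] + k * lsb)) / lsb for k, t in enumerate(transitions)]
    return dnl, inl


# Hypothetical transition voltages (mV) for a few codes of one column ADC.
transitions = [0.0, 1.0, 2.2, 3.1, 4.0]
dnl, inl = dnl_inl(transitions)
print("max |DNL| = %.2f LSB" % max(abs(d) for d in dnl))
print("max |INL| = %.2f LSB" % max(abs(i) for i in inl))
```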

  • 32-Bit ALU with Clockless Gates for RSFQ Bit-Parallel Processor Open Access

    Takahiro KAWAGUCHI  Naofumi TAKAGI  

     
    INVITED PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    245-250

    A 32-bit arithmetic logic unit (ALU) is designed for a rapid single flux quantum (RSFQ) bit-parallel processor. In the ALU, some clocked gates are replaced by clockless gates, which reduces the number of D flip-flops (DFFs) required for path balancing. The number of clocked gates, including DFFs, is reduced by approximately 40%, and the clock distribution network becomes smaller. The number of pipeline stages also becomes modest. The layout design of the ALU and simulation results show the effectiveness of using clockless gates in wide datapath circuits.
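    A toy model of why clockless gates reduce path-balancing DFFs: if every clocked gate occupies one pipeline stage, all inputs of a gate must arrive in the same stage, so each shorter path is padded with one DFF per missing stage. Making some gates clockless collapses stages and removes that padding. The netlist is hypothetical, DFFs are counted per edge without sharing between fanouts, and the timing constraints of real clockless RSFQ gates are ignored.

```python
from functools import lru_cache


def balance_dffs(fanins, clocked):
    """Count path-balancing DFFs and pipeline depth in a gate-level DAG.

    fanins[g] lists the nodes driving g (primary inputs have none).
    clocked[g] is True for a clocked gate (adds one pipeline stage) and
    False for a clockless gate (adds none).
    """
    @lru_cache(maxsize=None)
    def stage(g):
        if not fanins[g]:
            return 0
        return max(stage(f) for f in fanins[g]) + (1 if clocked[g] else 0)

    dffs = 0
    for g, sources in fanins.items():
        if sources:
            ready = max(stage(f) for f in sources)
            dffs += sum(ready - stage(f) for f in sources)  # one DFF per missing stage
    depth = max(stage(g) for g in fanins)
    return dffs, depth


# Hypothetical netlist: a long path a, b -> g1 -> g2 -> g3 plus short paths from a and b.
fanins = {
    "a": (), "b": (),
    "g1": ("a", "b"),
    "g2": ("g1", "b"),
    "g3": ("g2", "a"),
}
print(balance_dffs(fanins, {"g1": True, "g2": True, "g3": True}))    # (3, 3)
print(balance_dffs(fanins, {"g1": False, "g2": False, "g3": True}))  # (0, 1)
```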

  • A High-Speed Interface Based on a Josephson Latching Driver for Adiabatic Quantum-Flux-Parametron Logic

    Fumihiro CHINA  Naoki TAKEUCHI  Hideo SUZUKI  Yuki YAMANASHI  Hirotaka TERAI  Nobuyuki YOSHIKAWA  

     
    PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    264-269

    The adiabatic quantum flux parametron (AQFP) is an energy-efficient, high-speed superconducting logic device. To observe the tiny output currents from the AQFP in experiments, high-speed voltage drivers are indispensable. In the present study, we develop a compact voltage driver for AQFP logic based on a Josephson latching driver (JLD), which has been used as a high-speed driver for rapid single-flux-quantum (RSFQ) logic. In the JLD-based voltage driver, the signal currents of AQFP gates are converted into gap-voltage-level signals via an AQFP/RSFQ interface and a four-junction logic gate. Furthermore, this voltage driver includes only 15 Josephson junctions, far fewer than the 60 junctions of the previously designed driver based on dc superconducting quantum interference devices. In measurements, we successfully operated the JLD-based voltage driver at up to 4 GHz. We also evaluated the bit error rate (BER) of the driver and found it to be 7.92×10⁻¹⁰ at 1 GHz and 2.67×10⁻³ at 4 GHz.

  • Adiabatic Quantum-Flux-Parametron with Delay-Line Clocking Using Square Excitation Currents

    Taiki YAMAE  Naoki TAKEUCHI  Nobuyuki YOSHIKAWA  

     
    PAPER

      Publicized:
    2022/01/19
      Vol:
    E105-C No:6
      Page(s):
    277-282

    The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic device. In a previous study, we proposed a low-latency clocking scheme called delay-line clocking, and several low-latency AQFP logic gates have been demonstrated. In delay-line clocking, the latency between adjacent excitation phases is determined by the propagation delay of the excitation currents, so the rise time of the excitation currents must be sufficiently short; otherwise, an AQFP gate can switch before the previous gate is fully excited. This means that delay-line clocking requires high clock frequencies, because typical excitation currents are sinusoidal and their rise time depends on the frequency. However, AQFP circuits need to be tested experimentally over a wide frequency range. Hence, in the present study, we investigate AQFP circuits that adopt delay-line clocking with square excitation currents, in order to apply delay-line clocking in a low frequency range. Square excitation currents have a shorter rise time than sinusoidal ones and thus enable low-frequency operation. We demonstrate an AQFP buffer chain with delay-line clocking using square excitation currents, in which the latency is approximately 20 ps per gate, and confirm that the operating margin of the buffer chain remains sufficiently wide at clock frequencies below 1 GHz, whereas in the sinusoidal case the operating margin shrinks below 500 MHz. These results indicate that AQFP circuits adopting delay-line clocking can operate in a low frequency range by using square excitation currents.
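    The frequency dependence of the sinusoidal rise time can be estimated directly. Assuming the rise time is measured between 10% and 90% of a full swing from -A to +A (an assumed definition, not necessarily the paper's), a sinusoid at frequency f rises in Δt = [arcsin(0.8) - arcsin(-0.8)] / (2πf), which grows as f drops, whereas the edge time of a square wave does not depend on f.

```python
import math


def sine_rise_time(freq_hz, low=0.1, high=0.9):
    """Rise time between the low and high fractions of a full -A..+A sinusoidal swing."""
    v_low = 2.0 * low - 1.0    # 10% of the swing -> -0.8 A
    v_high = 2.0 * high - 1.0  # 90% of the swing -> +0.8 A
    return (math.asin(v_high) - math.asin(v_low)) / (2.0 * math.pi * freq_hz)


for f in (5e9, 1e9, 500e6):
    print(f"{f / 1e9:.1f} GHz: {sine_rise_time(f) * 1e12:.0f} ps")
```

    Under this definition the sinusoidal rise time is roughly 59 ps at 5 GHz, 295 ps at 1 GHz, and 590 ps at 500 MHz, far longer than the approximately 20 ps per-gate latency reported above at the lower frequencies.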

  • Development of Quantum Annealer Using Josephson Parametric Oscillators Open Access

    Tomohiro YAMAJI  Masayuki SHIRANE  Tsuyoshi YAMAMOTO  

     
    INVITED PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    283-289

    A Josephson parametric oscillator (JPO) is an interesting system from the viewpoint of quantum optics because it has two stable self-oscillating states and can deterministically generate quantum cat states. A theoretical proposal has been made to operate a network of multiple JPOs as a quantum annealer, which can adiabatically solve combinatorial optimization problems at high speed. Proof-of-concept experiments toward application to quantum computation have been actively conducted. This article reviews the mechanism of JPOs and their application as a quantum annealer.
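    Annealers of this kind are usually described as searching for the ground state of an Ising Hamiltonian, with each oscillator's two stable states playing the role of a spin s = ±1. A minimal sketch, assuming that standard Ising form with hypothetical coupling and field values, finds the ground state by exhaustive search over all 2^n configurations, which is exactly what becomes infeasible as the number of spins grows.

```python
from itertools import product


def ising_energy(spins, J, h):
    """E(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i with s_i in {-1, +1}."""
    n = len(spins)
    coupling = sum(J[i][j] * spins[i] * spins[j]
                   for i in range(n) for j in range(i + 1, n))
    field = sum(h[i] * spins[i] for i in range(n))
    return -coupling - field


def brute_force_ground_state(J, h):
    """Exhaustive search over all 2**n spin configurations (feasible only for tiny n)."""
    return min(product((-1, 1), repeat=len(h)), key=lambda s: ising_energy(s, J, h))


# Hypothetical 3-spin problem: spins 0 and 1 prefer to align, spins 1 and 2 to
# anti-align, and a small field biases spin 0 toward +1.
J = [[0, 1, 0],
     [0, 0, -1],
     [0, 0, 0]]
h = [0.5, 0, 0]
ground = brute_force_ground_state(J, h)
print(ground, ising_energy(ground, J, h))  # (1, 1, -1) -2.5
```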

  • A 16-Bit Parallel Prefix Carry Look-Ahead Kogge-Stone Adder Implemented in Adiabatic Quantum-Flux-Parametron Logic

    Tomoyuki TANAKA  Christopher L. AYALA  Nobuyuki YOSHIKAWA  

     
    PAPER

      Publicized:
    2022/01/19
      Vol:
    E105-C No:6
      Page(s):
    270-276

    Extremely energy-efficient logic devices are required for future low-power high-performance computing systems. Superconductor electronic technology has a number of energy-efficient logic families. Among them is the adiabatic quantum-flux-parametron (AQFP) logic family, which adiabatically switches the quantum-flux-parametron (QFP) circuit when it is excited by an AC power-clock. When compared to state-of-the-art CMOS technology, AQFP logic circuits have the advantage of relatively fast clock rates (5 GHz to 10 GHz) and a 5 to 6 orders of magnitude reduction in energy before cooling overhead. We have been developing extremely energy-efficient computing processor components using the AQFP. The adder is the most basic computational unit and is important in the development of a processor. In this work, we designed and measured a 16-bit parallel prefix carry look-ahead Kogge-Stone adder (KSA). We fabricated the circuit using the AIST 10 kA/cm² High-speed STandard Process (HSTP). Due to a malfunction in the measurement system, we were not able to confirm the complete operation of the circuit at the low frequency of 100 kHz in liquid He, but we confirmed that the outputs that we did observe are correct for two types of tests: (1) critical tests and (2) 110 random input tests in total. The operation margin of the circuit is wide, and we did not observe any calculation errors during measurement.
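    The parallel-prefix structure of a Kogge-Stone adder can be modeled at bit level in a few lines. This is a minimal software sketch of the standard algorithm, not of the AQFP circuit: it computes generate and propagate signals, combines them over log2(16) = 4 prefix levels, and forms the sum bits from the resulting carries (carry-in assumed to be 0).

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with a Kogge-Stone parallel-prefix carry network.

    Returns (sum mod 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        # One prefix level: combine the (G, P) pair at i with the pair at i - dist.
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    # G[i] is now the carry out of bit i, i.e. the carry into bit i + 1.
    carries = [0] + G[:-1]
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i
    return s, G[width - 1]


print(kogge_stone_add(1234, 4321))  # (5555, 0)
print(kogge_stone_add(0xFFFF, 1))   # (0, 1)
```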

  • In Search of the Performance- and Energy-Efficient CNN Accelerators Open Access

    Stanislav SEDUKHIN  Yoichi TOMIOKA  Kohei YAMAMOTO  

     
    PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    209-221

    In this paper, starting from the algorithm, we systematically search for and evaluate a performance- and energy-efficient 3D structure, or shape, of a Tensor Processing Engine (TPE) for CNN acceleration. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Because the design supports inter-block image data independence, multiple such TPEs can be used for additional CNN acceleration. Moreover, we show that the proposed TPE can be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the results of a real implementation of an SSD model to which a state-of-the-art channel pruning technique is applied.
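    A rough sketch of why the 3D shape of an accelerator matters: if a convolution's output-channel, input-channel, and output-width loops are tiled onto an a × b × c array of MAC units, one tile per cycle (a hypothetical mapping, not necessarily the TPE's), any dimension that is not a multiple of the array size leaves units idle. The layer is VGG16's first convolution (224×224 output, 3 input and 64 output channels, 3×3 kernel), and all three candidate shapes contain 4096 MAC units.

```python
import math


def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate operations in one k x k convolution layer."""
    return h_out * w_out * c_in * c_out * k * k


def utilization(h_out, w_out, c_in, c_out, k, shape):
    """Fraction of useful MACs when (c_out, c_in, w_out) is tiled onto an
    a x b x c array of MAC units, one tile per clock cycle (hypothetical mapping)."""
    a, b, c = shape
    cycles = (math.ceil(c_out / a) * math.ceil(c_in / b) * math.ceil(w_out / c)
              * h_out * k * k)
    return conv_macs(h_out, w_out, c_in, c_out, k) / (cycles * a * b * c)


# VGG16 conv1_1: 224x224 output, 3 -> 64 channels, 3x3 kernel.
layer = dict(h_out=224, w_out=224, c_in=3, c_out=64, k=3)
print(conv_macs(**layer))  # 86704128
for shape in [(16, 16, 16), (16, 4, 16), (64, 1, 64)]:
    print(shape, round(utilization(**layer, shape=shape), 4))
```

    The same number of MAC units yields 18.75%, 75%, and 87.5% utilization on this layer, so the "redundant operations" of a poorly shaped array can dominate.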

  • Supervised Audio Source Separation Based on Nonnegative Matrix Factorization with Cosine Similarity Penalty Open Access

    Yuta IWASE  Daichi KITAMURA  

     
    PAPER-Engineering Acoustics

      Publicized:
    2021/12/08
      Vol:
    E105-A No:6
      Page(s):
    906-913

    In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by employing small supervised signals. In particular, penalized SNMF (PSNMF) with orthogonality penalty is an effective method. PSNMF forces two basis matrices for target and nontarget sources to be orthogonal to each other and improves the separation accuracy. However, the conventional orthogonality penalty is based on an inner product and does not affect the estimation of the basis matrix properly because of the scale indeterminacy between the basis and activation matrices in NMF. To cope with this problem, a new PSNMF with cosine similarity between the basis matrices is proposed. The experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
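    The scale indeterminacy mentioned above means a basis vector can be shrunk by any factor while its activation grows by the same factor, leaving the product unchanged. A minimal sketch, assuming the penalty is a sum of squared inner products (or squared cosine similarities) over all pairs of target and nontarget basis vectors (the paper's exact formulation may differ), shows that the inner-product penalty can be reduced just by rescaling, whereas the cosine penalty cannot.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def inner_product_penalty(target_basis, other_basis):
    """Sum of squared inner products between all pairs of basis vectors."""
    return sum(dot(f, g) ** 2 for f in target_basis for g in other_basis)


def cosine_penalty(target_basis, other_basis):
    """Sum of squared cosine similarities: insensitive to the scale of each vector."""
    return sum((dot(f, g) / math.sqrt(dot(f, f) * dot(g, g))) ** 2
               for f in target_basis for g in other_basis)


# Hypothetical bases, each given as a list of column vectors.
target_basis = [[1.0, 2.0, 0.0]]
other_basis = [[0.5, 1.0, 1.0]]
# Same directions, 10x smaller; the activations would absorb the factor 10,
# leaving the modeled mixture unchanged.
other_scaled = [[0.1 * x for x in col] for col in other_basis]

print(inner_product_penalty(target_basis, other_basis),
      inner_product_penalty(target_basis, other_scaled))
print(cosine_penalty(target_basis, other_basis),
      cosine_penalty(target_basis, other_scaled))
```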

  • Particle Filter Design Based on Reinforcement Learning and Its Application to Mobile Robot Localization

    Ryota YOSHIMURA  Ichiro MARUTA  Kenji FUJIMOTO  Ken SATO  Yusuke KOBAYASHI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2022/01/28
      Vol:
    E105-D No:5
      Page(s):
    1010-1023

    Particle filters have been widely used for state estimation problems in nonlinear and non-Gaussian systems. Their performance depends on the given system and measurement models, which need to be designed by the user for each target system. This paper proposes a novel method to design these models for a particle filter. It is a numerical optimization method in which the particle filter design process is recast in the framework of reinforcement learning by assigning the randomness included in both models of the particle filter to the policy of reinforcement learning. In this method, estimation by the particle filter is performed repeatedly, and the parameters that determine both models are gradually updated according to the estimation results. Its advantage is that it can optimize various objective functions, such as the estimation accuracy of the particle filter, the variance of the particles, the likelihood of the parameters, and a regularization term on the parameters. We derive conditions that guarantee that the optimization converges with probability 1. Furthermore, to show that the proposed method can be applied to practical-scale problems, we design a particle filter for mobile robot localization, an essential technology for autonomous navigation. Numerical simulations demonstrate that the proposed method further improves localization accuracy compared with the conventional method.
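    For reference, the kind of filter whose models such a method would tune is sketched below: a standard bootstrap particle filter for a hypothetical 1D system x_t = x_{t-1} + 1 + w_t, y_t = x_t + v_t with Gaussian noise. The noise levels are fixed by hand here; in the proposed method, model parameters like these would instead be learned. This does not implement the reinforcement-learning design procedure itself.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles, q_std, r_std, rng):
    """Bootstrap particle filter for x_t = x_{t-1} + 1 + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q_std^2) and v_t ~ N(0, r_std^2); the initial state is assumed
    known to be 0. Returns the posterior-mean estimate at each step.
    """
    particles = [0.0] * n_particles
    estimates = []
    for y in observations:
        # Predict: push every particle through the system model.
        particles = [p + 1.0 + rng.gauss(0.0, q_std) for p in particles]
        # Update: weight particles by the Gaussian measurement likelihood.
        weights = [math.exp(-0.5 * ((y - p) / r_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample (multinomial) to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


def rmse(estimates, truth):
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))


rng = random.Random(0)
truth, observations = [], []
x = 0.0
for _ in range(100):
    x += 1.0 + rng.gauss(0.0, 0.5)
    truth.append(x)
    observations.append(x + rng.gauss(0.0, 2.0))

estimates = bootstrap_particle_filter(observations, 500, 0.5, 2.0, rng)
print(f"raw measurement RMSE: {rmse(observations, truth):.2f}")
print(f"particle filter RMSE: {rmse(estimates, truth):.2f}")
```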

  • A Performance Model for Reconfigurable Block Cipher Array Utilizing Amdahl's Law

    Tongzhou QU  Zibin DAI  Yanjiang LIU  Lin CHEN  Xianzhao XIA  

     
    PAPER-Computer System

      Publicized:
    2022/02/17
      Vol:
    E105-D No:5
      Page(s):
    964-972

    Existing research on Amdahl's law is limited to multi/many-core processors and cannot be applied to coarse-grained reconfigurable arrays, an important parallel processing architecture. This paper studies the relation between the multi-level parallelism of block cipher algorithms and the architectural characteristics of coarse-grained reconfigurable arrays. We introduce into Amdahl's law the key variables that affect the performance of reconfigurable arrays, such as communication overhead and configuration overhead. On this basis, we propose a performance model for the coarse-grained reconfigurable block cipher array (CGRBA) based on the extended Amdahl's law. In addition, this paper establishes an integer nonlinear programming optimization model, which can provide a parameter reference for the architecture design of CGRBA. The experimental results show that (1) reducing the communication workload ratio and reasonably increasing the number of configuration pages can significantly improve algorithm performance on CGRBA, and (2) the communication workload ratio has a linear effect on the execution time.
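    Classic Amdahl's law gives the speedup on n units as S = 1 / ((1 - p) + p/n), where p is the parallelizable fraction. The sketch below adds communication and configuration overheads as extra fractions of the sequential execution time; this additive form is an assumption for illustration, not the paper's extended model. In this toy form the normalized execution time 1/S grows linearly with the communication fraction.

```python
def amdahl_speedup(p, n):
    """Classic Amdahl's law: p = parallelizable fraction, n = processing units."""
    return 1.0 / ((1.0 - p) + p / n)


def extended_speedup(p, n, comm, config):
    """Hypothetical extension: communication (comm) and reconfiguration (config)
    overheads, each a fraction of the sequential execution time, are added to the
    normalized execution time."""
    return 1.0 / ((1.0 - p) + p / n + comm + config)


for n in (4, 16, 64):
    print(n, round(amdahl_speedup(0.95, n), 2),
          round(extended_speedup(0.95, n, 0.02, 0.01), 2))
```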

  • Performance Evaluation of Bluetooth Low Energy Positioning Systems When Using Sparse Training Data

    Tetsuya MANABE  Kosuke OMURA  

     
    PAPER

      Publicized:
    2021/11/01
      Vol:
    E105-A No:5
      Page(s):
    778-786

    This paper evaluates Bluetooth Low Energy (BLE) positioning systems that use sparse training data, through comparison experiments. The sparse training data are extracted from a database containing enough data to achieve highly accurate and precise positioning. First, we define sparse training data for BLE positioning systems in terms of the data collection time and the numbers of smartphones, directions, beacons, and reference points. Next, positioning performance evaluation experiments are conducted in two indoor environments: a corridor, as a one-dimensionally spread environment, and a hall, as a two-dimensionally spread environment. The algorithms compared are the conventional fingerprint algorithm and the hybrid algorithm previously proposed by the authors, which combines the proximity algorithm and the fingerprint algorithm. The results confirm that the hybrid algorithm performs well in many cases even with sparse training data, demonstrating its robustness to sparse training data.
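    A minimal sketch of a conventional fingerprint algorithm, implemented here as weighted k-nearest neighbors in RSSI space: the position estimate is a weighted average of the reference points whose training fingerprints are closest to the current measurement. The floor value for unheard beacons, the inverse-distance weights, and the corridor data are all assumptions, and the authors' hybrid algorithm is not reproduced.

```python
import math

FLOOR_DBM = -100.0  # assumed RSSI for a beacon that was not heard


def rssi_distance(observed, reference):
    """Euclidean distance between two {beacon_id: RSSI} fingerprints."""
    beacons = set(observed) | set(reference)
    return math.sqrt(sum((observed.get(b, FLOOR_DBM) - reference.get(b, FLOOR_DBM)) ** 2
                         for b in beacons))


def knn_fingerprint(observed, fingerprints, k=3):
    """Weighted k-nearest-neighbor position estimate in RSSI space.

    observed: {beacon_id: RSSI in dBm} measured at the unknown position.
    fingerprints: [((x, y), {beacon_id: mean RSSI}), ...] from the training phase.
    """
    ranked = sorted(fingerprints, key=lambda fp: rssi_distance(observed, fp[1]))[:k]
    weights = [1.0 / (rssi_distance(observed, rssi) + 1e-6) for _, rssi in ranked]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, ranked)) / total
    return x, y


# Hypothetical corridor with four reference points and two beacons.
fingerprints = [
    ((0.0, 0.0), {"A": -50, "B": -80}),
    ((2.0, 0.0), {"A": -60, "B": -70}),
    ((4.0, 0.0), {"A": -70, "B": -60}),
    ((6.0, 0.0), {"A": -80, "B": -50}),
]
print(knn_fingerprint({"A": -65, "B": -65}, fingerprints, k=2))
```

    With sparse training data, fewer reference points and beacons are available to rank, which is where such a purely fingerprint-based estimate is expected to degrade.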

  • Fully Connected Imaging Network for Near-Field Synthetic Aperture Interferometric Radiometer

    Zhimin GUO  Jianfei CHEN  Sheng ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2022/02/09
      Vol:
    E105-D No:5
      Page(s):
    1120-1124

    Millimeter-wave synthetic aperture interferometric radiometers (SAIRs) are very powerful instruments that can effectively achieve high-precision imaging detection. However, due to interference factors and complex near-field errors, the imaging quality of near-field SAIR is usually not ideal. To achieve better imaging results, a new fully connected imaging network (FCIN) is proposed for near-field SAIR. In FCIN, a fully connected network first reconstructs the image directly from the visibility function, and then a residual dense network is used for image denoising and enhancement. Simulation results show that the proposed FCIN method achieves high imaging accuracy and shortens imaging time.

  • Feature Selection and Parameter Optimization of Support Vector Machines Based on a Local Search Based Firefly Algorithm for Classification of Formulas in Traditional Chinese Medicine Open Access

    Wen SHI  Jianling LIU  Jingyu ZHANG  Yuran MEN  Hongwei CHEN  Deke WANG  Yang CAO  

     
    LETTER-Algorithms and Data Structures

      Publicized:
    2021/11/16
      Vol:
    E105-A No:5
      Page(s):
    882-886

    Syndrome is a crucial principle of Traditional Chinese Medicine. Formula classification is an effective approach to discover herb combinations for the clinical treatment of syndromes. In this study, a local search based firefly algorithm (LSFA) for parameter optimization and feature selection of support vector machines (SVMs) for formula classification is proposed. Parameters C and γ of SVMs are optimized by LSFA. Meanwhile, the effectiveness of herbs in formula classification is adopted as a feature. LSFA searches for well-performing subsets of features to maximize classification accuracy. In LSFA, a local search of fireflies is developed to improve FA. Simulations demonstrate that the proposed LSFA-SVM algorithm outperforms other classification algorithms on different datasets. Parameters C and γ and the features are optimized by LSFA to obtain better classification performance. The performance of FA is enhanced by the proposed local search mechanism.
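    A minimal sketch of the standard firefly algorithm (without the paper's local-search extension or feature selection), minimizing a hypothetical smooth surrogate for cross-validation error over (log2 C, log2 γ). Dimmer fireflies move toward brighter (lower-cost) ones with an attractiveness that decays with distance. Note that the firefly light-absorption coefficient, `gamma` in the code, is unrelated to the SVM kernel parameter γ; all parameter values are assumptions.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iters=50,
                     beta0=1.0, gamma=0.01, alpha=0.5, seed=0):
    """Standard firefly algorithm: dimmer fireflies move toward brighter ones.

    Brightness is the negated objective (lower cost = brighter). gamma is the
    light-absorption coefficient; alpha scales the random step.
    """
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    values = [objective(x) for x in swarm]
    best = min(range(n_fireflies), key=values.__getitem__)
    best_x, best_v = swarm[best][:], values[best]

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if values[j] < values[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = clip([a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                     for a, b in zip(swarm[i], swarm[j])])
                    values[i] = objective(swarm[i])
                    if values[i] < best_v:
                        best_x, best_v = swarm[i][:], values[i]
        alpha *= 0.95  # shrink the random step over time
    return best_x, best_v


def surrogate(x):
    """Hypothetical stand-in for cross-validation error at (log2 C, log2 gamma)."""
    log_c, log_g = x
    return (log_c - 3.0) ** 2 + (log_g + 5.0) ** 2


bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best_x, best_v = firefly_minimize(surrogate, bounds)
print("log2 C = %.2f, log2 gamma = %.2f, cost = %.4f" % (best_x[0], best_x[1], best_v))
```

    In a real setting the surrogate would be replaced by an actual cross-validated SVM accuracy, which is far more expensive per evaluation; that cost is one motivation for improving the search with mechanisms such as the local search proposed above.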

  • Dynamic Fault Tolerance for Multi-Node Query Processing

    Yutaro BESSHO  Yuto HAYAMIZU  Kazuo GODA  Masaru KITSUREGAWA  

     
    PAPER

      Publicized:
    2022/02/03
      Vol:
    E105-D No:5
      Page(s):
    909-919

    Parallel processing is a typical approach to answering analytical queries on large databases. As the size of the database increases, we often try to increase parallelism by incorporating more processing nodes. However, this approach also increases the probability of node failure. In conventional practice, if a failure occurs during query processing, the database system restarts the query from the beginning. Such a time cost may be unacceptable to the user. This paper proposes a fault-tolerant query processing mechanism, named PhoeniQ, for analytical parallel database systems. PhoeniQ continuously takes checkpoints for every operator pipeline and replicates the output of each stateful operator among different processing nodes. If a single processing node fails during query processing, another node can promptly take over the processing. Hence, PhoeniQ allows the database system to efficiently resume query processing after a partial failure. This paper presents the key design of PhoeniQ and prototype-based experiments demonstrating that PhoeniQ imposes negligible performance overhead and efficiently continues query processing in the face of node failure.
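    A toy illustration of resuming from a replicated checkpoint instead of restarting: a stateful SUM operator checkpoints its state every few batches to a "replica" standing in for a peer node, and after a simulated failure the work resumes from that checkpoint. All names and the single-process simulation are hypothetical simplifications, not PhoeniQ's design.

```python
def run_with_checkpoints(batches, fail_after=None, checkpoint_every=1):
    """Toy stateful SUM operator with periodic checkpoints replicated to a peer.

    If the primary 'node' fails after fail_after batches, the peer resumes from
    the last replicated checkpoint instead of restarting the query from batch 0.
    Returns (result, total_batches_processed).
    """
    replica = {"next_batch": 0, "state": 0}  # checkpoint held on another node
    state, i, processed, failed = 0, 0, 0, False
    while i < len(batches):
        if fail_after is not None and not failed and processed == fail_after:
            failed = True  # simulated node failure: in-memory state is lost
            state, i = replica["state"], replica["next_batch"]
            continue
        state += sum(batches[i])
        processed += 1
        i += 1
        if i % checkpoint_every == 0:
            replica = {"next_batch": i, "state": state}  # checkpoint + replicate
    return state, processed


batches = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(run_with_checkpoints(batches))  # (36, 4)
print(run_with_checkpoints(batches, fail_after=3, checkpoint_every=2))  # (36, 5)
# Restarting from scratch instead would process 3 + 4 = 7 batches.
```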
