Author Search Result

[Author] Masayuki HIROMOTO(12hit)

1-12hit
  • Hardware-Accelerated Secured Naïve Bayesian Filter Based on Partially Homomorphic Encryption

    Song BIAN  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Cryptography and Information Security

      Vol:
    E102-A No:2
      Page(s):
    430-439

    In this work, we provide the first practical secure email filtering scheme based on homomorphic encryption. Specifically, we construct a secure naïve Bayesian filter (SNBF) using the Paillier scheme, a partially homomorphic encryption (PHE) scheme. We first show that SNBF can be implemented with only the additive homomorphism, thus eliminating the need to employ expensive fully homomorphic schemes. In addition, the design space for a specialized hardware architecture realizing SNBF is explored. We utilize a recursive Karatsuba Montgomery structure to accelerate the homomorphic operations, in which multiplications of 2048-bit integers are carried out. In the experiments, both software and hardware versions of the SNBF are implemented. In software, SNBF achieves a 10^4 to 10^5 times runtime reduction and a 10^3 times storage reduction compared with existing fully homomorphic approaches. Instantiating the designed hardware for SNBF yields a further 33x runtime and 1919x power reduction. The proposed hardware implementation classifies an average-length email in under 0.5 s, which is much more practical than existing solutions.
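
    The additive homomorphism mentioned above can be illustrated with a toy Paillier sketch. This is not the paper's SNBF protocol: the tiny primes, the function names, and the scoring layout (encrypted integer weights combined with plaintext word counts) are assumptions made only for illustration. Multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a power k multiplies the plaintext by k, which is enough to accumulate a weighted sum without decrypting.

```python
import math
import random

def keygen(p=1789, q=1861):
    # Toy primes chosen for readability; real deployments use ~1024-bit primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                     # common simple choice of generator
    mu = pow(lam, -1, n)          # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:                   # pick a random r that is a unit modulo n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(pub, c1, c2):
    # Enc(a) * Enc(b) mod n^2 decrypts to a + b (mod n).
    n, _ = pub
    return (c1 * c2) % (n * n)

def scalar_mul(pub, c, k):
    # Enc(a)^k mod n^2 decrypts to k * a (mod n).
    n, _ = pub
    return pow(c, k, n * n)

def encrypted_score(pub, enc_weights, counts):
    # Hypothetical scoring step: sum_i counts[i] * weights[i], computed on ciphertexts.
    total = encrypt(pub, 0)
    for c, k in zip(enc_weights, counts):
        total = add(pub, total, scalar_mul(pub, c, k))
    return total
```

    Plaintexts here must be non-negative integers smaller than n, so real-valued log-probabilities would first need to be scaled and offset into that range.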

  • SimpleZSL: Extremely Simple and Fast Zero-Shot Learning with Nearest Neighbor Classifiers

    Masayuki HIROMOTO  Hisanao AKIMA  Teruo ISHIHARA  Takuji YAMAMOTO  

     
    PAPER-Pattern Recognition

      Publicized:
    2021/10/29
      Vol:
    E105-D No:2
      Page(s):
    396-405

    Zero-shot learning (ZSL) aims to classify images of unseen classes by learning the relationship between visual and semantic features. Existing works have improved recognition accuracy through various approaches, but they employ computationally intensive algorithms that require iterative optimization. In this work, we revisit the primary approach of pattern recognition, i.e., nearest neighbor classifiers, to solve the ZSL task in an extremely simple and fast way, called SimpleZSL. Our algorithm consists of the following three simple techniques: (1) just averaging feature vectors to obtain visual prototypes of seen classes, (2) calculating a pseudo-inverse matrix via singular value decomposition to generate visual features of unseen classes, and (3) inferring unseen classes by a nearest neighbor classifier in which cosine similarity is used to measure the distance between feature vectors. In experiments on common datasets, the proposed method achieves good recognition accuracy at drastically lower computational cost. The proposed method running on a single CPU executes more than 100 times faster than GPU implementations of existing methods with comparable accuracy.
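
    The three steps listed above can be sketched with NumPy as follows. The abstract does not say how the pseudo-inverse is applied, so the least-squares linear map from semantic attributes to visual prototypes used here is an assumption, as are all function and argument names.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    # Step (1): average the feature vectors of each seen class.
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def synthesize_unseen(seen_protos, seen_attrs, unseen_attrs):
    # Step (2), assumed form: least-squares map W with seen_attrs @ W ~= seen_protos.
    # np.linalg.pinv computes the pseudo-inverse via singular value decomposition.
    W = np.linalg.pinv(seen_attrs) @ seen_protos
    return unseen_attrs @ W

def predict(x, protos):
    # Step (3): nearest neighbor under cosine similarity.
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    pn = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return np.argmax(xn @ pn.T, axis=1)
```

    None of the steps involves iterative optimization: the whole procedure is one averaging pass, one pseudo-inverse, and one matrix product per query batch.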

  • Area Efficient Annealing Processor for Ising Model without Random Number Generator

    Hidenori GYOTEN  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Device and Architecture

      Publicized:
    2017/11/17
      Vol:
    E101-D No:2
      Page(s):
    314-323

    An area-efficient FPGA-based annealing processor based on the Ising model is proposed. The proposed processor eliminates random number generators (RNGs) and temperature schedulers, which are key components of conventional annealing processors and occupy a large portion of the design. Instead, a shift-register-based spin flipping scheme keeps the Ising model from getting stuck in locally optimal solutions. An FPGA implementation and software-based evaluation on max-cut problems with a 2D-grid torus structure demonstrate that our annealing processor solves the problems 10 to 10^4 times faster than conventional optimization algorithms while obtaining solutions of equal accuracy.

  • Fast Estimation of NBTI-Induced Delay Degradation Based on Signal Probability

    Song BIAN  Michihiro SHINTANI  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1400-1409

    As technology further scales semiconductor devices, aging-induced device degradation has become one of the major threats to device reliability. Hence, taking aging-induced degradation into account during the design phase can greatly improve the reliability of the manufactured devices. However, accurately estimating the aging effect for extremely large circuits, like processors, is time-consuming. In this research, we focus on negative bias temperature instability (NBTI) as the aging-induced degradation mechanism, and propose a fast and efficient way of estimating NBTI-induced delay degradation by utilizing static timing analysis (STA) and a simulation-based lookup table (LUT). We modeled each type of gate at different degradation levels, load capacitances, and input slews. Using these gate-delay models, path delays of arbitrary circuits can be efficiently estimated. With a typical five-stage pipelined processor as the design target, comparing the delay calculated from the LUT with the reference delay calculated by a commercial circuit simulator, we achieved a 4114-fold speedup with a delay error within 5.6%.

  • Utilization of Path-Clustering in Efficient Stress-Control Gate Replacement for NBTI Mitigation

    Shumpei MORITA  Song BIAN  Michihiro SHINTANI  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER

      Vol:
    E100-A No:7
      Page(s):
    1464-1472

    Replacement of highly stressed logic gates with internal node control (INC) logic is known to be an effective way to alleviate timing degradation due to NBTI. We propose a path clustering approach to accelerate the search for effective replacement gates. Based on the observation that some paths always become timing critical after aging, critical path candidates are clustered and a representative path is selected in each cluster. Together with an efficient data structure that further reduces timing calculation, INC logic optimization has become tractable in practical time for the first time. In experiments using a processor, a 171x speedup has been demonstrated while retaining almost the same level of mitigation gain.

  • Reliability Evaluation Environment for Exploring Design Space of Coarse-Grained Reconfigurable Architectures

    Takashi IMAGAWA  Masayuki HIROMOTO  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E93-A No:12
      Page(s):
    2524-2532

    This paper proposes a reliability evaluation environment for coarse-grained reconfigurable architectures. This environment is designed so that it can be easily extended to different target architectures and applications by automating the generation of simulation inputs such as HDL code for fault injection and configuration information. This automation enables us to explore a huge design space in order to efficiently analyze area/reliability trade-offs and find the best solution. This paper also shows demonstrative examples of the design space exploration of coarse-grained reconfigurable architectures using the proposed environment. Through the demonstrations, we discuss the relationship between coarse-grained architectures and reliability, which has not yet been addressed in the existing literature, and show the feasibility of the proposed environment.

  • An Error Correction Scheme through Time Redundancy for Enhancing Persistent Soft-Error Tolerance of CGRAs

    Takashi IMAGAWA  Masayuki HIROMOTO  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-Integrated Electronics

      Vol:
    E98-C No:7
      Page(s):
    741-750

    Time redundancy is sometimes the only option for enhancing circuit reliability when the circuit area is severely restricted. In this paper, a time-redundant error-correction scheme, which is particularly suitable for coarse-grained reconfigurable arrays (CGRAs), is proposed. It judges the correctness of the executions by comparing the results of two identical runs. Once a mismatch is found, the second run is terminated immediately and a third run is started, under the assumption that errors tend to persist in many applications, so that the correct result can be selected from the three runs. The circuit area and reliability of the proposed method are compared with those of a straightforward implementation of time redundancy and of selective triple modular redundancy (TMR). A case study on a CGRA revealed that the area of the proposed method is 1% larger than that of the selective TMR implementation. The study also shows that the proposed scheme is up to 2.6x more reliable than full TMR when persistent errors are predominant.
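
    The compare-then-retry idea described above can be sketched in software as follows. This is a generic simplification, not the proposed CGRA scheme: it lets the second run finish instead of terminating it at the first mismatch, and the function name and return convention are assumptions.

```python
from collections import Counter

def run_with_time_redundancy(task, *args):
    """Run `task` twice; if the results differ, run a third time and vote.

    Returns (result, number_of_runs). Results must be hashable so that
    they can be counted.
    """
    first = task(*args)
    second = task(*args)
    if first == second:
        # Two identical results are accepted without a third run.
        return first, 2
    third = task(*args)
    value, votes = Counter([first, second, third]).most_common(1)[0]
    if votes < 2:
        # All three runs disagree, so no result can be trusted.
        raise RuntimeError("no two runs agree")
    return value, 3
```

    In the fault-free case the cost is two runs rather than three, at the price of accepting a wrong result when both of the first two runs fail identically.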

  • MRO-PUF: Physically Unclonable Function with Enhanced Resistance against Machine Learning Attacks Utilizing Instantaneous Output of Ring Oscillator

    Masayuki HIROMOTO  Motoki YOSHINAGA  Takashi SATO  

     
    PAPER

      Vol:
    E101-A No:7
      Page(s):
    1035-1044

    This paper proposes MRO-PUF, a new architecture for ring-oscillator-based physically unclonable functions (PUFs) with enhanced resistance against machine learning attacks. In the proposed PUF, an instantaneous output value of a ring oscillator is used as a response, whereas most existing PUFs directly use propagation delays to determine the response. Since the response of the MRO-PUF is non-linear and discontinuous as the delay of the ring oscillator increases, predicting the response by machine learning attacks is difficult. A simulation-based performance evaluation shows that the MRO-PUF achieves 15 times stronger resistance against machine learning attacks using a support vector machine than existing PUFs such as the arbiter PUF and the bistable ring PUF. The MRO-PUF also achieves a sufficient level of basic PUF performance in terms of uniqueness and robustness.

  • Automation of Model Parameter Estimation for Random Telegraph Noise

    Hirofumi SHIMIZU  Hiromitsu AWANO  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Vol:
    E97-A No:12
      Page(s):
    2383-2392

    The modeling of random telegraph noise (RTN) in MOS transistors is becoming increasingly important. In this paper, a novel method is proposed for the automated estimation of two important RTN-model parameters: the number of interface states and the corresponding threshold voltage shift. The proposed method utilizes a Gaussian mixture model (GMM) to represent the voltage distributions, and estimates their parameters using the expectation-maximization (EM) algorithm. Using information criteria, the optimal estimation is automatically obtained while avoiding overfitting. In addition, we use a shared variance for all the Gaussian components in the GMM to deal with the noise in RTN signals. The proposed method improves estimation accuracy when large measurement noise is present.
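
    A minimal one-dimensional sketch of EM fitting for a GMM with a single shared variance is given below. The abstract does not name the information criterion, so the use of BIC here is an assumption, as are the initialization, the parameter count of 2k (k means, k - 1 free weights, one variance), and all names.

```python
import math

def _log_joint(x, means, var, weights):
    # log(w_j * N(x | mu_j, var)) for every component j.
    c = -0.5 * math.log(2.0 * math.pi * var)
    return [math.log(max(w, 1e-300)) + c - (x - m) ** 2 / (2.0 * var)
            for m, w in zip(means, weights)]

def _logsumexp(v):
    top = max(v)
    return top + math.log(sum(math.exp(a - top) for a in v))

def fit_gmm_shared_var(xs, k, iters=100):
    """Fit a k-component 1-D GMM with one shared variance; return (means, var, weights, bic)."""
    n, lo, hi = len(xs), min(xs), max(xs)
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]  # evenly spaced start
    weights = [1.0 / k] * k
    var = max(((hi - lo) / (2.0 * k)) ** 2, 1e-12)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            lj = _log_joint(x, means, var, weights)
            norm = _logsumexp(lj)
            resp.append([math.exp(a - norm) for a in lj])
        # M-step: update means, the single shared variance, and the weights.
        nk = [sum(r[j] for r in resp) for j in range(k)]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j]
                 if nk[j] > 1e-12 else means[j] for j in range(k)]
        var = max(sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs) for j in range(k)) / n, 1e-12)
        weights = [c / n for c in nk]
    loglik = sum(_logsumexp(_log_joint(x, means, var, weights)) for x in xs)
    bic = -2.0 * loglik + 2 * k * math.log(n)  # lower is better
    return means, var, weights, bic
```

    Fitting several values of k and keeping the one with the lowest criterion value is one way to choose the number of components; the spacing between the fitted means then gives the estimated level shifts.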

  • Identification and Application of Invariant Critical Paths under NBTI Degradation

    Song BIAN  Shumpei MORITA  Michihiro SHINTANI  Hiromitsu AWANO  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER

      Vol:
    E100-A No:12
      Page(s):
    2797-2806

    As technology further scales semiconductor devices, aging-induced device degradation has become one of the major threats to device reliability. In addition, aging mechanisms like negative bias temperature instability (NBTI) are known to be sensitive to workload (i.e., signal probability), which is hard to predict at the design phase. In this work, we analyze the workload dependence of NBTI degradation using a processor, and propose a novel technique to estimate the worst-case paths. In our approach, we exploit the fact that the deterministic nature of the circuit structure limits the amount of NBTI degradation on different paths, and propose a two-stage path extraction algorithm to identify the invariant critical paths (ICPs) in the processor. Utilizing these paths, we also propose an optimization technique for the replacement of internal node control logic that mitigates the NBTI degradation in the design. In numerical experiments on two processor designs, we achieved a nearly 300x reduction in the number of paths on both designs. Utilizing the extracted ICPs, we achieved a 96x-197x speedup without loss in mitigation gain.

  • Efficient Aging-Aware SRAM Failure Probability Calculation via Particle Filter-Based Importance Sampling

    Hiromitsu AWANO  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1390-1399

    An efficient Monte Carlo (MC) method for calculating the degradation of the failure probability of an SRAM cell due to negative bias temperature instability (NBTI) is proposed. In the proposed method, a particle filter is utilized to incrementally track temporal performance changes in an SRAM cell. The number of simulations required to obtain a stable particle distribution is greatly reduced by reusing the final distribution of the particles in the previous time step as the initial distribution. Combined with a binary classifier that quickly judges whether an MC sample causes a malfunction of the cell, the total number of simulations needed to capture the temporal change of the failure probability is significantly reduced. The proposed method achieves a 13.4× speed-up over the state-of-the-art method.

  • Efficient Mini-Batch Training on Memristor Neural Network Integrating Gradient Calculation and Weight Update

    Satoshi YAMAMORI  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E101-A No:7
      Page(s):
    1092-1100

    We propose an efficient training method for memristor neural networks. The proposed method is suitable for mini-batch-based training, which is a common technique for various neural networks. By integrating the two processes of gradient calculation in the backpropagation algorithm and weight update in the write operation to the memristors, the proposed method accelerates the training process and also eliminates the external computing resources required in the existing method, such as multipliers and memories. Through numerical experiments, we demonstrated that the proposed method converges twice as fast as the existing method during training, while retaining the same level of classification accuracy.