
Keyword Search Result

[Keyword] PU (3318 hits)

Hits 61-80 (of 3318)

  • Recursive Probability Mass Function Method to Calculate Probability Distributions of Pulse-Shaped Signals

    Tomoya FUKAMI  Hirobumi SAITO  Akira HIROSE  

     
    PAPER-Digital Signal Processing

      Publicized:
    2023/03/27
      Vol:
    E106-A No:10
      Page(s):
    1286-1296

    This paper proposes an accurate and efficient method to calculate probability distributions of pulse-shaped complex signals. We show that the distribution over the in-phase and quadrature-phase (I/Q) complex plane is obtained from a recursive probability mass function of the accumulator for a pulse-shaping filter. In contrast to existing analytical methods, the proposed method provides complex-plane distributions in addition to instantaneous power distributions. Digital signal processing generally deals with complex amplitudes rather than power, so complex-plane distributions are more useful in that context. In addition, our approach does not require deriving signal-dependent functions. As a result, it can easily be applied to arbitrary constellations and pulse-shaping filters, as Monte Carlo simulations can. Because the proposed method requires neither numerical integration nor the evaluation of transcendental functions, the accuracy degradation caused by floating-point arithmetic is inherently reduced. Our method is faster than Monte Carlo simulations, and the distributions it yields are also more accurate. These features provide a novel framework for evaluating the characteristics of pulse-shaped signals, leading to new modulation, predistortion, and peak-to-average power ratio (PAPR) reduction schemes.
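    The recursion can be pictured as follows. A pulse-shaped sample is an accumulated sum s = Σ_k h_k a_k of symbols weighted by filter taps. The PMF after tap k therefore follows from the PMF after tap k-1 by shifting it by every possible h_k a_k. The minimal Python sketch below illustrates that idea under stated assumptions: i.i.d. equiprobable symbols, a finite list of taps, and a hypothetical `grid` step that merges mass points differing only by rounding noise. It is not the authors' implementation. The number of mass points can grow up to M^K for M symbols and K taps, so merging coincident points matters.

```python
from collections import defaultdict


def accumulator_pmf(constellation, taps, grid=1e-9):
    """PMF of s = sum_k taps[k] * a_k, built one filter tap at a time.

    Assumptions (illustrative only): symbols a_k are i.i.d. and equiprobable
    over `constellation`; `grid` is a hypothetical snapping step that merges
    mass points which differ only by floating-point noise.
    """
    p_sym = 1.0 / len(constellation)
    pmf = {0j: 1.0}  # empty accumulator: value 0 with probability 1
    for h in taps:
        nxt = defaultdict(float)
        for z, p in pmf.items():
            for a in constellation:
                v = z + h * a
                # snap to the grid so numerically equal points share one bin
                key = complex(round(v.real / grid) * grid, round(v.imag / grid) * grid)
                nxt[key] += p * p_sym  # P_k(v) += P_{k-1}(z) * P(a)
        pmf = dict(nxt)
    return pmf


def mean_power(pmf):
    """Average instantaneous power E|s|^2 computed from the complex-plane PMF."""
    return sum(p * abs(z) ** 2 for z, p in pmf.items())
```

    For QPSK with taps [1.0, 0.5], each of the 16 sums is distinct. The mean power equals (1 + 0.25) × 2 = 2.5, which matches E|s|² = Σ_k h_k² E|a|².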

  • FOM-CDS PUF: A Novel Configurable Dual State Strong PUF Based on Feedback Obfuscation Mechanism against Modeling Attacks

    Hong LI  Wenjun CAO  Chen WANG  Xinrui ZHU  Guisheng LIAO  Zhangqing HE  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/03/29
      Vol:
    E106-A No:10
      Page(s):
    1311-1321

    The configurable ring oscillator physical unclonable function (CRO PUF) is a recently proposed strong PUF based on the classic RO PUF. It can generate an exponential number of challenge-response pairs (CRPs) and has good uniqueness and reliability. However, existing designs have low hardware utilization and are vulnerable to modeling attacks. In this paper, we propose a novel configurable dual-state (CDS) PUF with lower overhead and higher resistance to modeling attacks. Depending on the parity of the Hamming weight (HW) of the challenge, the same topology can be flexibly configured as either an RO PUF or a TERO PUF. This achieves 100% utilization of the inverters and improves hardware utilization efficiency. We also propose a feedback obfuscation mechanism (FOM), which uses the stable count value of the ring oscillator in the PUF as an updated mask to confuse and hide the original challenge. This significantly improves resistance to modeling attacks. The proposed FOM-CDS PUF is analyzed with a mathematical model and finally implemented on a Xilinx Artix-7 FPGA. The test results show that the FOM-CDS PUF effectively resists several popular modeling attack methods, with prediction accuracy below 60%. The FOM-CDS PUF also performs well, with a uniformity of 53.68%, bit error rates of 7.91% at different temperatures and 5.64% at different voltages, and a uniqueness of 50.33%.
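    The uniformity, bit error rate, and uniqueness figures above are conventionally defined as follows:

    - Uniformity is the fraction of 1 bits in a response (ideal 50%).
    - Uniqueness is the mean pairwise fractional Hamming distance between different devices' responses to the same challenges (ideal 50%).
    - Bit error rate is the mean fractional Hamming distance between a reference read and re-reads under other conditions (ideal 0%).

    The sketch below assumes responses are equal-length lists of 0/1 bits. The paper's exact measurement protocol may differ.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions where two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniformity(response):
    """Fraction of 1 bits in one device's response."""
    return sum(response) / len(response)


def uniqueness(responses):
    """Mean pairwise Hamming fraction across devices (same challenges)."""
    pairs = list(combinations(responses, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def bit_error_rate(reference, remeasured):
    """Mean Hamming fraction between a reference read and repeated reads,
    e.g. taken at other temperatures or supply voltages."""
    return sum(hamming_fraction(reference, r) for r in remeasured) / len(remeasured)
```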

  • A Network Design Scheme in Delay Sensitive Monitoring Services (Open Access)

    Akio KAWABATA  Takuya TOJO  Bijoy CHAND CHATTERJEE  Eiji OKI  

     
    PAPER-Network Management/Operation

      Publicized:
    2023/04/19
      Vol:
    E106-B No:10
      Page(s):
    903-914

    Mission-critical monitoring services, such as finding criminals with a monitoring camera, require rapid detection of newly updated data, so suppressing delay is desirable. To this end, this paper proposes a network design scheme that minimizes this delay for monitoring services consisting of Internet-of-Things (IoT) devices located at terminal endpoints (TEs), databases (DBs), and applications (APLs). The proposed scheme determines the allocation of the DB and APLs and selects the server to which each TE belongs. The DB and APLs are each allocated to an optimal server among the multiple servers in the network. We formulate the proposed network design scheme as an integer linear programming problem. The delay reduction effect of the proposed scheme is evaluated on two network topologies and on a monitoring camera system network. In the two network topologies, the delays of the proposed scheme are 78 and 80 percent, respectively, of those of the conventional scheme. In the monitoring camera system network, the delay of the proposed scheme is 77 percent of that of the conventional scheme. These results indicate that the proposed scheme reduces delay compared with the conventional scheme, in which APLs are located near TEs. The computation time of the proposed scheme is acceptable for the design phase before the service is launched. The proposed scheme can contribute to network designs that quickly detect newly added objects in monitoring services.
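    The placement decision can be illustrated on a toy instance. The sketch below enumerates all DB/APL placements instead of solving an ILP, which is only practical for a handful of servers. It relies on two assumptions that are not taken from the paper:

    - Each TE's data follows the path TE, attachment server, DB server, APL server.
    - The objective is the worst-case delay over TEs.

    The `delay` and `access` matrices are hypothetical inputs.

```python
from itertools import product


def design_network(delay, access):
    """Brute-force DB/APL placement and TE attachment (toy stand-in for the ILP).

    delay[i][j]: hypothetical delay between servers i and j.
    access[t][s]: hypothetical delay from terminal endpoint t to server s.
    Assumed path per TE: TE -> attachment server -> DB server -> APL server.
    Returns (worst-case delay, db_server, apl_server, attachment list).
    """
    n = len(delay)
    best = None
    for db, apl in product(range(n), repeat=2):
        attach, worst = [], 0
        for row in access:
            # attach each TE to the server that reaches the DB fastest
            s = min(range(n), key=lambda k: row[k] + delay[k][db])
            attach.append(s)
            worst = max(worst, row[s] + delay[s][db] + delay[db][apl])
        if best is None or worst < best[0]:
            best = (worst, db, apl, attach)
    return best
```

    In a three-server line topology with one TE near each end, both the DB and the APL land on the middle server, which balances the two TEs.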

  • Hybrid, Asymmetric and Reconfigurable Input Unit Designs for Energy-Efficient On-Chip Networks

    Xiaoman LIU  Yujie GAO  Yuan HE  Xiaohan YUE  Haiyan JIANG  Xibo WANG  

     
    PAPER

      Publicized:
    2023/04/10
      Vol:
    E106-C No:10
      Page(s):
    570-579

    The complexity and scale of networks-on-chip (NoCs) are growing as more processing elements and memory devices are implemented on chips. Under strict power budgets, however, it is also critical to lower the power consumption of NoCs for the sake of energy efficiency. In this paper, we therefore present three novel input unit designs for on-chip routers that aim to reduce power consumption while preserving network performance. The key idea behind our designs is to organize the buffers in the input units according to the characteristics of the network traffic. We observe that only a small portion of network traffic consists of long packets (composed of multiple flits). It is therefore reasonable to implement hybrid, asymmetric, and reconfigurable buffers that mainly target short packets (consisting of a single flit), which reduces power consumption and area overhead. Evaluations show that our hybrid, asymmetric, and reconfigurable input unit designs reduce the average energy consumption per flit by 45%, 52.3%, and 56.2%, respectively. They do so while occupying 93.6% (hybrid design) and 66.3% (asymmetric and reconfigurable designs) of the original router area. Meanwhile, we observe only minor degradation in network latency (from 1.5% to 18.4% on average) with our proposals.

  • Kr-Plasma Sputtering for Pt Gate Electrode Deposition on MFSFET with 5 nm-Thick Ferroelectric Nondoped HfO2 Gate Insulator for Analog Memory Application

    Joong-Won SHIN  Masakazu TANUMA  Shun-ichiro OHMI  

     
    PAPER

      Publicized:
    2023/06/02
      Vol:
    E106-C No:10
      Page(s):
    581-587

    In this research, we investigated threshold voltage (VTH) control by partial polarization of metal-ferroelectric-semiconductor field-effect transistors (MFSFETs) with a 5 nm-thick nondoped HfO2 gate insulator, using Kr-plasma sputtering for Pt gate electrode deposition. A remnant polarization (2Pr) of 7.2 μC/cm² was realized by Kr-plasma sputtering for Pt gate electrode deposition. A memory window (MW) of 0.58 V was realized with a pulse amplitude of -5/5 V and a pulse width of 100 ms. Furthermore, the VTH of the MFSFET was controllable by program/erase (P/E) input pulses even with pulse widths below 100 ns. This may be attributed to reduced leakage current resulting from decreased plasma damage.

  • Contact Pad Design Considerations for Semiconductor Qubit Devices for Reducing On-Chip Microwave Crosstalk

    Kaito TOMARI  Jun YONEDA  Tetsuo KODERA  

     
    BRIEF PAPER

      Publicized:
    2023/02/20
      Vol:
    E106-C No:10
      Page(s):
    588-591

    Reducing on-chip microwave crosstalk is crucial for semiconductor spin qubit integration. Toward crosstalk reduction and qubit integration, we investigate on-chip microwave crosstalk for gate electrode pad designs with (i) etched trenches between contact pads or (ii) contact pads with reduced sizes. We conclude that the design with feature (ii) is advantageous for high-density integration of semiconductor qubits with small crosstalk (below -25 dB at 6 GHz), favoring the introduction of flip-chip bonding.

  • Feedback Node Sets in Pancake Graphs and Burnt Pancake Graphs

    Sinyu JUNG  Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2023/06/30
      Vol:
    E106-D No:10
      Page(s):
    1677-1685

    A feedback node set (FNS) of a graph is a subset of its nodes whose deletion makes the residual graph acyclic. By finding an FNS in an interconnection network, we can set a checkpoint at each node in it to avoid livelock configurations. Hence, finding an FNS is critical for enhancing the dependability of parallel computing systems. In this paper, we propose a method to find FNSs in n-pancake graphs and n-burnt pancake graphs. By analyzing the types of cycles defined in our method, we also give the number of nodes in the FNS of an n-pancake graph, $(n-2.875)(n-1)!+1.5(n-3)!$, and in that of an n-burnt pancake graph, $2^{n-1}(n-1)!(n-3.5)$.
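    The definition and the two size formulas can be checked mechanically. The sketch below builds the n-pancake graph, in which each node is a permutation and edges are prefix reversals of length 2 to n. It then tests whether a candidate node set is an FNS by running union-find cycle detection on the residual graph. It also evaluates the formulas quoted above with exact rational arithmetic. It does not construct the authors' FNS; it only verifies a given candidate. The formulas yield integers for n = 5 and 6, for example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def pancake_graph(n):
    """Nodes: permutations of 1..n. Edges: prefix reversals of length 2..n."""
    nodes = list(permutations(range(1, n + 1)))
    edges = set()
    for u in nodes:
        for i in range(2, n + 1):
            v = u[:i][::-1] + u[i:]
            edges.add((min(u, v), max(u, v)))  # store each undirected edge once
    return nodes, edges


def is_feedback_node_set(nodes, edges, fns):
    """True if deleting `fns` leaves an acyclic (forest) residual graph."""
    removed = set(fns)
    parent = {v: v for v in nodes if v not in removed}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in removed or v in removed:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True


def pancake_fns_size(n):
    """(n - 2.875)(n - 1)! + 1.5 (n - 3)!, with n - 2.875 written as (8n - 23)/8."""
    return Fraction(8 * n - 23, 8) * factorial(n - 1) + Fraction(3, 2) * factorial(n - 3)


def burnt_pancake_fns_size(n):
    """2^(n-1) (n - 1)! (n - 3.5), with n - 3.5 written as (2n - 7)/2."""
    return 2 ** (n - 1) * factorial(n - 1) * Fraction(2 * n - 7, 2)
```

    For example, the 3-pancake graph is a 6-cycle, so deleting any single node already yields an FNS.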

  • GPU-Accelerated Estimation and Targeted Reduction of Peak IR-Drop during Scan Chain Shifting

    Shiling SHI  Stefan HOLST  Xiaoqing WEN  

     
    PAPER-Dependable Computing

      Publicized:
    2023/07/07
      Vol:
    E106-D No:10
      Page(s):
    1694-1704

    High power dissipation during scan testing often causes undue yield loss, especially for low-power circuits. One major reason is that the resulting IR-drop in shift mode may corrupt test data. A common approach to this problem is partial-shift, in which scan chains are divided into multiple groups and only one group is shifted at a time. However, existing partial-shift based methods suffer from two major problems:
    (1) their IR-drop estimation is either not accurate enough or too computationally expensive to perform for each shift cycle;
    (2) as a result, partial-shift is applied to all shift cycles, resulting in long test time.
    This paper addresses these two problems with a novel IR-drop-aware scan shift method. It features (1) Cycle-based IR-Drop Estimation (CIDE), supported by a GPU-accelerated dynamic power simulator, to quickly find potential shift cycles with excessive peak IR-drop, and (2) a scan shift scheduling method that generates a scan chain grouping targeted at each such shift cycle to reduce the impact on test time. Experiments on ITC'99 benchmark circuits show that (1) CIDE is computationally feasible and (2) the proposed scan shift schedule can reduce the global peak IR-drop by up to 47%. On average, its scheduling efficiency is 58.4% higher than that of a typical existing method, which means that our method requires less test time.
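    The idea of applying partial-shift only where it is needed can be sketched as follows. Two stand-ins replace the paper's techniques:

    - A per-chain switching count (hypothetical `toggles`) replaces the GPU-based IR-drop estimate of CIDE.
    - First-fit-decreasing bin packing, a generic heuristic swapped in here, replaces the paper's scheduler.

    Cycles whose total activity stays under a hypothetical `threshold` shift all chains at once. Flagged cycles are split into groups that each stay under the threshold.

```python
def schedule_shift(toggles, threshold):
    """toggles[c][k]: hypothetical switching activity of scan chain k in shift cycle c.

    Returns one entry per shift cycle: a list of chain groups shifted one after
    another. Unflagged cycles shift all chains together as a single group.
    """
    schedule = []
    for cycle in toggles:
        chains = list(range(len(cycle)))
        if sum(cycle) <= threshold:
            schedule.append([chains])  # normal full shift, no extra test time
            continue
        # flagged cycle: first-fit decreasing, heaviest chains placed first
        groups, loads = [], []
        for k in sorted(chains, key=lambda c: cycle[c], reverse=True):
            for g, load in enumerate(loads):
                if load + cycle[k] <= threshold:
                    groups[g].append(k)
                    loads[g] += cycle[k]
                    break
            else:
                groups.append([k])  # a chain above the threshold gets its own group
                loads.append(cycle[k])
        schedule.append(groups)
    return schedule
```

    Each extra group in a flagged cycle costs extra shift clocks. Limiting partial-shift to flagged cycles is what keeps the test-time penalty small.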

  • Efficient Construction of CGL Hash Function Using Legendre Curves

    Yuji HASHIMOTO  Koji NUIDA  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/02/07
      Vol:
    E106-A No:9
      Page(s):
    1131-1140

    The CGL hash function is a provably secure hash function based on walks on isogeny graphs of supersingular elliptic curves. A dominant cost of its computation comes from the iterative computation of power roots over quadratic extension fields. In this paper, we reduce the required number of power root computations by almost half by applying, and also extending, an existing method of efficient isogeny sequence computation on Legendre curves (Hashimoto and Nuida, CASC 2021). We also point out a relationship between 2-isogenies for Legendre curves and those for Edwards curves, which is of independent interest, and develop a method for efficiently computing $2^e$-th roots in quadratic extension fields.
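    Two standard building blocks help illustrate the setting. The first is the j-invariant of the Legendre curve y² = x(x-1)(x-λ), namely j = 256(λ² - λ + 1)³ / (λ²(λ - 1)²). CGL outputs the j-invariant of the curve reached at the end of the walk. The second is the simplest power-root case: a square root in F_p for p ≡ 3 (mod 4), computed as a^((p+1)/4). This is a prime-field sketch only. CGL works over F_{p²}, and neither the walk nor the paper's 2^e-th root method is reproduced here.

```python
def legendre_j(lam, p):
    """j-invariant of E_lambda: y^2 = x(x - 1)(x - lambda) over F_p (p an odd prime).

    j = 256 (lam^2 - lam + 1)^3 / (lam^2 (lam - 1)^2); requires lam != 0, 1 mod p.
    """
    lam %= p
    if lam in (0, 1):
        raise ValueError("lambda must differ from 0 and 1 modulo p")
    num = 256 * pow(lam * lam - lam + 1, 3, p) % p
    den = lam * lam * (lam - 1) ** 2 % p
    return num * pow(den, -1, p) % p  # modular inverse (Python 3.8+)


def sqrt_fp_3mod4(a, p):
    """Square root in F_p for p % 4 == 3 via a^((p + 1)/4); None for non-residues."""
    r = pow(a % p, (p + 1) // 4, p)
    return r if r * r % p == a % p else None
```

    For example, λ = -1 gives j = 1728 (reduced mod p). The values λ, 1 - λ, and 1/λ always give the same j-invariant.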

  • Attractiveness Computing in Image Media

    Toshihiko YAMASAKI  

     
    INVITED PAPER-Vision

      Publicized:
    2023/06/16
      Vol:
    E106-A No:9
      Page(s):
    1196-1201

    Our research group has been working on attractiveness prediction, reasoning, and even enhancement for multimedia content, which we call “attractiveness computing.” Attractiveness includes impressiveness, instagrammability, memorability, clickability, and so on. Such attractiveness has usually been analyzed by experienced professionals, but we have experimentally shown that artificial intelligence (AI) based on big multimedia data can imitate or reproduce professionals' skills in some cases. In this paper, we introduce some representative works and possible real-life applications of our attractiveness computing for image media.

  • Backup Resource Allocation Model with Probabilistic Protection Considering Service Delay

    Shinya HORIMOTO  Fujun HE  Eiji OKI  

     
    PAPER-Network

      Publicized:
    2023/03/24
      Vol:
    E106-B No:9
      Page(s):
    798-816

    This paper proposes a backup resource allocation model for virtual network functions (VNFs) that minimizes the total computing capacity allocated for backup while considering the service delay. When primary hosts fail, the VNFs on the failed hosts are recovered by backup hosts whose allocation is predetermined. We introduce probabilistic protection, in which the probability that protection by a backup host fails is kept within a given value. This allows backup resources to be shared, which reduces the total allocated computing capacity. Previous work did not consider the service delay constraint in the backup resource allocation problem. The service delay consists of the networking delay between hosts and the processing delay in each VNF. In the proposed model, the probability that this delay exceeds its threshold is constrained within a given value. We introduce a basic algorithm to solve the formulated delay-constrained optimization problem. For problem sizes that the basic algorithm cannot solve within an acceptable computation time, we develop a simulated annealing algorithm that incorporates Yen's algorithm to handle the delay constraint heuristically. We observe that both algorithms in the proposed model reduce the total allocated computing capacity by up to 56.3% compared with a baseline. The simulated annealing algorithm can also obtain feasible solutions for problems that the basic algorithm cannot solve.
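    Why probabilistic protection permits sharing can be shown with a deliberately simplified model that is not the paper's formulation. Assume N primary hosts fail independently with probability p, each failed host needs one unit of backup capacity, and protection fails when more hosts fail than there are shared units. The smallest shared capacity k with P(X > k) ≤ ε, where X ~ Binomial(N, p), is then:

```python
from math import comb


def min_shared_backup(n_hosts, p_fail, epsilon):
    """Smallest k such that P(more than k of n_hosts fail) <= epsilon.

    Hypothetical simplification: independent host failures with probability
    p_fail, and one backup unit recovers exactly one failed host.
    """
    pmf = [comb(n_hosts, x) * p_fail**x * (1 - p_fail) ** (n_hosts - x)
           for x in range(n_hosts + 1)]
    for k in range(n_hosts + 1):
        if sum(pmf[k + 1:]) <= epsilon:  # tail probability P(X > k)
            return k
    return n_hosts
```

    With N = 10, p = 0.01, and ε = 10^-3, two shared units suffice. Setting ε = 0 (no tolerated protection failure) forces one unit per host.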

  • A Fully Analog Deep Neural Network Inference Accelerator with Pipeline Registers Based on Master-Slave Switched Capacitors

    Yaxin MEI  Takashi OHSAWA  

     
    PAPER-Integrated Electronics

      Publicized:
    2023/03/08
      Vol:
    E106-C No:9
      Page(s):
    477-485

    A fully analog pipelined deep neural network (DNN) accelerator is proposed. It is constructed using pipeline registers based on master-slave switched capacitors. The master-slave switched capacitor is an analog equivalent of the delay flip-flop (D-FF) used as a digital pipeline register. To estimate the performance of the pipeline register, it is applied to a conventional DNN that performs non-pipelined operation. Compared with the conventional DNN, the cycle time is reduced by 61.5% and the data rate is increased by 160%. The accuracy reaches 99.6% in an MNIST classification test. The energy consumption per classification is reduced by 88.2% to 0.128 µJ, achieving an energy efficiency of 1.05 TOPS/W and a throughput of 0.538 TOPS in a 180 nm technology node.
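    The two percentages quoted above are consistent with each other if the data rate is taken to be the reciprocal of the cycle time. The abstract does not state this relation, so it is an assumption here:

$$
R \propto \frac{1}{T_{\mathrm{cycle}}},\qquad
T'_{\mathrm{cycle}} = (1 - 0.615)\,T_{\mathrm{cycle}} = 0.385\,T_{\mathrm{cycle}}
\;\Longrightarrow\;
\frac{R'}{R} = \frac{1}{0.385} \approx 2.60,
$$

    that is, a data-rate increase of about 160%.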

  • Design of Enclosing Signing Keys by All Issuers in Distributed Public Key Certificate-Issuing Infrastructure

    Shohei KAKEI  Hiroaki SEKO  Yoshiaki SHIRAISHI  Shoichi SAITO  

     
    LETTER

      Publicized:
    2023/05/25
      Vol:
    E106-D No:9
      Page(s):
    1495-1498

    This paper first uses IoT as an example to motivate eliminating the single point of trust (SPOT) in a CA-based private PKI. It then describes a distributed public key certificate-issuing infrastructure that eliminates the SPOT, together with a limitation of that infrastructure arising from the generation of signing keys. Finally, it proposes a method that addresses this limitation by involving all certificate issuers.

  • On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/06/12
      Vol:
    E106-D No:9
      Page(s):
    1537-1545

    In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent for linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected into the inputs at each epoch, i.e., the case of on-line noisy copies. This article can therefore also be viewed as an analysis of noise injection into the training process via DA. We consider three training settings: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training under DA with on-line copies is approximately equivalent to L2-regularized training. In that equivalence, the variance of the injected noise matters, whereas the number of copies does not. Moreover, we show that DA with on-line copies apparently increases the learning rate in two settings: full-batch training under the sum of squared errors and mini-batch training under the mean squared error. The apparent increase in learning rate and the regularization effect can be attributed to the original inputs and the additive noise in the noisy copies, respectively. These results are confirmed by numerical experiments. The experiments also show that our results apply to conventional off-line DA in the under-parameterized regime but not in the over-parameterized regime. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression qualitatively applies to neural networks.
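    The L2 equivalence follows from one expectation. For zero-mean input noise ε with covariance σ²I, E[(y - wᵀ(x+ε))²] = (y - wᵀx)² + σ²‖w‖². The expected noisy-copy MSE is therefore the ridge objective with penalty σ². The NumPy sketch below checks this numerically for full-batch GD under the MSE. It assumes, for illustration, that each epoch's training set consists only of freshly drawn noisy copies. It then compares the averaged late iterates with the closed-form ridge solution (XᵀX + nσ²I)⁻¹Xᵀy. The data, learning rate, and epoch counts in the test are hypothetical choices.

```python
import numpy as np


def gd_online_noisy_copies(X, y, sigma, n_copies, lr, epochs, avg_last=500, seed=0):
    """Full-batch GD on the MSE over `n_copies` noisy copies of X, redrawn every
    epoch (on-line copies). Returns the mean of the last `avg_last` iterates to
    smooth out fluctuations caused by the injected noise."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    X_rep = np.tile(X, (n_copies, 1))  # stacked copies, shape (n * n_copies, d)
    y_rep = np.tile(y, n_copies)
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for epoch in range(epochs):
        X_noisy = X_rep + sigma * rng.standard_normal(X_rep.shape)  # fresh noise
        resid = y_rep - X_noisy @ w
        grad = -2.0 * X_noisy.T @ resid / len(y_rep)  # gradient of the MSE
        w = w - lr * grad
        if epoch >= epochs - avg_last:
            w_sum += w
    return w_sum / avg_last


def ridge_equivalent(X, y, sigma):
    """Minimizer of (1/n)||y - Xw||^2 + sigma^2 ||w||^2 (the expected DA loss)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * sigma**2 * np.eye(d), X.T @ y)
```

    Changing `n_copies` only changes how noisy each epoch's gradient is, not the limit it averages to. This mirrors the abstract's point that the noise variance matters while the number of copies does not.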

  • LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs

    Song LIU  Jie MA  Chenyu ZHAO  Xinhe WAN  Weiguo WU  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2023/02/10
      Vol:
    E106-A No:8
      Page(s):
    1043-1050

    GPUs have become the dominant computing units for meeting the demand for high performance in various computational fields. However, long operation latency causes underutilization of on-chip computing resources, which degrades performance when parallel tasks run on GPUs. A good warp scheduling strategy is an effective way to hide latency and improve resource utilization. However, most current warp scheduling algorithms for GPUs ignore the ability of long operations to hide latency. In this paper, we propose LFWS, a long-operation-first warp scheduling algorithm for GPU platforms. LFWS filters warps in the ready state into a ready queue and updates the queue promptly as warp states change. LFWS divides the warps in the ready queue into long- and short-operation groups based on the types of operations in their instruction buffers, and gives higher priority to warps with long operations. This lets long operations hide part of each other's latency and enhances the system's ability to hide latency. To verify the effectiveness of LFWS, we implement it on the GPGPU-Sim simulation platform. Experiments on various CUDA applications compare LFWS with five other warp scheduling algorithms. LFWS achieves average performance improvements of 8.01% over three traditional algorithms and 5.09% over two novel warp scheduling algorithms, effectively improving computational resource utilization on GPUs.
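    The priority rule described above can be sketched as an ordering function over warps. The sketch makes three assumptions that the abstract does not fix:

    - The opcode set treated as "long" (`LONG_OPS`) is hypothetical.
    - Warps are ordered oldest-first within each group.
    - Each warp is classified by the opcode at the head of its instruction buffer.

    It is not GPGPU-Sim's scheduler interface.

```python
from dataclasses import dataclass
from operator import attrgetter

# Hypothetical opcode classes treated as "long" operations, for illustration only.
LONG_OPS = frozenset({"ld.global", "st.global", "sfu"})


@dataclass
class Warp:
    wid: int
    age: int      # issue order; a smaller value means an older warp
    next_op: str  # opcode at the head of the warp's instruction buffer
    ready: bool


def lfws_order(warps):
    """Return warp IDs in issue-priority order: long-operation warps first."""
    ready = [w for w in warps if w.ready]  # filter into the ready queue
    by_age = attrgetter("age")
    long_group = sorted((w for w in ready if w.next_op in LONG_OPS), key=by_age)
    short_group = sorted((w for w in ready if w.next_op not in LONG_OPS), key=by_age)
    return [w.wid for w in long_group + short_group]
```

    Issuing long operations early lets their latencies overlap with each other and with the short operations issued afterwards.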

  • Level Allocation of Four-Level Pulse-Amplitude Modulation Signal in Optically Pre-Amplified Receiver Systems

    Hiroki KAWAHARA  Koji IGARASHI  Kyo INOUE  

     
    PAPER-Fiber-Optic Transmission for Communications

      Publicized:
    2023/02/03
      Vol:
    E106-B No:8
      Page(s):
    652-659

    This study numerically investigates the symbol-level allocation of four-level pulse-amplitude modulation (PAM4) signals for optically pre-amplified receiver systems. Three level-allocation schemes are examined: intensity-equispaced, amplitude-equispaced, and numerically optimized. Numerical simulations are conducted to comprehensively compare the receiver sensitivities of these level-allocation schemes under various system conditions. The results show that which level allocation performs better depends significantly on the system conditions, namely the bandwidth of the amplified spontaneous emission (ASE) light, the modulation bandwidth, and the signal extinction ratio (ER). The mechanisms underlying these dependencies are also discussed.
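    The two closed-form schemes differ only in which quantity is spaced evenly between the lowest and highest levels. Intensity-equispaced spaces optical intensity I evenly. Amplitude-equispaced spaces the field amplitude √I evenly. The sketch below normalizes the top level to 1 and fixes the bottom level from the extinction ratio, so both schemes keep the same ER. The numerically optimized allocation depends on the noise model and is not shown.

```python
def pam4_levels(er_db, scheme):
    """Normalized optical intensities (top level = 1) of the four PAM4 symbols.

    er_db: extinction ratio I3/I0 in dB (float("inf") gives I0 = 0).
    scheme: "intensity" (equal intensity spacing) or
            "amplitude" (equal spacing of the field amplitude sqrt(I)).
    """
    i_min = 10 ** (-er_db / 10)  # I0 relative to I3
    if scheme == "intensity":
        return [i_min + k * (1 - i_min) / 3 for k in range(4)]
    if scheme == "amplitude":
        a_min = i_min ** 0.5
        return [(a_min + k * (1 - a_min) / 3) ** 2 for k in range(4)]
    raise ValueError(f"unknown scheme: {scheme}")
```

    With an infinite ER, the intensity-equispaced levels are 0, 1/3, 2/3, 1, and the amplitude-equispaced levels are 0, 1/9, 4/9, 1.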

  • Networking Experiment of Domain-Specific Networking Platform Based on Optically Interconnected Reconfigurable Communication Processors (Open Access)

    Masaki MURAKAMI  Takashi KURIMOTO  Satoru OKAMOTO  Naoaki YAMANAKA  Takayuki MURANAKA  

     
    PAPER-Network System

      Publicized:
    2023/02/15
      Vol:
    E106-B No:8
      Page(s):
    660-668

    A domain-specific networking platform based on optically interconnected reconfigurable communication processors is proposed. Some application examples of the reconfigurable communication processor and networking experiment results are presented.

  • Motion Parameter Estimation Based on Overlapping Elements for TDM-MIMO FMCW Radar

    Feng TIAN  Wan LIU  Weibo FU  Xiaojun HUANG  

     
    PAPER-Sensing

      Publicized:
    2023/02/06
      Vol:
    E106-B No:8
      Page(s):
    705-713

    Intelligent traffic monitoring, which is widely used in intelligent transportation systems (ITSs), provides information support for autonomous driving. This paper proposes a method for estimating the motion parameters of moving vehicle targets with millimeter-wave radars. It addresses the low detection accuracy caused by velocity ambiguity and Doppler-angle coupling in traffic monitoring. First, a MIMO antenna array with overlapping elements is constructed by introducing such elements into a typical MIMO radar array antenna design. Motion-induced phase errors are eliminated using the phase differences among the overlapping elements. Then, the position errors among these elements are corrected through an iterative method, and the angles of multiple targets are estimated. Finally, velocity disambiguation is performed using the error-corrected phase differences among the overlapping elements. In this way, the angle and velocity of moving vehicle targets are estimated accurately. In Monte Carlo simulation experiments, the angle error is 0.1° and the velocity error is 0.1 m/s. The simulation results show that the method effectively solves the problems of velocity ambiguity and Doppler-angle coupling while improving the accuracy of velocity and angle estimation. An improved algorithm is tested on vehicle datasets collected in the forward direction in ordinary public urban scenes. The experimental results further verify the feasibility of the method, which meets the real-time and accuracy requirements of ITSs for vehicle information monitoring.
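    The disambiguation step can be sketched with standard FMCW relations. In TDM-MIMO, each transmitter repeats only every n_tx chirps, so the Doppler FFT folds velocities into [-v_max, v_max) with v_max = λ/(4·n_tx·T_c). Two overlapping virtual elements sit at the same position but are illuminated Δt apart. Their phase difference therefore carries only the motion term 4πvΔt/λ. That term is unambiguous over a wider range when Δt < n_tx·T_c. The sketch picks the folding index whose predicted phase best matches the measurement. Three simplifications apply:

    - The time offset `overlap_dt` is a hypothetical parameter.
    - Element position errors, which the paper corrects iteratively, are ignored.
    - This is not the authors' algorithm.

```python
import cmath
import math


def max_unambiguous_velocity(wavelength, chirp_period, n_tx):
    """v_max = lambda / (4 T_r), with T_r = n_tx * chirp_period in TDM-MIMO."""
    return wavelength / (4 * chirp_period * n_tx)


def wrap_velocity(v, v_max):
    """Velocity as seen by the Doppler FFT: folded into [-v_max, v_max)."""
    return (v + v_max) % (2 * v_max) - v_max


def disambiguate(v_wrapped, v_max, overlap_phase, overlap_dt, wavelength):
    """Choose v = v_wrapped + 2 k v_max whose predicted overlapping-element
    phase 4*pi*v*dt/lambda best matches the measured phase difference."""
    v_limit = wavelength / (4 * overlap_dt)  # range the overlap phase can resolve
    k_max = int(v_limit // (2 * v_max)) + 1
    best_v, best_err = None, math.inf
    for k in range(-k_max, k_max + 1):
        v = v_wrapped + 2 * k * v_max
        if abs(v) >= v_limit:
            continue  # outside the overlapping-element unambiguous range
        predicted = 4 * math.pi * v * overlap_dt / wavelength
        err = abs(cmath.phase(cmath.exp(1j * (overlap_phase - predicted))))
        if err < best_err:
            best_v, best_err = v, err
    return best_v
```

    For example, with λ = 3.9 mm, T_c = 50 µs, and three transmitters, v_max = 6.5 m/s. A 15 m/s target then appears at 2 m/s in the Doppler FFT, and the overlap phase with Δt = 50 µs recovers 15 m/s.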

  • Reliable and Efficient Chip-PCB Hybrid PUF and Lightweight Key Generator

    Yuanzhong XU  Tao KE  Wenjun CAO  Yao FU  Zhangqing HE  

     
    PAPER-Electronic Circuits

      Publicized:
    2023/03/10
      Vol:
    E106-C No:8
      Page(s):
    432-441

    A physical unclonable function (PUF) is a promising lightweight hardware security primitive that can extract device fingerprints for encryption or authentication. However, extracting fingerprints from either the chip or the board alone has security flaws and cannot provide system-level hardware security. This paper proposes a new Chip-PCB hybrid PUF (CPR PUF). In it, a weak PUF on the PCB is combined with a strong PUF inside the chip, generating massive numbers of responses under the control of the on-chip strong PUF's challenges. This structure tightly couples the chip and PCB into an inseparable and unclonable unit and can thus verify the authenticity of both the chip and the board. To improve the uniformity and reliability of the Chip-PCB hybrid PUF, we propose a lightweight key generator based on a reliability self-test and a debiasing algorithm. It extracts massive numbers of stable and secure keys from unreliable and biased PUF responses without expensive error correction. FPGA-based test results show that the PUF responses after robust extraction and debiasing achieve high uniqueness, reliability, uniformity, and anti-counterfeiting capability. Moreover, the key generator greatly reduces the execution cost, and the bit error rate of the keys is below $10^{-9}$. The overall security of the keys is also improved by eliminating the entropy leakage of helper data.
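    A reliability self-test followed by debiasing can be sketched with two generic stand-ins, neither of which is the authors' algorithm:

    - The self-test keeps only bit positions that read identically across repeated measurements.
    - The classic von Neumann extractor (01 → 0, 10 → 1, discard 00 and 11) removes bias.

    A plain von Neumann step like this one does not by itself address the helper-data entropy leakage that the abstract says the authors' design eliminates.

```python
def self_test_mask(readings):
    """readings: repeated reads of one response (equal-length bit lists).
    Mark a bit position as reliable only if every read agrees."""
    first = readings[0]
    return [all(r[i] == first[i] for r in readings) for i in range(len(first))]


def von_neumann_debias(bits):
    """Classic von Neumann extractor: 01 -> 0, 10 -> 1, discard 00 and 11."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out


def extract_key(readings):
    """Drop unreliable positions, then debias the remaining stable bits."""
    mask = self_test_mask(readings)
    stable = [bit for bit, keep in zip(readings[0], mask) if keep]
    return von_neumann_debias(stable), mask
```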

  • Write Variation & Reliability Error Compensation by Layer-Wise Tunable Retraining of Edge FeFET LM-GA CiM

    Shinsei YOSHIKIYO  Naoko MISAWA  Kasidit TOPRASERTPONG  Shinichi TAKAGI  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

      Publicized:
    2022/12/19
      Vol:
    E106-C No:7
      Page(s):
    352-364

    This paper proposes a layer-wise tunable retraining method for edge FeFET computation-in-memory (CiM) that compensates for the neural network (NN) accuracy degradation caused by FeFET device errors. The proposed retraining can tune the number of retrained layers to reduce the inference accuracy degradation caused by errors that occur after retraining. The weights of the original NN model, accurately trained in a cloud data center, are written into the edge FeFET CiM. The written weights are then changed by FeFET device errors in the field. By partially retraining the written NN model, the proposed method combines the error-affected layers of the NN model with the retrained layers, thus recovering inference accuracy. After retraining, the retrained layers are rewritten to the CiM and are again affected by device errors. The evaluation first analyzes the recovery capability of the NN model with partial retraining. It then evaluates the inference accuracy after rewriting. Recovery capability is evaluated with typical non-volatile memory (NVM) errors: normal distribution, uniform shift, and bit inversion. For all error types, more than 50% of the inference accuracy degradation is recovered by retraining only the final fully connected (FC) layer of ResNet-32. To simulate FeFET Local-Multiply and Global-Accumulate (LM-GA) CiM, recovery capability is also evaluated with FeFET errors modeled on FeFET measurements. Retraining only the FC layer achieves recovery rates of up to 53%, 66%, and 72% for FeFET write variation, read disturb, and data retention, respectively. In addition, retraining just two more layers improves the recovery rate by 20-30%. To tune the number of retrained layers, the inference accuracy after rewriting is evaluated by simulating the errors that occur after retraining. When typical NVM errors are injected, it is optimal to retrain the FC layer and 3-6 convolution layers of ResNet-32. The optimal number of layers can increase or decrease depending on the balance between the magnitude of errors before retraining and that after retraining.
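    The three typical NVM error types and the layer-selection knob can be sketched in NumPy. Three assumptions apply here and are not taken from the paper:

    - Bit inversion is modeled as sign flips of ±1 weights.
    - The retrained layers are the last `n_retrain` layers, counted from the output.
    - `recovery_rate` is defined as the fraction of the lost accuracy that retraining regains.

    The FeFET-measurement-based error models are not reproduced.

```python
import numpy as np


def normal_error(w, sigma, rng):
    """Additive Gaussian write variation with standard deviation sigma."""
    return w + sigma * rng.standard_normal(w.shape)


def uniform_shift(w, delta):
    """Every weight drifts by the same amount delta."""
    return w + delta


def bit_inversion(w_binary, p, rng):
    """Flip the sign of +/-1 weights independently with probability p."""
    flip = rng.random(w_binary.shape) < p
    return np.where(flip, -w_binary, w_binary)


def layers_to_retrain(n_layers, n_retrain):
    """Indices of the last n_retrain layers (e.g. the FC layer plus a few convs)."""
    return list(range(n_layers - n_retrain, n_layers))


def recovery_rate(acc_original, acc_degraded, acc_retrained):
    """Assumed definition: share of the accuracy drop regained by retraining."""
    return (acc_retrained - acc_degraded) / (acc_original - acc_degraded)
```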
