Tomoyuki TANAKA Christopher L. AYALA Nobuyuki YOSHIKAWA
Extremely energy-efficient logic devices are required for future low-power high-performance computing systems. Superconductor electronic technology has a number of energy-efficient logic families. Among them is the adiabatic quantum-flux-parametron (AQFP) logic family, which adiabatically switches the quantum-flux-parametron (QFP) circuit when it is excited by an AC power-clock. When compared to state-of-the-art CMOS technology, AQFP logic circuits have the advantage of relatively fast clock rates (5 GHz to 10 GHz) and a reduction in energy of five to six orders of magnitude before cooling overhead. We have been developing extremely energy-efficient computing processor components using the AQFP. The adder is the most basic computational unit and is important in the development of a processor. In this work, we designed and measured a 16-bit parallel prefix carry look-ahead Kogge-Stone adder (KSA). We fabricated the circuit using the AIST 10 kA/cm² High-speed STandard Process (HSTP). Due to a malfunction in the measurement system, we were not able to confirm the complete operation of the circuit at the low frequency of 100 kHz in liquid He, but we confirmed that the outputs that we did observe are correct for two types of tests: (1) critical tests and (2) 110 random input tests in total. The operation margin of the circuit is wide, and we did not observe any calculation errors during measurement.
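As a purely functional illustration of the parallel-prefix carry computation that a Kogge-Stone adder performs, the following Python sketch models the generate/propagate network at the bit level. It says nothing about the AQFP circuit itself; the function name `kogge_stone_add` and the packing of signals into integers are assumptions made for this example.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out). Hypothetical helper for illustration.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Per-bit generate and propagate signals, packed into integers.
    g = a & b
    p = a ^ b
    half_sum = p  # kept for the final sum stage

    # log2(width) prefix stages: stage d merges bit i with bit i - d.
    d = 1
    while d < width:
        g = (g | (p & (g << d))) & mask  # uses the propagate value from before this stage
        p = (p & (p << d)) & mask
        d <<= 1

    # Bit i of g is now the carry out of bit position i.
    carry_in = (g << 1) & mask
    total = half_sum ^ carry_in
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

For example, `kogge_stone_add(0xFFFF, 1)` returns `(0, 1)`: the sum wraps to zero and the carry out is set. The 16-bit case needs four prefix stages (distances 1, 2, 4, and 8).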
With the increasing densification of 5G and future 6G networks, high-capacity backhaul links to connect the numerous base stations become an issue. Since not all base stations can be connected via fibre links for either technical or economic reasons, wireless connections at 300GHz, which may provide data rates comparable to fibre links, are an alternative. This paper deals with the planning of 300GHz backhaul links and describes two novel automatic planning approaches for backhaul links arranged in ring and star topologies. The two planning approaches are applied to various scenarios, and the corresponding planning results are evaluated by comparing the signal to interference plus noise ratio under various simulation conditions, including weather impacts, showing the feasibility of wireless backhaul links.
Ai-ichiro SASAKI Ken FUKUSHIMA
Magnetic fields are often utilized for position sensing of mobile devices. In typical sensing systems, multiple sensors are used to detect magnetic fields generated by target devices. To determine the positions of the devices, magnetic-field data detected by the sensors must be converted to device-position data. The data conversion is not trivial because it is a nonlinear inverse problem. In this study, we propose a machine-learning approach suitable for data conversion required in the magnetic-field-based position sensing of target devices. In our approach, two different sets of training data are used. One of the training datasets is composed of raw data of magnetic fields to be detected by sensors. The other set is composed of logarithmically represented data of the fields. We can obtain two different predictor functions by learning with these training datasets. Results show that the prediction accuracy of the target position improves when the two different predictor functions are used. Based on our simulation, the error of the target position estimated with the predictor functions is within 10cm in a 2m × 2m × 2m cubic space for 87% of all the cases of the target device states. The computational time required for predicting the positions of the target device is 4ms. As the prediction method is accurate and rapid, it can be utilized for the real-time tracking of moving objects and people.
Shanqi PANG Xiankui PENG Xiao ZHANG Ruining ZHANG Cuijiao YIN
Quantum combinatorial designs are gaining popularity in quantum information theory. Quantum Latin squares can be used to construct mutually unbiased maximally entangled bases and unitary error bases. Here we present a general method for constructing quantum Latin arrangements from irredundant orthogonal arrays. As an application of the method, many new quantum Latin arrangements are obtained. We also find a sufficient condition such that the improved quantum orthogonal arrays [10] are equivalent to quantum Latin arrangements. We further prove that an improved quantum orthogonal array can produce a quantum uniform state.
Ding LI Chunxiang GU Yuefei ZHU
Website Fingerprinting (WF) enables a passive attacker to identify which website a user is visiting over an encrypted tunnel. Current WF attacks have two strong assumptions: (i) specific tunnel, i.e., the attacker can train on traffic samples collected in a simulated tunnel with the same tunnel settings as the user, and (ii) pseudo-open-world, where the attacker has access to training samples of unmonitored sites and treats them as a separate class. These assumptions, while experimentally feasible, render WF attacks less usable in practice. In this paper, we present Gene Fingerprinting (GF), a new WF attack that achieves cross-tunnel transferability by generating fingerprints that reflect the intrinsic profile of a website. The attack leverages Zero-shot Learning — a machine learning technique not requiring training samples to identify a given class — to reduce the effort to collect data from different tunnels and achieve a real open-world. We demonstrate the attack performance using three popular tunneling tools: OpenSSH, Shadowsocks, and OpenVPN. The GF attack attains over 94% accuracy on each tunnel, far better than existing CUMUL, DF, and DDTW attacks. In the more realistic open-world scenario, the attack still obtains 88% TPR and 9% FPR, outperforming the state-of-the-art attacks. These results highlight the danger of our attack in various scenarios where gathering and training on a tunnel-specific dataset would be impractical.
Takumi NISHIME Hiroshi HASHIGUCHI Naobumi MICHISHITA Hisashi MORISHITA
Mounting small antennas on a platform increases dielectric loss and conductive loss and decreases the radiation efficiency. This paper proposes a novel antenna design method that uses characteristic mode analysis to improve the radiation efficiency of platform-mounted small antennas. The proposed method uses mapping of the modal weighting coefficient (MWC) and the infinitesimal dipole, and evaluates a metal casing of 100mm × 55mm × 23mm as a platform excited by an inverted-F antenna. The simulation and measurement results show that the radiation efficiency improves from 2.5% for the single antenna to 5% for the whole system.
Chi-Min LI Dong-Lin LU Pao-Jen WANG
Currently, with the widespread use of smart devices in daily life, the demand for high-data-rate and low-latency services has become an important issue in facilitating various applications. However, a high-data-rate service usually implies a large bandwidth requirement. To solve the problem of bandwidth shortage below 6GHz (sub-6G), future wireless communications can be up-converted to the millimeter-wave (mm-wave) bands. Nevertheless, mm-wave frequency bands suffer from high channel attenuation and serious penetration loss compared with sub-6G frequency bands, and signal transmission in indoor environments is further affected by various partition materials, such as concrete, wood, and glass. Therefore, the fifth-generation (5G) mobile communication system may use multiple small cells (SC) to overcome the signal attenuation caused by using mm-wave bands. This paper analyzes the attenuation characteristics of some common partition materials in indoor environments. In addition, performance metrics such as the received signal power, signal to interference plus noise ratio (SINR), and system capacity are simulated and analyzed for different SC deployments to provide suitable guidelines for each SC deployment.
Kyogo OTA Daisuke INOUE Mamoru SAWAHASHI Satoshi NAGATA
This paper proposes individual computation processes of the partial demodulation reference signal (DM-RS) sequence in a synchronization signal (SS)/physical broadcast channel (PBCH) block to be used to detect the radio frame timing based on SS/PBCH block index detection for New Radio (NR) initial access. We present the radio frame timing detection probability using the proposed partial DM-RS sequence detection method that is applied subsequent to the physical-layer cell identity (PCID) detection in five tapped delay line (TDL) models in both non-line-of-sight (NLOS) and line-of-sight (LOS) environments. Computer simulation results show that by using the proposed method, the radio frame timing detection probabilities of almost 100% and higher than 90% are achieved for the LOS and NLOS channel models, respectively, at the average received signal-to-noise power ratio (SNR) of 0dB with the frequency stability of a local oscillator in a set of user equipment (UE) of 5ppm at the carrier frequency of 4GHz.
Stanislav SEDUKHIN Yoichi TOMIOKA Kohei YAMAMOTO
In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure or shape of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Owing to the implemented support for inter-block image data independence, multiple such TPEs can be used for additional CNN acceleration. Moreover, it is shown that the proposed TPE can also be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the result of a real implementation for an SSD model to which a state-of-the-art channel pruning technique is applied.
Yuyao LIU Shi BAO Go TANAKA Yujun LIU Dongsheng XU
Owing to the influence of shooting equipment, shooting environment, and other factors, image capture often yields low-illumination images with insufficient exposure. For low-illumination images, it is necessary to improve the contrast. In this paper, a digital color image contrast enhancement method based on luminance weight adjustment is proposed. This method improves the contrast of the image while maintaining its detail and naturalness. In the proposed method, the illumination of the histogram equalization image and that of the adaptive gamma correction with weighted distribution image are adjusted by the luminance weight w1 to obtain a detailed image of the bright areas. Thereafter, the suppressed multi-scale retinex (MSR) is used to process the input image and obtain a detailed image of the dark areas. Finally, the luminance weight w2 is used to adjust the illumination components of the detailed images of the bright and dark areas, respectively, to obtain the output image. The experimental results show that the proposed method can enhance the details of the input image and avoid excessive enhancement of contrast, thereby maintaining the naturalness of the input image well. Furthermore, we used the discrete entropy and lightness order error function to perform a numerical evaluation to verify the effectiveness of the proposed method.
Shuhei TAMATE Yutaka TABUCHI Yasunobu NAKAMURA
In this paper, we review the basic components of superconducting quantum computers. We mainly focus on the packaging and wiring technologies required to realize large-scale superconducting quantum computers.
Shunsuke TSUKADA Hikaru TAKAYASHIKI Masayuki SATO Kazuhiko KOMATSU Hiroaki KOBAYASHI
A hybrid memory architecture (HMA) that consists of several distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, the HMA needs metadata for data management since the data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for metadata references. However, as the amount of metadata increases in proportion to the size of the HMA, the memory controller needs to handle a large amount of metadata. As a result, the memory controller cannot cache all the metadata, which increases the number of metadata references. This increases the access latency to reach the target data and degrades the performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs. The proposed mechanism loads the metadata needed in the near future by prefetching. Moreover, to increase the effect of the metadata prefetching, the proposed mechanism predicts the metadata used in the near future based on an address difference, that is, the difference between two consecutive access addresses. The evaluation results show that the proposed metadata prefetching mechanism can improve the instructions per cycle by up to 44% and by 9% on average.
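The idea of predicting a future access from the difference between two consecutive access addresses can be sketched as follows. This is a minimal stride-style model and not the mechanism evaluated in the paper: the class name `DeltaPredictor` and the rule of predicting only when the same difference repeats are assumptions made for illustration.

```python
class DeltaPredictor:
    """Hypothetical predictor driven by the difference of consecutive addresses."""

    def __init__(self):
        self.last_addr = None   # most recent access address
        self.last_delta = None  # difference between the two most recent addresses

    def access(self, addr):
        """Record an access; return a predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            # Assumption: predict only when the same nonzero difference repeats.
            if delta == self.last_delta and delta != 0:
                prediction = addr + delta
            self.last_delta = delta
        self.last_addr = addr
        return prediction
```

In a prefetcher built on this sketch, a non-`None` prediction would identify the data region whose metadata should be loaded into the controller's metadata cache ahead of the access.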
The objective of the critical node problem is to minimize the pairwise connectivity of the residual graph obtained by removing a specific number of nodes. From a mathematical modeling perspective, the more fragmented components there are and the more evenly sized the disconnected sub-graphs are, the better the quality of the solution. Based on this observation, we propose a new Cluster Expansion Method for the Critical Node Problem (CEMCNP), which on the one hand exploits a contraction mechanism to greedily simplify the sparse graph model, and on the other hand adopts an incremental cluster expansion approach to keep the size of each formed component within a reasonable limit. The proposed algorithm also relies heavily on the idea of multi-start iterative local search, and introduces a diversified late acceptance local search strategy to balance diversification and intensification during the neighborhood search. Extensive evaluations show that CEMCNP outperforms KBV on 35 of the 42 benchmark instances, while matching 3 previous best results among the challenging instances. In addition, CEMCNP performs on par with the existing MANCNP and VPMS algorithms on 22 of the 42 graph models while using fewer node exchange operations.
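The objective function itself is easy to state in code. The sketch below computes the pairwise connectivity of the residual graph, that is, the number of node pairs still joined by a path after a chosen set of nodes is deleted. It covers only the objective, not CEMCNP; the function name and the edge-list representation are assumptions made for this example.

```python
def pairwise_connectivity(n, edges, removed):
    """Count connected node pairs after deleting `removed` from nodes 0..n-1."""
    removed = set(removed)
    # Adjacency lists of the residual graph only.
    adj = {v: [] for v in range(n) if v not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)

    seen = set()
    total = 0
    for start in adj:
        if start in seen:
            continue
        # Depth-first search to measure the size of this component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        # A component with `size` nodes contributes size-choose-2 pairs.
        total += size * (size - 1) // 2
    return total
```

On the path graph 0-1-2-3-4, removing the middle node 2 leaves two components of size two, while removing node 1 leaves components of sizes one and three. The even split scores lower (2 pairs versus 3), which is the preference for evenly sized components described above.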
In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by employing small supervised signals. In particular, penalized SNMF (PSNMF) with orthogonality penalty is an effective method. PSNMF forces two basis matrices for target and nontarget sources to be orthogonal to each other and improves the separation accuracy. However, the conventional orthogonality penalty is based on an inner product and does not affect the estimation of the basis matrix properly because of the scale indeterminacy between the basis and activation matrices in NMF. To cope with this problem, a new PSNMF with cosine similarity between the basis matrices is proposed. The experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
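The scale issue can be seen in a small numerical sketch. The exact penalty terms used in PSNMF are not given here, so the two functions below are assumed, simplified forms: one sums the inner products between target and nontarget basis vectors, and the other sums their cosine similarities. Each basis matrix is represented as a list of basis vectors.

```python
import math


def inner_product_penalty(F, H):
    """Sum of inner products between every basis vector in F and in H."""
    return sum(sum(a * b for a, b in zip(f, h)) for f in F for h in H)


def cosine_penalty(F, H):
    """Sum of cosine similarities between every basis vector in F and in H."""
    total = 0.0
    for f in F:
        for h in H:
            dot = sum(a * b for a, b in zip(f, h))
            norm_f = math.sqrt(sum(a * a for a in f))
            norm_h = math.sqrt(sum(b * b for b in h))
            total += dot / (norm_f * norm_h)  # assumes nonzero basis vectors
    return total


# Illustrative nonnegative bases (made-up numbers, not experimental data).
F = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
H = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
# Rescaling the basis can be absorbed by the activations in NMF.
F_scaled = [[10.0 * x for x in f] for f in F]
```

With these values the inner-product penalty grows from 7 to 70 when `F` is rescaled, although the rescaled basis spans the same directions, whereas the cosine penalty is unchanged.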
Takashi ISHIO Naoto MAEDA Kensuke SHIBUYA Kenho IWAMOTO Katsuro INOUE
Software developers may write a number of similar source code fragments that include the same mistake in software products. To remove such faulty code fragments, developers inspect code clones when they find a bug in their code. While various code clone detection methods have been proposed to identify clones of either code blocks or functions, those tools do not always fit the code inspection task because a faulty code fragment may be much smaller than a code block, e.g., a single line of code. To enable developers to search for code clones of such a small faulty code fragment in a large-scale software product, we propose a method using Lempel-Ziv Jaccard Distance, which is an approximation of Normalized Compression Distance. We conducted an experiment using an existing research dataset and a user survey in a company. The results show that our method efficiently reports cloned faulty code fragments and that the performance is acceptable for software developers.
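A minimal sketch of the distance is given below, assuming the commonly described construction: each string is parsed into a Lempel-Ziv style set of previously unseen substrings, and the distance is one minus the Jaccard similarity of the two sets. Practical variants usually approximate the Jaccard term with a fixed-size hash sketch; that step is omitted here, and the function names are chosen for illustration only.

```python
def lz_set(text):
    """Parse `text` into the set of substrings produced by an LZ-style scan."""
    seen = set()
    start = 0
    end = 1
    while end <= len(text):
        piece = text[start:end]
        if piece not in seen:
            # New phrase: record it and begin the next phrase after it.
            seen.add(piece)
            start = end
        end += 1
    return seen


def lzjd(x, y):
    """Exact Jaccard distance between the LZ sets of two strings."""
    a, b = lz_set(x), lz_set(y)
    union = a | b
    if not union:
        return 0.0  # both inputs empty
    return 1.0 - len(a & b) / len(union)
```

For example, `lz_set("aaaa")` is `{"a", "aa"}`, and two strings with no characters in common, such as `"abc"` and `"xyz"`, have distance 1.0. A clone search along these lines would rank candidate fragments by their distance to the faulty fragment.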
Yu DAI Zijian ZHOU Fangguo ZHANG Chang-An ZHAO
Pairing computations on elliptic curves with odd prime embedding degrees are rarely studied because of their low efficiency. Recently, Clarisse, Duquesne and Sanders proposed two new curves with odd prime embedding degrees, BW13-P310 and BW19-P286, which are suitable for some special cryptographic schemes. In this paper, we propose efficient methods to compute the optimal ate pairing on this type of curve, instantiated by the BW13-P310 curve. We first extend the technique of lazy reduction into the finite field arithmetic. Then, we present a new method to execute Miller's algorithm. Compared with the standard Miller iteration formulas, the new ones provide a more efficient software implementation of pairing computations. Finally, we give a fast formula to perform the final exponentiation. Our implementation results indicate that the pairing can be computed efficiently, although it is slower than that over the BLS12-P446 curve at the same security level.
Ryota YOSHIMURA Ichiro MARUTA Kenji FUJIMOTO Ken SATO Yusuke KOBAYASHI
Particle filters have been widely used for state estimation problems in nonlinear and non-Gaussian systems. Their performance depends on the given system and measurement models, which need to be designed by the user for each target system. This paper proposes a novel method to design these models for a particle filter. This is a numerical optimization method, where the particle filter design process is cast into the framework of reinforcement learning by assigning the randomness included in both models of the particle filter to the policy of reinforcement learning. In this method, estimation by the particle filter is repeatedly performed and the parameters that determine both models are gradually updated according to the estimation results. The advantage is that it can optimize various objective functions, such as the estimation accuracy of the particle filter, the variance of the particles, the likelihood of the parameters, and the regularization term of the parameters. We derive the conditions that guarantee that the optimization calculation converges with probability 1. Furthermore, to show that the proposed method can be applied to practical-scale problems, we design a particle filter for mobile robot localization, which is an essential technology for autonomous navigation. Numerical simulations demonstrate that the proposed method further improves the localization accuracy compared to the conventional method.
Tomohiro YAMAZAKI Hisashi KOGA
We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.
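The incremental idea can be illustrated with a simplified sketch. It assumes Jaccard similarity over the distinct items in the window and uses a plain item-to-sets inverted index rather than the frequency-based inverted lists (FIL) of the paper; the class name `SlidingJaccard` and its methods are hypothetical. Only sets that contain an item entering or leaving the window are touched, and each touched set is updated in constant time.

```python
from collections import Counter, defaultdict, deque


class SlidingJaccard:
    """Maintain Jaccard similarity between database sets and a sliding-window query."""

    def __init__(self, sets, window):
        self.sets = [set(s) for s in sets]
        self.window = window
        self.items = deque()              # the latest `window` stream items
        self.counts = Counter()           # multiplicity of each item in the window
        self.inter = [0] * len(self.sets) # |set ∩ distinct window items|
        self.inverted = defaultdict(list) # item -> indices of sets containing it
        for idx, s in enumerate(self.sets):
            for item in s:
                self.inverted[item].append(idx)

    def push(self, item):
        """Advance the stream by one item, updating only the affected sets."""
        self.items.append(item)
        self.counts[item] += 1
        if self.counts[item] == 1:        # item newly present in the query
            for idx in self.inverted.get(item, ()):
                self.inter[idx] += 1
        if len(self.items) > self.window:
            old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:     # item no longer present in the query
                del self.counts[old]
                for idx in self.inverted.get(old, ()):
                    self.inter[idx] -= 1

    def similarity(self, idx):
        """Jaccard similarity of set `idx` with the current query."""
        union = len(self.sets[idx]) + len(self.counts) - self.inter[idx]
        return self.inter[idx] / union if union else 0.0

    def top_k(self, k):
        """Indices of the k most similar sets (full sort, for illustration only)."""
        order = sorted(range(len(self.sets)), key=lambda i: (-self.similarity(i), i))
        return order[:k]
```

The `top_k` method here re-sorts all sets on every call, which is the part an efficient algorithm would avoid; the sketch is meant only to show how similarity values at time T can be derived from those at time T-1.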