Jaeyong KO Namkyoung KIM Kyungho YOO Tongho CHUNG
The increasing demand for millimeter-wave (mmWave) frequencies with wider signal bandwidths, such as 5G NR, requires large investments in test equipment. This work presents a 5G mmWave up/down-converter with a 40 GHz LO, fabricated on custom PCBs with off-the-shelf components. The mmWave converter has broad IF and RF bandwidths of 1∼5 GHz and 21∼45 GHz, and the built-in LO generates 20∼29.5 GHz and 33.5∼40 GHz of output. To achieve high linearity of the converter at the same time, the LO must have low phase noise and high harmonic/spur rejection; design techniques for these features are demonstrated. Additionally, a reconfigurable IF amplifier for bi-directional conversion is included and demonstrates low gain variation to maintain the linearity of the wideband modulation signals. The final converter is tested with 5G OFDM 64-QAM 100 MHz 1-CC (4-CC) signals and shows RF/IF output power of -3/8 dBm with a linear range of 35 (30)/38 (33) dB at an EVM of 25 dB.
Robin KAESBACH Marcel VAN DELDEN Thomas MUSCH
Precision microwave measurement systems require highly stable oscillators with both excellent long-term and short-term stability. Compared to components used in laboratory instruments, dielectric resonator oscillators (DRO) offer low phase noise with greatly reduced mechanical complexity. To further enhance performance, phase-locked loop (PLL) stabilization can be used to eliminate drift and provide precise frequency control. In this work, the design of a low-cost DRO concept is presented and its performance is evaluated through simulations and measurements. An open-loop phase noise of -107.2 dBc/Hz at 10 kHz offset frequency and 12.8 GHz output frequency is demonstrated. Drift and phase noise are reduced by a PLL, so that a very low jitter of under 29.6 fs is achieved over the entire operating bandwidth.
Yuma KAWAMOTO Toki YOSHIOKA Norihiko SHIBATA Daniel HEADLAND Masayuki FUJITA Ryo KOMA Ryo IGARASHI Kazutaka HARA Jun-ichi KANI Tadao NAGATSUMA
We propose a novel silicon diplexer integrated with filters for frequency-division multiplexing in the 300-GHz band. The diplexer consists of a directional coupler formed of unclad silicon wires, a photonic bandgap-based low-pass filter, and a high-pass filter based on frequency-dependent bending loss. These integrated filters are capable of suppressing crosstalk and providing >15 dB isolation over 40 GHz, which is highly beneficial for terahertz-range wireless communications applications. We have used this diplexer in a simultaneous error-free wireless transmission of 300-GHz and 335-GHz channels at an aggregate data rate of 36 Gbit/s.
Rongcheng DONG Taisuke IZUMI Naoki KITAMURA Yuichi SUDO Toshimitsu MASUZAWA
The maximal independent set (MIS) problem is one of the most fundamental problems in the field of distributed computing. This paper focuses on the MIS problem with unreliable communication between processes in the system. We propose a relaxed notion of MIS, named almost MIS (ALMIS), and show that the loosely-stabilizing algorithm proposed in our previous work can achieve exponentially long holding time with logarithmic convergence time and space complexity regarding ALMIS, which cannot be achieved at the same time regarding MIS in our previous work.
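For readers unfamiliar with the target property, the standard MIS definition can be checked as in the minimal sketch below. The graph representation (`adj`, a dict of neighbor sets) and the function name are assumptions made for illustration. The ALMIS relaxation is not shown, because the abstract does not define it.

```python
def is_maximal_independent_set(adj, candidate):
    """Check the standard MIS property on an undirected graph.

    adj: dict mapping each node to the set of its neighbors (hypothetical format).
    candidate: iterable of nodes claimed to form an MIS.
    """
    chosen = set(candidate)
    # Independence: no two chosen nodes are adjacent.
    independent = all(adj[u].isdisjoint(chosen) for u in chosen)
    # Maximality: every node outside the set has at least one chosen neighbor.
    maximal = all(u in chosen or not adj[u].isdisjoint(chosen) for u in adj)
    return independent and maximal


# Path graph 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_maximal_independent_set(path, {0, 2}))
print(is_maximal_independent_set(path, {0}))
```

On the path graph, `{0, 2}` satisfies both conditions, while `{0}` is independent but not maximal because node 2 could still be added.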
Atsushi MATSUO Yudai SUZUKI Ikko HAMAMURA Shigeru YAMASHITA
The Variational Quantum Eigensolver (VQE) algorithm is gaining interest for its potential use in near-term quantum devices. In the VQE algorithm, parameterized quantum circuits (PQCs) are employed to prepare quantum states, which are then utilized to compute the expectation value of a given Hamiltonian. Designing efficient PQCs is crucial for improving convergence speed. In this study, we introduce problem-specific PQCs tailored for optimization problems by dynamically generating PQCs that incorporate problem constraints. This approach reduces the search space by focusing on unitary transformations that benefit the VQE algorithm, and thereby accelerates convergence. Our experimental results demonstrate that our proposed PQCs converge faster than state-of-the-art PQCs, highlighting the potential of problem-specific PQCs in optimization problems.
Ullah IMDAD Akram BEN AHMED Kazuei HIRONAKA Kensuke IIZUKA Hideharu AMANO
FPGA clusters that consist of multiple FPGA boards have been gaining interest in recent times. Massively parallel processing on a stand-alone heterogeneous FPGA cluster that combines SoC-style FPGAs and mid-scale FPGAs is promising in terms of cost-performance. Here, we propose such a heterogeneous FPGA cluster built from FiC and an M-KUBOS cluster. FiC consists of multiple boards, mounting mid-scale Xilinx FPGAs and DRAMs, which are tightly coupled with high-speed serial links. In addition, M-KUBOS boards are connected to FiC to ensure high I/O data transfer bandwidth. As an example of massively parallel processing, we implement genomic pattern search. Next-generation sequencing (NGS) technology has revolutionized research on biological systems through its high speed, scalability, and massive throughput. To analyze the genomic data, a short read mapping technique is used, in which short deoxyribonucleic acid (DNA) sequences are mapped relative to a known reference sequence. Although several pattern matching techniques are available, FM-index based pattern search is well suited to this task because it offers the fastest mapping from known indices. Since matching can be done in parallel for different data, massively parallel computing that distributes the data, executes in parallel, and gathers the results can be applied. We also implement a data compression method that achieves about a 10-fold reduction in data size. We found that one M-KUBOS board matches four FiC boards, and a system with six M-KUBOS boards and 24 FiC boards ran 30 times faster than the software-based implementation.
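To make the FM-index based pattern search mentioned above concrete, here is a minimal software sketch of textbook backward search over a Burrows-Wheeler transform. It is not the FPGA design described in the abstract: the function names are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, which is only reasonable for short demonstration strings.

```python
def bwt_and_suffix_array(text):
    """Build the suffix array and BWT of text + '$' by naive sorting (demo only)."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    return bwt, sa


def build_fm_index(bwt):
    """Return C (count of smaller characters) and occ (prefix counts per character)."""
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (c == ch)
    return C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array range [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_all(text, pattern):
    """Return the sorted start positions of pattern in text."""
    bwt, sa = bwt_and_suffix_array(text)
    C, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[i] for i in range(lo, hi))


print(find_all("ACGTACGTTACG", "ACG"))
```

Each pattern character costs two table lookups regardless of the reference length, and independent reads can be searched against the same index in parallel, which is the property the distributed design relies on.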
Takashi YOKOTA Kanemitsu OOTSU Shun KOJIMA
An interconnection network is an indispensable component of parallel computers. It connects computation nodes so that they can communicate with each other. Because parallel computation inherently requires inter-node communication as dictated by the parallel algorithm, the interconnection network plays an important role in communication performance. This paper focuses on collective communication, which is frequently performed in parallel computation, and addresses the Cup-Stacking method proposed in our preceding work. The key ideas of the method are splitting a large packet into slices, re-shaping the slices, and stacking them, in a genetic algorithm (GA) manner. This paper extends the Cup-Stacking method by introducing additional items (genes) and proposes the extended Cup-Stacking method. Furthermore, it comprehensively discusses the drawbacks and further optimization of the method. Evaluation results reveal the effectiveness of the extended method, which achieves at most a seven percent improvement in duration time over the former Cup-Stacking method.
Zahra AZIZAH Tomoya OHYAMA Xiumin ZHAO Yuichi OHKAWA Takashi MITSUISHI
Learning analytics (LA) has emerged as a technique for educational quality improvement in many learning contexts, including blended learning (BL) courses. Numerous studies show that students' academic performance is significantly impacted by their ability to engage in self-regulated learning (SRL). In this study, learning behaviors indicating SRL and motivation are elucidated during a BL course on second language learning. Online trace data from a mobile language learning application (m-learning app) is used as part of the BL implementation. The observed motivations fall into two categories: high-level motivation (study in time, study again, and early learning) and low-level motivation (cramming and catch up). The results show that students who perform well tend to engage in high-level motivation, while low-performing students tend to engage in low-level motivation. These findings are supported by regression models showing that study in time, followed by early learning, significantly influences academic performance in BL courses in both the spring and fall semesters. Using only the limited resource of m-learning app log data, this BL study could explain the overall BL performance.
We present an effective system for integrating generative zero-shot classification modules into a YOLO-like dense detector to detect novel objects. Most two-stage novel object detection methods work by refining the classification output branch and cannot be applied to a dense detector. Our system utilizes two paths to inject knowledge of novel objects into a dense detector. The first path injects the class confidence for novel classes from a classifier trained on data synthesized via a dual-step generator. This generator learns a mapping function between two feature spaces, resulting in better classification performance. The second path involves re-training the detector head with feature maps synthesized on different intensity levels. This approach significantly increases the predicted objectness for novel objects, which is a major challenge for a dense detector. We also introduce a stop-and-reload mechanism during re-training for optimizing across head layers to better learn synthesized features. Our method relaxes the constraint on the detector head architecture imposed by the previous method and markedly enhances performance on the MSCOCO dataset.
Shugang LIU Yujie WANG Qiangguo YU Jie ZHAN Hongli LIU Jiangtao LIU
Driver fatigue detection has become crucial in vehicle safety technology. Achieving high accuracy and real-time performance in detecting driver fatigue is paramount. In this paper, we propose a novel driver fatigue detection algorithm based on dynamic tracking of Facial Eyes and Yawning using YOLOv7, named FEY-YOLOv7. The Coordinate Attention module is inserted into YOLOv7 to enhance its dynamic tracking accuracy by focusing on coordinate information. Additionally, a small target detection head is incorporated into the network architecture to improve feature extraction for small facial targets such as the eyes and mouth. In terms of computation, the YOLOv7 network architecture is significantly simplified to achieve high detection speed. Using the proposed PERYAWN algorithm, driver status is labeled and detected in four classes: open_eye, closed_eye, open_mouth, and closed_mouth. Furthermore, the Guided Image Filtering algorithm is employed to enhance image details. The proposed FEY-YOLOv7 is trained and validated on RGB-infrared datasets. The results show that FEY-YOLOv7 achieves an mAP of 0.983 and 101 FPS. This indicates that FEY-YOLOv7 is superior to state-of-the-art methods in accuracy and speed, providing an effective and practical solution for image-based driver fatigue detection.
Lei LI Hong-Jun ZHANG Hang-Yu FAN Zhe-Ming LU
To date, digital image watermarking has not been used on a large scale in industry. The first reason is that watermarking efficiency is low, so real-time requirements cannot be met. The second reason is that watermarking schemes cannot cope with various attacks. To solve these problems, this paper presents a multi-domain digital image watermarking scheme, in which a fast DFT (Discrete Fourier Transform) based watermarking method is proposed for synchronization correction and an IWT-DCT (Integer Wavelet Transform-Discrete Cosine Transform) based watermarking method is proposed for information embedding. The proposed scheme is highly efficient in both embedding and extraction. Compared with five existing schemes, our scheme is highly robust: it copes with many common attacks and compound attacks, and can therefore be used in a wide range of application scenarios.
Modern distributed storage requires microsecond-scale tail latency, but the current coordinator-based quorum coordination causes a burdensome latency overhead. This paper presents Archon, a new quorum coordination architecture that supports low tail latency for microsecond-scale replicated storage. The key idea of Archon is to perform the quorum coordination in the network switch by leveraging the flexibility and capability of emerging programmable switch ASICs. Our in-network quorum coordination is based on the observation that modern programmable switches provide nanosecond-scale processing delay and high flexibility simultaneously. To realize the idea, we design a custom switch data plane. We implement an Archon prototype on an Intel Tofino switch and conduct a series of testbed experiments. Our experimental results show that Archon provides lower tail latency than the coordinator-based solution.
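The core bookkeeping behind quorum coordination can be illustrated in a few lines. The sketch below is a software model only and is not Archon's switch data plane: the class name `QuorumTracker`, the `on_ack` method, and the majority quorum rule are all assumptions made for illustration, since the abstract does not specify the quorum size or the packet format.

```python
class QuorumTracker:
    """Count replica acknowledgements per request and report when a quorum forms.

    Hypothetical model: a majority quorum is assumed, and state is kept in
    Python dictionaries rather than in switch registers.
    """

    def __init__(self, num_replicas):
        self.quorum = num_replicas // 2 + 1  # assumed majority rule
        self.acks = {}    # request id -> set of replica ids that have acknowledged
        self.done = set()  # request ids already reported as complete

    def on_ack(self, request_id, replica_id):
        """Record one acknowledgement; return True exactly once per request."""
        if request_id in self.done:
            return False  # late acknowledgement after the quorum was reached
        voters = self.acks.setdefault(request_id, set())
        voters.add(replica_id)  # a set ignores duplicate acks from one replica
        if len(voters) >= self.quorum:
            self.done.add(request_id)
            del self.acks[request_id]
            return True
        return False


tracker = QuorumTracker(num_replicas=3)
print(tracker.on_ack("req-1", "replica-a"))
print(tracker.on_ack("req-1", "replica-b"))
```

In a coordinator-based design this counting runs on a server, so every acknowledgement pays an extra host round trip; doing the same counting on the packet path is what removes that hop.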
Ann Jelyn TIEMPO Yong-Jin JEONG
Field Programmable Gate Arrays (FPGAs) are gaining popularity because of their reconfigurability, which also raises security concerns such as hardware trojan insertion. Various detection methods have been proposed to counter this threat, but they target the ASIC supply chain and cannot be applied directly to FPGA applications. In this paper, the authors implement a structural feature-based method for detecting hardware trojans in a cell-level netlist, an approach that is not yet well explored, in which the nets are segmented into smaller groups based on their interconnection and further analyzed by looking at their structural similarities. Experiments show positive performance, with an average detection rate of 95.41%, an average false alarm rate of 2.87%, and an average accuracy of 96.27%.
Fengchuan XU Qiaoyue LI Guilu ZHANG Yasheng CHANG Zixuan ZHENG
This letter presents a global feature-based method for no-reference quality evaluation of contrast-distorted scanning electron microscopy (SEM) images. Based on the characteristics of SEM images and the human visual system, the global features of SEM images are extracted as the score for evaluating image quality. The texture information of SEM images is first extracted using a low-pass filter with orientation, and the amount of information in the texture part is calculated based on the entropy reflecting the complexity of the texture. The singular values of the original image at four scales are then calculated, and the amount of structural change between different scales is calculated and averaged. Finally, the amounts of texture information and structural change are pooled to generate the final quality score of the SEM image. Experimental results show that the method can effectively evaluate the quality of contrast-distorted SEM images.
This paper proposes a line segment detection method based on pseudo peak suppression and the local Hough transform, which has good noise resistance and addresses the problems of missed detection of short line segments, false detection, and oversegmentation. In addition, in response to uneven development in nuclear emulsion tomographic images, this paper proposes an image preprocessing procedure that uses the “Difference of Gaussian” method to reduce noise and then uses the standard deviation of the gray value of each pixel to bundle and unify the gray value of each pixel, which robustly obtains the linear features in these images. Tests on an actual dataset of nuclear emulsion tomographic images and on the public YorkUrban dataset show that the proposed method can effectively improve the accuracy of convolutional neural network or vision transformer-based event classification for alpha-decay events in nuclear emulsion. In particular, the line segment detection method achieves the best results in both accuracy and processing speed, and it also generalizes well to high-quality natural images.
Junya YOSHIDA Naoki HAYASHI Shigemasa TAKAI
This paper presents a quantized gradient descent algorithm for distributed nonconvex optimization in multiagent systems that takes into account the bandwidth limitation of communication channels. Each agent encodes its estimation variable using a zoom-in parameter and sends the quantized intermediate variable to the neighboring agents. Then, each agent updates the estimation by decoding the received information. In this paper, we show that all agents achieve consensus and their estimated variables converge to a critical point in the optimization problem. A numerical example of a nonconvex logistic regression shows that there is a trade-off between the convergence rate of the estimation and the communication bandwidth.
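The encode/decode step with a zoom-in parameter can be illustrated by a generic dynamic-range quantizer. The sketch below is an assumption-laden illustration, not the algorithm analyzed in the paper: the abstract does not give the quantizer, so a uniform quantizer with a geometrically shrinking scale is assumed, and the names `quantize`, `track`, `shrink`, and `levels` are hypothetical. It tracks a single fixed scalar rather than running a full multiagent optimization.

```python
def quantize(value, estimate, scale, levels):
    """Encode the gap between value and the shared estimate as a small integer."""
    q = round((value - estimate) / scale)
    return max(-levels, min(levels, q))  # saturate to the available symbols


def track(value, steps, scale=1.0, shrink=0.5, levels=1):
    """Let a receiver follow `value` while receiving only integers in [-levels, levels]."""
    estimate = 0.0
    for _ in range(steps):
        q = quantize(value, estimate, scale, levels)  # sender side: encode
        estimate += scale * q                         # receiver side: decode
        scale *= shrink                               # zoom in for the next round
    return estimate


print(abs(track(0.73, steps=20) - 0.73) < 1e-5)
```

With `levels=1` each message carries one of three symbols, so fewer symbols per message mean less bandwidth but a gentler shrink rate is needed to avoid saturation, which is one way to picture the trade-off between convergence rate and bandwidth noted in the abstract.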
Bandpass filters (BPFs) are essential for extracting target signals and eliminating noise from received signals. A BPF whose frequency characteristic is a sum of Gaussian functions is called a Gaussian mixture BPF (GMBPF). In this research, we propose to implement the GMBPF approximately as the sum of several frequency components of the sliding Fourier transform (SFT) or the attenuated SFT (ASFT). Because a component of the SFT/ASFT can be approximately realized using finite impulse response (FIR) recursive filters, its computational complexity does not depend on the length of the impulse response. This property makes the GMBPF ideal for narrow bandpass filtering applications. We conducted experiments to demonstrate the advantages of the proposed GMBPF over FIR filters designed by a MATLAB function with regard to computational complexity.
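The claim that the cost does not depend on the impulse response length follows from the standard sliding DFT recurrence, which updates one frequency bin with a constant number of operations per input sample. The sketch below shows that recurrence for a single bin. It is a textbook sliding DFT, not the proposed GMBPF or the attenuated variant, and the function name and window convention are assumptions.

```python
import cmath
from collections import deque


def sliding_dft_bin(samples, window, k):
    """Track DFT bin k over the most recent `window` samples.

    Recurrence: X[n] = (X[n-1] + x[n] - x[n-window]) * exp(2j*pi*k/window).
    Samples before the start of the input are treated as zero.
    """
    twiddle = cmath.exp(2j * cmath.pi * k / window)
    history = deque([0.0] * window, maxlen=window)
    X = 0j
    outputs = []
    for x in samples:
        oldest = history[0]   # sample that leaves the window
        history.append(x)     # maxlen drops the oldest entry automatically
        X = (X + x - oldest) * twiddle
        outputs.append(X)
    return outputs


def direct_dft_bin(block, k):
    """Reference: evaluate bin k of the DFT of `block` directly."""
    n = len(block)
    return sum(v * cmath.exp(-2j * cmath.pi * k * m / n) for m, v in enumerate(block))


signal = [cmath.sin(0.3 * n).real + 0.5 * cmath.cos(1.1 * n).real for n in range(50)]
recursive = sliding_dft_bin(signal, window=16, k=3)[-1]
print(abs(recursive - direct_dft_bin(signal[-16:], 3)) < 1e-9)
```

The direct evaluation costs one multiply-add per window sample, whereas the recursive form costs one complex multiply and two additions per input sample whatever the window length. Summing a few such bins with suitable weights is the general idea behind building a bandpass response from SFT components.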
Tomoya FUKAMI Hirobumi SAITO Akira HIROSE
This paper proposes an accurate and efficient method to calculate probability distributions of pulse-shaped complex signals. We show that the distribution over the in-phase and quadrature-phase (I/Q) complex plane is obtained by a recursive probability mass function of the accumulator for a pulse-shaping filter. In contrast to existing analytical methods, the proposed method provides complex-plane distributions in addition to instantaneous power distributions. Since digital signal processing generally deals with complex amplitude rather than power, the complex-plane distributions are more useful when considering digital signal processing. In addition, our approach requires no derivation of signal-dependent functions. As a result, like Monte Carlo simulations, it applies easily to arbitrary constellations and pulse-shaping filters. Since the proposed method works without numerical integrals and calculations of transcendental functions, the accuracy degradation caused by floating-point arithmetic is inherently reduced. Our method is faster than Monte Carlo simulations, yet the obtained distributions are more accurate. These features of the proposed method realize a novel framework for evaluating the characteristics of pulse-shaped signals, leading to new modulation, predistortion and peak-to-average power ratio (PAPR) reduction schemes.
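The idea of a recursive probability mass function over a filter accumulator can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's algorithm: symbols are taken to be independent and equiprobable, the filter taps are real integers, constellation points have integer coordinates, and only one output sample phase is considered. The function name `accumulator_pmf` is hypothetical. Exact rational arithmetic is used so that no floating-point error enters.

```python
from collections import defaultdict
from fractions import Fraction


def accumulator_pmf(taps, constellation):
    """Return the exact PMF of sum(tap * symbol) over the I/Q plane.

    taps: real integer filter taps (assumption for exactness).
    constellation: list of (I, Q) integer pairs, assumed equiprobable.
    """
    p_symbol = Fraction(1, len(constellation))
    pmf = {(0, 0): Fraction(1)}  # the accumulator starts at the origin
    for tap in taps:
        updated = defaultdict(Fraction)  # missing keys start at Fraction(0)
        for (i_acc, q_acc), prob in pmf.items():
            for i_sym, q_sym in constellation:
                # Adding one weighted symbol moves the accumulator to a new point.
                updated[(i_acc + tap * i_sym, q_acc + tap * q_sym)] += prob * p_symbol
        pmf = dict(updated)
    return pmf


qpsk = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
pmf = accumulator_pmf([1, 2, 1], qpsk)
print(len(pmf), sum(pmf.values()))
```

Each tap merges paths that land on the same accumulator value, so the table grows with the number of distinct reachable points rather than with the number of symbol sequences, and no random sampling or numerical integration is involved.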
Hong LI Wenjun CAO Chen WANG Xinrui ZHU Guisheng LIAO Zhangqing HE
The configurable ring oscillator physical unclonable function (CRO PUF) is a recently proposed strong PUF based on the classic RO PUF, which can generate an exponential number of challenge-response pairs (CRPs) and has good uniqueness and reliability. However, existing proposals have low hardware utilization and are vulnerable to modeling attacks. In this paper, we propose a novel Configurable Dual State (CDS) PUF with lower overhead and higher resistance to modeling attacks. This structure can be flexibly transformed into an RO PUF or a TERO PUF in the same topology according to the parity of the Hamming weight (HW) of the challenge, which achieves 100% utilization of the inverters and improves the efficiency of hardware utilization. A feedback obfuscation mechanism (FOM) is also proposed, which uses the stable count value of the ring oscillator in the PUF as the updated mask to confuse and hide the original challenge, significantly improving resistance to modeling attacks. The proposed FOM-CDS PUF is analyzed by building a mathematical model and implemented on a Xilinx Artix-7 FPGA. The test results show that the FOM-CDS PUF can effectively resist several popular modeling attack methods, with prediction accuracy below 60%. The results also show that the FOM-CDS PUF performs well, with uniformity, bit error rate at different temperatures, bit error rate at different voltages, and uniqueness of 53.68%, 7.91%, 5.64%, and 50.33%, respectively.
Naoko KIFUNE Hironori UCHIKAWA
In flash memory, each stored data frame is protected from random errors by error correction codes (ECC) such as Bose-Chaudhuri-Hocquenghem (BCH) codes. Exclusive-OR (XOR) based erasure codes like RAID-5 have also been employed in flash memory to protect against memory block defects. Conventionally, the ECC and erasure codes are used separately since their target errors are different. Due to recent aggressive technology scaling, additional error correction capability for random errors is required without adding redundancy. We propose an algorithm that improves error correction capability by using XOR parity with a simple counter that counts the number of unreliable bits in the XOR stripe. We also propose to apply Chase decoding to the proposed algorithm. The counter makes it possible to reduce false corrections and to perform Chase decoding efficiently. We show that combining the proposed algorithm with Chase decoding can significantly improve the decoding performance.
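As background for the XOR stripe, the sketch below shows conventional RAID-5 style parity: one parity frame is the bytewise XOR of the data frames, and any single lost frame can be rebuilt from the survivors. It illustrates only this conventional erasure protection. The proposed counter of unreliable bits and the Chase decoding step are not modeled, because the abstract does not describe them in enough detail, and the function names are hypothetical.

```python
from functools import reduce


def xor_parity(frames):
    """Return the bytewise XOR of equally sized frames."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*frames))


def recover_missing(frames, parity):
    """Rebuild the single frame marked as None from the survivors and the parity."""
    survivors = [frame for frame in frames if frame is not None]
    return xor_parity(survivors + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(stripe)
damaged = [stripe[0], None, stripe[2]]
print(recover_missing(damaged, parity) == stripe[1])
```

Recovery works because XOR-ing a value with itself cancels it, so combining the parity with every surviving frame leaves exactly the missing one.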