Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA
End-to-end speech translation (ST) directly renders source language speech to the target language without the intermediate automatic speech recognition (ASR) output used in a cascade approach. End-to-end ST thus avoids error propagation from intermediate ASR results. Although recent attempts have applied multi-task learning using an auxiliary task of ASR to improve ST performance, they use cross-entropy (CE) loss against one-hot references in the ASR task, and the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that uses an ASR-based loss against posterior distributions obtained from a pre-trained ASR model, called ASR posterior-based loss (ASR-PBL). The ASR-PBL method, which enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, can be applied to one of the strong multi-task ST baseline models with Hybrid CTC/Attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method demonstrated better BLEU results than the baseline that used standard CE loss.
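The difference between a CE loss against one-hot references and a loss against ASR posteriors can be illustrated with a small sketch. The abstract does not give the exact formulation of ASR-PBL, so the code below shows only the generic soft-target cross-entropy idea; the three-token vocabulary, the probability values, and all names are hypothetical.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) between two distributions
    defined over the same vocabulary."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Hypothetical vocabulary of three tokens: the reference token and two
# competitors, the first of which sounds similar to the reference.
one_hot_reference = [1.0, 0.0, 0.0]
asr_posterior = [0.70, 0.25, 0.05]   # assumed output of a pre-trained ASR model

# Two hypothetical model outputs that assign the same mass to the reference
# token but distribute the remaining mass differently.
model_a = [0.70, 0.25, 0.05]   # agrees with the ASR confusion pattern
model_b = [0.70, 0.05, 0.25]   # puts the mass on the dissimilar token

ce_a = cross_entropy(one_hot_reference, model_a)
ce_b = cross_entropy(one_hot_reference, model_b)
pbl_a = cross_entropy(asr_posterior, model_a)
pbl_b = cross_entropy(asr_posterior, model_b)

print(f"one-hot CE:      A={ce_a:.3f}  B={ce_b:.3f}")
print(f"posterior-based: A={pbl_a:.3f}  B={pbl_b:.3f}")
```

In this toy setting the one-hot loss cannot distinguish the two outputs, whereas the posterior-based loss is lower for the output that mirrors the assumed ASR confusion.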
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG
Entity alignment (EA) is a crucial task for integrating cross-lingual and cross-domain knowledge graphs (KGs), which aims to discover entities referring to the same real-world object from different KGs. Most existing embedding-based methods generate entity representations for alignment by mining the relevance of triple elements, paying little attention to triple indivisibility and entity role diversity. In this paper, a novel framework named TTEA (Type-enhanced Ensemble Triple Representation via Triple-aware Attention for Cross-lingual Entity Alignment) is proposed to overcome the above shortcomings from the perspective of ensemble triple representation, considering triple specificity and the diversity of entity roles. Specifically, the ensemble triple representation is derived by regarding the relation as an information carrier between the semantic and type spaces, and hence the influence of noise during spatial transformation and information propagation can be smoothly controlled via specificity-aware triple attention. Moreover, the role diversity of triple elements is modeled via triple-aware entity enhancement in TTEA for EA-oriented entity representation. Extensive experiments on three real-world cross-lingual datasets demonstrate that our framework achieves competitive results.
Hao WANG Yao MA Jianyong DUAN Li HE Xin LI
Chinese Spelling Correction (CSC) is an important natural language processing task. Existing methods for CSC mostly utilize BERT models, which select a character from a candidate list to correct errors in the sentence. World knowledge refers to structured information and relationships spanning a wide range of domains and subjects, while definition knowledge pertains to textual explanations or descriptions of specific words or concepts. Both forms of knowledge have the potential to enhance a model’s ability to comprehend contextual nuances. As BERT lacks sufficient guidance from world knowledge for error correction and existing models overlook the rich definition knowledge in Chinese dictionaries, the performance of spelling correction models is somewhat compromised. To address these issues, within the world knowledge network, this study injects world knowledge from knowledge graphs into the model to assist in correcting spelling errors caused by a lack of world knowledge. Additionally, the definition knowledge network in this model improves the error correction capability by utilizing the definitions from the Chinese dictionary through a comparative learning approach. Experimental results on the SIGHAN benchmark dataset validate the effectiveness of our approach.
Hongbo LI Aijun LIU Qiang YANG Zhe LYU Di YAO
To improve the direction-of-arrival (DOA) estimation performance of small-aperture arrays, we propose a source localization method inspired by the Ormia fly’s coupled ears and a MUSIC-like algorithm. The Ormia can localize its host cricket’s sound precisely despite the tremendous mismatch between the spacing of its ears and the sound wavelength. In this paper, we first implement a biologically inspired coupled system based on the coupled model of the Ormia’s ears and solve its responses by the modal decomposition method. Then, we analyze the effect of the system on the received signals of the array. The analysis shows that the system amplifies the amplitude ratio and phase difference between the signals, which is equivalent to creating a virtual array with a larger aperture. Finally, we apply the MUSIC-like algorithm for DOA estimation to suppress the colored noise caused by the system. Numerical results demonstrate that the proposed method can improve the localization precision and resolution of the array.
Hongyun LU Mengmeng ZHANG Hongyuan JING Zhi LIU
Currently, the most advanced knowledge distillation models use a metric learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, causing inefficiency and an inability to capture structural feature representations among different tasks. To overcome this problem, we propose a knowledge distillation loss using the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between the teacher and student representations. Owing to the intuitive geometric properties of GMSW, the student model can effectively learn to align its hidden states with those of the teacher model, thereby establishing a robust correlation among implicit features. In experiments, our method outperforms state-of-the-art models in both high-resource and low-resource settings.
Kota CHIN Keita EMURA Shingo SATO Kazumasa OMOTE
In an open-bid auction, a bidder can learn the budgets of other bidders. Thus, a sealed-bid auction that hides bidding prices is desirable. However, in previous sealed-bid auction protocols, it has been difficult to provide a “fund binding” property, which would guarantee that a bidder holds funds greater than or equal to the bidding price and that the funds are forcibly withdrawn when the bidder wins. Thus, such protocols are vulnerable to false bidding. As a solution, many protocols employ a simple deposit method in which each bidder sends a deposit greater than or equal to the bidding price to a smart contract before the bidding phase. However, this deposit reveals the maximum bidding price, and it is preferable to hide this information. In this paper, we propose a sealed-bid auction protocol that hides both the bidding price and the maximum bidding price while simultaneously providing a fund binding property. To hide the maximum bidding price, we exploit the fact that usual Ethereum transactions and transactions for sending funds to a one-time address have the same transaction structure and therefore appear to be indistinguishable. We discuss the extent to which bidding transactions are hidden. We also employ DECO (Zhang et al., CCS 2020), which proves to a verifier the validity of data taken from a source without revealing the data itself. Finally, we present our implementation, which shows the transaction fees required, and compare it to a sealed-bid auction protocol employing the simple deposit method.
Kensuke SUMOTO Kenta KANAKOGI Hironori WASHIZAKI Naohiko TSUDA Nobukazu YOSHIOKA Yoshiaki FUKAZAWA Hideyuki KANUKA
Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities in software or source code. Although the descriptions include various entities, there is no uniform entity structure, which makes security analysis using individual entities difficult. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an F1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has the potential to automatically label select entities from CVE descriptions.
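As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below uses the standard definition; the function name is illustrative and the inputs are the rounded values quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded values quoted in the abstract.
f1 = f1_score(0.91, 0.95)
print(round(f1, 2))  # 0.93
```

The rounded precision and recall therefore agree with the stated F1 score of about 0.93.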
Li HE Jingxuan ZHAO Jianyong DUAN Hao WANG Xin LI
In Natural Language Understanding, intent detection and slot filling have been widely used to understand user queries. However, current methods tend to rely on single words and sentences to understand complex semantic concepts, and can only consider local information within the sentence. Therefore, they usually cannot capture long-distance dependencies well and are prone to problems where complex intentions in sentences are difficult to recognize. In order to solve the problem of long-distance dependency of the model, this paper uses ConceptNet as an external knowledge source and introduces its extensive semantic information into the multi-intent detection and slot filling model. Specifically, for a certain sentence, based on confidence scores and semantic relationships, the most relevant conceptual knowledge is selected to equip the sentence, and a concept context map with rich information is constructed. Then, the multi-head graph attention mechanism is used to strengthen context correlation and improve the semantic understanding ability of the model. The experimental results indicate that the model has significantly improved performance compared to other models on the MixATIS and MixSNIPS multi-intent datasets.
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module from the conventional model into the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task 1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss with 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634, with 551.89K parameters and 23.6M MAC.
Yukihiro TOZAWA Takeshi ISHIDA Jiaqing WANG Osamu FUJIWARA
Measurements of contact discharge current waveforms from an ESD generator with a test voltage of 4 kV are conducted with the IEC-specified arrangement of a 2 m long return current cable in three different calibration environments, all of which comply with the IEC calibration standard, to identify the source of damped oscillations (ringing), which has remained unclear since contact discharge testing was first adopted in the 1989 IEC Publication 801-2. Their frequency spectra are analyzed in comparison with the spectrum calculated from the ideal contact discharge current waveform without ringing (IEC-specified waveform) given in IEC 61000-4-2 and with the spectra derived from a simplified equivalent circuit based on the IEC standard, combined with the measured input impedances of the return current cable grounded at one end, in the same arrangement and the same calibration environment as those for the current measurements. The results show that the measured contact discharge waveforms have ringing around the IEC-specified waveform after the falling edge of the peak, which affects their spectra from 20 MHz to 200 MHz; the spectra from 40 MHz to 200 MHz differ significantly depending on the calibration environment even for the same cable arrangement, whereas the environment hardly affects the spectra from 20 MHz to 40 MHz and above 200 MHz. In the calibration environment with the cable arranged close to the reference ground, the spectral shapes of the measured contact discharge currents and the frequencies of their multiple peaks and dips roughly correspond to the spectral distributions calculated from the simplified equivalent circuit using the measured cable input impedances. These findings reveal that ringing is mainly caused by resonances of the return current cable, and that a calibration environment with the cable arranged away from the reference ground tends to mitigate the cable resonances.
Rin OISHI Junichiro KADOMOTO Hidetsugu IRIE Shuichi SAKAI
As more and more programs handle personal information, the demand for secure handling of data is increasing. The protocol that satisfies this demand is called secure function evaluation (SFE) and has attracted much attention from a privacy protection perspective. In two-party SFE, two mutually untrustworthy parties compute an arbitrary function on their respective secret inputs without disclosing any information other than the output of the function. For example, it is possible to execute a program while protecting private information, such as genomic information. The garbled circuit (GC), a method of program obfuscation in which the program is divided into gates and the output is calculated using a symmetric key cipher for each gate, is an efficient method for this purpose. However, GC is computationally expensive and has a significant overhead even with an accelerator. We focus on hardware acceleration because GC is by nature limited to certain types of calculations, such as encryption and XOR. In this paper, we propose an architecture, based on the latest FPGA-based GC accelerator, that accelerates garbling by running multiple garbling engines simultaneously. In this architecture, managers are introduced to perform multiple rows of pipeline processing simultaneously. We also propose an optimized implementation of RAM for this FPGA accelerator. As a result, it achieves an average performance improvement of 26% in garbling the same set of programs, compared to the state-of-the-art (SOTA) garbling accelerator.
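To make the per-gate garbling step concrete, here is a minimal software sketch of garbling and evaluating a single AND gate. It is a textbook-style illustration, not the accelerator's design: a SHA-256-derived pad stands in for the symmetric key cipher, an all-zero tag is used to recognize the correct table row, and all names (`new_wire`, `garble_and_gate`, `evaluate_gate`) are hypothetical. It is not hardened for real use.

```python
import hashlib
import secrets

LABEL_LEN = 16          # bytes per wire label
TAG = bytes(16)         # all-zero tag marking a correctly decrypted row

def _pad(key_a, key_b, gate_id):
    # Assumption: a hash-derived one-time pad stands in for the
    # symmetric key cipher mentioned in the abstract (32-byte output).
    return hashlib.sha256(key_a + key_b + gate_id).digest()

def _xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def new_wire():
    """Return two random labels: index 0 encodes bit 0, index 1 encodes bit 1."""
    return (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))

def garble_and_gate(wire_a, wire_b, wire_out, gate_id=b"gate-0"):
    """Encrypt the output label for every input combination of an AND gate."""
    rows = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = wire_out[bit_a & bit_b]
            pad = _pad(wire_a[bit_a], wire_b[bit_b], gate_id)
            rows.append(_xor(pad, out_label + TAG))
    # Shuffle so that the row position does not reveal the input bits.
    secrets.SystemRandom().shuffle(rows)
    return rows

def evaluate_gate(rows, label_a, label_b, gate_id=b"gate-0"):
    """Recover the output label given exactly one label per input wire."""
    pad = _pad(label_a, label_b, gate_id)
    for row in rows:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")

# Usage: the evaluator holds the labels for a=1, b=1 and learns only
# the output label, which the garbler maps back to the bit 1.
a, b, out = new_wire(), new_wire(), new_wire()
table = garble_and_gate(a, b, out)
print(evaluate_gate(table, a[1], b[1]) == out[1])
```

Each gate costs four hash (or cipher) calls to garble in this naive form, which gives a sense of why the workload is dominated by encryption-like operations.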
Masayuki ARIYOSHI Kazumine OGURA Tatsuya SUMIYA Nagma S. KHAN Shingo YAMANOUCHI Toshiyuki NOMURA
Radar-based sensing and concealed weapon detection technologies have been attracting attention as a measure to enhance security screening in public facilities and various venues. For these applications, the security check must be performed without impeding the flow of people, with minimum human effort, and in a non-contact manner. We developed technologies for a high-throughput walk-through security screening called Invisible Sensing (IVS) and implemented them in a prototype system. The IVS system consists of dual planar radar panels facing each other and carries out an inspection based on a multi-region screening approach as a person walks between the panels. Our imaging technology constructs a high-quality radar image that compensates for motion blur caused by a person's walk. Our detection technology takes multi-view projected images across the multiple regions as input to enable real-time whole-body screening. The IVS system runs its functions by pipeline processing to achieve real-time screening operation. This paper presents our IVS system along with these key technologies and demonstrates its empirical performance.
Keisuke KAWAHARA Yohtaro UMEDA Kyoya TAKANO Shinsuke HARA
This paper presents a compact fully-differential distributed amplifier using a coupled inductor. Differential distributed amplifiers are widely required in optical communication systems. Most of the distributed amplifiers reported in the past have single-ended or pseudo-differential topologies. In addition, differential distributed amplifiers require many inductors, which increases the silicon cost. In this study, we use differentially coupled inductors to reduce the chip area to less than half and eliminate the difficulties in layout design. The challenge in using coupled inductors is the capacitive parasitic coupling that degrades the flatness of the frequency response. To address this challenge, the odd-mode image parameters of a differential artificial transmission line are derived using a simple lossless model. Based on the analytical results, we optimize the dimensions of the inductor with the gradient descent algorithm to achieve accurate impedance matching and phase matching. The amplifier was fabricated in 0.18-µm CMOS technology. The core area of the amplifier is 0.27 mm², which is 57% smaller than that of the previous work. In addition, we demonstrated a small group delay variation of ±2.7 ps thanks to the optimization. The amplifier successfully performed 30-Gbps NRZ and PAM4 transmissions with superior jitter performance. The proposed technique will promote the high-density integration of differential traveling wave devices.
Robin KAESBACH Marcel VAN DELDEN Thomas MUSCH
Precision microwave measurement systems require highly stable oscillators with both excellent long-term and short-term stability. Compared to components used in laboratory instruments, dielectric resonator oscillators (DRO) offer low phase noise with greatly reduced mechanical complexity. To further enhance performance, phase-locked loop (PLL) stabilization can be used to eliminate drift and provide precise frequency control. In this work, the design of a low-cost DRO concept is presented and its performance is evaluated through simulations and measurements. An open-loop phase noise of -107.2 dBc/Hz at 10 kHz offset frequency and 12.8 GHz output frequency is demonstrated. Drift and phase noise are reduced by a PLL, so that a very low jitter of under 29.6 fs is achieved over the entire operating bandwidth.
Tania SULTANA Sho KUROSAKI Yutaka JITSUMATSU Shigehide KUHARA Jun'ichi TAKEUCHI
We assess how well the recently created MRI reconstruction technique, Multi-Resolution Convolutional Neural Network (MRCNN), performs in the core medical vision field (classification). The primary goal of MRCNN is to identify the best k-space undersampling patterns to accelerate MRI. In this study, we use the Figshare brain tumor dataset for MRI classification, which contains 3064 T1-weighted contrast-enhanced MRI (CE-MRI) images over three categories: meningioma, glioma, and pituitary tumors. We apply MRCNN, a method to reconstruct high-quality images from under-sampled k-space signals, to the dataset. Next, we apply the pre-trained VGG16 model, a Deep Neural Network (DNN)-based image classifier, to the MRCNN-restored MRIs to classify the brain tumors. Our experiments showed that, for MRCNN-restored data, the proposed brain tumor classifier achieved 92.79% classification accuracy at a 10% sampling rate, which is slightly higher than the 91.89%, 91.89%, and 90.98% obtained with the SRCNN, MoDL, and zero-filling methods, respectively. Note that our classifier was trained using the dataset consisting of the fully sampled images and their labels, which can be regarded as a model of the usual human diagnostician. Hence, our results suggest that MRCNN is useful for human diagnosis. In conclusion, MRCNN significantly enhances the accuracy of the brain tumor classification system based on the tumor location using under-sampled k-space signals.
Jean TEMGA Koki EDAMATSU Tomoyuki FURUICHI Mizuki MOTOYOSHI Takashi SHIBA Noriharu SUEMATSU
In this article, a new Beamforming Network (BFN) realized in Broadside Coupled Stripline (BCS) is proposed to feed 1×4 and 2×2 array antennas in the 28 GHz band. The new BFN is composed only of couplers and phase shifters. It does not require any crossover, unlike the conventional Butler Matrix (BM), which requires two crossovers. The tight coupling and low loss characteristics of the BCS allow the design of a compact and wideband BFN. The new BFN produces phase differences of (±90°) and (±45°, ±135°) in the x- and y-directions, respectively. Its integration with a 1×4 linear array antenna reduces the array area by 70% and improves the gain performance compared with the conventional array. The integration with a 2×2 array allows the realization of full 2-D beam scanning. The proposed concept has been verified experimentally by measuring the fabricated prototypes of the BFN and the 1-D and 2-D patch array antennas. The measured maximum gains of 11.5 dBi and 11.3 dBi are realized in the θ0 = 14° and (θ0, φ0) = (45°, 345°) directions for the 1-D and 2-D patch arrays, respectively. The physical area of the fabricated BFN is only (0.37λ0×0.3λ0×0.08λ0), while the areas of the 1-D and 2-D array antennas without feeding transmission lines are (0.5λ0×2.15λ0×0.08λ0) and (0.9λ0×0.8λ0×0.08λ0), respectively.
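The link between the inter-element phase difference produced by a BFN and the resulting beam direction can be sketched with the standard uniform-linear-array relation sin θ = Δφ / (2π d/λ). The element spacing is not stated in the abstract, so the half-wavelength default below is an assumption, as are the function and parameter names; the sign of θ depends on the chosen phase convention.

```python
import math

def beam_angle_deg(phase_diff_deg, spacing_wavelengths=0.5):
    """Beam direction (degrees from broadside) of a uniform linear array
    for a given progressive phase difference between adjacent elements.

    spacing_wavelengths is the element spacing d/lambda (assumed 0.5 here).
    """
    s = math.radians(phase_diff_deg) / (2 * math.pi * spacing_wavelengths)
    if abs(s) > 1:
        raise ValueError("no real beam direction for this phase/spacing")
    return math.degrees(math.asin(s))

# Phase differences named in the abstract.
for dphi in (45, 90, 135):
    print(f"{dphi:>4} deg -> {beam_angle_deg(dphi):.2f} deg")
```

Under the assumed half-wavelength spacing, a 45° phase difference maps to a beam roughly 14.5° off broadside, which is close to the θ0 = 14° direction reported for the 1-D array, although the actual spacing of the prototype may differ.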
Yingyao WANG Han WANG Chaoqun DUAN Tiejun ZHAO
Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models trained on linear-chain natural language cannot be directly applied to encode tables and graphs. Existing methods adopt the pre-trained models in such tasks by flattening structured knowledge into sequences. However, this serialization leads to the loss of the structural information of the knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects the local-to-global structural information of the knowledge into the masks of the different self-attention layers. Specifically, in the lower self-attention layers, SATrans focuses on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects the global information of the structured knowledge to integrate the information among knowledge tokens. In this way, SATrans can effectively learn the semantic representation and the structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on the table fact verification task and the knowledge base question-answering task. Furthermore, we explore two methods to combine symbolic and linguistic reasoning for these tasks to address the lack of symbolic reasoning ability in pre-trained models. The experimental results reveal that the methods consistently outperform strong baselines on the two benchmarks.
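The local-to-global masking idea can be sketched for a flattened table. The abstract does not define what counts as "local" structure, so the rule used here (a cell token sees tokens in the same row or the same column) is an assumption made purely for illustration, as are the function names and the split between lower and upper layers.

```python
def local_mask(cells):
    """cells: list of (row, col) positions of the flattened table tokens.
    mask[i][j] == 1 means token i may attend to token j.
    Assumed local rule: same row or same column."""
    n = len(cells)
    return [
        [1 if cells[i][0] == cells[j][0] or cells[i][1] == cells[j][1] else 0
         for j in range(n)]
        for i in range(n)
    ]

def global_mask(cells):
    """Every token may attend to every other token."""
    n = len(cells)
    return [[1] * n for _ in range(n)]

def layer_masks(cells, num_layers, num_local_layers):
    """One mask per self-attention layer: local masks in the lower
    layers, global masks in the upper layers."""
    return [
        local_mask(cells) if layer < num_local_layers else global_mask(cells)
        for layer in range(num_layers)
    ]

# A 2x2 table flattened row by row.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
masks = layer_masks(cells, num_layers=4, num_local_layers=2)
print(masks[0][0])   # lower layer: cell (0, 0) does not see cell (1, 1)
print(masks[3][0])   # upper layer: cell (0, 0) sees every cell
```

In a real transformer these 0/1 masks would be converted to additive attention biases; that step is omitted here.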
Jean TEMGA Tomoyuki FURUICHI Takashi SHIBA Noriharu SUEMATSU
A 2-D beam scanning array antenna fed by a compact 16-way 2-D beamforming network (BFN) designed in Broadside Coupled Stripline (BCS) is presented. The proposed 16-way 2-D BFN is formed by interconnecting two groups of 4×4 Butler Matrices (BMs). Each group is composed of four compact 4×4 BMs. The critical point of the design is a simple and compact 4×4 BM without crossover in BCS, which achieves a better transmission coefficient of the 16-way 2-D BFN with a reduced size of merely 0.8λ0×0.8λ0×0.04λ0. Moreover, the complexity of the interface connection between the 2-D BFN and the 4×4 patch array antenna is reduced by using probe feeding. The 16-way 2-D BFN is able to produce phase shifts of ±45° and ±135° in the x- and y-directions. The 2-D BFN is easily integrated under the 4×4 patch array to form a 2-D phased array capable of switching 16 beams in both elevation and azimuth directions. The area of the proposed 2-D beam scanning array antenna module has been significantly reduced to 2λ0×2λ0×0.04λ0. A prototype operating in the frequency range of 4-6 GHz is fabricated and measured to validate the concept. The measurement results agree well with the simulations.
Shohei KAKEI Hiroaki SEKO Yoshiaki SHIRAISHI Shoichi SAITO
This paper first takes IoT as an example to provide the motivation for eliminating the single point of trust (SPOT) in a CA-based private PKI. It then describes a distributed public key certificate-issuing infrastructure that eliminates the SPOT, along with the limitation of this infrastructure that derives from generating signing keys. Finally, it proposes a method in which all certificate issuers take part in addressing this limitation.
In this letter, we propose a feature-based knowledge distillation scheme that transfers knowledge between intermediate blocks of a teacher and a student with flow-based architectures, specifically normalizing flows in our implementation. In addition to the knowledge transfer scheme, we examine how the configuration of the distillation positions affects the knowledge transfer performance. To evaluate the proposed ideas, we choose two knowledge distillation baseline models based on normalizing flows in different domains: CS-Flow for anomaly detection and SRFlow-DA for super-resolution. A set of performance comparisons with the baseline models on popular benchmark datasets shows promising results along with improved inference speed. The comparison includes a performance analysis based on various configurations of the distillation positions in the proposed scheme.