Naoto MATSUO Akira HEYA Kazushige YAMANA Koji SUMITOMO Tetsuo TABEI
The influence of the gate voltage or base pair ratio modulation on the λ-DNA FET performance was examined. The result of the gate voltage modulation indicated that the captured electrons in the guanine base of the λ-DNA molecules greatly influenced the Id-Vd characteristics, and that of the base pair ratio modulation indicated that the tendency of the conductivity was partly clarified by considering the activation energy of holes and electrons and the length and numbers of the serial AT or GC sequences over which the holes or electrons jumped. In addition, the influence of the dimensionality of the DNA molecule on the conductivity was discussed theoretically.
This paper presents the design of a capacitive coupler for underwater wireless power transfer focused on the landing direction of a drone. The main design feature is the relative position of power feeding/receiving points on the coupler electrodes, which depends on the landing direction of the drone. First, the maximum power transfer efficiencies of coupled lines with different feeding positions are derived in a uniform dielectric environment, such as that realized underwater. As a result, these are formulated by the coupling coefficient of the capacitive coupler, the unloaded qualify factor of dielectrics, and hyperbolic functions with complex propagation constants. The hyperbolic functions vary depending on the relative positions and whether these are identical or opposite couplers, and the efficiencies of each coupler depend on the type of water, such as seawater and tap water. The design method was demonstrated and achieved the highest efficiencies of 95.2%, 91.5%, and 85.3% in tap water at transfer distances of 20, 50, and 100 mm, respectively.
Akio KAWABATA Bijoy CHAND CHATTERJEE Eiji OKI
This paper proposes a network design model, considering data consistency for a delay-sensitive distributed processing system. The data consistency is determined by collating the own state and the states of slave servers. If the state is mismatched with other servers, the rollback process is initiated to modify the state to guarantee data consistency. In the proposed model, the selected servers and the master-slave server pairs are determined to minimize the end-to-end delay and the delay for data consistency. We formulate the proposed model as an integer linear programming problem. We evaluate the delay performance and computation time. We evaluate the proposed model in two network models with two, three, and four slave servers. The proposed model reduces the delay for data consistency by up to 31 percent compared to that of a typical model that collates the status of all servers at one master server. The computation time is a few seconds, which is an acceptable time for network design before service launch. These results indicate that the proposed model is effective for delay-sensitive applications.
Shingo YASHIKI Chako TAKAHASHI Koutarou SUZUKI
This paper investigates the effects of backdoor attacks on graph neural networks (GNNs) trained through simple data augmentation by modifying the edges of the graph in graph classification. The numerical results show that GNNs trained with data augmentation remain vulnerable to backdoor attacks and may even be more vulnerable to such attacks than GNNs without data augmentation.
Jie LUO Chengwan HE Hongwei LUO
Text classification is a fundamental task in natural language processing, which finds extensive applications in various domains, such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. The Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both Bert and dependency syntax. Our proposed approach integrates semantic information through Bert pre-training model for obtaining word representations, extracts contextual information through Long Short-term memory neural network (LSTM), encodes syntactic dependency trees through graph attention neural network, and utilizes capsule network to effectively integrate features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which can introduce syntactic information into character-level representation. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
While deep image compression performs better than traditional codecs like JPEG on natural images, it faces a challenge as a learning-based approach: compression performance drastically decreases for out-of-domain images. To investigate this problem, we introduce a novel task that we call universal deep image compression, which involves compressing images in arbitrary domains, such as natural images, line drawings, and comics. Furthermore, we propose a content-adaptive optimization framework to tackle this task. This framework adapts a pre-trained compression model to each target image during testing for addressing the domain gap between pre-training and testing. For each input image, we insert adapters into the decoder of the model and optimize the latent representation extracted by the encoder and the adapter parameters in terms of rate-distortion, with the adapter parameters transmitted per image. To achieve the evaluation of the proposed universal deep compression, we constructed a benchmark dataset containing uncompressed images of four domains: natural images, line drawings, comics, and vector arts. We compare our proposed method with non-adaptive and existing adaptive compression methods, and the results show that our method outperforms them. Our code and dataset are publicly available at https://github.com/kktsubota/universal-dic.
Representation learning is a crucial and complex task for multivariate time series data analysis, with a wide range of applications including trend analysis, time series data search, and forecasting. In practice, unsupervised learning is strongly preferred owing to sparse labeling. However, most existing studies focus on the representation of individual subseries without considering relationships between different subseries. In certain scenarios, this may lead to downstream task failures. Here, an unsupervised representation learning model is proposed for multivariate time series that considers the semantic relationship among subseries of time series. Specifically, the covariance calculated by the Gaussian process (GP) is introduced to the self-attention mechanism, capturing relationship features of the subseries. Additionally, a novel unsupervised method is designed to learn the representation of multivariate time series. To address the challenges of variable lengths of input subseries, a temporal pyramid pooling (TPP) method is applied to construct input vectors with equal length. The experimental results show that our model has substantial advantages compared with other representation learning models. We conducted experiments on the proposed algorithm and baseline algorithms in two downstream tasks: classification and retrieval. In classification task, the proposed model demonstrated the best performance on seven of ten datasets, achieving an average accuracy of 76%. In retrieval task, the proposed algorithm achieved the best performance under different datasets and hidden sizes. The result of ablation study also demonstrates significance of semantic relationship in multivariate time series representation learning.
Zejing ZHAO Bin ZHANG Hun-ok LIM
In this study, a Coanda-drone with length, width, and height of 121.6, 121.6, and 191[mm] was designed, and its total mass was 1166.7[g]. Using four propulsion devices, it could produce a maximum of 5428[g] thrust. Its structure is very different from conventional drones because in this study it combines the design of the jet engine of a jet fixed-wing drone with the fuselage structure layout of a rotary-wing drone. The advantage of jet drone's high propulsion is kept so that it can output greater thrust under the same variation of PWM waveform output. In this study, the propulsion device performs high-speed jetting, and the airflow around the propulsion device will also be jetted downward along the direction of the airflow.
Yasumasa NAKA Akihiko ISHIWATA Masaya TAMURA
The misalignment of a coupler is a significant issue for capacitive wireless power transfer (WPT). This paper presents a capacitive WPT system specifically designed for underwater drones operating in flowing freshwater environments. The primary design features include a capacitive coupler with an opposite relative position between feeding and receiving points on the coupler electrode, two phase compensation circuits, and a load-independent inverter. A stable and energy-efficient power transmission is achieved by maintaining a 90° phase difference on the coupler electrode in dielectrics with a large unloaded quality factor (Q factor), such as in freshwater. Although a 622-mm coupler electrode is required at 13.56MHz, the phase compensation circuits can reduce to 250mm as one example, which is mountable to small underwater drones. Furthermore, the electricity waste is automatically reduced using the constant-current (CC) output inverter in the event of misalignment where efficiency drops occur. Finally, their functions are simulated and demonstrated at various receiver positions and transfer distances in tap water.
Shuichi MAEDA Akihiro FUKAMI Kaiki YAMAZAKI
There are several benefits of the information that is invisible to the human eye. “Invisible” here means that it can be visualized or quantified when using instruments. For example, it can improve security without compromising product design. We have succeeded in making an invisible digital image on a metal substrate using periodic repeatability by thin-film interference of niobium oxides. Although this digital information is invisible in the visible light wavelength range of 400-800nm, but detectable in the infrared light that of 800-1150nm. This technology has a potential to be applied to anti-counterfeiting and traceability.
Karin WAKATSUKI Chiemi FUJIKAWA Makoto OMODANI
Herein, we propose a volumetric 3D display in which cross-sectional images are projected onto a rotating helix screen. The method employed by this display can enable image observation from universal directions. A major challenge associated with this method is the presence of invisible regions that occur depending on the observation angle. This study aimed to fabricate a mirror-image helix screen with two helical surfaces coaxially arranged in a plane-symmetrical configuration. The visible region was actually measured to be larger than the visible region of the conventional helix screen. We confirmed that the improved visible region was almost independent of the observation angle and that the visible region was almost equally wide on both the left and right sides of the rotation axis.
Taisei URAKAMI Tamami MARUYAMA Shimpei NISHIYAMA Manato KUSAMIZU Akira ONO Takahiro SHIOZAWA
The novel patch element shapes with the interdigital and multi-via structures for mushroom-type metasurface reflectors are proposed for controlling the reflection phases. The interdigital structure provides a wide reflection phase range by changing the depth of the interdigital fingers. In addition, the multi-via structure provides the higher positive reflection phases such as near +180°. The sufficient reflection phase range of 360° and the low polarization dependent properties could be confirmed by the electromagnetic field simulation. The metasurface reflector for the normal incident plane wave was designed. The desired reflection angles and sharp far field patterns of the reflected beams could be confirmed in the simulation results. The prototype reflectors for the experiments should be designed in the same way as the primary reflector design of the reflector antenna. Specifically, the reflector design method based on the ray tracing method using the incident wave phase was proposed for the prototype. The experimental radiation pattern for the reflector antenna composed of the transmitting antenna (TX) and the prototype metasurface reflector was similar to the simulated radiation pattern. The effectiveness of the proposed structures and their design methods could be confirmed by these simulation and experiment results.
Peng GAO Xin-Yue ZHANG Xiao-Li YANG Jian-Cheng NI Fei WANG
Despite Siamese trackers attracting much attention due to their scalability and efficiency in recent years, researchers have ignored the background appearance, which leads to their inapplicability in recognizing arbitrary target objects with various variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, where the shifted windows multi-head self-attention is produced to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of our proposed tracker, we use the Swin Transformer as the backbone network and introduced an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
Images captured in low-light environments have low visibility and high noise, which will seriously affect subsequent visual tasks such as target detection and face recognition. Therefore, low-light image enhancement is of great significance in obtaining high-quality images and is a challenging problem in computer vision tasks. A low-light enhancement model, LLFormer, based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce the complexity and achieve feature extraction. This algorithm can enhance images well. However, the calculation of the attention mechanism is complex and the number of parameters is large, which limits the application of the model in practice. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. The experiment results show that the proposed method not only reduces the number of parameters by 49%, but also performs better in terms of image detail restoration and noise suppression compared with the baseline model. On the LOL dataset, the PSNR and SSIM were 24.098dB and 0.8575 respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060dB and 0.9490. The evaluation results on the two datasets are better than the current mainstream low-light enhancement algorithms.
Kazuki EGASHIRA Atsuyuki MIYAI Qing YU Go IRIE Kiyoharu AIZAWA
We propose a novel classification problem setting where Undesirable Classes (UCs) are defined for each class. UC is the class you specifically want to avoid misclassifying. To address this setting, we propose a framework to reduce the probabilities for UCs while increasing the probability for a correct class.
Xiaotian WANG Tingxuan LI Takuya TAMURA Shunsuke NISHIDA Takehito UTSURO
In the research of machine reading comprehension of Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty in dealing with cases in which the answer string spans multiple locations in the context. The method of fine-tuning of the BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model of Japanese how-to tip by constructing a generative dataset based on the website “wikihow” as a source of information. We then proposed two methods for multi-task learning to fine-tune the generative model. The first method is the multi-task learning with a generative and extractive hybrid training dataset, where both generative and extractive datasets are simultaneously trained on a single model. The second method is the multi-task learning with the inter-sentence semantic similarity and answer generation, where, drawing upon the answer generation task, the model additionally learns the distance between the sentences of the question/context and the answer in the training examples. The evaluation results showed that both of the multi-task learning methods significantly outperformed the single-task learning method in generative question-and-answer examples. Between the two methods for multi-task learning, that with the inter-sentence semantic similarity and answer generation performed the best in terms of the manual evaluation result. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.
Kenichi FUJITA Atsushi ANDO Yusuke IJIMA
This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker. Speech rhythm is one of the essential factors among speaker characteristics, along with acoustic features such as F0, for reproducing individual utterances in speech synthesis. A novel feature of the proposed method is the rhythm-based embeddings extracted from phonemes and their durations, which are known to be related to speaking rhythm. They are extracted with a speaker identification model similar to the conventional spectral feature-based one. We conducted three experiments, speaker embeddings generation, speech synthesis with generated embeddings, and embedding space analysis, to evaluate the performance. The proposed method demonstrated a moderate speaker identification performance (15.2% EER), even with only phonemes and their duration information. The objective and subjective evaluation results demonstrated that the proposed method can synthesize speech with speech rhythm closer to the target speaker than the conventional method. We also visualized the embeddings to evaluate the relationship between the distance of the embeddings and the perceptual similarity. The visualization of the embedding space and the relation analysis between the closeness indicated that the distribution of embeddings reflects the subjective and objective similarity.
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
Rikuya SASAKI Hiroyuki ICHIDA Htoo Htoo Sandi KYAW Keiichi KANEKO
The increasing demand for high-performance computing in recent years has led to active research on massively parallel systems. The interconnection network in a massively parallel system interconnects hundreds of thousands of processing elements so that they can process large tasks while communicating among others. By regarding the processing elements as nodes and the links between processing elements as edges, respectively, we can discuss various problems of interconnection networks in the framework of the graph theory. Many topologies have been proposed for interconnection networks of massively parallel systems. The hypercube is a very popular topology and it has many variants. The cross-cube is such a topology, which can be obtained by adding one extra edge to each node of the hypercube. The cross-cube reduces the diameter of the hypercube, and allows cycles of odd lengths. Therefore, we focus on the cross-cube and propose an algorithm that constructs disjoint paths from a node to a set of nodes. We give a proof of correctness of the algorithm. Also, we show that the time complexity and the maximum path length of the algorithm are O(n3 log n) and 2n - 3, respectively. Moreover, we estimate that the average execution time of the algorithm is O(n2) based on a computer experiment.
Jinsoo SEO Junghyun KIM Hyemi KIM
Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting song-level feature summary for cover song identification. CQTXNet incorporates depth-wise separable convolution, residual network connections, and attention models to extend previous approaches. An experimental evaluation of the proposed CQTXNet was performed on two publicly available cover song datasets by varying the number of network layers and the type of attention modules.