The search functionality is under construction.

Keyword Search Result

[Keyword] Si(16348hit)

201-220hit(16348hit)

  • Semantic Relationship-Based Unsupervised Representation Learning of Multivariate Time Series

    Chengyang YE  Qiang MA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/11/16
      Vol:
    E107-D No:2
      Page(s):
    191-200

    Representation learning is a crucial and complex task for multivariate time series data analysis, with a wide range of applications including trend analysis, time series data search, and forecasting. In practice, unsupervised learning is strongly preferred owing to sparse labeling. However, most existing studies focus on the representation of individual subseries without considering relationships between different subseries. In certain scenarios, this may lead to downstream task failures. Here, an unsupervised representation learning model is proposed for multivariate time series that considers the semantic relationship among subseries of time series. Specifically, the covariance calculated by the Gaussian process (GP) is introduced to the self-attention mechanism, capturing relationship features of the subseries. Additionally, a novel unsupervised method is designed to learn the representation of multivariate time series. To address the challenge of variable-length input subseries, a temporal pyramid pooling (TPP) method is applied to construct input vectors of equal length. The experimental results show that our model has substantial advantages compared with other representation learning models. We conducted experiments on the proposed algorithm and baseline algorithms in two downstream tasks: classification and retrieval. In the classification task, the proposed model demonstrated the best performance on seven of ten datasets, achieving an average accuracy of 76%. In the retrieval task, the proposed algorithm achieved the best performance across different datasets and hidden sizes. The results of the ablation study also demonstrate the significance of the semantic relationship in multivariate time series representation learning.

  • Development of a Coanda-Drone with Built-in Propellers

    Zejing ZHAO  Bin ZHANG  Hun-ok LIM  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/11/10
      Vol:
    E107-D No:2
      Page(s):
    180-190

    In this study, a Coanda-drone with a length, width, and height of 121.6, 121.6, and 191 mm, respectively, and a total mass of 1166.7 g was designed. Using four propulsion devices, it can produce a maximum thrust of 5428 g. Its structure differs greatly from that of conventional drones because it combines the jet engine design of a jet fixed-wing drone with the fuselage layout of a rotary-wing drone. The high propulsion of the jet drone is retained, so that it can output greater thrust under the same variation of PWM waveform output. The propulsion device performs high-speed jetting, and the airflow around the propulsion device is also jetted downward along the direction of the airflow.

  • Capacitive Wireless Power Transfer System with Misalignment Tolerance in Flowing Freshwater Environments

    Yasumasa NAKA  Akihiko ISHIWATA  Masaya TAMURA  

     
    PAPER-Electromagnetic Theory

      Publicized:
    2023/08/01
      Vol:
    E107-C No:2
      Page(s):
    47-56

    The misalignment of a coupler is a significant issue for capacitive wireless power transfer (WPT). This paper presents a capacitive WPT system specifically designed for underwater drones operating in flowing freshwater environments. The primary design features include a capacitive coupler with an opposite relative position between feeding and receiving points on the coupler electrode, two phase compensation circuits, and a load-independent inverter. Stable and energy-efficient power transmission is achieved by maintaining a 90° phase difference on the coupler electrode in dielectrics with a large unloaded quality factor (Q factor), such as freshwater. Although a 622-mm coupler electrode is required at 13.56 MHz, the phase compensation circuits can reduce this to, for example, 250 mm, which is mountable on small underwater drones. Furthermore, wasted electricity is automatically reduced by the constant-current (CC) output inverter in the event of misalignment, where efficiency drops occur. Finally, these functions are simulated and demonstrated at various receiver positions and transfer distances in tap water.

  • Invisible Digital Image by Thin-Film Interference of Niobium Oxide Using Its Periodic Repeatability Open Access

    Shuichi MAEDA  Akihiro FUKAMI  Kaiki YAMAZAKI  

     
    INVITED PAPER

      Publicized:
    2023/08/22
      Vol:
    E107-C No:2
      Page(s):
    42-46

    Information that is invisible to the human eye offers several benefits. Here, “invisible” means that the information can still be visualized or quantified using instruments. For example, it can improve security without compromising product design. We have succeeded in making an invisible digital image on a metal substrate using the periodic repeatability of thin-film interference of niobium oxides. Although this digital information is invisible in the visible light wavelength range of 400-800 nm, it is detectable in the infrared range of 800-1150 nm. This technology has the potential to be applied to anti-counterfeiting and traceability.

  • Universal Angle Visibility Realized by a Volumetric 3D Display Using a Rotating Mirror-Image Helix Screen Open Access

    Karin WAKATSUKI  Chiemi FUJIKAWA  Makoto OMODANI  

     
    INVITED PAPER

      Publicized:
    2023/08/03
      Vol:
    E107-C No:2
      Page(s):
    23-28

    Herein, we propose a volumetric 3D display in which cross-sectional images are projected onto a rotating helix screen. The method employed by this display can enable image observation from universal directions. A major challenge associated with this method is the presence of invisible regions that occur depending on the observation angle. This study aimed to fabricate a mirror-image helix screen with two helical surfaces coaxially arranged in a plane-symmetrical configuration. The visible region was actually measured to be larger than the visible region of the conventional helix screen. We confirmed that the improved visible region was almost independent of the observation angle and that the visible region was almost equally wide on both the left and right sides of the rotation axis.

  • Interdigital and Multi-Via Structures for Mushroom-Type Metasurface Reflectors

    Taisei URAKAMI  Tamami MARUYAMA  Shimpei NISHIYAMA  Manato KUSAMIZU  Akira ONO  Takahiro SHIOZAWA  

     
    PAPER-Antennas and Propagation

      Vol:
    E107-B No:2
      Page(s):
    309-320

    Novel patch element shapes with interdigital and multi-via structures are proposed for mushroom-type metasurface reflectors to control the reflection phases. The interdigital structure provides a wide reflection phase range by changing the depth of the interdigital fingers. In addition, the multi-via structure provides higher positive reflection phases, such as those near +180°. A sufficient reflection phase range of 360° and low polarization dependence were confirmed by electromagnetic field simulation. A metasurface reflector for a normally incident plane wave was designed. The desired reflection angles and sharp far-field patterns of the reflected beams were confirmed in the simulation results. The prototype reflectors for the experiments should be designed in the same way as the primary reflector design of the reflector antenna. Specifically, a reflector design method based on the ray tracing method using the incident wave phase was proposed for the prototype. The experimental radiation pattern for the reflector antenna composed of the transmitting antenna (TX) and the prototype metasurface reflector was similar to the simulated radiation pattern. The effectiveness of the proposed structures and their design methods was confirmed by these simulation and experimental results.

  • Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention

    Peng GAO  Xin-Yue ZHANG  Xiao-Li YANG  Jian-Cheng NI  Fei WANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/10/20
      Vol:
    E107-D No:1
      Page(s):
    161-164

    Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have ignored the background appearance, which makes these trackers ill-suited to recognizing arbitrary target objects with various variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, in which shifted windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of our proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.

  • Lightweight and Fast Low-Light Image Enhancement Method Based on PoolFormer

    Xin HU  Jinhua WANG  Sunhan XU  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2023/10/05
      Vol:
    E107-D No:1
      Page(s):
    157-160

    Images captured in low-light environments have low visibility and high noise, which will seriously affect subsequent visual tasks such as target detection and face recognition. Therefore, low-light image enhancement is of great significance in obtaining high-quality images and is a challenging problem in computer vision tasks. A low-light enhancement model, LLFormer, based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce the complexity and achieve feature extraction. This algorithm can enhance images well. However, the calculation of the attention mechanism is complex and the number of parameters is large, which limits the application of the model in practice. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. The experiment results show that the proposed method not only reduces the number of parameters by 49%, but also performs better in terms of image detail restoration and noise suppression compared with the baseline model. On the LOL dataset, the PSNR and SSIM were 24.098dB and 0.8575 respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060dB and 0.9490. The evaluation results on the two datasets are better than the current mainstream low-light enhancement algorithms.

  • Negative Learning to Prevent Undesirable Misclassification

    Kazuki EGASHIRA  Atsuyuki MIYAI  Qing YU  Go IRIE  Kiyoharu AIZAWA  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2023/10/05
      Vol:
    E107-D No:1
      Page(s):
    144-147

    We propose a novel classification problem setting where Undesirable Classes (UCs) are defined for each class. A UC is a class that you specifically want to avoid misclassifying a sample as. To address this setting, we propose a framework to reduce the probabilities for UCs while increasing the probability for the correct class.

  • Multi-Task Learning of Japanese How-to Tip Machine Reading Comprehension by a Generative Model

    Xiaotian WANG  Tingxuan LI  Takuya TAMURA  Shunsuke NISHIDA  Takehito UTSURO  

     
    PAPER-Natural Language Processing

      Publicized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    125-134

    In the research of machine reading comprehension of Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty in dealing with cases in which the answer string spans multiple locations in the context. The method of fine-tuning of the BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model of Japanese how-to tip by constructing a generative dataset based on the website “wikihow” as a source of information. We then proposed two methods for multi-task learning to fine-tune the generative model. The first method is the multi-task learning with a generative and extractive hybrid training dataset, where both generative and extractive datasets are simultaneously trained on a single model. The second method is the multi-task learning with the inter-sentence semantic similarity and answer generation, where, drawing upon the answer generation task, the model additionally learns the distance between the sentences of the question/context and the answer in the training examples. The evaluation results showed that both of the multi-task learning methods significantly outperformed the single-task learning method in generative question-and-answer examples. Between the two methods for multi-task learning, that with the inter-sentence semantic similarity and answer generation performed the best in terms of the manual evaluation result. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.

  • Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

    Kenichi FUJITA  Atsushi ANDO  Yusuke IJIMA  

     
    PAPER-Speech and Hearing

      Publicized:
    2023/10/06
      Vol:
    E107-D No:1
      Page(s):
    93-104

    This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker. Speech rhythm is one of the essential factors among speaker characteristics, along with acoustic features such as F0, for reproducing individual utterances in speech synthesis. A novel feature of the proposed method is the rhythm-based embeddings extracted from phonemes and their durations, which are known to be related to speaking rhythm. They are extracted with a speaker identification model similar to the conventional spectral feature-based one. We conducted three experiments, speaker embeddings generation, speech synthesis with generated embeddings, and embedding space analysis, to evaluate the performance. The proposed method demonstrated a moderate speaker identification performance (15.2% EER), even with only phonemes and their duration information. The objective and subjective evaluation results demonstrated that the proposed method can synthesize speech with speech rhythm closer to the target speaker than the conventional method. We also visualized the embeddings to evaluate the relationship between the distance of the embeddings and the perceptual similarity. The visualization of the embedding space and the relation analysis between the closeness indicated that the distribution of embeddings reflects the subjective and objective similarity.

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    83-92

    The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.

  • Node-to-Set Disjoint Paths Problem in Cross-Cubes

    Rikuya SASAKI  Hiroyuki ICHIDA  Htoo Htoo Sandi KYAW  Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2023/10/06
      Vol:
    E107-D No:1
      Page(s):
    53-59

    The increasing demand for high-performance computing in recent years has led to active research on massively parallel systems. The interconnection network in a massively parallel system interconnects hundreds of thousands of processing elements so that they can process large tasks while communicating with one another. By regarding the processing elements as nodes and the links between processing elements as edges, we can discuss various problems of interconnection networks in the framework of graph theory. Many topologies have been proposed for interconnection networks of massively parallel systems. The hypercube is a very popular topology and it has many variants. The cross-cube is one such topology, which can be obtained by adding one extra edge to each node of the hypercube. The cross-cube reduces the diameter of the hypercube, and allows cycles of odd lengths. Therefore, we focus on the cross-cube and propose an algorithm that constructs disjoint paths from a node to a set of nodes. We give a proof of correctness of the algorithm. Also, we show that the time complexity and the maximum path length of the algorithm are O(n^3 log n) and 2n - 3, respectively. Moreover, we estimate that the average execution time of the algorithm is O(n^2) based on a computer experiment.

  • CQTXNet: A Modified Xception Network with Attention Modules for Cover Song Identification

    Jinsoo SEO  Junghyun KIM  Hyemi KIM  

     
    LETTER

      Publicized:
    2023/10/02
      Vol:
    E107-D No:1
      Page(s):
    49-52

    Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting song-level feature summary for cover song identification. CQTXNet incorporates depth-wise separable convolution, residual network connections, and attention models to extend previous approaches. An experimental evaluation of the proposed CQTXNet was performed on two publicly available cover song datasets by varying the number of network layers and the type of attention modules.

  • A Coded Aperture as a Key for Information Hiding Designed by Physics-in-the-Loop Optimization

    Tomoki MINAMATA  Hiroki HAMASAKI  Hiroshi KAWASAKI  Hajime NAGAHARA  Satoshi ONO  

     
    PAPER

      Publicized:
    2023/09/28
      Vol:
    E107-D No:1
      Page(s):
    29-38

    This paper proposes a novel application of coded apertures (CAs) for visual information hiding. The CA is one of the representative computational photography techniques, in which a patterned mask is attached to a camera as an alternative to a conventional circular aperture. With image processing in the post-processing phase, various functions such as omnifocal image capturing and depth estimation can be performed. In general, a watermark embedded as high-frequency components is difficult to extract if captured outside the focal length, where defocus blur occurs. Installing a CA in the camera is a simple way to mitigate this difficulty, and several attempts have been made to improve the design for stable extraction. In contrast, our motivation is to design a specific CA together with an information hiding scheme, such that the secret information can only be decoded if an image with hidden information is captured with the key aperture at a certain distance outside the focus range. The proposed technique designs the key aperture patterns and information hiding scheme through evolutionary multi-objective optimization so as to minimize the decryption error of a hidden image when using the key aperture while minimizing the accuracy when using other apertures. During the optimization process, solution candidates, i.e., key aperture patterns and information hiding schemes, are evaluated on actual devices to account for disturbances that cannot be considered in optical simulations. Experimental results have shown that decoding can be performed with the designed key aperture and similar ones, that decrypted image quality deteriorates as the similarity between the key and the aperture used for decryption decreases, and that the proposed information hiding technique works on actual devices.

  • Frameworks for Privacy-Preserving Federated Learning

    Le Trieu PHONG  Tran Thi PHUONG  Lihua WANG  Seiichi OZAWA  

     
    INVITED PAPER

      Publicized:
    2023/09/25
      Vol:
    E107-D No:1
      Page(s):
    2-12

    In this paper, we explore privacy-preserving techniques in federated learning, including those that can be used with both neural networks and decision trees. We begin by identifying how information can be leaked in federated learning, after which we present methods to address this issue by introducing two privacy-preserving frameworks that encompass many existing privacy-preserving federated learning (PPFL) systems. Through experiments with publicly available financial, medical, and Internet of Things datasets, we demonstrate the effectiveness of privacy-preserving federated learning and its potential to develop highly accurate, secure, and privacy-preserving machine learning systems in real-world scenarios. The findings highlight the importance of considering privacy in the design and implementation of federated learning systems and suggest that privacy-preserving techniques are essential in enabling the development of effective and practical machine learning systems.

  • Optimal Design of Multiuser mmWave LOS MIMO Systems Using Hybrid Arrays of Subarrays

    Zhaohu PAN  Hang LI  Xiaojing HUANG  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/09/26
      Vol:
    E107-B No:1
      Page(s):
    262-271

    In this paper, we investigate the optimal design of millimeter-wave (mmWave) multiuser line-of-sight multiple-input-multiple-output (LOS MIMO) systems using hybrid arrays of subarrays, based on a hybrid block diagonalization (BD) precoding and combining scheme. By introducing a general 3D geometric channel model, the optimal subarray separation products of the transmitter and receiver for maximizing the sum-rate are designed for two regular configurations, adjacent subarrays and interleaved subarrays for different users, respectively. We analyze the sensitivity of performance to the optimal design parameters in terms of a deviation factor, and derive expressions for the eigenvalues of the multiuser equivalent LOS MIMO channel matrix, which are also valid for non-optimal designs. Simulation results show that the interleaved subarrays can support longer-distance communication than the adjacent subarrays given an appropriate fixed subarray deployment.

  • Device-to-Device Communications Employing Fog Nodes Using Parallel and Serial Interference Cancelers

    Binu SHRESTHA  Yuyuan CHANG  Kazuhiko FUKAWA  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/10/06
      Vol:
    E107-B No:1
      Page(s):
    223-231

    Device-to-device (D2D) communication allows user terminals to directly communicate with each other without the need for any base stations (BSs). Since D2D communication underlaying a cellular system shares frequency channels with BSs, co-channel interference may occur. Successive interference cancellation (SIC), which is also called the serial interference canceler and which detects and subtracts user signals from received signals in descending order of received power, can cope with this interference and has already been applied to fog nodes that manage communications among machine-to-machine (M2M) devices besides direct communications with BSs. When differences among received power levels of user signals are negligible, however, SIC cannot work well and thus causes degradation in bit error rate (BER) performance. To solve this problem, this paper proposes to apply parallel interference cancellation (PIC), which can simultaneously detect both desired and interfering signals under the maximum likelihood criterion and can maintain good BER performance even when power level differences among users are small. When channel coding is employed, however, SIC can be superior to PIC in terms of BER under some channel conditions. Considering this superiority, this paper also proposes to select the proper cancellation scheme and modulation and coding scheme (MCS) that can maximize the throughput of D2D under a BER constraint, in which the canceler selection is referred to as adaptive interference cancellation. Computer simulations show that PIC outperforms SIC under almost all channel conditions and thus the adaptive selection from PIC and SIC can achieve a marginal gain over PIC, while PIC can achieve 10% higher average system throughput than SIC. As for transmission delay time, it is demonstrated that the adaptive selection and PIC can shorten the delay time more than any other scheme, although the fog node causes a delay of at least 1 ms.

  • Location and History Information Aided Efficient Initial Access Scheme for High-Speed Railway Communications

    Chang SUN  Xiaoyu SUN  Jiamin LI  Pengcheng ZHU  Dongming WANG  Xiaohu YOU  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/09/14
      Vol:
    E107-B No:1
      Page(s):
    214-222

    The application of millimeter wave (mmWave) directional transmission technology in high-speed railway (HSR) scenarios helps to achieve the goal of multiple gigabit data rates with low latency. However, due to the high mobility of trains, the traditional, time-consuming initial access (IA) scheme has difficulty guaranteeing the effectiveness of beam alignment. In addition, the high path loss at the coverage edge of the millimeter wave remote radio unit (mmW-RRU) also poses great challenges to the stability of IA performance. Fortunately, the train trajectory in HSR scenarios is periodic and regular. Moreover, the cell-free network helps to improve the system coverage performance. Based on these observations, this paper proposes an efficient IA scheme based on location and history information in cell-free networks, where the train can flexibly select a set of mmW-RRUs according to the received signal quality. We specifically analyze the collaborative IA process based on the exhaustive search and that based on location and history information, derive expressions for IA success probability and delay, and perform the numerical analysis. The results show that the proposed scheme can significantly reduce the IA delay and effectively improve the stability of IA success probability.

  • Content Search Method Utilizing the Metadata Matching Characteristics of Both Spatio-Temporal Content and User Request in the IoT Era

    Shota AKIYOSHI  Yuzo TAENAKA  Kazuya TSUKAMOTO  Myung LEE  

     
    PAPER-Network System

      Publicized:
    2023/10/06
      Vol:
    E107-B No:1
      Page(s):
    163-172

    Cross-domain data fusion is becoming a key driver in the growth of numerous and diverse applications in the Internet of Things (IoT) era. We have proposed the concept of a new information platform, Geo-Centric Information Platform (GCIP), that enables IoT data fusion based on geolocation, i.e., produces spatio-temporal content (STC), and then provides the STC to users. In this environment, users cannot know in advance “when,” “where,” or “what type” of STC is being generated because the type and timing of STC generation vary dynamically with the diversity of IoT data generated in each geographical area. This makes it difficult to directly search for a specific STC requested by the user using the content identifier (domain name of URI or content name). To solve this problem, a new content discovery method that does not directly specify content identifiers is needed while taking into account (1) spatial and (2) temporal constraints. In our previous study, we proposed a content discovery method that considers only spatial constraints and did not consider temporal constraints. This paper proposes a new content discovery method that matches user requests with content metadata (topic) characteristics while taking into account spatial and temporal constraints. Simulation results show that the proposed method successfully discovers appropriate STC in response to a user request.
