
Keyword Search Result

[Keyword] ERST(70hit)

1-20hit(70hit)

  • Improving Sliced Wasserstein Distance with Geometric Median for Knowledge Distillation Open Access

    Hongyun LU  Mengmeng ZHANG  Hongyuan JING  Zhi LIU  

     
    LETTER-Fundamentals of Information Systems

      Publicized:
    2024/03/08
      Vol:
    E107-D No:7
      Page(s):
    890-893

    Currently, the most advanced knowledge distillation models use a metric learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, which causes inefficiency and an inability to capture structural feature representations across different tasks. To overcome this problem, we propose a knowledge distillation loss that uses the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between the teacher and student representations. Owing to the intuitive geometric properties of GMSW, the student model can effectively learn to align its hidden states with those of the teacher model, thereby establishing a robust correlation among implicit features. In experiments, our method outperforms state-of-the-art models in both high-resource and low-resource settings.
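GMSW presumably builds on the sliced Wasserstein distance. That distance compares two point clouds, such as teacher and student hidden states, by projecting both onto random directions. In one dimension, optimal transport reduces to matching sorted values. The sketch below is the standard Monte Carlo estimate for equal-size sets. It does not reproduce the geometric-median robustification, because the abstract does not give its exact form. The `n_projections` and `seed` parameters are illustrative choices.

```python
import math
import random


def sliced_wasserstein(teacher, student, n_projections=64, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between two equal-size point sets.

    teacher, student: lists of equal-length feature vectors (e.g. hidden states).
    Equal set sizes let 1-D optimal transport reduce to matching sorted projections.
    (Plain sliced Wasserstein only; the geometric-median variant is not shown.)
    """
    if len(teacher) != len(student):
        raise ValueError("this sketch assumes equal-size point sets")
    dim = len(teacher[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Random direction on the unit sphere (normalized Gaussian sample).
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both sets and sort: in 1-D the optimal coupling matches the
        # i-th smallest projection of one set to the i-th smallest of the other.
        p = sorted(sum(a * b for a, b in zip(x, theta)) for x in teacher)
        q = sorted(sum(a * b for a, b in zip(y, theta)) for y in student)
        total += sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)
    return math.sqrt(total / n_projections)
```

A constant shift of every point by a vector v gives a distance of roughly |v| / sqrt(dim). Each projection sees the shift as v·θ, and the average of (v·θ)² over random unit directions is |v|²/dim.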

  • Multi-Dimensional Fused Gromov Wasserstein Discrepancy for Edge-Attributed Graphs Open Access

    Keisuke KAWANO  Satoshi KOIDE  Hiroaki SHIOKAWA  Toshiyuki AMAGASA  

     
    PAPER

      Publicized:
    2024/01/12
      Vol:
    E107-D No:5
      Page(s):
    683-693

    Graph dissimilarities provide a powerful and ubiquitous approach for applying machine learning algorithms to edge-attributed graphs. However, conventional optimal transport-based dissimilarities cannot handle edge attributes. In this paper, we propose an optimal transport-based dissimilarity between graphs with edge attributes. The proposed method, the multi-dimensional fused Gromov-Wasserstein discrepancy (MFGW), naturally incorporates the mismatch of edge attributes into optimal transport theory. Unlike conventional optimal transport-based dissimilarities, MFGW can directly handle edge attributes in addition to the structural information of graphs. Furthermore, we propose an iterative algorithm, which can be run on GPUs, to solve the non-convex quadratic programming problems involved in MFGW. Experimentally, we demonstrate that MFGW outperforms the conventional optimal transport-based dissimilarity in several machine learning applications, including supervised classification, subgraph matching, and graph barycenter calculation.
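The standard fused Gromov-Wasserstein (FGW) discrepancy scores a coupling T between the nodes of two graphs with two terms: a node-feature term and a structure term that compares pairs of edges. The abstract does not give MFGW's exact formula, so the sketch below rests on assumptions. It evaluates an FGW-style cost for a fixed coupling, with the scalar edge cost replaced by the squared Euclidean distance between edge-attribute vectors. The trade-off weight `alpha` is hypothetical. The sketch does not solve the non-convex minimization over couplings that the paper's GPU algorithm addresses.

```python
def mfgw_objective(feat1, feat2, edges1, edges2, coupling, alpha=0.5):
    """Evaluate an FGW-style cost for a *fixed* coupling (assumed form, not the paper's).

    feat1[i], feat2[j]: node feature vectors.
    edges1[i][k], edges2[j][l]: edge-attribute vectors (e.g. zero vector if no edge).
    coupling[i][j]: transport-plan mass between node i of G1 and node j of G2.
    """
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    n, m = len(feat1), len(feat2)
    # Linear (Wasserstein) term: node-feature mismatch weighted by the plan.
    w_term = sum(coupling[i][j] * sqdist(feat1[i], feat2[j])
                 for i in range(n) for j in range(m))
    # Quadratic (Gromov) term: mismatch between edge-attribute vectors of pairs
    # (i, k) in G1 and (j, l) in G2, weighted by T_ij * T_kl.
    gw_term = sum(coupling[i][j] * coupling[k][l] * sqdist(edges1[i][k], edges2[j][l])
                  for i in range(n) for j in range(m)
                  for k in range(n) for l in range(m))
    return (1 - alpha) * w_term + alpha * gw_term
```

The quadratic term is what makes the optimization over `coupling` non-convex. This toy version is O(n²m²) and only suitable for tiny graphs.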

  • Social Relation Atmosphere Recognition with Relevant Visual Concepts

    Ying JI  Yu WANG  Kensaku MORI  Jien KATO  

     
    PAPER

      Publicized:
    2023/06/02
      Vol:
    E106-D No:10
      Page(s):
    1638-1649

    Social relationships (e.g., couples, opponents) are a foundational part of society. Social relation atmosphere describes the overall interaction environment between social relationships. Discovering social relation atmosphere can help machines better comprehend human behaviors and improve the performance of socially intelligent applications. Most existing research focuses on investigating social relationships while ignoring social relation atmosphere. Because of the complexity of expressions in video data and the uncertainty of social relation atmosphere, the atmosphere is difficult even to define and evaluate. In this paper, we analyze social relation atmosphere in video data. We introduce a Relevant Visual Concept (RVC) from the social relationship recognition task to facilitate social relation atmosphere recognition, because social relationships contain useful information about human interactions and surrounding environments, which are crucial clues for recognizing social relation atmosphere. Our approach consists of two main steps: (1) we generate a group of visual concepts that preserve the inherent social relationship information by using a 3D explanation module; (2) the extracted relevant visual concepts are used to supplement social relation atmosphere recognition. In addition, we present a new dataset based on the existing Video Social Relation Dataset, in which each video is annotated with four kinds of social relation atmosphere attributes and one social relationship. We evaluate the proposed method on this dataset. Experiments with various 3D ConvNets and fusion methods demonstrate that the proposed method effectively improves recognition accuracy compared with end-to-end ConvNets. The visualization results also indicate that essential information in social relationships can be discovered and used to enhance social relation atmosphere recognition.

  • Enhanced Full Attention Generative Adversarial Networks

    KaiXu CHEN  Satoshi YAMANE  

     
    LETTER-Core Methods

      Publicized:
    2023/01/12
      Vol:
    E106-D No:5
      Page(s):
    813-817

    In this paper, we propose improved Generative Adversarial Networks with an attention module in the generator, which can enhance the generator's effectiveness. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we explored the effect of different normalization methods (spectral normalization and instance normalization) on the generator and discriminator. Moreover, an enhanced loss function called Wasserstein divergence can alleviate the practical difficulty of training the model.

  • Intrinsic Representation Mining for Zero-Shot Slot Filling

    Sixia LI  Shogo OKADA  Jianwu DANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2022/08/19
      Vol:
    E105-D No:11
      Page(s):
    1947-1956

    Zero-shot slot filling is a domain adaptation approach for handling unseen slots in new domains without training instances. Previous studies implemented zero-shot slot filling by predicting both slot entities and slot types. Because of the lack of knowledge about new domains, existing methods often fail to predict slot entities for new domains and cannot effectively predict unseen slot types even when slot entities are correctly identified. Moreover, for some seen slot types, these methods may suffer from the domain shift problem, because the unseen context in new domains may change the interpretation of the slots. In this study, we propose intrinsic representations to alleviate the domain shift problems above. Specifically, we propose a multi-relation-based representation to capture both the general and specific characteristics of slot entities, and an ontology-based representation to provide complementary knowledge on the relationships between slots and values across domains, for handling both unseen slot types and unseen contexts. We constructed a two-step pipeline model using the proposed representations to solve the domain shift problem. Experimental results in terms of F1 score on three large datasets (Snips, SGD, and MultiWOZ 2.3) showed that our model outperformed state-of-the-art baselines by 29.62, 10.38, and 3.89 points, respectively. A detailed analysis with the average slot F1 score showed that our model improved the prediction by 25.82 points for unseen slot types and by 10.51 points for seen slot types. The results demonstrate that the proposed intrinsic representations can effectively alleviate the domain shift problem for both unseen slot types and seen slot types with unseen contexts.

  • Rate-Encoding A/D Converter Based on Spiking Neuron Model with Rectangular Wave Threshold Signal

    Yusuke MATSUOKA  Hiroyuki KAWASAKI  

     
    PAPER-Nonlinear Problems

      Publicized:
    2022/02/21
      Vol:
    E105-A No:8
      Page(s):
    1101-1109

    This paper proposes and characterizes an A/D converter (ADC) based on a spiking neuron model with a rectangular threshold signal. The neuron repeats an integrate-and-fire process and outputs a superstable spike sequence. The dynamics of this system are closely related to those of rate-encoding ADCs. We propose an ADC system based on this spiking neuron model. We derive a theoretical parameter region for a limited time interval of the digital output sequence. We analyze the conversion characteristics in this region and verify that they retain the monotonic increase and rate encoding required of an ADC.
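The integrate-and-fire mechanism behind rate encoding can be illustrated with a toy simulation. Everything here is an assumption for illustration, not the circuit or parameters from the paper. The assumed model uses linear integration with reset to zero, a hypothetical threshold with levels `high` and `low`, a fixed period, and a fixed counting window. A constant input `s` drives the state up, and the number of spikes in the window serves as the output code.

```python
def threshold(t, period=1.0, high=1.0, low=0.5):
    """Hypothetical rectangular-wave threshold: `high` in the first half of each period."""
    return high if (t % period) < period / 2 else low


def spike_count(s, window=10.0, dt=1e-3):
    """Count spikes of a toy integrate-and-fire neuron driven by a constant input s.

    The state rises linearly (x = s * time since the last reset). When it reaches
    the threshold, the neuron fires and resets to 0. The spike count in a fixed
    window acts as the digital output code (rate encoding).
    """
    steps = int(round(window / dt))
    k = 0      # time steps since the last reset
    count = 0
    for n in range(1, steps + 1):
        k += 1
        x = s * k * dt
        if x >= threshold(n * dt):
            count += 1
            k = 0
    return count
```

A larger input never fires later than a smaller one in this toy model, so the spike count is non-decreasing in `s`. This mirrors the monotonic increase the abstract mentions.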

  • Lightweight Operation History Graph for Traceability on Program Elements

    Takayuki OMORI  Katsuhisa MARUYAMA  Atsushi OHNISHI  

     
    PAPER-Software System

      Publicized:
    2020/12/15
      Vol:
    E104-D No:3
      Page(s):
    404-418

    History data of edit operations are more beneficial than those stored in version control systems since they provide detailed information on how source code was changed. Meanwhile, the large number of recorded edit operations makes it hard for developers and researchers to grasp the changes even roughly. To assist with this task, it is desirable for them to easily obtain traceability links for changed program elements across two source code snapshots, before and after a code change. In this paper, we propose a graph representation called the Operation History Graph (OHG), which presents code change information with such traceability links inferred from the history of edit operations. An OHG instance is generated by parsing every source code snapshot restored from edit histories and combining the resulting abstract syntax trees (ASTs) into a single graph structure. To improve the performance of building graph instances, we avoided simply maintaining every program element: program elements representing the inner structure of methods, as well as unchanged elements, are omitted. In addition, we adopted a lightweight static analysis for type name resolution to reduce the memory required for the analysis while preserving the accuracy of name resolution. Moreover, we assign a specific ID to each node and edge in the graph instance so that parts of the graph data can be stored separately and loaded on demand. These decisions make it feasible to build, manipulate, and store the graph with limited computer resources. To demonstrate the usefulness of the proposed operation history graph and verify whether the detected traceability links are sufficient to reveal actual changes of program elements, we implemented tools to generate and manipulate OHG instances. The evaluation of graph generation performance shows that our tool requires fewer computer resources than a tool the authors previously proposed.
Moreover, the evaluation of traceability shows that OHG provides traceability links with sufficient accuracy compared with the baseline approach using GumTree.
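The ID-keyed graph idea can be sketched as a toy in a few lines. Nodes stand for program elements in a snapshot, and traceability edges link an element to its counterpart in the next snapshot. Every node and edge carries its own ID, so parts of the graph could be stored and loaded separately. The class and method names are hypothetical. Real OHG construction from ASTs and edit histories is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class OperationHistoryGraph:
    """Toy, hypothetical OHG: ID-keyed nodes and traceability edges between snapshots."""
    nodes: dict = field(default_factory=dict)   # node_id -> (snapshot index, element name)
    edges: dict = field(default_factory=dict)   # edge_id -> (src node_id, dst node_id)
    succ: dict = field(default_factory=dict)    # src node_id -> dst node_id (lookup index)

    def add_node(self, node_id, snapshot, name):
        self.nodes[node_id] = (snapshot, name)

    def add_link(self, edge_id, src, dst):
        # A traceability link from an element to its counterpart in a later snapshot.
        self.edges[edge_id] = (src, dst)
        self.succ[src] = dst

    def trace(self, node_id):
        """Follow traceability links from an element to its latest version."""
        chain = [node_id]
        while chain[-1] in self.succ:
            chain.append(self.succ[chain[-1]])
        return [self.nodes[n] for n in chain]
```

Because nodes and edges are plain dictionaries keyed by ID, a subset (for example, the edges touching one method) could be serialized on its own. This is the property the abstract attributes to OHG's per-element IDs.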

  • Comparing Two Extended Concept Mapping Approaches to Investigate the Distribution of Students' Achievements

    Didik Dwi PRASETYA  Tsukasa HIRASHIMA  Yusuke HAYASHI  

     
    LETTER-Educational Technology

      Publicized:
    2020/11/02
      Vol:
    E104-D No:2
      Page(s):
    337-340

    This study compared two extended concept mapping approaches and investigated the distribution of students' understanding and knowledge structure. The students in the experimental group used Extended Kit-Build (EKB), where a learner extends a concept map built by kit-building, and those in the control group utilized the Extended Scratch-Build (ESB), where a learner extends a concept map made by scratch-building. The results suggested that the experimental group had better achievements in both the original material and the additional material.

  • Spectrum-Based Fault Localization Framework to Support Fault Understanding Open Access

    Yong WANG  Zhiqiu HUANG  Yong LI  RongCun WANG  Qiao YU  

     
    LETTER-Software Engineering

      Publicized:
    2019/01/15
      Vol:
    E102-D No:4
      Page(s):
    863-866

    A spectrum-based fault localization (SBFL) technique, which identifies fault location(s) in a buggy program by comparing the execution statistics of the program spectra of passed and failed executions, is a popular automatic debugging technique. However, the usefulness of SBFL is mainly affected by two factors: accuracy and fault understanding in practice. To address this issue, we propose an SBFL framework to support fault understanding. In the framework, we first localize a suspicious fault module to start debugging and then generate a weighted fault propagation graph (WFPG) for the hypothesized faulty module, which weights the suspiciousness of the nodes to further perform block-level fault localization. To evaluate the proposed framework, we conduct a controlled experiment to compare two different module-level SBFL approaches and validate the effectiveness of WFPG. According to our preliminary experiments, the results are promising.
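The module-level step ranks program elements by a suspiciousness score computed from the spectra of passed and failed runs. The letter does not say which formula it uses. The sketch below uses Ochiai, one widely used SBFL formula: ef / sqrt((ef + nf)(ef + ep)). Here ef and ep count the failing and passing runs that execute an element, and nf counts the failing runs that do not. The WFPG weighting step is not reproduced.

```python
import math


def ochiai_ranking(coverage, failed):
    """Rank program elements by Ochiai suspiciousness (one common SBFL formula).

    coverage[t][e] is True if test t executed element e; failed[t] is True if test t failed.
    Returns (element indices from most to least suspicious, list of scores).
    """
    n_elems = len(coverage[0])
    total_failed = sum(failed)  # ef + nf for every element
    scores = []
    for e in range(n_elems):
        ef = sum(1 for t, row in enumerate(coverage) if row[e] and failed[t])
        ep = sum(1 for t, row in enumerate(coverage) if row[e] and not failed[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    order = sorted(range(n_elems), key=lambda e: scores[e], reverse=True)
    return order, scores
```

An element executed only by failing runs gets the maximum score of 1.0. Elements also executed by many passing runs are pushed down the ranking.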

  • In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer

    Takeshi HOMMA  Yasunari OBUCHI  Kazuaki SHIMA  Rintaro IKESHITA  Hiroaki KOKUBO  Takuya MATSUMOTO  

     
    PAPER-Speech and Hearing

      Publicized:
    2018/08/31
      Vol:
    E101-D No:12
      Page(s):
    3123-3137

    For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when the inputs to the classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict which car navigation function a user wants to execute from a spontaneous utterance. A cloud ASR produces speech recognition errors due to the noise that occurs when traveling in a car, and these errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the internals of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize them. In this paper, we propose a system for improving the accuracy of utterance classification by modifying both the speech-signal inputs to a cloud ASR and the recognized-sentence outputs from the ASR. First, our system performs speech enhancement on a user's utterance and then sends both the enhanced and non-enhanced speech signals to a cloud ASR. The speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce the number of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” in which not only accurate transcriptions but also error-prone recognized sentences are added to the training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach for classifiers to benefit from the improved performance of cloud ASRs.

  • Complicated Superstable Periodic Orbits in a Simple Spiking Neuron Model with Rectangular Threshold Signal

    Yusuke MATSUOKA  

     
    LETTER-Nonlinear Problems

      Vol:
    E101-A No:11
      Page(s):
    1944-1948

    We studied complicated superstable periodic orbits (SSPOs) in a spiking neuron model with a rectangular threshold signal. The neuron exhibited SSPOs whose periods changed dramatically as we varied the parameters. Using a one-dimensional return map defined by the spike phase, we evaluated the period changes and showed their complicated distribution in parameter space. Finally, we constructed a test circuit to confirm the typical phenomena displayed by the mathematical model.
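A one-dimensional return map on the spike phase can be explored numerically. The sketch below reuses an assumed model: linear integration with reset to zero and a hypothetical rectangular threshold with levels `high` and `low`. It computes exact firing times segment by segment and detects the period of the resulting phase sequence. These modeling choices are illustrative assumptions, not the letter's equations.

```python
import math


def next_spike(t0, s, period=1.0, high=1.0, low=0.5):
    """Exact next firing time after a reset at t0 for x(t) = s * (t - t0), s > 0.

    Hypothetical rectangular threshold: `high` on [mP, mP + P/2), `low` on [mP + P/2, (m+1)P).
    """
    t = t0
    while True:
        m = math.floor(t / period)
        if t - m * period < period / 2:
            seg_end, theta = m * period + period / 2, high
        else:
            seg_end, theta = (m + 1) * period, low
        if s * (t - t0) >= theta:
            return t          # threshold dropped below the state: fire at the edge
        t_hit = t0 + theta / s
        if t_hit < seg_end:
            return t_hit      # state crosses the flat threshold inside this segment
        t = seg_end


def spike_phases(s, n_spikes=50, period=1.0):
    """Sequence of spike phases (spike time modulo the threshold period)."""
    t, phases = 0.0, []
    for _ in range(n_spikes):
        t = next_spike(t, s, period)
        phases.append(t % period)
    return phases


def orbit_period(phases, max_period=20, tol=1e-9):
    """Smallest p such that the tail of the phase sequence repeats with period p."""
    tail = phases[len(phases) // 2:]   # discard the transient
    for p in range(1, max_period + 1):
        if all(abs(tail[i] - tail[i - p]) < tol for i in range(p, len(tail))):
            return p
    return None
```

With `s = 0.8`, the neuron settles after a one-spike transient and fires on the falling edge of the threshold. Every later spike lands at phase 0.5 regardless of the preceding phase. In this toy model, firing at such an edge makes the return map locally flat, which is what "superstable" describes.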

  • Reciprocal Kit-Build Concept Map: An Approach for Encouraging Pair Discussion to Share Each Other's Understanding

    Warunya WUNNASRI  Jaruwat PAILAI  Yusuke HAYASHI  Tsukasa HIRASHIMA  

     
    PAPER-Educational Technology

      Publicized:
    2018/05/29
      Vol:
    E101-D No:9
      Page(s):
    2356-2367

    Collaborative learning is an active teaching and learning strategy in which learners who give each other elaborated explanations learn the most. However, it is difficult for learners to explain their own understanding elaborately in collaborative learning. In this study, we propose a collaborative use of the Kit-Build concept map (KB map) called “Reciprocal KB map”. In a pair discussion with Reciprocal KB map, the two participants first make their own concept maps expressing their comprehension. Then, they exchange the components of their maps and ask each other to reconstruct the partner's map from those components. The differences between the original map and the reconstructed map are diagnosed automatically, which is an advantage of the KB map. Reciprocal KB map is expected to encourage the pair to recognize each other's understanding and to create an effective discussion. In the experiment reported in this paper, Reciprocal KB map was used to support a pair discussion and was compared with a pair discussion supported by a traditional concept map. Nineteen pairs of university students used the traditional concept map in their discussion, while 20 pairs of university students used Reciprocal KB map to discuss the same topic. The results of the experiment were analyzed using three metrics: a discussion score, a similarity score, and questionnaires. The discussion score, which measures the value of talk in the discussion, demonstrates that Reciprocal KB map can promote more effective discussion between the partners than the traditional concept map. The similarity score, which evaluates the similarity of the concept maps, demonstrates that Reciprocal KB map can encourage the partners to understand each other better than the traditional concept map.
Last, the questionnaires indicate that Reciprocal KB map can help the partners collaborate smoothly in the discussion and that the participants accepted this method for sharing their understanding with each other. These results suggest that Reciprocal KB map is a promising approach for encouraging pairs of partners to understand each other and for promoting effective discussion.

  • Effects of Finite Superstrate and Asymmetrical Ground on High Gain Superstrate Antenna

    Jae-Gon LEE  Taek-Sun KWON  Jeong-Hae LEE  

     
    PAPER-Antennas and Propagation

      Publicized:
    2018/02/16
      Vol:
    E101-B No:8
      Page(s):
    1884-1890

    In this paper, we present the effects of finite superstrates and asymmetrical grounds on the performance of high-gain superstrate antennas. First, when the source of a superstrate antenna is located at an edge of the ground plane, that is, with an asymmetrical ground plane, the gain of the superstrate antenna can be made to match that of a superstrate antenna with a symmetrical ground plane by placing a PEC wall (E-plane asymmetry) or an AMC wall (H-plane asymmetry) near the edge. Second, the gain of a superstrate antenna whose ground plane is sufficiently close to infinite in size is found to be roughly proportional to the reflection magnitude of the partially reflective surface (PRS). When the square ground plane is finite, with dimensions of two wavelengths or less, the reflection magnitude of the PRS has an optimum value that achieves maximum gain. Finally, the gain of the superstrate antenna is studied when the ground plane differs from the PRS. For the above three cases, the performance of the superstrate antenna is verified and compared through analysis, full-wave simulation, and measurement.

  • Compact Controlled Reception Pattern Antenna (CRPA) Array Based on Mu-Zero Resonance (MZR) Antenna

    Jae-Gon LEE  Taek-Sun KWON  Bo-Hee CHOI  Jeong-Hae LEE  

     
    PAPER-Antennas and Propagation

      Publicized:
    2017/12/20
      Vol:
    E101-B No:6
      Page(s):
    1427-1433

    In this paper, a compact controlled reception pattern antenna (CRPA) array based on a mu-zero resonance (MZR) antenna is proposed for the global positioning system (GPS). The MZR antenna can be miniaturized by designing its structure based on a mu-negative (MNG) transmission line. The MNG transmission line can be implemented with a gap structure for the series capacitance and a shorting via for a short-ended boundary condition. The CRPA array, which operates in the L1 (1.57542 GHz) and L2 (1.2276 GHz) bands, is designed as a cylinder with a diameter of 127 mm (5 inches) and a height of 20 mm, and is composed of seven radiating elements. To design a compact CRPA array with high performance, namely impedance matching with a VSWR of less than 2, isolation between array elements (< -12 dB), an axial ratio (< 5 dB), and a circular polarization (CP) gain (> -1 dBic in the L1 band and > -3 dBic in the L2 band), we employ two orthogonal MZR antennas, a superstrate, and chip couplers. The performance of the CRPA array is verified and compared through analysis, full-wave simulation, and measurement.
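The VSWR < 2 matching target can be translated into a reflection coefficient and a return loss. The standard transmission-line relations are |Γ| = (VSWR - 1)/(VSWR + 1) and RL = -20 log10 |Γ|. This is generic arithmetic, not taken from the paper.

```python
import math


def vswr_to_gamma(vswr):
    """Magnitude of the reflection coefficient |Γ| for a given VSWR (VSWR >= 1)."""
    return (vswr - 1) / (vswr + 1)


def return_loss_db(gamma):
    """Return loss in dB: RL = -20 * log10(|Γ|)."""
    return -20 * math.log10(gamma)


gamma = vswr_to_gamma(2.0)
print(gamma, return_loss_db(gamma), gamma ** 2)
```

VSWR = 2 corresponds to |Γ| = 1/3 and a return loss of about 9.54 dB. Roughly 11% of the incident power is reflected (|Γ|² = 1/9), so the specification amounts to return loss better than about 9.5 dB.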

  • Modeling Storylines in Lyrics

    Kento WATANABE  Yuichiroh MATSUBAYASHI  Kentaro INUI  Satoru FUKAYAMA  Tomoyasu NAKANO  Masataka GOTO  

     
    PAPER-Natural Language Processing

      Publicized:
    2017/12/22
      Vol:
    E101-D No:4
      Page(s):
    1167-1179

    This paper addresses the issue of modeling the discourse nature of lyrics and presents the first study aiming to capture two common discourse-related notions: storylines and themes. We assume that a storyline is a chain of transitions over the topics of segments and that a song has at least one theme spanning the entire song. We then hypothesize that transitions over the topics of lyric segments can be captured by a probabilistic topic model that incorporates a distribution over transitions of latent topics, and that such a distribution of topic transitions is affected by the theme of the lyrics. To test these hypotheses, this study conducts experiments on word prediction and segment order prediction tasks using a large-scale corpus of popular music lyrics in both English and Japanese (around 100 thousand songs). The findings from these experiments can be summarized in two points. First, the models with topic transitions significantly outperformed the model without topic transitions in word prediction. This result indicates that typical storylines in our lyrics datasets were effectively captured as a probabilistic distribution of transitions over latent topics of segments. Second, the model incorporating a latent theme variable on top of topic transitions outperformed the models without such variables in both word prediction and segment order prediction. From this result, we conclude that considering the notion of theme does contribute to modeling the storylines of lyrics.

  • Comparative Study between Two Approaches Using Edit Operations and Code Differences to Detect Past Refactorings

    Takayuki OMORI  Katsuhisa MARUYAMA  

     
    PAPER-Software Engineering

      Publicized:
    2017/11/27
      Vol:
    E101-D No:3
      Page(s):
    644-658

    Understanding which refactoring transformations were performed is in demand in modern software construction. Traditionally, many researchers have tackled the understanding of code changes using history data derived from version control systems. These studies have pointed out problems with the traditional approach, such as the entanglement of multiple changes. To alleviate these problems, operation histories from IDE code editors are now available as a new source of software evolution data. By replaying such histories, we can investigate past code changes at a fine-grained level. However, prior studies did not provide enough evidence of their effectiveness for detecting refactoring transformations. This paper describes an experiment in which participants detected refactoring transformations performed by other participants after investigating the code changes with an operation-replay tool and diff tools. The results show that each approach has its own factors that cause refactoring transformations to be misunderstood or overlooked. Two negative factors, divided operations and generated compound operations, were observed in the operation-based approach, whereas all the negative factors in the difference-based approach resulted from three problems: tangling, shadowing, and out-of-order code changes. This paper also presents seven concrete examples of participants' mistakes in both approaches. These findings give hints for improving existing tools for understanding code changes and detecting refactoring transformations.

  • Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning

    Miquel ESPI  Masakiyo FUJIMOTO  Tomohiro NAKATANI  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/06/23
      Vol:
    E98-D No:10
      Page(s):
    1799-1807

    We present a method for recognizing acoustic events in conversation scenarios where speech usually overlaps with other acoustic events. While speech is usually considered the most informative acoustic event in a conversation scene, it does not always contain all the information. Non-speech events, such as a door knock, steps, or keyboard typing, can reveal aspects of the scene that speakers miss or avoid mentioning. Moreover, robustly detecting these events could further support speech enhancement and recognition systems by providing useful cues about the surrounding scene and noise. In acoustic event detection, state-of-the-art techniques are typically based on derived features (e.g., MFCCs or Mel filter banks), which successfully parameterize the spectrogram of speech but reduce resolution and detail when targeting other kinds of events. In this paper, we propose a method that learns features in an unsupervised manner from high-resolution spectrogram patches (a patch being a certain number of consecutive frame features stacked together) and integrates them within a deep neural network framework to detect and classify acoustic events. Its superiority over both previous works in the field and similar approaches based on derived features has been assessed with statistical measures and evaluation on the CHIL2007 corpus, an annotated database of seminar recordings.
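The patch construction described above (consecutive frame features stacked together) can be sketched as follows. The hop size and the flat-list representation are assumptions. The unsupervised feature learning and the deep network that consume the patches are not shown.

```python
def spectrogram_patches(frames, patch_len, hop=1):
    """Stack `patch_len` consecutive spectrogram frames into one flat patch vector.

    frames: list of per-frame feature vectors (all the same length).
    hop: assumed step, in frames, between the starts of successive patches.
    Returns a list of patches, each of length patch_len * len(frames[0]).
    """
    patches = []
    for start in range(0, len(frames) - patch_len + 1, hop):
        patch = []
        for frame in frames[start:start + patch_len]:
            patch.extend(frame)  # concatenate frames in time order
        patches.append(patch)
    return patches
```

With high-resolution frames (many frequency bins), each patch keeps the fine spectral detail that derived features such as MFCCs would compress away.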

  • Towards Logging Optimization for Dynamic Object Process Graph Construction

    Takashi ISHIO  Hiroki WAKISAKA  Yuki MANABE  Katsuro INOUE  

     
    LETTER-Software System

      Vol:
    E96-D No:11
      Page(s):
    2470-2472

    Logging the execution process of a program is a popular activity for practical program understanding. However, understanding the behavior of a program from a complete execution trace is difficult because a system may generate a substantial number of runtime events. To focus on a small subset of runtime events, the dynamic object process graph (DOPG) has been proposed. Although a DOPG can potentially facilitate program understanding, the logging process has not been adapted for DOPGs. If a developer is interested in the behavior of a particular object, only the runtime events related to that object are necessary to construct a DOPG, and the vast majority of runtime events in a complete execution trace are irrelevant to the object of interest. This paper analyzes actual DOPGs and reports that a logging tool can be optimized to record only the runtime events related to a particular object specified by a developer.
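The optimization the letter reports, recording only events that involve a developer-specified object, amounts to a filter applied at logging time. The event interface below is hypothetical: `on_event` receives an event kind and the IDs of the objects involved. A real tool would hook into the runtime, for example through bytecode instrumentation.

```python
class ObjectFocusedLogger:
    """Hypothetical logger that records only runtime events involving one target object."""

    def __init__(self, target_id):
        self.target_id = target_id
        self.events = []

    def on_event(self, kind, *object_ids):
        # Keep the event only if the object of interest participates in it
        # (e.g. as receiver, argument, or return value); drop everything else.
        if self.target_id in object_ids:
            self.events.append((kind, object_ids))
```

Filtering at record time, rather than after a complete trace has been written, is what saves the logging cost when most events are irrelevant to the object.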

  • Fuzzy Matching of Semantic Class in Chinese Spoken Language Understanding

    Yanling LI  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-Natural Language Processing

      Vol:
    E96-D No:8
      Page(s):
    1845-1852

    Semantic concepts in an utterance are obtained by fuzzy matching methods to solve problems such as word variations induced by automatic speech recognition (ASR) or fields of key information omitted by users in the process of spoken language understanding (SLU). A two-stage method is proposed. First, we adopt a conditional random field (CRF) to build probabilistic models that segment and label entity names in an input sentence. Second, fuzzy matching based on a similarity function is conducted between the named entities labeled by the CRF model and the reference characters of a dictionary. The experiments compare performance in terms of accuracy and processing speed. Among the four similarity measures, Dice similarity and cosine similarity based on TF scores achieve the best accuracy, with F1-measures of 93% or higher. In particular, the latter improves accuracy by 8.8% and 9% compared with q-gram and improved edit distance, respectively, which are two conventional methods for fuzzy string matching.
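The second stage matches a CRF-labeled entity against dictionary entries using a similarity function. It can be sketched with Dice similarity and TF-based cosine similarity. Computing both measures over individual characters is an assumption suited to Chinese text. The abstract does not specify the paper's exact tokenization and TF weighting.

```python
import math
from collections import Counter


def dice(a, b):
    """Dice coefficient over the sets of characters in two strings."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def tf_cosine(a, b):
    """Cosine similarity between character term-frequency vectors."""
    ta, tb = Counter(a), Counter(b)
    dot = sum(ta[c] * tb[c] for c in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0


def fuzzy_match(entity, dictionary, sim=tf_cosine):
    """Return the dictionary entry most similar to a CRF-labeled entity string."""
    return max(dictionary, key=lambda ref: sim(entity, ref))


refs = ["北京站", "北京西站", "上海站"]
print(fuzzy_match("北京西", refs))  # incomplete entity mapped to the closest reference
```

Both measures map the incomplete entity "北京西" to "北京西站". The entity shares three characters with that entry and only two with "北京站".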

  • Automatic Allocation of Training Data for Speech Understanding Based on Multiple Model Combinations

    Kazunori KOMATANI  Mikio NAKANO  Masaki KATSUMARU  Kotaro FUNAKOSHI  Tetsuya OGATA  Hiroshi G. OKUNO  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:9
      Page(s):
    2298-2307

    The optimal way to build speech understanding modules depends on the amount of training data available. When only a small amount of training data is available, effective allocation of the data is crucial to preventing overfitting of statistical methods. We have developed a method for allocating a limited amount of training data in accordance with the amount available. Our method exploits rule-based methods, which are effective when the amount of data is small and are included in our speech understanding framework based on multiple model combinations, i.e., multiple automatic speech recognition (ASR) modules and multiple language understanding (LU) modules. It then allocates training data preferentially to the modules that dominate the overall performance of speech understanding. Experimental evaluation showed that our allocation method consistently outperforms baseline methods that use a single ASR module and a single LU module as the amount of training data increases.
