IEICE global.ieice.org Site

Keyword Search Result

[Keyword] representation(233hit)

21-40hit(233hit)

Representation Learning of Tongue Dynamics for a Silent Speech Interface
Hongcui WANG Pierre ROUSSEL Bruce DENBY

PAPER-Speech and Hearing

Publicized:
2021/08/24
Vol:
E104-D No:12
Page(s):
2209-2217
A Silent Speech Interface (SSI) is a sensor-based, Artificial Intelligence (AI) enabled system in which articulation is performed without the use of the vocal cords, resulting in a voice interface that conserves the ambient audio environment, protects private data, and also functions in noisy environments. Though portable SSIs based on ultrasound imaging of the tongue have obtained Word Error Rates rivaling those of acoustic speech recognition, SSIs remain relegated to the laboratory due to stability issues. Indeed, reliable extraction of acoustic features from ultrasound tongue images in real-life situations has proven elusive. Recently, Representation Learning has shown considerable success in learning underlying structure in noisy, high-dimensional raw data. In its unsupervised form, Representation Learning is able to reveal structure in unlabeled data, thus greatly simplifying the data preparation task. In the present article, a 3D Convolutional Neural Network (3DCNN) architecture is applied to unlabeled ultrasound images, and is shown to reliably predict future tongue configurations. By comparing the 3DCNN to a simple previous-frame predictor, it is possible to recognize tongue trajectories comprising transitions between regions of stability that correlate with formant trajectories in a spectrogram of the signal. Prospects for using the underlying structural representation to provide features for subsequent speech processing tasks are presented.
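As a rough illustration of the baseline mentioned in this abstract, the sketch below scores a previous-frame predictor, which predicts each frame as a copy of the one before it. The toy frames, the function names, and the use of mean squared error are assumptions made for illustration; the paper's actual error measure and its 3DCNN are not reproduced here.

```python
def mse(a, b):
    """Mean squared error between two equally sized, flattened frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def previous_frame_errors(frames):
    """Error of predicting each frame as a copy of the frame before it."""
    return [mse(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]


# Toy "frames" (hypothetical flattened pixel lists): two stable frames,
# then an abrupt change standing in for a transition.
frames = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
errors = previous_frame_errors(frames)
```

In this toy setting the baseline error is zero while the frames are stable and jumps at the transition. A learned predictor would presumably be judged against such baseline values.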
Explanatory Rule Generation for Advanced Driver Assistant Systems
Juha HOVI Ryutaro ICHISE

PAPER-Artificial Intelligence, Data Mining

Publicized:
2021/06/11
Vol:
E104-D No:9
Page(s):
1427-1439
Autonomous vehicles and advanced driver assistant systems (ADAS) are receiving notable attention as research fields in both academia and private industry. Some decision-making systems use sets of logical rules to map knowledge of the ego-vehicle and its environment into actions the ego-vehicle should take. However, such rulesets can be difficult to create, for example by writing them manually, due to the complexity of traffic as an operating environment. Furthermore, the building blocks of the rules must be defined. One common solution to this is using an ontology specifically aimed at describing traffic concepts and their hierarchy. These ontologies must have a certain expressive power to enable construction of useful rules. We propose a process of generating sets of explanatory rules for ADAS applications from data using an ontology as a base vocabulary, and present a ruleset generated as a result of our experiments that is correct for the scope of the experiment.
A Global Deep Reranking Model for Semantic Role Classification
Haitong YANG Guangyou ZHOU Tingting HE Maoxi LI

LETTER-Natural Language Processing

Publicized:
2021/04/15
Vol:
E104-D No:7
Page(s):
1063-1066
Current approaches to semantic role classification usually first define a representation vector for a candidate role and then feed the vector into a deep neural network to perform classification. The representation vector contains lexicalization features such as word embeddings and lemma embeddings. From a linguistic point of view, the semantic role frame of a sentence is a joint structure with strong dependencies between arguments, which is not considered in current deep SRL systems. Therefore, this paper proposes a global deep reranking model to exploit these strong dependencies. Evaluation experiments on the CoNLL 2009 shared task show that our system significantly outperforms a strong local system that does not consider role dependency relations.
Multiclass Dictionary-Based Statistical Iterative Reconstruction for Low-Dose CT
Hiryu KAMOSHITA Daichi KITAHARA Ken'ichi FUJIMOTO Laurent CONDAT Akira HIRABAYASHI

PAPER-Numerical Analysis and Optimization

Publicized:
2020/10/06
Vol:
E104-A No:4
Page(s):
702-713
This paper proposes a high-quality computed tomography (CT) image reconstruction method from low-dose X-ray projection data. A state-of-the-art method, proposed by Xu et al., exploits dictionary learning for image patches. This method generates an overcomplete dictionary from patches of standard-dose CT images and reconstructs low-dose CT images by minimizing the sum of a data fidelity term and a regularization term based on sparse representations with the dictionary. However, this method does not take characteristics of each patch, such as textures or edges, into account. In this paper, we propose to classify all patches into several classes and utilize an individual dictionary with an individual regularization parameter for each class. Furthermore, for fast computation, we impose orthogonality on the column vectors of each dictionary. Since similar patches are collected in the same cluster, the orthogonality constraint causes hardly any loss of accuracy. Our simulations show that the proposed method outperforms the state-of-the-art in terms of both accuracy and speed.
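One reason orthogonal dictionary columns can speed up sparse coding is sketched below. The abstract does not state which sparsity penalty the paper uses, so the l1 penalty, the objective 0.5*||x - D a||^2 + lam*||a||_1, and all names here are assumptions. When the columns of D are orthonormal, this objective separates per coefficient and has a closed-form solution by soft-thresholding, so no iterative solver is needed.

```python
import math


def soft_threshold(v, lam):
    """Shrink v toward zero by lam (proximal operator of lam * |v|)."""
    return math.copysign(max(abs(v) - lam, 0.0), v)


def sparse_code_orthonormal(columns, x, lam):
    """Minimize 0.5*||x - D a||^2 + lam*||a||_1 for orthonormal columns of D.

    `columns` is a list of column vectors d_k. The minimizer is
    a_k = soft_threshold(<d_k, x>, lam), computed independently per column.
    """
    return [
        soft_threshold(sum(d_i * x_i for d_i, x_i in zip(d, x)), lam)
        for d in columns
    ]


# Hypothetical 2-D orthonormal dictionary and a toy "patch".
s = 1.0 / math.sqrt(2.0)
columns = [[s, s], [s, -s]]
patch = [3.0, 1.0]
code = sparse_code_orthonormal(columns, patch, lam=1.5)
```

Here the second coefficient falls below the threshold and is set to zero, which is what makes the code sparse.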
Efficient Hybrid GF(2^m) Multiplier for All-One Polynomial Using Varied Karatsuba Algorithm
Yu ZHANG Yin LI

LETTER-VLSI Design Technology and CAD

Publicized:
2020/09/15
Vol:
E104-A No:3
Page(s):
636-639
The PCHS (Park-Chang-Hong-Seo) algorithm is a varied Karatsuba algorithm (KA) that utilizes a different splitting strategy with no overlap module. Such an algorithm has been applied to develop efficient hybrid GF(2^m) multipliers for irreducible trinomials and pentanomials. However, compared with KA-based hybrid multipliers, these multipliers usually match the space complexity but require more gate delays. In this paper, we propose a new design of a hybrid multiplier using the PCHS algorithm for irreducible all-one polynomials. The proposed scheme utilizes a redundant representation to combine and simplify the computation of subexpressions, which results in a significant speedup of the implementation. As a main contribution, the proposed multiplier has exactly the same space and time complexities as the KA-based scheme. This is the first demonstration that a different splitting strategy for KA can also yield an equally efficient multiplier.
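The redundant representation mentioned in this abstract can be illustrated with a plain schoolbook multiplier; this is a sketch of the representation only, not of the PCHS or Karatsuba splitting. It relies on the identity (x + 1)(x^m + ... + x + 1) = x^(m+1) + 1 over GF(2), so products can be reduced modulo x^(m+1) + 1 first. The function name and bit-list encoding are assumptions.

```python
def aop_multiply(a, b, m):
    """Multiply a and b in GF(2^m) defined by the degree-m all-one polynomial.

    a and b are lists of m bits; index i holds the coefficient of x^i.
    The product is first formed modulo x^(m+1) + 1, which is a cyclic
    convolution of length m + 1 (the redundant representation), and then
    mapped back to m bits using x^m = x^(m-1) + ... + x + 1.
    """
    n = m + 1
    ra, rb = a + [0], b + [0]           # embed into the redundant representation
    c = [0] * n
    for i in range(n):
        if ra[i]:
            for j in range(n):
                c[(i + j) % n] ^= rb[j]  # exponents wrap around modulo m + 1
    if c[m]:                             # replace x^m by the sum of lower powers
        c = [bit ^ 1 for bit in c]
    return c[:m]


# GF(4) with x^2 + x + 1: x * x = x + 1.
product = aop_multiply([0, 1], [0, 1], 2)
```

The sketch assumes m is chosen so that the all-one polynomial is irreducible, for example m = 2 or m = 4.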
Spatio-Temporal Self-Attention Weighted VLAD Neural Network for Action Recognition
Shilei CHENG Mei XIE Zheng MA Siqi LI Song GU Feng YANG

LETTER-Biocybernetics, Neurocomputing

Publicized:
2020/10/01
Vol:
E104-D No:1
Page(s):
220-224
As characterizing videos simultaneously from spatial and temporal cues has been shown to be crucial for video processing, the vector of locally aggregated descriptors (VLAD), whose soft assignment lacks temporal information, should be considered a suboptimal framework for learning spatio-temporal video representations. Building on the development of attention mechanisms in natural language processing, in this work we present a novel model in which VLAD follows spatio-temporal self-attention operations, named spatio-temporal self-attention weighted VLAD (ST-SAWVLAD). In particular, sequential convolutional feature maps extracted from two modalities, i.e., RGB and Flow, are respectively fed into the self-attention module to learn soft spatio-temporal assignment parameters, which enables the aggregation of not only detailed spatial information but also fine motion information from successive video frames. In experiments, we evaluate ST-SAWVLAD on two competitive action recognition datasets, UCF101 and HMDB51, and the results show outstanding performance. The source code is available at: https://github.com/badstones/st-sawvlad.
Example Phrase Adaptation Method for Customized, Example-Based Dialog System Using User Data and Distributed Word Representations
Norihide KITAOKA Eichi SETO Ryota NISHIMURA

PAPER-Speech and Hearing

Publicized:
2020/07/30
Vol:
E103-D No:11
Page(s):
2332-2339
We have developed an adaptation method which allows the customization of example-based dialog systems for individual users by applying “plus” and “minus” operations to the distributed representations obtained using the word2vec method. After retrieving user-related profile information from the Web, named entity extraction is applied to the retrieval results. Words with a high term frequency-inverse document frequency (TF-IDF) score are then adopted as user-related words. Next, we calculate the similarity between the distributed representations of selected user-related words and nouns in the existing example phrases, using word2vec embedding. We then generate phrases adapted to the user by substituting user-related words for highly similar words in the original example phrases. Word2vec also has a special property which allows the arithmetic operations “plus” and “minus” to be applied to distributed word representations. By applying these operations to words used in the original phrases, we are able to determine which user-related words can be used to replace the original words. The user-related words are then substituted to create customized example phrases. We evaluated the naturalness of the generated phrases and found that the system could generate natural phrases.
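The “plus” and “minus” operations and the similarity step described in this abstract can be sketched as follows. The three-dimensional vectors, the vocabulary, and the function names are invented for illustration; real word2vec embeddings have hundreds of dimensions, and the paper's exact selection procedure is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def plus_minus(vectors, plus, minus):
    """Add the vectors of the `plus` words and subtract those of the `minus` words."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for word in plus:
        out = [o + x for o, x in zip(out, vectors[word])]
    for word in minus:
        out = [o - x for o, x in zip(out, vectors[word])]
    return out


def nearest(vectors, query, exclude=()):
    """Word whose vector has the highest cosine similarity to `query`."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(vectors[w], query))


# Hand-made toy embeddings (hypothetical values).
toy = {
    "tokyo": [1.0, 0.0, 1.0],
    "japan": [1.0, 0.0, 0.0],
    "france": [0.0, 1.0, 0.0],
    "paris": [0.0, 1.0, 1.0],
    "ramen": [1.0, 0.0, -1.0],
}
query = plus_minus(toy, plus=["tokyo", "france"], minus=["japan"])
best = nearest(toy, query, exclude={"tokyo", "france", "japan"})
```

With these toy vectors, "tokyo" minus "japan" plus "france" lands exactly on "paris". In the setting of the abstract, the candidates would be the user-related words rather than a fixed toy vocabulary.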
ChangeMacroRecorder: Accurate Recording of Fine-Grained Textual Changes of Source Code
Katsuhisa MARUYAMA Shinpei HAYASHI Takayuki OMORI

PAPER-Software Engineering

Publicized:
2020/08/24
Vol:
E103-D No:11
Page(s):
2262-2277
Recording source code changes has come to be well recognized as an effective means for understanding the evolution of existing software and making its future changes efficient. Therefore, modern integrated development environments (IDEs) tend to employ tools that record fine-grained textual changes of source code. However, there is still no satisfactory tool that accurately records textual changes. We propose ChangeMacroRecorder, which automatically and silently records all textual changes of source code and, in real time, correlates those textual changes with the actions causing them while a programmer is writing and modifying the code in Eclipse's Java editor. The improved accuracy of the recorded textual changes enables both programmers and researchers to understand exactly how the source code has evolved. This paper presents detailed information on how ChangeMacroRecorder achieves the accurate recording of textual changes and demonstrates how accurately textual changes were recorded in our experiment consisting of nine programming tasks.
Joint Multi-Patch and Multi-Task CNNs for Robust Face Recognition
Yanfei LIU Junhua CHEN Yu QIU

PAPER-Pattern Recognition

Publicized:
2020/07/02
Vol:
E103-D No:10
Page(s):
2178-2187
In this paper, we present a joint multi-patch and multi-task convolutional neural networks (JMM-CNNs) framework to learn a more descriptive and robust face representation for face recognition. In the proposed JMM-CNNs, a set of multi-patch CNNs and a feature fusion network are constructed to learn and fuse global and local facial features; then a multi-task learning algorithm, comprising a face recognition task and a pose estimation task, is applied to the fused feature to obtain a pose-invariant face representation for the face recognition task. To further enhance the pose insensitivity of the learned face representation, we also introduce a similarity regularization term on the features of the two tasks to form a regularization loss. Moreover, a simple but effective patch sampling strategy is applied to give the JMM-CNNs an end-to-end network architecture. Experiments on the Multi-PIE dataset demonstrate the effectiveness of the proposed method, and we achieve competitive performance compared with state-of-the-art methods on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and the MegaFace Challenge.
Joint Representations of Knowledge Graphs and Textual Information via Reference Sentences
Zizheng JI Zhengchao LEI Tingting SHEN Jing ZHANG

PAPER-Artificial Intelligence, Data Mining

Publicized:
2020/02/26
Vol:
E103-D No:6
Page(s):
1362-1370
Joint representations of knowledge graphs have become an important approach to improving the quality of knowledge graphs, which is beneficial to machine learning, data mining, and artificial intelligence applications. However, previous work suffers severely from noise in the text when modeling text information. To overcome this problem, this paper mines high-quality reference sentences for the entities in the knowledge graph to enhance the representation ability of the entities. A novel framework for joint representation learning of knowledge graphs and text information based on reference sentence noise reduction is proposed, which embeds the entities, the relations, and the words into a unified vector space. The proposed framework consists of a knowledge graph representation learning module, a textual relation representation learning module, and a textual entity representation learning module. Experiments on entity prediction, relation prediction, and triple classification tasks are conducted, and the results show that the proposed framework can significantly improve the performance of mining and fusing the text information. In particular, compared with the state-of-the-art method[15], the proposed framework improves the H@10 metric by 5.08% and 3.93% in the entity prediction task and the relation prediction task, respectively, and improves the accuracy metric by 5.08% in the triple classification task.
Leveraging Entity-Type Properties in the Relational Context for Knowledge Graph Embedding
Md Mostafizur RAHMAN Atsuhiro TAKASU

PAPER

Publicized:
2020/02/03
Vol:
E103-D No:5
Page(s):
958-968
Knowledge graph embedding aims to embed entities and relations of multi-relational data in low dimensional vector spaces. Knowledge graphs (KGs) are useful for numerous artificial intelligence (AI) applications. However, KGs are far from complete, and hence KG embedding models have quickly gained massive attention. Nevertheless, the state-of-the-art KG embedding models ignore the category-specific projection of entities and the impact of entity types in the relational aspect. For example, the entity “Washington” could belong to the person or location category depending on its appearance in a specific relation. In a KG, an entity usually holds many type properties. This leads us to a very interesting question: are all the type properties of an entity meaningful for a specific relation? In this paper, we propose a KG embedding model, TPRC, that leverages entity-type properties in the relational context. To show the effectiveness of our model, we apply our idea to TransE, TransR and TransD. Our approach outperforms other state-of-the-art approaches such as TransE, TransD, DistMult and ComplEx. Another important observation is that introducing entity-type properties in the relational context can improve the performance of the original translation distance based models.
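For readers unfamiliar with translation distance based models, the sketch below shows the TransE scoring idea, in which a triple (head, relation, tail) is considered plausible when head + relation lies close to tail. It illustrates only plain TransE with an L2 distance, not the TPRC model proposed in the paper. The two-dimensional embeddings and names are hypothetical.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||; smaller means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hand-picked toy embeddings (hypothetical values).
head = [0.0, 0.0]
relation = [1.0, 1.0]
good_tail = [1.0, 1.0]   # exactly head + relation
bad_tail = [3.0, 0.0]    # far from head + relation

good = transe_distance(head, relation, good_tail)
bad = transe_distance(head, relation, bad_tail)
```

Ranking candidate tails by this distance is the usual way such models are applied to link prediction.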
Multimodal Analytics to Understand Self-Regulation Process of Cognitive and Behavioral Strategies in Real-World Learning
Masaya OKADA Yasutaka KUROKI Masahiro TADA

PAPER-Human-computer Interaction

Publicized:
2020/02/05
Vol:
E103-D No:5
Page(s):
1039-1054
Recent studies suggest that learning “how to learn” is important because learners must be self-regulated to take more responsibility for their own learning processes, meta-cognitive control, and other generative learning thoughts and behaviors. The mechanism that enables a learner to self-regulate his/her learning strategies has been actively studied in classroom settings, but has seldom been studied in the area of real-world learning in out-of-school settings (e.g., environmental learning in nature). A feature of real-world learning is that a learner's cognition of the world is updated by his/her behavior to investigate the world, and vice versa. This paper models the mechanism of real-world learning for executing and self-regulating a learner's cognitive and behavioral strategies to self-organize his/her internal knowledge space. Furthermore, this paper proposes multimodal analytics to integrate heterogeneous data resources of the cognitive and behavioral features of real-world learning, to structure and archive the time series of strategies occurring through learner-environment interactions, and to assess how learning should be self-regulated for better understanding of the world. Our analysis showed that (1) intellectual achievements are built by self-regulating learning to chain the execution of cognitive and behavioral strategies, and (2) a clue to predict learning outcomes in the world is analyzing the quantity and frequency of strategies that a learner uses and self-regulates. Assessment based on these findings can encourage a learner to reflect and improve his/her way of learning in the world.
Air Quality Index Forecasting via Deep Dictionary Learning
Bin CHEN

PAPER-Image Recognition, Computer Vision

Publicized:
2020/02/20
Vol:
E103-D No:5
Page(s):
1118-1125
Air quality index (AQI) is a non-dimensional index for the description of air quality, and is widely used in air quality management schemes. A novel method for Air Quality Index Forecasting based on Deep Dictionary Learning (AQIF-DDL) and machine vision is proposed in this paper. A sky image is used as the input of the method, and the output is the forecasted AQI value. Deep dictionary learning is employed to automatically extract the sky image features and perform the AQI forecasting. The idea of learning deeper dictionary levels, which stems from deep learning, is also included to increase the forecasting accuracy and stability. The proposed AQIF-DDL is compared with other deep learning based methods, such as the deep belief network, stacked autoencoder and convolutional neural network. The experimental results indicate that the proposed method achieves good performance on AQI forecasting.
Rust Detection of Steel Structure via One-Class Classification and L2 Sparse Representation with Decision Fusion
Guizhong ZHANG Baoxian WANG Zhaobo YAN Yiqiang LI Huaizhi YANG

LETTER-Artificial Intelligence, Data Mining

Publicized:
2019/11/11
Vol:
E103-D No:2
Page(s):
450-453
In this work, we present a novel rust detection method based upon one-class classification and L2 sparse representation (SR) with decision fusion. Firstly, a new color contrast descriptor is proposed for extracting the rust features of steel structure images. Considering that the patterns of rust features are simpler than those of non-rust ones, a one-class support vector machine (SVM) classifier and an L2 SR classifier are each designed with these rust image features. After that, a multiplicative fusion rule is adopted for combining the one-class SVM and L2 SR modules, thereby achieving more accurate rust detection results. We conduct numerous experiments, and compared with other rust detectors, the presented method offers better rust detection performance.
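A multiplicative fusion rule of the kind named in this abstract can be sketched as below. The abstract does not say how the two classifier outputs are normalized or thresholded, so the assumption that both scores lie in [0, 1], the threshold value, and the function name are illustrative only.

```python
def fuse(svm_score, sr_score, threshold=0.25):
    """Multiply two scores in [0, 1] and compare the product to a threshold.

    Returns the fused score and a boolean "rust" decision. A product is
    high only when both classifiers agree, so one low score vetoes the other.
    """
    fused = svm_score * sr_score
    return fused, fused >= threshold


# Both classifiers are confident: the fused decision is positive.
agree = fuse(0.9, 0.8)
# The classifiers disagree: the low score pulls the product down.
disagree = fuse(0.9, 0.1)
```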
Secure Overcomplete Dictionary Learning for Sparse Representation
Takayuki NAKACHI Yukihiro BANDOH Hitoshi KIYA

PAPER

Publicized:
2019/10/09
Vol:
E103-D No:1
Page(s):
50-58
In this paper, we propose secure dictionary learning based on a random unitary transform for sparse representation. Currently, edge cloud computing is spreading to many application fields including services that use sparse coding. This situation raises many new privacy concerns. Edge cloud computing poses several serious issues for end users, such as unauthorized use and leak of data, and privacy failures. The proposed scheme provides practical MOD and K-SVD dictionary learning algorithms that allow computation on encrypted signals. We prove, theoretically, that the proposal has exactly the same dictionary learning estimation performance as the non-encrypted variant of MOD and K-SVD algorithms. We apply it to secure image modeling based on an image patch model. Finally, we demonstrate its performance on synthetic data and a secure image modeling application for natural images.
Measuring Semantic Similarity between Words Based on Multiple Relational Information
Jianyong DUAN Yuwei WU Mingli WU Hao WANG

PAPER-Natural Language Processing

Publicized:
2019/09/27
Vol:
E103-D No:1
Page(s):
163-169
The similarity of words extracted from a rich text relation network is the main way to calculate semantic similarity. The complex relational information and text content in Wikipedia, Community Question Answering sites and social networks provide abundant corpora for semantic similarity calculation. However, most typical research has focused on only a single relationship. In this paper, we propose a semantic similarity calculation model that integrates multiple kinds of relational information and maps the multiple relationships to the same semantic space by learning a representing matrix and a semantic matrix, in order to improve the accuracy of semantic similarity calculation. In experiments, we confirm that the semantic calculation method that integrates many kinds of relationships improves the accuracy of semantic calculation compared with other semantic calculation methods.
Sampling Shape Contours Using Optimization over a Geometric Graph
Kazuya OSE Kazunori IWATA Nobuo SUEMATSU

PAPER-Pattern Recognition

Publicized:
2019/09/11
Vol:
E102-D No:12
Page(s):
2547-2556
Consider selecting points on a contour in the x-y plane. In shape analysis, this is frequently referred to as contour sampling. It is important to select the points such that they effectively represent the shape of the contour. Generally, the stroke order and number of strokes are informative for that purpose. Several effective methods exist for sampling contours drawn with a certain stroke order and number of strokes, such as the English alphabet or Arabic figures. However, many contours entail an uncertain stroke order and number of strokes, such as pictures of symbols, and little research has focused on methods for sampling such contours. This is because selecting the points in this case typically requires a large computational cost to check all the possible choices. In this paper, we present a sampling method that is useful regardless of whether the contours are drawn with a certain stroke order and number of strokes or not. Our sampling method thereby expands the application possibilities of contour processing. We formulate contour sampling as a discrete optimization problem that can be solved using a type of direct search. Based on a geometric graph whose vertices are the points and whose edges form rectangles, we construct an effective objective function for the problem. Using different shape datasets, we demonstrate that our sampling method is effective with respect to shape representation and retrieval.
Network Embedding with Deep Metric Learning
Xiaotao CHENG Lixin JI Ruiyang HUANG Ruifei CUI

PAPER-Artificial Intelligence, Data Mining

Publicized:
2018/12/26
Vol:
E102-D No:3
Page(s):
568-578
Network embedding has attracted an increasing amount of attention in recent years due to its wide-ranging applications in graph mining tasks such as vertex classification, community detection, and network visualization. Network embedding is an important method to learn low-dimensional representations of vertices in networks, aiming to capture and preserve the network structure. Almost all the existing network embedding methods adopt the so-called Skip-gram model in Word2vec. However, as a bag-of-words model, the Skip-gram model mainly utilizes local structure information. The lack of information metrics for vertices in the global network leads to vertices with different labels being mixed in the new embedding space. To solve this problem, in this paper we propose a Network Representation Learning method with Deep Metric Learning, namely DML-NRL. By setting the initialized anchor vertices and adding the similarity measure in the training process, the distance information between vertices with different labels in the network is integrated into the vertex representation, which effectively improves the accuracy of the network embedding algorithm. We compare our method with baselines by applying them to the tasks of multi-label classification and data visualization of vertices. The experimental results show that our method outperforms the baselines on all three datasets, and the method has proved to be effective and robust.
Rectifying Transformation Networks for Transformation-Invariant Representations with Power Law
Chunxiao FAN Yang LI Lei TIAN Yong LI

LETTER-Image Recognition, Computer Vision

Publicized:
2018/12/04
Vol:
E102-D No:3
Page(s):
675-679
This letter proposes a representation learning framework of convolutional neural networks (Convnets) that aims to rectify and improve the feature representations learned by existing transformation-invariant methods. The existing methods usually encode feature representations invariant to a wide range of spatial transformations by augmenting input images or transforming intermediate layers. Unfortunately, simply transforming the intermediate feature maps may lead to unpredictable representations that are ineffective in describing the transformed features of the inputs. The reason is that the operations of convolution and geometric transformation are not exchangeable in most cases, so exchanging the two operations yields a transformation error. This error may harm the performance of the classification networks. Motivated by the fractal statistics of natural images, this letter proposes a rectifying transformation operator to minimize the error. The proposed method is differentiable and can be inserted into the convolutional architecture without making any modification to the optimization algorithm. We show that the rectified feature representations result in better classification performance on two benchmarks.
Real-Time Sparse Visual Tracking Using Circulant Reverse Lasso Model
Chenggang GUO Dongyi CHEN Zhiqi HUANG

PAPER-Image Recognition, Computer Vision

Publicized:
2018/10/09
Vol:
E102-D No:1
Page(s):
175-184
Errata: uploaded on April 1, 2019
Sparse representation has been successfully applied to visual tracking. Recent progress in sparse tracking has mainly been made within the particle filter framework. However, most sparse trackers need to extract complex feature representations for each particle in the limited sample space, leading to expensive computation and inferior tracking performance. To deal with these issues, we propose a novel sparse tracking method based on the circulant reverse lasso model. Benefiting from the properties of circulant matrices, densely sampled target candidates are implicitly generated by cyclically shifting the base feature descriptors, and then embedded into a reverse sparse reconstruction model as a dictionary to encode a robust appearance template. The alternating direction method of multipliers is employed to solve the reverse sparse model, and the optimization can be carried out efficiently in the frequency domain, which enables the proposed tracker to run in real time. The calculated sparse coefficient map represents the similarity scores between the template and the circularly shifted samples. Thus the target location can be directly predicted according to the coordinates of the peak coefficient. A scale-aware template updating strategy is combined with the correlation filter template learning to take into account both appearance deformations and scale variations. Both quantitative and qualitative evaluations on two challenging tracking benchmarks demonstrate that the proposed algorithm performs favorably against several state-of-the-art sparse representation based tracking methods.
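The idea of generating candidates by cyclic shifts and reading the target location off the peak of a score map can be sketched in one dimension as follows. This uses a plain dot-product score as a stand-in; it is not the reverse lasso model, it does not use the alternating direction method of multipliers, and it works in the signal domain rather than the frequency domain. The signals and names are hypothetical.

```python
def cyclic_shift(v, k):
    """Shift v to the right by k positions with wrap-around."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)


def shift_scores(base, template):
    """Dot product between the template and every cyclic shift of `base`.

    The shifted copies play the role of densely sampled candidates;
    together they form the rows of a circulant matrix.
    """
    return [
        sum(a * b for a, b in zip(cyclic_shift(base, k), template))
        for k in range(len(base))
    ]


# Toy 1-D "feature descriptor" and a template with the same bump elsewhere.
base = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
template = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
scores = shift_scores(base, template)
peak = max(range(len(scores)), key=scores.__getitem__)
```

The peak index is the shift that best aligns the base descriptor with the template, which mirrors how the abstract predicts the target location from the peak coefficient.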
