IEICE global.ieice.org Site

Keyword Search Result

[Keyword] PAR(2741hit)

101-120hit(2741hit)

Research on Mongolian-Chinese Translation Model Based on Transformer with Soft Context Data Augmentation Technique
Qing-dao-er-ji REN Yuan LI Shi BAO Yong-chao LIU Xiu-hong CHEN

PAPER-Neural Networks and Bioengineering

Publicized: 2021/11/19
Vol: E105-A No:5
Page(s): 871-876
As the mainstream approach in the field of machine translation, neural machine translation (NMT) has achieved great improvements on many resource-rich languages, but the performance of NMT for low-resource languages is still not very good. This paper uses data augmentation technology to construct a Mongolian-Chinese pseudo-parallel corpus, so as to improve the translation ability of the Mongolian-Chinese translation model. Experiments show that this method improves the translation ability of the model. Finally, a translation model trained with a large-scale pseudo-parallel corpus and integrated with the soft context data augmentation technique is obtained, and its BLEU value is 39.3.
Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments
Jing WANG Yiyu LUO Weiming YI Xiang XIE

PAPER-Speech and Hearing

Publicized: 2022/01/11
Vol: E105-D No:4
Page(s): 766-777
Speech separation is the task of extracting target speech while suppressing background interference components. In applications like video telephones, visual information about the target speaker is available, which can be leveraged for multi-speaker speech separation. Most previous multi-speaker separation methods are based on convolutional or recurrent neural networks. Recently, Transformer-based Seq2Seq models have achieved state-of-the-art performance in various tasks, such as neural machine translation (NMT) and automatic speech recognition (ASR). Transformer has shown an advantage in modeling audio-visual temporal context with multi-head attention blocks by explicitly assigning attention weights. Moreover, Transformer does not have any recurrent sub-networks, and thus supports parallelization of sequence computation. In this paper, we propose a novel speaker-independent audio-visual speech separation method based on Transformer, which can be flexibly applied to an unknown number and identity of speakers. The model receives both audio and visual streams, including the noisy spectrogram and speaker lip embeddings, and predicts a complex time-frequency mask for the corresponding target speaker. The model is made up of three main components: an audio encoder, a visual encoder, and a Transformer-based mask generator. Two different encoder structures, ResNet-based and Transformer-based, are investigated and compared. The performance of the proposed method is evaluated in terms of source separation and speech quality metrics. The experimental results on the benchmark GRID dataset show the effectiveness of the method on the speaker-independent separation task in multi-talker environments. The model generalizes well to unseen speaker identities and noise types. Though only trained on 2-speaker mixtures, the model achieves reasonable performance when tested on 2-speaker and 3-speaker mixtures. In addition, the model still shows an advantage over previous audio-visual speech separation works.
An Algorithm for Single Snapshot 2D-DOA Estimation Based on a Three-Parallel Linear Array Model Open Access
Shiwen LIN Yawen ZHOU Weiqin ZOU Huaguo ZHANG Lin GAO Hongshu LIAO Wanchun LI

PAPER-Digital Signal Processing

Publicized: 2021/10/05
Vol: E105-A No:4
Page(s): 673-681
Estimating the spatial parameters of signals by using the effective data of a single snapshot is essential in the field of reconnaissance and confrontation. A major drawback of existing algorithms is that their constructed covariance matrices suffer a great degree of rank loss. The performance of existing algorithms also degrades at low signal-to-noise ratios. In this paper, a three-parallel linear array based algorithm is proposed to achieve two-dimensional direction of arrival estimates in a single snapshot scenario. The key points of the proposed algorithm are: 1) construct three full-rank pseudo matrices, with no rank loss, by using the single snapshot data from the received signal model; 2) by using the rotation relation between the pseudo matrices, obtain the matched 2D-DOA with an efficient parameter matching method. The main objective of this work is to improve the angle estimation accuracy and reduce the loss of degrees of freedom in single snapshot 2D-DOA estimation.
Dual Self-Guided Attention with Sparse Question Networks for Visual Question Answering
Xiang SHEN Dezhi HAN Chin-Chen CHANG Liang ZONG

PAPER-Natural Language Processing

Publicized: 2022/01/06
Vol: E105-D No:4
Page(s): 785-796
Visual Question Answering (VQA) is multi-task research that requires simultaneous processing of vision and text. Recent research on VQA models employs a co-attention mechanism to build a model between the context and the image. However, the features of questions and the modeling of the image region force irrelevant information to be calculated in the model, thus affecting the performance. This paper proposes a novel dual self-guided attention with sparse question networks (DSSQN) to address this issue. The aim is to avoid having irrelevant information calculated into the model when modeling the internal dependencies on both the question and image. Simultaneously, it overcomes the coarse interaction between sparse question features and image features. First, the sparse question self-attention (SQSA) unit in the encoder calculates the feature with the highest weight. From the self-attention learning of question words, the question features with larger weights are retained. Second, sparse question features are utilized to guide the focus on image features to obtain fine-grained image features, and also to prevent irrelevant information from being calculated into the model. A dual self-guided attention (DSGA) unit is designed to improve modal interaction between questions and images. Third, the parameter δ of the sparse question self-attention is optimized to select the question-related object regions. Our experiments with VQA 2.0 benchmark datasets demonstrate that DSSQN outperforms the state-of-the-art methods. For example, the accuracy of our proposed model on the test-dev and test-std is 71.03% and 71.37%, respectively. In addition, we show through visualization results that our model can pay more attention to important features than other advanced models. We also hope that it can promote the development of VQA in the field of artificial intelligence (AI).
Sublinear Computation Paradigm: Constant-Time Algorithms and Sublinear Progressive Algorithms Open Access
Kyohei CHIBA Hiro ITO

INVITED PAPER-Algorithms and Data Structures

Publicized: 2021/10/08
Vol: E105-A No:3
Page(s): 131-141
The challenges posed by big data in the 21st Century are complex: conventional wisdom held that polynomial-time algorithms are practical; however, when we handle big data, even a linear-time algorithm may be too slow. Thus, sublinear- and constant-time algorithms are required. The academic research project, “Foundations of Innovative Algorithms for Big Data,” which was started in 2014 and will finish in September 2021, aimed at developing various techniques and frameworks to design algorithms for big data. In this project, we introduce a “Sublinear Computation Paradigm.” Toward this purpose, we first provide a survey of constant-time algorithms, which are the most investigated framework of this area, and then present our recent results on sublinear progressive algorithms. A sublinear progressive algorithm first outputs a temporary approximate solution in constant time, then gradually suggests better solutions in sublinear time, and finally finds the exact solution. We present Sublinear Progressive Algorithm Theory (SPA Theory, for short), which enables one to construct a sublinear progressive algorithm for any property that has a constant-time algorithm and an exact algorithm (an exponential-time one is allowed) without losing any computation time in the big-O sense.
Private Decision Tree Evaluation with Constant Rounds via (Only) SS-3PC over Ring and Field
Hikaru TSUCHIDA Takashi NISHIDE Yusaku MAEDA

PAPER

Publicized: 2021/09/14
Vol: E105-A No:3
Page(s): 214-230
Multiparty computation (MPC) is the technology that computes an arbitrary function represented as a circuit without revealing input values. Typical MPC uses secret sharing (SS) schemes, garbled circuit (GC), and homomorphic encryption (HE). These cryptographic technologies have a trade-off relationship for the computation cost, communication cost, and type of computable circuit. Hence, the optimal choice depends on the computing resources, communication environment, and function related to applications. The private decision tree evaluation (PDTE) is one of the important applications of secure computation. There exist several PDTE protocols with constant communication rounds using GC, HE, and SS-MPC over the field. However, to the best of our knowledge, PDTE protocols with constant communication rounds using MPC based on SS over the ring (requiring only lower computation costs and communication complexity) are non-trivial and still missing. In this paper, we propose a PDTE protocol based on a three-party computation (3PC) protocol over the ring with one corruption. We also propose another three-party PDTE protocol over the field with one corruption that is more efficient than the naive construction.
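To make the "secret sharing over the ring" setting in the abstract above a little more concrete, the following is a minimal sketch of plain additive secret sharing among three parties modulo 2^64. It is not the PDTE protocol or the 3PC scheme of the paper; the word size and the function names `share`, `reconstruct`, and `add_shared` are assumptions chosen for illustration only.

```python
import secrets

# Ring Z_{2^64}; the 64-bit word size is an assumption for this sketch.
MODULUS = 2 ** 64


def share(value, parties=3):
    """Split value into additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    # The last share is fixed so that all shares add up to the secret.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared values share-wise; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


x_shares = share(20)
y_shares = share(22)
total = reconstruct(add_shared(x_shares, y_shares))  # equals 20 + 22
```

Arithmetic modulo a power of two maps directly onto machine words, which is one reason ring-based sharing tends to be cheap; comparisons and branching, as needed for decision trees, require additional protocols that this sketch does not cover.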
Private Decision Tree Evaluation by a Single Untrusted Server for Machine Learning as a Service
Yoshifumi SAITO Wakaha OGATA

PAPER

Publicized: 2021/09/17
Vol: E105-A No:3
Page(s): 203-213
In this paper, we propose the first private decision tree evaluation (PDTE) schemes which are suitable for use in Machine Learning as a Service (MLaaS) scenarios. In our schemes, a user and a model owner send the ciphertexts of a sample and a decision tree model, respectively, and a single server classifies the sample without learning either the sample or the decision tree. Although many PDTE schemes have been proposed so far, most of them require revealing the decision tree to the server. This is undesirable because the classification model is the intellectual property of the model owner, and/or it may include sensitive information used to train the model, and therefore the model should also be hidden from the server. In other PDTE schemes, multiple servers jointly conduct the classification process and the decision tree is kept secret from the servers under the assumption that they do not collude. Unfortunately, this assumption may not hold because MLaaS is usually provided by a single company. In contrast, our schemes do not have such problems. In principle, fully homomorphic encryption allows us to classify an encrypted sample based on an encrypted decision tree, and in fact, the existing non-interactive PDTE scheme can be modified so that the server performs classification while handling only ciphertexts. However, the resulting scheme is less efficient than ours. We also show the experimental results for our schemes.
Discriminative Part CNN for Pedestrian Detection
Yu WANG Cong CAO Jien KATO

PAPER-Image Recognition, Computer Vision

Publicized: 2021/12/06
Vol: E105-D No:3
Page(s): 700-712
Pedestrian detection is a significant task in computer vision. In recent years, it has been widely used in applications such as intelligent surveillance systems and automated driving systems. Although it has been exhaustively studied in the last decade, the occlusion handling issue remains unsolved. One convincing idea is to first detect human body parts, and then utilize the part information to estimate the pedestrians' existence. Many part-based pedestrian detection approaches have been proposed based on this idea. However, in most of these approaches, the low-quality part mining and the clumsy combination of part detectors are bottlenecks that limit the detection performance. To eliminate these bottlenecks, we propose Discriminative Part CNN (DP-CNN). Our approach has two main contributions: (1) We propose a high-quality body part mining method based on both convolutional layer features and body part subclasses. The mined part clusters are not only discriminative but also representative, and can help to construct powerful pedestrian detectors. (2) We propose a novel method to combine multiple part detectors. We convert the part detectors to a middle layer of a CNN and optimize the whole detection pipeline by fine-tuning that CNN. Experiments show the remarkable effectiveness of the optimization and the robustness of the occlusion handling.
A Localization Method Based on Partial Correlation Analysis for Dynamic Wireless Network Open Access
Yuki HORIGUCHI Yusuke ITO Aohan LI Mikio HASEGAWA

LETTER-Nonlinear Problems

Publicized: 2021/09/08
Vol: E105-A No:3
Page(s): 594-597
Recent localization methods for wireless networks cannot be applied to dynamic networks with unknown topology. To solve this problem, we propose a localization method based on partial correlation analysis. We evaluate the accuracy of the proposed method, and the results show that it achieves high-accuracy localization for dynamic networks with unknown topology.
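The abstract above does not say how partial correlation is used for localization, so the following is only a sketch of the standard first-order partial correlation coefficient itself: the correlation between x and y after removing the linear influence of a third variable z. The function names and the sample data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z removed.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical measurements at three nodes.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
z = [5, 3, 1, 4, 2]
r = partial_correlation(x, y, z)
```

The formula is undefined when x or y is perfectly correlated with z, so a practical implementation would need to guard against a zero denominator.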
Efficiency and Accuracy Improvements of Secure Floating-Point Addition over Secret Sharing Open Access
Kota SASAKI Koji NUIDA

PAPER

Publicized: 2021/09/09
Vol: E105-A No:3
Page(s): 231-241
In secure multiparty computation (MPC), floating-point numbers should be handled in many potential applications, but these are basically expensive. In particular, for MPC based on secret sharing (SS), the floating-point addition takes many communication rounds though the addition is the most fundamental operation. In this paper, we propose an SS-based two-party protocol for floating-point addition with 13 rounds (for single/double precision numbers), which is much fewer than the milestone work of Aliasgari et al. in NDSS 2013 (34 and 36 rounds, respectively) and also fewer than the state of the art in the literature. Moreover, in contrast to the existing SS-based protocols which are all based on “roundTowardZero” rounding mode in the IEEE 754 standard, we propose another protocol with 15 rounds which is the first result realizing more accurate “roundTiesToEven” rounding mode. We also discuss possible applications of the latter protocol to secure Validated Numerics (a.k.a. Rigorous Computation) by implementing a simple example.
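The difference between the two rounding modes named above can be sketched on a plain, non-secret integer significand. The example below drops the lowest `drop` bits of a non-negative significand under each mode. It is only an illustration of the rounding rules, not of the secure protocols; it ignores the renormalization that a carry out of the significand would require, and the function names are hypothetical.

```python
def round_toward_zero(significand, drop):
    """Discard the lowest `drop` bits (truncation) of a non-negative value."""
    return significand >> drop


def round_ties_to_even(significand, drop):
    """Discard the lowest `drop` bits (drop >= 1), rounding to nearest.

    Exact ties are resolved toward the result whose last kept bit is 0.
    """
    kept = significand >> drop
    remainder = significand & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept


# 0b10110 with two bits dropped is an exact tie between 0b101 and 0b110.
truncated = round_toward_zero(0b10110, 2)
nearest_even = round_ties_to_even(0b10110, 2)
```

Truncation always moves the result toward zero, so errors accumulate in one direction, whereas ties-to-even keeps the rounding error centered, which is why it is described as the more accurate mode.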
An Efficient Secure Division Protocol Using Approximate Multi-Bit Product and New Constant-Round Building Blocks Open Access
Keitaro HIWATASHI Satsuya OHATA Koji NUIDA

PAPER-Cryptography and Information Security

Publicized: 2021/09/28
Vol: E105-A No:3
Page(s): 404-416
Integer division is one of the most fundamental arithmetic operators and is ubiquitously used. However, the existing division protocols in secure multi-party computation (MPC) are inefficient and very complex, and this has been a barrier to applications of MPC such as secure machine learning. Some secure division protocols working in Z_{2^n} already exist. However, these protocols have the drawbacks that they need many communication rounds and must use integers larger than the input/output. In this paper, we improve a secure division protocol in two ways. First, we construct a new protocol using only integers of the same size as the input/output. Second, we build efficient constant-round building blocks used as subprotocols in the division protocol. With these two improvements, the number of communication rounds of our division protocol is reduced to about 36% (87 rounds → 31 rounds) for 64-bit integers in comparison with the most efficient previous one.
Simultaneous Scheduling and Core-Type Optimization for Moldable Fork-Join Tasks on Heterogeneous Multicores
Hiroki NISHIKAWA Kana SHIMADA Ittetsu TANIGUCHI Hiroyuki TOMIYAMA

PAPER

Publicized: 2021/09/01
Vol: E105-A No:3
Page(s): 540-548
With the demand for energy-efficient and high-performance computing, multicore architecture has become more appealing than ever. Multicore task scheduling is one of the domains in parallel computing that exploits the parallelism of multicores. Unlike traditional scheduling, multicore task scheduling has recently been studied on the assumption that tasks have inherent parallelism and can be split into multiple sub-tasks in a data-parallel fashion. However, it is still challenging to properly determine the degree of parallelism of tasks and their mapping on multicores. Our proposed scheduling techniques determine the degree of parallelism of tasks and decide which type of core each sub-task is assigned to on heterogeneous multicores. In addition, two approaches to hardware/software codesign for heterogeneous multicore systems are proposed. These approaches optimize the types of cores organized in the architecture simultaneously with the scheduling of the tasks such that the overall energy consumption is minimized under a deadline constraint. A warm start approach is also presented to solve the problem effectively. The experimental results show that the simultaneous scheduling and core-type optimization technique remarkably reduces the energy consumption.
FPGA Implementation of 3-Bit Quantized Multi-Task CNN for Contour Detection and Disparity Estimation
Masayuki MIYAMA

PAPER-Image Recognition, Computer Vision

Publicized: 2021/10/26
Vol: E105-D No:2
Page(s): 406-414
Object contour detection is the task of extracting the shape created by the boundaries between objects in an image. Conventional methods limit the detection targets to specific categories, or misdetect edges of patterns inside an object. We propose a new method to represent a contour image where the pixel value is the distance to the boundary. Contour detection becomes a regression problem that estimates this contour image. A deep convolutional network for contour estimation is combined with stereo vision to detect unspecified object contours. Furthermore, thanks to similar inference targets and a common network structure, we propose a network that simultaneously estimates both contour and disparity with fully shared weights. In experiments, the multi-tasking network drew a good precision-recall curve, and the F-measure was about 0.833 for the FlyingThings3D dataset. The L1 loss of disparity estimation for the dataset was 2.571. This network reduces the amount of calculation and memory capacity by half, and the accuracy drop compared to the dedicated networks is slight. We then quantize both weights and activations of the network to 3 bits. We devise a dedicated hardware architecture for the quantized CNN and implement it on an FPGA. This circuit uses only internal memory to perform forward propagation calculations, which eliminates high-power external memory accesses. This circuit is a stall-free pixel-by-pixel pipeline, and performs 8 rows, 16 input channels, 16 output channels, 3 by 3 pixels convolution calculations in parallel. The convolution calculation performance at the operating frequency of 250 MHz is 9 TOPs/s.
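The throughput figure quoted at the end of the abstract above can be checked with a short calculation from the stated parallelism and clock frequency. The sketch below assumes that each multiply-accumulate counts as two operations, a common convention that the abstract does not state; the function name is hypothetical.

```python
def conv_throughput_tera_ops(rows, in_channels, out_channels, kernel_h,
                             kernel_w, clock_hz, ops_per_mac=2):
    """Peak throughput in tera-operations per second.

    ops_per_mac=2 (one multiply plus one add) is an assumption.
    """
    macs_per_cycle = rows * in_channels * out_channels * kernel_h * kernel_w
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12


# Parallelism and clock frequency as given in the abstract.
macs_per_cycle = 8 * 16 * 16 * 3 * 3          # 18432 MACs every cycle
tera_ops = conv_throughput_tera_ops(8, 16, 16, 3, 3, 250e6)
```

Under that assumption the result is roughly 9.2 tera-operations per second, which is consistent with the figure of about 9 reported in the abstract.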
Reducing Energy Consumption of Wakeup Logic through Double-Stage Tag Comparison
Yasutaka MATSUDA Ryota SHIOYA Hideki ANDO

PAPER-Computer System

Publicized: 2021/11/02
Vol: E105-D No:2
Page(s): 320-332
The high energy consumption of current processors causes several problems, including a limited clock frequency, short battery lifetime, and reduced device reliability. It is therefore important to reduce the energy consumption of the processor. Among resources in a processor, the issue queue (IQ) is a large consumer of energy, much of which is consumed by the wakeup logic. Within the wakeup logic, the tag comparison that checks source operand readiness consumes a significant amount of energy. This paper proposes an energy reduction scheme for tag comparison, called double-stage tag comparison. This scheme first compares the lower bits of the tag and then, only if these match, compares the higher bits. Because the energy consumption of tag comparison is roughly proportional to the total number of bits compared, energy is saved by reducing this number. However, this sequential comparison increases the delay of the IQ, thereby increasing the clock cycle time. Although this can be avoided by allocating an extra cycle to the issue operation, this in turn degrades the IPC. To avoid IPC degradation, we reconfigure a small number of entries in the IQ, where several oldest instructions that are likely to have an adverse effect on performance reside, to a single stage for tag comparison. Our evaluation results for SPEC2017 benchmark programs show that the double-stage tag comparison achieves on average a 21% reduction in the energy consumed by the wakeup logic (15% when including the overhead) with only 3.0% performance degradation.
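The two-step comparison described above can be sketched in software as follows. This is a behavioral illustration only, not the circuit from the paper: it compares the lower bits first, compares the higher bits only when the lower bits match, and reports how many bits were compared as a rough stand-in for energy. The function name, the tag width, and the bit split are assumptions.

```python
def double_stage_compare(src_tag, dest_tag, tag_bits, low_bits):
    """Return (match, bits_compared) for a two-stage tag comparison."""
    low_mask = (1 << low_bits) - 1
    # Stage 1: compare only the lower bits.
    if (src_tag & low_mask) != (dest_tag & low_mask):
        return False, low_bits
    # Stage 2: reached only when the lower bits matched.
    high_match = (src_tag >> low_bits) == (dest_tag >> low_bits)
    return high_match, tag_bits


# Hypothetical 8-bit tags with a 3-bit first stage.
full_match = double_stage_compare(0b10110101, 0b10110101, 8, 3)
early_reject = double_stage_compare(0b10110100, 0b10110101, 8, 3)
late_reject = double_stage_compare(0b00110101, 0b10110101, 8, 3)
```

Most comparisons in a wakeup broadcast are mismatches, so when the lower bits usually differ, the second stage is rarely exercised and fewer bits are compared overall.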
Layer-Based Communication-Efficient Federated Learning with Privacy Preservation
Zhuotao LIAN Weizheng WANG Huakun HUANG Chunhua SU

PAPER

Publicized: 2021/09/28
Vol: E105-D No:2
Page(s): 256-263
In recent years, federated learning has attracted more and more attention as it can collaboratively train a global model without gathering the users' raw data. However, it has also brought many challenges. In this paper, we propose a layer-based federated learning system with privacy preservation. We reduce the communication cost by selecting several layers of the model to upload for global averaging, and enhance the privacy protection by applying local differential privacy. We evaluate our system in a non-independent and identically distributed (non-IID) scenario on three datasets. Compared with existing works, our solution achieves better performance in both model accuracy and training time.
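The idea of uploading only selected layers for global averaging can be sketched as below. This is a simplified illustration under several assumptions: models are dictionaries mapping layer names to flat lists of weights, all clients are weighted equally, and the local differential privacy step mentioned in the abstract is omitted. The function and layer names are hypothetical, and the paper's layer-selection rule is not reproduced.

```python
def average_selected_layers(global_model, client_uploads, selected_layers):
    """Average only the uploaded layers; keep all other layers unchanged."""
    new_model = {name: list(weights) for name, weights in global_model.items()}
    for name in selected_layers:
        uploads = [upload[name] for upload in client_uploads]
        # Element-wise mean across clients for this layer.
        new_model[name] = [sum(vals) / len(uploads) for vals in zip(*uploads)]
    return new_model


global_model = {"conv": [0.0, 0.0], "fc": [1.0, 1.0]}
# Each client uploads only the selected layer, which saves bandwidth.
client_uploads = [{"conv": [1.0, 3.0]}, {"conv": [3.0, 5.0]}]
updated = average_selected_layers(global_model, client_uploads, ["conv"])
```

Because the layers that are not selected never leave the clients, the upload size per round shrinks in proportion to the share of parameters in the selected layers.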
Design and Performance of Low-Density Parity-Check Codes for Noisy Channels with Synchronization Errors
Ryo SHIBATA Hiroyuki YASHIMA

LETTER-Coding Theory

Publicized: 2021/07/14
Vol: E105-A No:1
Page(s): 63-67
In this letter, we study low-density parity-check (LDPC) codes for noisy channels with insertion and deletion (ID) errors. We first propose a design method of irregular LDPC codes for such channels, which can be used to simultaneously obtain degree distributions for different noise levels. We then show the asymptotic/finite-length decoding performances of the designed codes and compare them with the symmetric information rates of cascaded ID-noisy channels. Moreover, we examine the relationship between the decoding performance and the code structure of irregular LDPC codes.
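As background for the abstract above, the sketch below shows what a parity check is and why a deletion differs from an ordinary bit flip. It uses a tiny hand-made parity-check matrix, not an LDPC code designed by the letter's method, and the function names are hypothetical.

```python
def syndrome(parity_check, word):
    """Compute H * word over GF(2); all zeros means every check is satisfied."""
    return [sum(h * c for h, c in zip(row, word)) % 2 for row in parity_check]


def is_codeword(parity_check, word):
    """True when the word satisfies every parity check."""
    return not any(syndrome(parity_check, word))


def delete_bit(word, index):
    """Model a deletion error: one symbol is lost and later symbols shift."""
    return word[:index] + word[index + 1:]


# Toy 3 x 6 parity-check matrix (an assumption for illustration).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
codeword = [1, 0, 1, 1, 1, 0]
flipped = [1, 0, 1, 1, 1, 1]          # substitution error in the last bit
shortened = delete_bit(codeword, 2)   # deletion error at position 2
```

A substitution leaves the word length intact, so the syndrome can flag it directly. A deletion shortens the received word and misaligns every later position, which is why channels with synchronization errors need dedicated code designs and decoders.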
A Novel Transferable Sparse Regression Method for Cross-Database Facial Expression Recognition
Wenjing ZHANG Peng SONG Wenming ZHENG

LETTER-Image Recognition, Computer Vision

Publicized: 2021/10/12
Vol: E105-D No:1
Page(s): 184-188
In this letter, we propose a novel transferable sparse regression (TSR) method for cross-database facial expression recognition (FER). In TSR, we first present a novel regression function to regress the data into a latent representation space instead of a strict binary label space. To further alleviate the influence of outliers and overfitting, we impose a row sparsity constraint on the regression term. A pairwise relation term is also introduced to guide the feature transfer learning. Second, we design a global graph to transfer knowledge, which can well preserve the cross-database manifold structure. Moreover, we introduce a low-rank constraint on the graph regularization term to uncover additional structural information. Finally, several experiments are conducted on three popular facial expression databases, and the results validate that the proposed TSR method is superior to other non-deep and deep transfer learning methods.
Device-Free Localization via Sparse Coding with a Generalized Thresholding Algorithm
Qin CHENG Linghua ZHANG Bo XUE Feng SHU Yang YU

PAPER-Wireless Communication Technologies

Publicized: 2021/08/05
Vol: E105-B No:1
Page(s): 58-66
As an emerging technology, device-free localization (DFL), which uses wireless sensor networks to detect targets not carrying any electronic devices, has spawned extensive applications, such as security safeguards and smart homes or hospitals. Previous studies formulate DFL as a classification problem, but there are still some challenges in terms of accuracy and robustness. In this paper, we exploit a generalized thresholding algorithm with parameter p as a penalty function to solve inverse problems with sparsity constraints for DFL. The function applies less bias to the large coefficients and penalizes small coefficients by reducing the value of p. By exploiting the distinctive capability of the p-thresholding function to measure sparsity, the proposed approach can achieve accurate and robust localization performance in challenging environments. Extensive experiments show that the algorithm outperforms current alternatives.
CMOS Image Sensor with Pixel-Parallel ADC and HDR Reconstruction from Intermediate Exposure Images Open Access
Shinnosuke KURATA Toshinori OTAKA Yusuke KAMEDA Takayuki HAMAMOTO

LETTER-Image

Publicized: 2021/07/26
Vol: E105-A No:1
Page(s): 82-86
We propose an HDR (high dynamic range) reconstruction method in an image sensor with a pixel-parallel ADC (analog-to-digital converter) for non-destructively reading out the intermediate exposure image. We report the circuit design for such an image sensor and the evaluation of the basic HDR reconstruction method.
Parameter Estimation of Markovian Arrivals with Utilization Data
Chen LI Junjun ZHENG Hiroyuki OKAMURA Tadashi DOHI

PAPER-Fundamental Theories for Communications

Publicized: 2021/07/08
Vol: E105-B No:1
Page(s): 1-10
Utilization data (a kind of incomplete data) is defined as the fraction of a fixed period in which the system is busy. In computer systems, utilization data is very common and easily observable, such as CPU utilization. Compared with inter-arrival times and waiting times, it is therefore more significant to consider the parameter estimation of transaction-based systems with utilization data. In our previous work [7], a novel parameter estimation method using utilization data for an M_t/M/1/K queueing system was presented to estimate the parameters of a non-homogeneous Poisson process (NHPP). Since NHPP is classified as a simple counting process, it may not fit actual arrival streams very well. As a generalization of NHPP, the Markovian arrival process (MAP) takes account of the dependency between consecutive arrivals and is often used to model complex, bursty, and correlated traffic streams. In this paper, we concentrate on the parameter estimation of an MAP/M/1/K queueing system using utilization data. In particular, the parameters are estimated by using the maximum likelihood estimation (MLE) method. Numerical experiments on real utilization data validate the proposed approach and evaluate the effective traffic intensity of the arrival stream of the MAP/M/1/K queueing system. In addition, three kinds of utilization datasets are created from a simulation to assess the effects of observed time intervals on both estimation accuracy and computational cost. The numerical results show that the MAP-based approach outperforms the existing method in terms of both estimation accuracy and computational cost.
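The definition of utilization data given at the start of the abstract above, the fraction of a fixed period in which the system is busy, can be sketched as a small computation. The example assumes that busy periods are given as non-overlapping (start, end) intervals; the function name and the sample values are hypothetical, and the estimation method of the paper is not reproduced.

```python
def utilization(busy_intervals, window_start, window_end):
    """Fraction of [window_start, window_end) during which the system is busy.

    Assumes the busy intervals do not overlap one another.
    """
    busy_time = 0.0
    for start, end in busy_intervals:
        # Count only the part of the interval that falls inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy_time += overlap
    return busy_time / (window_end - window_start)


# Hypothetical busy periods observed over a 10-second window.
busy_periods = [(0.0, 2.0), (3.0, 4.0), (9.0, 12.0)]
cpu_utilization = utilization(busy_periods, 0.0, 10.0)
```

A single utilization value hides the individual arrival and departure times inside the window, which is what makes this kind of observation incomplete data from the estimator's point of view.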
