Kazunori KOMATANI Naoki HOTTA Satoshi SATO Mikio NAKANO
Appropriate turn-taking is as important in spoken dialogue systems as generating correct responses. Especially when the dialogue features quick responses, voice activity detection (VAD) often segments a user utterance incorrectly because of short pauses within it. Incorrectly segmented utterances cause problems in both the automatic speech recognition (ASR) results and turn-taking: an incorrect VAD result leads to ASR errors and causes the system to start responding while the user is still speaking. We develop a method that performs a posteriori restoration of incorrectly segmented utterances and implement it as a plug-in for the MMDAgent open-source software. A crucial part of the method is classifying whether restoration is required. We cast this as a binary classification problem of detecting originally single utterances from pairs of utterance fragments. Various features representing timing, prosody, and ASR result information are used. Experiments show that the proposed method outperformed a baseline with manually selected features by 4.8% and 3.9% in cross-domain evaluations with two domains. More detailed analysis revealed that the dominant and domain-independent features were utterance intervals and results from the Gaussian mixture model (GMM).
Kosuke TOMITA Masahide HATANAKA Takao ONOYE
Viterbi decoding is commonly used in several protocols, but its computational cost is quite high, so it must be implemented efficiently. This paper describes a GPU implementation of a Viterbi decoder that uses the three-point Viterbi decoding algorithm (TVDA), in which the received bits are divided into multiple chunks and several chunks are decoded simultaneously. Coalesced access and Warp Shuffle, a newly introduced instruction, are also utilized to improve decoder performance. In addition, iterative execution of parallel chunk decoding reduces the latency of the proposed Viterbi decoder so that it can be used as part of a GPU-based SDR transceiver. As a result, the throughput of the proposed Viterbi decoder is improved by 23.1%.
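For orientation, the baseline that chunked decoding parallelizes can be sketched as a plain sequential hard-decision Viterbi decoder. The rate-1/2, constraint-length-3 code with generators (7, 5) and the function names `encode` and `viterbi_decode` are assumptions chosen for illustration; the sketch does not implement TVDA, chunking, or any GPU-specific optimization.

```python
# Illustrative only: a sequential hard-decision Viterbi decoder for an assumed
# rate-1/2, constraint-length-3 convolutional code with generators (7, 5) octal.
G = (0b111, 0b101)
N_STATES = 4  # the state is the two most recent input bits


def parity(x):
    """Return the parity (0 or 1) of the set bits in x."""
    return bin(x).count("1") % 2


def encode(bits):
    """Convolutionally encode a list of 0/1 bits, starting from state 0."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state          # newest bit in the top position
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1                # drop the oldest bit
    return out


def viterbi_decode(received):
    """Return the most likely input bits for a list of received code bits."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # the encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), 2):
        symbol = received[i:i + 2]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue                     # state not reachable yet
            for b in (0, 1):
                reg = (b << 2) | state
                expected = [parity(reg & g) for g in G]
                nxt = reg >> 1
                # Branch metric: Hamming distance to the received symbol.
                m = metrics[state] + sum(x != y for x, y in zip(symbol, expected))
                if m < new_metrics[nxt]:     # keep the survivor path
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]
```

The add-compare-select loop over states is the part that a parallel implementation would distribute across threads; here it runs serially for clarity.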
Chiaki UEDA Minami IBATA Tadahiro AZETSU Noriaki SUETAKE Eiji UCHINO
In a food image acquired by a digital camera, the intensity and saturation components are sometimes reduced, depending on the illumination environment. In this case, the food does not look delicious. In general, the RGB components are transformed into hue, saturation, and intensity components, and the saturation and intensity components are then enhanced so that the food looks delicious. However, these processes are complex and involve a gamut problem. In this paper, we propose a method that enhances the intensity and saturation of food images while preserving the hue in the RGB color space. In this method, the intensity components are first enhanced while avoiding saturation deterioration. Then, the saturation components are enhanced in regions whose hue components frequently appear in foods. To illustrate the effectiveness of the proposed method, enhancement experiments are conducted using several food images.
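The idea of enhancing intensity and saturation directly in RGB while keeping the hue can be illustrated with a generic affine mapping: scaling all three channels by a common gain and stretching them about their mean leaves the hue unchanged. The following is a minimal sketch under that assumption only; the function name `enhance_hue_preserving`, the default `gain` and `sat` values, and the gamut-limiting rule are hypothetical, and the sketch does not reproduce the abstract's handling of saturation deterioration or its selection of food-related hue regions.

```python
import colorsys


def enhance_hue_preserving(rgb, gain=1.2, sat=1.5):
    """Enhance intensity, then saturation, of one RGB pixel with values in [0, 1].

    Assumes gain >= 1 and sat >= 1. Both steps are limited so that the result
    stays inside the RGB gamut.
    """
    peak = max(rgb)
    if peak > 0:
        gain = min(gain, 1.0 / peak)      # do not push any channel above 1
    scaled = [gain * c for c in rgb]      # common gain: hue is unchanged

    mean = sum(scaled) / 3.0              # intensity after the first step
    k = sat
    for c in scaled:
        d = c - mean
        if d > 0:
            k = min(k, (1.0 - mean) / d)  # upper gamut limit
        elif d < 0:
            k = min(k, mean / -d)         # lower gamut limit
    # Stretch each channel about the mean: intensity is kept, hue is unchanged.
    return tuple(mean + k * (c - mean) for c in scaled)


src = (0.4, 0.3, 0.2)
out = enhance_hue_preserving(src)
# The HSV hue is used here only as an independent check of hue preservation.
print(colorsys.rgb_to_hsv(*src)[0], colorsys.rgb_to_hsv(*out)[0])
```

Because both steps have the form a*c + b with the same a > 0 and b for every channel, no conversion to a hue-saturation-intensity space is needed, which is the kind of simplification the abstract alludes to.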
Keita KONNO Naoki HONMA Dai SASAKAWA Kentaro NISHIMORI Nobuyasu TAKEMURA Tsutomu MITSUI Yoshitaka TSUNEKAWA
This paper proposes a method that uses bistatic Multiple-Input Multiple-Output (MIMO) radar to locate living-bodies. In this method, directions of living-bodies are estimated by the MUltiple SIgnal Classification (MUSIC) method at the transmitter and receiver, where the Fourier transformed virtual Single-Input Multiple-Output (SIMO) channel matrix is used. Body location is taken as the intersection of the two directions. The proposal uses a single frequency and so has a great advantage over conventional methods that need a wide frequency band. Also, this method can be used in multipath-rich environments such as indoors. An experiment is performed in an indoor environment, and the MIMO channels yielded by various subject numbers and positions are measured. The result indicates that the proposed method can estimate multiple living-body locations with high accuracy, even in multipath environments.
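The final step, taking the body location as the intersection of the two estimated directions, reduces to intersecting two lines in the plane. The sketch below assumes a 2-D geometry, known transmitter and receiver positions, and angles measured in radians from the x-axis; the function name `locate` and these conventions are illustrative assumptions, and the MUSIC direction estimation itself is not shown.

```python
import math


def locate(tx, theta_tx, rx, theta_rx):
    """Intersect the bearing line from tx with the bearing line from rx.

    tx, rx: (x, y) positions of the transmitter and receiver arrays.
    theta_tx, theta_rx: estimated directions in radians from the x-axis.
    """
    ux, uy = math.cos(theta_tx), math.sin(theta_tx)
    vx, vy = math.cos(theta_rx), math.sin(theta_rx)
    # Solve tx + s*u = rx + r*v for s using Cramer's rule.
    det = vx * uy - ux * vy
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    dx, dy = rx[0] - tx[0], rx[1] - tx[1]
    s = (vx * dy - vy * dx) / det
    return (tx[0] + s * ux, tx[1] + s * uy)


# A target at (2, 2) seen from arrays at (0, 0) and (4, 0).
print(locate((0.0, 0.0), math.pi / 4, (4.0, 0.0), 3 * math.pi / 4))
```

The sketch treats each bearing as an infinite line; a practical system would presumably also reject intersections that fall behind either array.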
Xiantao JIANG Tian SONG Takashi SHIMAMOTO Wen SHI Lisheng WANG
The next-generation high efficiency video coding (HEVC) standard achieves high performance by extending the encoding block to 64×64. There are some parallel tools to improve the efficiency of the encoder and decoder. However, owing to the dependence between the current prediction block and the surrounding blocks, parallel processing at the CU level and sub-CU level is hard to achieve. In this paper, focusing on spatial motion vector prediction (SMVP) and temporal motion vector prediction (TMVP), parallel improvements to the spatio-temporal prediction algorithms are presented that remove the dependency between prediction coding units and neighboring coding units. With this proposal, motion estimation can conveniently be processed in parallel, which suits different parallel platforms such as multi-core platforms and compute unified device architecture (CUDA). Simulation results based on the HM12.0 test model for different test sequences demonstrate that the proposed algorithm can improve advanced motion vector prediction with only a 0.01% BD-rate increase, which is better than previous work, and that the BDPSNR is almost the same as that of the HEVC reference software.
Jun JIANG Xiaohong WU Xiaohai HE Pradeep KARN
Crowd collectiveness, i.e., a quantitative metric for collective motion, has received increasing attention in recent years. Most existing methods build a collective network by assuming that each agent in the crowd interacts with neighbors within a region of fixed radius r or with a fixed number k of nearest neighbors. However, they usually use a universal r or k for different crowded scenes, which may yield an inaccurate network topology and a lack of adaptivity to varying collective motion scenarios, thereby resulting in poor performance. To overcome these limitations, we propose a compressive sensing (CS) based method for measuring crowd collectiveness. The proposed method uncovers the connections among agents from the motion time series by solving a CS problem, which does not require r or k to be specified a priori. A descriptor based on the average velocity correlations of connected agents is then constructed to compute the collectiveness value. Experimental results demonstrate that the proposed method is effective in measuring crowd collectiveness and performs on par with or better than state-of-the-art methods.
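A descriptor of the kind described, an average of velocity correlations over connected agents, might look like the following sketch. It assumes that the connections have already been recovered (the CS step is not shown), that velocities are 2-D, and that "velocity correlation" means the cosine of the angle between two velocity vectors; the function name `collectiveness` and these choices are illustrative assumptions rather than the abstract's exact definition.

```python
import math


def collectiveness(velocities, edges):
    """Average velocity correlation over connected agent pairs.

    velocities: list of (vx, vy) tuples, one per agent.
    edges: list of (i, j) index pairs marking connected agents.
    Returns a value in [-1, 1]; 0.0 if there are no edges.
    """
    def correlation(a, b):
        na, nb = math.hypot(*a), math.hypot(*b)
        if na == 0 or nb == 0:
            return 0.0                    # a stationary agent carries no direction
        return (a[0] * b[0] + a[1] * b[1]) / (na * nb)

    if not edges:
        return 0.0
    total = sum(correlation(velocities[i], velocities[j]) for i, j in edges)
    return total / len(edges)


# Two agents moving together and one moving the opposite way.
v = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0)]
print(collectiveness(v, [(0, 1)]), collectiveness(v, [(0, 1), (0, 2)]))
```

Because the edge list is an input, the same descriptor can be evaluated on a fixed-r or fixed-k network for comparison with a network recovered from the data.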
Xiao WU Zhou JIN Dan NIU Yasuaki INOUE
An effective time-step control method is proposed for damped pseudo-transient analysis (DPTA). The method is based on the idea of the switched evolution/relaxation method, which can automatically adapt the step size to different circuit states. Taking into account the number of iterations needed for the Newton-Raphson method to converge, the new method adapts the time-step size according to the status of previous steps. Numerical examples show that this method can improve the simulation efficiency and convergence of the DPTA method in solving nonlinear DC circuits.
The pilot symbols in broadband Air-to-Ground (A/G) communications systems, e.g., the L-band Digital Aeronautical Communications System (L-DACS1), are also expected to be utilized for navigation. In order to identify the co-channel signals from different Ground Stations (GSs), N-Shift Zero Correlation Zone (NS-ZCZ) sequences are employed as pilot sequences. The ideal correlation property of the proposed pilot sequence within the ZCZ can maintain the signal with less co-channel interference. The simulation confirms that the more co-channel GSs are employed, the higher the navigation accuracy that can be achieved.
Yuki KOGA Tokiyoshi MATSUDA Mutsumi KIMURA Dapeng WANG Mamoru FURUTA Masashi KASAMI Shigekazu TOMAI Koki YANO
We have developed a frequency-modulation capacitance sensor for integrated touchpanels using amorphous In-Sn-Zn-O (α-ITZO) thin-film transistors (TFTs). The capacitance sensor consists of a ring oscillator in which one stage is replaced by a reset transistor, a sensing transistor, and a sensing electrode. The sensing electrode serves as one terminal of a sensing capacitor that is formed when a finger provides the other terminal. The ring oscillator consists of pseudo-CMOS inverters. We confirm that the oscillation frequency changes when the other terminal is added. This result suggests that the capacitance sensor can be applied to integrated touchpanels on flat-panel displays.
Mototaka OCHI Yoko SHIDA Hiroyuki OKUNO Hiroshi GOTO Toshihiro KUGIMIYA Moriyoshi KANAMARU
An Al-N system optical absorption layer has been developed, to be used for Al-based metal mesh electrodes on touch screen panels. The triple-layered electrode effectively suppresses the optical reflection in both visible light and the blue color region and exhibits excellent wet etching property that accommodates micro-fabrication. Due to its high noise immunity and contact sensitivity originating from its low electrical resistivity, the proposed metal mesh electrodes are useful for touch-sensitive panels in the next generation ultra-high-resolution displays.
This paper presents an Adapting Block-Propagative Background Subtraction (ABPBGS) method designed for Ultra High Definition Television (UHDTV) foreground detection. The main idea is to perform detection block after block along the objects in order to skip all areas of the image in which there is no moving object. This is particularly useful for UHDTV, where the objects of interest may represent less than 0.1% of the total area. Starting from a seed block determined in a previous iteration, the detection spreads along an object as long as it detects a part of that object. A block history map guarantees that each block is processed only once. Moreover, only small blocks are loaded and processed, thus saving computational time and memory usage. The processing of each block is independent enough to be easily parallelized. Compared to 9 state-of-the-art works, the ABPBGS achieved the best results, with an average global quality score of 0.57 (1 being the maximum) on a dataset of 4K and 8K UHDTV sequences developed for this work. None of the state-of-the-art methods could process 4K videos in reasonable time, while the ABPBGS showed an average speed of 5.18fps. In comparison, 5 of the 9 state-of-the-art methods performed slower on a 270p down-scaled version of the same videos. The experiments have also shown that, for processing an 8K UHDTV video, the ABPBGS can divide the memory required by about 24, for a total of 450MB.
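The propagation described above, spreading from a seed block while a history map ensures each block is handled once, can be sketched as a breadth-first traversal over a grid of blocks. The function name `propagate`, the 4-neighbor connectivity, and the `block_has_object` callback (standing in for the per-block background subtraction test) are assumptions made for illustration; the adaptive parts of ABPBGS are not modeled.

```python
from collections import deque


def propagate(seed, grid_shape, block_has_object):
    """Spread detection from a seed block along a moving object.

    seed: (row, col) of the starting block.
    grid_shape: (rows, cols) of the block grid.
    block_has_object: callable (row, col) -> bool, a hypothetical stand-in
        for the per-block foreground test.
    Returns (detected_blocks, history), where history holds every block that
    was queued for processing.
    """
    rows, cols = grid_shape
    history = {seed}                 # block history map
    queue = deque([seed])
    detected = []
    while queue:
        r, c = queue.popleft()
        if not block_has_object(r, c):
            continue                 # stop spreading from background blocks
        detected.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in history:
                history.add((nr, nc))
                queue.append((nr, nc))
    return detected, history


# A small object occupying three blocks of a 4 x 6 block grid.
obj = {(1, 1), (1, 2), (2, 2)}
found, visited = propagate((1, 1), (4, 6), lambda r, c: (r, c) in obj)
print(sorted(found), len(visited), "of", 4 * 6, "blocks examined")
```

Only the object blocks and their immediate border are ever examined, which is where the savings on large frames with small objects would come from.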
Yan REN Guilin WANG Yunhong HU Qiuyan WANG
In this paper, we first propose a notion of multiple authorities attribute-based designated confirmer signature scheme with unified verification. In a multiple authorities attribute-based designated confirmer signature scheme with unified verification, both the signer and the designated confirmer can run the same protocols to confirm a valid signature or disavow an invalid signature. Then, we construct a multiple authorities attribute-based designated confirmer signature scheme with unified verification. Finally, we prove the correctness and security of the proposed scheme.
Trung Kien VU Sungoh KWON Sangchul OH
Heterogeneous networks (HetNets) have been introduced as an emerging technology in order to meet the increasing demand for mobile data. HetNets are a combination of multi-layer networks such as macrocells and small cells. In such networks, users may suffer significant cross-layer interference. To manage this interference, the 3rd Generation Partnership Project (3GPP) has introduced enhanced Inter-Cell Interference Coordination (eICIC) techniques. Almost Blank SubFrame (ABSF) is one of the time-domain techniques used in eICIC solutions. We propose a dynamically optimal Signal-to-Interference-and-Noise Ratio (SINR)-based ABSF framework to ensure macro user performance while maintaining small user performance. We also study cooperative mechanisms to help small cells collaborate efficiently in order to reduce mutual interference. Simulations show that our proposed scheme achieves good performance and outperforms the existing ABSF frameworks.
Ryota KAWASHIMA Hiroshi MATSUO
L2-in-L3 tunneling technology plays an important role in network virtualization based on the concept of Software-Defined Networking (SDN). The VXLAN (Virtual eXtensible LAN) and NVGRE (Network Virtualization using Generic Routing Encapsulation) protocols are widely used in public cloud datacenters. These protocols resolve traditional VLAN problems such as the limit on the number of virtual networks; however, their network performance is low without dedicated hardware acceleration. Although STT (Stateless Transport Tunneling) achieves far better performance, it has a practical problem in that STT packets can be dropped by network middleboxes such as stateful firewalls because of its modified TCP header semantics. In this paper, we propose yet another layer 4 protocol (Segment-oriented Connection-less Protocol, SCLP) for existing tunneling protocols. Our previous study revealed that the high performance of STT mainly comes from 2-level software packet pre-reassembly before decapsulation. The SCLP header is designed to take advantage of such processing without modifying existing protocol semantics. We implement VXLAN over SCLP tunneling and evaluate its performance by comparing it with the original VXLAN (over UDP), NVGRE, Geneve, and STT. The results show that the throughput of the proposed method was comparable to that of STT and almost 70% higher than that of the other protocols.
The advanced front-end (AFE) for automatic speech recognition (ASR) was standardized by the European Telecommunications Standards Institute (ETSI). The AFE provides speech enhancement realized by an iterative Wiener filter (IWF) in which a smoothed FFT spectrum over adjacent frames is used to design the filter. We have previously proposed robust time-varying complex Auto-Regressive (TV-CAR) speech analysis for an analytic signal and evaluated its performance in speech processing tasks such as F0 estimation and speech enhancement. TV-CAR analysis can estimate a more accurate spectrum than the FFT, especially at low frequencies, because of the nature of the analytic signal. In addition, TV-CAR can estimate the speech spectrum more accurately in the presence of additive noise. In this paper, a time-invariant version of wide-band TV-CAR analysis is introduced into the IWF in the AFE and is evaluated using the CENSREC-2 database and its baseline script.
Takahiro NATORI Nari TANABE Toshihiro FURUKAWA
This paper proposes a MIMO MC-CDMA channel estimation method for various mobile environments. The distinctive feature of the proposed method is that it can estimate the channel robustly with respect to the mobile velocity by using a Kalman filter with a colored driving source. The effectiveness of the proposed method is shown by computer simulations.
Hideki SAKAI Takahiro ISHINABE Hideo FUJIKAKE
To develop a flexible liquid crystal display (LCD) with a wide viewing angle range and high contrast ratio, we have proposed a flexible blue-phase LC device sustained by polymer walls inside the LC cell. We clarified that the polymer walls can maintain a constant cell gap and suppress the generation of alignment defects of the blue-phase LC in a bending state.
Shuta ISHIZUKA Takuya MUKAI Hideki KAKEYA
We realize homogeneous luminance of the directional backlight for a time-division multiplexing autostereoscopic display by using a convex lens array in which the placement phase of the elemental lenses differs from row to row. The validity of the proposed optical design is confirmed by a prototype system.
Kenta KURIHARA Masanori KIKUCHI Shoko IMAIZUMI Sayaka SHIOTA Hitoshi KIYA
In many multimedia applications, image encryption has to be conducted prior to image compression. This paper proposes a JPEG-friendly perceptual encryption method that can be applied prior to JPEG and Motion JPEG compression. The proposed encryption scheme can provide approximately the same compression performance as JPEG compression without any encryption, for both grayscale and color images. It is also shown that the proposed scheme, which consists of four block-based encryption steps, provides a reasonably high level of security. Most conventional perceptual encryption schemes have not been designed for international compression standards, but this paper focuses on the JPEG and Motion JPEG standards, which are among the most widely used image compression standards. In addition, this paper considers an efficient key management scheme that makes the keys easy to manage when multiple keys are used for encryption.
Neural networks are widely used in various fields due to their superior learning abilities. This paper proposes a hardware winner-take-all neural network (WTANN) that employs a new winner-take-all (WTA) circuit with phase-modulated pulse signals and digital phase-locked loops (DPLLs). The system uses DPLL as a computing element, so all input values are expressed by phases of rectangular signals. The proposed WTA circuit employs a simple winner search circuit. The proposed WTANN architecture is described by very high speed integrated circuit (VHSIC) hardware description language (VHDL), and its feasibility was tested and verified through simulations and experiments. Conventional WTA takes a global winner search approach, in which vector distances are collected from all neurons and compared. In contrast, the WTA in the proposed system is carried out locally by a distributed winner search circuit among neurons. Therefore, no global communication channels with a wide bandwidth between the winner search module and each neuron are required. Furthermore, the proposed WTANN can easily extend the system scale, merely by increasing the number of neurons. The circuit size and speed were then evaluated by applying the VHDL description to a logic synthesis tool and experiments using a field programmable gate array (FPGA). Vector classifications with WTANN using two kinds of data sets, Iris and Wine, were carried out in VHDL simulations. The results revealed that the proposed WTANN achieved valid learning.
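As a software reference point, the conventional global winner search that the abstract contrasts with its distributed circuit can be sketched as follows: every neuron's vector distance to the input is computed, the minimum is selected, and only the winner's weights are updated. The function names `winner` and `train_step`, the squared Euclidean distance, and the learning rate are illustrative assumptions; the phase-modulated pulse signals, DPLLs, and local winner search of the proposed hardware are not modeled.

```python
def winner(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    distances = [
        sum((w - xi) ** 2 for w, xi in zip(neuron, x)) for neuron in weights
    ]
    # Global search: all distances are collected and compared in one place.
    return min(range(len(weights)), key=distances.__getitem__)


def train_step(weights, x, lr=0.5):
    """Move only the winning neuron's weights toward the input x (in place)."""
    k = winner(weights, x)
    weights[k] = [w + lr * (xi - w) for w, xi in zip(weights[k], x)]
    return k


# Two neurons in a 2-D input space; the second one wins and is updated.
w = [[0.0, 0.0], [1.0, 1.0]]
k = train_step(w, [0.9, 0.8])
print(k, w)
```

The single `min` over all distances is the step that requires global communication in hardware, which is what a distributed winner search among neurons avoids.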