Tianbin WANG Ruiyang HUANG Nan HU Huansha WANG Guanghan CHU
Chinese Named Entity Recognition is the fundamental technology in the field of the Chinese Natural Language Process. It is extensively adopted into information extraction, intelligent question answering, and knowledge graph. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture the character granularity semantics, which affects the performance of the Chinese NER. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. We novelly integrate the semantic information of character granularity into the vector space of characters and acquire the vector representation containing semantic information by the attention mechanism. In addition, we verify the appropriate number of semantic layers through the comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.
Kenya SAKAMOTO Shizuka SHIRAI Noriko TAKEMURA Jason ORLOSKY Hiroyuki NAGATAKI Mayumi UEDA Yuki URANISHI Haruo TAKEMURA
This study explores significant eye-gaze features that can be used to estimate subjective difficulty while reading educational comics. Educational comics have grown rapidly as a promising way to teach difficult topics using illustrations and texts. However, comics include a variety of information on one page, so automatically detecting learners' states such as subjective difficulty is difficult with approaches such as system log-based detection, which is common in the Learning Analytics field. In order to solve this problem, this study focused on 28 eye-gaze features, including the proposal of three new features called “Variance in Gaze Convergence,” “Movement between Panels,” and “Movement between Tiles” to estimate two degrees of subjective difficulty. We then ran an experiment in a simulated environment using Virtual Reality (VR) to accurately collect gaze information. We extracted features in two unit levels, page- and panel-units, and evaluated the accuracy with each pattern in user-dependent and user-independent settings, respectively. Our proposed features achieved an average F1 classification-score of 0.721 and 0.742 in user-dependent and user-independent models at panel unit levels, respectively, trained by a Support Vector Machine (SVM).
In this paper, we propose a new method for non-dominant limb training. The method is that a learner aims at a motion which is generated by reversing his/her own motion of dominant limb, when he/she tries to train himself/herself for non-dominant limb training. In addition, we designed and developed interface for the new method which can select feedback types. One is an interface using AR and sound, and the other is an interface using AR and vibration. We found that vibration feedback was effective for non-dominant hand training of pitching motion, while sound feedback was not so effective as vibration.
Fauzan ARROFIQI Takashi WATANABE Achmad ARIFIN
The purpose of this study was to develop a practical functional electrical stimulation (FES) controller for joint movements restoration based on an optimal control technique by cascading a linear model predictive control (MPC) and a nonlinear transformation. The cascading configuration was aimed to obtain an FES controller that is able to deal with a nonlinear system. The nonlinear transformation was utilized to transform the linear solution of linear MPC to become a nonlinear solution in form of optimized electrical stimulation intensity. Four different types of nonlinear functions were used to realize the nonlinear transformation. A simple parameter estimation to determine the value of the nonlinear transformation parameter was also developed. The tracking control capability of the proposed controller along with the parameter estimation was examined in controlling the 1-DOF wrist joint movement through computer simulation. The proposed controller was also compared with a fuzzy FES controller. The proposed MPC-FES controller with estimated parameter value worked properly and had a better control accuracy than the fuzzy controller. The parameter estimation was suggested to be useful and effective in practical FES control applications to reduce the time-consuming of determining the parameter value of the proposed controller.
Longjiao ZHAO Yu WANG Jien KATO Yoshiharu ISHIKAWA
Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional capability in discrimination. Recent research in this field has concentrated on pooling methods that incorporate local features into global features and assess the global similarity of two images. However, the pooling methods sacrifice the image's local region information and spatial relationships, which are precisely known as the keys to the robustness against occlusion and viewpoint changes. In this paper, instead of pooling methods, we propose an alternative method based on local similarity, determined by directly using local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between the query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. The experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms the pooling methods because of robustness enhancement. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.
Wenkai LIU Cuizhu QIN Menglong WU Wenle BAI Hongxia DONG
Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.
Minhaz KAMAL Chowdhury Mohammad ABDULLAH Fairuz SHAIARA Abu Raihan Mostofa KAMAL Md Mehedi HASAN Jik-Soo KIM Md Azam HOSSAIN
The literature presents a digitized pension system based on a consortium blockchain, with the aim of overcoming existing pension system challenges such as multiparty collaboration, manual intervention, high turnaround time, cost transparency, auditability, etc. In addition, the adoption of hyperledger fabric and the introduction of smart contracts aim to transform multi-organizational workflow into a synchronized, automated, modular, and error-free procedure.
The finger-vein-based deep neural network authentication system has been applied widely in real scenarios, such as countries' banking and entrance guard systems. However, to ensure performance, the deep neural network should train many parameters, which needs lots of time and computing resources. This paper proposes a method that introduces artificial features with prior knowledge into the convolution layer. First, it designs a multi-direction pattern base on the traditional local binary pattern, which extracts general spatial information and also reduces the spatial dimension. Then, establishes a sample effective deep convolutional neural network via combination with convolution, with the ability to extract deeper finger vein features. Finally, trains the model with a composite loss function to increase the inter-class distance and reduce the intra-class distance. Experiments show that the proposed methods achieve a good performance of higher stability and accuracy of finger vein recognition.
Fei WU Shuaishuai LI Guangchuan PENG Yongheng MA Xiao-Yuan JING
Cross-modal hashing technology has attracted much attention for its favorable retrieval performance and low storage cost. However, for existing cross-modal hashing methods, the heterogeneity of data across modalities is still a challenge and how to fully explore and utilize the intra-modality features has not been well studied. In this paper, we propose a novel cross-modal hashing approach called Modality-fused Graph Network (MFGN). The network architecture consists of a text channel and an image channel that are used to learn modality-specific features, and a modality fusion channel that uses the graph network to learn the modality-shared representations to reduce the heterogeneity across modalities. In addition, an integration module is introduced for the image and text channels to fully explore intra-modality features. Experiments on two widely used datasets show that our approach achieves better results than the state-of-the-art cross-modal hashing methods.
Yue XIE Ruiyu LIANG Zhenlin LIANG Xiaoyan ZHAO Wenhao ZENG
To enhance the emotion feature and improve the performance of speech emotion recognition, an attention mechanism is employed to recognize the important information in both time and feature dimensions. In the time dimension, multi-heads attention is modified with the last state of the long short-term memory (LSTM)'s output to match the time accumulation characteristic of LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention that refers to the method of the state update of LSTM to construct multi-heads attention. This means that a nonlinear change replaces the linear mapping in classical multi-heads attention. Experiments on IEMOCAP datasets demonstrate that the attention mechanism could enhance emotional information and improve the performance of speech emotion recognition.
Sung-Gyun LIM Dong-Ha KIM Kwan-Jung OH Gwangsoon LEE Jun Young JEONG Jae-Gon KIM
The MPEG Immersive Video (MIV) standard for immersive video coding provides users with an immersive sense of 6 degrees of freedom (6DoF) of view position and orientation by efficiently compressing multiview video acquired from different positions in a limited 3D space. In the MIV reference software called Test Model for Immersive Video (TMIV), the number of pixels to be compressed and transmitted is reduced by removing inter-view redundancy. Therefore, the occupancy information that indicates whether each pixel is valid or invalid must also be transmitted to the decoder for viewport rendering. The occupancy information is embedded in a geometry atlas and transmitted to the decoder side. At this time, to prevent occupancy errors that may occur during the compression of the geometry atlas, a guard band is set in the depth dynamic range. Reducing this guard band can improve the rendering quality by allowing a wider dynamic range for depth representation. Therefore, in this paper, based on the analysis of occupancy error of the current TMIV, two methods of occupancy error correction which allow depth dynamic range extension in the case of computer-generated (CG) sequences are presented. The experimental results show that the proposed method gives an average 2.2% BD-rate bit saving for CG compared to the existing TMIV.
Min Ho KWAK Youngwoo KIM Kangin LEE Jae Young CHOI
This letter proposes a novel lightweight deep learning object detector named LW-YOLOv4-tiny, which incorporates the convolution block feature addition module (CBFAM). The novelty of LW-YOLOv4-tiny is the use of channel-wise convolution and element-wise addition in the CBFAM instead of utilizing the concatenation of different feature maps. The model size and computation requirement are reduced by up to 16.9 Mbytes, 5.4 billion FLOPs (BFLOPS), and 11.3 FPS, which is 31.9%, 22.8%, and 30% smaller and faster than the most recent version of YOLOv4-tiny. From the MSCOCO2017 and PASCAL VOC2012 benchmarks, LW-YOLOv4-tiny achieved 40.2% and 69.3% mAP, respectively.
Meng ZHAO Junfeng WU Hong YU Haiqing LI Jingwen XU Siqi CHENG Lishuai GU Juan MENG
Accurate fish detection is of great significance in aquaculture. However, the non-uniform strong reflection in aquaculture ponds will affect the precision of fish detection. This paper combines YOLOv4 and CVAE to accurately detect fishes in the image with non-uniform strong reflection, in which the reflection in the image is removed at first and then the reflection-removed image is provided for fish detecting. Firstly, the improved YOLOv4 is applied to detect and mask the strong reflective region, to locate and label the reflective region for the subsequent reflection removal. Then, CVAE is combined with the improved YOLOv4 for inferring the priori distribution of the Reflection region and restoring the Reflection region by the distribution so that the reflection can be removed. For further improving the quality of the reflection-removed images, the adversarial learning is appended to CVAE. Finally, YOLOV4 is used to detect fishes in the high quality image. In addition, a new image dataset of pond cultured takifugu rubripes is constructed,, which includes 1000 images with fishes annotated manually, also a synthetic dataset including 2000 images with strong reflection is created and merged with the generated dataset for training and verifying the robustness of the proposed method. Comprehensive experiments are performed to compare the proposed method with the state-of-the-art fish detecting methods without reflection removal on the generated dataset. The results show that the fish detecting precision and recall of the proposed method are improved by 2.7% and 2.4% respectively.
Chengkai CAI Kenta IWAI Takanobu NISHIURA
The acquisition of distant sound has always been a hot research topic. Since sound is caused by vibration, one of the best methods for measuring distant sound is to use a laser Doppler vibrometer (LDV). This laser has high directivity, that enables it to acquire sound from far away, which is of great practical use for disaster relief and other situations. However, due to the vibration characteristics of the irradiated object itself and the reflectivity of its surface (or other reasons), the acquired sound is often lacking frequency components in certain frequency bands and is mixed with obvious noise. Therefore, when using LDV to acquire distant speech, if we want to recognize the actual content of the speech, it is necessary to enhance the acquired speech signal in some way. Conventional speech enhancement methods are not generally applicable due to the various types of degradation in observed speech. Moreover, while several speech enhancement methods for LDV have been proposed, they are only effective when the irradiated object is known. In this paper, we present a speech enhancement method for LDV that can deal with unknown irradiated objects. The proposed method is composed of noise reduction, pitch detection, power spectrum envelope estimation, power spectrum reconstruction, and phase estimation. Experimental results demonstrate the effectiveness of our method for enhancing the acquired speech with unknown irradiated objects.
Jiang MA Jun ZHANG Yanguo JIA Xiumin SHEN
Pseudorandom sequences with large linear complexity can resist the linear attack. The trace representation plays an important role in analysis and design of pseudorandom sequences. In this letter, we present the construction of a family of new binary sequences derived from Euler quotients modulo pq, where pq is a product of two primes and p divides q-1. Firstly, the linear complexity of the sequences are investigated. It is proved that the sequences have larger linear complexity and can resist the attack of Berlekamp-Massey algorithm. Then, we give the trace representation of the proposed sequences by determining the corresponding defining pair. Moreover, we generalize the result to the Euler quotients modulo pmqn with m≤n. Results indicate that the generalized sequences still have high linear complexity. We also give the trace representation of the generalized sequences by determining the corresponding defining pair. The result will be helpful for the implementation and the pseudorandom properties analysis of the sequences.
Shigenori KINJO Takayuki GAMOH Masaaki YAMANAKA
A new zero-forcing block diagonalization (ZF-BD) scheme that enables both a more simplified ZF-BD and further increase in sum rate of MU-MIMO channels is proposed in this paper. The proposed scheme provides the improvement in BER performance for equivalent SU-MIMO channels. The proposed scheme consists of two components. First, a permuted channel matrix (PCM), which is given by moving the submatrix related to a target user to the bottom of a downlink MIMO channel matrix, is newly defined to obtain a precoding matrix for ZF-BD. Executing QR decomposition alone for a given PCM provides null space for the target user. Second, a partial MSQRD (PMSQRD) algorithm, which adopts MSQRD only for a target user to provide improvement in bit rate and BER performance for the user, is proposed. Some numerical simulations are performed, and the results show improvement in sum rate performance of the total system. In addition, appropriate bit allocation improves the bit error rate (BER) performance in each equivalent SU-MIMO channel. A successive interference cancellation is applied to achieve further improvement in BER performance of user terminals.
He HE Shun KOJIMA Kazuki MARUTA Chang-Jun AHN
In mobile communication systems, the channel state information (CSI) is severely affected by the noise effect of the receiver. The adaptive subcarrier grouping (ASG) for sample matrix inversion (SMI) based minimum mean square error (MMSE) adaptive array has been previously proposed. Although it can reduce the additive noise effect by increasing samples to derive the array weight for co-channel interference suppression, it needs to know the signal-to-noise ratio (SNR) in advance to set the threshold for subcarrier grouping. This paper newly proposes adaptive zero padding (AZP) in the time domain to improve the weight accuracy of the SMI matrix. This method does not need to estimate the SNR in advance, and even if the threshold is always constant, it can adaptively identify the position of zero-padding to eliminate the noise interference of the received signal. Simulation results reveal that the proposed method can achieve superior bit error rate (BER) performance under various Rician K factors.
Qingjuan ZHANG Shanqi PANG Yuan LI
Variable strength orthogonal array, as a special form of variable strength covering array, plays an important role in computer software testing and cryptography. In this paper, we study the construction of variable strength orthogonal arrays with strength two containing strength greater than two by Galois field and construct some variable strength orthogonal arrays with strength l containing strength greater than l by Fan-construction.
This research develops a new automatic path following control method for a car model based on just-in-time modeling. The purpose is that a lot of basic driving data for various situations are accumulated into a database, and we realize automatic path following for unknown roads by using only data in the database. Especially, just-in-time modeling is repeatedly utilized in order to follow the desired points on the given road. From the results of a numerical simulation, it turns out that the proposed new method can make the car follow the desired points on the given road with small error, and it shows high computational efficiency.
In this letter, we consider the problem of joint selection of transmitters and receivers in a distributed multi-input multi-output radar network for localization. Different from previous works, we consider a more mathematically challenging but generalized situation that the transmitting signals are not perfectly orthogonal. Taking Cramér Rao lower bound as performance metric, we propose a scheme of joint selection of transmitters and receivers (JSTR) aiming at optimizing the localization performance under limited number of nodes. We propose a bi-convex relaxation to replace the resultant NP hard non-convex problem. Using the bi-convexity, the surrogate problem can be efficiently resolved by nonlinear alternating direction method of multipliers. Simulation results reveal that the proposed algorithm has very close performance compared with the computationally intensive but global optimal exhaustive search method.