Yi LIU Wei QIN Qibin ZHENG Gensong LI Mengmeng LI
Feature selection based on particle swarm optimization is often employed to improve the performance of artificial intelligence algorithms. However, its interpretability has lacked concrete research. Improving the stability of a feature selection method is an effective way to improve its interpretability. A novel feature selection approach named Interpretable Particle Swarm Optimization is developed in this paper. It uses four data perturbation methods and three filter feature selection methods to obtain stable feature subsets, and adopts the Fuch map to convert them into initial particles. In addition, it employs a similarity mutation strategy, which applies the Tanimoto distance to choose the nearest one-third of individuals to the previous particles for mutation. Eleven representative algorithms and four typical datasets are used to make a comprehensive comparison with the proposed approach. Accuracy, F1, precision, and recall are used as classification measures, and an extension of the Kuncheva indicator is employed as the stability measure. Experiments show that our method has better interpretability than the compared evolutionary algorithms. Furthermore, the classification results demonstrate that the proposed approach has excellent overall classification performance.
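The selection step of the similarity mutation strategy can be illustrated with a minimal sketch. It assumes particles are binary feature masks and that "the nearest 1/3" is rounded up; the function names and the rounding rule are illustrative assumptions, not taken from the paper.

```python
def tanimoto_distance(a, b):
    """Tanimoto distance between two equal-length binary feature masks."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty masks are treated as identical (assumption).
    return 0.0 if either == 0 else 1.0 - both / either


def nearest_third(reference, population):
    """Return the ceil(len/3) masks in `population` closest to `reference`."""
    k = max(1, -(-len(population) // 3))  # ceiling division (assumed rounding)
    ranked = sorted(population, key=lambda p: tanimoto_distance(reference, p))
    return ranked[:k]
```

For example, `nearest_third([1, 1, 0, 0], population)` would return the candidates whose selected features overlap most with the reference particle; how those candidates are then mutated is not specified in the abstract.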
Weina ZHOU Xinxin HUANG Xiaoyang ZENG
As a kind of marine vehicle, Unmanned Surface Vehicles (USVs) are widely used in military and civilian fields because of their low cost, good concealment, strong mobility and high speed. High-precision obstacle detection plays an important role in USV autonomous navigation because it supports subsequent path planning. In order to further improve obstacle detection performance, we propose an encoder-decoder architecture named Fusion Refinement Network (FRN). The encoder, with a deeper network structure, can extract richer visual features. In particular, a dilated convolution layer is used in the encoder to capture obstacle features over a large range in complex marine environments. The decoder performs multi-path feature fusion. Attention Refinement Modules (ARM) are added to optimize features, and a learnable fusion algorithm called Feature Fusion Module (FFM) is used to fuse visual information. Experimental validation on three different datasets of real marine images shows that FRN outperforms state-of-the-art semantic segmentation networks. The MIoU and MPA of FRN peak at 97.01% and 98.37%, respectively. Moreover, FRN maintains high accuracy with only 27.67M parameters, far fewer than the latest obstacle detection network for USVs (WaSR).
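For readers unfamiliar with the two reported metrics, the following sketch computes mean intersection over union (MIoU) and mean pixel accuracy (MPA) from a confusion matrix using their usual definitions. The row/column convention and the assumption that every class occurs in the ground truth are choices made here for illustration.

```python
def miou_and_mpa(conf):
    """Compute (MIoU, MPA) from a square confusion matrix.

    conf[i][j] = number of pixels of true class i predicted as class j
    (assumed convention). Every class must occur in the ground truth.
    """
    n = len(conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                         # pixels whose true class is c
        pred = sum(conf[r][c] for r in range(n))  # pixels predicted as c
        ious.append(tp / (gt + pred - tp))        # per-class IoU
        accs.append(tp / gt)                      # per-class pixel accuracy
    return sum(ious) / n, sum(accs) / n
```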
Hongzhe LIU Ningwei WANG Xuewei LI Cheng XU Yaze LI
In the neck part of a two-stage object detection network, feature fusion is generally carried out in either a top-down or bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model and gradient imbalance in the region of interest extraction layer due to the scale changes of objects. The deeper the network is, the more abstract the learned features are, that is to say, more semantic information can be extracted. However, less image background, spatial location, and other resolution information is retained. In contrast, the shallow part can learn little semantic information, but a lot of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. In combination with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method offers significantly improved network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.
Ya ZENG Li WAN Qiuhong LUO Mao CHEN
Traditional pipeline methods for task-oriented dialogue systems require each module to be designed individually, which is expensive. Existing memory-augmented end-to-end methods directly map inputs to outputs and achieve promising results. However, most existing end-to-end solutions store the dialogue history and knowledge base (KB) information in the same memory and represent KB information as KB triples. This complicates the memory reader's reasoning over the memory and makes it difficult for the system to retrieve the correct information to generate a response. Some methods introduce many manual annotations to strengthen reasoning. To reduce the use of manual annotations while strengthening reasoning, we propose a hierarchical memory model (HM2Seq) for task-oriented systems. HM2Seq uses a hierarchical memory to separate the dialogue history and KB information into two memories and stores the KB as KB rows; a memory row pointer combined with an entity decoder then performs hierarchical reasoning over the memory. Experimental results on two publicly available task-oriented dialogue datasets confirm our hypothesis and show that HM2Seq outperforms the baselines.
Linh T. HOANG Anh-Tuan H. BUI Chuyen T. NGUYEN Anh T. PHAM
Deployment of machine-type communications (MTCs) over the current cellular network could lead to severe overloading of the radio access network of Long Term Evolution (LTE)-based systems. This paper proposes a slotted access-based solution, called the Slotted Access For Group Paging (SAFGP), to cope with the paging-induced MTC traffic. The proposed SAFGP splits paged devices into multiple access groups, and each group is then allocated separate radio resources on the LTE's Physical Random Access Channel (PRACH) in a periodic manner during the paging interval. To support the proposed scheme, a new adaptive barring algorithm is proposed to stabilize the number of successful devices in each dedicated access slot. The objective is to let as few devices as possible transmit preambles in an access slot while ensuring that the number of preambles selected by exactly one device approximates the maximum number of uplink grants that can be allocated by the eNB for an access slot. Analysis and simulation results demonstrate that, given the same amount of time-frequency resources, the proposed method significantly improves the access success and resource utilization rates at the cost of slightly increasing the access delay compared to state-of-the-art methods.
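The barring objective can be made concrete with a small sketch. It assumes a simplified model in which each of n devices passes barring independently with probability p and then picks one of M preambles uniformly at random, so the expected number of preambles chosen by exactly one device is n·p·(1 - p/M)^(n-1). The grid search below is only an illustration of the stated objective, not the adaptive barring algorithm of the paper, and all names are hypothetical.

```python
def expected_singletons(n, m, p):
    """Expected number of preambles picked by exactly one device.

    n: devices in the access group, m: available preambles,
    p: barring factor (probability that a device transmits).
    """
    return n * p * (1.0 - p / m) ** (n - 1)


def smallest_barring_factor(n, m, grants, steps=1000):
    """Smallest p on a grid whose expected singleton count reaches `grants`.

    Falls back to the grid point with the highest expected count
    when the target is unreachable.
    """
    grid = [i / steps for i in range(1, steps + 1)]
    for p in grid:
        if expected_singletons(n, m, p) >= grants:
            return p
    return max(grid, key=lambda q: expected_singletons(n, m, q))
```

Choosing the smallest feasible p keeps the number of transmitting devices low while still filling the available uplink grants in expectation.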
Zhaoqi LI Ta LI Qingwei ZHAO Pengyuan ZHANG
Query-by-example spoken term detection (QbE-STD) is a task of using speech queries to match utterances, and the acoustic word embedding (AWE) method of generating fixed-length representations for speech segments has shown high performance and efficiency in recent work. We propose an AWE training method using a label-adversarial network to reduce the interference information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.
Taiki YAMAGIWA Yoshiki KAYANO Yoshio KAMI Fengchao XIAO
In this paper, an experimental method is proposed for extracting the primary and secondary parameters of transmission lines with frequency dispersion. So far, there is no report of such extraction methods being applied to transmission lines with frequency dispersion. This paper provides a means of experimentally evaluating transmission lines with frequency dispersion and clarifies the issues that arise when applying the proposed method. In the proposed experimental method, unnecessary components such as connectors are removed by using a simple de-embedding method. The frequency response of the primary and secondary parameters extracted by using the method successfully reproduced all dispersion characteristics of a transmission line with frequency dispersion. It is demonstrated that an accurate RLGC equivalent-circuit model is obtained experimentally, which can be used to quantitatively evaluate the frequency/time responses of shielded-FPC with frequency dispersion and to validate RLGC equivalent-circuit models extracted by using electromagnetic field analysis.
Naoki HIRAKURA Masaki AIDA Konosuke KAWASHIMA
While social media is now used by many people and plays a role in distributing information, it has recently created an unexpected problem: the actual shrinkage of information sources. This is mainly due to the ease of connecting people with similar opinions and to recommendation systems. Biased information distribution promotes polarization that divides people into multiple groups with opposing views. Also, people may receive only the seemingly positive information that they prefer, or may hold onto their opinions more strongly when they encounter opposing views. This, combined with the characteristics of social media, is accelerating the polarization of opinions and eventually social division. In this paper, we propose a model of opinion formation on social media to simulate polarization. While based on the idea that opinion neutrality is only relative, this model provides new techniques for dealing with polarization.
Chongzheng HAO Xiaoyu DANG Sai LI Chenghua WANG
This paper presents a deep neural network (DNN) based symbol detection and modulation classification detector (SDMCD) for mixed blind signal detection. Unlike conventional methods that employ symbol detection after modulation classification, the proposed SDMCD can perform symbol recovery and modulation identification simultaneously. A cumulant and moment feature vector is presented in conjunction with a low-complexity sparse autoencoder architecture to complete mixed signal detection. Numerical results show that the SDMCD scheme has remarkable symbol error rate performance and modulation classification accuracy for various modulation formats in AWGN and Rayleigh fading channels. Furthermore, the proposed detector has robust performance under the impact of frequency and phase offsets.
Phong X. NGUYEN Hung Q. CAO Khang V. T. NGUYEN Hung NGUYEN Takehisa YAIRI
In recent years, there has been an increasing trend of applying artificial intelligence in many different fields, which has a profound and direct impact on human life. Consequently, this raises the need to understand the principles by which models make predictions. Since most current high-precision models are black boxes, neither the AI scientist nor the end-user fully understands what is happening inside these models. Therefore, many algorithms have been studied to explain AI models, especially for image classification in computer vision, such as LIME, CAM, and GradCAM. However, these algorithms still have limitations, such as LIME's long execution time and CAM's explanations, which lack concreteness and clarity. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM). This method combines the advantages of the algorithms above while simultaneously overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, InceptionV3, and VGG16, on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data set. Outstanding results were achieved: the algorithm met all the requirements for a specific explanation in a remarkably short time.
This work investigates the effect of channel estimation error on the average secrecy outage capacity of dual selection in the presence of multiple eavesdroppers. The dual selection selects a transmit antenna of Alice and a Bob (i.e., user terminal) that provide the best received signal-to-noise ratio (SNR), using channel state information from every user terminal. Using Gaussian approximation, this paper obtains a tight analytical expression for the average secrecy outage capacity of the dual selection under channel estimation error and multiple eavesdroppers. Using asymptotic analysis, this work quantifies the high SNR power offset and the high SNR slope of the average secrecy outage capacity.
Chenchen MENG Jun WANG Chengzhi DENG Yuanyun WANG Shengqian WANG
Feature representation is a key component of most visual tracking algorithms. It is difficult to deal with complex appearance changes using low-level hand-crafted features because of their weak representation capacity. In this paper, we propose a novel tracking algorithm that combines joint dictionary pair learning with convolutional neural networks (CNN). We utilize a CNN model trained on ImageNet-Vid to extract target features. The CNN includes three convolutional layers and two fully connected layers. Dictionary pair learning follows the second fully connected layer. The joint dictionary pair is learned from the deep features extracted by the trained CNN model. The temporal variations of target appearances are learned in the dictionary learning. We use the learned dictionaries to encode target candidates. A linear combination of atoms in the learned dictionary is used to represent target candidates. Extensive experimental evaluations on OTB2015 demonstrate superior performance against SOTA trackers.
Yahui TANG Tong LI Rui ZHU Cong LIU Shuaipeng ZHANG
Service mining aims to use process mining for the analysis of services, making it possible to discover, analyze, and improve service processes. In the context of Web services, all kinds of events related to activities can be recorded and used to extract new information about service processes. However, the distributed nature of services tends to generate large-scale service event logs, which complicates the discovery and analysis of service processes. To solve this problem, this research focuses on existing large-scale service event logs and proposes a hybrid genetic service mining method based on a trace clustering population (HGSM). By using trace clustering, the complex service system is divided into multiple functionally independent components, thereby simplifying the mining environment. HGSM also improves the efficiency of the genetic mining algorithm by improving the initial population quality and the genetic operations, so that it better handles large service event logs. Experimental results demonstrate that, compared with existing state-of-the-art mining methods, HGSM handles large service event logs better in terms of both mining efficiency and model quality.
Recently, several researchers have proposed various methods to build intelligent stock trading and portfolio management systems using rapid advancements in artificial intelligence, including machine learning techniques. However, existing technical analysis-based stock price prediction studies primarily depend on price change or price-related moving average patterns, and information related to trading volume is only used as an auxiliary indicator. This study focuses on the effect of changes in trading volume on stock prices and proposes a novel method for short-term stock price predictions based on trading volume patterns. Two rapid volume decrease patterns are defined based on the combinations of multiple volume moving averages. Neural networks are trained by supervised learning on the dataset filtered using these patterns. Experimental results based on data from the Korea Composite Stock Price Index and the Korean Securities Dealers Automated Quotation show that the proposed prediction system can achieve a trading performance that significantly exceeds the market average.
Nowadays, rapidly increasing demand for high-performance computation is driving active research on massively parallel systems. An interconnection network in a massively parallel system interconnects a huge number of processing elements so that they can cooperate to process tasks by communicating with one another. By regarding a processing element and a link between a pair of processing elements as a node and an edge, respectively, many problems of communication and/or routing in an interconnection network are reducible to problems in graph theory. Many topologies have been proposed so far for the interconnection networks of massively parallel systems. The hypercube is a very popular topology and has many variants. The bicube is one such topology: it can interconnect the same number of nodes with the same degree as the hypercube, while its diameter is almost half that of the hypercube. In addition, the bicube keeps the node-symmetric property. Hence, we focus on the bicube and propose an algorithm that gives a minimal (shortest) path between an arbitrary pair of nodes. We give a proof of correctness of the algorithm and demonstrate its execution.
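As background for the routing problem, the sketch below shows shortest-path routing in the ordinary n-dimensional hypercube, where nodes are n-bit addresses and each hop flips one differing bit. This is the baseline topology only; it is not the bicube algorithm proposed in the paper, whose details the abstract does not give. The function name is illustrative.

```python
def hypercube_shortest_path(src, dst, n):
    """Shortest path between two nodes of an n-dimensional hypercube.

    Nodes are integers in [0, 2**n). Each hop flips one bit in which
    the current node still differs from the destination, so the path
    length equals the Hamming distance between src and dst.
    """
    path = [src]
    cur = src
    for bit in range(n):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit  # move along dimension `bit`
            path.append(cur)
    return path
```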
Lu ZHAO Bo XU Tianqing CAO Jiao DU
A unified construction for yielding optimal and balanced quaternary sequences from ideal/optimal balanced binary sequences was proposed by Zeng et al. In this paper, the linear complexities of the quaternary sequences over the finite fields 𝔽2 and 𝔽4 and the Galois ring ℤ4 are discussed. The exact values of the linear complexity of sequences obtained from the Legendre sequence pair, the twin-prime sequence pair, and Hall's sextic sequence pair are derived.
Lukas NAKAMURA Hiromitsu AWANO
We propose “Temporal Ensemble SSDLite,” a new method for video object detection that boosts accuracy while maintaining detection speed and energy consumption. Object detection for video is becoming increasingly important as a core part of applications in robotics, autonomous driving and many other promising fields. Many of these applications require high accuracy and speed to be viable, but are used in compute- and energy-restricted environments. Therefore, new methods that increase the overall performance of video object detection, i.e., accuracy and speed, have to be developed. To increase accuracy, we use ensembling, the machine learning method of combining the predictions of multiple different models. The drawback of ensembling is the increased computational cost, which is proportional to the number of models used. We overcome this drawback by deploying our ensemble temporally, meaning we run inference with only a single model at each frame, cycling through our ensemble of models frame by frame. Then, we combine the predictions for the last N frames, where N is the number of models in our ensemble, through non-maximum suppression. This is possible because close frames in a video are extremely similar due to temporal correlation. As a result, we increase accuracy through the ensemble while running only a single model at each frame, thereby keeping the detection speed. To evaluate the proposal, we measure the accuracy, detection speed and energy consumption on the Google Edge TPU, a machine learning inference accelerator, with the ImageNet VID dataset. Our results demonstrate an accuracy boost of up to 4.9% while maintaining real-time detection speed and an energy consumption of 181 mJ per image.
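The temporal ensembling loop described above can be sketched as follows. The sketch assumes each model is a callable returning a list of (box, score, label) tuples with boxes as (x1, y1, x2, y2), and it uses a plain greedy per-class non-maximum suppression with an assumed IoU threshold of 0.5. These interfaces and values are illustrative assumptions, not details from the paper.

```python
from collections import deque


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, thresh=0.5):
    """Greedy per-class NMS over (box, score, label) detections."""
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) <= thresh for k in kept):
            kept.append(det)
    return kept


def temporal_ensemble(frames, models, thresh=0.5):
    """Run one model per frame and merge the last len(models) outputs."""
    history = deque(maxlen=len(models))  # predictions of the last N frames
    for t, frame in enumerate(frames):
        model = models[t % len(models)]  # cycle through the ensemble
        history.append(model(frame))
        merged = [d for dets in history for d in dets]
        yield nms(merged, thresh)
```

Only one model runs per frame, so the per-frame inference cost stays that of a single detector; the extra work is the NMS over at most N frames of detections.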
Weichuang YU Peiyu HE Fan PAN Ao CUI Zili XU
To reduce the mutual coupling of a two-level nested array (TLNA) with an even number of sensors, we propose an improved array configuration that retains all the good properties of the prototype optimal configuration under the constraint of a fixed number of sensors N while reducing mutual coupling. Whereas the inner and outer levels of the prototype optimal TLNA (POTLNA) both have N/2 sensors, those of the improved optimal TLNA (IOTLNA) have N/2-1 and N/2+1 sensors, respectively. It is proved that the physical aperture and uniform degrees of freedom (uDOFs) of IOTLNA are the same as those of POTLNA, and that the number of sensor pairs with small separations in IOTLNA is reduced. We also construct an improved optimal second-order super nested array (SNA) by using the IOTLNA as the parent nested array, termed IOTLNA-SNA, which has the same physical aperture and the same uDOFs as the IOTLNA. Numerical simulations demonstrate the better performance of the improved array configurations.
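The claimed equalities can be checked numerically for a small case. The sketch below assumes the standard two-level nested array geometry in units of the minimum spacing (inner sensors at 1..N1, outer sensors at multiples of N1+1) and uses the number of sensor pairs at unit separation as a simple proxy for mutual coupling. Both the geometry convention and the proxy are assumptions made here, and the function names are hypothetical.

```python
def nested_array(n_inner, n_outer):
    """Sensor positions of a two-level nested array (unit = min spacing)."""
    inner = list(range(1, n_inner + 1))
    outer = [(n_inner + 1) * k for k in range(1, n_outer + 1)]
    return inner + outer


def aperture(pos):
    """Physical aperture: distance between the two outermost sensors."""
    return max(pos) - min(pos)


def udof(pos):
    """Size of the contiguous central segment of the difference coarray."""
    lags = {a - b for a in pos for b in pos}
    m = 0
    while m + 1 in lags:
        m += 1
    return 2 * m + 1


def weight(pos, lag):
    """Number of sensor pairs whose separation equals `lag`."""
    return sum(1 for a in pos for b in pos if a - b == lag)


n = 8
potlna = nested_array(n // 2, n // 2)          # N/2 and N/2 sensors
iotlna = nested_array(n // 2 - 1, n // 2 + 1)  # N/2-1 and N/2+1 sensors
```

For N = 8, both arrays span the same aperture and the same contiguous coarray, while the improved split has one fewer pair of sensors at unit spacing.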
Menglong WU Cuizhu QIN Hongxia DONG Wenkai LIU Xiaodong NIE Xichang CAI Yundong LI
In many screen-to-camera (S2C) communication systems, barcode preprocessing is an important prerequisite because barcodes may be deformed by various environmental factors. However, previous studies have focused on barcode detection under static conditions; to date, few studies have addressed dynamic conditions (for example, a barcode video stream or a moving transmitter and receiver). Therefore, we present a detection and tracking method for dynamic barcodes based on a Siamese network. The backbone of the CNN in the Siamese network is improved by SE-ResNet. The detection accuracy reaches 89.5%, which stands out among classical detection networks. The EAO reaches 0.384, which is better than previous tracking methods. The method is also superior to other methods in terms of accuracy and robustness. The SE-ResNet in this paper improves the EAO by 1.3% compared with the ResNet in SiamMask. Moreover, our method is not only applicable to static barcodes but also allows real-time tracking and segmentation of barcodes captured in dynamic situations.
Dehua LIANG Jun SHIOMI Noriyuki MIURA Masanori HASHIMOTO Hiromitsu AWANO
Reservoir computing (RC) is an attractive alternative to machine learning models owing to its computationally inexpensive training process and simplicity. In this work, we propose EnsembleBloomCA, which utilizes cellular automata (CA) and an ensemble Bloom filter to organize an RC system. In contrast to most existing RC systems, EnsembleBloomCA eliminates all floating-point calculation and integer multiplication. EnsembleBloomCA adopts CA as the reservoir in the RC system because it can be implemented using only binary operations and is thus energy efficient. The rich pattern dynamics created by CA can map the original input into a high-dimensional space and provide more features for the classifier. Utilizing an ensemble Bloom filter as the classifier, the features provided by the reservoir can be effectively memorized. Our experiment revealed that applying the ensemble mechanism to the Bloom filter resulted in a significant reduction in memory cost during the inference phase. In comparison with Bloom WiSARD, one of the state-of-the-art reference works, the EnsembleBloomCA model achieves a 43× reduction in memory cost while maintaining the same accuracy. Our hardware implementation also demonstrated that EnsembleBloomCA achieved over 23× and 8.5× reductions in area and power, respectively.
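To illustrate why a CA reservoir needs only binary operations, the sketch below evolves an elementary cellular automaton with shifts and XOR and collects the successive states as an expanded feature vector. Rule 90 and the ring boundary are chosen here purely for illustration; the abstract does not state which CA rule or boundary condition EnsembleBloomCA uses.

```python
def rule90_step(state, width):
    """One step of elementary CA rule 90 on a ring of `width` cells.

    The state is an integer whose bits are the cells. Each new cell is
    the XOR of its two neighbours, so only shifts, OR, AND and XOR are used.
    """
    mask = (1 << width) - 1
    left = ((state << 1) | (state >> (width - 1))) & mask   # rotate left by 1
    right = ((state >> 1) | (state << (width - 1))) & mask  # rotate right by 1
    return left ^ right


def expand(state, width, steps):
    """Return the initial state followed by `steps` successive CA states."""
    history = [state]
    for _ in range(steps):
        state = rule90_step(state, width)
        history.append(state)
    return history
```

Concatenating the states in `history` turns a `width`-bit input into a `width * (steps + 1)`-bit feature vector, which is the kind of high-dimensional mapping a downstream classifier could memorize.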