Dongzhen WANG Daqing HUANG Cheng XU
Reconnaissance by two cooperating unmanned aerial vehicles (UAVs) equipped with airborne visual tracking platforms is a common way to localize a target. Apart from random sensor noise, the localization performance depends heavily on the UAVs' cooperative trajectories. In our previous work, we proposed a cooperative trajectory generation method that outperforms the EKF-based method. In this letter, we propose an improved online trajectory generation method that enhances the previous one. First, the least squares estimation method is replaced with a geometric-optimization-based estimation method, which achieves better estimation performance than the least squares method used in our previous work. Second, the trajectory optimization phase also accounts for the position error caused by the estimation method, which further improves the optimization of the next waypoints of the two UAVs. The improved method is well suited to two-UAV trajectory planning for cooperative target localization, and the simulation results confirm that it achieves clearly better localization performance than both our previous method and the EKF-based method.
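As an illustration of the least squares baseline mentioned above, the following minimal sketch estimates a target position from two lines of sight. It assumes that each UAV reports its own position and a bearing vector toward the target; the function name and all numbers are hypothetical, and this is neither the authors' estimator nor their geometric-optimization method.

```python
import numpy as np

def triangulate_least_squares(positions, directions):
    """Return the point minimizing the summed squared distance to all lines of sight."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(positions, directions):
        p = np.asarray(p, dtype=float)
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)            # unit bearing vector
        proj = np.eye(3) - np.outer(d, d)    # projector orthogonal to the line of sight
        A += proj
        b += proj @ p
    # A is invertible as long as the lines of sight are not all parallel.
    return np.linalg.solve(A, b)

# Hypothetical noise-free geometry: two UAVs looking at one ground target.
target = np.array([10.0, 20.0, 0.0])
uavs = [np.array([0.0, 0.0, 100.0]), np.array([50.0, 0.0, 120.0])]
bearings = [target - u for u in uavs]
estimate = triangulate_least_squares(uavs, bearings)
```

With noisy bearings the two lines no longer intersect, and the estimate is the point closest to both lines in the least squares sense.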
Miho SHINOHARA Yukina TAMURA Shinya MOCHIDUKI Hiroaki KUDO Mitsuho YAMADA
Using vergence eye movements, we investigated the function of the lateral geniculate nucleus in the avoidance behavior caused by inconsistency between binocular retinal images due to blue, which occurs when watching the rim of a blue-yellow equiluminance column.
Zhaoyang HOU Zheng XIANG Peng REN Qiang HE Ling ZHENG
In this paper, the distributed cooperative communication of unmanned aerial vehicles (UAVs) is studied, where the condition number (CN) and the inner product (InP) are used to measure the quality of communication links. By optimizing the relative positions of the UAVs, large channel capacity and stable communication links can be obtained. Using the spherical wave model under the line-of-sight (LOS) channel, the CN expression of the channel matrix is derived for a system with Nt transmitters and two receivers. To maximize channel capacity, we derive the UAVs position constraint equation (UAVs-PCE) and analyze the constraint between the BS element spacing and the carrier wavelength. The result shows that there is an area where the CN remains very large no matter how the UAVs' positions are adjusted. Then a special scenario is considered in which the UAVs form a rectangular lattice array, and the optimal constraint between the communication distance and the UAV spacing is derived. After that, we derive the InP of the channel matrix and the gradient of the InP with respect to the UAVs' positions. The particle swarm optimization (PSO) algorithm is used to minimize the CN, and the gradient descent (GD) algorithm is used to minimize the InP, by iteratively optimizing the UAVs' positions. Both algorithms show great potential for optimizing the CN and the InP, respectively. Furthermore, a hybrid algorithm named PSO-GD, which combines the advantages of the two algorithms, is proposed to maximize the communication capacity with lower complexity. Simulations show that PSO-GD is more efficient than PSO and GD alone: PSO helps GD escape local extrema and provides better starting positions, and GD then converges quickly to an optimal solution by using gradient information from those positions. Simulations also reveal that a better channel can be obtained when the parameters satisfy the UAVs-PCE, and the theoretical analysis explains the abnormal phenomena observed in the simulations.
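The two link-quality measures named in this abstract can be sketched numerically. The example below builds a spherical-wave LOS channel matrix for two receivers and computes its condition number and the normalized inner product of its two rows. The element spacing, wavelength, UAV positions, and function names are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def los_channel(tx_pos, rx_pos, wavelength):
    """Spherical-wave LOS channel: one row per receiver, one column per transmitter."""
    tx = np.asarray(tx_pos, dtype=float)
    rx = np.asarray(rx_pos, dtype=float)
    dist = np.linalg.norm(rx[:, None, :] - tx[None, :, :], axis=2)
    return np.exp(-2j * np.pi * dist / wavelength) / dist

def condition_number(H):
    """Ratio of the largest to the smallest singular value of H."""
    return np.linalg.cond(H)

def normalized_inner_product(H):
    """Magnitude of the inner product of the two receiver rows, scaled to [0, 1]."""
    h1, h2 = H[0], H[1]
    return abs(np.vdot(h1, h2)) / (np.linalg.norm(h1) * np.linalg.norm(h2))

wavelength = 0.1                                          # metres (assumed)
tx = [(0.5 * k, 0.0, 0.0) for k in range(4)]              # 4-element linear array (assumed)
H_apart = los_channel(tx, [(0.0, 0.0, 100.0), (30.0, 0.0, 100.0)], wavelength)
H_same = los_channel(tx, [(0.0, 0.0, 100.0), (0.0, 0.0, 100.0)], wavelength)

cn_apart, inp_apart = condition_number(H_apart), normalized_inner_product(H_apart)
cn_same, inp_same = condition_number(H_same), normalized_inner_product(H_same)
```

Two co-located receivers see identical channel rows, so the inner product reaches 1 and the condition number blows up. This is the degenerate case that position optimization tries to move away from.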
Lei YANG Tingxiao YANG Hiroki KIMURA Yuichiro YOSHIMURA Kumiko ARAI Taka-aki NAKADA Huiqin JIANG Toshiya NAKAGUCHI
In the medical field, detecting traumatic bleeding has always been a difficult task because the targets are small and low in contrast and the number of images is large. In this work, we propose an automatic approach for detecting traumatic bleeding in contrast-enhanced CT images using deep CNNs, consisting of a segmentation process and a classification process. First, the CT values of the DICOM images are extracted and processed with three different window settings. Small 3D patches are cropped from the processed images and segmented by a 3D CNN. The segmentation results are then converted to point cloud format and classified by a classifier. The proposed pre-processing approach enables the segmentation network to detect small, low-contrast targets and achieve high sensitivity. The additional classification network addresses the boundary problem and the short-sightedness problem that arise during segmentation, further reducing false positives. The proposed approach is tested on 3 CT cases containing 37 bleeding regions. A total of 34 bleeding regions are correctly detected, giving a sensitivity of 91.89%. The average number of false positives per test case is 1678, and the classification step removes 46.1% of the false positive predictions. The proposed method is shown to achieve high sensitivity and can serve as a reference for medical doctors.
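The window pre-processing and the reported sensitivity can be illustrated with a short sketch. The three (center, width) window settings below are hypothetical placeholders, since the abstract does not state the values used; only the counts 34 and 37 come from the text.

```python
import numpy as np

def apply_window(hu, center, width):
    """Clip CT values (Hounsfield units) to a window and rescale them to [0, 1]."""
    lo = center - width / 2.0
    hi = center + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

# Hypothetical window settings (center, width); not taken from the paper.
WINDOWS = [(40, 400), (50, 150), (300, 1500)]

def three_channel(hu):
    """Stack the three windowed versions of a CT volume as channels."""
    return np.stack([apply_window(hu, c, w) for c, w in WINDOWS], axis=-1)

def sensitivity(true_positives, false_negatives):
    """Fraction of actual bleeding regions that were detected."""
    return true_positives / (true_positives + false_negatives)

# 34 of the 37 bleeding regions were detected, so 3 were missed.
reported = round(100 * sensitivity(34, 37 - 34), 2)
```

The computed value matches the 91.89% sensitivity reported in the abstract.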
The electrical resistance-length characteristic of an electroactive supercoiled polymer artificial muscle is found to depend strongly on temperature. This may result from the thermal expansion of the coils in the artificial muscle, which increases the contact area between neighboring coils and lowers the electrical resistance at higher temperatures. On the other hand, the electrical resistance-length characteristic collected during electrical driving deviates considerably from those collected at constant temperatures. Inhomogeneous heating during electrical driving appears to be a key cause of the deviation.
Ji HU Chenggang YAN Jiyong ZHANG Dongliang PENG Chengwei REN Shengying YANG
Online learning is a method that updates a model gradually, modifying and strengthening the previous model so that the updated model can adapt to new data without relearning all the data. However, the accuracy of current online multiclass learning algorithms still has room for improvement, and their ability to produce sparse models is often weak. In this paper, we propose a new Multiclass Truncated Gradient Confidence-Weighted online learning algorithm (MTGCW), which combines the Truncated Gradient algorithm and the Confidence-Weighted (CW) algorithm to achieve higher learning performance. The experimental results demonstrate that the accuracy of the MTGCW algorithm is always better than that of the original CW algorithm and other baseline methods. Based on these results, we applied our algorithm to phishing website recognition and image classification and obtained unexpectedly encouraging experimental results. We therefore have reason to believe that our classification algorithm is good at handling unstructured data, which can promote the cognitive ability of computers to a certain extent.
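To show how truncation yields sparse models, here is a minimal sketch of the standard truncation operator from the Truncated Gradient algorithm. Weights within a band of width `theta` around zero are shrunk toward zero by `alpha`, and larger weights are left untouched. How MTGCW combines this step with the confidence-weighted update is not described in the abstract and is not shown here; the parameter values are illustrative.

```python
import numpy as np

def truncate(weights, alpha, theta):
    """Shrink small weights toward zero by alpha; leave weights beyond theta unchanged."""
    w = np.asarray(weights, dtype=float)
    out = w.copy()
    small_pos = (w >= 0) & (w <= theta)
    small_neg = (w < 0) & (w >= -theta)
    out[small_pos] = np.maximum(0.0, w[small_pos] - alpha)
    out[small_neg] = np.minimum(0.0, w[small_neg] + alpha)
    return out

w = np.array([0.05, -0.05, 0.5, -2.0, 0.3])
w_sparse = truncate(w, alpha=0.1, theta=1.0)
```

In this example the two smallest weights become exactly zero, which is how the operator encourages sparsity.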
Shakhnaz AKHMEDOVA Vladimir STANOVOV Sophia VISHNEVSKAYA Chiori MIYAJIMA Yukihiro KAMIYA
This study focuses on the automated detection of a complex system operator's condition. As an example, a person's reaction while listening to music (or not listening at all) was determined. For this purpose, various well-known data mining tools as well as ones developed by the authors were used. More specifically, the following techniques were developed and applied to these problems: artificial neural networks and fuzzy rule-based classifiers. The neural networks were generated by two modifications of the Differential Evolution algorithm based on the NSGA and MOEA/D schemes, which were proposed for solving multi-objective optimization problems. The fuzzy logic systems were generated by the population-based algorithm called Co-Operation of Biology Related Algorithms (COBRA). First, however, each person's state was monitored: the databases for the problems described in this study were obtained using non-contact Doppler sensors. Experimental results demonstrated that automatically generated neural networks and fuzzy rule-based classifiers can properly determine the human condition and reaction. In addition, the proposed approaches outperformed alternative data mining tools. It was also established that fuzzy rule-based classifiers are more accurate and interpretable than neural networks. Thus, they can be used for solving more complex problems related to the automated detection of an operator's condition.
Byeonghak KIM Murray LOEW David K. HAN Hanseok KO
To date, many studies have employed clustering for the classification of unlabeled data. Deep separate clustering applies several deep learning models to conventional clustering algorithms to more clearly separate the distribution of the clusters. In this paper, we employ a convolutional autoencoder to learn the features of input images. Following this, k-means clustering is conducted using the encoded layer features learned by the convolutional autoencoder. A center loss function is then added to aggregate the data points into clusters to increase the intra-cluster homogeneity. Finally, we calculate and increase the inter-cluster separability. We combine all loss functions into a single global objective function. Our new deep clustering method surpasses the performance of existing clustering approaches when compared in experiments under the same conditions.
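The clustering terms described above can be sketched on already-encoded features. The example assigns each feature vector to its nearest center (the k-means assignment step), evaluates a center loss that measures intra-cluster compactness, and uses the smallest distance between centers as one possible separability measure. The separability definition, the function names, and the toy data are assumptions; the abstract does not specify the exact loss terms.

```python
import numpy as np

def assign_clusters(features, centers):
    """k-means assignment step: index of the nearest center for every feature vector."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def center_loss(features, centers, labels):
    """Half the mean squared distance between each feature and its cluster center."""
    diff = features - centers[labels]
    return 0.5 * np.mean((diff ** 2).sum(axis=1))

def min_center_distance(centers):
    """Smallest pairwise distance between centers (assumed separability measure)."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    upper = np.triu_indices(len(centers), k=1)
    return d[upper].min()

# Toy encoded features forming two groups.
Z = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
C = np.array([[0.0, 1.0], [10.0, 1.0]])
labels = assign_clusters(Z, C)
loss = center_loss(Z, C, labels)
separation = min_center_distance(C)
```

A combined objective would weight a reconstruction loss, the center loss, and a separability term; the weights are design choices that the abstract does not give.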
Yuchao SUN Qiao PENG Dengyin ZHANG
With the development of the Internet of Vehicles, license plate detection technology is widely used in applications such as smart cities and edge sensor monitoring. However, traditional license plate detection methods are based on license plate edge detection and are suitable only for limited situations, such as good lighting and a favorable camera angle. Deep learning networks represented by YOLOv3 can solve the problem of relying on such strict conditions. Although YOLOv3 detects large targets well, it performs poorly on small targets and lacks real-time interactivity. Motivated by this, we present a faster, lightweight YOLOv3 model for multi-vehicle and under-illuminated image scenarios. More generally, our model can serve as a guideline for optimizing neural networks in multi-vehicle scenarios.
Chikako TAKASAKI Atsuko TAKEFUSA Hidemoto NAKADA Masato OGUCHI
With the development of cameras and sensors and the spread of cloud computing, life logs can be easily acquired and stored in general households for the various services that utilize the logs. However, it is difficult to analyze moving images that are acquired by home sensors in real time using machine learning because the data size is too large and the computational complexity is too high. Moreover, collecting and accumulating in the cloud moving images that are captured at home and can be used to identify individuals may invade the privacy of application users. We propose a method of distributed processing over the edge and cloud that addresses the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from moving images using OpenPose, which is a pose estimation library. On the cloud side, we recognize actions by machine learning using only the feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and the cloud to investigate the feasibility of recognizing actions in real time. Then, we evaluate the proposed system by comparing it with the 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is the highest when using LSTM and that the introduction of dropout in action recognition using 100 categories alleviates overfitting because the models can learn more generic human actions by increasing the variety of actions. In addition, it is demonstrated that preprocessing using OpenPose on the sensor side can substantially reduce the transfer quantity from the sensor to the cloud.
Taku SUZUKI Mikihito SUZUKI Kenichi HIGUCHI
This paper proposes a parallel peak cancellation (PC) process for the computational complexity-efficient algorithm called PC with a channel-null constraint (PCCNC) in the adaptive peak-to-average power ratio (PAPR) reduction method using the null space in a multiple-input multiple-output (MIMO) channel for MIMO-orthogonal frequency division multiplexing (OFDM) signals. By simultaneously adding multiple PC signals to the time-domain transmission signal vector, the required number of iterations of the iterative algorithm is effectively reduced along with the PAPR. We implement a constraint in which the PC signal is transmitted only to the null space in the MIMO channel by beamforming (BF). By doing so the data streams do not experience interference from the PC signal on the receiver side. Since the fast Fourier transform (FFT) and inverse FFT (IFFT) operations at each iteration are not required unlike the previous algorithm and thanks to the newly introduced parallel processing approach, the enhanced PCCNC algorithm reduces the required total computational complexity and number of iterations compared to the previous algorithms while achieving the same throughput-vs.-PAPR performance.
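Two ingredients of this abstract can be checked numerically: the PAPR of a time-domain signal, and the fact that a signal projected onto the null space of the MIMO channel does not reach the receiver. The sketch below assumes a random 2x4 channel and uses a pseudoinverse-based projector. It illustrates the constraint only and is not the PCCNC algorithm.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a signal in dB."""
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def null_space_projector(H):
    """Projector onto the null space of H (rows: receive antennas, columns: transmit antennas)."""
    n_tx = H.shape[1]
    return np.eye(n_tx) - np.linalg.pinv(H) @ H

rng = np.random.default_rng(0)
H = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))   # assumed channel
pc_signal = rng.standard_normal(4) + 1j * rng.standard_normal(4)     # arbitrary PC vector
projected = null_space_projector(H) @ pc_signal
leakage = np.linalg.norm(H @ projected)   # numerically zero

flat_papr = papr_db(np.ones(8))                           # constant envelope
impulse_papr = papr_db(np.array([1.0, 0.0, 0.0, 0.0]))    # single peak
```

Because the channel annihilates the projected vector, adding it to the transmit signal can lower peaks without interfering with the data streams at the receiver.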
Zhi LIU Yifan SU Shuzhong YANG Mengmeng ZHANG
Cross-component linear model (CCLM) chroma prediction is a new technique introduced in Versatile Video Coding (VVC). It uses the reconstructed luma component to predict the chroma components and can improve coding performance, but it also increases coding complexity. This paper studies how to accelerate the chroma intra-prediction process based on texture characteristics. First, two observations are drawn from experimental statistics of the process. One is that the choice of chroma intra-prediction candidate modes is closely related to the texture complexity of the coding unit (CU); the other is that whether the direct mode (DM) is selected is closely related to the texture similarity between the current chroma CU and the corresponding luma CU. Second, a fast chroma intra-prediction mode decision algorithm is proposed based on these observations. A modified metric named sum modulus difference (SMD) is introduced to measure the texture complexity of the CU and to guide the filtering of irrelevant candidate modes. Meanwhile, the structural similarity index measurement (SSIM) is adopted to help decide whether the DM is selected. The experimental results show that, compared with the reference model VTM8.0, the proposed algorithm reduces the coding time by 12.92% on average and increases the BD-rate of the Y, U, and V components by only 0.05%, 0.32%, and 0.29%, respectively.
Shogo NAKAMURA Sho IWAZAKI Koichi ICHIGE
This paper presents a method to optimize 2-D sparse array configurations along with a technique to interpolate holes to accurately estimate the direction of arrival (DOA). Conventional 2-D sparse arrays are often defined using a closed-form representation and have the property that they can create hole-free difference co-arrays that can estimate DOAs of incident signals that outnumber the physical elements. However, this property restricts the array configuration to a limited structure and results in a significant mutual coupling effect between consecutive sensors. In this paper, we introduce an optimization-based method for designing 2-D sparse arrays that enhances flexibility of array configuration as well as DOA estimation accuracy. We also propose a method to interpolate holes in 2-D co-arrays by nuclear norm minimization (NNM) that permits holes and to extend array aperture to further enhance DOA estimation accuracy. The performance of the proposed optimum arrays is evaluated through numerical examples.
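The notions of a difference co-array and its holes can be made concrete with a small sketch. It assumes sensors on an integer 2-D grid and defines holes as lattice points inside the bounding box of the co-array that no sensor pair produces. The three-sensor layout is a toy example, not one of the optimized arrays from the paper.

```python
import numpy as np

def difference_coarray(sensors):
    """Set of all pairwise position differences p_i - p_j on the integer grid."""
    s = np.asarray(sensors, dtype=int)
    diffs = (s[:, None, :] - s[None, :, :]).reshape(-1, 2)
    return {(int(dx), int(dy)) for dx, dy in diffs}

def coarray_holes(sensors):
    """Lattice points inside the co-array's bounding box that are missing from it."""
    coarray = difference_coarray(sensors)
    xs = [d[0] for d in coarray]
    ys = [d[1] for d in coarray]
    grid = {(x, y)
            for x in range(min(xs), max(xs) + 1)
            for y in range(min(ys), max(ys) + 1)}
    return sorted(grid - coarray)

sensors = [(0, 0), (1, 0), (0, 1)]
coarray = difference_coarray(sensors)
holes = coarray_holes(sensors)
```

Here three physical sensors generate seven distinct lags, and the two missing corner lags are the holes that an interpolation step would need to fill.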
Nida RASHEED Waqar S. QURESHI Shoab A. KHAN Manshoor A. NAQVI Eisa ALANAZI
Surveillance through aerial systems has been in place for years. Such systems are expensive, and a large fleet is in operation around the world without upgrades. These systems have multiple low-resolution analog cameras on board, with Digital Video Recorders (DVRs) at the control station. The generated digital videos contain multiple scenes from multiple feeds embedded in a single video stream and lack video stabilization. Replacing the on-board analog cameras with the latest digital counterparts requires huge investment. These videos require stabilization and other automated video analysis preprocessing steps before they are passed to the mosaicing algorithm. Available mosaicing software is not tailored to segregate feeds from different cameras and scenes, automate image enhancement, and stabilize video before mosaicing (image stitching). We present "AirMatch", a new automated system that first separates camera feeds and scenes, then stabilizes and enhances the video feed of each camera, generates a mosaic of each scene of every feed, and produces a high-quality mosaic by stitching the mosaics of all feeds. In our proposed solution, state-of-the-art video analytics techniques are tailored to work on videos from vintage cameras in aerial applications. Our new framework does not depend on specialized hardware and generates effective mosaics. An affine motion transform with a smoothing Gaussian filter is selected for video stabilization. A histogram-based method is used for scene change detection and image contrast enhancement. Oriented FAST and rotated BRIEF (ORB) is selected for feature detection and description in video stitching. Several experiments on a number of video streams are performed, and the analysis shows that, compared with other commercially available mosaicing software, our system can efficiently generate mosaics of videos with high distortion and artifacts.
Lin YAN Mingyong ZENG Shuai REN Zhangkai LUO
Encrypted traffic identification aims to predict the traffic type of encrypted traffic. A deep residual convolutional network is proposed for this task. The Softmax classifier is fused with its angular variant, which sets an angular margin to achieve better discrimination. The proposed method improves representation learning and achieves excellent results on a public dataset.
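One way to picture a Softmax classifier fused with an angular variant is sketched below. It assumes an additive angular margin on the target class and a simple weighted sum of the two cross-entropy losses. The abstract does not say which angular variant or fusion rule is used, so the margin form, `scale`, `margin`, and `lam` are all assumptions.

```python
import numpy as np

def cosine_logits(features, weights):
    """Cosine of the angle between each feature vector and each class weight vector."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    return np.clip(f @ w, -1.0, 1.0)

def angular_margin_logits(features, weights, labels, scale=30.0, margin=0.5):
    """Add an angular margin to the target-class angle, then rescale the cosines."""
    cos = cosine_logits(features, weights)
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])
    out = cos.copy()
    out[rows, labels] = np.cos(np.minimum(theta + margin, np.pi))
    return scale * out

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

def fused_loss(features, weights, labels, lam=1.0):
    """Plain softmax loss plus a weighted angular-margin loss (assumed fusion rule)."""
    plain = cross_entropy(features @ weights, labels)
    angular = cross_entropy(angular_margin_logits(features, weights, labels), labels)
    return plain + lam * angular

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))      # 5 samples, 8-dimensional features
W = rng.standard_normal((8, 3))      # 3 traffic classes
y = np.array([0, 1, 2, 1, 0])
loss = fused_loss(x, W, y)
```

The margin lowers the target-class logit, so a sample must sit closer in angle to its class weight to reach the same confidence. That is the source of the extra discrimination.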
Yih-Cherng LEE Hung-Wei HSU Jian-Jiun DING Wen HOU Lien-Shiang CHOU Ronald Y. CHANG
Automatic tracking and classification are essential for studying the behavior of wild animals. Dynamic long-range photography, occlusion, protective coloration, and background noise all interfere irregularly with the design of a computerized algorithm that reduces human labeling effort. Moreover, wild dolphin images are hard to acquire through on-the-spot investigation, which requires long waiting times, and it is difficult to set up a fixed camera that automatically monitors dolphins on the ocean over several days. Detecting and classifying a dolphin in noisy photos with a single well-known deep learning method and a small dataset is therefore challenging. In this study, we propose a generic Cascade Small Object Detection (CSOD) algorithm for dolphin detection to handle small-object problems, and we develop visualization to backbone based classification (V2BC) to remove noise, highlight dolphin features, and classify the name of each dolphin. The architecture of CSOD consists of the P-net and the F-net. The P-net uses a crude YOLOv3 detector as its core network to predict all the regions of interest (ROIs) in lower-resolution images. Then the F-net, which is more robust, is applied to capture the ROIs from high-resolution photos, solving the problems of a single detector. The V2BC method focuses on extracting significant regions of occluded dolphins and applies post-processing that references the backbone of the dolphins to facilitate classification. The proposed algorithm is compared with state-of-the-art methods, including Faster R-CNN and YOLOv3 for detection and AlexNet, VGG, and ResNet for classification. All experiments show that the proposed algorithm based on CSOD and V2BC performs excellently in dolphin detection and classification. Compared with the related classification methods, the accuracy of the proposed design is over 14% higher. Moreover, the proposed CSOD detection system achieves 42% higher performance than the original YOLOv3 architecture.
Hiryu KAMOSHITA Daichi KITAHARA Ken'ichi FUJIMOTO Laurent CONDAT Akira HIRABAYASHI
This paper proposes a method for reconstructing high-quality computed tomography (CT) images from low-dose X-ray projection data. A state-of-the-art method, proposed by Xu et al., exploits dictionary learning for image patches. This method generates an overcomplete dictionary from patches of standard-dose CT images and reconstructs low-dose CT images by minimizing the sum of a data fidelity term and a regularization term based on sparse representations with the dictionary. However, this method does not take the characteristics of each patch, such as textures or edges, into account. In this paper, we propose to classify all patches into several classes and to use an individual dictionary with an individual regularization parameter for each class. Furthermore, for fast computation, we impose orthogonality on the column vectors of each dictionary. Since similar patches are collected in the same class, the orthogonality constraint causes hardly any loss of accuracy. Our simulations show that the proposed method outperforms the state-of-the-art method in terms of both accuracy and speed.
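A brief sketch shows why orthogonal dictionary columns speed up computation. When the columns of a dictionary D are orthonormal, the l1-regularized sparse coding problem has a closed-form solution by soft thresholding, so no iterative solver is needed. The l1 penalty, the random dictionary, and the value of `lam` are assumptions for illustration; the abstract does not specify the sparsity penalty used.

```python
import numpy as np

def soft_threshold(c, lam):
    """Shrink each coefficient toward zero by lam and zero out the small ones."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def sparse_code_orthonormal(D, y, lam):
    """Minimizer of 0.5*||y - D a||^2 + lam*||a||_1, valid when D.T @ D is the identity."""
    return soft_threshold(D.T @ y, lam)

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthonormal dictionary
patch = rng.standard_normal(8)                     # stand-in for a vectorized image patch
code = sparse_code_orthonormal(D, patch, lam=0.5)
reconstruction = D @ code
```

Each patch costs one matrix-vector product and an elementwise threshold. With an overcomplete dictionary, the same problem generally requires an iterative pursuit or optimization routine.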
Ryousei TAKANO Kuniyasu SUZAKI
A conventional data center that consists of monolithic servers faces limitations including a lack of operational flexibility, low resource utilization, and low maintainability. Resource disaggregation is a promising solution to these issues. We propose a disaggregated cloud data center architecture called Flow-in-Cloud (FiC) that enables an existing cluster computer system to expand an accelerator pool through a high-speed network. FlowOS-RM manages the entire pool of resources and deploys a user job on a slice that is dynamically constructed according to a user request. This slice consists of compute nodes and accelerators, where each accelerator is attached to the corresponding compute node. This paper demonstrates the feasibility of FiC in a proof-of-concept experiment that runs a distributed deep learning application on the prototype system. The results confirm the applicability of the proposed system.
Spectral graph theory provides an algebraic approach to investigating the characteristics of weighted networks using the eigenvalues and eigenvectors of a matrix (e.g., the normalized Laplacian matrix) that represents the structure of the network. However, it is difficult to accurately represent the structures of large-scale and complex networks (e.g., social networks) as a matrix. This difficulty can be avoided if there is a universality such that the eigenvalues are independent of the detailed structure in large-scale and complex networks. In this paper, we clarify Wigner's Semicircle Law for weighted networks as such a universality. The law indicates that the eigenvalues of the normalized Laplacian matrix of weighted networks can be calculated from a few network statistics (the average degree, the average link weight, and the mean squared link weight) when the weighted networks satisfy a sufficient condition on the node degrees and the link weights.
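The quantities involved can be computed directly for a small synthetic network. The sketch below builds a random weighted graph, forms its normalized Laplacian, and returns the eigenvalues together with the three statistics named in the abstract. The random graph model, its parameters, and the function names are assumptions, and the closed-form semicircle expression is deliberately omitted because the abstract does not give it.

```python
import numpy as np

def random_weighted_graph(n, p, rng):
    """Symmetric weighted adjacency matrix: each link exists with probability p (assumed model)."""
    mask = np.triu(rng.random((n, n)) < p, k=1)
    weights = np.triu(rng.uniform(0.5, 1.5, size=(n, n)), k=1) * mask
    return weights + weights.T

def normalized_laplacian_eigenvalues(A):
    """Eigenvalues of I - D^{-1/2} A D^{-1/2}; assumes every node has positive weighted degree."""
    degree = A.sum(axis=1)
    scale = 1.0 / np.sqrt(degree)
    N = np.eye(len(A)) - scale[:, None] * A * scale[None, :]
    return np.linalg.eigvalsh(N)

def network_statistics(A):
    """Average degree, average link weight, and mean squared link weight."""
    link_weights = A[np.triu_indices(len(A), k=1)]
    link_weights = link_weights[link_weights > 0]
    average_degree = (A > 0).sum(axis=1).mean()
    return average_degree, link_weights.mean(), (link_weights ** 2).mean()

rng = np.random.default_rng(0)
A = random_weighted_graph(200, 0.2, rng)
eigenvalues = normalized_laplacian_eigenvalues(A)
avg_degree, avg_weight, avg_sq_weight = network_statistics(A)
```

Apart from the single eigenvalue at zero, the eigenvalues of such a dense random network cluster around 1. A histogram of them is the empirical distribution that a semicircle prediction built from the three statistics would be compared against.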
The pervasive application of Small Private Online Courses (SPOCs) provides a powerful impetus for the reform of higher education. During the teaching process, a teacher needs to understand in real time how difficult the SPOC videos are for students, in order to focus on the difficulties and key points of the course in a flipped classroom. However, existing educational data mining techniques pay little attention to clustering or classifying SPOC videos by difficulty. In this paper, we propose an approach that clusters SPOC videos by difficulty using video-watching data from a SPOC. Specifically, a bipartite graph that expresses the learning relationship between students and videos is constructed based on the number of times each video is watched. Then, the SimRank++ algorithm is used to measure the similarity in difficulty between any two videos. Finally, the spectral clustering algorithm is used to cluster the videos based on the obtained similarity in difficulty. Experiments on a real data set from a SPOC show that the proposed approach has better clustering accuracy than existing ones. This approach helps teachers learn about the overall difficulty of a SPOC video for students in real time, so that knowledge points can be explained more effectively in a flipped classroom.
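The similarity step can be sketched with plain SimRank on a student-video bipartite graph. This is a simplification: SimRank++ additionally uses edge weights (here, watch counts) and an evidence factor, neither of which is modeled below. The binary watch matrix, the decay value, and the function name are assumptions.

```python
import numpy as np

def bipartite_simrank(B, decay=0.8, iterations=10):
    """Plain SimRank between videos; B[i, x] = 1 if student i watched video x.

    Assumes every student watched at least one video and every video was watched
    at least once, so that the row and column sums are nonzero.
    """
    B = np.asarray(B, dtype=float)
    n_students, n_videos = B.shape
    per_student = B / B.sum(axis=1, keepdims=True)   # averages over a student's videos
    per_video = B / B.sum(axis=0, keepdims=True)     # averages over a video's students
    sim_students = np.eye(n_students)
    sim_videos = np.eye(n_videos)
    for _ in range(iterations):
        new_students = decay * per_student @ sim_videos @ per_student.T
        new_videos = decay * per_video.T @ sim_students @ per_video
        np.fill_diagonal(new_students, 1.0)   # every node is fully similar to itself
        np.fill_diagonal(new_videos, 1.0)
        sim_students, sim_videos = new_students, new_videos
    return sim_videos

# Toy data: students 0-1 watched videos 0-1, students 2-3 watched videos 2-3.
watched = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]])
video_similarity = bipartite_simrank(watched)
```

The resulting matrix can serve as a precomputed affinity for a spectral clustering routine, which would group videos 0 and 1 apart from videos 2 and 3 in this toy case.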