Xina CHENG Yiming ZHAO Takeshi IKENAGA
Real-time 3D player tracking plays an important role in sports analysis, especially for live sports broadcasting services, which impose a strict limit on processing time. For these applications, the 3D trajectories of players contribute to high-level game analysis such as tactical analysis and to commercial applications such as TV content. Real-time implementation of 3D player tracking is therefore expected. To achieve real-time, high-accuracy tracking on 60fps videos, the processing time must be less than 16.67ms per frame. The factors that limit the processing time of the target algorithm include: 1) the large image area of each player; 2) repeated processing of multiple players in multiple views; 3) the complex calculation of the observation algorithm. To deal with these challenges, this paper proposes a real-time implementation of multi-view volleyball player tracking on a GPU device, based on representative spatial selection and temporal combination. First, representative spatial pixel selection, which detects the pixels that best represent an image region in order to scale the image down spatially, reduces the number of pixels to process. Second, representative temporal likelihood combination shares the observation calculation by exploiting the temporal correlation between images, so that the number of complex calculations is reduced. The experiments are based on videos of the Final and Semi-Final Games of the 2014 Japan Inter High School Games of Men's Volleyball in Tokyo Metropolitan Gymnasium. On a GeForce GTX 1080Ti GPU, the tracking system achieves real-time performance on 60fps videos while keeping the tracking accuracy above 97%.
Saya OHIRA Naoki TSUCHIYA Tetsuya MATSUMURA
We propose a three-dimensional (3D) sound processor architecture for consumer applications that includes super-directional modulation intellectual property (IP) and 3D sound processing IP. In addition, we propose an automatic design environment for the 3D sound processing IP. The processor can generate realistic small sound fields in arbitrary spaces using ultrasound. In the 3D sound processing IP in particular, reproducing 3D audio requires reproducing the personal frequency characteristics of complex head-related transfer functions. For this reason, we have constructed an automatic design environment with high reconfigurability. This environment is based on high-level synthesis: given a parameter description file for filter design as input, it automatically generates a C-based algorithm simulator and synthesizes the IP hardware. It can reduce the design period to approximately 1/5 of that of conventional manual design. A 3D sound processing IP was designed experimentally using the automatic design environment. The designed IP is well suited to consumer applications in terms of hardware size and power consumption.
Xiao-Yi ZHAO Chao-Yi DONG Peng ZHOU Mei-Jia ZHU Jing-Wen REN Xiao-Yan CHEN
This paper employs Alexnet, a deep learning framework, to automatically diagnose damage on the blade surfaces of wind power generators. The original images of the blade surfaces were captured by the machine vision system of a 4-rotor UAV (unmanned aerial vehicle). First, an 8-layer Alexnet comprising 21 functional sub-layers in total was constructed and parameterized. Second, the Alexnet was trained with 10000 images and then tested in 6 rounds with 350 images. Finally, the test statistics show that the average accuracy of damage diagnosis by Alexnet is about 99.001%. We also trained and tested a traditional BP (Back Propagation) neural network, which has a 20-neuron input layer, a 5-neuron hidden layer, and a 1-neuron output layer, with the same image data. The average accuracy of damage diagnosis of the BP neural network is 19.424% lower than that of Alexnet. These results show that it is feasible to combine UAV image acquisition with a deep learning classifier to automatically diagnose damage on in-service wind turbine blades.
Neunghoe KIM Jongwook JEONG Mansoo HWANG
Free/libre open source software (FLOSS) is being rapidly adopted by several companies and organizations because it can be modified and used for free. Beyond these originally intended benefits, the use of FLOSS could also contribute to the competence of its users. In this study, we analyzed the effect of using FLOSS on related competences. Through an empirical study, we investigated how the competences of project participants changed before and after the use of FLOSS. The results confirmed that the competences of the participants improved after utilizing FLOSS.
File systems based on persistent memory deploy Copy-on-Write (COW) or logging to guarantee data consistency. However, COW has a write amplification problem and logging has a double write problem. Both COW and logging increase write traffic on persistent memory. In this work, we present adaptive differential logging and zero-copy logging for persistent memory. Adaptive differential logging applies COW or logging selectively to each block. If the updated size of a block is smaller than or equal to half of the block size, we apply logging to the block. If the updated size of a block is larger than half of the block size, we apply COW to the block. Zero-copy logging treats a user buffer on persistent memory as a redo log and therefore does not incur any additional data copy. We implement adaptive differential logging and zero-copy logging on both the NOVA and PMFS file systems. Our measurements on real workloads show that adaptive differential logging and zero-copy logging achieve performance improvements of 150.6% and 149.2% over COW, respectively.
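The half-block rule stated above can be illustrated with a minimal sketch. It is not the authors' implementation: the 4096-byte block size and the names `choose_policy` and `plan_write` are assumptions made for illustration, and the sketch only decides which mechanism each block would use, without touching persistent memory.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; the abstract does not state one


def choose_policy(updated_bytes: int, block_size: int = BLOCK_SIZE) -> str:
    """Return 'log' or 'cow' for one block, following the half-block rule."""
    if not 0 < updated_bytes <= block_size:
        raise ValueError("updated_bytes must be in (0, block_size]")
    # Updates of at most half a block are logged; larger updates use COW.
    return "log" if updated_bytes * 2 <= block_size else "cow"


def plan_write(offset: int, length: int, block_size: int = BLOCK_SIZE):
    """Split a byte range into (block_index, updated_bytes, policy) tuples."""
    plan = []
    pos, end = offset, offset + length
    while pos < end:
        block = pos // block_size
        block_end = (block + 1) * block_size
        chunk = min(end, block_end) - pos  # bytes of this block being updated
        plan.append((block, chunk, choose_policy(chunk, block_size)))
        pos += chunk
    return plan
```

For example, under these assumptions a 6144-byte write starting at offset 3072 touches three blocks: the partially updated first and last blocks would be logged, while the fully overwritten middle block would use COW.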
Jinna LV Bin WU Yunlei ZHANG Yunpeng XIAO
Recently, social relation analysis has received an increasing amount of attention on data ranging from text to images. However, social relation analysis from video, although an important problem, is lacking in the current literature. Two challenges remain: 1) it is hard to learn a satisfactory mapping function from low-level pixels to the high-level social relation space; 2) it is hard to efficiently select the most relevant information from noisy and unsegmented video. In this paper, we present an Attentive Sequences Recurrent Network model, called ASRN, to deal with these challenges. First, in order to explore multiple clues, we design a Multiple Feature Attention (MFA) mechanism to fuse multiple visual features (i.e., image, motion, body, and face). In this manner, we can generate an appropriate mapping function from low-level video pixels to the high-level social relation space. Second, we design a sequence recurrent network based on a Global and Local Attention (GLA) mechanism. Specifically, an attention mechanism is used in GLA to integrate the global feature with local sequence features in order to select more relevant sequences for the recognition task. Therefore, the GLA module can better deal with noisy and unsegmented video. Finally, extensive experiments on the SRIV dataset demonstrate the performance of our ASRN model.
Chao GENG Bo LIU Shigetoshi NAKATAKE
In integrated circuit design at advanced technology nodes, layout density uniformity significantly influences manufacturability due to CMP variability. In analog design especially, designers struggle to pass density checks because few useful tools are available. To tackle this issue, we focus on a transistor-array (TA)-style analog layout and propose a density optimization algorithm consistent with complicated design rules. Based on the TA style, we introduce a density-aware layout format to explicitly control the layout pattern density, and provide a mathematical optimization approach. Hence, a design flow incorporating our density optimization can drastically reduce the design time with fewer iterations. In a design case of an OPAMP layout in a 65nm CMOS process, the results demonstrate that the proposed approach achieves a speed-up of more than 48× compared with conventional manual layout, while showing good circuit performance in post-layout simulation.
Po-Yu KUO Chia-Hsin HSIEH Jin-Fa LIN Ming-Hwa SHEU Yi-Ting HUNG
A novel low-power sense-amplifier-based flip-flop (FF) is presented. By using a simplified SRAM-based latch design and a pass transistor logic (PTL) circuit scheme, both the transistor count and the leakage power of the FF design are greatly reduced. The performance claims are verified through extensive post-layout simulations. Compared to the conventional sense-amplifier FF design, the proposed circuit achieves a 19.6% leakage reduction. Moreover, the delay and area are reduced by 21.8% and 31%, respectively. The performance edge becomes even larger when the flip-flop is integrated in an N-bit register file.
Can CHEN Chao ZHOU Jian LIU Dengyin ZHANG
Distributed compressive video sensing (DCVS) has received considerable attention due to its potential in source-limited communication, e.g., wireless video sensor networks (WVSNs). Multi-hypothesis (MH) prediction, which treats the target block as a linear combination of hypotheses, is a state-of-the-art technique in DCVS. The common approach assumes that blocks that are dissimilar from the target block should be given lower weights than blocks that are more similar. This assumption can yield acceptable reconstruction quality, but it is not suitable for scenarios with more details. In this paper, based on the joint sparsity model (JSM), the authors present a Tikhonov-regularized MH prediction scheme in which the most similar block provides the common portion and the other blocks provide their respective unique portions, in contrast to the common supposition. Specifically, a new scheme for generating hypotheses and a Euclidean distance-based metric for the regularization term are proposed. Compared with several state-of-the-art algorithms, the authors show the effectiveness of the proposed scheme when only a limited number of hypotheses is available.
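For orientation, the generic form of Tikhonov-regularized MH prediction can be written as below. This is the standard textbook formulation, not the authors' specific scheme, and all symbols are assumptions introduced here: y denotes the measurements of the target block, the K columns of A are the hypotheses expressed in the same measurement domain, w is the weight vector, λ is the regularization parameter, and γ_j is the penalty attached to hypothesis j.

```latex
\hat{\mathbf{w}}
  = \arg\min_{\mathbf{w}}
    \lVert \mathbf{y} - \mathbf{A}\mathbf{w} \rVert_2^2
    + \lambda^2 \lVert \boldsymbol{\Gamma}\mathbf{w} \rVert_2^2
\quad\Longrightarrow\quad
\hat{\mathbf{w}}
  = \left( \mathbf{A}^{\mathsf{T}}\mathbf{A}
    + \lambda^2 \boldsymbol{\Gamma}^{\mathsf{T}}\boldsymbol{\Gamma} \right)^{-1}
    \mathbf{A}^{\mathsf{T}}\mathbf{y},
\qquad
\boldsymbol{\Gamma} = \operatorname{diag}(\gamma_1, \dots, \gamma_K)
```

A larger γ_j pushes the weight of hypothesis j toward zero, so the common supposition corresponds to letting γ_j grow with the dissimilarity between hypothesis j and the target block. The Euclidean distance-based metric proposed by the authors is not specified in the abstract and is not reproduced here.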
Recently, more and more people have started investing. Understanding the factors affecting financial products is important for making investment decisions. However, it is difficult for novices to understand these factors because various factors affect each other. Various techniques have been studied, but conventional factor analysis methods focus on revealing the impact of factors locally over a certain period, and it is not easy to predict net asset values with them. As a reasonable solution for the prediction of net asset values, in this paper, we propose a trend shift model for the global analysis of factors by introducing trend change points as shift interference variables into state space models. In addition, to realize the trend shift model efficiently, we propose an effective trend detection method, TP-TBSM (two-phase TBSM), by extending TBSM (trend-based segmentation method). Compared with TBSM, TP-TBSM can detect trends flexibly by reducing the dependence on parameters. We conduct experiments with eleven investment trust products and reveal the usefulness and effectiveness of the proposed model and method.
The effectiveness of model adaptation in dialogue speech synthesis is explored. The proposed adaptation method is based on a conversion from a base model learned with a large dataset into a target, dialogue-style speech model. The proposed method is shown to improve the intelligibility of synthesized dialogue speech, while maintaining the speaking style of dialogue.
Tuan Linh DANG Yukinobu HOSHINO
This paper presents a hybrid architecture for a neural network (NN) trained by a particle swarm optimization (PSO) algorithm. The NN is implemented on the hardware side while the PSO is executed by a processor on the software side. In addition, principal component analysis (PCA) is applied to reduce correlated information. The PCA module is implemented in hardware using SystemVerilog to increase operating speed. Experimental results showed that the proposed architecture was successfully implemented. In addition, the hardware-based NN trained by PSO (NN-PSO) was faster than the software-based NN trained by PSO. The proposed NN-PSO with PCA also obtained better recognition rates than the NN-PSO without PCA.
Tsuyoshi SUGIURA Satoshi FURUTA Tadamasa MURAKAMI Koki TANJI Norihisa OTANI Toshihiko YOSHIMASU
This paper presents high-efficiency Class-E and compact Doherty power amplifiers (PAs) with novel harmonics termination for handset applications using a GaAs/InGaP heterojunction bipolar transistor (HBT) process. The novel harmonics termination circuit effectively reduces the insertion loss of the matching circuit, allowing a device with a compact size. The Doherty PA uses a lumped-element transformer which consists of metal-insulator-metal (MIM) capacitors on an IC substrate, a bonding-wire inductor and short micro-strip lines on a printed circuit board (PCB). The fabricated Class-E PA exhibits a power added efficiency (PAE) as high as 69.0% at 1.95GHz and as high as 67.6% at 2.535GHz. The fabricated Doherty PA exhibits an average output power of 25.5dBm and a PAE as high as 50.1% under a 10-MHz bandwidth quadrature phase shift keying (QPSK) LTE signal with a 6.16-dB peak-to-average-power-ratio (PAPR) at 1.95GHz. The fabricated chip size is smaller than 1mm². The input and output Doherty transformer areas are 0.5mm by 1.0mm and 0.7mm by 0.7mm, respectively.
Qiusheng HE Xiuyan SHAO Wei CHEN Xiaoyun LI Xiao YANG Tongfeng SUN
To address the effect of scale change on target tracking with a drone, a multi-scale target tracking algorithm based on the color feature tracking algorithm is proposed. The algorithm achieves adaptive scale tracking by training position and scale correlation filters. It first obtains the target center position in the next frame by computing the maximum of the response, where the position correlation filter is learned by a least squares classifier and the dimensionality of the color features is reduced by principal component analysis. The scale correlation filter is obtained from the color features of 33 rectangular areas, which are set by the scale factor around the central location, and its dimensionality is reduced by orthogonal-triangular (QR) decomposition. Finally, the location and size of the target are updated by the maximum of the response. In tests on 13 challenging video sequences taken by the drone, the results show that the algorithm adapts to changes in the target scale, and that its robustness and many other performance indicators are better than those of most state-of-the-art methods under illumination variation, fast motion, motion blur and other complex situations.
Chaima DHAHRI Kazunori MATSUMOTO Keiichiro HOASHI
Upcoming mood prediction plays an important role in different topics, such as bipolar depression disorder in psychology, and quality of life and recommendations in health-related quality of life research. The mood in this study is defined as the general emotional state of a user. In contrast to emotions, which are more specific and vary within a day, the mood is described as having either a positive or negative valence[1]. We propose an autonomous system that predicts the upcoming user mood based on their online activities over cyber, social and physical spaces without using extra devices and sensors. Recently, many researchers have relied on online social networks (OSNs) to detect user mood. However, most existing works focused on inferring the current mood, and only a few have focused on predicting the upcoming mood. For this reason, we define a new goal of predicting the upcoming mood. First, we collected ground truth data over two months from 383 subjects. Then, we studied the correlation between extracted features and the user's mood. Finally, we used these features to train two predictive systems: generalized and personalized. The results suggest a statistically significant correlation between tomorrow's mood and today's activities on OSNs, which can be used to develop a decent predictive system with an average accuracy of 70% and a recall of 75% for the correlated users. This performance increased to an average accuracy of 79% and a recall of 80% for active users who have more than 30 days of history data. Moreover, we showed that, for non-active users, falling back on a generalized system can compensate for the lack of data at the early stage of the system, whereas once enough data is available for each user, a personalized system is used to predict the upcoming mood individually.
Kazuki OTOMO Satoru KOBAYASHI Kensuke FUKUDA Hiroshi ESAKI
System logs are useful for understanding the status of large-scale networks and detecting faults in them. However, due to the diversity and volume of these logs, log analysis requires much time and effort. In this paper, we propose a log event anomaly detection method for large-scale networks without pre-processing and feature extraction. The key idea is to embed a large amount of diverse data into hidden states by using latent variables. We evaluate our method with 12 months of system logs obtained from a nation-wide academic network in Japan. Through comparisons with Kleinberg's univariate burst detection and a traditional multivariate analysis (i.e., PCA), we demonstrate that our proposed method achieves 14.5% higher recall and 3% higher precision than PCA. A case study shows that the detected anomalies provide effective information for troubleshooting network system faults.
Yoichi SASAKI Tetsuo SHIBUYA Kimihito ITO Hiroki ARIMURA
In this paper, we study the approximate point set matching (APSM) problem with minimum RMSD score under translation, rotation, and one-to-one correspondence in d dimensions. Since most previous works on APSM problems use similarity scores that do not specifically enforce one-to-one correspondence between points, such as the Hausdorff distance, previously proposed methods cannot easily be applied to our APSM problem. We therefore focus on speeding up exhaustive search algorithms that can find all approximate matches. First, we present an efficient branch-and-bound algorithm using a novel lower bound function of the minimum RMSD score for the enumeration version of the APSM problem. Then, we modify this algorithm for the optimization version. Next, we present another algorithm that runs fast with high probability when a set of parameters is fixed. Experimental results on both synthetic datasets and real 3-D molecular datasets showed that our branch-and-bound algorithm achieved a significant speed-up over the naive algorithm while keeping the advantage of generating all answers.
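As a small illustration of the score involved, the sketch below computes the RMSD of two point sets for a fixed one-to-one correspondence (point i of one set matched to point i of the other), before and after the optimal translation. The function names are hypothetical. Aligning the centroids minimizes RMSD over translations for a fixed correspondence; the optimal rotation, the search over correspondences, and the authors' lower bound function are outside the scope of this sketch.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of d-dimensional points."""
    n, d = len(points), len(points[0])
    return [sum(p[k] for p in points) / n for k in range(d)]


def rmsd(P, Q):
    """RMSD between two equally sized point lists, with P[i] matched to Q[i]."""
    if not P or len(P) != len(Q):
        raise ValueError("P and Q must be non-empty and of equal length")
    squared = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(P, Q))
    return math.sqrt(squared / len(P))


def rmsd_after_translation(P, Q):
    """RMSD after shifting both sets so that their centroids coincide."""
    cp, cq = centroid(P), centroid(Q)
    P_centered = [[a - c for a, c in zip(p, cp)] for p in P]
    Q_centered = [[a - c for a, c in zip(q, cq)] for q in Q]
    return rmsd(P_centered, Q_centered)
```

If one set is an exact translated copy of the other, `rmsd` returns the length of the shift vector while `rmsd_after_translation` returns a value that is zero up to floating-point error.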
Tao BAN Ryoichi ISAWA Shin-Ying HUANG Katsunari YOSHIOKA Daisuke INOUE
Along with the proliferation of IoT (Internet of Things) devices, cyberattacks towards them are on the rise. In this paper, aiming at efficient precaution and mitigation of emerging IoT cyberthreats, we present a multimodal study on applying machine learning methods to characterize malicious programs which target multiple IoT platforms. Experiments show that opcode sequences obtained from static analysis and API sequences obtained by dynamic analysis provide sufficient discriminant information such that IoT malware can be classified with near optimal accuracy. Automated and accelerated identification and mitigation of new IoT cyberthreats can be enabled based on the findings reported in this study.
Zhixiao WANG Mengnan HOU Guan YUAN Jing HE Jingjing CUI Mingjun ZHU
Social networks often demonstrate hierarchical community structure, with communities embedded in other ones. Most existing hierarchical community detection methods need one or more tunable parameters to control the resolution levels, and the obtained dendrograms, trees describing the hierarchical community structure, are extremely complex to understand and analyze. In this paper, we propose a parameter-free hierarchical community detection method based on micro-communities and a minimum spanning tree. The proposed method first identifies micro-communities based on the link strength between adjacent vertices, and then constructs a minimum spanning tree by successively linking these micro-communities one by one. The hierarchical community structure of social networks can be intuitively revealed from the merging order of these micro-communities. Experimental results on synthetic and real-world networks show that our proposed method exhibits good accuracy and efficiency and outperforms other state-of-the-art methods. In addition, our proposed method does not require any pre-defined parameters, and the output dendrogram is simple and meaningful for understanding and analyzing the hierarchical community structure of social networks.
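The idea of reading a hierarchy off the order in which micro-communities are linked can be sketched with a standard minimum spanning tree construction. The sketch below uses Kruskal's algorithm with a union-find structure, which is a swapped-in textbook technique and not necessarily the authors' procedure. The function name `merge_order` and the input format (micro-communities numbered from 0, edges given as a distance where smaller means more strongly linked) are assumptions, since the abstract does not define the link strength between micro-communities.

```python
def merge_order(num_groups, edges):
    """Build a minimum spanning tree over micro-communities with Kruskal's algorithm.

    edges: iterable of (distance, a, b) with 0 <= a, b < num_groups.
    Returns the accepted edges in the order they were added, which is the
    order in which groups of micro-communities are merged.
    """
    parent = list(range(num_groups))

    def find(x):
        # Find the representative of x, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for dist, a, b in sorted(edges):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate groups
            parent[root_a] = root_b
            tree.append((dist, a, b))
    return tree
```

Each accepted edge corresponds to one merge step, so the returned list can be read bottom-up as a dendrogram: early entries join closely linked micro-communities, and later entries join larger groups.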
Zhiqiang YI Meilin HE Peng PAN Haiquan WANG
This paper analyzes the performance of various decoders in a two-user interference channel, and some improved decoders based on enhanced utilization of channel state information at the receiver side are presented. Further, new decoders, namely hierarchical constellation based decoders, are proposed. Simulations show that the improved decoders and the proposed decoders have much better performance than existing decoders. Moreover, the proposed decoders have lower decoding complexity than the traditional maximum likelihood decoder.