IEICE TRANSACTIONS on Information

  • Impact Factor

    0.72

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.4

Volume E101-D No.5  (Publication Date:2018/05/01)

    Special Section on Machine Vision and its Applications
  • FOREWORD Open Access

    Norimichi UKITA  

     
    FOREWORD-Machine Vision and its Applications

      Page(s):
    1221-1221
  • Training of CNN with Heterogeneous Learning for Multiple Pedestrian Attributes Recognition Using Rarity Rate

    Hiroshi FUKUI  Takayoshi YAMASHITA  Yuji YAMAUCHI  Hironobu FUJIYOSHI  Hiroshi MURASE  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1222-1231

    Recognizing pedestrian attributes is an important function of an advanced driver assistance system (ADAS). Pedestrian attributes such as body pose, face orientation and open umbrella indicate the intended action or state of the pedestrian. Generally, this information is recognized using independent classifiers for each task. Performing all of these separate tasks is too time-consuming at the testing stage. In addition, the processing time increases as the number of tasks increases. To address this problem, multi-task learning or heterogeneous learning is performed to train a single classifier to perform multiple tasks. In particular, heterogeneous learning is able to simultaneously train a classifier to perform regression and recognition tasks, which reduces both training and testing time. However, heterogeneous learning tends to result in a lower accuracy rate for classes with few training samples. In this paper, we propose a method to improve the performance of heterogeneous learning for such classes. We introduce a rarity rate based on the importance and class probability of each task. The appropriate rarity rate is assigned to each training sample. The samples in a mini-batch for training a deep convolutional neural network are then augmented according to this rarity rate to focus on the classes with few samples. Our heterogeneous learning approach with the rarity rate performs pedestrian attribute recognition better, especially for classes with few training samples.

  • Line-Based SLAM Using Non-Overlapping Cameras in an Urban Environment

    Atsushi KAWASAKI  Kosuke HARA  Hideo SAITO  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1232-1242

    We propose a method of line-based Simultaneous Localization and Mapping (SLAM) using multiple non-overlapping cameras for vehicles running in an urban environment. It uses corresponding line segments between images taken in different frames and by different cameras. The contribution is a novel line segment matching algorithm that uses warping based on urban structures. This idea significantly improves the accuracy of line segment matching when viewing directions are very different, so that a number of correspondences between front-view and rear-view cameras can be found and the accuracy of SLAM can be improved. Additionally, to enhance the accuracy of SLAM, we apply a geometrical constraint of urban areas for initial estimation of the 3D mapping of line segments and for optimization by bundle adjustment. We can further improve the accuracy of SLAM by combining points and lines. The position error is stable within 1.5 m for the entire image dataset evaluated in this paper. The estimation accuracy of our method is as high as that of ground truth captured by RTK-GPS. Our high-accuracy SLAM algorithm can be applied to generating a road map represented by line segments. According to an evaluation of our generated map, a true positive rate exceeding 70% is achieved around the vehicle.

  • Real-Time Color Image Improvement System for Visual Testing of Nuclear Reactors

    Naoki HOSOYA  Atsushi MIYAMOTO  Junichiro NAGANUMA  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1243-1250

    Nuclear power plants require in-vessel inspections for soundness checks and preventive maintenance. One inspection procedure is visual testing (VT), which is based on video images of an underwater camera in a nuclear reactor. However, a lot of noise is superimposed on VT images due to radiation exposure. We propose a technique for improving the quality of those images by image processing that reduces radiation noise and enhances signals. Real-time video processing was achieved by applying the proposed technique with a parallel processing unit. Improving the clarity of VT images will lead to reducing the burden on inspectors.

  • Multi-Peak Estimation for Real-Time 3D Ping-Pong Ball Tracking with Double-Queue Based GPU Acceleration

    Ziwei DENG  Yilin HOU  Xina CHENG  Takeshi IKENAGA  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1251-1259

    3D ball tracking is of great significance in ping-pong game analysis and can be used in applications such as TV content and tactical analysis, some of which require real-time implementation. This paper proposes a particle filter on a CPU-GPU platform for multi-view ball tracking, comprising four proposals. The multi-peak estimation and the ball-like observation model are proposed in the algorithm design. The multi-peak estimation aims at obtaining a precise ball position when the particles' likelihood distribution has multiple peaks under complex circumstances. The ball-like observation model, with four different likelihood evaluations, uses the ball's unique features to evaluate each particle's similarity to the target. In the GPU implementation, the double-queue structure and the vectorized data combination are proposed. The double-queue structure aims at achieving task parallelism between some data-independent tasks. The vectorized data combination reduces the time cost of memory access by combining three different image data into one vector. Experiments are based on ping-pong videos of an official match recorded by four cameras located at the four corners of the court. The tracking success rate reaches 99.59% on CPU. With GPU acceleration, the time consumption is 8.8 ms/frame, a speedup by a factor of 98 over the CPU version.

  • Pixel Selection and Intensity Directed Symmetry for High Frame Rate and Ultra-Low Delay Matching System

    Tingting HU  Takeshi IKENAGA  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1260-1269

    High frame rate and ultra-low delay matching systems play an increasingly important role in human-machine interactive applications, which call for higher frame rates and lower delay for a better experience. The large amount of data to process and the complex computation in a local feature based matching system make it difficult to achieve high processing speed and ultra-low delay matching with limited resources. Aiming at a matching system with a processing speed of more than 1000 fps and a delay of less than 1 ms/frame, this paper puts forward a local binary feature based matching system on a field-programmable gate array (FPGA). Pixel selection based 4-1-4 parallel matching and intensity directed symmetry are proposed for the implementation of this system. To design a basic framework with high processing speed and ultra-low delay using limited resources, pixel selection based 4-1-4 parallel matching is proposed, which makes it possible to achieve four-thread processing with only one-thread resource consumption. Assuming that the orientation of the keypoint will bisect the patch best and will point to the region with high intensity, intensity directed symmetry is proposed to calculate the keypoint orientation in a hardware-friendly way, which is an important part of a rotation-robust matching system. Software experiment results show that the proposed keypoint orientation calculation method achieves almost the same performance as the state-of-the-art intensity centroid orientation calculation method in a matching system. Hardware experiment results show that the designed image processing core can process VGA (640×480) videos at a processing speed of 1306 fps with a delay of 0.8083 ms/frame.
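
    The abstract above compares against the intensity centroid orientation method. As a rough illustration of that baseline only (not of the paper's intensity directed symmetry), the following minimal sketch computes first-order patch moments and takes the angle of the centroid vector. The function name and the toy patches are hypothetical, and the patch is assumed to be a small square grayscale array with y pointing downward.

```python
import math

def intensity_centroid_orientation(patch):
    """Return a keypoint orientation in radians for a square grayscale patch.

    patch is a list of rows of pixel intensities. Coordinates are measured
    from the patch center, with x to the right and y downward.
    """
    size = len(patch)
    center = (size - 1) / 2.0
    m10 = 0.0  # first-order moment along x
    m01 = 0.0  # first-order moment along y
    for row_index, row in enumerate(patch):
        for col_index, intensity in enumerate(row):
            m10 += (col_index - center) * intensity
            m01 += (row_index - center) * intensity
    # The orientation points from the patch center toward the intensity centroid.
    return math.atan2(m01, m10)

# Hypothetical toy patches: one bright on the right edge, one on the bottom edge.
bright_right = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
bright_bottom = [[0, 0, 0], [0, 0, 0], [9, 9, 9]]
angle_right = intensity_centroid_orientation(bright_right)
angle_bottom = intensity_centroid_orientation(bright_bottom)
```

    In this sketch the orientation points toward the brighter side of the patch, which matches the abstract's assumption that the keypoint orientation points to the high-intensity region.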

  • Object Specific Deep Feature for Face Detection

    Xianxu HOU  Jiasong ZHU  Ke SUN  Linlin SHEN  Guoping QIU  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1270-1277

    Motivated by the observation that certain convolutional channels of a Convolutional Neural Network (CNN) exhibit object specific responses, we seek to discover and exploit the convolutional channels of a CNN in which neurons are activated by the presence of specific objects in the input image. A method for explicitly fine-tuning a pre-trained CNN to induce object specific channel (OSC) and systematically identifying it for the human faces has been developed. In this paper, we introduce a multi-scale approach to constructing robust face heatmaps based on OSC features for rapidly filtering out non-face regions thus significantly improving search efficiency for face detection. We show that multi-scale OSC can be used to develop simple and compact face detectors in unconstrained settings with state of the art performance.

  • Point of Gaze Estimation Using Corneal Surface Reflection and Omnidirectional Camera Image

    Taishi OGAWA  Atsushi NAKAZAWA  Toyoaki NISHIDA  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1278-1287

    We present a human point of gaze (PoG) estimation system using corneal surface reflection and omnidirectional images taken by spherical panorama cameras, which have become popular in recent years. Our system can find where a user is looking in a 360° surrounding scene image from an eye image alone, and thus does not need the gaze mapping from partial scene images to a whole scene image that is necessary in conventional eye gaze tracking systems. We first generate multiple perspective scene images from an omnidirectional (equirectangular) image and perform registration between the corneal reflection and perspective images using a corneal reflection-scene image registration technique. We then compute the point of gaze using a corneal imaging technique that leverages a 3D eye model, and project the point onto the omnidirectional image. The 3D eye pose is estimated using a particle-filter-based tracking algorithm. In experiments, we evaluated the accuracy of the 3D eye pose estimation, the robustness of registration and the accuracy of PoG estimation using two indoor and five outdoor scenes, and found that the gaze mapping error was 5.546 [deg] on average.

  • Accelerating Existing Non-Blind Image Deblurring Techniques through a Strap-On Limited-Memory Switched Broyden Method

    Ichraf LAHOULI  Robby HAELTERMAN  Joris DEGROOTE  Michal SHIMONI  Geert DE CUBBER  Rabah ATTIA  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1288-1295

    Video surveillance from airborne platforms can suffer from many sources of blur, like vibration, low-end optics, uneven lighting conditions, etc. Many different algorithms have been developed in the past that aim to recover the deblurred image but often incur substantial CPU-time, which is not always available on-board. This paper shows how a “strap-on” quasi-Newton method can accelerate the convergence of existing iterative methods with little extra overhead while keeping the performance of the original algorithm, thus paving the way for (near) real-time applications using on-board processing.

  • Superimposing Thermal-Infrared Data on 3D Structure Reconstructed by RGB Visual Odometry

    Masahiro YAMAGUCHI  Trong Phuc TRUONG  Shohei MORI  Vincent NOZICK  Hideo SAITO  Shoji YACHIDA  Hideaki SATO  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1296-1307

    In this paper, we propose a method to generate a three-dimensional (3D) thermal map and RGB + thermal (RGB-T) images of a scene from thermal-infrared and RGB images. The scene images are acquired by moving both an RGB camera and a thermal-infrared camera mounted on a stereo rig. Before capturing the scene with those cameras, we estimate their respective intrinsic parameters and their relative pose. Then, we reconstruct the 3D structures of the scene by applying Direct Sparse Odometry (DSO) to the RGB images. In order to superimpose thermal information onto each point generated from DSO, we propose a method for estimating the scale of the point cloud corresponding to the extrinsic parameters between both cameras by matching depth images recovered from the RGB camera and the thermal-infrared camera based on mutual information. We also generate RGB-T images using the 3D structure of the scene and Delaunay triangulation. We do not rely on depth cameras and, therefore, our technique is not limited to scenes within the measurement range of the depth cameras. To demonstrate this technique, we generate 3D thermal maps and RGB-T images for both indoor and outdoor scenes.

  • Simultaneous Object Segmentation and Recognition by Merging CNN Outputs from Uniformly Distributed Multiple Viewpoints

    Yoshikatsu NAKAJIMA  Hideo SAITO  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1308-1316

    We propose a novel object recognition system that is able to (i) work in real time while reconstructing segmented 3D maps and simultaneously recognizing objects in a scene, (ii) manage various kinds of objects, including those with smooth surfaces and those with a large number of categories, utilizing a CNN for feature extraction, and (iii) maintain high accuracy no matter how the camera moves by distributing the viewpoints for each object uniformly and aggregating recognition results from each distributed viewpoint with equal weight. Through experiments, the advantages of our system with respect to current state-of-the-art object recognition approaches are demonstrated on the UW RGB-D Dataset and Scenes and on our own scenes prepared to verify the effectiveness of the Viewpoint-Class-based approach.

  • Multicultural Facial Expression Recognition Based on Differences of Western-Caucasian and East-Asian Facial Expressions of Emotions

    Gibran BENITEZ-GARCIA  Tomoaki NAKAMURA  Masahide KANEKO  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1317-1324

    An increasing number of psychological studies have demonstrated that the six basic expressions of emotions are not culturally universal. However, automatic facial expression recognition (FER) systems disregard these findings and assume that facial expressions are universally expressed and recognized across different cultures. Therefore, this paper presents an analysis of Western-Caucasian and East-Asian facial expressions of emotions based on visual representations and cross-cultural FER. The visual analysis builds on the Eigenfaces method, and the cross-cultural FER combines appearance and geometric features by extracting Local Fourier Coefficients (LFC) and Facial Fourier Descriptors (FFD) respectively. Furthermore, two possible solutions for FER under multicultural environments are proposed. These are based on an early race detection, and independent models for culture-specific facial expressions found by the analysis evaluation. HSV color quantization combined with LFC and FFD compose the feature extraction for race detection, whereas culture-independent models of anger, disgust and fear are analyzed for the second solution. All tests were performed using Support Vector Machines (SVM) for classification and evaluated using five standard databases. Experimental results show that both solutions overcome the accuracy of FER systems under multicultural environments. However, the approach which individually considers the culture-specific facial expressions achieved the highest recognition rate.

  • Extraction and Recognition of Shoe Logos with a Wide Variety of Appearance Using Two-Stage Classifiers

    Kazunori AOKI  Wataru OHYAMA  Tetsushi WAKABAYASHI  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1325-1332

    A logo is a symbolic presentation that is designed not only to identify a product manufacturer but also to attract the attention of shoppers. Shoe logos are a challenging subject for automatic extraction and recognition using image analysis techniques because they have characteristics that distinguish them from those of other products; that is, there is much within-class variation in the appearance of shoe logos. In this paper, we propose an automatic extraction and recognition method for shoe logos with a wide variety of appearance using a limited number of training samples. The proposed method employs maximally stable extremal regions for the initial region extraction, an iterative algorithm for region grouping, and gradient features and a support vector machine for logo recognition. The results of performance evaluation experiments using a logo dataset that consists of a wide variety of appearances show that the proposed method achieves promising performance for both logo extraction and recognition.

  • Image-Based Food Calorie Estimation Using Recipe Information

    Takumi EGE  Keiji YANAI  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Page(s):
    1333-1341

    Recently, mobile applications for recording everyday meals have drawn much attention for dietary self-management. However, most of these applications return food calorie values simply associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, estimating food calories from a food photo with practical accuracy has not been achieved, and it remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneous learning of food calories, categories, ingredients and cooking directions using deep learning. Since there is generally a strong correlation between food calories and food categories, ingredients and cooking directions, we expect that training on them simultaneously boosts performance compared to independent single-task training. To this end, we use a multi-task CNN. In addition, we construct two datasets: a dataset of calorie-annotated recipes collected from Japanese recipe sites on the Web and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs and compared them. As a result, the multi-task CNN achieved better performance on both food category estimation and food calorie estimation than the single-task CNNs. For the Japanese recipe dataset, introducing the multi-task CNN improved the correlation coefficient by 0.039, while for the American recipe dataset it improved by 0.090 compared to the result of the single-task CNN. In addition, we showed that the proposed multi-task CNN based method outperformed previously proposed search-based methods.

    Regular Section
  • Long-Term Tracking Based on Multi-Feature Adaptive Fusion for Video Target

    Hainan ZHANG  Yanjing SUN  Song LI  Wenjuan SHI  Chenglong FENG  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/02/02
      Page(s):
    1342-1349

    Correlation filter-based trackers with an appearance model built from a single feature have poor robustness to challenging video environments, which include factors such as occlusion, fast motion and out-of-view. In this paper, a long-term tracking algorithm based on multi-feature adaptive fusion for video targets is presented. We design a robust appearance model by fusing powerful features, including histogram of gradient, local binary pattern and color-naming, at the response map level to overcome interference in the video. In addition, a random fern classifier is trained as a re-detector to detect the target when tracking failure occurs, so that long-term tracking is achieved. We evaluate our algorithm on large-scale benchmark datasets, and the results show that the proposed algorithm has more accurate and more robust performance in complex video environments.

  • A Hardware-Based Caching System on FPGA NIC for Blockchain

    Yuma SAKAKIBARA  Shin MORISHIMA  Kohei NAKAMURA  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Publicized:
    2018/02/02
      Page(s):
    1350-1360

    Engineers and researchers have recently paid attention to Blockchain. Blockchain is a fault-tolerant distributed ledger without administrators. Blockchain originated in cryptocurrency, but it can be applied to other industries. A transfer of a digital asset is called a transaction. Blockchain holds all transactions, so the total amount of Blockchain data increases as time proceeds. Meanwhile, the number of Internet of Things (IoT) products has been increasing. It is difficult for IoT products to hold all Blockchain data because of their limited storage capacity. Therefore, they access Blockchain data via servers that hold the Blockchain data. However, if many IoT products access the Blockchain network via servers, the servers will be overloaded. Thus, it is useful to reduce server workloads and improve throughput. In this paper, we propose a caching technique using a Field Programmable Gate Array (FPGA)-based Network Interface Card (NIC) which possesses four 10 Gigabit Ethernet (10GbE) interfaces. The proposed system can reduce server overloads because, on a cache hit, the FPGA NIC responds to requests from IoT products instead of the server. We implemented the proposed hardware cache on a NetFPGA-10G board to achieve high throughput. As an evaluation, we counted the number of requests processed by the server or the FPGA NIC. As a result, the throughput improved by 1.97 times on average when hitting the cache.
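
    The request flow described above (answer from the NIC on a cache hit, forward to the server on a miss, and count who served each request) can be modeled in software as a minimal sketch. Everything here is hypothetical: the class name `NicCache`, the `server_lookup` callback and the unbounded dictionary stand in for the hardware cache, whose actual structure and eviction policy the abstract does not describe.

```python
class NicCache:
    """Software model of a cache that sits in front of a server."""

    def __init__(self, backend):
        self.backend = backend    # callable standing in for the server
        self.store = {}           # cached key -> value (unbounded in this sketch)
        self.nic_served = 0       # requests answered from the cache
        self.server_served = 0    # requests forwarded to the server

    def get(self, key):
        if key in self.store:
            # Cache hit: respond without involving the server.
            self.nic_served += 1
            return self.store[key]
        # Cache miss: ask the server, then remember the answer.
        value = self.backend(key)
        self.server_served += 1
        self.store[key] = value
        return value

requests_seen = []

def server_lookup(key):
    # Hypothetical server handler that records every request it receives.
    requests_seen.append(key)
    return "value-for-" + key

cache = NicCache(server_lookup)
responses = [cache.get(k) for k in ["tx1", "tx2", "tx1", "tx1"]]
```

    In this toy run only the first request for each key reaches the server; the repeated requests are served from the cache, which is the effect the abstract relies on to reduce server load.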

  • A Real-Time Subtask-Assistance Strategy for Adaptive Services Composition

    Li QUAN  Zhi-liang WANG  Xin LIU  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2018/01/30
      Page(s):
    1361-1369

    Reinforcement learning has been applied to adaptive service composition. However, traditional algorithms are not suitable for large-scale service composition. Based on the Q-learning algorithm, a multi-task oriented algorithm named multi-Q learning is proposed to realize a subtask-assistance strategy for large-scale and adaptive service composition. Unlike previous studies that focus on one task, we take the relationship between multiple service composition tasks into account. We decompose a complex service composition task into multiple subtasks according to graph theory. Different tasks with the same subtasks can assist each other to improve their learning speed. The experimental results show that our algorithm learns clearly faster than the traditional Q-learning algorithm. Compared with multi-agent Q-learning, our algorithm also converges faster. Moreover, for all involved service composition tasks that share subtasks with each other, our algorithm can simultaneously improve their speed of learning the optimal policy in real time.
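
    For readers unfamiliar with the baseline, the following minimal sketch shows the standard tabular Q-learning update that the abstract builds on. It does not implement the paper's multi-Q learning or subtask sharing. The state and action names, the reward and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q-value."""
    # Best value obtainable from the next state under the current estimates.
    best_next = max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]

q_table = defaultdict(float)            # unseen (state, action) pairs start at 0
actions = ["service_a", "service_b"]    # hypothetical candidate services
first = q_update(q_table, "s0", "service_a", 1.0, "s1", actions)
second = q_update(q_table, "s0", "service_a", 1.0, "s1", actions)
```

    Each call moves the estimate halfway toward the target here, so repeated experience with the same transition gradually converges; the abstract's contribution concerns speeding up this kind of learning across tasks that share subtasks.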

  • Detecting Malware-Infected Devices Using the HTTP Header Patterns

    Sho MIZUNO  Mitsuhiro HATADA  Tatsuya MORI  Shigeki GOTO  

     
    PAPER-Information Network

      Publicized:
    2018/02/08
      Page(s):
    1370-1379

    Damage caused by malware has become a serious problem. The recent rise in the spread of evasive malware has made it difficult to detect at the pre-infection stage. Malware detection at the post-infection stage is a promising approach that fills this gap. Given this background, this work aims to identify likely malware-infected devices from the measurement of Internet traffic. The advantage of the traffic-measurement-based approach is that it enables us to monitor a large number of endhosts. If we find that an endhost is a source of malicious traffic, the endhost is likely a malware-infected device. Since the majority of malware today makes use of the web as a means to communicate with the C&C servers that reside on the external network, we leverage information recorded in the HTTP headers to discriminate between malicious and benign traffic. To make our approach scalable and robust, we develop an automatic template generation scheme that drastically reduces the amount of information to be kept while achieving high classification accuracy; since it does not make use of any domain knowledge, the approach should be robust against changes in malware. We apply several classifiers, which include machine learning algorithms, to the extracted templates and classify traffic into two categories: malicious and benign. Our extensive experiments demonstrate that our approach discriminates between malicious and benign traffic with up to 97.1% precision while maintaining the false positive rate below 1.0%.

  • Retweeting Prediction Based on Social Hotspots and Dynamic Tensor Decomposition

    Qian LI  Xiaojuan LI  Bin WU  Yunpeng XIAO  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/01/30
      Page(s):
    1380-1392

    In social networks, predicting user behavior under social hotspots can aid in understanding the development trend of a topic. In this paper, we propose a retweeting prediction method for social hotspots based on tensor decomposition, using user information, relationship and behavioral data. The method can be used to predict the behavior of users and analyze the evolvement of topics. Firstly, we propose a tensor-based mechanism for mining user interaction, and then we propose that the tensor be used to solve the problem of inaccuracy that arises when interactively calculating intensity for sparse user interaction data. At the same time, we can analyze the influence of the following relationship on the interaction between users based on characteristics of the tensor in data space conversion and projection. Secondly, time decay function is introduced for the tensor to quantify further the evolution of user behavior in current social hotspots. That function can be fit to the behavior of a user dynamically, and can also solve the problem of interaction between users with time decay. Finally, we invoke time slices and discretization of the topic life cycle and construct a user retweeting prediction model based on logistic regression. In this way, we can both explore the temporal characteristics of user behavior in social hotspots and also solve the problem of uneven interaction behavior between users. Experiments show that the proposed method can improve the accuracy of user behavior prediction effectively and aid in understanding the development trend of a topic.

  • Modeling Complex Relationship Paths for Knowledge Graph Completion

    Ping ZENG  Qingping TAN  Xiankai MENG  Haoyu ZHANG  Jianjun XU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/02/20
      Page(s):
    1393-1400

    Determining the validity of knowledge triples and filling in the missing entities or relationships in the knowledge graph are crucial tasks for large-scale knowledge graph completion. So far, the main solutions use machine learning methods to learn low-dimensional distributed representations of entities and relationships to complete the knowledge graph. Among them, translation models obtain excellent performance. However, the translation models proposed so far do not adequately consider the indirect relationships among entities, which affects the precision of the representation. Based on the long short-term memory neural network and existing translation models, we propose a multiple-module hybrid neural network model called TransP. By modeling the entity paths and their relationship paths, TransP can effectively mine the indirect relationships among the entities and thus improve the quality of knowledge graph completion tasks. Experimental results show that TransP outperforms state-of-the-art models in the entity prediction task and achieves performance comparable to that of previous models in the relationship prediction task.
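
    To make the idea of a translation model concrete, here is a minimal sketch of a TransE-style score, where a triple (head, relation, tail) is plausible when head + relation lands close to tail. This is not TransP: the abstract says TransP models paths with an LSTM, whereas the additive path composition below is a simple stand-in chosen for illustration. The function names and the two-dimensional embeddings are hypothetical.

```python
import math

def translation_score(head, relation, tail):
    """Euclidean distance between head + relation and tail (lower is better)."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))

def compose_path(relations):
    """Combine a multi-hop relation path by adding the relation vectors.

    Additive composition is an illustrative stand-in, not the LSTM-based
    path modeling described in the abstract.
    """
    return [sum(components) for components in zip(*relations)]

# Hypothetical 2-D embeddings for a two-hop path from head to tail.
head = [0.0, 0.0]
first_hop = [1.0, 0.0]
second_hop = [0.0, 1.0]
true_tail = [1.0, 1.0]
wrong_tail = [3.0, 1.0]

path = compose_path([first_hop, second_hop])
score_true = translation_score(head, path, true_tail)
score_wrong = translation_score(head, path, wrong_tail)
```

    The entity reached through the two-hop path scores better than the unrelated one, which is the kind of indirect relationship the abstract argues single-hop translation models do not capture well.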

  • Study on Driver Agent Based on Analysis of Driving Instruction Data — Driver Agent for Encouraging Safe Driving Behavior (1) —

    Takahiro TANAKA  Kazuhiro FUJIKAKE  Takashi YONEKAWA  Misako YAMAGISHI  Makoto INAGAMI  Fumiya KINOSHITA  Hirofumi AOKI  Hitoshi KANAMORI  

     
    PAPER-Human-computer Interaction

      Publicized:
    2018/01/24
      Page(s):
    1401-1409

    In recent years, the number of traffic accidents caused by elderly drivers has increased in Japan. However, cars are an important mode of transportation for the elderly. Therefore, to ensure safe driving, a system that can assist elderly drivers is required. We propose a driver-agent system that provides support to elderly drivers during and after driving and encourages them to improve their driving. This paper describes the prototype system and the analysis conducted of the teaching records of a human instructor, the impression caused by the instructions on a subject during driving, and subjective evaluation of the driver-agent system.

  • Exponential Neighborhood Preserving Embedding for Face Recognition

    Ruisheng RAN  Bin FANG  Xuegang WU  

     
    PAPER-Pattern Recognition

      Publicized:
    2018/01/23
      Page(s):
    1410-1420

    Neighborhood preserving embedding (NPE) is a widely used manifold-based dimensionality reduction technique. However, NPE has two problems. One is that it suffers from the small-sample-size (SSS) problem. The other is that the performance of NPE is highly sensitive to the neighborhood size k. To overcome these two problems, an exponential neighborhood preserving embedding (ENPE) is proposed in this paper. The main idea of ENPE is to introduce the matrix exponential into NPE, so that the SSS problem is avoided and low sensitivity to the neighborhood size k is obtained. The experiments are conducted on the ORL, Georgia Tech and AR face databases. The results show that ENPE performs better than other unsupervised methods such as PCA, LPP, ELPP and NPE, and that ENPE is much less sensitive to the neighborhood parameter k than the unsupervised manifold learning methods LPP, ELPP and NPE.
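
    A short note on why the matrix exponential can sidestep the SSS problem, offered as general linear algebra rather than as the paper's derivation. Here S is an assumed symbol for any real square matrix that may be singular when there are fewer samples than dimensions. Its exponential is defined by a power series and is always invertible, because its determinant is strictly positive:

```latex
\exp(S) = \sum_{k=0}^{\infty} \frac{S^{k}}{k!},
\qquad
\det\bigl(\exp(S)\bigr) = \exp\bigl(\operatorname{tr}(S)\bigr) > 0 .
```

    For a symmetric S with eigenvalues \lambda_i, the matrix \exp(S) has the same eigenvectors and eigenvalues e^{\lambda_i} > 0, so a zero eigenvalue of S becomes the eigenvalue 1 of \exp(S) and the singularity disappears.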

  • Novel Defogging Algorithm Based on the Joint Use of Saturation and Color Attenuation Prior

    Chen QU  Duyan BI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2018/01/30
      Page(s):
    1421-1429

    Focusing on the defects of well-known defogging algorithms for foggy images based on the atmospheric scattering model, we find that it is necessary to obtain an accurate transmission map that reflects the real depths both at large depth and at close range. It is hard to achieve this with just one prior because of the differences between large-depth and close-range regions in foggy images. Hence, we propose a novel prior, called the saturation prior, that simplifies the solution of the transmission map by means of a transferring coefficient. Then, under the Random Walk model, we constrain the transferring coefficient with the color attenuation prior, which can obtain a good transmission map in large-depth regions. More importantly, we design a regularization weight to balance the influences of the saturation prior and the color attenuation prior on the transferring coefficient. Experimental results demonstrate that the proposed defogging method outperforms state-of-the-art image defogging methods based on a single prior in terms of detail restoration and color preservation.

  • Graph-Based Video Search Reranking with Local and Global Consistency Analysis

    Soh YOSHIDA  Takahiro OGAWA  Miki HASEYAMA  Mitsuji MUNEYASU  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2018/01/30
      Page(s):
    1430-1440

    Video reranking is an effective way of improving the retrieval performance of text-based video search engines. This paper proposes a graph-based Web video search reranking method with local and global consistency analysis. Generally, the graph-based reranking approach constructs a graph whose nodes and edges correspond to videos and their pairwise similarities, respectively. Many reranking methods are built on a scheme that regularizes the smoothness of pairwise relevance scores between adjacent nodes with regard to a user's query. However, since the overall consistency is measured by aggregating only the local consistency over each pair, errors in score estimation increase when noisy samples are included among the neighbors of query-relevant videos. To deal with the noisy samples, the proposed method leverages the global consistency of the graph structure, unlike conventional methods. Specifically, to detect this consistency, the proposed method introduces a spectral clustering algorithm that detects groups of videos with strong semantic correlation on the graph. Furthermore, a new regularization term, which smooths ranking scores within the same group, is introduced into the reranking framework. Since the score regularization is performed from both local and global aspects simultaneously, accurate score estimation becomes feasible. Experimental results obtained by applying the proposed method to a real-world video collection show its effectiveness.
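
    The abstract does not give the exact regularization terms, so the sketch below is only one plausible reading: a pairwise (local) smoothness penalty weighted by edge similarity, and a group-wise (global) penalty that measures score spread inside each cluster. Both function names, the penalty forms, the toy graph, and the fixed group assignment (standing in for the output of spectral clustering) are assumptions.

```python
def local_smoothness(weights, scores):
    """Pairwise penalty: sum of w_ij * (f_i - f_j)^2 over node pairs."""
    n = len(scores)
    return sum(weights[i][j] * (scores[i] - scores[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def group_smoothness(groups, scores):
    """Group penalty: squared deviation from the mean score in each group."""
    total = 0.0
    for members in groups:
        mean = sum(scores[i] for i in members) / len(members)
        total += sum((scores[i] - mean) ** 2 for i in members)
    return total

# Toy graph with three videos: 0 and 1 are very similar, 2 is not.
weights = [[0.0, 1.0, 0.1],
           [1.0, 0.0, 0.1],
           [0.1, 0.1, 0.0]]
groups = [[0, 1], [2]]  # assumed clustering result

consistent = [0.9, 0.8, 0.1]    # similar videos get similar scores
inconsistent = [0.9, 0.1, 0.8]  # similar videos get very different scores

print(local_smoothness(weights, consistent),
      local_smoothness(weights, inconsistent))
print(group_smoothness(groups, consistent),
      group_smoothness(groups, inconsistent))
```

    Under both penalties the consistent scoring is cheaper, so a reranker that minimizes them alongside a fit to the initial ranking would favor it.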

  • Tree-Based Feature Transformation for Purchase Behavior Prediction

    Chunyan HOU  Chen CHEN  Jinsong WANG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2018/02/02
      Page(s):
    1441-1444

    In the era of e-commerce, purchase behavior prediction is one of the most important issues for promoting both online companies' sales and consumers' experience. Previous studies usually use feature engineering and ensemble machine learning algorithms for the prediction. The performance depends heavily on the designed features and the scalability of the algorithms, because large-scale data and many categorical features lead to a huge number of samples and high-dimensional features. In this study, we explore an alternative that uses tree-based Feature Transformation (FT) and simple machine learning algorithms (e.g. Logistic Regression). Random Forest (RF) and Gradient Boosting decision tree (GB) are used for FT. Then, a simple algorithm, rather than an ensemble algorithm, is used to predict purchase behavior based on the transformed features. Tree-based FT regards the leaves of trees as transformed features, and can learn high-order interactions among the original features. Compared with RF, if GB is used for FT, simple algorithms are enough to achieve better performance.
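
    The idea of treating tree leaves as transformed features can be sketched as follows. The two hand-built trees stand in for trees that RF or GB would learn, and the dictionary layout, function names, and thresholds are all assumptions for illustration. Each sample is mapped to the leaf it reaches in every tree, and the leaf indices are one-hot encoded and concatenated.

```python
def leaf_index(tree, x):
    """Walk a tree of nested dicts and return the index of the reached leaf."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def transform(trees, leaf_counts, x):
    """Concatenate one one-hot leaf indicator vector per tree."""
    features = []
    for tree, n_leaves in zip(trees, leaf_counts):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, x)] = 1
        features.extend(one_hot)
    return features

# Hypothetical trees over two original features.
tree_a = {"feature": 0, "threshold": 0.5,
          "left": {"leaf": 0},
          "right": {"feature": 1, "threshold": 2.0,
                    "left": {"leaf": 1},
                    "right": {"leaf": 2}}}
tree_b = {"feature": 1, "threshold": 1.0,
          "left": {"leaf": 0},
          "right": {"leaf": 1}}

trees = [tree_a, tree_b]
leaf_counts = [3, 2]

print(transform(trees, leaf_counts, [0.7, 3.0]))
print(transform(trees, leaf_counts, [0.2, 0.5]))
```

    Because each leaf corresponds to a conjunction of threshold tests along its path, a linear model such as logistic regression trained on these binary vectors can pick up interactions among the original features.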

  • Complex-Valued Fully Convolutional Networks for MIMO Radar Signal Segmentation

    Motoko TACHIBANA  Kohei YAMAMOTO  Kurato MAENO  

     
    LETTER-Pattern Recognition

      Publicized:
    2018/02/20
      Page(s):
    1445-1448

    Radar is expected to provide environmentally robust measurements in advanced driver-assistance systems. In this paper, we propose a novel radar signal segmentation method that uses a complex-valued fully convolutional network (CvFCN) comprising complex-valued layers, real-valued layers, and a bidirectional conversion layer between them. We also propose an efficient automatic annotation system for dataset generation. We apply the CvFCN to two-dimensional (2D) complex-valued radar signal maps (r-maps) that comprise angle and distance axes. An r-map is a 2D complex-valued matrix that is generated from raw radar signals by 2D Fourier transformation. We annotate the r-maps automatically using LiDAR measurements. In our experiment, we semantically segment r-map signals into pedestrian and background regions, achieving an accuracy of 99.7% for the background and 96.2% for pedestrians.
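
    To make the notion of a complex-valued 2D map concrete, the sketch below applies a naive 2D discrete Fourier transform to a small synthetic complex signal. It is not the paper's processing chain: the 4x4 size, the synthetic plane-wave input, and the interpretation of the two axes are assumptions, and a real system would use an FFT routine rather than this direct summation.

```python
import cmath

def dft2(signal):
    """Naive 2D discrete Fourier transform of a complex-valued matrix."""
    rows, cols = len(signal), len(signal[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for m in range(rows):
                for n in range(cols):
                    angle = -2 * cmath.pi * (u * m / rows + v * n / cols)
                    total += signal[m][n] * cmath.exp(1j * angle)
            out[u][v] = total
    return out

# Synthetic raw signal: a single complex plane wave with frequency
# index 1 along the first axis and 2 along the second (hypothetical).
size = 4
raw = [[cmath.exp(2j * cmath.pi * (1 * m / size + 2 * n / size))
        for n in range(size)] for m in range(size)]

r_map = dft2(raw)
magnitudes = [[abs(value) for value in row] for row in r_map]
for row in magnitudes:
    print([round(value, 6) for value in row])
```

    The output matrix is complex-valued, with the energy of the single wave concentrated in one cell; both magnitude and phase are available to a network that operates on complex inputs.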

  • Self-Supervised Learning of Video Representation for Anticipating Actions in Early Stage

    Yinan LIU  Qingbo WU  Liangzhi TANG  Linfeng XU  

     
    LETTER-Pattern Recognition

      Publicized:
    2018/02/21
      Page(s):
    1449-1452

    In this paper, we propose a novel self-supervised method for learning a video representation that can anticipate the video category from only a short clip. The key idea is to employ a Siamese convolutional network to model the self-supervised feature learning as two different image matching problems. By using frame encoding, the proposed video representation can be extracted at different temporal scales. We refine the training process via a motion-based temporal segmentation strategy. The learned video representations can be applied not only to action anticipation but also to action recognition. We verify the effectiveness of the proposed approach on both action anticipation and action recognition using two datasets, namely UCF101 and HMDB51. The experiments show that we achieve results comparable to those of state-of-the-art self-supervised learning methods on both tasks.

  • Bilateral Convolutional Activations Encoded with Fisher Vectors for Scene Character Recognition

    Zhong ZHANG  Hong WANG  Shuang LIU  Tariq S. DURRANI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/02/02
      Page(s):
    1453-1456

    A rich and robust representation for scene characters plays a significant role in automatically understanding the text in images. In this letter, we focus on the issue of feature representation, and propose a novel encoding method named bilateral convolutional activations encoded with Fisher vectors (BCA-FV) for scene character recognition. Concretely, we first extract convolutional activation descriptors from convolutional maps and then build a bilateral convolutional activation map (BCAM) to capture the relationship between the convolutional activation response and the spatial structure information. Finally, in order to obtain the global feature representation, the BCAM is injected into FV to encode convolutional activation descriptors. Hence, the BCA-FV can effectively integrate the prominent features and spatial structure information for character representation. We verify our method on two widely used databases (ICDAR2003 and Chars74K), and the experimental results demonstrate that our method achieves better results than the state-of-the-art methods. In addition, we further validate the proposed BCA-FV on the “Pan+ChiPhoto” database for Chinese scene character recognition, and the experimental results show the good generalization ability of the proposed BCA-FV.

  • Pedestrian Detectability Estimation Considering Visual Adaptation to Drastic Illumination Change

    Yuki IMAEDA  Takatsugu HIRAYAMA  Yasutomo KAWANISHI  Daisuke DEGUCHI  Ichiro IDE  Hiroshi MURASE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/02/20
      Page(s):
    1457-1461

    We propose a method for estimating pedestrian detectability that considers the driver's visual adaptation to drastic illumination change, which has not been studied in previous works. We assume that the driver's visual characteristics change in proportion to the elapsed time after an illumination change. As a solution, we construct multiple estimators corresponding to different elapsed periods, and estimate the detectability by switching among them according to the elapsed period. To evaluate the proposed method, we construct an experimental setup that presents a participant with illumination changes, and conduct a preliminary simulated experiment to measure and estimate the pedestrian detectability according to the elapsed period. Results show that the proposed method can estimate the detectability accurately after a drastic illumination change.

  • Real-Time Approximation of a Normal Distribution Function for Normal-Mapped Surfaces

    Han-sung SON  JungHyun HAN  

     
    LETTER-Computer Graphics

      Publicized:
    2018/02/06
      Page(s):
    1462-1465

    This paper proposes to pre-compute approximate normal distribution functions and store them in textures so that real-time applications can process complex specular surfaces simply by sampling the textures. The proposed method is compatible with GPU pipeline-based algorithms, and rendering is completed in real time. The experimental results show that the features of complex specular surfaces, such as the glinty appearance of leather and metallic flakes, are successfully reproduced.