Junki OSHIBA Motoi IWATA Koichi KISE
Recently, deep learning for guided image generation has been progressing. Many methods have been proposed to generate animations of facial expression change from a single face image by transferring facial expression information to that image. In particular, methods that use facial landmarks as the facial expression information can generate a variety of facial expressions. However, most methods target human faces rather than anime characters. Moreover, when we applied several existing methods to anime characters by training them on an anime character face dataset, they generated images with noise, even in regions where there was no change. The first order motion model (FOMM) is an image generation method that takes two images as input and transfers the facial expression or pose of one to the other. By explicitly calculating the difference between the two images based on optical flow, FOMM can generate images with low noise in the unchanged regions. In the following, we focus on face image generation in FOMM. FOMM cannot use facial landmarks as a facial expression target, because the appearances of a face image and of facial landmarks are quite different. Therefore, we propose an advanced FOMM method that uses facial landmarks as the facial expression target. In the proposed method, we change the input data and data flow to accept facial landmarks. Additionally, to generate face images whose expressions follow the target landmarks more closely, we introduce the landmark estimation loss, which is computed by comparing the landmarks detected from the generated image with the target landmarks. Our experiments on an anime character face image dataset demonstrated that our method is effective for landmark-guided face image generation for anime characters. Furthermore, our method outperformed other methods quantitatively and generated face images with less noise.
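The landmark estimation loss described above can be illustrated with a minimal sketch. The abstract only says that detected and target landmarks are compared, so the mean Euclidean distance used here, the function name `landmark_estimation_loss`, and the coordinate values are all assumptions made for illustration, not the authors' formulation.

```python
import math

def landmark_estimation_loss(detected, target):
    """Mean Euclidean distance between detected and target landmarks.

    Both arguments are sequences of (x, y) points in the same order.
    The choice of distance is an assumption; the abstract does not state it.
    """
    if len(detected) != len(target) or not target:
        raise ValueError("landmark lists must be non-empty and of equal length")
    distances = [
        math.hypot(dx - tx, dy - ty)
        for (dx, dy), (tx, ty) in zip(detected, target)
    ]
    return sum(distances) / len(distances)

# Hypothetical landmarks: the first point is off by (3, 4), the second matches.
target = [(10.0, 10.0), (20.0, 10.0)]
detected = [(13.0, 14.0), (20.0, 10.0)]
print(landmark_estimation_loss(detected, target))
```

In a training setting the same quantity would be computed on differentiable tensors by a landmark detector applied to the generated image; the plain-Python version only shows the comparison itself.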
Yu SONG Xu QIAO Yutaro IWAMOTO Yen-Wei CHEN Yili CHEN
Accurate and automatic quantitative cephalometric analysis is of great importance in orthodontics. The fundamental step in cephalometric analysis is to annotate anatomical landmarks of interest on X-ray images. Fully automatic computer-aided annotation remains an open problem. In this paper, we propose an efficient deep learning-based coarse-to-fine approach to accurate landmark detection. In the coarse detection step, we train a deep learning-based deformable transformation model on the training samples. We then register each test image to the reference image (one of the training images) using the trained model to predict the coarse locations of the landmarks on the test image. In this way, regions of interest (ROIs) that include the landmarks can be located. In the fine detection step, we use trained deep convolutional neural networks (CNNs) to detect landmarks in ROI patches. Each landmark has one corresponding neural network, which directly regresses the landmark's coordinates. The fine step can be considered a refinement of the coarse detection step. We validated the proposed method on the public dataset from the 2015 International Symposium on Biomedical Imaging (ISBI) grand challenge. Compared with the state-of-the-art method, we not only achieved comparable detection accuracy (the mean radial error is about 1.0-1.6 mm), but also greatly shortened the computation time (4 seconds per image).
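As a rough illustration of the coarse-to-fine bookkeeping and of the reported metric, the sketch below maps a patch-relative regression output back to image coordinates and computes a mean radial error in millimetres. The function names, the square patch centred on the coarse estimate, and the `mm_per_pixel` value are assumptions for the example; the abstract does not specify them.

```python
import math

def roi_origin(coarse_xy, patch_size):
    """Top-left corner of a square ROI centred on the coarse landmark estimate."""
    half = patch_size // 2
    return (coarse_xy[0] - half, coarse_xy[1] - half)

def patch_to_image(coarse_xy, patch_xy, patch_size):
    """Convert coordinates regressed inside the ROI patch to image coordinates."""
    ox, oy = roi_origin(coarse_xy, patch_size)
    return (ox + patch_xy[0], oy + patch_xy[1])

def mean_radial_error(predicted, truth, mm_per_pixel):
    """Mean Euclidean distance between predictions and ground truth, in mm."""
    errors = [
        math.hypot(px - tx, py - ty) * mm_per_pixel
        for (px, py), (tx, ty) in zip(predicted, truth)
    ]
    return sum(errors) / len(errors)

# Hypothetical values: a coarse estimate, a fine regression inside a 64-pixel
# patch, and an assumed pixel spacing of 0.1 mm.
fine = patch_to_image((200, 300), (30.0, 34.0), 64)
print(fine)
print(mean_radial_error([(103.0, 104.0)], [(100.0, 100.0)], 0.1))
```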
Siya BAO Tomoyuki NITTA Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose a safe and comprehensive route finding algorithm for pedestrians based on lighting and landmark conditions. Safety and comprehensiveness can be predicted by five indicators: (1) lighting conditions, (2) landmark visibility, (3) landmark effectiveness, (4) turning counts along a route, and (5) road widths. We first investigate the impact of these five indicators on pedestrians' perceptions of safety and comprehensiveness during route finding. We then propose a route finding algorithm for pedestrians. In the algorithm, we design a score based on indicators (1), (2), (3), and (5) and introduce a turning count reduction strategy for indicator (4). In this way, the algorithm finds a safe and comprehensive route. In particular, we design the daytime score and the nighttime score differently so that an appropriate route is found for each time period. Experimental simulation results demonstrate that the proposed algorithm obtains higher scores than several existing algorithms. Questionnaire results also demonstrate that the proposed algorithm is able to find safe and comprehensive routes for pedestrians in real environments.
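The idea of scoring routes differently by day and by night can be sketched as follows. The weights, the 0-to-1 indicator values, the averaging over segments, and the use of the turning count only as a tie-breaker are all hypothetical stand-ins; the abstract does not give the actual score design or the turning count reduction strategy.

```python
# Hypothetical weights for indicators (1), (2), (3) and (5); not from the paper.
DAY_WEIGHTS = {"lighting": 0.1, "visibility": 0.4, "effectiveness": 0.3, "width": 0.2}
NIGHT_WEIGHTS = {"lighting": 0.5, "visibility": 0.2, "effectiveness": 0.1, "width": 0.2}

def segment_score(segment, weights):
    """Weighted sum of the indicator values (each assumed to lie in [0, 1])."""
    return sum(weights[name] * segment[name] for name in weights)

def route_score(route, weights):
    """Average segment score along the route."""
    segments = route["segments"]
    return sum(segment_score(s, weights) for s in segments) / len(segments)

def best_route(routes, is_night):
    """Pick the highest-scoring route; fewer turns wins a tie (an assumption)."""
    weights = NIGHT_WEIGHTS if is_night else DAY_WEIGHTS
    return max(routes, key=lambda r: (route_score(r, weights), -r["turns"]))

routes = [
    {"name": "A", "turns": 2, "segments": [
        {"lighting": 0.9, "visibility": 0.2, "effectiveness": 0.2, "width": 0.5}]},
    {"name": "B", "turns": 3, "segments": [
        {"lighting": 0.2, "visibility": 0.9, "effectiveness": 0.8, "width": 0.5}]},
]
print(best_route(routes, is_night=False)["name"])
print(best_route(routes, is_night=True)["name"])
```

With these made-up numbers the landmark-rich route B scores higher in the daytime, while the well-lit route A scores higher at night.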
Truc Hung NGO Yen-Wei CHEN Naoki MATSUSHIRO Masataka SEO
Facial paralysis is a common clinical condition occurring in 30 to 40 patients per 100,000 people per year. A quantitative tool to support medical diagnosis is necessary. This paper proposes a simple, visual and robust method that objectively measures the degree of facial paralysis using spatiotemporal features. The main contribution of this paper is an effective spatiotemporal feature extraction method based on landmark tracking. Our method overcomes drawbacks of other techniques, such as the influence of irrelevant regions, noise, illumination changes and long processing time. In addition, the method is simple and visual: its simplicity reduces processing time, and the movements of the landmarks, which relate to muscle movement ability, can be visualized. The visualization therefore helps reveal regions of serious facial paralysis. In terms of recognition rate, experimental results show that our proposed method outperformed the other techniques tested on a dynamic facial expression image database.
Jung-In LEE Jeung-Yoon CHOI Hong-Goo KANG
There has been steady demand for a speech segmentation method that can support various speech applications. Conventional segmentation algorithms show reliable performance, but they require a sufficiently large training database. This letter proposes a manner class segmentation method based on the acoustic event and landmark detection used in a knowledge-based speech recognition system. Measurements of sub-band abruptness and additional parameters are used to detect the acoustic events. Candidate manner classes are segmented from the acoustic events and determined based on knowledge of acoustic phonetics and on acoustic parameters. The manners vowel/glide, nasal, fricative, stop burst, stop closure, and silence are segmented in this system. In total, 71% of manner classes are correctly segmented with 20-ms error boundaries.
Safi-Ullah NASIR Tae-Hyung KIM
Computing the level of trust between two indirectly connected users in an online social network (OSN) is a problem that has received considerable attention from researchers in recent years. Most algorithms focus on finding the most accurate prediction of trust; however, little work has been done to make them fast enough to be applied to today's very large OSNs. To address this need, we propose a method for fast trust computation that is suitable for very large social networks. Our method uses min-max trust propagation strategies along with the landmark-based method. The path strength of every node is pre-computed to and from a small set of reference users, or landmarks. Using these pre-computed values, we estimate the strength of trust paths from the source user to the in-neighbors of the target user. The final trust estimate is obtained by aggregating information from the most reliable in-neighbors of the target user. We also describe how the landmark-based method can be used for fast trust computation under other trust propagation models. Experiments on a variety of real social network datasets verify the efficiency and accuracy of our method.
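The landmark idea above can be sketched under one common reading of min-max propagation: the strength of a path is its weakest edge, and the strength between two users is that of their strongest path. Pre-computing strengths to and from each landmark then gives a lower-bound estimate for any pair. The graph, the function names, and this reading of the propagation rule are assumptions for illustration; the aggregation over the target's in-neighbors is not shown.

```python
import heapq

def path_strengths(graph, source):
    """Strongest-path (max of min edge trust) value from source to each node."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if strength < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, trust in graph.get(node, {}).items():
            candidate = min(strength, trust)
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best

def reverse(graph):
    """Graph with every edge direction flipped."""
    rev = {}
    for node, edges in graph.items():
        rev.setdefault(node, {})
        for neighbor, trust in edges.items():
            rev.setdefault(neighbor, {})[node] = trust
    return rev

def precompute(graph, landmarks):
    """Strengths from every node to each landmark, and from each landmark out."""
    rev = reverse(graph)
    to_lm = {lm: path_strengths(rev, lm) for lm in landmarks}
    from_lm = {lm: path_strengths(graph, lm) for lm in landmarks}
    return to_lm, from_lm

def estimate(to_lm, from_lm, source, target):
    """Best landmark-routed strength; a lower bound on the true strength."""
    return max(
        (min(to_lm[lm].get(source, 0.0), from_lm[lm].get(target, 0.0)) for lm in to_lm),
        default=0.0,
    )

# Hypothetical trust graph with a single landmark "L".
graph = {"a": {"b": 0.9, "c": 0.4}, "b": {"L": 0.8}, "c": {"L": 0.5}, "L": {"d": 0.7}, "d": {}}
to_lm, from_lm = precompute(graph, ["L"])
print(estimate(to_lm, from_lm, "a", "d"))
```

A query costs one lookup per landmark instead of a graph search, which is the source of the speed-up; accuracy depends on how well the landmarks cover the strongest paths.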
Masashi KOMORI Hiroko KAMIDE Satoru KAWAMURA Chika NAGAOKA
This study investigated the relationship between social skills and facial asymmetry in facial expressions. Three-dimensional facial landmark data of facial expressions (neutral, happy, and angry) were obtained from Japanese participants (n = 62). Following a facial expression task, each participant completed KiSS-18 (Kikuchi's Scale of Social Skills; Kikuchi, 2007). Using a generalized Procrustes analysis, faces and their mirror-reversed versions were represented as points on a hyperplane. The asymmetry of each individual face was defined as the Euclidean distance between the face and its mirror-reversed face on this plane. The asymmetry level of each individual's neutral face was subtracted from the asymmetry level of a target emotion face, and the difference was defined as the index of “expression asymmetry” for that emotion. Correlation coefficients between KiSS-18 scores and expression asymmetry scores were computed for both happy and angry expressions. Significant negative correlations between KiSS-18 scores and expression asymmetries were found for both expressions. The results indicate that the symmetry of facial expressions increases with higher levels of social skills.
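The asymmetry index and the subtraction that defines expression asymmetry can be illustrated with a simplified sketch. It assumes the landmarks are already aligned so that the facial midline lies in the plane x = 0, which stands in for the generalized Procrustes analysis used in the study; the landmark layout, the pairing of left and right points, and the coordinates are hypothetical.

```python
import math

def mirror(face, pairs):
    """Reflect across the x = 0 plane and swap paired left/right landmarks."""
    reflected = [(-x, y, z) for x, y, z in face]
    mirrored = list(reflected)
    for left, right in pairs:
        mirrored[left], mirrored[right] = reflected[right], reflected[left]
    return mirrored

def asymmetry(face, pairs):
    """Euclidean distance between a face and its mirror-reversed version."""
    mirrored = mirror(face, pairs)
    return math.sqrt(sum(
        (a - b) ** 2
        for point, twin in zip(face, mirrored)
        for a, b in zip(point, twin)
    ))

def expression_asymmetry(emotion_face, neutral_face, pairs):
    """Asymmetry of the emotion face minus that of the neutral face."""
    return asymmetry(emotion_face, pairs) - asymmetry(neutral_face, pairs)

# Index 0: nose tip (midline); 1 and 2: left and right mouth corners.
pairs = [(1, 2)]
neutral = [(0.0, 0.0, 0.0), (-2.0, -3.0, 0.0), (2.0, -3.0, 0.0)]
happy = [(0.0, 0.0, 0.0), (-2.0, -3.0, 0.0), (2.0, -2.0, 0.0)]
print(expression_asymmetry(happy, neutral, pairs))
```

Here the neutral face is perfectly symmetric, so the index equals the asymmetry of the happy face, in which one mouth corner is raised by one unit.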
Jung-In LEE Jeung-Yoon CHOI Hong-Goo KANG
Refinement methods for landmark detection and extraction of articulator-free features for a knowledge-based speech recognition system are described. Sub-band energy difference profiles are used to detect landmarks, with additional parameters used to improve accuracy. For articulator-free feature extraction, duration, relative energy, and silence detection are additionally used to find [continuant] and [strident] features. Vowel, obstruent and sonorant consonant landmarks, and locations of voicing onsets and offsets are detected within a unified framework with 85% accuracy overall. Additionally, 75% and 79% of [continuant] and [strident] features, respectively, are detected from landmarks.
Yuan HU Jingqi YAN Wei LI Pengfei SHI
A robust method is presented for 3D face landmarking under facial pose and expression variations. The method is based on Multi-level Partition of Unity (MPU) Implicits and does not rely on texture, pose, orientation or expression information. The MPU Implicits reconstruct the 3D face surface in a hierarchical way. From lower to higher reconstruction levels, the local shapes are reconstructed gradually according to their significance. For 3D faces, three landmarks (the nose, the left eyehole and the right eyehole) can be detected uniquely by analyzing curvature features at the lower levels. Experimental results on the GavabDB database show that this method is invariant to pose, holes, noise and expression. An overall performance of 98.59% is achieved under pose and expression variations.
Kanji TANAKA Yoshihiko KIMURO Kentaro YAMANO Mitsuru HIRAYAMA Eiji KONDO Michihito MATSUMOTO
This work is concerned with the problem of robot localization using standard RFID tags as landmarks and an RFID reader as a landmark sensor. A main advantage of such an RFID-based localization system is the availability of landmark ID measurement, which trivially solves the data association problem. The main drawback of an RFID system, however, is its low spatial accuracy. The result of this paper is an improvement in the localization accuracy of a standard short-range RFID sensor. One of the main contributions is a machine learning approach in which multiple classifiers are trained to distinguish the RFID-signal features of each location. Another contribution is a design tool for tag arrangement, with which the tag configuration need not be designed manually by the user but can be recommended automatically by the system. The effectiveness of the proposed technique is evaluated experimentally with a real mobile robot and an RFID system.
Ronghua YAN Naoyuki TOKUDA Juichi MIYAMICHI
Unlike the time-consuming contour tracking method of snakes [5], which requires a considerable number of iterated computations before contours are successfully tracked down, we present a faster and more accurate model-based "landmarks" tracking method in which a single iteration of dynamic programming is sufficient to obtain a local minimum of an integral measure of the elastic and image energy functionals. The key lies in choosing a relatively small number of salient "landmarks", or features of objects, rather than their contours as the target of tracking within the image structure. The landmarks, comprising singular points along the model contours, are tracked down within the image structure inside restricted search areas of 41 × 41 pixels whose respective locations in the image structure are dictated by their locations in the model. A Manhattan distance and the template corner detection function of Singh and Shneier [7] are used as the elastic energy and the image energy, respectively. A first approximation to the image contour is obtained by applying the thin-plate spline transformation of Bookstein [2] with these landmarks as fixed points of the transformation; this transformation preserves the global shape information of the model, including the relative configuration of the landmarks and, consequently, the surrounding contours of the model in the image structure. The actual image contours are further tracked down by applying an active edge tracker that uses simplified line search segments, so that individual differences persisting between the mapped model contour and the image contour are substantially eliminated. We have applied our method tentatively to portraits from a class album to demonstrate its effectiveness. Our experiments show that, using only about 11 feature points, our method not only greatly reduces the computation, requiring only 0.94 sec of CPU time on an SGI Indigo2, but also yields more accurate shape representations than those obtained by the snakes method. The method is powerful in problem domains where the model-based approach is applicable, possibly allowing real-time processing, because the most time-consuming algorithm, corner template evaluation, can be easily implemented in parallel processing firmware.
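A single dynamic-programming pass of the kind described above can be sketched for an open chain of landmarks. Each landmark picks one candidate position from its search area so that the sum of image energy and elastic energy is minimal. The form of the elastic term (Manhattan distance between the candidate offset and the model offset of consecutive landmarks), the weight `alpha`, and the lookup table standing in for the corner detector response are assumptions; the abstract does not give these details.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two 2D vectors."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def track_landmarks(model, candidates, image_energy, alpha=1.0):
    """Choose one candidate per landmark minimizing total energy in one DP pass.

    model:        list of (x, y) model landmark positions
    candidates:   candidates[i] is the list of (x, y) options for landmark i
    image_energy: dict mapping a candidate point to its energy (lower is better)
    """
    n = len(model)
    # cost[i][k]: lowest energy of a chain ending at candidate k of landmark i
    cost = [[image_energy[p] for p in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, n):
        model_offset = (model[i][0] - model[i - 1][0], model[i][1] - model[i - 1][1])
        row_cost, row_back = [], []
        for p in candidates[i]:
            best_j, best_val = None, float("inf")
            for j, q in enumerate(candidates[i - 1]):
                offset = (p[0] - q[0], p[1] - q[1])
                val = cost[i - 1][j] + alpha * manhattan(offset, model_offset)
                if val < best_val:
                    best_j, best_val = j, val
            row_cost.append(best_val + image_energy[p])
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest final candidate.
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][k]
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][k])
        k = back[i][k]
    path.reverse()
    return path, total

# Hypothetical model, candidate positions and image energies.
model = [(0, 0), (10, 0), (20, 0)]
candidates = [[(0, 0), (1, 1)], [(10, 0), (12, 3)], [(20, 0), (21, 1)]]
image_energy = {(0, 0): 0.0, (1, 1): 2.0, (10, 0): 1.0,
                (12, 3): 0.5, (20, 0): 0.0, (21, 1): 3.0}
path, total = track_landmarks(model, candidates, image_energy)
print(path, total)
```

The cost of the pass grows with the number of landmarks times the square of the number of candidates per landmark, which is why a small landmark set and restricted search areas keep the computation short.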