Speaker change detection involves identifying the time indices in an audio stream at which the identity of the speaker changes. This paper proposes novel measures for speaker change detection over the centroid model, which divides the feature space into non-overlapping clusters for effective speaker-change comparison. The centroid model is a computationally efficient variant of the widely used mixture-distribution based background models for speaker recognition. Experiments on both synthetic and real-world data were performed; the results show that the proposed approach is promising compared with the conventional statistical measures.
Andre CAVALCANTE Allan Kardec BARROS Yoshinori TAKEUCHI Noboru OHNISHI
In this letter, a new approach to segmenting depth-of-field (DoF) images is proposed. The methodology is based on a two-stage model of a visual neuron. The first stage is retinal filtering by means of a luminance-normalizing non-linearity. The second stage is V1-like filtering using filters estimated by independent component analysis (ICA). The segmented image is generated from the response activity of the neuron, measured in terms of kurtosis. Results demonstrate that the model can discriminate image parts at different levels of depth of field. A comparison with other methodologies and the limitations of the proposed methodology are also presented.
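As a rough illustration of the kurtosis measure mentioned in the abstract above, the following sketch computes the excess kurtosis of a set of filter responses. The function name, the use of population moments, and the choice of the excess variant (subtracting 3) are assumptions; the abstract does not say which definition the authors use.

```python
def excess_kurtosis(responses):
    """Excess kurtosis m4 / m2**2 - 3, using population (1/n) moments."""
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one response")
    mean = sum(responses) / n
    m2 = sum((r - mean) ** 2 for r in responses) / n  # second central moment
    m4 = sum((r - mean) ** 4 for r in responses) / n  # fourth central moment
    if m2 == 0:
        # Constant response: kurtosis is undefined; returning 0.0 is an
        # arbitrary choice made for this sketch.
        return 0.0
    return m4 / (m2 * m2) - 3.0
```

A response vector that is mostly zero with a few large values gives a positive result, while an evenly spread response gives a negative one.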
Zhenfeng SHI Dan LE Liyang YU Xiamu NIU
3D mesh segmentation has become an important research field in computer graphics during the past few decades. Many geometry-based and semantic-oriented approaches for 3D mesh segmentation have been presented. However, only a few algorithms based on Markov Random Fields (MRF) have been presented for 3D object segmentation. In this letter, we present a definition of mesh segmentation as a labeling problem. Inspired by the capability of MRF to combine the geometric information and the topology information of a 3D mesh, we propose a novel 3D mesh segmentation model based on MRF and Graph Cuts. Experimental results show that our MRF-based scheme achieves an effective segmentation.
Yaping HUANG Siwei LUO Shengchun WANG
Railway inspection is important in railway maintenance. There are several tasks in railway inspection, e.g., defect detection and bolt detection. For those inspection tasks, the detection of the rail surface is a fundamental and key issue. In order to detect rail defects and missing bolts, one must know the exact location of the rail surface. To deal with this problem, we propose an efficient Rail Surface Detection (RSD) algorithm that combines boundary and region information in a uniform formulation. Moreover, we reevaluate the rail location by introducing top-down information in the form of a bolt location prior. The experimental results show that the proposed algorithm can detect the rail surface efficiently.
Sho ENDO Jun SONODA Motoyuki SATO Takafumi AOKI
The finite difference time domain (FDTD) method has been accelerated on the Cell Broadband Engine (Cell B.E.). However, a problem has arisen: in large-scale analysis, the speedup is limited by the bandwidth of the main memory. In this paper, we propose a novel algorithm and implement FDTD using it. We compared the novel algorithm with region segmentation, demonstrating that the proposed algorithm has a shorter calculation time than region segmentation.
A novel grouping approach to segment text lines from handwritten documents is presented. In this text line segmentation algorithm, for each text line, a text string that connects the center points of the characters in this text line is built. The text lines are then segmented using the resulting text strings. Since the characters of the same text line are situated close together and aligned on a smooth curve, 2D tensor voting is used to reduce the conflicts when building these text strings. First, the text lines are represented by separate connected components. The center points of these connected components are then encoded by second order tensors. Finally, a voting process is applied to extract the curve saliency values and normal vectors, which are used to remove outliers and build the text strings. The experimental results obtained from the test dataset of the ICDAR 2009 Handwriting Segmentation Contest show that the proposed method achieves a high detection rate and recognition accuracy.
Andrew FINCH Keiji YASUDA Hideo OKUMA Eiichiro SUMITA Satoshi NAKAMURA
The contribution of this paper is two-fold. Firstly, we conduct a large-scale real-world evaluation of the effectiveness of integrating an automatic transliteration system with a machine translation system. A human evaluation is usually preferable to an automatic evaluation, and especially so in this case, since the common machine translation evaluation methods are affected by the length of the translations they are evaluating, often being biased towards translations in terms of their length rather than the information they convey. We evaluate our transliteration system on data collected in field experiments conducted all over Japan. Our results conclusively show that using a transliteration system can improve machine translation quality when translating unknown words. Our second contribution is to propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the overfitting problem inherent in maximum likelihood training. We demonstrate the effectiveness of our Bayesian segmentation by using it to build a translation model for a phrase-based statistical machine translation (SMT) system trained to perform transliteration by monotonic transduction from character sequence to character sequence. The Bayesian segmentation was used to construct a phrase-table, and we compared the quality of this phrase-table to one generated in the usual manner by the state-of-the-art GIZA++ word alignment process used in combination with phrase extraction heuristics from the MOSES statistical machine translation system. Both phrase-tables were used to perform transliteration generation within an identical framework.
In our experiments on English-Japanese data from the NEWS2010 transliteration generation shared task, we used our technique to bilingually co-segment the training corpus. We then derived a phrase-table from the segmentation in the sample taken at the final iteration of the training procedure, and the resulting phrase-table was used as a direct substitute for the phrase-table extracted by using GIZA++/MOSES. The phrase-table resulting from our Bayesian segmentation model was approximately 30% smaller than that produced by the SMT system's training procedure, and gave an increase in transliteration quality measured in terms of both word accuracy and F-score.
Ryousei TAKANO Tomohiro KUDOH Yuetsu KODAMA Fumihiro OKAZAKI
Packet pacing is a well-known technique for reducing the short-time-scale burstiness of traffic, and software-based packet pacing has been categorized into two approaches: the timer interrupt-based approach and the gap packet-based approach. The former was originally hard to implement for Gigabit class networks because it requires the operating system to handle too frequent periodic timer interrupts, thus incurring a large overhead. On the other hand, a gap packet-based packet pacing mechanism achieves precise pacing without depending on the timer resolution. However, in order to guarantee the accuracy of rate control, the system must be able to transmit packets at the wire rate. In this paper, we propose a high-resolution timer-based packet pacing mechanism that determines the transmission timing of packets by using a sub-microsecond resolution timer. The high-resolution timer is a light-weight mechanism compared to the traditional low-resolution periodic timer. With recent progress in hardware protocol offload technologies and multicore-aware network protocol stacks, we believe high-resolution timer-based packet pacing has become practical. Our experimental results show that the proposed mechanism can work on a wider range of systems without degrading the accuracy of rate control. However, a higher CPU load is observed when the number of traffic classes increases, compared to a gap packet-based pacing mechanism.
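The arithmetic behind the two software pacing approaches described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, and per-frame overhead such as preamble and inter-frame gap is ignored (an assumption made to keep the example short).

```python
def send_interval_us(packet_bytes, target_bps):
    """Timer-based pacing: microseconds between successive packet starts."""
    if target_bps <= 0:
        raise ValueError("target rate must be positive")
    return packet_bytes * 8 * 1e6 / target_bps


def gap_bytes(packet_bytes, wire_bps, target_bps):
    """Gap packet-based pacing: bytes of gap to send after each real packet
    so that the average rate on a saturated link equals target_bps."""
    if not 0 < target_bps <= wire_bps:
        raise ValueError("target rate must be in (0, wire rate]")
    return packet_bytes * (wire_bps / target_bps - 1)
```

For a 1500-byte packet paced to 500 Mbit/s on a 1 Gbit/s link, the interval works out to 24 microseconds and the gap to 1500 bytes. Intervals of this size suggest why a coarse periodic timer is a poor fit for Gigabit-class pacing.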
Je-Hoon LEE Young-Jun SONG Sang-Choon KIM
This paper presents a self-timed SRAM system employing a new memory segmentation technique that divides memory cell arrays into multiple regions based on their latency, not the size of the memory cell array. This is the main difference between the proposed memory segmentation technique and the conventional method. Consequently, the proposed method provides a more efficient way to reduce the memory access time. We also propose an architecture for the dummy cell and completion signal generator used in the handshaking protocol. We synthesized an 8 MB SRAM system consisting of 16 512K memory blocks using a Hynix 0.35-µm CMOS process. Our implementation shows 15% higher performance compared to the other systems. Our implementation results also show a trade-off between area overhead and performance depending on the number of memory segments.
Luis Ricardo SAPAICO Hamid LAGA Masayuki NAKAJIMA
We propose a system that, using video information, segments the mouth region from a face image and then detects the protrusion of the tongue from inside the oral cavity. Initially, under the assumption that the mouth is closed, we detect both mouth corners. We use a set of specifically oriented Gabor filters for enhancing horizontal features corresponding to the shadow existing between the upper and lower lips. After applying the Hough line detector, the extremes of the line that was found are regarded as the mouth corners. The detection rate for mouth corner localization is 85.33%. These points are then input to a mouth appearance model which fits a mouth contour to the image. By segmenting its bounding box we obtain a mouth template. Next, considering the symmetric nature of the mouth, we divide the template into right and left halves. Thus, our system makes use of three templates. We track the mouth in the following frames using normalized correlation for mouth template matching. Changes happening in the mouth region are directly described by the correlation value, i.e., the appearance of the tongue on the surface of the mouth will cause a decrease in the correlation coefficient through time. These coefficients are used for detecting the tongue protrusion. The right and left tongue protrusion positions are detected by analyzing similarity changes between the right and left half-mouth templates and the currently tracked ones. Detection rates under the default parameters of our system are 90.20% for the tongue protrusion regardless of the position, and 84.78% for the right and left tongue protrusion positions. Our results demonstrate the feasibility of real-time tongue protrusion detection in vision-based systems and motivate further investigation of the usage of this new modality in human-computer communication.
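The template-matching step described above can be illustrated with a minimal sketch of normalized correlation between a stored template and the currently tracked patch. The zero-mean form used here, the flattened pixel lists, the function names, and the threshold value of 0.8 are all assumptions for illustration; the abstract does not give the exact formula or the system's parameters.

```python
import math


def normalized_correlation(template, patch):
    """Zero-mean normalized correlation of two equal-length pixel lists."""
    if len(template) != len(patch) or not template:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(template)
    mean_t = sum(template) / n
    mean_p = sum(patch) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in zip(template, patch))
    den = math.sqrt(
        sum((t - mean_t) ** 2 for t in template)
        * sum((p - mean_p) ** 2 for p in patch)
    )
    # A constant image has no structure to correlate; report 0.0 in that case.
    return num / den if den > 0 else 0.0


def appearance_changed(template, patch, threshold=0.8):
    """Flag a change when correlation drops below a hypothetical threshold."""
    return normalized_correlation(template, patch) < threshold
```

Because the measure is invariant to uniform brightness and contrast changes, a drop in its value reflects a change in appearance, such as the tongue becoming visible, rather than a change in lighting.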
Masakazu MURATA Yoshiaki TANIGUCHI Go HASEGAWA Hirotaka NAKANO
In the present paper, we propose an object tracking method called scenario-type hypothesis object tracking. In the proposed method, an indoor monitoring region is divided into multiple closed micro-cells using sensor nodes that can detect objects and their moving directions. Sensor information is accumulated in a tracking server through wireless multihop networks, and object tracking is performed at the tracking server. In order to estimate the trajectory of objects from sensor information, we introduce a novel concept of the virtual world, which consists of virtual micro-cells and virtual objects. Virtual objects are generated, transferred, and deleted in virtual micro-cells according to sensor information. In order to handle specific movements of objects in micro-cells, such as slowdown of passing objects in a narrow passageway, we also consider the generation of virtual objects according to interactions among virtual objects. In addition, virtual objects are generated when the tracking server estimates loss of sensor information in order to decrease the number of object tracking failures. Through simulations, we confirm that the ratio of successful tracking is improved by up to 29% by considering interactions among virtual objects. Furthermore, the tracking performance is improved by up to 6% by considering loss of sensor information.
Xiaolin ZHAO Xin YU Liguo SUN Kangqiao HU Guijin WANG Li ZHANG
Tracking a non-rigid object in a video in the presence of background clutter and partial occlusion is challenging. We propose a non-rigid object-tracking paradigm that repeatedly detects and associates saliency regions. Saliency region segmentation is performed in each frame. The segmentation results provide rich spatial support for tracking and make reliable, drift-free tracking of non-rigid objects possible. The precise object region is obtained simultaneously by associating the saliency regions using two independent observers. Our formulation is quite general, and other salient-region segmentation algorithms can also be used. Experimental results show that this paradigm can effectively handle tracking problems involving objects with rapid movement, rotation and partial occlusion.
Michael PAUL Andrew FINCH Eiichiro SUMITA
This paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source language text in order to improve the translation quality of statistical machine translation (SMT) approaches. The method can be applied to any language pair in which the source language is unsegmented and the target language segmentation is known. In the first step, an iterative bootstrap method is applied to learn multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating five Asian languages into English revealed that the proposed method of integrating multiple segmentation schemes outperforms SMT models trained on any of the learned word segmentations and performs comparably to available monolingually built segmentation tools.
Somying THAINIMIT Chirayuth SREECHOLPECH Vuttipong AREEKUL Chee-Hung Henry CHU
Iris recognition is an important biometric method for personal identification. The accuracy of an iris recognition system depends highly on the success of the iris segmentation step. In this paper, a robust and accurate iris segmentation algorithm for close-up NIR eye images is developed. The proposed method addresses the problem of differing characteristics across iris databases by using local image properties. A precise pupil boundary is located with adaptive thresholding combined with a gradient-based refinement approach. A new criterion, called the local signal-to-noise ratio (LSNR) of an edge map of an eye image, is proposed for localization of the iris's outer boundary. The boundary is modeled with a weighted circular integral of LSNR optimization technique. The proposed method was evaluated on multiple iris databases. The obtained results demonstrate that the proposed iris segmentation method is robust and desirable. The proposed method accurately segments the iris region, excluding eyelids, eyelashes and light reflections, across multiple iris databases without parameter tuning. The proposed iris segmentation method reduced the false negative rate of the iris recognition system by half, compared to results obtained using Masek's method.
Sihyoung LEE Sunil CHO Yong Man RO
The active shape model (ASM) has been widely adopted by automated bone segmentation approaches for radiographic images. In radiographic images of the distal radius, multiple edges are often observed in the near vicinity of the bone, typically caused by the presence of thin soft tissue. The presence of multiple edges decreases the segmentation accuracy when segmenting the distal radius using ASM. In this paper, we propose an enhanced distal radius segmentation method that makes use of a modified version of ASM, reducing the number of segmentation errors. To mitigate segmentation errors, the proposed method emphasizes the presence of the bone edge and downplays the presence of a soft tissue edge by making use of dual-energy X-ray absorptiometry (DXA). To verify the effectiveness of the proposed segmentation method, experiments were performed with 30 distal radius patient images. For the images used, compared to ASM-based segmentation, the proposed method improves the segmentation accuracy by 47.4% (from 0.974 mm to 0.512 mm).
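The reported improvement can be checked from the two figures given, assuming they are segmentation errors in millimetres and that the percentage is the relative reduction in that error:

```latex
\[
\frac{0.974 - 0.512}{0.974} = \frac{0.462}{0.974} \approx 0.474 = 47.4\%
\]
```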
Yoshiki YUNBE Masayuki MIYAMA Yoshio MATSUDA
This paper describes an affine motion estimation processor for real-time video segmentation. The processor estimates the dominant motion of a target region with affine parameters. The processor is based on the Pseudo-M-estimator algorithm. Introduction of an image division method and a binary weight method to the original algorithm reduces data traffic and hardware costs. A pixel sampling method is proposed that reduces the clock frequency by 50%. The pixel pipeline architecture and a frame overlap method double throughput. The processor was prototyped on an FPGA; its function and performance were subsequently verified. It was also implemented as an ASIC. The core size is 5.0 × 5.0 mm² in a 0.18 µm standard-cell process. The ASIC can accommodate VGA 30 fps video with a 120 MHz clock frequency.
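To put the final figures in perspective, a rough per-pixel cycle budget can be worked out, assuming VGA means 640 × 480 pixels per frame. This is only a budget implied by the stated frame rate and clock, not a description of how the processor actually spends its cycles:

```latex
\[
640 \times 480 \times 30 = 9{,}216{,}000 \ \text{pixels/s},
\qquad
\frac{120 \times 10^{6}\ \text{cycles/s}}{9{,}216{,}000\ \text{pixels/s}} \approx 13.0\ \text{cycles per pixel}
\]
```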
Suk Tae SEO In Keun LEE Seo Ho SON Hyong Gun LEE Soon Hak KWON
We propose a simple but effective image segmentation method based not on thresholding but on a merging strategy that evaluates the joint probability of gray levels in a co-occurrence matrix. The effectiveness of the proposed method is shown through a segmentation experiment.
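For readers unfamiliar with the underlying quantity, the sketch below computes the joint probability of gray-level pairs from a co-occurrence matrix. It covers only that building block, not the merging strategy, which the abstract does not detail. The function name, the list-of-rows image format, and the default horizontal-neighbor offset are assumptions.

```python
def cooccurrence_probabilities(image, levels, offset=(0, 1)):
    """Joint probability P[i][j] that a pixel with gray level i has a
    neighbor with gray level j at the given (row, column) offset.

    image: list of rows, each a list of ints in range(levels).
    """
    d_row, d_col = offset
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r, row in enumerate(image):
        for c, level in enumerate(row):
            r2, c2 = r + d_row, c + d_col
            # Skip pairs whose neighbor falls outside the image.
            if 0 <= r2 < len(image) and 0 <= c2 < len(image[r2]):
                counts[level][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("no pixel pairs for this offset")
    return [[n / total for n in row] for row in counts]
```

For example, the 2 × 3 binary image `[[0, 0, 1], [0, 1, 1]]` has four horizontal pairs: one (0, 0), two (0, 1), and one (1, 1).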
This paper combines the LBP operator and the active contour model. It introduces a salient gradient vector flow snake (SGVF snake), based on a novel edge map generated from the salient boundary point image (SBP image). The MDGVM criterion process helps to reduce feature detail and background noise while retaining the salient boundary points. The resultant SBP image, used as an edge map, gives powerful support to the SGVF snake because of the inherent combination of the intensity, gradient and texture cues. Experiments show that the MDGVM process is highly efficient in reducing outliers and that the SGVF snake is a large improvement over the GVF snake for contour detection, especially in natural images with low contrast and small texture background.
Hyunjin PARK Alfred HERO Peyton BLAND Marc KESSLER Jongbum SEO Charles MEYER
A good abdominal probabilistic atlas can provide important information to guide segmentation and registration applications in the abdomen. Here we build and test probabilistic atlases using 24 abdominal CT scans with available expert manual segmentations. Atlases are built by picking a target and mapping other training scans onto that target and then summing the results into one probabilistic atlas. We improve our previous abdominal atlas by 1) choosing a least biased target as determined by a statistical tool, i.e. multidimensional scaling operating on bending energy, 2) using a better set of control points to model the deformation, and 3) using higher information content CT scans with visible internal liver structures. One atlas is built in the least biased target space and two atlases are built in other target spaces for performance comparisons. The value of an atlas is assessed based on the resulting segmentations; whichever atlas yields the best segmentation performance is considered the better atlas. We consider two segmentation methods of abdominal volumes after registration with the probabilistic atlas: 1) simple segmentation by atlas thresholding and 2) application of a Bayesian maximum a posteriori method. Using jackknifing we measure the atlas-augmented segmentation performance with respect to manual expert segmentation and show that the atlas built in the least biased target space yields better segmentation performance than atlases built in other target spaces.
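The atlas construction and the simpler of the two segmentation methods described above can be sketched as follows. The sketch assumes the training masks have already been registered to the target space and flattened to equal-length lists of 0/1 labels; the registration itself is not shown. The function names are hypothetical, and dividing the voxel-wise sum by the number of scans to obtain a probability, as well as the 0.5 threshold, are assumptions.

```python
def build_atlas(registered_masks):
    """Voxel-wise average of binary organ masks mapped to one target space."""
    if not registered_masks:
        raise ValueError("need at least one mask")
    size = len(registered_masks[0])
    if any(len(mask) != size for mask in registered_masks):
        raise ValueError("all masks must have the same number of voxels")
    n = len(registered_masks)
    # Sum the labels at each voxel, then normalize to a probability.
    return [sum(mask[i] for mask in registered_masks) / n for i in range(size)]


def threshold_segmentation(atlas, threshold=0.5):
    """Label a voxel as organ when its atlas probability reaches threshold."""
    return [1 if p >= threshold else 0 for p in atlas]
```

With four training masks, a voxel labeled as organ in three of them receives probability 0.75 and is kept by a 0.5 threshold.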
Mustafa M. SAMI Masahisa SAITO Shogo MURAMATSU Hisakazu KIKUCHI Takashi SAKU
We have developed a new computer-aided diagnostic system for differentiating oral borderline malignancies in hematoxylin-eosin stained microscopic images. Epithelial dysplasia and carcinoma in-situ (CIS) of oral mucosa are two different borderline grades that are similar to each other, and it is difficult to distinguish between them. A new image processing and analysis method has been applied to a variety of histopathological features and shows the possibility of differentiating the oral cancer borderline grades automatically. The method is based on comparing the drop-shape similarity level in a particular manually selected pair of neighboring rete ridges. It was found that the considered similarity level in dysplasia was higher than that in epithelial CIS, for cases whose pathological diagnoses had been made conventionally by pathologists. The developed image processing method showed good promise for the computer-aided pathological assessment of oral borderline malignancy differentiation in clinical practice.