Ching-Tang HSIEH Mu-Chun SU Chih-Hsu HSU
Speech segmentation plays an important role in reducing the memory requirements and computational complexity of a large-vocabulary continuous speech recognition system. In this paper, we formulate speech segmentation as a two-phase problem. Phase 1 (frame labeling) involves labeling frames of speech data. Frames are classified into three types, (1) silence, (2) consonant and (3) vowel, according to two segmentation features. In phase 2 (syllabic unit segmentation), we apply the concept of transition states to segment continuous speech data into syllabic units based on the labeled frames. A novel class of hyperrectangular composite neural networks (HRCNNs) is used to cluster frames. The HRCNNs integrate the rule-based approach and neural network paradigms; this hybrid system may therefore neutralize the disadvantages of each alternative. The parameters of the trained HRCNNs are utilized to extract both crisp and fuzzy classification rules. In our experiments, a database containing continuous reading-rate Mandarin speech recorded from newscasts was utilized to illustrate the performance of the proposed speaker-independent speech segmentation system. The effectiveness of the proposed segmentation system is confirmed by the experimental results.
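The crisp rules mentioned above can be read as "if every feature of a frame lies inside a hyperrectangle, assign that class". The following is only a minimal sketch of that idea, not the HRCNN itself: the two feature axes, the box bounds in `RULES` and the fallback to silence are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A hyperrectangle is a list of (low, high) bounds, one pair per feature.
Box = List[Tuple[float, float]]


def in_box(x: Sequence[float], box: Box) -> bool:
    """Crisp rule: x lies inside the box on every feature axis."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, box))


def label_frame(x: Sequence[float], rules: Dict[str, List[Box]],
                default: str = "silence") -> str:
    """Return the first class that owns a box containing x, else the default."""
    for label, boxes in rules.items():
        if any(in_box(x, b) for b in boxes):
            return label
    return default


# Hypothetical boxes over two made-up, normalised features.
RULES: Dict[str, List[Box]] = {
    "vowel": [[(0.6, 1.0), (0.0, 0.3)]],
    "consonant": [[(0.1, 0.6), (0.3, 1.0)]],
}

frames = [(0.8, 0.1), (0.3, 0.7), (0.02, 0.05)]
labels = [label_frame(f, RULES) for f in frames]
print(labels)
```

In a trained network the bounds would come from the learned parameters rather than being written by hand.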
Takayuki YASUNO Satoshi SUZUKI Yasuhiko YASUDA
Three dimensional model based coding methods are proposed as next generation image coding methods. These new representations need 3D reconstruction techniques. This paper presents a method that extracts the surfaces of static objects that occlude other objects from a spatiotemporal image captured with straight-line camera motion. We propose the concept of occlusion types and show that the occlusion types are restricted to only eight patterns. Furthermore, we show occlusion type pairs contain information that confirms the existence of surfaces. Occlusion information gives strong cues for segmentation and representation. The method can estimate not only the 3D positions of edge points but also the surfaces bounded by the edge points. We show that combinations of occlusion types contain information that can confirm surface existence. The method was tested successfully on real images by reconstructing flat and curved surfaces. Videos can be hierarchically structured with the method. The method makes various applications possible, such as object selective image communication and object selective video editing.
Hirobumi YAMADA Yasuaki NAKANO
This paper proposes a method for cursive handwritten word recognition. Cursive word recognition generally consists of segmentation of a cursive word, character recognition and word recognition. Traditional approaches detect a single candidate segmentation point between characters and cut the touching characters at that point [1]. However, it is difficult to detect the correct segmentation point between characters in a cursive word, because the form of touching characters varies greatly from case to case. In this research, we determine multiple candidate segmentation points between characters. Character recognition and word recognition then decide which candidate is the most plausible touching point. In the experiment, the recognition rate at the character recognition stage was 75.7%, while the cumulative recognition rate within the best three candidates was 93.7%. In word recognition, the recognition rate was 79.8%, while the cumulative recognition rate within the best five candidates was 91.7% for a lexicon size of 50. The processing speed is about 30 sec/word on a SPARCstation 5.
Satoshi NAOI Misako SUWA Maki YABUKI
The global interpolation method we proposed can extract a handwritten alpha-numeric character pattern even if it overlaps a border. Our method interpolates blank segments in a character after borders are removed by evaluating segment pattern continuity and connectedness globally to produce characters with smooth edges. The main feature of this method is to evaluate global component label connectivity as pattern connectedness. However, it is impossible for the method to interpolate missing superpositioning loop segments, because they lack segment pattern continuity and they have already had global component label connectivity. To solve this problem, we improved the method by adding loop interpolation as a global evaluation. The evaluation of character segment continuity is also improved to achieve higher quality character patterns. There is no database of overlapping characters, so we also propose an evaluation method which generates various kinds of overlapping numerals from an ETL database. Experimental results using these generated patterns showed that the improved global interpolation method is very effective for numbers that overlap a border.
Rachid SAMMOUDA Noboru NIKI Hiromu NISHITANI
In this paper, we present some contributions that improve a previously presented approach to the segmentation of magnetic resonance images of the human brain, based on the unsupervised Hopfield neural network. We formulate the segmentation problem as the minimization of an energy function constructed from two terms: a cost term, which is a sum of squared errors, and a second term, which is temporary noise added to the cost term as an excitation that lets the network escape from certain local minima and come closer to the global minimum. Also, to ensure the convergence of the network and its clinical utility, the minimization is achieved with a step function that permits the network to reach stability, corresponding to a local minimum close to the global minimum, within a prespecified period of time. We present the segmentation results of our approach on data from a patient diagnosed with a metastatic tumor in the brain, and we compare them with those obtained in previous work using Hopfield neural networks, the Boltzmann machine and the conventional ISODATA clustering technique.
AbdelMalek B.C. ZIDOURI Supoj CHINVEERAPHAN Makoto SATO
In this paper we describe a system for off-line recognition of Arabic characters and numerals. It is based on expressing machine-printed Arabic alphanumeric text in terms of strokes obtained by the MCR (Minimum Covering Run) expression. The strokes are rendered meaningful by a labeling process. They are used to detect the baseline and to provide the features necessary for recognition. The selected features proved effective to the extent that, with a simple right-to-left analysis, we could achieve interesting results. Recognition is achieved by matching against reference prototypes designed for the 28 Arabic characters and 10 numerals. The recognition rate is 97%.
A new motion field segmentation algorithm under the 8-parameter motion model is presented, which uses a multipass iterative region-refining technique. The iterative region-refining module consists of seed block detection and subsequent region-refining iterations. An initial estimate of an object's motion is provided by the seed block detection process. This initial estimate is iteratively updated and approaches a reliable mapping parameter set in the region-refining process. A multipass composition of the module makes it possible to detect multiple motions in a scene. Our simulation results confirm that the proposed method successfully partitions an image into independently moving objects within an allowable computation time.
This paper proposes an efficient clustering algorithm for region merging. To speed up the search for the best pair of regions to be merged into one region, the dissimilarity values of all possible pairs of regions are stored in a heap. The best pair can then be found as the element at the root node of the binary tree corresponding to the heap. Since only adjacent pairs of regions can be merged in image segmentation, these constraints on neighboring relations are represented by sorted linked lists. This reduces the computation for updating the dissimilarity values and neighboring relations that are affected by merging the best pair. The proposed algorithm is applied to the segmentation of a monochrome image and range images.
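The heap-based search for the best adjacent pair can be sketched roughly as follows. This is an illustrative simplification, not the paper's algorithm: neighbor relations are kept in Python sets rather than sorted linked lists, outdated heap entries are skipped lazily via version counters instead of being updated in place, and the dissimilarity (absolute difference of region means) is an assumed placeholder.

```python
import heapq


def merge_regions(means, sizes, adjacency, target):
    """Greedily merge the most similar adjacent regions until `target` remain."""
    means, sizes = dict(means), dict(sizes)
    adj = {r: set(n) for r, n in adjacency.items()}
    version = {r: 0 for r in means}  # bumped whenever a region changes
    heap = []

    def push(a, b):
        # Assumed dissimilarity: absolute difference of region means.
        d = abs(means[a] - means[b])
        heapq.heappush(heap, (d, a, b, version[a], version[b]))

    for a in adj:
        for b in adj[a]:
            if a < b:
                push(a, b)

    while len(means) > target and heap:
        _, a, b, va, vb = heapq.heappop(heap)  # root of the heap = best pair
        stale = (a not in means or b not in means
                 or version[a] != va or version[b] != vb)
        if stale:
            continue
        # Merge region b into region a (size-weighted mean).
        total = sizes[a] + sizes[b]
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        version[a] += 1
        for n in adj.pop(b):
            adj[n].discard(b)
            if n != a:
                adj[n].add(a)
                adj[a].add(n)
        del means[b], sizes[b], version[b]
        # Only pairs touching the merged region need new dissimilarities.
        for n in adj[a]:
            push(a, n)
    return means, sizes


# Four regions in a row: 0-1-2-3.
merged_means, merged_sizes = merge_regions(
    means={0: 10.0, 1: 12.0, 2: 50.0, 3: 53.0},
    sizes={0: 1, 1: 1, 2: 1, 3: 1},
    adjacency={0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}},
    target=2,
)
print(merged_means, merged_sizes)
```

Because only pairs adjacent to the merged region are re-inserted, each merge touches a small part of the heap rather than every pair of regions.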
Satoshi NAOI Maki YABUKI Atsuko ASAKAWA Yoshinobu HOTTA
The global interpolation method we propose evaluates segment pattern continuity and connectedness to produce characters with smooth edges while correctly interpreting blank or missing segments based on global label connectivities, e.g., in extracting a handwritten character overlapping a border. Conventional segmentation of characters that overlap a border concentrates on removing the thin border based on known format information rather than on extracting the character. This generates discontinuous segments, which produce distortion due to thinning and errors in direction codes and make the extracted character difficult to recognize. In our method, characters contacting a border are extracted after the border itself is labeled and removed automatically, using a technique devised to extract the wavy and oblique borders that arise in fax communication. The absence of character segments is then interpolated based on segment continuity. Interpolated segments are relabeled and checked for matching against the original labeled pattern. If a match cannot be made, segments are reinterpolated until they can be identified. Experimental results show that global interpolation interprets the absence of character segments correctly and generates characters with smooth edges.
Haisong GU Yoshiaki SHIRAI Minoru ASADA
This paper presents a method for spatial and temporal segmentation of long image sequences which include multiple independently moving objects, based on the Minimum Description Length (MDL) principle. By obtaining an optimal motion description, we extract spatiotemporal (ST) segments in the image sequence, each of which consists of edge segments with similar motions. First, we construct a family of 2D motion models, each of which is completely determined by its specified set of equations. Then, based on these sets of equations we formulate the motion description length in a long sequence. The motion state of one object at one moment is determined by finding the model with shortest description length. Temporal segmentation is carried out when the motion state is found to have changed. At the same time, the spatial segmentation is globally optimized in such a way that the motion description of the entire scene reaches a minimum.
Eiji OHIRA Hirohiko SAGAWA Tomoko SAKIYAMA Masaru OHKI
This paper discusses sign word segmentation methods and extraction of motion features for sign language recognition. Because Japanese sign language grammar has not yet been systematized and because sign language does not have prepositions, it is more difficult to use grammar and meaning information in sign language recognition than in speech recognition. Segmentation significantly improves recognition efficiency, so we propose a method of dividing sign language based on rests and on the envelope and minimum of motion speed. The sign unit corresponding to a sign word is detected based on the divided position using such features as the change of hand shape. Experiments confirmed the validity of word segmentation of sign language based on the temporal structure of motion.
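One ingredient described above, dividing the motion at minima of hand speed, can be illustrated with a small sketch. The speed values and the `max_speed` threshold are invented for the example, and the envelope and rest detection mentioned in the abstract are not modeled here.

```python
def speed_minima(speed, max_speed):
    """Indices of local minima of `speed` whose value is at most `max_speed`.

    A frame counts as a minimum when it is slower than the previous frame
    and no faster than the next one.
    """
    return [
        i for i in range(1, len(speed) - 1)
        if speed[i] < speed[i - 1]
        and speed[i] <= speed[i + 1]
        and speed[i] <= max_speed
    ]


# Hypothetical per-frame hand speeds (arbitrary units).
speed = [0.1, 0.9, 1.4, 0.3, 1.1, 1.6, 0.8, 0.2, 1.0]
cuts = speed_minima(speed, max_speed=0.5)
print(cuts)
```

The returned indices are only candidate division points; the abstract describes using further features, such as changes of hand shape, to decide which of them bound a sign word.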
This paper describes a segmentation method of liver structure from abdominal CT images using a three-layered neural network (NN). Before the NN segmentation, preprocessing is employed to locally enhance the contrast of the region of interest. Postprocessing is also automatically applied after the NN segmentation in order to remove the unwanted spots and smooth the detected boundary. To evaluate the performance of the proposed method, the NN-determined boundaries are compared with those traced by two highly trained surgeons. Our preliminary results show that the proposed method has potential utility in automatic segmentation of liver structure and other organs in the human body.
Mitsu YOSHIMURA Tatsuro SHIMIZU Isao YOSHIMURA
An automatic zip code recognition system for Japanese mail is proposed in this paper. It is assumed that a zip code is composed of three numerals and is required to be written in a specified frame. In actual images, however, the three numerals sometimes extend outside the specified frame and are not clearly separated. Considering this situation, the authors devised a system with two stages, the segmentation stage and the recognition stage. The segmentation stage consists of five steps: setting and adjusting of initial areas for numeral images (figures), calculation of the center of gravity of each figure, search for the horizontal and vertical boundaries of each figure, determination of the final area for each figure, and normalization of the figure in each final area. In the recognition stage, the Localized Arc Pattern Method (Arc method) proposed by Yoshimura et al. (1991) is implemented hierarchically; that is, a simple Arc method is applied first to each figure and a more complex one is applied subsequently unless the figure is identified in the first step. In the recognition process, every figure is judged as a numeral or otherwise rejected. The proposed system was applied to a database provided by the Institute for Post and Telecommunications Policy (IPTP). The segmentation algorithm yielded an adequate result. The recognition algorithm yielded scores as high as 90.6% in correct recognition rate and 0.7% in error rate. The best score of the precision index (P-index) specified by the IPTP was as low as 15.7 for the above mentioned IPTP database, while the score for another IPTP database was 16.9.
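The second segmentation step, calculating the center of gravity of each figure, might look like the sketch below for a binary image stored as nested lists. The data layout and the function name are assumptions, and the other four steps are not shown.

```python
def center_of_gravity(image):
    """Mean (x, y) position of the non-zero pixels, or None if there are none."""
    count = sum_x = sum_y = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return sum_x / count, sum_y / count


# Hypothetical 3x4 binary figure with a 2x2 block of ink.
figure = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cog = center_of_gravity(figure)
print(cog)
```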
Kei TAKIZAWA Daisaku ARITA Michihiko MINOH Katsuo IKEDA
A method for extracting and recognizing character strings from unformed document images, which have inclined character strings and have no structure at all, is described. To process such kinds of unformed documents, previous schemes, which are intended only to deal with documents containing nothing but horizontal or vertical strings of characters, do not work well. Our method is based on the idea that the processes of recognition and extraction of character patterns should operate together, and on the characteristic that the character patterns are located close to each other when they belong to the same string. The method has been implemented and applied to several images. The experimental results show the robustness of our method.
Takashi SAITOH Toshifumi YAMAAI Michiyoshi TACHIKAWA
A system for segmenting document images and ordering text areas is described and applied to complex printed page layouts in both Japanese and English. There is no need to make any assumptions about the shape of blocks; hence the segmentation technique can handle not only skewed images without skew correction but also documents where columns are not rectangular. In this technique, based on the bottom-up strategy, the connected components are extracted from the reduced image and classified according to their local information. The connected components classified as characters are then merged into lines, and the lines are merged into areas. Extracted text areas are classified as body, caption, header or footer. A tree graph of the layout of the body texts is made, and the texts are ordered by preorder traversal of the graph. We introduce the concept of an influence range of each node and a procedure for handling titles, thus obtaining good results on various documents. The total system is fast and compact.
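Ordering text areas by preorder traversal of a layout tree can be sketched as below. The tree shape, the node names and the assumption that each node's children are already stored in reading order are illustrative only; the influence range and title handling from the abstract are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def reading_order(node: Node) -> List[str]:
    """Preorder traversal: visit the node first, then each child subtree in turn."""
    order = [node.name]
    for child in node.children:
        order.extend(reading_order(child))
    return order


# Hypothetical layout: a title above two columns of body text.
page = Node("page", [
    Node("title", [
        Node("column 1", [Node("para 1"), Node("para 2")]),
        Node("column 2", [Node("para 3")]),
    ]),
])
order = reading_order(page)
print(order)
```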
Hsiao-Jing CHEN Yoshiaki SHIRAI
A method is presented to perform image segmentation by accumulatively observing apparent motion in a long image sequence of a dynamic scene. In each image in the sequence, locations are grouped into small patches of approximately uniform optical flow. To reduce the noise in computed flow vectors, a local image motion vector of each patch is computed by averaging flow vectors in the corresponding patches in several images. A segment contains patches belonging to the same 3-D plane in the scene. Initial segments are obtained in the image, and then an attempt to merge or split segments is iterated to update the segments. In order to remove inherent ambiguities in motion-based segmentation, temporal coherence between the local image motion of a patch and the apparent motion of every plane is investigated over a long time. In each image, a patch is grouped into the segment of the plane whose apparent motion is temporally most coherent with the local image motion of the patch. When apparent motions of two planes are temporally incoherent, segments of the planes are retained as individual ones.
Analysis of satellite images requires classification of image objects. Since different categories may have almost the same brightness or features in high-dimensional remote sensing data, many object categories overlap with each other. How to segment the object categories accurately is still an open question. It is widely recognized that the assumptions required by many classification methods (maximum likelihood estimation, etc.) are suspect for textural features based on image pixel brightness. We propose an image feature based neural network approach for the segmentation of AVHRR images. The learning algorithm is a modified backpropagation with gain and weight decay, since feedforward networks using the backpropagation algorithm have been generally successful and enjoy wide popularity. Destructive algorithms that adapt the neural architecture during the training have been developed. A classification accuracy of 100% is reached for a validation data set. The classification result is compared with that of Kohonen's LVQ and of a pixel-by-pixel method based on the basic backpropagation algorithm. Visual investigation of the result images shows that our method not only distinguishes categories with similar signatures very well, but is also robust to noise.
A method of range image segmentation using four Markov random fields (MRFs) is described in this paper. MRFs are used in the depth smoothing, gradient smoothing, edge detection and surface type labeling stages. First, the range image and its gradient images are smoothed in turn, preserving jump and roof edges respectively, using the line process concept. Then jump and roof edges are extracted, combined and refined by penalizing undesirable edge patterns. Finally, curvatures are computed and the surface types are labeled according to the signs of principal curvatures. The surface type labels are refined using winner-takes-all layers in this stage. The final output is a set of regions, each with its exact surface type. An energy function is used to represent the constraints of each stage, and the minimum energy state is found using an iterative method. Several experimental results show the generality of our approach, and the execution speed of the proposed method is faster than that of a typical region merging method. This promises practical applications of our method.
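Labeling a surface type from the signs of the two principal curvatures can be sketched as a small lookup. The label names, the sign convention (positive curvature taken to mean the surface bends upward) and the tolerance `eps` are all assumptions; the abstract does not state which convention or thresholds the method uses, and the MRF refinement is not shown.

```python
def curvature_sign(value, eps):
    """Map a curvature to -1, 0 or +1, treating |value| <= eps as zero."""
    if abs(value) <= eps:
        return 0
    return 1 if value > 0 else -1


# Keys are the two signs in ascending order.
SURFACE_TYPES = {
    (-1, -1): "peak",
    (-1, 0): "ridge",
    (-1, 1): "saddle",
    (0, 0): "plane",
    (0, 1): "valley",
    (1, 1): "pit",
}


def surface_type(k1, k2, eps=1e-3):
    """Label a point from the signs of its principal curvatures k1 and k2."""
    signs = tuple(sorted((curvature_sign(k1, eps), curvature_sign(k2, eps))))
    return SURFACE_TYPES[signs]


print(surface_type(0.5, 0.2), surface_type(-0.5, 0.0), surface_type(0.3, -0.4))
```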
Hironori OKII Noriaki KANEKI Hiroshi HARA Koichi ONO
This paper describes a color segmentation method which is essential for automatic diagnosis of stained images. Using a three-layered neural network model, this method can cope with variation in input images. In this network, a back-propagation algorithm was used for learning, and the training data sets of RGB values were selected between the dark and bright images of normal mammary glands. Features of both normal mammary glands and breast cancer tissues stained with hematoxylin-eosin (HE) staining were segmented into three colors. Segmented results indicate that this network model can successfully extract features at various brightness levels and magnifications as long as HE staining is used. Thus, this color segmentation method can accommodate changes in brightness levels as well as hue values of input images. Moreover, this method is robust to variation in the scale and rotation of the targets to be extracted.
Hsiao-Jing CHEN Yoshiaki SHIRAI Minoru ASADA
A method for detecting multiple rigid motions in images from an optical flow field obtained with multi-scale, multi-orientation filters is proposed. Convolving consecutive gray scale images with a set of eight orientation-selective spatial Gaussian filters yields eight gradient constraint equations for the two components of a flow vector at every location. The flow vector and an uncertainty measure are obtained from these equations. In the neighborhood of a motion boundary, the uncertainty of the flow vectors increases. By using multiple sets of filters of different scales, multiple flow vectors are obtained at every location, from which the one with minimal uncertainty measure is selected. The obtained flow field is then segmented in order to solve the aperture problem and to remove noise without blurring discontinuity in the flow field. Discontinuities are first detected as those locations where flow vectors have relatively larger uncertainty measures. Then similar flow vectors are grouped into regions. By modeling flow vectors, regions are merged to form segments each of which belongs to a planar patch of a rigid object in the scene.
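Eight gradient constraint equations in two unknowns form an over-determined linear system, which is commonly solved by least squares. The sketch below does that with the 2x2 normal equations and reports the sum of squared residuals as a stand-in uncertainty measure. Both the constraint form gx*u + gy*v + gt = 0 and the use of the residual as the uncertainty are assumptions, since the abstract does not define them.

```python
import math


def solve_flow(constraints):
    """Least-squares flow (u, v) from constraints gx*u + gy*v + gt = 0.

    Returns (u, v, residual), or None when the system is degenerate
    (for example, when all gradients are parallel).
    """
    a11 = sum(gx * gx for gx, _, _ in constraints)
    a12 = sum(gx * gy for gx, gy, _ in constraints)
    a22 = sum(gy * gy for _, gy, _ in constraints)
    b1 = -sum(gx * gt for gx, _, gt in constraints)
    b2 = -sum(gy * gt for _, gy, gt in constraints)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return None
    u = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    residual = sum((gx * u + gy * v + gt) ** 2 for gx, gy, gt in constraints)
    return u, v, residual


# Synthetic constraints for eight orientations and a known flow (1.0, -0.5).
true_u, true_v = 1.0, -0.5
constraints = []
for k in range(8):
    theta = k * math.pi / 8
    gx, gy = math.cos(theta), math.sin(theta)
    constraints.append((gx, gy, -(gx * true_u + gy * true_v)))

u, v, residual = solve_flow(constraints)
print(round(u, 6), round(v, 6), residual < 1e-12)
```

With noise-free synthetic constraints the residual is essentially zero; near a motion boundary the constraints would disagree and the residual would grow, which matches the role the abstract gives to the uncertainty measure.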