Aroba KHAN Hernan AGUIRRE Kiyoshi TANAKA
This paper presents two halftoning methods that improve the efficiency of generating structurally similar halftone images using the Structural Similarity Index Measure (SSIM). Proposed Method I reduces the pixel evaluation area by applying a pixel-swapping algorithm within inter-correlated blocks, followed by phase block-shifting. The effect of various initial pixel arrangements is also investigated. Proposed Method II further improves efficiency by applying a bit-climbing algorithm within inter-correlated blocks of the image. Simulation results show that Method I improves both efficiency and image quality when an appropriate initial pixel arrangement is used. Method II reaches better image quality with fewer evaluations than the pixel-swapping algorithm used in Method I and the conventional structure-aware halftoning methods.
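As a rough illustration of how a pixel swap can be scored with SSIM, the sketch below computes a single global SSIM value over a block and keeps a swap only when the score improves. The function names, the global (unwindowed) SSIM, and the greedy acceptance rule are assumptions made for illustration; the abstract does not specify the authors' block handling, block-shifting, or bit-climbing details.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM computed once over the whole block (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def try_swap(halftone, gray, p, q):
    """Swap pixels p and q in place; undo unless the SSIM score improves."""
    before = global_ssim(gray, halftone)
    halftone[p], halftone[q] = halftone[q], halftone[p]
    after = global_ssim(gray, halftone)
    if after <= before:
        halftone[p], halftone[q] = halftone[q], halftone[p]  # revert
        return False
    return True
```

Restricting the candidate pairs `p` and `q` to one block at a time is what would shrink the evaluation area in a scheme of this kind.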
Ali NADIAN GHOMSHEH Alireza TALEBPOUR
In this paper, a new skin detection method that uses pixel color and image regional information is proposed for objectionable image filtering. The method consists of three stages: skin detection, feature extraction, and image classification. Skin detection is implemented in two steps. First, a Sinc function fitted to the skin color distribution in the Cb-Cr chrominance plane is used to detect pixels with skin color properties. Next, to benefit from regional information, it is shown, based on the theory of color image reproduction, that the scattering of skin pixels in the RGB color space can be approximated by an exponential function. This function is incorporated to extract the final, accurate skin map of the image. As objectionable image features, new shape and direction features are extracted along with an area feature. Finally, a Multi-Layer Perceptron trained with the best set of input features is used for filtering images. Experimental results on a dataset of 1600 images show that the regional method improves the pixel-based skin detection rate by 10%. The final classification accuracy of 94.12% is better than that of the other methods compared.
Chen LIU Xin JIN Tianruo ZHANG Satoshi GOTO
High-definition (HD) videos have become increasingly popular on portable devices in recent years. Because of the resolution mismatch between HD video sources and the relatively low-resolution screens of portable devices, HD videos are usually fully decoded and then down-sampled (FDDS) for display, which not only increases the cost in both computational power and memory bandwidth, but also loses the details of the video content. In this paper, an encoder-unconstrained partial decoding scheme for H.264/AVC is presented to solve the problem by decoding only the region related to the object of interest (OOI), which is defined by users. A simplified compression-domain tracking method is used to ensure that the OOI is located in the center of the display area. The decoded partial area (DPA) adaptation, the reference block relocation (RBR), and the co-located temporal Intra prediction (CTIP) methods are proposed to improve the visual quality of the DPA with low complexity. The simulation results show that the proposed partial decoding scheme reduces decoding time by 50.16% on average compared with the full decoding process. The displayed region also presents the original HD granularity of the OOI. The proposed partial decoding scheme is especially useful for displaying HD video on devices for which battery life is a crucial factor.
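To make the idea of decoding only an OOI-centered region more concrete, the following sketch computes a display window centered on a given OOI position and the range of 16x16 macroblocks that overlap it. The function names and the clamping rule are hypothetical; the abstract's DPA adaptation, RBR, and CTIP steps, which handle references that fall outside the window, are not modeled here.

```python
MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples

def display_window(frame_w, frame_h, disp_w, disp_h, ooi_cx, ooi_cy):
    """Return (x0, y0, x1, y1) of a display-sized window centered on the OOI,
    clamped so that it stays inside the frame (x1, y1 are exclusive)."""
    x0 = min(max(ooi_cx - disp_w // 2, 0), frame_w - disp_w)
    y0 = min(max(ooi_cy - disp_h // 2, 0), frame_h - disp_h)
    return x0, y0, x0 + disp_w, y0 + disp_h

def macroblocks_to_decode(window):
    """Return (rows, cols) ranges of macroblock indices overlapping the window."""
    x0, y0, x1, y1 = window
    cols = range(x0 // MB_SIZE, (x1 - 1) // MB_SIZE + 1)
    rows = range(y0 // MB_SIZE, (y1 - 1) // MB_SIZE + 1)
    return rows, cols
```

For a hypothetical 1920x1080 frame and a 480x320 display centered on the frame, the window overlaps 21 macroblock rows and 30 columns, a small fraction of the full frame.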
Weiqin YING Xing XU Yuxiang FENG Yu WU
A conical area evolutionary algorithm (CAEA) is presented to further improve computational efficiencies of evolutionary algorithms for bi-objective optimization. CAEA partitions the objective space into a number of conical subregions and then solves a scalar subproblem in each subregion that uses a conical area indicator as its scalar objective. The local Pareto optimality of the solution with the minimal conical area in each subregion is proved. Experimental results on bi-objective problems have shown that CAEA offers a significantly higher computational efficiency than the multi-objective evolutionary algorithm based on decomposition (MOEA/D) while CAEA competes well with MOEA/D in terms of solution quality.
Our research focuses on examining a video quality assessment model based on the MPEG-7 descriptor. Video quality is estimated by using several features based on the predicted frame quality, such as the average value, worst value, best value, and standard deviation, together with the predicted frame rate obtained from descriptor information. As a result, video quality can be assessed with high prediction accuracy: a correlation coefficient of 0.94, a standard deviation of error of 0.24, a maximum error of 0.68, and an outlier ratio of 0.23.
Osamu SUGIMOTO Sei NAITO Yoshinori HATORI
In this paper, we propose a novel method of measuring the perceived picture quality of H.264 coded video based on parametric analysis of the coded bitstream. Parametric analysis means that the proposed method uses only bitstream parameters to evaluate video quality and has no access to the baseband signal (pixel-level information) of the decoded video. The proposed method extracts the quantiser scale, macroblock type, and transform coefficients from each macroblock. These parameters are used to calculate spatiotemporal image features that reflect the perception of coding artifacts, which are strongly related to the subjective quality. A computer simulation shows that the proposed method can estimate the subjective quality with a correlation coefficient of 0.923, whereas the PSNR metric, which is used as a benchmark, correlates with the subjective quality at a correlation coefficient of 0.793.
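For reference, the PSNR benchmark and the correlation coefficient used to compare a metric against subjective scores can be sketched as follows. This is the standard PSNR definition for 8-bit samples and a plain Pearson correlation; the function names are chosen here for illustration, and nothing in the sketch reproduces the authors' bitstream-based features.

```python
import numpy as np

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    ref = np.asarray(reference, dtype=np.float64)
    dec = np.asarray(decoded, dtype=np.float64)
    mse = np.mean((ref - dec) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

def pearson(metric_scores, subjective_scores):
    """Pearson correlation between objective and subjective scores."""
    return float(np.corrcoef(metric_scores, subjective_scores)[0, 1])
```

Unlike the parametric approach described above, PSNR needs the decoded pixels and a reference frame, which is why it serves only as a benchmark here.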
Seung-Woo HONG Euisin LEE Ho-Yong RYU Sang-Ha KIM
When monitoring a large-scale continuous object, a large number of sensor nodes may participate in object detection and tracking. To reduce the huge quantity of data from the sensor nodes, previous studies focus on selecting representatives for data reporting to a sink. However, they simply choose representatives among a large number of candidates without considering node deployment environments and detection accuracy. Hence, this letter proposes a novel object tracking scheme that first builds a small set of candidates and then chooses a small number of representatives from the set. Since the scheme also considers object alteration in representative selection, it can provide high energy efficiency despite reducing data reporting.
Kento TERAI Daisuke ANZAI Kyesan LEE Kentaro YANAGIHARA Shinsuke HARA
In a wireless multi-hop network between a source node (S) and a destination node (D), multipath routing, in which S redundantly sends the same packets to D through multiple routes at the same time, is effective for enhancing the reliability of wireless data transmission by means of route diversity. However, when multipath routing is applied to a factory where huge robots are moving around, if multiple routes close to one another are selected, the probability that they are blocked by the robots at the same time becomes higher, so the reliability in terms of packet loss rate cannot be enhanced. In this paper, we propose a multipath routing method that can select physically distant multiple routes without any knowledge of the locations of nodes. We introduce a single metric composed of “the distance between routes” and “the route quality” by means of scalarization of a multi-objective maximization problem, and apply a genetic algorithm (GA) to search for adequate routes that maximize the metric. Computer simulation results show that the proposed method can adaptively control the topologies of selected routes between S and D and effectively reduce the packet loss rates.
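Scalarization of two objectives into a single metric can be illustrated with a weighted sum, as in the sketch below. The weighted-sum form, the `weight` parameter, the dictionary keys, and the exhaustive search that stands in for the GA are all assumptions made for illustration; the abstract does not state the exact form of the metric.

```python
def scalarized_metric(route_distance, route_quality, weight=0.5):
    """Weighted-sum scalarization of two objectives to be maximized.
    Both inputs are assumed to be normalized to comparable ranges."""
    return weight * route_distance + (1.0 - weight) * route_quality

def best_route_set(candidates, weight=0.5):
    """Pick the candidate route set with the largest scalarized metric.
    An exhaustive search is used here in place of a genetic algorithm."""
    return max(
        candidates,
        key=lambda c: scalarized_metric(c["distance"], c["quality"], weight),
    )
```

Raising `weight` favors route sets that are far apart, while lowering it favors sets with better link quality, which is one way the topology of the selected routes could be controlled.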
Xian-Hua HAN Yen-Wei CHEN Xiang RUAN
In this paper, we propose N-Dimensional (ND) Tensor Supervised Neighborhood Embedding (ND TSNE) for discriminant feature representation, which is used for view-based object recognition. ND TSNE uses a general Nth-order tensor discriminant and neighborhood-embedding analysis approach for object representation. The benefits of ND TSNE include: (1) a natural way of representing data without losing structure information, i.e., the information about the relative positions of pixels or regions; (2) a reduction in the small sample size problem, which occurs in conventional supervised learning because the number of training samples is much smaller than the dimensionality of the feature space; (3) preservation of a neighborhood structure in the tensor feature space for object recognition and a good convergence property in the training procedure. With tensor-subspace features, a random forest is used as a multi-way classifier for object recognition, which is much easier to train and test than a multi-way SVM. We demonstrate the performance advantages of our proposed approach over existing techniques using experiments on the COIL-100 and ETH-80 datasets.
Xinyue ZHAO Yutaka SATOH Hidenori TAKAUJI Shun'ichi KANEKO
This paper presents a novel method for robust object tracking in video sequences using a hybrid feature-based observation model in a particle filtering framework. An ideal observation model should have both high ability to accurately distinguish objects from the background and high reliability to identify the detected objects. Traditional features are better at solving the former problem but weak in solving the latter one. To overcome that, we adopt a robust and dynamic feature called Grayscale Arranging Pairs (GAP), which has high discriminative ability even under conditions of severe illumination variation and dynamic background elements. Together with the GAP feature, we also adopt the color histogram feature in order to take advantage of traditional features in resolving the first problem. At the same time, an efficient and simple integration method is used to combine the GAP feature with color information. Comparative experiments demonstrate that object tracking with our integrated features performs well even when objects go across complex backgrounds.
Qingyi GU Takeshi TAKAKI Idaku ISHII
We describe a cell-based connected component labeling algorithm to calculate the 0th and 1st moment features as the attributes for labeled regions. These can be used to indicate their sizes and positions for multi-object extraction. Based on the additivity in moment features, the cell-based labeling algorithm can label divided cells of a certain size in an image by scanning the image only once to obtain the moment features of the labeled regions with remarkably reduced computational complexity and memory consumption for labeling. Our algorithm is a simple-one-time-scan cell-based labeling algorithm, which is suitable for hardware and parallel implementation. We also compared it with conventional labeling algorithms. The experimental results showed that our algorithm is faster than conventional raster-scan labeling algorithms.
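The additivity of moment features that the algorithm relies on can be shown with a small sketch: the 0th and 1st moments of a region equal the sums of the moments computed independently in each cell. The function names and the cell size are hypothetical, and the sketch assumes all foreground pixels belong to one region, so it omits the connectivity analysis that the labeling algorithm itself performs.

```python
import numpy as np

def cell_moments(binary, y0, x0, size):
    """0th and 1st moments (m00, m10, m01) of the foreground in one cell,
    expressed in full-image coordinates."""
    cell = binary[y0:y0 + size, x0:x0 + size]
    ys, xs = np.nonzero(cell)
    return len(ys), int(np.sum(xs + x0)), int(np.sum(ys + y0))

def region_moments(binary, size):
    """Accumulate per-cell moments in a single pass over the cells."""
    total = (0, 0, 0)
    for y0 in range(0, binary.shape[0], size):
        for x0 in range(0, binary.shape[1], size):
            part = cell_moments(binary, y0, x0, size)
            total = tuple(a + b for a, b in zip(total, part))
    return total

def centroid(moments):
    """Region size is m00; position is (m10 / m00, m01 / m00)."""
    m00, m10, m01 = moments
    return m10 / m00, m01 / m00
```

Because each cell contributes only three numbers, no per-pixel label map needs to be stored to recover a region's size and position.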
Norimichi UKITA Kunihito TERASHITA Masatsugu KIDODE
We propose a method for calibrating the topology of distributed pan-tilt cameras (i.e. the structure of routes among and within FOVs) and its probabilistic model. To observe as many objects as possible for as long as possible, pan-tilt control is an important issue in automatic calibration as well as in tracking. In a calibration period, each camera should be controlled towards an object that goes through an unreliable route whose topology is not calibrated yet. This camera control allows us to efficiently establish the topology model. After the topology model is established, the camera should be directed towards the route with the biggest possibility of object observation. We propose a camera control framework based on the mixture of the reliability of the estimated routes and the probability of object observation. This framework is applicable both to camera calibration and object tracking by adjusting weight variables. Experiments demonstrate the efficiency of our camera control scheme for establishing the camera topology model and tracking objects as long as possible.
We propose a motion detection model, inspired by neuronal propagation in the hippocampus of the brain, that is suitable for operation at speeds higher than the video rate. The model detects the motion of edges, which are extracted from monocular image sequences, on specified 2D maps without image matching. We introduce gating units into a CA3-CA1 model, where CA3 and CA1 are the names of hippocampal regions. We use the function of the gating units to reduce mismatching so that our model can be applied in complicated situations. We also propose a map-division method to achieve accurate detection. We have evaluated the performance of the proposed model by using artificial and real image sequences. The results show that the proposed model can run at up to 1.0 ms/frame when a 320×240-pixel image is divided into 64×60 units. A detection rate of about 99% for moving edges is achieved under a complicated situation. We have also verified that the proposed model can accurately detect approaching objects at a high frame rate (>100 fps), which is better than conventional models, provided we can obtain accurate positions of image features and filter out the origins of false positive results in post-processing.
Shayma ALKOBAISI Wan D. BAE Sada NARAYANAPPA
The growth of advanced location-based services such as traffic coordination and management creates a need for advanced models that track the positions of Moving Objects (MOs) such as vehicles. Due to computer processing limitations, it is impossible for MOs to continuously update their locations. As a result, the location of an MO between any two reported positions is uncertain. The uncertainty regions of MOs must be managed and quantified efficiently in order to support different types of queries and to improve query response time. This challenging problem of modeling the uncertainty regions associated with MOs was recently addressed by researchers, resulting in models that range from linear ones, which require few properties of MOs as input, to non-linear ones, which represent uncertainty regions more accurately by considering higher-degree input. This paper summarizes and discusses approaches to modeling the uncertainty regions associated with MOs. It further illustrates the need for appropriate approximations, especially in the case of non-linear models, as the uncertainty regions become rather irregularly shaped and difficult to manage. Finally, we demonstrate through several experimental sets the advantage of non-linear models over linear models when the uncertainty regions of MOs are approximated by two different approximations: the Minimum Bounding Box (MBB) and the Tilted Minimum Bounding Box (TMBB).
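As a simple illustration of the MBB approximation, the sketch below computes an axis-aligned minimum bounding box for a set of sampled points from an uncertainty region. The function names and the point-sampling representation are assumptions for illustration; the tilted variant (TMBB) and the uncertainty models themselves are not implemented.

```python
def minimum_bounding_box(points):
    """Axis-aligned MBB of 2D points as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def box_area(box):
    """Area covered by an axis-aligned box."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)
```

Points spread along a diagonal, such as (0, 0), (1, 1), and (2, 2), yield an axis-aligned box of area 4 even though they are collinear, which hints at why a box tilted along the direction of movement can approximate an elongated region more tightly.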
Quan MIAO Guijin WANG Xinggang LIN
Object tracking is a major technique in image processing and computer vision, and tracking speed directly determines the quality of applications. This paper presents a parallel implementation of a recently proposed scale- and rotation-invariant on-line object tracking system. The implementation targets NVIDIA Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA), following the single-instruction, multiple-thread model. Specifically, we analyze the original algorithm and propose a GPU-based parallel design. Emphasis is placed on exploiting data parallelism and on memory usage. In addition, we apply optimization techniques to maximize the utilization of the GPU and reduce data transfer time. Experimental results show that our GPGPU-based method running on a GTX480 graphics card achieves up to a 12X speed-up over the equivalent implementation on an Intel E8400 3.0 GHz CPU, including I/O time.
We revisit the problem of generic object recognition from the point of view of human-computer interaction. While many existing algorithms for generic object recognition first try to detect target objects before features are extracted and classified, our work is motivated by the belief that solving the detection task by computer is not always necessary in many practical situations, such as those involving mobile recognition systems with touch displays and cameras. It is natural for these systems to ask users to input the segmentation data for targets through their touch displays. From the perspective of usability, such systems should accept rough segmentation to reduce the user workload. In this situation, different people would provide different segmentation data. Here, an interesting question arises: if multiple training samples are generated from a single image by using various segmentation data created by different people, what would happen to the accuracy of classification? To answer this question, we created “20 wild bird datasets” containing a large number of rough segmentation data made by 383 people. Our experiments revealed two interesting facts: (i) generating multiple training samples from a single image had positive effects on classification accuracy, especially when image features including spatial information were used, and (ii) augmenting training samples with artificial segmentation data synthesized with a morphing technique also had slightly positive effects on classification accuracy.
Gibran FUENTES PINEDA Hisashi KOGA Toshinori WATANABE
We present a scalable approach to automatically discovering particular objects (as opposed to object categories) from a set of images. The basic idea is to search for local image features that consistently appear in the same images under the assumption that such co-occurring features underlie the same object. We first represent each image in the set as a set of visual words (vector quantized local image features) and construct an inverted file to memorize the set of images in which each visual word appears. Then, our object discovery method proceeds by searching the inverted file and extracting visual word sets whose elements tend to appear in the same images; such visual word sets are called co-occurring word sets. Because of unstable and polysemous visual words, a co-occurring word set typically represents only a part of an object. We observe that co-occurring word sets associated with the same object often share many visual words with one another. Hence, to obtain the object models, we further cluster highly overlapping co-occurring word sets in an agglomerative manner. Remarkably, we accelerate both extraction and clustering of co-occurring word sets by Min-Hashing. We show that the models generated by our method can effectively discriminate particular objects. We demonstrate our method on the Oxford buildings dataset. In a quantitative evaluation using a set of ground truth landmarks, our method achieved higher scores than the state-of-the-art methods.
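The inverted file and the use of Min-Hashing to find visual words that appear in the same images can be sketched as follows. This is a generic MinHash over image-ID sets, assuming integer image IDs and linear hash functions modulo a prime; the function names, the number of hashes, and the seed are illustrative choices and do not reproduce the authors' extraction or clustering procedure.

```python
import random
from collections import defaultdict

def build_inverted_file(images):
    """Map each visual word to the set of image IDs in which it appears."""
    inverted = defaultdict(set)
    for image_id, words in images.items():
        for word in words:
            inverted[word].add(image_id)
    return dict(inverted)

def jaccard(a, b):
    """Exact overlap between two image-ID sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def minhash_signature(image_ids, num_hashes=64, seed=0, prime=2_147_483_647):
    """MinHash signature of a non-empty set of integer image IDs."""
    rng = random.Random(seed)  # same seed gives the same hash functions
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    return [min((a * x + b) % prime for x in image_ids) for a, b in params]

def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature entries, an estimate of Jaccard overlap."""
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)
```

Words whose signatures agree in many positions tend to occur in the same images, so candidate co-occurring word sets can be found by comparing short signatures instead of full image lists.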
Dan-ni AI Xian-hua HAN Guifang DUAN Xiang RUAN Yen-wei CHEN
This paper addresses the problem of ordering the independent components of color SIFT descriptors obtained by independent component analysis for image classification. Component ordering is of great importance for image classification, since it is the foundation of feature selection. To select distinctive and compact independent components (ICs) of the color SIFT descriptors, we propose two ordering approaches based on local variation, named localization-based IC ordering and sparseness-based IC ordering. We evaluate the performance of the proposed methods, the conventional IC selection method (global-variation-based component selection), and the original color SIFT descriptors on object and scene databases, and obtain the following two main results. First, the proposed methods obtain acceptable classification results in comparison with the original color SIFT descriptors. Second, the highest classification rate is obtained by the global selection method on the scene database, while the local ordering methods give the best performance on the object database.
In this paper, a new power control scheme is proposed to maximize the network throughput with fairness provisioning. Based on the Stackelberg game model, the proposed scheme consists of two control mechanisms; user-level and system-level mechanisms. Control decisions in each mechanism act cooperatively and collaborate with each other to satisfy efficiency and fairness requirements. Simulation results demonstrate that the proposed scheme has excellent network performance, while other schemes cannot offer such an attractive performance balance.
Jeong-Hun SEO Inyong CHOI Sang Bae CHON Koeng-Mo SUNG
The adequate evaluation of sound quality is an important issue for lossy compression codecs such as MP3. ITU-R Rec. BS.1387-1 (PEAQ, Perceptual Evaluation of Audio Quality) is the most widely used method for evaluating sound quality objectively. However, PEAQ can be used only for mono signals or two-channel stereo signals, because it considers only timbral factors when assessing sound quality. This paper introduces an improved objective quality assessment method that can be used for both mono signals and multichannel audio signals and that considers both “spatial” and “timbral” factors. The “spatial” factors, which measure perceptual distortions in spatial impression, are important for evaluating the quality of multichannel sounds.