
Keyword Search Result

[Keyword] PAR (2741 hits)

821-840 of 2741 hits

  • Locality-Constrained Multi-Task Joint Sparse Representation for Image Classification

    Lihua GUO

    LETTER-Image Recognition, Computer Vision
    Vol: E96-D No:9  Page(s): 2177-2181

    In image classification applications, a test sample with multiple hand-crafted feature descriptions can be sparsely represented by a few training subjects. This paper is motivated by the success of multi-task joint sparse representation (MTJSR) and observes that the different feature modalities are subject not only to a joint-sparsity constraint across tasks but also to a local-manifold-structure constraint across features. We introduce the local manifold structure constraint into the MTJSR framework and propose the locality-constrained multi-task joint sparse representation (LC-MTJSR) method. When optimizing the formulated objective, stochastic gradient descent is used to guarantee a fast convergence rate, which is essential for large-scale image categorization. Experiments on several challenging object classification datasets show that the proposed algorithm outperforms MTJSR and is competitive with state-of-the-art multiple kernel learning methods.
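
    To make the constraint structure concrete, here is a minimal sketch of the kind of objective the abstract describes (the notation and weighting are assumptions, not the paper's exact formulation): with $K$ feature modalities, test descriptions $y^{k}$, training dictionaries $X^{k}$, and coefficient matrix $W=[w^{1},\dots,w^{K}]$,

    $$\min_{W}\ \sum_{k=1}^{K}\bigl\|y^{k}-X^{k}w^{k}\bigr\|_{2}^{2}\;+\;\lambda\,\|W\|_{2,1}\;+\;\gamma\sum_{k=1}^{K}\sum_{i}d_{i}^{k}\bigl(w_{i}^{k}\bigr)^{2},$$

    where the $\ell_{2,1}$ norm enforces joint sparsity across tasks and $d_{i}^{k}$ (e.g., the distance from the test sample to the $i$-th training atom in modality $k$) is a locality weight that penalizes coefficients on atoms far from the test sample on the local manifold.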

  • Study of a Multiuser Resource Allocation Scheme for a 2-Hop OFDMA Virtual Cellular Network

    Gerard J. PARAISON  Eisuke KUDOH

    PAPER-Wireless Communication Technologies
    Vol: E96-B No:8  Page(s): 2112-2118

    In next-generation mobile networks, the demand for high-data-rate transmission will require increased transmission power if the current cellular architecture is retained. Multihop networks are considered a key solution to this problem, but the new architecture also requires a new resource allocation algorithm. In this paper, we propose a resource allocation scheme for a parallel-relay 2-hop OFDMA virtual cellular network (VCN) that can be applied in a multiuser environment. We evaluate, by computer simulation, the ergodic channel capacity of the VCN under the proposed algorithm and compare the results with those of the conventional single-hop network (SHN). In addition, we analyze the effect of the relay wireless port locations on the ergodic channel capacity of the VCN, and we study the degree of fairness of the VCN under the proposed scheme compared with that of the SHN. For low transmission power, the simulation results show that: a) the VCN provides better ergodic channel capacity and a better degree of fairness than the SHN; b) the distance ratio that maximizes the ergodic channel capacity of the VCN lies in the interval 0.2-0.3; c) the ergodic channel capacity of the VCN remains better than that of the SHN as the number of users increases; and d) as the distance between the relay WPs and the base station increases, the channel capacity of the VCN approaches that of the SHN.
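
    For reference, the standard quantities involved (textbook forms, not taken from the paper): the ergodic capacity of a link with instantaneous SNR $\gamma$ is $C=\mathbb{E}\left[\log_{2}(1+\gamma)\right]$, and for a half-duplex 2-hop link the end-to-end capacity is approximately $C_{2\mathrm{hop}}=\tfrac{1}{2}\min(C_{1},C_{2})$, where $C_{1}$ and $C_{2}$ are the capacities of the two hops; the factor $\tfrac{1}{2}$ is the relaying overhead that a good resource allocation must overcome at low transmission power.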

  • Proximity Based Object Segmentation in Natural Color Images Using the Level Set Method

    Tran Lan Anh NGUYEN  Gueesang LEE

    PAPER-Image
    Vol: E96-A No:8  Page(s): 1744-1751

    Segmenting indicated objects from natural color images remains a challenging problem for researchers in image processing. In this paper, a novel level set approach is presented to address this issue. In this segmentation algorithm, a contour lying inside a particular region of the object of interest is first initialized by the user. The level set model is then applied to extract the object, of arbitrary shape and size, containing this initial region. Constrained by the position of the initial contour, the proposed framework combines two energy terms, local and global energy, in its energy functional to drive the contour toward object boundaries. These terms are based on graph partitioning active contour models and the Bhattacharyya flow, respectively; the flow measures the dissimilarity between the region of interest and its surroundings. Experimental results on our image collection show that the suggested method performs accurately, as well as or better than a number of existing segmentation algorithms, when applied to various natural images.
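
    A minimal sketch of such a two-term functional (the form and weights are assumptions, not the paper's exact terms): $E(\phi)=\alpha\,E_{\mathrm{local}}(\phi)+\beta\,E_{\mathrm{global}}(\phi)$, where the global term can be built from the Bhattacharyya coefficient $B=\int\sqrt{p_{\mathrm{in}}(z)\,p_{\mathrm{out}}(z)}\,dz$ between the feature distributions inside and outside the contour; driving the contour to minimize $B$ maximizes the dissimilarity between the region of interest and its surroundings.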

  • Two Dimensional M-Channel Non-separable Filter Banks Based on Cosine Modulated Filter Banks with Diagonal Shifts

    Taichi YOSHIDA  Seisuke KYOCHI  Masaaki IKEHARA

    PAPER-Digital Signal Processing
    Vol: E96-A No:8  Page(s): 1685-1694

    In this paper, we propose a new class of two-dimensional (2D) M-channel (M-ch) non-separable filter banks (FBs) based on cosine modulated filter banks (CMFBs), via a new diagonal modulation scheme. Although many 2D non-separable CMFBs have been proposed, efficient direction-selective CMFBs have not yet been realized. Thanks to the new modulation with diagonal shifts, the proposed CMFBs achieve several frequency supports, including direction-selective ones that conventional CMFBs cannot realize. In simulations, we show design examples of the proposed CMFBs and their various directional frequency supports.
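
    For context, the classical 1D cosine modulation that such designs extend (standard form; the paper's 2D scheme replaces the 1D shift with diagonal shifts): from a single prototype filter $p(n)$ of length $N$, the analysis filters are

    $$h_{k}(n)=2\,p(n)\cos\!\left(\frac{\pi}{M}\Bigl(k+\frac{1}{2}\Bigr)\Bigl(n-\frac{N-1}{2}\Bigr)+(-1)^{k}\frac{\pi}{4}\right),\qquad k=0,\dots,M-1.$$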

  • On-Line Model Parameter Estimations for Time-Delay Systems

    Jung Hun PARK  Soohee HAN  Bokyu KWON

    LETTER-Fundamentals of Information Systems
    Vol: E96-D No:8  Page(s): 1867-1870

    This paper concerns the problem of on-line model parameter estimation for multiple time-delay systems. In order to estimate unknown model parameters from measured state variables, we propose two schemes based on Lyapunov's direct method, called the parallel and series-parallel model estimators. A numerical example shows that the proposed parallel and series-parallel model estimators can be effective when sufficiently rich inputs are applied.

  • Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC

    Gugang GAO  Peng CAO  Jun YANG  Longxing SHI

    PAPER-Application
    Vol: E96-D No:8  Page(s): 1654-1666

    One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to map applications efficiently. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms, and (3) how to achieve load balancing and exploit the full potential of the hardware. In this paper, we propose a novel parallelization scheme, called 'hybrid partitioning', for mapping an H.264 high-definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining the best features of data partitioning and task partitioning, our methodology consists of three levels, from top to bottom: (1) a hybrid task pipeline at the slice and macroblock (MB) levels; (2) MB row-level data parallelism; (3) sub-MB-level parallelism. Further, at the sub-MB level, we propose several mapping strategies, such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4×4 prediction, and a parallel processing order for deblocking. With these mapping strategies, we improved the algorithm's performance on REMUS-II. For example, for a luma 16×16 MB, Hybrid VBSMC achieves 4 times the performance of VBSMC and 2.2 times the performance of a fixed 4×4 partition approach. Overall, we achieve 1080p@33fps H.264 high-profile (HiP)@level 4.1 decoding when REMUS-II runs at 200 MHz. Compared with typical hardware platforms, we achieve better performance, area, and flexibility; for example, approximately 175% higher performance than the commercial CGRA processor XPP-III while using only 70% of its area.

  • High Throughput Parallelization of AES-CTR Algorithm

    Nhat-Phuong TRAN  Myungho LEE  Sugwon HONG  Seung-Jae LEE

    PAPER-Fundamentals of Information Systems
    Vol: E96-D No:8  Page(s): 1685-1695

    Data encryption and decryption are common operations in network-based application programs that must offer security. In order to keep pace with the high data input rates of network-based applications such as multimedia data streaming, real-time processing of encryption/decryption is crucial. In this paper, we propose a new parallelization approach that improves the throughput of the de facto standard encryption/decryption algorithm, AES-CTR (counter mode of AES). The new approach extends the size of the block encrypted at one time across unit-block boundaries, effectively encrypting multiple unit blocks at once. This reduces the associated parallelization overheads, such as the number of procedure calls and the scheduling and synchronization costs, compared with previous approaches, and thus leads to significant throughput improvements on a computing platform with a general-purpose multi-core processor and a Graphics Processing Unit (GPU).
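
    A minimal CPU-side sketch of the chunking idea (illustrative only, not the paper's GPU implementation; assumes the pycryptodome package and a simplified integer nonce/counter layout):

    ```python
    # Coarse-grained AES-CTR parallelization: CTR keystream blocks are
    # independent, so the stream is cut into large chunks spanning many
    # 16-byte unit blocks, and workers encrypt the chunks concurrently.
    from concurrent.futures import ThreadPoolExecutor
    from Crypto.Cipher import AES   # pycryptodome

    BLOCK = 16  # AES block size in bytes

    def encrypt_chunk(key, nonce, start_block, data):
        """One call covers many unit blocks: fewer calls, less scheduling."""
        ecb = AES.new(key, AES.MODE_ECB)
        out = bytearray(len(data))
        for i in range(0, len(data), BLOCK):
            ctr = (nonce + start_block + i // BLOCK).to_bytes(BLOCK, "big")
            ks = ecb.encrypt(ctr)                    # keystream block
            for j, b in enumerate(data[i:i + BLOCK]):
                out[i + j] = b ^ ks[j]               # XOR with plaintext
        return bytes(out)

    def aes_ctr_parallel(key, nonce, plaintext, chunk_blocks=4096, workers=4):
        step = chunk_blocks * BLOCK
        chunks = [(i // BLOCK, plaintext[i:i + step])
                  for i in range(0, len(plaintext), step)]
        with ThreadPoolExecutor(max_workers=workers) as ex:
            parts = ex.map(lambda c: encrypt_chunk(key, nonce, c[0], c[1]),
                           chunks)
        return b"".join(parts)
    ```

    Because each keystream block depends only on its counter value, a worker can start at any block index without needing other workers' output; that independence is what makes the coarser granularity safe.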

  • Creating Chinese-English Comparable Corpora

    Degen HUANG  Shanshan WANG  Fuji REN

    PAPER-Natural Language Processing
    Vol: E96-D No:8  Page(s): 1853-1861

    Comparable corpora are valuable resources for many NLP applications, and extensive research has been done on information mining based on comparable corpora in recent years. However, few large-scale public comparable corpora are available at present. This paper presents a bi-directional CLIR-based method for creating comparable corpora from two independent news collections in different languages. The original Chinese and English document collections are crawled from XinHuaNet and formatted in a consistent manner. For each document in the two collections, the best query keywords are extracted to represent the essential content of the document, and the keywords are then translated into the language of the other collection. Each translated query is run against the collection in that language to retrieve candidate documents, and candidates are aligned based on their publication dates and similarity scores. Results show that our approach significantly outperforms previous approaches to the construction of Chinese-English comparable corpora.
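
    A schematic sketch of that alignment loop (the helpers extract_keywords, translate, search, and similarity are hypothetical stand-ins for the paper's components, and the date window and threshold are assumptions):

    ```python
    # Bi-directional CLIR-style alignment: keywords from each source document
    # are translated and used as a query against the other-language collection;
    # candidates close in publication date and similar enough are kept.
    from datetime import timedelta

    def align(source_docs, target_docs, extract_keywords, translate,
              search, similarity, window_days=3, threshold=0.4):
        pairs = []
        for doc in source_docs:
            query = translate(extract_keywords(doc))   # query in the other language
            for cand in search(target_docs, query):    # CLIR retrieval
                close = abs(doc.date - cand.date) <= timedelta(days=window_days)
                if close and similarity(doc, cand) >= threshold:
                    pairs.append((doc, cand))
        return pairs
    ```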

  • Fast Iterative Mining Using Sparsity-Inducing Loss Functions

    Hiroto SAIGO  Hisashi KASHIMA  Koji TSUDA

    PAPER-Pattern Recognition
    Vol: E96-D No:8  Page(s): 1766-1773

    Apriori-based mining algorithms enumerate frequent patterns efficiently, but the resulting large number of patterns makes it difficult to apply subsequent learning tasks directly. Recently, efficient iterative methods have been proposed for mining discriminative patterns for classification and regression. These methods iteratively execute a discriminative pattern mining algorithm and update example weights to emphasize examples that received large errors in the previous iteration. In this paper, we study a family of loss functions that induces sparsity on the example weights. Most of the resulting example weights become zero, so we can eliminate those examples from discriminative pattern mining, leading to a significant decrease in search space and time. In computational experiments, we compare and evaluate various loss functions in terms of the amount of sparsity induced and the resulting speed-up.
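
    As a concrete illustration of how a loss can zero out example weights, here is a sketch using the squared hinge loss as an assumed member of such a family (not necessarily the paper's choice):

    ```python
    # With a squared-hinge loss, the weight of example i is
    # max(0, 1 - y_i * f(x_i)), which is exactly zero once the example is
    # classified with sufficient margin, so it can be dropped from the
    # next discriminative pattern mining iteration.
    import numpy as np

    def sparse_example_weights(y, f):
        """y: labels in {-1, +1}; f: current ensemble scores."""
        u = np.maximum(0.0, 1.0 - y * f)   # zero for large-margin examples
        active = np.nonzero(u)[0]          # only these feed the pattern miner
        return u, active
    ```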

  • A Control Method of Dynamic Selfish Routing Based on a State-Dependent Tax

    Takafumi KANAZAWA  Takurou MISAKA  Toshimitsu USHIO

    PAPER-Concurrent Systems
    Vol: E96-A No:8  Page(s): 1794-1802

    A selfish routing game is a simple model of selfish behavior in networks. Braess's paradox is said to occur in the selfish routing game if the equilibrium flow reached by the players' selfish behavior is not the optimal minimum-latency flow. In order to make the minimum-latency flow a Nash equilibrium, a marginal cost tax has been proposed, and Braess graphs have been proposed to discuss Braess's paradox. In a large population of selfish players, conflicts between the objectives of individual players and those of the population cause social dilemmas. In game theory, a capitation tax and/or a subsidy has been introduced to resolve such social dilemmas, and players' dynamical behavior has been formulated by replicator dynamics. In this paper, we formulate replicator dynamics on Braess graphs and investigate the stability of the minimum-latency flow with and without the marginal cost tax. We also show that the marginal cost tax introduces an additional latency. To resolve this problem, we extend the capitation tax and the subsidy to a state-dependent tax and apply it to the stabilization of the minimum-latency flow.
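
    For reference, the standard forms underlying this setup (textbook definitions, not the paper's specific model): replicator dynamics over path shares $x_{i}$ read $\dot{x}_{i}=x_{i}\bigl(f_{i}(x)-\bar{f}(x)\bigr)$, with $f_{i}$ the payoff of path $i$ and $\bar{f}$ the population average payoff, and the marginal cost tax adds $x_{e}\,\ell_{e}'(x_{e})$ to each edge with latency $\ell_{e}(x_{e})$, so that selfish players face the social marginal cost $\ell_{e}(x_{e})+x_{e}\,\ell_{e}'(x_{e})$.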

  • Skew Estimation by Parts

    Soma SHIRAISHI  Yaokai FENG  Seiichi UCHIDA

    PAPER-Pattern Recognition
    Vol: E96-D No:7  Page(s): 1503-1512

    This paper proposes a new part-based approach to skew estimation for document images. The proposed method first estimates skew angles on rather small areas, namely local parts of characters, and then determines the global skew angle by aggregating those local estimations. A local skew estimation on a part of a skewed character is performed by finding the identical part in prepared upright character images and calculating the angular difference. Specifically, a keypoint detector (e.g., SURF) is used to determine the local parts of characters; once the parts are described as feature vectors, a nearest neighbor search is conducted in the instance database to identify them. A local skew estimate is then obtained as the difference between the dominant brightness-gradient angles of the matched parts. Finally, the global skew angle is estimated by majority voting over the local estimations, disregarding noisy ones. Our experiments show that the proposed method is more robust to short and sparse text lines and to non-text backgrounds than conventional methods.
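
    A condensed sketch of this vote-by-parts idea (using OpenCV's SIFT in place of SURF, with a single upright reference image standing in for the instance database; the bin width is an assumption):

    ```python
    # Each matched local part votes with the angular difference between the
    # document keypoint and its upright counterpart; the histogram peak over
    # all votes gives the global skew estimate, outvoting noisy matches.
    import cv2
    import numpy as np

    def estimate_skew(doc_img, upright_img):
        sift = cv2.SIFT_create()
        kp_d, des_d = sift.detectAndCompute(doc_img, None)
        kp_u, des_u = sift.detectAndCompute(upright_img, None)
        matches = cv2.BFMatcher(cv2.NORM_L2).match(des_d, des_u)
        votes = [kp_d[m.queryIdx].angle - kp_u[m.trainIdx].angle
                 for m in matches]
        votes = [(v + 180.0) % 360.0 - 180.0 for v in votes]  # wrap angles
        hist, edges = np.histogram(votes, bins=360, range=(-180, 180))
        peak = int(np.argmax(hist))                           # majority vote
        return (edges[peak] + edges[peak + 1]) / 2.0
    ```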

  • Operational Performance of an Optical Serial-to-Parallel Converter Based on a Mach-Zehnder Delay Interferometer and a Phase-Shifted Preamble for DPSK-Formatted Signals

    Kotaro NEGISHI  Hiroyuki UENOHARA

    PAPER
    Vol: E96-C No:7  Page(s): 1012-1018

    We have investigated the operational performance of an optical serial-to-parallel conversion scheme that uses a phase-shifted preamble to handle optical packets formatted with differential phase shift keying (DPSK), intended for an integrated optical serial-to-parallel converter (OSPC). The architecture previously used for on-off-keyed signals, based on a transmitter-side preamble at the top of the packet that is phase-shifted by π/2 and then phase-biased by -π/2 with a Mach-Zehnder delay interferometer (MZDI), is also available for binary and differential PSK signals. The delay length is determined by the relative timing positions of the gated bit and a balanced receiver-side photodetector. We simulated the operational performance of this scheme and its tolerance to the degree of modulation and optical chirp. The results show that a phase shift of more than 0.94π is required to attain a suppression ratio in the OSPC output consistent with a bit error rate of less than 10^-9 (based on the ratio of the intensity of the extracted bit to the maximum peak intensity of the cancelled bits) when a single-arm phase modulator is used; with a Mach-Zehnder phase modulator, the required modulation angle can be relaxed to about 0.36π. Experimental investigation of the OSPC showed that its tolerance with respect to the modulation angle agreed well with the simulated values. Finally, we performed optical label processing using the OSPC in conjunction with an address table, and the results confirmed the potential of the OSPC for label recognition.

  • Accurate Imaging Method for Moving Target with Arbitrary Shape for Multi-Static UWB Radar

    Ryo YAMAGUCHI  Shouhei KIDERA  Tetsuo KIRIMOTO

    PAPER-Sensing
    Vol: E96-B No:7  Page(s): 2014-2023

    Ultra-wideband pulse radar is a promising technology for the imaging sensors of rescue robots operating in disaster scenarios, where optical sensors are not applicable because of thick smog or high-density gas. For this application, a promising ultra-wideband radar imaging algorithm for a target with arbitrary motion has already been proposed with a compact observation model; however, it is based on an ellipsoidal approximation of the target boundary and is difficult to apply to complex target shapes. To tackle this problem, this paper proposes a non-parametric and robust imaging algorithm for a target with arbitrary motion, including rotation and translation, observed by multi-static radar. The algorithm is based on matching target boundary points obtained by the range points migration (RPM) algorithm, extended here to the multi-static radar model. To enhance imaging accuracy at lower signal-to-noise ratios, the proposed method also adopts an integration scheme for the obtained range points, in which the antenna locations are compensated using the estimated target motion. Results from numerical simulations show that the proposed method accurately extracts the surface of a moving target and estimates its motion, without any target or motion model.

  • Revisiting Shared Cache Contention Problems: A Practical Hardware-Software Cooperative Approach

    Eunji PAK  Sang-Hoon KIM  Jaehyuk HUH  Seungryoul MAENG

    PAPER-Computer System
    Vol: E96-D No:7  Page(s): 1457-1466

    Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been studied extensively. Partitioning explicitly assigns cache capacity to each core to maximize overall throughput. Application scheduling, on the other hand, has the operating system group the least-interfering applications onto each shared cache when multiple shared caches exist in a system. Although application scheduling can mitigate contention without extra hardware support, its effect can be limited for severe contention. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of a full hardware-based cache partitioning mechanism, the proposed technique relies mostly on application scheduling and selectively uses LRU insertion into the shared caches, which can be added with negligible hardware changes to current commercial processor designs. For completeness of the cache interference evaluation, this paper examines all possible mixes from a set of applications, instead of just a few selected mixes. The evaluation shows that the proposed technique mitigates cache contention effectively, performing close to ideal scheduling and partitioning.
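
    A toy model of the insertion control (a sketch of the general LRU-insertion technique, not the paper's hardware): lines from applications identified as cache-polluting are inserted at the LRU position instead of the MRU position, so they are evicted first and leave co-runners' working sets intact.

    ```python
    # One set of a set-associative cache with a selectable insertion policy.
    from collections import deque

    class CacheSet:
        def __init__(self, ways):
            self.ways = ways
            self.lines = deque()            # left = LRU, right = MRU

        def access(self, tag, lru_insert=False):
            if tag in self.lines:           # hit: promote to MRU
                self.lines.remove(tag)
                self.lines.append(tag)
                return True
            if len(self.lines) == self.ways:
                self.lines.popleft()        # miss: evict current LRU line
            if lru_insert:
                self.lines.appendleft(tag)  # polluting app: next to be evicted
            else:
                self.lines.append(tag)      # normal MRU insertion
            return False
    ```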

  • Multiple-Cell-Upset Tolerant 6T SRAM Using NMOS-Centered Cell Layout

    Shusuke YOSHIMOTO  Shunsuke OKUMURA  Koji NII  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO

    PAPER-Reliability, Maintainability and Safety Analysis
    Vol: E96-A No:7  Page(s): 1579-1585

    This paper presents an NMOS-centered 6T SRAM cell layout that reduces the neutron-induced multiple-cell-upset (MCU) soft error rate (SER) along a shared wordline. We implemented a 1-Mb SRAM macro in a 65-nm CMOS process and performed neutron-accelerated irradiation tests to evaluate the MCU SER. The proposed 6T SRAM macro improves the horizontal MCU SER by 67-98% compared with a general macro using PMOS-centered 6T SRAM cells.

  • Design and Prototyping of Error Resilient Multi-Server Video Streaming System with Inter-Stream FEC

    Akihiro FUJIMOTO  Yusuke HIROTA  Hideki TODE  Koso MURAKAMI

    PAPER-Network
    Vol: E96-B No:7  Page(s): 1826-1836

    To establish seamless and highly robust content distribution, we previously proposed the concept of Inter-Stream Forward Error Correction (FEC), an efficient data recovery method that leverages several video streams. Our previous research showed that Inter-Stream FEC has significantly better recovery capability than the conventional FEC method under ideal modeling conditions and assumptions. In this paper, we design the Inter-Stream FEC architecture in detail with a view to practical application. The functional requirements for practical feasibility, such as simplicity and flexibility, are investigated. The investigation also clarifies a challenging problem: the increase in processing delay caused by the asynchronous arrival of packets. To solve this problem, we propose a pragmatic parity stream construction method. We implement and experimentally evaluate a prototype system with Inter-Stream FEC; the results demonstrate that the proposed system achieves high recovery performance in our experimental environment.
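
    A minimal sketch of cross-stream parity, the core mechanism behind such a scheme (illustrative only; it assumes equal-length packets and ignores the packet grouping and asynchronous-arrival issues the paper addresses):

    ```python
    # The parity of the i-th packets of several concurrent streams can
    # recover any single lost packet among them: XOR the parity with all
    # surviving packets and the missing one falls out.
    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def parity_packet(packets):
        """packets: the i-th packet of each participating stream."""
        p = packets[0]
        for q in packets[1:]:
            p = xor_bytes(p, q)
        return p

    def recover(packets, parity):
        """packets: same list with exactly one entry replaced by None."""
        p = parity
        for pkt in packets:
            if pkt is not None:
                p = xor_bytes(p, pkt)
        return p
    ```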

  • Area-Efficient QC-LDPC Decoder Architecture Based on Stride Scheduling and Memory Bank Division

    Bongjin KIM  In-Cheol PARK

    PAPER-Fundamental Theories for Communications
    Vol: E96-B No:7  Page(s): 1772-1779

    In this paper, an area-efficient decoder architecture is proposed for the quasi-cyclic low-density parity-check (QC-LDPC) codes specified in the IEEE 802.16e WiMAX standard. The decoder supports all the code rates and codeword lengths defined in the standard. In order to minimize area and maximize hardware utilization, the decoder employs 4 decoding function units, 4 being the greatest common divisor of the expansion factors. In addition, the decoder adopts a novel scheduling scheme named stride scheduling, which stores the extrinsic messages in non-sequential order to replace the conventional complex flexible permutation network with simple small-sized cyclic shifters and to minimize the number of memory accesses. To further reduce complexity, the number of extrinsic memory instances for the 24 block columns is reduced to 5 banks by identifying independent sets. All memory instances in the decoder are single-port memories, which cost less area and money than dual-port ones. Finally, the decoding function units have a partially parallel structure that keeps the decoding throughput sufficiently above the requirement of the WiMAX standard. The proposed decoder is synthesized with 49 K equivalent gates and 54,144 bits of memory, and the implementation occupies 0.40 mm² in a 65 nm CMOS technology.

  • VACED-SIM: A Simulator for Scalability Prediction in Large-Scale Parallel Computing

    Yufei LIN  Xuejun YANG  Xinhai XU  Xiaowei GUO

    PAPER-Computer System
    Vol: E96-D No:7  Page(s): 1430-1442

    Scaling up the system size has been the common approach to achieving high performance in parallel computing. However, designing and implementing a large-scale parallel system can be very costly in money and time. When building a target system, it is desirable to first build a smaller version using processing nodes with the same architecture as those in the target system, and then use the smaller system to predict the performance of the target system efficiently and scalably. Such scalability prediction is critical because it enables system designers to evaluate different design alternatives so that a given performance goal can be achieved. As the de facto standard for writing parallel applications, MPI is widely used in large-scale parallel computing. By categorizing the discrete-event simulation methods for MPI programs and analyzing the characteristics of scalability prediction, we propose a novel simulation method, called virtual-actual combined execution-driven (VACED) simulation, to achieve scalable prediction for MPI programs. The basic idea is to predict the execution time of an MPI program on a target machine by running it on a smaller system: the communication time is predicted by virtual simulation, while the sequential computation time is obtained by actual execution. We introduce a model for VACED simulation as well as the design and implementation of VACED-SIM, a lightweight simulator based on fine-grained activity and event definitions. We have validated our approach on a sub-system of Tianhe-1A. Our experimental results show that VACED-SIM exhibits higher accuracy and efficiency than MPI-SIM; in particular, for a target system with 1024 cores, the relative errors of VACED-SIM are less than 10% and the slowdowns are close to 1.
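
    In formula form, the prediction decomposes as (a restatement of the idea above, not the paper's notation): $T_{\mathrm{target}}\approx\sum_{i}t_{i}^{\mathrm{comp}}+\sum_{j}\hat{t}_{j}^{\mathrm{comm}}$, where the $t_{i}^{\mathrm{comp}}$ are measured by actually executing the program's sequential segments on the smaller system's identical nodes and the $\hat{t}_{j}^{\mathrm{comm}}$ are obtained by virtual (discrete-event) simulation of the target system's network.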

  • Low Complexity Keypoint Extraction Based on SIFT Descriptor and Its Hardware Implementation for Full-HD 60 fps Video

    Takahiro SUZUKI  Takeshi IKENAGA

    PAPER
    Vol: E96-A No:6  Page(s): 1376-1383

    Scale-Invariant Feature Transform (SIFT) has lately attracted attention in computer vision as a robust keypoint detection algorithm that is invariant to scale, rotation, and illumination changes. However, its computational complexity is too high for practical real-time applications. This paper proposes a low-complexity keypoint extraction algorithm based on the SIFT descriptor and a scale database, together with a real-time hardware implementation for Full-HD video. The proposed algorithm computes the SIFT descriptor at keypoints obtained by corner detection and selects a scale from the database. Because keypoint detection and descriptor computation no longer depend on each other, in contrast with SIFT, which must compute a scale, the two modules can be parallelized in hardware. The processing time of descriptor computation in this hardware is independent of the number of keypoints because descriptor generation is implemented as a per-pixel pipeline. Evaluation results show that the proposed algorithm in software is 12 times faster than SIFT; moreover, the proposed hardware on an FPGA is 427 times faster than SIFT and 61 times faster than the proposed algorithm in software. The proposed hardware performs keypoint extraction and matching at 60 fps for Full-HD video.
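
    A rough software analogue of this decoupling (an OpenCV-based sketch, not the paper's FPGA design; the fixed keypoint size stands in for the database-derived scale):

    ```python
    # Keypoints come from a cheap corner detector; SIFT descriptors are then
    # computed at those points without running SIFT's own scale-space search,
    # so detection and description no longer depend on each other.
    import cv2

    def corner_keypoints_with_sift(gray_img, n_corners=500, kp_size=8.0):
        corners = cv2.goodFeaturesToTrack(gray_img, n_corners,
                                          qualityLevel=0.01, minDistance=5)
        kps = [cv2.KeyPoint(float(x), float(y), kp_size)
               for [[x, y]] in corners]
        sift = cv2.SIFT_create()
        kps, descs = sift.compute(gray_img, kps)  # descriptor only, no scale search
        return kps, descs
    ```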

  • Facial Image Super-Resolution Reconstruction Based on Separated Frequency Components

    Hyunduk KIM  Sang-Heon LEE  Myoung-Kyu SOHN  Dong-Ju KIM  Byungmin KIM

    PAPER
    Vol: E96-A No:6  Page(s): 1315-1322

    Super-resolution (SR) reconstruction is the process of fusing a sequence of low-resolution images into one high-resolution image. Many researchers have introduced various SR reconstruction methods, but these traditional methods are limited in the extent to which they can recover high-frequency information. Moreover, owing to the self-similarity of face images, most facial SR algorithms are machine-learning based. In this paper, we introduce a facial SR algorithm that combines learning-based and regularized SR image reconstruction algorithms. Our approach involves two main ideas: first, we reconstruct high-resolution images from separated frequency components; second, we partition the training face images into regions. Both help to recover high-frequency information. Our experiments demonstrate the effectiveness of these ideas.
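
    A minimal sketch of the frequency-separation step (an assumed Gaussian low-pass split; the paper's exact decomposition and its region-wise training are not reproduced here):

    ```python
    # Split an image into a low-frequency base and a high-frequency residual;
    # the two components can then be reconstructed by different SR paths
    # (e.g., regularized fusion for the base, learning-based prediction for
    # the detail) and recombined.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def split_frequency(img, sigma=2.0):
        low = gaussian_filter(img.astype(np.float64), sigma)
        high = img - low                  # residual carries edges and detail
        return low, high

    def merge(low_sr, high_sr):
        return np.clip(low_sr + high_sr, 0, 255)
    ```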

821-840 of 2741 hits