Hiroyuki IHARA Tomoharu SHIBUYA
Spatially coupled (SC) low-density parity-check (LDPC) codes are defined by bipartite graphs that are obtained by assembling prototype graphs. The combination and connection of prototype graphs are designated by specifying some parameters, and Kudekar et al. showed that BP threshold of the ensemble of SC LDPC codes agrees with MAP threshold of the ensemble of regular LDPC codes when those parameters are grown up so that the code length tends to infinity. When we design SC LDPC codes with practical code length, however, it is not clear how to set those parameters to enhance the performance of SC LDPC codes. In this paper, we provide the result of numerical experiments that suggest the dependence of error performance of SC LDPC codes over BEC on their design parameters.
Hui ZHAO Shuqiang YANG Hua FAN Zhikun CHEN Jinghu XU
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of an MapReduce cluster running lots of independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors to improve computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suitable for MapReduce environment, there are some in-used schedulers for the popular open-source Hadoop MapReduce implementation, however, they can not well optimize both factors. Our main objective is to minimize total flowtime of all jobs, given it's a strong NP-hard problem, we adopt some effective heuristics to seek satisfied solution. In this paper, we formalize the scheduling problem as job selection problem, a load balance aware job selection algorithm is proposed, in task level we design a strict data locality tasks scheduling algorithm for map tasks on map machines and a load balance aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
Shouyi YIN Rui SHI Leibo LIU Shaojun WEI
Coarse-grained Reconfigurable Architecture (CGRA) is a parallel computing platform that provides both high performance of hardware and high flexibility of software. It is becoming a promising platform for embedded and mobile applications. Since the embedded and mobile devices are usually battery-powered, improving battery lifetime becomes one of the primary design issues in using CGRAs. In this paper, we propose a battery-aware task-mapping method to optimize energy consumption and improve battery lifetime. The proposed method mainly addresses two problems: task partitioning and task scheduling when mapping applications onto CGRA. The task partitioning and scheduling are formulated as a joint optimization problem of minimizing the energy consumption. The nonlinear effects of real battery are taken into account in problem formulation. Using the insights from the problem formulation, we design the task-mapping algorithm. We have used several real-world benchmarks to test the effectiveness of the proposed method. Experiment results show that our method can dramatically lower the energy consumption and prolong the battery-life.
Junya KAIDA Yuko HARA-AZUMI Takuji HIEDA Ittetsu TANIGUCHI Hiroyuki TOMIYAMA Koji INOUE
This paper studies the static mapping of multiple applications on embedded many-core SoCs. The mapping techniques proposed in this paper take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. Experiments show the effectiveness of the proposed techniques.
Youngsoo PARK Taewon KIM Namho HUR
A method of frame synchronization between the color video and depth-map video for depth based 3D video using edge coherence is proposed. We find a synchronized pair of frames using edge coherence by computing the maximum number of overlapped edge pixels between the color video and depth-map video in regions of temporal frame difference. The experimental results show that the proposed method can be used for synchronization of depth-based 3D video and that it is robust against Gaussian noise with σ = less than 30 and video compression by H.264/AVC with QP = less than 44.
Song-Hyon KIM Kyong-Ha LEE Inchul SONG Hyebong CHOI Yoon-Joon LEE
We address the problem of processing graph pattern matching queries over a massive set of data graphs in this letter. As the number of data graphs is growing rapidly, it is often hard to process such queries with serial algorithms in a timely manner. We propose a distributed graph querying algorithm, which employs feature-based comparison and a filter-and-verify scheme working on the MapReduce framework. Moreover, we devise an efficient scheme that adaptively tunes a proper feature size at runtime by sampling data graphs. With various experiments, we show that the proposed method outperforms conventional algorithms in terms of scalability and efficiency.
Chen ZHANG ShiXiong XIA Bing LIU Lei ZHANG
Maximum margin clustering (MMC) is a newly proposed clustering method that extends the large-margin computation of support vector machine (SVM) to unsupervised learning. Traditionally, MMC is formulated as a nonconvex integer programming problem which makes it difficult to solve. Several methods rely on reformulating and relaxing the nonconvex optimization problem as semidefinite programming (SDP) or second-order cone program (SOCP), which are computationally expensive and have difficulty handling large-scale data sets. In linear cases, by making use of the constrained concave-convex procedure (CCCP) and cutting plane algorithm, several MMC methods take linear time to converge to a local optimum, but in nonlinear cases, time complexity is still high. Since extreme learning machine (ELM) has achieved similar generalization performance at much faster learning speed than traditional SVM and LS-SVM, we propose an extreme maximum margin clustering (EMMC) algorithm based on ELM. It can perform well in nonlinear cases. Moreover, the kernel parameters of EMMC need not be tuned by means of random feature mappings. Experimental results on several real-world data sets show that EMMC performs better than traditional MMC methods, especially in handling large-scale data sets.
Gugang GAO Peng CAO Jun YANG Longxing SHI
One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to efficiently map applications. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called ‘Hybrid partitioning’, for mapping a H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining good features of data partitioning and task partitioning, our methodology mainly consists of three levels from top to bottom: (1) hybrid task pipeline based on slice and macroblock (MB) level; (2) MB row-level data parallelism; (3) sub-MB level parallelism method. Further, on the sub-MB level, we propose a few mapping strategies such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 44, parallel processing order for deblocking. With our mapping strategies, we improved the algorithm's performance on REMUS-II. For example, with a luma 1616 MB, the Hybrid VBSMC achieves 4 times greater performance than VBSMC and 2.2 times greater performance than fixed 44 partition approach. Finally, we achieve 1080p@33fps H.264 high-profile (HiP)@level 4.1 decoding when the working frequency of REMUS-II is 200 MHz. Compared with typical hardware platforms, we can achieve better performance, area, and flexibility. For example, our performance achieves approximately 175% improvement than that of a commercial CGRA processor XPP-III while only using 70% of its area.
Akio MATOBA Narutoshi HORIMOTO Toshimichi SAITO
This letter studies a digital return map that is a mapping from a set of lattice points to itself. The digital map can exhibit various periodic orbits. As a typical example, we present the digital logistic map based on the logistic map. Two fundamental results are shown. When the logistic map has a unique periodic orbit, the digital map can have plural periodic orbits. When the logistic map has an unstable period-3 orbit that causes chaos, the digital map can have a stable period-3 orbit with various domain of attractions.
Fanxin ZENG Xiaoping ZENG Zhenyu ZHANG Guixin XUAN
A unified construction for transforming binary sequences of balance or unbalance into quaternary sequences is presented. On the one hand, when optimal and balanced binary sequences with even period are employed, our construction is exactly the same Jang, et al.'s and Chung, et al.'s ones, which result in balanced quaternary sequences with optimal autocorrelation magnitude. On the other hand, when ideal and balanced binary sequences with odd period N are made use of, our construction produces new balanced quaternary sequences with optimal autocorrelation value (OAV), in which there are N distinct sequences in terms of cyclic shift equivalence, and includes Tang, et al.'s and Jang, et al.'s ones as special cases. In addition, when binary sequences without period 2n-1 or balance are employed, the transformation of Jang, et al.'s method is invalid, however, the proposed construction works very good. As a consequence, this unified construction allows us to construct optimal and balanced quaternary sequences from ideal/optimal balanced binary sequences with arbitrary period.
Jiachen HUANG Changyong PAN Kewu PENG Liwen FAN Jian SONG
Amplitude phase shift keying (APSK) constellation with Gray mapping was proposed recently. Inspired by the simplified soft demapping for regular Gray-QAM, a simplified soft demapping algorithm for Gray-APSK is proposed in this paper. Compared with conventional soft demapping schemes, its complexity is greatly reduced with only a little SNR loss, which is validated by the complexity analysis and FPGA compilation results.
The performance of a mobile database management system (DBMS) in which most queries are made up of random data accesses if the NAND flash memory is used as storage media of the DBMS is degraded. The reason for this is that the performance of NAND flash memory is good for writing sequentially but poor when writing randomly. Thus, a new storage structure and querying policies are needed in mobile DBMS when flash memory is used as the storage media. In this letter, we propose a new policy of database page management to enhance the frequent random update performance, and then evaluate the performance experimentally.
Suk-Hwan LEE Seong-Geun KWON Ki-Ryong KWON
With the rapid expansion of vector data model application to digital content such as drawings and digital maps, the security and retrieval for vector data models have become an issue. In this paper, we present a vector data-hashing algorithm for the authentication, copy protection, and indexing of vector data models that are composed of a number of layers in CAD family formats. The proposed hashing algorithm groups polylines in a vector data model and generates group coefficients by the curvatures of the first and second type of polylines. Subsequently, we calculate the feature coefficients by projecting the group coefficients onto a random pattern, and finally generate the binary hash from binarization of the feature coefficients. Based on experimental results using a number of drawings and digital maps, we verified the robustness of the proposed hashing algorithm against various attacks and the uniqueness and security of the random key.
A smart meter, a component of smart grid systems, has low computational capability. This is in contrast to data collection units (DCUs) and meter data management servers (MDMSs). In this study, we propose a lightweight signature scheme and an authentication and report protocol that can reduce computational overhead and facilitate efficient operation of these three components in smart grid systems. The proposed signature scheme, called the linear map digital signature scheme (LMDSS), uses properties of linear maps and matrix operations; hence, it has low computational requirements. The proposed protocol is a delegation-based authentication protocol that uses an intermediate node, DCU. Using the proposed protocol, authentication of a DCU can be completed by a hash function. To evaluate the performance of the scheme and protocol, we implemented signature schemes, ours and others, and simulated the proposed protocol to obtain analytical results. We also proved that the scheme is secure by using a random oracle and analyzing the security of the protocol based on possible attack scenarios.
Zhi DENG Huaxi GU Yingtang YANG Hua YOU
In this paper, an energy- and traffic-balance-aware mapping algorithm from IP cores to nodes in a network is proposed for application-specific Network-on-Chip(NoC). The multi-objective optimization model is set up by considering the NoC architecture, and addressed by the proposed mapping algorithm that decomposes mapping optimization into a number of scalar subproblems simultaneously. In order to show performance of the proposed algorithm, the application specific benchmark is applied in the simulation. The experimental results demonstrate that the algorithm has advantages in energy consumption and traffic balance over other algorithms.
Xin HE Huiyun JING Qi HAN Xiamu NIU
Existing salient object detection methods either simply use a threshold to detect desired salient objects from saliency map or search the most promising rectangular window covering salient objects on the saliency map. There are two problems in the existing methods: 1) The performance of threshold-dependent methods depends on a threshold selection and it is difficult to select an appropriate threshold value. 2) The rectangular window not only covers the salient object but also contains background pixels, which leads to imprecise salient object detection. For solving these problems, a novel saliency threshold-free method for detecting the salient object with a well-defined boundary is proposed in this paper. We propose a novel window search algorithm to locate a rectangular window on our saliency map, which contains as many as possible pixels belonging the salient object and as few as possible background pixels. Once the window is determined, GrabCut is applied to extract salient object with a well-defined boundary. Compared with existing methods, our approach doesn't need any threshold to binarize the saliency map and additional operations. Experimental results show that our approach outperforms 4 state-of-the-art salient object detection methods, yielding higher precision and better F-Measure.
Kobkrit VIRIYAYUDHAKORN Susumu KUNIFUJI
Recent idea visualization programs still lack automatic idea summarization capabilities. This paper presents a knowledge-based method for automatically providing a short piece of English text about a topic to each idea group in idea charts. This automatic topic identification makes used Yet Another General Ontology (YAGO) and Wordnet as its knowledge bases. We propose a novel topic selection method and we compared its performance with three existing methods using two experimental datasets constructed using two idea visualization programs, i.e., the KJ Method (Kawakita Jiro Method) and mind-mapping programs. Our proposed topic identification method outperformed the baseline method in terms of both performance and consistency.
Bei HUANG Kaidi YOU Yun CHEN Zhiyi YU Xiaoyang ZENG
Reed-Solomon (RS) codes are widely used in digital communication and storage systems. Unlike usual VLSI approaches, this paper presents a high throughput fully programmable Reed-Solomon decoder on a multi-core processor. The multi-core processor platform is a 2-Dimension mesh array of Single Instruction Multiple Data (SIMD) cores, and it is well suited for digital communication applications. By fully extracting the parallelizable operations of the RS decoding process, we propose multiple optimization techniques to improve system throughput, including: task level parallelism on different cores, data level parallelism on each SIMD core, minimizing memory access, and route length minimized task mapping techniques. For RS(255, 239, 8), experimental results show that our 12-core implementation achieve a throughput of 4.35 Gbps, which is much better than several other published implementations. From the results, it is predictable that the throughput is linear with the number of cores by our approach.
Shouhei KIDERA Tetsuo KIRIMOTO
Microwave imaging techniques, in particular synthetic aperture radar (SAR), are able to obtain useful images even in adverse weather or darkness, which makes them suitable for target position or feature estimation. However, typical SAR imagery is not informative for the operator, because it is synthesized using complex radio signals with greater than 1.0 m wavelength. To deal with the target identification issue for imaging radar, various automatic target recognition (ATR) techniques have been developed. One of the most promising ATR approaches is based on neural network classification. However, in the case of SAR images heavily contaminated by random or speckle noises, the classification accuracy is severely degraded because it only compares the outputs of neurons in the final layer. To overcome this problem, this paper proposes a self organized map (SOM) based ATR method, where the binary SAR image is classified using the unified distance matrix (U-matrix) metric given by the SOM. Our numerical analyses and experiments on 5 types of civilian airplanes, demonstrate that the proposed method remarkably enhances the classification accuracy, particular in lower S/N situations, and holds a significant robustness to the angular variations of the observation.
Guo-An JIAN Cheng-An CHIEN Peng-Sheng CHEN Jiun-In GUO
This paper proposes a verification-aware design methodology that provides developers with a systematic and reliable approach to performing thread-pipelining parallelization on sequential programs. In contrast to traditional design flow, a behavior-model program is constructed before parallelizing as a bridge to help developers gradually leverage the technique of thread-pipelining parallelization. The proposed methodology integrates verification mechanisms into the design flow. To demonstrate the practicality of the proposed methodology, we applied it to the parallelization of a 3D depth map generator with thread pipelining. The parallel 3D depth map generator was further integrated into a 3D video playing system for evaluation of the verification overheads of the proposed methodology and the system performance. The results show the parallel system can achieve 33.72 fps in D1 resolution and 12.22 fps in HD720 resolution through a five-stage pipeline. When verifying the parallel program, the proposed verification approach keeps the performance degradation within 23% and 21.1% in D1 and HD720 resolutions, respectively.