Tankut ACARMAN Can GÖÇMENOĞLU
Limited satellite visibility, multipath, and non-line-of-sight signals reduce the performance of stand-alone Global Navigation Satellite System (GNSS) receivers in urban environments. Embedding a 3D model of urban structures may significantly improve position measurement accuracy when urban canyons restrict the visibility of GNSS satellites. State-of-the-art methods apply raytracing or rasterization techniques to a 3D map to detect satellite visibility. However, these techniques are computationally expensive, which limits their benefits for mobile and automotive applications. In this paper, a texture-based satellite visibility detection (TBSVD) methodology suitable for mobile and automotive grade Graphical Processing Units is presented. This methodology applies a ray marching algorithm to a 2D height map texture of urban structures and is proposed as a more efficient alternative to 3D raytracing or rasterization. A real road test is conducted in the business district of a metropolitan city to evaluate its performance. TBSVD is implemented in a conventional ranging-based GNSS solution, and the results illustrate the effectiveness of the proposed approach.
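The core ray marching step can be illustrated on the CPU. The following Python sketch is not the TBSVD shader; it assumes a height map stored as a list of rows (row index increasing northward, column index increasing eastward), a fixed marching step, and an azimuth measured clockwise from north. The function `satellite_visible` and its parameters are hypothetical. At each step along the satellite's azimuth, the height of the line of sight is compared with the building height stored in the cell below it.

```python
import math

def satellite_visible(height_map, cell_size, rx, ry, rz, azimuth_deg, elevation_deg,
                      step=0.5, max_dist=500.0):
    """Ray-march over a 2D height map; return False if a building blocks the satellite.

    height_map[row][col]: building height in metres (row grows northward, col eastward).
    (rx, ry): receiver position in metres; rz: antenna height in metres.
    """
    az = math.radians(azimuth_deg)                  # azimuth clockwise from north
    slope = math.tan(math.radians(elevation_deg))   # ray rise per horizontal metre
    dx, dy = math.sin(az), math.cos(az)             # horizontal unit direction (east, north)
    rows, cols = len(height_map), len(height_map[0])
    d = step
    while d <= max_dist:
        x, y = rx + dx * d, ry + dy * d
        col, row = int(x // cell_size), int(y // cell_size)
        if not (0 <= row < rows and 0 <= col < cols):
            return True                             # ray left the mapped area unobstructed
        if height_map[row][col] > rz + slope * d:
            return False                            # building is taller than the ray here
        d += step
    return True                                     # nothing blocked the ray within max_dist

city = [[0.0] * 10 for _ in range(10)]   # 10 x 10 cells of 10 m each
city[5] = [50.0] * 10                    # a 50 m tall block along row 5 (y = 50..60 m)
print(satellite_visible(city, 10.0, 50.0, 20.0, 1.5, 0.0, 30.0))   # False: low satellite to the north
print(satellite_visible(city, 10.0, 50.0, 20.0, 1.5, 0.0, 80.0))   # True: high satellite to the north
```

A fixed step keeps the loop simple but can skip walls thinner than `step`; a GPU implementation would typically run one such march per satellite in parallel.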
Motoki AMAGASAKI Ryo ARAKI Masahiro IIDA Toshinori SUEYOSHI
Most modern field programmable gate arrays (FPGAs) use a lookup table (LUT) as their basic logic cell. LUT resource requirements grow as O(2^k) with the number of inputs k, so LUTs with more than six inputs degrade overall FPGA performance. To address this problem, we propose a scalable logic module (SLM), a logic cell that requires less configuration memory because it implements frequently occurring logic functions using partial functions of the Shannon expansion. In addition, we develop a technology mapping tool for the SLM. The key feature of our tool is that it combines a function decomposition process with traditional cut-based mapping. Experimental results show that an SLM-based FPGA with our mapping method uses far fewer configuration memory bits and has a smaller area than conventional LUT-based FPGAs.
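The two facts behind the SLM motivation can be checked with a short Python sketch: a k-input LUT needs 2**k configuration bits, and Shannon expansion writes any function as a 2:1 multiplexer, f = (not x_i) * f0 + x_i * f1, of two (k-1)-input cofactors. The helper names are hypothetical, and the sketch shows only the generic expansion, not the SLM cell itself.

```python
from itertools import product

def lut_config_bits(k):
    """A k-input LUT stores one configuration bit per input combination: 2**k bits."""
    return 2 ** k

def cofactors(f, k, i):
    """Shannon cofactors of f w.r.t. input i, as truth tables over the other k-1 inputs."""
    f0, f1 = [], []
    for rest in product((0, 1), repeat=k - 1):
        f0.append(f(*rest[:i], 0, *rest[i:]))   # f with x_i = 0
        f1.append(f(*rest[:i], 1, *rest[i:]))   # f with x_i = 1
    return f0, f1

def shannon_holds(f, k, i):
    """Check f(x) == (not x_i) * f0 + x_i * f1 for every assignment x."""
    f0, f1 = cofactors(f, k, i)
    for idx, rest in enumerate(product((0, 1), repeat=k - 1)):
        for xi in (0, 1):
            expected = f1[idx] if xi else f0[idx]   # the 2:1 multiplexer selected by x_i
            if f(*rest[:i], xi, *rest[i:]) != expected:
                return False
    return True

majority7 = lambda *x: int(sum(x) >= 4)          # example 7-input function
print(lut_config_bits(7), lut_config_bits(6))    # 128 64
print(shannon_holds(majority7, 7, 3))            # True
```

Plain expansion saves nothing by itself (two 64-bit cofactor tables equal one 128-bit table); savings arise only when the cofactors of frequently occurring functions can be realized by cheaper partial functions, which is the property the SLM is described as exploiting.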
Yuma KINOSHITA Sayaka SHIOTA Masahiro IWAHASHI Hitoshi KIYA
A number of successful tone mapping operators (TMOs) for contrast compression have been proposed owing to the need to visualize high dynamic range (HDR) images on low dynamic range (LDR) devices. This paper proposes a novel inverse tone mapping (TM) operation and a new remapping framework built on it. Existing inverse TM operations require either storing some parameters calculated in the forward TM or performing data-dependent operations. The proposed inverse TM operation estimates HDR images from LDR images mapped by Reinhard's global operator without keeping any parameters and without any data-dependent calculation. The proposed remapping framework consists of two TM operations. The first TM operation is carried out by Reinhard's global operator, and the generated LDR image is stored. When an LDR image of different quality is desired, the proposed inverse TM operation is applied to the stored LDR image to generate an HDR image, and the second TM operation, which may use an arbitrary TMO, is applied to that HDR image to generate an LDR image of the desired quality. This framework not only allows HDR images to be visualized on low dynamic range devices at low computational cost but also allows an HDR image to be stored efficiently as an LDR image. Simulations show that the proposed inverse TM operation has a lower computational cost than conventional ones. Furthermore, they confirm that the proposed framework can remap the stored LDR image to another LDR image whose quality is the same as that of an LDR image remapped by the conventional inverse TMO with parameters.
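The simple form of Reinhard's global operator, Ld = L / (1 + L) applied to the key-scaled luminance L = (a / log-average(Lw)) * Lw, has the closed-form inverse L = Ld / (1 - Ld). The Python sketch below (helper names hypothetical, white-point term omitted) shows that this inverse recovers L without any stored parameters. Recovering the absolute luminance Lw would additionally need the scale factor a / log-average(Lw); how the paper handles that factor is not described here, so this is an illustration of the operator, not the paper's exact operation.

```python
import math

def scale_luminance(Lw, a=0.18, delta=1e-6):
    """Key scaling of Reinhard's global operator: L = (a / log-average(Lw)) * Lw."""
    log_avg = math.exp(sum(math.log(delta + x) for x in Lw) / len(Lw))
    return [a / log_avg * x for x in Lw]

def reinhard_forward(L):
    """Reinhard's global operator without the white-point term: Ld = L / (1 + L)."""
    return L / (1.0 + L)

def reinhard_inverse(Ld):
    """Closed-form inverse L = Ld / (1 - Ld); no stored parameters are needed."""
    if not 0.0 <= Ld < 1.0:
        raise ValueError("LDR value must lie in [0, 1)")
    return Ld / (1.0 - Ld)

scene = [0.01, 0.5, 2.0, 40.0]                    # toy HDR luminances
L = scale_luminance(scene)
Ld = [reinhard_forward(x) for x in L]             # stored LDR values in [0, 1)
recovered = [reinhard_inverse(y) for y in Ld]     # equals L up to floating-point rounding
print(reinhard_inverse(0.75))                     # 3.0
```

In practice the stored LDR image is quantized (for example to 8 bits), and because the slope of the inverse, 1 / (1 - Ld)^2, grows as Ld approaches 1, quantization error is amplified most in highlights.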
Yao HU Ikki FUJIWARA Michihiro KOIBUCHI
A number of parallel applications run simultaneously on a high-performance computing (HPC) system. Job mapping and scheduling are crucial for improving system utilization, because fragmentation can prevent an incoming job from being assigned even when enough compute nodes are idle. Wireless supercomputers and datacenters with free-space optical (FSO) terminals have been proposed to replace conventional wired interconnections, so that diverse application workloads can be better supported by changing the network topology. In this study, we first present an efficient job mapping method that swaps the endpoints of FSO links in a wireless HPC system. Our evaluation shows that an FSO-equipped wireless HPC system can achieve a shorter average queue length and queuing time for all dispatched user jobs. Second, we consider a more sophisticated scheduling algorithm, which can further improve system utilization across different host networks, as well as the average response time for all dispatched user jobs. Finally, we present the performance advantages of the proposed wireless HPC system under more practical assumptions, such as different cabinet capacities and diverse subtopology packings.
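Fragmentation can be illustrated with a toy model, assuming for illustration that nodes sit in a 1D row of a wired network and that a job must occupy adjacent nodes. The Python sketch below (all names hypothetical) shows a request for four nodes failing under contiguous first-fit even though five nodes are idle. A reconfigurable topology, such as one whose FSO link endpoints can be swapped, is one way to make scattered idle nodes usable, but the sketch does not model links at all.

```python
def first_fit_contiguous(free, size):
    """Return the first run of `size` adjacent idle nodes, or None (contiguous allocation)."""
    run = 0
    for i, idle in enumerate(free):
        run = run + 1 if idle else 0       # length of the current run of idle nodes
        if run == size:
            return list(range(i - size + 1, i + 1))
    return None

def any_idle(free, size):
    """Return any `size` idle nodes regardless of position, or None if too few are idle."""
    idle = [i for i, f in enumerate(free) if f]
    return idle[:size] if len(idle) >= size else None

free = [True, False, True, True, False, True, True, False]   # 5 idle nodes, fragmented
print(first_fit_contiguous(free, 4))   # None
print(any_idle(free, 4))               # [0, 2, 3, 5]
```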
Gian MAYUGA Yuta YAMATO Tomokazu YONEDA Yasuo SATO Michiko INOUE
Embedded memory is used extensively in SoCs and is rapidly growing in size and density. It enables SoCs to provide richer features, but at the expense of occupying the largest share of the chip area. Due to the continuous scaling of nanoscale device technology, large memories are susceptible to aging-induced faults and soft errors, which affect reliability. In-field test and repair, as well as error-correcting codes (ECC), can be used to maintain reliability, and recently these methods have been used together in a combined approach in which uncorrectable words are repaired while correctable words are left to the ECC. In this paper, we propose a novel in-field repair strategy for an ECC-based memory architecture that repairs uncorrectable words and, possibly, correctable words. It executes an adaptive reconfiguration method that ensures 'fresh' memory words are always used until the spare words run out. Experimental results demonstrate that our strategy enhances reliability with a small area overhead.
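The combined repair-plus-ECC policy can be sketched as a spare-assignment routine. The Python below assumes a code that corrects `ecc_t` bit errors per word (1 for a single-error-correcting code) and per-word fault counts obtained from an in-field test; it simply repairs uncorrectable words first and then correctable ones while spares remain. The function `plan_repair` is hypothetical and does not reproduce the paper's adaptive reconfiguration method.

```python
def plan_repair(fault_counts, spares, ecc_t=1):
    """Assign spare words: uncorrectable words first, then correctable ones while spares last.

    fault_counts: mapping word address -> number of faulty bits found by in-field test
    ecc_t: bit errors the ECC can correct per word (assumption: 1, i.e. a SEC code)
    Returns (repaired_addresses, unrepaired_uncorrectable_addresses).
    """
    # words with more faults than the ECC can correct, worst first
    uncorrectable = sorted((a for a, n in fault_counts.items() if n > ecc_t),
                           key=lambda a: -fault_counts[a])
    # words the ECC could still handle, but that a spare can relieve
    correctable = sorted((a for a, n in fault_counts.items() if 0 < n <= ecc_t),
                         key=lambda a: -fault_counts[a])
    repaired = []
    for addr in uncorrectable + correctable:
        if len(repaired) == spares:
            break
        repaired.append(addr)
    failed = [a for a in uncorrectable if a not in repaired]
    return repaired, failed

faults = {3: 2, 5: 1, 8: 0, 9: 3}       # address -> faulty bits
print(plan_repair(faults, spares=3))    # ([9, 3, 5], [])
print(plan_repair(faults, spares=1))    # ([9], [3])
```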
Aseffa DEREJE TEKILU Chin-Hsien WU
The map-reduce framework is popular for big data analysis. In a typical map-reduce framework, both the master node and the worker nodes can use hard-disk drives (HDDs) as local disks for the map-reduce computation. However, because of the inherent mechanical limitations of HDDs, I/O performance becomes a bottleneck for the map-reduce framework when I/O-intensive applications (e.g., sorting) are performed. Although solid-state drives (SSDs) outperform HDDs, replacing HDDs entirely with SSDs is not economical. In this paper, we propose a virtualization-based hybrid storage system for the map-reduce framework. The objective is to combine the fast access of SSDs with the low cost of HDDs, realizing an economical design while improving the I/O performance of a map-reduce framework in a virtualized environment. We propose three storage combinations: SSD-based, HDD-based, and a hybrid of SSD-based and HDD-based storage, which balances speed, capacity, and lifetime. Experiments show that the hybrid SSD/HDD storage system offers superior performance and economy.
Byungnam LIM Yeeun SHIM Yon Dohn CHUNG
For efficient processing of large data in a distributed system, Hadoop MapReduce schedules tasks with consideration of data locality. However, data locality is exploited only to a limited extent, since it is pursued one node at a time without considering global optimality. In this paper, we propose a novel task scheduling algorithm that considers data locality globally. Through experiments, we show that our algorithm improves the performance of MapReduce in various situations.
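The gap between node-at-a-time and global locality can be shown on a two-node example. In the Python sketch below, which uses hypothetical helper names, each node in turn grabs the first pending task with a local replica, while the global variant searches all one-task-per-node assignments exhaustively. Exhaustive search is only a stand-in for whatever global method the paper uses and is practical only for tiny inputs.

```python
from itertools import permutations

def locality(assign, replicas):
    """Number of tasks placed on a node that stores a replica of their input block."""
    return sum(1 for node, task in assign.items() if node in replicas[task])

def greedy_node_at_a_time(nodes, tasks, replicas):
    """Each node, in turn, takes the first remaining local task (else any remaining task)."""
    remaining, assign = list(tasks), {}
    for node in nodes:
        local = [t for t in remaining if node in replicas[t]]
        task = local[0] if local else remaining[0]
        assign[node] = task
        remaining.remove(task)
    return assign

def global_optimal(nodes, tasks, replicas):
    """Exhaustive search over one-task-per-node assignments (tiny examples only)."""
    best = max(permutations(tasks, len(nodes)),
               key=lambda perm: locality(dict(zip(nodes, perm)), replicas))
    return dict(zip(nodes, best))

nodes, tasks = ["n1", "n2"], ["t1", "t2"]
replicas = {"t1": {"n1", "n2"}, "t2": {"n1"}}    # t2's block lives only on n1
print(locality(greedy_node_at_a_time(nodes, tasks, replicas), replicas))   # 1
print(locality(global_optimal(nodes, tasks, replicas), replicas))          # 2
```

Here `n1` greedily takes `t1`, leaving `t2` to run remotely on `n2`, whereas the global assignment places both tasks locally.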
Osamu UCHIDA Masafumi KOSUGI Gaku ENDO Takamitsu FUNAYAMA Keisuke UTSU Sachi TAJIMA Makoto TOMITA Yoshitaka KAJITA Yoshiro YAMAMOTO
It is important to collect and spread accurate information quickly during disasters. Therefore, the use of Twitter during disasters has been gaining attention in recent years. In this paper, we propose a real-time information sharing system for disasters based on Twitter. The proposed system consists of two subsystems: a disaster information tweeting system that automatically attaches the user's current geo-location information (address) and a hashtag of the form “#(municipality name) disaster,” and a disaster information mapping system that displays neighboring disaster-related tweets on a map.
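The mapping subsystem needs some notion of "neighboring" tweets. One common choice, assumed here rather than taken from the paper, is a great-circle radius around the viewer's position computed with the haversine formula. The Python sketch uses hypothetical names, illustrative coordinates, and treats each tweet as a dict with `lat`, `lon`, and `text`.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def neighboring_tweets(tweets, lat, lon, radius_km):
    """Keep tweets within radius_km of the viewer's position (lat, lon)."""
    return [t for t in tweets if haversine_km(lat, lon, t["lat"], t["lon"]) <= radius_km]

tweets = [
    {"lat": 35.681, "lon": 139.767, "text": "flooding near station"},
    {"lat": 35.690, "lon": 139.700, "text": "road closed"},          # about 6 km away
    {"lat": 34.702, "lon": 135.495, "text": "far away"},             # hundreds of km away
]
print([t["text"] for t in neighboring_tweets(tweets, 35.681, 139.767, 10.0)])
# ['flooding near station', 'road closed']
```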
This paper describes two speed-up techniques for Boolean matching of LUT-based circuits. The first is a one-hot encoding technique for the variables that represent input assignments. Although it requires more variables than the existing binary encoding technique, almost all clauses added by one-hot encoding are binary clauses, which are well suited to efficient Boolean constraint propagation. The second is a CEGAR (counterexample-guided abstraction refinement) technique, which reduces CPU time significantly. With both techniques, we can solve the Boolean matching problem for 9-input functions in 20 milliseconds on average, which is more than an order of magnitude faster than existing algorithms.
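The claim that one-hot encoding adds mostly binary clauses follows from the standard exactly-one constraint: one clause with n literals (at least one value) plus n(n-1)/2 two-literal clauses (at most one value). The Python sketch below generates these clauses as DIMACS-style signed integers and brute-forces their models; the helper names and selector variables are hypothetical, and the paper's full Boolean matching encoding is not reproduced.

```python
from itertools import combinations, product

def one_hot_clauses(variables):
    """CNF (DIMACS-style signed ints) forcing exactly one of `variables` to be true."""
    clauses = [list(variables)]                                   # at least one
    clauses += [[-a, -b] for a, b in combinations(variables, 2)]  # at most one (binary)
    return clauses

def count_models(variables, clauses):
    """Brute-force count of assignments to `variables` that satisfy every clause."""
    count = 0
    for values in product((False, True), repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(assign[abs(lit)] == (lit > 0) for lit in cl) for cl in clauses):
            count += 1
    return count

sel = [1, 2, 3, 4]          # hypothetical selectors: a LUT pin -> one of 4 function inputs
cnf = one_hot_clauses(sel)
print(len(cnf), sum(len(c) == 2 for c in cnf))   # 7 6
print(count_models(sel, cnf))                    # 4
```

A binary encoding of the same four choices would need only ceil(log2 4) = 2 variables, which is the variable-count trade-off the abstract mentions.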
Most unsupervised video segmentation algorithms have difficulty extracting objects from dynamic real-world scenes with large displacements, because the foreground hypothesis is often initialized without any explicit mutual constraint on top-down spatio-temporal coherency, even though such a constraint may be imposed on the segmentation objective. To handle such situations, we propose a multiscale saliency flow (MSF) model that jointly learns both foreground and background features of multiscale salient evidences, allowing temporally coherent top-down information in one frame to be propagated throughout the remaining frames. In particular, the top-down evidences are detected by combining the saliency signature within a certain range of higher scales of approximation coefficients in the wavelet domain. Saliency flow is then estimated by Gaussian kernel correlation of non-maximum-suppressed multiscale evidences, which are characterized by HOG descriptors in a high-dimensional feature space. We build the proposed MSF model in accordance with the primary object hypothesis, which jointly integrates temporally consistent constraints of the saliency maps estimated at multiple scales into the objective. We demonstrate the effectiveness of the proposed multiscale saliency flow for segmenting dynamic real-world scenes with large displacements caused by uniform sampling of video sequences.
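The Gaussian kernel correlation step can be illustrated with plain feature vectors standing in for HOG descriptors. The Python sketch below, with hypothetical names and an arbitrary bandwidth `sigma`, scores every pair of evidences from consecutive frames with k(u, v) = exp(-||u - v||^2 / (2 sigma^2)) and keeps the best match for each evidence; it is not the full MSF model.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 sigma^2)); equals 1 for identical descriptors."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2 * sigma ** 2))

def match_evidences(desc_t, desc_next, sigma=1.0):
    """For each descriptor in frame t, return (index of best match in frame t+1, score)."""
    matches = []
    for u in desc_t:
        scores = [gaussian_kernel(u, v, sigma) for v in desc_next]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((j, scores[j]))
    return matches

frame_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # toy descriptors in frame t
frame_t1 = [[0.0, 0.9, 0.1], [0.95, 0.05, 0.0]]     # the same evidences, moved, in frame t+1
print([j for j, _ in match_evidences(frame_t, frame_t1)])   # [1, 0]
```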
This paper presents a performance evaluation of an improved multiband impulse radio ultra-wideband (MIR UWB) system based on sub-band selection. In the improved scheme, a data mapping algorithm is introduced into a conventional MIR UWB system, and only some of the sub-bands are selected to transmit information data, which improves the flexibility of sub-band (spectrum) allocation, avoids interference, and provides a variety of data rates. Based on the transmitter and receiver diagrams, the exact bit error rate (BER) of the improved system is derived. The performance of the improved MIR UWB system is compared with that of the conventional MIR UWB system in different channels. Simulation results show that the improved system achieves the same data rate and better BER performance than the conventional MIR UWB system in additive white Gaussian noise (AWGN), multipath fading, and interference-coexistence channels. In addition, different data transmission rates and BER performances can easily be achieved by an appropriate choice of system parameters.
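The abstract does not spell out the data mapping algorithm. One plausible reading, assumed here purely for illustration, is that the choice of which k of the n sub-bands are active itself carries data, giving floor(log2 C(n, k)) bits per selection. The Python sketch below (hypothetical names) maps a bit string to an active sub-band set in lexicographic order.

```python
from itertools import combinations
from math import comb, floor, log2

def bits_per_selection(n_subbands, k_active):
    """Whole bits carried by choosing k active sub-bands out of n."""
    return floor(log2(comb(n_subbands, k_active)))

def map_bits_to_subbands(bits, n_subbands, k_active):
    """Map a bit string to a tuple of active sub-band indices (lexicographic order)."""
    m = bits_per_selection(n_subbands, k_active)
    if len(bits) != m:
        raise ValueError(f"expected {m} bits, got {len(bits)}")
    index = int(bits, 2)                    # only the first 2**m combinations are used
    return list(combinations(range(n_subbands), k_active))[index]

print(bits_per_selection(4, 2))             # 2
print(map_bits_to_subbands("10", 4, 2))     # (0, 3)
```

Changing k or n changes the number of bits per selection, which is one way such a scheme could offer several data rates.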
Yining XU Yang LIU Junya KAIDA Ittetsu TANIGUCHI Hiroyuki TOMIYAMA
This paper proposes a static application mapping technique, based on integer linear programming, for non-hierarchical manycore embedded systems. Unlike previous work, which targeted hierarchical manycore SoCs, this work allows more flexible application mapping and thereby achieves higher performance. Experimental results demonstrate the effectiveness of the proposed technique.
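For readers unfamiliar with ILP-based mapping, a textbook formulation of static task-to-core assignment is shown below. It is a generic sketch, not the paper's model: T is the set of tasks, C the set of cores, w_t an assumed workload of task t, x_{t,c} = 1 if task t runs on core c, and M the makespan to be minimized. A real manycore formulation would also account for communication costs.

```latex
\begin{aligned}
\min_{x,\,M}\quad & M \\
\text{s.t.}\quad & \sum_{c \in C} x_{t,c} = 1, && \forall t \in T \quad \text{(each task on exactly one core)} \\
& \sum_{t \in T} w_t \, x_{t,c} \le M, && \forall c \in C \quad \text{(core load bounded by } M\text{)} \\
& x_{t,c} \in \{0, 1\}, && \forall t \in T,\ c \in C
\end{aligned}
```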
Junki KAWAGUCHI Hayato MASHIKO Yukihide KOHIRA
In the general-synchronous framework, in which the clock is distributed periodically to each register but not necessarily simultaneously, circuit performance is expected to improve compared with the complete-synchronous framework, in which the clock is distributed periodically and simultaneously to each register. To improve circuit performance further, logic synthesis for the general-synchronous framework is required. In this paper, assuming that any clock schedule can be realized by an ideal clock distribution circuit, we propose a technology mapping method that uses integer linear programming to assign a cell to each gate of a given logic circuit when two or more cell libraries are available. Experiments demonstrate the effectiveness of the proposed technology mapping method.
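Why non-simultaneous clocking helps can be seen from the standard clock-scheduling constraints. Assuming zero setup and hold times and ideal clock distribution, a path from register i to register j with delays d_min and d_max requires s_i + d_max <= s_j + T (setup) and s_i + d_min >= s_j (hold), where s is the clock arrival time and T the clock period. The Python check below (hypothetical names) shows a two-register loop that fails at T = 5 under zero skew but passes with a skewed schedule.

```python
def schedule_feasible(paths, skew, period):
    """Check zero-setup/zero-hold timing for a general-synchronous clock schedule.

    paths: list of (src, dst, d_min, d_max) register-to-register combinational delays
    skew:  dict register -> clock arrival time s_r
    """
    for src, dst, d_min, d_max in paths:
        if skew[src] + d_max > skew[dst] + period:   # setup (long-path) constraint
            return False
        if skew[src] + d_min < skew[dst]:            # hold (short-path) constraint
            return False
    return True

paths = [("A", "B", 5, 7), ("B", "A", 2, 3)]         # a two-register loop
print(schedule_feasible(paths, {"A": 0, "B": 0}, 5))   # False: complete-synchronous
print(schedule_feasible(paths, {"A": 0, "B": 2}, 5))   # True: general-synchronous
print(schedule_feasible(paths, {"A": 0, "B": 0}, 7))   # True: zero skew needs T >= 7
```

With zero skew the period must be at least 7 here; delaying register B's clock by 2 units lets the same circuit run at T = 5, which equals the mean delay around the loop.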
Zhihong LIU Aimal KHAN Peixin CHEN Yaping LIU Zhenghu GONG
MapReduce still suffers from a problem known as skew, in which load is unevenly distributed among tasks. Existing solutions follow a similar pattern: they estimate the load of each task and then rebalance the load among tasks. However, these solutions often incur heavy overhead due to load estimation and rebalancing. In this paper, we present DynamicAdjust, a dynamic resource adjustment technique for mitigating skew in MapReduce. Instead of rebalancing the load among tasks, DynamicAdjust dynamically allocates more resources to the tasks that need more computation, thereby accelerating them. Through experiments using real MapReduce workloads on a 21-node Hadoop cluster, we show that DynamicAdjust can effectively mitigate skew and speed up job completion by up to 37.27% compared with native Hadoop YARN.
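The idea of speeding up heavy tasks instead of moving their data can be sketched with a simple greedy allocator. The Python below assumes, purely for illustration, that a task's finish time is its remaining work divided by its resource units (linear speedup), and it hands out spare units one at a time to the task expected to finish last. `adjust_resources` is hypothetical and is not DynamicAdjust's actual YARN policy.

```python
import heapq

def adjust_resources(work, spare_units, base_units=1):
    """Give spare units one at a time to the task with the latest estimated finish.

    work: estimated remaining work per task; estimated finish = work / units.
    Returns the per-task unit allocation.
    """
    units = [base_units] * len(work)
    heap = [(-w / base_units, i) for i, w in enumerate(work)]   # max-heap on finish time
    heapq.heapify(heap)
    for _ in range(spare_units):
        _, i = heapq.heappop(heap)                  # task expected to finish last
        units[i] += 1
        heapq.heappush(heap, (-work[i] / units[i], i))
    return units

print(adjust_resources([10, 2, 2], 2))   # [3, 1, 1]
```

For work [10, 2, 2] and two spare units, both go to the heavy task, cutting its estimated finish time from 10 to about 3.3.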
Esmaeil POURJAM Daisuke DEGUCHI Ichiro IDE Hiroshi MURASE
Human body segmentation has applications in a wide variety of image processing tasks, from intelligent vehicles to entertainment. A substantial amount of research has been done on segmentation, and it remains an active research area, resulting in the introduction of many innovative methods in the literature. Still, no method has yet been introduced that overcomes the problems of human segmentation and adapts itself to different kinds of situations. Many current methods use the graph-cut framework to solve the segmentation problem. Although powerful, these methods rely on a distance penalty term (intensity difference or RGB color distance). This term does not always lead to a good separation between two regions. For example, if two regions are close in color, they will be grouped together even if they belong to two different objects, which is not acceptable. Also, if one object has multiple parts with different colors (e.g., humans wear clothes of various colors and patterns), each part will be segmented separately. Although this can be mitigated by multiple user inputs, the inherent problem remains unsolved. In this paper, we address the problem by using a human probability map, superpixels, and the GrabCut framework. Using this map removes the need to match the model to the actual body and thus helps improve segmentation accuracy. As a result, not only has the accuracy improved, but it has also become comparable to that of state-of-the-art interactive methods.
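The color-distance problem can be seen directly in the boundary term used by GrabCut-style graph cuts, w(p, q) = exp(-beta * ||I_p - I_q||^2), with beta usually set to 1 / (2 * mean squared color difference between neighboring pixels). In the Python sketch below (hypothetical names, illustrative RGB values and beta), a person and a similarly colored background get a weight near 1, so the cut avoids separating them, while two differently colored garments get a weight near 0, so the cut readily splits them. The sketch illustrates the problem only, not the paper's probability-map method.

```python
import math

def pairwise_weight(rgb_p, rgb_q, beta):
    """Contrast-sensitive boundary weight exp(-beta * ||I_p - I_q||^2).

    A large weight makes it expensive to cut between pixels p and q, so neighbors
    with similar colors tend to receive the same label.
    """
    sq = sum((a - b) ** 2 for a, b in zip(rgb_p, rgb_q))
    return math.exp(-beta * sq)

def estimate_beta(neighbor_pairs):
    """beta = 1 / (2 * mean squared color difference over neighboring pixel pairs)."""
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in neighbor_pairs)
    return 1.0 / (2.0 * total / len(neighbor_pairs))

skin, wall = (200, 160, 140), (205, 165, 142)   # different objects, similar colors
shirt, pants = (180, 30, 30), (30, 30, 160)     # same person, very different colors
beta = 0.001                                    # illustrative value
print(pairwise_weight(skin, wall, beta))        # close to 1: cutting here is costly
print(pairwise_weight(shirt, pants, beta))      # close to 0: cutting here is cheap
```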
Leibo LIU Dong WANG Yingjie CHEN Min ZHU Shouyi YIN Shaojun WEI
This paper presents the design of a multi-standard 1080 high definition (HD) video decoder on a mixed-grained reconfigurable computing platform that integrates coarse-grained reconfigurable processing units (RPUs) and FPGAs. The proposed RPU, which includes 16×16 multi-functional processing elements (PEs), is used to accelerate compute-intensive tasks in video decoding. A soft-core-based microprocessor array is implemented on the FPGA and used to speed up the dynamic reconfiguration of the RPU. Furthermore, a mailbox-based communication scheme is used to improve communication efficiency between the RPUs and FPGAs. By exploiting dynamic reconfiguration of the RPUs and static reconfiguration of the FPGAs, the proposed platform achieves scalable performance and cost trade-offs to support a variety of video coding standards, including MPEG-2, AVS, H.264, and HEVC. Measured results show that the proposed platform supports H.264 1080 HD video streams at up to 57 frames per second (fps) and HEVC 1080 HD video streams at up to 52 fps at 250 MHz. At the same time, it achieves a 3.6× performance gain over an industrial coarse-grained reconfigurable processor for H.264 decoding and a 6.43× performance boost over a general-purpose-processor-based implementation for HEVC decoding.
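As a rough sanity check on the reported H.264 throughput, the average cycle budget per 16×16 macroblock follows from the frame size, frame rate, and clock frequency. This is derived arithmetic under stated assumptions (all 250 MHz cycles available to one decoding pipeline, 1080 lines coded as 1088), not a figure reported by the paper; the helper name is hypothetical.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock-cycle budget per macroblock for real-time decoding.

    Frame dimensions are rounded up to whole macroblocks (1080 lines are coded as 1088).
    """
    mbs_per_frame = -(-width // mb_size) * -(-height // mb_size)   # ceiling division
    return clock_hz / (mbs_per_frame * fps)

budget = cycles_per_macroblock(1920, 1080, 57, 250e6)   # 8160 macroblocks per frame
print(round(budget, 1))   # 537.5
```

HEVC uses coding tree units of up to 64×64 samples, so a per-16×16 budget is less directly meaningful for the 52 fps HEVC result.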
In this paper, we exploit the MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including the no-partition hash join and the partition hash join. We first implement hash join algorithms, comprising partition, build, and probe phases, with a shared-memory MapReduce model on multi-core CPUs. Then we design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations: one for the use of SIMD instructions and the other for the partition phase. Experimental results and analysis show that the partition hash join often outperforms the no-partition hash join and that our hash join algorithm is on average 30% faster than previous work.
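A cuckoo table with a chained fallback, as described above, can be sketched in a few dozen lines. In the Python below, each key has one candidate slot in each of two tables; an insertion that keeps evicting entries for more than `max_kicks` rounds moves the displaced entry into a chained overflow table instead of rehashing. The class and function names, hash functions, and parameters are hypothetical, and the sketch omits partitioning, parallelism, and SIMD.

```python
from collections import defaultdict

class CuckooChainedTable:
    """Two-table cuckoo hash with a chained overflow table for repeatedly evicted entries."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.t1 = [None] * capacity
        self.t2 = [None] * capacity
        self.overflow = defaultdict(list)          # chained hash table: key -> [values]

    def _h1(self, key):
        return hash(key) % self.capacity

    def _h2(self, key):
        return hash((key, 0x5BD1E995)) % self.capacity   # a second, different hash

    def insert(self, key, value):
        entry = (key, value)
        for _ in range(self.max_kicks):
            i = self._h1(entry[0])
            if self.t1[i] is None:
                self.t1[i] = entry
                return
            entry, self.t1[i] = self.t1[i], entry  # evict the occupant of table 1
            j = self._h2(entry[0])
            if self.t2[j] is None:
                self.t2[j] = entry
                return
            entry, self.t2[j] = self.t2[j], entry  # evict the occupant of table 2
        self.overflow[entry[0]].append(entry[1])   # too many kicks: chain it instead

    def lookup(self, key):
        """All values stored under key (duplicate keys are allowed, as on a join build side)."""
        found = [e[1] for e in (self.t1[self._h1(key)], self.t2[self._h2(key)])
                 if e is not None and e[0] == key]
        return found + self.overflow.get(key, [])

def hash_join(build_rows, probe_rows):
    """Equi-join (key, payload) rows: build on the first input, probe with the second."""
    table = CuckooChainedTable(capacity=max(16, 2 * len(build_rows)))
    for key, payload in build_rows:                        # build phase
        table.insert(key, payload)
    return [(key, b, p) for key, p in probe_rows           # probe phase
            for b in table.lookup(key)]

rows = hash_join([(1, "a"), (2, "b"), (1, "c")], [(1, "x"), (3, "y"), (2, "z")])
print(sorted(rows))   # [(1, 'a', 'x'), (1, 'c', 'x'), (2, 'b', 'z')]
```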
Yuichi TAZAKI Jingyu XIANG Tatsuya SUZUKI Blaine LEVEDAHL
This research develops a method for trajectory planning of robotic systems with differential constraints based on hierarchical partitioning of a continuous state space. Unlike conventional roadmaps, which are constructed in the configuration space, the proposed state roadmap also includes additional state information, such as velocity and orientation. A bounded domain of the additional state is partitioned into sub-intervals with multiple resolution levels. Each node of a state roadmap consists of a fixed position and an interval of additional state values. A valid transition is defined between a pair of nodes if any combination of additional states, within their respective intervals, produces a trajectory that satisfies a set of safety constraints. In this manner, a trajectory connecting arbitrary start and goal states subject to safety constraints can be obtained by applying a graph search technique to the state roadmap. The hierarchical nature of the state roadmap reduces the computational cost of roadmap construction, the storage required for computed roadmaps, and the computational cost of path planning. The state roadmap method is evaluated on trajectory planning examples for an omni-directional mobile robot and a car-like robot with collision avoidance and various types of constraints.
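The interval-based transition test can be illustrated on a 1D example. The Python sketch below assumes, for illustration only, straight segments of length `dist` traversed with constant acceleration bounded by `a_max`, and reads "any combination" as "every combination" within the two speed intervals. Speed intervals are partitioned at a given resolution level, and a pair is accepted only if all combinations satisfy the bound. The names are hypothetical.

```python
def partition(lo, hi, level):
    """Split [lo, hi] into 2**level equal sub-intervals (resolution level `level`)."""
    n = 2 ** level
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]

def transition_valid(v_from, v_to, dist, a_max):
    """True if every v0 in v_from and v1 in v_to keeps |acceleration| <= a_max.

    Assumed model: a = (v1**2 - v0**2) / (2 * dist). For non-negative speeds a is
    monotone in v0 and v1, so checking interval endpoints covers every combination.
    """
    for v0 in v_from:
        for v1 in v_to:
            if abs((v1 ** 2 - v0 ** 2) / (2 * dist)) > a_max:
                return False
    return True

def valid_pairs(lo, hi, level, dist, a_max):
    """All (from, to) speed-interval pairs at this resolution level that pass the check."""
    cells = partition(lo, hi, level)
    return [(a, b) for a in cells for b in cells if transition_valid(a, b, dist, a_max)]

print(valid_pairs(0.5, 2.5, 0, 1.0, 1.0))   # []
print(valid_pairs(0.5, 2.5, 1, 1.0, 1.0))   # [((0.5, 1.5), (0.5, 1.5))]
```

At level 0 the single interval [0.5, 2.5] admits no transition, while at level 1 the slower sub-interval [0.5, 1.5] can connect to itself; this is the kind of refinement a hierarchical roadmap exploits. Valid pairs would then become edges for an ordinary graph search.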
Hongyeon KIM Sungmin KANG Seokjoo LEE Jun-Ki MIN
MapReduce is considered the de facto framework for storing and processing massive data because of its attractive features: simplicity, flexibility, fault tolerance, and scalability. However, since the MapReduce framework does not provide an efficient access method (i.e., an index), the whole data set must be retrieved even when a user wants to access only a small portion of it. Thus, in this paper, we devise an efficient algorithm for constructing quadtrees with MapReduce. Our proposed algorithms reduce the index construction time by using a sampling technique to partition the data set. To improve query performance, we extend the quadtree construction algorithm so that adjacent nodes of a quadtree are merged when the number of points located in them is less than a predefined threshold. Furthermore, we present an effective algorithm for incremental updates. Our experimental results show the efficiency of our proposed algorithms in diverse environments.
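The sampling step can be sketched as follows: build a quadtree on a random sample with a proportionally scaled leaf capacity, and use its leaves as partitions; in a MapReduce setting, mappers would then assign points to partitions and each reducer would build a local subtree. The Python below shows only the sampling and partitioning part on a single machine; the names, the half-open cell convention, and the capacity scaling are assumptions for illustration.

```python
import random

def in_box(p, box):
    """Half-open cell test, so each point inside the root box has exactly one owner."""
    x0, y0, x1, y1 = box
    return x0 <= p[0] < x1 and y0 <= p[1] < y1

def build_quadtree(points, box, capacity, depth=0, max_depth=10):
    """Return the leaf boxes of a point quadtree whose leaves hold at most `capacity` points."""
    if len(points) <= capacity or depth == max_depth:
        return [box]
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for q in ((x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)):
        leaves += build_quadtree([p for p in points if in_box(p, q)], q, capacity,
                                 depth + 1, max_depth)
    return leaves

def sample_partitions(points, box, capacity, sample_size, seed=0):
    """Derive partition boxes from a random sample, scaling the leaf capacity accordingly."""
    sample = random.Random(seed).sample(points, sample_size)
    scaled = max(1, capacity * sample_size // len(points))
    return build_quadtree(sample, box, scaled)

points = [(x + 0.5, y + 0.5) for x in range(32) for y in range(32)]   # 1024 toy points
parts = sample_partitions(points, (0.0, 0.0, 32.0, 32.0), capacity=64, sample_size=256)
covered_once = all(sum(in_box(p, b) for b in parts) == 1 for p in points)
print(covered_once)   # True
```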
Recently, Park and Lee (PL) suggested a new framework, called ‘two-equation-revocation’, for realizing an Identity-Based Encryption (IBE) trapdoor, and proposed a new IBE system that uses a Map-To-Point hash function. In this paper, we present a variant of the PL system by giving a simple way to remove the Map-To-Point hash function from it. Our variant is proven secure under non-standard security assumptions, which degrades its security. In exchange, our variant has several advantages over the PL system: (1) it provides receiver anonymity, (2) it has no correctness error, (3) it has shorter ciphertexts, and (4) it has faster encryption. As a result, when security assumptions and security losses are not considered, our variant is as efficient as the Boneh-Boyen and Sakai-Kasahara IBE systems, which are considered the most practical ones.