
Keyword Search Result

[Keyword] tin(3578hit)

1141-1160hit(3578hit)

  • Robust Lightweight Embedded Virtualization Layer Design with Simple Hardware Assistance

    Tsung-Han LIN  Yuki KINEBUCHI  Tatsuo NAKAJIMA  

     
    PAPER-Computer System and Services

      Vol:
    E95-D No:12
      Page(s):
    2821-2832

    In this paper, we propose a virtualization architecture for a multi-core embedded system that provides more system reliability and security while maintaining performance, without introducing additional special hardware support or implementing a complex protection mechanism in the virtualization layer. Embedded systems, especially consumer electronics, have often used virtualization. Virtualization is not a new technique, as there are various uses for both GPOS (General Purpose Operating System) and RTOS (Real Time Operating System). The surge of multi-core platforms in embedded systems also helps consolidate the virtualization system for better performance and lower power consumption. Embedded virtualization design usually uses two approaches. The first is to use the traditional VMM, but it is too complicated for use in the embedded environment without additional special hardware support. The other approach uses the microkernel, which imposes a modular design. The guest systems, however, would require considerable modifications in this approach, as the microkernel runs guest systems in the user space. For some RTOSes and their applications originally running in the kernel space, this second approach is more difficult to use because their code uses many privileged instructions. To achieve better reliability and keep the virtualization layer design lightweight, this work uses a common hardware component adopted in multi-core embedded processors. In most embedded platforms, vendors provide additional on-chip local memory for each physical core, and these local memory areas are private to their cores. By taking advantage of this memory architecture, we can mitigate the above-mentioned problems at once. We choose to re-map the program of the virtualization layer, called SPUMONE, which runs all guest systems in the kernel space, onto the local memory. Doing so provides additional reliability and security for the entire system because the SPUMONE design in a multi-core platform has each instance installed on a separate processor core. This design differs from traditional virtualization layer design, and the content of each SPUMONE is inaccessible to the others. We also achieve this goal without adding overhead to the overall performance.

  • An Optimal Resource Sharing in Hierarchical Virtual Organizations in the Grid

    Kyong Hoon KIM  Guy Martin TCHAMGOUE  Yong-Kee JUN  Wan Yeon LEE  

     
    LETTER

      Vol:
    E95-D No:12
      Page(s):
    2948-2951

    In large-scale collaborative computing, users and resource providers organize various Virtual Organizations (VOs) to share resources and services. A VO organizes other sub-VOs for the purpose of achieving the VO goal, which forms hierarchical VO environments. VO participants agree upon certain policies, such as resource sharing amounts or user access. In this letter, we provide an optimal resource sharing mechanism in hierarchical VO environments under resource sharing agreements. The proposed algorithm enhances resource utilization and reduces the mean response time of each user.

  • A Fully Programmable Reed-Solomon Decoder on a Multi-Core Processor Platform

    Bei HUANG  Kaidi YOU  Yun CHEN  Zhiyi YU  Xiaoyang ZENG  

     
    PAPER-Computer Architecture

      Vol:
    E95-D No:12
      Page(s):
    2939-2947

    Reed-Solomon (RS) codes are widely used in digital communication and storage systems. Unlike usual VLSI approaches, this paper presents a high-throughput, fully programmable Reed-Solomon decoder on a multi-core processor. The multi-core processor platform is a 2-dimensional mesh array of Single Instruction Multiple Data (SIMD) cores, and it is well suited for digital communication applications. By fully extracting the parallelizable operations of the RS decoding process, we propose multiple optimization techniques to improve system throughput, including task-level parallelism on different cores, data-level parallelism on each SIMD core, minimized memory access, and task mapping that minimizes route length. For RS(255, 239, 8), experimental results show that our 12-core implementation achieves a throughput of 4.35 Gbps, which is much better than several other published implementations. The results suggest that, with our approach, throughput scales linearly with the number of cores.

  • Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture

    Dajiang LIU  Shouyi YIN  Chongyong YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Computer Architecture

      Vol:
    E95-D No:12
      Page(s):
    2898-2907

    A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for running regular and compute-intensive tasks, and most compute-intensive tasks spend most of their running time in nested loops. The polyhedron model is a powerful tool for applying reasonable transformations to such nested loops. In this paper, a number of issues are addressed towards the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): an approach that makes the most use of processing elements (PEs) while minimizing the communication volume through loop transformation in the polyhedron model, determination of the tiling form by intra-statement dependence analysis, and determination of the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with a DFG-based optimization approach, the execution performance of 1-D Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.

  • Comparing Operating Systems Scalability on Multicore Processors by Microbenchmarking

    Yan CUI  Yu CHEN  Yuanchun SHI  

     
    PAPER-Computer System and Services

      Vol:
    E95-D No:12
      Page(s):
    2810-2820

    Multicore processor architectures have become ubiquitous in today's computing platforms, especially in parallel computing installations, with their power and cost advantages. While the technology trend continues towards having hundreds of cores on a chip in the foreseeable future, an urgent question posed to system designers as well as application users is whether applications can receive sufficient support on today's operating systems for them to scale to many cores. To this end, people need to understand the strengths and weaknesses of operating systems' support for scalability and to identify major bottlenecks limiting the scalability, if any. As open-source operating systems are of particular interest in the research and industry communities, in this paper we choose three operating systems (Linux, Solaris and FreeBSD) to systematically evaluate and compare their scalability by using a set of highly focused microbenchmarks for a broad and detailed understanding of their scalability on an AMD 32-core system. We use system profiling tools and analyze kernel source code to find the root cause of each observed scalability bottleneck. Our results reveal that no single operating system among the three stands out on all system aspects, though some system(s) can prevail on some of the system aspects. For example, Linux outperforms Solaris and FreeBSD significantly for file-descriptor- and process-intensive operations. For applications with intensive socket creation and deletion operations, Solaris leads FreeBSD, which scales better than Linux. With the help of performance tools and source code instrumentation and analysis, we find that synchronization primitives protecting shared data structures in the kernels are the major bottleneck limiting system scalability.

  • Power Gating Implementation for Supply Noise Mitigation with Body-Tied Triple-Well Structure

    Yasumichi TAKAI  Masanori HASHIMOTO  Takao ONOYE  

     
    PAPER-Circuit Design

      Vol:
    E95-A No:12
      Page(s):
    2220-2225

    This paper investigates power gating implementations that mitigate power supply noise. We focus on the body connection of power-gated circuits, and examine the amount of power supply noise induced by power-on rush current and the contribution of a power-gated circuit as a decoupling capacitance during the sleep mode. To determine the best implementation, we designed and fabricated a test chip in a 65 nm process. Experimental results with measurement and simulation reveal that the power-gated circuit with a body-tied structure in triple-well is the best implementation from the following three points: power supply noise due to rush current, the contribution of decoupling capacitance during the sleep mode, and the leakage reduction thanks to power gating.

  • Lossless Compression of Double-Precision Floating-Point Data for Numerical Simulations: Highly Parallelizable Algorithms for GPU Computing

    Mamoru OHARA  Takashi YAMAGUCHI  

     
    PAPER-Parallel and Distributed Computing

      Vol:
    E95-D No:12
      Page(s):
    2778-2786

    In numerical simulations using massively parallel computers like GPGPU (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the size of the computational results is sometimes too large to hold there, it is desirable that the data be compressed and stored. In addition, considering the overhead of transferring data between the devices and host memories, it is preferable that the data be compressed as part of the parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always show good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.
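
The abstract above does not spell out how the successive numbers are combined and interleaved, so the following is only a generic sketch of the idea and not the authors' algorithm. It assumes a simple byte-level interleaving (byte k of every double is grouped together) and uses zlib as a stand-in back-end compressor; the function names are hypothetical, and the code runs on the CPU only.

```python
import struct
import zlib


def interleave(values):
    """Pack doubles big-endian, then group byte k of every value together.

    Assumption for illustration: neighbouring simulation values tend to share
    sign/exponent bytes, so grouping them yields long, compressible runs.
    """
    raw = struct.pack(f">{len(values)}d", *values)
    # raw[k::8] collects the k-th byte of each 8-byte double.
    return b"".join(raw[k::8] for k in range(8))


def deinterleave(data):
    """Invert interleave(): rebuild each double from the eight byte planes."""
    n = len(data) // 8
    planes = [data[k * n:(k + 1) * n] for k in range(8)]
    raw = bytes(planes[k][i] for i in range(n) for k in range(8))
    return list(struct.unpack(f">{n}d", raw))


def compress_block(values):
    """Losslessly compress a block of doubles (zlib is a stand-in back end)."""
    return zlib.compress(interleave(values))


def decompress_block(blob):
    """Recover the exact doubles passed to compress_block()."""
    return deinterleave(zlib.decompress(blob))
```

Each block is compressed independently of the others, which is the property that would let many blocks be processed in parallel; how well this particular layout compresses depends entirely on the data.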

  • Asymptotically Optimal Merging on ManyCore GPUs

    Arne KUTZNER  Pok-Son KIM  Won-Kwang PARK  

     
    PAPER-Parallel and Distributed Computing

      Vol:
    E95-D No:12
      Page(s):
    2769-2777

    We propose a family of algorithms for efficient merging on contemporary GPUs, so that each algorithm requires O(m log(n/m + 1)) element comparisons, where m and n are the sizes of the input sequences with m ≤ n. According to the lower bounds for merging, all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallel-structured algorithm that splits a merging problem of size 2^l into 2^i subproblems of size 2^(l-i), for some arbitrary i with 0 ≤ i ≤ l. This algorithm represents a merger for i = l, but it is rather inefficient in this case. The efficiency is boosted by moving to a two-stage approach where the splitting process stops at some predetermined level and transfers control to several block-mergers operating in parallel. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting, we construct and evaluate a MergeSort on top of it. In the context of our benchmarking, the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU-optimized variant of QuickSort.
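
The splitting idea in the abstract above can be illustrated with a textbook divide-and-conquer merge. This is a sequential sketch under stated assumptions, not the paper's GPU algorithm: the function name `split_merge` is hypothetical, the split point is simply the median of the shorter sequence, and no claim is made that this version meets the comparison bound quoted in the abstract.

```python
from bisect import bisect_left


def split_merge(a, b):
    """Merge two sorted lists by splitting into independent subproblems."""
    if len(a) > len(b):
        a, b = b, a              # keep a as the shorter sequence
    if not a:
        return list(b)
    mid = len(a) // 2
    pivot = a[mid]
    # Binary search: everything in b[:cut] is < pivot, b[cut:] is >= pivot.
    cut = bisect_left(b, pivot)
    # The two halves share no elements, so they could be merged in parallel.
    left = split_merge(a[:mid], b[:cut])
    right = split_merge(a[mid + 1:], b[cut:])
    return left + [pivot] + right
```

A two-stage variant in the spirit of the abstract would stop the recursion at a fixed depth and hand each remaining pair of subsequences to an ordinary sequential merger.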

  • Geographic Routing Algorithm with Location Errors

    Yuanwei JING  Yan WANG  

     
    LETTER-Information Network

      Vol:
    E95-D No:12
      Page(s):
    3092-3096

    Geographic routing uses the geographical location information provided by nodes to make routing decisions. However, the nodes cannot obtain accurate location information due to the effect of measurement error. A new routing strategy using the maximum expected distance and angle (MEDA) algorithm is proposed to improve the performance and promote the successive transmission rate. We first introduce the expected distance and angle, and then we employ principal component analysis to construct the objective function for selecting the next hop node. We compare the proposed algorithm with the maximum expectation within transmission range (MER) and greedy routing scheme (GRS) algorithms. Simulation results show that the proposed MEDA algorithm outperforms the MER and GRS algorithms with a higher successive transmission rate.

  • Image Recovery by Decomposition with Component-Wise Regularization

    Shunsuke ONO  Takamichi MIYATA  Isao YAMADA  Katsunori YAMAOKA  

     
    PAPER-Image

      Vol:
    E95-A No:12
      Page(s):
    2470-2478

    Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.

  • Parallel Dynamic Cloud Rendering Method Based on Physical Cellular Automata Model

    Liqiang ZHANG  Chao LI  Haoliang SUN  Changwen ZHENG  Pin LV  

     
    PAPER-Parallel and Distributed Computing

      Vol:
    E95-D No:12
      Page(s):
    2750-2758

    Due to the complicated composition of clouds and their disordered transformation, current methods do not render clouds that closely match their actual appearance. Based on the physical characteristics of clouds, a physical cellular automata model of dynamic clouds is designed according to the intrinsic factors of clouds, which describes the rules of hydro-movement, deposition, accumulation, and diffusion. Then a parallel computing architecture is designed to compute the large-scale data set required for rendering dynamic clouds, and a GPU-based ray-casting algorithm is implemented to render the cloud volume data. The experiment shows that the cloud rendering method based on the physical cellular automata model is very efficient and able to adequately exhibit the detail of clouds.

  • Region Oriented Routing FPGA Architecture for Dynamic Power Gating

    Ce LI  Yiping DONG  Takahiro WATANABE  

     
    PAPER-Physical Level Design

      Vol:
    E95-A No:12
      Page(s):
    2199-2207

    Dynamic power gating applicable to FPGAs can reduce power consumption effectively. In this paper, we propose a sophisticated routing architecture for a region-oriented FPGA which supports dynamic power gating. This is the first routing solution of dynamic power gating for coarse-grained FPGAs. This paper has two main contributions. First, it improves the routing resource graph and routing architecture to support special routing for a region-oriented FPGA. Second, some routing channels are made wider to avoid congestion. Experimental results show that the routing area can be reduced by 7.7% compared with the symmetric Wilton switch box in the region. Also, our proposed FPGA architecture with sophisticated P&R can reduce the power consumption of the system implemented in the FPGA.

  • Implicit Influencing Group Discovery from Mobile Applications Usage

    Masaji KATAGIRI  Minoru ETOH  

     
    PAPER-Office Information Systems, e-Business Modeling

      Vol:
    E95-D No:12
      Page(s):
    3026-3036

    This paper presents an algorithmic approach to acquiring the influencing relationships among users by discovering implicit influencing group structure from smartphone usage. The method assumes that a time series of users' application downloads and activations can be represented by individual inter-personal influence factors. To achieve better predictive performance and also to avoid over-fitting, a latent feature model is employed. The method tries to extract the latent structures by monitoring cross-validated predictive performance on approximated influence matrices with reduced ranks, which are generated based on an initial influence matrix obtained from a training set. The method adopts Nonnegative Matrix Factorization (NMF) to reduce the influence matrix dimension and thus to extract the latent features. To validate and demonstrate its ability, about 160 university students voluntarily participated in a mobile application usage monitoring experiment. An empirical study on real collected data reveals that the influencing structure consisted of six influencing groups with two types of mutual influence, i.e., intra-group influence and inter-group influence. The results also highlight the importance of sparseness control on NMF for discovering latent influencing groups. The obtained influencing structure provides better predictive performance than state-of-the-art collaborative filtering methods as well as conventional methods such as user-based collaborative filtering techniques and simple popularity.

  • Impact of Elastic Optical Paths That Adopt Distance Adaptive Modulation to Create Efficient Networks

    Tatsumi TAKAGI  Hiroshi HASEGAWA  Ken-ichi SATO  Yoshiaki SONE  Akira HIRANO  Masahiko JINNO  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E95-B No:12
      Page(s):
    3793-3801

    We propose optical path routing and frequency slot assignment algorithms that can make the best use of elastic optical paths and the capabilities of distance adaptive modulation. Due to the computational difficulty of the assignment problem, we develop algorithms for 1+1 dedicated/1:1 shared protected ring networks and unprotected mesh networks that fully utilize the characteristics of the topologies. Numerical experiments show that the introduction of path elasticity and distance adaptive modulation significantly reduces the occupied bandwidth.

  • A Novel Energy Efficient Routing Protocol for Wireless Sensor Networks: Greedy Routing for Maximum Lifetime

    Jean Marc Kouakou ATTOUNGBLE  Kazunori OKADA  

     
    PAPER-Network

      Vol:
    E95-B No:12
      Page(s):
    3802-3810

    In this paper, we present Greedy Routing for Maximum Lifetime (GRMax) [1],[2], which can use the limited energy available to nodes in a Wireless Sensor Network (WSN) in order to delay the dropping of packets and thus extend the network lifetime. We define network lifetime as the time period until a source node starts to drop packets because it has no more paths to the destination [3]. We introduce the new concept of the Network Connectivity Aiming (NCA) node. The primary goal of NCA nodes is to maintain network connectivity and avoid network partition. To evaluate GRMax, we compare its performance with Geographic and Energy Aware Routing (GEAR) [4], which is an energy-efficient geographic routing protocol, and Greedy Perimeter Stateless Routing (GPSR) [5], which is a milestone among geographic routing protocols. We evaluate and compare the performance of GPSR, GEAR, and GRMax using OPNET Modeler version 15. The results show that GRMax performs better than GEAR and GPSR with respect to the number of successfully delivered packets and the time period before the nodes begin to drop packets. Moreover, with GRMax, there are fewer dead nodes in the system and less energy is required to deliver packets to the destination node (sink).

  • Cooperative Spectrum Sensing for Cognitive Radio Systems with Imperfect Reporting Channels

    Jeong Woo LEE  

     
    LETTER-Terrestrial Wireless Communication/Broadcasting Technologies

      Vol:
    E95-B No:11
      Page(s):
    3629-3632

    A novel cooperative spectrum sensing scheme suitable for wireless cognitive radio systems with imperfect reporting channels is proposed. In the proposed scheme, binary local decision bits are transmitted to the fusion center and combined to form a soft-valued decision statistic in the fusion center. To form the decision statistic, a majority-decision-aided weighting rule is proposed. The proposed scheme provides a reliable sensing capability even with poor reporting channels.

  • Performance Improvement of Post-FFT Adaptive Array with Reciprocal Interpolation for ISDB-T

    Tomoaki TAKEUCHI  Hiroyuki HAMAZUMI  Kazuhiko SHIBUYA  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E95-B No:11
      Page(s):
    3527-3535

    As many digital terrestrial broadcasting stations have been installed and are now broadcasting, the problem of poor reception has become serious even though the receiving powers are high. Although we had developed an interference canceller for broadcast-wave relay stations, an adaptive array is desirable for receivers in the service area because it is more robust in low-D/U multipath environments. In this paper, we propose a weighting coefficient optimization algorithm for a post-FFT adaptive array using the reciprocals of weighting coefficients. Numerical examples show the effectiveness of the proposed method.

  • Design of a New 4-Dimensional Constellation-Rotation Modulation Method for DVB-NGH

    Taejin JUNG  Hyoungsoo LIM  

     
    LETTER-Terrestrial Wireless Communication/Broadcasting Technologies

      Vol:
    E95-B No:11
      Page(s):
    3625-3628

    In this letter, we propose a new 4-dimensional constellation-rotation (CR) modulation method that achieves diversity gain of 4 in Rayleigh fading channels. The proposed scheme consists of two consecutive CR operations for QAM symbols unlike the conventional 2-dimensional CR method based on only one CR operation. Computer simulation results show that the new method exhibits much better performance than the conventional one in terms of code rate and channel erasure ratio.

  • Numerical Modeling; Thickness Dependence of J-V Characteristic for Multi-Layered OLED Device Open Access

    Sang-Gun LEE  Hong-Seok CHOI  Chang-Wook HAN  Seok-Jong LEE  Yoon-Heung TAK  Byung-Chul AHN  

     
    INVITED PAPER

      Vol:
    E95-C No:11
      Page(s):
    1756-1760

    A numerical model of a multi-layered organic light emitting diode (OLED) is presented in this paper. The current density-voltage (J-V) model for the OLED was built using the injection-limited current and bulk-limited current. The mobility equation was based on the field-dependent model, the so-called “Poole-Frenkel mobility model.” The accuracy of this simulation was assessed by comparison with experimental results for multi-layered OLED devices of varying EML thickness. There are two hetero-junction models which should be dealt with in the simulation. The Langevin recombination rate of electrons and holes is also calculated through the device simulation.

  • Trusted Inter-Domain Fast Authentication Protocol in Split Mechanism Network

    Lijuan ZHENG  Yingxin HU  Zhen HAN  Fei MA  

     
    LETTER-Information Network

      Vol:
    E95-D No:11
      Page(s):
    2728-2731

    Previous inter-domain fast authentication schemes only realize the authentication of user identity. We propose a trusted inter-domain fast authentication scheme based on the split mechanism network. The proposed scheme can realize proof of identity and integrity verification of the platform as well as proof of the user identity. In our scheme, when the mobile terminal moves to a new domain, the visited domain directly authenticates the mobile terminal using the ticket issued by the home domain rather than authenticating it through its home domain. We demonstrate that the proposed scheme is highly effective and more secure than contemporary inter-domain fast authentication schemes.
