The search functionality is under construction.

Keyword Search Result

[Keyword] SC(4570hit)

1081-1100hit(4570hit)

  • Clique-Based Architectural Synthesis of Flow-Based Microfluidic Biochips

    Trung Anh DINH  Shigeru YAMASHITA  Tsung-Yi HO  Yuko HARA-AZUMI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2668-2679

    Microfluidic biochips, also referred to as “lab-on-a-chip,” have recently been proposed to integrate all the necessary functions for biochemical analyses. This technology starts a new era of biological science, in which electronics and biology are combined for the first time. There are several types of microfluidic biochips; among them, there has been great interest in flow-based microfluidic biochips, in which the flow of liquid is manipulated using integrated microvalves. By combining several microvalves, more complex resource units such as micropumps, switches and mixers can be built. For efficient execution, the liquid flow routes in microfluidic biochips need to be scheduled under resource constraints and routing constraints. The execution time of a biochemical application depends strongly on the binding and scheduling result. Most previously developed binding and scheduling algorithms are based on heuristics, and there has been no method to obtain optimal results. Considering the above, we propose an optimal method by casting the problem as a clique problem. Moreover, this paper also presents some heuristic techniques for computational time reduction. Experiments demonstrate that the proposed method is able to reduce the execution time of biochemical applications by more than 15% compared with the previous approach. Moreover, the proposed heuristic method is able to produce results at little or no cost in optimality, in significantly shorter time than the optimal method.

  • Retrieval and Localization of Multiple Specific Objects with Hough Voting Based Ranking and A Contrario Decision

    Pradit MITTRAPIYANURUK  Pakorn KAEWTRAKULPONG  

     
    PAPER-Vision

      Vol:
    E96-A No:12
      Page(s):
    2717-2727

    We present an algorithm for simultaneously recognizing and localizing planar textured objects in an image. The algorithm scales efficiently with respect to a large number of objects added to the database. In contrast to the current state of the art in large-scale image search, our algorithm works accurately with query images consisting of several specific objects and/or multiple instances of the same object. Our proposed algorithm consists of two major steps. The first step is to generate a set of hypotheses that provides information about the identities and the locations of objects in the image. To serve this purpose, we extend Bag-Of-Visual-Word (BOVW) image retrieval by incorporating a re-ranking scheme based on the Hough voting technique. Subsequently, in the second step, we propose a geometric verification algorithm based on the A Contrario decision framework to draw out the final detection results from the generated hypotheses. We demonstrate the performance of the algorithm on the scenario of recognizing CD covers with a database consisting of more than ten thousand images of different CD covers. Our algorithm yields detection results with more than 90% precision and recall within a few seconds of processing time per image.

  • A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation

    Yasuaki ITO  Koji NAKANO  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2596-2603

    This paper presents a GPU (Graphics Processing Unit) implementation of dynamic programming for the optimal polygon triangulation. GPUs can now be used for general-purpose parallel computation. Users can develop parallel programs running on GPUs using the programming architecture called CUDA (Compute Unified Device Architecture) provided by NVIDIA. The optimal polygon triangulation problem for a convex polygon is an optimization problem to find a triangulation with minimum total weight. It is known that this problem for a convex n-gon can be solved using the dynamic programming technique in O(n^3) time using a work space of size O(n^2). In this paper, we propose an efficient parallel implementation of this O(n^3)-time algorithm on the GPU. In our implementation, we have used two new ideas to accelerate the dynamic programming. The first idea (adaptive granularity) is to partition the dynamic programming algorithm into many sequential kernel calls of CUDA, and to select the best parameters for the size and the number of blocks for each kernel call. The second idea (sliding and mirroring arrangements) is to arrange the working data for coalesced access of the global memory in the GPU to minimize the memory access overhead. Our implementation using these two ideas solves the optimal polygon triangulation problem for a convex 8192-gon in 5.57 seconds on the NVIDIA GeForce GTX 680, while a conventional CPU implementation runs in 1939.02 seconds. Thus, our GPU implementation attains a speedup factor of 348.02.
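
The O(n^3)-time, O(n^2)-space dynamic program mentioned in this abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation. It assumes that the weight of a triangulation is the sum of the weights of its chords, and the names `min_triangulation_weight` and `chord_weight` are hypothetical.

```python
from typing import Callable


def min_triangulation_weight(n: int, chord_weight: Callable[[int, int], float]) -> float:
    """Minimum total chord weight over all triangulations of a convex n-gon.

    Assumption (not stated in the abstract): vertices are numbered 0..n-1
    around the polygon, polygon edges cost nothing, and chord_weight(i, j)
    gives the weight of the chord between vertices i and j.
    """
    # best[i][j]: minimum weight needed to triangulate the sub-polygon
    # i, i+1, ..., j, including the weight of chord (i, j) itself.
    best = [[0.0] * n for _ in range(n)]  # O(n^2) work space
    for span in range(2, n):              # sub-polygon size, small to large
        for i in range(n - span):
            j = i + span
            # Choose the apex k of the triangle (i, k, j): O(n) per cell,
            # which gives O(n^3) time overall.
            inner = min(best[i][k] + best[k][j] for k in range(i + 1, j))
            # (0, n-1) is a polygon edge, not a chord, so it adds no weight.
            is_polygon_edge = (i == 0 and j == n - 1)
            best[i][j] = inner + (0.0 if is_polygon_edge else chord_weight(i, j))
    return best[0][n - 1]


# Example: a pentagon whose chord weight is the squared index distance.
pentagon_cost = min_triangulation_weight(5, lambda i, j: float((j - i) ** 2))
```

Cells with the same `span` do not depend on each other, which is the kind of independence a parallel implementation can exploit.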

  • Network Interface Architecture with Scalable Low-Latency Message Receiving Mechanism

    Noboru TANABE  Atsushi OHTA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2536-2544

    Most scientists other than computer scientists do not want to spend effort on performance tuning by rewriting their MPI applications. In addition, the number of processing elements available to them is increasing year by year. On large-scale parallel systems, the number of messages accumulated in a message buffer tends to increase in some of their applications. Since searching the message queue in MPI is time-consuming, scalable system-side acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching the message buffer and its hardware cost are evaluated. LHS accelerates message buffer searching by switching the location where limited-length heads of messages are stored. It exploits effects such as an increased cache hit rate on the host, with partial off-loading to hardware. With LHS on an FPGA-based network interface card (NIC) named DIMMnet-2, message buffer searching is accelerated 14.3 times when the order of message reception differs from the receiver's expectation. This absolute performance is 38.5 times higher than that of IBM BlueGene/P, although the frequency is 8.5 times lower than that of BlueGene/P. LHS has higher scalability than ALPU in performance per frequency. Since these results were obtained with partially on-loaded linear searching on an old Pentium®4, the performance gap will increase with a state-of-the-art CPU. Therefore, LHS is more suitable for larger parallel systems. Adoption of the proposed method in state-of-the-art processors and systems is also discussed.

  • An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

    Hui ZHAO  Shuqiang YANG  Hua FAN  Zhikun CHEN  Jinghu XU  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2654-2662

    Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors for improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. Several schedulers are in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Because this is a strongly NP-hard problem, we adopt effective heuristics to seek satisfactory solutions. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.

  • Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access

    Lei GUO  Yuhua TANG  Yong DOU  Yuanwu LEI  Meng MA  Jie ZHOU  

     
    PAPER-Computer System

      Vol:
    E96-D No:12
      Page(s):
    2765-2775

    The effective bandwidth of the dynamic random-access memory (DRAM) for the alternate row-wise/column-wise matrix access (AR/CMA) mode, which is a basic characteristic in scientific and engineering applications, is very low. Therefore, we propose the window memory layout scheme (WMLS), which is a matrix layout scheme that does not require transposition, for AR/CMA applications. This scheme maps one row of a logical matrix into a rectangular memory window of the DRAM to balance the bandwidth of the row- and column-wise matrix access and to increase the DRAM IO bandwidth. The optimal window configuration is theoretically analyzed to minimize the total number of no-data-visit operations of the DRAM. Different WMLS implementations are presented according to the memory structure of field-programmable gate array (FPGA), CPU, and GPU platforms. Experimental results show that the proposed WMLS can significantly improve DRAM bandwidth for AR/CMA applications. Speedup factors of 1.6× and 2.0× are achieved for the general-purpose CPU and GPU platforms, respectively. For the FPGA platform, the WMLS DRAM controller is custom-designed. The maximum bandwidth for the AR/CMA mode reaches 5.94 GB/s, which is a 73.6% improvement compared with that of the traditional row-wise access mode. Finally, we apply the WMLS scheme to a Chirp Scaling SAR application; compared with the traditional access approach, maximum speedup factors of 4.73×, 1.33×, and 1.56× are achieved for the FPGA, CPU, and GPU platforms, respectively.

  • Apps at Hand: Personalized Live Homescreen Based on Mobile App Usage Prediction

    Xiao XIA  Xinye LIN  Xiaodong WANG  Xingming ZHOU  Deke GUO  

     
    LETTER-Information Network

      Vol:
    E96-D No:12
      Page(s):
    2860-2864

    To facilitate the discovery of mobile apps in personal devices, we present the personalized live homescreen system. The system mines the usage patterns of mobile apps, generates personalized predictions, and then makes apps available at users' hands whenever they want them. Evaluations have verified the promising effectiveness of our system.

  • DC-DC Converter-Aware Task Scheduling and Dynamic Reconfiguration for Energy Harvesting Embedded Systems

    Kyungsoo LEE  Tohru ISHIHARA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2660-2667

    Energy-harvesting devices are materials that allow ambient energy sources to be converted into usable electrical power. While batteries power modern embedded systems, these energy-harvesting devices power energy-harvesting embedded systems. This calls for new energy-efficient management techniques for energy-harvesting systems, unlike the previous management techniques. Higher overall system efficiency in an energy-harvesting system can be obtained through higher generating efficiency, higher consuming efficiency, or higher transferring efficiency. This paper presents a generalized technique for dynamic reconfiguration and task scheduling that considers the power loss in the DC-DC converters in the system. The proposed technique minimizes the power loss in the DC-DC converters and charger of the system. Experiments with an actual application demonstrate that our approach reduces the total energy consumption by 22% on average over the conventional approach.

  • An Improved Quantization Scheme for Lattice-Reduction Aided MIMO Detection Based on Gram-Schmidt Orthogonalization

    Wei HOU  Tadashi FUJINO  Toshiharu KOJIMA  

     
    PAPER-Communication Theory

      Vol:
    E96-A No:12
      Page(s):
    2405-2414

    The lattice-reduction (LR) technique has been adopted to improve the performance and reduce the complexity of MIMO data detection. This paper presents an improved quantization scheme for LR-aided MIMO detection based on Gram-Schmidt orthogonalization. In LR-aided detection, the quantization step applies a simple rounding operation, which often leads to quantization errors. These errors may in turn result in detection errors. Hence the purpose of the proposed detection is to mitigate the performance degradation caused by quantization errors in the signal estimation. The proposed quantization scheme decreases the quantization errors using a simple tree search with a threshold function. Through the analysis and the simulation results, we observe that the proposed detection can achieve nearly optimal performance with very low complexity, and requires only a little additional complexity compared to the conventional LR-MMSE detection in the high Eb/N0 region. Furthermore, this quantization error reduction scheme is also efficient even for high modulation orders.

  • New Formulation for the Recursive Transfer Method Using the Weak Form Theory Framework and Its Application to Microwave Scattering

    Hatsuhiro KATO  Hatsuyoshi KATO  

     
    PAPER-Numerical Analysis and Optimization

      Vol:
    E96-A No:12
      Page(s):
    2698-2708

    The recursive transfer method (RTM) is a numerical technique that was developed to analyze scattering phenomena, and its formulation is constructed with a difference equation derived from a differential equation by Numerov's discretization method. However, the class of differential equations to which Numerov's method is applicable is restricted, and therefore the application range of RTM is also limited. In this paper, we provide a new discretization scheme to extend the RTM formulation using the weak form theory framework. The effectiveness of the proposed formulation is confirmed by microwave scattering induced by a metallic pillar placed asymmetrically in the waveguide. A notable feature of RTM is that it can extract a localized wave from scattering waves even if the tail of the localized wave reaches the ends of the analysis region. The discrepancy between the experimental and theoretical data is suppressed within an upper bound determined by the standing wave ratio of the waveguide.

  • Unsupervised Sentiment-Bearing Feature Selection for Document-Level Sentiment Classification

    Yan LI  Zhen QIN  Weiran XU  Heng JI  Jun GUO  

     
    PAPER-Pattern Recognition

      Vol:
    E96-D No:12
      Page(s):
    2805-2813

    Text sentiment classification aims to automatically classify subjective documents into different sentiment-oriented categories (e.g. positive/negative). Given the high dimensionality of features describing documents, how to effectively select the most useful ones, referred to as sentiment-bearing features, with a lack of sentiment class labels is crucial for improving the classification performance. This paper proposes an unsupervised sentiment-bearing feature selection method (USFS), which incorporates sentiment discriminant analysis (SDA) into sentiment strength calculation (SSC). SDA applies traditional linear discriminant analysis (LDA) in an unsupervised manner without losing local sentiment information between documents. We use SSC to calculate the overall sentiment strength for each single feature based on its affinities with some sentiment priors. Experiments, performed using benchmark movie reviews, demonstrated the superior performance of USFS.

  • Synchronization-Aware Virtual Machine Scheduling for Parallel Applications in Xen

    Cheol-Ho HONG  Chuck YOO  

     
    LETTER

      Vol:
    E96-D No:12
      Page(s):
    2720-2723

    In this paper, we propose a synchronization-aware VM scheduler for parallel applications in Xen. The proposed scheduler prevents threads from waiting for a significant amount of time during synchronization. For this purpose, we propose an identification scheme that can identify the threads that have awaited other threads for a long time. In this scheme, a detection module that can infer the internal status of guest OSs was developed. We also present a scheduling policy that can accelerate bottlenecks of concurrent VMs. We implemented our VM scheduler in the recent Xen hypervisor with para-virtualized Linux-based operating systems. We show that our approach can improve the performance of concurrent VMs by up to 43% as compared to the credit scheduler.

  • GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems

    Fumihiko INO  Shinta NAKAGAWA  Kenichi HAGIHARA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2604-2616

    This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.

  • Study of Multi-Cell Interference in a 2-Hop OFDMA Virtual Cellular Network

    Gerard J. PARAISON  Eisuke KUDOH  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E96-B No:12
      Page(s):
    3163-3171

    In the literature, many resource allocation schemes have been proposed for multi-hop networks. However, the analyses provided focus mainly on the single cell case. Inter-cell interference severely degrades the performance of a wireless mobile network. Therefore, incorporating the analysis of inter-cell interference into the study of a scheme is required to more fully understand the performance of that scheme. The authors of this paper have proposed a parallel relaying scheme for a 2-hop OFDMA virtual cellular network (VCN). The purpose of this paper is to study a new version of that scheme which considers a multi-cell environment and evaluate the performance of the VCN. The ergodic channel capacity and outage capacity of the VCN in the presence of inter-cell interference are evaluated, and the results are compared to those of the single hop network (SHN). Furthermore, the effect of the location and number of wireless ports in the VCN on the channel capacity of the VCN is investigated, and the degree of fairness of the VCN relative to that of the SHN is compared. Using computer simulations, it is found that in the presence of inter-cell interference, a) the VCN outperforms the SHN even in the interference dominant transmission power region (when a single cell is considered, the VCN is better than the SHN only in the noise dominant transmission power region), b) the channel capacity of the VCN remains greater than that of the SHN even if the VCN is fully loaded, c) an optimal distance ratio for the location of the wireless ports can be found in the interval 0.2∼0.4, d) increasing the number of wireless ports from 3 to 6 can increase the channel capacity of the VCN, and e) the VCN can achieve better outage capacity than the SHN.

  • Device-Aware Visual Quality Adaptation for Wireless N-Screen Multicast Systems

    Inwoong LEE  Jincheol PARK  Seonghyun KIM  Taegeun OH  Sanghoon LEE  

     
    PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

      Vol:
    E96-B No:12
      Page(s):
    3181-3189

    We seek a resource allocation algorithm, based on carrier allocation and modulation mode selection, for improving the quality of service (QoS) that can adapt to various screen sizes and dynamic channel variations. In terms of visual quality, the expected visual entropy (EVE) is defined to quantify the visual information contained in each layer of the scalable video coding (SVC). Fairness optimization is conducted to maximize the EVE using an objective function for given constraints of radio resources. To conduct the fairness optimization, we propose a novel approximation algorithm for resource allocation for the maximal EVE. Simulations confirm that the QoS in terms of the EVE or peak signal to noise ratio (PSNR) is significantly improved by using the novel algorithm.

  • Dual-Edge-Triggered Flip-Flop-Based High-Level Synthesis with Programmable Duty Cycle

    Keisuke INOUE  Mineo KANEKO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E96-A No:12
      Page(s):
    2689-2697

    This paper addresses high-level synthesis (HLS) using dual-edge-triggered flip-flops (DETFFs) as memory elements. In DETFF-based HLS, the duty cycle becomes a manageable resource to improve the timing performance. To utilize the duty cycle radically, a programmable duty cycle (PDC) mechanism is built into this HLS, and captured by a new HLS task named PDC scheduling. As a first step toward DETFF-based HLS with PDC, the execution time minimization problem is formulated for given results of operation scheduling. A linear program is presented to solve this problem in polynomial time. As a next step, the simultaneous operation scheduling and PDC scheduling problem for the same objective is tackled. A mixed integer linear programming (MILP)-based approach is presented to solve this problem. The experimental results show that the MILP can reduce the execution time for several benchmarks.

  • CMOS Driver for Heavy-Load Flat-Panel Scan-Line Circuit Based on Complementary Dual-Bootstrap

    Shu-Chung YI  Zhi-Ming LIN  Po-Yo KUO  Hsin-Chi LAI  

     
    PAPER

      Vol:
    E96-C No:11
      Page(s):
    1399-1403

    This paper presents a high-speed full-swing driver for a heavy-load flat-panel scan-line circuit. The high driving capability is achieved using the proposed Complementary Dual-Bootstrap (CDUB) technique. The scan-line CDUB driver was fabricated in a 0.35-µm CMOS technology. The measured results, under the flat-panel scan-line load model, indicate that the delay time is within 2.8 µs and the average power is 0.74 mW for a 5 V supply voltage.

  • On Discrete Logarithm Based Additively Homomorphic Encryption

    Jae Hong SEO  Keita EMURA  

     
    LETTER-Cryptography and Information Security

      Vol:
    E96-A No:11
      Page(s):
    2286-2289

    In this paper, we examine additive homomorphic encryptions in the discrete logarithm setting. Recently, Wang et al. proposed an additive homomorphic encryption scheme by modifying the ElGamal encryption scheme [Information Sciences 181(2011) 3308-3322]. We show that their scheme allows only a limited number of additions among encrypted messages, which is different from what they claimed.

  • Mathematical Analysis of Call Admission Control in Mobile Hotspots

    Jae Young CHOI  Bong Dae CHOI  

     
    PAPER-Fundamental Theories for Communications

      Vol:
    E96-B No:11
      Page(s):
    2816-2827

    A mobile hotspot is a moving vehicle that hosts an Access Point (AP), such as a train, bus, or subway, in which users connect to an external cellular network through the AP to access their internet services. To meet Quality of Service (QoS) requirements, typically throughput and/or delay, a Call Admission Control (CAC) is needed to restrict the number of users accepted by the AP. In this paper, we analyze a modified guard channel scheme as CAC for a mobile hotspot as follows: While a mobile hotspot is in the stop-state, we adopt a guard channel scheme where the optimal number of resource units is reserved for vertical handoff users from the cellular network to the WLAN. While a mobile hotspot is in the move-state, there are no handoff calls, and so no resources for handoff calls are reserved, in order to maximize the utility of the WLAN capacity. We model call arrival and departure processes by a Markov Modulated Poisson Process (MMPP), and then we model our CAC by a 2-dimensional continuous time Markov chain (CTMC) for single traffic and a 3-dimensional CTMC for two types of traffic. We solve for the steady-state probabilities by the Quasi-Birth and Death (QBD) method and obtain various performance measures such as the new call blocking probabilities, the handoff call dropping probabilities and the channel utilizations. We compare our CAC with the conventional guard channel scheme, in which the number of guard resources is fixed all the time regardless of the state of the mobile hotspot. Finally, we find the optimal threshold value on the amount of resources to be reserved for handoff calls subject to a strict constraint on the handoff call dropping probability.
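
The state-dependent admission rule described in this abstract can be sketched as follows. This only illustrates the guard-channel decision itself, not the MMPP/CTMC analysis. The function name `admit`, its parameters, and the convention that each call occupies one resource unit are assumptions made for the example.

```python
def admit(call_type: str, hotspot_state: str, in_use: int,
          capacity: int, guard: int) -> bool:
    """Decide whether the AP accepts a call (hypothetical helper).

    call_type:     "new" or "handoff"
    hotspot_state: "stop" or "move"
    in_use:        resource units currently occupied
    capacity:      total resource units of the AP
    guard:         units reserved for handoff calls in the stop-state
    """
    # Resources are reserved for handoff calls only while the vehicle is
    # stopped; in the move-state nothing is reserved.
    reserved = guard if hotspot_state == "stop" else 0
    if call_type == "handoff":
        # Handoff calls may use every unit, including the reserved ones.
        return in_use < capacity
    # New calls must leave the reserved units free.
    return in_use < capacity - reserved


# Example with 10 units, 2 of them guarded, and 8 already occupied.
new_while_stopped = admit("new", "stop", 8, 10, 2)          # blocked by the guard
handoff_while_stopped = admit("handoff", "stop", 8, 10, 2)  # uses a guarded unit
new_while_moving = admit("new", "move", 8, 10, 2)           # no guard applies
```

A conventional guard channel scheme corresponds to using `reserved = guard` in both states.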

  • Congestion Control, Routing and Scheduling in Communication Networks: A Tutorial (Open Access)

    Jean WALRAND  Abhay K. PAREKH  

     
    INVITED PAPER

      Vol:
    E96-B No:11
      Page(s):
    2714-2723

    In communication networks, congestion control, routing, and multiple access schemes for scheduling transmissions are typically regulated by distributed algorithms. Engineers designed these algorithms using clever heuristics that they refined in the light of simulation results and experiments. Over the last two decades, a deeper understanding of these algorithms emerged through the work of researchers. This understanding has a real potential for improving the design of protocols for data centers, cloud computing, and even wireless networks. Since protocols tend to be standardized by engineers, it is important that they become familiar with the insights that emerged in research. We hope that this paper might appeal to practitioners and make the research results intuitive and useful. The methods that the paper describes may be useful for many other resource allocation problems such as in call centers, manufacturing lines, hospitals and the service industry.
