The search functionality is under construction.

Keyword Search Result

[Keyword] purpose(17hit)

1-17hit
  • A Full-Flexibility-Guaranteed Pin-Count Reduction Design for General-Purpose Digital Microfluidic Biochips

    Trung Anh DINH  Shigeru YAMASHITA  Tsung-Yi HO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E99-A No:2
      Page(s):
    570-578

    Different from application-specific digital microfluidic biochips, a general-purpose design has several advantages such as dynamic reconfigurability, and fast on-line evaluation for real-time applications. To achieve such superiority, this design typically activates each electrode in the chip using an individual control pin. However, as the design complexity increases substantially, an order-of-magnitude increase in the number of control pins will significantly affect the manufacturing cost. To tackle this problem, several methods adopting a pin-sharing mechanism for general-purpose designs have been proposed. Nevertheless, these approaches sacrifice the flexibility of droplet movement, and result in an increase of bioassay completion time. In this paper, we present a novel pin-count reduction design methodology for general-purpose microfluidic biochips. Distinguished from previous approaches, the proposed methodology not only reduces the number of control pins significantly but also guarantees the full flexibility of droplet movement to ensure the minimal bioassay completion time.

  • Acceleration of the Fast Multipole Method on FPGA Devices

    Hitoshi UKAWA  Tetsu NARUMI  

     
    LETTER-Application

      Publicized:
    2014/11/19
      Vol:
    E98-D No:2
      Page(s):
    309-312

    The fast multipole method (FMM) for N-body simulations is attracting much attention since it requires minimal communication between computing nodes. We implemented hardware pipelines specialized for the FMM on an FPGA device, the GRAPE-9. An N-body simulation with 1.6×10^7 particles ran 16 times faster than that on a CPU. Moreover, the particle-to-particle stage of the FMM on the GRAPE-9 executed 2.5 times faster than on a GPU in a limited case.

  • A Framework of Time, Place, Purpose and Personal Profile Based Recommendation Service for Mobile Environment

    Sineenard PINYAPONG  Toshikazu KATO  

     
    PAPER

      Vol:
    E88-D No:5
      Page(s):
    938-946

    Nowadays more people have started using their mobile phones to access the information they need from anywhere at any time. In advanced mobile technology, Location Service allows users to quickly pinpoint their location and recommends events of interest. However, users desire more appropriate recommendation services. In other words, the message service should push a message at the proper place and time, so that customers obtain a higher level of satisfaction. In this paper, we propose a framework for a recommendation service based on time, place, purpose and personal profile. We illustrate scenarios in "push", "pull" and "don't disturb" services, where our DB queries can recommend the relevant message to users. The three factors of time, place and purpose are mutually dependent, and the basic rules for analyzing the essential data are summarized. We also create algorithms for the DB queries. We filter messages by one further important factor, the personal profile, such as the user's preferences and degree of preference. Furthermore, we discuss an implementation of the prototype system, including results of an experimental evaluation.

  • A Multipurpose Image Watermarking Method for Copyright Notification and Protection

    Zhe-Ming LU  Hao-Tian WU  Dian-Guo XU  Sheng-He SUN  

     
    LETTER-Applications of Information Security Techniques

      Vol:
    E86-D No:9
      Page(s):
    1931-1933

    This paper presents an image watermarking method for two purposes: to notify the copyright owner with a visible watermark, and to protect the copyright with an invisible watermark. These two watermarks are embedded in different blocks with different methods. Simulation results show that the visible watermark is hard to remove and the invisible watermark is robust.

  • PARS Architecture: A Reconfigurable Architecture with Generalized Execution Model--Design and Implementation of Its Prototype Processor

    Kazuya TANIGAWA  Tetsuo HIRONAKA  Akira KOJIMA  Noriyoshi YOSHIDA  

     
    PAPER

      Vol:
    E86-D No:5
      Page(s):
    830-840

    Reconfigurable architectures have attracted attention for their potential to achieve high performance by configuring special-purpose circuits for a target application, and for the flexibility that reconfiguration provides. We have set our sights on using a reconfigurable architecture as a general-purpose computer by extending the advantageous properties of the architecture. To achieve this goal, a generalized execution model for reconfigurable architectures is required, so we have proposed the Ideal PARallel Structure (I-PARS) execution model. A program based on the I-PARS execution model is free of restrictions tied to the hardware structure of a specific reconfigurable processor, which makes software easier to develop. Further, we have proposed the PARS architecture, which executes programs based on the I-PARS execution model effectively. The PARS architecture has a large reconfigurable part for highly parallel execution, which exploits the parallelism described in the I-PARS execution model. To utilize the reconfigurable part effectively, the PARS architecture can reconfigure and execute operations simultaneously in one cycle. Further, the PARS architecture supports branch operations to introduce control flow into execution on the architecture, which makes it possible to skip an execution that does not produce a valid result. In this paper, we introduce the detailed structure of an implemented prototype processor based on the PARS architecture. The implementation used 420,377 CMOS transistors, only 3.8% of the number of transistors used in the logic circuits of the UltraSPARC-III. Additionally, we evaluated the performance of the prototype processor using several benchmark programs. The evaluation results show that the prototype processor achieves nearly the same performance as a 750 MHz UltraSPARC-III while being implemented with far fewer transistors.

  • SP2: A Very Large-Scale Event Driven Logic Simulation Hardware

    Hirofumi HAMAMURA  Hiroaki KOMATSU  

     
    PAPER-Logic Simulation

      Vol:
    E85-A No:12
      Page(s):
    2737-2745

    This paper describes special-purpose hardware for large-scale logic simulation, called SP2, which executes an event driven algorithm and can simulate up to sixteen million gates. SP2 was developed in 1992 for system verification of large-scale computer designs as a successor to SP1, which was developed in 1987. SP2 provides enhanced performance, throughput, and delay accuracy over SP1. Since 1992, SP2 has been widely used for system-level simulation of mainframes, supercomputers, UNIX servers and microprocessors. It is used as a powerful simulator, in all stages of design verification, or in early stages, before regression testing, by using emulators.

  • A High Assurance On-Line Recovery Technology for a Space On-Board Computer

    Hiroyuki YASHIRO  Teruo FUJIWARA  Kinji MORI  

     
    PAPER-Issues

      Vol:
    E84-D No:10
      Page(s):
    1350-1359

    A high assurance on-line recovery technology for a space on-board computer that can be realized using commercial devices is proposed whereby a faulty processor node confirms its normality and then recovers without affecting the other processor nodes in operation. Also, the result of an evaluation test using the breadboard model implementing this technology is reported. Because this technology enables simple and assured recovery of a faulty processor node regardless of its degree of redundancy, it can be applied to various applications, such as a launch vehicle, a satellite, and a reusable launch vehicle. As a result, decreasing the cost of an on-board computer is possible while maintaining its high reliability.

  • Floating-Point Divide Operation without Special Hardware Supports

    Takashi AMISAKI  Umpei NAGASHIMA  Kazutoshi TANABE  

     
    LETTER-Numerical Analysis and Optimization

      Vol:
    E82-A No:1
      Page(s):
    173-177

    Three multiplicative algorithms for the floating-point divide operation are compared: the Newton-Raphson method, Goldschmidt's algorithm, and a naive method that simply calculates a form of the Taylor series expansion of a reciprocal. The series also provides a theoretical basis for Goldschmidt's algorithm. It is well known that, of the Newton-Raphson method and Goldschmidt's algorithm, the former is the more accurate while the latter is the faster on a pipelined unit. However, little is reported about the naive method. In this report, we analyze the speed and accuracy of each method and present the results of numerical tests, which we conducted to confirm the validity of the accuracy analysis. Basically, the comparisons are made in the context of software implementation (e.g., a macro library), and compliance with the IEEE Standard 754 rounding is not considered. It is shown that the naive method is useful in a realistic setting where the number of iterations is small and the method is implemented on a pipelined floating-point unit with a multiply-accumulate configuration. In such a situation, the naive method gives a more accurate result with a slightly lower latency, as compared with Goldschmidt's algorithm, and is much faster than but slightly inferior in accuracy to the Newton-Raphson method.
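
The three multiplicative schemes named in this abstract can be sketched in textbook form as follows. This is only an illustration, not the authors' implementation: the function names, the initial reciprocal estimate `x0`, and the use of Horner's rule for the series are assumptions made here, and rounding behavior is not modeled.

```python
def reciprocal_newton(d, x0, iterations):
    """Newton-Raphson: x_{k+1} = x_k * (2 - d * x_k), converging to 1/d."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


def divide_goldschmidt(n, d, x0, iterations):
    """Goldschmidt: multiply numerator and denominator by the same factor
    until the denominator approaches 1; the numerator then approaches n/d."""
    n, d = n * x0, d * x0
    for _ in range(iterations):
        f = 2.0 - d
        n, d = n * f, d * f
    return n


def reciprocal_taylor(d, x0, terms):
    """Series form: 1/d = x0 * (1 + e + e^2 + ...), with e = 1 - d * x0.
    Horner's rule (an assumption here) makes each step one multiply-add."""
    e = 1.0 - d * x0
    s = 1.0
    for _ in range(terms):
        s = 1.0 + e * s
    return x0 * s


if __name__ == "__main__":
    d, x0 = 3.0, 0.3  # x0 is a rough guess for 1/3, so e = 0.1
    print(reciprocal_newton(d, x0, 3))
    print(divide_goldschmidt(1.0, d, x0, 3))
    print(reciprocal_taylor(d, x0, 7))
```

All three require |1 - d * x0| < 1 to converge. The two iterative schemes square the error term at every step, while the series gains one power of the error term per multiply-add.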

  • Adaptive Speed Control of a General-Purpose Processor Based on Activities

    Sanehiro FURUICHI  Toru AIHARA  

     
    LETTER

      Vol:
    E81-C No:9
      Page(s):
    1481-1483

    This paper proposes a new method for dynamically controlling the clock speed of a processor in order to reduce power consumption without decreasing system performance. It automatically tunes the processor's speed by monitoring its activities and avoiding useless work so as not to exhaust the battery energy. Experiments with performance bottlenecks caused by disk activities show that the proposed method is very effective in comparison with the traditional one, in which the processor's speed is fixed.

  • A Three-Dimensional Instrumentation VLSI Processor Based on a Concurrent Memory-Access Scheme

    Seunghwan LEE  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E80-C No:11
      Page(s):
    1491-1498

    Three-dimensional (3-D) instrumentation using an image sequence is a promising instrumentation method for intelligent systems in which accurate 3-D information is required. However, real-time instrumentation is difficult since much computation time and a large memory bandwidth are required. In this paper, a 3-D instrumentation VLSI processor with a concurrent memory-access scheme is proposed. To reduce the access time, frequently used data are stored in a cache register array and are concurrently transferred to processing elements using simple interconnections to the 8-nearest neighbor registers. Based on a row and column memory access pattern, we propose a diagonally interleaved frame memory by which pixel values of a row and column are stored across memory modules. Based on the concurrent memory-access scheme, a 40 GOPS processor is designed and the delay time for the instrumentation is estimated to be 42 ms for a 256×256 image.

  • Special-Purpose Hardware Architecture for Large Scale Linear Programming

    Shinhaeng LEE  Shin'ichiro OMACHI  Hirotomo ASO  

     
    PAPER-Computer Architecture

      Vol:
    E80-D No:9
      Page(s):
    893-898

    Linear programming techniques are useful in many diverse applications such as production planning and energy distribution. To find an optimal solution of a linear programming problem, computations must be repeated, which takes a lot of processing time. For high speed computation of linear programming, special purpose hardware has been sought. This paper proposes a systolic array for solving linear programming problems using the revised simplex method, which is a typical algorithm of linear programming. This paper also proposes a modified systolic array that can solve linear programming problems whose sizes are very large.

  • Hardware Framework for Accelerating the Execution Speed of a Genetic Algorithm

    Barry SHACKLEFORD  Etsuko OKUSHI  Mitsuhiro YASUDA  Hisao KOIZUMI  Katsuhiko SEO  Takashi IWAMOTO  

     
    PAPER-Multi Processors

      Vol:
    E80-C No:7
      Page(s):
    962-969

    Genetic algorithms were introduced by Holland in 1975 as a method of solving difficult optimization problems by means of simulated evolution. A major drawback of genetic algorithms is their slowness when emulated by software on conventional computers. Described is an adaptation of the original genetic algorithm that is advantageous to hardware implementation along with the architecture of a hardware framework that performs the functions of population storage, selection, crossover, mutation, fitness evaluation, and survival determination. Programming of the framework is illustrated with the set coverage problem that exhibits a 6,000× speed-up over software emulation on a 100 MHz workstation.

  • Design of a CAM-Based Collision Detection VLSI Processor for Robotics

    Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1108-1115

    Real-time collision detection is one of the most important intelligent processing tasks in robotics. In collision detection, a large storage capacity is usually required to store the 3-dimensional information on the obstacles located in a workspace. Moreover, high-computational power is essential in not only coordinate transformation but also matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a content-addressable memory (CAM). A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be performed by only magnitude comparison in parallel. Parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed, and each PE performs coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. When the 16 PEs and 144-kb CAM are used, the performance is evaluated to be 90 ms.
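
Why a union of rectangular solids reduces collision detection to magnitude comparisons can be illustrated with a small software sketch. Everything here is an assumption for illustration: the `Box` type, axis-aligned boxes with inclusive bounds, and a robot represented by a set of sample points. The paper's CAM evaluates such comparisons against all stored words in parallel, whereas this sketch simply loops.

```python
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float, float]


class Box(NamedTuple):
    """Hypothetical axis-aligned rectangular solid given by two corners."""
    lo: Point  # (x_min, y_min, z_min)
    hi: Point  # (x_max, y_max, z_max)


def point_in_box(p: Point, box: Box) -> bool:
    # Six magnitude comparisons: lo <= coordinate <= hi on each axis.
    return all(l <= c <= h for c, l, h in zip(p, box.lo, box.hi))


def collides(points: Sequence[Point], obstacles: Sequence[Box]) -> bool:
    # An obstacle is a union of boxes, so a hit on any box is a collision.
    return any(point_in_box(p, b) for p in points for b in obstacles)


if __name__ == "__main__":
    obstacles = [Box((0, 0, 0), (1, 1, 1)), Box((2, 0, 0), (3, 1, 1))]
    print(collides([(0.5, 0.5, 0.5)], obstacles))
    print(collides([(1.5, 0.5, 0.5)], obstacles))
```

No multiplication or distance computation is needed once the points are expressed in the workspace coordinate frame, which is why the coordinate transformation is the remaining arithmetic cost.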

  • A VLSI-Oriented Model-Based Robot Vision Processor for 3-D Instrumentation and Object Recognition

    Yoshifumi SASAKI  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1116-1122

    In a robot vision system, enormously large computation power is required to perform three-dimensional (3-D) instrumentation and object recognition. However, many kinds of complex and irregular operations are required to make accurate 3-D instrumentation and object recognition in the conventional method for software implementation. In this paper, a VLSI-oriented Model-Based Robot Vision (MBRV) processor is proposed for high-speed and accurate 3-D instrumentation and object recognition. An input image is compared with two-dimensional (2-D) silhouette images which are generated from the 3-D object models by means of perspective projection. Because the MBRV algorithm always gives the candidates for the accurate 3-D instrumentation and object recognition result with simple and regular procedures, it is suitable for the implementation of the VLSI processor. Highly parallel architecture is employed in the VLSI processor to reduce the latency between the image acquisition and the output generation of the 3-D instrumentation and object recognition results. As a result, 3-D instrumentation and object recognition can be performed 10,000 times faster than on a 28.5 MIPS workstation.

  • A Collision Detection Processor for Intelligent Vehicles

    Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E76-C No:12
      Page(s):
    1804-1811

    Since carelessness in driving causes terrible traffic accidents, it is important for a vehicle to avoid collisions autonomously. Real-time collision detection between a vehicle and obstacles will be a key target for the next-generation car electronics system. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles located in a workspace. Moreover, high-computational power is essential not only in coordinate transformation but also in matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be performed only by parallel magnitude comparison. Parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.
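
For readers unfamiliar with CORDIC, the following is a minimal rotation-mode sketch of the general technique. It is not the processor's design: the function name and iteration count are assumptions, and floating-point arithmetic stands in for the fixed-point shifts and adds that hardware would use.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) counter-clockwise by `angle` radians using CORDIC
    micro-rotations. Converges for |angle| up to about 1.74 rad."""
    # Elementary angles atan(2^-i); hardware would keep these in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i is a right shift in fixed-point hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x / gain, y / gain


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))  # close to (cos 30°, sin 30°)
```

Because each step needs only shifts, additions, and a table lookup, the same datapath can be replicated across many processor elements, which fits the parallel coordinate transformation described in the abstract.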

  • Unified Scheduling of High Performance Parallel VLSI Processors for Robotics

    Bumchul KIM  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Parallel Processor Scheduling

      Vol:
    E76-A No:6
      Page(s):
    904-910

    The performance of processing elements can be improved by the progress of VLSI circuit technology, while the communication overhead cannot be neglected in a parallel processing system. This paper presents a unified scheduling that allocates tasks having different task processing times to multiple processing elements. The objective function is formulated to measure communication time between processing elements. By employing constraint conditions, the scheduling efficiently generates an optimal solution using integer programming so that minimum communication time can be achieved. We also propose a VLSI processor for robotics whose latency is very small. In the VLSI processor, the data transfer between two processing elements can be done very quickly, so that the communication cycle time is greatly reduced.

  • An Algorithm for the K-Selection Problem Using Special-Purpose Sorters

    Heung-Shik KIM  Jong-Soo PARK  Myunghwan KIM  

     
    PAPER-Algorithm and Computational Complexity

      Vol:
    E75-D No:5
      Page(s):
    704-708

    An algorithm is presented for selecting the k-th smallest element of a totally ordered (but not sorted) set of n elements, 1 ≤ k ≤ n, in the case that a special-purpose sorter is used as a coprocessor. When the pipeline merge sorter is used as the special-purpose sorter, we analyze the comparison complexity of the algorithm for the given capacity of the sorter. The comparison complexity of the algorithm is 1.4167n + o(n), provided that the capacity of the sorter is 256 elements. The comparison complexity of the algorithm decreases as the capacity of the sorter increases.
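
One way to see how a fixed-capacity sorter can serve selection is the sketch below. It is a generic median-of-medians scheme with the group size set to the sorter capacity, chosen here for illustration; it is not the paper's algorithm and does not achieve its comparison bound. `CAPACITY`, `sorter`, and `select` are hypothetical names, and Python's `sorted` stands in for the coprocessor.

```python
CAPACITY = 256  # assumed sorter capacity, matching the figure in the abstract


def sorter(chunk):
    """Stand-in for the special-purpose sorter: at most CAPACITY elements."""
    assert len(chunk) <= CAPACITY
    return sorted(chunk)


def select(items, k):
    """Return the k-th smallest element of items (1 <= k <= len(items))."""
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    if len(items) <= CAPACITY:
        return sorter(items)[k - 1]
    # Sort capacity-sized chunks on the sorter and collect their medians.
    chunks = [sorter(items[i:i + CAPACITY])
              for i in range(0, len(items), CAPACITY)]
    medians = [c[(len(c) - 1) // 2] for c in chunks]
    # The median of the medians is the partitioning pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    smaller = [x for x in items if x < pivot]
    equal_count = sum(1 for x in items if x == pivot)
    if k <= len(smaller):
        return select(smaller, k)
    if k <= len(smaller) + equal_count:
        return pivot
    larger = [x for x in items if x > pivot]
    return select(larger, k - len(smaller) - equal_count)


if __name__ == "__main__":
    data = [(i * 7919) % 10007 for i in range(5000)]
    print(select(data, 2500) == sorted(data)[2499])
```

The host performs only the partitioning comparisons against the pivot; all sorting work is delegated to the sorter, so a larger capacity means fewer chunks, fewer medians, and fewer recursion levels.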