IEICE global.ieice.org Site

Keyword Search Result

[Keyword] purpose(17hit)

1-17hit

A Full-Flexibility-Guaranteed Pin-Count Reduction Design for General-Purpose Digital Microfluidic Biochips
Trung Anh DINH Shigeru YAMASHITA Tsung-Yi HO

PAPER-VLSI Design Technology and CAD

Vol: E99-A No:2
Page(s): 570-578
Different from application-specific digital microfluidic biochips, a general-purpose design has several advantages such as dynamic reconfigurability, and fast on-line evaluation for real-time applications. To achieve such superiority, this design typically activates each electrode in the chip using an individual control pin. However, as the design complexity increases substantially, an order-of-magnitude increase in the number of control pins will significantly affect the manufacturing cost. To tackle this problem, several methods adopting a pin-sharing mechanism for general-purpose designs have been proposed. Nevertheless, these approaches sacrifice the flexibility of droplet movement, and result in an increase of bioassay completion time. In this paper, we present a novel pin-count reduction design methodology for general-purpose microfluidic biochips. Distinguished from previous approaches, the proposed methodology not only reduces the number of control pins significantly but also guarantees the full flexibility of droplet movement to ensure the minimal bioassay completion time.
Acceleration of the Fast Multipole Method on FPGA Devices
Hitoshi UKAWA Tetsu NARUMI

LETTER-Application

Publicized: 2014/11/19
Vol: E98-D No:2
Page(s): 309-312
The fast multipole method (FMM) for N-body simulations is attracting much attention since it requires minimal communication between computing nodes. We implemented hardware pipelines specialized for the FMM on an FPGA device, the GRAPE-9. An N-body simulation with 1.6×10^7 particles ran 16 times faster than that on a CPU. Moreover, the particle-to-particle stage of the FMM on the GRAPE-9 executed 2.5 times faster than on a GPU in a limited case.
A Framework of Time, Place, Purpose and Personal Profile Based Recommendation Service for Mobile Environment
Sineenard PINYAPONG Toshikazu KATO

PAPER

Vol: E88-D No:5
Page(s): 938-946
Nowadays more people use their mobile phones to access the information they need from anywhere at any time. In advanced mobile technology, Location Service allows users to quickly pinpoint their location and also recommends interesting events. However, users desire more appropriate recommendation services. In other words, the message service should push a message at the proper place and time, so that customers obtain a higher level of satisfaction. In this paper, we propose a framework for a recommendation service based on time, place, purpose and personal profile. We illustrate scenarios for "push", "pull" and "don't disturb" services, in which our DB queries can recommend relevant messages to users. The three factors of time, place and purpose are mutually dependent, and the basic rules for analyzing the essential data are summarized. We also create algorithms for the DB queries. Messages are further filtered by one important factor, the personal profile, which includes the user's preferences and degrees of preference. Furthermore, we discuss an implementation of the prototype system, including results of experimental evaluation.
A Multipurpose Image Watermarking Method for Copyright Notification and Protection
Zhe-Ming LU Hao-Tian WU Dian-Guo XU Sheng-He SUN

LETTER-Applications of Information Security Techniques

Vol: E86-D No:9
Page(s): 1931-1933
This paper presents an image watermarking method for two purposes: to notify the copyright owner with a visible watermark, and to protect the copyright with an invisible watermark. These two watermarks are embedded in different blocks with different methods. Simulation results show that the visible watermark is hard to remove and the invisible watermark is robust.
PARS Architecture: A Reconfigurable Architecture with Generalized Execution Model--Design and Implementation of Its Prototype Processor
Kazuya TANIGAWA Tetsuo HIRONAKA Akira KOJIMA Noriyoshi YOSHIDA

PAPER

Vol: E86-D No:5
Page(s): 830-840
Reconfigurable architectures have attracted attention for their potential to achieve high performance by configuring special-purpose circuits for a target application, and for the flexibility that reconfiguration provides. We have set our sights on using a reconfigurable architecture as a general-purpose computer by extending these advantageous properties. To achieve this goal, a generalized execution model for reconfigurable architectures is required, so we have proposed the Ideal PARallel Structure (I-PARS) execution model. A program based on the I-PARS execution model is not restricted by the hardware structure of any specific reconfigurable processor, which makes software easier to develop. Further, we have proposed the PARS architecture, which executes programs based on the I-PARS execution model effectively. The PARS architecture has a large reconfigurable part for highly parallel execution, which exploits the parallelism described in the I-PARS execution model. To utilize the reconfigurable part effectively, the PARS architecture can reconfigure and execute operations simultaneously in one cycle. Further, the PARS architecture supports branch operations to introduce control flow into execution, which makes it possible to skip execution that does not produce a valid result. In this paper, we introduce the detailed structure of an implemented prototype processor based on the PARS architecture. The implementation used 420,377 CMOS transistors, which is only 3.8% of the number of transistors used in the logic circuits of the UltraSPARC-III. Additionally, we evaluated the performance of the prototype processor using several benchmark programs. The evaluation results show that the prototype processor can achieve nearly the same performance as a 750 MHz UltraSPARC-III while being implemented with far fewer transistors.
SP2: A Very Large-Scale Event Driven Logic Simulation Hardware
Hirofumi HAMAMURA Hiroaki KOMATSU

PAPER-Logic Simulation

Vol: E85-A No:12
Page(s): 2737-2745
This paper describes special-purpose hardware for large-scale logic simulation, called SP2, which executes an event-driven algorithm and can simulate up to sixteen million gates. SP2 was developed in 1992 for system verification of large-scale computer designs as a successor to SP1, which was developed in 1987. SP2 provides enhanced performance, throughput, and delay accuracy over SP1. Since 1992, SP2 has been widely used for system-level simulation of mainframes, supercomputers, UNIX servers and microprocessors. It is used as a powerful simulator, in all stages of design verification, or in early stages, before regression testing, by using emulators.
A High Assurance On-Line Recovery Technology for a Space On-Board Computer
Hiroyuki YASHIRO Teruo FUJIWARA Kinji MORI

PAPER-Issues

Vol: E84-D No:10
Page(s): 1350-1359
A high assurance on-line recovery technology for a space on-board computer that can be realized using commercial devices is proposed whereby a faulty processor node confirms its normality and then recovers without affecting the other processor nodes in operation. Also, the result of an evaluation test using the breadboard model implementing this technology is reported. Because this technology enables simple and assured recovery of a faulty processor node regardless of its degree of redundancy, it can be applied to various applications, such as a launch vehicle, a satellite, and a reusable launch vehicle. As a result, decreasing the cost of an on-board computer is possible while maintaining its high reliability.
Floating-Point Divide Operation without Special Hardware Supports
Takashi AMISAKI Umpei NAGASHIMA Kazutoshi TANABE

LETTER-Numerical Analysis and Optimization

Vol: E82-A No:1
Page(s): 173-177
Three multiplicative algorithms for the floating-point divide operation are compared: the Newton-Raphson method, Goldschmidt's algorithm, and a naive method that simply calculates a form of the Taylor series expansion of a reciprocal. The series also provides a theoretical basis for Goldschmidt's algorithm. It is well known that, of the Newton-Raphson method and Goldschmidt's algorithm, the former is the more accurate while the latter is the faster on a pipelined unit. However, little is reported about the naive method. In this report, we analyze the speed and accuracy of each method and present the results of numerical tests, which we conducted to confirm the validity of the accuracy analysis. Basically, the comparisons are made in the context of software implementation (e.g., a macro library), and compliance with the IEEE Standard 754 rounding is not considered. It is shown that the naive method is useful in a realistic setting where the number of iterations is small and the method is implemented on a pipelined floating-point unit with a multiply-accumulate configuration. In such a situation, the naive method gives a more accurate result with a slightly lower latency, as compared with Goldschmidt's algorithm, and is much faster than but slightly inferior in accuracy to the Newton-Raphson method.
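For orientation, the three multiplicative approaches named in this abstract can be sketched in a few lines of Python. This is a minimal illustration using textbook formulations, not the authors' implementation: the function names, the caller-supplied initial estimate `x0`, and the Horner-style evaluation of the series are all assumptions made here, and the exact form of the series used in the paper is not given in the abstract.

```python
def reciprocal_newton(d, x0, iterations):
    """Newton-Raphson: refine x ~ 1/d with x <- x * (2 - d * x)."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


def divide_goldschmidt(n, d, x0, iterations):
    """Goldschmidt: scale numerator and denominator by the same factor
    until the denominator converges to 1; the numerator is then n / d."""
    q = n * x0  # running numerator
    r = d * x0  # running denominator, should approach 1
    for _ in range(iterations):
        f = 2.0 - r
        q *= f
        r *= f
    return q


def reciprocal_series(d, x0, terms):
    """Series method (assumed form): with e = 1 - d * x0,
    1/d = x0 * (1 + e + e^2 + ...), truncated after e^terms.
    Horner's rule maps each step onto one multiply-accumulate."""
    e = 1.0 - d * x0
    s = 1.0
    for _ in range(terms):
        s = 1.0 + e * s
    return x0 * s


if __name__ == "__main__":
    d, x0 = 3.0, 0.3  # x0 is a rough initial estimate of 1/3
    print(reciprocal_newton(d, x0, 4))
    print(divide_goldschmidt(1.0, d, x0, 4))
    print(reciprocal_series(d, x0, 15))
```

In this sketch both iterative methods square the error term at each step, while the series method gains one power of `e` per multiply-accumulate, which is why its usefulness depends on a small iteration count and a good initial estimate.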
Adaptive Speed Control of a General-Purpose Processor Based on Activities
Sanehiro FURUICHI Toru AIHARA

LETTER

Vol: E81-C No:9
Page(s): 1481-1483
This paper proposes a new method for dynamically controlling the clock speed of a processor in order to reduce power consumption without decreasing system performance. It automatically tunes the processor's speed by monitoring its activities and avoiding useless work so as not to exhaust the battery energy. Experiments with performance bottlenecks caused by disk activities show that the proposed method is very effective in comparison with the traditional one, in which the processor's speed is fixed.
A Three-Dimensional Instrumentation VLSI Processor Based on a Concurrent Memory-Access Scheme
Seunghwan LEE Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Integrated Electronics

Vol: E80-C No:11
Page(s): 1491-1498
Three-dimensional (3-D) instrumentation using an image sequence is a promising instrumentation method for intelligent systems in which accurate 3-D information is required. However, real-time instrumentation is difficult since much computation time and a large memory bandwidth are required. In this paper, a 3-D instrumentation VLSI processor with a concurrent memory-access scheme is proposed. To reduce the access time, frequently used data are stored in a cache register array and are concurrently transferred to processing elements using simple interconnections to the 8-nearest neighbor registers. Based on a row and column memory access pattern, we propose a diagonally interleaved frame memory by which pixel values of a row and column are stored across memory modules. Based on the concurrent memory-access scheme, a 40 GOPS processor is designed and the delay time for the instrumentation is estimated to be 42 ms for a 256×256 image.
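The idea behind a diagonally interleaved memory can be illustrated with a small sketch. The abstract does not give the actual address mapping, so the skewed mapping `(row + col) mod M` used below is an assumption (a common textbook scheme), and the function names are hypothetical. With this mapping, any M consecutive pixels of one row or of one column fall into M different modules, so they can be fetched concurrently.

```python
def module_of(row, col, num_modules):
    """Assumed skewed mapping: pixel (row, col) lives in module (row + col) mod M."""
    return (row + col) % num_modules


def conflict_free(coords, num_modules):
    """True if every pixel in coords maps to a different memory module."""
    modules = [module_of(r, c, num_modules) for r, c in coords]
    return len(set(modules)) == len(modules)


if __name__ == "__main__":
    M = 8
    row_segment = [(3, c) for c in range(M)]      # 8 pixels along row 3
    col_segment = [(r, 5) for r in range(M)]      # 8 pixels along column 5
    anti_diagonal = [(i, M - 1 - i) for i in range(M)]
    print(conflict_free(row_segment, M))    # row access hits 8 distinct modules
    print(conflict_free(col_segment, M))    # column access hits 8 distinct modules
    print(conflict_free(anti_diagonal, M))  # anti-diagonal access collides
```

The trade-off of this particular mapping is that pixels along an anti-diagonal share a module, which is acceptable when the access pattern is restricted to rows and columns.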
Special-Purpose Hardware Architecture for Large Scale Linear Programming
Shinhaeng LEE Shin'ichiro OMACHI Hirotomo ASO

PAPER-Computer Architecture

Vol: E80-D No:9
Page(s): 893-898
Linear programming techniques are useful in many diverse applications such as production planning and energy distribution. Finding an optimal solution to a linear programming problem requires repeated computations and takes a lot of processing time. For high-speed computation of linear programming, special-purpose hardware has been sought. This paper proposes a systolic array for solving linear programming problems using the revised simplex method, which is a typical algorithm for linear programming. This paper also proposes a modified systolic array that can solve very large linear programming problems.
Hardware Framework for Accelerating the Execution Speed of a Genetic Algorithm
Barry SHACKLEFORD Etsuko OKUSHI Mitsuhiro YASUDA Hisao KOIZUMI Katsuhiko SEO Takashi IWAMOTO

PAPER-Multi Processors

Vol: E80-C No:7
Page(s): 962-969
Genetic algorithms were introduced by Holland in 1975 as a method of solving difficult optimization problems by means of simulated evolution. A major drawback of genetic algorithms is their slowness when emulated by software on conventional computers. Described is an adaptation of the original genetic algorithm that is advantageous to hardware implementation, along with the architecture of a hardware framework that performs the functions of population storage, selection, crossover, mutation, fitness evaluation, and survival determination. Programming of the framework is illustrated with the set coverage problem, which exhibits a 6,000× speed-up over software emulation on a 100 MHz workstation.
Design of a CAM-Based Collision Detection VLSI Processor for Robotics
Masanori HARIYAMA Michitaka KAMEYAMA

PAPER

Vol: E77-C No:7
Page(s): 1108-1115
Real-time collision detection is one of the most important intelligent processing tasks in robotics. In collision detection, a large storage capacity is usually required to store the 3-dimensional information on the obstacles located in a workspace. Moreover, high computational power is essential not only in coordinate transformation but also in the matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a content-addressable memory (CAM). A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that collision detection can be performed by parallel magnitude comparison alone. A parallel architecture using several identical processor elements (PEs) is employed, and each PE performs coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. When 16 PEs and a 144-kb CAM are used, the performance is evaluated to be 90 ms.
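As background for the CORDIC-based coordinate transformation mentioned above, here is a minimal sketch of CORDIC in rotation mode. It uses floating-point arithmetic for readability, whereas a hardware PE would use fixed-point shifts and adds; the function name, the iteration count, and the 2-D restriction are assumptions of this sketch and are not taken from the paper.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) counterclockwise by `angle` radians using CORDIC
    micro-rotations. Converges for |angle| up to about 1.74 rad."""
    # Elementary angles atan(2^-i); a hardware design would store these in a table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i corresponds to a right shift in fixed point.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x / gain, y / gain


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
    print(math.cos(math.pi / 6), math.sin(math.pi / 6))
```

The appeal for VLSI is that the loop body needs only shifts, additions, and a small angle table, so many identical PEs can be replicated cheaply.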
A VLSI-Oriented Model-Based Robot Vision Processor for 3-D Instrumentation and Object Recognition
Yoshifumi SASAKI Michitaka KAMEYAMA

PAPER

Vol: E77-C No:7
Page(s): 1116-1122
In a robot vision system, enormous computational power is required to perform three-dimensional (3-D) instrumentation and object recognition. However, conventional methods implemented in software require many kinds of complex and irregular operations to achieve accurate 3-D instrumentation and object recognition. In this paper, a VLSI-oriented Model-Based Robot Vision (MBRV) processor is proposed for high-speed and accurate 3-D instrumentation and object recognition. An input image is compared with two-dimensional (2-D) silhouette images which are generated from the 3-D object models by means of perspective projection. Because the MBRV algorithm always gives the candidates for the accurate 3-D instrumentation and object recognition result with simple and regular procedures, it is suitable for implementation as a VLSI processor. A highly parallel architecture is employed in the VLSI processor to reduce the latency between the image acquisition and the output generation of the 3-D instrumentation and object recognition results. As a result, 3-D instrumentation and object recognition can be performed 10,000 times faster than on a 28.5 MIPS workstation.
A Collision Detection Processor for Intelligent Vehicles
Masanori HARIYAMA Michitaka KAMEYAMA

PAPER

Vol: E76-C No:12
Page(s): 1804-1811
Since carelessness in driving can cause terrible traffic accidents, it is important for a vehicle to avoid collisions autonomously. Real-time collision detection between a vehicle and obstacles will be a key target for the next-generation car electronics system. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles located in a workspace. Moreover, high computational power is essential not only in coordinate transformation but also in the matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be performed only by parallel magnitude comparison. A parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.
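The reduction of collision detection to magnitude comparison can be shown with a short software sketch. It assumes axis-aligned rectangular solids and point samples of the vehicle, and the names `Box`, `point_in_box` and `collides` are hypothetical. The CAM described above would evaluate the comparisons against all stored words in parallel, whereas this sketch simply loops over them.

```python
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float, float]


class Box(NamedTuple):
    """Axis-aligned rectangular solid given by its minimum and maximum corners."""
    lo: Point
    hi: Point


def point_in_box(p: Point, box: Box) -> bool:
    """Six magnitude comparisons: lo <= p <= hi on each of the three axes."""
    return all(l <= v <= h for v, l, h in zip(p, box.lo, box.hi))


def collides(points: Sequence[Point], obstacles: Sequence[Box]) -> bool:
    """True if any sampled point lies inside any obstacle solid."""
    return any(point_in_box(p, b) for p in points for b in obstacles)


if __name__ == "__main__":
    obstacles = [Box((0, 0, 0), (1, 1, 1)), Box((4, 4, 0), (5, 6, 2))]
    vehicle = [(2.0, 2.0, 0.5), (4.5, 5.0, 1.0)]
    print(collides(vehicle, obstacles))
```

Representing an obstacle as a union of such solids keeps every stored word to a pair of corner coordinates, which is what lets the test stay a pure comparison with no arithmetic.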
Unified Scheduling of High Performance Parallel VLSI Processors for Robotics
Bumchul KIM Michitaka KAMEYAMA Tatsuo HIGUCHI

PAPER-Parallel Processor Scheduling

Vol: E76-A No:6
Page(s): 904-910
The performance of processing elements can be improved by the progress of VLSI circuit technology, while the communication overhead cannot be neglected in a parallel processing system. This paper presents a unified scheduling method that allocates tasks having different processing times to multiple processing elements. The objective function is formulated to measure communication time between processing elements. By employing constraint conditions, the scheduling efficiently generates an optimal solution using integer programming so that minimum communication time can be achieved. We also propose a VLSI processor for robotics whose latency is very small. In the VLSI processor, the data transfer between two processing elements can be done very quickly, so that the communication cycle time is greatly reduced.
An Algorithm for the K-Selection Problem Using Special-Purpose Sorters
Heung-Shik KIM Jong-Soo PARK Myunghwan KIM

PAPER-Algorithm and Computational Complexity

Vol: E75-D No:5
Page(s): 704-708
An algorithm is presented for selecting the k-th smallest element of a totally ordered (but not sorted) set of n elements, 1 ≤ k ≤ n, in the case that a special-purpose sorter is used as a coprocessor. When the pipeline merge sorter is used as the special-purpose sorter, we analyze the comparison complexity of the algorithm for the given capacity of the sorter. The comparison complexity of the algorithm is 1.4167n + o(n), provided that the capacity of the sorter is 256 elements. The comparison complexity of the algorithm decreases as the capacity of the sorter increases.
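To make the setting concrete, the sketch below shows one way a fixed-capacity sorter can drive k-th element selection. It is not the algorithm analyzed in the paper, whose details the abstract does not give: it is a median-of-medians style selection in which every sort is limited to `capacity` elements, and `hardware_sort` is a hypothetical stand-in for the special-purpose sorter.

```python
def hardware_sort(block):
    """Stand-in for the special-purpose sorter (assumed to sort one block)."""
    return sorted(block)


def select_kth(items, k, capacity=256):
    """Return the k-th smallest element (k is 1-based), sorting at most
    `capacity` elements at a time."""
    items = list(items)
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= n")
    while True:
        if len(items) <= capacity:
            # Small enough to hand to the sorter in one pass.
            return hardware_sort(items)[k - 1]
        # Sort each block with the sorter and keep its median.
        medians = []
        for i in range(0, len(items), capacity):
            chunk = hardware_sort(items[i:i + capacity])
            medians.append(chunk[(len(chunk) - 1) // 2])
        # Use the median of the block medians as the pivot.
        pivot = select_kth(medians, (len(medians) + 1) // 2, capacity)
        smaller = [v for v in items if v < pivot]
        equal_count = sum(1 for v in items if v == pivot)
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = [v for v in items if v > pivot]


if __name__ == "__main__":
    data = [(i * 7919) % 10007 for i in range(5000)]
    print(select_kth(data, 2500, capacity=256))
```

A larger `capacity` produces fewer block medians and a better pivot per pass, which is consistent in spirit with the abstract's remark that the comparison count falls as the sorter grows.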
