The search functionality is under construction.

Keyword Search Result

[Keyword] arc(1309hit)

881-900hit(1309hit)

  • ReVolver/C40: A Scalable Parallel Computer for Volume Rendering--Design and Implementation--

    Shin-ichiro MORI  Tomoaki TSUMURA  Masahiro GOSHIMA  Yasuhiko NAKASHIMA  Hiroshi NAKASHIMA  Shinji TOMITA  

     
    PAPER

      Vol:
    E86-D No:10
      Page(s):
    2006-2015

    This paper describes the architecture of ReVolver/C40, a scalable parallel machine for volume rendering, and its prototype implementation. The most important feature of ReVolver/C40 is view-independent real-time rendering of translucent 3D objects using perspective projection. To realize this feature, the authors propose a parallel volume memory architecture based on the principal axis oriented sampling method and parallel treble volume memory. This paper also discusses the implementation issues of ReVolver/C40 and explains the various kinds of parallelism extracted to achieve high-performance rendering. Prototype systems have been developed, and their performance evaluation results are explained. Based on the evaluation of the prototype systems, ReVolver/C40 with 32 parallel volume memories is estimated to achieve more than 10 frames per second for 256³ volume data on a 256² screen using perspective projection. The authors also review the development of ReVolver/C40 from several viewpoints.

  • A Flexible Architecture for Digital Signal Processing

    Wichai BOONKUMKLAO  Yoshikazu MIYANAGA  Kobchai DEJHAN  

     
    PAPER-VLSI Systems

      Vol:
    E86-D No:10
      Page(s):
    2179-2186

    In this paper, we introduce a flexible design for intellectual property (IP), which has become important in designing system LSIs. The proposed IPs are highly flexible with respect to user requirements. The design priority is determined by setting parameters such as the number of arithmetic units, internal bit length, clock speed and so on. The design time can thus be reduced. The designed IP is based on a reconfigurable architecture in which many structures can be dynamically selected. This paper shows an implementation of a Frequency Response Masking (FRM) digital filter and Principal Components Analysis (PCA) using a reconfigurable architecture. We show the method used to realize the designed circuit and the results of experiments using a field programmable gate array (FPGA).

  • Pot: A General Purpose Monitor for Parallel Computers

    Yuso KANAMORI  Oki MINABE  Masaki WAKABAYASHI  Hideharu AMANO  

     
    PAPER

      Vol:
    E86-D No:10
      Page(s):
    2025-2033

    At the initial stage of developing parallel machines, a software monitor, which manages communication between host computers, program loading and debugging, is necessary. However, developing such a monitoring system is often cumbersome, especially when the target has a parallel architecture. To solve this problem, we developed an integrated monitor system called "Pot". "Pot" consists of a system that runs on the host computer and simple code on a target machine. To reduce development costs, the program on the target machine is as simple as possible, while "Pot" on the host computer itself provides various functions for system development.

  • Reduced Complexity Iterative Decoding Using a Sub-Optimum Minimum Distance Search

    Jun ASATANI  Takuya KOUMOTO  Kenichi TOMITA  Tadao KASAMI  

     
    LETTER-Coding Theory

      Vol:
    E86-A No:10
      Page(s):
    2596-2600

    In this letter, we propose (1) a new sub-optimum minimum distance search (sub-MDS), whose search complexity is reduced considerably compared with optimum MDSs and (2) a termination criterion, called near optimality condition, to reduce the average number of decoding iterations with little degradation of error performance for the proposed decoding using sub-MDS iteratively. Consequently, the decoding algorithm can be applied to longer codes with feasible complexity. Simulation results for several Reed-Muller (RM) codes of lengths 256 and 512 are given.

  • A High-Performance Tree-Block Pipelining Architecture for Separable 2-D Inverse Discrete Wavelet Transform

    Yeu-Horng SHIAU  Jer Min JOU  

     
    PAPER

      Vol:
    E86-D No:10
      Page(s):
    1966-1975

    In this paper, a high-performance pipelining architecture for 2-D inverse discrete wavelet transform (IDWT) is proposed. We use a tree-block pipeline-scheduling scheme to increase computation performance and reduce temporary buffers. The scheme divides the input subbands into several wavelet blocks and processes these blocks one by one, so the size of buffers for storing temporal subbands is greatly reduced. After scheduling the data flow, we fold the computations of all wavelet blocks into the same low-pass and high-pass filters to achieve higher hardware utilization and minimize hardware cost, and pipeline these two filters efficiently to reach a higher throughput rate. For the computation of an N×N-sample 2-D IDWT with a filter length of K, our architecture takes at most (2/3)N² cycles and requires 2N(K-2) registers. In addition, each filter is designed regularly and modularly, so it is easily scalable for different filter lengths and different levels. Because of its small storage, regularity, and high performance, the architecture can be applied to time-critical image decompression.

  • Design Development of SPARC64 V Microprocessor

    Mariko SAKAMOTO  Akira KATSUNO  Aiichiro INOUE  Takeo ASAKAWA  Kuniki MORITA  Tsuyoshi MOTOKURUMADA  Yasunori KIMURA  

     
    INVITED PAPER

      Vol:
    E86-D No:10
      Page(s):
    1955-1965

    We developed a SPARC-V9 processor, the SPARC64 V. It has an operating frequency of 1.35 GHz and contains 191 million transistors fabricated using 0.13-µm CMOS technology with eight-layer copper metallization. SPECjbb2000 (CPU# 32) is 492683, highest on the market and 42% higher than the next highest system. SPEC CPU2000 performance is 858 for SPECint and 1228 for SPECfp. The processor is designed to provide the high system performance and high reliability required of enterprise server systems. It is also designed to address the performance requirements of high-performance computing. During our development of several generations of mainframe processors, we conducted many related experiments, and obtained enterprise server system (EPS) development skills, an understanding of EPS workload characteristics, and technology that provides high reliability, availability, and serviceability. We used those as bases of the new processor development. The approach quite effectively moves beyond differences between mainframe and SPARC systems. At the beginning of development and before the start of hardware design, we developed a software performance simulator so we could understand the performance impacts of created specifications, thereby enabling us to make appropriate decisions about hardware design. We took this approach to solve performance problems before tape-out and avoid spending additional time on design update and physical machine reconstruction. We were successful, completing the high-performance processor development on schedule and in a short time. This paper describes the SPARC64 V microprocessor and performance analyses for development of its design.

  • Performance Evaluation of Instruction Set Architecture of MBP-Light in JUMP-1

    Noriaki SUZUKI  Hideharu AMANO  

     
    PAPER

      Vol:
    E86-D No:10
      Page(s):
    1996-2005

    The instruction set architecture of MBP-light, a dedicated processor for the DSM (Distributed Shared Memory) management of JUMP-1, is analyzed with a real prototype. The Buffer-Register Architecture proposed for MBP-core improves performance by 5.64% in the home cluster and 6.27% in a remote cluster. Only a special instruction for hashing cluster addresses is efficient, improving performance by 2.80%; the other special instructions are almost useless. It appears that the dominant operations in the DSM management program were handling packet queues assigned to the local cluster. Thus, common RISC instructions, especially load/store instructions, are frequently used. Separating instruction and data memory improves performance by 33%. The results suggest that an alternative which provides a separate on-chip cache and instructions dedicated to packet queue management is advantageous.

  • Resource-Optimal Software Pipelining Using Flow Graphs

    Dirk FIMMEL  Jan MULLER  Renate MERKER  

     
    INVITED PAPER-Software Systems and Technologies

      Vol:
    E86-D No:9
      Page(s):
    1560-1568

    We present a new approach to the loop scheduling problem, which improves on previous solutions in two important aspects: the resource constraints are formulated using flow graphs, and the initiation interval λ is treated as a rational variable. The approach supports heterogeneous processor architectures and pipelined functional units, and the Integer Linear Programming implementation produces an optimum loop schedule, whereby a minimum λ is achieved. Our flow graph model facilitates the cyclic binding of loop operations to functional units. Compared to previous research results, the solution can provide faster loop schedules and a significant reduction of the problem complexity and solution time.

  • Batch-Incremental Nearest Neighbor Search Algorithm and Its Performance Evaluation

    Yaokai FENG  Akifumi MAKINOUCHI  

     
    PAPER-Databases

      Vol:
    E86-D No:9
      Page(s):
    1856-1867

    In light of the increasing number of computer applications that rely heavily on multimedia data, the database community has focused on the management and retrieval of multidimensional data. Nearest Neighbor queries (NN queries) have been widely used to perform content-based retrieval (e.g., similarity search) in multimedia applications. An incremental NN (INN) query is a kind of NN query that can also be used when the number of NN objects to be retrieved is not known in advance. This paper points out the weaknesses of the existing INN search algorithms and proposes a new one, called the Batch-Incremental Nearest Neighbor search algorithm (denoted B-INN search algorithm), which can be used to process INN queries efficiently. The B-INN search algorithm differs from the existing INN search algorithms in that it does not employ the priority queue, which is used in the existing INN search algorithms and is very CPU- and memory-intensive for large databases in high-dimensional spaces. In addition, it incrementally reports b (b > 1) objects simultaneously (Batch-Incremental), whereas the existing INN search algorithms report the neighbors one by one. In order to implement the B-INN search, a new search (called k-d-NN search) with a new pruning strategy is proposed. Performance tests indicate that the B-INN search algorithm clearly outperforms the existing INN search algorithms in high-dimensional spaces.

  • On 1-Inkdot Alternating Pushdown Automata with Sublogarithmic Space

    Jianliang XU  Yong CHEN  Tsunehiro YOSHINAGA  Katsushi INOUE  

     
    PAPER-Theory of Automata, Formal Language Theory

      Vol:
    E86-D No:9
      Page(s):
    1814-1824

    This paper introduces a 1-inkdot two-way alternating pushdown automaton which is a two-way alternating pushdown automaton (2apda) with the additional power of marking at most 1 tape-cell on the input (with an inkdot) once. We first investigate a relationship between the accepting powers of sublogarithmically space-bounded 2apda's with and without 1 inkdot, and show, for example, that sublogarithmically space-bounded 2apda's with 1 inkdot are more powerful than those which have no inkdots. We next investigate an alternation hierarchy for sublogarithmically space-bounded 1-inkdot 2apda's, and show that the alternation hierarchy on the first level for 1-inkdot 2apda's holds, and we also show that 1-inkdot two-way nondeterministic pushdown automata using sublogarithmic space are incomparable with 1-inkdot two-way alternating pushdown automata with only universal states using the same space.

  • A Fast Encoding Method for Vector Quantization Based on 2-Pixel-Merging Sum Pyramid Data Structure

    Zhibin PAN  Koji KOTANI  Tadahiro OHMI  

     
    LETTER-Image

      Vol:
    E86-A No:9
      Page(s):
    2419-2423

    A fast winner search method for VQ based on 2-pixel-merging sum pyramid is proposed in order to reject a codeword at an earlier stage to reduce the computational burden. The necessary search scope of promising codewords is meanwhile narrowed by using sorted real sums. The high search efficiency is confirmed by experimental results.

  • HTN: A New Hierarchical Interconnection Network for Massively Parallel Computers

    M.M. Hafizur RAHMAN  Susumu HORIGUCHI  

     
    PAPER-Networking and Architectures

      Vol:
    E86-D No:9
      Page(s):
    1479-1486

    Interconnection networks usually suffer from Little's Law: low cost implies low performance and high performance is obtained at high cost. However, hierarchical interconnection networks provide high performance at low cost by exploiting the locality that exists in communication patterns of massively parallel computers. In this paper, we propose a new hierarchical interconnection network, called Hierarchical Torus Network (HTN). This network reduces the number of vertical links in a 3D stacked implementation while maintaining good network features. This paper addresses the architectural details of the HTN, and explores aspects such as the network diameter, average distance, bisection width, peak number of vertical links, and VLSI layout area of the HTN as well as of several commonly used networks for parallel computers. It is shown that the HTN possesses several attractive features including small diameter, small average distance, small number of wires, a particularly small number of vertical links, and economic layout area.

  • A Robust Array Architecture for a Capacitorless MISS Tunnel-Diode Memory

    Satoru HANZAWA  Takeshi SAKATA  Tomonori SEKIGUCHI  Hideyuki MATSUOKA  

     
    PAPER-Integrated Electronics

      Vol:
    E86-C No:9
      Page(s):
    1886-1893

    With the aim of applying a MISS tunnel-diode cell to a high-density RAM, we studied its problems and developed three circuit technologies to solve them. The first, a standby-voltage control scheme, reduces standby currents and increases the signal current by 3.4 times compared to the conventional one. The second, a hierarchical bit-line structure, reduces the number of memory cells in a bit-line without increasing the number of sense amplifiers. The third, a twin-dummy-cell technique, generates a proper reference signal to discriminate read currents. These technologies enable a capacitorless MISS diode cell with an effective cell area of 6F² (F: minimum feature size) to be applied to a high-density RAM.

  • Integrated Pre-Fetching and Replacing Algorithm for Graceful Image Caching

    Zhou SU  Teruyoshi WASHIZAWA  Jiro KATTO  Yasuhiko YASUDA  

     
    PAPER-Multimedia Systems

      Vol:
    E86-B No:9
      Page(s):
    2753-2763

    The efficient distribution of stored information has become a major concern on the Internet. Since web workload characteristics show that more than 60% of network traffic is caused by image documents, how to efficiently distribute image documents from servers to end clients is an important issue. Proxy caching is an efficient solution for reducing network traffic. In recent years, it has been shown that an image caching method (Graceful Caching) based on a hierarchical coding format performs better than conventional caching schemes. However, as the capacity of the cache is limited, how to efficiently allocate the cache memory to achieve a minimum expected delay time is still a problem to be resolved. This paper presents an integrated caching algorithm to deal with the above problem for image databases, web browsers, proxies and other similar applications on the Internet. By analyzing the web request distribution of the Graceful Caching, both replacing and pre-fetching algorithms are proposed. We also show that our proposal can be carried out based on information readily available in the proxy server; it flexibly adapts its parameters to the hit rates and access patterns of users requesting documents in the Graceful Caching. Finally, we verify the performance of this algorithm by simulations.

  • Technology Scalable Matrix Architecture for Data Parallel Applications

    Mostafa SOLIMAN  Stanislav SEDUKHIN  

     
    PAPER-Networking and Architectures

      Vol:
    E86-D No:9
      Page(s):
    1549-1559

    Within a few years it will be possible to integrate a billion transistors on a single chip operating at a frequency of more than 10 GHz. At this integration level, we propose using a multi-level ISA to express fine-grain data parallelism to hardware instead of using a huge transistor budget to dynamically extract it. Since the fundamental data structures for a wide variety of data parallel applications are scalar, vector, and matrix, our proposed Trident processor extends a scalar ISA with vector and matrix instruction sets to effectively process matrix-formulated applications. Like vector architectures, the Trident processor consists of a set of parallel lanes (each lane contains a set of vector pipelines and a slice of the register file) combined with a fast scalar core. However, the Trident processor can effectively process not only vector but also matrix data on the parallel lanes. One key point of our architecture is the local communication within and across lanes to overcome the limitations of future VLSI technology. Another key point is the effective execution of a mixture of scalar, vector, and matrix operations. This paper describes the architecture of the Trident processor and evaluates its performance on BLAS and on the standard matrix bidiagonalization algorithm. The latter is evaluated as an example of an entire application based on a mixture of scalar, vector, and matrix operations. Our results show that many data parallel applications, such as scientific, engineering, and multimedia applications, can be sped up on the Trident processor. Moreover, scaling the Trident processor does not require more fetch, decode, or issue bandwidth, but only replication of parallel lanes.

  • Results Merging with the OASIS System: An Experimental Comparison of Two Techniques

    Vitaliy KLUEV  

     
    PAPER

      Vol:
    E86-D No:9
      Page(s):
    1773-1780

    Mechanisms used for results merging are very important for distributed search systems. Their purpose is to select the most relevant documents retrieved by different servers and put them at the top of the list returned to the end user. There are several approaches to solving key problems of this task, such as eliminating duplicates and ranking the combined results, but it is still not clear how best to achieve this. We use a clustering technique to divide retrieved results into several groups and a metric based on the vector space model to arrange items inside each group. Preliminary tests were conducted using the OASIS system and several collections of real Internet data. They showed relatively superior results when compared to neural network clustering and LSI calculation. The proposed mechanisms can be applied to metasearch systems as well as to distributed search systems because they do not require any special information other than de facto standard data received from servers.

  • An A* Search in Sentential Matching for Question Answering

    Tatsunori MORI  Tomohiro OHTA  Katsuyuki FUJIHATA  Ryutaro KUMON  

     
    PAPER

      Vol:
    E86-D No:9
      Page(s):
    1658-1668

    In this paper, we propose a method to introduce an A* search control into the sentential matching mechanism for question-answering systems, in order to reduce the response time while preserving the accuracy of the answer. Question answering is a new technology that retrieves not relevant documents but the answer(s) directly by combining several methodologies including IR and IE. One of the essential processes is the sentential matching between the user's query and each sentence in documents. In general, in order to obtain a matching score precisely at higher resolution, we need some processes with higher computational costs. We therefore introduce an A* search in which both the processing cost and the resolution of the matching score are taken into account simultaneously. According to the experiments in NTCIR3 QAC1 Task1, the system with the controlled search is 3.4-8.5 times faster than the system with no control.

  • Factor Controlled Hierarchical SOM Visualization for Large Set of Data

    Junan CHAKMA  Kyoji UMEMURA  

     
    PAPER

      Vol:
    E86-D No:9
      Page(s):
    1796-1803

    The self-organizing map (SOM) is a widely used tool in high-dimensional data visualization. However, despite its benefit of plotting very high-dimensional data on a low-dimensional grid, browsing and understanding the meaning of a trained map turns out to be a difficult task, especially when the number of nodes or the size of the data increases. Though there are some well-known techniques to visualize SOMs, they mainly deal with cluster boundaries and fail to consider the raw information available in the original data when browsing SOMs. In this paper, we propose our Factor controlled Hierarchical SOM, which enables us to select the number of data items used to train and label a particular map based on a pre-defined factor and provides consistent hierarchical SOM browsing.

  • Measuring Errors on 3D Meshes Using Pixel Based Search

    Kohji INAGAKI  Masahiro OKUDA  Masaaki IKEHARA  Shin-ichi TAKAHASHI  

     
    PAPER-Computer Graphics

      Vol:
    E86-D No:9
      Page(s):
    1903-1908

    Due to the explosive growth of network technologies, 3D models and animations have attracted great interest in various media. In particular, 3D mesh models (3D meshes), which approximate surfaces by polygonal meshes, are widely used to model 3D objects. In 1D and 2D signals such as speech, audio, images, video, etc., the signal values are located on "grids"; for example, the signals of images are defined on pixels. Thus, the errors of such signals can be explicitly defined by differences of the values on the "grids". However, since in 3D meshes vertices are located at arbitrary positions in 3D space and are triangulated in arbitrary ways, the grids cannot be defined. This makes it difficult to measure errors on 3D meshes. In this paper, we propose a new numerical method to measure the errors between two different 3D meshes.

  • An All-Port Matched Impedance-Transforming Marchand Balun and Its Mixer Application

    Mitchai CHONGCHEAWCHAMNAN  Kamorn BANDUDEJ  Apisak WORAPISHET  Choon Yong NG  Ian D. ROBERTSON  

     
    PAPER

      Vol:
    E86-C No:8
      Page(s):
    1593-1600

    A new technique to reduce the size of the isolation network needed in a Marchand balun for perfect all-port matching and isolation is proposed. The proposed isolation circuit is realized using a coupled-line phase-inverter in place of the bulky 180° line section that has been previously proposed. Analysis of the proposed circuit yields the required relationship between the coupling coefficient and the electrical length of the coupler. Based on the design equations, the circuit is experimentally demonstrated at 1.8 GHz and has shown excellent results. The obtained output return loss and isolation loss are more than 18 dB and 40 dB, respectively. The proposed balun was then applied to a double-balanced ring-diode mixer. The designed mixer achieves a low conversion loss of 6 dB at its operating frequency, which is 1.5 dB lower than for a double-balanced diode mixer using a conventional impedance-transforming Marchand balun. The RF-IF and LO-IF isolations are well below 25 dB and 18 dB across 1 GHz RF operating bandwidth, respectively.
