A base model must be transmitted first in progressive transmission schemes, and its transmission delay dominates the initiation time for rendering. To reduce the initiation time, we restructure the base model so that the vertices and triangles visible from some specific viewpoints are transmitted first; clients can therefore start rendering once only part of the model file has been received. Simulation results show that only 37.4% to 51.3% of the model file is required to start rendering, and hence the initiation time is significantly reduced.
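The restructuring described above can be sketched as a reordering pass. This is only an illustration under stated assumptions: the function name `reorder_visible_first`, the indexed-triangle data layout, and the precomputed per-triangle `visible` flags are hypothetical, and the abstract does not say how visibility is determined or how the file is laid out.

```python
def reorder_visible_first(vertices, triangles, visible):
    """Reorder a mesh so visible triangles and their vertices come first.

    vertices  : list of vertex records (any type)
    triangles : list of 3-tuples of vertex indices
    visible   : list of bool, one per triangle (assumed precomputed
                for the chosen viewpoints; hypothetical input)
    Returns (new_vertices, new_triangles, n_vis_verts, n_vis_tris).
    Vertices not referenced by any triangle are dropped.
    """
    # Visible triangles first, original order preserved within each group.
    tri_order = [t for t, v in enumerate(visible) if v]
    tri_order += [t for t, v in enumerate(visible) if not v]

    remap = {}            # old vertex index -> new vertex index
    new_vertices = []
    new_triangles = []
    for t in tri_order:
        tri = []
        for old in triangles[t]:
            if old not in remap:
                # Vertices are numbered in order of first use, so every
                # vertex of a visible triangle lands in the leading prefix.
                remap[old] = len(new_vertices)
                new_vertices.append(vertices[old])
            tri.append(remap[old])
        new_triangles.append(tuple(tri))

    n_vis_tris = sum(visible)
    n_vis_verts = (
        1 + max(i for tri in new_triangles[:n_vis_tris] for i in tri)
        if n_vis_tris else 0
    )
    return new_vertices, new_triangles, n_vis_verts, n_vis_tris
```

Under these assumptions, a client could begin rendering after receiving the first `n_vis_verts` vertices and the first `n_vis_tris` triangles, since that prefix only references vertices already received.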
This paper presents a novel digit-level algorithm for motion estimation (ME) and its hardware implementations. It uses most-significant-digit-first (MSD-first) processing and on-line arithmetic ME components. A dedicated array architecture is also proposed for applications requiring high-throughput ME. Various fast search algorithms have been presented in the literature to reduce complexity, but they sacrifice motion vector (MV) quality. Our MSD-first ME decomposes the sum of absolute differences (SAD) and the comparison operations to the digit level, processing the MSD plane first. The comparisons are interleaved with the SAD computations to identify the MV as early as possible. The algorithm precisely identifies the candidates that cannot be the best match and skips their remaining operations. It saves 47.4% to 64.3% of the SAD computations in full search block matching (FSBM) ME. In the past, the high implementation cost of redundant number systems prevented the practical use of on-line arithmetic. In addition, removing redundant SAD computations results in an irregular data flow in system-level integration. We propose novel architecture designs that solve these problems. Furthermore, the architecture requires only one memory access per pixel, lowering memory bandwidth through extensive data parallelism and a particular memory addressing scheme while keeping the controller simple. A 4 × 4 array processor is implemented in a 0.35 µm 1P4M CMOS cell library, with a 2.84 ns cycle time and 1510 gates. It can support 83 M FSBM operations per second. After normalization, our implementation supports 2.67 times as many SAD operations per unit area (estimated in gate count) as conventional two's complement implementations. MSD-first ME can also be combined with other ME algorithms to improve their performance.
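The idea of interleaving comparisons with MSD-first SAD accumulation can be illustrated in software. The sketch below is a simplified stand-in, not the on-line arithmetic design of the paper: it uses ordinary binary bit planes instead of a redundant number system, and the function name `msd_first_search`, the flat-list block format, and the bound-based elimination rule are assumptions made for illustration. After processing planes down to bit k, each candidate's partial sum is a lower bound on its SAD, and the partial sum plus n·(2^k − 1) is an upper bound, so any candidate whose lower bound exceeds the smallest upper bound cannot be the best match.

```python
def msd_first_search(block, candidates, bits=8):
    """Full-search block matching with bit-plane (MSB-first) SAD.

    block      : flat list of unsigned pixel values (< 2**bits)
    candidates : list of flat lists, same length as block
    Returns (best_index, planes_done), where planes_done[i] counts the
    bit planes actually accumulated for candidate i.
    """
    n = len(block)
    diffs = [[abs(a - b) for a, b in zip(block, cand)] for cand in candidates]
    partial = [0] * len(candidates)      # SAD contribution of planes seen so far
    planes_done = [0] * len(candidates)
    alive = set(range(len(candidates)))

    for k in range(bits - 1, -1, -1):    # most significant plane first
        for i in alive:
            ones = sum((d >> k) & 1 for d in diffs[i])
            partial[i] += ones << k
            planes_done[i] += 1
        # Largest possible contribution of the planes not yet processed.
        slack = n * ((1 << k) - 1)
        best_upper = min(partial[i] + slack for i in alive)
        # Drop candidates whose lower bound already exceeds the best upper bound.
        alive = {i for i in alive if partial[i] <= best_upper}

    # All surviving candidates have the exact minimum SAD; break ties by index.
    best = min(alive, key=lambda i: (partial[i], i))
    return best, planes_done
```

Because elimination only uses exact bounds, the result matches an exhaustive SAD search; the saving comes from the planes that are never accumulated for discarded candidates.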
Hun-Chen CHEN Tian-Sheuan CHANG Jiun-In GUO Chein-Wei JEN
This paper presents a long-length discrete Hartley transform (DHT) design with a new hardware-efficient distributed arithmetic (DA) approach. The new DA design approach not only exploits the constant property of the coefficients, as conventional DA does, but also exploits their cyclic property. To apply this approach efficiently to a long-length DHT, we first decompose the long-length DHT algorithm into short ones using the prime factor algorithm (PFA), and then reformulate it using the Agarwal-Cooley algorithm so that all the partitioned short DHTs retain the cyclic property. In addition, we share computation on the ROM contents to reduce the hardware cost, at the expense of a lower computing speed. A comparison with existing designs shows that the proposed design can reduce the area-delay product by 52% to 91%, based on a 0.35 µm CMOS cell library.
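How constant and cyclic coefficients help a DA design can be shown with a small software model. The sketch below is an illustration only, assuming integer coefficients, unsigned inputs, and a generic cyclic (circulant) product; the names `build_rom` and `da_cyclic` are hypothetical and this is not the architecture of the paper. The ROM stores every possible sum of the constant coefficients, and because each output row uses the same coefficients in rotated order, a single ROM serves all outputs when the input bits are rotated before forming the address.

```python
def build_rom(coeffs):
    """ROM[addr] = sum of coeffs[m] for every bit m set in addr."""
    n = len(coeffs)
    return [
        sum(c for m, c in enumerate(coeffs) if (addr >> m) & 1)
        for addr in range(1 << n)
    ]


def da_cyclic(coeffs, x, bits=8):
    """Compute y[j] = sum_m coeffs[m] * x[(m + j) % n] by distributed arithmetic.

    coeffs : list of integer constants (length n)
    x      : list of unsigned integers, each < 2**bits (length n)
    One ROM is shared by all n outputs: output j addresses it with the
    input bits rotated by j positions.
    """
    n = len(coeffs)
    rom = build_rom(coeffs)
    out = []
    for j in range(n):
        acc = 0
        for b in range(bits):            # one ROM lookup per input bit plane
            addr = 0
            for m in range(n):
                addr |= ((x[(m + j) % n] >> b) & 1) << m
            acc += rom[addr] << b        # shift-accumulate the partial sum
        out.append(acc)
    return out
```

For example, `da_cyclic([1, -2, 3], [5, 7, 9])` computes the three rotated inner products without any multiplication, using only ROM lookups, shifts, and additions.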