Makoto SUGIHARA Tohru ISHIHARA Kazuaki MURAKAMI
This paper proposes a task scheduling approach for reliable cache architectures (RCAs) of multiprocessor systems. The RCAs dynamically switch their operation modes for reducing the usage of vulnerable SRAMs under real-time constraints. A mixed integer programming model has been built for minimizing vulnerability under real-time constraints. Experimental results have shown that our task scheduling approach achieved 47.7-99.9% less vulnerability than a conventional one.
In this paper, we propose memory and performance optimized architecture to accelerate the operation speed of adaptive deblocking filter for H.264/JVT/AVC video coding. The proposed deblocking filter executes loading/storing and filtering operations with only 192 cycles for 1 macroblock. Only 244 internal buffers and 3216 internal SRAM are adopted for the buffering operation of deblocking filter with I/O bandwidth of 32 bit. The proposed architecture can process the filtering operation for 1 macroblock with less filtering cycles and lower memory sizes than some conventional approaches of realizing deblocking filter. The efficient hardware architecture is implemented with novel data arrangement, hybrid filter scheduling and minimum number of buffer. The proposed architecture is suitable for low cost and real-time applications, and the real-time decoding with 1080HD (19201088@30 fps) can be easily achieved when working frequency is 70 MHz.
Yiqing HUANG Zhenyu LIU Yang SONG Satoshi GOTO Takeshi IKENAGA
One hardware efficient and high speed architecture for variable block size motion estimation (VBSME) in H.264 is presented in this paper. By improving the pipeline structure and processing element (PE) circuits, the system latency and hardware cost is reduced, which makes this structure more hardware efficient than the original Propagate Partial SAD architecture. For small and middle frame size picture's coding, the proposed structure can save 12.1% hardware cost compared with original Propagate Partial SAD structure. In the case of HDTV, since small inter modes trivially contribute to the coding quality, we remove modes below 88 in our design. By adopting mode reduction technique, when the set number of PE array is less than 8, the proposed mode reduction based Propagate Partial SAD structure can work at faster clock speed and consume less hardware cost than widely used SAD Tree architecture. It is more robust to the high speed timing constraint when parallel processing is considered. With TSMC 0.18 µm technology in worst work conditions (1.62 V, 125), its peak throughput of 8-set PE array structure is 720p@30 Hz with 12864 search range and 5 reference frames. 12 k gates hardware cost can be reduced by our design compared with the parallel SAD Tree architecture.
Yibo FAN Takeshi IKENAGA Satoshi GOTO
Variable Block Size Motion Estimation (VBSME) costs a lot of computation during video coding. Search range reduction algorithm is widely used to reduce computational cost of motion estimation. Current VBSME designs are not suitable for this algorithm. This paper proposes a reconfigurable design of VBSME which can be efficiently used with search range reduction algorithm. While using proposed design, nm reference MBs form an MB array which can be processed in parallel. n and m can be configured according to the new search range shape calculated by algorithm. In this way, the parallelism of proposed design is very flexible and can be adapted to any search range shape. The hardware resource is also fully used while performing VBSME. There are two primary reconfigurable modules in this design: PEGA (PE Group Array) and SAD comparator. By using TSMC 0.18 µm standard cell library, the implementation results show that the hardware cost of design which uses 16 PEGs (PE Groups) is about 179 K Gates, the clock frequency is 167 MHz.
Binary search tree and framed ALOHA algorithms are commonly adopted to solve the anti-collision problem in RFID systems. In this letter, the read efficiency of these two anti-collision algorithms is compared through computer simulations. Simulation results indicate the framed ALOHA algorithm requires less total read time than the binary search tree algorithm. The initial frame length strongly affects the uplink throughput for the framed ALOHA algorithm.
Sang-Wook KIM Jinho KIM Sanghyun PARK
Similarity search in time-series databases finds such data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known dimensionality curse, the previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences, and take only the first two or three DFT coefficients as organizing attributes. Other than this ad-hoc approach, there have been no research efforts on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems occurring in the previous methods, and proposes a novel solution to construct optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database, and identifies the organizing attributes having the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method outperforms the previous ones significantly.
Futoshi FURUTA Kazuo SAITOH Akira YOSHIDA Hideo SUZUKI
We have designed a superconductor-semiconductor hybrid analog-to-digital (A/D) converter and experimentally evaluated its performance at sampling frequencies up to 18.6 GHz. The A/D converter consists of a superconductor front-end circuit and a semiconductor back-end circuit. The front-end circuit includes a sigma-delta modulator and an interface circuit, which is for transmitting data signal to the semiconductor back-end circuit. The semiconductor back-end circuit performs decimation filtering. The design of the modulator was modified to reduce effects of integrator leak and thermal noise on signal-to-noise ratio (SNR). Using the improved modulator design, we achieved a bit-accuracy close to the ideal value. The hybrid architecture enabled us to reduce the integration scale of the front-end circuit to fewer than 500 junctions. This simplicity makes feasible a circuit based on a high TC superconductor as well as on a low TC superconductor. The experimental results show that the hybrid A/D converter operated perfectly and that SNR was 84.8 dB (bit accuracy~13.8 bit) at a band width of 9.1 MHz. This converter has the highest performance of all sigma-delta A/D converters.
Jin-Song ZHANG Satoshi NAKAMURA
An efficient way to develop large scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called as the minimum set, should have small text size in order to reduce the collection cost. It can be selected by a greedy search algorithm from a large mother text corpus. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units increased dramatically, making the search not a trivial issue. In order to improve the search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluated these algorithms in order to show their different characteristics. The experimental results showed that the least-to-most-ordered methods successfully achieved smaller objective sets at significantly less computation time, when compared with the conventional ones. This algorithm has already been applied to the development a number of speech corpora, including a large scale phonetically rich Chinese speech corpus ATRPTH which played an important role in developing our multi-language translation system.
Xiang-Hui WEI Shen LI Yang SONG Satoshi GOTO
Motion estimation (ME) is a computation-intensive module in video coding system. In MPEG-2 to H.264 transcoding, motion vector (MV) from MPEG-2 reused as search center in H.264 encoder is a simple but effective technique to simplify ME processing. However, directly applying MPEG-2 MV as search center will bring difficulties on application of data reuse method in hardware design, because the irregular overlapping of search windows between successive macro block (MB). In this paper, we propose a search window reuse scheme for transcoding, especially for HDTV application. By utilizing the similarity between neighboring MV, overlapping area of search windows can be regularized. Experiment results show that our method achieves average 93.1% search window reuse-rate in HDTV720p sequence with almost no video quality degradation. Compared to transcoding method without any data reuse scheme, bandwidth of the proposed method can be reduced to 40.6% of that.
Won-Jong LEE Vason P. SRINI Woo-Chan PARK Shigeru MURAKI Tack-Don HAN
We present an adaptive dynamic load balancing scheme for 3D texture based sort-last parallel volume rendering on a PC cluster equipped with GPUs. Our scheme exploits not only task parallelism but also data parallelism during rendering by combining the hierarchical data structures (octree and parallel BSP tree) in order to skip empty regions and distribute proper workloads to rendering nodes. Our scheme can also conduct a valid parallel rendering and image compositing in visibility order by employing a 3D clustering algorithm. To alleviate the imbalance when the transfer function is changed, a load rebalancing is inexpensively supported by exchanging only needed data. A detailed performance analysis is provided and scaling characteristics of our scheme are discussed. These show that our scheme can achieve significant performance gains by increasing parallelism and decreasing synchronizing costs compared to the traditional static distribution schemes.
Takatoshi JITSUHIRO Tomoji TORIYAMA Kiyoshi KOGURE
We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognized candidates, suppressing many kinds of noise signals at once and finding target speech is important. Before noise suppression, to find speech and noise label sequences, we introduce multi-pass search with acoustic models including many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is frame-synchronously performed using the multiple models selected by recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.
Sanghoon HWANG Junho MOON Minkyu SONG
In this paper, a CMOS analog-to-digital converter (ADC) with a 6-bit 100 MSPS at 1.8 V is described. The architecture of the proposed ADC is based on a folding type with a resistive interpolation technique for low power consumption. To reduce the power consumption, a folder reduction technique to decrease the number of folding blocks (NFB) by half of the conventional ones, an averaging folder technique, and a compensated resistive interpolation technique are proposed. Further, an auto-switching encoder for efficient digital processing is also presented. With the clock speed of 100 MSPS, the ADC achieves an effective resolution bandwidth (ERBW) of 50 MHz, while consuming only 4.5 mW of power. The measured result of figure-of-merit (FoM) is 0.93 [pJ/convstep]. The active chip occupies an area of 0.28 mm2 in 0.18 µm CMOS technology.
Yasuhito ASANO Yu TEZUKA Takao NISHIZEKI
The HITS algorithm proposed by Kleinberg is one of the representative methods of scoring Web pages by using hyperlinks. In the days when the algorithm was proposed, most of the pages given high score by the algorithm were really related to a given topic, and hence the algorithm could be used to find related pages. However, the algorithm and the variants including Bharat's improved HITS, abbreviated to BHITS, proposed by Bharat and Henzinger cannot be used to find related pages any more on today's Web, due to an increase of spam links. In this paper, we first propose three methods to find "linkfarms," that is, sets of spam links forming a densely connected subgraph of a Web graph. We then present an algorithm, called a trust-score algorithm, to give high scores to pages which are not spam pages with a high probability. Combining the three methods and the trust-score algorithm with BHITS, we obtain several variants of the HITS algorithm. We ascertain by experiments that one of them, named TaN+BHITS using the trust-score algorithm and the method of finding linkfarms by employing name servers, is most suitable for finding related pages on today's Web. Our algorithms take time and memory no more than those required by the original HITS algorithm, and can be executed on a PC with a small amount of main memory.
A new Hybrid-Carry-Selection (HCS) approach for deriving an efficient modulo 2n-1 addition is presented in this study. Its resulting adder architecture is simple and applicable for all n values. Based on 180-nm CMOS technology, the HCS-based modulo 2n-1 adder demonstrates its superiority in Area-Time (AT) performance over existing solutions.
Hirotoshi HONMA Shigeru MASUYAMA
Let G =(V, E) be an undirected simple graph with u ∈ V. If there exist any two vertices in G whose distance becomes longer when a vertex u is removed, then u is defined as a hinge vertex. Finding the set of hinge vertices in a graph is useful for identifying critical nodes in an actual network. A number of studies concerning hinge vertices have been made in recent years. In a number of graph problems, it is known that more efficient sequential or parallel algorithms can be developed by restricting classes of graphs. In this paper, we shall propose a parallel algorithm which runs in O(log n) time with O(n/log n) processors on EREW PRAM for finding all hinge vertices of a circular-arc graph.
Soodesh BULJORE Markus MUCK Patricia MARTIGNE Paul HOUZE Hiroshi HARADA Kentaro ISHIZU Oliver HOLLAND Andrej MIHAILOVIC Kostas A. TSAGKARIS Oriol SALLENT Gary CLEMO Mahesh SOORIYABANDARA Vladimir IVANOV Klaus NOLTE Makis STAMETALOS
The Project Authorization Request (PAR) for the IEEE P1900.4 Working Group (WG), under the IEEE Standards Coordinating Committee 41 (SCC41) was approved in December 2006, leading to this WG being officially launched in February 2007 [1]. The scope of this standard is to devise a functional architecture comprising building blocks to enable coordinated network-device distributed decision making, with the goal of aiding the optimization of radio resource usage, including spectrum access control, in heterogeneous wireless access networks. This paper introduces the activities and work under progress in IEEE P1900.4, including its scope and purpose in Sects. 1 and 2, the reference usage scenarios where the standard would be applicable in Sect. 4, and its current system architecture in Sect. 5.
Keehang KWON Dae-Seong KANG Jinsoo KIM
We propose a query language based on extended regular expressions. This language extends texts with text-generating macros. These macros make it possible to define languages in a compressed, elegant way. This paper also extends queries with linear implications and additive (classical) conjunctions. To be precise, it allows goals of the form D —ο G and G1&G2 where D is a text or a macro and G is a query. The first goal is solved by adding D to the current text and then solving G. This goal is flexible in controlling the current text dynamically. The second goal is solved by solving both G1 and G2 from the current text. This goal is particularly useful for internet search.
Yusuke NAITO Kazuo OHTA Noboru KUNIHIRO
In this paper, we discuss the collision search for hash functions, mainly in terms of their advanced message modification. The advanced message modification is a collision search tool based on Wang et al.'s attacks. Two advanced message modifications have previously been proposed: cancel modification for MD4 and MD5, and propagation modification for SHA-0. In this paper, we propose a new concept of advanced message modification, submarine modification. As a concrete example combining the ideas underlying these modifications, we apply submarine modification to the collision search for SHA-0. As a result, we show that this can reduce the collision search attack complexity from 239 to 236 SHA-0 compression operations.
This paper addresses a new surface reconstruction scheme for approximating the isosurface from a set of tomographic cross sectional images. Differently from the novel Marching Cubes (MC) algorithm, our method does not extract the iso-density surface (isosurface) directly from the voxel data but calculates the iso-density point (isopoint) first. After building a coarse initial mesh approximating the ideal isosurface by the cell-boundary representation, it metamorphoses the mesh into the final isosurface by a relaxation scheme, called shrink-wrapping process. Compared with the MC algorithm, our method is robust and does not make any cracks on surface. Furthermore, since it is possible to utilize lots of additional isopoints during the surface reconstruction process by extending the adjacency definition, theoretically the resulting surface can be better in quality than the MC algorithm. According to experiments, it is proved to be very robust and efficient for isosurface reconstruction from cross sectional images.
Sihun PARK Namhi KANG Younghan KIM
Proxy Mobile IPv6 (PMIPv6) is designed not only to avoid tunneling overhead over the air but also to manage the mobility of hosts that are not equipped with any mobility management software. However, PMIPv6 leads to increasing signaling cost as mobile nodes move frequently because the protocol is based on the global mobility management protocol. In this letter we propose Localized PMIPv6 with Route Optimization (LPMIPv6-RO). Our numerical analysis shows that the proposed scheme outperforms previously proposed mobility protocols in terms of both signaling and packet delivery cost.