Media processing has become one of the dominant computing workloads. In this context, SIMD instructions have been introduced in current processors to raise performance, often the main goal of microprocessor designers. Today, however, designers have become concerned with power consumption, and in some cases (for example, laptops) low power is the main design goal. In this paper, we show that SIMD ISA extensions on a superscalar processor can be one way to reduce power consumption while keeping a high performance level. We reduce the average power consumption by decreasing the number of instructions and the number of cache references, and by using dynamic power management to convert the performance speedup into a reduction in power consumption.
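As a rough illustration of how a speedup can be traded for power, the sketch below uses the standard CMOS dynamic-power relation P ≈ C·V²·f. It assumes, hypothetically, that the clock is slowed by exactly the speedup factor so that execution time is unchanged, and optionally that the supply voltage scales linearly with frequency. Neither assumption, nor the function name, is taken from the paper, whose power-management policy is not described in the abstract.

```python
def relative_dynamic_power(speedup, scale_voltage=True):
    """Dynamic power relative to the baseline, using P ~ C * V^2 * f.

    Assumption: the clock is lowered by `speedup`, so run time stays constant.
    Assumption: if scale_voltage is True, V scales linearly with f.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    f_ratio = 1.0 / speedup
    v_ratio = f_ratio if scale_voltage else 1.0
    return v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    for s in (1.0, 1.5, 2.0):
        # Print power with and without voltage scaling for each speedup.
        print(s, relative_dynamic_power(s), relative_dynamic_power(s, False))
```

Under these assumptions a 2x speedup halves the power with frequency scaling alone and cuts it to one eighth when the voltage is scaled as well; real processors will fall somewhere in between because of static power and voltage floors.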
Tetsuya YAMADA Makoto ISHIKAWA Yuji OGATA Takanobu TSUNODA Takahiro IRITA Saneaki TAMAKI Kunihiko NISHIYAMA Tatsuya KAMEI Ken TATEZAWA Fumio ARAKAWA Takuichiro NAKAZAWA Toshihiro HATTORI Kunio UCHIYAMA
A 32-bit embedded RISC microprocessor core integrating a DSP has been developed using a 0.18-µm five-layer-metal CMOS technology. The integrated DSP has a single MAC and exploits CPU resources to reduce hardware. The DSP occupies only 0.5 mm². The processor core includes a large 128-kB on-chip SRAM called U-memory. A large-capacity on-chip memory decreases the amount of traffic to external memory and is therefore effective for low-power, high-performance operation. To achieve low power dissipation for U-memory accesses, the active ratio of the U-memory is reduced. The critical path is a load path from the U-memory, and we optimized this path through the whole chip. The chip achieves 0.79 mA/MHz executing Dhrystone 1.1 at 108 MHz, which is suitable for mobile applications.
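The per-megahertz figure can be turned into an absolute supply current with one multiplication. The short sketch below does only that arithmetic on the two numbers quoted in the abstract; the function and constant names are illustrative.

```python
CURRENT_PER_MHZ_MA = 0.79  # mA/MHz, reported for Dhrystone 1.1
CLOCK_MHZ = 108            # MHz, reported operating frequency


def supply_current_ma(ma_per_mhz, clock_mhz):
    """Supply current in mA for a given mA/MHz figure and clock frequency."""
    return ma_per_mhz * clock_mhz


if __name__ == "__main__":
    current = supply_current_ma(CURRENT_PER_MHZ_MA, CLOCK_MHZ)
    print(f"{current:.1f} mA")
```

This gives roughly 85.3 mA at 108 MHz. Converting it to power would require the supply voltage, which the abstract does not state.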
Caihua WANG Hideki TANAHASHI Hidekazu HIRAYU Yoshinori NIWA Kazuhiko YAMAMOTO
In this paper, we propose a probabilistic approach to derive an approximate polyhedral description from range data. We first compare several least-squares-based methods for estimation of local normal vectors and select the most robust one based on a reasonable noise model of the range data. Second, we extract the stable planar regions from the range data by examining the distributions of the local normal vectors together with their spatial information in the 2D range image. Instead of segmenting the range data completely, we use only the geometries of the extracted stable planar regions to derive a polyhedral description of the range data. The curved surfaces in the range data are approximated by their extracted plane patches. With a probabilistic approach, the proposed method can be expected to be robust against the noise. Experimental results on real range data from different sources show the effectiveness of the proposed method.
Jian YANG Ying-Ning PENG Yoshio YAMAGUCHI Hiroyoshi YAMADA Wolfgang-M. BOERNER
The periodicity of a target scattering matrix is studied when the target is rotated about the sight line of a monostatic radar. Except for the periodicity and invariance of the scattering matrix diag(a,a), it is proved that only helixes have the quasi-invariance, and that only N-targets have the quasi-periodicity, demonstrating that a target with some angle rotation symmetry also has the scattering matrix form diag(a,a). From this result, we conclude that it is impossible to extract the shape characteristics of a complex target from its scattering matrix or its Kennaugh matrix.
It has been shown that virtual output queuing (VOQ) and a sophisticated scheduling algorithm enable an input-queued switch to achieve 100% throughput for independent arrival processes. Several of the scheduling algorithms that have been proposed can be classified as either iterative scheduling algorithms or symmetric crossbar arbitration algorithms. i-OCF (oldest-cell-first) and TSA (two-step arbiter) are well-known examples of iterative scheduling algorithms and symmetric crossbar arbitration algorithms, respectively. However, both algorithms have drawbacks. i-OCF takes a long time to find a complete conflict-free match between input ports and output ports because it requires multiple iterations. If i-OCF cannot find a complete conflict-free match, the switch throughput falls. TSA may find a conflict-free match faster than i-OCF because it does not need any iterations. However, TSA suffers from the starvation problem. In this paper, we propose a new scheduling algorithm. It uses two schedulers, which we call scheduler 1 and scheduler 2, in parallel. After cells are transmitted, the information that input port i granted the offer from output port j in scheduler 2 is mapped to scheduler 1 if and only if input port i has at least one cell destined for output port j. If the information is moved, input port i and output port j are matched in scheduler 1 at the beginning of the next time slot. Our proposed algorithm uses one scheduler based on TSA and the other based on i-OCF. Numerical results show that, for both uniform and bursty traffic, the proposed scheduling algorithm neither requires multiple iterations to find a complete conflict-free match nor suffers from the starvation problem.
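The mapping rule between the two schedulers can be sketched as a simple filter. The code below is a minimal illustration of the "if and only if" condition only, not of TSA or i-OCF themselves; the function name and the data layout (a list of granted pairs and a dictionary of VOQ lengths) are assumptions made for the example.

```python
def carry_over_matches(scheduler2_grants, voq_lengths):
    """Return the (input, output) pairs to pre-match in scheduler 1.

    scheduler2_grants: iterable of (i, j) pairs granted in scheduler 2.
    voq_lengths: dict mapping (i, j) to the number of cells still queued at
        input i for output j after the current transmission (hypothetical layout).
    A pair is carried over only if input i still holds a cell for output j.
    """
    return [(i, j) for (i, j) in scheduler2_grants
            if voq_lengths.get((i, j), 0) > 0]


if __name__ == "__main__":
    grants = [(0, 1), (1, 2), (2, 0)]
    voq = {(0, 1): 3, (1, 2): 0}  # (2, 0) has no queue entry at all
    print(carry_over_matches(grants, voq))
```

Pairs whose virtual output queue is empty are dropped, so scheduler 1 never starts a time slot with a match that has nothing to send.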
Koji HOSAKA Shinichi HARASE Shoji IZUMIYA Takehiko ADACHI
A cascode crystal oscillator is widely used as the stable frequency source of mobile communication equipment. Recently, IC production of the cascode crystal oscillator has become necessary. The cascode crystal oscillator is composed of a Colpitts crystal oscillator and a cascode-connected common-base buffer amplifier. The base bypass capacitor hinders reduction of the circuit area. In this paper, we propose new structures of the cascode crystal oscillator suitable for integrated circuits. The proposed circuits reduce the area and the start-up time without deteriorating the frequency stability against load impedance variation or other performance characteristics. Simulation and experiment have shown the effectiveness of the proposed circuits.
Yasuaki WATANABE Kiyoharu OZAKI Shigeyoshi GOKA Takayuki SATO Hitoshi SEKIMOTO
A highly stable oven-controlled crystal oscillator (OCXO) with low phase-noise characteristics has been developed using a dual-mode SC-cut quartz crystal oscillator. The OCXO uses a conventional oven-control system for coarse compensation and a digital-correction system, which uses B-mode signal in an SC-cut resonator as a temperature sensor, for fine compensation. Combining these two forms of compensation greatly improves the stability of the C-mode frequency without requiring a double-oven system. The experimental results indicated that the frequency stability of the proposed OCXO, including the frequency-temperature hysteresis, is ten times better than that of a conventional, free-running OCXO. The results also indicated that the proposed OCXO has good frequency retraceability and low phase-noise characteristics.
Yih-Shen CHEN Chung-Ju CHANG Fang-Ching REN
Sophisticated and robust resource management is an essential issue in future wireless systems, which will provide a variety of application services. In this paper, we employ an adaptive-network-based fuzzy inference system (ANFIS) to control the resource allocation for mobile multimedia networks. ANFIS, which combines the expert knowledge of a fuzzy logic system with the learning capability of neural networks, can provide a systematic approach to finding appropriate parameters for the Sugeno fuzzy model. The fuzzy resource allocation controller (FRAC) is designed in a two-layer architecture and takes the capacity requirement of a new call request, the capacity reservation for future handoffs, and the air interface performance as its input linguistic variables. Therefore, the statistical multiplexing gain of mobile multimedia networks can be maximized in the FRAC. Simulation results indicate that the proposed FRAC can keep the handoff call blocking rate low without jeopardizing the new call blocking rate. Also, the FRAC can indeed guarantee quality of service (QoS) contracts and achieve higher system performance according to network dynamics, compared with the guard channel scheme and the ExpectedMax strategy.
Hiroshi ANDO Takashi MORIE Makoto MIYAKE Makoto NAGATA Atsushi IWATA
This paper proposes a new method for image segmentation and extraction using nonlinear cellular networks. Flexible segmentation of complicated natural scene images is achieved by using resistive-fuse networks, and each segmented region is extracted by nonlinear oscillator networks. We also propose a nonlinear cellular network circuit implementing both resistive-fuse and oscillator dynamics by using pulse-modulation techniques. The basic operation of the nonlinear network circuit is confirmed by SPICE simulation. Moreover, 10×10-pixel image segmentation and extraction are demonstrated by high-speed circuit simulation.
Haruo KOBAYASHI Kensuke KOBAYASHI Masanao MORIMURA Yoshitaka ONAYA Yuuich TAKAHASHI Kouhei ENOMOTO Hideyuki KOGURE
This paper presents an explicit analysis of the output error power in wideband sampling systems with finite aperture time in the presence of sampling jitter. Sampling jitter and finite aperture time affect the ability of wideband sampling systems to capture high-frequency signals with high precision. Sampling jitter skews data acquisition timing points, which causes large errors in high-frequency (large slew rate) signal acquisition. Finite sampling-window aperture works as a low pass filter, and hence it degrades the high-frequency performance of sampling systems. In this paper, we discuss these effects explicitly not only in the case that either sampling jitter or finite aperture time exists but also the case that they exist together, for any aperture window function (whose Fourier transform exists) and sampling jitter of Gaussian distribution. These would be useful for the designer of wideband sampling data acquisition systems to know how much sampling jitter and aperture time are tolerable for a specified SNR. Some experimental measurement results as well as simulation results are provided as validation of the analytical results.
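For a quick feel of the two effects, the sketch below evaluates two textbook special cases rather than the paper's general result: the small-jitter SNR limit for a sinusoidal input, SNR = -20·log10(2π·f·σ), and the sinc-shaped gain of a rectangular aperture window. Both formulas are standard approximations; the function names and the choice of a rectangular window are assumptions made for the example.

```python
import math


def jitter_limited_snr_db(freq_hz, jitter_rms_s):
    """SNR limit in dB for a sine input with small Gaussian sampling jitter.

    Textbook approximation, valid when 2*pi*f*sigma is much smaller than 1.
    """
    return -20.0 * math.log10(2.0 * math.pi * freq_hz * jitter_rms_s)


def rect_aperture_gain(freq_hz, aperture_s):
    """Magnitude response |sin(x)/x|, x = pi*f*tau, of a rectangular aperture."""
    x = math.pi * freq_hz * aperture_s
    return 1.0 if x == 0 else abs(math.sin(x) / x)


if __name__ == "__main__":
    # 100 MHz input sampled with 1 ps RMS jitter.
    print(round(jitter_limited_snr_db(100e6, 1e-12), 2))
    # 1 GHz input sampled with a 0.5 ns rectangular aperture.
    print(round(rect_aperture_gain(1e9, 0.5e-9), 4))
```

A designer can invert the first function to find the jitter budget for a target SNR; other window shapes would replace the sinc gain with the magnitude of their own Fourier transform.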
Rong-Long WANG Zheng TANG Qi-Ping CAO
A near-optimum parallel algorithm for the bipartite subgraph problem using a gradient ascent learning algorithm of the Hopfield neural network is presented. This parallel algorithm uses Hopfield neural network updating to get a near-maximum bipartite subgraph and then performs gradient ascent learning on the Hopfield network to help the network escape from the state of the near-maximum bipartite subgraph until the state of the maximum bipartite subgraph or a better one is obtained. A large number of instances have been simulated to verify the proposed algorithm, and the simulation results show that the solution quality of our algorithm is superior to that of the best existing parallel algorithm. We also test the proposed algorithm on the maximum cut problem. The simulation results also show the effectiveness of this algorithm.
Masayuki DAITO Kazumasa SUZUKI Ken-ichi UEHIGASHI Hiroshi MORITA Hitoshi SONODA Nobuhito MORIKAWA Masatoshi MORIYAMA Shoichiro SATO Terumi FUKUDA Saori NAKAMURA
A MIPS-architecture-based embedded out-of-order superscalar microprocessor targeting broadband applications has been developed. Aggressive microarchitectures, such as superpipelining and out-of-order execution, have been applied to realize better performance scalability in order to fit with next-generation broadband applications. The chip includes a 32 K-Byte instruction cache, a 32 K-Byte data cache, 6 independent execution units, and has been designed using an ASIC-style design methodology on a 0.13-µm CMOS 5-layer aluminum technology. It can operate up to 500 MHz and achieves 1005 MIPS (Dhrystone 2.1) at 500-MHz operation.
Kenji ITO Shuji TASAKA Yutaka ISHIBASHI
This paper studies, by experiment, the effect of packet scheduling algorithms at routers on media synchronization quality in live audio and video transmission. In the experiment, we deal with four packet scheduling algorithms: First-In First-Out, Priority Queueing, Class-Based Queueing and Weighted Fair Queueing. We assess both intra-stream and inter-stream synchronization quality with and without media synchronization control. The paper clarifies the features of each algorithm from a media synchronization point of view. A comparison of the experimental results shows that Weighted Fair Queueing is the most efficient packet scheduling algorithm for continuous media among the four.
At present, the global Internet consists of many ASes. Each AS pays a pre-determined connection fee to another AS for connecting its network with that AS's network. Connection-fee charging may be rational for transferring best-effort traffic. However, usage charging is necessary for transferring resource-guaranteed traffic such as Intserv traffic and Diffserv traffic. In this case, each AS pays a per-flow fee to another AS every time it routes a flow into that AS. The per-flow fee paid by each AS becomes a part of the cost for that AS. Thus, each AS needs to select a route with the lowest price to improve its own profit. In this paper, we call such an inter-AS routing scheme a price-based inter-AS routing scheme. With this scheme, when an AS has a request to route an inter-AS flow, it can select the inter-AS route with the lowest price to improve its own profit. A cost-dependent pricing scheme is suitable for price-based inter-AS routing because it can reduce the frequency of price information exchange between ASes. However, in the cost-dependent pricing scheme, the profit of each AS depends on the distribution of path costs in that AS. Generally, ASes with narrow ranges of path costs cannot obtain sufficient profits compared to ASes with wide ranges of path costs. Thus, we propose a routing policy that allows ASes with narrow ranges of path costs to improve their profits efficiently, and we evaluate its effect using a simple routing model.
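Selecting the lowest-price route can be illustrated with an ordinary shortest-path search in which edge weights are per-flow fees. The sketch below is only that: it assumes fees are non-negative and simply add up along a route, and the topology, AS names, and function name are hypothetical rather than taken from the paper's routing model.

```python
import heapq


def cheapest_route(prices, src, dst):
    """Dijkstra search over per-flow fees.

    prices: dict mapping an AS to {neighbor AS: fee for routing a flow into it}.
    Returns (total_price, route), or (inf, []) if dst is unreachable.
    """
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            # Walk the predecessor links back to the source.
            route = [dst]
            while route[-1] != src:
                route.append(prev[route[-1]])
            return cost, route[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, fee in prices.get(node, {}).items():
            new_cost = cost + fee
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []


if __name__ == "__main__":
    fees = {
        "AS1": {"AS2": 4, "AS3": 1},
        "AS2": {"AS4": 1},
        "AS3": {"AS2": 2, "AS4": 6},
    }
    print(cheapest_route(fees, "AS1", "AS4"))
```

In this toy topology the three-hop route through AS3 and AS2 is cheaper than either two-hop alternative, which shows why hop count alone is not a good proxy once usage charging is in place.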
Fu-Kun CHEN Jar-Ferr YANG Yu-Pin LIN
For multimedia communications, the computational scalability of a multimedia codec is required to match with different working platforms and integrated services of media sources. In this paper, two condensed stochastic codebook search approaches are proposed to progressively reduce the computation required for the algebraic code excited linear predictive (ACELP) and multi-pulse maximum likelihood quantization (MP-MLQ) coders. By reducing the candidates of the codebook before search procedure, the proposed methods can effectively diminish the computation required for the ITU-T G.723.1 dual rate speech coder. Simulation results show that the proposed methods can save over 50 percent for the stochastic codebook search with perceptually intangible degradation in speech quality.
Hong-Bin CHIOU Sheng-Der CHIN Zsehong TSAI
We propose an improved Hierarchical Packet Fair Queueing (H-PFQ) mechanism, using ACK spacing, for efficient bandwidth management of TCP traffic over the Internet. According to the pre-determined bandwidth sharing and the class hierarchy of all TCP sessions, we design an algorithm to calculate the required time intervals between consecutive ACK packets of each TCP session so as to avoid packet drops due to buffer overflow. We demonstrate via computer simulations that the proposed improvements may result in much better performance than the original H-PFQ mechanism used only in the forward direction, in the sense that not only is the effective throughput of the bottleneck link improved but the fairness among TCP sessions is also maintained.
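The idea of deriving an ACK interval from a hierarchical bandwidth share can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it assumes each session's rate is the link rate multiplied by its share at every level of the class hierarchy, and that each ACK releases one fixed-size segment. The function names and the 1460-byte segment size are assumptions.

```python
def leaf_rate_bps(link_bps, share_path):
    """Rate of a leaf session given its share of the parent at each level.

    share_path: fractions in (0, 1], ordered from the root class to the leaf.
    """
    rate = float(link_bps)
    for share in share_path:
        if not 0 < share <= 1:
            raise ValueError("each share must be in (0, 1]")
        rate *= share
    return rate


def ack_interval_s(rate_bps, bytes_per_ack=1460):
    """Spacing between ACKs so the sender is clocked at roughly rate_bps.

    Assumption: every ACK lets the sender emit bytes_per_ack new bytes.
    """
    return bytes_per_ack * 8 / rate_bps


if __name__ == "__main__":
    rate = leaf_rate_bps(10e6, [0.5, 0.4])  # 10 Mb/s link, two hierarchy levels
    print(rate, ack_interval_s(rate))
```

With these numbers the session is allotted about 2 Mb/s, so ACKs would be released about every 5.84 ms. A real implementation would also have to handle delayed ACKs and idle sibling classes whose bandwidth is redistributed.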
Phakphoom BOONYANANT Sawasd TANTARATANA
This paper considers FIR filter design using a linear predictive coding technique, for which the coefficients belong to a small set of integers, so that the coefficients have small wordlengths. Previously, integer programming was used to find the coefficients of such filters. However, the design method using integer programming suffers from high computational cost as the filter length increases, and the computation can quickly become prohibitive. In this paper, we propose two designs of predictive encoded FIR filters based on a modified Karmarkar's linear programming algorithm, which is known to be more suitable for solving large problems. First, we formulate the problem as a weighted minimax error problem and arrange it in a form to which the modified Karmarkar algorithm can be applied. The design algorithm has the same (low) complexity as the weighted least-squares method, but it can solve problems with some constraints, whereas the weighted least-squares method cannot. However, the algorithm runs into difficulty due to ill-conditioning caused by matrix inversion when the predictive filter order is high. To avoid this difficulty, we formulate the design as a weighted least absolute error problem. By using this second proposed algorithm, a filter with shorter coefficient wordlength can be found using a higher-order predictor filter at the expense of more computation. To further reduce the coefficient wordlength, the filter impulse response is separated into two sections having different ranges of coefficient values. Each section uses a different scaling factor to scale the coefficient values. With small coefficient wordlength, the filter can be realized without hardware multipliers using a low-radix signed-digit number representation. Each coefficient is distributed in space as 2-3 ternary {0, ±1} or quinary {0, ±1, ±2} coefficients. Ternary coefficients require only add/subtract operations, while quinary coefficients require one-bit shift and add/subtract operations. The shift can be hardwired without any additional hardware.
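The multiplier-free property is easy to see in software. The sketch below, a minimal illustration with hypothetical function names, multiplies an integer sample by a coefficient from {0, ±1, ±2} using only a one-bit shift and a sign change, and accumulates one FIR output with additions and subtractions. It does not reproduce the paper's predictive encoding or coefficient design.

```python
def times_small_coeff(x, c):
    """Multiply integer sample x by c in {0, +-1, +-2} without a multiplier."""
    if c not in (-2, -1, 0, 1, 2):
        raise ValueError("coefficient must be in {0, +-1, +-2}")
    if c == 0:
        return 0
    shifted = x << 1 if abs(c) == 2 else x  # one-bit shift doubles the sample
    return shifted if c > 0 else -shifted   # negative coefficient: subtract


def fir_output(samples, coeffs):
    """One filter output: sum of coeffs[k] * samples[k] via shift and add."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += times_small_coeff(x, c)
    return acc


if __name__ == "__main__":
    print(fir_output([3, -1, 4, 2], [1, 0, -2, 2]))
```

Restricting the coefficients to {0, ±1} removes even the shift, which corresponds to the ternary case in the abstract.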
In this paper, we treat the visual secret sharing scheme (VSSS) for color images. We first evaluate the brightness of the decrypted color image under certain conditions on the mixture of colors, and we obtain a general formula for the construction of a VSSS using a mixture of colors. Second, we propose an iterative algorithm for constructing a VSSS in a practical situation. If we use the iterative construction, we have only to solve partial differential equations with small n even if n is actually large, where n denotes the number of participants. This iterative construction has not previously been discussed, either for black-and-white or for color original images. Finally, we propose a way to embed a color image in each share for the case where the original image is a color image.
Makoto NAKASHIZUKA Hidetoshi OKAZAKI Hisakazu KIKUCHI
In this paper, a new image synthesis model based on a set of wavelet bases is proposed. In the proposed model, images are approximated by the sum of synthesis functions that are translated to image edge positions. By applying the proposed model to sketch-based image coding, no iterative image recovery procedure is required for image decoding. In the design of the synthesis functions, we define the synthesis functions as a linear combination of wavelet bases. The coefficients for wavelet bases are obtained from an iterative procedure. The vector quantization is applied to the vectors of the coefficients to limit the number of the synthesis functions. We apply the proposed synthesis model to the sketch-based image coding. Image coding experiments by eight synthesis functions and a comparison with the orthogonal transform methods are also given.
A novel scheduling method for asynchronous multirate/multi-task processing by programmable digital signal processors (DSPs) has been developed. This mixed scheduling method combines static and dynamic scheduling, and avoids the runtime overhead of interrupt-driven context switching in order to realize asynchronous multirate systems. The processing delay introduced when using static scheduling with static buffering is avoided by introducing deadline scheduling in the static schedule design. In the developed software design system, a block-diagram description language is extended to describe asynchronous multi-task processing. The scheduling method enables asynchronous multirate processing, such as arbitrary-sampling-ratio rate conversion, asynchronous interfaces, and multimedia applications, to be efficiently realized by programmable DSPs.