Masayuki ODAGAWA Tetsushi KOIDE Toru TAMAKI Shigeto YOSHIDA Hiroshi MIENO Shinji TANAKA
This paper examines the feasibility of automatic unclear-region detection in a CAD system for colorectal tumors using real-time endoscopic video images. We confirmed that it is possible to realize a CAD system with a clear-region navigation function, consisting of unclear-region detection by YOLO2 and classification by AlexNet and SVMs, on customizable embedded DSP cores. Moreover, we confirmed that the real-time CAD system can be constructed as a low-power ASIC using customizable embedded DSP cores.
Chung-Chien HSU Kah-Meng CHEONG Tai-Shih CHI Yu TSAO
This paper proposes a voice activity detection (VAD) algorithm based on an energy related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, which was developed to extract texture features of the audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice active segments of the speech signal by comparing the energy related feature of the frequency modulation of harmonics with a threshold. Then, the proposed VAD is implemented on one of Texas Instruments (TI) digital signal processor (DSP) platforms for real-time operation. Simulations conducted on the DSP platform demonstrate the proposed VAD performs significantly better than three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, in non-stationary noise in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.
Masafumi KUMAMOTO Masahiro KIDA Ryotaro HIRAYAMA Yoshinobu KAJIKAWA Toru TANI Yoshimasa KURUMI
We propose an active noise control (ANC) system for reducing periodic noise generated in a high magnetic field such as noise generated from magnetic resonance imaging (MRI) devices (MR noise). The proposed ANC system utilizes optical microphones and piezoelectric loudspeakers, because specific acoustic equipment is required to overcome the high-field problem, and consists of a head-mounted structure to control noise near the user's ears and to compensate for the low output of the piezoelectric loudspeaker. Moreover, internal model control (IMC)-based feedback ANC is employed because the MR noise includes some periodic components and is predictable. Our experimental results demonstrate that the proposed ANC system (head-mounted structure) can significantly reduce MR noise by approximately 30 dB in a high field in an actual MRI room even if the imaging mode changes frequently.
Young-Geun LEE Han-Sam JUNG Ki-Seok CHUNG
Many DSP applications such as FIR filtering and DCT (discrete cosine transformation) require multiplication with constants. Therefore, optimizing the performance of constant multiplication improves the overall performance of these applications. It is well known that shifting can replace a constant multiplication if the constant is a power of two. In this paper, we extend this idea in such a way that by employing two or more barrel shifters, we can design highly efficient constant multipliers. We have found that by using two or three shifters, we can generate a large set of constants. Using these constants, we can execute a typical set of FIR or DCT applications with few errors. Furthermore, with variable precision support, we can carry out a fairly large class of DSP applications with high computational efficiency. Compared to conventional multipliers, we can achieve power savings of up to 56% with negligible computational errors.
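The shift-based idea in the abstract above can be illustrated with a minimal sketch. It assumes each shifter contributes one left-shifted copy of the input and that the copies are summed; the abstract does not say how the shifter outputs are combined, so this is an illustration rather than the authors' design. The function names and the `max_shift` bound are hypothetical.

```python
from itertools import combinations


def approximate_constant(constant, num_shifters, max_shift=8):
    """Pick `num_shifters` distinct shift amounts whose powers of two
    sum as closely as possible to `constant` (exhaustive search)."""
    best_shifts = None
    best_error = None
    for shifts in combinations(range(max_shift + 1), num_shifters):
        value = sum(1 << k for k in shifts)
        error = abs(value - constant)
        if best_error is None or error < best_error:
            best_shifts, best_error = shifts, error
    return best_shifts


def multiply_by_shifts(x, shifts):
    """Multiply integer x by sum(2**k for k in shifts) using shifts and adds only."""
    return sum(x << k for k in shifts)


# 10 = 2 + 8, so two shifters reproduce it exactly.
shifts_for_10 = approximate_constant(10, 2)
product = multiply_by_shifts(7, shifts_for_10)  # same as 7 * 10
```

A constant such as 11 cannot be written as a sum of two powers of two, so two shifters give only an approximation while three give the exact value. This is the kind of precision trade-off the abstract refers to.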
A very long instruction word (VLIW) digital signal processor (DSP), called ODiN, which can execute six instructions simultaneously in a single cycle, is designed and fabricated using a 0.25 µm 1-poly 5-metal standard cell static CMOS process. The ODiN core delivers a maximum of 600 MIPS with a 100 MHz system clock. In order to achieve high performance operation, the designed core includes compact register files, an orthogonal instruction set, single cycle operations for most instructions, and parallel processing based on software scheduling. In addition, an embedded Viterbi decoder processor and an embedded FFT processor make it possible to implement software defined radio (SDR) applications efficiently.
This paper presents a new DSP-oriented code optimization method to enhance performance by exploiting the specific architectural features of digital signal processors. In the proposed method, a source code is translated into the static single assignment form while preserving the high-level information related to loops and the address computation of array accesses. The information is used in generating hardware loop instructions and parallel instructions provided by most digital signal processors. In addition to the conventional control-data flow graph, a new graph is employed to make it easy to find auto-modification addressing modes efficiently. Experimental results on benchmark programs show that the proposed method is effective in improving performance.
A novel scheduling method for asynchronous multirate/multi-task processing by programmable digital signal processors (DSPs) has been developed. This mixed scheduling method combines static and dynamic scheduling, and avoids runtime overheads due to interrupts in context switching to realize asynchronous multirate systems. The processing delay introduced when using static scheduling with static buffering is avoided by introducing deadline scheduling in the static schedule design. In the developed software design system, a block-diagram description language is extended to describe asynchronous multi-task processing. The scheduling method enables asynchronous multirate processing, such as arbitrary-sampling-ratio rate conversion, asynchronous interface, and multimedia applications, to be efficiently realized by programmable DSPs.
Nozomu TOGAWA Takashi SAKURAI Masao YANAGISAWA Tatsuo OHTSUKI
This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint on execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of the register files. Moreover, the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing a performance penalty. A generated processor core will have a small area compared with processor cores which have a single register file or those which consider only one type of functional unit for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.
Nozomu TOGAWA Yoshiharu KATAOKA Yuichiro MIYAOKA Masao YANAGISAWA Tatsuo OHTSUKI
Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for a processor kernel can be mainly obtained by minimum area for a processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized area and delay.
Naoki MIZUTANI Shogo MURAMATSU Hisakazu KIKUCHI
A unified polyphase representation of analysis and synthesis filter banks is introduced in this paper, and then its efficient implementation on digital signal processors (DSPs) is investigated. In particular, the number of memory accesses, power consumption, processing accuracy and the required instruction cycles are discussed. First, a unified representation is given, and then two types of procedures, SIMO system-based and MISO system-based procedures, are shown, where SIMO and MISO are abbreviations for single-input/multiple-output and multiple-input/single-output, respectively. These procedures are compared to each other. It is shown that the number of data loads in the SIMO system-based procedure is half of that in the MISO system-based procedure for two-channel filter banks. The implementation of M-channel filter banks is also discussed.
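As general background on polyphase structures, the following minimal sketch shows the textbook two-phase decomposition of a decimating FIR filter. It is not the SIMO or MISO procedure of the paper, whose details the abstract does not give. The function names are hypothetical, and the input is assumed to be zero before the first sample.

```python
def fir(h, x):
    """Causal FIR filter; output has the same length as x."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def decimate_direct(h, x):
    """Filter at the full rate, then keep every second output sample."""
    return fir(h, x)[::2]


def decimate_polyphase(h, x):
    """Same result using two half-rate filters (polyphase components)."""
    e0, e1 = h[0::2], h[1::2]      # even- and odd-indexed taps
    x0 = x[0::2]                   # x[2n]
    x1 = ([0] + x[1::2])[:len(x0)]  # x[2n-1], with x[-1] taken as 0
    y0 = fir(e0, x0)
    y1 = fir(e1, x1)
    return [a + b for a, b in zip(y0, y1)]


h = [1, 2, 3, 4]
x = [1, 2, 3, 4, 5, 6, 7, 8]
same = decimate_direct(h, x) == decimate_polyphase(h, x)
```

The polyphase form never computes the output samples that decimation would discard, which is why such structures reduce the arithmetic and memory traffic on a DSP.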
Kiyoshi NISHIKAWA Hitoshi KIYA
This paper proposes a fast implementation technique for RLS adaptive filters. The technique has an adjustable parameter to trade off throughput against the rate of convergence of the filter according to the application. The conventional methods for improving the throughput do not have this kind of adjustability, so the proposed technique will expand the area of applications for the RLS algorithm. We show that the improvement of the throughput can be easily achieved by rearranging the formula of the RLS algorithm and that there is no need for faster PEs for the improvement.
Ulhaqsyed MOBIN Eiji HIRAKI Hiroshi TAKANO Mutsuo NAKAOKA
This paper describes an efficient simulation approach for a DSP-controlled series-parallel resonant high frequency DC-DC power converter system. The proposed power conversion circuit simulation approach is based on a circuit equation, modeled by substituting a time-varying switched resistor circuit for all the controllable and uncontrollable power semiconductor switching blocks of the power converter circuits. An algebraic algorithm transforms the matrices of the circuit equation into the matrices of the state vector equation. The state equation is solved by a 3rd-order Runge-Kutta numerical integration method. Simulation results are illustrated and discussed together with experimental results.
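The integration step mentioned above can be sketched as follows for a state equation of the form dx/dt = A x + B u. This is a minimal illustration that assumes Kutta's classical third-order scheme and an input u held constant over each step; the abstract does not say which third-order variant or input model the authors use, and all names here are hypothetical.

```python
import math


def state_derivative(A, B, x, u):
    """Evaluate dx/dt = A x + B u for list-based matrices and vectors."""
    return [
        sum(A[i][j] * x[j] for j in range(len(x)))
        + sum(B[i][k] * u[k] for k in range(len(u)))
        for i in range(len(x))
    ]


def rk3_step(A, B, x, u, h):
    """One step of Kutta's third-order method with step size h."""
    k1 = state_derivative(A, B, x, u)
    x_mid = [xi + 0.5 * h * a for xi, a in zip(x, k1)]
    k2 = state_derivative(A, B, x_mid, u)
    x_end = [xi - h * a + 2.0 * h * b for xi, a, b in zip(x, k1, k2)]
    k3 = state_derivative(A, B, x_end, u)
    return [
        xi + h / 6.0 * (a + 4.0 * b + c)
        for xi, a, b, c in zip(x, k1, k2, k3)
    ]


def simulate(A, B, x0, u, h, steps):
    """Integrate from x0 over `steps` steps with a constant input u."""
    x = list(x0)
    for _ in range(steps):
        x = rk3_step(A, B, x, u, h)
    return x


# First-order test system dx/dt = -x + u with a unit step input;
# the exact solution at t = 1 is 1 - exp(-1).
x_final = simulate([[-1.0]], [[1.0]], [0.0], [1.0], 0.01, 100)
exact = 1.0 - math.exp(-1.0)
```

In a switched-resistor converter model, the entries of A and B would change whenever a switch changes state, so the matrices would be re-evaluated between steps.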
Yasuo SUZUKI Kazuhiro UEHARA Masashi NAKATSUGAWA Yushi SHIRATO Shuji KUBOTA
Software radio base and personal station prototypes are proposed and implemented. The prototypes are composed of RF/IF, A/D and D/A, pre- and post-processors, CPU, and DSP parts. System software is partitioned into CPU program and DSP program to use processor resources effectively. They support various air interfaces, some of which are equivalent to the 384 kbit/s transmission rate PHS (personal handy phone system) and a 96 kbit/s transmission rate system. The base station can also be used as a communication bridge between two systems. In order to ease IF filter requirements, the zero-stuff method is employed. Basic transmission and receiving performances are evaluated in an experiment and their results agree well with those expected.
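Zero-stuffing is commonly understood as inserting zeros between consecutive samples to raise the sample rate. A minimal sketch under that assumption follows; the abstract does not give the stuffing factor or where in the signal chain the method is applied, so `factor` and the function name are hypothetical.

```python
def zero_stuff(samples, factor):
    """Insert (factor - 1) zeros after every sample, raising the rate by `factor`."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    stuffed = []
    for sample in samples:
        stuffed.append(sample)
        stuffed.extend([0] * (factor - 1))
    return stuffed


upsampled = zero_stuff([1, 2, 3], 3)
```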
Yoshimasa NEGISHI Eiji WATANABE Akinori NISHIHARA Takeshi YANAGISAWA
Digital Signal Processors with complex arithmetic capability (DSP-C) are useful for various applications. In this paper, we propose a method for the effective implementation of specific circuits with real coefficients on DSP-C. DSP-C has special hardware such as a complex multiplier so that a complex calculation can be performed with only one instruction. First, we show that nodes with two real coefficient input branches can be implemented by complex multiplications. We apply this implementation to 2D circuits and transversal circuits with real coefficients. Next, we introduce a new computational mode (Advanced mode) and a new multiplier into PSI, a kind of DSP-C which has been proposed already, in order to process the circuits effectively. The effectiveness of the proposed method is shown by simulation in the last part.
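One way a node with two real-coefficient input branches can be mapped onto a single complex multiplication is the identity a*x1 + b*x2 = Re[(a - jb)(x1 + j*x2)]. The sketch below illustrates that identity only; the abstract does not state the exact mapping used on DSP-C, and the function name is hypothetical.

```python
def two_branch_node(a, b, x1, x2):
    """Compute a*x1 + b*x2 as the real part of one complex product."""
    coefficient = complex(a, -b)  # packs both real coefficients
    signal = complex(x1, x2)      # packs both real inputs
    return (coefficient * signal).real


node_output = two_branch_node(2.0, 3.0, 5.0, 7.0)  # 2*5 + 3*7
```

The imaginary part of the product, a*x2 - b*x1, is discarded here; whether it can be put to use depends on the circuit being implemented.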
Nobuhiko SUGINO Hironobu MIYAZAKI Akinori NISHIHARA
Many digital signal processors (DSPs) employ indirect addressing using address registers (ARs) to indicate their memory addresses, which often leads to overhead. This paper presents methods to efficiently allocate addresses for variables in a given program so that overhead in AR update operations is reduced. The memory addressing model is generalized in such a way that an AR can be updated at instructions without memory accesses. An efficient memory address allocation is obtained by a method based on the graph linearization algorithm, which takes account of the number of possible AR update operations for every memory access. In order to utilize multiple ARs, methods to assign variables to ARs are also investigated. The proposed methods are applied to the compiler for the µPD77230 (NEC), and the code generated for several examples demonstrates the effectiveness of these methods.
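The overhead being minimized can be illustrated with a simple cost model. This is a minimal sketch assuming a single AR whose auto-increment and auto-decrement cover a distance of one address for free, so any larger jump needs an explicit update instruction. The exhaustive search is a stand-in for illustration and is not the graph linearization method of the paper; all names are hypothetical.

```python
from itertools import permutations


def count_ar_updates(access_sequence, layout, free_range=1):
    """Count consecutive accesses whose address distance exceeds `free_range`."""
    address = {variable: index for index, variable in enumerate(layout)}
    updates = 0
    for previous, current in zip(access_sequence, access_sequence[1:]):
        if abs(address[current] - address[previous]) > free_range:
            updates += 1
    return updates


def best_layout(access_sequence):
    """Try every variable ordering; feasible only for a handful of variables."""
    variables = sorted(set(access_sequence))
    return min(
        permutations(variables),
        key=lambda layout: count_ar_updates(access_sequence, layout),
    )


sequence = ["a", "b", "a", "c", "a", "b", "a", "c"]
naive_cost = count_ar_updates(sequence, ["a", "b", "c"])
tuned_cost = count_ar_updates(sequence, best_layout(sequence))
```

Placing the frequently revisited variable `a` between `b` and `c` makes every transition reachable by a free increment or decrement, which removes all explicit updates in this example.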
Eiichi TERAOKA Toru KENGAKU Ikuo YASUI Kazuyuki ISHIKAWA Takahiro MATSUO Hideyuki WAKADA Narumi SAKASHITA Yukihiko SHIMAZU Takeshi TOKUDA
Built-in self-test (BIST) has been applied to test an analog to digital converter (ADC) and a digital to analog converter (DAC) embedded in a DSP-core ASIC. The eight performance characteristics of the ADC and the DAC designed in accordance with the ITU-T recommendations are measured using the BIST. Three of the eight characteristics - the attenuation/frequency distortion, the variation of gain with input level, and the signal-to-total distortion - have been evaluated and the measured results have shown good agreement with results measured by conventional tests. In the BIST operation, the DSP-core generates the input stimulus and analyzes the output response under control of the self-test program. The sizes of the self-test program and coefficient data are 822 words of the IROM and 384 words of the data ROM, respectively. This area overhead is less than 0.5% of total chip area. Test-time by the BIST is reduced to approximately 3.2 seconds, which is one-tenth that of conventional testing. The mixed-signal DSP-core ASIC is testable with only logic test equipment, and as a result, test-cost - that is test investment and test-time - is reduced compared with conventional test methods.
Taketora SHIRAISI Koji KAWAMOTO Kazuyuki ISHIKAWA Eiichi TERAOKA Hidehiro TAKATA Takeshi TOKUDA Kouichi NISHIDA
A low power consumption 16-bit fixed point Digital Signal Processor (DSP) has been developed to realize a half-rate CODEC for the Personal Digital Cellular (PDC) system. Dual datapath architecture has been employed to execute multiply-accumulate (MAC) operations with a high degree of efficiency. With this architecture, 86.3% of total MAC operations in the Pitch Synchronous Innovation Code Excited Linear Prediction (PSI-CELP) program are executed in parallel, so that total instruction cycles are reduced by 23.1%. The area overhead for the dual datapath architecture is only 3.0% of the total area. Furthermore, in order to reduce power consumption, circuit design techniques are also extensively applied to RAMs, ROMs, and clock circuits, which consume the great majority of power. By reducing the number of precharging bit lines, a power reduction of 49.8% is achieved in RAMs, and above 40% in ROMs. By applying gated clock to clock lines, a power reduction of 5.0% is achieved in the DSP that performs the PSI-CELP algorithm. The DSP is fabricated in 0.5 µm single-poly, double-metal CMOS technology. The PSI-CELP algorithm for the PDC half-rate CODEC can operate at 22.5 MHz instruction frequency and 1.6 V supply voltage, resulting in a low power consumption of 28 mW.
Katsuhiko UEDA Toshio SUGIMURA Toshihiro ISHIKAWA Minoru OKAMOTO Mikio SAKAKIHARA Shinichi MARUI
This paper describes a new, low power 16-bit Digital Signal Processor (DSP). The DSP has a double-speed MAC mechanism, an accelerator for Viterbi decoding, and a block floating section which contribute to lower power consumption. The double-speed MAC can perform two multiply and accumulate operations in one instruction cycle. Since MAC operations are so common in digital signal processing, this mechanism can reduce the average clock frequency of the DSP, resulting in lower power consumption. The Viterbi accelerator and block floating circuitry also reduce the clock frequency by minimizing the number of cycles that need to be executed. The DSP was fabricated using a 0.8 µm CMOS 2-aluminum layer process technology to integrate 644 K transistors on a 9.30 mm × 9.09 mm die. It can realize an 11.2 kbps VSELP speech CODEC while consuming only 70 mW at 3.5 V Vdd.
Norichika KUMAMOTO Keiji AOKI Hiroaki KUNIEDA
This paper proposes VIRGO, a hierarchical Digital Signal Processor (DSP) code generator for large scale general signal processing algorithms. A hierarchically structured Vectorized Signal Flow Graph (V-SFG) description is used as the input specification. The DSP-independent optimization procedure for both the program size and the execution time is performed hierarchically, module by module, with regard to operation order, memory assignment and register allocation. The efficiency of the code generation is demonstrated by comparing both instruction steps and dynamic steps of a practical ADPCM encoder/decoder with a conventional method.