Keyword Search Result

[Keyword] dsp(90hit)

41-60hit(90hit)

  • A 1.5 V, 200 MHz, 400 MIPS, 188 µA/MHz and 1.2 V, 300 MHz, 600 MIPS, 169 µA/MHz Digital Signal Processor Core for 3G Wireless Applications

    Hiroshi TAKAHASHI  Shigeshi ABIKO  Kenichi TASHIRO  Kaoru AWAKA  Yutaka TOYONOH  Rimon IKENO  Shigetoshi MURAMATSU  Yasumasa IKEZAKI  Tsuyoshi TANAKA  Akihiro TAKEGAMA  Hiroshi KIMIZUKA  Hidehiko NITTA  Miki KOJIMA  Masaharu SUZUKI  James Lowell LARIMER  

     
    PAPER

    Vol: E87-C  No: 4  Page(s): 491-501

    A new high-speed and low-power digital signal processor (DSP) core, C55x, was developed for next-generation applications such as 3G cellular phones, PDAs, digital still cameras (DSC), audio, video, embedded modems, and DVD. To support such MIPS-rich applications, the packet size of an instruction fetch was increased from 16 bits to 32 bits compared with the world's most popular C54x DSP core, while maintaining complete software compatibility with legacy DSP code. An on-chip instruction buffer queue (IBQ) automatically unpacks the packets and issues multiple instructions in parallel for efficient use of circuit resources. The efficiency of the parallelism has been further improved by additional hardware such as a second 17×17-bit MAC, a 16-bit ALU, and three temporary registers that can be used for simple computations. Four 40-bit accumulators make it possible to execute more operations per cycle with dramatically reduced overall power consumption. This new architecture achieves twice the instructions per cycle (IPC) of the previous DSP core on typical applications at the same clock frequency. The new DSP core was designed for TI's two 130 nm technologies, one with high VT for low-leakage, middle-performance operation at 1.5 V, and the other with low VT for high-performance, low-VDD operation at 1.2 V, to provide the best choice for any application from a single layout database. With the low-leakage process, the DSP core operates at over 200 MHz with 188 µA/MHz (at 75% Dual MAC + 25% ADD) active power and less than 1.63 µA standby current. With the high-performance process, it operates at 300 MHz with 169 µA/MHz active power and less than 680 µA standby current. The new core was designed by a semi-custom approach (ASIC + custom library) using a 5-level Cu metal system with a low-k dielectric material, fluorosilicate glass (FSG), and contains about one million transistors. The overall balance of its power, performance, area, and leakage current (PPAL) is well suited to most next-generation applications. In this paper, we discuss features of the new DSP core, including circuit design techniques for high speed and low power, and present an example product.
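
As a rough reading of the figures quoted in this abstract, a µA/MHz rating can be converted into supply current and active power at the stated clock and voltage. The sketch below is illustrative arithmetic only: the function name is hypothetical, and it assumes that active power is simply supply voltage times active current at the nominal clock, which the abstract does not state.

```python
def active_power_mw(ua_per_mhz: float, mhz: float, volts: float) -> float:
    """Estimate active power in mW from a uA/MHz rating (assumes P = V * I)."""
    current_ma = ua_per_mhz * mhz / 1000.0  # uA/MHz * MHz = uA, then uA -> mA
    return current_ma * volts               # mA * V = mW


# Operating points quoted in the abstract
low_leakage_mw = active_power_mw(188, 200, 1.5)       # about 56.4 mW
high_performance_mw = active_power_mw(169, 300, 1.2)  # about 60.8 mW

# Both operating points correspond to 2 instructions per clock (MIPS / MHz)
instructions_per_clock = (400 / 200, 600 / 300)
```

Under these assumptions the two processes land at similar active power, with the 1.2 V process delivering 50% more MIPS.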

  • Research of a Smart Antenna System Using a Novel Beamforming Algorithm in the IS2000 1X Channel

    Sungsoo AHN  Minsoo KIM  

     
    LETTER-Wireless Communication Technology

    Vol: E87-B  No: 4  Page(s): 1025-1029

    This paper presents a novel algorithm which generates a beam pattern having maximum gain toward the target direction. The new technique utilizes a Generalized Conjugate Gradient Method (CGM), based on the conventional CGM, to obtain the optimal weight vector. The proposed method finds a weight vector that maximizes the SINR (Signal to Interference plus Noise Ratio). Based on an analysis of the results of various computer simulations, it is observed that the proposed algorithm is suitable for IS2000 1X mobile communication environments.

  • A Hardware/Software Partitioning Algorithm for Processor Cores with Packed SIMD-Type Instructions

    Nozomu TOGAWA  Koichi TACHIKAKE  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER-Design Methodology

    Vol: E86-A  No: 12  Page(s): 3218-3224

    This letter proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given a compiled assembly code including SIMD instructions and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with a new assembly code. Firstly, we assume for each operation type a super SIMD functional unit which can execute all the SIMD instructions. Secondly, we remove a SIMD instruction or "sub-function" from each super functional unit, one by one, while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally find a SIMD functional unit configuration as well as a processor core architecture. Promising experimental results are also shown.

  • A Retargetable Simulator Generator for DSP Processor Cores with Packed SIMD-type Instructions

    Nozomu TOGAWA  Kyosuke KASAHARA  Yuichiro MIYAOKA  Jinku CHOI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Simulation Accelerator

    Vol: E86-A  No: 12  Page(s): 3099-3109

    A packed SIMD type operation, or SIMD operation, consists of n parallel b/n-bit sub-operations executed by a modified b-bit functional unit. Such a functional unit is called a SIMD functional unit, and a processor core which can execute SIMD operations is called a SIMD processor core. SIMD operations can be effectively applied to image processing applications. This paper focuses on hardware/software cosynthesis of SIMD processor cores and, in particular, proposes a new simulator generator which simulates pipelined instructions for a SIMD processor. Generally, a SIMD functional unit has many options, so a great many different SIMD functional unit instances are possible. However, since our hardware/software cosynthesis system synthesizes a special-purpose processor core for an input application program, it uses very limited SIMD functional unit instances. In the proposed approach, we consider a SIMD operation to be a set of SIMD sub-operations. By adding up the appropriate SIMD sub-operations, we construct a single SIMD operation. Then a SIMD functional unit behavior can be characterized by a collection of SIMD operations. This approach has the advantage that, if we have a small number of behavior libraries for SIMD sub-operations, we can instantiate a particular SIMD functional unit behavior. Experimental results demonstrate the effectiveness of the proposed approach.
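
The packed SIMD idea described in this abstract, one wide word of b bits in total treated as n independent lanes of b/n bits, can be illustrated with a packed addition. This is a minimal sketch and not the paper's simulator: the function name and the wrap-around (non-saturating) lane behavior are assumptions chosen for illustration.

```python
def packed_add(x: int, y: int, b: int = 32, n: int = 4) -> int:
    """Add two b-bit words as n independent (b/n)-bit lanes, wrapping per lane."""
    if b % n != 0:
        raise ValueError("b must be divisible by n")
    lane_bits = b // n
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(n):
        shift = lane * lane_bits
        a = (x >> shift) & lane_mask
        c = (y >> shift) & lane_mask
        # Mask the sum so a carry never leaks into the neighboring lane
        result |= ((a + c) & lane_mask) << shift
    return result


# Four 8-bit lanes: the 0xFF + 0x01 lane wraps to 0x00 without disturbing its neighbor
simd_sum = packed_add(0x01FF0203, 0x01010101, b=32, n=4)    # 0x02000304
# A single 32-bit lane is an ordinary addition, so the carry propagates
scalar_sum = packed_add(0x01FF0203, 0x01010101, b=32, n=1)  # 0x03000304
```

Comparing the two results shows the only difference between the SIMD and scalar behavior: whether carries cross lane boundaries.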

  • High-Speed and Low-Power Techniques of Hardware and Software for Digital Signal Processors

    Hiroshi TAKAHASHI  Rimon IKENO  Yutaka TOYONOH  Akihiro TAKEGAMA  Yasumasa IKEZAKI  Tohru URASAKI  Hitoshi SATOH  Masayasu ITOIGAWA  Yoshinari MATSUMOTO  

     
    PAPER-Circuit Design

    Vol: E86-C  No: 4  Page(s): 589-596

    High-speed and low-power DSPs have been developed for versatile handset applications. The DSP contains a 16-bit fixed-point DSP core with multiple buses, highly tuned instruction sets and a low-power architecture, featuring CPU power of 404.5 µW/MHz, chip power of 2.08 mW/MHz at peak, 200 µA standby current and 160 MHz/160 MIPS performance from a single DSP core. It also operates at 0.68 V within the temperature range from -40°C to 125°C in the worst case (weak corner), even though it uses a process with much higher I-off current than a conventional process to obtain a faster operating frequency. In this paper, we discuss circuit design techniques to continue scaling down valuable IP cores while keeping the same functionality, with better speed performance and lower power dissipation with much lower voltage operation capability. For further power reduction by DSP software, Run-time Power Control (RPC) has been demonstrated experimentally in an MP3 player, a real-time application running on an Internet audio evaluation module, using a 100 MHz/100 MIPS DSP at 1.8 V, and we obtained a 32-60% power reduction on various music source data.

  • CODEC Hardware Engines for a Low-Power Baseband DSP Macro

    Hirohisa GAMBE  Teruo ISHIHARA  Yasuji OTA  Norichika KUMAMOTO  Yoshio KUNIYASU  

     
    PAPER-Integrated Electronics

    Vol: E85-C  No: 12  Page(s): 2123-2135

    The progress made in large-scale integration of the baseband circuits of digital cellular phones now makes it possible to implement a voice CODEC and its related functions in the baseband LSI rather than through a general-purpose digital signal processor. This paper describes an improved hardware solution that enables efficient application of the PSI-CELP CODEC, the most complex CODEC for mobile systems, to the PDC half-rate system through its implementation as a DSP macro in a low-voltage, large-scale LSI. Specific circuit blocks are added as hardware engines to a general-purpose DSP-oriented core. These specific engines were implemented as peripheral circuits for a DSP macro that can be used as a single DSP with an added I/O circuit and is suitable for use in future highly integrated mobile baseband chips. With the assistance of these hardware engines and some additional ALU instructions to achieve efficient programming, the machine speed required for the CODEC can be relatively slow, thus allowing the same architecture to be repeatedly used without needing to set the transistor threshold voltage too low even when the use of deeper sub-micron technologies requires a chip to run at a lower supply voltage. We evaluated this DSP-macro architecture using a 0.35 µm CMOS technology test chip. Then we developed a commercial base version using 0.25 µm technology and verified that it can operate at 1.2 V and that the PSI-CELP CODEC can run at 40 MIPS with power consumption of 11 mW. We also verified that the circuit design can be applied up to 0.18 µm technology with a single threshold voltage of 0.3 V. Thus, the design of the DSP macro incorporating the hardware engines provides a great deal of flexibility that should allow its use in chips based on future technologies, and the voice CODEC firmware can be effectively re-used. Although the DSP macro architecture was designed mainly through PSI-CELP application analysis, it can process other voice CODECs such as the AMR CODEC for third-generation mobile applications as well as some other mobile baseband functions such as channel CODECs. This approach can also be refined to permit its application to, for example, high-quality audio CODECs.

  • A New Small-Size Multi-Mode and Multi-Task Software Radio Prototype for Future Intelligent Transport Systems

    Hiroshi HARADA  Masayuki FUJISE  

     
    PAPER

    Vol: E85-B  No: 12  Page(s): 2703-2715

    In this paper, we describe a newly developed small-size software radio terminal that can realize global positioning service (GPS) navigation system, vehicle information and communication system (VICS), electronic toll collection system (ETC), AM/FM radio broadcasting services on middle wave (MW) and very high frequency (VHF) bands, FM multiplex broadcasting system, and several modulation schemes such as BPSK, ASK, QPSK, GMSK, and π/4QPSK by downloading the software that realizes each system from wired and wireless networks. Using our proposed multitask algorithm, the developed terminal provides multiple services simultaneously when users want to use several radio communication services while driving. The developed terminal is 17.5 cm wide, 19.0 cm deep, and 5 cm high, and operates at 12.0 V DC and around 2 A. Its size and power consumption are small enough to be acceptable for consumers such as car drivers. In this paper, we introduce the configuration and proposed key technologies of the developed terminal and measure the software configuration time.

  • Loop and Address Code Optimization for Digital Signal Processors

    Jong-Yeol LEE  In-Cheol PARK  

     
    LETTER-Digital Signal Processing

    Vol: E85-A  No: 6  Page(s): 1408-1415

    This paper presents a new DSP-oriented code optimization method to enhance performance by exploiting the specific architectural features of digital signal processors. In the proposed method, a source code is translated into the static single assignment form while preserving the high-level information related to loops and the address computation of array accesses. The information is used in generating hardware loop instructions and parallel instructions provided by most digital signal processors. In addition to the conventional control-data flow graph, a new graph is employed to make it easy to find auto-modification addressing modes efficiently. Experimental results on benchmark programs show that the proposed method is effective in improving performance.

  • A Low-Power Embedded RISC Microprocessor with an Integrated DSP for Mobile Applications

    Tetsuya YAMADA  Makoto ISHIKAWA  Yuji OGATA  Takanobu TSUNODA  Takahiro IRITA  Saneaki TAMAKI  Kunihiko NISHIYAMA  Tatsuya KAMEI  Ken TATEZAWA  Fumio ARAKAWA  Takuichiro NAKAZAWA  Toshihiro HATTORI  Kunio UCHIYAMA  

     
    INVITED PAPER

    Vol: E85-C  No: 2  Page(s): 253-262

    A 32-bit embedded RISC microprocessor core integrating a DSP has been developed using a 0.18-µm five-layer-metal CMOS technology. The integrated DSP has a single MAC and exploits CPU resources to reduce hardware. The DSP occupies only 0.5 mm². The processor core includes a large on-chip 128 kB SRAM called U-memory. A large-capacity on-chip memory decreases the amount of traffic with external memory, which is effective for low-power and high-performance operation. To realize low power dissipation for U-memory accesses, the active ratio of U-memory accesses is reduced. The critical path is a load path from the U-memory, and we optimized the path through the whole chip. The chip achieves 0.79 mA/MHz executing Dhrystone 1.1 at 108 MHz, which is suitable for mobile applications.

  • A Single Cycle 16-Bit Microcontroller and DSP Core for Systems on Chips Solutions

    Klaus D. MAIER  

     
    PAPER-Product Designs

    Vol: E85-C  No: 2  Page(s): 339-346

    The C166S V2 is Infineon Technologies' latest-generation 16-bit microcontroller core and a member of the C166 family. This new core architecture is a huge step forward in performance and DSP capabilities: with its single-cycle engine and enhanced MAC unit running at up to 200 MHz, it more than doubles the performance of the fastest C166-based controllers (C166S V1) running at the same speed. Furthermore, the instruction set is fully compatible with the previous C166 cores. This architecture is specifically suited for real-time embedded systems with high requirements for performance and signal processing functionality and with tight cost and power budgets. As a fully synthesizable core, and with a large selection of peripherals available, the C166S V2 provides a straightforward path to the required specific systems-on-chip.

  • Single Chip Programmable Baseband ASSP for 5 GHz Wireless LAN Applications

    Johannes KNEIP  Matthias WEISS  Wolfram DRESCHER  Volker AUE  Jurgen STROBEL  Thomas OBERTHUR  Michael BOLLE  Gerhard FETTWEIS  

     
    PAPER-Product Designs

    Vol: E85-C  No: 2  Page(s): 359-367

    This paper presents the HiperSonic 1, a multi-standard, application-specific signal processor, designed to execute the baseband conversion algorithms in IEEE802.11a- and HIPERLAN/2-based 5 GHz wireless LAN applications. In contrast to widely existing, dedicated implementations, most of the computational effort here was mapped onto a configurable, data- and instruction-parallel DSP core. The core is supplemented by mixed signal A/D, D/A converters and hardware accelerators. Memory and register architecture, instruction set and peripheral interfaces of the chip were carefully optimized for the targeted applications, leading to a sound combination of flexibility, die area and power consumption. The 120 MHz, 7.6 million-transistor solution was implemented in 0.18 µm CMOS and performs IEEE802.11a or HiperLAN/2 compliant baseband processing at data rates up to 60 Mbit/s.

  • Asynchronous Multirate Real-Time Scheduling for Programmable DSPs

    Ichiro KURODA  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E85-A  No: 1  Page(s): 241-247

    A novel scheduling method for asynchronous multirate/multi-task processing by programmable digital signal processors (DSPs) has been developed. This mixed scheduling method combines static and dynamic scheduling, and avoids runtime overheads due to interrupts in context switching to realize asynchronous multirate systems. The processing delay introduced when using static scheduling with static buffering is avoided by introducing deadline scheduling in the static schedule design. In the developed software design system, a block-diagram description language is extended to describe asynchronous multi-task processing. The scheduling method enables asynchronous multirate processing, such as arbitrary-sampling-ratio rate conversion, asynchronous interface, and multimedia applications, to be efficiently realized by programmable DSPs.

  • A New Hardware/Software Partitioning Algorithm for DSP Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Takashi SAKURAI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER-Hardware/Software Codesign

    Vol: E84-A  No: 11  Page(s): 2802-2807

    This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint on execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of the register files. Moreover, the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing a performance penalty. A generated processor core will have a small area compared with processor cores which have a single register file or those which consider only one type of functional unit for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • Area and Delay Estimation in Hardware/Software Cosynthesis for Digital Signal Processor Cores

    Nozomu TOGAWA  Yoshiharu KATAOKA  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Hardware/Software Codesign

    Vol: E84-A  No: 11  Page(s): 2639-2647

    Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for a processor kernel can be mainly obtained by minimum area for a processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized area and delay.
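
The structure of the estimates described in this abstract (core area as kernel area plus added hardware units, and critical path delay taken from a hardware unit) can be sketched as follows. This is only an illustration of that structure: every name and numeric value below is a hypothetical placeholder, the lookup tables stand in for the paper's equations, which the abstract does not give, and taking the slowest unit as the one on the critical path is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareUnit:
    kind: str       # e.g. "alu" or "multiplier"
    bit_width: int  # operation bit width


# Hypothetical characterization data keyed by (type, bit width); not from the paper
UNIT_AREA = {("alu", 16): 2000.0, ("multiplier", 16): 15000.0}
UNIT_DELAY_NS = {("alu", 16): 3.0, ("multiplier", 16): 7.5}


def kernel_area(min_area, n_units, n_regs, unit_overhead, reg_overhead):
    """Kernel area = minimum kernel area + overheads for added units and registers."""
    return min_area + n_units * unit_overhead + n_regs * reg_overhead


def core_area(units, n_regs, min_area=50000.0, unit_overhead=500.0, reg_overhead=300.0):
    """Total area = kernel area + sum of the areas of the additional hardware units."""
    kernel = kernel_area(min_area, len(units), n_regs, unit_overhead, reg_overhead)
    return kernel + sum(UNIT_AREA[(u.kind, u.bit_width)] for u in units)


def critical_path_delay(units):
    """Assume the slowest hardware unit is the one on the critical path."""
    return max(UNIT_DELAY_NS[(u.kind, u.bit_width)] for u in units)


units = [HardwareUnit("alu", 16), HardwareUnit("multiplier", 16)]
estimated_area = core_area(units, n_regs=8)
estimated_delay_ns = critical_path_delay(units)
```

A partitioning loop could call estimators of this shape for each candidate configuration instead of running logic synthesis every time.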

  • Memory Access Estimation of Filter Bank Implementation on Different DSP Architectures

    Naoki MIZUTANI  Shogo MURAMATSU  Hisakazu KIKUCHI  

     
    PAPER-Implementations of Signal Processing Systems

    Vol: E84-A  No: 8  Page(s): 1951-1959

    A unified polyphase representation of analysis and synthesis filter banks is introduced in this paper, and then its efficient implementation on digital signal processors (DSPs) is investigated. In particular, the number of memory accesses, power consumption, processing accuracy and the required instruction cycles are discussed. Firstly, a unified representation is given, and then two types of procedures, SIMO system-based and MISO system-based procedures, are shown, where SIMO and MISO are abbreviations for single-input/multiple-output and multiple-input/single-output, respectively. These procedures are compared with each other. It is shown that the number of data loads in the SIMO system-based procedure is half of that in the MISO system-based procedure for two-channel filter banks. The implementation of M-channel filter banks is also discussed.

  • Code Optimization Technique for Indirect Addressing DSPs with Consideration in Local Computational Order and Memory Allocation

    Nobuhiko SUGINO  Akinori NISHIHARA  

     
    PAPER-Implementations of Signal Processing Systems

    Vol: E84-A  No: 8  Page(s): 1960-1968

    Digital signal processors (DSPs) usually employ indirect addressing using address registers (ARs) to indicate their memory addresses, which often introduces overhead codes in AR updates for next memory accesses. Reduction of such overhead code is one of the important issues in automatic generation of highly efficient DSP codes. In this paper, a new automatic address allocation method incorporated with computational order rearrangement at local commutative parts is proposed. The method formulates a given memory access sequence by a graph representation, where several strategies to handle freedom in memory access orders at the computational commutative parts are introduced and examined. A compiler scheme is also extended such that computational order at the commutative parts is rearranged according to the derived memory allocation. The proposed methods are applied to an existing DSP compiler for the µPD77230 (NEC), and codes generated for several examples are compared with memory allocations by the conventional methods.
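
The AR update overhead mentioned in this abstract can be made concrete with a simple cost model. The sketch below is not the paper's graph-based method: it assumes a single address register whose post-increment or post-decrement by 1 is free, and counts every other move between consecutive accesses as one overhead instruction. The function name, variable names, and access sequence are hypothetical.

```python
def ar_update_overhead(access_seq, layout, modify_range=1):
    """Count explicit AR updates needed for access_seq under a memory layout.

    A move between consecutive accesses is free when the address distance is
    within modify_range (auto-increment/decrement); otherwise it costs one
    overhead instruction. Single-AR model, for illustration only.
    """
    address = {var: i for i, var in enumerate(layout)}
    return sum(
        1
        for prev, cur in zip(access_seq, access_seq[1:])
        if abs(address[cur] - address[prev]) > modify_range
    )


accesses = ["a", "b", "c", "a", "d", "a", "c"]
naive_cost = ar_update_overhead(accesses, ["a", "b", "c", "d"])  # 4 explicit updates
tuned_cost = ar_update_overhead(accesses, ["b", "c", "a", "d"])  # 1 explicit update
```

Placing the frequently revisited variable next to its neighbors in the access sequence removes most of the explicit updates, which is the kind of saving a memory allocation method aims for.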

  • 3D Acoustic Image Localization Algorithm by Embedded DSP

    Wataru KOBAYASHI  Noriaki SAKAMOTO  Takao ONOYE  Isao SHIRAKAWA  

     
    PAPER

    Vol: E84-A  No: 6  Page(s): 1423-1430

    This paper describes a realtime 3D sound localization algorithm to be implemented with the use of a low power embedded DSP. A distinctive feature of this implementation approach is that the audible frequency band is divided into three, in accordance with the analysis of the sound reflection and diffraction effects through different media from a certain sound source to human ears. In the low, intermediate, and high frequency subbands, different schemes of the 3D sound localization are devised by means of an IIR filter, parametric equalizers, and a comb filter, respectively, so as to be run realtime on a low power embedded DSP. This algorithm aims at providing a listener with the 3D sound effects through headphones at low cost and low power consumption.

  • Register Constraint Analysis to Minimize Spill Code for Application Specific DSPs

    Tatsuo WATANABE  Nagisa ISHIURA  

     
    LETTER

    Vol: E84-A  No: 6  Page(s): 1541-1544

    This letter presents a method which attempts to minimize the number of spill codes needed to resolve usage conflicts of distributed registers in application-specific DSPs. It searches for a set of ordering restrictions among operations which sequentialize the lifetimes of the values residing in the same register as much as possible. Experimental results show that the proposed analysis method reduces the number of register spills to 28%.

  • A 32-bit RISC Microprocessor with DSP Functionality: Rapid Prototyping

    Byung In MOON  Dong Ryul RYU  Jong Wook HONG  Tae Young LEE  Sangook MOON  Yong Surk LEE  

     
    LETTER-Digital Signal Processing

    Vol: E84-A  No: 5  Page(s): 1339-1347

    We have designed a 32-bit RISC microprocessor with 16-/32-bit fixed-point DSP functionality. This processor, called YD-RISC, combines both general-purpose microprocessor and digital signal processor (DSP) functionality using the reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operation, digital signal processing (DSP) and memory access. They operate in parallel in order to remove stall cycles after DSP or load/store instructions, which usually need one or more issue latency cycles in addition to the first issue cycle. High performance was achieved with these parallel functional units while adopting a sophisticated five-stage pipeline structure. The pipelined DSP unit can execute one 32-bit multiply-accumulate (MAC) or 16-bit complex multiply instruction every one or two cycles through two 17-b × 17-b multipliers and an operand examination logic circuit. Power-saving techniques such as power-down mode and disabling execution blocks allow low power consumption. In the design of this processor, we use logic synthesis and automatic place-and-route. This top-down approach shortens design time, while a high clock frequency is achieved by refining the processor architecture.

  • A Novel Dynamically Programmable Arithmetic Array (DPAA) Processor for Digital Signal Processing

    Boon-Keat TAN  Ryuji YOSHIMURA  Toshimasa MATSUOKA  Kenji TANIGUCHI  

     
    PAPER

    Vol: E84-A  No: 3  Page(s): 741-747

    A new architecture-based Dynamically Programmable Arithmetic Array processor (DPAA) is proposed for general-purpose Digital Signal Processing applications. Parallelism and pipelining are achieved by using DPAA, which consists of various basic arithmetic blocks connected through a code-division multiple access bus interface. The proposed architecture possesses 100% interconnection flexibility because connections are made virtually through code matching instead of physical wire connections. Compared to conventional multiplexing architectures, the proposed interconnection topology consumes less chip area, and thus more arithmetic blocks can be incorporated. A 16-bit prototype chip incorporating 10 multipliers and 40 other arithmetic blocks has been implemented in a 4.5 mm × 4.5 mm chip with a 0.6 µm CMOS process. DPAA also features simple programmability, as numerical formulas can be used to configure the processor without programming languages or specialized CAD tools.