The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] functional unit(8hit)

1-8hit
  • Valid Digit and Overflow Information to Reduce Energy Dissipation of Functional Units in General Purpose Processors

    Kazuhito ITO  Takuya NUMATA  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    463-472

    In order to reduce the dynamic energy dissipation in CMOS LSIs, it is effective to reduce the frequency of value changes of the signals. In this paper, a data expression with the valid digit and lower digit overflow information is proposed to suppress unnecessary signal changes in integer functional units and registers of general purpose processors. Experimental results show that the proposed method reduces the energy dissipation by 9.8% for benchmark programs.

  • A Step towards Static Script Malware Abstraction: Rewriting Obfuscated Script with Maude

    Gregory BLANC  Youki KADOBAYASHI  

     
    PAPER

      Vol:
    E94-D No:11
      Page(s):
    2159-2166

    Modern web applications incorporate many programmatic frameworks and APIs that are often pushed to the client-side with most of the application logic while contents are the result of mashing up several resources from different origins. Such applications are threatened by attackers that often attempts to inject directly, or by leveraging a stepstone website, script codes that perform malicious operations. Web scripting based malware proliferation is being more and more industrialized with the drawbacks and advantages that characterize such approach: on one hand, we are witnessing a lot of samples that exhibit the same characteristics which make these easy to detect, while on the other hand, professional developers are continuously developing new attack techniques. While obfuscation is still a debated issue within the community, it becomes clear that, with new schemes being designed, this issue cannot be ignored anymore. Because many proposed countermeasures confess that they perform better on unobfuscated contents, we propose a 2-stage technique that first relieve the burden of obfuscation by emulating the deobfuscation stage before performing a static abstraction of the analyzed sample's functionalities in order to reveal its intent. We support our proposal with evidence from applying our technique to real-life examples and provide discussion on performance in terms of time, as well as possible other applications of proposed techniques in the areas of web crawling and script classification. Additionally, we claim that such approach can be generalized to other scripting languages similar to JavaScript.

  • A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions

    Hamid NOORI  Farhad MEHDIPOUR  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER

      Vol:
    E91-C No:4
      Page(s):
    497-508

    Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of these custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. A quantitative approach is utilized to propose an efficient architecture for the RFU and fix its constraints. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple exits custom instructions are proposed. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Experimental results show that multi-exit custom instructions enhance the performance by an average of 67% compared to custom instructions limited to one basic block. A maximum speedup of 4.7, compared to a general embedded processor, and an average speedup of 1.85 was achieved on MiBench benchmark suite.

  • Design Method of High Performance and Low Power Functional Units Considering Delay Variations

    Kouichi WATANABE  Masashi IMAI  Masaaki KONDO  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER-Circuit Synthesis

      Vol:
    E89-A No:12
      Page(s):
    3519-3528

    As VLSI technology advances, delay variations will become more serious. Delay-insensitive asynchronous dual-rail circuits tolerate any delay variation, but their energy consumption is more than double that of the single-rail circuits because signal transitions occur every cycle in all bits regardless of the input bit pattern. However, in functional units, a significant number of input bits may not change from the previous input in many cases. In such a situation, calculation of these bits is not required. Thus, we propose a method, called unflip-bits control, makes use of the above situation, to reduce energy consumption. We evaluate the energy consumption and performance penalty for the method using HSPICE and the verilog-XL simulator, and compare the method with the conventional dual-rail circuit and a synchronous circuit. Our evaluation results reveal that the proposed asynchronous dual-rail circuit has a 12-60% lower energy consumption compared with a conventional asynchronous dual-rail circuit.

  • A Cell-Driven Multiplier Generator with Delay Optimization of Partial Products Compression and an Efficient Partition Technique for the Final Addition

    Tso-Bing JUANG  Shen-Fu HSIAO  Ming-Yu TSAI  Jenq-Shiun JAN  

     
    PAPER-Digital Circuits and Computer Arithmetic

      Vol:
    E88-D No:7
      Page(s):
    1464-1471

    In this paper, a cell-driven multiplier generator is developed that can produce high-performance gate-level netlists for multiplier-related arithmetic functional units, including multipliers, multiplier and accumulators (MAC) and dot product calculator. The generator optimizes the speed/area performance both in the partial product compression and in the final addition stage for the specified process technology. In addition to the conventional CMOS full adder cells, we have also designed fast compression elements based on pass-transistor logic for further performance improvement of the generated multipliers. Simulation results show that our proposed generator could produce better multiplier-related functional units compared to those generated using Synopsys Designware library or other previously proposed approaches.

  • A SIMD Instruction Set and Functional Unit Synthesis Algorithm with SIMD Operation Decomposition

    Nozomu TOGAWA  Koichi TACHIKAKE  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Programmable Logic, VLSI, CAD and Layout

      Vol:
    E88-D No:7
      Page(s):
    1340-1349

    This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assembly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. The promising experimental results are also shown.

  • A 32-bit RISC Microprocessor with DSP Functionality: Rapid Prototyping

    Byung In MOON  Dong Ryul RYU  Jong Wook HONG  Tae Young LEE  Sangook MOON  Yong Surk LEE  

     
    LETTER-Digital Signal Processing

      Vol:
    E84-A No:5
      Page(s):
    1339-1347

    We have designed a 32-bit RISC microprocessor with 16-/32-bit fixed-point DSP functionality. This processor, called YD-RISC, combines both general-purpose microprocessor and digital signal processor (DSP) functionality using the reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operation, digital signal processing (DSP) and memory access. They operate in parallel in order to remove stall cycles after DSP or load/store instructions, which usually need one or more issue latency cycles in addition to the first issue cycle. High performance was achieved with these parallel functional units while adopting a sophisticated five-stage pipeline structure. The pipelined DSP unit can execute one 32-bit multiply-accumulate (MAC) or 16-bit complex multiply instruction every one or two cycles through two 17-b 17-b multipliers and an operand examination logic circuit. Power-saving techniques such as power-down mode and disabling execution blocks allow low power consumption. In the design of this processor, we use logic synthesis and automatic place-and-route. This top-down approach shortens design time, while a high clock frequency is achieved by refining the processor architecture.

  • An FPGA-Oriented Motion-Stereo Processor with a Simple Interconnection Network for Parallel Memory Access

    Seunghwan LEE  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E83-D No:12
      Page(s):
    2122-2130

    In designing a field-programmable gate array (FPGA)-based processor for motion stereo, a parallel memory system and a simple interconnection network for parallel data transfer are essential for parallel image processing. This paper, firstly, presents an FPGA-oriented hierarchical memory system. To reduce the bandwidth requirement between an on-chip memory in an FPGA and external memories, we propose an efficient scheduling: Once pixels are transferred to the on-chip memory, operations associated with the data are consecutively performed. Secondly, a rectangular memory allocation is proposed which allocates pixels to be accessed in parallel onto different memory modules of the on-chip memory. Consequently, completely parallel access can be achieved. The memory allocation also minimizes the required capacity of the on-chip memory and thus is suitable for FPGA-based implementation. Finally, a functional unit allocation is proposed to minimize the complexity between memory modules and functional units. An experimental result shows that the performance of the processor becomes 96 times higher than that of a 400 MHz Pentium II.