IEICE global.ieice.org Site

Keyword Search Result

[Keyword] processor architecture(17hit)

1-17hit

Non-Stop Microprocessor for Fault-Tolerant Real-Time Systems Open Access
Shota NAKABEPPU Nobuyuki YAMASAKI

PAPER

Pubricized:
2023/01/25
Vol:
E106-C No:7
Page(s):
365-381
It is very important to design an embedded real-time system as a fault-tolerant system to ensure dependability. In particular, when a power failure occurs, restart processing after power restoration is required in a real-time system using a conventional processor. Even if power is restored quickly, the restart process takes a long time and causes deadline misses. In order to design a fault-tolerant real-time system, it is necessary to have a processor that can resume operation in a short time immediately after power is restored, even if a power failure occurs at any time. Since current embedded real-time systems are required to execute many tasks, high schedulability for high throughput is also important. This paper proposes a non-stop microprocessor architecture to achieve a fault-tolerant real-time system. The non-stop microprocessor is designed so as to resume normal operation even if a power failure occurs at any time, to achieve little performance degradation for high schedulability even if checkpoint creations and restorations are performed many times, to control flexibly non-volatile devices through software configuration, and to ensure data consistency no matter when a checkpoint restoration is performed. The evaluation shows that the non-stop microprocessor can restore a checkpoint within 5µsec and almost hide the overhead of checkpoint creations. The non-stop microprocessor with such capabilities will be an essential component of a fault-tolerant real-time system with high schedulability.
An Inductive Method to Select Simulation Points
MinSeong CHOI Takashi FUKUDA Masahiro GOSHIMA Shuichi SAKAI

PAPER-Architecture

Pubricized:
2016/08/24
Vol:
E99-D No:12
Page(s):
2891-2900
The time taken for processor simulation can be drastically reduced by selecting simulation points, which are dynamic sections obtained from the simulation result of processors. The overall behavior of the program can be estimated by simulating only these sections. The existing methods to select simulation points, such as SimPoint, used for selecting simulation points are deductive and based on the idea that dynamic sections executing the same static section of the program are of the same phase. However, there are counterexamples for this idea. This paper proposes an inductive method, which selects simulation points from the results obtained by pre-simulating several processors with distinctive microarchitectures, based on assumption that sections in which all the distinctive processors have similar istructions per cycle (IPC) values are of the same phase. We evaluated the first 100G instructions of SPEC 2006 programs. Our method achieved an IPC estimation error of approximately 0.1% by simulating approximately 0.05% of the 100G instructions.
Ultrasmall: A Tiny Soft Processor Architecture with Multi-Bit Serial Datapaths for FPGAs
Shinya TAKAMAEDA-YAMAZAKI Hiroshi NAKATSUKA Yuichiro TANAKA Kenji KISE

PAPER-Architecture

Pubricized:
2015/09/15
Vol:
E98-D No:12
Page(s):
2150-2158
Soft processors are widely used in FPGA-based embedded computing systems. For such purposes, efficiency in resource utilization is as important as high performance. This paper proposes Ultrasmall, a new soft processor architecture for FPGAs. Ultrasmall supports a subset of the MIPS-I instruction set architecture and employs an area efficient microarchitecture to reduce the use of FPGA resources. While supporting the original 32-bit ISA, Ultrasmall uses a 2-bit serial ALU for all of its operations. This approach significantly reduces the resource utilization instead of increasing the performance overheads. In addition to these device-independent optimizations, we applied several device-dependent optimizations for Xilinx Spartan-3E FPGAs using 4-input lookup tables (LUTs). Optimizations using specific primitives aggressively reduce the number of occupied slices. Our evaluation result shows that Ultrasmall occupies only 84% of the previous small soft processor. In addition to the utilized resource reduction, Ultrasmall achieves 2.9 times higher performance than the previous approach.
Address Order Violation Detection with Parallel Counting Bloom Filters
Naruki KURATA Ryota SHIOYA Masahiro GOSHIMA Shuichi SAKAI

PAPER

Vol:
E98-C No:7
Page(s):
580-593
To eliminate CAMs from the load/store queues, several techniques to detect memory access order violation with hash filters composed of RAMs have been proposed. This paper proposes a technique with parallel counting Bloom filters (PCBF). A Bloom filter has extremely low false positive rates owing to multiple hash functions. Although some existing researches claim the use of Bloom filters, none of them make mention to multiple hash functions. This paper also addresses the problem relevant to the variety of access sizes of load/store instructions. The evaluation results show that our technique, with only 2720-bit Bloom filters, achieves a relative IPC of 99.0% while the area and power consumption are greatly reduced to 14.3% and 22.0% compared to a conventional model with CAMs. The filter is much smaller than usual branch predictors.
Register Indirect Jump Target Forwarding
Ryota SHIOYA Naruki KURATA Takashi TOYOSHIMA Masahiro GOSHIMA Shuichi SAKAI

PAPER-Computer System

Vol:
E96-D No:2
Page(s):
278-288
Object-oriented languages have recently become common, making register indirect jumps more important than ever. In object-oriented languages, virtual functions are heavily used because they improve programming productivity greatly. Virtual function calls usually consist of register indirect jumps, and consequently, programs written in object-oriented languages contain many register indirect jumps. The prediction of the targets of register indirect jumps is more difficult than the prediction of the direction of conditional branches. Many predictors have been proposed for register indirect jumps, but they cannot predict the jump targets with high accuracy or require very complex hardware. We propose a method that resolves jump targets by forwarding execution results. Our proposal dynamically finds the producers of register indirect jumps in virtual function calls. After the execution of the producers, the execution results are forwarded to the processor's front-end. The jump targets can be resolved by the forwarded execution results without requiring prediction. Our proposal improves the performance of programs that include unpredictable register indirect jumps, because it does not rely on prediction but instead uses actual execution results. Our evaluation shows that the IPC improvement using our proposal is as high as 5.4% on average and 9.8% at maximum.
Architecture and Implementation of a Reduced EPIC Processor
Jun GAO Minxuan ZHANG Zuocheng XING Chaochao FENG

PAPER-Computer System

Vol:
E96-D No:1
Page(s):
9-18
This paper proposes a Reduced Explicitly Parallel Instruction Computing Processor (REPICP) which is an independently designed, 64-bit, general-purpose microprocessor. The REPICP based on EPIC architecture overcomes the disadvantages of hardware-based superscalar and software-based Very Long Instruction Word (VLIW) and utilizes the cooperation of compiler and hardware to enhance Instruction-Level Parallelism (ILP). In REPICP, we propose the Optimized Lock-Step execution Model (OLSM) and instruction control pipeline method. We also propose reduced innovative methods to optimize the design. The REPICP is fabricated in Artisan 0.13 µm Nominal 1P8M process with 57 M transistors. The die size of the REPICP is 100 mm2 (1010), and consumes only 12 W power when running at 300 MHz.
Low-Overhead Architecture for Security Tag
Ryota SHIOYA Daewung KIM Kazuo HORIO Masahiro GOSHIMA Shuichi SAKAI

PAPER-Computer System

Vol:
E94-D No:1
Page(s):
69-78
A security-tagged architecture is one that applies tags on data to detect attack or information leakage, tracking data flow. The previous studies using security-tagged architecture mostly focused on how to utilize tags, not how the tags are implemented. A naive implementation of tags simply adds a tag field to every byte of the cache and the memory. Such a technique, however, results in a huge hardware overhead. This paper proposes a low-overhead tagged architecture. We achieve our goal by exploiting some properties of tag, the non-uniformity and the locality of reference. Our design includes the use of uniquely designed multi-level table and various cache-like structures, all contributing to exploit these properties. Under simulation, our method was able to limit the memory overhead to 0.685%, where a naive implementation suffered 12.5% overhead.
Ultra Dependable Processor
Shuichi SAKAI Masahiro GOSHIMA Hidetsugu IRIE

INVITED PAPER

Vol:
E91-C No:9
Page(s):
1386-1393
This paper presents the processor architecture which provides much higher level dependability than the current ones. The features of it are: (1) fault tolerance and secure processing are integrated into a modern superscalar VLSI processor; (2) light-weight effective soft-error tolerant mechanisms are proposed and evaluated; (3) timing errors on random logic and registers are prevented by low-overhead mechanisms; (4) program behavior is hidden from the outer world by proposed address translation methods; (5) information leakage can be avoided by attaching policy tags for all data and monitoring them for each instruction execution; (6) injection attacks are avoided with much higher accuracy than the current systems, by providing tag trackings; (7) the overall structure of the dependable processor is proposed with a dependability manager which controls the detection of illegal conditions and recovers to the normal mode; and (8) an FPGA-based testbed system is developed where the system clock and the voltage are intentionally varied for experiment. The paper presents the fundamental scheme for the dependability, elemental technologies for dependability and the whole architecture of the ultra dependable processor. After showing them, the paper concludes with future works.
Dynamic Reconfiguration of Cache Indexing in Embedded Processors
Junhee KIM Sung-Soo LIM Jihong KIM

PAPER-VLSI Systems

Vol:
E90-D No:3
Page(s):
637-647
Cache performance optimization is an important design consideration in building high-performance embedded processors. Unlike general-purpose microprocessors, embedded processors can take advantages of application-specific information in optimizing the cache performance. One of such examples is to use modified cache index bits (over conventional index bits) based on memory access traces from key target embedded applications so that the number of conflict misses can be reduced. In this paper, we present a novel fine-grained cache reconfiguration technique which allows an intra-program reconfiguration of cache index bits, thus better reflecting the changing characteristics of a program execution. The proposed technique, called dynamic reconfiguration of index bits (DRIB), dynamically changes cache index bits in the function level. This compiler-directed and fine-grained approach allows each function to be executed using its own optimal index bits with no additional hardware support. In order to avoid potential performance degradation by frequent cache invalidations from reconfiguring cache index bits, we describe an efficient algorithm for selecting target functions whose cache index bits are reconfigured. Our algorithm ensures that the number of cache misses reduced by DRIB outnumbers the number of cache misses increased from cache invalidations. We also propose a new cache architecture, Two-Level Indexing (TLI) cache, which further reduces the number of conflict misses by intelligently dividing indexing steps into two stages. Our experimental results show that the DRIP approach combined with the TLI cache reduces the number of cache misses by 35% over the conventional cache indexing technique.
VLSI Architecture Study of a Real-Time Scalable Optical Flow Processor for Video Segmentation
Noriyuki MINEGISHI Junichi MIYAKOSHI Yuki KURODA Tadayoshi KATAGIRI Yuki FUKUYAMA Ryo YAMAMOTO Masayuki MIYAMA Kousuke IMAMURA Hideo HASHIMOTO Masahiko YOSHIMOTO

PAPER-System LSIs and Microprocessors

Vol:
E89-C No:3
Page(s):
230-242
An optical flow processor architecture is proposed. It offers accuracy and image-size scalability for video segmentation extraction. The Hierarchical Optical flow Estimation (HOE) algorithm [1] is optimized to provide an appropriate bit-length and iteration number to realize VLSI. The proposed processor architecture provides the following features. First, an algorithm-oriented data-path is introduced to execute all necessary processes of optical flow derivation allowing hardware cost minimization. The data-path is designed using 4-SIMD architecture, which enables high-throughput operation. Thereby, it achieves real-time optical flow derivation with 100% pixel density. Second, it has scalable architecture for higher accuracy and higher resolution. A third feature is the CMOS-process compatible on-chip 2-port DRAM for die-area reduction. The proposed processor has performance for CIF 30 fr/s with 189 MHz clock frequency. Its estimated core size is 6.025.33 mm2 with six-metal 90-nm CMOS technology.
Reducing Memory System Energy by Software-Controlled On-Chip Memory
Masaaki KONDO Hiroshi NAKAMURA

PAPER-Architecture and Algorithms

Vol:
E86-C No:4
Page(s):
580-588
In recent computer systems, a large portion of energy is consumed by on-chip cache accesses and data movement between cache and off-chip main memory. Reducing these memory system energy is indispensable for future microprocessors because power and thermal issues certainly become a key factor of limiting processor performance. In this paper, we discuss and evaluate how our architecture called SCIMA contributes to energy saving. SCIMA integrates software-controllable memory (SCM) into processor chip. SCIMA can save total memory system energy by using SCM under the support of compiler. The evaluation results reveal that SCIMA can reduce 5-50% of memory system energy and still faster than conventional cache based architecture.
Software Defined Radio Prototype for PHS and IEEE 802.11 Wireless LAN
Hiroyuki SHIBA Takashi SHONO Yushi SHIRATO Ichihiko TOYODA Kazuhiro UEHARA Masahiro UMEHIRA

PAPER

Vol:
E85-B No:12
Page(s):
2694-2702
A software defined radio (SDR) prototype based on a multiprocessor architecture (MPA) is developed. Software for Japanese personal handy phone system (PHS) of a 2G mobile system, and IEEE 802.11 wireless LAN, which has much wider bandwidth than the 2G systems, is successfully implemented. Newly developed flexible-rate pre-/ post-processor (FR-PPP) achieves the flexibility and wideband performance that the platform needs. This paper shows the design of the SDR prototype and evaluates its performance by experiments that include PHS processor load and wireless LAN throughput characteristics and processor load.
Code Efficiency Evaluation for Embedded Processors
Morgan Hirosuke MIKI Mamoru SAKAMOTO Shingo MIYAMOTO Yoshinori TAKEUCHI Toyohiko YOSHIDA Isao SHIRAKAWA

PAPER

Vol:
E85-A No:4
Page(s):
811-818
This paper evaluates the code efficiency of the ARM, Java, and x86 instruction sets by compiling the SPEC CPU95/CPU2000/JVM98 and CaffeineMark benchmarks, from the aspects of code sizes, basic block sizes, instruction distributions, and average instruction lengths. As a result, mainly because (i) the Java architecture is a stack machine, (ii) there are only four local variables which can be accessed by a 1-byte instruction, and (iii) additional instructions are provided for the network security, the code efficiency of Java turns out to be inferior to that of ARM Thumb. Moreover, through this efficiency analysis it should be stressed that there exists the high potential of constructing a more efficient code architecture by taking minute account of the customization of an instruction set as well as the number of registers.
Reducing Cache Energy Dissipation by Using Dual Voltage Supply
Vasily G. MOSHNYAGA Hiroshi TSUJI

PAPER-Optimization of Power and Timing

Vol:
E84-A No:11
Page(s):
2762-2768
Due to a large capacitance and enormous access rate, caches dissipate about a third of the total energy consumed by today's processors. In this paper we present a new architectural technique to reduce energy consumption in caches. Unlike previous approaches, which have focused on lowering cache capacitance and the number of accesses, our method exploits a new freedom in cache design, namely the voltage per access. Since in modern block-buffered caches, the loading capacitance operated on block-hit is much less than the capacitance operated on miss, the given clock cycle time is inefficiently utilized during the hit. We propose to trade-off this unused time with the supply voltage, lowering the voltage level on the hit and increasing it on the miss. Experiments show that the approach can half the cache energy dissipation without large performance and area overhead.
Trends in High-Performance, Low-Power Processor Architectures
Kazuaki MURAKAMI Hidetaka MAGOSHI

PAPER

Vol:
E84-C No:2
Page(s):
131-138
This paper briefly surveys architectural technologies of recent or future high-performance, low-power processors for improving the performance and power/energy consumption simultaneously. Achieving both high performance and low power at the same time imposes a lot of challenges on processor design, and therefore gives us a lot of opportunities for devising new technologies. The paper also tries to provide some insights into the technology direction in future.
System Electronics Technologies for Video Processing and Applications
Tomio KISHIMOTO Hironori YAMAUCHI Ryota KASAI

INVITED PAPER

Vol:
E82-A No:2
Page(s):
197-205
Thanks to rapid progress in computer technology and VLSI technology, we are approaching the stage where ordinary PCs will be able to handle real-time video signals as easily as they handle text data. First, features and applications of the video compression standard MPEG2 are surveyed as a typical video processing. It is clarified that real-time capability becomes more important as applications of MPEG2 widely spread. The trends of video coding in LSIs are summarized. And it is shown that the most advanced encoder/decoder LSI has an improved price-performance ratio that allows it to be adopted in consumer equipment. Finally, future directions of parallel architecture in video processing are surveyed in terms of special-purpose and general-purpose processing. The special approach has always taken the lead in video processing using sophisticated hardware-oriented parallel architectures. The general-purpose architecture method has gradually evolved in accordance with a software-oriented architecture. Both approaches will continue to evolve into a new stage by selecting possible parallel architectures such as multimedia instruction sets and process-level parallelism, and applying them in compound use. The so-called super processor architecture will emerge in the near future and it will be an ideal method that can manage rapid increase in requirements of capability and applicability in video processing.
ASIC Approaches for Vision-Based Vehicle Guidance
Ichiro MASAKI

INVITED PAPER

Vol:
E76-C No:12
Page(s):
1735-1743
This paper describes a vision system, which is based on ASIC (Application Specific Integrated Circuit) approaches, for vehicle guidance on highways. After reviewing related work in the field of intelligent vehicles, stereo vision, and ASIC-based approaches, the paper focuses on a stereo vision system developed for intelligent cruise control. The system measures the distance to the vehicle in front using trinocular triangulation. Application specific processor architectures were developed for low mass-production cost, real-time operation, low power consumption, and small physical size. The system was installed in a trunk of a car and evaluated successfully on highways.