IEICE global.ieice.org Site

Keyword Search Result

[Keyword] arc (1309 hits)

Hits 381-400 of 1309

Stable Adaptive Work-Stealing for Concurrent Many-Core Runtime Systems
Yangjie CAO Hongyang SUN Depei QIAN Weiguo WU

PAPER-Fundamentals of Information Systems

Vol: E95-D No: 5, Page(s): 1407-1416
The proliferation of many-core architectures has led to the explosive development of parallel applications using programming models such as OpenMP, TBB, and Cilk/Cilk++. With an increasing number of cores, however, it becomes even harder to schedule parallel applications efficiently on these resources, since current many-core runtime systems still lack effective mechanisms to support collaborative scheduling of these applications. In this paper, we study feedback-driven adaptive scheduling based on work stealing, which provides an efficient solution for concurrently executing a set of applications on many-core systems. To dynamically estimate the number of cores desired by each application, we propose a stable feedback-driven adaptive algorithm, called SAWS, that uses active workers and the length of active deques, which capture the runtime characteristics of the applications well. Furthermore, a prototype system is built by extending the Cilk runtime system, and experimental results obtained on a Sun Fire server show that SAWS offers advantages in scheduling concurrent parallel applications. Specifically, compared with the existing algorithms A-Steal and WS-EQUI, SAWS improves mean response time by up to 12.43% and 21.32%, respectively, and processor utilization by up to 25.78% and 46.98%, respectively.
Extracting Communities of Interests for Semantics-Based Graph Searches
Makoto NAKATSUJI Akimichi TANAKA Toshio UCHIYAMA Ko FUJIMURA

PAPER

Vol: E95-D No: 4, Page(s): 932-941
Recently, users find their interests by checking the contents published or mentioned by their immediate neighbors in social networking services. We propose semantics-based link navigation, in which links guide the active user to potential neighbors who may provide new interests. Our method first creates a graph that has users as nodes and shared interests as links. It then divides the graph by link pruning to extract a practical number of interest-sharing groups, i.e., communities of interests (COIs), that the active user can navigate. Next, it attaches a different semantic tag to the link to each representative user, which best reflects the interests of the COIs that they are included in, and to the link to each immediate neighbor of the active user. Finally, it calculates link attractiveness by analyzing the semantic tags on links. The active user can select the link to access by checking the semantic tags and link attractiveness. User interests extracted from large-scale actual blog entries are used to confirm the efficiency of our proposal. Results show that navigation based on link attractiveness and representative users allows the user to find new interests much more accurately than is otherwise possible.
Codestream-Based Identification of JPEG 2000 Images with Different Coding Parameters
Osamu WATANABE Takahiro FUKUHARA Hitoshi KIYA

PAPER-Image Processing and Video Processing

Vol: E95-D No: 4, Page(s): 1120-1129
A method of identifying JPEG 2000 images with different coding parameters, such as code-block sizes, quantization-step sizes, and resolution levels, is presented. It does not produce false-negative matches regardless of different coding parameters (compression rate, code-block size, and discrete wavelet transform (DWT) resolution levels) or quantization step sizes. This feature is not provided by conventional methods. Moreover, the proposed approach is fast because it uses the number of zero-bit-planes, which can be extracted from the JPEG 2000 codestream by parsing only the header information, without embedded block coding with optimized truncation (EBCOT) decoding. The experimental results revealed the effectiveness of image identification based on the new method.
Intelligent Data Rate Control in Cognitive Mobile Heterogeneous Networks
Jeich MAR Hsiao-Chen NIEN Jen-Chia CHENG

PAPER

Vol: E95-B No: 4, Page(s): 1161-1169
An adaptive rate controller (ARC) based on an adaptive neural fuzzy inference system (ANFIS) is designed to autonomously adjust the data rate of a mobile heterogeneous network to adapt to the changing traffic load and the user speed for multimedia call services. The effect of user speed on the handoff rate is considered. Simulations demonstrate that the ANFIS-ARC is able to keep the new call blocking probability and handoff failure probability of the mobile heterogeneous network below a prescribed low level over different user speeds and new call origination rates while optimizing the average throughput. It is also shown that the mobile cognitive wireless network with the proposed CS-ANFIS-ARC protocol can support more traffic load than the neural fuzzy call-admission and rate controller (NFCRC) protocol.
Local Location Search Based Progressive Geographic Multicast Protocol in Wireless Sensor Networks
Euisin LEE Soochang PARK Jeongcheol LEE Sang-Ha KIM

LETTER-Network

Vol: E95-B No: 4, Page(s): 1419-1422
To provide scalability against group size, Global Location Search based Hierarchical Geographic Multicast Protocols (GLS-HGMPs) have recently been proposed for wireless sensor networks. To reduce the communication overhead imposed by the global location search and to prevent the multicast data detour imposed by the hierarchical geographic multicasting in GLS-HGMPs, this letter proposes a Local Location Search based Progressive Geographic Multicast Protocol (LLS-PGMP). Simulation results show that LLS-PGMP is superior to GLS-HGMPs.
A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition
Kazuhiro NAKAMURA Ryo SHIMAZAKI Masatoshi YAMAMOTO Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER

Vol: E95-C No: 4, Page(s): 456-467
This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.
AQBE – QBE Style Queries for Archetyped Data
Shelly SACHDEVA Daigo YAGINUMA Wanming CHU Subhash BHALLA

PAPER-Biological Engineering

Vol: E95-D No: 3, Page(s): 861-871
Large-scale adoption of electronic healthcare applications requires semantic interoperability. New proposals put forward an advanced (multi-level) DBMS architecture for repository services for patients' health records. These also require query interfaces at multiple levels, including the level of semi-skilled users. In this regard, this study examines a high-level user interface for querying the new form of standardized Electronic Health Records system. It proposes a step-by-step graphical query interface that allows semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities and to increase user friendliness.
Cell Searching and DoA Estimation Methods for Mobile Relay Stations with a Uniform Linear Array
Yo-Han KO Chang-Hwan PARK Soon-Jik KWON Yong-Soo CHO

PAPER-Transmission Systems and Transmission Equipment for Communications

Vol: E95-B No: 3, Page(s): 803-809
In this paper, cell searching and direction-of-arrival (DoA) estimation methods are proposed for mobile relay stations with a uniform linear array in OFDM-based cellular systems. The proposed methods can improve the performance of cell searching and DoA estimation even when there are symbol timing offsets among the signals received from adjacent base stations and Doppler frequency shifts caused by the movement of the mobile relay station. The performance and computational complexity of the proposed cell searching and DoA estimation methods are evaluated by computer simulation under a mobile WiMAX environment.
Single Front-End MIMO Architecture with Parasitic Antenna Elements (Open Access)
Mitsuteru YOSHIDA Kei SAKAGUCHI Kiyomichi ARAKI

PAPER-Wireless Communication Technologies

Vol: E95-B No: 3, Page(s): 882-888
In recent years, wireless communication technology has been studied intensively. In particular, MIMO, which employs several transmit and receive antennas, is a key technology for enhancing spectral efficiency. However, conventional MIMO architectures require multiple transceiver circuits to transmit and receive separate signals, which incurs the cost of one RF front-end per antenna. In addition, MIMO systems are assumed to be used in an environment with low spatial correlation between antennas. Since a short distance between antennas causes high spatial correlation and coupling effects, it is difficult to miniaturize wireless terminals for mobile use. This paper presents a novel architecture that enables mobile terminals to be miniaturized and to work with a single RF front-end by means of adaptive analog beam-forming with parasitic antenna elements and antenna switching for spatial multiplexing. Furthermore, a statistical analysis of the proposed architecture is also discussed.
Date Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications
Xinning LIU Chen MEI Peng CAO Min ZHU Longxing SHI

PAPER-Design Methodology

Vol: E95-D No: 2, Page(s): 374-382
This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse grain reconfigurable architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Units), which are used to speed up control-intensive tasks and data-intensive tasks, respectively. The parallel computing capability and flexibility of REMUS-II make it an excellent candidate for processing multimedia applications, which require a large number of memory accesses. In this paper, we specifically optimize the data flow to deal with these performance-hazardous and energy-hungry memory accesses in order to meet the bandwidth requirement of parallel computing. The RPU internal memory can work in multiple modes, such as 2D-access mode and transformation mode, according to different multimedia access patterns. This novel design can improve performance by up to 26% compared to traditional on-chip memory. Meanwhile, a block buffer is implemented to optimize the off-chip data flow by reducing off-chip memory accesses, which are reduced by up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30 fps for H.264 High Profile@Level 4 and High Level MPEG2 at a 200 MHz clock frequency. REMUS-II is implemented in 23.7 mm² of silicon on a TSMC 65 nm logic process with a 400 MHz maximum working frequency.
Region-Oriented Placement Algorithm for Coarse-Grained Power-Gating FPGA Architecture
Ce LI Yiping DONG Takahiro WATANABE

PAPER-Design Methodology

Vol: E95-D No: 2, Page(s): 314-323
FPGAs play an essential role in industrial products because they are fast, stable, and flexible. However, the power consumption of FPGAs used in portable devices is a critical issue. The top-down hierarchical design method is commonly used in both ASIC and FPGA design, but when multiple modules are integrated in an FPGA and some of them may be in sleep mode, the current FPGA architecture cannot be fully effective. In this paper, a coarse-grained power-gating FPGA architecture is proposed in which the whole area of an FPGA is partitioned into several regions and the power supply is controlled for each region, so that modules in sleep mode can be effectively powered off. We also propose a region-oriented FPGA placement algorithm, based on VPR [1], that fits the user's hierarchical design. Simulation results show that the proposed method can reduce FPGA power consumption by 38% on average by putting unused modules or regions into sleep mode.
An Easily Testable Routing Architecture and Prototype Chip
Kazuki INOUE Masahiro KOGA Motoki AMAGASAKI Masahiro IIDA Yoshinobu ICHIDA Mitsuro SAJI Jun IIDA Toshinori SUEYOSHI

PAPER-Architecture

Vol: E95-D No: 2, Page(s): 303-313
Generally, a programmable LSI such as an FPGA is difficult to test compared to an ASIC. There are two major reasons for this. The first is that an automatic test pattern generator (ATPG) cannot be used because of the programmability of the FPGA. The other reason is that the FPGA architecture is very complex. In this paper, we propose a new FPGA architecture that will simplify the testing of the device. The base of our architecture is general island-style FPGA architecture, but it consists of a few types of circuit blocks and orderly wire connections. This paper also presents efficient test configurations for our proposed architecture. We evaluated our architecture and test configurations using a prototype chip. As a result, the chip was fully tested using our configurations in a short test time. Moreover, our architecture can provide comparable performance to a conventional FPGA architecture.
A Flexible LDPC Decoder Architecture Supporting TPMP and TDMP Decoding Algorithms
Shuangqu HUANG Xiaoyang ZENG Yun CHEN

PAPER-Application

Vol: E95-D No: 2, Page(s): 403-412
In this paper, a programmable and area-efficient decoder architecture supporting two decoding algorithms for Block-LDPC codes is presented. The novel decoder can be configured to decode in either TPMP or TDMP decoding mode according to different Block-LDPC codes, essentially combining the advantages of the two decoding algorithms. With a regular and scalable data-path, a Reconfigurable Serial Processing Engine (RSPE) is proposed to achieve area efficiency. To verify the proposed architecture, a flexible LDPC decoder fully compliant with IEEE 802.16e applications is implemented in a 130 nm 1P8M CMOS technology with a total area of 6.3 mm² and a maximum operating frequency of 250 MHz. The chip dissipates 592 mW when operating at 250 MHz with a 1.2 V supply.
Design of Area- and Power-Efficient Pipeline FFT Processors for 8x8 MIMO-OFDM Systems
Shingo YOSHIZAWA Yoshikazu MIYANAGA

PAPER-VLSI Design Technology and CAD

Vol: E95-A No: 2, Page(s): 550-558
We present area- and power-efficient pipeline 128- and 128/64-point fast Fourier transform (FFT) processors for 8x8 multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems based on the specification framework of IEEE 802.11ac WLANs. Our new FFT processors use a mixed-radix multipath delay commutator (MRMDC) architecture from the point of view of low complexity and high memory use. A conventional MRMDC architecture induces large circuits in the delay commutators, which change the order of data sequences for the butterfly units. The proposed architecture replaces delay elements with new commutators that cooperate with other MIMO-OFDM processing blocks. These commutators are inserted in the front and rear of the input and output memory units. Our FFT processors exhibit a 50–51% reduction in logic gates and a 70–72% reduction in power dissipation compared with conventional ones.
Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture
Shouyi YIN Chongyong YIN Leibo LIU Min ZHU Shaojun WEI

PAPER-Design Methodology

Vol: E95-D No: 2, Page(s): 335-344
Coarse-grained reconfigurable architecture (CGRA) combines the performance of application-specific integrated circuits (ASICs) and the flexibility of general-purpose processors (GPPs), making it a promising solution for embedded systems. With the increasing complexity of reconfigurable resources (processing elements, routing cells, I/O blocks, etc.), the reconfiguration cost is becoming the performance bottleneck. The major reconfiguration cost comes from the frequent memory read/write operations for transferring the configuration context from main memory to the context buffer. To improve the overall performance, it is critical to reduce the amount of configuration context. In this paper, we propose a configuration context reduction method for CGRA. The proposed method exploits the structural correlation of computation tasks that are mapped onto the CGRA and reduces the redundancies in the configuration context. Experimental results show that the proposed method can, on average, reduce the configuration context size by up to 71% and speed up execution by up to 68%. The proposed method does not depend on any architectural feature and can be applied to a CGRA with an arbitrary architecture.
Low-Complexity Memory Access Architectures for Quasi-Cyclic LDPC Decoders
Ming-Der SHIEH Shih-Hao FANG Shing-Chung TANG Der-Wei YANG

PAPER-Computer System

Vol: E95-D No: 2, Page(s): 549-557
Partially parallel decoding architectures are widely used in the design of low-density parity-check (LDPC) decoders, especially for quasi-cyclic (QC) LDPC codes. To comply with the code structure of the parity-check matrices of QC-LDPC codes, many small memory blocks are conventionally employed in this architecture. The total memory area usually dominates the area requirement of LDPC decoders. This paper proposes a low-complexity memory access architecture that merges small memory blocks into memory groups to reduce the effect of peripherals in small memory blocks. A simple but efficient algorithm is also presented to handle the additional delay elements introduced by the memory merging method. Experimental results on a rate-1/2 parity-check matrix defined in the IEEE 802.16e standard show that the LDPC decoder designed using the proposed memory access architecture has the lowest area complexity among related studies. Compared to a design with the same specifications, the decoder implemented using the proposed architecture requires 33% fewer gates and is more power-efficient. The proposed memory access architecture is thus suitable for the design of low-complexity LDPC decoders.
An Efficient Conflict Detection Algorithm for Packet Filters
Chun-Liang LEE Guan-Yu LIN Yaw-Chung CHEN

PAPER

Vol: E95-D No: 2, Page(s): 472-479
Packet classification is essential for supporting advanced network services such as firewalls, quality-of-service (QoS), virtual private networks (VPN), and policy-based routing. The rules that routers use to classify packets are called packet filters. If two or more filters overlap, a conflict occurs and leads to ambiguity in packet classification. This study proposes an algorithm that can efficiently detect and resolve filter conflicts using tuple-based search. The time complexity of the proposed algorithm is O(nW + s), and the space complexity is O(nW), where n is the number of filters, W is the number of bits in a header field, and s is the number of conflicts. This study uses synthetic filter databases generated by Class-Bench to evaluate the proposed algorithm. Simulation results show that the proposed algorithm achieves better performance than existing conflict detection algorithms in both time and space, particularly for databases with large numbers of conflicts.
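To make the notion of a filter conflict concrete, the following is a minimal sketch of a naive pairwise check, not the tuple-based algorithm of the paper. It assumes that each filter is a tuple of bit-string prefixes (one per header field) and adopts the common, stricter definition that two filters conflict when they overlap but neither fully contains the other; the function names and the sample filters are hypothetical.

```python
from itertools import combinations


def prefix_relation(p, q):
    """Relation of prefix p to prefix q ('' matches every value)."""
    if p == q:
        return "equal"
    if q.startswith(p):
        return "superset"  # p is shorter, so it covers everything q covers
    if p.startswith(q):
        return "subset"    # p is longer, so it is covered by q
    return "disjoint"      # no header value matches both prefixes


def conflicts(f, g):
    """Assumed definition: overlap in every field, yet neither filter contains the other."""
    relations = [prefix_relation(a, b) for a, b in zip(f, g)]
    if "disjoint" in relations:
        return False  # the filters never match the same packet
    return "superset" in relations and "subset" in relations


def find_conflicts(filters):
    """Naive O(n^2) scan over all filter pairs; returns index pairs that conflict."""
    return [
        (i, j)
        for (i, f), (j, g) in combinations(enumerate(filters), 2)
        if conflicts(f, g)
    ]


# Hypothetical two-field filters (e.g. source prefix, destination prefix).
sample = [("10", "1"), ("1", "11"), ("0", "0")]
print(find_conflicts(sample))
```

In the sample, the first two filters overlap on packets matching `10*` in the first field and `11*` in the second, but each is more specific in a different field, so the pair is reported. The pairwise scan is only for illustration; it does not reflect the O(nW + s) bound stated in the abstract.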
A Fast Sub-Volume Search Method for Human Action Detection
Ping GUO Zhenjiang MIAO Xiao-Ping ZHANG Zhe WANG

LETTER-Image Recognition, Computer Vision

Vol: E95-D No: 1, Page(s): 285-288
This paper discusses the task of human action detection, which requires not only classifying the type of the action of interest but also finding the actions' spatial-temporal locations in a video. The novelty of this paper lies in two aspects. One is a new graph-based representation of the search space in a video. The other is a novel sub-volume search method based on Minimum Cycle detection. The proposed method has low computational complexity while maintaining high action detection accuracy. It is evaluated on two challenging datasets captured against cluttered backgrounds. The proposed approach outperforms other state-of-the-art methods in most situations in terms of both Precision-Recall values and running speed.
Privacy-Enhancing Queries in Personalized Search with Untrusted Service Providers (Open Access)
Yunsang OH Hyoungshick KIM Takashi OBI

PAPER-Privacy

Vol: E95-D No: 1, Page(s): 143-151
For personalized search, a user must provide her personal information. However, this sometimes includes sensitive information about the individual, such as health condition and private lifestyle. It is not sufficient just to protect the communication channel between the user and the service provider, because the collected personal data can potentially be misused for the service providers' commercial advantage (e.g., for advertising methods that target potential consumers). Our aim here is to protect user privacy by filtering out, at the system level, the sensitive information exposed by a user's query input. We propose a framework that introduces the concept of a query generalizer. The query generalizer is middleware that takes a query for personalized search, modifies the query to hide the user's sensitive personal information adaptively depending on the user's privacy policy, and then forwards the modified query to the service provider. Our experimental results show that the best-performing query generalization method is capable of achieving a low traffic overhead within a reasonable range of user privacy. The increased traffic overhead varied from 1.0 to 3.3 times that of the original query.
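The middleware role described in this abstract can be illustrated with a small sketch. It is only a toy under stated assumptions: the term-to-category table, the representation of the privacy policy as a set of sensitive terms, and the function name `generalize_query` are all hypothetical and are not taken from the paper, whose actual generalization methods are not described here.

```python
# Hypothetical table mapping specific terms to broader categories.
GENERALIZATION_TABLE = {
    "diabetes": "health condition",
    "insulin": "medication",
}


def generalize_query(query, sensitive_terms, table=GENERALIZATION_TABLE):
    """Rewrite a query before it is forwarded to the service provider.

    Words listed in the user's policy (sensitive_terms) are replaced by a
    broader category when the table has one, and dropped otherwise.
    """
    rewritten = []
    for word in query.split():
        key = word.lower()
        if key not in sensitive_terms:
            rewritten.append(word)  # non-sensitive words pass through unchanged
            continue
        broader = table.get(key)
        if broader is not None:
            rewritten.append(broader)  # hide the specific term behind a category
        # no broader term known: omit the sensitive word entirely
    return " ".join(rewritten)


policy = {"diabetes"}  # hypothetical per-user privacy policy
print(generalize_query("diabetes diet recipes", policy))
```

A broader query typically returns more results than the user needs, which is one plausible reading of the traffic overhead reported in the abstract; the client would then filter the results locally.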
Accurate and Simplified Prediction of L2 Cache Vulnerability for Cost-Efficient Soft Error Protection
Yu CHENG Anguo MA Minxuan ZHANG

PAPER-Trust

Vol: E95-D No: 1, Page(s): 56-66
Soft errors caused by energetic particle strikes in on-chip cache memories have become a critical challenge for microprocessor design. The architectural vulnerability factor (AVF), defined as the probability that a transient fault in a structure results in a visible error in the final output of a program, has been widely employed for accurate soft error rate estimation. Recent studies have found that designing soft error protection techniques with awareness of AVF greatly helps to achieve a tradeoff between performance and reliability for several structures (i.e., issue queue, reorder buffer). For large on-chip L2 caches, redundancy-based protection techniques (such as ECC) have been widely employed to ensure data integrity, at high cost. Protecting caches without accurate knowledge of their vulnerability characteristics may lead to over-protection and thus incur high overheads. Therefore, AVF-aware protection techniques are attractive for designers seeking cost-efficient protection for caches, especially at an early design stage. In this paper, we propose an improved AVF estimation framework for conducting a comprehensive characterization of the dynamic behavior and predictability of L2 cache vulnerability. We employ the Bayesian Additive Regression Trees (BART) method to accurately model the variation of L2 cache AVF and to quantitatively explain the important effects of several key performance metrics on L2 cache AVF. We then employ a bump hunting technique to extract simple selection rules based on several key performance metrics for a simplified and fast estimation of L2 cache AVF. Using the simplified L2 cache AVF estimator, we develop an AVF-aware ECC technique as an example to demonstrate the cost-efficiency advantages of dynamic fault-tolerant techniques based on AVF prediction. Experimental results show that, compared with the traditional full ECC technique, the AVF-aware ECC technique reduces the L2 cache access latency by 16.5% and power consumption by 11.4% on average for SPEC2K benchmarks.
