IEICE global.ieice.org Site

Keyword Search Result

[Keyword] ACH(1072hit)

441-460hit(1072hit)

Region-Based Way-Partitioning on L1 Data Cache for Low Power
Zhong ZHENG, Zhiying WANG, Li SHEN

LETTER-Computer System

Vol: E96-D, No: 11, Page(s): 2466-2469
Power consumption has become a critical factor for embedded systems, especially battery-powered ones. Caches in these systems consume a large portion of the total chip power. Embedded systems usually adopt set-associative caches to get better performance; however, accessing all cache ways in parallel dissipates more energy. This paper proposes a region-based way-partitioning scheme that reduces the number of cache way accesses, and thus the cache power consumption, without sacrificing performance. Stack accesses and non-stack accesses are isolated and redirected to different ways of the L1 data cache. Under way-partitioning, both cache way accesses and memory reference interference are reduced. Experimental results show that the proposed approach can save around 27.5% of L1 data cache energy on average without significant performance degradation.
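The isolation of stack and non-stack accesses can be illustrated with a small software model. The Python sketch below is an illustration under stated assumptions, not the authors' hardware: the class name, the 32-byte line size, and the split of ways between the two regions are hypothetical, and the caller is assumed to already know whether an access targets the stack. It counts how many ways are probed per access, which is the quantity way-partitioning reduces.

```python
from collections import OrderedDict


class WayPartitionedCache:
    """Hypothetical model: stack and non-stack accesses probe disjoint groups of ways."""

    def __init__(self, num_sets, ways, stack_ways, line_size=32):
        assert 0 < stack_ways < ways
        self.num_sets = num_sets
        self.line_size = line_size
        self.capacity = {"stack": stack_ways, "other": ways - stack_ways}
        # One LRU-ordered dict of tags per (region, set); the oldest entry is first.
        self.regions = {r: [OrderedDict() for _ in range(num_sets)] for r in self.capacity}
        self.way_probes = 0  # total number of ways activated across all accesses
        self.hits = 0
        self.misses = 0

    def access(self, addr, is_stack):
        line = addr // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        region = "stack" if is_stack else "other"
        lru = self.regions[region][index]
        # Only the ways belonging to this region are probed in parallel.
        self.way_probes += self.capacity[region]
        if tag in lru:
            lru.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lru) >= self.capacity[region]:
            lru.popitem(last=False)  # evict this region's LRU line only
        lru[tag] = True
        return False
```

With 4 ways and 1 reserved for the stack, a stack access probes 1 way and a non-stack access probes 3, instead of 4 for every access in a conventional 4-way cache; stack traffic also cannot evict non-stack lines.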
Basics of Counting Statistics (Open Access)
Jun OHKUBO

INVITED PAPER

Vol: E96-B, No: 11, Page(s): 2733-2740
In this paper, we briefly review the scheme of counting statistics, in which the probability distribution of the number of monitored (target) transitions in a Markov jump process is evaluated. It is generally easy to construct a master equation for a Markov jump process, and counting statistics lets us obtain its basic equations straightforwardly from the master equation; these basic equations are used to calculate the cumulant generating function of the probability distribution of the number of target transitions. In stationary cases, the probability is evaluated by eigenvalue analysis. For nonstationary cases, we review a numerical integration scheme to calculate the statistics of the number of transitions.
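For the stationary case, the eigenvalue analysis can be sketched on the smallest nontrivial example. The Python sketch below assumes a two-state Markov jump process with rate a for the counted transition 0 -> 1 and rate b for 1 -> 0 (a hypothetical model chosen for illustration; the function names are also ours). Multiplying the counted transition's rate by e^s in the master-equation generator gives a tilted generator whose largest eigenvalue λ(s) is the scaled cumulant generating function; its derivatives at s = 0 give the long-time mean and variance of the number of counted transitions per unit time.

```python
import math


def tilted_generator(a, b, s):
    """Generator of dP/dt = L P for P = (p0, p1), with the counted jump 0 -> 1 weighted by exp(s)."""
    return [[-a, b],
            [a * math.exp(s), -b]]


def scgf(a, b, s):
    """Largest eigenvalue of the 2x2 tilted generator (scaled cumulant generating function)."""
    (m00, m01), (m10, m11) = tilted_generator(a, b, s)
    trace = m00 + m11
    det = m00 * m11 - m01 * m10
    # The discriminant equals (a - b)^2 + 4ab e^s > 0, so both eigenvalues are real.
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))


def cumulant_rates(a, b, h=1e-4):
    """Mean and variance of counted jumps per unit time, via finite differences of scgf at s = 0."""
    lp, l0, lm = scgf(a, b, h), scgf(a, b, 0.0), scgf(a, b, -h)
    mean = (lp - lm) / (2.0 * h)
    variance = (lp - 2.0 * l0 + lm) / (h * h)
    return mean, variance
```

For this model λ(0) = 0 (probability conservation), and the mean rate should equal a·b/(a+b), that is, a times the stationary probability of state 0; with a = 2 and b = 3 that is 1.2. Larger state spaces would need a general eigenvalue solver instead of the 2x2 closed form.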
Color Removal Considering Differences of Colors and Achromatic Color Preservation
Go TANAKA, Noriaki SUETAKE, Eiji UCHINO

LETTER-Image

Vol: E96-A, No: 11, Page(s): 2315-2317
In this letter, a novel color removal method is proposed that considers the differences between colors in an input color image and preserves achromatic colors. Achromatic color preservation means assigning the lightness values of achromatic pixels to their gray levels, which gives the result a natural impression. The effectiveness and validity of the proposed method are verified by experiments.
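The achromatic preservation step can be sketched as follows. This Python sketch assumes that "lightness" means CIELAB L* computed from sRGB with a D65 white point, and that a pixel counts as achromatic when its CIELAB chroma is below a hypothetical threshold of 5; both are our assumptions. Chromatic pixels are returned as None, standing in for the color-difference-aware mapping of the letter, which is not reproduced here.

```python
def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def achromatic_gray(rgb, chroma_threshold=5.0):
    """Gray level (0-255) taken from L* for achromatic pixels; None for chromatic pixels."""
    lightness, a, b = srgb_to_lab(rgb)
    if (a * a + b * b) ** 0.5 < chroma_threshold:
        return round(lightness * 255.0 / 100.0)  # map L* in [0, 100] onto [0, 255]
    return None
```

White and black map to 255 and 0, a mid-gray pixel keeps a gray level that follows its perceived lightness, and a saturated red is left to the chromatic branch.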
Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs
Shuai MU, Dongdong LI, Yubei CHEN, Yangdong DENG, Zhihua WANG

PAPER-Computer System

Vol: E96-D, No: 10, Page(s): 2194-2207
By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. However, many real-world applications, especially those following a stream processing pattern, feature interleaved task-pipelined and data parallelism. Current GPUs are ill-equipped for such applications because of underutilized computing resources and/or excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements that enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate task-pipelined and data parallelism in a unified manner. Cycle-accurate simulation of real-world applications shows that the proposed GPU microarchitecture improves computing throughput by 18% and reduces overall accesses to off-chip GPU memory by 13%.
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts
Masayuki SATO, Ryusuke EGAWA, Hiroyuki TAKIZAWA, Hiroaki KOBAYASHI

PAPER-Computer System

Vol: E96-D, No: 9, Page(s): 2047-2054
Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads on multiple integrated cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance gained from multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In the shared caches of CMPs, data fetched by one thread are frequently evicted by another thread. Such an eviction, called an inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage, which occurs when one cache is shared by threads demanding large cache capacities: if the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete for the limited capacity. To address inter-thread cache conflicts, both ITKOs and capacity shortage must be taken into account. This paper therefore proposes a capacity-aware thread scheduling method combined with cache partitioning, in which cache partitioning reduces the conflicts due to ITKOs and thread scheduling reduces those due to capacity shortage. The scheduler estimates the capacity demand of each thread with the estimation method used in the cache partitioning mechanism and, based on these estimates, decides which threads share each cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1% and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable for avoiding ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce inter-thread cache conflicts and hence improve performance.
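Once per-thread capacity demands have been estimated, the scheduling decision amounts to choosing which threads share a cache. The Python sketch below assumes two threads per shared cache, takes the demand estimates as given (the paper derives them from the cache partitioning mechanism's estimator, which is not reproduced), and uses a simple largest-with-smallest pairing rule of our own choosing rather than the paper's scheduler. The function name and the demand unit (cache ways) are hypothetical.

```python
def pair_threads(demands, cache_capacity):
    """Group threads two per shared cache so that the summed capacity demand stays low.

    demands: dict thread_id -> estimated capacity demand (e.g. in cache ways).
    Returns (thread_a, thread_b, total_demand, shortage) tuples, where shortage is
    True when the pair still exceeds cache_capacity.
    """
    if len(demands) % 2:
        raise ValueError("this sketch assumes an even number of threads")
    order = sorted(demands, key=demands.get)  # ascending demand
    pairs = []
    lo, hi = 0, len(order) - 1
    while lo < hi:
        # Pair the most demanding remaining thread with the least demanding one.
        a, b = order[hi], order[lo]
        total = demands[a] + demands[b]
        pairs.append((a, b, total, total > cache_capacity))
        lo += 1
        hi -= 1
    return pairs
```

For demands A = 12, B = 10, C = 3, and D = 2 on 16-way caches, pairing A with B would demand 22 ways and cause capacity shortage, whereas this rule yields (A, D) and (B, C), both within capacity.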
Extreme Maximum Margin Clustering
Chen ZHANG, ShiXiong XIA, Bing LIU, Lei ZHANG

PAPER-Artificial Intelligence, Data Mining

Vol: E96-D, No: 8, Page(s): 1745-1753
Maximum margin clustering (MMC) is a recently proposed clustering method that extends the large-margin computation of the support vector machine (SVM) to unsupervised learning. Traditionally, MMC is formulated as a nonconvex integer programming problem, which is difficult to solve. Several methods reformulate and relax the nonconvex optimization problem as semidefinite programming (SDP) or second-order cone programming (SOCP), which are computationally expensive and have difficulty handling large-scale data sets. In linear cases, several MMC methods that use the constrained concave-convex procedure (CCCP) and the cutting plane algorithm take linear time to converge to a local optimum, but in nonlinear cases the time complexity is still high. Since the extreme learning machine (ELM) achieves similar generalization performance at a much faster learning speed than the traditional SVM and the least-squares SVM (LS-SVM), we propose an extreme maximum margin clustering (EMMC) algorithm based on ELM. EMMC performs well in nonlinear cases. Moreover, because EMMC uses random feature mappings, its kernel parameters need not be tuned. Experimental results on several real-world data sets show that EMMC performs better than traditional MMC methods, especially in handling large-scale data sets.
Proximity Based Object Segmentation in Natural Color Images Using the Level Set Method
Tran Lan Anh NGUYEN, Gueesang LEE

PAPER-Image

Vol: E96-A, No: 8, Page(s): 1744-1751
Segmenting indicated objects from natural color images remains a challenging problem in image processing research. In this paper, a novel level set approach is presented to address this issue. In this segmentation algorithm, the user first initializes a contour that lies inside a particular region of the object of interest. The level set model is then applied to extract the object of arbitrary shape and size that contains this initial region. Constrained by the position of the initial contour, the proposed framework combines two energy terms, namely a local and a global energy, in its energy functional to control the movement of the contour toward object boundaries. These energy terms are mainly based on graph partitioning active contour models and Bhattacharyya flow, respectively. The Bhattacharyya flow describes the dissimilarity, that is, the correlative relationship, between the region of interest and its surroundings. Experimental results on our image collection show that the proposed method is accurate and performs as well as or better than a number of segmentation algorithms on various natural images.
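The Bhattacharyya term compares feature distributions inside and outside the contour. The Python sketch below computes only the Bhattacharyya coefficient between two normalized gray-level histograms; the 16-bin histogram and the function names are assumptions, and the level set evolution driven by this quantity is not shown. A coefficient near 0 means the region and its surroundings are well separated; a coefficient near 1 means they look alike.

```python
import math


def histogram(values, bins=16, max_value=256):
    """Normalized intensity histogram of a region (integer values in [0, max_value))."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // max_value, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint support."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```

A bright object on a dark background, for example, gives histograms with disjoint support and therefore a coefficient of 0.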
Using MathML Parallel Markup Corpora for Semantic Enrichment of Mathematical Expressions
Minh-Quoc NGHIEM, Giovanni YOKO KRISTIANTO, Akiko AIZAWA

PAPER-Data Engineering, Web Information Systems

Vol: E96-D, No: 8, Page(s): 1707-1715
This paper explores the problem of semantic enrichment of mathematical expressions. We formulate this task as the translation of mathematical expressions from presentation markup to content markup. We use MathML, an application of XML, to describe both the structure and the content of mathematical notation. We apply a method based on statistical machine translation to extract translation rules automatically. This approach contrasts with previous research, which tends to rely on manually encoded rules. We also introduce segmentation rules used to segment mathematical expressions. Combining segmentation rules and translation rules strengthens the translation system and achieves significant improvements over a prior rule-based system.
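To make the presentation-to-content translation task concrete, here is a toy translator for one expression shape. Its rules are hand-written and hypothetical; the paper's point is precisely that such rules are extracted automatically by statistical machine translation, so this Python sketch only shows what the input and output markup look like. Only n-ary operators are listed, so that "a + b + c" maps to a single apply element.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written rules: presentation operator text -> Content MathML element.
OPERATORS = {"+": "plus", "\u00d7": "times", "=": "eq"}


def leaf(tag, text):
    node = ET.Element(tag)
    node.text = text
    return node


def to_content(node):
    """Translate a tiny subset of Presentation MathML into Content MathML."""
    if node.tag == "mi":
        return leaf("ci", node.text)  # identifier -> content identifier
    if node.tag == "mn":
        return leaf("cn", node.text)  # number -> content number
    if node.tag == "mrow":
        children = list(node)
        if len(children) == 1:
            return to_content(children[0])
        operands, ops = children[0::2], children[1::2]
        symbols = {op.text for op in ops}
        if (len(children) % 2 == 0 or any(op.tag != "mo" for op in ops)
                or len(symbols) != 1 or not symbols.issubset(OPERATORS)):
            raise ValueError("only 'x op y op z' with one known operator is handled")
        apply = ET.Element("apply")
        ET.SubElement(apply, OPERATORS[symbols.pop()])
        for operand in operands:
            apply.append(to_content(operand))
        return apply
    raise ValueError(f"unsupported element: {node.tag}")
```

For example, the presentation markup <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow> becomes <apply><plus /><ci>x</ci><cn>1</cn></apply>, which records that the expression applies addition to x and 1 rather than merely how it is laid out.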
Active Breadcrumbs: Adaptive Distribution of In-Network Guidance Information for Content-Oriented Networks
Masayuki KAKIDA, Yosuke TANIGAWA, Hideki TODE

PAPER

Vol: E96-B, No: 7, Page(s): 1670-1679
Access loads on servers in content distribution networks have been increasing because of larger content sizes and higher request frequencies. Breadcrumbs (BC), an architecture that maintains guidance information for locating a content cache, is designed to reduce server load and to form a content-oriented network autonomously in cooperation with cached contents over an IP network. We previously proposed Breadcrumbs+, which solves BC's endless routing loop problem. However, Breadcrumbs takes only a passive approach: BC entries are created only when a content item is downloaded, and only at routers on the download path. We expect that simple, active, and adaptive control of guidance information improves its performance while keeping scalability. In this paper, we propose Active Breadcrumbs, which achieves efficient content retrieval and load balancing through active and adaptive control of guidance information by the cache nodes themselves. We show the effectiveness of Active Breadcrumbs through extensive computer simulations.
Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation
Zezhong LI, Hideto IKEDA, Junichi FUKUMOTO

PAPER-Natural Language Processing

Vol: E96-D, No: 7, Page(s): 1536-1543
In most phrase-based statistical machine translation (SMT) systems, the translation model relies on word alignment, which serves as a constraint for the subsequent building of a phrase table. Word alignment is usually inferred by GIZA++, which implements all the IBM models and the HMM model in the Expectation Maximization (EM) framework. In this paper, we present fully Bayesian inference for word alignment. Unlike the EM approach, Bayesian inference makes use of all possible parameter values rather than estimating a single parameter value, from which we expect more robust inference. After inferring the word alignment, current SMT systems usually train the phrase table from the Viterbi word alignment, which tends to learn incorrect phrases because of word alignment mistakes. To overcome this drawback, a new phrase extraction method is proposed based on multiple Gibbs samples from Bayesian inference for word alignment. Empirical results show promising improvements over baselines in alignment quality as well as in translation performance.
Revisiting Shared Cache Contention Problems: A Practical Hardware-Software Cooperative Approach
Eunji PAK, Sang-Hoon KIM, Jaehyuk HUH, Seungryoul MAENG

PAPER-Computer System

Vol: E96-D, No: 7, Page(s): 1457-1466
Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been extensively studied. Partitioning explicitly determines the cache capacity for each core to maximize overall throughput. Application scheduling by the operating system, on the other hand, groups the least interfering applications on each shared cache when a system has multiple shared caches. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited for some severe contentions. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique relies mostly on application scheduling. It selectively uses LRU insertion into the shared caches, which requires only negligible hardware changes to current commercial processor designs. For a complete evaluation of cache interference, this paper examines all possible mixes from a set of applications instead of just a few selected mixes. The evaluation shows that the proposed technique mitigates the cache contention problem effectively, coming close to ideal scheduling and partitioning.
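Why LRU insertion helps can be seen on a single cache set. In the Python sketch below (a toy model, not the paper's hardware; class and function names are hypothetical), core 0 cycles through three lines and core 1 streams lines it never reuses, in a 4-way set. Which core gets LRU insertion is hard-coded here, whereas in the paper that choice is coordinated with OS application scheduling.

```python
class SharedCacheSet:
    """One set of a shared LRU cache; lines are (core, tag) pairs and index 0 is the MRU position."""

    def __init__(self, ways, lru_insert_cores=()):
        self.ways = ways
        self.lru_insert_cores = set(lru_insert_cores)
        self.stack = []
        self.hits = {}

    def access(self, core, tag):
        line = (core, tag)
        if line in self.stack:
            self.stack.remove(line)
            self.stack.insert(0, line)  # promote to MRU on a hit
            self.hits[core] = self.hits.get(core, 0) + 1
            return True
        if len(self.stack) >= self.ways:
            self.stack.pop()  # evict the line at the LRU position
        if core in self.lru_insert_cores:
            self.stack.append(line)  # insert at the LRU position: first to go on the next miss
        else:
            self.stack.insert(0, line)  # conventional MRU insertion
        return False


def core0_hits(lru_insert_cores, steps=30, ways=4):
    """Hits of core 0 when it shares one set with a streaming core 1."""
    cache = SharedCacheSet(ways, lru_insert_cores)
    for i in range(steps):
        cache.access(0, i % 3)     # core 0: small working set that is reused
        cache.access(1, 1000 + i)  # core 1: streaming accesses, never reused
    return cache.hits.get(0, 0)
```

With conventional MRU insertion for both cores, the streaming core pushes core 0's lines out and core 0 never hits; with LRU insertion for core 1, core 0 hits on every access after its three compulsory misses.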
Bayesian Theory Based Adaptive Proximity Data Accessing for CMP Caches
Guohong LI, Zhenyu LIU, Sanchuan GUO, Dongsheng WANG

PAPER

Vol: E96-A, No: 6, Page(s): 1293-1305
As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making better use of the total available cache capacity, but they induce higher overall L1 miss latencies because of the longer average distance between the requestor and the home node and potential congestion at certain nodes. We observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, the long-distance accesses to the home nodes can potentially be avoided. To exploit this property, we propose Bayesian Theory based Adaptive Proximity Data Accessing (APDA). In our proposal, we organize the multi-core into clusters of 2x2 nodes and introduce the Proximity Data Prober (PDP) to detect whether an L1 miss can be served by one of the L1 caches in the cluster. Furthermore, we devise the Bayesian Decision Classifier (BDC) to adaptively select either the remote L2 cache or the neighboring L1 node as the server so as to minimize the miss cost. We evaluate this approach on a 64-node multi-core using the SPLASH-2 and PARSEC benchmarks and find that APDA reduces execution time by 20% and energy by 14% compared to a standard multi-core with a shared L2. The experimental results demonstrate that our proposal outperforms state-of-the-art mechanisms such as ASR, DCC, and RNUCA.
A High-Speed Trace-Driven Cache Configuration Simulator for Dual-Core Processor L1 Caches
Masashi TAWADA, Masao YANAGISAWA, Nozomu TOGAWA

PAPER

Vol: E96-A, No: 6, Page(s): 1283-1292
Multi-core processors are now often used in embedded systems. Since the application programs running on an embedded system are quite limited, there must exist an optimal cache memory configuration in terms of power and area. Simulating the application programs on various cache configurations is one of the best ways to determine the optimal one. Multi-core cache configuration simulation, however, is much more complicated and time-consuming than single-core cache configuration simulation. In this paper, we propose a very fast dual-core L1 cache configuration simulation algorithm. We first propose a new data structure in which a single instance represents two or more multi-core cache configurations with different cache associativities. We then propose a new multi-core cache configuration simulation algorithm that uses this data structure together with new theorems. Experimental results demonstrate that our algorithm obtains exact simulation results while running 20 times faster than a conventional approach.
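The idea that one data structure can stand for several associativities has a well-known single-core counterpart in LRU stack-distance simulation: for a fixed number of sets and line size, a line found at depth d of its set's LRU stack hits in every LRU cache with more than d ways. The Python sketch below uses that property to obtain hit counts for all associativities in one pass. It is not the paper's dual-core algorithm, and the function names are hypothetical; a straightforward one-configuration simulator is included for cross-checking.

```python
def simulate_all_associativities(trace, num_sets, line_size, max_ways):
    """One pass over an address trace giving LRU hit counts for every associativity 1..max_ways."""
    stacks = [[] for _ in range(num_sets)]
    hits_at_depth = [0] * max_ways
    for addr in trace:
        line = addr // line_size
        stack = stacks[line % num_sets]  # per-set LRU stack, index 0 = MRU
        if line in stack:
            depth = stack.index(line)
            hits_at_depth[depth] += 1  # hit for every associativity > depth
            stack.pop(depth)
        stack.insert(0, line)
        del stack[max_ways:]  # deeper entries can never hit in any configuration of interest
    hits, running = {}, 0
    for ways in range(1, max_ways + 1):
        running += hits_at_depth[ways - 1]
        hits[ways] = running
    return hits


def simulate_one(trace, num_sets, line_size, ways):
    """Reference: simulate a single LRU configuration directly."""
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        line = addr // line_size
        s = sets[line % num_sets]
        if line in s:
            hits += 1
            s.remove(line)
        elif len(s) == ways:
            s.pop()  # evict the LRU line
        s.insert(0, line)
    return hits
```

A single call to simulate_all_associativities replaces max_ways separate runs of simulate_one; the two should agree exactly for every associativity.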
Bidirectional Local Template Patterns: An Effective and Discriminative Feature for Pedestrian Detection
Jiu XU, Ning JIANG, Satoshi GOTO

PAPER

Vol: E96-A, No: 6, Page(s): 1204-1213
In this paper, a novel feature named bidirectional local template patterns (B-LTP) is proposed for pedestrian detection in still images. B-LTP combines and modifies two features: the histogram of templates (HOT) and center-symmetric local binary patterns (CS-LBP). For each pixel, B-LTP defines four templates, each of which contains the pixel itself and two neighboring center-symmetric pixels. For each template, it then calculates information from the relationships among these three pixels and from the two directional transitions across them. Moreover, because the feature length of B-LTP is small, it requires less memory and computational power. Experimental results on the INRIA dataset show that the proposed B-LTP feature outperforms other features such as the histogram of oriented gradients (HOG), HOT, and the covariance matrix (COV) in both speed and detection rate.
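B-LTP builds on CS-LBP, whose center-symmetric comparisons are easy to state in code. The Python sketch below computes the standard 4-bit CS-LBP code of a pixel from its 8-neighborhood and lists the four center-symmetric (neighbor, center, neighbor) triples that correspond to the templates described above. How B-LTP turns each triple into its final code (the HOT-style relations and the two directional transitions) is not specified in the abstract and is not attempted here; the threshold parameter and the function names are assumptions.

```python
# Offsets of the 8 neighbors, ordered so that OFFSETS[i] and OFFSETS[i + 4] are center-symmetric.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def cs_lbp(img, r, c, threshold=0):
    """Center-symmetric LBP code (0..15) of pixel (r, c) in a 2-D list of intensities."""
    code = 0
    for i in range(4):
        dr1, dc1 = OFFSETS[i]
        dr2, dc2 = OFFSETS[i + 4]
        # One bit per center-symmetric pair: set if the first neighbor is brighter by more than threshold.
        if img[r + dr1][c + dc1] - img[r + dr2][c + dc2] > threshold:
            code |= 1 << i
    return code


def templates(img, r, c):
    """The four (neighbor, center, opposite neighbor) intensity triples around pixel (r, c)."""
    return [(img[r + OFFSETS[i][0]][c + OFFSETS[i][1]],
             img[r][c],
             img[r + OFFSETS[i + 4][0]][c + OFFSETS[i + 4][1]]) for i in range(4)]
```

On a flat patch every comparison fails and the code is 0; on a left-to-right intensity ramp only the horizontal and one diagonal pair fire, giving code 3.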
A Design of Low Latency Random Access Preamble Detector for LTE Uplink Receiver
Joohyun LEE, Bontae KOO, Hyuckjae LEE

PAPER-Transmission Systems and Transmission Equipment for Communications

Vol: E96-B, No: 5, Page(s): 1089-1096
This paper presents a hardware design of a high-throughput, low-latency preamble detector for the 3GPP LTE physical random access channel (PRACH) receiver. The presented PRACH receiver uses a pipelined structure to improve the throughput of power delay profile (PDP) generation, which is executed multiple times during preamble detection. In addition, to reduce detection latency, we propose an instantaneous preamble detection method for both restricted and unrestricted sets. The proposed method detects all existing preambles directly and instantaneously from the PDP output while performing PDP combining for the restricted set. PDP combining enables the PRACH receiver to detect preambles robustly even under a severe Doppler effect or frequency error. With the proposed method, the worst-case preamble detection latency can be less than 1 ms at a 136 MHz clock, and the proposed PRACH receiver can be implemented with approximately 237k equivalent ASIC gates or 30.2% of an xc6vlx130t FPGA device.
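PDP generation is essentially a circular correlation between the received samples and a root Zadoff-Chu sequence, with the detected cyclic shift read off the correlation peak. The Python sketch below shows this on an ideal, noise-free preamble. It uses the LTE root sequence definition x_u(n) = exp(-jπun(n+1)/N_ZC) and cyclic shift x_u((n + C_v) mod N_ZC), a hypothetical root index u = 7, and N_ZC = 139 (the short preamble format length; formats 0 to 3 use 839). It evaluates the correlation directly in O(N²) for clarity, which is not how a pipelined hardware receiver would compute it, and it ignores noise, Doppler, and PDP combining.

```python
import cmath


def zadoff_chu(u, n_zc):
    """Root Zadoff-Chu sequence x_u(n) = exp(-j*pi*u*n*(n+1)/N_ZC)."""
    return [cmath.exp(-1j * cmath.pi * u * n * (n + 1) / n_zc) for n in range(n_zc)]


def cyclic_shift(seq, shift):
    """Preamble derived from a root sequence: x_{u,v}(n) = x_u((n + C_v) mod N_ZC)."""
    n = len(seq)
    return [seq[(m + shift) % n] for m in range(n)]


def power_delay_profile(received, root):
    """|circular cross-correlation|^2 of the received samples with the root sequence, per lag."""
    n = len(root)
    return [abs(sum(received[m] * root[(m + k) % n].conjugate() for m in range(n))) ** 2
            for k in range(n)]


def detect_shift(pdp):
    """Lag of the strongest PDP peak (a real detector would also compare it against a threshold)."""
    return max(range(len(pdp)), key=pdp.__getitem__)
```

Because a prime-length Zadoff-Chu sequence has ideal periodic autocorrelation, the PDP of a preamble with cyclic shift 42 has a single peak of height N_ZC² at lag 42 and is essentially zero elsewhere.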
Link Analysis Based on Rhetorical Relations for Multi-Document Summarization
Nik Adilah Hanin BINTI ZAHRI, Fumiyo FUKUMOTO, Suguru MATSUYOSHI

PAPER-Natural Language Processing

Vol: E96-D, No: 5, Page(s): 1182-1191
This paper presents link analysis based on rhetorical relations for extractive summarization of multiple documents. We first extracted sentences containing salient terms from each document using a statistical model. We then ranked the extracted sentences by measuring their relative importance according to their connectivity among the sentences in the document set, using PageRank based on the rhetorical relations. The rhetorical relations were examined beforehand to determine which relations are crucial to this task, and the relations among sentences from the documents were identified automatically by SVMs. We used the relations to emphasize important sentences during PageRank sentence ranking and to eliminate redundancy from the summary candidates. Our framework does not require sentences fully annotated by humans, and the evaluation results show that combining PageRank with rhetorical relations helps improve the quality of extractive summarization.
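The ranking step can be sketched as PageRank on a weighted sentence graph. In the Python sketch below, nodes are sentences and each directed edge carries a weight that could reflect how important the rhetorical relation behind it is; the weighting, the edge direction, the damping factor of 0.85, and the function name are assumptions for illustration, not values from the paper.

```python
def weighted_pagerank(num_nodes, edges, damping=0.85, iterations=100):
    """PageRank over a weighted directed sentence graph by power iteration.

    edges: list of (src, dst, weight); each node's outgoing weights are normalized to sum to 1.
    """
    out_weight = [0.0] * num_nodes
    for src, _, w in edges:
        out_weight[src] += w
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[i] for i in range(num_nodes) if out_weight[i] == 0.0)
        new = [(1.0 - damping + damping * dangling) / num_nodes] * num_nodes
        for src, dst, w in edges:
            new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank
```

The scores always sum to 1, and a sentence that receives strong relations from many others (node 0 in a graph where nodes 1, 2, and 3 all point to it) ends up ranked highest.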
Accurate Permittivity Estimation Method with Iterative Waveform Correction for UWB Internal Imaging Radar
Ryunosuke SOUMA, Shouhei KIDERA, Tetsuo KIRIMOTO

PAPER-Electromagnetic Theory

Vol: E96-C, No: 5, Page(s): 730-737
Ultra-wideband (UWB) pulse radar offers high range resolution and good penetration into dielectric media, and has great potential for non-destructive inspection and early-stage detection of breast cancer. Extended range points migration (RPM) has been developed as an accurate, high-resolution imaging method for targets embedded in a dielectric medium. Although this method provides an accurate internal target image in a homogeneous medium, it assumes that the permittivity of the medium is known, which is not practical for general applications. Various permittivity estimation methods exist, but they are either unsuitable for clear dielectric boundaries such as walls or inapplicable to dielectric media of unknown, arbitrary shape. To overcome these drawbacks, we propose a permittivity estimation method suitable for various shapes of dielectric media with a clear boundary, in which the dielectric boundary points and their normal vectors are accurately determined by the original RPM method. In addition, our method iteratively compensates for the scattered waveform deformation using the finite-difference time-domain (FDTD) method to enhance the accuracy of the permittivity estimation. Numerical simulation results demonstrate that our method achieves accurate permittivity estimation even for a dielectric medium of wavelength size.
Pegasos Algorithm for One-Class Support Vector Machine
Changki LEE

LETTER-Artificial Intelligence, Data Mining

Vol: E96-D, No: 5, Page(s): 1223-1226
Training one-class support vector machines (one-class SVMs) involves solving a quadratic programming (QP) problem. As the number of training samples increases, solving this QP problem becomes intractable. In this paper, we describe a modified Pegasos algorithm for fast training of one-class SVMs. We show that, with a linear kernel, this algorithm is much faster than the standard one-class SVM without loss of performance.
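The abstract does not spell out the one-class modification, so the Python sketch below shows only the standard Pegasos update for a binary linear SVM without a bias term (Shalev-Shwartz et al.), which such a modification would start from. At step t it samples one example, uses step size 1/(λt), shrinks w, adds the hinge-loss sub-gradient when the margin is violated, and optionally projects w onto the ball of radius 1/√λ. The parameter values, the function names, and the toy data are hypothetical.

```python
import math
import random


def pegasos(samples, labels, lam=0.01, iterations=2000, seed=0):
    """Pegasos stochastic sub-gradient solver for a linear SVM without bias.

    Objective: lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)), with labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))  # uses w before this step's update
        # Gradient step on the regularizer: shrink w by (1 - eta * lam) = (1 - 1/t).
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss sub-gradient, only when the sampled example violates the margin.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
        # Optional projection onto the ball of radius 1/sqrt(lam).
        norm = math.sqrt(sum(wj * wj for wj in w))
        limit = 1.0 / math.sqrt(lam)
        if norm > limit:
            w = [wj * limit / norm for wj in w]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Each iteration touches a single example, so the cost per step does not grow with the number of training samples, which is the property that makes Pegasos attractive when the QP becomes intractable.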
A Bayesian Framework Using Multiple Model Structures for Speech Recognition
Sayaka SHIOTA, Kei HASHIMOTO, Yoshihiko NANKAKU, Keiichi TOKUDA

PAPER-Speech and Hearing

Vol: E96-D, No: 4, Page(s): 939-948
This paper proposes an acoustic modeling technique based on a Bayesian framework using multiple model structures for speech recognition. The aim of the Bayesian approach is to obtain good predictions of observations by marginalizing all variables related to the generative process. Although the effectiveness of marginalizing model parameters has recently been reported in speech recognition, most of these systems use only “one” model structure, e.g., the topologies of HMMs, the number of states and mixtures, the types of state output distributions, and parameter tying structures. However, a single structure is insufficient to represent the true model distribution, because in most practical cases such a family of models does not include the true distribution. One solution to this problem is to use multiple model structures. Although several approaches using multiple model structures have already been proposed, consistent integration of multiple model structures based on the Bayesian approach has not been seen in speech recognition. This paper focuses on integrating multiple phonetic decision trees based on the Bayesian framework in HMM-based acoustic modeling. The proposed method is derived from a new marginal likelihood function that includes the model structures as a latent variable in addition to HMM state sequences and model parameters, and the posterior distributions of these latent variables are obtained using the variational Bayesian method. Furthermore, to improve the optimization algorithm, the deterministic annealing EM (DAEM) algorithm is applied to the training process. The proposed method effectively utilizes multiple model structures, especially in the early stage of training, which leads to better predictive distributions and improved recognition performance.
Mode-Matching Analysis of a Coaxially-Driven Finite Monopole Based on a Variable Bound Approach
Young Seung LEE, Seung Keun PARK

PAPER-Antennas and Propagation

Vol: E96-B, No: 4, Page(s): 994-1000
The problem of a finite monopole antenna driven by a coaxial cable is revisited. On the basis of a variable bound approach, the radiated field around a monopole antenna can be represented as a discrete modal summation. This theoretical model avoids the difficulties encountered with integral equations having different wavenumber spectra and yields a solution in a convergent series form, which makes it numerically efficient. The input admittance and current distribution that characterize the monopole antenna are shown for different coaxial-antenna geometries and compared with other existing results.
