The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] ACH(1072hit)

441-460hit(1072hit)

  • Region-Based Way-Partitioning on L1 Data Cache for Low Power

    Zhong ZHENG  Zhiying WANG  Li SHEN  

     
    LETTER-Computer System

      Vol:
    E96-D No:11
      Page(s):
    2466-2469

    Power consumption has become a critical factor for embedded systems, especially for battery powered ones. Caches in these systems consume a large portion of the whole chip power. Embedded systems usually adopt set-associative caches to get better performance. However, parallel accessed cache ways incur more energy dissipation. This paper proposed a region-based way-partitioning scheme to reduce cache way access, and without sacrificing performance, to reduce the cache power consumption. The stack accesses and non-stack accesses are isolated and redirected to different ways of the L1 data cache. Under way-partitioning, cache way accesses are reduced, as well as the memory reference interference. Experimental results show that the proposed approach could save around 27.5% of L1 data cache energy on average, without significant performance degradation.

  • Basics of Counting Statistics Open Access

    Jun OHKUBO  

     
    INVITED PAPER

      Vol:
    E96-B No:11
      Page(s):
    2733-2740

    In this paper, we briefly review the scheme of counting statistics, in which a probability of the number of monitored or target transitions in a Markov jump process is evaluated. It is generally easy to construct a master equation for the Markov jump process, and the counting statistics enables us to straightforwardly obtain basic equations of the counting statistics from the master equation; the basic equation is used to calculate the cumulant generating function of the probability of the number of target transitions. For stationary cases, the probability is evaluated from the eigenvalue analysis. As for the nonstationary cases, we review a numerical integration scheme to calculate the statistics of the number of transitions.

  • Color Removal Considering Differences of Colors and Achromatic Color Preservation

    Go TANAKA  Noriaki SUETAKE  Eiji UCHINO  

     
    LETTER-Image

      Vol:
    E96-A No:11
      Page(s):
    2315-2317

    In this letter, a novel color removal method considering differences of colors in an input color image and achromatic color preservation is proposed. The achromatic color preservation is assigning lightness values to gray-levels concerning achromatic pixels for natural impression. The effectiveness and validity of the proposed method are verified by experiments.

  • Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

    Shuai MU  Dongdong LI  Yubei CHEN  Yangdong DENG  Zhihua WANG  

     
    PAPER-Computer System

      Vol:
    E96-D No:10
      Page(s):
    2194-2207

    By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general purpose computing platform. Many real-world applications especially those following a stream processing pattern, however, feature interleaved task-pipelined and data parallelism. Current GPUs are ill equipped for such applications due to the insufficient usage of computing resources and/or the excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements to enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate both task-pipeline and data parallelisms in a unified manner. Simulation results derived by a cycle-accurate simulator on real-world applications prove that the proposed GPU microarchitecture improves the computing throughput by 18% and reduces the overall accesses to off-chip GPU memory by 13%.

  • A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System

      Vol:
    E96-D No:9
      Page(s):
    2047-2054

    Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads using integrated multiple cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement by multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread. Such an eviction, called inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage that occurs when one cache is shared by threads demanding large cache capacities. If the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete to use the limited cache capacity, resulting in capacity shortage. To address inter-thread cache conflicts, we must take into account both ITKOs and capacity shortage. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning. In the proposed method, inter-thread cache conflicts due to ITKOs and capacity shortage are decreased by cache partitioning and thread scheduling, respectively. The proposed scheduling method estimates the capacity demand of each thread with an estimation method used in the cache partitioning mechanism. Based on the estimation used for cache partitioning, the thread scheduler decides thread combinations sharing one cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1%, and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable to avoid both ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce the inter-thread cache conflicts and hence improve performance.

  • Extreme Maximum Margin Clustering

    Chen ZHANG  ShiXiong XIA  Bing LIU  Lei ZHANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E96-D No:8
      Page(s):
    1745-1753

    Maximum margin clustering (MMC) is a newly proposed clustering method that extends the large-margin computation of support vector machine (SVM) to unsupervised learning. Traditionally, MMC is formulated as a nonconvex integer programming problem which makes it difficult to solve. Several methods rely on reformulating and relaxing the nonconvex optimization problem as semidefinite programming (SDP) or second-order cone program (SOCP), which are computationally expensive and have difficulty handling large-scale data sets. In linear cases, by making use of the constrained concave-convex procedure (CCCP) and cutting plane algorithm, several MMC methods take linear time to converge to a local optimum, but in nonlinear cases, time complexity is still high. Since extreme learning machine (ELM) has achieved similar generalization performance at much faster learning speed than traditional SVM and LS-SVM, we propose an extreme maximum margin clustering (EMMC) algorithm based on ELM. It can perform well in nonlinear cases. Moreover, the kernel parameters of EMMC need not be tuned by means of random feature mappings. Experimental results on several real-world data sets show that EMMC performs better than traditional MMC methods, especially in handling large-scale data sets.

  • Proximity Based Object Segmentation in Natural Color Images Using the Level Set Method

    Tran Lan Anh NGUYEN  Gueesang LEE  

     
    PAPER-Image

      Vol:
    E96-A No:8
      Page(s):
    1744-1751

    Segmenting indicated objects from natural color images remains a challenging problem for researches of image processing. In this paper, a novel level set approach is presented, to address this issue. In this segmentation algorithm, a contour that lies inside a particular region of the concerned object is first initialized by a user. The level set model is then applied, to extract the object of arbitrary shape and size containing this initial region. Constrained on the position of the initial contour, our proposed framework combines two particular energy terms, namely local and global energy, in its energy functional, to control movement of the contour toward object boundaries. These energy terms are mainly based on graph partitioning active contour models and Bhattacharyya flow, respectively. Its flow describes dissimilarities, measuring correlative relationships between the region of interest and surroundings. The experimental results obtained from our image collection show that the suggested method yields accurate and good performance, or better than a number of segmentation algorithms, when applied to various natural images.

  • Using MathML Parallel Markup Corpora for Semantic Enrichment of Mathematical Expressions

    Minh-Quoc NGHIEM  Giovanni YOKO KRISTIANTO  Akiko AIZAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E96-D No:8
      Page(s):
    1707-1715

    This paper explores the problem of semantic enrichment of mathematical expressions. We formulate this task as the translation of mathematical expressions from presentation markup to content markup. We use MathML, an application of XML, to describe both the structure and content of mathematical notations. We apply a method based on statistical machine translation to extract translation rules automatically. This approach contrasts with previous research, which tends to rely on manually encoded rules. We also introduce segmentation rules used to segment mathematical expressions. Combining segmentation rules and translation rules strengthens the translation system and archives significant improvements over a prior rule-based system.

  • Active Breadcrumbs: Adaptive Distribution of In-Network Guidance Information for Content-Oriented Networks

    Masayuki KAKIDA  Yosuke TANIGAWA  Hideki TODE  

     
    PAPER

      Vol:
    E96-B No:7
      Page(s):
    1670-1679

    Lately, access loads on servers are increasing due to larger content size and higher request frequency in content distribution networks. Breadcrumbs (BC), an architecture with guidance information for locating a content cache, is designed to reduce the server load and to form content-oriented network autonomously in cooperation with cached contents over IP network. We also proposed Breadcrumbs+ which solves BC's endless routing loop problem. However, Breadcrumbs takes only a passive approach; BC entries are created only when a content is downloaded and only at routers on the download path but not at any other routers. We expect that active and adaptive control of guidance information with simple complexity improves its performance with keeping scalability. In this paper, we propose Active Breadcrumbs which achieves efficient content retrieval and load-balancing through active and adaptive control of guidance information by cache-nodes themselves. In addition, we show the effectiveness of Active Breadcrumbs through the extensive computer simulation.

  • Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation

    Zezhong LI  Hideto IKEDA  Junichi FUKUMOTO  

     
    PAPER-Natural Language Processing

      Vol:
    E96-D No:7
      Page(s):
    1536-1543

    In most phrase-based statistical machine translation (SMT) systems, the translation model relies on word alignment, which serves as a constraint for the subsequent building of a phrase table. Word alignment is usually inferred by GIZA++, which implements all the IBM models and HMM model in the framework of Expectation Maximum (EM). In this paper, we present a fully Bayesian inference for word alignment. Different from the EM approach, the Bayesian inference makes use of all possible parameter values rather than estimating a single parameter value, from which we expect a more robust inference. After inferring the word alignment, current SMT systems usually train the phrase table from Viterbi word alignment, which is prone to learn incorrect phrases due to the word alignment mistakes. To overcome this drawback, a new phrase extraction method is proposed based on multiple Gibbs samples from Bayesian inference for word alignment. Empirical results show promising improvements over baselines in alignment quality as well as the translation performance.

  • Revisiting Shared Cache Contention Problems: A Practical Hardware-Software Cooperative Approach

    Eunji PAK  Sang-Hoon KIM  Jaehyuk HUH  Seungryoul MAENG  

     
    PAPER-Computer System

      Vol:
    E96-D No:7
      Page(s):
    1457-1466

    Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been extensively studied. Partitioning explicitly determines cache capacity for each core to maximize the overall throughput. On the other hand, application scheduling by operating systems groups the least interfering applications for each shared cache, when multiple shared caches exist in systems. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited for some severe contentions. This paper proposes a low cost solution, based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique mostly relies on application scheduling. It selectively uses LRU insertion to the shared caches, which can be added with negligible hardware changes from the current commercial processor designs. For the completeness of cache interference evaluation, this paper examines all possible mixes from a set of applications, instead of using a just few selected mixes. The evaluation shows that the proposed technique can mitigate the cache contention problem effectively, close to the ideal scheduling and partitioning.

  • Bayesian Theory Based Adaptive Proximity Data Accessing for CMP Caches

    Guohong LI  Zhenyu LIU  Sanchuan GUO  Dongsheng WANG  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1293-1305

    As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they induce higher overall L1 miss latencies because of the longer average distance between the requestor and the home node, and the potential congestions at certain nodes. We observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. In order to leverage the aforementioned property, we propose Bayesian Theory based Adaptive Proximity Data Accessing (APDA). In our proposal, we organize the multi-core into clusters of 2x2 nodes, and introduce the Proximity Data Prober (PDP) to detect whether an L1 miss can be served by one of the cluster L1 caches. Furthermore, we devise the Bayesian Decision Classifier (BDC) to adaptively select the remote L2 cache or the neighboring L1 node as the server according to the minimum miss cost. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the APDA can reduce the execution time by 20% and reduce the energy by 14% compared to a standard multi-core with a shared L2. The experimental results demonstrate that our proposal outperforms the up-to-date mechanisms, such as ASR, DCC and RNUCA.

  • A High-Speed Trace-Driven Cache Configuration Simulator for Dual-Core Processor L1 Caches

    Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1283-1292

    Recently, multi-core processors are used in embedded systems very often. Since application programs is much limited running on embedded systems, there must exists an optimal cache memory configuration in terms of power and area. Simulating application programs on various cache configurations is one of the best options to determine the optimal one. Multi-core cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation. In this paper, we propose a very fast dual-core L1 cache configuration simulation algorithm. We first propose a new data structure where just a single data structure represents two or more multi-core cache configurations with different cache associativities. After that, we propose a new multi-core cache configuration simulation algorithm using our new data structure associated with new theorems. Experimental results demonstrate that our algorithm obtains exact simulation results but runs 20 times faster than a conventional approach.

  • Bidirectional Local Template Patterns: An Effective and Discriminative Feature for Pedestrian Detection

    Jiu XU  Ning JIANG  Satoshi GOTO  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1204-1213

    In this paper, a novel feature named bidirectional local template patterns (B-LTP) is proposed for use in pedestrian detection in still images. B-LTP is a combination and modification of two features, histogram of templates (HOT) and center-symmetric local binary patterns (CS-LBP). For each pixel, B-LTP defines four templates, each of which contains the pixel itself and two neighboring center-symmetric pixels. For each template, it then calculates information from the relationships among these three pixels and from the two directional transitions across these pixels. Moreover, because the feature length of B-LTP is small, it consumes less memory and computational power. Experimental results on an INRIA dataset show that the speed and detection rate of our proposed B-LTP feature outperform those of other features such as histogram of orientated gradient (HOG), HOT, and covariance matrix (COV).

  • A Design of Low Latency Random Access Preamble Detector for LTE Uplink Receiver

    Joohyun LEE  Bontae KOO  Hyuckjae LEE  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Vol:
    E96-B No:5
      Page(s):
    1089-1096

    This paper presents a hardware design of high throughput, low latency preamble detector for 3GPP LTE physical random access channel (PRACH) receiver. The presented PRACH receiver uses the pipelined structure to improve the throughput of power delay profile (PDP) generation which is executed multiple times during the preamble detection. In addition, to reduce detection latency, we propose an instantaneous preamble detection method for both restricted and unrestricted set. The proposed preamble detection method can detect all existing preambles directly and instantaneously from PDP output while conducting PDP combining for restricted set. The PDP combining enables the PRACH receiver to detect preambles robustly even in severe Doppler effect or frequency error exist. Using proposed method, the worst case preamble detection latency time can be less than 1 ms with 136 MHz clock and the proposed PRACH receiver can be implemented with approximately 237k equivalent ASIC gates count or occupying 30.2% of xc6vlx130t FPGA device.

  • Link Analysis Based on Rhetorical Relations for Multi-Document Summarization

    Nik Adilah Hanin BINTI ZAHRI  Fumiyo FUKUMOTO  Suguru MATSUYOSHI  

     
    PAPER-Natural Language Processing

      Vol:
    E96-D No:5
      Page(s):
    1182-1191

    This paper presents link analysis based on rhetorical relations with the aim of performing extractive summarization for multiple documents. We first extracted sentences with salient terms from individual document using statistical model. We then ranked the extracted sentences by measuring their relative importance according to their connectivity among the sentences in the document set using PageRank based on the rhetorical relations. The rhetorical relations were examined beforehand to determine which relations are crucial to this task, and the relations among sentences from documents were automatically identified by SVMs. We used the relations to emphasize important sentences during sentence ranking by PageRank and eliminate redundancy from the summary candidates. Our framework omits fully annotated sentences by humans and the evaluation results show that the combination of PageRank along with rhetorical relations does help to improve the quality of extractive summarization.

  • Accurate Permittivity Estimation Method with Iterative Waveform Correction for UWB Internal Imaging Radar

    Ryunosuke SOUMA  Shouhei KIDERA  Tetsuo KIRIMOTO  

     
    PAPER-Electromagnetic Theory

      Vol:
    E96-C No:5
      Page(s):
    730-737

    Ultra-wideband (UWB) pulse radar has high range resolution and permeability in a dielectric medium, and has great potential for the non-destructive inspection or early-stage detection of breast cancer. As an accurate and high-resolution imaging method for targets embedded in a dielectric medium, extended range points migration (RPM) has been developed. Although this method offers an accurate internal target image in a homogeneous media, it assumes the permittivity of the dielectric medium is given, which is not practical for general applications. Although there are various permittivity estimation methods, they have essential problems that are not suitable for clear, dielectric boundaries like walls, or is not applicable to an unknown and arbitrary shape of dielectric medium. To overcome the above drawbacks, we newly propose a permittivity estimation method suitable for various shapes of dielectric media with a clear boundary, where the dielectric boundary points and their normal vectors are accurately determined by the original RPM method. In addition, our method iteratively compensates for the scattered waveform deformation using a finite-difference time domain (FDTD) method to enhance the accuracy of the permittivity estimation. Results from a numerical simulation demonstrate that our method achieves accurate permittivity estimation even for a dielectric medium of wavelength size.

  • Pegasos Algorithm for One-Class Support Vector Machine

    Changki LEE  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E96-D No:5
      Page(s):
    1223-1226

    Training one-class support vector machines (one-class SVMs) involves solving a quadratic programming (QP) problem. By increasing the number of training samples, solving this QP problem becomes intractable. In this paper, we describe a modified Pegasos algorithm for fast training of one-class SVMs. We show that this algorithm is much faster than the standard one-class SVM without loss of performance in the case of linear kernel.

  • A Bayesian Framework Using Multiple Model Structures for Speech Recognition

    Sayaka SHIOTA  Kei HASHIMOTO  Yoshihiko NANKAKU  Keiichi TOKUDA  

     
    PAPER-Speech and Hearing

      Vol:
    E96-D No:4
      Page(s):
    939-948

    This paper proposes an acoustic modeling technique based on Bayesian framework using multiple model structures for speech recognition. The aim of the Bayesian approach is to obtain good prediction of observation by marginalizing all variables related to generative processes. Although the effectiveness of marginalizing model parameters was recently reported in speech recognition, most of these systems use only “one” model structure, e.g., topologies of HMMs, the number of states and mixtures, types of state output distributions, and parameter tying structures. However, it is insufficient to represent a true model distribution, because a family of such models usually does not include a true distribution in most practical cases. One of solutions of this problem is to use multiple model structures. Although several approaches using multiple model structures have already been proposed, the consistent integration of multiple model structures based on the Bayesian approach has not seen in speech recognition. This paper focuses on integrating multiple phonetic decision trees based on the Bayesian framework in HMM based acoustic modeling. The proposed method is derived from a new marginal likelihood function which includes the model structures as a latent variable in addition to HMM state sequences and model parameters, and the posterior distributions of these latent variables are obtained using the variational Bayesian method. Furthermore, to improve the optimization algorithm, the deterministic annealing EM (DAEM) algorithm is applied to the training process. The proposed method effectively utilizes multiple model structures, especially in the early stage of training and this leads to better predictive distributions and improvement of recognition performance.

  • Mode-Matching Analysis of a Coaxially-Driven Finite Monopole Based on a Variable Bound Approach

    Young Seung LEE  Seung Keun PARK  

     
    PAPER-Antennas and Propagation

      Vol:
    E96-B No:4
      Page(s):
    994-1000

    The problem of a finite monopole antenna driven by a coaxial cable is revisited. On the basis of a variable bound approach, the radiated field around a monopole antenna can be represented in terms of the discrete modal summation. This theoretical model allows us to avoid the difficulties experienced when dealing with integral equations having different wavenumber spectra and ensures a solution in a convergent series form so that it is numerically efficient. The behaviors of the input admittance and the current distribution to characterize the monopole antenna are shown for different coaxial-antenna geometries and also compared with other existing results.

441-460hit(1072hit)