Hiroshi NAKAMURA Weihan WANG Yuya OHTA Kimiyoshi USAMI Hideharu AMANO Masaaki KONDO Mitaro NAMIKI
Power consumption has recently emerged as a first-class design constraint in system LSI designs. In particular, leakage power now accounts for a large part of the total power consumption. Therefore, reducing leakage power is indispensable for the efficient design of high-performance system LSIs. Since 2006, we have carried out a research project called “Innovative Power Control for Ultra Low-Power and High-Performance System LSIs”, supported by the Japan Science and Technology Agency as a CREST research program. One of the major objectives of this project is to reduce the leakage power consumption of system LSIs by innovative power control through tight cooperation and co-optimization of circuit technology, architecture, and system software designs. In this project, we focused on power gating as a circuit technique for reducing leakage power. Temporal granularity is one of the most important issues in power gating. Thus, we have developed a series of Geysers as proof-of-concept CPUs which provide several mechanisms of fine-grained run-time power gating. In this paper, we describe their concept and design, and explain why co-optimization of different design layers is important. Then, three kinds of power gating implementations and their evaluation are presented from the viewpoint of power saving and temporal granularity.
Junsang CHO Jung Wook SUH Gwanggil JEON Jechang JEONG
In this letter, we propose an error surface modeling-based segmentalized motion estimation method for video coding. We previously proposed two algorithms, MBQME [1] and HMBQME [2]. However, these algorithms are not based on locally quadratic MC prediction errors around an integer-pixel motion vector and the hypothesis that the local error plane is a convex function. Therefore, we propose a segmentalized modeling algorithm that takes the error surface into account. In this scheme, the tendency of the error surface is first assessed. Applying the Sobel operation to the error surface, we classify each error surface region as plain or textured. For plain regions, conventional MBQME is appropriate as the quarter-pixel motion estimation method. For textured regions, we search additional interpolation points for more accurate modeling. After the interpolation, we perform double precision mathematical modeling to find the best motion vector (MV). Experiments show that the proposed scheme has better PSNR performance than conventional modeling algorithms with minimum operation time.
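The idea of a locally quadratic, convex error surface around an integer-pixel motion vector can be illustrated with a one-dimensional parabola fit. The sketch below is not MBQME or HMBQME themselves, whose details the abstract does not give; the function names `subpixel_offset` and `round_to_quarter` are hypothetical, and the model is assumed to be a parabola through three neighboring error values.

```python
def subpixel_offset(e_minus, e_zero, e_plus):
    """Offset of the vertex of the parabola through
    (-1, e_minus), (0, e_zero), (1, e_plus), measured from position 0."""
    curvature = e_minus - 2.0 * e_zero + e_plus
    if curvature <= 0:
        # The three samples are not locally convex, so the quadratic
        # model has no minimum; stay at the integer-pixel position.
        return 0.0
    return (e_minus - e_plus) / (2.0 * curvature)


def round_to_quarter(offset):
    """Snap a fractional offset to the nearest quarter-pixel position."""
    return round(offset * 4) / 4
```

Applying the fit once along each axis gives a rough two-dimensional estimate; a textured region, where the convexity assumption is weaker, is where the abstract says additional interpolation points are searched.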
We present a new approach for sparse Cholesky factorization on a heterogeneous platform with a graphics processing unit (GPU). The sparse Cholesky factorization is one of the core algorithms of numerous computing applications. We tuned the supernode data structure and used a parallelization method for GPU tasks to increase GPU utilization. Results show that our approach substantially reduces computational time.
Computer-aided detection (CADe) and diagnosis (CAD) have been a rapidly growing, active area of research in medical imaging. Machine learning (ML) plays an essential role in CAD, because objects such as lesions and organs may not be represented accurately by a simple equation; thus, medical pattern recognition essentially requires “learning from examples.” One of the most popular uses of ML is the classification of objects such as lesion candidates into certain classes (e.g., abnormal or normal, and lesions or non-lesions) based on input features (e.g., contrast and area) obtained from segmented lesion candidates. The task of ML is to determine “optimal” boundaries for separating classes in the multi-dimensional feature space which is formed by the input features. ML algorithms for classification include linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), multilayer perceptrons, and support vector machines (SVM). Recently, pixel/voxel-based ML (PML) has emerged in medical image processing/analysis; it uses pixel/voxel values in images directly, instead of features calculated from segmented lesions, as input information, so that feature calculation and segmentation are not required. In this paper, ML techniques used in CAD schemes for detection and diagnosis of lung nodules in thoracic CT and for detection of polyps in CT colonography (CTC) are surveyed and reviewed.
Identification of early aspects is a critical problem in aspect-oriented requirements engineering. However, crosscutting concerns can be represented in various ways, which makes their identification difficult. To address this problem, this paper proposes the AspectQuery method, which is based on a goal model. We analyze four kinds of goal decomposition models, summarize the main factors in the identification of crosscutting concerns, and derive identification rules based on a goal model. A goal is a crosscutting concern when it satisfies one of the following conditions: i) the goal contributes to realizing a soft-goal; ii) the parent goal of the goal is a candidate crosscutting concern; iii) the goal has at least two parent goals. AspectQuery includes four steps: building the goal model, transforming the goal model, identifying the crosscutting concerns by the identification rules, and composing the crosscutting concerns with the goals affected by them. We illustrate the AspectQuery method through a case study (a ticket booking management system). The results show the effectiveness of AspectQuery in identifying crosscutting concerns in the requirements phase.
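The three identification conditions can be sketched as a small fixed-point computation over a goal graph. This is only an illustration under assumptions: the dictionary-based goal model, the function name `find_crosscutting`, and the example goal names are hypothetical and are not taken from AspectQuery or its case study.

```python
def find_crosscutting(parents, contributes_to_softgoal):
    """Return the set of goals flagged as candidate crosscutting concerns.

    parents: dict mapping each goal to the list of its parent goals.
    contributes_to_softgoal: set of goals that contribute to a soft-goal.
    """
    # Conditions i) and iii): soft-goal contribution or at least two parents.
    candidates = {
        goal for goal, ps in parents.items()
        if goal in contributes_to_softgoal or len(ps) >= 2
    }
    # Condition ii): a goal whose parent is a candidate is itself a candidate.
    # Repeat until no new goal is added, so the rule propagates downward.
    changed = True
    while changed:
        changed = False
        for goal, ps in parents.items():
            if goal not in candidates and any(p in candidates for p in ps):
                candidates.add(goal)
                changed = True
    return candidates


# Hypothetical goal model loosely themed on ticket booking.
example_parents = {
    "book_ticket": [],
    "refund": [],
    "pay": ["book_ticket"],
    "log_transaction": ["pay", "refund"],  # two parents
    "encrypt_data": ["pay"],               # contributes to a soft-goal
    "write_log_file": ["log_transaction"], # child of a candidate
}
example_result = find_crosscutting(example_parents, {"encrypt_data"})
```

Here `example_result` contains `log_transaction`, `encrypt_data`, and `write_log_file`, one goal for each of the three conditions.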
Yuichiro TAJIMA Kinya FUDANO Koichi ITO Takafumi AOKI
This paper presents a fast and accurate volume correspondence matching method using 3D Phase-Only Correlation (POC). The proposed method employs (i) a coarse-to-fine strategy using multi-scale volume pyramids for correspondence search and (ii) high-accuracy POC-based local block matching for finding dense volume correspondence with sub-voxel displacement accuracy. This paper also proposes a GPU implementation of the method to achieve fast and practical computation of volume registration. Experimental evaluation shows that the proposed approach exhibits higher accuracy and lower computational cost than a conventional method. We also demonstrate that the GPU implementation of the proposed method can align two volume data sets in several seconds, which is suitable for practical use in image-guided radiation therapy.
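As a minimal illustration of phase-only correlation, the sketch below estimates an integer circular shift between two one-dimensional signals. It is a simplification under stated assumptions: the paper works on 3D volumes with sub-voxel accuracy and a coarse-to-fine pyramid, none of which is reproduced here, and the helper names `dft` and `poc_shift` are hypothetical. A naive DFT is used to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def poc_shift(f, g, eps=1e-12):
    """Estimate the circular shift d such that g[n] is close to f[n - d]."""
    F, G = dft(f), dft(g)
    # Normalized cross-power spectrum: keep only the phase difference.
    R = []
    for a, b in zip(F, G):
        cross = b * a.conjugate()
        R.append(cross / (abs(cross) + eps))
    # The inverse transform of a pure phase ramp is a sharp peak at the shift.
    r = [v.real for v in dft(R, inverse=True)]
    n = len(r)
    peak = max(range(n), key=r.__getitem__)
    # Map peak positions in the upper half to negative shifts.
    return peak if peak <= n // 2 else peak - n
```

Because the magnitude is normalized away, the correlation peak stays sharp regardless of signal amplitude, which is the property that sub-sample peak fitting builds on.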
Masayuki NAKADA Tatsunori OBARA Tetsuya YAMAMOTO Fumiyuki ADACHI
In this paper, a direct/cooperative relay switched single carrier-frequency division multiple access (SC-FDMA) using amplify-and-forward (AF) protocol and spectrum division/adaptive subcarrier allocation (SDASA) is proposed. Using SDASA, the transmit SC signal spectrum is divided into sub-blocks, to each of which a different set of subcarriers (resource block) is adaptively allocated according to the channel conditions of mobile terminal (MT)-relay station (RS) link, RS-base station (BS) link, and MT-BS link. Cooperative relay does not always provide higher capacity than the direct communication. Switching between direct communication and cooperative relay is done depending on the channel conditions of MT-RS, RS-BS, and MT-BS links. We evaluate the achievable channel capacity by the Monte-Carlo numerical computation method. It is shown that the proposed scheme can reduce the transmit power by about 6.0 (2.0) dB compared to the direct communication (the cooperative AF relay) for a 1%-outage capacity of 3.0 bps/Hz.
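The notion of a 1%-outage capacity evaluated by Monte Carlo computation can be sketched for a much simpler setting than the proposed scheme. The example below assumes a single direct link with flat Rayleigh fading and no relay or subcarrier allocation; the function name `outage_capacity` and all parameter values are hypothetical.

```python
import math
import random


def outage_capacity(snr_db, outage=0.01, trials=20000, seed=0):
    """Capacity (bps/Hz) that the link fails to reach with probability `outage`.

    Assumes a single flat Rayleigh-fading link, so the channel power gain
    |h|^2 is exponentially distributed with unit mean.
    """
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    capacities = sorted(
        math.log2(1 + snr * rng.expovariate(1.0)) for _ in range(trials)
    )
    # Empirical quantile: about `outage` of the samples lie below this value.
    return capacities[int(outage * trials)]
```

Comparing the average SNR at which two schemes reach the same outage capacity is how a transmit power reduction in dB, such as the figures quoted above, is typically read off.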
Shouhei KIDERA Tetsuo KIRIMOTO
UWB (Ultra Wideband) radar offers great promise for advanced near field sensors due to its high range resolution. In particular, it is suitable for rescue or resource exploration robots, which need to identify a target in low visibility or acoustically harsh environments. Recently, radar algorithms that actively coordinate multiple scattered components have been developed to enhance the imaging range beyond what can be achieved by synthesizing a single scattered component. Although we previously developed an accurate algorithm for imaging shadow regions with low computational complexity using derivatives of observed ranges for double scattered signals, the algorithm yields inaccurate images under the severe interference situations that occur with complex-shaped or multiple objects or in noisy environments. This is because small range fluctuations arising from interference or random noises can produce non-negligible image degradation owing to inaccuracy in the range derivative calculation. As a solution to this difficulty, this paper proposes a novel imaging algorithm that does not use the range derivatives of doubly scattered signals, and instead extracts a feature of expansive distributions of the observed ranges, using a unique property inherent to the doubly scattering mechanism. Numerical simulation examples of complex-shaped or multiple targets are presented to demonstrate the distinct advantage of the proposed algorithm which creates more accurate images even for complicated objects or in noisy situations.
Kazuo MOROKUMA Atsushi TAKEMOTO Yoshio KARASAWA
We propose a novel propagation measurement scheme for terrestrial TV signal indoor reception based on a virtual array technique. The system proposed in this paper carries out two-branch recording of target signals and the reference signal. By using the signal phase reference in the reference signal, we clarify the spatial propagation characteristics obtained from the two-dimensional virtual array outputs. Outdoor measurements were performed first to investigate the validity of the proposed measurement system. The results confirm its effectiveness in accurately determining the direction-of-arrival (DOA). We then investigated the propagation characteristics in an indoor environment. The angular spectrum obtained showed clear wave propagation structure. Thus, our proposed system is promising as a very accurate measurement tool for indoor propagation analysis.
Rong WANG Zhiliang WANG Xirong MA
For the problem of Indoor Home Scene Classification, this paper proposes the BOW Model of Local Feature Information Gain. The experimental results show that performance is improved and computation is reduced. Consequently, this method outperforms the state-of-the-art approach.
Hsu-Kuang CHANG King-Chu HUNG I-Chang JOU
Compiling documents in extensible markup language (XML) increasingly requires access to data services which provide both rapid response and the precise use of search engines. Efficient data service should be based on a skillful representation that can support low complexity and high precision search capabilities. In this paper, a novel complete path representation (CPR) associated with a modified inverted index is presented to provide efficient XML data services, where queries can be versatile in terms of predicates. CPR can completely preserve hierarchical information, and the new index is used to save semantic information. The CPR approach can provide template-based indexing for fast data searches. An experiment is also conducted for the evaluation of the CPR approach.
Alberto CALIXTO SIMON Saul E. POMARES HERNANDEZ Jose Roberto PEREZ CRUZ Pilar GOMEZ-GIL Khalil DRIRA
Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and second, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on the application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the amount of overhead per message and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that attacks the problem of message overhead. To this end, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated, and the results show that the algorithm is scalable, since the message overhead exhibits sub-linear growth as the number of processes and/or the message density increases.
Tadayoshi ENOMOTO Nobuaki KOBAYASHI
A motion estimation (ME) multimedia processor was developed by employing a dynamic voltage and frequency scaling (DVFS) technique to greatly reduce the power dissipation. To make full use of the advantages of the DVFS technique, a fast ME algorithm was also developed. It can adaptively predict the optimum supply voltage and the optimum clock frequency before the ME process starts for each macro-block to be encoded. Power dissipation of the 90-nm CMOS DVFS-controlled multimedia processor, which contained an absolute difference accumulator as well as a small on-chip DC/DC level converter, a minimum value detector and a DVFS controller, was reduced to 38.48 µW, which was only 3.261% of that of a conventional multimedia processor.
Hiroshi YUASA Hiroshi TSUTSUI Hiroyuki OCHI Takashi SATO
We propose a novel acceleration scheme for Monte Carlo based statistical static timing analysis (MC-SSTA). MC-SSTA, which repeatedly executes ordinary STA using a set of randomly generated gate delay samples, is widely accepted as an accuracy reference. A large number of random samples, however, must be processed to obtain accurate delay distributions, and a software implementation of MC-SSTA therefore takes an impractically long processing time. In our approach, a generalized hardware module, the STA processing element (STA-PE), is used for the delay evaluation of a logic gate, and netlist-specific information is delivered in the form of instructions from an SRAM. Multiple STA-PEs can be implemented for parallel processing, while a larger netlist can be handled as long as a larger SRAM area is available. The proposed scheme is successfully implemented on Altera's Arria II GX EP2AGX125EF35C4 device, in which 26 STA-PEs and a 624-port Mersenne Twister-based random number generator run in parallel at a 116 MHz clock rate. A speedup of far more than 10 is achieved compared to conventional methods including GPU implementation.
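The software baseline described above, ordinary STA repeated over random gate delay samples, can be sketched in a few lines. This is an assumption-laden illustration rather than the authors' implementation: the netlist format, the Gaussian delay model, and the function name `mc_ssta` are hypothetical, and the STA-PE hardware is not modeled.

```python
import random


def mc_ssta(netlist, outputs, n_samples=1000, seed=0):
    """Return one circuit delay per Monte Carlo sample.

    netlist: list of (gate, fanin_names, mean_delay, sigma) tuples in
             topological order; names not defined as gates are treated as
             primary inputs arriving at time 0.
    outputs: names of the gates whose arrival times define the circuit delay.
    """
    rng = random.Random(seed)
    delays = []
    for _ in range(n_samples):
        arrival = {}
        for gate, fanins, mean, sigma in netlist:
            # Draw one delay sample per gate; clamp to keep delays non-negative.
            gate_delay = max(0.0, rng.gauss(mean, sigma))
            latest_input = max((arrival.get(f, 0.0) for f in fanins), default=0.0)
            arrival[gate] = latest_input + gate_delay
        delays.append(max(arrival[o] for o in outputs))
    return delays
```

Each sample is independent of the others, which is what makes the analysis a natural fit for several processing elements running in parallel.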
In this paper, we propose an improved face clustering method using a weighted graph-based approach. We combine two parameters as the weight of a graph to improve clustering performance. One is average similarity, which is calculated with two constraints of geometric and symmetric properties, and the other is a newly proposed parameter called the orientation matching ratio, which is calculated from orientation analysis for matched keypoints in the face region. According to the results of face clustering for several datasets, the proposed method shows improved results compared to the previous method.
Jinwook JUNG Yohei NAKATA Shunsuke OKUMURA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper presents an adaptive cache architecture for wide-range reliable low-voltage operations. The proposed associativity-reconfigurable cache consists of pairs of cache ways so that it can exploit the recovery feature of the novel 7T/14T SRAM cell. Each pair has two operating modes that can be selected based upon the required voltage level of current operating conditions: normal mode for high performance and dependable mode for reliable low-voltage operations. We can obtain reliable low-voltage operations by applying the dependable mode to weaker pairs that cannot operate reliably at low voltages. Meanwhile, by leaving stronger pairs in the normal mode, we can minimize performance losses. Our chip measurement results show that the proposed cache can trade off its associativity against the minimum operating voltage. Moreover, it can decrease the minimum operating voltage by 140 mV, achieving 67.48% and 26.70% reductions in power dissipation and energy per instruction, respectively. Processor simulation results show that designing the on-chip caches using the proposed scheme results in a maximum IPC loss of 2.95%, while allowing various performance levels to be chosen. Area estimation results show that the proposed cache adds area overhead of 1.61% and 5.49% in 32-KB and 256-KB caches, respectively.
Kuiyuan ZHANG Jun FURUTA Ryosuke YAMAMOTO Kazutoshi KOBAYASHI Hidetoshi ONODERA
With process scaling, radiation-hard devices are becoming sensitive to soft errors caused by Multiple Cell Upsets (MCUs). In this paper, parasitic bipolar effects are utilized to suppress MCUs in radiation-hard dual-modular flip-flops. Device simulations reveal that a simultaneous flip of redundant latches is suppressed by storing opposite values instead of storing the same value due to its asymmetrical structure. The state of latches becomes a specific value after a particle hit due to the bipolar effects. Spallation neutron irradiation confirms that MCUs are effectively suppressed in the D-FF arrays in which two adjacent latches in different FFs store opposite values. The redundant latch structure storing opposite values is robust to the simultaneous flip.
Naoki KAMIYA Xiangrong ZHOU Huayue CHEN Chisako MURAMATSU Takeshi HARA Hiroshi FUJITA
Our purpose in this study is to develop a scheme to segment the rectus abdominis muscle region in X-ray CT images. We propose a new muscle recognition method based on the shape model. In this method, three steps are included in the segmentation process. The first is to generate a shape model for representing the rectus abdominis muscle. The second is to recognize anatomical feature points corresponding to the origin and insertion of the muscle, and the third is to segment the rectus abdominis muscles using the shape model. We generated the shape model from 20 CT cases and tested the model to recognize the muscle in 10 other CT cases. The average value of the Jaccard similarity coefficient (JSC) between the manually and automatically segmented regions was 0.843. The results suggest the validity of the model-based segmentation for the rectus abdominis muscle.
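For reference, the Jaccard similarity coefficient used in the evaluation is the size of the intersection of two regions divided by the size of their union. The sketch below assumes each segmented region is given as a set of pixel or voxel coordinates; the function name `jaccard` and the convention for two empty regions are assumptions, not details from the paper.

```python
def jaccard(region_a, region_b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two regions.

    Each region is a set of pixel (or voxel) coordinate tuples.
    """
    a, b = set(region_a), set(region_b)
    union = a | b
    if not union:
        # Assumed convention: two empty regions are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical toy regions: 2 shared pixels out of 5 distinct pixels.
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
automatic = {(0, 1), (1, 1), (1, 2)}
toy_jsc = jaccard(manual, automatic)
```

A value of 1 means the manual and automatic regions coincide exactly, and 0 means they do not overlap at all.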
Ran LI Zong-Liang GAN Zi-Guan CUI Xiu-Chang ZHU
Novel joint motion-compensated interpolation using eight-neighbor block motion vectors (8J-MCI) is presented. The proposed method uses bi-directional motion estimation (BME) to obtain the motion vector field of the interpolated frame and adopts the motion vectors of the interpolated block and its 8-neighbor blocks to jointly predict the target block. Since the smoothness of the motion vector field makes the motion vectors of the 8-neighbor blocks quite close to the true motion vector of the interpolated block, the proposed algorithm has better fault tolerance than traditional ones. Experiments show that the proposed algorithm outperforms the motion-aligned auto-regressive algorithm (MAAR, one of the state-of-the-art frame rate up-conversion (FRUC) schemes) in terms of the average PSNR for the test image sequence and offers better subjective visual quality.
Jinfeng GAO Bilan ZHU Masaki NAKAGAWA
The paper describes how a robust and compact on-line handwritten Japanese text recognizer was developed for deployment on hand-held devices by compressing each component of an integrated text recognition system, which includes an SVM classifier to evaluate segmentation points, an on-line and off-line combined character recognizer, a linguistic context processor, and a geometric context evaluation module. Selecting an elastic-matching based on-line recognizer and compressing MQDF2 via a combination of LDA, vector quantization and data type transformation have contributed to building a remarkably small yet robust recognizer. The compact text recognizer, covering 7,097 character classes, requires only about 15 MB of memory to achieve 93.11% accuracy on horizontal text lines extracted from the TUAT Kondate database. Compared with the original full-scale Japanese text recognizer, the memory size is reduced from 64.1 MB to 14.9 MB while the accuracy loss is only about 0.5%, from 93.6% to 93.11%. The method is scalable, so even systems of less than 11 MB or less than 6 MB still retain 92.80% or 90.02% accuracy, respectively.