Jun CHAI Mei WEN Nan WU Dafei HUANG Jing YANG Xing CAI Chunyuan ZHANG Qianming YANG
This paper presents a study of the applicability of clusters of GPUs to high-resolution 3D simulations of cardiac electrophysiology. By experimenting with representative cardiac cell models and ODE solvers, in association with solving the monodomain equation, we quantitatively analyze the obtainable computational capacity of GPU clusters. It is found that for a 501×501×101 3D mesh, which entails a 0.1mm spatial resolution, a 128-GPU cluster only needs a few minutes to carry out a 100,000-time-step cardiac excitation simulation that involves a four-variable cell model. Even higher spatial and temporal resolutions are achievable for such simplified mathematical models. On the other hand, our experiments also show that a dramatically larger cluster of GPUs is needed to handle a very detailed cardiac cell model.
Mitsuharu ARIMURA Hiroki KOGA Ken-ichi IWATA
In this letter, we first introduce a stronger notion of the optimistic achievable coding rate and discuss a coding theorem. Next, we give a necessary and sufficient condition under which the coding rates of all the optimal FF codes asymptotically converge to a constant.
Zhen ZHANG Shouyi YIN Leibo LIU Shaojun WEI
TSV-interconnected 3D chips face problems such as high cost, low yield and large power dissipation. We propose a wireless 3D on-chip-network architecture for application-specific SoC design, using inductive-coupling interconnect instead of TSV for inter-layer communication. The primary design challenge of an inductive-coupling 3D SoC is allocating wireless links in the 3D on-chip network effectively. We develop a design flow that fully exploits the design space opened up by wireless links while providing a flexible tradeoff for the user to choose from. Experimental results show that our design improves on the uniform design and the Sunfloor algorithm in latency (5% to 20%) and power consumption (10% to 45%).
Shin-ya ABE Youhua SHI Kimiyoshi USAMI Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.
Xinyuan CAI Chunheng WANG Baihua XIAO Yunxue SHAO
Face verification is the task of determining whether two given face images represent the same person or not. It is a very challenging task, as face images captured in uncontrolled environments may have large variations in illumination, expression, pose, background, etc. The crucial problem is how to compute the similarity of two face images. Metric learning has provided a viable solution to this problem. Until now, many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using a deep Independent Subspace Analysis (ISA) network. Compared to linear or kernel-based metric learning methods, the proposed deep ISA network is a deep and local learning architecture, and therefore exhibits a more powerful ability to learn the nature of highly variable datasets. We evaluate our method on the Labeled Faces in the Wild dataset, and the results show superior performance over some state-of-the-art methods.
Takahiro OTA Hiroyoshi MORITA Adriaan J. de Lind van WIJNGAARDEN
This paper presents a real-time and memory-efficient arrhythmia detection system with binary classification that uses antidictionary coding for the analysis and classification of electrocardiograms (ECGs). The measured ECG signals are encoded using a lossless antidictionary encoder, and the system subsequently uses the compression rate to distinguish between normal beats and arrhythmia. An automated training data procedure is used to construct the automatons, which are probabilistic models used to compress the ECG signals, and to determine the threshold value for detecting the arrhythmia. Real-time computer simulations with samples from the MIT-BIH arrhythmia database show that the averages of sensitivity and specificity of the proposed system are 97.8% and 96.4% for premature ventricular contraction detection, respectively. The automatons are constructed using training data and comprise only 11 kilobytes on average. The low complexity and low memory requirements make the system particularly suitable for implementation in portable ECG monitors.
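The classification principle in the preceding abstract (compress a signal window, then compare the compression rate with a threshold) can be illustrated with a minimal sketch. It assumes Python's general-purpose zlib compressor as a stand-in for the antidictionary encoder; the function names, the synthetic byte sequences, and the threshold of 0.5 are hypothetical choices for illustration, not values from the paper.

```python
import random
import zlib


def compression_rate(window: bytes) -> float:
    """Return compressed size divided by original size (lower = more regular)."""
    return len(zlib.compress(window)) / len(window)


def is_irregular(window: bytes, threshold: float = 0.5) -> bool:
    """Flag a window whose compression rate exceeds the (assumed) threshold."""
    return compression_rate(window) > threshold


# Synthetic stand-ins for quantized signal windows, not ECG data.
regular = bytes([10, 20, 30, 20] * 200)      # strictly periodic pattern
rng = random.Random(0)                       # fixed seed for repeatability
irregular = bytes(rng.randrange(256) for _ in range(800))

print(is_irregular(regular), is_irregular(irregular))
```

A window that the model predicts well compresses strongly, so an unusually high compression rate signals a pattern the model has not seen.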
Taeko MATSUNAGA Shinji KIMURA Yusuke MATSUNAGA
Multi-operand adders that calculate the summation of more than two operands usually consist of compressor trees, which reduce the number of operands to two without any carry propagation, and carry-propagate adders for the two operands in the ASIC implementation. Compressor trees that consist of full adders and half adders cannot be implemented efficiently on LUT-based FPGAs, and carry-chains or dedicated structures have been utilized to produce multi-operand adders on FPGAs. Recent studies indicate that compressor trees can be implemented efficiently on LUTs using Generalized Parallel Counters (GPCs) as the building blocks of compressor trees. This paper addresses the problem of synthesizing compressor trees based on GPCs. Based on the observation that characteristics such as the area, power, and delay correlate roughly to the total number and the maximum level of GPCs, the target problem can be regarded as a minimization problem for the total number of GPCs and the maximum levels of the GPCs, for which an ILP-based approach is proposed. The key point of our formulation is not to model the problem based on the structures of compressor trees like the existing approach, but instead the compression process itself is used to reduce the number of variables and constraints in the ILP formulation. The experimental results demonstrate the advantage of our formulation in terms of the quality and runtime.
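To make the building block concrete: a generalized parallel counter takes input bits of several adjacent weights and outputs their weighted sum in binary. The following is a minimal behavioral sketch, assuming the common notation in which a (2,3;3) GPC takes two bits of weight 2 and three bits of weight 1 and produces three output bits. The function name `gpc` and its list-based interface are hypothetical.

```python
from typing import List


def gpc(columns: List[List[int]], n_out: int) -> List[int]:
    """Model a generalized parallel counter.

    columns[r] holds the input bits of weight 2**r (rank r).
    Returns n_out output bits, least significant first.
    """
    total = sum(bit << rank for rank, bits in enumerate(columns) for bit in bits)
    if total >= 1 << n_out:
        raise ValueError("n_out output bits cannot represent the sum")
    return [(total >> i) & 1 for i in range(n_out)]


# A full adder is the (3;2) counter: three bits of weight 1, two outputs.
full_adder = gpc([[1, 1, 0]], n_out=2)

# A (2,3;3) GPC: three bits of weight 1 and two bits of weight 2.
wide = gpc([[1, 0, 1], [1, 1]], n_out=3)

print(full_adder, wide)
```

In the second call, five input bits are compressed into three output bits in a single level without carry propagation.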
The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.
Nan WU Hua WANG Hongjie ZHAO Jingming KUANG
This paper studies the performance of code-aided (CA) soft-information based carrier phase recovery, which iteratively exploits the extrinsic information from the channel decoder to improve the accuracy of phase synchronization. To tackle the problem of strong coupling between phase recovery and decoding, a semi-analytical model is proposed to express the distribution of extrinsic information as a function of phase offset. A piecewise approximation of the hyperbolic tangent function is employed to linearize the expression of the soft symbol decision. Building on this model, the open-loop characteristic and closed-loop performance of the CA iterative soft decision-directed (ISDD) carrier phase synchronizer are derived in closed form. Monte Carlo simulation results corroborate that the proposed expressions are able to characterize the performance of CA ISDD carrier phase recovery for systems with different channel codes.
Nien-En WU Hsuan-Jung SU Hsueh-Jyh LI
Relay selection is a promising technique with which to achieve remarkable gains in multi-relay cooperative networks. Opportunistic relaying (OR) and selection cooperation (SC) are two major relay selection schemes for dual-hop decode-and-forward cooperative networks; they have been shown to be globally outage-optimal under an aggregate power constraint. However, due to channel fluctuations, the channel state information (CSI) used in the selection process may become outdated and differ from the CSI during the actual transmission of data. In this work, we study the effect of outdated CSI on OR and threshold-based SC (TSC) schemes under independent but not necessarily identically distributed Rayleigh fading channels. The source can possibly cooperate with the best relay for data transmission, with the destination performing maximal ratio combining of the signals from the source and the relay. In particular, we analyze the average symbol error probability (ASEP) of OR and TSC with outdated CSI by deriving approximate but tight closed-form expressions for the moment generating function of the end-to-end signal-to-noise ratio. We also investigate the asymptotic behavior of the ASEP. The results show that the diversity orders of OR and TSC reduce to one and two, respectively, due to the outdated CSI. However, TSC achieves full spatial diversity order when the relay-to-destination CSI is perfect. Finally, to verify the analytical results Monte Carlo simulations are performed, in which OR attains better ASEP than TSC in a perfect CSI scenario, while TSC is less susceptible to outdated CSI.
Qieshi ZHANG Sei-ichiro KAMATA
This paper proposes an improved color barycenter model (CBM) and its separation for automatic road sign (RS) detection. The previous version of CBM can find the colors of RS, but its accuracy is not high enough for separating the magenta and blue regions, and the influence of the number of occurrences of the same color is not considered. In this paper, the improved CBM expands the barycenter distribution to a cylindrical coordinate system (CCS) and takes the number of colors at each position into account for clustering. Under this distribution, the color information can be represented more clearly for analysis. Then, targeting the characteristics of the barycenter distribution in CBM (CBM-BD), a constrained clustering method is presented to cluster the CBM-BD in CCS. Although the proposed clustering method partly resembles conventional K-means, it overcomes some limitations of K-means in our research. The experimental results show that the proposed method is able to detect RS with high robustness.
When attackers compromise a client system, they can steal user input. We propose a distributed one-time keyboard system to prevent information leakage via keyboard typing. We define the problem of secure keyboard arrangement over distributed multi-devices and channels. An analytical model is proposed for the optimal keyboard layout.
Energy-harvesting devices are materials that allow ambient energy sources to be converted into usable electrical power. While a battery powers modern embedded systems, these energy-harvesting devices power energy-harvesting embedded systems. This calls for new energy-efficient management techniques for energy-harvesting systems that differ from the previous management techniques. Higher overall system efficiency in an energy-harvesting system can be obtained by higher generating efficiency, higher consuming efficiency, or higher transferring efficiency. This paper presents a generalized technique for dynamic reconfiguration and task scheduling that considers the power loss in the DC-DC converters in the system. The proposed technique minimizes the power loss in the DC-DC converters and charger of the system. Experiments with an actual application demonstrate that our approach reduces the total energy consumption by 22% on average compared with the conventional approach.
Akira HIRABAYASHI Jumpei SUGIMOTO Kazushi MIMURA
The main target of compressed sensing is the recovery of one-dimensional signals, because signals of two or more dimensions can also be treated as one-dimensional ones by raster scan, which makes the sensing matrix huge. This is unavoidable for general sensing processes. In separable cases such as the discrete Fourier transform (DFT) or standard wavelet transforms, however, the corresponding sensing process can be formulated using two matrices that multiply the target two-dimensional signal from both sides. We propose an approximate message passing (AMP) algorithm for the separable sensing process. As a typical case, we assume the DFT for the sensing process, in which the measurements are complex numbers. Therefore, the formulation includes cases in which both the target signal and the measurements are complex. We show the effectiveness of the proposed algorithm by computer simulations.
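The contrast between the raster-scan formulation and the separable one can be checked numerically. The sketch below assumes the separable measurement is written Y = A X B^T (the symbols A, B, X, Y and the helper names are not from the abstract) and verifies that it agrees with applying the Kronecker product of A and B to the row-major raster scan of X. Pure Python is used so the example has no dependencies, and complex entries are included because the abstract covers complex signals and measurements.

```python
from typing import List

Matrix = List[List[complex]]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Plain matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(p: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*p)]


def kron(p: Matrix, q: Matrix) -> Matrix:
    """Kronecker product: block (i, k) equals p[i][k] * q."""
    return [[p[i][k] * q[j][l] for k in range(len(p[0])) for l in range(len(q[0]))]
            for i in range(len(p)) for j in range(len(q))]


def raster(p: Matrix) -> Matrix:
    """Row-major raster scan, returned as a column vector."""
    return [[v] for row in p for v in row]


A = [[1, 2j, 0], [0, 1, -1]]          # 2 x 3, multiplies X from the left
B = [[1, 1], [1j, -1], [2, 0]]        # 3 x 2, its transpose multiplies X from the right
X = [[1, 2], [3j, 4], [5, -6j]]       # 3 x 2 target signal (made-up values)

Y = matmul(matmul(A, X), transpose(B))      # separable form: two small matrices
y_big = matmul(kron(A, B), raster(X))       # raster form: one 6 x 6 matrix

print(raster(Y) == y_big)
```

For an N1 x N2 signal, the raster form needs a sensing matrix with N1*N2 columns, whereas the separable form only stores two matrices with N1 and N2 columns respectively.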
Hatsuhiro KATO Hatsuyoshi KATO
The recursive transfer method (RTM) is a numerical technique that was developed to analyze scattering phenomena, and its formulation is constructed with a difference equation derived from a differential equation by Numerov's discretization method. However, the class of differential equations to which Numerov's method is applicable is restricted, and therefore the application range of RTM is also limited. In this paper, we provide a new discretization scheme to extend the RTM formulation using the weak form theory framework. The effectiveness of the proposed formulation is confirmed by microwave scattering induced by a metallic pillar placed asymmetrically in a waveguide. A notable feature of RTM is that it can extract a localized wave from scattering waves even if the tail of the localized wave reaches the ends of the analysis region. The discrepancy between the experimental and theoretical data is suppressed within an upper bound determined by the standing wave ratio of the waveguide.
Michael Joseph TAN Yuichi KAJI
Novel flash codes with small average write deficiency are proposed. A flash code is a coding scheme for avoiding the wearing of cells in flash memory. One approach to develop flash codes with large parameters is to make use of slices which are small groups of cells. Preliminary study shows that using small slices brings several favorable characteristics, but naive use of small slices induces a certain overhead. In this study, a new structure which is called a cluster is devised to develop a good slice-based flash code. Two different slice encoding schemes are used in a cluster, which decreases the overhead of using small slices while retaining its advantage. The proposed flash codes show much smaller write deficiency compared to another slice-based flash code.
The optimum generalized partial response (GPR) target for barium ferrite (BaFe) tape systems was studied. The shift in perpendicular magnetic recording technology in HDDs to systems employing single-pole-type (SPT) recording heads and media with a soft under layer (SUL) has been accompanied by a change in the read channel design, whereas current magnetic tape recording systems utilize a combination of a ring-type recording head with a single magnetic layer structured medium. Therefore, the read channel performance of current oriented BaFe particulate tape systems needs to be studied to best exploit the potential of this medium. Toward this end, DC-free, DC-full, and DC-suppressed targets were compared. The results show that assuming a GPRML detector with 16 or more states, a traditional DC-free target exhibits the best bit error rate performance for both longitudinally and perpendicularly oriented BaFe media, suggesting that the current read channel designed for longitudinally oriented media can also be utilized for BaFe particulate tape systems.
Takashi KITAMURA Keishi OKAMOTO
In this paper, we propose and implement an automated route planning framework for milk-run transport logistics by applying model checking techniques. First, we develop a formal specification framework for milk-run transport logistics. The framework adopts LTL (Linear Temporal Logic), a language based on temporal logics, as a specification language so that users can flexibly and formally specify complex delivery requirements for trucks. Then, by applying the bounded semantics of LTL, the framework defines the notion of “optimal truck routes”, which are truck routes on a given route map that satisfy given delivery requirements (specified in LTL) with the minimum cost. We implement the framework as an automated route planner using the NuSMV model checker, a state-of-the-art bounded model checker. The automated route planner, given a route map and delivery requirements, automatically finds optimal truck routes on the route map satisfying the given delivery requirements. The feasibility of the implementation design is investigated by analysing its computational complexity and by showing experimental results.
Guifang SHAO Wupeng HONG Tingna WANG Yuhua WEN
An improved genetic algorithm is employed to optimize the structure of (C60)N (N≤25) fullerene clusters with the lowest energy. First, crossover with variable precision, realized by introducing the Hamming distance, is developed to provide a faster search mechanism. Second, the bit string mutation and feedback mutation are incorporated to maintain the diversity in the population. The interaction between C60 molecules is described by the Pacheco and Ramalho potential derived from first-principles calculations. We compare the performance of the Improved GA (IGA) with that of the Standard GA (SGA). The numerical and graphical results verify that the proposed approach is faster and more robust than the SGA. The second finite difference of the total energy shows that the (C60)N clusters with N=7, 13, 22 are particularly stable. Performance with the lowest energy is achieved in this work.
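The stability criterion mentioned in the preceding abstract is commonly computed as D2E(N) = E(N+1) + E(N-1) - 2E(N), where a pronounced positive peak marks a cluster size that is more stable than its neighbours. The following is a minimal sketch assuming this standard definition. The function name and the energy values are hypothetical placeholders, not results from the paper.

```python
from typing import Dict


def second_difference(energy: Dict[int, float]) -> Dict[int, float]:
    """Return E(N+1) + E(N-1) - 2 E(N) for every N that has both neighbours."""
    return {
        n: energy[n + 1] + energy[n - 1] - 2.0 * energy[n]
        for n in sorted(energy)
        if n - 1 in energy and n + 1 in energy
    }


# Made-up total energies (arbitrary units) with an extra-deep minimum at N = 4.
toy_energy = {2: -1.0, 3: -3.0, 4: -6.5, 5: -8.0, 6: -10.0}

d2 = second_difference(toy_energy)
most_stable = max(d2, key=d2.get)   # size with the largest second difference
print(d2, most_stable)
```

Sizes at the ends of the energy table are skipped because the difference needs both neighbours.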
Tetsuya KANDA Yuki MANABE Takashi ISHIO Makoto MATSUSHITA Katsuro INOUE
It is not always easy for an Android user to choose the most suitable application for a particular task from the great number of applications available. In this paper, we propose a semi-automatic approach to extract feature names from Android applications. The case study verifies that we can associate common sequences of Android API calls with feature names.