This paper presents a method for learning an overcomplete, nonnegative dictionary and for obtaining the corresponding coefficients so that a group of nonnegative signals can be sparsely represented by them. This is accomplished by posing the learning as a nonnegative matrix factorization (NMF) problem that maximizes the incoherence of the dictionary and the sparsity of the coefficients. By incorporating a dictionary-incoherence penalty and a sparsity penalty in the NMF formulation and then adopting a hierarchically alternating optimization strategy, we show that the problem can be cast as two sequential optimization problems of quadratic functions. Each optimization problem can be solved explicitly, so the whole problem can be solved efficiently, which leads to the proposed algorithm, sparse hierarchical alternating least squares (SHALS). The SHALS algorithm iteratively solves the two optimization problems, which correspond to learning the dictionary and estimating the coefficients for reconstructing the signals. Numerical experiments demonstrate that the new algorithm performs better than the nonnegative K-SVD (NN-KSVD) algorithm and several other well-known algorithms, and that its computational cost is remarkably lower than that of the compared algorithms.
We present a new approach for sparse Cholesky factorization on a heterogeneous platform with a graphics processing unit (GPU). The sparse Cholesky factorization is one of the core algorithms of numerous computing applications. We tuned the supernode data structure and used a parallelization method for GPU tasks to increase GPU utilization. Results show that our approach substantially reduces computational time.
Lingjuan WU Ryan KASTNER Bo GU Dunshan YU
The design of acoustic modems is becoming increasingly important in the development of underwater sensor networks. This paper presents the design of a reconfigurable acoustic modem. By defining modulation and demodulation as reconfigurable modules, the proposed modem changes its modulation scheme and data rate to provide reliable and energy-efficient communication. The digital system, responsible for signal processing and control, is implemented on a Xilinx Virtex5 FPGA. Hardware and software co-verification shows that the modem works correctly and can self-configure to BFSK and BPSK modes. The partial reconfiguration design method improves the flexibility of algorithm design, and slice, LUT, register, DSP, and RAMB usage is reduced by 17%, 25%, 22%, 25%, and 25%, respectively.
Fei LI Masaya MIYAHARA Akira MATSUZAWA
This paper describes the analysis and design of low-noise analog circuits for a new-architecture readout LSI, Qpix. In contrast to conventional readout LSIs using the TOT method, Qpix measures deposited charge directly as well as time information. A preamplifier with a two-stage op amp and current-copy output buffers is proposed to realize these functions. This preamplifier is configured to implement a charge sensitive amplifier (CSA) and a trans-impedance amplifier (TIA). Design issues related to the CSA are analyzed, including the gain requirement of the op amp, stability and compensation of the two-stage cascode op amp, noise performance estimation, the required ADC resolution, and time response. The offset calibration method in the TIA, which improves the charge detecting sensitivity, is also presented, together with some design principles for these analog circuits. In order to verify the theoretical analysis, a 400-pixel high-speed readout LSI, Qpix v.1, has been designed and fabricated in a 180 nm CMOS process. Calculations and SPICE simulations show that the total output noise is about 0.31 mV (rms) at the output of the CSA and that the offset voltage is less than 4 mV at the output of the TIA. This performance is attractive for an experimental particle detector using the Qpix v.1 chip as its readout LSI.
Qing LIU Tomohiro ODAKA Jousuke KUROIWA Hisakazu OGURA
An artificial fish swarm algorithm (AFSA) for solving symbolic regression problems is introduced in this paper. In the proposed AFSA, AF individuals represent candidate solutions, which are encoded using the gene expression scheme in GEP. For evaluating AF individuals, a penalty-based fitness function, in which the node number of the parse tree is treated as a constraint, was designed in order to obtain a solution expression that not only fits the given data well but is also compact. A number of important concepts are defined, including distance, partners, congestion degree, and feature code. Based on these concepts, we designed four behaviors, namely, randomly moving behavior, preying behavior, following behavior, and avoiding behavior, and present their respective formalized descriptions. The exhaustive simulation results demonstrate that the proposed algorithm not only obtains a high-quality solution expression but also provides remarkable robustness and quick convergence.
Tatsuya KON Takashi OBI Hideaki TASHIMA Nagaaki OHYAMA
Parametric images can help investigate disease mechanisms and vital functions. To estimate parametric images, it is necessary to obtain the tissue time activity curves (tTACs), which express temporal changes of tracer activity in human tissue. In general, the tTACs are calculated from each voxel's value in the time-sequential PET images estimated from dynamic PET data. Recently, spatio-temporal PET reconstruction methods have been proposed in order to take into account the temporal correlation within each tTAC. Such spatio-temporal algorithms are generally quite computationally intensive. On the other hand, typical algorithms such as the preconditioned conjugate gradient (PCG) method still do not provide good estimation accuracy. To overcome these problems, we propose a new spatio-temporal reconstruction method based on the dynamic row-action maximum-likelihood algorithm (DRAMA). As the original algorithm does, the proposed method takes into account the noise propagation, but it achieves much faster convergence. The performance of the method is evaluated with digital phantom simulations, and it is shown that the proposed method requires only a few reconstruction processes, thereby remarkably reducing the computational cost required to estimate the tTACs. The results also show that the tTACs and parametric images from the proposed method have better accuracy.
Ji-Won HUH Shuji ISOBE Eisuke KOIZUMI Hiroki SHIZUYA
In this paper, we investigate a relationship between the length-decreasing self-reducibility and the many-one-like reducibilities for partial multivalued functions. We show that if any parsimonious (many-one or metric many-one) complete function for NPMV (or NPMVg) is length-decreasing self-reducible, then any function in NPMV (or NPMVg) has a polynomial-time computable refinement. This result implies that there exists an NPMV (or NPMVg)-complete function which is not length-decreasing self-reducible unless P = NP.
Yang YU Shiro HANDA Fumihito SASAMORI Osamu TAKYU
In this paper, through extrinsic information transfer (EXIT) band chart analysis, an adaptive iterative decoding approach (AIDA) is proposed to reduce the iterative decoding complexity and delay for finite-length differentially encoded low-density parity-check (DE-LDPC) coded systems with multiple-symbol differential detection (MSDD). The proposed AIDA can adaptively adjust the observation window size (OWS) of the MSDD soft-input soft-output demodulator (SISOD) and the outer iteration number of the iterative decoder (consisting of the MSDD SISOD and the LDPC decoder) instead of setting fixed values for these two parameters. The performance of AIDA depends on its stopping criterion (SC), which is used to terminate the iterative decoding before the maximum outer iteration number is reached. Many SCs have been proposed; however, these approaches focus on turbo coded systems, and it has been shown that they are not well suited to LDPC coded systems. To solve this problem, a new SC called the differential mutual information (DMI) criterion, which can track the convergence status of the iterative decoding, is proposed; it is based on tracking the difference of the output mutual information of the LDPC decoder between two consecutive outer iterations of the considered systems. AIDA using the DMI criterion can adaptively adjust the outer iteration number and OWS according to the convergence status of the iterative decoding. Simulation results show that, compared with the existing SCs, AIDA using the DMI criterion can further reduce the decoding complexity and delay, and that its performance is not affected by a change in the LDPC code or the transmission channel parameters.
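The stopping idea described in the abstract above can be illustrated with a minimal sketch. It assumes the mutual information at the LDPC decoder output is already available after each outer iteration; the function names, the threshold value, and the example numbers are hypothetical and are not taken from the paper. The adaptive choice of the observation window size is not modeled here.

```python
def dmi_should_stop(mi_history, threshold=1e-3):
    """Return True when the decoder's output mutual information changed by
    less than `threshold` between the last two outer iterations.
    `threshold` is an assumed tuning parameter, not a value from the paper."""
    if len(mi_history) < 2:
        return False
    return abs(mi_history[-1] - mi_history[-2]) < threshold


def outer_iterations_used(mi_per_iteration, max_iters, threshold=1e-3):
    """Walk through per-iteration mutual information values (supplied by the
    caller) and return how many outer iterations run before stopping."""
    history = []
    for k, mi in enumerate(mi_per_iteration[:max_iters], start=1):
        history.append(mi)
        if dmi_should_stop(history, threshold):
            return k
    return len(history)


# Hypothetical mutual information trajectory that flattens out near 0.95.
trajectory = [0.20, 0.50, 0.80, 0.95, 0.9501, 0.9502]
used = outer_iterations_used(trajectory, max_iters=10)
```

In this sketch decoding stops as soon as the curve flattens, whether it has converged to a high value or stalled at a low one, which is the sense in which such a criterion tracks convergence status.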
Koichi SAKAGUCHI Akinori FUJITO Seiko UCHINO Asami OHTAKE Noboru TAKISAWA Kunio AKEDO Masanao ERA
We investigated the oxidation time dependence of graphene oxide prepared by the modified Hummers method, using dynamic light scattering. The oxidation reaction proceeded rapidly within about 24 hours and then saturated. This suggests that the graphene oxides were not able to fragment freely, which implies that the oxidation reactions occur at limited sites.
Kazunori HAYASHI Masaaki NAGAHARA Toshiyuki TANAKA
This survey provides a brief introduction to compressed sensing, several major algorithms for solving it, and its various applications to communications systems. We first review linear simultaneous equations as ill-posed inverse problems, since the idea of compressed sensing can be best understood in the context of linear equations. Then, we consider the problem of compressed sensing as an underdetermined linear system with prior information that the true solution is sparse, and explain the sparse signal recovery based on
This paper presents a novel scale-rotation invariant generative model (SRIGM) and a kernel sparse representation classification (KSRC) method for scene categorization. Recently, sparse representation classification (SRC) methods have been highly successful in a number of image processing tasks. Despite its popularity, the SRC framework lacks the ability to handle multi-class data with high inter-class similarity or high intra-class variation. The kernel random coordinate descent (KRCD) algorithm is proposed for
Masaya NAKAHARA Shirou MARUYAMA Tetsuji KUBOYAMA Hiroshi SAKAMOTO
A scalable pattern discovery by compression is proposed. A string is representable by a context-free grammar deriving the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. Beyond that, the optimization problem is approximately solvable. In such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear time algorithm to find all frequent patterns in a string approximately and prove several lower bounds to guarantee the length of extracted patterns. We also examine the performance of our algorithm by experiments in biological sequences and other compressible real world texts. Compared to other practical algorithms, our algorithm is faster and more scalable with large and repetitive strings.
Quoc Huy DO Seiichi MITA Hossein Tehrani Nik NEJAD Long HAN
We propose a practical local and global path-planning algorithm for an autonomous vehicle or a car-like robot in an unknown semi-structured (or unstructured) environment, where obstacles are detected online by the vehicle's sensors. The algorithm utilizes a probabilistic method based on particle filters to estimate the dynamic obstacles' locations, a support vector machine to provide the critical points, and Bezier curves to smooth the generated path. The generated path safely travels through various static and moving obstacles and satisfies the vehicle's movement constraints. The algorithm is implemented and verified in simulation software. Simulation results demonstrate the effectiveness of the proposed method in complicated scenarios involving multiple moving objects.
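As an illustration of the smoothing step mentioned above, the following sketch evaluates a Bezier curve with de Casteljau's algorithm. It is a generic construction, not the authors' planner: the control points stand in for critical points that the abstract says come from a support vector machine, and the coordinates and function names are made up for the example.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by repeatedly
    interpolating between neighbouring points (de Casteljau's algorithm)."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_bezier(control_points, n=20):
    """Return n >= 2 evenly spaced (in t) points along the curve."""
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]


# Hypothetical critical points; a cubic curve starts at the first point
# and ends at the last, bending toward the two in between.
waypoints = [(0, 0), (1, 2), (3, 2), (4, 0)]
path = sample_bezier(waypoints, n=5)
```

The curve passes through the first and last control points only, so a planner that needs to hit intermediate points exactly would have to chain several segments.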
Guangchun LUO Hao CHEN Caihui QU Yuhai LIU Ke QIN
Tree partitioning arises in many parallel and distributed computing applications and storage systems. Some operator scheduling problems need to partition a tree into a number of vertex-disjoint subtrees such that some constraints are satisfied and some criteria are optimized. Given a tree T with each vertex or node assigned a nonnegative integer weight, two nonnegative integers l and u (l < u), and a positive integer p, we consider the following tree partitioning problems: partitioning T into a minimum number of subtrees or into p subtrees, under the condition that the sum of node weights in each subtree is at most u and at least l. To solve the two problems, we provide a fast polynomial-time algorithm, which includes a pre-processing method and a bottom-up scheme with dynamic programming. Experimental studies show that our algorithm greatly outperforms a prior algorithm presented by Ito et al.
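To make the problem statement above concrete, the sketch below checks whether a given set of removed edges yields subtrees whose weight sums all lie in [l, u], and finds the minimum number of subtrees by exhaustive search. The brute-force search is exponential and is only meant for tiny examples; it is not the polynomial-time dynamic programming algorithm of the paper. All names and the sample tree are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def subtree_weights(n, edges, weights, cut_edges):
    """Remove cut_edges from the tree on nodes 0..n-1 and return the
    weight sum of each remaining connected component."""
    cut = {frozenset(e) for e in cut_edges}
    adj = defaultdict(list)
    for a, b in edges:
        if frozenset((a, b)) not in cut:
            adj[a].append(b)
            adj[b].append(a)
    seen, sums = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, total = [start], 0
        while stack:  # iterative depth-first traversal of one component
            v = stack.pop()
            total += weights[v]
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sums.append(total)
    return sums


def is_feasible(n, edges, weights, cut_edges, l, u):
    """True if every subtree weight sum is at least l and at most u."""
    return all(l <= s <= u for s in subtree_weights(n, edges, weights, cut_edges))


def min_subtrees_bruteforce(n, edges, weights, l, u):
    """Try every set of cut edges, smallest first; k cuts give k + 1 subtrees.
    Returns None when no feasible partition exists."""
    for k in range(len(edges) + 1):
        for cuts in combinations(edges, k):
            if is_feasible(n, edges, weights, cuts, l, u):
                return k + 1
    return None


# A path 0-1-2-3-4 with total weight 12, so one subtree exceeds u = 7.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
weights = [2, 3, 1, 4, 2]
best = min_subtrees_bruteforce(5, edges, weights, l=4, u=7)
```

Here cutting edge (1, 2) gives subtrees of weight 5 and 7, so two subtrees suffice.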
Davood MARDANI NAJAFABADI Masoud Reza AGHABOZORGI SAHAF Ali Akbar TADAION
In this paper, we propose a new method for wideband spectrum sensing using compressed measurements of the received wideband signal; we can directly separate the information of the sub-channels and perform detection in each. Wideband spectrum sensing enables rapid access to the vacant sub-channels in the high-utilization regime. Because some sub-channels are vacant at each time instant, the received signal is sparse in some bases. We can therefore apply Compressive Sensing (CS) algorithms and take compressed measurements. On the other hand, the primary user signals in different sub-channels could have different modulation types; therefore, the signal in each sub-channel is chosen from a signal space. Knowing these signal spaces, the secondary user can separate the information of different sub-channels using the compressed measurements. We perform filtering and detection based on these compressed measurements, which decreases the computational complexity of the wideband spectrum sensing. In addition, we model the received wideband signal as a vector which has a block-sparse representation on a basis consisting of all sub-channel bases, whose elements occur in clusters. Based on this feature of the received signal, we propose another wideband spectrum sensing method with lower computational complexity. To evaluate the performance of the proposed method, we employ Monte-Carlo simulation. According to the simulations, if the compression rate is selected appropriately according to the CS theorems and the problem model, the detection performance of our method approaches that of the ideal filter bank-based method, which uses ideal and impractical narrowband filters.
Young-Sik EOM Jong Wook KWAK Seong-Tae JHANG Chu-Shik JHON
Chip Multiprocessors (CMPs) allow different applications to share the LLC (Last Level Cache). Since each application has a different cache capacity demand, LLC capacity should be partitioned in accordance with the demands. Existing partitioning algorithms estimate the capacity demand of each core by stack processing, considering the LRU (Least Recently Used) replacement policy only. However, anti-thrashing replacement algorithms like BIP (Binary Insertion Policy) and BIP-Bypass emerged to overcome the thrashing problem of the LRU replacement policy when the working set is greater than the available cache size. Since existing stack processing cannot estimate the capacity demand under an anti-thrashing replacement policy, partitioning algorithms also cannot partition cache space under such a policy. In this letter, we prove that the BIP replacement policy is not amenable to stack processing but BIP-Bypass is. We modify stack processing to accommodate BIP-Bypass. In addition, we propose pipelined hardware for the modified stack processing. With this hardware, we can obtain the success function for various capacities under an anti-thrashing replacement policy and assess, in real time, the shared cache capacity adequate for each core.
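The conventional LRU stack processing that the abstract above refers to can be sketched as follows: one pass over an access trace records each reference's depth in an LRU-ordered stack, and from those depths the hit count for every cache capacity follows at once. This is the classic LRU-only form for a fully associative cache, not the modified BIP-Bypass version proposed in the letter; the function names and trace are illustrative.

```python
def lru_stack_distances(trace):
    """For each access, return its 1-based depth in the LRU stack
    (None for a first-time access), then move it to the top."""
    stack = []  # index 0 is the most recently used address
    distances = []
    for addr in trace:
        if addr in stack:
            depth = stack.index(addr) + 1
            stack.remove(addr)
        else:
            depth = None  # cold miss: never seen before
        stack.insert(0, addr)
        distances.append(depth)
    return distances


def hits_per_capacity(trace, max_capacity):
    """Hit count for LRU caches holding 1..max_capacity lines: an access
    hits in a cache of size c exactly when its stack depth is <= c."""
    distances = lru_stack_distances(trace)
    return [
        sum(1 for d in distances if d is not None and d <= c)
        for c in range(1, max_capacity + 1)
    ]


# A cyclic trace over three addresses: LRU thrashes until all three fit.
cyclic = ["a", "b", "c", "a", "b", "c"]
hits = hits_per_capacity(cyclic, max_capacity=3)
```

The cyclic trace shows the thrashing behavior mentioned in the abstract: with fewer than three lines, LRU gets no hits at all, because each address is evicted just before it is reused.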
Xianglei XING Sidan DU Hua JIANG
We extend the Nonparametric Discriminant Analysis (NDA) algorithm to a semi-supervised dimensionality reduction technique, called Semi-supervised Nonparametric Discriminant Analysis (SNDA). SNDA preserves the inherent advantages of NDA, that is, relaxing the Gaussian assumption required for the traditional LDA-based methods. SNDA takes advantage of both the discriminating power provided by the NDA method and the locality-preserving power provided by the manifold learning. Specifically, the labeled data points are used to maximize the separability between different classes and both the labeled and unlabeled data points are used to build a graph incorporating neighborhood information of the data set. Experiments on synthetic as well as real datasets demonstrate the effectiveness of the proposed approach.
Ji-Hun EO Yeon-Ho JEONG Young-Chan JANG
An 8-bit 100-kS/s successive approximation (SA) analog-to-digital converter (ADC) is proposed for measuring EEG and MEG signals in an 88 point. The architectures of a SA ADC with a single-ended analog input and a split-capacitor-based digital-to-analog converter (SC-DAC) are used to reduce the power consumption and chip area of the entire ADC. The proposed SA ADC uses a time-domain comparator that has an input offset self-calibration circuit. It also includes a serial output interface to support a daisy channel that reduces the number of channels for the multi-point sensor interface. It is designed using a 0.35-µm 1-poly 6-metal CMOS process with a 3.3 V supply so that it can be implemented together with conventional analog circuits such as a low-noise amplifier. The measured DNL and INL of the SA ADC are +0.63/-0.46 and +0.46/-0.51 LSB, respectively. The SNDR is 48.39 dB for a 1.11 kHz analog input signal at a sampling rate of 100 kS/s. The power consumption and core area are 38.71 µW and 0.059 mm², respectively.
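The successive approximation principle behind such a converter can be shown with an idealized behavioral sketch. It models a perfect DAC and comparator, ignoring the split-capacitor DAC, the time-domain comparator, and the offset calibration described above. Using the 3.3 V supply as the reference voltage is an assumption made for the example, and the function name is hypothetical.

```python
def sar_convert(vin, vref=3.3, bits=8):
    """Ideal successive-approximation conversion: decide one bit per step,
    from MSB to LSB, by comparing vin with the DAC output for the trial code.
    vref = 3.3 V is an assumed reference, not a value stated in the abstract."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        dac_voltage = trial * vref / (1 << bits)  # ideal DAC transfer
        if vin >= dac_voltage:  # ideal comparator keeps the bit
            code = trial
    return code


# With vref = 256 the LSB is exactly 1, which makes the result easy to check.
example_code = sar_convert(100.5, vref=256.0)
```

An 8-bit conversion takes eight comparisons, one per bit, which is why this architecture suits low-power, moderate-speed sensor interfaces.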
As the number of nodes in high-performance computing (HPC) systems increases, parallel I/O becomes an important issue. Collective I/O is the specialized parallel I/O that provides the function of single-file based parallel I/O. Collective I/O in most message passing interface (MPI) libraries follows a two-phase I/O scheme in which particular processes, namely I/O aggregators, play an important role by carrying out the communications and I/O operations. This approach, however, is based on a single-core architecture. Because modern HPC systems use multi-core computational nodes, the roles of I/O aggregators need to be re-evaluated. Although many previous studies have focused on improving the performance of collective I/O, it is difficult to locate a study of the assignment scheme for I/O aggregators that considers multi-core architectures. In this research, we found that the communication costs in collective I/O differed according to the placement of the I/O aggregators when each node had multiple I/O aggregators. The performance with the two processor affinity rules was measured, and the results demonstrated that the distributed affinity rule, which locates the I/O aggregators in different sockets, was appropriate for collective I/O. Because some applications may be unable to use the distributed affinity rule, the collective I/O scheme was modified in order to guarantee the appropriate placement of the I/O aggregators for the accumulated affinity rule. The performance of the proposed scheme was examined using two Linux cluster systems, and the results demonstrated that the performance improvements were more clearly evident when the computational node of a given cluster system had a complicated architecture. Under the accumulated affinity rule, the performance improvements of the proposed scheme over the original MPI-IO were up to approximately 26.25% for the read operation and up to approximately 31.27% for the write operation.
Ulises PINEDA-RICO Enrique STEVENS-NAVARRO
Precoding is an excellent choice for complementing MIMO systems. Linear precoding techniques offer better performance at low signal-to-noise ratios (SNRs), while non-linear techniques perform better at higher SNRs. In addition, the non-linear techniques can achieve near-optimal capacity at the expense of reasonable levels of complexity. However, precoding depends on knowledge of the wireless channel. Recent work on MIMO systems has shown that channel knowledge at the transmitter, in either full or partial form, can increase the channel capacity and system performance considerably. Therefore, hybrid techniques should be deployed in order to obtain a better trade-off between complexity and performance. In this paper, we present a hybrid precoding technique that deals with the condition of partial channel knowledge and offers robustness against the effects of correlation and poorly scattered channels, while keeping complexity low and performance high.