Shunsuke ONO Takamichi MIYATA Isao YAMADA Katsunori YAMAOKA
Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.
Mamoru OHARA Takashi YAMAGUCHI
In numerical simulations using massively parallel computers like GPGPU (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the computation results are sometimes too large to store as they are, it is desirable to compress the data before storing it. In addition, considering the overhead of transferring data between the devices and host memory, it is preferable to compress the data as part of the parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always show good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.
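The idea of combining a few successive floating-point numbers and interleaving them can be illustrated with a small sketch. This is not the authors' method: it assumes a simple byte-level transpose within each group of values, and the function names, the group size of 4, and the use of `zlib` as the back-end compressor are all hypothetical choices made for illustration.

```python
import math
import struct
import zlib


def interleave_bytes(values, group=4):
    """Pack float64 values and transpose their bytes within each group.

    Assumption for illustration: byte 0 of every value in a group is
    emitted first, then byte 1 of every value, and so on.
    """
    out = bytearray()
    for start in range(0, len(values), group):
        packed = [struct.pack(">d", v) for v in values[start:start + group]]
        for byte_index in range(8):
            for p in packed:
                out.append(p[byte_index])
    return bytes(out)


def deinterleave_bytes(data, count, group=4):
    """Invert interleave_bytes for `count` original values."""
    values = []
    pos = 0
    remaining = count
    while remaining > 0:
        n = min(group, remaining)
        block = data[pos:pos + 8 * n]
        for i in range(n):
            raw = bytes(block[byte_index * n + i] for byte_index in range(8))
            values.append(struct.unpack(">d", raw)[0])
        pos += 8 * n
        remaining -= n
    return values


# Hypothetical smooth simulation output.
samples = [math.sin(0.001 * i) for i in range(1000)]
shuffled = interleave_bytes(samples)
compressed = zlib.compress(shuffled)
restored = deinterleave_bytes(zlib.decompress(compressed), len(samples))
```

Each group is processed independently of the others, which is what makes this kind of layout change attractive for parallel devices. Whether the interleaved stream actually compresses better depends on the data and on the compressor.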
Changwoo MIN Hyung Kook JUN Won Tae KIM Young Ik EOM
A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
Dajiang LIU Shouyi YIN Chongyong YIN Leibo LIU Shaojun WEI
A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for running regular, compute-intensive tasks, most of which spend the majority of their running time in nested loops. The polyhedron model is a powerful tool for transforming such nested loops. In this paper, a number of issues are addressed toward the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): making the most use of the processing elements (PEs) while minimizing the communication volume through loop transformation in the polyhedron model, determining the tiling form by intra-statement dependence analysis, and determining the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with a DFG-based optimization approach, the execution performance of 1-d Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.
Arne KUTZNER Pok-Son KIM Won-Kwang PARK
We propose a family of algorithms for efficient merging on contemporary GPUs, where each algorithm requires $O(m \log(n/m + 1))$ element comparisons, with $m$ and $n$ the sizes of the input sequences and $m \le n$. According to the lower bounds for merging, all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallel-structured algorithm that splits a merging problem of size $2^l$ into $2^i$ subproblems of size $2^{l-i}$, for some arbitrary $i$ with $0 \le i \le l$. This algorithm represents a merger for $i = l$, but it is rather inefficient in this case. The efficiency is boosted by moving to a two-stage approach where the splitting process stops at some predetermined level and transfers control to several block-mergers operating in parallel. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting, we construct and evaluate a MergeSort on top of it. In our benchmarking, the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU-optimized variant of QuickSort.
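Splitting one merge into equally sized, independent subproblems can be sketched with the well-known co-ranking (merge path) technique. This is a stand-in for illustration, not the splitting algorithm of the abstract, and it makes no claim about matching its comparison bound. The names `co_rank` and `merge_in_parts` are hypothetical, and the pieces are merged sequentially here, whereas on a GPU each piece would go to its own block-merger.

```python
import heapq


def co_rank(k, a, b):
    """Return i such that the first k outputs of a stable merge of sorted
    lists a and b consist of a[:i] and b[:k - i] (ties take from a first)."""
    lo = max(0, k - len(b))
    hi = min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        # Here i < len(a) and 1 <= j <= len(b), so both indices are valid.
        if b[j - 1] >= a[i]:
            lo = i + 1  # a[i] must precede b[j - 1], so i is too small
        else:
            hi = i
    return lo


def merge_in_parts(a, b, parts):
    """Merge sorted lists a and b via `parts` independent subproblems."""
    total = len(a) + len(b)
    bounds = [t * total // parts for t in range(parts + 1)]
    cuts = [(co_rank(k, a, b), k - co_rank(k, a, b)) for k in bounds]
    merged = []
    for (i0, j0), (i1, j1) in zip(cuts, cuts[1:]):
        # Each piece depends only on its own slices, so the pieces
        # could be merged concurrently.
        merged.extend(heapq.merge(a[i0:i1], b[j0:j1]))
    return merged


result = merge_in_parts([1, 3, 5, 7, 9, 11], [2, 3, 4, 10], parts=4)
```

Each call to `co_rank` is a binary search, so locating the split points is cheap compared with the merging itself.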
Liqiang ZHANG Chao LI Haoliang SUN Changwen ZHENG Pin LV
Because clouds have a complicated composition and transform in a disordered way, current methods do not render them realistically. Based on the physical characteristics of clouds, we design a physical cellular automata model of dynamic clouds according to their intrinsic factors, which describes the rules of hydro-movement, deposition, accumulation, and diffusion. A parallel computing architecture is then designed to compute the large-scale data set required for rendering dynamic clouds, and a GPU-based ray-casting algorithm is implemented to render the cloud volume data. Experiments show that the cloud rendering method based on the physical cellular automata model is very efficient and adequately exhibits the detail of clouds.
Geographic routing uses the geographical location information provided by nodes to make routing decisions. However, the nodes cannot obtain accurate location information because of measurement error. A new routing strategy using the maximum expected distance and angle (MEDA) algorithm is proposed to improve performance and raise the successive transmission rate. We first introduce the expected distance and angle, and then employ principal component analysis to construct the objective function for selecting the next-hop node. We compare the proposed algorithm with the maximum expectation within transmission range (MER) and greedy routing scheme (GRS) algorithms. Simulation results show that the proposed MEDA algorithm outperforms the MER and GRS algorithms, achieving a higher successive transmission rate.
Kam Swee NG Hyung-Jeong YANG Soo-Hyung KIM Sun-Hee KIM
In this paper, we propose a novel incremental method for discovering latent variables from multivariate data with high efficiency. It integrates non-Gaussianity and an adaptive incremental model in an unsupervised way to extract informative features. Our proposed method discovers a small number of compact features from a very large number of features and can still achieve good predictive performance in EEG signals. The promising EEG signal classification results from our experiments prove that this approach can successfully extract important features. Our proposed method also has low memory requirements and computational costs.
A parameterization of perfect sequences over composition algebras over the real number field is presented. According to the proposed parameterization theorem, a perfect sequence can be represented as a sum of trigonometric functions and points on a unit sphere of the algebra. Because of the non-commutativity of the multiplication, there are two definitions of perfect sequences, but the equivalence of the definitions is easily shown using the theorem. A composition sequence of sequences is introduced. Despite the non-associativity, the proposed theorem reveals that the composition sequence from perfect sequences is perfect.
Takahiro IIZUKA Takashi SAKUDA Yasunori ORITSUKI Akihiro TANAKA Masataka MIYAKE Hideyuki KIKUCHIHARA Uwe FELDMANN Hans Jurgen MATTAUSCH Mitiko MIURA-MATTAUSCH
In LDMOS devices for high-voltage applications, there appears a notable fingerprint of current-voltage characteristics known as soft breakdown. Its mechanism is analyzed and modeled on LDMOS devices where a high resistive drift region exists. This analysis has revealed that the softness of breakdown, known as the expansion effect, withholding a run-away of current, is contributed by the flux of holes underneath the gate-overlap region originated by impact-ionization. The mechanism of the expansion effect is modeled and implemented into the compact model HiSIM_HV for circuit simulation. A good agreement between simulated characteristics and 2D-device simulation results is verified.
In the framework of the modernization plan of COMPASS system, the existing COMPASS signals should be transmitted along with the modernized signals to maintain backward compatibility. In this paper, an efficient multiplexing scheme based on the optimal aligning method for combining COMPASS Phase II B3 and Phase III B3 signals is proposed, which offers significantly higher efficiency than Interplex and Generalized Majority Voting (GMV) multiplexing methods. The proposed scheme can provide potential opportunities for COMPASS system and other global navigation satellite systems (GNSS) modernization and construction plans.
Fanxin ZENG Xiaoping ZENG Zhenyu ZHANG Guixin XUAN
In contemporary communications, Golay, periodic, and Z-complementary sequence sets play a very important role, since such sequence sets possess impulse-like or zero correlation zone (ZCZ) autocorrelation. On the other hand, the advantages of signals over the quadrature amplitude modulation (QAM) constellation are becoming more and more prominent. Hence, the design of such sequence sets over the QAM constellation has become an important issue in communications. In this letter, construction methods for such sequence sets over the 16-QAM constellation are investigated, and the constructions are obtained from the known quaternary Golay, periodic, and Z-complementary sequence sets. Finally, many examples illustrate the validity of the proposed methods.
Seokhyun YOON Kangwoon SEO Taehyun JEON
This letter addresses antenna ordering to improve the performance of the MIMO detectors in [4], where two low complexity MIMO detectors have been proposed based on either fully-connected or ring-type pair-wise Markov random field (MRF). The former was shown to be better than the latter, while being more complex. The objective of this letter is to make the performance of the detector based on ring-type MRF (with complexity of $O(2M \cdot 2^{2m})$) close to or better than that of fully-connected MRF (with complexity of $O(M(M-1) \cdot 2^{2m})$), by applying appropriate antenna ordering. The simulation results validate the proposed antenna ordering methods.
Takehiro ISHIGURO Takao HARA Minoru OKADA
For effective use of the frequency band, the carrier superposing (common band) technique has been introduced to satellite communication systems. On the other hand, a satellite's TWTA (Traveling Wave Tube Amplifier) should be operated near its saturation level for power efficiency. However, the TWTA nonlinearity characteristics around that level cause interference in carrier superposing systems. Therefore, in this paper, a post-compensation technique for TWTA nonlinear distortion is introduced and verified for practical use in a carrier superposed point-to-point satellite communication system that adopts an interference canceller. Simulation results show that it is possible to reduce the bit error rate degradation over the entire range, especially at the nonlinear operating point.
Shinya MATSUFUJI Takahiro MATSUMOTO Pingzhi FAN
The even-shift orthogonal sequence, whose out-of-phase aperiodic autocorrelation function takes zero at any even shift, is generalized to multiple dimensions as the even-shift orthogonal array (E-array), and the logic function of the E-array of power-of-two length is clarified. It is shown that, as in the one-dimensional case, the E-array can be constructed from complementary arrays, that is, pairs of arrays for which the sum of the aperiodic autocorrelation functions at the same phase shift takes zero at any shift except the zero shift. It is also shown that the number of mates of an E-array, with which the cross correlation function between E-arrays takes zero at any even shift, is equal to the dimension. Furthermore, it is shown that the E-array possesses good aperiodic autocorrelation, in that the ratio of zero correlation values to array length approaches one as the dimension becomes large.
Katsuya NAKAHIRA Jun-ichi ABE Jun MASHINO Takatoshi SUGIYAMA
This paper proposes a new channel allocation algorithm for satellite communication systems. The algorithm is based on a spectrum division transmission technique as well as a spectrum compression transmission technique that we have developed in separate pieces of work. Using these techniques, the algorithm optimizes the spectrum bandwidth and the MODCOD (modulation and FEC error coding rate) scheme to balance the usable amount of satellite transponder bandwidth and satellite transmission power. Moreover, it determines the center frequency and bandwidth of each divided subspectrum depending on the unused bandwidth of the satellite transponder. As a result, the proposed algorithm enables flexible and effective usage of satellite resources (bandwidth and power) in channel allocations and thus enhances satellite communication (SATCOM) system capacity.
Lijuan ZHENG Yingxin HU Zhen HAN Fei MA
Previous inter-domain fast authentication schemes only realize the authentication of user identity. We propose a trusted inter-domain fast authentication scheme based on the split mechanism network. The proposed scheme can realize proof of identity and integrity verification of the platform as well as proof of the user identity. In our scheme, when the mobile terminal moves to a new domain, the visited domain directly authenticates the mobile terminal using the ticket issued by the home domain rather than authenticating it through its home domain. We demonstrate that the proposed scheme is highly effective and more secure than contemporary inter-domain fast authentication schemes.
Chuzo IWAMOTO Yoshihiro WADA Kenichi MORITA
Shisen-Sho is a tile-based one-player game. The instance is a set of 136 tiles embedded on an 8 × 17 rectangular grid. Two tiles can be removed if they are labeled by the same number and if they are adjacent or can be connected with at most three orthogonal line segments. Here, line segments must not cross tiles. The aim of the game is to remove all of the 136 tiles. In this paper, we consider the generalized version of Shisen-Sho, which uses an arbitrary number of tiles embedded on rectangular grids. It is shown that deciding whether the player can remove all of the tiles is NP-complete.
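The removal rule for a single pair of tiles can be checked with a breadth-first search that counts line segments. The sketch below is illustrative only: the function name `can_remove`, the encoding of empty cells as 0, and the `pad` option are assumptions. With `pad=True` the connecting line may run through an empty margin just outside the board, a common convention that the abstract does not confirm. Checking one move is easy, and the hardness result concerns whether some sequence of moves clears the whole board.

```python
from collections import deque


def can_remove(grid, p, q, max_segments=3, pad=True):
    """Return True if tiles at p and q match and can be joined by at most
    `max_segments` orthogonal segments that cross no tile.

    grid: list of rows; 0 means empty, any other value is a tile label.
    pad:  assumption for illustration; allows paths along an empty margin
          outside the board.
    """
    (r1, c1), (r2, c2) = p, q
    if p == q or grid[r1][c1] == 0 or grid[r1][c1] != grid[r2][c2]:
        return False
    if pad:
        width = len(grid[0])
        grid = ([[0] * (width + 2)]
                + [[0] + list(row) + [0] for row in grid]
                + [[0] * (width + 2)])
        r1, c1, r2, c2 = r1 + 1, c1 + 1, r2 + 1, c2 + 1
    rows, cols = len(grid), len(grid[0])

    # segments[cell] = fewest segments needed to reach that empty cell.
    segments = {(r1, c1): 0}
    frontier = deque([(r1, c1)])
    while frontier:
        r, c = frontier.popleft()
        used = segments[(r, c)]
        if used == max_segments:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Slide along one straight segment until a tile or the edge.
            while 0 <= nr < rows and 0 <= nc < cols:
                if (nr, nc) == (r2, c2):
                    return True
                if grid[nr][nc] != 0:
                    break
                if (nr, nc) not in segments:
                    segments[(nr, nc)] = used + 1
                    frontier.append((nr, nc))
                nr, nc = nr + dr, nc + dc
    return False


board = [
    [1, 2, 1],
    [3, 0, 3],
    [2, 4, 4],
]
straight = can_remove(board, (1, 0), (1, 2))  # joined through the empty centre
```

Because every straight run from a cell costs exactly one segment, the search visits cells in order of segment count, so the first time the target is reached the bound has been respected.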
Tetsuhiro OKANO Shouhei KIDERA Tetsuo KIRIMOTO
Blind source separation (BSS) techniques are required for various signal decomposing issues. Independent component analysis (ICA), assuming only a statistical independence among stochastic source signals, is one of the most useful BSS tools because it does not need a priori information on each source. However, there are many requirements for decomposing multiple deterministic signals such as complex sinusoidal signals with different frequencies. These requirements may include pulse compression or clutter rejection. It has been theoretically shown that an ICA algorithm based on maximizing non-Gaussianity successfully decomposes such deterministic signals. However, this ICA algorithm does not maintain a sufficient separation performance when the frequency difference of the sinusoidal waves becomes less than a nominal frequency resolution. To solve this problem, this paper proposes a super-resolution algorithm for complex sinusoidal signals by extending the maximum likelihood ICA, where the probability density function (PDF) of a complex sinusoidal signal is exploited as a priori knowledge, in which the PDF of the signal amplitude is approximated as a Gaussian distribution with an extremely small standard deviation. Furthermore, we introduce an optimization process for this standard deviation to avoid divergence in updating the reconstruction matrix. Numerical simulations verify that our proposed algorithm remarkably enhances the separation performance compared to the conventional one, and accomplishes a super-resolution separation even in noisy situations.
An instrument that can efficiently measure individual competency of IT applications (ICITA) is presented. It allows an organization to develop and manage the IT application capability of individuals working in an enterprise IT environment. The measurement items are generated from the definition and major components of individual competency of IT applications. The reliability and validity of the instrument construct are verified by factor and correlation analysis. A 15-item instrument is proposed to efficiently measure individual competency of IT applications and the instrument will contribute to the improved ICITA of human resources working in an enterprise IT environment.