IEICE global.ieice.org Site

Keyword Search Result

[Keyword] OMP(3945hit)

1301-1320hit(3945hit)

Image Recovery by Decomposition with Component-Wise Regularization
Shunsuke ONO Takamichi MIYATA Isao YAMADA Katsunori YAMAOKA

PAPER-Image

Vol:
E95-A No:12
Page(s):
2470-2478
Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.
Lossless Compression of Double-Precision Floating-Point Data for Numerical Simulations: Highly Parallelizable Algorithms for GPU Computing
Mamoru OHARA Takashi YAMAGUCHI

PAPER-Parallel and Distributed Computing

Vol:
E95-D No:12
Page(s):
2778-2786
In numerical simulations using massively parallel computers like GPGPU (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the computation results are sometimes too large to hold there, it is desirable to compress the data before storing it. In addition, considering the overhead of transferring data between the devices and host memory, it is preferable to compress the data as part of the parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always show good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.
Scalable Cache-Optimized Concurrent FIFO Queue for Multicore Architectures
Changwoo MIN Hyung Kook JUN Won Tae KIM Young Ik EOM

LETTER

Vol:
E95-D No:12
Page(s):
2956-2957
A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture
Dajiang LIU Shouyi YIN Chongyong YIN Leibo LIU Shaojun WEI

PAPER-Computer Architecture

Vol:
E95-D No:12
Page(s):
2898-2907
A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for running regular, compute-intensive tasks, and most compute-intensive tasks spend most of their running time in nested loops. The polyhedron model is a powerful tool for transforming such nested loops. In this paper, a number of issues are addressed towards the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): an approach that makes the most use of processing elements (PEs) while minimizing the communication volume by loop transformation in the polyhedron model, determination of the tiling form by intra-statement dependence analysis, and determination of the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with a DFG-based optimization approach, the execution performance of 1-d Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.
Asymptotically Optimal Merging on ManyCore GPUs
Arne KUTZNER Pok-Son KIM Won-Kwang PARK

PAPER-Parallel and Distributed Computing

Vol:
E95-D No:12
Page(s):
2769-2777
We propose a family of algorithms for efficiently merging on contemporary GPUs, so that each algorithm requires O(m log(n/m + 1)) element comparisons, where m and n are the sizes of the input sequences with m ≤ n. According to the lower bounds for merging, all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallely structured algorithm that splits a merging problem of size 2^l into 2^i subproblems of size 2^(l-i), for some arbitrary i with (0 ≤ i ≤ l). This algorithm represents a merger for i=l but it is rather inefficient in this case. The efficiency is boosted by moving to a two stage approach where the splitting process stops at some predetermined level and transfers control to several parallely operating block-mergers. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting we construct and evaluate a MergeSort on top of it. In the context of our benchmarking the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU optimized variant of QuickSort.
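The idea of splitting one merge into independent subproblems can be illustrated with a short sketch. This is not the algorithm of the paper: it uses a generic binary-search partition (often called co-ranking), and the names `co_rank` and `split_merge` are hypothetical. Each resulting piece could be handed to a separate worker, although the sketch processes them sequentially.

```python
from heapq import merge


def co_rank(k, a, b):
    """Return i such that a[:i] and b[:k - i] together hold the k smallest
    elements of the two sorted lists a and b."""
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        if a[i] < b[j - 1]:
            lo = i + 1  # too few elements taken from a
        else:
            hi = i
    return lo


def split_merge(a, b, parts):
    """Merge sorted lists a and b by cutting the work into `parts`
    independent subproblems of (nearly) equal output size."""
    total = len(a) + len(b)
    ranks = [t * total // parts for t in range(parts + 1)]
    cuts = [co_rank(k, a, b) for k in ranks]
    out = []
    for t in range(parts):
        i0, i1 = cuts[t], cuts[t + 1]
        j0, j1 = ranks[t] - i0, ranks[t + 1] - i1
        # Each piece depends only on its own slices, so the pieces
        # could be merged in parallel; here they run one after another.
        out.extend(merge(a[i0:i1], b[j0:j1]))
    return out
```

Calling `split_merge(a, b, parts)` with two sorted lists returns the same result as an ordinary merge, regardless of the number of parts chosen.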
Parallel Dynamic Cloud Rendering Method Based on Physical Cellular Automata Model
Liqiang ZHANG Chao LI Haoliang SUN Changwen ZHENG Pin LV

PAPER-Parallel and Distributed Computing

Vol:
E95-D No:12
Page(s):
2750-2758
Due to the complicated composition of clouds and their disordered transformation, current methods do not render clouds in a way that fully matches their actual appearance. Based on the physical characteristics of clouds, a physical cellular automata model of dynamic clouds is designed according to the intrinsic factors of clouds, which describes the rules of hydro-movement, deposition and accumulation, and diffusion. Then a parallel computing architecture is designed to compute the large-scale data set required by the rendering of dynamic clouds, and a GPU-based ray-casting algorithm is implemented to render the cloud volume data. The experiment shows that the cloud rendering method based on the physical cellular automata model is very efficient and able to adequately exhibit the detail of clouds.
Geographic Routing Algorithm with Location Errors
Yuanwei JING Yan WANG

LETTER-Information Network

Vol:
E95-D No:12
Page(s):
3092-3096
Geographic routing uses the geographical location information provided by nodes to make routing decisions. However, the nodes cannot obtain accurate location information due to the effect of measurement error. A new routing strategy using the maximum expected distance and angle (MEDA) algorithm is proposed to improve the performance and promote the successive transmission rate. We first introduce the expected distance and angle, and then we employ principal component analysis to construct the objective function for selecting the next hop node. We compare the proposed algorithm with the maximum expectation within transmission range (MER) and greedy routing scheme (GRS) algorithms. Simulation results show that the proposed MEDA algorithm outperforms the MER and GRS algorithms with a higher successive transmission rate.
Incremental Non-Gaussian Analysis on Multivariate EEG Signal Data
Kam Swee NG Hyung-Jeong YANG Soo-Hyung KIM Sun-Hee KIM

PAPER-Artificial Intelligence, Data Mining

Vol:
E95-D No:12
Page(s):
3010-3016
In this paper, we propose a novel incremental method for discovering latent variables from multivariate data with high efficiency. It integrates non-Gaussianity and an adaptive incremental model in an unsupervised way to extract informative features. Our proposed method discovers a small number of compact features from a very large number of features and can still achieve good predictive performance in EEG signals. The promising EEG signal classification results from our experiments prove that this approach can successfully extract important features. Our proposed method also has low memory requirements and computational costs.
Parameterization of Perfect Sequences over a Composition Algebra
Takao MAEDA Takafumi HAYASHI

PAPER-Sequence

Vol:
E95-A No:12
Page(s):
2139-2147
A parameterization of perfect sequences over composition algebras over the real number field is presented. According to the proposed parameterization theorem, a perfect sequence can be represented as a sum of trigonometric functions and points on a unit sphere of the algebra. Because of the non-commutativity of the multiplication, there are two definitions of perfect sequences, but the equivalence of the definitions is easily shown using the theorem. A composition sequence of sequences is introduced. Despite the non-associativity, the proposed theorem reveals that the composition sequence from perfect sequences is perfect.
Compact Modeling of Expansion Effects in LDMOS
Takahiro IIZUKA Takashi SAKUDA Yasunori ORITSUKI Akihiro TANAKA Masataka MIYAKE Hideyuki KIKUCHIHARA Uwe FELDMANN Hans Jurgen MATTAUSCH Mitiko MIURA-MATTAUSCH

PAPER-Semiconductor Materials and Devices

Vol:
E95-C No:11
Page(s):
1817-1823
In LDMOS devices for high-voltage applications, there appears a notable fingerprint of current-voltage characteristics known as soft breakdown. Its mechanism is analyzed and modeled on LDMOS devices where a high resistive drift region exists. This analysis has revealed that the softness of breakdown, known as the expansion effect, withholding a run-away of current, is contributed by the flux of holes underneath the gate-overlap region originated by impact-ionization. The mechanism of the expansion effect is modeled and implemented into the compact model HiSIM_HV for circuit simulation. A good agreement between simulated characteristics and 2D-device simulation results is verified.
An Efficient Multiplexing Scheme for COMPASS B3 Signals
Wei LIU Yuan HU Xingqun ZHAN

LETTER-Navigation, Guidance and Control Systems

Vol:
E95-B No:11
Page(s):
3633-3636
In the framework of the modernization plan of COMPASS system, the existing COMPASS signals should be transmitted along with the modernized signals to maintain backward compatibility. In this paper, an efficient multiplexing scheme based on the optimal aligning method for combining COMPASS Phase II B3 and Phase III B3 signals is proposed, which offers significantly higher efficiency than Interplex and Generalized Majority Voting (GMV) multiplexing methods. The proposed scheme can provide potential opportunities for COMPASS system and other global navigation satellite systems (GNSS) modernization and construction plans.
16-QAM Golay, Periodic and Z- Complementary Sequence Sets
Fanxin ZENG Xiaoping ZENG Zhenyu ZHANG Guixin XUAN

LETTER-Information Theory

Vol:
E95-A No:11
Page(s):
2084-2089
In contemporary communications, Golay, periodic and Z- complementary sequence sets play a very important role, since such sequence sets possess impulse-like or zero correlation zone (ZCZ) autocorrelation. On the other hand, the advantages of signals over the quadrature amplitude modulation (QAM) constellation are more and more prominent. Hence, the design of such sequence sets over the QAM constellation has become one of the important issues in communications. This letter therefore investigates construction methods for such sequence sets over the 16-QAM constellation, which are derived from the known quaternary Golay, periodic and Z- complementary sequence sets. Finally, many examples illustrate the validity of the proposed methods.
Antenna Ordering in Low Complexity MIMO Detection Based on Ring-Type Markov Random Fields
Seokhyun YOON Kangwoon SEO Taehyun JEON

LETTER-Wireless Communication Technologies

Vol:
E95-B No:11
Page(s):
3621-3624
This letter addresses antenna ordering to improve the performance of the MIMO detectors in [4], where two low complexity MIMO detectors have been proposed based on either fully-connected or ring type pair-wise Markov random field (MRF). The former was shown to be better than the latter, while being more complex. The objective of this letter is to make the performance of the detector based on ring-type MRF (with complexity of O(2M 2^(2m))) close to or better than that of fully-connected MRF (with complexity of O(M(M-1) 2^(2m))), by applying appropriate antenna ordering. The simulation results validate the proposed antenna ordering methods.
Post-Compensation Technique for Carrier Superposed Satellite Channel Including Nonlinear TWTA
Takehiro ISHIGURO Takao HARA Minoru OKADA

PAPER

Vol:
E95-B No:11
Page(s):
3420-3427
For effective use of the frequency band, the carrier superposing (common band) technique has been introduced to satellite communication systems. On the other hand, a satellite's TWTA (Traveling Wave Tube Amplifier) should be operated near its saturation level for power efficiency. However, the TWTA nonlinearity characteristics around that level cause interference in carrier superposing systems. Therefore, in this paper, a post-compensation technique for TWTA nonlinear distortion is introduced and verified for practical use in a carrier superposed point-to-point satellite communication system which adopts an interference canceller. Simulation results show that it is possible to reduce the bit error rate degradation over the entire range, especially at the nonlinear operating point.
Even-Shift Orthogonal Arrays
Shinya MATSUFUJI Takahiro MATSUMOTO Pingzhi FAN

LETTER-Sequences

Vol:
E95-A No:11
Page(s):
1937-1940
The even-shift orthogonal sequence whose out-of-phase aperiodic autocorrelation function takes zero at any even shifts is generalized to multi-dimension called even-shift orthogonal array (E-array), and the logic function of E-array of power-of-two length is clarified. It is shown that E-array can be constructed by complementary arrays, which mean pairs of arrays that the sum of each aperiodic autocorrelation function at the same phase shifts takes zero at any shift except zero shift, as well as the one-dimensional case. It is also shown that the number of mates of E-array with which the cross correlation function between E-arrays takes zero at any even shifts is equal to the dimension. Furthermore it is investigated that E-array possesses good aperiodic autocorrelation that the rate of zero correlation values to array length approaches one as the dimension becomes large.
Novel Channel Allocation Algorithm Using Spectrum Control Technique for Effective Usage of both Satellite Transponder Bandwidth and Satellite Transmission Power
Katsuya NAKAHIRA Jun-ichi ABE Jun MASHINO Takatoshi SUGIYAMA

PAPER

Vol:
E95-B No:11
Page(s):
3393-3403
This paper proposes a new channel allocation algorithm for satellite communication systems. The algorithm is based on a spectrum division transmission technique as well as a spectrum compression transmission technique that we have developed in separate pieces of work. Using these techniques, the algorithm optimizes the spectrum bandwidth and a MODCOD (modulation and FEC error coding rate) scheme to balance the usable amount of satellite transponder bandwidth and satellite transmission power. Moreover, it determines the center frequency and bandwidth of each divided subspectra depending on the unused bandwidth of the satellite transponder bandwidth. As a result, the proposed algorithm enables flexible and effective usage of satellite resources (bandwidth and power) in channel allocations and thus enhances satellite communication (SATCOM) system capacity.
Trusted Inter-Domain Fast Authentication Protocol in Split Mechanism Network
Lijuan ZHENG Yingxin HU Zhen HAN Fei MA

LETTER-Information Network

Vol:
E95-D No:11
Page(s):
2728-2731
Previous inter-domain fast authentication schemes only realize the authentication of user identity. We propose a trusted inter-domain fast authentication scheme based on the split mechanism network. The proposed scheme can realize proof of identity and integrity verification of the platform as well as proof of the user identity. In our scheme, when the mobile terminal moves to a new domain, the visited domain directly authenticates the mobile terminal using the ticket issued by the home domain rather than authenticating it through its home domain. We demonstrate that the proposed scheme is highly effective and more secure than contemporary inter-domain fast authentication schemes.
Generalized Shisen-Sho is NP-Complete
Chuzo IWAMOTO Yoshihiro WADA Kenichi MORITA

LETTER-Fundamentals of Information Systems

Vol:
E95-D No:11
Page(s):
2712-2715
Shisen-Sho is a tile-based one-player game. The instance is a set of 136 tiles embedded on 8 × 17 rectangular grids. Two tiles can be removed if they are labeled by the same number and if they are adjacent or can be connected with at most three orthogonal line segments. Here, line segments must not cross tiles. The aim of the game is to remove all of the 136 tiles. In this paper, we consider the generalized version of Shisen-Sho, which uses an arbitrary number of tiles embedded on rectangular grids. It is shown that deciding whether the player can remove all of the tiles is NP-complete.
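The removal rule stated above can be checked for a single pair of tiles with a small search. The following is a minimal sketch, not taken from the paper: the function name `can_remove`, the use of `0` for an empty cell, and the empty border that lets a path run around the outside of the board are all assumptions made for illustration. It only tests one move; deciding whether the whole board can be cleared is the NP-complete problem the abstract refers to.

```python
EMPTY = 0  # assumption: 0 marks an empty cell, positive ints are tile labels


def can_remove(board, a, b):
    """Return True if the tiles at positions a and b (row, col) carry the
    same label and can be joined by at most three orthogonal line segments
    that cross no other tile."""
    (r1, c1), (r2, c2) = a, b
    if a == b or board[r1][c1] == EMPTY or board[r1][c1] != board[r2][c2]:
        return False
    width = len(board[0]) + 2
    # Assumption: surround the board with empty cells so that a path
    # may leave the grid and run along its outside.
    grid = [[EMPTY] * width]
    grid += [[EMPTY] + list(row) + [EMPTY] for row in board]
    grid += [[EMPTY] * width]
    height = len(grid)
    start, goal = (r1 + 1, c1 + 1), (r2 + 1, c2 + 1)

    frontier, seen = {start}, {start}
    for _ in range(3):  # one round per line segment
        reached = set()
        for r, c in frontier:
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                while 0 <= rr < height and 0 <= cc < width:
                    if (rr, cc) == goal:
                        return True
                    if grid[rr][cc] != EMPTY:
                        break  # segments must not cross tiles
                    if (rr, cc) not in seen:
                        seen.add((rr, cc))
                        reached.add((rr, cc))
                    rr, cc = rr + dr, cc + dc
        frontier = reached
    return False
```

Each round extends straight lines from every cell reached so far, so after three rounds all cells reachable with at most three segments have been visited.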
MLICA-Based Separation Algorithm for Complex Sinusoidal Signals with PDF Parameter Optimization
Tetsuhiro OKANO Shouhei KIDERA Tetsuo KIRIMOTO

PAPER-Sensing

Vol:
E95-B No:11
Page(s):
3556-3562
Blind source separation (BSS) techniques are required for various signal decomposing issues. Independent component analysis (ICA), assuming only a statistical independence among stochastic source signals, is one of the most useful BSS tools because it does not need a priori information on each source. However, there are many requirements for decomposing multiple deterministic signals such as complex sinusoidal signals with different frequencies. These requirements may include pulse compression or clutter rejection. It has been theoretically shown that an ICA algorithm based on maximizing non-Gaussianity successfully decomposes such deterministic signals. However, this ICA algorithm does not maintain a sufficient separation performance when the frequency difference of the sinusoidal waves becomes less than a nominal frequency resolution. To solve this problem, this paper proposes a super-resolution algorithm for complex sinusoidal signals by extending the maximum likelihood ICA, where the probability density function (PDF) of a complex sinusoidal signal is exploited as a priori knowledge, in which the PDF of the signal amplitude is approximated as a Gaussian distribution with an extremely small standard deviation. Furthermore, we introduce an optimization process for this standard deviation to avoid divergence in updating the reconstruction matrix. Numerical simulations verify that our proposed algorithm remarkably enhances the separation performance compared to the conventional one, and accomplishes a super-resolution separation even in noisy situations.
A Comprehensive Instrument for Measuring Individual Competency of IT Applications in an Enterprise IT Environment
Chui Young YOON

PAPER-Artificial Intelligence, Data Mining

Vol:
E95-D No:11
Page(s):
2651-2657
An instrument that can efficiently measure individual competency of IT applications (ICITA) is presented. It allows an organization to develop and manage the IT application capability of individuals working in an enterprise IT environment. The measurement items are generated from the definition and major components of individual competency of IT applications. The reliability and validity of the instrument construct are verified by factor and correlation analysis. A 15-item instrument is proposed to efficiently measure individual competency of IT applications and the instrument will contribute to the improved ICITA of human resources working in an enterprise IT environment.
