
Keyword Search Result

[Keyword] OMP(3945hit)

1301-1320hit(3945hit)

  • Image Recovery by Decomposition with Component-Wise Regularization

    Shunsuke ONO  Takamichi MIYATA  Isao YAMADA  Katsunori YAMAOKA  

     
    PAPER-Image

      Vol:
    E95-A No:12
      Page(s):
    2470-2478

    Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.

  • Lossless Compression of Double-Precision Floating-Point Data for Numerical Simulations: Highly Parallelizable Algorithms for GPU Computing

    Mamoru OHARA  Takashi YAMAGUCHI  

     
    PAPER-Parallel and Distributed Computing

      Vol:
    E95-D No:12
      Page(s):
    2778-2786

    In numerical simulations using massively parallel computers like GPGPU (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the computation results are sometimes too large to hold, it is desirable to compress the data before storing it. In addition, considering the overhead of transferring data between the devices and host memory, it is preferable to compress the data as part of the parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always show good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.

  • Scalable Cache-Optimized Concurrent FIFO Queue for Multicore Architectures

    Changwoo MIN  Hyung Kook JUN  Won Tae KIM  Young Ik EOM  

     
    LETTER

      Vol:
    E95-D No:12
      Page(s):
    2956-2957

    A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.

  • Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture

    Dajiang LIU  Shouyi YIN  Chongyong YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Computer Architecture

      Vol:
    E95-D No:12
      Page(s):
    2898-2907

    A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for running regular, compute-intensive tasks; however, most compute-intensive tasks spend most of their running time in nested loops. The polyhedron model is a powerful tool for applying reasonable transformations to such nested loops. In this paper, a number of issues are addressed towards the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): an approach that makes the most use of processing elements (PEs) while minimizing the communication volume by loop transformation in the polyhedron model, determination of the tiling form by intra-statement dependence analysis, and determination of the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with a DFG-based optimization approach, the execution performance of 1-D Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.

  • Asymptotically Optimal Merging on ManyCore GPUs

    Arne KUTZNER  Pok-Son KIM  Won-Kwang PARK  

     
    PAPER-Parallel and Distributed Computing

      Vol:
    E95-D No:12
      Page(s):
    2769-2777

    We propose a family of algorithms for efficiently merging on contemporary GPUs, so that each algorithm requires O(m log(n/m + 1)) element comparisons, where m and n are the sizes of the input sequences with m ≤ n. According to the lower bounds for merging, all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallel-structured algorithm that splits a merging problem of size 2^l into 2^i subproblems of size 2^(l-i), for some arbitrary i with 0 ≤ i ≤ l. This algorithm represents a merger for i = l, but it is rather inefficient in this case. The efficiency is boosted by moving to a two-stage approach where the splitting process stops at some predetermined level and transfers control to several block-mergers operating in parallel. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. To assess the value of our merging technique in the context of sorting, we construct and evaluate a MergeSort on top of it. In our benchmarks, the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU-optimized variant of QuickSort.
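The two-stage idea in the abstract above (split one merge into equally sized independent subproblems, then hand each to a block merger) can be illustrated with a small sequential sketch. This is not the authors' algorithm: it uses the well-known merge-path style binary search to find split points, and the names `split_point`, `merge_block`, and `split_merge` are hypothetical. On a GPU each block would run in parallel; here they simply run in a loop.

```python
def split_point(a, b, d):
    """Return i such that a[:i] and b[:d - i] together hold the first d
    elements of the merged output (elements of a win ties)."""
    lo, hi = max(0, d - len(b)), min(d, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = d - i
        if a[i] <= b[j - 1]:
            lo = i + 1  # a[i] belongs before b[j-1], so take more from a
        else:
            hi = i
    return lo


def merge_block(a, b):
    """Plain sequential two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def split_merge(a, b, parts):
    """Merge sorted lists a and b by cutting the output into `parts`
    nearly equal chunks and merging each chunk independently."""
    total = len(a) + len(b)
    cuts = [total * k // parts for k in range(parts + 1)]
    a_cuts = [split_point(a, b, d) for d in cuts]
    out = []
    for k in range(parts):
        i0, i1 = a_cuts[k], a_cuts[k + 1]
        j0, j1 = cuts[k] - i0, cuts[k + 1] - i1
        out.extend(merge_block(a[i0:i1], b[j0:j1]))
    return out
```

With inputs whose combined size is 2^l and `parts` set to 2^i, every chunk has output size 2^(l-i), matching the subproblem sizes the abstract mentions.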

  • Parallel Dynamic Cloud Rendering Method Based on Physical Cellular Automata Model

    Liqiang ZHANG  Chao LI  Haoliang SUN  Changwen ZHENG  Pin LV  

     
    PAPER-Parallel and Distributed Computing

      Vol:
    E95-D No:12
      Page(s):
    2750-2758

    Due to the complicated composition of clouds and their disordered transformation, current rendering methods do not fully match the actual appearance of clouds. Based on the physical characteristics of clouds, a physical cellular automata model of dynamic clouds is designed according to their intrinsic factors; it describes the rules of hydro-movement, deposition and accumulation, and diffusion. A parallel computing architecture is then designed to compute the large-scale data set required for rendering dynamic clouds, and a GPU-based ray-casting algorithm is implemented to render the cloud volume data. The experiment shows that the cloud rendering method based on the physical cellular automata model is very efficient and able to adequately exhibit the detail of clouds.

  • Geographic Routing Algorithm with Location Errors

    Yuanwei JING  Yan WANG  

     
    LETTER-Information Network

      Vol:
    E95-D No:12
      Page(s):
    3092-3096

    Geographic routing uses the geographical location information provided by nodes to make routing decisions. However, the nodes cannot obtain accurate location information because of measurement error. A new routing strategy using the maximum expected distance and angle (MEDA) algorithm is proposed to improve performance and raise the successive transmission rate. We first introduce the expected distance and angle, and then employ principal component analysis to construct the objective function for selecting the next-hop node. We compare the proposed algorithm with the maximum expectation within transmission range (MER) and greedy routing scheme (GRS) algorithms. Simulation results show that the proposed MEDA algorithm outperforms the MER and GRS algorithms, with a higher successive transmission rate.

  • Incremental Non-Gaussian Analysis on Multivariate EEG Signal Data

    Kam Swee NG  Hyung-Jeong YANG  Soo-Hyung KIM  Sun-Hee KIM  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E95-D No:12
      Page(s):
    3010-3016

    In this paper, we propose a novel incremental method for discovering latent variables from multivariate data with high efficiency. It integrates non-Gaussianity and an adaptive incremental model in an unsupervised way to extract informative features. Our proposed method discovers a small number of compact features from a very large number of features and can still achieve good predictive performance in EEG signals. The promising EEG signal classification results from our experiments prove that this approach can successfully extract important features. Our proposed method also has low memory requirements and computational costs.

  • Parameterization of Perfect Sequences over a Composition Algebra

    Takao MAEDA  Takafumi HAYASHI  

     
    PAPER-Sequence

      Vol:
    E95-A No:12
      Page(s):
    2139-2147

    A parameterization of perfect sequences over composition algebras over the real number field is presented. According to the proposed parameterization theorem, a perfect sequence can be represented as a sum of trigonometric functions and points on a unit sphere of the algebra. Because of the non-commutativity of the multiplication, there are two definitions of perfect sequences, but the equivalence of the definitions is easily shown using the theorem. A composition sequence of sequences is introduced. Despite the non-associativity, the proposed theorem reveals that the composition sequence from perfect sequences is perfect.

  • Compact Modeling of Expansion Effects in LDMOS

    Takahiro IIZUKA  Takashi SAKUDA  Yasunori ORITSUKI  Akihiro TANAKA  Masataka MIYAKE  Hideyuki KIKUCHIHARA  Uwe FELDMANN  Hans Jurgen MATTAUSCH  Mitiko MIURA-MATTAUSCH  

     
    PAPER-Semiconductor Materials and Devices

      Vol:
    E95-C No:11
      Page(s):
    1817-1823

    In LDMOS devices for high-voltage applications, there appears a notable fingerprint of current-voltage characteristics known as soft breakdown. Its mechanism is analyzed and modeled on LDMOS devices where a high resistive drift region exists. This analysis has revealed that the softness of breakdown, known as the expansion effect, withholding a run-away of current, is contributed by the flux of holes underneath the gate-overlap region originated by impact-ionization. The mechanism of the expansion effect is modeled and implemented into the compact model HiSIM_HV for circuit simulation. A good agreement between simulated characteristics and 2D-device simulation results is verified.

  • An Efficient Multiplexing Scheme for COMPASS B3 Signals

    Wei LIU  Yuan HU  Xingqun ZHAN  

     
    LETTER-Navigation, Guidance and Control Systems

      Vol:
    E95-B No:11
      Page(s):
    3633-3636

    In the framework of the modernization plan of COMPASS system, the existing COMPASS signals should be transmitted along with the modernized signals to maintain backward compatibility. In this paper, an efficient multiplexing scheme based on the optimal aligning method for combining COMPASS Phase II B3 and Phase III B3 signals is proposed, which offers significantly higher efficiency than Interplex and Generalized Majority Voting (GMV) multiplexing methods. The proposed scheme can provide potential opportunities for COMPASS system and other global navigation satellite systems (GNSS) modernization and construction plans.

  • 16-QAM Golay, Periodic and Z-Complementary Sequence Sets

    Fanxin ZENG  Xiaoping ZENG  Zhenyu ZHANG  Guixin XUAN  

     
    LETTER-Information Theory

      Vol:
    E95-A No:11
      Page(s):
    2084-2089

    In contemporary communications, Golay, periodic and Z-complementary sequence sets play a very important role, since such sequence sets possess impulse-like or zero correlation zone (ZCZ) autocorrelation. On the other hand, the advantages of signals over the quadrature amplitude modulation (QAM) constellation are increasingly prominent. Hence, the design of such sequence sets over the QAM constellation has become an important issue in communications. In this letter, construction methods for such sequence sets over the 16-QAM constellation are investigated, and our goals are achieved by using the known quaternary Golay, periodic and Z-complementary sequence sets. Finally, many examples illustrate the validity of the proposed methods.

  • Antenna Ordering in Low Complexity MIMO Detection Based on Ring-Type Markov Random Fields

    Seokhyun YOON  Kangwoon SEO  Taehyun JEON  

     
    LETTER-Wireless Communication Technologies

      Vol:
    E95-B No:11
      Page(s):
    3621-3624

    This letter addresses antenna ordering to improve the performance of the MIMO detectors in [4], where two low complexity MIMO detectors have been proposed based on either fully-connected or ring-type pair-wise Markov random field (MRF). The former was shown to be better than the latter, while being more complex. The objective of this letter is to make the performance of the detector based on ring-type MRF (with complexity of O(2M·2^(2m))) close to or better than that of fully-connected MRF (with complexity of O(M(M-1)·2^(2m))), by applying appropriate antenna ordering. The simulation results validate the proposed antenna ordering methods.
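To get a feel for the gap between the two complexity expressions quoted above, the following sketch evaluates them as raw counts. This is only an illustration: the abstract does not define M or m, so both are treated here as plain integer parameters, constant factors hidden by the O-notation are ignored, and the function names are hypothetical.

```python
def ring_mrf_cost(M, m):
    """Evaluate 2M * 2^(2m), the ring-type expression, as a raw count."""
    return 2 * M * 2 ** (2 * m)


def full_mrf_cost(M, m):
    """Evaluate M(M-1) * 2^(2m), the fully-connected expression."""
    return M * (M - 1) * 2 ** (2 * m)


# Tabulate both expressions for a few values of M at a fixed m.
for M in (2, 3, 4, 8, 16):
    m = 2
    print(M, ring_mrf_cost(M, m), full_mrf_cost(M, m))
```

The ratio of the two expressions is (M-1)/2 regardless of m, so they coincide at M = 3 and the ring-type expression is the smaller one only for M > 3.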

  • Post-Compensation Technique for Carrier Superposed Satellite Channel Including Nonlinear TWTA

    Takehiro ISHIGURO  Takao HARA  Minoru OKADA  

     
    PAPER

      Vol:
    E95-B No:11
      Page(s):
    3420-3427

    For effective use of the frequency band, the carrier superposing (common band) technique has been introduced to satellite communication systems. On the other hand, a satellite's TWTA (Traveling Wave Tube Amplifier) should be operated near its saturation level for power efficiency. However, the nonlinear characteristics of the TWTA around that level cause interference in carrier superposing systems. Therefore, in this paper, a post-compensation technique for TWTA nonlinear distortion is introduced and verified for practical use in a carrier superposed point-to-point satellite communication system that adopts an interference canceller. Simulation results show that it is possible to reduce the bit error rate degradation over the entire range, especially at the nonlinear operating point.

  • Even-Shift Orthogonal Arrays

    Shinya MATSUFUJI  Takahiro MATSUMOTO  Pingzhi FAN  

     
    LETTER-Sequences

      Vol:
    E95-A No:11
      Page(s):
    1937-1940

    The even-shift orthogonal sequence whose out-of-phase aperiodic autocorrelation function takes zero at any even shifts is generalized to multi-dimension called even-shift orthogonal array (E-array), and the logic function of E-array of power-of-two length is clarified. It is shown that E-array can be constructed by complementary arrays, which mean pairs of arrays that the sum of each aperiodic autocorrelation function at the same phase shifts takes zero at any shift except zero shift, as well as the one-dimensional case. It is also shown that the number of mates of E-array with which the cross correlation function between E-arrays takes zero at any even shifts is equal to the dimension. Furthermore it is investigated that E-array possesses good aperiodic autocorrelation that the rate of zero correlation values to array length approaches one as the dimension becomes large.

  • Novel Channel Allocation Algorithm Using Spectrum Control Technique for Effective Usage of both Satellite Transponder Bandwidth and Satellite Transmission Power

    Katsuya NAKAHIRA  Jun-ichi ABE  Jun MASHINO  Takatoshi SUGIYAMA  

     
    PAPER

      Vol:
    E95-B No:11
      Page(s):
    3393-3403

    This paper proposes a new channel allocation algorithm for satellite communication systems. The algorithm is based on a spectrum division transmission technique as well as a spectrum compression transmission technique that we have developed in separate pieces of work. Using these techniques, the algorithm optimizes the spectrum bandwidth and a MODCOD (modulation and FEC error coding rate) scheme to balance the usable amount of satellite transponder bandwidth and satellite transmission power. Moreover, it determines the center frequency and bandwidth of each divided subspectra depending on the unused bandwidth of the satellite transponder bandwidth. As a result, the proposed algorithm enables flexible and effective usage of satellite resources (bandwidth and power) in channel allocations and thus enhances satellite communication (SATCOM) system capacity.

  • Trusted Inter-Domain Fast Authentication Protocol in Split Mechanism Network

    Lijuan ZHENG  Yingxin HU  Zhen HAN  Fei MA  

     
    LETTER-Information Network

      Vol:
    E95-D No:11
      Page(s):
    2728-2731

    Previous inter-domain fast authentication schemes only realize the authentication of user identity. We propose a trusted inter-domain fast authentication scheme based on the split mechanism network. The proposed scheme can realize proof of identity and integrity verification of the platform as well as proof of the user identity. In our scheme, when the mobile terminal moves to a new domain, the visited domain directly authenticates the mobile terminal using the ticket issued by the home domain rather than authenticating it through its home domain. We demonstrate that the proposed scheme is highly effective and more secure than contemporary inter-domain fast authentication schemes.

  • Generalized Shisen-Sho is NP-Complete

    Chuzo IWAMOTO  Yoshihiro WADA  Kenichi MORITA  

     
    LETTER-Fundamentals of Information Systems

      Vol:
    E95-D No:11
      Page(s):
    2712-2715

    Shisen-Sho is a tile-based one-player game. The instance is a set of 136 tiles embedded on 8×17 rectangular grids. Two tiles can be removed if they are labeled by the same number and if they are adjacent or can be connected with at most three orthogonal line segments. Here, line segments must not cross tiles. The aim of the game is to remove all of the 136 tiles. In this paper, we consider the generalized version of Shisen-Sho, which uses an arbitrary number of tiles embedded on rectangular grids. It is shown that deciding whether the player can remove all of the tiles is NP-complete.
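The removal rule stated in the abstract above (same label, and adjacent or connectable by at most three orthogonal segments that do not cross tiles) can be checked with a breadth-first search over (cell, direction) states. The sketch below is a minimal illustration, not taken from the paper. It assumes a grid given as a list of lists in which 0 marks an empty cell, and it confines paths to the grid as given; whether paths may run outside the tile area is not stated in the abstract, so a caller who wants that should pad the grid with a border of zeros. The name `can_remove` is hypothetical.

```python
from collections import deque


def can_remove(grid, a, b, max_segments=3):
    """Return True if the tiles at cells a and b (row, col) form a removable pair."""
    rows, cols = len(grid), len(grid[0])
    (r1, c1), (r2, c2) = a, b
    if a == b or grid[r1][c1] == 0 or grid[r1][c1] != grid[r2][c2]:
        return False
    directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    # best[(r, c, d)]: fewest segments used when entering (r, c) in direction d
    best = {}
    # Each queue item: current cell, direction of the next step, segments used so far
    queue = deque((r1, c1, d, 1) for d in range(4))
    while queue:
        r, c, d, segs = queue.popleft()
        nr, nc = r + directions[d][0], c + directions[d][1]
        if not (0 <= nr < rows and 0 <= nc < cols):
            continue
        if (nr, nc) == b:
            return True
        if grid[nr][nc] != 0:
            continue  # segments must not cross other tiles
        if best.get((nr, nc, d), max_segments + 1) <= segs:
            continue  # already reached this state at least as cheaply
        best[(nr, nc, d)] = segs
        for nd in range(4):
            nsegs = segs if nd == d else segs + 1  # a turn starts a new segment
            if nsegs <= max_segments:
                queue.append((nr, nc, nd, nsegs))
    return False
```

For example, on the padded grid `[[0,0,0,0,0],[0,1,2,1,0],[0,0,0,0,0]]` the two tiles labeled 1 can be joined by going up, across, and down (three segments), while on the unpadded row `[[1,2,1]]` they cannot be joined at all.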

  • MLICA-Based Separation Algorithm for Complex Sinusoidal Signals with PDF Parameter Optimization

    Tetsuhiro OKANO  Shouhei KIDERA  Tetsuo KIRIMOTO  

     
    PAPER-Sensing

      Vol:
    E95-B No:11
      Page(s):
    3556-3562

    Blind source separation (BSS) techniques are required for various signal decomposing issues. Independent component analysis (ICA), assuming only a statistical independence among stochastic source signals, is one of the most useful BSS tools because it does not need a priori information on each source. However, there are many requirements for decomposing multiple deterministic signals such as complex sinusoidal signals with different frequencies. These requirements may include pulse compression or clutter rejection. It has been theoretically shown that an ICA algorithm based on maximizing non-Gaussianity successfully decomposes such deterministic signals. However, this ICA algorithm does not maintain a sufficient separation performance when the frequency difference of the sinusoidal waves becomes less than a nominal frequency resolution. To solve this problem, this paper proposes a super-resolution algorithm for complex sinusoidal signals by extending the maximum likelihood ICA, where the probability density function (PDF) of a complex sinusoidal signal is exploited as a priori knowledge, in which the PDF of the signal amplitude is approximated as a Gaussian distribution with an extremely small standard deviation. Furthermore, we introduce an optimization process for this standard deviation to avoid divergence in updating the reconstruction matrix. Numerical simulations verify that our proposed algorithm remarkably enhances the separation performance compared to the conventional one, and accomplishes a super-resolution separation even in noisy situations.

  • A Comprehensive Instrument for Measuring Individual Competency of IT Applications in an Enterprise IT Environment

    Chui Young YOON  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E95-D No:11
      Page(s):
    2651-2657

    An instrument that can efficiently measure individual competency of IT applications (ICITA) is presented. It allows an organization to develop and manage the IT application capability of individuals working in an enterprise IT environment. The measurement items are generated from the definition and major components of individual competency of IT applications. The reliability and validity of the instrument construct are verified by factor and correlation analysis. A 15-item instrument is proposed to efficiently measure individual competency of IT applications and the instrument will contribute to the improved ICITA of human resources working in an enterprise IT environment.
