Narihiro NAKAMOTO Tomohiro OKA Shoichi KITAZAWA Hiroshi BAN Kiyoshi KOBAYASHI
To better understand antenna properties in a narrow space such as in a densely-packed device, a circular microstrip antenna in a narrow parallel-plate waveguide is theoretically studied. An analytical expression is derived for the input impedance in a parallel-plate waveguide by using the cavity model with surface admittance on the side wall. The surface admittance is defined by the external magnetic field due to the equivalent magnetic current at the aperture and takes into account the contribution of the parallel plates to the antenna. The magnetic field external to the antenna, that is in the parallel-plate region, is determined by using a dyadic Green's function. The input impedance is then calculated by a basic definition based on the conservation of the complex power. An analytical expression which couples the resonant frequency and the surface susceptance is also formulated. Presented expressions are validated by comparison with experimental results.
Meixun JIN Yong-Hun LEE Jong-Hyeok LEE
This paper presents a new span-based dependency chart parsing algorithm that models the relations between the left and right dependents of a head. Such relations cannot be modeled in existing span-based algorithms, despite their popularity in dependency corpora. We address this problem through ternary-span combination during the subtree derivation. By modeling the relations between the left and right dependents of a head, our proposed algorithm provides a better capability of coordination disambiguation when the conjunction is annotated as the head of the left and right conjuncts. This eventually leads to state-of-the-art performance of dependency parsing on the Chinese data of the CoNLL shared task.
Yasuhiko TAMURA Junichi NAKAYAMA
This paper deals with reflection and transmission of a TE plane wave from a one-dimensional random slab with slanted fluctuation by means of the stochastic functional approach. By starting with a generalized representation of the random wavefield from a two-dimensional random slab, and by using a manner for slanted anisotropic fluctuation, the corresponding random wavefield representation and its statistical quantities for one-dimensional cases are newly derived. The first-order incoherent scattering cross section is numerically calculated and illustrated in figures.
Marcos VILLAGRA Masaki NAKANISHI Shigeru YAMASHITA Yasuhiko NAKASHIMA
In this paper we study quantum nondeterminism in multiparty communication. There are three (possibly) different types of nondeterminism in quantum computation: i) strong, ii) weak with classical proofs, and iii) weak with quantum proofs. Here we focus on the first one. A strong quantum nondeterministic protocol accepts a correct input with positive probability and rejects an incorrect input with probability 1. In this work we relate strong quantum nondeterministic multiparty communication complexity to the rank of the communication tensor in the Number-On-Forehead and Number-In-Hand models. In particular, by extending the definition proposed by de Wolf to nondeterministic tensor-rank (nrank), we show that for any boolean function f when there is no prior shared entanglement between the players, 1) in the Number-On-Forehead model the cost is upper-bounded by the logarithm of nrank(f); 2) in the Number-In-Hand model the cost is lower-bounded by the logarithm of nrank(f). Furthermore, we show that when the number of players is o(log log n), we have NQP
Mark MANULIS Koutarou SUZUKI Berkant USTAOGLU
We propose a security model, referred as g-eCK model, for group key exchange that captures essentially all non-trivial leakage of static and ephemeral secret keys of participants, i.e., group key exchange version of extended Canetti-Krawczyk (eCK) model. Moreover, we propose the first one-round tripartite key exchange (3KE) protocol secure in the g-eCK model under the gap Bilinear Diffie-Hellman (gap BDH) assumption and in the random oracle model.
Sun-Mi PARK Ku-Young CHANG Dowon HONG Changho SEO
In this paper, we derive a fast polynomial basis multiplier for GF(2m) defined by pentanomials xm+xk3+xk2+xk1+1 with 1 ≤ k1 < k2 < k3 ≤ m/2 using the presented method by Park and Chang. The proposed multiplier has the time delay TA+(2+⌈log2(m-1)⌉) TX or TA+(3+⌈log2(m-1)⌉) TX which is the lowest one compared with known multipliers for pentanomials except for special types, where TA and TX denote the delays of one AND gate and one XOR gate, respectively. On the other hand, its space complexity is very slightly greater than the best known results.
Disk arrays and prefetching schemes are used to mitigate the performance gap between main memory and disks. This paper presents a new problem that arises if prefetching schemes that are widely used in operation systems are applied to disk arrays. The key point of the problem is that block address space from the viewpoint of the host is contiguous but from that of the disk array it is discontiguous and thus more disk accesses than expected are required. This paper presents two ways to resolve the problem that arises from the Linux readahead framework. The proposed scheme prevents a readahead window from being split into multiple requests from the viewpoint of the disk array but not from the viewpoint of the host thereby reducing disk head movements. In addition, it outperforms the prior work by adopting an asynchronous solution, improving performance for fragmented files, eliminating readahead size restriction, and improving disk parallelism. We implemented the proposed scheme and integrated it with Linux. Our experiment shows that the solution significantly improved the original Linux readahead framework when a storage server processes multiple concurrent requests.
Network topology significantly affects network cost, path length, link load distribution, and reliability, so we need to consider multiple criteria with different units simultaneously when designing a network's topology. The analytic hierarchy process (AHP) is a technique of balancing multiple criteria in order to reach a rational decision. Using AHP, we can reflect the relative importance of each criterion on the evaluation result; therefore, we have applied it to network topology evaluation in past research. When evaluating network topologies using AHP, we need to construct the set of topology candidates prior to the evaluation. However, the time required to construct this set greatly increases as the network size grows. In this paper, we propose applying a binary partition approach for constructing a topology candidate set with dramatically reduced calculation time. To reduce the calculation time, we introduce an upper limit for the total link length. Although the results of AHP are affected by introducing the upper limit of the total link length, we show that desirable topologies are still selected in AHP.
Hideki KAWAGUCHI Kazunori MAEDA Shohei KODATE Yoshihiro ITO
Streak cameras are now widely used for measurements of ultra short phenomena, such as those in semi conductor luminescence and plasma gaseous discharge. To further improve the temporal resolution and carry out higher-dimensional measurements, it is necessary to understand the electron beam behavior in detail. Thus, numerical simulations play an important role in the analysis of the streak camera. The authors have been working on the development of a numerical simulation code that uses the finite difference method (FDM) for electric field analysis, the Runge-Kutta (R-K) method for charged particle motion determination, and the particle-in-cell (PIC) method for charge density calculation. However, the use of the PIC method leads to inaccuracy in the charge density calculation in cases of high-density electron beams. To improve the accuracy of the conventional analysis of the streak camera, we perform the boundary element (BE) analysis of the streak camera.
Bei HUANG Kaidi YOU Yun CHEN Zhiyi YU Xiaoyang ZENG
Reed-Solomon (RS) codes are widely used in digital communication and storage systems. Unlike usual VLSI approaches, this paper presents a high throughput fully programmable Reed-Solomon decoder on a multi-core processor. The multi-core processor platform is a 2-Dimension mesh array of Single Instruction Multiple Data (SIMD) cores, and it is well suited for digital communication applications. By fully extracting the parallelizable operations of the RS decoding process, we propose multiple optimization techniques to improve system throughput, including: task level parallelism on different cores, data level parallelism on each SIMD core, minimizing memory access, and route length minimized task mapping techniques. For RS(255, 239, 8), experimental results show that our 12-core implementation achieve a throughput of 4.35 Gbps, which is much better than several other published implementations. From the results, it is predictable that the throughput is linear with the number of cores by our approach.
Shotaro IWANAGA Shinji FUKUMA Shin-ichiro MORI
In this paper, a hybrid parallel implementation of inverse matrix computation using SMW formula is proposed. By aggregating the memory bandwidth in the hybrid parallel implementation, the bottleneck due to the memory bandwidth limitation in the authors previous multicore implementation has been dissolved. More than 8 times of speed up is also achieved with dual-core 8-nodes implementation which leads more than 20 simulation steps per second, or near real-time performance.
Yoshihiro ICHINOMIYA Tsuyoshi KIMURA Motoki AMAGASAKI Morihiro KUGA Masahiro IIDA Toshinori SUEYOSHI
SRAM-based field programmable gate arrays (FPGAs) are vulnerable to a soft-error induced by radiation. Techniques for designing dependable circuits, such as triple modular redundancy (TMR) with scrubbing, have been studied extensively. However, currently available evaluation techniques that can be used to check the dependability of these circuits are inadequate. Further, their results are restrictive because they do not represent the result in terms of general reliability indicator to decide whether the circuit is dependable. In this paper, we propose an evaluation method that provides results in terms of the realistic failure in time (FIT) by using reconfiguration-based fault-injection analysis. Current fault-injection analyses do not consider fault accumulation, and hence, they are not suitable for evaluating the dependability of a circuit such as a TMR circuit. Therefore, we configure an evaluation system that can handle fault-accumulation by using frame-based partial reconfiguration and the bootstrap method. By using the proposed method, we successfully evaluated a TMR circuit and could discuss the result in terms of realistic FIT data. Our method can evaluate the dependability of an actual system, and help with the tuning and selection in dependable system design.
Dajiang LIU Shouyi YIN Chongyong YIN Leibo LIU Shaojun WEI
Reconfigurable computing system is a class of parallel architecture with the ability of computing in hardware to increase performance, while remaining much of flexibility of a software solution. This architecture is particularly suitable for running regular and compute-intensive tasks, nevertheless, most compute-intensive tasks spend most of their running time in nested loops. Polyhedron model is a powerful tool to give a reasonable transformation on such nested loops. In this paper, a number of issues are addressed towards the goal of optimization of affine loop nests for reconfigurable cell array (RCA), such as approach to make the most use of processing elements (PE) while minimizing the communication volume by loop transformation in polyhedron model, determination of tilling form by the intra-statement dependence analysis and determination of tilling size by the tilling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with DFG-based optimization approach, the execution performances of 1-d jacobi and matrix multiplication are improved by 28% and 48.47%. Lastly, the run-time complexity is acceptable for the practical cases.
Changwoo MIN Hyung Kook JUN Won Tae KIM Young Ik EOM
A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
Shogo MORI Gosuke OHASHI Yoshifumi SHIMODAIRA
This study examines the robustness of image quality factors in various types of environment illumination using a parameter design in the field of quality engineering. Experimental results revealed that image quality factors are influenced by environment illuminations in the following order: minimum luminance, maximum luminance and gamma.
Hiromitsu AWANO Hiroshi TSUTSUI Hiroyuki OCHI Takashi SATO
Random telegraph noise (RTN) is a phenomenon that is considered to limit the reliability and performance of circuits using advanced devices. The time constants of carrier capture and emission and the associated change in the threshold voltage are important parameters commonly included in various models, but their extraction from time-domain observations has been a difficult task. In this study, we propose a statistical method for simultaneously estimating interrelated parameters: the time constants and magnitude of the threshold voltage shift. Our method is based on a graphical network representation, and the parameters are estimated using the Markov chain Monte Carlo method. Experimental application of the proposed method to synthetic and measured time-domain RTN signals was successful. The proposed method can handle interrelated parameters of multiple traps and thereby contributes to the construction of more accurate RTN models.
Arne KUTZNER Pok-Son KIM Won-Kwang PARK
We propose a family of algorithms for efficiently merging on contemporary GPUs, so that each algorithm requires O(m log (+1)) element comparisons, where m and n are the sizes of the input sequences with m ≤ n. According to the lower bounds for merging all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallely structured algorithm that splits a merging problem of size 2l into 2i subproblems of size 2l-i, for some arbitrary i with (0 ≤ i ≤ l). This algorithm represents a merger for i=l but it is rather inefficient in this case. The efficiency is boosted by moving to a two stage approach where the splitting process stops at some predetermined level and transfers control to several parallely operating block-mergers. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting we construct and evaluate a MergeSort on top of it. In the context of our benchmarking the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU optimized variant of QuickSort.
Mamoru OHARA Takashi YAMAGUCHI
In numerical simulations using massively parallel computers like GPGPU (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since size of the computation results is sometimes unacceptably large to hold them, it is desired that the data is compressed and stored. In addition, considering overheads for transferring data between the devices and host memories, it is preferable that the data is compressed in a part of parallel computation performed on the devices. Traditional compression methods for floating-point numbers do not always show good parallelism. In this paper, we propose a new compression method for massively-parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of compression ratio and throughput obtained from experimental implementations of the proposed method runnig on CPUs and GPUs.
Shunsuke ONO Takamichi MIYATA Isao YAMADA Katsunori YAMAOKA
Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.
A parameterization of perfect sequences over composition algebras over the real number field is presented. According to the proposed parameterization theorem, a perfect sequence can be represented as a sum of trigonometric functions and points on a unit sphere of the algebra. Because of the non-commutativity of the multiplication, there are two definitions of perfect sequences, but the equivalence of the definitions is easily shown using the theorem. A composition sequence of sequences is introduced. Despite the non-associativity, the proposed theorem reveals that the composition sequence from perfect sequences is perfect.