IEICE global.ieice.org Site

Keyword Search Result

[Keyword] PAR(2741hit)

1541-1560hit(2741hit)

A Design of AES Encryption Circuit with 128-bit Keys Using Look-Up Table Ring on FPGA
Hui QIN, Tsutomu SASAO, Yukihiro IGUCHI

PAPER-Computer Components

Vol: E89-D, No: 3, Page(s): 1139-1147
This paper presents a pipelined partial rolling (PPR) architecture for AES encryption. Implemented on an Altera Stratix FPGA, two PPR implementations achieve throughputs of 6.45 Gbps and 12.78 Gbps, respectively. Compared with an unrolled implementation that achieves 22.75 Gbps on the same FPGA, the two PPR implementations improve memory efficiency (throughput divided by the memory size of the core) by 13.4% and 12.3%, respectively, and reduce memory usage by 75% and 50%, respectively. In addition, a PPR implementation achieves up to 9.83% higher memory efficiency than the fastest previously reported FPGA implementation. In terms of resource efficiency (throughput divided by the number of equivalent logic elements or slices), one PPR implementation is nearly equal to the rolling implementation, and the other falls between the rolling implementation and the unrolling implementation, which has the highest resource efficiency. However, both PPR implementations fit on the smallest Stratix FPGA, whereas the unrolling implementation does not. The PPR architecture thus fills the gap between the unrolling and rolling architectures and is suitable for small and medium-sized FPGAs.
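As a worked check of the memory-efficiency figures, the gains follow from the throughputs alone, assuming the 75% and 50% memory reductions refer to the same memory quantity used in the efficiency definition (an assumption; the abstract does not give absolute memory sizes):

```python
def relative_memory_efficiency(ppr_gbps, unrolled_gbps, memory_fraction):
    """Ratio of a PPR design's memory efficiency to the unrolled design's.

    Memory efficiency = throughput / core memory size. memory_fraction is the
    PPR memory expressed as a fraction of the unrolled design's memory, so the
    unknown absolute memory size cancels out of the ratio.
    """
    return (ppr_gbps / memory_fraction) / unrolled_gbps

UNROLLED_GBPS = 22.75
# A 75% memory reduction leaves 0.25 of the memory; a 50% reduction leaves 0.5.
for gbps, fraction in [(6.45, 0.25), (12.78, 0.50)]:
    gain = relative_memory_efficiency(gbps, UNROLLED_GBPS, fraction) - 1.0
    print(f"{gbps} Gbps using {fraction:.0%} of the memory: +{gain:.2%} memory efficiency")
```

This gives about +13.41% and +12.35%, consistent with the reported 13.4% and 12.3% up to rounding of the published inputs.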
Message Scheduling for Irregular Data Redistribution in Parallelizing Compilers
Hui WANG, Minyi GUO, Daming WEI

PAPER-Parallel/Distributed Programming Models, Paradigms and Tools

Vol: E89-D, No: 2, Page(s): 418-424
In parallelizing compilers for distributed memory systems, distributions of irregularly sized array blocks are provided for load balancing and for irregular problems. Irregular data redistribution differs from regular block-cyclic redistribution. This paper addresses message scheduling for irregular data redistribution, aiming to obtain suboptimal solutions that satisfy both the minimal communication cost condition and the minimal step condition. An efficient algorithm based on list scheduling is developed, and its experimental results are compared with those of previous algorithms. The improved list algorithm gives conflicting messages more chances to be placed in its relocation phase, since it allocates them using methods from a previously proposed divide-and-conquer algorithm and a previously proposed relocation algorithm. Selecting the relocation with the smallest cost ensures that the improved list algorithm is, on average, more efficient than the other two.
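The scheduling problem can be illustrated with plain greedy list scheduling. This is a minimal sketch under common modeling assumptions (in one step a processor sends at most one message and receives at most one; a step costs as much as its largest message); it is not the paper's improved algorithm and has no relocation phase, so it does not guarantee the minimal number of steps:

```python
from collections import Counter

def list_schedule(messages):
    """Greedy list scheduling of (src, dst, size) messages into steps.

    Messages are taken in decreasing size order so that large messages tend
    to share steps; each goes into the first step where neither its sender
    nor its receiver is already busy.
    """
    steps = []
    for src, dst, size in sorted(messages, key=lambda m: -m[2]):
        for step in steps:
            if src not in step["senders"] and dst not in step["receivers"]:
                break
        else:  # no conflict-free step exists: open a new one
            step = {"senders": set(), "receivers": set(), "msgs": []}
            steps.append(step)
        step["senders"].add(src)
        step["receivers"].add(dst)
        step["msgs"].append((src, dst, size))
    return [step["msgs"] for step in steps]

def schedule_cost(steps):
    # Communication cost: sum over steps of the largest message in each step.
    return sum(max(size for _, _, size in step) for step in steps)

def min_steps_lower_bound(messages):
    # No schedule can use fewer steps than the busiest sender or receiver needs.
    out_degree = Counter(src for src, _, _ in messages)
    in_degree = Counter(dst for _, dst, _ in messages)
    return max(max(out_degree.values()), max(in_degree.values()))
```

For the hypothetical message set `[(0, 0, 7), (0, 1, 3), (1, 1, 5), (1, 2, 2), (2, 2, 6), (2, 3, 4)]`, the greedy schedule uses 2 steps with cost 11, matching the lower bound of 2 steps; on harder inputs a relocation phase like the one described above is what closes the gap.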
Development and Implementation of an Interactive Parallelization Assistance Tool for OpenMP: iPat/OMP
Makoto ISHIHARA, Hiroki HONDA, Mitsuhisa SATO

PAPER-Parallel/Distributed Programming Models, Paradigms and Tools

Vol: E89-D, No: 2, Page(s): 399-407
iPat/OMP is an interactive parallelization assistance tool for OpenMP. In this paper, we describe the design concept of iPat/OMP, the parallelization sequence supported by the tool, and its current implementation status. We also evaluate the performance of the implemented functionalities. The experimental results show that iPat/OMP can detect parallelism and generate appropriate OpenMP directives for several for-loops.
A Multi-Projector Display System with Virtual Camera Method for Distortion Correction on Quadric Surface Screens
Masato OGATA, Hiroyuki WADA, Kagenori KAJIHARA, Jeroen van BAAR

PAPER-Computer Graphics

Vol: E89-D, No: 2, Page(s): 814-824
Multi-projector technology has attracted attention in recent years. It can generate wide-field-of-view, high-resolution images cost-effectively and is expected to be widely applied to training simulators that require vivid immersive sensations and precision. However, in many systems the viewing frustums cannot be assigned automatically for distributed rendering, and the required manual setup is complicated and difficult. This is because the camera must coincide exactly with the desired eye point to avoid perspective distortion. In actual applications, the camera can seldom be placed at the desired eye point because of physical constraints, e.g., a narrow cockpit with many instruments. To resolve this issue, we have developed a "virtual camera method" that yields high-precision calibration regardless of the camera position. This method exploits the quadric nature of the display surface. We have also developed a practical real-time multi-projector display system for applications, such as training simulators, that require high geometric accuracy and rapid response time.
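A basic geometric operation that any calibration on a quadric screen relies on is intersecting a viewing ray with the screen surface. The sketch below assumes the screen is described by a symmetric 4x4 matrix Q in homogeneous coordinates (points X on the surface satisfy X^T Q X = 0); it is a generic building block for illustration, not the authors' virtual camera method:

```python
import math

def mat_vec(Q, v):
    return [sum(Q[i][j] * v[j] for j in range(4)) for i in range(4)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ray_quadric_hit(Q, origin, direction):
    """Smallest positive t with X(t)^T Q X(t) = 0, or None if the ray misses.

    With X(t) = (origin, 1) + t * (direction, 0), the surface condition
    becomes the quadratic a*t^2 + 2*b*t + c = 0.
    """
    o = list(origin) + [1.0]     # point: homogeneous weight 1
    d = list(direction) + [0.0]  # direction: homogeneous weight 0
    Qd = mat_vec(Q, d)
    a = dot(d, Qd)
    b = dot(o, Qd)               # Q is symmetric, so o^T Q d == d^T Q o
    c = dot(o, mat_vec(Q, o))
    if abs(a) < 1e-12:           # degenerate case: the equation is linear in t
        if abs(b) < 1e-12:
            return None
        t = -c / (2 * b)
        return t if t > 0 else None
    disc = b * b - a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    hits = [t for t in ((-b - root) / a, (-b + root) / a) if t > 1e-12]
    return min(hits) if hits else None
```

For a spherical screen of radius r centered at the origin, Q = diag(1, 1, 1, -r^2); with r = 1, a ray from (0, 0, -3) along +z hits the screen at t = 2.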
Weaknesses of Two SAS-Like Password Authentication Schemes
Min-Hung CHIANG, Wei-Chi KU

LETTER-Fundamental Theories for Communications

Vol: E89-B, No: 2, Page(s): 594-597
In 2000, Sandirigama, Shimizu, and Noda proposed a simple password authentication scheme, SAS, which was later found to be flawed. Recently, Chen, Lee, and Horng proposed two SAS-like schemes that were claimed to be more secure than similar schemes. Herein, we show that both of their schemes are still vulnerable to denial-of-service attacks. Moreover, Chen-Lee-Horng's second scheme is not easily repairable.
A Convergence Study of the Discrete FGDLS Algorithm
Sabin TABIRCA, Tatiana TABIRCA, Laurence T. YANG

PAPER-Parallel/Distributed Algorithms

Vol: E89-D, No: 2, Page(s): 673-678
The Feedback-Guided Dynamic Loop Scheduling (FGDLS) algorithm [1] is a recent dynamic approach to scheduling a parallel loop within a sequential outer loop. Earlier papers analysed its convergence under the assumption that the workload is a positive, continuous function of a continuous argument (the iteration number). This assumption is unrealistic, however, because the iteration number is a discrete variable. In this paper, we extend the convergence proof to the case where the iteration number is treated as discrete, and we establish convergence of the FGDLS algorithm when the workload is monotonically decreasing.
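One feedback step of this kind of scheduler can be sketched as follows. The sketch assumes that the per-iteration workload is estimated as piecewise constant from the block timings of the previous outer iteration and that new block boundaries are rounded to integers, since iteration numbers are discrete; the function name and the non-empty-block rule are illustrative choices, not taken from the paper:

```python
from itertools import accumulate

def fgdls_step(bounds, times):
    """Repartition iterations 0..N-1 among p processors.

    bounds[0] = 0 < bounds[1] < ... < bounds[p] = N; times[j] is the measured
    time of block j (iterations bounds[j] .. bounds[j+1]-1) in the last outer
    iteration.
    """
    p = len(times)
    n = bounds[-1]
    # Piecewise-constant estimate: each block's time spread evenly over its iterations.
    load = []
    for j in range(p):
        size = bounds[j + 1] - bounds[j]
        load.extend([times[j] / size] * size)
    prefix = [0.0] + list(accumulate(load))  # prefix[b] = estimated load of iterations < b
    total = prefix[-1]
    new_bounds = [0]
    for k in range(1, p):
        target = k * total / p               # ideal cumulative load at boundary k
        lo = new_bounds[-1] + 1              # keep every block non-empty
        hi = n - (p - k)
        new_bounds.append(min(range(lo, hi + 1), key=lambda b: abs(prefix[b] - target)))
    new_bounds.append(n)
    return new_bounds
```

With a monotonically decreasing workload 12, 11, ..., 1 on three processors, starting from equal blocks `[0, 4, 8, 12]` (block times 42, 26, 10), one step gives `[0, 2, 6, 12]` and repeated steps settle at `[0, 2, 5, 12]` with block times 23, 27, 28.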
Partial Key Exposure Attacks on Unbalanced RSA with the CRT
Hee Jung LEE, Young-Ho PARK, Taekyoung KWON

LETTER-Information Security

Vol: E89-A, No: 2, Page(s): 626-629
In the RSA public-key cryptosystem, a small private key is often preferred for efficiency, but such a small key can degrade security. The Chinese Remainder Theorem (CRT) is therefore used, especially in time-critical applications such as smart cards. When the CRT is used in RSA, care must be taken to resist partial key exposure attacks. While it is common in RSA to choose two distinct primes of similar size, May has shown that in balanced RSA with the CRT, the composite modulus N can be factored if half of the least (or most) significant bits of a private key are revealed and the public key is small. However, when efficiency is more critical than security, as in smart cards, unbalanced primes might be chosen. We are therefore interested in partial key exposure attacks on unbalanced RSA with the CRT. In this paper, we obtain results similar to those for balanced RSA: we show that in unbalanced RSA, if the N^{1/4} least (or most) significant bits are revealed, a private key can be recovered in polynomial time when the public key is small.
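For readers unfamiliar with RSA-CRT, the sketch below shows where the CRT private exponents d_p = d mod (p-1) and d_q = d mod (q-1) enter decryption: two exponentiations modulo the individual primes, recombined with Garner's formula. It is a textbook illustration with toy parameters, not the attack from the paper, and must not be used as a real implementation (no padding, no side-channel protection):

```python
def rsa_crt_decrypt(c, p, q, d):
    """Compute c^d mod p*q using the Chinese Remainder Theorem."""
    dp = d % (p - 1)              # CRT exponent for p
    dq = d % (q - 1)              # CRT exponent for q
    q_inv = pow(q, -1, p)         # modular inverse of q mod p (Python 3.8+)
    mp = pow(c, dp, p)            # exponentiation modulo p
    mq = pow(c, dq, q)            # exponentiation modulo q
    h = (q_inv * (mp - mq)) % p   # Garner recombination
    return mq + h * q             # m = mq (mod q) and m = mp (mod p)

# Toy balanced example; unbalanced primes (very different sizes) work the same way.
p, q, e = 61, 53, 17
d = pow(e, -1, (p - 1) * (q - 1))
c = pow(65, e, p * q)
print(rsa_crt_decrypt(c, p, q, d))  # prints 65
```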
Controller/Precompiler for Portable Checkpointing
Gabriel RODRIGUEZ, María J. MARTIN, Patricia GONZALEZ, Juan TOURIÑO

PAPER-Parallel/Distributed Programming Models, Paradigms and Tools

Vol: E89-D, No: 2, Page(s): 408-417
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files, and portable code. It operates at the variable level and is user-directed, thus generating small checkpoint files. It allows parallel processes to checkpoint independently, without runtime coordination or message logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to reduce the user's effort. CPPC was designed for parallel MPI programs, but it can also be used with sequential ones and, owing to its highly modular design, easily extended to parallel programs written with other message-passing libraries. Experimental results obtained with CPPC on different test applications are presented.
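Restart-time negotiation can be illustrated with a minimal sketch. It assumes, hypothetically, that every checkpoint written at the same program point carries the same identifier on all processes and that identifiers increase over time; the restart point is then the most recent checkpoint held by every process. CPPC's actual protocol may differ:

```python
def negotiate_restart_point(available):
    """Return the most recent checkpoint id held by every process, or None.

    available maps a process rank to the checkpoint ids found in its files.
    Processes checkpoint independently, so some may hold newer checkpoints
    than others; only a checkpoint common to all gives a consistent restart.
    """
    common = None
    for ids in available.values():
        common = set(ids) if common is None else common & set(ids)
    return max(common) if common else None
```

For example, if rank 0 holds checkpoints 1, 2, 3, rank 1 holds 1, 2, and rank 2 holds 2, 3, all processes restart from checkpoint 2.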
An Algorithm for Node-to-Set Disjoint Paths Problem in Bi-Rotator Graphs
Keiichi KANEKO

PAPER-Parallel/Distributed Algorithms

Vol: E89-D, No: 2, Page(s): 647-653
An algorithm is described for solving the node-to-set disjoint paths problem in bi-rotator graphs, which are obtained by making each edge of a rotator graph bidirectional. The algorithm runs in time polynomial in n for an n-bi-rotator graph. It is based on recursion and is divided into three cases according to how the destination nodes are distributed among the classes into which the nodes of a bi-rotator graph are categorized. We estimated that it obtains 2n-3 disjoint paths with a time complexity of O(n^5), that the sum of the path lengths is O(n^3), and that the length of the maximum path is O(n^2). Computer experiments showed that the average execution time was O(n^{3.9}) and the average sum of the path lengths was O(n^{3.0}).
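The graph itself is easy to generate. In a rotator graph, a node is a permutation of n symbols and an edge rotates the first i symbols one position to the left (2 <= i <= n); making every edge bidirectional adds the right rotations. The sketch below lists a node's neighbors and shows why 2n-3 is the natural number of disjoint paths: for i = 2 the left and right rotations coincide, so every node has degree 2(n-1)-1 = 2n-3:

```python
def bi_rotator_neighbors(perm):
    """Neighbors of a node (a permutation tuple) in the n-bi-rotator graph."""
    perm = tuple(perm)
    n = len(perm)
    result = set()
    for i in range(2, n + 1):
        prefix, rest = perm[:i], perm[i:]
        result.add(prefix[1:] + prefix[:1] + rest)    # rotate first i symbols left
        result.add(prefix[-1:] + prefix[:-1] + rest)  # rotate first i symbols right
    return result
```

For example, node (0, 1, 2, 3) of the 4-bi-rotator graph has five neighbors: (1, 0, 2, 3), (1, 2, 0, 3), (2, 0, 1, 3), (1, 2, 3, 0) and (3, 0, 1, 2).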
Parity Placement Schemes to Facilitate Recovery from Triple Column Disk Failure in Disk Array Systems
Chih-Shing TAU, Tzone-I WANG

PAPER-Coding Theory

Vol: E89-A, No: 2, Page(s): 583-591
This paper presents two improved triple parity placement schemes, HDD1 (Horizontal and Dual Diagonal) and HDD2, to enhance the reliability of disk array systems. Both schemes can tolerate up to three column disk failures by using three types of parity information (horizontal, diagonal, and anti-diagonal parities) in a disk array. HDD1 reduces the frequency of bottlenecks because its horizontal and anti-diagonal parities are uniformly distributed over the disk array, while its diagonal parities are placed on dedicated column disks. HDD2 has one more column disk than HDD1 to store the horizontal parities and an additional diagonal parity; its anti-diagonal and diagonal parities are placed as in HDD1, with only a minor difference. The encoding and decoding algorithms of both schemes are simple and straightforward, and some of their steps can even be executed in parallel, which speeds up disk failure recovery.
A 385-500 GHz Low Noise Superconductor-Insulator-Superconductor Mixer for ALMA Band 8
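The three parity types can be illustrated with plain XOR arithmetic. The grouping rule below (diagonal group k holds cells with (r + c) mod cols = k, anti-diagonal group k holds cells with (r - c) mod cols = k) is an assumption chosen for illustration; HDD1 and HDD2 define their own diagonal groupings and parity placement, and this sketch only shows single-block recovery, not their three-failure decoding:

```python
from functools import reduce
from operator import xor

def triple_parities(data):
    """Horizontal, diagonal and anti-diagonal XOR parities of a rows x cols block array."""
    rows, cols = len(data), len(data[0])
    horizontal = [reduce(xor, row, 0) for row in data]
    diagonal = [0] * cols
    anti_diagonal = [0] * cols
    for r in range(rows):
        for c in range(cols):
            diagonal[(r + c) % cols] ^= data[r][c]
            anti_diagonal[(r - c) % cols] ^= data[r][c]
    return horizontal, diagonal, anti_diagonal

def recover_cell(row, lost_col, row_parity):
    """Rebuild one lost block from the surviving blocks of its row and the row parity."""
    return reduce(xor, (v for c, v in enumerate(row) if c != lost_col), row_parity)
```

Because each parity group is an XOR, any single lost block in a group is the XOR of the group's parity with its surviving members; multi-failure decoding chains such single-block recoveries across the three parity directions.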
Wenlei SHAN, Shinichiro ASAYAMA, Mamoru KAMIKURA, Takashi NOGUCHI, Shengcai SHI, Yutaro SEKIMOTO

PAPER

Vol: E89-C, No: 2, Page(s): 170-176
We report on the design and experimental results of a fixed-tuned Superconductor-Insulator-Superconductor (SIS) mixer for Atacama Large Millimeter/submillimeter Array (ALMA) band 8 (385-500 GHz) receivers. Nb-based SIS junctions with a current density of 10 kA/cm² and a size of one micrometer (fabricated with a two-step lift-off process) are employed to meet the ALMA receiver specification, which requires wide frequency coverage as well as low noise temperature. A parallel-connected twin-junction (PCTJ) is designed to resonate at the band center to tune out the junction geometric capacitance. A waveguide-microstrip probe is optimized to have a nearly frequency-independent impedance at its feed point, making it easy to match the low-impedance PCTJ over a wide frequency band. The RF embedding impedance is retrieved by fitting the measured pumped I-V curves, confirming good matching between the PCTJ and the signal source. We demonstrate a minimum double-sideband receiver noise temperature of three times the quantum limit for an intermediate-frequency range of 4-8 GHz. The mixers were also measured in a band 8 cartridge with a sideband-separation scheme, and single-sideband receiver noise below the ALMA specification was achieved over the whole band.
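The idea of resonating out the junction capacitance at the band center can be illustrated with a lumped LC model, f0 = 1 / (2π√(LC)). This is a simplification: in a real PCTJ the tuning element is distributed, and the 100 fF capacitance below is a hypothetical value, not one taken from the paper:

```python
import math

def tuning_inductance(f0_hz, c_farad):
    """Inductance that resonates with capacitance C at f0: L = 1 / ((2*pi*f0)^2 * C)."""
    return 1.0 / ((2 * math.pi * f0_hz) ** 2 * c_farad)

def resonant_frequency(l_henry, c_farad):
    """Lumped LC resonance: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henry * c_farad))

F_CENTER = (385e9 + 500e9) / 2  # arithmetic center of ALMA band 8: 442.5 GHz
C_JUNCTION = 100e-15            # hypothetical junction capacitance (100 fF)
L_TUNE = tuning_inductance(F_CENTER, C_JUNCTION)
print(f"L = {L_TUNE * 1e12:.2f} pH cancels {C_JUNCTION * 1e15:.0f} fF at {F_CENTER / 1e9:.1f} GHz")
```

Away from the center frequency the cancellation is only partial, which is why a wide band such as 385-500 GHz also requires the broadband matching from the probe described above.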
Independent Row-Oblique Parity for Double Disk Failure Correction
Chih-Shing TAU, Tzone-I WANG

PAPER-Coding Theory

Vol: E89-A, No: 2, Page(s): 592-599
This paper proposes a parity placement scheme, Row-Oblique Parity (ROP), for protecting against double disk failures in disk array systems. ROP stores all data unencoded and uses only exclusive-or (XOR) operations to compute parity. It is provably optimal in computational complexity during both construction and reconstruction, and it is optimal in the amount of redundant information stored and accessed. Its simplicity allowed us to implement it within the currently available RAID framework.
Novel Design of Microstrip Bandpass Filters with a Controllable Dual-Passband Response: Description and Implementation
Sheng SUN, Lei ZHU

PAPER-Microwaves, Millimeter-Waves

Vol: E89-C, No: 2, Page(s): 197-202
Novel microstrip dual-band bandpass filters with controllable fractional bandwidths and good in-between isolation are presented and implemented. First, a half-wavelength stepped-impedance resonator is characterized so as to produce two resonant frequencies at 2.4 and 5.2 GHz. Two types of coupled microstrip lines, in parallel and anti-parallel formats, are then investigated in terms of a unified equivalent J-inverter network. Extensive results are derived to show quantitatively their distinctive frequency-distributed coupling performance for different coupling lengths. The coupling degrees of these two coupled lines at the two resonances are adjusted to achieve a dual-passband response with varied or tunable bandwidths. In addition, the parallel coupled line is modeled to create a transmission zero between the two resonances, which provides good in-between isolation. Three two-stage bandpass filters are first designed to demonstrate dual-band responses with adjustable bandwidths. Finally, a three-stage dual-band filter is optimally designed, and its predicted performance is confirmed experimentally.
Mapping of Hierarchical Parallel Genetic Algorithms for Protein Folding onto Computational Grids
Weiguo LIU, Bertil SCHMIDT

PAPER-Grid Computing

Vol: E89-D, No: 2, Page(s): 589-596
Genetic algorithms are a general problem-solving technique that has been widely used in computational biology. In this paper, we present a framework for mapping hierarchical parallel genetic algorithms for protein folding problems onto computational grids. The framework separates the two levels of communication in hierarchical parallel genetic algorithms, so both parts of the algorithm can evolve independently. This lets users conveniently experiment with alternative communication models at different levels. The underlying technique is generic programming, which is suited to the generic representation of abstract concepts. This allows the framework to be built generically at the application level and thus provides good extensibility and flexibility. Experiments show that the framework can yield significant runtime savings on PC clusters and computational grids.
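The two communication levels of a hierarchical parallel genetic algorithm can be illustrated with a small island-model sketch: frequent ring migration among islands inside a group (for example, one PC cluster) and rare migration between groups (for example, grid sites). The OneMax fitness, the migration intervals and all parameter values are hypothetical stand-ins chosen for illustration; this is not the authors' framework or a protein-folding model:

```python
import random

def onemax(bits):
    # Toy fitness (number of 1 bits), standing in for a protein-folding score.
    return sum(bits)

def evolve(pop, rng, rate=0.05):
    """One generation: elitism, binary tournaments, one-point crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b
    nxt = [list(max(pop, key=onemax))]  # elitism: keep the best individual unchanged
    while len(nxt) < len(pop):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        nxt.append([bit ^ (rng.random() < rate) for bit in p1[:cut] + p2[cut:]])
    return nxt

def ring_migrate(islands):
    """Each island's best individual replaces the worst individual of the next island."""
    bests = [list(max(isl, key=onemax)) for isl in islands]
    for i, isl in enumerate(islands):
        worst = min(range(len(isl)), key=lambda k: onemax(isl[k]))
        isl[worst] = bests[i - 1]

def best_of(world):
    return max(onemax(ind) for group in world for isl in group for ind in isl)

def hierarchical_ga(groups=2, islands=3, size=10, length=32, gens=40,
                    inner_every=5, outer_every=20, seed=0):
    rng = random.Random(seed)
    world = [[[[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
              for _ in range(islands)] for _ in range(groups)]
    initial = best_of(world)
    for gen in range(1, gens + 1):
        world = [[evolve(isl, rng) for isl in group] for group in world]
        if gen % inner_every == 0:   # level 1: migration among islands of one group
            for group in world:
                ring_migrate(group)
        if gen % outer_every == 0:   # level 2: migration between groups via their first islands
            ring_migrate([group[0] for group in world])
    return initial, best_of(world)
```

Keeping the two migration policies in separate, independently configurable places mirrors the separation of communication levels described above, so either level's model can be changed without touching the other.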
Ultra Low Profile Dipole Antenna with a Simplified Feeding Structure and a Parasitic Element
Arpa THUMVICHIT, Tadashi TAKANO, Yukio KAMATA

PAPER-Antennas and Propagation

Vol: E89-B, No: 2, Page(s): 576-580
This study concerns a half-wave dipole placed at a distance much smaller than a quarter wavelength from a conductor plane, which we call an ultra low profile dipole (ULPD) antenna. The main concerns for a ULPD antenna are the feeding method and impedance matching, because the input impedance tends to be lowered by a nearby metallic structure. In this paper, we propose a ULPD antenna with excellent impedance matching and a coaxial feed built into the antenna structure, so that neither external matching nor a balun is required. A coaxial cable is used as the feed line and extended to form one half of the half-wavelength dipole. The other half is a parasitic element connected to the outer conductor of the coaxial radiator. To achieve matching, the outer conductor of the coaxial radiator is stripped off over a suitable length, and the total dipole length is chosen for resonance at the desired frequency of 2 GHz. Experiments show a return loss of -27 dB and a maximum gain of 9 dBi in the direction normal to the conductor plane. Computed results agree well with the experimental results.
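The scale of the design and the quality of the reported match can be checked with standard formulas: free-space wavelength λ = c/f, reflection coefficient magnitude |Γ| = 10^(-RL/20) for a return loss RL in dB, and VSWR = (1 + |Γ|)/(1 - |Γ|). The sketch uses free-space values only; the physical dipole length will differ somewhat from λ/2:

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s

def free_space_wavelength(freq_hz):
    return C0 / freq_hz

def reflection_from_return_loss(rl_db):
    """|Gamma| from a return loss in dB; the sign is ignored because
    both -27 dB and 27 dB conventions appear in practice."""
    return 10 ** (-abs(rl_db) / 20)

def vswr(gamma):
    return (1 + gamma) / (1 - gamma)

lam = free_space_wavelength(2e9)
print(f"wavelength {lam * 1e3:.1f} mm, half wave {lam / 2 * 1e3:.1f} mm, "
      f"quarter wave {lam / 4 * 1e3:.1f} mm")
g = reflection_from_return_loss(-27)
print(f"|Gamma| = {g:.4f}, VSWR = {vswr(g):.3f}, reflected power = {g * g:.2%}")
```

At 2 GHz the wavelength is about 150 mm, so "much smaller than a quarter wavelength" means a plane spacing well below about 37.5 mm; a -27 dB return loss corresponds to |Γ| of about 0.045, i.e., roughly 0.2% of the incident power reflected.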
Entropy Based Evaluation of Communication Predictability in Parallel Applications
Alex K. JONES, Jiang ZHENG, Ahmed AMER

PAPER-Performance Evaluation

Vol: E89-D, No: 2, Page(s): 469-478
The performance of parallel computing applications depends heavily on the efficiency of the underlying communication operations. Although often characterized as dynamic, these operations frequently exhibit spatial and temporal locality as well as structural regularity. These characteristics can be exploited to improve communication performance if a suitable prediction model is selected for the communication topology. In this paper, we describe an entropy-based methodology for quantifying and evaluating the success of different prediction models on actual workloads drawn from representative parallel benchmarks. We evaluate two prediction criteria and their combinations: (1) partitioning messages by source node and (2) using a first-order context model. We also describe a prediction threshold designed to largely avoid the overhead of incorrect predictions. Our results show that, even with simple prediction models and even on highly dynamic benchmark applications, predictability can be improved by several orders of magnitude. In fact, using simple prediction techniques, over 75% of the communication volume is accurately predictable.
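The entropy measure can be sketched on a message trace: lower entropy (in bits) of the destination stream means destinations are more predictable. The sketch compares the plain destination entropy, the entropy after partitioning by source node, and a first-order context model. Here the context is assumed to be the previous destination used by the same source, which may differ from the paper's exact definition:

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def conditional_entropy(pairs):
    """H(Y | X) in bits from a list of (x, y) observations."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    n = len(pairs)
    return sum(sum(cnt.values()) / n * entropy(cnt) for cnt in by_x.values())

def predictability_report(trace):
    """trace: (source, destination) messages in program order.
    Returns H(dest), H(dest | source), H(dest | source, previous dest of that source)."""
    h_plain = entropy(Counter(d for _, d in trace))
    h_by_source = conditional_entropy(list(trace))
    last, ctx_pairs = {}, []
    for s, d in trace:
        if s in last:
            ctx_pairs.append(((s, last[s]), d))
        last[s] = d
    h_context = conditional_entropy(ctx_pairs) if ctx_pairs else 0.0
    return h_plain, h_by_source, h_context
```

For a hypothetical trace in which source 0 alternates between destinations 1 and 2 while source 1 alternates between 3 and 0, the three entropies are 2, 1 and 0 bits: partitioning by source halves the uncertainty, and the context model makes every destination predictable.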
A Coarse-Grain Hierarchical Technique for 2-Dimensional FFT on Configurable Parallel Computers
Xizhen XU, Sotirios G. ZIAVRAS

PAPER-Parallel/Distributed Algorithms

Vol: E89-D, No: 2, Page(s): 639-646
FPGAs (Field-Programmable Gate Arrays) have been widely used as coprocessors to boost the performance of data-intensive applications [1],[2]. However, several challenges stand in the way of further boosting FPGA performance: the communication overhead between the host workstation and the FPGAs can be substantial; large-scale applications cannot fit in a single FPGA because of its limited capacity; and mapping an application algorithm onto FPGAs remains a daunting task in configurable system design. To address these problems, we propose the FPGA-based Hierarchical-SIMD (H-SIMD) machine, codesigned with the Pyramidal Instruction Set Architecture (PISA). PISA comprises high-level instructions, implemented as FPGA functions, for coarse-grain SIMD (Single-Instruction, Multiple-Data) tasks; these facilitate program development, code portability across different H-SIMD implementations, and high performance. We assume a multi-FPGA board on which each FPGA is configured as a separate SIMD machine. If needed, multiple FPGA chips can work in unison at a higher SIMD level under host control. Additionally, host-FPGA communication overhead can be hidden by using a memory switching scheme and by using the high-level PISA to partition applications into coarse-grain tasks. We use the two-dimensional Fast Fourier Transform (2D FFT) to test the effectiveness of H-SIMD. The test results show sustained high performance for this problem, and the H-SIMD machine even outperforms a Xeon processor on it.
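A 2D FFT lends itself to coarse-grain partitioning because it decomposes into independent 1D FFTs: one over every row, then one over every column. The sketch below shows this standard row-column decomposition with a recursive radix-2 FFT; it illustrates the decomposition only and makes no claim about how PISA instructions map it onto FPGAs:

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(a):
    """2D FFT by row-column decomposition.

    Every row FFT is independent of the others, and so is every column FFT,
    so each phase can be split into coarse-grain tasks across processing units.
    """
    rows = [fft(r) for r in a]                  # phase 1: 1D FFT of each row
    cols = [fft(list(c)) for c in zip(*rows)]   # phase 2: 1D FFT of each column
    return [list(r) for r in zip(*cols)]        # transpose back to row-major order
```

The transpose between the two phases is where data must move between units in a parallel implementation, which is one reason communication overhead matters for this benchmark.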
Toward Incremental Parallelization Using Navigational Programming
Lei PAN, Wenhui ZHANG, Arthur ASUNCION, Ming Kin LAI, Michael B. DILLENCOURT, Lubomir F. BIC, Laurence T. YANG

PAPER-Parallel/Distributed Programming Models, Paradigms and Tools

Vol: E89-D, No: 2, Page(s): 390-398
The Navigational Programming (NavP) methodology is based on the principle of self-migrating computations. It is a truly incremental methodology for developing parallel programs: each step yields a functioning program, and each intermediate program improves on its predecessor. The transformations are mechanical and straightforward to apply. We illustrate the methodology with matrix multiplication, showing how the transformations lead from a sequential program to a fully parallel one. The NavP methodology encourages new ways of thinking that ease programming and lead to high performance. Even though our parallel algorithm was derived through a sequence of mechanical transformations, it shows certain performance advantages over the classical handcrafted Gentleman's algorithm.
Lowering Error Floor of Irregular LDPC Codes by CRC and OSD Algorithm
Satoshi GOUNAI, Tomoaki OHTSUKI

PAPER-Fundamental Theories for Communications

Vol: E89-B, No: 1, Page(s): 1-10
Irregular Low-Density Parity-Check (LDPC) codes generally achieve better performance than regular LDPC codes at low Eb/N0 values, but they have higher error floors. Through the construction of the irregular LDPC code, one can trade off performance degradation in the low-Eb/N0 region against a lower error floor. A decoding algorithm that combines the Ordered Statistic Decoding (OSD) algorithm with the Log Likelihood Ratio-Belief Propagation (LLR-BP) decoding algorithm is known to achieve very good performance. Unfortunately, every codeword obtained by the OSD algorithm satisfies the parity check equation of the LDPC code, so that equation cannot be used to stop the decoding process, and wrong codewords that satisfy it raise the error floor. Moreover, once the LLR-BP decoding algorithm generates a codeword that satisfies the parity check equation, that codeword is taken as the final estimate and decoding halts without performing OSD. In this paper, we propose a new encoding/decoding scheme to lower the error floor of irregular LDPC codes. The proposed encoding scheme encodes information bits with a Cyclic Redundancy Check (CRC) and an LDPC code. The proposed decoding scheme, which consists of LLR-BP decoding, a CRC check, and OSD decoding, detects errors in the codewords obtained by the LLR-BP and OSD decoding algorithms using both the parity check equations of the LDPC code and the CRC. Computer simulations show that the proposed scheme lowers the error floor of irregular LDPC codes.
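The decision flow of such a scheme can be sketched as follows: a bitwise CRC, plus a control loop that accepts the LLR-BP result only if both the LDPC parity checks and the CRC pass, and otherwise searches OSD candidates for one that also passes the CRC. The decoders themselves are hypothetical caller-supplied functions (no real LDPC decoding is implemented here), and the sketch assumes a systematic code whose leading bits are the information bits followed by the CRC bits:

```python
def crc_remainder(bits, poly):
    """Remainder of bits * x^r modulo poly (MSB-first 0/1 lists, r = len(poly) - 1)."""
    r = len(poly) - 1
    work = list(bits) + [0] * r
    for i in range(len(bits)):
        if work[i]:
            for j, p in enumerate(poly):
                work[i + j] ^= p
    return work[-r:]

def crc_encode(info, poly):
    return list(info) + crc_remainder(info, poly)

def crc_ok(word, poly):
    r = len(poly) - 1
    return crc_remainder(word[:-r], poly) == list(word[-r:])

def decode(llr, bp_decode, osd_candidates, parity_ok, poly, k_crc):
    """k_crc: number of leading codeword bits covered by the CRC (info + CRC bits).
    bp_decode, osd_candidates and parity_ok are hypothetical placeholders."""
    word = bp_decode(llr)
    if parity_ok(word) and crc_ok(word[:k_crc], poly):
        return word, "bp"                    # accepted without running OSD
    candidates = [w for w in osd_candidates(llr) if parity_ok(w)]
    for w in candidates:                     # parity alone cannot reject OSD output,
        if crc_ok(w[:k_crc], poly):          # so the CRC decides among candidates
            return w, "osd"
    return (candidates[0] if candidates else word), "unverified"
```

The CRC is what distinguishes a wrong codeword that happens to satisfy the LDPC parity checks from the transmitted one, which is exactly the failure mode that raises the error floor.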
Japanese Dependency Structure Analysis Using Information about Multiple Pauses and F₀
Meirong LU, Kazuyuki TAKAGI, Kazuhiko OZEKI

PAPER-Speech and Hearing

Vol: E89-D, No: 1, Page(s): 298-304
Syntax and prosody are closely related. This paper addresses the problem of exploiting pause information to recover the dependency structures of read Japanese sentences. Our parser handles symbolic information, such as dependency rules, and numerical information, such as the probability of the dependency distance of a phrase, in a unified way as linguistic information. In our past work, the post-phrase pause, which immediately follows the phrase in question, was employed as prosodic information. In this paper, we employ two further kinds of pauses in addition to the post-phrase pause: the post-post-phrase pause, which immediately follows the phrase after the phrase in question, and the pre-phrase pause, which immediately precedes the phrase in question. By combining the three kinds of pause information linearly with optimal combination weights determined experimentally, parsing accuracy improved over the case where only the post-phrase pause was used, as in our previous work. A linear combination of pause and fundamental frequency (F₀) information yielded a further improvement in parsing accuracy.
