IEICE global.ieice.org Site

Author Search Result

[Author] Naoki NISHI (7 hits)

Throughput and Power Efficiency Evaluation of Block Ciphers on Kepler and GCN GPUs Using Micro-Benchmark Analysis
Naoki NISHIKAWA Keisuke IWAI Hidema TANAKA Takakazu KUROKAWA

PAPER-Fundamentals of Information Systems

Vol: E97-D No: 6
Page(s): 1506-1515
Computer systems with GPUs are expected to become a strong methodology for high-speed encryption processing. Moreover, power consumption has remained a primary deterrent for such processing on devices of all sizes. However, GPU vendors are currently announcing their future roadmaps of GPU architecture development: Nvidia Corp. promotes the Kepler architecture and AMD Corp. emphasizes the GCN architecture. Therefore, we evaluated the throughput and power efficiency of three 128-bit block ciphers on GPUs with the recent Nvidia Kepler and AMD GCN architectures. In our experiments, the throughput and per-watt throughput of AES-128 on the Radeon HD 7970 (2048 cores) with the GCN architecture are 205.0Gbps and 1.3Gbps/W, respectively, whereas those on the GeForce GTX 680 (1536 cores) with the Kepler architecture are 63.9Gbps and 0.43Gbps/W, respectively; an approximately 3.2 times throughput difference occurs between AES-128 on the two GPUs. Next, we investigate the reasons for the throughput difference using our micro-benchmark suites. According to the results, we speculate that to ameliorate Kepler GPUs as co-processors for block ciphers, the arithmetic and logical instructions must be improved in terms of software and hardware.
HiCrypt: A Specialized Translator for Symmetric Block Cipher and GPGPU
Keisuke IWAI Naoki NISHIKAWA Takakazu KUROKAWA

PAPER

Vol: E96-D No: 12
Page(s): 2575-2586
Many-core computer systems with GPUs are coming into mainstream use from high-end computing, including supercomputers, to embedded processors. Consequently, the implementation of cryptographic methods on GPGPU is also becoming popular because of such systems' performance. However, many factors affect the performance of GPUs. To cope with this problem, we developed a new translator, HiCrypt, which can generate an optimized GPGPU program written in both CUDA and OpenCL from a cipher program written in standard C language with directives. Users must annotate only variables and an encoding/decoding function, which are characteristic of cipher programs, with directives. To evaluate the translator, five representative cipher programs were translated into CUDA and OpenCL programs by the translator. The generated programs achieve high throughput almost identical to that of hand-optimized programs for all five cipher programs. HiCrypt will contribute to the development and evaluation of new and various symmetric block ciphers using GPGPU.
Four-Channel Receiver Optoelectronic Integrated Circuit Arrays for Optical Interconnections
Hideki HAYASHI Goro SASAKI Hiroshi YANO Naoki NISHIYAMA Michio MURATA

PAPER

Vol: E77-C No: 1
Page(s): 23-29
Ultrahigh-speed and low-crosstalk four-channel receiver optoelectronic integrated circuit (OEIC) arrays comprising GaInAs pin PDs and AlInAs/GaInAs HEMTs have been successfully fabricated on an InP substrate. These arrays were designed to have good crosstalk characteristics, which are the most critical issue in array devices. The resistive-load OEIC arrays exhibited high-speed operation up to 5 Gb/s and low crosstalk of less than -38 dB between all adjacent channels over the entire frequency range below 4.0 GHz. The average sensitivity of the resistive-load OEIC arrays was -18.5 dBm at 3 Gb/s for a bit-error-rate of 10⁻⁹ over four channels. Good uniformity of device characteristics was obtained over a 2-inch InP wafer. These results suggest that receiver OEIC arrays are quite promising for application to high-speed multi-channel optical interconnections.
Practical Performance and Prospect of Underwater Optical Wireless Communication: Results of Optical Characteristic Measurement at Visible Light Band under Water and Communication Tests with the Prototype Modem in the Sea (Open Access)
Takao SAWA Naoki NISHIMURA Koji TOJO Shin ITO

INVITED PAPER

Vol: E102-A No: 1
Page(s): 156-167
Underwater optical wireless communication was merely a theory for a long time because light sources were too weak to be used as emitters for communications. In the past decade, however, underwater optical wireless communications have used laser diodes or light emitting diodes as emitters of visible light with high brightness and low power consumption. Recently, they have become practical. In this paper, recent trends in underwater optical wireless communication studies, practical modems, and prospective uses of underwater optical wireless communication are presented first. Next, optical characteristics of seawater in various conditions are explained based on experimental data measured using a profiler for underwater optics produced especially for this study. Then the prototype underwater optical wireless communication modem developed by our team is introduced. It was tested in several sea areas, which confirmed bi-directional communication in the 120m range at 20Mbps and a remote desktop connection between underwater vehicles at 100m range. In addition, one modem was set in air and the other was set in water. The modems communicated with each other directly through the sea surface.
An Area-Effective Datapath Architecture for Embedded Microprocessors and Scalable Systems
Toshiaki INOUE Takashi MANABE Sunao TORII Satoshi MATSUSHITA Masato EDAHIRO Naoki NISHI Masakazu YAMASHINA

INVITED PAPER

Vol: E84-C No: 8
Page(s): 1014-1020
We have proposed area-reduction techniques for superscalar datapath architectures with 34 SIMD instructions and have developed an integer-media unit based on these techniques. The unit's design is both functionally asymmetrical and integer-SIMD unified, and the resulting savings in area are 27%-48% as compared to other, functionally equivalent mid-level microprocessor designs, with performance that is, at most, only 7.2% lower. Further, in 2-D IDCT processing, the unit outperforms embedded microprocessor designs without SIMD functions by 49%-118%. Specifically, effective area reduction of adders, shifters, and multiply-and-adders has been achieved by using the unified design. These area-effective techniques are useful for embedded microprocessors and scalable systems that employ highly parallel superscalar and on-chip parallel architectures. The integer-media unit has been implemented in an evaluation chip fabricated with 0.15-µm 5-metal CMOS technology.
Low-Temperature MBE Growth of a TlGaAs/GaAs Multiple Quantum-Well Structure
Naoki NISHIMOTO Nobuhiro KOBAYASHI Naoyuki KAWASAKI Yasuaki HIGUCHI Yasutomo KAJIKAWA

PAPER

Vol: E86-C No: 10
Page(s): 2082-2084
A TlGaAs/GaAs multiple quantum-well (MQW) structure having four identical well layers was grown on a GaAs (001) substrate by low-temperature molecular-beam epitaxy (MBE) at 190 °C. The (004) X-ray diffraction (XRD) curve of this sample showed satellite peaks up to the 3rd order at least. The measured XRD curve agreed well with the theoretically simulated one with a Tl content of x=7% and a width of 57 Å for the TlₓGa₁₋ₓAs well layers. This result indicates that the grown MQW structure has good single-crystalline quality as well as flat and sharp interfaces.
An Automatic Bi-Directional Bus Repeater Control Scheme Using Dynamic Collaborative Driving Techniques
Masahiro NOMURA Taku OHSAWA Koichi TAKEDA Yoetsu NAKAZAWA Yoshinori HIROTA Yasuhiko HAGIHARA Naoki NISHI

PAPER-Interface and Interconnect Techniques

Vol: E89-C No: 3
Page(s): 334-341
This paper describes a newly developed automatic direction control scheme for bi-directional bus repeaters that uses dynamic collaborative driving techniques. Repeater directions are rapidly determined by detecting the direction of control signal propagation through an additional control signal line that is driven by dynamic collaborative drivers. Application to an on-chip peripheral bus reduces control circuit transistor counts by about 75% and the number of control signal lines by about 50% without loss of speed. Experimental results for a 0.18-µm CMOS implementation indicate that the proposed scheme is four times faster than a conventional scheme with no bi-directional bus repeaters.

