Author Search Result

[Author] Naoki NISHI(7hit)

1-7hit
  • Throughput and Power Efficiency Evaluation of Block Ciphers on Kepler and GCN GPUs Using Micro-Benchmark Analysis

    Naoki NISHIKAWA  Keisuke IWAI  Hidema TANAKA  Takakazu KUROKAWA  

     
    PAPER-Fundamentals of Information Systems

    Vol: E97-D  No: 6  Page(s): 1506-1515

    Computer systems with GPUs are expected to become a strong methodology for high-speed encryption processing. Moreover, power consumption has remained a primary deterrent for such processing on devices of all sizes. Meanwhile, GPU vendors are currently announcing their future roadmaps of GPU architecture development: Nvidia Corp. promotes the Kepler architecture and AMD Corp. emphasizes the GCN architecture. Therefore, we evaluated the throughput and power efficiency of three 128-bit block ciphers on GPUs with the recent Nvidia Kepler and AMD GCN architectures. In our experiments, whereas the throughput and per-watt throughput of AES-128 on Radeon HD 7970 (2048 cores) with GCN architecture are 205.0Gbps and 1.3Gbps/Watt, respectively, those on GeForce GTX 680 (1536 cores) with Kepler architecture are 63.9Gbps and 0.43Gbps/Watt, respectively; an approximately 3.2 times throughput difference occurs between AES-128 on the two GPUs. Next, we investigate the reasons for the throughput difference using our micro-benchmark suites. According to the results, we speculate that to improve Kepler GPUs as co-processors for block ciphers, the arithmetic and logical instructions must be improved in terms of both software and hardware.

  • HiCrypt: A Specialized Translator for Symmetric Block Cipher and GPGPU

    Keisuke IWAI  Naoki NISHIKAWA  Takakazu KUROKAWA  

     
    PAPER

    Vol: E96-D  No: 12  Page(s): 2575-2586

    Many-core computer systems with GPUs are coming into mainstream use from high-end computing, including supercomputers, to embedded processors. Consequently, the implementation of cryptographic methods on GPGPU is also becoming popular because of such systems' performance. However, many factors affect the performance of GPUs. To cope with this problem, we developed a new translator, HiCrypt, which can generate an optimized GPGPU program written in both CUDA and OpenCL from a cipher program written in standard C language with directives. Users must annotate only variables and an encoding/decoding function, which are characteristics of cipher programs, with directives. To evaluate the translator, five representative cipher programs are translated into CUDA and OpenCL programs by the translator. The generated programs achieve high throughput almost identical to that of hand-optimized programs for all five cipher programs. HiCrypt will contribute to the development and evaluation of new and various symmetric block ciphers using GPGPU.

  • Four-Channel Receiver Optoelectronic Integrated Circuit Arrays for Optical Interconnections

    Hideki HAYASHI  Goro SASAKI  Hiroshi YANO  Naoki NISHIYAMA  Michio MURATA  

     
    PAPER

    Vol: E77-C  No: 1  Page(s): 23-29

    Ultrahigh-speed and low-crosstalk four-channel receiver optoelectronic integrated circuit (OEIC) arrays comprising GaInAs pin PDs and AlInAs/GaInAs HEMTs have been successfully fabricated on an InP substrate. These arrays were designed to have good crosstalk characteristics, which are the most critical issue in array devices. The resistive-load OEIC arrays exhibited high-speed operation up to 5 Gb/s and low crosstalk of less than -38 dB between all adjacent channels over the entire frequency range below 4.0 GHz. The average sensitivity of the resistive-load OEIC arrays was -18.5 dBm at 3 Gb/s for a bit-error-rate of 10^-9 over four channels. Good uniformity of device characteristics was obtained over a 2-inch InP wafer. These results suggest that receiver OEIC arrays are quite promising for application to high-speed multi-channel optical interconnections.

  • Practical Performance and Prospect of Underwater Optical Wireless Communication: Results of Optical Characteristic Measurement at Visible Light Band under Water and Communication Tests with the Prototype Modem in the Sea (Open Access)

    Takao SAWA  Naoki NISHIMURA  Koji TOJO  Shin ITO  

     
    INVITED PAPER

    Vol: E102-A  No: 1  Page(s): 156-167

    Underwater optical wireless communication was merely a theory for a long time because light sources were too weak to serve as emitters for communications. In the past decade, however, underwater optical wireless communications have used laser diodes or light emitting diodes as emitters of visible light with high brightness and low power consumption. Recently, they have become practical. In this paper, recent trends in underwater optical wireless communication research, practical modems, and prospective uses of underwater optical wireless communication are presented first. Next, optical characteristics of seawater under various conditions are explained based on experimental data measured using a profiler for underwater optics produced especially for this study. Then the prototype underwater optical wireless communication modem developed by our team is introduced. It was tested in several sea areas, which confirmed bi-directional communication in the 120m range at 20Mbps and a remote desktop connection between underwater vehicles at 100m range. In addition, one modem was placed in air and the other in water. The modems communicated with each other directly through the sea surface.

  • An Area-Effective Datapath Architecture for Embedded Microprocessors and Scalable Systems

    Toshiaki INOUE  Takashi MANABE  Sunao TORII  Satoshi MATSUSHITA  Masato EDAHIRO  Naoki NISHI  Masakazu YAMASHINA  

     
    INVITED PAPER

    Vol: E84-C  No: 8  Page(s): 1014-1020

    We have proposed area-reduction techniques for superscalar datapath architectures with 34 SIMD instructions and have developed an integer-media unit based on these techniques. The unit's design is both functionally asymmetrical and integer-SIMD unified, and the resulting savings in area are 27%-48% as compared to other, functionally equivalent mid-level microprocessor designs, with performance that is, at most, only 7.2% lower. Further, in 2-D IDCT processing, the unit outperforms embedded microprocessor designs without SIMD functions by 49%-118%. Specifically, effective area reduction of adders, shifters, and multiply-and-adders has been achieved by using the unified design. These area-effective techniques are useful for embedded microprocessors and scalable systems that employ highly parallel superscalar and on-chip parallel architectures. The integer-media unit has been implemented in an evaluation chip fabricated with 0.15-µm 5-metal CMOS technology.

  • Low-Temperature MBE Growth of a TlGaAs/GaAs Multiple Quantum-Well Structure

    Naoki NISHIMOTO  Nobuhiro KOBAYASHI  Naoyuki KAWASAKI  Yasuaki HIGUCHI  Yasutomo KAJIKAWA  

     
    PAPER

    Vol: E86-C  No: 10  Page(s): 2082-2084

    A TlGaAs/GaAs multiple quantum-well (MQW) structure having four identical well layers was grown on a GaAs (001) substrate by low-temperature molecular-beam epitaxy (MBE) at 190 °C. The (004) X-ray diffraction (XRD) curve of this sample showed satellite peaks up to at least the 3rd order. The measured XRD curve agreed well with the theoretically simulated one with a Tl content of x=7% and a width of 57 Å for the TlxGa1-xAs well layers. This result indicates that the grown MQW structure has good single-crystalline quality as well as flat and sharp interfaces.

  • An Automatic Bi-Directional Bus Repeater Control Scheme Using Dynamic Collaborative Driving Techniques

    Masahiro NOMURA  Taku OHSAWA  Koichi TAKEDA  Yoetsu NAKAZAWA  Yoshinori HIROTA  Yasuhiko HAGIHARA  Naoki NISHI  

     
    PAPER-Interface and Interconnect Techniques

    Vol: E89-C  No: 3  Page(s): 334-341

    This paper describes a newly developed automatic direction control scheme for bi-directional bus repeaters that uses dynamic collaborative driving techniques. Repeater directions are rapidly determined by detecting the direction of control signal propagation through an additional control signal line that is driven by dynamic collaborative drivers. Application to an on-chip peripheral bus reduces control circuit transistor counts by about 75% and the number of control signal lines by about 50% without loss of speed. Experimental results for a 0.18-µm CMOS implementation indicate that the proposed scheme is four times faster than a conventional scheme with no bi-directional bus repeaters.
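
The AES-128 comparison quoted in the first abstract above (Radeon HD 7970 versus GeForce GTX 680) can be checked with a few lines of arithmetic. The sketch below uses only the four figures stated in that abstract; the variable names are illustrative, and the "implied power" values are a rough derivation (throughput divided by per-watt throughput), not measurements reported in the abstract.

```python
# Figures quoted in the abstract for AES-128 (Gbps and Gbps/Watt).
radeon_gbps, radeon_gbps_per_watt = 205.0, 1.3     # Radeon HD 7970 (GCN)
geforce_gbps, geforce_gbps_per_watt = 63.9, 0.43   # GeForce GTX 680 (Kepler)

# Ratio of raw throughput: the abstract states "approximately 3.2 times".
throughput_ratio = radeon_gbps / geforce_gbps

# Ratio of power efficiency (per-watt throughput).
efficiency_ratio = radeon_gbps_per_watt / geforce_gbps_per_watt

# Assumption: power draw implied by the two quoted figures.
# This is derived here for illustration, not reported in the abstract.
radeon_implied_watts = radeon_gbps / radeon_gbps_per_watt
geforce_implied_watts = geforce_gbps / geforce_gbps_per_watt

print(f"throughput ratio: {throughput_ratio:.2f}")
print(f"efficiency ratio: {efficiency_ratio:.2f}")
print(f"implied power (W): {radeon_implied_watts:.0f} vs {geforce_implied_watts:.0f}")
```

The two ratios come out close to each other, which is consistent with the throughput gap, rather than a difference in power draw, accounting for most of the efficiency gap in the quoted figures.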