
Keyword Search Results

[Keyword] SPM (11 hits)

1-11 of 11 hits
  • Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV

    Siyi HU  Makiko ITO  Takahide YOSHIKAWA  Yuan HE  Hiroshi NAKAMURA  Masaaki KONDO  

     
    PAPER

      Publicized:
    2023/07/20
      Vol:
    E106-D No:12
      Page(s):
    2015-2025

    Widely adopted by machine learning and graph processing applications nowadays, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparse storage formats such as CSR or CSC, SpMV kernels still suffer from limited memory bandwidth during data transfers because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and the floating-point data used in SpMV kernels are handled as-is, without any pre-processing. We therefore believe that bandwidth conservation techniques, such as data compression, can dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we also observe that the convergence conditions of some typical scientific computation benchmarks (based on SpMV kernels) are not degraded when lower-precision floating-point data is adopted. Based on these findings, we propose a simple yet effective data compression scheme that can readily be extended to general-purpose computing architectures or HPC systems. When adopted, it achieves a best-case speedup of 1.92x. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of the final results.
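
    As background for the CSR storage format mentioned above, here is a minimal sketch of an SpMV kernel over a CSR-stored matrix in Python; the matrix, array names, and values are illustrative and not taken from the paper.

        # Minimal CSR SpMV sketch: y = A @ x with A stored as (values, col_idx, row_ptr).
        def spmv_csr(values, col_idx, row_ptr, x):
            n_rows = len(row_ptr) - 1
            y = [0.0] * n_rows
            for i in range(n_rows):
                # Non-zeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    y[i] += values[k] * x[col_idx[k]]
            return y

        # A = [[5, 0, 0],
        #      [0, 0, 8],
        #      [3, 6, 0]]
        values = [5.0, 8.0, 3.0, 6.0]
        col_idx = [0, 2, 0, 1]
        row_ptr = [0, 1, 2, 4]
        print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # -> [5.0, 24.0, 15.0]

    Both the integer index arrays (col_idx, row_ptr) and the floating-point values stream through the memory hierarchy on every call, which is why compressing both data types matters for bandwidth.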

  • AuGe-Alloy Source and Drain Formation by the Lift-Off Process for the Scaling of Bottom-Contact Type Pentacene-Based OFETs

    Shun-ichiro OHMI  Mizuha HIROKI  Yasutaka MAEDA  

     
    PAPER

      Vol:
    E102-C No:2
      Page(s):
    138-142

    The AuGe-alloy source and drain (S/D) formed on SiO2/Si(100) by a lithography process was investigated for the scaling of organic field-effect transistors (OFETs) with bottom-contact geometry. The S/D was fabricated by a lift-off process using OFPR resist. OFETs with a minimum channel length of 2.4 µm were successfully fabricated by the lift-off process. The fabrication yield of the Au S/D was 57%, while it increased to 93% and 100% in the case of the Au-1%Ge and Au-7.4%Ge S/D, respectively. Although the mobility of the OFETs with the Au-7.4%Ge S/D decreased to 1.1×10-3 cm2/(Vs), it could be increased to 5.5×10-2 cm2/(Vs) by surface cleaning with an H2SO4/H2O2 mixture solution (SPM) and post-metallization annealing (PMA) after the lift-off process, which is higher than that of the OFET with the Au S/D.
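
    For reference, field-effect mobility values like those quoted above are commonly extracted from the standard saturation-region transistor relation shown below; this is the textbook expression, and the paper's exact extraction conditions are given in the full text.

        I_{D,\mathrm{sat}} = \frac{W C_i \mu}{2L}\,(V_{GS} - V_{th})^2
        \quad\Rightarrow\quad
        \mu = \frac{2L}{W C_i}\left(\frac{\partial \sqrt{I_D}}{\partial V_{GS}}\right)^2

    where W and L are the channel width and length and C_i is the gate-insulator capacitance per unit area.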

  • A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats

    Hang CUI  Shoichi HIRASAWA  Hiroaki KOBAYASHI  Hiroyuki TAKIZAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/06/13
      Vol:
    E101-D No:9
      Page(s):
    2307-2314

    Sparse matrix-vector multiplication (SpMV) is a computational kernel widely used in many applications. Because of its importance, many different implementations have been proposed to accelerate this computational kernel. The performance characteristics of those SpMV implementations are quite different, and it is difficult to select the best-performing implementation for a given sparse matrix without performance profiling. One existing approach to the SpMV best-code selection problem uses manually-predefined features and a machine learning model for the selection. However, it is generally hard to manually define features that perfectly express the characteristics of the original sparse matrix necessary for the code selection, and some information is lost in the process. This paper hence presents an effective deep learning mechanism for selecting the SpMV code best suited to a given sparse matrix. Instead of using manually-predefined features of a sparse matrix, a feature image and a deep learning network are used to map each sparse matrix to the implementation expected to have the best performance, in advance of execution. The benefits of the proposed mechanism are discussed by measuring the prediction accuracy and the performance. According to the evaluation, the proposed mechanism selects an optimal or suboptimal implementation for an unseen sparse matrix in the test data set in most cases. These results demonstrate that, by using deep learning, a whole sparse matrix can be used to predict the best implementation, and the prediction accuracy achieved by the proposed mechanism is higher than that of using predefined features.
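
    One simple way to turn a sparse matrix into a fixed-size input image, in the spirit of the feature-image idea above, is a block-wise non-zero density map; the sketch below assumes a 64x64 resolution and max-normalization, which are illustrative choices rather than the paper's exact recipe.

        import numpy as np

        def density_image(rows, cols, shape, res=64):
            """Map a sparse matrix (given by its non-zero coordinates) to a
            res x res density image; each pixel counts the non-zeros falling
            into the corresponding block of the matrix."""
            img = np.zeros((res, res), dtype=np.float32)
            r = (np.asarray(rows) * res // shape[0]).clip(0, res - 1)
            c = (np.asarray(cols) * res // shape[1]).clip(0, res - 1)
            np.add.at(img, (r, c), 1.0)
            return img / img.max() if img.max() > 0 else img

        # Illustrative 1000x1000 matrix with a superdiagonal non-zero pattern.
        rows = np.arange(999)
        cols = rows + 1
        print(density_image(rows, cols, (1000, 1000)).shape)  # (64, 64)

    Such an image can then be fed to a small convolutional network whose output is the index of the SpMV implementation to run.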

  • Codebook Learning for Image Recognition Based on Parallel Key SIFT Analysis

    Feng YANG  Zheng MA  Mei XIE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2017/01/10
      Vol:
    E100-D No:4
      Page(s):
    927-930

    The quality of the codebook is very important in visual image classification. In order to boost classification performance, a scheme of codebook generation for scene image recognition based on parallel key SIFT analysis (PKSA) is presented in this paper. The method iteratively applies the classical k-means clustering algorithm and similarity analysis to evaluate key SIFT descriptors (KSDs) from the input images, and generates the codebook with a relaxed k-means algorithm over the set of KSDs. To evaluate the performance of the PKSA scheme, image feature vectors are computed by sparse coding with Spatial Pyramid Matching (ScSPM) after the codebook is constructed. The PKSA-based ScSPM method is tested and compared on three public scene image datasets. The experimental results show that the proposed PKSA scheme can significantly reduce computation time and enhance the categorization rate.
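
    For context, the baseline codebook step that PKSA refines is k-means clustering over local descriptors; the sketch below shows only that generic step, with assumed sizes (64 visual words, 128-D SIFT-like descriptors) and synthetic data rather than the paper's key-descriptor selection.

        import numpy as np

        def build_codebook(descriptors, k=64, iters=20, seed=0):
            """Baseline k-means codebook: cluster local descriptors (e.g. 128-D
            SIFT) and return the k cluster centers as visual words."""
            rng = np.random.default_rng(seed)
            centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
            for _ in range(iters):
                # Squared distances via ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c
                d2 = ((descriptors ** 2).sum(1)[:, None]
                      + (centers ** 2).sum(1)[None, :]
                      - 2.0 * descriptors @ centers.T)
                labels = d2.argmin(axis=1)
                for j in range(k):  # recompute centers; skip clusters that go empty
                    if np.any(labels == j):
                        centers[j] = descriptors[labels == j].mean(axis=0)
            return centers

        desc = np.random.default_rng(1).random((2000, 128)).astype(np.float32)
        print(build_codebook(desc).shape)  # (64, 128)

    In the ScSPM pipeline, each image is then encoded against these visual words and pooled over a spatial pyramid to form the final feature vector.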

  • Quantization Error Improvement for Optical Quantization Using Dual Rail Configuration

    Tomotaka NAGASHIMA  Makoto HASEGAWA  Takuya MURAKAWA  Tsuyoshi KONISHI  

     
    PAPER-Optical A/D Conversion

      Vol:
    E98-C No:8
      Page(s):
    808-815

    We investigate a quantization error improvement technique using a dual-rail configuration for optical quantization. Our proposed optical quantization uses intensity-to-wavelength conversion based on soliton self-frequency shift and spectral compression based on self-phase modulation. However, some unfavorable input peak power regions exist due to stagnation of the wavelength shift or distortion of the spectral compression. These phenomena can induce a serious quantization error and degrade the effective number of bits (ENOB). In this work, we propose a quantization error improvement technique which compensates for the unfavorable input peak power regions. We experimentally verify the quantization error improvement achieved by the proposed technique in 6-bit optical quantization. The estimated ENOB is improved from 5.35 bits to 5.66 bits. In addition, we examine the influence of cross-phase modulation (XPM) between counter-propagating pulses at a high sampling rate. Experimental results and numerical simulation show that the XPM influence is negligible under ∼40 GS/s conditions.

  • Parallel Use of Dispersion Devices for Resolution Improvement of Optical Quantization at High Sampling Rate

    Tomotaka NAGASHIMA  Takema SATOH  Petre CATALIN  Kazuyoshi ITOH  Tsuyoshi KONISHI  

     
    PAPER

      Vol:
    E97-C No:7
      Page(s):
    787-794

    We investigate resolution improvement in optical quantization while keeping high sampling-rate performance in optical sampling. Since our optical quantization approach uses power-to-wavelength conversion based on soliton self-frequency shift, spectral compression can improve resolution in exchange for sampling-rate degradation. In this work, we propose a different approach to resolution improvement, the parallel use of dispersion devices, so as to avoid sampling-rate degradation. Additional use of different dispersion devices can assist the wavelength separation ability of the original dispersion device. We demonstrate the principle of resolution improvement in 3-bit optical quantization. Simulation results based on an experimental evaluation of the 3-bit optical quantization system show that 4-bit optical quantization is achieved by the parallel use of dispersion devices in the 3-bit system. The maximum differential non-linearity (DNL) and integral non-linearity (INL) are 0.49 least significant bit (LSB) and 0.50 LSB, respectively. The effective number of bits (ENOB) is estimated to be 3.62 bits.
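
    For reference, the converter figures of merit quoted above are commonly related as follows (standard ADC definitions; the paper may estimate ENOB by a different procedure, e.g. from measured transfer curves):

        \mathrm{ENOB} = \frac{\mathrm{SINAD_{dB}} - 1.76}{6.02}, \qquad
        \mathrm{DNL}(k) = \frac{W(k) - \Delta}{\Delta}, \qquad
        \mathrm{INL}(k) = \sum_{j \le k} \mathrm{DNL}(j)

    where W(k) is the measured width of code k and \Delta is the ideal step of 1 LSB.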

  • Dynamic Allocation of SPM Based on Time-Slotted Cache Conflict Graph for System Optimization

    Jianping WU  Ming LING  Yang ZHANG  Chen MEI  Huan WANG  

     
    PAPER-Computer System

      Vol:
    E95-D No:8
      Page(s):
    2039-2052

    This paper proposes a novel dynamic Scratch-Pad Memory (SPM) allocation strategy to optimize the energy consumption of the memory sub-system. First, the whole program execution is sliced into several time slots along the temporal dimension; thereafter, a Time-Slotted Cache Conflict Graph (TSCCG) is introduced to model the behavior of Data Cache (D-Cache) conflicts within each time slot. Then, Integer Nonlinear Programming (INP), which avoids a time-consuming linearization process, is applied to select the most profitable data pages. The Virtual Memory System (VMS) is adopted to remap the data pages that cause severe cache conflicts within a time slot to the SPM. In order to minimize the swapping overhead of dynamic SPM allocation, a novel SPM controller with a tightly coupled DMA is introduced to issue the swapping operations without the CPU's intervention. Last but not least, this paper quantitatively discusses the fluctuation of the system energy profit with different MMU page sizes and time-slot durations. According to our design-space exploration, the proposed method can optimize all of the data segments, including global data, heap, and stack data, and reduces the total energy consumption by 27.28% on average, up to 55.22%, with a marginal performance improvement. Compared to the conventional static CCG (Cache Conflict Graph), our approach obtains a 24.7% energy profit on average, up to 30.5%, with a slight boost in performance.
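
    The page-selection step above is formulated in the paper as an INP problem; purely as an illustration, the sketch below replaces it with a greedy proxy that ranks data pages by their total conflict weight within one time slot (the page names, weights, and SPM capacity are made up).

        def select_pages_for_spm(conflict_edges, spm_capacity_pages):
            # Score each data page by the total conflict weight it participates
            # in within the current time slot, then keep the top-ranked pages.
            score = {}
            for (a, b), w in conflict_edges.items():
                score[a] = score.get(a, 0) + w
                score[b] = score.get(b, 0) + w
            ranked = sorted(score, key=score.get, reverse=True)
            return ranked[:spm_capacity_pages]

        # Illustrative per-slot conflict graph: edge weight ~ estimated conflict misses.
        edges = {("p0", "p3"): 120, ("p0", "p7"): 45, ("p3", "p7"): 200, ("p1", "p2"): 10}
        print(select_pages_for_spm(edges, spm_capacity_pages=2))  # -> ['p3', 'p7']

    The selected pages would then be remapped to the SPM through the VMS, with the DMA-equipped SPM controller handling the swap at the slot boundary.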

  • Performance Analysis of Coherent Ultrashort Light Pulse CDMA Communication Systems with Nonlinear Optical Thresholder

    Yasutaka IGARASHI  Hiroyuki YASHIMA  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E89-B No:4
      Page(s):
    1205-1213

    We theoretically analyze the performance of coherent ultrashort light pulse code-division multiple-access (CDMA) communication systems with a nonlinear optical thresholder. Coherent ultrashort light pulse CDMA is a promising system for an optical local area network (LAN) due to its advantages of asynchronous transmission, high information security, multiple-access capability, and optical processing. The nonlinear optical thresholder is based on frequency chirping induced by self-phase modulation (SPM) in optical fiber, and discriminates an ultrashort pulse from multiple access interference (MAI) of picosecond duration. The numerical results show that the thermal noise generated in the photodetector dominates the bit error rate (BER). The BER decreases as the fiber length in the nonlinear thresholder and the photocurrent difference in the photodetector increase. Using the nonlinear optical thresholder allows the response time of the photodetector to be at least 100 times the duration of the ultrashort pulses. We also show that the optimum cut-off frequency of the nonlinear thresholder for achieving the minimum BER increases with the fiber length, the total number of users, and the load resistance of the photodetector.
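
    For orientation, receiver-limited BER figures of this kind are usually quoted through the standard Gaussian-noise approximation below; this is the textbook relation, not the paper's exact analysis, which accounts for the specific thermal-noise and MAI statistics of the thresholded receiver.

        \mathrm{BER} = \tfrac{1}{2}\,\mathrm{erfc}\!\left(\frac{Q}{\sqrt{2}}\right), \qquad
        Q = \frac{I_1 - I_0}{\sigma_1 + \sigma_0}, \qquad
        \sigma_{\mathrm{th}}^2 = \frac{4 k_B T B}{R_L}

    where I_1 - I_0 is the photocurrent difference, \sigma_0 and \sigma_1 are the noise standard deviations of the two received levels, B is the receiver bandwidth, and R_L is the load resistance.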

  • Estimation of Multiple Coherent Source Locations by Using SPM Method Combined with Signal Subspace Fitting Technique

    Yuzo YOSHIMOTO  Kazumasa TAIRA  Kunio SAWAYA  Risaburo SATO  

     
    PAPER-Measurements

      Vol:
    E88-B No:8
      Page(s):
    3164-3169

    A visualization method for coherent source locations based on the Sampled Pattern Matching (SPM) method is described. A modified SPM method is proposed to improve the S/N, in which the measurement of the electric field distribution is repeated over an appropriate time duration and eigenvalue decomposition of the covariance matrix is introduced. A combination of the modified SPM method with the Weighted Subspace Fitting (WSF) method is also proposed to estimate accurate source locations. A calibration technique using a reference antenna to compensate for the complex pattern of the receiving antenna is proposed. Experiments estimating the source locations of one dipole antenna and of two dipole antennas demonstrate the validity of the proposed method.
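
    The covariance and eigendecomposition step mentioned above is common to subspace methods such as WSF; the sketch below shows only that generic step on synthetic snapshots (the array size, source model, and noise level are assumptions, and fully coherent sources need the additional processing addressed in the paper).

        import numpy as np

        def signal_subspace(snapshots, n_sources):
            """snapshots: (n_sensors, n_snapshots) complex field samples.
            Returns the dominant eigenvectors spanning the signal subspace."""
            R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance
            eigval, eigvec = np.linalg.eigh(R)                       # ascending eigenvalues
            return eigvec[:, -n_sources:]

        rng = np.random.default_rng(0)
        n_sensors, n_snap = 8, 200
        a1 = np.exp(1j * np.pi * 0.3 * np.arange(n_sensors))  # illustrative steering vectors
        a2 = np.exp(1j * np.pi * 0.7 * np.arange(n_sensors))
        s = rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))
        noise = 0.1 * (rng.standard_normal((n_sensors, n_snap))
                       + 1j * rng.standard_normal((n_sensors, n_snap)))
        X = np.outer(a1, s[0]) + np.outer(a2, s[1]) + noise
        print(signal_subspace(X, n_sources=2).shape)  # (8, 2)

    The subspace fitting step then searches for the source locations whose model best matches this estimated signal subspace.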

  • Timing Jitter Characteristics of RZ Pulse Nonlinear Transmission on Dispersion Managed Fiber Link

    Kazuho ANDO  Masanori HANAWA  Mikio TAKAHARA  

     
    PAPER-Communication Systems

      Vol:
    E82-A No:10
      Page(s):
    2081-2088

    One of the limiting factors on the achievable distance of long-haul nonlinear Return-to-Zero (RZ)-Gaussian pulse transmission over optical fiber links is timing jitter. Although it is well known that the dispersion management technique is very effective in reducing timing jitter, comparisons among dispersion management methods in terms of timing jitter reduction have not been reported yet. In this paper, timing jitter reduction by several dispersion management methods in nonlinear RZ-Gaussian pulse transmission systems is discussed. Moreover, we report that the amount of timing jitter at the receiver side changes drastically depending on the configuration of the dispersion-managed optical fiber transmission line.

  • CNV Based Intermedia Synchronization Mechanism under High Speed Communication Environment

    Chan-Hyun YOUN  Yoshiaki NEMOTO  Shoichi NOGUCHI  

     
    PAPER-Communication Networks and Service

      Vol:
    E76-B No:6
      Page(s):
    634-645

    In this paper, we discuss intermedia synchronization problems for high-speed multimedia communication. In particular, we describe how software synchronization can be operated and estimate the skew bound in CNV when network delay is considered. We then apply CNV to intermedia synchronization and propose a hybrid model (HSM). Furthermore, we use a statistical approach to evaluate the performance of the synchronization mechanisms. The results of the performance evaluation show that HSM performs well in terms of the probability of estimation error.