
Keyword Search Result

[Keyword] RP(993hit)

1-20hit(993hit)

  • Deep Learning-Inspired Automatic Minutiae Extraction from Semi-Automated Annotations Open Access

    Hongtian ZHAO  Hua YANG  Shibao ZHENG  

     
    PAPER-Vision

      Publicized:
    2024/04/05
      Vol:
    E107-A No:9
      Page(s):
    1509-1521

    Minutiae pattern extraction plays a crucial role in fingerprint registration and identification for electronic applications. However, the extraction accuracy is seriously compromised by the presence of contaminated ridge lines and complex background scenarios. General image processing-based methods, which rely on many prior hypotheses, fail to effectively handle minutiae extraction in complex scenarios. Previous works have shown that CNN-based methods can perform well in object detection tasks. However, deep neural network (DNN)-based methods are restricted by the scarcity of public labeled datasets due to legitimate privacy concerns. To address these challenges comprehensively, this paper presents a fully automated minutiae extraction method leveraging DNNs. Firstly, we create a fingerprint minutiae dataset using a semi-automated minutiae annotation algorithm. Subsequently, we propose a minutiae extraction model based on Residual Networks (ResNet) that enables end-to-end prediction of minutiae. Moreover, we introduce a novel non-maximum suppression (NMS) procedure, guided by the Generalized Intersection over Union (GIoU) metric, during the inference phase to effectively handle outliers. Experimental evaluations conducted on the NIST SD4 and FVC 2004 databases demonstrate the superiority of the proposed method over existing state-of-the-art minutiae extraction approaches.
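
To make the GIoU-guided suppression step concrete, the following is a minimal sketch of greedy NMS that uses GIoU as the overlap measure. It is an illustration only, not the authors' procedure: it assumes each candidate is an axis-aligned box `(x1, y1, x2, y2)` with a confidence score, and the names `giou`, `giou_nms`, and the threshold value are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed x1 < x2 and y1 < y2


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the fraction of the enclosing box not covered by the union."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection area (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0 or hull <= 0:
        return 0.0  # degenerate boxes: treat as non-overlapping
    return inter / union - (hull - union) / hull


def giou_nms(boxes: List[Box], scores: List[float], threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep a box only if its GIoU with every kept box is at most `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(giou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
confidences = [0.9, 0.8, 0.7]
kept = giou_nms(candidates, confidences)  # the second box heavily overlaps the first
```

Unlike plain IoU, GIoU is negative for well-separated boxes, so distant candidates are never suppressed regardless of the threshold chosen in the range (0, 1).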

  • Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation Open Access

    Hongliang FU  Qianqian LI  Huawei TAO  Chunhua ZHU  Yue XIE  Ruxue GUO  

     
    LETTER-Speech and Hearing

      Publicized:
    2024/04/12
      Vol:
    E107-D No:8
      Page(s):
    1097-1100

    Speech emotion recognition (SER) is a key research technology to realize the third generation of artificial intelligence, which is widely used in human-computer interaction, emotion diagnosis, interpersonal communication and other fields. However, the aliasing of language and semantic information in speech tends to distort the alignment of emotion features, which affects the performance of cross-corpus SER system. This paper proposes a cross-corpus SER model based on causal emotion information representation (CEIR). The model uses the reconstruction loss of the deep autoencoder network and the source domain label information to realize the preliminary separation of causal features. Then, the causal correlation matrix is constructed, and the local maximum mean difference (LMMD) feature alignment technology is combined to make the causal features of different dimensions jointly distributed independent. Finally, the supervised fine-tuning of labeled data is used to achieve effective extraction of causal emotion information. The experimental results show that the average unweighted average recall (UAR) of the proposed algorithm is increased by 3.4% to 7.01% compared with the latest partial algorithms in the field.

  • Investigating and Enhancing the Neural Distinguisher for Differential Cryptanalysis Open Access

    Gao WANG  Gaoli WANG  Siwei SUN  

     
    PAPER-Information Network

      Publicized:
    2024/04/12
      Vol:
    E107-D No:8
      Page(s):
    1016-1028

    At Crypto 2019, Gohr first adopted the neural distinguisher for differential cryptanalysis, and this work has received increasing attention since then. However, most existing work focuses on improving and applying the neural distinguisher, and studies delving into the intrinsic principles of neural distinguishers are limited. At Eurocrypt 2021, Benamira et al. conducted a study on Gohr’s neural distinguisher. But for the neural distinguishers proposed later, such as the r-round neural distinguishers trained with k ciphertext pairs or ciphertext differences, denoted as NDcpk_r (Gohr’s neural distinguisher is the special NDcpk_r with k = 1) and NDcdk_r, such research is lacking. In this work, we study the intrinsic principles of NDcdk_r and NDcpk_r and the relationship between them. Firstly, we explore the working principle of NDcd1_r through a series of experiments and find that it strongly relies on the probability distribution of ciphertext differences. Its operational mechanism bears a strong resemblance to that of NDcp1_r given by Benamira et al. Therefore, we further compare them from the perspective of differential cryptanalysis and sample features, demonstrating that the superior performance of NDcp1_r can be attributed to the relationships between certain ciphertext bits, especially the significant bits. We then extend our investigation to NDcpk_r, and show that its ability to recognize samples heavily relies on the average differential probability of k ciphertext pairs and some relationships in the ciphertext itself, but the reliance between k ciphertext pairs is very weak. Finally, in light of the findings of our research, we introduce a strategy to enhance the accuracy of the neural distinguisher by using a fixed difference to generate the negative samples instead of the random one. Through the implementation of this approach, we manage to improve the accuracy of the neural distinguishers by approximately 2% to 8% for 7-round Speck32/64 and 9-round Simon32/64.

  • CPNet: Covariance-Improved Prototype Network for Limited Samples Masked Face Recognition Using Few-Shot Learning Open Access

    Sendren Sheng-Dong XU  Albertus Andrie CHRISTIAN  Chien-Peng HO  Shun-Long WENG  

     
    PAPER-Image

      Publicized:
    2023/12/11
      Vol:
    E107-A No:8
      Page(s):
    1296-1308

    During the COVID-19 pandemic, a robust system for masked face recognition has been required. Most existing solutions used many samples per identity for the model to recognize, but the processes involved are very laborious in a real-life scenario. Therefore, we propose “CPNet” as a suitable and reliable way of recognizing masked faces from only a few samples per identity. The prototype classifier uses a few-shot learning paradigm to perform the recognition process. To handle complex and occluded facial features, we incorporated the covariance structure of the classes to refine the class distance calculation. We also used sharpness-aware minimization (SAM) to improve the classifier. Extensive in-depth experiments on a variety of datasets show that our method achieves remarkable results with accuracy as high as 95.3%, which is 3.4% higher than that of the baseline prototype network used for comparison.

  • Privacy Preserving Function Evaluation Using Lookup Tables with Word-Wise FHE Open Access

    Ruixiao LI  Hayato YAMANA  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/11/16
      Vol:
    E107-A No:8
      Page(s):
    1163-1177

    Homomorphic encryption (HE) is a promising approach for privacy-preserving applications, enabling a third party to assess functions on encrypted data. However, problems persist in implementing privacy-preserving applications through HE, including 1) long function evaluation latency and 2) limited HE primitives only allowing us to perform additions and multiplications. A homomorphic lookup-table (LUT) method has emerged to solve the above problems and enhance function evaluation efficiency. By leveraging homomorphic LUTs, intricate operations can be substituted. Previously proposed LUTs use bit-wise HE, such as TFHE, to evaluate single-input functions. However, the latency increases with the bit-length of the function’s input(s) and output. Additionally, an efficient implementation of multi-input functions remains an open question. This paper proposes a novel LUT-based privacy-preserving function evaluation method to handle multi-input functions while reducing the latency by adopting word-wise HE. Our optimization strategy adjusts table sizes to minimize the latency while preserving function output accuracy, especially for common machine-learning functions. Through our experimental evaluation utilizing the BFV scheme of the Microsoft SEAL library, we confirmed the runtime of arbitrary functions whose LUTs consist of all input-output combinations represented by given input bits: 1) single-input 12-bit functions in 0.14 s, 2) single-input 18-bit functions in 2.53 s, 3) two-input 6-bit functions in 0.17 s, and 4) three-input 4-bit functions in 0.20 s, employing four threads. Besides, we confirmed that our proposed table size optimization strategy worked well, achieving 1.2 times speed up with the same absolute error of order of magnitude of -4 ($a \times 10^{-4}$, where $1/\sqrt{10} \le a < \sqrt{10}$) for Swish and 1.9 times speed up for ReLU while decreasing the absolute error from order -2 to -4 compared to the baseline, i.e., polynomial approximation.

  • Estimation of Drone Payloads Using Millimeter-Wave Fast-Chirp-Modulation MIMO Radar Open Access

    Kenshi OGAWA  Masashi KUROSAKI  Ryohei NAKAMURA  

     
    PAPER-Sensing

      Vol:
    E107-B No:5
      Page(s):
    419-428

    With the development of drone technology, concerns have arisen about the possibility of drones being equipped with threat payloads for terrorism and other crimes. A drone detection system that can detect drones carrying payloads is needed. A drone’s propeller rotation frequency increases with payload weight. Therefore, a method for estimating propeller rotation frequency will effectively detect the presence or absence of a payload and its weight. In this paper, we propose a method for classifying the payload weight of a drone by estimating its propeller rotation frequency from radar images obtained using a millimeter-wave fast-chirp-modulation multiple-input and multiple-output (MIMO) radar. For each drone model, the proposed method requires a pre-prepared reference dataset that establishes the relationships between the payload weight and propeller rotation frequency. Two experimental measurement cases were conducted to investigate the effectiveness of our proposal. In case 1, we assessed four drones (DJI Matrice 600, DJI Phantom 3, DJI Mavic Pro, and DJI Mavic Mini) to determine whether the propeller rotation frequency of any drone could be correctly estimated. In case 2, experiments were conducted on a hovering Phantom 3 drone with several payloads in a stable position for calculating the accuracy of the payload weight classification. The experimental results indicated that the proposed method could estimate the propeller rotation frequency of any drone and classify payloads in a 250 g step with high accuracy.
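
The classification step described above (matching an estimated propeller rotation frequency against a per-model reference dataset) can be sketched as a nearest-reference lookup. This is only an illustration under stated assumptions: the function name `classify_payload` is hypothetical, and the frequencies in `REFERENCE_HZ` are made-up placeholder values, not measurements from the paper.

```python
from typing import Dict

# Hypothetical reference table for one drone model:
# payload weight in grams -> propeller rotation frequency in Hz.
# These numbers are placeholders for illustration, not measured data.
REFERENCE_HZ: Dict[int, float] = {0: 90.0, 250: 97.0, 500: 104.0, 750: 111.0}


def classify_payload(estimated_hz: float, reference: Dict[int, float]) -> int:
    """Return the payload class whose reference frequency is closest to the estimate."""
    return min(reference, key=lambda weight: abs(reference[weight] - estimated_hz))


payload_class = classify_payload(98.2, REFERENCE_HZ)  # closest reference is 97.0 Hz
```

Because the abstract states that rotation frequency increases with payload weight, a monotone reference table like this one yields non-overlapping decision intervals, one per 250 g step.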

  • Grid Sample Based Temporal Iteration for Fully Pipelined 1-ms SLIC Superpixel Segmentation System Open Access

    Yuan LI  Tingting HU  Ryuji FUCHIKAMI  Takeshi IKENAGA  

     
    PAPER-Computer System

      Publicized:
    2023/12/19
      Vol:
    E107-D No:4
      Page(s):
    515-524

    A 1 millisecond (1-ms) vision system, which processes videos at 1000 frames per second (FPS) within 1 ms/frame delay, plays an increasingly important role in fields such as robotics and factory automation. Superpixel as one of the most extensively employed image oversegmentation methods is a crucial pre-processing step for reducing computations in various computer vision applications. Among the different superpixel methods, simple linear iterative clustering (SLIC) has gained widespread adoption due to its simplicity, effectiveness, and computational efficiency. However, the iterative assignment and update steps in SLIC make it challenging to achieve high processing speed. To address this limitation and develop a SLIC superpixel segmentation system with a 1 ms delay, this paper proposes grid sample based temporal iteration. By leveraging the high frame rate of the input video, the proposed method distributes the iterations into the temporal domain, ensuring that the system's delay keeps within one frame. Additionally, grid sample information is added as initialization information to the obtained superpixel centers for enhancing the stability of superpixels. Furthermore, a selective label propagation based pipeline architecture is proposed for parallel computation of all the possibilities of label propagation. This eliminates data dependency between adjacent pixels and enables a fully pipelined system. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to the original SLIC algorithm. When considering label consistency, the proposed system surpasses the performance of state-of-the-art superpixel segmentation methods. Moreover, in terms of hardware performance, the proposed system processes 1000 FPS images with 0.985 ms/frame delay.

  • Efficient Construction of Encoding Polynomials in a Distributed Coded Computing Scheme

    Daisuke HIBINO  Tomoharu SHIBUYA  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/08/10
      Vol:
    E107-A No:3
      Page(s):
    476-485

    Distributed computing is a powerful solution for computational tasks that require massive datasets. Lagrange coded computing (LCC), proposed by Yu et al. [15], realizes private and secure distributed computing in the presence of stragglers, malicious workers, and colluding workers by using an encoding polynomial. Since the encoding polynomial depends on the dataset, it must be updated every time a new dataset arrives. Therefore, an efficient algorithm for constructing the encoding polynomial is necessary. In this paper, we propose Newton coded computing (NCC), which is based on Newton interpolation, to construct the encoding polynomial. Let K, L, and T be the number of data, the length of each data, and the number of colluding workers, respectively. Then, the computational complexity for construction of an encoding polynomial is improved from $O(L(K+T)\log^2(K+T)\log\log(K+T))$ for LCC to $O(L(K+T)\log(K+T))$ for the proposed method. Furthermore, by applying the proposed method, the computational complexity for updating the encoding polynomial is improved from $O(L(K+T)\log^2(K+T)\log\log(K+T))$ for LCC to $O(L)$ for the proposed method.
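
The appeal of Newton interpolation for updates is that adding a data point appends one coefficient and leaves the existing ones untouched. The sketch below shows this textbook property with divided differences. It is not the NCC algorithm and does not reproduce its complexity bounds: it works over the rationals (`fractions.Fraction`) rather than a finite field, handles scalar values only, and the class name `NewtonPoly` is hypothetical.

```python
from fractions import Fraction
from typing import List


class NewtonPoly:
    """Interpolating polynomial in Newton form, built one point at a time."""

    def __init__(self) -> None:
        self.xs: List[Fraction] = []      # interpolation nodes x_0, ..., x_n
        self.coeffs: List[Fraction] = []  # c_i = f[x_0, ..., x_i]
        self._diag: List[Fraction] = []   # _diag[j] = f[x_{n-j}, ..., x_n]

    def add_point(self, x, y) -> None:
        """Add (x, y); earlier coefficients are unchanged, one new coefficient is appended."""
        x, y = Fraction(x), Fraction(y)
        new_diag = [y]
        for j, prev in enumerate(self._diag):
            x_old = self.xs[len(self.xs) - 1 - j]
            # Divided-difference recurrence along the new last diagonal.
            new_diag.append((new_diag[j] - prev) / (x - x_old))
        self.xs.append(x)
        self._diag = new_diag
        self.coeffs.append(new_diag[-1])

    def __call__(self, x) -> Fraction:
        """Evaluate with a Horner-like scheme over the Newton basis."""
        x = Fraction(x)
        result = Fraction(0)
        for c, xi in zip(reversed(self.coeffs), reversed(self.xs)):
            result = result * (x - xi) + c
        return result


p = NewtonPoly()
p.add_point(0, 1)
p.add_point(1, 3)
before = list(p.coeffs)
p.add_point(2, 11)  # appends one coefficient; `before` is still a prefix of p.coeffs
```

In Lagrange form, by contrast, every basis polynomial depends on every node, so a new point changes all of them.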

  • Efficient Homomorphic Evaluation of Arbitrary Uni/Bivariate Integer Functions and Their Applications

    Daisuke MAEDA  Koki MORIMURA  Shintaro NARISADA  Kazuhide FUKUSHIMA  Takashi NISHIDE  

     
    PAPER

      Publicized:
    2023/09/14
      Vol:
    E107-A No:3
      Page(s):
    234-247

    We propose how to homomorphically evaluate arbitrary univariate and bivariate integer functions such as division. A prior work proposed by Okada et al. (WISTP'18) uses polynomial evaluations such that the scheme is still compatible with the SIMD operations in BFV and BGV schemes, and is implemented with the input domain $\mathbb{Z}_{257}$. However, the scheme of Okada et al. requires the quadratic numbers of plaintext-ciphertext multiplications and ciphertext-ciphertext additions in the input domain size, and although these operations are more lightweight than the ciphertext-ciphertext multiplication, the quadratic complexity makes handling larger inputs quite inefficient. In this work, first we improve the prior work and also propose a new approach that exploits the packing method to handle the larger input domain size instead of enabling the SIMD operation, thus making it possible to work with the larger input domain size, e.g., $\mathbb{Z}_{2^{15}}$ in a reasonably efficient way. In addition, we show how to slightly extend the input domain size to $\mathbb{Z}_{2^{16}}$ with a relatively moderate overhead. Further we show another approach to handling the larger input domain size by using two ciphertexts to encrypt one integer plaintext and applying our techniques for uni/bivariate function evaluation. We implement the prior work of Okada et al., our improved version of Okada et al., and our new scheme in PALISADE with the input domain $\mathbb{Z}_{2^{15}}$, and confirm that the estimated run-times of the prior work and our improved version of the prior work are still about 117 days and 59 days respectively while our new scheme can be computed in 307 seconds.

  • On Extension of Evaluation Algorithms in Keyed-Homomorphic Encryption

    Hirotomo SHINOKI  Koji NUIDA  

     
    PAPER

      Publicized:
    2023/06/27
      Vol:
    E107-A No:3
      Page(s):
    218-233

    Homomorphic encryption (HE) is public key encryption that enables computation over ciphertexts without decrypting them. To overcome an issue that HE cannot achieve IND-CCA2 security, the notion of keyed-homomorphic encryption (KH-PKE) was introduced (Emura et al., PKC 2013), which has a separate homomorphic evaluation key and can achieve stronger security named KH-CCA security. The contributions of this paper are twofold. First, recall that the syntax of KH-PKE assumes that homomorphic evaluation is performed for single operations, and KH-CCA security was formulated based on this syntax. Consequently, if the homomorphic evaluation algorithm is enhanced in a way of gathering up sequential operations as a single evaluation, then it is not obvious whether or not KH-CCA security is preserved. In this paper, we show that KH-CCA security is in general not preserved under such modification, while KH-CCA security is preserved when the original scheme additionally satisfies circuit privacy. Secondly, Catalano and Fiore (ACM CCS 2015) proposed a conversion method from linearly HE schemes into two-level HE schemes, the latter admitting addition and a single multiplication for ciphertexts. In this paper, we extend the conversion to the case of linearly KH-PKE schemes to obtain two-level KH-PKE schemes. Moreover, based on the generalized version of Catalano-Fiore conversion, we also construct a similar conversion from d-level KH-PKE schemes into 2d-level KH-PKE schemes.

  • CMND: Consistent-Aware Multi-Server Network Design Model for Delay-Sensitive Applications

    Akio KAWABATA  Bijoy CHAND CHATTERJEE  Eiji OKI  

     
    PAPER-Network System

      Vol:
    E107-B No:3
      Page(s):
    321-329

    This paper proposes a network design model, considering data consistency for a delay-sensitive distributed processing system. The data consistency is determined by collating the own state and the states of slave servers. If the state is mismatched with other servers, the rollback process is initiated to modify the state to guarantee data consistency. In the proposed model, the selected servers and the master-slave server pairs are determined to minimize the end-to-end delay and the delay for data consistency. We formulate the proposed model as an integer linear programming problem. We evaluate the delay performance and computation time. We evaluate the proposed model in two network models with two, three, and four slave servers. The proposed model reduces the delay for data consistency by up to 31 percent compared to that of a typical model that collates the status of all servers at one master server. The computation time is a few seconds, which is an acceptable time for network design before service launch. These results indicate that the proposed model is effective for delay-sensitive applications.

  • Giving a Quasi-Initial Solution to Ising Machines by Controlling External Magnetic Field Coefficients

    Soma KAWAKAMI  Kentaro OHNO  Dema BA  Satoshi YAGI  Junji TERAMOTO  Nozomu TOGAWA  

     
    PAPER

      Publicized:
    2023/08/16
      Vol:
    E107-A No:1
      Page(s):
    52-62

    Ising machines can find optimum or quasi-optimum solutions of combinatorial optimization problems efficiently and effectively. It is known that, when a good initial solution is given to an Ising machine, we can finally obtain a solution closer to the optimal solution. However, several Ising machines cannot directly accept an initial solution due to their computational nature. In this paper, we propose a method to give quasi-initial solutions to Ising machines that cannot directly accept them. The proposed method gives the positive or negative external magnetic field coefficients (magnetic field controlling term) based on the initial solutions and obtains a solution by using an Ising machine. Then, the magnetic field controlling term is re-calculated every time an Ising machine repeats the annealing process, and hence the solution is repeatedly improved on the basis of the previously obtained solution. The proposed method is applied to the capacitated vehicle routing problem with an additional constraint (constrained CVRP) and the max-cut problem. Experimental results show that the total path distance is reduced by 5.78% on average compared to the initial solution in the constrained CVRP and the sum of cut-edge weight is increased by 1.25% on average in the max-cut problem.

  • High Precision Fingerprint Verification for Small Area Sensor Based on Deep Learning

    Nabilah SHABRINA  Dongju LI  Tsuyoshi ISSHIKI  

     
    PAPER-Biometrics

      Publicized:
    2023/06/26
      Vol:
    E107-A No:1
      Page(s):
    157-168

    Fingerprint verification systems are widely used in mobile devices because of the distinctive features of fingerprints and their ease of capture. Typically, mobile devices utilize small sensors, which have limited area, to capture fingerprints. Meanwhile, conventional fingerprint feature extraction methods need detailed fingerprint information, which makes them unsuitable for those small sensors. This paper proposes a novel fingerprint verification method for small area sensors based on deep learning. A systematic method combines a deep convolutional neural network (DCNN) in a Siamese network for feature extraction and XGBoost for fingerprint similarity training. In addition, a padding technique is also introduced to avoid the wraparound error problem. Experimental results show that the method achieves an improved accuracy of 66.6% and 22.6% in the FingerPassDB7 and FVC2006DB1B datasets, respectively, compared to the existing methods.

  • A Simple Design of Reconfigurable Intelligent Surface-Assisted Index Modulation: Generalized Reflected Phase Modulation

    Chaorong ZHANG  Yuyang PENG  Ming YUE  Fawaz AL-HAZEMI  

     
    LETTER-Communication Theory and Signals

      Publicized:
    2023/05/30
      Vol:
    E107-A No:1
      Page(s):
    182-186

    As a potential member of next generation wireless communications, the reconfigurable intelligent surface (RIS) can control the reflected elements to adjust the phase of the transmitted signal with less energy consumption. A novel RIS-assisted index modulation scheme is proposed in this paper, which is named the generalized reflected phase modulation (GRPM). In the GRPM, the transmitted bits are mapped into the reflected phase combination which is conveyed through the reflected elements on the RIS, and detected by the maximum likelihood (ML) detector. The performance analysis of the GRPM with the ML detector is presented, in which the closed form expression of pairwise error probability is derived. The simulation results show the bit error rate (BER) performance of GRPM by comparing with various RIS-assisted index modulation schemes in the conditions of various spectral efficiency and number of antennas.

  • Deep Unrolling of Non-Linear Diffusion with Extended Morphological Laplacian

    Gouki OKADA  Makoto NAKASHIZUKA  

     
    PAPER-Image

      Publicized:
    2023/07/21
      Vol:
    E106-A No:11
      Page(s):
    1395-1405

    This paper presents a deep network based on unrolling the diffusion process with the morphological Laplacian. The diffusion process is an iterative algorithm that can solve the diffusion equation and represents time evolution with the Laplacian. The diffusion process is applied to smoothing of images and has been extended with non-linear operators for various image processing tasks. In this study, we introduce the morphological Laplacian to the basic diffusion process and unroll it into deep networks. The morphological filters are non-linear operators with parameters that are referred to as structuring elements. The discrete Laplacian can be approximated with the morphological filters without multiplications. Owing to the non-linearity of the morphological filter with trainable structuring elements, the training uses error back propagation and the network of the morphology can be adapted to specific image processing applications. We introduce two extensions of the morphological Laplacian for deep networks. Since the morphological filters are realized with addition, max, and min, the error caused by the limited bit-length is not amplified. Consequently, the morphological parts of the network are implemented in unsigned 8-bit integers with single instruction, multiple data (SIMD) operations to achieve fast computation on small devices. We applied the proposed network to image completion and Gaussian denoising. The results and computational time are compared with other denoising algorithms and deep networks.
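
As a concrete illustration of a multiplication-free Laplacian, the sketch below computes the standard morphological Laplacian (dilation plus erosion minus twice the signal) on a 1-D integer signal. It is a textbook form, not the paper's two extensions: the border handling (index clamping), the odd-length structuring element, and the function names are assumptions made for this example.

```python
from typing import List


def _clamp(i: int, n: int) -> int:
    """Replicate the border samples (an assumption of this sketch)."""
    return min(max(i, 0), n - 1)


def dilate(signal: List[int], se: List[int]) -> List[int]:
    """Grayscale dilation: max over the window of signal value plus structuring element."""
    r, n = len(se) // 2, len(signal)
    return [max(signal[_clamp(i - k, n)] + se[k + r] for k in range(-r, r + 1))
            for i in range(n)]


def erode(signal: List[int], se: List[int]) -> List[int]:
    """Grayscale erosion: min over the window of signal value minus structuring element."""
    r, n = len(se) // 2, len(signal)
    return [min(signal[_clamp(i + k, n)] - se[k + r] for k in range(-r, r + 1))
            for i in range(n)]


def morphological_laplacian(signal: List[int], se: List[int]) -> List[int]:
    """(dilation - f) - (f - erosion), using only addition, subtraction, max, and min."""
    d, e = dilate(signal, se), erode(signal, se)
    return [di + ei - fi - fi for di, ei, fi in zip(d, e, signal)]


flat_se = [0, 0, 0]  # flat structuring element of width 3
lap = morphological_laplacian([0, 0, 4, 0, 0], flat_se)
```

For the impulse above, the result is negative at the peak and positive on its neighbours, which matches the sign pattern of the discrete linear Laplacian.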

  • Plane-Wave Spectrum Analysis of Spherical Wave Absorption and Reflection by Metasurface Absorber

    Tu NGUYEN VAN  Satoshi YAGITANI  Kensuke SHIMIZU  Shinjiro NISHI  Mitsunori OZAKI  Tomohiko IMACHI  

     
    PAPER-Electromagnetic Compatibility(EMC)

      Publicized:
    2023/07/24
      Vol:
    E106-B No:11
      Page(s):
    1182-1191

    A metasurface absorber capable of monitoring two-dimensional (2-d) electric field distributions has been developed, where a matrix of lumped resistors between surface patches formed on a mushroom-type structure works as a 2-d array of short dipole sensors. In this paper absorption and reflection of a spherical wave incident on the metasurface absorber are analyzed by numerical computation by the plane-wave spectrum (PWS) technique using 2-d Fourier analysis. The electromagnetic field of the spherical wave incident on the absorber surface is expanded into a large number of plane waves, for each of which the TE and TM reflection and absorption coefficients are applied. Then by synthesizing all the plane wave fields we obtain the spatial distributions of reflected and absorbed fields. The detailed formulation of the computation is described, and the computed field distributions are compared with those obtained by simulation and actual measurement when the spherical wave from a dipole is illuminated onto a metasurface absorber. It is demonstrated that the PWS technique is effective and efficient in obtaining the accurate field distributions of the spherical wave on and around the absorber. This is useful for evaluating the performance of the metasurface absorber to absorb and measure the spherical wave field distributions around an EM source.

  • 128 Gbit/s Operation of AXEL with Energy Efficiency of 1.5 pJ/bit for Optical Interconnection Open Access

    Wataru KOBAYASHI  Shigeru KANAZAWA  Takahiko SHINDO  Manabu MITSUHARA  Fumito NAKAJIMA  

     
    INVITED PAPER

      Publicized:
    2023/06/05
      Vol:
    E106-C No:11
      Page(s):
    732-738

    We evaluated the energy efficiency per 1-bit transmission of an optical light source on InP substrate to achieve optical interconnection. A semiconductor optical amplifier (SOA) assisted extended reach EADFB laser (AXEL) was utilized as the optical light source to enhance the energy efficiency compared to the conventional electro-absorption modulator integrated with a DFB laser (EML). The AXEL has frequency bandwidth extendibility for operation of over 100 Gbit/s, which is difficult when using a vertical cavity surface emitting laser (VCSEL) without an equalizer. By designing the AXEL for low power consumption, we were able to achieve 64-Gbit/s, 1.0 pJ/bit and 128-Gbit/s, 1.5 pJ/bit operation at 50°C with the transmitter dispersion and eye closure quaternary of 1.1 dB.

  • Hybrid Electromagnetic Simulation Using 2D-FDTD and Ray-Tracing Methods for Airport Surfaces

    Ryosuke SUGA  Megumi WATANABE  Atsushi KEZUKA  

     
    PAPER-Electromagnetic Theory

      Publicized:
    2023/06/05
      Vol:
    E106-C No:11
      Page(s):
    774-779

    In this paper, a hybrid electromagnetic simulation method that combines two-dimensional FDTD and ray-tracing methods and is suitable for airport surfaces is proposed. The power variation due to ground reflection, refraction, and creeping is calculated by the two-dimensional FDTD method, and the ray-tracing method is used to calculate the reflected and diffracted powers from buildings. The proposed approach was validated by measurement at 5 GHz using a 1/50 scale model of an airport with a building model in various positions. With the proposed method, the simulated power distributions agreed with the measured ones to within 4.8 dB, and the null positions were estimated to within an error of 0.01 m.

  • LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs

    Song LIU  Jie MA  Chenyu ZHAO  Xinhe WAN  Weiguo WU  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2023/02/10
      Vol:
    E106-A No:8
      Page(s):
    1043-1050

    GPUs have become the dominant computing units to meet the need for high performance in various computational fields. However, long operation latency causes underutilization of on-chip computing resources, resulting in performance degradation when running parallel tasks on GPUs. A good warp scheduling strategy is an effective solution to hide latency and improve resource utilization. However, most current warp scheduling algorithms on GPUs ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. The LFWS filters warps in the ready state into a ready queue and updates the queue in time according to changes in the status of the warps. The LFWS divides the warps in the ready queue into long and short operation groups based on the type of operations in their instruction buffers, and it gives higher priority to the long-operation warps in the ready queue. This can effectively use the long operations to hide some of the latency from each other and enhance the system's ability to hide latency. To verify the effectiveness of the LFWS, we implement the LFWS algorithm on the simulation platform GPGPU-Sim. Experiments are conducted over various CUDA applications to evaluate the performance of the LFWS algorithm, compared with five other warp scheduling algorithms. The results show that the LFWS algorithm achieves an average performance improvement of 8.01% and 5.09% over three traditional and two novel warp scheduling algorithms, respectively, effectively improving computational resource utilization on GPUs.
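
The selection rule described above (filter ready warps, split them into long and short operation groups, prefer the long group) can be sketched as follows. This is a simplified illustration, not the LFWS implementation in GPGPU-Sim: the set `LONG_OPS`, the `Warp` fields, and the lowest-id tie-break are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed classification of long-latency operations (hypothetical names).
LONG_OPS = frozenset({"global_load", "global_store", "sfu"})


@dataclass
class Warp:
    warp_id: int
    next_op: str  # type of the next instruction in the warp's instruction buffer
    ready: bool   # True if the warp can issue this cycle


def select_warp(warps: List[Warp]) -> Optional[Warp]:
    """Pick the next warp to issue, giving priority to long-operation warps."""
    ready_queue = [w for w in warps if w.ready]
    if not ready_queue:
        return None
    long_group = [w for w in ready_queue if w.next_op in LONG_OPS]
    # Fall back to the short-operation warps only when no long one is ready.
    group = long_group or ready_queue
    # Tie-break by lowest warp id (an assumption of this sketch).
    return min(group, key=lambda w: w.warp_id)


pool = [
    Warp(0, "add", True),
    Warp(1, "global_load", False),  # long operation, but not ready
    Warp(2, "global_load", True),
    Warp(3, "sfu", True),
]
chosen = select_warp(pool)
```

Issuing the long operations early lets their latencies overlap with each other and with the short operations that follow, which is the effect the abstract attributes to the scheduler.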

  • Effect of the State of Catalytic Nanoparticles on the Growth of Vertically Aligned Carbon Nanotubes

    Shohei SAKURAI  Mayu IIDA  Kosei OKUNUKI  Masahito KUSHIDA  

     
    PAPER

      Publicized:
    2023/01/13
      Vol:
    E106-C No:6
      Page(s):
    208-213

    In this study, vertically aligned carbon nanotubes (VA-CNTs) were grown from filler-added LB films with accumulated AlFe₂O₄ nanoparticles and palmitic acid (C16) as the filler molecule after different hydrogen reduction temperatures of 500°C and 750°C, and the grown VA-CNTs were compared and evaluated. As a result, VA-CNTs were approximately doubled in length after 500°C hydrogen reduction compared to 750°C hydrogen reduction when AlFe₂O₄ NPs were used. On the other hand, when the catalyst area ratio was decreased by using palmitic acid, i.e., the distance between CNTs was increased, VA-CNTs rapidly shortened after 500°C hydrogen reduction, and VA-CNTs were no longer obtained even in the range where VA-CNTs were obtained in 750°C hydrogen reduction. The inner and outer diameters of VA-CNTs decreased with decreasing catalyst area ratio at 750°C hydrogen reduction and tended to increase at 500°C hydrogen reduction. The morphology of the catalyst nanoparticles after CVD was observed to change significantly depending on the hydrogen reduction temperature and catalyst area ratio. These observations indicate that the state of the catalyst nanoparticles immediately before the CNT growth process greatly affects the physical properties of the CNTs.
