Haochen LYU Jianjun LI Yin YE Chin-Chen CHANG
The purpose of Facial Beauty Prediction (FBP) is to automatically assess facial attractiveness based on human aesthetics. Most neural network-based prediction methods do not consider the ranking information in the task. For scoring tasks like facial beauty prediction, there is abundant ranking information both between images and within images. Reasonable utilization of these information during training can greatly improve the performance of the model. In this paper, we propose a novel end-to-end Convolutional Neural Network (CNN) model based on ranking information of images, incorporating a Rank Module and an Adaptive Weight Module. We also design pairwise ranking loss functions to fully leverage the ranking information of images. Considering training efficiency and model inference capability, we choose ResNet-50 as the backbone network. We conduct experiments on the SCUT-FBP5500 dataset and the results show that our model achieves a new state-of-the-art performance. Furthermore, ablation experiments show that our approach greatly contributes to improving the model performance. Finally, the Rank Module with the corresponding ranking loss is plug-and-play and can be extended to any CNN model and any task with ranking information. Code is available at https://github.com/nehcoah/Rank-Info-Net.
Images captured in low-light environments have low visibility and high noise, which will seriously affect subsequent visual tasks such as target detection and face recognition. Therefore, low-light image enhancement is of great significance in obtaining high-quality images and is a challenging problem in computer vision tasks. A low-light enhancement model, LLFormer, based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce the complexity and achieve feature extraction. This algorithm can enhance images well. However, the calculation of the attention mechanism is complex and the number of parameters is large, which limits the application of the model in practice. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. The experiment results show that the proposed method not only reduces the number of parameters by 49%, but also performs better in terms of image detail restoration and noise suppression compared with the baseline model. On the LOL dataset, the PSNR and SSIM were 24.098dB and 0.8575 respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060dB and 0.9490. The evaluation results on the two datasets are better than the current mainstream low-light enhancement algorithms.
Kota HISAFURU Kazunari TAKASAKI Nozomu TOGAWA
In recent years, with the wide spread of the Internet of Things (IoT) devices, security issues for hardware devices have been increasing, where detecting their anomalous behaviors becomes quite important. One of the effective methods for detecting anomalous behaviors of IoT devices is to utilize consumed energy and operation duration time extracted from their power waveforms. However, the existing methods do not consider the shape of time-series data and cannot distinguish between power waveforms with similar consumed energy and duration time but different shapes. In this paper, we propose a method for detecting anomalous behaviors based on the shape of time-series data by incorporating a shape-based distance (SBD) measure. The proposed method first obtains the entire power waveform of the target IoT device and extracts several application power waveforms. After that, we give the invariances to them, and we can effectively obtain the SBD between every two application power waveforms. Based on the SBD values, the local outlier factor (LOF) method can finally distinguish between normal application behaviors and anomalous application behaviors. Experimental results demonstrate that the proposed method successfully detects anomalous application behaviors, while the existing state-of-the-art method fails to detect them.
Takashi YOKOTA Kanemitsu OOTSU Shun KOJIMA
An interconnection network is an inevitable component for constructing parallel computers. It connects computation nodes so that the nodes can communicate with each other. As a parallel computation essentially requires inter-node communication according to a parallel algorithm, the interconnection network plays an important role in terms of communication performance. This paper focuses on the collective communication that is frequently performed in parallel computation and this paper addresses the Cup-Stacking method that is proposed in our preceding work. The key issues of the method are splitting a large packet into slices, re-shaping the slice, and stacking the slices, in a genetic algorithm (GA) manner. This paper discusses extending the Cup-Stacking method by introducing additional items (genes) and proposes the extended Cup-Stacking method. Furthermore, this paper places comprehensive discussions on the drawbacks and further optimization of the method. Evaluation results reveal the effectiveness of the extended method, where the proposed method achieves at most seven percent improvement in duration time over the former Cup-Stacking method.
Kaito TOMARI Jun YONEDA Tetsuo KODERA
Reducing on-chip microwave crosstalk is crucial for semiconductor spin qubit integration. Toward crosstalk reduction and qubit integration, we investigate on-chip microwave crosstalk for gate electrode pad designs with (i) etched trenches between contact pads or (ii) contact pads with reduced sizes. We conclude that the design with feature (ii) is advantageous for high-density integration of semiconductor qubits with small crosstalk (below -25 dB at 6 GHz), favoring the introduction of flip-chip bonding.
Yuetsu KODAMA Masaaki KONDO Mitsuhisa SATO
The supercomputer, “Fugaku”, which ranked number one in multiple supercomputing lists, including the Top500 in June 2020, has various power control features, such as (1) an eco mode that utilizes only one of two floating-point pipelines while decreasing the power supply to the chip; (2) a boost mode that increases clock frequency; and (3) a core retention feature that turns unused cores to the low-power state. By orchestrating these power-performance features while considering the characteristics of running applications, we can potentially gain even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku using SPEC HPC benchmarks. Consequently, we confirmed that it is possible to reduce the energy by about 17% while improving the performance by about 2% from the normal mode by combining boost mode and eco mode.
Conggai LI Feng LIU Xin ZHOU Yanli XU
To obtain a full picture of potential applications for propagation-delay based X channels, it is important to obtain all feasible schemes of cyclic interference alignment including the encoder, channel instance, and decoder. However, when the dimension goes larger, theoretical analysis about this issue will become tedious and even impossible. In this letter, we propose a computer-aided solution by searching the channel space and the scheduling space, which can find all feasible schemes in details. Examples are given for some typical X channels. Computational complexity is further analyzed.
Wen LIU Yixiao SHAO Shihong ZHAI Zhao YANG Peishuai CHEN
Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by the tracking accuracy limited by the dynamic, complex nature of construction sites (i.e. clutter with background, occlusion, varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan China, and the results demonstrate good tracking performance in construction scenarios with complex background. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention in construction sites.
Chenxu WANG Hideki KAWAGUCHI Kota WATANABE
An approach to dedicated computers is discussed in this study as a possibility for portable, low-cost, and low-power consumption high-performance computing technologies. Particularly, dataflow architecture dedicated computer of the finite integration technique (FIT) for 2D magnetostatic field simulation is considered for use in industrial applications. The dataflow architecture circuit of the BiCG-Stab matrix solver of the FIT matrix calculation is designed by the very high-speed integrated circuit hardware description language (VHDL). The operation of the dedicated computer's designed circuit is considered by VHDL logic circuit simulation.
Fairuz SAFWAN MAHAD Masakazu IWAMURA Koichi KISE
3D reconstruction methods using neural networks are popular and have been studied extensively. However, the resulting models typically lack detail, reducing the quality of the 3D reconstruction. This is because the network is not designed to capture the fine details of the object. Therefore, in this paper, we propose two networks designed to capture both the coarse and fine details of the object to improve the reconstruction of the detailed parts of the object. To accomplish this, we design two networks. The first network uses a multi-scale architecture with skip connections to associate and merge features from other levels. For the second network, we design a multi-branch deep generative network that separately learns the local features, generic features, and the intermediate features through three different tailored components. In both network architectures, the principle entails allowing the network to learn features at different levels that can reconstruct the fine parts and the overall shape of the reconstructed 3D model. We show that both of our methods outperformed state-of-the-art approaches.
Tomoki SHIMIZU Kohei ITO Kensuke IIZUKA Kazuei HIRONAKA Hideharu AMANO
The multi-FPGA system known as, the Flow-in-Cloud (FiC) system, is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network exhibits the disadvantage of decreasing link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS guarantee communication by the STDM can be used simultaneously. Furthermore, to improve each link utilization under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors the network load metrics, and according to the metrics, timing-critical tasks select the STDM according to the metrics only when congestion occurs. This can achieve both QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation, the dynamic algorithm was up to 24.6% faster on the execution time with a high network load compared to the packet switching on a real multi-FPGA system with 24 boards.
Takashi YOKOTA Kanemitsu OOTSU Shun KOJIMA
Parallel computing essentially consists of computation and communication and, in many cases, communication performance is vital. Many parallel applications use collective communications, which often dominate the performance of the parallel execution. This paper focuses on collective communication performance to speed-up the parallel execution. This paper firstly offers our experimental result that splitting a session of collective communication to small portions (slices) possibly enables efficient communication. Then, based on the results, this paper proposes a new concept cup-stacking with a genetic algorithm based methodology. The preliminary evaluation results reveal the effectiveness of the proposed method.
Wen SHAO Rei KAWAKAMI Takeshi NAEMURA
Previous studies on anomaly detection in videos have trained detectors in which reconstruction and prediction tasks are performed on normal data so that frames on which their task performance is low will be detected as anomalies during testing. This paper proposes a new approach that involves sorting video clips, by using a generative network structure. Our approach learns spatial contexts from appearances and temporal contexts from the order relationship of the frames. Experiments were conducted on four datasets, and we categorized the anomalous sequences by appearance and motion. Evaluations were conducted not only on each total dataset but also on each of the categories. Our method improved detection performance on both anomalies with different appearance and different motion from normality. Moreover, combining our approach with a prediction method produced improvements in precision at a high recall.
Xiang SHEN Dezhi HAN Chin-Chen CHANG Liang ZONG
Visual Question Answering (VQA) is multi-task research that requires simultaneous processing of vision and text. Recent research on the VQA models employ a co-attention mechanism to build a model between the context and the image. However, the features of questions and the modeling of the image region force irrelevant information to be calculated in the model, thus affecting the performance. This paper proposes a novel dual self-guided attention with sparse question networks (DSSQN) to address this issue. The aim is to avoid having irrelevant information calculated into the model when modeling the internal dependencies on both the question and image. Simultaneously, it overcomes the coarse interaction between sparse question features and image features. First, the sparse question self-attention (SQSA) unit in the encoder calculates the feature with the highest weight. From the self-attention learning of question words, the question features of larger weights are reserved. Secondly, sparse question features are utilized to guide the focus on image features to obtain fine-grained image features, and to also prevent irrelevant information from being calculated into the model. A dual self-guided attention (DSGA) unit is designed to improve modal interaction between questions and images. Third, the sparse question self-attention of the parameter δ is optimized to select these question-related object regions. Our experiments with VQA 2.0 benchmark datasets demonstrate that DSSQN outperforms the state-of-the-art methods. For example, the accuracy of our proposed model on the test-dev and test-std is 71.03% and 71.37%, respectively. In addition, we show through visualization results that our model can pay more attention to important features than other advanced models. At the same time, we also hope that it can promote the development of VQA in the field of artificial intelligence (AI).
Fairuz Safwan MAHAD Masakazu IWAMURA Koichi KISE
Neural network-based three-dimensional (3D) reconstruction methods have produced promising results. However, they do not pay particular attention to reconstructing detailed parts of objects. This occurs because the network is not designed to capture the fine details of objects. In this paper, we propose a network designed to capture both the coarse and fine details of objects to improve the reconstruction of the fine parts of objects.
Masayuki ODAGAWA Tetsushi KOIDE Toru TAMAKI Shigeto YOSHIDA Hiroshi MIENO Shinji TANAKA
This paper presents examination result of possibility for automatic unclear region detection in the CAD system for colorectal tumor with real time endoscopic video image. We confirmed that it is possible to realize the CAD system with navigation function of clear region which consists of unclear region detection by YOLO2 and classification by AlexNet and SVMs on customizable embedded DSP cores. Moreover, we confirmed the real time CAD system can be constructed by a low power ASIC using customizable embedded DSP cores.
Masayuki ODAGAWA Takumi OKAMOTO Tetsushi KOIDE Toru TAMAKI Shigeto YOSHIDA Hiroshi MIENO Shinji TANAKA
In this paper, we present a classification method for a Computer-Aided Diagnosis (CAD) system in a colorectal magnified Narrow Band Imaging (NBI) endoscopy. In an endoscopic video image, color shift, blurring or reflection of light occurs in a lesion area, which affects the discrimination result by a computer. Therefore, in order to identify lesions with high robustness and stable classification to these images specific to video frame, we implement a CAD system for colorectal endoscopic images with the Convolutional Neural Network (CNN) feature and Support Vector Machine (SVM) classification on the embedded DSP core. To improve the robustness of CAD system, we construct the SVM learned by multiple image sizes data sets so as to adapt to the noise peculiar to the video image. We confirmed that the proposed method achieves higher robustness, stable, and high classification accuracy in the endoscopic video image. The proposed method also can cope with differences in resolution by old and new endoscopes and perform stably with respect to the input endoscopic video image.
Due to the rapid development of different processors, e.g., x86 and Sunway, software porting between different platforms is becoming more frequent. However, the migrated software's execution efficiency on the target platform is different from that of the source platform, and most of the previous studies have investigated the improvement of the efficiency from the hardware perspective. To the best of our knowledge, this is the first paper to exclusively focus on studying what software factors can result in performance change after software migration. To perform our study, we used SonarQube to detect and measure five software factors, namely Duplicated Lines (DL), Code Smells Density (CSD), Big Functions (BF), Cyclomatic Complexity (CC), and Complex Functions (CF), from 13 selected projects of SPEC CPU2006 benchmark suite. Then, we measured the change of software performance by calculating the acceleration ratio of execution time before (x86) and after (Sunway) software migration. Finally, we performed a multiple linear regression model to analyze the relationship between the software performance change and the software factors. The results indicate that the performance change of software migration from the x86 platform to the Sunway platform is mainly affected by three software factors, i.e., Code Smell Density (CSD), Cyclomatic Complexity (CC), and Complex Functions (CF). The findings can benefit both researchers and practitioners.
Yusuke HARA Xueting WANG Toshihiko YAMASAKI
Video inpainting is a task of filling missing regions in videos. In this task, it is important to efficiently use information from other frames and generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method jointly using affine transformation and deformable convolutions for frame alignment. The former is responsible for frame-scale rough alignment and the latter performs pixel-level fine alignment. Our model does not depend on 3D convolutions, which limits the temporal window, or troublesome flow estimation. The proposed method achieves improved object removal results and better PSNR and SSIM values compared with previous learning-based methods.
Masayuki ODAGAWA Takumi OKAMOTO Tetsushi KOIDE Toru TAMAKI Bisser RAYTCHEV Kazufumi KANEDA Shigeto YOSHIDA Hiroshi MIENO Shinji TANAKA Takayuki SUGAWARA Hiroshi TOISHI Masayuki TSUJI Nobuo TAMBA
In this paper, we present a hardware implementation of a colorectal cancer diagnosis support system using a colorectal endoscopic video image on customizable embedded DSP. In an endoscopic video image, color shift, blurring or reflection of light occurs in a lesion area, which affects the discrimination result by a computer. Therefore, in order to identify lesions with high robustness and stable classification to these images specific to video frame, we implement a computer-aided diagnosis (CAD) system for colorectal endoscopic images with Narrow Band Imaging (NBI) magnification with the Convolutional Neural Network (CNN) feature and Support Vector Machine (SVM) classification. Since CNN and SVM need to perform many multiplication and accumulation (MAC) operations, we implement the proposed hardware system on a customizable embedded DSP, which can realize at high speed MAC operations and parallel processing with Very Long Instruction Word (VLIW). Before implementing to the customizable embedded DSP, we profile and analyze processing cycles of the CAD system and optimize the bottlenecks. We show the effectiveness of the real-time diagnosis support system on the embedded system for endoscopic video images. The prototyped system demonstrated real-time processing on video frame rate (over 30fps @ 200MHz) and more than 90% accuracy.