Meiting XUE Wenqi WU Jinfeng LUO Yixuan ZHANG Bei ZHAO
Join is an important but data-intensive and compute-intensive operation in database systems. Moreover, there are multiple types of join operations, with diverse complexities, according to different join conditions and data relationships. Because most existing solutions for accelerating the join operation on field-programmable gate arrays (FPGAs) focus only on the simplest type of join, this study presents a novel architecture that is suitable for multiple types of join operation. The architecture has a modular design and consists of three components that are executed sequentially and in a pipelined fashion. Specifically, a top-K sorter is used instead of a full sorter to reduce resource utilization and start the merge processing earlier. Furthermore, the architecture is fully compatible with both N-to-1 and N-to-M join relationships and adapts well to both equi-joins and band-joins. Experimental results show that this design, implemented on an FPGA, achieves a high join throughput of 242.1 million tuples per second, which exceeds other reported FPGA implementations.
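As a hedged sketch of the top-K idea (the function and tuple layout here are illustrative Python, not the paper's hardware design): a size-bounded max-heap keeps only the K smallest keys seen so far, so memory stays O(K) and a sorted run can be handed to the merge stage without sorting the full input.

```python
import heapq

def top_k_sorted(stream, k):
    """Return the k smallest keys of `stream` in ascending order.

    A size-bounded max-heap (negated keys) keeps only k candidates,
    so memory is O(k) instead of the O(n) needed for a full sort.
    """
    heap = []  # max-heap emulated by negating keys
    for key in stream:
        if len(heap) < k:
            heapq.heappush(heap, -key)
        elif -heap[0] > key:            # current maximum is larger: replace it
            heapq.heapreplace(heap, -key)
    return sorted(-x for x in heap)

# Example: the smallest run can be fed to a merge-join stage early.
print(top_k_sorted([42, 7, 19, 3, 88, 5, 61], k=4))  # [3, 5, 7, 19]
```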
Daisuke ISHII Takanori HARA Kenichi HIGUCHI
In this paper, we investigate methods for clustering user equipment (UE)-specific transmission access points (APs) in downlink cell-free multiple-input multiple-output (MIMO), assuming that the APs distributed over the system coverage know only part of the instantaneous channel state information (CSI). As a beamforming (BF) method based on partial CSI, we use a layered partially non-orthogonal zero-forcing (ZF) method based on channel-matrix muting, which is applicable when a different group of transmitting APs is selected for each UE under partial CSI. We propose two AP clustering methods. Both first tentatively determine the transmitting APs independently for each UE and then iteratively update the transmitting APs for each UE based on the estimated throughput, taking the interference among UEs into account. One of the two methods introduces, for each UE, a cluster of UEs into the iterative updates of the transmitting APs to balance throughput performance and scalability. Computer simulations show that the proposed methods achieve higher geometric-mean and worst-user throughput than the conventional methods.
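A minimal numeric sketch of the two-step structure (tentative per-UE selection, then interference-aware iterative updates). The SINR model, gains, and cluster size below are toy assumptions for illustration, not the paper's system model or throughput estimator:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N_AP, N_UE, P, CLUSTER = 12, 4, 1.0, 3
G = rng.exponential(1.0, size=(N_AP, N_UE))  # toy large-scale channel gains

def rates(ap_sets):
    """Toy per-UE rate: desired power from a UE's own APs, interference
    from every other UE's serving APs (no precoding modeled)."""
    out = []
    for u in range(N_UE):
        sig = P * sum(G[a, u] for a in ap_sets[u])
        intf = P * sum(G[a, u] for v in range(N_UE) if v != u
                       for a in ap_sets[v])
        out.append(np.log2(1.0 + sig / (intf + 1e-3)))
    return out

# Step 1: tentative per-UE selection using only the strongest gains.
ap_sets = [tuple(np.argsort(-G[:, u])[:CLUSTER]) for u in range(N_UE)]

# Step 2: iteratively re-pick each UE's AP cluster to maximize the
# geometric-mean rate, now accounting for inter-UE interference.
for _ in range(5):
    for u in range(N_UE):
        ap_sets[u] = max(
            itertools.combinations(range(N_AP), CLUSTER),
            key=lambda c: np.prod(rates(ap_sets[:u] + [c] + ap_sets[u + 1:])),
        )

print(ap_sets)
print([round(r, 2) for r in rates(ap_sets)])
```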
Yuta MINAMIKAWA Kazumasa SHINAGAWA
Secure computation is a cryptographic technique that enables a function to be computed while keeping the input data secret. Komano and Mizuki (International Journal of Information Security 2022) proposed a model of coin-based protocols, which are secure computation protocols using physical coins. They designed AND, XOR, and COPY protocols using so-called hand operations, which move coins from one player's palm to the other. However, hand operations cannot be executed when all players' hands are occupied. In this paper, we propose coin-based protocols without hand operations. In particular, we design a three-coin NOT protocol, a seven-coin AND protocol, a six-coin XOR protocol, and a five-coin COPY protocol, none of which uses hand operations. Our protocols use only random flips as shuffle operations, and they suffice to compute any function because their inputs and outputs have the same format, i.e., they are committed-format protocols.
Artificial intelligence and the Internet of Things have benefited from technological advances and new automated computer system technologies, and it is now possible to integrate them into a single offline industrial system. This is accomplished through machine-to-machine communication, which eliminates the human factor. The purpose of this article is to examine security systems for machine-to-machine communication that rely on identification and authentication algorithms for real-time monitoring. The article investigates security methods for quickly resolving data processing issues by using the Security Operations Center's main machine to identify and authenticate devices from 19 different machines. The results indicate that when machines are running offline and performing various tasks, both individual machines and the system as a whole can be exposed to data leaks and malware attacks. The study examines the operation of 19 computers, 7 of which were subjected to data leakage and malware attacks. AnyLogic software is used to create visual representations of the results using wireless networks and algorithms based on previously processed methods. The W76S is used as a protective element within intelligent sensors owing to its built-in memory protection. For machine 4, the data leakage time under malware attack was 70 s; for machine 10, it was 150 s with 3 attacks; machine 15 experienced 6 malware attacks over 190 s; and machine 19 experienced 7 malware attacks over 200 s. These figures indicate that attempts to hack a system increase the risk of damaging a device, potentially causing the entire system of connected devices to fail. Thus, illegal attacks using malware can be identified over time, and their effects on data processing can be prevented by intelligent control. The results reveal that applying protocol-based identification and authentication methods increases cyber-physical system security while also allowing real-time monitoring of offline system security.
In this study, we consider data compression with side information available at both the encoder and the decoder. The information source is assigned a variable-length code that does not have to satisfy the prefix-free constraint. We define several classes of codes whose codeword lengths and error probabilities satisfy worst-case criteria in terms of the side information. As a main result, we establish the exact first-order asymptotics together with second-order bounds scaled as Θ(√n) as the blocklength n increases, under the regime of non-vanishing error probabilities. To obtain this result, we also derive one-shot bounds by employing the cutoff operation.
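For orientation, second-order results of this kind typically take the following generic shape; the symbols below are placeholders for illustration, not the paper's exact quantities:

\[
n R^{*} + a_1 \sqrt{n} + o(\sqrt{n}) \;\le\; L^{*}(n, \varepsilon) \;\le\; n R^{*} + a_2 \sqrt{n} + o(\sqrt{n}),
\]

where \(L^{*}(n, \varepsilon)\) denotes the minimum worst-case codeword length at blocklength \(n\) under a fixed error probability \(\varepsilon \in (0,1)\), \(R^{*}\) is the first-order coding rate (a conditional-entropy quantity when side information is available at both terminals), and \(a_1 \le a_2\) are constants. The Θ(√n) statement asserts matching √n-order upper and lower bounds of this form.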
Adiabatic logic circuits are regarded as one of the most attractive solutions for low-power circuit design. This study is dedicated to optimizing the design of the Two-Level Adiabatic Logic (2LAL) circuit, which has a relatively simple structure and superior low-power performance among the many asymptotically adiabatic or quasi-adiabatic logic families, but requires a large number of timing buffers for "decompute". We focus on the "early decompute" technique for fully pipelined 2LAL and propose two ILP approaches that minimize hardware cost by optimizing early decompute. In the first approach, the problem is formulated as a kind of scheduling problem, while in the second it is reformulated as a node-selection problem (a stable set problem). The performance of the proposed methods is evaluated on several benchmark circuits from ISCAS-85, and a hardware reduction of up to 70% is observed compared with an existing method.
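To make the node-selection formulation concrete, here is a hedged sketch of a maximum-weight stable set ILP in PuLP. The graph, weights, and variable names are generic placeholders, not the paper's conflict graph for early decompute:

```python
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary

# Toy conflict graph: nodes are candidate "early decompute" choices,
# edges connect mutually incompatible choices.
nodes = range(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
saving = {0: 3, 1: 2, 2: 4, 3: 1, 4: 2, 5: 3}  # buffers saved per choice

prob = LpProblem("early_decompute_stable_set", LpMaximize)
x = {v: LpVariable(f"x_{v}", cat=LpBinary) for v in nodes}

# Maximize the total hardware saving of the selected set.
prob += lpSum(saving[v] * x[v] for v in nodes)

# Stability: at most one endpoint of each conflict edge may be selected.
for u, v in edges:
    prob += x[u] + x[v] <= 1

prob.solve()
print([v for v in nodes if x[v].value() == 1])
```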
Ryota KOBAYASHI Takanori HARA Yasuaki YUDA Kenichi HIGUCHI
This paper extends our previously reported non-orthogonal multiple access (NOMA)-based highly efficient and low-latency hybrid automatic repeat request (HARQ) method for ultra-reliable low-latency communications (URLLC) to the case with inter-base-station cooperation. In the proposed method, delay-sensitive URLLC packets are preferentially multiplexed with best-effort enhanced mobile broadband (eMBB) packets in the same channel using superposition coding, reducing the transmission latency of URLLC packets while alleviating the throughput loss in eMBB. Although data transmission to a URLLC terminal is conducted by multiple base stations based on inter-base-station cooperation, the proposed method allocates radio resources, namely scheduling (bandwidth allocation) and power allocation, to URLLC terminals independently at each base station to achieve the short transmission latency required for URLLC. To avoid excessive radio resource assignment to URLLC terminals due to this independent assignment, which could degrade the throughput of eMBB terminals, we employ an adaptive path-loss-dependent weighting approach in the scheduling-metric calculation. This achieves appropriate radio resource assignment to URLLC terminals while reducing the packet error rate (PER) and transmission delay thanks to inter-base-station cooperation. We show that the proposed method significantly improves the overall performance of a system that provides simultaneous eMBB and URLLC services.
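A minimal sketch of what a path-loss-dependent weight in a proportional-fair-style scheduling metric could look like. The weight function, reference path loss, and parameters below are assumptions for illustration only, not the paper's design:

```python
def scheduling_metric(inst_rate, avg_tput, path_loss_db, is_urllc,
                      pl_ref_db=100.0, alpha=0.1):
    """Proportional-fair metric with a path-loss-dependent URLLC weight.

    URLLC terminals closer to this base station (lower path loss) get a
    larger weight, which discourages distant base stations from also
    spending resources on the same terminal.
    """
    weight = 1.0
    if is_urllc:
        weight = max(0.0, 1.0 + alpha * (pl_ref_db - path_loss_db))
    return weight * inst_rate / max(avg_tput, 1e-9)

# A nearby URLLC user outranks a distant one at identical rates.
print(scheduling_metric(10.0, 5.0, 90.0, True))   # weight 2.0 -> metric 4.0
print(scheduling_metric(10.0, 5.0, 110.0, True))  # weight 0.0 -> metric 0.0
```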
Kengo TAJIRI Ryoichi KAWAHARA Yoichi MATSUO
Machine learning (ML) has been used for various network operation tasks in recent years. However, as networks have grown in scale and the amount of generated data has increased, it has become increasingly difficult for network operators to conduct these tasks with a single server. Thus, ML with edge-cloud cooperation has been attracting attention for efficiently processing and analyzing large amounts of data. In this setting, the transmission latency, bandwidth congestion, and accuracy of ML tasks all depend on how the data-processing load is balanced between the edge servers and the cloud server, but this relationship is too complex to estimate directly. In this paper, we focus on monitoring anomalous traffic as an example ML task for network operations and formulate the transmission latency, bandwidth congestion, and task accuracy under edge-cloud cooperation in terms of the ratio of the amount of data preprocessed at the edge servers to that at the cloud server. Moreover, using this formulation, we pose an optimization problem that selects the proper ratio under constraints on transmission latency and bandwidth congestion. By solving the optimization problem, the optimal load balance between the edge servers and the cloud server can be selected, and the accuracy of anomalous traffic monitoring can be estimated. Our formulation and optimization framework can be applied to other ML tasks by considering the generating distribution of the data and the type of ML model. In accordance with our formulation, we simulated the optimal load balance of edge-cloud cooperation on a topology that mimics a Japanese network and conducted an anomalous-traffic detection experiment using real traffic data to compare the accuracy estimated by our formulation with the actual accuracy obtained in the experiment.
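A hedged sketch of the constrained selection of the edge/cloud ratio. The latency, bandwidth, and accuracy models below are hypothetical stand-ins for the paper's formulations; only the structure (maximize accuracy subject to latency and bandwidth constraints over the ratio r) mirrors the abstract:

```python
import numpy as np

# Hypothetical models in the edge-preprocessing ratio r in [0, 1]:
# more edge preprocessing reduces transmitted volume (lower latency and
# bandwidth use) but loses information for the cloud model (accuracy).
def latency(r):    return 2.0 - 1.5 * r          # seconds
def bandwidth(r):  return 100.0 - 80.0 * r       # Mbit/s
def accuracy(r):   return 0.95 - 0.10 * r ** 2   # detection accuracy

L_MAX, B_MAX = 1.5, 80.0  # operational constraints

# Pick the ratio maximizing estimated accuracy subject to the latency
# and bandwidth constraints (simple grid search over r).
candidates = [(accuracy(r), r) for r in np.linspace(0.0, 1.0, 1001)
              if latency(r) <= L_MAX and bandwidth(r) <= B_MAX]
acc_opt, r_opt = max(candidates)
print(f"optimal edge ratio r* = {r_opt:.3f}, estimated accuracy = {acc_opt:.3f}")
```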
Song LIU Jie MA Chenyu ZHAO Xinhe WAN Weiguo WU
GPUs have become the dominant computing units for meeting the high-performance needs of various computational fields. However, long operation latencies cause on-chip computing resources to be underutilized, degrading performance when running parallel tasks on GPUs. A good warp scheduling strategy is an effective way to hide latency and improve resource utilization, yet most current warp scheduling algorithms on GPUs ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. LFWS filters warps in the ready state into a ready queue and updates the queue promptly as warp status changes. It divides the warps in the ready queue into long-operation and short-operation groups based on the type of operation in their instruction buffers and gives higher priority to the long-operation warps. This effectively allows long operations to hide part of each other's latency and enhances the system's overall latency-hiding ability. To verify its effectiveness, we implement the LFWS algorithm on the GPGPU-Sim simulation platform. Experiments are conducted over various CUDA applications to evaluate the performance of LFWS against five other warp scheduling algorithms. The results show that LFWS achieves average performance improvements of 8.01% and 5.09% over three traditional and two novel warp scheduling algorithms, respectively, effectively improving computational resource utilization on the GPU.
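A hedged, host-side sketch of the grouping-and-priority rule described above (warp state is mocked in plain Python; the set of "long" operations and the oldest-first tie-break are assumptions, and a real implementation would live inside the GPGPU-Sim scheduler):

```python
from collections import namedtuple

# next_op is the operation type taken from the warp's instruction buffer.
Warp = namedtuple("Warp", "wid ready next_op")

LONG_OPS = {"load_global", "store_global", "sfu"}  # assumed long-latency ops

def lfws_pick(warps):
    """Long-operation-first: among ready warps, prefer those whose next
    instruction is a long-latency operation, so that long operations
    overlap and hide each other's latency."""
    ready = [w for w in warps if w.ready]
    long_group = [w for w in ready if w.next_op in LONG_OPS]
    short_group = [w for w in ready if w.next_op not in LONG_OPS]
    # Oldest-first within each group; the long group is served first.
    ordered = (sorted(long_group, key=lambda w: w.wid)
               + sorted(short_group, key=lambda w: w.wid))
    return ordered[0] if ordered else None

warps = [Warp(0, True, "fadd"), Warp(1, True, "load_global"), Warp(2, False, "fmul")]
print(lfws_pick(warps))  # Warp 1: the long operation is issued first
```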
Noriko YUASA Masahiro YAMAGUCHI Kosuke SHIMA Takanobu OTSUKA
At manufacturing sites, mass customization is expanding along with the increasing variety of customer needs. This complicates production planning for factory managers, and production plans are liable to change suddenly at the manufacturing site. Because such sudden fluctuations in production occur frequently, it is particularly difficult to optimize the parts supply operations in these production processes. As a solution to such problems, Industry 4.0 has promoted the use of digital technologies at manufacturing sites; however, these solutions can be expensive and time-consuming to introduce, so not all factory managers are favorable toward them. In this study, we propose a method to support parts supply operations that decreases work stagnation and fluctuation without relying on the experience of the workers who supply parts across the various production processes. Furthermore, we constructed a system that is inexpensive and easy to introduce, using both LPWA and BLE communications; its purpose is to level out the work in in-process logistics. In an experiment, the proposed method was introduced at a manufacturing site, and we compared how the workload of the site's workers changed. The experimental results show that the proposed method is effective for workload leveling in parts supply operations.
Katsuhiko ISHIKAWA Taro MURAKAMI Mikiya TANIGUCHI
This study examined whether distance learning in the first unit of instruction of a first-year PBL course improves the effectiveness of subsequent group-work learning compared with face-to-face learning. The first-year PBL course consisted of three units: an input unit, a group-work unit, and an outcomes-presentation unit. In 2017 and 2018, the input unit was conducted in the classroom with face-to-face learning; in 2017, a workshop was held in addition to the face-to-face classroom learning. In 2020 and 2021, the input unit was conducted via distance learning. In each year, approximately 100 students completed the questionnaire. A preliminary check confirmed that the average scores of students' self-assessments of their own social skills did not differ significantly among the four years. The analysis showed that in 2018, perceived efficacy in the group-work unit depended on learners having high social skills, whereas in 2017, 2020, and 2021 it did not depend on learners' social skills. This suggests that distance learning, or face-to-face learning supplemented with a workshop, in the units placed before the group-work unit facilitates the learning efficacy of the group-work unit better than face-to-face learning alone, even for students with social-skill concerns.
Tatsuya SATO Taku SHIMOSAWA Yosuke HIMURA
Enterprises have paid attention to consortium blockchains, such as Hyperledger Fabric, one of the most promising platforms, for efficient decentralized transactions that do not depend on any particular organization. A consortium blockchain-based system is typically built across multiple organizations, and in such systems, operating the system across organizations in a decentralized manner is essential to preserve the value of introducing a consortium blockchain. Decentralized system operations have recently become realistic with the evolution of consortium blockchains. For instance, with the release of Hyperledger Fabric v2.x, individual operational tasks for a blockchain network, such as executing configuration changes of channels (Fabric's sub-networks) and upgrading chaincodes (Fabric's smart contracts), can be partially executed in a decentralized manner. However, the operations workflows also include a preceding procedure in which the operational information (e.g., configuration parameters) is pre-shared, coordinated, and pre-agreed among the organizations before the operations can be executed, and this procedure relies on costly manual tasks. To realize efficient decentralized operations workflows for consortium blockchain-based systems in general, we propose a decentralized inter-organizational operations method, called Operations Smart Contract (OpsSC), which defines an operations workflow as a smart contract. Furthermore, we design and implement OpsSC for blockchain network operations with Hyperledger Fabric v2.x. This paper presents OpsSC for operating channels and chaincodes, which are essential for managing blockchain networks, and clarifies detailed workflows for those operations. A cost evaluation based on an estimation model shows that the total operational cost of a typical scenario, adding an organization to a blockchain network with ten organizations, could be reduced by 54 percent compared with a conventional script-based method. The implementation of OpsSC has been open-sourced and registered as a Hyperledger Labs project, which hosts experimental projects approved by Hyperledger.
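A hedged, chain-agnostic sketch of the propose/vote/execute pattern that a smart-contract-encoded operations workflow follows. Plain Python stands in for chaincode here; the class, majority rule, and method names are illustrative assumptions, not OpsSC's actual interface:

```python
class OpsWorkflow:
    """Toy stand-in for an operations smart contract: a proposal holds
    the shared operational parameters on-ledger, organizations vote,
    and execution is unlocked only after a majority agrees."""

    def __init__(self, orgs):
        self.orgs = set(orgs)
        self.proposals = {}  # pid -> {"params", "votes", "done"}

    def propose(self, pid, params):
        self.proposals[pid] = {"params": params, "votes": set(), "done": False}

    def vote(self, pid, org):
        assert org in self.orgs
        self.proposals[pid]["votes"].add(org)

    def execute(self, pid):
        p = self.proposals[pid]
        if len(p["votes"]) * 2 > len(self.orgs) and not p["done"]:
            p["done"] = True  # here: trigger channel config update / chaincode upgrade
            return f"executed with {p['params']}"
        return "not yet approved"

wf = OpsWorkflow(["Org1", "Org2", "Org3"])
wf.propose("add-org", {"new_org": "Org4"})
wf.vote("add-org", "Org1")
wf.vote("add-org", "Org2")
print(wf.execute("add-org"))  # majority reached -> executed
```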
Yun WU Yu SHI Jieming YANG Lishan BAO Chunzhe LI
In Artificial Intelligence for IT Operations (AIOps) scenarios, the KPI (Key Performance Indicator) is a very important operation and maintenance monitoring signal, and KPI anomaly detection has become a research hot spot in recent years. Aiming at the low detection efficiency and insufficient representation learning of existing methods, this paper proposes a fast clustering-based KPI anomaly detection method, HCE-DWL. The method first combines hierarchical agglomerative clustering (HAC) with deep assignment based on CNN embeddings (CE) to cluster the KPI data (the HCE step), thereby improving clustering efficiency. It then assigns weights to the centroid of each KPI cluster and to its Transformed Outlier Scores (TOS), and finally feeds them into a LightGBM model for detection (the Double-Weight LightGBM model, DWL). Comparative experimental analysis shows that the algorithm can effectively improve the efficiency and accuracy of KPI anomaly detection.
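A hedged sketch of the cluster-then-boost pipeline shape using scikit-learn and LightGBM. The data is synthetic, raw windows stand in for the CNN embeddings, and the centroid-distance feature is a simplified proxy for the paper's weighted centroid/TOS features:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
import lightgbm as lgb

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 24))            # windowed KPI series as vectors
y = (rng.random(500) < 0.1).astype(int)   # synthetic anomaly labels

# Stage 1: hierarchical agglomerative clustering of KPI windows
# (a learned CNN embedding would replace the raw windows here).
labels = AgglomerativeClustering(n_clusters=8).fit_predict(X)

# Stage 2: distance to the cluster centroid as a simple outlier score.
cents = np.stack([X[labels == c].mean(axis=0) for c in range(8)])
dist = np.linalg.norm(X - cents[labels], axis=1, keepdims=True)

# Stage 3: gradient-boosted detector over raw features + cluster evidence.
feats = np.hstack([X, labels[:, None], dist])
clf = lgb.LGBMClassifier(n_estimators=100)
clf.fit(feats, y)
print(clf.predict(feats[:5]))
```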
For many countries, 5G is of strategic significance. In the 5G era, telecom operators are expected to enable and provide multiple services with different communication characteristics, such as enhanced broadband, ultra-reliable, and extreme real-time communications, at the same time. To meet these requirements, the 5G network will inherently be more complex than traditional 3G/4G networks. The unique characteristics of 5G that result from its new technologies bring many opportunities as well as significant challenges. In this paper, we first introduce the 5G vision and review the global status. We then illustrate the technical essentials of 5G and point out the new opportunities it will bring. We also highlight the coming challenges and share our 5G experience and solutions toward the 5G vision in many aspects, including network, management, and business.
Kento SUGIURA Yoshiharu ISHIKAWA
With the rapid increase in the number of CPU cores, software that can utilize many cores is required. Lock-free algorithms based on compare-and-swap (CAS) operations are one class of concurrency control methods for implementing such multi-threaded software. A multi-word CAS (MwCAS) operation extends CAS to swap multiple words atomically. However, we noticed that the performance of the existing MwCAS implementation is limited by garbage collection, even in low-contention environments. To achieve high performance in low-contention workloads, we propose a new MwCAS algorithm without garbage collection. Experimental results show that our approach is three to five times faster than an implementation with garbage collection in low-contention workloads. Moreover, the performance of the proposed method is also superior in high-contention environments.
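A conceptual Python sketch of the descriptor-based install-then-commit structure behind MwCAS (the single-word CAS is simulated with a lock here; real implementations use hardware CAS in C/C++, and this is not the paper's algorithm, which specifically avoids the descriptor-reclamation problem sketched in the comment):

```python
import threading

class Word:
    """A memory word with a simulated atomic compare-and-swap."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()  # stands in for the hardware CAS

    def cas(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False

def mwcas(entries):
    """All-or-nothing multi-word CAS, sketched as install-then-commit.

    Phase 1 installs a unique descriptor into every target word; phase 2
    commits the new values (or rolls back on conflict). Real lock-free
    designs let *helping* threads finish phase 2, which is why descriptor
    reclamation (garbage collection) is the usual bottleneck.
    """
    desc = object()  # unique sentinel marking "operation in progress"
    installed = []
    for word, expected, _ in entries:             # phase 1: install
        if word.cas(expected, desc):
            installed.append((word, expected))
        else:                                      # conflict: roll back
            for w, exp in installed:
                w.cas(desc, exp)
            return False
    for word, _, new in entries:                   # phase 2: commit
        word.cas(desc, new)
    return True

a, b = Word(1), Word(2)
print(mwcas([(a, 1, 10), (b, 2, 20)]), a.value, b.value)  # True 10 20
```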
Yuki IMAI Shinichi NISHIZAWA Kazuhito ITO
Environmental power generation devices such as solar cells are used as power sources for IoT devices. Because of the large internal resistance of such power sources, the LSIs in IoT devices may malfunction when an LSI operates at high speed, a large current flows, and the supply voltage drops. In this paper, a standard cell library of stacked-structure cells is proposed to increase the delay of logic circuits within the range not exceeding the clock cycle, thereby reducing the maximum current of the LSIs. We show that the maximum power consumption of LSIs can be reduced without increasing their energy consumption.
Huy H. NGUYEN Minoru KURIBAYASHI Junichi YAMAGISHI Isao ECHIZEN
Deep neural networks (DNNs) have achieved excellent performance on several tasks and have been widely applied in both academia and industry. However, DNNs are vulnerable to adversarial machine learning attacks in which noise is added to the input to change the network's output. Consequently, DNN-based mission-critical applications, such as those used in self-driving vehicles, have reduced reliability and could cause severe accidents and damage. Moreover, adversarial examples could be used to poison DNN training data, corrupting the trained models. Besides detecting adversarial examples, correcting them is important for restoring data and system functionality to normal. We have developed methods for detecting and correcting adversarial images that use multiple image processing operations, each with multiple parameter values. For detection, we devised a statistics-based method that outperforms the feature squeezing method. For correction, we devised a method that, for the first time, uses two levels of correction. The first level is label correction, which focuses on restoring the adversarial images' original predicted labels (for use in the current task). The second level is image correction, which focuses on both the correctness and the quality of the corrected images (for use in the current and other tasks). Our experiments demonstrated that the correction method could correct nearly 90% of the adversarial images created by classical adversarial attacks while affecting only about 2% of the normal images.
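A hedged sketch of the general detection idea (prediction instability under benign image processing operations with several parameter settings). The model, the operations, and the threshold below are stand-ins; the paper's statistical method differs in its details:

```python
import numpy as np

def detect_adversarial(image, predict, ops, tau=0.5):
    """Flag an input whose predictions are unstable under mild image
    processing operations (blur, requantization, ...).

    `predict` maps an image to a probability vector; `ops` is a list of
    image -> image functions covering several parameter settings.
    """
    base = predict(image)
    shifts = [np.abs(base - predict(op(image))).sum() / 2  # total variation
              for op in ops]
    return max(shifts) > tau  # a large shift under a benign op is suspicious

# Toy usage with stand-in components.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
predict = lambda x: np.array([x.mean(), 1 - x.mean()])
ops = [lambda x, s=s: np.round(x * s) / s for s in (2, 4, 8)]  # requantize
print(detect_adversarial(img, predict, ops, tau=0.2))
```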
Yoichi MATSUO Tatsuaki KIMURA Ken NISHIMATSU
When a failure occurs in a network element, such as a switch, router, or server, network operators need to recognize the service impact, such as the time to recovery from the failure or the severity of the failure, since the service impact is essential information for handling failures. In this paper, we propose a Deep-learning-based Service Impact Prediction system (DeepSIP), which predicts the service impact of a failure in a network element using a temporal multimodal convolutional neural network (CNN). More precisely, DeepSIP predicts the time to recovery from the failure and the loss of traffic volume due to the failure on the basis of information from syslog messages and traffic volume. Since the time to recovery is useful for service level agreements (SLAs) and the loss of traffic volume is directly related to the severity of the failure, we regard these two quantities as the service impact. The service impact is challenging to predict, since it depends on the type of network failure and on the traffic volume when the failure occurs; moreover, network elements do not explicitly contain any information about the service impact. To extract the type of network failure and predict the service impact, we use syslog messages and past traffic volume. However, syslog messages and traffic volume are themselves challenging to analyze because they are multimodal, strongly correlated, and temporally dependent. To extract useful features for prediction, we develop a temporal multimodal CNN. We experimentally evaluated the accuracy of DeepSIP by comparing it with other NN-based methods on synthetic and real datasets. For both datasets, the results show that DeepSIP outperformed the baselines.
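A hedged PyTorch sketch of what a temporal multimodal CNN with per-modality branches could look like. The layer sizes, window length, and fusion scheme are illustrative assumptions, not DeepSIP's published architecture:

```python
import torch
import torch.nn as nn

class TemporalMultimodalCNN(nn.Module):
    """Sketch: one 1-D conv branch per modality (syslog-derived features
    and traffic volume), fused and regressed to two service-impact
    targets (time to recovery, loss of traffic volume)."""

    def __init__(self, syslog_dim=32, window=60):
        super().__init__()
        self.log_branch = nn.Sequential(
            nn.Conv1d(syslog_dim, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.traffic_branch = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.head = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, syslog_feats, traffic):
        # syslog_feats: (batch, syslog_dim, window); traffic: (batch, 1, window)
        z = torch.cat([self.log_branch(syslog_feats).squeeze(-1),
                       self.traffic_branch(traffic).squeeze(-1)], dim=1)
        return self.head(z)  # (batch, 2): [time_to_recovery, traffic_loss]

model = TemporalMultimodalCNN()
out = model(torch.randn(4, 32, 60), torch.randn(4, 1, 60))
print(out.shape)  # torch.Size([4, 2])
```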
Tatsuki OKUYAMA Nobuhide NONAKA Satoshi SUYAMA Yukihiko OKUMURA Takahiro ASAI
The fifth-generation (5G) mobile communications system initially introduced massive multiple-input multiple-output (M-MIMO) with analog beamforming (BF) to compensate for the larger path loss in millimeter-wave (mmW) bands. To solve the coverage issue and support high mobility in the mmW bands, base station (BS) cooperation technologies have been investigated for high-mobility environments. However, previous works assume a single-mobile-station (MS) scenario and analog BF, which does not suppress interference among MSs. To improve system performance in the mmW bands, fully digital BF, which includes digital precoding, should be employed to suppress this interference even when MSs travel at high speed. This paper proposes two mmW BS cooperation technologies for fully digital BF: inter-baseband-unit (inter-BBU) cooperation and intra-BBU cooperation. Inter-BBU cooperation exploits two M-MIMO antennas in two BBUs connected to one central unit by limited-bandwidth fronthaul, while intra-BBU cooperation coordinates two M-MIMO antennas connected to one BBU with Doppler-frequency-shift compensation. This paper verifies the effectiveness of these BS cooperation technologies through both computer simulations and outdoor experimental trials. First, computer simulations show that intra-BBU cooperation achieves excellent transmission performance with two and four MSs moving at a velocity of 90 km/h. Second, the outdoor experimental trials clarify that inter-BBU cooperation maintains the maximum throughput over a wider area than non-cooperating BSs when a single MS moves at a maximum velocity of 120 km/h.
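For the Doppler-frequency-shift compensation mentioned above, a minimal baseband sketch: the received signal is de-rotated by the estimated Doppler shift. The sample rate and shift value are illustrative only (a 120 km/h UE at 28 GHz would see roughly a 3 kHz shift), and estimating the shift itself is outside this sketch:

```python
import numpy as np

def compensate_doppler(rx, f_doppler, fs):
    """De-rotate a received baseband signal by the estimated Doppler
    frequency shift, so that signals arriving via cooperating M-MIMO
    antennas can be combined coherently despite UE mobility."""
    n = np.arange(len(rx))
    return rx * np.exp(-2j * np.pi * f_doppler * n / fs)

# Toy check: a pure Doppler-shifted tone becomes flat after compensation.
fs, fd = 30.72e6, 2000.0      # sample rate and Doppler shift (Hz; illustrative)
n = np.arange(1024)
rx = np.exp(2j * np.pi * fd * n / fs)
print(np.allclose(compensate_doppler(rx, fd, fs), 1.0))  # True
```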
Zimin ZHAO Ying KANG Aiqin HOU Daguang GAN
Differentiable neural architecture search (DARTS) is now a widely disseminated weight-sharing neural architecture search method, and it consists of two stages: search and evaluation. However, the original DARTS suffers from some well-known shortcomings. First, the width and depth of the network, as well as the operations, are discontinuous between the two stages, which causes a performance collapse. Second, DARTS has a high computational overhead. In this paper, we propose a synchronous progressive approach to solve the discontinuity problem for network depth and width, and we use a 0-1 loss function to alleviate the discontinuity caused by the discretization of operations. The computational overhead is reduced by using partial channel connections. In addition, we discuss and propose a solution to the aggregation of skip operations during the DARTS search process. We conduct extensive experiments on the CIFAR-10 and WANFANG datasets; our approach reduces the search time significantly (from 1.5 to 0.1 GPU days) and improves the accuracy of image recognition.
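A hedged PyTorch sketch of the partial-channel-connection idea used to cut search cost: only a 1/k fraction of the channels passes through the weighted candidate operations while the rest bypass. The candidate set here is trimmed to two operations for brevity, and details of the paper's variant (e.g., any channel shuffling) are omitted:

```python
import torch
import torch.nn as nn

class PartialChannelMixedOp(nn.Module):
    """Mixed operation with partial channel connections: 1/k of the
    channels go through the architecture-weighted candidate ops; the
    remaining channels bypass, reducing search-time memory and compute."""

    def __init__(self, channels, k=4):
        super().__init__()
        self.k = k
        c = channels // k
        self.ops = nn.ModuleList([
            nn.Conv2d(c, c, 3, padding=1, bias=False),  # candidate op 1
            nn.Identity(),                              # candidate op 2 (skip)
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # arch weights

    def forward(self, x):
        c = x.shape[1] // self.k
        active, bypass = x[:, :c], x[:, c:]
        w = torch.softmax(self.alpha, dim=0)
        mixed = sum(wi * op(active) for wi, op in zip(w, self.ops))
        return torch.cat([mixed, bypass], dim=1)

op = PartialChannelMixedOp(channels=16, k=4)
print(op(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```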