Yuto ARIMURA Shigeru YAMASHITA
Stochastic Computing (SC) allows additions and multiplications to be realized with lower power than conventional binary operations if some error is tolerated. However, for many complex functions that cannot be realized with additions and multiplications alone, no generic efficient method is known for computing the function with an SC circuit; such a function must be realized with a generic technique such as polynomial approximation, which may forfeit the advantage of SC. Thus, much research has considered efficient SC realizations of specific functions; an efficient SC square root circuit with a feedback circuit was recently proposed by D. Wu et al. This paper generalizes the SC square root circuit with a feedback circuit; we identify the situation in which a function can be implemented efficiently by an SC circuit with a feedback circuit. As examples of our generalization, we propose SC circuits for n-th root calculation and division. We also analyze the accuracy and hardware costs of our SC circuits; the results show the effectiveness of our method compared with conventional SC designs, and our framework may be able to implement an SC circuit that is better than existing methods in terms of hardware cost or calculation error.
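As a rough illustration of why additions and multiplications are cheap in SC, the following sketch simulates unipolar bitstreams in software. It is not the feedback circuit of the paper; it only shows the standard AND-gate multiplication and multiplexer-based scaled addition. The function names, the stream length, and the seed are assumptions made for this example.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar bitstream (assumed helper)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_stream(bits):
    """Decode a bitstream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    # A single AND gate multiplies two independent unipolar streams.
    return [a & b for a, b in zip(a_bits, b_bits)]

def sc_scaled_add(a_bits, b_bits, select_bits):
    # A 2-to-1 multiplexer driven by a p = 0.5 select stream yields (a + b) / 2.
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 1 << 16                     # longer streams reduce the random error
x = to_stream(0.6, n, rng)
y = to_stream(0.5, n, rng)
sel = to_stream(0.5, n, rng)

product = from_stream(sc_multiply(x, y))          # approximately 0.6 * 0.5
half_sum = from_stream(sc_scaled_add(x, y, sel))  # approximately (0.6 + 0.5) / 2
print(product, half_sum)
```

The results are only approximate, which reflects the error that SC accepts in exchange for small circuits.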
Haochen LYU Jianjun LI Yin YE Chin-Chen CHANG
The purpose of Facial Beauty Prediction (FBP) is to automatically assess facial attractiveness based on human aesthetics. Most neural network-based prediction methods do not consider the ranking information in the task. For scoring tasks like facial beauty prediction, there is abundant ranking information both between images and within images. Reasonable utilization of this information during training can greatly improve the performance of the model. In this paper, we propose a novel end-to-end Convolutional Neural Network (CNN) model based on ranking information of images, incorporating a Rank Module and an Adaptive Weight Module. We also design pairwise ranking loss functions to fully leverage the ranking information of images. Considering training efficiency and model inference capability, we choose ResNet-50 as the backbone network. We conduct experiments on the SCUT-FBP5500 dataset and the results show that our model achieves a new state-of-the-art performance. Furthermore, ablation experiments show that our approach greatly contributes to improving the model performance. Finally, the Rank Module with the corresponding ranking loss is plug-and-play and can be extended to any CNN model and any task with ranking information. Code is available at https://github.com/nehcoah/Rank-Info-Net.
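To make the idea of a pairwise ranking loss concrete, here is a minimal sketch of a generic margin-based (hinge) pairwise loss over a batch of predicted scores. It is not the loss defined in the paper or in the linked repository; the function name and the `margin` parameter are assumptions for illustration.

```python
def pairwise_ranking_loss(pred, target, margin=0.1):
    """Average hinge loss over all pairs whose ground-truth scores differ.

    A pair contributes zero loss when the predicted scores are ordered the
    same way as the targets and separated by at least `margin`.
    """
    total, count = 0.0, 0
    for i in range(len(pred)):
        for j in range(i + 1, len(pred)):
            if target[i] == target[j]:
                continue  # tied targets carry no ranking information
            sign = 1.0 if target[i] > target[j] else -1.0
            total += max(0.0, margin - sign * (pred[i] - pred[j]))
            count += 1
    return total / count if count else 0.0

# Correctly ordered predictions incur no loss; a swapped pair is penalized.
print(pairwise_ranking_loss([3.0, 2.0, 1.0], [5, 4, 3]))
print(pairwise_ranking_loss([1.0, 2.0], [5, 4]))
```

In a training loop such a term would typically be added to a regression loss, so that the network is penalized both for wrong scores and for wrong orderings.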
Rongqi ZHANG Chunyun PAN Yafei WANG Yuanyuan YAO Xuehua LI
With the maturation of 5G technology in recent years, multimedia services such as live video streaming and online games on the Internet have flourished. These multimedia services frequently require low latency, which poses a significant challenge for computing multimedia tasks with strict latency requirements. Mobile edge computing (MEC) is considered a key technology for addressing this challenge. It offloads computation-intensive tasks to edge servers by sinking mobile nodes, which reduces task execution latency and relieves computing pressure on multimedia devices. To use the MEC paradigm reasonably and efficiently, resource allocation has become a new challenge. In this paper, we focus on the multimedia tasks that need to be uploaded and processed in the network. We formulate an optimization problem with the goal of minimizing the latency and energy consumption required to perform tasks in multimedia devices. To solve this complex, non-convex problem, we formulate it as a distributed deep reinforcement learning (DRL) problem and propose a federated Dueling deep Q-network (DDQN) based multimedia task offloading and resource allocation algorithm (FDRL-DDQN). In the algorithm, DRL is trained on the local device, while federated learning (FL) is responsible for aggregating and updating the parameters from the trained local models. Further, to address the non-independent and identically distributed (non-IID) data of multimedia devices, we develop a method for selecting the devices that participate in federation. The simulation results show that the FDRL-DDQN algorithm can reduce the total cost by 31.3% compared to the DQN algorithm when the task data is 1000 kbit, and the maximum reduction can be 35.3% compared to the traditional baseline algorithm.
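The two building blocks named in the abstract can be sketched in a few lines. The first function shows the usual dueling aggregation of a state value and per-action advantages into Q-values; the second shows sample-weighted parameter averaging in the style of FedAvg. Both are generic textbook forms, assumed here for illustration; the paper's actual network, aggregation rule, and device selection method are not reproduced.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

def federated_average(local_weights, sample_counts):
    """Average flat parameter vectors, weighted by each device's sample count."""
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [
        sum(w[k] * n for w, n in zip(local_weights, sample_counts)) / total
        for k in range(dim)
    ]

# One state with three candidate offloading actions (hypothetical numbers).
print(dueling_q_values(1.0, [0.0, 2.0, 4.0]))

# Two devices holding 1 and 3 samples respectively (hypothetical numbers).
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the usual motivation for the dueling form.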
Yuichiro TANAKA Hakaru TAMUKOH
In this study, we introduce a reservoir-based one-dimensional (1D) convolutional neural network that processes time-series data at a low computational cost, and investigate its performance and training time. Experimental results show that the proposed network consumes lower training computational costs and that it outperforms the conventional reservoir computing in a sound-classification task.
Shuyun LUO Wushuang WANG Yifei LI Jian HOU Lu ZHANG
Crowdsourcing has become a popular data-collection method for relieving the burden of high cost and latency in data gathering. Since the users involved in crowdsourcing are volunteers, incentives are needed to encourage them to provide data. However, current incentive mechanisms mostly pay attention to data quantity while ignoring data quality. In this paper, we design a Data-quality awaRe IncentiVe mEchanism (DRIVE) for collaborative tasks based on the Stackelberg game to motivate users to provide high-quality data; its highlight is a dynamic reward allocation scheme based on the proposed data quality evaluation method. To guarantee a real-time response for data quality evaluation, we introduce the mobile edge computing framework. Finally, a case study is given, and its real-data experiments demonstrate the superior performance of DRIVE.
Yeongwoo HA Seongbeom PARK Jieun LEE Sangeun OH
With the recent advances in IoT, there is a growing interest in multi-surface computing, where a mobile app can cooperatively utilize multiple devices' surfaces. We propose a novel framework that seamlessly augments mobile apps with multi-surface computing capabilities. It enables various apps to employ multiple surfaces with acceptable performance.
Yu WANG Liangyong YANG Jilian ZHANG Xuelian DENG
Cloud computing has become the mainstream computing paradigm. More and more data owners (DOs) choose to outsource their data to a cloud service provider (CSP), which is responsible for data management and query processing on behalf of the DOs, so as to cut down their operational costs. However, in real-world applications, the CSP may be untrusted; hence it is necessary to authenticate the query results returned from the CSP. In this paper, we consider the problem of approximate string query result authentication in the context of database outsourcing. Based on the Merkle Hash Tree (MHT) and the Trie, we propose an authenticated tree structure named MTrie for authenticating approximate string query results. We design efficient algorithms for query processing and query result authentication. To verify the effectiveness of our method, we have conducted extensive experiments on real datasets, and the results show that our proposed method can effectively authenticate approximate string query results.
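For readers unfamiliar with the Merkle Hash Tree that MTrie builds on, the following sketch shows a plain binary MHT over a list of strings: the data owner publishes the root, the server returns a record with its sibling hashes, and the client recomputes the root. This is only the generic MHT idea, not the MTrie structure; the function names and the choice of SHA-256 are assumptions for this example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    # Duplicate the last node when the level has an odd number of nodes.
    if len(level) % 2:
        level = level + [level[-1]]
    return level, [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    """Root digest of a non-empty list of byte strings."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        _, level = _next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each tagged with 'sibling is on the left'."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        padded, parents = _next_level(level)
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level, index = parents, index // 2
    return proof

def verify(leaf, proof, root):
    digest = h(leaf)
    for sibling, sibling_is_left in proof:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root

records = [b"apple", b"apply", b"angle"]
root = merkle_root(records)            # published by the data owner
proof = merkle_proof(records, 2)       # returned by the server with the result
print(verify(b"angle", proof, root), verify(b"ankle", proof, root))
```

A tampered record fails verification because its recomputed root no longer matches the published one.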
Lin CHEN Xueyuan YIN Dandan ZHAO Hongwei LU Lu LI Yixiang CHEN
ARM chips, with their low energy consumption and low investment cost, have been rapidly applied to smart office and smart entertainment scenarios, including cloud mobile phones and cloud games. This paper first summarizes the key technologies and development status of these scenarios, including CPU, memory, and IO hardware virtualization characteristics, ARM hypervisors and containers, GPU virtualization, network virtualization, resource management, and remote transmission technologies. Then, in view of the current lack of publicly referenced solutions for constructing an ARM cloud, this paper proposes and constructs an implementation framework for building an ARM cloud, focusing in turn on the formal definition of the virtualization framework, the Android container system and resource quota management methods, GPU virtualization based on API remoting and GPU pass-through, and the remote transmission technology. Finally, the experimental results show that the proposed model and the corresponding component implementation methods are effective; in particular, the pass-through mode for virtualizing GPU resources achieves higher performance and higher parallelism.
Adiabatic logic circuits are regarded as one of the most attractive solutions for low-power circuit design. This study is dedicated to optimizing the design of the Two-Level Adiabatic Logic (2LAL) circuit, which boasts a relatively simple structure and superior low-power performance among many asymptotically adiabatic or quasi-adiabatic logic families, but suffers from a large number of timing buffers for “decompute”. Our focus is on the “early decompute” technique for fully pipelined 2LAL, and we propose two ILP approaches for minimizing hardware cost through optimization of early decompute. In the first approach, the problem is formulated as a kind of scheduling problem, while in the second it is reformulated as a node selection problem (stable set problem). The performance of the proposed methods is evaluated using several benchmark circuits from ISCAS-85, and a maximum hardware reduction of 70% is observed compared with an existing method.
Jiaxuan LU Yutaka MASUDA Tohru ISHIHARA
Approximate computing (AC) saves energy and improves performance by introducing approximation into computation in error-tolerant applications. This work focuses on an AC strategy that accurately performs important computations and approximates others. To make AC circuits practical, we need to carefully determine how important each computation is, and then appropriately approximate the redundant computations while maintaining the required computational quality. In this paper, we focus on the importance of computations at the flip-flop (FF) level and propose a novel importance evaluation methodology. The key idea of the proposed methodology is a two-step fault injection (FI) algorithm to extract the near-optimal set of redundant FFs in the circuit. In the first step, the proposed methodology performs the FI simulation for each FF and extracts the candidates of redundant FFs. Then, in the second step, the proposed methodology extracts the set of redundant FFs in a binary search manner. Thanks to the two-step strategy, the proposed algorithm reduces the complexity of architecture exploration from an exponential order to a linear order without requiring an understanding of the functionality and behavior of the target application program. Experimental results show that the proposed methodology identifies the candidates of redundant FFs depending on the given constraints. In a case study of an image processing accelerator, truncating the identified redundant FFs reduces the circuit area by 29.6% and the power dissipation by 44.8% under the ASIC implementation while satisfying the PSNR constraint. Similarly, the dynamic power dissipation is reduced by 47.2% under the FPGA implementation.
Daisuke HIBINO Tomoharu SHIBUYA
Distributed computing is a powerful solution for computational tasks that require massive datasets. Lagrange coded computing (LCC), proposed by Yu et al. [15], realizes private and secure distributed computing in the presence of stragglers, malicious workers, and colluding workers by using an encoding polynomial. Since the encoding polynomial depends on the dataset, it must be updated every time a new dataset arrives. Therefore, an efficient algorithm for constructing the encoding polynomial is necessary. In this paper, we propose Newton coded computing (NCC), which is based on Newton interpolation, to construct the encoding polynomial. Let K, L, and T be the number of data, the length of each data, and the number of colluding workers, respectively. Then, the computational complexity of constructing an encoding polynomial is improved from O(L(K+T) log^2(K+T) log log(K+T)) for LCC to O(L(K+T) log(K+T)) for the proposed method. Furthermore, by applying the proposed method, the computational complexity of updating the encoding polynomial is improved from O(L(K+T) log^2(K+T) log log(K+T)) for LCC to O(L) for the proposed method.
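The reason Newton interpolation suits incremental updates can be seen in a small sketch: adding a data point appends one new coefficient and leaves all existing coefficients untouched, whereas the Lagrange form changes every basis polynomial. The class below is a generic divided-difference implementation over exact rationals, assumed for illustration; it does not reproduce the finite-field arithmetic or the complexity figures of the proposed method.

```python
from fractions import Fraction

class NewtonInterpolator:
    """Newton-form interpolating polynomial that grows one point at a time."""

    def __init__(self):
        self.xs = []       # interpolation nodes x_0, x_1, ...
        self.coeffs = []   # divided differences f[x_0], f[x_0, x_1], ...
        self._diag = []    # _diag[k] = f[x_{n-k}, ..., x_n] for the latest node x_n

    def add_point(self, x, y):
        x, y = Fraction(x), Fraction(y)
        new_diag = [y]
        n = len(self.xs)
        for k in range(1, n + 1):
            # f[x_{n-k}..x_n] from f[x_{n-k+1}..x_n] and f[x_{n-k}..x_{n-1}]
            new_diag.append((new_diag[k - 1] - self._diag[k - 1]) / (x - self.xs[n - k]))
        self.xs.append(x)
        self._diag = new_diag
        self.coeffs.append(new_diag[-1])  # earlier coefficients are unchanged

    def evaluate(self, x):
        x = Fraction(x)
        result = Fraction(0)
        # Horner-style evaluation of c_0 + c_1 (x - x_0) + c_2 (x - x_0)(x - x_1) + ...
        for c, xi in zip(reversed(self.coeffs), reversed(self.xs)):
            result = result * (x - xi) + c
        return result

p = NewtonInterpolator()
for xi, yi in [(0, 1), (1, 2), (2, 5)]:   # samples of x^2 + 1
    p.add_point(xi, yi)
before = list(p.coeffs)
p.add_point(3, 10)                        # a new sample arrives
print(before, p.coeffs, p.evaluate(4))
```

In this sketch each update costs time linear in the number of existing nodes for a scalar data item; it is meant only to show the append-only structure of the Newton form.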
Information-theoretic security and computational security are fundamental paradigms of security in the theory of cryptography. The two paradigms interact with each other but have shown different progress, which motivates us to explore the intersection between them. In this paper, we focus on Multi-Party Computation (MPC) because the security of MPC is formulated by simulation-based security, which originates from computational security, even if it requires information-theoretic security. We provide several equivalent formalizations of the security of MPC under a semi-honest model from the viewpoints of information theory and statistics. The interpretations of these variants are so natural that they support the other aspects of simulation-based security. Specifically, the variants based on conditional mutual information and sufficient statistics are interesting because security proofs for those variants can be given by information measures and factorization theorem, respectively. To exemplify this, we show several security proofs of BGW (Ben-Or, Goldwasser, Wigderson) protocols, which are basically proved by constructing a simulator.
Secure two-party computation is a cryptographic tool that enables two parties to compute a function jointly without revealing their inputs. It is known that any function can be realized in the correlated randomness (CR) model, where a trusted dealer distributes input-independent CR to the parties beforehand. Sometimes we can construct a more efficient secure two-party protocol for a function g than for a function f, where g is a restriction of f. However, it is not known in which cases a more efficient protocol can be constructed for a domain-restricted function. In this paper, we focus on the size of CR. We prove that a more efficient protocol can be constructed for a domain-restricted function when there is a “good” structure in the CR space of a protocol for the original function, and we show a unified way to construct a more efficient protocol in such a case. In addition, we show two applications of this result: the first shows that some known techniques for reducing the CR size for domain-restricted functions can be derived in a unified way, and the second shows that our result yields a more efficient protocol than an existing one.
Ayano NAKAI-KASAI Naoyuki HAYASHI Tadashi WADAYAMA
In this paper, we consider precoder design for wireless data aggregation in sensor networks. The precoder optimization problem can be formulated as minimization of the mean squared error under transmit power and block diagonal constraints. We include the statistical correlation of data in the optimization problem; such correlation appears in typical applications but is ignored in conventional design methods. We propose precoder optimization algorithms based on projected gradient descent with projection onto the constraint sets. The proposed method can achieve better performance than the conventional methods that do not incorporate data correlation, especially when data are highly correlated. We also extend the proposed approach to the context of over-the-air computation.
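Projected gradient descent alternates a gradient step with a projection back onto the feasible set. The sketch below applies it to a toy problem with a single power (norm) constraint; the objective, the step size, and all function names are assumptions for illustration, and the block diagonal constraint and the actual mean squared error objective of the paper are not modeled.

```python
import math

def project_onto_power_ball(x, max_power):
    """Euclidean projection onto the set {x : sum(x_i^2) <= max_power}."""
    power = sum(v * v for v in x)
    if power <= max_power:
        return list(x)
    scale = math.sqrt(max_power / power)
    return [v * scale for v in x]

def projected_gradient_descent(grad, x0, max_power, step=0.1, iters=200):
    x = project_onto_power_ball(x0, max_power)
    for _ in range(iters):
        g = grad(x)
        # Gradient step followed by projection back onto the constraint set.
        x = project_onto_power_ball(
            [xi - step * gi for xi, gi in zip(x, g)], max_power
        )
    return x

# Toy objective: squared distance to a target that violates the power budget.
target = [3.0, 4.0]
grad = lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]
solution = projected_gradient_descent(grad, [0.0, 0.0], max_power=1.0)
print(solution)
```

For this toy problem the constrained minimizer is the target scaled down to the boundary of the power ball, which the iteration reaches after a few steps.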
Images captured in low-light environments have low visibility and high noise, which seriously affects subsequent visual tasks such as target detection and face recognition. Therefore, low-light image enhancement is of great significance for obtaining high-quality images and is a challenging problem in computer vision. A low-light enhancement model, LLFormer, based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce complexity and achieve feature extraction. This algorithm can enhance images well. However, the calculation of the attention mechanism is complex and the number of parameters is large, which limits the application of the model in practice. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. The experimental results show that the proposed method not only reduces the number of parameters by 49%, but also performs better in terms of image detail restoration and noise suppression compared with the baseline model. On the LOL dataset, the PSNR and SSIM were 24.098 dB and 0.8575, respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060 dB and 0.9490, respectively. The evaluation results on the two datasets are better than those of the current mainstream low-light enhancement algorithms.
Rikuya SASAKI Hiroyuki ICHIDA Htoo Htoo Sandi KYAW Keiichi KANEKO
The increasing demand for high-performance computing in recent years has led to active research on massively parallel systems. The interconnection network in a massively parallel system interconnects hundreds of thousands of processing elements so that they can process large tasks while communicating with one another. By regarding the processing elements as nodes and the links between processing elements as edges, we can discuss various problems of interconnection networks in the framework of graph theory. Many topologies have been proposed for interconnection networks of massively parallel systems. The hypercube is a very popular topology and it has many variants. The cross-cube is one such topology, which can be obtained by adding one extra edge to each node of the hypercube. The cross-cube reduces the diameter of the hypercube and allows cycles of odd lengths. Therefore, we focus on the cross-cube and propose an algorithm that constructs disjoint paths from a node to a set of nodes. We give a proof of correctness of the algorithm. Also, we show that the time complexity and the maximum path length of the algorithm are O(n^3 log n) and 2n - 3, respectively. Moreover, we estimate that the average execution time of the algorithm is O(n^2) based on a computer experiment.
Tomoki MINAMATA Hiroki HAMASAKI Hiroshi KAWASAKI Hajime NAGAHARA Satoshi ONO
This paper proposes a novel application of coded apertures (CAs) for visual information hiding. CA is one of the representative computational photography techniques, in which a patterned mask is attached to a camera as an alternative to a conventional circular aperture. With image processing in the post-processing phase, various functions such as omnifocal image capturing and depth estimation can be performed. In general, a watermark embedded as high-frequency components is difficult to extract if it is captured outside the focal length and defocus blur occurs. Installing a CA in the camera is a simple way to mitigate this difficulty, and several attempts have been made to find a better design for stable extraction. In contrast, our motivation is to design a specific CA together with an information hiding scheme, such that the secret information can be decoded only if an image with hidden information is captured with the key aperture at a certain distance outside the focus range. The proposed technique designs the key aperture patterns and information hiding scheme through evolutionary multi-objective optimization so as to minimize the decryption error of a hidden image when using the key aperture while minimizing the decoding accuracy when using other apertures. During the optimization process, solution candidates, i.e., key aperture patterns and information hiding schemes, are evaluated on actual devices to account for disturbances that cannot be considered in optical simulations. Experimental results have shown that decoding can be performed with the designed key aperture and similar ones, that decrypted image quality deteriorates as the similarity between the key and the aperture used for decryption decreases, and that the proposed information hiding technique works on actual devices.
Kairi TOKUDA Takehiro SATO Eiji OKI
Mobile edge computing (MEC) is a key technology for providing services that require low latency by migrating cloud functions to the network edge. The potential low quality of the wireless channel should be noted when mobile users with limited computing resources offload tasks to an MEC server. To improve the transmission reliability, it is necessary to perform resource allocation in an MEC server, taking into account the current channel quality and the resource contention. There are several works that take a deep reinforcement learning (DRL) approach to address such resource allocation. However, these approaches consider a fixed number of users offloading their tasks, and do not assume a situation where the number of users varies due to user mobility. This paper proposes Deep reinforcement learning model for MEC Resource Allocation with Dummy (DMRA-D), an online learning model that addresses the resource allocation in an MEC server under the situation where the number of users varies. By adopting dummy state/action, DMRA-D keeps the state/action representation. Therefore, DMRA-D can continue to learn one model regardless of variation in the number of users during the operation. Numerical results show that DMRA-D improves the success rate of task submission while continuing learning under the situation where the number of users varies.
Kota HISAFURU Kazunari TAKASAKI Nozomu TOGAWA
In recent years, with the widespread use of Internet of Things (IoT) devices, security issues for hardware devices have been increasing, and detecting their anomalous behaviors has become quite important. One effective method for detecting anomalous behaviors of IoT devices is to utilize the consumed energy and operation duration time extracted from their power waveforms. However, the existing methods do not consider the shape of time-series data and cannot distinguish between power waveforms with similar consumed energy and duration time but different shapes. In this paper, we propose a method for detecting anomalous behaviors based on the shape of time-series data by incorporating a shape-based distance (SBD) measure. The proposed method first obtains the entire power waveform of the target IoT device and extracts several application power waveforms. After that, we apply invariances to them so that the SBD between every pair of application power waveforms can be obtained effectively. Based on the SBD values, the local outlier factor (LOF) method can finally distinguish between normal application behaviors and anomalous application behaviors. Experimental results demonstrate that the proposed method successfully detects anomalous application behaviors, while the existing state-of-the-art method fails to detect them.
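The following sketch shows the commonly used definition of the shape-based distance: one minus the maximum normalized cross-correlation over all shifts, so that two waveforms with the same shape at different time offsets are at distance zero. The preprocessing that the abstract calls invariances (for example, normalization) and the LOF step are left out, and the function names and sample waveforms are assumptions for this example.

```python
import math

def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both are defined."""
    total = 0.0
    for i, yi in enumerate(y):
        j = i + shift
        if 0 <= j < len(x):
            total += x[j] * yi
    return total

def shape_based_distance(x, y):
    """SBD = 1 - max over shifts of the normalized cross-correlation."""
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 1.0  # assumed convention for an all-zero waveform
    best = max(
        cross_correlation(x, y, s) for s in range(-(len(y) - 1), len(x))
    )
    return 1.0 - best / norm

pulse = [0, 1, 2, 1, 0, 0]
shifted_pulse = [0, 0, 0, 1, 2, 1]   # same shape, later in time
double_spike = [0, 2, 0, 2, 0, 0]    # different shape

print(shape_based_distance(pulse, shifted_pulse))
print(shape_based_distance(pulse, double_spike))
```

The resulting pairwise distances could then be passed to an outlier detector such as LOF, as the abstract describes.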
Kenji KANAI Hidehiro KANEMITSU Taku YAMAZAKI Shintaro MORI Aram MINE Sumiko MIYATA Hironobu IMAMURA Hidenori NAKAZATO
A city-level digital twin is a critical enabling technology to construct a smart city that helps improve citizens' living conditions and quality of life. Currently, research and development regarding the digital replica city are pursued worldwide. However, many research projects only focus on creating the 3D city model. A mechanism to involve key players, such as data providers, service providers, and application developers, is essential for constructing the digital replica city and producing various city applications. Based on this motivation, the authors of this paper are pursuing a research project, namely Decentralized Digital Twin EcoSystem (D2EcoSys), to create an ecosystem to advance (and self-grow) the digital replica city regarding time and space directions, city services, and values. This paper introduces an overview of the D2EcoSys project: vision, problem statement, and approach. In addition, the paper discusses the recent research results regarding networking technologies and demonstrates an early testbed built in the Kashiwa-no-ha smart city.