Yu WANG Liangyong YANG Jilian ZHANG Xuelian DENG
Cloud computing has become the mainstream computing paradigm nowadays. More and more data owners (DO) choose to outsource their data to a cloud service provider (CSP), who is responsible for data management and query processing on behalf of DO, so as to cut down operational costs for the DO. However, in real-world applications, CSP may be untrusted, hence it is necessary to authenticate the query result returned from the CSP. In this paper, we consider the problem of approximate string query result authentication in the context of database outsourcing. Based on Merkle Hash Tree (MHT) and Trie, we propose an authenticated tree structure named MTrie for authenticating approximate string query results. We design efficient algorithms for query processing and query result authentication. To verify effectiveness of our method, we have conducted extensive experiments on real datasets and the results show that our proposed method can effectively authenticate approximate string query results.
Jiaxuan LU Yutaka MASUDA Tohru ISHIHARA
Approximate computing (AC) saves energy and improves performance by introducing approximation into computation in error-torrent applications. This work focuses on an AC strategy that accurately performs important computations and approximates others. In order to make AC circuits practical, we need to determine which computation is how important carefully, and thus need to appropriately approximate the redundant computation for maintaining the required computational quality. In this paper, we focus on the importance of computations at the flip-flop (FF) level and propose a novel importance evaluation methodology. The key idea of the proposed methodology is a two-step fault injection algorithm to extract the near-optimal set of redundant FFs in the circuit. In the first step, the proposed methodology performs the FI simulation for each FF and extracts the candidates of redundant FFs. Then, in the second step, the proposed methodology extracts the set of redundant FFs in a binary search manner. Thanks to the two-step strategy, the proposed algorithm reduces the complexity of architecture exploration from an exponential order to a linear order without understanding the functionality and behavior of the target application program. Experimental results show that the proposed methodology identifies the candidates of redundant FFs depending on the given constraints. In a case study of an image processing accelerator, the truncation for identified redundant FFs reduces the circuit area by 29.6% and saves power dissipation by 44.8% under the ASIC implementation while satisfying the PSNR constraint. Similarly, the dynamic power dissipation is saved by 47.2% under the FPGA implementation.
In this paper, we introduce a framework of distributed orthogonal approximate message passing for recovering sparse vector based on sensing by multiple nodes. The iterative recovery process consists of local computation at each node, and global computation performed either by a particular node or joint computation on the overall network by exchanging messages. We then propose a method to reduce the communication cost between the nodes while maintaining the recovery performance.
Satoshi DENNO Taichi YAMAGAMI Yafei HOU
This paper proposes low complexity resource allocation in frequency domain non-orthogonal multiple access where many devices access with a base station. The number of the devices is assumed to be more than that of the resource for network capacity enhancement, which is demanded in massive machine type communications (mMTC). This paper proposes two types of resource allocation techniques, all of which are based on the MIN-MAX approach. One of them seeks for nicer resource allocation with only channel gains. The other technique applies the message passing algorithm (MPA) for better resource allocation. The proposed resource allocation techniques are evaluated by computer simulation in frequency domain non-orthogonal multiple access. The proposed technique with the MPA achieves the best bit error rate (BER) performance in the proposed techniques. However, the computational complexity of the proposed techniques with channel gains is much smaller than that of the proposed technique with the MPA, whereas the BER performance of the proposed techniques with channel gains is only about 0.1dB inferior to that with the MPA in the multiple access with the overloading ratio of 1.5 at the BER of 10-4. They attain the gain of about 10dB at the BER of 10-4 in the multiple access with the overloading ration of 2.0. Their complexity is 10-16 as small as the conventional technique.
With the emergence of a large quantity of data in science and industry, it is urgent to improve the prediction accuracy and reduce the high complexity of Gaussian process regression (GPR). However, the traditional global approximation and local approximation have corresponding shortcomings, such as global approximation tends to ignore local features, and local approximation has the problem of over-fitting. In order to solve these problems, a large-scale Gaussian process regression algorithm (RFFLT) combining random Fourier features (RFF) and local approximation is proposed. 1) In order to speed up the training time, we use the random Fourier feature map input data mapped to the random low-dimensional feature space for processing. The main innovation of the algorithm is to design features by using existing fast linear processing methods, so that the inner product of the transformed data is approximately equal to the inner product in the feature space of the shift invariant kernel specified by the user. 2) The generalized robust Bayesian committee machine (GRBCM) based on Tsallis mutual information method is used in local approximation, which enhances the flexibility of the model and generates a sparse representation of the expert weight distribution compared with previous work. The algorithm RFFLT was tested on six real data sets, which greatly shortened the time of regression prediction and improved the prediction accuracy.
Tian FANG Feng LIU Conggai LI Fangjiong CHEN Yanli XU
Underwater acoustic channels (UWA) are usually sparse, which can be exploited for adaptive equalization to improve the system performance. For the shallow UWA channels, based on the proportional minimum symbol error rate (PMSER) criterion, the adaptive equalization framework requires the sparsity selection. Since the sparsity of the L0 norm is stronger than that of the L1, we choose it to achieve better convergence. However, because the L0 norm leads to NP-hard problems, it is difficult to find an efficient solution. In order to solve this problem, we choose the Gaussian function to approximate the L0 norm. Simulation results show that the proposed scheme obtains better performance than the L1 based counterpart.
Sakyo HASHIMOTO Keigo TAKEUCHI
This letter simplifies and analyze existing state evolution recursions for conjugate gradient. The proposed simplification reduces the complexity for solving the recursions from cubic order to square order in the total number of iterations. The simplified recursions are still catastrophically sensitive to numerical errors, so that arbitrary-precision arithmetic is used for accurate evaluation of the recursions.
Toshihiro YOSHIDA Keigo TAKEUCHI
This paper addresses short-length sparse superposition codes (SSCs) over the additive white Gaussian noise channel. Damped approximate message-passing (AMP) is used to decode short SSCs with zero-mean independent and identically distributed Gaussian dictionaries. To design damping factors in AMP via deep learning, this paper constructs deep-unfolded damped AMP decoding networks. An annealing method for deep learning is proposed for designing nearly optimal damping factors with high probability. In annealing, damping factors are first optimized via deep learning in the low signal-to-noise ratio (SNR) regime. Then, the obtained damping factors are set to the initial values in stochastic gradient descent, which optimizes damping factors for slightly larger SNR. Repeating this annealing process designs damping factors in the high SNR regime. Numerical simulations show that annealing mitigates fluctuation in learned damping factors and outperforms exhaustive search based on an iteration-independent damping factor.
This study proposes a heterogeneous integration of precise and approximate storage in data center storage. The storage control engine allocates precise and error-tolerant applications to precise and approximate storage, respectively. The appropriate use of both precise and approximate storage is examined by applying a non-volatile memory capacity algorithm. To respond to the changes in application over time, the non-volatile memory capacity algorithm changes capacity of storage class memories (SCMs), namely the memory-type SCM (M-SCM) and storage-type SCM (S-SCM), in non-volatile memory resource. A three-dimensional triple-level cell (TLC) NAND flash is used as a large capacity memory. The results indicate that precise storage exhibits a high performance when the maximum storage cost is high. By contrast, with a low maximum storage cost, approximate storage exhibits high performance using a low bit cost approximate multiple-level cell (MLC) S-SCM.
Yutaka MASUDA Yusei HONDA Tohru ISHIHARA
Approximate computing (AC) has recently emerged as a promising approach to the energy-efficient design of digital systems. For realizing the practical AC design, we need to verify whether the designed circuit can operate correctly under various operating conditions. Namely, the verification needs to efficiently find fatal logic errors or timing errors that violate the constraint of computational quality. This work focuses on the verification where the computational results can be observed, the computational quality can be calculated from computational results, and the constraint of computational quality is given and defined as the constraint which is set to the computational quality of designed AC circuit with given workloads. Then, this paper proposes a novel dynamic verification framework of the AC circuit. The key idea of the proposed framework is to incorporate a quality assessment capability into the Coverage-based Grey-box Fuzzing (CGF). CGF is one of the most promising techniques in the research field of software security testing. By repeating (1) mutation of test patterns, (2) execution of the program under test (PUT), and (3) aggregation of coverage information and feedback to the next test pattern generation, CGF can explore the verification space quickly and automatically. On the other hand, CGF originally cannot consider the computational quality by itself. For overcoming this quality unawareness in CGF, the proposed framework additionally embeds the Design Under Verification (DUV) component into the calculation part of computational quality. Thanks to the DUV integration, the proposed framework realizes the quality-aware feedback loop in CGF and thus quickly enhances the verification coverage for test patterns that violate the quality constraint. In this work, we quantitatively compared the verification coverage of the approximate arithmetic circuits between the proposed framework and the random test. In a case study of an approximate multiply-accumulate (MAC) unit, we experimentally confirmed that the proposed framework achieved 3.85 to 10.36 times higher coverage than the random test.
Lingxiao HOU Yutaka MASUDA Tohru ISHIHARA
The approximate logarithmic multiplier proposed by Mitchell provides an efficient alternative for processing dense multiplication or multiply-accumulate operations in applications such as image processing and real-time robotics. It offers the advantages of small area, high energy efficiency and is suitable for applications that do not necessarily achieve high accuracy. However, its maximum error of 11.1% makes it challenging to deploy in applications requiring relatively high accuracy. This paper proposes a novel operand decomposition method (OD) that decomposes one multiplication into the sum of multiple approximate logarithmic multiplications to widely reduce Mitchell multiplier errors while taking full advantage of its area savings. Based on the proposed OD method, this paper also proposes an accuracy reconfigurable multiply-accumulate (MAC) unit that provides multiple reconfigurable accuracies with high parallelism. Compared to a MAC unit consisting of accurate multipliers, the area is significantly reduced to less than half, improving the hardware parallelism while satisfying the required accuracy for various scenarios. The experimental results show the excellent applicability of our proposed MAC unit in image smoothing and robot localization and mapping application. We have also designed a prototype processor that integrates the minimum functionality of this MAC unit as a vector accelerator and have implemented a software-level accuracy reconfiguration in the form of an instruction set extension. We experimentally confirmed the correct operation of the proposed vector accelerator, which provides the different degrees of accuracy and parallelism at the software level.
Naoya HIEDA Keita MORIMOTO Akito IGUCHI Yasuhide TSUJI Tatsuya KASHIWA
In order to increase communication capacity, the use of millimeter-wave and terahertz-wave bands are being actively explored. Non-radiative dielectric waveguide known as NRD guide is one of promising platform of millimeter-wave integrated circuits thanks to its non-radiative and low loss nature. In order to develop millimeter wave circuits with various functions, various circuit components have to be efficiently designed to meet requirements from application side. In this paper, for efficient design of NRD guide devices, we develop a topology optimal design technique based on function-expansion-method which can express arbitrary structure with arbitrary geometric topology. In the present approach, recently developed two-dimensional full-vectorial finite element method (2D-FVFEM) for NRD guide devices is employed to improve computational efficiency and several evolutionary approaches, which do not require appropriate initial structure depending on a given design problem, are used to optimize design variables, thus, NRD guide devices having desired functions are efficiently obtained without requiring designer's special knowledge.
Rui JIANG Xiao ZHOU You Yun XU Li ZHANG
Millimeter wave (mmWave) massive Multiple-Input Multiple-Output (MIMO) systems generally adopt hybrid precoding combining digital and analog precoder as an alternative to full digital precoding to reduce RF chains and energy consumption. In order to balance the relationship between spectral efficiency, energy efficiency and hardware complexity, the hybrid-connected system structure should be adopted, and then the solution process of hybrid precoding can be simplified by decomposing the total achievable rate into several sub-rates. However, the singular value decomposition (SVD) incurs high complexity in calculating the optimal unconstrained hybrid precoder for each sub-rate. Therefore, this paper proposes PAST, a low complexity hybrid precoding algorithm based on projection approximate subspace tracking. The optimal unconstrained hybrid precoder of each sub-rate is estimated with the PAST algorithm, which avoids the high complexity process of calculating the left and right singular vectors and singular value matrix by SVD. Simulations demonstrate that PAST matches the spectral efficiency of SVD-based hybrid precoding in full-connected (FC), hybrid-connected (HC) and sub-connected (SC) system structure. Moreover, the superiority of PAST over SVD-based hybrid precoding in terms of complexity and increases with the number of transmitting antennas.
Hiroshi ETO Takehiro ITO Zhilong LIU Eiji MIYANO
This paper studies generalized variants of the MAXIMUM INDEPENDENT SET problem, called the MAXIMUM DISTANCE-d INDEPENDENT SET problem (MaxDdIS for short). For an integer d≥2, a distance-d independent set of an unweighted graph G=(V, E) is a subset S⊆V of vertices such that for any pair of vertices u, v∈S, the number of edges in any path between u and v is at least d in G. Given an unweighted graph G, the goal of MaxDdIS is to find a maximum-cardinality distance-d independent set of G. In this paper, we analyze the (in)approximability of the problem on r-regular graphs (r≥3) and planar graphs, as follows: (1) For every fixed integers d≥3 and r≥3, MaxDdIS on r-regular graphs is APX-hard. (2) We design polynomial-time O(rd-1)-approximation and O(rd-2/d)-approximation algorithms for MaxDdIS on r-regular graphs. (3) We sharpen the above O(rd-2/d)-approximation algorithms when restricted to d=r=3, and give a polynomial-time 2-approximation algorithm for MaxD3IS on cubic graphs. (4) Finally, we show that MaxDdIS admits a polynomial-time approximation scheme (PTAS) for planar graphs.
In this paper, the scattering far-field from a circular electric conducting cylinder has been analyzed by physical optics (PO) approximation for both H and E polarizations. The evaluation of radiation integrations due to the PO current is conducted numerically and analytically. While non-uniform and uniform asymptotic solutions have been derived by the saddle point method, a separate approximation has been made for forward scattering direction. Comparisons among our approximation, direct numerical integration and exact solution results yield a good agreement for electrically large cylinders.
Tatsuma MORI Taito MANABE Yuichiro SHIBATA
The convex hull is the minimum convex surrounding a given set of points. Since the process of finding convex hulls has various practical application fields including embedded real-time systems, efficient acceleration of convex hull algorithms is an important problem in computer geometry. In this paper, we discuss an FPGA acceleration approach to address this problem. In order to compute the convex hull of an unsorted point set, it is necessary to store all the points during the computation, and thus the capacity of a on-chip memory is likely to be a major constraint for efficient FPGA implementation. On the other hand, approximate convex hulls are often sufficient for practical applications. Therefore, we propose a hardware oriented approximate convex hull algorithm, which can process the input points as a stream without storing all the points in the memory. We also propose some computation reduction techniques for efficient FPGA implementation. Then, we present FPGA implementation of the proposed algorithm, which is parallelized both in temporal and spatial domains, and evaluate its effectiveness in terms of performance and accuracy. As a result, we demonstrated 11 to 30 times faster performance compared to the widely-used convex hull software library Qhull. In addition, accuracy assessment revealed that the maximum approximation error normalized to the diameters of point sets was 0.038%, which was reasonably small for practical use cases.
Xiaoling YU Yuntao WANG Chungen XU Tsuyoshi TAKAGI
Due to the property of supporting arbitrary operation over the encrypted data, fully homomorphic encryption (FHE) has drawn considerable attention since it appeared. Some FHE schemes have been constructed based on the general approximate common divisor (GACD) problem, which is widely believed intractable. Therefore, studying the GACD problem's hardness can provide proper security parameters for these FHE schemes and their variants. This paper aims to study an orthogonal lattice algorithm introduced by Ding and Tao (Ding-Tao algorithm) to solve the GACD problem. We revisit the condition that Ding-Tao algorithm works and obtain a new bound of the GACD samples' number based on geometric series assumption. Simultaneously, we also give an analysis of the bound given in the previous work. To further verify the theoretical results, we conduct experiments on Ding-Tao algorithm under our bound. We show a comparison with the experimental results under the previous bound, which indicates the success probability under our bound is higher than that of the previous bound with the growth of the bound.
Yasushi ESAKI Yuta NAKAHARA Toshiyasu MATSUSHIMA
There have been some researchers that investigate the accuracy of the approximation to a function that shows a generating pattern of data by a deep neural network. However, they have confirmed only whether at least one function close to the function showing a generating pattern exists in function classes of deep neural networks whose parameter values are changing. Therefore, we propose a new criterion to infer the approximation accuracy. Our new criterion shows the existence ratio of functions close to the function showing a generating pattern in the function classes. Moreover, we show a deep neural network with a larger number of layers approximates the function showing a generating pattern more accurately than one with a smaller number of layers under the proposed criterion, with numerical simulations.
Yutaka MASUDA Jun NAGAYAMA TaiYu CHENG Tohru ISHIHARA Yoichi MOMIYAMA Masanori HASHIMOTO
This work proposes a design methodology that saves the power dissipation under voltage over-scaling (VOS) operation. The key idea of the proposed design methodology is to combine critical path isolation (CPI) and bit-width scaling (BWS) under the constraint of computational quality, e.g., Peak Signal-to-Noise Ratio (PSNR) in the image processing domain. Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly restrict the power saving effect. On the other hand, the proposed methodology tries to reduce both intrinsic and non-intrinsic CPs. Therefore, our design dramatically reduces the supply voltage and power dissipation while satisfying the quality constraint. Moreover, for reducing co-design exploration space, the proposed methodology utilizes the exclusiveness of the paths targeted by CPI and BWS, where CPI aims at reducing the minimum supply voltage of non-intrinsic CP, and BWS focuses on intrinsic CPs in arithmetic units. From this key exclusiveness, the proposed design splits the simultaneous optimization problem into three sub-problems; (1) the determination of bit-width reduction, (2) the timing optimization for non-intrinsic CPs, and (3) investigating the minimum supply voltage of the BWS and CPI-applied circuit under quality constraint, for reducing power dissipation. Thanks to the problem splitting, the proposed methodology can efficiently find quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible, and they significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design saves the power dissipation by 42.7% with an image processing workload and by 51.2% with a neural network inference workload.
Convolutional approximate message-passing (CAMP) is an efficient algorithm to solve linear inverse problems. CAMP aims to realize advantages of both approximate message-passing (AMP) and orthogonal/vector AMP. CAMP uses the same low-complexity matched-filter as AMP. To realize the asymptotic Gaussianity of estimation errors for all right-orthogonally invariant matrices, as guaranteed in orthogonal/vector AMP, the Onsager correction in AMP is replaced with a convolution of all preceding messages. CAMP was proved to be asymptotically Bayes-optimal if a state-evolution (SE) recursion converges to a fixed-point (FP) and if the FP is unique. However, no proofs for the convergence were provided. This paper presents a theoretical analysis for the convergence of the SE recursion. Gaussian signaling is assumed to linearize the SE recursion. A condition for the convergence is derived via a necessary and sufficient condition for which the linearized SE recursion has a unique stationary solution. The SE recursion is numerically verified to converge toward the Bayes-optimal solution if and only if the condition is satisfied. CAMP is compared to conjugate gradient (CG) for Gaussian signaling in terms of the convergence properties. CAMP is inferior to CG for matrices with a large condition number while they are comparable to each other for a small condition number. These results imply that CAMP has room for improvement in terms of the convergence properties.