Mami NAGOYA Tomoaki KIMURA Hiroyuki TSUJI
A simple depth-key-based image composition is proposed, which uses two still images with depth information, background and foreground object. The proposed method can place the object at various locations in the background considering the depth in the 3D world coordinate system. The main feature is that a simple algorithm is provided, which enables us to achieve the depthward movement within the camera plane, without being aware of the 3D world coordinate system. Two algorithms are proposed (P-OMDD and O-OMDD), which are based on the pin-hole camera model. As an advantage, camera calibration is not required before applying the algorithm in these methods. Since a single image is used for the object representation, each of the proposed methods has its limitations in terms of fidelity of the composite image. P-OMDD faithfully reproduces the angle at which the object is seen, but the pixels of the hidden surface are missing. On the contrary, O-OMDD can avoid the hidden surface problem, but the angle of the object is fixed, wherever it moves. It is verified through several experiments that, when using O-OMDD, subjectively natural composite images can be obtained under any object movement, in terms of size and position in the camera plane. Future tasks include improving the change in illumination due to positional changes and the partial loss of objects due to noise in depth images.
Naohisa NISHIDA Tatsumi OBA Yuji UNAGAMI Jason PAUL CRUZ Naoto YANAI Tadanori TERUYA Nuttapong ATTRAPADUNG Takahiro MATSUDA Goichiro HANAOKA
Machine learning models inherently memorize significant amounts of information, and thus hiding not only prediction processes but also trained models, i.e., model obliviousness, is desirable in the cloud setting. Several works achieved model obliviousness with the MNIST dataset, but datasets that include complicated samples, e.g., CIFAR-10 and CIFAR-100, are also used in actual applications, such as face recognition. Secret sharing-based secure prediction for CIFAR-10 is difficult to achieve. When a deep layer architecture such as CNN is used, the calculation error when performing secret calculation becomes large and the accuracy deteriorates. In addition, if detailed calculations are performed to improve accuracy, a large amount of calculation is required. Therefore, even if the conventional method is applied to CNN as it is, good results as described in the paper cannot be obtained. In this paper, we propose two approaches to solve this problem. Firstly, we propose a new protocol named Batch-normalizedActivation that combines BatchNormalization and Activation. Since BatchNormalization includes real number operations, when performing secret calculation, parameters must be converted into integers, which causes a calculation error and decrease accuracy. By using our protocol, calculation errors can be eliminated, and accuracy degradation can be eliminated. Further, the processing is simplified, and the amount of calculation is reduced. Secondly, we explore a secret computation friendly and high accuracy architecture. Related works use a low-accuracy, simple architecture, but in reality, a high accuracy architecture should be used. Therefore, we also explored a high accuracy architecture for the CIFAR10 dataset. Our proposed protocol can compute prediction of CIFAR-10 within 15.05 seconds with 87.36% accuracy while providing model obliviousness.
Hideyuki ICHIHARA Motoi FUKUDA Tsuyoshi IWAGAKI Tomoo INOUE
Stochastic computing (SC), which is an approximate computation with probabilities, has attracted attention owing to its small area, small power consumption and high fault tolerance. In this paper, we focus on the transient fault tolerance of SC based on linear finite state machines (linear FSMs). We show that state assignment of FSMs considerably affects the fault tolerance of linear FSM-based SC circuits, and present a Markov model for representing the impact of the state assignment on the behavior of faulty FSMs and estimating the expected error significance of the faulty FSM-based SC circuits. Furthermore, we propose a heuristic algorithm for appropriate state assignment that can mitigate the influence of transient faults. Experimental analysis shows that the state assignment has an impact on the transient fault tolerance of linear FSM-based SC circuits and the proposed state assignment algorithm can achieve a quasi-optimal state assignment in terms of high fault tolerance.
Ayana KAWAMURA Yuma KINOSHITA Takayuki NAKACHI Sayaka SHIOTA Hitoshi KIYA
We propose a privacy-preserving machine learning scheme with encryption-then-compression (EtC) images, where EtC images are images encrypted by using a block-based encryption method proposed for EtC systems with JPEG compression. In this paper, a novel property of EtC images is first discussed, although EtC ones was already shown to be compressible as a property. The novel property allows us to directly apply EtC images to machine learning algorithms non-specialized for computing encrypted data. In addition, the proposed scheme is demonstrated to provide no degradation in the performance of some typical machine learning algorithms including the support vector machine algorithm with kernel trick and random forests under the use of z-score normalization. A number of facial recognition experiments with are carried out to confirm the effectiveness of the proposed scheme.
Jingcheng SHEN Fumihiko INO Albert FARRÉS Mauricio HANZICH
Graphics processing units (GPUs) are highly efficient architectures for parallel stencil code; however, the small device (i.e., GPU) memory capacity (several tens of GBs) necessitates the use of out-of-core computation to process excess data. Great programming effort is needed to manually implement efficient out-of-core stencil code. To relieve such programming burdens, directive-based frameworks emerged, such as the pipelined accelerator (PACC); however, they usually lack specific optimizations to reduce data transfer. In this paper, we extend PACC with two data-centric optimizations to address data transfer problems. The first is a direct-mapping scheme that eliminates host (i.e., CPU) buffers, which intermediate between the original data and device buffers. The second is a region-sharing scheme that significantly reduces host-to-device data transfer. The extended PACC was applied to an acoustic wave propagator, automatically extending the length of original serial code 2.3-fold to obtain the out-of-core code. Experimental results revealed that on a Tesla V100 GPU, the generated code ran 41.0, 22.1, and 3.6 times as fast as implementations based on Open Multi-Processing (OpenMP), Unified Memory, and the previous PACC, respectively. The generated code also demonstrated usefulness with small datasets that fit in the device capacity, running 1.3 times as fast as an in-core implementation.
Shoko IMAIZUMI Yusuke IZAWA Ryoichi HIRASAWA Hitoshi KIYA
We propose a reversible data hiding (RDH) method in compressible encrypted images called the encryption-then-compression (EtC) images. The proposed method allows us to not only embed a payload in encrypted images but also compress the encrypted images containing the payload. In addition, the proposed RDH method can be applied to both plain images and encrypted ones, and the payload can be extracted flexibly in the encrypted domain or from the decrypted images. Various RDH methods have been studied in the encrypted domain, but they are not considered to be two-domain data hiding, and the resultant images cannot be compressed by using image coding standards, such as JPEG-LS and JPEG 2000. In our experiment, the proposed method shows high performance in terms of lossless compression efficiency by using JPEG-LS and JPEG 2000, data hiding capacity, and marked image quality.
Daisuke KANEMOTO Shun KATSUMATA Masao AIHARA Makoto OHKI
This paper proposes a novel compressed sensing (CS) framework for reconstructing electroencephalogram (EEG) signals. A feature of this framework is the application of independent component analysis (ICA) to remove the interference from artifacts after undersampling in a data processing unit. Therefore, we can remove the ICA processing block from the sensing unit. In this framework, we used a random undersampling measurement matrix to suppress the Gaussian. The developed framework, in which the discrete cosine transform basis and orthogonal matching pursuit were used, was evaluated using raw EEG signals with a pseudo-model of an eye-blink artifact. The normalized mean square error (NMSE) and correlation coefficient (CC), obtained as the average of 2,000 results, were compared to quantitatively demonstrate the effectiveness of the proposed framework. The evaluation results of the NMSE and CC showed that the proposed framework could remove the interference from the artifacts under a high compression ratio.
High level synthesis (HLS) is a source-code-driven Register Transfer Level (RTL) design tool, and the performance, the power consumption, and the area of a generated RTL are limited partly by the description of a HLS input source code. In order to break through such kind of limitation and to get a further optimized RTL, the optimization of the input source code is indispensable. Routing congestion is one of such problems we need to consider the refinement of a HLS input source code. In this paper, we propose a novel HLS flow that performs code improvements by detecting congested parts directly from HLS input source code without using physical logic synthesis, and regenerating the input source code for HLS. In our approach, the origin of the wire congestion is detected from the HLS input source code by applying pattern matching on Program-Dependence Graph (PDG) constructed from the HLS input source code, the possibility of wire congestion is reported.
Yuta KAIHORI Yu YAMASAKI Tsuyoshi KONISHI
A high degree of freedom in spectral domain allows us to accommodate additional optical signal processing for wavelength division multiplexing in photonic analog-to-digital conversion. We experimentally verified a spectral compression to save a necessary bandwidth for soliton self-frequency shift based optical quantization through the cascade of the four-wave mixing based and the sum-frequency generation based spectral compression. This approach can realize 0.03 nm individual bandwidth correspond to save up to more than 85 percent of bandwidth for 7-bit optical quantization in C-band.
This paper proposes a deterministic pilot pattern placement optimization scheme based on the quantum genetic algorithm (QGA) which aims to improve the performance of sparse channel estimation in orthogonal frequency division multiplexing (OFDM) systems. By minimizing the mutual incoherence property (MIP) of the sensing matrix, the pilot pattern placement optimization is modeled as the solution of a combinatorial optimization problem. QGA is used to solve the optimization problem and generate optimized pilot pattern that can effectively avoid local optima traps. The simulation results demonstrate that the proposed method can generate a sensing matrix with a smaller MIP than a random search or the genetic algorithm (GA), and the optimized pilot pattern performs well for sparse channel estimation in OFDM systems.
Tomoko K. MATSUSHIMA Shoichiro YAMASAKI
The direct sequence code division multiple access (DS-CDMA) technique is widely used in various communication systems. When adopting orthogonal variable spreading factor (OVSF) codes, DS-CDMA is particularly suitable for supporting multi-user/multi-rate data transmission services. A useful property of OVSF codes is that no two code sequences assigned to different users will ever interfere with each other, even if their spreading factors are different. Conventional OVSF codes are constructed based on binary orthogonal codes, called Walsh codes, and OVSF code sequences are binary sequences. In this paper, we propose new OVSF codes that are constructed based on polyphase orthogonal codes and consist of complex sequences in which each symbol is represented as a complex number. Construction of the proposed codes is based on a tree structure that is similar to conventional OVSF codes. Since the proposed codes are generalized versions of conventional OVSF codes, any conventional OVSF code can be presented as a special case of the proposed codes. Herein, we show the method used to construct the proposed OVSF codes, after which the orthogonality of the codes, including conventional OVSF codes, is investigated. Among the advantages of our proposed OVSF codes is that the spreading factor can be designed more flexibly in each layer than is possible with conventional OVSF codes. Furthermore, combination of the proposed code and a non-binary phase modulation is well suited to DS-CDMA systems where the level fluctuation of signal envelope is required to be suppressed.
Secure multi-party computation (MPC) allows a set of parties to compute a function jointly while keeping their inputs private. MPC has been actively studied, and there are many research results both in the theoretical and practical research fields. In this paper, we introduce the basic matters on MPC and show recent practical advances. We first explain the settings, security notions, and cryptographic building blocks of MPC. Then, we show and discuss current situations on higher-level secure protocols, privacy-preserving data analysis, and frameworks/compilers for implementing MPC applications with low-cost.
Tsunehiro YOSHINAGA Makoto SAKAMOTO
This paper investigates the closure properties of multi-inkdot nondeterministic Turing machines with sublogarithmic space. We show that the class of sets accepted by the Turing machines is not closed under concatenation with regular set, Kleene closure, length-preserving homomorphism, and intersection.
Ryuta SHINGAI Yuria HIRAGA Hisakazu FUKUOKA Takamasa MITANI Takashi NAKADA Yasuhiko NAKASHIMA
Modern deep learning has significantly improved performance and has been used in a wide variety of applications. Since the amount of computation required for the inference process of the neural network is large, it is processed not by the data acquisition location like a surveillance camera but by the server with abundant computing power installed in the data center. Edge computing is getting considerable attention to solve this problem. However, edge computing can provide limited computation resources. Therefore, we assumed a divided/distributed neural network model using both the edge device and the server. By processing part of the convolution layer on edge, the amount of communication becomes smaller than that of the sensor data. In this paper, we have evaluated AlexNet and the other eight models on the distributed environment and estimated FPS values with Wi-Fi, 3G, and 5G communication. To reduce communication costs, we also introduced the compression process before communication. This compression may degrade the object recognition accuracy. As necessary conditions, we set FPS to 30 or faster and object recognition accuracy to 69.7% or higher. This value is determined based on that of an approximation model that binarizes the activation of Neural Network. We constructed performance and energy models to find the optimal configuration that consumes minimum energy while satisfying the necessary conditions. Through the comprehensive evaluation, we found that the optimal configurations of all nine models. For small models, such as AlexNet, processing entire models in the edge was the best. On the other hand, for huge models, such as VGG16, processing entire models in the server was the best. For medium-size models, the distributed models were good candidates. We confirmed that our model found the most energy efficient configuration while satisfying FPS and accuracy requirements, and the distributed models successfully reduced the energy consumption up to 48.6%, and 6.6% on average. We also found that HEVC compression is important before transferring the input data or the feature data between the distributed inference processes.
Nurimisaki and Sashigane are Nikoli's pencil puzzles. We study the computational complexity of Nurimisaki and Sashigane puzzles. It is shown that deciding whether a given instance of each puzzle has a solution is NP-complete.
In this paper, we propose a secure computation of sparse coding and its application to Encryption-then-Compression (EtC) systems. The proposed scheme introduces secure sparse coding that allows computation of an Orthogonal Matching Pursuit (OMP) algorithm in an encrypted domain. We prove theoretically that the proposed method estimates exactly the same sparse representations that the OMP algorithm for non-encrypted computation does. This means that there is no degradation of the sparse representation performance. Furthermore, the proposed method can control the sparsity without decoding the encrypted signals. Next, we propose an EtC system based on the secure sparse coding. The proposed secure EtC system can protect the private information of the original image contents while performing image compression. It provides the same rate-distortion performance as that of sparse coding without encryption, as demonstrated on both synthetic data and natural images.
Toshinori SATO Tomoaki UKEZONO
This paper proposes a technique that increases the lifetime of large scale integration (LSI) devices. As semiconductor technology improves at miniaturizing transistors, aging effects due to bias temperature instability (BTI) seriously affects their lifetime. BTI increases the threshold voltage of transistors thereby also increasing the delay of an electronics device, resulting in failures due to timing violations. To compensate for aging-induced timing violations, we exploit configurable approximate computing. Assuming that target circuits have exact and approximate modes, they are configured for the approximate mode if an aging sensor predicts violations. Experiments using an example circuit revealed an increase in its lifetime to >10 years.
Yi GUO Heming SUN Ping LEI Shinji KIMURA
Approximate multiplier design is an effective technique to improve hardware performance at the cost of accuracy loss. The current approximate multipliers are mostly ASIC-based and are dedicated for one particular application. In contrast, FPGA has been an attractive choice for many applications because of its high performance, reconfigurability, and fast development round. This paper presents a novel methodology for designing approximate multipliers by employing the FPGA-based fabrics (primarily look-up tables and carry chains). The area and latency are significantly reduced by applying approximation on carry results and cutting the carry propagation path in the multiplier. Moreover, we explore higher-order multipliers on architectural space by using our proposed small-size approximate multipliers as elementary modules. For different accuracy-hardware requirements, eight configurations for approximate 8×8 multiplier are discussed. In terms of mean relative error distance (MRED), the error of the proposed 8×8 multiplier is as low as 1.06%. Compared with the exact multiplier, our proposed design can reduce area by 43.66% and power by 24.24%. The critical path latency reduction is up to 29.50%. The proposed multiplier design has a better accuracy-hardware tradeoff than other designs with comparable accuracy. Moreover, image sharpening processing is used to assess the efficiency of approximate multipliers on application.
This paper describes designing a new pedestrian navigation system using a comprehensive framework called the pedestrian navigation concept reference model (PNCRM). We implement this system as a publicly-available smartphone application and evaluate its positioning performance near Omiya station's western entrance. We also evaluate users' subjective impressions of the system using a questionnaire. In both cases, promising results are obtained, showing that the PNCRM can be used as a tool for designing pedestrian navigation systems, allowing such systems to be created systematically.
Qian WANG Qingmei ZHOU Wei ZHAO Xuangou WU Xun SHAO
In the age of big data, recommendation systems provide users with fast access to interesting information, resulting to a significant commercial value. However, the extreme sparseness of user assessment data is one of the key factors that lead to the poor performance of recommendation algorithms. To address this problem, we propose a spectral clustering recommendation scheme with low-rank matrix completion and spectral clustering. Our scheme exploits spectral clustering to achieve the division of a similar user group. Meanwhile, the low-rank matrix completion is used to effectively predict un-rated items in the sub-matrix of the spectral clustering. With the real dataset experiment, the results show that our proposed scheme can effectively improve the prediction accuracy of un-rated items.