Speeding up network layers and reducing the number of parameters in convolutional neural networks (CNNs) is an active research topic. In this paper, we propose a novel method, symmetric decomposition of convolution kernels (SDKs), which symmetrically separates each k×k convolution kernel into a (k×1, 1×k) or (1×k, k×1) pair of kernels. We compare network models designed with SDKs against the corresponding CNNs on the MNIST and CIFAR-10 datasets. The SDK-based models achieve good recognition performance, with a 1.1×-1.5× speedup and a reduction of more than 30% in network parameters. The experimental results indicate that our method is useful and effective for CNNs in practice, in terms of both speedup and parameter reduction.
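The parameter saving from separating a k×k kernel into a k×1 and a 1×k kernel can be illustrated with a simple weight count. This is a minimal sketch, not the authors' network design: the function names, the channel counts, and the assumption that the intermediate layer keeps the output channel count are all hypothetical, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out


def sdk_params(k, c_in, c_out, c_mid=None):
    """Weights when the k x k layer is replaced by a k x 1 layer
    followed by a 1 x k layer.

    c_mid (channels between the two layers) is a hypothetical knob;
    by default it is assumed equal to c_out.
    """
    if c_mid is None:
        c_mid = c_out
    return k * c_in * c_mid + k * c_mid * c_out


def reduction(k, c_in, c_out):
    """Fraction of weights saved by the separated form."""
    return 1 - sdk_params(k, c_in, c_out) / conv_params(k, c_in, c_out)


if __name__ == "__main__":
    for k in (3, 5, 7):
        print(k, conv_params(k, 64, 64), sdk_params(k, 64, 64),
              round(reduction(k, 64, 64), 3))
```

Under these assumptions a 3×3 layer with equal input and output channel counts keeps two thirds of its weights, which is in line with the "more than 30%" reduction quoted in the abstract; the measured speedup depends on the implementation and is not modeled here.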
Jinfa WANG Siyuan JIA Hai ZHAO Jiuqiang XU Chuan LIN
Detecting anomalies in the Internet, such as network failures or intentional attacks, is a vital but challenging task. Although numerous techniques based on Internet traffic have been developed, detecting anomalies from the perspective of Internet topology structure is becoming feasible, because anomaly detection in structured datasets based on complex network theory has recently become a focus of attention. In this paper, an anomaly detection method for the large-scale Internet topology is proposed to detect local structure crashes caused by cascading failures. To quantify the dynamic changes of the Internet topology, we put forward the network path changes coefficient (NPCC), which highlights the abnormal state of the Internet after it is attacked continuously. Furthermore, inspired by the Fibonacci sequence, we propose a decision function that determines whether the Internet is abnormal: the current Internet is judged abnormal if its NPCC falls outside the normal domain calculated from the previous k NPCCs of the Internet topology. Finally, the new Internet anomaly detection method is tested against the topology data of three Internet anomaly events. The results show that the detection accuracy for all events is over 97% and the detection precision for the three events is 90.24%, 83.33% and 66.67%, respectively, when k=36. According to the experimental values of the F1 index, larger values of k offer better detection performance. Moreover, our method performs better on anomalous behaviors caused by network failure than on those caused by intentional attack. Compared with traditional anomaly detection methods, our work is simpler and more powerful for governments or organizations in terms of detecting large-scale abnormal events.
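The sliding-window decision described above (compare the current NPCC with a normal domain derived from the previous k values) can be sketched as follows. The abstract does not give the Fibonacci-inspired decision function, so this sketch substitutes a plain mean ± width·standard-deviation band; the function name, the `width` parameter, and the sample values are assumptions for illustration only.

```python
from statistics import mean, stdev


def is_abnormal(history, current, k=36, width=3.0):
    """Flag `current` if it lies outside a band built from the last k values.

    NOTE: the band (mean +/- width * sample std) is a stand-in, NOT the
    Fibonacci-inspired decision function of the paper.
    """
    window = history[-k:]
    if len(window) < 2:
        return False  # not enough history to define a normal domain
    mu = mean(window)
    sigma = stdev(window)
    return abs(current - mu) > width * sigma


if __name__ == "__main__":
    past = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]  # made-up NPCC-like values
    print(is_abnormal(past, 1.1, k=6))
    print(is_abnormal(past, 2.0, k=6))
```

Only the windowing idea carries over from the abstract; the reported accuracy and precision figures cannot be expected from this substitute rule.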
Takashi YOKOTA Kanemitsu OOTSU Takeshi OHKAWA
This paper aims to reduce the duration of typical collective communications. We introduce a logical addressing system separate from the physical one and, by properly rearranging the logical node addresses, reduce communication overheads so that ideal communication is performed. One of the key issues is how to rearrange the logical addresses. We introduce a genetic algorithm (GA) as a meta-heuristic solution, as well as a random search strategy. Our GA-based method achieves up to a 2.50-times speedup in three-traffic-pattern cases.
This paper proposes a block-permutation-based encryption (BPBE) scheme for the encryption-then-compression (ETC) system that enhances the color scrambling. A BPBE image can be obtained through four processes, positional scrambling, block rotation/flip, negative-positive transformation, and color component shuffling, after dividing the original image into multiple blocks. The proposed scheme scrambles the R, G, and B components independently in positional scrambling, block rotation/flip, and negative-positive transformation, by assigning different keys to each color component. The conventional scheme considers the compression efficiency using JPEG and JPEG 2000, which need a color conversion before the compression process by default. Therefore, the conventional scheme scrambles the color components identically in each process. In contrast, the proposed scheme takes into account the RGB-based compression, such as JPEG-LS, and thus can increase the extent of the scrambling. The resilience against jigsaw puzzle solver (JPS) can consequently be increased owing to the wider color distribution of the BPBE image. Additionally, the key space for resilience against brute-force attacks has also been expanded exponentially. Furthermore, the proposed scheme can maintain the JPEG-LS compression efficiency compared to the conventional scheme. We confirm the effectiveness of the proposed scheme by experiments and analyses.
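The four processes listed above, applied with a different key per color component, can be sketched in pure Python. This is an illustrative toy rather than the proposed scheme itself: the function names and key handling are hypothetical, Python's `random` module stands in for a proper keyed generator (it is not cryptographically secure), and image dimensions are assumed to be multiples of the block size.

```python
import random


def split_blocks(channel, b):
    """Split an H x W list-of-lists into a row-major list of b x b blocks."""
    h, w = len(channel), len(channel[0])
    return [[row[x:x + b] for row in channel[y:y + b]]
            for y in range(0, h, b) for x in range(0, w, b)]


def merge_blocks(blocks, h, w, b):
    """Inverse of split_blocks."""
    per_row = w // b
    out = [[0] * w for _ in range(h)]
    for i, blk in enumerate(blocks):
        y0, x0 = (i // per_row) * b, (i % per_row) * b
        for dy, row in enumerate(blk):
            out[y0 + dy][x0:x0 + b] = row
    return out


def rotate90(blk):
    """Rotate a square block by 90 degrees clockwise."""
    return [list(r) for r in zip(*blk[::-1])]


def scramble_channel(channel, b, key):
    """Positional scrambling, rotation/flip and negative-positive
    transformation of ONE color component under its own key."""
    rng = random.Random(key)  # assumption: stand-in for a keyed PRNG
    h, w = len(channel), len(channel[0])
    blocks = split_blocks(channel, b)
    rng.shuffle(blocks)                          # positional scrambling
    out = []
    for blk in blocks:
        for _ in range(rng.randrange(4)):        # block rotation
            blk = rotate90(blk)
        if rng.random() < 0.5:                   # horizontal flip
            blk = [row[::-1] for row in blk]
        if rng.random() < 0.5:                   # negative-positive (8-bit)
            blk = [[255 - v for v in row] for row in blk]
        out.append(blk)
    return merge_blocks(out, h, w, b)


def bpbe_encrypt(r, g, b_ch, block, keys, shuffle_key):
    """Scramble R, G, B independently, then shuffle components per block."""
    h, w = len(r), len(r[0])
    chans = [scramble_channel(c, block, k) for c, k in zip((r, g, b_ch), keys)]
    blks = [split_blocks(c, block) for c in chans]
    rng = random.Random(shuffle_key)
    for i in range(len(blks[0])):                # color component shuffling
        perm = [0, 1, 2]
        rng.shuffle(perm)
        trio = [blks[p][i] for p in perm]
        for c in range(3):
            blks[c][i] = trio[c]
    return [merge_blocks(bl, h, w, block) for bl in blks]
```

Because each component is permuted under its own key, a block position in the output generally mixes content from three different source positions, which is the intuition behind the wider color distribution mentioned in the abstract.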
Kazuyoshi TSUCHIYA Chiaki OGAWA Yasuyuki NOGAMI Satoshi UEHARA
Pseudorandom number generators for cryptography are required to generate pseudorandom numbers that have good statistical properties as well as unpredictability. An m-sequence is a linear feedback shift register sequence with maximal period over a finite field. M-sequences have good statistical properties; however, they must be nonlinearized for cryptographic purposes. A geometric sequence is a sequence obtained by applying a nonlinear feedforward function to an m-sequence. Nogami, Tada and Uehara proposed a geometric sequence whose nonlinear feedforward function is given by the Legendre symbol, and showed the period, periodic autocorrelation and linear complexity of the sequence. Furthermore, Nogami et al. proposed a generalization of the sequence, and showed its period and periodic autocorrelation. In this paper, we first investigate the linear complexity of the geometric sequences. For the case in which the Chan-Games formula, which describes the linear complexity of geometric sequences, does not hold, we derive a new formula by considering the sequence of complement numbers, the Hasse derivative and cyclotomic classes. Under some conditions, the results on the linear complexity of Sidel'nikov sequences ensure that the geometric sequences have a large linear complexity. The geometric sequences have a long period and, under some conditions, a large linear complexity; however, they do not have the balance property. In order to construct sequences that have the balance property, we propose interleaved sequences of the geometric sequence and its complement. Furthermore, we show the periodic autocorrelation and linear complexity of the proposed sequences. The proposed sequences have the balance property, and have a large linear complexity if the geometric sequences have a large one.
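A toy instance of a geometric sequence built from an m-sequence and the Legendre symbol is sketched below. Everything concrete here is an assumption chosen for illustration: the field GF(3), the primitive polynomial x^2 + x + 2, the convention that zero and quadratic residues map to 0, and the naive symbol-by-symbol interleaving with the complement, which need not match the interleaving structure of the paper.

```python
P = 3  # assumption: a small odd prime so the example is easy to check


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1


def m_sequence_gf3(length):
    """LFSR over GF(3) for x^2 + x + 2: s[n+2] = 2*s[n+1] + s[n] (mod 3).

    The polynomial is primitive, so the period is 3^2 - 1 = 8.
    """
    s = [0, 1]
    while len(s) < length:
        s.append((2 * s[-1] + s[-2]) % P)
    return s[:length]


def geometric_sequence(length):
    """Binary sequence: 1 for quadratic non-residues, 0 otherwise
    (the treatment of zero is an assumed convention)."""
    return [1 if legendre(a, P) == -1 else 0 for a in m_sequence_gf3(length)]


def interleave_with_complement(t):
    """Naive interleaving t0, ~t0, t1, ~t1, ...; balanced by construction."""
    out = []
    for bit in t:
        out.extend((bit, 1 - bit))
    return out


if __name__ == "__main__":
    t = geometric_sequence(8)
    print(m_sequence_gf3(8), t, sum(t))
    u = interleave_with_complement(t)
    print(u, sum(u), len(u))
```

In this toy case one period of the geometric sequence contains three ones and five zeros, so it is not balanced, whereas the interleaved sequence has equally many ones and zeros.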
A parallel phrase matching (PM) engine for dictionary compression is presented. A hardware-based parallel chaining hash eliminates erroneous PM results caused by hash collisions, while a newly designed storage architecture holding the PM results resolves the data dependency issue. As a result, the average compression speed is increased by 53%.
Recent studies utilize multiple kernel learning to deal with the incomplete-data problem. In this study, we introduce new methods that not only complete multiple incomplete kernel matrices simultaneously, but also allow control of the flexibility of the model by parameterizing the model matrix. By imposing restrictions on the model covariance, overfitting of the data is avoided. A limitation of kernel matrix estimation via optimization of an objective function is that the positive definiteness of the result is not guaranteed. In view of this limitation, our proposed methods employ the LogDet divergence, which ensures the positive definiteness of the resulting inferred kernel matrix. We empirically show that our proposed restricted covariance models, employed with the LogDet divergence, yield significant improvements in generalization performance over previous completion methods.
Yuhua SUN Qiang WANG Qiuyan WANG Tongjiang YAN
In the past two decades, many generalized cyclotomic sequences have been constructed and used in cryptography and communication systems for their high linear complexity and low autocorrelation. However, few papers have focused on the 2-adic complexity of such sequences. In this paper, we first give a property of a class of Gaussian periods based on Whiteman's generalized cyclotomic classes of order 4. Then, as an application of this property, we study the 2-adic complexity of a class of Whiteman's generalized cyclotomic sequences constructed from two distinct primes p and q. We prove that the 2-adic complexity of this class of sequences of period pq is lower bounded by pq-p-q-1. This lower bound is greater than one half of the period, which shows that this class of sequences can resist the rational approximation algorithm (RAA) attack.
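For readers unfamiliar with the quantity, the 2-adic complexity of a binary sequence with period N is commonly defined as log2((2^N - 1) / gcd(2^N - 1, S(2))), where S(2) is the sum of s_i·2^i over one period. The sketch below computes this definition directly and evaluates the bound pq-p-q-1 from the abstract; the function names and the sample primes are illustrative assumptions, and the code does not construct Whiteman's sequences.

```python
from math import gcd, log2


def two_adic_complexity(period_bits):
    """2-adic complexity from one period of a binary sequence,
    using log2((2^N - 1) / gcd(2^N - 1, S(2)))."""
    n = len(period_bits)
    s2 = sum(bit << i for i, bit in enumerate(period_bits))  # S(2)
    m = (1 << n) - 1
    return log2(m // gcd(s2, m))


def lower_bound(p, q):
    """Lower bound pq - p - q - 1 stated in the abstract."""
    return p * q - p - q - 1


if __name__ == "__main__":
    print(two_adic_complexity([1, 0, 1, 0]))  # true period is 2, so log2(3)
    p, q = 5, 13  # illustrative primes only
    print(lower_bound(p, q), p * q / 2)
```

For the illustrative primes the bound is 46 against a half period of 32.5, consistent with the claim that the bound exceeds half the period.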
Luo CHEN Ye WU Wei XIONG Ning JING
For spatial online aggregation, traditional stand-alone serial methods are becoming increasingly limited. Although parallel computing is widely studied nowadays, little research has been conducted on index-based parallel online aggregation methods, specifically for spatial data. In this letter, a two-step parallel multilevel indexing method is proposed to accelerate spatial online aggregation analyses. In the first step, a parallel aR tree index is built to accelerate aggregate queries locally. In the second step, a multilevel sampling data pyramid structure is built on the parallel aR tree index, which allows query results to be returned concurrently with a certain degree of confidence. Experimental and analytical results verify that the methods are capable of handling billion-scale data.
Tongxin YANG Tomoaki UKEZONO Toshinori SATO
Multiplication is a key fundamental function for many error-tolerant applications. Approximate multiplication is considered to be an efficient technique for trading off energy against performance and accuracy. This paper proposes an accuracy-controllable multiplier whose final product is generated by a carry-maskable adder. The proposed scheme can dynamically select the length of the carry propagation to satisfy the accuracy requirements flexibly. The partial product tree of the multiplier is approximated by the proposed tree compressor. An 8×8 multiplier design is implemented by employing the carry-maskable adder and the compressor. Compared with a conventional Wallace tree multiplier, the proposed multiplier reduced power consumption by between 47.3% and 56.2% and critical path delay by between 29.9% and 60.5%, depending on the required accuracy. Its silicon area was also 44.6% smaller. In addition, results from two image processing applications demonstrate that the quality of the processed images can be controlled by the proposed multiplier design.
Manabu KOBAYASHI Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
F.P. Preparata et al. proposed a fault diagnosis model for finding all faulty units in a multicomputer system by using the outcomes of tests that each unit performs on some other units. In this paper, for probabilistic diagnosis models, we present an efficient diagnosis algorithm that obtains the a posteriori probability that each unit is faulty given the test outcomes. Furthermore, we propose a method to analyze the diagnostic error probability of this algorithm.
Dongshin YANG Yutaka JITSUMATSU
Compressed Sensing (CS) is known to provide better channel estimation performance than the Least Square (LS) method. However, multipath delays may not be resolved if they fall between grid points. This grid problem of CS is an obstacle to super-resolution channel estimation. Atomic Norm (AN) minimization is one of the methods for estimating continuous parameters. AN minimization can successfully recover a spectrally sparse signal from a few time-domain samples even though the dictionary is continuous. Studies have shown that the AN minimization method has better resolution than conventional CS methods. In this paper, we propose a channel estimation method based on AN minimization for Spread Spectrum (SS) systems. The accuracy of the proposed channel estimation is compared with that of the conventional LS method and the Dantzig Selector (DS) of CS. In addition to the application to channel estimation in wireless communication, we also show that AN minimization can be applied to the Global Positioning System (GPS) using Gold sequences.
Md Belayet ALI Takashi HIRAYAMA Katsuhisa YAMANAKA Yasuaki NISHITANI
In this paper, we propose a design of reversible adder/subtractor blocks and arithmetic logic units (ALUs). The main concept of our approach is different from that of the existing related studies; we emphasize the function design. Our approach of investigating the reversible functions includes (a) the embedding of irreversible functions into incompletely-specified reversible functions, (b) the operation assignment, and (c) the permutation of function outputs. We give some extensions of these techniques for further improvements in the design of reversible functions. The resulting reversible circuits are smaller than those of the existing design in terms of the number of multiple-control Toffoli gates. To evaluate the quantum cost of the obtained circuits, we convert the circuits to reduced quantum circuits for experiments. The results also show the superiority of our realization of adder/subtractor blocks and ALUs in quantum cost.
Fanxin ZENG Xiping HE Guojun LI Guixin XUAN Zhenyu ZHANG Yanni PENG Sheng LU Li YAN
This paper increases the family size of quadrature amplitude modulation (QAM) complementary sequences with binary inputs. By employing a new mathematical description, B-type-2, of the 4^q-QAM constellation (integer q ≥ 2), a new construction yielding 4^q-QAM complementary sequences (CSs) of length 2^m (integer m ≥ 2) is developed. The resultant sequences include the known QAM CSs with binary inputs as special cases, and the family size of the new sequences is approximately 2^(2·2^q-4q-1)(2^(2·2^q-3)-1) times that of the known ones. In addition, the new and the known sequences have the same peak envelope power (PEP) upper bounds when they are used in an orthogonal frequency-division multiplexing communication system.
Satoshi KAWAKAMI Takatsugu ONO Toshiyuki OHTSUKA Koji INOUE
We propose a parallel precomputation method for real-time model predictive control. The key idea is to use predicted input values produced by model predictive control to solve an optimal control problem in advance. It is well known that control systems are not suitable for multi- or many-core processors because feedback-loop control systems are inherently based on sequential operations. However, since the proposed method does not rely on conventional thread-/data-level parallelism, it can be easily applied to such control systems without changing the algorithm in applications. A practical evaluation using three real-world model predictive control system simulation programs demonstrates that the proposed method drastically improves performance without degrading control quality.
Yusuke SUZUKI Hiroshi YAMADA Shinpei KATO Kenji KONO
Graphics processing units (GPUs) have become an attractive platform for general-purpose computing (GPGPU) in various domains. Making GPUs a time-multiplexed resource is key to consolidating GPGPU applications (apps) in multi-tenant cloud platforms. However, advanced GPGPU apps pose a new challenge for consolidation. Such highly functional GPGPU apps, referred to as GPU eaters, can easily monopolize a shared GPU and starve collocated GPGPU apps. This paper presents GLoop, a software runtime that enables us to consolidate GPGPU apps including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eaters' high functionality while proportionally scheduling them on a shared GPU in an isolated manner. We implemented a prototype of GLoop and ported eight GPU eaters to it. The experimental results demonstrate that our prototype successfully schedules the consolidated GPGPU apps on the basis of its scheduling policy and isolates resources among them.
By exploiting the inherent sparsity of wireless channels, channel estimation in an orthogonal frequency division multiplexing (OFDM) system can be cast as a compressed sensing (CS) problem to estimate the channel more accurately. In practice, matching pursuit algorithms such as orthogonal matching pursuit (OMP) are used, where the path delays of the channel are estimated from the correlation values between the residual and every quantized delay. This full-search approach requires a predefined high-resolution grid of delays, which incurs high computational complexity because the correlation with the residual must be calculated at a huge number of grid points. Meanwhile, high-resolution correlation values can be obtained by interpolating the correlation values on a low-resolution grid, and this interpolation can be implemented with a low pass filter (LPF). Exploiting this fact, in this paper we substantially reduce the computational complexity of calculating the correlation values in CS-based channel estimation.
Osamu WATANABE Hiroyuki KOBAYASHI Hitoshi KIYA
An efficient two-layer coding method that uses the histogram packing technique and is backward compatible with legacy JPEG is proposed in this paper. JPEG XT, the international standard for compressing HDR images, adopts a two-layer coding scheme for backward compatibility with legacy JPEG. However, this two-layer coding structure does not give better lossless performance than other existing single-layer methods for HDR image compression. Moreover, the lossless compression of JPEG XT has a problem in determining the coding parameters: the lossless performance is affected by the input images and/or the parameter values, so an appropriate combination of values must be found to achieve good lossless performance. We first point out that the histogram packing technique, which takes the histogram sparseness of HDR images into account, can improve the performance of lossless compression. Then, a novel two-layer coding method with the histogram packing technique and an additional lossless encoder is proposed. The experimental results demonstrate that the proposed method not only has better lossless compression performance than JPEG XT, but also needs no image-dependent parameter values to achieve good compression performance, all without losing backward compatibility with the well-known legacy JPEG standard.
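Histogram packing itself is a simple remapping: when only a few of the possible sample values actually occur (a sparse histogram), the occurring values are replaced by consecutive indices and the lookup table is kept for exact reconstruction. The sketch below shows that general idea only; the function names and sample values are hypothetical, and it does not reproduce the two-layer JPEG XT-compatible codec of the paper.

```python
def pack_histogram(pixels):
    """Map the distinct values in `pixels` to consecutive indices.

    Returns the packed samples and the sorted table of original values,
    which a decoder needs in order to invert the mapping.
    """
    used = sorted(set(pixels))
    index_of = {value: i for i, value in enumerate(used)}
    return [index_of[v] for v in pixels], used


def unpack_histogram(packed, used):
    """Exact inverse of pack_histogram."""
    return [used[i] for i in packed]


if __name__ == "__main__":
    samples = [0, 1024, 4096, 1024, 65535]  # made-up sparse 16-bit samples
    packed, table = pack_histogram(samples)
    print(packed, table)
    print(unpack_histogram(packed, table) == samples)
```

The packed samples occupy a dense, much smaller range than the original ones, which is what makes a subsequent lossless encoder more effective on sparse-histogram images.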
Gaoyuan ZHANG Hong WEN Longye WANG Xiaoli ZENG Jie TANG Runfa LIAO Liang SONG
A simple and novel multiple-symbol differential detection (MSDD) scheme is proposed for IEEE 802.15.4 binary phase shift keying (BPSK) receivers. The detection is initiated by estimating and compensating for the carrier frequency offset (CFO) effect in the chip sample of interest. With these new statistics, the decisions are made jointly by allowing the observation window length to be longer than two bit intervals. Simulation results demonstrate that the detection reliability of IEEE 802.15.4 BPSK receivers is significantly improved. Namely, at a packet error rate (PER) of 1×10^-3, the signal-to-noise ratio (SNR) gap between ideal coherent detection (perfect carrier reference phase and no CFO) with differential decoding and conventional optimal single differential coherent detection (SDCD) is narrowed by 2.1 dB when the observation window length is set to 6 bit intervals. Consequently, less energy is consumed by retransmissions.
Koichi MITSUNARI Yoshinori TAKEUCHI Masaharu IMAI Jaehoon YU
A significant portion of computational resources of embedded systems for visual detection is dedicated to feature extraction, and this severely affects the detection accuracy and processing performance of the system. To solve this problem, we propose a feature descriptor based on histograms of oriented gradients (HOG) consisting of simple linear algebra that can extract equivalent information to the conventional HOG feature descriptor at a low computational cost. In an evaluation, a leading-edge detection algorithm with this decomposed vector HOG (DV-HOG) achieved equivalent or better detection accuracy compared with conventional HOG feature descriptors. A hardware implementation of DV-HOG occupies approximately 14.2 times smaller cell area than that of a conventional HOG implementation.