A Tikhonov-regularized RLS algorithm with an exponential weighting factor, i.e., a leaky RLS (LRLS) algorithm, was proposed by the author. A quadratic version of the LRLS algorithm also exists in the adaptive filtering literature. In this letter, a cubic version of the LRLS filter is proposed that is computationally efficient when the adaptive filter is short. The proposed LRLS filter requires only one division per iteration, although the number of multiplications and additions increases. Simulation results show that, for short filter lengths, the proposed LRLS filter is faster than the existing quadratic version of the LRLS filter.
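As a rough illustration of the cost function behind a leaky RLS filter, the sketch below solves the Tikhonov-regularized, exponentially weighted normal equations directly at every sample. The function name `leaky_rls_direct`, the forgetting factor `lam`, and the leakage term `delta` are illustrative assumptions. The direct solve is used only for clarity and is not the recursion proposed in the letter.

```python
import numpy as np

def leaky_rls_direct(x, d, order=3, lam=0.99, delta=1e-2):
    """Minimize sum_i lam^(n-i) * (d_i - w^T u_i)^2 + delta * ||w||^2 at each n.

    Assumed reference form: the normal equations are re-solved every sample,
    which is simple to read but not the letter's division-saving recursion.
    """
    R = np.zeros((order, order))   # exponentially weighted autocorrelation
    p = np.zeros(order)            # exponentially weighted cross-correlation
    w = np.zeros(order)
    u = np.zeros(order)            # tap-delay line [x[n], x[n-1], ...]
    for n in range(len(x)):
        u = np.concatenate(([x[n]], u[:-1]))
        R = lam * R + np.outer(u, u)
        p = lam * p + d[n] * u
        # Tikhonov (leaky) term: delta * I is added before solving.
        w = np.linalg.solve(R + delta * np.eye(order), p)
    return w

# Identify a short, noise-free FIR system (hypothetical test signal).
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.1])
x = rng.standard_normal(2000)
d = np.convolve(x, h)[:len(x)]
w = leaky_rls_direct(x, d)
```

With a short filter the matrix being solved is small, which is the regime the abstract targets.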
Di YAO Xin ZHANG Qiang YANG Weibo DENG
An improved beamformer, which uses joint estimation of the reconstructed interference-plus-noise (IPN) covariance matrix and the array steering vector (ASV), is proposed. It mitigates the performance degradation that occurs when the desired signal is present in the sample covariance matrix and the steering vector has large pointing errors. In the proposed method, the covariance matrix is reconstructed as a weighted sum of the outer products of the interferences' ASVs, weighted by their individual powers, so as to reject the desired signal component; the coefficients can be accurately estimated by compressed sensing (CS) and total least squares (TLS) techniques. Moreover, according to the theorem of sequential vector space projection, the actual ASV is estimated from the intersection of two subspaces by applying the alternating projection algorithm. Simulation results are provided to demonstrate the performance of the proposed beamformer, which is clearly better than that of existing robust adaptive beamformers.
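The reconstruction step described above can be sketched as follows, assuming a half-wavelength uniform linear array and already-known interference angles and powers. The helper names (`steering_vector`, `reconstruct_ipn`, `mvdr_weights`) and the array geometry are assumptions made for illustration. The CS/TLS coefficient estimation and the alternating-projection ASV estimation from the abstract are not reproduced here.

```python
import numpy as np

def steering_vector(theta_deg, n_sensors, spacing=0.5):
    """ASV of a uniform linear array; spacing in wavelengths (assumed geometry)."""
    k = np.arange(n_sensors)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def reconstruct_ipn(interf_angles, interf_powers, noise_power, n_sensors):
    """IPN covariance as a weighted sum of outer products a_i a_i^H plus noise."""
    R = noise_power * np.eye(n_sensors, dtype=complex)
    for theta, p in zip(interf_angles, interf_powers):
        a = steering_vector(theta, n_sensors)
        R += p * np.outer(a, a.conj())
    return R

def mvdr_weights(R_ipn, a_desired):
    """Distortionless-response weights w = R^{-1} a / (a^H R^{-1} a)."""
    Ra = np.linalg.solve(R_ipn, a_desired)
    return Ra / (a_desired.conj() @ Ra)

# Hypothetical scenario: two interferers at 20 dB INR, desired signal at 0 deg.
n = 10
R = reconstruct_ipn([-30.0, 40.0], [100.0, 100.0], 1.0, n)
a0 = steering_vector(0.0, n)
w = mvdr_weights(R, a0)
gain_desired = abs(w.conj() @ a0)
gain_interf = abs(w.conj() @ steering_vector(40.0, n))
```

Because the reconstructed matrix contains no desired-signal term, the weights keep unit gain toward the desired direction while placing deep nulls on the interferers.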
Takashi YOKOTA Kanemitsu OOTSU Takeshi OHKAWA
The interconnection network is an indispensable component of parallel computers, since it is responsible for the communication capabilities of the system. It affects system-level performance as well as the physical and logical structure of the system. Although many studies have been reported that enhance interconnection network technology, many issues remain to be discussed. One of the most important is congestion management. In an interconnection network, many packets are transferred simultaneously and interfere with each other. Congestion arises as a result of this interference. It spreads quickly, seriously degrades communication performance, and persists for a long time. Thus, the network should be controlled appropriately to suppress congestion and maintain maximum performance. Many studies address the problem and present effective methods; however, the maximal performance in an ideal situation has not been sufficiently clarified. Finding the ideal performance is, in general, an NP-hard problem. This paper introduces a particle swarm optimization (PSO) methodology to overcome the problem. We first formalize the optimization problem in a form suitable for the PSO method and present a simple PSO application as the naive model. Then, we discuss reducing the size of the search space and introduce three practical variations of the PSO computation model: the repetitive model, the expansion model, and the coding model. We furthermore introduce some non-PSO methods for comparison. Our evaluation results reveal the high potential of the PSO method. The repetitive and expansion models significantly accelerate collective communication, making it up to 1.72 times faster than in the bursty communication condition.
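For readers unfamiliar with PSO, a minimal global-best variant is sketched below on a generic continuous objective. The function `pso_minimize`, its parameter values, and the toy objective are assumptions for illustration. The paper's own formulation of the congestion problem and its repetitive, expansion, and coding models are not reproduced.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=30, n_iters=200,
                 bounds=(-5.0, 5.0), inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; all hyperparameters are illustrative defaults."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                   # personal best positions
    pbest_val = np.array([objective(p) for p in pos])
    g = np.argmin(pbest_val)
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]     # swarm-wide best
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: inertia + pull toward personal best + pull toward global best.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Toy objective standing in for a communication-time cost (hypothetical).
best_x, best_val = pso_minimize(lambda x: float(np.sum((x - 1.0) ** 2)), dim=4)
```

In a network setting, each particle position would encode a candidate schedule and the objective would come from a simulator; that mapping is the part the paper formalizes.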
Yuan SUN Xing-she ZHOU Gang YANG
In this letter, we investigate the computation offloading problem in cloud-based multi-robot systems, in which user weights, communication interference, and cloud resource limitations are jointly considered. To minimize the system cost, two offloading selection and resource allocation algorithms are proposed. Numerical results show that both proposed algorithms greatly reduce the overall system cost, and that the greedy-selection-based algorithm even achieves near-optimal performance.
Can CHEN Dengyin ZHANG Jian LIU
The multi-hypothesis prediction technique, which exploits inter-frame correlation efficiently, is widely used in block-based distributed compressive video sensing. To solve the problem of inaccurate multi-hypothesis prediction at low sampling rates and to enhance the reconstruction quality of non-key frames, we present a resample-based hybrid multi-hypothesis scheme for block-based distributed compressive video sensing. The innovations in this paper include: (1) multi-hypothesis reconstruction based on measurement reorganization (MR-MH), which integrates side information into the original measurements; (2) hybrid multi-hypothesis (H-MH) reconstruction, which adaptively mixes multiple multi-hypothesis reconstructions by resampling each reconstruction. Experimental results show that the proposed scheme outperforms the state-of-the-art technique at the same low sampling rate.
Shanlin XIAO Tsuyoshi ISSHIKI Dongju LI Hiroaki KUNIEDA
Object detection is an essential and expensive process in many computer vision systems. It is hard for standard off-the-shelf embedded processors to balance performance and power when implementing object detection applications. In this work, we explore an Application Specific Instruction set Processor (ASIP) for object detection using the Histogram of Oriented Gradients (HOG) feature. Algorithm simplifications are adopted to reduce memory bandwidth requirements and mathematical complexity without losing reliability. In addition, parallel histogram generation and an on-the-fly Support Vector Machine (SVM) calculation architecture are employed to reduce the required cycle count. The HOG algorithm on the proposed ASIP is accelerated by a factor of 63 compared with the pure software implementation. The ASIP was synthesized with a standard 90 nm CMOS library, occupying a silicon area of 1.31 mm² and consuming 47.8 mW at 200 MHz. Our object detection processor achieves 42 frames per second (fps) on VGA video. The evaluation and implementation results show that the proposed ASIP is both area-efficient and power-efficient while being competitive with commercial CPUs/DSPs. Furthermore, our ASIP exhibits performance comparable even to that of hard-wired designs.
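The reported figures imply a per-pixel cycle budget and a per-frame energy, which can be worked out as below. This is simple arithmetic on the numbers quoted in the abstract, assuming VGA means 640×480 pixels and that the 47.8 mW figure applies while processing at 42 fps. The variable names are illustrative.

```python
# Figures quoted in the abstract.
clock_hz = 200e6        # 200 MHz operating frequency
fps = 42                # frames per second on VGA video
power_mw = 47.8         # power consumption in milliwatts

# Assumption: VGA resolution is 640 x 480 pixels.
width, height = 640, 480

cycles_per_frame = clock_hz / fps
cycles_per_pixel = cycles_per_frame / (width * height)
energy_per_frame_mj = power_mw / fps   # mW / (frames/s) = mJ per frame

print(f"{cycles_per_frame:.0f} cycles/frame, "
      f"{cycles_per_pixel:.1f} cycles/pixel, "
      f"{energy_per_frame_mj:.2f} mJ/frame")
```

A budget of roughly fifteen cycles per pixel gives a sense of why parallel histogram generation and on-the-fly SVM evaluation matter for this design.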
Knowledge graphs have been shown to be useful for many tasks in artificial intelligence. Triples of knowledge graphs are traditionally structured by human editors or extracted from semi-structured information; however, editing is expensive, and semi-structured information is not common. On the other hand, most such information is stored as text. Hence, it is necessary to develop a method that can extract knowledge from texts and then construct or populate a knowledge graph; this has been attempted in various ways. Currently, there are two approaches to constructing a knowledge graph. One is open information extraction (Open IE), and the other is knowledge graph embedding; however, neither is without problems. Stanford Open IE, the current best such system, requires labeled sentences as training data, and knowledge graph embedding systems require numerous triples. Recently, distributed representations of words have become a hot topic in the field of natural language processing, since this approach does not require labeled data for training. These representations require only plain text, yet Mikolov showed that they perform well on the word analogy task, answering questions such as “a is to b as c is to __?” This can be considered a task of extracting knowledge from text to find the missing entity of a triple. However, the accuracy is not sufficiently high when the method is applied in a straightforward manner to relations in knowledge graphs, since it uses only one triple as a positive example. In this paper, we analyze why distributed representations perform such tasks well; we also propose a new method for extracting knowledge from texts that requires much less annotated data. Experiments show that the proposed method achieves considerable improvement compared with the baseline; in particular, the improvement in HITS@10 was more than doubled for some relations.
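The word analogy task mentioned above is commonly solved by vector arithmetic followed by a nearest-neighbor search under cosine similarity. The sketch below shows that baseline on hand-made three-dimensional vectors. The function `analogy` and the toy vocabulary are assumptions for illustration, not trained embeddings and not the method proposed in the paper.

```python
import numpy as np

def analogy(emb, a, b, c):
    """Answer "a is to b as c is to ?" via the word nearest to v_b - v_a + v_c."""
    target = emb[b] - emb[a] + emb[c]
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):          # exclude the question words themselves
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# Toy, hand-made vectors (hypothetical; real embeddings are learned from text).
emb = {
    "france": np.array([1.0, 0.0, 0.2]),
    "paris":  np.array([1.0, 1.0, 0.2]),
    "japan":  np.array([0.0, 0.0, 1.0]),
    "tokyo":  np.array([0.0, 1.0, 1.0]),
    "berlin": np.array([0.5, 1.0, 0.0]),
}
answer = analogy(emb, "france", "paris", "japan")
print(answer)  # tokyo
```

Here the pair (france, paris) acts as the single positive example of the relation, which is the limitation the abstract points out for knowledge graph relations.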
Vinay RAVINDRA Hirobumi SAITO Jiro HIROKAWA Miao ZHANG Atsushi TOMIKI
A TM010 cavity power combiner is presented that interfaces directly with microstrip lines via magnetic field coupling. A prototype is fabricated and its S-matrix is measured. From the S-parameters, we calculate that it shows less than 0.85 dB insertion loss over a 250 MHz bandwidth at X-band. The power returned to the input ports is less than -15 dB over this bandwidth. We verify the insertion loss estimated from the S-matrix by measuring the transmission S-parameter of a concatenated 2-port divider-combiner network. The performance of the power combiner when one of the inputs fails is analyzed in the same way. We find that graceful degradation can be achieved provided that a particular reflection phase is ensured at the degraded port.
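To put the decibel figures in linear terms, the short calculation below converts the quoted bounds into power ratios. It uses only the standard relation ratio = 10^(dB/10). The helper name `db_to_power_ratio` is an illustrative assumption.

```python
def db_to_power_ratio(db):
    """Convert a power level in decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)

# Insertion loss bound of 0.85 dB: fraction of input power that reaches the output.
combining_efficiency = db_to_power_ratio(-0.85)

# Return power bound of -15 dB: fraction of input power reflected to an input port.
returned_fraction = db_to_power_ratio(-15.0)

print(f"efficiency >= {combining_efficiency:.1%}, "
      f"returned power <= {returned_fraction:.1%}")
```

An insertion loss below 0.85 dB therefore corresponds to a combining efficiency above roughly 82%, and a return level below -15 dB to about 3% of the power or less.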
Rong CHEN Cunqian FENG Sisan HE Yi RAO
The extraction of micro-motion parameters is strongly influenced by the precision with which the translational motion parameters are estimated. Based on the periodicity of micro-motion, quadratic polynomial fitting is carried out on the range delays to align the envelope. The micro-motion component of the phase information is eliminated by conjugate multiplication, after which the translational motion parameters are estimated. The translational motion is then precisely compensated through third-order polynomial fitting. Simulation results demonstrate that the proposed algorithm achieves precise compensation of translational motion even at a low signal-to-noise ratio (SNR).
Seiji MOCHIZUKI Katsushige MATSUBARA Keisuke MATSUMOTO Chi Lan Phuong NGUYEN Tetsuya SHIBAYAMA Kenichi IWATA Katsuya MIZUMOTO Takahiro IRITA Hirotaka HARA Toshihiro HATTORI
A 197 mW, 70 ms-latency, Full-HD, 12-channel video-processing SoC for in-vehicle information systems has been implemented in 16 nm CMOS. The SoC integrates 17 video processors of 6 types so that video processing operates independently of other processing on the CPU/GPU. The synchronization scheme between the video processors achieves a low latency of 70 ms for driver assistance. The optimized implementation of lossy and lossless video-data compression reduces memory access data by half and power consumption by 20%.
Toru NAKURA Tetsuya IIZUKA Kunihiro ASADA
This paper demonstrates a PLL compiler that generates the final GDSII data from a specification of input and output frequencies with PVT corner conditions. A pulse width controlled PLL (PWPLL) is composed of digital blocks, and is thus suitable for being designed with a standard cell library and laid out with a commercially available place-and-route (P&R) tool. A PWPLL has 8 design parameters. Our PLL compiler decides the 8 parameters and confirms the PLL operation with the following functions: 1) it calculates rough parameter values based on an analytical model, 2) generates SPICE and gate-level Verilog netlists with the given parameter values, 3) runs SPICE simulations and analyzes the waveforms to examine the oscillation frequency or the voltage of specified nodes at a given time, 4) adjusts the parameter values in the appropriate direction, depending on the waveform analyses, to obtain optimized parameter values, 5) generates scripts that can be used in commercial design tools and invokes the tools with the gate-level Verilog netlist to obtain the final LVS/DRC-verified GDSII data from the P&R and verification tools, and finally 6) generates the necessary characteristic summary sheets from post-layout SPICE simulations of the netlist extracted from the GDSII. Our compiler was applied to a 0.18 µm standard CMOS technology to design a PLL with a 600 MHz output and a 600/16 MHz input frequency, and the PLL operation was confirmed with 1.2 mW power and an 85 µm × 85 µm layout area.
Shuichi HONDA Takahiro ISHINABE Yosei SHIBATA Hideo FUJIKAKE
We investigated the effects of bending stress on the change in phase retardation of curved polycarbonate substrates and on the optical characteristics of flexible liquid crystal displays (LCDs). We clarified that the change in phase retardation was extremely small even for substrates with a small radius of curvature, because the bending stresses occurring in the inner and upper surfaces cancel each other out. We compensated for the phase retardation of the polycarbonate substrates with a positive C-plate and successfully suppressed light leakage in both non-curved and curved states. These results indicate the feasibility of high-quality flexible LCDs using polycarbonate substrates even in curved states.
Li CHEN Ling YANG Juan DU Chao SUN Shenglei DU Haipeng XI
The extreme learning machine (ELM) has recently attracted many researchers' interest due to its very fast learning speed, good generalization ability, and ease of implementation. However, it has a linear output layer, which may limit its capability to exploit the available information, since higher-order statistics of the signals are not taken into account. To address this, we propose a novel ELM architecture in which the linear output layer is replaced by a Volterra filter structure. Additionally, principal component analysis (PCA) is used to reduce the number of effective signals transmitted to the output layer. This idea not only improves the processing capability of the network, but also preserves the simplicity of the training process. We then carry out a performance evaluation and application analysis of the proposed architecture in the context of supervised classification and unsupervised equalization, respectively. The results, obtained on publicly available datasets and on various channels, are compared with those produced by previously proposed ELM versions and by a state-of-the-art algorithm, the support vector machine (SVM). They highlight the adequacy and advantages of the proposed architecture and characterize it as a promising tool for signal processing tasks.
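The architecture described above can be sketched roughly as follows: a random hidden layer, a PCA projection of the hidden outputs, and a second-order Volterra expansion whose coefficients are fitted by least squares. The class `VolterraELM`, the tanh activation, the SVD-based PCA, the second-order truncation, and all sizes are assumptions chosen for illustration. The paper's exact structure and training procedure may differ.

```python
import numpy as np

def volterra2(Z):
    """Second-order Volterra expansion: [1, z_i, z_i * z_j for i <= j]."""
    n, k = Z.shape
    iu, ju = np.triu_indices(k)
    return np.hstack([np.ones((n, 1)), Z, Z[:, iu] * Z[:, ju]])

class VolterraELM:
    """Hypothetical ELM variant with a PCA stage and a Volterra output layer."""

    def __init__(self, n_hidden=50, n_components=5, seed=0):
        self.n_hidden, self.n_components = n_hidden, n_components
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Random, untrained input weights, as in a standard ELM.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # PCA via SVD keeps only the leading hidden-layer directions.
        self.mean = H.mean(axis=0)
        _, _, Vt = np.linalg.svd(H - self.mean, full_matrices=False)
        self.P = Vt[:self.n_components].T
        # Output layer: least-squares fit of the Volterra kernel coefficients.
        Phi = volterra2((H - self.mean) @ self.P)
        self.beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return self

    def predict(self, X):
        Z = (self._hidden(X) - self.mean) @ self.P
        return volterra2(Z) @ self.beta

# Toy regression target with a product term (hypothetical data).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = X[:, 0] * X[:, 1]
model = VolterraELM().fit(X, y)
mse = float(np.mean((model.predict(X) - y) ** 2))
```

The PCA stage matters because the number of second-order Volterra terms grows quadratically with the number of signals fed to the output layer; with 5 components the expansion has 21 terms instead of 1326 for 50 hidden units.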
Xushan CHEN Jibin YANG Meng SUN Jianfeng LI
In order to significantly reduce the time and space needed, compressive sensing builds upon the fundamental assumption of sparsity under a suitable discrete dictionary. However, in many signal processing applications there is a mismatch between the assumed and the true sparsity bases, so that the actual representative coefficients do not lie on the finite grid discretized by the assumed dictionary. Unlike previous work, this paper introduces a unified compressive measurement operator into atomic norm denoising and investigates the problem of recovering the frequency support of a combination of multiple sinusoids from sub-Nyquist samples. We provide some useful properties to ensure the optimality of the unified framework via semidefinite programming (SDP). We also provide a sufficient condition that guarantees the uniqueness of the optimizer with high probability. Theoretical results demonstrate that the proposed method can locate the nonzero coefficients on an infinitely dense grid over a wide range of SNRs.
Eishin MURAKAMI Yuki OGURO Yuji SAKAMOTO
Head-mounted displays (HMDs) and augmented reality (AR) are actively being studied. However, ordinary AR HMDs for visual assistance have a problem: users have difficulty focusing their eyes simultaneously on both the real target object and the displayed image, because the image can only be displayed at a fixed distance from a user's eyes, regardless of where the real object is located in three-dimensional space. Therefore, we considered incorporating holographic technology, an ideal three-dimensional (3D) display technology, into an AR HMD system. The few existing studies on holographic HMDs have had technical problems and shortcomings in size and weight. This paper proposes a compact holographic AR HMD system with the aim of realizing an ideal 3D AR HMD system that can correctly reconstruct the image at any depth. A Fourier transform optical system (FTOS) was implemented using only one lens in order to achieve a compact and lightweight structure, and a compact holographic AR HMD system was constructed. The experimental results showed that the proposed system can reconstruct sharp images at the correct depth over a wide depth range. This study realized an ideal 3D AR HMD system that allows simultaneous viewing of both the real target object and the reconstructed image without visual fatigue.
Rational proofs, introduced by Azar and Micali (STOC 2012), are a variant of interactive proofs in which the prover is rational and may deviate from the protocol to increase his reward. Guo et al. (ITCS 2014) demonstrated that rational proofs are relevant to delegation of computation. By restricting the prover to be computationally bounded, they presented a one-round delegation scheme with sublinear verification for functions computable by log-space uniform circuits with logarithmic depth. In this work, we study rational proofs in which the verifier is also rational and may deviate from the protocol to decrease the prover's reward. We construct a three-message delegation scheme with sublinear verification for functions computable by log-space uniform circuits with polylogarithmic depth in the random oracle model.
The massive amount of computation involved in real-time evaluation of deep neural networks poses a serious challenge for battery-powered systems, and neuromorphic systems specialized for neural networks have been developed. This paper first shows that, in recent large-scale deep convolutional neural networks, the proportion of neurons active at any one time dwindles toward the output layer. Spike-based, asynchronous neuromorphic systems take advantage of this sparse activation and reduce dynamic power consumption, while synchronous systems may waste much dynamic power on clocking even when activation is sparse. We thus propose a clock gating-based dynamic power reduction method that exploits the sparse activation for synchronous neuromorphic systems. We apply the proposed method to a building block of a recently proposed synchronous neuromorphic computing system and demonstrate up to 79% dynamic power saving at negligible overhead.
Naoki TAKADA Masato FUJIWARA ChunWei OOI Yuki MAEDA Hirotaka NAKAYAMA Takashi KAKUE Tomoyoshi SHIMOBABA Tomoyoshi ITO
This study proposes high-speed playback of computer-generated holograms using a digital micromirror device for high-definition spatiotemporal division multiplexing electroholography. The results indicate that a high-definition 3-D movie of 3-D objects comprising approximately 900,000 points was successfully reconstructed at 60 fps when each frame was divided into twelve parts.
Li GUO Dajiang ZHOU Shinji KIMURA Satoshi GOTO
For mobile video codecs, the large energy dissipation caused by external memory traffic is a critical challenge under battery power constraints. Lossy embedded compression (EC) is considered in this paper as a solution to this challenge. While previous studies on lossy EC mostly focused on algorithm optimization to reduce distortion, this work is, to the best of our knowledge, the first to address distortion control. First, from both theoretical analysis and experiments on distortion optimization, we conclude that, at the frame level, allocating memory traffic evenly is a reliable approximation to the optimal solution that minimizes quality loss. Then, to avoid the complexity of decoding twice, the distortion between two sequences is estimated by a linear function of the distortion calculated within one sequence. Finally, on the basis of even allocation, distortion control is proposed to determine the amount of memory traffic according to a given distortion limit. With adaptive target setting and updating of the estimating function in each group of pictures (GOP), scene changes in the video stream are supported without adding a detector or a retraining process. Experimental results show that the proposed distortion control accurately fixes the quality loss to the target. Compared with the baseline of negative feedback on non-referred B frames, it achieves about twice the memory traffic reduction.
Siya BAO Tomoyuki NITTA Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose a safe and comprehensive route finding algorithm for pedestrians based on lighting and landmark conditions. Safety and comprehensiveness can be predicted by five possible indicators: (1) lighting conditions, (2) landmark visibility, (3) landmark effectiveness, (4) turning counts along a route, and (5) road widths. We first investigate the impact of these five indicators on pedestrians' perceptions of safety and comprehensiveness during route finding. We then propose a route finding algorithm for pedestrians. In the algorithm, we design a score based on indicators (1), (2), (3), and (5) above and also introduce a turning count reduction strategy for indicator (4). With these, we find a safe and comprehensive route. In particular, we design the daytime score and the nighttime score differently and find an appropriate route depending on the time period. Experimental simulation results demonstrate that the proposed algorithm obtains higher scores than several existing algorithms. We also demonstrate, in accordance with questionnaire results, that the proposed algorithm is able to find safe and comprehensive routes for pedestrians in real environments.