Tati ERLINA Renyuan ZHANG Yasuhiko NAKASHIMA
An efficient approximate computing circuit is developed for polynomial functions through a hybrid of the analog and stochastic domains. Unlike ordinary time-based stochastic computing (TBSC), the proposed circuit exploits not only the duty cycle of pulses but also the pulse strength of the analog current to carry information for multiplications. The accumulation of many multiplications is performed by merely collecting the stochastic current. As the calculation depth increases, the growth of latency (in summations) and the signal power weakening and disparity of output signals (in multiplications) are substantially avoided, in contrast to conventional TBSC. Furthermore, the calculation range theoretically extends to bipolar infinity without scaling. The proposed multi-domain stochastic computing (MDSC) is designed and simulated in a 0.18 µm CMOS technology by employing a set of current mirrors and an improved TBSC circuit scheme based on the Neuron-MOS mechanism. As a proof of concept, multiply-and-accumulate calculations (MACs) are implemented, achieving an average accuracy of 95.3%. More importantly, the transistor count, power consumption, and latency decrease to 6.1%, 55.4%, and 4.2% of those of the state-of-the-art TBSC circuit, respectively. The robustness against temperature and process variations is also investigated and presented in detail.
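The multiply-and-accumulate idea described above can be illustrated with a purely behavioural sketch. It assumes that each pulse carries one operand in its duty cycle (a value in [0, 1]) and the other in its signed current amplitude, and that the currents of all pulses are summed on a shared node and averaged over one period. The function name `mdsc_mac`, the deterministic pulse placement, and the step count are illustrative assumptions, not details of the circuit in the paper.

```python
def mdsc_mac(amplitudes, duties, steps=1000):
    """Behavioural model (assumption): average of summed pulse currents.

    amplitudes: signed pulse strengths (one operand of each product)
    duties:     duty cycles in [0, 1] (the other operand of each product)
    """
    collected = 0.0
    for t in range(steps):
        # Current on the shared node at time step t: every pulse that is
        # still "on" contributes its amplitude.
        collected += sum(a for a, d in zip(amplitudes, duties) if t < d * steps)
    # Averaging over the period yields approximately sum(a_i * d_i).
    return collected / steps


if __name__ == "__main__":
    print(mdsc_mac([2.0, -1.0], [0.5, 0.25]))
```

In this model the accumulation costs no extra time steps as more products are added, which is the property the abstract attributes to current collection.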
Hao XIAO Kaikai ZHAO Guangzhu LIU
This work presents a DNN accelerator architecture specifically designed for performing efficient inference on compressed and sparse DNN models. Leveraging the data sparsity, a runtime processing scheme is proposed to deal with the encoded weights and activations directly in the compressed domain without decompressing. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified using the Xilinx Virtex-7 FPGA. Experimental results show that, when running AlexNet, it is 1.99× and 1.95× faster and 20.38× and 3.04× more energy efficient than CPU and mGPU platforms, respectively.
Jianli CAO Zhikui CHEN Yuxin WANG He GUO Pengcheng WANG
Like many processors, the GPGPU suffers from the memory wall. The traditional solution to this issue is to use efficient schedulers to hide long memory access latency or to use a data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of the GPU pipeline and analyze the relationship between the capacity of a GPU kernel and the instruction miss rate. We improve the next-line prefetch mechanism to fit the SIMT model of the GPU and determine the optimal parameters of the prefetch mechanism on the GPU through experiments. The experimental results show that the prefetch mechanism achieves a 12.17% performance improvement on average. Compared with enlarging the I-Cache, the prefetch mechanism benefits more kernels at a lower cost.
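To make the next-line prefetch idea concrete, the following toy simulation compares instruction miss rates with and without prefetching. It is a minimal sketch under stated assumptions: a fully associative LRU instruction cache, a single instruction stream (no warps or SIMT scheduling), and a hypothetical `prefetch_degree` parameter giving the number of following lines fetched on a miss. It does not reproduce the improved mechanism or the parameters studied in the paper.

```python
from collections import OrderedDict


def miss_rate(trace, cache_lines, prefetch_degree):
    """Miss rate of an LRU instruction cache with next-line prefetch on miss."""
    cache = OrderedDict()  # keys are cache-line addresses, ordered by recency
    misses = 0

    def touch(line):
        # Insert the line or refresh its recency; evict the LRU line if full.
        if line in cache:
            cache.move_to_end(line)
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)

    for line in trace:
        if line in cache:
            touch(line)
        else:
            misses += 1
            touch(line)
            # Next-line prefetch: also bring in the following lines.
            for k in range(1, prefetch_degree + 1):
                touch(line + k)
    return misses / len(trace)


if __name__ == "__main__":
    # A kernel of 64 instruction lines executed 4 times, cache of 32 lines.
    trace = list(range(64)) * 4
    for degree in (0, 1, 3):
        print(degree, miss_rate(trace, cache_lines=32, prefetch_degree=degree))
```

The example uses a kernel larger than the cache, the case in which the abstract relates kernel capacity to the instruction miss rate.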
Ryota OKUMURA Keiichi MIZUTANI Hiroshi HARADA
In this paper, we propose two schemes that improve the delay and the current consumption for efficient polling communications in multi-hop networks based on the receiver-initiated media access control (MAC) protocol. Polling communications can offer reliable data collection by avoiding communication collisions, but the larger delay and current consumption of the round-trip operation should be improved. The first proposal is an enhanced source routing scheme for downlink communications. In the proposed scheme, multiple candidate relay terminals can be loaded in the routing information, so the route is not specified uniquely. The delay and the current consumption are improved by shortening the waiting time for communication timings based on the flexible routing. The second proposal is a round-trip delay reduction scheme that focuses on the bi-directionality of polling communications. The proposed scheme reduces the round-trip delay by offering frequent communication timings for uplink communications. This paper also proposes the joint application of the enhanced source routing scheme and the round-trip delay reduction scheme in polling communications. Computer simulations that assume a multi-hop network based on the feathery receiver-initiated transmission (F-RIT) protocol in stable channel conditions are carried out. The results show the effectiveness of the proposed schemes in improving the delay and the current consumption. When the polling interval is 900 s, the combination of the two proposed schemes improves the round-trip delay by up to 44.1% and the current consumption by up to 38.7% on average.
Takashi NAGAMATSU Mamoru HIROE Hisashi ARAI
An eye model expressed by a revolution about the optical axis of the eye is one of the most accurate models for use in a 3D gaze estimation method. The measurement range of the previous gaze estimation method that uses two cameras based on the eye model is limited by the larger of the two angles between the gaze and the optical axes of the two cameras. The previous method cannot calculate the gaze when the rotation angle of the eye exceeds a certain limit. In this paper, we show the characteristics of reflections on the surface of the eye from two light sources when the eye rotates. Then, we propose a method that extends the rotation angle of the eye for 3D gaze estimation based on this model. The proposed method uses reflections that were not used in the previous method. We developed an experimental gaze tracking system for a wide projector screen and experimentally validated the proposed method with 20 participants. The results show that the proposed method can measure the gaze of a larger number of people with increased accuracy compared with the previous method.
Takuya SAKAMOTO Sohei MITANI Toru SATO
We experimentally evaluate the performance of a noncontact system that measures the heartbeat of a sleeping person. The proposed system comprises a pair of radar systems installed at two different positions. We use millimeter-wave ultra-wideband multiple-input multiple-output array radar systems and evaluate the performance attained in measuring the heart inter-beat interval and body movement. The importance of using two radar systems instead of one is demonstrated in this paper. We conduct three types of experiments; the first and second experiments are radar measurements of three participants lying on a bed with and without body movement, while the third experiment is the radar measurement of a participant actually sleeping overnight. The experiments demonstrate that the performance of the radar-based vital measurement strongly depends on the orientation of the person under test. They also show that the proposed system detects 70% of rolling-over movements made overnight.
Masaaki ISEKI Takamichi NAKAMOTO
An olfactory display is a device that presents smells. The temporal characteristics of three types of olfactory displays, namely one based on high-speed switching of solenoid valves, a desktop-type one based on a SAW atomizer, and a wearable-type one based on a SAW atomizer, were evaluated using three odorants with different volatilities. The sensory test revealed that the olfactory displays based on a SAW atomizer had faster presentation speeds than the one based on solenoid valve switching. In particular, the wearable one had an excellent temporal characteristic. These results largely depend on the difference in the odor delivery method. The data obtained in this study provide basic knowledge for creating olfactory content.
Masahito YATA Go OTSURU Yukitoshi SANADA
In this paper, user scheduling with beam selection for full-digital massive multi-input multi-output (MIMO) is proposed. Inter-user interference (IUI) can be canceled by precoding such as zero-forcing at a massive MIMO base station if ideal hardware implementation is assumed. However, owing to the non-ideal characteristics of hardware components, IUI occurs among multiple user terminals allocated on the same resource. Thus, in the proposed scheme, the directions of beams for allocated user terminals are adjusted to maximize the total user throughput. User allocation based on the user throughput after the adjustment of beam directivity is then carried out. Numerical results obtained through computer simulation show that when the number of user terminals in the cell is two and the number of user terminals allocated to one resource block (RB) is two, the throughput per subcarrier per subframe improves by about 3.0 bits. On the other hand, the fairness index (FI) is reduced by 0.03. This is because only the probability in the high throughput region increases as shown in the cumulative distribution function (CDF) of throughput per user. Also, as the number of user terminals in the cell increases, the amount of improvement in throughput decreases. As the number of allocated user terminals increases, more user terminals are allocated to the cell-edge, which reduces the average throughput.
Soudalin KHOUANGVICHIT Eiji OKI
This paper proposes an optimization model under uncertain traffic demands for designing a backup network that minimizes its total capacity while protecting the primary network from multiple link failures, where the probability of link failure is specified. The hose uncertainty is adopted to express uncertain traffic demands. The probabilistic survivability guarantee is provided by determining both primary and backup network routing simultaneously. Robust optimization is introduced to provide probabilistic survivability guarantees for different link capacities in the primary network model under the hose uncertainty. Robust optimization in the proposed model handles two uncertain items: uncertain failed primary links with different capacities and uncertain traffic demands. We formulate an optimization problem for the proposed model. Since it is difficult to solve directly, we introduce a heuristic approach for the proposed model. By using the heuristic approach, we investigate how the probability of link failure affects both primary and backup network routing. Numerical results show that the proposed model yields a backup network with lower total capacity requirements than the conventional model for the link failure probabilities examined in this paper. The results indicate that the proposed model reduces the total capacity of the backup network compared to the conventional model under the hose uncertainty. The proposed model shares the backup resources more effectively to protect primary links by determining routing in both primary and backup networks.
Shogo NAKAMURA Sho IWAZAKI Koichi ICHIGE
This paper presents a method to optimize 2-D sparse array configurations along with a technique to interpolate holes to accurately estimate the direction of arrival (DOA). Conventional 2-D sparse arrays are often defined using a closed-form representation and have the property that they can create hole-free difference co-arrays that can estimate DOAs of incident signals that outnumber the physical elements. However, this property restricts the array configuration to a limited structure and results in a significant mutual coupling effect between consecutive sensors. In this paper, we introduce an optimization-based method for designing 2-D sparse arrays that enhances flexibility of array configuration as well as DOA estimation accuracy. We also propose a method to interpolate holes in 2-D co-arrays by nuclear norm minimization (NNM) that permits holes and to extend array aperture to further enhance DOA estimation accuracy. The performance of the proposed optimum arrays is evaluated through numerical examples.
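The notions of a difference co-array and its holes can be made concrete with a short sketch. It assumes sensors placed on an integer grid and defines a hole as any lag inside the bounding rectangle of the co-array that no sensor pair produces; the function names and this hole definition are illustrative assumptions. The optimization-based array design and the nuclear norm minimization step of the paper are not shown.

```python
from itertools import product


def difference_coarray(sensors):
    """Set of all pairwise position differences (lags) of a 2-D array."""
    return {(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in product(sensors, repeat=2)}


def holes(sensors):
    """Lags inside the co-array's bounding rectangle that no pair generates."""
    coarray = difference_coarray(sensors)
    max_x = max(abs(dx) for dx, _ in coarray)
    max_y = max(abs(dy) for _, dy in coarray)
    full = set(product(range(-max_x, max_x + 1), range(-max_y, max_y + 1)))
    return sorted(full - coarray)


if __name__ == "__main__":
    dense = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 2x2 uniform array
    sparse = [(0, 0), (3, 0), (0, 1)]         # three widely spaced sensors
    print(holes(dense))
    print(len(difference_coarray(sparse)), len(holes(sparse)))
```

A hole-free co-array corresponds to `holes(...)` returning an empty list; an array that permits holes trades such completeness for freedom in sensor placement.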
Kazutaka KIKUTA Li YI Lilong ZOU Motoyuki SATO
In this paper, we propose a cross-correlation method applied to multistatic ground penetrating radar (GPR) data sets to detect road pavement damage. Pavement cracks and delamination cause variations in electromagnetic wave propagation. The proposed method can detect velocity change using cross-correlation of data traces at different times. An artificially damaged airport taxiway model was measured, and the method captures the positions of damaged parts.
Zhengfeng GU Hongying TANG Xiaobing YUAN
Source localization in a wireless sensor network (WSN) is sensitive to the sensors' positions. In practice, due to mobility, the receivers' positions may be known inaccurately, leading to non-negligible degradation in source localization performance. The goal of this paper is to develop a semidefinite programming (SDP) method using time-difference-of-arrival (TDOA) and frequency-difference-of-arrival (FDOA) measurements that takes the sensor position uncertainties into account. Specifically, we transform the commonly used maximum likelihood estimator (MLE) problem into a convex optimization problem to obtain an initial estimate. To reduce the coupling between the position and velocity estimators, we also propose an iterative method that obtains the velocity and position by using the weighted least squares (WLS) method and the SDP method, respectively. Simulations show that the method can approach the Cramér-Rao lower bound (CRLB) under both mild and high noise levels.
Nida RASHEED Waqar S. QURESHI Shoab A. KHAN Manshoor A. NAQVI Eisa ALANAZI
Surveillance through aerial systems has been in place for years. Such systems are expensive, and a large fleet is in operation around the world without upgrades. These systems have multiple low-resolution analog cameras on board, with Digital Video Recorders (DVRs) at the control station. The generated digital videos contain multiple scenes from multiple feeds embedded in a single video stream and lack video stabilization. Replacing on-board analog cameras with the latest digital counterparts requires huge investment. These videos require stabilization and other automated video analysis preprocessing steps before they are passed to the mosaicing algorithm. Available mosaicing software is not tailored to segregate feeds from different cameras and scenes, automate image enhancements, and stabilize before mosaicing (image stitching). We present "AirMatch", a new automated system that first separates camera feeds and scenes, then stabilizes and enhances the video feed of each camera, generates a mosaic of each scene of every feed, and produces a high-quality mosaic by stitching the mosaics of all feeds. In our proposed solution, state-of-the-art video analytics techniques are tailored to work on videos from vintage cameras in aerial applications. Our new framework is independent of specialized hardware requirements and generates effective mosaics. An affine motion transform with a smoothing Gaussian filter is selected for the stabilization of videos. A histogram-based method is used for scene change detection and image contrast enhancement. Oriented FAST and rotated BRIEF (ORB) is selected for feature detection and description in video stitching. Several experiments on a number of video streams are performed, and the analysis shows that our system can efficiently generate mosaics of videos with high distortion and artifacts, compared with other commercially available mosaicing software.
Hitoshi KAWAKITA Hiroyuki YOMO Petar POPOVSKI
In this paper, we advocate applying the concept of content-based wake-up to distributed estimation in wireless sensor networks employing wake-up receivers. With distributed estimation, where sensing data of multiple nodes are used for estimating a target observation, the energy consumption can be reduced by ensuring that only a subset of nodes in the network transmit their data, such that the collected data can guarantee the required estimation accuracy. In this case, a sink needs to selectively wake up those sensor nodes whose data can contribute to the improvement of estimation accuracy. In this paper, we propose wake-up signaling called estimative sampling (ES) that can selectively activate the desired nodes by using content-based wake-up control. The ES method includes a mechanism that dynamically searches for the desired nodes over a distribution of sensing data. With numerical results obtained by computer simulations, we show that the distributed estimation with ES method achieves lower energy consumption than conventional identity-based wake-up while satisfying the required accuracy. We also show that the proposed dynamic mechanism finely controls the trade-off between delay and energy consumption to complete the distributed estimation.
Chiharu KATAOKA Osamu KUKIMOTO Yuichiro YOSHIKAWA Kohei OGAWA Hiroshi ISHIGURO
Connected services have been under development in the automotive industry. Meanwhile, the volume of predictive notifications that utilize travel-related data is increasing, and there are concerns that drivers cannot process such an amount of information or do not readily accept and follow such instructions because the information provided is only predicted. In this work, an interactive voice system using two agents is proposed to realize notifications that can easily be accepted by drivers and to enhance the reliability of the system by adding contextual information. An experiment was performed using a driving simulator to compare the following three forms of notifications: (1) notification with no contextual information, (2) notification with contextual information using one agent, and (3) notification with contextual information using two agents. The notification content was limited to probable near-miss incidents. The results of the experiment indicate that the driver may decelerate more with the one- and two-agent notification methods than with the conventional notification method. The degree of deceleration depended on the number of times the notification was provided and on whether there were cars parked on the streets.
Shunsuke YAMAKI Kazuhiro FUKUI Masahide ABE Masayuki KAWAMATA
This paper proposes a statistical analysis of phase-only correlation (POC) functions under the phase fluctuation of signals due to additive Gaussian noise. We derive the probability density function of the phase-spectrum differences between an original signal and its noise-corrupted version with additive Gaussian noise. Furthermore, we evaluate the expectation and variance of the POC functions between these two signals. As the variance of the Gaussian noise increases, the expectation of the peak of the POC function monotonically decreases and the variance of the POC function monotonically increases. These results mathematically guarantee the validity of the POC functions used as a similarity measure in matching techniques.
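For readers unfamiliar with the POC function, the following sketch computes it for two 1-D signals using a direct DFT from the standard library. It assumes the common definition in which the cross spectrum is normalized to unit magnitude before the inverse transform; the helper names `dft` and `poc` and the small threshold `eps` for near-zero spectral bins are illustrative choices. The statistical analysis of the paper (density of the phase differences, expectation and variance under noise) is not reproduced.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def poc(f, g, eps=1e-12):
    """Phase-only correlation: inverse DFT of the unit-magnitude cross spectrum."""
    F, G = dft(f), dft(g)
    cross = [a * b.conjugate() for a, b in zip(F, G)]
    # Keep only the phase of each bin; near-zero bins are dropped.
    phase_only = [c / abs(c) if abs(c) > eps else 0 for c in cross]
    return [v.real for v in dft(phase_only, inverse=True)]


if __name__ == "__main__":
    f = [0.5 ** t for t in range(8)]  # spectrum has no zero bins
    g = f[-3:] + f[:-3]               # f circularly delayed by 3 samples
    r = poc(f, g)
    print(r.index(max(r)), max(r))
```

For identical or circularly shifted noise-free signals the function is a unit impulse whose location encodes the shift modulo the signal length (with this sign convention, a delay of `s` samples puts the peak at index `N - s`). Adding noise to one signal perturbs the phases and lowers the peak, which is the behaviour the abstract quantifies.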
Riku AKEMA Masao YAMAGISHI Isao YAMADA
Approximate Simultaneous Diagonalization (ASD) is the problem of finding a common similarity transformation that approximately diagonalizes a given square-matrix tuple. Many data science problems have been reduced to ASD through ingenious modelling. For ASD, the so-called Jacobi-like methods have been extensively used. However, these methods have no guarantee of suppressing the magnitude of the off-diagonal entries of the transformed tuple even if the given tuple has an exact common diagonalizer, i.e., the given tuple is simultaneously diagonalizable. In this paper, to establish a powerful alternative strategy for ASD, we present a novel two-step strategy called the Approximate-Then-Diagonalize-Simultaneously (ATDS) algorithm. The ATDS algorithm decomposes ASD into (Step 1) finding a simultaneously diagonalizable tuple near the given one, and (Step 2) finding a common similarity transformation that exactly diagonalizes the tuple obtained in Step 1. The proposed approach to Step 1 is realized by solving a Structured Low-Rank Approximation (SLRA) problem with Cadzow's algorithm. In Step 2, by exploiting the idea in the constructive proof of the conditions for exact simultaneous diagonalizability, we obtain an exact common diagonalizer of the tuple obtained in Step 1 as a solution to the original ASD. Unlike the Jacobi-like methods, the ATDS algorithm is guaranteed to find an exact common diagonalizer if the given tuple happens to be simultaneously diagonalizable. Numerical experiments show that the ATDS algorithm achieves better performance than the Jacobi-like methods.
Zedong SUN Chunxiang GU Yonghui ZHENG
Sieve algorithms are regarded as the best algorithms for solving the shortest vector problem (SVP) on account of their good asymptotic complexity, which could make them outperform enumeration algorithms in solving SVP in high dimensions. However, due to their large memory requirements, sieve algorithms are not as practical as expected, especially on high-dimensional lattices. To overcome this bottleneck, the TupleSieve algorithm was proposed to reduce memory consumption through a trade-off between time and memory. In this work, aiming to make the TupleSieve algorithm more practical, we combine it with the SubSieve technique and obtain a sub-exponential gain in running time. For the 2-tuple sieve, the 3-tuple sieve, and an arbitrary k-tuple sieve, when the projection index d is selected appropriately, the time complexity of our algorithm is $O(2^{0.415(n-d)})$, $O(2^{0.566(n-d)})$, and $O(2^{\frac{k\log_2 p}{1-k}(n-d)})$, respectively. In practice, we propose a practical variant of our algorithm based on the GaussSieve algorithm. Experimental results show that our implementation is about two orders of magnitude faster than FPLLL's GaussSieve algorithm. Moreover, techniques such as the XOR-POPCNT trick, progressive sieving, and appropriate projection index selection can be exploited to obtain a further acceleration.
Shucong TIAN Meng YANG Jianpeng WANG
Z-complementary pairs (ZCPs) were proposed by Fan et al. to make up for the scarcity of Golay complementary pairs (GCPs). A ZCP of odd length N is called Z-optimal if its zero correlation zone width achieves the maximum value (N + 1)/2. In this letter, by inserting three elements into a GCP of length L, or by deleting a point of a GCP of length L, we propose two constructions of Z-optimal ZCPs with lengths L + 3 and L - 1, where $L = 2^{\alpha} 10^{\beta} 26^{\gamma}$ and α ≥ 1, β ≥ 0, γ ≥ 0 are integers. The proposed constructions generate ZCPs with new lengths that cannot be produced by earlier ones.
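The zero correlation zone of a pair can be checked numerically. The sketch below assumes the usual definition: a pair (a, b) has zone width Z if the sum of the aperiodic autocorrelations of a and b vanishes for every shift from 1 to Z - 1, so that a GCP of length N has Z = N. The function names and the toy length-4 pair are illustrative; truncating that pair is only a small example of deleting a point and is not claimed to be the construction given in the letter.

```python
def acf(seq, shift):
    """Aperiodic autocorrelation of a sequence at a non-negative shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def zcz_width(a, b):
    """Largest Z such that acf(a, s) + acf(b, s) == 0 for all 1 <= s <= Z - 1."""
    n = len(a)
    z = 1
    while z < n and acf(a, z) + acf(b, z) == 0:
        z += 1
    return z


if __name__ == "__main__":
    # A Golay complementary pair of length L = 4.
    a, b = [1, 1, 1, -1], [1, -1, 1, 1]
    print(zcz_width(a, b))            # equals the length for a GCP

    # Deleting the last element gives an odd-length pair (N = 3).
    a3, b3 = a[:-1], b[:-1]
    n = len(a3)
    print(zcz_width(a3, b3), (n + 1) // 2)
```

For the truncated pair the computed width matches the bound (N + 1)/2, so this small example is Z-optimal in the sense defined above.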
Jingjing SI Wenwen SUN Chuang LI Yinbo CHENG
Deep learning is playing an increasingly important role in the signal processing field due to its excellent performance on many inference problems. Parametric bilinear generalized approximate message passing (P-BiG-AMP) is a new approximate-message-passing-based approach to a general class of structure-matrix bilinear estimation problems. In this letter, we propose a novel feed-forward neural network architecture that realizes the P-BiG-AMP methodology with deep learning for the inference problem of compressive sensing under matrix uncertainty. The linear transforms utilized in the recovery process and the parameters involved in the input and output channels of measurement are jointly learned from training data. Simulation results show that the trained P-BiG-AMP network can achieve higher reconstruction performance than the P-BiG-AMP algorithm with parameters tuned via the expectation-maximization method.