Junshan LUO Shilian WANG Qian CHENG
Joint transmit and receive antenna selection (JTRAS) for transceive spatial modulation (TRSM) is investigated in this paper. Two low-complexity and efficient JTRAS algorithms are proposed to improve the reliability of TRSM systems by maximizing the minimum Euclidean distance (ED) among all received signals. Specifically, the QR decomposition based ED-JTRAS achieves near-optimal error performance with a moderate complexity reduction compared to the optimal ED-JTRAS method. The singular value decomposition based ED-JTRAS achieves sub-optimal error performance with a significant complexity reduction. Simulation results show that the proposed methods remarkably improve the system reliability in both uncorrelated and spatially correlated Rayleigh fading channels, compared to the conventional norm based JTRAS method.
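The max-min ED criterion can be illustrated with a minimal sketch of an exhaustive joint search. This is not the QR- or SVD-based algorithm of the paper, and it assumes a simplified signal model in which one selected transmit antenna sends one BPSK symbol and the receiver observes only the selected rows of the channel matrix; the TRSM signal model in the paper may differ. The function names, the alphabet and the channel values are hypothetical.

```python
from itertools import combinations

def min_squared_distance(H, rx_idx, tx_idx, symbols):
    """Smallest squared distance between noiseless received vectors."""
    # Assumed model: antenna t sends symbol s, so the receiver sees H[r][t] * s.
    points = [[H[r][t] * s for r in rx_idx] for t in tx_idx for s in symbols]
    best = float("inf")
    for a, b in combinations(points, 2):
        d = sum(abs(x - y) ** 2 for x, y in zip(a, b))
        best = min(best, d)
    return best

def exhaustive_jtras(H, n_rx, n_tx, symbols=(1, -1)):
    """Try every receive/transmit subset and keep the one with the largest minimum ED."""
    n_r, n_t = len(H), len(H[0])
    best = (-1.0, None, None)
    for rx in combinations(range(n_r), n_rx):
        for tx in combinations(range(n_t), n_tx):
            d = min_squared_distance(H, rx, tx, symbols)
            if d > best[0]:
                best = (d, rx, tx)
    return best

# Hypothetical 3x3 real channel; the third antenna on each side is weak.
H = [[1, 0, 0.1],
     [0, 1, 0.1],
     [0.1, 0.1, 0.1]]
distance, rx_sel, tx_sel = exhaustive_jtras(H, n_rx=2, n_tx=2)
```

The number of subset pairs grows combinatorially with the array sizes, which is the cost that lower-complexity selection methods aim to avoid.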
Hideaki KINSHO Rie TAGYO Daisuke IKEGAMI Takahiro MATSUDA Jun OKAMOTO Tetsuya TAKINE
In this paper, we consider network monitoring techniques to estimate communication qualities in wide-area mobile networks, where an enormous number of heterogeneous components such as base stations, routers, and servers are deployed. We assume that average delays of neighboring base stations are comparable, most servers have small delays, and delays at core routers are negligible. Under these assumptions, we propose Heterogeneous Delay Tomography (HDT) to estimate the average delay at each network component from end-to-end round trip times (RTTs) between mobile terminals and servers. HDT employs a crowdsourcing approach to collecting RTTs, where voluntary mobile users report their empirical RTTs to a data collection center. From the collected RTTs, HDT estimates average delays at base stations in the Graph Fourier Transform (GFT) domain and average delays at servers, by means of Compressed Sensing (CS). In the crowdsourcing approach, the performance of HDT may be degraded when the voluntary mobile users are unevenly distributed. To resolve this problem, we further extend HDT by considering the number of voluntary mobile users. With simulation experiments, we evaluate the performance of HDT.
Yi JIANG Kenichiro YAMAZAKI Toshihiro HAYATA Kohei IZUI Kanada NAKAYASU Toshifumi SATO Tatsuki OKUYAMA Jun MASHINO Satoshi SUYAMA Yukihiko OKUMURA
Massive multiple input and multiple output (Massive MIMO) is a key technique to achieve high system capacity and user data rate for the fifth generation (5G) radio access network (RAN). To implement Massive MIMO in 5G, how well it meets expectations with various user equipment (UEs) in different environments should be carefully assessed. We focused on using Massive MIMO in the low super-high-frequency (SHF) band, which is expected to be used for 5G commercial bands relatively soon. We previously developed a prototype low-SHF-band centralized-RAN Massive MIMO system that has a flexible active antenna system (AAS)-unit configuration and facilitates advanced radio coordination features, such as coordinated beamforming (CB) coordinated multi-point (CoMP). In this study, we conduct field trials to evaluate downlink (DL) multi-user (MU)-MIMO performance by using our prototype system in outdoor and indoor environments. The results indicate that about 96% of the maximum total DL system throughput can be achieved with 1 AAS unit outdoors and 2 AAS units indoors. We also investigate channel capacity based on the real propagation channel estimation data measured by the prototype system. Compared with without-CB mode, the channel capacity of with-CB mode increases by a maximum of 80% and 104% when the locations of UEs are randomly selected in the outdoor and indoor environments, respectively. Furthermore, the results from the field trial of with-CB mode with eight UEs indicate that the total DL system throughput and user data rate can be significantly improved.
We present an OpenACC-based parallel implementation of stochastic algorithms for simulating biochemical reaction networks on modern GPUs (graphics processing units). To investigate the effectiveness of using OpenACC for leveraging the massive hardware parallelism of the GPU architecture, we carefully apply OpenACC's language constructs and mechanisms to implementing a parallel version of stochastic simulation algorithms on the GPU. Comparing our OpenACC implementation with both the NVIDIA CUDA and the CPU-based implementations, we report our initial experiences on OpenACC's performance and programming productivity in the context of GPU-accelerated scientific computing.
Takanobu BABA Shinpei WATANABE Boaz JESSIE JACKIN Kanemitsu OOTSU Takeshi OHKAWA Takashi YOKOTA Yoshio HAYASAKI Toyohiko YATAGAI
The 3D holographic display has long been expected as a future human interface as it does not require users to wear special devices. However, its heavy computation requirement prevents the realization of such displays. A recent study indicates that objects and holograms with several giga-pixels should be processed in real time for the realization of high resolution and wide view angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment in order to avoid heavy inter-node communications. Then, we applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the way the object is decomposed, reduction of data transfer between the CPU and GPU, kernel integration, stream processing, and utilization of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. Experimental results show that intra-node optimizations attain an 11.52 times speed-up over the original single-node code. Further, multi-node optimizations using 8 nodes, 2 GPUs per node, attain an execution time of 4.28 sec for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92 times speed-up over sequential processing on a CPU and a 41.78 times speed-up over multi-threaded execution on a multicore CPU, using a conventional FFT-based algorithm.
Yoshio YAMAGUCHI Yuto MINETANI Maito UMEMURA Hiroyoshi YAMADA
This paper presents a conifer and broad-leaf tree classification scheme that processes high resolution polarimetric synthetic aperture radar data above X-band. To validate the proposal, fully polarimetric measurements are conducted in a precisely controlled environment to examine the difference between the scattering mechanisms of conifer and broad-leaf trees at 15 GHz. With 3.75 cm range resolution, scattering matrices of two tree types were measured by a vector network analyzer. Polarimetric analyses using the 4-component scattering power decomposition and alpha-bar angle of eigenvalue decomposition yielded clear distinction between the two tree types. This scheme was also applied to an X-band Pi-SAR2 data set. The results confirm that it is possible to distinguish between tree types using fully polarimetric and high-resolution data above X-band.
Compressive sensing has been applied to develop an effective framework for simultaneously localizing multiple targets in wireless sensor networks. Nevertheless, existing methods implicitly use analog measurements, which have infinite bit precision. In this letter, we focus on off-grid target localization using quantized measurements with only several bits. To address this, we propose a novel localization framework for jointly estimating target locations and dealing with quantization errors, based on a novel application of the variational Bayesian Expectation-Maximization methodology. Simulation results highlight its superior performance.
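To make "quantized measurements with only several bits" concrete, the following is a minimal sketch of a uniform quantizer. It assumes a uniform midpoint quantizer over a known measurement range; the quantizer actually used in the letter is not stated in the abstract, and the function name and parameters are hypothetical.

```python
def quantize(value, n_bits, lo, hi):
    """Map a real measurement in [lo, hi] to one of 2**n_bits uniform cells."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    # Clamp the cell index so out-of-range measurements saturate.
    index = min(levels - 1, max(0, int((value - lo) // step)))
    # Reconstruct at the cell midpoint; in-range error is at most step / 2.
    return index, lo + (index + 0.5) * step

# Hypothetical normalized measurement quantized to 2 bits.
index, reconstructed = quantize(0.3, n_bits=2, lo=0.0, hi=1.0)
```

With few bits the cell width is large, so the reconstruction error is no longer negligible and has to be handled by the estimator rather than ignored.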
Renyuan ZHANG Takashi NAKADA Yasuhiko NAKASHIMA
A programmable analog calculation unit (ACU) is designed for vector computations in continuous time with a compact circuit scale. Our earlier study showed that it is feasible to retrieve arbitrary two-variable functions through support vector regression (SVR) in silicon. In this work, the dimensions of regression are expanded for vector computations. However, the hardware cost and computing error greatly increase along with the expansion of dimensions. A two-stage architecture is proposed to organize multiple ACUs for high dimensional regression. The computation of high dimensional vectors is separated into several computations of lower dimensional vectors, which are implemented by the free combination of several ACUs with lower cost. In this manner, the circuit scale and regression error are reduced. The proof-of-concept ACU is designed and simulated in a 0.18 μm technology. According to the circuit simulation results, all the demonstrated calculations with nine operands are executed without iterative clock cycles by 4960 transistors. The calculation error of example functions is below 8.7%.
Karthikeyan PANJAPPAGOUNDER RAJAMANICKAM Sakthivel PERIYASAMY
Background subtraction algorithms generate a background model of the monitored scene and compare the background model with the current video frame to detect foreground objects. In general, most background subtraction algorithms fail to detect foreground objects when the scene illumination changes. An entropy based background subtraction algorithm is proposed to address this problem. The proposed method adapts to illumination changes by updating the background model according to differences in entropy value between the current frame and the previous frame. This entropy based background modeling can efficiently handle both sudden and gradual illumination variations. The proposed algorithm is tested on six video sequences and compared with four algorithms to demonstrate its efficiency in terms of F-score, similarity and frame rate.
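The idea of steering the background update by the entropy difference between consecutive frames can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the running-average model, the two learning rates, the entropy-jump threshold and all function names are assumptions introduced here.

```python
import math
from collections import Counter

def frame_entropy(frame):
    """Shannon entropy (bits) of the grey-level histogram of a 2-D frame."""
    pixels = [p for row in frame for p in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

def update_background(background, frame, prev_frame,
                      slow_rate=0.05, fast_rate=0.8, entropy_jump=0.5):
    """Running-average update whose rate depends on the entropy change."""
    # Assumed rule: a large entropy change between frames triggers a fast update.
    delta = abs(frame_entropy(frame) - frame_entropy(prev_frame))
    rate = fast_rate if delta > entropy_jump else slow_rate
    return [[(1 - rate) * b + rate * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than the threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

# Hypothetical 2x2 frames: a flat scene followed by a high-contrast one.
prev_frame = [[10, 10], [10, 10]]
frame = [[0, 255], [0, 255]]
background = update_background(prev_frame, frame, prev_frame)
mask = foreground_mask([[10, 10]], [[12, 200]])
```

A small entropy change keeps the slow rate, so the model follows gradual variations, while a large change switches to the fast rate so that the model catches up with a sudden variation.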
Jinjun LUO Shilian WANG Eryang ZHANG
Spectrum sensing is a fundamental requirement for cognitive radio, and it is a challenging problem in impulsive noise modeled by symmetric alpha-stable (SαS) distributions. The Gaussian kernelized energy detector (GKED) performs better than the conventional detectors in SαS distributed noise. However, it fails to detect the DC signal and has high computational complexity. To solve these problems, this paper proposes a more efficient and robust detector based on a Gaussian function (GF). The analytical expressions of the detection and false alarm probabilities are derived and the best parameter for the statistic is calculated. Theoretical analysis and simulation results show that the proposed GF detector has much lower computational complexity than the GKED method, and it can successfully detect the DC signal. In addition, the GF detector performs better than the conventional counterparts including the GKED detector in SαS distributed noise with different characteristic exponents. Finally, we discuss the reason why the GF detector outperforms the conventional counterparts.
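A bounded, Gaussian-shaped nonlinearity limits the influence of impulsive samples, which the following minimal sketch illustrates. The exact test statistic of the paper is not given in the abstract; the form below (the sample mean of exp(-x²/(2σ²)), with a signal declared when the statistic falls below a threshold) is an assumption, as are the function names, the sample values and the threshold.

```python
import math

def gf_statistic(samples, sigma=1.0):
    """Assumed statistic: mean of a Gaussian function of each real sample."""
    return sum(math.exp(-(x * x) / (2 * sigma ** 2)) for x in samples) / len(samples)

def detect(samples, threshold, sigma=1.0):
    """Declare a signal when samples are pushed away from zero (statistic drops)."""
    return gf_statistic(samples, sigma) < threshold

# Hypothetical noise-only record with one large impulse (50.0).
noise_only = [0.1, -0.2, 0.05, 50.0]
# The same record with a DC signal of amplitude 2 added.
with_dc = [x + 2.0 for x in noise_only]

noise_decision = detect(noise_only, threshold=0.5)
dc_decision = detect(with_dc, threshold=0.5)
```

Each sample contributes a value between 0 and 1, so a single impulse cannot dominate the statistic the way it dominates an energy sum, and a constant offset still lowers the statistic.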
Haiqiang LIU Gang HUA Hongsheng YIN Aichun ZHU Ran CUI
Compressed sensing is an effective compression algorithm. It is widely used to measure signals in distributed sensor networks (DSNs). Considering the limited resources of DSNs, the measurement matrices used in DSNs must be simple. In this paper, we construct a deterministic measurement matrix based on a Gordon-Mills-Welch (GMW) sequence. The column vectors of the proposed measurement matrix are generated by cyclically shifting a GMW sequence. Compared with some state-of-the-art measurement matrices, the proposed measurement matrix has relatively lower computational complexity and needs less storage space. It is suitable for resource-constrained DSNs. Moreover, because the proposed measurement matrix can be realized by using a simple shift register, it is more practical. The simulation results show that, in terms of recovery quality, the proposed measurement matrix performs better than some state-of-the-art measurement matrices.
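The construction of columns by cyclic shifting can be sketched as follows. For brevity this sketch substitutes a length-15 binary m-sequence, generated by a linear feedback shift register recurrence for the primitive polynomial x^4 + x + 1, in place of a GMW sequence; the GMW construction itself is not reproduced here. The function names and the way rows are truncated are assumptions.

```python
def m_sequence(taps=(0, 1), degree=4):
    """One period of the recurrence s[n + degree] = XOR of s[n + t] for t in taps."""
    seq = [1] + [0] * (degree - 1)
    for n in range(2 ** degree - 1 - degree):
        bit = 0
        for t in taps:
            bit ^= seq[n + t]
        seq.append(bit)
    return seq

def cyclic_shift_matrix(seq, n_rows):
    """Column j is the bipolar sequence cyclically shifted by j, cut to n_rows."""
    bipolar = [1 - 2 * b for b in seq]  # map bit 0 to +1 and bit 1 to -1
    n = len(bipolar)
    return [[bipolar[(i + j) % n] for j in range(n)] for i in range(n_rows)]

def column_inner_product(matrix, j, k):
    return sum(row[j] * row[k] for row in matrix)

seq = m_sequence()
full = cyclic_shift_matrix(seq, n_rows=len(seq))
short = cyclic_shift_matrix(seq, n_rows=8)  # hypothetical 8 x 15 measurement matrix
```

Only the base sequence needs to be stored, since every column is a shift of it, which is the kind of storage saving the abstract refers to.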
In this paper, a method for reducing the number of components in a direct-simulation-type active complex filter is proposed. The reduction is achieved by sharing NICs (Negative Impedance Converters) which satisfy some conditions. Compared with the conventional method, the proposed one has wide generality. As an example, a third-order complex elliptic filter is designed. The validity of the proposed method is confirmed through experiment.
Ming LI Li SHI Xudong CHEN Sidan DU Yang LI
The large computational complexity makes stereo matching a big challenge in real-time applications. The problem of stereo matching in a video sequence is slightly different from that in a still image because there exists temporal correlation among video frames. However, no existing method has considered the temporal consistency of disparity for algorithm acceleration. In this work, we propose a scheme called the dynamic disparity range (DDR) to optimize the matching cost calculation and cost aggregation steps by narrowing the disparity search range, and a scheme called the temporal cost aggregation path to optimize the cost aggregation step. Based on these schemes, we propose the DDR-SGM and the DDR-MCCNN algorithms for stereo matching in video sequences. Evaluation results show that the proposed algorithms significantly reduce the computational complexity with only a very slight loss of accuracy. We show that the proposed optimizations for stereo matching are effective and that the temporal consistency in stereo video is highly useful for either improving accuracy or reducing computational complexity.
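Narrowing the disparity search range with the previous frame's result can be sketched on a single scanline. This is a simplified illustration, not DDR-SGM or DDR-MCCNN: it assumes an absolute-difference cost, winner-take-all selection and a fixed margin around the previous disparity, and every name and value below is hypothetical.

```python
def match_scanline(left, right, max_disp, prev_disp=None, margin=1):
    """Winner-take-all disparity per pixel; returns (disparities, cost evaluations)."""
    disparities, evaluated = [], 0
    for x, value in enumerate(left):
        if prev_disp is None:
            lo, hi = 0, max_disp              # full search range
        else:
            lo = max(0, prev_disp[x] - margin)  # narrowed range around the
            hi = min(max_disp, prev_disp[x] + margin)  # previous frame's disparity
        hi = min(hi, x)                        # x - d must stay inside the scanline
        lo = min(lo, hi)
        best_d, best_cost = lo, float("inf")
        for d in range(lo, hi + 1):
            evaluated += 1
            cost = abs(value - right[x - d])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities, evaluated

# Hypothetical scanlines: the left view is the right view shifted by 2 pixels.
right = [10, 20, 30, 40, 50, 60, 70, 80]
left = [1, 2, 10, 20, 30, 40, 50, 60]

full_disp, full_evals = match_scanline(left, right, max_disp=3)
ddr_disp, ddr_evals = match_scanline(left, right, max_disp=3, prev_disp=full_disp)
```

In this toy case the narrowed search returns the same disparities with fewer cost evaluations; with a realistic disparity range the saving per pixel is correspondingly larger.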
A multi-carrier and blind shift-frequency jamming (MCBSFJ) method against pulse compression radar with an order-statistic (OS) constant false alarm rate (CFAR) detector is proposed. First, according to the detection principle of the OS-CFAR detector, the design requirements for jamming signals are proposed. Then, some key parameters of the jamming are derived based on the characteristics of the OS-CFAR detector. As a result, multiple false targets are produced around the real target, with controllable quantity, amplitude and spatial distribution. The simulation results show that the jamming method can effectively reduce the detection probability of the target.
In sparsity-based optimization problems for two dimensional (2-D) direction-of-arrival (DOA) estimation using L-shaped nested arrays, one of the major issues is computational complexity. A 2-D DOA estimation algorithm is proposed based on reconstitution sparse Bayesian learning (RSBL) and cross covariance matrix decomposition. A single measurement vector (SMV) model is obtained by the difference coarray corresponding to a one-dimensional nested array. Through spatial smoothing, the signal measurement vector is transformed into a multiple measurement vector (MMV) matrix. The signal matrix is separated by singular value decomposition (SVD) of the matrix. Using this method, the dimensionality of the sensing matrix and data size can be reduced. The sparse Bayesian learning algorithm is used to estimate one-dimensional angles. By using the one-dimensional angle estimations, the steering vector matrix is reconstructed. The cross covariance matrix of two dimensions is decomposed and transformed. Then the closed-form expression of the steering vector matrix of the other dimension is derived, and the angles are estimated. Automatic pairing can be achieved in two dimensions. Through the proposed algorithm, the 2-D search problem is transformed into a one-dimensional search problem and a matrix transformation problem. Simulations show that the proposed algorithm has better angle estimation accuracy than the traditional two-dimensional direction finding algorithm at low signal-to-noise ratio and with few samples.
Takeshi OHKAWA Kazushi YAMASHINA Takuya MATSUMOTO Kanemitsu OOTSU Takashi YOKOTA
To realize intelligent robot systems, large amounts of data from complex and diverse sensors must be processed in a short time. FPGAs are expected to improve the processing performance of robots because they offer better performance per unit of power consumption than high-performance CPUs, but their development productivity is lower than that of software. In this paper, we discuss automatic generation of FPGA components for robots. A design tool, developed for easy integration of FPGAs into robots, is proposed. The tool, named cReComp, can automatically convert a circuit written in Verilog HDL into a software component compliant with the robot software framework ROS (Robot Operating System), which is the standard in robot development. To evaluate its productivity, we conducted a subject experiment. As a result, we confirmed that the automatic generation is effective in easing the development of FPGA components for robots.
Antoniette MONDIGO Tomohiro UENO Kentaro SANO Hiroyuki TAKIZAWA
Since the hardware resource of a single FPGA is limited, one idea to scale the performance of FPGA-based HPC applications is to expand the design space with multiple FPGAs. This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve a scaled performance. For a practical exploration of this vast design space, a performance model is presented and verified with the evaluation of a tsunami simulation application implemented on Intel Arria 10 FPGAs. Finally, scalability analysis is performed, where speedup is achieved when increasing the computing pipeline over multiple FPGAs while maintaining the problem size of computation. Performance is scaled with multiple FPGAs; however, performance degradation occurs with insufficient available bandwidth and large pipeline overhead brought by inadequate data stream size. Tsunami simulation results show that the highest scaled performance for 8 cascaded Arria 10 FPGAs is achieved with a single pipeline of 5 stream processing elements (SPEs), which obtained a scaled performance of 2.5 TFlops and a parallel efficiency of 98%, indicating the strong scalability of the multi-FPGA stream computing platform.
Qingbo WANG Gaoqi DOU Jun GAO Xianwen HE
A low complexity channel estimation scheme using data-dependent superimposed training (DDST) is proposed in this paper, where the pilots are inserted in more than one block, rather than the single block of the original DDST. Compared with the original DDST (which improves the performance of channel estimation at the cost of huge computational overheads), the proposed DDST scheme improves the performance of channel estimation with only a slight increase in the consumption of computation resources. The optimal precoder is designed to minimize the data distortion caused by the rank-deficient precoding. The optimal pilots and placement are also provided to improve the performance of channel estimation. In addition, the impact of power allocation between the data and pilots on symbol detection is analyzed, and the optimal power allocation scheme is derived to maximize the effective signal-to-noise ratio at the receiver. Simulation results are presented to show the computational advantage of the proposed scheme, and the advantages of the optimal pilots and power allocation scheme.
Liqing SHAN Shexiang MA Xin MENG Long ZHOU
To solve the problem in the Automatic Identification System (AIS) that the signal in the target slot cannot be correctly received due to partial overlap of signals in adjacent time slots, this paper introduces a new criterion, maximum expected signal power (MESP), and proposes a novel beamforming algorithm based on generalized singular value decomposition (GSVD) and orthogonal projection. The algorithm employs GSVD to estimate the signal subspace, and adopts orthogonal projection to project the received signal onto the orthogonal subspace of the non-target signal. Then, a beamforming technique is used to maximize the output power of the target signal on the basis of MESP. Theoretical analysis and simulation results show the effectiveness of the proposed algorithm.
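The orthogonal projection step can be sketched for the simplest case of a single non-target signal with a known steering vector a, where the projection of x is x - a(aᴴx)/(aᴴa). This sketch omits the GSVD-based subspace estimation and the MESP beamformer of the paper; the steering vectors and amplitudes below are hypothetical values chosen so that the two vectors are orthogonal.

```python
def project_out(x, a):
    """Project x onto the orthogonal complement of span{a}."""
    a_h_x = sum(ai.conjugate() * xi for ai, xi in zip(a, x))  # a^H x
    a_h_a = sum(abs(ai) ** 2 for ai in a)                     # a^H a
    return [xi - ai * a_h_x / a_h_a for ai, xi in zip(a, x)]

# Hypothetical two-element array.
non_target = [1 + 0j, 1j]   # steering vector of the overlapping signal
target = [1 + 0j, -1j]      # steering vector of the wanted signal

# Received snapshot: 3 x target plus 2 x non-target (noise omitted).
received = [3 * t + 2 * n for t, n in zip(target, non_target)]
cleaned = project_out(received, non_target)
```

Because the two steering vectors here are orthogonal, the projection removes the overlapping signal and leaves the target component unchanged; for non-orthogonal vectors part of the target power is lost as well.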
Abderrahmane BOUDI Ivan FARRIS Miloud BAGAA Tarik TALEB
Given the exponential increase in security threats, the development of new defense strategies for pervasive environments is acquiring an ever-growing importance. The expected avalanche of heterogeneous IoT devices which will populate our industrial factories and smart houses will increase the complexity of managing security requirements in a comprehensive way. To this aim, cloud-based security services are gaining notable impetus to provide security mechanisms according to the Security-as-a-Service (SECaaS) model. However, the deployment of security applications in remote cloud data-centers can introduce several drawbacks in terms of traffic overhead and latency increase. To cope with this, Edge Computing can provide remarkable advantages by avoiding long routing detours. On the other hand, the limited capabilities of edge nodes introduce potential constraints in the overall management. This paper focuses on the provisioning of virtualized security services in resource-constrained edge nodes by leveraging lightweight virtualization technologies. Our analysis aims at shedding light on the feasibility of container-based security solutions, thus providing useful guidelines towards the orchestration of security at the edge. Our experiments show that the overhead introduced by the containerization is very light.