Siyi HU Makiko ITO Takahide YOSHIKAWA Yuan HE Hiroshi NAKAMURA Masaaki KONDO
Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite having efficient storage options for sparse data (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both integer and floating-point data used in SpMV kernels are handled plainly without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence conditions in some typical scientific computation benchmarks (based on SpMV kernels) are not degraded when adopting lower precision floating-point data. Based on these findings, in this work, we propose a simple yet effective data compression scheme that can be extended to general purpose computing architectures or, preferably, HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. Besides, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of final results.
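As a rough illustration of the two ideas mentioned in the abstract above (CSR storage and lower precision floating-point data), the following minimal sketch computes an SpMV over a CSR matrix and compares the result obtained with 64-bit values against values rounded to 32 bits. It is not the compression scheme proposed in the paper, whose details are not given here; the function names `csr_spmv` and `to_float32` and the small test matrix are assumptions made for illustration only.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : non-zero entries, row by row
    col_idx : column index of each non-zero entry
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 matrix with five non-zeros:
# [[0.1, 0.0, 0.2],
#  [0.0, 0.3, 0.0],
#  [0.4, 0.0, 0.5]]
values = [0.1, 0.2, 0.3, 0.4, 0.5]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y_full = csr_spmv(values, col_idx, row_ptr, x)

# Same matrix with values rounded to 32 bits (half the storage per value).
values_low = [to_float32(v) for v in values]
y_low = csr_spmv(values_low, col_idx, row_ptr, x)

max_abs_error = max(abs(a - b) for a, b in zip(y_full, y_low))
print("max abs error from 32-bit values:", max_abs_error)
```

Halving the bytes per stored value halves the traffic for the `values` array, which is the kind of bandwidth saving the abstract refers to; whether the resulting rounding error is acceptable depends on the application's convergence criteria.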
This paper provides a new method to implement substrate integrated defected ground structure (SIDGS)-based bandpass filter (BPF) with adjustable frequency and controllable bandwidth. Compared with previous literature, this method implements a new SIDGS-like resonator capable of tunable frequency in the same plane as the slotted line using a varactor diode, increasing the design flexibility. In addition, the method solves the problem that the tunable BPF constituted by the SIDGS resonator cannot control the bandwidth by introducing a T-shaped non-resonant unit. The theoretical design method and the structural design are shown. Moreover, the configured structure is fabricated and measured to show the validity of the design method in this paper.
Kengo TAJIRI Ryoichi KAWAHARA Yoichi MATSUO
Machine learning (ML) has been used for various tasks in network operations in recent years. However, since the scale of networks has grown and the amount of data generated has increased, it has been increasingly difficult for network operators to conduct their tasks with a single server using ML. Thus, ML with edge-cloud cooperation has been attracting attention for efficiently processing and analyzing a large amount of data. In the edge-cloud cooperation setting, although transmission latency, bandwidth congestion, and accuracy of tasks using ML depend on the load balance of processing data with edge servers and a cloud server in edge-cloud cooperation, the relationship is too complex to estimate. In this paper, we focus on monitoring anomalous traffic as an example of ML tasks for network operations and formulate transmission latency, bandwidth congestion, and the accuracy of the task with edge-cloud cooperation considering the ratio of the amount of data preprocessed in edge servers to that in a cloud server. Moreover, we formulate an optimization problem under constraints for transmission latency and bandwidth congestion to select the proper ratio by using our formulation. By solving our optimization problem, the optimal load balance between edge servers and a cloud server can be selected, and the accuracy of anomalous traffic monitoring can be estimated. Our formulation and optimization framework can be used for other ML tasks by considering the generating distribution of data and the type of an ML model. In accordance with our formulation, we simulated the optimal load balance of edge-cloud cooperation in a topology that mimicked a Japanese network and conducted an anomalous traffic detection experiment by using real traffic data to compare the estimated accuracy based on our formulation and the actual accuracy based on the experiment.
Van-Cam NGUYEN Yasuhiko NAKASHIMA
Many deep convolutional neural network (CNN) inference accelerators on the field-programmable gate array (FPGA) platform have been widely adopted due to their low power consumption and high performance. In this paper, we develop the following to improve performance and power efficiency. First, we use a high bandwidth memory (HBM) to expand the bandwidth of data transmission between the off-chip memory and the accelerator. Second, a fully pipelined design, which consists of pipelined inter-layer computation and a pipelined computation engine, is implemented to decrease idle time among layers. Third, a multi-core architecture with shared-dual buffers is designed to reduce off-chip memory access and maximize the throughput. We designed the proposed accelerator on the Xilinx Alveo U280 platform in Verilog HDL, rather than with the high-level synthesis used in previous works, and used the VGG-16 model to verify the system in our experiments. With a similar accelerator architecture, the experimental results demonstrate that the memory bandwidth of HBM is 13.2× better than DDR4. Compared with other accelerators in terms of throughput, our accelerator is 1.9×/1.65×/11.9× better than FPGA+HBM2 based/low batch size (4) GPGPU/low batch size (4) CPU. Compared with the previous DDR+FPGA/DDR+GPGPU/DDR+CPU based accelerators in terms of power efficiency, our proposed system provides 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement with the large-scale CNN model.
Midori NAGASAKA Taiki ARAKAWA Yutaro MOCHIDA Kazunori KAMEDA Shinichi FURUKAWA
In this study, we discuss a structure that realizes a wideband polarization splitter comprising fiber 1 with a single core and fiber 2 with circular pits, which touch the top and bottom of a single core. The refractive index profile of the W type was adopted in the core of fiber 1 to realize the wideband. We compared the maximum bandwidth of BW-15 (bandwidth at an extinction ratio of -15dB) for the W type obtained in this study with those (our previous results) of BW-15 for the step and graded types with cores and pits at the same location; this comparison clarified that the maximum bandwidth of BW-15 for the W type is 5.22 and 4.96 times wider than those of step and graded types, respectively. Furthermore, the device length at the maximum bandwidth improved, becoming slightly shorter. The main results of the FPS in this study are all obtained by numerical analysis based on our proposed MM-DM (a method that combines the multipole method and the difference method for the inhomogeneous region). Our MM-DM is a quite reliable method for high accuracy analysis of the FPS composed of inhomogeneous circular regions.
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
The narrow bandwidth limitation of 300-3400Hz on the public switched telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower bandwidth of 50-300Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on the YIN algorithm, where threshold processing avoids false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of 300-600Hz. In this case, since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure to avoid attenuating and overemphasizing by increasing the weight when the first formant location is lower, and vice versa. Consequently, the subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth extended speech signal.
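The sinusoidal synthesis step described above can be sketched as follows: given a detected fundamental frequency, sum sinusoids at the harmonics that fall inside the missing 50-300Hz band. This is only a minimal sketch under stated assumptions. The fundamental frequency and the amplitude are passed in directly (the YIN-based detection and the formant-dependent weighting of the 300-600Hz energy are not reproduced), the 16 kHz sampling rate is assumed, and the function names are hypothetical.

```python
import math


def low_band_harmonics(f0, low=50.0, high=300.0):
    """Return the harmonics k*f0 (k = 1, 2, ...) that lie in [low, high)."""
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    harmonics = []
    k = 1
    while k * f0 < high:
        if k * f0 >= low:
            harmonics.append(k * f0)
        k += 1
    return harmonics


def synthesize_low_band(f0, amplitude, n_samples, fs=16000.0):
    """Sum equal-amplitude sinusoids at the low-band harmonics of f0.

    amplitude is an assumed input; in the described method it would be
    derived from the weighted energy of the 300-600Hz band.
    """
    freqs = low_band_harmonics(f0)
    return [
        amplitude * sum(math.sin(2.0 * math.pi * f * n / fs) for f in freqs)
        for n in range(n_samples)
    ]


# One 10 ms frame for a voiced sound with an assumed f0 of 100 Hz.
frame = synthesize_low_band(f0=100.0, amplitude=0.1, n_samples=160)
print(low_band_harmonics(100.0))  # harmonics used for this frame
```

For an unvoiced frame no reliable fundamental frequency exists, so a caller would skip synthesis for that frame, which corresponds to the threshold processing mentioned in the abstract.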
Masaru SATO Yoshitaka NIIDA Atsushi YAMADA Junji KOTANI Shiro OZAKI Toshihiro OHKI Naoya OKAMOTO Norikazu NAKAMURA
This paper presents recent progress on high frequency and wide bandwidth GaN high power amplifiers (PAs) that are usable for high-data-rate wireless communications and modern radar systems. The key devices and design techniques for PAs are described in this paper. The results of the state-of-the-art GaN PAs for microwave to millimeter-wave applications and design methodology for ultra-wideband GaN PAs are shown. In order to realize high output power density, InAlGaN/GaN HEMTs were employed. An output power density of 14.8 W/mm in S-band was achieved, which is 1.5 times higher than that of the conventional AlGaN/GaN HEMTs. This technique was applied to the millimeter-wave GaN PAs, and the measured power density at 96 GHz was 3 W/mm. The modified Angelov model was employed for a millimeter-wave design. A W-band GaN MMIC achieved a maximum Pout of 1.15 W under CW operation. The PA with a Lange coupler achieved 2.6 W at 94 GHz. The authors also developed a wideband PA. A power combiner with an impedance transformation function based on the transmission line transformer (TLT) technique was adopted for the wideband PA design. The fabricated PA exhibited an average Pout of 233 W and an average PAE of 42% in the frequency range of 0.5 GHz to 2.1 GHz.
Hui ZHANG Bin SHENG Pengcheng ZHU
Universal filtered multicarrier (UFMC) systems offer a flexibility of filtering sub-bands with arbitrary bandwidth to suppress out-of-band (OoB) emission, while keeping the orthogonality between subcarriers in one sub-band. Oscillator discrepancies between the transmitter and receiver induce carrier frequency offset (CFO) in practical systems. In this paper, we propose a novel CFO estimation method for UFMC systems that has very low computational complexity and can then be used in practical systems. In order to fully exploit the coherence bandwidth of the channel, the training symbols are designed to have several identical segments in the frequency domain. As a result, the integral part of CFO can be estimated by simply determining the correlation between received signal and the training symbol. Simulation results show that the proposed method can achieve almost the same performance as an existing method and even a better performance in channels that have small decay parameter values. The proposed method can also be used in other multicarrier systems, such as orthogonal frequency division multiplexing (OFDM).
Teruaki SHIKUMA Yasuaki YUDA Kenichi HIGUCHI
We propose a novel non-orthogonal multiple access (NOMA)-based optimal multiplexing method for multiple downlink service channels to maximize the integrated system throughput. In the fifth generation (5G) mobile communication system, the support of various wireless communication services such as massive machine-type communications (mMTC), ultra-reliable low latency communications (URLLC), and enhanced mobile broadband (eMBB) is expected. These services will serve different numbers of terminals and have different requirements regarding the spectrum efficiency and fairness among terminals. Furthermore, different operators may have different policies regarding the overall spectrum efficiency and fairness among services. Therefore, efficient radio resource allocation is essential during the multiplexing of multiple downlink service channels considering these requirements. The proposed method achieves better system performance than the conventional orthogonal multiple access (OMA)-based multiplexing method thanks to the wider transmission bandwidth per terminal and inter-terminal interference cancellation using a successive interference canceller (SIC). Computer simulation results reveal that the effectiveness of the proposed method is especially significant when the system prioritizes the fairness among terminals (including fairness among services).
Dong YAN Xurui MAO Sheng XIE Jia CONG Dongqun HAN Yicheng WU
This paper presents an analysis of the relationship between noise and bandwidth in visible light communication (VLC) systems. In the past few years, pre-emphasis and post-equalization techniques were proposed to extend the bandwidth of VLC systems. However, these bandwidth extension techniques also influence noise and sensitivity of the VLC systems. In this paper, first, we build a system model of VLC transceivers and circuit models of pre-emphasis and post-equalization. Next, we theoretically compare the bandwidth and noise of three different transceiver structures comprising a single pre-emphasis circuit, a single post-equalization circuit and a combination of pre-emphasis and post-equalization circuits. Finally, we validate the presented theoretical analysis using experimental results. The result shows that for the same resonant frequency, and for high signal-to-noise ratio (S/N), VLC systems employing post-equalization or pre-emphasis have the same bandwidth extension ability. Therefore, a transceiver employing both the pre-emphasis and post-equalization techniques has a bandwidth √2 times the bandwidth of the systems employing only the pre-emphasis or post-equalization. Based on the theoretical analysis of noise, the VLC system with only active pre-emphasis shows the lowest noise, which is a good choice for low-noise systems. The result of this paper may provide a new perspective of noise and sensitivity of the bandwidth extension techniques in VLC systems.
Jun IWAMOTO Yuma KIKUTANI Renyuan ZHANG Yasuhiko NAKASHIMA
A paradigm shift toward edge computing infrastructures that prioritize small footprint and scalable/easy-to-estimate performance is under way. In this paper, we propose the following to improve the footprint and the scalability of systolic arrays: (1) column multithreading for reducing the number of physical units and maintaining the performance even for back-to-back floating-point accumulations; (2) a cascaded peer-to-peer AXI bus for a scalable multichip structure and an intra-chip parallel local memory bus for low latency; (3) multilevel loop control in any unit for reducing the startup overhead and adaptive operation shifting for efficient reuse of local memories. We designed a systolic array with a single column × 64 row configuration in Verilog HDL, evaluated the frequency and the performance on an FPGA attached to a ZYNQ system as an AXI slave device, evaluated the area with a TSMC 28nm library and memory generator, and identified the following: (1) the execution speed of a matrix multiplication/a convolution operation/a light-field depth extraction, whose size is larger than the capacity of the local memory, is 6.3× / 9.2× / 6.6× that of a similar systolic array (EMAX); (2) the estimated speed with a 4-chip configuration is 19.6× / 16.0× / 8.5×; (3) the size of a single chip is 8.4 mm² (0.31× of EMAX) and the basic performance per area is 2.4×.
Ryota KAMINISHI Haruna MIYAMOTO Sayaka SHIOTA Hitoshi KIYA
This study evaluates the effects of some non-learning blind bandwidth extension (BWE) methods on state-of-the-art automatic speaker verification (ASV) systems. Recently, a non-linear bandwidth extension (N-BWE) method has been proposed as a blind, non-learning, and light-weight BWE approach. Other non-learning BWEs have also been developed in recent years. For ASV evaluations, most data available to train ASV systems is narrowband (NB) telephone speech. Meanwhile, wideband (WB) data have been used to train the state-of-the-art ASV systems, such as i-vector, d-vector, and x-vector. This can cause sampling rate mismatches when all datasets are used. In this paper, we investigate the influence of sampling rate mismatches in the x-vector-based ASV systems and how non-learning BWE methods perform against them. The results showed that the N-BWE method improved the equal error rate (EER) on ASV systems based on the x-vector when the mismatches were present. We researched the relationship between objective measurements and EERs. Consequently, the N-BWE method produced the lowest EERs on both ASV systems and obtained the lower RMS-LSD value and the higher STOI score.
Richard Hsin-Hsyong YANG Chia-Kun LEE Shiunn-Jang CHERN
Continuous phase modulation (CPM) is a very attractive digital modulation scheme, with a constant envelope and high efficiency in meeting the power and bandwidth requirements. CPM signals with pairs of input sequences that differ in an infinite number of positions and map into pairs of transmitted signals with finite Euclidean distance (ED) are called catastrophic. In the CPM scheme, data sequences that have the catastrophic property are called catastrophic sequences; they are periodic difference data patterns. The catastrophic sequences usually have a shorter merger length. The corresponding minimum normalized squared ED (MNSED) is smaller and below the distance bound. Two important CPM schemes, viz., LREC and LRC schemes, are known to be catastrophic in most cases; they have poor overall power and bandwidth performance. In the literature, it has been shown that the probability of generating such catastrophic sequences is negligible; therefore, the asymptotic error performance (AEP) of those well-known catastrophic CPM schemes evaluated with the corresponding MNSED, over AWGN channels, might be too pessimistic. To deal with this problem in AWGN channels, this paper presents a new split-merged MNSED and provides criteria to explore which conventional catastrophic CPM schemes could effectively increase the length of mergers with split-merged non-periodic events. For comparison, we investigate the exact power and bandwidth performance of LREC and LRC CPM for the same bandwidth occupancy. Computer simulation results verify that the AEP evaluated with the split-merged MNSED could achieve up to 3dB gain over the conventional approach.
Xin QI Zheng WEN Keping YU Kazunori MURATA Kouichi SHIBATA Takuro SATO
Low Power Wide Area Network (LPWAN) is designed for low-bandwidth, low-power, long-distance, large-scale connected IoT applications and is realistic for networking in emergency or restricted situations, so it has been proposed as an attractive communication technology to handle unexpected situations that occur during and/or after a disaster. However, the traditional LPWAN with its default protocol suffers reduced communication efficiency in disaster situations because a large number of users sending and receiving emergency information results in communication jams and soaring error rates. In this paper, we propose an LPWAN-based decentralized network structure as an extension of our previous Disaster Information Sharing System (DISS). Our network structure is powered by Named Node Networking (3N), which is based on Information-Centric Networking (ICN). This network structure addresses the excessive useless packet forwarding and path optimization problems with node name routing (NNR). To verify our proposal, we conducted a field experiment to evaluate the efficiency of packet path forwarding between the 3N+LPWA structure and the ICN+LPWA structure. Experimental results confirm that the load of the entire data transmission network is significantly reduced after NNR optimizes the transmission path.
We have previously proposed and demonstrated a mode selective active-MMI (multimode interferometer) laser diode as a mode selective light source. This laser diode features: 1) lasing at a selected space mode, and 2) high modulation bandwidth. Based on these, it is expected to enable high speed interconnection in future personal and mobile devices. In this paper, we explain the mode selection and the high speed modulation principles. Then, we present our recent results concerning high speed frequency response of the fundamental and first order space modes.
Kazuhiko KINOSHITA Kazuki GINNAN Keita KAWANO Hiroki NAKAYAMA Tsunemasa HAYASHI Takashi WATANABE
The recent widespread use of high-performance terminals has resulted in a rapid increase in mobile data traffic. Therefore, public wireless local area networks (WLANs) are being used often to supplement the cellular networks. Capacity improvement through the dense deployment of access points (APs) is being considered. However, the effective throughput degrades significantly when many users connect to a single AP. In this paper, users are classified into guaranteed bit rate (GBR) users and best effort (BE) users, and we propose a network model to provide those services. In the proposed model, physical APs and the bandwidths are assigned to each service class dynamically using a virtual AP configuration and a virtualized backhaul network, for reducing the call-blocking probability of GBR users and improving the satisfaction degree of BE users. Finally, we evaluate the performance of the proposed model through simulation experiments and discuss its feasibility.
Takahiro NAKAMURA Kenichiro YASHIKI Kenji MIZUTANI Takaaki NEDACHI Junichi FUJIKATA Masatoshi TOKUSHIMA Jun USHIDA Masataka NOGUCHI Daisuke OKAMOTO Yasuyuki SUZUKI Takanori SHIMIZU Koichi TAKEMURA Akio UKITA Yasuhiro IBUSUKI Mitsuru KURIHARA Keizo KINOSHITA Tsuyoshi HORIKAWA Hiroshi YAMAGUCHI Junichi TSUCHIDA Yasuhiko HAGIHARA Kazuhiko KURATA
An optical I/O core based on silicon photonics technology and optical/electrical assembly was developed as a fingertip-size optical module with high bandwidth density, low power consumption, and high temperature operation. The advantages of the optical I/O core, including hybrid integration of a quantum dot laser diode and an optical pin, allow us to achieve 300-m transmission at 25Gbps per channel when the optical I/O core is mounted around a field-programmable gate array without clock data recovery.
Takafumi FUJIMOTO Keigo SHIMIZU
In this paper, a printed inverted-F antenna for radiating circularly polarized wave around its resonant frequency is proposed. To get good axial ratio at the frequency band with 10dB-return loss, a rectangular element is loaded at the feeding line perpendicularly. The axial ratio and the frequency giving the minimum axial ratio can be adjusted by the ratio of the length to the width of the whole antenna and by the dimension of the loaded rectangular element. The operational principle for circular polarization is explained using the electric current distributions. Moreover, the approach of the enhancement for the bandwidth is discussed. The simulated and measured bandwidths of the 10dB-return loss with a 3dB-axial ratio are 2.375GHz-2.591GHz (216MHz) and 2.350GHz-2.534GHz (184MHz), respectively. The proposed antenna's dimension is 0.067λc² (λc is the wavelength at the center frequency). The proposed antenna is compact and planar, and is therefore useful for circular polarization in the ISM band.
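The bandwidth figures quoted above can be checked with simple arithmetic. The short sketch below recomputes the absolute bandwidths from the band edges and also derives the center frequency and the fractional bandwidth; the fractional bandwidth is a standard derived quantity that is not stated in the abstract, and the helper name `band_summary` is an assumption for illustration.

```python
def band_summary(f_low_mhz, f_high_mhz):
    """Return (bandwidth in MHz, center frequency in MHz, fractional bandwidth in %)."""
    bandwidth = f_high_mhz - f_low_mhz
    center = (f_low_mhz + f_high_mhz) / 2
    fractional_percent = 100.0 * bandwidth / center
    return bandwidth, center, fractional_percent


# Band edges taken from the abstract, expressed in MHz.
simulated = band_summary(2375, 2591)
measured = band_summary(2350, 2534)

for label, (bw, fc, frac) in (("simulated", simulated), ("measured", measured)):
    print(f"{label}: bandwidth = {bw} MHz, center = {fc} MHz, fractional = {frac:.2f}%")
```

The recomputed absolute bandwidths agree with the 216MHz and 184MHz given in the abstract.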
We explore ways to optimize online, permutation-based authenticated encryption (AE) schemes for lightweight applications. Lightweight applications demand that AE schemes operate in resource-constrained environments, which raises two issues: 1) implementation costs must be low, and 2) ensuring proper use of a nonce is difficult due to its small size and lack of randomness. Regarding the implementation costs, it has recently been recognized that permutation-based (rather than block-cipher-based) schemes frequently show advantages. However, regarding the security under nonce misuse, the standard permutation-based duplex construction cannot ensure confidentiality. There exists one permutation-based scheme named APE which offers certain robustness against nonce misuse. Unfortunately, the APE construction has several drawbacks such as ciphertext expansion and bidirectional permutation circuits. The ciphertext expansion would require more bandwidth, and the bidirectional circuits would require a larger hardware footprint. In this paper, we propose new constructions of online permutation-based AE that require less bandwidth, a smaller hardware footprint and lower computational costs. We provide security proofs for the new constructions, demonstrating that they are as secure as the APE construction.
Liaoruo HUANG Qingguo SHEN Zhangkai LUO
Bandwidth reservation is an important way to guarantee deterministic end-to-end service quality. However, with the traditional bandwidth reservation mechanism, the allocated bandwidth at each link is by default the same, without considering the available resources of each link, which may lead to unbalanced resource utilization and limit the number of user connections that the network can accommodate. In this paper, we propose a non-uniform bandwidth reservation method, which can further balance the resource utilization of the network by optimizing the reserved bandwidth at each link according to its link load. Furthermore, to implement the proposed method, we devise a flexible and automatic bandwidth reservation mechanism based on the meter table of OpenFlow. Simulations show that our method achieves better load balancing performance and allows the network to accommodate more user connections than the traditional methods in most application scenarios.