Asuka MAKI Daisuke MIYASHITA Shinichi SASAKI Kengo NAKATA Fumihiko TACHIBANA Tomoya SUZUKI Jun DEGUCHI Ryuichi FUJIMOTO
Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
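The proportionality stated above (work scales with the number of computations multiplied by the weight bit precision) can be illustrated with a small cost model. This is only a sketch under stated assumptions: the function name `relative_cycles`, the per-filter MAC counts, and the bit widths are hypothetical and are not taken from the paper; the model also assumes cycles grow linearly with weight bit width and ignores any hardware overhead.

```python
def relative_cycles(macs_per_filter, bits_per_filter, baseline_bits):
    """Ratio of work with filter-wise bit widths to work with one uniform bit width.

    Assumes (hypothetically) that cycles are proportional to MACs * weight bits.
    """
    if len(macs_per_filter) != len(bits_per_filter):
        raise ValueError("one bit width is required per filter")
    # Work with a separate bit width for each filter.
    variable = sum(m * b for m, b in zip(macs_per_filter, bits_per_filter))
    # Work when every filter uses the same baseline bit width.
    uniform = sum(macs_per_filter) * baseline_bits
    return variable / uniform


# Illustrative numbers only: four filters quantized to 8, 4, 2 and 1 bits.
ratio = relative_cycles([1000, 1000, 2000, 4000], [8, 4, 2, 1], baseline_bits=8)
print(ratio)
```

In this model the ratio equals the MAC-weighted average bit width divided by the baseline bit width, which is why lowering the average precision filter by filter shortens execution.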
Eiji MIYANO Toshiki SAITOH Ryuhei UEHARA Tsuyoshi YAGITA Tom C. van der ZANDEN
This paper introduces the maximization version of the k-path vertex cover problem, called the MAXIMUM K-PATH VERTEX COVER problem (MaxPkVC for short). A path consisting of k vertices, i.e., a path of length k-1, is called a k-path. If a k-path Pk includes a vertex v in a vertex set S, then we say that v or S covers Pk. Given a graph G=(V, E) and an integer s, the goal of MaxPkVC is to find a vertex subset S⊆V of at most s vertices such that the number of k-paths covered by S is maximized. MaxPkVC is NP-hard in general. In this paper we consider the tractability/intractability of MaxPkVC on subclasses of graphs. We prove that MaxP3VC remains NP-hard even for split graphs. Furthermore, if the input graph is restricted to graphs whose treewidth is bounded by a constant, then MaxP3VC can be solved in polynomial time.
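The problem definition can be made concrete with an exhaustive search for k = 3 on a tiny graph. This is a minimal sketch, not the algorithm from the paper: the function names are hypothetical, 3-paths are assumed to be counted as (not necessarily induced) subgraphs with each undirected path counted once, and the search is exponential in s, so it only serves to illustrate the objective.

```python
from itertools import combinations


def three_paths(adj):
    """List every 3-path (u, v, w) with middle vertex v, each undirected path once."""
    paths = []
    for v, nbrs in adj.items():
        # Any two distinct neighbours of v form a 3-path through v.
        for u, w in combinations(sorted(nbrs), 2):
            paths.append((u, v, w))
    return paths


def max_p3vc_bruteforce(adj, s):
    """Try every vertex subset of size at most s; return (best subset, paths covered)."""
    paths = three_paths(adj)
    vertices = sorted(adj)
    best_set, best_count = (), 0
    for size in range(min(s, len(vertices)) + 1):
        for cand in combinations(vertices, size):
            chosen = set(cand)
            # A path is covered if it contains at least one chosen vertex.
            covered = sum(1 for p in paths if chosen.intersection(p))
            if covered > best_count:
                best_set, best_count = cand, covered
    return best_set, best_count


# Path graph 0-1-2-3-4: the middle vertex lies on all three 3-paths.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(max_p3vc_bruteforce(path_graph, 1))
```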
Dong YAN Xurui MAO Sheng XIE Jia CONG Dongqun HAN Yicheng WU
This paper presents an analysis of the relationship between noise and bandwidth in visible light communication (VLC) systems. In the past few years, pre-emphasis and post-equalization techniques have been proposed to extend the bandwidth of VLC systems. However, these bandwidth extension techniques also influence the noise and sensitivity of VLC systems. In this paper, first, we build a system model of VLC transceivers and circuit models of pre-emphasis and post-equalization. Next, we theoretically compare the bandwidth and noise of three different transceiver structures comprising a single pre-emphasis circuit, a single post-equalization circuit, and a combination of pre-emphasis and post-equalization circuits. Finally, we validate the presented theoretical analysis using experimental results. The results show that for the same resonant frequency, and for high signal-to-noise ratio (S/N), VLC systems employing post-equalization or pre-emphasis have the same bandwidth extension ability. Therefore, a transceiver employing both the pre-emphasis and post-equalization techniques has a bandwidth √2 times that of systems employing only pre-emphasis or only post-equalization. Based on the theoretical analysis of noise, the VLC system with only active pre-emphasis shows the lowest noise, which makes it a good choice for low-noise systems. The results of this paper may provide a new perspective on the noise and sensitivity of bandwidth extension techniques in VLC systems.
Jun IWAMOTO Yuma KIKUTANI Renyuan ZHANG Yasuhiko NAKASHIMA
The shift toward edge computing infrastructures that prioritize small footprint and scalable/easy-to-estimate performance is accelerating. In this paper, we propose the following to improve the footprint and the scalability of systolic arrays: (1) column multithreading for reducing the number of physical units and maintaining the performance even for back-to-back floating-point accumulations; (2) a cascaded peer-to-peer AXI bus for a scalable multichip structure and an intra-chip parallel local memory bus for low latency; (3) multilevel loop control in any unit for reducing the startup overhead and adaptive operation shifting for efficient reuse of local memories. We designed a systolic array with a single column × 64 row configuration in Verilog HDL, evaluated the frequency and the performance on an FPGA attached to a ZYNQ system as an AXI slave device, and evaluated the area with a TSMC 28nm library and memory generator. We identified the following: (1) the execution speed of a matrix multiplication/a convolution operation/a light-field depth extraction, whose size is larger than the capacity of the local memory, is 6.3× / 9.2× / 6.6× that of a similar systolic array (EMAX); (2) the estimated speed with a 4-chip configuration is 19.6× / 16.0× / 8.5×; (3) the size of a single chip is 8.4 mm² (0.31× of EMAX) and the basic performance per area is 2.4×.
Meiting XUE Huan ZHANG Weijun LI Feng YU
Sorting is one of the most fundamental problems in mathematics and computer science. Because high-throughput and flexible sorting is a key requirement in modern databases, this paper presents efficient techniques for designing a high-throughput sorting matrix that supports continuous data sequences. There have been numerous studies on the optimization of sorting circuits on FPGA (field-programmable gate array) platforms. These studies focused on attaining high throughput for a single command with fixed data width. However, the architectures proposed do not meet the requirement of diversity for database data types. A sorting matrix architecture is thus proposed to overcome this problem. Our design consists of a matrix of identical basic sorting cells. The sorting cells work in a pipeline and in parallel, and the matrix can simultaneously process multiple data streams, which can be combined into a high-width single-channel data stream or low-width multiple-channel data streams. It can handle continuous sequences and allows for sorting variable-length data sequences. Its maximum throughput is approximately 1.4 GB/s for 32-bit sequences and approximately 2.5 GB/s for 64-bit sequences on our platform.
Ryota KAMINISHI Haruna MIYAMOTO Sayaka SHIOTA Hitoshi KIYA
This study evaluates the effects of some non-learning blind bandwidth extension (BWE) methods on state-of-the-art automatic speaker verification (ASV) systems. Recently, a non-linear bandwidth extension (N-BWE) method has been proposed as a blind, non-learning, and light-weight BWE approach. Other non-learning BWEs have also been developed in recent years. For ASV evaluations, most data available to train ASV systems is narrowband (NB) telephone speech. Meanwhile, wideband (WB) data have been used to train state-of-the-art ASV systems, such as i-vector, d-vector, and x-vector. This can cause sampling rate mismatches when all datasets are used. In this paper, we investigate the influence of sampling rate mismatches in x-vector-based ASV systems and how non-learning BWE methods perform against them. The results showed that the N-BWE method improved the equal error rate (EER) on ASV systems based on the x-vector when the mismatches were present. We also examined the relationship between objective measurements and EERs. Consequently, the N-BWE method produced the lowest EERs on both ASV systems and obtained a lower RMS-LSD value and a higher STOI score.
Richard Hsin-Hsyong YANG Chia-Kun LEE Shiunn-Jang CHERN
Continuous phase modulation (CPM) is a very attractive digital modulation scheme, with a constant envelope and high efficiency in meeting power and bandwidth requirements. CPM signals with pairs of input sequences that differ in an infinite number of positions and map into pairs of transmitted signals with finite Euclidean distance (ED) are called catastrophic. In a CPM scheme, data sequences that have the catastrophic property are called catastrophic sequences; they are periodic difference data patterns. Catastrophic sequences usually have a shorter merger length, and the corresponding minimum normalized squared ED (MNSED) is smaller and falls below the distance bound. Two important CPM schemes, viz., the LREC and LRC schemes, are known to be catastrophic in most cases; they have poor overall power and bandwidth performance. It has been shown in the literature that the probability of generating such catastrophic sequences is negligible; therefore, the asymptotic error performance (AEP) of those well-known catastrophic CPM schemes, evaluated with the corresponding MNSED over AWGN channels, might be too pessimistic. To deal with this problem in the AWGN channel, this paper presents a new split-merged MNSED and provides criteria to determine which conventional catastrophic CPM schemes can effectively increase the length of mergers with split-merged non-periodic events. For comparison, we investigate the exact power and bandwidth performance of LREC and LRC CPM for the same bandwidth occupancy. Computer simulation results verify that the AEP evaluated with the split-merged MNSED can achieve up to 3dB gain over the conventional approach.
Farley Soares OLIVEIRA Hidefumi HIRAISHI Hiroshi IMAI
Revisiting the Sekine-Imai-Tani top-down algorithm to compute the BDD of all spanning trees and the Tutte polynomial of a given graph, we explicitly analyze the Fixed-Parameter Tractable (FPT) time complexity with respect to its (proper) pathwidth, pw (ppw), and obtain a bound of $O^*(\mathrm{Bell}_{\min\{pw+1,\,ppw\}})$, where $\mathrm{Bell}_n$ denotes the n-th Bell number, defined as the number of partitions of a set of n elements. We further investigate the case of complete graphs in terms of Bell numbers and related combinatorics, obtaining a time complexity bound of $\mathrm{Bell}_{n-O(n/\log n)}$.
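To give a feel for how quickly the Bell numbers in these bounds grow, the following sketch computes them with the Bell triangle. It only illustrates the definition (the number of partitions of an n-element set); the function name `bell_numbers` is hypothetical, and the code is not part of the BDD algorithm analyzed above.

```python
def bell_numbers(n_max):
    """Return [Bell_0, ..., Bell_{n_max}] computed with the Bell triangle."""
    if n_max < 0:
        raise ValueError("n_max must be non-negative")
    bells = [1]   # Bell_0 = 1 (the empty set has exactly one partition)
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry is the sum of its left neighbour and the
        # entry above that neighbour.
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


print(bell_numbers(5))
```

The first entry of each row of the triangle is the next Bell number, so the table needs only O(n^2) additions.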
Xin QI Zheng WEN Keping YU Kazunori MURATA Kouichi SHIBATA Takuro SATO
Low Power Wide Area Network (LPWAN) is designed for low-bandwidth, low-power, long-distance, large-scale connected IoT applications and is practical for networking in emergency or restricted situations, so it has been proposed as an attractive communication technology for handling unexpected situations that occur during and/or after a disaster. However, traditional LPWAN with its default protocol reduces communication efficiency in disaster situations because a large number of users send and receive emergency information, resulting in communication jams and soaring error rates. In this paper, we propose an LPWAN-based decentralized network structure as an extension of our previous Disaster Information Sharing System (DISS). Our network structure is powered by Named Node Networking (3N), which is based on Information-Centric Networking (ICN). This network structure addresses the excessive useless packet forwarding and path optimization problems with node name routing (NNR). To verify our proposal, we conduct a field experiment to evaluate the efficiency of packet path forwarding in the 3N+LPWA structure and the ICN+LPWA structure. Experimental results confirm that the load of the entire data transmission network is significantly reduced after NNR optimizes the transmission path.
We have previously proposed and demonstrated a mode selective active-MMI (multimode interferometer) laser diode as a mode selective light source. This laser diode features 1) lasing at a selected space mode and 2) high modulation bandwidth. Based on these features, it is expected to enable high-speed interconnection in future personal and mobile devices. In this paper, we explain the principles of mode selection and high-speed modulation. Then, we present our recent results concerning the high-speed frequency response of the fundamental and first-order space modes.
Kazuhiko KINOSHITA Kazuki GINNAN Keita KAWANO Hiroki NAKAYAMA Tsunemasa HAYASHI Takashi WATANABE
The recent widespread use of high-performance terminals has resulted in a rapid increase in mobile data traffic. Therefore, public wireless local area networks (WLANs) are being used often to supplement the cellular networks. Capacity improvement through the dense deployment of access points (APs) is being considered. However, the effective throughput degrades significantly when many users connect to a single AP. In this paper, users are classified into guaranteed bit rate (GBR) users and best effort (BE) users, and we propose a network model to provide those services. In the proposed model, physical APs and the bandwidths are assigned to each service class dynamically using a virtual AP configuration and a virtualized backhaul network, for reducing the call-blocking probability of GBR users and improving the satisfaction degree of BE users. Finally, we evaluate the performance of the proposed model through simulation experiments and discuss its feasibility.
Takahiro NAKAMURA Kenichiro YASHIKI Kenji MIZUTANI Takaaki NEDACHI Junichi FUJIKATA Masatoshi TOKUSHIMA Jun USHIDA Masataka NOGUCHI Daisuke OKAMOTO Yasuyuki SUZUKI Takanori SHIMIZU Koichi TAKEMURA Akio UKITA Yasuhiro IBUSUKI Mitsuru KURIHARA Keizo KINOSHITA Tsuyoshi HORIKAWA Hiroshi YAMAGUCHI Junichi TSUCHIDA Yasuhiko HAGIHARA Kazuhiko KURATA
An optical I/O core based on silicon photonics technology and optical/electrical assembly was developed as a fingertip-size optical module with high bandwidth density, low power consumption, and high temperature operation. The advantages of the optical I/O core, including hybrid integration of a quantum dot laser diode and an optical pin, allow us to achieve 300-m transmission at 25Gbps per channel when the optical I/O core is mounted around a field-programmable gate array without clock data recovery.
Takafumi FUJIMOTO Keigo SHIMIZU
In this paper, a printed inverted-F antenna for radiating circularly polarized wave around its resonant frequency is proposed. To get good axial ratio at the frequency band with 10dB-return loss, a rectangular element is loaded at the feeding line perpendicularly. The axial ratio and the frequency giving the minimum axial ratio can be adjusted by the ratio of the length to the width of the whole antenna and by the dimension of the loaded rectangular element. The operational principle for circular polarization is explained using the electric current distributions. Moreover, the approach of the enhancement for the bandwidth is discussed. The simulated and measured bandwidths of the 10dB-return loss with a 3dB-axial ratio are 2.375GHz-2.591GHz (216MHz) and 2.350GHz-2.534GHz (184MHz), respectively. The proposed antenna's dimension is $0.067\lambda_c^2$ ($\lambda_c$ is the wavelength at the center frequency). The proposed antenna is compact and planar, and is therefore useful for circular polarization in the ISM band.
We explore ways to optimize online, permutation-based authenticated encryption (AE) schemes for lightweight applications. The lightweight applications demand that AE schemes operate in resource-constrained environments, which raise two issues: 1) implementation costs must be low, and 2) ensuring proper use of a nonce is difficult due to its small size and lack of randomness. Regarding the implementation costs, recently it has been recognized that permutation-based (rather than block-cipher-based) schemes frequently show advantages. However, regarding the security under nonce misuse, the standard permutation-based duplex construction cannot ensure confidentiality. There exists one permutation-based scheme named APE which offers certain robustness against nonce misuse. Unfortunately, the APE construction has several drawbacks such as ciphertext expansion and bidirectional permutation circuits. The ciphertext expansion would require more bandwidth, and the bidirectional circuits would require a larger hardware footprint. In this paper, we propose new constructions of online permutation-based AE that require less bandwidth, a smaller hardware footprint and lower computational costs. We provide security proofs for the new constructions, demonstrating that they are as secure as the APE construction.
Ryoji MIYAHARA Akihiko SUGIYAMA
This paper proposes a directional noise suppressor with a specified constant beamwidth for directional interferences and diffuse noise. A directional gain is calculated based on interchannel phase difference and combined with a spectral gain commonly used in single-channel noise suppressors. The beamwidth can be specified as passband edges of the directional gain. In order to implement frequency-independent constant beamwidth, frequency-proportionate directional gains are defined for different frequencies as a constraint. Evaluation with signals recorded by a commercial PC demonstrates good agreement between the theoretical and the measured directivity. The signal-to-noise ratio improvement and the PESQ score for the enhanced signal are improved by 24.4dB and 0.3 over a conventional noise suppressor. In a speech recognition scenario, the proposed directional noise suppressor outperforms both the conventional nondirectional noise suppressor and the conventional directional noise suppressor based on phase based T/F filtering with a negligible degradation in the word error rate for clean speech.
Xiang ZHAO Zishu HE Yikai WANG Yuan JIANG
This letter addresses the problem of space-time adaptive processing (STAP) for airborne nonuniform linear array (NLA) radar using a generalized sidelobe canceller (GSC). Because the spatial nulls of NLAs are difficult to determine, a valid blocking matrix (BM) for the GSC cannot be obtained directly. To solve this problem and improve the STAP performance, a BM modification method based on the modified Gram-Schmidt orthogonalization algorithm is proposed. The modified GSC processor can achieve the optimal STAP performance as well as a faster convergence rate than the orthogonal subspace projection method. Numerical simulations validate the effectiveness of the proposed method.
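For readers unfamiliar with the orthogonalization step named above, the following is a generic sketch of modified Gram-Schmidt. It is not the blocking-matrix modification proposed in the letter: the function name, the tolerance `tol`, and the use of real-valued vectors are assumptions made for illustration (STAP data are complex, which would require conjugated inner products).

```python
import math


def modified_gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize real vectors; (near-)dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Modified variant: project the running residual w, not the
            # original v, which is numerically more stable than the classical form.
            coeff = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


basis = modified_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(len(basis))
```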
Liaoruo HUANG Qingguo SHEN Zhangkai LUO
Bandwidth reservation is an important way to guarantee deterministic end-to-end service quality. However, with the traditional bandwidth reservation mechanism, the bandwidth allocated at each link is the same by default, without considering the available resources of each link, which may lead to unbalanced resource utilization and limit the number of user connections that the network can accommodate. In this paper, we propose a non-uniform bandwidth reservation method, which can further balance the resource utilization of the network by optimizing the reserved bandwidth at each link according to its link load. Furthermore, to implement the proposed method, we devise a flexible and automatic bandwidth reservation mechanism based on the meter table of OpenFlow. Simulations show that our method achieves better load balancing performance and allows the network to accommodate more user connections than the traditional methods in most application scenarios.
Filippos BALASIS Sugang XU Yoshiaki TANAKA
Orthogonal frequency division multiplexing (OFDM) promises to provide the necessary boost in the core networks' capacity along with the required flexibility in order to cope with the Internet's growing heterogeneous traffic. At the same time, wavelength division multiplexing (WDM) technology remains a cost-effective and reliable solution especially for long-haul transmission. Due to the higher implementation cost of optical OFDM transmission technology, it is expected that OFDM-based bandwidth variable transponders (BVT) will co-exist with conventional WDM ones. In this paper, we provide an integer linear programming (ILP) formulation that minimizes the cost and power consumption of such hybrid architecture and then a comparison is made with a pure OFDM-based elastic optical network (EON) and a mixed line rate (MLR) WDM optical network in order to evaluate their cost and energy efficiency.
Lei CHEN Wei LU Ergude BAO Liqiang WANG Weiwei XING Yuanyuan CAI
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality can decrease network traffic by moving reduce tasks to the nodes where the reducer input data is located. Data skew will lead to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map output results will be sent. Therefore, an effective partitioner can improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies considering both essential issues can be divided into two categories: those that preferentially improve data locality, such as LEEN, and those that preferentially improve load balance, such as CLP. However, all these studies ignore the fact that for different types of jobs, the priority of data locality and data skew on the reduce side may produce different effects on the execution time. In this paper, we propose a naive Bayes classifier based partitioner, namely, BAPM, which achieves better performance because it can automatically choose the proper algorithm (LEEN or CLP) by leveraging the naive Bayes classifier, i.e., considering job type and bandwidth as classification attributes. Our experiments are performed in a Hadoop cluster, and the results show that BAPM boosts the computing performance of MapReduce. The selection accuracy reaches 95.15%. Further, compared with other popular algorithms, under specific bandwidths, the improvement BAPM achieved is up to 31.31%.
Hiroyuki SAITO Naoki MINATO Hideaki TAMAI Hironori SASAKI
Capital expenditure (CAPEX) reduction and efficient wavelength allocation are critical for future access networks. The elastic lambda aggregation network (EλAN), based on WDM and OFDM technologies, is expected to realize efficient wavelength allocation. In this paper, we propose an adaptive bandwidth allocation (ABA) algorithm for EλAN under crowded network conditions, in which the modulation format, symbol rate, and number of sub-carriers are adaptively decided based on the distance of the PON section, the QoS, and the bandwidth demand of each ONU. Network simulation results show that the proposed algorithm can effectively reduce the total bandwidth, achieve steadily high spectrum efficiency, and contribute to the further reduction of CAPEX in future optical access networks.