Keyword Search Result

[Keyword] ACH(1072hit)

241-260hit(1072hit)

  • Effect of User Antenna Selection on Block Beamforming Algorithms for Suppressing Inter-User Interference in Multiuser MIMO System Open Access

    Nobuyoshi KIKUMA  Kentaro NISHIMORI  Takefumi HIRAGURI  

     
    INVITED PAPER

      Publicized:
    2018/01/22
      Vol:
    E101-B No:7
      Page(s):
    1523-1535

    Multiuser MIMO (MU-MIMO) improves the system channel capacity by generating a large virtual MIMO channel between a base station and multiple user terminals (UTs) with effective utilization of wireless resources. Block beamforming algorithms such as Block Diagonalization (BD) and Block Maximum Signal-to-Noise ratio (BMSN) have been proposed in order to realize MU-MIMO broadcast transmission. The BD algorithm cancels inter-user interference (IUI) by creating the weights so that the channel matrices for the other users are set to be zero matrices. The BMSN algorithm has a function of maintaining a high gain response for each desired user in addition to IUI cancellation. Therefore, the BMSN algorithm generally outperforms the BD algorithm. However, when the number of transmit antennas is equal to the total number of receive antennas, the transmission rate by both BD and BMSN algorithms is decreased. This is because the eigenvalues of channel matrices are too small to support data transmission. To resolve the issue, this paper focuses on an antenna selection (AS) method at the UTs. The AS method reduces the number of pattern nulls for the other users except an intended user in the BD and BMSN algorithms. It is verified via bit error rate (BER) evaluation that the AS method is effective in the BD and BMSN algorithms, especially, when the number of user antennas with a low bit rate (i.e., low signal-to-noise power ratio) is increased. Moreover, this paper evaluates the achievable bit rate and throughput including an actual channel state information feedback based on IEEE802.11ac standard. Although the number of equivalent receive antenna is reduced to only one by the AS method when the number of antennas at the UT is two, it is shown that the throughputs by BD and BMSN with the AS method (BD-AS and BMSN-AS) are higher than those by the conventional BD and BMSN algorithms.
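
The null-steering idea behind BD can be sketched in its simplest special case. The setup below is an assumption for illustration, not the paper's configuration: two transmit antennas and two single-antenna users, where BD reduces to zero-forcing. The channel values and the names `null_weight` and `response` are hypothetical.

```python
def null_weight(h_other):
    """Return a unit-norm weight w with h_other . w == 0 (two transmit antennas)."""
    a, b = h_other
    w = (b, -a)  # a*b + b*(-a) = 0, so w lies in the null space of h_other
    norm = (abs(w[0]) ** 2 + abs(w[1]) ** 2) ** 0.5
    return (w[0] / norm, w[1] / norm)


def response(h, w):
    """Channel response h . w seen by a single-antenna user."""
    return h[0] * w[0] + h[1] * w[1]


# Hypothetical flat-fading channels (1x2 row vectors) for two users.
h1 = (1 + 1j, 0.5 - 0.2j)
h2 = (0.3 - 0.7j, 1.2 + 0.4j)

w1 = null_weight(h2)  # user 1's weight nulls the channel of user 2
w2 = null_weight(h1)  # user 2's weight nulls the channel of user 1

iui_at_user2 = abs(response(h2, w1))  # interference leaked to user 2
iui_at_user1 = abs(response(h1, w2))  # interference leaked to user 1
gain_user1 = abs(response(h1, w1))    # gain left for the desired user 1
gain_user2 = abs(response(h2, w2))    # gain left for the desired user 2
```

In this toy case the number of transmit antennas equals the total number of receive antennas, so each weight is fixed (up to scale) by the nulling constraint alone and no freedom remains to raise the desired user's gain.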

  • HOAH: A Hybrid TCP Throughput Prediction with Autoregressive Model and Hidden Markov Model for Mobile Networks

    Bo WEI  Kenji KANAI  Wataru KAWAKAMI  Jiro KATTO  

     
    PAPER

      Publicized:
    2018/01/22
      Vol:
    E101-B No:7
      Page(s):
    1612-1624

    Throughput prediction is one of the promising techniques to improve the quality of service (QoS) and quality of experience (QoE) of mobile applications. To address the problem of predicting future throughput distribution accurately during the whole session, which can exhibit large throughput fluctuations in different scenarios (especially scenarios of moving user), we propose a history-based throughput prediction method that utilizes time series analysis and machine learning techniques for mobile network communication. This method is called the Hybrid Prediction with the Autoregressive Model and Hidden Markov Model (HOAH). Different from existing methods, HOAH uses Support Vector Machine (SVM) to classify the throughput transition into two classes, and predicts the transmission control protocol (TCP) throughput by switching between the Autoregressive Model (AR Model) and the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM). We conduct field experiments to evaluate the proposed method in seven different scenarios. The results show that HOAH can predict future throughput effectively and decreases the prediction error by a maximum of 55.95% compared with other methods.
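
As a minimal sketch of the autoregressive component only, the code below fits a first-order AR model by ordinary least squares and predicts the next sample. The model order, the function names, and the synthetic series are assumptions for illustration; the SVM-based switching and the GMM-HMM branch described in the abstract are not shown.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def predict_next(series, c, phi):
    """One-step-ahead prediction from the last observed sample."""
    return c + phi * series[-1]


# Synthetic throughput history (hypothetical values) generated by
# x[t] = 2.0 + 0.5 * x[t-1], so the fit should recover c = 2.0, phi = 0.5.
history = [10.0]
for _ in range(5):
    history.append(2.0 + 0.5 * history[-1])

c, phi = fit_ar1(history)
forecast = predict_next(history, c, phi)
```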

  • Energy Efficient Resource Selection and Allocation Strategy for Virtual Machine Consolidation in Cloud Datacenters

    Yaohui CHANG  Chunhua GU  Fei LUO  Guisheng FAN  Wenhao FU  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/03/30
      Vol:
    E101-D No:7
      Page(s):
    1816-1827

    Virtual Machine Placement (VMP) plays an important role in ensuring efficient resource provisioning of physical machines (PMs) and energy efficiency in Infrastructure as a Service (IaaS) data centers. Efficient server consolidation assisted by virtual machine (VM) migration can raise the utilization level of the servers and switch idle PMs to sleep mode to save energy. The trade-off between energy and performance is difficult to manage, because consolidation may cause performance degradation and even service level agreement (SLA) violations. A novel residual available capacity (RAC) resource model is proposed to resolve the VM selection and allocation problem from the cloud service provider (CSP) perspective. Furthermore, a novel heuristic VM selection policy for server consolidation, named Minimized Square Root available Resource (MISR), is proposed, together with an efficient VM allocation policy based on RAC, named Balanced Selection (BS). The BS-MISR combination is validated on CloudSim with real workloads from the CoMon project. The experimental results show that the proposed BS-MISR combination can significantly reduce energy consumption, by 36.35% on average, compared to the Local Regression and Minimum Migration Time (LR-MMT) combination policy. Moreover, BS-MISR ensures a reasonable level of SLAs compared to the benchmarks.

  • Fuzzy Levy-GJR-GARCH American Option Pricing Model Based on an Infinite Pure Jump Process

    Huiming ZHANG  Junzo WATADA  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/04/16
      Vol:
    E101-D No:7
      Page(s):
    1843-1859

    This paper focuses mainly on issues related to the pricing of American options under a fuzzy environment by taking into account the clustering of the underlying asset price volatility, leverage effect and stochastic jumps. By treating the volatility as a parabolic fuzzy number, we constructed a Levy-GJR-GARCH model based on an infinite pure jump process and combined the model with fuzzy simulation technology to perform numerical simulations based on the least squares Monte Carlo approach and the fuzzy binomial tree method. An empirical study was performed using American put option data from the Standard & Poor's 100 index. The findings are as follows: under a fuzzy environment, the result of the option valuation is more precise than the result under a clear environment, pricing simulations of short-term options have higher precision than those of medium- and long-term options, the least squares Monte Carlo approach yields more accurate valuation than the fuzzy binomial tree method, and the simulation effects of different Levy processes indicate that the NIG and CGMY models are superior to the VG model. Moreover, the option price increases as the time to expiration of options is extended and the exercise price increases, the membership function curve is asymmetric with an inclined left tendency, and the fuzzy interval narrows as the level set α and the exponent of membership function n increase. In addition, the results demonstrate that the quasi-random number and Brownian Bridge approaches can improve the convergence speed of the least squares Monte Carlo approach.

  • MRO-PUF: Physically Unclonable Function with Enhanced Resistance against Machine Learning Attacks Utilizing Instantaneous Output of Ring Oscillator

    Masayuki HIROMOTO  Motoki YOSHINAGA  Takashi SATO  

     
    PAPER

      Vol:
    E101-A No:7
      Page(s):
    1035-1044

    This paper proposes MRO-PUF, a new architecture for ring-oscillator-based physically unclonable functions (PUFs) with enhanced resistance against machine learning attacks. In the proposed PUF, an instantaneous output value of a ring oscillator is used as a response, whereas the most existing PUFs directly use propagation delays to determine the response. Since the response of the MRO-PUF is non-linear and discontinuous as the delay of the ring oscillator increases, the prediction of the response by machine learning attacks is difficult. Through the performance evaluation of the MRO-PUF with simulations, it achieves 15 times stronger resistance against machine learning attacks using a support vector machine compared to the existing ones such as an arbiter PUF and a bistable ring PUF. The MRO-PUF also achieves a sufficient level of the basic performance of PUFs in terms of uniqueness and robustness.

  • Co-Propagation with Distributed Seeds for Salient Object Detection

    Yo UMEKI  Taichi YOSHIDA  Masahiro IWAHASHI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2018/03/09
      Vol:
    E101-D No:6
      Page(s):
    1640-1647

    In this paper, we propose a method of salient object detection based on distributed seeds and a co-propagation of seed information. Salient object detection is a technique which estimates important objects for human by calculating saliency values of pixels. Previous salient object detection methods often produce incorrect saliency values near salient objects in the case of images which have some objects, called the leakage of saliencies. Therefore, a method based on a co-propagation, the scale invariant feature transform, the high dimensional color transform, and machine learning is proposed to reduce the leakage. Firstly, the proposed method estimates regions clearly located in salient objects and the background, which are called as seeds and resultant seeds, are distributed over images. Next, the saliency information of seeds is simultaneously propagated, which is then referred as a co-propagation. The proposed method can reduce the leakage caused because of the above methods when the co-propagation of each information collide with each other near the boundary. Experiments show that the proposed method significantly outperforms the state-of-the-art methods in mean absolute error and F-measure, which perceptually reduces the leakage.
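
The two evaluation metrics named in the abstract can be sketched as follows. This is a generic illustration on flattened maps, not the paper's evaluation code; the binarization threshold of 0.5 and the weighting `beta2 = 0.3` are assumptions (the latter is a common choice in saliency evaluation), and the sample values are hypothetical.

```python
def mean_absolute_error(saliency, ground_truth):
    """Average absolute difference between saliency values and binary labels."""
    return sum(abs(s - g) for s, g in zip(saliency, ground_truth)) / len(saliency)


def f_measure(saliency, ground_truth, threshold=0.5, beta2=0.3):
    """Weighted F-measure of the saliency map binarized at `threshold`."""
    predicted = [1 if s >= threshold else 0 for s in saliency]
    true_pos = sum(1 for p, g in zip(predicted, ground_truth) if p == 1 and g == 1)
    if true_pos == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = true_pos / sum(predicted)
    recall = true_pos / sum(ground_truth)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


# Hypothetical flattened saliency map and binary ground truth.
saliency_map = [0.9, 0.8, 0.2, 0.1]
truth = [1, 1, 0, 0]

mae = mean_absolute_error(saliency_map, truth)
f_score = f_measure(saliency_map, truth)
```

Saliency leaking into the background near an object raises the MAE and lowers precision, which is why both metrics react to the leakage the abstract describes.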

  • Compact CAR: Low-Overhead Cache Replacement Policy for an ICN Router

    Atsushi OOKA  Suyong EUM  Shingo ATA  Masayuki MURATA  

     
    PAPER-Network System

      Publicized:
    2017/12/18
      Vol:
    E101-B No:6
      Page(s):
    1366-1378

    Information-centric networking (ICN) has gained attention from network research communities due to its ability to efficiently disseminate content. The in-network caching function in ICN plays an important role in realizing the design goals. However, many in-network caching researchers have focused on where to cache rather than how to cache: the former is known as content deployment in the network and the latter is known as cache replacement in an ICN router. Although cache replacement has been intensively researched in the context of web caching and content delivery networks, the conventional approaches cannot be directly applied to ICN due to the fine granularity of chunks in ICN, which eventually changes the access patterns. In this paper, we argue that ICN requires a novel cache replacement algorithm to fulfill the requirements set for high performance ICN routers. Our solution, Compact CLOCK with Adaptive Replacement (Compact CAR), is a novel cache replacement algorithm that satisfies the requirements. The evaluation result shows that the consumption of cache memory required to achieve a desired performance can be reduced by 90% compared to conventional approaches such as FIFO and CLOCK.
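
For orientation, here is a minimal sketch of plain CLOCK, one of the conventional baselines named in the abstract. It is not Compact CAR; the class name, the choice to insert new entries with a cleared reference bit, and the access trace are assumptions for illustration.

```python
class ClockCache:
    """Plain CLOCK replacement: a circular buffer with one reference bit per slot."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity   # cached key per slot
        self.ref = [False] * capacity   # reference bit per slot
        self.index = {}                 # key -> slot number
        self.hand = 0                   # clock hand position

    def access(self, key):
        """Return True on a hit; on a miss, insert `key` and return False."""
        slot = self.index.get(key)
        if slot is not None:
            self.ref[slot] = True       # a hit only sets the reference bit
            return True
        # Miss: sweep the hand, clearing set bits, until a slot is empty
        # or has a cleared reference bit.
        while self.keys[self.hand] is not None and self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % self.capacity
        victim = self.keys[self.hand]
        if victim is not None:
            del self.index[victim]
        self.keys[self.hand] = key
        self.ref[self.hand] = False     # assumption: new entries start unreferenced
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False


cache = ClockCache(3)
trace = ["a", "b", "c", "a", "d", "a", "b"]  # hypothetical chunk requests
hits = [cache.access(k) for k in trace]
```

CLOCK needs only one bit of recency state per entry and no list reordering on a hit, which makes it a natural low-overhead baseline for a router.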

  • Extreme Learning Machine with Superpixel-Guided Composite Kernels for SAR Image Classification

    Dongdong GUAN  Xiaoan TANG  Li WANG  Junda ZHANG  

     
    LETTER-Pattern Recognition

      Publicized:
    2018/03/14
      Vol:
    E101-D No:6
      Page(s):
    1703-1706

    Synthetic aperture radar (SAR) image classification is a popular yet challenging research topic in the field of SAR image interpretation. This paper presents a new classification method based on the extreme learning machine (ELM) and superpixel-guided composite kernels (SGCK). By introducing the generalized likelihood ratio (GLR) similarity, a modified simple linear iterative clustering (SLIC) algorithm is first developed to generate superpixels for SAR images. Instead of a fixed-size region, the shape-adaptive superpixel is used to exploit the spatial information, which is effective for classifying pixels in detailed and near-edge regions. Following the framework of composite kernels, the SGCK is constructed based on the spatial information and backscatter intensity information. Finally, the SGCK is incorporated into an ELM classifier. Experimental results on both simulated and real SAR images demonstrate that the proposed framework is superior to some traditional classification methods.

  • Extraction and Recognition of Shoe Logos with a Wide Variety of Appearance Using Two-Stage Classifiers

    Kazunori AOKI  Wataru OHYAMA  Tetsushi WAKABAYASHI  

     
    PAPER-Machine Vision and its Applications

      Publicized:
    2018/02/16
      Vol:
    E101-D No:5
      Page(s):
    1325-1332

    A logo is a symbolic presentation that is designed not only to identify a product manufacturer but also to attract the attention of shoppers. Shoe logos are a challenging subject for automatic extraction and recognition using image analysis techniques because they have characteristics that distinguish them from those of other products; that is, there is much within-class variation in the appearance of shoe logos. In this paper, we propose an automatic extraction and recognition method for shoe logos with a wide variety of appearance using a limited number of training samples. The proposed method employs maximally stable extremal regions for the initial region extraction, an iterative algorithm for region grouping, and gradient features and a support vector machine for logo recognition. The results of performance evaluation experiments using a logo dataset that consists of a wide variety of appearances show that the proposed method achieves promising performance for both logo extraction and recognition.

  • Forecasting Service Performance on the Basis of Temporal Information by the Conditional Restricted Boltzmann Machine

    Jiali YOU  Hanxing XUE  Yu ZHUO  Xin ZHANG  Jinlin WANG  

     
    PAPER-Network

      Publicized:
    2017/11/10
      Vol:
    E101-B No:5
      Page(s):
    1210-1221

    Predicting the service performance of Internet applications is important in service selection, especially for video services. In order to design a predictor for forecasting video service performance in third-party application, two famous service providers in China, Iqiyi and Letv, are monitored and analyzed. The study highlights that the measured performance in the observation period is time-series data, and it has strong autocorrelation, which means it is predictable. In order to combine the temporal information and map the measured data to a proper feature space, the authors propose a predictor based on a Conditional Restricted Boltzmann Machine (CRBM), which can capture the potential temporal relationship of the historical information. Meanwhile, the measured data of different sources are combined to enhance the training process, which can enlarge the training size and avoid the over-fit problem. Experiments show that combining the measured results from different resolutions for a video can raise prediction performance, and the CRBM algorithm shows better prediction ability and more stable performance than the baseline algorithms.

  • A Hardware-Based Caching System on FPGA NIC for Blockchain

    Yuma SAKAKIBARA  Shin MORISHIMA  Kohei NAKAMURA  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Publicized:
    2018/02/02
      Vol:
    E101-D No:5
      Page(s):
    1350-1360

    Engineers and researchers have recently paid attention to Blockchain. Blockchain is a fault-tolerant distributed ledger without administrators. Blockchain is originally derived from cryptocurrency, but it is possible to be applied to other industries. Transferring digital asset is called a transaction. Blockchain holds all transactions, so the total amount of Blockchain data will increase as time proceeds. On the other hand, the number of Internet of Things (IoT) products has been increasing. It is difficult for IoT products to hold all Blockchain data because of their storage capacity. Therefore, they access Blockchain data via servers that have Blockchain data. However, if a lot of IoT products access Blockchain network via servers, server overloads will occur. Thus, it is useful to reduce workloads and improve throughput. In this paper, we propose a caching technique using a Field Programmable Gate Array-based (FPGA) Network Interface Card (NIC) which possesses four 10Gigabit Ethernet (10GbE) interfaces. The proposed system can reduce server overloads, because the FPGA NIC instead of the server responds to requests from IoT products if cache hits. We implemented the proposed hardware cache to achieve high throughput on NetFPGA-10G board. We counted the number of requests that the server or the FPGA NIC processed as an evaluation. As a result, the throughput improved by on average 1.97 times when hitting the cache.

  • Semi-Blind Interference Cancellation with Multiple Receive Antennas for MIMO Heterogeneous Networks

    Huiyu YE  Kazuhiko FUKAWA  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2017/11/10
      Vol:
    E101-B No:5
      Page(s):
    1299-1310

    Our previous work proposed a semi-blind single antenna interference cancellation scheme to cope with severe inter-cell interference in heterogeneous networks. This paper extends the scheme to allow multiple-receive-antenna implementation. It does not require knowledge of the training sequences of interfering signals and can cancel multiple interfering signals irrespective of the number of receive antennas. The proposed scheme applies an enhanced version of the quantized channel approach to suboptimal joint channel estimation and signal detection (JCESD) during the training period in order to blindly estimate channels of the interfering signals, while reducing the computational complexity of optimum JCESD drastically. Different from the previous work, the proposed scheme applies the quantized channel generation and local search at each individual receive antenna so as to estimate transmitted symbol matrices during the training period. Then, joint estimation is newly introduced in order to estimate a channel matrix from the estimated symbol matrices, which operates in the same manner as the expectation maximization (EM) algorithm and considers signals received at all receive antennas. Using the estimated channels, the proposed scheme performs multiuser detection (MUD) during the data period under the maximum likelihood (ML) criterion in order to cancel the interference. 
    Computer simulations with two receive antennas under two-interfering-stream conditions show that the proposed scheme outperforms interference rejection combining (IRC) with perfect channel state information (CSI) and MUD with channels estimated by a conventional scheme based on the generalized Viterbi algorithm, and can achieve almost the same average bit error rate (BER) performance as MUD with channels estimated from sufficiently long training sequences of both the desired stream(s) and the interfering streams, while reducing the computational complexity significantly compared with full search involving all interfering signal candidates during the training period.

  • Towards Ultra-High-Speed Cryogenic Single-Flux-Quantum Computing Open Access

    Koki ISHIDA  Masamitsu TANAKA  Takatsugu ONO  Koji INOUE  

     
    INVITED PAPER

      Vol:
    E101-C No:5
      Page(s):
    359-369

    CMOS microprocessors are limited in their capacity for clock speed improvement because of increasing computing power, i.e., they face a power-wall problem. Single-flux-quantum (SFQ) circuits offer a solution with their ultra-fast-speed and ultra-low-power natures. This paper introduces our contributions towards ultra-high-speed cryogenic SFQ computing. The first step is to design SFQ microprocessors. From qualitatively and quantitatively evaluating past-designed SFQ microprocessors, we have found that revisiting the architecture of SFQ microprocessors and on-chip caches is the first critical challenge. On the basis of cross-layer discussions and analysis, we came to the conclusion that a bit-parallel gate-level pipeline architecture is the best solution for SFQ designs. This paper summarizes our current research results targeting SFQ microprocessors and on-chip cache architectures.

  • Performance Evaluation of Pipeline-Based Processing for the Caffe Deep Learning Framework

    Ayae ICHINOSE  Atsuko TAKEFUSA  Hidemoto NAKADA  Masato OGUCHI  

     
    PAPER

      Publicized:
    2018/01/18
      Vol:
    E101-D No:4
      Page(s):
    1042-1052

    Many life-log analysis applications, which transfer data from cameras and sensors to a Cloud and analyze them in the Cloud, have been developed as the use of various sensors and Cloud computing technologies has spread. However, difficulties arise because of the limited network bandwidth between such sensors and the Cloud. In addition, sending raw sensor data to a Cloud may introduce privacy issues. Therefore, we propose a pipelined method for distributed deep learning processing between sensors and the Cloud to reduce the amount of data sent to the Cloud and protect the privacy of users. In this study, we measured the processing times and evaluated the performance of our method using two different datasets. In addition, we performed experiments using three types of machines with different performance characteristics on the client side and compared the processing times. The experimental results show that the accuracy of deep learning with coarse-grained data is comparable to that achieved with the default parameter settings, and the proposed distributed processing method has performance advantages in cases of insufficient network bandwidth between realistic sensors and a Cloud environment. In addition, it is confirmed that the process that most affects the overall processing time varies depending on the machine performance on the client side, and the most efficient distribution method similarly differs.

  • Realizability of Choreography Given by Two Scenarios

    Toshiki KINOSHITA  Toshiyuki MIYAMOTO  

     
    PAPER

      Vol:
    E101-A No:2
      Page(s):
    345-356

    For a service-oriented architecture-based system, the problem of synthesizing a concrete model (i.e., behavioral model) for each peer configuring the system from an abstract specification-which is referred to as choreography-is known as the choreography realization problem. A flow of interaction of peers is called a scenario. In our previous study, we showed conditions and an algorithm to synthesize concrete models when choreography is given by one scenario. In this paper, we extend the study for choreography given by two scenarios. We show necessary and sufficient conditions on the realizability of choreography under both cases where there exist conflicts between scenarios and no conflicts exist.

  • Drift-Free Tracking Surveillance Based on Online Latent Structured SVM and Kalman Filter Modules

    Yung-Yao CHEN  Yi-Cheng ZHANG  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2017/11/14
      Vol:
    E101-D No:2
      Page(s):
    491-503

    Tracking-by-detection methods consider tracking task as a continuous detection problem applied over video frames. Modern tracking-by-detection trackers have online learning ability; the update stage is essential because it determines how to modify the classifier inherent in a tracker. However, most trackers search for the target within a fixed region centered at the previous object position; thus, they lack spatiotemporal consistency. This becomes a problem when the tracker detects an incorrect object during short-term occlusion. In addition, the scale of the bounding box that contains the target object is usually assumed not to change. This assumption is unrealistic for long-term tracking, where the scale of the target varies as the distance between the target and the camera changes. The accumulation of errors resulting from these shortcomings results in the drift problem, i.e. drifting away from the target object. To resolve this problem, we present a drift-free, online learning-based tracking-by-detection method using a single static camera. We improve the latent structured support vector machine (SVM) tracker by designing a more robust tracker update step by incorporating two Kalman filter modules: the first is used to predict an adaptive search region in consideration of the object motion; the second is used to adjust the scale of the bounding box by accounting for the background model. We propose a hierarchical search strategy that combines Bhattacharyya coefficient similarity analysis and Kalman predictors. This strategy facilitates overcoming occlusion and increases tracking efficiency. We evaluate this work using publicly available videos thoroughly. Experimental results show that the proposed method outperforms the state-of-the-art trackers.
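
The Bhattacharyya coefficient used in the similarity analysis can be sketched as below. Comparing histograms (for example, color histograms of the target template and a candidate region) is an assumption here, since the abstract does not state which features are compared; the bin counts are hypothetical.

```python
import math


def bhattacharyya_coefficient(hist_p, hist_q):
    """Similarity in [0, 1] between two histograms with the same number of bins."""
    total_p = sum(hist_p)
    total_q = sum(hist_q)
    # Normalize each histogram to a probability distribution, then sum the
    # geometric means of the matching bins.
    return sum(
        math.sqrt((p / total_p) * (q / total_q))
        for p, q in zip(hist_p, hist_q)
    )


# Hypothetical 3-bin histograms.
template = [1, 2, 1]
same_appearance = [2, 4, 2]      # same shape, different pixel count
different_appearance = [0, 0, 5]

bc_same = bhattacharyya_coefficient(template, same_appearance)
bc_diff = bhattacharyya_coefficient(template, different_appearance)
```

A value close to 1 indicates that the candidate region resembles the template, while a sudden drop can signal occlusion or an incorrect detection.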

  • An FPGA Realization of a Random Forest with k-Means Clustering Using a High-Level Synthesis Design

    Akira JINGUJI  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Emerging Applications

      Publicized:
    2017/11/17
      Vol:
    E101-D No:2
      Page(s):
    354-362

    A random forest (RF) is a kind of ensemble machine learning algorithm used for classification and regression. It consists of multiple decision trees that are built from randomly sampled data. Compared with other machine learning algorithms, the RF is simple and offers fast learning and identification. It is widely used in various recognition systems. Since each tree requires an unbalanced traversal and the results of all trees must be communicated, the random forest is not well suited to SIMD architectures such as GPUs. Although accelerators using FPGAs have been proposed, such implementations were based on HDL design. Thus, they required longer design time than software-based realizations. In our previous work, we showed a high-level synthesis design of the RF including a fully pipelined architecture and all-to-all communication. In this paper, to further reduce the amount of hardware, we use k-means clustering to share comparators of the branch nodes on the decision tree. Also, we develop the krange tool flow, which generates the bitstream from a small number of hyperparameters. Since the proposed tool flow is based on the high-level synthesis design, we can obtain a high performance RF with short design time compared with the conventional HDL design. We implemented the RF on the Xilinx Inc. ZC702 evaluation board. Compared with the CPU (Intel Xeon (R) E5607 Processor) and the GPU (NVidia Geforce Titan) implementations, as for the performance, the FPGA realization was 8.4 times faster than the CPU one, and it was 62.8 times faster than the GPU one. As for the power consumption efficiency, the FPGA realization was 7.8 times better than the CPU one, and it was 385.9 times better than the GPU one.
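
One plausible reading of comparator sharing is that branch nodes with similar thresholds are mapped to a common cluster center, so that a single comparator serves all of them. The sketch below illustrates that reading with a one-dimensional k-means; the function names, the initialization, and the threshold values are assumptions, not details taken from the paper.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values into k groups; returns the list of cluster centers."""
    ordered = sorted(values)
    # Deterministic initialization: pick k evenly spaced samples.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers


def share_thresholds(values, centers):
    """Replace each branch threshold with its nearest shared center."""
    return [min(centers, key=lambda c: abs(v - c)) for v in values]


# Hypothetical branch-node thresholds for one feature across several trees.
thresholds = [0.10, 0.12, 0.11, 0.50, 0.52, 0.90, 0.88]
centers = kmeans_1d(thresholds, k=3)
shared = share_thresholds(thresholds, centers)
```

Here seven distinct thresholds collapse to three shared values, so under this reading three comparators would be enough, at the cost of a small shift in each decision boundary.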

  • Accurate Estimation of Personalized Video Preference Using Multiple Users' Viewing Behavior

    Yoshiki ITO  Takahiro OGAWA  Miki HASEYAMA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2017/11/22
      Vol:
    E101-D No:2
      Page(s):
    481-490

    A method for accurate estimation of personalized video preference using multiple users' viewing behavior is presented in this paper. The proposed method uses three kinds of features: a video, user's viewing behavior and evaluation scores for the video given by a target user. First, the proposed method applies Supervised Multiview Spectral Embedding (SMSE) to obtain lower-dimensional video features suitable for the following correlation analysis. Next, supervised Multi-View Canonical Correlation Analysis (sMVCCA) is applied to integrate the three kinds of features. Then we can get optimal projections to obtain new visual features, “canonical video features” reflecting the target user's individual preference for a video based on sMVCCA. Furthermore, in our method, we use not only the target user's viewing behavior but also other users' viewing behavior for obtaining the optimal canonical video features of the target user. This unique approach is the biggest contribution of this paper. Finally, by integrating these canonical video features, Support Vector Ordinal Regression with Implicit Constraints (SVORIM) is trained in our method. Consequently, the target user's preference for a video can be estimated by using the trained SVORIM. Experimental results show the effectiveness of our method.

  • A Threshold Neuron Pruning for a Binarized Deep Neural Network on an FPGA

    Tomoya FUJII  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Emerging Applications

      Publicized:
    2017/11/17
      Vol:
    E101-D No:2
      Page(s):
    376-386

    A pre-trained deep convolutional neural network (CNN) for an embedded system requires both high speed and low power consumption. The front part of the CNN consists of convolutional layers, while the rear part consists of fully connected layers. In the convolutional layers, the multiply-accumulate operation is the bottleneck, while in the fully connected layers, memory access is the bottleneck. The binarized CNN has been proposed to realize many multiply-accumulate circuits on the FPGA; thus, the convolutional layers can be computed at high speed. However, even if binarization is applied to the fully connected layers, the amount of memory remains a bottleneck. In this paper, we propose a neuron pruning technique which eliminates most of the weight memory, and we apply it to the fully connected layers of the binarized CNN. In that case, since the weight memory is realized by on-chip memory on the FPGA, it achieves high-speed memory access. To further reduce the memory size, we retrain the CNN after neuron pruning. We also propose a sequential-input parallel-output circuit for the binarized fully connected layer and a streaming circuit for the binarized 2D convolutional layer. The experimental results showed that, for the fully connected layers of the VGG-11 CNN, neuron pruning reduced the number of neurons by 39.8% while keeping 99% of the baseline accuracy. We implemented the neuron-pruned CNN on the Xilinx Inc. Zynq Zedboard. Compared with the ARM Cortex-A57, it was 1773.0 times faster, it dissipated 3.1 times lower power, and its performance per power efficiency was 5781.3 times better. Also, compared with the Maxwell GPU, it was 11.1 times faster, it dissipated 7.7 times lower power, and its performance per power efficiency was 84.1 times better. Thus, the binarized CNN on the FPGA is suitable for the embedded system.
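
Binarized networks are commonly computed with XNOR and popcount in place of multiply-accumulate, which is what makes many parallel units cheap in hardware. The sketch below shows that general technique for one dot product; the bit encoding (1 for +1, 0 for -1) and the function names are assumptions, and whether the paper's circuit uses exactly this form is not stated in the abstract.

```python
def pack(values):
    """Pack a list of +1/-1 values into an integer, bit i = 1 when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binarized_dot(x_bits, w_bits, n):
    """Dot product of two length-n vectors over {-1, +1} using XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(x_bits ^ w_bits) & mask     # bit is 1 where the signs agree
    matches = bin(xnor).count("1")       # popcount
    return 2 * matches - n               # agreements minus disagreements


# Hypothetical binarized activations and weights.
activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]

result = binarized_dot(pack(activations), pack(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
```

Each weight occupies a single bit, so pruning neurons removes whole rows of weight bits, which is consistent with the abstract's goal of fitting the remaining weights in on-chip memory.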

  • Semi-Blind Interference Cancellation with Single Receive Antenna for Heterogeneous Networks

    Huiyu YE  Kazuhiko FUKAWA  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2017/06/28
      Vol:
    E101-B No:1
      Page(s):
    232-241

    In order to cope with severe interference in heterogeneous networks, this paper proposes a semi-blind interference cancellation scheme, which does not require multiple receive antennas or knowledge about training sequences of the interfering signals. The proposed scheme performs joint channel estimation and signal detection (JCESD) during the training period in order to blindly estimate channels of the interfering signals. On the other hand, maximum likelihood detection (MLD), which can be considered the optimum JCESD, must perform channel estimation for all transmitted signal candidates of the interfering signals and must search for the most likely signal candidate. Therefore, MLD incurs a prohibitive amount of computational complexity. To reduce such complexity drastically, the proposed scheme enhances the quantized channel approach, and applies the enhanced version to JCESD. In addition, a recalculation scheme is introduced to avoid inaccurate channel estimates due to local minima. Using the estimated channels, the proposed scheme performs multiuser detection (MUD) of the data sequences in order to cancel the interference. Computer simulations show that the proposed scheme outperforms a conventional scheme based on the Viterbi algorithm, and can achieve almost the same average bit error rate performance as the MUD with channels estimated from sufficiently long training sequences of both the desired signal and the interfering signals, while reducing the computational complexity significantly compared with full search involving all interfering signal candidates during the training period.
