In underwater acoustic communication systems based on orthogonal frequency division multiplexing (OFDM), taking clipping to reduce the peak-to-average power ratio leads to nonlinear distortion of the signal, making the receiver unable to recover the faded signal accurately. In this letter, an Aquila optimizer-based convolutional attention block stacked network (AO-CABNet) is proposed to replace the receiver to improve the ability to recover the original signal. Simulation results show that the AO method has better optimization capability to quickly obtain the optimal parameters of the network model, and the proposed AO-CABNet structure outperforms existing schemes.
Qinghua SHENG Yu CHENG Xiaofang HUANG Changcai LAI Xiaofeng HUANG Haibin YIN
Dependent Quantization (DQ) is a new quantization tool introduced in the Versatile Video Coding (VVC) standard. While it provides better rate-distortion calculation accuracy, it also increases the computational complexity and hardware cost compared to the widely used scalar quantization. To address this issue, this paper proposes a parallel-dependent quantization hardware architecture using Verilog HDL language. The architecture preprocesses the coefficients with a scalar quantizer and a high-frequency filter, and then further segments and processes the coefficients in parallel using the Viterbi algorithm. Additionally, the weight bit width of the rate-distortion calculation is reduced to decrease the quantization cycle and computational complexity. Finally, the final quantization of the TU is determined through sequential scanning and judging of the rate-distortion cost. Experimental results show that the proposed algorithm reduces the quantization cycle by an average of 56.96% compared to VVC’s reference platform VTM, with a Bjøntegaard delta bit rate (BDBR) loss of 1.03% and 1.05% under the Low-delay P and Random Access configurations, respectively. Verification on the AMD FPGA development platform demonstrates that the hardware implementation meets the quantization requirements for 1080P@60Hz video hardware encoding.
Baohang ZHANG Haichuan YANG Tao ZHENG Rong-Long WANG Shangce GAO
The equilibrium optimizer (EO) is a novel physics-based meta-heuristic optimization algorithm that is inspired by estimating dynamics and equilibrium states in controlled volume mass balance models. As a stochastic optimization algorithm, EO inevitably produces duplicated solutions, which is wasteful of valuable evaluation opportunities. In addition, an excessive number of duplicated solutions can increase the risk of the algorithm getting trapped in local optima. In this paper, an improved EO algorithm with a bis-population-based non-revisiting (BNR) mechanism is proposed, namely BEO. It aims to eliminate duplicate solutions generated by the population during iterations, thus avoiding wasted evaluation opportunities. Furthermore, when a revisited solution is detected, the BNR mechanism activates its unique archive population learning mechanism to assist the algorithm in generating a high-quality solution using the excellent genes in the historical information, which not only improves the algorithm's population diversity but also helps the algorithm get out of the local optimum dilemma. Experimental findings with the IEEE CEC2017 benchmark demonstrate that the proposed BEO algorithm outperforms other seven representative meta-heuristic optimization techniques, including the original EO algorithm.
Yoshiharu YAMAGISHI Tatsuya KANEKO Megumi AKAI-KASAYA Tetsuya ASAI
Edge computing, which has been gaining attention in recent years, has many advantages, such as reducing the load on the cloud, not being affected by the communication environment, and providing excellent security. Therefore, many researchers have attempted to implement neural networks, which are representative of machine learning in edge computing. Neural networks can be divided into inference and learning parts; however, there has been little research on implementing the learning component in edge computing in contrast to the inference part. This is because learning requires more memory and computation than inference, easily exceeding the limit of resources available for edge computing. To overcome this problem, this research focuses on the optimizer, which is the heart of learning. In this paper, we introduce our new optimizer, hardware-oriented logarithmic momentum estimation (Holmes), which incorporates new perspectives not found in existing optimizers in terms of characteristics and strengths of hardware. The performance of Holmes was evaluated by comparing it with other optimizers with respect to learning progress and convergence speed. Important aspects of hardware implementation, such as memory and operation requirements are also discussed. The results show that Holmes is a good match for edge computing with relatively low resource requirements and fast learning convergence. Holmes will help create an era in which advanced machine learning can be realized on edge computing.
Koji KAMMA Sarimu INOUE Toshikazu WADA
Pruning is an effective technique to reduce computational complexity of Convolutional Neural Networks (CNNs) by removing redundant neurons (or weights). There are two types of pruning methods: holistic pruning and layer-wise pruning. The former selects the least important neuron from the entire model and prunes it. The latter conducts pruning layer by layer. Recently, it has turned out that some layer-wise methods are effective for reducing computational complexity of pruned models while preserving their accuracy. The difficulty of layer-wise pruning is how to adjust pruning ratio (the ratio of neurons to be pruned) in each layer. Because CNNs typically have lots of layers composed of lots of neurons, it is inefficient to tune pruning ratios by human hands. In this paper, we present Pruning Ratio Optimizer (PRO), a method that can be combined with layer-wise pruning methods for optimizing pruning ratios. The idea of PRO is to adjust pruning ratios based on how much pruning in each layer has an impact on the outputs in the final layer. In the experiments, we could verify the effectiveness of PRO.
Jun MENG Gangyi DING Laiyang LIU
In view of the different spatial and temporal resolutions of observed multi-source heterogeneous carbon dioxide data and the uncertain quality of observations, a data fusion prediction model for observed multi-scale carbon dioxide concentration data is studied. First, a wireless carbon sensor network is created, the gross error data in the original dataset are eliminated, and remaining valid data are combined with kriging method to generate a series of continuous surfaces for expressing specific features and providing unified spatio-temporally normalized data for subsequent prediction models. Then, the long short-term memory network is used to process these continuous time- and space-normalized data to obtain the carbon dioxide concentration prediction model at any scales. Finally, the experimental results illustrate that the proposed method with spatio-temporal features is more accurate than the single sensor monitoring method without spatio-temporal features.
Yi LIU Wei QIN Jinhui ZHANG Mengmeng LI Qibin ZHENG Jichuan WANG
Multi-objective evolutionary algorithms are widely used in many engineering optimization problems and artificial intelligence applications. Ant lion optimizer is an outstanding evolutionary method, but two issues need to be solved to extend it to the multi-objective optimization field, one is how to update the Pareto archive, and the other is how to choose elite and ant lions from archive. We develop a novel multi-objective variant of ant lion optimizer in this paper. A new measure combining Pareto dominance relation and distance information of individuals is put forward and used to tackle the first issue. The concept of time weight is developed to handle the second problem. Besides, mutation operation is adopted on solutions in middle part of archive to further improve its performance. Eleven functions, other four algorithms and four indicators are taken to evaluate the new method. The results show that proposed algorithm has better performance and lower time complexity.
Mengmeng LI Xiaoguang REN Yanzhen WANG Wei QIN Yi LIU
Feature selection is important for learning algorithms, and it is still an open problem. Antlion optimizer is an excellent nature inspired method, but it doesn't work well for feature selection. This paper proposes a hybrid approach called Ant-Antlion Optimizer which combines advantages of antlion's smart behavior of antlion optimizer and ant's powerful searching movement of ant colony optimization. A mutation operator is also adopted to strengthen exploration ability. Comprehensive experiments by binary classification problems show that the proposed algorithm is superiority to other state-of-art methods on four performance indicators.
Su LIU Xingguang GENG Yitao ZHANG Shaolong ZHANG Jun ZHANG Yanbin XIAO Chengjun HUANG Haiying ZHANG
The quality of edge detection is related to detection angle, scale, and threshold. There have been many algorithms to promote edge detection quality by some rules about detection angles. However these algorithm did not form rules to detect edges at an arbitrary angle, therefore they just used different number of angles and did not indicate optimized number of angles. In this paper, a novel edge detection algorithm is proposed to detect edges at arbitrary angles and optimized number of angles in the algorithm is introduced. The algorithm combines singularity detection with Gaussian wavelet transform and edge detection at arbitrary directions and contain five steps: 1) An image is divided into some pixel lines at certain angle in the range from 45° to 90° according to decomposition rules of this paper. 2) Singularities of pixel lines are detected and form an edge image at the certain angle. 3) Many edge images at different angles form a final edge images. 4) Detection angles in the range from 45° to 90° are extended to range from 0° to 360°. 5) Optimized number of angles for the algorithm is proposed. Then the algorithm with optimized number of angles shows better performances.
Tomoyuki SASAKI Hidehiro NAKANO Arata MIYAUCHI Akira TAGUCHI
In this paper, we propose a new paradigm of deterministic PSO, named piecewise-linear particle swarm optimizer (PPSO). In PPSO, each particle has two search dynamics, a convergence mode and a divergence mode. The trajectory of each particle is switched between the two dynamics and is controlled by parameters. We analyze convergence condition of each particle and investigate parameter conditions to allow particles to converge to an equilibrium point through numerical experiments. We further compare solving performances of PPSO. As a result, we report here that the solving performances of PPSO are substantially the same as or superior to those of PSO.
Tomoyuki SASAKI Hidehiro NAKANO Arata MIYAUCHI Akira TAGUCHI
Particle swarm optimizer network (PSON) is one of the multi-swarm PSOs. In PSON, a population is divided into multiple sub-PSOs, each of which searches a solution space independently. Although PSON has a good solving performance, it may be trapped into a local optimum solution. In this paper, we introduce into PSON a dynamic stochastic network topology called “PSON with stochastic connection” (PSON-SC). In PSON-SC, each sub-PSO can be connected to the global best (gbest) information memory and refer to gbest stochastically. We show clearly herein that the diversity of PSON-SC is higher than that of PSON, while confirming the effectiveness of PSON-SC by many numerical simulations.
Jin XU Yan ZHANG Zhizhong FU Ning ZHOU
Distributed compressive video sensing (DCVS) is a new paradigm for low-complexity video compression. To achieve the highest possible perceptual coding performance under the measurements budget constraint, we propose a perceptual optimized DCVS codec by jointly exploiting the reweighted sampling and rate-distortion optimized measurements allocation technologies. A visual saliency modulated just-noticeable distortion (VS-JND) profile is first developed based on the side information (SI) at the decoder side. Then the estimated correlation noise (CN) between each non-key frame and its SI is suppressed by the VS-JND. Subsequently, the suppressed CN is utilized to determine the weighting matrix for the reweighted sampling as well as to design a perceptual rate-distortion optimization model to calculate the optimal measurements allocation for each non-key frame. Experimental results indicate that the proposed DCVS codec outperforms the other existing DCVS codecs in term of both the objective and subjective performance.
Lixin WANG Yutong LU Wei ZHANG Yan LEI
File system workloads are increasing write-heavy. The growing capacity of RAM in modern nodes allows many reads to be satisfied from memory while writes must be persisted to disk. Today's sophisticated local file systems like Ext4, XFS and Btrfs optimize for reads but suffer from workloads dominated by microdata (including metadata and tiny files). In this paper we present an LSM-tree-based file system, RFS, which aims to take advantages of the write optimization of LSM-tree to provide enhanced microdata performance, while offering matching performance for large files. RFS incrementally partitions the namespace into several metadata columns on a per-directory basis, preserving disk locality for directories and reducing the write amplification of LSM-trees. A write-ordered log-structured layout is used to store small files efficiently, rather than embedding the contents of small files into inodes. We also propose an optimization of global bloom filters for efficient point lookups. Experiments show our library version of RFS can handle microwrite-intensive workloads 2-10 times faster than existing solutions such as Ext4, Btrfs and XFS.
Distributed compressive video sensing (DCVS) is an emerging low-complexity video coding framework which integrates the merits of distributed video coding (DVC) and compressive sensing (CS). In this paper, we propose a novel rate-distortion optimized DCVS codec, which takes advantage of a rate-distortion optimization (RDO) model based on the estimated correlation noise (CN) between a non-key frame and its side information (SI) to determine the optimal measurements allocation for the non-key frame. Because the actual CN can be more accurately recovered by our DCVS codec, it leads to more faithful reconstruction of the non-key frames by adding the recovered CN to the SI. The experimental results reveal that our DCVS codec significantly outperforms the legacy DCVS codecs in terms of both objective and subjective performance.
Lixin WANG Yutong LU Wei ZHANG Yan LEI
One of the patterns that the design of parallel file systems has to solve stems from the difficulty of handling the metadata-intensive I/O generated by parallel applications accessing a single large directory. We demonstrate a middleware design called SFS to support existing parallel file systems for distributed and scalable directory service. SFS distributes directory entries over data servers instead of metadata servers to offer increased scalability and performance. Firstly, SFS exploits an adaptive directory partitioning based on extendible hashing to support concurrent and unsynchronized partition splitting. Secondly, SFS describes an optimization based on recursive split-ordering that emphasizes speeding up the splitting process. Thirdly, SFS applies a write-optimized index structure to convert slow, small, random metadata updates into fast, large, sequential writes. Finally, SFS gracefully tolerates stale mapping at the clients while maintaining the correctness and consistency of the system. Our performance results on a cluster of 32-servers show our implementation can deliver more than 250,000 file creations per second on average.
Kosuke KATAYAMA Mizuki MOTOYOSHI Kyoya TAKANO Chen Yang LI Shuhei AMAKAWA Minoru FUJISHIMA
E-band communication is allocated to the frequency bands of 71-76 and 81-86GHz. Radio-frequency (RF) front-end components for E-band communication have been realized using compound semiconductor technology. To realize a CMOS LNA for E-band communication, we propose a gain-boosted cascode amplifier (GBCA) stage that simultaneously provides high gain and stability. Designing an LNA from scratch requires considerable time because the tuning of matching networks with consideration of the parasitic elements is complicated. In this paper, we model the characteristics of devices including the effects of their parasitic elements. Using these models, an optimizer can estimate the characteristic of a designed LNA precisely without electromagnetic simulations and gives us the design values of an LNA when the layout constraint is ignored. Starting from the values, a four-stage LNA with a GBCA stage is designed very easily even though the layout constraint is considered and fabricated by a 65nm LP CMOS process. The fabricated LNA is measured, and it is confirmed that it achieves 18.5GHz bandwidth and over 24.3dB gain with 50.6mW power consumption. This is the first LNA to achieve a gain bandwidth of over 300GHz in the E-band among the LNAs utilizing any kind of semiconductor technologies. In this paper, we have proved that CMOS technology, which is suitable for baseband and digital circuitry, is applicable to a communication system covering the entire E-band.
Yaolong QI Weixian TAN Xueming PENG Yanping WANG Wen HONG
Near range microwave imaging systems have broad application prospects in the field of concealed weapon detection, biomedical imaging, nondestructive testing, etc. In this paper, the technique of optimized sparse antenna array is applied to near range microwave imaging, which can greatly reduce the complexity of imaging systems. In detail, the paper establishes three-dimensional sparse array imaging geometry and corresponding echo model, where the imaging geometry is formed by arranging optimized sparse antenna array in elevation, scanning in azimuth and transmitting broadband signals in range direction; and by analyzing the characteristics of near range imaging, that is, the maximum interval of transmitting and receiving elements is limited by the range from imaging system to targets, we propose the idea of piecewise sparse line array; secondly, by analyzing the convolution principle, we develop a method of arranging piecewise sparse array which can generate the same distribution of equivalent phase centers as filled antenna array; then, the paper deduces corresponding imaging algorithm; finally, the imaging geometry and corresponding algorithm proposed in this paper are investigated and verified via numerical simulations and near range imaging experiments.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraints construction and optimum control signal selection. By multi-stage clock gating, unnecessary clock pulses to clock gating cells can be avoided by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signals are also single-stage control signals, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate level description with guarded registers from original design, and a commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. For a set of benchmark circuits and a Low Density Parity Check (LDPC) Decoder (6.6k gates, 212 F.F.s), the proposed method is applied and actual power consumption is estimated using Synopsys NanoSim after layout. On average, 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexors controlling register inputs are eliminated.
Junqi ZHANG Lina NI Jing YAO Wei WANG Zheng TANG
Kennedy has proposed the bare bones particle swarm (BBPS) by the elimination of the velocity formula and its replacement by the Gaussian sampling strategy without parameter tuning. However, a delicate balance between exploitation and exploration is the key to the success of an optimizer. This paper firstly analyzes the sampling distribution in BBPS, based on which we propose an adaptive BBPS inspired by the cloud model (ACM-BBPS). The cloud model adaptively produces a different standard deviation of the Gaussian sampling for each particle according to the evolutionary state in the swarm, which provides an adaptive balance between exploitation and exploration on different objective functions. Meanwhile, the diversity of the swarms is further enhanced by the randomness of the cloud model itself. Experimental results show that the proposed ACM-BBPS achieves faster convergence speed and more accurate solutions than five other contenders on twenty-five unimodal, basic multimodal, extended multimodal and hybrid composition benchmark functions. The diversity enhancement by the randomness in the cloud model itself is also illustrated.
Junqi ZHANG Lina NI Chen XIE Ying TAN Zheng TANG
This paper presents an adaptive magnification transformation based particle swarm optimizer (AMT-PSO) that provides an adaptive search strategy for each particle along the search process. Magnification transformation is a simple but very powerful mechanism, which is inspired by using a convex lens to see things much clearer. The essence of this transformation is to set a magnifier around an area we are interested in, so that we could inspect the area of interest more carefully and precisely. An evolutionary factor, which utilizes the information of population distribution in particle swarm, is used as an index to adaptively tune the magnification scale factor for each particle in each dimension. Furthermore, a perturbation-based elitist learning strategy is utilized to help the swarm's best particle to escape the local optimum and explore the potential better space. The AMT-PSO is evaluated on 15 unimodal and multimodal benchmark functions. The effects of the adaptive magnification transformation mechanism and the elitist learning strategy in AMT-PSO are studied. Results show that the adaptive magnification transformation mechanism provides the main contribution to the proposed AMT-PSO in terms of convergence speed and solution accuracy on four categories of benchmark test functions.