Zhengwei XIA Yun LIU Xiaoyun WANG Feiyun ZHANG Rui CHEN Weiwei JIANG
Infrared and visible image fusion can combine thermal radiation information with textures to provide a high-quality fused image. In this letter, we propose a hybrid variational fusion model to achieve this end. Specifically, an ℓ0 term is adopted to preserve the highlighted targets with salient gradient variation in the infrared image, an ℓ1 term is used to suppress the noise in the fused image, and an ℓ2 term is employed to keep the textures of the visible image. Experimental results demonstrate the superiority of the proposed variational model: our results have sharper textures with less noise.
Pengxu JIANG Yang YANG Yue XIE Cairong ZOU Qingyun WANG
Convolutional neural networks (CNNs) are widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is utilized to gather time-frequency information between spectrum nodes. It is challenging to adequately express the non-local link between frequency domains in a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction block (DCNN-bi) for ASC, with the mel-spectrogram as the model’s input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we design three band interaction blocks (bi-blocks), connected between the two paths, to explore the pertinent nodes between various frequency bands. Combining the time-frequency information from the two paths, the bi-blocks with three distinct designs acquire non-local information and send it back to the respective paths. The experimental results indicate that the bi-block can substantially improve the performance of the initial CNN. Specifically, on the DCASE 2018 and DCASE 2020 datasets, the CNN exhibited performance improvements of 1.79% and 3.06%, respectively.
Xiaoyun WANG Jinsong ZHANG Masafumi NISHIDA Seiichi YAMAMOTO
This paper describes a novel method to improve the performance of second language speech recognition when the mother tongue of users is known. Considering that second language speech usually includes less fluent pronunciation and more frequent pronunciation mistakes, the authors propose using a reduced phoneme set generated by a phonetic decision tree (PDT)-based top-down sequential splitting method instead of the canonical one of the second language. The authors verify the efficacy of the proposed method using second language speech collected with a translation game type dialogue-based English CALL system. Experiments show that a speech recognizer achieved higher recognition accuracy with the reduced phoneme set than with the canonical phoneme set.
Tianyu LU Haibo DAI Juan ZHAO Baoyun WANG
We investigate the uplink channel selection problem in an unmanned aerial vehicle (UAV)-aided data collection system for delay-sensitive sensor networks. In the studied model, a fixed-wing UAV is dispatched to gather sensing information from terrestrial sensor nodes (SNs), which contend for uplink channels for transmission. With the goal of minimizing the system-wide delay, we formulate a resource allocation problem. Faced with the challenge that the flight trajectory of the UAV is unknown to the SNs and the wireless channel is time-varying, we solve the problem with a stochastic game approach and further propose a fully distributed channel selection algorithm, which is proved to converge to a pure-strategy Nash equilibrium (NE). Simulation results are presented to show that our proposed algorithm has good performance.
Bangan LIU Yun WANG Jian PANG Haosheng ZHANG Dongsheng YANG Aravind Tharayil NARAYANAN Dae Young LEE Sung Tae CHOI Rui WU Kenichi OKADA Akira MATSUZAWA
An energy-efficient modulator for an ultra-low-power (ULP) 60-GHz IEEE 802.11ad transmitter is presented in this paper. The modulator consists of a differential duobinary coder and a semi-digital finite-impulse-response (FIR) pulse-shaping filter. By virtue of differential duobinary coding and pulse shaping, the transceiver successfully solves the adjacent-channel-power-ratio (ACPR) issue of conventional on-off-keying (OOK) transceivers. The proposed differential duobinary coder adopts an over-sampling precoder, which relaxes the timing requirement and reduces power consumption. The semi-digital FIR filter eliminates the power-hungry digital multipliers and accumulators, and improves the power efficiency through optimization of the filter parameters. Fabricated in a 65-nm CMOS process, this modulator occupies a core area of 0.12 mm². At throughputs of 1.7 Gbps and 2.6 Gbps, the modulator consumes 24.3 mW and 42.8 mW, respectively, while satisfying the IEEE 802.11ad spectrum mask.
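The differential duobinary coding named in this abstract can be illustrated with a minimal textbook sketch. It is not the paper's over-sampling precoder or its circuit; the function names, the initial reference bit of 0, and the ±1 symbol mapping are all assumptions made for illustration.

```python
def precode(bits):
    """Differential precoder: d[k] = b[k] XOR d[k-1], assuming d[-1] = 0."""
    out, prev = [], 0
    for b in bits:
        prev ^= b
        out.append(prev)
    return out


def duobinary(precoded):
    """Map bits to +/-1 and add each symbol to its predecessor.

    The result is a three-level signal with values in {-2, 0, +2}.
    The initial symbol is assumed to be -1 (matching d[-1] = 0).
    """
    prev = -1
    levels = []
    for d in precoded:
        x = 2 * d - 1
        levels.append(x + prev)
        prev = x
    return levels


def decode(levels):
    """Symbol-by-symbol decision: a zero level means the data bit was 1."""
    return [1 if y == 0 else 0 for y in levels]
```

Because of the precoder, each data bit can be recovered from a single received level without tracking earlier decisions, so one wrong decision does not propagate to later bits.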
Tianwen GUO Ping DENG Qiang YU Baoyun WANG
In this letter, we investigate the design of efficient antenna allocation at the full-duplex receiver (FDR) in a multi-input multi-output multi-eavesdropper (MIMOME) wiretap channel for physical layer security improvement. Specifically, we propose an allocation that is feasible for the practical scenario with self-interference (SI) taken into account, because the jamming signals from the FDR not only confuse the eavesdropper but also inevitably cause SI at the FDR. Because the antenna allocation optimization problem is nonlinear and coupled, we transform the original problem into an integer programming problem. Then, we derive the optimal solution and the corresponding beamforming matrices in closed form by combining spatial alignment and null-space projection. Furthermore, we present the feasibility condition and the full-protection condition, which offer insight into principles that enable more efficient and effective use of the FDR in the wiretap channel for security improvement. Simulation results validate the theoretical analysis and demonstrate the outstanding performance of the proposed antenna allocation at the FDR.
Senyang HUANG Xiaoyun WANG Guangwu XU Meiqin WANG Jingyuan ZHAO
The security analysis of Keccak, the winner of SHA-3, has attracted considerable interest. Recently, some attention has been paid to distinguishing the Keccak sponge function from a random permutation. In EUROCRYPT'17, Huang et al. proposed the conditional cube tester to recover the key of Keccak-MAC and Keyak and to construct practical distinguishing attacks on the Keccak sponge function up to 7 rounds. In this paper, we improve the conditional cube tester model by refining the formulation of cube variables. By classifying cube variables into three different types and carefully working out the candidates for each type of cube variable, we are able to establish a new theoretical distinguisher on the 8-round Keccak sponge function. Our result is more efficient and greatly improves the existing results. Finally, we remark that our distinguishing attack on the reduced-round Keccak does not threaten the security margin of the Keccak sponge function.
Pei LI Haiyang ZHANG Fan CHU Wei WU Juan ZHAO Baoyun WANG
This paper proposes a sampling strategy for bandlimited graph signals over a perturbed graph, in which we assume the edge between any pair of nodes may be deleted randomly. Considering the mismatch between the true graph and the presumed graph, we derive the mean square error (MSE) of the reconstructed bandlimited graph signals. To minimize the MSE, we propose a greedy algorithm to obtain the optimal sampling set. Furthermore, we use a Neumann series to avoid computing the pseudo-inverse. An efficient algorithm with low complexity is thus proposed. Finally, numerical results show the superiority of our proposed algorithms over other existing algorithms.
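The Neumann series mentioned in this abstract rests on the identity (I - A)^{-1} = I + A + A^2 + ..., which holds when the spectral radius of A is below 1. The sketch below shows only that generic identity in pure Python; how the paper applies it to the pseudo-inverse in the sampling-set search is not stated in the abstract, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(n):
    """Return the n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate (I - A)^{-1} by the truncated sum I + A + ... + A^(terms-1).

    Assumes the spectral radius of A is below 1; otherwise the sum diverges.
    """
    n = len(a)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, a)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total
```

Truncating after a few terms trades accuracy for cost: each extra term costs one matrix product, and the error shrinks geometrically with the spectral radius of A.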
Xia WANG Ruiyu LIANG Qingyun WANG Li ZHAO Cairong ZOU
In this letter, an effective acoustic feedback cancellation algorithm is proposed based on the normalized sub-band adaptive filter (NSAF). To resolve the conflict between fast convergence rate and low misalignment in the NSAF algorithm, a variable step size is designed to vary automatically according to the update state of the filter. The update state of the filter is adaptively detected via the normalized distance between the long-term average and the short-term average of the tap-weight vector. Simulation results demonstrate that the proposed algorithm has superior performance in terms of convergence rate and misalignment.
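The detection idea described in this abstract, comparing a long-term and a short-term average of the tap-weight vector, can be sketched as follows. This is an illustration only, not the letter's actual rule: the exponential averaging, the forgetting factors, the clipping of the distance to 1, and the linear mapping to a step size are all assumptions, and `StepSizeController` is a hypothetical name.

```python
class StepSizeController:
    """Hypothetical controller that maps tap-weight drift to a step size."""

    def __init__(self, n_taps, mu_min=0.01, mu_max=1.0,
                 beta_short=0.9, beta_long=0.99, eps=1e-12):
        self.mu_min = mu_min
        self.mu_max = mu_max
        self.beta_short = beta_short  # assumed short-term forgetting factor
        self.beta_long = beta_long    # assumed long-term forgetting factor
        self.eps = eps                # avoids division by zero
        self.short_avg = [0.0] * n_taps
        self.long_avg = [0.0] * n_taps

    def update(self, weights):
        """Update both averages with the current tap weights; return a step size."""
        bs, bl = self.beta_short, self.beta_long
        self.short_avg = [bs * s + (1 - bs) * w
                          for s, w in zip(self.short_avg, weights)]
        self.long_avg = [bl * l + (1 - bl) * w
                         for l, w in zip(self.long_avg, weights)]
        # Normalized squared distance between the two averages.
        num = sum((s - l) ** 2 for s, l in zip(self.short_avg, self.long_avg))
        den = sum(l * l for l in self.long_avg) + self.eps
        distance = min(num / den, 1.0)
        # Assumed mapping: large drift gives a large step, small drift a small one.
        return self.mu_min + (self.mu_max - self.mu_min) * distance
```

While the filter is still adapting, the two averages disagree and the step size stays large; once the weights settle, the averages coincide and the step size falls toward its minimum, which favors low misalignment.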
Xi FU Yun WANG Xiaolin WANG Xiaofan GU Xueting LUO Zheng LI Jian PANG Atsushi SHIRANE Kenichi OKADA
This paper presents a high-resolution and low-insertion-loss CMOS hybrid phase shifter with a nonuniform matching technique for satellite communication (SATCOM). The proposed hybrid phase shifter includes three 45° coarse phase-shifting stages and one 45° fine phase-tuning stage. The coarse stages are realized by bridged-T switch-type phase shifters (STPS) with 45° phase steps. The fine-tuning stage is based on a reflective-type phase shifter (RTPS) with two identical LC load tanks for phase tuning. A 0.8° phase resolution is realized by this work to support fine beam steering for SATCOM. To further reduce the chain insertion loss, a nonuniform matching technique is utilized at the coarse stages. For the coarse and fine stages, the measured RMS gain errors at 29 GHz are 0.7 dB and 0.3 dB, respectively. The measured RMS phase errors are 0.8° and 0.4°, respectively. The proposed hybrid phase shifter maintains return losses of all phase states less than -12 dB from 24 GHz to 34 GHz. The presented hybrid phase shifter is fabricated in a standard 65-nm CMOS technology with a 0.14 mm² active area.
Jun WANG Yuanyun WANG Chengzhi DENG Shengqian WANG Yong QIN
Developing a robust appearance model is a challenging task due to appearance variations of objects such as partial occlusion, illumination variation, rotation and background clutter. Existing tracking algorithms employ linear combinations of target templates to represent target appearances, which are not accurate enough to deal with appearance variations. The underlying relationship between target candidates and the target templates is highly nonlinear because of complicated appearance variations. To address this, this paper presents a regularized kernel representation for visual tracking. Namely, the feature vectors of target appearances are mapped into higher-dimensional features, in which a target candidate is approximately represented by a nonlinear combination of target templates. The kernel-based appearance model has the advantage of accounting for the nonlinear relationship and capturing the nonlinear similarity between target candidates and target templates. ℓ2-regularization on the coding coefficients makes the approximate solution of target representations more stable. Comprehensive experiments demonstrate superior performance in comparison with state-of-the-art trackers.
Meiqin WANG Xiaoyun WANG Kam Pui CHOW Lucas Chi Kwong HUI
CAST-128 is a block cipher used in a number of products, notably as the default cipher in some versions of GPG and PGP. It has been approved for Canadian government use by the Communications Security Establishment. Haruki Seki et al. found 2-round differential characteristics with which they can attack 5-round CAST-128. In this paper, we study the properties of the round functions F1 and F3 in CAST-128 and identify differential characteristics for the F1 and F3 round functions. From these we identify a 6-round differential characteristic with probability 2^{-53} under 2^{-23.8} of the total key space. Then, based on the 6-round differential characteristic, we can attack 8-round CAST-128 with key sizes greater than or equal to 72 bits and 9-round CAST-128 with key sizes greater than or equal to 104 bits. We give the summary of attacks on reduced-round CAST-128 in Table 10.
YiYun WANG Qingji ZENG Chen HE Lihua LU ZhiCheng SUI
We develop a new optical switching fabric with multi-granularity grooming based on our lambda-group model, as well as algorithms that can handle dynamic environments. The proposed fabric, based on a new multi-granular grooming scheme, takes the distinctive approach of assigning different contiguous groups of granularities to different paths for effective treatment. Experimental results show that this partitioning approach not only significantly reduces the number of ports, but also improves the signal SNR and the blocking performance for dynamic connection requests.
Cheng CHEN Haibo DAI Tianwen GUO Qiang YU Baoyun WANG
This paper investigates wireless information surveillance in a suspicious millimeter wave (mmWave) wireless communication system via the spoofing-relay-based proactive eavesdropping approach. Specifically, the legitimate monitor in the system acts as a relay to simultaneously eavesdrop and send spoofing signals to vary the source transmission rate. To maximize the effective eavesdropping rate, an optimization problem for both hybrid precoding design and power distribution is formulated. Since the problem is fractional and non-convex, we resort to the Dinkelbach method to equivalently reduce the original problem to a series of non-fractional problems, which are still coupled. Afterwards, based on the BCD-type method, the non-fractional problem is reduced to three subproblems with two introduced parameters. Then the GS-PDD-based algorithm is proposed to obtain the optimal solution by alternately optimizing the three subproblems and simultaneously updating the introduced parameters. Numerical results verify the effectiveness and superiority of our proposed scheme.
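The Dinkelbach step named in this abstract replaces a ratio maximization max f(x)/g(x) with a sequence of non-fractional problems max f(x) - λ g(x), updating λ to the ratio at each new maximizer. The sketch below shows that generic iteration over a finite candidate set, assuming g(x) > 0 everywhere. It does not reproduce the paper's precoding problem, its BCD split, or the GS-PDD algorithm; `dinkelbach`, `candidates`, `f` and `g` are hypothetical names.

```python
def dinkelbach(candidates, f, g, tol=1e-9, max_iter=100):
    """Maximize f(x) / g(x) over a finite list, assuming g(x) > 0.

    Each iteration solves the non-fractional problem
        max_x  f(x) - lam * g(x)
    by exhaustive search, then sets lam to the ratio at the maximizer.
    Returns the best candidate found and its ratio.
    """
    best = candidates[0]
    lam = f(best) / g(best)
    for _ in range(max_iter):
        best = max(candidates, key=lambda x: f(x) - lam * g(x))
        gap = f(best) - lam * g(best)  # non-negative; near zero at the optimum
        lam = f(best) / g(best)
        if gap < tol:
            break
    return best, lam
```

In the setting the abstract describes, the exhaustive search would be replaced by whatever solver handles the non-fractional subproblem; the outer update of λ stays the same.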
Xi FU Yun WANG Zheng LI Atsushi SHIRANE Kenichi OKADA
There is growing demand for millimeter-wave beamforming phased-array transceivers and high-order modulation multi-input multi-output (MIMO) transceivers. High-performance integrated RF switches are regarded as one of the most critical components for those transceivers to support signal channel distribution and path redundancy. This paper introduces a CMOS high-isolation and low-loss RF switch with a novel switched parallel LC resonance network. The proposed single-pole double-throw (SPDT) RF switch realizes 68 dB port isolation and 1.0 dB insertion loss with an active area of 0.034 mm². The SPDT RF switch is composed of two series-shunt transistor pairs with body-floating technology and a switched parallel LC network. The network uses a turned-off series transistor to resonate out the off-capacitance Coff. The measured output third-order intercept point (OIP3) is higher than 21 dBm. The proposed SPDT RF switch maintains return losses of all working ports less than 10 dB from 8 GHz to 20 GHz. The high-performance SPDT RF switch is fabricated in standard 65-nm CMOS technology.
Qingyun WANG Xinchun JI Ruiyu LIANG Li ZHAO
In traditional microphone array signal processing, the performance degrades rapidly when the array aperture decreases, which has been a barrier restricting its implementation in small-scale acoustic systems such as digital hearing aids. In this work, a new compressed sampling method for miniature microphone arrays is proposed, which compresses information inside the ADC by means of a mixed system of hardware circuit and software program in order to remove the redundancy among the signals of the different array elements. The architecture of the method is developed using the Verilog language and has already been tested on an FPGA chip. Experiments on compressed sampling and reconstruction show successful sparse representation and reconstruction for speech sources. Because it avoids the singularity problem of the correlation matrix of the miniature microphone array, the proposed method, when used for direction of arrival (DOA) estimation in digital hearing aids, has the advantage of higher resolution compared with the traditional GCC and MUSIC algorithms.
Xiaoyun WANG Tsuneo KATO Seiichi YAMAMOTO
Recognition of second language (L2) speech is a challenging task even for state-of-the-art automatic speech recognition (ASR) systems, partly because pronunciation by L2 speakers is usually significantly influenced by the mother tongue of the speakers. Considering that the expressions of non-native speakers are usually simpler than those of native ones, and that second language speech usually includes mispronunciation and less fluent pronunciation, we propose a novel method that maximizes a unified acoustic and linguistic objective function to derive a phoneme set for second language speech recognition. The authors verify the efficacy of the proposed method using second language speech collected with a translation game type dialogue-based computer assisted language learning (CALL) system. In this paper, the authors examine the performance based on acoustic likelihood, linguistic discrimination ability and the integrated objective function for second language speech. Experiments demonstrate the validity of the phoneme set derived by the proposed method.
Xueqi ZHANG Wei WU Baoyun WANG Jian LIU
This letter investigates transmit optimization in multi-user multi-input multi-output (MIMO) wiretap channels. In particular, we address the transmit covariance optimization for artificial-noise (AN)-aided secrecy rate maximization (SRM) subject to individual harvested energy and average transmit power constraints. Owing to the inefficiency of conventional interior-point solvers in handling our formulated SRM problem, a custom-designed algorithm based on the penalty function (PF) and projected gradient (PG) methods is proposed, which results in semi-closed-form solutions. The proposed algorithm reduces the running time by about two orders of magnitude with nearly the same performance compared with the existing interior-point solvers. In addition, the proposed algorithm can be extended to other power-limited transmit design problems. Simulation results demonstrate the excellent performance and high efficiency of the algorithm.
Qingyun WANG Ruiyu LIANG Li JING Cairong ZOU Li ZHAO
Since digital hearing aids are sensitive to time delay and power consumption, the computational complexity of noise reduction must be reduced as much as possible. Therefore, some complicated algorithms based on time-frequency domain analysis are very difficult to implement in digital hearing aids. This paper presents a new approach that yields an improved noise reduction algorithm with greatly reduced computational complexity for multi-channel digital hearing aids. First, the sub-band sound pressure level (SPL) is calculated in real time. Then, based on the calculated sub-band SPL, the noise in the sub-band is estimated and the possibility of speech is computed. Finally, the a posteriori and a priori signal-to-noise ratios are estimated and the gain function is acquired to reduce the noise adaptively. By replacing the FFT and IFFT transforms with the known SPL, the proposed algorithm greatly reduces the computational load. Experiments on a prototype digital hearing aid show that the time delay is decreased to nearly half that of the traditional adaptive Wiener filtering and spectral subtraction algorithms, while the SNR improvement and PESQ score remain satisfactory. Compared with the modulation frequency-based noise reduction algorithm, which is used in many commercial digital hearing aids, the proposed algorithm achieves not only more than 5 dB SNR improvement but also less time delay and power consumption.
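The last processing step in this abstract, estimating the a posteriori and a priori SNRs and deriving a gain, can be sketched per sub-band as below. The abstract does not name the estimator or the gain rule, so the decision-directed a priori SNR estimate and the Wiener-type gain used here are swapped-in textbook choices. The function name, the smoothing factor `alpha`, and the use of linear sub-band powers rather than SPL in dB are likewise assumptions.

```python
def subband_gain(signal_power, noise_power, prev_gain, prev_post_snr,
                 alpha=0.98, eps=1e-12):
    """Return (gain, a posteriori SNR) for one sub-band and one frame.

    signal_power  : noisy sub-band power for the current frame (linear scale)
    noise_power   : estimated noise power in that sub-band (linear scale)
    prev_gain     : gain applied to this sub-band in the previous frame
    prev_post_snr : a posteriori SNR of this sub-band in the previous frame
    """
    # A posteriori SNR: noisy power relative to the noise estimate.
    post_snr = signal_power / (noise_power + eps)
    # Decision-directed a priori SNR (assumed estimator): blend the previous
    # frame's clean-speech estimate with the current instantaneous estimate.
    prio_snr = (alpha * prev_gain ** 2 * prev_post_snr
                + (1.0 - alpha) * max(post_snr - 1.0, 0.0))
    # Wiener-type gain (assumed rule), always in [0, 1).
    gain = prio_snr / (1.0 + prio_snr)
    return gain, post_snr
```

The returned gain and a posteriori SNR are fed back as `prev_gain` and `prev_post_snr` on the next frame, so each sub-band needs only two stored values between frames.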
Recognition of second language (L2) speech is still a challenging task even for state-of-the-art automatic speech recognition (ASR) systems, partly because pronunciation by L2 speakers is usually significantly influenced by the mother tongue of the speakers. The authors previously proposed using a reduced phoneme set (RPS) instead of the canonical one of L2 when the mother tongue of speakers is known, and demonstrated that this reduced phoneme set improved the recognition performance through experiments using English utterances spoken by Japanese speakers. However, the proficiency of L2 speakers varies widely, as does the influence of the mother tongue on their pronunciation. As a result, the effect of the reduced phoneme set differs depending on the speakers' proficiency in L2. In this paper, the authors examine the relation between the proficiency of speakers and a reduced phoneme set customized for them. The experimental results are then used as the basis of a novel speech recognition method using a lexicon in which the pronunciation of each lexical item is represented by multiple reduced phoneme sets, and the implementation of a language model most suitable for that lexicon is described. Experimental results demonstrate the high validity of the proposed method.