Shijian HUANG Junyong YE Tongqing WANG Li JIANG Changyuan XING Yang LI
Traditional low-rank features lose the temporal information in an action sequence. To retain this information, we split an action video into multiple subsequences and concatenate the low-rank features of the subsequences in temporal order. We then recognize actions by learning a novel dictionary model from the concatenated low-rank features. However, traditional dictionary learning models usually neglect the similarity among the coding coefficients and perform poorly on data that are not linearly separable. To overcome these shortcomings, we present a novel similarity-constrained discriminative kernel dictionary learning method for action recognition. The effectiveness of the proposed method is verified on three benchmarks, and the experimental results are promising for action recognition.
Haiyang LIU Hao ZHANG Lianrong MA Lingjun KONG
In this letter, we consider the structural analysis of nonbinary cyclic and quasi-cyclic (QC) low-density parity-check (LDPC) codes with α-multiplied parity-check matrices (PCMs). Using analytical methods, we determine several structural parameters of nonbinary cyclic and QC LDPC codes with α-multiplied PCMs. In particular, some classes of nonbinary LDPC codes constructed from finite fields and finite geometries are shown to have good minimum distance and stopping distance properties, which may explain to some extent their excellent decoding performance.
Andrew W. POON Linjie ZHOU Fang XU Chao LI Hui CHEN Tak-Keung LIANG Yang LIU Hon K. TSANG
In this review paper we showcase recent silicon photonics science and technology research in Hong Kong in two important topical areas--microresonator devices and optical nonlinearities. Our work on silicon microresonator filters, switches and modulators has shown promise for the nascent development of on-chip optoelectronic signal processing systems, while our studies on optical nonlinearities have contributed to the basic understanding of silicon-based optically-pumped light sources and helium-implanted detectors. Here, we review our various passive and electro-optic active microresonator devices including (i) cascaded microring resonator cross-connect filters, (ii) NRZ-to-PRZ data format converters using a microring resonator notch filter, (iii) GHz-speed carrier-injection-based microring resonator modulators and 0.5-GHz-speed carrier-injection-based microdisk resonator modulators, and (iv) electrically reconfigurable microring resonator add-drop filters and electro-optic logic switches using interferometric resonance control. On the nonlinear waveguide front, we review the main nonlinear optical effects in silicon, and show that even at fairly modest average powers two-photon absorption and the accompanying free-carrier linear absorption could lead to optical limiting and a dramatic reduction in the effective lengths of nonlinear devices.
Yang LI Jinlin WANG Xuewen ZENG Xiaozhou YE
Montgomery modular multiplication is one of the most efficient algorithms for modular multiplication of large integers. On resource-constrained embedded processors, memory-access operations play as important a role as arithmetic operations in modular multiplication. To improve the efficiency of Montgomery modular multiplication on embedded processors, this paper concentrates on reducing memory-access operations by adding a few working registers. We first revisit popular existing Montgomery modular multiplication algorithms, and then present improved algorithms for Montgomery modular multiplication and squaring over arbitrary prime fields. The algorithms adopt the general ideas of the hybrid multiplication algorithm proposed by Gura and the lazy doubling algorithm proposed by Lee. Through careful optimization and redesign, we propose novel implementations of Montgomery multiplication and squaring, called the coarsely integrated product and operand hybrid scanning algorithm (CIPOHS) and the coarsely integrated lazy doubling algorithm (CILD). We then implement the algorithms on a general MIPS64 processor and on an OCTEON CN6645 processor equipped with specific multiply-add instructions. Experiments show that CIPOHS and CILD offer the best performance on both the general MIPS64 and the OCTEON CN6645 processors, and that their advantage is most pronounced on processors with specific multiply-add instructions such as the OCTEON CN6645. When the modulus is 2048 bits, CIPOHS and CILD outperform the CIOS algorithm by 47% and 58%, respectively.
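As background for the abstract above, the following is a minimal sketch of textbook Montgomery multiplication on Python integers. It is not CIPOHS, CILD, or CIOS, which are word-level scanning schemes whose details the abstract does not give; the function names and the choice R = 2**bits are assumptions made for illustration.

```python
def montgomery_setup(n: int, bits: int):
    """Precompute constants for an odd modulus n with R = 2**bits > n."""
    if n % 2 == 0 or n >= (1 << bits):
        raise ValueError("n must be odd and smaller than R = 2**bits")
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                # R^2 mod n, used to enter the Montgomery domain
    return n_prime, r2


def redc(t: int, n: int, bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits            # t + m*n is divisible by R by construction
    return u - n if u >= n else u      # single conditional subtraction


def mont_mul(a_bar: int, b_bar: int, n: int, bits: int, n_prime: int) -> int:
    """Multiply two values that are already in the Montgomery domain."""
    return redc(a_bar * b_bar, n, bits, n_prime)


def modmul(a: int, b: int, n: int) -> int:
    """Compute a * b mod n by a round trip through the Montgomery domain."""
    bits = n.bit_length()              # smallest bits with 2**bits > n
    n_prime, r2 = montgomery_setup(n, bits)
    a_bar = mont_mul(a % n, r2, n, bits, n_prime)  # a * R mod n
    b_bar = mont_mul(b % n, r2, n, bits, n_prime)  # b * R mod n
    c_bar = mont_mul(a_bar, b_bar, n, bits, n_prime)
    return redc(c_bar, n, bits, n_prime)           # leave the Montgomery domain
```

The round trip in `modmul` only pays off when many multiplications share one modulus, as in modular exponentiation, because the conversion cost is then amortized.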
This letter proposes an efficient Two-stage Resource scheduling algorithm for cloud-based Live Media Streaming systems (TRLMS). It transforms the cloud-based resource scheduling problem into a min-cost flow problem on a graph and solves it with an improved Successive Shortest Path (SSP) algorithm. Simulation results show that TRLMS improves user demand satisfaction by 17.1% over the mean-based method, and that its time complexity is much lower than that of the original SSP algorithm.
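To make the min-cost flow formulation concrete, here is a sketch of the standard successive shortest path method, using a queue-based Bellman-Ford search on the residual graph. It is not the improved SSP of TRLMS, and the abstract does not describe how the scheduling graph is built; the class name, `build_example`, and the four-node graph are hypothetical.

```python
from collections import deque


class MinCostFlow:
    """Textbook successive shortest path on a residual graph."""

    def __init__(self, n: int):
        self.n = n
        # Each edge is [to, residual capacity, cost, index of reverse edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u: int, v: int, cap: int, cost: int) -> None:
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s: int, t: int, max_flow=float("inf")):
        flow = cost = 0
        while flow < max_flow:
            # Cheapest augmenting path; reverse edges carry negative costs.
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, w, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev[t] is None:
                break  # no augmenting path remains
            # Bottleneck capacity along the path.
            push = max_flow - flow
            v = t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Augment along the path and update the reverse edges.
            v = t
            while v != s:
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]
        return flow, cost


def build_example() -> MinCostFlow:
    """Hypothetical graph: node 0 is the source and node 3 is the sink."""
    g = MinCostFlow(4)
    g.add_edge(0, 1, 2, 1)
    g.add_edge(0, 2, 1, 2)
    g.add_edge(1, 3, 1, 1)
    g.add_edge(1, 2, 1, 1)
    g.add_edge(2, 3, 2, 1)
    return g
```

Calling `build_example().solve(0, 3)` returns the total flow and its cost as a pair; passing `max_flow` caps the amount of flow that is routed.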
Kuo-Yi CHEN Chin-Yang LIN Tien-Yan MA Ting-Wei HOU
With more digital home appliances and network devices having OSGi as the software management platform, the power-saving capability of the OSGi platform has become a critical issue. This paper is aimed at improving the power-efficiency of the OSGi platform, i.e. reducing the energy consumption with minimum performance degradation. The key to this study is an efficient power-saving technique which exploits the runtime information already available in a Java virtual machine (JVM), the base software of the OSGi platform, to best determine the timing of performing DVFS (Dynamic Voltage and Frequency Scaling). This, technically, involves a phase detection scheme that identifies the memory phase of the OSGi-enabled device/server in a correct and almost effortless way. The overhead of the power-saving procedure is thus minimized, and the system performance is well maintained. We have implemented and evaluated the proposed power-saving approach on an OSGi server, where the Apache Felix OSGi implementation and the DaCapo benchmarks were applied. The results show that this approach can achieve real power-efficiency for the OSGi platform, in which the power consumption is significantly reduced and the performance remains highly competitive, compared with the other power-saving techniques.
Hua JIANG Kanglian ZHAO Yang LI Sidan DU
In this letter we design a new family of space-time block codes (STBC) for multi-input multi-output (MIMO) systems. The complex orthogonal STBC achieves full diversity and full transmission rate with fast maximum-likelihood decoding when only two transmit antennas are employed. By combining the Alamouti STBC and the multidimensional signal constellation rotation based on the cyclotomic number field, we construct cyclotomic orthogonal space-time block codes (COSTBCs) which can achieve full diversity and full rate for multiple transmit antennas. Theoretical analysis and simulation results demonstrate excellent performance of the proposed codes, while the decoding complexity is further reduced.
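The two-antenna building block mentioned above, the Alamouti code, can be sketched as follows. This illustrates only the classical Alamouti encoding and linear combining over a flat channel that stays constant for two time slots; the cyclotomic constellation rotation and the COSTBC construction are not reproduced, and the function names are assumptions.

```python
def alamouti_encode(s1: complex, s2: complex):
    """Return the 2x2 block: rows are time slots, columns are transmit antennas."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def receive(block, h1: complex, h2: complex, noise=(0j, 0j)):
    """Single receive antenna; h1 and h2 are assumed constant over both slots."""
    return [h1 * a + h2 * b + n for (a, b), n in zip(block, noise)]


def alamouti_combine(r1: complex, r2: complex, h1: complex, h2: complex):
    """Linear combining that decouples the two symbols (noise-free case is exact)."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


def demo():
    """Send two QPSK-like symbols through a hypothetical channel and recover them."""
    s1, s2 = 1 + 1j, -1 + 1j
    h1, h2 = 0.3 - 0.7j, -1.1 + 0.2j
    r1, r2 = receive(alamouti_encode(s1, s2), h1, h2)
    return alamouti_combine(r1, r2, h1, h2)
```

Because the combiner separates the two symbols, maximum-likelihood detection reduces to two independent symbol decisions, which is the fast decoding property the abstract refers to.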
Object detection is one of the most important aspects of computer vision, and the use of CNNs for object detection has yielded substantial results in a variety of fields. However, the fixed sampling in standard convolution layers restricts receptive fields to fixed locations and limits the ability of CNNs to handle geometric transformations. This leads to poor performance of CNNs in slender object detection. To achieve better accuracy and efficiency in slender object detection, the proposed detector, DFAM-DETR, not only adjusts the sampling points adaptively but also, through an attention mechanism, enhances the ability to focus on slender object features and to extract essential information from the image, from global to local. This study uses slender-object images from the MS-COCO dataset. The experimental results show that DFAM-DETR achieves excellent detection performance on slender objects compared with CNN-based and transformer-based detectors.
Lingjun KONG Haiyang LIU Lianrong MA
This letter is concerned with incorrigible sets of binary linear codes. For a given binary linear code C, we represent the numbers of incorrigible sets of size up to ⌈3/2d - 1⌉ using the weight enumerator of C, where d is the minimum distance of C. In addition, we determine the incorrigible set enumerators of binary Golay codes G23 and G24 through combinatorial methods.
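For intuition, incorrigible sets of a small code can be counted by brute force. The sketch below assumes the usual definition, namely that a set of coordinates is incorrigible when it contains the support of a nonzero codeword, and uses a systematic [7,4] Hamming code chosen here for illustration. It does not reproduce the weight-enumerator formulas or the Golay code results of the letter.

```python
from itertools import combinations

# Rows of an assumed systematic generator matrix [I | P] of a [7,4] Hamming code,
# written as 7-bit integers.
HAMMING_7_4 = [0b1000110, 0b0100101, 0b0010011, 0b0001111]


def nonzero_codewords(generator_rows):
    """Enumerate all nonzero codewords as bitmasks."""
    words = set()
    k = len(generator_rows)
    for message in range(1, 1 << k):
        word = 0
        for i in range(k):
            if (message >> i) & 1:
                word ^= generator_rows[i]
        words.add(word)
    return words


def incorrigible_set_counts(generator_rows, n: int):
    """counts[w] = number of size-w coordinate sets containing a codeword support."""
    words = nonzero_codewords(generator_rows)
    counts = [0] * (n + 1)
    for size in range(n + 1):
        for positions in combinations(range(n), size):
            mask = sum(1 << p for p in positions)
            # The set contains supp(c) exactly when c has no 1 outside the set.
            if any(c & ~mask == 0 for c in words):
                counts[size] += 1
    return counts
```

For this code, `incorrigible_set_counts(HAMMING_7_4, 7)` gives zero sets of size below the minimum distance 3, and every set of size above n - k = 3 is counted, as expected for erasure decoding.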
Yang LIU Yuqi XIA Haoqin SUN Xiaolei MENG Jianxiong BAI Wenbo GUAN Zhen ZHAO Yongwei LI
Speech emotion recognition (SER) has long been a complex and difficult task because of the complexity of emotion. In this paper, we propose a multitask deep learning approach to SER based on a cascaded attention network and a self-adaption loss. First, non-personalized features are extracted to represent the process of emotion change while reducing the influence of external variables. Second, to highlight salient speech emotion features, a cascaded attention network is proposed, in which spatial temporal attention effectively locates the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence of differences in gender and in human perception of external information is alleviated by a multitask learning strategy, in which a self-adaption loss is introduced to determine the weights of the different tasks dynamically. Experimental results on the IEMOCAP dataset demonstrate that our method gains absolute improvements of 1.97% and 0.91% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
Jingzhao DAI Ming LI Xuejiao HU Yang LI Sidan DU
Gaze following is the task of estimating where an observer is looking inside a scene. Both the observer and the scene information must be learned to determine gaze directions and gaze points. Many recent works focus only on the scene or only on the observer, and published frameworks for gaze following remain limited. In this paper, a gaze following method using a hybrid transformer is proposed. Building on the conventional method (GazeFollow), we make three improvements. First, a hybrid transformer is applied to learn head images and gaze positions. Second, the pinball loss function is used to control the gaze point error. Finally, a novel ReLU layer with a reborn mechanism (reborn ReLU) is introduced to replace traditional ReLU layers in different network stages. To test these improvements, we train our framework on the DL Gaze dataset and evaluate the model on a set that we collected. The experimental results show that our framework outperforms the reference methods.
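The pinball loss named above is the standard quantile-regression loss. The sketch below shows its usual scalar form; how the paper applies it to gaze points (per coordinate, which quantile levels) is not stated in the abstract, so the function name and the default `tau` are assumptions.

```python
def pinball_loss(y_true: float, y_pred: float, tau: float = 0.5) -> float:
    """Standard pinball (quantile) loss for quantile level tau in (0, 1).

    Under-prediction (y_true > y_pred) is weighted by tau and
    over-prediction by (1 - tau).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball_loss(targets, predictions, tau: float = 0.5) -> float:
    """Average the loss over paired targets and predictions."""
    pairs = list(zip(targets, predictions))
    return sum(pinball_loss(t, p, tau) for t, p in pairs) / len(pairs)
```

With `tau = 0.5` the loss equals half the absolute error, so the asymmetry only appears for other quantile levels.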
Yang YU Longlong LIU Ye ZHU Shixin CEN Yang LI
Pedestrian attribute recognition (PAR) aims to recognize a series of a person's semantic attributes, e.g., age and gender, and plays an important role in video surveillance. This paper proposes a multi-correlation graph convolutional network for PAR, named MCGCN, which includes a semantic graph, a visual graph, and a synthesis graph. We construct the semantic graph from attribute features with semantic constraints. A graph convolution based on prior knowledge of the dataset is employed to learn the semantic correlation. 2D features are projected onto the visual graph nodes, each of which corresponds to the feature region of one attribute group. Graph convolution is then used to learn the regional correlation. The visual graph nodes are connected to the semantic graph nodes to form the synthesis graph. In the synthesis graph, the regional and semantic correlations are embedded into each other through inter-graph edges, so that they guide each other's learning and update the visual and semantic graphs, thereby constructing the semantic and regional correlations. On this basis, we use a better loss weighting strategy, the suit_polyloss, to address the imbalance of pedestrian attribute datasets. Experiments on three benchmark datasets show that the proposed approach achieves state-of-the-art recognition performance.
Yang LI Zhuang MIAO Jiabao WANG Yafei ZHANG Hang LI
The latest deep hashing methods perform hash code learning and image feature learning simultaneously by using pairwise or triplet labels. However, generating all possible pairwise or triplet labels from the training dataset can quickly become intractable, and the majority of those samples may produce small costs, resulting in slow convergence. In this letter, we propose a novel deep discriminative supervised hashing method, called DDSH, which directly learns hash codes based on a new combined loss function. Compared with previous methods, our method takes full advantage of the annotated data in terms of pairwise similarity and image identities. Extensive experiments on standard benchmarks demonstrate that our method preserves instance-level similarity and outperforms state-of-the-art deep hashing methods in image retrieval. Remarkably, our 16-bit binary representation surpasses the performance of existing 48-bit binary representations, which demonstrates that our method can effectively improve the speed and precision of large-scale image retrieval systems.
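To illustrate why short binary codes speed up retrieval, the sketch below ranks database items by Hamming distance to a query code, which is the usual way hash codes are compared. The abstract does not state the retrieval procedure or the DDSH loss, so the function names and the example 16-bit codes are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database):
    """Return database indices sorted by increasing distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


def example_ranking():
    """Rank three hypothetical 16-bit codes against a query code."""
    query = 0b1010101010101010
    database = [
        0b1010101010101011,  # differs from the query in 1 bit
        0b0101010101010101,  # differs in all 16 bits
        0b1010101010101010,  # identical to the query
    ]
    return rank_by_hamming(query, database)
```

Each comparison is one XOR and one bit count, which is why a 16-bit code that matches the accuracy of a 48-bit code reduces both storage and comparison cost.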
Haiyang LIU Hao ZHANG Lianrong MA
Based on the codewords of the [q,2,q-1] extended Reed-Solomon (RS) code over the finite field F_q, we can construct a regular binary γq×q² matrix H(γ,q), where q is a power of 2 and γ≤q. The matrix H(γ,q) defines a regular low-density parity-check (LDPC) code C(γ,q), called a full-length RS-LDPC code. Using some analytical methods, we completely determine the values of s(H(4,q)), s(H(5,q)), and d(C(5,q)) in this letter, where s(H(γ,q)) and d(C(γ,q)) are the stopping distance of H(γ,q) and the minimum distance of C(γ,q), respectively.
Haiyang LI Tieran ZHENG Guibin ZHENG Jiqing HAN
In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed confidence measure is based on the context consistency between a hypothesized word and its context in a word lattice. The main contribution of this paper is to compute the context consistency by considering both the uncertainty in the speech recognition results and the effect of topic. To measure the uncertainty of the context, we employ the word occurrence probability, which is obtained by combining the overlapping hypotheses in a word posterior lattice. To handle the effect of topic, we propose a topic adaptation method. The method first classifies the spoken document by topic and then computes the context consistency of the hypothesized word with a topic-specific measure of semantic similarity. We apply the topic-specific measure of semantic similarity in two ways: using the information of the top-1 topic, and using the mixture of all topics given by the topic classification. Experiments on the Hub-4NE Mandarin database show that both the occurrence probability of the context words and the topic adaptation are effective for the confidence measure of STD. The proposed confidence measure outperforms measures that ignore the uncertainty of the context or that do not use topic information.
Kosuke KATAYAMA Mizuki MOTOYOSHI Kyoya TAKANO Chen Yang LI Shuhei AMAKAWA Minoru FUJISHIMA
E-band communication is allocated the 71-76 GHz and 81-86 GHz frequency bands. Radio-frequency (RF) front-end components for E-band communication have been realized using compound semiconductor technology. To realize a CMOS low-noise amplifier (LNA) for E-band communication, we propose a gain-boosted cascode amplifier (GBCA) stage that simultaneously provides high gain and stability. Designing an LNA from scratch requires considerable time because tuning the matching networks while accounting for parasitic elements is complicated. In this paper, we model the characteristics of the devices including the effects of their parasitic elements. Using these models, an optimizer can precisely estimate the characteristics of a designed LNA without electromagnetic simulations and provides the design values of an LNA when the layout constraint is ignored. Starting from these values, a four-stage LNA with a GBCA stage is designed easily, even with the layout constraint taken into account, and is fabricated in a 65 nm LP CMOS process. Measurements of the fabricated LNA confirm that it achieves a bandwidth of 18.5 GHz and a gain of over 24.3 dB with a power consumption of 50.6 mW. This is the first LNA in any semiconductor technology to achieve a gain bandwidth of over 300 GHz in the E-band. We thereby demonstrate that CMOS technology, which is suitable for baseband and digital circuitry, is applicable to a communication system covering the entire E-band.
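The gain bandwidth figure can be checked against the reported gain and bandwidth. The sketch below assumes that gain bandwidth means the linear voltage gain multiplied by the bandwidth, with the decibel value converted as 20·log10; the abstract does not spell out the definition, and the function name is hypothetical.

```python
def gain_bandwidth_ghz(gain_db: float, bandwidth_ghz: float) -> float:
    """Gain-bandwidth product in GHz, assuming gain_db is a voltage gain in dB."""
    linear_gain = 10 ** (gain_db / 20.0)  # 24.3 dB corresponds to roughly 16.4x
    return linear_gain * bandwidth_ghz


# Reported figures: over 24.3 dB gain across an 18.5 GHz bandwidth.
reported_gbw = gain_bandwidth_ghz(24.3, 18.5)
```

Under this assumption the product comes out slightly above 300 GHz, consistent with the claim in the abstract.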
Fuxing CHEN Weiyang LIU Hui LI Dongcheng WU
Traditional multicast switch fabrics, which were mainly developed from unicast switch fabrics, currently cannot achieve high efficiency and flexible large-scale scalability. Based on lattice theory and the multicast concentrator, a novel multistage interconnection multicast switch fabric is proposed in this paper. Compared with traditional multicast switch fabrics, this fabric has the advantages of superior scalability, wire-speed and jitter-free multicast with low delay, and no queuing buffer. This paper thoroughly analyzes the performance of the proposed multicast switch fabric with support for priority-based multicast. Simulation results for packet loss rate and delay at normalized load are presented and discussed. Moreover, a detailed FPGA implementation is given. Practical network traffic tests provide evidence supporting the feasibility and stability of the proposed fabric.
Xiang-bin YU Ying WANG Qiu-ming ZHU Yang LI Qing-ming MENG
In this paper, a low-complexity precoding scheme is presented for minimizing the bit error rate (BER) under a fixed power constraint in distributed antenna systems with non-Kronecker correlation over spatially correlated Rayleigh fading channels. Based on an approximated BER bound and a newly defined compressed signal-to-noise ratio (CSNR) criterion, closed-form expressions for the power allocation and the beamforming matrix are derived for the developed precoding scheme. This scheme not only requires less computation for the power allocation than the existing optimal precoding scheme, but also achieves BER performance close to it. Simulation results show that the proposed scheme provides a lower BER than equal power allocation and the single-mode beamforming scheme, and has almost the same performance as the existing optimal scheme.
Hao HAN Yinxing XUE Keizo OYAMA Yang LIU
The rendering mechanism plays an indispensable role in browser-based Web applications. It generates active webpages dynamically and provides a human-readable layout through template engines, which are used as a standard programming model to separate the business logic and data computations from the webpage presentation. The client-side rendering mechanism, owing to advances in rich application technologies, has been widely adopted. The adoption of client-side rendering brings not only various merits but also new problems. In this paper, we propose and construct “pagelet”, a segment-based template engine for developing flexible and extensible Web applications. By presenting the principles, practice and usage experience of pagelet, we conduct a comprehensive analysis of the possible advantages and disadvantages of the client-side rendering mechanism from the viewpoints of both developers and end-users.
Yulong XU Yang LI Jiabao WANG Zhuang MIAO Hang LI Yafei ZHANG Gang TAO
The feature extractor is an important component of a tracker, and convolutional neural networks (CNNs) have demonstrated excellent performance in visual tracking. However, CNN features do not perform well under low illumination. To address this issue, we propose a novel deep correlation tracker with backtracking, which consists of target translation, backtracking and scale estimation. We employ four correlation filters, one with a histogram of oriented gradient (HOG) descriptor and the other three with CNN features, to estimate the translation. In particular, we propose a backtracking algorithm to reconfirm the translation location. Comprehensive experiments are performed on a large-scale challenging benchmark dataset. The results show that the proposed algorithm outperforms state-of-the-art methods in accuracy and robustness.