Yang WANG Hongliang FU Huawei TAO Jing YANG Hongyi GE Yue XIE
This letter focuses on the cross-corpus speech emotion recognition (SER) task, in which the training and testing speech signals belong to different speech corpora. Existing algorithms cannot effectively extract the common emotion-related information shared between different corpora, which limits knowledge transfer. To address this challenging problem, a novel convolutional auto-encoder and adversarial domain adaptation (CAEADA) framework for cross-corpus SER is proposed. The framework first constructs a one-dimensional convolutional auto-encoder (1D-CAE) for feature processing, which explores the correlation among adjacent one-dimensional statistical features, while its encoder-decoder-style architecture enhances the feature representation. Subsequently, the adversarial domain adaptation (ADA) module reduces the discrepancy between the source- and target-domain feature distributions by confusing a domain discriminator, and additionally employs the maximum mean discrepancy (MMD) to better accomplish the feature transformation. To evaluate the proposed CAEADA, extensive experiments were conducted on the EmoDB, eNTERFACE, and CASIA speech corpora, and the results show that the proposed method outperforms other approaches.
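The MMD term mentioned above measures how far apart two feature distributions are in a kernel feature space. The following is a minimal sketch of the biased empirical squared MMD with a Gaussian kernel. The kernel choice, the bandwidth sigma, and the synthetic 8-D features are assumptions for illustration, not the letter's settings.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two feature sets."""
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())

# Hypothetical 8-D features for a source corpus and two target corpora.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 8))
tgt_aligned = rng.normal(0.0, 1.0, size=(200, 8))
tgt_shifted = rng.normal(1.0, 1.0, size=(200, 8))
print(mmd2(src, tgt_aligned, sigma=2.0), mmd2(src, tgt_shifted, sigma=2.0))
```

In a training loop, such a term would typically be computed on mini-batch features from the two corpora and added to the loss, so that minimizing it pulls the domain feature distributions together alongside the adversarial discriminator.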
Kota YAMASHITA Shotaro KAMIYA Koji YAMAMOTO Yusuke KODA Takayuki NISHIO Masahiro MORIKURA
In this study, we propose a contextual multi-armed bandit (CMAB)-based decentralized channel exploration framework that disentangles a channel utility function (i.e., reward) with respect to contending neighboring access points (APs). The proposed framework enables APs to evaluate observed rewards compositionally for contending APs, which provides both robustness against reward fluctuations caused by the varying channels of neighboring APs and the ability to assess even unexplored channels. To realize this framework, we propose contention-driven feature extraction (CDFE), which extracts the adjacency relations among APs under contention and forms the basis for expressing reward functions in disentangled form, that is, as a linear combination of parameters associated with the neighboring APs under contention. This allows the CMAB to be leveraged with joint linear upper confidence bound (JLinUCB) exploration, which we use to examine the effectiveness of the proposed framework. Moreover, we address the problem of non-convergence (the channel exploration cycle) by proposing a penalized JLinUCB (P-JLinUCB), whose key idea is to introduce a discount parameter to the reward when an AP exploits a different channel before and after a learning round. Numerical evaluations confirm that CDFE allows APs to assess channel quality robustly against reward fluctuations and that P-JLinUCB achieves better convergence properties.
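The idea of a reward that is linear in per-neighbor contention features can be sketched with a plain LinUCB learner. This is a minimal sketch, not JLinUCB itself: the feature layout (a bias term plus one indicator per neighboring AP on the candidate channel), the per-neighbor penalties, and the `switch_discount` that mimics the P-JLinUCB idea of discounting the reward after a channel change are all hypothetical.

```python
import numpy as np

class LinUCB:
    """LinUCB with one shared linear reward model over per-channel feature vectors."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha              # exploration weight
        self.A = reg * np.eye(dim)      # regularized Gram matrix of chosen features
        self.b = np.zeros(dim)          # reward-weighted sum of chosen features

    def select(self, features):
        """features: (n_channels, dim) array; returns the index with the highest UCB."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        widths = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return int(np.argmax(features @ theta + self.alpha * widths))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Hypothetical scenario: 3 channels, 4 neighboring APs on fixed channels.
neighbor_channel = np.array([0, 0, 1, 2])
penalty = np.array([0.3, 0.2, 0.1, 0.4])     # hypothetical loss per contending neighbor
true_theta = np.concatenate(([1.0], -penalty))

def contention_features(channel):
    # Bias term plus one indicator per neighbor contending on this channel.
    return np.concatenate(([1.0], (neighbor_channel == channel).astype(float)))

feats = np.stack([contention_features(c) for c in range(3)])
agent = LinUCB(dim=feats.shape[1], alpha=0.5)
rng = np.random.default_rng(1)
switch_discount = 0.9     # hypothetical discount applied when the channel changes
counts = np.zeros(3, dtype=int)
prev = None
for _ in range(500):
    c = agent.select(feats)
    reward = feats[c] @ true_theta + rng.normal(0.0, 0.05)
    if prev is not None and c != prev:
        reward *= switch_discount
    agent.update(feats[c], reward)
    counts[c] += 1
    prev = c
print(counts)   # channel 1, the least-contended one, should dominate
```

Because the parameters are attached to neighbors rather than to channels, an observation on one channel also informs the estimate for any other channel that shares the same contending neighbors, which is what makes assessing unexplored channels possible.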
Yiyang JIA Jun MITANI Ryuhei UEHARA
Logical matrices are binary matrices often used to represent relations. In the map folding problem, each folded state corresponds to a unique partial order on the set of squares and thus can be described with a logical matrix. The logical matrix representation is more powerful than graphs or other common representations, considering its association with category theory and homology theory and its generalizability to other computational problems. At the application level, such representations allow us to recognize map folding intuitively. For example, logical matrices give a precise mathematical description of a folding process, which lets us address, in a mathematical and natural manner, problems such as how to represent the up-and-down relations between all the layers according to their adjacency in a flat-folded state, how to check for self-penetration, and how to deduce a folding process from a given order of squares that is supposed to represent a folded state of the map. In this paper, we give solutions to these problems and analyze their computational complexity.
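A relation encoded as a logical matrix is a partial order exactly when it is reflexive, antisymmetric, and transitive. The sketch below checks these properties and builds a layer relation from "directly above" pairs via Warshall's transitive closure. The encoding (m[i][j] is True when square i lies at or above square j) and the helper `layer_order` are hypothetical illustrations; the paper's own constructions, including its self-penetration test, are more involved.

```python
def transitive_closure(m):
    """Warshall's algorithm on a Boolean (logical) matrix given as a list of lists."""
    n = len(m)
    c = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            if c[i][k]:
                for j in range(n):
                    if c[k][j]:
                        c[i][j] = True
    return c

def is_partial_order(m):
    """Check reflexivity, antisymmetry, and transitivity of a logical matrix."""
    n = len(m)
    reflexive = all(m[i][i] for i in range(n))
    antisymmetric = all(not (m[i][j] and m[j][i])
                        for i in range(n) for j in range(n) if i != j)
    transitive = all(m[i][k]
                     for i in range(n) for j in range(n) for k in range(n)
                     if m[i][j] and m[j][k])
    return reflexive and antisymmetric and transitive

def layer_order(n, directly_above):
    """Hypothetical helper: m[i][j] is True when square i lies at or above square j.
    directly_above lists pairs (i, j) where i is stacked immediately on j."""
    m = [[i == j for j in range(n)] for i in range(n)]
    for i, j in directly_above:
        m[i][j] = True
    return transitive_closure(m)

stack = layer_order(3, [(0, 2), (2, 1)])        # top-to-bottom order 0, 2, 1
print(stack[0][1], is_partial_order(stack))      # True True
cyclic = layer_order(3, [(0, 1), (1, 2), (2, 0)])
print(is_partial_order(cyclic))                   # False: inconsistent stacking
```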
Yasutaka OGAWA Taichi UTSUNO Toshihiko NISHIMURA Takeo OHGANE Takanori SATO
The sub-Terahertz band is envisioned to play a major role in 6G in achieving extremely high data-rate communication. In addition to very wideband transmission, spatial multiplexing with a hybrid MIMO system is required. However, a recent paper reports that the number of observed multipath components in the sub-Terahertz band is very small in indoor environments. A channel with few multipath components is called sparse. The number of layers (streams), i.e., the multiplexing gain of a MIMO system, cannot exceed the number of multipath components. Sparsity may therefore restrict the spatial multiplexing gain of sub-Terahertz systems, and the resulting poor multiplexing gain may limit the data rate. This paper describes fundamental considerations on sub-Terahertz MIMO spatial multiplexing in indoor environments. We examined how analog beams should be steered to multipath components to achieve higher channel capacity. Furthermore, for different beam allocation schemes, we investigated the eigenvalue distributions of the channel Gram matrix, the power allocation to each layer, and the correlations between analog beams. The simulation results reveal that the analog beams should be steered to all the multipath components to lower their correlations and achieve higher channel capacity.
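Why sparsity caps the number of layers can be seen from a geometric channel model: each multipath component contributes one rank-1 term, so the Gram matrix has at most as many nonzero eigenvalues as there are paths, and water-filling can only assign power to those eigenmodes. The sketch below assumes half-wavelength uniform linear arrays, two hypothetical paths, and full-digital eigenmode transmission; it does not model the paper's hybrid analog beams.

```python
import numpy as np

def ula_response(n, angle):
    """Half-wavelength uniform linear array response (an assumed array model)."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(angle)) / np.sqrt(n)

def sparse_channel(n_rx, n_tx, aods, aoas, gains):
    """Geometric channel: one rank-1 term per multipath component."""
    H = np.zeros((n_rx, n_tx), dtype=complex)
    for g, aod, aoa in zip(gains, aods, aoas):
        H += g * np.outer(ula_response(n_rx, aoa), ula_response(n_tx, aod).conj())
    return H

def water_filling(eigs, total_power, noise=1.0):
    """Water-filling over eigenmodes; returns (powers, capacity in bit/s/Hz)."""
    g = np.sort(np.clip(eigs, 0.0, None))[::-1] / noise
    g = g[g > 1e-9 * g.max()]            # drop numerically zero eigenmodes
    if g.size == 0:
        return g, 0.0
    for k in range(g.size, 0, -1):
        mu = (total_power + np.sum(1.0 / g[:k])) / k   # water level for k active layers
        p = mu - 1.0 / g[:k]
        if p[-1] > 0:                    # weakest active layer still receives power
            break
    powers = np.zeros_like(g)
    powers[:k] = p
    return powers, float(np.sum(np.log2(1.0 + powers * g)))

# Hypothetical indoor scenario: 16x16 arrays, only two multipath components.
H = sparse_channel(16, 16, aods=[0.2, -0.5], aoas=[0.1, 0.7], gains=[1.0, 0.6])
eigs = np.linalg.eigvalsh(H.conj().T @ H)        # eigenvalues of the channel Gram matrix
powers, capacity = water_filling(eigs, total_power=10.0)
print(len(powers), powers, capacity)             # at most two layers can carry data
```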
Quan XIU HO Takao JINNO Yusuke UCHIMI Shigeru KURIYAMA
The colors of objects in natural images are affected by the color of the lighting, and accurately estimating an illuminant's color is indispensable for analyzing scenes lit by colored lights. Recent lighting environments have become more colorful owing to the spread of light-emitting diode (LED) lights, whose colors can be flexibly controlled across the full visible spectrum. However, existing color estimation methods mainly focus on a single illuminant within normal color ranges, and the estimation of multiple illuminants with unusual color settings, such as highly chromatic blue or red, has not yet been studied. New color estimation methods are therefore needed for multiple illuminants of various colors. In this article, we propose a color estimation method for LED lighting using Color Line features, which model the color distribution in a local area as a straight line. Such local estimates are suitable for estimating the various colors of multiple illuminants. The features are sampled at many small regions in an image and aggregated by a convolutional neural network trained with supervised learning to estimate a few global colors. Using an image dataset that we produced under multiple LED lights over the full color range, we demonstrate that our method is more accurate than existing ones in such colorful lighting environments.
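Under a simplified model in which a local patch shows a single gray Lambertian surface, every pixel is a shading factor times the illuminant color, so the patch's RGB values lie on a straight line whose direction is proportional to the illuminant. The sketch fits such a line with PCA. The illuminant, shading, noise level, and the `linearity` score are assumptions for illustration; the paper's exact feature definition and its CNN aggregation are not reproduced here.

```python
import numpy as np

def color_line(patch_rgb):
    """Fit a straight line to the RGB values of a local patch by PCA.

    Returns (mean_color, unit_direction, linearity), where linearity is the
    share of the singular-value mass on the principal axis (a hypothetical
    confidence score, close to 1 when the patch is line-like)."""
    pts = np.asarray(patch_rgb, dtype=float).reshape(-1, 3)
    mean = pts.mean(axis=0)
    _, s, vt = np.linalg.svd(pts - mean, full_matrices=False)
    direction = vt[0]
    if direction.sum() < 0:              # resolve the sign ambiguity of the SVD
        direction = -direction
    return mean, direction, float(s[0] / s.sum())

# Synthetic 8x8 patch: a gray surface under a reddish LED (assumed values).
rng = np.random.default_rng(0)
illuminant = np.array([1.0, 0.4, 0.2])
shading = rng.uniform(0.2, 1.0, size=(8, 8, 1))
patch = shading * (0.5 * illuminant) + rng.normal(0.0, 0.002, size=(8, 8, 3))
mean, direction, linearity = color_line(patch)
print(direction, illuminant / np.linalg.norm(illuminant), linearity)
```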
Stance prediction on social media aims to infer users' stances towards a specific topic or event even when these stances are not expressed explicitly. Extracting and determining users' stances from user-generated content on social media is of great significance for public opinion analysis. Existing research makes use of various signals, ranging from text content to users' online network connections on these platforms. However, it lacks joint modeling of this heterogeneous information for stance prediction. In this paper, we propose a self-supervised heterogeneous graph contrastive learning framework for stance prediction in online debate forums. First, we perform data augmentation on the original heterogeneous information network to generate an augmented view. The original and augmented views are each encoded by a meta-path-based graph encoder. Then, contrastive learning between the two views is conducted to obtain high-quality representations of users and issues. Finally, stance prediction is accomplished by matrix factorization between users and issues. Experimental results on an online debate forum dataset show that our model significantly outperforms other competitive baseline methods.
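The abstract does not name the contrastive objective or the factorization form. As one common choice, assumed here rather than taken from the paper, the two views can be contrasted with a symmetric InfoNCE loss in which a node's embeddings in the two views form the positive pair, and stance can be scored as the sigmoid of a user-issue dot product. The embeddings below are synthetic placeholders for the encoder outputs.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Symmetric InfoNCE loss: row i of z1 and row i of z2 form a positive pair;
    all other rows of the other view act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature

    def diagonal_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

def stance_scores(user_emb, issue_emb):
    """Matrix-factorization-style stance probabilities: sigmoid of user-issue dot products."""
    return 1.0 / (1.0 + np.exp(-(user_emb @ issue_emb.T)))

rng = np.random.default_rng(0)
view_a = rng.normal(size=(32, 16))                        # hypothetical user embeddings, view 1
view_b = view_a + 0.05 * rng.normal(size=(32, 16))        # augmented view, nearly aligned
unrelated = rng.normal(size=(32, 16))
print(info_nce(view_a, view_b), info_nce(view_a, unrelated))
scores = stance_scores(view_a, rng.normal(size=(5, 16)))
print(scores.shape)   # (32, 5)
```

A lower loss for aligned views than for unrelated ones is what drives the encoder toward representations that are stable under augmentation.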
Takuma KINUGAWA Toshimitsu USHIO
In spatially distributed systems such as smart buildings and intelligent transportation systems, the control of spatio-temporal patterns is an important issue. In this paper, we consider a finite-horizon optimal spatio-temporal pattern control problem in which the pattern is specified by a signal spatio-temporal logic formula over finite traces, called an SSTLf formula. We give the syntax and Boolean semantics of SSTLf. Then, we show linear encodings of the temporal and spatial operators used in SSTLf and convert the problem into a mixed integer programming problem. We illustrate the effectiveness of the proposed approach through an example of a heat system in a room.
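The abstract does not list the encodings themselves. As a hedged illustration, the standard big-M encoding used for STL-type specifications in mixed integer programming (not necessarily the exact form used in this paper) introduces a binary variable z for each subformula at each time step. For a predicate with a large constant M and a small margin epsilon > 0:

```latex
% Predicate \mu(x_t) = a^\top x_t - b \ge 0 with binary indicator z_t^{\mu}
a^\top x_t - b \ge -M\,(1 - z_t^{\mu}), \qquad a^\top x_t - b \le M\, z_t^{\mu} - \epsilon

% Conjunction z = z_1 \wedge \dots \wedge z_m
z \le z_i \;\; (i = 1,\dots,m), \qquad z \ge \textstyle\sum_{i=1}^{m} z_i - (m - 1)

% Disjunction z = z_1 \vee \dots \vee z_m
z \ge z_i \;\; (i = 1,\dots,m), \qquad z \le \textstyle\sum_{i=1}^{m} z_i

% Bounded "always" on a finite trace of length T (one possible truncation)
z_t^{\square_{[a,b]}\varphi} = \bigwedge_{\tau = t+a}^{\min(t+b,\,T)} z_\tau^{\varphi}
```

Bounded temporal operators thus reduce to the conjunction and disjunction constraints over the finitely many time steps in their interval. Under a simplified reading, bounded spatial operators such as "everywhere" and "somewhere" reduce in the same way over the finitely many locations within the distance bound.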
Zheying HUANG Ji XU Qingwei ZHAO Pengyuan ZHANG
Although end-to-end speech recognition for Mandarin-English code-switching has attracted increasing interest, it remains challenging due to data scarcity. Meta-learning is a popular approach to low-resource modeling that exploits high-resource data, but it does not make full use of the low-resource code-switching data. We therefore propose a two-fold cross-validation training framework combined with meta-learning. Experiments on the SEAME corpus demonstrate the effectiveness of our method.
Tomu MAKITA Atsuki NAGAO Tatsuki OKADA Kazuhisa SETO Junichi TERUYAMA
A branching program is a well-studied model of computation and a representation for Boolean functions. It is a directed acyclic graph with a unique root node, some accepting nodes, and some rejecting nodes. Except for the accepting and rejecting nodes, each node is labeled with a variable, and each outgoing edge of the node is labeled with a 0/1 assignment of that variable. The satisfiability problem for branching programs is, given a branching program with n variables and m nodes, to determine whether there exists an assignment that activates a consistent path from the root to an accepting node. The width of a branching program is the maximum number of nodes at any level. The satisfiability problem for width-2 branching programs is known to be NP-complete. In this paper, we present a satisfiability algorithm for width-2 branching programs with n variables and cn nodes, and show that its running time is $\mathrm{poly}(n)\cdot 2^{(1-\mu(c))n}$, where $\mu(c)=1/2^{O(c\log c)}$. Our algorithm consists of two phases. First, we transform a given width-2 branching program into a set of structured formulas that consist of AND and Exclusive-OR gates. Then, we check the satisfiability of these formulas by a greedy restriction method that depends on the frequency of occurrence of the variables.
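To make the definitions concrete, the sketch below evaluates a branching program under a full assignment and solves satisfiability by exhaustive search over all 2^n assignments, the baseline that the paper's algorithm improves to 2^{(1-μ(c))n}. Because a full assignment fixes every variable, the path it selects is automatically consistent even if a variable is read more than once. The dictionary encoding (node mapped to variable, 0-child, 1-child) is a hypothetical representation chosen for illustration.

```python
from itertools import product

ACCEPT, REJECT = "ACC", "REJ"

def evaluate(bp, root, assignment):
    """Follow the path selected by a full assignment; bp maps node -> (var, child_if_0, child_if_1)."""
    node = root
    while node not in (ACCEPT, REJECT):
        var, child0, child1 = bp[node]
        node = child1 if assignment[var] else child0
    return node == ACCEPT

def brute_force_sat(bp, root, n):
    """Exhaustive 2^n baseline: return a satisfying assignment or None."""
    for bits in product((0, 1), repeat=n):
        if evaluate(bp, root, bits):
            return bits
    return None

# Width-2 example computing x0 XOR x1 (level 0: root, level 1: b0 and b1).
xor_bp = {
    "root": (0, "b0", "b1"),
    "b0": (1, REJECT, ACCEPT),
    "b1": (1, ACCEPT, REJECT),
}
print(brute_force_sat(xor_bp, "root", 2))      # (0, 1)
unsat_bp = {"root": (0, REJECT, REJECT)}
print(brute_force_sat(unsat_bp, "root", 1))     # None
```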
The road space rationing (RSR) method regulates the period in which each user group can make telephone calls, in order to decrease the call attempt rate and induce calling parties to shorten their calls during disaster congestion. This paper uses psychological experiments to investigate which settings of this indirect control induce more self-restraint and how the settings change calling parties' behavior. Our experiments revealed that the length of the regulated period affected calling parties' behavior (call duration and call attempt rate) in different ways, and indicated that the 60-min RSR method (i.e., 10 six-min periods) is the most effective setting against disaster congestion.
Kazuho KANAHARA Kengo KATAYAMA Etsuji TOMITA
The Graph Coloring Problem (GCP) is a fundamental combinatorial optimization problem with many practical applications. Degree of SATURation (DSATUR) and Recursive Largest First (RLF) are well known as typical solution construction algorithms for GCP. Both DSATUR and RLF need to update the vertex degrees in the subgraph induced by the uncolored vertices when selecting the vertices to be colored, and the higher the edge density of the given graph, the longer this processing takes. The purposes of this paper are to propose a degree updating method called Adaptive Degree Updating (ADU) that mitigates this issue, and to evaluate the effectiveness of ADU for DSATUR and RLF on DIMACS benchmark graphs as well as on random graphs with a wide range of sizes and densities. Experimental results show that the construction algorithms with ADU are faster than the conventional algorithms for many graphs, with especially significant speed-ups for large graphs with higher edge density.
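A minimal sketch of conventional DSATUR shows where the degree update enters: after each vertex is colored, the degree of each uncolored neighbor within the uncolored subgraph is decremented, so the work per step grows with vertex degree and hence with edge density. This is the conventional eager update, not ADU, whose adaptive scheme is not reproduced here; the adjacency-set representation is an assumption.

```python
def dsatur(adj):
    """Conventional DSATUR. adj maps each vertex to the set of its neighbors;
    returns a dict vertex -> color (0, 1, 2, ...)."""
    color = {}
    neighbor_colors = {v: set() for v in adj}           # saturation sets
    uncolored_degree = {v: len(adj[v]) for v in adj}    # degree in the uncolored subgraph
    while len(color) < len(adj):
        # Highest saturation first; ties broken by degree among uncolored vertices.
        v = max((u for u in adj if u not in color),
                key=lambda u: (len(neighbor_colors[u]), uncolored_degree[u]))
        c = 0
        while c in neighbor_colors[v]:                  # smallest color unused by neighbors
            c += 1
        color[v] = c
        for u in adj[v]:
            neighbor_colors[u].add(c)
            if u not in color:
                uncolored_degree[u] -= 1                # eager degree update after each step
    return color

def cycle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

for name, g in [("C4", cycle(4)), ("C5", cycle(5))]:
    col = dsatur(g)
    assert all(col[u] != col[v] for u in g for v in g[u])   # proper coloring
    print(name, max(col.values()) + 1)                      # C4 2, C5 3
```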
To accommodate increasing amounts of traffic efficiently, elastic optical networks (EONs), which can use optical spectrum resources flexibly, have been studied. Multi-path routing is used when the spectrum cannot be allocated with single-path routing. However, multi-path routing in EONs requires more guard bands than single-path routing to avoid interference between adjacent optical paths. A multi-path routing algorithm with traffic grooming has been proposed, but it assumed a uniform modulation level and therefore did not consider the impact of path length on the resources needed. In this paper, we propose a multi-path routing method with traffic grooming that considers path lengths. Our proposed method establishes an optical multi-path considering path length, fiber utilization, and the use of traffic grooming. Simulations show that the proposed method decreases the call-blocking probability by approximately 24.8% in NSFNET. We also demonstrate the effectiveness of traffic grooming and the improvement in the utilization ratio of optical spectrum resources.
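The two resource effects discussed above, distance-adaptive modulation and per-path guard bands, can be illustrated with simple slot arithmetic. The modulation reach table, the 12.5 Gbaud-per-slot figure, and the single guard slot per path are hypothetical values chosen for illustration, not the paper's parameters.

```python
import math

# Hypothetical distance-adaptive modulation table: (format, bits/symbol, max reach in km).
MODULATIONS = [("16QAM", 4, 500), ("8QAM", 3, 1000), ("QPSK", 2, 2000), ("BPSK", 1, 4000)]
SLOT_GBAUD = 12.5   # assumption: one 12.5 GHz slot carries 12.5 Gbaud
GUARD_SLOTS = 1     # assumption: one guard slot per optical path

def slots_for_path(rate_gbps, length_km):
    """Most efficient format whose reach covers the path, and the slots it needs."""
    for name, bits, reach in MODULATIONS:
        if length_km <= reach:
            return name, math.ceil(rate_gbps / (SLOT_GBAUD * bits))
    raise ValueError("no modulation format reaches this distance")

def total_slots(rates_gbps, lengths_km):
    """Spectrum used by a (multi-)path allocation, including one guard band per path."""
    return sum(slots_for_path(r, l)[1] + GUARD_SLOTS for r, l in zip(rates_gbps, lengths_km))

print(slots_for_path(100, 400), slots_for_path(100, 1500))            # ('16QAM', 2) ('QPSK', 4)
print(total_slots([100], [400]), total_slots([50, 50], [400, 400]))   # 3 4
```

Under these assumptions, a longer path needs twice the slots for the same rate, and splitting a demand over two paths costs an extra guard slot, which is why a path-length-aware choice matters when multi-path routing is unavoidable.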
Suraj Prakash PATTAR Tsubasa HIRAKAWA Takayoshi YAMASHITA Tetsuya SAWANOBORI Hironobu FUJIYOSHI
Predicting the grasping point accurately and quickly is crucial for successful robotic manipulation. However, to deploy a robot commercially, such as a dishwasher robot in a commercial kitchen, we also need to consider the constraint of limited available resources. We present a deep learning method that predicts the grasp position for picking up objects with a single suction gripper. The proposed method is based on a shallow network, which enables lower training costs and efficient inference on limited resources. Costs are further reduced by collecting data in a custom-built synthetic environment. To evaluate the proposed method, we developed a system that models a commercial kitchen in which a dishwasher robot manipulates symmetric objects. We tested our method against a model-fitting method and an algorithm-based method in this environment and found that a shallow network trained only with synthetic data achieves high accuracy. We also demonstrate the practicality of using a shallow network in sequence with an object detector in terms of ease of training, prediction speed, low computation cost, and easier debugging.
This letter studies a biobjective optimization problem in binary associative memories characterized by ternary connection parameters. First, we introduce a condition on the parameters that guarantees the storage of any desired memories and the suppression of oscillatory behavior. Second, we define a biobjective problem based on two objectives that evaluate the uniform stability of the desired memories and the sparsity of the connection parameters. Through precise numerical analysis of typical examples, we clarify the existence of a trade-off between the two objectives.
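The two objectives can be made tangible with a toy network. The sketch assumes a Hopfield-style synchronous sign network whose ternary weights are the sign of a Hebbian sum. It uses the smallest value of m_i (W m)_i as a stability margin (positive means every desired memory is a fixed point) and the fraction of zero weights as sparsity. The construction, the tie-breaking rule, and both metrics are hypothetical stand-ins, not the letter's model or its definition of uniform stability.

```python
import numpy as np

def sign(h):
    # Binary activation; ties (h == 0) are mapped to +1 by assumption.
    return np.where(h >= 0, 1, -1)

def ternary_hebbian(memories):
    """Hypothetical construction: sign of the Hebbian sum, zero self-connections.
    Entries are in {-1, 0, +1}; zeros appear where the stored patterns cancel."""
    S = sum(np.outer(m, m) for m in memories)
    W = np.sign(S).astype(int)
    np.fill_diagonal(W, 0)
    return W

def stability_margin(W, memories):
    # Smallest m_i * (W m)_i over all neurons and memories; > 0 means all memories are fixed points.
    return int(min((m * (W @ m)).min() for m in memories))

def sparsity(W):
    return float(np.mean(W == 0))

m1 = np.array([1, 1, 1, 1, 1, 1, 1, 1])
m2 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
W = ternary_hebbian([m1, m2])
noisy = m1.copy()
noisy[0] = -1                                    # one corrupted bit
print(stability_margin(W, [m1, m2]), sparsity(W), sign(W @ noisy))
```

Zeroing more weights raises sparsity but removes terms from each sum (W m)_i, which tends to shrink the margin; this is the kind of tension the biobjective formulation captures.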
Peng YANG Yu YANG Puning ZHANG Dapeng WU Ruyan WANG
The integration of social networking concepts into the Internet of Things has led to the Social Internet of Things (SIoT) paradigm, and trust evaluation is essential for secure interaction in SIoT. In SIoT, when resource-constrained nodes respond to unexpected malicious services and malicious recommendations, trust assessment is prone to inaccuracy, and the existing architecture risks privacy leakage. This paper proposes an edge-cloud collaborative trust evaluation architecture for SIoT, which exploits the resource advantages of both the cloud and the edge to complete the trust assessment task collaboratively. An algorithm that evaluates the relationship closeness between nodes is designed to assess the reliability of neighbor nodes in SIoT. A trust computing algorithm with enhanced sensitivity is proposed, which considers the fluctuation of trust values and the conflict between trust indicators to improve the sensitivity of identifying malicious behaviors. Simulation results show that, compared with traditional methods, the proposed trust evaluation method effectively improves the interaction success rate and reduces the false detection rate when dealing with malicious services and malicious recommendations.
Yuanwei HOU Yu GU Weiping LI Zhi LIU
Fast-evolving credential attacks have become a great security challenge for current password-based information systems. Recently, biometric factors such as face, iris, or fingerprint, which are difficult to forge, have emerged as key elements for designing passwordless authentication. However, capturing and analyzing such factors usually requires special devices, hindering their feasibility and practicality. To this end, we present WiASK, a device-free WiFi sensing enabled Authentication System exploring Keystroke dynamics. More specifically, WiASK leverages the existing WiFi infrastructure to capture the keystrokes of a user typing a pre-defined, easy-to-remember string. Instead of focusing on the string itself, which is vulnerable to password attacks, WiASK maps the way the string is typed, i.e., the keystroke dynamics, to user identity, based on the biologically validated correlation between them. We prototype WiASK on low-cost off-the-shelf WiFi devices and verify its performance in three real environments. Empirical results show that WiASK achieves on average 93.7% authentication accuracy, a 2.5% false accept rate, and a 5.1% false reject rate.
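The sketch below shows, under stated assumptions, two pieces behind the reported numbers: timing features that describe how a string is typed (hold times and key-to-key flight times), and the accuracy, false accept rate, and false reject rate computed from match scores at a threshold. WiASK infers keystrokes from WiFi signals rather than from key timestamps, so the event format here is a hypothetical input, and the paper's exact feature set and accuracy definition may differ.

```python
def keystroke_features(events):
    """events: list of (key, press_time, release_time) in typing order, in seconds.
    Returns hold times followed by flight (release-to-next-press) times."""
    holds = [release - press for _, press, release in events]
    flights = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return holds + flights

def auth_metrics(genuine_scores, impostor_scores, threshold):
    """Accept an attempt when its score >= threshold; returns (accuracy, FAR, FRR)."""
    false_rejects = sum(s < threshold for s in genuine_scores)
    false_accepts = sum(s >= threshold for s in impostor_scores)
    frr = false_rejects / len(genuine_scores)
    far = false_accepts / len(impostor_scores)
    total = len(genuine_scores) + len(impostor_scores)
    accuracy = (total - false_rejects - false_accepts) / total
    return accuracy, far, frr

print(keystroke_features([("a", 0.00, 0.10), ("b", 0.25, 0.32), ("c", 0.50, 0.58)]))
print(auth_metrics([0.9, 0.8, 0.4], [0.1, 0.6, 0.3], threshold=0.5))
```

Raising the threshold trades a lower false accept rate for a higher false reject rate, so the reported FAR and FRR describe one operating point on that curve.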
This paper proposes a novel interference cancellation technique that prevents the degradation of radio receivers due to periodic interference signals caused by electromagnetic waves emitted from high-power circuits. The proposed technique cancels periodic interference signals in the frequency domain, even if the periodic interference signals drift in the time domain. We propose a drift estimation method based on a super-resolution technique such as ESPRIT. Moreover, we propose a sequential drift estimation to enhance the drift estimation performance. For the interference cancellation, the proposed technique employs a linear filter based on the minimum mean square error criterion, assisted by the estimated drifts. The performance of the proposed technique is confirmed by computer simulation: it achieves a gain of more than 40 dB in the higher-frequency part of the band. However, the proposed canceler achieves such superior performance only provided that the parameter sets are carefully selected. The proposed sequential drift estimation relaxes the parameter constraints and enables the proposed cancellation to achieve the performance upper bound.
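A drifting periodic interferer appears as harmonics whose frequencies move slowly, so a super-resolution frequency estimator is a natural building block for tracking the drift. The sketch implements standard least-squares ESPRIT on one block of samples. The two-harmonic signal model, its normalized frequencies, the noise level, and the window length are assumptions; the paper's sequential estimation across blocks and its MMSE canceler are not reproduced.

```python
import numpy as np

def esprit_frequencies(x, num_tones, window):
    """Estimate normalized frequencies (cycles/sample) of num_tones complex
    exponentials in x with standard (least-squares) ESPRIT."""
    snapshots = np.column_stack([x[i:i + window] for i in range(len(x) - window + 1)])
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]     # sample covariance
    eigvals, eigvecs = np.linalg.eigh(R)
    Us = eigvecs[:, np.argsort(eigvals)[::-1][:num_tones]]       # signal subspace
    # Shift invariance: Us[1:] ~= Us[:-1] @ Phi, and eig(Phi) = exp(j 2 pi f).
    Phi = np.linalg.pinv(Us[:-1]) @ Us[1:]
    return np.sort(np.angle(np.linalg.eigvals(Phi)) / (2 * np.pi))

# Synthetic interference: two harmonics at (hypothetical) normalized frequencies 0.10 and 0.27.
rng = np.random.default_rng(0)
n = np.arange(200)
x = (np.exp(2j * np.pi * 0.10 * n) + 0.5 * np.exp(2j * np.pi * 0.27 * n)
     + 0.01 * (rng.normal(size=200) + 1j * rng.normal(size=200)))
print(esprit_frequencies(x, num_tones=2, window=20))
```

Running the estimator on consecutive blocks and comparing the results would reveal how far the harmonics drift between blocks.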
A new method is proposed for interpreting inversion phenomena in backward transient scattered field components for both E- and H-polarizations when an ultra-wideband (UWB) pulse wave radiated from a line source is incident on a two-dimensional metal cylinder covered with a lossless dielectric layer (coated metal cylinder). A time-domain (TD) asymptotic solution, referred to as the TD saddle point technique (TD-SPT), is derived by applying the SPT to evaluate the backward transient scattered field expressed in integral form. The TD-SPT is represented by a combination of a direct geometric optical ray (DGO) and a reflected GO (RGO) series, so that any backward transient scattered field component can be extracted from a response waveform and calculated. The TD-SPT is useful for understanding the response waveform of a backward transient scattered field by a coated metal cylinder because it gives the peak value and arrival time of each field component, namely the DGO and RGO components, and analytically interprets the inversion phenomenon of any field component. The accuracy, validity, and practicality of the TD-SPT are clarified by comparing it with two kinds of reference solutions.
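Why individual components can appear inverted is easiest to see in a much simpler setting: a plane wave normally incident on a flat, lossless dielectric slab backed by a perfect conductor. In this setting the DGO term is the outer-surface reflection and the k-th RGO term has bounced k times off the metal. The sketch computes each term's amplitude (a negative sign means inverted polarity relative to the incident pulse) and its delay relative to the DGO. This planar, normal-incidence E-field model ignores curvature, line-source spreading, and the E/H-polarization differences that the TD-SPT handles, and the permittivity and thickness are hypothetical.

```python
import math

C0 = 299_792_458.0   # speed of light in vacuum (m/s)

def planar_ray_series(eps_r, thickness_m, num_rgo):
    """(name, amplitude, delay relative to DGO in s) for a normally incident plane
    wave on a lossless dielectric slab of relative permittivity eps_r backed by
    a perfect conductor (a planar stand-in for the coated cylinder)."""
    n = math.sqrt(eps_r)
    gamma_air_to_diel = (1 - n) / (1 + n)    # reflection at the outer surface (DGO)
    gamma_diel_to_air = (n - 1) / (n + 1)    # internal reflection at the outer surface
    t_in, t_out = 2 / (1 + n), 2 * n / (1 + n)
    terms = [("DGO", gamma_air_to_diel, 0.0)]
    for k in range(1, num_rgo + 1):
        # k reflections on the metal (-1 each) and k - 1 internal reflections.
        amp = t_in * t_out * (-1) ** k * gamma_diel_to_air ** (k - 1)
        terms.append((f"RGO{k}", amp, k * 2 * thickness_m * n / C0))
    return terms

for name, amp, delay in planar_ray_series(eps_r=4.0, thickness_m=0.01, num_rgo=3):
    print(f"{name}: amplitude {amp:+.3f}, delay {delay * 1e12:.1f} ps")
```

As a consistency check, the amplitudes of all terms sum to -1, the low-frequency limit in which the thin coating becomes invisible and only the metal's reflection remains.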
Keisuke INAZAWA Akihiro KASHIHARA
Self-review is essential for improving presentations, particularly for novice/unskilled researchers. Typically, they record a video of their presentation and then watch it for self-review. However, they often feel quite uncomfortable with their own appearance and voice in the video, and they also struggle with in-depth self-review. To address these issues, we designed a presentation avatar that reproduces a presentation given by a researcher. The presentation avatar is intended to increase self-awareness through self-review. We also designed a checklist of points to be reviewed to aid in a detailed self-review. This paper also demonstrates presentation avatar systems that use a virtual character and a robot, allowing novice/unskilled researchers as learners to self-review their own presentations using the checklist. The results of case studies with the systems indicate that the presentation avatar systems have the potential to promote self-review. In particular, we found that the robot avatar promoted engagement in self-reviewing presentations.
Gengxin NING Yushen LIN Shenjie JIANG Jun ZHANG
The performance of conventional direction of arrival (DOA) estimation methods is susceptible to the uncertainty of the acoustic velocity in the underwater environment. To solve this problem, this paper proposes an underwater DOA estimation method for wideband signals under unknown acoustic velocity using an L-shaped array. The proposed method draws on the incoherent signal subspace method and Root-MUSIC to obtain two sets of average roots, one for each subarray of the L-shaped array. The geometric relationship between the two perpendicular linear arrays is then employed to derive an expression for the DOA estimate in terms of the two average roots, from which the acoustic velocity variable is eliminated. The simulation results demonstrate that the proposed method is more accurate and robust than other methods in an environment with unknown acoustic velocity.
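The elimination of the acoustic velocity can be seen in an idealized, noise-free, single-frequency case. The root phases of the two perpendicular subarrays are proportional to cos θ and sin θ with the same unknown factor k = 2πfd/c, so their ratio gives the angle without c. The sketch generates ideal roots directly instead of estimating them with the incoherent signal subspace method and Root-MUSIC. It assumes a source in the array plane with θ measured from the x axis and element spacing no larger than half a wavelength (so the phases do not wrap); frequency, spacing, and angle are hypothetical.

```python
import numpy as np

def phase_roots(theta, freq_hz, spacing_m, c):
    """Ideal single-source roots of the x- and y-axis subarrays (2-D far field,
    source in the array plane, theta measured from the x axis)."""
    k = 2 * np.pi * freq_hz * spacing_m / c
    return np.exp(-1j * k * np.cos(theta)), np.exp(-1j * k * np.sin(theta))

def doa_from_roots(z_x, z_y):
    # arg(z_x) = -k cos(theta) and arg(z_y) = -k sin(theta); the unknown k,
    # which contains the sound speed c, cancels in the two-argument arctangent.
    return np.arctan2(-np.angle(z_y), -np.angle(z_x))

theta = np.deg2rad(35.0)
for c in (1450.0, 1500.0, 1550.0):                   # c is never passed to doa_from_roots
    z_x, z_y = phase_roots(theta, freq_hz=1000.0, spacing_m=0.5, c=c)
    print(c, np.rad2deg(doa_from_roots(z_x, z_y)))   # ~35 degrees for every c
```

The same ratio is also independent of frequency, which is consistent with combining roots across the frequency bins of a wideband signal.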