Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
Yinan YANG
Myung-Hyun KIM Seungkwang LEE
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI
Martin LUKAC Saadat NURSULTAN Georgiy KRYLOV Oliver KESZOCZE Abilmansur RAKHMETTULAYEV Michitaka KAMEYAMA
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG
Liu ZHANG Zilong WANG Yindong CHEN
Wenxia BAO An LIN Hua HUANG Xianjun YANG Hemu CHEN
Fengshan ZHAO Qin LIU Takeshi IKENAGA
Haruhiko KAIYA Shinpei OGATA Shinpei HAYASHI
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Na XING Lu LI Ye ZHANG Shiyi YANG
Zhe WANG Zhe-Ming LU Hao LUO Yang-Ming ZHENG
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Hongzhi XU Binlian ZHANG
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA
Rinka KAWANO Masaki KAWAMURA
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG
Peng WANG Guifen CHEN Zhiyao SUN
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO
Ji WU Ruoxi YU Kazuteru NAMBA
Hao WANG Yao MA Jianyong DUAN Li HE Xin LI
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU
Arata KANEKO Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO
Qi LIU Bo WANG Shihan TAN Shurong ZOU Wenyi GE
HanYu ZHANG Tomoji KISHI
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
Yoon Hak KIM
Takashi HIRAYAMA Rin SUZUKI Katsuhisa YAMANAKA Yasuaki NISHITANI
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA
Batnasan LUVAANJALBA Elaine Yi-Ling WU
KuanChao CHU Satoshi YAMAZAKI Hideki NAKAYAMA
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA
Yasushi YUMINAKA Kazuharu NAKAJIMA Yosuke IIJIMA
Chunbo LIU Liyin WANG Zhikai ZHANG Chunmiao XIANG Zhaojun GU Zhi WANG Shuang WANG
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG
Yuhao LIU Zhenzhong CHU Lifei WEI
Ken ASANO Masanori NATSUI Takahiro HANYU
Shuto HASEGAWA Koichiro ENOMOTO Taeko MIZUTANI Yuri OKANO Takenori TANAKA Osamu SAKAI
Zhewei XU Mizuho IWAIHARA
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI
Taisei SAITO Kota ANDO Tetsuya ASAI
Shiyu YANG Tetsuya KANDA Daniel M. GERMAN Yoshiki HIGO
Tsutomu SASAO
Jiyeon LEE
Koichi MORIYAMA Akira OTSUKA
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO
Gao WANG Gaoli WANG Siwei SUN
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG
Tomoyasu NAKANO Masataka GOTO
Hyebong CHOI Joel SHIN Jeongho KIM Samuel YOON Hyeonmin PARK Hyejin CHO Jiyoung JUNG
Xianglong LI Yuan LI Jieyuan ZHANG Xinhai XU Donghong LIU
Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA
Chang SUN Yitong LIU Hongwen YANG
Ji XI Yue XIE Pengxu JIANG Wei JIANG
Ming PAN
Takuma KINUGAWA Toshimitsu USHIO
In spatially distributed systems such as smart buildings and intelligent transportation systems, the control of spatio-temporal patterns is an important issue. In this paper, we consider a finite-horizon optimal spatio-temporal pattern control problem in which the pattern is specified by a signal spatio-temporal logic formula over finite traces, called an SSTLf formula. We give the syntax and Boolean semantics of SSTLf. We then show linear encodings of the temporal and spatial operators used in SSTLf and convert the problem into a mixed-integer programming problem. We illustrate the effectiveness of the proposed approach through an example of a heating system in a room.
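The encoding itself is solver-facing, but the finite-trace Boolean semantics it linearizes can be sketched directly. Below is a minimal Python illustration (our own sketch, not the paper's code) of bounded "eventually" and "always" operators over a finite trace; the temperature trace and the comfort predicate are invented for the example.

```python
def eventually(trace, interval, pred):
    """F_[a,b] pred: pred holds at some step t in [a, b] of the finite trace."""
    a, b = interval
    return any(pred(trace[t]) for t in range(a, min(b, len(trace) - 1) + 1))

def always(trace, interval, pred):
    """G_[a,b] pred: pred holds at every step t in [a, b] of the finite trace."""
    a, b = interval
    return all(pred(trace[t]) for t in range(a, min(b, len(trace) - 1) + 1))

# Toy example: room temperatures over a finite horizon.
temps = [18.0, 19.5, 21.0, 22.5, 22.0, 21.5]
comfortable = lambda x: 20.0 <= x <= 23.0
print(eventually(temps, (0, 5), comfortable))  # True: comfort reached by step 2
print(always(temps, (2, 5), comfortable))      # True: comfort then maintained
```

In the paper's setting, each `pred(trace[t])` becomes a binary variable tied to the signal by linear (big-M style) constraints, and the `any`/`all` above become linear inequalities over those binaries.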
Takashi TOMITA Shigeki HAGIHARA Masaya SHIMAKAWA Naoki YONEZAKI
This paper focuses on verification of reactive system specifications. A reactive system is an open system that continuously interacts with an uncontrollable external environment, and it must often be highly safe and reliable. However, realizability checking for a given specification is very costly, so effective methods are needed to detect and analyze defects in unrealizable specifications in order to refine them efficiently. We introduce a systematic characterization of necessary conditions for realizability. The characterization is based on quantifications over inputs and outputs in early and late behaviors, and it reveals four essential aspects of realizability: exhaustivity, strategizability, preservability, and stability. Additionally, the characterization yields new necessary conditions, which enable us to classify unrealizable specifications systematically and hierarchically.
Kohei TATEISHI Chihiro TSUTAKE Keita TAKAHASHI Toshiaki FUJII
A light field (LF), which is represented as a set of dense multi-view images, has been used in various 3D applications. To make LF acquisition more efficient, researchers have investigated compressive sensing methods that incorporate coding functionality into a camera. In this paper, we focus on a challenging case called snapshot compressive LF imaging, in which an entire LF is reconstructed from only a single acquired image. To embed a large amount of LF information in a single image, we consider two promising methods based on rapid optical control during a single exposure, proposed individually in previous works: time-multiplexed coded aperture (TMCA) and coded focal stack (CFS). TMCA and CFS can be interpreted in a unified manner as extensions of the coded aperture (CA) and focal stack (FS) methods, respectively. By developing a unified algorithm pipeline for TMCA and CFS based on deep neural networks, we evaluated their performance against other possible imaging methods. We found that both TMCA and CFS achieve better reconstruction quality than the other snapshot methods, and they also perform reasonably well compared to methods using multiple acquired images. To our knowledge, we are the first to present an overall discussion of TMCA and CFS and to compare and validate their effectiveness in the context of compressive LF imaging.
Yoshitaka KIDANI Haruhisa KATO Kei KAWAMURA Hiroshi WATANABE
Geometric partitioning mode (GPM) is a new inter prediction tool adopted in Versatile Video Coding (VVC), the latest international video coding standard, developed by the Joint Video Experts Team in 2020. Unlike regular inter prediction, which is performed on rectangular blocks, GPM separates a coding block into two regions along one of 64 pre-defined straight lines, generates inter-predicted samples for each region, and then blends them to obtain the final inter-predicted samples. With this feature, GPM improves prediction accuracy at the boundary between foreground and background regions with different motions. However, GPM leaves room to improve prediction accuracy further if the final predicted samples can be generated using not only inter prediction but also intra prediction. In this paper, we propose a GPM with combined inter and intra prediction to achieve compression capability beyond VVC. To maximize the coding performance of the proposed method, we also propose restricting the number of applicable intra prediction modes and prohibiting the application of intra prediction to both GPM-separated regions. The experimental results show that the proposed method improves the coding gain of the conventional GPM method in VVC by a factor of 1.3, and provides an additional coding gain of 1% bitrate savings in one of the coding structures for low-latency video transmission, where the conventional GPM method cannot be utilized.
Seung-Tak NOH Hiroki HARADA Xi YANG Tsukasa FUKUSATO Takeo IGARASHI
It is important to consider curvature properties around the control points to produce natural-looking results in vector illustration. C2 interpolating splines satisfy point interpolation with local support. Unfortunately, they cannot control the sharpness of a segment because they use a trigonometric blending function that has no degrees of freedom. In this paper, we alter the definition of C2 interpolating splines in both the interpolation curve and the blending function. For the interpolation curve, we adopt a rational Bézier curve that enables the user to tune the shape of the curve around a control point. For the blending function, we generalize the weighting scheme of C2 interpolating splines and replace the trigonometric weight with our novel hyperbolic blending function. By extending this basic definition, we can also handle exact non-C2 features, such as cusps and fillets, without losing generality. In our experiments, we provide both quantitative and qualitative comparisons to existing parametric curve models and discuss the differences among them.
Wenhao HUANG Akira TSUGE Yin CHEN Tadashi OKOSHI Jin NAKAZAWA
The crowdedness of buses plays an increasingly important role in the control of diseases such as COVID-19, yet the lack of a practical approach to sensing bus crowdedness remains a major problem. This paper proposes a bus crowdedness sensing system that exploits deep learning-based object detection to count the numbers of passengers getting on and off a bus and thus estimate the crowdedness of buses in real time. In our prototype system, we combine the YOLOv5s object detection model with a Kalman filter object tracking algorithm to implement a sensing algorithm running on a Jetson Nano-based vehicular device mounted on a bus. Using driving recorder video data taken from a real bus, we experimentally evaluate the performance of the proposed sensing system and verify that it improves counting accuracy and achieves real-time processing on the Jetson Nano platform.
Kotaro MATSUURA Chihiro TSUTAKE Keita TAKAHASHI Toshiaki FUJII
Inspired by the framework of algorithm unrolling, we propose a scalable network architecture that computes layer patterns for light field displays, enabling control of the trade-off between the display quality and the computational cost on a single pre-trained network.
Ana GUASQUE Patricia BALBASTRE
In order to obtain a feasible schedule for a hard real-time system, heuristic-based techniques have been the solution of choice. In recent years, optimization solvers have gained attention from research communities due to their capability of handling large numbers of constraints, and some works have used integer linear programming (ILP) to solve monoprocessor scheduling of real-time systems; indeed, ILP is commonly used for static scheduling of multiprocessor systems. However, the two main solvers are used interchangeably, with no clear answer as to which one is best for obtaining a schedulable system under hard real-time constraints. This paper compares two well-known optimization software packages (CPLEX and GUROBI) on the problem of finding a feasible schedule for monoprocessor hard real-time systems.
Kenya TAJIMA Takahiko HENMI Tsuyoshi KATO
Domain knowledge is useful for improving the generalization performance of learning machines, and sign constraints are a handy representation for combining domain knowledge with a learning machine. In this paper, we consider constraining the signs of the weight coefficients when learning a linear support vector machine, and we develop an optimization algorithm for minimizing the empirical risk under the sign constraints. The algorithm is based on the Frank-Wolfe method, which converges sublinearly and possesses a clear termination criterion. We show that each Frank-Wolfe iteration requires O(nd + d^2) computational cost. Furthermore, we derive an explicit expression for the minimal number of iterations needed to ensure an ε-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that sign constraints are a promising technique when similarities to the training examples compose the feature vector.
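The Frank-Wolfe iteration under sign constraints can be sketched in a few lines. The following is our own illustration, not the paper's algorithm: we pair the non-negativity (sign) constraint with an invented upper bound B to obtain the compact domain Frank-Wolfe requires, and we minimize a plain least-squares objective in place of the paper's SVM empirical risk.

```python
import numpy as np

def frank_wolfe_sign_constrained(X, y, B=3.0, n_iter=500):
    """Minimize 0.5*||Xw - y||^2 over the box 0 <= w_i <= B.
    The box combines the non-negativity constraint with an upper bound B,
    giving the compact feasible set Frank-Wolfe needs."""
    n, d = X.shape
    w = np.zeros(d)                      # feasible starting vertex
    for k in range(n_iter):
        grad = X.T @ (X @ w - y)         # gradient: O(nd) per iteration
        s = np.where(grad < 0, B, 0.0)   # linear oracle: best box vertex
        gamma = 2.0 / (k + 2)            # standard step size, O(1/k) rate
        w = (1 - gamma) * w + gamma * s  # convex combination stays feasible
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, 0.0, 2.0])       # non-negative ground truth
y = X @ w_true
w = frank_wolfe_sign_constrained(X, y)
print(np.all(w >= 0))  # True: iterates never leave the constraint set
```

Because every iterate is a convex combination of box vertices, feasibility holds by construction, with no projection step needed; this is the practical appeal of Frank-Wolfe under such constraints.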
Jinyan LU Quanzhen HUANG Shoubing LIU
For intelligent vision measurement, geometric image feature extraction is an essential issue. A contour primitive of interest (CPI) is a regular-shaped contour feature lying on a target object, widely used for geometric calculation in vision measurement and servoing. So that the CPI extraction model can be flexibly applied to novel objects, one-shot learning-based CPI extraction can be implemented with a deep convolutional neural network, using only one annotated support image to guide the CPI extraction process. In this paper, we first propose a multi-stage contour primitives of interest extraction network (MS-CPieNet), which uses a multi-stage strategy to improve discrimination between CPIs and complex background. Second, a spatial non-local attention module is utilized to enhance the deep features by globally fusing image features over both short and long ranges. Moreover, a dense 4-direction classification is designed to obtain the normal direction of the contour, which can be further used in a contour-thinning post-process. The effectiveness of the proposed methods is validated by experiments on the OCP and ROCM datasets. 2-D measurement experiments are also conducted to demonstrate the convenient application of the proposed MS-CPieNet.
Quan XIU HO Takao JINNO Yusuke UCHIMI Shigeru KURIYAMA
The colors of objects in natural images are affected by the color of the lighting, and accurately estimating an illuminant's color is indispensable in analyzing scenes lit by colored lighting. Recent lighting environments have become more colorful with the spread of light-emitting diode (LED) lighting, whose color can be flexibly controlled across the full visible spectrum. However, existing color estimation methods mainly focus on a single illuminant within normal color ranges; the estimation of multiple illuminants with unusual color settings, such as high-chroma blue or red, has not yet been studied. Therefore, new color estimation methods are needed for multiple illuminants of various colors. In this article, we propose a color estimation method for LED lighting using Color Line features, which regard the color distribution in a local area as a straight line. This local estimate is suitable for estimating the various colors of multiple illuminants. The features are sampled at many small regions in an image and aggregated to estimate a few global colors using supervised learning with a convolutional neural network. We demonstrate the higher accuracy of our method over existing ones for such colorful lighting environments by producing an image dataset lit by multiple LED lights in a full color range.
In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Because DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies showed that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output the true labels for noisily labeled samples in the early stage of learning, and the number of false predictions is higher for noisily labeled samples than for cleanly labeled ones. Based on these observations, we propose a new sample selection approach for LNL that uses the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. It then iteratively alternates between sample selection and training a DNN model on the updated dataset. Since the model is trained with more clean samples and records more accurate false predictions for sample selection, its generalization performance gradually increases. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and the obtained results are better than or comparable to state-of-the-art approaches.
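The bookkeeping behind this kind of selection is simple to sketch. The following minimal Python fragment is our own reading of the idea, not the paper's implementation; the class name, `keep_ratio`, and the toy labels are all invented for illustration.

```python
# Sketch of sample selection by false-prediction counts (illustrative only).
from collections import defaultdict

class FalsePredictionSelector:
    def __init__(self):
        self.false_counts = defaultdict(int)

    def record(self, sample_id, predicted_label, given_label):
        """Call once per sample per epoch: count disagreements between the
        model's prediction and the (possibly noisy) given label."""
        if predicted_label != given_label:
            self.false_counts[sample_id] += 1

    def select(self, sample_ids, keep_ratio=0.7):
        """Keep the fraction of samples with the fewest false predictions;
        these are the samples the network fits early, i.e. likely clean."""
        ranked = sorted(sample_ids, key=lambda i: self.false_counts[i])
        return ranked[:int(len(ranked) * keep_ratio)]

# Toy run: sample 2's given label disagrees with the model's output in
# every epoch, so it is ranked last and dropped from the training set.
sel = FalsePredictionSelector()
for epoch in range(3):
    sel.record(0, "cat", "cat")
    sel.record(1, "dog", "dog")
    sel.record(2, "cat", "dog")   # noisy label: model predicts "cat"
print(sel.select([0, 1, 2], keep_ratio=0.7))  # [0, 1]
```

In the full method this selection would be re-run periodically, with training continuing on the reduced, cleaner subset.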
Nenghuan ZHANG Yongbin WANG Xiaoguang WANG Peng YU
Recently, multi-modal fusion methods based on remote sensing data and social sensing data have been widely used in urban region function recognition. However, due to the highly complex noise problem, most existing methods are not robust enough in real-world scenes, which seriously limits their value in urban planning and management. In addition, how to extract valuable periodic features from social sensing data still needs further study. To this end, we propose a multi-modal fusion network guided by feature co-occurrence for urban region function recognition, which leverages the co-occurrence relationships between multi-modal features to identify abnormal noise features, guiding the fusion network to suppress noise features and focus on clean ones. Furthermore, we employ a graph convolutional network incorporating a node weighting layer and an interactive update layer to effectively extract valuable periodic features from social sensing data. Finally, experimental results on publicly available datasets indicate that our proposed method yields promising improvements in both accuracy and robustness over several state-of-the-art methods.
Manaya TOMIOKA Tsuneo KATO Akihiro TAMURA
A neural conversational model (NCM) based on an encoder-decoder recurrent neural network (RNN) with an attention mechanism learns different sequence-to-sequence mappings from those learned by neural machine translation (NMT), even when based on the same technique. In the NCM, we confirmed that the target-word-to-source-word mappings captured by the attention mechanism are not as clear and stationary as those for NMT. Considering that vector norms indicate the magnitude of information in the processing, we analyzed the inner workings of an encoder-decoder GRU-based NCM, focusing on the norms of the word embedding vectors and hidden vectors. First, we conducted correlation analyses of the norms of the word embedding vectors with their frequencies in the training set and with the conditional entropies of a bi-gram language model, to understand what correlates with the norms in the encoder and decoder. Second, we conducted correlation analyses of the norms of change in the hidden vector of the recurrent layer with its input vectors, for the encoder and decoder respectively. These analyses were done to understand how the magnitude of information propagates through the network. The analytical results suggested that the norms of the word embedding vectors are associated with their semantic information in the encoder, while in the decoder they are associated with predictability as a language model. The analytical results further revealed how the norms propagate through the recurrent layer in the encoder and decoder.
Stance prediction on social media aims to infer users' stances towards a specific topic or event when they are not expressed explicitly. Extracting and determining users' stances from user-generated content on social media is of great significance for public opinion analysis. Existing research makes use of various signals, ranging from text content to users' online network connections on these platforms, but it lacks joint modeling of this heterogeneous information for stance prediction. In this paper, we propose a self-supervised heterogeneous graph contrastive learning framework for stance prediction in online debate forums. First, we perform data augmentation on the original heterogeneous information network to generate an augmented view. The original and augmented views are each encoded by a meta-path-based graph encoder. Then, contrastive learning between the two views is conducted to obtain high-quality representations of users and issues. Finally, stance prediction is accomplished by matrix factorization between users and issues. Experimental results on an online debate forum dataset show that our model significantly outperforms other competitive baseline methods.
In this paper, we propose a scheme to strengthen network-based moving target defense with disposable identifiers. The main idea is to change the disposable identifiers for each packet, maximizing unpredictability through a large hopping space and a substantially high hopping frequency. This allows network-based moving target defense to defeat active scanning, passive scanning, and passive host profiling attacks. Experimental results show that the proposed scheme changes the disposable identifiers for each packet while incurring low overhead.
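One standard way to realize per-packet identifiers that are unpredictable to an observer yet reproducible by both endpoints is to derive them from a shared key and a packet counter with a keyed hash. The sketch below is illustrative only and is not the scheme's actual construction; the key, counter width, and identifier length are invented for the example.

```python
# Illustrative per-packet disposable identifiers via HMAC (not the
# paper's construction): endpoints sharing a key can each derive the
# identifier for packet n, while an observer sees a fresh,
# unpredictable value on every packet.
import hmac
import hashlib

def disposable_id(shared_key: bytes, packet_counter: int, nbytes: int = 8) -> bytes:
    """Derive the identifier for the packet with the given counter."""
    msg = packet_counter.to_bytes(8, "big")
    return hmac.new(shared_key, msg, hashlib.sha256).digest()[:nbytes]

key = b"pre-shared secret"
ids = [disposable_id(key, n) for n in range(4)]
print(len(set(ids)))  # 4 distinct identifiers, one per packet
```

The hopping space here is 2^64 (8 bytes) and the hopping frequency is once per packet, matching the large-space, high-frequency design goal described above.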
Yang WANG Hongliang FU Huawei TAO Jing YANG Hongyi GE Yue XIE
This letter focuses on the cross-corpus speech emotion recognition (SER) task, in which the training and testing speech signals belong to different speech corpora. Existing algorithms are incapable of effectively extracting common sentiment information between different corpora to facilitate knowledge transfer. To address this challenging problem, a novel convolutional auto-encoder and adversarial domain adaptation (CAEADA) framework for cross-corpus SER is proposed. The framework first constructs a one-dimensional convolutional auto-encoder (1D-CAE) for feature processing, which can explore the correlation among adjacent one-dimensional statistical features, and the feature representation can be enhanced by its encoder-decoder-style architecture. Subsequently, the adversarial domain adaptation (ADA) module alleviates the feature distribution discrepancy between the source and target domains by confusing the domain discriminator, and specifically employs maximum mean discrepancy (MMD) to better accomplish the feature transformation. To evaluate the proposed CAEADA, extensive experiments were conducted on the EmoDB, eNTERFACE, and CASIA speech corpora, and the results show that the proposed method outperforms other approaches.
Joanna Kazzandra DUMAGPI Yong-Jin JEONG
Fine-grained image analysis, such as pixel-level approaches, improves threat detection in x-ray security images. In the practical setting, the cost of obtaining complete pixel-level annotations increases significantly, which can be reduced by partially labeling the dataset. However, handling partially labeled datasets can lead to training complicated multi-stage networks. In this paper, we propose a new end-to-end object separation framework that trains a single network on a partially labeled dataset while also alleviating the inherent class imbalance at the data and object proposal level. Empirical results demonstrate significant improvement over existing approaches.
The purpose of graph embedding is to learn a lower-dimensional embedding function for graph data. Existing methods usually rely on maximum likelihood estimation (MLE) and often learn an embedding function through conditional mean estimation (CME). However, MLE is well known to be vulnerable to contamination by outliers, and CME might restrict the applicability of graph embedding methods to a limited range of graph data. To cope with these problems, this paper proposes a novel method for graph embedding called robust ratio graph embedding (RRGE). RRGE is based on the ratio estimation between the conditional and marginal probability distributions of link weights given data vectors, and is applicable to a wider range of graph data than CME-based methods. Moreover, to achieve outlier-robust estimation, the ratio is estimated with the γ-cross entropy, a robust alternative to the standard cross entropy. Numerical experiments on artificial data show that RRGE is robust against outliers and performs well even when CME-based methods do not work at all. Finally, the performance of the proposed method is demonstrated on real-world datasets using neural networks.
Privacy violations via spy cameras are becoming increasingly serious. With the recent advent of various smart home IoT devices, such as smart TVs and robot vacuum cleaners, spycam attacks that steal users' information are being carried out in ever more unpredictable ways. In this letter, we introduce a new spycam attack on the mobile WebVR environment. It is performed by a web attacker who maliciously accesses the back-facing cameras of victims' mobile devices while they are browsing the attacker's WebVR site. Through sophisticated content placement in VR scenes, the attacker can capture victims' surroundings at the desired field of view, resulting in serious privacy breaches for mobile VR users. We show that this new threat practically works with major browsers in a stealthy manner.
Zhi LIU Jia CAO Xiaohan GUAN Mengmeng ZHANG
Inter-channel correlation is one of the redundancies that needs to be eliminated in video coding. In the latest video coding standard, H.266/VVC, the DM (Direct Mode) and CCLM (Cross-Component Linear Model) modes have been introduced to reduce the similarity between luma and chroma, but inter-channel correlation is still observed. In this paper, a new inter-channel prediction algorithm is proposed which utilizes a coloring principle to predict chroma pixels. From the coloring perspective, for most natural content video frames, the three components Y, U, and V demonstrate similar coloring patterns; therefore, the U and V components can be predicted using the coloring pattern of the Y component. In the proposed algorithm, correlation coefficients describing the coloring relationship between the current pixel and reference pixels are obtained in a lightweight way in the Y component and then used to predict the chroma pixels. The optimal position for the reference samples is also designed, and based on the selected position, two new chroma prediction modes are defined. Experimental results show that, compared with VTM 12.1, the proposed algorithm achieves average BD-rate improvements of -0.92% and -0.96% for the U and V components under the All Intra (AI) configuration, while the increases in encoding and decoding time are negligible.
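The coloring idea can be illustrated on a few pixels. The sketch below is a toy illustration only, not the VTM-integrated algorithm: weights derived from luma (Y) similarity between the current pixel and reference pixels are reused to predict the chroma (U) value, and the weighting formula and sample values are invented for the example.

```python
def predict_chroma(y_cur, y_refs, u_refs, sigma=4.0):
    """Predict a chroma sample from reference chroma samples, weighted by
    how similar each reference's luma is to the current pixel's luma."""
    # Lightweight "coloring" weights: closer luma -> larger weight.
    weights = [1.0 / (1.0 + abs(y_cur - yr) / sigma) for yr in y_refs]
    total = sum(weights)
    return sum(w * u for w, u in zip(weights, u_refs)) / total

# References whose luma is close to the current pixel dominate the
# prediction; the dissimilar third reference contributes little.
y_refs = [100, 102, 180]
u_refs = [60, 62, 120]
u_pred = predict_chroma(101, y_refs, u_refs)
print(55 < u_pred < 75)  # True: prediction tracks the similar references
```

In the actual codec setting, the reference samples would come from the already-reconstructed neighborhood at the designed positions, and the same Y-derived coefficients would be applied to both the U and V components.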
Zhi LIU Fangyuan ZHAO Mengmeng ZHANG
In the video-text retrieval task, the mainstream framework consists of three parts: a video encoder, a text encoder, and similarity calculation. MMT (Multi-modal Transformer) achieves remarkable performance on this task; however, it suffers from insufficient training data. In this paper, an efficient multi-modal aggregation network for video-text retrieval is proposed. Different from prior work using MMT to fuse video features, NetVLAD is introduced in the proposed network; it has fewer parameters and is feasible to train with small datasets. In addition, since CLIP (Contrastive Language-Image Pre-training) can be considered to learn language models from visual supervision, it is introduced as the text encoder in the proposed network to avoid overfitting. Meanwhile, in order to make full use of the pre-trained model, a two-step training scheme is designed. Experiments show that the proposed model achieves competitive results compared with the latest work.
Koki TSUBOTA Hiroaki AKUTSU Kiyoharu AIZAWA
Image quality assessment (IQA) is a fundamental metric for image processing tasks (e.g., compression). Among full-reference IQAs, traditional metrics such as PSNR and SSIM have long been used; recently, IQAs based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have also been adopted. It is known that image scaling is inconsistent among deep IQAs: some perform down-scaling as pre-processing, whereas others use the original image size. In this paper, we show that image scale is an influential factor that affects deep IQA performance. We comprehensively evaluate four deep IQAs on the same five datasets, and the experimental results show that image scale significantly influences IQA performance. We found that the most appropriate image scale is often neither the default nor the original size, and that the choice differs depending on the methods and datasets used. We also visualized the stability of the metrics and found that PieAPP is the most stable among the four deep IQAs.