Ryuta SAKAMOTO Takahiro SHOBUDANI Ryosuke HOTCHI Ryogo KUBO
In video distribution services such as video streaming, the providers must satisfy the various quality demands of the users. One of the human-centric indexes used to assess video quality is the quality of experience (QoE). In video streaming, the video bitrate, video freezing time, and video bitrate switching are significant determiners of QoE. To provide high-quality video streaming services, adaptive streaming using the Moving Picture Experts Group dynamic adaptive streaming over Hypertext Transfer Protocol (MPEG-DASH) is widely utilized. One of the conventional bitrate selection methods for MPEG-DASH selects the bitrate such that the amount of buffered data in the playback buffer, i.e., the playback buffer level, can be maintained at a constant value. This method can avoid buffer overflow and video freezing based on feedback control; however, this method induces high-frequency video bitrate switching, which can degrade QoE. To overcome this issue, this paper proposes a bitrate selection method in an adaptive video steaming for MPEG-DASH to improve the QoE by minimizing the bitrate fluctuation. To this end, the proposed method does not change the bitrate if the playback buffer level is not around its upper or lower limit, corresponding to the full or empty state of the playback buffer, respectively. In particular, to avoid buffer overflow and video freezing, the proposed method selects the bitrate based on proportional-derivative (PD) control to maintain the playback buffer level at a target level, which corresponds to an upper or lower threshold of the playback buffer level. Simulations confirm that, the proposed method offers better QoE than the conventional method for users with various preferences.
Satoshi DENNO Ryoko SASAKI Yafei HOU
This paper proposes non-orthogonal packet access based on low density signature with phase only adaptive precoding. The proposed access allows multiple user terminals to send their packets simultaneously for implementing massive connectivity, though only one antenna is put on every terminal and on an access point. This paper proposes a criterion that defines the optimum rotation angles for the phase only precoding, and an algorithm based on the steepest descent to approach the optimum rotation angles. Moreover, this paper proposes two complexity-reduced algorithms that converge much faster than the original algorithm. When 6 packets are transmitted in 4 time slots, i.e., overloading ratio of 1.5, the proposed adaptive precoding based on all the proposed algorithms attains a gain of about 4dB at the BER of 10-4 in Rician fading channels.
Sourav MISHRA Subhajit CHAUDHURY Hideaki IMAIZUMI Toshihiko YAMASAKI
Our paper attempts to critically assess the robustness of deep learning methods in dermatological evaluation. Although deep learning is being increasingly sought as a means to improve dermatological diagnostics, the performance of models and methods have been rarely investigated beyond studies done under ideal settings. We aim to look beyond results obtained on curated and ideal data corpus, by investigating resilience and performance on user-submitted data. Assessing via few imitated conditions, we have found the overall accuracy to drop and individual predictions change significantly in many cases despite of robust training.
AI (artificial intelligence) has grown at an overwhelming speed for the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in the state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing the performance with other GPU (graphics processing unit) accelerated systems in different platforms, we assess the computational capability of the modern edge devices featuring a significant amount of hardware parallelism.
Kiyoshi KURIHARA Nobumasa SEIYAMA Tadashi KUMANO
This paper describes a method to control prosodic features using phonetic and prosodic symbols as input of attention-based sequence-to-sequence (seq2seq) acoustic modeling (AM) for neural text-to-speech (TTS). The method involves inserting a sequence of prosodic symbols between phonetic symbols that are then used to reproduce prosodic acoustic features, i.e. accents, pauses, accent breaks, and sentence endings, in several seq2seq AM methods. The proposed phonetic and prosodic labels have simple descriptions and a low production cost. By contrast, the labels of conventional statistical parametric speech synthesis methods are complicated, and the cost of time alignments such as aligning the boundaries of phonemes is high. The proposed method does not need the boundary positions of phonemes. We propose an automatic conversion method for conventional labels and show how to automatically reproduce pitch accents and phonemes. The results of objective and subjective evaluations show the effectiveness of our method.
Go ISHII Takaaki HASEGAWA Daichi CHONO
In this paper, we build a microscopic simulator of traffic flow in a three-modal transport society for pedestrians/slow vehicles/vehicles (P/SV/V) to evaluate a post P/V society. The simulator assumes that the SV includes bicycles and micro electric vehicles, whose speed is strictly and mechanically limited up to 30 km/h. In addition, this simulator adopts an SV overtaking model. Modal shifts caused by modal diversity requires new valuation indexes. The simulator has a significant feature of a traveler-based traffic demand simulation not a vehicle-based traffic demand simulation as well as new evaluation indexes. New assessment taking this situation into account is conducted and the results explain new aspects of traffic flow in a three-mode transport society.
Zhenhui XU Tielong SHEN Daizhan CHENG
This paper studies the infinite time horizon optimal control problem for continuous-time nonlinear systems. A completely model-free approximate optimal control design method is proposed, which only makes use of the real-time measured data from trajectories instead of a dynamical model of the system. This approach is based on the actor-critic structure, where the weights of the critic neural network and the actor neural network are updated sequentially by the method of weighted residuals. It should be noted that an external input is introduced to replace the input-to-state dynamics to improve the control policy. Moreover, strict proof of convergence to the optimal solution along with the stability of the closed-loop system is given. Finally, a numerical example is given to show the efficiency of the method.
Hengyong XIANG Li ZHOU Xiaohui BA Jie CHEN
The traditional RANSAC samples uniformly in the dataset which is not efficient in the task with rich prior information. This letter proposes GUISAC (Guided Sample Consensus), which samples with the guidance of various prior information. In image matching, GUISAC extracts seed points sets evenly on images based on various prior factors at first, then it incorporates seed points sets into the sampling subset with a growth function, and a new termination criterion is used to decide whether the current best hypothesis is good enough. Finally, experimental results show that the new method GUISAC has a great advantage in time-consuming than other similar RANSAC methods, and without loss of accuracy.
Ryo HASE Mitsue IMAHORI Norihiko SHINOMIYA
The relationships between producers and consumers have changed radically by the recent growth of sharing economy. Promoting resource sharing can contribute to finding a solution to environmental issues (e.g. reducing food waste, consuming surplus electricity, and so on). Although prosumers have both roles as consumers and suppliers, matching between suppliers and consumers should be determined when the prosumers share resources. Especially, it is important to achieve envy-freeness that is a metric indicating how the number of prosumers feeling unfairness is kept small since the capacity of prosumers to supply resources is limited. Changing resource capacity and demand will make the situation more complex. This paper proposes a resource sharing model based on a temporal network and flows to realize envy-free resource sharing among prosumers. Experimental results demonstrate the deviation of envy among prosumers can be reduced by setting appropriate weights in a flow network.
Kosuke SHIMA Kazuki MARUTA Chang-Jun AHN
This paper proposes a novel weight derivation method to improve adaptive array interference suppression performance based on our previously conceived sample matrix inversion algorithm using common correlation matrix (CCM-SMI), by data-aided approach. In recent broadband wireless communication system such as orthogonal frequency division multiplexing (OFDM) which possesses lots of subcarriers, the computation complexity is serious problem when using SMI algorithm to suppress unknown interference. To resolve this problem, CCM based SMI algorithm was previously proposed. It computes the correlation matrix by the received time domain signals before fast Fourier transform (FFT). However, due to the limited number of pilot symbols, the estimated channel state information (CSI) is often incorrect. It leads limited interference suppression performance. In this paper, we newly employ a data-aided channel state estimation. Decision results of received symbols are obtained by CCM-SMI and then fed-back to the channel estimator. It assists improving CSI estimation accuracy. Computer simulation result reveals that our proposal accomplishes better bit error rate (BER) performance in spite of the minimum pilot symbols with a slight additional computation complexity.
This paper expands our previously proposed semi-blind uplink interference suppression scheme for multicell multiuser massive MIMO systems to support multi modulus signals. The original proposal applies the channel state information (CSI) aided blind adaptive array (BAA) interference suppression after the beamspace preprocessing and the decision feedback channel estimation (DFCE). BAA is based on the constant modulus algorithm (CMA) which can fully exploit the degree of freedom (DoF) of massive antenna arrays to suppress both inter-user interference (IUI) and inter-cell interference (ICI). Its effectiveness has been verified under the extensive pilot contamination constraint. Unfortunately, CMA basically works well only for constant envelope signals such as QPSK and thus the proposed scheme should be expanded to cover QAM signals for more general use. This paper proposes to apply the multi modulus algorithm (MMA) and the minimum mean square error weight derivation based on data-aided sample matrix inversion (MMSE-SMI). It can successfully realize interference suppression even with the use of multi-level envelope signals such as 16QAM with satisfactorily outage probability performance below the fifth percentile.
In this paper, we propose a robust output feedback control method for nonlinear systems with uncertain time-varying parameters associated with diagonal terms and there are additional external disturbances. First, we provide a new practical guidance of obtaining a compact set which contains the allowed time-varying parameters by utilizing a Lyapunov equation and matrix inequalities. Then, we show that all system states and observer errors of the controlled system remain bounded by the proposed controller. Moreover, we show that the ultimate bounds of some system states and observer errors can be made (arbitrarily) small by adjusting a gain-scaling factor depending on the system nonlinearity. With an application example, we illustrate the effectiveness of our control scheme over the existing one.
Isao ECHIZEN Noboru BABAGUCHI Junichi YAMAGISHI Naoko NITTA Yuta NAKASHIMA Kazuaki NAKAMURA Kazuhiro KONO Fuming FANG Seiko MYOJIN Zhenzhong KUANG Huy H. NGUYEN Ngoc-Dung T. TIEU
With the spread of high-performance sensors and social network services (SNS) and the remarkable advances in machine learning technologies, fake media such as fake videos, spoofed voices, and fake reviews that are generated using high-quality learning data and are very close to the real thing are causing serious social problems. We launched a research project, the Media Clone (MC) project, to protect receivers of replicas of real media called media clones (MCs) skillfully fabricated by means of media processing technologies. Our aim is to achieve a communication system that can defend against MC attacks and help ensure safe and reliable communication. This paper describes the results of research in two of the five themes in the MC project: 1) verification of the capability of generating various types of media clones such as audio, visual, and text derived from fake information and 2) realization of a protection shield for media clones' attacks by recognizing them.
Hiroyuki IMAGAWA Motoi IWATA Koichi KISE
There are some technologies like QR codes to obtain digital information from printed matters. Digital watermarking is one of such techniques. Compared with other techniques, digital watermarking is suitable for adding information to images without spoiling their design. For such purposes, digital watermarking methods for printed matters using detection markers or image registration techniques for detecting watermarked areas are proposed. However, the detection markers themselves can damage the appearance such that the advantages of digital watermarking, which do not lose design, are not fully utilized. On the other hand, methods using image registration techniques are not able to work for non-registered images. In this paper, we propose a novel digital watermarking method using deep learning for the detection of watermarked areas instead of using detection markers or image registration. The proposed method introduces a semantic segmentation based on deep learning model for detecting watermarked areas from printed matters. We prepare two datasets for training the deep learning model. One is constituted of geometrically transformed non-watermarked and watermarked images. The number of images in this dataset is relatively large because the images can be generated based on image processing. This dataset is used for pre-training. The other is obtained from actually taken photographs including non-watermarked or watermarked printed matters. The number of this dataset is relatively small because taking the photographs requires a lot of effort and time. However, the existence of pre-training allows a fewer training images. This dataset is used for fine-tuning to improve robustness for print-cam attacks. In the experiments, we investigated the performance of our method by implementing it on smartphones. The experimental results show that our method can carry 96 bits of information with watermarked printed matters.
Lianqiang LI Kangbo SUN Jie ZHU
Knowledge distillation approaches can transfer information from a large network (teacher network) to a small network (student network) to compress and accelerate deep neural networks. This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages. In the first stage, it employs autoencoders to learn compact and precise representations of the feature maps (FM) from the teacher network and the student network, these representations can be treated as the essential of the FM, i.e., EFM. In the second stage, MKD utilizes multiple kinds of knowledge, i.e., the magnitude of individual sample's EFM and the similarity relationships among several samples' EFM to enhance the generalization ability of the student network. Compared with previous approaches that employ FM or the handcrafted features from FM, the EFM learned from autoencoders can be transferred more efficiently and reliably. Furthermore, the rich information provided by the multiple kinds of knowledge guarantees the student network to mimic the teacher network as closely as possible. Experimental results also show that MKD is superior to the-state-of-arts.
Hashing methods have proven to be effective algorithm for image retrieval. However, learning discriminative hash codes is challenging for unsupervised models. In this paper, we propose a novel distinguishable image retrieval framework, named Unsupervised Deep Embedded Hashing (UDEH), to recursively learn discriminative clustering through soft clustering models and generate highly similar binary codes. We reduce the data dimension by auto-encoder and apply binary constraint loss to reduce quantization error. UDEH can be jointly optimized by standard stochastic gradient descent (SGD) in the embedd layer. We conducted a comprehensive experiment on two popular datasets.
Nobuchika SAKATA Kohei KANAMORI Tomu TOMINAGA Yoshinori HIJIKATA Kensuke HARADA Kiyoshi KIYOKAWA
The aim of this study is to calculate optimal walking routes in real space for users partaking in immersive virtual reality (VR) games without compromising their immersion. To this end, we propose a navigation system to automatically determine the route to be taken by a VR user to avoid collisions with surrounding obstacles. The proposed method is evaluated by simulating a real environment. It is verified to be capable of calculating and displaying walking routes to safely guide users to their destinations without compromising their VR immersion. In addition, while walking in real space while experiencing VR content, users can choose between 6-DoF (six degrees of freedom) and 3-DoF (three degrees of freedom). However, we expect users to prefer 3-DoF conditions, as they tend to walk longer while using VR content. In dynamic situations, when two pedestrians are added to a designated computer-generated real environment, it is necessary to calculate the walking route using moving body prediction and display the moving body in virtual space to preserve immersion.
Kazuya URAZOE Nobutaka KUROKI Yu KATO Shinya OHTANI Tetsuya HIROSE Masahiro NUMA
This paper presents an image super-resolution technique using a convolutional neural network (CNN) and multi-task learning for multiple image categories. The image categories include natural, manga, and text images. Their features differ from each other. However, several CNNs for super-resolution are trained with a single category. If the input image category is different from that of the training images, the performance of super-resolution is degraded. There are two possible solutions to manage multi-categories with conventional CNNs. The first involves the preparation of the CNNs for every category. This solution, however, requires a category classifier to select an appropriate CNN. The second is to learn all categories with a single CNN. In this solution, the CNN cannot optimize its internal behavior for each category. Therefore, this paper presents a super-resolution CNN architecture for multiple image categories. The proposed CNN has two parallel outputs for a high-resolution image and a category label. The main CNN for the high-resolution image is a normal three convolutional layer-architecture, and the sub neural network for the category label is branched out from its middle layer and consists of two fully-connected layers. This architecture can simultaneously learn the high-resolution image and its category using multi-task learning. The category information is used for optimizing the super-resolution. In an applied setting, the proposed CNN can automatically estimate the input image category and change the internal behavior. Experimental results of 2× image magnification have shown that the average peak signal-to-noise ratio for the proposed method is approximately 0.22 dB higher than that for the conventional super-resolution with no difference in processing time and parameters. We have ensured that the proposed method is useful when the input image category is varying.
Kenichi KAWAMURA Akiyoshi INOKI Shouta NAKAYAMA Keisuke WAKAO Yasushi TAKATORI
A method is presented for increasing wireless LAN (WLAN) capacity in high-density environments with IEEE 802.11ax systems. We propose using coordinated scheduling of trigger frames based on our mobile cooperative control concept. High-density WLAN systems are managed by a management server, which gathers wireless environmental information from user equipment through cellular access. Hierarchical clustering of basic service sets is used to form synchronized clusters to reduce interference and increase throughput of high-density WLAN systems based on mobile cooperative control. This method increases uplink capacity by up to 19.4% and by up to 11.3% in total when WLAN access points are deployed close together. This control method is potentially effective for IEEE 802.11ax WLAN systems utilized as 5G mobile network components.
Kota YOSHIDA Mitsuru SHIOZAKI Shunsuke OKURA Takaya KUBOTA Takeshi FUJINO
A model extraction attack is a security issue in deep neural networks (DNNs). Information on a trained DNN model is an attractive target for an adversary not only in terms of intellectual property but also of security. Thus, an adversary tries to reveal the sensitive information contained in the trained DNN model from machine-learning services. Previous studies on model extraction attacks assumed that the victim provides a machine-learning cloud service and the adversary accesses the service through formal queries. However, when a DNN model is implemented on an edge device, adversaries can physically access the device and try to reveal the sensitive information contained in the implemented DNN model. We call these physical model extraction attacks model reverse-engineering (MRE) attacks to distinguish them from attacks on cloud services. Power side-channel analyses are often used in MRE attacks to reveal the internal operation from power consumption or electromagnetic leakage. Previous studies, including ours, evaluated MRE attacks against several types of DNN processors with power side-channel analyses. In this paper, information leakage from a systolic array which is used for the matrix multiplication unit in the DNN processors is evaluated. We utilized correlation power analysis (CPA) for the MRE attack and reveal weight parameters of a DNN model from the systolic array. Two types of the systolic array were implemented on field-programmable gate array (FPGA) to demonstrate that CPA reveals weight parameters from those systolic arrays. In addition, we applied an extended analysis approach called “chain CPA” for robust CPA analysis against the systolic arrays. Our experimental results indicate that an adversary can reveal trained model parameters from a DNN accelerator even if the DNN model parameters in the off-chip bus are protected with data encryption. Countermeasures against side-channel leaks will be important for implementing a DNN accelerator on a FPGA or application-specific integrated circuit (ASIC).