Youliang ZHENG Ruihu LI Jingjie LV Qiang FU
Locally repairable codes (LRCs) are a new type of erasure code designed for modern distributed storage systems (DSSs). To obtain ternary LRCs of distance 6, we first propose constructions with disjoint repair groups and construct several families of LRCs with 1 ≤ r ≤ 6, where the codes with 3 ≤ r ≤ 6 are obtained through a search algorithm. We then propose a new method to extend the length of codes without changing the distance. By employing methods such as expansion and deletion, we obtain more LRCs from a known LRC. The resulting LRCs are optimal or near optimal in terms of the Cadambe-Mazumdar (C-M) bound.
Kazuhiro MURAKAMI Arata KAWAMURA Yoh-ichi FUJISAKA Nobuhiko HIRUMA Youji IIGUNI
In this paper, we propose a real-time BSS (Blind Source Separation) system with two microphones that extracts only desired sound sources. Under the assumption that the desired sound sources are close to the microphones, the proposed BSS system suppresses distant sound sources as undesired sound sources. We previously developed a BSS system that can estimate the distance from a microphone to a sound source and suppress distant sound sources, but it was not a real-time processing system. The proposed BSS system is a real-time version of our previous BSS system. To develop the proposed BSS system, we simplify some BSS procedures of the previous system. Simulation results showed that the proposed system can effectively suppress the distant source signals in real-time and has almost the same capability as the previous system.
Kiyoshi KURIHARA Nobumasa SEIYAMA Tadashi KUMANO
This paper describes a method to control prosodic features using phonetic and prosodic symbols as input of attention-based sequence-to-sequence (seq2seq) acoustic modeling (AM) for neural text-to-speech (TTS). The method involves inserting a sequence of prosodic symbols between phonetic symbols that are then used to reproduce prosodic acoustic features, i.e. accents, pauses, accent breaks, and sentence endings, in several seq2seq AM methods. The proposed phonetic and prosodic labels have simple descriptions and a low production cost. By contrast, the labels of conventional statistical parametric speech synthesis methods are complicated, and the cost of time alignments such as aligning the boundaries of phonemes is high. The proposed method does not need the boundary positions of phonemes. We propose an automatic conversion method for conventional labels and show how to automatically reproduce pitch accents and phonemes. The results of objective and subjective evaluations show the effectiveness of our method.
Mingxing ZHANG Zhengchun ZHOU Meng YANG Haode YAN
The partial-period autocorrelation of sequences is an important performance measure of communication systems employing them, but it is notoriously difficult to analyze. In this paper, we propose an algorithm to design unimodular sequences with low partial-period autocorrelations by directly minimizing the partial-period integrated sidelobe level (PISL). The proposed algorithm is inspired by the monotonic minimizer for integrated sidelobe level (MISL) algorithm. An acceleration scheme is then considered to speed up the algorithm further. Numerical experiments show that the proposed algorithm can effectively generate sequences with a lower partial-period peak sidelobe level (PPSL) than the well-known Zadoff-Chu sequences.
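As a rough illustration of the quantities named above, the following sketch generates a Zadoff-Chu sequence and measures a partial-period peak sidelobe level. The definitions used here are assumptions for illustration, not taken from the paper: the partial-period autocorrelation is taken as a correlation over a window of `length` consecutive samples with cyclic indexing, and the PPSL as its maximum magnitude over all nonzero lags and all window start positions. The function names are hypothetical.

```python
import cmath

def zadoff_chu(N, u=1):
    """Zadoff-Chu sequence of odd length N with root u (gcd(u, N) = 1 assumed)."""
    return [cmath.exp(-1j * cmath.pi * u * n * (n + 1) / N) for n in range(N)]

def partial_autocorr(x, lag, start, length):
    """Correlation of x with its cyclic shift by `lag`, summed over a window
    of `length` samples beginning at index `start` (assumed definition)."""
    N = len(x)
    return sum(
        x[(start + n) % N] * x[(start + n + lag) % N].conjugate()
        for n in range(length)
    )

def ppsl(x, length):
    """Largest sidelobe magnitude over all nonzero lags and window starts
    (assumed definition of the partial-period peak sidelobe level)."""
    N = len(x)
    return max(
        abs(partial_autocorr(x, lag, start, length))
        for lag in range(1, N)
        for start in range(N)
    )

zc = zadoff_chu(7)
full_period_peak = ppsl(zc, 7)   # window covers the whole period
short_window_peak = ppsl(zc, 3)  # window covers only part of the period
```

With the window equal to the full period, the sidelobes of a Zadoff-Chu sequence vanish up to rounding error, while a shorter window leaves clearly nonzero sidelobes, which is the gap a PISL-minimizing design aims to reduce.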
Ayano NAKAI-KASAI Kazunori HAYASHI
Diffusion least-mean-square (LMS) is a method to estimate and track an unknown parameter at multiple nodes in a network. When the unknown vector is sparse, the sparsity-promoting version of diffusion LMS, which adds a sparse regularization term to the cost function, is known to show better convergence performance than the original diffusion LMS. This paper proposes a novel choice of the coefficients involved in the updates of sparse diffusion LMS using the idea of message propagation. Moreover, we optimize the proposed coefficients with respect to the steady-state mean-square deviation. Simulation results demonstrate that the proposed method outperforms conventional methods in terms of convergence performance.
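For orientation, one commonly used form of sparse diffusion LMS is the adapt-then-combine update with a zero-attracting ($\ell_1$) term, sketched below. The notation is an assumption for illustration and is not taken from the paper: $w_{k,i}$ is the estimate at node $k$ and time $i$, $u_{k,i}$ the regressor (column vector), $d_k(i)$ the measurement, $\mu$ the step size, $\lambda$ the regularization weight, $\mathcal{N}_k$ the neighborhood of node $k$, and $a_{lk}$ the combination coefficients. The specific coefficients proposed in the paper are not reproduced here.

```latex
\begin{align}
\psi_{k,i} &= w_{k,i-1}
  + \mu\, u_{k,i}\bigl(d_k(i) - u_{k,i}^{\top} w_{k,i-1}\bigr)
  - \mu\lambda\, \operatorname{sign}(w_{k,i-1}), \\
w_{k,i} &= \sum_{l \in \mathcal{N}_k} a_{lk}\, \psi_{l,i},
  \qquad \sum_{l \in \mathcal{N}_k} a_{lk} = 1,\quad a_{lk} \ge 0 .
\end{align}
```

The first line is the local adaptation step with the sparsity-promoting term; the second line is the combination step whose coefficients are the design freedom that the abstract refers to.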
Kohei SHIMATANI Shigemasa TAKAI
We consider the bisimilarity control problem for partially observed nondeterministic discrete event systems with deterministic specifications. This problem requires us to synthesize a supervisor that achieves bisimulation equivalence of the supervised system and the deterministic specification under partial observation. We present necessary and sufficient conditions for the existence of such a deterministic supervisor and show that these conditions can be verified in polynomial time.
In this paper, we propose a robust output feedback control method for nonlinear systems with uncertain time-varying parameters associated with diagonal terms and with additional external disturbances. First, we provide new practical guidance for obtaining a compact set that contains the allowed time-varying parameters by utilizing a Lyapunov equation and matrix inequalities. Then, we show that all system states and observer errors of the controlled system remain bounded under the proposed controller. Moreover, we show that the ultimate bounds of some system states and observer errors can be made arbitrarily small by adjusting a gain-scaling factor depending on the system nonlinearity. With an application example, we illustrate the effectiveness of our control scheme over the existing one.
Shinichi KAWAMURA Yuichi KOMANO Hideo SHIMIZU Saki OSUKA Daisuke FUJIMOTO Yuichi HAYASHI Kentaro IMAFUKU
The residue number system (RNS) is a method for representing an integer x as an n-tuple of its residues with respect to a given set of moduli. In RNS, addition, subtraction, and multiplication can be carried out by independent operations with respect to each modulus. Therefore, an n-fold speedup can be achieved by parallel processing. The main disadvantage of RNS is that we cannot efficiently compare the magnitude of two integers or determine the sign of an integer. Two general methods of comparison are to transform a number in RNS to a mixed-radix system or to a radix representation using the Chinese remainder theorem (CRT). We use the CRT to derive an equation approximating the value of x relative to M, the product of the moduli. Then, we propose two algorithms that efficiently evaluate the equation and output a sign bit. The expected number of steps of these algorithms is of order n. The algorithms use a lookup table that is (n+3) times as large as M, which is reasonably small for most applications including cryptography.
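The RNS ideas described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithms proposed in the paper: the moduli (3, 5, 7) are hypothetical, the sign convention (residue values in the upper half of [0, M) count as negative) is assumed, and the sign is found by exact CRT reconstruction, which is the slow baseline that approximate methods try to avoid.

```python
from math import prod

MODULI = (3, 5, 7)  # hypothetical pairwise-coprime moduli, M = 105

def to_rns(x, moduli=MODULI):
    """Represent integer x by its residues modulo each modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Addition is carried out independently per modulus."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiplication is carried out independently per modulus."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Exact CRT reconstruction of the value in [0, M)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return x % M

def sign_bit(residues, moduli=MODULI):
    """Return 1 if the value lies in the upper half of [0, M), which is
    treated as negative under the assumed convention, else 0."""
    M = prod(moduli)
    return 1 if 2 * from_rns(residues, moduli) >= M else 0
```

For example, `rns_mul(to_rns(4), to_rns(6))` works on three small residues in parallel, and only `sign_bit` needs information that spans all moduli at once, which is why sign detection is the expensive operation.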
Osamu KAGAYA Yasuo MORIMOTO Takeshi MOTEGI Minoru INOMATA
This paper proposes a transparent glass quartz antenna for 5G-millimeter-wave-connected vehicles and clarifies the characteristics of signal reception when the glass antennas are placed on the windows of a vehicle traveling in an urban environment. Synthetic fused quartz is a material particularly suited for millimeter-wave devices owing to its excellent low transmission loss. Realizing synthetic fused quartz devices requires accurate micromachining technology specialized for the material coupled with the material technology. This paper presents a transparent antenna comprising a thin mesh pattern on a quartz substrate for installation on a vehicle window. A comparison of distributed transparent antennas and an omnidirectional antenna shows that the relative received power of the distributed antenna system is higher than that of the omnidirectional antenna. In addition, results show that the power received is similar when using vertically and horizontally polarized antennas. The design is verified in a field test using transparent antennas on the windows of a real vehicle.
In this paper, we propose a model of a diversity receiver that uses an antenna whose pattern can change periodically. We also propose a minimum mean square error (MMSE) based interference cancellation method for the receiver, which in principle can suffer from interference in neighboring frequency bands. Since the antenna pattern changes according to the sum of sinusoidal waveforms with different frequencies, the signals are received at the carrier frequency and at frequencies shifted from the carrier frequency by the frequencies of the sinusoidal waveforms. The proposed diversity scheme combines the components in the frequency domain to maximize the signal-to-noise power ratio (SNR) and the diversity gain. We confirm that the bit error rate (BER) of the proposed receiver improves as the number of arrival paths increases, which yields a path diversity gain. We also confirm that the proposed MMSE based interference canceller works well when interference signals exist and achieves better BER performance than the conventional diversity receiver with maximum ratio combining.
Masaaki FUJIYOSHI Ruifeng LI Hitoshi KIYA
This paper proposes an encryption-then-compression (EtC) system-friendly data hiding scheme for images, where an EtC system compresses images after they are encrypted. The EtC system divides an image into non-overlapping blocks and applies four block-based processes independently and randomly to the image for visual encryption. The proposed scheme hides data in a plain, i.e., unencrypted, image and can extract the hidden data from the image encrypted by the EtC system. Furthermore, the scheme provides reversible data hiding: although it temporarily distorts the unmarked image to hide data in it, it can perfectly recover the unmarked image from the marked image. The proposed scheme copes with three of the four processes in the EtC system, namely, block permutation, rotation/flipping of blocks, and inverting brightness in blocks, whereas the conventional schemes for the system do not cope with the last one. In addition, these conventional schemes have to identify the encrypted image so that image-dependent side information can be used to extract embedded data and to restore the unmarked image, but the proposed scheme does not need such identification. Moreover, whereas the data hiding process must know the block size of encryption in conventional schemes, the proposed scheme needs no prior knowledge of the block size for encryption. Experimental results show the effectiveness of the proposed scheme.
Xina CHENG Ziken LI Songlin DU Takeshi IKENAGA
The spike height of volleyball players is important in volleyball analysis as a quantitative criterion for evaluating players' motions; it not only provides rich information to audiences in live broadcasts of sports events but also helps evaluate and improve the performance of players in strategy analysis and player training. In the volleyball game scene, the high similarity between hands, deformation, and occlusion are the three main problems that affect the acquisition of spike height. To solve these problems, this paper proposes a body part connection, categorization and occlusion based observation model and a temporal position based correction method. First, skin pixel filter based connection detection solves the problem of high similarity between hands by judging whether a hand is connected to the spike player. Second, the body part categorization based observation uses the probability distribution map of the hand to determine the category of each body part, which addresses the deformation problem. Third, the occlusion part detection based observation eliminates the influence of views with occluded body parts by detecting the occluded views with a trained body part classifier. Finally, the temporal position based result correction combines the estimated result, which refers to the historical positions, with the posterior result to obtain an optimal result according to the degree of confidence. The experiments are based on videos of the final and semi-final games of the 2014 Japan Inter High School Men's Volleyball in Tokyo Metropolitan Gymnasium, which include 196 spike sequences of 4 teams. The experimental results show that the spike height is successfully detected in 93.37% of the test sequences, with an average spike height error of 5.96 cm.
Expectation propagation (EP) decoding is proposed for sparse superposition coding in orthogonal frequency division multiplexing (OFDM) systems. When a randomized discrete Fourier transform (DFT) dictionary matrix is used, the EP decoding has the same complexity as approximate message-passing (AMP) decoding, which is a low-complexity and powerful decoding algorithm for the additive white Gaussian noise (AWGN) channel. Numerical simulations show that the EP decoding achieves performance comparable to that of AMP decoding for the AWGN channel. For OFDM systems, on the other hand, the EP decoding is much superior to the AMP decoding, which exhibits an error floor in the high signal-to-noise ratio regime.
Farzin MATIN Yoosoo JEONG Hanhoon PARK
Multiscale retinex is one of the most popular image enhancement methods. However, its control parameters, such as Gaussian kernel sizes, gain, and offset, should be tuned carefully according to the image contents. In this letter, we propose a new method that optimizes the parameters using particle swarm optimization and a multi-objective function. The method iteratively verifies the visual quality (i.e., brightness, contrast, and colorfulness) of the enhanced image using the multi-objective function while subtly adjusting the parameters. Experimental results show that the proposed method achieves better image quality, both qualitatively and quantitatively, than other image enhancement methods.
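To make the role of the control parameters concrete, a commonly cited form of multiscale retinex is written out below. The notation is an assumption for illustration and is not taken from the letter: $I$ is one image channel, $G_{\sigma_k}$ a Gaussian kernel of scale $\sigma_k$, $w_k$ the scale weights, and $g$ and $b$ the gain and offset applied to the result.

```latex
\begin{align}
R(x,y) &= \sum_{k=1}^{K} w_k \Bigl[\log I(x,y)
          - \log\bigl((G_{\sigma_k} * I)(x,y)\bigr)\Bigr],
          \qquad \sum_{k=1}^{K} w_k = 1, \\
\hat{I}(x,y) &= g \, R(x,y) + b .
\end{align}
```

In this form the kernel scales $\sigma_k$, the gain $g$, and the offset $b$ are exactly the kind of parameters that an optimizer would tune per image.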
Ying JI Yu WANG Jien KATO Kensaku MORI
With the rapid development of multimedia, violent videos can be easily accessed in games, movies, websites, and so on. Identifying violent videos and rating the extent of violence is of great importance to media filtering and child protection. Many previous studies only address the problems of violence scene detection and violent action recognition, yet the violence rating problem remains unsolved. In this paper, we present a novel video-level rating prediction method to estimate the extent of violence automatically. It has two main characteristics: (1) a two-stream network is fine-tuned to construct effective representations of violent videos; (2) a violence rating prediction machine is designed to learn the strength relationship among different videos. Furthermore, we present a novel violent video dataset with a total of 1,930 human-involved violent videos designed for violence rating analysis. Each video is annotated with 6 fine-grained objective attributes, which are considered to be closely related to the extent of violence. The ground truth of the violence rating is given by a pairwise comparison method. The dataset is evaluated in terms of both stability and convergence. Experimental results on this dataset demonstrate the effectiveness of our method compared with state-of-the-art classification methods.
In this paper, we propose L0 norm optimization in a scrambled sparse representation domain and its application to an Encryption-then-Compression (EtC) system. We design a random unitary transform that conserves L0 norm isometry. The resulting encryption method provides a practical orthogonal matching pursuit (OMP) algorithm that allows computation in the encrypted domain. We prove that the proposed method theoretically has exactly the same estimation performance as the nonencrypted variant of the OMP algorithm. In addition, we demonstrate the security strength of the proposed secure sparse representation when applied to the EtC system. Even if the dictionary information is leaked, the proposed scheme protects the privacy information of observed signals.
Yubo LIU Yangting LAI Jianyong CHEN Lingyu LIANG Qiaoming DENG
Computer aided design (CAD) technology is widely used for architectural design, but current CAD tools still require high-level design specifications from humans. It would be significant to construct an intelligent CAD system allowing automatic architectural layout parsing (AutoALP), which generates candidate designs or predicts architectural attributes without much user intervention. To tackle these problems, many learning-based methods have been proposed, and benchmark datasets have become one of the essential elements of data-driven AutoALP. This paper proposes a new dataset called SCUT-AutoALP for multi-paradigm applications. It contains two subsets: 1) Subset-I is for floor plan design and contains 300 residential floor plan images with layout, boundary and attribute labels; 2) Subset-II is for urban plan design and contains 302 campus plan images with layout, boundary and attribute labels. We analyzed the samples and labels statistically, and evaluated SCUT-AutoALP for different layout parsing tasks of floor plan/urban plan based on conditional generative adversarial network (cGAN) models. The results verify the effectiveness and indicate the potential applications of SCUT-AutoALP. The dataset is available at https://github.com/designfuturelab702/SCUT-AutoALP-Database-Release.
Shigeru KOZONO Yuya TASHIRO Yuuki KANEMIYO Hiroaki NAKABAYASHI
In a multiple-user MIMO system in which numerous users simultaneously communicate in a cell, the channel matrix properties depend on the parameters of the individual users in such a way that they can be modeled as points randomly moving within the cell. Although these properties can be simulated by computer, they need to be expressed analytically to develop MIMO systems with diversity. Given a small area with an equivalent multi-path, we assume that a user u is at a certain “user point” $P^u(\lambda_p^u, \xi_p^u)$ in a cell, or (radius $\lambda_p^u$ from origin, angle $\xi_p^u$), and that the user moves with movement $M^u(f_{\max}^u, \xi_v^u)$ around that point, or (Doppler frequency $f_{\max}^u$, direction $\xi_v^u$). The MU-MIMO channel model consists of a multipath environment, user parameters, and antenna configuration. A general formula of the correlation $\rho_{i - j,i' - j'}^{u - u'}(bm)$ between the channel matrix elements of users u and u' and one for given multipath conditions are derived. As a feature of the MU-MIMO channel, the movement factor $F^{u - u'}(\gamma^u, \xi_n, \xi_v^u)$, which means a fall coefficient of the spatial correlation calculated from only the user points of u and u', is also derived. As the difference in speed or direction between u and u' increases, $F^{u - u'}(\gamma^u, \xi_n, \xi_v^u)$ becomes smaller. Consequently, even if the path is LOS, $\rho_{i - j,i' - j'}^{u - u'}(bm)$ becomes low enough owing to the movement factor, even though the correlation in the single-user MIMO channel is high. If the parameters of u and u' are the same, the factor equals 1, and the channels correspond to the users' own channels and work like SU-MIMO channel. These analytical findings are verified by computer simulation.
Masayuki SHIMODA Youki SADA Ryosuke KURAMOCHI Shimpei SATO Hiroki NAKAHARA
In the realization of convolutional neural networks (CNNs) in resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI*, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
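The equalization idea behind filter-wise pruning can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the weights to keep are chosen by largest magnitude, which the abstract does not state, and it omits retraining with distillation entirely. The function names are hypothetical, and each filter is represented as a flat list of weights.

```python
def prune_filter(weights, keep):
    """Zero all but the `keep` largest-magnitude weights of one filter
    (magnitude-based selection is an assumption for this sketch)."""
    if keep <= 0:
        return [0.0] * len(weights)
    # Indices sorted from largest to smallest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]

def prune_filterwise(filters, keep):
    """Apply the same nonzero budget to every filter of a layer."""
    return [prune_filter(f, keep) for f in filters]

def nonzero_counts(filters):
    """Number of nonzero weights in each filter."""
    return [sum(1 for w in f if w != 0.0) for f in filters]

layer = [
    [0.1, -0.9, 0.3, 0.05],
    [0.5, 0.4, -0.6, 0.0],
]
pruned = prune_filterwise(layer, keep=2)
```

Because every filter ends up with the same number of nonzero weights, each one takes the same number of multiply-accumulate steps when zeros are skipped, which is what lets filters run in lockstep on parallel hardware.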
Chenxu WANG Yutong LU Zhiguang CHEN Junnan LI
Training deep learning (DL) models is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High performance computing clusters, especially supercomputers, are equipped with large amounts of computing and storage resources and efficient interconnects, which can train DL networks better and faster. In this paper, we propose a method to train DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which can make full use of hardware resources and greatly increase computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first layer models in shared memory. Third, we optimize the parallel I/O by making each reader read data as contiguously as possible to avoid the high overhead of discontinuous data reading. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
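One way to picture a two-level synchronous scheme is as two rounds of gradient averaging, first among the workers inside a node and then across nodes. The sketch below is an assumption-laden illustration, not the system described in the paper: the grouping of workers into nodes, the use of plain averaging, and all function names are hypothetical, and real communication (shared memory, network collectives) is replaced by list operations.

```python
def mean_vectors(vectors):
    """Element-wise mean of equally sized gradient vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def hierarchical_average(node_gradients):
    """node_gradients[node][worker] is one worker's gradient vector.
    Level 1 averages within each node (cheap, e.g. via shared memory);
    level 2 averages the per-node results across nodes."""
    per_node = [mean_vectors(workers) for workers in node_gradients]
    return mean_vectors(per_node)

def sgd_step(params, grad, lr):
    """Plain synchronous SGD update using the aggregated gradient."""
    return [p - lr * g for p, g in zip(params, grad)]

# Two nodes with two workers each, gradients of dimension 2.
grads = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
global_grad = hierarchical_average(grads)
new_params = sgd_step([1.0, 1.0], global_grad, lr=0.5)
```

The two-level mean equals the flat mean over all workers only when every node hosts the same number of workers; otherwise the per-node results would need to be weighted by worker count. Only one vector per node crosses the network in the second level, which is where the communication saving comes from.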