This paper presents MDX-Mixer, which improves music demixing (MDX) performance by leveraging source signals separated by multiple existing MDX models. Deep-learning-based MDX models have improved their separation performances year by year for four kinds of sound sources: “vocals,” “drums,” “bass,” and “other”. Our research question is whether mixing (i.e., weighted sum) the signals separated by state-of-the-art MDX models can obtain either the best of everything or higher separation performance. Previously, in singing voice separation and MDX, there have been studies in which separated signals of the same sound source are mixed with each other using time-invariant or time-varying positive mixing weights. In contrast to those, this study is novel in that it allows for negative weights as well and performs time-varying mixing using all of the separated source signals and the music acoustic signal before separation. The time-varying weights are estimated by modeling the music acoustic signals and their separated signals by dividing them into short segments. In this paper we propose two new systems: one that estimates time-invariant weights using 1×1 convolution, and one that estimates time-varying weights by applying the MLP-Mixer layer proposed in the computer vision field to each segment. The latter model is called MDX-Mixer. Their performances were evaluated based on the source-to-distortion ratio (SDR) using the well-known MUSDB18-HQ dataset. The results show that the MDX-Mixer achieved higher SDR than the separated signals given by three state-of-the-art MDX models.
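The time-invariant case described above (a weighted sum of separated signals and the unseparated mixture, with negative weights allowed) can be illustrated with a minimal sketch. This is not the authors' system: the weights here are fitted by ordinary least squares as a stand-in for the trained 1×1 convolution, and the toy signals and the names `mix_candidates` and `fit_time_invariant_weights` are assumptions made for illustration.

```python
import numpy as np

def mix_candidates(candidates, weights):
    """Weighted sum of candidate signals; weights may be negative.

    candidates: (n_candidates, n_samples) array whose rows are separated
                signals from several models plus the unseparated mixture.
    weights:    (n_candidates,) array of time-invariant mixing weights.
    """
    return weights @ candidates

def fit_time_invariant_weights(candidates, target):
    """Least-squares weights (a stand-in for a trained 1x1 convolution)."""
    weights, *_ = np.linalg.lstsq(candidates.T, target, rcond=None)
    return weights

# Toy data (hypothetical): a target source, an interferer, and an artifact.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
interference = rng.standard_normal(2000)
artifact = rng.standard_normal(2000)

mixture = source + interference
estimate_a = source + 0.4 * interference      # model A leaks interference
estimate_b = 0.8 * source + 0.1 * artifact    # model B attenuates and distorts
candidates = np.vstack([estimate_a, estimate_b, mixture])

weights = fit_time_invariant_weights(candidates, source)
mixed = mix_candidates(candidates, weights)
```

In this toy setup the fitted weight on the mixture is negative: subtracting a scaled copy of the mixture cancels the interference that leaks into `estimate_a`, which a positive-only scheme could not do.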
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA
In high-speed data communication systems, it is important to evaluate the quality of the transmitted signal at the receiver. At high data rates, the transmission line acts as a high-frequency attenuator and contributes to intersymbol interference (ISI) at the receiver. To evaluate ISI conditions, eye diagrams are widely used to analyze signal quality and visualize the ISI effect as an eye-opening rate. Various types of on-chip eye-opening monitors (EOMs) have been proposed to adjust waveform-shaping circuits. However, eye diagram evaluation of multi-valued signaling is more difficult than that of binary transmission because of the complicated signal transition patterns. Moreover, in severe ISI situations where the eye is completely closed, eye diagram evaluation does not work well. This paper presents a novel evaluation method using two-dimensional (2D) symbol mapping and a linear mixture model (LMM) for multi-valued data transmission. In the proposed method, ISI evaluation is realized by 2D symbol mapping, and an efficient quantitative analysis is realized using the LMM. An experimental demonstration of four-level pulse amplitude modulation (PAM-4) data transmission over a 100 m Cat5e cable is presented. The experimental results show that the proposed method can extract features of the ISI effect even when the eye is completely closed under severe conditions.
Ryuta TAMURA Yuichi TAKANO Ryuhei MIYASHIRO
We study the mixed-integer optimization (MIO) approach to feature subset selection in nonlinear kernel support vector machines (SVMs) for binary classification. To measure the performance of subset selection, we use the distance between two classes (DBTC) in a high-dimensional feature space based on the Gaussian kernel function. However, DBTC to be maximized as an objective function is nonlinear, nonconvex and nonconcave. Despite the difficulty of linearizing such a nonlinear function in general, our major contribution is to propose a mixed-integer linear optimization (MILO) formulation to maximize DBTC for feature subset selection, and this MILO problem can be solved to optimality using optimization software. We also derive a reduced version of the MILO problem to accelerate our MILO computations. Experimental results show good computational efficiency for our MILO formulation with the reduced problem. Moreover, our method can often outperform the linear-SVM-based MILO formulation and recursive feature elimination in prediction performance, especially when there are relatively few data instances.
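As a rough illustration of the objective discussed above, the sketch below evaluates a class-separation score for a given feature subset under the Gaussian kernel. It assumes that DBTC is the squared distance between the two class means in the kernel-induced feature space, which can be computed from kernel averages alone; the abstract does not spell out the definition, so treat this as an assumption. The function names and the `gamma` parameter are hypothetical, and the sketch only scores a fixed subset; it does not reproduce the MILO formulation.

```python
import numpy as np

def gaussian_kernel(A, B, mask, gamma=1.0):
    """Gaussian kernel matrix between rows of A and B on the selected features."""
    A = A[:, mask]
    B = B[:, mask]
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def dbtc(X_pos, X_neg, mask, gamma=1.0):
    """Squared distance between class means in feature space (assumed definition)."""
    k_pp = gaussian_kernel(X_pos, X_pos, mask, gamma).mean()
    k_nn = gaussian_kernel(X_neg, X_neg, mask, gamma).mean()
    k_pn = gaussian_kernel(X_pos, X_neg, mask, gamma).mean()
    return k_pp - 2.0 * k_pn + k_nn

# Toy data: feature 0 separates the classes, feature 1 does not.
X_pos = np.array([[0.0, 0.0], [0.0, 1.0]])
X_neg = np.array([[3.0, 0.0], [3.0, 1.0]])

score_informative = dbtc(X_pos, X_neg, np.array([True, False]))
score_uninformative = dbtc(X_pos, X_neg, np.array([False, True]))
```

A subset selection method would search over `mask` for the subset with the largest score; here the informative feature scores close to 2, while the uninformative one scores 0.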
Takumi KOMORI Yutaka MASUDA Tohru ISHIHARA
Recent embedded systems require both traditional machinery control and information processing, such as network and GUI handling. A dual-OS platform consolidates a real-time OS (RTOS) and general-purpose OS (GPOS) to realize efficient software development on one physical processor. Although the dual-OS platform attracts increasing attention, it often suffers from energy inefficiency in the GPOS for guaranteeing real-time responses of the RTOS. This paper proposes an energy minimization method called DVFS virtualization, which allows running multiple DVFS policies dedicated to the RTOS and GPOS, respectively. The experimental evaluation using a commercial microcontroller showed that the proposed hardware could change the supply voltage within 500 ns and reduce the energy consumption of typical applications by 60 % in the best case compared to conventional dual-OS platforms. Furthermore, evaluation using a commercial microprocessor achieved a 15 % energy reduction of practical open-source software at best.
Kyohei MURAKATA Koichi KOBAYASHI Yuh YAMASHITA
The multi-agent surveillance problem is to find optimal trajectories of multiple agents that patrol a given area as evenly as possible. In this paper, we consider the multi-agent surveillance problem based on travel cost minimization. The surveillance area is given by an undirected graph. The penalty for each agent is introduced to evaluate the surveillance performance. Through a mixed logical dynamical system model, the multi-agent surveillance problem is reduced to a mixed integer linear programming (MILP) problem. In model predictive control, trajectories of agents are generated by solving the MILP problem at each discrete time. Furthermore, a condition that the MILP problem is always feasible is derived based on the Chinese postman problem. Finally, the proposed method is demonstrated by a numerical example.
Siyi HU Makiko ITO Takahide YOSHIKAWA Yuan HE Hiroshi NAKAMURA Masaaki KONDO
Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite having efficient storage formats for sparse data (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both integer and floating-point data used in SpMV kernels are handled plainly, without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence conditions in some typical scientific computation benchmarks (based on SpMV kernels) are not degraded when adopting lower-precision floating-point data. Based on these findings, in this work, we propose a simple yet effective data compression scheme that can be extended to general-purpose computing architectures or, preferably, HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of final results.
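For readers unfamiliar with the storage format mentioned above, the following sketch shows a plain CSR SpMV loop and how much memory traffic the value array accounts for at two floating-point precisions. It does not implement the proposed compression scheme; the small matrix and the function name `spmv_csr` are illustrative assumptions.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows, dtype=np.result_type(values, x))
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Only the nonzeros of row i are read; x is gathered through col_idx.
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# A = [[4, 0, 1],
#      [0, 0, 0],
#      [0, 2, 3]]
values = np.array([4.0, 1.0, 2.0, 3.0])
col_idx = np.array([0, 2, 1, 2])
row_ptr = np.array([0, 2, 2, 4])
x = np.array([1.0, 2.0, 3.0])

y = spmv_csr(values, col_idx, row_ptr, x)

# Storing the nonzeros in single precision halves the bytes moved for them.
values_fp32 = values.astype(np.float32)
bytes_fp64, bytes_fp32 = values.nbytes, values_fp32.nbytes
```

Each nonzero is touched once per multiplication, so the kernel is typically limited by how fast `values`, `col_idx`, and `x` can be streamed from memory rather than by arithmetic.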
Shinji NIMURA Shota ISHIMURA Kazuki TANAKA Kosuke NISHIMURA Ryo INOHARA
In 5th generation (5G) and Beyond 5G mobile communication systems, it is expected that numerous antennas will be densely deployed to realize ultra-broadband communication and uniform coverage. However, as the number of antennas increases, the total power consumption of all antennas will also increase, which has a negative impact on the environment and on the operating costs of telecommunication operators. Thus, it is necessary to simplify the antenna structure to suppress the power consumption of each antenna. Meanwhile, as a way to realize ultra-broadband communication, millimeter waves will be utilized because they can transmit signals with a broader bandwidth than lower frequencies. However, since millimeter waves have a large propagation loss, their propagation distance is shorter than that of lower frequencies. Therefore, in order to extend the propagation distance, it is necessary to increase the equivalent isotropic radiated power by beamforming with a phased array antenna. In this paper, a phased array antenna module combined with analog radio over fiber (A-RoF) technology for 40-GHz millimeter waves is developed and evaluated for the first time. An 8×8 phased array antenna for 40-GHz millimeter waves with integrated photodiodes and RF chains has been developed, and an end-to-end transmission experiment including 20-km A-RoF transmission and 3-m over-the-air transmission from the developed phased array antenna has been conducted. The results showed that the 40-GHz RF signal after the end-to-end transmission satisfied the 3GPP signal quality requirements within ±50 degrees of the main beam direction.
Bandpass filters (BPFs) are very important for extracting target signals and eliminating noise from received signals. A BPF whose frequency characteristic is a sum of Gaussian functions is called a Gaussian mixture BPF (GMBPF). In this research, we propose to implement the GMBPF approximately as the sum of several frequency components of the sliding Fourier transform (SFT) or the attenuated SFT (ASFT). Because a component of the SFT/ASFT can be approximately realized using finite impulse response (FIR) recursive filters, its computational complexity does not depend on the length of the impulse response. This property makes the GMBPF ideal for narrow bandpass filtering applications. We conducted experiments to demonstrate the advantages of the proposed GMBPF over FIR filters designed by a MATLAB function with regard to computational complexity.
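The claim that the cost per output sample does not grow with the window length can be seen from the textbook sliding DFT recurrence, sketched below. This is a generic discrete-time stand-in, not the paper's SFT/ASFT formulation or its GMBPF design; the function name `sliding_dft_bin` and the test signal are assumptions.

```python
import numpy as np

def sliding_dft_bin(x, window_len, k):
    """Track DFT bin k over a sliding window using a recursive update.

    Each new sample costs one complex multiply and two additions,
    regardless of window_len. Samples before x[0] are treated as zero.
    """
    twiddle = np.exp(2j * np.pi * k / window_len)
    out = np.empty(len(x), dtype=complex)
    acc = 0j
    for n, sample in enumerate(x):
        oldest = x[n - window_len] if n >= window_len else 0.0
        # Add the newest sample, drop the oldest, then rotate the phase.
        acc = twiddle * (acc + sample - oldest)
        out[n] = acc
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
N, k = 16, 3
bin_track = sliding_dft_bin(x, N, k)

# Reference: a direct FFT of the most recent N samples.
direct = np.fft.fft(x[len(x) - N:])[k]
```

Once the window is full, `bin_track[n]` matches bin `k` of a direct DFT over the latest `N` samples. A filter built as a weighted sum of a few such bins keeps the same per-sample cost per bin.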
Atsushi MATSUO Wakaki HATTORI Shigeru YAMASHITA
Mixed-Polarity Multiple-Control Toffoli (MPMCT) gates are generally used to implement large control logic functions for quantum computation. A logic circuit consisting of MPMCT gates needs to be mapped to a quantum computing device that invariably has physical limitations, which means we need to (1) decompose the MPMCT gates into one- or two-qubit gates, and then (2) insert SWAP gates so that all the gates can be performed on Nearest Neighbor Architectures (NNAs). To date, these two processes have only been studied independently. In this work, we show that the total number of gates in a circuit can be decreased if the two processes are considered simultaneously as a single step. Unlike most existing methods, our method inserts SWAP gates while decomposing MPMCT gates. We also take into account the effect on the latter part of a circuit by considering the qubit placement when decomposing an MPMCT gate. Experimental results demonstrate the effectiveness of our method.
Chongren ZHAO Yinhui ZHANG Zifen HE Yunnan DENG Ying HUANG Guangchen CHEN
To address the dispersion and dislocation of spatial focus regions in feature pyramid networks and the insufficient acquisition of feature dependencies in both the spatial and channel dimensions, this paper proposes a spatial-temporal aggregated shuffle attention for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activated features from the low-level target area of the feature pyramid into the high level, so as to aggregate spatial information on the target area. Taking advantage of the coherent information in video frames, STASA-VIS uses the first of every 5 video frames as the keyframe, propagates the keyframe feature maps of the pyramid layers forward in the time domain, and fuses them with the mixed subsampled features of the non-keyframes to achieve time-domain consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture the pixel-level pairwise relationships and the dimensional dependencies among the channels while reducing the computation. Experimental results show that the segmentation accuracy of STASA-VIS reaches 41.2% and the test speed reaches 34 FPS, which is better than state-of-the-art one-stage video instance segmentation (VIS) methods in accuracy and achieves real-time segmentation.
Kenshiro KATO Daichi WATARI Ittetsu TANIGUCHI Takao ONOYE
Solar energy is an important energy resource for a sustainable society and is being introduced on a massive scale. Households generally sell their excess solar energy through reverse power flow, but massive reverse power flow usually sacrifices grid stability. In order to utilize renewable energy effectively and reduce solar energy waste, electric vehicles (EVs) play an important role in filling the spatiotemporal gap of solar energy. This paper proposes a novel EV aggregation framework for spatiotemporal shifting of solar energy without any reverse power flow. The proposed framework induces charging and discharging via an EV aggregator by intentionally changing the price, and the solar energy waste is expected to be reduced by the energy trade. Simulation results show that the proposed framework reduced the solar energy waste by 68%.
Yifang BAO Shigeru YAMASHITA Bing LI Tsung-Yi HO
When we use a Programmable Microfluidic Device (PMD), we need to wash contaminated areas before the chip can be used for further experiments. Recently, a novel washing technique called Block-Flushing has been proposed. Block-Flushing washes contaminated areas in PMDs by using buffer flows. In Block-Flushing, we need to keep a buffer flow from an input port to an output port of a PMD for a long period to dissolve residual contaminants. Thus, we may need a large amount of buffer fluid and a long washing time even if the contaminated area is small. Another disadvantage of Block-Flushing is that buffer flows alone may not be able to completely clean residual contaminants at valves. To address these issues, this paper proposes a totally new idea for washing PMDs; our method does not use buffer flows, but washes contaminated areas by using mixers. By using a mixer, we can dissolve residual contaminants at valves in the area of the mixer very efficiently. In this paper, we propose two methods to wash PMDs by using mixers. The first method can wash the whole chip area in only four times the time taken by a single 2x2 mixer. The second method is a heuristic that reduces the number of moving valves, because valves may wear down if they are used many times. We also show experimental results confirming that the second method can indeed decrease the number of used valves.
Jie LI Sai LI Abdul Hayee SHAIKH
In this manuscript, we propose a joint channel and power assignment algorithm for an unmanned aerial vehicle (UAV) swarm communication system based on multi-agent deep reinforcement learning (DRL). Regarded as an agent, each UAV-to-UAV (U2U) link can choose the optimal channel and power according to the current situation after training is successfully completed. Further, a mixing network is introduced based on DRL, where the Q values of every single agent are non-linearly mapped, and we call it the QMIX algorithm. As it accesses state information, QMIX can learn to enrich the joint action value function. The proposed method can be used for both unicast and multicast scenarios. Experiments show that each U2U link can be trained to meet the constraints of UAV communication and minimize the interference to the system. For unicast communication, the communication rate is increased by up to 15.6% and 8.9% using the proposed DRL method compared with the well-known random and adaptive methods, respectively. For multicast communication, the communication rate is increased by up to 6.7% using the proposed QMIX method compared with the DRL method, and by up to 13.6% using the DRL method compared with the adaptive method. In addition, the successful transmission probability remains at a high level.
Yuncheng ZHANG Bangan LIU Teruki SOMEYA Rui WU Junjun QIU Atsushi SHIRANE Kenichi OKADA
This paper presents a fully integrated yet compact receiver front-end for sub-GHz applications such as the Internet of Things (IoT). The low noise amplifier (LNA) matching network leverages an inductance boosting technique. A relatively small on-chip inductor with a compact area achieves impedance matching at such a low frequency. Moreover, a passive-mixer-first mode bypasses the LNA to extend the receiver dynamic range. The passive mixer provides matching to the 50 Ω antenna interface to eliminate the need for additional passive components. Therefore, the receiver can be fully integrated without any off-chip matching components. The flipped-voltage-follower (FVF) cell is adopted in the low pass filter (LPF) and the variable gain amplifier (VGA) for its high linearity and low power consumption. Fabricated in a 65 nm LP CMOS process, the proposed receiver front-end occupies a 0.37 mm² core area, with a tolerable input power ranging from -91.5 dBm to -1 dBm for a 500 kbps GMSK signal at 924 MHz. The power consumption is 1 mW under a 1.2 V supply.
Numerous variable tap-length algorithms can be found in the literature, but few strategies are derived from a basic theoretical formula. Thus, some algorithms lack theoretical depth and their performance is unstable. In view of this, a novel variable tap-length algorithm based on a mixed error cost function is presented in this letter. By analyzing the mixed expectation of the prior and the posterior error, the novel variable tap-length strategy is derived. The performance analysis shows that the proposed algorithm comes closer to the optimal tap-length and has good convergence ability. It overcomes several deficiencies, including large fluctuations of the tap-length, high complexity, and weak steady-state ability. Simulation results demonstrate that the proposed algorithm achieves good performance.
Hao FANG Chi-Hua CHEN Dewang CHEN Feng-Jang HWANG
Aiming for accurate data-driven predictions of the passenger walking time, this study proposes a novel neuron-network-based mixture probability (NNBMP) model with repetition learning (RL) to estimate the probability density distribution of passenger walking time (PWT) in the metro station. Our experiments for Fuzhou metro stations demonstrate that the proposed NNBMP-RL model achieved a mean absolute error, mean square error, and mean absolute percentage error of 0.0078, 1.33 × 10⁻⁴, and 19.41%, respectively, and that it outperformed all seven compared models. The developed NNBMP model, which accurately fits the PWT distribution in the metro station, is readily applicable to the microscopic analyses of passenger flow.
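The three error measures reported above follow their standard definitions, which the short sketch below spells out. The sample values are made up for illustration and are unrelated to the paper's data; the function names are assumptions.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the errors."""
    return np.mean(np.abs(y_true - y_pred))

def mean_square_error(y_true, y_pred):
    """Average of the squared errors."""
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_percentage_error(y_true, y_pred):
    """Average relative error in percent; y_true must not contain zeros."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Made-up values, e.g. true and estimated density values at three points.
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_square_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
```

Because MAPE divides by the true value, it can be large even when MAE and MSE are small, which is common when the true values themselves are close to zero.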
Zhongqiang LUO Chaofu JING Chengjie LI
Nonnegative Matrix Factorization (NMF) is a promising data-driven matrix decomposition method that is becoming very active and attractive in the machine learning and blind source separation areas. So far, NMF algorithms have been widely used in diverse applications, including image processing, anti-collision for Radio Frequency Identification (RFID) systems, and audio signal analysis. However, typical NMF algorithms cannot work well for underdetermined mixtures, i.e., when the number of observed signals is less than that of source signals. In practical applications, adding suitable constraints to the NMF algorithm can achieve remarkable decomposition results. Motivated by this, this paper proposes to add minimum volume and minimum correlation constraints (MCV) to the NMF algorithm, which makes the new algorithm, named the MCV-NMF algorithm, suitable for underdetermined scenarios where the source signals satisfy the mutual independence assumption. Experimental simulation results validate that the MCV-NMF algorithm performs better in solving the RFID tag anti-collision problem than the most closely related typical NMF method.
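As background for the constrained variant described above, the sketch below shows the unconstrained baseline: NMF with the classic multiplicative updates for the Frobenius-norm objective. The minimum volume and minimum correlation constraints of MCV-NMF are not included, and the function name, iteration count, and toy matrix are assumptions.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (m x n) as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    No additional constraints are applied.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Element-wise updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy observation matrix built from two nonnegative sources.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf_multiplicative(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H)
```

Constrained variants typically add penalty terms to the objective, which changes the update rules; the baseline here only shows where such terms would enter.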
Hiroki NISHIMOTO Renyuan ZHANG Yasuhiko NAKASHIMA
An efficient implementation strategy for speeding up high-quality clustering algorithms is developed on the basis of general-purpose graphics processing units (GPGPUs) in this work. Among various clustering algorithms, a sophisticated Gaussian mixture model (GMM) whose parameters are estimated through a variational Bayesian (VB) mechanism is adopted for its superior performance. Since the VB-GMM methodology is computation-hungry, the GPGPU is employed to carry out massive matrix computations. To efficiently migrate the conventional CPU-oriented schemes of VB-GMM onto GPGPU platforms, an entire migration flow with thirteen stages is presented in detail. A CPU-GPGPU co-operation scheme, execution reordering, and memory access optimization are proposed for optimizing the GPGPU utilization and maximizing the clustering speed. Five types of real-world applications along with relevant data sets are introduced for the cross-validation. The experimental results verify the feasibility of implementing the VB-GMM algorithm on a GPGPU with practical benefits. The proposed GPGPU migration achieves a maximum speedup of 192x. Furthermore, it succeeded in identifying the proper number of clusters, which is difficult to achieve with the EM algorithm.
Naohiro TAWARA Atsunori OGAWA Tomoharu IWATA Hiroto ASHIKAWA Tetsunori KOBAYASHI Tetsuji OGAWA
Most conventional multi-source domain adaptation techniques for recurrent neural network language models (RNNLMs) are domain-centric. In these approaches, each domain is considered independently, which makes it difficult to apply the models to completely unseen target domains that are unobservable during training. Instead, our study exploits domain attributes, which represent common knowledge among such different domains as dialects, types of wordings, styles, and topics, to achieve domain generalization that can robustly represent unseen target domains by combining the domain attributes. To realize an attribute-based domain generalization system in language modeling, we introduce domain attribute-based experts, instead of domain-based experts, to a multi-stream RNNLM called the recurrent adaptive mixture model (RADMM). In the proposed system, a long short-term memory is independently trained on each domain attribute as an expert model. Then, by integrating the outputs from all the experts in response to the context-dependent weight of the domain attributes of the current input, we predict the subsequent words in the unseen target domain and exploit the specific knowledge of each domain attribute. To demonstrate the effectiveness of our proposed domain-attribute-centric language model, we experimentally compared the proposed model with a conventional domain-centric language model by using texts taken from multiple domains including different writing styles, topics, dialects, and types of wordings. The experimental results demonstrated that lower perplexity can be achieved using domain attributes.
Akihito HIRAI Kazutomi MORI Masaomi TSURU Mitsuhiro SHIMOZAWA
This paper demonstrates that a 360° radio-frequency phase detector consisting of a combination of symmetrical mixers and 45° phase shifters with tunable devices can achieve a low phase-detection error over a wide frequency range. It is shown that the phase detection error does not depend on the voltage gain of the 45° phase shifter. This allows the usage of tunable devices as 45° phase shifters for a wide frequency range with low phase-detection errors. The fabricated phase detector having tunable low-pass filters as the tunable device demonstrates phase detection errors lower than 2.0° rms in the frequency range from 3.0 GHz to 10.5 GHz.