Duc A. HOANG Akira SUZUKI Tsuyoshi YAGITA
A vertex subset I of a graph G is called a k-path vertex cover if every path on k vertices in G contains at least one vertex from I. The K-PATH VERTEX COVER RECONFIGURATION (K-PVCR) problem asks if one can transform one k-path vertex cover into another via a sequence of k-path vertex covers where each intermediate member is obtained from its predecessor by applying a given reconfiguration rule exactly once. We investigate the computational complexity of K-PVCR from the viewpoint of graph classes under the well-known reconfiguration rules: TS, TJ, and TAR. The problem for k=2, known as the VERTEX COVER RECONFIGURATION (VCR) problem, has been well-studied in the literature. We show that certain known hardness results for VCR on different graph classes can be extended for K-PVCR. In particular, we prove a complexity dichotomy for K-PVCR on general graphs: on those whose maximum degree is three (and even planar), the problem is PSPACE-complete, while on those whose maximum degree is two (i.e., paths and cycles), the problem can be solved in polynomial time. Additionally, we also design polynomial-time algorithms for K-PVCR on trees under each of TJ and TAR. Moreover, on paths, cycles, and trees, we describe how one can construct a reconfiguration sequence between two given k-path vertex covers in a yes-instance. In particular, on paths, our constructed reconfiguration sequence is shortest.
Kyungmin KIM Jiung SONG Jong Wook KWAK
We propose a novel synthetic-benchmarks generation model using partial time-series regression, called Partial-Regression-Integrated Generic Model (PRIGM). PRIGM abstracts the unique characteristics of the input sensor data into generic time-series data confirming the generation similarity and evaluating the correctness of the synthetic benchmarks. The experimental results obtained by the proposed model with its formula verify that PRIGM preserves the time-series characteristics of empirical data in complex time-series data within 10.4% on an average difference in terms of descriptive statistics accuracy.
We consider a regulation problem for an uncertain chain of integrators with an unknown time-varying delay in the input. To deal with uncertain parameters and unknown delay, we propose an adaptive event-triggered controller with a dynamic gain. We show that the system is globally regulated and interexecution times are lower bounded. Moreover, we show that these lower bounds can be enlarged by adjusting a control parameter. An example is given for clear illustration.
This letter presents an innovative solution for real-time interaction during online classes. Synchronous sharing enables instructors to provide real-time feedback to students. This encourages students to stay focused and feel engaged during class. Consequently, students evaluated anonymously that this solution significantly enhanced their learning experience during real-time online classes.
Kentaro KAWAKAMI Kouji KURIHARA Masafumi YAMAZAKI Takumi HONDA Naoto FUKUMOTO
To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture. The A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically creates the execution code for the computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for x86_64 architecture. To port oneDNN to A64FX, it must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the number of steps to be rewritten exceeds several tens of thousands of lines. This study presents the Xbyak_translator_aarch64. Xbyak_translator_aarch64 is a binary translator that at runtime converts dynamically produced executable codes for the x86_64 architecture into executable codes for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code for porting oneDNN to A64FX and allows us to port oneDNN to A64FX quickly.
Jerdvisanop CHAKAROTHAI Katsumi FUJII Yukihisa SUZUKI Jun SHIBAYAMA Kanako WAKE
In this study, we develop a numerical method for determining transient energy deposition in biological bodies exposed to electromagnetic (EM) pulses. We use a newly developed frequency-dependent finite-difference time-domain (FD2TD) method, which is combined with the fast inverse Laplace transform (FILT) and Prony method. The FILT and Prony method are utilized to transform the Cole-Cole model of biological media into a sum of multiple Debye relaxation terms. Parameters of Debye terms are then extracted by comparison with the time-domain impulse responses. The extracted parameters are used in an FDTD formulation, which is derived using the auxiliary differential equation method, and transient energy deposition into a biological medium is calculated by the equivalent circuit method. The validity of our proposed method is demonstrated by comparing numerical results and those derived from an analytical method. Finally, transient energy deposition into human heads of TARO and HANAKO models is then calculated using the proposed method and, physical insights into pulse exposures of the human heads are provided.
Chen CHEN Wence ZHANG Xu BAO Jing XIA
This paper studies the performance of quantized massive multiple-input multiple-output (MIMO) systems with superimposed pilots (SP), using linear minimum mean-square-error (LMMSE) channel estimation and maximum ratio combining (MRC) detection. In contrast to previous works, arbitrary-bit analog-to-digital converters (ADCs) are considered. We derive an accurate approximation of the uplink achievable rate considering the removal of estimated pilots. Based on the analytical expression, the optimal pilot power factor that maximizes the achievable rate is deduced and an expression for energy efficiency (EE) is given. In addition, the achievable rate and the optimal power allocation policy under some asymptotic limits are analyzed. Analysis shows that the systems with higher-resolution ADCs or larger number of base station (BS) antennas need to allocate more power to pilots. In contrast, more power needs to be allocated to data when the channel is slowly varying. Numerical results show that in the low signal-to-noise ratio (SNR) region, for 1-bit quantizers, SP outperforms time-multiplexed pilots (TP) in most cases, while for systems with higher-resolution ADCs, the SP scheme is suitable for the scenarios with comparatively small number of BS antennas and relatively long channel coherence time.
Hao FANG Chi-Hua CHEN Dewang CHEN Feng-Jang HWANG
Aiming for accurate data-driven predictions for the passenger walking time, this study proposes a novel neuron-network-based mixture probability (NNBMP) model with repetition learning (RL) to estimate the probability density distribution of passenger walking time (PWT) in the metro station. Our conducted experiments for Fuzhou metro stations demonstrate that the proposed NNBMP-RL model achieved the mean absolute error, mean square error, and mean absolute percentage error of 0.0078, 1.33 × 10-4, and 19.41%, respectively, and it outperformed all the seven compared models. The developed NNBMP model fitting accurately the PWT distribution in the metro station is readily applicable to the microscopic analyses of passenger flow.
Masaki NAKAMURA Shuki HIGASHI Kazutoshi SAKAKIBARA Kazuhiro OGATA
Because processes run concurrently in multitask systems, the size of the state space grows exponentially. Therefore, it is not straightforward to formally verify that such systems enjoy desired properties. Real-time constrains make the formal verification more challenging. In this paper, we propose the following to address the challenge: (1) a way to model multitask real-time systems as observational transition systems (OTSs), a kind of state transition systems, (2) a way to describe their specifications in CafeOBJ, an algebraic specification language, and (3) a way to verify that such systems enjoy desired properties based on such formal specifications by writing proof scores, proof plans, in CafeOBJ. As a case study, we model Fischer's protocol, a well-known real-time mutual exclusion protocol, as an OTS, describe its specification in CafeOBJ, and verify that the protocol enjoys the mutual exclusion property when an arbitrary number of processes participates in the protocol*.
This paper presents a novel method for optimal control of timed Petri nets, introducing a novel temporal logic based constraint called a generalized mutual exclusion temporal constraint (GMETC). The GMETC is described by a metric temporal logic (MTL) formula where each atomic proposition represents a generalized mutual exclusion constraint (GMEC). We formulate an optimal control problem of the timed Petri nets under a given GMETC and solve the problem by transforming it into an integer linear programming problem where the MTL formula is encoded by linear inequalities. We show the effectiveness of the proposed approach by a numerical simulation.
Hequn LI Jiaxi LU Jinfa WANG Hai ZHAO Jiuqiang XU Xingchi CHEN
Real-time and scalable multicast services are of paramount importance to Industrial Internet of Things (IIoT) applications. To realize these services, the multicast algorithm should, on the one hand, ensure the maximum delay of a multicast session not exceeding its upper delay bound. On the other hand, the algorithm should minimize session costs. As an emerging networking paradigm, Software-defined Networking (SDN) can provide a global view of the network to multicast algorithms, thereby bringing new opportunities for realizing the desired multicast services in IIoT environments. Unfortunately, existing SDN-based multicast (SDM) algorithms cannot meet the real-time and scalable requirements simultaneously. Therefore, in this paper, we focus on SDM algorithm design for IIoT environments. To be specific, the paper first converts the multicast tree construction problem for SDM in IIoT environments into a delay-bounded least-cost shared tree problem and proves that it is an NP-complete problem. Then, the paper puts forward a shared tree (ST) algorithm called SDM4IIoT to compute suboptimal solutions to the problem. The algorithm consists of five steps: 1) construct a delay-optimal shared tree; 2) divide the tree into a set of subpaths and a subtree; 3) optimize the cost of each subpath by relaxing the delay constraint; 4) optimize the subtree cost in the same manner; 5) recombine them into a shared tree. Simulation results show that the algorithm can provide real-time support that other ST algorithms cannot. In addition, it can achieve good scalability. Its cost is only 20.56% higher than the cost-optimal ST algorithm. Furthermore, its computation time is also acceptable. The algorithm can help to realize real-time and scalable multicast services for IIoT applications.
Denghui YAO Xiaoyong ZHANG Zhengbo SUN Dexiu HU
Long-term coherent integration can significantly improve the ability to detect maneuvering targets by radar. Especially for weak targets, longer integration times are needed to improve. But for non-radially moving targets, the time-varying angle between target moving direction and radar line of sight will cause non-linear range migration (NLRM) and non-linear Doppler frequency migration (NLDFM) within long-time coherent processing, which precludes existing methods that ignore angle changes, and seriously degrades the performance of coherent integration. To solve this problem, an efficient method based on Radon Fourier transform (RFT) with modified variant angle model (ARFT) is proposed. In this method, a new parameter angle is introduced to optimize the target motion model, and the NLRM and NLDFM are eliminated by range-velocity-angle joint three-dimensional searching of ARFT. Compared with conventional algorithms, the proposed method can more accurately compensate for the NLRM and NLDFM, thus achieving better integration performance and detection probability for non-radial moving weak targets. Numerical simulations verify the effectiveness and advantages of the proposed method.
Jing WANG Yiyu LUO Weiming YI Xiang XIE
Speech separation is the task of extracting target speech while suppressing background interference components. In applications like video telephones, visual information about the target speaker is available, which can be leveraged for multi-speaker speech separation. Most previous multi-speaker separation methods are mainly based on convolutional or recurrent neural networks. Recently, Transformer-based Seq2Seq models have achieved state-of-the-art performance in various tasks, such as neural machine translation (NMT), automatic speech recognition (ASR), etc. Transformer has showed an advantage in modeling audio-visual temporal context by multi-head attention blocks through explicitly assigning attention weights. Besides, Transformer doesn't have any recurrent sub-networks, thus supporting parallelization of sequence computation. In this paper, we propose a novel speaker-independent audio-visual speech separation method based on Transformer, which can be flexibly applied to unknown number and identity of speakers. The model receives both audio-visual streams, including noisy spectrogram and speaker lip embeddings, and predicts a complex time-frequency mask for the corresponding target speaker. The model is made up by three main components: audio encoder, visual encoder and Transformer-based mask generator. Two different structures of encoders are investigated and compared, including ResNet-based and Transformer-based. The performance of the proposed method is evaluated in terms of source separation and speech quality metrics. The experimental results on the benchmark GRID dataset show the effectiveness of the method on speaker-independent separation task in multi-talker environments. The model generalizes well to unseen identities of speakers and noise types. Though only trained on 2-speaker mixtures, the model achieves reasonable performance when tested on 2-speaker and 3-speaker mixtures. Besides, the model still shows an advantage compared with previous audio-visual speech separation works.
Tomoyuki FURUICHI Mizuki MOTOYOSHI Suguru KAMEDA Takashi SHIBA Noriharu SUEMATSU
To reduce the complexity of direct radio frequency (RF) undersampling real-time spectrum monitoring in wireless Internet of Things (IoT) bands (920MHz, 2.4GHz, and 5 GHz bands), a design method of sampling frequencies is proposed in this paper. The Direct RF Undersampling receiver architecture enables the use of ADC with sampling clock lower frequency than receiving RF signal, but it needs RF signal identification signal processing from folded spectrums with multiple sampling clock frequencies. The proposed design method allows fewer sampling frequencies to be used than the conventional design method for continuous frequency range (D.C. to 5GHz-band). The proposed method reduced 2 sampling frequencies in wireless IoT bands case compared with the continuous range. The design result using the proposed method is verified by measurement.
Yue MA Chen MIAO Yuehua LI Wen WU
Near-field beamforming has played an important role in many scenarios such as radar imaging and acoustic detection. In this paper, the near-field beamforming is implemented in the time modulated array with the harmonic. The beam pattern with a low sidelobe level in precise position is achieved by controlling the switching sequence in time modulated cross array. Numerical results verify the correctness of the proposed method.
The challenges posed by big data in the 21st Century are complex: Under the previous common sense, we considered that polynomial-time algorithms are practical; however, when we handle big data, even a linear-time algorithm may be too slow. Thus, sublinear- and constant-time algorithms are required. The academic research project, “Foundations of Innovative Algorithms for Big Data,” which was started in 2014 and will finish in September 2021, aimed at developing various techniques and frameworks to design algorithms for big data. In this project, we introduce a “Sublinear Computation Paradigm.” Toward this purpose, we first provide a survey of constant-time algorithms, which are the most investigated framework of this area, and then present our recent results on sublinear progressive algorithms. A sublinear progressive algorithm first outputs a temporary approximate solution in constant time, and then suggests better solutions gradually in sublinear-time, finally finds the exact solution. We present Sublinear Progressive Algorithm Theory (SPA Theory, for short), which enables to make a sublinear progressive algorithm for any property if it has a constant-time algorithm and an exact algorithm (an exponential-time one is allowed) without losing any computation time in the big-O sense.
Taito MANABE Taichi KATAYAMA Yuichiro SHIBATA
Line detection is the fundamental image processing technique which has various applications in the field of computer vision. For example, lane keeping required to realize autonomous vehicles can be implemented based on line detection technique. For such purposes, however, low detection latency and power consumption are essential. Using hardware-based stream processing is considered as an effective way to achieve such properties since it eliminates the need of storing the whole frame into energy-consuming external memory. In addition, adopting FPGAs enables us to keep flexibility of software processing. The line segment detector (LSD) is the algorithm based on intensity gradient, and performs better than the well-known Hough transform in terms of processing speed and accuracy. However, implementing the original LSD on FPGAs as a pipeline structure is difficult mainly because of its iterative region growing approach. Therefore, we propose a simple and stream-friendly line segment detection algorithm based on the concept of LSD. The whole system is implemented on a Xilinx Zynq-7000 XC7Z020-1CLG400C FPGA without any external memory. Evaluation results reveal that the implemented system is able to detect line segments successfully and is compact with 7.5% of Block RAM and less than 7.0% of the other resources used, while maintaining 60 fps throughput for VGA videos. It is also shown that the system is power-efficient compared to software processing on CPUs.
Ryota EGUCHI Naoki KITAMURA Taisuke IZUMI
In the rendezvous problem, two computing entities (called agents) located at different vertices in a graph have to meet at the same vertex. In this paper, we consider the synchronous neighborhood rendezvous problem, where the agents are initially located at two adjacent vertices. While this problem can be trivially solved in O(Δ) rounds (Δ is the maximum degree of the graph), it is highly challenging to reveal whether that problem can be solved in o(Δ) rounds, even assuming the rich computational capability of agents. The only known result is that the time complexity of O($O(sqrt{n})$) rounds is achievable if the graph is complete and agents are probabilistic, asymmetric, and can use whiteboards placed at vertices. Our main contribution is to clarify the situation (with respect to computational models and graph classes) admitting such a sublinear-time rendezvous algorithm. More precisely, we present two algorithms achieving fast rendezvous additionally assuming bounded minimum degree, unique vertex identifier, accessibility to neighborhood IDs, and randomization. The first algorithm runs within $ ilde{O}(sqrt{nDelta/delta} + n/delta)$ rounds for graphs of the minimum degree larger than $sqrt{n}$, where n is the number of vertices in the graph, and δ is the minimum degree of the graph. The second algorithm assumes that the largest vertex ID is O(n), and achieves $ ilde{O}left( rac{n}{sqrt{delta}} ight)$-round time complexity without using whiteboards. These algorithms attain o(Δ)-round complexity in the case of $delta = {omega}(sqrt{n} log n)$ and δ=ω(n2/3log4/3n) respectively. We also prove that four unconventional assumptions of our algorithm, bounded minimum degree, accessibility to neighborhood IDs, initial distance one, and randomization are all inherently necessary for attaining fast rendezvous. That is, one can obtain the Ω(n)-round lower bound if either one of them is removed.
Yuki MONMA Kan ARO Muneki YASUDA
In this study, Bayesian image denoising, in which the prior distribution is assumed to be a Gaussian Markov random field (GMRF), is considered. Recently, an effective algorithm for Bayesian image denoising with a standard GMRF prior has been proposed, which can help implement the overall procedure and optimize its parameters in O(n)-time, where n is the size of the image. A new GMRF-type prior, referred to as a hierarchical GMRF (HGMRF) prior, is proposed, which is obtained by applying a hierarchical Bayesian approach to the standard GMRF prior; in addition, an effective denoising algorithm based on the HGMRF prior is proposed. The proposed HGMRF method can help implement the overall procedure and optimize its parameters in O(n)-time, as well as the previous GMRF method. The restoration quality of the proposed method is found to be significantly higher than that of the previous GMRF method as well as that of a non-local means filter in several cases. Furthermore, numerical evidence implies that the proposed HGMRF prior is more suitable for the image prior than the standard GMRF prior.
Takumi KOMORI Yutaka MASUDA Jun SHIOMI Tohru ISHIHARA
In the upcoming Internet of Things era, reducing energy consumption of embedded processors is highly desired. Minimum Energy Point Tracking (MEPT) is one of the most efficient methods to reduce both dynamic and static energy consumption of a processor. Previous works proposed a variety of MEPT methods over the past years. However, none of them incorporate their algorithms with practical real-time operating systems, although edge computing applications often require low energy task execution with guaranteeing real-time properties. The difficulty comes from the time complexity for identifying an MEP and changing voltages, which often prevents real-time task scheduling. The conventional Dynamic Voltage and Frequency Scaling (DVFS) only scales the supply voltage. On the other hand, MEPT needs to adjust the body bias voltage in addition. This additional tuning knob makes MEPT much more complicated. This paper proposes an approximate MEPT algorithm, which reduces the complexity of identifying an MEP down to that of DVFS. The key idea is to linearly approximate the relationship between the processor frequency, supply voltage, and body bias voltage. Thanks to the approximation, optimal voltages for a specified clock frequency can be derived immediately. We also propose a task scheduling algorithm, which adjusts processor performance to the workload and then provides a soft real-time capability to the system. The operating system stochastically adjusts the average response time of the processor to be equal to a specified deadline. MEPT will be performed as a general task, and its overhead is considered in the calculation of the frequency. The experiments using a fabricated test chip and on-chip sensors show that the proposed algorithm is a maximum of 16 times more energy-efficient than DVFS. Also, the energy loss induced by the approximation is only 3% at most, and the algorithm does not sacrifice the fundamental real-time properties.