Shunsuke TSUKADA Hikaru TAKAYASHIKI Masayuki SATO Kazuhiko KOMATSU Hiroaki KOBAYASHI
A hybrid memory architecture (HMA) that consists of several distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, the HMA needs metadata for data management because data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for metadata references. However, because the amount of metadata increases in proportion to the size of the HMA, the memory controller must handle a large amount of metadata. As a result, the memory controller cannot cache all the metadata, and the number of metadata references increases. This increases the access latency to reach the target data and degrades performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs. The proposed mechanism prefetches the metadata that will be needed in the near future. Moreover, to increase the effect of the metadata prefetching, the proposed mechanism predicts the metadata used in the near future based on an address difference, that is, the difference between two consecutive access addresses. The evaluation results show that the proposed metadata prefetching mechanism can improve the instructions per cycle by up to 44% and by 9% on average.
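The idea of predicting the next metadata reference from the difference between two consecutive access addresses can be illustrated with a small simulation. This is only a sketch under stated assumptions: the LRU cache model, the page-granular keys, and the names `MetadataCache` and `run` are hypothetical and are not taken from the paper, whose actual prediction and replacement policies are not described in the abstract.

```python
from collections import OrderedDict


class MetadataCache:
    """Tiny LRU cache of metadata entries keyed by page number (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _fill(self, page):
        # Insert the entry as most recently used and evict the oldest if full.
        self.entries[page] = True
        self.entries.move_to_end(page)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def access(self, page):
        # Demand reference: count a hit or a miss, then update recency.
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)
        else:
            self.misses += 1
            self._fill(page)

    def prefetch(self, page):
        # Prefetch does not change the hit/miss counters.
        if page not in self.entries:
            self._fill(page)


def run(trace, capacity, use_prefetch):
    """Replay a page trace; optionally prefetch page + (page - previous page)."""
    cache = MetadataCache(capacity)
    prev = None
    for page in trace:
        cache.access(page)
        if use_prefetch and prev is not None:
            delta = page - prev  # address difference of two consecutive accesses
            if delta != 0:
                cache.prefetch(page + delta)
        prev = page
    return cache.hits, cache.misses


if __name__ == "__main__":
    strided = list(range(0, 64, 4))  # constant-stride trace, assumed for illustration
    print("no prefetch  :", run(strided, 4, False))
    print("with prefetch:", run(strided, 4, True))
```

On a constant-stride trace the difference-based prediction is always right after the first two accesses, which is the best case; irregular traces would benefit far less.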
In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by employing small supervised signals. In particular, penalized SNMF (PSNMF) with orthogonality penalty is an effective method. PSNMF forces two basis matrices for target and nontarget sources to be orthogonal to each other and improves the separation accuracy. However, the conventional orthogonality penalty is based on an inner product and does not affect the estimation of the basis matrix properly because of the scale indeterminacy between the basis and activation matrices in NMF. To cope with this problem, a new PSNMF with cosine similarity between the basis matrices is proposed. The experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
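The scale indeterminacy argument can be checked numerically. The sketch below assumes two simple penalty forms, the sum of squared inner products and the sum of squared cosine similarities between basis columns; the exact penalty terms used in the paper are not given in the abstract, so the function names and formulas here are illustrative assumptions only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def inner_product_penalty(F_cols, H_cols):
    # Assumed form: sum of squared inner products between basis columns.
    return sum(dot(f, h) ** 2 for f in F_cols for h in H_cols)


def cosine_penalty(F_cols, H_cols):
    # Assumed form: sum of squared cosine similarities between basis columns.
    total = 0.0
    for f in F_cols:
        for h in H_cols:
            total += (dot(f, h) / (math.sqrt(dot(f, f)) * math.sqrt(dot(h, h)))) ** 2
    return total


def scale_cols(cols, c):
    # Rescale every basis column by c; in NMF the activations can absorb 1/c.
    return [[c * x for x in col] for col in cols]


# Hypothetical nonnegative basis columns for target (F) and nontarget (H) sources.
F_cols = [[1.0, 0.5, 0.0], [0.2, 0.1, 0.9]]
H_cols = [[0.9, 0.6, 0.1]]

if __name__ == "__main__":
    H_small = scale_cols(H_cols, 0.1)
    print("inner product:", inner_product_penalty(F_cols, H_cols),
          "->", inner_product_penalty(F_cols, H_small))
    print("cosine       :", cosine_penalty(F_cols, H_cols),
          "->", cosine_penalty(F_cols, H_small))
```

Shrinking the nontarget basis lowers the inner-product penalty without making the bases any more orthogonal, whereas the cosine-based value is unaffected by the rescaling.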
Tomohiro YAMAZAKI Hisashi KOGA
We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.
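The two ideas in this abstract, touching only the sets whose similarity can change and updating each value in constant time, can be sketched as follows. The abstract does not name the similarity measure or describe the structure of the frequency-based inverted lists, so this sketch assumes a plain inverted index and a simple overlap count (the number of window items contained in a set); the class name `SlidingOverlap` is hypothetical.

```python
from collections import defaultdict, deque


def build_inverted_index(sets):
    """Map each item to the ids of the database sets that contain it."""
    index = defaultdict(list)
    for sid, items in enumerate(sets):
        for item in items:
            index[item].append(sid)
    return index


class SlidingOverlap:
    """Maintain, for every set, how many of the latest W stream items it contains."""

    def __init__(self, sets, window_size):
        self.index = build_inverted_index(sets)
        self.window = deque()
        self.window_size = window_size
        self.overlap = [0] * len(sets)

    def push(self, item):
        # Only sets containing the arriving item can gain similarity.
        self.window.append(item)
        for sid in self.index.get(item, ()):
            self.overlap[sid] += 1  # O(1) update from the value at time T-1
        # Only sets containing the expiring item can lose similarity.
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            for sid in self.index.get(old, ()):
                self.overlap[sid] -= 1

    def top_k(self, k):
        # Plain sort for clarity; the paper's top-k maintenance is not reproduced here.
        order = sorted(range(len(self.overlap)), key=lambda s: (-self.overlap[s], s))
        return order[:k]


if __name__ == "__main__":
    db = [{"a", "b"}, {"b", "c"}, {"d"}]
    tracker = SlidingOverlap(db, window_size=3)
    for x in ["a", "b", "c", "d"]:
        tracker.push(x)
    print(tracker.overlap, tracker.top_k(1))
```

Sets that contain neither the arriving nor the expiring item are never visited, which is where the saving over recomputing every similarity value comes from.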
A variety of smart services are being provided on multiple virtual networks embedded into a common inter-cloud substrate network. The substrate network operator deploys critical substrate nodes so that multiple service providers can achieve enhanced services due to the secure sharing of their service data. Even if one of the critical substrate nodes incurs damage, resiliency of the enhanced services can be assured due to reallocation of the workload and periodic backup of the service data to the other normal critical substrate nodes. However, the connectivity of the embedded virtual networks must be maintained so that the enhanced services can be continuously provided to all clients on the virtual networks. This paper considers resilient virtual network embedding (VNE) that ensures the connectivity of the embedded virtual networks after critical substrate node failures have occurred. The resilient VNE problem is formulated using an integer linear programming model and a distance-based method is proposed to solve the large-scale resilient VNE problem efficiently. Simulation results demonstrate that the distance-based method can derive a sub-optimum VNE solution with a small computational effort. The method derived a VNE solution with an approximation ratio of less than 1.2 within ten seconds in all the simulation experiments.
A linear and broadband power amplifier (PA) for 5G phased arrays is presented. The design improves the linearity by operating the transistors in the deep class-AB region and broadens the bandwidth by applying an inter-stage weakly-coupled transformer. The theory of transformers is illustrated by analyzing the odd- and even-mode model. Based on this, the odd-mode Q factor is used to evaluate the quality of impedance matching. Weakly- and strongly-coupled transformers are compared and analyzed in both the design process and applicable characteristics. In addition, a well-founded method to achieve the transformer-based balanced-unbalanced transformation is proposed. The fully integrated two-stage PA is designed and implemented in a 65-nm CMOS process with a 1-V power supply to provide a maximum small-signal gain of 19 dB. The maximum output 1-dB compressed power (P1dB) of 17.4 dBm and the saturated output power (PSAT) of 18 dBm are measured at 28 GHz. The power-added efficiency (PAE) at P1dB is 26.5%. From 23 to 32 GHz, the measured P1dB is above 16 dBm, covering the potential 5G bands worldwide around 28 GHz.
Tomoya HASHIGUCHI Takehiro YAMAMOTO Sumio FUJITA Hiroaki OHSHIMA
In this study, we generate dialogue content in which two systems discuss their distress with each other. The user inputs sentences describing the environment and feelings of distress, and the system generates the dialogue content from the input. To generate such dialogues with deep learning, we created dialogue data about distress. The generative model is obtained by fine-tuning a pre-trained GPT model using the TransferTransfo method. The contribution of this study is the creation of a conversational dataset using publicly available data. This study used EmpatheticDialogues, an existing empathetic dialogue dataset, and Reddit r/offmychest, a public dataset of distress. The models fine-tuned on each dataset were evaluated both automatically (such as by the BLEU and ROUGE scores) and manually (such as by relevance and empathy) by human assessors.
Da LI Yuanyuan WANG Rikuya YAMAMOTO Yukiko KAWAI Kazutoshi SUMIYA
Recently, machine learning approaches and user movement history analysis on mobile devices have attracted much attention. Generally, text data must be fed into a word embedding tool to obtain word vectors as a preprocessing step for machine learning approaches. However, it is difficult for mobile devices to afford the huge cost of high-dimensional vector calculation. Thus, a low-cost approach to user behavior and user movement history analysis should be considered. To address this issue, firstly, we convert the zip code and street house number into vectors instead of using textual address information, which reduces the cost of spatial vector calculation. Secondly, we propose a low-cost, high-performance method for calculating semantic and physical (real) distances that uses zip-code-based vectors. Finally, to verify the validity of our proposed method, we use US zip code data to calculate both semantic and physical distances and compare the results with those of the previous method. The experimental results show that our proposed method can significantly improve the performance of distance calculation while keeping the cost low.
Chen CHEN Wence ZHANG Xu BAO Jing XIA
This paper studies the performance of quantized massive multiple-input multiple-output (MIMO) systems with superimposed pilots (SP), using linear minimum mean-square-error (LMMSE) channel estimation and maximum ratio combining (MRC) detection. In contrast to previous works, arbitrary-bit analog-to-digital converters (ADCs) are considered. We derive an accurate approximation of the uplink achievable rate considering the removal of estimated pilots. Based on the analytical expression, the optimal pilot power factor that maximizes the achievable rate is deduced and an expression for energy efficiency (EE) is given. In addition, the achievable rate and the optimal power allocation policy under some asymptotic limits are analyzed. Analysis shows that the systems with higher-resolution ADCs or larger number of base station (BS) antennas need to allocate more power to pilots. In contrast, more power needs to be allocated to data when the channel is slowly varying. Numerical results show that in the low signal-to-noise ratio (SNR) region, for 1-bit quantizers, SP outperforms time-multiplexed pilots (TP) in most cases, while for systems with higher-resolution ADCs, the SP scheme is suitable for the scenarios with comparatively small number of BS antennas and relatively long channel coherence time.
Wen SHAO Rei KAWAKAMI Takeshi NAEMURA
Previous studies on anomaly detection in videos have trained detectors in which reconstruction and prediction tasks are performed on normal data so that frames on which their task performance is low will be detected as anomalies during testing. This paper proposes a new approach that involves sorting video clips, by using a generative network structure. Our approach learns spatial contexts from appearances and temporal contexts from the order relationship of the frames. Experiments were conducted on four datasets, and we categorized the anomalous sequences by appearance and motion. Evaluations were conducted not only on each total dataset but also on each of the categories. Our method improved detection performance on both anomalies with different appearance and different motion from normality. Moreover, combining our approach with a prediction method produced improvements in precision at a high recall.
Sejin JUNG Eui-Sub KIM Junbeom YOO
Traditional safety analysis techniques have difficulty incorporating the dynamically changing structures of CPSs (Cyber-Physical Systems). STPA (System-Theoretic Process Analysis), one of the most widely used techniques, needs to unfold and arrange all hidden structures before beginning a full-fledged analysis. This paper proposes an intermediate model, the “Information Unfolding Model (IUM)”, and a process, the “Information Unfolding Process (IUP)”, to unfold dynamic structures that are hidden in CPSs and thereby help analysts construct control structures in STPA thoroughly.
Hao FANG Chi-Hua CHEN Dewang CHEN Feng-Jang HWANG
Aiming for accurate data-driven predictions of the passenger walking time, this study proposes a novel neuron-network-based mixture probability (NNBMP) model with repetition learning (RL) to estimate the probability density distribution of passenger walking time (PWT) in the metro station. Our experiments on Fuzhou metro stations demonstrate that the proposed NNBMP-RL model achieved a mean absolute error, mean square error, and mean absolute percentage error of 0.0078, 1.33 × 10⁻⁴, and 19.41%, respectively, and that it outperformed all seven compared models. The developed NNBMP model, which accurately fits the PWT distribution in the metro station, is readily applicable to microscopic analyses of passenger flow.
A hubness-score-based normalization of the pairwise similarity is proposed for sequence-alignment-based cover song retrieval. Hubness, which is the tendency of some data points in high-dimensional data sets to link more frequently to other points than the rest of the points in the set, is widely known to deteriorate information retrieval accuracy. This paper aims to relieve the performance degradation due to hubness by normalizing the pairwise similarity with a hubness score. Experiments on two cover song datasets confirm that the proposed similarity normalization improves the cover song retrieval accuracy.
Shinpei HAYASHI Keisuke ASANO Motoshi SAEKI
Goal refinement is a crucial step in goal-oriented requirements analysis to create a goal model of high quality. Poor goal refinement leads to missing requirements and eliciting incorrect requirements as well as less comprehensiveness of produced goal models. This paper proposes a technique to automate detecting bad smells of goal refinement, symptoms of poor goal refinement. At first, to clarify bad smells, we asked subjects to discover poor goal refinement concretely. Based on the classification of the specified poor refinement, we defined four types of bad smells of goal refinement: Low Semantic Relation, Many Siblings, Few Siblings, and Coarse Grained Leaf, and developed two types of measures to detect them: measures on the graph structure of a goal model and semantic similarity of goal descriptions. We have implemented a supporting tool to detect bad smells and assessed its usefulness by an experiment.
Yanyan ZHANG Meiling SHEN Wensheng YANG
We propose a target detection network (RMF-Net) based on the multi-scale strategy to solve the problems of large differences in the detection scale and mutual occlusion, which result in inaccurate locations. A multi-layer feature fusion module and a multi-expansion dilated convolution pyramid module were designed based on the ResNet-101 residual network. The ability of the network to express the multi-scale features of the target is improved by combining the shallow and deep features of the target and expanding the receptive field of the network. Moreover, RoI Align pooling was introduced to mitigate the loss of anchor-frame accuracy caused by multiple quantizations and thus improve positioning accuracy. Finally, an AD-IoU loss function was designed, which adaptively optimises the distance between the prediction box and the real box by comprehensively considering the overlap rate, centre distance, and aspect ratio between the boxes, and which improves the detection accuracy for occluded targets. Ablation experiments on the RMF-Net model verified the effectiveness of each factor in improving the network detection accuracy. Comparative experiments were conducted on the Pascal VOC2007 and Pascal VOC2012 datasets with various target detection algorithms based on convolutional neural networks. The results demonstrated that RMF-Net exhibited strong scale adaptability at different occlusion rates. The detection accuracy reached 80.4% and 78.5%, respectively.
Duc Minh NGUYEN Hiroshi SHIRAI
In this study, edge diffraction of an electromagnetic plane wave by two-dimensional conducting wedges has been analyzed by the physical optics (PO) method for both E and H polarizations. Non-uniform and uniform asymptotic solutions of diffracted fields have been derived. A unified edge diffraction coefficient has also been derived with four cotangent functions from the conventional angle-dependent coefficients. Numerical calculations have been made to compare the results with those by other methods, such as the exact solution and the uniform geometrical theory of diffraction (UTD). A good agreement has been observed to confirm the validity of our method.
With the rapid growth of computation requirements in the Internet of Things, resource-limited edge servers usually need to cooperate to perform tasks. Most related studies assume a static cooperation approach, which might not suit the dynamic environment of edge computing. In this paper, we consider a dynamic cooperation approach that guides edge servers to form coalitions dynamically. It raises two issues: 1) how to guide them to optimally form coalitions and 2) how to cope with the dynamic feature where server statuses change as the tasks are performed. The coalitional Markov decision process (CMDP) model proposed in our previous work can handle these issues well. However, its basic solution, coalitional Q-learning, cannot handle large-scale problems in which the number of tasks is large. Our response is to propose a novel algorithm called deep coalitional Q-learning (DCQL) to solve it. To sum up, we first formulate the dynamic cooperation problem of edge servers as a CMDP: each edge server is regarded as an agent and the dynamic process is modeled as an MDP where the agents observe the current state to form several coalitions. Each coalition takes an action that affects the environment, which correspondingly transitions to the next state, and the process repeats. Then, we propose DCQL, which includes a deep neural network and can therefore cope well with large-scale problems. DCQL can guide the edge servers to form coalitions dynamically with the target of optimizing some goal. Furthermore, we run experiments to verify the effectiveness of the proposed algorithm in different settings.
Xiuping PENG Hongxiao LI Hongbin LIN
In this letter, the almost binary sequence (a sequence with a single zero element) is considered as a special class of binary sequence. Four new bounds on the cross-correlation of balanced (almost) binary sequences with period Q ≡ 1 (mod 4) under the precondition of out-of-phase autocorrelation values {-1} or {1, -3} are first presented. Then, seven new pairs of balanced (almost) binary sequences of period Q with ideal or optimal autocorrelation values and meeting the lower cross-correlation bounds are proposed by using cyclotomic classes of order 4. These new bounds of (almost) binary sequences with period Q achieve smaller maximum out-of-phase autocorrelation values and cross-correlation values.
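The autocorrelation precondition {1, -3} can be made concrete with a small computation. The sketch below uses the Legendre sequence of prime period p ≡ 1 (mod 4), a standard textbook example chosen here as an assumption for illustration; it is not one of the constructions from cyclotomic classes of order 4 proposed in the letter. The function names are hypothetical.

```python
def periodic_correlation(a, b):
    """Periodic cross-correlation of two equal-length +/-1 sequences at every shift."""
    n = len(a)
    return [sum(a[i] * b[(i + t) % n] for i in range(n)) for t in range(n)]


def legendre_sequence(p):
    """+/-1 Legendre sequence of odd prime period p, with the first element set to +1."""
    residues = {(x * x) % p for x in range(1, p)}
    return [1] + [1 if i in residues else -1 for i in range(1, p)]


if __name__ == "__main__":
    for p in (5, 13):  # both primes satisfy p = 1 (mod 4)
        s = legendre_sequence(p)
        auto = periodic_correlation(s, s)  # autocorrelation is the case a == b
        print(p, "in-phase:", auto[0], "out-of-phase values:", sorted(set(auto[1:])))
```

The same `periodic_correlation` function applied to two different sequences gives the cross-correlation values that the bounds in the letter constrain.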
In this paper, we present a scheme to compute either AB or AB² multiplications over GF(2^m) and propose a bit-parallel systolic architecture based on the proposed algorithm. The AB multiplication algorithm is derived in the same form as the formula of the AB² multiplication algorithm, and an architecture that can perform AB multiplication by adding very little extra hardware to an AB² multiplier is designed. Therefore, the proposed architecture can be effectively applied to hardware-constrained applications that cannot deploy an AB² multiplier and an AB multiplier separately.
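For readers unfamiliar with the two operations, the following software sketch computes AB and AB² in GF(2^m) with the usual shift-and-add method in a polynomial basis. It only illustrates what the operations compute; it is an assumption-laden reference model, not the bit-parallel systolic architecture or the unified formula of the paper. Field elements are encoded as integers whose bits are polynomial coefficients, and `poly` is the irreducible polynomial including its x^m term.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m) modulo the irreducible polynomial poly."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce when the degree reaches m
    return result


def gf2m_ab2(a, b, m, poly):
    """Compute A * B^2 in GF(2^m) by squaring B and then multiplying by A."""
    return gf2m_mul(a, gf2m_mul(b, b, m, poly), m, poly)


if __name__ == "__main__":
    # GF(2^8) with x^8 + x^4 + x^3 + x + 1, the field used by AES.
    print(hex(gf2m_mul(0x57, 0x83, 8, 0x11B)))
    # GF(2^4) with x^4 + x + 1: x * x^3 = x^4 = x + 1.
    print(bin(gf2m_mul(0b0010, 0b1000, 4, 0b10011)))
    print(bin(gf2m_ab2(0b0010, 0b0010, 4, 0b10011)))
```

Computing AB² as two sequential multiplications, as done here, is exactly the cost that a dedicated AB² multiplier avoids in hardware.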
Kei FUJIMOTO Masashi KANEKO Kenichi MATSUI Masayuki AKUTSU
Packet processing on commodity hardware is a cost-efficient and flexible alternative to specialized networking hardware. However, virtualizing dedicated networking hardware as a virtual machine (VM) or a container on a commodity server results in performance problems, such as longer latency and lower throughput. This paper focuses on obtaining a low-latency networking system in a VM and a container. We reveal mechanisms that cause millisecond-scale networking delays in a VM through a series of experiments. To eliminate such delays, we design and implement a low-latency networking system, kernel busy poll (KBP), which achieves three goals: (1) microsecond-scale tail delays and higher throughput than conventional solutions are achieved in a VM and a container; (2) application customization is not required, so applications can use the POSIX sockets application program interface; and (3) KBP software does not need to be developed for every Linux kernel security update. KBP can be applied to both a VM configuration and a container configuration. Evaluation results indicate that KBP achieves microsecond-scale tail delays in both a VM and a container. In the VM configuration, KBP reduces maximum round-trip latency by more than 98% and increases the throughput by up to three times compared with existing NAPI and Open vSwitch with the Data Plane Development Kit (OvS-DPDK). In the container configuration, KBP reduces maximum round-trip latency by 21% to 96% and increases the throughput by up to 1.28 times compared with NAPI.
Caixia CAI Wenyang GAN Han HAI Fengde JIA
In this paper, to improve the energy efficiency (EE) of communication systems, multi-case optimization of two new transmission strategies is investigated. First, two new transmission strategies are designed using amplify-and-forward relaying and the full-duplex technique. The designed transmission strategies consider direct links and non-ideal transmission conditions. Detailed capacity and energy consumption analyses of the designed transmission strategies are also given. In addition, EE optimization and analysis of the designed transmission strategies are studied. The first case of EE optimization is achieved by joint optimization of transmit time (TT) and transmit power (TP). Furthermore, the second and third cases of EE optimization, which optimize TT and TP respectively, are given. Simulations reveal that the designed transmission strategies can effectively improve the communication system's EE.