Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs) clustered with decision trees to generate speech parameters from linguistic features. However, decision trees cannot always model the complex context dependencies of linguistic features efficiently. An alternative scheme that replaces decision trees with deep neural networks (DNNs) has been presented as a possible way to overcome this difficulty. By training a network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents a joint probability of two sets of visible variables, is applied to describe the joint distribution of acoustic and linguistic features. Like a DNN, a DRM consists of several hidden layers, but it has two visible layers. Whereas a DNN represents feedforward dependencies from one set of visible variables (inputs) to another (outputs), a DRM can represent the bidirectional dependencies between the two sets of visible variables. During maximum-likelihood (ML) training, the model optimizes the parameters of its deep architecture (the connection weights between adjacent layers and the biases) by considering the bidirectional conversion between 1) acoustic features given linguistic features, and 2) linguistic features given the acoustic features it generates. Because it considers whether the generated acoustic features are recognizable, our method obtains reasonable parameters for speech synthesis. Experimental results on a speech synthesis task show that DNN-based systems pre-trained with our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data was limited. Additionally, speaker-dependent speech recognition experiments show that our method outperformed DNN-based systems when its initial parameters were set to the same values as in the synthesis experiments.
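As a point of reference, the baseline that this pre-training targets is a plain feedforward DNN from linguistic to acoustic features. The following minimal NumPy sketch illustrates only that feedforward mapping (the DRM itself is not reproduced here); the layer sizes, feature dimensions, and initialization are hypothetical.

```python
# Minimal sketch (not the authors' DRM): a feedforward DNN that maps
# linguistic features to acoustic features, the baseline the paper improves on.
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # small random weights and zero biases (hypothetical initialization)
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

W1, b1 = init_layer(300, 256)   # 300 linguistic features (assumed size)
W2, b2 = init_layer(256, 60)    # 60 acoustic features, e.g. mel-cepstrum (assumed)

def synthesize(linguistic):
    h = np.tanh(linguistic @ W1 + b1)   # hidden layer
    return h @ W2 + b2                  # acoustic features (linear output)

frame = rng.normal(size=300)            # one frame of linguistic features
acoustic = synthesize(frame)
print(acoustic.shape)                   # (60,)
```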
Shinnosuke SARUWATARI Fuyuki ISHIKAWA Tsutomu KOBAYASHI Shinichi HONIDEN
Refinement-based formal specification is a promising approach to coping with the increasing complexity of software systems, as demonstrated in the formal method Event-B. It allows stepwise modeling and verification of complex systems through multiple steps at different abstraction levels. However, making changes to such models is difficult, as caution is necessary to avoid breaking the consistency between the steps. Judging whether a change is valid is a non-trivial task, as the logical dependency relationships between the modeling elements (predicates) are implicit and complex. In this paper, we propose a method for analyzing the impact of changes on Event-B models. By attaching labels to the modeling elements (predicates), the method helps engineers understand how a model is structured and what needs to be modified to accomplish a change.
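The labeling scheme itself is not detailed in the abstract, but the underlying idea of tracing impact through predicate dependencies can be sketched as a reachability query over an explicit dependency graph. The graph and predicate names below are hypothetical.

```python
# Minimal sketch (assumed structure): propagate a change through an explicit
# dependency graph between predicates to estimate its impact set.
deps = {            # hypothetical: predicate -> predicates that depend on it
    'inv1': ['inv2', 'grd1'],
    'inv2': ['act1'],
    'grd1': [],
    'act1': [],
}

def impact(changed):
    # depth-first traversal collecting every predicate reachable from the change
    seen, stack = set(), [changed]
    while stack:
        p = stack.pop()
        for q in deps.get(p, []):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

print(impact('inv1'))   # {'inv2', 'grd1', 'act1'}
```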
Yinwei ZHAN Yaodong LI Zhuo YANG Yao ZHAO Huaiyu WU
The heat map is an important tool for eye tracking data analysis and visualization. It intuitively expresses the area watched by an observer, but it ignores saccade information, which expresses gaze shifts. Building on the conventional heat map generation method, this paper presents a novel heat map generation method for eye tracking data. The proposed method introduces a mixed data structure of fixation points and saccades, and deforms the heat map for saccade-type data. It has the advantage of indicating the direction of gaze transitions while visualizing the gaze region.
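A minimal sketch of the idea, assuming a simple deformation scheme: fixations are stamped as isotropic Gaussians as in a conventional heat map, while a saccade elongates the map along the path between two fixations. The kernel widths and weights are illustrative, not the paper's.

```python
import numpy as np

H, W = 120, 160
heat = np.zeros((H, W))
ys, xs = np.mgrid[0:H, 0:W]

def add_fixation(x, y, duration, sigma=8.0):
    # isotropic Gaussian weighted by fixation duration (conventional heat map)
    heat[:] += duration * np.exp(-((xs - x)**2 + (ys - y)**2) / (2 * sigma**2))

def add_saccade(x0, y0, x1, y1, weight=0.3, sigma=4.0):
    # deform the map along the saccade path by stamping small Gaussians
    # at points sampled between the two fixations (assumed deformation scheme)
    for t in np.linspace(0.0, 1.0, 20):
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        heat[:] += weight * np.exp(-((xs - x)**2 + (ys - y)**2) / (2 * sigma**2))

add_fixation(40, 60, duration=1.0)
add_saccade(40, 60, 120, 30)
add_fixation(120, 30, duration=0.6)
```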
Chun-Yu LIU Shu-Nung YAO Ying-Jen CHEN
With advances in information technology and the development of big data, manual operation is unlikely to remain a smart choice for stock market investing. Instead, computer-based investment models are expected to provide investors with more accurate strategic analysis and more effective investment decisions than human beings. This paper aims to improve investor profits by mining stock data for critical information, thereby supporting big data analysis. We used the R language to compute technical indicators for the stock market, and then applied these indicators to prediction. The proposed R package includes several analysis toolkits, such as trend line indicators, W-type reversal patterns, V-type reversal patterns, and bull or bear market detection. The simulation results suggest that the developed R package can accurately capture the tendency of the price and enhance the return on investment.
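The package itself is written in R; purely as an illustration, the following Python sketch shows the simplest kind of indicator in that family, a moving-average trend signal, on toy prices.

```python
import numpy as np

# Minimal sketch (Python stand-in, not the R package): a simple
# moving-average trend indicator on hypothetical closing prices.
prices = np.array([10.0, 10.2, 10.1, 10.6, 11.0, 10.8, 11.3, 11.5])

def sma(p, window=3):
    # simple moving average over a sliding window
    return np.convolve(p, np.ones(window) / window, mode='valid')

trend = sma(prices)
signal = "bull" if trend[-1] > trend[0] else "bear"
print(trend.round(2), signal)
```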
Temporal behavior is a primary aspect of business process executions. Herein, we propose a temporal outlier detection and analysis method for business processes. In particular, the method performs correlation analysis between the execution times of traces and activities to determine the type of activity that significantly influences the anomalous temporal behavior of a trace. To this end, we describe the modeling of temporal behaviors considering different control-flow patterns of business processes. An execution time matrix, containing the execution times of activities in all traces, is then constructed from the event logs. Based on this matrix, we perform temporal outlier detection and correlation-based analysis.
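A minimal sketch of the correlation step on a toy execution time matrix: trace totals are screened for outliers, and each activity's times are correlated with the totals to find the activity type driving the anomaly. The z-score threshold is an assumption; the paper's detection criterion may differ.

```python
import numpy as np

# rows: traces, columns: activities; entries: execution time of each
# activity in each trace (toy data standing in for an event-log-derived matrix)
times = np.array([[1.0, 2.1, 0.9],
                  [1.1, 2.0, 1.0],
                  [1.0, 7.5, 0.8],   # anomalous trace
                  [0.9, 2.2, 1.1]])
trace_total = times.sum(axis=1)

# flag traces whose total time deviates strongly from the mean (z-score)
z = (trace_total - trace_total.mean()) / trace_total.std()
outliers = np.where(np.abs(z) > 1.5)[0]

# correlate each activity's times with the trace totals to find the
# activity type that drives the anomalous temporal behavior
for a in range(times.shape[1]):
    r = np.corrcoef(times[:, a], trace_total)[0, 1]
    print(f"activity {a}: correlation with trace time = {r:.2f}")
print("outlier traces:", outliers)
```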
Fei GUO Yuan YANG Yang XIAO Yong GAO Ningmei YU
Currently, the visual perceptions generated by visual prostheses have low resolution, with uncontrolled color and restricted grayscale. This severely restricts the ability of prosthetic implants to support visual tasks in daily scenes. Some studies have explored existing image processing techniques to improve the perception of objects in prosthetic vision. However, most of them extract moving objects and optimize visual percepts only in general dynamic scenes, so the application of visual prostheses in highly dynamic daily-life scenes remains greatly limited. Hence, in this study, a novel unsupervised moving object segmentation model is proposed to automatically extract moving objects in highly dynamic scenes. In this model, foreground cues based on spatiotemporal edge features and background cues based on the boundary prior are exploited, and a moving object proximity map is generated for the dynamic scene according to the manifold ranking function. Moreover, the foreground and background cues are ranked simultaneously, and the moving objects are extracted by integrating the two ranking maps. The evaluation experiment indicates that, compared with other methods, the proposed method can uniformly highlight moving objects and preserve their boundaries in highly dynamic scenes. Based on this model, two optimization strategies are proposed to improve the perception of moving objects under simulated prosthetic vision. Experimental results demonstrate that introducing optimization strategies based on the moving object segmentation model can efficiently segment and enhance moving objects in highly dynamic scenes, and significantly improve the recognition performance of moving objects for the blind.
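The manifold ranking function mentioned here is commonly written in closed form as f = (I − αS)⁻¹y, where S is a normalized affinity matrix and y marks the seed nodes. A toy sketch on a five-node affinity graph, with a hypothetical seed vector standing in for the foreground/background cues:

```python
import numpy as np

# Minimal sketch (toy graph, not the full model): the standard manifold
# ranking function f = (I - alpha*S)^(-1) y used to rank nodes against seeds.
Wg = np.array([[0, 1, 1, 0, 0],
               [1, 0, 1, 0, 0],
               [1, 1, 0, 1, 0],
               [0, 0, 1, 0, 1],
               [0, 0, 0, 1, 0]], float)    # affinity between 5 superpixels
d = Wg.sum(axis=1)
S = Wg / np.sqrt(np.outer(d, d))           # symmetric normalization D^-1/2 W D^-1/2
alpha = 0.99
y = np.array([0, 0, 0, 0, 1.0])            # hypothetical seed: node 4 as a cue
f = np.linalg.solve(np.eye(5) - alpha * S, y)   # closed-form ranking scores
print(f.round(3))   # nodes closer to the seed on the manifold rank higher
```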
Takuya KOYANAGI Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA
Body bias generators are useful circuits that can reduce variability and power dissipation in LSI circuits. However, the amplifier in a body bias generator is difficult to design because of its complexity. To overcome this difficulty, this paper proposes a cell-based design method for the amplifier that is clearer than existing cell-based design methods. The proposed method is based on a simple analytical model, which makes it easy to design the amplifiers under various operating conditions. First, we introduce a small-signal equivalent circuit of two-stage amplifiers, with which we approximate a three-stage amplifier, and present a method for determining its design parameters based on the analytical model. Second, we propose a method for tuning parameters such as cell-based phase compensation elements and the drive strength of the output stage. Finally, based on test chip measurements, we show the advantages of the body bias generator designed in our cell-based flow over existing designs.
Shun-ichiro OHMI Yuya TSUKAMOTO Rengie Mark D. MAILIG
In this paper, we investigate the etching selectivity of an HfN encapsulating layer for the formation of high-quality PtHf-alloy silicide (PtHfSi) with low contact resistivity on Si(100). The HfN(10 nm)/PtHf(20 nm)/p-Si(100) stacked layer was deposited in situ by RF-magnetron sputtering at room temperature. Silicidation was then carried out at 500°C for 20 min in an N2/4.9%H2 ambient. Next, the HfN encapsulating layer was etched for 1-10 min by buffered HF (BHF), followed by etching of the unreacted PtHf metal. We found that the etching duration of the 10-nm-thick HfN encapsulating layer should be shorter than 6 min to maintain the PtHfSi crystallinity. This is probably because the PtHf-alloy silicide, especially its Hf atoms, was gradually etched by the BHF after the HfN was completely removed. The optimized etching process, together with the dopant segregation process, realized ultra-low contact resistivities of PtHfSi to p+/n-Si(100) and n+/p-Si(100) of 9.4×10^-9 Ω·cm^2 and 4.8×10^-9 Ω·cm^2, respectively. Controlling the etching duration of the HfN encapsulating layer is important for realizing high-quality PtHfSi formation with low contact resistivity.
Wenting WEI Kun WANG Gu BAN Keming FENG Xuan WANG Huaxi GU
Network virtualization is viewed as a promising approach to facilitating the sharing of physical infrastructure among different kinds of users and applications. In this letter, we propose topological consistency-based virtual network embedding (TC-VNE) over elastic optical networks (EONs). Based on the concept of topological consistency, we propose a new node ranking approach, named Sum-N-Rank, which contributes to reducing the optical path length between preferred substrate nodes. Simulation results show that our approach improves spectral efficiency and balances link load simultaneously without degrading the blocking probability.
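Sum-N-Rank itself is not specified in the abstract; for orientation, the sketch below shows the common VNE baseline ranking, in which a substrate node's rank is its CPU capacity times the total bandwidth of its adjacent links. All capacities are hypothetical.

```python
# Minimal sketch (a common VNE baseline, not the paper's Sum-N-Rank):
# rank a substrate node by CPU capacity times total adjacent bandwidth.
nodes = {'A': 10.0, 'B': 6.0, 'C': 8.0}                         # hypothetical CPU
links = {('A', 'B'): 50.0, ('B', 'C'): 30.0, ('A', 'C'): 20.0}  # hypothetical bandwidth

def rank(n):
    bw = sum(b for e, b in links.items() if n in e)
    return nodes[n] * bw

for n in sorted(nodes, key=rank, reverse=True):
    print(n, rank(n))
```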
Teruo TANIMOTO Takatsugu ONO Koji INOUE
Correctly understanding microarchitectural bottlenecks is important for optimizing the performance and energy of OoO (Out-of-Order) processors. Although the CPI (Cycles Per Instruction) stack has been utilized for this purpose, it stacks architectural events heuristically by counting how many times each event occurs, and the order of stacking affects the result, which may be misleading. This is because the CPI stack does not consider the execution path of dynamic instructions. Critical path analysis (CPA) is a well-known method for identifying the critical execution path of dynamic instruction execution on OoO processors. The critical path consists of the sequence of events that determines the execution time of a program on a certain processor. We develop a novel representation called the CPCI stack (Cycles Per Critical Instruction stack), which is a CPI stack based on CPA. The main challenge in constructing the CPCI stack is how to analyze a large number of paths, because CPA often yields numerous critical paths. In this paper, we show that there are more than 10^10 critical paths in the execution of only one thousand instructions in 35 of the 48 benchmarks from SPEC CPU2006. We then propose a statistical method that analyzes all the critical paths and present a case study using these benchmarks.
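To see why the number of critical paths explodes, note that whenever two incoming dependence chains tie in length, every critical path through either chain remains critical. A minimal sketch on a hypothetical dependence DAG, counting longest paths by dynamic programming:

```python
# Minimal sketch: counting critical (longest) paths in a dependence DAG,
# the kind of enumeration the paper shows becomes intractable.
from functools import lru_cache

edges = {                 # hypothetical graph: node -> [(successor, latency)]
    'a': [('b', 1), ('c', 2)],
    'b': [('d', 2)],
    'c': [('d', 1)],
    'd': [],
}

@lru_cache(maxsize=None)
def longest(n):
    # length of the longest path starting at n
    return max((lat + longest(s) for s, lat in edges[n]), default=0)

@lru_cache(maxsize=None)
def count(n):
    # number of distinct longest paths starting at n
    succ = [s for s, lat in edges[n] if lat + longest(s) == longest(n)]
    return sum(count(s) for s in succ) if succ else 1

print(longest('a'), count('a'))   # 3 2: both a->b->d and a->c->d are critical
```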
Yuto MATSUNAGA Tetsuya KOJIMA Naofumi AOKI Yoshinori DOBASHI Tsuyoshi YAMAMOTO
We have proposed a novel concept for a digital watermarking technique for music data that focuses on the use of sound synthesis and sound effect techniques. This paper describes the details of our proposed technique, which employs the distortion effect, one of the most common sound effects, frequently applied to guitar and bass instruments. The paper reports experimental results evaluating the resistance of the proposed technique against some basic malicious attacks: MP3 coding, tempo alteration, pitch alteration, and high-pass filtering. The results demonstrate that the proposed technique has adequate resistance against these attacks except for the high-pass filtering attack. A technique for increasing the resistance against the high-pass filtering attack is also discussed.
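The embedding rule is not reproduced here, but the carrier effect is a standard waveshaper. A minimal sketch of the soft-clipping distortion commonly applied to guitar and bass signals, with an illustrative gain setting:

```python
import numpy as np

# Minimal sketch: a soft-clipping waveshaper of the kind used as a guitar/bass
# distortion effect (the carrier effect in the proposed scheme; the watermark
# embedding rule itself is not reproduced here).
def distort(x, gain=8.0):
    return np.tanh(gain * x) / np.tanh(gain)   # normalized soft clipping

t = np.linspace(0, 1, 48000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 110 * t)      # 110 Hz tone (bass A)
driven = distort(clean)
print(np.max(np.abs(driven)))                  # peaks pushed toward full scale
```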
Daiki SEKIZAWA Shinnosuke TAKAMICHI Hiroshi SARUWATARI
This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. In the proposed model, to improve naturalness while preserving speaker individuality in Chinese-accented Japanese text-to-speech synthesis, we partially utilize the HMM parameters of native Japanese speech to synthesize prosody-corrected speech. Results of an experimental evaluation demonstrate that duration and F0 correction are significantly effective in improving naturalness.
Shu FUJITA Keita TAKAHASHI Toshiaki FUJII
We propose a method for extracting multi-view images from a light field (plenoptic) camera that accurately handles the physical pixel arrangement of the camera. We use a Lytro Illum camera to obtain 4D light field data (a set of multi-viewpoint images) through a micro-lens array. The light field data are multiplexed on a single image sensor, and thus the data must first be demultiplexed into a set of multi-viewpoint (sub-aperture) images. However, the demultiplexing process usually involves interpolation of the original data, such as demosaicing for the color filter array and pixel resampling for the hexagonal pixel arrangement of the original sub-aperture images. This interpolation adds information to, or removes it from, the original data. In contrast, we preserve the original data as faithfully as possible and use them directly for super-resolution reconstruction, in which the super-resolved image and the corresponding depth map are alternately refined. We experimentally demonstrate the effectiveness of our method for resolution enhancement through comparisons with the Light Field Toolbox and the Lytro Desktop Application. Moreover, we also discuss another type of light field camera, the Raytrix camera, and describe how it can be handled to extract high-quality multi-view images.
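For orientation, demultiplexing can be sketched on an idealized square microlens grid (unlike the Illum's actual hexagonal arrangement, which is precisely what the paper handles with care): the sub-aperture view (u, v) collects pixel (u, v) from under every microlens.

```python
import numpy as np

# Minimal sketch (idealized square microlens grid, no demosaicing or
# resampling): demultiplex a raw plenoptic image into sub-aperture views.
U = V = 5                                   # angular resolution per microlens
raw = np.arange(U * 30 * V * 40, dtype=float).reshape(U * 30, V * 40)

# each microlens covers a UxV pixel block; regroup so that axis order
# becomes (u, v, y, x), i.e. one (y, x) image per viewpoint (u, v)
views = raw.reshape(30, U, 40, V).transpose(1, 3, 0, 2)
print(views.shape)                          # (5, 5, 30, 40) sub-aperture images
```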
Jing ZHAO Yoshiharu ISHIKAWA Lei CHEN Chuan XIAO Kento SUGIURA
As big data attracts attention in a variety of fields, research on data exploration for analyzing large-scale scientific data has gained popularity. To support exploratory analysis of scientific data, effective summarization and visualization of the target data, as well as seamless cooperation with modern data management systems, are in demand. In this paper, we focus on exploration-based analysis of scientific array data and define a spatial V-Optimal histogram to summarize it, based on the notion of histograms in database research. We propose histogram construction approaches based on a general hierarchical partitioning as well as a more specific one, the l-grid partitioning, for effective and efficient data visualization in scientific data analysis. In addition, we implement the proposed algorithms on a state-of-the-art array DBMS, which is well suited to processing and managing scientific data. Experiments are conducted using massive evacuation simulation data for tsunami disasters, real taxi data, and synthetic data to verify the effectiveness and efficiency of our methods.
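The spatial histogram generalizes the classic 1-D V-Optimal histogram, which chooses bucket boundaries minimizing the total within-bucket variance. A minimal sketch of the standard O(n²B) dynamic program for the 1-D case, on toy data:

```python
import numpy as np

def v_optimal(data, B):
    # classic DP for a 1-D V-Optimal histogram: choose B buckets that
    # minimize the total within-bucket sum of squared errors (SSE)
    n = len(data)
    p = np.concatenate(([0.0], np.cumsum(data)))                 # prefix sums
    pp = np.concatenate(([0.0], np.cumsum(np.square(data))))     # prefix sums of squares
    def sse(i, j):   # SSE of the bucket data[i:j]
        s, s2, m = p[j] - p[i], pp[j] - pp[i], j - i
        return s2 - s * s / m
    INF = float('inf')
    dp = [[INF] * (B + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for b in range(1, min(B, j) + 1):
            for i in range(b - 1, j):
                dp[j][b] = min(dp[j][b], dp[i][b - 1] + sse(i, j))
    return dp[n][B]

print(v_optimal(np.array([1., 1., 1., 9., 9., 2.]), 3))   # 0.0: perfect 3-bucket split
```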
Masahiro KOHJIMA Tatsushi MATSUBAYASHI Hiroshi SAWADA
Due to the need to protect personal information and the impracticality of exhaustive data collection, there is an increasing need to deal with datasets of various levels of granularity, such as user-individual data and user-group data. In this study, we propose a new method for jointly analyzing multiple datasets with different granularity. The proposed method is a probabilistic model based on nonnegative matrix factorization, derived by introducing latent variables that represent the high-resolution data underlying the low-resolution data. Experiments on purchase logs show that the proposed method performs better than existing methods. Furthermore, by deriving an extension of the proposed method, we show that it provides a new fundamental approach for analyzing datasets with different granularity.
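As a point of reference, the non-probabilistic baseline the model builds on is plain NMF. A minimal sketch using the Lee-Seung multiplicative updates on a toy user-item matrix (the latent-variable extension for mixed granularity is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((8, 6))          # toy purchase-count matrix (users x items)
k = 3                           # number of latent factors (assumed)
Wf = rng.random((8, k)) + 0.1   # user factors
Hf = rng.random((k, 6)) + 0.1   # item factors

for _ in range(200):            # Lee-Seung multiplicative updates (Euclidean loss)
    Hf *= (Wf.T @ V) / (Wf.T @ Wf @ Hf + 1e-9)
    Wf *= (V @ Hf.T) / (Wf @ Hf @ Hf.T + 1e-9)

print(np.linalg.norm(V - Wf @ Hf))   # reconstruction error after factorization
```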
David ALEDO Benjamin CARRION SCHAFER Félix MORENO
This paper describes the advantages and disadvantages observed when describing complex parameterizable Artificial Neural Networks (ANNs) at the behavioral level using SystemC and at the Register Transfer Level (RTL) using VHDL. ANNs are complex to parameterize because they have a configurable number of layers, each with a unique configuration. This kind of structure makes ANNs, a priori, challenging to parameterize using Hardware Description Languages (HDLs). Thus, it seems intuitive that ANNs would benefit from the rise in abstraction level from RTL to the behavioral level. This paper presents the results of implementing an ANN at both levels of abstraction. The results surprisingly show that VHDL leads to better results and allows a much higher degree of parameterization than SystemC. The implementations of these parameterizable ANNs are open source and freely available online. Finally, we make some recommendations for future HLS tools to improve their parameterization capabilities.
This paper introduces a new noise generation algorithm for vocoder-based speech waveform generation. White noise is generally used for generating the aperiodic component. However, since short-term white noise includes a zero-frequency component (ZFC) and inaudible components below 20 Hz, these components must be reduced in advance during synthesis. To overcome this problem, we propose a new noise generation algorithm based on the algorithm for velvet noise. An objective evaluation demonstrated that the proposed algorithm can reduce the unwanted components.
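Classic velvet noise, which the proposed algorithm builds on, places one random ±1 impulse in each short grid period and leaves zeros elsewhere. A minimal sketch with an illustrative pulse density; the paper's modification for suppressing the ZFC is not reproduced here.

```python
import numpy as np

def velvet_noise(n_samples, fs=48000, density=2000, rng=None):
    # classic velvet noise: one random +/-1 impulse per grid period,
    # zeros elsewhere (the basis the proposed algorithm modifies)
    rng = rng or np.random.default_rng(0)
    period = fs / density                     # average pulse spacing in samples
    n_pulses = int(n_samples / period)
    v = np.zeros(n_samples)
    for m in range(n_pulses):
        pos = int(m * period + rng.random() * (period - 1))
        v[pos] = 1.0 if rng.random() < 0.5 else -1.0
    return v

noise = velvet_noise(4800)
print(np.count_nonzero(noise))   # ~200 pulses in 0.1 s at density 2000/s
```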
Fei WU Xiwei DONG Lu HAN Xiao-Yuan JING Yi-mu JI
Recently, multi-view dictionary learning has attracted considerable research interest. Although several multi-view dictionary learning methods have been proposed, they can be further improved. Most existing multi-view dictionary learning methods adopt an l0- or l1-norm sparsity constraint on the representation coefficients, which makes the training and testing phases time-consuming. In this paper, we propose a novel multi-view dictionary learning approach named multi-view synthesis and analysis dictionary learning (MSADL), which jointly learns multiple discriminative dictionary pairs, each corresponding to one view and containing a structured synthesis dictionary and a structured analysis dictionary. MSADL utilizes the synthesis dictionaries to achieve class-specific reconstruction and uses the analysis dictionaries to generate discriminative coding coefficients by linear projection. Furthermore, we design an uncorrelation term for multi-view dictionary learning such that the redundancy among synthesis dictionaries learned from different views can be reduced. Two widely used datasets are employed as test data. Experimental results demonstrate the efficiency and effectiveness of the proposed approach.
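The efficiency argument rests on coding by linear projection: with an analysis dictionary P and a synthesis dictionary D, coding costs one matrix multiply instead of an l0/l1 sparse-coding solve. A toy sketch, using a pseudoinverse as a stand-in for a learned analysis dictionary:

```python
import numpy as np

# Minimal sketch (not MSADL's training): the synthesis/analysis dictionary
# pair idea, where P codes a sample by linear projection and D reconstructs it.
rng = np.random.default_rng(0)
d, k = 20, 5
D = rng.normal(size=(d, k))        # synthesis dictionary (one view, one class)
P = np.linalg.pinv(D)              # analysis dictionary; pinv as a toy stand-in
x = D @ rng.normal(size=k)         # a sample lying in the class subspace
code = P @ x                       # coding is a single matrix multiply
print(np.linalg.norm(x - D @ code))   # ~0: class-specific reconstruction
```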
Zhe LI Yili XIA Qian WANG Wenjiang PEI Jinguang HAO
A novel time-series relationship among four consecutive real-valued samples of a single-tone sinusoid is proposed based on their linear prediction property. To achieve unbiased frequency estimates for a real sinusoid in white noise, a constrained least squares cost function based on the proposed four-point time-series relationship is minimized under the unit-norm principle. Closed-form expressions for the variance and the asymptotic variance of the proposed frequency estimator are derived, facilitating a theoretical performance comparison with the existing three-point counterpart, known as the reformed Pisarenko harmonic decomposer (RPHD). The region in which the proposed four-point constrained least squares frequency estimator outperforms the RPHD is also discussed. Computer simulations are conducted to support our theoretical development and to compare the performance of the proposed estimator with the RPHD as well as the Cramer-Rao lower bound (CRLB).
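The linear prediction property underlying both estimators is that, for a noiseless sinusoid, x[n−1] + x[n+1] = 2cos(ω)x[n]. The sketch below estimates the frequency from this three-point relationship by least squares, i.e. the RPHD-style baseline; the paper's four-point relationship extends the same idea to four consecutive samples.

```python
import numpy as np

fs, f0, N = 1000.0, 123.0, 200
n = np.arange(N)
x = np.cos(2 * np.pi * f0 / fs * n + 0.7)   # noiseless single-tone sinusoid

# three-point linear prediction: x[n-1] + x[n+1] = 2*cos(w)*x[n].
# Estimate cos(w) by least squares over all sample triples.
num = np.sum(x[1:-1] * (x[:-2] + x[2:]))
den = 2 * np.sum(x[1:-1] ** 2)
w_hat = np.arccos(num / den)
print(w_hat * fs / (2 * np.pi))             # ~123.0 Hz
```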
Xiangdong HUANG Jingwen XU Jiexiao YU Yu LIU
To optimize the performance of FIR filters with low computational complexity, this paper proposes a hybrid design consisting of two optimization levels. The first level is based on cyclic-shift synthesis, in which all possible sub-filters (or windowed sub-filters) with distinct cyclic shifts are averaged to generate a synthesized filter. Because the ripples of these sub-filters' transfer curves can be individually compensated, the synthesized filter attains improved performance (apart from two uprushes at the edges of the transition band), so this synthesis effectively acts as a 'natural optimization'. Furthermore, the synthesis process can be summarized equivalently as a three-step closed-form procedure, which converts the multi-variable optimization into a single-variable optimization. Hence, to suppress the uprushes, the second optimization level (using the Differential Evolution (DE) algorithm) needs only to search for the optimum transition point, which incurs minimal complexity. Owing to the combination of cyclic-shift synthesis and the DE algorithm, and unlike regular evolutionary computing schemes, our hybrid design is attractive for its narrowed search space and higher convergence speed. Numerical results also show that the proposed design is superior to the conventional DE design in both filter performance and design efficiency, and that it is comparable to the Remez design.
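Since the second optimization level reduces to a one-dimensional search, it can be sketched with a bare-bones Differential Evolution loop over a single variable; the toy quadratic cost below stands in for the ripple measure over the transition point.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_minimize(cost, lo, hi, pop=20, iters=100, F=0.7, CR=0.9):
    # bare-bones Differential Evolution over one variable, matching the
    # paper's second level: search only for the optimum transition point
    x = rng.uniform(lo, hi, pop)
    fx = np.array([cost(v) for v in x])
    for _ in range(iters):
        for i in range(pop):
            a, b, c = x[rng.choice(pop, 3, replace=False)]
            trial = np.clip(a + F * (b - c), lo, hi) if rng.random() < CR else x[i]
            ft = cost(trial)
            if ft < fx[i]:                 # greedy selection
                x[i], fx[i] = trial, ft
    return x[fx.argmin()], fx.min()

best, val = de_minimize(lambda t: (t - 0.42) ** 2, 0.0, 1.0)
print(best)   # converges near 0.42 (toy cost, not the actual ripple measure)
```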