The search functionality is under construction.

Keyword Search Result

[Keyword] sound(160hit)


  • Some Evaluations on a Digital Watermarking Technique for Music Data Using Distortion Effect

    Yuto MATSUNAGA  Tetsuya KOJIMA  Naofumi AOKI  Yoshinori DOBASHI  Tsuyoshi YAMAMOTO  

    PAPER-Information Network

    E102-D No:6

    We have proposed a novel concept of a digital watermarking technique for music data that focuses on the use of sound synthesis and sound effect techniques. This paper describes the details of our proposed technique that employs the distortion effect, one of the most common sound effects frequently utilized especially for guitar and bass instruments. This paper describes the experimental results of evaluating the resistance of the proposed technique against some basic malicious attacks utilizing MP3 coding, tempo alteration, pitch alteration, and high-pass filtering. It is demonstrated that the proposed technique potentially has appropriate resistance against such attacks except for the high-pass filtering attack. A technique for increasing the resistance against the high-pass filtering attack is also supplementarily discussed.

  • Analysis of Option to Complete, Proper Completion and No Dead Tasks for Acyclic Free Choice Workflow Nets

    Shingo YAMAGUCHI  


    E102-A No:2

    Workflow nets (WF-nets for short) are a subclass of Petri nets and are used for modeling and analysis of workflows. Soundness is a criterion of logical correctness defined for WF-nets. A WF-net is said to be sound if it satisfies three conditions: (i) option to complete, (ii) proper completion, and (iii) no dead tasks. In this paper, focusing our analysis on acyclic free choice WF-nets, we revealed that (1) Conditions (i) and (ii) of soundness are respectively equivalent to the liveness and the boundedness of its short-circuited net; (2) The decision problem for each condition of soundness is co-NP-complete; and (3) If the short-circuited net has no disjoint paths from a transition to a place (or no disjoint paths from a place to a transition), each condition can be checked in polynomial time.

  • Active Contours Driven by Local Rayleigh Distribution Fitting Energy for Ultrasound Image Segmentation

    Hui BI  Yibo JIANG  Hui LI  Xuan SHA  Yi WANG  

    PAPER-Image Recognition, Computer Vision

    E101-D No:7

    Ultrasound image segmentation is a crucial task in many clinical applications. However, ultrasound images are difficult to segment because of the image inhomogeneity caused by the ultrasound imaging technique. In this paper, to deal with image inhomogeneity while taking ultrasound image properties into account, a Local Rayleigh Distribution Fitting (LRDF) energy term is newly introduced into the traditional level set method. The curve evolution equation is derived for energy minimization, and a self-driven uterus contour is obtained on the ultrasound images. Experimental segmentation results on synthetic images and in-vivo ultrasound images show that the proposed approach is effective and accurate, with a Dice Score Coefficient (DSC) of 0.95 ± 0.02.
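
    As a reminder of what the reported DSC measures, the following is a minimal sketch of the Dice score between two binary masks. The function name and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice score between two binary masks given as equal-shape nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 4 predicted pixels, 3 true pixels, 3 shared.
pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
truth = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
score = dice_coefficient(pred, truth)  # 2 * 3 / (4 + 3)
```

    A score of 1 means the segmented region and the reference region coincide exactly; 0 means they do not overlap at all.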

  • A Novel Low-Overhead Channel Sounding Protocol for Downlink Multi-User MIMO in IEEE 802.11ax WLAN Open Access

    Toshihisa NABETANI  Narendar MADHAVAN  Hiroki MORI  Tsuguhide AOKI  

    PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

    E101-B No:3

    The next generation wireless LAN standard IEEE 802.11ax aims to provide improved throughput performance in dense environments. We have proposed an efficient channel sounding mechanism for DL-MU-MIMO that has been adopted as a new sounding protocol in the 802.11ax standard. In this paper, we evaluate the overhead reduction in the 802.11ax sounding protocol compared with the 802.11ac sounding protocol. Sounding is frequently performed to obtain accurate channel information from the associated stations in order to improve overall system throughput. However, there is a trade-off between accurate channel information and the overhead incurred due to frequent sounding. Therefore, the sounding interval is an important factor that determines system throughput in DL-MU-MIMO transmission. We also evaluate the effect of sounding interval on the system throughput performance using both sounding protocols and provide a comparative analysis of the performance improvement.

  • Multi-Dimensional Radio Channel Measurement, Analysis and Modeling for High Frequency Bands Open Access

    Minseok KIM  Jun-ichi TAKADA  Kentaro SAITO  


    E101-B No:2

    In order to utilize higher frequency bands above 6GHz, which is an important technical challenge in fifth generation mobile systems, radio propagation channel properties in a large variety of deployment scenarios should be thoroughly investigated. The authors' group was involved in a fundamental research project aimed at investigating multiple-input-multiple-output (MIMO) transmission performance and propagation channel properties at microwave frequencies above 10GHz from 2009 to 2013, and since then they have been conducting measurement and modeling for high frequency bands. This paper aims at providing a comprehensive tutorial on the whole procedure of channel modeling (multi-dimensional channel sounding, propagation channel measurement, analysis, and modeling) by introducing the developed MIMO channel sounders at high frequency bands of 11 and 60GHz and presenting some measurement results in a microcell environment at 11GHz. Furthermore, this paper identifies challenges in radio propagation measurements and discusses current and future channel modeling issues as future work.

  • Speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression

    Ryouichi NISHIMURA  Seigo ENOMOTO  Hiroaki KATO  


    E101-D No:1

    Surveillance with multiple cameras and microphones is a promising way to trace the activities of suspicious persons for security purposes. When these sensors are connected to the Internet, they might also jeopardize innocent people's privacy because, as a result of human error, signals from sensors might allow eavesdropping by malicious persons. This paper presents a proposal for exploiting super-resolution to address this problem. Super-resolution is a signal processing technique by which a high-resolution version of a signal can be reproduced from a low-resolution version of the same signal source. Because of this property, an intelligible speech signal is reconstructed from multiple sensor signals, each of which is completely unintelligible because of its sufficiently low sampling rate. A method based on Bayesian linear regression is proposed and compared with one based on maximum likelihood. Computer simulations using a simple sinusoidal input demonstrate that the methods restore the original signal from the signals that are actually measured. Moreover, results show that the method based on Bayesian linear regression is more robust than maximum likelihood under various microphone configurations in noisy environments and that this advantage is remarkable when the number of microphones enrolled in the process is as small as the minimum required. Finally, listening tests using speech signals confirmed that the mean opinion score (MOS) of the reconstructed signal reaches 3, while that of the original signal captured at each single microphone is almost 1.

  • Availability of Reference Sound Sources for Qualification of Hemi-Anechoic Rooms Based on Deviation of Sound Pressure Level from Inverse Square Law

    Keisuke YAMADA  Hironobu TAKAHASHI  Ryuzo HORIUCHI  

    PAPER-Engineering Acoustics

    E101-A No:1

    The sound power level is a physical quantity indispensable for evaluating the amount of sound energy radiated from electrical and mechanical apparatuses. The precise determination of the sound power level requires the qualification of the measurement environment, such as a hemi-anechoic room, by estimating the deviation of the sound pressure level from the inverse-square law. In this respect, Annex A of ISO 3745 specifies the procedure for room qualification and defines a tolerance limit for the directivity of the sound source, which is used for the qualification. However, it is impractical to prepare a special loudspeaker only for room qualification. Thus, we developed a simulation method to investigate the influence of the sound source directivity on the measured deviation of the sound pressure level from the inverse-square law by introducing a quantitative index for the influence of the directivity. In this study, type 4202 reference sound source by Brüel & Kjær was used as a directional sound source because it has been widely used as a reference standard for the measurement of sound power levels. We experimentally obtained the directivity of the sound source by measuring the sound pressure level over the measurement surface. Moreover, the proposed method was applied to the qualification of several hemi-anechoic rooms, and we discussed the availability of a directional sound source for the process. Analytical results showed that available reference sound sources may be used for the evaluation of hemi-anechoic rooms depending on the sound energy absorption coefficient of the inner wall, the directionality of the microphone traverse, and the size of the space to be qualified. In other words, the results revealed that a reference sound source that is once quantified by the proposed method can be used for qualifying hemi-anechoic rooms.
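
    The quantity at the center of this abstract, the deviation of the sound pressure level from the inverse-square law, can be illustrated with a simplified sketch. It assumes a free-field point source and uses the first measurement point as the reference level; the qualification procedure in ISO 3745 is more involved, and the traverse data below are hypothetical.

```python
import math


def inverse_square_spl(spl_ref_db, r_ref_m, r_m):
    """SPL predicted at distance r_m by the inverse-square law (-6.02 dB per doubling)."""
    return spl_ref_db - 20.0 * math.log10(r_m / r_ref_m)


def deviations(measurements, ref_index=0):
    """Return (distance, measured SPL - predicted SPL) for each (distance, SPL) pair.

    Simplifying assumption: the point at ref_index defines the reference level.
    """
    r_ref, spl_ref = measurements[ref_index]
    return [(r, spl - inverse_square_spl(spl_ref, r_ref, r)) for r, spl in measurements]


# Hypothetical microphone traverse: (distance in m, measured SPL in dB).
traverse = [(1.0, 80.0), (2.0, 74.2), (4.0, 67.7)]
devs = deviations(traverse)
for r, d in devs:
    print(f"r = {r:.1f} m  deviation = {d:+.2f} dB")
```

    In a well-behaved room the deviations stay small along the whole traverse; a directional source can add its own deviation, which is the effect the paper sets out to quantify.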

  • Tolerance Evaluation of Audio Watermarking Method Based on Modification of Sound Pressure Level between Channels

    Harumi MURATA  Akio OGIHARA  Shigetoshi HAYASHI  


    E101-D No:1

    We have proposed an audio watermarking method based on modification of sound pressure level between channels. This method focuses on the invariability of sound localization under sound processing such as MP3 coding and on the imperceptibility of slight changes in sound localization. In this paper, we evaluate its tolerance against various attacks with reference to the IHC criteria.

  • Computational Soundness of Asymmetric Bilinear Pairing-Based Protocols

    Kazuki YONEYAMA  


    E100-A No:9

    Asymmetric bilinear maps using Type-3 pairings are known to have several advantages (e.g., the speed and the size of a group element) over symmetric bilinear maps using Type-1 pairings. Kremer and Mazaré introduce a symbolic model to analyze protocols based on bilinear maps, and show that the symbolic model is computationally sound. However, their model only covers symmetric bilinear maps. In this paper, we propose a new symbolic model to capture asymmetric bilinear maps. Our model allows us to analyze the security of various protocols based on asymmetric bilinear maps (e.g., Joux's tripartite key exchange, and Scott's client-server ID-based key exchange). Also, we show computational soundness of our symbolic model under the decisional bilinear Diffie-Hellman assumption.

  • An Efficient Image to Sound Mapping Method Using Speech Spectral Phase and Multi-Column Image


    LETTER-Digital Signal Processing

    E100-A No:3

    Image-to-sound mapping is a technique that transforms an image to a sound signal, which is subsequently treated as a sound spectrogram. In general, the transformed sound differs from a human speech signal. Herein an efficient image-to-sound mapping method, which provides an understandable speech signal without any training, is proposed. To synthesize such a speech signal, the proposed method utilizes a multi-column image and a speech spectral phase that is obtained from a long-time observation of the speech. The original image can be retrieved from the sound spectrogram of the synthesized speech signal. The synthesized speech and the reconstructed image qualities are evaluated using objective tests.

  • Structural and Behavioral Properties of Well-Structured Workflow Nets

    Zhaolong GOU  Shingo YAMAGUCHI  


    E100-A No:2

    Workflow nets (WF-nets for short) are a standard way to automate business processes. Well-Structured WF-nets (WS WF-nets for short) are an important subclass of WF-nets because they offer a good balance between expressive power and analysis power. In this paper, we revealed structural and behavioral properties of WS WF-nets. Our results on structural properties are: (i) No WS WF-net is EFC but non-FC; (ii) A WS WF-net is sound iff it is a van Hee et al.'s ST-net. Our results on behavioral properties are: (i) Any WS WF-net is safe; (ii) Any WS WF-net is separable; (iii) A necessary and sufficient condition for reachability in a sound WS WF-net (N,[pIk]) is given. Finally, we illustrated the usefulness of the proposed properties with an application example of analyzing workflow evolution.

  • Bi-Direction Interaural Matching Filter and Decision Weighting Fusion for Sound Source Localization in Noisy Environments

    Hong LIU  Mengdi YUE  Jie ZHANG  

    LETTER-Speech and Hearing

    E99-D No:12

    Sound source localization is an essential technique in many applications, e.g., speech enhancement, speech capturing and human-robot interaction. However, the performance of traditional methods degrades in noisy or reverberant environments and is sensitive to the spatial location of the sound source. To solve these problems, we propose a sound source localization framework based on a bi-direction interaural matching filter (IMF) and decision weighting fusion. First, the bi-direction IMF is put forward to describe the difference between binaural signals in the forward and backward directions, respectively. Then, a hybrid interaural matching filter (HIMF), which is obtained from the bi-direction IMF through decision weighting fusion, is used to alleviate the effect of sound locations on sound source localization. Finally, the cosine similarity between the HIMFs computed from the binaural audio and from the transfer functions is employed to measure the probability of the source location. Constructing the similarity for all the spatial directions as a matrix, we can determine the source location by Maximum A Posteriori (MAP) estimation. Compared with several state-of-the-art methods, experimental results indicate that HIMF is more robust in noisy environments.
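
    The final matching step can be illustrated with a much simplified sketch: compute the cosine similarity between an observed feature vector and one template per candidate direction, then pick the best match. The sketch assumes a uniform prior over directions, so the MAP decision reduces to the maximum similarity. The feature vectors, direction labels, and function names are hypothetical, and the HIMF computation itself is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize(observed, templates):
    """Return (best direction, per-direction scores) for the observed feature vector.

    templates maps a direction label (here degrees of azimuth) to a template vector.
    """
    scores = {d: cosine_similarity(observed, t) for d, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical templates for three azimuths and one observation.
templates = {-30: [1.0, 0.2, 0.1], 0: [0.3, 1.0, 0.3], 30: [0.1, 0.2, 1.0]}
observed = [0.25, 0.9, 0.35]
best_direction, scores = localize(observed, templates)
```

    With a non-uniform prior, each score would be weighted by the prior probability of its direction before taking the maximum.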

  • Design of a Compact Sound Localization Device on a Stand-Alone FPGA-Based Platform

    Mauricio KUGLER  Teemu TOSSAVAINEN  Susumu KUROYANAGI  Akira IWATA  

    PAPER-Computer System

    E99-D No:11

    Sound localization systems are widely studied and have several potential applications, including hearing aid devices, surveillance and robotics. However, few proposed solutions target portable systems, such as wearable devices, which require a small unnoticeable platform, or unmanned aerial vehicles, in which weight and low power consumption are critical aspects. The main objective of this research is to achieve real-time sound localization capability in a small, self-contained device, without having to rely on large shaped platforms or complex microphone arrays. The proposed device has two surface-mount microphones spaced only 20 mm apart. Such reduced dimensions present challenges for the implementation, as differences in level and spectra become negligible, and only time-difference of arrival (TDoA) can be used as a localization cue. Three main issues have to be addressed in order to accomplish these objectives. To achieve real-time processing, the TDoA is calculated using zero-crossing spikes applied to the hardware-friendly Jeffers model. In order to make up for the reduction in resolution due to the small dimensions, the signal is upsampled several-fold within the system. Finally, a coherence-based spectral masking is used to select only frequency components with relevant TDoA information. The proposed system was implemented on a field-programmable gate array (FPGA) based platform, due to the large amount of concurrent and independent tasks, which can be efficiently parallelized in reconfigurable hardware devices. Experimental results with white-noise and environmental sounds show high accuracies for both anechoic and reverberant conditions.
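
    To see why the 20 mm spacing makes upsampling necessary, the following sketch works through the geometry. It assumes a far-field source, a speed of sound of 343 m/s, and a 48 kHz base sampling rate; only the 20 mm spacing comes from the abstract. The function names and the 8x upsampling factor are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C
MIC_SPACING = 0.020     # m, the 20 mm spacing stated in the abstract


def max_tdoa(spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Largest possible time difference of arrival (source on the microphone axis)."""
    return spacing / c


def tdoa_to_angle(tdoa_s, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Far-field arrival angle in degrees (0 = broadside) for a given TDoA."""
    ratio = max(-1.0, min(1.0, c * tdoa_s / spacing))  # clamp against rounding error
    return math.degrees(math.asin(ratio))


def distinct_lags(sample_rate_hz, upsample=1, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Number of integer-sample lags that fit within +/- the maximum TDoA."""
    max_lag = math.floor(max_tdoa(spacing, c) * sample_rate_hz * upsample)
    return 2 * max_lag + 1


print(f"max TDoA: {max_tdoa() * 1e6:.1f} microseconds")
print("lags at 48 kHz:", distinct_lags(48_000))                # 5
print("lags at 48 kHz, 8x upsampled:", distinct_lags(48_000, 8))  # 45
```

    Under these assumptions the whole frontal half-plane maps onto only five integer lags at the base rate, which is why finer time resolution is needed before the TDoA becomes a usable cue.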

  • WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications Open Access

    Masanori MORISE  Fumiya YOKOMORI  Kenji OZAWA  

    PAPER-Speech and Hearing

    E99-D No:7

    A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of real-time applications using speech. Speech analysis, manipulation, and synthesis on the basis of vocoders are used in various kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. This new speech synthesis system offers not only high sound quality but also fast processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output with natural speech including consonants. Its processing speed was also compared with those of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and the real time factor (RTF) indicated that it was fast enough for real-time processing.

  • Real-Time Hardware Implementation of a Sound Recognition System with In-Field Learning


    PAPER-Speech and Hearing

    E99-D No:7

    The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, there are still very few methods specifically developed for identifying environmental sounds. The majority of the existing approaches try to adapt speech recognition techniques for the task, usually incurring high computational complexity. This paper proposes a sound recognition method dedicated to environmental sounds, designed with its main focus on embedded applications. The pre-processing stage is loosely based on the human hearing system, while a robust set of binary features permits a simple k-NN classifier to be used. This gives the system the capability of in-field learning, by which new sounds can be simply added to the reference set in real-time, greatly improving its usability. The system was implemented in an FPGA based platform, developed in-house specifically for this application. The design of the proposed method took into consideration several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds. Sensitivity and specificity were evaluated over several random subsets of these signals. The obtained values for sensitivity and specificity, without additional noise, were, respectively, 0.957 and 0.918. With the addition of +6 dB of pink noise, sensitivity and specificity were 0.822 and 0.942, respectively. The in-field learning strategy presented no significant change in sensitivity and a total decrease of 5.4% in specificity when progressively increasing the number of reference sounds from 1 to 9 under noisy conditions. The minimal signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real life applications.
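
    The idea that binary features allow a simple k-NN classifier with in-field learning can be sketched as follows. This is a software illustration only and not the paper's FPGA design: the Hamming distance, the class name `BinaryKNN`, the six-bit feature vectors, and the labels are all assumptions, and only the limit of 12 reference sounds comes from the abstract.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class BinaryKNN:
    """k-NN over binary feature vectors; learning just appends to the reference set."""

    def __init__(self, k=1, max_refs=12):
        self.k = k
        self.max_refs = max_refs  # the abstract mentions up to 12 reference sounds
        self.refs = []

    def learn(self, features, label):
        """In-field learning: store a new reference, no retraining step needed."""
        if len(self.refs) >= self.max_refs:
            raise RuntimeError("reference set is full")
        self.refs.append((tuple(features), label))

    def classify(self, features):
        """Return the majority label among the k nearest references (None if empty)."""
        if not self.refs:
            return None
        nearest = sorted(self.refs, key=lambda ref: hamming(ref[0], features))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical six-bit feature vectors.
knn = BinaryKNN(k=1)
knn.learn([1, 0, 1, 1, 0, 0], "doorbell")
knn.learn([0, 1, 0, 0, 1, 1], "alarm")
first = knn.classify([1, 0, 1, 0, 0, 0])

# A new sound is added in the field and is recognized immediately afterwards.
knn.learn([1, 1, 1, 1, 1, 1], "kettle")
second = knn.classify([1, 1, 1, 1, 0, 1])
```

    Because learning only stores a vector, the cost of adding a sound is constant, which suits hardware with limited computing power and memory.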

  • Multiple k-Nearest Neighbor Classifier and Its Application to Tissue Characterization of Coronary Plaque

    Eiji UCHINO  Ryosuke KUBOTA  Takanori KOGA  Hideaki MISAWA  Noriaki SUETAKE  

    PAPER-Biological Engineering

    E99-D No:7

    In this paper we propose a novel classification method for the multiple k-nearest neighbor (MkNN) classifier and show its practical application to medical image processing. The proposed method performs fine classification when a pair of the spatial coordinate of the observation data in the observation space and its corresponding feature vector in the feature space is provided. The proposed MkNN classifier uses the continuity of the distribution of features of the same class not only in the feature space but also in the observation space. In order to validate the performance of the present method, it is applied to the tissue characterization problem of coronary plaque. The quantitative and qualitative validity of the proposed MkNN classifier have been confirmed by actual experiments.

  • A Sensor-Based Data Visualization System for Training Blood Pressure Measurement by Auscultatory Method

    Chooi-Ling GOH  Shigetoshi NAKATAKE  


    E99-D No:4

    Blood pressure measurement by the auscultatory method is a compulsory skill required of all healthcare practitioners. During the measurement, they must simultaneously concentrate on recognizing the Korotkoff sounds, watching the sphygmomanometer scale, and steadily deflating the cuff pressure. This complex operation is difficult for new learners, and they need a lot of practice with a supervisor who can guide them in their measurements. However, the supervisor is not always available, and consequently they often lack sufficient training. In order to help them master the skill of measuring blood pressure by the auscultatory method more efficiently and effectively, we propose using a sensor device to capture the signals of Korotkoff sounds and cuff pressure during the measurement, and to display the signal changes on a visualization tool through a wireless connection. At the end of the measurement, the learners can verify their skill in deflation speed and recognition of Korotkoff sounds using the graphical view, and compare their measurements with the machine instantly. By using this device, new learners do not need to wait for their supervisor for training but can practice with their colleagues more frequently. As a result, they will be able to acquire the skill in a shorter time and be more confident in their measurements.

  • Enhancing Stereo Signals with High-Order Ambisonics Spatial Information Open Access

    Jorge TREVINO  Shuichi SAKAMOTO  Junfeng LI  Yôiti SUZUKI  


    E99-D No:1

    There is a strong push towards the ultra-realistic presentation of multimedia contents made possible by the latest advances in computational and signal processing technologies. Three-dimensional sound presentation is necessary to convey a natural and rich multimedia experience. Promising ways to achieve this include the sound field reproduction technique known as high-order Ambisonics (HOA). While these advanced methods are now within the capabilities of consumer-level processing systems, their adoption is hindered by the lack of contents. Production and coding of the audio components in multimedia focus on traditional formats such as stereophonic sound. Mainstream audio codecs and media such as CDs or DVDs do not support advanced, rich contents such as HOA encodings. To ameliorate this problem and speed up the adoption of spatial sound technologies, this paper proposes a novel way to downmix HOA contents into a stereo signal. The resulting data can be distributed using conventional methods such as audio CDs or as the audio component of an internet video stream. The results can be listened to using legacy stereo reproduction systems. However, they include spatial information encoded as the inter-channel level and phase differences. The proposed method consists of a downmixing filterbank which independently modulates inter-channel differences at each frequency bin. The proposal is evaluated using simple test signals and found to outperform conventional methods such as matrix-encoded surround and the Ambisonics UHJ format in terms of spatial resolution. The proposal can be coupled with a previously presented method to recover HOA signals from stereo recordings. The resulting system allows for the preservation of full-surround spatial information in ultra-realistic contents when they are transferred using a stereo stream. Simulation results show that a compatible decoder can accurately recover up to five HOA channels from a stereo signal (2nd order HOA data in the horizontal plane).

  • Interference Reduction Characteristics by Circular Array Based Massive MIMO in a Real Microcell Environment

    Ryochi KATAOKA  Kentaro NISHIMORI  Ngochao TRAN  Tetsuro IMAI  Hideo MAKINO  


    E98-B No:8

    The concept of massive multiple input multiple output (MIMO) has recently been proposed. It has been reported that using linear or planar arrays to implement massive MIMO yields narrow beams that can mitigate the interference signal even if interference cancellation techniques such as zero forcing (ZF) are not employed. In this work, we investigate the interference reduction performance achieved by circular array implemented massive MIMO in a real micro cell environment. The channel state information (CSI) is obtained by using a wideband channel sounder with cylindrical 96-element array in the 2-GHz band in an urban area. Circular arrays have much larger beamwidth and sidelobe level than linear arrays. In this paper, when considering the cylindrical array, the interference reduction performance between ZF and maximum ratio combining is compared when one desired user exists in the micro cell while the interference user moves around the adjacent cell. We show that ZF is essential for reducing the interference from the adjacent cell in the circular array based massive MIMO. The required number of antennas in the vertical and horizontal planes for the interference reduction is evaluated, in order to simplify the burden of signal processing for the ZF algorithm in massive MIMO. Because there are elements with low signal to noise power ratio (SNR) when considering cylindrical 96-element array, it is shown that the degradation of the signal to noise plus interference power ratio (SINR) when the number of antennas is reduced is smaller than that by ideal antenna gain reduction with a linear array. Moreover, we show that the appropriate antennas should be selected when a limited number of antennas is assumed, because the dominant waves arrive from certain specific directions.

  • Automatic Detection of the Carotid Artery Location from Volumetric Ultrasound Images Using Anatomical Position-Dependent LBP Features

    Fumi KAWAI  Satoshi KONDO  Keisuke HAYATA  Jun OHMIYA  Kiyoko ISHIKAWA  Masahiro YAMAMOTO  

    PAPER-Image Recognition, Computer Vision

    E98-D No:7

    We propose a fully automatic method for detecting the carotid artery from volumetric ultrasound images as a preprocessing stage for building three-dimensional images of the structure of the carotid artery. The proposed detector utilizes support vector machine classifiers to discriminate between carotid artery images and non-carotid artery images using two kinds of LBP-based features. The detector switches between these features depending on the anatomical position along the carotid artery. We evaluate our proposed method using actual clinical cases. Accuracies of detection are 100%, 87.5% and 68.8% for the common carotid artery, internal carotid artery, and external carotid artery sections, respectively.
