Because abnormal sounds, called adventitious sounds, are often included in the lung sounds of patients suffering from pulmonary disease, the objective of this study was to automatically detect abnormal sounds in auscultatory sounds. To this end, we expressed the acoustic features of the normal lung sounds of healthy people and the abnormal lung sounds of patients using Gaussian mixture model (GMM)-hidden Markov models (HMMs), and distinguished between normal and abnormal lung sounds. In our previous study, we constructed left-to-right GMM-HMMs with a limited number of states. Because we expressed abnormal sounds that occur intermittently and repeatedly using limited states, the GMM-HMMs could not express the acoustic features of abnormal sounds. Furthermore, because the analysis frame length and intervals were long, the GMM-HMMs could not express the acoustic features of short time segments, such as heart sounds. Therefore, the classification rate of normal and abnormal respiration was low (86.60%). In this study, we propose the construction of ergodic GMM-HMMs with a repetitive structure for intermittent sounds. Furthermore, we considered a suitable frame length and frame interval to analyze acoustic features. Using the ergodic GMM-HMM, which can express in detail the acoustic features of abnormal sounds and heart sounds that occur repeatedly, the classification rate increased to 89.34%. The results obtained in this study demonstrated the effectiveness of the proposed method.
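The difference between the two topologies named in the abstract above can be illustrated with their transition matrices. The following is a minimal sketch, not the authors' model: the function names, the uniform ergodic probabilities, and the `stay` parameter are all assumptions chosen for illustration, and the Gaussian mixture emissions are omitted.

```python
def left_to_right(n_states, stay=0.6):
    """Transition matrix in which a state may only repeat or advance by one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0              # final state absorbs
        else:
            A[i][i] = stay             # remain in the same state
            A[i][i + 1] = 1.0 - stay   # move to the next state
    return A


def ergodic(n_states):
    """Transition matrix in which every state can reach every other state.

    Uniform probabilities are an illustrative assumption; in practice the
    values would be estimated from training data.
    """
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]


def can_return(A, state):
    """Return True if `state` can be re-entered after it has been left."""
    n = len(A)
    # Depth-first search starting from the successors of `state`.
    stack = [j for j in range(n) if j != state and A[state][j] > 0]
    seen = set(stack)
    while stack:
        i = stack.pop()
        if A[i][state] > 0:
            return True
        for j in range(n):
            if j != state and j not in seen and A[i][j] > 0:
                seen.add(j)
                stack.append(j)
    return False
```

In this sketch, a left-to-right model can never revisit a state it has left, whereas the ergodic model can, which is the property that lets a small number of states cover sounds that recur intermittently.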
Wenxin DONG Jianxun ZHANG Shuqiu TAN Xinyue ZHANG
In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot be applied without slaughtering the animal. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layer and the semantic information expressed in the deep layer are fused. Finally, a deep convolutional network predicts the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on the 130 groups of pork B-ultrasound image data sets, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify the B-ultrasound image of pigs and predict the fat content with high accuracy.
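For reference, the determination coefficient reported above is conventionally computed from predictions and true labels as follows. This is a minimal sketch of the standard definition; the function name and the sample values in the usage note are illustrative and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean label.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect predictor gives `r_squared([1, 2, 3], [1, 2, 3]) == 1.0`, while always predicting the mean label gives 0.0.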
João Filipe PAPEL Tatsuji MUNAKA
In recent years, with the aging of society, much research has been actively conducted on recognizing human activity in the home in order to watch over the elderly. Multiple sensors are used for activity recognition. However, we need to consider privacy when using these sensors. One candidate for a privacy-preserving sensor is a sound sensor. MFCC (Mel-Frequency Cepstral Coefficient) is widely used as a feature extraction algorithm for voice recognition. However, conventional MFCC is not suitable for activity recognition based on sounds of daily life, which we simply call “life sounds” in this paper. The reason is that conventional MFCC does not adequately extract several features of life sounds that appear at high frequencies. This paper proposes an improved MFCC and reports the evaluation results of activity recognition with an SVM (Support Vector Machine) using features extracted by the improved MFCC.
In many situations, abnormal sounds, called adventitious sounds, are included with the lung sounds of a subject suffering from pulmonary diseases. Thus, a method to automatically detect abnormal sounds in auscultation was proposed. The acoustic features of normal lung sounds for control subjects and abnormal lung sounds for patients are expressed using hidden Markov models (HMMs) to distinguish between normal and abnormal lung sounds. Furthermore, abnormal sounds were detected in a noisy environment, including heart sounds, using a heart-sound model. However, the F1-score obtained in detecting abnormal respiration was low (0.8493). Moreover, the duration and acoustic properties of segments of respiratory, heart, and adventitious sounds varied. In our previous method, the appropriate HMMs for the heart and adventitious sound segments were constructed. Although the properties of the types of adventitious sounds varied, an appropriate topology for each type was not considered. In this study, appropriate HMMs for the segments of each type of adventitious sound and other segments were constructed. The F1-score increased to 0.8726 by selecting a suitable topology for each segment. The results demonstrate the effectiveness of the proposed method.
Ken MANO Hideki SAKURADA Yasuyuki TSUKADA
We present a mathematical formulation of a trust metric using a quality and quantity pair. Under a certain assumption, we regard trust as an additive value and define the soundness of a trust computation as not to exceed the total sum. Moreover, we point out the importance of not only soundness of each computed trust but also the stability of the trust computation procedure against changes in trust value assignment. In this setting, we define trust composition operators. We also propose a trust computation protocol and prove its soundness and stability using the operators.
Taiki HAYASHI Kazuyoshi ISHIMURA Isao T. TOKUDA
Towards the realization of noise-induced synchronization in a natural environment, an experimental study is carried out using the Van der Pol oscillator circuit. We focus on acoustic sounds as a potential source of noise that may exist in nature. To mimic such a natural environment, white noise sounds were generated from a loudspeaker and recorded as microphone signals. These signals were then injected into the oscillator circuits. We show that the oscillator circuits spontaneously give rise to synchronized dynamics when the microphone signals are highly correlated with each other. As the correlation among the input microphone signals is decreased, the level of synchrony is lowered monotonically, implying that the input correlation is the key determinant for the noise-induced synchronization. Our study provides an experimental basis for synchronizing clocks in distributed sensor networks as well as other engineering devices in natural environments.
Yuzhuo LIU Hangting CHEN Qingwei ZHAO Pengyuan ZHANG
Weakly labelled semi-supervised audio tagging (AT) and sound event detection (SED) have become significant in real-world applications. A popular method is teacher-student learning, making student models learn from pseudo-labels generated by teacher models from unlabelled data. To generate high-quality pseudo-labels, we propose a master-teacher-student framework trained with a dual-lead policy. Our experiments illustrate that our model outperforms the state-of-the-art model on both tasks.
Toi TOMITA Wakaha OGATA Kaoru KUROSAWA
In this paper, we construct the first efficient leakage-resilient CCA2 (LR-CCA2)-secure attribute-based encryption (ABE) schemes. We also construct the first efficient LR-CCA2-secure identity-based encryption (IBE) scheme with optimal leakage rate. To obtain our results, we develop a new quasi-adaptive non-interactive zero-knowledge (QA-NIZK) argument for the ciphertext consistency of the LR-CPA-secure schemes. Our ABE schemes are obtained by boosting the LR-CPA-security of some existing schemes to the LR-CCA2-security by using our QA-NIZK arguments. The schemes are almost as efficient as the underlying LR-CPA-secure schemes.
Kosei OZEKI Naofumi AOKI Saki ANAZAWA Yoshinori DOBASHI Kenichi IKEDA Hiroshi YASUDA
This study has developed a system that performs data communications using high frequency bands of sound signals. Unlike radio communication systems using advanced wireless devices, it only requires the legacy devices such as microphones and speakers employed in ordinary telephony communication systems. In this study, we have investigated the possibility of a machine learning approach to improve the recognition accuracy identifying binary symbols exchanged through sound media. This paper describes some experimental results evaluating the performance of our proposed technique employing a neural network as its classifier of binary symbols. The experimental results indicate that the proposed technique may have a certain appropriateness for designing an optimal classifier for the symbol identification task.
Motohiro SUNOUCHI Masaharu YOSHIOKA
This paper proposes new acoustic feature signatures based on the multiscale fractal dimension (MFD), which are robust against the diversity of environmental sounds, for the content-based similarity search. The diversity of sound sources and acoustic compositions is a typical feature of environmental sounds. Several acoustic features have been proposed for environmental sounds. Among them are the widely used Mel-Frequency Cepstral Coefficients (MFCCs), which describe frequency-domain features. However, in addition to these features in the frequency domain, environmental sounds have other important features in the time domain with various time scales. In our previous paper, we proposed the enhanced multiscale fractal dimension signature (EMFD) for environmental sounds. This paper extends EMFD by using the kernel density estimation method, which results in better performance in similarity search tasks. Furthermore, it newly proposes another acoustic feature signature based on MFD, namely the very-long-range multiscale fractal dimension signature (MFD-VL). The MFD-VL signature describes several features of the time-varying envelope for long periods of time. The MFD-VL signature has stability and robustness against background noise and small fluctuations in the parameters of sound sources, which are produced in field recordings. We discuss the effectiveness of these signatures in the similarity sound search by comparing them with acoustic features proposed in the DCASE 2018 challenges. Owing to the unique descriptiveness of our proposed signatures, we confirmed that the signatures are effective when they are used with other acoustic features.
Ryosuke NISHIHARA Hidehiko MATSUBAYASHI Tomomoto ISHIKAWA Kentaro MORI Yutaka HATA
The frequency of uterine peristalsis is closely related to the success rate of pregnancy. Ultrasonic imaging is almost always employed to measure this frequency, and the physician subjectively evaluates it from the ultrasound image with the naked eye. This paper aims to measure the frequency of uterine peristalsis from the ultrasound image. The ultrasound image consists of relative brightness values, and the contour of the uterus is not clear. It was not possible to measure the frequency using the inter-frame difference or optical flow, which are representative motion detection methods, because the uterine peristaltic movement is too small for them to be applied. This paper proposes a method for measuring the frequency of uterine peristalsis from the ultrasound image in the implantation phase. First, uterine peristalsis is traced semi-automatically from the images along the location axis and the time axis. Second, frequency analysis of the uterine peristalsis is performed by applying the Fourier transform to 3 minutes of data. The frequency of uterine peristalsis is then taken to be the dominant frequency component, that is, the one with the maximum value in the frequency spectrum. In this way, we quantitatively evaluate the frequency of uterine peristalsis from the ultrasound image. Finally, the success rate of pregnancy is calculated from the frequency based on fuzzy logic. This enabled us to evaluate the success rate of pregnancy by measuring the uterine peristalsis from the ultrasound image.
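The frequency-analysis step described above, picking the spectral component with the maximum value, can be sketched as follows. This is an illustrative sketch only: the function name, the 1 Hz sampling rate, and the synthetic trace in the usage note are assumptions, and the semi-automatic tracing and fuzzy-logic steps are not covered.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC DFT component."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]   # remove the DC offset
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):          # positive-frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n
```

For example, a synthetic 3-minute trace sampled at an assumed 1 Hz (180 samples) that contains a 0.05 Hz oscillation plus a weaker 0.2 Hz component yields a dominant frequency of 0.05 Hz, that is, 3 cycles per minute.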
Noriyuki TONAMI Keisuke IMOTO Ryosuke YAMANISHI Yoichi YAMASHITA
Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). The conventional methods address SED and ASC separately even though sound events and acoustic scenes are closely related to each other. For example, in the acoustic scene “office,” the sound events “mouse clicking” and “keyboard typing” are likely to occur. Therefore, it is expected that information on sound events and acoustic scenes will be of mutual aid for SED and ASC. In this paper, we propose multitask learning for joint analysis of sound events and acoustic scenes, in which the parts of the networks holding information on sound events and acoustic scenes in common are shared. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of SED and ASC by 1.31 and 1.80 percentage points in terms of the F-score, respectively, compared with the conventional CRNN-based method.
Yohei NAKAMURA Shinya KAJIYAMA Yutaka IGARASHI Takashi OSHIMA Taizo YAMAWAKI
3D ultrasound imagers require low-noise amplifiers (LNAs) with much lower power consumption and smaller chip area than conventional 2D imagers because of the huge number of transducer channels. This paper presents a low-power small-size LNA with a novel current-reuse circuitry for 3D ultrasound imaging systems. The proposed LNA is composed of a differential common source amplifier and a source-follower driver which share the current without using inductors. The LNA was fabricated in a 0.18-μm CMOS process and occupies only 0.0056mm². The measured results show a gain of 21dB and a bandwidth of 9MHz. The proposed LNA achieves an average noise density of 11.3nV/√Hz, and 2nd harmonic distortion below -40dBc with a 0.1-Vpp input. The supply current is 85μA with a 1.8-V power supply, which is competitive with conventional LNAs fabricated in finer CMOS processes.
A limited number of types of sound event occur in an acoustic scene and some sound events tend to co-occur in the scene; for example, the sound events “dishes” and “glass jingling” are likely to co-occur in the acoustic scene “cooking.” In this paper, we propose a method of sound event detection using graph Laplacian regularization with sound event co-occurrence taken into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the sound event co-occurrences. This graph representation is then utilized for the model training of sound event detection, which is optimized under an objective function with a regularization term considering the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets, and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves the performance of sound event detection by 7.9 percentage points compared with the conventional CNN-BiGRU-based detection method in terms of the segment-based F1 score. In particular, the experimental results indicate that the proposed method enables the detection of co-occurring sound events more accurately than the conventional method.
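The abstract above does not give the exact form of its regularization term, so the following sketch shows only the standard graph Laplacian quadratic form that such a term is commonly built on. The names `W` (a symmetric co-occurrence weight matrix) and `y` (per-event model outputs) are hypothetical and introduced here for illustration.

```python
def laplacian(W):
    """Graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def laplacian_penalty(W, y):
    """Quadratic form y^T L y.

    For symmetric W this equals half the sum over i, j of
    W[i][j] * (y[i] - y[j]) ** 2, so it is small when events that
    co-occur strongly receive similar outputs.
    """
    L = laplacian(W)
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))
```

With two events linked by an edge of weight 1 and a third unconnected event, the penalty is 0 when the linked events receive equal outputs and 1 when their outputs are 1 and 0. In training, such a term would be added to the detection loss with a weighting factor.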
Minseok KIM Tatsuki IWATA Shigenobu SASAKI Jun-ichi TAKADA
In radio channel measurements and modeling, directional scanning via highly directive antennas is the most popular method to obtain angular channel characteristics to develop and evaluate advanced wireless systems for high frequency band use. However, it is often insufficient for ray-/cluster-level characterizations because the angular resolution of the measured data is limited by the angular sampling interval over a given scanning angle range and antenna half power beamwidth. This study proposes the sub-grid CLEAN algorithm, a novel technique for high-resolution multipath component (MPC) extraction from the multi-dimensional power image, so called double-directional angular delay power spectrum. This technique can successfully extract the MPCs by using the multi-dimensional power image. Simulation and measurements showed that the proposed technique could extract MPCs for ray-/cluster-level characterizations and channel modeling. Further, applying the proposed method to the data captured at 58.5GHz in an atrium entrance hall environment which is an indoor hotspot access scenario in the fifth generation mobile system, the multipath clusters and corresponding scattering processes were identified.
Gengxin NING Shenjie JIANG Xuejin ZHAO Cui YANG
This paper presents a two-dimensional (2D) DOA algorithm for double L-shaped arrays. The algorithm is applied to the underwater environment to eliminate the performance error caused by sound speed uncertainty. By introducing a third-dimensional array, the algorithm eliminates the sound velocity variable in the depression angle expression, so that the DOA estimation no longer depends on the true value of the unknown sound velocity. In order to determine the parameters of the three-dimensional array, a parameter matching method with the double L-shaped array is also proposed. Simulations show that the proposed algorithm outperforms the conventional 2D-DOA estimation algorithm in environments with unknown sound velocity.
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
In this paper, we propose an image to sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded to the spectrogram to give speech intelligibility for the mapped sound. Specifically, we hold amplitude spectra of a speech signal with strong power and embed the image brightness in other frequency bands. Holding amplitude spectra of a speech signal with strong power preserves a speech spectral envelope and improves the speech quality of the mapped sound. The amplitude spectra of the mapped sound with weak power represent the image brightness, and then the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
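The basic mapping described above, treating an image as a spectrogram and inverting it, can be sketched in a few lines. This sketch makes several assumptions that are not in the abstract: zero phase, no frame overlap, a naive inverse DFT in place of an FFT, and hypothetical function names. The embedding of speech amplitude spectra is omitted entirely.

```python
import cmath
import math


def column_to_frame(magnitudes):
    """Inverse DFT of a zero-phase spectrum built from one image column."""
    half = len(magnitudes)            # bins 0 .. n/2
    n = 2 * (half - 1)                # length of the real-valued frame
    # Mirror the interior bins so that the spectrum is conjugate-symmetric.
    spectrum = list(magnitudes) + list(magnitudes[-2:0:-1])
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def frame_to_column(frame):
    """Magnitude spectrum of a frame, recovering the image column."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def image_to_sound(image):
    """Map an image (rows = frequency bins, columns = time) to frames."""
    return [column_to_frame(list(col)) for col in zip(*image)]
```

Because the brightness values are non-negative and the phase is zero, taking the magnitude spectrum of each frame returns the original column, which mirrors the reconstruction of the image from the mapped sound.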
Saya OHIRA Naoki TSUCHIYA Tetsuya MATSUMURA
We propose a three-dimensional (3D) sound processor architecture for consumer applications that includes super-directional modulation intellectual property (IP) and 3D sound processing IP. In addition, we propose an automatic design environment for the 3D sound processing IP. This processor can generate realistic small sound fields in arbitrary spaces using ultrasound. In particular, in the 3D sound processing IP, in order to reproduce 3D audio, it is necessary to reproduce the personal frequency characteristics of complex head related transfer functions. For this reason, we have constructed an automatic design environment with high reconfigurability. This automatic design environment is based on high-level synthesis, and it is possible to automatically generate a C-based algorithm simulator and automatically synthesize the IP hardware by inputting a parameter description file for filter design. This automatic design environment can reduce the design period to approximately 1/5 of that of conventional manual design. Applying the automatic design environment, a 3D sound processing IP was designed experimentally. The designed IP is sufficiently suitable for consumer applications from the viewpoints of hardware amount and power consumption.
Yingwei FU Kele XU Haibo MI Qiuqiang KONG Dezhi WANG Huaimin WANG Tie HONG
Sound event detection, which has widespread applications in real life, is intended to identify the sound events in audio recordings. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance in this task owing to their capability to learn representative features. However, CRNN models are highly complex, with millions of parameters to be trained, which limits their use on mobile and embedded devices with limited computational resources. Model distillation is effective for distilling the knowledge of a complex model into a smaller one, which can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that makes use of the knowledge from multiple teacher models which are complementary in detecting sound events. Extensive experimental results demonstrated that our approach achieves a compression ratio of about 50 times. In addition, better performance is obtained for the sound event detection task.
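One common way to distill from several teachers is to average their per-class probabilities and train the student against those soft targets. The sketch below illustrates that generic scheme under stated assumptions: the averaging rule, the sigmoid outputs (sound event detection is treated as multi-label), and all function names are illustrative and are not confirmed as the method of the letter above.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_soft_targets(teacher_logits):
    """Average the per-class probabilities predicted by all teachers."""
    probs = [[sigmoid(z) for z in logits] for logits in teacher_logits]
    return [sum(col) / len(probs) for col in zip(*probs)]


def distillation_loss(student_logits, soft_targets, eps=1e-12):
    """Mean binary cross-entropy between student outputs and soft targets."""
    total = 0.0
    for z, t in zip(student_logits, soft_targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(soft_targets)
```

In this sketch, two teachers that disagree symmetrically on a class produce a soft target of 0.5 for it, and a student whose output matches the soft targets incurs a lower loss than one that commits confidently to either side.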
Deng-Fong LU Chin HSIA Jian-Chiun LIOU Yen-Chung HUANG
Design of an equivalent slew-rate monolithic pulse generator using bipolar-CMOS-DMOS (BCD) technology for medical ultrasound transmitters is presented in this paper. The pulse generator employs a floating capacitive coupling level-shifter architecture to produce a high-voltage (Vpp=80V) output. The performance of equivalent slew-rate in the rising and falling edge is achieved by carefully choosing the value of coupling capacitors and the size of the final stage high-voltage MOSFETs of the pulse generator. The measured output pulses show the rising and falling time of 8.6nsec and 8.5nsec, respectively with second harmonic distortion down to -40dBc, indicating the designed pulse generator can be used for advanced ultrasonic harmonic imaging systems.