
Keyword Search Result

[Keyword] presence (12 hits)

Hits 1-12
  • A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target

    Lei WANG  Jie ZHU  Kangbo SUN  

    This paper has been cancelled due to violation of duplicate submission policy on IEICE Transactions on Information and Systems.
     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    1963-1970

    To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, a mask such as the Wiener gain can be estimated directly or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning in DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation in the main task. The domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weight parameters of the loss function, homoscedastic uncertainty is introduced to adaptively learn the weights, which is proven to outperform the fixed weighting method. Simulation results show that the proposed multi-task scheme improves the overall speech enhancement performance compared to the conventional single-task methods. The joint direct mask and SPP estimation yields the best performance among all the considered techniques.
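
The abstract above does not state how the homoscedastic uncertainty enters the loss. As a minimal sketch, the following assumes one widely used formulation for regression-type losses, in which each task i has a learnable log-variance s_i = log(sigma_i^2) and the combined loss is the sum of 0.5 * (exp(-s_i) * L_i + s_i). The function name and argument names are hypothetical and are not taken from the paper.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learnable log-variances (assumed formulation).

    task_losses: per-task loss values L_i (e.g. mask loss, SPP loss).
    log_vars:    per-task s_i = log(sigma_i^2); in training these would be
                 parameters updated by the optimizer alongside the network.
    Returns sum_i 0.5 * (exp(-s_i) * L_i + s_i).
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("one log-variance per task is required")
    return sum(
        0.5 * (math.exp(-s) * loss + s)
        for loss, s in zip(task_losses, log_vars)
    )


# With s_i = 0 every task gets weight 0.5, which mimics fixed equal weighting.
fixed = uncertainty_weighted_loss([4.0, 1.0], [0.0, 0.0])
# Raising s for the noisier task lowers its weight at the cost of the +s penalty.
adapted = uncertainty_weighted_loss([4.0, 1.0], [math.log(4.0), 0.0])
```

Under this assumed form, the value of s_i that minimizes the combined loss for a fixed L_i is log(L_i), so tasks with larger losses are down-weighted automatically rather than through hand-tuned weights.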

  • Evaluating the Influence of Country-Related Pictures on the Perception of a Foreign Online Store

    Vanessa BRACAMONTE  Hitoshi OKADA  

     
    PAPER

      Publicized:
    2015/10/21
      Vol:
    E99-D No:1
      Page(s):
    111-119

    The sense of presence, that is, the sense of the website being psychologically transported to the consumer, has been identified as an important factor for bringing back the feeling of sociability and physicality that is lost in online shopping. Previous research has investigated how visual content in the design can influence the sense of presence in a website, but the focus has been limited to the domestic electronic commerce context. In this paper, we conduct an experimental study in a cross-border electronic commerce context to evaluate the effect of country-related pictures on the perception of country presence, visual appeal and trust in a foreign online store. Two experimental conditions were considered: country-related pictures and generic pictures, each one evaluated for Thai and Singaporean websites. It was hypothesized that country-related content in pictures included in the design of the foreign online store would result in a higher level of country presence, and that this would in turn result in higher visual appeal and trust in the website. We conducted a survey among Japanese online consumers, with a total of 1991 participants. The subjects were randomly assigned to four groups corresponding to the combination of country-of-origin of the website and picture condition. We used structural equation modeling to test the proposed hypotheses. The results showed that for both the Thai and Singaporean websites, country-related pictures resulted in higher country presence, and visual appeal was positively influenced by this increase in country presence. However, country presence did not have a direct effect on trust; this effect was completely mediated by visual appeal. We discuss these results and their implications for cross-border electronic commerce.

  • Development of an Estimation Model for Instantaneous Presence in Audio-Visual Content

    Kenji OZAWA  Shota TSUKAHARA  Yuichiro KINOSHITA  Masanori MORISE  

     
    PAPER

      Publicized:
    2015/10/21
      Vol:
    E99-D No:1
      Page(s):
    120-127

    The sense of presence is often used to evaluate the performances of audio-visual (AV) content and systems. However, a presence meter has yet to be realized. We consider that the sense of presence can be divided into two aspects: system presence and content presence. In this study we focused on content presence. To estimate the overall presence of a content item, we have developed estimation models for the sense of presence in audio-only and audio-visual content. In this study, the audio-visual model is expanded to estimate the instantaneous presence in an AV content item. Initially, we conducted an evaluation experiment of the presence with 40 content items to investigate the relationship between the features of the AV content and the instantaneous presence. Based on the experimental data, a neural-network-based model was developed by expanding the previous model. To express the variation in instantaneous presence, 6 audio-related features and 14 visual-related features, which are extracted from the content items in 500-ms intervals, are used as inputs for the model. The audio-related features are loudness, sharpness, roughness, dynamic range and standard deviation in sound pressure levels, and movement of sound images. The visual-related features involve hue, lightness, saturation, and movement of visual images. After constructing the model, a generalization test confirmed that the model is sufficiently accurate to estimate the instantaneous presence. Hence, the model should contribute to the development of a presence meter.

  • Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model

    Yonggang HU  Xiongwei ZHANG  Xia ZOU  Gang MIN  Meng SUN  Yunfei ZHENG  

     
    LETTER-Speech and Hearing

      Vol:
    E98-A No:12
      Page(s):
    2701-2704

    The conventional non-negative matrix factorization (NMF)-based speech enhancement is accomplished by updating iteratively with the prior knowledge of the clean speech and noise spectra bases. With the probabilistic estimation of whether the speech is present or not in a certain frame, this letter proposes a speech enhancement algorithm that incorporates the speech presence probability (SPP) obtained via noise estimation into the NMF process. To take advantage of both the NMF-based and statistical model-based approaches, the final enhanced speech is obtained by applying a statistical model-based filter to the output of the SPP-weighted NMF. Objective evaluations using perceptual evaluation of speech quality (PESQ) on TIMIT with 20 noise types at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed algorithm over the conventional NMF and statistical model-based baselines.

  • Instantaneous Evaluation of the Sense of Presence in Audio-Visual Content

    Kenji OZAWA  Shota TSUKAHARA  Yuichiro KINOSHITA  Masanori MORISE  

     
    PAPER

      Vol:
    E98-D No:1
      Page(s):
    49-57

    The sense of presence is crucial for evaluating the performance of audio-visual (AV) equipment and content. Previously, the overall presence was evaluated for a set of AV content items by asking subjects to judge the presence of the entire content item. In this study, the sense of presence is evaluated for a time-series using the method of continuous judgment by category. Specifically, the audio signals of 40 content items with durations of approximately 30 s each were recorded with a dummy head, and then presented as stimuli to subjects via headphones. The corresponding visual signals were recorded using a video camera in the full-HD format, and reproduced on a 65-inch display. In the experiments, 20 subjects evaluated the instantaneous sense of presence of each item on a seven-point scale under two conditions: audio-only or audio-visual. At the end of the time-series, the subjects also evaluated the overall presence of the item on a seven-category scale. Based on these results, the effects of visual information on the sense of presence were examined. The overall presence is highly correlated with the ten-percentile exceeded presence score, S10, which is the score that is exceeded for 10% of the time during the responses. Based on the instantaneous presence data in this study, we are one step closer to our ultimate goal of developing a real-time operational presence meter.
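
The S10 score described above can be illustrated with a short sketch. It assumes the instantaneous judgments are available as a list of equally spaced samples and uses a "met or exceeded" convention without interpolation; both are assumptions, since the abstract does not specify the sampling or tie-handling rules. The function name is hypothetical.

```python
def percentile_exceeded_score(scores, exceed_percent=10):
    """Return the score that is met or exceeded for exceed_percent % of the samples.

    scores:         instantaneous presence judgments, one per time step.
    exceed_percent: integer percentage of time (10 gives an S10-style score).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < exceed_percent <= 100:
        raise ValueError("exceed_percent must be in (0, 100]")
    ordered = sorted(scores, reverse=True)
    # Integer ceiling of n * percent / 100, at least one sample.
    k = max(1, (len(ordered) * exceed_percent + 99) // 100)
    return ordered[k - 1]


# 20 time steps on a seven-point scale: mostly 4, briefly 6 and 7.
responses = [4] * 16 + [6] * 2 + [7] * 2
s10 = percentile_exceeded_score(responses)
```

Because S10 tracks the upper tail of the responses, short high-presence passages influence it much more than the mean does.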

  • Noise Spectrum Estimation Based on SNR Discrepancy for Speech Enhancement

    Atanu SAHA  Tetsuya SHIMAMURA  

     
    LETTER-Speech and Hearing

      Vol:
    E97-D No:2
      Page(s):
    373-377

    This letter proposes a noise spectrum estimation algorithm for speech enhancement. The algorithm incorporates the speech presence probability, which is calculated from SNR (signal-to-noise ratio) discrepancy. The discrepancy is measured based on the estimation of the a priori and a posteriori SNR. The proposed algorithm is found to be effective in rapidly switched noise environments. This is confirmed by the experimental results which indicate that the proposed algorithm when integrated in a speech enhancement scheme performs better than conventional noise estimation algorithms.
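
For reference, the two SNR quantities named above are usually defined per frequency bin k as follows. The symbols are assumptions, not notation from the letter: Y_k is the noisy spectral coefficient, and lambda_x(k) and lambda_d(k) are the speech and noise variances. The letter's actual discrepancy measure is not given in the abstract and is not reproduced here.

```latex
\gamma_k = \frac{|Y_k|^2}{\lambda_d(k)}, \qquad
\xi_k = \frac{\lambda_x(k)}{\lambda_d(k)}, \qquad
E\left[\gamma_k\right] = \xi_k + 1
```

The last relation assumes uncorrelated speech and noise. It shows why the two quantities behave differently: in noise-only bins the a posteriori SNR fluctuates around 1 while the a priori SNR is close to 0.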

  • Improved Speech-Presence Uncertainty Estimation Based on Spectral Gradient for Global Soft Decision-Based Speech Enhancement

    Jong-Woong KIM  Joon-Hyuk CHANG  Sang Won NAM  Dong Kook KIM  Jong Won SHIN  

     
    LETTER-Speech and Hearing

      Vol:
    E96-A No:10
      Page(s):
    2025-2028

    In this paper, we propose a speech-presence uncertainty estimation to improve the global soft decision-based speech enhancement technique by using the spectral gradient scheme. The conventional soft decision-based speech enhancement technique uses a fixed ratio (Q) of the a priori speech-presence and speech-absence probabilities to derive the speech-absence probability (SAP). However, we attempt to adaptively change Q according to the spectral gradient between the current and past frames as well as the status of the voice activity in the previous two frames. As a result, distinct values of Q are assigned to each frequency in each frame in order to improve the performance of the SAP by robustly tracking the a priori information of the speech presence over time.
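
The role of Q in the SAP can be made explicit with Bayes' rule. The notation is an assumption for illustration: H_0 and H_1 denote speech absence and presence, Y is the observed noisy spectrum, and Lambda(Y) is the likelihood ratio. The paper's exact per-frequency or global form is not given in the abstract.

```latex
Q = \frac{P(H_1)}{P(H_0)}, \qquad
\Lambda(Y) = \frac{p(Y \mid H_1)}{p(Y \mid H_0)}, \qquad
P(H_0 \mid Y)
  = \frac{p(Y \mid H_0)\,P(H_0)}{p(Y \mid H_0)\,P(H_0) + p(Y \mid H_1)\,P(H_1)}
  = \frac{1}{1 + Q\,\Lambda(Y)}
```

A larger Q lowers the SAP for the same observation, which is why adapting Q per frequency and frame changes the soft decision.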

  • imCast: Studio-Quality Digital Media Platform Exploiting Broadband IP Networks

    Jinyong JO  JongWon KIM  

     
    PAPER-Educational Technology

      Vol:
    E93-D No:5
      Page(s):
    1214-1224

    The recent growth in available network bandwidth points to the widespread use of broadband applications such as uncompressed HD-SDI (High-definition serial digital interface) over IP. These cutting-edge applications are also driving the development of a media-oriented infrastructure for networked collaboration. This paper introduces imCast, a high-quality digital media platform dealing with uncompressed HD-SDI over IP, and discusses its internal architecture in depth. imCast mainly provides cost-effective hardware-based approaches for high-quality media acquisition and presentation, provides flexible software-based approaches for presentation, and allows for economical network transmission. Experimental results (taken over best-effort IP networks) demonstrate the functional feasibility and performance of imCast.

  • A Media Access Protocol for Proactive Presence Discovery in Ubiquitous Wireless Networks

    Pavel POUPYREV  Peter DAVIS  Hiroyuki MORIKAWA  

     
    PAPER-Network

      Vol:
    E91-B No:11
      Page(s):
    3639-3647

    This paper proposes a MAC protocol for presence information discovery in ubiquitous networks. The proposed protocol is designed for proactive discovery in which wireless devices periodically broadcast packets containing presence information. The protocol is based on Framed Aloha. The objective of the protocol is to assure the discovery time of single-hop neighbors while considering wireless collisions and power consumption. In this paper, we show that the proposed protocol is able to assure a specified discovery time in distributed networks with random topology.
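
To give a feel for how collisions affect discovery time in Framed Aloha, here is a minimal sketch of the textbook model, not the paper's protocol. It assumes every device broadcasts exactly once per frame in a slot chosen uniformly at random, that frames are independent, and that a packet is received only if no other device picked the same slot. Power-saving behavior is ignored, and all function names are hypothetical.

```python
import math


def slot_success_probability(num_devices, slots_per_frame):
    """Probability that one device's broadcast is collision-free in a frame."""
    if num_devices < 1 or slots_per_frame < 1:
        raise ValueError("num_devices and slots_per_frame must be >= 1")
    # Each of the other devices avoids our slot with probability 1 - 1/N.
    return (1.0 - 1.0 / slots_per_frame) ** (num_devices - 1)


def frames_for_discovery(num_devices, slots_per_frame, target_probability):
    """Frames needed so a given neighbor is discovered with the target probability."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be in (0, 1)")
    p = slot_success_probability(num_devices, slots_per_frame)
    if p == 0.0:
        raise ValueError("every frame collides; use more slots per frame")
    if p == 1.0:
        return 1
    # Solve 1 - (1 - p)^k >= target for the smallest integer k.
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - p))


# Two devices sharing a two-slot frame collide half of the time.
frames_needed = frames_for_discovery(2, 2, 0.99)
```

Under these assumptions, lengthening the frame raises the per-frame success probability but also lengthens each frame, so the guaranteed discovery time involves a trade-off between the two.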

  • Audio Narrowcasting and Privacy for Multipresent Avatars on Workstations and Mobile Phones

    Owen Noel Newton FERNANDO  Kazuya ADACHI  Uresh DUMINDUWARDENA  Makoto KAWAGUCHI  Michael COHEN  

     
    PAPER

      Vol:
    E89-D No:1
      Page(s):
    73-87

    Our group is exploring interactive multi- and hypermedia, especially applied to virtual and mixed reality multimodal groupware systems. We are researching user interfaces to control source→sink transmissions in synchronous groupware (like teleconferences, chatspaces, virtual concerts, etc.). We have developed two interfaces for privacy visualization of narrowcasting (selection) functions in collaborative virtual environments (CVEs): one for a workstation WIMP (windows/icon/menu/pointer) GUI (graphical user interface), and one for networked mobile devices, namely 2.5- and 3rd-generation mobile phones. The interfaces are integrated with other CVE clients, interoperating with a heterogeneous multimodal groupware suite, including stereographic panoramic browsers and spatial audio backends and speaker arrays. The narrowcasting operations comprise an idiom for selective attention, presence, and privacy: an infrastructure for rich conferencing capability.

  • FieldCast: Peer-to-Peer Presence Information Exchange in Ubiquitous Computing Environment

    Katsunori MATSUURA  Yoshitsugu TSUCHIYA  Tsuyoshi TOYONO  Kenji TAKAHASHI  

     
    PAPER-Protocols, Applications and Services

      Vol:
    E87-D No:12
      Page(s):
    2610-2617

    Availability of network access "anytime and anywhere" will impose new requirements on presence services: server load sharing and privacy protection. In such cases, presence services would have to deal with sensor device information with maximum consideration of users' privacy. In this paper, we propose FieldCast, a peer-to-peer system architecture for presence information exchange in a ubiquitous computing environment. According to our proposal, presence information is exchanged directly among a user's own computing resources. We present evaluation results that demonstrate the feasibility of our proposal.

  • A High Presence Shared Space Communication System Using 2D Background and 3D Avatar

    Kyohei YOSHIKAWA  Takashi MACHIDA  Kiyoshi KIYOKAWA  Haruo TAKEMURA  

     
    INVITED PAPER

      Vol:
    E87-D No:12
      Page(s):
    2532-2539

    Displaying a 3D geometric model of a user in real time is an advantage for a telecommunication system because depth information is useful for nonverbal communication such as finger-pointing and gesturing that contain 3D information. However, the range image acquired by a rangefinder suffers from errors due to image noise and distortions in depth measurement. On the other hand, a 2D image is free from such errors. In this paper, we propose a new method for a shared space communication system that combines the advantages of both 2D and 3D representations. A user is represented as a 3D geometric model in order to exchange nonverbal communication cues. A background is displayed as a 2D image to give the user adequate information about the environment of the remote site. Additionally, a high-resolution texture taken by a video camera is projected onto the 3D geometric model of the user. This is done because the low resolution of the image acquired by the rangefinder makes it difficult to exchange facial expressions. Furthermore, to fill in the data occluded by the user, old pixel values are used for the user area in the 2D background image. We have constructed a prototype of a high presence shared space communication system based on our method. Through a number of experiments, we have found that our method is more effective for telecommunication than a method with only a 2D or 3D representation.