Keyword Search Result

[Keyword] attention(111hit)

81-100hit(111hit)

  • A Unified Neural Network for Quality Estimation of Machine Translation

    Maoxi LI  Qingyu XIANG  Zhiming CHEN  Mingwen WANG  

     
    LETTER-Natural Language Processing

      Publicized:
    2018/06/18
      Vol:
    E101-D No:9
      Page(s):
    2417-2421

    The state-of-the-art neural model for quality estimation (QE) of machine translation consists of two sub-networks that are tuned separately: a bidirectional recurrent neural network (RNN) encoder-decoder trained for neural machine translation, called the predictor, and an RNN trained for sentence-level QE tasks, called the estimator. We propose combining the two sub-networks into a single neural network, called the unified neural network. During training, the bidirectional RNN encoder-decoder is initialized and pre-trained on a bilingual parallel corpus, and then the networks are trained jointly to minimize the mean absolute error over the QE training samples. Compared with the predictor-estimator approach, the unified neural network helps to learn parameters that are better suited to the QE task. Experimental results on the benchmark data set of the WMT17 sentence-level QE shared task show that the proposed unified neural network approach consistently outperforms the predictor-estimator approach and significantly outperforms the other baseline QE approaches.

  • Improve Multichannel Speech Recognition with Temporal and Spatial Information

    Yu ZHANG  Pengyuan ZHANG  Qingwei ZHAO  

     
    LETTER-Speech and Hearing

      Publicized:
    2018/04/06
      Vol:
    E101-D No:7
      Page(s):
    1963-1967

    In this letter, we explore the use of spatio-temporal information in one unified framework to improve the performance of multichannel speech recognition. Generalized cross correlation (GCC) serves as spatial feature compensation, and an attention mechanism across time is embedded within long short-term memory (LSTM) neural networks. Experiments on the AMI meeting corpus show that the proposed method provides an 8.2% relative improvement in word error rate (WER) over the model trained directly on the concatenation of multiple microphone outputs.
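    As a reading aid for the figure quoted above, the following is a minimal sketch of how WER and a relative WER improvement are conventionally computed. The function names and the sample numbers are illustrative assumptions; the abstract does not report the absolute WER values behind the 8.2% figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error that the improved system removes."""
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the arithmetic.
baseline, improved = 50.0, 45.9
print(f"{relative_improvement(baseline, improved):.1%}")
```

    A relative improvement is measured against the baseline error, so the same percentage corresponds to different absolute WER gaps depending on how high the baseline is.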

  • Cyber-Physical Hybrid Environment Using a Largescale Discussion System Enhances Audiences' Participation and Satisfaction in the Panel Discussion

    Satoshi KAWASE  Takayuki ITO  Takanobu OTSUKA  Akihisa SENGOKU  Shun SHIRAMATSU  Tokuro MATSUO  Tetsuya OISHI  Rieko FUJITA  Naoki FUKUTA  Katsuhide FUJITA  

     
    PAPER-Creativity Support Systems and Decision Support Systems

      Publicized:
    2018/01/19
      Vol:
    E101-D No:4
      Page(s):
    847-855

    Performance based on multi-party discussion has been reported to be superior to that based on individuals. However, it is impossible for all participants to express opinions simultaneously because of the time and space limitations of a large-scale discussion. In particular, only a few representative discussants and audience members can speak in conventional unidirectional discussions (e.g., panel discussions), although many participants gather for the discussion. To solve these problems, in this study, we propose a cyber-physical discussion using “COLLAGREE,” which we developed for building consensus in large-scale online discussions. COLLAGREE is equipped with functions such as a facilitator, a point ranking system, and a display of the discussion in a tree structure. We focused on the relationship between satisfaction with the discussion and participants' desire to express opinions. We conducted the experiment in the panel discussion of an actual international conference. Participants who were audience members on the floor used COLLAGREE during the panel discussion. They responded to questionnaires after the experiment. The main findings are as follows: (1) Participation in the online discussion was associated with the satisfaction of the participants; (2) Participants who actively desired to express opinions joined the cyber-space discussion; and (3) The satisfaction of participants who expressed opinions in the cyber-space discussion was higher than that of participants who expressed opinions in the real-space discussion and that of participants who did not express opinions in either the cyber- or real-space discussion. Overall, active behaviors in the cyber-space discussion were associated with participants' satisfaction with the entire discussion, suggesting that cyberspace provided useful alternative opportunities to express opinions for audience members who would otherwise listen passively to conventional unidirectional discussions. In addition, a complementary relationship exists between participation in the cyber-space and real-space discussions. These findings can serve to create a user-friendly discussion environment.

  • Deep Attention Residual Hashing

    Yang LI  Zhuang MIAO  Ming HE  Yafei ZHANG  Hang LI  

     
    LETTER-Image

      Vol:
    E101-A No:3
      Page(s):
    654-657

    How to represent images as highly compact binary codes is a critical issue in many computer vision tasks. Existing deep hashing methods typically focus on designing loss functions by using pairwise or triplet labels. However, these methods ignore the attention mechanism in the human visual system. In this letter, we propose a novel Deep Attention Residual Hashing (DARH) method, which directly learns hash codes based on a simple pointwise classification loss function. Compared to previous methods, our method does not need to generate all possible pairwise or triplet labels from the training dataset. Specifically, we develop a new type of attention layer which can learn human eye fixation and significantly improves the representation ability of hash codes. In addition, we embed the attention layer into the residual network to simultaneously learn discriminative image features and hash codes in an end-to-end manner. Extensive experiments on standard benchmarks demonstrate that our method preserves the instance-level similarity and outperforms state-of-the-art deep hashing methods in the image retrieval application.

  • An Attention-Based Hybrid Neural Network for Document Modeling

    Dengchao HE  Hongjun ZHANG  Wenning HAO  Rui ZHANG  Huan HAO  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2017/03/21
      Vol:
    E100-D No:6
      Page(s):
    1372-1375

    The purpose of document modeling is to learn low-dimensional semantic representations of text accurately for Natural Language Processing tasks. In this paper, we propose a novel attention-based hybrid neural network model that extracts semantic features of text hierarchically. Concretely, our model adopts a bidirectional LSTM module with word-level attention to extract semantic information for each sentence in the text and subsequently learns high-level features via a dynamic convolutional neural network module. Experimental results demonstrate that our proposed approach is effective and achieves better performance than conventional methods.

  • Image Modification Based on Spatial Frequency Components for Visual Attention Retargeting

    Hironori TAKIMOTO  Syuhei HITOMI  Hitoshi YAMAUCHI  Mitsuyoshi KISHIHARA  Kensuke OKUBO  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2017/03/15
      Vol:
    E100-D No:6
      Page(s):
    1339-1349

    It is estimated that 80% of the information entering the human brain is obtained through the eyes. Therefore, it is commonly believed that drawing human attention to particular objects is effective in assisting human activities. In this paper, we propose a novel image modification method for guiding user attention to specific regions of interest by using a novel saliency map model based on spatial frequency components. We modify the frequency components on the basis of the obtained saliency map to decrease the visual saliency outside the specified region. By applying our modification method to an image, human attention can be guided to the specified region because the saliency inside the region is higher than that outside the region. Using gaze measurements, we show that the proposed saliency map matches well with the distribution of actual human attention. Moreover, we evaluate the effectiveness of the proposed modification method by using an eye tracking system.

  • Top-Down Visual Attention Estimation Using Spatially Localized Activation Based on Linear Separability of Visual Features

    Takatsugu HIRAYAMA  Toshiya OHIRA  Kenji MASE  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2015/09/10
      Vol:
    E98-D No:12
      Page(s):
    2308-2316

    Intelligent information systems captivate people's attention. Examples of such systems include driving support vehicles capable of sensing driver state and communication robots capable of interacting with humans. Modeling how people search for visual information is indispensable for designing these kinds of systems. In this paper, we focus on human visual attention, which is closely related to visual search behavior. We propose a computational model to estimate human visual attention during a visual target search task. Existing models estimate visual attention using the ratio between a representative value of a visual feature of a target stimulus and that of distractors or background. These models, however, often fail to perform well on difficult search tasks that require a sequential spotlighting process. For such tasks, the linear separability effect of a visual feature distribution should be considered. Hence, we introduce this effect to spatially localized activation. Concretely, our top-down model estimates target-specific visual attention using Fisher's variance ratio between a visual feature distribution of a local region in the field of view and that of a target stimulus. We confirm the effectiveness of our computational model through a visual search experiment.
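    The separability measure named above can be sketched in its common one-dimensional form: the squared difference of the two means divided by the sum of the two variances. This form, the function name, and the sample feature values are assumptions made for illustration; the abstract does not give the authors' exact formulation or how the ratio is converted into an attention value.

```python
from statistics import fmean, pvariance


def fisher_variance_ratio(region: list[float], target: list[float]) -> float:
    """Squared mean gap between two 1-D samples over their summed variances."""
    mean_gap = fmean(region) - fmean(target)
    spread = pvariance(region) + pvariance(target)
    if spread == 0.0:
        # Both samples are constant: separable only if their values differ.
        return float("inf") if mean_gap != 0.0 else 0.0
    return mean_gap ** 2 / spread


# Hypothetical hue samples, not data from the paper.
target_hue = [0.50, 0.52, 0.48, 0.51]
similar_region = [0.49, 0.51, 0.50, 0.52]
distinct_region = [0.10, 0.12, 0.09, 0.11]

print(fisher_variance_ratio(similar_region, target_hue))
print(fisher_variance_ratio(distinct_region, target_hue))
```

    A small ratio means the local region's feature distribution is hard to separate linearly from the target's, while a large ratio means the two are easy to tell apart.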

  • Image Modification Based on a Visual Saliency Map for Guiding Visual Attention

    Hironori TAKIMOTO  Tatsuhiko KOKUI  Hitoshi YAMAUCHI  Mitsuyoshi KISHIHARA  Kensuke OKUBO  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2015/08/13
      Vol:
    E98-D No:11
      Page(s):
    1967-1975

    It is commonly believed that, to improve interaction between humans and electronic devices, it is effective to draw the viewer's attention to a particular object. Augmented reality (AR) applications can call attention to real objects by overlaying highlight effects or visual stimuli (such as arrows) on a physical scene. Sometimes, more subtle effects would be desirable, in which case it would be necessary to smoothly and naturally guide the user's gaze without external stimuli. Here, a novel image modification method is proposed for directing a viewer's gaze to specific regions of interest. The proposed method uses saliency analysis and color modulation to create modified images in which the region of interest is the most salient region in the entire image. The proposed saliency map model that is used during saliency analysis reduces computational costs and improves the naturalness of the image using the LAB color space and simplified normalization. During color modulation, the modulation value of each LAB component is determined in order to consider the relationship between the LAB components and the saliency value. With the image obtained in this manner, the viewer's attention is smoothly and naturally attracted to a specific region. Gaze measurements as well as subjective experiments were conducted to verify the effectiveness of the proposed method. These results show that a viewer's visual attention is indeed attracted toward the specified region without any sense of discomfort or disruption when the proposed method is used.

  • A Salient Feature Extraction Algorithm for Speech Emotion Recognition

    Ruiyu LIANG  Huawei TAO  Guichen TANG  Qingyun WANG  Li ZHAO  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/05/29
      Vol:
    E98-D No:9
      Page(s):
    1715-1718

    A salient feature extraction algorithm is proposed to improve the speech emotion recognition rate. Firstly, the spectrogram of the emotional speech is calculated. Secondly, imitating the selective attention mechanism, the color, direction and brightness maps of the spectrogram are computed. Each map is normalized and down-sampled to form a low-resolution feature matrix. Then, each feature matrix is converted to a row vector, and principal component analysis (PCA) is used to reduce feature redundancy to make the subsequent classification algorithm more practical. Finally, the speech emotion is classified with a support vector machine. Compared with traditional features, the improvement in recognition rate reaches 15%.

  • Selective Attention Mechanisms for Visual Quality Assessment

    Ulrich ENGELKE  

     
    INVITED PAPER

      Vol:
    E98-A No:8
      Page(s):
    1681-1688

    Selective visual attention is an integral mechanism of the human visual system that is often neglected when designing perceptually relevant image and video quality metrics. Disregarding attention mechanisms assumes that all distortions in the visual content have an equal impact on the overall quality perception, which is typically not the case. Over the past years we have performed several experiments to study the effect of visual attention on quality perception. In addition to gaining a deeper scientific understanding of this matter, we were also able to use this knowledge to further improve various quality prediction models. In this article, I review our work with the aim of raising awareness of the importance of visual attention mechanisms for the effective design of quality prediction models.

  • Hybrid Integration of Visual Attention Model into Image Quality Metric

    Chanho JUNG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2014/08/22
      Vol:
    E97-D No:11
      Page(s):
    2971-2973

    Integrating the visual attention (VA) model into an objective image quality metric is a rapidly evolving area in modern image quality assessment (IQA) research due to the significant opportunities the VA information presents. So far, the literature has suggested using either a task-free saliency map or a quality-task one for integration into a quality metric. A hybrid integration approach which takes advantage of both saliency maps is presented in this paper. We compare our hybrid integration scheme with existing integration schemes using simple quality metrics. Results show that the proposed method performs better than the previous techniques in terms of prediction accuracy.

  • Distribution of Attention in Augmented Reality: Comparison between Binocular and Monocular Presentation Open Access

    Akihiko KITAMURA  Hiroshi NAITO  Takahiko KIMURA  Kazumitsu SHINOHARA  Takashi SASAKI  Haruhiko OKUMURA  

     
    INVITED PAPER

      Vol:
    E97-C No:11
      Page(s):
    1081-1088

    This study investigated the distribution of attention to frontal space in augmented reality (AR). We conducted two experiments to compare binocular and monocular observation when an AR image was presented. According to a previous study, when participants observed an AR image in monocular presentation, they perceived the AR image as more distant than in binocular vision. Therefore, we predicted that attention would need to be shifted between the AR image and the background in the binocular observation but not in the monocular one. This would enable an observer to distribute his/her visual attention across a wider space in the monocular observation. In the experiments, participants performed two tasks concurrently to measure the size of the useful field of view (UFOV). One task was letter/number discrimination in which an AR image was presented in the central field of view (the central task). The other task was luminance change detection in which dots were presented in the peripheral field of view (the peripheral task). A depth difference existed between the AR image and the location of the peripheral task in Experiment 1 but not in Experiment 2. The results of Experiment 1 indicated that the UFOV became wider in the monocular observation than in the binocular observation. In Experiment 2, the size of the UFOV in the monocular observation was equivalent to that in the binocular observation. It becomes difficult for a participant to observe the stimuli on the background in the binocular observation when there is a depth difference between the AR image and the background. These results indicate that monocular presentation in AR is superior to binocular presentation, and that even under the condition most favorable to binocular presentation, monocular presentation is equivalent to it in terms of the UFOV.

  • Salient Region Detection Based on Color Uniqueness and Color Spatial Distribution

    Xing ZHANG  Keli HU  Lei WANG  Xiaolin ZHANG  Yingguan WANG  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E97-D No:7
      Page(s):
    1933-1936

    In this study, we address the problem of salient region detection. Recently, saliency detection with contrast-based approaches has been shown to give promising results. However, different individual features exhibit different performance. In this paper, we show that the combination of color uniqueness and color spatial distribution is an effective way to detect saliency. A Color Adaptive Thresholding Watershed Fusion Segmentation (CAT-WFS) method is first given to retain boundary information and delete unnecessary details. Based on the segmentation, color uniqueness and color spatial distribution are defined separately. The color uniqueness denotes the color rareness of the salient object, while the color spatial distribution represents the color attribute of the background. Aiming at highlighting the salient object and downplaying the background, we combine the two characteristics to generate the final saliency map. Experimental results demonstrate that the proposed algorithm outperforms existing salient object detection methods.

  • Indoor Scene Classification Based on the Bag-of-Words Model of Local Feature Information Gain

    Rong WANG  Zhiliang WANG  Xirong MA  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E96-D No:4
      Page(s):
    984-987

    For the problem of Indoor Home Scene Classification, this paper proposes the BOW Model of Local Feature Information Gain. The experimental results show that performance is improved and computation is reduced. Consequently, this method outperforms the state-of-the-art approach.

  • Computational Models of Human Visual Attention and Their Implementations: A Survey Open Access

    Akisato KIMURA  Ryo YONETANI  Takatsugu HIRAYAMA  

     
    INVITED SURVEY PAPER

      Vol:
    E96-D No:3
      Page(s):
    562-578

    Humans can instantaneously detect the regions in a visual scene that are most likely to contain something of interest. Exploiting this pre-selection mechanism, called visual attention, in image and video processing systems would make them more sophisticated and therefore more useful. This paper briefly describes various computational models of human visual attention and their development, as well as related psychophysical findings. In particular, our objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints such as the main objective, the use of additional cues and mathematical principles. This survey finally discusses possible future directions for research into human visual attention and saliency computation.

  • Skeleton Modulated Topological Perception Map for Rapid Viewpoint Selection

    Zhenfeng SHI  Liyang YU  Ahmed A. ABD EL-LATIF  Xiamu NIU  

     
    LETTER-Computer Graphics

      Vol:
    E95-D No:10
      Page(s):
    2585-2588

    Incorporating insights from human visual perception into 3D object processing has become an important research field in computer graphics during the past decades. Many computational models for different applications have been proposed, such as mesh saliency, mesh roughness and mesh skeleton. In this letter, we present a novel Skeleton Modulated Topological Visual Perception Map (SMTPM) integrated with visual attention and visual masking mechanisms. A new skeletonisation map is presented and used to modulate the weights of saliency and roughness. Inspired by salient viewpoint selection, we also propose a rapid viewpoint selection algorithm that is based on Loop subdivision stencil decisions and uses our new visual perception map. Experimental results show that the SMTPM scheme can capture richer visual perception information and that our rapid viewpoint selection achieves high efficiency.

  • Global-Context Based Salient Region Detection in Nature Images

    Hong BAO  De XU  Yingjun TANG  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E95-D No:5
      Page(s):
    1556-1559

    Visual saliency detection provides an alternative methodology for image description in many applications such as adaptive content delivery and image retrieval. One of the main aims of visual attention in computer vision is to detect and segment the salient regions in an image. In this paper, we employ matrix decomposition to detect salient objects in natural images. To efficiently eliminate high-contrast noise regions in the background, we integrate global context information into saliency detection. Therefore, the most salient region can be easily selected as the one which is globally most isolated. The proposed approach intrinsically provides an alternative methodology to model attention with low implementation complexity. Experiments show that our approach achieves much better performance than existing state-of-the-art methods.

  • A Novel Bayes' Theorem-Based Saliency Detection Model

    Xin HE  Huiyun JING  Qi HAN  Xiamu NIU  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E94-D No:12
      Page(s):
    2545-2548

    We propose a novel saliency detection model based on Bayes' theorem. The model integrates the two parts of Bayes' equation to measure saliency, each part of which was considered separately in previous models. The proposed model measures saliency by computing a local kernel density estimate of features in the center-surround region and a global kernel density estimate of features at each pixel across the whole image. Under the proposed model, a saliency detection method is presented that extracts the DCT (Discrete Cosine Transform) magnitude of the local region around each pixel as the feature. Experiments show that the proposed model not only performs competitively on psychological patterns and better than the current state-of-the-art models on human visual fixation data, but is also robust against signal uncertainty.

  • A Novel Saliency-Based Graph Learning Framework with Application to CBIR

    Hong BAO  Song-He FENG  De XU  Shuoyan LIU  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E94-D No:6
      Page(s):
    1353-1356

    Localized content-based image retrieval (LCBIR) has recently emerged as a hot topic because, in many CBIR scenarios, the user is interested in only a portion of the image and the rest of the image is irrelevant. In this paper, we propose a novel region-level relevance feedback method to solve the LCBIR problem. Firstly, the visual attention model is employed to measure the regional saliency of each image in the feedback image set provided by the user. Secondly, the regions in the image set are used to construct an affinity matrix, and a novel propagation energy function is defined which takes both low-level visual features and regional significance into consideration. After the iteration, regions in the positive images with high confidence scores are selected as the candidate query set to conduct the next-round retrieval task until the retrieval results are satisfactory. Experiments conducted on the SIVAL dataset demonstrate the effectiveness of the proposed approach.

  • Contour Grouping and Object-Based Attention with Saliency Maps

    Jingjing ZHONG  Siwei LUO  Jiao WANG  

     
    LETTER-Pattern Recognition

      Vol:
    E92-D No:12
      Page(s):
    2531-2534

    The key problem of object-based attention is the definition of objects, while contour grouping methods aim at detecting the complete boundaries of objects in images. In this paper, we develop a new contour grouping method which has several characteristics. First, it is guided by global saliency information. By detecting multiple boundaries in a hierarchical way, we actually construct an object-based attention model. Second, it is optimized by the grouping cost, which is determined both by Gestalt cues of directed tangents and by region saliency. Third, it gives a new definition of Gestalt cues for tangents which includes image information as well as tangent information. In this way, we can improve the robustness of our model against noise. Experimental results are shown in this paper, with a comparison against another grouping model and a space-based attention model.
