The search functionality is under construction.

Keyword Search Result

[Keyword] task (142 hits)

Hits 21-40 of 142

  • Multi-Task Learning for Improved Recognition of Multiple Types of Acoustic Information

    Jae-Won KIM  Hochong PARK  

     
    LETTER-Speech and Hearing

      Publicized:
    2021/07/14
      Vol:
    E104-D No:10
      Page(s):
    1762-1765

    We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning a common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals of speech and music at different time scales, resulting in input features with different characteristics. In addition, a training dataset with multiple labels for all information sources is not available. Considering these issues, we conduct multi-task learning in a sequential training process using input features with a single label for one information source. A comparative evaluation confirms that the proposed multi-task learning method provides higher performance for all recognition tasks than learning each task individually, as conventional methods do.

  • Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning

    Noriyuki TONAMI  Keisuke IMOTO  Ryosuke YAMANISHI  Yoichi YAMASHITA  

     
    PAPER-Speech and Hearing

      Publicized:
    2020/11/19
      Vol:
    E104-D No:2
      Page(s):
    294-301

    Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). The conventional methods address SED and ASC separately even though sound events and acoustic scenes are closely related to each other. For example, in the acoustic scene “office,” the sound events “mouse clicking” and “keyboard typing” are likely to occur. Therefore, it is expected that information on sound events and acoustic scenes will be of mutual aid for SED and ASC. In this paper, we propose multitask learning for joint analysis of sound events and acoustic scenes, in which the parts of the networks holding information on sound events and acoustic scenes in common are shared. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of SED and ASC by 1.31 and 1.80 percentage points in terms of the F-score, respectively, compared with the conventional CRNN-based method.

  • Integration of Experts' and Beginners' Machine Operation Experiences to Obtain a Detailed Task Model

    Longfei CHEN  Yuichi NAKAMURA  Kazuaki KONDO  Dima DAMEN  Walterio MAYOL-CUEVAS  

     
    PAPER-Human-computer Interaction

      Publicized:
    2020/10/02
      Vol:
    E104-D No:1
      Page(s):
    152-161

    We propose a novel framework for integrating beginners' machine operation experiences with those of experts to obtain a detailed task model. Beginners can provide valuable information for operation guidance and task design; for example, from the operations that are easy or difficult for them, the mistakes they make, and the strategies they tend to choose. However, beginners' experiences often vary widely and are difficult to integrate directly. Thus, we consider an operational experience as a sequence of hand-machine interactions at hotspots. Then, a few experts' experiences and a sufficient number of beginners' experiences are unified using two aggregation steps that align and integrate sequences of interactions. We applied our method to more than 40 experiences of a sewing task. The results demonstrate good potential for modeling and obtaining important properties of the task.

  • Multi-Category Image Super-Resolution with Convolutional Neural Network and Multi-Task Learning

    Kazuya URAZOE  Nobutaka KUROKI  Yu KATO  Shinya OHTANI  Tetsuya HIROSE  Masahiro NUMA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2020/10/02
      Vol:
    E104-D No:1
      Page(s):
    183-193

    This paper presents an image super-resolution technique using a convolutional neural network (CNN) and multi-task learning for multiple image categories. The image categories include natural, manga, and text images, whose features differ from each other. However, several CNNs for super-resolution are trained with a single category. If the input image category is different from that of the training images, the performance of super-resolution is degraded. There are two possible solutions for handling multiple categories with conventional CNNs. The first is to prepare a CNN for every category. This solution, however, requires a category classifier to select an appropriate CNN. The second is to learn all categories with a single CNN. In this solution, the CNN cannot optimize its internal behavior for each category. Therefore, this paper presents a super-resolution CNN architecture for multiple image categories. The proposed CNN has two parallel outputs for a high-resolution image and a category label. The main CNN for the high-resolution image is a standard architecture with three convolutional layers, and the sub neural network for the category label branches out from its middle layer and consists of two fully-connected layers. This architecture can simultaneously learn the high-resolution image and its category using multi-task learning. The category information is used for optimizing the super-resolution. In an applied setting, the proposed CNN can automatically estimate the input image category and change its internal behavior. Experimental results for 2× image magnification show that the average peak signal-to-noise ratio for the proposed method is approximately 0.22 dB higher than that for conventional super-resolution, with no difference in processing time or number of parameters. We have confirmed that the proposed method is useful when the input image category varies.
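To put the reported gain in context, the peak signal-to-noise ratio is conventionally defined as 10·log10(MAX²/MSE). The sketch below is only an illustration of that standard definition, not the authors' evaluation code; it assumes flat sequences of 8-bit pixel values (peak 255), and the function names `psnr` and `mse_ratio_for_gain` are hypothetical.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: pixels are given as flat sequences and max_value is the
    peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, `mse_ratio_for_gain(0.22)` is about 0.95, so a 0.22 dB improvement corresponds to roughly a 5% lower mean squared error.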

  • Multi-Task Convolutional Neural Network Leading to High Performance and Interpretability via Attribute Estimation

    Keisuke MAEDA  Kazaha HORII  Takahiro OGAWA  Miki HASEYAMA  

     
    LETTER-Neural Networks and Bioengineering

      Vol:
    E103-A No:12
      Page(s):
    1609-1612

    A multi-task convolutional neural network (CNN) leading to high performance and interpretability via attribute estimation is presented in this letter. Our method makes the classification results of CNNs interpretable by outputting, in the middle layer, attributes that explain elements of objects as the reason for the CNN's judgement. Furthermore, the proposed network uses the estimated attributes for the subsequent class prediction. Consequently, a novel multi-task CNN with improvements in both interpretability and classification performance is realized.

  • Joint Multi-Patch and Multi-Task CNNs for Robust Face Recognition

    Yanfei LIU  Junhua CHEN  Yu QIU  

     
    PAPER-Pattern Recognition

      Publicized:
    2020/07/02
      Vol:
    E103-D No:10
      Page(s):
    2178-2187

    In this paper, we present a joint multi-patch and multi-task convolutional neural networks (JMM-CNNs) framework to learn a more descriptive and robust face representation for face recognition. In the proposed JMM-CNNs, a set of multi-patch CNNs and a feature fusion network are constructed to learn and fuse global and local facial features; then a multi-task learning algorithm, comprising a face recognition task and a pose estimation task, is applied to the fused feature to obtain a pose-invariant face representation for the face recognition task. To further enhance the pose insensitivity of the learned face representation, we also introduce a similarity regularization term on the features of the two tasks to form a regularization loss. Moreover, a simple but effective patch sampling strategy is applied to give the JMM-CNNs an end-to-end network architecture. Experiments on the Multi-PIE dataset demonstrate the effectiveness of the proposed method, and we achieve competitive performance compared with state-of-the-art methods on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and the MegaFace Challenge.

  • User-Assisted QoS Control for QoE Enhancement in Audiovisual and Haptic Interactive IP Communications

    Toshiro NUNOME  Suguru KAEDE  Shuji TASAKA  

     
    PAPER-Network

      Publicized:
    2020/04/21
      Vol:
    E103-B No:10
      Page(s):
    1107-1116

    In this paper, we propose a user-assisted QoS control scheme that utilizes media adaptive buffering to enhance QoE of audiovisual and haptic IP communications. The scheme consists of two modes: a manual mode and an automatic mode. It enables users to switch between these two modes according to their inclinations. We compare four QoS control schemes: the manual mode only, the automatic mode only, the switching scheme starting with the manual mode, and the switching scheme starting with the automatic mode. We assess the effects of the four schemes, user attributes, and tasks on QoE through a subjective experiment which provides information on users' behavior in addition to QoE scores. As a result of the experiment, we show that the user-assisted QoS control scheme can enhance QoE. Furthermore, we notice that the proper QoS control scheme depends on user attributes and tasks.

  • A Double Adversarial Network Model for Multi-Domain and Multi-Task Chinese Named Entity Recognition

    Yun HU  Changwen ZHENG  

     
    PAPER-Natural Language Processing

      Publicized:
    2020/04/01
      Vol:
    E103-D No:7
      Page(s):
    1744-1752

    Named Entity Recognition (NER) systems are often realized by supervised methods such as CRF and neural network methods, which require large amounts of annotated data. In domains where only a small amount of annotated training data is available, multi-domain or multi-task learning methods are often used. In this paper, we explore methods that use the news domain and the Chinese Word Segmentation (CWS) task to improve the performance of Chinese named entity recognition in the Weibo domain. We first propose two baseline models combining multi-domain and multi-task information. The two baseline models share information between different domains and tasks simply by sharing parameters. Then, we propose a Double ADVersarial model (DoubADV model). The model uses two adversarial networks that consider the shared and private features in different domains and tasks. Experimental results show that our DoubADV model outperforms the baseline models and achieves state-of-the-art performance compared with previous work in multi-domain and multi-task settings.

  • Adaptive Balanced Allocation for Peer Assessments

    Hideaki OHASHI  Yasuhito ASANO  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER

      Publicized:
    2019/12/26
      Vol:
    E103-D No:5
      Page(s):
    939-948

    Peer assessments, in which people review the works of peers and have their own works reviewed by peers, are useful for assessing homework. In conventional peer assessment systems, works are usually allocated to people before the assessment begins; therefore, if people drop out (abandoning reviews) during an assessment period, an imbalance occurs between the number of works a person reviews and the number of peers who have reviewed that person's work. When the total imbalance increases, some people who diligently complete reviews may suffer from a lack of reviews and be discouraged from participating in future peer assessments. Therefore, in this study, we adopt a new adaptive allocation approach in which people are allocated works to review only when they request them, and we propose an algorithm for allocating works to people that reduces the total imbalance. To show the effectiveness of the proposed algorithm, we provide an upper bound on the total imbalance that the proposed algorithm yields. In addition, we extend the above algorithm to consider reviewing ability. The extended algorithm avoids the problem that only unskilled (or skilled) reviewers are allocated to a given work. We show the effectiveness of the two proposed algorithms compared to existing algorithms through experiments using simulation data.

  • Measurement of Fatigue Based on Changes in Eye Movement during Gaze

    Yuki KUROSAWA  Shinya MOCHIDUKI  Yuko HOSHINO  Mitsuho YAMADA  

     
    LETTER-Multimedia Pattern Processing

      Publicized:
    2020/02/20
      Vol:
    E103-D No:5
      Page(s):
    1203-1207

    We measured eye movements at gaze points while subjects performed calculation tasks and examined the relationship between the eye movements and the fatigue and/or internal state of the subject during the tasks. The results suggest that the fatigue and/or internal state of a subject affects eye movements at gaze points, and that both can be measured in real time using these eye movements.

  • Improvement in the Effectiveness of Cutting Skill Practice for Paper-Cutting Creations Based on the Steering Law

    Takafumi HIGASHI  Hideaki KANAI  

     
    PAPER

      Publicized:
    2019/11/29
      Vol:
    E103-D No:4
      Page(s):
    730-738

    To improve the cutting skills of learners, we developed a method for improving the skill involved in creating paper cuttings based on a steering task in the field of human-computer interaction. We made patterns using the white and black boundaries that make up a picture. The index of difficulty (ID) is a numerical value based on the width and distance of the steering law. First, we evaluated novice and expert pattern-cutters, and measured their moving time (MT), error rate, and compliance with the steering law, confirming that the MT and error rate are affected by pattern width and distance. Moreover, we quantified the skills of novices and experts using ID- and MT-based models. We then observed changes in the cutting skills of novices who practiced with various widths and evaluated the impact of the difficulty level on skill improvement. Patterns considered to be moderately difficult for novices led to a significant improvement in skills.
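The abstract does not state the formula it uses. As a hedged illustration, the commonly cited straight-tunnel form of the steering law takes ID = A / W (path distance over path width) and models the time as MT = a + b · ID. The sketch below assumes that form; the function names and the sample numbers in the usage note are hypothetical and are not data from the paper.

```python
def index_of_difficulty(distance, width):
    """Straight-tunnel steering-law index of difficulty, ID = A / W.

    Assumption: the path is a straight tunnel of constant width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return distance / width


def fit_steering_law(ids, times):
    """Ordinary least-squares fit of MT = a + b * ID; returns (a, b)."""
    if len(ids) != len(times) or len(ids) < 2:
        raise ValueError("need at least two (ID, MT) pairs of equal length")
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(times) / n
    # Slope b = covariance(ID, MT) / variance(ID).
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, times))
    sxx = sum((x - mean_id) ** 2 for x in ids)
    if sxx == 0:
        raise ValueError("IDs must not all be identical")
    b = sxy / sxx
    a = mean_mt - b * mean_id
    return a, b
```

For example, `fit_steering_law([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])` recovers an intercept of 1.0 and a slope of 0.5. Under this model, a smaller slope b means less added time per unit of difficulty, which is one way novices and experts could be compared.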

  • An Energy-Efficient Task Scheduling for Near Real-Time Systems on Heterogeneous Multicore Processors

    Takashi NAKADA  Hiroyuki YANAGIHASHI  Kunimaro IMAI  Hiroshi UEKI  Takashi TSUCHIYA  Masanori HAYASHIKOSHI  Hiroshi NAKAMURA  

     
    PAPER-Software System

      Publicized:
    2019/11/01
      Vol:
    E103-D No:2
      Page(s):
    329-338

    Near real-time periodic tasks, which are popular in multimedia streaming applications, have deadline periods that are longer than the input intervals thanks to buffering. For such applications, conventional frame-based scheduling methods cannot realize the optimal schedule because of their shortsighted deadline assumptions. To realize globally energy-efficient execution of these applications, we propose a novel task scheduling algorithm that takes advantage of the long deadline period. We confirm that our approach can exploit the longer deadline period and reduce the average power consumption by up to 18%.

  • Hand-Dorsa Vein Recognition Based on Task-Specific Cross-Convolutional-Layer Pooling Open Access

    Jun WANG  Yulian LI  Zaiyu PAN  

     
    LETTER-Pattern Recognition

      Publicized:
    2019/09/09
      Vol:
    E102-D No:12
      Page(s):
    2628-2631

    Hand-dorsa vein recognition is addressed using the convolutional activations of a pre-trained deep convolutional neural network (DCNN). Specifically, a novel task-specific cross-convolutional-layer pooling is proposed to obtain a more representative and discriminative feature representation. Rigorous experiments on a self-established database achieve state-of-the-art recognition results, which demonstrates the effectiveness of the proposed model.

  • Simultaneous Estimation of Dish Locations and Calories with Multi-Task Learning Open Access

    Takumi EGE  Keiji YANAI  

     
    PAPER

      Publicized:
    2019/04/25
      Vol:
    E102-D No:7
      Page(s):
    1240-1246

    In recent years, a rise in healthy eating has led to various food management applications with image recognition functions that record everyday meals automatically. However, most of the image recognition functions in existing applications are not directly useful for multiple-dish food photos and cannot automatically estimate food calories. Meanwhile, image recognition methodologies have advanced greatly with the advent of the convolutional neural network (CNN). CNNs have improved the accuracy of various kinds of image recognition tasks such as classification and object detection. Therefore, we propose CNN-based food calorie estimation for multiple-dish food photos. Our method estimates dish locations and food calories simultaneously by multi-task learning of food dish detection and food calorie estimation with a single CNN. Simultaneous estimation in a single network is expected to achieve high speed and a small network size. Because there is currently no dataset of multiple-dish food photos annotated with both bounding boxes and food calories, in this work we alternate between two types of datasets when training a single CNN: multiple-dish food photos annotated with bounding boxes, and single-dish food photos annotated with food calories. Our results show that our multi-task method achieves higher accuracy, higher speed and a smaller network size than a sequential model of food detection and food calorie estimation.

  • Scalable State Space Search with Structural-Bottleneck Heuristics for Declarative IT System Update Automation Open Access

    Takuya KUWAHARA  Takayuki KURODA  Manabu NAKANOYA  Yutaka YAKUWA  Hideyuki SHIMONISHI  

     
    PAPER

      Publicized:
    2018/09/20
      Vol:
    E102-B No:3
      Page(s):
    439-451

    As IT systems, including network systems using SDN/NFV technologies, become large-scale and complicated, the cost of system management also increases rapidly. Network operators have to maintain their workflows for constructing and consistently updating such complex systems, and it is therefore desirable to automate the management task of generating system update plans. Declarative system update with state space search is a promising approach to enable this automation; however, current methods are not scalable enough for practical systems. In this paper, we propose a novel heuristic approach that greatly reduces the computation time needed to solve for system update procedures in practical systems. Our heuristic accounts for the structural bottlenecks of the system update and advances the search so as to resolve the bottlenecks of the current system state. This paper includes the following contributions: (1) a formal definition of a novel heuristic function for the A* search algorithm, specialized for system update, (2) proofs that our heuristic function is consistent, i.e., that the A* algorithm with our heuristic returns a correct optimal solution and can avoid repeated expansion of nodes in the search space, and (3) results of a performance evaluation of our heuristic. We evaluate the proposed algorithm in two cases: upgrading a running hypervisor and a rolling update of running VMs. The results show that the computation time to solve for a system update plan for a system with 100 VMs does not exceed several minutes, whereas the conventional algorithm is only applicable to very small systems.
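The role of consistency mentioned in contribution (2) can be illustrated with a generic A* search. The sketch below is not the paper's heuristic or its system-update state space: it assumes a small hypothetical weighted graph and a hand-written heuristic table. With a consistent heuristic, h(u) ≤ cost(u, v) + h(v) on every edge and h(goal) = 0, so a node is already at its optimal cost when it is first popped and can be placed in a closed set and never expanded again.

```python
import heapq


def is_consistent(graph, heuristic, goal):
    """Check h(goal) == 0 and h(u) <= cost(u, v) + h(v) for every edge."""
    return heuristic[goal] == 0 and all(
        heuristic[u] <= cost + heuristic[v]
        for u, edges in graph.items()
        for v, cost in edges.items()
    )


def a_star(graph, heuristic, start, goal):
    """Return the optimal path cost from start to goal, or None if unreachable.

    Assumes the heuristic is consistent, so each node is expanded at most once.
    """
    open_heap = [(heuristic[start], 0, start)]  # entries are (f = g + h, g, node)
    best_cost = {start: 0}
    closed = set()
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale entry; node was already expanded at optimal cost
        if node == goal:
            return cost
        closed.add(node)
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            if neighbor not in closed and new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(open_heap, (new_cost + heuristic[neighbor], new_cost, neighbor))
    return None


# Hypothetical toy problem: nodes stand in for system states, edge weights for update steps.
graph = {"s": {"a": 1, "b": 4}, "a": {"b": 2, "g": 5}, "b": {"g": 1}, "g": {}}
h = {"s": 3, "a": 2, "b": 1, "g": 0}
```

Here `is_consistent(graph, h, "g")` holds, and `a_star(graph, h, "s", "g")` finds the cheapest route s, a, b, g with total cost 4.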

  • Hotspot Modeling of Hand-Machine Interaction Experiences from a Head-Mounted RGB-D Camera

    Longfei CHEN  Yuichi NAKAMURA  Kazuaki KONDO  Walterio MAYOL-CUEVAS  

     
    PAPER-Human-computer Interaction

      Publicized:
    2018/11/12
      Vol:
    E102-D No:2
      Page(s):
    319-330

    This paper presents an approach to analyze and model tasks of machines being operated. The executions of the tasks were captured through egocentric vision. Each task was decomposed into a sequence of physical hand-machine interactions, which are described with touch-based hotspots and interaction patterns. Modeling the tasks was achieved by integrating the experiences of multiple experts and using a hidden Markov model (HMM). Here, we present the results of more than 70 recorded egocentric experiences of the operation of a sewing machine. Our methods show good potential for the detection of hand-machine interactions and modeling of machine operation tasks.

  • Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms

    Chanyoung OH  Saehanseul YI  Youngmin YI  

     
    PAPER-Real-time Systems

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    2878-2888

    As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are also becoming popular in embedded domains, as are heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far smaller than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency under a real-time constraint. We first present the parallelization technique of each task for GPU execution; then we propose performance and energy models for both task-parallel and data-parallel execution on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge because it covers not only processor heterogeneity, such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for data-parallel execution on these heterogeneous processors. In our case study of LBP face detection on the Exynos 5422, the estimation errors of the proposed performance and energy models were on average -2.19% and -3.67%, respectively. By systematically finding the optimal mappings with the proposed models, we achieved 28.6% less energy consumption than the manual mapping while still meeting the real-time constraint.

  • A Comparison Study on Front- and Back-of-Device Touch Input for Handheld Displays

    Liang CHEN  Dongyi CHEN  Xiao CHEN  

     
    BRIEF PAPER

      Vol:
    E101-C No:11
      Page(s):
    880-883

    The touch screen has become the mainstream manipulation technique on handheld devices. However, its innate limitations, e.g. the occlusion problem and the fat finger problem, degrade the user experience in many use scenarios on handheld displays. Back-of-device interaction, which makes use of input units on the rear of a device for interaction, is one of the most promising approaches to addressing these problems. In this paper, we present the findings of a user study in which we explored users' pointing performance with two types of touch input on handheld devices. The results indicate that front-of-device touch input is on average about twice as fast as back-of-device touch input but has higher error rates, especially when acquiring narrower targets. Based on the results of our study, we argue that, on the premise of keeping the functionalities and layouts of current mainstream user interfaces, back-of-device touch input should be treated as a supplement to front-of-device touch input rather than a replacement.

  • Impact of Viewing Distance on Task Performance and Its Properties

    Makio ISHIHARA  Yukio ISHIHARA  

     
    LETTER-Human-computer Interaction

      Publicized:
    2018/07/02
      Vol:
    E101-D No:10
      Page(s):
    2530-2533

    This paper discusses VDT syndrome from the point of view of the viewing distance between a computer screen and the user's eyes. We conduct a series of experiments to show the impact of the viewing distance on task performance. In the experiments, two different viewing distances of 50 cm and 350 cm with the same viewing angle of 30 degrees are considered. The results show that the long viewing distance enables people to manipulate the mouse more slowly, more correctly and more precisely than the short one.

  • Hardware Architecture for High-Speed Object Detection Using Decision Tree Ensemble

    Koichi MITSUNARI  Jaehoon YU  Takao ONOYE  Masanori HASHIMOTO  

     
    PAPER

      Vol:
    E101-A No:9
      Page(s):
    1298-1307

    Visual object detection on embedded systems involves a multi-objective optimization problem in the presence of trade-offs between power consumption, processing performance, and detection accuracy. For a new Pareto solution with high processing performance and low power consumption, this paper proposes a hardware architecture for decision tree ensemble using multiple channels of features. For efficient detection, the proposed architecture utilizes the dimensionality of feature channels in addition to parallelism in image space and adopts task scheduling to attain random memory access without conflict. Evaluation results show that an FPGA implementation of the proposed architecture with an aggregated channel features pedestrian detector can process 229 million samples per second at 100 MHz operation frequency while it requires a relatively small amount of resources. Consequently, the proposed architecture achieves 350 fps processing performance for 1080p Full HD images and outperforms conventional object detection hardware architectures developed for embedded systems.
