The search functionality is under construction.

Keyword Search Result

[Keyword] task(142hit)

1-20hit(142hit)

  • Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution Open Access

    Yuka KO  Katsuhito SUDOH  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Publicized:
    2024/05/24
      Vol:
    E107-D No:10
      Page(s):
    1322-1331

    End-to-end speech translation (ST) directly renders source language speech to the target language without intermediate automatic speech recognition (ASR) output as in a cascade approach. End-to-end ST avoids error propagation from intermediate ASR results. Although recent attempts have applied multi-task learning using an auxiliary task of ASR to improve ST performance, they use cross-entropy loss to one-hot references in the ASR task, and the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that leverages an ASR-based loss against posterior distributions obtained using a pre-trained ASR model, called ASR posterior-based loss (ASR-PBL). The ASR-PBL method, which enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, can be applied to one of the strong multi-task ST baseline models with Hybrid CTC/Attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method demonstrated better BLEU results than the baseline that used standard CE loss.
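The contrast the abstract draws between a loss against one-hot references and a loss against a posterior distribution can be sketched as follows. This is only a toy illustration of cross-entropy with hard versus soft targets, not the authors' ASR-PBL implementation; the three-token vocabulary, the logits, and the posterior values are made-up assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_dist, logits):
    """Cross-entropy H(p, q) = -sum_i p_i * log q_i, where q = softmax(logits)."""
    probs = softmax(logits)
    return -sum(p * math.log(q) for p, q in zip(target_dist, probs) if p > 0.0)

# Hypothetical values for a 3-token vocabulary (illustration only).
student_logits = [2.0, 1.5, -1.0]      # scores from the model being trained
one_hot_reference = [1.0, 0.0, 0.0]    # hard target: only the reference token counts
asr_posterior = [0.6, 0.35, 0.05]      # soft target: a confusable token keeps some mass

hard_loss = cross_entropy(one_hot_reference, student_logits)
soft_loss = cross_entropy(asr_posterior, student_logits)
print(f"one-hot CE: {hard_loss:.4f}, posterior-based CE: {soft_loss:.4f}")
```

With the soft target, probability the model assigns to the second, similar-sounding token is no longer treated as a pure error, which is the intuition behind reflecting ASR confusion.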

  • Pool-Unet: A Novel Tongue Image Segmentation Method Based on Pool-Former and Multi-Task Mask Learning Open Access

    Xiangrun LI  Qiyu SHENG  Guangda ZHOU  Jialong WEI  Yanmin SHI  Zhen ZHAO  Yongwei LI  Xingfeng LI  Yang LIU  

     
    PAPER-Image

      Publicized:
    2024/05/29
      Vol:
    E107-A No:10
      Page(s):
    1609-1620

    Automated tongue segmentation plays a crucial role in computer-aided tongue diagnosis. The challenge lies in developing algorithms that achieve higher segmentation accuracy while using less memory and retaining fast inference. To address this issue, we propose a novel Pool-unet integrating Pool-former and Multi-task mask learning for tongue image segmentation. First, we collected 756 tongue images taken in various shooting environments and from different angles and accurately labeled the tongue under the guidance of a medical professional. Second, we propose the Pool-unet model, combining a hierarchical Pool-former module and a U-shaped symmetric encoder-decoder with skip-connections, which utilizes a patch expanding layer for up-sampling and a patch embedding layer for down-sampling to maintain spatial resolution, to effectively capture global and local information using fewer parameters and faster inference. Finally, a Multi-task mask learning strategy is designed, which improves the generalization and anti-interference ability of the model through the Multi-task pre-training and self-supervised fine-tuning stages. Experimental results on the tongue dataset show that compared to the state-of-the-art method (OET-NET), our method has 25% fewer model parameters, achieves 22% faster inference times, and exhibits 0.91% and 0.55% improvements in Mean Intersection Over Union (MIOU) and Mean Pixel Accuracy (MPA), respectively.
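The two metrics named in the abstract above can be computed from a confusion matrix. The sketch below uses the common definitions (per-class IoU and per-class pixel accuracy, each averaged over classes); the paper's exact evaluation protocol is not given in the abstract, and the tiny label arrays are invented for illustration.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm

def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

def mean_pixel_accuracy(cm):
    """Average of per-class accuracy TP / (pixels of that class)."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c]) > 0]
    return sum(accs) / len(accs)

# Flattened toy masks: 0 = background, 1 = tongue (hypothetical data).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(truth, pred, num_classes=2)
miou = mean_iou(cm)
mpa = mean_pixel_accuracy(cm)
print(f"MIoU = {miou:.2f}, MPA = {mpa:.2f}")
```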

  • Joint Optimization of Task Offloading and Resource Allocation for UAV-Assisted Edge Computing: A Stackelberg Bilayer Game Approach Open Access

    Peng WANG  Guifen CHEN  Zhiyao SUN  

     
    PAPER-Information Network

      Publicized:
    2024/05/21
      Vol:
    E107-D No:9
      Page(s):
    1174-1181

    Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC) can provide mobile users (MU) with additional computing services and a wide range of connectivity. This paper investigates the joint optimization strategy of task offloading and resource allocation for UAV-assisted MEC systems in complex scenarios with the goal of reducing the total system cost, consisting of task execution latency and energy consumption. We adopt a game theoretic approach to model the interaction process between the MEC server and the MU as a Stackelberg bilayer game. Then, the original problem with complex multi-constraints is transformed into a duality problem using the Lagrangian duality method. Furthermore, we prove that the modeled Stackelberg bilayer game has a unique Nash equilibrium solution. In order to obtain an approximate optimal solution to the proposed problem, we propose a two-stage alternating iteration (TASR) algorithm based on the subgradient method and the marginal revenue optimization method. We evaluate the performance of the proposed algorithm through detailed simulation experiments. The simulation results show that the proposed algorithm is superior and robust compared to other benchmark methods and can effectively reduce the task execution latency and total system cost in different scenarios.

  • Agent Allocation-Action Learning with Dynamic Heterogeneous Graph in Multi-Task Games Open Access

    Xianglong LI  Yuan LI  Jieyuan ZHANG  Xinhai XU  Donghong LIU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2024/04/03
      Vol:
    E107-D No:8
      Page(s):
    1040-1049

    In many real-world problems, a complex task is typically composed of a set of subtasks that follow a certain execution order. Traditional multi-agent reinforcement learning methods perform poorly in such multi-task cases, as they consider the whole problem as one task. For such multi-agent multi-task problems, heterogeneous relationships, i.e., subtask-subtask, agent-agent, and subtask-agent, are important characteristics that should be explored to improve learning performance. This paper proposes a dynamic heterogeneous graph based agent allocation-action learning framework. Specifically, a dynamic heterogeneous graph model is first designed to characterize the variation of heterogeneous relationships over time. Then a multi-subgraph partition method is proposed to extract features of heterogeneous graphs. Leveraging the extracted features, a hierarchical framework is designed to learn the dynamic allocation of agents among subtasks, as well as cooperative behaviors. Experimental results demonstrate that our framework outperforms recent representative methods on two challenging tasks, i.e., SAVETHECITY and Google Research Football full game.

  • A Joint Coverage Constrained Task Offloading and Resource Allocation Method in MEC Open Access

    Daxiu ZHANG  Xianwei LI  Bo WEI  Yukun SHI  

     
    PAPER-Mobile Information Network and Personal Communications

      Vol:
    E107-A No:8
      Page(s):
    1277-1285

    With the increase in the number of Mobile User Equipments (MUEs), numerous tasks with high resource requirements are generated. However, the MUEs have limited computational resources, computing power and storage space. In this paper, a joint coverage constrained task offloading and resource allocation method based on deep reinforcement learning is proposed. The aim is to offload the tasks that cannot be processed locally to the edge servers to alleviate the conflict between the resource constraints of MUEs and the high performance task processing. The studied problem considers the dynamic variability and complexity of the system model, coverage, offloading decisions, communication relationships and resource constraints. An entropy weight method is used to optimize the resource allocation process and balance the energy consumption and execution time. The results of the study show that the numbers of tasks and MUEs affect the execution time and energy consumption of the task offloading and resource allocation processes, and that the method serves the interest of the service provider and enhances the user experience.
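The entropy weight method mentioned in the abstract above is a standard way to derive criterion weights from data: criteria whose values vary more across alternatives receive larger weights. The sketch below shows the textbook form only; how the paper applies it to energy consumption and execution time is not stated in the abstract, and the sample matrix is a made-up assumption.

```python
import math

def entropy_weights(matrix):
    """Textbook entropy weight method.

    matrix[i][j] is the positive score of alternative i on criterion j.
    Requires at least two alternatives and at least one non-constant column.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / math.log(n)  # scales each column entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0.0)
        divergences.append(1.0 - entropy)  # low entropy means more informative
    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]

# Hypothetical offloading options scored on (energy, execution time).
options = [
    [2.0, 5.0],
    [2.1, 1.0],
    [1.9, 3.0],
]
weights = entropy_weights(options)
print("energy weight = %.3f, time weight = %.3f" % (weights[0], weights[1]))
```

In this toy data the energy column is almost constant, so nearly all of the weight goes to execution time.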

  • 2D Human Skeleton Action Recognition Based on Depth Estimation Open Access

    Lei WANG  Shanmin YANG  Jianwei ZHANG  Song GU  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2024/02/27
      Vol:
    E107-D No:7
      Page(s):
    869-877

    Human action recognition (HAR) exhibits limited accuracy in video surveillance due to the 2D information captured with monocular cameras. To address the problem, a depth estimation-based human skeleton action recognition method (SARDE) is proposed in this study, with the aim of transforming 2D human action data into 3D format to uncover hidden action clues in the 2D data. SARDE comprises two tasks, i.e., human skeleton action recognition and monocular depth estimation. The two tasks are integrated in a multi-task manner in end-to-end training to comprehensively utilize the correlation between action recognition and depth estimation by sharing parameters to learn the depth features effectively for human action recognition. In this study, graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness and superiority of the proposed method in skeleton action recognition: the method reaches state-of-the-art performance on the datasets.

  • Learning from Repeated Trials without Feedback: Can Collective Intelligence Outperform the Best Members? Open Access

    Yoshiko ARIMA  

     
    PAPER

      Publicized:
    2023/10/18
      Vol:
    E107-D No:4
      Page(s):
    443-450

    Both group process studies and collective intelligence studies are concerned with “which of the crowds and the best members perform better.” This can be seen as a matter of democracy versus dictatorship. Having evidence of the growth potential of crowds and experts can be useful in making correct predictions and can benefit humanity. In the collective intelligence experimental paradigm, experts' or best members' ability is compared with the accuracy of the crowd average. In this research (n = 620), using repeated trials of simple tasks, we compare the correct answer of a class average (index of collective intelligence) and the best member (the one whose answer was closest to the correct answer). The results indicated that, for the cognition task, collective intelligence improved to the level of the best member through repeated trials without feedback; however, it depended on the ability of the best members for the prediction task. The present study suggested the best members' superiority over crowds for the prediction task, on the premise of being free from social influence. However, machine learning results suggest that the best members among us cannot be easily found beforehand because they appear through repeated trials.

  • Collecting Balls on a Line by Robots with Limited Energy

    Tesshu HANAKA  Nicolás HONORATO DROGUETT  Kazuhiro KURITA  Hirotaka ONO  Yota OTACHI  

     
    LETTER

      Publicized:
    2023/10/10
      Vol:
    E107-D No:3
      Page(s):
    325-327

    In this paper, we study BALL COLLECTING WITH LIMITED ENERGY, which is a problem of scheduling robots with limited energy confined to a line to catch moving balls that eventually cross the line. For this problem, we show the NP-completeness of the general case and some algorithmic results for some cases with a small number of robots.

  • Multi-Task Learning of Japanese How-to Tip Machine Reading Comprehension by a Generative Model

    Xiaotian WANG  Tingxuan LI  Takuya TAMURA  Shunsuke NISHIDA  Takehito UTSURO  

     
    PAPER-Natural Language Processing

      Publicized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    125-134

    In the research of machine reading comprehension of Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty in dealing with cases in which the answer string spans multiple locations in the context. The method of fine-tuning of the BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model of Japanese how-to tip by constructing a generative dataset based on the website “wikihow” as a source of information. We then proposed two methods for multi-task learning to fine-tune the generative model. The first method is the multi-task learning with a generative and extractive hybrid training dataset, where both generative and extractive datasets are simultaneously trained on a single model. The second method is the multi-task learning with the inter-sentence semantic similarity and answer generation, where, drawing upon the answer generation task, the model additionally learns the distance between the sentences of the question/context and the answer in the training examples. The evaluation results showed that both of the multi-task learning methods significantly outperformed the single-task learning method in generative question-and-answer examples. Between the two methods for multi-task learning, that with the inter-sentence semantic similarity and answer generation performed the best in terms of the manual evaluation result. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.

  • A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition

    Yang LIU  Yuqi XIA  Haoqin SUN  Xiaolei MENG  Jianxiong BAI  Wenbo GUAN  Zhen ZHAO  Yongwei LI  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/12/08
      Vol:
    E106-A No:6
      Page(s):
    876-885

    Speech emotion recognition (SER) has been a complex and difficult task for a long time due to emotional complexity. In this paper, we propose a multitask deep learning approach based on cascaded attention network and self-adaption loss for SER. First, non-personalized features are extracted to represent the process of emotion change while reducing external variables' influence. Second, to highlight salient speech emotion features, a cascade attention network is proposed, where spatial temporal attention can effectively locate the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence brought by the differences in gender and human perception of external information is alleviated by using a multitask learning strategy, where a self-adaption loss is introduced to determine the weights of different tasks dynamically. Experimental results on IEMOCAP dataset demonstrate that our method gains an absolute improvement of 1.97% and 0.91% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.

  • Intrinsic Representation Mining for Zero-Shot Slot Filling

    Sixia LI  Shogo OKADA  Jianwu DANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2022/08/19
      Vol:
    E105-D No:11
      Page(s):
    1947-1956

    Zero-shot slot filling is a domain adaptation approach to handle unseen slots in new domains without training instances. Previous studies implemented zero-shot slot filling by predicting both slot entities and slot types. Because of the lack of knowledge about new domains, the existing methods often fail to predict slot entities for new domains and cannot effectively predict unseen slot types even when slot entities are correctly identified. Moreover, for some seen slot types, those methods may suffer from the domain shift problem, because the unseen context in new domains may change the explanations of the slots. In this study, we propose intrinsic representations to alleviate the domain shift problems above. Specifically, we propose a multi-relation-based representation to capture both the general and specific characteristics of slot entities, and an ontology-based representation to provide complementary knowledge on the relationships between slots and values across domains, for handling both unseen slot types and unseen contexts. We constructed a two-step pipeline model using the proposed representations to solve the domain shift problem. Experimental results in terms of the F1 score on three large datasets (Snips, SGD, and MultiWOZ 2.3) showed that our model outperformed state-of-the-art baselines by 29.62, 10.38, and 3.89, respectively. The detailed analysis with the average slot F1 score showed that our model improved the prediction by 25.82 for unseen slot types and by 10.51 for seen slot types. The results demonstrated that the proposed intrinsic representations can effectively alleviate the domain shift problem for both unseen slot types and seen slot types with unseen contexts.

  • A Hierarchical Memory Model for Task-Oriented Dialogue System

    Ya ZENG  Li WAN  Qiuhong LUO  Mao CHEN  

     
    PAPER-Natural Language Processing

      Publicized:
    2022/05/16
      Vol:
    E105-D No:8
      Page(s):
    1481-1489

    Traditional pipeline methods for task-oriented dialogue systems are designed individually and expensively. Existing memory augmented end-to-end methods directly map the inputs to outputs and achieve promising results. However, most existing end-to-end solutions store the dialogue history and knowledge base (KB) information in the same memory and represent KB information in the form of KB triples, making the memory reader's reasoning on the memory more difficult, which makes it difficult for the system to retrieve the correct information from the memory to generate a response. Some methods introduce many manual annotations to strengthen reasoning. To reduce the use of manual annotations while strengthening reasoning, we propose a hierarchical memory model (HM2Seq) for task-oriented systems. HM2Seq uses a hierarchical memory to separate the dialogue history and KB information into two memories and stores the KB in KB rows; we then use a memory rows pointer combined with an entity decoder to perform hierarchical reasoning over memory. The experimental results on two publicly available task-oriented dialogue datasets confirm our hypothesis and show the outstanding performance of our HM2Seq by outperforming the baselines.

  • Contextualized Language Generation on Visual-to-Language Storytelling

    Rizal Setya PERDANA  Yoshiteru ISHIDA  

     
    PAPER

      Publicized:
    2022/01/17
      Vol:
    E105-D No:5
      Page(s):
    873-886

    This study presents a formulation for generating context-aware natural language by machine from visual representation. Given an image sequence input, the visual storytelling task (VST) aims to generate a coherent, object-focused, and contextualized sentence story. Previous works in this domain faced a problem in modeling an architecture that works in temporal multi-modal data, which led to a low-quality output, such as low lexical diversity, monotonous sentences, and inaccurate context. This study introduces a further improvement, that is, an end-to-end architecture, called cross-modal contextualize attention, optimized to extract visual-temporal features and generate a plausible story. Visual object and non-visual concept features are encoded from the convolutional feature map, and object detection features are joined with language features. Three scenarios are defined in decoding language generation by incorporating weights from a pre-trained language generation model. Extensive experiments are conducted to confirm that the proposed model outperforms other models in terms of automatic metrics and manual human evaluation.

  • Specification and Verification of Multitask Real-Time Systems Using the OTS/CafeOBJ Method

    Masaki NAKAMURA  Shuki HIGASHI  Kazutoshi SAKAKIBARA  Kazuhiro OGATA  

     
    PAPER

      Publicized:
    2021/09/24
      Vol:
    E105-A No:5
      Page(s):
    823-832

    Because processes run concurrently in multitask systems, the size of the state space grows exponentially. Therefore, it is not straightforward to formally verify that such systems enjoy desired properties. Real-time constraints make the formal verification more challenging. In this paper, we propose the following to address the challenge: (1) a way to model multitask real-time systems as observational transition systems (OTSs), a kind of state transition system, (2) a way to describe their specifications in CafeOBJ, an algebraic specification language, and (3) a way to verify that such systems enjoy desired properties based on such formal specifications by writing proof scores, proof plans, in CafeOBJ. As a case study, we model Fischer's protocol, a well-known real-time mutual exclusion protocol, as an OTS, describe its specification in CafeOBJ, and verify that the protocol enjoys the mutual exclusion property when an arbitrary number of processes participates in the protocol.

  • Assessment System of Presentation Slide Design Using Visual and Structural Features

    Shengzhou YI  Junichiro MATSUGAMI  Toshihiko YAMASAKI  

     
    PAPER

      Publicized:
    2021/12/01
      Vol:
    E105-D No:3
      Page(s):
    587-596

    Developing well-designed presentation slides is challenging for many people, especially novices. The ability to build high quality slideshows is becoming more important in society. In this study, a neural network was used to identify novice vs. well-designed presentation slides based on visual and structural features. For such a purpose, a dataset containing 1,080 slide pairs was newly constructed. One of each pair was created by a novice, and the other was the improved one by the same person according to the experts' advice. Ten checkpoints frequently pointed out by professional consultants were extracted and set as prediction targets. The intrinsic problem was that the label distribution was imbalanced, because only a part of the samples had corresponding design problems. Therefore, re-sampling methods for addressing class imbalance were applied to improve the accuracy of the proposed model. Furthermore, we combined the target task with an assistant task for transfer and multi-task learning, which helped the proposed model achieve better performance. After the optimal settings were used for each checkpoint, the average accuracy of the proposed model rose up to 81.79%. With the advice provided by our assessment system, the novices significantly improved their slide design.

  • Simultaneous Scheduling and Core-Type Optimization for Moldable Fork-Join Tasks on Heterogeneous Multicores

    Hiroki NISHIKAWA  Kana SHIMADA  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    PAPER

      Publicized:
    2021/09/01
      Vol:
    E105-A No:3
      Page(s):
    540-548

    With the demand for energy-efficient and high-performance computing, multicore architecture has become more appealing than ever. Multicore task scheduling is one of the domains in parallel computing that exploits the parallelism of multicores. Unlike traditional scheduling, multicore task scheduling has recently been studied on the assumption that tasks have inherent parallelism and can be split into multiple sub-tasks in data parallel fashion. However, it is still challenging to properly determine the degree of parallelism of tasks and their mapping on multicores. Our proposed scheduling techniques determine the degree of parallelism of tasks and decide which type of core each sub-task is assigned to on heterogeneous multicores. In addition, two approaches to hardware/software codesign for heterogeneous multicore systems are proposed. These approaches optimize the types of cores organized in the architecture simultaneously with the scheduling of the tasks such that the overall energy consumption is minimized under a deadline constraint. A warm start approach is also presented to solve the problem effectively. The experimental results show that the simultaneous scheduling and core-type optimization technique remarkably reduces the energy consumption.

  • Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units

    Young H. OH  Yunho JIN  Tae Jun HAM  Jae W. LEE  

     
    LETTER-Fundamentals of Information Systems

      Publicized:
    2021/11/11
      Vol:
    E105-D No:2
      Page(s):
    427-431

    Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and is required to satisfy two, often conflicting, optimization goals: maximizing system throughput and satisfying quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves the system throughput by up to 266.7% over the baseline scheduler serving one DNN at a time.

  • FPGA Implementation of 3-Bit Quantized Multi-Task CNN for Contour Detection and Disparity Estimation

    Masayuki MIYAMA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/10/26
      Vol:
    E105-D No:2
      Page(s):
    406-414

    Object contour detection is a task of extracting the shape created by the boundaries between objects in an image. Conventional methods limit the detection targets to specific categories, or misdetect edges of patterns inside an object. We propose a new method to represent a contour image where the pixel value is the distance to the boundary. Contour detection becomes a regression problem that estimates this contour image. A deep convolutional network for contour estimation is combined with stereo vision to detect unspecified object contours. Furthermore, thanks to similar inference targets and common network structure, we propose a network that simultaneously estimates both contour and disparity with fully shared weights. As a result of experiments, the multi-tasking network drew a good precision-recall curve, and the F-measure was about 0.833 on the FlyingThings3D dataset. The L1 loss of disparity estimation for the dataset was 2.571. This network reduces the amount of calculation and memory capacity by half, and the accuracy drop compared to the dedicated networks is slight. Then we quantize both weights and activations of the network to 3-bit. We devise a dedicated hardware architecture for the quantized CNN and implement it on an FPGA. This circuit uses only internal memory to perform forward propagation calculations, which eliminates high-power external memory accesses. This circuit is a stall-free pixel-by-pixel pipeline, and performs 8 rows, 16 input channels, 16 output channels, 3 by 3 pixels convolution calculations in parallel. The convolution calculation performance at the operating frequency of 250 MHz is 9 TOPs/s.
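The contour representation described in the abstract above, where each pixel stores its distance to the boundary, can be illustrated with a brute-force distance transform. This is a sketch under assumptions: the abstract does not specify the distance metric, clipping, or normalization, so plain Euclidean distance is used here, and the small boundary mask is invented.

```python
import math

def distance_to_boundary(boundary_mask):
    """Return a map whose value at each pixel is the Euclidean distance
    to the nearest boundary pixel (brute force, for small images only)."""
    height = len(boundary_mask)
    width = len(boundary_mask[0])
    boundary = [
        (r, c)
        for r in range(height)
        for c in range(width)
        if boundary_mask[r][c]
    ]
    if not boundary:
        raise ValueError("mask contains no boundary pixels")
    return [
        [min(math.hypot(r - br, c - bc) for br, bc in boundary) for c in range(width)]
        for r in range(height)
    ]

# Hypothetical 3x5 image with a vertical object boundary in the middle column.
mask = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
target = distance_to_boundary(mask)
for row in target:
    print(row)
```

A network trained to regress such a map turns contour detection into a per-pixel regression problem, as the abstract states; the boundary can be read back as the pixels where the predicted distance is near zero.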

  • Smaller Residual Network for Single Image Depth Estimation

    Andi HENDRA  Yasushi KANAZAWA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/08/17
      Vol:
    E104-D No:11
      Page(s):
    1992-2001

    We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. Our residual network in this paper is a remodeled version of the original ResNet-50 architecture, which consists of only thirty-eight convolution layers in the residual block followed by a pair of two up-sampling and layers. The simple decoder network, a stack of five convolution layers, accepts the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter in order to improve the performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting in our network can significantly reduce the number of network parameters and simultaneously yield a more accurate image depth map. The performance of our approach has been evaluated by conducting both quantitative and qualitative comparisons with several prior related methods on the publicly available NYU and KITTI datasets.

  • A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target

    Lei WANG  Jie ZHU  Kangbo SUN  

    This paper has been cancelled due to violation of duplicate submission policy on IEICE Transactions on Information and Systems.
     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    1963-1970

    To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, a mask such as the Wiener gain can be estimated directly or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning in DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation in the main task. The domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weight parameters of the loss function, homoscedastic uncertainty is introduced to adaptively learn the weights, which is shown to outperform the fixed weighting method. Simulation results show that the proposed multi-task scheme improves the overall speech enhancement performance compared to the conventional single-task methods. The joint direct mask and SPP estimation yields the best performance among all the considered techniques.
