Xu CHENG Nijun LI Tongchi ZHOU Zhenyang WU Lin ZHOU
In this paper, we propose an efficient tracking method that is formulated as a multi-task reverse sparse representation problem. The proposed method learns the representation of all tasks jointly using a customized APG method within several iterations. To reduce the computational complexity, the proposed tracking algorithm starts from a feature selection scheme that chooses a suitable number of features from the object and background in the dynamic environment. Based on the selected features, multiple templates are constructed with a few candidates. The candidate with the highest similarity to the object templates is taken as the final tracking result. In addition, we present a template update scheme to capture the appearance changes of the object. At the same time, we keep several earlier templates in the positive template set unchanged to alleviate the drifting problem. Both qualitative and quantitative evaluations demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art methods.
Kai HUANG Min YU Xiaomeng ZHANG Dandan ZHENG Siwen XIU Rongjie YAN Kai HUANG Zhili LIU Xiaolang YAN
The increasing complexity of embedded applications and the prevalence of multiprocessor system-on-chip (MPSoC) pose a great challenge for designers: how to achieve performance and programmability simultaneously in embedded systems. Automatic multithreaded code generation methods that take performance optimization techniques into account can be an effective solution. In this paper, we consider the issue of increasing processor utilization and reducing communication cost during multithreaded code generation from Simulink models to improve system performance. We propose a combination of three-layered multithreaded software with Integer Linear Programming (ILP) based design-time mapping and scheduling policies to get optimal performance. The hierarchical software with a thread layer increases processor usage, while the mapping and scheduling policies are expressed as a set of integer linear programs that minimize communication cost and maximize performance. Experimental results demonstrate the advantages of the proposed techniques in improving performance.
Jaak SIMM Ildefons MAGRANS DE ABRIL Masashi SUGIYAMA
Multi-task learning is an important area of machine learning that tries to learn multiple tasks simultaneously to improve the accuracy of each individual task. We propose a new tree-based ensemble multi-task learning method for classification and regression (MT-ExtraTrees), based on Extremely Randomized Trees. MT-ExtraTrees is able to share data between tasks minimizing negative transfer while keeping the ability to learn non-linear solutions and to scale well to large datasets.
This paper proposes a new approach to defining and expressing algorithms: the notion of task logical algorithms. This notion allows the user to define an algorithm for a task T as a set of agents who can collectively perform T. This notion considerably simplifies the algorithm development process and can be seen as an integration of the sequential pseudocode and logical algorithms. This observation requires some changes to the algorithm development process. We propose a two-step approach: the first step is to define an algorithm for a task T via a set of agents that can collectively perform T. The second step is to translate these agents into (higher-order) computability logic.
Qiang SONG Takayuki KAWABATA Fumiaki ITOH Yousuke WATANABE Haruo YOKOTA
The number of files in file systems has increased dramatically in recent years. Office workers spend much time and effort searching for the documents required for their jobs. To reduce these costs, we propose a new method for recommending files and operations on them. Existing technologies for recommendation, such as collaborative filtering, suffer from two problems. First, they can only work with documents that have been accessed in the past, so they cannot make recommendations when only newly generated documents are given as input. Second, they cannot easily handle sequences involving similar or differently ordered elements because of the strict matching used in the access sequences. To solve these problems, such minor variations should be ignored. In our proposed method, we introduce the concepts of abstract files as groups of similar files used for a similar purpose, abstract tasks as groups of similar tasks, and frequent abstract workflows grouped from similar workflows, which are sequences of abstract tasks. In experiments using real file-access logs, we confirmed that our proposed method could extract workflow patterns with longer sequences and higher support-count values, which are more suitable as recommendations. In addition, the F-measure for the recommendation results was improved significantly, from 0.301 to 0.598, compared with a method that did not use the concepts of abstract tasks and abstract workflows.
Shouyi YIN Rui SHI Leibo LIU Shaojun WEI
Coarse-grained Reconfigurable Architecture (CGRA) is a parallel computing platform that provides both the high performance of hardware and the high flexibility of software. It is becoming a promising platform for embedded and mobile applications. Since embedded and mobile devices are usually battery-powered, improving battery lifetime becomes one of the primary design issues in using CGRAs. In this paper, we propose a battery-aware task-mapping method to optimize energy consumption and improve battery lifetime. The proposed method mainly addresses two problems: task partitioning and task scheduling when mapping applications onto CGRA. The task partitioning and scheduling are formulated as a joint optimization problem of minimizing the energy consumption. The nonlinear effects of a real battery are taken into account in the problem formulation. Using the insights from the problem formulation, we design the task-mapping algorithm. We have used several real-world benchmarks to test the effectiveness of the proposed method. Experimental results show that our method can dramatically lower the energy consumption and prolong the battery life.
Energy-harvesting devices are materials that allow ambient energy sources to be converted into usable electrical power. While a battery powers modern embedded systems, these energy-harvesting devices power energy-harvesting embedded systems. This calls for new energy-efficient management techniques for energy-harvesting systems, unlike the previous management techniques. Higher overall system efficiency in an energy-harvesting system can be obtained through higher generating efficiency, higher consuming efficiency, or higher transferring efficiency. This paper presents a generalized technique for dynamic reconfiguration and task scheduling that considers the power loss in the DC-DC converters in the system. The proposed technique minimizes the power loss in the DC-DC converters and charger of the system. Experiments with an actual application demonstrate that our approach reduces the total energy consumption by 22% on average over the conventional approach.
Fumihiko INO Shinta NAKAGAWA Kenichi HAGIHARA
This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.
Shuai MU Dongdong LI Yubei CHEN Yangdong DENG Zhihua WANG
By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. Many real-world applications, especially those following a stream processing pattern, however, feature interleaved task-pipelined and data parallelism. Current GPUs are ill-equipped for such applications due to the insufficient usage of computing resources and/or the excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements to enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate both task-pipeline and data parallelism in a unified manner. Simulation results obtained with a cycle-accurate simulator on real-world applications show that the proposed GPU microarchitecture improves the computing throughput by 18% and reduces the overall accesses to off-chip GPU memory by 13%.
In image classification applications, a test sample with multiple hand-crafted descriptions can be sparsely represented by a few training subjects. Our paper is motivated by the success of multi-task joint sparse representation (MTJSR), and considers that the different modalities of features not only have the constraint of joint sparsity across different tasks, but also have the constraint of local manifold structure across different features. We introduce the constraint of local manifold structure into the MTJSR framework, and propose the locality-constrained multi-task joint sparse representation method (LC-MTJSR). During the optimization of the formulated objective, the stochastic gradient descent method is used to guarantee a fast convergence rate, which is essential for large-scale image categorization. Experiments on several challenging object classification datasets show that our proposed algorithm outperforms MTJSR and is competitive with state-of-the-art multiple kernel learning methods.
Keehang KWON Sungwoo HUR Mi-Young PARK
To deal with failures as simply as possible, we propose a new foundation for the core (untyped) C++, which is based on a new logic called task logic or imperative logic. We then introduce a sequential-disjunctive statement of the form S : R. This statement has the following semantics: execute S and R sequentially. It is considered a success if at least one of S, R is a success. This statement is useful for dealing with inessential errors without explicitly catching them.
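The semantics stated for S : R can be illustrated with a minimal Python sketch. It is not the authors' C++ extension: it assumes that a statement is a zero-argument callable and that "failure" means raising an exception, and the name `seq_disj` is hypothetical.

```python
from typing import Callable


def seq_disj(s: Callable[[], None], r: Callable[[], None]) -> bool:
    """Run s, then r, in order; succeed if at least one of them succeeds.

    Assumption: a statement "fails" by raising an exception.
    """
    successes = 0
    for stmt in (s, r):
        try:
            stmt()
            successes += 1
        except Exception:
            # An inessential failure is absorbed; execution continues.
            pass
    return successes > 0


if __name__ == "__main__":
    def flaky() -> None:
        raise RuntimeError("inessential error")

    def fine() -> None:
        print("R ran")

    print(seq_disj(flaky, fine))   # S fails, R succeeds
    print(seq_disj(flaky, flaky))  # both fail
```

Both statements are always executed, so the caller never writes an explicit handler for the statement that failed; only the combined outcome is reported.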
Jaak SIMM Masashi SUGIYAMA Hirotaka HACHIYA
Reinforcement learning (RL) is a flexible framework for learning a decision rule in an unknown environment. However, a large number of samples are often required for finding a useful decision rule. To mitigate this problem, the concept of transfer learning has been employed to utilize knowledge obtained from similar RL tasks. However, most approaches developed so far are useful only in low-dimensional settings. In this paper, we propose a novel transfer learning idea that targets problems with high-dimensional states. Our idea is to transfer knowledge between state factors (e.g., interacting objects) within a single RL task. This allows the agent to learn the system dynamics of the target RL task with fewer data samples. The effectiveness of the proposed method is demonstrated through experiments.
Yuyu YUAN Chuanyi LIU Jie CHENG Xiaoliang WANG
Execution performance is critical for large-scale and data-intensive workflows. This paper proposes DISWOP, a novel scheduling algorithm for data-intensive workflow optimizations; it consists of three main steps: workflow process generation, task & resource mapping, and task clustering. To evaluate the effectiveness and efficiency of DISWOP, a comparative evaluation of different workflows is conducted on a prototype workflow platform. The results show that DISWOP can speed up execution by a factor of about 1.6 to 2.3, depending on the task scale.
Hyung Goo PAEK Jeong Mo YEO Kyong Hoon KIM Wan Yeon LEE
The proposed scheduling scheme minimizes the mean power consumption of real-time tasks with probabilistic computation amounts while meeting their deadlines. Our study formally solves the minimization problem under finitely discrete clock frequencies with irregular power consumptions, whereas state-of-the-art studies did so under infinitely continuous clock frequencies with regular power consumptions.
Alex VALDIVIELSO CHIAN Toshiyuki MIYAMOTO
In this letter, we present the evaluation of an option-based learning algorithm, developed to perform a conflict-free allocation of calls among cars in a multi-car elevator system. We evaluate its performance in terms of the service time, its flexibility in the task-allocation, and the load balancing.
Krzysztof JOZWIK Hiroyuki TOMIYAMA Shinya HONDA Hiroaki TAKADA
Modern FPGAs (Field Programmable Gate Arrays), such as Xilinx Virtex-4, have the capability of changing their contents dynamically and partially, allowing implementation of such concepts as a HW (hardware) task. Similarly to its software counterpart, the HW task shares time-multiplexed resources with other HW tasks. To support preemptive multitasking in such systems, additional context saving and restoring mechanisms must be built practically from scratch. This paper presents an efficient method for hardware task preemption which is suitable for tasks containing both Flip-Flops and memory elements. Our solution consists of an offline tool for analyzing and manipulating bitstreams, used at the design time, as well as an embedded system framework. The framework contains a DMA-based (Direct Memory Access), instruction-driven reconfiguration/readback controller and a developed lightweight bus facilitating management of HW tasks. The whole system has been implemented on top of the Xilinx Virtex-4 FPGA and showed promising results for a variety of HW tasks.
In this paper, the substitutability of the indifferentiability framework with non-sequential scheduling is examined by reformulating the framework in terms of the Task-PIOA framework, which provides non-sequential activation with oblivious task sequences. First, the indifferentiability framework with non-sequential scheduling is shown to retain the substitutability. Thus, the substitutability can be applied in situations where processes of the systems may behave non-sequentially. Next, this framework is shown to be closely related to reducibility of systems. Reducibility is useful for discussing the construction of a system from a weaker system. Finally, two models, one with sequential scheduling and one with non-sequential scheduling, are shown to be mutually independent. We find examples of systems which are indifferentiable under one model but differentiable under the other. Thus, the importance of scheduling in the indifferentiability framework is clarified.
Wan Yeon LEE Hyogon KIM Heejo LEE
The proposed scheduling scheme minimizes the energy consumption of a real-time task on the multi-core processor with the dynamic voltage and frequency scaling capability. The scheme allocates a pertinent number of cores to the task execution, inactivates unused cores, and assigns the lowest frequency meeting the deadline. For a periodic real-time task with consecutive real-time instances, the scheme prepares the minimum-energy solutions for all input cases at off-line time, and applies one of the prepared solutions to each real-time instance at runtime.
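A rough Python sketch of the selection step described above follows. It is not the authors' algorithm: it assumes a perfectly parallel workload measured in cycles, a finite list of available frequencies, and a hypothetical per-core power model (`static_power + k * f**3`). All function and parameter names are illustrative.

```python
def lowest_feasible_frequency(cycles, cores, deadline, freqs):
    """Return the lowest frequency that finishes by the deadline, or None."""
    for f in sorted(freqs):
        # Assumption: work divides evenly across the active cores.
        if cycles / (cores * f) <= deadline:
            return f
    return None


def min_energy_config(cycles, deadline, freqs, max_cores,
                      static_power=0.1, k=1.0):
    """Try every core count; unused cores are treated as inactive (0 power).

    Returns (energy, cores, frequency) or None if no setting is feasible.
    """
    best = None
    for cores in range(1, max_cores + 1):
        f = lowest_feasible_frequency(cycles, cores, deadline, freqs)
        if f is None:
            continue
        run_time = cycles / (cores * f)
        # Hypothetical power model: static part plus cubic dynamic part.
        energy = cores * (static_power + k * f ** 3) * run_time
        if best is None or energy < best[0]:
            best = (energy, cores, f)
    return best


if __name__ == "__main__":
    # Offline phase: prepare one solution per anticipated input case.
    table = {c: min_energy_config(c, deadline=4, freqs=[1, 2], max_cores=4)
             for c in (4, 8, 16)}
    # Runtime phase: look up the prepared solution for the current instance.
    print(table[8])
```

The dictionary built in the usage example mirrors the split between off-line preparation and runtime lookup; a real implementation would use measured power figures in place of the assumed model.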
Alex VALDIVIELSO Toshiyuki MIYAMOTO
In automated transport applications, the design of a task allocation policy becomes a complex problem when there are several agents in the system and conflicts between them may arise, affecting the system's performance. In this situation, achieving a globally optimal result would require complete knowledge of the system's model, which is infeasible for real systems with huge state spaces and unknown state-transition probabilities. Reinforcement Learning (RL) methods have done well at approximating optimal results in the processing of tasks, without requiring previous knowledge of the system's model. However, to our knowledge, there are not many RL methods focused on the task allocation problem in transportation systems, and even fewer directly used to allocate tasks while considering the risk of conflicts between agents. In this paper, we propose an option-based RL algorithm with conditioned updating to make agents learn a task allocation policy to complete tasks while preventing conflicts between them. We use a multi-car elevator (MCE) system as a test application. Simulation results show that with our algorithm, elevator cars in the same shaft effectively learn to respond to service calls without interfering with each other, under different passenger arrival rates and system configurations.
Hideki TAKASE Hiroyuki TOMIYAMA Hiroaki TAKADA
Energy minimization has become one of the primary goals in the embedded real-time domain. Consequently, scratch-pad memory has been employed as a partial or entire replacement for cache memory due to its better energy efficiency. However, most previous approaches were not applicable to a preemptive multi-task environment. We propose three methods of partitioning and allocation of scratch-pad memory for fixed-priority-based preemptive multi-task systems. The three methods, i.e., spatial, temporal, and hybrid methods, achieve energy reduction in the instruction memory subsystems. With the spatial method, each task occupies its exclusive space in scratch-pad memory. With the temporal method, the running task uses the entire scratch-pad space. The content of scratch-pad memory is swapped out as a task executes or gets preempted. The hybrid method is based on the spatial one, but a higher-priority task can temporarily use the space of a lower-priority task. The amount of space is prioritized for higher-priority tasks. We formulate each method as an integer programming problem that simultaneously determines (1) partitioning of scratch-pad memory space for the tasks, and (2) allocation of program code to scratch-pad memory space for each task. Our methods not only support real-time task scheduling but also aggressively consider the periods and priorities of tasks for energy minimization. Additionally, we implement an RTOS-hardware cooperative support mechanism for runtime code allocation to the scratch-pad memory space. We conducted experiments with a fully functional real-time operating system. The experimental results demonstrate the effectiveness of our techniques. Up to 73% energy reduction compared to a conventional method was achieved.
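The spatial method's joint decision (how much space each task gets and which code goes there) can be illustrated with a simplified Python sketch. It is not the authors' integer program: it assumes each code block has an integer size and a known energy saving already weighted by how often its task runs, and that partitions may take any size, in which case the problem reduces to a 0/1 knapsack over all blocks. The names `spatial_allocation` and `partition_sizes` are hypothetical.

```python
def spatial_allocation(blocks, capacity):
    """Choose code blocks to place in scratch-pad memory.

    blocks: list of (task_name, size, saving) with integer sizes.
    Returns (total_saving, indices_of_chosen_blocks).
    """
    # best[c] holds the best (saving, chosen) using at most c units of space.
    best = [(0.0, [])] * (capacity + 1)
    for i, (_task, size, saving) in enumerate(blocks):
        # Iterate capacities downward so each block is used at most once.
        for cap in range(capacity, size - 1, -1):
            candidate = best[cap - size][0] + saving
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + [i])
    return best[capacity]


def partition_sizes(blocks, chosen):
    """Derive each task's exclusive partition from the chosen blocks."""
    sizes = {}
    for i in chosen:
        task, size, _saving = blocks[i]
        sizes[task] = sizes.get(task, 0) + size
    return sizes


if __name__ == "__main__":
    blocks = [("A", 2, 3.0), ("A", 3, 4.0), ("B", 4, 5.0), ("B", 5, 6.0)]
    saving, chosen = spatial_allocation(blocks, capacity=5)
    print(saving, chosen, partition_sizes(blocks, chosen))
```

The temporal and hybrid methods would need additional terms for swapping content on preemption, which this sketch does not model.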