Keyword Search Result

[Keyword] deep reinforcement learning (16 hits)

1-16 of 16 hits
  • Federated Deep Reinforcement Learning for Multimedia Task Offloading and Resource Allocation in MEC Networks Open Access

    Rongqi ZHANG  Chunyun PAN  Yafei WANG  Yuanyuan YAO  Xuehua LI  

     
    PAPER-Network

    Vol: E107-B No:6
    Page(s): 446-457

    With the maturation of 5G technology in recent years, multimedia services such as live video streaming and online games on the Internet have flourished. These multimedia services frequently require low latency, which poses a significant challenge for computing multimedia tasks with strict latency requirements. Mobile edge computing (MEC) is considered a key technology for addressing these challenges. It offloads computation-intensive tasks to edge servers by sinking mobile nodes, which reduces task execution latency and relieves computing pressure on multimedia devices. Using the MEC paradigm reasonably and efficiently makes resource allocation a new challenge. In this paper, we focus on multimedia tasks that need to be uploaded and processed in the network. We formulate an optimization problem with the goal of minimizing the latency and energy consumption required to perform tasks on multimedia devices. To solve this complex, non-convex problem, we recast it as a distributed deep reinforcement learning (DRL) problem and propose a federated dueling deep Q-network (DDQN) based multimedia task offloading and resource allocation algorithm (FDRL-DDQN). In the algorithm, DRL is trained on the local device, while federated learning (FL) is responsible for aggregating and updating the parameters from the trained local models. Further, to address the non-independent and identically distributed (non-IID) data problem of multimedia devices, we develop a method for selecting participating federated devices. The simulation results show that the FDRL-DDQN algorithm can reduce the total cost by 31.3% compared to the DQN algorithm when the task data is 1000 kbit, and the maximum reduction can be 35.3% compared to the traditional baseline algorithm.
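
    The abstract above says that federated learning aggregates the parameters of locally trained models but does not state the aggregation rule. The following is a minimal sketch, assuming a sample-count-weighted average in the style of federated averaging; the function name `federated_average`, the `Params` layout, and the use of sample counts as weights are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def federated_average(local_params: List[Params], num_samples: List[int]) -> Params:
    """Average per-device parameters, weighting each device by its sample count."""
    if not local_params or len(local_params) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in local_params[0]:
        length = len(local_params[0][name])
        # Weighted mean of each coordinate across all participating devices.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(local_params, num_samples)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    device_a = {"w": [1.0, 2.0]}
    device_b = {"w": [3.0, 4.0]}
    print(federated_average([device_a, device_b], [1, 3]))
```

    In this sketch a device that contributes more samples pulls the global parameters further toward its local model; the device-selection step mentioned in the abstract would decide which entries appear in `local_params` and is not modelled here.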

  • Resource Allocation for Mobile Edge Computing System Considering User Mobility with Deep Reinforcement Learning

    Kairi TOKUDA  Takehiro SATO  Eiji OKI  

     
    PAPER-Network

    Publicized: 2023/10/06
    Vol: E107-B No:1
    Page(s): 173-184

    Mobile edge computing (MEC) is a key technology for providing services that require low latency by migrating cloud functions to the network edge. The potential low quality of the wireless channel should be noted when mobile users with limited computing resources offload tasks to an MEC server. To improve the transmission reliability, it is necessary to perform resource allocation in an MEC server, taking into account the current channel quality and the resource contention. There are several works that take a deep reinforcement learning (DRL) approach to address such resource allocation. However, these approaches consider a fixed number of users offloading their tasks, and do not assume a situation where the number of users varies due to user mobility. This paper proposes Deep reinforcement learning model for MEC Resource Allocation with Dummy (DMRA-D), an online learning model that addresses the resource allocation in an MEC server under the situation where the number of users varies. By adopting dummy state/action, DMRA-D keeps the state/action representation. Therefore, DMRA-D can continue to learn one model regardless of variation in the number of users during the operation. Numerical results show that DMRA-D improves the success rate of task submission while continuing learning under the situation where the number of users varies.

  • Minimization of Energy Consumption in TDMA-Based Wireless-Powered Multi-Access Edge Computing Networks

    Xi CHEN  Guodong JIANG  Kaikai CHI  Shubin ZHANG  Gang CHEN  Jiang LIU  

     
    PAPER-Communication Theory and Signals

    Publicized: 2023/06/19
    Vol: E106-A No:12
    Page(s): 1544-1554

    Many nodes in the Internet of Things (IoT) rely on batteries for power. Additionally, the demand for executing compute-intensive and latency-sensitive tasks is increasing for IoT nodes. In some practical scenarios, the computation tasks of wireless devices (WDs) are non-separable, that is, binary offloading strategies should be used. In this paper, we focus on the design of an efficient binary offloading algorithm that minimizes system energy consumption (EC) for TDMA-based wireless-powered multi-access edge computing networks, where WDs either compute tasks locally or offload them to hybrid access points (H-APs). We formulate the EC minimization problem, which is non-convex, and decompose it into a master problem optimizing the binary offloading decision and a subproblem optimizing the WPT duration and the task offloading transmission durations. For the master problem, a DRL-based method is applied to obtain a near-optimal offloading decision. For the subproblem, we first consider the scenario where the nodes do not have completion time constraints and obtain the optimal analytical solution. Then we consider the scenario with these constraints. By jointly using the golden section method and the bisection method, the optimal solution can be obtained thanks to the convexity of the constraint function. Simulation results show that the proposed DRL-based offloading algorithm can achieve near-minimal EC.

  • Joint Virtual Network Function Deployment and Scheduling via Heuristics and Deep Reinforcement Learning

    Zixiao ZHANG  Eiji OKI  

     
    PAPER-Network

    Publicized: 2023/08/01
    Vol: E106-B No:12
    Page(s): 1424-1440

    This paper introduces heuristic approaches and a deep reinforcement learning approach to solve a joint virtual network function deployment and scheduling problem in a dynamic scenario. We formulate the problem as an optimization problem. Based on the mathematical description of the optimization problem, we introduce three heuristic approaches and a deep reinforcement learning approach to solve the problem. We define an objective to maximize the ratio of delay-satisfied requests while minimizing the average resource cost for a dynamic scenario. The two greedy approaches we introduce are named finish time greedy and computational resource greedy. In the finish time greedy approach, each request is finished as soon as possible regardless of its resource cost; in the computational resource greedy approach, each request occupies as few resources as possible regardless of its finish time. The simulated annealing approach we introduce generates feasible solutions randomly and converges to an approximate solution. In our learning-based approach, neural networks are trained to make decisions. We use a simulated environment to evaluate the performance of the introduced approaches. Numerical results show that the introduced deep reinforcement learning approach has the best performance in terms of benefit in our examined cases.

  • Dynamic VNF Scheduling: A Deep Reinforcement Learning Approach

    Zixiao ZHANG  Fujun HE  Eiji OKI  

     
    PAPER-Network

    Publicized: 2023/01/10
    Vol: E106-B No:7
    Page(s): 557-570

    This paper introduces a deep reinforcement learning approach to solve the virtual network function scheduling problem in dynamic scenarios. We formulate an integer linear programming model for the problem in static scenarios. In dynamic scenarios, we define the state, action, and reward to form the learning approach. The learning agents use the asynchronous advantage actor-critic algorithm. We assign a master agent and several worker agents to each network function virtualization node in the problem. The worker agents work in parallel to help the master agent make decisions. We compare the introduced approach with existing approaches by applying them in simulated environments. The existing approaches include three greedy approaches, a simulated annealing approach, and an integer linear programming approach. The numerical results show that the introduced deep reinforcement learning approach improves the performance by 6-27% in our examined cases.

  • Semantic Path Planning for Indoor Navigation Tasks Using Multi-View Context and Prior Knowledge

    Jianbing WU  Weibo HUANG  Guoliang HUA  Wanruo ZHANG  Risheng KANG  Hong LIU  

     
    PAPER-Positioning and Navigation

    Publicized: 2022/01/20
    Vol: E106-D No:5
    Page(s): 756-764

    Recently, deep reinforcement learning (DRL) methods have significantly improved the performance of target-driven indoor navigation tasks. However, the rich semantic information of environments is still not fully exploited in previous approaches. In addition, existing methods tend to overfit to training scenes or objects in target-driven navigation tasks, making it hard to generalize to unseen environments. Human beings can easily adapt to new scenes because they can recognize the objects they see and reason about the possible locations of target objects using their experience. Inspired by this, we propose a DRL-based target-driven navigation model, termed MVC-PK, using Multi-View Context information and Prior semantic Knowledge. It relies only on the semantic label of target objects and allows the robot to find the target without using any geometry map. To perceive the semantic contextual information in the environment, object detectors are leveraged to detect the objects present in the multi-view observations. To enable the semantic reasoning ability of indoor mobile robots, a Graph Convolutional Network is also employed to incorporate prior knowledge. The proposed MVC-PK model is evaluated in the AI2-THOR simulation environment. The results show that MVC-PK (1) significantly improves the cross-scene and cross-target generalization ability, and (2) achieves state-of-the-art performance with 15.2% and 11.0% increases in Success Rate (SR) and Success weighted by Path Length (SPL), respectively.

  • Edge Computing Resource Allocation Algorithm for NB-IoT Based on Deep Reinforcement Learning

    Jiawen CHU  Chunyun PAN  Yafei WANG  Xiang YUN  Xuehua LI  

     
    PAPER-Network

    Publicized: 2022/11/04
    Vol: E106-B No:5
    Page(s): 439-447

    Mobile edge computing (MEC) technology guarantees the privacy and security of large-scale data in the Narrowband-IoT (NB-IoT) by deploying MEC servers near base stations to provide sufficient computing, storage, and data processing capacity to meet the delay and energy consumption requirements of NB-IoT terminal equipment. For the NB-IoT MEC system, this paper proposes a resource allocation algorithm based on deep reinforcement learning to optimize the total cost of task offloading and execution. Since the formulated problem is a mixed-integer non-linear programming (MINLP) problem, we cast it as a multi-agent distributed deep reinforcement learning (DRL) problem and address it using a dueling Q-learning network algorithm. Simulation results show that, compared with the deep Q-learning network and the all-local cost and all-offload cost algorithms, the proposed algorithm can effectively guarantee the success rates of task offloading and execution. In addition, when the execution task volume is 200 kbit, the total system cost of the proposed algorithm can be reduced by at least 1.3%, and when the execution task volume is 600 kbit, the total cost of system execution tasks can be reduced by at most 16.7%.

  • SPSD: Semantics and Deep Reinforcement Learning Based Motion Planning for Supermarket Robot

    Jialun CAI  Weibo HUANG  Yingxuan YOU  Zhan CHEN  Bin REN  Hong LIU  

     
    PAPER-Positioning and Navigation

    Publicized: 2022/09/15
    Vol: E106-D No:5
    Page(s): 765-772

    Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, and the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. First, based on the backbone of deep reinforcement learning (DRL), supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Meanwhile, to solve the problem caused by the uncertainty of the reward in deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define the reward function. Finally, dynamics randomization is introduced during training to improve the generalization performance of the algorithm. The experimental results show that the SPSD algorithm performs well on the three indicators of generalization performance, training time, and path planning length. Compared with other methods, the training time of SPSD is reduced by at most 27.42%, the path planning length is reduced by at most 21.08%, and the trained network of SPSD can be applied to unfamiliar scenes safely and efficiently. The results are motivating enough to consider the application of the proposed method in practical scenes. We have uploaded the video of the results of the experiment to https://www.youtube.com/watch?v=h1wLpm42NZk.

  • Multi-Agent Reinforcement Learning for Cooperative Task Offloading in Distributed Edge Cloud Computing

    Shiyao DING  Donghui LIN  

     
    PAPER

    Publicized: 2021/12/28
    Vol: E105-D No:5
    Page(s): 936-945

    Distributed edge cloud computing is an important computation infrastructure for the Internet of Things (IoT), and its task offloading problem has attracted much attention recently. Most existing work on task offloading in distributed edge cloud computing assumes that each self-interested user owns one edge server and chooses whether to execute its tasks locally or to offload the tasks to cloud servers. The goal of each edge server is to maximize its own interest, such as low delay cost, which corresponds to a non-cooperative setting. However, with the strong development of smart IoT communities such as smart hospitals and smart factories, all edge and cloud servers can belong to one organization, such as a technology company. This corresponds to a cooperative setting where the goal of the organization is to maximize the team interest in the overall edge cloud computing system. In this paper, we consider a new problem called cooperative task offloading, where all edge servers try to cooperate to make the entire edge cloud computing system achieve good performance such as low delay cost and low energy cost. However, this problem is hard to solve due to two issues: 1) each edge server's status changes dynamically and task arrival is uncertain; 2) each edge server can observe only its own status, which makes it hard to optimize team interest as global information is unavailable. To address these issues, we formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP), which can handle the dynamic features under partial observations well. Then, we apply a multi-agent reinforcement learning algorithm called value decomposition network (VDN) and propose a VDN-based task offloading algorithm (VDN-TO) to solve the problem. Specifically, we use a team value function to evaluate the team interest, which is then divided into individual value functions for each edge server. Then, each edge server updates its individual value function in the direction that can maximize the team interest. Finally, we choose a part of a real dataset to evaluate our algorithm, and the results show the effectiveness of our algorithm in a comparison with some other existing methods.
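
    The value decomposition idea described above can be illustrated with a small sketch. It assumes the standard additive form used by value decomposition networks, in which the team value is the sum of the individual values; the function names and the toy numbers are hypothetical, and the paper's VDN-TO algorithm contains details that the abstract does not give.

```python
from typing import List, Sequence


def team_value(individual_q: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Additive decomposition: team value = sum of each agent's chosen-action value."""
    return sum(q[a] for q, a in zip(individual_q, actions))


def greedy_joint_action(individual_q: Sequence[Sequence[float]]) -> List[int]:
    """Each agent picks its own best action from its local values only."""
    return [max(range(len(q)), key=q.__getitem__) for q in individual_q]


if __name__ == "__main__":
    # Toy values: two edge servers with 2 and 3 available actions.
    q_values = [[1.0, 3.0], [2.0, 0.5, 1.5]]
    joint = greedy_joint_action(q_values)
    print(joint, team_value(q_values, joint))
```

    Because the team value is a sum, the per-agent greedy choices together maximize it, which is why each edge server can act on its own observation while still serving the team interest.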

  • Improved Metric Function for AlphaSeq Algorithm to Design Ideal Complementary Codes for Multi-Carrier CDMA Systems

    Shucong TIAN  Meng YANG  Jianpeng WANG  Rui WANG  Avik R. ADHIKARY  

     
    LETTER-Communication Theory and Signals

    Publicized: 2021/11/15
    Vol: E105-A No:5
    Page(s): 901-905

    AlphaSeq is a new paradigm to design sequences with desired properties based on deep reinforcement learning (DRL). In this work, we propose a new metric function and a new reward function to design an improved version of AlphaSeq. We show analytically and through numerical simulations that the proposed algorithm can discover sequence sets with preferable properties faster than the previous algorithm.

  • Control of Discrete-Time Chaotic Systems with Policy-Based Deep Reinforcement Learning

    Junya IKEMOTO  Toshimitsu USHIO  

     
    PAPER-Nonlinear Problems

    Vol: E103-A No:7
    Page(s): 885-892

    The OGY method is one of the control methods for chaotic systems. In this method, we have to calculate a target periodic orbit embedded in the chaotic attractor. Thus, we cannot use this method when a precise mathematical model of the chaotic system cannot be identified. In this case, the delayed feedback control proposed by Pyragas is useful. However, even in delayed feedback control, we need the mathematical model to determine a feedback gain that stabilizes the periodic orbit. Thus, we propose a reinforcement learning algorithm for the design of a controller for the chaotic system. Recently, reinforcement learning algorithms with deep neural networks have attracted much attention. Those algorithms make it possible to control complex systems. We propose a controller design method consisting of two steps, where we first determine a region including a target periodic point, and then make the controller learn an optimal control policy for its stabilization. The controller efficiently explores its control policy only in the region.

  • Deep-Reinforcement-Learning-Based Distributed Vehicle Position Controls for Coverage Expansion in mmWave V2X

    Akihito TAYA  Takayuki NISHIO  Masahiro MORIKURA  Koji YAMAMOTO  

     
    PAPER-Network Management/Operation

    Publicized: 2019/04/17
    Vol: E102-B No:10
    Page(s): 2054-2065

    In millimeter wave (mmWave) vehicular communications, multi-hop relay disconnection by line-of-sight (LOS) blockage is a critical problem, particularly in the early diffusion phase of mmWave-available vehicles, where not all vehicles have mmWave communication devices. This paper proposes a distributed position control method to establish long relay paths through road side units (RSUs). This is realized by a scheme in which autonomous vehicles change their relative positions to communicate with each other via LOS paths. Even though vehicles with the proposed method do not use all the information of the environment and do not cooperate with each other, they can decide their actions (e.g., lane change and overtaking) and form long relays using only information about their surroundings (e.g., surrounding vehicle positions). The decision-making problem is formulated as a Markov decision process such that autonomous vehicles can learn a practical movement strategy for making long relays by a reinforcement learning (RL) algorithm. This paper designs a learning algorithm based on a sophisticated deep reinforcement learning algorithm, asynchronous advantage actor-critic (A3C), which enables vehicles to learn a complex movement strategy quickly through its deep-neural-network architecture and multi-agent-learning mechanism. Once the strategy is well trained, vehicles can move independently to establish long relays and connect to the RSUs via the relays. Simulation results confirm that the proposed method can increase the relay length and coverage even if the traffic conditions and penetration ratio of mmWave communication devices in the learning and operation phases are different.

  • Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning

    Zhi-xiong XU  Lei CAO  Xi-liang CHEN  Chen-xi LI  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2018/06/18
    Vol: E101-D No:9
    Page(s): 2409-2412

    To address the trade-off between exploration and exploitation in deep reinforcement learning, this paper proposes “reward-based exploration strategy combined with Softmax action selection” (RBE-Softmax) as a dynamic exploration strategy to guide the agent to learn. The advantage of the proposed method is that characteristics of the agent's learning process are used to adapt exploration parameters online, so that the agent is able to select potentially optimal actions more effectively. The proposed method is evaluated in discrete and continuous control tasks on OpenAI Gym, and the empirical evaluation results show that the RBE-Softmax method leads to statistically significant improvement in the performance of deep reinforcement learning algorithms.
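
    Softmax action selection, which the abstract above builds on, can be sketched as follows. This is a generic Boltzmann distribution over action values with a temperature parameter; the reward-based rule that RBE-Softmax uses to adapt the exploration parameters online is not given in the abstract and is deliberately not modelled here. The function name and the `temperature` parameter are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax_probs(q_values: Sequence[float], temperature: float) -> List[float]:
    """Boltzmann action probabilities; lower temperature means greedier choices."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(q_values)
    exps = [math.exp((q - top) / temperature) for q in q_values]
    norm = sum(exps)
    return [e / norm for e in exps]


if __name__ == "__main__":
    q = [0.0, 1.0]
    print(softmax_probs(q, temperature=10.0))  # close to uniform: more exploration
    print(softmax_probs(q, temperature=0.1))   # nearly greedy: more exploitation
```

    An adaptive scheme would change `temperature` during training; how that change should depend on the reward is exactly the part that would need to come from the paper itself.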

  • Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach

    Zhi-xiong XU  Lei CAO  Xi-liang CHEN  Chen-xi LI  Yong-liang ZHANG  Jun LAI  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2018/05/22
    Vol: E101-D No:9
    Page(s): 2315-2322

    The commonly used Deep Q Networks algorithm is known to overestimate action values under certain conditions. It has also been shown that overestimation harms performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be considered an enhancement to the Deep Q Networks algorithm. First, the DSQN algorithm takes advantage of the experience replay and target network techniques in Deep Q Networks to improve the stability of neural networks. Second, a double estimator is utilized for Q-learning to reduce overestimation. In particular, we introduce Sarsa learning to Deep Q Networks to reduce overestimation further. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and lunar lander control tasks from OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.
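
    The three ingredients named in the abstract above, Q-learning, a double estimator, and Sarsa, differ mainly in how the bootstrap target is built. The sketch below shows the textbook one-step targets for a non-terminal transition; how DSQN combines them is not specified in the abstract, so no combination rule is shown, and all function and parameter names are illustrative.

```python
from typing import Sequence


def q_learning_target(reward: float, gamma: float, next_q: Sequence[float]) -> float:
    """Off-policy target: bootstrap from the maximum next-state value."""
    return reward + gamma * max(next_q)


def sarsa_target(reward: float, gamma: float, next_q: Sequence[float], next_action: int) -> float:
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q[next_action]


def double_q_target(
    reward: float,
    gamma: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
) -> float:
    """Double estimator: one network selects the action, the other evaluates it."""
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


if __name__ == "__main__":
    print(q_learning_target(1.0, 0.5, [2.0, 4.0]))
    print(sarsa_target(1.0, 0.5, [2.0, 4.0], next_action=0))
    print(double_q_target(1.0, 0.5, [2.0, 4.0], [5.0, 1.0]))
```

    The maximum in the Q-learning target is what tends to inflate estimates when values are noisy; the Sarsa and double-estimator targets avoid taking the maximum of the same noisy estimates that are then used for evaluation.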

  • A Study of Qualitative Knowledge-Based Exploration for Continuous Deep Reinforcement Learning

    Chenxi LI  Lei CAO  Xiaoming LIU  Xiliang CHEN  Zhixiong XU  Yongliang ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2017/07/26
    Vol: E100-D No:11
    Page(s): 2721-2724

    As an important method for solving sequential decision-making problems, reinforcement learning learns the policy of a task through interaction with the environment. However, it has difficulty scaling to large problems. One of the reasons is the exploration and exploitation dilemma, which may lead to inefficient learning. We present an approach that addresses this shortcoming by introducing qualitative knowledge into reinforcement learning, using cloud control systems to represent ‘if-then’ rules. We use it as a heuristic exploration strategy to guide action selection in deep reinforcement learning. Empirical evaluation results show that our approach can significantly improve the learning process.

  • Relation Extraction with Deep Reinforcement Learning

    Hongjun ZHANG  Yuntian FENG  Wenning HAO  Gang CHEN  Dawei JIN  

     
    PAPER-Natural Language Processing

    Publicized: 2017/05/17
    Vol: E100-D No:8
    Page(s): 1893-1902

    In recent years, deep learning has been widely applied to the relation extraction task. The method uses only word embeddings as network input and can model relations between target named entity pairs. It treats every relation mention equally, so it cannot effectively extract relations from a corpus with an enormous number of non-relations, which is the main reason why the performance of relation extraction is significantly lower than that of relation classification. This paper designs a deep reinforcement learning framework for relation extraction, which treats the relation extraction task as a two-step decision-making game. The method models relation mentions with a CNN and a Tree-LSTM, which calculate the initial state and the transition state for the game, respectively. In addition, we tackle the problem of an unbalanced corpus by designing a penalty function that increases the penalties for first-step decision-making errors. Finally, we use the Q-learning algorithm with value function approximation to learn the control policy π for the game. This paper sets up a series of experiments on the ACE2005 corpus, which show that the deep reinforcement learning framework can achieve state-of-the-art performance in the relation extraction task.