
Keyword Search Result

[Keyword] Q-learning (13 hits)

Hits 1-13
  • Performance Analysis and Optimization of Worst Case User in CoMP Ultra Dense Networks

    Sinh Cong LAM  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/03/27
      Vol:
    E106-B No:10
      Page(s):
    979-986

    In cellular systems, the Worst Case User (WCU), whose distances to the three nearest Base Stations (BSs) are similar, usually achieves the lowest performance. Improving user performance, especially for the WCU, is a major challenge for both network designers and operators. This paper studies the WCU under the Stretched Pathloss Model (SPLM), analyzing its coverage probability with stochastic geometry tools and optimizing its data rate under a transmission power constraint with reinforcement learning. In the analysis, only fast fading on the links from the WCU to its serving BSs is taken into account, which yields a lower bound on the coverage probability. Furthermore, the paper assumes that the Coordinated Multi-Point (CoMP) technique is employed only for the WCU, to enhance its downlink signal while avoiding an explosion of Intercell Interference (ICI). Through analysis and simulation, the paper shows that increasing the transmission power is a possible way to improve WCU performance in poor wireless environments, whereas in good environments the deployment of advanced techniques such as Joint Transmission (JT), Joint Scheduling (JS), and reinforcement learning is the more suitable solution.
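
    As a rough illustration of the reinforcement-learning side of such a formulation, the sketch below applies a single-state (bandit-style) Q-learning update to pick a discrete transmission power level. The reward model, power levels, and parameter values are illustrative assumptions and are not taken from the paper.

    import math
    import random

    # Hypothetical discretization of the transmission power within its constraint.
    POWER_LEVELS = [0.2, 0.4, 0.6, 0.8, 1.0]
    ALPHA, EPSILON = 0.1, 0.1

    q = [0.0] * len(POWER_LEVELS)      # one Q value per power level (single-state problem)

    def reward(power):
        """Toy Shannon-like rate with exponential fast fading and fixed noise-plus-interference."""
        gain = random.expovariate(1.0)
        sinr = power * gain / 0.15
        return math.log2(1.0 + sinr)

    for episode in range(5000):
        # epsilon-greedy choice of a power level
        if random.random() < EPSILON:
            a = random.randrange(len(POWER_LEVELS))
        else:
            a = max(range(len(POWER_LEVELS)), key=lambda i: q[i])
        # single-state update: there is no next state, so no bootstrap term
        q[a] += ALPHA * (reward(POWER_LEVELS[a]) - q[a])

    best = max(range(len(POWER_LEVELS)), key=lambda i: q[i])
    print("learned power level:", POWER_LEVELS[best])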

  • A Hybrid Routing Algorithm for V2V Communication in VANETs Based on Blocked Q-Learning

    Xiang BI  Huang HUANG  Benhong ZHANG  Xing WEI  

     
    PAPER-Network

      Publicized:
    2022/05/31
      Vol:
    E106-B No:1
      Page(s):
    1-17

    It is of great significance to design a stable and reliable routing protocol for Vehicular Ad Hoc Networks (VANETs) that adopt Vehicle-to-Vehicle (V2V) communication in the face of frequent network topology changes. In this paper, we propose a hybrid routing algorithm, RCRIQ, based on improved Q-learning. For an established cluster structure, the cluster head selects a gateway vehicle according to a gateway utility function in order to further extend the communication range of the cluster. During the link construction stage, an improved Q-learning algorithm is adopted: one candidate next hop is chosen as the neighbor with the maximum Q value in the neighbor list, while a heuristic algorithm chooses another candidate by the maximum heuristic function value. The two strategies are then evaluated jointly to determine the next hop, which ensures that the next hop is optimal in terms of reachability and other communication parameters. Simulation experiments show that the proposed algorithm has better performance in terms of routing stability, throughput, and communication delay in urban traffic scenarios.
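
    A minimal sketch of this kind of hybrid next-hop selection, combining a learned Q value with a heuristic score, is given below. The neighbor fields, the heuristic, and the weights are illustrative assumptions, not RCRIQ's actual definitions.

    from dataclasses import dataclass

    @dataclass
    class Neighbor:
        node_id: str
        q_value: float         # learned estimate of routing quality via this neighbor
        distance_to_dest: float
        link_lifetime: float   # predicted time the V2V link stays connected (s)

    def heuristic(n: Neighbor) -> float:
        """Toy heuristic: prefer neighbors closer to the destination with longer-lived links."""
        return n.link_lifetime / (1.0 + n.distance_to_dest)

    def select_next_hop(neighbors: list[Neighbor], w_q: float = 0.6, w_h: float = 0.4) -> Neighbor:
        """Score each neighbor by a weighted mix of its Q value and heuristic value."""
        return max(neighbors, key=lambda n: w_q * n.q_value + w_h * heuristic(n))

    if __name__ == "__main__":
        table = [
            Neighbor("v1", q_value=0.8, distance_to_dest=300.0, link_lifetime=4.0),
            Neighbor("v2", q_value=0.5, distance_to_dest=120.0, link_lifetime=9.0),
        ]
        print(select_next_hop(table).node_id)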

  • A Novel Hierarchical V2V Routing Algorithm Based on Bus in Urban VANETs

    Xiang BI  Shengzhen YANG  Benhong ZHANG  Xing WEI  

     
    PAPER-Network

      Publicized:
    2022/05/19
      Vol:
    E105-B No:12
      Page(s):
    1487-1497

    Multi-hop V2V communication is a fundamental way to realize data transmission in Vehicular Ad hoc Networks (VANETs). It has excellent potential in intelligent transportation systems and automated vehicle driving, and positively affects the safety, reliability, and comfort of vehicles. With their advantages in speed, trajectory, distribution along routes, size, and so on, urban buses have become promising relay nodes for urban VANETs. However, it is a considerable challenge to construct stable and reliable multi-hop routes (meeting requirements on bandwidth, delay, and bit error rate) because of the complexity of the urban road and bus-line network in the communication area and the many unevenly distributed buses on the roads. Given the above, this paper proposes a new hierarchical routing algorithm based on V2V geographic topology segmentation. Urban hierarchical routing is divided into two layers: the first layer, called coarse routing, is composed of areas; the second layer, called internal routing, is bus routing within an area. Q-learning is used to determine the sequence of buses that relay information within each area. The details are as follows. Firstly, based on a city map containing road network information, the entire city is divided into small grids by physical streets. Secondly, based on an analysis of the characteristics of the bus lines in adjacent grids, grids with the same routing attributes are merged into the same area, reducing the algorithm's computational complexity during route discovery. Then, for the resulting area set, a coarse route composed of selected areas is established by filtering out a group of areas that satisfy the requirements from the source node to the destination node. Finally, the bus sequence between anchor intersections is selected within each chosen area, and a complete multi-hop route from the source node to the destination node is constructed. Extensive simulations show that the proposed routing algorithm achieves more stable performance than similar algorithms in terms of packet transmission rate, average end-to-end delay, routing duration, and other indicators.
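
    As a small illustration of the grid-merging step described above, the sketch below flood-fills adjacent grid cells that share the same routing attribute into one area. The 2-D grid and its attribute labels are illustrative assumptions, not the paper's actual data or merging criteria.

    from collections import deque

    def merge_grids(attributes):
        """attributes: 2-D list of hashable routing attributes, one per grid cell.
        Returns a 2-D list of area ids; touching cells with equal attributes share an id."""
        rows, cols = len(attributes), len(attributes[0])
        area = [[-1] * cols for _ in range(rows)]
        next_id = 0
        for r in range(rows):
            for c in range(cols):
                if area[r][c] != -1:
                    continue
                # breadth-first flood fill over 4-connected cells with the same attribute
                queue = deque([(r, c)])
                area[r][c] = next_id
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and area[nr][nc] == -1
                                and attributes[nr][nc] == attributes[cr][cc]):
                            area[nr][nc] = next_id
                            queue.append((nr, nc))
                next_id += 1
        return area

    print(merge_grids([["A", "A", "B"],
                       ["C", "A", "B"]]))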

  • A Deep Q-Network Based Intelligent Decision-Making Approach for Cognitive Radar

    Yong TIAN  Peng WANG  Xinyue HOU  Junpeng YU  Xiaoyan PENG  Hongshu LIAO  Lin GAO  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2021/10/15
      Vol:
    E105-A No:4
      Page(s):
    719-726

    The electromagnetic environment is increasingly complex and changeable, and radar needs to meet the execution requirements of various tasks. Modern radars should raise their level of intelligence and be able to learn independently in dynamic countermeasure scenarios, so that the radar countermeasure strategy can shift from a traditional fixed anti-interference strategy to an efficient anti-interference strategy implemented dynamically and autonomously. Aiming at optimizing target-tracking performance in scenarios where multiple signals coexist, we propose a cognitive radar countermeasure method based on a deep Q-network. In this paper, we analyze the tracking performance of this method and the underlying Markov Decision Process under triangular frequency-sweeping interference. The simulation results show that reinforcement learning provides substantial autonomy and adaptability for solving such problems.
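
    For context, the sketch below shows the core of a deep Q-network: an epsilon-greedy policy over the outputs of a small neural network and a one-step temporal-difference update. The toy environment, network size, and hyperparameters are assumptions for illustration; a full DQN as usually described would also add experience replay and a separate target network, which are omitted here for brevity.

    import random
    import torch
    import torch.nn as nn

    N_STATE, N_ACTION = 4, 3            # e.g. a small feature vector and a few waveform choices
    GAMMA, EPSILON = 0.9, 0.1

    qnet = nn.Sequential(nn.Linear(N_STATE, 32), nn.ReLU(), nn.Linear(32, N_ACTION))
    optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def toy_step(state, action):
        """Stand-in environment: random next state, reward for a context-dependent action."""
        next_state = torch.rand(N_STATE)
        reward = 1.0 if action == int(state.argmax()) % N_ACTION else 0.0
        return next_state, reward

    state = torch.rand(N_STATE)
    for step in range(2000):
        # epsilon-greedy action selection from the current Q estimates
        if random.random() < EPSILON:
            action = random.randrange(N_ACTION)
        else:
            with torch.no_grad():
                action = int(qnet(state).argmax())
        next_state, reward = toy_step(state, action)
        # one-step temporal-difference target: r + gamma * max_a' Q(s', a')
        with torch.no_grad():
            target = reward + GAMMA * qnet(next_state).max()
        loss = loss_fn(qnet(state)[action], target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state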

  • A Real-Time Subtask-Assistance Strategy for Adaptive Services Composition

    Li QUAN  Zhi-liang WANG  Xin LIU  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2018/01/30
      Vol:
    E101-D No:5
      Page(s):
    1361-1369

    Reinforcement learning has been applied to adaptive service composition. However, traditional algorithms are not suitable for large-scale service composition. Based on the Q-learning algorithm, a multi-task-oriented algorithm named multi-Q learning is proposed to realize a subtask-assistance strategy for large-scale, adaptive service composition. Unlike previous studies that focus on a single task, we take the relationships between multiple service composition tasks into account. We decompose a complex service composition task into multiple subtasks according to graph theory. Different tasks that share the same subtasks can assist each other and thus improve their learning speed. Experimental results show that our algorithm learns noticeably faster than the traditional Q-learning algorithm. Compared with multi-agent Q-learning, our algorithm also converges faster. Moreover, for all involved service composition tasks that share subtasks, our algorithm improves the speed of learning the optimal policy for all of them simultaneously and in real time.
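
    A minimal sketch of the subtask-sharing idea is given below: two composition tasks that contain the same subtask update one shared Q table for it, so experience from either task benefits both. The subtask names, toy rewards, and parameters are illustrative assumptions, not the paper's formulation.

    from collections import defaultdict
    import random

    ALPHA, GAMMA = 0.1, 0.9
    # one shared Q table per subtask: shared_q[subtask][(state, action)] -> value
    shared_q = defaultdict(lambda: defaultdict(float))

    def update(subtask, state, action, reward, next_state, actions):
        """Standard Q-learning update applied to the subtask's shared table."""
        table = shared_q[subtask]
        best_next = max(table[(next_state, a)] for a in actions) if actions else 0.0
        table[(state, action)] += ALPHA * (reward + GAMMA * best_next - table[(state, action)])

    TASKS = {  # two composition tasks that both contain the "payment" subtask
        "task_A": ["search", "payment"],
        "task_B": ["booking", "payment"],
    }
    ACTIONS = ["svc1", "svc2"]

    for episode in range(1000):
        task = random.choice(list(TASKS))
        for subtask in TASKS[task]:
            s, a = "start", random.choice(ACTIONS)
            r = 1.0 if a == "svc1" else 0.2           # toy reward
            update(subtask, s, a, r, "done", ACTIONS)

    print(dict(shared_q["payment"]))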

  • Adaptive Q-Learning Cell Selection Method for Open-Access Femtocell Networks: Multi-User Case

    Chaima DHAHRI  Tomoaki OHTSUKI  

     
    PAPER-Network Management/Operation

      Vol:
    E97-B No:8
      Page(s):
    1679-1688

    Open-access femtocell networks promise the cellular user a better and stronger signal. However, due to the small range of femto base stations (FBSs), any motion of the user may trigger a handover, and in a dense environment such handovers can be very frequent. To avoid frequent communication disruptions due to phenomena such as the ping-pong effect, it is necessary to ensure the effectiveness of the cell selection method. Existing selection methods commonly use a measured channel/cell quality metric such as the channel capacity between the user and the target cell. However, the throughput experienced by the user is time-varying because of channel conditions, i.e., owing to propagation effects or the receiver location, so the conventional approach does not reflect future performance. To ensure efficient cell selection, the user's decision needs to depend not only on the current state of the network but also on possible future states (the horizon). To this end, we implement a learning algorithm that can predict, based on past experience, the best-performing cell in the future. We present in this paper a reinforcement learning (RL) framework as a generic solution to the cell selection problem in a non-stationary femtocell network: without prior knowledge about the environment, it selects a target cell by exploring past cell behavior and predicting potential future states based on the Q-learning algorithm. We then extend this proposal with a fuzzy inference system (FIS) that tunes the Q-learning parameters during the learning process to adapt to environment changes. Our solution aims at minimizing the frequency of handovers without affecting the user experience in terms of channel capacity. Simulation results demonstrate that (i) our solution comes very close to the performance of the opportunistic method in terms of capacity while requiring fewer handovers on average, and (ii) using fuzzy rules achieves better performance in terms of received reward (capacity) and number of handovers than fixing the values of the Q-learning parameters.
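
    As a rough sketch of Q-learning-based cell selection with parameters tuned online, the code below adapts the learning and exploration rates from the recent prediction error; this crude rule is only a stand-in for the paper's fuzzy inference system, and the cells, capacity model, and thresholds are illustrative assumptions.

    import random

    CELLS = ["macro", "femto_1", "femto_2"]
    q = {c: 0.0 for c in CELLS}
    alpha, epsilon = 0.5, 0.3

    def capacity(cell):
        """Toy time-varying capacity sample for the chosen cell (Mbit/s)."""
        base = {"macro": 5.0, "femto_1": 8.0, "femto_2": 6.0}[cell]
        return random.gauss(base, 1.0)

    recent_error = 1.0
    for step in range(3000):
        cell = random.choice(CELLS) if random.random() < epsilon else max(q, key=q.get)
        r = capacity(cell)
        td_error = r - q[cell]
        q[cell] += alpha * td_error
        # crude adaptation in place of the FIS: shrink alpha/epsilon as estimates stabilize
        recent_error = 0.95 * recent_error + 0.05 * abs(td_error)
        alpha = min(0.5, max(0.05, recent_error / 10.0))
        epsilon = min(0.3, max(0.01, recent_error / 20.0))

    print("preferred cell:", max(q, key=q.get), {c: round(v, 2) for c, v in q.items()})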

  • Optimal Channel-Sensing Scheme for Cognitive Radio Systems Based on Fuzzy Q-Learning

    Fereidoun H. PANAHI  Tomoaki OHTSUKI  

     
    PAPER

      Vol:
    E97-B No:2
      Page(s):
    283-294

    In a cognitive radio (CR) network, the channel sensing scheme used to detect the existence of a primary user (PU) directly affects the performance of both the CR and the PU. However, in practical systems the CR is prone to sensing errors due to inefficiencies in the sensing scheme, which may cause interference to the primary user and degrade system performance. In this paper, we present a learning-based scheme for channel sensing in CR networks. Specifically, we formulate the channel sensing problem as a partially observable Markov decision process (POMDP), in which the most likely channel state is derived by a learning process called Fuzzy Q-Learning (FQL). The optimal sensing policy is then derived by solving this problem. Simulation results show the effectiveness and efficiency of our proposed scheme.
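
    The sketch below illustrates the general shape of fuzzy Q-learning: each fuzzy rule holds its own Q values, an action's overall value is the firing-strength-weighted combination over rules, and each rule's Q value is updated in proportion to its firing strength. The membership functions, actions, reward model, and belief update here are illustrative assumptions and not the paper's POMDP formulation.

    import random

    ACTIONS = ["sense", "transmit"]
    ALPHA, GAMMA = 0.1, 0.9

    def memberships(belief):
        """Two fuzzy sets over the belief in [0, 1]: 'PU likely busy' / 'PU likely idle'."""
        return {"busy": 1.0 - belief, "idle": belief}

    # one Q value per (fuzzy rule, action)
    q = {rule: {a: 0.0 for a in ACTIONS} for rule in ("busy", "idle")}

    def fuzzy_q(belief, action):
        """Q value of an action as the firing-strength-weighted average over rules."""
        mu = memberships(belief)
        return sum(mu[r] * q[r][action] for r in mu) / sum(mu.values())

    def update(belief, action, reward, next_belief):
        mu = memberships(belief)
        target = reward + GAMMA * max(fuzzy_q(next_belief, a) for a in ACTIONS)
        for r, weight in mu.items():
            q[r][action] += ALPHA * weight * (target - q[r][action])

    belief = 0.5
    for step in range(5000):
        if random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: fuzzy_q(belief, a))
        pu_idle = random.random() < 0.7                      # toy primary-user behaviour
        reward = 1.0 if (action == "transmit" and pu_idle) else (-1.0 if action == "transmit" else 0.1)
        next_belief = 0.9 * belief + 0.1 * (1.0 if pu_idle else 0.0)  # toy belief update
        update(belief, action, reward, next_belief)
        belief = next_belief

    print({r: {a: round(v, 2) for a, v in av.items()} for r, av in q.items()})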

  • Network Selection for Cognitive Radio Based on Fuzzy Learning

    Mo LI  Youyun XU  Ruiqin MIAO  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E94-B No:12
      Page(s):
    3490-3497

    Cognitive radio is a promising approach to ensuring the coexistence of heterogeneous wireless networks, since it can perceive wireless conditions and freely switch among different network modes. When many network opportunities exist, how to select the appropriate network for the CR user's current service is the main problem studied in this paper. We make full use of the intelligence of the CR user and propose a fuzzy-learning-based network selection scheme, in which the selection is made based on estimated evaluations of the available networks. Multiple factors are considered when estimating these evaluations: external environment factors directly sensed by the CR user (the signal strength of the available network and the network mode) as well as a factor that cannot be determined beforehand and is learnt by our scheme (the bandwidth allocated by the candidate network). Over repeated interactions with the wireless environment, experience of the network selection behavior is accumulated, which directs our scheme toward a proper network decision. Two simulations verify that our scheme not only better satisfies the bandwidth requirement of the CR user than three other network selection methods, but is also reasonable in its utilization of the available resources of these networks.

  • Distributed Reinforcement Learning Approach for Vehicular Ad Hoc Networks

    Celimuge WU  Kazuya KUMEKAWA  Toshihiko KATO  

     
    PAPER-Network

      Vol:
    E93-B No:6
      Page(s):
    1431-1442

    In Vehicular Ad hoc Networks (VANETs), general-purpose ad hoc routing protocols such as AODV cannot work efficiently due to the frequent changes in network topology caused by vehicle movement. This paper proposes a VANET routing protocol, QLAODV (Q-Learning AODV), which suits unicast applications in high-mobility scenarios. QLAODV is a distributed reinforcement learning routing protocol that uses a Q-learning algorithm to infer network state information and uses unicast control packets to check path availability in real time, so that Q-learning can work efficiently in a highly dynamic network environment. QLAODV also benefits from a dynamic route change mechanism, which allows it to react quickly to network topology changes. We present an analysis of the performance of QLAODV by simulation using different mobility models. The simulation results show that QLAODV can efficiently handle unicast applications in VANETs.
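
    A minimal sketch of a distributed, Q-routing-style update is shown below: each node keeps Q[destination][next hop] and refreshes it from link feedback plus the neighbor's own best estimate. The topology, reward model, and parameters are illustrative assumptions and are not taken from QLAODV.

    from collections import defaultdict
    import random

    ALPHA, GAMMA = 0.3, 0.8

    class Node:
        def __init__(self, name):
            self.name = name
            self.neighbors = []
            # q[dest][next_hop]: learned quality of forwarding toward dest via next_hop
            self.q = defaultdict(lambda: defaultdict(float))

        def choose_next_hop(self, dest, epsilon=0.1):
            if random.random() < epsilon:
                return random.choice(self.neighbors)
            return max(self.neighbors, key=lambda n: self.q[dest][n.name])

        def update(self, dest, next_hop, link_ok):
            """Reward good links; bootstrap on the neighbor's own best estimate toward dest."""
            reward = 1.0 if link_ok else -1.0
            neighbor_best = max(next_hop.q[dest].values(), default=0.0)
            old = self.q[dest][next_hop.name]
            self.q[dest][next_hop.name] = old + ALPHA * (reward + GAMMA * neighbor_best - old)

    a, b, c = Node("A"), Node("B"), Node("C")
    a.neighbors = [b, c]
    for step in range(500):
        hop = a.choose_next_hop("D")
        link_ok = random.random() < (0.9 if hop is b else 0.4)   # B is the more stable relay
        a.update("D", hop, link_ok)
    print(dict(a.q["D"]))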

  • CHQ: A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes

    Hiroshi OSADA  Satoshi FUJITA  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Vol:
    E88-D No:5
      Page(s):
    1004-1011

    In this paper, we propose a new reinforcement learning scheme called CHQ that can efficiently acquire appropriate policies under partially observable Markov decision processes (POMDPs) involving probabilistic state transitions, which frequently occur in multi-agent systems where each agent independently takes a probabilistic action based on a partial observation of the underlying environment. A key idea of CHQ is to extend the HQ-learning proposed by Wiering et al. so that it can learn the activation order of the MDP subtasks as well as an appropriate policy for each subtask. The effectiveness of the proposed scheme is evaluated experimentally. The results imply that it can acquire a deterministic policy with a sufficiently high success rate, even when the given task is a POMDP with probabilistic state transitions.
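
    The sketch below illustrates only the activation-order part of such a hierarchical scheme: a high-level Q table learns which subtask to activate at each stage from the subtasks' success signals, while the low-level per-subtask policies are left out for brevity. The subtasks, their prerequisites, and all parameters are illustrative assumptions, not CHQ itself.

    from collections import defaultdict
    import random

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
    SUBTASKS = ["find_key", "open_door", "reach_goal"]
    high_q = defaultdict(float)   # key: (stage, subtask) -> value of activating subtask there

    def run_subtask(name, done):
        """Toy low-level execution: a subtask only succeeds if its prerequisites are done."""
        prereq = {"find_key": set(), "open_door": {"find_key"}, "reach_goal": {"open_door"}}
        return 1.0 if prereq[name] <= done and random.random() < 0.9 else 0.0

    for episode in range(5000):
        done = set()
        for stage in range(len(SUBTASKS)):
            if random.random() < EPSILON:
                task = random.choice(SUBTASKS)
            else:
                task = max(SUBTASKS, key=lambda t: high_q[(stage, t)])
            r = run_subtask(task, done)
            if r:
                done.add(task)
            best_next = max((high_q[(stage + 1, t)] for t in SUBTASKS), default=0.0)
            high_q[(stage, task)] += ALPHA * (r + GAMMA * best_next - high_q[(stage, task)])

    print([max(SUBTASKS, key=lambda t: high_q[(s, t)]) for s in range(len(SUBTASKS))])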

  • Labeling Q-Learning in POMDP Environments

    Haeyeon LEE  Hiroyuki KAMAYA  Kenichi ABE  

     
    PAPER-Biocybernetics, Neurocomputing

      Vol:
    E85-D No:9
      Page(s):
    1425-1432

    This paper presents a new Reinforcement Learning (RL) method, called "Labeling Q-learning (LQ-learning)," for solving partially observable Markov Decision Process (POMDP) problems. Hierarchical RL methods have recently been widely studied, but they have the drawback that learning time and memory are consumed merely to maintain the hierarchical structure, even when it is not necessary. In contrast, LQ-learning has no hierarchical structure and instead adopts a new type of internal memory mechanism: the agent perceives the current state as a pair of an observation and its label, which allows it to distinguish more precisely between states that look the same but are in fact different. That is, at each step t we define a new type of perception of the environment, õ_t = (o_t, θ_t), where o_t is the conventional observation and θ_t is the label attached to o_t. The classical RL algorithm is then used as if the pair (o_t, θ_t) were a Markov state. The labeling is carried out by a Boolean variable, called "CHANGE," and a hash-like or mod function, called the Labeling Function (LF). To demonstrate the efficiency of LQ-learning, we apply it to "maze problems" in Grid-Worlds, which are used in much of the literature as simulated POMDP environments. Using LQ-learning, we can solve these maze problems without initial knowledge of the environment.
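
    The sketch below illustrates the labeling idea in isolation: the Q table is keyed by (observation, label) rather than by the observation alone, and a mod-style labeling function advances the label whenever a CHANGE condition fires (here, naively, when the same observation repeats). The toy observations, the CHANGE rule, and the parameters are illustrative assumptions, not the paper's definitions.

    from collections import defaultdict
    import random

    ACTIONS = ["N", "S", "E", "W"]
    NUM_LABELS = 4
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
    q = defaultdict(float)                     # key: ((observation, label), action)

    def next_label(label, change):
        """Labeling Function: advance the label modulo NUM_LABELS when CHANGE is true."""
        return (label + 1) % NUM_LABELS if change else label

    def choose(state):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(state, a)])

    def learn(state, action, reward, next_state):
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

    # toy episode: aliased observations stand in for a real maze
    label, prev_obs = 0, None
    for step in range(1000):
        obs = random.choice(["corridor", "junction", "dead_end"])
        label = next_label(label, change=(obs == prev_obs))   # CHANGE: same observation again
        state = (obs, label)
        action = choose(state)
        next_obs = random.choice(["corridor", "junction", "dead_end"])
        reward = 1.0 if next_obs == "junction" else 0.0       # arbitrary toy reward
        next_state = (next_obs, next_label(label, change=(next_obs == obs)))
        learn(state, action, reward, next_state)
        prev_obs = obs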

  • Agent-Oriented Routing in Telecommunications Networks

    Karla VITTORI  Aluizio F. R. ARAUJO  

     
    PAPER-Software Platform

      Vol:
    E84-B No:11
      Page(s):
    3006-3013

    This paper presents an intelligent routing algorithm, called Q-Agents, which bases its actions only on agent-environment interaction. The algorithm combines properties of three learning strategies (Q-learning, dual reinforcement learning, and learning based on ant colony behavior) and adds two further mechanisms to improve its adaptability. The proposed algorithm is thus composed of a set of agents that move through the network independently and concurrently, searching for the best routes. The agents share knowledge about the quality of the paths traversed through indirect communication. Information about the network and traffic status is updated using the Q-learning and dual reinforcement updating rules. Q-Agents were applied to a model of an AT&T circuit-switched network. Experiments examined the performance of the algorithm under variations in traffic patterns, load level, and topology, and with noise added to the information used to route calls. Q-Agents lost fewer calls than two algorithms based entirely on ant colony behavior.
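
    A tiny sketch of the dual-reinforcement idea mentioned above is given below: when an agent hops from node x to node y, the same delay sample updates both the forward estimate at x (toward the destination) and the backward estimate at y (toward the source). The topology, delay model, and parameters are illustrative assumptions, not the paper's network model.

    from collections import defaultdict
    import random

    ALPHA, GAMMA = 0.2, 0.9
    # q[node][(dest, next_hop)] -> estimated quality of forwarding toward dest via next_hop
    q = defaultdict(lambda: defaultdict(float))
    LINKS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

    def best(node, dest):
        return max((q[node][(dest, n)] for n in LINKS[node]), default=0.0)

    def hop(x, y, src, dest, delay):
        reward = -delay
        # forward update at x: quality of reaching dest via y
        q[x][(dest, y)] += ALPHA * (reward + GAMMA * best(y, dest) - q[x][(dest, y)])
        # dual (backward) update at y: quality of reaching src via x, from the same sample
        q[y][(src, x)] += ALPHA * (reward + GAMMA * best(x, src) - q[y][(src, x)])

    # usage: an agent wandering from A toward D, updating both directions on every hop
    node, src, dest = "A", "A", "D"
    for step in range(1000):
        nxt = random.choice(LINKS[node])
        hop(node, nxt, src, dest, delay=random.uniform(0.1, 1.0))
        node = nxt if nxt != dest else "A"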

  • Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment

    Gang ZHAO  Shoji TATSUMI  Ruoying SUN  

     
    PAPER-Algorithms and Data Structures

      Vol:
    E83-A No:9
      Page(s):
    1786-1795

    Reinforcement Learning (RL) is an efficient method for solving Markov Decision Processes (MDPs) without a priori knowledge of the environment, and RL methods can be classified as exploitation-oriented or exploration-oriented. Q-learning is a representative RL method and is classified as exploration-oriented. It is guaranteed to obtain an optimal policy; however, it needs numerous trials to learn one because it has no action-selecting mechanism. To accelerate the learning rate of Q-learning and realize both exploitation and exploration during the learning process, the Q-ee learning system has been proposed, which uses a pre-action-selector, an action-selector, and back-propagation of Q values to improve the performance of Q-learning. However, Q-ee learning is only suitable for deterministic MDPs, and its guarantee of convergence to an optimal policy has not been proven. In this paper, after discussing different exploration methods, we replace the pre-action-selector of Q-ee learning with a method that implements active exploration of the environment, Active Exploration Planning (AEP); we call the resulting system Q-ae learning. With this replacement, Q-ae learning not only retains the advantages of Q-ee learning but is also adapted to stochastic environments. Moreover, for deterministic MDPs, this paper presents the convergence condition, with proof, under which an agent obtains the optimal policy by Q-ae learning. Further, discussions and experiments show that, by adjusting the relation between the learning factor and the discount rate, the exploration process can be controlled in a stochastic environment. Experimental results on the exploration rate and the correctness rate of the learned policies also illustrate the efficiency of Q-ae learning in a stochastic environment.
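
    For contrast with plain epsilon-greedy Q-learning, the sketch below adds a simple count-based exploration bonus when selecting actions; this is only a generic stand-in for an explicit exploration mechanism and is not the paper's AEP or the Q-ee pre-action-selector. The chain environment and all parameters are illustrative assumptions.

    from collections import defaultdict
    import random

    N_STATES, ACTIONS = 6, ["left", "right"]
    ALPHA, GAMMA = 0.2, 0.95

    def step(s, a):
        """Toy chain: 'right' moves toward the goal state, which pays reward 1."""
        s2 = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
        return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

    def train(use_bonus, episodes=300):
        q = defaultdict(float)
        visits = defaultdict(int)
        for _ in range(episodes):
            s = 0
            for _ in range(3 * N_STATES):
                if not use_bonus and random.random() < 0.1:
                    a = random.choice(ACTIONS)                 # plain epsilon-greedy exploration
                else:
                    a = max(ACTIONS, key=lambda x: q[(s, x)]
                            + (1.0 / (1 + visits[(s, x)]) if use_bonus else 0.0))
                s2, r = step(s, a)
                visits[(s, a)] += 1
                best_next = max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
                s = s2
                if r:
                    break
        return q

    q = train(use_bonus=True)
    print("greedy policy:", [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)])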