
Keyword Search Result

[Keyword] Q-learning (13 hits)

Hits 1-13
  • Performance Analysis and Optimization of Worst Case User in CoMP Ultra Dense Networks

    Sinh Cong LAM  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2023/03/27
      Vol:
    E106-B No:10
      Page(s):
    979-986

    In cellular systems, the Worst Case User (WCU), whose distances to the three nearest Base Stations (BSs) are similar, usually achieves the lowest performance. Improving user performance, especially for the WCU, is a major challenge for both network designers and operators. This paper studies the WCU under the Stretched Pathloss Model (SPLM), analyzing its coverage probability with stochastic geometry tools and optimizing its data rate under a transmission power constraint with reinforcement learning. In the analysis, only fast fading on the links from the WCU to its serving BSs is taken into account, which yields a lower bound on the coverage probability. Furthermore, the paper assumes that the Coordinated Multi-Point (CoMP) technique is employed only for the WCU, to enhance its downlink signal while avoiding an explosion of Intercell Interference (ICI). Through analysis and simulation, the paper shows that increasing the transmission power is a possible way to improve WCU performance in poor wireless environments, whereas in good environments the deployment of advanced techniques such as Joint Transmission (JT), Joint Scheduling (JS), and reinforcement learning is the more suitable solution.
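
    As a rough illustration of the reinforcement-learning side of such a formulation, the sketch below applies a single-state (bandit-style) Q-learning update to pick a discrete transmission power level. The reward model, power levels, and parameter values are illustrative assumptions and are not taken from the paper.

    import math
    import random

    # Hypothetical discretization of the transmission power within its constraint.
    POWER_LEVELS = [0.2, 0.4, 0.6, 0.8, 1.0]
    ALPHA, EPSILON = 0.1, 0.1

    q = [0.0] * len(POWER_LEVELS)      # one Q value per power level (single-state problem)

    def reward(power):
        """Toy Shannon-like rate with exponential fast fading and fixed noise-plus-interference."""
        gain = random.expovariate(1.0)
        sinr = power * gain / 0.15
        return math.log2(1.0 + sinr)

    for episode in range(5000):
        # epsilon-greedy choice of a power level
        if random.random() < EPSILON:
            a = random.randrange(len(POWER_LEVELS))
        else:
            a = max(range(len(POWER_LEVELS)), key=lambda i: q[i])
        # single-state update: there is no next state, so no bootstrap term
        q[a] += ALPHA * (reward(POWER_LEVELS[a]) - q[a])

    best = max(range(len(POWER_LEVELS)), key=lambda i: q[i])
    print("learned power level:", POWER_LEVELS[best])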

  • A Hybrid Routing Algorithm for V2V Communication in VANETs Based on Blocked Q-Learning

    Xiang BI  Huang HUANG  Benhong ZHANG  Xing WEI  

     
    PAPER-Network

      Publicized:
    2022/05/31
      Vol:
    E106-B No:1
      Page(s):
    1-17

    It is of great significance to design a stable and reliable routing protocol for Vehicular Ad Hoc Networks (VANETs) that adopt Vehicle-to-Vehicle (V2V) communication in the face of frequent network topology changes. In this paper, we propose a hybrid routing algorithm, RCRIQ, based on improved Q-learning. For an established cluster structure, the cluster head selects a gateway vehicle according to a gateway utility function in order to further extend the communication range of the cluster. During the link construction stage, an improved Q-learning algorithm is adopted: one candidate next hop is chosen as the neighbor with the maximum Q value in the neighbor list, while a heuristic algorithm chooses another candidate by the maximum heuristic function value. The two strategies are then evaluated jointly to determine the next hop, which ensures that the next hop is optimal in terms of reachability and other communication parameters. Simulation experiments show that the proposed algorithm has better performance in terms of routing stability, throughput, and communication delay in urban traffic scenarios.
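
    A minimal sketch of this kind of hybrid next-hop selection, combining a learned Q value with a heuristic score, is given below. The neighbor fields, the heuristic, and the weights are illustrative assumptions, not RCRIQ's actual definitions.

    from dataclasses import dataclass

    @dataclass
    class Neighbor:
        node_id: str
        q_value: float         # learned estimate of routing quality via this neighbor
        distance_to_dest: float
        link_lifetime: float   # predicted time the V2V link stays connected (s)

    def heuristic(n: Neighbor) -> float:
        """Toy heuristic: prefer neighbors closer to the destination with longer-lived links."""
        return n.link_lifetime / (1.0 + n.distance_to_dest)

    def select_next_hop(neighbors: list[Neighbor], w_q: float = 0.6, w_h: float = 0.4) -> Neighbor:
        """Score each neighbor by a weighted mix of its Q value and heuristic value."""
        return max(neighbors, key=lambda n: w_q * n.q_value + w_h * heuristic(n))

    if __name__ == "__main__":
        table = [
            Neighbor("v1", q_value=0.8, distance_to_dest=300.0, link_lifetime=4.0),
            Neighbor("v2", q_value=0.5, distance_to_dest=120.0, link_lifetime=9.0),
        ]
        print(select_next_hop(table).node_id)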

  • A Novel Hierarchical V2V Routing Algorithm Based on Bus in Urban VANETs

    Xiang BI  Shengzhen YANG  Benhong ZHANG  Xing WEI  

     
    PAPER-Network

      Publicized:
    2022/05/19
      Vol:
    E105-B No:12
      Page(s):
    1487-1497

    Multi-hop V2V communication is a fundamental way to realize data transmission in Vehicular Ad hoc Networks (VANETs). It has excellent potential in intelligent transportation systems and automated vehicle driving, and positively affects the safety, reliability, and comfort of vehicles. With their advantages in speed, trajectory, distribution along routes, size, and so on, urban buses have become promising relay nodes for urban VANETs. However, it is a considerable challenge to construct stable and reliable multi-hop routes (meeting requirements on bandwidth, delay, and bit error rate) because of the complexity of the urban road and bus-line network in the communication area and the many unevenly distributed buses on the roads. Given the above, this paper proposes a new hierarchical routing algorithm based on V2V geographic topology segmentation. Urban hierarchical routing is divided into two layers: the first layer, called coarse routing, is composed of areas; the second layer, called internal routing, is bus routing within an area. Q-learning is used to determine the sequence of buses that relay information within each area. The details are as follows. Firstly, based on a city map containing road network information, the entire city is divided into small grids by physical streets. Secondly, based on an analysis of the characteristics of the bus lines in adjacent grids, grids with the same routing attributes are merged into the same area, reducing the algorithm's computational complexity during route discovery. Then, for the resulting area set, a coarse route composed of selected areas is established by filtering out a group of areas that satisfy the requirements from the source node to the destination node. Finally, the bus sequence between anchor intersections is selected within each chosen area, and a complete multi-hop route from the source node to the destination node is constructed. Extensive simulations show that the proposed routing algorithm achieves more stable performance than similar algorithms in terms of packet transmission rate, average end-to-end delay, routing duration, and other indicators.
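
    As a small illustration of the grid-merging step described above, the sketch below flood-fills adjacent grid cells that share the same routing attribute into one area. The 2-D grid and its attribute labels are illustrative assumptions, not the paper's actual data or merging criteria.

    from collections import deque

    def merge_grids(attributes):
        """attributes: 2-D list of hashable routing attributes, one per grid cell.
        Returns a 2-D list of area ids; touching cells with equal attributes share an id."""
        rows, cols = len(attributes), len(attributes[0])
        area = [[-1] * cols for _ in range(rows)]
        next_id = 0
        for r in range(rows):
            for c in range(cols):
                if area[r][c] != -1:
                    continue
                # breadth-first flood fill over 4-connected cells with the same attribute
                queue = deque([(r, c)])
                area[r][c] = next_id
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and area[nr][nc] == -1
                                and attributes[nr][nc] == attributes[cr][cc]):
                            area[nr][nc] = next_id
                            queue.append((nr, nc))
                next_id += 1
        return area

    print(merge_grids([["A", "A", "B"],
                       ["C", "A", "B"]]))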

  • A Deep Q-Network Based Intelligent Decision-Making Approach for Cognitive Radar

    Yong TIAN  Peng WANG  Xinyue HOU  Junpeng YU  Xiaoyan PENG  Hongshu LIAO  Lin GAO  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2021/10/15
      Vol:
    E105-A No:4
      Page(s):
    719-726

    The electromagnetic environment is increasingly complex and changeable, and radar needs to meet the execution requirements of various tasks. Modern radars should raise their level of intelligence and be able to learn independently in dynamic countermeasure scenarios, so that the radar countermeasure strategy can shift from a traditional fixed anti-interference strategy to an efficient anti-interference strategy implemented dynamically and autonomously. Aiming at optimizing target-tracking performance in scenarios where multiple signals coexist, we propose a cognitive radar countermeasure method based on a deep Q-network. In this paper, we analyze the tracking performance of this method and the underlying Markov Decision Process under triangular frequency-sweeping interference. The simulation results show that reinforcement learning provides substantial autonomy and adaptability for solving such problems.
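
    For context, the sketch below shows the core of a deep Q-network: an epsilon-greedy policy over the outputs of a small neural network and a one-step temporal-difference update. The toy environment, network size, and hyperparameters are assumptions for illustration; a full DQN as usually described would also add experience replay and a separate target network, which are omitted here for brevity.

    import random
    import torch
    import torch.nn as nn

    N_STATE, N_ACTION = 4, 3            # e.g. a small feature vector and a few waveform choices
    GAMMA, EPSILON = 0.9, 0.1

    qnet = nn.Sequential(nn.Linear(N_STATE, 32), nn.ReLU(), nn.Linear(32, N_ACTION))
    optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def toy_step(state, action):
        """Stand-in environment: random next state, reward for a context-dependent action."""
        next_state = torch.rand(N_STATE)
        reward = 1.0 if action == int(state.argmax()) % N_ACTION else 0.0
        return next_state, reward

    state = torch.rand(N_STATE)
    for step in range(2000):
        # epsilon-greedy action selection from the current Q estimates
        if random.random() < EPSILON:
            action = random.randrange(N_ACTION)
        else:
            with torch.no_grad():
                action = int(qnet(state).argmax())
        next_state, reward = toy_step(state, action)
        # one-step temporal-difference target: r + gamma * max_a' Q(s', a')
        with torch.no_grad():
            target = reward + GAMMA * qnet(next_state).max()
        loss = loss_fn(qnet(state)[action], target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state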

  • A Real-Time Subtask-Assistance Strategy for Adaptive Services Composition

    Li QUAN  Zhi-liang WANG  Xin LIU  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2018/01/30
      Vol:
    E101-D No:5
      Page(s):
    1361-1369

    Reinforcement learning has been applied to adaptive service composition. However, traditional algorithms are not suitable for large-scale service composition. Based on the Q-learning algorithm, a multi-task-oriented algorithm named multi-Q learning is proposed to realize a subtask-assistance strategy for large-scale, adaptive service composition. Unlike previous studies that focus on a single task, we take the relationships between multiple service composition tasks into account. We decompose a complex service composition task into multiple subtasks according to graph theory. Different tasks that share the same subtasks can assist each other and thus improve their learning speed. Experimental results show that our algorithm learns noticeably faster than the traditional Q-learning algorithm. Compared with multi-agent Q-learning, our algorithm also converges faster. Moreover, for all involved service composition tasks that share subtasks, our algorithm improves the speed of learning the optimal policy for all of them simultaneously and in real time.
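
    A minimal sketch of the subtask-sharing idea is given below: two composition tasks that contain the same subtask update one shared Q table for it, so experience from either task benefits both. The subtask names, toy rewards, and parameters are illustrative assumptions, not the paper's formulation.

    from collections import defaultdict
    import random

    ALPHA, GAMMA = 0.1, 0.9
    # one shared Q table per subtask: shared_q[subtask][(state, action)] -> value
    shared_q = defaultdict(lambda: defaultdict(float))

    def update(subtask, state, action, reward, next_state, actions):
        """Standard Q-learning update applied to the subtask's shared table."""
        table = shared_q[subtask]
        best_next = max(table[(next_state, a)] for a in actions) if actions else 0.0
        table[(state, action)] += ALPHA * (reward + GAMMA * best_next - table[(state, action)])

    TASKS = {  # two composition tasks that both contain the "payment" subtask
        "task_A": ["search", "payment"],
        "task_B": ["booking", "payment"],
    }
    ACTIONS = ["svc1", "svc2"]

    for episode in range(1000):
        task = random.choice(list(TASKS))
        for subtask in TASKS[task]:
            s, a = "start", random.choice(ACTIONS)
            r = 1.0 if a == "svc1" else 0.2           # toy reward
            update(subtask, s, a, r, "done", ACTIONS)

    print(dict(shared_q["payment"]))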

  • Adaptive Q-Learning Cell Selection Method for Open-Access Femtocell Networks: Multi-User Case

    Chaima DHAHRI  Tomoaki OHTSUKI  

     
    PAPER-Network Management/Operation

      Vol:
    E97-B No:8
      Page(s):
    1679-1688

    Open-access femtocell networks promise the cellular user a better and stronger signal. However, due to the small range of femto base stations (FBSs), any motion of the user may trigger a handover, and in a dense environment such handovers can be very frequent. To avoid frequent communication disruptions due to phenomena such as the ping-pong effect, it is necessary to ensure the effectiveness of the cell selection method. Existing selection methods commonly use a measured channel/cell quality metric such as the channel capacity between the user and the target cell. However, the throughput experienced by the user is time-varying because of channel conditions, i.e., owing to propagation effects or the receiver location, so the conventional approach does not reflect future performance. To ensure efficient cell selection, the user's decision needs to depend not only on the current state of the network but also on possible future states (the horizon). To this end, we implement a learning algorithm that can predict, based on past experience, the best-performing cell in the future. We present in this paper a reinforcement learning (RL) framework as a generic solution to the cell selection problem in a non-stationary femtocell network: without prior knowledge about the environment, it selects a target cell by exploring past cell behavior and predicting potential future states based on the Q-learning algorithm. We then extend this proposal with a fuzzy inference system (FIS) that tunes the Q-learning parameters during the learning process to adapt to environment changes. Our solution aims at minimizing the frequency of handovers without affecting the user experience in terms of channel capacity. Simulation results demonstrate that (i) our solution comes very close to the performance of the opportunistic method in terms of capacity while requiring fewer handovers on average, and (ii) using fuzzy rules achieves better performance in terms of received reward (capacity) and number of handovers than fixing the values of the Q-learning parameters.
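
    As a rough sketch of Q-learning-based cell selection with parameters tuned online, the code below adapts the learning and exploration rates from the recent prediction error; this crude rule is only a stand-in for the paper's fuzzy inference system, and the cells, capacity model, and thresholds are illustrative assumptions.

    import random

    CELLS = ["macro", "femto_1", "femto_2"]
    q = {c: 0.0 for c in CELLS}
    alpha, epsilon = 0.5, 0.3

    def capacity(cell):
        """Toy time-varying capacity sample for the chosen cell (Mbit/s)."""
        base = {"macro": 5.0, "femto_1": 8.0, "femto_2": 6.0}[cell]
        return random.gauss(base, 1.0)

    recent_error = 1.0
    for step in range(3000):
        cell = random.choice(CELLS) if random.random() < epsilon else max(q, key=q.get)
        r = capacity(cell)
        td_error = r - q[cell]
        q[cell] += alpha * td_error
        # crude adaptation in place of the FIS: shrink alpha/epsilon as estimates stabilize
        recent_error = 0.95 * recent_error + 0.05 * abs(td_error)
        alpha = min(0.5, max(0.05, recent_error / 10.0))
        epsilon = min(0.3, max(0.01, recent_error / 20.0))

    print("preferred cell:", max(q, key=q.get), {c: round(v, 2) for c, v in q.items()})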

  • Optimal Channel-Sensing Scheme for Cognitive Radio Systems Based on Fuzzy Q-Learning

    Fereidoun H. PANAHI  Tomoaki OHTSUKI  

     
    PAPER

      Vol:
    E97-B No:2
      Page(s):
    283-294

    In a cognitive radio (CR) network, the channel sensing scheme used to detect the existence of a primary user (PU) directly affects the performance of both the CR and the PU. However, in practical systems the CR is prone to sensing errors due to inefficiencies in the sensing scheme, which may cause interference to the primary user and degrade system performance. In this paper, we present a learning-based scheme for channel sensing in CR networks. Specifically, we formulate the channel sensing problem as a partially observable Markov decision process (POMDP), in which the most likely channel state is derived by a learning process called Fuzzy Q-Learning (FQL). The optimal sensing policy is then derived by solving this problem. Simulation results show the effectiveness and efficiency of our proposed scheme.
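
    The sketch below illustrates the general shape of fuzzy Q-learning: each fuzzy rule holds its own Q values, an action's overall value is the firing-strength-weighted combination over rules, and each rule's Q value is updated in proportion to its firing strength. The membership functions, actions, reward model, and belief update here are illustrative assumptions and not the paper's POMDP formulation.

    import random

    ACTIONS = ["sense", "transmit"]
    ALPHA, GAMMA = 0.1, 0.9

    def memberships(belief):
        """Two fuzzy sets over the belief in [0, 1]: 'PU likely busy' / 'PU likely idle'."""
        return {"busy": 1.0 - belief, "idle": belief}

    # one Q value per (fuzzy rule, action)
    q = {rule: {a: 0.0 for a in ACTIONS} for rule in ("busy", "idle")}

    def fuzzy_q(belief, action):
        """Q value of an action as the firing-strength-weighted average over rules."""
        mu = memberships(belief)
        return sum(mu[r] * q[r][action] for r in mu) / sum(mu.values())

    def update(belief, action, reward, next_belief):
        mu = memberships(belief)
        target = reward + GAMMA * max(fuzzy_q(next_belief, a) for a in ACTIONS)
        for r, weight in mu.items():
            q[r][action] += ALPHA * weight * (target - q[r][action])

    belief = 0.5
    for step in range(5000):
        if random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: fuzzy_q(belief, a))
        pu_idle = random.random() < 0.7                      # toy primary-user behaviour
        reward = 1.0 if (action == "transmit" and pu_idle) else (-1.0 if action == "transmit" else 0.1)
        next_belief = 0.9 * belief + 0.1 * (1.0 if pu_idle else 0.0)  # toy belief update
        update(belief, action, reward, next_belief)
        belief = next_belief

    print({r: {a: round(v, 2) for a, v in av.items()} for r, av in q.items()})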

  • Network Selection for Cognitive Radio Based on Fuzzy Learning

    Mo LI  Youyun XU  Ruiqin MIAO  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E94-B No:12
      Page(s):
    3490-3497

    Cognitive radio is a promising approach to ensuring the coexistence of heterogeneous wireless networks, since it can perceive wireless conditions and freely switch among different network modes. When many network opportunities exist, how to select the appropriate network for the CR user's current service is the main problem studied in this paper. We make full use of the intelligence of the CR user and propose a fuzzy-learning-based network selection scheme, in which the selection is made based on estimated evaluations of the available networks. Multiple factors are considered when estimating these evaluations: external environment factors directly sensed by the CR user (the signal strength of the available network and the network mode) as well as a factor that cannot be determined beforehand and is learnt by our scheme (the bandwidth allocated by the candidate network). Over repeated interactions with the wireless environment, experience of the network selection behavior is accumulated, which directs our scheme toward a proper network decision. Two simulations verify that our scheme not only better satisfies the bandwidth requirement of the CR user than three other network selection methods, but is also reasonable in its utilization of the available resources of these networks.

  • Distributed Reinforcement Learning Approach for Vehicular Ad Hoc Networks

    Celimuge WU  Kazuya KUMEKAWA  Toshihiko KATO  

     
    PAPER-Network

      Vol:
    E93-B No:6
      Page(s):
    1431-1442

    In Vehicular Ad hoc Networks (VANETs), general-purpose ad hoc routing protocols such as AODV cannot work efficiently due to the frequent changes in network topology caused by vehicle movement. This paper proposes a VANET routing protocol, QLAODV (Q-Learning AODV), which suits unicast applications in high-mobility scenarios. QLAODV is a distributed reinforcement learning routing protocol that uses a Q-learning algorithm to infer network state information and uses unicast control packets to check path availability in real time, so that Q-learning can work efficiently in a highly dynamic network environment. QLAODV also benefits from a dynamic route change mechanism, which allows it to react quickly to network topology changes. We present an analysis of the performance of QLAODV by simulation using different mobility models. The simulation results show that QLAODV can efficiently handle unicast applications in VANETs.
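
    A minimal sketch of a distributed, Q-routing-style update is shown below: each node keeps Q[destination][next hop] and refreshes it from link feedback plus the neighbor's own best estimate. The topology, reward model, and parameters are illustrative assumptions and are not taken from QLAODV.

    from collections import defaultdict
    import random

    ALPHA, GAMMA = 0.3, 0.8

    class Node:
        def __init__(self, name):
            self.name = name
            self.neighbors = []
            # q[dest][next_hop]: learned quality of forwarding toward dest via next_hop
            self.q = defaultdict(lambda: defaultdict(float))

        def choose_next_hop(self, dest, epsilon=0.1):
            if random.random() < epsilon:
                return random.choice(self.neighbors)
            return max(self.neighbors, key=lambda n: self.q[dest][n.name])

        def update(self, dest, next_hop, link_ok):
            """Reward good links; bootstrap on the neighbor's own best estimate toward dest."""
            reward = 1.0 if link_ok else -1.0
            neighbor_best = max(next_hop.q[dest].values(), default=0.0)
            old = self.q[dest][next_hop.name]
            self.q[dest][next_hop.name] = old + ALPHA * (reward + GAMMA * neighbor_best - old)

    a, b, c = Node("A"), Node("B"), Node("C")
    a.neighbors = [b, c]
    for step in range(500):
        hop = a.choose_next_hop("D")
        link_ok = random.random() < (0.9 if hop is b else 0.4)   # B is the more stable relay
        a.update("D", hop, link_ok)
    print(dict(a.q["D"]))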

  • CHQ: A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes

    Hiroshi OSADA  Satoshi FUJITA  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Vol:
    E88-D No:5
      Page(s):
    1004-1011

    In this paper, we propose a new reinforcement learning scheme called CHQ that can efficiently acquire appropriate policies under partially observable Markov decision processes (POMDPs) involving probabilistic state transitions, which frequently occur in multi-agent systems where each agent independently takes a probabilistic action based on a partial observation of the underlying environment. A key idea of CHQ is to extend the HQ-learning proposed by Wiering et al. so that it can learn the activation order of the MDP subtasks as well as an appropriate policy for each subtask. The effectiveness of the proposed scheme is evaluated experimentally. The results imply that it can acquire a deterministic policy with a sufficiently high success rate, even when the given task is a POMDP with probabilistic state transitions.
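
    The sketch below illustrates only the activation-order part of such a hierarchical scheme: a high-level Q table learns which subtask to activate at each stage from the subtasks' success signals, while the low-level per-subtask policies are left out for brevity. The subtasks, their prerequisites, and all parameters are illustrative assumptions, not CHQ itself.

    from collections import defaultdict
    import random

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
    SUBTASKS = ["find_key", "open_door", "reach_goal"]
    high_q = defaultdict(float)   # key: (stage, subtask) -> value of activating subtask there

    def run_subtask(name, done):
        """Toy low-level execution: a subtask only succeeds if its prerequisites are done."""
        prereq = {"find_key": set(), "open_door": {"find_key"}, "reach_goal": {"open_door"}}
        return 1.0 if prereq[name] <= done and random.random() < 0.9 else 0.0

    for episode in range(5000):
        done = set()
        for stage in range(len(SUBTASKS)):
            if random.random() < EPSILON:
                task = random.choice(SUBTASKS)
            else:
                task = max(SUBTASKS, key=lambda t: high_q[(stage, t)])
            r = run_subtask(task, done)
            if r:
                done.add(task)
            best_next = max((high_q[(stage + 1, t)] for t in SUBTASKS), default=0.0)
            high_q[(stage, task)] += ALPHA * (r + GAMMA * best_next - high_q[(stage, task)])

    print([max(SUBTASKS, key=lambda t: high_q[(s, t)]) for s in range(len(SUBTASKS))])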

  • Labeling Q-Learning in POMDP Environments

    Haeyeon LEE  Hiroyuki KAMAYA  Kenichi ABE  

     
    PAPER-Biocybernetics, Neurocomputing

      Vol:
    E85-D No:9
      Page(s):
    1425-1432

    This paper presents a new Reinforcement Learning (RL) method, called "Labeling Q-learning (LQ-learning)," for solving partially observable Markov Decision Process (POMDP) problems. Hierarchical RL methods have recently been widely studied, but they have the drawback that learning time and memory are consumed merely to maintain the hierarchical structure, even when it is not necessary. In contrast, LQ-learning has no hierarchical structure and instead adopts a new type of internal memory mechanism: the agent perceives the current state as a pair of an observation and its label, which allows it to distinguish more precisely between states that look the same but are in fact different. That is, at each step t we define a new type of perception of the environment, õ_t = (o_t, θ_t), where o_t is the conventional observation and θ_t is the label attached to o_t. The classical RL algorithm is then used as if the pair (o_t, θ_t) were a Markov state. The labeling is carried out by a Boolean variable, called "CHANGE," and a hash-like or mod function, called the Labeling Function (LF). To demonstrate the efficiency of LQ-learning, we apply it to "maze problems" in Grid-Worlds, which are used in much of the literature as simulated POMDP environments. Using LQ-learning, we can solve these maze problems without initial knowledge of the environment.
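
    The sketch below illustrates the labeling idea in isolation: the Q table is keyed by (observation, label) rather than by the observation alone, and a mod-style labeling function advances the label whenever a CHANGE condition fires (here, naively, when the same observation repeats). The toy observations, the CHANGE rule, and the parameters are illustrative assumptions, not the paper's definitions.

    from collections import defaultdict
    import random

    ACTIONS = ["N", "S", "E", "W"]
    NUM_LABELS = 4
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
    q = defaultdict(float)                     # key: ((observation, label), action)

    def next_label(label, change):
        """Labeling Function: advance the label modulo NUM_LABELS when CHANGE is true."""
        return (label + 1) % NUM_LABELS if change else label

    def choose(state):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(state, a)])

    def learn(state, action, reward, next_state):
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

    # toy episode: aliased observations stand in for a real maze
    label, prev_obs = 0, None
    for step in range(1000):
        obs = random.choice(["corridor", "junction", "dead_end"])
        label = next_label(label, change=(obs == prev_obs))   # CHANGE: same observation again
        state = (obs, label)
        action = choose(state)
        next_obs = random.choice(["corridor", "junction", "dead_end"])
        reward = 1.0 if next_obs == "junction" else 0.0       # arbitrary toy reward
        next_state = (next_obs, next_label(label, change=(next_obs == obs)))
        learn(state, action, reward, next_state)
        prev_obs = obs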

  • Agent-Oriented Routing in Telecommunications Networks

    Karla VITTORI  Aluizio F. R. ARAUJO  

     
    PAPER-Software Platform

      Vol:
    E84-B No:11
      Page(s):
    3006-3013

    This paper presents an intelligent routing algorithm, called Q-Agents, which bases its actions only on agent-environment interaction. The algorithm combines properties of three learning strategies (Q-learning, dual reinforcement learning, and learning based on ant colony behavior) and adds two further mechanisms to improve its adaptability. The proposed algorithm is thus composed of a set of agents that move through the network independently and concurrently, searching for the best routes. The agents share knowledge about the quality of the paths traversed through indirect communication. Information about the network and traffic status is updated using the Q-learning and dual reinforcement updating rules. Q-Agents were applied to a model of an AT&T circuit-switched network. Experiments examined the performance of the algorithm under variations in traffic patterns, load level, and topology, and with noise added to the information used to route calls. Q-Agents lost fewer calls than two algorithms based entirely on ant colony behavior.
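
    A tiny sketch of the dual-reinforcement idea mentioned above is given below: when an agent hops from node x to node y, the same delay sample updates both the forward estimate at x (toward the destination) and the backward estimate at y (toward the source). The topology, delay model, and parameters are illustrative assumptions, not the paper's network model.

    from collections import defaultdict
    import random

    ALPHA, GAMMA = 0.2, 0.9
    # q[node][(dest, next_hop)] -> estimated quality of forwarding toward dest via next_hop
    q = defaultdict(lambda: defaultdict(float))
    LINKS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

    def best(node, dest):
        return max((q[node][(dest, n)] for n in LINKS[node]), default=0.0)

    def hop(x, y, src, dest, delay):
        reward = -delay
        # forward update at x: quality of reaching dest via y
        q[x][(dest, y)] += ALPHA * (reward + GAMMA * best(y, dest) - q[x][(dest, y)])
        # dual (backward) update at y: quality of reaching src via x, from the same sample
        q[y][(src, x)] += ALPHA * (reward + GAMMA * best(x, src) - q[y][(src, x)])

    # usage: an agent wandering from A toward D, updating both directions on every hop
    node, src, dest = "A", "A", "D"
    for step in range(1000):
        nxt = random.choice(LINKS[node])
        hop(node, nxt, src, dest, delay=random.uniform(0.1, 1.0))
        node = nxt if nxt != dest else "A"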

  • Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment

    Gang ZHAO  Shoji TATSUMI  Ruoying SUN  

     
    PAPER-Algorithms and Data Structures

      Vol:
    E83-A No:9
      Page(s):
    1786-1795

    Reinforcement Learning (RL) is an efficient method for solving Markov Decision Processes (MDPs) without a priori knowledge of the environment, and RL methods can be classified as exploitation-oriented or exploration-oriented. Q-learning is a representative RL method and is classified as exploration-oriented. It is guaranteed to obtain an optimal policy; however, it needs numerous trials to learn one because it has no action-selecting mechanism. To accelerate the learning rate of Q-learning and realize both exploitation and exploration during the learning process, the Q-ee learning system has been proposed, which uses a pre-action-selector, an action-selector, and back-propagation of Q values to improve the performance of Q-learning. However, Q-ee learning is only suitable for deterministic MDPs, and its guarantee of convergence to an optimal policy has not been proven. In this paper, after discussing different exploration methods, we replace the pre-action-selector of Q-ee learning with a method that implements active exploration of the environment, Active Exploration Planning (AEP); we call the resulting system Q-ae learning. With this replacement, Q-ae learning not only retains the advantages of Q-ee learning but is also adapted to stochastic environments. Moreover, for deterministic MDPs, this paper presents the convergence condition, with proof, under which an agent obtains the optimal policy by Q-ae learning. Further, discussions and experiments show that, by adjusting the relation between the learning factor and the discount rate, the exploration process can be controlled in a stochastic environment. Experimental results on the exploration rate and the correctness rate of the learned policies also illustrate the efficiency of Q-ae learning in a stochastic environment.
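
    For contrast with plain epsilon-greedy Q-learning, the sketch below adds a simple count-based exploration bonus when selecting actions; this is only a generic stand-in for an explicit exploration mechanism and is not the paper's AEP or the Q-ee pre-action-selector. The chain environment and all parameters are illustrative assumptions.

    from collections import defaultdict
    import random

    N_STATES, ACTIONS = 6, ["left", "right"]
    ALPHA, GAMMA = 0.2, 0.95

    def step(s, a):
        """Toy chain: 'right' moves toward the goal state, which pays reward 1."""
        s2 = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
        return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

    def train(use_bonus, episodes=300):
        q = defaultdict(float)
        visits = defaultdict(int)
        for _ in range(episodes):
            s = 0
            for _ in range(3 * N_STATES):
                if not use_bonus and random.random() < 0.1:
                    a = random.choice(ACTIONS)                 # plain epsilon-greedy exploration
                else:
                    a = max(ACTIONS, key=lambda x: q[(s, x)]
                            + (1.0 / (1 + visits[(s, x)]) if use_bonus else 0.0))
                s2, r = step(s, a)
                visits[(s, a)] += 1
                best_next = max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
                s = s2
                if r:
                    break
        return q

    q = train(use_bonus=True)
    print("greedy policy:", [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)])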