Yong TIAN Peng WANG Xinyue HOU Junpeng YU Xiaoyan PENG Hongshu LIAO Lin GAO
The electromagnetic environment is increasingly complex and changeable, and radar must meet the execution requirements of a variety of tasks. Modern radars should therefore raise their level of intelligence and be able to learn autonomously in dynamic countermeasure scenarios, shifting the countermeasure strategy from a traditional fixed anti-interference strategy to one that is implemented dynamically and autonomously. Aiming at optimizing target-tracking performance in scenarios where multiple signals coexist, we propose a cognitive radar countermeasure method based on a deep Q-learning network. In this paper, we analyze the tracking performance of this method and the underlying Markov decision process under triangular frequency-sweeping interference. Simulation results show that reinforcement learning provides substantial autonomy and adaptability for solving such problems.
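As a rough illustration of the reinforcement-learning loop described above, the sketch below uses tabular Q-learning to select an anti-interference action for each observed interference condition. It is a simplified stand-in for the paper's deep Q-network; the jammer states, radar actions, reward model, and environment dynamics are all hypothetical assumptions introduced for the example.

```python
# Minimal tabular Q-learning sketch for anti-interference action selection.
# Simplified stand-in for a deep Q-network: states label the observed
# interference condition, actions label candidate radar responses.
import random

STATES = ["sweep_low", "sweep_mid", "sweep_high"]      # hypothetical jammer states
ACTIONS = ["hop_freq", "widen_band", "keep_waveform"]   # hypothetical radar actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action):
    """Toy environment: reward reflects tracking quality under the chosen action."""
    good = {"sweep_low": "keep_waveform", "sweep_mid": "hop_freq", "sweep_high": "widen_band"}
    reward = 1.0 if action == good[state] else -0.2
    next_state = random.choice(STATES)               # jammer moves to a new sweep band
    return reward, next_state

state = random.choice(STATES)
for _ in range(5000):
    if random.random() < EPSILON:                    # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward, nxt = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = nxt

for s in STATES:
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```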
Cheng ZHANG Bo GU Zhi LIU Kyoko YAMORI Yoshiaki TANAKA
With the rapid increase in demand for mobile data, mobile network operators are trying to expand wireless network capacity by deploying wireless local area network (LAN) hotspots onto which they can offload mobile traffic. However, such network-centric methods usually do not serve the interests of mobile users (MUs); MUs should instead be able to decide for themselves whether to offload their traffic to a complementary wireless LAN. Our previous work studied single-flow wireless LAN offloading from an MU's perspective, considering the delay tolerance of traffic, monetary cost, and energy consumption. In this paper, we study the multi-flow mobile data offloading problem from an MU's perspective, in which an MU downloads data for multiple applications simultaneously from remote servers and different applications' data have different deadlines. We formulate the wireless LAN offloading problem as a finite-horizon discrete-time Markov decision process (MDP) and derive an optimal policy with a dynamic-programming-based algorithm. Since the time complexity of the dynamic-programming-based offloading algorithm is still high, we also propose a low-complexity heuristic offloading algorithm that trades a small amount of performance for speed. Extensive simulations are conducted to validate the proposed offloading algorithms.
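The dynamic-programming solution of a finite-horizon MDP can be sketched by backward induction, as below. The toy handles a single flow with unit-sized transfers and fixed costs, an assumption made for brevity; the paper's model covers multiple flows with per-application deadlines, and all parameter values here are illustrative.

```python
# Backward-induction sketch of a finite-horizon offloading MDP (one flow,
# unit-sized transfers, fixed costs -- simplifying assumptions).
T = 10                 # slots until the deadline
P_WLAN = 0.6           # probability a WLAN hotspot is available in a slot
COST_WLAN, COST_CELL, PENALTY = 1.0, 4.0, 30.0   # hypothetical costs
MAX_DATA = 5

# V[t][r] = minimum expected cost with r data units left at slot t
V = [[0.0] * (MAX_DATA + 1) for _ in range(T + 1)]
V[T] = [PENALTY * r for r in range(MAX_DATA + 1)]   # penalty per unit missing the deadline

for t in range(T - 1, -1, -1):
    for r in range(MAX_DATA + 1):
        if r == 0:
            V[t][r] = 0.0
            continue
        wait = V[t + 1][r]                            # idle and hope for WLAN later
        cell = COST_CELL + V[t + 1][r - 1]            # pay for a cellular transmission
        # WLAN is usable only when available; otherwise pick the best of wait/cellular
        if_avail = min(COST_WLAN + V[t + 1][r - 1], wait, cell)
        if_unavail = min(wait, cell)
        V[t][r] = P_WLAN * if_avail + (1 - P_WLAN) * if_unavail

print("Expected cost with 5 units and 10 slots:", round(V[0][5], 2))
```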
Kentaro DOMOTO Takehito UTSURO Naoki SAWADA Hiromitsu NISHIZAKI
This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice, together with a support vector machine (SVM)-based classifier that verifies the terms detected in the engine's output. In a front-end process, the STD engine pre-indexes the target spoken documents using a keyword list built from an automatic speech recognition result. The STD result comprises a set of keywords and their detection intervals (positions) in the spoken documents. Keywords with competing intervals are ranked by STD matching cost, and the detection with the longest duration among the competitors is selected. The selected keywords are registered in the pre-index and then used to train an SVM-based classifier. In the query term search process, a query term is searched by the same STD engine, and the output candidates are verified by the SVM-based classifier. Our proposed two-stage STD method with pre-indexing was evaluated on the NTCIR-10 SpokenDoc-2 STD task and drastically outperformed the traditional STD method based on dynamic time warping and a confusion-network-based index.
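The verification stage can be sketched with an SVM classifier that accepts or rejects candidates returned for a query term. The two features used here (matching cost and detection duration) and the tiny training set are assumptions for illustration only; the paper's actual feature set is not reproduced.

```python
# Sketch of SVM-based verification of STD detections (features and data are
# illustrative assumptions).
from sklearn.svm import SVC

# (matching_cost, duration_sec) of pre-indexed detections, labeled correct (1) / incorrect (0)
train_X = [[0.12, 0.55], [0.20, 0.48], [0.75, 0.20], [0.80, 0.15], [0.15, 0.60], [0.70, 0.25]]
train_y = [1, 1, 0, 0, 1, 0]

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(train_X, train_y)

# Candidates returned by the second STD pass for a query term
candidates = [[0.18, 0.50], [0.78, 0.18]]
for feat, ok in zip(candidates, clf.predict(candidates)):
    print(feat, "accepted" if ok == 1 else "rejected")
```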
Fereidoun H. PANAHI Tomoaki OHTSUKI
In a cognitive radio (CR) network, the channel sensing scheme used to detect the existence of a primary user (PU) directly affects the performance of both the CR and the PU. In practical systems, however, the CR is prone to sensing errors due to the inefficiency of the sensing scheme, which can cause interference to the primary user and degrade system performance. In this paper, we present a learning-based scheme for channel sensing in CR networks. Specifically, we formulate the channel sensing problem as a partially observable Markov decision process (POMDP) in which the most likely channel state is derived by a learning process called Fuzzy Q-Learning (FQL), and the optimal sensing policy is obtained by solving this problem. Simulation results show the effectiveness and efficiency of our proposed scheme.
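The belief-tracking part of the POMDP formulation can be sketched as a Bayesian update of the probability that the primary user is idle, given noisy sensing results. The fuzzy Q-learning policy itself is not reproduced here; the transition and detection probabilities below are illustrative assumptions.

```python
# Sketch of POMDP belief tracking for channel sensing (parameters are assumptions).
P_STAY_IDLE, P_STAY_BUSY = 0.9, 0.8     # PU channel transition probabilities
P_DETECT, P_FALSE_ALARM = 0.9, 0.1      # sensing accuracy

def predict(belief_idle):
    """Propagate the idle belief one slot forward through the PU Markov chain."""
    return belief_idle * P_STAY_IDLE + (1 - belief_idle) * (1 - P_STAY_BUSY)

def update(belief_idle, sensed_busy):
    """Bayes update of the idle belief given one sensing observation."""
    if sensed_busy:
        num = belief_idle * P_FALSE_ALARM
        den = num + (1 - belief_idle) * P_DETECT
    else:
        num = belief_idle * (1 - P_FALSE_ALARM)
        den = num + (1 - belief_idle) * (1 - P_DETECT)
    return num / den

belief = 0.5
for sensed_busy in [False, False, True, False]:
    belief = update(predict(belief), sensed_busy)
    print(f"P(idle) = {belief:.3f} -> {'transmit' if belief > 0.8 else 'stay silent'}")
```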
Fengfei ZHAO Zheng QIN Zhuo SHAO
Traditional reinforcement learning (RL) methods can solve Markov decision processes (MDPs) online, but they cannot effectively use a priori knowledge to guide the learning process: exploration of the optimal policy is time-consuming and ignores problem-specific information. To address this, this paper proposes heuristic function negotiation (HFN) as an online learning framework. HFN extends MDPs with heuristic functions, replacing the state-action two-layer structure of traditional RL with a three-layer structure in which multiple heuristic functions can be defined to suit the problem at hand. The framework lets these functions negotiate, using different algorithms, to determine the appropriate action, and adjusts the influence of each function according to the rewards received. By encoding domain knowledge in heuristic functions, HFN speeds up the solution of MDPs; user preferences can also be reflected in the learning process, which improves the flexibility of RL. Experiments show that, with reasonably chosen heuristic functions, the HFN framework learns more efficiently than traditional RL. We also apply HFN to an air-combat simulation of unmanned aerial vehicles (UAVs), where different function settings lead to different combat behaviors.
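A loose sketch of the idea follows: action selection combines learned Q-values with several weighted heuristic functions, and the influence of each function is adjusted from the reward signal. The corridor task, the two heuristics, and the weight-update rule are illustrative assumptions rather than the paper's exact negotiation algorithm.

```python
# Loose sketch of Q-learning guided by weighted heuristic functions
# (task, heuristics, and update rule are illustrative assumptions).
import random

ACTIONS = ["left", "right"]
GOAL = 9                                           # rightmost cell of a 1-D corridor

def h_go_right(state, action):                     # domain knowledge: the goal lies to the right
    return 1.0 if action == "right" else 0.0

def h_avoid_edge(state, action):                   # user preference: stay away from cell 0
    return -1.0 if (state == 1 and action == "left") else 0.0

heuristics = [h_go_right, h_avoid_edge]
weights = [1.0, 1.0]
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
ALPHA, GAMMA = 0.2, 0.95

for _ in range(300):
    state = 0
    while state != GOAL:
        def score(a):
            return Q[(state, a)] + sum(w * h(state, a) for w, h in zip(weights, heuristics))
        action = max(ACTIONS, key=score) if random.random() > 0.1 else random.choice(ACTIONS)
        nxt = min(GOAL, state + 1) if action == "right" else max(0, state - 1)
        reward = 10.0 if nxt == GOAL else -1.0
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        td = reward + GAMMA * best_next - Q[(state, action)]
        Q[(state, action)] += ALPHA * td
        # adjust each heuristic's influence according to the observed reward signal
        for i, h in enumerate(heuristics):
            weights[i] = max(0.0, weights[i] + 0.05 * td * h(state, action))
        state = nxt

print("learned heuristic weights:", [round(w, 2) for w in weights])
```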
Fang WANG Yong LI Zhaocheng WANG Zhixing YANG
There has been an explosion in wireless devices and mobile data traffic, and the cellular network alone is unable to support such fast-growing demand for data transmission. It is therefore reasonable to add another network to the cellular network to augment its capacity. In fact, the cellular network's dilemma is largely caused by the same content being transmitted repeatedly, because many people are interested in the same content. A broadcast network can mitigate this problem and save wireless bandwidth by delivering popular content to multiple clients simultaneously. This paper presents a content dissemination system that combines broadcast and cellular networks. Using a Markov decision process (MDP) model, we propose an online optimal scheme that maximizes the expected number of clients receiving the content they are interested in, taking clients' interests and the queue lengths at the broadcast and cellular base stations into full consideration. Simulations demonstrate that the proposed scheme effectively decreases the item drop rate at base stations and increases the average number of clients who receive their content of interest.
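The trade-off the MDP scheme optimizes can be illustrated with a greedy baseline: in each slot, broadcast the item wanted by the most clients if that serves more clients than a single cellular transmission would. The request data and the threshold below are assumptions; the paper's scheme additionally accounts for queue lengths and future arrivals.

```python
# Greedy illustration of the broadcast/cellular trade-off (data are assumptions).
from collections import Counter

# pending requests: client -> requested item
requests = {"c1": "newsA", "c2": "newsA", "c3": "newsA", "c4": "videoB", "c5": "clipC"}

popularity = Counter(requests.values())
item, fans = popularity.most_common(1)[0]

if fans >= 2:                         # one broadcast slot serves `fans` clients at once
    served = [c for c, it in requests.items() if it == item]
    print(f"broadcast {item}, serving {served}")
else:                                 # otherwise a cellular slot serves a single client
    client, it = next(iter(requests.items()))
    print(f"unicast {it} to {client}")
```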
Jaak SIMM Masashi SUGIYAMA Hirotaka HACHIYA
Reinforcement learning (RL) is a flexible framework for learning a decision rule in an unknown environment. However, a large number of samples are often required for finding a useful decision rule. To mitigate this problem, the concept of transfer learning has been employed to utilize knowledge obtained from similar RL tasks. However, most approaches developed so far are useful only in low-dimensional settings. In this paper, we propose a novel transfer learning idea that targets problems with high-dimensional states. Our idea is to transfer knowledge between state factors (e.g., interacting objects) within a single RL task. This allows the agent to learn the system dynamics of the target RL task with fewer data samples. The effectiveness of the proposed method is demonstrated through experiments.
In this letter, we propose a partially observable Markov decision process (POMDP) based distributed adaptive opportunistic spectrum access (DA-OSA) strategy for cognitive ad hoc networks (CAHNs). In each slot, the source and destination choose a set of channels to sense and then decide the transmission channels based on the sensing results. To maximize the throughput of each link, we use the theories of sequential decision and optimal stopping to determine the optimal sensing channel set. Moreover, we establish the myopic policy and exploit the monotonicity of the reward function we use, which reduces the complexity of the sequential decision.
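The optimal-stopping view can be sketched with a backward recursion over the remaining sensing opportunities: sensing stops as soon as the observed rate exceeds the expected value of continuing. The two-point rate distribution and the sensing budget below are illustrative assumptions, not the letter's exact model.

```python
# Optimal-stopping sketch for sequential channel sensing (model is an assumption).
import random

RATES = [1.0, 3.0]          # achievable rate of a channel (bad / good)
P_GOOD = 0.4                # probability a sensed channel is good
MAX_SENSE = 4               # channels that can be sensed within one slot

# Backward recursion: V[k] = expected reward when k sensing opportunities remain
V = [0.0] * (MAX_SENSE + 1)
for k in range(1, MAX_SENSE + 1):
    V[k] = sum(p * max(rate, V[k - 1])                      # stop if the rate beats continuing
               for rate, p in [(RATES[1], P_GOOD), (RATES[0], 1 - P_GOOD)])

print("value of continuing with k channels left:", [round(v, 2) for v in V])

# Simulate one slot with the resulting stopping rule
for k in range(MAX_SENSE, 0, -1):
    rate = RATES[1] if random.random() < P_GOOD else RATES[0]
    if rate >= V[k - 1]:
        print(f"stop after sensing {MAX_SENSE - k + 1} channel(s), transmit at rate {rate}")
        break
```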
Ngo Anh VIEN SeungGwan LEE TaeChoong CHUNG
In previous work, we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward and proposed a simulation-based algorithm, called GSMDP, that estimates this approximate gradient using only a single sample path of the underlying Markov chain; GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the algorithm's output and its asymptotic output, arises because the algorithm sees only a finite data sequence.
Jeong Geun KIM Ca Van PHAN Wonha KIM
We analyze the performance of an opportunistic transmission strategy for wireless sensor networks (WSNs). We consider a transmission strategy called binary decision-based transmission (BDT), a common form of opportunistic transmission. The BDT scheme initiates transmission only when the channel quality exceeds an optimal threshold, so as to avoid unsuccessful transmissions that waste energy. We formulate a Markov decision process (MDP) to identify the optimal threshold for transmission decisions in the BDT scheme.
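A simple way to illustrate the threshold choice is to sweep candidate thresholds and pick the one that maximizes successful transmissions per unit of energy, as below. The Rayleigh-fading gain model, energy figures, and success rule are simplifying assumptions and do not reproduce the paper's MDP solution.

```python
# Sketch of picking a BDT transmission threshold by energy efficiency
# (channel model and parameters are illustrative assumptions).
import random

random.seed(0)
SAMPLES = [random.expovariate(1.0) for _ in range(20000)]   # Rayleigh power gains
TX_ENERGY, SENSE_ENERGY, SUCCESS_GAIN = 1.0, 0.05, 1.2      # hypothetical parameters

def efficiency(threshold):
    energy = success = 0.0
    for g in SAMPLES:
        energy += SENSE_ENERGY                 # the channel is probed every slot
        if g >= threshold:                     # binary decision: transmit or defer
            energy += TX_ENERGY
            success += 1.0 if g >= SUCCESS_GAIN else 0.0
    return success / energy

best = max((t / 10 for t in range(0, 40)), key=efficiency)
print(f"best threshold ~ {best:.1f}, efficiency {efficiency(best):.3f} successes per energy unit")
```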
Ngo Anh VIEN Nguyen Hoang VIET SeungGwan LEE TaeChoong CHUNG
In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls of different values and with different resource requirements. The problem of maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov decision process (SMDP), but is too complex to allow an exact solution. In this paper, a policy gradient algorithm, together with a decomposition approach, is therefore proposed to find the dynamic (state-dependent) optimal CAC and routing policy within a parameterized policy space. To implement the gradient algorithm, we approximate the gradient of the average reward and present a simulation-based algorithm (called GSMDP) that estimates this approximate gradient using only a single sample path of the underlying Markov chain of the CAC and routing SMDP. The algorithm improves performance in terms of convergence speed, rejection probability, robustness to changing arrival statistics, and overall received average revenue. Experimental simulations compare our method's performance with that of existing methods and demonstrate its robustness.
Youngjoo HAN Hyewon SONG Byungsang KIM Chan-Hyun YOUN
Due to the dynamic nature and uncertainty of grid computing, system reliability can become very unpredictable. Thus, a well-defined scheduling mechanism that provides high system availability for grid applications is required. In this letter, we propose an SLA-constrained policy-based scheduling mechanism to enhance system performance in grid environments. We implement the proposed model and show experimentally that our policy-based scheduling mechanism can guarantee high system availability as well as support load balancing.
Minoru OHMIKAWA Hideaki TAKAGI Sang-Yong KIM
We propose a new call admission control (CAC) scheme for voice calls in cellular mobile communication networks. It is assumed that rejecting a hand-off call is less desirable than rejecting a new call, because losing a call in progress is far more distressing to a user. We therefore treat the rejection of new calls and hand-off calls as incurring different costs. The key idea of our CAC is to restrict the admission of new calls so as to minimize the total expected cost per unit time over the long term. An optimal policy is derived from a semi-Markov decision process in which the intervals between successive decision epochs are exponentially distributed. Based on this optimal policy, we calculate the steady-state probability of the number of established voice connections in a cell, and then evaluate the blocking probability of new calls and the forced-termination probability of hand-off calls. Numerical experiments show that our CAC scheme significantly reduces the forced-termination probability of hand-off calls at a slight expense in new-call blocking probability and channel utilization. A comparison with the static guard channel scheme is also made.
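Because the resulting policy is of threshold type, its evaluation can be sketched with a birth-death chain: new calls are admitted only below a threshold, hand-offs up to full capacity, and the steady-state distribution yields the new-call blocking and hand-off forced-termination probabilities. The capacity, threshold, and traffic rates below are assumptions for illustration.

```python
# Sketch of evaluating a threshold-type admission policy via a birth-death chain
# (capacity, threshold, and rates are illustrative assumptions).
C = 10                     # channels in a cell
THRESHOLD = 8              # new calls admitted only while fewer than 8 calls are active
LAMBDA_NEW, LAMBDA_HO, MU = 3.0, 1.0, 0.5   # new-call rate, hand-off rate, departure rate

# unnormalized steady-state probabilities pi[n] of n active calls
pi = [1.0]
for n in range(C):
    arrival = (LAMBDA_NEW + LAMBDA_HO) if n < THRESHOLD else LAMBDA_HO
    pi.append(pi[-1] * arrival / ((n + 1) * MU))
total = sum(pi)
pi = [p / total for p in pi]

p_block_new = sum(pi[n] for n in range(THRESHOLD, C + 1))   # new call rejected
p_drop_ho = pi[C]                                           # hand-off call rejected
print(f"new-call blocking {p_block_new:.4f}, hand-off forced termination {p_drop_ho:.4f}")
```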
Kazunori IWATA Kazushi IKEDA Hideaki SAKAI
We regard the events of a Markov decision process as the outputs of a Markov information source, so that the randomness of an empirical sequence can be analyzed through the codeword length of the sequence. Randomness is an important viewpoint in reinforcement learning, since learning amounts to eliminating randomness and finding an optimal policy; the occurrence of an optimal empirical sequence also depends on this randomness. We then introduce Lempel-Ziv coding to measure the randomness, which is characterized by the domain size and the stochastic complexity. Experimental results confirm that learning and the occurrence of an optimal empirical sequence depend on the randomness, and show that in early stages the randomness is mainly characterized by the domain size, while as the number of time steps increases it depends increasingly on the complexity of the Markov decision process.
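The randomness measurement can be sketched by Lempel-Ziv parsing of an empirical state-action sequence: the fewer distinct phrases the parse produces, the shorter the codeword length and the less random the sequence. The two toy sequences below are assumptions for illustration.

```python
# Lempel-Ziv phrase counting as a randomness measure for state-action sequences
# (toy sequences are assumptions).
import random

def lz78_phrase_count(sequence):
    """Count LZ78 phrases; the codeword length grows with this count."""
    dictionary, phrase, count = set(), (), 0
    for symbol in sequence:
        phrase = phrase + (symbol,)
        if phrase not in dictionary:
            dictionary.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)

regular = ["s0a1", "s1a0"] * 50                      # a converged, repetitive policy
random.seed(1)
explore = [random.choice(["s0a0", "s0a1", "s1a0", "s1a1"]) for _ in range(100)]

print("phrases (regular):", lz78_phrase_count(regular))
print("phrases (exploratory):", lz78_phrase_count(explore))
```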
Ren-Hung HWANG Huang-Leng CHANG
In the circuit-switching literature, the Least Loaded Path Routing (LLR) concept has been shown to be very simple and efficient. However, there appears to be no unique definition of the "least busy" path, i.e., of how to measure how "busy" a path is. In this paper, we examine six ways of defining the least busy path as well as a random policy. The performance of these policies is evaluated via both simulation and analysis. Our numerical results show that all policies, including the random policy, have almost the same performance under most network configurations. Only under extremely low traffic load does the difference between the policies become significant, and even then the magnitude of the difference is very small (about 0.001). We therefore conclude that the way the alternate path is selected does not significantly affect the performance of LLR-based routing algorithms when the call blocking probability is not too small. Instead, we find that the trunk reservation level affects the performance of LLR-based routing algorithms significantly.
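One of the "least busy" definitions examined can be sketched as follows: when the direct trunk is full, choose the two-hop alternate path whose bottleneck link has the most free circuits, subject to a trunk reservation level. The topology, occupancies, and reservation level below are illustrative assumptions.

```python
# Sketch of least-loaded alternate path selection with trunk reservation
# (topology and numbers are illustrative assumptions).
capacity = {("A", "B"): 10, ("A", "C"): 10, ("C", "B"): 10, ("A", "D"): 10, ("D", "B"): 10}
busy = {("A", "B"): 10, ("A", "C"): 4, ("C", "B"): 7, ("A", "D"): 6, ("D", "B"): 5}
RESERVATION = 2                     # circuits kept free for direct traffic on each link

def free(link):
    return capacity[link] - busy[link]

def route(src, dst, alternates):
    if free((src, dst)) > 0:                      # try the direct path first
        return [(src, dst)]
    best, best_free = None, 0
    for via in alternates:
        path = [(src, via), (via, dst)]
        bottleneck = min(free(l) for l in path)   # "least busy" = largest bottleneck residual
        if bottleneck > RESERVATION and bottleneck > best_free:
            best, best_free = path, bottleneck
    return best                                    # None means the call is blocked

print(route("A", "B", alternates=["C", "D"]))      # picks whichever two-hop path is freer
```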