Yong TIAN Peng WANG Xinyue HOU Junpeng YU Xiaoyan PENG Hongshu LIAO Lin GAO
The electromagnetic environment is increasingly complex and changeable, and radar must meet the execution requirements of a variety of tasks. Modern radars should therefore raise their level of intelligence and be able to learn autonomously in dynamic countermeasure scenarios, shifting the countermeasure strategy from a traditional fixed anti-interference strategy to one that is implemented dynamically and autonomously. Aiming at optimizing target-tracking performance in scenarios where multiple signals coexist, we propose a cognitive radar countermeasure method based on a deep Q-learning network. In this paper, we analyze the tracking performance of this method and the underlying Markov decision process under triangular frequency-sweeping interference. Simulation results show that reinforcement learning provides substantial autonomy and adaptability for solving such problems.
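As a rough illustration of the reinforcement-learning loop described above, the sketch below uses tabular Q-learning to select an anti-interference action for each observed interference condition. It is a simplified stand-in for the paper's deep Q-network; the jammer states, radar actions, reward model, and environment dynamics are all hypothetical assumptions introduced for the example.

```python
# Minimal tabular Q-learning sketch for anti-interference action selection.
# Simplified stand-in for a deep Q-network: states label the observed
# interference condition, actions label candidate radar responses.
import random

STATES = ["sweep_low", "sweep_mid", "sweep_high"]      # hypothetical jammer states
ACTIONS = ["hop_freq", "widen_band", "keep_waveform"]   # hypothetical radar actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action):
    """Toy environment: reward reflects tracking quality under the chosen action."""
    good = {"sweep_low": "keep_waveform", "sweep_mid": "hop_freq", "sweep_high": "widen_band"}
    reward = 1.0 if action == good[state] else -0.2
    next_state = random.choice(STATES)               # jammer moves to a new sweep band
    return reward, next_state

state = random.choice(STATES)
for _ in range(5000):
    if random.random() < EPSILON:                    # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward, nxt = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = nxt

for s in STATES:
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```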
Cheng ZHANG Bo GU Zhi LIU Kyoko YAMORI Yoshiaki TANAKA
With the rapid increase in demand for mobile data, mobile network operators are trying to expand wireless network capacity by deploying wireless local area network (LAN) hotspots onto which they can offload mobile traffic. However, such network-centric methods usually do not serve the interests of mobile users (MUs); MUs should instead be able to decide for themselves whether to offload their traffic to a complementary wireless LAN. Our previous work studied single-flow wireless LAN offloading from an MU's perspective, considering the delay tolerance of traffic, monetary cost, and energy consumption. In this paper, we study the multi-flow mobile data offloading problem from an MU's perspective, in which an MU downloads data for multiple applications simultaneously from remote servers and different applications' data have different deadlines. We formulate the wireless LAN offloading problem as a finite-horizon discrete-time Markov decision process (MDP) and derive an optimal policy with a dynamic-programming-based algorithm. Since the time complexity of the dynamic-programming-based offloading algorithm is still high, we also propose a low-complexity heuristic offloading algorithm that trades a small amount of performance for speed. Extensive simulations are conducted to validate the proposed offloading algorithms.
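The dynamic-programming solution of a finite-horizon MDP can be sketched by backward induction, as below. The toy handles a single flow with unit-sized transfers and fixed costs, an assumption made for brevity; the paper's model covers multiple flows with per-application deadlines, and all parameter values here are illustrative.

```python
# Backward-induction sketch of a finite-horizon offloading MDP (one flow,
# unit-sized transfers, fixed costs -- simplifying assumptions).
T = 10                 # slots until the deadline
P_WLAN = 0.6           # probability a WLAN hotspot is available in a slot
COST_WLAN, COST_CELL, PENALTY = 1.0, 4.0, 30.0   # hypothetical costs
MAX_DATA = 5

# V[t][r] = minimum expected cost with r data units left at slot t
V = [[0.0] * (MAX_DATA + 1) for _ in range(T + 1)]
V[T] = [PENALTY * r for r in range(MAX_DATA + 1)]   # penalty per unit missing the deadline

for t in range(T - 1, -1, -1):
    for r in range(MAX_DATA + 1):
        if r == 0:
            V[t][r] = 0.0
            continue
        wait = V[t + 1][r]                            # idle and hope for WLAN later
        cell = COST_CELL + V[t + 1][r - 1]            # pay for a cellular transmission
        # WLAN is usable only when available; otherwise pick the best of wait/cellular
        if_avail = min(COST_WLAN + V[t + 1][r - 1], wait, cell)
        if_unavail = min(wait, cell)
        V[t][r] = P_WLAN * if_avail + (1 - P_WLAN) * if_unavail

print("Expected cost with 5 units and 10 slots:", round(V[0][5], 2))
```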
Kentaro DOMOTO Takehito UTSURO Naoki SAWADA Hiromitsu NISHIZAKI
This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice, together with a support vector machine (SVM)-based classifier that verifies the terms detected in the engine's output. In a front-end process, the STD engine pre-indexes the target spoken documents using a keyword list built from an automatic speech recognition result. The STD result comprises a set of keywords and their detection intervals (positions) in the spoken documents. Keywords with competing intervals are ranked by STD matching cost, and the detection with the longest duration among the competitors is selected. The selected keywords are registered in the pre-index and then used to train an SVM-based classifier. In the query term search process, a query term is searched by the same STD engine, and the output candidates are verified by the SVM-based classifier. Our proposed two-stage STD method with pre-indexing was evaluated on the NTCIR-10 SpokenDoc-2 STD task and drastically outperformed the traditional STD method based on dynamic time warping and a confusion-network-based index.
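The verification stage can be sketched with an SVM classifier that accepts or rejects candidates returned for a query term. The two features used here (matching cost and detection duration) and the tiny training set are assumptions for illustration only; the paper's actual feature set is not reproduced.

```python
# Sketch of SVM-based verification of STD detections (features and data are
# illustrative assumptions).
from sklearn.svm import SVC

# (matching_cost, duration_sec) of pre-indexed detections, labeled correct (1) / incorrect (0)
train_X = [[0.12, 0.55], [0.20, 0.48], [0.75, 0.20], [0.80, 0.15], [0.15, 0.60], [0.70, 0.25]]
train_y = [1, 1, 0, 0, 1, 0]

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(train_X, train_y)

# Candidates returned by the second STD pass for a query term
candidates = [[0.18, 0.50], [0.78, 0.18]]
for feat, ok in zip(candidates, clf.predict(candidates)):
    print(feat, "accepted" if ok == 1 else "rejected")
```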
Fereidoun H. PANAHI Tomoaki OHTSUKI
In a cognitive radio (CR) network, the channel sensing scheme used to detect the existence of a primary user (PU) directly affects the performance of both the CR and the PU. In practical systems, however, the CR is prone to sensing errors due to the inefficiency of the sensing scheme, which can cause interference to the primary user and degrade system performance. In this paper, we present a learning-based scheme for channel sensing in CR networks. Specifically, we formulate the channel sensing problem as a partially observable Markov decision process (POMDP) in which the most likely channel state is derived by a learning process called Fuzzy Q-Learning (FQL), and the optimal sensing policy is obtained by solving this problem. Simulation results show the effectiveness and efficiency of our proposed scheme.
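The belief-tracking part of the POMDP formulation can be sketched as a Bayesian update of the probability that the primary user is idle, given noisy sensing results. The fuzzy Q-learning policy itself is not reproduced here; the transition and detection probabilities below are illustrative assumptions.

```python
# Sketch of POMDP belief tracking for channel sensing (parameters are assumptions).
P_STAY_IDLE, P_STAY_BUSY = 0.9, 0.8     # PU channel transition probabilities
P_DETECT, P_FALSE_ALARM = 0.9, 0.1      # sensing accuracy

def predict(belief_idle):
    """Propagate the idle belief one slot forward through the PU Markov chain."""
    return belief_idle * P_STAY_IDLE + (1 - belief_idle) * (1 - P_STAY_BUSY)

def update(belief_idle, sensed_busy):
    """Bayes update of the idle belief given one sensing observation."""
    if sensed_busy:
        num = belief_idle * P_FALSE_ALARM
        den = num + (1 - belief_idle) * P_DETECT
    else:
        num = belief_idle * (1 - P_FALSE_ALARM)
        den = num + (1 - belief_idle) * (1 - P_DETECT)
    return num / den

belief = 0.5
for sensed_busy in [False, False, True, False]:
    belief = update(predict(belief), sensed_busy)
    print(f"P(idle) = {belief:.3f} -> {'transmit' if belief > 0.8 else 'stay silent'}")
```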
Fengfei ZHAO Zheng QIN Zhuo SHAO
Traditional reinforcement learning (RL) methods can solve Markov decision processes (MDPs) online, but they cannot effectively use a priori knowledge to guide the learning process: exploration of the optimal policy is time-consuming and ignores problem-specific information. To address this, this paper proposes heuristic function negotiation (HFN) as an online learning framework. HFN extends MDPs with heuristic functions, replacing the state-action two-layer structure of traditional RL with a three-layer structure in which multiple heuristic functions can be defined to suit the problem at hand. The framework lets these functions negotiate, using different algorithms, to determine the appropriate action, and adjusts the influence of each function according to the rewards received. By encoding domain knowledge in heuristic functions, HFN speeds up the solution of MDPs; user preferences can also be reflected in the learning process, which improves the flexibility of RL. Experiments show that, with reasonably chosen heuristic functions, the HFN framework learns more efficiently than traditional RL. We also apply HFN to an air-combat simulation of unmanned aerial vehicles (UAVs), where different function settings lead to different combat behaviors.
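A loose sketch of the idea follows: action selection combines learned Q-values with several weighted heuristic functions, and the influence of each function is adjusted from the reward signal. The corridor task, the two heuristics, and the weight-update rule are illustrative assumptions rather than the paper's exact negotiation algorithm.

```python
# Loose sketch of Q-learning guided by weighted heuristic functions
# (task, heuristics, and update rule are illustrative assumptions).
import random

ACTIONS = ["left", "right"]
GOAL = 9                                           # rightmost cell of a 1-D corridor

def h_go_right(state, action):                     # domain knowledge: the goal lies to the right
    return 1.0 if action == "right" else 0.0

def h_avoid_edge(state, action):                   # user preference: stay away from cell 0
    return -1.0 if (state == 1 and action == "left") else 0.0

heuristics = [h_go_right, h_avoid_edge]
weights = [1.0, 1.0]
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
ALPHA, GAMMA = 0.2, 0.95

for _ in range(300):
    state = 0
    while state != GOAL:
        def score(a):
            return Q[(state, a)] + sum(w * h(state, a) for w, h in zip(weights, heuristics))
        action = max(ACTIONS, key=score) if random.random() > 0.1 else random.choice(ACTIONS)
        nxt = min(GOAL, state + 1) if action == "right" else max(0, state - 1)
        reward = 10.0 if nxt == GOAL else -1.0
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        td = reward + GAMMA * best_next - Q[(state, action)]
        Q[(state, action)] += ALPHA * td
        # adjust each heuristic's influence according to the observed reward signal
        for i, h in enumerate(heuristics):
            weights[i] = max(0.0, weights[i] + 0.05 * td * h(state, action))
        state = nxt

print("learned heuristic weights:", [round(w, 2) for w in weights])
```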
Fang WANG Yong LI Zhaocheng WANG Zhixing YANG
There has been an explosion in wireless devices and mobile data traffic, and the cellular network alone is unable to support such fast-growing demand for data transmission. It is therefore reasonable to add another network to the cellular network to augment its capacity. In fact, the cellular network's dilemma is largely caused by the same content being transmitted repeatedly, because many people are interested in the same content. A broadcast network can mitigate this problem and save wireless bandwidth by delivering popular content to multiple clients simultaneously. This paper presents a content dissemination system that combines broadcast and cellular networks. Using a Markov decision process (MDP) model, we propose an online optimal scheme that maximizes the expected number of clients receiving the content they are interested in, taking clients' interests and the queue lengths at the broadcast and cellular base stations into full consideration. Simulations demonstrate that the proposed scheme effectively decreases the item drop rate at base stations and increases the average number of clients who receive their content of interest.
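The trade-off the MDP scheme optimizes can be illustrated with a greedy baseline: in each slot, broadcast the item wanted by the most clients if that serves more clients than a single cellular transmission would. The request data and the threshold below are assumptions; the paper's scheme additionally accounts for queue lengths and future arrivals.

```python
# Greedy illustration of the broadcast/cellular trade-off (data are assumptions).
from collections import Counter

# pending requests: client -> requested item
requests = {"c1": "newsA", "c2": "newsA", "c3": "newsA", "c4": "videoB", "c5": "clipC"}

popularity = Counter(requests.values())
item, fans = popularity.most_common(1)[0]

if fans >= 2:                         # one broadcast slot serves `fans` clients at once
    served = [c for c, it in requests.items() if it == item]
    print(f"broadcast {item}, serving {served}")
else:                                 # otherwise a cellular slot serves a single client
    client, it = next(iter(requests.items()))
    print(f"unicast {it} to {client}")
```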
Jaak SIMM Masashi SUGIYAMA Hirotaka HACHIYA
Reinforcement learning (RL) is a flexible framework for learning a decision rule in an unknown environment. However, a large number of samples are often required for finding a useful decision rule. To mitigate this problem, the concept of transfer learning has been employed to utilize knowledge obtained from similar RL tasks. However, most approaches developed so far are useful only in low-dimensional settings. In this paper, we propose a novel transfer learning idea that targets problems with high-dimensional states. Our idea is to transfer knowledge between state factors (e.g., interacting objects) within a single RL task. This allows the agent to learn the system dynamics of the target RL task with fewer data samples. The effectiveness of the proposed method is demonstrated through experiments.
In this letter, we propose a partially observable Markov decision process (POMDP) based distributed adaptive opportunistic spectrum access (DA-OSA) strategy for cognitive ad hoc networks (CAHNs). In each slot, the source and destination choose a set of channels to sense and then decide the transmission channels based on the sensing results. To maximize the throughput of each link, we use the theories of sequential decision and optimal stopping to determine the optimal sensing channel set. Moreover, we establish the myopic policy and exploit the monotonicity of the reward function we use, which reduces the complexity of the sequential decision.
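The optimal-stopping view can be sketched with a backward recursion over the remaining sensing opportunities: sensing stops as soon as the observed rate exceeds the expected value of continuing. The two-point rate distribution and the sensing budget below are illustrative assumptions, not the letter's exact model.

```python
# Optimal-stopping sketch for sequential channel sensing (model is an assumption).
import random

RATES = [1.0, 3.0]          # achievable rate of a channel (bad / good)
P_GOOD = 0.4                # probability a sensed channel is good
MAX_SENSE = 4               # channels that can be sensed within one slot

# Backward recursion: V[k] = expected reward when k sensing opportunities remain
V = [0.0] * (MAX_SENSE + 1)
for k in range(1, MAX_SENSE + 1):
    V[k] = sum(p * max(rate, V[k - 1])                      # stop if the rate beats continuing
               for rate, p in [(RATES[1], P_GOOD), (RATES[0], 1 - P_GOOD)])

print("value of continuing with k channels left:", [round(v, 2) for v in V])

# Simulate one slot with the resulting stopping rule
for k in range(MAX_SENSE, 0, -1):
    rate = RATES[1] if random.random() < P_GOOD else RATES[0]
    if rate >= V[k - 1]:
        print(f"stop after sensing {MAX_SENSE - k + 1} channel(s), transmit at rate {rate}")
        break
```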
Ngo Anh VIEN SeungGwan LEE TaeChoong CHUNG
In previous work, we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward and proposed a simulation-based algorithm, called GSMDP, that estimates this approximate gradient using only a single sample path of the underlying Markov chain; GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the algorithm's output and its asymptotic output, arises because the algorithm sees only a finite data sequence.
Jeong Geun KIM Ca Van PHAN Wonha KIM
We analyze the performance of an opportunistic transmission strategy for wireless sensor networks (WSNs). We consider a transmission strategy called binary decision-based transmission (BDT), a common form of opportunistic transmission. The BDT scheme initiates transmission only when the channel quality exceeds an optimal threshold, so as to avoid unsuccessful transmissions that waste energy. We formulate a Markov decision process (MDP) to identify the optimal threshold for transmission decisions in the BDT scheme.
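A simple way to illustrate the threshold choice is to sweep candidate thresholds and pick the one that maximizes successful transmissions per unit of energy, as below. The Rayleigh-fading gain model, energy figures, and success rule are simplifying assumptions and do not reproduce the paper's MDP solution.

```python
# Sketch of picking a BDT transmission threshold by energy efficiency
# (channel model and parameters are illustrative assumptions).
import random

random.seed(0)
SAMPLES = [random.expovariate(1.0) for _ in range(20000)]   # Rayleigh power gains
TX_ENERGY, SENSE_ENERGY, SUCCESS_GAIN = 1.0, 0.05, 1.2      # hypothetical parameters

def efficiency(threshold):
    energy = success = 0.0
    for g in SAMPLES:
        energy += SENSE_ENERGY                 # the channel is probed every slot
        if g >= threshold:                     # binary decision: transmit or defer
            energy += TX_ENERGY
            success += 1.0 if g >= SUCCESS_GAIN else 0.0
    return success / energy

best = max((t / 10 for t in range(0, 40)), key=efficiency)
print(f"best threshold ~ {best:.1f}, efficiency {efficiency(best):.3f} successes per energy unit")
```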
Ngo Anh VIEN Nguyen Hoang VIET SeungGwan LEE TaeChoong CHUNG
In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls of different values and with different resource requirements. The problem of maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov decision process (SMDP), but is too complex to allow an exact solution. In this paper, a policy gradient algorithm, together with a decomposition approach, is therefore proposed to find the dynamic (state-dependent) optimal CAC and routing policy within a parameterized policy space. To implement the gradient algorithm, we approximate the gradient of the average reward and present a simulation-based algorithm (called GSMDP) that estimates this approximate gradient using only a single sample path of the underlying Markov chain of the CAC and routing SMDP. The algorithm improves performance in terms of convergence speed, rejection probability, robustness to changing arrival statistics, and overall received average revenue. Experimental simulations compare our method's performance with that of existing methods and demonstrate its robustness.
Youngjoo HAN Hyewon SONG Byungsang KIM Chan-Hyun YOUN
Due to the dynamic nature and uncertainty of grid computing, system reliability can become very unpredictable. Thus, a well-defined scheduling mechanism that provides high system availability for grid applications is required. In this letter, we propose an SLA-constrained policy-based scheduling mechanism to enhance system performance in grid environments. We implement the proposed model and show experimentally that our policy-based scheduling mechanism can guarantee high system availability as well as support load balancing.
Minoru OHMIKAWA Hideaki TAKAGI Sang-Yong KIM
We propose a new call admission control (CAC) scheme for voice calls in cellular mobile communication networks. It is assumed that rejecting a hand-off call is less desirable than rejecting a new call, because losing a call in progress is far more distressing to a user. We therefore treat the rejection of new calls and hand-off calls as incurring different costs. The key idea of our CAC is to restrict the admission of new calls so as to minimize the total expected cost per unit time over the long term. An optimal policy is derived from a semi-Markov decision process in which the intervals between successive decision epochs are exponentially distributed. Based on this optimal policy, we calculate the steady-state probability of the number of established voice connections in a cell, and then evaluate the blocking probability of new calls and the forced-termination probability of hand-off calls. Numerical experiments show that our CAC scheme significantly reduces the forced-termination probability of hand-off calls at a slight expense in new-call blocking probability and channel utilization. A comparison with the static guard channel scheme is also made.
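Because the resulting policy is of threshold type, its evaluation can be sketched with a birth-death chain: new calls are admitted only below a threshold, hand-offs up to full capacity, and the steady-state distribution yields the new-call blocking and hand-off forced-termination probabilities. The capacity, threshold, and traffic rates below are assumptions for illustration.

```python
# Sketch of evaluating a threshold-type admission policy via a birth-death chain
# (capacity, threshold, and rates are illustrative assumptions).
C = 10                     # channels in a cell
THRESHOLD = 8              # new calls admitted only while fewer than 8 calls are active
LAMBDA_NEW, LAMBDA_HO, MU = 3.0, 1.0, 0.5   # new-call rate, hand-off rate, departure rate

# unnormalized steady-state probabilities pi[n] of n active calls
pi = [1.0]
for n in range(C):
    arrival = (LAMBDA_NEW + LAMBDA_HO) if n < THRESHOLD else LAMBDA_HO
    pi.append(pi[-1] * arrival / ((n + 1) * MU))
total = sum(pi)
pi = [p / total for p in pi]

p_block_new = sum(pi[n] for n in range(THRESHOLD, C + 1))   # new call rejected
p_drop_ho = pi[C]                                           # hand-off call rejected
print(f"new-call blocking {p_block_new:.4f}, hand-off forced termination {p_drop_ho:.4f}")
```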
Kazunori IWATA Kazushi IKEDA Hideaki SAKAI
We regard the events of a Markov decision process as the outputs of a Markov information source, so that the randomness of an empirical sequence can be analyzed through the codeword length of the sequence. Randomness is an important viewpoint in reinforcement learning, since learning amounts to eliminating randomness and finding an optimal policy; the occurrence of an optimal empirical sequence also depends on this randomness. We then introduce Lempel-Ziv coding to measure the randomness, which is characterized by the domain size and the stochastic complexity. Experimental results confirm that learning and the occurrence of an optimal empirical sequence depend on the randomness, and show that in early stages the randomness is mainly characterized by the domain size, while as the number of time steps increases it depends increasingly on the complexity of the Markov decision process.
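The randomness measurement can be sketched by Lempel-Ziv parsing of an empirical state-action sequence: the fewer distinct phrases the parse produces, the shorter the codeword length and the less random the sequence. The two toy sequences below are assumptions for illustration.

```python
# Lempel-Ziv phrase counting as a randomness measure for state-action sequences
# (toy sequences are assumptions).
import random

def lz78_phrase_count(sequence):
    """Count LZ78 phrases; the codeword length grows with this count."""
    dictionary, phrase, count = set(), (), 0
    for symbol in sequence:
        phrase = phrase + (symbol,)
        if phrase not in dictionary:
            dictionary.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)

regular = ["s0a1", "s1a0"] * 50                      # a converged, repetitive policy
random.seed(1)
explore = [random.choice(["s0a0", "s0a1", "s1a0", "s1a1"]) for _ in range(100)]

print("phrases (regular):", lz78_phrase_count(regular))
print("phrases (exploratory):", lz78_phrase_count(explore))
```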
Ren-Hung HWANG Huang-Leng CHANG
In the circuit-switching literature, the Least Loaded Path Routing (LLR) concept has been shown to be very simple and efficient. However, there appears to be no unique definition of the "least busy" path, i.e., of how to measure how "busy" a path is. In this paper, we examine six ways of defining the least busy path as well as a random policy. The performance of these policies is evaluated via both simulation and analysis. Our numerical results show that all policies, including the random policy, have almost the same performance under most network configurations. Only under extremely low traffic load does the difference between the policies become significant, and even then the magnitude of the difference is very small (about 0.001). We therefore conclude that the way the alternate path is selected does not significantly affect the performance of LLR-based routing algorithms when the call blocking probability is not too small. Instead, we find that the trunk reservation level affects the performance of LLR-based routing algorithms significantly.
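One of the "least busy" definitions examined can be sketched as follows: when the direct trunk is full, choose the two-hop alternate path whose bottleneck link has the most free circuits, subject to a trunk reservation level. The topology, occupancies, and reservation level below are illustrative assumptions.

```python
# Sketch of least-loaded alternate path selection with trunk reservation
# (topology and numbers are illustrative assumptions).
capacity = {("A", "B"): 10, ("A", "C"): 10, ("C", "B"): 10, ("A", "D"): 10, ("D", "B"): 10}
busy = {("A", "B"): 10, ("A", "C"): 4, ("C", "B"): 7, ("A", "D"): 6, ("D", "B"): 5}
RESERVATION = 2                     # circuits kept free for direct traffic on each link

def free(link):
    return capacity[link] - busy[link]

def route(src, dst, alternates):
    if free((src, dst)) > 0:                      # try the direct path first
        return [(src, dst)]
    best, best_free = None, 0
    for via in alternates:
        path = [(src, via), (via, dst)]
        bottleneck = min(free(l) for l in path)   # "least busy" = largest bottleneck residual
        if bottleneck > RESERVATION and bottleneck > best_free:
            best, best_free = path, bottleneck
    return best                                    # None means the call is blocked

print(route("A", "B", alternates=["C", "D"]))      # picks whichever two-hop path is freer
```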