1-5hit |
Ryo WATANABE Junpei KOMIYAMA Atsuyoshi NAKAMURA Mineichi KUDO
We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical multi-armed bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy for the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality in the study of the budgeted bandit problem. In fact, our regret upper bound is at least four times better than that of BTS, the best known upper bound for the budgeted bandit problem. Moreover, an empirical simulation we conducted shows that the performance of a tuned variant of KL-UCB-SC is comparable to that of state-of-the-art policies such as PD-BwK and BTS.
Hirofumi NAKAMURA Sadayuki MURASHIMA
A recursive-type positive integer code is proposed. It prefixes the information about the length of the component of the codeword recursively. It is an asymptotically optimal code. The codeword length for a positive integer n is shorter than n bits in almost all of sufficiently large positive integers, where n is the log-star function.
Hirofumi NAKAMURA Sadayuki MURASHIMA
A positive integer code EXEb,h,d(b1, h1,d0) is proposed. Its codeword for a positive integer n consists of three kinds of information: (1) how many times the number of n's digits can be subtracted by the terms of a progression including a geometric progression, (2) the rest of the subtractions, and (3) given value of the positive integer n. EXEb,h,d is a non-recursive type code. It is an asymptotically optimal code (for d1) and preserves the lexicographic,length, and number orders (for bh+2). Some examples of EXEb,h,d are also presented. Their codeword lengths are found to be shorter than the Amemiya and Yamamoto code CEk except for small positive integers.
The paper considers the design of two families of binary block codes developed for controlling large numbers of errors which may occur in LSI, optical disks and other devices. The semidistance codes are capable of assuring a required signal-to-noise ratio in information retrieval; the t-symmetric error correcting/all unidirectional error detecting" (t-SyEC/AUED) codes are capable of correcting t or fewer symmetric errors and also detecting any number of unidirectional errors caused by the asymmetric nature of transmission or storage madia. The paper establishes an equivalence between these families of codes, and proposes improved methods for constructing, for any values of t, a class of nonsystematic constant weight codes as well as a class of systematic codes. The constructed codes of both classes are shown to be optimal when t is O, and of asymptotically optimal order" in general cases. The number of redundant bits of the obtained nonsystematic code is of the order of (t+1/2)
This letter considers a subclass of t-symmetric error correcting/all unidirectional error detecting (t-SyEC/AUED) codes in which the information is represented in an m-out-of-k coded form, which thus can be regarded as virtually systematic for practical purposes. For t3, previous researchers proposed methods for constructing codes of this subclass which are either optimal or of asymptotically optimal order. This letter proposes a new method for constructing, for any values of t, m and k, codes that are either optimal or of asymptotically optimal order. The redundancy of the obtained code is of the order t