1-2hit |
Shucong TIAN Meng YANG Jianpeng WANG Rui WANG Avik R. ADHIKARY
AlphaSeq is a new paradigm to design sequencess with desired properties based on deep reinforcement learning (DRL). In this work, we propose a new metric function and a new reward function, to design an improved version of AlphaSeq. We show analytically and also through numerical simulations that the proposed algorithm can discover sequence sets with preferable properties faster than that of the previous algorithm.
Kazuki TERAOKA Kohei HATANO Eiji TAKIMOTO
We consider Monte Carlo tree search problem, a variant of Min-Max tree search problem where the score of each leaf is the expectation of some Bernoulli variables and not explicitly given but can be estimated through (random) playouts. The goal of this problem is, given a game tree and an oracle that returns an outcome of a playout, to find a child node of the root which attains an approximate min-max score. This problem arises in two player games such as computer Go. We propose a simple and efficient algorithm for Monte Carlo tree search problem.