1-1hit |
Learning behaviors of hierarchically structured stochastic automata operating in a general nonstationary multiteacher environment are considered. It is shown that convergence with probability 1 to the optimal path is ensured by a new learning algorithm which is an extended form of the relative reward strength algorithm. Several computer simulation results confirm the effectiveness of the proposed algorithm.