1-1hit |
Jae Won LEE Sung-Dong KIM Jongwoo LEE Jinseok CHAE
This paper describes a stock trading system based on reinforcement learning, regarding the process of stock price changes as Markov decision process (MDP). The system adopts two popular reinforcement learning algorithms, temporal-difference (TD) and Q, for selecting stocks and optimizing trading parameters, respectively. Input features of the system are devised using technical analysis and value functions are approximated by feedforward neural networks. Multiple cooperative agents are used for Q-learning to efficiently integrate global trend prediction with local trading strategy. Agents communicate with others sharing training episodes and learned policies, while keeping the overall scheme of conventional Q-learning. Experimental results on the Korean stock market show that our trading system outperforms the market average and makes appreciable profits. Furthermore, we can find that our system is superior to a system trained by supervised learning in view of risk management.