Analysis of Model-Free Reinforcement Learning Algorithm for Target Tracking

Muhammad Fikry, Rizal Tjut Adek, Zulfhazli Zulfhazli, Subhan Hartanto, Taufiqurrahman Taufiqurrahman, Dyah Ika Rinawati

Abstract


Target tracking is the process of locating points of interest in different domains. In a tracking task, some locations carry rewards (positive or negative values) that the agent does not know in advance. The agent, which is a system, must therefore learn to obtain the maximum total value, and it is trained here with various learning rates. Reinforcement learning is a machine learning technique in which an agent learns by interacting with an environment defined by a reward function and probabilistic dynamics, exploring and learning about that environment over many iterations. For each action taken, the agent receives a reward from the environment that marks the behavior as positive or negative. The agent's goal is to maximize the total reward received during the interaction. In this study, the agent learns three different modules, namely sidewalk, obstacle, and product, using the Q-learning algorithm. Each module is trained with various learning rates and rewards. Q-learning works effectively, achieving the highest final reward at a learning rate of 0.8 over 500 rounds with an epsilon of 0.9.
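To make the training setup concrete, the following is a minimal sketch of tabular Q-learning using the hyperparameters reported in the abstract (learning rate 0.8, 500 rounds, epsilon 0.9). Everything else is an assumption made for illustration and is not taken from the paper: the 4x4 grid, the reward values, the discount factor of 0.9, the cap of 100 steps per round, the reading of epsilon as the probability of taking a random action, and all function names. It does not reproduce the sidewalk, obstacle, or product modules.

```python
import random

# Hypothetical 4x4 grid world; layout and rewards are illustrative assumptions.
SIZE = 4
START, GOAL, PENALTY = (0, 0), (3, 3), (1, 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10.0, True    # positive prize, ends the round
    if nxt == PENALTY:
        return nxt, -5.0, False   # negative prize, round continues
    return nxt, 0.0, False


def greedy_action(q, state):
    """Index of the action with the highest Q-value in this state."""
    return max(range(len(ACTIONS)), key=lambda a: q[(state, a)])


def train(alpha=0.8, gamma=0.9, epsilon=0.9, episodes=500,
          max_steps=100, seed=0):
    """Tabular Q-learning; epsilon is assumed to be the exploration rate."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE)
         for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, state)      # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(
                q[(nxt, b)] for b in range(len(ACTIONS)))
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START and record visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[greedy_action(q, state)])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Because Q-learning is off-policy, the greedy policy read from the table can still head for the positive prize even when most training actions are random, which is one plausible reason a high epsilon can coexist with a good final reward in a small environment like this one.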






Journal of Computer Engineering, Electronics and Information Technology (COELITE)
is published by UNIVERSITAS PENDIDIKAN INDONESIA and managed by the Department of Computer Engineering.