Experience in Using the Transformer Network Architecture to Approximate Agent’s Policy in Reinforcement Learning

N.P. Novikov; V.I. Vinogradov

doi:10.17759/mda.2024140201

Modelling and Data Analysis
2024. Vol. 14, no. 2, 7–22
ISSN: 2219-3758 / 2311-9454 (online)

Abstract

This paper reviews the basics of deep reinforcement learning and the use of neural networks to approximate the agent’s policy. It then compares a fully connected neural network with a transformer network as the policy approximator within a reinforcement learning algorithm.

General Information

Keywords: artificial intelligence, machine learning, deep reinforcement learning, Markov decision processes, transformer, optimization

Journal rubric: Data Analysis

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2024140201

Received: 03.06.2024

Accepted: 21.03.2024

For citation: Novikov N.P., Vinogradov V.I. Experience in Using the Transformer Network Architecture to Approximate Agent’s Policy in Reinforcement Learning. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2024. Vol. 14, no. 2, pp. 7–22. DOI: 10.17759/mda.2024140201. (In Russ., abstr. in Engl.)

Information About the Authors

Nikita P. Novikov, master's student, Institute of Computer Science and Applied Mathematics, Moscow Aviation Institute (National Research University) (MAI), Moscow, Russian Federation, e-mail: rtyderson@gmail.com

Vladimir I. Vinogradov, Candidate of Science (Physics and Mathematics), Associate Professor, Department of Mathematical Cybernetics, Moscow Aviation Institute (National Research University), Moscow, Russian Federation, ORCID: https://orcid.org/0000-0003-3773-9653, e-mail: vvinogradov@inbox.ru
