Modelling and Data Analysis
2024. Vol. 14, no. 4, 45–62
doi:10.17759/mda.2024140403
ISSN: 2219-3758 / 2311-9454 (online)
Method for Creating Behavior of Cognitive Agents Based on Multimodal Signal Processing
Abstract
The paper addresses the problem of predicting an agent's activity from a textual description of the task and a visual analysis of the environment. An update to the approaches of classical cognitive architectures is proposed that allows their application in a real environment. The semiotic method of symbolic designation is extended with the authors' neural network mechanism for linking vectors of the text and visual spaces. A series of experiments with the resulting model is conducted in the complex environment of a car driving emulator.
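The core idea of linking text-space and visual-space vectors can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text and visual inputs have already been embedded as vectors, and it substitutes a hypothetical linear projection `W` for the trained neural linking mechanism, scoring text-visual pairs by cosine similarity in the shared space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings: three text-space vectors (dim 8), three visual-space vectors (dim 6).
# In practice these would come from pretrained text and vision encoders.
text_vecs = rng.normal(size=(3, 8))
visual_vecs = rng.normal(size=(3, 6))

# Hypothetical linking matrix mapping visual space into text space;
# in the paper this role is played by a trained neural network.
W = rng.normal(size=(6, 8))

def link_scores(text, visual, W):
    """Cosine similarity between text vectors and projected visual vectors."""
    proj = visual @ W                                    # map visual vectors into text space
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    p = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    return t @ p.T                                       # (n_text, n_visual) similarity matrix

scores = link_scores(text_vecs, visual_vecs, W)
# Each text vector is linked to the visual vector with the highest score.
best = scores.argmax(axis=1)
print(scores.shape, best)
```

A trained linker would replace the random `W` with a projection learned so that matching text-visual pairs score highest, after which `argmax` over the similarity matrix yields the symbol grounding used for planning.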
General Information
Keywords: cognitive agents, cognitive architecture, multimodal signal processing, semiotic approach, neural networks, autonomous driving
Journal rubric: Data Analysis
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2024140403
Funding. This publication has been supported by the RUDN University Strategic Academic Leadership Program, project No. 021934-0-000. The research was carried out using the infrastructure of the Shared Research Facilities «High Performance Computing and Big Data» (CKP «Informatics») of FRC CSC RAS (Moscow).
Received: 15.11.2024
Accepted:
For citation: Weizenfeld D.A., Kiselev G.A. Method for Creating Behavior of Cognitive Agents Based on Multimodal Signal Processing. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2024. Vol. 14, no. 4, pp. 45–62. DOI: 10.17759/mda.2024140403. (In Russ., abstr. in Engl.)
References
- Bechon, P., Barbier, M., Grand, C., Lacroix, S., Lesire, C., & Pralet, C. (2018). Integrating planning and execution for a team of heterogeneous robots with time and communication constraints. 1091–1097.
- Benjamin, D. P., Li, T., Shen, P., Yue, H., Zhao, Z., & Lyons, D. (2018). Spatial understanding as a common basis for human-robot collaboration. Advances in Intelligent Systems and Computing. https://doi.org/10.1007/978-3-319-60384-1_3
- Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036–1060.
- Chu, Z., Wang, Y., Zhu, F., Yu, L., Li, L., & Gu, J. (2024). Professional Agents - Evolving Large Language Models into Autonomous Experts with Human-Level Competencies. ArXiv, abs/2402.03628.
- Davis, D. N., & Ramulu, S. K. (2017). Reasoning with BDI robots: From simulation to physical environment - Implementations and limitations. Paladyn, 8(1), 39–57. https://doi.org/10.1515/pjbr-2017-0003
- Madl, T., Franklin, S., Chen, K., & Trappl, R. (2018). A computational cognitive framework of spatial memory in brains and robots. Cognitive Systems Research, 47, 147–172. https://doi.org/10.1016/j.cogsys.2017.08.002
- Sumers, T. R., Yao, S., Narasimhan, K., & Griffiths, T. L. (2024). Cognitive Architectures for Language Agents. Transactions on Machine Learning Research.
- Emelyanov, S., et al. (2015). Multilayer cognitive architecture for UAV control. Cognitive Systems Research, 34.
- Kiselev, G. A. (2020). An intelligent system for planning the behavior of a coalition of robotic agents with the STRL architecture. Informatsionnye Tekhnologii i Vychislitel'nye Sistemy (Information Technologies and Computing Systems), 21–37. https://doi.org/10.14357/20718632200203 (In Russ.)
- Kiselev G., Panov A. (2019) Hierarchical Psychologically Inspired Planning for Human-Robot Interaction Tasks. In: Ronzhin A., Rigoll G., Meshcheryakov R. (eds) Interactive Collaborative Robotics. ICR 2019. Lecture Notes in Computer Science, vol 11659. Springer, Cham.
- Osipov, G. S., & Panov, A. I. (2018). Relationships and Operations in a Sign-Based World Model of the Actor. Scientific and Technical Information Processing, no. 5.
- Osipov, G. S. (2016). Sign-based representation and word model of actor. In: Yager, R., Sgurev, V., Hadjiski, M., & Jotsov, V. (eds.) 2016 IEEE 8th International Conference on Intelligent Systems (IS), pp. 22–26. IEEE.
- Leontiev, A. N. (1975). Activity, Consciousness, Personality. Moscow: Politizdat.
- Bruner, J. (1977). The Psychology of Cognition: Beyond the Information Given. Moscow: Progress. 413 p.
- Kiselev G., Panov A. Q-Learning of Spatial Actions for Hierarchical Planner of Cognitive Agents. In: Ronzhin A., Rigoll G., Meshcheryakov R. (eds) Interactive Collaborative Robotics. ICR 2020. Lecture Notes in Computer Science, (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, Cham 2020, pp. 160-169. https://doi.org/10.1007/978-3-030-60337-3_16
- Sima, C., Renz, K., Chitta, K., Chen, L., Zhang, H., Xie, C., Luo, P., Geiger, A., & Li, H. (2024). DriveLM: Driving with Graph Visual Question Answering. arXiv preprint arXiv:2312.14150.
- Xue, J., Zhang, D., Xiong, R., Wang, Y., & Liu, E. (2023). A Two-Stage Based Social Preference Recognition in Multi-Agent Autonomous Driving System. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5507–5513. https://doi.org/10.1109/IROS55552.2023.10341803
- Tian, R., Li, B., Weng, X., Chen, Y., Schmerling, E., Wang, Y., Ivanovic, B., & Pavone, M. (2024). Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. https://doi.org/10.48550/arXiv.2407.00959
- Weng, X., Ivanovic, B., Wang, Y., Wang, Y., & Pavone, M. (2024). PARA-Drive: Parallelized Architecture for Real-Time Autonomous Driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15449–15458. https://doi.org/10.1109/CVPR52733.2024.01463
- Bain, A., (1855). The senses and the intellect. London: Parker.
- Hull, C. L., (1943). Principles of Behavior: An Introduction to Behavior Theory. New York: D. Appleton-Century Company.
- Skinner, B. F. (1931). The concept of the reflex in the description of behavior, Ph.D. thesis, Harvard University.
- https://github.com/neuroidss/text-generation-neurofeedback-webui
- Dong, N., Zhang, W., & Gao, Z. (2019). Research on Fuzzy PID Shared Control Method of Small Brain-Controlled UAV. https://doi.org/10.48550/arXiv.1905.12240
- Osipov, G. S., Chudova, N. V., & Panov, A. I. Znakovaya kartina mira subyekta povedeniya [The sign-based world picture of the behavior subject]. (In Russ.)
- Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2015.279
- Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In Proceedings of the European conference on computer vision (ECCV), pp. 305–321, 2018.
- Gupta, A., Pacchiano, A., Zhai, Y., Kakade, S. M., and Levine, S. Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity, 2022.
- Chu, K., Zhao, X., Weber, C., Li, M., and Wermter, S. Accelerating reinforcement learning of robotic manipulations via feedback from large language models. arXiv preprint arXiv:2311.02379, 2023.
- Radford, A., et al. (2019). Language models are unsupervised multitask learners. OpenAI Blog, vol. 1, no. 8, p. 9.
- Touvron, H., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Tan, H., & Bansal, M. (2019). LXMERT: Learning cross-modality encoder representations from transformers. arXiv preprint arXiv:1908.07490.
- Li, X., et al. (2020). Oscar: Object-semantics aligned pre-training for vision-language tasks. Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXX. Springer International Publishing, pp. 121–137.
- Zhang, P., et al. (2021). VinVL: Revisiting visual representations in vision-language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5579–5588.
- Li, J., et al. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. International Conference on Machine Learning, PMLR, pp. 19730–19742.
- Caesar, H., et al. (2020). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631.