Modelling and Data Analysis
2024. Vol. 14, no. 4, 45–62
doi:10.17759/mda.2024140403
ISSN: 2219-3758 / 2311-9454 (online)
Method for Creating Behavior of Cognitive Agents Based on Multimodal Signal Processing
Abstract
The paper addresses the problem of predicting an agent's activity from a textual description of the task and a visual analysis of the environment. An update to the approaches of classical cognitive architectures is proposed that allows their application in a real environment. The semiotic method of symbolic designation is extended with the authors' neural network mechanism for linking vectors of the text and visual spaces. A series of experiments with the resulting model is conducted in the complex environment of a car driving emulator.
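The core idea of linking text-space and visual-space vectors can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text and visual inputs have already been embedded as vectors, and it substitutes a hypothetical linear projection `W` for the trained neural linking mechanism, scoring text-visual pairs by cosine similarity in the shared space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings: three text-space vectors (dim 8), three visual-space vectors (dim 6).
# In practice these would come from pretrained text and vision encoders.
text_vecs = rng.normal(size=(3, 8))
visual_vecs = rng.normal(size=(3, 6))

# Hypothetical linking matrix mapping visual space into text space;
# in the paper this role is played by a trained neural network.
W = rng.normal(size=(6, 8))

def link_scores(text, visual, W):
    """Cosine similarity between text vectors and projected visual vectors."""
    proj = visual @ W                                    # map visual vectors into text space
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    p = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    return t @ p.T                                       # (n_text, n_visual) similarity matrix

scores = link_scores(text_vecs, visual_vecs, W)
# Each text vector is linked to the visual vector with the highest score.
best = scores.argmax(axis=1)
print(scores.shape, best)
```

A trained linker would replace the random `W` with a projection learned so that matching text-visual pairs score highest, after which `argmax` over the similarity matrix yields the symbol grounding used for planning.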
General Information
Keywords: cognitive agents, cognitive architecture, multimodal signal processing, semiotic approach, neural networks, autonomous driving
Journal rubric: Data Analysis
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2024140403
Funding. This publication has been supported by the RUDN University Strategic Academic Leadership Program, project No. 021934-0-000. The research was carried out using the infrastructure of the Shared Research Facilities «High Performance Computing and Big Data» (CKP «Informatics») of FRC CSC RAS (Moscow).
Received: 15.11.2024
Accepted:
For citation: Weizenfeld D.A., Kiselev G.A. Method for Creating Behavior of Cognitive Agents Based on Multimodal Signal Processing. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2024. Vol. 14, no. 4, pp. 45–62. DOI: 10.17759/mda.2024140403. (In Russ., abstr. in Engl.)
References
- Bechon, P., Barbier, M., Grand, C., Lacroix, S., Lesire, C., & Pralet, C. (2018). Integrating planning and execution for a team of heterogeneous robots with time and communication constraints. 1091–1097.
- Benjamin, D. P., Li, T., Shen, P., Yue, H., Zhao, Z., & Lyons, D. (2018). Spatial understanding as a common basis for human-robot collaboration. Advances in Intelligent Systems and Computing. https://doi.org/10.1007/978-3-319-60384-1_3
- Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036–1060.
- Chu, Z., Wang, Y., Zhu, F., Yu, L., Li, L., & Gu, J. (2024). Professional Agents - Evolving Large Language Models into Autonomous Experts with Human-Level Competencies. ArXiv, abs/2402.03628.
- Davis, D. N., & Ramulu, S. K. (2017). Reasoning with BDI robots: From simulation to physical environment - Implementations and limitations. Paladyn, 8(1), 39–57. https://doi.org/10.1515/pjbr-2017-0003
- Madl, T., Franklin, S., Chen, K., & Trappl, R. (2018). A computational cognitive framework of spatial memory in brains and robots. Cognitive Systems Research, 47, 147–172. https://doi.org/10.1016/j.cogsys.2017.08.002
- Sumers, T. R., Yao, S., Narasimhan, K., & Griffiths, T. L. (2024). Cognitive Architectures for Language Agents. Transactions on Machine Learning Research.
- Emelyanov, S., et al. (2015). Multilayer cognitive architecture for UAV control. Cognitive Systems Research, 34.
- Kiselev, G. A. (2020). An intelligent system for planning the behavior of a coalition of robotic agents with the STRL architecture. Informatsionnye Tekhnologii i Vychislitel'nye Sistemy (Information Technologies and Computing Systems), 21–37. https://doi.org/10.14357/20718632200203 (In Russ.)
- Kiselev G., Panov A. (2019) Hierarchical Psychologically Inspired Planning for Human-Robot Interaction Tasks. In: Ronzhin A., Rigoll G., Meshcheryakov R. (eds) Interactive Collaborative Robotics. ICR 2019. Lecture Notes in Computer Science, vol 11659. Springer, Cham.
- Osipov, G. S., & Panov, A. I. (2018). Relationships and Operations in a Sign-Based World Model of the Actor. Scientific and Technical Information Processing, no. 5.
- Osipov, G. S. (2016). Sign-based representation and word model of actor. In: Yager, R., Sgurev, V., Hadjiski, M., & Jotsov, V. (eds.) 2016 IEEE 8th International Conference on Intelligent Systems (IS), pp. 22–26. IEEE.
- Leontiev, A. N. (1975). Activity, Consciousness, Personality. Moscow: Politizdat.
- Bruner, J. (1977). The Psychology of Cognition: Beyond the Information Given. Moscow: Progress. 413 p.
- Kiselev G., Panov A. Q-Learning of Spatial Actions for Hierarchical Planner of Cognitive Agents. In: Ronzhin A., Rigoll G., Meshcheryakov R. (eds) Interactive Collaborative Robotics. ICR 2020. Lecture Notes in Computer Science, (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, Cham 2020, pp. 160-169. https://doi.org/10.1007/978-3-030-60337-3_16
- Sima, C., Renz, K., Chitta, K., Chen, L., Zhang, H., Xie, C., Luo, P., Geiger, A., & Li, H. (2024). DriveLM: Driving with Graph Visual Question Answering. arXiv preprint arXiv:2312.14150.
- Xue, J., Zhang, D., Xiong, R., Wang, Y., & Liu, E. (2023). A Two-Stage Based Social Preference Recognition in Multi-Agent Autonomous Driving System. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5507–5513. https://doi.org/10.1109/IROS55552.2023.10341803
- Tian, R., Li, B., Weng, X., Chen, Y., Schmerling, E., Wang, Y., Ivanovic, B., & Pavone, M. (2024). Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. https://doi.org/10.48550/arXiv.2407.00959
- Weng, X., Ivanovic, B., Wang, Y., Wang, Y., & Pavone, M. (2024). PARA-Drive: Parallelized Architecture for Real-Time Autonomous Driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15449–15458. https://doi.org/10.1109/CVPR52733.2024.01463
- Bain, A., (1855). The senses and the intellect. London: Parker.
- Hull, C. L., (1943). Principles of Behavior: An Introduction to Behavior Theory. New York: D. Appleton-Century Company.
- Skinner, B. F. (1931). The concept of the reflex in the description of behavior, Ph.D. thesis, Harvard University.
- https://github.com/neuroidss/text-generation-neurofeedback-webui
- Dong, N., Zhang, W., & Gao, Z. (2019). Research on Fuzzy PID Shared Control Method of Small Brain-Controlled UAV. https://doi.org/10.48550/arXiv.1905.12240
- Osipov, G. S., Chudova, N. V., & Panov, A. I. Znakovaya kartina mira subyekta povedeniya [The sign-based world picture of the behavior subject]. (In Russ.)
- Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2015.279
- Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In Proceedings of the European conference on computer vision (ECCV), pp. 305–321, 2018.
- Gupta, A., Pacchiano, A., Zhai, Y., Kakade, S. M., and Levine, S. Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity, 2022.
- Chu, K., Zhao, X., Weber, C., Li, M., and Wermter, S. Accelerating reinforcement learning of robotic manipulations via feedback from large language models. arXiv preprint arXiv:2311.02379, 2023.
- Radford, A., et al. (2019). Language models are unsupervised multitask learners. OpenAI Blog, vol. 1, no. 8, p. 9.
- Touvron, H., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Tan, H., & Bansal, M. (2019). LXMERT: Learning cross-modality encoder representations from transformers. arXiv preprint arXiv:1908.07490.
- Li, X., et al. (2020). Oscar: Object-semantics aligned pre-training for vision-language tasks. Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXX. Springer International Publishing, pp. 121–137.
- Zhang, P., et al. (2021). VinVL: Revisiting visual representations in vision-language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5579–5588.
- Li, J., et al. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. International Conference on Machine Learning, PMLR, pp. 19730–19742.
- Caesar, H., et al. (2020). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631.