Methods of Computational Linguistics and Natural Language Processing: Opportunities and Limitations for Personality Psychology Tasks

Abstract

Modern computational linguistics methods open up new possibilities in psychological research, both for studying personality and language and for developing psychodiagnostic methods. This article discusses the main directions such research can take, as well as non-obvious nuances that matter when planning it. Making full use of computational linguistics methods requires taking into account the characteristics of the methods themselves, the language system, the sources of texts and the sample of their authors, and the level of theoretical development. Each of these points is examined in detail using examples from previously conducted studies. The review is not exhaustive, but it provides a general picture to guide the further search for solutions to specific research problems.

General Information

Keywords: computational linguistics, natural language processing, personality psychology, textual data analysis

Journal rubric: General Psychology

Article type: review article

DOI: https://doi.org/10.17759/jmfp.2022110110

Funding. This research is supported by the Faculty of Social Sciences, HSE University.

For citation: Kuzmina A.A., Lifshits M.A., Kostenko V.Y. Methods of Computational Linguistics and Natural Language Processing: Opportunities and Limitations for Personality Psychology Tasks [Electronic resource]. Sovremennaia zarubezhnaia psikhologiia = Journal of Modern Foreign Psychology, 2022. Vol. 11, no. 1, pp. 104–115. DOI: 10.17759/jmfp.2022110110. (In Russ., abstr. in Engl.)

Information About the Authors

Alisa A. Kuzmina, Student in the Master's Program “Positive Psychology”, Research Intern in the Laboratory of Linguistic Conflict Resolution Studies and Contemporary Communicative Practices, National Research University Higher School of Economics, Moscow, Russia, ORCID: https://orcid.org/0000-0003-0794-9131, e-mail: kuzmina.alice@gmail.com

Mari A. Lifshits, Independent Researcher, New York, USA, ORCID: https://orcid.org/0000-0003-0079-0244, e-mail: lifsh22m@mtholyoke.edu

Vasily Y. Kostenko, PhD in Psychology, Associate Professor, Faculty of Social Sciences, National Research University Higher School of Economics, Moscow, Russia, ORCID: https://orcid.org/0000-0002-5612-3857, e-mail: vasily.kostenko@gmail.com
