Stochastic swarm clusterization method in natural language data processing


Abstract

This paper considers a natural language data processing technology based on a non-linear dimensionality reduction method that takes into account the discriminating power of the solution with respect to the values of a categorical variable associated with each observation. A stochastic optimization method, particle swarm optimization, is proposed to find the characteristics that ensure the best separation of observations in terms of a given quality functional. The quality of a solution is evaluated by the purity of the clusters obtained with the k-means method or with self-organizing Kohonen feature maps.
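
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a canonical global-best particle swarm, replaces the non-linear dimensionality reduction with simple per-feature weights, and uses a small hand-written k-means in place of a library routine. The toy data, the function names (`cluster_purity`, `kmeans`, `pso_maximize`), and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def cluster_purity(cluster_ids, class_labels):
    """Share of observations that fall in the majority class of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(class_labels)


def kmeans(points, k, n_iter=10):
    """Tiny k-means with deterministic init (first k points as centers)."""
    centers = [points[i][:] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(n_iter):
        assign = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def pso_maximize(objective, dim, n_particles=20, n_iter=50,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best PSO on the box [0, 1]^dim (illustrative parameters)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # clamp the particle to the search box
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical toy data: feature 0 separates the two classes, feature 1 is noise.
data_rng = random.Random(1)
labels = [0] * 20 + [1] * 20
data = [[lab + data_rng.gauss(0, 0.1), data_rng.gauss(0, 5.0)] for lab in labels]


def purity_of_weights(weights):
    """Quality functional: purity of k-means clusters on re-weighted features."""
    scaled = [[w * x for w, x in zip(weights, row)] for row in data]
    return cluster_purity(kmeans(scaled, k=2), labels)


best_weights, best_purity = pso_maximize(purity_of_weights, dim=2)
print(best_weights, best_purity)
```

In this sketch the swarm searches over feature weights, and the purity of the resulting clusters plays the role of the quality functional. A Kohonen map could replace `kmeans` as the clustering step without changing the rest of the loop.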

General Information

Keywords: combinatorial optimization, particle swarm optimization, non-linear dimensionality reduction

Journal rubric: Mathematical Psychology

Article type: scientific article

DOI: https://doi.org/10.17759/exppsy.2018110301

For citation: Yuryev G.A., Verkhovskaya E.K., Yuryeva N.E. Stochastic swarm clusterization method in natural language data processing. Eksperimental'naâ psihologiâ = Experimental Psychology (Russia), 2018. Vol. 11, no. 3, pp. 5–18. DOI: 10.17759/exppsy.2018110301. (In Russ., abstr. in Engl.)

References

  1. Aviation safety network. URL: https://aviation-safety.net/database/ (06.12.2017).
  2. Eberhart R., Kennedy J. A New Optimizer Using Particle Swarm Theory. Sixth International Symposium on Micro Machine and Human Science (Nagoya, Japan). IEEE Service Center, Piscataway, NJ, 1995, pp. 39–43.
  3. Formalev V.F., Reviznikov D.L. Chislennye metody [Numerical methods]. Moscow, Fizmatlit, 2004. 400 p.
  4. Gladkov L.A. Bioinspirirovannye metody v optimizacii: monografiya [Bioinspired methods in optimization: a monograph]. Moscow, Fizmatlit, 2009. 384 p.
  5. Kennedy J. Swarm Intelligence. Morgan Kaufmann Publishers, Inc., San Francisco, CA, 2001.
  6. Kennedy J., Eberhart R. Particle Swarm Optimization. IEEE International Conference on Neural Networks (Perth, Australia). IEEE Service Center, Piscataway, NJ, 1995, pp. 1942–1948.
  7. Khanesar M.A. Novel Binary Particle Swarm Optimization, Particle Swarm Optimization. In M.A. Khanesar, H. Tavakoli, M. Teshnehlab, M.A. Shoorehdeli, A. Lazinica (Ed.). InTech, DOI: 10.5772/6738. 2009. URL: https://www.intechopen.com/books/particle_swarm_optimization/novel_binary_particle_swarm_optimization (06.12.2017).
  8. Kuravsky L.S., Artemenkov S.L., Yuriev G.A., Grigorenko E.L. Novyj podhod k komp’yuterizirovannomu adaptivnomu testirovaniyu [New approach to computer adaptive testing]. Eksperimental’naya psihologiya [Experimental Psychology], 2017, vol. 10, no. 3, pp. 33–45. DOI: 10.17759/exppsy.2017100303
  9. Kuravsky L.S., Marmalyuk P.A., Alhimov V.I., Yuriev G.A. Matematicheskie osnovy novogo podhoda k postroeniyu procedur testirovaniya [Mathematical basis of a novel approach to testing]. Eksperimental’naya psihologiya [Experimental Psychology], 2012, vol. 5, no. 4, pp. 75–98.
  10. Kuravsky L.S., Marmalyuk P.A., Alhimov V.I., Yuriev G.A. Novyj podhod k postroeniyu intellektual’nyh i kompetentnostnyh testov [Novel approach to intellectual testing]. Modelirovanie i analiz dannyh [Modeling and data analysis], 2013, no. 1, pp. 4–28.
  11. Kuravsky L.S., Yuriev G.A. Probabilistic artifact filtration in adaptive testing. Modelirovanie i analiz dannyh [Modeling and data analysis], 2012, no. 1, pp. 70–81.
  12. Kuravskiy L.S., Yuriev G.A. Ispol’zovanie markovskih modelej pri obrabotke rezul’tatov testirovaniya [Markov models in testing data analysis]. Voprosy psihologii [Issues in Psychology], 2011, no. 2, pp. 98–107.
  13. Kuravsky L.S., Marmalyuk P.A., Yuriev G.A., Dumin P.N. Chislennye metody identifikacii markovskih processov s diskretnymi sostoyaniyami i nepreryvnym vremenem [Numerical methods for identifying Markov processes with discrete states and continuous time]. Matem. Modelirovanie [Mathematical modeling], 2017, vol. 29, no. 5, pp. 133–146.
  14. Kuravsky L.S., Baranov S.N. Komp’yuternoe modelirovanie i analiz dannyh: Konspekty lekcij i uprazhneniya: ucheb. posobie [Computer modeling and data analysis: lecture notes and exercises: a study guide]. Moscow, Rusavia, 2012. 18 p.
  15. Mikolov T., Yih W., Zweig G. Linguistic Regularities in Continuous Space Word Representations. Proceedings of NAACL HLT, 2013.
  16. Swamy N. Cluster Purity Visualizer. 2016. URL: https://bl.ocks.org/nswamy14/e28ec2c438e9e8bd302f
  17. Tyumeneva Y.A. Psihologicheskoe izmerenie [Psychological measurement]. Moscow, Aspekt-Press, 2007.

Information About the Authors

Grigory A. Yuryev, PhD in Physics and Mathematics, Associate Professor, Head of Department of the Computer Science Faculty, Leading Researcher, Youth Laboratory Information Technologies for Psychological Diagnostics, Moscow State University of Psychology and Education, Moscow, Russia, ORCID: https://orcid.org/0000-0002-2960-6562, e-mail: g.a.yuryev@gmail.com

E. K. Verkhovskaya, Researcher, Moscow State University of Psychology and Education, Moscow, Russia, e-mail: katrin636bmw@yandex.ru

Nataliya E. Yuryeva, PhD in Engineering, Head of Laboratory, Youth Laboratory Information Technologies for Psychological Diagnostics, Research Fellow, Information Technology Center for Psychological Studies of the Computer Science Faculty, Moscow State University of Psychology and Education, Moscow, Russia, ORCID: https://orcid.org/0000-0003-1419-876X, e-mail: yurieva.ne@gmail.com
