Using Machine Learning Methods to Solve Problems of Forecasting the Amount and Probability of Purchase Based on E-Commerce Data

Abstract

This study investigates whether machine learning methods can be used to build models that predict the probability and the amount of a purchase by online store customers. The sample consists of user transaction data from the site ponpare.jp for the period from 01.07.2011 to 23.06.2012. The most common methods for solving similar problems are described and compared, along with the metrics used to evaluate forecasts of both the fact and the amount of a purchase. The results show that, for predicting the probability of a purchase, gradient boosting, specifically its LGBMClassifier implementation, gives the most accurate estimates. Gradient boosting also gave the best results for predicting the amount of a customer’s purchase.
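The abstract does not name the evaluation metrics, so the following is only an illustrative sketch of how the two tasks are commonly scored. It assumes ROC AUC for the purchase-probability (classification) task and RMSE for the purchase-amount (regression) task. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math


def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted amounts."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical toy data: 1 = the customer made a purchase, 0 = no purchase.
bought = [0, 0, 1, 1]
purchase_prob = [0.1, 0.4, 0.35, 0.8]  # scores from some classifier

# Hypothetical purchase amounts and a regressor's predictions.
amounts = [100.0, 250.0, 400.0]
predicted_amounts = [110.0, 240.0, 430.0]

print(roc_auc(bought, purchase_prob))    # 3 of 4 pairs ranked correctly: 0.75
print(rmse(amounts, predicted_amounts))  # sqrt(1100 / 3)
```

In practice the scores would come from a fitted classifier such as LGBMClassifier and the amounts from a fitted regressor. The pure-Python versions here only show what each metric measures.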

General Information

Keywords: probability and purchase amount forecast, classification, regression, data analysis, data processing, machine learning

Journal rubric: Data Analysis

DOI: https://doi.org/10.17759/mda.2020100403

For citation: Mamiev O.A., Finogenov N.A., Sologub G.B. Using Machine Learning Methods to Solve Problems of Forecasting the Amount and Probability of Purchase Based on E-Commerce Data. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2020. Vol. 10, no. 4, pp. 31–40. DOI: 10.17759/mda.2020100403. (In Russ., abstr. in Engl.)

Information About the Authors

Oleg A. Mamiev, Moscow Aviation Institute (National Research University), Moscow, Russian Federation, ORCID: https://orcid.org/0000-0003-1137-4019, e-mail: olegios@mail.ru

Nikita A. Finogenov, Moscow Aviation Institute (National Research University), Moscow, Russian Federation, ORCID: https://orcid.org/0000-0001-7680-9496, e-mail: finogenov.nik@gmail.com

Gleb B. Sologub, Candidate of Science (Physics and Mathematics), Associate Professor of the Department of Mathematical Cybernetics of the Institute of Information Technologies and Applied Mathematics, Moscow Aviation Institute (National Research University), Moscow, Russian Federation, ORCID: https://orcid.org/0000-0002-5657-4826, e-mail: glebsologub@ya.ru
