Data Mining and Machine Learning Approaches for Real Estate Valuation: A Systematic Review of Predictive Accuracy
Keywords:
Data Mining, Random Forest, Real Estate Valuation, Regression Models, Systematic Literature Review, XGBoost
Abstract
This systematic review analyses the use of data mining (DM) and machine learning (ML) techniques in real estate appraisal, synthesizing evidence from 70 empirical studies conducted during 2000–2025. Following PRISMA recommendations, an exhaustive search of Scopus, IEEE Xplore, ScienceDirect, and Web of Science yielded 1,200 records, of which 70 were retained for analysis. The findings show that ensemble tree-based methods, mainly Random Forest and XGBoost, significantly outperform the conventional hedonic approach and regression models in prediction accuracy (above 90%). At the same time, major drawbacks include geographically isolated samples (17 out of 20 cases); reliance on structured data rather than unstructured data (e.g., text and images); poor interpretability that is at odds with regulatory requirements; and a lack of longitudinal testing across different economic periods. Web scraping is the most prevalent data source in both DM (31%) and ML (29%) studies, while government registries are also an essential data source in ML research (29%). Future research should focus on validating the findings across multiple markets, incorporating multimodal data, developing interpretable hybrid models, and conducting longitudinal studies.