Abstract:
Forecasting national GDP provides an important basis for policy making. The random forest (RF) model can capture complex nonlinear relationships among economic variables, but its predictive performance depends on how well the parameters of its decision trees are chosen. We propose a predictive framework based on quarterly data from the National Bureau of Statistics from 2003 to 2023, following a “decision tree parameter optimization-random forest model” approach. First, four algorithms, namely particle swarm optimization (PSO), genetic algorithm (GA), differential evolution (DE), and Bayesian optimization (BO), are used to optimize three core parameters of the decision tree: maximum depth, minimum number of samples required to split a node, and minimum number of samples per leaf node. Then, with the optimized decision tree as the base learner, a random forest model is constructed, and the optimal number of decision trees and maximum number of features are determined using the out-of-bag (OOB) error, yielding four optimized random forest prediction models. Ten-fold cross-validation is used to assess model performance in terms of the coefficient of determination, mean absolute error, and root mean square error, and features are ranked by importance using the best-performing model. The results show that the RF model with PSO-optimized decision tree parameters (PSO-DT) achieves the best predictive performance. The primary economic indicators influencing GDP are the secondary industry, the tertiary industry, wholesale and retail trade, and transportation, warehousing, and postal services. The model can help stakeholders gain insight into economic trends and improve policy effectiveness.
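
As a rough illustration of the first stage, the sketch below runs a basic particle swarm search over the three decision tree parameters named above. It is a minimal sketch under stated assumptions, not the implementation used in this work: the search bounds, the swarm settings (`w`, `c1`, `c2`), and the function `surrogate_error` are all hypothetical. In an actual pipeline, `surrogate_error` would be replaced by a function that trains a decision tree with the candidate parameters and returns its cross-validated error.

```python
import random

# Search bounds for the three decision tree parameters named in the abstract.
# These ranges are illustrative assumptions, not values from the paper.
BOUNDS = [(1, 20),   # max_depth
          (2, 20),   # min_samples_split
          (1, 10)]   # min_samples_leaf


def surrogate_error(params):
    """Hypothetical stand-in for the cross-validated error of a decision tree.

    A real objective would fit a tree with `params` and return, for example,
    its 10-fold RMSE. This toy bowl has its minimum at (6, 4, 2).
    """
    depth, split, leaf = params
    return (depth - 6) ** 2 + (split - 4) ** 2 + (leaf - 2) ** 2


def to_params(position):
    """Round a continuous particle position to integers clipped to BOUNDS."""
    return tuple(min(hi, max(lo, int(round(x))))
                 for x, (lo, hi) in zip(position, BOUNDS))


def pso(objective, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over BOUNDS with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * len(BOUNDS) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(to_params(p)) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(to_params(pos[i]))
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return to_params(gbest), gbest_val


best_params, best_error = pso(surrogate_error)
print("best (max_depth, min_samples_split, min_samples_leaf):", best_params)
print("surrogate error:", best_error)
```

The tuple returned by `pso` would then fix the base learner's settings before the number of trees and the maximum number of features are tuned against the OOB error. The same loop structure applies if GA, DE, or BO is substituted as the search strategy; only the candidate-update rule changes.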