Predicting National GDP by Using Optimized Decision Tree Parameters of a Random Forest Model

  • Abstract: Forecasts of national GDP provide an important basis for policy making. A random forest (RF) model can capture complex nonlinear relationships among economic variables, but its predictive performance depends on how well the parameters of its decision trees are chosen. Using quarterly data from the National Bureau of Statistics for 2003 to 2023, we propose a two-stage prediction framework that combines decision tree parameter optimization with a random forest model. First, four algorithms, particle swarm optimization (PSO), genetic algorithm (GA), differential evolution (DE), and Bayesian optimization (BO), are used to tune three core decision tree parameters: maximum depth, minimum samples per split, and minimum samples per leaf node. Second, the optimized decision tree serves as the base learner of an RF model, and the out-of-bag (OOB) error is used to determine the optimal number of trees and the maximum number of features, yielding four optimized RF prediction models. Model performance is evaluated with 10-fold cross-validation using the coefficient of determination, mean absolute error, and root mean square error, and features are ranked by importance based on the best model. The results show that the RF model with PSO-optimized decision tree parameters (PSO-DT) performs best, and that the main economic indicators influencing GDP are the secondary industry, the tertiary industry, wholesale and retail trade, and transportation, warehousing, and postal services. The model can provide theoretical support for understanding economic trends and improving policy effectiveness.
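The first stage described in the abstract, a particle swarm search over the three decision tree parameters, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search ranges in `BOUNDS`, the swarm settings (`w`, `c1`, `c2`, swarm size, iteration count), and every function name are hypothetical. `toy_objective` is only a stand-in; in a real run it would be replaced by a cross-validated prediction error of a decision tree fitted to the quarterly data.

```python
import random

# Search ranges for the three tree parameters named in the abstract.
# The numeric ranges are assumptions chosen for illustration only.
BOUNDS = {
    "max_depth": (1, 20),
    "min_samples_split": (2, 20),
    "min_samples_leaf": (1, 10),
}


def toy_objective(params):
    """Hypothetical stand-in for a cross-validated error; lower is better."""
    return (
        (params["max_depth"] - 6) ** 2
        + (params["min_samples_split"] - 4) ** 2
        + (params["min_samples_leaf"] - 2) ** 2
    )


def decode(position):
    """Round a continuous particle position to integer parameters within bounds."""
    params = {}
    for x, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        params[name] = min(hi, max(lo, int(round(x))))
    return params


def pso_search(objective, n_particles=20, n_iters=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns best parameters, best score, score history."""
    rng = random.Random(seed)
    limits = list(BOUNDS.values())
    pos = [[rng.uniform(lo, hi) for lo, hi in limits] for _ in range(n_particles)]
    vel = [[0.0] * len(limits) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [objective(decode(p)) for p in pos]
    g = min(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    history = [gbest_score]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(limits):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm best.
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Clamp the new position to the search range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            score = objective(decode(pos[i]))
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score < gbest_score:
                    gbest, gbest_score = pos[i][:], score
        history.append(gbest_score)
    return decode(gbest), gbest_score, history


best_params, best_score, history = pso_search(toy_objective)
print(best_params, best_score)
```

The returned `best_params` dictionary would then fix the base learner's settings before the second stage, where the number of trees and the maximum number of features are selected by OOB error. Rounding inside `decode` is one simple way to handle integer-valued parameters with a continuous swarm; other encodings are possible.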

     
