A Multi-Stage Modeling Framework Integrating Random Forests and Cluster Analysis: Feature Identification and Prediction for the Global Innovation Index
DOI: https://doi.org/10.54097/tyan7k03
Keywords: Random Forest, K-means Clustering, Global Innovation Index
Abstract
Using data from the Global Innovation Index (GII), this paper builds an integrated, data-driven framework that combines feature selection, cluster analysis, and predictive modeling. First, a random forest model ranks the importance of the multidimensional indicators, identifying key variables among high-dimensional features while capturing complex nonlinear relationships. Second, K-means clustering stratifies countries by innovation level, yielding a structured classification. Building on these two stages, a sliding-window random forest regression model converts short time series into supervised learning samples, enabling high-precision forecasting of future indicator values. In testing, the model shows strong fitting capability and stability. The core contribution of the study is the systematic integration of feature selection, structural classification, and predictive modeling into a single analytical workflow, which improves both the model's adaptability to complex data and its interpretability. Because the method does not rely on a single model, it generalizes well across different data structures and is applicable to multi-indicator comprehensive evaluation and dynamic forecasting. Overall, the framework improves forecasting accuracy while jointly characterizing key drivers and structural features, offering a general and efficient technical approach to the analysis of complex systems.
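The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the window length, the indicator layout, or whether windows are pooled across countries, so the function name `make_windows`, the window of 3, and the yearly scores below are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn one short series into supervised (features, target) pairs.

    Each sample uses `window` consecutive values as features and the
    value that immediately follows them as the target.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must satisfy 1 <= window < len(series)")
    n_samples = len(series) - window
    X = [list(series[i:i + window]) for i in range(n_samples)]
    y = [series[i + window] for i in range(n_samples)]
    return X, y


# Made-up yearly scores for one hypothetical country (not GII data).
scores = [50.0, 51.5, 52.0, 53.5, 55.0, 54.5]

# Assumed window length of 3: six observations yield three samples.
X, y = make_windows(scores, window=3)
```

The resulting `X` and `y` could then be passed to any regressor with a `fit(X, y)` interface, for example scikit-learn's `RandomForestRegressor`. With a series this short, only a handful of samples result, which is one reason such windows are often pooled across several series before fitting; whether the paper does so is not stated in the abstract.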
Copyright (c) 2026 Journal of Computer Science and Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








