TY - JOUR
T1 - Investigating the effectiveness of hybrid gradient boosting models and optimization algorithms for concrete strength prediction
AU - Le Nguyen, Khuong
AU - Shakouri, Mahmoud
AU - Ho, Lanh Si
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/6
Y1 - 2025/6
N2 - This study evaluates and predicts the compressive strength of concrete using eight machine learning (ML) models: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Gradient Boosting with Categorical features support (CatBoost), Gradient Boosting Regressor (GBR), Adaptive Boosting (AdaBoost), Decision Tree (DT), Random Forest (RF), and Support Vector Machine Regression (SVR). Bayesian optimisation with two surrogate models (Gaussian Processes and Random Forest) and Random Search were employed to optimise the hyperparameters of these ML models. A total of 1030 data samples were used to train the models and to analyse the importance of each input variable using SHapley Additive exPlanations (SHAP). The results indicated that all eight hybrid ML models performed well, with R2 values larger than 0.80; four models (XGBoost, CatBoost, GBR, and LightGBM) stood out, achieving R2 values of 0.94, 0.94, 0.92, and 0.92 on the testing dataset, respectively. These four leading models were then applied to six sub-databases of concrete types, significantly enhancing accuracy, with all models achieving R2 values over 0.98 on the testing dataset. The study also found that curing age, cement content, and water content were the most important variables affecting compressive strength, while fly ash was the least important. By deploying the three best models to the cloud, predictions can now be made from any web browser on any device.
AB - This study evaluates and predicts the compressive strength of concrete using eight machine learning (ML) models: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Gradient Boosting with Categorical features support (CatBoost), Gradient Boosting Regressor (GBR), Adaptive Boosting (AdaBoost), Decision Tree (DT), Random Forest (RF), and Support Vector Machine Regression (SVR). Bayesian optimisation with two surrogate models (Gaussian Processes and Random Forest) and Random Search were employed to optimise the hyperparameters of these ML models. A total of 1030 data samples were used to train the models and to analyse the importance of each input variable using SHapley Additive exPlanations (SHAP). The results indicated that all eight hybrid ML models performed well, with R2 values larger than 0.80; four models (XGBoost, CatBoost, GBR, and LightGBM) stood out, achieving R2 values of 0.94, 0.94, 0.92, and 0.92 on the testing dataset, respectively. These four leading models were then applied to six sub-databases of concrete types, significantly enhancing accuracy, with all models achieving R2 values over 0.98 on the testing dataset. The study also found that curing age, cement content, and water content were the most important variables affecting compressive strength, while fly ash was the least important. By deploying the three best models to the cloud, predictions can now be made from any web browser on any device.
KW - Bayesian optimization process
KW - CatBoost
KW - Concrete compressive strength
KW - LightGBM
KW - XGBoost
UR - https://www.scopus.com/pages/publications/105000057009
U2 - 10.1016/j.engappai.2025.110568
DO - 10.1016/j.engappai.2025.110568
M3 - Article
AN - SCOPUS:105000057009
SN - 0952-1976
VL - 149
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 110568
ER -