EARLY FAULT SEVERITY PREDICTION IN DIGITAL-TWIN–ENABLED INDUSTRIAL SYSTEMS USING IMBALANCE-AWARE MACHINE LEARNING AND TEMPORAL FEATURE ENGINEERING
Abstract
High-dimensional, continuous industrial telemetry generated by digital twin (DT) platforms offers opportunities for large-scale fault detection and predictive maintenance (PdM). However, deploying PdM in real-world settings remains challenging. In this paper, we investigate early fault severity prediction using a five-year, hourly dataset of DT-derived data spanning January 2019 to January 2024. The dataset comprises 38 variables, including vibration, temperature, pressure, acoustic signals, operational load, maintenance and repair history, anomaly scores, fault probability estimates, and a multiclass fault-diagnosis label with classes such as No Fault and Critical Fault. The study follows an end-to-end analytical workflow: a set of machine-learning classifiers is trained and then refined through class-weighted learning, probability calibration, and a cost-sensitive decision policy aligned with PdM objectives. Experimental results show that the baseline and temporally enhanced models achieve an accuracy of ≈0.69. Introducing class-weighted learning raises fault recall to 0.059 and macro-F1 to 0.451, while threshold optimization achieves perfect fault recall (1.00) with a corresponding fault F1-score of 0.468. These findings indicate that early severity prediction performance is governed by the joint effects of temporal modelling, imbalance-aware learning, and calibrated decision thresholds. This work aims to provide a reproducible, deployment-oriented framework that improves minority-class detection and supports effective maintenance decisions in DT-enabled Industry 4.0/5.0 applications.
JEL Classification Codes: O14, O32, L42.
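The decision pipeline summarised in the abstract (temporal features, class weighting, calibrated probabilities, and a cost-sensitive threshold) can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation: the binary fault (1) versus no-fault (0) framing, the lag and window sizes, the misclassification costs, the toy data, and every function name (`trailing_features`, `balanced_class_weights`, `bayes_threshold`, `best_threshold`, `prf`, `macro_f1`) are hypothetical. The class-weight formula mirrors the common "balanced" heuristic, n_samples / (n_classes × class_count), and the closed-form threshold assumes the scores are already calibrated fault probabilities.

```python
"""Stdlib-only sketch of imbalance-aware fault scoring (illustrative only).

Assumptions: binary fault (1) vs. no-fault (0) target, scores that are
already calibrated probabilities, and hypothetical misclassification costs.
"""
from collections import Counter


def trailing_features(series, lags=(1, 2), window=3):
    """Lag and trailing-mean features for one hourly sensor channel.

    Each row uses only readings at or before hour t, so no future data
    leaks into the features. Rows without enough history get None.
    """
    rows = []
    for t in range(len(series)):
        row = {f"lag_{k}": (series[t - k] if t >= k else None) for k in lags}
        if t + 1 >= window:
            recent = series[t + 1 - window : t + 1]  # window ending at hour t
            row[f"mean_{window}h"] = sum(recent) / window
        else:
            row[f"mean_{window}h"] = None
        rows.append(row)
    return rows


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def bayes_threshold(cost_fp, cost_fn):
    """Flag a fault when p * cost_fn >= (1 - p) * cost_fp (calibrated p)."""
    return cost_fp / (cost_fp + cost_fn)


def total_cost(probs, y_true, threshold, cost_fp, cost_fn):
    """Summed cost of false alarms and missed faults at one threshold."""
    cost = 0.0
    for p, y in zip(probs, y_true):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost


def best_threshold(probs, y_true, cost_fp, cost_fn):
    """Return the validation threshold with the lowest total cost.

    Candidates are the observed scores plus +inf ("never flag"); ties
    resolve to the lowest threshold, i.e. the more recall-oriented choice.
    """
    candidates = sorted(set(probs)) + [float("inf")]
    return min(candidates,
               key=lambda thr: total_cost(probs, y_true, thr, cost_fp, cost_fn))


def prf(y_true, y_pred, cls):
    """Precision, recall and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the rare class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(prf(y_true, y_pred, c)[2] for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy validation data, invented for illustration only.
    y_val = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    p_val = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.25, 0.55, 0.80]

    print(balanced_class_weights(y_val))
    # Hypothetical costs: a missed fault is 10x worse than a false alarm.
    thr = best_threshold(p_val, y_val, cost_fp=1.0, cost_fn=10.0)
    y_hat = [1 if p >= thr else 0 for p in p_val]
    print(thr, prf(y_val, y_hat, 1), macro_f1(y_val, y_hat))
```

With well-calibrated probabilities, the closed-form rule p ≥ C_FP / (C_FP + C_FN) gives the cost-minimising cut-off directly (1/11 ≈ 0.091 for the toy costs); the empirical search is a fallback when calibration is imperfect. Recall alone can be misleading here: a threshold that flags every hour attains recall 1.0 with precision equal to the fault prevalence π, so the fault F1 reduces to 2π/(1 + π). Reporting precision, macro-F1, or total cost alongside recall guards against that.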
This work is licensed under a Creative Commons Attribution 4.0 International License.