IDENTIFYING RISK FACTORS OF LATE HIV DIAGNOSIS USING OPTIMIZED MACHINE LEARNING ALGORITHM

Maryam Farhadian; Samad Moslehi; Mohammad Mirzaei

doi:10.15789/2220-7619-ITC-17896

Authors: Farhadian M.¹, Moslehi S.¹, Mirzaei M.²
Affiliations:
1. Hamadan University of Medical Sciences, Hamadan, Iran
2. Center for Disease Control & Prevention, Hamadan, Iran
Section: ORIGINAL ARTICLES
Submitted: 23.03.2025
Accepted: 19.05.2025
URL: https://iimmun.ru/iimm/article/view/17896
DOI: https://doi.org/10.15789/2220-7619-ITC-17896
ID: 17896

Abstract

Background: Early detection of HIV infection is essential for clinical diagnosis, preventing transmission, and ensuring the safety of blood products. Individuals diagnosed late with HIV may unknowingly transmit the virus and, once diagnosed, may experience worse health outcomes. This study therefore aims to identify the characteristics associated with late HIV diagnosis.

Methods: In this retrospective cohort study, data on 236 patients with HIV infection in Hamadan, western Iran, were collected from 2011 to 2022, including recorded CD4 counts. Late HIV diagnosis was defined as a CD4 count ≤ 350 cells/mm³. First, the Extreme Gradient Boosting (XGBoost) and Random Forest (RF) algorithms were used to identify important variables. Subsequently, Logistic Model Tree (LMT), Classification and Regression Tree (CART), Deep Neural Network (DNN), and Support Vector Machine (SVM) models were developed on the clinical and demographic variables using a 70/30 training/test split. Finally, the optimal model was selected based on accuracy and F1-score. Analyses were performed in Python version 3.10.
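The labelling rule, the 70/30 split, and the two selection metrics described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the random seed, and the CD4 values are all hypothetical, and only the 350 cells/mm³ cut-off and the 70/30 ratio come from the abstract.

```python
import random

LATE_CD4_THRESHOLD = 350  # cells/mm^3, the cut-off stated in the Methods


def is_late_diagnosis(cd4_count):
    """Return 1 for a late diagnosis (CD4 <= 350 cells/mm^3), else 0."""
    return 1 if cd4_count <= LATE_CD4_THRESHOLD else 0


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split it 70/30 into train/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical CD4 counts, invented for illustration only.
cd4_counts = [120, 340, 350, 351, 600, 90, 410, 275, 800, 330]
labels = [is_late_diagnosis(c) for c in cd4_counts]
train, test = split_70_30(list(zip(cd4_counts, labels)), seed=42)
```

Any of the candidate models could then be fitted on `train` and compared on `test` with `accuracy` and `f1_score`. A stratified split may be preferable when one class is rare, although the abstract does not say whether stratification was used.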

Results: Age, logarithm of Viral Load (LVL), White Blood Cell (WBC), Red Blood Cell (RBC), Lymphocyte (Lym), Hematocrit (Hct), Platelet (PLT), Hemoglobin (Hb), and clinical stage each had a relative importance above 6%. Among the models developed on these important variables, CART was selected as the optimal model, with F1-score and accuracy values of 0.887 and 0.801, and 0.897 and 0.822 for the training data, respectively. The AUC obtained for CART was 0.918.
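The two quantities reported in the Results, a relative-importance cut-off and an AUC, can be illustrated with a short sketch. It assumes that relative importance means raw importance scores rescaled to sum to 100%, which the abstract does not state explicitly. The raw scores, the names `other_a` and `other_b`, and the function names are hypothetical and are not the study's values.

```python
def relative_importance(raw_scores):
    """Scale raw importance scores so that they sum to 100 (percent)."""
    total = sum(raw_scores.values())
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


def select_important(raw_scores, threshold_pct=6.0):
    """Keep variables whose relative importance exceeds the threshold."""
    rel = relative_importance(raw_scores)
    return [name for name, pct in rel.items() if pct > threshold_pct]


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical raw scores, invented for illustration; not the study's values.
raw = {"age": 40, "LVL": 30, "WBC": 21, "other_a": 5, "other_b": 4}
selected = select_important(raw)
print(selected)
```

The pairwise form of `auc` is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients. Larger datasets would normally use a rank-based implementation.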

Conclusions: Late diagnosis of HIV infection is a substantial problem. Developing an algorithm that can accurately and interpretably detect disease characteristics, such as CART, could be essential for identifying the characteristics that influence late HIV diagnosis and for informing clinical and therapeutic decisions.
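The interpretability attributed to CART comes from its building block: a single threshold rule chosen to minimize Gini impurity. The sketch below finds one such split on one variable. The hemoglobin values, the labels, and the direction of the resulting rule are invented for illustration, and the function names are hypothetical; a full CART applies this search recursively across all variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the threshold on one variable that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct values;
    the rule is "value <= threshold" versus "value > threshold".
    """
    distinct = sorted(set(values))
    best_threshold, best_impurity = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical hemoglobin values (g/dL) and late-diagnosis labels,
# invented for illustration only.
hb = [9.5, 10.0, 11.0, 13.5, 14.0, 15.0]
late = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(hb, late)
print(f"if Hb <= {threshold}: predict late diagnosis (impurity {impurity})")
```

The output is a rule a clinician can read directly, which is the property the Conclusions highlight.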

Keywords

Machine Learning, Deep Learning, Decision Tree, HIV/AIDS, Classification.

About the authors

Maryam Farhadian

Hamadan University of Medical Sciences, Hamadan, Iran

Email: maryam_farhadian80@yahoo.com
ORCID iD: 0000-0002-6054-9850

Ph.D., Associate Professor of Biostatistics, Department of Biostatistics, School of Public Health and Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran

Samad Moslehi

Hamadan University of Medical Sciences, Hamadan, Iran

Email: samadmoslehi999@gmail.com
ORCID iD: 0000-0003-1597-7327

Ph.D., Assistant Professor of Biostatistics, Department of Biostatistics, School of Public Health, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran

Mohammad Mirzaei

Center for Disease Control & Prevention, Hamadan, Iran

Author for correspondence.
Email: mirzaei3589@gmail.com
ORCID iD: 0000-0001-9428-059X

M.Sc., Disease Control Expert, Center for Disease Control & Prevention, Deputy of Health Services, Hamadan University of Medical Sciences, Hamadan, Iran

Vol 15, No 4 (2025)
