Machine learning-driven identification of key risk factors for predicting depression among nurses

Qi, Xiaoyan; Huang, Xin

doi:10.1186/s12912-025-02957-6

Research
Open access
Published: 03 April 2025

BMC Nursing volume 24, Article number: 368 (2025)

Abstract

Background

Since the outbreak of the coronavirus disease (COVID-19) in 2019, caused by SARS-CoV-2, the disease has become a global health threat due to its high infectivity, morbidity, and mortality rates. With China’s comprehensive relaxation of pandemic control policies in 2022, the risk of infection for nursing personnel has further increased.

Objectives

This study aims to identify risk factors associated with depression among nursing staff during the full lifting of COVID-19 restrictions in China in 2022 and to construct a predictive model to assess the risk.

Methods

From December 9, 2022, to April 6, 2023, a cross-sectional study was conducted in three hospitals in Anhui Province, including 293 nursing staff. The research subjects were divided into a depression group and a non-depression group, and SPSS 23.0 software was used to analyze the data of both groups. We developed four predictive machine learning models: logistic regression, support vector machine, extreme gradient boosting machine (XGBoost), and adaptive boosting (AdaBoost). The development and validation of these models utilized open-source Python libraries such as Scikit-learn and XGBoost. The models were trained and validated using a 10-fold cross-validation method, and the final model selection was based on the area under the receiver operating characteristic curve (AUC).

Results

The AUC values for the SVM, logistic regression, XGBoost, and AdaBoost models were 0.86, 0.88, 0.95, and 0.93, respectively, with F1 scores of 0.79, 0.83, 0.90, and 0.89, respectively. The XGBoost model demonstrated the highest predictive accuracy. However, the study’s findings are limited by the small sample size and single location, and further validation is needed to confirm the model’s generalizability. The extreme gradient boosting machine model, tailored for common risk factors among Chinese nursing staff, provides a powerful tool for predicting the risk of depression.

Conclusion

This model can assist clinical managers in accurately identifying and addressing potential risk factors during and after the full lifting of COVID-19 restrictions. Since the working environment and stress factors faced by nursing staff may vary across different countries, the research findings from China can promote international exchange and cooperation in the management of mental health among nursing staff. Future research should focus on larger, multi-center studies to validate the model’s performance and explore additional risk factors.

Clinical trial number

Not applicable, because this article reports a cross-sectional study.

Introduction

SARS-CoV-2 is the pathogen responsible for coronavirus disease 2019 (COVID-19), which is highly contagious and has high incidence and mortality rates. Globally, COVID-19 has become a significant health risk. The World Health Organization quickly declared COVID-19 a global pandemic. As of January 2021, there had been approximately 98 million confirmed cases and 2 million deaths worldwide. This has placed a tremendous burden on medical institutions, particularly on healthcare workers. As front-line medical providers for COVID-19 patients, nurses are at a higher risk of infection. In China, the government has implemented a series of epidemic prevention measures, including personal isolation, nucleic acid testing, the use of Chinese antigen testing, vaccination, social distancing, and mask-wearing [1]. However, on December 27, 2022, the National Health Commission of China issued a comprehensive lifting notice, advising individuals to conduct routine nucleic acid monitoring and temperature checks only when their body temperature exceeds 38.5 degrees Celsius [2]. The new round of infection peaks has led to a severe shortage of medical resources, a lack of medical staff, and significant adjustments in the medical work environment.

As the main force on the medical front-line, nurses have faced a significant increase in COVID-19 infections following the full outbreak of the pandemic. While caring for COVID-19 patients, nurses also face ongoing infection risks. Relevant data show that after the full implementation of COVID-19 prevention measures, the number of infected nurses has shown a significant upward trend, and infected nurses exhibit more pronounced psychological health issues [3]. Depressive symptoms, as a typical manifestation, have been found in previous studies [4, 5]. However, existing research mainly addresses the psychological state of medical staff as a whole during COVID-19 and its influencing factors, rather than focusing specifically on nurses. It is crucial to accurately assess the depression risk of nurses at different stages and provide targeted intervention measures, especially during multiple public health emergencies.

Previous studies have confirmed that machine learning methods can be effectively used for risk monitoring. For instance, a study utilizing data from the United States National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018 employed machine learning models to identify risk factors for depression [6]. Another study during the COVID-19 pandemic used machine learning methods to study mental health disorders among nurses in the Asia-Pacific region, with an accuracy exceeding 0.784 [7]. However, comprehensive research on the risk factors for depression among nurses after the full lifting of restrictions is still lacking.

This study develops personalized depression risk prediction models using machine learning to accurately identify the depression risk of nurses after the full lifting of COVID-19 restrictions. The goal is to provide an additional diagnostic method for nurses in this period and to alleviate their depression risk. The study aims to fill the gap described above by identifying key risk factors and constructing a predictive model to assess the risk.

Methods

Research design

This quantitative, cross-sectional study utilized an online survey tool.

Sample and setting

Participants in this study were clinical nurses from hospitals of different levels in Hefei City, Anhui Province. As the central area of Anhui Province, Hefei had a total population of 9.634 million by 2023, with the number of nurses exceeding 3,000. The three hospitals involved in the study were a first-level hospital (fewer than 100 beds), a second-level hospital (fewer than 500 beds), and a third-level hospital (more than 1,000 beds). Nurses included in the study had to meet the following five criteria: (1) registered nurses with a valid license; (2) clinical nurses; (3) employed at their current employer for at least six months; (4) working in the hospital during the period when COVID-19 restrictions were fully lifted; (5) consented to participate in the study. An anonymous online survey was conducted through the Wenjuanxing platform, and the questionnaire was distributed to nurses via WeChat. The sample size was determined using G*Power software (Version 3.1.9.7). The calculation was based on the following parameters: an effect size of 0.3 (considering the moderate effect size for detecting differences in depression prevalence between groups), an alpha level of 0.05, and a power (1 − β) of 0.80. These parameters were chosen based on recommendations from prior studies on sample size estimation for cross-sectional surveys, particularly those involving mental health outcomes [8, 9]. The estimated sample size required for the study was 268 participants. To account for potential non-response and data loss, 500 questionnaires were distributed; 324 nurses responded, yielding a response rate of 64.8%. After excluding invalid questionnaires, a total of 293 valid questionnaires were obtained, with a validity rate of 90.4%. All participants were allowed to participate in the study. The study was conducted in accordance with the ethical principles laid out in the Declaration of Helsinki [10].
The research protocol was approved by the Research Ethics Review Committee of Anhui Medical University (Ethical approval number: EYLL-2021-018). All participants were informed of the study’s purpose, methods, potential risks and benefits, and their right to withdraw at any time without penalty. Written informed consent was obtained from all participants prior to their involvement in the study. The study ensured the protection of participants’ privacy and confidentiality of their personal information.

Instrument

The severity of insomnia was assessed using the Insomnia Severity Index (ISI) scale, which was developed by Chung, Kan, and Yeung [11]. It is used to evaluate an individual’s subjective experience of insomnia over the past two weeks. The scale consists of 7 items, with each item scored from 0 to 4, and the total score ranges from 0 to 28. The higher the total score, the more severe the individual’s insomnia, divided into four levels: no significant insomnia (0–7 points), sub-threshold insomnia (8–14 points), clinical insomnia (15–21 points), and severe insomnia (>21 points). In related studies, the Cronbach’s alpha coefficient of this scale is 0.891, indicating good reliability [11]. The Perceived Stress Scale-10 (PSS-10) was used to assess the stress levels of the subjects, translated by [12]. The scale consists of 10 items, scored on a 5-point scale, with the numbers 1 to 5 corresponding to “never,” “sometimes,” “occasionally,” “often,” and “always,” respectively. The higher the individual’s total score, the greater the perceived stress. In related studies, the Cronbach’s alpha coefficient of this scale is 0.761.
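The ISI banding described above can be sketched as a small scoring helper. This is only an illustration under stated assumptions: the function name `isi_severity` and the input format (a list of seven item scores) are hypothetical and are not taken from the study's analysis code.

```python
def isi_severity(item_scores):
    """Return (total, band) for seven ISI item scores, each scored 0-4."""
    if len(item_scores) != 7:
        raise ValueError("the ISI has exactly 7 items")
    if any(s not in range(5) for s in item_scores):
        raise ValueError("each ISI item is scored from 0 to 4")
    total = sum(item_scores)  # possible range: 0-28
    if total <= 7:
        band = "no significant insomnia"
    elif total <= 14:
        band = "sub-threshold insomnia"
    elif total <= 21:
        band = "clinical insomnia"
    else:
        band = "severe insomnia"
    return total, band


# Hypothetical respondent, not study data
print(isi_severity([2, 3, 2, 1, 3, 2, 2]))
```

Keeping the band edges in one place makes it easy to check them against the published thresholds (0–7, 8–14, 15–21, >21).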

Despite the Cronbach’s alpha coefficient of the PSS-10 scale being 0.761, which is slightly lower than the commonly accepted ideal value (≥0.8), the scale has been widely used in mental health research and has demonstrated good reliability and validity across multiple cultural contexts [13, 14].
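For readers unfamiliar with the reliability coefficient quoted above, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below computes it with the Python standard library; the function name and the toy responses are hypothetical, and the study itself reports coefficients from related studies rather than computing them this way.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 4-item responses from 5 respondents (not study data)
toy = [[1, 2, 2, 1], [3, 3, 4, 3], [2, 2, 3, 2], [4, 5, 4, 4], [1, 1, 2, 2]]
print(round(cronbach_alpha(toy), 3))
```

When every item moves in lockstep with the others the coefficient reaches 1, and it falls as items become less consistent with one another.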

End events

The Patient Health Questionnaire-9 (PHQ-9) was used to assess the depressive status of the study subjects [15]. The PHQ-9 questionnaire is used to evaluate the frequency of depressive symptoms over the past two weeks. The questionnaire consists of 9 items, each scored from 0 to 3. Here, 0 represents “none,” 1 represents “several days,” 2 represents “more than half the days,” and 3 represents “nearly every day.” The total score ranges from 0 to 27, with a cutoff point of 10 for depression. In this study, based on the PHQ-9 scale scores of the subjects, 23 subjects with scores above 10 were included in the depression group, and the remaining 270 subjects were placed in the non-depression group.
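The grouping rule can be illustrated with a short sketch. The text gives a cutoff of 10 but does not state whether a score of exactly 10 counts as depression, so that choice is exposed as a parameter here; the default (total ≥ 10) is an assumption, and the function names are hypothetical.

```python
def phq9_total(item_scores):
    """Sum nine PHQ-9 item scores, each scored 0-3 (total range 0-27)."""
    if len(item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each PHQ-9 item is scored from 0 to 3")
    return sum(item_scores)


def in_depression_group(total, cutoff=10, inclusive=True):
    """Assumption: inclusive=True means total >= cutoff, False means total > cutoff."""
    return total >= cutoff if inclusive else total > cutoff


# Group sizes reported in the text: 23 of 293 nurses in the depression group
prevalence_pct = round(100 * 23 / 293, 2)
print(prevalence_pct)  # 7.85
```

The last line reproduces the 7.85% share of the depression group that the dataset description relies on.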

Construction of the data set

This study included a total of 293 samples, of which 7.85% belonged to the depression group and 92.15% to the non-depression group, a severe imbalance. In binary classification tasks, a ratio of negative to positive samples close to 1:1 can effectively prevent sample-induced bias. Therefore, to enhance the model’s predictive accuracy and generalizability, this study employed the Adaptive Synthetic Sampling (ADASYN) technique to address the class imbalance. ADASYN increases the number of minority class samples by synthesizing new samples, bringing their count closer to that of the majority class. Compared to traditional random oversampling methods, ADASYN not only balances negative and positive samples but also reduces the occurrence of overfitting [16]. Moreover, each continuous variable was scaled using Z-score transformation to enhance the stability of the predictive model [17]. After preprocessing the raw data, the final dataset contained 540 samples: 270 negative samples (non-depression group) and 270 positive samples (depression group). The dataset was randomly stratified into training and testing sets with a ratio of 7:3; according to relevant studies, a 7:3 ratio can effectively balance the needs of model training and evaluation when dealing with small sample data [18]. The training set was used for model selection, construction, and hyperparameter tuning, while the testing set was used to evaluate the final model.
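To make the oversampling step concrete, the following is a simplified, standard-library sketch of the ADASYN idea: minority samples whose nearest neighbours are mostly majority samples receive more synthetic neighbours, and each synthetic sample is an interpolation between a minority sample and one of its minority neighbours. It is not the implementation used in the study. The function name, the toy data, and the choice to allocate synthetic samples by weighted random draws (rather than by rounding per-sample quotas) are assumptions made for illustration.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return (X_new, y_new) with synthetic minority rows appended.

    Requires at least two minority rows.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min, n_maj = len(min_idx), len(y) - len(min_idx)
    n_synth = int((n_maj - n_min) * beta)  # beta = 1 targets a 1:1 ratio

    def nearest(i, pool):
        # k nearest rows to X[i] by Euclidean distance, excluding i itself
        others = [j for j in pool if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

    # Difficulty weight: share of majority rows among the k nearest neighbours
    weights = [sum(y[j] != minority for j in nearest(i, range(len(y)))) / k
               for i in min_idx]
    if sum(weights) == 0:
        weights = [1.0] * n_min  # no hard cases: fall back to uniform draws

    X_new, y_new = [list(row) for row in X], list(y)
    for i in rng.choices(min_idx, weights=weights, k=n_synth):
        j = rng.choice(nearest(i, min_idx))  # a nearby minority row
        lam = rng.random()                   # interpolation factor in [0, 1)
        X_new.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(minority)
    return X_new, y_new


# Hypothetical toy data with the class counts from the text (270 vs. 23)
gen = random.Random(42)
X = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(270)] + \
    [[gen.gauss(1.5, 1), gen.gauss(1.5, 1)] for _ in range(23)]
y = [0] * 270 + [1] * 23
X_bal, y_bal = adasyn_sketch(X, y)
print(len(y_bal), y_bal.count(0), y_bal.count(1))  # 540 270 270
```

With 270 majority and 23 minority rows, 247 synthetic rows are added, which matches the 540-sample balanced dataset described above. The sketch follows the order given in the text, in which oversampling precedes the train/test split.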

Model establishment and evaluation

In the field of machine learning, predicting which algorithm performs best often requires experimental validation. Therefore, this study employed four different algorithms to train common machine learning (ML) models, including logistic regression, support vector machines (SVM), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost). The development and validation of these models utilized open-source Python libraries such as Scikit-learn and XGBoost. This study used a 10-fold cross-validation method to select the model: the training set was divided into 10 non-overlapping parts, with 9 parts used for model training and 1 part used as internal validation data, and the process was repeated ten times. k-fold cross-validation is a standard technique for assessing model performance, and it is more reliable than simply holding out a validation set because it provides information about performance variability [19]. During the training process, grid search was used to optimize the model’s hyperparameters, and the final model selection was based on the evaluation criterion of the area under the receiver operating characteristic curve (AUC). Finally, the successfully trained models were independently evaluated on the test set, with accuracy, sensitivity, and specificity calculated based on the confusion matrix. The predictive performance of the models was comprehensively assessed through the AUC value.
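The two building blocks of this procedure, the 10-fold partition and the AUC criterion, can be sketched without any third-party library. The study used Scikit-learn and XGBoost; the helper names below (`k_fold_indices`, `auc`) are hypothetical stand-ins intended only to show the mechanics, and model fitting and grid search are deliberately left out.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k non-overlapping folds of n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # every k-th shuffled index per fold
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Fold sizes for a training set of 378 rows, the size reported in the Results
print([len(val) for _, val in k_fold_indices(378)])
# Hypothetical scores: one positive is ranked below one negative
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

In a full pipeline, each candidate hyperparameter setting would be fitted on the nine training folds, scored with `auc` on the held-out fold, and the ten scores averaged.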

Ethical considerations and data collection

The Medical Research Ethics Committee of Anhui Children’s Hospital (Ethical approval number: EYLL-2021-018) approved this study. The survey was designed to connect with a nurse working group through WeChat in several hospitals. WeChat has achieved a penetration rate of 93% in first-tier cities. Nurses clicked the link to access the survey page. All participants were required to select the option “Yes, I have informed consent and am willing to participate in this study” or, if they did not wish to participate, click “No, I don’t have informed consent and am unwilling to participate in this study.” All participants were assured that their privacy would be protected in this study. To ensure a high response rate, the survey was sent to participants once a week. The survey was conducted from December 2022 to April 2023 and was completed once the COVID-19 prevalence dropped to safe levels.

Statistical methods

To ensure the robustness and interpretability of our findings, we employed a combination of statistical analysis and machine learning techniques. Initially, we used SPSS 23.0 software to conduct descriptive statistics and preliminary analyses. Measurement data that were skewed were expressed as median (IQR) and compared between groups using the Mann-Whitney U (rank-sum) test. Count data were expressed as cases (%) and compared between groups using the chi-square test or Fisher’s exact probability method. These statistical tests were crucial for identifying significant differences between groups and for selecting relevant features for subsequent machine learning modeling. A p-value of less than 0.05 was considered statistically significant.
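As an illustration of the exact test mentioned above, the sketch below computes a two-sided Fisher exact p-value for a 2×2 table from the hypergeometric distribution, summing the probabilities of all tables that are no more likely than the observed one. The study ran its tests in SPSS; the function name and the example counts here are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: rows = depression / non-depression group,
# columns = exposed / not exposed to some binary factor
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

The exact test is the usual choice when expected cell counts are small, which is plausible for a depression group of only 23 nurses.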

Results

A total of 293 clinical nurses, with a median age of 32.00 (interquartile range: 29.00–36.00), were included in this study, with females accounting for 99.32%. The research subjects had a median of 10.00 (7.00, 14.00) years of work experience, with 132 (45.05%) holding junior professional titles, 149 (50.85%) holding intermediate professional titles, and 12 (4.10%) holding associate or senior professional titles. The description of other baseline data is detailed in Table 1.

Table 1 Baseline data of research subjects

Comparison of baseline data between the depressive and non-depressive groups

Compared with the non-depressive group, nurses in the depressive group had higher total Insomnia Severity Index (ISI) and Perceived Stress Scale (PSS) scores. Additionally, a higher proportion of them had previously taken psychotropic drugs and had been isolated in hospitals, showing statistically significant differences (P < 0.05). Please refer to Table 2 for details.

Table 2 Baseline data comparison between the non-depressive and depressive groups

Comparison of baseline data between the training and validation sets after sample equalization processing

After sample equalization using the ADASYN technique, the dataset contained 540 samples: 270 negative samples (non-depressive group) and 270 positive samples (depressive group). The dataset was randomly stratified into a training set (n = 378) and a testing set (n = 162) in a ratio of 7:3. The comparison of baseline data between the two sets showed no statistically significant difference (P > 0.05), indicating homogeneity, as illustrated in Table 3.
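The set sizes above follow directly from a stratified 7:3 split of 270 + 270 samples. A minimal sketch, assuming a hypothetical helper `stratified_split` rather than the library routine the authors actually used:

```python
import random


def stratified_split(y, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of each class for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


# Balanced labels as described in the text: 270 negative, 270 positive
y = [0] * 270 + [1] * 270
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))  # 378 162
```

Each class contributes 189 samples to the training set and 81 to the testing set, so both sets keep the 1:1 class ratio.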

Table 3 Comparison of baseline data between training and validation sets after sample equalization processing

Model selection and establishment

After 10-fold cross-validation on the training set, the AUC values of the SVM, logistic regression, XGBoost, and AdaBoost models were 0.86, 0.88, 0.95, and 0.93, respectively, as shown in Table 4 and Fig. 1-A. Therefore, XGBoost was used as the final model for training on the complete training set. After grid search optimization, the main hyperparameters of the XGBoost model were determined. These include the objective (optimization objective function = binary:logistic), learning rate (learning rate = 0.3), max depth (maximum tree depth = 4), min child weight (minimum sum of instance weights required in a child node = 2), and reg lambda (L2 regularization coefficient = 1).
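The reported settings map onto XGBoost's parameter names as shown in the sketch below. The dictionary holds the values stated in the text; the search grid is purely hypothetical, since the candidate values the authors searched over are not reported.

```python
from itertools import product

# Final hyperparameters as reported in the text
xgb_params = {
    "objective": "binary:logistic",
    "learning_rate": 0.3,
    "max_depth": 4,
    "min_child_weight": 2,
    "reg_lambda": 1,
}

# Hypothetical grid (assumption): the searched values are not given in the article
grid = {
    "learning_rate": [0.1, 0.3],
    "max_depth": [3, 4, 5],
    "min_child_weight": [1, 2],
}
# Grid search simply enumerates every combination of candidate values
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(candidates))  # 12

# With the xgboost package installed, the final settings could be passed as
# xgboost.XGBClassifier(**xgb_params); that call is not executed here.
```

Each candidate would be scored by cross-validated AUC on the training set, and the best-scoring combination kept.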

Table 4 Prediction efficiency analysis of different models in the training set

Model evaluation and feature screening

On the test set, the confusion matrix analysis of the XGBoost model demonstrated its strong predictive power. The model had an AUC value of 0.92, an accuracy of 85%, a sensitivity of 84%, a specificity of 91%, a positive predictive value of 83%, a negative predictive value of 88%, and an F1 score of 0.84. Figure 1-B presents the ROC curves of the XGBoost model on both the training and test sets. Figure 2 shows the top 5 input variables that most significantly impact the predictive accuracy of the XGBoost model. These variables include the overall Perceived Stress Scale (PSS) scores of the subjects, years of work experience, overall Insomnia Severity Index (ISI) scores, age, and place of isolation. These factors have significant clinical importance in predicting the depression risk of the subjects.

Fig. 1-A shows that, on the training set, the AUC values for the SVM, logistic regression, XGBoost, and AdaBoost models were 0.86, 0.88, 0.95, and 0.93, respectively. Based on these results, XGBoost was chosen as the final model due to its highest AUC value and was trained on the entire training dataset. On the test set, the confusion matrix analysis of the XGBoost model revealed the following performance metrics: an AUC value of 0.92, an accuracy of 85%, a sensitivity of 84%, a specificity of 91%, a positive predictive value of 83%, a negative predictive value of 88%, and an F1 score of 0.84. Fig. 1-B presents the ROC curves of the XGBoost model for both the training and test sets.

Figure 2 identifies the five most influential input variables on the XGBoost model’s predictive accuracy, which are crucial for assessing the risk of depression among the subjects. These key variables comprise the subjects’ total Perceived Stress Scale (PSS) scores, their years of professional experience, their overall Insomnia Severity Index (ISI) scores, their age, and the locations where they were isolated.
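The test-set metrics listed in this section are all derived from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts in the example are hypothetical, because the article does not report the underlying confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts, not the study's confusion matrix
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
print({name: round(value, 2) for name, value in metrics.items()})
```

Reporting the raw counts alongside the percentages lets readers recompute every metric and check that they are mutually consistent.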

Discussion

The results of this study support the proposed hypothetical paradigm. The research found that during the full lifting of COVID-19 restrictions, nurses in China experienced depression [4, 5]. Traditional logistic regression analysis revealed that higher total scores on the ISI and PSS [20], previous use of psychiatric medication, and isolation in hospital were associated with depression among nurses. A preliminary literature review indicates that nurses on the front line against COVID-19 experienced moderate to severe insomnia and were prone to depression [21]. Similar studies have shown that during the epidemic prevention and control period, nurses exhibited varying degrees of depression and insomnia [22, 23]. This may be attributed to the high-intensity work patterns that nurses typically face, and even after the full lifting of restrictions, they still face the risk of infection. Compared to general nurses, they are more susceptible to depression and insomnia. Therefore, we should strengthen the early monitoring and identification of nurses who have been working since the full lifting of restrictions, especially when they exhibit typical symptoms such as insomnia. Nurses who were isolated in hospitals due to infection control measures often had to deal with a large number of infected or suspected patients. Due to inadequate preparation, long working hours, insufficient support from medical professionals, and an imbalance in the nurse-to-patient ratio, nurses typically experienced moderate to high stress levels [24]. When individuals experience long-term stress related to COVID-19, they may be at risk of developing mental health issues, such as depression and anxiety [25]. Therefore, nursing managers should prioritize reasonable isolation models or comprehensive quarantine methods to address future epidemics.
The use of psychiatric medication was related to nurses’ mental health issues after the full lifting of restrictions. This result is similar to the findings of [26], who conducted research on patients in nursing homes. During the first wave of the COVID-19 pandemic, many patients required psychiatric medication and analgesics to alleviate pain. In our study, the use of psychiatric medication has been identified as one of the risk factors for depression. This is mainly due to the lack of strict government control policies after the full lifting of restrictions, allowing people to move freely inside and outside the hospital, leading to an increase in the number of infections. Hospitals had to continue to admit infected patients, even beyond their capacity, so nurses had to care for many suspected or infected patients in a short period of time. From December 2022 to March 2023, within three months after the first wave of COVID-19, most studies [5, 27, 28] indicate that nurses have a significant tendency towards depression. In order to continue working and cope with the surge in COVID-19 cases after restrictions were fully lifted, some nurses chose to take psychiatric medication to manage their mental health issues. As nursing managers, it is not only necessary to follow traditional methods of managing nursing staff but also to choose appropriate strategies that adapt to the current environment and prepare for similar future epidemics.

Experts and researchers have begun to explore how machine learning algorithms can be used to diagnose individuals with depression [29]. A study utilizing data from the United States National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018 employed machine learning models to identify risk factors for individuals with depression. The study developed models using Support Vector Machine (SVM), CatBoost, Backpropagation (BP), and deep learning algorithms, and compared the accuracy of these four methods in predicting depression in the test set. The best accuracy exceeded 0.8 [6]. The research indicated that sleep disturbances, age, and perceived stress could be significant risk factors in detecting depression. At the onset of the COVID-19 pandemic, machine learning methods were used to study mental health disorders among nurses in the Asia-Pacific region. Gradient boosting, random forests, and LightGBM, which are machine learning models based on decision trees, were used to predict psychological stress characteristics (such as depression), with an accuracy exceeding 0.784 [7]. Age and years of employment were considered factors affecting the mental health of nurses. For instance, due to a lack of sufficient work experience during COVID-19, younger nurses were more prone to adverse mental health outcomes. During the COVID-19 pandemic, a study was conducted using machine learning methods (XGBoost) to identify depression in the Chinese population, involving data from 29,841 participants. The accuracy of XGBoost was 0.75, higher than similar studies using logistic regression. This suggests the stability and reliability of the XGBoost model. The results indicated that, due to potential cognitive control deficits, the elderly may be more susceptible to comorbidities and depression during the pandemic than younger individuals.
With lower perceived stress and hospital isolation, some nurses were more likely to feel stressed due to concerns about the virus, exposure to it, and the potential for close contact. In these cases, the XGBoost model outperformed the logistic regression model, providing methodological guidance for identifying risk factors for depression during the COVID-19 period [30, 31]. Nonetheless, comprehensive research on the risk factors for depression among nurses after the full lifting of restrictions is still lacking.

The results of both logistic regression and the four machine learning methods indicate that PSS scores, ISI scores, and places of isolation are shared risk factors that can be used to detect depression among nurses. The results of the four machine learning methods differ in that they also include years of work experience and age, but not the previous use of psychiatric medication. The main reasons for these findings are as follows. The median age of our study participants is 32.00 years (interquartile range: 29.00 to 36.00 years); they have limited work experience, are still young, and thus lack the necessary coping mechanisms for emergencies. Moreover, they have excessive work burdens and lack professional training, all of which make nurses more susceptible to depression. Psychiatric medication was found not to be a risk factor for nurse depression, contrary to the results of [32]. The main reason is that although psychiatric medication can alleviate mental illnesses in nurses, professionals, such as nurses, are aware that these medications can also lead to unwanted side effects. Traditional views equate medication with chronic poison. In China, despite the high risk of developing depression for most nurses, they usually do not have time to consider this issue. The heavy workload they face prevents them from properly taking care of their own health, and mental health issues are often neglected. Many nurses simply attribute their fatigue to physical exhaustion and do not consider themselves in need of psychiatric medication.

This study established a classic logistic regression model and three other machine learning models to examine the risk factors for nurse depression during the full lifting of COVID-19 restrictions. After comparison, the XGBoost model was ultimately determined to perform the best. The study showed a significant discriminative effect on the prevalence of risk factors for nurse depression after complete lifting (F1 score = 0.90 [0.86–0.93], AUC = 0.920). A study on the mental health status of nursing personnel and models during public health emergencies yielded similar results to this study, indicating that random forests, artificial neural networks, support vector machines, and gradient boosting machines can effectively predict the mental health status of nursing personnel during public health emergencies. The main reason is that in the study by [33], public health events included data on Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), and COVID-19. However, the study did not continue to detect which machine learning model is the best.

Compared with other machine learning algorithms, the XGBoost algorithm demonstrates fast training time, high efficiency, and strong generalization ability. It is widely used for both regression and classification [34, 35]. In the analysis of the relative importance of indicators, the XGBoost model concentrates high relative importance in a small number of indicators. Compared with the other three machine learning algorithms, the XGBoost model can achieve high accuracy with fewer features. In clinical practice, where indicators are often partial or missing, this is particularly useful. Therefore, by evaluating the performance of the models, this study successfully identified key risk factors for depression among nurses and developed effective predictive models using various machine learning algorithms. Nevertheless, we recognize that other advanced analytical methods, such as random forest and SHAP (Shapley Additive Explanations) analysis, may offer deeper insights into the assessment of depression risk among nurses. Random forest is renowned for its robust predictive power and adaptability to complex data structures, capable of handling a large number of input variables and providing more stable prediction outcomes. Meanwhile, SHAP analysis enhances model interpretability by quantifying the contribution of each feature to the model’s predictions, helping to reveal the complex relationships between risk factors and depression risk. Therefore, we suggest that future research explore the application of these methods in the assessment of depression risk among nurses to further optimize predictive models and provide more targeted intervention strategies for clinical practice.

Conclusion and limitation

When it comes to predicting the risk of depression in nurses who worked during the full lifting of COVID-19 restrictions, the XGBoost model outperforms the other three machine learning algorithms. This makes it useful for screening out the high-risk group of nurses who experienced depression during this period, based on early clinical characteristics. However, it is important to note the limitations of this study. First, the sample is relatively small and was drawn from a single location, which may limit the generalizability of the findings. Future studies should consider larger, multi-center samples to validate the model’s performance across different populations. Second, while the XGBoost model demonstrated high predictive accuracy in this study, its performance may vary in other datasets. Therefore, external validation on more diverse datasets is necessary to confirm the model’s robustness. Third, the study only focused on a limited set of risk factors and did not explore other potential factors such as socioeconomic status, personal history of mental illness, or support systems. Future research should consider a broader range of variables to provide a more comprehensive understanding of depression risk among nurses. In conclusion, this study provides valuable insights into the risk factors for depression among nurses during the COVID-19 pandemic. The XGBoost model identified key risk factors and demonstrated strong predictive performance. However, further research is needed to address the limitations identified and to develop more robust and generalizable models.

Relevance for clinical practice

We recommend raising awareness of psychological support services for nurses following the comprehensive liberalization phase of the COVID-19 pandemic. This includes creating counseling and assistance programs, promoting initiatives that support nurses’ psychological well-being, and ensuring access to psycho-behavioral therapy where necessary. The findings of this study can be used to identify the characteristics of nurses who are more prone to depression and other health issues. Using the best-performing model to detect risk factors for depression allows interventions to be targeted at those who most need support, especially medical workers at higher risk of mental health issues, such as nurses. For future research, we suggest a follow-up evaluation of coping mechanisms and assessment techniques to analyze the risks during and after the pandemic.

Data availability

The data supporting the findings of this study are openly available in a public repository that issues datasets with DOIs. Repository URL: https://dx.doi.org/10.21203/rs.3.rs3977935/v1. The data have been anonymized to protect the privacy of the participants. Additional information related to the study design, analysis code, and supplementary materials can be provided upon reasonable request to the corresponding author.

References

Xi H, Chen Y, Sun C, Chen X, Wang X. Spatiotemporal characteristics and prevention and control measures of SARS-CoV-2 Omicron pandemic in Shanghai. Shanghai Prev Med. 2023;35(1):22–27. https://doiorg.publicaciones.saludcastillayleon.es/10.19428/j.cnki.sjpm.2023.22301.
Tu HW, Gan P, Zhong RX. Transmission route of SARS-CoV-2 and personal health intervention: a review on research advances. Chin J Public Health. 2022;38(8):1011–17. https://doiorg.publicaciones.saludcastillayleon.es/10.11847/zgggws1138953.
Murat M, Kose S, Savaser S. Determination of stress, depression and burnout levels of front-line nurses during the COVID-19 pandemic. Int J Ment Health Nurs. 2021;30(2):533–43. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/inm.12818.
Wu WM, DJP, Ren W, et al. Investigation on the mental health status and influencing factors of medical personnel after the full implementation of Covid-19 epidemic. Chin J Manip Rehabil Med. 2023;14(9):100–05. https://doiorg.publicaciones.saludcastillayleon.es/10.19787/j.issn.1008-1879.2023.09.025.
Xiao J, Liu L, Peng Y, Wen Y, Lv X, Liang L, Fan Y, Chen J, Chen Y, Hu H, Peng W, Wang H, Luo W. Anxiety, depression, and insomnia among nurses during the full liberalization of COVID-19: a multicenter cross-sectional analysis of the high-income region in China. Front Public Health. 2023;11:1179755. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpubh.2023.1179755
Zhang C. A comparative study on the diagnosis and prediction of depression using machine learning algorithms- Based on NHANES data. 2022.
Dong Y, Yeo MC, Tham XC, Danuaji R, Nguyen TH, Sharma AK, Rn K, Pv M, Tai ML, Ahmad A, Tan B, Ho RC, Chua MCH, Sharma VK. Investigating psychological differences between nurses and other health care workers from the Asia-Pacific region during the early phase of COVID-19: machine learning approach. JMIR Nurs. 2022;5(1):e32647.
Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39(2):175–91.
Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
World Medical Association. Declaration of Helsinki: ethical principles for medical research involving human subjects. 2024.
Chung KF, Kan KK, Yeung WF. Assessing insomnia in adolescents: comparison of insomnia severity index, Athens insomnia scale and sleep quality index. Sleep Med. 2011;12(5):463–70. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.sleep.2010.09.019.
Lu W, Bian Q, Wang W, Wu X, Wang Z, Zhao M. Chinese version of the Perceived Stress Scale-10: a psychometric study in Chinese university students. PLoS One. 2017;12(12):e0189543. https://doiorg.publicaciones.saludcastillayleon.es/10.1371/journal.pone.0189543.
Cohen S, Williamson G. Perceived stress in a probability sample of the United States. In: Spacapan S, Oskamp S, editors. The Social Psychology of Health. Sage Publications; 1988;31–67.
Lee E. Review of the psychometric evidence of the perceived stress scale. Asian Nurs Res. 2003;1(2):125–32.
Bian C, He X, Qian J, Wu W, Li C. Application of the patient health questionnaire depression symptom cluster scale in a general hospital. J Tongji Univ (Medical Science). 2009;30(05):136–40.
Chen S, Tang Y, Liu Y. ADASYN: adaptive synthetic sampling approach for imbalanced learning. Proc IEEE Int Joint Conf Neural Netw. 2004;3(1):1322–29.
Chubb H, Simpson JM. The use of Z-scores in paediatric cardiology. Ann Pediatr Cardiol. 2012;5(2):179–84. https://doiorg.publicaciones.saludcastillayleon.es/10.4103/0974-2069.99622.
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc 14th Int Joint Conf Artif Intell. 1995;2(1):1137–43.
Lian X, Qi J, Yuan M, Li X, Wang M, Li G, et al. Study on risk factors of diabetic peripheral neuropathy and establishment of a prediction model by machine learning. BMC Med Inform Decis Mak. 2023;23(1):146. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12911-023-02232-1.
Jian D, Dong G, You R, Hui C, Guoxin L. The relationship between perceived stress and the severity of insomnia among officers and soldiers in dispersed remote troops: a moderated mediation model. Psychologies. 2023;12(18):21–23. https://doiorg.publicaciones.saludcastillayleon.es/10.19738/j.cnki.psy.2023.12.006.
Kandemir D, Temiz Z, Ozhanli Y, Erdogan H, Kanbay Y. Analysis of mental health symptoms and insomnia levels of intensive care nurses during the COVID-19 pandemic with a structural equation model. J Clin Nurs. 2022;31(5-6):601–11. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/jocn.15918.
Martin-Rodriguez LS, Escalda-Hernandez P, Soto-Ruiz N, Ferraz-Torres M, Rodriguez-Matesanz I, Garcia-Vivar C. Mental health of Spanish nurses working during the COVID-19 pandemic: a cross-sectional study. Int Nurs Rev. 2022;69(4):538–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/inr.12764.
Mu Y, Duan Y, Liu D, Li Y, Yu Y. A survey on anxiety and depression among nurses in a cardiovascular hospital during the COVID-19 pandemic. Chinese Journal of Nursing. 2020;55(S1):93–4.
Hendy A, Abozeid A, Sallam G, Abboud Abdel Fattah H, Ahmed Abdelkader Reshia F. Predictive factors affecting stress among nurses providing care at COVID-19 isolation hospitals at Egypt. Nurs Open. 2021;8(1):498–505. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/nop2.652.
Morelen D, Najm J, Wolff M, Daniel K. Taking care of the caregivers: the moderating role of reflective supervision in the relationship between COVID-19 stress and the mental and professional well-being of the IECMH workforce. Infant Ment Health J. 2022;43(1):55–68. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/imhj.21956.
Stevenson DG, Busch AB, Zarowitz BJ, Huskamp HA. Psychotropic and pain medication use in nursing homes and assisted living facilities during COVID-19. J Am Geriatr Soc. 2022;70(5):1345–48. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/jgs.17739.
Kim SC, Quiban C, Sloan C, Montejano A. Predictors of poor mental health among nurses during COVID-19 pandemic. Nurs Open. 2021;8(2):900–07. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/nop2.697.
Liang L, Yuan T, Guo X, Meng C, Lv J, Fei J, Mei S. The path of depression among frontline nurses during COVID-19 pandemic: a fuzzy-set qualitative comparative analysis. Int J Ment Health Nurs. 2022;31(5):1239–48. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/inm.13033.
Priyadharshini M, Banu AF, Sharma B, Chowdhury S, Rabie K, Shongwe T. Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning. Sensors (Basel). 2023;23(15). https://doiorg.publicaciones.saludcastillayleon.es/10.3390/s23156836.
Ren Z, Xin Y, Ge J, Zhao Z, Liu D, Ho RCM, Ho CSH. Psychological Impact of COVID-19 on College Students After School Reopening: a Cross-Sectional Study Based on Machine Learning. Front Psychol. 2021;12:641806. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyg.2021.641806.
Tian Z, Qu W, Zhao Y, Zhu X, Wang Z, Tan Y, Jiang R, Tan S. Predicting depression and anxiety of Chinese population during COVID-19 in psychological evaluation data by XGBoost. J Affect Disord. 2023;323:417–25. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2022.11.044.
Liu W, Gerdtz MF, Liu TQ. A survey of psychiatrists’ and registered nurses’ levels of mental health literacy in a Chinese general hospital. Int Nurs Rev. 2011;58(3):361–69. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/j.1466-7657.2011.00883.x.
Ma X. A study on the psychological health status and model of nursing personnel under sudden public health incidents. Chin J Gen Pract. 2023;21(16):2263–67. https://doiorg.publicaciones.saludcastillayleon.es/10.12104/j.issn.1674-4748.2023.16.028.
Huiping Li AH. Machine learning for prediction of all-cause mortality in critically ill patients. J Pra Med. 2020;36(4):466–69. https://doiorg.publicaciones.saludcastillayleon.es/10.3969/j.issn.1006-5725.2020.04.009.
Qiao HM, Sheng Y. Classification prediction and application of diabetes based on XGBoost model. Mod Instrum Med Treat. 2023;29(4):1–7. https://doiorg.publicaciones.saludcastillayleon.es/10.11876/mimt202304001.


Acknowledgments

We are grateful to all Chinese nurses who are fighting COVID-19 on the front lines.

Funding

This study was supported by the Nursing Project of the Anhui Institute of Translational Medicine (No. 2024zhyx-h1-B15), the Youth Science Fund of Anhui Medical University (No. 2022xkj014), and the Social Science Fund of the Anhui Provincial Department of Education (No. 2022AH050635).

Author information

Authors and Affiliations

School of Nursing, Anhui Medical University, No.15 Feicui Road, Hefei, 230601, China
Xiaoyan Qi
The Taikang Health and Wellness Industry Research Institute, Anhui Medical University, Hefei, China
Xiaoyan Qi
School of Management, Anhui University, Hefei, China
Xin Huang

Authors

Xiaoyan Qi
Xin Huang

Contributions

Xiaoyan Qi and Xin Huang accept full accountability for the work and made substantial contributions to its conception and planning; to the collection, analysis, and interpretation of data; and to drafting or critically revising the manuscript for important intellectual content.

Corresponding author

Correspondence to Xin Huang.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Ethics Committee of Anhui Children’s Hospital (Ethical approval number: EYLL-2021-018) and conducted in accordance with the Declaration of Helsinki [10]. All methods followed relevant guidelines and regulations. Participants provided informed consent before taking part in the virtual survey, acknowledging the voluntary nature of their involvement. Personal information was secured and preserved in compliance with Chinese ethical laws.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Qi, X., Huang, X. Machine learning-driven identification of key risk factors for predicting depression among nurses. BMC Nurs 24, 368 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12912-025-02957-6


Received: 01 January 2025
Accepted: 12 March 2025
Published: 03 April 2025
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12912-025-02957-6
