Can large language models facilitate the effective implementation of nursing processes in clinical settings?
BMC Nursing volume 24, Article number: 394 (2025)
Abstract
Background
The quality of the generative nursing diagnoses and plans reported in existing research remains a topic of debate, and previous studies have primarily utilized ChatGPT as the sole large language model.
Purpose
To explore the quality of nursing diagnoses and plans generated by a prompt framework across different large language models (LLMs) and assess the potential applicability of LLMs in clinical settings.
Methods
We designed a structured nursing assessment template and iteratively developed a prompt framework incorporating various prompting techniques. We then evaluated the quality of nursing diagnoses and care plans generated by this framework across two distinct LLMs (ERNIE Bot 4.0 and Moonshot AI), while also assessing their clinical utility.
Results
The scope and nature of the nursing diagnoses generated by ERNIE Bot 4.0 and Moonshot AI were similar to the “gold standard” nursing diagnoses and care plans. The structured assessment template effectively and comprehensively captured the key characteristics of neurosurgical patients, while the strategic use of prompting techniques enhanced the generalization capabilities of the LLMs.
Conclusion
Our research further confirms the potential of LLMs in clinical nursing practice. However, significant challenges remain in effectively integrating LLM-assisted nursing processes into clinical environments.
Introduction
Large Language Models (LLMs), based on the Transformer neural network architecture, are deep learning models with powerful natural language processing capabilities. One of their most attractive features is their ability to engage in human-like conversations, answer questions across various professional domains, complete translation tasks, and generate programming code, among other complex language tasks [1, 2]. In recent years, the application of LLMs in nursing research, education, and practice has grown exponentially [3,4,5]. In clinical nursing practice, one of the most promising applications of LLMs is their capacity to generate personalized nursing diagnoses and care plans [6]. These two components are fundamental to the nursing process [7]. However, despite more than half a century since the concept of the nursing process was introduced, its implementation in clinical settings remains suboptimal. Nursing care is often based on routines and medical orders, with unplanned care being prevalent [8]. The primary reasons for unplanned care are a lack of knowledge and skills in developing care plans, as well as nurse staffing shortages [8]. LLMs may offer a novel approach to addressing these challenges.

Recent studies have explored the use of ChatGPT to generate nursing diagnoses and care plans for various patient populations and care contexts, including obesity, mental health care, perinatal care, and lung cancer [9,10,11,12]. While these studies acknowledge the potential value of LLMs in nursing applications, concerns about the accuracy and reliability of LLM-generated nursing texts remain [13]. The quality of generated content is partly determined by the quality of the prompts, the input texts or questions provided when interacting with LLMs. Designing effective prompts is crucial for obtaining the desired outcomes [14]. Previous research has highlighted the value of structured prompts in enhancing the performance and generalization capabilities of LLMs [15].

Notably, the only LLM applied in related studies to date has been ChatGPT; research on the use of other LLMs in nursing is lacking. We therefore selected two additional LLMs, ERNIE Bot 4.0 and Moonshot AI, constructed a prompt framework, and designed specific questioning strategies. We applied these to clinical cases of discharged neurosurgical patients to explore the potential of different LLMs in generating nursing diagnoses and care plans. This study aims to provide a reference for the integration of LLM-assisted nursing processes in clinical settings.
North American Nursing Diagnosis Association International (NANDA-I), Nursing Interventions Classification (NIC), and Nursing Outcomes Classification (NOC) are collectively known as the NNN linkages, which currently represent the most widely utilized standardized nursing terminology systems in the international nursing field [16]. NANDA-I provides a comprehensive set of nursing diagnostic terms that assist healthcare providers in identifying patients’ nursing problems [17]; NIC offers a validated series of nursing interventions [18]; and NOC focuses on the establishment of expected outcomes and the evaluation of nursing effectiveness [19]. This system significantly supports the nursing process by providing theoretical guidance and an operational framework for the stages of nursing diagnosis, care planning, and outcome evaluation.

In the late 1990s, the nursing community in China began to gradually engage with, learn about, and explore the application of NNN linkages within the Chinese context. Over the years, NNN linkages have been adopted by numerous nursing education institutions and large comprehensive hospitals in China, with their use gradually expanding. In nursing education, the integration of NNN linkages into the Fundamentals of Nursing course of the Chinese undergraduate nursing curriculum has reportedly facilitated the development of students’ critical thinking skills [20]. In specialized nursing fields, NNN linkages have been applied in clinical pathways for breastfeeding, home visits for hypertensive patients, cardiac rehabilitation nursing, and nursing practices for various other patient populations and care scenarios [21,22,23]. As research progresses, the application of NNN linkages in China has expanded from initial uses in medical record documentation to more recent applications in medical insurance reimbursement, nursing information exchange, and care resource allocation. These applications demonstrate the significant role of NNN linkages in continuous documentation, clinical decision-making, and consistent evaluation. However, the applicability of NNN linkages in China remains constrained by language and regional factors, as the terminology does not yet fully align with the national context. Therefore, this study engaged neurosurgical clinical nursing experts and used the Chinese translation of the NNN linkages to evaluate the quality of nursing diagnoses and care plans generated by different LLMs.
Methods
The selection of LLMs
The two LLMs selected for this study are ERNIE Bot 4.0 and Moonshot AI, with their most recent updates in September 2024 and October 2024, respectively. These models were chosen based on their performance in the domestic Chinese market and their specific technical advantages. ERNIE Bot 4.0, developed by Baidu, integrates advanced technologies such as multimodal learning and reinforcement learning, making it particularly well suited for natural language processing tasks in the Chinese context [24]. Currently, ERNIE Bot 4.0 is widely applied across various Baidu service platforms, including intelligent customer service, search engines, and medical information processing. Moonshot AI, developed by a Chinese startup, aims to optimize its understanding of Chinese text through deep learning models and has shown strong adaptability and potential in the healthcare sector [25]. Additionally, GPT-4 and Bard are globally recognized LLMs with significant influence in multilingual processing and cross-cultural adaptability. However, compared to GPT-4 and Bard, ERNIE Bot 4.0 and Moonshot AI offer clear advantages in understanding and generating Chinese text. GPT-4 and Bard are primarily trained on English-language corpora, and their processing of Chinese text may be affected by differences in linguistic structure, particularly in grammar and context. In contrast, ERNIE Bot 4.0 and Moonshot AI have been specifically optimized for Chinese natural language processing, enabling them to better handle the ambiguity, grammatical complexity, and cultural and contextual nuances inherent in the Chinese language. The goal of this study is to explore the performance of LLMs in specific nursing tasks, taking into account the unique localization needs of the nursing industry. Given the restrictions on GPT-4 in mainland China, we selected ERNIE Bot 4.0 and Moonshot AI, two Chinese-based LLMs, for this research.
Inclusion criteria for nursing experts
The scoring criteria used for the selection of experts and the corresponding score assigned were as follows: nursing master’s degree, 4 points; nursing master’s degree with dissertation directed to relevant content of the nursing diagnosis in this study, 1 point; publication of an article about nursing diagnosis in reference journals, 2 points; article published on nursing diagnosis and with content relevant to the area, 2 points; doctorate in nursing diagnosis, 2 points; clinical experience of at least 1 year in the area of the study, 1 point; and specialized experience of clinical practice relevant in the area of diagnosis, 2 points. Experts who accumulated a total score of ≥ 5 points from the above criteria were considered eligible as nursing experts [26].
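To make this additive rule concrete, the following minimal sketch computes a candidate’s score and eligibility. The criterion labels paraphrase the list above, and the helper names are ours, not part of the cited instrument [26].

```python
# Minimal sketch of the additive expert-eligibility rule described above.
# Criterion labels paraphrase the text; the helper names are illustrative.
CRITERIA_POINTS = {
    "nursing_masters_degree": 4,
    "masters_thesis_on_relevant_diagnosis": 1,
    "article_on_nursing_diagnosis": 2,
    "article_on_diagnosis_relevant_to_area": 2,
    "doctorate_in_nursing_diagnosis": 2,
    "clinical_experience_1yr_plus": 1,
    "specialized_clinical_practice_in_area": 2,
}
ELIGIBILITY_THRESHOLD = 5  # a total score >= 5 qualifies as a nursing expert

def expert_score(met_criteria: set) -> int:
    """Sum the points for every criterion the candidate meets."""
    return sum(CRITERIA_POINTS[c] for c in met_criteria)

def is_eligible(met_criteria: set) -> bool:
    return expert_score(met_criteria) >= ELIGIBILITY_THRESHOLD

# Example: a master's-prepared nurse (4) with one relevant publication (2)
# and over a year of clinical experience (1) scores 7 and qualifies.
assert is_eligible({"nursing_masters_degree",
                    "article_on_diagnosis_relevant_to_area",
                    "clinical_experience_1yr_plus"})
```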
Construction of structured nursing assessment template and prompt framework
Our research methodology consists of two main components: designing a structured nursing assessment template and constructing an iterative prompt framework. First, we designed a structured nursing assessment template based on the Chinese editions of Xinbian Hulixue Jichu (4th Edition) and Waike Hulixue (7th Edition), with a specific focus on neurosurgical diseases [27, 28]. The template also incorporates discharge cases from five common neurosurgical conditions: craniocerebral injury, aneurysmal subarachnoid hemorrhage, pituitary tumors, vestibular schwannomas, and hemifacial spasm. This template is intended to be universally applicable in neurosurgical clinical settings. It includes six categories: general information, chief complaint, current diagnosis, past health history, current health status, and current psychological status. Each category is designed to ensure a comprehensive and objective assessment of the patient’s condition and needs (see Table 1). Second, we developed an iterative prompt framework to generate nursing diagnoses by gradually integrating structured nursing assessment content. This process involved iterative adjustments to role-playing, examples, output constraints, and chain-of-thought guidance [29]. The framework consists of four components: instructions, examples, structured assessment content, and output requirements (see Fig. 1 and Annex). Finally, following the framework of instructions, examples, and output requirements, we prompted the LLMs to generate the corresponding expected outcomes and nursing interventions (see Fig. 2 and Annex). The refinement process is detailed as follows:
Fig. 1 Prompt Framework and Tips for Generating Nursing Diagnoses. Note: The “First Input Prompt (glioma)” shown in the figure is only a partial display. For detailed content, please refer to Annex 1
Fig. 2 Flowchart of the Prompt Framework and Questioning Strategy. Note: The “First Input Prompt” shown in the figure is only a partial display. For detailed content, please refer to Annex 1
Design of the structured assessment template (Iterations 1–3)
Based on theoretical knowledge and five discharge cases, three clinical nursing experts in neurosurgery reviewed and unanimously approved the categories and specific items of the structured nursing assessment template. At this stage, the content of the template was designed to align with patient needs and disease characteristics. To ensure universality and operability, brief descriptions of each assessment category were provided, and limitations on the scope of the assessment items were established.
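As an illustration only, the six-category template can be pictured as a plain data container whose rendered text later becomes the assessment block fed to the LLM; the field names below paraphrase the categories in Table 1 and are not the template’s verbatim item wording.

```python
# Hedged sketch of the six-category structured assessment template as a
# plain data container; field names paraphrase the categories in Table 1.
from dataclasses import dataclass

@dataclass
class StructuredAssessment:
    general_information: str           # demographics, admission details
    chief_complaint: str
    current_diagnosis: str
    past_health_history: str
    current_health_status: str         # symptoms, vital signs, function
    current_psychological_status: str

    def to_prompt_block(self) -> str:
        """Render the assessment as the text block later fed to the LLM."""
        return "\n".join(f"{field}: {value}"
                         for field, value in vars(self).items())
```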
Construction of the prompt framework (Iterations 4–8)
Building upon the template, the three nursing experts developed structured assessment content for the five discharge cases. In collaboration with a computer expert, prompt engineering techniques were applied to construct a preliminary prompt framework based on the structured assessment content for each case. The prompts for each case were repeatedly input into ERNIE Bot 4.0, and the quality of each response was recorded and analyzed. To evaluate the quality of the generated responses, gold standards for nursing diagnoses were established for each case, and the outputs from ERNIE Bot 4.0 were compared against these standards to optimize the prompt framework. The aim of these iterative cycles was to refine the questioning logic, clarify the prompting techniques, and stabilize the prompt framework.
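The four-part assembly can be sketched as follows. All strings are paraphrases of the framework’s components, and `build_prompt` and `call_llm` are hypothetical helpers for illustration, not the study’s actual tooling.

```python
# Illustrative assembly of the four-part prompt: instructions (role-play
# plus a chain-of-thought cue), a worked example (in-context learning),
# the case's structured assessment content, and output requirements
# (NNN-linkage constraints). All strings are paraphrases; `call_llm` is a
# hypothetical stub, not the actual ERNIE Bot 4.0 / Moonshot AI client.
INSTRUCTIONS = ("You are an experienced neurosurgical nurse. Reason step "
                "by step from the assessment findings before naming any "
                "nursing diagnosis.")
EXAMPLE = ("Example case: <structured assessment> -> NANDA-I nursing "
           "diagnoses in priority order, each with supporting evidence.")
OUTPUT_REQUIREMENTS = ("List nursing diagnoses using NANDA-I terminology "
                       "only, in priority order, citing the assessment "
                       "item that supports each diagnosis.")

def build_prompt(structured_assessment: str) -> str:
    """Only the assessment block changes between cases; the rest is fixed."""
    return "\n\n".join([INSTRUCTIONS, EXAMPLE,
                        "Structured assessment:\n" + structured_assessment,
                        OUTPUT_REQUIREMENTS])

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model endpoint used in the study."""
    raise NotImplementedError("Send the prompt to ERNIE Bot 4.0 here.")
```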
Follow-up questioning and fine-tuning (Iterations 9–12)
Once the scope and nature of the generated nursing diagnoses achieved relative stability, follow-up questions were progressively posed to ERNIE Bot 4.0 to generate the corresponding expected outcomes and nursing interventions. Gold standards for expected outcomes and nursing interventions were established for each case to assess the quality of the generated nursing plans. Additionally, Moonshot AI, another large language model, was used to repeatedly generate nursing diagnoses and plans for comparison. During this phase, detailed analyses of the outputs from both LLMs were conducted, with repeated adjustments to the prompt framework’s language descriptions to assess its stability.
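A minimal sketch of this multi-turn questioning strategy follows, assuming a hypothetical stateful chat client; the follow-up wording paraphrases the framework and is not the verbatim prompt text.

```python
# Minimal sketch of the multi-turn questioning strategy: after the model
# returns NANDA-I diagnoses, fixed follow-ups request NOC outcomes and
# NIC interventions. `ChatSession` is a hypothetical stateful client.
class ChatSession:
    def send(self, prompt: str) -> str:
        raise NotImplementedError("Wire this to the model's chat API.")

FOLLOW_UPS = [
    "For each nursing diagnosis above, list expected outcomes using the "
    "NOC classification.",
    "For each expected outcome, list nursing interventions using the NIC "
    "classification, keeping the original priority order.",
]

def run_session(chat: ChatSession, first_prompt: str) -> list:
    """Send the diagnosis prompt, then the two fixed follow-up questions."""
    replies = [chat.send(first_prompt)]      # nursing diagnoses (NANDA-I)
    for question in FOLLOW_UPS:              # outcomes, then interventions
        replies.append(chat.send(question))
    return replies
```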
We have outlined the methodology for developing a prompt framework based on the “Structured Thinking Prompt Framework.” This framework involves extracting structured case information and integrating role-playing, contextual learning, and chain-of-thought prompting techniques to guide LLMs in step-by-step reasoning. The goal is to improve the generalization ability of LLMs in generating high-quality nursing recommendations. NNN linkages are primarily integrated into the prompt framework as constraints for content generation: the LLM is instructed to generate nursing diagnoses based on NANDA-I, expected outcomes based on the NOC classification, and nursing interventions based on the NIC classification. In this study, we demonstrate the application of this framework in generating nursing plan outputs for glioma patients using ERNIE Bot 4.0 and Moonshot AI (see Annex 1).
Evaluation of LLM-generated nursing diagnoses and care plans
Before the study began, three human neurosurgery nursing experts who met the inclusion criteria underwent standardized and homogeneous training. The training covered the interpretation and application of the NNN linkages standards [17,18,19], the application methods of the Fehring model [30], and the operational procedures for using LLMs. Subsequently, the three nursing experts held detailed discussions to establish the ‘gold standard’ for nursing diagnoses and care plans in the brain glioma case. The LLM-generated nursing diagnoses and care plans were then compared with the ‘gold standard’ to evaluate their quality. The evaluation focused on the following aspects: 1) the scope and nature of the generated content, 2) the prioritization of the generated nursing diagnoses, and 3) the accuracy of nursing terminology descriptions in the generated content. The evaluation procedure was as follows: 1) The three nursing experts first reviewed whether the LLM-generated nursing diagnoses and care plan entries conformed to the scope of the NNN linkages [16]. They then applied the Fehring model [30] to evaluate the content validity of the generated nursing diagnoses and care plans. For any controversial entries, an evidence-based literature review approach was used to discuss and reach a consensus. 2) The priority of the LLM-generated nursing diagnoses was compared to that of the ‘gold standard’ nursing diagnoses. 3) The descriptions of the LLM-generated nursing diagnoses and care plan entries were compared with the NNN linkages. Given the differences in cognitive patterns, culture, and language, the Chinese translations of NANDA-I Nursing Diagnoses: Definitions and Classification (2021–2023) by Li Xiaomei et al. [31] and Nursing Diagnosis, Outcomes, and Interventions (2nd Edition) by Wu Yuanjian et al. [32] were used as reference sources to compare the accuracy of the terminology descriptions.
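For readers unfamiliar with the Fehring model [30], its core computation is a weighted average of expert ratings. The sketch below assumes the conventional 1–5 rating scale and weight mapping; it is illustrative only, not a reproduction of the study’s scoring sheets.

```python
# Sketch of the weighted-average computation at the core of Fehring's
# content-validity method, assuming the conventional 1-5 expert rating
# scale mapped to weights 0, 0.25, 0.5, 0.75, 1 (illustrative only).
WEIGHTS = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

def item_score(expert_ratings: list) -> float:
    """Mean weighted rating for one generated diagnosis or plan entry."""
    return sum(WEIGHTS[r] for r in expert_ratings) / len(expert_ratings)

# Three experts rate one LLM-generated entry 5, 4, and 4:
print(round(item_score([5, 4, 4]), 2))  # 0.83; scores >= 0.8 are
                                        # conventionally treated as strong
```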
Results
The scope and nature of the nursing diagnoses generated by ERNIE Bot 4.0 and Moonshot AI were similar to the “gold standard” nursing diagnoses (see Fig. 3 and Annex). The “gold standard” nursing diagnoses included 11 items (based on NANDA-I), while ERNIE Bot 4.0 generated 10 items, with 7 being an exact match, 3 being similar, and 1 omitted. Moonshot AI also generated 10 items, with 6 being an exact match, 2 being similar, 1 omitted, and 2 incorrect. Both models omitted “Potential Complication: Epilepsy.” The incorrect items generated by Moonshot AI were “Abnormal Urination” and “Knowledge Deficiency,” which were deemed incorrect because no supporting evidence for these diagnoses was found in the structured assessment content provided as input.
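As a worked example of how these counts compose, the ERNIE Bot 4.0 comparison reduces to a simple tally of expert-assigned labels; the labels below mirror the counts reported above and are not raw study data.

```python
# Worked example: the ERNIE Bot 4.0 comparison as a tally of expert-
# assigned labels (7 exact, 3 similar, 1 gold-standard item omitted).
from collections import Counter

labels = ["exact"] * 7 + ["similar"] * 3 + ["omitted"]
print(Counter(labels))
# Counter({'exact': 7, 'similar': 3, 'omitted': 1})
```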
The items that were an exact match or similar corresponded one-to-one with the structured assessment content, and their accuracy and practicality were satisfactory, providing valuable references for clinical nursing. In terms of nursing priority ranking, both LLMs emphasized the importance of addressing “Acute Pain” and “Potential Complication: Cerebral Hernia.” ERNIE Bot 4.0 also highlighted “Ineffective Airway Clearance,” and Moonshot AI emphasized “Potential Complication: Lung Infection,” both placing them in the high-priority category for airway management, which aligns with the patient’s primary needs. The intermediate and low priorities were ranked according to the patient’s existing problems and potential risks, generally corresponding to patient needs and clinical scenarios (see Fig. 3 and Annex).
The expected outcomes and nursing interventions generated by ERNIE Bot 4.0 were closely aligned with the nursing diagnoses, and their scope and nature were consistent with the gold standard. In contrast, the expected outcomes and nursing interventions generated by Moonshot AI were not listed in order of nursing diagnosis priority, but their scope and nature also closely matched the gold standard (see Fig. 4 and Annex). Regarding terminology, the Chinese descriptions of nursing diagnoses and expected outcomes generated by both LLMs were highly similar to the gold standard (based on NANDA-I, NOC, NIC). However, the accuracy of the nursing intervention descriptions was lower. NNN linkages clearly define the classification of nursing interventions, emphasizing standardization and consistency. Each nursing intervention is precisely defined with a standardized expression, ensuring consistency and comparability across nursing diagnoses and care plans. In contrast, the nursing interventions generated by the LLMs are expressed more freely and do not strictly follow the standardized format of the terminology system. Furthermore, the LLM-generated interventions tend to be more generalized and lack sufficient actionable details. A notable feature of both LLMs was their step-by-step reasoning and generation process based on the prompt requirements, with explanations and necessary reminders for each response. These features further underscore the potential value of effectively applying LLM-assisted nursing procedures in clinical settings.
Discussion
In this study, a prompt framework was developed by constructing structured nursing assessments and applying prompt engineering techniques. By simply replacing the relevant information in the structured nursing assessment, the framework was made adaptable to various neurosurgical disease cases. The structured nursing assessments were performed manually, while the nursing diagnoses, expected outcomes, and nursing interventions were generated by the large language model based on the structured information. This approach effectively combined the strengths of human expertise and machine-generated insights. The nursing plans created using this framework demonstrated strong performance in terms of content relevance, completeness, prioritization, and accuracy of descriptions, showing a notable level of credibility and applicability. Therefore, under the supervision and review of clinical nursing experts, LLMs have the potential to effectively support the application of nursing processes in clinical practice.
Previous studies have reported significant controversy regarding the quality of nursing plans generated by LLMs. In earlier research by Gosak [9] and Woodnutt [10], the accuracy of generative nursing diagnoses and plans was relatively low. However, more recent work by Dos Santos [12] indicated that generative nursing plans closely approximated the “gold standard” nursing plans developed by humans. All of these studies utilized ChatGPT, with prompts consisting of narrative text. The discrepancy may stem from the structural optimization and iterative refinement of the prompts in Dos Santos [12], along with the delegation of the nursing assessment step to human evaluators. Our study adopts a design approach similar to that of Dos Santos [12], yielding comparable results. The key difference lies in the framework construction: while Dos Santos [12] developed their prompt framework based on the Theory of Human Needs and the Situation-Background-Assessment-Recommendation model, we designed our framework using the “Structured Thinking Prompt Framework,” which integrates disease knowledge, clinical practice, and prompting techniques [29, 33]. Furthermore, our framework was applied to LLMs from different companies (ERNIE Bot 4.0 and Moonshot AI). The nursing plans generated by both LLMs showed high similarity to the “gold standard” in terms of both scope and nature. This outcome suggests that our prompt framework enhances the generalization ability of LLMs in generating high-quality nursing plans. Potential reasons include the effective application of prompting strategies and techniques, particularly role-playing, exemplars, and chain-of-thought prompting, which improve the explainability and credibility of the generated results [34, 35]. These techniques, grounded in human cognitive theory, enable LLMs to adopt human-like logical reasoning, thereby enhancing their performance [36]. Additionally, and importantly, our structured nursing assessment template comprehensively presents the background information, current status, and needs of neurosurgical patients. This template, similar to a neurosurgical nursing assessment checklist, includes objective and easily obtainable assessment items tailored to the characteristics of neurosurgical patients. The assessment process closely aligns with how clinical nurses collect and organize relevant information, a practice consistently followed in clinical settings, thus demonstrating strong clinical applicability.
It is important to note that for this study, we deliberately selected cases of patients who had recovered and been discharged, excluding disease-related information from hospitalized patients. Additionally, the generated nursing care plans were not applied to clinical decision-making for similar hospitalized patients (with analogous diagnoses) during the study period. Throughout the research process, we intentionally omitted sensitive, confidential, and identifiable information, such as patients’ names and addresses [5]. This approach was guided by legal, safety, and ethical considerations. It must also be acknowledged that the field of Large Language Models is rapidly evolving, with new solutions being developed on a daily basis. As a result, ensuring the reproducibility and consistency of generated outcomes remains challenging. Furthermore, the nursing diagnoses and care plans generated in this study showed some deviation from the established “gold standard.” LLMs rely on complex algorithms and build knowledge structures through training on large datasets [2], which may explain why the Moonshot AI model produced biased nursing diagnoses, such as “Abnormal Urination” associated with urinary catheters, as well as a diagnosis of “Knowledge Deficiency” that lacked sufficient supporting evidence in this study. Additionally, both LLMs failed to identify “Potential Complication: Epilepsy,” even after multiple iterations of prompt input. This outcome may be attributed to the absence of relevant descriptions in the structured assessment content or could be related to the specific training datasets used for the two LLMs, ERNIE Bot 4.0 and Moonshot AI.

Therefore, before applying generative nursing diagnoses and care plans to clinical practice, it is essential to carefully consider the ethical issues that may arise from incorrect or inaccurate generated results. First, it is crucial to promptly and effectively identify incorrect or inaccurate outputs and prevent their implementation in clinical practice. Establishing a manual review mechanism is currently the most effective way to address this issue: each nursing diagnosis and care plan generated by an LLM must undergo review by clinical nurses or nursing experts to ensure it meets actual nursing needs and the specific conditions of the patient before it is applied in clinical care practice. Second, in terms of improving LLM performance, establishing a continuous optimization feedback mechanism and maintaining LLM transparency and traceability are also important measures to ensure patient safety. By systematically collecting feedback, particularly regarding erroneous or inaccurate generated results, developers can further optimize the LLM to enhance its accuracy and adaptability. By ensuring LLM transparency and traceability, clearly presenting the generation process of each nursing diagnosis and care plan (including the underlying clinical knowledge and algorithmic principles), nurses can understand the basis for the LLM-generated outputs, allowing them to make informed judgments and adjustments in uncertain situations. Therefore, it is essential to critically evaluate the recommendations generated by LLMs [37]. Nurses must leverage their clinical expertise and experience to assess the potential adverse effects of LLM suggestions on patient needs, safety, and ethical considerations [37].
It is imperative that nurses understand that LLMs are tools, not replacements for the core functions of nursing practice [5]. Moreover, nurses must acknowledge that their professional knowledge, communication skills, and empathy are irreplaceable [38]. Furthermore, although LLMs have shown promise in generating nursing care plans, their effectiveness in assisting nursing procedures is heavily influenced by individual nurses’ awareness, attitudes, learning capabilities, and proficiency in utilizing LLMs [39, 40]. Despite claims in the existing literature that LLMs can alleviate the burden of nursing documentation, direct evidence of their practical application in clinical settings remains limited. Whether LLMs can genuinely save time, or merely redistribute tasks without effectively reducing the time spent on them, along with the specific contexts in which they can be most effectively deployed, requires further investigation and discussion [41]. It can be inferred that the successful integration of LLM-assisted nursing programs into clinical practice still faces significant challenges and represents a long-term developmental process. The prompt framework we developed and its application in neurosurgical discharge cases further highlight the potential of LLMs to enhance nursing practice.
NNN linkages are one of the most widely applied standardized terminologies in the nursing field. In this study, by incorporating NNN linkages as a key constraint in the output of LLMs, we facilitate more accurate understanding and generation of nursing terminology and content that meet the professional requirements of nursing. This enhances the professionalism and accuracy of the nursing diagnoses and care plans generated by LLMs. Beyond NNN linkages, other nursing terminology systems, such as the Omaha System and the International Classification for Nursing Practice (ICNP), also hold similar potential to support the development and application of large language models [42]. Modifying the prompt framework to incorporate the Omaha System or ICNP can similarly generate corresponding nursing diagnoses and care plans. Through continuous optimization and feedback mechanisms, this process enables LLMs to learn and adapt to a broader range of specialized nursing terminologies, thereby improving their practical value in clinical nursing practice.
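As a hypothetical illustration of such a swap, the framework’s output-requirement component could be parameterized by terminology system; the constraint strings below are our paraphrases, not validated prompt text from this study.

```python
# Hypothetical parameterization of the framework's output-requirement
# component by terminology system; swapping this one constraint string
# would retarget generation from NNN to Omaha or ICNP terms.
TERMINOLOGY_CONSTRAINTS = {
    "NNN": ("State diagnoses in NANDA-I terms, expected outcomes in NOC "
            "terms, and interventions in NIC terms."),
    "Omaha": ("State problems, interventions, and outcome ratings using "
              "Omaha System terms."),
    "ICNP": ("State diagnoses and interventions using ICNP "
             "pre-coordinated statements."),
}

def output_requirement(system: str = "NNN") -> str:
    return TERMINOLOGY_CONSTRAINTS[system]
```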
Nursing shortages are one of the major challenges facing healthcare systems worldwide. The high-pressure work environment and workload-induced fatigue are two key contributing factors [43]. The rational and effective application of LLMs may help improve the work environment and reduce the documentation burden on nurses. On one hand, this study demonstrated that generating nursing diagnoses and care plans using the prompt framework and LLMs is extremely time-efficient, enabling nurses to quickly obtain accurate and personalized nursing diagnoses and interventions. This accelerates the decision-making process, improving the response time and quality of nursing services. On the other hand, automating the generation of nursing diagnoses and care plans can reduce the time nurses spend on documentation, freeing up more time for direct clinical care. It is important to note, however, that the effective application of LLMs in the nursing field still requires training resources and time, especially in regions with weaker technological infrastructure, where nurses’ acceptance of technology and training levels may be lower, potentially limiting the effectiveness of implementation. Nevertheless, aside from the manual completion of the structured nursing assessment, the questioning logic and prompt descriptions within the prompt framework are relatively fixed, making them easy to apply and input into LLMs without significantly increasing nurses’ workload. This is beneficial in addressing the global shortage of nursing personnel.

The compatibility between healthcare information systems and LLMs is fundamental to realizing the automation and intelligence of nursing documentation. Different countries and regions have distinct healthcare systems, and factors such as resource disparities and variations in nursing practices can limit the further application and promotion of LLMs in healthcare. The prompt framework developed in this study is straightforward to use; as long as LLMs are available in a given country or region, this method can be applied to generate nursing diagnoses and care plans without requiring extensive infrastructure or training. With the continuous development of AI technology, compatibility between healthcare information systems and LLMs is likely to be progressively realized. In China, some large comprehensive hospitals have already integrated the large language model DeepSeek-R1 into intelligent medical information systems that provide AI-driven support for clinical decision-making, nursing care, and medical record generation, significantly enhancing work efficiency and alleviating workload. Therefore, the study of prompt engineering and its application in nursing will inevitably become a focal point for the nursing community in the AI era. Our research offers a novel exploratory approach and direction in this regard.
Limitations
The limitations of this study are as follows:
1. The structured nursing assessment template encompasses a broad range of content, and the information it contains may vary from brief and general to lengthy and comprehensive, depending on the assessor’s professional competence, language skills, and personal preferences. In this study, we have only interpreted and defined the items of this template based on the disease characteristics of neurosurgical patients. Further research is needed to explore the potential application of this template to other diseases.
2. The prompt framework is variable, and even minor adjustments to its textual description can significantly affect the response outcomes. This study primarily provides an exploratory approach aimed at enhancing clinical nurses’ understanding of LLMs and promoting prompt learning.
3. The quality of texts generated by different LLMs is also influenced by the language type of the training dataset (e.g., grammatical structures and language logic in English or Chinese). Selecting an appropriate LLM based on user needs may be an effective strategy for obtaining high-quality responses. Additionally, critically synthesizing texts generated by different LLMs on the same issue can be a valuable approach to achieving high-quality output.
4. Continuous vigilance is required regarding the potential legal, safety, and ethical risks associated with nursing care plans generated by LLMs. There is an urgent need to improve legislation and relevant medical and nursing systems to ensure the effective and responsible use of LLMs, enabling them to assist and enhance clinical nursing practice.
Conclusion
This study explored the process of applying structured nursing assessment combined with prompting techniques to generate nursing diagnoses and care plans using various LLMs. It demonstrated the potential application of the prompt framework we developed in neurosurgical discharge cases, while also imposing necessary restrictions from the user operation perspective to mitigate risks related to legal issues, privacy breaches, and ethical concerns. The study further confirmed the potential of LLMs in clinical nursing practice. However, a gap remains between this potential and the actual implementation of LLMs in assisting nursing procedures in clinical settings. Urgent challenges include the need for improved laws and regulations, enhanced privacy protection, and stronger ethical frameworks. Furthermore, there is considerable variation in clinical nurses’ awareness and acceptance of LLMs, and currently, there is a lack of LLM-specific training and education for clinical nurses. In the field of computer science, integrating or developing LLMs compatible with hospital information systems presents another significant challenge. Extensive future research and exploration are required to enable LLMs to serve clinical practice safely and effectively, alleviating nurses from burdensome procedural and repetitive tasks, enabling them to provide personalized care, and ultimately benefiting patient outcomes.
Data availability
No datasets were generated or analysed during the current study.
References
Gleason N. ChatGPT and the rise of AI writers: how should higher education respond? Times High Educ. 2022. https://www.timeshighereducation.com/campus/chatgpt-and-rise-ai-writers-how-should-higher-education-respon
OpenAI. 2023. https://openai.com/news/research/
Yalcinkaya T, Cinar Yucel S. Bibliometric and content analysis of ChatGPT research in nursing education: The rabbit hole in nursing education. Nurse Educ Pract. 2024;77:103956.
Alessandri-Bonetti M, Liu HY, et al. The first months of life of ChatGPT and its impact in healthcare: A bibliometric analysis of the current literature. Ann Biomed Eng. 2024 May;52(5):1107–10.
Scerri A, Morin KH. Using chatbots like ChatGPT to support nursing practice. J Clin Nurs. 2023;32(15–16):4211–3.
Ruksakulpiwat S, Thorngthip S, et al. A systematic review of the application of artificial intelligence in nursing care: Where are we, and what’s next? J Multidiscip Healthc. 2024;17:1603–16.
Toney-Butler TJ, Thayer JM. Nursing process. 2023 Apr 10. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan.
Agyeman-Yeboah J, Korsah KA. Non-application of the nursing process at a hospital in Accra, Ghana: Lessons from descriptive research. BMC Nurs. 2018;17:45.
Gosak L, Pruinelli L, et al. The ChatGPT effect and transforming nursing education with generative AI: Discussion paper. Nurse Educ Pract. 2024;75:103888.
Woodnutt S, Allen C, et al. Could artificial intelligence write mental health nursing care plans? J Psychiatr Ment Health Nurs. 2024;31(1):79–86.
Johnson LG, Madandola OO, et al. Creating perinatal nursing care plans using ChatGPT: a pathway to improve nursing care plans and reduce documentation burden. J Perinat Neonatal Nurs. 2024 Nov 1.
Dos Santos FC, Johnson LG, et al. An example of leveraging AI for documentation: ChatGPT-generated nursing care plan for an older adult with lung cancer. J Am Med Inform Assoc. 2024;31(9):2089–96.
Dağci M, Çam F, et al. Reliability and quality of the nursing care planning texts generated by ChatGPT. Nurse Educ. 2024;49(3):E109–14.
Sun GH. Prompt engineering for nurse educators. Nurse Educ. 2024;49(6):293–9.
Sivarajkumar S, Kelley M, et al. An empirical evaluation of prompting strategies for large language models in zero-shot clinical natural language processing: algorithm development and validation study. JMIR Med Inform. 2024;12:e55318.
Herdman TH, Kamitsuru S, editors. NANDA International nursing diagnoses: definitions and classification 2018–2020. 11th ed. New York: Thieme; 2019. p. 1–5.
NANDA International. Nursing diagnoses definitions & classification 2018–2020. 11th ed. Thieme; 2018.
Butcher HK, Bulechek GM, Dochterman JM, Wagner C, editors. Nursing interventions classification (NIC). 7th ed. Elsevier; 2018.
Moorhead S, Swanson E, Johnson M, Maas ML, editors. Nursing outcomes classification (NOC): measurement of health outcomes. 7th ed. Elsevier; 2018.
Chen J, et al. Application of nursing interventions and nursing outcomes classification in basic nursing education. J Nurs. 2012;27(8):72–4.
Guo X. Development of a clinical pathway for breastfeeding using the OPT model and NNN linkage. Nurs Res. 2015;29(5C):1852–5.
Li X, et al. Analysis of the effectiveness of home visits for elderly hypertensive patients in the community based on standardized nursing language. Chinese General Practice. 2013;16(27):3231–3.
Duan X, Ding Y, Ning Y, Luo M. Application of NANDA-I nursing diagnoses, nursing interventions classification, and nursing outcomes classification in research and practice of cardiac rehabilitation nursing: A scoping review. Int J Nurs Knowl. 2024;35(3):256–271.
Wang SH, Sun Y, Xiang Y, et al. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. 2021.
Qin R, Li Z, He W, et al. Mooncake: Kimi's KVCache-centric Architecture for LLM Serving. 2024.
Santos ACFS, Mota ECH, Santos VD, et al. Validation of the nursing diagnosis "labile emotional control" in traumatic brain injury. J Nurs Scholarsh. 2019;51(1):88–95.
Cao M, Wang K, et al. Xinbian Hulixue Jichu [New Compilation of Fundamentals of Nursing]. 4th ed. 2022.
Li Y, Lu Q, et al. Waike Hulixue [Surgical Nursing]. 7th ed. 2021.
Liu P, Yuan W, et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput Surv. 2023;55(9):195.
Fehring RJ. Methods to validate nursing diagnoses. Heart Lung. 1987 Nov;16(6 Pt 1):625–9.
Herdman TH, et al. NANDA-I Huli Zhenduan: Dingyi yu Fenlei [NANDA-I Nursing Diagnoses: Definitions and Classification]. 1st ed. 2023.
Johnson M, et al. Nursing Diagnoses, Outcomes, & Interventions. 1st ed. 2010.
O'Connor S, Peltonen LM, et al. Prompt engineering when using generative AI in nursing education. Nurse Educ Pract. 2024 Jan;74:103825.
Xu B, Yang A, Lin J, Wang Q, Zhou C, Zhang Y, Mao Z. ExpertPrompting: Instructing Large Language Models to be Distinguished Experts. arXiv preprint arXiv:2305.14688. 2023.
Kojima T, Gu SS, et al. Large language models are zero-shot reasoners. Adv Neural Inf Process Syst. 2022;35:22199–22213.
Patacchiola M, Cangelosi A. A Developmental Cognitive Architecture for Trust and Theory of Mind in Humanoid Robots. IEEE Trans Cybern. 2022 Mar;52(3):1947–59.
Hobensack M, von Gerich H, et al. A rapid review on current and potential uses of large language models in nursing. Int J Nurs Stud. 2024 Jun;154:104753.
Abdulai AF, Hung L. Will ChatGPT undermine ethical values in nursing education, research, and practice? Nurs Inq. 2023 Jul;30(3):e12556.
Tuncer GZ, Tuncer M. Investigation of nurses' general attitudes toward artificial intelligence and their perceptions of ChatGPT usage and influencing factors. Digit Health. 2024 Aug 25.
Chang LC, Wang YN, et al. Registered Nurses' Attitudes Towards ChatGPT and Self-Directed Learning: A Cross-Sectional Study. J Adv Nurs. 2024 Oct 9.
Krüger L, Krotsetis S, OpenAI’s Generative Pretrained Transformer 3 (GPT-3) Model, Nydahl P. ChatGPT: Fluch oder Segen in der Pflege? [ChatGPT: curse or blessing in nursing care?]. Med Klin Intensivmed Notfmed. 2023 Oct;118(7):534–9. German.
Tastan S, Linch GC, Keenan GM, Stifter J, McKinney D, Fahey L, Lopez KD, Yao Y, Wilkie DJ. Evidence for the existing American Nurses Association-recognized standardized nursing terminologies: A systematic review. Int J Nurs Stud. 2014 Aug;51(8):1160–70.
Drennan VM, Ross F. Global nurse shortages-the facts, the impact and action for change. Br Med Bull. 2019 Jun;130(1):25–37.
Funding
The authors received no external funding or grant to undertake this research.
Author information
Authors and Affiliations
Contributions
JP, YC and LH were responsible for the study conception and design; JP, YC, LH and XC performed the construction and iteration of the prompt framework; YC and LH performed the organization and analysis of the textual results. JP, YC and LH were responsible for the drafting of the manuscript. All authors read and approved the final manuscript, and all authors agreed to be accountable for all aspects of the work.
Corresponding author
Ethics declarations
Ethical approval and consent to participate
The study was conducted in compliance with the Declaration of Helsinki and received ethical approval from the Medical Ethics Committee of the First Affiliated Hospital of Chongqing Medical University (Ethics review batch number K2023-191). All of the participants provided informed consent to participate in the study. Large Language Models were only used to generate nursing diagnoses and care plans in this study. Large Language Models were not utilized for the writing of the manuscript. All writing was completed by human authors. This is hereby stated for clarification.
Consent for publication
Not applicable.
Clinical trial number
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cao, Y., Hu, L., Cao, X. et al. Can large language models facilitate the effective implementation of nursing processes in clinical settings? BMC Nurs 24, 394 (2025). https://doi.org/10.1186/s12912-025-03010-2
DOI: https://doi.org/10.1186/s12912-025-03010-2