DISCUSSION
AI applications have advanced rapidly in recent years, becoming an integral part of daily life. In the field of healthcare, they have been the subject of studies in a wide range of areas, including disease diagnosis, treatment management, prediction of complications, and interpretation of imaging and pathology examinations (6). Large language models (LLMs) powered by AI provide rapid responses by interpreting written text, scanning open sources, and summarizing information (7). This capability raises the possibility of their use by both patients and healthcare providers. Although algorithms developed for use by healthcare providers have not yet entered routine practice, their widespread adoption is anticipated in the near future.
Meanwhile, the accuracy and adequacy of these platforms, which are used by patients to obtain information, have become a topic of interest. The correctness and adequacy of responses provided by LLMs in patient education have been examined across various subtopics (8). This study aimed to investigate whether the responses generated by LLMs to basic questions in urological emergencies, which may require time-sensitive decision-making, are consistent with the literature, accurate, and reliable.
Urological emergencies encompass a variety of conditions, ranging from testicular torsion, which requires immediate intervention, to hematuria, which may allow for a relatively longer diagnostic window but can still lead to urgent outcomes. In testicular torsion, a lack of awareness among patients and their families and delayed hospital presentation can have devastating consequences, including organ loss and future infertility. A study investigating the causes of delay in testicular torsion found that only 23.8% of cases underwent timely surgery. Misdiagnosis and initial consultation with a non-urologist were identified as risk factors for orchiectomy, emphasizing the importance of proper technical training and referral to prevent delays in the diagnosis and treatment of testicular torsion (9). For patients with scrotal pain in remote areas with limited healthcare access, consulting an LLM could potentially shorten the time to initial presentation and thereby reduce the risk of orchiectomy.
Another example is urolithiasis, a highly prevalent condition in the general population. Although hospital visits and the need for analgesic treatment due to renal colic are common, patients may prefer to manage the condition without seeking medical attention based on prior experiences or anecdotal information. However, the development of fever and infection during this process may result in complicated urinary tract infections, such as pyelonephritis with obstruction, which, if left untreated, may progress to sepsis and multiorgan failure (10). Therefore, the lack of awareness among patients about the risk of sepsis in cases of renal colic complicated by infection may result in adverse outcomes in individuals who do not seek medical care. A comprehensive study examining factors related to mortality in obstructive pyelonephritis concluded that delayed decompression was associated with increased mortality, with higher rates observed in weekend admissions (11).
Penile fracture is another condition that presents as an unexpected medical event in men. The dramatic presentation, including an audible snap during sexual intercourse and the appearance of hematoma, often signals the urgency of the situation even to untrained individuals. Even so, in such acute scenarios, the accuracy and reliability of the responses given by a free AI platform, which patients might consult to gauge the urgency and potential complications, are of critical importance. In this study, we prioritized evaluating the responses of LLMs for patient education and guidance in these contexts.
Our results demonstrated that AI platforms generally provide accurate and adequate responses to basic questions regarding urological emergencies. Responses were largely similar across the three AI assistants, and no statistically significant differences were found among them. A recent study examining the use of ChatGPT for self-diagnosis in orthopedic conditions suggested that, although it could serve as a potential initial step in accessing healthcare, its results were inconsistent, and the authors emphasized the necessity of including clear language encouraging users to seek expert medical opinions (12). Another study investigating the use of AI platforms for emergency medical conditions highlighted that, even when the results are consistent, the ambiguity of sources and the presence of misleading information regarding the timing of medical interventions pose potential risks and should be carefully considered (13). Scott et al., in their study evaluating AI-generated responses to urology patient messages, noted that ChatGPT performed better on simple questions than on complex ones, suggesting its potential to assist care teams (14). A recent systematic review examining the use of LLMs in patient care underscored the need for caution due to the uncertainties inherent in this technology (15).
Furthermore, ethical considerations must be addressed, particularly concerning the reliance on AI tools without professional supervision. As AI systems evolve, ensuring transparency in source attribution and decision-making logic becomes essential. Healthcare professionals must be aware of the limitations of these tools and use them as supplementary rather than primary decision-making instruments.
Studies involving LLMs must take into account several limitations. First, the platforms used are unstable, under ongoing development, and liable to change rapidly over time, so findings must be interpreted in light of the specific state of the platforms at the time of the study. Second, our study focused on basic and straightforward questions, with responses summarized in paragraph form for evaluation. The likelihood of inaccuracies or misleading information may increase with more complex and lengthy responses. Since we aimed to investigate basic questions in emergency scenarios, we believe it would be inappropriate to draw conclusions about complex urological emergency conditions based on these questions and answers. Given the continuous advancement and widespread adoption of these platforms, we consider it crucial to reassess their accuracy and reliability on an ongoing basis.