eISSN: 3023-6940

Original Research

Comparing ChatGPT and MSKCC Nomogram for Prostate Cancer Risk Predictions: A Correlation Study


1 Department of Urology, Bağcılar Training and Research Hospital, İstanbul, Türkiye
2 Department of Urology, Gaziosmanpaşa Training and Research Hospital, İstanbul, Türkiye


DOI : 10.33719/nju1759024
New J Urol. 2025;20(3):201-207.

Abstract

Objectives: Accurate prediction of risks such as extracapsular spread, seminal vesicle invasion and lymph node involvement is critical for treatment planning and patient prognosis in prostate cancer. Traditional nomograms are widely used for this risk stratification. In recent years, artificial intelligence (AI)-based chatbots have shown potential in this field. The aim of this study was to evaluate the correlation between the predictions of an AI chatbot (ChatGPT-4o) and the Memorial Sloan Kettering Cancer Center (MSKCC) nomogram in prostate cancer patients according to risk groups.
Materials and Methods: Forty synthetic patient scenarios representing low, intermediate, high and locally advanced risk groups were created. These scenarios were entered into both ChatGPT-4o and the MSKCC nomogram, and predictions of “Organ-Confined Disease”, “Extracapsular Extension”, “Seminal Vesicle Invasion” and “Lymph Node Involvement” were obtained. The data were analyzed using the Spearman correlation coefficient.
Results: Overall, there was a significant positive correlation between ChatGPT-4o and the MSKCC nomogram for all prediction topics (p < 0.001). However, no significant correlation was found for the predictions of “Organ-Confined Disease” (r = 0.521, p = 0.123), “Seminal Vesicle Invasion” (r = 0.382, p = 0.276) and “Lymph Node Involvement” (r = 0.218, p = 0.546) in the high-risk patient group. Similarly, no significant correlation was found for “Organ-Confined Disease” (r = 0.522, p = 0.122) and “Extracapsular Extension” (r = 0.524, p = 0.120) in the locally advanced patient group.
Conclusion: An overall high correlation between an AI-based chatbot (ChatGPT-4o) and the MSKCC nomogram was demonstrated for prostate cancer risk prediction. However, no significant correlation was observed for several predictions in the high-risk and locally advanced patient groups. These findings suggest that while AI chatbots are a potential tool for prostate cancer risk stratification, they require extensive validation and development studies before clinical use, especially in more complex and advanced cases.



INTRODUCTION

Prostate cancer (PCa) is the third most commonly diagnosed malignancy worldwide and it represents the most prevalent tumor of the male genitourinary system (1,2). The prognosis of the disease varies greatly depending on the stage and biologic characteristics at the time of diagnosis (3-6). Accurate prediction of the risks of extracapsular extension (ECE), seminal vesicle invasion (SVI) and lymph node involvement (LNI) is crucial for treatment planning and patient prognosis (5,6). Various nomograms have been developed for preoperative risk stratification in prostate cancer using easily accessible parameters such as age, serum prostate-specific antigen (PSA) level, Gleason score, clinical stage and number of positive biopsy cores (7-10). Among these, the Memorial Sloan Kettering Cancer Center (MSKCC) nomogram has been validated in large patient cohorts and is widely used in clinical practice, serving as an important clinical guide for the prediction of ECE, SVI and LNI risks in prostate cancer patients (8).

In recent years, artificial intelligence (AI) and large language models (LLMs) have become increasingly widespread in the medical field and have attracted interest as potential support tools in the diagnosis and treatment of diseases (11). One of these models, ChatGPT-4o, can answer complex medical questions based on user inputs and produce outputs similar to clinical decision support systems (12,13). However, there is limited data in the literature on the extent to which such chatbots provide predictions that are compatible with traditional nomograms.

In this study, we aimed to compare the ECE, SVI, LNI and organ-confined disease (OCD) predictions of ChatGPT-4o and the MSKCC nomogram using scenarios created according to the D'Amico risk classification in patients with prostate cancer, and to evaluate the correlation between them.



MATERIALS AND METHODS

Study Design
This study is a comparative analysis designed using prospectively generated synthetic patient scenarios to compare the results provided by ChatGPT-4o and the traditional MSKCC nomogram for preoperative risk prediction in prostate cancer patients. The study was structured to represent low, intermediate, high and locally advanced risk groups according to the D'Amico risk classification.

Creating Patient Scenarios
A total of 40 synthetic patient scenarios were created for the study, reflecting clinical practice and representing different risk groups in accordance with the D'Amico risk criteria. Each scenario was meticulously designed to include the essential preoperative data required for prostate cancer risk prediction. These data include the following:
Patient Age: indicated in years.
Serum PSA Level: expressed in ng/mL.
Biopsy Gleason Score: indicated with primary and secondary patterns (for example, 3+4=7).
Clinical Stage: according to the TNM staging system (for example, cT2a, cT3b).
Number of Positive Biopsy Cores: the number of cores containing cancer among the total number of cores taken.

These scenarios were created by considering typical patient profiles from clinical databases and existing literature, thus providing a diversity similar to real-world cases at different risk levels (Table 1).
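The parameters above can be represented as a simple record per scenario. As an illustration only (the field names below are hypothetical, not taken from the study's actual materials), a minimal Python sketch:

```python
from dataclasses import dataclass

# Hypothetical container for one synthetic patient scenario; field names
# are illustrative and mirror the preoperative parameters listed above.
@dataclass
class PatientScenario:
    risk_group: str        # "low", "intermediate", "high" or "locally advanced"
    age: int               # patient age in years
    psa: float             # serum PSA level in ng/mL
    gleason: str           # biopsy Gleason score, e.g. "3+4=7"
    clinical_stage: str    # TNM clinical stage, e.g. "cT2a"
    positive_cores: int    # biopsy cores containing cancer
    total_cores: int       # total number of cores taken

# An example intermediate-risk scenario with plausible (invented) values
example = PatientScenario("intermediate", 64, 8.5, "3+4=7", "cT2a", 4, 12)
print(example.risk_group, example.psa)
```

Ten such records per risk group would yield the 40 scenarios described in the study.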

Data Collection and Analysis Tools
The following prediction data were obtained for each synthetic patient scenario:
MSKCC Nomogram: Preoperative data from each patient scenario were entered into the publicly available MSKCC nomogram web-based calculator (https://www.mskcc.org/nomograms/prostate/pre_op) to obtain the following risk estimates:
Probability of OCD: In percent (%).
ECE Probability: In percent (%).
SVI Probability: In percent (%).
LNI Probability: In percent (%).

Artificial Intelligence Chatbot (ChatGPT-4o)
The same patient scenarios were entered into the ChatGPT-4o (OpenAI, San Francisco, CA, USA) model to request risk estimates. Data entry was performed using a specific, standardized prompt for each scenario. An example prompt structure is as follows:

“Given the following clinical information for a prostate cancer patient, would you estimate the risks of organ-confined disease, extracapsular spread, seminal vesicle invasion, and lymph node involvement as a percentage?
Age: [Patient Age] years
PSA: [PSA Value] ng/mL
Gleason Score: [Gleason Score]
Clinical Stage: [Clinical Stage]
Number of Positive Biopsy Cores: [Number of Positive Cores]”

The estimates generated by ChatGPT-4o (probabilities of OCD, ECE, SVI, LNI) were recorded.
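The standardized prompting step can be sketched programmatically. The template below reproduces the prompt quoted above; the helper function and its parameter names are illustrative assumptions, not the study's actual tooling:

```python
# Template reproducing the standardized prompt quoted in the Methods;
# the placeholders correspond to the bracketed fields above.
PROMPT_TEMPLATE = (
    "Given the following clinical information for a prostate cancer patient, "
    "would you estimate the risks of organ-confined disease, extracapsular "
    "spread, seminal vesicle invasion, and lymph node involvement as a "
    "percentage?\n"
    "Age: {age} years\n"
    "PSA: {psa} ng/mL\n"
    "Gleason Score: {gleason}\n"
    "Clinical Stage: {stage}\n"
    "Number of Positive Biopsy Cores: {cores}"
)

def build_prompt(age: int, psa: float, gleason: str, stage: str, cores: int) -> str:
    """Fill the standardized template for one synthetic scenario (illustrative helper)."""
    return PROMPT_TEMPLATE.format(age=age, psa=psa, gleason=gleason,
                                  stage=stage, cores=cores)

print(build_prompt(64, 8.5, "3+4=7", "cT2a", 4))
```

Using one fixed template for every scenario keeps the chatbot inputs comparable across risk groups.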

Statistical Analysis
Statistical analysis of the data obtained was performed using SPSS Statistics Version 28.0 (IBM Corp., Armonk, NY, USA). Quantitative data are presented as median and interquartile range. The Kolmogorov-Smirnov test was used to assess the normality of the data distribution. The strength and direction of the relationship between ChatGPT-4o estimates and MSKCC nomogram estimates were assessed using the Spearman correlation coefficient (r). Correlation analyses were performed separately in each risk group (low, intermediate, high and locally advanced risk) as well as in the overall patient group. A p value ≤ 0.05 was considered statistically significant in all analyses.

Ethical Statement
Since this study used synthetically generated patient scenarios instead of real patient data, ethics committee approval was not required. The study was conducted in accordance with general research ethical principles.



RESULTS

Considering all 40 patient scenarios, an overall significant positive correlation was found between the predictions provided by ChatGPT-4o and the MSKCC nomogram. In particular, a strong correlation (r=0.971, p<0.001) was found between the OCD predictions. Similarly, ECE (r=0.979, p<0.001), SVI (r=0.976, p<0.001) and LNI (r=0.972, p<0.001) predictions also exhibited high and significant positive correlations (Table 2).

Risk group-specific differences were observed in the analyses conducted by risk group. In the low-risk patient group, significant positive correlations were found for the OCD (r=0.780, p=0.008) and ECE (r=0.872, p=0.001) predictions, whereas no significant correlation was observed for the SVI (r=0.504, p=0.137) or LNI (r=0.272, p=0.447) predictions. In the intermediate-risk patient group, significant positive correlations were found between ChatGPT-4o and the MSKCC nomogram in all prediction topics: OCD (r=0.851, p=0.002), ECE (r=0.851, p=0.002), SVI (r=0.936, p<0.001) and LNI (r=0.873, p<0.001) predictions showed a high degree of agreement. In the high-risk patient group, no statistically significant correlation was found between the predictions of OCD (r=0.521, p=0.123), SVI (r=0.382, p=0.276) and LNI (r=0.218, p=0.546) (p>0.05); however, a significant correlation was found in the ECE prediction (r=0.737, p=0.015). In the locally advanced patient group, no significant correlation was detected between OCD (r=0.522, p=0.122) and ECE (r=0.524, p=0.120) estimates (p>0.05), whereas strong and significant correlations were observed between the MSKCC nomogram and ChatGPT-4o for SVI (r=0.888, p<0.001) and LNI (r=0.899, p<0.001).



DISCUSSION

The use of AI models in medicine is rapidly increasing, and various studies have investigated prognostic prediction in prostate cancer (11). In the existing literature, AI is reported to show promising results in prostate cancer diagnosis and staging by combining imaging, pathology and clinical data (11,14). However, studies directly comparing AI chatbots with clinical risk nomograms and examining performance differences, especially in complex patient groups, are limited. Our study is an important step towards filling the knowledge gap in this field and emphasizes the need for a careful validation process before clinical use of AI. Considering that traditional nomograms have undergone years of validation based on specific clinical parameters, AI needs to be tested with similar rigor.

This study focused on comparing the predictions provided by ChatGPT-4o, an AI-based chatbot, and the MSKCC nomogram commonly used in clinical practice for preoperative risk prediction in prostate cancer. Our findings revealed that ChatGPT-4o was highly correlated with the nomogram in general, but exhibited significant inconsistencies in certain prediction topics, especially in the high-risk and locally advanced patient groups. These results are critical to understanding the potential and current limitations of AI-based tools in clinical practice.
The overall analysis of our study showed high and significant positive correlations between ChatGPT-4o and the MSKCC nomogram for OCD, ECE, SVI and LNI. In particular, a strong correlation was found between OCD predictions; similarly, ECE, SVI and LNI predictions also exhibited overall high and significant positive correlations. This finding suggests that ChatGPT-4o can produce outputs similar to traditional methods in complex clinical decision support processes such as prostate cancer risk prediction, thanks to its capacity to learn from large data sets. The strong correlations observed in the low- and intermediate-risk patient groups also support this potential: in these groups, except for the LNI prediction in the low-risk group, all other predictions showed significant correlations. However, the most striking findings of our study are the discrepancies in the high-risk and locally advanced patient groups. In the high-risk group, there was no statistically significant correlation between the estimates of OCD, SVI and LNI. Similarly, no significant correlation was found in the predictions of OCD and ECE in the locally advanced patient group. This suggests that ChatGPT-4o may not produce predictions as reliable as traditional nomograms, especially when the disease is more advanced and complex. The discrepancies observed in the high-risk and locally advanced groups may be explained by several factors. Large language models such as ChatGPT-4o are primarily trained on general internet-based sources rather than curated, domain-specific medical datasets. As a result, their ability to accurately represent rare or complex clinical scenarios remains limited. Nomograms, in contrast, are derived from large patient cohorts with detailed clinical and pathological annotations, allowing them to more precisely model the heterogeneity of advanced disease.
In these groups, tumor biology is often more aggressive and unpredictable, with greater variability in features such as extracapsular spread patterns, seminal vesicle involvement, and nodal dissemination. Subtle distinctions in staging parameters (e.g., between cT3a and cT3b disease) may translate into markedly different risk profiles, but such nuances are difficult for a language-based model to capture without access to structured radiological, pathological, or molecular data. Furthermore, while ChatGPT-4o generates probability estimates by identifying linguistic patterns, it lacks true comprehension of the underlying pathophysiological mechanisms. These limitations collectively help to explain the reduced concordance with nomogram predictions in the most clinically complex patient groups.

Our study provides a controlled comparison using synthetic patient scenarios representing the risk groups, which eliminates the variability of real patient data and allows direct comparison of ChatGPT-4o and nomogram outputs. Furthermore, the use of the MSKCC nomogram, a validated tool widely used in clinical practice, as the reference increases the clinical validity of the results. On the other hand, the study has some limitations. The use of synthetic patient scenarios may not fully reflect the heterogeneity and clinical nuances of real-world patient populations. The use of only a single AI chatbot (ChatGPT-4o) and a single nomogram (MSKCC) may limit the generalizability of the results. Furthermore, although 40 patient scenarios were sufficient for the statistical analyses, the smaller number of cases in the subgroups (10 scenarios in each risk group) may have prevented weaker correlations from reaching statistical significance.

In conclusion, our findings suggest that ChatGPT-4o may be a promising tool in the field of prostate cancer risk prediction, but it exhibits significant inconsistencies compared with existing nomograms, especially in complex scenarios such as high-risk and locally advanced disease. These findings emphasize the need for extensive validation and development studies on larger, real patient cohorts before AI can be widely used in clinical practice. Future research should focus on the specific training of AI models with medical data and their integration as a decision support tool for physicians.



CONCLUSION

An overall high correlation between ChatGPT-4o and the MSKCC nomogram was demonstrated for prostate cancer risk prediction. However, no significant correlation was observed for several predictions in the high-risk and locally advanced patient groups. These findings suggest that while AI chatbots are a potential tool for prostate cancer risk stratification, they require extensive validation and development studies before clinical use, especially in more complex and advanced cases.



Acknowledgement

Conflict of Interest: The authors declare no conflict of interest.

Funding/Financial Disclosure: No financial support was received for this study.

Ethical Approval: Since this study used synthetically generated patient scenarios instead of real patient data, ethics committee approval was not required. The study was conducted in accordance with general research ethical principles.

Author Contributions: Concept and Design: SG, MGK, SK, FP, SY. Supervision: EK. Data Collection and/or Analysis: SG, MGK, SK, FP, SY. Analysis and/or Interpretation: SG, MGK, SK, FP, SY. Literature Search: SG, MGK, SK, FP, SY. Writing: SG, MGK, SK, FP, SY. Critical Review: SG, MGK, SK, FP, SY, EK.



REFERENCES

1.    Culp MB, Soerjomataram I, Efstathiou JA, Bray F, Jemal A. Recent Global Patterns in Prostate Cancer Incidence and Mortality Rates. Eur Urol. 2020;77(1):38-52. https://doi.org/10.1016/j.eururo.2019.08.005 
2.    Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209-249. https://doi.org/10.3322/caac.21660
3.    Wilczak W, Wittmer C, Clauditz T, Minner S, Steurer S, Büscheck F, et al. Marked Prognostic Impact of Minimal Lymphatic Tumor Spread in Prostate Cancer. Eur Urol. 2018;74(3):376-386. https://doi.org/10.1016/j.eururo.2018.05.034
4.    Cornford P, van den Bergh RCN, Briers E, Van den Broeck T, Brunckhorst O, Darraugh J, et al. EAU-EANM-ESTRO-ESUR-ISUP-SIOG Guidelines on Prostate Cancer-2024 Update. Part I: Screening, Diagnosis, and Local Treatment with Curative Intent. Eur Urol. 2024;86(2):148-163. https://doi.org/10.1016/j.eururo.2024.03.027
5.    Mikel Hubanks J, Boorjian SA, Frank I, Gettman MT, Houston Thompson R, Rangel LJ, et al. The presence of extracapsular extension is associated with an increased risk of death from prostate cancer after radical prostatectomy for patients with seminal vesicle invasion and negative lymph nodes. Urol Oncol. 2014;32(1):26.e1-7. https://doi.org/10.1016/j.urolonc.2012.09.002
6.    Tollefson MK, Karnes RJ, Rangel LJ, Bergstralh EJ, Boorjian SA. The impact of clinical stage on prostate cancer survival following radical prostatectomy. J Urol. 2013;189(5):1707-12. https://doi.org/10.1016/j.juro.2012.11.065
7.    Eifler JB, Feng Z, Lin BM, Partin MT, Humphreys EB, Han M, et al. An updated prostate cancer staging nomogram (Partin tables) based on cases from 2006 to 2011. BJU Int. 2013;111(1):22-9. https://doi.org/10.1111/j.1464-410X.2012.11324.x
8.    Ohori M, Kattan MW, Koh H, Maru N, Slawin KM, Shariat S, et al. Predicting the presence and side of extracapsular extension: a nomogram for staging prostate cancer. J Urol. 2004;171(5):1844-9; discussion 1849. https://doi.org/10.1097/01.ju.0000121693.05077.3d
9.    Cimino S, Reale G, Castelli T, Favilla V, Giardina R, Russo GI, et al. Comparison between Briganti, Partin and MSKCC tools in predicting positive lymph nodes in prostate cancer: a systematic review and meta-analysis. Scand J Urol. 2017;51(5):345-350. https://doi.org/10.1080/21681805.2017.1332680
10.    Huang C, Song G, Wang H, Lin Z, Wang H, Ji G, et al. Preoperative PI-RADS Version 2 scores helps improve accuracy of clinical nomograms for predicting pelvic lymph node metastasis at radical prostatectomy. Prostate Cancer Prostatic Dis. 2020;23:116–26. https://doi.org/10.1038/s41391-019-0164-z
11.    Wang H, Xia Z, Xu Y, Sun J, Wu J. The predictive value of machine learning and nomograms for lymph node metastasis of prostate cancer: a systematic review and meta-analysis. Prostate Cancer Prostatic Dis. 2023;26(3):602-613. https://doi.org/10.1038/s41391-023-00704-z
12.    Görtz M, Baumgärtner K, Schmid T, Muschko M, Woessner P, Gerlach A, et al. An artificial intelligence-based chatbot for prostate cancer education: Design and patient evaluation study. Digit Health. 2023;9:20552076231173304. https://doi.org/10.1177/20552076231173304
13.    Belge Bilgin G, Bilgin C, Childs DS, Orme JJ, Burkett BJ, Packard AT, et al. Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer 177Lu-PSMA-617 therapy. Front Oncol. 2024;14:1386718. https://doi.org/10.3389/fonc.2024.1386718
14.    Twilt JJ, van Leeuwen KG, Huisman HJ, Fütterer JJ, de Rooij M. Artificial Intelligence Based Algorithms for Prostate Cancer Classification and Detection on Magnetic Resonance Imaging: A Narrative Review. Diagnostics (Basel). 2021;11(6):959. https://doi.org/10.3390/diagnostics11060959
 

