DISCUSSION
The use of AI models in medicine is rapidly increasing, and various studies have examined prognostic prediction in prostate cancer (11). In the existing literature, AI is reported to show promising results in prostate cancer diagnosis and staging by combining imaging, pathology and clinical data (11,14). However, studies that directly compare AI chatbots with clinical risk nomograms and examine performance differences, especially in complex patient groups, are limited. Our study is an important step towards filling this knowledge gap and emphasizes the need for careful validation before AI is used clinically. Given that traditional nomograms have undergone years of validation against specific clinical parameters, AI tools must be tested with similar rigor.
This study compared the predictions of ChatGPT-4o, an AI-based chatbot, with those of the MSKCC nomogram, a tool commonly used in clinical practice for preoperative risk prediction in prostate cancer. Our findings revealed that ChatGPT-4o's predictions were highly correlated with the nomogram's overall but exhibited significant inconsistencies for certain prediction endpoints, especially in the high-risk and locally advanced patient groups. These results are critical to understanding both the potential and the current limitations of AI-based tools in clinical practice.
The overall analysis of our study showed high, significant positive correlations between ChatGPT-4o and the MSKCC nomogram for OCD, ECE, SVI and LNI. The correlation was strongest for the OCD predictions, and the ECE, SVI and LNI predictions likewise showed high, significant positive correlations overall. This suggests that ChatGPT-4o, owing to its capacity to learn from large datasets, can produce outputs similar to those of traditional methods in complex clinical decision support tasks such as prostate cancer risk prediction. The strong correlations observed in the low- and intermediate-risk patient groups support this potential: apart from the LNI prediction in the low-risk group, all predictions in these groups were significantly correlated. The most striking findings of our study, however, are the discrepancies in the high-risk and locally advanced patient groups. In the high-risk group, there was no statistically significant correlation for the OCD, SVI and LNI estimates; likewise, no significant correlation was found for the OCD and ECE predictions in the locally advanced group. This suggests that ChatGPT-4o may not produce predictions as reliable as those of traditional nomograms, especially when the disease is more advanced and complex.
The discrepancies observed in the high-risk and locally advanced groups may be explained by several factors. Large language models such as ChatGPT-4o are trained primarily on general internet-based sources rather than curated, domain-specific medical datasets, so their ability to accurately represent rare or complex clinical scenarios remains limited. Nomograms, in contrast, are derived from large patient cohorts with detailed clinical and pathological annotations, allowing them to model the heterogeneity of advanced disease more precisely. In these groups, tumor biology is often more aggressive and unpredictable, with greater variability in features such as extracapsular spread patterns, seminal vesicle involvement and nodal dissemination. Subtle distinctions in staging parameters (e.g., between cT3a and cT3b disease) may translate into markedly different risk profiles, yet such nuances are difficult for a language-based model to capture without access to structured radiological, pathological or molecular data. Furthermore, while ChatGPT-4o generates probability estimates by identifying linguistic patterns, it lacks true comprehension of the underlying pathophysiological mechanisms. Together, these limitations help explain the reduced concordance with nomogram predictions in the most clinically complex patient groups.
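To make the comparison concrete, the minimal sketch below shows how paired probability estimates from the two tools can be compared with a rank-based correlation such as Spearman's; the variable names and values are hypothetical illustrations and do not reproduce the study data.

```python
# Minimal sketch: comparing paired probability estimates (%) from a nomogram
# and an AI chatbot with Spearman's rank correlation. All values are
# hypothetical illustrations, not the study data.
from scipy.stats import spearmanr

# Hypothetical organ-confined disease (OCD) estimates for 10 scenarios
nomogram_ocd = [85, 72, 60, 44, 30, 78, 55, 41, 25, 66]
chatgpt_ocd = [80, 75, 58, 50, 35, 70, 60, 38, 28, 62]

rho, p_value = spearmanr(nomogram_ocd, chatgpt_ocd)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
```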
Our study's use of synthetic patient scenarios representing each risk group provides a controlled comparison: it eliminates the variability inherent in real patient data and allows direct comparison of ChatGPT-4o and nomogram outputs. Furthermore, benchmarking against the MSKCC nomogram, a validated tool widely used in clinical practice, strengthens the clinical relevance of the results. On the other hand, the study has some limitations. Synthetic patient scenarios may not fully reflect the heterogeneity and clinical nuances of real-world patient populations. The use of a single AI chatbot (ChatGPT-4o) and a single nomogram (MSKCC) may limit the generalizability of the results. Finally, although 40 patient scenarios were sufficient for the overall statistical analyses, the small number of cases in the subgroups (10 scenarios per risk group) may have prevented smaller correlations from reaching statistical significance, as illustrated below.
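To illustrate this statistical power constraint, the following sketch computes the approximate minimum correlation needed to reach two-tailed significance at alpha = 0.05 with only 10 paired observations, using the standard t-approximation for a correlation coefficient; this is an illustrative calculation, not part of the study's analysis.

```python
# Illustrative calculation: the smallest correlation that can reach two-tailed
# significance at alpha = 0.05 with n = 10 paired observations, using the
# t-approximation t = r * sqrt(n - 2) / sqrt(1 - r**2).
from math import sqrt
from scipy.stats import t

n = 10
df = n - 2
t_crit = t.ppf(0.975, df)            # two-tailed critical t at alpha = 0.05
r_crit = t_crit / sqrt(t_crit**2 + df)
print(f"n = {n}: |r| must exceed about {r_crit:.2f} to reach p < 0.05")
# Prints roughly 0.63: under this approximation, even moderately strong
# agreement cannot reach significance in subgroups of this size.
```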
In conclusion, our findings suggest that ChatGPT-4o may be a promising tool for prostate cancer risk prediction, but it exhibits significant inconsistencies compared with existing nomograms, especially in complex scenarios such as high-risk and locally advanced disease. These findings emphasize the need for extensive validation and development studies on larger, real patient cohorts before AI can be widely used in clinical practice. Future research should focus on training AI models specifically with medical data and on integrating them as decision support tools for physicians.