This prospective, multicenter observational study followed a prespecified and published study protocol [13]. It was conducted at the surgical departments of the St Antonius Hospital in Nieuwegein and Tergooi Hospital in Hilversum, two large non-academic teaching hospitals in the Netherlands. The study was approved by the medical ethics committee of the St Antonius Hospital, which determined that informed consent was not required. We adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [14].
Participants
Patients aged ≥ 18 years who underwent elective colorectal surgery with a primary intestinal anastomosis (i.e., ileo-colic, colonic, or colorectal) between October 2013 and May 2019 were eligible for inclusion. Patients undergoing emergency surgery for colonic perforation were excluded. The study population comprised both patients who underwent conventional colorectal surgery and patients who underwent cytoreductive surgery with hyperthermic intraperitoneal chemotherapy (CRS-HIPEC).
Clinical assessment
For each included patient, the attending physician assigned a daily subjective probability score (ranging from 0 to 100) estimating the likelihood of anastomotic leakage, based solely on clinical assessment. This clinical assessment was derived from parameters such as the patient’s overall condition, vital signs, abdominal symptoms, and postoperative course. As part of routine postoperative care, physicians, including junior doctors, were required to assess mental status, the presence of peritoneal signs (yes/no), and bowel function (e.g., passage of stool), which informed the overall clinical judgment. Each patient was assessed once daily by the physician responsible for postoperative care on that day; assessments were not duplicated across clinician groups. Laboratory values, including C-reactive protein and leukocyte count, were deliberately excluded from the clinical assessment to isolate the diagnostic performance of bedside clinical judgment. This approach reflects clinical situations in which laboratory results are unavailable, delayed, or nonspecific, and enhances generalizability across postoperative settings. No formal scoring guidelines were provided, as the aim was to evaluate diagnostic accuracy under routine clinical conditions.
Clinical assessments were defined as the first postoperative evaluation of the day. Surgeons were not consulted only in response to clinical deterioration; rather, their assessments reflected routine postoperative care. On weekdays, junior doctors typically performed the first daily assessment, whereas on weekends this role was more often fulfilled by colorectal surgeons.
Reference standard
Patients were followed for the occurrence of anastomotic leakage during the hospital admission, within 30 days after surgery. Anastomotic leakage, the reference standard for diagnosis, was defined as anastomotic dehiscence observed during re-laparoscopy or re-laparotomy, or as drainage of pus from a collection in contact with the anastomosis, by either the percutaneous or the transanal route. Anastomotic leakage severity was graded according to the International Study Group of Rectal Cancer (ISREC) definition [15]. Because the reference standard was intervention-based, the present study primarily captures clinically relevant anastomotic leakage requiring active therapeutic intervention (ISREC grade B and C). Asymptomatic leaks managed without active intervention (ISREC grade A) could not be reliably identified within the study design. Patients were assessed daily. Patients who developed anastomotic leakage were considered negative on the days before diagnosis and positive on the day the leakage was detected; the days after diagnosis were excluded from further analysis.
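The day-level labeling rule described above can be illustrated with a minimal Python sketch. The function name and the encoding (1 = positive, 0 = negative, None = excluded) are hypothetical and chosen only for illustration; labeling every assessed day of a patient without leakage as negative is an assumption that the text implies but does not state.

```python
from typing import Dict, Optional


def label_patient_days(n_days: int, leak_day: Optional[int]) -> Dict[int, Optional[int]]:
    """Label each postoperative day of one patient (hypothetical helper).

    Returns a mapping {day: label} with 1 = positive, 0 = negative and
    None = excluded from analysis.
    """
    labels: Dict[int, Optional[int]] = {}
    for day in range(1, n_days + 1):
        if leak_day is None or day < leak_day:
            # Days before diagnosis are negative. Assumption: patients who
            # never develop leakage are negative on every assessed day.
            labels[day] = 0
        elif day == leak_day:
            # The day on which leakage is detected is the positive case.
            labels[day] = 1
        else:
            # Days after diagnosis are excluded from further analysis.
            labels[day] = None
    return labels


# A patient assessed for 5 days with leakage detected on day 3.
example = label_patient_days(5, 3)
```

Under this rule, a patient contributes at most one positive observation, and every later assessment of that patient is dropped.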
Groups stratified by experience
Clinicians were divided into two groups based on their level of experience. The first group consisted of junior doctors, that is, doctors not enrolled in a surgical training program, typically in their first or second year after completing a six-year medical curriculum. The second group consisted of colorectal surgeons. Residents in surgical training were excluded from the analysis because of the considerable variability in clinical experience within this group (ranging from 1 to 6 years).
Because the inclusion period spanned more than five years, a large and dynamic group of clinicians contributed to postoperative assessments. This included several dozen junior doctors rotating through the surgical wards over time, as well as a stable group of board-certified gastrointestinal surgeons. Because clinicians entered and left clinical service during the study period, the exact number of individual assessors per group could not be reliably reconstructed. This reflects routine clinical practice and enhances the generalizability of the findings.
Statistical analysis
Descriptive statistics were used to summarize baseline characteristics and clinical scores. Normality of continuous variables was assessed visually. Normally distributed variables were reported as means with standard deviations and compared using the t-test; non-normally distributed variables were reported as medians with interquartile ranges (IQR) and compared using the Wilcoxon rank-sum test. Categorical variables were summarized as counts with percentages and compared using the chi-squared test or Fisher’s exact test, as appropriate. Statistical analyses were performed by the authors, with statistical expertise available within the research team.
Discrimination
Receiver operating characteristic (ROC) curves were constructed for both groups. The area under the curve (AUC) was calculated with 95% confidence intervals (CIs) to quantify diagnostic discrimination. Diagnostic accuracy, expressed as the AUC, was compared between groups using DeLong’s test.
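To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen positive observation receives a higher clinical score than a randomly chosen negative one, with ties counted as one half. This is the standard rank-based (Mann-Whitney) formulation of the AUC, written in plain Python for illustration only; the study itself used R, and the text does not state which routine produced the AUC. The function name and the example data are hypothetical, and neither confidence intervals nor DeLong’s test are reproduced here.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: P(score of positive > score of negative), ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a concordant pair
    return wins / (len(positives) * len(negatives))


# Hypothetical clinical scores (0 to 100) and day-level outcomes.
example_auc = auc([10, 50, 50, 90], [0, 1, 0, 1])
```

The pairwise loop is quadratic in the number of observations, which is acceptable for a sketch; dedicated ROC routines use a sorted-rank computation instead.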
Calibration
Calibration, reflecting the agreement between predicted and observed probabilities of anastomotic leakage, was assessed visually using calibration plots for each clinician group. The subjective clinical scores (ranging from 0 to 100) were grouped into five bins (0–20, 20–40, 40–60, 60–80, and 80–100) to enhance interpretability and ensure sufficient observations within each risk category. The observed proportion of anastomotic leakage within each bin was then plotted against the corresponding mean clinical score.
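The binning step behind such a calibration plot can be sketched as follows. The text does not specify how scores on a bin boundary (20, 40, 60, 80) were assigned, so the sketch assumes left-closed bins with a score of 100 placed in the last bin; the function name and return layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    scores: Sequence[float],
    outcomes: Sequence[int],
    width: int = 20,
    n_bins: int = 5,
) -> List[Tuple[int, int, float, float, int]]:
    """Return (lower, upper, mean score, observed proportion, n) per non-empty bin."""
    score_sums = [0.0] * n_bins
    events = [0] * n_bins
    counts = [0] * n_bins

    for score, outcome in zip(scores, outcomes):
        # Assumption: bins are [0, 20), [20, 40), ..., [80, 100];
        # the top score of 100 falls into the last bin.
        idx = min(int(score // width), n_bins - 1)
        score_sums[idx] += score
        events[idx] += outcome
        counts[idx] += 1

    rows = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue  # empty bins yield no point on the calibration plot
        rows.append((
            i * width,
            (i + 1) * width,
            score_sums[i] / counts[i],   # x-axis: mean clinical score
            events[i] / counts[i],       # y-axis: observed leakage proportion
            counts[i],
        ))
    return rows


# Hypothetical scores and outcomes (1 = leakage detected that day).
example_rows = calibration_bins([5, 15, 25, 100], [0, 0, 1, 1])
```

Plotting the observed proportion against the mean score divided by 100 puts both axes on the probability scale, so perfect calibration lies on the diagonal.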
Sensitivity analysis
Patients were not assessed by both junior doctors and surgeons, which could introduce bias. To evaluate this, vital signs (heart rate, systolic blood pressure, respiratory rate, and temperature) and clinical scores at the time of assessment were compared between observations made by junior doctors and those made by surgeons.
Missing data
Missing data occurred when the clinical scores were not completed by the responsible physician. These missing values were assumed to be missing completely at random or missing at random. Missing values in the clinical score were addressed using multiple imputation by chained equations (MICE) with predictive mean matching. The imputation model included patient-level predictors (e.g., age, heart rate, blood pressure, temperature, and clinical symptoms), scores from adjacent days, and the final anastomotic leakage outcome. Twenty imputed datasets were generated, and diagnostic metrics were pooled using Rubin’s rules. To preserve the integrity of group-level comparisons, observations with missing assessor group data were excluded from the analysis.
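Rubin’s rules combine the per-dataset results into one estimate whose variance accounts for both sampling error and imputation uncertainty. With m imputed datasets, the pooled estimate is the mean of the m estimates, and the total variance is W + (1 + 1/m) B, where W is the mean within-imputation variance and B is the between-imputation variance of the estimates. The plain-Python sketch below illustrates that arithmetic only; the function name and the numbers are hypothetical, and the text does not state on which scale (for example, raw or transformed AUC) the metrics were pooled.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates; return (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")

    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # W: average within-imputation variance
    between = variance(estimates)     # B: sample variance across imputations (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates and variances from three imputed datasets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The square root of the total variance is the pooled standard error; a complete implementation would also derive the degrees of freedom used for the confidence interval, which this sketch omits.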
All statistical analyses were performed using R version 4.3.1 with the ‘mice’ package. A two-sided p-value of < 0.05 was considered statistically significant [16, 17].