The shift from human-controlled robotics to autonomous robotic surgery opens new frontiers for technologies that enhance surgical precision and improve patient outcomes. Available data suggest that robotics offers enhanced precision, reduced morbidity, and quicker recovery [18]. Iterative steps towards machine-automated endourology would therefore be a natural evolution. One specific task in this field is the development of an automated assistant for the earliest detection of fragmentation while dusting during retrograde intrarenal laser lithotripsy.
Laser dusting offers significant advantages over traditional fragmenting in kidney stone surgery. It can reduce operative times by up to 40% because it pulverizes stones into extremely fine fragments that can pass spontaneously in urine, eliminating the need for active fragment retrieval with basket extraction. The dusting approach is associated with shorter procedure times and a lower risk of ureteral damage compared to traditional fragmentation [2]. This technique also decreases stone retropulsion during the procedure, minimizing the need to chase migrated stones and potentially reducing postoperative complications.
One nuance of laser lithotripsy is selecting the settings that produce optimal dusting. Typically, dusting is achieved with low energy and high frequency (e.g., 0.4 J at 20 Hz). Vendors often recommend preset dusting settings, but these rest on assumptions that do not always hold for an individual stone. Instead, the operator must toggle through settings until optimal dusting is achieved. Machine learning could plausibly help to automate this step. Therefore, in this study, we developed a pipeline to facilitate autonomous fragment detection.
Our results demonstrated mixed performance overall, with better performance at the frame level than at the segment level. We extracted images at 30 frames per second to match the native recording frame rate of the Dornier Axis ureteroscope. This proved critical because fragmentation events are often brief, lasting as little as 150 milliseconds (as few as 3–5 frames). To adequately characterize moments of fragmentation we therefore had to use a very high frame rate, which in turn led to much higher false negative rates. Similarly, while Lu et al. have had success segmenting kidney stones and fragments, even during laser lithotripsy, small fragments, especially during ablation, have remained a challenge [6]. Furthermore, we used a conservative fragment threshold of approximately 1 mm; a more liberal threshold of 2 or 3 mm may have yielded improved results.
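The frame budget implied by these figures can be checked with a short calculation. The following is a minimal sketch for illustration only, not part of the study pipeline; the function name `frames_spanned` and the lower comparison frame rates are assumptions introduced here.

```python
def frames_spanned(duration_ms: float, fps: float) -> float:
    """Return how many frame intervals fit inside an event of the given duration."""
    return duration_ms * fps / 1000.0


if __name__ == "__main__":
    # 150 ms is the brief event duration quoted in the text; 30 fps is the
    # native recording rate. The 15 and 10 fps rows are hypothetical
    # comparisons showing how quickly a brief event shrinks at lower rates.
    for fps in (30, 15, 10):
        print(f"{fps} fps -> {frames_spanned(150, fps)} frames")
```

At 30 frames per second a 150 ms event spans about 4.5 frame intervals, which agrees with the 3–5 frames noted above and shows how little temporal margin remains if frames are subsampled.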
In both the frame-level and segment-level analyses, recall was moderate and precision was low. That is, the algorithm detected fragments with moderate sensitivity but often overcalled them. The observed quantitative performance limitations, particularly the high false positive rate and low precision, likely stem from inherent challenges within each stage of our chosen pipeline and the complexity of the surgical video data itself. Firstly, the initial segmentation using general-purpose models like CLIPSeg may struggle with the specific visual artifacts of ureteroscopy, such as dust clouds, reflections, and low-resolution fragments, potentially leading to inaccurate point initialization for tracking. Secondly, CoTracker, designed primarily for consistent point tracking, is sensitive to occlusions, rapid motion, and domain shifts common in surgical videos; interpreting its instability as solely indicative of fragmentation is likely a major source of false positives. Finally, relying on HDBSCAN to detect fragmentation based purely on changes in the spatial clustering of points may be insufficient, as it does not explicitly model the distinct motion patterns associated with a fragment breaking off and could misinterpret unrelated point cloud shifts. The limitations inherent in adapting general-purpose AI tools for this specific surgical application underscore the difficulty of achieving reliable fragmentation event detection with the current methodology.
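The relationship between overcalling and low precision can be made concrete with the standard definitions of precision and recall. The sketch below is illustrative only: the counts are hypothetical values chosen to reproduce the qualitative pattern described above and are not results from this study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from true positive, false positive,
    and false negative counts, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, NOT study data: many false positives (overcalling)
    # alongside a moderate number of missed events.
    tp, fp, fn = 60, 140, 40
    p, r = precision_recall(tp, fp, fn)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.30 recall=0.60
```

With these assumed counts, most true events are found (recall 0.60), yet fewer than one in three flagged events is real (precision 0.30), which is the pattern of moderate sensitivity with frequent overcalling.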
Future ureteroscopy-specific vision models must therefore overcome the unusual hurdles intrinsic to the procedure: it takes place in fluid, with large amounts of dust artifact and sometimes bleeding. Furthermore, there can be significant motion blur due to ureteroscope movement, patient respiration, and/or pulsations from nearby vessels. Accurately labeling these data is challenging because the ground truth for any given image or clip may not be known even to the surgeon; nevertheless, we feel well-labeled data are critical for fine-tuning general models for optimal performance in computer vision tasks.
Despite the challenges encountered, our study represents a novel investigation into the feasibility of detecting stone fragmentation events during ureteroscopy using a unique combination of contemporary AI technologies. Our approach diverges from traditional event detection methods by leveraging the instability characteristics of a point tracking foundation model, CoTracker, as a proxy signal for fragmentation. Specifically, we explored whether the disruption in tracking points, initialized via semantic segmentation (CLIPSeg) and identified through density-based clustering (HDBSCAN), could correlate with the physical event of a stone fragmenting. While the performance shows many limitations, this work contributes valuable insights into the application boundaries of these powerful general-purpose models in complex, dynamic surgical environments, and it explores an unconventional methodology for event detection in a data-scarce domain. These findings indicate the need for specifically designed models or fine-tuned methods that can address the unique challenges of analyzing fragmentation events more accurately. Although our pipeline’s performance was modest, it can be applied to future laser lithotripsy videos to accelerate data labeling and support the creation of larger datasets for more robust training and improved fragmentation detection.
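The idea of using a change in point clustering as a proxy signal can be sketched on synthetic data. The code below is a simplified illustration under stated assumptions and is not the implementation used in this study: a basic distance-threshold grouping stands in for HDBSCAN, the per-frame point lists stand in for CoTracker output, and the names `count_clusters`, `candidate_event_frames`, and the `radius` parameter are hypothetical.

```python
from math import dist

Point = tuple[float, float]


def count_clusters(points: list[Point], radius: float) -> int:
    """Count groups of points connected by chains of neighbours within `radius`.

    This is a simple stand-in for density-based clustering, not HDBSCAN.
    """
    unvisited = set(range(len(points)))
    clusters = 0
    while unvisited:
        stack = [unvisited.pop()]  # seed a new cluster
        clusters += 1
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
    return clusters


def candidate_event_frames(frames: list[list[Point]], radius: float) -> list[int]:
    """Return indices of frames whose cluster count exceeds the previous frame's."""
    counts = [count_clusters(f, radius) for f in frames]
    return [t for t in range(1, len(counts)) if counts[t] > counts[t - 1]]


if __name__ == "__main__":
    # Synthetic tracked points: one compact group, then one point separates.
    frames = [
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (6.0, 6.0)],
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (9.0, 9.0)],
    ]
    print(candidate_event_frames(frames, radius=1.5))  # [2]
```

Note that this rule fires on any split in the point cloud, including a tracked point drifting because of occlusion or dust, which is consistent with the false positives discussed above.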