PhD Dissertation · University of Illinois Chicago · 2026

Towards Autonomous Robot-Assisted Surgery

Teaching a surgical robot to see, understand, and assist — working toward automating one of the world's most common operations: removing the gallbladder.

Live system: the robot grasps the gallbladder, stretches the tissue, and follows the cut line on its own.

755,000+ annotated stereo frames in the dataset
7 surgeons recorded, with documented skill levels
3.3× faster autonomous dissection (v2 vs. v1)
0.49 mm boundary-tracking error (sub-millimeter)
0.94–1.13 cm instrument tip error from images alone
4 open-source repositories to build on

The big idea, in plain language

What is this research about?

No medical or robotics background needed. Here is the whole story in four short ideas.

01

Surgeons already use robots

In modern operating rooms, a surgeon often sits at a console and controls thin robotic arms that reach inside the body through tiny incisions — a bit like a very precise video game with real instruments. The most widely used system is the da Vinci surgical robot.

02

The robot still does nothing on its own

Today the robot only moves when the surgeon moves. My question: can it begin to handle parts of an operation by itself — safely, and guided only by what its camera sees inside the body?

03

Gallbladder removal is the perfect first step

A cholecystectomy (removing the small organ that stores bile) is one of the most common operations in the world, and its steps are highly standardized. That predictability makes it an ideal, lower-risk target to start automating.

04

So I gave the robot eyes and a baseline brain

I built the data, the perception, and the control needed for the robot to see the anatomy, decide where to separate tissue, and move precisely — then released everything openly so others can reproduce it and push further.

Meet the da Vinci system at the heart of this work. The surgeon operates from a console (left); the patient-side cart holds the robotic arms and the stereo endoscope that reach inside the body (center); and the vision tower processes and shows the 3D view (right). My research adds perception and autonomy on top of this platform.

How a robot “does surgery on its own” — three jobs it must master

See

Look at the stereo (3D) camera feed and recognize the gallbladder, the liver, and each instrument — in real time.

Perception

Decide

Find the exact boundary where the gallbladder meets the liver, and plan a safe path to separate them.

Planning

Act

Drive the robotic arms to grasp, stretch, and cut along that boundary with sub-millimeter accuracy.

Control

The recording setup and a montage of stereo endoscope frames, robot kinematics, and pedal signals that make up the CRCD dataset.

Contribution 01

CRCD — a window into real robotic surgery

To teach a robot, you first need examples. The Comprehensive Robotic Cholecystectomy Dataset (CRCD) is one of the most complete public recordings of robotic gallbladder removal released to date. Think of it as a richly annotated “textbook” of how experts actually operate.

It was recorded during ex vivo procedures on porcine (pig) livers and brings together, time-synchronized, the main signals a learning system needs:

  • Stereo endoscopic video — the robot's 3D view inside the body
  • Full robot & console kinematics — exactly how every arm and the surgeon's hands moved
  • Foot-pedal signals — when the surgeon delivered cutting/sealing energy (rarely shared publicly)
  • Dense annotations — tissue segmentation and instrument keypoints
755,000+ stereo frames · 7 surgeons, rated by experience · Multimodal & synchronized · CC-BY-4.0

Why it matters: earlier public datasets each lacked at least one of these elements (kinematics, the surgeon's hand motion, pedal signals, or dense labels on a real procedure). CRCD brings them together, so researchers worldwide can train and fairly compare perception, control, and learning models.
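
To make "time-synchronized" concrete, here is a minimal sketch of nearest-timestamp alignment between two streams recorded at different rates. It is an illustration only: the function name `align_nearest`, the sample rates, and the tolerance are assumptions made for this example and are not taken from the CRCD tooling.

```python
import numpy as np

def align_nearest(ref_t, other_t, tol):
    """For each reference timestamp, return the index of the nearest
    timestamp in other_t, or -1 when the gap exceeds tol (seconds).
    Both arrays must be sorted ascending; other_t needs >= 2 entries."""
    ref_t = np.asarray(ref_t, dtype=float)
    other_t = np.asarray(other_t, dtype=float)
    # Index of the first element in other_t that is >= each ref timestamp
    right = np.searchsorted(other_t, ref_t)
    right = np.clip(right, 1, len(other_t) - 1)
    left = right - 1
    # Pick whichever neighbour is closer in time
    use_left = (ref_t - other_t[left]) <= (other_t[right] - ref_t)
    idx = np.where(use_left, left, right)
    # Reject matches that are too far apart to count as simultaneous
    idx[np.abs(other_t[idx] - ref_t) > tol] = -1
    return idx

# Hypothetical rates: 30 Hz video frames, 100 Hz kinematics samples
frame_t = np.arange(30) / 30.0
kin_t = np.arange(100) / 100.0
match = align_nearest(frame_t, kin_t, tol=0.01)
```

Here `match[i]` is the kinematics sample to pair with video frame `i`. Using the slower stream as the reference keeps one row per frame, which is usually the convenient shape for training vision models.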

The upgraded bimanual system: one arm grasps and stretches the tissue while the other follows the cut line.

Contribution 02

A robot that dissects from sight alone

The core demonstration of the thesis: a framework that lets the da Vinci robot separate the gallbladder from the liver using only its stereo camera — no pre-scripted trajectory, no external trackers.

It evolved through two generations. The first (v1) followed a pre-planned path with a single arm. The upgraded second version (v2) is bimanual and adaptive: it automatically grasps the gallbladder, stretches it to expose the boundary, and continuously re-finds the cut line as the tissue changes — delivering energy along the way.

  • Automatic grasping & tissue stretching for a clear, stable boundary
  • Online, boundary-guided cutting that adapts in real time
  • A new “liver-bed” tissue class to keep tracking after each pass — groundwork for multi-round dissection
3.3× faster than v1 (~101 s → ~31 s per pass)
0.49 mm boundary-tracking RMSE (sub-millimeter)
11 + 5 trials on porcine liver & tissue proxies
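
For readers who want to check the headline numbers, the short sketch below shows how a speedup ratio and a root-mean-square error (RMSE) are computed. The per-pass times are the approximate values quoted above. The error samples are hypothetical and only illustrate the formula.

```python
import math

def speedup(t_old, t_new):
    """Ratio of old to new duration; values above 1 mean the new version is faster."""
    return t_old / t_new

def rmse(errors):
    """Root-mean-square of a sequence of signed or unsigned errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Approximate per-pass times for v1 and v2, in seconds
ratio = speedup(101.0, 31.0)

# Hypothetical distances (mm) between the tool tip and the boundary
example_rmse = rmse([0.3, -0.4, 0.5, -0.6])

print(f"speedup: {ratio:.1f}x, example RMSE: {example_rmse:.2f} mm")
```

RMSE penalizes large deviations more than a plain average does, which makes it a common choice for reporting tracking accuracy.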
Step 1 — Grasp. The forceps align, grasp, and pull until the boundary is taut.
Step 2 — Dissect. The hook follows the live-detected boundary while delivering energy.
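
One simple way to picture a "live-detected boundary" is as the set of pixels where two segmentation masks touch. The sketch below finds liver pixels adjacent to the gallbladder mask using a one-pixel, four-neighbour dilation. This is a stand-in technique chosen for illustration, not the method used in the thesis, and the helper names `dilate4` and `boundary_pixels` are hypothetical.

```python
import numpy as np

def dilate4(mask):
    """Grow a boolean mask by one pixel in the four axis directions."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]   # spread each pixel downward
    out[:-1, :] |= mask[1:, :]   # spread each pixel upward
    out[:, 1:] |= mask[:, :-1]   # spread each pixel to the right
    out[:, :-1] |= mask[:, 1:]   # spread each pixel to the left
    return out

def boundary_pixels(gallbladder, liver):
    """Return (row, col) coordinates of liver pixels that touch the gallbladder mask."""
    contact = dilate4(gallbladder) & liver
    return np.argwhere(contact)

# Toy 4x6 scene: left half is gallbladder, right half is liver
gb = np.zeros((4, 6), dtype=bool)
gb[:, :3] = True
lv = ~gb
pts = boundary_pixels(gb, lv)
```

In this toy scene `pts` contains the four pixels of column 3, the first liver column next to the gallbladder. Recomputing the contact set on every frame is what lets a boundary estimate follow tissue as it deforms.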

Under the hood: teaching the robot to recognize what it sees

None of this works without reliable perception. The system identifies anatomy and instruments frame-by-frame, and reconstructs the scene in 3D from the stereo camera. The perception stack matured from Detectron2 to MaskDINO to YOLO11, which gave the most stable, real-time boundaries.
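
The 3D reconstruction step rests on a standard relation for rectified stereo cameras: depth equals focal length times baseline divided by disparity. The sketch below applies that relation with NumPy. The calibration values are hypothetical placeholders, not the endoscope's real parameters, and the function name is illustrative.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.
    Non-positive disparities have no valid depth and map to NaN."""
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full(d.shape, np.nan)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Hypothetical calibration: 1000 px focal length, 5 mm baseline
z = disparity_to_depth([50.0, 100.0, 0.0], focal_px=1000.0, baseline_m=0.005)
```

Because depth is inversely proportional to disparity, a fixed pixel error in matching costs more accuracy for distant points than for near ones. That favors the short working distances typical of endoscopic views.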

Contribution 03

Finding the instruments in 3D — from pictures alone

Robots normally know where their tools are from internal sensors (encoders). But that link can drift, break, or simply be unavailable when learning from video. So I asked: can we recover an instrument's 3D position using nothing but the camera images?

The answer is a baseline framework built on Vision Transformers (ViT). Each instrument gets its own backbone, first trained to segment the tool, then paired with a lightweight stereo head that estimates where the tip is in space. Using one model per instrument worked better than sharing a single model across both.

Left and right camera views. Predicted instrument tip position, recovered purely from the stereo images.

0.94 cm tip error (Permanent Cautery Hook)
1.13 cm tip error (Fenestrated Bipolar Forceps)
Image-only: no robot encoders required at inference

Why it matters: recovering instrument state from images alone is a stepping stone to full 6-degree-of-freedom pose estimation, and to training robots from surgical video where sensor data doesn't exist.
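
A tip error like the ones reported above is typically a distance between predicted and reference 3D positions. The page does not state the exact formula, so the sketch below assumes the mean Euclidean distance. The coordinates are made up for illustration and are not values from the thesis.

```python
import numpy as np

def mean_tip_error(pred_xyz, true_xyz):
    """Mean Euclidean distance between predicted and reference 3D tip positions."""
    pred = np.asarray(pred_xyz, dtype=float)
    true = np.asarray(true_xyz, dtype=float)
    return float(np.linalg.norm(pred - true, axis=1).mean())

# Hypothetical tip positions in centimeters
pred = [[1.0, 2.0, 10.0], [0.0, 0.0, 12.0]]
true = [[1.0, 2.0, 11.0], [3.0, 4.0, 12.0]]
err_cm = mean_tip_error(pred, true)
```

The two samples are off by 1 cm and 5 cm, so `err_cm` comes out to 3.0. In a real evaluation the reference positions would come from the robot's kinematics during recording, while the predictions use images only.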

For researchers & engineers

Use this work as your baseline

Everything is open source. Whether you want the data, real-time surgical vision, da Vinci control, or a vision-only pose model, start from one of these four repositories and build on top.

CRCD

The dataset — data & tools for robotic cholecystectomy research

Dataset · ROS2 msgs · CC-BY-4.0

Synchronized stereo video, full da Vinci kinematics, pedal signals, and dense tissue/instrument annotations from real ex vivo procedures. Includes loaders and tutorial notebooks. Available on Hugging Face and via direct download.

from datasets import load_dataset

# Stream the synchronized dataset straight from the Hub
ds = load_dataset("SITL-Eng/CRCD", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # stereo frames + kinematics + pedal signals

sitl_ros2_cv

Real-time computer vision for the da Vinci stereo endoscope

ROS2 Humble · Python · CUDA · MIT

Tissue segmentation (liver / gallbladder), surgical-instrument keypoint detection, stereo disparity, and 3D point-cloud reconstruction — with YOLO11, Detectron2, and MaskDINO backends. This is the “eyes” of the autonomous dissection system.

cd ~/ros2_ws/src
git clone https://github.com/SITL-Eng/sitl_ros2_cv.git
cd ~/ros2_ws && colcon build --packages-select sitl_ros2_cv
source install/setup.bash

# Live liver / gallbladder segmentation with YOLO11
ros2 launch sitl_ros2_cv yolo_seg_lv_gb.xml

sitl_ros2_dvrk

Control layer for the da Vinci Research Kit (dVRK)

ROS2 Humble · dVRK · Python

Custom kinematics, arm coordination, and the autonomous behaviors (grasp, stretch, dissect) that turn perception into motion on the real robot. Pair it with sitl_ros2_cv to reproduce the dissection demos.

# Requires ROS 2 Humble + the official dVRK ROS2 stack
cd ~/ros2_ws/src
git clone https://github.com/SITL-Eng/sitl_ros2_dvrk.git
git clone https://github.com/SITL-Eng/sitl_ros2_interfaces.git
pip install pyquaternion pyserial
cd ~/ros2_ws && colcon build && source install/setup.bash
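
To give a feel for how grasp, stretch, and dissect behaviors can be sequenced, here is a small state-machine sketch in plain Python. It is a hypothetical illustration and does not reflect the actual code or API of sitl_ros2_dvrk. The `Phase` enum, the `next_phase` function, and its boolean inputs are all assumptions.

```python
from enum import Enum, auto

class Phase(Enum):
    GRASP = auto()
    STRETCH = auto()
    DISSECT = auto()
    DONE = auto()

def next_phase(phase, grasped, taut, boundary_visible):
    """Advance one step of a hypothetical grasp -> stretch -> dissect sequence."""
    if phase is Phase.GRASP:
        # Stay in GRASP until the forceps report a successful grasp
        return Phase.STRETCH if grasped else Phase.GRASP
    if phase is Phase.STRETCH:
        # Keep pulling until the tissue boundary is taut
        return Phase.DISSECT if taut else Phase.STRETCH
    if phase is Phase.DISSECT:
        # Keep cutting while perception still reports a boundary
        return Phase.DISSECT if boundary_visible else Phase.DONE
    return Phase.DONE

# Walk through one idealized pass where every check succeeds
phase = Phase.GRASP
trace = [phase]
for grasped, taut, visible in [(True, False, True), (True, True, True), (True, True, False)]:
    phase = next_phase(phase, grasped, taut, visible)
    trace.append(phase)
```

Keeping the transition logic in a pure function like this makes it easy to unit-test without a robot attached. The ROS 2 nodes would then only supply the sensor-derived flags.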

stereo-endo-pose-vit

Vision-Transformer instrument pose estimation from stereo images

PyTorch · ViT · YOLO11 · uv

The full pipeline behind Contribution 03: train a ViT segmentation backbone on CRCD instrument masks, then train stereo pose heads to regress end-effector state. Includes training/eval scripts, Grad-CAM, and 3D visualization — a clean baseline for vision-only surgical pose estimation.

git clone https://github.com/koh43/stereo-endo-pose-vit.git
cd stereo-endo-pose-vit
uv venv --python 3.12 && source .venv/bin/activate
uv pip install -r requirements.txt

# Train instrument segmentation (step 1 of the pipeline)
python scripts/train_inst_seg_yolo11.py --data-dir /path/to/instrument_segmentation

Peer-reviewed work

Publications

The dissertation builds on these papers, listed below.

  1. JMRR 2025

    Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)

    K.-H. Oh, L. Borgioli, A. Mangano, V. Valle, M. Di Pangrazio, F. Toti, G. Pozza, L. Ambrosini, A. Ducas, M. Žefran, L. Chen, P. C. Giulianotti

    Journal of Medical Robotics Research, 2025

  2. IROS 2025

    Autonomous Dissection in Robotic Cholecystectomy

    K.-H. Oh, L. Borgioli, M. Žefran, V. Valle, P. C. Giulianotti

    IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 11240–11246, 2025

  3. BioRob 2024

    A Framework for Automated Dissection Along Tissue Boundary

    K.-H. Oh, L. Borgioli, M. Žefran, L. Chen, P. C. Giulianotti

    IEEE RAS/EMBS Int. Conf. for Biomedical Robotics and Biomechatronics (BioRob), pp. 1427–1433, 2024

  4. ISMR 2024

    Comprehensive Robotic Cholecystectomy Dataset (CRCD): Integrating Kinematics, Pedal Signals, and Endoscopic Videos

    K.-H. Oh, L. Borgioli, A. Mangano, V. Valle, M. Di Pangrazio, F. Toti, G. Pozza, L. Ambrosini, A. Ducas, M. Žefran, L. Chen, P. C. Giulianotti

    Int. Symposium on Medical Robotics (ISMR), pp. 1–7, 2024

  5. ICRA 2025

    Sensory Glove-Based Surgical Robot User Interface

    L. Borgioli, K.-H. Oh, A. Mangano, A. Ducas, L. Ambrosini, F. Pinto, P. A. Lopez, J. Cassiani, M. Žefran, L. Chen, P. C. Giulianotti

    IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 10487–10493, 2025

Cite this dissertation

@phdthesis{oh2026autonomous,
  author = {Oh, Ki-Hwan},
  title  = {Towards Autonomous Robot-Assisted Surgery},
  school = {University of Illinois Chicago},
  year   = {2026}
}

About the author

Ki-Hwan Oh

Ph.D. in Electrical & Computer Engineering, University of Illinois Chicago

I work at the intersection of surgical robotics, computer vision, and machine learning. My doctoral research, carried out in the Surgical Innovation & Training Lab under Prof. Miloš Žefran and in collaboration with the Division of General, Minimally Invasive and Robotic Surgery, focuses on giving the da Vinci surgical robot the perception and autonomy needed to assist — and eventually automate — steps of real procedures.

I've also interned at Intuitive Surgical (maker of the da Vinci) and Retina Robotics, and I'm now working toward imitation-learning and vision-language-action models for surgical autonomy.

Acknowledgements

This work was conducted at the Surgical Innovation & Training Lab and the UIC Robotics Lab at the University of Illinois Chicago. Special thanks to my advisor, Prof. Miloš Žefran, to Leonardo Borgioli and the surgical team led by Prof. Pier Cristoforo Giulianotti, and to the committee members (A. E. Cetin, S. Han, L. Chen) for their guidance and collaboration.
