PhD Dissertation · University of Illinois Chicago · 2026

Towards Autonomous
Robot-Assisted Surgery

Teaching a surgical robot to see, understand, and assist — working toward automating one of the world's most common operations: removing the gallbladder.


by Ki-Hwan Oh · advised by Prof. Miloš Žefran · Surgical Innovation & Training Lab

Live system: the robot grasps the gallbladder, stretches the tissue, and follows the cut line on its own.

755,000+ annotated stereo frames in the dataset

7 surgeons recorded, with documented skill levels

3.3× faster autonomous dissection (v2 vs. v1)

0.49 mm boundary-tracking error (sub-millimeter)

0.94–1.13 cm instrument tip error from images alone

4 open-source repositories to build on

The big idea, in plain language

What is this research about?

No medical or robotics background needed. Here is the whole story in four short ideas.

Surgeons already use robots

In modern operating rooms, a surgeon often sits at a console and controls thin robotic arms that reach inside the body through tiny incisions — a bit like a very precise video game with real instruments. The most widely used system is the da Vinci surgical robot.

The robot still does nothing on its own

Today the robot only moves when the surgeon moves. My question: can it begin to handle parts of an operation by itself — safely, and guided only by what its camera sees inside the body?

Gallbladder removal is the perfect first step

A cholecystectomy (removing the small organ that stores bile) is one of the most common operations in the world, and its steps are highly standardized. That predictability makes it an ideal, lower-risk target to start automating.

So I gave the robot eyes and a baseline brain

I built the data, the perception, and the control needed for the robot to see the anatomy, decide where to separate tissue, and move precisely — then released everything openly so others can reproduce it and push further.

Meet the da Vinci system at the heart of this work. The surgeon operates from a console (left); the patient-side cart holds the robotic arms and the stereo endoscope that reach inside the body over the draped patient (center); and the vision tower processes and shows the 3D view (right). My research adds perception and autonomy on top of this platform.

How a robot “does surgery on its own” — three jobs it must master

See

Look at the stereo (3D) camera feed and recognize the gallbladder, the liver, and each instrument — in real time.

Perception

Decide

Find the exact boundary where the gallbladder meets the liver, and plan a safe path to separate them.

Planning

Act

Drive the robotic arms to grasp, stretch, and cut along that boundary with sub-millimeter accuracy.

Control
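To make the "Decide" step concrete: one simple way to turn two segmentation masks into a boundary is to keep the gallbladder pixels that touch a liver pixel. This is a minimal sketch, assuming the perception model outputs one boolean mask per tissue class; the function names are hypothetical, and the thesis pipeline may extract and smooth the boundary differently.

```python
import numpy as np

def dilate4(mask: np.ndarray) -> np.ndarray:
    """Grow a boolean mask by one pixel using 4-connectivity (up/down/left/right)."""
    p = np.pad(mask, 1, constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

def boundary_pixels(gb_mask: np.ndarray, liver_mask: np.ndarray) -> np.ndarray:
    """(row, col) coordinates of gallbladder pixels that border the liver."""
    edge = gb_mask & dilate4(liver_mask)  # gallbladder pixels next to a liver pixel
    return np.argwhere(edge)

# Toy frame: gallbladder on the left three columns, liver on the right three.
gb = np.zeros((5, 6), dtype=bool)
gb[:, :3] = True
liver = ~gb
edge = boundary_pixels(gb, liver)
```

In this toy frame the boundary is the gallbladder's rightmost column; in the real system these pixels would then be lifted to 3D with the stereo camera before planning a path along them.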

Three open contributions that make this possible

A real-surgery dataset

The data to learn from: synchronized video, robot motion, and expert annotations from real procedures.


Autonomous dissection

A working, vision-guided system that separates the gallbladder from the liver largely on its own.


Vision-only instrument tracking

An AI that locates the instruments in 3D from camera images alone — no robot sensors required.


The recording setup and a montage of stereo endoscope frames, robot kinematics, and pedal signals that make up the CRCD dataset.

Contribution 01

CRCD — a window into real robotic surgery

To teach a robot, you first need examples. The Comprehensive Robotic Cholecystectomy Dataset (CRCD) is one of the most complete public recordings of robotic gallbladder-removal procedures released to date. Think of it as a richly annotated “textbook” of how experts actually operate.

It was recorded during ex vivo procedures on porcine (pig) livers and brings together, time-synchronized, the signals a learning system needs:

Stereo endoscopic video — the robot's 3D view inside the body
Full robot & console kinematics — exactly how every arm and the surgeon's hands moved
Foot-pedal signals — when the surgeon delivered cutting/sealing energy (rarely shared publicly)
Dense annotations — tissue segmentation and instrument keypoints
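CRCD already ships these streams synchronized, but the underlying idea is worth seeing if you re-align streams yourself or add a new sensor. A common approach is to match every video frame to the kinematics sample with the nearest timestamp. This is a minimal sketch under assumed inputs (sorted timestamp arrays in seconds, at least two kinematics samples); it is not the dataset's own synchronization code.

```python
import numpy as np

def nearest_sample_indices(frame_ts, kin_ts):
    """For each video-frame timestamp, return the index of the closest kinematics sample.

    Both inputs are 1-D, in seconds, sorted ascending; kin_ts needs >= 2 samples.
    """
    frame_ts = np.asarray(frame_ts, dtype=float)
    kin_ts = np.asarray(kin_ts, dtype=float)
    idx = np.searchsorted(kin_ts, frame_ts)      # first sample at or after each frame
    idx = np.clip(idx, 1, len(kin_ts) - 1)       # keep both neighbours in range
    left, right = kin_ts[idx - 1], kin_ts[idx]
    # Step back one sample wherever the earlier neighbour is at least as close.
    step_back = (frame_ts - left) <= (right - frame_ts)
    return idx - step_back.astype(int)

# Hypothetical rates: kinematics every 10 ms, three video frames.
kin_ts = np.linspace(0.0, 0.1, 11)
idx = nearest_sample_indices([0.0, 0.034, 0.066], kin_ts)
```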

755,000+ stereo frames · 7 surgeons, rated by experience · Multimodal & synchronized · CC-BY-4.0

Hugging Face · GitHub · ISMR 2024 paper · JMRR 2025 (extended)

Why it matters: each existing dataset lacked at least one of these pieces: kinematics, the surgeon's hand motion, pedal signals, or dense labels on a real procedure. CRCD brings them together, so researchers worldwide can train and fairly compare perception, control, and learning models.

The upgraded bimanual system: one arm grasps and stretches the tissue while the other follows the cut line.

Contribution 02

A robot that dissects from sight alone

The core demonstration of the thesis: a framework that lets the da Vinci robot separate the gallbladder from the liver using only its stereo camera — no pre-scripted trajectory, no external trackers.

It evolved through two generations. The first (v1) followed a pre-planned path with a single arm. The upgraded second version (v2) is bimanual and adaptive: it automatically grasps the gallbladder, stretches it to expose the boundary, and continuously re-finds the cut line as the tissue changes — delivering energy along the way.

Automatic grasping & tissue stretching for a clear, stable boundary
Online, boundary-guided cutting that adapts in real time
A new “liver-bed” tissue class to keep tracking after each pass — groundwork for multi-round dissection
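The "continuously re-finds the cut line" idea can be illustrated with a lookahead rule in the style of pure pursuit: each time perception publishes a fresh boundary, locate the hook on it and step toward a point slightly ahead, with a cap on how far the tip may move per cycle. This is a hypothetical sketch, not the v2 controller; the function name, the ordered-boundary input, and the `lookahead` and `max_step_mm` values are all assumptions.

```python
import numpy as np

def next_waypoint(tip, boundary_pts, lookahead=3, max_step_mm=0.5):
    """Hypothetical lookahead rule for following a freshly re-detected boundary.

    tip:          (3,) current hook-tip position in mm.
    boundary_pts: (N, 3) boundary points in mm, ordered along the cut direction.
    Returns the next commanded tip position, at most max_step_mm away from tip.
    """
    tip = np.asarray(tip, dtype=float)
    pts = np.asarray(boundary_pts, dtype=float)
    nearest = int(np.argmin(np.linalg.norm(pts - tip, axis=1)))  # where we are on the line
    target = pts[min(nearest + lookahead, len(pts) - 1)]         # aim a few points ahead
    delta = target - tip
    dist = float(np.linalg.norm(delta))
    if dist <= max_step_mm:
        return target
    return tip + delta * (max_step_mm / dist)                     # clamp the step size

boundary = [[0, 1, 0], [1, 1, 0], [2, 1, 0], [3, 1, 0]]
step_a = next_waypoint([0, 0, 0], boundary, lookahead=2)
step_b = next_waypoint([2, 1, 0], boundary, lookahead=2)
step_c = next_waypoint([2.8, 1, 0], boundary, lookahead=2)
```

Because the boundary is re-detected every cycle, the rule adapts as stretched tissue shifts; a real system would also need stop conditions and energy timing, which this sketch leaves out.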

3.3× faster than v1 (~101 s → ~31 s per pass)

0.49 mm boundary-tracking RMSE (sub-millimeter)

11 + 5 trials on porcine liver & tissue proxies
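The headline numbers can be reproduced in spirit with a few lines. The sketch below assumes that boundary-tracking RMSE means the root-mean-square distance from each logged tip position to the nearest point of the reference boundary; that definition, the function name, and the toy data are assumptions, and the papers define the metric precisely. Nearest-vertex distance only approximates point-to-curve distance, so the boundary should be sampled densely.

```python
import numpy as np

def tracking_rmse(tip_path, boundary_pts):
    """RMS of the distance from each tip sample to its nearest boundary point."""
    tip = np.asarray(tip_path, dtype=float)[:, None, :]      # (T, 1, 3)
    ref = np.asarray(boundary_pts, dtype=float)[None, :, :]  # (1, N, 3)
    nearest = np.linalg.norm(tip - ref, axis=2).min(axis=1)  # (T,) closest-point distances
    return float(np.sqrt(np.mean(nearest ** 2)))

# Toy example in mm: two tip samples 0.3 mm and 0.4 mm off the boundary.
rmse = tracking_rmse([[0, 0.3, 0], [1, -0.4, 0]], [[0, 0, 0], [1, 0, 0]])

# Speed-up from the approximate per-pass times quoted above.
speedup = 101 / 31  # about 3.26, reported as 3.3x
```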

Vision code · Control code · BioRob 2024 · IROS 2025

Step 1 — Grasp. The forceps align, grasp, and pull until the boundary is taut.

Step 2 — Dissect. The hook follows the live-detected boundary while delivering energy.

Under the hood: teaching the robot to recognize what it sees

None of this works without reliable perception. The system identifies anatomy and instruments frame-by-frame, and reconstructs the scene in 3D from the stereo camera. The perception stack matured from Detectron2 to MaskDINO to YOLO11, which gave the most stable, real-time boundaries.
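The 3D part rests on the standard rectified-stereo relation: a point seen with horizontal disparity d pixels lies at depth Z = f·B/d, where f is the focal length in pixels and B the camera baseline. The sketch below back-projects a disparity map into a point cloud under that pinhole model. It assumes rectified images and known calibration (`fx`, `fy`, `cx`, `cy`, `baseline_mm` are placeholders), and it is not taken from sitl_ros2_cv, which may rely on a library routine instead.

```python
import numpy as np

def disparity_to_points(disparity, fx, cx, cy, baseline_mm, fy=None):
    """Back-project a rectified disparity map into 3D points (mm, left-camera frame)."""
    fy = fx if fy is None else fy
    h, w = disparity.shape
    v, u = np.mgrid[0:h, 0:w]             # pixel row (v) and column (u) grids
    valid = disparity > 0                 # zero or negative disparity means no match
    d = disparity[valid]
    z = fx * baseline_mm / d              # depth from Z = f * B / d
    x = (u[valid] - cx) * z / fx          # pinhole back-projection
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)    # (N, 3)

# Toy calibration: 10 px disparity everywhere, f = 100 px, B = 5 mm -> depth 50 mm.
pts = disparity_to_points(np.full((2, 2), 10.0), fx=100.0, cx=0.0, cy=0.0, baseline_mm=5.0)
```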

Instrument keypoint detection: locating the tool tip and joints.

The new “liver-bed” class: the liver region that remains after a dissection pass separates the gallbladder.

The v2 system architecture, end to end, from perception through control.

Contribution 03

Finding the instruments in 3D — from pictures alone

Robots normally know where their tools are from internal sensors (encoders). But that link can drift, break, or simply be unavailable when learning from video. So I asked: can we recover an instrument's 3D position using nothing but the camera images?

The answer is a baseline framework built on Vision Transformers (ViT). Each instrument gets its own backbone, first trained to segment the tool and then paired with a lightweight stereo head that estimates where the tip is in space. Using one model per instrument worked better than forcing both instruments into a single model.
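A stereo head of this kind can be as small as a two-layer MLP on the concatenated left and right features. The sketch below shows only the forward pass in NumPy with untrained random weights; the class name, the 768-dimensional feature size (typical of a ViT-Base embedding), the hidden width, and the random stand-in features are all assumptions rather than the stereo-endo-pose-vit architecture.

```python
import numpy as np

class StereoTipHead:
    """Hypothetical lightweight stereo head: (left, right) features -> 3D tip position."""

    def __init__(self, feat_dim: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, size=(2 * feat_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.02, size=(hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, feat_left: np.ndarray, feat_right: np.ndarray) -> np.ndarray:
        x = np.concatenate([feat_left, feat_right], axis=-1)  # fuse the two views
        h = np.maximum(x @ self.w1 + self.b1, 0.0)             # ReLU hidden layer
        return h @ self.w2 + self.b2                           # (batch, 3): x, y, z of the tip

FEAT_DIM = 768  # assumed pooled embedding size of a ViT-Base backbone

# One head (and, in a full pipeline, one backbone) per instrument.
heads = {
    "cautery_hook": StereoTipHead(FEAT_DIM, seed=1),
    "bipolar_forceps": StereoTipHead(FEAT_DIM, seed=2),
}

rng = np.random.default_rng(42)
left = rng.normal(size=(4, FEAT_DIM))   # stand-in for left-image backbone features
right = rng.normal(size=(4, FEAT_DIM))  # stand-in for right-image backbone features
tip_xyz = heads["cautery_hook"](left, right)
```

Keeping a separate backbone and head per instrument, as the text describes, lets each model specialize in one tool's appearance instead of sharing capacity across both.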

Left and right camera views, with the predicted instrument tip position recovered purely from the stereo images.

0.94 cm tip error · Permanent Cautery Hook

1.13 cm tip error · Fenestrated Bipolar Forceps

Image-only: no robot encoders required at inference
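A tip error like the ones above is naturally computed as the mean Euclidean distance between predicted and reference tip positions. The sketch assumes that definition, with reference positions taken from the robot's kinematics and inputs in metres; the function name and toy values are hypothetical, so check the thesis for the exact evaluation protocol.

```python
import numpy as np

def mean_tip_error_cm(pred_xyz_m, true_xyz_m):
    """Mean Euclidean distance between predicted and reference tips, metres in, cm out."""
    pred = np.asarray(pred_xyz_m, dtype=float)
    true = np.asarray(true_xyz_m, dtype=float)
    return float(np.linalg.norm(pred - true, axis=1).mean() * 100.0)

# Two toy predictions, 1 cm and 2 cm away from the reference tip.
err_cm = mean_tip_error_cm([[0.01, 0.0, 0.0], [0.0, 0.02, 0.0]], np.zeros((2, 3)))
```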

Why it matters: recovering instrument state from images alone is a stepping stone to full 6-degree-of-freedom pose estimation, and to training robots from surgical video where sensor data doesn't exist.

Code: stereo-endo-pose-vit

The ViT backbone + stereo pose-head architecture, with stereo inputs and per-component heads.

Grad-CAM heatmaps confirm that the model attends to the instrument tips when predicting their position.
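Grad-CAM weights each feature map of a chosen layer by the spatial average of the output's gradient with respect to that map, sums the weighted maps, and keeps only the positive part. For a pose regressor, the "output" can be one predicted coordinate. The sketch below implements that formula in NumPy, assuming you already have the layer activations and gradients as (K, H, W) arrays; for a ViT, the patch tokens would first be reshaped into an H×W grid. It is not the repository's Grad-CAM script.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one layer.

    activations: (K, H, W) feature maps A^k.
    gradients:   (K, H, W) d(score)/d(A^k), e.g. score = predicted tip x-coordinate.
    """
    weights = gradients.mean(axis=(1, 2))                  # alpha_k: pooled gradients
    cam = np.tensordot(weights, activations, axes=(0, 0))  # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                             # ReLU keeps positive evidence
    return cam / cam.max() if cam.max() > 0 else cam       # scale to [0, 1] for display

A = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 1.0]]])
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # map 0 helps the score, map 1 hurts it
cam = grad_cam(A, G)
```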

For researchers & engineers

Use this work as your baseline

Everything is open source. Whether you want the data, real-time surgical vision, da Vinci control, or a vision-only pose model, start from one of these four repositories and build on top.

Where should you start?

I want the data → CRCD
Real-time surgical vision → sitl_ros2_cv
Drive the da Vinci / reproduce dissection → sitl_ros2_dvrk
Vision-only instrument pose → stereo-endo-pose-vit

CRCD

The dataset — data & tools for robotic cholecystectomy research

Dataset · ROS2 msgs · CC-BY-4.0

Synchronized stereo video, full da Vinci kinematics, pedal signals, and dense tissue/instrument annotations from real ex vivo procedures. Includes loaders and tutorial notebooks. Available on Hugging Face and via direct download.

GitHub · 🤗 Hugging Face · arXiv (ISMR) · arXiv (JMRR)

from datasets import load_dataset

# Stream the synchronized dataset straight from the Hub
ds = load_dataset("SITL-Eng/CRCD", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # stereo frames + kinematics + pedal signals

sitl_ros2_cv

Real-time computer vision for the da Vinci stereo endoscope

ROS2 Humble · Python · CUDA · MIT

Tissue segmentation (liver / gallbladder), surgical-instrument keypoint detection, stereo disparity, and 3D point-cloud reconstruction — with YOLO11, Detectron2, and MaskDINO backends. This is the “eyes” of the autonomous dissection system.

GitHub

cd ~/ros2_ws/src
git clone https://github.com/SITL-Eng/sitl_ros2_cv.git
cd ~/ros2_ws && colcon build --packages-select sitl_ros2_cv
source install/setup.bash

# Live liver / gallbladder segmentation with YOLO11
ros2 launch sitl_ros2_cv yolo_seg_lv_gb.xml

sitl_ros2_dvrk

Control layer for the da Vinci Research Kit (dVRK)

ROS2 Humble · dVRK · Python

Custom kinematics, arm coordination, and the autonomous behaviors (grasp, stretch, dissect) that turn perception into motion on the real robot. Pair it with sitl_ros2_cv to reproduce the dissection demos.

GitHub · Required: sitl_ros2_interfaces

# Requires ROS 2 Humble + the official dVRK ROS2 stack
cd ~/ros2_ws/src
git clone https://github.com/SITL-Eng/sitl_ros2_dvrk.git
git clone https://github.com/SITL-Eng/sitl_ros2_interfaces.git
pip install pyquaternion pyserial
cd ~/ros2_ws && colcon build && source install/setup.bash

stereo-endo-pose-vit

Vision-Transformer instrument pose estimation from stereo images

PyTorch · ViT · YOLO11 · uv

The full pipeline behind Contribution 03: train a ViT segmentation backbone on CRCD instrument masks, then train stereo pose heads to regress end-effector state. Includes training/eval scripts, Grad-CAM, and 3D visualization — a clean baseline for vision-only surgical pose estimation.

GitHub

git clone https://github.com/koh43/stereo-endo-pose-vit.git
cd stereo-endo-pose-vit
uv venv --python 3.12 && source .venv/bin/activate
uv pip install -r requirements.txt

# Train instrument segmentation (step 1 of the pipeline)
python scripts/train_inst_seg_yolo11.py --data-dir /path/to/instrument_segmentation

Supporting ROS 2 packages

Shared infrastructure that the stack above is built on.

sitl_ros2_pedal

Custom da Vinci pedal & electrosurgical-unit (ESU) control — read the surgeon's pedal presses and fire monopolar energy from ROS 2 (used for autonomous dissection and CRCD pedal logging).

ROS 2 · Arduino · Python

sitl_ros2_interfaces

The shared custom ROS 2 messages used across the dVRK platform — timestamped primitives for synchronization, plus perception and sensory-glove messages.

ROS 2 · rosidl · msgs

Peer-reviewed work

Publications

The dissertation builds on these papers. Citations and links below.

JMRR 2025

Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)

K.-H. Oh, L. Borgioli, A. Mangano, V. Valle, M. Di Pangrazio, F. Toti, G. Pozza, L. Ambrosini, A. Ducas, M. Žefran, L. Chen, P. C. Giulianotti

Journal of Medical Robotics Research, 2025

DOI · arXiv

IROS 2025

Autonomous Dissection in Robotic Cholecystectomy

K.-H. Oh, L. Borgioli, M. Žefran, V. Valle, P. C. Giulianotti

IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 11240–11246, 2025

DOI · arXiv

BioRob 2024

A Framework for Automated Dissection Along Tissue Boundary

K.-H. Oh, L. Borgioli, M. Žefran, L. Chen, P. C. Giulianotti

IEEE RAS/EMBS Int. Conf. for Biomedical Robotics and Biomechatronics (BioRob), pp. 1427–1433, 2024

DOI

ISMR 2024

Comprehensive Robotic Cholecystectomy Dataset (CRCD): Integrating Kinematics, Pedal Signals, and Endoscopic Videos

K.-H. Oh, L. Borgioli, A. Mangano, V. Valle, M. Di Pangrazio, F. Toti, G. Pozza, L. Ambrosini, A. Ducas, M. Žefran, L. Chen, P. C. Giulianotti

Int. Symposium on Medical Robotics (ISMR), pp. 1–7, 2024

DOI · arXiv

ICRA 2025

Sensory Glove-Based Surgical Robot User Interface

L. Borgioli, K.-H. Oh, A. Mangano, A. Ducas, L. Ambrosini, F. Pinto, P. A. Lopez, J. Cassiani, M. Žefran, L. Chen, P. C. Giulianotti

IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 10487–10493, 2025

arXiv

Cite this dissertation

@phdthesis{oh2026autonomous,
  author = {Oh, Ki-Hwan},
  title  = {Towards Autonomous Robot-Assisted Surgery},
  school = {University of Illinois Chicago},
  year   = {2026}
}

About the author

Ki-Hwan Oh

Ph.D. in Electrical & Computer Engineering, University of Illinois Chicago

I work at the intersection of surgical robotics, computer vision, and machine learning. My doctoral research, carried out in the Surgical Innovation & Training Lab under Prof. Miloš Žefran and in collaboration with the Division of General, Minimally Invasive and Robotic Surgery, focuses on giving the da Vinci surgical robot the perception and autonomy needed to assist — and eventually automate — steps of real procedures.

I've also interned at Intuitive Surgical (makers of the da Vinci) and Retina Robotics, and I'm now working toward imitation-learning and vision-language-action models for surgical autonomy.

Acknowledgements

This work was conducted at the Surgical Innovation & Training Lab and the UIC Robotics Lab at the University of Illinois Chicago. Special thanks to advisor Prof. Miloš Žefran, to Leonardo Borgioli and the surgical team led by Prof. Pier Cristoforo Giulianotti, and to the committee members (A. E. Cetin, S. Han, L. Chen) for their guidance and collaboration.
