Chao Ding - Academic Homepage

About Me

I am a medical AI researcher and research intern at Shanghai Artificial Intelligence Laboratory, working on trustworthy medical agents, patient-facing LLM safety, and clinically grounded evaluation of healthcare AI systems. My research is motivated by multi-year clinical training across emergency medicine, cardiology, respiratory medicine, pediatrics, obstetrics, neurology, and related departments. My long-term goal is to build clinically reliable medical agents that can reason over evolving patient states, seek evidence when appropriate, and recognize when continued interaction becomes unsafe and escalation is required.

Research Interests

News & Highlights

💌 2026 Manuscript submitted to NEJM AI on clinical safety in patient-facing medical AI.
🎉 2026 Paper published in npj Digital Medicine on benchmarking and competition for specialty triage.
💐 2026 SafeMed-R1 released as an arXiv preprint with open code for safety and ethics alignment in medical LLMs.
🎊 2026 Paper accepted by KDD 2026 on finance agent evaluation with CNFinBench.
✨ 2026 Paper accepted by CVPR 2026 on biomedical object referring and segmentation with MLLMs.
🌱 2025 Joined Shanghai Artificial Intelligence Laboratory as a Research Assistant.

Research Experience

Research Assistant

Shanghai Artificial Intelligence Laboratory

Shanghai, China Dec. 2024 - Present

Patient-Facing Multi-Agent Medical AI

arXiv / Manuscript under review at NEJM AI | Dec. 2025 - May 2026

Built a simulated patient-doctor evaluation environment integrating patient simulators, frontier LLM doctor backbones, an automated clinical judge, and a SafeMed-R1 in-loop safety controller.
Formalized delayed escalation as a temporal safety failure and evaluated five LLMs across 150 multi-turn consultations against a 40-case clinician reference benchmark.
Implemented a five-action controller (PASS, REWRITE, ASK-MORE, ESCALATE, REFUSE) that reduced the mean escalation turn from 6.28 to 3.78 and the missed-escalation rate from 62.5% to 12.5%.
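For readers unfamiliar with the two escalation metrics reported above, the following is a minimal sketch of how a mean escalation turn and a missed-escalation rate could be computed from per-case logs. The function name, the input format, and the toy data are illustrative assumptions, not the project's evaluation code, and the sketch assumes every case in the input requires escalation.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def escalation_metrics(turns: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (mean escalation turn, missed-escalation rate).

    turns[i] is the 1-based dialogue turn at which case i was escalated,
    or None if the system never escalated. Hypothetical format: every
    case is assumed to need escalation.
    """
    if not turns:
        raise ValueError("at least one case is required")
    escalated = [t for t in turns if t is not None]
    # Share of cases in which escalation never happened.
    missed_rate = 1 - len(escalated) / len(turns)
    # Average turn of escalation, over the cases that were escalated.
    mean_turn = mean(escalated) if escalated else float("nan")
    return mean_turn, missed_rate


# Toy data (not from the study): eight cases, three never escalated.
mean_turn, missed_rate = escalation_metrics([3, 5, None, 4, None, 2, 6, None])
print(f"mean escalation turn: {mean_turn}, missed: {missed_rate:.1%}")
```

A lower mean turn indicates earlier escalation, and a lower missed rate indicates fewer cases in which escalation never occurred.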

SafeMed-R1: Medical LLMs

arXiv:2605.28338 | Code | Dec. 2024 - Dec. 2025

Developed SafeMed-R1, a safety- and ethics-aligned medical LLM based on Qwen3-32B, using SFT and RL/GRPO with clinician-audited reasoning traces, safety/ethics supervision, and red-team jailbreak stress testing.
Built the Clinical Trust Signals pipeline for expert-reviewed QA-CoT construction, rubric scoring, adversarial re-answering checks, and benchmark evaluation; achieved 79.6% macro-averaged clinical accuracy and reduced unsafe outputs under adversarial testing.
Deployed SafeMed-R1 as both a standalone aligned medical model and a modular turn-level governance model for patient-facing dialogue, connecting training-time safety alignment with real-time escalation control.

MedBench / MedTriage: Benchmarking and Specialty Triage

Homepage | npj Digital Medicine | Dec. 2024 - Apr. 2025

Built MedTriage from hospital intake records, online guidance dialogues, and outpatient clinical notes with leakage-prevention cleaning and strict multi-label department recommendation.
Developed MedGPT-Guide, a retrieval-augmented triage model using BGE-m3 retrieval, CoT prompting, candidate-order perturbation, self-consistency voting, and ensemble aggregation.
Contributed to the infrastructure of MedBench, a cloud-based evaluation platform spanning 700k+ expert-curated tasks, 24 primary and 91 secondary specialties, and dedicated tracks for LLMs, multimodal models, and clinical agents.
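As a rough illustration of how candidate-order perturbation can be combined with self-consistency voting, the sketch below shuffles the candidate department list on each sample and returns the majority answer. It is a simplified single-label sketch under stated assumptions, not the MedGPT-Guide implementation: `predict` is a hypothetical stand-in for a model call, and the keyword stub, department names, and parameters are invented for the example.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def vote_with_order_perturbation(
    query: str,
    candidates: Sequence[str],
    predict: Callable[[str, List[str]], str],
    n_samples: int = 5,
    seed: int = 0,
) -> str:
    """Query the predictor several times with shuffled candidates and
    return the most frequent answer (majority vote)."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        shuffled = list(candidates)
        rng.shuffle(shuffled)  # perturb candidate order to dilute position bias
        votes[predict(query, shuffled)] += 1
    return votes.most_common(1)[0][0]


def keyword_stub(query: str, options: List[str]) -> str:
    """Hypothetical stand-in for a model: keyword match, else first option."""
    if "chest" in query.lower() and "Cardiology" in options:
        return "Cardiology"
    return options[0]


departments = ["Neurology", "Cardiology", "Pediatrics"]
choice = vote_with_order_perturbation("Chest pain on exertion", departments, keyword_stub)
print(choice)
```

Shuffling before each call means an answer that depends only on list position is unlikely to win the vote consistently, whereas an answer driven by the query content is stable across samples.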

Agentic Benchmark for High-Stakes Financial LLM Agents

Project | Code | Accepted at KDD 2026

Co-developed CNFinBench to evaluate expertise, autonomy, and integrity across 29 subtasks, 11,947 single-turn QA instances, and 321 multi-turn adversarial dialogues.
Designed workflow evaluation covering requirement parsing, path planning, API/database operations, tool invocation, multi-agent collaboration, and result verification.
Helped develop HICS to quantify behavioral compliance drift across open- and closed-source models under multi-turn attack settings.

Clinical Training

Standardized Residency Trainee

Jul. 2022 - Jul. 2025

Putuo District Central Hospital, Shanghai University of Traditional Chinese Medicine

Completed standardized residency training across emergency medicine, cardiology, respiratory medicine, pediatrics, obstetrics, neurology, and related departments, gaining broad exposure to acute-care and longitudinal clinical workflows.
Participated in outpatient, inpatient, and emergency-care settings, gaining first-hand experience in history taking, physical examination, preliminary assessment, differential diagnosis, patient communication, and referral decision-making.
Managed and observed high-risk clinical presentations including chest pain, acute dyspnea, obstetric bleeding, pediatric fever, trauma, infection, and neurologic emergencies, strengthening clinical judgment in red-flag recognition and time-sensitive escalation.
Developed a clinical understanding of early-stage triage bottlenecks, including incomplete patient narratives, limited consultation time, uncertain referral thresholds, and the gap between how patients describe their symptoms and the underlying clinical risk.

Education

Expected Dec. 2026

Master of Medicine

Shanghai University of Traditional Chinese Medicine

Medical Statistics · Research Methodology · Scientific Writing · Clinical Practice

2015 - 2020

Bachelor of Medicine

Jiangxi University of Traditional Chinese Medicine

Anatomy · Physiology · Pathology · Pharmacology · Medical Imaging · Immunology

Publications and Conference Presentations

* Equal contribution / co-first authorship.

Conversational Fluency Does Not Ensure Clinical Safety in Patient-Facing Medical AI

Ding, C.*, et al.

Manuscript under review at NEJM AI.

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Ding, C.*, et al.

arXiv preprint, 2026.

arXiv | Code

Beyond Knowledge to Agency: Evaluating Expertise, Autonomy, and Integrity in Finance with CNFinBench

Ding, J.*, Ding, C.*, Jiang, Y.*, et al.

KDD 2026, accepted. ACM SIGKDD / CCF-A.

Project | Code

Advancing medical AI through benchmarking and competition for specialty triage

Ding, C.*, Bian, M.*, Yuan, M.*, et al.

npj Digital Medicine, IF: 15.1, 2026.

Homepage | DOI

Ding, C.*, Lu, R.*, Kong, Z., Huang, R. TyG index, depression, and cognitive dysfunction: NHANES with machine learning support. Journal of Affective Disorders, IF: 4.9, 2025. DOI
Ding, C.*, Yuan, M.*, Cheng, J., Wen, J. Smoking types and stroke risk: development of a predictive model for identifying stroke risk. Frontiers in Physiology, IF: 3.4, 2025. DOI
Ding, C., Kong, Z., Cheng, J., Huang, R. U-shaped relationship between TyG index and depression using machine learning. Heliyon, IF: 3.6, 2024. DOI