About Me
I am a medical AI researcher and research intern at Shanghai Artificial Intelligence Laboratory, working on trustworthy medical agents, patient-facing LLM safety, and clinically grounded evaluation of healthcare AI systems. My research is motivated by multi-year clinical training across emergency medicine, cardiology, respiratory medicine, pediatrics, obstetrics, neurology, and related departments. My long-term goal is to build clinically reliable medical agents that can reason over evolving patient states, seek evidence when appropriate, and recognize when continued interaction becomes unsafe and escalation is required.
Research Interests
- Medical AI
- Machine Learning
- Artificial Intelligence
- Large Language Models
- Agent Systems
- Benchmark Construction
News & Highlights
- 💌 2026 Manuscript submitted to NEJM AI on clinical safety in patient-facing medical AI.
- 🎉 2026 Paper published in npj Digital Medicine on benchmarking and competition for specialty triage.
- 💐 2026 SafeMed-R1 released as an arXiv preprint with open code for safety and ethics alignment in medical LLMs.
- 🎊 2026 Paper accepted by KDD 2026 on finance agent evaluation with CNFinBench.
- ✨ 2026 Paper accepted by CVPR 2026 on biomedical object referring and segmentation with MLLMs.
- 🌱 2025 Joined Shanghai Artificial Intelligence Laboratory as a Research Assistant.
Research Experience
Research Assistant
Shanghai Artificial Intelligence Laboratory
Shanghai, China
Dec. 2024 - Present
Patient-Facing Multi-Agent Medical AI
arXiv / Manuscript under review at NEJM AI | Dec. 2025 - May 2026
- Built a simulated patient-doctor evaluation environment integrating patient simulators, frontier LLM doctor backbones, an automated clinical judge, and a SafeMed-R1 in-loop safety controller.
- Formalized delayed escalation as a temporal safety failure and evaluated five LLMs across 150 multi-turn consultations against a 40-case clinician reference benchmark.
- Implemented a five-action controller (PASS, REWRITE, ASK-MORE, ESCALATE, REFUSE) that reduced the mean escalation turn from 6.28 to 3.78 and the missed-escalation rate from 62.5% to 12.5%.
SafeMed-R1: Medical LLMs
arXiv:2605.28338 | Code | Dec. 2024 - Dec. 2025
- Developed SafeMed-R1, a safety- and ethics-aligned medical LLM based on Qwen3-32B, using SFT and RL/GRPO with clinician-audited reasoning traces, safety/ethics supervision, and red-team jailbreak stress testing.
- Built the Clinical Trust Signals pipeline for expert-reviewed QA-CoT construction, rubric scoring, adversarial re-answering checks, and benchmark evaluation; achieved 79.6% macro-averaged clinical accuracy and reduced unsafe outputs under adversarial testing.
- Deployed SafeMed-R1 as both a standalone aligned medical model and a modular turn-level governance model for patient-facing dialogue, connecting training-time safety alignment with real-time escalation control.
MedBench / MedTriage: Benchmarking and Specialty Triage
Homepage | npj Digital Medicine | Dec. 2024 - Apr. 2025
- Built MedTriage from hospital intake records, online guidance dialogues, and outpatient clinical notes with leakage-prevention cleaning and strict multi-label department recommendation.
- Developed MedGPT-Guide, a retrieval-augmented triage model using BGE-m3 retrieval, CoT prompting, candidate-order perturbation, self-consistency voting, and ensemble aggregation.
- Contributed to the infrastructure of MedBench, a cloud-based evaluation platform spanning 700k+ expert-curated tasks, 24 primary and 91 secondary specialties, and dedicated tracks for LLMs, multimodal models, and clinical agents.
Agentic Benchmark for High-Stakes Financial LLM Agents
Project | Code | Accepted at KDD 2026
- Co-developed CNFinBench to evaluate expertise, autonomy, and integrity across 29 subtasks, 11,947 single-turn QA instances, and 321 multi-turn adversarial dialogues.
- Designed workflow evaluation covering requirement parsing, path planning, API/database operations, tool invocation, multi-agent collaboration, and result verification.
- Helped develop HICS to quantify behavioral compliance drift across open- and closed-source models under multi-turn attack settings.
Clinical Training
Standardized Residency Trainee
Jul. 2022 - Jul. 2025
Putuo District Central Hospital, Shanghai University of Traditional Chinese Medicine
- Completed standardized residency training across emergency medicine, cardiology, respiratory medicine, pediatrics, obstetrics, neurology, and related departments, developing broad exposure to acute-care and longitudinal clinical workflows.
- Participated in outpatient, inpatient, and emergency-care settings, gaining first-hand experience in history taking, physical examination, preliminary assessment, differential diagnosis, patient communication, and referral decision-making.
- Managed and observed high-risk clinical presentations including chest pain, acute dyspnea, obstetric bleeding, pediatric fever, trauma, infection, and neurologic emergencies, strengthening clinical judgment in red-flag recognition and time-sensitive escalation.
- Developed a clinical understanding of early-stage triage bottlenecks, including incomplete patient narratives, limited consultation time, uncertainty in referral thresholds, and communication gaps between patient symptoms and clinical risk.
Education
Expected Dec. 2026
Master of Medicine
Shanghai University of Traditional Chinese Medicine
Medical Statistics · Research Methodology · Scientific Writing · Clinical Practice
2015 - 2020
Bachelor of Medicine
Jiangxi University of Traditional Chinese Medicine
Anatomy · Physiology · Pathology · Pharmacology · Medical Imaging · Immunology
Publications and Conference Presentations
* Equal contribution / co-first authorship.
Conversational Fluency Does Not Ensure Clinical Safety in Patient-Facing Medical AI
Ding, C.*, et al.
Manuscript under review at NEJM AI.
SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models
Ding, C.*, et al.
arXiv preprint, 2026.
Beyond Knowledge to Agency: Evaluating Expertise, Autonomy, and Integrity in Finance with CNFinBench
Ding, J.*, Ding, C.*, Jiang, Y.*, et al.
KDD 2026, accepted. ACM SIGKDD / CCF-A.
Advancing medical AI through benchmarking and competition for specialty triage
Ding, C.*, Bian, M.*, Yuan, M.*, et al.
npj Digital Medicine, IF: 15.1, 2026.
TyG index, depression, and cognitive dysfunction: NHANES with machine learning support
Ding, C.*, Lu, R.*, Kong, Z., Huang, R.
Journal of Affective Disorders, IF: 4.9, 2025.
Smoking types and stroke risk: development of a predictive model for identifying stroke risk
Ding, C.*, Yuan, M.*, Cheng, J., Wen, J.
Frontiers in Physiology, IF: 3.4, 2025.
U-shaped relationship between TyG index and depression using machine learning
Ding, C., Kong, Z., Cheng, J., Huang, R.
Heliyon, IF: 3.6, 2024.