Advancing Responsible Statistical and AI/ML Methods for Harnessing the Power of Electronic Health Records

March 20
1-2pm
708 Broadway, Room 801 | Online

A Biostatistics Seminar Series hosted by the Department of Biostatistics

Featuring:
Qi Long, PhD
Vice Chair of Faculty Professional Development, 
Department of Biostatistics, Epidemiology and Informatics, 
University of Pennsylvania Perelman School of Medicine

Abstract: While rich electronic health records (EHR) data offer remarkable opportunities for advancing precision medicine, they also present daunting analytical challenges. Multi-modal data in EHRs, recorded at irregular time intervals with varying frequencies, include structured data such as labs and vitals, codified data such as diagnosis and procedure codes, and unstructured data such as clinical notes and pathology reports. They are typically incomplete and fraught with other errors and biases. What’s more, data gaps and errors in EHRs are often unequally distributed across patient groups: people with less access to care, often people of color or with lower socioeconomic status, tend to have more incomplete data in EHRs. Such data issues, if not adequately addressed, can lead to biased results and exacerbate health inequities. In this talk, Dr. Long will share his research group’s recent work on developing responsible statistical and AI/ML methods, including large language models (LLMs), for addressing these challenges. Since LLMs are themselves plagued by various biases, he will also discuss his group’s ongoing research on developing rigorous statistical and machine learning methods for mitigating the pitfalls and risks of LLMs.

About the speaker: Dr. Long’s research is focused on developing robust statistical and AI/ML methods for advancing equitable, precision health and medicine. His research group has developed robust methods for integrative analysis of big health data (including, but not limited to, -omics, electronic health records, and imaging data), analysis of incomplete data, causal inference, data privacy, algorithmic fairness, and large language models (LLMs). His methods research has been supported by the National Institutes of Health (NIH), the Patient-Centered Outcomes Research Institute (PCORI), the National Science Foundation (NSF), and the Advanced Research Projects Agency for Health (ARPA-H). In addition, Dr. Long has led the Statistical and Data Coordinating Center for large-scale, national multi-center studies. He currently co-directs the Coordinating Center for the Pre-medical Cancer Immunotherapy Network for Canine Trials (PRECINCT), funded by the NCI’s Cancer Moonshot Initiative, and the Statistical and Data Coordinating Center for the Risk Underlying Rural Areas Longitudinal (RURAL) Cohort Study, funded by the NHLBI. The rich yet complex data from these large-scale studies present exciting opportunities for methodological research. Dr. Long is a Senior Editor for Cancer Research, Executive Editor for Statistical Analysis and Data Mining, and Associate Editor for several leading bio/statistical journals. He has served in leadership roles in various professional societies, including AAAS (American Association for the Advancement of Science), ASA (American Statistical Association), and IBS/ENAR (International Biometric Society Eastern North American Region). He has served on numerous grant review panels for NIH and DOD, including as a standing member of the NIH Biostatistical Methods and Research Design (BMRD) Study Section from 2017 to 2021. Dr. Long is an elected Fellow of AAAS, ASA, ISI (International Statistical Institute), and AMIA (American Medical Informatics Association).