Siyu Heng
Assistant Professor of Biostatistics
-
Professional overview
-
Siyu Heng, PhD, is an Assistant Professor in the Department of Biostatistics, with an interest in both methodological and applied research. His areas of expertise include causal inference, health data science, observational studies, randomized trials, sensitivity analysis, instrumental variables, measurement error, and survey data, along with their applications in public health.
Dr. Heng’s research has been published in the Journal of the Royal Statistical Society and in Physical Review, among others. He has been recognized with several awards, including the IPUMS Global Health Research Award for the Best Student Paper; the Lawrence D. Brown Best Paper Award; the ASA Mental Health Statistics Section Student Paper Award; the ENAR Distinguished Student Paper Award; the NESS Student Research Award; and the Wellcome Trust Data Reuse Prize.
Dr. Heng received his PhD in applied mathematics and computational science from the University of Pennsylvania, and his BA in statistics from Nanjing University.
-
Education
-
PhD Candidate, Applied Mathematics and Computational Science (Statistics Track), University of Pennsylvania
BS, Mathematics, Nanjing University
-
Honors and awards
-
IPUMS Global Health Research Award for the Best Student Paper, Integrated Public Use Microdata Series (2021)
IMS Hannan Graduate Student Travel Award, Institute of Mathematical Statistics (2021)
ASA Mental Health Statistics Section Student Paper Award, American Statistical Association Section on Mental Health Statistics (2021)
ENAR Distinguished Student Paper Award, International Biometric Society Eastern North American Region (2021)
Wellcome Trust Data Reuse Prize: Malaria, Wellcome Trust (2019)
Benjamin Franklin Fellowship, University of Pennsylvania School of Arts and Sciences (2016, 2017, 2018)
-
Areas of research and study
-
Causal Inference
Epidemiology
Global Health
Health Equity
Instrumental Variables
Observational Studies
Public Health Policy
Randomized Experimentation
Social Sciences
-
Publications
Bias correction for randomization-based estimation in inexactly matched observational studies
Heng, S., Zhu, J., & Heng, S. (n.d.).
Publication year: 2023
Journal title: In Preparation for Submission to International Journal of Epidemiology

Contributed Talk at the CCI Seminar, University of Pennsylvania.
Heng, S. (n.d.).
Publication year: 2023

Contributed Talk at the CCI Seminar, University of Pennsylvania.
Heng, S. (n.d.).
Publication year: 2023

DELTA: Dual Consistency Delving with Topological Uncertainty for Active Graph Domain Adaptation
Wang, P., Cao, Y., Russell, C., Shen, Y., Luo, J., Zhang, M., Heng, S., & Luo, X. (n.d.).
Publication year: 2025
Journal title: Transactions on Machine Learning Research
Volume: 2025-February
Page(s): 1-26
Abstract: Graph domain adaptation has recently enabled knowledge transfer across different graphs. However, without the semantic information on target graphs, the performance on target graphs is still far from satisfactory. To address the issue, we study the problem of active graph domain adaptation, which selects a small quantity of informative nodes on the target graph for extra annotation. This problem is highly challenging due to the complicated topological relationships and the distribution discrepancy across graphs. In this paper, we propose a novel approach named Dual Consistency Delving with Topological Uncertainty (DELTA) for active graph domain adaptation. Our DELTA consists of an edge-oriented graph subnetwork and a path-oriented graph subnetwork, which can explore topological semantics from complementary perspectives. In particular, our edge-oriented graph subnetwork utilizes the message passing mechanism to learn neighborhood information, while our path-oriented graph subnetwork explores high-order relationships from sub-structures. To jointly learn from two subnetworks, we roughly select informative candidate nodes with the consideration of consistency across two subnetworks. Then, we aggregate local semantics from its K-hop subgraph based on node degrees for topological uncertainty estimation. To overcome potential distribution shifts, we compare target nodes and their corresponding source nodes for discrepancy scores as an additional component for fine selection. Extensive experiments on benchmark datasets demonstrate that DELTA outperforms various state-of-the-art approaches. The code implementation of DELTA is available at https://github.com/goose315/DELTA.

Design-based causal inference with missing outcomes: Missingness mechanisms, imputation-assisted randomization tests, and covariate adjustment
Heng, S., Heng, S., Zhang, J., & Feng, Y. (n.d.).
Publication year: 2023
Journal title: Ready for Submission to Journal of American Statistical Association

Effects of behavioral intervention components to increase COVID-19 testing for African American/Black and Latine frontline essential workers not up-to-date on COVID-19 vaccination: Results of an optimization randomized controlled trial
Gwadz, M., Heng, S., Cleland, C. M., Strayhorn, J., Robinson, J. A., Serrano, F. G., Wang, P., Parameswaran, L., & Chero, R. (n.d.).
Publication year: 2025
Journal title: Journal of Behavioral Medicine
Abstract: Racial/ethnic disparities in COVID-19, including incidence, hospitalization, and death rates, are serious and persistent. Among those at highest risk for COVID-19 and its adverse effects are African American/Black and Latine (AABL) frontline essential workers in public-facing occupations (e.g., food services, retail). Testing for COVID-19 in various scenarios (when exposed or symptomatic, regular screening testing) is an essential component of the COVID-19 control strategy in the United States. However, AABL frontline workers have serious barriers to COVID-19 testing at the individual (insufficient knowledge, distrust, cognitive biases), social (norms), and structural levels of influence (access). Thus, testing rates are insufficient and interventions are needed. The present study is grounded in the multiphase optimization strategy (MOST) framework. It tests the main and interaction effects of a set of candidate behavioral intervention components to increase COVID-19 testing rates in this population. The study enrolled adult AABL frontline essential workers who were not up-to-date on COVID-19 vaccination nor recently tested for COVID-19. It used a factorial design to examine the effects of candidate behavioral intervention components, where each component was designed to address a specific barrier to COVID-19 testing. All participants received a core intervention comprised of health education. The candidate components were motivational interviewing counseling (MIC), a behavioral economics intervention (BEI), peer education (PE), and access to testing (either self-test kits [SK] or a navigation meeting [NM]). The primary outcome was COVID-19 testing in the follow-up period. Participants were assessed at baseline, randomly assigned to one of 16 experimental conditions, and assessed six and 12 weeks later. The study was carried out in English and Spanish. We used a logistic regression model and multiple imputation to examine the main and interaction effects of the four factors (representing components): MIC, BEI, PE, and Access. We also conducted a sensitivity analysis using the complete case analysis. Participants (N = 438) were 35 years old on average (SD = 10). Half identified as men/male (52%), and 48% as women/female/other. Almost half (49%) were African American/Black, and 51% were Latine/Hispanic (12% participated in Spanish). A total of 32% worked in food services. Attendance in components was very high (~99%). BEI had a positive effect on the outcome (OR = 1.543; 95% CI: [0.977, 2.438]; p-value = 0.063) as did Access, in favor of SK (OR = 1.351; 95% CI: [0.859, 2.125]; p-value = 0.193). We found a three-way interaction among MIC*PE*Access (OR = 0.576; 95% CI: [0.367, 0.903]; p-value = 0.016): when MIC was present, SK tended to increase COVID testing when PE was not present. The study advances intervention science and takes the first step toward creating an efficient and effective multi-component intervention to increase COVID-19 testing rates in AABL frontline workers.

Instrumental variables: to strengthen or not to strengthen?
Heng, S., Zhang, B., Han, X., Lorch, S. A., & Small, D. S. (n.d.).
Publication year: 2023
Journal title: Journal of the Royal Statistical Society. Series A: Statistics in Society
Volume: 186
Issue: 4
Page(s): 852-873
Abstract: Instrumental variables (IVs) are extensively used to handle unmeasured confounding. However, weak IVs may cause problems. Many matched studies have considered strengthening an IV through discarding some of the sample. It is widely accepted that strengthening an IV tends to increase the power of non-parametric tests and sensitivity analyses. We re-evaluate this conventional wisdom and offer new insights. First, we evaluate the trade-off between IV strength and sample size assuming a valid IV and exhibit conditions under which strengthening an IV increases power. Second, we derive a criterion for checking the validity of a sensitivity analysis model with a continuous dose and show that the widely used Γ sensitivity analysis model, which was used to argue that strengthening an IV increases the power of sensitivity analyses in large samples, does not work for continuous IVs. Third, we quantify the bias of the Wald estimator with a possibly invalid IV and leverage it to develop a valid sensitivity analysis framework and show that strengthening an IV may or may not increase the power of sensitivity analyses. We use our framework to study the effect on premature babies of being delivered in a high technology/high volume neonatal intensive care unit.

Instrumental variables: to strengthen or not to strengthen?
Heng, S., Heng, S., Zhang, B., Han, X., Lorch, S. A., & Small, D. S. (n.d.).
Publication year: 2023
Journal title: Journal of the Royal Statistical Society: Series A (Statistics in Society)
Volume: 186
Issue: 4
Page(s): 852–873

Invited Talk at Doctoral Seminar, School of Global Public Health, New York University
Heng, S. (n.d.).
Publication year: 2023

Invited Talk at JSM 2023 (joint with Dr. Hyunseung Kang).
Heng, S. (n.d.).
Publication year: 2023

Invited Talk at "Dose Finding in Drug Development and Beyond", Conference Honoring Dr. Naitee Ting’s 70th Birthday, Storrs, U.S.A.
Heng, S. (n.d.).
Publication year: 2023

Maximizing the reach of universal child sexual abuse prevention: Protocol for an equivalence trial
Guastaferro, K., Melchior, M. S., Heng, S., Trudeau, J., & Holloway, J. L. (n.d.).
Publication year: 2024
Journal title: Contemporary Clinical Trials Communications
Volume: 41
Abstract: Background: Child sexual abuse (CSA) affects 1 in 5 girls and 1 in 12 boys before age 18. Universal school-based prevention programs are an effective and cost-efficient method of teaching students an array of personal safety skills. However, the programmatic reach of universal school-based programs is limited by the inherent reliance on the school infrastructure and a dearth of available alternative delivery modalities. Methods: The design for this study will use a rigorous cluster randomized design (N = 180 classrooms) to determine the equivalence of two delivery modalities of Safe Touches: as usual vs. modified. The as usual workshop will be delivered by two facilitators with live puppet skits (n = 90), whereas the modified workshop will be delivered by one facilitator using prerecorded skit videos (n = 90). We will determine the equivalence by measuring concept learning acquisition preworkshop to immediate postworkshop (Aim 1) and retention at 3-months postworkshop (Aim 2) among students in classrooms that receive the as usual or modified workshops. To conclude equivalence, it is imperative to also examine factors that may impact future dissemination and implementation, specifically program adoption among school personnel and implementation fidelity between the two modalities (Aim 3). Conclusion: Study findings will inform the ongoing development of effective CSA prevention programs and policy decisions regarding the sustainable integration of such programs within schools. Clinical trial registration: NCT06195852.

Maximizing the reach of universal child sexual abuse prevention: Protocol for an equivalence trial
Guastaferro, K., Melchior, M. S., Heng, S., Trudeau, J., & Holloway, J. L. (n.d.).
Publication year: 2024
Journal title: Contemporary Clinical Trials Communications
Abstract: Background: Child sexual abuse (CSA) affects 1 in 5 girls and 1 in 12 boys before age 18. Universal school-based prevention programs are an effective and cost-efficient method of teaching students an array of personal safety skills. However, the programmatic reach of universal school-based programs is limited by the inherent reliance on the school infrastructure and a dearth of available alternative delivery modalities. Methods: The design for this study, Roads to Impact, will use a rigorous cluster randomized design (N = 180 classrooms) to determine the equivalence of two delivery modalities of Safe Touches: as usual vs. modified. The as usual workshop will be delivered by two facilitators with live puppet skits, as designed (n = 90), whereas the modified workshop will be delivered by one facilitator using prerecorded skit videos (n = 90). We will determine the equivalence by measuring concept learning acquisition preworkshop to immediate postworkshop (Aim 1) and retention at 3-months postworkshop (Aim 2) among students in classrooms that receive the as usual or modified workshops. To conclude equivalence, it is imperative to also examine factors that may impact future dissemination and implementation, specifically program adoption among school personnel and implementation fidelity between the two modalities (Aim 3). Conclusion: Study findings will inform the ongoing development of effective CSA prevention programs and policy decisions regarding the sustainable integration of such programs within schools.

Parent-daughter agreement about HPV vaccination status in Kenya and Malawi
Moucheraud, C., Ochieng, E., Kweka, A., Wang, P., Xie, S., Ototo, J., Golub, G., Kapindo, E., Banda, E., Abdillahi, H., Szilagyi, P. G., & Heng, S. (n.d.).
Publication year: 2025
Journal title: Vaccine
Volume: 55
Abstract: Background: As more countries introduce the HPV vaccine, it is important to understand the validity of vaccination measures. This is especially true in low- and middle-income countries (LMICs) where public health monitoring of vaccination data may have delays or gaps, so alternative measurement approaches are often necessary. Parental report is a common approach for measuring routine childhood vaccination, but it has not been evaluated for HPV vaccination in LMICs. Methods: We conducted household surveys in Kenya (n = 146) and Malawi (n = 98) with parents/guardians and their daughters who were age-eligible for HPV vaccination. We compared parents'/guardians' reports of HPV vaccination status to daughters' reports; the latter was assumed to be the “gold standard” measure. Results: 88% of Kenyan parents/guardians and 82% of Malawian parents/guardians agreed with their daughters' reported HPV vaccination status. It was more common for parents/guardians to under-report (i.e., to say their daughter was unvaccinated but the girl said she had received dose(s)) than the inverse. Agreement with one's daughter was higher among parents/guardians who reported data from vaccination cards versus using recall, and among parents/guardians who expressed more versus less confidence in their knowledge. We did not find many differences in accuracy of report by parent/guardian characteristics, although in Kenya there were small and statistically significant negative associations with parental age, household income, and more girls in the household (the latter was also significantly negatively associated with report accuracy in Malawi). Conclusions: In countries where surveys will commonly be used to measure HPV vaccination status, we found very high agreement of parents/guardians with their daughters' reported receipt of the vaccine. These results are similar to findings from the literature about routine childhood vaccination measurement. This suggests that researchers, clinicians, and practitioners can use parent/guardian-reported HPV vaccination of their daughter as a relatively good proxy of her own reported immunization status, especially in settings without universal use of vaccination cards or registries.

Parent-daughter agreement about HPV vaccination status in Kenya and Malawi
Moucheraud, C., Ochieng, E., Kweka, A., Wang, P., Xie, S., Ototo, J., Kapindo, E., Banda, E., Szilagyi, P., & Heng, S. (n.d.).
Publication year: 2024
Journal title: In Preparation for Submission to Vaccine

Re-evaluating the impact of changing malaria burden on birth weight in sub-Saharan Africa: A pair-of-pairs study via optimal bipartite and non-bipartite matching
Wang, P., Huang, P., Shen, Y., Shahawy, O. E., O’Meara, W. P., & Heng, S. (n.d.).
Publication year: 2024
Journal title: In Preparation for Submission to Journal of American Statistical Association

Sensitivity Analysis for Binary Outcome Misclassification in Randomization Tests via Integer Programming
Heng, S., & Shaw, P. A. (n.d.).
Publication year: 2025
Journal title: Journal of Computational and Graphical Statistics
Abstract: Conducting a randomization test is a common method for testing causal null hypotheses in randomized experiments. The popularity of randomization tests is largely because their statistical validity only depends on the randomization design, and no distributional or modeling assumption on the outcome variable is needed. However, randomization tests may still suffer from other sources of bias, among which outcome misclassification is a significant one. We propose a model-free and finite-population sensitivity analysis approach for binary outcome misclassification in randomization tests. A central quantity in our framework is “warning accuracy,” defined as the threshold such that a randomization test result based on the measured outcomes may differ from that based on the true outcomes if the outcome measurement accuracy did not surpass that threshold. We show how learning the warning accuracy and related concepts can amplify analyses of randomization tests subject to outcome misclassification without adding additional assumptions. We show that the warning accuracy can be computed efficiently for large datasets by adaptively reformulating a large-scale integer program with respect to the randomization design. We apply the proposed approach to the Prostate Cancer Prevention Trial (PCPT). We also developed an open-source R package for implementation of our approach. Supplementary materials for this article are available online.

Sensitivity analysis for matched observational studies with continuous exposures and binary outcomes
Heng, S., Zhang, J., Small, D., & Heng, S. (n.d.).
Publication year: 2023
Journal title: Major Revision Invited by Biometrika

Sensitivity analysis for matched observational studies with continuous exposures and binary outcomes
Zhang, J., Small, D. S., & Heng, S. (n.d.).
Publication year: 2024
Journal title: Biometrika
Volume: 111
Issue: 4
Page(s): 1349-1368
Abstract: Matching is one of the most widely used study designs for adjusting for measured confounders in observational studies. However, unmeasured confounding may exist and cannot be removed by matching. Therefore, a sensitivity analysis is typically needed to assess a causal conclusion's sensitivity to unmeasured confounding. Sensitivity analysis frameworks for binary exposures have been well established for various matching designs and are commonly used in various studies. However, unlike the binary exposure case, valid and general sensitivity analysis methods for continuous exposures are still lacking, except in some special cases such as pair matching. To fill this gap in the binary outcome case, we develop a sensitivity analysis framework for general matching designs with continuous exposures and binary outcomes. First, we use probabilistic lattice theory to show that our sensitivity analysis approach is finite population exact under Fisher's sharp null. Second, we prove a novel design sensitivity formula as a powerful tool for asymptotically evaluating the performance of our sensitivity analysis approach. Third, to allow effect heterogeneity with binary outcomes, we introduce a framework for conducting asymptotically exact inference and sensitivity analysis on generalized attributable effects with binary outcomes via mixed-integer programming. Fourth, for the continuous outcome case, we show that conducting an asymptotically exact sensitivity analysis in matched observational studies when both the exposures and outcomes are continuous is generally NP-hard, except in some special cases such as pair matching. As a real data application, we apply our new methods to study the effect of early-life lead exposure on juvenile delinquency. An implementation of the methods in this work is available in the R package doseSens.

Sensitivity analysis for outcome misclassification in randomization tests via integer programming
Heng, S., Heng, S., & Shaw, P. A. (n.d.).
Publication year: 2023
Journal title: Major Revision Invited by Journal of Computational and Graphical Statistics

Social distancing and COVID-19: Randomization inference for a structured dose-response relationship
Heng, S., Zhang, B., Heng, S., Ye, T., & Small, D. S. (n.d.).
Publication year: 2023
Journal title: The Annals of Applied Statistics
Volume: 17
Issue: 1
Page(s): 23-46

Testing biased randomization assumptions and quantifying imperfect matching and residual confounding in matched observational studies
Heng, S., Chen, K., Heng, S., Long, Q., & Zhang, B. (n.d.).
Publication year: 2023
Journal title: Journal of Computational and Graphical Statistics
Volume: 32
Issue: 2
Page(s): 528-538

The central role of the propensity score in sensitivity analysis for matched observational studies
Heng, S., & Heng, S. (n.d.).
Publication year: 2023
Journal title: Observational Studies
Volume: 9
Issue: 1
Page(s): 35-41

Valid randomization tests in inexactly matched observational studies via iterative convex programming
Heng, S., Heng, S., Shen, Y., & Wang, P. (n.d.).
Publication year: 2023
Journal title: In Preparation for Submission to Biometrika