Siyu Heng

Assistant Professor of Biostatistics

Professional overview

Siyu Heng, PhD, is an Assistant Professor in the Department of Biostatistics whose work spans both methodological and applied research. His areas of expertise include causal inference, health data science, observational studies, randomized trials, sensitivity analysis, instrumental variables, measurement error, and survey data, along with their applications in public health.

Dr. Heng’s research has been published in the Journal of the Royal Statistical Society and Physical Review, among others. He has received several awards, including the IPUMS Global Health Research Award for the Best Student Paper; the Lawrence D. Brown Best Paper Award; the ASA Mental Health Statistics Section Student Paper Award; the ENAR Distinguished Student Paper Award; the NESS Student Research Award; and the Wellcome Trust Data Reuse Prize.

Dr. Heng received his PhD in applied mathematics and computational science from the University of Pennsylvania, and his BA in statistics from Nanjing University.

Education

PhD, Applied Mathematics and Computational Science (Statistics Track), University of Pennsylvania
BS, Mathematics, Nanjing University

Honors and awards

IPUMS Global Health Research Award for the Best Student Paper, Integrated Public Use Microdata Series (2021)
IMS Hannan Graduate Student Travel Award, Institute of Mathematical Statistics (2021)
ASA Mental Health Statistics Section Student Paper Award, American Statistical Association Section on Mental Health Statistics (2021)
ENAR Distinguished Student Paper Award, International Biometric Society Eastern North American Region (2021)
Wellcome Trust Data Reuse Prize: Malaria, Wellcome Trust (2019)
Benjamin Franklin Fellowship, University of Pennsylvania School of Arts and Sciences (2016, 2017, 2018)

Areas of research and study

Causal Inference
Epidemiology
Global Health
Health Equity
Instrumental Variables
Observational Studies
Public Health Policy
Randomized Experimentation
Social Sciences

Publications

Bias correction for randomization-based estimation in inexactly matched observational studies

Zhu, J., & Heng, S. (n.d.).

Publication year

2023

Journal title

In Preparation for Submission to International Journal of Epidemiology

Contributed Talk at the 2023 ENAR.

Heng, S. (n.d.).

Publication year

2023

Contributed Talk at the CCI Seminar, University of Pennsylvania.

Heng, S. (n.d.).

Publication year

2023

DELTA: Dual Consistency Delving with Topological Uncertainty for Active Graph Domain Adaptation

Wang, P., Cao, Y., Russell, C., Shen, Y., Luo, J., Zhang, M., Heng, S., & Luo, X. (n.d.).

Publication year

2025

Journal title

Transactions on Machine Learning Research

Volume

2025-February

Page(s)

1-26
Abstract
Graph domain adaptation has recently enabled knowledge transfer across different graphs. However, without the semantic information on target graphs, the performance on target graphs is still far from satisfactory. To address the issue, we study the problem of active graph domain adaptation, which selects a small quantity of informative nodes on the target graph for extra annotation. This problem is highly challenging due to the complicated topological relationships and the distribution discrepancy across graphs. In this paper, we propose a novel approach named Dual Consistency Delving with Topological Uncertainty (DELTA) for active graph domain adaptation. Our DELTA consists of an edge-oriented graph subnetwork and a path-oriented graph subnetwork, which can explore topological semantics from complementary perspectives. In particular, our edge-oriented graph subnetwork utilizes the message passing mechanism to learn neighborhood information, while our path-oriented graph subnetwork explores high-order relationships from sub-structures. To jointly learn from two subnetworks, we roughly select informative candidate nodes with the consideration of consistency across two subnetworks. Then, we aggregate local semantics from its K-hop subgraph based on node degrees for topological uncertainty estimation. To overcome potential distribution shifts, we compare target nodes and their corresponding source nodes for discrepancy scores as an additional component for fine selection. Extensive experiments on benchmark datasets demonstrate that DELTA outperforms various state-of-the-art approaches. The code implementation of DELTA is available at https://github.com/goose315/DELTA.

Design-based causal inference with missing outcomes: Missingness mechanisms, imputation-assisted randomization tests, and covariate adjustment

Heng, S., Zhang, J., & Feng, Y. (n.d.).

Publication year

2023

Journal title

Ready for Submission to Journal of the American Statistical Association

Effects of behavioral intervention components to increase COVID-19 testing for African American/Black and Latine frontline essential workers not up-to-date on COVID-19 vaccination: Results of an optimization randomized controlled trial

Gwadz, M., Heng, S., Cleland, C. M., Strayhorn, J., Robinson, J. A., Serrano, F. G., Wang, P., Parameswaran, L., & Chero, R. (n.d.).

Publication year

2025

Journal title

Journal of Behavioral Medicine
Abstract
Racial/ethnic disparities in COVID-19, including incidence, hospitalization, and death rates, are serious and persistent. Among those at highest risk for COVID-19 and its adverse effects are African American/Black and Latine (AABL) frontline essential workers in public-facing occupations (e.g., food services, retail). Testing for COVID-19 in various scenarios (when exposed or symptomatic, regular screening testing) is an essential component of the COVID-19 control strategy in the United States. However, AABL frontline workers have serious barriers to COVID-19 testing at the individual (insufficient knowledge, distrust, cognitive biases), social (norms), and structural levels of influence (access). Thus, testing rates are insufficient and interventions are needed. The present study is grounded in the multiphase optimization strategy (MOST) framework. It tests the main and interaction effects of a set of candidate behavioral intervention components to increase COVID-19 testing rates in this population. The study enrolled adult AABL frontline essential workers who were not up-to-date on COVID-19 vaccination nor recently tested for COVID-19. It used a factorial design to examine the effects of candidate behavioral intervention components, where each component was designed to address a specific barrier to COVID-19 testing. All participants received a core intervention comprised of health education. The candidate components were motivational interviewing counseling (MIC), a behavioral economics intervention (BEI), peer education (PE), and access to testing (either self-test kits [SK] or a navigation meeting [NM]). The primary outcome was COVID-19 testing in the follow-up period. Participants were assessed at baseline, randomly assigned to one of 16 experimental conditions, and assessed six- and 12-weeks later. The study was carried out in English and Spanish. 
We used a logistic regression model and multiple imputation to examine the main and interaction effects of the four factors (representing components): MIC, BEI, PE, and Access. We also conducted a sensitivity analysis using the complete case analysis. Participants (N = 438) were 35 years old on average (SD = 10). Half identified as men/male (52%), and 48% as women/female/other. Almost half (49%) were African American/Black, and 51% were Latine/Hispanic (12% participated in Spanish). A total of 32% worked in food services. Attendance in components was very high (~99%). BEI had a positive effect on the outcome (OR = 1.543; 95% CI: [0.977, 2.438]; p-value = 0.063) as did Access, in favor of SK (OR = 1.351; 95% CI: [0.859, 2.125]; p-value = 0.193). We found a three-way interaction among MIC*PE*Access (OR = 0.576; 95% CI: [0.367, 0.903]; p-value = 0.016): when MIC was present, SK tended to increase COVID testing when PE was not present. The study advances intervention science and takes the first step toward creating an efficient and effective multi-component intervention to increase COVID-19 testing rates in AABL frontline workers.

Instrumental variables: to strengthen or not to strengthen?

Heng, S., Zhang, B., Han, X., Lorch, S. A., & Small, D. S. (n.d.).

Publication year

2023

Journal title

Journal of the Royal Statistical Society. Series A: Statistics in Society

Volume

186

Issue

4

Page(s)

852-873
Abstract
Instrumental variables (IVs) are extensively used to handle unmeasured confounding. However, weak IVs may cause problems. Many matched studies have considered strengthening an IV through discarding some of the sample. It is widely accepted that strengthening an IV tends to increase the power of non-parametric tests and sensitivity analyses. We re-evaluate this conventional wisdom and offer new insights. First, we evaluate the trade-off between IV strength and sample size assuming a valid IV and exhibit conditions under which strengthening an IV increases power. Second, we derive a criterion for checking the validity of a sensitivity analysis model with a continuous dose and show that the widely used Γ sensitivity analysis model, which was used to argue that strengthening an IV increases the power of sensitivity analyses in large samples, does not work for continuous IVs. Third, we quantify the bias of the Wald estimator with a possibly invalid IV and leverage it to develop a valid sensitivity analysis framework and show that strengthening an IV may or may not increase the power of sensitivity analyses. We use our framework to study the effect on premature babies of being delivered in a high technology/high volume neonatal intensive care unit.

Invited Talk at Doctoral Seminar, School of Global Public Health, New York University

Heng, S. (n.d.).

Publication year

2023

Invited Talk at JSM 2023 (joint with Dr. Hyunseung Kang).

Heng, S. (n.d.).

Publication year

2023

Invited Talk at “Dose Finding in Drug Development and Beyond”: Conference Honoring Dr. Naitee Ting’s 70th Birthday, Storrs, U.S.A.

Heng, S. (n.d.).

Publication year

2023

Maximizing the reach of universal child sexual abuse prevention: Protocol for an equivalence trial

Guastaferro, K., Melchior, M. S., Heng, S., Trudeau, J., & Holloway, J. L. (n.d.).

Publication year

2024

Journal title

Contemporary Clinical Trials Communications

Volume

41
Abstract
Background: Child sexual abuse (CSA) affects 1 in 5 girls and 1 in 12 boys before age 18. Universal school-based prevention programs are an effective and cost-efficient method of teaching students an array of personal safety skills. However, the programmatic reach of universal school-based programs is limited by the inherent reliance on the school infrastructure and a dearth of available alternative delivery modalities. Methods: The design for this study will use a rigorous cluster randomized design (N = 180 classrooms) to determine the equivalence of two delivery modalities of Safe Touches: as usual vs. modified. The as usual workshop will be delivered by two facilitators with live puppet skits (n = 90). Whereas, the modified workshop will be delivered by one facilitator using prerecorded skit videos (n = 90). We will determine the equivalence by measuring concept learning acquisition preworkshop to immediate postworkshop (Aim 1) and retention at 3-months postworkshop (Aim 2) among students in classrooms that receive the as usual or modified workshops. To conclude equivalence, it is imperative to also examine factors that may impact future dissemination and implementation, specifically program adoption among school personnel and implementation fidelity between the two modalities (Aim 3). Conclusion: Study findings will inform the ongoing development of effective CSA prevention programs and policy decisions regarding the sustainable integration of such programs within schools. Clinical trial registration: NCT06195852.

Maximizing the reach of universal child sexual abuse prevention: Protocol for an equivalence trial 

Guastaferro, K., Melchior, M. S., Heng, S., Trudeau, J., & Holloway, J. L. (n.d.).

Publication year

2024

Journal title

Contemporary Clinical Trials Communications
Abstract
Background: Child sexual abuse (CSA) affects 1 in 5 girls and 1 in 12 boys before age 18. Universal school-based prevention programs are an effective and cost-efficient method of teaching students an array of personal safety skills. However, the programmatic reach of universal school-based programs is limited by the inherent reliance on the school infrastructure and a dearth of available alternative delivery modalities. Methods: The design for this study, Roads to Impact, will use a rigorous cluster randomized design (N = 180 classrooms) to determine the equivalence of two delivery modalities of Safe Touches: as usual vs. modified. The usual workshop will be delivered by two facilitators with live puppet skits, as designed (n = 90). Whereas, the modified workshop will be delivered by one facilitator using prerecorded skit videos (n = 90). We will determine the equivalence by measuring concept learning acquisition preworkshop to immediate postworkshop (Aim 1) and retention at 3-months postworkshop (Aim 2) among students in classrooms that receive the as usual or modified workshops. To conclude equivalence, it is imperative to also examine factors that may impact future dissemination and implementation, specifically program adoption among school personnel and implementation fidelity between the two modalities (Aim 3). Conclusion: Study findings will inform the ongoing development of effective CSA prevention programs and policy decisions regarding the sustainable integration of such programs within schools.

Parent-daughter agreement about HPV vaccination status in Kenya and Malawi

Moucheraud, C., Ochieng, E., Kweka, A., Wang, P., Xie, S., Ototo, J., Golub, G., Kapindo, E., Banda, E., Abdillahi, H., Szilagyi, P. G., & Heng, S. (n.d.).

Publication year

2025

Journal title

Vaccine

Volume

55
Abstract
Background: As more countries introduce the HPV vaccine, it is important to understand the validity of vaccination measures. This is especially true in low- and middle-income countries (LMICs) where public health monitoring of vaccination data may have delays or gaps, so alternative measurement approaches are often necessary. Parental report is a common approach for measuring routine childhood vaccination, but it has not been evaluated for HPV vaccination in LMICs. Methods: We conducted household surveys in Kenya (n = 146) and Malawi (n = 98) with parents/guardians and their daughters who were age-eligible for HPV vaccination. We compared parents'/guardians' reports of HPV vaccination status to daughters' reports; the latter was assumed to be the “gold standard” measure. Results: 88% of Kenyan parents/guardians and 82% of Malawian parents/guardians agreed with their daughters' reported HPV vaccination status. It was more common for parents/guardians to under-report (i.e., to say their daughter was unvaccinated but the girl said she had received dose(s)) than the inverse. Agreement with one's daughter was higher among parents/guardians who reported data from vaccination cards versus using recall, and among parents/guardians who expressed more versus less confidence in their knowledge. We did not find many differences in accuracy of report by parent/guardian characteristics, although in Kenya there were small and statistically significant negative associations with parental age, household income, and more girls in the household (the latter was also significantly negatively associated with report accuracy in Malawi). Conclusions: In countries where surveys will commonly be used to measure HPV vaccination status, we found very high agreement of parents/guardians with their daughters' reported receipt of the vaccine. These results are similar to findings from the literature about routine childhood vaccination measurement. This suggests that researchers, clinicians, and practitioners can use parent/guardian-reported HPV vaccination of their daughter as a relatively good proxy of her own reported immunization status, especially in settings without universal use of vaccination cards or registries.

Parent-daughter agreement about HPV vaccination status in Kenya and Malawi

Moucheraud, C., Ochieng, E., Kweka, A., Wang, P., Xie, S., Ototo, J., Kapindo, E., Banda, E., Szilagyi, P., & Heng, S. (n.d.).

Publication year

2024

Journal title

In Preparation for Submission to Vaccine

Re-evaluating the impact of changing malaria burden on birth weight in sub-Saharan Africa: A pair-of-pairs study via optimal bipartite and non-bipartite matching

Wang, P., Huang, P., Shen, Y., Shahawy, O. E., O’Meara, W. P., & Heng, S. (n.d.).

Publication year

2024

Journal title

In Preparation for Submission to Journal of the American Statistical Association

Sensitivity Analysis for Binary Outcome Misclassification in Randomization Tests via Integer Programming

Heng, S., & Shaw, P. A. (n.d.).

Publication year

2025

Journal title

Journal of Computational and Graphical Statistics
Abstract
Conducting a randomization test is a common method for testing causal null hypotheses in randomized experiments. The popularity of randomization tests is largely because their statistical validity only depends on the randomization design, and no distributional or modeling assumption on the outcome variable is needed. However, randomization tests may still suffer from other sources of bias, among which outcome misclassification is a significant one. We propose a model-free and finite-population sensitivity analysis approach for binary outcome misclassification in randomization tests. A central quantity in our framework is “warning accuracy,” defined as the threshold such that a randomization test result based on the measured outcomes may differ from that based on the true outcomes if the outcome measurement accuracy did not surpass that threshold. We show how learning the warning accuracy and related concepts can amplify analyses of randomization tests subject to outcome misclassification without adding additional assumptions. We show that the warning accuracy can be computed efficiently for large datasets by adaptively reformulating a large-scale integer program with respect to the randomization design. We apply the proposed approach to the Prostate Cancer Prevention Trial (PCPT). We also developed an open-source R package for implementation of our approach. Supplementary materials for this article are available online.

Sensitivity analysis for matched observational studies with continuous exposures and binary outcomes

Zhang, J., Small, D., & Heng, S. (n.d.).

Publication year

2023

Journal title

Major Revision Invited by Biometrika

Sensitivity analysis for matched observational studies with continuous exposures and binary outcomes

Zhang, J., Small, D. S., & Heng, S. (n.d.).

Publication year

2024

Journal title

Biometrika

Volume

111

Issue

4

Page(s)

1349-1368
Abstract
Matching is one of the most widely used study designs for adjusting for measured confounders in observational studies. However, unmeasured confounding may exist and cannot be removed by matching. Therefore, a sensitivity analysis is typically needed to assess a causal conclusion's sensitivity to unmeasured confounding. Sensitivity analysis frameworks for binary exposures have been well established for various matching designs and are commonly used in various studies. However, unlike the binary exposure case, there still lacks valid and general sensitivity analysis methods for continuous exposures, except in some special cases such as pair matching. To fill this gap in the binary outcome case, we develop a sensitivity analysis framework for general matching designs with continuous exposures and binary outcomes. First, we use probabilistic lattice theory to show that our sensitivity analysis approach is finite population exact under Fisher's sharp null. Second, we prove a novel design sensitivity formula as a powerful tool for asymptotically evaluating the performance of our sensitivity analysis approach. Third, to allow effect heterogeneity with binary outcomes, we introduce a framework for conducting asymptotically exact inference and sensitivity analysis on generalized attributable effects with binary outcomes via mixed-integer programming. Fourth, for the continuous outcome case, we show that conducting an asymptotically exact sensitivity analysis in matched observational studies when both the exposures and outcomes are continuous is generally NP-hard, except in some special cases such as pair matching. As a real data application, we apply our new methods to study the effect of early-life lead exposure on juvenile delinquency. An implementation of the methods in this work is available in the R package doseSens.

Sensitivity analysis for outcome misclassification in randomization tests via integer programming

Heng, S., & Shaw, P. A. (n.d.).

Publication year

2023

Journal title

Major Revision Invited by Journal of Computational and Graphical Statistics

Social distancing and COVID-19: Randomization inference for a structured dose-response relationship

Zhang, B., Heng, S., Ye, T., & Small, D. S. (n.d.).

Publication year

2023

Journal title

The Annals of Applied Statistics

Volume

17

Issue

1

Page(s)

23-46

Testing biased randomization assumptions and quantifying imperfect matching and residual confounding in matched observational studies

Chen, K., Heng, S., Long, Q., & Zhang, B. (n.d.).

Publication year

2023

Journal title

Journal of Computational and Graphical Statistics

Volume

32

Issue

2

Page(s)

528-538

The central role of the propensity score in sensitivity analysis for matched observational studies

Heng, S. (n.d.).

Publication year

2023

Journal title

Observational Studies

Volume

9

Issue

1

Page(s)

35-41

Valid randomization tests in inexactly matched observational studies via iterative convex programming

Heng, S., Shen, Y., & Wang, P. (n.d.).

Publication year

2023

Journal title

In Preparation for Submission to Biometrika

Contact

siyuheng@nyu.edu
708 Broadway, New York, NY 10003