Shu (Violet) Xu
Clinical Assistant Professor of Biostatistics
Dr. Shu Xu’s work represents a balance of both statistical and applied aspects of quantitative methodology. Her primary quantitative interests include evaluating and developing statistical methods for longitudinal data analysis. Specifically, Dr. Xu’s research focuses on various aspects of latent growth models, missing data methods, and causal inference models.
Dr. Xu has collaborated with substance use, family, and health researchers to advance and share her knowledge of quantitative methodology and pursue a better understanding of the social sciences and public health. She has conducted research with the Family Translational Research Group at NYU and the Methodology Center at the Pennsylvania State University.
BS, Psychology, East China Normal University, Shanghai, China
MS, Quantitative Psychology, University of California, Davis
PhD, Quantitative Psychology, University of California, Davis
Biostatistics
Family research
Longitudinal Data Analysis
Missing Data Methods
Mixture Models
Quantitative Research
Using Security Questions to Link Participants in Longitudinal Data Collection
Xu, S., Chan, A., Lorber, M. F., & Chase, J. P.
Journal title: Prevention Science
Page(s): 194-202
Anonymous data collection systems are often necessary when assessing sensitive behaviors but can pose challenges to researchers seeking to link participants over time. To assist researchers in anonymously linking participants, we outlined and tested a novel security question linking (SEEK) method. The SEEK method includes four steps: (1) data management and standardization, (2) many-to-many matching, (3) fuzzy matching, and (4) rematching and verification. The method is demonstrated in SAS with two samples from a longitudinal study of adolescent dating violence. After an initial assessment during a laboratory visit, participants were asked to complete an online assessment either (a) once, 3 months later (Sample 1, n = 60), or (b) three times at 1-month intervals (Sample 2, n = 140). Demographics, eye color, and responses to nine security questions were used as key variables to link responses from the laboratory and online follow-up assessments. The rates of matched cases were 100% in Sample 1 and from 94.3 to 98.3% in Sample 2. To quantify the confidence in the data quality of successfully matched pairs, we reported the means and standard deviations of the number of matched security questions. In addition, we reported the rank order and counts of the mismatched components in key variables. Results indicate that the SEEK method provides a feasible and reliable solution to link responses in longitudinal studies with sensitive questions.
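As a rough illustration of the first three steps named in the abstract above (standardization, many-to-many matching, fuzzy matching), the sketch below links records by counting security-question answers that agree approximately. It is not the authors' SAS implementation: the function names, the 0.8 similarity threshold, the two-answer minimum, and the sample answers are all hypothetical choices made for this example, and the rematching and verification step is omitted.

```python
from difflib import SequenceMatcher

def normalize(answer):
    # Standardization (assumed rule): lowercase and collapse whitespace.
    return " ".join(answer.lower().split())

def similarity(x, y):
    # Fuzzy string similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, normalize(x), normalize(y)).ratio()

def count_matched(lab, online, threshold=0.8):
    # Number of security questions whose answers agree approximately.
    return sum(
        1 for q in lab
        if q in online and similarity(lab[q], online[q]) >= threshold
    )

def link(lab_records, online_records, min_matched=2, threshold=0.8):
    # Many-to-many step: score every lab/online pair, then keep the best
    # lab candidate for each online record if it clears min_matched.
    links = {}
    for oid, online in online_records.items():
        scored = [
            (count_matched(lab, online, threshold), lid)
            for lid, lab in lab_records.items()
        ]
        best_score, best_id = max(scored, default=(0, None))
        if best_score >= min_matched:
            links[oid] = (best_id, best_score)
    return links

# Hypothetical answers with typos and formatting differences.
lab_records = {
    "L1": {"pet": "Rex", "street": "Maple Ave", "teacher": "Mrs. Smith"},
    "L2": {"pet": "Bella", "street": "Oak St", "teacher": "Mr. Jones"},
}
online_records = {
    "O1": {"pet": "rex ", "street": "Mapel Ave", "teacher": "Mrs Smith"},
    "O2": {"pet": "Bela", "street": "Oak Street", "teacher": "Mr. Jones"},
}
links = link(lab_records, online_records)
```

The number of matched questions stored with each link corresponds loosely to the quantity the abstract summarizes with means and standard deviations as a data-quality indicator.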
A Randomized, Controlled Trial of the Impact of the Couple CARE for Parents of Newborns Program on the Prevention of Intimate Partner Violence and Relationship Problems
Heyman, R. E., Slep, A. M., Lorber, M. F., Mitnick, D. M., Xu, S., Baucom, K. J., Halford, W. K., & Niolon, P. H.
Journal title: Prevention Science
Page(s): 620-631
Effective, accessible prevention programs are needed for adults at heightened risk for intimate partner violence (IPV). This parallel group randomized controlled trial examines whether such couples receiving the American version of Couple CARE for Parents of Newborns (CCP; Halford et al. 2009) following the birth of a child, compared with controls, report fewer first occurrences of clinically significant IPV, less frequent physical and psychological IPV, and improved relationship functioning. Further, we test whether intervention effects are moderated by level of risk for IPV. Couples at elevated risk for IPV (N = 368) recruited from maternity units were randomized to CCP (n = 188) or a 24-month waitlist (n = 180) and completed measures of IPV and relationship functioning at baseline, post-program (when child was 8 months old), and two follow-ups (at 15 and 24 months). Intervention effects were tested using intent to treat (ITT) as well as complier average causal effect (CACE; Jo and Muthén 2001) structural equation models. CCP did not significantly prevent clinically significant IPV nor were there significant main effects of CCP on clinically significant IPV, frequency of IPV, or most relationship outcomes in the CACE or ITT analyses. Risk moderated the effect of CCP on male-to-female physical IPV at post-program, with couples with a planned pregnancy declining, but those with unplanned pregnancies increasing. This study adds to previous findings that prevention programs for at-risk couples are not often effective and may even be iatrogenic for some couples.
Patterns of psychological health problems and family maltreatment among United States Air Force members
Lorber, M. F., Xu, S., Heyman, R. E., Slep, A. M., & Beauchaine, T. P.
Journal title: Journal of Clinical Psychology
Page(s): 1258-1271
Objectives: We sought to identify subgroups of individuals based on patterns of psychological health problems (PH; e.g., depressive symptoms, hazardous drinking) and family maltreatment (FM; e.g., child and partner abuse). Method: We analyzed data from very large surveys of United States Air Force active duty members with romantic partners and children. Results: Latent class analyses indicated six replicable patterns of PH problems and FM. Five of these classes, representing ∼98% of survey participants, were arrayed ordinally, with increasing risk of multiple PH problems and FM. A sixth group defied this ordinal pattern, with pronounced rates of FM and externalizing PH problems, but without correspondingly high rates/levels of internalizing PH problems. Conclusions: Ramifications of these results for intervention are discussed.
A New Look at the Psychometrics of the Parenting Scale Through the Lens of Item Response Theory
Lorber, M. F., Xu, S., Slep, A. M., Bulling, L., & O’Leary, S. G.
Journal title: Journal of Clinical Child and Adolescent Psychology
Page(s): 613-626
The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on 2 community samples of cohabiting parents of 3- to 8-year-old children, combined to yield a total sample size of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater 6-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development.
Interrater agreement statistics with skewed data: Evaluation of alternatives to Cohen's kappa
Xu, S., & Lorber, M. F.
Journal title: Journal of Consulting and Clinical Psychology
Page(s): 1219-1227
Objective: In this study, we aimed to evaluate interrater agreement statistics (IRAS) for use in research on low base rate clinical diagnoses or observed behaviors. Establishing and reporting sufficient interrater agreement is essential in such studies. Yet the most commonly applied agreement statistic, Cohen's κ, has a well-known sensitivity to base rates that results in a substantial penalization of interrater agreement when behaviors or diagnoses are very uncommon, a prevalent and frustrating concern in such studies. Method: We performed Monte Carlo simulations to evaluate the performance of 5 of κ's alternatives (Van Eerdewegh's V, Yule's Y, Holley and Guilford's G, Scott's π, and Gwet's AC1), alongside κ itself. The simulations investigated the robustness of these IRAS to conditions that are common in clinical research, with varying levels of behavior or diagnosis base rate, rater bias, observed interrater agreement, and sample size. Results: When the base rate was 0.5, each IRAS provided similar estimates, particularly with unbiased raters. G was the least sensitive of the IRAS to base rates. Conclusions: The results encourage the use of the G statistic for its consistent performance across the simulation conditions. We recommend separately reporting the rates of agreement on the presence and absence of a behavior or diagnosis alongside G as an index of chance-corrected overall agreement.
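The base-rate sensitivity described in the abstract above can be seen in a small worked example. The sketch below computes Cohen's κ and Holley and Guilford's G from a 2×2 table of two raters' present/absent codes, using the standard two-category formulas κ = (Po − Pe) / (1 − Pe) and G = 2·Po − 1. The function name and the example counts are made up for illustration and are not taken from the paper's simulations.

```python
import math

def agreement_stats(a, b, c, d):
    """Agreement statistics for two raters and a binary code.

    a = both raters code "present", d = both code "absent",
    b = only rater 1 codes "present", c = only rater 2 codes "present".
    Returns (observed agreement, Cohen's kappa, Holley-Guilford G).
    """
    n = a + b + c + d
    po = (a + d) / n                      # observed agreement
    p1 = (a + b) / n                      # rater 1 base rate of "present"
    p2 = (a + c) / n                      # rater 2 base rate of "present"
    pe = p1 * p2 + (1 - p1) * (1 - p2)    # chance agreement used by kappa
    kappa = (po - pe) / (1 - pe) if pe < 1 else math.nan
    g = 2 * po - 1                        # G fixes chance agreement at 0.5
    return po, kappa, g

# Hypothetical tables with the same 90% observed agreement.
balanced = agreement_stats(45, 5, 5, 45)   # base rate 0.50
skewed = agreement_stats(1, 5, 5, 89)      # base rate 0.06
```

With these made-up counts, both tables have 90% observed agreement and G = 0.8, while κ is 0.8 for the balanced table and roughly 0.11 for the skewed one.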
Noxious family environments in relation to adult and childhood caries
Lorber, M. F., Slep, A. M., Heyman, R. E., Xu, S., Dasanayake, A. P., & Wolff, M. S.
Journal title: Journal of the American Dental Association
Page(s): 924-930
Background. The authors tested hypotheses that more noxious family environments are associated with poorer adult and child oral health. Methods. A community sample of married or cohabiting couples (N = 135) and their elementary school-aged children participated. Dental hygienists determined the number of decayed, missing and filled surfaces via oral examination. Subjective oral health impacts were measured by means of questionnaires completed by the parents and children. The parents completed questionnaires about interparental and parent-to-child physical aggression (for example, pushing) and emotional aggression (for example, derision), as well as harsh discipline. Observers rated the couples' hostile behavior in laboratory interactions. Results. The extent of women's and men's caries experience was associated positively with their partners' levels of overall noxious behavior toward them. The extent of children's caries experience was associated positively with the level of their mothers' emotional aggression toward their partners. Conclusions. Noxious family environments may be implicated in compromised oral health. Future research that replicates and extends these findings can provide the foundation to translate them into preventive interventions. Practical Implications. Noxious family environments may help explain the limitations of routine oral health preventive strategies. Interprofessional strategies that also address the family environment ultimately may prove to be more effective than are single modality approaches.
On Fitting a Multivariate Two-Part Latent Growth Model
Xu, S., Blozis, S. A., & Vandewater, E. A.
Journal title: Structural Equation Modeling
Page(s): 131-148
A 2-part latent growth model can be used to analyze semicontinuous data to simultaneously study change in the probability that an individual engages in a behavior, and if engaged, change in the behavior. This article uses a Monte Carlo (MC) integration algorithm to study the interrelationships between the growth factors of 2 variables measured longitudinally where each variable can follow a 2-part latent growth model. A SAS macro implementing Mplus is developed to estimate the model to take into account the sampling uncertainty of this simulation-based computational approach. A sample of time-use data is used to show how maximum likelihood estimates can be obtained using a rectangular numerical integration method and an MC integration method.
Causal Inference in Latent Class Analysis
Lanza, S. T., Coffman, D. L., & Xu, S.
Journal title: Structural Equation Modeling
Page(s): 361-383
The integration of modern methods for causal inference with latent class analysis (LCA) allows social, behavioral, and health researchers to address important questions about the determinants of latent class membership. In this article, 2 propensity score techniques, matching and inverse propensity weighting, are demonstrated for conducting causal inference in LCA. The different causal questions that can be addressed with these techniques are carefully delineated. An empirical analysis based on data from the National Longitudinal Survey of Youth 1979 is presented, where college enrollment is examined as the exposure (i.e., treatment) variable and its causal effect on adult substance use latent class membership is estimated. A step-by-step procedure for conducting causal inference in LCA, including multiple imputation of missing data on the confounders, exposure variable, and multivariate outcome, is included. Sample syntax for carrying out the analysis using SAS and R is given in an appendix.
Preadolescent drug use resistance skill profiles, substance use, and substance use prevention
Hopfer, S., Hecht, M. L., Lanza, S. T., Tan, X., & Xu, S.
Journal title: Journal of Primary Prevention
Page(s): 395-404
The aims of the current study were threefold: (1) specify the skills component of social influence prevention interventions for preadolescents, (2) examine the relationship between resistance skill profiles and substance use among preadolescents, and (3) evaluate whether subgroups of preadolescents based on their resistance skills and refusal confidence may be differentially impacted by the kiR prevention program. Latent class analysis showed a four-class model of 5th grader resistance skill profiles. Approximately half of preadolescents (53%) were familiar with four prototypical resistance skills and showed confidence to apply these skills in real-world settings (highly competent profile); 15% were familiar with resistance skills but had little confidence (skillful profile); 18% were confident yet had little knowledge (confident profile); while 15% had low knowledge and confidence (low competence profile). These skill profiles significantly predicted 8th grade recent substance use (2LL = -2,262.21, df = 3, p = .0005). As predicted by theory, the highly competent skill profile reported lower mean recent substance use than the population sample mean use. Latent transition analysis showed that although patterns of transiting into the highly competent skill profile over time were observed in the expected direction, this pattern was not significant when comparing treatment and control. Identifying skill profiles that predict recent substance use is theoretically consistent and has important implications for healthy and substance-free development.
Sensitivity Analysis of Multiple Informant Models When Data Are Not Missing at Random
Blozis, S. A., Ge, X., Xu, S., Natsuaki, M. N., Shaw, D. S., Neiderhiser, J. M., Scaramella, L. V., Leve, L. D., & Reiss, D.
Journal title: Structural Equation Modeling
Page(s): 283-298
Missing data are common in studies that rely on multiple informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, and so all groups can be retained for analysis even if only 1 member of a group contributes data. Statistical inference is based on the assumption that data are missing completely at random or missing at random. Importantly, whether or not data are missing is assumed to be independent of the missing data. A saturated correlates model that incorporates correlates of the missingness or the missing data into an analysis and multiple imputation that might also use such correlates offer advantages over the standard implementation of SEM when data are not missing at random because these approaches could result in a data analysis problem for which the missingness is ignorable. This article considers these approaches in an analysis of family data to assess the sensitivity of parameter estimates and statistical inferences to assumptions about missing data, a strategy that could be easily implemented using SEM software.
Sensitivity analysis of mixed models for incomplete longitudinal data
Xu, S., & Blozis, S. A.
Journal title: Journal of Educational and Behavioral Statistics
Page(s): 237-256
Mixed models are used for the analysis of data measured over time to study population-level change and individual differences in change characteristics. Linear and nonlinear functions may be used to describe a longitudinal response, individuals need not be observed at the same time points, and missing data, assumed to be missing at random (MAR), may be handled. While the mechanism giving rise to the missing data cannot be determined by the observations, the sensitivity of parameter estimates to missing data assumptions can be studied, for example, by fitting multiple models that make different assumptions about the missing data process. Sensitivity analysis of a mixed model that may include nonlinear parameters when some data are missing is discussed. An example is provided.
Latent curve models: A structural equation perspective
Blozis, S. A., Cho, Y., & Xu, V. S.
Journal title: Sociological Methods and Research
The belief and modeling of aging
Cui, L. J., Xu, V. S., & Wang, X. J.
Journal title: Chinese Journal of Gerontology
A study on the relationship between adaptive ability and home environment in middle school
Li, G., & Xu, V. S.
Journal title: In Learning and Research