I am an incoming assistant professor of biostatistics at the Vaccine and Infectious Disease Division of the Fred Hutchinson Cancer Center. I obtained my PhD from the Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, under the supervision of Dr. Dylan Small. I am broadly interested in the design and analysis of observational studies, instrumental variables, and the application of causal inference and statistical modeling in medicine and public health. I collaborate actively with researchers across multiple disciplines, in particular on cardiac surgery and on cerebral malaria in sub-Saharan Africa.
Download my resumé.
Link to my Google Scholar page.
PhD in Statistics, 2022
University of Pennsylvania
BS in Mathematics, 2017
Harvey Mudd College
Subclassification and matching are often used to adjust for observed covariates in observational studies; however, they are largely restricted to relatively simple study designs with a binary treatment. One important exception is Lu et al. (2001), who considered optimal pair matching with a continuous treatment dose. In this article, we propose two criteria for optimal subclassification/full matching based on subclass homogeneity with a continuous treatment dose, and develop an efficient polynomial-time algorithm that is guaranteed to find an optimal subclassification with respect to one criterion and serves as a 2-approximation algorithm for the other criterion. We discuss how to incorporate treatment dose and use appropriate penalties to control the number of subclasses in the design. Via extensive simulations, we systematically examine the performance of our proposed method, and demonstrate that combining our proposed subclassification scheme with regression adjustment helps reduce model dependence for parametric causal inference with a continuous treatment dose. We illustrate the new design and how to conduct randomization-based statistical inference under the new design using Medicare and Medicaid claims data to study the effect of transesophageal echocardiography (TEE) during CABG surgery on patients' 30-day mortality rate.
Cerebral malaria is still a major cause of death in children in sub-Saharan Africa. Among survivors, debilitating neurological sequelae can leave children with permanent cognitive impairments and societal stigma, resulting in taxing repercussions for their families. This study investigated the effect of delay in presentation to medical care on outcome in children with cerebral malaria in Malawi.
Social distancing is widely acknowledged as an effective public health policy for combating the novel coronavirus. But extreme forms of social distancing like isolation and quarantine have costs, and it is not clear how much social distancing is needed to achieve public health effects. In this article, we develop a design-based framework to test the causal null hypothesis and make inference about the dose-response relationship between reduction in social mobility and COVID-19 related public health outcomes. We first discuss how to embed observational data with a time-independent, continuous treatment dose into an approximate randomized experiment, and develop a randomization-based procedure that tests if a structured dose-response relationship fits the data. We then generalize the design and testing procedure to accommodate a time-dependent treatment dose in a longitudinal setting. Finally, we apply the proposed design and testing procedures to investigate the effect of social distancing during the phased reopening in the United States on public health outcomes using data compiled from sources including Unacast, the United States Census Bureau, and the County Health Rankings and Roadmaps Program. We rejected a primary analysis null hypothesis stating that social distancing from April 27, 2020, to June 28, 2020, had no effect on the COVID-19-related death toll from June 29, 2020, to August 2, 2020 (p-value < 0.001), and found that it took more reduction in mobility to prevent exponential growth in case numbers for non-rural counties compared to rural counties.
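The randomization-based logic can be illustrated in a much simpler setting than the one studied in the article. The sketch below is a hypothetical Python example, not the article's testing procedure: it assumes units have already been matched into pairs, that within each pair the higher dose was assigned at random, and that `diffs` holds the outcome of the higher-dose unit minus the outcome of the lower-dose unit. Under Fisher's sharp null of no effect, each difference is equally likely to carry either sign, so an exact one-sided p-value is obtained by enumerating all sign flips. The function name and the numbers are made up for illustration.

```python
from itertools import product


def paired_sign_flip_pvalue(diffs):
    """Exact one-sided randomization p-value for matched-pair differences.

    Assumes that, under the sharp null, the sign of each within-pair
    difference is + or - with probability 1/2, independently across pairs.
    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    observed = sum(diffs)
    n_at_least_as_large = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = sum(s * d for s, d in zip(signs, diffs))
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            n_at_least_as_large += 1
    return n_at_least_as_large / n_total


# Hypothetical differences for 8 matched pairs, all favoring the higher dose.
example_diffs = [1.2, 0.8, 0.5, 2.1, 0.3, 1.7, 0.9, 1.1]
p_value = paired_sign_flip_pvalue(example_diffs)  # 1 / 2**8 = 0.00390625
```

With all eight differences positive, only the observed sign pattern produces a statistic at least as large as the observed one, which gives the smallest p-value attainable with eight pairs.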
We examine the role of textual data as study units when conducting causal inference by drawing parallels between human subjects and organized texts. We elaborate on key causal concepts and principles, and expose some ambiguities and occasional fallacies. To help frame a causal query better, we discuss two strategies: (i) shifting from immutable traits to perceptions of them, and (ii) shifting from an abstract concept/property to its constituent parts, i.e., a constructivist perspective on an abstract concept. We hope this article will raise awareness of the importance of articulating and clarifying fundamental concepts before developing methodologies for causal inference with textual data.
Nearly 150,000 patients undergo open cardiac valve or aortic surgery each year in the US. Intraoperative transesophageal echocardiography (TEE) is used frequently during cardiac surgery, but there is a lack of evidence associating TEE use with improved clinical outcomes. This matched, retrospective cohort study used national registry data from the Society of Thoracic Surgeons (STS) Adult Cardiac Surgery Database (ACSD) from 2011 to 2019 to compare clinical outcomes among patients undergoing cardiac valve or aortic surgery with vs without intraoperative TEE. Statistical analyses consisted of multiple matched comparisons (including within-hospital and within-surgeon matches), a negative control outcome analysis, and sensitivity analyses.
Multivariate matching has two goals: (i) to construct treated and control groups that have similar distributions of observed covariates, and (ii) to produce matched pairs or sets that are homogeneous in a few key covariates. When there are only a few binary covariates, both goals may be achieved by matching exactly for these few covariates. Commonly, however, there are many covariates, so goals (i) and (ii) come apart, and must be achieved by different means. As is also true in a randomized experiment, similar distributions can be achieved for a high-dimensional covariate, but close pairs can be achieved for only a few covariates. We introduce a new polynomial-time method for achieving both goals that substantially generalizes several existing methods; in particular, it can minimize the earthmover distance between two marginal distributions. The method involves minimum cost flow optimization in a network built around a tripartite graph, unlike the usual network built around a bipartite graph. In the tripartite graph, treated subjects appear twice, on the far left and the far right, with controls sandwiched between them, and efforts to balance covariates are represented on the right, while efforts to find close individual pairs are represented on the left. In this way, the two efforts may be pursued simultaneously without conflict. The method is applied to our ongoing study in the Medicare population of the relationship between superior nursing and sepsis mortality. The match2C package in R implements the method.
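To make the earthmover distance mentioned above concrete, here is a minimal Python sketch for a single covariate. It is not the tripartite network method and does not use the match2C package; it only evaluates the distance between the marginal distributions of one covariate in two equal-sized groups, where the one-dimensional earthmover distance reduces to the mean absolute difference between sorted values. The function name and the ages are hypothetical.

```python
def earthmover_1d(x, y):
    """Earthmover (Wasserstein-1) distance between two equal-sized samples.

    For one-dimensional empirical distributions with the same number of
    points, the optimal transport plan pairs the sorted values in order.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Hypothetical ages of treated subjects and of their matched controls.
treated_age = [61, 70, 75]
control_age = [60, 72, 74]
balance = earthmover_1d(treated_age, control_age)  # (1 + 2 + 1) / 3
```

A value near zero indicates that the two groups have nearly identical marginal distributions of the covariate, even if individual pairs are not close on it.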
Coronary artery bypass graft (CABG) surgery is the most widely performed cardiac surgery in the United States. Transesophageal echocardiography (TEE) is frequently used in a variety of cardiac surgical procedures, but its clinical benefit in isolated CABG surgery is unclear, and guidelines remain indeterminate. The aim of this study was to compare clinical outcomes among patients undergoing isolated CABG surgery with versus without TEE in order to test the hypothesis that TEE would be associated with improved clinical outcomes after CABG surgery.
This paper proposes to embed a class of observational IV data into a cluster-randomized encouragement experiment using nonbipartite matching. Potential outcomes and causal assumptions underpinning the design are formalized and examined. Testing procedures for two commonly used estimands, Fisher’s sharp null hypothesis and the pooled effect ratio (PER), are extended to the current setting. We then introduce a novel cluster-heterogeneous proportional treatment effect model and the relevant estimand: the average cluster effect ratio. This new estimand allows treatment heterogeneity, and is advantageous over the PER estimand in that it does not suffer from Simpson’s paradox. We develop an asymptotically valid randomization-based testing procedure for this new estimand based on solving a mixed-integer quadratically constrained optimization problem.
We propose a general framework for approaching the problem of estimating optimal individualized treatment rules (ITRs) when a valid IV is allowed to only partially identify the treatment effect. We introduce a novel notion of optimality called ‘IV-optimality’. A treatment rule is said to be IV-optimal if it minimizes the maximum risk with respect to the putative IV and the set of IV identification assumptions. We derive a bound on the risk of an IV-optimal rule that illuminates when an IV-optimal rule has favourable generalization performance. We propose a classification-based statistical learning method that estimates such an IV-optimal rule, design computationally efficient algorithms, and prove theoretical guarantees.
How many healthcare workers have lost their lives fighting coronavirus disease (COVID-19)? We estimate this number using the capture–recapture method.
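As a rough illustration of the capture–recapture idea, the following Python sketch computes the classical two-list Lincoln-Petersen estimator and Chapman's bias-corrected variant. This is a textbook version under the usual assumptions (two independent lists, a closed population, and every individual equally likely to appear on a list), not necessarily the estimator used in the article, and the counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate population size from two lists.

    n1: number of individuals on list 1
    n2: number of individuals on list 2
    m:  number of individuals appearing on both lists
    """
    if m <= 0:
        raise ValueError("the lists must overlap (m > 0)")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's bias-corrected estimator; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 200 names on one list, 150 on another, 50 on both.
estimate_lp = lincoln_petersen(200, 150, 50)  # 600.0
estimate_chapman = chapman(200, 150, 50)      # about 594.12
```

The intuition is that the share of list 2 that also appears on list 1 (50 of 150) estimates the share of the whole population captured by list 1, so the population is roughly 200 divided by one third.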
A commonly used sensitivity analysis for matched observational studies adopts a worst-case perspective, which assumes that, in each matched set, the unmeasured confounder U is allocated so as to maximize the bias. This worst-case allocation of U does not correspond to any realistic distribution of U in the population and is difficult to compare with observed covariates. We propose a new sensitivity analysis method that addresses these concerns. We apply the new method to a study of second-hand smoking and blood lead levels in children and find that, to explain away the association between second-hand smoke exposure and blood lead levels as non-causal, the unmeasured confounder would have to be a bigger confounder than any measured confounder.
It is common to compare individualized treatment rules based on the value function, which is the expected potential outcome under the treatment rule. Although the value function is not point-identified when there is unmeasured confounding, it still defines a partial order among the treatment rules under Rosenbaum’s sensitivity analysis model. We first consider how to compare two treatment rules with unmeasured confounding in the single-decision setting and then use this pairwise test to rank multiple treatment rules. We also consider how to select, among many treatment rules, the best rules and the rules that are better than a control rule. The proposed methods are illustrated using two real examples, one about the benefit of malaria prevention programs to different age groups and another about the effect of late retirement on senior health in different gender and occupation groups. Supplementary materials for this article are available online.
Cerebral malaria (CM) remains a leading cause of mortality and morbidity in children in sub-Saharan Africa. Recent studies using brain magnetic resonance imaging have revealed increased brain volume as a major predictor of death. Similar morphometric predictors of morbidity at discharge are lacking. The aim of this study was to investigate the utility of serial cranial cisternal cerebrospinal fluid (CSF) volume measurements in predicting morbidity at discharge in pediatric CM survivors.