Testing Biased Randomization Assumptions and Quantifying Imperfect Matching and Residual Confounding in Matched Observational Studies
One central goal of the design of observational studies is to embed non-experimental data into an approximate randomized controlled trial using statistical matching. Researchers then make the randomization assumption in their downstream outcome analysis. For a matched-pair design, the randomization assumption states that the treatment assignments across matched pairs are independent, and that within each pair the probability of the first subject receiving treatment and the second control equals the probability of the first receiving control and the second treatment. In this article, we develop a novel framework for testing the randomization assumption based on solving a clustering problem with side-information using modern statistical learning tools. Our testing framework is nonparametric, finite-sample exact, and distinct from previous proposals in that it can be used to test a relaxed version of the randomization assumption called the biased randomization assumption. One important by-product of our testing framework is a quantity called the residual sensitivity value (RSV), which quantifies the minimal level of residual confounding due to observed covariates not being well matched. We advocate taking the RSV into account in the downstream primary analysis. The proposed methodology is illustrated by re-examining a famous observational study concerning the effect of right heart catheterization (RHC) in the initial care of critically ill patients.
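To make the randomization assumption and its relaxation concrete, the following is a minimal sketch, not the testing framework described above. It assumes the biased randomization assumption takes the familiar form in which, within each pair, the probability that the first subject is treated lies between 1/(1+Γ) and Γ/(1+Γ), with Γ = 1 recovering the randomization assumption. The function names and the use of a sign test in the downstream analysis are illustrative assumptions.

```python
from math import comb


def pair_probability_bounds(gamma):
    """Bounds on P(first subject treated) within a pair under bias level gamma.

    Assumed form of the biased randomization assumption; gamma = 1 gives
    the fair-coin randomization assumption (both bounds equal 0.5).
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    return 1.0 / (1.0 + gamma), gamma / (1.0 + gamma)


def worst_case_sign_test_pvalue(n_pairs, n_positive, gamma=1.0):
    """Upper bound on the one-sided sign-test p-value under bias level gamma.

    n_positive is the number of pairs in which the treated subject had the
    larger outcome. Under the assumed bias model, this count is
    stochastically bounded by a Binomial(n_pairs, gamma / (1 + gamma)).
    """
    _, p_upper = pair_probability_bounds(gamma)
    return sum(
        comb(n_pairs, k) * p_upper**k * (1.0 - p_upper) ** (n_pairs - k)
        for k in range(n_positive, n_pairs + 1)
    )


# Illustrative numbers only: 8 of 10 pairs favor treatment.
p_randomized = worst_case_sign_test_pvalue(10, 8, gamma=1.0)
p_biased = worst_case_sign_test_pvalue(10, 8, gamma=2.0)
```

In this sketch, a quantity such as the RSV would enter as a lower bound on the Γ values worth reporting, since bias below that level is already suggested by the imperfectly matched observed covariates.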
A semiparametric approach to model-based sensitivity analysis in observational studies
When drawing causal inferences from observational data, there is almost always concern about unmeasured confounding. One way to tackle this is to conduct a sensitivity analysis. One widely used sensitivity analysis framework hypothesizes the existence of a scalar unmeasured confounder U and asks how the causal conclusion would change were U measured and included in the primary analysis. Work along this line often makes various parametric assumptions on U for the sake of mathematical and computational simplicity. In this article, we substantively further this line of research by developing a valid sensitivity analysis that leaves the distribution of U unrestricted. Our semiparametric estimator has three desirable features compared to many existing methods in the literature. First, our method allows for a larger and more flexible family of models, and mitigates observable implications (Franks et al., 2019). Second, our method works seamlessly with any primary analysis that models the outcome regression parametrically. Third, our method is easy to use and interpret. We construct both pointwise confidence intervals and confidence bands that are uniformly valid over a given sensitivity parameter space, thus formally accounting for unknown sensitivity parameters. We apply our proposed method to an influential yet controversial study of the causal relationship between war experiences and political activeness using observational data from Uganda.
Statistical matching and subclassification with a continuous dose: characterization, algorithm, and application to a health outcomes study
Subclassification and matching are often used to adjust for observed covariates in observational studies; however, they are largely restricted to relatively simple study designs with a binary treatment. One important exception is Lu et al. (2001), who considered optimal pair matching with a continuous treatment dose. In this article, we propose two criteria for optimal subclassification/full matching with a continuous treatment dose, both based on subclass homogeneity, and propose an efficient polynomial-time algorithm that is guaranteed to find an optimal subclassification with respect to one criterion and serves as a 2-approximation algorithm for the other. We discuss how to incorporate the treatment dose and use appropriate penalties to control the number of subclasses in the design. Via extensive simulations, we systematically examine the performance of our proposed method, and demonstrate that combining our proposed subclassification scheme with regression adjustment helps reduce model dependence for parametric causal inference with a continuous treatment dose. We illustrate the new design, and how to conduct randomization-based statistical inference under it, using Medicare and Medicaid claims data to study the effect of transesophageal echocardiography (TEE) during CABG surgery on patients' 30-day mortality rate.
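As a rough illustration of pair matching with a continuous dose, the sketch below finds the optimal pairing for a handful of units by brute force. It is not the polynomial-time subclassification algorithm described above. The single scalar covariate, the function name, and the cost (covariate distance divided by squared dose difference, which rewards pairs that are similar in covariates but far apart in dose) are assumptions made for the example.

```python
def optimal_dose_pair_matching(covariates, doses, eps=1e-6):
    """Brute-force optimal pair matching for a small, even number of units.

    Returns (total_cost, pairs). Pairs with identical doses are forbidden
    (infinite cost), since they carry no dose contrast. Exponential time:
    intended for illustration on tiny inputs only.
    """
    n = len(doses)
    if n % 2 != 0 or n != len(covariates):
        raise ValueError("need an even number of units with matching lengths")

    def cost(i, j):
        dose_gap = doses[i] - doses[j]
        if dose_gap == 0:
            return float("inf")
        # Hypothetical cost: small covariate gap and large dose gap are good.
        return (abs(covariates[i] - covariates[j]) + eps) / dose_gap**2

    def best(unmatched):
        if not unmatched:
            return 0.0, []
        first, rest = unmatched[0], unmatched[1:]
        best_cost, best_pairs = float("inf"), None
        # Pair the first unmatched unit with every remaining candidate.
        for idx, partner in enumerate(rest):
            sub_cost, sub_pairs = best(rest[:idx] + rest[idx + 1:])
            total = cost(first, partner) + sub_cost
            if total < best_cost:
                best_cost, best_pairs = total, [(first, partner)] + sub_pairs
        return best_cost, best_pairs

    return best(list(range(n)))


# Illustrative data: two covariate clusters, each with a low and a high dose.
total_cost, pairs = optimal_dose_pair_matching(
    covariates=[0.0, 0.1, 5.0, 5.1], doses=[1.0, 3.0, 1.0, 3.0]
)
```

For realistic sample sizes, a minimum-weight nonbipartite matching solver would replace the recursion; the brute-force version is kept here only because it is easy to verify by hand.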
Social distancing and COVID-19: Randomization inference for a structured dose-response relationship
Social distancing is widely acknowledged as an effective public health policy for combating the novel coronavirus. But extreme forms of social distancing like isolation and quarantine have costs, and it is not clear how much social distancing is needed to achieve public health effects. In this article, we develop a design-based framework to test the causal null hypothesis and make inferences about the dose-response relationship between reduction in social mobility and COVID-19-related public health outcomes. We first discuss how to embed observational data with a time-independent, continuous treatment dose into an approximate randomized experiment, and develop a randomization-based procedure that tests whether a structured dose-response relationship fits the data. We then generalize the design and testing procedure to accommodate a time-dependent treatment dose in a longitudinal setting. Finally, we apply the proposed design and testing procedures to investigate the effect of social distancing during the phased reopening in the United States on public health outcomes using data compiled from sources including Unacast, the United States Census Bureau, and the County Health Rankings and Roadmaps Program. We rejected a primary analysis null hypothesis stating that social distancing from April 27, 2020, to June 28, 2020, had no effect on the COVID-19-related death toll from June 29, 2020, to August 2, 2020 (p-value < 0.001), and found that a larger reduction in mobility was needed to prevent exponential growth in case numbers in non-rural counties than in rural counties.
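The randomization-based reasoning can be sketched for the simplest case: matched pairs with a time-independent dose and the sharp null hypothesis of no effect. This is a simplification for illustration, not the structured dose-response test or the longitudinal procedure described above. It assumes that, within each matched pair, the two observed doses are equally likely to have been assigned in either order; the function name and test statistic are hypothetical choices.

```python
from itertools import product


def paired_dose_randomization_pvalue(dose_diffs, outcome_diffs):
    """Exact one-sided randomization p-value under the sharp null of no effect.

    dose_diffs[i] and outcome_diffs[i] are within-pair differences
    (unit 1 minus unit 2). Under the null, outcomes are fixed and swapping
    the doses within a pair flips the sign of that pair's contribution to
    the statistic sum(dose_diff * outcome_diff). Enumerates all 2**n swaps,
    so it is only practical for a small number of pairs.
    """
    if len(dose_diffs) != len(outcome_diffs):
        raise ValueError("inputs must have the same length")
    contributions = [dz * dy for dz, dy in zip(dose_diffs, outcome_diffs)]
    observed = sum(contributions)

    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(contributions)):
        stat = sum(s * c for s, c in zip(signs, contributions))
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


# Illustrative numbers only: higher dose goes with higher outcome in all pairs.
p_value = paired_dose_randomization_pvalue([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

With many pairs, the full enumeration would typically be replaced by Monte Carlo sampling of sign flips or a large-sample approximation.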