Title: Does Regression Produce Representative Estimates of Causal Effects?
It is well known that, with an unrepresentative sample, the estimate
of a causal effect may fail to characterize how effects operate in the
population of interest. What is less well understood is that
conventional estimation practices for observational studies may
produce the same problem even with a representative sample.
Specifically, causal effect estimates from multiple regression
weight each unit's contribution differently. The ``effective
sample'' that regression uses to generate the causal effect estimate
may bear little resemblance to the population of interest. The effects
that multiple regression estimates may be nonrepresentative in the
same way as effects produced by quasi-experimental methods
such as instrumental variables, matching, or regression discontinuity
designs, implying there is no representativeness basis for preferring
multiple regression on representative samples over quasi-experimental
methods. We show how to estimate the implied ``multiple regression
weights'' for each unit, thus allowing researchers to visualize the
characteristics of the effective sample. Knowing the effective sample
is crucial, because it allows one to relate effect estimates to sample
characteristics. We then discuss alternative approaches that, under
certain conditions, recover representative average causal effects. The
requisite conditions cannot always be met.
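The idea of implied per-unit regression weights can be illustrated with a minimal sketch. It assumes a binary treatment D, covariates X, and a conditional mean of D given X that is linear in X (true below because the single covariate is binary, so the regression is saturated). By the Frisch-Waugh-Lovell theorem, the OLS coefficient on D equals sum(d_tilde * y) / sum(d_tilde**2), where d_tilde are residuals from regressing D on an intercept and X; the squared residuals then act as each unit's weight. The helper `regression_weights` and the simulated data are hypothetical and serve only as illustration; this is not presented as the authors' own code.

```python
import numpy as np


def regression_weights(d, X):
    """Hypothetical helper: residualize treatment d on [1, X] by OLS.

    Returns the residuals d_tilde and the implied per-unit weights d_tilde**2.
    """
    Z = np.column_stack([np.ones(len(d)), X])
    coef, *_ = np.linalg.lstsq(Z, d, rcond=None)
    d_tilde = d - Z @ coef
    return d_tilde, d_tilde ** 2


# Simulated data (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 5000
x = rng.binomial(1, 0.5, n).astype(float)   # binary covariate: two strata
p = np.where(x == 1, 0.9, 0.5)              # P(D=1 | x): 0.5 if x=0, 0.9 if x=1
d = rng.binomial(1, p).astype(float)        # binary treatment
tau = np.where(x == 1, 2.0, 0.0)            # unit effects: 2 in stratum x=1, 0 otherwise
y = 1.0 + tau * d + 0.5 * x + rng.normal(0.0, 1.0, n)

# Full OLS of y on [1, d, x]; the coefficient on d is the regression estimate.
W = np.column_stack([np.ones(n), d, x])
beta_full, *_ = np.linalg.lstsq(W, y, rcond=None)

# Frisch-Waugh-Lovell: the same coefficient from the residualized treatment.
d_tilde, w = regression_weights(d, x)
beta_fwl = (d_tilde @ y) / (d_tilde @ d_tilde)

sample_ate = tau.mean()                      # every unit weighted equally
weighted_effect = (w * tau).sum() / w.sum()  # unit effects averaged with regression weights

print(f"OLS coefficient on d:   {beta_full[1]:.3f}")
print(f"FWL coefficient:        {beta_fwl:.3f}")
print(f"unweighted mean effect: {sample_ate:.3f}")
print(f"weight-averaged effect: {weighted_effect:.3f}")

# Share of units versus share of total regression weight in each stratum.
for s in (0.0, 1.0):
    print(f"stratum x={int(s)}: {np.mean(x == s):.2f} of units, "
          f"{w[x == s].sum() / w.sum():.2f} of regression weight")
```

Under this design the coefficient on d converges to E[Var(D|X)·τ(X)] / E[Var(D|X)] = (0.5·0.09·2) / (0.5·0.25 + 0.5·0.09) ≈ 0.53, while the average effect across all units is about 1.0. Stratum x = 1 holds half the units but only about a quarter of the regression weight, because treatment varies less within it.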