(Changing things around and not discussing average treatment effects yet. As I begin to get settled in with this, hopefully I should get a better outline of the order in which I wish to proceed. For now, I’m sure I will change my mind a lot.)

Causal inference is not always possible. We can’t make causal claims about all varieties of effects. In the last post I briefly introduced the concept of potential outcomes. An important element of the very concept of potential outcomes is that, prior to the moment of treatment, it was possible for either outcome to be realized. If we can’t supply a counterfactual, we can’t really talk about causal effects.

Consider sex. Can we think about identification of a causal effect of sex on incomes? If we imagine “sex” to be our treatment, can we really conceptualize it as an intervention? Rubin argues that we cannot. Even a procedure which could change the sex of a child at birth would not be sufficient to identify the actual effect of sex. How could we tell whether the effect was due to the procedure or due to the sex of the child? Thus, it is necessary that, for any unit under study, we are able to imagine a scenario wherein the intervention could have been different. For this reason, Rubin suggests, we should conceive of treatment as a set of actions performed on an individual. Then we can unambiguously discuss the causal effects of those actions.

My final note before actually moving on to a discussion of averages as the basis of treatment effects concerns what I find to be one of the most compact and profound equations in causal inference.

Consider the joint probability density function of $(X, Y, W, M)$ with unknown parameter $\pi$. $X$ is a collection of all the pre–treatment variables. Let’s take an example from the study of conflict. We wish to measure the causal effect of democracy on conflict propensity. Then $X$ would consist of such variables as economic growth, colonial legacy, resources, trade flows and so forth. $Y$ is our outcome variable, which is, of course, measured post–treatment (but we are in the potential outcomes framework, so there exists a potential outcome for both the democracy and the non–democracy treatment). $W$ is an indicator which shows what treatment was received by the unit. Finally, $M$ is a matrix signifying the missingness of all the previous variables. We can think of the joint probability density function of these variables as describing the observed data on this subject. Rubin demonstrates a wonderful decomposition of this (writing $p$ for the joint density):

$$p(X, Y, W, M \vert \pi) = f(X, Y \vert \pi) \, k(W \vert X, Y, \pi) \, g(M \vert X, Y, W, \pi)$$

The first part of this expression is essentially what we care the most about. $f(X,Y \vert \pi)$ is the potentially observable data. In other words, the combination of covariates and potential outcomes is represented here. The rest of the expression is all about the design. $k(W \vert X,Y,\pi)$ is the probability distribution of the assignment to treatment. We need to understand this mechanism if we want to draw appropriate conclusions from our study. Likewise, the final element of the expression, $g(M \vert X,Y,W,\pi)$, is the assignment of missingness to the data. As with the treatment, we have to understand this mechanism if we want to be able to draw inferences about the underlying process of data generation.
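To make the three pieces concrete, here is a minimal sketch that simulates data by sampling each factor in turn. Everything specific in it is my own assumption for illustration, not something taken from Rubin: a single covariate, a constant treatment effect of 2, a logistic assignment rule that depends only on $X$, and missingness rates of 0.3 and 0.1 for treated and control units. The names `simulate`, `sigmoid`, and `y_obs` are likewise hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Draw n units by sampling the three factors in turn (all choices hypothetical)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        # f(X, Y | pi): a covariate and BOTH potential outcomes.
        x = rng.gauss(0.0, 1.0)
        y0 = x + rng.gauss(0.0, 1.0)   # potential outcome without treatment
        y1 = y0 + 2.0                  # potential outcome with treatment (assumed effect of 2)

        # k(W | X, Y, pi): assignment mechanism; here it depends on X only.
        w = 1 if rng.random() < sigmoid(x) else 0

        # g(M | X, Y, W, pi): missingness mechanism; here treated units are
        # assumed to go unrecorded more often than control units.
        m = 1 if rng.random() < (0.3 if w == 1 else 0.1) else 0

        # What the researcher sees: only the realized outcome, and only if recorded.
        realized = y1 if w == 1 else y0
        y_obs = None if m == 1 else realized
        units.append({"x": x, "y0": y0, "y1": y1, "w": w, "m": m, "y_obs": y_obs})
    return units


units = simulate(1000)
recorded = [u for u in units if u["m"] == 0]
print(f"{len(recorded)} of {len(units)} realized outcomes were recorded")
```

The simulation keeps `y0` and `y1` for every unit, which is a view no researcher ever has. In real data only `x`, `w`, and `y_obs` would be available, and what we can learn from them depends on how well we understand the two design steps in the middle.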

For observational studies, the researcher does not have control over the latter two portions. That’s why it is very appealing to design an experiment of sorts, rather than rely on collecting existing observational data. In so doing, you can gain a much better appreciation of exactly how the design portion of this distribution behaves. It makes things worlds easier. To bring things back to my example, we need to have a clear understanding of how it is that states are “assigned” to democracy (not exactly a straightforward task), as well as a good understanding of where there is data and where there isn’t. For example, autocratic states may try to censor details about the oppression of their citizenry. This oppression likely also has an effect on the democratizing process. Clearly, this is not an easily identified causal effect.
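A small simulation can illustrate why knowing the assignment mechanism matters so much. The sketch below is purely hypothetical: it assumes one covariate, a true effect of 2 for every unit, no missing data, and compares a coin-flip design against an assignment rule driven by $X$. The function name `naive_difference` and all numeric choices are mine.

```python
import math
import random


def naive_difference(n, randomized, seed=0):
    """Difference in mean observed outcomes, treated minus control."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y0 = x + rng.gauss(0.0, 1.0)
        y1 = y0 + 2.0                       # assumed true effect of 2 for every unit
        if randomized:
            p_treat = 0.5                   # known design: a fair coin
        else:
            p_treat = 1.0 / (1.0 + math.exp(-x))  # assignment driven by X
        if rng.random() < p_treat:
            treated.append(y1)              # only the realized outcome is observed
        else:
            control.append(y0)
    return sum(treated) / len(treated) - sum(control) / len(control)


experiment = naive_difference(20000, randomized=True)
observational = naive_difference(20000, randomized=False)
print(f"randomized assignment: {experiment:.2f}")
print(f"X-driven assignment:   {observational:.2f}")
```

Under the coin flip, the simple comparison of means should land close to the assumed effect. Under the $X$-driven rule it should overstate the effect, because units with high $X$ are both more likely to be treated and more likely to have high outcomes regardless of treatment. In the democracy example, the trouble is that we do not get to write down the assignment rule ourselves.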

Nevertheless, I really like this expression, as it neatly ties together many of the things that empirical researchers have to worry about: the data generating process, the assignment to treatment, and the missingness of the data. It’s a profound expression to keep in mind.

I think my next post will discuss the role of randomization and the use of averages to help overcome the fundamental problem of causal inference.