CEM via the Dialog Menu
To demonstrate the syntax of CEM in SPSS, you can download an example dataset from a job training program. The goal of the original study was to estimate the effect of a job training program (the variable treated) on real earnings in 1978 (the variable re78). Thus, we want to run CEM to reduce covariate imbalance between the treated and control groups defined by the treated variable. In the CEM dialog box, put the treated variable into the "Treatment" variable box and any matching variables into the "Matching Variables" box, then click "OK". This will call CEM and display the results in the SPSS output viewer.
CEM via SPSS Syntax
The syntax for the CEM command in SPSS is:
CEM TREATMENT=variable VARIABLES=variable list
[/NOCOARSENING VARIABLES=variable list]
[/CUTPOINTS cutlist] [/GROUPINGS grouplist]
[/OPTIONS [K2K] [NOIMBAL]]
Here is a concrete example using the above dataset:
CEM TREATMENT=treated VARIABLES=educ black nodegree married re74
Here, we use the TREATMENT keyword to indicate the treatment variable and the VARIABLES keyword to list the variables on which to match.
Matching Output
CEM will add the following variables to the active dataset:
- cem.strata: the stratum to which each unit was assigned by CEM
- cem.matched: whether or not each unit was matched
- cem.weights: weights for each unit based on the matching (to be passed to regressions, etc.)
Note that if these variables are already present in the data, CEM will overwrite them.
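As a quick check after matching, it can help to tabulate how many treated and control units were matched and to look at the range of the weights. The following is only a sketch using the standard CROSSTABS and DESCRIPTIVES commands; it assumes CEM has already been run on the example dataset, so that treated, cem.matched, and cem.weights exist in the active dataset, and that cem.weights is numeric.

```spss
* Count matched and unmatched units by treatment status.
CROSSTABS
  /TABLES=treated BY cem.matched
  /CELLS=COUNT.

* Summarize the matching weights.
DESCRIPTIVES VARIABLES=cem.weights
  /STATISTICS=MEAN MIN MAX.
```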
Post-matching Analysis
To find the estimated average treatment effect of the job training program, simply run a weighted least squares regression with re78 as the dependent variable and treated as the independent variable, with cem.weights as the weights. Here is the syntax:
REGRESSION
/MISSING LISTWISE
/REGWGT=cem.weights
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT re78
/METHOD=ENTER treated.
This, of course, can be accessed by the menus as well.
No Imbalance Calculations
On extremely large datasets, it can take quite some time to calculate the imbalance measures used by CEM. To perform matching without calculating the imbalance, use the NOIMBAL keyword on the /OPTIONS subcommand.
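For instance, the earlier example could be rerun without the imbalance calculations as follows. This is a sketch that simply combines that example command with the NOIMBAL keyword from the syntax chart; the variable list is carried over unchanged, and the closing period is the usual SPSS command terminator.

```spss
* Same matching call as the earlier example, skipping the imbalance measures.
CEM TREATMENT=treated VARIABLES=educ black nodegree married re74
  /OPTIONS NOIMBAL.
```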
K-to-K Matching
To run CEM with k-to-k matching, check the corresponding box in the dialog menu, or use the following syntax:
CEM TREATMENT=treated VARIABLES=education black nodegree married re74
/OPTIONS K2K
This will produce a matched sample that has equal numbers of treated and control units in each stratum.
User-Specified Cutpoints
You can use cutpoints other than those CEM generates automatically by adding each variable's cutpoints as a quoted list, such as "educ = (0,6,12,16,24)", in the "Cutpoints" box of the dialog menu or with the following syntax:
CEM TREATMENT=treated VARIABLES=education black nodegree married re74
/CUTPOINTS "educ = (0,6,12,16,24)"
Each additional set of custom cutpoints should be in a separate quoted statement:
CEM TREATMENT=treated VARIABLES=education black nodegree married age
/CUTPOINTS "educ = (0,6,12,16,24)" "age = (0,18,35,45,60,100)"
These cutpoints can also be added to the "Cutpoints" box in the CEM dialog menu.
Groupings for Categorical Variables
With categorical variables, the groups may not be ordered, or the variable itself might be a string. For these variables, you can group different responses into categories for the purposes of matching using the /GROUPING subcommand:
/GROUPING "var1 = [(value1, value2), (value3, value4)]" "var2 = [(value1), (value2, value3, value4)]"
Here, var1 and var2 are names of variables and value1, etc. are values that each variable takes. The GROUPING subcommand is a set of nested lists, with each grouping in parentheses ("( )") and the list of groupings in brackets ("[ ]"). The set of groupings for each variable should be quoted as above. Here is an example of groupings in action:
CEM TREATMENT=treated VARIABLES= age education black nodegree married re74 re75 hispanic u74 u75 q1
/GROUPING "q1 = [('strongly agree', 'agree'), ('neutral', 'no opinion'), ('strongly disagree', 'disagree')]"
Note that if the variable is string-based, the quoted values must use single quotes, not double quotes.
Exact Matching on Certain Variables
Occasionally you may wish to exact match on one or more of the variables in the data, and thus leave that variable uncoarsened. You can achieve this by using the /NOCOARSENING subcommand, including a list of variables on which to exact match:
CEM TREATMENT=treated VARIABLES= age education black nodegree married re74 re75 hispanic u74 u75 q1
/NOCOARSENING VARIABLES=age
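The syntax chart lists the subcommands as independent options, which suggests they can be combined in a single call. As an illustration only (this particular combination is an assumption and does not come from the original examples), the following sketch exact-matches on married, supplies custom cutpoints for age, and requests k-to-k matching:

```spss
* Hypothetical combined call; assumes CEM accepts these subcommands together.
CEM TREATMENT=treated VARIABLES=age black nodegree married re74
  /NOCOARSENING VARIABLES=married
  /CUTPOINTS "age = (0,18,35,45,60,100)"
  /OPTIONS K2K.
```

If the command rejects the combination, running the subcommands one at a time, as in the individual examples above, is the safer route.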
For a general discussion of CEM, its motivation, and its interpretation, please see the original R documentation.