How to use CEM for SPSS

CEM via the Dialog Menu

To demonstrate Coarsened Exact Matching (CEM) in SPSS, you can download an example dataset from a job training program. The goal of the original study was to estimate the effect of the program (the variable treated) on real earnings in 1978 (the variable re78). We therefore run CEM to reduce covariate imbalance between the treated and control groups. In the CEM dialog box, put the treated variable into the "Treatment" variable box and the matching variables into the "Matching Variables" box, then click "OK". This calls CEM and writes its output to the SPSS output viewer.

[Image: the CEM dialog box]

CEM via SPSS SYNTAX

The syntax for the CEM command in SPSS is:

CEM TREATMENT=variable VARIABLES=variable list
[/NOCOARSENING VARIABLES=variable list]
[/CUTPOINTS cutlist] [/GROUPINGS grouplist]
[/OPTIONS [K2K] [NOIMBAL]]

Here is a concrete example using the above dataset:

CEM TREATMENT=treated VARIABLES=education black nodegree married re74

Here, we use the TREATMENT key to indicate the treatment variable and the VARIABLES key to list the variables on which to match.

Matching Output

CEM will add the following variables to the active dataset:

cem.strata: the stratum to which each unit was assigned by CEM
cem.matched: whether or not each unit was matched
cem.weights: weights for each unit based on the matching (to be passed to regressions, etc.)

Note that if these variables are already present in the data, CEM will overwrite them.
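
To make the three output variables concrete, here is a minimal Python sketch (outside SPSS) of how matched flags and weights can be derived from stratum assignments. It assumes the weighting rule described in the R documentation for CEM: matched treated units get weight 1, matched controls in a stratum get (total matched controls / total matched treated) times (treated in stratum / controls in stratum), and unmatched units get 0. The function name and the toy data are hypothetical.

```python
from collections import Counter

def cem_weights(strata, treated):
    """Return (matched, weights) for per-unit stratum labels and 0/1 treatment flags."""
    n_t = Counter(s for s, t in zip(strata, treated) if t == 1)
    n_c = Counter(s for s, t in zip(strata, treated) if t == 0)
    # A stratum is matched only if it holds at least one treated and one control unit.
    matched_strata = {s for s in n_t if n_c[s] > 0}
    matched = [s in matched_strata for s in strata]
    m_t = sum(n_t[s] for s in matched_strata)  # matched treated units overall
    m_c = sum(n_c[s] for s in matched_strata)  # matched control units overall
    weights = []
    for s, t, is_matched in zip(strata, treated, matched):
        if not is_matched:
            weights.append(0.0)          # unmatched units drop out
        elif t == 1:
            weights.append(1.0)          # matched treated units count once
        else:
            weights.append((m_c / m_t) * (n_t[s] / n_c[s]))
    return matched, weights

# Hypothetical toy data: strata "c" and "d" contain only one group each.
strata = ["a", "a", "a", "b", "b", "c", "d"]
treated = [1, 0, 0, 1, 0, 1, 0]
matched, weights = cem_weights(strata, treated)
```

Under this rule the control weights sum to the number of matched controls, so the weighted control group is rebalanced toward the treated group's distribution across strata.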

Post-matching Analysis

To find the estimated average treatment effect of the job training program, simply run a weighted least squares regression with re78 as the dependent variable and treated as the independent variable, with cem.weights as the weights. Here is the syntax:

REGRESSION 
  /MISSING LISTWISE 
  /REGWGT=cem.weights 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT re78 
  /METHOD=ENTER treated.

The same regression can also be run from the menus (Analyze > Regression > Linear, with cem.weights in the "WLS Weight" box).
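
With a single binary regressor, the weighted least squares coefficient on treated equals the weighted mean of re78 among treated units minus the weighted mean among controls. The Python sketch below (outside SPSS) shows that calculation; the function names and the numbers are hypothetical and are not taken from the example dataset.

```python
def weighted_mean(values, weights):
    """Weighted average of values; weights are assumed non-negative with a positive sum."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_effect(outcome, treated, weights):
    """Weighted treated-minus-control difference in mean outcome."""
    y_t = [y for y, d in zip(outcome, treated) if d == 1]
    w_t = [w for w, d in zip(weights, treated) if d == 1]
    y_c = [y for y, d in zip(outcome, treated) if d == 0]
    w_c = [w for w, d in zip(weights, treated) if d == 0]
    return weighted_mean(y_t, w_t) - weighted_mean(y_c, w_c)

# Hypothetical outcomes and CEM-style weights (zero weight means unmatched).
outcome = [10.0, 4.0, 6.0, 8.0, 5.0, 20.0, 1.0]
treated = [1, 0, 0, 1, 0, 1, 0]
weights = [1.0, 0.75, 0.75, 1.0, 1.5, 0.0, 0.0]
effect = weighted_effect(outcome, treated, weights)
```

Units with zero weight contribute nothing, which is how unmatched observations are excluded from the estimate.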

No Imbalance Calculations

On extremely large datasets, it can take quite some time to calculate the imbalance measures used by CEM. To perform matching without calculating the imbalance, add the NOIMBAL keyword to the /OPTIONS subcommand.
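
For a sense of what is being skipped, here is a minimal Python sketch of one imbalance statistic. It assumes the measure is the multivariate L1 statistic described in the R documentation for CEM (half the summed absolute difference between treated and control cell frequencies); whether the SPSS version computes exactly this is an assumption. The function name and data are hypothetical.

```python
from collections import Counter

def l1_imbalance(cells, treated):
    """L1 distance between treated and control relative frequencies over cells.

    `cells` holds one hashable label per unit, for example a tuple of
    binned covariate values. Both groups are assumed to be non-empty.
    """
    n_t = sum(1 for t in treated if t == 1)
    n_c = len(treated) - n_t
    f = Counter(c for c, t in zip(cells, treated) if t == 1)  # treated counts
    g = Counter(c for c, t in zip(cells, treated) if t == 0)  # control counts
    return 0.5 * sum(abs(f[c] / n_t - g[c] / n_c) for c in set(f) | set(g))

# Hypothetical cells built from two binned covariates.
cells = [("hs", 1), ("hs", 1), ("col", 0), ("hs", 1), ("col", 0), ("col", 1)]
treated = [1, 1, 1, 0, 0, 0]
imbalance = l1_imbalance(cells, treated)
```

The statistic is 0 when the two groups have identical cell frequencies and 1 when they share no cells. Every unit has to be binned on every covariate, which is part of why the calculation grows expensive on large datasets.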

K-to-K Matching

To run CEM with k-to-k matching, check the corresponding box in the dialog menu, or use the following syntax:

CEM TREATMENT=treated VARIABLES=education black nodegree married re74
  /OPTIONS K2K

This will produce a matched sample that has equal numbers of treated and control units in each stratum.
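
The idea can be sketched in Python (outside SPSS) as pruning each stratum down to the size of its smaller group. How CEM chooses which units to drop is not stated here; the sketch assumes random selection, and the function name, seed, and data are hypothetical.

```python
import random
from collections import defaultdict

def k2k_prune(strata, treated, seed=0):
    """Return sorted indices of units kept after k-to-k pruning within strata."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(lambda: {0: [], 1: []})
    for i, (s, t) in enumerate(zip(strata, treated)):
        groups[s][t].append(i)
    kept = []
    for members in groups.values():
        # Keep k units from each group, where k is the smaller group's size.
        k = min(len(members[0]), len(members[1]))
        kept.extend(rng.sample(members[1], k))
        kept.extend(rng.sample(members[0], k))
    return sorted(kept)

# Hypothetical data: stratum "c" has no controls, so it is dropped entirely.
strata = ["a", "a", "a", "b", "b", "b", "c"]
treated = [1, 0, 0, 1, 1, 0, 1]
kept = k2k_prune(strata, treated)
```

Because every retained stratum ends up with the same number of treated and control units, the matched sample needs no reweighting across strata.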

User-Specified Cutpoints

You can override the cutpoints that CEM generates automatically by supplying each variable's cutpoints as a quoted list, such as "education = (0,6,12,16,24)", either in the "Cutpoints" box of the dialog menu or with the following syntax:

CEM TREATMENT=treated VARIABLES=education black nodegree married re74
  /CUTPOINTS "education = (0,6,12,16,24)"

Each additional set of custom cutpoints should be in a separate quoted statement:

CEM TREATMENT=treated VARIABLES=education black nodegree married age
  /CUTPOINTS "education = (0,6,12,16,24)" "age = (0,18,35,45,60,100)"

These cutpoints can also be added to the "Cutpoints" box in the CEM dialog menu.
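
As an illustration of what cutpoints do, the Python sketch below (outside SPSS) maps raw values to bin numbers using the cutpoints (0,6,12,16,24) from the example. Whether a value that falls exactly on a cutpoint belongs to the bin below or above is an assumption here (this sketch puts it in the bin above); the function name and sample values are hypothetical.

```python
from bisect import bisect_right

def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into.

    Bin i covers cutpoints[i-1] <= value < cutpoints[i]; values below the
    first cutpoint get 0 and values at or above the last get len(cutpoints).
    """
    return bisect_right(cutpoints, value)

cutpoints = [0, 6, 12, 16, 24]
years_of_schooling = [3, 10, 12, 14, 17]  # hypothetical raw values
bins = [coarsen(v, cutpoints) for v in years_of_schooling]
```

Units whose values land in the same bin are treated as identical on that variable during matching, so here the units with 12 and 14 years of schooling would match on this variable.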

Groupings for Categorical Variables

With categorical variables, the groups may not be ordered, or the variable itself might be a string. For these variables, you can group different responses into categories for the purposes of the matching using the /GROUPING subcommand:

/GROUPING "var1 = [(value1, value2), (value3, value4)]" "var2 = [(value1), (value2, value3, value4)]"

Here, var1 and var2 are names of variables and value1, etc. are values that each variable takes. The GROUPING subcommand is a set of nested lists, with each grouping in parentheses ("( )") and the list of groupings in brackets ("[ ]"). The set of the groupings for each variable should be quoted as above. Here is an example of groupings in action:

CEM TREATMENT=treated VARIABLES= age education black nodegree married re74 re75 hispanic u74 u75 q1
/GROUPING "q1 = [('strongly agree', 'agree'), ('neutral', 'no opinion'), ('strongly disagree', 'disagree')]"

Note that if the variable is string-based, the quoted values must use single quotes and not double quotes.
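
Conceptually, a grouping is a lookup from each raw response to the group that contains it. The Python sketch below (outside SPSS) builds that lookup for the q1 example. The function name and the sample responses are hypothetical, and how CEM treats values that appear in no group is not covered here (this sketch would raise a KeyError for them).

```python
def build_lookup(groups):
    """Map every value to the index of the group that lists it."""
    lookup = {}
    for label, group in enumerate(groups):
        for value in group:
            lookup[value] = label
    return lookup

# The groups from the q1 example, written as a list of tuples.
q1_groups = [
    ("strongly agree", "agree"),
    ("neutral", "no opinion"),
    ("strongly disagree", "disagree"),
]
lookup = build_lookup(q1_groups)

responses = ["agree", "neutral", "strongly agree", "disagree"]  # hypothetical data
coarsened = [lookup[r] for r in responses]
```

Responses that share a group index, such as "agree" and "strongly agree", count as equal on q1 when strata are formed.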

Exact Matching on Certain Variables

Occasionally you may wish to exact match on one or more of the variables in the data, and thus leave that variable uncoarsened. You can achieve this by using the /NOCOARSENING subcommand, including a list of variables on which to exact match:

CEM TREATMENT=treated VARIABLES= age education black nodegree married re74 re75 hispanic u74 u75 q1
/NOCOARSENING VARIABLES=age
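
The effect of leaving a variable uncoarsened can be sketched in Python (outside SPSS): the stratum key uses the raw value for exact-match variables and a bin number for everything else. The function name, cutpoints, and units below are hypothetical, and the bin-edge convention is an assumption.

```python
from bisect import bisect_right

def stratum_key(unit, cutpoints, exact):
    """Build a hashable stratum key for one unit (a dict of variable values)."""
    key = []
    for name in sorted(unit):
        if name in exact:
            key.append(unit[name])  # exact match: keep the raw value
        else:
            key.append(bisect_right(cutpoints[name], unit[name]))  # coarsened bin
    return tuple(key)

cutpoints = {"education": [0, 6, 12, 16, 24]}
exact = {"age"}

u1 = {"age": 25, "education": 10}
u2 = {"age": 25, "education": 11}  # same age, same education bin as u1
u3 = {"age": 26, "education": 10}  # one year older than u1

k1, k2, k3 = (stratum_key(u, cutpoints, exact) for u in (u1, u2, u3))
```

Here u1 and u2 share a stratum, while u3 is separated from them by a single year of age, which shows why exact matching on a fine-grained variable can leave many units unmatched.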

For a general discussion of CEM, its motivation, and its interpretation, please see the original R documentation.
