Understanding weight calculations in Stata’s psmatch2

UPDATE: Edwin Leuven was kind enough to comment on my post and has a much better explanation of the weight situation: http://leuven.economists.nl/psmatch2/2023/02/24/weights-psmatch2.html

If you’ve ever used psmatch2 in Stata, you know that it has one of the least useful help files ever created. I’ve always been frustrated by not understanding how the weights in psmatch2 are calculated, so I decided to sit down and figure it out.

After running psmatch2 in Stata, the program creates a variable called _weight. This indicates which observations are used in matching, and what weight they are given in the final estimation. But the weights can appear baffling at first.

Using the maternal smoking/birthweight example from the teffects manual, we can generate the following matching model:

* read in Cattaneo data
use http://www.stata-press.com/data/r13/cattaneo2, clear
* estimate the propensity for the mother to smoke
logit mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace
predict p
* sort dataset in random order, since psmatch2 results can depend on sort order
set seed 795
generate x = runiform()
sort x
* calculate caliper: .25 times the standard deviation of p reported by sum (.1180944)
sum p
disp .25*.1180944
* match using caliper, one match only, and no replacement
psmatch2 mbsmoke, out(bweight) pscore(p) neighbor(1) noreplacement caliper(.0295236)
tab _weight mbsmoke

 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
         1 |       859        859 |     1,718 
-----------+----------------------+----------
     Total |       859        859 |     1,718

Here, each treated unit is given a weight of 1. Because we matched with no replacement and only one neighbor, we ended up with 859 unique controls, and these also have a weight of 1. Makes sense to me.

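Now suppose we allow up to five matches per treated unit. Dropping noreplacement also means controls are matched with replacement, so the same control can serve several treated units: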

* match using caliper, allowing up to 5 matches per treated unit (with replacement)
psmatch2 mbsmoke, out(bweight) pscore(p) neighbor(5) caliper(.0295236)
tab _weight mbsmoke

 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
        .2 |     1,014          0 |     1,014 
  .3333333 |         1          0 |         1 
        .4 |       465          0 |       465 
  .5333333 |         1          0 |         1 
        .6 |       267          0 |       267 
  .7333333 |         1          0 |         1 
        .8 |       143          0 |       143 
       .85 |         1          0 |         1 
         1 |        62        864 |       926 
       1.2 |        48          0 |        48 
      1.25 |         3          0 |         3 
       1.4 |        22          0 |        22 
       1.6 |        11          0 |        11 
       1.8 |         6          0 |         6 
       2.2 |         6          0 |         6 
       2.4 |         1          0 |         1 
-----------+----------------------+----------
     Total |     2,052        864 |     2,916

The weights for the treated make intuitive sense: they are all 1. But what about the controls?

The first thing to notice is that if you multiply each control weight by the number of controls with that weight (e.g., .2 × 1,014) and sum these products, the result is 864, exactly the number of treated units.
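A quick way to check this arithmetic outside Stata is the short Python sketch below. The counts are copied from the table above; the one assumption is that the displayed values .3333333, .5333333, .7333333 and .85 are exactly 1/3, 8/15, 11/15 and 17/20, so all weights are entered as exact fractions.

```python
from fractions import Fraction as F

# Control-group weights (keys) and number of controls with that weight (values),
# copied from the tab _weight mbsmoke output above. Repeating decimals are
# assumed to be exact fractions (e.g. .3333333 -> 1/3).
control_weights = {
    F(1, 5): 1014,
    F(1, 3): 1,
    F(2, 5): 465,
    F(8, 15): 1,
    F(3, 5): 267,
    F(11, 15): 1,
    F(4, 5): 143,
    F(17, 20): 1,
    F(1): 62,
    F(6, 5): 48,
    F(5, 4): 3,
    F(7, 5): 22,
    F(8, 5): 11,
    F(9, 5): 6,
    F(11, 5): 6,
    F(12, 5): 1,
}

# Sum of weight x count over all control rows.
total = sum(w * n for w, n in control_weights.items())
print(total)                          # 864
print(sum(control_weights.values()))  # 2052 controls in total
```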

This also makes sense. Each treated unit's matched controls share a total weight of 1, so the control weights sum to the number of matched treated units. Because this example has many more controls than treated units, many treated units find several matches within the caliper. Suppose that all treated units with propensities greater than or equal to .25 had single matches, and all treated units with propensities less than .25 had 5 matches. If we did not take the multiple matches at the low end of the propensity distribution into account, the estimated treatment effect would be dominated by what was going on there, simply because so many more matched controls come from that region. Basically, we need to adjust for multiple matches by weighting down control units when there is more than one match per treated unit.
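To make the down-weighting argument concrete, here is a toy Python sketch with made-up birthweights (not from the Cattaneo data): treated unit A has one matched control and treated unit B has five. Pooling all matched controls equally lets B's region supply 5/6 of the weight, while giving each control a weight of 1/k, where k is the size of its treated unit's match set, lets each treated unit count once.

```python
# Hypothetical example: control outcomes matched to two treated units.
matches = {
    "A": [3000.0],                                  # A has 1 matched control
    "B": [3400.0, 3400.0, 3400.0, 3400.0, 3400.0],  # B has 5 matched controls
}

# Unweighted: every matched control counts equally, so B dominates.
pooled = [y for ys in matches.values() for y in ys]
unweighted_mean = sum(pooled) / len(pooled)

# Weighted: each control gets 1/k, k = size of its treated unit's match set,
# so A and B each contribute a total weight of 1.
weighted_sum = sum(y / len(ys) for ys in matches.values() for y in ys)
total_weight = sum(1 / len(ys) for ys in matches.values() for _ in ys)
weighted_mean = weighted_sum / total_weight

print(round(unweighted_mean, 2))  # 3333.33
print(round(weighted_mean, 2))    # 3200.0
```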

The weights become clearer if we multiply them by 5, the maximum number of matches per treated unit (e.g., replace _weight = 5*_weight; the list commands below use these rescaled values):

 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
         1 |     1,014          0 |     1,014 
  1.666667 |         1          0 |         1 
         2 |       465          0 |       465 
  2.666667 |         1          0 |         1 
         3 |       267          0 |       267 
  3.666667 |         1          0 |         1 
         4 |       143          0 |       143 
      4.25 |         1          0 |         1 
         5 |        62        864 |       926 
         6 |        48          0 |        48 
      6.25 |         3          0 |         3 
         7 |        22          0 |        22 
         8 |        11          0 |        11 
         9 |         6          0 |         6 
        11 |         6          0 |         6 
        12 |         1          0 |         1 
-----------+----------------------+----------
     Total |     2,052        864 |     2,916

Now most of the weights are whole numbers, and they reflect the number of times a unit was matched. For example, 1,014 controls were matched once, 62 were matched 5 times, and one control unit was matched 12 times. The following code lists this unit (_id=3756) and the treated units it was matched to:

* list the control unit with a (rescaled) weight of 12
list if _weight==12
* flag the treated units whose match lists include this control
local id = 3756
capture drop flag
generate flag = 0
forvalues k = 1/5 {
    replace flag = 1 if _n`k' == `id'
}
list _id mbsmoke _n1 _n2 _n3 _n4 _n5 if flag==1

      +---------------------------------------------------+
      |  _id   mbsmoke    _n1    _n2    _n3    _n4    _n5 |
      |---------------------------------------------------|
 503. | 4621    smoker   3756   3755   3757   3758   3754 |
 577. | 4626    smoker   3758   3757   3759   3760   3756 |
1258. | 4617    smoker   3755   3754   3756   3753   3752 |
1439. | 4623    smoker   3757   3758   3756   3755   3759 |
1595. | 4620    smoker   3756   3755   3757   3758   3754 |
      |---------------------------------------------------|
1834. | 4625    smoker   3757   3758   3759   3756   3760 |
2423. | 4619    smoker   3756   3755   3757   3758   3754 |
2481. | 4627    smoker   3758   3757   3759   3760   3756 |
2631. | 4618    smoker   3755   3756   3757   3758   3754 |
2739. | 4616    smoker   3754   3755   3756   3753   3752 |
      |---------------------------------------------------|
3257. | 4624    smoker   3757   3758   3759   3756   3755 |
3599. | 4622    smoker   3757   3756   3758   3755   3759 |
      +---------------------------------------------------+

The _n1 through _n5 variables are created by psmatch2 and hold the _id values of each treated unit's first through fifth matches. As the listing shows, 3756 is matched to 12 different treated units, each of which has a full set of 5 matches. So psmatch2 seems to normalize the control weights by dividing by 5: 12/5 = 2.4, the largest weight in the first table.
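The same count can be reproduced from the listing above. This Python sketch simply copies the _n1 to _n5 values of the 12 treated units and counts how often 3756 appears; dividing by 5 recovers the 2.4 in the first table.

```python
# _n1 ... _n5 for the treated units listed above (keys are treated _id values).
rows = {
    4621: [3756, 3755, 3757, 3758, 3754],
    4626: [3758, 3757, 3759, 3760, 3756],
    4617: [3755, 3754, 3756, 3753, 3752],
    4623: [3757, 3758, 3756, 3755, 3759],
    4620: [3756, 3755, 3757, 3758, 3754],
    4625: [3757, 3758, 3759, 3756, 3760],
    4619: [3756, 3755, 3757, 3758, 3754],
    4627: [3758, 3757, 3759, 3760, 3756],
    4618: [3755, 3756, 3757, 3758, 3754],
    4616: [3754, 3755, 3756, 3753, 3752],
    4624: [3757, 3758, 3759, 3756, 3755],
    4622: [3757, 3756, 3758, 3755, 3759],
}

# Count the treated units whose match list contains control 3756.
times_matched = sum(1 for ids in rows.values() if 3756 in ids)
print(times_matched)      # 12
print(times_matched / 5)  # 2.4, psmatch2's _weight for this control
```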

What about units with odd weights like 1.666667? There is one such unit, _id=3777, and it is matched only once:

* list the control unit with a (rescaled) weight between 1 and 2
list if _weight>1 & _weight<2
* flag the treated units whose match lists include this control
local id = 3777
capture drop flag
generate flag = 0
forvalues k = 1/5 {
    replace flag = 1 if _n`k' == `id'
}
list _id mbsmoke _n1 _n2 _n3 _n4 _n5 if flag==1

      +-------------------------------------------------+
      |  _id   mbsmoke    _n1    _n2    _n3   _n4   _n5 |
      |-------------------------------------------------|
 623. | 4642    smoker   3777   3776   3775     .     . |
      +-------------------------------------------------+

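Notice that, unlike the treated units in the previous example, unit 4642 has only 3 matches (presumably because only 3 controls fell within the caliper). So the second adjustment psmatch2 makes is to scale a control unit's weight by the size of the treated unit's actual match set, that is, by how many controls (including itself) were matched to that treated unit.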

So the rescaled weight for 3777 is 5/3, or about 1.67. On psmatch2's original scale this is 1/3 = .3333333, which appears in the first table.
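In Python, the arithmetic for unit 3777 looks like this (the match set is copied from the listing above; the 5 is the neighbor(5) setting):

```python
match_set_4642 = [3777, 3776, 3775]  # _n4 and _n5 are missing
max_matches = 5                      # neighbor(5)

rescaled = max_matches / len(match_set_4642)  # weight on the "times 5" scale
original = 1 / len(match_set_4642)            # psmatch2's own _weight scale

print(round(rescaled, 6))  # 1.666667
print(round(original, 7))  # 0.3333333
```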

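The general formula seems to be: for each treated unit a control is matched to, take (size of the possible match set)/(size of the actual match set), and sum these terms over all such treated units. Dividing by 5 to return to psmatch2's scale, each term is simply 1/(size of the actual match set).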
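Here is a minimal Python sketch of that formula, assuming nearest-neighbor matching with a caliper and replacement as in this example. matching_weights is a hypothetical helper that reproduces the pattern worked out above; it is not psmatch2's own code, and I have not checked it against other psmatch2 options. Treated units with no matches are skipped.

```python
from collections import defaultdict


def matching_weights(match_sets, max_matches=None):
    """Reconstruct control weights from treated units' match lists.

    match_sets: dict mapping a treated _id to the list of control _ids it was
    matched to (empty _n slots left out). Each treated unit spreads a total
    weight of 1 evenly over its actual matches, so every term is
    1 / (size of actual match set). If max_matches is given, the result is
    multiplied by it (the "multiply by 5" view).
    """
    weights = defaultdict(float)
    for controls in match_sets.values():
        if not controls:  # treated unit with no match inside the caliper
            continue
        for c in controls:
            weights[c] += 1 / len(controls)
    scale = 1 if max_matches is None else max_matches
    return {c: w * scale for c, w in weights.items()}


# Hypothetical match lists for three treated units (not taken from the data).
example = {
    101: [1, 2, 3, 4, 5],  # full match set
    102: [1, 2, 3],        # only 3 controls within the caliper
    103: [5],              # only 1 control within the caliper
}
w = matching_weights(example)
w_rescaled = matching_weights(example, max_matches=5)
print(round(w[1], 4), round(w_rescaled[1], 4))  # 0.5333 2.6667
print(round(sum(w.values()), 4))                # 3.0, one per treated unit
```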

Consider unit 3765, which has a weight of 6.25:

* list the control unit with a (rescaled) weight of 6.25
list if _weight==6.25
* flag the treated units whose match lists include this control
local id = 3765
capture drop flag
generate flag = 0
forvalues k = 1/5 {
    replace flag = 1 if _n`k' == `id'
}
list _id mbsmoke _n1 _n2 _n3 _n4 _n5 if flag==1

      +---------------------------------------------------+
      |  _id   mbsmoke    _n1    _n2    _n3    _n4    _n5 |
      |---------------------------------------------------|
  52. | 4635    smoker   3767   3768   3766   3765      . |
 247. | 4630    smoker   3765   3766   3767   3764   3763 |
2150. | 4634    smoker   3767   3766   3765   3768   3764 |
2790. | 4632    smoker   3766   3767   3765   3764   3768 |
3360. | 4631    smoker   3765   3766   3767   3764   3763 |
      |---------------------------------------------------|
3856. | 4633    smoker   3767   3766   3765   3764   3768 |
      +---------------------------------------------------+

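Treated unit 4635 has only 4 matches, so it contributes 5/4 = 1.25. The other five treated units each have a full set of 5 matches and contribute 5/5 = 1 apiece, so the sum is 1.25 + 5 = 6.25 (or 1.25 on psmatch2's original scale).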
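Applying the same rule in Python to the six treated units listed above (match sets copied from the output; the missing _n5 for unit 4635 is left out):

```python
rows_3765 = {
    4635: [3767, 3768, 3766, 3765],  # only 4 matches
    4630: [3765, 3766, 3767, 3764, 3763],
    4634: [3767, 3766, 3765, 3768, 3764],
    4632: [3766, 3767, 3765, 3764, 3768],
    4631: [3765, 3766, 3767, 3764, 3763],
    4633: [3767, 3766, 3765, 3764, 3768],
}

# Sum 5 / (actual match set size) over every treated unit that matched 3765.
rescaled = sum(5 / len(ids) for ids in rows_3765.values() if 3765 in ids)
print(rescaled)      # 6.25
print(rescaled / 5)  # 1.25 on psmatch2's scale
```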

I’m not sure how the weights work for any other type of matching procedure in psmatch2, and having spent a couple of hours goofing around with Stata, I have no desire to find out.

But this reminds me of an email exchange with someone I consider an international expert on matching. When queried about a particular aspect of psmatch2, this person replied:

I can never figure out what psmatch2 is doing, and that’s actually one of the reasons I don’t use it much and don’t tend to recommend it. The documentation isn’t very clear and there are a number of things where I think it should be doing something but then it’s not clear that it is.

Yikes! Not exactly a compelling endorsement. This is why I used to recommend that my students avoid psmatch2: how can you justify using a software procedure if you have no idea what it is doing, especially when so many other procedures are available for Stata, as well as MatchIt in R? But after struggling with the teffects command in Stata, I now recommend psmatch2 due to its ease of use.

IMPORTANT NOTE: You should understand that these weights are not inverse propensity weights. Inverse propensity weights are computed directly from the estimated propensity scores and reflect each unit's probability of receiving the treatment it actually received. The weights in psmatch2 are a by-product of the matching process: they reflect how many times a unit was matched, adjusted for the size of each match set.
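For contrast, a minimal Python sketch of textbook inverse propensity weights for the average treatment effect: 1/p for a treated unit and 1/(1 - p) for a control, where p is the estimated propensity score. ate_ipw and the propensity values are hypothetical, and psmatch2 does not compute these; the point is that they depend only on p, not on who was matched to whom.

```python
def ate_ipw(treated, p):
    """Textbook ATE inverse propensity weight for one unit.

    treated: True for a treated unit, False for a control.
    p: estimated propensity score, 0 < p < 1.
    """
    return 1 / p if treated else 1 / (1 - p)


# Hypothetical propensity scores; the weight is a function of p alone.
for treated, p in [(True, 0.2), (True, 0.5), (False, 0.2), (False, 0.5)]:
    print("treated" if treated else "control", p, round(ate_ipw(treated, p), 3))
```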

By Stephen

About me

Professor and quant guy. Libertarian turned populist Republican. Trying to learn Japanese and play Spanish Baroque music on the ukulele.
