UPDATE: Edwin Leuven was kind enough to comment on my post and has a much better explanation of the weight situation: http://leuven.economists.nl/psmatch2/2023/02/24/weights-psmatch2.html
If you’ve ever used psmatch2 in Stata, you know that it has one of the least useful help files ever created. I’ve always been frustrated with not understanding how the weights in psmatch2 are calculated, so I decided to sit down and figure it out.
After running psmatch2 in Stata, the program creates a variable called _weight. This indicates which observations are used in matching, and what weight they are given in the final estimation. But the weights can appear baffling at first.
Using the maternal smoking/birthweight example from the teffects manual, we can generate the following matching model:
* read in cattaneo data
use http://www.stata-press.com/data/r13/cattaneo2, clear

* create propensity for mother to smoke
logit mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace
predict p

* sort dataset in random order
set seed 795
generate x=uniform()
sort x

* calculate caliper (.25 times the standard deviation of p reported by sum)
sum p
disp .25*.1180944

* match using caliper, one match only, and no replacements
psmatch2 mbsmoke, out(bweight) pscore(p) neighbor(1) noreplacement caliper(.0295236)
tab _weight mbsmoke

 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
         1 |       859        859 |     1,718
-----------+----------------------+----------
     Total |       859        859 |     1,718
Here, each treated unit is given a weight of 1. Because we matched with no replacement and only one neighbor, we ended up with 859 unique controls, and these also have a weight of 1. Makes sense to me.
Suppose we allow multiple matches per treated:
* match using caliper and allowing up to 5 matches per treated
psmatch2 mbsmoke, out(bweight) pscore(p) neighbor(5) caliper(.0295236)
tab _weight mbsmoke

 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
        .2 |     1,014          0 |     1,014
  .3333333 |         1          0 |         1
        .4 |       465          0 |       465
  .5333333 |         1          0 |         1
        .6 |       267          0 |       267
  .7333333 |         1          0 |         1
        .8 |       143          0 |       143
       .85 |         1          0 |         1
         1 |        62        864 |       926
       1.2 |        48          0 |        48
      1.25 |         3          0 |         3
       1.4 |        22          0 |        22
       1.6 |        11          0 |        11
       1.8 |         6          0 |         6
       2.2 |         6          0 |         6
       2.4 |         1          0 |         1
-----------+----------------------+----------
     Total |     2,052        864 |     2,916
The weights for the treated make intuitive sense: they are all 1. But what about the controls?
The first thing to notice is that if you multiply each weight by the number of controls that carry it (e.g., .2*1,014) and sum these products, the result is 864, which is the number of treated units.
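The arithmetic can be checked without rerunning the match. The sketch below simply types in the control column of the table above; the variable names `weight`, `n`, and `contribution` are made up for illustration and are not created by psmatch2.

```stata
* control column of the neighbor(5) table, typed in by hand
clear
input double weight n
.2        1014
.3333333     1
.4         465
.5333333     1
.6         267
.7333333     1
.8         143
.85          1
1           62
1.2         48
1.25         3
1.4         22
1.6         11
1.8          6
2.2          6
2.4          1
end

* each weight times the number of controls that carry it
generate double contribution = weight * n

* total weight given to controls (the table's weights are rounded,
* so expect a value that is only approximately a whole number)
quietly summarize contribution
display %9.2f r(sum)

* number of matched controls
quietly summarize n
display r(sum)
```

On the live dataset, the same total should be available directly from `summarize _weight if mbsmoke==0` followed by `display r(sum)`.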
This also makes sense, because in this example we have many more controls than treated. Suppose that all treated units with propensities greater than or equal to .25 had single matches, and all treated units with propensities less than .25 had 5 matches. If we did not take the multiple matches at the low end of the propensity distribution into account, the estimated treatment effect would be dominated by what was going on at the low end of the distribution, due to all of the multiple matches there. Basically, we need to adjust for the multiple matches by weighting down control units when a treated unit has more than 1 match.
The weights become clearer if we multiply them by 5 (the largest possible number of matches per treated unit):
 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
         1 |     1,014          0 |     1,014
  1.666667 |         1          0 |         1
         2 |       465          0 |       465
  2.666667 |         1          0 |         1
         3 |       267          0 |       267
  3.666667 |         1          0 |         1
         4 |       143          0 |       143
      4.25 |         1          0 |         1
         5 |        62        864 |       926
         6 |        48          0 |        48
      6.25 |         3          0 |         3
         7 |        22          0 |        22
         8 |        11          0 |        11
         9 |         6          0 |         6
        11 |         6          0 |         6
        12 |         1          0 |         1
-----------+----------------------+----------
     Total |     2,052        864 |     2,916
Now most of the weights are whole numbers. They reflect the number of times a unit was matched. For example, 1,014 controls were matched once, 62 were matched 5 times, and one control unit was matched 12 times. This unit (_id=3756) and where it was matched can be seen with the following code:
* assumes _weight has already been multiplied by 5, as in the table above
list if _weight==12
gen idnumber=3756
gen flag=1 if _n1==idnumber
replace flag=1 if _n2==idnumber
replace flag=1 if _n3==idnumber
replace flag=1 if _n4==idnumber
replace flag=1 if _n5==idnumber
list _id mbsmoke _n1 _n2 _n3 _n4 _n5 if flag==1

       +---------------------------------------------------+
       |  _id   mbsmoke    _n1    _n2    _n3    _n4    _n5 |
       |---------------------------------------------------|
  503. | 4621    smoker   3756   3755   3757   3758   3754 |
  577. | 4626    smoker   3758   3757   3759   3760   3756 |
 1258. | 4617    smoker   3755   3754   3756   3753   3752 |
 1439. | 4623    smoker   3757   3758   3756   3755   3759 |
 1595. | 4620    smoker   3756   3755   3757   3758   3754 |
       |---------------------------------------------------|
 1834. | 4625    smoker   3757   3758   3759   3756   3760 |
 2423. | 4619    smoker   3756   3755   3757   3758   3754 |
 2481. | 4627    smoker   3758   3757   3759   3760   3756 |
 2631. | 4618    smoker   3755   3756   3757   3758   3754 |
 2739. | 4616    smoker   3754   3755   3756   3753   3752 |
       |---------------------------------------------------|
 3257. | 4624    smoker   3757   3758   3759   3756   3755 |
 3599. | 4622    smoker   3757   3756   3758   3755   3759 |
       +---------------------------------------------------+
The _n1 through _n5 variables are created by psmatch2, and list the _id of the matches for each unit. As can be seen, 3756 is matched to 12 different treated units. So psmatch2 seems to normalize the weights for the controls by dividing by 5.
What about the units with odd weights like 1.666667? This is unit _id=3777, and it is only matched once:
list if _weight>1 & _weight<2
* idnumber and flag already exist from the previous example, so drop them first
capture drop idnumber flag
gen idnumber=3777
gen flag=1 if _n1==idnumber
replace flag=1 if _n2==idnumber
replace flag=1 if _n3==idnumber
replace flag=1 if _n4==idnumber
replace flag=1 if _n5==idnumber
list _id mbsmoke _n1 _n2 _n3 _n4 _n5 if flag==1

       +-------------------------------------------------+
       |  _id   mbsmoke    _n1    _n2    _n3   _n4   _n5 |
       |-------------------------------------------------|
  623. | 4642    smoker   3777   3776   3775     .     . |
       +-------------------------------------------------+
Notice that unlike the previous example, unit 4642 has only 3 matches. So the second adjustment that psmatch2 makes is to scale a control unit's weight according to how many controls in total were matched to the same treated unit.
So the weight for 3777 is calculated as (5/3), or 1.67.
The general formula seems to be (size of possible match set)/(size of actual match set), summed over every treated unit to which a control unit is matched.
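Written out, the rule looks like the following. This is only a restatement of the pattern inferred from the examples in this post, not something taken from the psmatch2 documentation, and the symbols are my own: K is the number of neighbors requested (5 here), T(j) is the set of treated units that have control j among their matches, and m_i is the number of controls actually matched to treated unit i.

```latex
% rescaled weight (the version multiplied by K) for control j
K \, w_j \;=\; \sum_{i \in T(j)} \frac{K}{m_i}

% dividing by K gives the weight as stored in _weight
w_j \;=\; \sum_{i \in T(j)} \frac{1}{m_i}
```

For unit 3777, T(j) contains only treated unit 4642 with m_i = 3, so the rescaled weight is 5/3 and the stored weight is 1/3, matching the .3333333 row in the original table. For unit 3756, twelve treated units each contribute 5/5, giving 12 (or 2.4 as stored).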
Consider unit 3765, which has a weight of 6.25:
list if _weight==6.25
* idnumber and flag already exist from the previous example, so drop them first
capture drop idnumber flag
gen idnumber=3765
gen flag=1 if _n1==idnumber
replace flag=1 if _n2==idnumber
replace flag=1 if _n3==idnumber
replace flag=1 if _n4==idnumber
replace flag=1 if _n5==idnumber
list _id mbsmoke _n1 _n2 _n3 _n4 _n5 if flag==1

       +---------------------------------------------------+
       |  _id   mbsmoke    _n1    _n2    _n3    _n4    _n5 |
       |---------------------------------------------------|
   52. | 4635    smoker   3767   3768   3766   3765      . |
  247. | 4630    smoker   3765   3766   3767   3764   3763 |
 2150. | 4634    smoker   3767   3766   3765   3768   3764 |
 2790. | 4632    smoker   3766   3767   3765   3764   3768 |
 3360. | 4631    smoker   3765   3766   3767   3764   3763 |
       |---------------------------------------------------|
 3856. | 4633    smoker   3767   3766   3765   3764   3768 |
       +---------------------------------------------------+
For its first match, we divide 5/4 to yield 1.25. The next five matches are 5/5, so that the resulting sum is 6.25.
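The same sum can be reproduced mechanically. The sketch below types in the six treated units from the listing above and rebuilds the weight for control 3765; the variable names (`treated_id`, `m1` to `m5`, `nmatch`, `slot`, `contribution`, `weight_x5`, `weight`) are hypothetical stand-ins, not psmatch2 output. Because only these six treated units are included, the result is complete only for 3765; the other controls in this toy dataset are also matched to treated units that are not listed here.

```stata
* the six treated units matched to control 3765, typed in by hand
clear
input treated_id m1 m2 m3 m4 m5
4635 3767 3768 3766 3765 .
4630 3765 3766 3767 3764 3763
4634 3767 3766 3765 3768 3764
4632 3766 3767 3765 3764 3768
4631 3765 3766 3767 3764 3763
4633 3767 3766 3765 3764 3768
end

* number of controls actually matched to each treated unit
egen nmatch = rownonmiss(m1-m5)

* one row per (treated unit, matched control) pair
reshape long m, i(treated_id) j(slot)
drop if missing(m)

* each pair contributes (requested matches)/(actual matches)
local K = 5
generate double contribution = `K' / nmatch

* add up the contributions for each control, then undo the rescaling
collapse (sum) weight_x5 = contribution, by(m)
generate double weight = weight_x5 / `K'

list m weight_x5 weight if m == 3765
```

To run the same check on the full matched dataset, the typed-in rows would be replaced by the treated observations and their _n1 through _n5 variables, renamed first because `_n` is reserved in Stata and cannot serve as a reshape stub.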
I’m not sure how the weights work for any other type of matching procedure in psmatch2, and having spent a couple of hours goofing around with Stata, I have no desire to find out.
But this reminds me of an email exchange with someone I consider to be an international expert on matching. When queried about a particular aspect of psmatch2, this person replied
I can never figure out what psmatch2 is doing, and that’s actually one of the reasons I don’t use it much and don’t tend to recommend it. The documentation isn’t very clear and there are a number of things where I think it should be doing something but then it’s not clear that it is.
Yikes! Not exactly a compelling endorsement. This is why I used to recommend that my students avoid psmatch2: how can you justify using a software procedure if you have no idea what it is doing? That is especially true given how many other procedures are available for Stata, as well as matchit in R. But after struggling with the teffects command in Stata, I now recommend psmatch2 due to its ease of use.
IMPORTANT NOTE: You should understand that these weights are not inverse propensity weights. Inverse propensity weights are derived directly from the propensities, and reflect the probability of treatment. The weights in psmatch2 are the result of the matching process, and reflect how many times a control was matched and how many other controls shared each of those matches.
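For contrast, here is the usual textbook form of inverse propensity weights for the average treatment effect. The notation is an assumption of mine for illustration: T_i is the treatment indicator (1 if the mother smoked) and \hat{p}_i is the estimated propensity.

```latex
% inverse propensity weight for unit i (average treatment effect version)
w_i^{\mathrm{IPW}} \;=\; \frac{T_i}{\hat{p}_i} \;+\; \frac{1 - T_i}{1 - \hat{p}_i}
```

Nothing in this expression depends on which units were matched to which, whereas the psmatch2 weights described above depend only on the match sets and never use the value of the propensity directly.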