Tutorial: Propensity Score Weighting & Matching

Propensity-score-based metrics

Propensity-score-based method

In this section, we will introduce a propensity-score-based method to assess the generalizability of clinical study. We propose the use of propensity-score-based metrics to quantify the similarity of the participants in a designed clinical trial and a target population.

In Rubin Causal model, propensity score is defined as \(e(X)=Pr(Z=1|X)\), where \(Z=z,z\in{0,1}\) is an indicator variable denoting subject’s participation of trial, \(X = (X_1,X_2,…,X_n)\) is a vector of covariates.

Propensity-score difference

In [Elizabeth A. Stuart, 2010], the author introduces a propensity-score-based metric as:

\[
\Delta e(X) = \frac{1}{\sum I(Z_i=1)}\sum I(Z_i=1) e(X_i)-\frac{1}{\sum I(Z_i=0)}\sum I(Z_i=0)(1- e(X_i))
\]

However, we can find that this method has obvious shortcoming. Here we assume there are two trials. Two trials’ propensity score distributions are:

mu1 <- 0.4
mu2 <- 0.5
sigma1 <- 0.05
sigma2 <- 0.01

x <- seq(0,1,0.01)
f1 <- 1/sqrt(2*pi*sigma1^2)*exp(-(x-mu1)^2/(2*sigma1^2))
f2 <- 1/sqrt(2*pi*sigma1^2)*exp(-(x-mu2)^2/(2*sigma1^2))
f3 <- 1/sqrt(2*pi*sigma2^2)*exp(-(x-mu1)^2/(2*sigma2^2))
f4 <- 1/sqrt(2*pi*sigma2^2)*exp(-(x-mu2)^2/(2*sigma2^2))

matplot(x, cbind(f1,f2), type='l', main='Trail 1', xlab='Propensity score', ylab='Density', pch=1)

matplot(x, cbind(f3,f4), type='l', main='Trial 2', xlab='Propensity score', ylab='Density', pch=2)

If \(Y=1\) means patients are eligible for the trial design while \(Y=0\) means real world population. We find although the propensity score’s difference is same, which is \(0.5-0.4=0.1\) here, variances of score distribution is different. We find the overlap of trial 2’ two distribution is larger than trial 1’s. We may wonder only using propensity score’s difference to describe the generalizability of trial is not enough.

Strictly standardized mean difference (SSMD) of propensity score

In [Ryoko Susukida,2016], the author uses SSMD to estimate the generalizability of trial. The definition of SSMD is \[\frac{\Delta e(X)}{\text{pooled standard deviation}}\]

And we find SSMD of trial 1 is \(\frac{0.5-0.4}{\sqrt{0.01}}=1\), while SSMD of trial 2 is \(\frac{0.5-0.4}{\sqrt{0.05}}=0.447\). According to [Ryoko Susukida,2016], propensity score mean values that differ by more than 0.25 standard deviations (standardized \(\Delta e(X)\)) indicate significant differences between the samples. The significant differences of trial 1 is higher than trial 2, which verify our former assumption.

Run propensity-score-based method on R

We use a real world data to exhibit this propensity-score-based method

Introduction to Data Sets:

load data:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
T2D <- read.csv("NHANES_T2D_tutorial_10242019.csv",header=T)
T2D = mutate(T2D, GROUP = ifelse(T2D$BMXBMI<40 & T2D$LBXGH>7.5 & T2D$LBXGH<10 & T2D$RIDAGEYR>18 & T2D$RIDAGEYR<90, 1, 0))
head(T2D)
##    SEQN RIAGENDR RIDRETH1 LBXGH BMXBMI RIDAGEYR GROUP
## 1 21043        1        4   5.4  62.77       31     0
## 2 21093        2        2  12.1  28.97       64     0
## 3 21099        1        3   5.8  40.41       75     0
## 4 21104        1        4   6.5  29.33       70     0
## 5 21112        1        5   7.3  21.65       76     0
## 6 21118        1        1   5.5  25.76       66     0

use logistics regression model to estimate propensity score:

m_ps <- glm(GROUP ~ RIDAGEYR + LBXGH + BMXBMI, data = T2D, family = binomial())   ## construct the model
summary(m_ps)
## 
## Call:
## glm(formula = GROUP ~ RIDAGEYR + LBXGH + BMXBMI, family = binomial(), 
##     data = T2D)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6542  -0.6091  -0.4971  -0.3405   2.0400  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.090327   0.516348  -7.922 2.34e-15 ***
## RIDAGEYR     0.012122   0.004429   2.737   0.0062 ** 
## LBXGH        0.462886   0.029639  15.618  < 2e-16 ***
## BMXBMI      -0.050434   0.008365  -6.029 1.65e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2713.7  on 2694  degrees of freedom
## Residual deviance: 2371.4  on 2691  degrees of freedom
## AIC: 2379.4
## 
## Number of Fisher Scoring iterations: 5
prs_df <- data.frame(pr_score = predict(m_ps, type = "response"),
                     GROUP = m_ps$model$GROUP)   ## estimate the propensity score 
head(prs_df)  ## show the head line of propensity score
##     pr_score GROUP
## 1 0.01236198     0
## 2 0.69536457     0
## 3 0.07347589     0
## 4 0.15287001     0
## 5 0.29278493     0
## 6 0.11470159     0

Now we want to compare the Strictly standardized mean difference (SSMD) of propensity scores of two groups. Here we will use the pooled standard deviation.

Definition of pooled standard deviations: \[\sigma=\sqrt{\frac{s1^2*(n_1-1)+s2^2*(n_2-1)}{(n_1+n_2-2)}}\]

c_0 <- prs_df %>% 
  filter(GROUP=="0") %>%
  select(pr_score)
pr_score0 <- c_0$pr_score
c_1 <- prs_df %>% 
  filter(GROUP=="1") %>%
  select(pr_score)
pr_score1 <- c_1$pr_score

n0 = length(pr_score0)
n1 = length(pr_score1)
pooled_sd <- sqrt((sd(pr_score1)^2*(n1-1)+sd(pr_score0)^2*(n0-1))/(n0+n1-2))

prs_df %>% 
  mutate(pooled_sd) %>%
  group_by(GROUP) %>%
  summarise(prs_mean = mean(pr_score),
            pooled_standard_deviations = mean(pooled_sd),
            standardized_prs_mean = mean(pr_score/pooled_sd),
            n = n())
## # A tibble: 2 x 5
##   GROUP prs_mean pooled_standard_deviations standardized_prs_mean     n
##   <dbl>    <dbl>                      <dbl>                 <dbl> <int>
## 1     0    0.179                      0.145                  1.24  2150
## 2     1    0.294                      0.145                  2.04   545
prs_dis <- c(mean(pr_score1)-mean(pr_score0),pooled_sd,(mean(pr_score1)-mean(pr_score0))/pooled_sd)
names(prs_dis) <- c("prs_difference","pooled_standard_deviations","standardized_prs_difference")
data.frame(prs_dis)
##                               prs_dis
## prs_difference              0.1156551
## pooled_standard_deviations  0.1445925
## standardized_prs_difference 0.7998697

Compare propensity distributions for group with

library(sm)
## Package 'sm', version 2.2-5.6: type help(sm) for summary information
# create value labels 
GROUP.f <- factor(prs_df$GROUP, levels= c(0,1),
                     labels = c("GROUP=0","GROUP=1")) 

# plot densities 

sm.density.compare(prs_df$pr_score, prs_df$GROUP, xlab="propensity score")
title(main="Propensity_score Distribution by group")

# add legend via mouse click
colfill<-c(2:(2+length(levels(GROUP.f)))) 
legend("topright",legend = levels(GROUP.f), fill=colfill)