Propensity-score-based metrics
Propensity-score-based method
In this section, we will introduce a propensity-score-based method to assess the generalizability of clinical study. We propose the use of propensity-score-based metrics to quantify the similarity of the participants in a designed clinical trial and a target population.
In Rubin Causal model, propensity score is defined as \(e(X)=Pr(Z=1|X)\), where \(Z=z,z\in{0,1}\) is an indicator variable denoting subject’s participation of trial, \(X = (X_1,X_2,…,X_n)\) is a vector of covariates.
Propensity-score difference
In [Elizabeth A. Stuart, 2010], the author introduces a propensity-score-based metric as:
\[
\Delta e(X) = \frac{1}{\sum I(Z_i=1)}\sum I(Z_i=1) e(X_i)-\frac{1}{\sum I(Z_i=0)}\sum I(Z_i=0)(1- e(X_i))
\]
However, we can find that this method has obvious shortcoming. Here we assume there are two trials. Two trials’ propensity score distributions are:
mu1 <- 0.4
mu2 <- 0.5
sigma1 <- 0.05
sigma2 <- 0.01
x <- seq(0,1,0.01)
f1 <- 1/sqrt(2*pi*sigma1^2)*exp(-(x-mu1)^2/(2*sigma1^2))
f2 <- 1/sqrt(2*pi*sigma1^2)*exp(-(x-mu2)^2/(2*sigma1^2))
f3 <- 1/sqrt(2*pi*sigma2^2)*exp(-(x-mu1)^2/(2*sigma2^2))
f4 <- 1/sqrt(2*pi*sigma2^2)*exp(-(x-mu2)^2/(2*sigma2^2))
matplot(x, cbind(f1,f2), type='l', main='Trail 1', xlab='Propensity score', ylab='Density', pch=1)
matplot(x, cbind(f3,f4), type='l', main='Trial 2', xlab='Propensity score', ylab='Density', pch=2)
If \(Y=1\) means patients are eligible for the trial design while \(Y=0\) means real world population. We find although the propensity score’s difference is same, which is \(0.5-0.4=0.1\) here, variances of score distribution is different. We find the overlap of trial 2’ two distribution is larger than trial 1’s. We may wonder only using propensity score’s difference to describe the generalizability of trial is not enough.
Strictly standardized mean difference (SSMD) of propensity score
In [Ryoko Susukida,2016], the author uses SSMD to estimate the generalizability of trial. The definition of SSMD is \[\frac{\Delta e(X)}{\text{pooled standard deviation}}\]
And we find SSMD of trial 1 is \(\frac{0.5-0.4}{\sqrt{0.01}}=1\), while SSMD of trial 2 is \(\frac{0.5-0.4}{\sqrt{0.05}}=0.447\). According to [Ryoko Susukida,2016], propensity score mean values that differ by more than 0.25 standard deviations (standardized \(\Delta e(X)\)) indicate significant differences between the samples. The significant differences of trial 1 is higher than trial 2, which verify our former assumption.
Run propensity-score-based method on R
We use a real world data to exhibit this propensity-score-based method
Introduction to Data Sets:
load data:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
T2D <- read.csv("NHANES_T2D_tutorial_10242019.csv",header=T)
T2D = mutate(T2D, GROUP = ifelse(T2D$BMXBMI<40 & T2D$LBXGH>7.5 & T2D$LBXGH<10 & T2D$RIDAGEYR>18 & T2D$RIDAGEYR<90, 1, 0))
head(T2D)
## SEQN RIAGENDR RIDRETH1 LBXGH BMXBMI RIDAGEYR GROUP
## 1 21043 1 4 5.4 62.77 31 0
## 2 21093 2 2 12.1 28.97 64 0
## 3 21099 1 3 5.8 40.41 75 0
## 4 21104 1 4 6.5 29.33 70 0
## 5 21112 1 5 7.3 21.65 76 0
## 6 21118 1 1 5.5 25.76 66 0
use logistics regression model to estimate propensity score:
m_ps <- glm(GROUP ~ RIDAGEYR + LBXGH + BMXBMI, data = T2D, family = binomial()) ## construct the model
summary(m_ps)
##
## Call:
## glm(formula = GROUP ~ RIDAGEYR + LBXGH + BMXBMI, family = binomial(),
## data = T2D)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6542 -0.6091 -0.4971 -0.3405 2.0400
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.090327 0.516348 -7.922 2.34e-15 ***
## RIDAGEYR 0.012122 0.004429 2.737 0.0062 **
## LBXGH 0.462886 0.029639 15.618 < 2e-16 ***
## BMXBMI -0.050434 0.008365 -6.029 1.65e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2713.7 on 2694 degrees of freedom
## Residual deviance: 2371.4 on 2691 degrees of freedom
## AIC: 2379.4
##
## Number of Fisher Scoring iterations: 5
prs_df <- data.frame(pr_score = predict(m_ps, type = "response"),
GROUP = m_ps$model$GROUP) ## estimate the propensity score
head(prs_df) ## show the head line of propensity score
## pr_score GROUP
## 1 0.01236198 0
## 2 0.69536457 0
## 3 0.07347589 0
## 4 0.15287001 0
## 5 0.29278493 0
## 6 0.11470159 0
Now we want to compare the Strictly standardized mean difference (SSMD) of propensity scores of two groups. Here we will use the pooled standard deviation.
Definition of pooled standard deviations: \[\sigma=\sqrt{\frac{s1^2*(n_1-1)+s2^2*(n_2-1)}{(n_1+n_2-2)}}\]
c_0 <- prs_df %>%
filter(GROUP=="0") %>%
select(pr_score)
pr_score0 <- c_0$pr_score
c_1 <- prs_df %>%
filter(GROUP=="1") %>%
select(pr_score)
pr_score1 <- c_1$pr_score
n0 = length(pr_score0)
n1 = length(pr_score1)
pooled_sd <- sqrt((sd(pr_score1)^2*(n1-1)+sd(pr_score0)^2*(n0-1))/(n0+n1-2))
prs_df %>%
mutate(pooled_sd) %>%
group_by(GROUP) %>%
summarise(prs_mean = mean(pr_score),
pooled_standard_deviations = mean(pooled_sd),
standardized_prs_mean = mean(pr_score/pooled_sd),
n = n())
## # A tibble: 2 x 5
## GROUP prs_mean pooled_standard_deviations standardized_prs_mean n
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 0 0.179 0.145 1.24 2150
## 2 1 0.294 0.145 2.04 545
prs_dis <- c(mean(pr_score1)-mean(pr_score0),pooled_sd,(mean(pr_score1)-mean(pr_score0))/pooled_sd)
names(prs_dis) <- c("prs_difference","pooled_standard_deviations","standardized_prs_difference")
data.frame(prs_dis)
## prs_dis
## prs_difference 0.1156551
## pooled_standard_deviations 0.1445925
## standardized_prs_difference 0.7998697
Compare propensity distributions for group with
library(sm)
## Package 'sm', version 2.2-5.6: type help(sm) for summary information
# create value labels
GROUP.f <- factor(prs_df$GROUP, levels= c(0,1),
labels = c("GROUP=0","GROUP=1"))
# plot densities
sm.density.compare(prs_df$pr_score, prs_df$GROUP, xlab="propensity score")
title(main="Propensity_score Distribution by group")
# add legend via mouse click
colfill<-c(2:(2+length(levels(GROUP.f))))
legend("topright",legend = levels(GROUP.f), fill=colfill)
// add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); });