Title: | Toolkit for Weighting and Analysis of Nonequivalent Groups |
---|---|
Description: | Provides functions for propensity score estimating and weighting, nonresponse weighting, and diagnosis of the weights. |
Authors: | Matthew Cefalu, Greg Ridgeway, Dan McCaffrey, Andrew Morral, Beth Ann Griffin, and Lane Burgette |
Maintainer: | Matthew Cefalu |
License: | GPL-3 | file LICENSE |
Version: | 2.5 |
Built: | 2024-11-21 06:07:11 UTC |
Source: | https://github.com/mattcefalu/twang |
Provides functions for propensity score estimating and weighting, nonresponse weighting, and diagnosis of the weights.
A small subset of the data from McCaffrey et al. (2013).
data(AOD)
A data frame with 600 observations on the following 7 variables.
treat
Treatment that each study subject received. Either community, metcbt5, or scy.
suf12
outcome variable, substance use frequency at the 12-month follow-up
illact
covariate, illicit activities scale
crimjust
covariate, criminal justice involvement
subprob
covariate, substance use problem scale
subdep
covariate, substance use dependence scale
white
1 if non-Hispanic white, 0 otherwise
McCaffrey, DF, BA Griffin, D Almirall, ME Slaughter, R Ramchand and LF Burgette (2013). A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Statistics in Medicine.
'bal.stat' compares the treatment and control subjects by means, standard deviations, effect sizes, and KS statistics.
bal.stat( data, vars = NULL, treat.var, w.all, sampw, get.means = TRUE, get.ks = TRUE, na.action = "level", estimand, multinom, fillNAs = FALSE )
data |
A data frame containing the data |
vars |
A vector of character strings with the names of the variables on which the function will assess the balance |
treat.var |
The name of the treatment variable |
w.all |
Observation weights (e.g., propensity score weights, sampling weights, or both) |
sampw |
Sampling weights. These are passed in addition to 'w.all' because the "unweighted" results should be adjusted for sampling weights (though not for propensity score weights). |
get.means |
logical. If 'TRUE' then 'bal.stat' will compute means and variances |
get.ks |
logical. If 'TRUE' then 'bal.stat' will compute KS statistics |
na.action |
A character string indicating how 'bal.stat' should handle missing values. Current options are "level", "exclude", or "lowest" |
estimand |
Either "ATT" or "ATE" |
multinom |
logical. 'TRUE' if used for multinomial propensity scores. |
fillNAs |
logical. If 'TRUE', fills in zeros for missing values. |
'bal.stat' calls auxiliary functions for each variable and assembles the results in a table.
'get.means' and 'get.ks' control whether the corresponding columns (means and standard deviations, and KS statistics, respectively) are included in the returned result.
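The kind of per-variable summary that 'bal.stat' assembles can be sketched in base R. This is not twang's internal code: the helpers 'wmean', 'wsd', and 'balance_row' are hypothetical, the weighted standard deviation uses one common definition (an assumption), and the effect size follows the ATT-style formula (tx.mn - ct.mn)/tx.sd with very large values set to NA, as described for the 'bal.table' columns.

```r
# Sketch only: weighted balance summary for one numeric covariate.
# 'wmean', 'wsd', and 'balance_row' are hypothetical helpers, not twang code.

# Weighted mean
wmean <- function(x, w) sum(w * x) / sum(w)

# Weighted SD, normalizing by the sum of weights (one common definition)
wsd <- function(x, w) {
  m <- wmean(x, w)
  sqrt(sum(w * (x - m)^2) / sum(w))
}

balance_row <- function(x, treat, w) {
  tx <- treat == 1
  tx.mn <- wmean(x[tx], w[tx])
  tx.sd <- wsd(x[tx], w[tx])
  ct.mn <- wmean(x[!tx], w[!tx])
  ct.sd <- wsd(x[!tx], w[!tx])
  # ATT-style standardized effect size
  es <- (tx.mn - ct.mn) / tx.sd
  # Undefined (0/0), infinite, or huge values are reported as NA;
  # applying the 500 cap to the absolute value is an assumption
  if (!is.finite(es) || abs(es) > 500) es <- NA
  data.frame(tx.mn, tx.sd, ct.mn, ct.sd, std.eff.sz = es)
}

# Toy example with equal weights
x     <- c(1, 2, 3, 4, 5, 6)
treat <- c(1, 1, 1, 0, 0, 0)
r <- balance_row(x, treat, w = rep(1, 6))
r$std.eff.sz  # (2 - 5) / sqrt(2/3), about -3.67
```

With propensity score weights in 'w', the same function gives the weighted row; with all weights equal to 1 it gives the unweighted one.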
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment", *Psychological Methods* 9(4):403-425.
The example for [ps] contains an example of the use of [bal.table]
Extract the balance table from ps, dx.wts, and mnps objects
bal.table( x, digits = 3, collapse.to = c("pair", "covariate", "stop.method")[1], subset.var = NULL, subset.treat = NULL, subset.stop.method = NULL, es.cutoff = 0, ks.cutoff = 0, p.cutoff = 1, ks.p.cutoff = 1, timePeriods = NULL, ... )
x |
A 'ps', 'dx.wts', or 'mnps' object. |
digits |
The number of digits that the numerical entries should be rounded to. Default: 3. |
collapse.to |
For |
subset.var |
Eliminate all but a specified subset of covariates. |
subset.treat |
Subset to either all pairs that include a specified treatment or a single pair of treatments. |
subset.stop.method |
Subset to the results for a specified stop.method. |
es.cutoff |
Subsets to comparisons with absolute ES values bigger than 'es.cutoff'. |
ks.cutoff |
Subsets to comparisons with KS values bigger than 'ks.cutoff'. |
p.cutoff |
Subsets to comparisons with t- or chi-squared p-values no bigger than 'p.cutoff'. |
ks.p.cutoff |
Subsets to comparisons with KS p-values no bigger than 'ks.p.cutoff'. |
timePeriods |
Used to subset times for iptw fits. |
... |
Additional arguments. |
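The cutoff arguments act as row filters on a balance table. The base R sketch below applies them to a small table of made-up numbers; the helper 'filter_balance' is hypothetical, and combining the four conditions with a logical AND is an assumption rather than a description of twang's source.

```r
# Hypothetical helper illustrating the documented cutoff arguments;
# combining the conditions with AND is an assumption.
filter_balance <- function(tab, es.cutoff = 0, ks.cutoff = 0,
                           p.cutoff = 1, ks.p.cutoff = 1) {
  keep <- abs(tab$std.eff.sz) > es.cutoff &
    tab$ks > ks.cutoff &
    tab$p <= p.cutoff &
    tab$ks.pval <= ks.p.cutoff
  # which() treats NA (e.g. an NA effect size) as "do not keep"
  tab[which(keep), , drop = FALSE]
}

# Made-up balance table, for illustration only
tab <- data.frame(
  std.eff.sz = c(0.05, -0.30, 0.12),
  ks         = c(0.04, 0.20, 0.09),
  p          = c(0.70, 0.01, 0.20),
  ks.pval    = c(0.90, 0.02, 0.30),
  row.names  = c("age", "educ", "re74")
)

filter_balance(tab, es.cutoff = 0.1)  # rows educ and re74
filter_balance(tab, p.cutoff = 0.05)  # row educ only
```

With the defaults (0, 0, 1, 1) every row with a nonzero effect size and KS value is kept.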
'bal.table' is a generic function for extracting balance tables from 'ps' and 'dx.wts' objects. These objects usually have several sets of candidate weights: one for an unweighted analysis and perhaps several for different stop.methods. 'bal.table' returns a table for each set of weights, combined into a list. Each list component is named as given in 'x', usually the name of the stop.method. The balance table labeled "unw" corresponds to the unweighted analysis.
Each balance table is a data frame containing the balance information, with the following columns.
tx.mn
The mean of the treatment group.
tx.sd
The standard deviation of the treatment group.
ct.mn
The mean of the control group.
ct.sd
The standard deviation of the control group.
std.eff.sz
The standardized effect size, (tx.mn-ct.mn)/tx.sd.
If tx.sd is small or 0, the standardized effect size can be large or Inf.
Therefore, standardized effect sizes greater than 500 are set to NA.
stat
The t-statistic for numeric variables and the chi-square statistic for categorical variables.
p
The p-value for the test associated with stat
ks
The KS statistic.
ks.pval
The KS p-value computed using the analytic approximation,
which does not necessarily work well with a lot of ties.
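The KS statistic reported here is the largest absolute difference between the (weighted) empirical distribution functions of a covariate in the treatment and control groups. Below is a minimal base R sketch, assuming weights are normalized within each group; 'weighted_ks' is a hypothetical helper, not a twang function. With equal weights it reproduces the statistic from 'stats::ks.test'.

```r
# Sketch: weighted two-sample KS statistic (hypothetical helper).
weighted_ks <- function(x, treat, w) {
  grid <- sort(unique(x))
  # Weighted ECDF of the observations selected by 'g', evaluated on 'grid'
  wecdf <- function(g) {
    wg <- w[g] / sum(w[g])
    vapply(grid, function(v) sum(wg[x[g] <= v]), numeric(1))
  }
  # Both ECDFs are step functions that jump only at observed values,
  # so the maximum gap is attained on 'grid'
  max(abs(wecdf(treat == 1) - wecdf(treat == 0)))
}

x     <- c(0.1, 0.4, 0.9, 1.3, 0.2, 0.8, 1.1, 1.7)
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)
weighted_ks(x, treat, w = rep(1, 8))  # 0.25, the D reported by ks.test
```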
This function produces a collection of diagnostic plots for mnps objects.
## S3 method for class 'mnps' boxplot( x, stop.method = NULL, color = TRUE, figureRows = NULL, singlePlot = NULL, multiPage = FALSE, time = NULL, print = TRUE, ... )
x |
An 'mnps' object |
stop.method |
Only 1 'stop.method' can be presented at a time for 'mnps' objects. Use a numeric indicator of which 'stop.method' (among those specified when fitting the 'mnps' object) should be used. |
color |
If 'FALSE', a grayscale figure will be returned. |
figureRows |
The number of rows in the figure. Defaults to the number of panels. |
singlePlot |
If multiple sets of boxplots are produced, 'singlePlot' can be used to select only one. For example, 'singlePlot = 2' would return only the second set of boxplots. |
multiPage |
When multiple frames of a figure are produced, 'multiPage = TRUE' will print each frame on a different page. This is intended for situations where the graphical output is being saved to a file. |
time |
For use with iptw fits. |
print |
If 'FALSE', the figure is returned but not printed. |
... |
Additional arguments that are passed to the boxplot function, which may be passed on to the underlying 'lattice' package plotting functions. |
This function produces lattice-style graphics of diagnostic plots.
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment", *Psychological Methods* 9(4):403-425.
[mnps]
This function produces a collection of diagnostic plots for ps objects.
## S3 method for class 'ps' boxplot(x, subset = NULL, color = TRUE, time = NULL, ...)
x |
A 'ps' object |
subset |
If multiple 'stop.method' rules were used in the 'ps()' call, 'subset' restricts the plots to a subset of the stopping rules that were employed. This argument expects a subset of the integers from 1 to k, if k 'stop.method's were used. |
color |
If 'FALSE', a grayscale figure will be returned. |
time |
For use with iptw fits. |
... |
Additional arguments that are passed to the boxplot function, which may be passed on to the underlying 'lattice' package plotting functions. |
This function produces lattice-style graphics of diagnostic plots.
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment", *Psychological Methods* 9(4):403-425.
[ps]
'desc.wts' assesses how well a set of weights balances a treatment and a control group.
desc.wts(data, w, sampw = sampw, vars = NULL, treat.var, tp, na.action = "level", perm.test.iters=0, verbose=TRUE, alerts.stack, estimand, multinom = FALSE, fillNAs = FALSE)
data |
a data frame containing the dataset |
w |
a vector of weights equal to |
sampw |
sampling weights, if provided |
vars |
a vector of variable names corresponding to |
treat.var |
the name of the treatment variable |
tp |
a title for the method "type" used to create the weights, used to label the results |
na.action |
a string indicating the method for handling missing data |
perm.test.iters |
a non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If |
verbose |
if TRUE, lots of information will be printed to monitor the progress of the fitting |
alerts.stack |
an object for collecting warnings issued during the analyses |
estimand |
the estimand of interest: either "ATT" or "ATE" |
multinom |
Indicator that weights are from a propensity score analysis with 3 or more treatment groups. |
fillNAs |
If |
'desc.wts' calls 'bal.stat' to assess covariate balance. If 'perm.test.iters > 0', it calls 'bal.stat' multiple times to compute Monte Carlo p-values for the KS statistics and the maximum KS statistic. It assembles the results into a list object, which usually becomes the 'desc' component of the 'ps' objects that 'ps' returns.
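A Monte Carlo permutation p-value of the kind 'desc.wts' computes when 'perm.test.iters > 0' can be sketched as follows. The helpers 'ks_stat' and 'perm_pvalue' are hypothetical, and permuting treatment labels while each weight stays with its observation is an assumption; twang's own procedure may differ in its details.

```r
# Sketch: Monte Carlo permutation p-value for a weighted KS statistic.
# 'ks_stat' and 'perm_pvalue' are hypothetical helpers; keeping each
# weight attached to its observation while permuting labels is an assumption.
ks_stat <- function(x, treat, w) {
  grid <- sort(unique(x))
  cdf <- function(g) {
    vapply(grid, function(v) sum(w[g][x[g] <= v]) / sum(w[g]), numeric(1))
  }
  max(abs(cdf(treat == 1) - cdf(treat == 0)))
}

perm_pvalue <- function(x, treat, w, iters = 200) {
  observed <- ks_stat(x, treat, w)
  # Recompute the statistic under randomly shuffled treatment labels
  perm <- replicate(iters, ks_stat(x, sample(treat), w))
  # Share of permuted statistics at least as large as the observed one
  mean(perm >= observed)
}

set.seed(1)
x     <- c(rnorm(40, mean = 1), rnorm(60))
treat <- rep(c(1, 0), c(40, 60))
p <- perm_pvalue(x, treat, w = rep(1, 100))
```

The same loop works for the maximum KS statistic across covariates if 'ks_stat' is replaced by a function returning that maximum.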
See the description of the 'desc' component of the 'ps' object that 'ps' returns.
Display plots
displayPlots(ptList, figureRows, singlePlot, multiPage, bxpt = FALSE)
ptList |
A list of plots to display. |
figureRows |
The number of rows in the figure. |
singlePlot |
An integer indicating the index of the plot to display. |
multiPage |
Whether to display plots on multiple pages. |
bxpt |
Whether to display boxplots. Default: 'FALSE'. |
'dx.wts' takes a 'ps' object or a set of propensity scores and computes diagnostics assessing covariate balance.
dx.wts( x, data, estimand, vars = NULL, treat.var, x.as.weights = TRUE, sampw = NULL, perm.test.iters = 0 )
x |
A data frame, matrix, or vector of propensity score weights or a ps
object. |
data |
A data frame. |
estimand |
The estimand of interest: either "ATT" or "ATE". |
vars |
A vector of character strings naming variables in |
treat.var |
A character string indicating which variable in |
x.as.weights |
|
sampw |
Optional sampling weights. If |
perm.test.iters |
A non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If |
Creates a balance table that compares unweighted and weighted means and standard deviations, computes effect sizes, and KS statistics to assess the ability of the propensity scores to balance the treatment and control groups.
Returns a list containing
treat
The vector of 0/1 treatment assignment indicators.
A subset of the mathematics scores from the U.S. Sustaining Effects Study. The subset consists of information on 1721 students from 60 schools. This dataset is available in the 'mlmRev' package.
data(egsingle)
A data frame with 7230 observations on the following 12 variables.
schoolid
a factor of school identifiers
childid
a factor of student identifiers
year
a numeric vector indicating the year of the test
grade
a numeric vector indicating the student's grade
math
a numeric vector of test scores on the IRT scale score metric
retained
a factor with levels 0 and 1 indicating whether the student has been retained in a grade
female
a factor with levels Female and Male
black
a factor with levels 0 and 1 indicating whether the student is Black
hispanic
a factor with levels 0 and 1 indicating whether the student is Hispanic
size
a numeric vector indicating the number of students enrolled in the school
lowinc
a numeric vector giving the percentage of low-income students in the school
mobility
a numeric vector
Reproduced from the 'mlmRev' package for use in the section on nonresponse weighting in the 'twang' package vignette. These data are distributed with the HLM software package (Bryk, Raudenbush, and Congdon, 1996). Conversion to the R format is described in Doran and Lockwood (2006).
Doran, H.C. and J.R. Lockwood (2006). “Fitting value-added models in R,” Journal of Educational and Behavioral Statistics, 31(1)
Extracts propensity score weights from a ps or mnps object.
get.weights(ps1, stop.method = NULL, estimand = NULL, withSampW = TRUE)
ps1 |
A |
stop.method |
Indicates which set of weights to retrieve from the |
estimand |
Indicates whether the weights are for the average treatment effect on
the treated (ATT) or the average treatment effect on the population (ATE). By default,
|
withSampW |
Whether to return weights with sample weights multiplied in, if they were
provided in the original |
Weights for ATT are 1 for the treatment cases and p/(1-p) for the control cases, where p is the estimated propensity score. Weights for ATE are 1/p for the treatment cases and 1/(1-p) for the control cases.
Returns a vector of weights.
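The weight formulas above translate directly into base R. 'ps_weights' is a hypothetical helper shown only to make the arithmetic concrete; in twang the weights are retrieved from a fitted object with 'get.weights'.

```r
# Sketch of the documented weight formulas (hypothetical helper).
# p: estimated propensity scores, P(treat = 1 | covariates)
# treat: 0/1 treatment indicator
ps_weights <- function(p, treat, estimand = c("ATT", "ATE")) {
  estimand <- match.arg(estimand)
  stopifnot(all(p > 0 & p < 1), all(treat %in% c(0, 1)))
  if (estimand == "ATT") {
    # treated cases get weight 1; controls get the odds p / (1 - p)
    ifelse(treat == 1, 1, p / (1 - p))
  } else {
    # inverse probability of the treatment actually received
    ifelse(treat == 1, 1 / p, 1 / (1 - p))
  }
}

p     <- c(0.8, 0.5, 0.2, 0.25)
treat <- c(1, 1, 0, 0)
ps_weights(p, treat, "ATT")  # 1, 1, 0.25, 1/3
ps_weights(p, treat, "ATE")  # 1.25, 2, 1.25, 4/3
```

When sampling weights were supplied and 'withSampW = TRUE', they are multiplied in, which in this sketch would correspond to 'sampw * ps_weights(p, treat)'.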
Forms numerators to stabilize weights for an iptw object.
get.weights.num(iptw, fitList)
iptw |
An 'iptw' object. |
fitList |
A list containing objects with an associated "fitted" function. |
Returns the numerator of the stabilized weights, to be used in conjunction with 'get.weights.unstab'.
[iptw]
Extracts propensity score weights from an 'iptw' or 'mniptw' object.
get.weights.unstab(x, stop.method = NULL, withSampW = TRUE)
x |
An 'iptw' or 'mniptw' object. |
stop.method |
The stop method used for the fit of interest. |
withSampW |
Returns weights with sample weights multiplied in, if they were provided in the original 'iptw' call. Default: 'TRUE'. |
Weights are the reciprocal of the product, over time periods, of the probabilities of receiving the treatment actually received.
Returns a data.frame of weights.
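For sequential treatments, each unit's unstabilized weight is one over the product of its per-period probabilities of the treatment it actually received. The sketch below assumes a hypothetical wide layout (one row per unit, one column per period) and uses the marginal treatment rate in each period as a stand-in numerator; 'iptw_unstab' and 'marginal_numerator' are hypothetical helpers, and 'get.weights.num' instead builds the numerator from the fitted models in 'fitList'.

```r
# Sketch of IPTW weights for sequential treatments (hypothetical helpers).
# Assumed layout: one row per unit, one column per time period.
# ptreat[i, t]: estimated P(treated at period t | history) for unit i
# a[i, t]:      treatment actually received (0/1) at period t
iptw_unstab <- function(ptreat, a) {
  # Probability of the treatment actually received in each period;
  # ifelse() keeps the matrix dimensions of 'a == 1'
  p_received <- ifelse(a == 1, ptreat, 1 - ptreat)
  1 / apply(p_received, 1, prod)
}

# Stand-in numerator: marginal treatment rate in each period (an
# assumption; twang's get.weights.num uses the models in 'fitList')
marginal_numerator <- function(a) {
  marg <- matrix(colMeans(a), nrow(a), ncol(a), byrow = TRUE)
  apply(ifelse(a == 1, marg, 1 - marg), 1, prod)
}

ptreat <- matrix(c(0.5, 0.8,
                   0.4, 0.3), nrow = 2, byrow = TRUE)
a <- matrix(c(1, 1,
              0, 0), nrow = 2, byrow = TRUE)
unstab <- iptw_unstab(ptreat, a)        # 1/(0.5*0.8) = 2.5, 1/(0.6*0.7)
stab   <- marginal_numerator(a) * unstab
```

Multiplying a numerator by the unstabilized weights, as in the last line, is how stabilized weights are formed; this mirrors the documented pairing of 'get.weights.num' with 'get.weights.unstab'.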
[iptw]
'iptw' calculates propensity scores for sequential treatments using gradient boosted logistic regression and diagnoses the resulting propensity scores using a variety of methods.
iptw( formula, data, timeInvariant = NULL, cumulative = TRUE, timeIndicators = NULL, ID = NULL, priorTreatment = TRUE, n.trees = 10000, interaction.depth = 3, shrinkage = 0.01, bag.fraction = 1, n.minobsinnode = 10, perm.test.iters = 0, print.level = 2, verbose = TRUE, stop.method = c("es.max"), sampw = NULL, version = "gbm", ks.exact = NULL, n.keep = 1, n.grid = 25, ... )
formula |
Either a single formula (long format) or a list with formulas. |
data |
The dataset, including treatment assignment as well as covariates. |
timeInvariant |
An optional formula (with no left-hand variable) specifying time-invariant characteristics. |
cumulative |
If |
timeIndicators |
For long format fits, a vector of times for each observation. |
ID |
For long format fits, a vector of numeric identifiers for unique analytic units. |
priorTreatment |
For long format fits, includes treatment levels from previous times if |
n.trees |
Number of gbm iterations passed on to |
interaction.depth |
A positive integer denoting the tree depth used in gradient boosting. Default: 3. |
shrinkage |
A numeric value between 0 and 1 denoting the learning rate.
See |
bag.fraction |
A numeric value between 0 and 1 denoting the fraction of
the observations randomly selected in each iteration of the gradient
boosting algorithm to propose the next tree. See |
n.minobsinnode |
An integer specifying the minimum number of observations
in the terminal nodes of the trees used in the gradient boosting. See |
perm.test.iters |
A non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If |
print.level |
The amount of detail to print to the screen. Default: 2. |
verbose |
If |
stop.method |
A method or methods of measuring and summarizing balance across pretreatment
variables. Current options are |
sampw |
Optional sampling weights. |
version |
Default: |
ks.exact |
|
n.keep |
A numeric variable indicating the algorithm should only
consider every |
n.grid |
A numeric variable that sets the grid size for an initial
search of the region most likely to minimize the |
... |
Additional arguments that are passed to the 'ps' function. |
For users more comfortable with the options of 'xgboost', the options for 'iptw' that control the behavior of the gradient boosting algorithm can be specified using the 'xgboost' naming scheme. This includes 'nrounds', 'max_depth', 'eta', and 'subsample'. In addition, the list of parameters passed to 'xgboost' can be specified with 'params'.
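The xgboost-style names correspond to the gbm-style arguments in the usage line. The hypothetical helper below only renames arguments to make that correspondence explicit; the pairing (nrounds with n.trees, max_depth with interaction.depth, eta with shrinkage, subsample with bag.fraction) is assumed from the usual meaning of each parameter and is not taken from twang's source.

```r
# Hypothetical helper: rename xgboost-style arguments to the gbm-style
# names in the 'iptw' usage line. The pairing is assumed, not from twang.
xgb_to_gbm <- function(args) {
  map <- c(nrounds   = "n.trees",
           max_depth = "interaction.depth",
           eta       = "shrinkage",
           subsample = "bag.fraction")
  hit <- names(args) %in% names(map)
  names(args)[hit] <- unname(map[names(args)[hit]])
  args  # arguments without an xgboost counterpart are left unchanged
}

names(xgb_to_gbm(list(nrounds = 5000, max_depth = 2, eta = 0.01, verbose = FALSE)))
# "n.trees" "interaction.depth" "shrinkage" "verbose"
```

In 'iptw' and 'mnps' calls the xgboost-style names can be given directly, so this helper serves only to document the pairing.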
Returns an object of class 'iptw', a list containing
psList
A list of 'ps' objects with length equal to the number of time periods.
estimand
The specified estimand.
stop.methods
The stopping rules used to optimize 'iptw' balance.
nFits
The number of 'ps' objects (i.e., the number of distinct time points).
uniqueTimes
The unique times in the specified model.
ps, mnps, gbm, xgboost, plot, bal.table
These data are simulated to demonstrate the 'iptw' function in the "long" data format.
data(lindner)
A list with a covariate matrix and outcomes.
Time-invariant covariates are 'gender' and 'age'. The time-varying covariate is 'use'. The treatment indicator is given by 'tx'. An individual-level identifier is given in 'ID', and the time period is 'time'.
Vector of post-treatment outcomes.
These data are simulated to demonstrate the 'iptw' function in the "wide" data format.
data(lindner)
A list with a covariate matrix and outcomes.
Gender.
Age.
Baseline substance use.
Use following first time period treatment.
Use following second time period treatment.
Treatment indicator (first time period).
Treatment indicator (second time period).
Treatment indicator (third time period).
Time-invariant covariates are 'gender' and 'age'. The time-varying covariate is 'use'. The treatment indicator is given by 'tx'. An individual-level identifier is given in 'ID', and the time period is 'time'.
Post-treatment outcomes.
One of the datasets used by Dehejia and Wahba in their paper "Causal Effects in Non-Experimental Studies: Reevaluating the Evaluation of Training Programs." It is also used as an example dataset in the 'MatchIt' package.
data(lalonde)
A data frame with 614 observations on the following 10 variables.
treat
1 if treated in the National Supported Work Demonstration, 0 if from the Current Population Survey
age
age in years
educ
years of education
black
1 if black, 0 otherwise
hispan
1 if Hispanic, 0 otherwise
married
1 if married, 0 otherwise
nodegree
1 if no degree, 0 otherwise
re74
earnings in 1974 (pretreatment)
re75
earnings in 1975 (pretreatment)
re78
earnings in 1978 (outcome)
http://www.columbia.edu/~rd247/nswdata.html http://cran.r-project.org/src/contrib/Descriptions/MatchIt.html
Lalonde, R. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review 76: 604-620.
Dehejia, R.H. and Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Programs. Journal of the American Statistical Association 94: 1053-1062.
These data are adapted from the 'lindner' dataset in the 'USPS' package. The description comes from that package, except for the variable 'sixMonthSurvive', which is a recode of 'lifepres'.
Data from an observational study of 996 patients receiving an initial Percutaneous Coronary Intervention (PCI) at Ohio Heart Health, Christ Hospital, Cincinnati in 1997 and followed for at least 6 months by the staff of the Lindner Center. The patients thought to be more severely diseased were assigned to treatment with abciximab (an expensive, high-molecular-weight IIb/IIIa cascade blocker); in fact, only 298 (29.9 percent) of patients received usual-care-alone with their initial PCI.
data(lindner)
A data frame of 10 variables collected on 996 patients; no NAs.
Mean life years preserved due to survival for at least 6 months following PCI; numeric value of either 11.4 or 0.
Cardiac related costs incurred within 6 months of patient's initial PCI; numeric value in 1998 dollars; costs were truncated by death for the 26 patients with lifepres == 0.
Numeric treatment selection indicator; 0 implies usual PCI care alone; 1 implies usual PCI care deliberately augmented by either planned or rescue treatment with abciximab.
Coronary stent deployment; numeric, with 1 meaning YES and 0 meaning NO.
Height in centimeters; numeric integer from 108 to 196.
Female gender; numeric, with 1 meaning YES and 0 meaning NO.
Diabetes mellitus diagnosis; numeric, with 1 meaning YES and 0 meaning NO.
Acute myocardial infarction within the previous 7 days; numeric, with 1 meaning YES and 0 meaning NO.
Left ejection fraction; numeric value from 0 percent to 90 percent.
Number of vessels involved in the patient's initial PCI procedure; numeric integer from 0 to 5.
Survival at six months; a recoded version of 'lifepres'.
Kereiakes DJ, Obenchain RL, Barber BL, et al. Abciximab provides cost effective survival advantage in high volume interventional practice. Am Heart J 2000; 140: 603-610.
Obenchain RL (2009). USPSinR.pdf, ../R_HOME/library/USPS, 40 pages.
Extracts table of means from an mnps object.
means.table(mnps, stop.method = 1, includeSD = FALSE, digits = NULL)
mnps |
An 'mnps' object. |
stop.method |
Indicates which set of weights to retrieve from the 'mnps' object: either the name of the stop.method used or a natural number, with 1, for example, indicating the first stop.method specified. |
includeSD |
Indicates whether standard deviations as well as means are to be displayed. By default, they are not displayed. |
digits |
If not 'NULL', results will be rounded to the specified number of digits. |
Displays a table with weighted and unweighted means and standardized effect sizes and, if requested, standard deviations.
A table of means, standardized effect sizes, and perhaps standard deviations, by treatment group.
[mnps]
These data are simulated to demonstrate the 'iptw' function in the "long" data format.
data(lindner)
A list with a covariate matrix and outcomes.
Time-invariant covariates are 'gender' and 'age'. The time-varying covariate is 'use'. The treatment indicator is given by 'tx'. An individual-level identifier is given in 'ID', and the time period is 'time'.
Vector of post-treatment outcomes.
These data are simulated to demonstrate the 'iptw' function in the "wide" data format.
data(lindner)
A list with a covariate matrix and outcomes.
Gender.
Age.
Baseline substance use.
Use following first time period treatment.
Use following second time period treatment.
Treatment indicator (first time period).
Treatment indicator (second time period).
Treatment indicator (third time period).
Time-invariant covariates are 'gender' and 'age'. The time-varying covariate is 'use'. The treatment indicator is given by 'tx'. An individual-level identifier is given in 'ID', and the time period is 'time'.
Post-treatment outcomes.
'mnps' calculates propensity scores for more than two treatment groups using gradient boosted logistic regression and diagnoses the resulting propensity scores using a variety of methods.
mnps( formula, data, n.trees = 10000, interaction.depth = 3, shrinkage = 0.01, bag.fraction = 1, n.minobsinnode = 10, perm.test.iters = 0, print.level = 2, verbose = TRUE, estimand = "ATE", stop.method = c("es.max"), sampw = NULL, version = "gbm", ks.exact = NULL, n.keep = 1, n.grid = 25, treatATT = NULL, ... )
formula |
A formula for the propensity score model with the treatment indicator on the left side of the formula and the potential confounding variables on the right side. |
data |
The dataset, including treatment assignment as well as covariates. |
n.trees |
Number of gbm iterations passed on to |
interaction.depth |
A positive integer denoting the tree depth used in gradient boosting. Default: 3. |
shrinkage |
A numeric value between 0 and 1 denoting the learning rate.
See |
bag.fraction |
A numeric value between 0 and 1 denoting the fraction of
the observations randomly selected in each iteration of the gradient
boosting algorithm to propose the next tree. See |
n.minobsinnode |
An integer specifying the minimum number of observations
in the terminal nodes of the trees used in the gradient boosting. See |
perm.test.iters |
A non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If |
print.level |
The amount of detail to print to the screen. Default: 2. |
verbose |
If |
estimand |
|
stop.method |
A method or methods of measuring and summarizing balance across pretreatment
variables. Current options are |
sampw |
Optional sampling weights. |
version |
Default: |
ks.exact |
|
n.keep |
A numeric variable indicating the algorithm should only
consider every |
n.grid |
A numeric variable that sets the grid size for an initial
search of the region most likely to minimize the |
treatATT |
If the estimand is specified to be ATT, this argument is used to specify which treatment condition is considered 'the treated'. It must be one of the levels of the treatment variable. It is ignored for ATE analyses. |
... |
Additional arguments that are passed to |
For users more comfortable with the options of 'xgboost', the options for 'mnps' that control the behavior of the gradient boosting algorithm can be specified using the 'xgboost' naming scheme. This includes 'nrounds', 'max_depth', 'eta', and 'subsample'. In addition, the list of parameters passed to 'xgboost' can be specified with 'params'.
Note that, unlike earlier versions of 'twang', the plotting functions are no longer included in the 'mnps' function. See 'plot' for details of the plots.
Returns an object of class 'mnps', which consists of the following.
psList
A list of 'ps' objects, one for each binary propensity score model fit by 'mnps'.
nFits
The number of 'ps' objects in 'psList'.
estimand
The specified estimand.
treatATT
For ATT fits, the treatment category that is considered "the treated".
treatLev
The levels of the treatment variable.
levExceptTreatAtt
The levels of the treatment variable, excluding the 'treatATT' level.
data
The data used to fit the model.
treatVar
The vector of treatment indicators.
stopMethods
The stopping rules specified in the call to 'mnps'.
sampw
Sampling weights provided to 'mnps', if any.
Lane Burgette, Beth Ann Griffin, Dan McCaffrey
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment", *Psychological Methods* 9(4):403-425.
ps, gbm, xgboost, plot, bal.table
dxwts
Plot dxwts
## S3 method for class 'dxwts' plot(x, plots = "es", ...)
x |
An |
plots |
An indicator of which type of plot is desired. The options are
|
... |
Additional arguments. |
iptw objects
This function produces a collection of diagnostic plots for iptw objects.
## S3 method for class 'iptw' plot( x, plots = "optimize", subset = NULL, color = TRUE, timePeriods = NULL, multiPage = FALSE, figureRows = NULL, hline = c(0.1, 0.5, 0.8), ... )
x |
An |
plots |
An indicator of which type of plot is desired. The options are
|
subset |
Used to restrict which of the |
color |
If |
timePeriods |
The number of distinct time points. If |
multiPage |
When multiple frames of a figure are produced, |
figureRows |
The figure rows, passed to displayPlots. Default: |
hline |
Arguments passed to |
... |
Additional arguments. |
This function produces lattice-style graphics of diagnostic plots.
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment", Psychological Methods 9(4):403-425.
mniptw
Plot mniptw
## S3 method for class 'mniptw' plot( x, plots = "optimize", pairwiseMax = TRUE, figureRows = NULL, color = TRUE, subset = NULL, treatments = NULL, singlePlot = NULL, multiPage = FALSE, timePeriods = NULL, hline = c(0.1, 0.5, 0.8), ... )
x |
An |
plots |
An indicator of which type of plot is desired. The options are
|
pairwiseMax |
If |
figureRows |
The figure rows, passed to displayPlots. Default: |
color |
If |
subset |
Used to restrict which of the |
treatments |
Only applicable when |
singlePlot |
For Plot calls that produce multiple plots, specifying an integer value of
|
multiPage |
When multiple frames of a figure are produced, |
timePeriods |
The number of distinct time points. If |
hline |
Arguments passed to |
... |
Additional arguments. |
mnps objects
This function produces a collection of diagnostic plots for mnps objects.
## S3 method for class 'mnps' plot( x, plots = "optimize", pairwiseMax = TRUE, figureRows = NULL, color = TRUE, subset = NULL, treatments = NULL, singlePlot = NULL, multiPage = FALSE, time = NULL, print = TRUE, hline = c(0.1, 0.5, 0.8), ... )
x |
An |
plots |
An indicator of which type of plot is desired. The options are
|
pairwiseMax |
If |
figureRows |
The number of rows of figures that should be used.
If left as |
color |
If |
subset |
Used to restrict which of the |
treatments |
Only applicable when |
singlePlot |
For Plot calls that produce multiple plots, specifying an integer value of
|
multiPage |
When multiple frames of a figure are produced, |
time |
For use with |
print |
If |
hline |
Arguments passed to |
... |
Additional arguments. |
This function produces lattice-style graphics of diagnostic plots.
Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment", Psychological Methods 9(4):403-425.
ps objects
This function produces a collection of diagnostic plots for ps objects.
## S3 method for class 'ps' plot(x, plots = "optimize", subset = NULL, color = TRUE, ...)
x |
A |
plots |
An indicator of which type of plot is desired. The options are
|
subset |
If multiple |
color |
If |
... |
Additional arguments. |
This function produces lattice-style graphics of diagnostic plots.
McCaffrey, D, G Ridgeway and A Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment." Psychological Methods 9(4):403-425.
Default print statement for dxwts class
## S3 method for class 'dxwts' print(x, ...)
x |
A |
... |
Additional arguments. |
Default print statement for iptw class
## S3 method for class 'iptw' print(x, ...)
x |
A |
... |
Additional arguments. |
Default print statement for mniptw class
## S3 method for class 'mniptw' print(x, ...)
x |
A |
... |
Additional arguments. |
Default print statement for mnps class
## S3 method for class 'mnps' print(x, ...)
x |
A |
... |
Additional arguments. |
Default print statement for ps class
## S3 method for class 'ps' print(x, ...)
x |
An |
... |
Additional arguments. |
Produces a summary table for an iptw object
## S3 method for class 'summary.iptw' print(x, ...)
x |
An |
... |
Additional arguments. |
Produces a summary table for an mniptw object
## S3 method for class 'summary.mniptw' print(x, ...)
x |
An |
... |
Additional arguments. |
Produces a summary table for an mnps object
## S3 method for class 'summary.mnps' print(x, ...)
x |
An |
... |
Additional arguments. |
Produces a summary table for a ps object
## S3 method for class 'summary.ps' print(x, ...)
x |
An |
... |
Additional arguments. |
ps calculates propensity scores using gradient boosted logistic regression and diagnoses the resulting propensity scores using a variety of methods.
ps( formula = formula(data), data, n.trees = 10000, interaction.depth = 3, shrinkage = 0.01, bag.fraction = 1, n.minobsinnode = 10, perm.test.iters = 0, print.level = 2, verbose = TRUE, estimand = "ATE", stop.method = c("ks.mean", "es.mean"), sampw = NULL, version = "gbm", ks.exact = NULL, n.keep = 1, n.grid = 25, keep.data = TRUE, ... )
formula |
An object of class |
data |
A dataset that includes the treatment indicator as well as the potential confounding variables. |
n.trees |
Number of gbm iterations passed on to |
interaction.depth |
A positive integer denoting the tree depth used in gradient boosting. Default: 3. |
shrinkage |
A numeric value between 0 and 1 denoting the learning rate.
See |
bag.fraction |
A numeric value between 0 and 1 denoting the fraction of
the observations randomly selected in each iteration of the gradient
boosting algorithm to propose the next tree. See |
n.minobsinnode |
An integer specifying the minimum number of observations
in the terminal nodes of the trees used in the gradient boosting. See |
perm.test.iters |
A non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If |
print.level |
The amount of detail to print to the screen. Default: 2. |
verbose |
If |
estimand |
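The causal effect of interest: "ATE" (average treatment effect) or "ATT" (average treatment effect on the treated). Default: "ATE". |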
stop.method |
A method or methods of measuring and summarizing balance across pretreatment
variables. Current options are |
sampw |
Optional sampling weights. |
version |
Default: |
ks.exact |
|
n.keep |
A numeric variable indicating the algorithm should only
consider every |
n.grid |
A numeric variable that sets the grid size for an initial
search of the region most likely to minimize the |
keep.data |
A logical variable indicating whether or not the data is saved in
the resulting |
... |
Additional arguments that are passed to |
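The stop.method options summarize covariate balance through effect sizes and KS statistics, and perm.test.iters refers to a permutation test for the KS statistic. The base-R sketch below illustrates those ideas under stated assumptions; it is not twang's code. The function names (weighted_es, weighted_ks, balance_summary, ks_perm_pvalue) are hypothetical, the effect size is assumed to be scaled by the treated group's unweighted standard deviation, and the permutation test uses unit weights. twang's exact conventions may differ, for example by estimand.

```r
# Sketch only: balance statistics of the kind summarized by stop.method.
# Not twang's implementation; scaling and weighting conventions are assumptions.

# Absolute standardized difference in weighted group means.
# Assumption: scale by the unweighted SD of the treated group.
weighted_es <- function(x, treat, w) {
  t1 <- treat == 1
  m1 <- sum(w[t1] * x[t1]) / sum(w[t1])
  m0 <- sum(w[!t1] * x[!t1]) / sum(w[!t1])
  abs(m1 - m0) / sd(x[t1])
}

# Largest gap between the weighted empirical CDFs of the two groups.
weighted_ks <- function(x, treat, w) {
  w1 <- ifelse(treat == 1, w, 0); w1 <- w1 / sum(w1)  # treated weights sum to 1
  w0 <- ifelse(treat == 1, 0, w); w0 <- w0 / sum(w0)  # control weights sum to 1
  ord <- order(x)
  xs <- x[ord]
  gap <- abs(cumsum(w1[ord]) - cumsum(w0[ord]))
  # With ties, compare the CDFs only after the last copy of each value.
  last <- c(diff(xs) != 0, TRUE)
  max(gap[last])
}

# Max and mean over covariates, named like the desc components of a ps object.
balance_summary <- function(X, treat, w) {
  es <- sapply(X, weighted_es, treat = treat, w = w)
  ks <- sapply(X, weighted_ks, treat = treat, w = w)
  c(max.es = max(es), mean.es = mean(es), max.ks = max(ks), mean.ks = mean(ks))
}

# Generic permutation p-value for the unit-weight KS statistic:
# shuffle the treatment labels and count how often the shuffled KS is as large.
ks_perm_pvalue <- function(x, treat, iters = 1000) {
  w <- rep(1, length(x))
  observed <- weighted_ks(x, treat, w)
  shuffled <- replicate(iters, weighted_ks(x, sample(treat), w))
  (1 + sum(shuffled >= observed)) / (iters + 1)
}

# A perfectly separated covariate: max.es 3, mean.es 3, max.ks 1, mean.ks 1
balance_summary(data.frame(a = 1:6), treat = c(0, 0, 0, 1, 1, 1), w = rep(1, 6))
```

Taking the mean or the maximum over covariates mirrors the mean.es/max.es and mean.ks/max.ks components described in the returned value.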
For users more comfortable with the options of xgboost, the options for ps controlling the behavior of the gradient boosting algorithm can be specified using the xgboost naming scheme. This includes nrounds, max_depth, eta, and subsample. In addition, the list of parameters passed to xgboost can be specified with params.
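To illustrate the xgboost naming scheme, the hypothetical helper below renames gbm-style ps arguments to the xgboost names listed above. The pairing (n.trees with nrounds, interaction.depth with max_depth, shrinkage with eta, bag.fraction with subsample) is inferred from what each argument controls and is an assumption, not a documented twang mapping; interaction.depth and max_depth in particular are only roughly equivalent.

```r
# Hypothetical mapping from gbm-style ps() argument names to xgboost-style
# names (assumed pairing, not taken from twang's source).
gbm_to_xgb <- c(
  n.trees           = "nrounds",    # number of boosting iterations
  interaction.depth = "max_depth",  # tree depth (only roughly equivalent)
  shrinkage         = "eta",        # learning rate
  bag.fraction      = "subsample"   # fraction of rows sampled per iteration
)

# Rename any gbm-style entries in a named argument list; leave others alone.
rename_to_xgb <- function(args) {
  hit <- names(args) %in% names(gbm_to_xgb)
  names(args)[hit] <- unname(gbm_to_xgb[names(args)[hit]])
  args
}

str(rename_to_xgb(list(n.trees = 5000, shrinkage = 0.01, estimand = "ATT")))
```

Arguments that have no xgboost counterpart in this sketch (such as estimand) pass through unchanged.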
Note that unlike earlier versions of 'twang', the plotting functions are no longer included in the ps function. See plot for details of the plots.
Returns an object of class ps, a list containing
treat
The vector of treatment indicators.
treat.var
The treatment variable.
desc
A list containing balance tables for each method selected in stop.method. Includes a component for the unweighted analysis named “unw”. Each desc component includes a list with the following components:
ess
The effective sample size of the control group.
n.treat
The number of subjects in the treatment group.
n.ctrl
The number of subjects in the control group.
max.es
The largest effect size across the covariates.
mean.es
The mean absolute effect size.
max.ks
The largest KS statistic across the covariates.
mean.ks
The average KS statistic across the covariates.
bal.tab
a (potentially large) table summarizing the quality of the
weights for equalizing the distribution of features across
the two groups. This table is best extracted using the
bal.table
method. See the help for bal.table
for details
on the table's contents.
n.trees
The estimated optimal number of gradient boosted iterations to optimize the loss function for the associated stop.method.
ps
A data frame containing the estimated propensity scores. Each column is associated with one of the methods selected in stop.method.
w
A data frame containing the propensity score weights. Each column is associated with one of the methods selected in stop.method. If sampling weights are given, they are incorporated into these weights.
estimand
The estimand of interest (ATT or ATE).
datestamp
Records the date of the analysis.
parameters
Saves the ps call.
alerts
Text containing any warnings accumulated during the estimation.
iters
A sequence of iterations used in the GBM fits, used by the plot function.
balance
The balance measures for the pretreatment covariates used in plotting, with a column for each stop.method.
balance.ks
The KS balance measures for the pretreatment covariates used in plotting, with a column for each
covariate.
balance.es
The standardized differences for the pretreatment covariates used in plotting, with a column for each covariate.
ks
The KS balance measures for the pretreatment covariates on a finer grid, with a column for each
covariate.
es
The standardized differences for the pretreatment covariates on a finer grid, with a column for each covariate.
n.trees
The maximum number of trees considered in the GBM fit.
data
Data as specified in the data argument.
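The w and ess components can be understood through the textbook inverse probability weights and the Kish effective sample size. The sketch below assumes those standard formulas (ATE: 1/p for treated and 1/(1 - p) for controls; ATT: 1 for treated and p/(1 - p) for controls) and assumes sampling weights are folded in by multiplication. The names ps_weights and ess are hypothetical, and twang's internals may differ in detail.

```r
# Sketch of propensity score weights (textbook formulas, assumed here).
# p: estimated propensity scores; treat: 1 = treated, 0 = control.
ps_weights <- function(p, treat, estimand = c("ATE", "ATT"), sampw = NULL) {
  estimand <- match.arg(estimand)
  w <- if (estimand == "ATE") {
    ifelse(treat == 1, 1 / p, 1 / (1 - p))  # weight both groups to the full sample
  } else {
    ifelse(treat == 1, 1, p / (1 - p))      # weight controls to resemble the treated
  }
  # Assumption: sampling weights are incorporated multiplicatively.
  if (!is.null(sampw)) w <- w * sampw
  w
}

# Kish effective sample size: (sum of weights)^2 / (sum of squared weights).
ess <- function(w) sum(w)^2 / sum(w^2)

p     <- c(0.50, 0.25, 0.50, 0.75)  # toy propensity scores
treat <- c(1, 1, 0, 0)
w_att <- ps_weights(p, treat, estimand = "ATT")
w_att                   # 1 1 1 3
ess(w_att[treat == 0])  # 1.6: two controls, but unequal weights
```

Because ess is unchanged when all weights are multiplied by a constant, uniform sampling weights leave the effective sample size of the control group unchanged in this sketch.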
McCaffrey, D, G Ridgeway and A Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment." Psychological Methods 9(4):403-425.
Simulated example data for assessing race bias in traffic stop outcomes
data(raceprofiling)
A data frame with 5000 observations on the following 10 variables.
id
an ID for each traffic stop
nhood
a factor indicating the neighborhood in which the stop occurred.
reason
The reason for the stop: mechanical/registration violation, dangerous moving violation, or non-dangerous moving violation
resident
an indicator of whether the driver is a resident of the city
age
driver's age
male
an indicator of whether the driver was male
race
the race of the driver, with levels A, B, H, W
hour
the hour of the stop (24-hour clock)
month
an ordered factor indicating the month in which the stop took place
citation
an indicator of whether the driver received a citation
This is simulated data to demonstrate how to use twang to adjust estimates of racial bias for important factors. This dataset does not represent real data from any real law enforcement agency.
Ridgeway, G (2006). "Assessing the effect of race bias in post-traffic stop outcomes using propensity scores." Journal of Quantitative Criminology 22(1):1-29.
data(raceprofiling)
# the first five lines of the dataset
raceprofiling[1:5,]
Performs the sensitivity analyses described in Ridgeway (2006). This is a beta version of this functionality. Please let the developers know if you have problems with it.
sensitivity(ps1, data, outcome, order.by.importance = TRUE, verbose = TRUE)
ps1 |
A 'ps' object. |
data |
The dataset including the outcomes |
outcome |
The outcome of interest. |
order.by.importance |
Orders the output by relative importance of covariates. |
verbose |
If 'TRUE', extra information will be printed. |
Returns the following:
* 'tx': Summary for treated observations.
* 'ctrl': Summary for control observations.
Ridgeway, G (2006). "Assessing the effect of race bias in post-traffic stop outcomes using propensity scores." Journal of Quantitative Criminology 22(1):1-29.
In older versions of twang, the 'ps' function specified the 'stop.method' in a different manner. This 'stop.methods' object is used to ensure backward compatibility; new twang users should not make use of it.
stop.methods
An object of class matrix (inherits from array) with 1 row and 6 columns.
This is merely a vector with the names of the stopping rules.
Computes summary information about a stored iptw object
## S3 method for class 'iptw' summary(object, ...)
object |
An |
... |
Additional arguments. |
Compresses the information in the desc component of the iptw object into a short summary table describing the size of the dataset and the quality of the propensity score weights. See iptw for details on the returned table.
Summarize an mniptw object
## S3 method for class 'mniptw' summary(object, ...)
object |
A |
... |
Additional arguments. |
Computes summary information about a stored mnps object
## S3 method for class 'mnps' summary(object, ...)
object |
An |
... |
Additional arguments. |
Compresses the information in the desc component of the mnps object into a short summary table describing the size of the dataset and the quality of the propensity score weights. See mnps for details on the returned table.
Computes summary information about a stored ps object
## S3 method for class 'ps' summary(object, ...)
object |
An |
... |
Additional arguments. |
Compresses the information in the desc component of the ps object into a short summary table describing the size of the dataset and the quality of the propensity score weights. See ps for details on the returned table.
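The summary methods compress the desc component into one row per analysis, including the unweighted “unw” analysis. A minimal sketch of that idea, assuming a desc list shaped like the components documented for ps objects (n.treat, n.ctrl, ess, max.es, mean.es, max.ks, mean.ks). summarize_desc is hypothetical, the component name es.mean is only illustrative (names in a real ps object may differ), and the input values are made up solely to exercise the function; they are not twang output.

```r
# Hypothetical: collapse a desc-style list into one summary row per component.
summarize_desc <- function(desc) {
  fields <- c("n.treat", "n.ctrl", "ess", "max.es", "mean.es", "max.ks", "mean.ks")
  rows <- lapply(desc, function(d) unlist(d[fields]))  # keep only the summary fields
  do.call(rbind, rows)  # row names come from the names of desc
}

# Made-up values, only to show the shape of the input and output.
desc <- list(
  unw     = list(n.treat = 50, n.ctrl = 100, ess = 100,
                 max.es = 0.60, mean.es = 0.30, max.ks = 0.25, mean.ks = 0.12),
  es.mean = list(n.treat = 50, n.ctrl = 100, ess = 40,
                 max.es = 0.10, mean.es = 0.05, max.ks = 0.08, mean.ks = 0.04)
)
summarize_desc(desc)
```

Large components such as bal.tab are deliberately left out, which is what keeps the table short.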