Introduction
In surgery, evaluating and comparing treatments via the gold standard of a randomised controlled trial (RCT) is not always feasible. Comparing laparoscopic ("keyhole" or "minimally invasive") surgery with standard open surgery is particularly challenging due to difficulties with blinding and, in many cases, a perceived lack of equipoise.
Unfortunately, the result is that we are often faced with multiple, usually small (< 100 patients) observational cohort studies. These are plagued by selection bias: early in the development of a new surgical procedure, "ideal" patients are legitimately selected to minimise the difficulty of the operation and ensure patient safety. This generally leads to a gross overestimate of the treatment effect for the new procedure in early studies. As time passes and experience accumulates, the patient populations generally become more similar and the estimate of effect size becomes more reliable. Meta-analysis of these studies can be helpful, but does little to ensure that treatment and control groups do not differ in a systematic way.
This is the current state of affairs with laparoscopic liver surgery. The technique has been in use for approximately 20 years and, whilst not "widespread", it is commonly used among specialist liver surgeons. The sub-specialists who perform this surgery on a regular basis point to the multitude of observational studies showing safety and comparable oncological efficacy (as measured by disease-free and overall survival). Indeed, most surgeons utilising the technique would quote a significant improvement in peri-operative outcomes (such as shorter hospital stay and reduced blood loss, pain and wound complications). Among the wider surgical community, concerns remain regarding the lack of robust comparative trials that would provide confidence that patient safety and oncological outcomes are not compromised.
In the absence of an RCT, techniques are available that can reduce the impact of observable sources of bias. Regression is often used to produce treatment effect estimates. However, if the groups differ substantially in some covariates and there are interactions or non-linear relationships that the regression model does not capture, these estimates will be biased. Propensity score (PS) techniques have been used in the social sciences for many years, but are not yet widely used or understood in surgery. PS-based analyses are, however, becoming increasingly popular in the surgical literature due to their ability to remove bias from observed confounders. There are a variety of methods whereby the propensity score can be used to estimate treatment effects: matching, stratification, covariate adjustment and inverse probability of treatment weighting (IPTW).
In what follows, we discuss the principles of propensity scores with a focus on IPTW techniques. Various PS estimation techniques are compared, and modern techniques such as generalised boosted models are described.
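As a rough illustration of the IPTW idea, the following R sketch simulates a small cohort in which treatment allocation depends on two observed covariates, estimates the propensity score by logistic regression, and compares the unweighted and weighted differences in mean outcome. Everything in it is an assumption made for illustration: the simulated data, the variable names (age, severity, treated, los), the simulated treatment effect of -2 days, and the use of plain logistic regression rather than the estimation techniques compared in the supplement.

```r
# Minimal IPTW sketch on simulated data. All names and values are hypothetical.
set.seed(42)                               # fix the random number stream
n <- 2000
age      <- rnorm(n, mean = 60, sd = 10)   # observed confounder
severity <- rbinom(n, 1, 0.4)              # observed confounder (1 = severe)

# Treatment allocation depends on the confounders (selection bias):
# younger, less severe patients are more likely to receive the new procedure.
p_treat <- plogis(0.5 - 0.05 * (age - 60) - 1.0 * severity)
treated <- rbinom(n, 1, p_treat)

# Outcome (e.g. length of stay in days) with a simulated treatment effect of -2.
los <- 10 + 0.1 * (age - 60) + 3 * severity - 2 * treated + rnorm(n, sd = 2)
dat <- data.frame(age, severity, treated, los)

# Step 1: estimate the propensity score, P(treated = 1 | covariates).
ps_model <- glm(treated ~ age + severity, family = binomial(), data = dat)
dat$ps <- fitted(ps_model)

# Step 2: inverse probability of treatment weights for the average treatment effect.
dat$w <- ifelse(dat$treated == 1, 1 / dat$ps, 1 / (1 - dat$ps))

# Step 3: compare the naive and the weighted difference in mean outcome.
is_trt <- dat$treated == 1
naive <- mean(dat$los[is_trt]) - mean(dat$los[!is_trt])
iptw  <- weighted.mean(dat$los[is_trt], dat$w[is_trt]) -
         weighted.mean(dat$los[!is_trt], dat$w[!is_trt])

print(c(naive = naive, iptw = iptw))
```

Because the propensity model here matches the way the data were simulated, the weighted estimate should land close to the simulated effect, whereas the naive difference should exaggerate the benefit of treatment. With real data the model is never known to be correct, and IPTW can only address confounders that were measured.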
This site will host the statistical supplement for the above paper, including the details needed to reproduce the analysis in R.
Importantly, the exact results presented in the following supplement may differ slightly from those reported in Lewin et al. due to differences in the missing value imputation and propensity scoring processes, both of which rely on underlying random number generation. However, the magnitude of change should be small and the overall conclusions unchanged.
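The dependence on random number generation can be shown with a toy R example. The function noisy_estimate below is a hypothetical stand-in for any step that draws random numbers (it is not part of the supplement's code): the same seed reproduces a result exactly, while a different seed gives a slightly different one.

```r
# Hypothetical stand-in for a procedure that relies on random draws,
# such as imputation or a stochastic model-fitting step.
noisy_estimate <- function(seed) {
  set.seed(seed)               # the seed fixes the random number stream
  mean(rnorm(50, mean = 1))    # an estimate that varies from draw to draw
}

same_a <- noisy_estimate(2024)
same_b <- noisy_estimate(2024)  # same seed: identical result
other  <- noisy_estimate(2025)  # different seed: slightly different result

print(c(same_a = same_a, same_b = same_b, other = other))
```

Fixing the seed makes a single run repeatable on the same software setup, but it does not guarantee agreement with results produced under a different seed or different package versions.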