ISSN: 2161-0533
Research Article - (2018) Volume 7, Issue 1
Objective: Although previous studies have developed classical models predicting outcomes after hip replacement, no formal machine-learning based calculators have been designed to predict Oxford Hip Score based on a national sample. The aim of our study was to develop a series of machine-learning models and a web-based calculator to predict Oxford Hip Scores after total hip replacement.
Methods: We made use of the National Health Service Patient Reported Outcome Measures and Hospital Episode Statistics (NHS PROMS/HES) database, evaluating pre and post-operative data from patients aged over 50 years old undergoing total hip replacement from 2010 to 2015. Predictors of Oxford Hip Score were assessed using a combination of machine-learning and tree regression models.
Results: A total of 170,283 patients participated in the study. Most patients were female (60.7%), aged between 70 and 79 years, with a baseline Oxford Hip Score lower than 41. Across all machine learning models, the most significant predictors of Oxford Hip Scores were pre-operative EQ-5D index and self-perceived disability, problems while shopping, circulation diseases, and pre-operative problems while climbing stairs. The best performing models were Gradient Boosting Machines, Boosted Generalized Linear Model, and Multivariate Adaptive Regression Splines with R-Squared values of, respectively, 0.18, 0.18, and 0.18. A Web-based calculator was developed (https://companionsite.sporedata.com/app/predicthip/).
Conclusion: Highly accurate models were developed to predict the Oxford Hip Scores, which can be used in both clinical decision-making and the management of healthcare resources.
Keywords: Hip arthroplasty; Predictors; Osteoarthritis; Patient-reported outcome measures; Oxford hip score; Web-based calculator
With the increase in the average age of the global population, the number of total hip replacement procedures has increased significantly over the past decades [1,2]. While prognostic prediction is essential to the quality of care provided to these patients [3], to date most efforts in total hip replacement have either focused on manually-calculated scores or predictive models calculated through traditional statistical methods [4,5]. In contrast, specialties such as cardiology and cardiothoracic surgery now use machine learning algorithms which not only allow for increased prognostic accuracy but also for the creation of online calculators to assist with bedside decision support.
Although the ultimate goal of total hip replacement is to relieve pain and disability from conditions such as osteoarthritis [3], some studies have demonstrated that outcomes resulting from this procedure are not always equally beneficial [6], making prognostic prediction an important decision support tool. In general, predictors of disability after total hip replacement can be classified into modifiable, e.g., comorbidities, BMI or mental state, and non-modifiable, e.g., age, gender, and socioeconomic class [7]. Among these two categories, the most common predictors of function worsening following total hip replacement are preoperative pain, disability, and age [5,8]. Others have reported socioeconomic status [9] and psychosocial factors [10] as additional predictors of disability.
Standard methods used to predict and evaluate orthopedic surgical outcomes, in general, can also be applied to total hip replacement. These include generic and disease-specific questionnaires, and hip-specific and general clinical measures [11]. Among these, the Oxford Hip Score is a standardized, validated, patient-reported measure of outcomes after total hip replacement [12]. The ability to predict its scores is fundamental not only to assisting with clinical decision making but also to better managing the allocation of healthcare resources. As an example of the latter, patients with high predicted Oxford Hip Scores might need fewer physical therapy sessions or a longer interval between follow-up visits. Machine learning can, therefore, be used to facilitate the integration of various risk factors for disability following total hip replacement, ultimately maximizing the accuracy of results and improving care management.
Machine learning is increasingly employed in the prediction of patient prognosis. These techniques involve the use of algorithms to predict outcomes while allowing for non-linear associations between predictors and outcomes [13]. This non-linear aspect is in contrast to the most common use of traditional generalized linear models in the healthcare literature. Machine learning has been previously used in areas such as the prediction of postoperative pain, far surpassing the accuracy achieved through traditional statistical models [14]. Although machine learning has been applied in the prediction of surgical complications and postoperative morbidity for many surgical procedures [15], to our knowledge it has not been applied to the prediction of Oxford Hip Scores based on a large, national sample.
Given this gap in the literature, the objective of this study was to develop a predictive model based on machine learning, along with a web-based calculator, for the Oxford Hip Score after total hip replacement. Specifically, our models are based on an extensive panel database from England.
Study design
Our objective was to develop a predictive model of disability after total hip replacement, along with a corresponding web-based calculator, based on the NHS PROMS/HES database (National Health Service Patient Reported Outcome Measures and Hospital Episode Statistics). This study is reported in accordance with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) guideline [16].
Ethics
The Institutional Review Board of the SECONCI of the State of São Paulo approved our study. Since no data with identifiable information were part of the PROMS dataset, no informed consent was sought from participating patients.
Setting
Data for this study were obtained from the aggregation of the 2010-2015 National Health Service Patient Reported Outcome Measures and Hospital Episode Statistics (NHS PROMS/HES) datasets. Specifically, we focused on the dataset containing pre-operative and six-month postoperative information [17].
All English providers of NHS-funded unilateral total hip replacement are expected to offer patients a preoperative Patient-Reported Outcome Measures (PROMs) questionnaire. The PROMs questionnaire, a validated questionnaire originally covering four main clinical procedures (hip replacements, knee replacements, groin hernia, varicose veins) (https://www.england.nhs.uk/statistics/statistical-work-areas/proms/), calculates the health gains after surgical treatment using pre- and post-operative surveys. Each of the conditions evaluated by PROMs has scores for the EQ-VAS (Visual Analogue Scale) and EQ-5D™ Index, as well as condition-specific measures for some (Oxford Hip Score, Oxford Knee Score, and Aberdeen Varicose Vein Questionnaire). PROMs are compelling because they use validated questionnaires to turn a symptom into a numerical score (https://catalyst.nejm.org/implementing-proms-patient-reported-outcome-measures/). This questionnaire has also been extensively evaluated for validity and reliability across a wide range of clinical conditions [18-21]. The completed PROMs questionnaires are then securely transferred to the contractors in charge of merging all data, where the forms are electronically scanned along with the NHS identifier. Postoperative questionnaires are sent to patients six months after the date of the surgical procedure. Once forms have been filled out, they are electronically scanned and linked with the pre-operative data. After the consent period, personal identifiers are removed from the database, ultimately anonymizing it. Finally, the data collection program is limited to England, including only a small number of patients from Scotland and Wales.
Participants
We included all elective patients in the database who were aged 50 years and above and underwent total hip replacement between April 2010 and March 2015.
Outcomes
Our primary outcome was the postoperative Oxford Hip Score, a questionnaire which has proven to be valid for the evaluation of results after hip replacement [22]. This self-reported questionnaire consists of 12 Likert-type response items. Specifically, the Oxford Hip Score includes three questions assessing pain as perceived by patients, as well as nine questions evaluating self-perceived problems related to activities of daily living including walking, climbing stairs, putting on socks, standing, activities associated with transport, washing, shopping and carrying out domestic activities [23]. We summed responses into an overall score where 0 indicates the worst possible and 48 the best possible outcome. Scores ranging from 0 to 19 indicate severe hip arthritis, 20 to 29 moderate to severe hip arthritis, 30 to 39 mild to moderate hip arthritis, and 40 to 48 satisfactory joint function [23]. This questionnaire has been extensively evaluated for validity and reliability and has a well-established factorial structure [24].
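The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study; it assumes each of the 12 items is coded from 0 (worst) to 4 (best), which is inferred from the stated 0 to 48 range, and the function names are hypothetical.

```python
from typing import Sequence

# Severity bands as given in the text (0 = worst, 48 = best).
BANDS = [
    (0, 19, "severe hip arthritis"),
    (20, 29, "moderate to severe hip arthritis"),
    (30, 39, "mild to moderate hip arthritis"),
    (40, 48, "satisfactory joint function"),
]


def oxford_hip_score(items: Sequence[int]) -> int:
    """Sum 12 item responses, each assumed to be coded 0 (worst) to 4 (best)."""
    if len(items) != 12:
        raise ValueError("the Oxford Hip Score has exactly 12 items")
    if any(not 0 <= value <= 4 for value in items):
        raise ValueError("each item must be coded 0 to 4")
    return sum(items)


def classify(score: int) -> str:
    """Map a total score to the severity band quoted in the text."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError("score must lie between 0 and 48")


# Hypothetical responses for one patient.
responses = [3, 2, 4, 3, 3, 2, 4, 3, 2, 3, 4, 3]
total = oxford_hip_score(responses)
print(total, classify(total))  # 36 mild to moderate hip arthritis
```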
Predictors
After reviewing the available evidence, we considered the following preoperative variables as predictors: (1) Preoperative EQ-5D index; (2) Self-perceived pre-operative items of the EQ-5D health status such as disability, mobility, self-care, ability to perform usual activities, discomfort, anxiety, pain, night and sudden pain; (3) Preoperative variables assessed through the Oxford Hip questionnaire including troubles with daily activities such as washing, transport, dressing, shopping, walking, limping, and climbing stairs; (4) Presence of co-morbidities such as cardiovascular diseases (stroke, high blood pressure, heart and circulation diseases), lung disease, diabetes, kidney disease, nervous system disease and depression.
Data analysis
Our exploratory analysis started by evaluating distributions, frequencies, and percentages for each of the numeric and categorical variables. We assessed categorical variables for near-zero variation. Extensive graphical displays were used for both univariate analysis and bivariate associations, accompanied by broader tests such as Maximal Information Coefficient [25] and Nonnegative Matrix Factorization [26] algorithms for numeric variables. Missing data were explored using a combination of graphical displays involving univariate, bivariate and multivariate methods. Imputation was performed using a k-nearest neighbors algorithm (k=5).
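To make the imputation step concrete, here is a minimal pure-Python sketch of k-nearest-neighbors imputation. It is not the study's implementation (the analysis was run in R). The distance measure (root mean squared difference over jointly observed columns), the use of the neighbor mean, and the names `knn_impute` and `demo` are assumptions made for illustration.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Root mean squared difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 5) -> List[Row]:
    """Fill each missing cell with the column mean over the k nearest rows."""
    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which column j is observed.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            if not donors:
                raise ValueError(f"column {j} has no observed values")
            donors.sort(key=lambda pair: pair[0])
            nearest = [donor_value for _, donor_value in donors[:k]]
            completed[i][j] = sum(nearest) / len(nearest)
    return completed


# Toy data: the last row is missing its second value.
demo = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, None]]
print(knn_impute(demo, k=1)[3])
```

With `k=1` the missing cell takes the value from the single closest row; larger `k` averages over more donors and smooths the imputed value.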
We modeled the Oxford Hip Score as an outcome variable using the following variables as predictors: (1) Preoperative EQ-5D index; (2) Self-perceived pre-operative disability, mobility, self-care, ability to perform usual activities, discomfort, anxiety, pain, night and sudden pain; (3) Preoperative variables assessed in the Oxford Hip questionnaire, including troubles with daily activities such as washing, transport, dressing, shopping, walking, limping, and climbing stairs; (4) Presence of co-morbidities such as cardiovascular diseases (stroke, high blood pressure, heart and circulation diseases), lung disease, diabetes, kidney disease, nervous system disease and depression. To train and test our models, we used 5-fold cross-validation.
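As an illustration of how a 5-fold scheme partitions the data, the sketch below generates train and test indices so that every patient is held out exactly once. It is a generic example rather than the authors' code; the shuffling seed and the function name `k_fold_indices` are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each row is held out exactly once."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [order[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            idx
            for f, fold in enumerate(folds)
            if f != held_out
            for idx in fold
        ]
        yield train, test


for train, test in k_fold_indices(n=10, k=5):
    print(len(train), len(test))
```

Each model would be fitted on `train` and scored on `test`, with the fold-level scores then averaged.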
We made use of a series of machine-learning regression models, i.e., models focusing on numeric outcome variables, including the Radial Basis Function Kernel Regularized Least Squares, Linear Regression with Backwards Selection, Linear Regression with Forward Selection, Principal Component Analysis, Support Vector Machines with Linear Kernel, and Random Forest. Also included were Neural Network, Recursive Partitioning (Tree Regression Models), Stochastic Gradient Boosting, Boosted Generalized Linear Model, Bagged Model, k-Nearest Neighbors, Sparse Partial Least Squares and Multivariate Adaptive Regression Splines.
Model performance was evaluated using the Root Mean Square Error and R-Squared indices, which determined which models had the best prognostic performance.
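For readers less familiar with these two indices, the sketch below computes both from observed and predicted scores. It uses the coefficient-of-determination form of R-Squared (one minus the ratio of residual to total sum of squares); some software instead reports the squared correlation between observed and predicted values, and the text does not state which definition was used. The example values are invented.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: typical size of a prediction error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of outcome variance explained: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical postoperative Oxford Hip Scores and model predictions.
observed = [40, 35, 44, 28]
predicted = [38, 36, 41, 31]
print(round(rmse(observed, predicted), 2))
print(round(r_squared(observed, predicted), 2))
```

Lower RMSE and higher R-Squared both indicate better predictive performance.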
We also used regression trees (recursive partitioning) with the same set of previously described outcomes and predictors. Regression trees complement the use of machine learning models as they represent the best cut-points for predictor values in the context of a given outcome after previous predictors have been taken into account. To avoid overfitting, we used cost-complexity pruning with a weakest-link strategy, successively collapsing the internal node that produces the smallest per-node increase in the cost-complexity criterion [27]. When overfitting was detected, those nodes were removed; otherwise, they were left intact. We have also provided a graphical representation of each model.
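For reference, the cost-complexity criterion for a regression tree is commonly written as shown below. This is the standard textbook form, given on the assumption that it matches the cited strategy; the symbols are not defined in the original text.

```latex
C_\alpha(T) = \sum_{m=1}^{|T|} N_m \, Q_m(T) + \alpha \, |T|,
\qquad
Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} \left( y_i - \hat{c}_m \right)^2
```

Here $|T|$ is the number of terminal nodes of tree $T$, $R_m$ is the region covered by terminal node $m$, $N_m$ is the number of observations in that node, $\hat{c}_m$ is their mean outcome, and $\alpha \ge 0$ is a tuning parameter that trades goodness of fit against tree size. Weakest-link pruning repeatedly collapses the internal node whose removal raises $\sum_m N_m Q_m(T)$ the least per node removed.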
All calculations were performed using the statistical language R and the packages ggplot2, caret, knitr, vcd, randomForest, MASS, glmnet, mda, pROC, corrplot, and tabplot. Finally, a Web application was developed using R Shiny to make our model results accessible to healthcare professionals at the point of care or for healthcare management purposes.
Table 1 reports information on our total study sample as well as a stratification by median values for pre-operative Oxford Hip Score, since this is an important predictor in our study. A total of 192,514 patients were considered eligible for inclusion in this study. We excluded 21,300 patients who were younger than 50 years old from the analysis, resulting in a total of 171,214 patients. Most patients were female (60.9%), between 70 and 79 years of age (39.1%), and presented a baseline Oxford Hip Score lower than 17. Several demographic and clinical characteristics were significant predictors of Oxford Hip Score, indicating disability after total hip replacement (p < 0.001). Demographic variables included older age and female gender. EQ-5D index scores were also predictive, as were clinical characteristics including (a) self-reported problems related to disability, mobility, self-care, ability to perform usual activities, discomfort, anxiety, pain, night and sudden pain, (b) problems with daily activities such as washing, transport, dressing, shopping, walking, limping, and climbing stairs, and (c) the presence of co-morbidities such as cardiovascular diseases (heart, high blood pressure and circulation diseases), stroke, lung disease, diabetes, kidney disease, nervous system disease, and depression.
Variable [Missing] | Total (171,214) | Pre-op Oxford Hip Score < 17 (78,152) | Pre-op Oxford Hip Score ≥17 (93,062) | p |
---|---|---|---|---|
Age band [0] | | | | < 0.001 |
- 50 to 59 | 21,443 (12.5%) | 10,123 (13%) | 11,320 (12.2%) | |
- 60 to 69 | 59,788 (34.9%) | 25,847 (33.1%) | 33,941 (36.5%) | |
- 70 to 79 | 67,012 (39.1%) | 29,945 (38.3%) | 37,067 (39.8%) | |
- 80 to 89 | 22,887 (13.4%) | 12,181 (15.6%) | 10,706 (11.5%) | |
- 90 to 120 | 84 (0%) | 56 (0.1%) | 28 (0%) | |
Female | 104,124 (60.9%) | 52,282 (66.9%) | 51,842 (55.8%) | < 0.001 |
Disability [147,319] | 13,853 (8.1%) | 7,984 (10.2%) | 5,869 (6.3%) | < 0.001 |
Heart disease [0] | 2,464 (1.4%) | 1,268 (1.6%) | 1,196 (1.3%) | < 0.001 |
High blood pressure [0] | 10,300 (6%) | 4,980 (6.4%) | 5,320 (5.7%) | < 0.001 |
Stroke [0] | 373 (0.2%) | 215 (0.3%) | 158 (0.2%) | < 0.001 |
Circulation disease [0] | 1,392 (0.8%) | 887 (1.1%) | 505 (0.5%) | < 0.001 |
Lung disease [0] | 2,141 (1.3%) | 1,158 (1.5%) | 983 (1.1%) | < 0.001 |
Diabetes [0] | 2,438 (1.4%) | 1,278 (1.6%) | 1,160 (1.2%) | < 0.001 |
Kidney disease [0] | 485 (0.3%) | 261 (0.3%) | 224 (0.2%) | < 0.001 |
Nervous system disease [0] | 197 (0.1%) | 117 (0.1%) | 80 (0.1%) | < 0.001 |
Liver disease [0] | 131 (0.1%) | 74 (0.1%) | 57 (0.1%) | 0.016 |
Cancer [0] | 1,354 (0.8%) | 644 (0.8%) | 710 (0.8%) | 0.163 |
Depression [0] | 2,000 (1.2%) | 1,222 (1.6%) | 778 (0.8%) | < 0.001 |
Arthritis [0] | 18,286 (10.7%) | 8,775 (11.2%) | 9,511 (10.2%) | < 0.001 |
Pre-operative EQ-5D index [9,419] | 0.35 (± 0.32) | 0.12 (± 0.26) | 0.55 (± 0.23) | < 0.001 |
Table 1: Sample description stratified by pre-operative Oxford Hip Score.
Exploratory analysis
When evaluating the association between the presence of disability and Oxford Hip Score, we found that the absence of functional disability before surgery was significantly associated with a higher postoperative Oxford Hip Score, indicating improvement in the ability to perform activities of daily living (Figure 1).
When evaluating the Oxford hip score in reference to pre-operative EQ-5D index score, we found that patients with high EQ-5D index scores representing best health status were associated with significant improvement in postoperative Oxford Hip scores, indicating an improvement in their ability to perform activities of daily living (Figure 2).
Model performance
When comparing different machine-learning models, the best performing algorithms were Gradient Boosting Machines, Boosted Generalized Linear Model, and Multivariate Adaptive Regression Splines, with R-Squared values of, respectively, 0.18, 0.18, and 0.18 (Table 2 and Figure 3).
Model | Root Mean Square Error | R-Squared |
---|---|---|
Stochastic Gradient Boosting | 7.92 | 0.18 |
Multivariate Adaptive Regression Splines | 7.95 | 0.18 |
Boosted Generalized Linear Model | 7.95 | 0.18 |
Random Forest | 8.08 | 0.16 |
Table 2: Model performance table predicting Oxford Hip Scores.
Although results regarding the most relevant variables for prediction of Oxford Hip Scores varied across models, pre-operative values for the EQ-5D index, shopping ability, and self-perceived disability were the most important variables across all models. Conditions related to the circulatory system were most important for Gradient Boosting Machines and Multivariate Adaptive Regression Splines, while preoperative shopping capacity was an important variable for Oxford Hip Score prediction using Stochastic Gradient Boosting and Boosted Generalized Linear Model (Figure 4).
Following the best available model, we designed a web-based calculator of Oxford Hip Scores based on baseline variables [28]. The surgeon can use our calculator to predict the post-operative Oxford Hip Score by entering the following patient information: age, gender, pre-operative disability, poor circulation, diabetes, depression, arthritis, pre-operative EQ-5D index score, and pre-operative problems with shopping, climbing stairs and limping. After entering this information, the user clicks the “Predict” button to obtain the predicted post-operative score.
Tree regression
Finally, a tree regression model was used to evaluate how sequential combinations of predictors would affect the outcomes. The model showed that the 50.6% of patients who had moderate, little or no trouble with shopping and climbing stairs before surgery had greater improvement in joint function, with a higher postoperative Oxford Hip Score of 41.3 (Figure 5).
To our knowledge, no previous study has used machine learning methods to explore clinical prognostic predictors of disability as measured by the Oxford Hip Score using a national sample. This makes our findings unique in that a wide range of variables was integrated to predict hip-related disability. Across all machine learning models, the most significant predictors of Oxford Hip Scores were pre-operative EQ-5D index and self-perceived disability, problems while shopping, circulation diseases, and pre-operative problems while climbing stairs. The best performing models were Gradient Boosting Machines, Boosted Generalized Linear Model, and Multivariate Adaptive Regression Splines with R-Squared values of, respectively, 0.183, 0.179, and 0.175.
Age and gender are established demographic predictors of outcomes following total hip replacement. These variables have been recently identified as important in the construction of algorithms to predict patients with poor outcomes following total hip replacement [29]. More specifically, the female gender and older age have been associated with lower improvement levels after total hip replacement alongside other clinical characteristics [30,31]. Other clinical symptoms have been reported to predict outcome following total hip replacement. For example, pain has been demonstrated in several studies to be an important predictor of poor outcomes following total hip replacement [29,30,32]. Similarly, co-morbidities such as osteoarthritis grade have been shown to determine outcomes following total hip replacement [30,32]. Other authors further described the loss of function or problems walking as predictors of outcome of total hip replacement [31,32].
While a number of previous studies have incorporated different variables in the prediction of outcomes after total hip replacement, to our knowledge the Oxford Hip Score has yet to be used as a measure of outcome, especially within a large sample with patients from an entire country. The present study has yielded a wider range of predictors of outcomes following total hip replacement. For example, specific activities such as mobility, self-care, washing, walking, limping, climbing stairs, and shopping are now included as predictors of outcomes after total hip replacement. Furthermore, preoperative EQ-5D, as well as specific co-morbidities affecting post-operative outcomes following total hip replacement, have been included in the list of predictors. Although previous reports predicted the Oxford Hip Score incorporating several variables, they mostly represent a limited sample drawn from cohorts or clinical trials [33]. In contrast, our results are representative of the entire English population. Moreover, an important strength of the present report is the evaluation of specific activities as predictors of the Oxford Hip Score, instead of only taking into account the global result of a scale representing the broader category of activities of daily living. For example, shopping and climbing stairs were two of the activities of major relevance to the machine learning models. Therefore, rehabilitation programs after total hip replacement should be designed to prioritize these specific activities in order to improve patient-reported outcomes.
When predicting the Oxford Hip Score, the most relevant variables were: EQ-5D index, preoperative disability, preoperative problems with shopping, circulatory diseases, and pre-operative problems with climbing stairs. In contrast, gender, age, BMI, pain, and problems with wearing stockings were the essential variables according to previous results [29]. Such differences might have resulted from our database having a greater degree of granularity related to specific activities of daily living, ultimately resulting in more distinct activities and scores. The incorporation of these variables in a web-based calculator might help clinicians estimate patients' outcomes. Additionally, our results could inform healthcare systems about resource allocation and management according to patient characteristics, thus advancing the field of precision orthopedics.
Our study has limitations that are inherent to an observational design. First, our diagnoses were not validated through agreement across different observers, thus introducing a potential bias. This limitation is currently being addressed in a study where some of our variables are formally tested to ensure acceptable levels of observer reliability. Second, we did not include self-reported measures of quality of life or dysfunction. These measures constitute an important metric in that they take into account a direct patient perspective, which is missing when only provider-driven metrics are used. This absence in our study was primarily driven by logistical reasons, in that the inclusion of self-report questionnaires would significantly increase the complexity of data collection across all participating sites. Third, despite our best efforts in controlling for missing rates, some of our variables had particularly high rates. To minimize this limitation, we made use of imputation algorithms followed by sensitivity analyses to ensure that our conclusions were valid under different assumptions. Finally, given that our sample was not randomly drawn from a larger patient population, its external validity can be questioned.
When the Oxford Hip Score is used as a measure of outcome after total hip replacement, many variables play a role in the prediction of outcome, the most important of which are the EQ-5D index, pre-operative disability, pre-operative problems with shopping, circulation diseases and pre-operative problems with climbing stairs. The incorporation of these variables in a Web-based calculator might assist not only with providing patients with a better estimation of what their outcomes will likely be but also with the management of healthcare resources to be used within a given patient population.