ISSN: 0974-276X
Research Article - (2013) Volume 6, Issue 3
HIV/AIDS symptoms, treatment efficacy, and types of treatment differ from one region or country to another. In this article, we discuss and analyze HIV/AIDS laboratory data from patients in the American Midwest. A total of 9,392 patient visits for 2,588 patients have been considered. The efficacy of different treatments, in terms of reducing the ribonucleic acid (RNA) level and/or increasing the CD4 count, has been analyzed. We provide data summary graphs that help to identify important patterns embedded in patients’ reactions to different therapies. We provide statistical evidence regarding treatment tolerance (in terms of the number of consecutive times a given treatment can be used on a given patient) and the most effective treatments (in terms of the rate of change in CD4 and RNA levels). We avoided using genetic sequences, so as to understand treatment efficacy based solely on basic laboratory tests (CD4 count and RNA level). We believe that, since genetic analysis cannot be easily obtained in resource-limited countries, it is important to investigate whether basic laboratory data (when collected in large quantities) are sufficient to determine the most effective treatments. We show that, before 2004, the percentage of CD4 counts >350 kept improving, rising from about 51% in 2000 to about 55% in 2004. During the same period, the percentage of undetectable RNA levels declined from about 65% to about 52%. We also show that both the healthy CD4 count of >350 cells/μl and the undetectable RNA level (<75 or <50 cells/μl, depending on the year of measurement) improved significantly after 2004, with the percentage of CD4 counts >350 going up from about 55% in 2004 to about 65% in 2007. We provide statistical analyses and efficacy evaluations for the most significant treatments throughout the years 2000-2008. All data and analysis tools are provided on our Midwest HIV/AIDS web portal (http://hivdatamining.com).
Keywords: HIV; AIDS; CD4; RNA; Data mining
In 2011, the National Institutes of Health published the Guidelines for the Use of Antiretroviral Agents in HIV-1-Infected Adults and Adolescents [1]. The report suggests that an adequate CD4 response for most patients on therapy is defined as an increase in average CD4 count in the range of 50-150 cells/μl per year, generally with an accelerated response in the first three months. Subsequent increases in patients with good virologic control average approximately 50-100 cells/μl per year, until a steady-state level is reached [2]. Patients who initiate therapy with a low CD4 count, or at an older age, may have a blunted increase in their count despite virologic suppression. Boyd [3] stated that although HIV treatment normally requires three antiretroviral drugs, it is very difficult to decide which regimen to choose for a given patient’s condition. Boyd [3] explains that, first, with over 20 antiretroviral agents in six different drug classes available today, numerous regimens can be created. Second, the decision about which regimen to initiate is based not only on safety and efficacy data from clinical trials, but also on baseline drug-resistance mutations, adherence-related factors, the potential for drug-drug interactions, and other patient-specific factors. Third, the data supporting when to initiate antiretroviral therapy are less definitive than the evidence for which regimen to initiate.
The Department of Health and Human Services (DHHS) has revised its guidelines for the use of antiretroviral agents in HIV-1 infection many times. In 1998, DHHS recommended that patients should be treated early and aggressively to eradicate HIV. Treatment was recommended for treatment-naive patients based either on a CD4 count (≤ 500 cells/μl) or on lower viral load thresholds (≥ 20,000 cells/μl). However, it was soon determined that HIV eradication was not feasible, due to the decreased quality of patients’ lives and the high possibility of generating early drug resistance. Hence, in 2007, a treatment deferral strategy was recommended by the DHHS guideline. Boyd [3] explains that low pill burdens, once-daily dosing, better adverse effect profiles, and higher potency led to increased adherence and better success with early treatment. In 2007, DHHS recommended earlier treatment at a CD4 cell count ≤ 350 and eliminated the use of viral load as a criterion to initiate therapy. In 2009, DHHS recommended starting treatment in patients with CD4 cell counts of <500 (back to the 1998 recommendation), and this recommendation remained unchanged in the 2011 guidelines [3,4]. The reader is strongly encouraged to read [5-9] for a detailed discussion and justification of the use of early treatment (<500 and <350).
In 2006, Rodriguez et al. [10] reported their research findings on 1,512 patients. They report that the median CD4 cell decreases among participants with HIV RNA levels of 500 or less, 501 to 2,000, 2,001 to 10,000, 10,001 to 40,000, and more than 40,000 copies/μl were 20, 39, 48, 56, and 78 cells/μl, respectively. They report that, despite this trend across broad categories of HIV RNA levels, only a small proportion of CD4 cell loss variability (4%-6%) could be explained by the presenting plasma HIV RNA level. Kovacs et al. [11] studied the effect of first line treatments on 1,098 women. They report that for untreated women between visits 1 and 2, the CD4 count declined, with averages of -31, -17, and -8 cells/6 months, whether RNA increased, remained stable, or decreased. Initial CD4 and RNA had an impact on the magnitude, but not the pattern, of decline (in the range of -11 to -35 cells/μl). For treated women, CD4 increased in those with persistently undetectable RNA (mean +7.6 cells/6 months), and decreased in those with maximum RNA over 1 M copies/μl (mean -13 cells/6 months).
In this research, we report major data patterns found in 46,960 datasets collected from real-life patient records in Midwest clinics. The objective is to provide HIV-1 data analysis based on real-life patient records. We focus on the following outcomes:
i. Providing data summary for the available datasets.
ii. Providing optimal CD4 count and RNA level for treatment initiation.
iii. Determining ranges of CD4 and RNA values where treatment potency increases.
iv. Providing and analyzing rate of change of CD4 count and RNA level related to different treatments.
v. Providing correlation analysis between the CD4 and RNA levels.
vi. Providing analysis for first, second, and third treatment lines.
vii. Discussing how treatment efficacy evolved during the last decade.
viii. Determining ranges of CD4 and RNA values where major treatments have similar effects.
A total of 9,392 patient visits for 2,588 patients have been considered. The following data are stored for each visit:
i. Visit date
ii. Prescribed treatments
iii. CD4 count (measured as number of cells/μl)
iv. RNA level (measured as viral load/μl)
All data are published on our http://www.hivdatamining.com web portal. We used the GGobi software [12] to plot most figures in this manuscript. Figure 1 depicts the distribution of patient visits in the http://hivdatamining.com database. The figure shows that 440 patients visited their clinics 2-6 times, 306 patients visited their clinics 6-10 times, 200 patients visited their clinics 10-14 times, etc. Although our database stores data on treatments that were available before 2009, we must draw attention to the fact that these treatments are still the major treatments in use at this point in time, since the only medication approved by the FDA [13] since 2008 is rilpivirine (Edurant), which was approved in 2011. Figure 2 depicts the number of available patient visits throughout the years. It is important to note that the depicted numbers of visits do not indicate the actual number of registered patients, since only available electronic data records have been added to the http://www.hivdatamining.com database. Furthermore, only the first three months of the 2008 patient records were provided to our database, and a major clinic’s patient records have not been provided after 2005. Figure 3 depicts the ranges of CD4 count and RNA level over the years. The dot on each bar indicates the average CD4 or RNA value. The bottom of each bar indicates the minimum value, and the top of each bar indicates the maximum value. The figure shows that significant improvement occurred from 2004 onward. This improvement is clearly seen in the gradual increase in the minimum, average, and maximum CD4 count, and in the decrease in the same three values of the RNA level. This is also clear from figure 4, which depicts the percentage of CD4 counts >350 (acceptable) and of undetectable RNA values (<50 or <75, depending on the technology used to measure the RNA level in different years).
Figure 3: Error bar graphs for the distribution of CD4 count and RNA level over the years.
Figure 4: Most used treatments and percentage change of CD4 count and RNA level along the considered years.
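As a minimal sketch of how yearly percentages like those in figure 4 could be computed from visit records, the Python snippet below counts, for each year, the share of readings with a CD4 count >350 and the share with an undetectable RNA level. The record layout `(year, cd4, rna)`, the function name `yearly_percentages`, and the per-year detection-limit mapping are assumptions made for illustration; they are not taken from the portal’s actual tools.

```python
from collections import defaultdict


def yearly_percentages(visits, cd4_threshold=350, rna_limits=None, default_rna_limit=50):
    """Return {year: (pct_cd4_above_threshold, pct_rna_undetectable)}.

    visits: iterable of (year, cd4, rna) tuples (hypothetical layout).
    rna_limits: optional {year: detection limit}, since the limit
    (<50 or <75) depends on the assay used in a given year.
    """
    rna_limits = rna_limits or {}
    counts = defaultdict(lambda: [0, 0, 0])  # readings, CD4 above, RNA undetectable
    for year, cd4, rna in visits:
        tally = counts[year]
        tally[0] += 1
        tally[1] += cd4 > cd4_threshold
        tally[2] += rna < rna_limits.get(year, default_rna_limit)
    return {
        year: (100.0 * above / total, 100.0 * undetectable / total)
        for year, (total, above, undetectable) in sorted(counts.items())
    }
```

Passing the detection limit per year keeps readings from different assay generations comparable, which is the reason the text quotes two limits.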
Figure 4 clearly indicates that a significant negative correlation has been obtained throughout the years. It is interesting to note that the average values of the CD4 count and RNA level indicate that treatments provided before 2004 did not have an effective impact on increasing the CD4 count or decreasing the RNA level. To understand the effect of treatments on the percentage improvement in CD4 count and RNA level, figure 4 also shows the major treatments (obtained from our database) over the years. The figure shows that the treatment “Kaletra, Truvada” (shown on the figure as KaTr) was used in 2005, and it clearly contributed to the suppression of the RNA level during the years 2004-2007. However, figure 4 shows that Trizivir (which was approved by the FDA in November 2000) had a positive impact on the CD4 count when it was heavily used in 2002. So, although Trizivir had a positive impact on the CD4 count, it is “Kaletra, Truvada” that had a positive impact on the RNA level. We will discuss treatment efficacy in more detail in the next section. It is, however, worth noting here that in 2003 the Food and Drug Administration (FDA) approved four regimens [13]: two Protease Inhibitors (PI), namely Lexiva and Reyataz; one Nucleoside Reverse Transcriptase Inhibitor (NRTI), namely Emtriva; and one fusion inhibitor, namely Fuzeon. In 2004, the FDA approved two NRTI regimens, Epzicom and Truvada.
Figure 5 depicts the distribution of CD4 count and RNA level associated with major treatments. The red circles on each line represent the average CD4 or RNA value for each treatment. The figure shows that an efficient treatment should provide an average CD4 count in the range of 400-500 cells/μl. Hence, considering a 500 cells/μl threshold for successful treatment is acceptable. Treatments providing an average CD4 count of less than 400 cells/μl are not likely to be categorized as successful. When this is applied to individual patients, figure 5 shows that although the CD4 count can fall anywhere in the range (0-1600) (with a few values beyond this limit), most CD4 counts are in the range (0-1000). Figure 5 shows the ranges that contain most CD4 values. To determine CD4 count ranges with similar patient responses to treatment, we calculated the average rate of change of CD4 count for patients within different ranges. We obtained rates of change by subtracting the current CD4 count (obtained from the current lab reading) from the next CD4 count (obtained from the next lab reading), and dividing the result by the number of treatment days between the two consecutive CD4 readings. We then grouped CD4 counts with close rates of change. With the range of CD4 count shown in brackets, the percentage rate of change shown in the first parentheses, and the percentage of patients with CD4 counts in the considered range in the second parentheses, we obtained the following ranges: [<125] (64%) (13.34%), [126-190] (33%) (10.75%), [191-376] (9.7%) (21.66%), [377-500] (3.6%) (17.76%), [501-725] (1.1%) (16.89%), [>726] (-3%) (19.59%). Figure 5 shows that there is a significant chance that the average CD4 count can reach 500 cells/μl if it is currently below this level. All treatments managed to keep the CD4 count in the range (300-500). However, it is interesting to note that “Kaletra, Truvada” did not do well, providing an average CD4 count of about 300 cells/μl.
Given that most treatments kept CD4 count in the range of (300-500), we suggest that it is safer to start initial treatments at about 400 (the average of 300 and 500).
Figure 5: Distribution of CD4 count and RNA level associated with major treatments.
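The grouping of readings by CD4 range described above can be sketched in Python as follows. The sketch assigns each pair of consecutive readings to one of the six reported ranges and returns, per range, the mean daily rate of change and the share of readings in that range. The tuple layout `(current_cd4, next_cd4, days)` and the function names are hypothetical, the boundary values 125 and 726 are assigned to the outer ranges by assumption (the text leaves them open), and the rate is reported in raw cells/μl per day rather than as the percentage quoted in the text, since the normalization is not specified.

```python
from bisect import bisect_left

# Upper bounds of the first five CD4 ranges reported in the text.
UPPER_BOUNDS = [125, 190, 376, 500, 725]
LABELS = ["<=125", "126-190", "191-376", "377-500", "501-725", ">=726"]


def range_label(cd4):
    """Map a CD4 count to the label of the range that contains it."""
    return LABELS[bisect_left(UPPER_BOUNDS, cd4)]


def summarize(readings):
    """readings: iterable of (current_cd4, next_cd4, days) tuples.

    Returns {label: (mean daily rate of change, percent of readings)}
    for every range that contains at least one usable reading.
    """
    sums = {label: [0.0, 0] for label in LABELS}
    for current, nxt, days in readings:
        if days <= 0:
            continue  # skip pairs without a positive duration
        entry = sums[range_label(current)]
        entry[0] += (nxt - current) / days
        entry[1] += 1
    total = sum(count for _, count in sums.values())
    return {
        label: (rate_sum / count, 100.0 * count / total)
        for label, (rate_sum, count) in sums.items()
        if count
    }
```

For example, `summarize([(100, 160, 30), (600, 630, 30)])` places one reading in the lowest range and one in the 501-725 range, each with its own mean daily rate.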
Figure 5 also shows that, when it comes to the RNA level, efficient treatments kept the average RNA level around 10,000 cells/μl. Therefore, we conclude that an RNA level of 10,000 cells/μl is a reasonable target. Figure 5 also shows that none of the major treatments was able to keep the average RNA level at the undetectable level (<50 cells/μl). In particular, “Kaletra, Truvada” did not do well in suppressing the RNA level (its average RNA level is about 1,000,000 cells/μl). Comparing the upper and lower charts of figure 5, we notice that, for example, “Epivir, Sustiva, Viread” did better at increasing the CD4 count than at controlling the RNA level.
Efficacy of the first three lines of treatments
Because of the drug resistance and side effects that result from HIV/AIDS treatments, we are interested in investigating the effects of frequently used HIV/AIDS treatments. The term “frequently used” treatment is defined here as a treatment that satisfies the following two criteria: 1. The treatment has been used at least three recurrent times as a first line treatment (on the same patient). 2. The treatment has been used on at least 30 patients (to support statistical significance testing with the bootstrap method). For each of the frequently used treatments, we calculated the daily rate of change (measured as number of cells gained/lost per day) of CD4 count and RNA level, using the following equation:
$$\text{Rate of change} = \frac{X(i+1) - X(i)}{\text{Duration}} \qquad (1)$$
In equation (1), the variable X represents either the CD4 count or the RNA level. The index i represents the date when the treatment was used, and the index (i+1) represents the date of the next laboratory reading after the treatment was used. The variable Duration represents the number of days between the date when a treatment is taken and the date of the first laboratory test after the treatment is taken. For example, to calculate the rate of change in the CD4 count after the use of the first treatment (starting from the base CD4 count), we subtract the CD4 count when the treatment was used (X(i), in this case the base CD4 count) from the CD4 count after the use of the first treatment (X(i+1)). We then divide the result by the number of days between the two readings. Hence, equation (1) gives the number of cells/μl gained or lost per day between two consecutive lab readings.
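The calculation described for equation (1) can be illustrated with a short Python sketch. The function names and the `(date, value)` layout of a patient’s readings are assumptions introduced for this example; the arithmetic follows the description above (difference of two consecutive readings divided by the number of days between them).

```python
from datetime import date


def daily_rate(x_current, x_next, date_current, date_next):
    """Equation (1): (X(i+1) - X(i)) / Duration, with Duration in days."""
    duration = (date_next - date_current).days
    if duration <= 0:
        raise ValueError("the next reading must come after the current one")
    return (x_next - x_current) / duration


def rates_between_readings(readings):
    """readings: list of (date, value) pairs sorted by date.

    Returns the daily rate of change for each pair of consecutive readings.
    """
    return [
        daily_rate(v0, v1, d0, d1)
        for (d0, v0), (d1, v1) in zip(readings, readings[1:])
    ]
```

A base CD4 count of 200 on 1 January 2005 followed by a count of 260 on 2 March 2005 (60 days later) gives `daily_rate(200, 260, date(2005, 1, 1), date(2005, 3, 2))`, a gain of one cell/μl per day.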
Figure 6 depicts the rate of change in CD4 count and RNA level for the first, second, and third treatment lines (depicted in the first, second, and third rows). The x-axis in figure 6 represents the rate of change for each of the three treatment lines, depicted on the y-axis. The gray circles represent the rate of change, and the red circles represent the average rate of change (for the considered treatments across all patients). The figure shows that the combination treatment Kaletra+Truvada, on average, added one CD4 cell/day when used as a first line treatment. This is followed by Atripla and “Epivir, Sustiva, Viracept”, with approximate averages of 0.5 and 0.3 CD4 cells/day, respectively. The figure also shows that the treatment combination Kaletra+Truvada, on average, reduced the RNA level by about 600 per day (when used as a first line treatment), and that Atripla follows Kaletra+Truvada in reducing the viral load. The second and third rows of figure 6 depict the most effective second and third line treatments in increasing the CD4 count and reducing the RNA level.
We implemented the bootstrap statistical model [14], such that each treatment was parsed as a combination of the 26 drugs available in our http://hivdatamining.com database (abacavir, amprenavir, atazanavir, darunavir, delavirdine, didanosine, efavirenz, emtricitabine, enfuvirtide, etravirine, fosamprenavir, indinavir, lamivudine, lopinavir, maraviroc, nelfinavir, nevirapine, preveon, raltegravir, ritonavir, saquinavir, stavudine, tenofovir, tipranavir, zalcitabine, zidovudine). All treatment combinations that occurred at least twice in the dataset, possibly in the same patient, were retained. In addition to all such treatment combinations, five additional covariates were considered: the baseline (at study entry) CD4 count and virus load of each patient, the clinic where the patient was treated, and whether the treatment included a protease inhibitor or NNRTI.
We separately considered first line treatments (treatments first applied to patients entering the study) and second line treatments (any treatment applied as a change of treatment while in the study). The response was the estimated slope of the CD4 response immediately after treatment start, i.e., computed from the first two time points on the new treatment. Covariates were incorporated in the model in a stepwise additive fashion. The procedure for adding covariates to the model started by computing the R² for every possible covariate not currently in the model. The covariate with the largest R² was tested for inclusion using a bootstrap of studentized residuals. If the p-value fell below 0.05, the covariate was added to the model, and the R² values were recomputed for the remaining covariates to begin the cycle again. If a covariate was rejected for inclusion, all other covariates, in order of declining R², were also tested for inclusion; for both first and second line treatments, however, no additional covariates were marked for inclusion.
The bootstrap method was used to calculate the statistical significance of each of the three lines of treatment. We used 101 bootstrap trials to test the statistical significance of the rate of change calculated using equation (1). A 95% significance test is performed (using the bootstrap) when the p-value obtained from the original data samples is in the range (0.001-0.1). No bootstrapping is performed when the p-value of the original data samples falls below or above this range.
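To make the idea of a 101-trial bootstrap at the 95% level concrete, the following Python sketch computes a percentile bootstrap interval for the mean daily rate of change of one treatment and checks whether that interval excludes zero. This is a simplified stand-in chosen for illustration, not the studentized-residual bootstrap of the regression model described above; the function names, the fixed seed, and the percentile method are all assumptions.

```python
import random
from statistics import mean


def bootstrap_mean_interval(rates, trials=101, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean daily rate of change."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(rates)
    # Resample the rates with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(rates, k=size)) for _ in range(trials))
    cut = int((1.0 - level) / 2.0 * (trials - 1))  # means trimmed from each tail
    return means[cut], means[trials - 1 - cut]


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0
```

With 101 trials, the sorted resample means have indices 0 to 100, so trimming two values from each tail leaves the interval between the third smallest and third largest mean, which is why an odd trial count is convenient.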
Figure 7 shows that, as a first line treatment, the Kaletra+Truvada combination is statistically effective for both CD4 count and RNA level, since its principal component (PC) p-value is 2.4E-005, df=264. Although Atripla provided statistically significant evidence (p-value 2.8E-009, df=264) of being an effective first line treatment for reducing the RNA level, there is no statistical evidence that it is effective, as a first line treatment, in increasing patients’ CD4 count. It is clear from figure 7 that the bootstrap method does not provide significant evidence that any of the second or third line treatments is consistently effective in increasing the CD4 count or reducing the RNA level. Although the bootstrap method shows that only one treatment is consistently effective as a first line and second line, we need to mention that figure 7 shows that the number of considered treatments (at a given line of treatment) has a significant effect on the bootstrap results. More frequently used treatments (for example, Kaletra+Truvada with df=264) provided better evidence of consistency (in improving patients’ HIV dynamics) than treatments that are less frequently used at a given line of treatment (1, 2, or 3).
As expected, depending on patients’ reactions to previous lines of treatment, the effect of treatments in increasing the CD4 count can vary significantly from one line of treatment to another. For example, as a first line medication, the combination Kaletra+Truvada managed to increase the average CD4 count (across all patients taking this medication at a base CD4 count) by about 100% (Figure 6). However, the second and third line results for the CD4 count show that the effect of this combination was reduced by approximately 50% from one line of treatment to the next. On the other hand, the above three figures show that Combivir+Viracept had a very limited effect on the CD4 count as a first line treatment, and its effect stayed low throughout the second and third treatment lines. The same reasoning applies to Trizivir and to most of the medications that had a moderate or no effect on the CD4 count during the previous treatment line. The previous discussion is also valid for the rate of change of the RNA level (Figure 6).
Treatment side effects
In this section, we use two decision support tools (numbers 2 and 3 on the “Healthcare Workers Queries” link) from our http://hivdatamining.com web portal. We use these tools to analyze possible side effects associated with major treatments. We focus our study on the selected treatments depicted in figure 5. However, the same approach can be used to analyze the side effects associated with the treatments depicted in figure 6. For each selected treatment, we provide information on the maximum number of recurrent prescriptions of the treatment and the overall percentile distribution of CD4 count and RNA level. The number of recurrent uses of a given treatment is a good indication of the side effects associated with that treatment. The more often a treatment is reused, the less likely it is to have generated serious drug resistance or side effects. For example, if a treatment is prescribed in the range of (0-3) consecutive times for 90% of the patients (who used that treatment), then it is most likely that the treatment generated drug resistance or serious side effects after a maximum of three recurrent prescriptions. The percentile distribution of CD4 and RNA values for a selected treatment is a good measure of the efficacy of that treatment, since it shows, for example, whether the treatment was able to keep the CD4 count at a high level (and/or the RNA level at a low level). Figure 8 depicts the percentiles of CD4 count and RNA level for selected treatments. The figure shows that Trizivir is slightly less effective than “Epivir, Sustiva, Viread” in terms of average CD4 count and RNA level. However, figure 8b shows that Trizivir managed to keep its patients’ CD4 count at an average of 500 cells/μl and their RNA level at an average of about 10,000 cells/μl, while “Epivir, Sustiva, Viread” (Figure 8a) provided an average CD4 count of 400 and an average RNA level of about 10,000.
More Trizivir patients (Figure 8b) were able to sustain a CD4 count >350 (about 60% of those taking Trizivir, compared to 52% of those on “Epivir, Sustiva, Viread”). Moreover, about 75% of the Trizivir patients were able to sustain an RNA level <50 cells/μl (undetectable), compared to 68% of the patients on “Epivir, Sustiva, Viread”. Figures 9a and 9b also show that 41% of the patients on Trizivir were able to use the treatment for more than three consecutive prescriptions. On the other hand, only 35% of the patients taking “Epivir, Sustiva, Viread” were able to use the treatment for more than three consecutive prescriptions. This indicates less severe side effects associated with Trizivir than with “Epivir, Sustiva, Viread”. Figure 8c depicts the percentile distribution of the CD4 and RNA values related to the “Kaletra, Truvada” treatment. The figure shows that, compared to Trizivir and “Epivir, Sustiva, Viread”, this treatment did not do well in terms of maximizing the percentage of patients with a good CD4 count (>350) and minimizing the percentage of patients with a high RNA level (>10,000). In addition, Figure 9c shows that about 90% of the patients on “Kaletra, Truvada” were not able to take the treatment for more than three consecutive prescriptions (which suggests that this medication is associated with severe side effects).
Figure 8: Distribution of CD4 count and RNA level associated with most effective treatments.
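The tolerance measure used in this section, the number of consecutive prescriptions of the same treatment, can be sketched in Python as follows. The input layout (a mapping from a patient identifier to that patient’s treatments in visit order) and the function names are hypothetical; the sketch only illustrates how the share of patients with more than three consecutive prescriptions could be derived from such records.

```python
from itertools import groupby


def longest_runs(prescriptions):
    """prescriptions: treatment names in visit order for one patient.

    Returns {treatment: length of its longest run of consecutive visits}.
    """
    runs = {}
    for treatment, group in groupby(prescriptions):
        length = sum(1 for _ in group)
        runs[treatment] = max(runs.get(treatment, 0), length)
    return runs


def share_tolerating(patients, treatment, min_run=4):
    """Percent of the patients who used `treatment` whose longest run of
    consecutive prescriptions is at least `min_run` (more than three by default).

    patients: {patient_id: [treatment names in visit order]}.
    Returns None when no patient used the treatment.
    """
    used = [longest_runs(history).get(treatment, 0) for history in patients.values()]
    used = [run for run in used if run > 0]
    if not used:
        return None
    return 100.0 * sum(run >= min_run for run in used) / len(used)
```

Only patients who actually received the treatment enter the denominator, matching the wording “of the patients (who used that treatment)” in the text.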
Data mining analysis performed on 9,392 HIV/AIDS patient visits (available on the http://hivdatamining.com web portal) shows that, before 2004, treatments aimed at improving the CD4 count, ignoring the fact that the RNA level kept rising. Figure 4 shows that, before 2004, the percentage of CD4 counts >350 kept improving, rising from about 51% in 2000 to about 55% in 2004. During the same period, the percentage of undetectable RNA levels declined from about 65% to about 52%. Figure 4 also shows that the effect of medications on HIV dynamics changed significantly after 2004. The figure shows that both the healthy CD4 count of >350 cells/μl and the undetectable RNA level (<75 or <50 cells/μl, depending on the year of measurement) improved significantly, with the percentage of CD4 counts >350 going up from about 55% in 2004 to about 65% in 2007, and the percentage of undetectable RNA levels going up from about 52% in 2004 to about 66% in 2007. We show in figure 4 that the combination “Kaletra, Truvada” has been the most significant treatment throughout the years 2000-2008. Since the FDA approved this treatment (in 2004), the annual average CD4 count has increased steadily by an average of 20 cells/μl, and the annual average RNA level has decreased steadily by an average of 4,500 cells/μl, with an average decrease of about 7,500 cells/μl during the years 2004-2005 and 2005-2006, and a decrease of about 500 during the year 2006-2007. Figure 3 shows that the average CD4 count increased from 430 cells/μl in 2004 to 480 cells/μl in 2007. The figure also shows that the average RNA level went down from about 25,000 cells/μl in 2004 to about 14,000 cells/μl in 2007. When it comes to major treatments, figure 5 shows that the average CD4 count for major treatments never exceeded 500 cells/μl (with the highest average, about 500 cells/μl, provided by “Epivir, Sustiva, Viread”, and the lowest average, about 300, obtained from “Kaletra, Truvada”).
Figure 5 also shows that “Epivir, Sustiva, Viracept” managed to keep the average RNA level at about 10,000 cells/μl, while “Kaletra, Truvada” was much less efficient in controlling the RNA level, since it kept the average RNA level at about 1,000,000 cells/μl. When considering the rate of change of CD4 count and RNA level for the first, second, and third treatment lines, we show in figure 6 that “Kaletra, Truvada” has been most effective as a first line treatment, with an average CD4 count increase of one cell/day (compared to the baseline CD4 count) and an average RNA level reduction of 600 cells/day. Details of the rates of change for the first, second, and third treatment lines are depicted in figure 6. We also studied the possible side effects of major treatments. Figures 9a and 9b show that, when considering effective treatments such as Trizivir (with an average CD4 count of 500 cells/μl), 41% of the patients were able to use the treatment for more than three consecutive prescriptions. On the other hand, only 35% of the patients taking “Epivir, Sustiva, Viread” (with an average CD4 count of about 400) were able to use the treatment for more than three consecutive prescriptions. This indicates less severe side effects associated with Trizivir than with “Epivir, Sustiva, Viread”. The reader is encouraged to visit the “Healthcare Workers Queries” link available on our http://hivdatamining.com web portal for more information and analysis regarding treatment efficacy and side effects.
This work is supported by Clarke University, Dubuque, IA, and the Department of Economic Development Grow Iowa Values Fund. The authors would like to express their deep gratitude for the counseling, financial support, valuable comments, and unlimited help and encouragement from both supporters. Dr. Mary Lou Caffery provided valuable suggestions to improve this manuscript and helped to secure valuable funding. Without the data and support provided by the University of Wisconsin-Madison and the University of Iowa, it would have been impossible to complete this work. We are very grateful to Dr. Jack Stapleton, the Director of the University of Iowa HIV Program and Director of the University of Iowa’s Helen C. Levitt Center for Viral Pathogenesis and Disease, for his support and encouragement.