Assessing Hepatoxicants based on High-throughput Quantitative SILAC
Proteomics and Causal Biological Networks

Ahmed Enayetallah; Sashi Nadanaciva; Paul Ajuh; Cristina Vazquez Martin; Amy Wheat; Angus Lamond; Daniel Ziemek; Karen L Leach

doi:10.4172/jpb.1000352

Research Article - (2015) Volume 8, Issue 3

View PDF Download PDF

Assessing Hepatoxicants based on High-throughput Quantitative SILAC Proteomics and Causal Biological Networks

Ahmed Enayetallah¹, Sashi Nadanaciva², Paul Ajuh³, Cristina Vazquez Martin³, Amy Wheat³, Angus Lamond³, Daniel Ziemek⁴ and Karen L Leach^2,⁵^*: ¹Translational Medicine, Biogen Idec, Cambridge, Massachusetts, USA; ²Compound Safety Prediction, Pfizer Global Research and Development, Pfizer, Inc, Groton, Connecticut, USA; ³Dundee Cell Products, Dundee, SCOTLAND; ⁴Computational Sciences Center of Emphasis, Pfizer Global Research and Development, Pfizer, Inc, Berlin, Germany; ⁵Pfizer Global Research and Development, Centers for Therapeutic Innovation, Boston, Massachusetts, USA

^*Corresponding Author: Karen L Leach, Safety Prediction, Pfizer Global Research and Development, Pfizer, Inc, Groton, Connecticut, USA, Tel: +1 860-441-4100

Abstract

Drug-induced liver injury can result in the termination of compounds in preclinical development, as well as withdrawal of marketed drugs. Identification of the signaling pathways and proteins involved in injury is an important step towards establishing assays that could be utilized to identify potential hepatotoxicants early in the drug discovery process. In this study we used high throughput quantitative mass spectrometry proteomics involving Stable Isotope Labeling with Amino acids in Cell culture (SILAC) leveraged by a recently developed systems biology approach (Causal Reasoning Engine, CRE) to investigate the effects of hepatotoxic compounds on the cellular proteome of HepG2 cells. Cells were treated with various concentrations of nefazadone, nimesulide, nomifensine, or glafenine, all of which cause hepatotoxicity in humans. Buspirone and rosiglitazone were used as comparator compounds not associated with hepatotoxicity. In comparison to more traditional proteomic analysis tools CRE results provided detailed molecular hypotheses that condense into biological networks and collectively explain a significant number of the measured protein changes. The CRE hypotheses demonstrate that magnitude of response is not necessarily the differentiating factor between DILI and non-DILI compounds but rather the biological processes implicated. Differentiating CRE molecular hypotheses implicate lipid and glucose metabolism, inflammation, oxidative stress and DNA damage as consistent major in vitro discriminating factors. Overall, this paper provides evidence that SILAC data coupled with the CRE analysis method can provide context and new insight into variation of stress responses in hepatotoxic versus non-hepatotoxic compounds even within the same therapeutic class.

Keywords: Drug-Induced Liver Injury, Causal reasoning, SILAC, Amino acids

Abbreviations

DILI: Drug-Induced Liver Injury; SILAC: Stable Isotope Labeling with Amino acids in Cell culture; GO: Gene Ontology; CRE: Causal Reasoning Engine; MeSH: Medical Subject Heading; Nim: Nimesulide; Nef: Nefazodone; Nom: Nomifensine; Gla: Glafenine; Bus: Buspirone; Ros: Rosiglitazone

Author Summary

Drug-induced toxicities can result in stopping preclinical development of a compound, or even in the withdrawal of compounds once they are marketed. What is lacking is a means of identifying early in the drug discovery process the potential safety liabilities of compounds. By the time a compound is tested in vivo, it is usually too late to change the compound profile and remove the toxicity. Testing compounds in in vitro cellular models is a widely used approach that can help identify potential toxicities, but this approach is limited by the endpoints that are measured. Compounds that are cytotoxic can be identified easily but there are many other in vivo toxicities for which in vitro correlates are not known. In this work, we have treated HepG2 liver cells with compounds known to cause liver toxicity, and profiled the resulting protein changes using SILAC mass spectrometry. We analyzed the results using a systems biology approach which identified biological networks associated with compound toxicity. Application of this approach to a wider range of compounds will be important in helping to build a proteomic database that can be utilized to catalog toxicity profiles.

Introduction

Drug-induced liver injury (DILI) is a major cause of hepatic failure, and also is the most common cause of failure of drugs during both preclinical as well as clinical development. A wide variety of drugs can result in liver injury, including antibiotics, CNS drugs and non-steroidal anti-inflammatory drugs (NSAIDS) [1]. Two categories of DILI have been designated: intrinsic and idiosyncratic [2]. Acetaminophen (APAP) overdose is a well-studied example of intrinsic hepatotoxicity, which results from metabolism of APAP to the electrophilic N-acetylp- benzoquinoneimine which can covalently bind to cellular proteins [3]. Idiosyncratic DILI is observed in only a small percentage of treated patients, but overall, accounts for more than 10% of all cases of acute liver failure [2]. The manifestation of DILI is quite heterogeneous, and includes a clinical spectrum from an increase in liver enzymes to acute liver failure. Because of this, DILI can be difficult to diagnose, and there is a lack of standard diagnostic tests and biomarkers [2,4].

Besides the need for clinical tools, there also is a need for preclinical in vitro models which could be used both to identify potential hepatotoxicants as well as to study the underlying cellular pathways contributing to toxicity. A variety of injury mechanisms are involved in the hepatotoxic response, including cellular and oxidant stress, increased lysosomal permeability, mitochondrial inhibition, and opening of the mitochondrial permeability transition pore, resulting in apoptosis or necrosis [2]. Activation of these pathways in in vitro cell models following treatment with hepatotoxic drugs is associated with changes in many cellular proteins, including an increase in enzymes leading to DNA fragmentation and loss, c-jun-N-terminal kinase (JNK) activation, mitochondrial translocation of proteins such as Bax, and release of apoptotic proteins [2,5].

Multiple approaches have been used to examine these pathways and monitor changes in cellular proteins, including transcriptomics, 2-D gel electrophoresis, and metabolomics [6-8]. In this work we have utilized a sensitive mass spectrometry approach, SILAC, to examine protein changes in HepG2 cells in response to a selection of hepatotoxicants. SILAC is most appropriate for the comparative study of protein expression in cells with and without drug treatment [9-11]. It is a relatively simple technique, labeling is metabolic and the approach facilitates the quantitative analysis of a large number of cellular proteins in an unbiased way in a single experiment.

Using the SILAC proteomics approach, four hepatotoxic compounds were profiled including nimesulide (Nim), nefazadone (Nef), nomifensine (Nom), and glafenine (Gla), all of which have been withdrawn from the market in many countries due, at least in part, to liver toxicity. For these compounds, the exact mechanisms resulting in hepatotoxicity are not known, and thus it is of interest to determine whether an analysis of the proteomic changes resulting from cell treatment with these compounds identifies new pathways that may contribute to toxicity. The proteomic changes resulting from these compounds were compared to the proteomic profiles resulting from HepG2 cells treated with buspirone (Bus) and rosiglitazone (Ros), two compounds not identified with hepatotoxicity. Using SILAC analysis, several thousand proteins were identified and quantified under each condition.

For data analysis we used standard enrichment analysis on functional categories from the Gene Ontology and a novel causal reasoning engine method, CRE, to infer the upstream biological events that likely caused the observed protein changes. CRE is an algorithm originally designed to analyze microarray transcriptomic data [12]. Here we adapted the algorithm to interrogate proteomics results in context of causal statements derived from the biomedical literature to infer upstream molecular events driving these protein expression changes. The inferred upstream events (also called hypotheses) are aggregated into biological models using a set of analytical tools that allow for evaluation and integration of the hypotheses in context of their supporting evidence from prior biological knowledge. Causal reasoning is similar to gene set enrichment with two main advantages. The first advantage is that it provides inferences on molecular mechanisms by using sets of molecular entities (e.g. genes, proteins, metabolites) corresponding to the effects of known perturbations. Second, it accounts for the direction of the expression changes and hence the direction of the generated hypothesis can be calculated [13]. Causal reasoning facilitates the analysis of large scale data such as proteomics experiments and allows inferences of biological perturbations that cannot be observed at the protein expression level. This allows us to predict the actual functional status of genes and proteins even if we do not observe any change in their corresponding protein expression level. This is largely leveraged through the utilization of previous biological knowledge that is exponentially growing. In the experiments described here, the use of CRE to analyze the SILAC data appears to be as robust as when CRE is used to analyze transcriptomic data. CRE generated both concise molecular hypotheses (such as a predicted increase in sterol regulatory binding protein (SREBF1) transcription activity, SREBF+) and provided their biological context (such as lipid and glucose metabolism).

Materials and Methods

Experimental design

Cell culture and sILAC treatment: A triple labeling strategy for cell culture (R0K0, R6K4 and R10K8 was utilized [14]. HepG2 cells were cultured in 150 mm dishes. The compounds selected for treatment of the cells were buspirone (Bus), nefazodone (Nef), glafenine (Glaf), nimesulide (Nim), nomifensine (Nom) and rosiglitazone (Ros). To allow standard SILAC triplex comparison of samples, the six compounds were divided into three groups prior to treatment of labeled cells as follows: Group 1 (Nef: R6K4, Bus: R10K8 and DMSO control: R0K0), Group 2 (Nim: R6K4, Glaf: R10K8, DMSO control: R0K0), Group 3 (Nom: R6K4, Ros: R10K8 and DMSO control: R0K0).

Compounds were each dissolved in dimethyl sulphoxide (DMSO). Cells were cultured in SILAC media (i.e. DMEM containing 10% fetal calf serum, ¹³C and ¹⁵N labelled arginine and lysine or lysine labelled with deuterium, 2mM glutamine, 100 U penicillin/0.1 mg/ ml streptomycin) for a minimum period of 10 days prior to drug treatments and harvested by trypsinization. Cells were treated with compounds at 3 μM, 10 μM or 100 μM concentrations, or, in the case of Nef, 3, 10 or 40 μM. After incubation for 24, 48 or 72 h, the cells were collected by trypsinization and cell lysates prepared by sonication in a lysis buffer containing 50 mM Tris, pH 7.5, 0.5 M NaCl, 1% NP- 40, 1% DOC, 0.1% SDS, 2 mM EDTA. Prior to the analysis of the six compounds, preliminary experiments had been performed to check the labeling efficiency of the homogenous cell line HepG2 (data not shown). Protein assays for each time point/drug concentration and controls were performed in triplicates. HepG2 cells were cultured and treated identically in multiple petri dishes. Triplicates were then pooled and each three pooled samples from each group were mixed (1:1:1). Following this mixing all processing was carried out in a single tube thus avoiding any processing errors prior to LC-MS/MS analysis. For each time point and drug concentration, total protein concentrations were determined using the Bradford assay for unlabeled arginine and lysine (R0K0), ¹³C labelled arginine and deuterium labelled lysine (R6K4), and ¹³C and 15N labelled arginine and ¹³C labelled arginine and ¹⁵N labelled lysine (R10K8) cell lysates. Equal amounts of total protein from the respective cell lysates were then mixed, proteins separated by SDS PAGE and processed as described in the next section.

Gel electrophoresis and in-gel digestion: Each sample was reduced in SDS PAGE loading buffer containing 10mM DTT and alkylated in 50 mM iodoacetamide prior to being boiled and then separated by onedimensional SDS–PAGE (4–12% Bis-Tris Novex mini-gel, Invitrogen) and visualized by colloidal Coomassie staining (Novex, Invitrogen). The entire protein gel lane was excised and cut into 10 gel slices each. Every gel slice was subjected to in-gel digestion with trypsin [15]. The resulting tryptic peptides were extracted by 1% formic acid, acetonitrile, lyophilized in a speedvac (Helena Biosciences) and resuspended in 1% formic acid.

Mass spectrometry: Trypsin digested peptides were separated using a Dionex Ultimate 3000 nanoflow LC-System consisting of a solvent degasser, a nanoflow pump, and a thermostated microautosampler. 10 μl of sample was loaded with constant flow of 500 nl/min onto a 15 cm fused silica emitter with an inner diameter of 75 μM (Proxeon Biosystems) packed in-house with reverse-phase ReproSil-Pur C18-AQ 3 μM resin (Dr. Maisch GmbH). Peptides were eluted with a segmented gradient of 10-60% solvent B over 100 minutes with a constant flow of 200 nL/min. The HPLC system was coupled to an LTQ-Orbitrap mass spectrometer (Thermo-Fisher Scientific) via a nanoscale LC interface (Proxeon Biosystems). The spray voltage was set to 2.3kV and the temperature of the heated capillary was set to 180ºC. Survey full scan MS spectra (m/z 300-1700) were acquired in the Orbitrap with a resolution of 60,000 at m/z 400 after accumulation of 1,000,000 ions. The five most intense ions from the preview survey scan delivered by the Orbitrap were sequenced by collision induced dissociation (normalized collision energy 40%) in the LTQ after accumulation of 5,000 ions concurrently to full scan acquisition in the Orbitrap. Maximal filling times were 1,000 ms for the full scans and 150 ms for the MS/MS scans. Precursor ion charge state screening was enabled and all unassigned charge states as well as singly charged species were rejected. The dynamic exclusion list was restricted to a maximum of 500 entries with a maximum retention period of 180 seconds and a relative mass window of 15ppm. The lock mass option was enabled for survey scans to improve mass accuracy [16]. Data were acquired using the Xcalibur software.

Quantification and bioinformatics analysis: Quantification was performed with MaxQuant version 1.0.7.4 [17], and was based on twodimensional centroid of the isotope clusters within each SILAC pair. To minimize the effect of outliers, protein ratios were calculated as the median of all SILAC pair ratios that belonged to peptides contained in the protein. The percentage variability of the quantitation was defined as the standard deviation of the natural logarithm of all ratios used for obtaining the protein ratio multiplied by a constant factor 100.

The generation of peak list, SILAC- and extracted ion currentbased quantitation, calculated posterior error probability, and false discovery rate based on search engine results, peptide to protein group assembly, and data filtration and presentation was carried out using MaxQuant. The derived peak list was searched with the Mascot search engine (version 2.1.04; Matrix Science, London, UK) against a concatenated database combining 80,412 proteins from International Protein Index (IPI) human protein database version 3.6 (forward database), and the reversed sequences of all proteins (reverse database). Parameters allowed included up to three missed cleavages and three labeled amino acids (arginine and lysine). Initial mass deviation of precursor ion and fragment ions were up to 7 ppm and 0.5Da, respectively. The minimum required peptide length was set to 6 amino acids. To pass statistical evaluation, posterior error probability (PEP) for peptide identification (MS/MS spectra) should be below or equal to 0.1. The required false positive rate (FPR) was set to 5% at the peptide level. False positive rates or PEP for peptides were calculated by recording the Mascot score and peptide sequence length-dependent histograms of forward and reverse hits separately and then using Bayes’ theorem in deriving the probability of a false identification for a given top scoring peptide. At the protein level, the false discovery rate (FDR) was calculated as the product of the PEP of a protein’s peptides where only peptides with distinct sequences were taken into account. Proteins were quantified if at least one MaxQuant quantifiable SILAC pair was present. Identification was set to a false discovery rate of 1% with a minimum of two quantifiable peptides. The set value for FPR/ PEP at the peptide level ensures that the worst identified peptide has a probability of 0.05 of being false; and proteins are sorted by the product of the false positive rates of their peptides where only peptides with distinct sequences are recognized. During the search, proteins are successively included starting with the best-identified ones until a false discovery rate of 1% is reached; an estimation based on the fraction of reverse protein hits. Enzyme specificity was set to trypsin allowing for cleavage of N-terminal to proline and between aspartic acid and proline. Carbamidomethylation of cysteine was searched as a fixed modification, whereas N-acetyl proline and oxidation of methionine were searched as variable modifications. In total, 25237 peptides and 2754 proteins were identified of which 24916 peptides and 2234 proteins were quantified. Only proteins identified from at least two different peptides were selected for further bioinformatics analyses.

Gene ontology (GO) analysis: Gene ontology analyses were performed using the software Perseus version 1.2.7.4. The selected up- and down-regulated protein data for each drug were loaded separately into Perseus using the proteingroups.txt file format generated by MaxQuant. The appropriate ‘Expression’, ‘Categorical annotation’, ‘Numerical annotation’, ‘Multi-numerical annotation’ or ‘Textual annotation’ for each column was selected. In Processing Row annotation, the annotation was added and GO terms added to the data using UniProt identifiers to map them. The files were then exported in text format. The exported files were opened in Excel and GO terms, i.e. biological process, cellular compartment, molecular function, etc. sorted alphabetically. We used the GO-term “biological process” for this analysis because it is the most informative with respect to determining biochemical pathways that may be affected by the compounds. The number of times the GO term was mentioned for each drug was added and the data used to create a table.

Causal reasoning engine algorithm: The Causal Reasoning Engine (CRE) follows the general methodology introduced in Pollard et al. [18]. The CRE algorithm of Chindelevitch et al. [12] provides novel statistical measures to assess relevance of upstream regulators to interpret the observed gene (or protein) expression changes. Briefly, the method utilizes a large collection of curated causal statements in the form:

A [increases or decreases] B, where A and B are measurable biological entities.

The biological entities can be of different types such as phosphorylated proteins, transcript levels, biological process or compound exposure. Each statement is tied to peer-reviewed articles. In this study, we licensed approximately 450,000 causal statements from the commercial vendors Ingenuity Systems and Selventa.

CRE generates hypotheses that are likely explanation of downstream gene or protein changes based on the curated statements in the knowledge base. We consider two metrics to quantify the significance of a hypothesis with regards to experimental data set, namely enrichment and correctness. The Enrichment p-value for a hypothesis calculates the statistical significance of finding (#incorrect + #correct) genes within the set of all genes downstream of a hypothesis. The exact p-value can be computed by a Fisher’s exact test [19,20].

The Correctness p-value is a measure of significance for the score of a hypothesis h defined as (#correct - #incorrect). As desired, this score is high, if the number of correct predictions exceeds the number of incorrect predictions. To ensure statistical significance under a null model of randomly re-assigning up- and downregulated transcripts to arbitrary nodes, we compute the distributions for this score and derive appropriate p-values. The Causal Reasoning Engine is implemented in the statistical programming language R [21].

Significant CRE hypotheses can be visualized as biological networks (e.g. Figure 1) in which nodes represent the implicated molecular entities. Nodes are colored to indicate a predicted up- (yellow color) or down-regulation (blue color). Directed edges between the nodes refer to known causal relationships in the literature among the hypotheses. These edges are drawn from the same curated knowledgebase that underlies the upstream inference described above. As a result implicated hypotheses can be seen in biological context and interpreted more easily.

proteomics-bioinformatics-biological-nimesulide-predicted

Figure 1: CRE biological network for Nimesulide 3 μM 48 h (yellow=predicted increase, blue=predicted decrease).

MeSH analysis of CRE hypotheses: Evidence for a particular molecular hypothesis is always linked to one or more biomedical articles. If the particular article is indexed by Medline, Medical Subject Headings (MeSH terms) describing key aspects of the article are often available [22]. MeSH terms are naming descriptors of varying specificity drawn from a comprehensive hierarchy of biological terms (e.g. “cardiac arrythmia”, “blood”, “diabetes”, “liver”). All MeSH terms of articles in the selected hypotheses are aggregated. For each MeSH term the number of occurrences in the set of articles are counted and compared to the general frequency of occurrence of the particular MeSH term in all articles underlying the causal network. Statistical significance of this ratio is assessed by Fisher’s exact test. All MeSH terms significanct at a level of p<0.001 are reported.

Results

SILAC quantitative proteomics profiling

HepG2 cells were treated with 3, 10 or 100 μM concentrations of the six compounds Gla, Nef, Nim, Nom, Bus and Ros for 24 to 72 h. Pilot experiments were carried out to confirm that the compounds were not overtly toxic to the HepG2 cells. Except for Nef, none of the compounds had any effect on cell growth when measured at concentrations as high as 300 μM (data not shown). Nefazadone inhibited cell proliferation with an IC₅₀ of 38 μM (data not shown), and thus the highest concentration of that compound used in these studies was adjusted to 40 μM.

SILAC technology was used to quantify the compound effects on the cellular proteome. In total over 2,000 proteins were identified using high-resolution mass spectrometry with a false discovery rate of 1% (Table S1). MaxQuant software analysis was utilized for protein identification and quantitative data analysis. A cutoff of significant protein changes of 1.5 was determined empirically for selecting proteins to be used in functional and bioinformatics analyses. Using protein changes ≥ 1.5 fold (as significant) compared to the DMSO control, the largest number of protein changes occurred for Nom and Ros, with about 120-180 proteins with decreased levels and 100-140 up-regulated proteins (Figure 2). In contrast, Gla treatment of cells resulted in increases or decreases in only about 20-40 proteins. These results demonstrate that the number of protein changes alone was not a predictor of compound toxicity. The time course of the protein changes was monitored over 72 h. In general, protein changes increased throughout the period of cell treatment. In the case of Bus, protein changes were maximal at the 48 h time point and decreased at 72 h.

proteomics-bioinformatics-global-protein-quantitation

Figure 2: Global analysis of protein changes. Bar chart showing significant protein changes in HepG2 cells treated with 10 μM concentration of each of the six compounds at three different time points as described in the Materials and Methods section. A, Quantitation of down-regulated proteins; B, Quantitation of up-regulated proteins. The panel below the bar charts shows the number of proteins affected compared to the DMSO controls.

Gene ontology analyses using SILAC data

To gain insight into the effects of the compounds on HepG2 cellular processes, gene ontology (GO) analyses was performed on the SILAC data. The GO terms tables produced were used to create pie charts for all protein changes of the GO term “biological process” for each drug (Figure 3 and Table S1).

proteomics-bioinformatics-drug-concentration-biological

Figure 3: Pie charts obtained from GO analysis performed. The SILAC data used for the GO analyses were obtained by treating HepG2 cells with the respective compounds at 10 μM drug concentration for 48 hrs at 37ºC. The numbers on each sector of a pie chart represent the number of proteins that changed significantly in the biological process. The colours of the sectors of the pie charts represent specific biological processes to which the protein changes have been allocated by the GO analyses (see key at the bottom of the figure). Note that not all proteins that changed could be allocated to a biological process due to lack of functional and structural information about these proteins in the databases used for GO analyses.

The pie charts from the GO analyses representing high level biological processes affected by treatment with 10 μM of Bus or Nim indicate that the top processes affected were identical for both compounds (Figure 3). These processes were nucleic acid metabolic processes, carbohydrate metabolism, transport, and organic acid metabolism. The relative number of proteins affected differed slightly for the two drugs, however, with 28 protein changes attributed to nucleic metabolic processes for Bus versus 13 in the same category for Nim. Similar GO analysis was carried out for the other drugs used in this study (Table 1). Overall, the results from the GO analysis for all the compounds show that the top 5-6 biological processes (involving >50% of the protein changes identified in the SILAC analysis) were about 80% similar between the 6 compounds, despite the fact that no biological target is common to all the compounds. Several biological processes were not shared, however; for example protein changes attributed to RNA splicing were affected only by Bus, Nom and Nef, and chromatin organization and protein folding also were not shared across all compounds. These differences may be linked to differences in cellular targets or secondary drug effects including toxicity. Overall, the results suggest that GO analysis may be most useful in identifying the broad categories of proteomic changes, but that it does not identify specific mechanisms of action, nor mechanisms of toxicity. In order to address this issue, a Causal Reasoning Engine (CRE) approach was used to analyze the protein changes.

Biological processes	Nom	Ros	Nim	Nef	Bus	Glaf
nucleobase, nucleoside, nucleotide and nucleic acid metabolic process	27	23	13	10	28	2
organic acid metabolic process	26	19	6	9	10	5
transport	13	13	7	6	11	5
carbohydrate metabolic process	12	7	6	3	8	3
metabolic process	7	4	2	3	5
alcohol metabolic process	7	2	1	2
protein modification process	6	3	3	3	4	2
proteolysis	5	2	2	4	4
protein folding	5	3	4		1
lipid metabolic process	4	3	2		2	1
protein complex assembly	3	3	4	1	2	2
cell morphogenesis	3	3	3	4		2
RNA splicing, via transesterification reactions	3			3	8
immune system process	3	3	1	4	1
generation of precursor metabolites and energy	2	6		6	12
translation	1	3	4	5	6	1
system process		2	3	2	4	4
chromatin organization		1	4	1	5	1

Table showing the number of protein changes observed when HepG2 cells were treated with the respective compounds at 10 μM drug concentration for 48 hrs at 37°C. The numbers in the table represent the number of proteins that changed significantly for each biological process following drug treatment.

Table 1: Gene Ontology Analysis of Protein Changes in Compound Treated Cells.

Interrogation of the observed protein changes using the Causal Reasoning Engine

The CRE was used to interrogate the observed protein abundance changes in light of prior biological knowledge and to identify the biological events likely to explain these changes. CRE hypotheses were analyzed and integrated into biological networks as previously described [12,13]. Briefly, the biological events predicted by CRE were ranked based on their correctness and enrichment statistical significance (p-values) and the top ranking hypotheses were used for clustering analysis as well as construction of biological networks based on their overlap in supporting evidence and known biological relationships between biological entities.

Figure 4 shows unsupervised hierarchical clustering for CRE hypotheses that ranked 1-20 in at least 2 of the 54 cell treatments (Table S2). The hierarchical clustering identified two main clusters. The first cluster includes all concentrations and time-points from Nef, Nim and Gla and the other major cluster included all treatments from Ros, Nom and most treatments of Bus. Subsequent to the unsupervised clustering, a t-test was used to identify the hypotheses that are significantly different between the two groups. Figure 5 shows the clustering of hypotheses that are significantly different between the two groups with a p-value <0.001. Hypotheses that are significant in the Ros/Nom/Bus cluster are enriched for lipid and glucose metabolism such as SREBF1+, INSIG1+, cholesterol- and IGF+ as well as inflammation related hypotheses such as lipopolysaccharide+, IL1A+, JAK2, Dexamethasone+ and other hypotheses such as HIF1A+, SOD2-, EGR2+ and BRCA1+. On the other hand, the Nef/Gla/Nim cluster is significant for hypotheses such as Smoke, cigarette+ (surrogate for oxidative stress induced injury), RHOA/ROCK1- (cytokinesis and transcriptional activation), AHR+ (xenobiotic metabolism) and EIF4G1-. (protein synthesis).

proteomics-bioinformatics-hierarchical-clustering-euclidean

Figure 4: Unsupervised hierarchical clustering computed on Euclidean distance of the correctness p-values of the CRE generated hypotheses shows two main clusters that discriminate DILI from non-DILI compounds (Red=Min, Orange=0.01, Yellow=0.05, Grey=>0.05).

CRE biological networks for Nim and Bus

The biological networks implicated by the CRE analysis can be visualized, and Figures 1 and 6 show a graphical representation of the CRE mechanistic hypotheses that best explain the observed protein changes. Results for Nim, Nef and Gla were all highly similar, while results for Bus, Nom and Ros formed a distinct second group. Therefore, results for Nim and Bus are shown as representatives of the 2 compound sets. Following treatment of cells with 3 μM Nim for 48 h, the major biological processes reflected by the CRE hypotheses include oxidative stress (e.g. Reactive Oxygen+, NFE2L2+, nitric oxide+). Downstream of the oxidative stress are a number of hypotheses suggestive of DNA damage and apoptosis (e.g. TP53+, etoposide+ and PARP1+). In addition, Medical Subject Heading (MeSH) term analysis of the supporting protein changes of individual hypotheses such as HNF4A+ is consistent with protein changes that have been observed in previous literature in association with liver dysfunction, including steatosis and hepatomegaly [23].

Figure 5: Hierarchical clustering computed on Euclidean distance of the correctness p-values of CRE hypotheses that are significantly different (t-test p-value <0.001) between the DILI and non-DILI compound groups (Red=Min, Orange=0.01, Yellow=0.05, Grey=>0.05)

proteomics-bioinformatics-biological-network-buspirone

Figure 6: CRE biological network for Buspirone 3 μM at 48 h (yellow=predicted increase, blue=predicted decrease).

In contrast, Figure 6 shows the biological network that best explains the observed gene expression response to Bus 3 μM at 48 h. Response to Bus is largely predominated by hypotheses of glucose/carbohydrate metabolism (e.g. carbohydrate+, D-glucose+, MLX+ and MLXIPL+) and lipid/cholesterol metabolism (e.g. SREBF1+, SREBF2+ and SCAP+). MeSH analysis of individual hypotheses is strongly suggestive of energy metabolism changes and changes in gene expression, which could be a reflection of cellular adaptative changes to compound treatment.

CRE MeSH term enrichment analysis

term enrichment allows us to gain further insights into the context of the generated CRE hypotheses since many genes may have diverse biological functions. This is done by calculating the significance of the presence of a specific MeSH term in the manuscripts supporting the generated CRE hypotheses. Table 2 shows all the MeSH terms with a p-value <0.001 for the Nim and Bus treatments of 3 μM at 48 h. The most significantly enriched MeSH term for Nim is “oxidative stress” in contrast to “cell hypoxia” and “anoxia” which are the most significant terms for Bus. MeSH terms significant for Nim only include “Neoplastic gene expression Regulation”, Pulmonary Disease, Chronic Obstructive”, “Liver Neoplasms”, and “Carcinoma, Hepatocellular”. All of these terms denote more severe physiological phenotypes than those identified in the Bus MeSH terms.

Nimesulide 3µM 48hrs		Buspirone 3µM 48hrs
MeSH Term	p-value	MeSH Term	p-value
Oxidative Stress	1.55E-12	Cell Hypoxia	3.44E-26
Gene Expression Regulation, Neoplastic	6.70E-07	Anoxia	1.30E-14
Pulmonary Disease, Chronic Obstructive	2.04E-06	Lipid Metabolism	1.40E-13
Liver Neoplasms	2.37E-06	Gene Expression Regulation, Enzymologic	2.43E-08
Carcinoma, Hepatocellular	2.48E-06	Gene Expression Regulation, Neoplastic	3.15E-07
Lipid Metabolism	8.36E-05	Glycolysis	1.17E-06
Emphysema	1.05E-04	Fasting	3.54E-05
Metabolic Detoxication, Phase II	3.13E-04	Genes, myc	8.53E-05
Gene Expression Regulation, Enzymologic	3.24E-04	Osmotic Pressure	1.03E-04
Oxidation-Reduction	6.61E-04	Oxidation-Reduction	2.08E-04
Organ Size	9.24E-04	Adaptation, Physiological	4.17E-04

The MeSH term enrichment analysis provides the biological context of the evidence supporting the CRE molecular hypotheses. The p-values are computed by Fisher’s exact test.

Table 2: MeSH Terms enriched for the CRE hypotheses.

Discussion

The application of proteomics is a powerful approach for investigating mechanisms of compounds. Initially this technique was applied primarily to profiling the mechanism of action of compounds, but has subsequently been applied to toxicology research [24,25]. Compound toxicity is often manifest at a phenotypic level, with little or no indication of underlying mechanisms, and thus a major advantage of a proteomic approach is the broad, unbiased view it provides. Protein levels are measured directly, in contrast to transcriptomic approaches, where mRNA levels do not always correlate with protein expression due to post-transcriptional changes which may affect protein abundance. Utilization of the SILAC technique has the added advantage of quantification of the protein changes due to the metabolic labeling of cells [26].

Data integration and interpretation remain among the main challenges to effectively utilize large scale omic data to answer complex scientific questions. CRE was developed to address this hurdle whereby the CRE algorithm infers the probable biological explanation(s) of the observed experimental changes by interrogating a knowledge base of biological relationships captured from published literature. The knowledgebase used in this work is from commercial sources, but the academic community is increasingly perceiving the value of curated causal content and we expext more public databases to include such content in the future. The CRE method has been previously applied to condense a large number of differentially expressed transcripts into a smaller, biologically relevant set of upstream regulators [13,27]. In this work, we applied the method to SILAC-derived measured protein level changes with a similar goal. GO analysis of the changes provided inferences at the biological process level such as carbohydrate metabolic process and lipid metabolic process. CRE analysis identified these same processes but provided higher resolution information and molecular details of the pathways perturbed and the directionality of change. The MeSH analysis provides additional evidence to enhance the understanding of the context of the CRE molecular hypotheses.

In our study, six compounds were investigated, four of which are associated with idiosyncratic DILI: Nim, Nef, Gla, and Nom. The remaining two compounds, Bus and Ros, are not associated with DILI. The results of CRE analysis and hierarchical clustering showed a clear classification of the compounds into 2 groups. Group 1 contained Nim, Nef and Gla while Group 2 contained Nom, Bus, and Ros. The Nim/ Nef/Gla cluster was significant for hypotheses associated with oxidative stress. Hypotheses are considered indicative of oxidative stress either due to their known roles (e.g. Reactive Oxygen+, NFE2L2+) or due to underlying evidence and MeSH term analysis suggesting oxidative stress context (e.g. Smoke, cigarette+). Oxidative stress is a well-known mechanism that leads to liver injury [28]. Reactive oxygen species (ROS), although necessary at low levels for cell signaling, have the potential to cause oxidative damage to cellular components such as DNA, proteins, and lipids. Under normal physiological conditions, cells have nonenzymatic antioxidants (e.g. glutathione) and enzymatic antioxidants (e.g. superoxide dismutases, glutathione peroxidases, catalase) that keep ROS levels low. However, when there is an excessive production of ROS and the cell’s antioxidant defenses are overwhelmed, oxidative stress can occur, leading to cellular damage. In the liver, this can lead to massive apoptosis and/or necrosis and hence to liver injury [28]. CRE analysis for Nim shows hypotheses indicative of increased oxidative stress such as Reactive Oxygen+, NFE2L2+, nitric oxide+ and hypotheses indicative of increased DNA damage and apoptosis such as TP53+, etoposide+ and PARP1+. Nim is a non-steroidal antiinflammatory (NSAID) drug that selectively inhibits COX-2. It was withdrawn from the market in many countries as a result of rare but serious adverse events including increases in serum aminotransferase, hepatocellular necrosis and hepatic cholestasis [29]. The mechanism(s) of the hepatotoxicity is not known, but both direct effects of the parent drug as well as toxicity from reactive metabolites have been suggested [30]. In a recent study of NSAIDs in rat hepatocytes, Nim caused glutathione depletion within 24 h at concentrations (AC₅₀: 1 μM) well below its Cmax value [31] in agreement with the CRE hypothesis that this drug is associated with oxidative stress. Nef is an antidepressant that acts as a 5-HT_2A, 5 HT_1A and α1-adrenergic receptor antagonist and was removed from the US market [32,33] due to reports of idiosyncratic hepatotoxicity [34,35]. A number of mechanisms may contribute to its liver toxicity, including inhibition of bile salt export pump (BSEP)- mediated transport [36], depletion of glutathione levels and inhibition of mitochondrial electron transport via Complex I inhibition [37]. Inflammation-associated idiosyncratic liver injury also has been suggested as a mechanism for toxicity caused by nefazodone [38]. Gla is an NSAID that has been associated with severe hepatotoxicity and hypersensitivity reactions [39].

In the CRE analysis, Bus, Ros and Nom clustered together. These three compounds were associated with lipid and glucose metabolism hypotheses such as SREBF1+, INSIG1+, cholesterol- and IGF+ as well as inflammation related hypotheses such as lipopolysaccharide+, IL1A+, JAK2, Dexamethasone+ and other hypotheses such as HIF1A+, SOD2- , EGR2+ and BRCA1+. These hypotheses generally reflect cellular perturbation more than specific toxicity. The antidepressant and antianxiolytic, Bus [40], can cause nausea and dizziness but has no known reports of hepatotoxicity. It is dosed at low Cmax exposures (750 nM), and thus even the 3 μM concentration tested here is greater than 100- fold above the doses used clinically [41].

Ros, a thiazolidinedione (TZD), belongs to a class of selective ligands of the nuclear transcription factor, peroxisome-proliferatoractivated receptor γ (PPARγ) used in the treatment of type 2 diabetes [42]. Ros and other TZDs affect glucose and lipid metabolism through activation of the PPARγ receptor. The TZDs have recently been reported to have many effects that go beyond glucose and lipid metabolism, such as reducing inflammation by decreasing some cytokines [43] and inducing autophagy through HIF1 alpha activation [44], and some of these effects are thought to occur via PPAR γ-independent pathways [43]. The upregulation of HIF1 alpha expression by rosiglitazone is consistent with the hypothesis, HIF1A+. Furthermore, the anti-inflammatory property of rosiglitazone is consistent with the hypothesis, Dexamethasone+, dexamethasone being an anti-inflammatory drug.

Nom, an antidepressant, has been associated with a few cases of hepatotoxicity but was withdrawn from the market because of hemolytic anemia [45]. In our study, Nom did not cluster with the other three hepatotoxicants, Nim, Nef, and Gla, possibly because hepatotoxicity has been very rarely associated with Nom. It also has been reported that P450-mediated metabolites of Nom could be the reason for the hemolytic anemia and hepatotoxicity associated with this drug [46]. Nom may not have clustered with the hepatotoxic compounds because HepG2 cells express very low levels of P450 enzymes, thus limiting Nom metabolism and activity.

Intriguingly, the Ros/Nom/Bus cluster also elicited a larger number of protein changes (Figure 2) and larger number of CRE hypotheses at most doses and time points than the Nim/Nef/Gla cluster (Figures 1 and 6). It is counter intuitive that the Nim/Nef/Gla cluster, although associated with DILI, caused fewer protein changes. However, the protein categories for the DILI group of compounds, oxidative stress, neoplasms, and carcinoma reflect more toxic cellular responses compared to the more benign terms of glucose, lipid and energy metabolism associated with the non-DILI group. This highlights the significance of the qualitative over quantitative difference as evidenced by the contextualized CRE analysis approach.

The SILAC approach used here provides a proteome-wide view of the cellular effects of compounds, and is a novel use of this technology for investigating toxicological mechanisms. Overall, the relative fold change in protein levels was between 0.75 and 1.5-fold, but covered a wide range of cellular processes. The application of the CRE method to this large data set afforded an integrated approach to analysis. The results using these algorithms showed a clear difference between the DILI and non-DILI compounds. Furthermore, the generated hypotheses were consistent with effects of these drugs reported in the literature. These results demonstrate that measuring cellular protein changes in response to compound treatment is a valuable approach for differentiating compounds and identifying potential mechanisms of toxicity. It is important to note, however, that the compound set used here was small and application of this approach to a wider range of compounds will be important both in further validating this approach and in helping to build a proteomic database that can be utilized to catalog toxicity profiles.

References

Citation: Enayetallah A, Nadanaciva S, Ajuh P, Martín CV, Wheat A, et al. (2015) Assessing Hepatoxicants based on High-throughput Quantitative SILAC Proteomics and Causal Biological Networks. J Proteomics Bioinform 8:049-059.

Copyright: © 2015 Enayetallah A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Journal of Proteomics & BioinformaticsOpen Access