ISSN: 2167-7700
Research Article - (2017) Volume 6, Issue 2
Abstract Background and Aims: Crohn’s disease (CD) is a chronic and refractory intestinal inflammatory disease. The 3′ untranslated region (3′ UTR) of mRNA transcript can regulate its own translational process post-transciptionally, and its role in CD remains unclear. The aim of this study was to identify the critical genes influencing CD phenotype via the regulation of mRNA 3′ UTR. Methods: Isoform Structural Change Model (IsoSCM) was used to evaluate the length of 3′ UTR of the whole transciptome from a RNA-seq based online CD database. Correlations between 3′ UTR length status and mRNA level were analyzed by Spearman coefficients. Correlated genes list was overlapped with published critical CD related gene list. Univariate and multivariate analysis were performed to identify the genes associated with CD phenotype. Results: Compared with normal control, 3′ UTR of 34192 genes were shortened and 57389 were lengthened among 432784 genes in CD patients. The 3′ UTR changes of 255 genes were found significantly correlated with their mRNA level. Furthermore, 8 overlapped genes were shared by above 255 correlated genes and known CD related gene list (ABCB1, FAF1, TUT2, IL27, JAK2, LTB, NAGLU and PTPN2). The mRNA level of ABCB1 and JAK2, and shortened 3′ UTR length of JAK2 transcript were found to be correlated with deep ulcer in CD patients by univariate and multivariate analysis. Conclusions: The 3′ UTR changes of CD related genes were confirmed to be related to the phenotype of CD. Our findings may provide further insight into the epigenetic mechanisms and therapeutic strategies for CD.
Keywords: 3′ UTR; Crohn’s disease; IsoSCM; Genetic; Phenotype
3′ UTR: 3′ untranslated region; CD: Crohn’s Disease; IsoSCM: Isoform Structural Change Model; ApA: alternative cleavage and polyadenylation; miRNA: microRNA; GWAS: Genome wide association studies; NF kappa B: Nuclear factor kappa B; FID: Fas-interacting domain; IFN-γ: interferon gamma; Th17: T helper 17 cell; CDS: coding sequence; ABCB1: ATP binding cassette subfamily B member 1; FAF1: Fas-Associated Factor 1; IL27: interleukin 27; JAK2: Janus kinase 2; PTPN2: protein tyrosine phosphatase; Non-receptor type 2; FUT2: fucosyltransferase 2; LTB: lymphotoxin beta; NAGLU: N-acetyl-alpha-glucosaminidase.
Crohn’s disease (CD) is a chronic and relapsing intestinal inflammatory disease which influences millions of people in Europe, with an increasing prevalence in Asia and other developing countries [1,2]. It is generally accepted that CD is a genetically predisposed, environmentally triggered complex disease [3]. Nevertheless, the genetic background of CD remains unclear, especially the epigenetic mechanism. Understanding the underlying genetic mechanisms will be of great significance for early detection and therapeutic decisions.
The 3′ untranslated region (3′ UTR) is the untranslated sequence outside of the 3′ terminal exon of mRNA. Over half of mammalian genes use alternative cleavage and polyadenylation (ApA) or splicing of terminal exons to produce various mRNA isoforms that distinguish in their length of 3′ UTRs, whereas generating the same proteins [4]. In other words, mRNA with paroximal ApA site or distal ApA site corresponds to short or long 3′ UTR separately, while the exons of mRNA remain unaltered. The 3′ UTR plays a significant role in post-transcriptional regulation such as regulating mRNA stability, localization, rate of protein synthesis and localization of membrane proteins [5,6]. Dysregulation of 3′ UTR has been proven to be related to a number of diseases, such as cancer, cardiovascular diseases and autoimmune diseases [7].
According to the mainstream view, functions of 3′ UTR are mediated by micro-RNA, AU-rich-element and RNA-binding proteins (RBPs) etc [6]. In general, there are more microRNA target sites in the longer 3′ UTR region, mRNA with a longer 3′ UTR displays reduced translation, whereas mRNA with shorter one has a high level of translation [4].
Despite substantial progress in identifying the genetic spectrum of CD, our understanding is insufficient for the key mechanisms contributing to the disease. Over the past decade, specialized techniques have been developed to annotate RNA-seq data to reveal the transcriptome at the levels of RNA processing machineries, and their relationship with disease [8]. Isoform Structural Change Model (IsoSCM) is one of the tools that can analyze 3’ UTR. The change-point analysis applied in IsoSCM is able to tremendously increase the 3′ UTR annotation efficiency, by which we can integrate RNA-seq coverage to recognize polyadenylation sites and 3′ UTR boundaries with higher sensitivity and specificity [7]. Hence, we applied a novel strategy (IsoSCM) to identify 3’ UTR alternative splicing related to CD phenotype.
Microarray Data
Gene Expression Omnibus (GEO) database is a comprehensive public genomics database which stores individual gene expression profiles from curated datasets in the GEO repository. (http://www.ncbi.nlm.nih.gov/geo/). The following key words were used: (“Crohn’s disease” [MeSH Terms] AND “Homo sapiens” [porgn] AND “gse” [Filter]). Database GSE62207 contains expression profile in a cohort of 310 treatment-naive pediatric CD patients and controls and all the expression profiles were measured using the platform of IlluminaHiSeq 2000 (Homo sapiens) 2.0 Array. Ileal biopsies were obtained during diagnostic colonoscopies of patients with IBD-like symptoms and biopsies were stored at -80 degrees. All patients received colonoscopy and histological score and normal controls were those with suspected CD, but with normal colonoscopic and histologic findings.
Encoding mRNA level and the corresponding 3′ UTR length in the CD database
IsoSCM is a novel strategy which utilizes change-point analysis to increase the 3′ UTR annotation efficiency [7]. With the help of IsoSCM, we integrated RNA-seq coverage of 432784 genes from 188 CD patients and 70 normal people in the GSE62207. And UTR differential usage between CD and their normal counterpart was calculated using the following equation:
APA site usage=1 – down coverage/up coverage
Differential usage=APA site usage (case) – APA site usage(control)
The down coverage and up coverage are the UTR length in the downstream and upstream of the APA site. Moreover, the differential usage over 0 means the mRNA with short 3′ UTR, whereas differential usage less than 0 corresponds the mRNA with long 3′ UTR. Finally, the 3′ UTR length change (namely, the differential usage) for above genes in CD patients were listed in Supplementary Table 1.
Variations | Univariate Analysis | Multivariate Analysis | ||||||
---|---|---|---|---|---|---|---|---|
All cases (N=188) |
Patients without deep ulcer(N=124) | Patients with deep ulcer(N=64) | P value | OR (95% CI) | P value | |||
Age at diagnosis | 12.2 ± 3.1 | 12.2 ± 3.3 | 12.2 ± 2.7 | 0.907 | ─ | ─ | ||
Male gender, n (%) | 112(59.6%) | 73 (65.2%) | 39 (34.8%) | 0.782 | ─ | ─ | ||
ABCB1, quantile | 0.37(0.20-0.75) | 0.47(0.26-0.80) | 0.27(0.12-0.41) | <0.001 | 0.23(0.07-0.75) | 0.014 | ||
FAF1, quantile | 0.87(0.79-0.94) | 0.90 (0.82-0.97) | 0.80(0.74-0.91) | 0.001 | 0.36(0.01-11.20) | 0.563 | ||
FUT2, quantile | 0.98(0.69-1.34) | 0.95(0.69-1.20) | 1.13(0.67-1.42) | 0.204 | ─ | ─ | ||
IL27, quantile | 2.94(1.62-5.87) | 2.33(1.24-3.81) | 4.82(2.34-9.97) | <0.001 | 1.07(0.96-1.19) | 0.223 | ||
JAK2, quantile | 1.41(1.11-1.87) | 1.31(0.97-1.61) | 1.67(1.26-2.15) | 0.001 | 1.89(1.023-3.50) | 0.042 | ||
LTB, quantile | 0.51(0.31-1.01) | 0.47(0.29-1.10) | 0.59(0.33-0.97) | 0.654 | ─ | ─ | ||
NAGLU, quantile | 1.03(0.92-1.18) | 1.01(0.89-1.67) | 1.06(0.95-1.18) | 0.246 | ─ | ─ | ||
PTPN2, quantile | 0.98(0.83-1.16) | 0.93(0.82-1.09) | 1.10(0.89-1.27) | 0.004 | 0.49(0.05-5.256) | 0.55 |
Table1: Univariate and multivariate Analysis of the relationship between mRNA level and deep ulcer in CD patients.
Finding shared CD-related genes
We downloaded 290 CD related genes based on data of LD analyses, Genome wide association studies (GWASs), meta-analyses and HuGE Navigator [9-12]. And then, we compared 255 correlated genes with above 290 genes using the Microsoft Excel to find the overlapped genes shared by them.
Statistical Analysis
Descriptive statistics were computed for all variables including means and standard deviations (SD) or medians and interquartile ranges (IQR) for continuous factors, and frequencies for categorical factors. Comparisons of the distribution of clinical characteristics and mRNA level between the patients with or without deep ulcer were made by using the 2-tail t test or Wilcoxon rank sum test for continuous variables and chi-square test for categorical variables. Correlation analysis between 3′ UTR length and expression level of 432784 genes were made by using Spearman correlation analysis. Multivariate analysis of the factors associated with the deep ulcer were constructed using the logistic regression analysis. All statistical analysis was performed with the SPSS software (version 16; SPSS, Chicago, IL). P value less than 0.05 was considered statistically significant.
mRNA level and the corresponding 3′ UTR length in the CD database
We obtained the lengthened or shortened status of 3′ UTR length by IsoSCM and mRNA level of 432784 genes from 188 CD patients and 70 normal people in the GSE62207 database. There were 34192 genes with shortened 3′ UTR and 57389 genes with lengthened one in CD patients (Supplementary Table 1).
Correlation between 3′ UTR length and mRNA Level
RNA with longer 3′ UTR length tends to have lower mRNA level, whereas that with shorter 3′ UTR has higher one (Figure 1). Correlation analysis (Spearman correlation coefficients) between the 3′ UTR length and mRNA level was performed across the whole transcriptome. The mRNA level of 255 genes were found significantly correlated with 3′UTR length status (P<0.05, Spearman r ≧ 0.3) (Supplementary Table 2). About half of the mRNA of the 255 genes was shortened while the other half were lengthened in CD patients (Figure 2).
Figure 1: Model of genes with short or long 3′ UTR length: Alternative cleavage and polyadenylation (ApA) leads to mRNA isoforms with different 3′ UTR length. And mRNA with distal ApA site corresponds to a long 3′ UTR length, while that with paroximal ApA site has a short 3′ UTR length (Figure 1 was quoted from Zheng et al. Nat Commun. 2014 Nov 20;5:5274).
Figure 2: The central heatmap shows 255 genes′ expression (columns) undergoing 3′UTR shortening (green) or lengthening (red) in 188 CD patients (rows) compared with normal patients; The upper histogram shows the number of genes with shortening (green) or lengthening (red) per patient; The side histograms show the percentage of patients with 3′ UTR shortening (left) or lengthening (right) for each gene.
The overlapped genes in CD database and published literatures
Sung et al reported that they found 290 genes associated with CD based on data from Genome wide association studies (GWAS), LD analyses and HuGE Navigator. Hence, we used the Microsoft excel to find the overlapped genes between the 290 CD related genes and the 255 genes with 3′ UTR and mRNA level correlation. In the end, 8 genes were found overlapped by the two gene list (ABCB1, FAF1, FUT2, IL27, JAK2, LTB, NAGLU, PTPN2). (Supplementary Table 3, Figures 3 and 4).
Genes related to deep ulcer in CD patients
Based on these findings, correlation between 3′ UTR status and mRNA level of the 8 genes and clinical parameters (deep ulcer, age and gender) of the CD patients were analyzed by univariate and multivariate analysis. Univariate analysis showed that mRNA level of ABCB1, FAF1, IL27, JAK2 and PTPN2 were associated with deep ulcer status in CD patients. Multivariate analysis revealed that low mRNA level of ABCB1 and high mRNA level of JAK2 were independently associated with deep ulcer. Meanwhile, shortened 3′ UTR of JAK2 was associated with deep ulcer by univariate analysis (Tables 1,2 and Figure 5).
Despite abnormal inflammatory response to gut microbiota were considered as the critical factors for CD pathogenesis, the epigenetic and genetic pathogenesis still remains unclear. Genome wide association studies (GWAS) and subsequent meta-analysis for CD have identified more than 140 susceptible genes contributing to CD, such as NOD2, IL23R, ATG16L1, TNFSF15, IL17 etc. [13,14]. These researches focused on the functional relevance of genetic variations, the mechanisms of how these susceptible genes lead to the pathogenesis of CD has not been fully explained [15]. Since the 3′ UTR plays significant role in post-transcriptional regulation, different 3′ UTR lengths caused by alternative cleavage and polyadenylation (ApA) can lead to major difference in post-transcriptional binding site, such as binding sequence of miRNA. Several studies have demonstrated that shorter 3′ UTR of IL23R and ATG16L1 displays elevated levels of both mRNA and protein production by the loss of binding capacity for the miRNAs (miRNA Let-7e and Let-7f for IL23R while miR-106b and miR142-3p for ATG16L1) [15-17].
From whole trancriptome data, we identified that the changes of 3′ UTR length of 255 genes were correlated with mRNA level. Eight genes were shared by our 255 genes list and 290 CD related genes found by GWAS and deep sequencing. The relationship between mRNA level and the changes of 3′ UTR length for the eight overlapped genes was confirmed by the Spearman correlation coefficients (Figure 4). Among them, low mRNA level of ABCB1 and FAF1 was connected with deep ulcers in CD patients while high mRNA level of JAK2, IL-27 and PTPN2 corresponded to that (Figure 5). Moreover, mRNA level of ABCB1 and JAK2 were independently associated with deep ulcers in CD patients by multivariate regression analysis. In addition, 3′ UTR length of JAK2 was related to deep ulcer by univariate regression analysis (Tables 1 and 2).
JAK2 (Janus kinase 2) is a protein tyrosine kinase related to multiple inflammatory signaling pathways [18]. JAK2 is the juncture of two critical intracellular inflammatory pathways in Th17 cells (IL17─JAK2─STAT3 and IL23R─JAK2─STAT3 pathways), which plays an important role in the pathogenesis of CD [19,20]. Moreover, microRNA-375 was reported to activate the JAK2-STAT3 pathway by binding to the 3′ UTR of JAK2 during inflammatory response [21]. We found that shortened 3′ UTR and higher mRNA level of JAK2 were mutually correlated and simultaneously associated with deep ulcer in CD patients (Figures 4 and 5). Furthermore, multivariate regression analysis and confirmed that high mRNA level of JAK2 was an independent risk factor for deep ulcer in CD patients with p value as 0.042 OR (95% CI) as 1.89 (1.023-3.50). The underlying mechanism for shortened 3′ UTR and higher mRNA level of JAK2 in CD patients may be explained by the relative few microRNA binding sites in its 3′ UTR region. Our findings provided a novel post-transcriptional regulation mechanism of JAK2 in the pathogenesis of CD.
We also found the 3′ UTR length change of seven other genes were correlated with their mRNA level (ABCB1, FAF1, TUT2, IL27, LTB, NAGLU and PTPN2) (Figure 4), but the length change of 3′ UTR of these genes was not significantly associated with deep ulcer by univariate regression analysis (Table 2). These results suggested that the expression level of genes could be regulated by the length of 3′ UTR, but this type of regulation might not have significant clinical influence in our dataset.
Variations | All cases | Patients without deep ulcer | Patients with deep ulcer | P value | |
---|---|---|---|---|---|
Number of patients | 188 | 124 | 64 | ─ | |
Age at CD diagnosis | 12.2 ± 3.1 | 12.2 ± 3.3 | 12.2 ± 2.7 | 0.907 | |
Male gender, n (%) | 112(59.6%) | 73 (65.2%) | 39 (34.8%) | 0.781 | |
ABCB1, quantile | 0.00(-0.04-0.049) | 0.002(-0.03-0.06) | 0.00(-0.04-0.027) | 0.338 | |
FAF1, quantile | 0.02(0.00-0.04) | 0.03 (0.00-0.045) | 0.02(0.00-0.04) | 0.538 | |
FUT2, quantile | 0.045(0.00-0.08) | 0.04(0.00-0.08) | 0.05(0.02-0.09) | 0.240 | |
IL-27, quantile | 0.02(0.00-0.135) | 0.01(0.00-0.13) | 0.05(0.00-0.208) | 0.133 | |
JAK2, quantile | 0.03(0.00-0.07) | 0.01(0.00-0.06) | 0.05(0.00-0.085) | 0.009 | |
LTB, quantile | -0.001(-0.002-0.043) | -0.001(-0.0029-0.039) | -0.001(-0.0018-0.035) | 0.595 | |
NAGLU, quantile | 0.000(-0.02-0.008) | 0.00(-0.03-0.01) | 0.00(-0.02-0.006) | 0.460 | |
PTPN2, quantile | 0.005(-0.02-0.071) | 0.0035(-0.03-0.075) | 0.006(-0.01-0.062) | 0.773 |
Table 2: Univariate Analysis of the relationship between 3′ UTR length change and deep ulcer in CD patients.
ATP binding cassette subfamily B member 1 (ABCB1) is an ATP-dependent drug efflux pump, which can decrease drug accumulation involved in multidrug resistance [22,23]. Alfreda et al. found that 2 SNPs of ABCB1 (rs10248420 and rs2032583) were significantly related to CD, and 5 SNPs (rs1128503, rs1202184, rs1202186, rs2091766 and rs2235046) were associated with non-inflammatory CD based on a genotype-phenotype analysis [24]. As an ATP-dependent drug efflux pump, ABCB1 plays a crucial role in preventing the uptake of toxic compounds including bacterial toxins into the intestinal mucosa, which could account for its protective role in CD [25]. In line with these results, we also found that the mRNA level of ABCB1 was lower in deep ulcer group of CD patients (Figure 5). And its protective factor was verified by univariate and multivariate regression analysis with p value of <0.001 and 0.014 respectively (Tables 1 and 2).
The 3′ UTR of mRNA were found deeply involved in the signaling pathways in CD such as NF κB. Fas associated factor 1 (FAF1) was a protective factor for CD patients with lower mRNA levels in the deep ulcer group (Figure 5). FAF1 could suppress CD40-induced NF κB activation by interacting with the CD40 receptor [26]. And previous studies have demonstrated that CD40 expression level was high in intestinal mucosa and dendritic cells in CD patients [27]. What’s more, IFN-γ/CD40L induced cytokine secretion plays an important role in intestinal mucosal inflammation [28]. Therefore, we hypothesize that the loss of FAF1 function in certain patients may contribute to pathogenesis and severity of CD through unchecked NF κB activation. And further investigation of this process is warranted, which may provide a novel theoretical basis for the molecule-targeted treatment of CD.
IL27 is a cytokine which belongs to interleukin 12 family, as well as a subunit of the IL27/WSX-1 heterodimeric cytokine complex. We found that high mRNA level of IL27 was related to deep ulcer in CD patients (Figure 5), which was in accordance with other studies [29-32]. And Honda et al revealed the inflammatory property of IL27/WSX-1 signaling in intestinal inflammation, which may thus be a promising candidate for the therapeutic intervention in CD [33]. Bancroft and Takeda et al discovered that IL27/ WSX-1 signaling occupied a central position in triggering the production of IFN-γ by naive CD4+ T cells via STAT1-mediated T-bet activation [34,35]. Hence, IL 27 may trigger excessive intestinal immune response by inducing IFN-γ production via the IL27/WSX-1─STAT1 pathway, which could lead to the onset of CD.
PTPN2 is a member of the PTP family which is known to be signaling molecules regulating a variety of cellular processes including cell growth, differentiation, mitotic cycle and apoptosis [36]. Some studies showed that PTPN2 was a susceptible gene for CD and PTPN2 was reported to overexpress in the inflamed intestinal tissues of CD patients [37,38]. In accordance with them, we discovered that high mRNA level of PTPN2 was significantly correlated with deep ulcer in CD patients (Figure 5). As a signaling molecule, PTPN2 has a number of target genes including Janus kinases (JAKs), signal transducer and activator of transcription (STAT) 1 and 3, mitogen-activated protein kinase (MAPK) and epidermal growth factor receptor (EGFR) [39-41]. As mentioned above, IL17─JAK2─STAT3 and IL23R─JAK2 ─STAT3 pathway were essential for the function of Th17 in inflammation. We suppose that PTPN2 and JAK2 may have overlapped effects on deep ulcer in CD since only JAK2 was associated with deep ulcer in multivariate analysis whereas both PTPN2 and JAK2 were significant in univariate analysis.
Among the 8 overlapped genes, there was no significant correlation found between deep ulcer phenotype and FUT2, LTB and NAGLU. Fucosyltransferase 2 (FUT2) is an enzyme engaged in the synthesis of the H antigen, while H antigen is both an attachment site and carbon source for gut bacteria. So we hypothesize that high mRNA level of FUT2 may result in increased susceptibility to CD [42]. As for Lymphotoxin beta (LTB), a type II membrane protein of the TNF family, its role in the pathogenesis and phenotype of CD remains controversial. Krause et al and Spahn et al found that LTB could modulate the innate immune response and accelerate the recovery of intestinal mucus [43,44], while Friedrich et al and Wang et al hold an opposite opinion that LTB might exacerbate immune response of intestinal mucous [45]. When it comes to N-acetylglucosaminidase (NAGLU), the association of NAGLU with CD was not found in several databases such as PubMed. Despite there was no correlation between above three genes and deep ulcer phenotype, we have confirmed their relationship with CD by comparing our result with GWAS data.
This study was based on bioinformatic analysis for the data from GWAS and CD database. Up to now, we have been making every effort to collect clinical specimens of different CD phenotype for further investigation. Our study identified a novel epigenetic regulation mechanism for the mRNA level of CD related genes. Based on these findings, we can explore the potential mechanisms of how these genes are connected with CD phenotype. Furthermore, we can construct a clinically useful genetic profile that can aid in treatment decisions.
Grant Support: This work was supported by National Natural Science Foundation of China (No. 81400603), National Natural Science Foundation of China (No. 81402019), Guangdong Natural Science Foundation (No.2015A030310190), Guangdong Natural Science Foundation (No.2015A030313108) and Science and Technology Planning Project of Guangdong Province (No.2015B020229001).
TH, HSL and JK contributed to study concept and design, acquisition, analysis, interpretation of data and drafting of the manuscript. XRW, CZ, HSL, YFC, XHL, XWH, YFZ and XSH contributed to data collections and manuscript review. XJW and PL contributed to study concept and design, analysis and interpretation of data and critical revision of the manuscript for important intellectual content. XJW and PL supervised the study. All authors read and approved the final manuscript.