ISSN: 2376-0354
+44-77-2385-9429
Research Article - (2019)Volume 6, Issue 2
Alternative splicing (AS) produces multiple mRNA splicing variants from a single precursor transcript. Recent genome sequencing analyses and increasing experimental evidence in flowering plants have revealed that AS is far more prevalent than previously thought and plays crucial roles in the diversification of gene regulation. Despite numerous studies, the extent and complexity of mRNA variants in plants remain poorly characterized from a global perspective. In present study, 589,034 mRNA variants from 442,541 annotated genes of 12 plant species were investigated. All AS genes were classified into four groups on the basis of the numbers of mRNA variants, namely, 2V (two variants), 3V (three variants), 4V (four variants), and 5V+ (five or more variants). Interestingly, our analysis indicated that more than 50% AS genes generated only two variants in higher plants. A global analysis of gene structure revealed that AS genes contained more but shorter exons and introns as the number of mRNA variants increased. The results also suggested that AS elicited different effects on the improvement of transcriptome and proteome diversity. Taken together, cross-species analysis provided the most comprehensive set of annotated splicing variants in higher plants thus far and extended the current view about mRNA variants.
mRNA; Alternative splicing; Deep sequencing; Splicing variants
AA: Alternative Acceptor Sites; AD: Alternative Donor Sites; AS: Alternative Splicing; ES: Exon Skipping; IR: Intron Retention
Alternative splicing (AS) is a common and fundamental process that contributes to both transcriptome and proteome diversities [1]. AS generates two or more mRNA variants from a single pre-mRNA with multiple introns through different splice sites [2,3]. Most plant genes (80% to 85%) are interrupted by introns [4]. However, the intron and exon organization of higher plant genes is not similar to that of animal genes [5]. Introns in plant genes are generally much shorter in length and fewer in number compared with those in animal genes [6]. In contrast to reports on the differences between plant and animal species, studies that compare the gene structures of different plant species such as food crops and horticultural crops are scarce.
Moreover, AS has been found to be an important regulatory process in different cell types, different developmental stages, and environmental responses. Therefore, individual mRNA variants may play specific spatial or temporal roles [7,8]. The crucial role of AS has been extensively investigated in animals and plants [9,10].
AS is associated with the human genetic diseases, and also involved in a range of functions in plants, such as seed germination, stress response [11,12]. In eukaryotes, AS of pre-mRNAs significantly contributes to the proper expression of the genome and results in new protein products (13). However, not all of the products are functional [13,14]. Several AS variants contain premature termination codons (PTCs) that potentially lead to unproductive transcripts, truncated proteins or mRNA decay [15,16].
Conversely, recent studies indicated AS act as a regulatory mechanism influencing the expression of non-coding RNA [17]. Primary transcripts of miRNAs (pri-miRNAs) contain introns, and their AS events has been detected [18,19]. In Arabidopsis, AS event disrupts the secondary structures of pri-miR162a and provides a mechanism for miR400 expression in response to environmental cues [20,21]. Tomato as a kind of very important horticultural crops, research has found that the AS of miR4376 through regulating the expression of an autoinhibited Ca2+- ATPase, ACA10, affect the reproductive growth of tomato [22,23]. Thus, growing evidence has indicated that AS in plants is much more prevalent than previously thought and plays crucial roles in the diversification of gene regulation.
Genome-wide studies of AS events in various organisms have relied on using the traditional expressed sequence tag and cDNA libraries [24]. However, the occurrence of AS events might be underestimated because data from these libraries do not provide a complete coverage of the transcriptome. High-throughput RNA sequencing (RNA-seq) technologies allow transcriptome analyses at unprecedented levels of sensitivity and precision. Such technologies also provide large data sets for comprehensive analyses [8].
Studies on plant RNA-seq data sets have identified thousands of AS events and confirmed most annotated introns in several vertebrate species. A previous RNA-seq analysis revealed that approximately 95% of human multi-exon genes express multiple splicing variants [25]. Recently, it was shown that 61% of Arabidopsis intron-containing genes and 56% of maize multiexonic genes are subject to AS [26,27]. A large number of variable splicing events of genes have been found in horticultural crops. For example, in strawberries, 66.43% detected multi-exon genes undergo AS [28,29]. And in cucumber, recent studies have found that 58% of the multi-exon genes underwent AS [30]. Also, a large number of genes were alternatively spliced in the soybean genome [31]. To our knowledge, the extent and frequency of plant mRNA variants remain poorly characterized from a global perspective.
According to different AS types, AS events can be mainly classified as exon skipping (ES), intron retention (IR), alternative donor sites (AD), and alternative acceptor sites (AA). IR is the most common splicing form in plants [32]. However, one AS gene variant might simultaneously have multiple different AS types. In this study, to achieve a nonbiased and complete analysis of the plant transcriptome, we analyzed the mRNA splicing variants of 12 plant species (8 dicotyledons and 4 monocotyledons) including food crops and horticultural crops which subjected to high-throughput sequencing database, including Poplar (Populus trichocarpa), Medicago (Medicago truncatula), soybean (Glycine max), Arabidopsis (Arabidopsis thaliana), cotton (Gossypium raimondii), cacao (Theobroma cacao), rose gum (Eucalyptus grandis), blue columbine (Aquilegia coerulea), sorghum (Sorghum bicolor), maize (Zea mays ), rice (Oryza sativa), and Brachypodium (Brachypodium distachyon). All mRNA variants from the annotated genes were confirmed by genome alignment and investigated the relationship between the number of AS mRNA variants and their effects on gene features at the genome-wide level.
Database set-up
A total of 12 recently published RNA-seq data sets from higher plants were selected and downloaded at Phytozome, which is the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute (https://phytozome.jgi.doe.gov/ pz/portal.html). Detail information about mRNA variants of 12 plant species is provided in the Supplemental section (Table 1 and Supplementary Table 1).
Species | Gene number | The number of mRNA variants | AS gene number | Percentage | The number of AS variants | mRNA variants per AS gene | Database |
---|---|---|---|---|---|---|---|
Populus trichocarpa | 40668 | 45033 | 3063 | 0.075317203 | 7428 | 2.425073457 | http://www.plantgdb.org/PtGDB/ |
Medicago truncatula | 50962 | 53423 | 1797 | 0.035261567 | 4258 | 2.36950473 | http://www.medicago.org/ |
Glycine max | 46367 | 55788 | 6611 | 0.142579852 | 16031 | 2.424897897 | http://www.plantgdb.org/GmGDB/ |
Arabidopsis thaliana | 27416 | 35386 | 5804 | 0.211701196 | 13774 | 2.373190903 | http://www.arabidopsis.org/ |
Gossypium raimondii | 37505 | 77267 | 15403 | 0.410691908 | 55165 | 3.581445173 | http://www.cottondb.org/wwwroot/cdbhome.php |
Theobroma cacao | 29452 | 44404 | 7630 | 0.259065598 | 22582 | 2.959633028 | http://www.cacaogenomedb.org/ |
Eucalyptus grandis | 36376 | 46315 | 6110 | 0.167967891 | 16049 | 2.626677578 | http://www.phytozome.net/eucalyptus.php |
Aquilegia coerulea | 24823 | 41063 | 7487 | 0.301615437 | 23727 | 3.169093095 | http://www.phytozome.net/aquilegia.php |
Sorghum bicolor | 27608 | 29448 | 1489 | 0.053933642 | 3329 | 2.235728677 | http://www.plantgdb.org/SbGDB/ |
Zea mays | 38914 | 63540 | 13137 | 0.337590584 | 37763 | 2.87455279 | http://www.maizegdb.org/ |
Oryza sativa | 55898 | 66338 | 6458 | 0.115531862 | 16809 | 2.60281821 | http://rice.plantbiology.msu.edu/ |
Brachypodium distachyon | 26552 | 31029 | 3201 | 0.12055589 | 7678 | 2.39862543 | http://www.brachypodium.org/database |
Table 1: The mRNA variants of 12 plant species used in this study.
mRNA variants alignment to the reference plant genome
A total of 589,034 mRNA variants were aligned to perfectly match gene locations in the reference genome with a maximum of two mismatches. Among the mRNA splicing products, only the annotated genes were selected and classified into groups according to the number of mRNA variants. The correlative gene features (introns, coding sequence, exons, or UTRs) were generated, including their distribution, location, length, and numbers. Detail information about the gene features are given in the Supplemental section (Supplementary Table 1).
Clustering of AS gene features
The gene features in the different groups were calculated.
Identification and classification of mRNA variants in higher plants
In general, RNA-seq provides broad and deep sequencing of the transcriptome with low false discovery rates. Therefore, to facilitate further investigation of plant mRNA variants from a global perspective, we mined the RNA-seq data sets of 12 plant species to identify thousands of mRNA variants and all the gene sets from 12 annotation genomes downloaded at https://phytozome.jgi.doe. gov/pz/portal.html (Figure 1 and Table 1). All mRNA variants were aligned to perfectly match the corresponding gene locations in the reference genome (Supplementary Table 1). To ensure the authenticity of each mRNA product, only the annotated mRNA were pick up and classified according to the number of mRNA variants into groups (Table 2). The correlative gene features of introns, coding sequence, exons, or untranslated regions (UTRs) were recorded. These features include distribution, location, length and number (Tables 1- 3 and Supplementary Table 1).
Figure 1: Characteristics of plant annotated gene features and AS variants.
Statistics of the annotated gene features and AS events in 12 plant species. The line represents the evolutionary relationships among various biological species in the phylogenetic tree.
Populus trichocarpa | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 40668 | 45033 | 2790.17 | 382.78 | 1151.35 | 4.98 | 231.14 | 3.98 | 347.09 | 31860 | 4358512 | 30548 | 8673721 |
AS gene | 3063 | 7428 | 4461.6 | 435.25 | 1308.76 | 6.95 | 188.18 | 5.95 | 358.74 | 9286 | 1369251 | 8457 | 2592763 |
1V (1 variant) | 37605 | 37605 | 2655.11 | 372.42 | 1120.26 | 4.59 | 243.98 | 3.59 | 343.28 | 22574 | 2989261 | 22091 | 6080958 |
2V (2 variants) | 2341 | 4682 | 4381.25 | 438.89 | 1319.67 | 6.96 | 189.39 | 5.96 | 367.39 | 5257 | 729357 | 4956 | 1465027 |
3V (3 variants) | 441 | 1323 | 4660.35 | 414.59 | 1243.76 | 6.97 | 178.52 | 5.97 | 357.08 | 1820 | 259825 | 1628 | 511954 |
4V (4 variants) | 140 | 560 | 4693.83 | 421.74 | 1265.23 | 6.34 | 199.59 | 5.34 | 366.9 | 866 | 131871 | 674 | 202575 |
5V+ (5 and more variants) | 141 | 863 | 4943.29 | 455.97 | 1367.9 | 7.22 | 189.62 | 6.22 | 311.62 | 1343 | 248198 | 1199 | 413207 |
Medicago truncatula | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 50962 | 53423 | 2108.34 | 243.25 | 732.74 | 3.41 | 214.77 | 2.41 | 412.87 | 28152 | 9491631 | 26769 | 8876325 |
AS gene | 1797 | 4258 | 5024.56 | 328.08 | 987.23 | 6.5 | 151.71 | 5.5 | 413.07 | 6165 | 1160752 | 5475 | 1529566 |
1V (1 variant) | 49165 | 49165 | 2002.78 | 235.9 | 710.69 | 3.14 | 226.05 | 2.14 | 412.83 | 21987 | 8330879 | 21294 | 7346759 |
2V (2 variants) | 1272 | 2544 | 4808.59 | 332.91 | 1001.74 | 6.25 | 160.15 | 5.25 | 418.02 | 3473 | 658254 | 3065 | 866334 |
3V (3 variants) | 416 | 1248 | 5483.05 | 323.74 | 971.22 | 6.77 | 143.66 | 5.77 | 415.19 | 1957 | 380253 | 1730 | 477710 |
4V (4 variants) | 89 | 356 | 5565.34 | 324.37 | 973.12 | 7.39 | 131.72 | 6.39 | 364.38 | 554 | 84931 | 527 | 148381 |
5V+ (5 and more variants) | 20 | 110 | 6816.85 | 277.38 | 832.15 | 6.29 | 132.36 | 5.29 | 463.6 | 181 | 37314 | 153 | 37141 |
Glycine max | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 46367 | 55787 | 3715.72 | 407.16 | 1224.49 | 5.93 | 206.26 | 4.93 | 423.71 | 43437 | 5511798 | 43367 | 13069327 |
AS gene | 6611 | 16031 | 5137.82 | 381.42 | 1147.25 | 6.87 | 166.73 | 5.87 | 446.05 | 20035 | 2639875 | 18579 | 5918727 |
1V (1 variant) | 39756 | 39756 | 3480.4 | 417.54 | 1255.63 | 5.56 | 225.96 | 4.56 | 412.11 | 23402 | 2871923 | 24788 | 7150600 |
2V (2 variants) | 4738 | 9476 | 5156.83 | 410.17 | 1233.52 | 6.9 | 178.43 | 5.9 | 455.25 | 11083 | 1465272 | 10617 | 3365542 |
3V (3 variants) | 1265 | 3795 | 5077.25 | 356.06 | 1068.18 | 6.93 | 154.37 | 5.93 | 432.93 | 4969 | 654303 | 4553 | 1460803 |
4V (4 variants) | 417 | 1668 | 5118.5 | 329.68 | 989.04 | 6.72 | 147.34 | 5.72 | 457.24 | 2353 | 323642 | 2065 | 639972 |
5V+ (5 and more variants) | 191 | 1092 | 5109.7 | 299.05 | 897.15 | 6.61 | 135.83 | 5.61 | 392.68 | 1630 | 196658 | 1344 | 452410 |
Arabidopsis thaliana | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 27416 | 35386 | 2205.02 | 409.5 | 1231.49 | 5.57 | 220.87 | 4.57 | 156.9 | 34621 | 4123096 | 30634 | 6636007 |
AS gene | 5804 | 13774 | 2967.51 | 441.42 | 1327.26 | 7.24 | 183.07 | 6.24 | 154.82 | 18585 | 2308121 | 15036 | 3362233 |
1V (1 variant) | 21612 | 21612 | 2001.51 | 389.15 | 1170.45 | 4.51 | 259.57 | 3.51 | 159.24 | 16036 | 1814975 | 15598 | 3273774 |
2V (2 variants) | 4251 | 8502 | 2904.55 | 448.91 | 1349.72 | 6.96 | 193.76 | 5.96 | 155.16 | 10653 | 1331586 | 9013 | 2018121 |
3V (3 variants) | 1133 | 3399 | 3151.09 | 445.73 | 1337.2 | 7.78 | 171.92 | 6.78 | 155.08 | 4883 | 620999 | 3827 | 830656 |
4V (4 variants) | 291 | 1164 | 3092.06 | 408.93 | 1226.78 | 7.35 | 167.08 | 6.35 | 153.96 | 1829 | 221557 | 1371 | 325446 |
5V+ (5 and more variants) | 129 | 709 | 3149.03 | 384.34 | 1153.02 | 7.85 | 146.97 | 6.85 | 151.44 | 1220 | 133979 | 825 | 188010 |
Gossypium raimondii | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 37505 | 77267 | 3242.49 | 426.55 | 1282.64 | 6.29 | 203.66 | 5.29 | 341.58 | 89649 | 16338798 | 86495 | 27777027 |
AS gene | 15403 | 55165 | 4677.3 | 462.46 | 1390.38 | 7.47 | 185.83 | 6.47 | 341.18 | 73711 | 13404603 | 70032 | 23251420 |
1V (1 variant) | 22102 | 22102 | 2244.26 | 336.91 | 1013.74 | 3.35 | 302.98 | 2.35 | 344.31 | 15938 | 2934195 | 16463 | 4525607 |
2V (2 variants) | 6049 | 12098 | 3980.47 | 436.11 | 1311.33 | 5.93 | 220.81 | 4.93 | 332.68 | 14143 | 2587001 | 13674 | 4400916 |
3V (3 variants) | 3566 | 10698 | 4611.7 | 451.9 | 1355.71 | 6.8 | 199.56 | 5.8 | 339.25 | 13589 | 2510033 | 12991 | 4325895 |
4V (4 variants) | 2213 | 8852 | 5116.82 | 471.31 | 1413.92 | 7.57 | 186.92 | 6.57 | 343.62 | 11710 | 2144201 | 11288 | 3785673 |
5V+ (5 and more variants) | 3575 | 23517 | 5649.75 | 477.49 | 1432.46 | 8.53 | 167.98 | 7.53 | 343.92 | 34269 | 6163368 | 32079 | 10738936 |
Theobroma cacao | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 29452 | 44404 | 6091.33 | 437.57 | 1315.71 | 5.97 | 220.4 | 4.97 | 493.04 | 60128 | 11290078 | 56959 | 18099236 |
AS gene | 7630 | 22582 | 8624.45 | 535.06 | 1608.17 | 8.15 | 197.07 | 7.15 | 449.92 | 31934 | 5924456 | 31173 | 9963478 |
1V (1 variant) | 21822 | 21822 | 5206.98 | 336.68 | 1013.05 | 3.7 | 273.54 | 2.7 | 611.06 | 28194 | 5365622 | 25786 | 8135758 |
2V (2 variants) | 3963 | 7926 | 7785.43 | 467.22 | 1404.66 | 6.76 | 207.64 | 5.76 | 459.79 | 10430 | 1960962 | 9731 | 3232738 |
3V (3 variants) | 1848 | 5544 | 8638.71 | 530.63 | 1591.88 | 8.25 | 193.17 | 7.25 | 455.75 | 7815 | 1455309 | 7354 | 2420859 |
4V (4 variants) | 918 | 3672 | 8456.85 | 570.26 | 1710.78 | 9.24 | 185.22 | 8.24 | 415.2 | 5258 | 973926 | 5200 | 1695281 |
5V+ (5 and more variants) | 901 | 5440 | 12456.36 | 614.65 | 1843.96 | 9.35 | 197.34 | 8.35 | 457.99 | 8431 | 1534259 | 8888 | 2614600 |
Eucalyptus grandis | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 36376 | 46315 | 3104.6 | 391.89 | 1178.68 | 5.18 | 227.58 | 4.18 | 423.72 | 37182 | 5850297 | 39285 | 12218689 |
AS gene | 6110 | 16049 | 5253.45 | 428.5 | 1288.49 | 7.4 | 173.86 | 6.4 | 438.37 | 19483 | 3184895 | 20353 | 7133832 |
1V (1 variant) | 30266 | 30266 | 2672 | 372.49 | 1120.46 | 4 | 280.32 | 3 | 407.12 | 17699 | 2665402 | 18932 | 5084857 |
2V (2 variants) | 3818 | 7636 | 5019.08 | 440.13 | 1323.39 | 6.9 | 191.46 | 5.9 | 440.79 | 8640 | 1406841 | 9229 | 3198094 |
3V (3 variants) | 1412 | 4236 | 5507.08 | 431.47 | 1294.4 | 7.68 | 168.73 | 6.68 | 432.69 | 5152 | 842774 | 5486 | 1934006 |
4V (4 variants) | 499 | 1996 | 5431.48 | 392.02 | 1176.05 | 7.37 | 159.57 | 6.37 | 427.56 | 2601 | 426692 | 2649 | 970144 |
5V+ (5 and more variants) | 381 | 2181 | 6428.97 | 415.38 | 1246.13 | 8.62 | 144.56 | 7.62 | 449.74 | 3090 | 508588 | 2989 | 1031588 |
Aquilegia coerulea | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 24823 | 41063 | 3560.47 | 435.8 | 1310.41 | 6.25 | 209.54 | 5.25 | 454.82 | 40850 | 6425333 | 41754 | 12527440 |
AS gene | 7487 | 23727 | 5824.71 | 490.02 | 1473.05 | 8.17 | 179.96 | 7.17 | 460.53 | 30044 | 4978721 | 30126 | 9726927 |
1V (1 variant) | 17336 | 17336 | 2584.03 | 361.6 | 1087.81 | 3.61 | 301.15 | 2.61 | 433.33 | 10806 | 1446612 | 11628 | 2800513 |
2V (2 variants) | 3590 | 7180 | 4991.83 | 453.29 | 1362.87 | 6.51 | 208.99 | 5.51 | 467.93 | 8123 | 1203436 | 8315 | 2459966 |
3V (3 variants) | 1765 | 5295 | 6176.44 | 501.17 | 1503.5 | 8.01 | 187.88 | 7.01 | 472.41 | 6590 | 1062144 | 6615 | 2114883 |
4V (4 variants) | 969 | 3876 | 6683.13 | 491.79 | 1475.36 | 8.48 | 174.03 | 7.48 | 484.4 | 5100 | 846089 | 4983 | 1631428 |
5V+ (5 and more variants) | 1163 | 7376 | 7146.66 | 516.84 | 1550.51 | 9.75 | 159.11 | 8.75 | 438.44 | 10231 | 1867052 | 10213 | 3520650 |
Sorghum bicolor | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 27608 | 29448 | 3226.51 | 418.56 | 1258.69 | 4.96 | 253.59 | 3.96 | 430.76 | 15843 | 2076458 | 16726 | 4738354 |
AS gene | 1489 | 3329 | 4706.41 | 411.01 | 1236.02 | 7.07 | 174.55 | 6.07 | 393.37 | 4267 | 616662 | 3658 | 1110048 |
1V (1 variant) | 26119 | 26119 | 3143.2 | 419.53 | 1261.58 | 4.69 | 268.77 | 3.69 | 438.59 | 11576 | 1459796 | 13068 | 3628306 |
2V (2 variants) | 1218 | 2436 | 4593.88 | 414.39 | 1246.17 | 6.84 | 181.87 | 5.84 | 397.34 | 2943 | 419295 | 2658 | 784846 |
3V (3 variants) | 213 | 639 | 5097.32 | 402.48 | 1207.44 | 7.85 | 153.95 | 6.85 | 381.36 | 911 | 131649 | 716 | 229618 |
4V (4 variants) | 43 | 172 | 5509.35 | 378.39 | 1135.17 | 7.1 | 159.88 | 6.1 | 412.07 | 285 | 43535 | 185 | 62900 |
5V+ (5 and more variants) | 15 | 82 | 5991.4 | 445.29 | 1335.88 | 7.71 | 173.4 | 6.71 | 350.72 | 128 | 22183 | 99 | 32684 |
Zea mays | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 38914 | 63540 | 4074 | 332.02 | 999.07 | 4.6 | 216.96 | 3.6 | 634.05 | 71771 | 13942080 | 68219 | 20414673 |
AS gene | 13137 | 37763 | 6029.87 | 333.26 | 1002.77 | 5.37 | 186.4 | 4.37 | 622.83 | 51975 | 10331181 | 48954 | 14727072 |
1V (1 variant) | 25777 | 25777 | 3129.56 | 330.22 | 993.65 | 3.47 | 286.18 | 2.47 | 663.12 | 19796 | 3610899 | 19265 | 5687601 |
2V (2 variants) | 7220 | 14440 | 5503.46 | 363.23 | 1092.69 | 5.11 | 213.58 | 4.11 | 637.92 | 16299 | 3227319 | 15939 | 4992829 |
3V (3 variants) | 3045 | 9135 | 6492.13 | 338.13 | 1014.39 | 5.5 | 184.65 | 4.5 | 640.23 | 12345 | 2500623 | 12000 | 3636922 |
4V (4 variants) | 1476 | 5904 | 6397.92 | 309.67 | 929.01 | 5.54 | 167.84 | 4.54 | 596.55 | 9065 | 1798140 | 8251 | 2451850 |
5V+ (5 and more variants) | 1396 | 8284 | 7354.95 | 292.45 | 877.34 | 5.56 | 157.94 | 4.56 | 598.83 | 14266 | 2805099 | 12764 | 3645471 |
Oryza sativa | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 55801 | 66153 | 2963.58 | 382.78 | 1151.35 | 4.4 | 304.6 | 3.4 | 381.81 | 44750 | 8497020 | 42193 | 15794083 |
AS gene | 6457 | 16809 | 4851.11 | 435.25 | 1308.76 | 6.26 | 195.99 | 5.26 | 378.58 | 25122 | 5139835 | 22903 | 8983107 |
1V (1 variant) | 49344 | 49344 | 2717.72 | 372.42 | 1120.26 | 3.77 | 365.94 | 2.77 | 383.9 | 19628 | 3357185 | 19290 | 6810976 |
2V (2 variants) | 4120 | 8240 | 4610.08 | 438.89 | 1319.67 | 6.07 | 205.93 | 5.07 | 384.05 | 11136 | 2204572 | 10717 | 4158976 |
3V (3 variants) | 1450 | 4350 | 5106.05 | 414.59 | 1243.76 | 6.42 | 191.24 | 5.42 | 378.1 | 6498 | 1314676 | 6016 | 2366342 |
4V (4 variants) | 534 | 2136 | 5409.58 | 421.74 | 1265.23 | 6.37 | 183.66 | 5.37 | 381.07 | 3446 | 743922 | 3077 | 1170560 |
5V+ (5 and more variants) | 353 | 2083 | 5772.32 | 455.97 | 1367.9 | 6.53 | 181.54 | 5.53 | 357.2 | 4042 | 876665 | 3093 | 1287229 |
Brachypodium distachyon | Gene Number | Gene model | The average size of Gene | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number | The average size of Intron | The number of 5'-UTR | The total length of 5'-UTR | The number of 3'-UTR | The total length of 3’-UTR |
Total | 26552 | 31029 | 3580.54 | 425.96 | 1280.87 | 5.39 | 237.49 | 4.39 | 385.02 | 14174 | 1763666 | 24808 | 7639899 |
AS gene | 3201 | 7678 | 4806.08 | 437.67 | 1316.02 | 7.82 | 168.11 | 6.82 | 367.34 | 6716 | 866548 | 9294 | 3030001 |
1V (1 variant) | 23351 | 23351 | 3413.68 | 422.1 | 1269.31 | 4.59 | 276.31 | 3.59 | 396.06 | 7458 | 897118 | 15514 | 4609898 |
2V (2 variants) | 2304 | 4608 | 4745.38 | 449.03 | 1350.09 | 7.68 | 175.63 | 6.68 | 373.04 | 3583 | 459424 | 5433 | 1778574 |
3V (3 variants) | 644 | 1932 | 4960.34 | 443.49 | 1330.46 | 8.29 | 160.6 | 7.29 | 349.46 | 1795 | 229394 | 2382 | 765742 |
4V (4 variants) | 178 | 712 | 4716.65 | 404.72 | 1214.15 | 7.7 | 157.85 | 6.7 | 348.61 | 781 | 107905 | 904 | 301453 |
5V+ (5 and more variants) | 75 | 426 | 5558.56 | 343.56 | 1030.68 | 7.38 | 139.68 | 6.38 | 428.28 | 557 | 69825 | 575 | 184232 |
Table 2: Classification of annotated mRNA according to the number of mRNA variants into groups.
Species | Gene Number | Gene Models | AS gene Number | Percentage | The average size of Protein | The average size of CDS | Exon Number | The average size of Exon | Intron Number in CDS | Intron number in in 5'UTR | Intron Number in 3'UTR | Intron Number | The average size of Intron | The average size of Gene | The average size of 5'UTR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Populus trichocarpa | 40668 | 45033 | 3063 | 0.08 | 382.72 | 1151.17 | 4.98 | 231.16 | 3.98 0.14 | 0.07 | 4.19 | 352.53 | 2916.61 | 96.78 | 192.61 |
Medicago truncatula | 50962 | 53423 | 1797 | 0.04 | 243.21 | 732.62 | 3.41 | 214.77 | 2.41 0.11 | 0.06 | 2.58 | 430.47 | 2187.65 | 177.67 | 166.15 |
Glycine max | 46367 | 55788 | 6611 | 0.14 | 421.96 | 1268.87 | 5.93 | 213.82 | 4.93 0.17 | 0.08 | 5.18 | 427.24 | 3816.24 | 98.8 | 234.27 |
Arabidopsis thaliana | 27416 | 35386 | 5804 | 0.21 | 409.28 | 1230.85 | 5.57 | 220.91 | 4.57 0.21 | 0.07 | 4.86 | 164.86 | 2335.51 | 116.52 | 187.53 |
Gossypium raimondii | 37505 | 77267 | 15403 | 0.41 | 426.04 | 1281.12 | 6.29 | 203.66 | 5.29 0.31 | 0.22 | 5.82 | 348.92 | 3882.58 | 211.46 | 359.49 |
Theobroma cacao | 29452 | 44404 | 7630 | 0.26 | 437.26 | 1314.79 | 5.97 | 220.42 | 4.97 0.38 | 0.31 | 5.66 | 720.68 | 6058.62 | 254.26 | 407.6 |
Eucalyptus grandis | 36376 | 46315 | 6110 | 0.17 | 391.72 | 1178.15 | 5.18 | 227.6 | 4.18 0.17 | 0.13 | 4.47 | 433.39 | 3504.62 | 126.32 | 263.82 |
Aquilegia coerulea | 24823 | 41063 | 7487 | 0.3 | 435.41 | 1309.24 | 6.25 | 209.54 | 5.25 0.25 | 0.19 | 5.69 | 466.77 | 4424.45 | 156.48 | 305.08 |
Sorghum bicolor | 27608 | 29448 | 1489 | 0.05 | 418.55 | 1258.64 | 4.96 | 253.62 | 3.96 0.10 | 0.05 | 4.11 | 440.47 | 3298.84 | 70.51 | 160.91 |
Zea mays | 38914 | 63540 | 13137 | 0.34 | 331.67 | 998 | 4.21 | 236.86 | 3.21 0.32 | 0.26 | 3.79 | 710.91 | 4236.29 | 219.42 | 321.29 |
Oryza sativa | 55898 | 66338 | 6458 | 0.12 | 446.34 | 1342.01 | 4.4 | 304.86 | 3.40 0.18 | 0.12 | 3.7 | 401.36 | 3194.44 | 128.09 | 238.09 |
Brachypodium distachyon | 26552 | 31029 | 3201 | 0.12 | 425.82 | 1280.45 | 5.39 | 237.49 | 4.39 0.09 | 0.09 | 4.57 | 452.57 | 3652.46 | 56.84 | 246.22 |
Table 3: The correlative gene features of 12 plant species used in this study.
Generally, AS can be classified as IR, ES, AD and AA by the type of AS events. However, one AS gene variant might have multiple different AS types, simultaneously. This means that one AS gene can generate two, three, four, or more variants, all AS variants were classified into four groups on the basis of the number of AS variants: 2V (two variants), 3V (three variants), 4V (four variants), and 5V+ (five or more AS variants) (Table 2).
In total, 442,541 annotated genes from 589,034 mRNA products were obtained in the genome annotation of the 12 plant species (Figure 1 and Table 1). Our analysis identified 224,593 annotated mRNA variants. The events were distributed in 78,190 AS genes, which accounted for 17.67% of the total annotated genes (Table 1 and Supplementary Table 1). To eliminate the bias effect of the different species, we calculated the AS gene number and ratio (AS gene number/total gene number) for each species. We generated 3,063 genes in Populus (7.5%), 1,797 in Medicago (3.5%), 6,611 in soybean (14.3%), 5,804 in Arabidopsis (21.2%), 15,403 in cotton (41.1%), 7,630 in Theobroma (25.9%), 6,110 in Eucalyptus (16.8%),7,487 in Aquilegia (30.2%), 1,489 in sorghum (5.4%), 13,137 in maize (33.8%), 6,458 in rice (11.6%), and 3,201 in Brachypodium (12.6%) from the pooled data (Figure 1). A comparison of the AS variants among these species revealed that the AS gene ratios were higher in cotton and maize but lower in Populus, Medicago, and sorghum than in the other species (Figure 1 and Table 1).
Global analysis of gene structure
Gene structure analysis from a global perspective differentiates the architecture of plant and animal genes. In general, the composition of introns differs in plants and animals. Therefore, we evaluated the characteristics of gene structure to determine the putative differences in splicing among these plant species (Table 1).
Our analysis revealed that the composition of introns differed in dicots and monocots, especially in terms of the number of introns. The average number of introns in the different dicotyledonous plants ranged from approximately 2.5 to 5.8 (Table 1). This wide range indicated their divergence despite the close phylogenetic relationship. Interestingly, each gene contained an average of four introns in the monocotyledonous plants. These numbers were relatively stable (Table 1). However, the average length of each intron ranged from 0.16 kb to 0.72 kb and greatly differed across the different species. We also found that more than 85% of the plant introns were located in the coding regions (Table 1 and Table 3).
Unlike the composition of introns, the average number and length of exons were highly similar in plants with subtle differences. These differences determined the sizes of the coding regions and proteins (Table 1). The above mentioned exon analysis was consistent with the similar average size of proteins in multiple species (Table 3). This finding explained why the different average gene sizes of various plant species generated similar protein sizes. In addition, the average sizes of UTRs (both 5′UTR and 3′UTR) differed among plant species. The length of the 3′UTR was much longer than that of the 5′UTR, except in Medicago (Table 1 and Table 3).
To gain clues about the features of multiple AS variants in 4 variant groups, we checked the assembled raw numbers and frequencies of AS among the different plant species. Group 2V had the most abundant AS variants, followed by groups 3V, 4V, and 5V+. A high proportion of AS variants in group 2V were generated from each species: Populus, 76.4%; Medicago, 70.8%; soybean, 71.7%; Arabidopsis, 73.2%; cotton, 39.3%; Theobroma, 51.9%; Eucalyptus, 62.5%; Aquilegia, 47.9%; sorghum, 81.8%; maize, 55.0%; rice, 63.8%; and Brachypodium, 72.0% (Figure 2 and Table 2). It is noteworthy that almost more than 50% pre-mRNAs which underwent alternative splicing events produced two variants in plants, whereas the proportions decrease with the increasing numbers of variants in other groups (Figure 2).
Figure 2: Features of multiple AS variants in plants.
From outside to inside, each circle represents the gene model of multiple AS variants, the distribution of AS variants along annotated genes features, and the proportion of multiple AS variants in plants. See also the Supplemental section (for details including the gene structure, location, number, length, and number of AS events).
Examination of the characteristics of gene structure from different AS groups in plants showed that the average number of exons (ranged from 6.25 to 9.74) and introns (ranged from 4.65 to 9.64) in AS genes were more than the average number of the total annotated genes in each genome, but the average lengths of these exons and introns were slightly shorter (Figure 2 and Table 2). Additionally, the average sizes of UTRs were longer in the genes that generated multiple AS variants. These results implied the presence of a relationship between gene structure and AS. However, the total length of annotated genes did not significantly change (Table 2).
Distribution and frequency of AS variants
The above-mentioned analysis suggested that gene features such as the number and length of introns can influence AS variants. To further explore other genomic features, we performed correlation analysis to investigate the relationship between the AS variants and gene density of higher plants. The distributions of the AS variants in the four groups were pooled in 1 Mb contiguous regions across the chromosome where the genes are located (Figure 3). The analysis revealed extensive transcriptional activity in the chromosomes of the different plant genomes. The AS variants were principally located in the chromosome arms and were associated with gene distribution. As expected, the AS variants in group 2V that matched with multiple locations were predominantly mapped to the most transcribed chromosomal regions. These results were consistent with the high number of group 2V variants in all AS genes (Figure 3).
Figure 3: Statistics of multiple AS variants and AS distribution with gene density along chromosome length in plants.
Each vertical bar represents the frequency of an annotated gene in the plant genome. A schematic of each chromosome and its features is shown below. Different colors represent different groups on the basis of the number of AS variants. Red, two variants; deep blue, three variants; yellow, four variants; green, five or more variants.
A comparison of the AS variants to the annotated plant genome showed that most of the AS variants that matched the genome can be mapped to annotated genic regions in plant species. However, approximately 50% of the AS variants cannot be mapped to the right chromosome in Aquilegia (Figure 3). Two reasons may explain this discrepancy. First, the depth of coverage of the Aquilegia genome was lower than the gene location coverage. Second, information regarding the annotated genes in each particular chromosome was limited. Overall, the distribution of multiple AS variants significantly positively correlated with gene density. Similar correlation patterns were observed in all plant species.
Multiple AS variants exert different effects toward improving proteome diversity
AS variants improve gene representation at the transcriptional and proteomic levels. The overall proteome from all the pooled data sets contained 211,558 protein isoforms, whereas each AS gene generated 2.87 mRNA variants and 2.71 protein variants. The number of variants per AS gene was not significantly different among the different plant species (Figure 1 and Table 1). These data indicated that AS enhanced transcriptome plasticity and proteome diversity in higher plants.
To explore the effects of multiple AS variants on proteome diversity, we determined the number of protein isoforms that were generated by the AS variants in the four groups. Intriguingly, the proportion of the effective AS events to generate various proteins was the highest in 2V group, but not in 4V or 5V+. Within group 2V, approximately 30% AS genes generated only one protein product, and similar proportions were observed for all plant species, except in Medicago (Figure 4). By contrast, approximately half of the genes from group 3V generated three different proteins, whereas the remaining genes produced one or two proteins (except in Medicago and maize) (Figure 4). The proportion of effective AS events noticeably increased with the number of mRNA variants. The observed proportion was even higher in groups 4V and 5V+ (Figure 4). In brief, a high proportion of the annotated genes which generated the isofroms numbers less than mRNA variants were found, indicating that the effects of AS on the improvement of proteome diversity were lower than previously thought, especially in groups 3V, 4V, and 5V+. The AS variants in the different AS groups exerted distinct influences on proteome diversity (Figure 4).
Figure 4: Comparison of studies on proteome diversity with multiple AS variants. From outside to inside, each circle represents the classification of AS as defined by the number of AS variants.
In this study, the total number and ratio of AS were below those of AS in previous studies on some species. Among the multiexonic genes, approximately 42% to 61% of those in Arabidopsis, approximately 56% of those in maize, and approximately 63% of those in soybean underwent AS. Two possible reasons may explain this result. First, several mRNA variants occurred in specific developmental states might have triggered in response to environmental conditions. Second, the basic information of the annotated gene in some species would be limited. Therefore, alignment quality, low transcript abundance, and transcriptome coverage directly influenced the number of annotated AS genes.
We noticed that the features of multiple AS variants in 4 variant groups differed in most plants food and horticultural. With the increasing numbers of variants, the AS genes contained more but shorter exons and introns, implying AS might improve gene representation by changing the extent and complexity of gene structure (Figure 2 and Table 2). We speculated that these trends may be related to the evolution of various species from a long-term perspective.
Our analysis suggested that a high proportion of AS events might occur in the UTRs. We hypothesized that a large proportion of the AS events in the UTRs plays crucial roles in gene regulation [33,34]. Growing evidence has indicated that splicing is a regulatory mechanism for noncoding RNA [35]. In a recent study, we found a stress-induced AS event in the 5′UTR that influences intronic miRNA expression in Arabidopsis [21,36]. However, we also found a few introns in the UTRs (both 5′UTR and 3′UTR). Further investigation should be conducted to determine whether or not the AS variants are governed by the same regulators in the UTRs. In summary, we investigated the relationship between the number of AS mRNA variants and their effects on gene features at the genome-wide level through a comprehensive comparison of 12 plant species, as a valuable resource for functional research on food crops and horticultural crops. The present reanalysis and reclassification of previously reported RNA-seq data sets evaluated the global existence of AS variants in plants and suggested the need for future analyses on mRNA variants [37,38].
This work was supported by the National Natural Science Foundation (grant numbers 31570271 and 31870234).
Ying Li and Kang Yan performed the research and wrote the manuscript. Shizhong Zhang, Qianhuan Guo and Chengchao Zheng conceived of the study, and participated in its design and contributed to revisions of the manuscript. All authors approved the final version of the manuscript.
Citation: Li Y, Guo Q, Zheng C, Zhang S, Yan K (2019) Genome-Wide Analysis of mRNA Splicing vaRiants in Higher Plants. J Hortic 6:255. doi: 10.35248/2376-0354.19.06.255
Received: 06-Apr-2019 Accepted: 24-Apr-2019 Published: 03-May-2019
Copyright: © 2019 Li Y, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.