ISSN: 0974-276X
Research Article - (2017) Volume 10, Issue 12
Proteases are hydrolases that cleave the peptide bond joining two amino acids. Chymotrypsin, the first serine protease to be discovered, was identified in the pancreas and transformed the study of these enzymes both in living systems and in industrial applications. Computational tools and techniques for identifying and analyzing proteases from organisms inhabiting extreme habitats have opened avenues for studying which sequence and structural features allow these enzymes to withstand extremes of pH or temperature. With this in view, sixteen amino acid sequences of serine proteases from mesophilic, thermophilic, hyperthermophilic and psychrophilic organisms were critically analyzed to identify the variation in physicochemical properties and amino acid composition responsible for their adaptation to these conditions. Analysis of the physicochemical properties showed that negatively charged residues (Asp + Glu) differed significantly among the groups, indicating a contribution to the stability of the proteases. Multiple sequence alignment of the amino acid sequences showed the catalytic triad (Asp-130, His-163 and Ser-315) to be conserved in all four groups. The amino acids Ala (A), Arg (R), Asn (N), Asp (D), Cys (C), Gly (G), Phe (F), Tyr (Y) and Val (V) were found to be statistically significant. Cysteine (C) was exceptionally high in the psychrophilic serine proteases in comparison to their counterparts. Phylogenetic analysis using the Neighbour Joining (NJ) method separated the thermophilic, mesophilic, hyperthermophilic and psychrophilic serine proteases into their respective groups.
Keywords: Proteases; Thermophiles; Mesophiles; Hyperthermophiles; Psychrophiles; Physicochemical properties; Amino acids
Serine proteases are the most studied class of proteases and have a histidine, an aspartic acid and a serine residue at the catalytic center. Microbial serine proteases have attracted growing interest in the last decade because they find applications mainly in leather tanning, detergent formulation and diagnostics [1-4]. Given their wide acceptability and high industrial demand, efforts are being made either to find novel proteases [5] or to tailor existing ones to withstand extremes of pH and temperature [6]. Conventional methods, which involve isolating microbes and screening them for desired products, are popular and widely followed in industrial microbiology, yet they are time consuming, tedious and cost intensive [7,8]. Newer tools and techniques in computational biology have generated a large amount of data in biological databases, which has opened new opportunities for researchers to analyze the attributes of proteins responsible for their stability at different pH values and temperatures [9,10]. A comparative study of the important properties and amino acid variation of proteins from organisms thriving under extreme conditions is an expensive venture when traditional in vitro approaches are used. Advances in computational biology and bioinformatics have made it possible to analyze and compare gene and protein sequence data in order to deduce and predict site-specific amino acids, motifs or domains responsible for protein stability under extremes of temperature, pH, salt, pressure and organic solvent concentration [11-13]. Although some information is available on serine proteases of microbes from various environments, an overall comparison of psychrophilic, mesophilic, thermophilic and hyperthermophilic proteases has not been carried out to date [14,15]. Some important physicochemical properties of enzymes, e.g. molecular mass, theoretical pI, amino acid composition, negatively and positively charged residues, extinction coefficient, instability index and grand average of hydropathicity, strongly influence their applications and need to be studied carefully. Besides these properties, variation in the total count of amino acids has been found to play a significant role in the stability, selectivity and reactivity of enzymes [11,16,17]. In view of the above, this communication reports a systematic comparative in silico analysis of the amino acid sequences and physicochemical properties of psychrophilic, mesophilic, thermophilic and hyperthermophilic microbial serine proteases. The observations should be useful for predicting whether a given serine protease behaves as mesophilic, thermophilic or psychrophilic in terms of its temperature stability.
Data collection and tools
The amino acid sequences of some microbial serine proteases from thermophiles, hyperthermophiles, mesophiles and psychrophiles were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/protein), the UniProt proteomic server (http://www.expasy.org/) and the MEROPS database (http://www.merops.sanger.ac.uk) and downloaded in FASTA format. The ProtParam tool (http://expasy.org/tools/protparam.html), available on the ExPASy proteomic server, was used to compare various physicochemical parameters among the different serine proteases.
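Several of the parameters reported by ProtParam are simple functions of the sequence. The following is a minimal sketch, not the ProtParam implementation, of three of them: the counts of charged residues, the aliphatic index (Ikai's formula, with mole percentages of Ala, Val, Ile and Leu) and the grand average of hydropathicity (mean Kyte-Doolittle hydropathy). The sequence `toy_sequence` is a hypothetical example and is not one of the sixteen proteases analyzed here.

```python
# Kyte-Doolittle hydropathy scale, used for the GRAVY score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charged_counts(sequence):
    """Return (negative, positive) = (Asp + Glu, Arg + Lys) residue counts."""
    negative = sequence.count("D") + sequence.count("E")
    positive = sequence.count("R") + sequence.count("K")
    return negative, positive


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percentage of the residue."""
    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / len(sequence)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


# Hypothetical 11-residue peptide, used only to exercise the functions.
toy_sequence = "MKVLAADEGRK"
print(charged_counts(toy_sequence))
print(round(aliphatic_index(toy_sequence), 2))
print(round(gravy(toy_sequence), 3))
```

Applied to a full-length protease sequence in place of `toy_sequence`, these functions should give values comparable to the corresponding ProtParam rows, although small differences in rounding are possible.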
To identify and highlight the conserved catalytic triad in the amino acid sequences of the proteases, multiple sequence alignment of the sequences from the various organisms was performed using Clustal Omega, and a phylogenetic tree was generated.
Statistical analysis
Analysis of variance (ANOVA) was used to compare each physicochemical parameter among the groups, using the statistical package ‘Assistat version-7.7 beta 2016’. F-tests were applied to determine statistical significance. The Tukey test was applied to all significant effects for pairwise comparison of the mean responses.
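The F-test behind a one-way ANOVA can be sketched in a few lines. The example below is an illustration in plain Python, not the Assistat procedure used in this study; it takes the Asp + Glu counts of the four groups as listed in Table 2 and computes the ratio of the between-group to the within-group mean square. The function name `one_way_anova_f` is an assumption made for this sketch.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2
        for group in groups
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Negatively charged residues (Asp + Glu), values as listed in Table 2.
asp_glu = {
    "Thermophiles": [40.0, 29.0, 35.0, 30.0],
    "Mesophiles": [56.0, 57.0, 56.0, 47.0],
    "Hyperthermophiles": [40.0, 68.0, 34.0, 37.0],
    "Psychrophiles": [59.0, 51.0, 80.0, 48.0],
}

f_statistic, df1, df2 = one_way_anova_f(list(asp_glu.values()))
print(f"F({df1}, {df2}) = {f_statistic:.2f}")
```

For these values the statistic is about 4.13 on 3 and 12 degrees of freedom. The tabulated critical values for F(3, 12) are roughly 3.49 at the 5% level and 5.95 at the 1% level, so the result falls between the two, which agrees with the single asterisk given for this parameter in Table 2.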
Computational analysis of physicochemical parameters of various proteases
In the present study, some important physicochemical parameters of the various groups of serine proteases were compared and significant differences were recorded. The overall analysis revealed only the negatively charged residues (Asp + Glu) to be statistically significant among all the groups of serine proteases (Tables 1 and 2). In pairwise comparisons of the groups, negatively charged residues (Asp + Glu) were significantly higher in mesophiles (1.61 fold) than in thermophiles. On the other hand, the aliphatic index, which is defined as the relative volume occupied by the aliphatic side chains in a protein, was significantly higher (1.05 fold) in thermophiles. Molecular weight and negatively charged residues were higher (1.16 and 1.35 fold, respectively) in mesophiles than in hyperthermophiles, whereas the aliphatic index was higher in hyperthermophiles (1.16 fold) than in mesophiles. Mesophilic and psychrophilic proteases also showed some significant differences: the molecular weight of the psychrophilic proteases was higher (1.12 fold) than that of the mesophilic proteases, whereas positively charged residues and theoretical pI were 1.16 and 1.24 fold higher in mesophiles than in psychrophiles. The instability index, which estimates the stability of a protein in a test tube, was the only parameter found to be significantly higher (1.34 fold) in thermophiles than in hyperthermophiles. Negatively charged residues (Asp + Glu) were significantly higher in psychrophiles than in thermophiles (1.32 fold), and the aliphatic index was significantly higher (1.19 fold) in hyperthermophilic proteases than in psychrophilic proteases.
Sr. No. | Accession number (UniProtKB/MEROPS) | Microorganism |
---|---|---|
Thermophiles | ||
1. | Q9AER6 | Thermoanaerobacter yonseii |
2. | P41363 | Bacillus halodurans |
3. | P08594 | Thermus aquaticus |
4. | P80146 | Thermus sp. (strain Rt41A) |
Mesophiles | ||
1. | P30199 | Staphylococcus epidermidis |
2. | Q8KH46 | Enterococcus faecalis |
3. | H2JJ14 | Clostridium sp. BNL1100 |
4. | MER016986 | Streptococcus mutans |
Hyperthermophiles | ||
1. | F4HL71 | Pyrococcus sp. NA2 |
2. | Q5JIZ5 | Thermococcus kodakarensis ATCC BAA-918 |
3. | G0EG32 | Pyrolobus fumarii |
4. | B8D5T9 | Desulfurococcus kamchatkensis |
Psychrophiles | ||
1. | B8CU08 | Shewanella piezotolerans |
2. | K4M7H8 | Methanolobus psychrophilus R15 |
3. | Q480E3 | Colwellia psychrerythraea ATCC BAA-681 |
4. | Q8GB52 | Vibrio sp. PA-44 |
Table 1: Sources of some microbial proteases from various environmental conditions and their accession number.
Parameter | Group | Microorganism 1 | Microorganism 2 | Microorganism 3 | Microorganism 4 | Significance |
---|---|---|---|---|---|---|
Number of amino acids | Thermophiles | 412.0 | 361.0 | 513.0 | 410.0 | ns |
Number of amino acids | Mesophiles | 461.0 | 412.0 | 564.0 | 447.0 | |
Number of amino acids | Hyperthermophiles | 422.0 | 663.0 | 401.0 | 411.0 | |
Number of amino acids | Psychrophiles | 608.0 | 529.0 | 789.0 | 530.0 | |
Molecular weight (Da) | Thermophiles | 44503.2 | 38115.8 | 53913 | 42876.4 | ns |
Molecular weight (Da) | Mesophiles | 51813.9 | 45570.2 | 59331.1 | 49196.3 | |
Molecular weight (Da) | Hyperthermophiles | 44986.0 | 70955.1 | 42709.8 | 44143.0 | |
Molecular weight (Da) | Psychrophiles | 61541.0 | 55101.8 | 80857.1 | 55682.5 | |
Theoretical pI | Thermophiles | 9.2 | 6.6 | 6.9 | 6.2 | ns |
Theoretical pI | Mesophiles | 9.4 | 4.9 | 5.2 | 4.9 | |
Theoretical pI | Hyperthermophiles | 5.3 | 4.8 | 9.0 | 5.2 | |
Theoretical pI | Psychrophiles | 4.7 | 4.9 | 4.4 | 4.6 | |
Negatively charged residues (Asp + Glu) | Thermophiles | 40.0 | 29.0 | 35.0 | 30.0 | * |
Negatively charged residues (Asp + Glu) | Mesophiles | 56.0 | 57.0 | 56.0 | 47.0 | |
Negatively charged residues (Asp + Glu) | Hyperthermophiles | 40.0 | 68.0 | 34.0 | 37.0 | |
Negatively charged residues (Asp + Glu) | Psychrophiles | 59.0 | 51.0 | 80 | 48.0 | |
Positively charged residues (Arg + Lys) | Thermophiles | 49.0 | 27.0 | 35.0 | 27.0 | ns |
Positively charged residues (Arg + Lys) | Mesophiles | 75.0 | 43.0 | 45.0 | 34.0 | |
Positively charged residues (Arg + Lys) | Hyperthermophiles | 33.0 | 46 | 43.0 | 30.0 | |
Positively charged residues (Arg + Lys) | Psychrophiles | 35.0 | 36.0 | 44.0 | 31.0 | |
Extinction coefficient (M⁻¹ cm⁻¹) at 280 nm | Thermophiles | 45965 | 30370 | 109585 | 56060 | ns |
Extinction coefficient (M⁻¹ cm⁻¹) at 280 nm | Mesophiles | 49405 | 33810 | 57300 | 60740 | |
Extinction coefficient (M⁻¹ cm⁻¹) at 280 nm | Hyperthermophiles | 81835 | 123540 | 55030 | 79315 | |
Extinction coefficient (M⁻¹ cm⁻¹) at 280 nm | Psychrophiles | 44975 | 63050 | 78325 | 54945 | |
Instability index | Thermophiles | 31.24 | 29.93 | 34.86 | 28.35 | ns |
Instability index | Mesophiles | 23.67 | 28.57 | 22.52 | 32.65 | |
Instability index | Hyperthermophiles | 20.33 | 18.1 | 30.02 | 23.82 | |
Instability index | Psychrophiles | 22.79 | 24.68 | 30.32 | 40.18 | |
Aliphatic index | Thermophiles | 95.17 | 90.8 | 73.68 | 90.98 | ns |
Aliphatic index | Mesophiles | 80.3 | 90.87 | 81.15 | 80.94 | |
Aliphatic index | Hyperthermophiles | 98.08 | 81.21 | 93.42 | 100.78 | |
Aliphatic index | Psychrophiles | 73.45 | 83.53 | 76.92 | 79.09 | |
Grand average of hydropathicity (GRAVY) | Thermophiles | -0.121 | -0.111 | -0.121 | 0.054 | - |
Grand average of hydropathicity (GRAVY) | Mesophiles | -0.683 | -0.333 | -0.165 | -0.456 | |
Grand average of hydropathicity (GRAVY) | Hyperthermophiles | 0.113 | -0.186 | -0.029 | 0.155 | |
Grand average of hydropathicity (GRAVY) | Psychrophiles | -0.02 | -0.013 | -0.115 | -0.181 | |
Thermophiles: 1) Thermoanaerobacter yonseii 2) Bacillus halodurans 3) Thermus aquaticus 4) Thermus sp. (strain Rt41A)
Mesophiles: 1) Staphylococcus epidermidis 2) Enterococcus faecalis 3) Clostridium sp. BNL1100 4) Streptococcus mutans
Hyperthermophiles: 1) Pyrococcus sp. NA2 2) Thermococcus onnurineus 3) Pyrolobus fumarii 4) Desulfurococcus kamchatkensis
Psychrophiles: 1) Shewanella piezotolerans 2) Methanolobus psychrophilus R15 3) Psychroflexus gondwanensis 4) Vibrio sp. PA-44
Table 2: Physicochemical parameters of various microorganisms calculated using the ProtParam tool on the ExPASy proteomic server.
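The fold differences quoted for the pairwise comparisons can be checked against Table 2. The sketch below assumes that a fold value is the ratio of the two group means, which is an assumption of this example rather than a stated step of the analysis; under that assumption it reproduces the values reported for Asp + Glu (mesophiles versus thermophiles), the aliphatic index (thermophiles versus mesophiles) and the instability index (thermophiles versus hyperthermophiles). Other fold values in the text may have been derived differently.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fold_difference(higher_group, lower_group):
    """Ratio of group means (assumed definition of 'fold' in this sketch)."""
    return mean(higher_group) / mean(lower_group)


# Values as listed in Table 2.
asp_glu_mesophiles = [56.0, 57.0, 56.0, 47.0]
asp_glu_thermophiles = [40.0, 29.0, 35.0, 30.0]
aliphatic_thermophiles = [95.17, 90.8, 73.68, 90.98]
aliphatic_mesophiles = [80.3, 90.87, 81.15, 80.94]
instability_thermophiles = [31.24, 29.93, 34.86, 28.35]
instability_hyperthermophiles = [20.33, 18.1, 30.02, 23.82]

comparisons = {
    "Asp + Glu, mesophiles / thermophiles":
        fold_difference(asp_glu_mesophiles, asp_glu_thermophiles),
    "Aliphatic index, thermophiles / mesophiles":
        fold_difference(aliphatic_thermophiles, aliphatic_mesophiles),
    "Instability index, thermophiles / hyperthermophiles":
        fold_difference(instability_thermophiles, instability_hyperthermophiles),
}

for label, value in comparisons.items():
    print(f"{label}: {value:.3f} fold")
```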
Computational analysis of the twenty amino acids of bacterial proteases
Overall comparison of the amino acids of the various serine proteases showed Ala (A), Arg (R), Asn (N), Asp (D), Cys (C), Gly (G), Phe (F), Tyr (Y) and Val (V) to be statistically significant (Table 3). Comparative analysis between mesophilic and thermophilic serine proteases revealed Ala (1.70 fold), Gly (1.30 fold), Pro (1.8 fold), Arg (1.2 fold) and Val (2.2 fold) to be significantly higher in thermophiles, whereas Asp (1.6 fold) was significantly higher in mesophiles. Ala (A), Arg (R), Gly (G) and Val (V) were significantly higher (1.5, 2.0, 1.4 and 1.8 fold, respectively) in hyperthermophiles than in mesophiles, which in turn had more Asn (N) and Phe (F) (2.2 and 1.3 fold). In the comparison between psychrophiles and mesophiles, Cys (C), Gly (G) and Val (V) were significantly higher (9.5, 1.5 and 1.28 fold, respectively) in psychrophilic serine proteases, whereas Glu (E), Ile (I) and Phe (F) were significantly higher (1.7, 1.5 and 1.3 fold, respectively) in mesophilic bacteria.
Amino acid | Group | Microorganism 1 | Microorganism 2 | Microorganism 3 | Microorganism 4 | Significance |
---|---|---|---|---|---|---|
Ala (A) | Thermophiles | 8.7 | 11.6 | 12.5 | 13.9 | * |
Ala (A) | Mesophiles | 4.6 | 5.8 | 10.1 | 6.5 | |
Ala (A) | Hyperthermophiles | 10.2 | 10.0 | 11.2 | 9.5 | |
Ala (A) | Psychrophiles | 14.1 | 10.8 | 10.8 | 7.9 | |
Arg (R) | Thermophiles | 2.9 | 3.9 | 5.3 | 4.6 | * |
Arg (R) | Mesophiles | 1.7 | 2.7 | 1.1 | 1.8 | |
Arg (R) | Hyperthermophiles | 4.5 | 1.1 | 4.0 | 4.1 | |
Arg (R) | Psychrophiles | 2.5 | 1.1 | 1.6 | 3.4 | |
Asn (N) | Thermophiles | 6.1 | 7.8 | 4.1 | 3.9 | * |
Asn (N) | Mesophiles | 10.2 | 8.7 | 5.3 | 10.5 | |
Asn (N) | Hyperthermophiles | 4.5 | 5.3 | 4.0 | 4.9 | |
Asn (N) | Psychrophiles | 6.9 | 5.7 | 5.8 | 6.6 | |
Asp (D) | Thermophiles | 5.3 | 2.5 | 4.3 | 4.4 | ** |
Asp (D) | Mesophiles | 6.5 | 8.3 | 6.9 | 6.5 | |
Asp (D) | Hyperthermophiles | 5.9 | 7.7 | 5.2 | 6.1 | |
Asp (D) | Psychrophiles | 6.9 | 6.4 | 7.1 | 7.2 | |
Cys (C) | Thermophiles | 0.5 | 0.0 | 1.4 | 1.2 | * |
Cys (C) | Mesophiles | 0.4 | 0.0 | 0.0 | 0.2 | |
Cys (C) | Hyperthermophiles | 0.5 | 0.0 | 1.2 | 0.5 | |
Cys (C) | Psychrophiles | 1.6 | 0.9 | 1.3 | 1.9 | |
Gln (Q) | Thermophiles | 1.5 | 3.6 | 3.1 | 3.9 | ns |
Gln (Q) | Mesophiles | 2.8 | 1.7 | 2.7 | 5.8 | |
Gln (Q) | Hyperthermophiles | 1.9 | 3.6 | 2.7 | 1.5 | |
Gln (Q) | Psychrophiles | 1.6 | 1.7 | 2.9 | 5.7 | |
Glu (E) | Thermophiles | 4.4 | 5.5 | 2.5 | 2.9 | ns |
Glu (E) | Mesophiles | 5.6 | 5.6 | 3.0 | 4.0 | |
Glu (E) | Hyperthermophiles | 3.6 | 2.6 | 3.2 | 2.9 | |
Glu (E) | Psychrophiles | 2.8 | 3.2 | 3.0 | 1.9 | |
Gly (G) | Thermophiles | 9.0 | 9.1 | 12.1 | 10.0 | ** |
Gly (G) | Mesophiles | 6.9 | 7.0 | 8.3 | 8.3 | |
Gly (G) | Hyperthermophiles | 11.4 | 10.3 | 10.2 | 10.0 | |
Gly (G) | Psychrophiles | 13.5 | 9.8 | 12.0 | 10.8 | |
His (H) | Thermophiles | 1.5 | 2.8 | 1.2 | 1.7 | ns |
His (H) | Mesophiles | 1.1 | 1.2 | 1.2 | 1.3 | |
His (H) | Hyperthermophiles | 1.2 | 1.5 | 1.5 | 1.0 | |
His (H) | Psychrophiles | 1.8 | 1.3 | 1.3 | 0.9 | |
Ile (I) | Thermophiles | 9.2 | 6.4 | 2.7 | 3.4 | ns |
Ile (I) | Mesophiles | 5.9 | 9.7 | 6.4 | 8.5 | |
Ile (I) | Hyperthermophiles | 5.2 | 5.4 | 6.5 | 7.1 | |
Ile (I) | Psychrophiles | 5.1 | 6.0 | 4.6 | 4.2 | |
Leu (L) | Thermophiles | 7.5 | 6.9 | 7.6 | 10.0 | ns |
Leu (L) | Mesophiles | 8.2 | 7.8 | 6.6 | 6.3 | |
Leu (L) | Hyperthermophiles | 6.4 | 6.3 | 7.7 | 8 | |
Leu (L) | Psychrophiles | 4.6 | 5.9 | 6.1 | 7.9 | |
Lys (K) | Thermophiles | 9.0 | 3.6 | 1.6 | 2.0 | ns |
Lys (K) | Mesophiles | 14.5 | 7.8 | 6.9 | 5.8 | |
Lys (K) | Hyperthermophiles | 3.3 | 5.9 | 6.7 | 3.2 | |
Lys (K) | Psychrophiles | 3.3 | 5.7 | 3.9 | 2.5 | |
Met (M) | Thermophiles | 1.5 | 1.9 | 1.4 | 1.2 | ns |
Met (M) | Mesophiles | 1.7 | 2.2 | 0.5 | 1.3 | |
Met (M) | Hyperthermophiles | 1.4 | 1.7 | 1.2 | 1.7 | |
Thermophiles: 1) Thermoanaerobacter yonseii 2) Bacillus halodurans 3) Thermus aquaticus 4) Thermus sp. (strain Rt41A)
Mesophiles: 1) Staphylococcus epidermidis 2) Enterococcus faecalis 3) Clostridium sp. BNL1100 4) Streptococcus mutans
Hyperthermophiles: 1) Pyrococcus sp. NA2 2) Thermococcus onnurineus 3) Pyrolobus fumarii 4) Desulfurococcus kamchatkensis
Psychrophiles: 1) Shewanella piezotolerans 2) Methanolobus psychrophilus R15 3) Psychroflexus gondwanensis 4) Vibrio sp. PA-44
** Significant at a level of 1 % of probability (P<0.01)
* Significant at a level of 5 % of probability (0.01 ≤ P<0.05)
ns non-significant (P ≥ 0.05)
Table 3: Comparative analysis of amino acid residues in thermophiles, mesophiles, hyperthermophiles and psychrophiles.
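The entries of Table 3 are percentage compositions, which can be derived from a sequence by counting residues. The sketch below shows one way to do this and then checks the 9.5 fold figure for cysteine against the Cys rows of Table 3, assuming that the fold value is the ratio of the psychrophile mean to the mesophile mean. The sequence `toy_sequence` is hypothetical and the function names are illustrative.

```python
from collections import Counter

# One-letter codes of the twenty standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(sequence):
    """Percentage of each standard amino acid, rounded to one decimal."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: round(100.0 * counts[aa] / total, 1) for aa in AMINO_ACIDS}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Hypothetical 10-residue peptide, used only to exercise the function.
toy_sequence = "MCGACVCGAA"
composition = composition_percent(toy_sequence)
print({aa: pct for aa, pct in composition.items() if pct > 0})

# Cys (C) percentages as listed in Table 3.
cys_psychrophiles = [1.6, 0.9, 1.3, 1.9]
cys_mesophiles = [0.4, 0.0, 0.0, 0.2]
cys_fold = mean(cys_psychrophiles) / mean(cys_mesophiles)
print(f"Cys, psychrophiles / mesophiles: {cys_fold:.1f} fold")
```

Because two of the four mesophilic proteases contain no cysteine at all, the mesophile mean is small and the ratio is correspondingly sensitive to individual sequences.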
Multiple sequence alignment and phylogenetic analysis
Multiple sequence alignment (MSA) showed the presence of the conserved catalytic triad of D-130, H-163 and S-315, which is responsible for the catalytic activity of serine proteases (Figure 1). A phylogram was generated using the Neighbor Joining method to study the evolutionary relationship among the organisms for the four groups of serine proteases (Figure 2).
Figure 1: Multiple sequence alignment (MSA) of bacterial amino acid sequences of serine proteases from thermophilic, mesophilic, hyperthermophilic and psychrophilic microorganisms with their catalytic triad of D-130, H-163 and S-315.
Figure 2: Phylogenetic tree of bacterial serine protease sequences from thermophilic, mesophilic, hyperthermophilic and psychrophilic organisms constructed by NJ-method of CLC workbench software.
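Identifying conserved positions such as the catalytic triad amounts to scanning the alignment column by column. The following minimal sketch illustrates the idea on a hypothetical three-sequence toy alignment; it is not the Clustal Omega output of this study, and the reported positions are 1-based alignment columns rather than residue numbers in any particular protease.

```python
def conserved_columns(alignment):
    """Return (column, residue) pairs for columns in which every aligned
    sequence carries the same residue and none has a gap ('-')."""
    length = len(alignment[0])
    if any(len(sequence) != length for sequence in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for index in range(length):
        column = {sequence[index] for sequence in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append((index + 1, alignment[0][index]))
    return conserved


# Hypothetical aligned fragments; D, H and S are placed in fully
# conserved columns to mimic a catalytic triad.
toy_alignment = [
    "AD-GHKS",
    "VDTGH-S",
    "ADSAHRS",
]

for column, residue in conserved_columns(toy_alignment):
    print(f"column {column}: {residue}")
```

Applied to a real alignment, the same scan lists every fully conserved column, so the catalytic residues still have to be picked out from that list using prior knowledge of the enzyme family.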
The study of the fundamentals of protein stability and the discovery of enzymes that tolerate extremes of temperature and pressure have led to many practical applications in industry and for the scientific community. Understanding how these enzymes withstand extreme conditions could help in designing proteins with better selectivity, reactivity and stability. The amino acid sequences of the four groups of serine proteases, i.e. mesophilic, thermophilic, hyperthermophilic and psychrophilic, were distinguished using sequence-based and statistical methods. Analysis of the physicochemical properties and amino acid compositions of the different groups revealed a clear-cut segregation that indicates what allows proteins to work at extremes of temperature. Detailed comparative and statistical analyses confirmed the separation of the mesophiles from the other three classes, i.e. psychrophiles, thermophiles and hyperthermophiles, in terms of amino acid usage. The broad industrial applications of serine proteases have drawn considerable interest from researchers in engineering and producing proteases with better stability and selectivity [6,18], which would bring economic and environmental benefits [19-21]. The diversity of the twenty amino acids and their combinations makes proteins differ in their physicochemical properties as well as in substrate specificity [11,18,22]. Alanine (A) and proline (P), which predominate in both thermostable and hyperthermostable proteases, expose less nonpolar surface area and tend to be buried in the core [23]. Glycine (G) and valine (V) are responsible for compact core packing and functional regulation [24,25]. The hydrophobic core is necessary for folding and stability, so the more hydrophobic interactions a protein has, the more stable it is, i.e. the higher the thermostability it attains [26].
Another important amino acid, proline (P), which was higher in thermophilic proteases, provides rigidity and reduces the free energy of the main chain [27]. Proline is said to be highly prevalent in thermophilic proteins because its side chain has a distinctive cyclic structure that locks the backbone and leads to exceptional conformational rigidity in turns and loops [28]. Cysteine (C) content was exceptionally high in psychrophilic proteases compared with their hyperthermophilic, thermophilic and mesophilic counterparts, reaching 9.5 fold relative to mesophiles. Cysteine residues tend to provide flexibility and are capable of making cavities in the core of the psychrophilic protein structure [29,30], which imparts extra stability to psychrophilic proteins. Cysteine residues also play a dual role: they increase thermostability by forming disulphide bridges, but decrease it when present in free form, because free cysteine is highly sensitive to oxidation at elevated temperature [31]. In keeping with this, the present study found the highest frequency of Cys (C) in psychrophilic proteases in comparison to their counterparts. This natural variation, or changes introduced through mutagenesis under controlled temperature conditions, could be used to tailor proteases, which would be a boon for the food industry and for humankind.
The higher levels of Ala (A), Gly (G), Pro (P), Arg (R) and Val (V) in thermophiles and of Asp (D) in mesophiles clearly discriminate the thermophiles from the mesophiles. The amino acid residues Ala (A), Arg (R), Gly (G) and Val (V), which were significantly higher in hyperthermophiles, and Asn (N) and Ser (S), which were higher in mesophilic bacteria, demarcate the mesophiles from the hyperthermophiles. Similarly, the exceptionally high Cys (C) content of psychrophiles differentiates them from their counterparts. The results of the present study should help in understanding the role of amino acids, especially cysteine, and in developing practical strategies for engineering serine proteases for use in different industries and in biological and bioremediation processes.
The authors declare that they have no conflict of interests.
The authors are thankful to the Department of Biotechnology (DBT), New Delhi for the continuous support to the Bioinformatics Centre, Himachal Pradesh University, Summer Hill, Shimla, India.