Bioinformatics Relevance in Biotechnology

Sai YRKM; Siva Kishore N; Dattatreya A; Anand SY

doi:10.4172/jpb.1000205

Review Article - (2011) Volume 4, Issue 12

Sai YRKM¹^*, Siva Kishore N²^#, Dattatreya A³ and Anand SY¹: ¹Department of Biochemistry, GITAM Institute of Sciences, GITAM University, Visakhapatnam, Andhra Pradesh, India; ²Department of Bioinformatics, GITAM Institute of Sciences, GITAM University, Visakhapatnam, Andhra Pradesh, India; ³Department of Microbiology, GITAM Institute of Sciences, GITAM University, Visakhapatnam, Andhra Pradesh, India; ^#Contributed equally to this work

^*Corresponding Author: Sai YRKM, GITAM Institute of Sciences, GITAM University, Visakhapatnam, A.P, India, Tel: +91-9160066147

Keywords: Bioinformatics; Bio-availability; Biotechnology; Proteomics; in silico Analysis; Proteomics analysis

Bioinformatics Relevance in Biology

Bioavailability has been defined in many ways. In the sense used here, it is the fraction of an administered amount of a substance that appears in the bloodstream. This review deals with both bioavailability and bio-accessibility; the latter is a related concept used in the context of biodegradation and environmental pollution. A molecule is said to be bioavailable when “it is available to cross an organism’s cellular membrane from the environment, if the organism has access to the chemical” [1]. In present-day studies, the presence of contaminants in small, bioavailable quantities has generated concerns about health threats resulting from the accumulation of potential toxins in the food chain [2]. Similar concerns arise in the mining industry, where bioaccessibility tests have not yet been conducted on some materials even though such tests are essential for better health risk estimates [3].
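The working definition above (the fraction of an administered amount that appears in the blood) is commonly quantified in pharmacokinetics as absolute bioavailability, F, by comparing dose-normalized areas under the plasma concentration-time curve (AUC) for the tested route and for an intravenous reference. The following is a minimal sketch of that calculation, not a method taken from the cited studies; the function names, sampling times, concentrations and doses are all invented for illustration.

```python
def auc_trapezoid(times, concentrations):
    """Area under a concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need two or more matching time/concentration points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += dt * (concentrations[i] + concentrations[i - 1]) / 2.0
    return area


def absolute_bioavailability(auc_test, dose_test, auc_iv, dose_iv):
    """F = (AUC_test / dose_test) / (AUC_iv / dose_iv)."""
    return (auc_test / dose_test) / (auc_iv / dose_iv)


# Hypothetical sampling times (h) and plasma concentrations (mg/L).
times = [0, 1, 2, 4, 8]
oral = [0.0, 2.0, 3.0, 2.0, 0.0]
iv = [10.0, 6.0, 4.0, 2.0, 0.0]

auc_oral = auc_trapezoid(times, oral)
auc_iv = auc_trapezoid(times, iv)

# Same hypothetical dose (100 mg) by both routes.
f_oral = absolute_bioavailability(auc_oral, 100.0, auc_iv, 100.0)
print(f"AUC oral = {auc_oral}, AUC iv = {auc_iv}, F = {f_oral:.3f}")
```

With these invented values a little over half of the oral dose reaches the circulation. Bio-accessibility in the environmental sense discussed here is measured differently and is not covered by this sketch.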

Bio-accessibility refers to the biological availability of compounds other than drugs, in compartments other than blood; it is not confined to serological studies. Examples include metals and chemicals used as biomaterials, such as hydroxyapatite, zirconia and alumina [4]; the availability of hydroxyapatite obtained from the shell of the garden snail Helix aspersa [5] can be regarded as the bio-accessibility of hydroxyapatite. Similarly, pure titanium and titanium alloys are widely used in orthopedics and dental surgery [6], and the availability of titanium in the body should be taken into account when studying its bioavailability. All of these are aspects of biology that deal with bio-accessibility.

Further studies go beyond the use of biological materials in product synthesis and extend to in vitro work. For instance, glycine betaine can be synthesized using the glycine betaine biosynthesis genes GbsA and GbsB of Bacillus subtilis [7]. Similarly, in silico analysis of many other proteins and enzymes [8] has led such studies to draw increasingly on bioinformatics, and sequence analysis in particular has driven their development. Disease and drugs can modulate the concentrations of hundreds of proteins in the blood, which can be accurately measured using contemporary proteomic methods and analyzed in silico [9]. Such in silico analyses can help in understanding biosynthetic pathways; for instance, they have been applied to thienamycin and other highly substituted carbapenems [10]. In parallel, qualitative proteomics is emerging as a means of annotating novel and uncharacterized proteins, an integration of applied bioinformatics [11]. As biology and biotechnology have progressed, metabolic profiling has developed alongside techniques that measure changes in gene expression, such as 2-D gel electrophoresis [12] and mass spectrometric protein profiling, a novel tool for analyzing protein expression patterns in humans and other subjects [13,14]. For these techniques it is important to investigate the reproducibility of raw mass spectrometry (MS) abundance features, such as spectral count, peptide number and ion intensity values, when conducting replicate MS measurements [15]. Mass spectral analysis can also be used to assess the efficacy of trypsin-immobilization techniques that use magnetic mesospores [16]. Together with serial analysis of gene expression [17] and cDNA microarray analysis [18], these methods point to the relevance of developing sequence analysis tools for further studies of the bioaccessibility of compounds. For example, microarray analysis and serial analysis of gene expression showed that GLUT-1 overexpression correlates with an aggressive phenotype (KRAS mutations) of lung carcinoma [19].

Today, studies across a broad spectrum are being developed in relation to bioinformatics as its technologies and databases are upgraded. Some of the tools are discussed briefly below in order to relate bioaccessibility to bioavailability, to biotechnology and finally to bioinformatics. Most of the applications lie in biotechnology and biochemistry, with a touch of physical chemistry and informatics and a strong reliance on databases. Homology modeling is one of the established modeling approaches in bioinformatics. For example, the 3D structure of rat cathepsin L was constructed by homology modeling using the X-ray structure of procathepsin L from Homo sapiens (PDB code: 1CS8) with the MODELLER 9v2 software. The final model, obtained by molecular mechanics and dynamics methods, was assessed by PROCHECK and the VERIFY 3D graph, which showed that the final refined model is reliable; it could be explored further for characterizing the protein [20]. Homology modeling has also been used to develop three-dimensional (3D) structural models of the M5 muscarinic acetylcholine receptor (mAChR) and of two complexes of M5 mAChR bound to the antagonists SVT-40776 and solifenacin [21], which shows the application of bioinformatic tools in biotechnology. LEA proteins have also been studied; they are ubiquitous among photosynthetic organisms and have been reported in mono- and dicot plants as well as in nematodes, yeast, bacteria and cyanobacteria. EMV2 is a Group 1 LEA protein isolated from Vigna radiata that is speculated to impart desiccation tolerance in plants. A homology model of this protein was generated with the LOOPP software based on available structural homologues in protein databases, and the final model, obtained by molecular mechanics and dynamics methods, was assessed by PROCHECK [22].

Computational tools that predict the functional role of mRNAs targeted by miRNAs in colon cancer genes have also been developed on the basis of miRNAs and SNPs. One method uses PupaSuite, UTRscan and miRBase as a pipeline for predicting miRNAs and their targets and for evaluating the functional role of the mRNAs in colon cancer [23]. Magnetic bead-based purification (ClinProt system) followed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) has been developed to profile human tear proteins [24]. Biophysical, biochemical and classical immunological methods have proven very valuable in studying carbohydrate-carbohydrate and carbohydrate-protein interactions [25]. Glycan profiling and sequencing with MALDI-MS, MALDI-FTMS, CID MS/MS and MALDI-TOF-MS have been developed for high-throughput structural characterization of glycomes [25]; as one example, coupling MALDI with linear ion trap mass spectrometry has allowed measurements to be automated [26]. Human fetal liver has been studied using a 2-DE step followed by western blotting detection and MS identification. Possible phosphorylation sites were then predicted using the NetPhos, ScanProsite and Scansite programs, and for most proteins the same site was predicted by at least two programs [27].
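The agreement criterion mentioned for the phosphorylation-site predictions (a site is retained when at least two programs report it) amounts to a simple voting rule. A minimal sketch of such a rule is given below; it does not call the actual prediction programs, and the function name and the example site lists are hypothetical.

```python
from collections import Counter


def consensus_sites(predictions, min_support=2):
    """Return sites reported by at least `min_support` predictors, sorted."""
    counts = Counter()
    for sites in predictions.values():
        counts.update(set(sites))  # each predictor votes once per site
    return sorted(site for site, n in counts.items() if n >= min_support)


# Hypothetical outputs, given as (residue, position) pairs per program.
predictions = {
    "NetPhos": [("S", 15), ("T", 42), ("Y", 88)],
    "ScanProsite": [("S", 15), ("Y", 88), ("S", 120)],
    "Scansite": [("T", 42), ("S", 15)],
}

agreed = consensus_sites(predictions)
print(agreed)
```

Raising `min_support` to 3 keeps only the sites on which all three predictors agree, which trades sensitivity for specificity.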

Imaging mass spectrometry (IMS) is an emerging technology, pioneered by Prof. Richard Caprioli’s group more than a decade ago. Two different concepts of automated matrix deposition on murine brain sections have been described, together with a discussion of their different features and capabilities in IMS [28].

Bioinformatics Help in Biotechnology

Bioinformatics is the field of science in which biology, computer science and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

The rationale for applying computational approaches to facilitate the understanding of various biological processes includes:

1. A more global perspective in experimental design.

2. The ability to capitalize on the emerging technology of database-mining - the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms.

Bioinformatics is applicable to biotechnology in many areas, including genomics, proteomics (a powerful methodology for investigating protein expression in cells, tissues, organs or whole organisms [29]), cheminformatics, drug design, molecular phylogenies, drug modification, genome mapping and protein modeling.

Applications

Lung cancer (LC) is one of the most common causes of cancer death throughout the world, and gene expression profiling has been used successfully to classify various tumors and assess tumor stages [30]. Primers for the cold-induced gene DREB1A have been designed using the Primer3 software [31]. Alzheimer’s disease is a progressive neurodegenerative disorder characterized by the deposition of amyloid plaques composed of aggregated amyloid beta, and of neurofibrillary tangles composed of hyperphosphorylated tau, which lead to synaptic defects resulting in neuritic dystrophy and neuronal death. Data retrieved from various online biological databases indicated that 74 genes may cause Alzheimer’s disease, and the 74 proteins likely to be involved in the disease were evaluated using ClustalW and phylogenetic tree analysis [32]. Two-dimensional gel electrophoresis can retrieve information on thousands of different proteins from a crude protein sample, and a web server for the analysis and comparison of 2D gels using bioinformatics tools has been developed [33]. Functional analysis and interpretation of large-scale proteomics and gene expression data require effective use of bioinformatics tools and public knowledge resources, coupled with expert-guided examination [34]. The appropriate theoretical description of the node degree distribution of the yeast protein-protein interaction network has been investigated to address the observed discrepancy between the commonly proposed models and the existing data [35].
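The node degree distribution discussed for the yeast protein-protein interaction network is the fraction of proteins that have exactly k interaction partners, tabulated over k. A minimal sketch of how it can be computed from a list of interacting pairs follows; the function name and the five-protein edge list are hypothetical and are not data from [35].

```python
from collections import Counter, defaultdict


def degree_distribution(edges):
    """Map each degree k to the fraction of nodes with exactly k partners."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)  # sets make duplicate pairs harmless
        neighbours[b].add(a)
    degree_counts = Counter(len(partners) for partners in neighbours.values())
    n_nodes = len(neighbours)
    return {k: count / n_nodes for k, count in sorted(degree_counts.items())}


# Hypothetical interaction pairs; ("B", "A") repeats ("A", "B") on purpose.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("B", "A")]

dist = degree_distribution(edges)
print(dist)
```

An empirical distribution of this kind is what candidate models (for example, a power law) would be fitted to and compared against.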

Lectin microarray is an emerging technique that enables multiplex glycan profiling in a direct, rapid and sensitive manner. So far, no robust system has been available for the efficient data mining needed for differential profiling, which is an effective approach to biomarker investigation. In one study, data were processed by the microarray system using a max-normalization procedure after a gain-merging process, followed by principal component analysis [36]. During Human Immunodeficiency Virus infection, interactions take place between the host and the pathogen, and these interactions largely determine the efficiency of viral infection and the progression of disease. A theoretical structure of VpR was generated using Modeller9v1, a program for comparative modeling of proteins by satisfaction of spatial restraints; this structure is believed to pave the way for the synthesis of novel leads [37]. A data mining approach was used to generate association rules for predicting average flexibility from various derived sequence and structural features. Twenty-one parameters were calculated, and their variable importance was determined for 115 sequences of the AGC kinase family from mouse and human using Classification and Regression Tree (CART). Beta turns were found to have the greatest influence on average flexibility, whereas total beta strands had the least [38].
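The max-normalization step mentioned for the lectin microarray data can be illustrated as scaling each array so that its strongest lectin signal equals 1, after which the features are mean-centered, as is usual before principal component analysis. The sketch below assumes this common reading of "max-normalization"; the exact procedure of [36], including the gain-merging step, is not reproduced, and the function names and intensity values are hypothetical.

```python
def max_normalize(signals):
    """Scale one array's lectin signals so that the strongest spot equals 1.0."""
    peak = max(signals.values())
    if peak <= 0:
        raise ValueError("max-normalization needs at least one positive signal")
    return {lectin: value / peak for lectin, value in signals.items()}


def center_columns(rows):
    """Subtract each feature's mean across samples (the usual step before PCA)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


# Hypothetical raw intensities for two samples on three lectins.
sample_a = {"ConA": 1200.0, "WGA": 600.0, "SNA": 300.0}
sample_b = {"ConA": 500.0, "WGA": 1000.0, "SNA": 250.0}

lectins = sorted(sample_a)  # fixed column order: ConA, SNA, WGA
normalized = [max_normalize(s) for s in (sample_a, sample_b)]
matrix = [[sample[name] for name in lectins] for sample in normalized]
centered = center_columns(matrix)
print(matrix)
print(centered)
```

The centered matrix is the input from which principal components would then be extracted, for instance by singular value decomposition.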

The role of several proteins likely to be involved in hypertriglyceridemia has been examined by multiple sequence alignment with the ClustalW tool, using functional protein sequences extracted from NCBI; a phylogenetic tree was then constructed with the Neighbor Joining algorithm [39]. Proteolytic enzymes and their physicochemical properties, including the enzyme’s class, source, EC number, molecular weight, N-terminal, C-terminal, thiols, activators, inhibitors and bond specificity, have been curated in a single database covering proteins such as serine, cysteine, aspartic and metalloproteinases, which is an application of bioinformatics [40]. The discovery of short interfering RNA (siRNA) has permitted the development of facile, regulated methods for the disruption of gene expression. Although this method continues to grow in popularity, designing effective siRNA can be demanding. siRNA Scanner is a selection program that automatically selects siRNAs from given RNA sequences and uses a fuzzy logic-based system to calculate siRNA quality. The program is built entirely in Practical Extraction and Report Language (PERL5.8.8.6 Build 820) and is accessible through a command-line interface. Its high performance, minimal user interaction and fast algorithm make it useful for selecting siRNAs for gene expression studies [41]. Virtual screening by molecular docking has become a widely used approach to lead discovery in the pharmaceutical industry when a high-resolution structure of the biological target of interest is available; tools such as ArgusLab, AutoDock and FlexX have been developed for this purpose [42]. Protein-protein interactions can be studied and used with PIANA, a protein-protein interaction software framework released under the GNU Public License (http://sbi.imim.es/piana) [43].
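The Neighbor Joining algorithm used for the hypertriglyceridemia tree works on a matrix of pairwise distances and repeatedly joins the pair of taxa that minimizes the criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from taxon i to all others. The sketch below performs only the first joining step on a small toy distance matrix; the function name and the matrix are illustrative and are not data from [39]. A full implementation would replace the joined pair with a new node, update the distances and repeat.

```python
def nj_first_join(dist):
    """Return (i, j, q, branch_i, branch_j) for the first neighbor-joining merge."""
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    totals = [sum(row) for row in dist]  # r_i for every taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    i, j, q = best
    # Branch lengths from the joined taxa to the new internal node.
    branch_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return i, j, q, branch_i, branch_j


# Toy symmetric distance matrix for five taxa (a, b, c, d, e).
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

first = nj_first_join(toy)
print(first)
```

Note that taxa d and e have the smallest raw distance (3), yet a and b are joined first because the criterion corrects for each taxon's overall distance to the rest of the set.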

Mutational bias and translational selection are also important factors in shaping codon usage bias, and they can be analyzed through the Nc plot and correspondence analysis [44]. A computational tool was used to model the RNA secondary structure of nine different strains of Influenza A virus. The thermodynamic free energy of the NS ranged from -222.90 to -251.10 kcal/mol, which may provide new insight into the evolutionary stability and pathogenesis of influenza virus [45]. Using molecular modeling techniques, 181 plant proteins from 18 different Indian medicinal plants were modeled and assembled into a dataset, IMPPDS; the models were constructed using MODELLER 9v2 with careful manual pairwise alignment [46]. DataBiNS-Viz is a visualization and exploration environment for nonsynonymous coding single nucleotide polymorphism (nsSNP) data gathered by the BioMoby-based DataBiNS workflow. DataBiNS-Viz enables execution of the DataBiNS workflow on proteins described by KEGG, PubMed or OMIM identifiers, followed by manual exploration of the integrated structure/function and pathway data for those proteins, with a particular focus on nsSNP data in context [47]. Transcriptomic and proteomic technologies are in vogue for analyzing biological entities. The completeness of their probe sets or search databases is a prerequisite for full identification and can be estimated from their coverage of the genome [48].
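In an Nc plot, the effective number of codons (Nc) of each gene is plotted against its GC content at third codon positions and compared with the curve expected when codon usage is governed by mutational bias alone, which Wright (1990) gave as Nc = 2 + s + 29 / (s² + (1 - s)²), with s the GC3s value. The sketch below evaluates that expected curve. The `gc3` helper is a simplification that counts every third position (a strict GC3s calculation excludes Met, Trp and stop codons), and the example sequence is invented.

```python
def gc3(coding_sequence):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Simplified: every third position is counted, including Met/Trp codons.
    """
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    third = seq[2:usable:3]
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def expected_nc(gc3s):
    """Expected Nc under mutational bias alone (Wright's null curve)."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Invented in-frame fragment: codons ATG GCT AAG TTA.
s = gc3("ATGGCTAAGTTA")
print(s, expected_nc(s))
```

Genes whose observed Nc falls well below this curve are usually interpreted as being under additional pressure such as translational selection.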

Today it is possible, using heuristic algorithms, to perform searches that are 80% accurate, and perhaps 90-95% accurate with the leading software systems; such algorithms form the basis of efficient methods for reconstructing gene content by co-evolution [49]. Hyper-heuristics, meanwhile, raise the generality of search methodologies by manipulating a set of low-level heuristics [50]. The whole area of biology and biotechnology can benefit immensely from the bioinformatic approach, and bioinformatics tools for efficient research will have significant implications for biotechnology and the betterment of human lives. Junker is a web server developed to facilitate in-depth analysis of intergenic regions (IGRs) by examining their length distribution, four-quadrant plots, GC percentage and repeat details. Upon selection of a particular bacterial genome, the physical genome map is displayed as multiple loci, with options to view any locus of interest in detail. In addition, an IGR statistics module has been implemented in the web server to analyze the length distribution of the IGRs and, by generating four-quadrant plots, to understand the disordered grouping of IGRs across the genome [51]. The mechanism of calcium uptake, translocation and accumulation in Poaceae is not yet fully understood. To address this issue, a genome-wide comparative in silico analysis of the calcium (Ca2+) transporter gene family was conducted in two crop species, rice and sorghum. Gene annotation, identification of upstream cis-acting elements, phylogenetic tree construction and syntenic mapping of the gene family were performed using several bioinformatics tools [52]. MiRNAs play a relevant role in regulating gene expression in a variety of physiological and pathological conditions, including autoimmune disorders, and are also important in the differentiation and function of the mouse intestinal epithelium. Bioinformatics analysis helped to predict putative target genes of miRNAs and to select biological pathways.
The presence of NOTCH1, HES1, KLF4, MUC-2, Ki67 and beta-catenin proteins in the small intestine in CD was tested by immunohistochemistry [53]. Microarrays are a valuable technology for studying fungal physiology at the transcriptomic level, and various microarray platforms are available, comprising both single- and two-channel arrays. Using two publicly available microarray datasets from Aspergillus niger, detailed protocols explain how to identify differentially expressed genes and how to construct gene coexpression networks [54]. IHF and HU are two heterodimeric nucleoid-associated proteins (NAPs) that belong to the same protein family but interact differently with DNA. IHF is a sequence-specific DNA-binding protein that bends DNA by over 160°. HU is the most conserved NAP; it binds non-specifically to duplex DNA, with a particular preference for nicked and bent DNA. Despite their importance, the in vivo interactions of the two proteins with DNA remain to be described at high resolution and on a genome-wide scale, and their effects on gene expression on a global scale remain contentious. A genome-scale study of HU and IHF binding to the Escherichia coli K12 chromosome was carried out using ChIP-seq, followed by microarray analysis of gene expression in single- and double-deletion mutants of each protein to identify their regulons. The sequence-specific binding profile of IHF encompasses 30% of all operons, although the expression of <10% of these is affected by its deletion, suggesting combinatorial control or a molecular backup. The binding profile of HU reflects relatively non-specific binding to the chromosome, albeit with a preference for A/T-rich DNA. The HU regulon comprises highly conserved genes, including those that are essential and possibly supercoiling sensitive.
By performing ChIP-seq experiments, where possible, on each subunit of IHF and HU in the absence of the other subunit, the study also defined genome-wide maps of DNA binding of the proteins in their hetero- and homodimeric forms [55]. Detailed information about stage-specific changes in gene expression is crucial for understanding the gene regulatory networks underlying development and the various signal transduction pathways contributing to morphogenesis. Whole-genome microarrays have been used to identify genes with differential expression at 5 stages of limb development (E9.5 to 13.5), during fore- and hindlimb patterning. The onset of limb formation was found to be characterized by an up-regulation of transcription factors, followed by a massive activation of genes during E10.5 and E11.5 that levels off at later time points [56]. An energy-based approach has been used to identify the binding site residues in protein-protein complexes. The binding site residues were analyzed with sequence- and structure-based parameters such as binding propensity, neighboring residues in the vicinity of binding sites, conservation score and conformational switching. The results showed that the binding propensities of amino acid residues are specific to protein-protein complexes [57]. Several complex diseases are caused by the malfunction of human metabolism, and deciphering the underlying molecular mechanisms can elucidate their aetiology. Systems biology is an integrative approach that combines experimental and computational biology to identify and describe the molecular mechanisms of complex biological systems. Systems medicine has the potential to elucidate the onset and progression of complex metabolic diseases through the use of computational approaches. Advances in biotechnology have resulted in the provision of high-throughput data, which provide information about different metabolic processes.
The systems medicine approach can use such data to reconstruct genome-scale metabolic models, which can then be used to study the function of specific enzymes and pathways in the context of the complete metabolic network [58]. ccPDB is a database of data sets compiled from the literature and the Protein Data Bank (PDB). First, data sets from the literature that were used for developing bioinformatics methods to annotate the structure and function of proteins were collected and compiled. Second, data sets were derived from the latest release of PDB using standard protocols. Third, a powerful and flexible module was developed that allows users to create a wide range of customized data sets from the current release of PDB using a simple six-step procedure. In addition, a number of web services have been integrated into ccPDB, including submission of jobs to PDB-based servers, annotation of protein structures and generation of patterns. The database maintains >30 types of data sets, such as secondary structure, tight turns, nucleotide-interacting residues, metal-interacting residues, DNA/RNA-binding residues and so on [59]. Non-Ribosomal Peptide Synthetases (NRPSs) are multi-modular enzymes that biosynthesize many important peptide compounds produced by bacteria and fungi. Some studies have revealed that individual domains within NRPSs show significant substrate selectivity. The discovery and characterisation of non-ribosomal peptides are of great interest to the biotechnological industries. Computational mining methods have been applied to build a database of NRPS modules that bind to specific substrates, and this database is used to build an HMM predictor of the substrates that bind to a given NRPS [60].

References

Citation: Sai YRKM, Siva Kishore N, Dattatreya A, Anand SY (2011) Bioinformatics Relevance in Biotechnology. J Proteomics Bioinform 4: 302-306.

Copyright: © 2011 Sai YRKM, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Journal of Proteomics & Bioinformatics, Open Access
