Journal of Proteomics & Bioinformatics

Open Access

ISSN: 0974-276X

Abstract

GMSECT: Genome-Wide Massive Sequence Exhaustive Comparison Tool for Structural and Copy Number Variations

Abhishek Narain Singh*

GMSECT is a robust, parallel ‘Application Interface’ that handles large genomic sequences for rapid processing. It is a Message Passing Interface (MPI) based parallel computing ‘Tool’ that can be run on a cluster for ‘Massive Sequences Exhaustive Comparison’ to identify matches such as structural variants. The GMSECT algorithm can also be implemented with other parallel application programming interfaces, such as POSIX threads, or even as a series of serial job submissions. The choice of pairwise comparison tool is fully flexible, as are that tool's optional parameters, so the speed, sensitivity and specificity of the pairwise alignment can be tuned as needed. The algorithm is simple and robust. It can be applied to compare multiple genomes, chromosomes or large sequences of different individuals for personalized genome comparison, and it works well for homologous as well as distant species. The tool can also be applied to smaller genomes, such as that of the bacterium Escherichia coli, the alga Chlamydomonas reinhardtii or the yeast Saccharomyces cerevisiae, to conduct comparisons quickly, and it therefore has applications in research and development at pharmaceutical and microbial-product firms. The application interface can rapidly compare massive sequences to detect the many types of DNA variation present in a genome, ranging from Single Nucleotide Polymorphisms (SNPs) to larger structural alterations such as Copy-Number Variants (CNVs) and inversions. The algorithm was tested by comparing chromosome 21 of Celera’s R27c compilation with all 48 chromosomes of Celera’s R27c compilation and with all 48 chromosomes of the human Build 35 reference sequence, which took just 2 hours and 10 minutes using pairwise BLAST as the alignment choice on 110 processors, each running at 2.2 GHz with 2 GB of memory.
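
The abstract does not spell out how the exhaustive comparison is divided among processors. As a rough illustration only, the sketch below enumerates pairwise comparison tasks and deals them out round-robin to workers. The function names, the sequence labels and the round-robin scheme are all assumptions made for this example, not the GMSECT implementation; in an MPI program the `rank` and `size` values would come from the communicator.

```python
from itertools import combinations_with_replacement, product

# Hypothetical helpers for illustration; none of these names come from GMSECT.

def self_pairs(seqs):
    """Unordered pairs within one genome, including each sequence against itself."""
    return list(combinations_with_replacement(seqs, 2))

def cross_pairs(seqs_a, seqs_b):
    """Every pairing of a sequence from genome A with a sequence from genome B."""
    return list(product(seqs_a, seqs_b))

def tasks_for_rank(tasks, rank, size):
    """Round-robin share of the task list for worker `rank` out of `size` workers."""
    return tasks[rank::size]

# Toy input: four made-up sequence labels per genome.
genome_a = [f"A_chr{i}" for i in range(1, 5)]
genome_b = [f"B_chr{i}" for i in range(1, 5)]

tasks = self_pairs(genome_a) + cross_pairs(genome_a, genome_b)
size = 3  # assumed number of workers
for rank in range(size):
    # Each worker would hand its pairs to the chosen alignment tool (e.g. BLAST).
    print(rank, tasks_for_rank(tasks, rank, size))
```

Under this assumed scheme, a self-comparison of n sequences needs n(n+1)/2 pairs, whereas a comparison between two genomes of n sequences each needs n² pairs.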
GMSECT facilitates rapid scanning and interpretation in personalized sequencing projects. With the above resources and alignment choice, the application interface is estimated to complete an exhaustive comparison of the human genome with itself in just 2.35 days. An exhaustive comparison of an individual’s genome with a reference genome would comprise two ‘self-genome’ comparisons and one ‘non-self-genome’ comparison, which is estimated to take about 9.4 days with the above resources. With the advent of personalized genome sequencing projects, it would be desirable to compare hundreds of individuals’ genomes with a reference genome. Since the ‘self-genome’ comparison of the reference needs to be done only once, each additional genome would involve one ‘non-self-genome’ and one ‘self-genome’ comparison, and would take around 7 days per individual’s genome using GMSECT and the above-mentioned resources.
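
The run-time estimates above can be cross-checked with simple arithmetic. The sketch below starts from the two stated figures (2.35 days for a ‘self-genome’ comparison and 9.4 days for two ‘self-genome’ comparisons plus one ‘non-self-genome’ comparison) and derives the ‘non-self-genome’ time, which the abstract does not state directly. The variable and function names are hypothetical, and the cohort estimate assumes that genomes are processed one after another on the same resources.

```python
# Figures stated in the abstract (days, on 110 processors with pairwise BLAST).
SELF_DAYS = 2.35           # one genome compared exhaustively with itself
FULL_PAIR_DAYS = 9.4       # two self-genome comparisons + one non-self comparison

# Derived, not stated in the abstract: time for one non-self-genome comparison.
NON_SELF_DAYS = FULL_PAIR_DAYS - 2 * SELF_DAYS   # about 4.7 days

def per_individual_days():
    """One self-genome plus one non-self-genome comparison (reference already done)."""
    return SELF_DAYS + NON_SELF_DAYS             # about 7 days, as estimated

def cohort_days(n_individuals):
    """Sequential estimate: the reference self-comparison once, then each individual."""
    return SELF_DAYS + n_individuals * per_individual_days()

print(round(NON_SELF_DAYS, 2))
print(round(per_individual_days(), 2))
print(round(cohort_days(100), 2))
```

The derived per-individual figure is consistent with the estimate of around 7 days quoted above.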

Published Date: 2021-10-13; Received Date: 2021-09-22
