ISSN: 0974-276X
Short Communication - (2010) Volume 3, Issue 11
Modern biological research has a continuing need for similarity search and predictive tools to identify protein function, coding and non-coding regions, genes, ortholog groups and phylogenetic relationships. In particular, similarity search tools such as BLAST (Altschul et al., 1990) and gene prediction tools such as GENEMARK.HMM (Lukashin and Borodovsky, 1997) are effective and trustworthy computational tools for locating domains and predicting genes, respectively. Using many such tools for routine analysis in a discovery environment is often laborious and time-consuming. For example, to perform a homology search on 200 different proteins for function identification, one has to submit each sequence individually to the BLAST service and then collect each result. This process has several problems: (1) Manual work: most services do not support mass sequence submission for BLAST searches. NCBI BLAST does offer multiple sequence submission, but it imposes a processing time limit, and the user must wait until the search has finished for all submitted sequences before seeing any individual result. (2) Error-prone: repetitive manual tasks invite human error. (3) Time-consuming: manual submission takes a great deal of time. MASS BLASTER (Figure 1) was therefore developed to automate this process, reducing time and error while producing the same results as the service provider, without sacrificing the quality of the result.
Model development
The architecture of MASS BLASTER is shown in Figure 2. The system is implemented in PERL V5.12, and the Perl/Tk module is used to build the Graphical User Interface (GUI) of the tool. The process flow of the tool is as follows: (1) the file containing multiple sequences (note a), in FASTA format, is loaded; (2) the sequences are parsed into separate records; (3) the parameters for the corresponding tool are set; (4) the tool connects to the service server and submits each sequence separately, thereby avoiding the server-side restrictions on multiple sequence submission; (5) the tool obtains the result generated by the service and parses information such as the E-value from it (note b); (6) the parsed information is aligned, tabulated and saved in hypertext format to a location on the system chosen by the user. The services included in MASS BLASTER are (1) NCBI BLAST - Protein (Altschul et al., 1990); (2) NCBI BLAST - Nucleotide (Altschul et al., 1990); (3) GeneBee BLAST - Nucleotide (Altschul et al., 1997); (4) RNA noncoding BLAST (Altschul et al., 1997); (5) GENEMARK.HMM - Prokaryotes and Eukaryotes (Lukashin and Borodovsky, 1997); (6) COgNitor (Tatusov et al., 2001); (7) GLIMMER (Salzberg et al., 1998; Delcher et al., 1999).
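The process flow above can be sketched as a simple loop. This is a minimal illustration in Python, not the tool's PERL source: `run_batch`, `safe_name`, `fake_submit` and the file-naming scheme are hypothetical names introduced here, and `submit` stands in for whatever code actually contacts the remote service.

```python
import html
import re
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def safe_name(header: str) -> str:
    """Turn a FASTA header into a file-system-friendly name (hypothetical scheme)."""
    words = header.split()
    first = words[0] if words else "sequence"
    return re.sub(r"[^A-Za-z0-9_.-]", "_", first)


def run_batch(
    records: Iterable[Tuple[str, str]],
    submit: Callable[[str, str], str],
    out_dir: Path,
) -> List[Path]:
    """Submit each (header, sequence) pair separately; save one HTML file per result.

    `submit` is a placeholder for the code that talks to the remote service.
    It receives the header and the sequence and returns the raw result text.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (header, sequence) in enumerate(records, start=1):
        try:
            result = submit(header, sequence)
        except Exception as exc:  # one failed query should not stop the batch
            result = f"FAILED: {exc}"
        path = out_dir / f"{index:05d}_{safe_name(header)}.html"
        path.write_text(
            f"<html><body><h1>{html.escape(header)}</h1>"
            f"<pre>{html.escape(result)}</pre></body></html>",
            encoding="utf-8",
        )
        written.append(path)
    return written


def fake_submit(header: str, sequence: str) -> str:
    """Stand-in for a real service call: simply reports the sequence length."""
    return f"length={len(sequence)}"


demo_records = [("seq1 test protein", "MKTAYIAK"), ("seq2", "ACGTACGT")]
demo_files = run_batch(demo_records, fake_submit, Path(tempfile.mkdtemp()))
print([p.name for p in demo_files])
```

Passing the service call in as a function keeps the loop identical for every service; only the `submit` implementation would differ between, say, a BLAST query and a gene prediction request.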
Requirements
The tool requires PERL V5.10.1 or higher, with the additional modules "Tk" for the graphical interface and "WWW::Mechanize" for connecting to the internet services. These two modules do not come with the standard PERL installer, but they can be downloaded freely from the CPAN archive. The tool is compatible with the MS-Windows platform [XP (service pack 2 or higher), VISTA, and Windows 7] as well as the LINUX platform.
Input
The tool accepts both protein and nucleotide sequences in standard FASTA format. The number of sequences that can be loaded into the tool is not restricted by the tool itself, but depends on the memory and processor of the user's system. With a minimum configuration (1 GB RAM, Pentium IV or higher processor, at least 10 GB physical memory), up to 10,000 sequences (note a) can be loaded and submitted at a time. The user can select and set the various parameters provided by the corresponding service, such as database, program type (blastx/blastn) and matrix (Figure 3). Any modification made to the service settings is indicated in the message window of the tool.
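For readers unfamiliar with the input format, the following minimal sketch shows how a multi-sequence FASTA file can be split into individual records. It is written in Python for illustration and is not taken from the tool; `parse_fasta` is a hypothetical helper, and the sample sequences are made-up placeholders.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    A record starts at a line beginning with '>' and runs until the next
    such line; sequence lines are joined with whitespace removed.
    """
    header = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


sample = ">seq1 demo\nACGTAC\nGTTA\n\n>seq2\nMKTAYIAKQR\n"
for name, seq in parse_fasta(sample):
    print(name, len(seq))
```

Because the parser is a generator, records can be handed to the submission step one at a time instead of holding every sequence in memory, which matters when the number of sequences is limited mainly by system resources.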
Output
The results returned by the service for each submitted sequence are stored in hypertext format in a location on the system chosen by the user. Hypertext is preferred because the tool runs on multiple platforms, and HTML can be viewed in any browser without special software. For the NCBI BLAST services (both nucleotide and protein), fields such as Hit ID, Accession, Definition, Length, Bit Score, E Value, Query From, Query To, Hit From, Hit To, Query Frame, Hit Frame, Identity, Positive, Align Length and Alignment are parsed and tabulated in hypertext format for easy reading, and saved on the system (Figure 4). For the other services, the content of the result page is stored as such in hypertext format. The status of each submitted sequence and the progress of the results are displayed in the status window (Figure 5).
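The tabulation step for BLAST results can be illustrated with a short sketch. This is an assumption about how such a table might be produced, written in Python rather than the tool's PERL; `hits_to_html` is a hypothetical function, the column names follow the fields listed above (the Alignment text is omitted for brevity), and the example hit contains invented placeholder values, not real search output.

```python
import html
from typing import Dict, List

# Column names taken from the fields listed in the text above.
COLUMNS = [
    "Hit ID", "Accession", "Definition", "Length", "Bit Score", "E Value",
    "Query From", "Query To", "Hit From", "Hit To", "Query Frame",
    "Hit Frame", "Identity", "Positive", "Align Length",
]


def hits_to_html(query_id: str, hits: List[Dict[str, object]]) -> str:
    """Render parsed hits as a simple HTML table, one row per hit.

    Missing fields are left as empty cells; all values are HTML-escaped.
    """
    head = "".join(f"<th>{html.escape(col)}</th>" for col in COLUMNS)
    rows = []
    for hit in hits:
        cells = "".join(
            f"<td>{html.escape(str(hit.get(col, '')))}</td>" for col in COLUMNS
        )
        rows.append(f"<tr>{cells}</tr>")
    return (
        f"<html><body><h2>{html.escape(query_id)}</h2>"
        f"<table border=\"1\"><tr>{head}</tr>{''.join(rows)}</table>"
        "</body></html>"
    )


# Placeholder hit with invented values, for illustration only.
example_hits = [
    {"Hit ID": "hit_1", "Definition": "hypothetical protein <example>",
     "Length": 120, "E Value": "1e-20"},
]
page = hits_to_html("query_1", example_hits)
print(page[:60])
```

Escaping every value before writing it into the table keeps definitions that contain characters such as `<` or `&` from corrupting the saved page.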
Note a: A virtually unlimited number of sequences can be loaded; the practical limit depends on system memory and processor.
Note b: For NCBI BLAST services alone.
This tool was used and tested while carrying out a genome-wide analysis of intergenic regions in 11 species of Mycoplasma. To explore coding regions within the intergenic sequences, all 6840 intergenic sequences were extracted from the genome sequences of the 11 Mycoplasma species and fed as input to MASS BLASTER. The sequences were subjected to a similarity search with the BLASTX program (Altschul et al., 1990) to find coding segments, and to GENEMARK.HMM (Lukashin and Borodovsky, 1997) to predict potential gene activity in the intergenic regions. For all 6840 intergenic regions, the translated BLASTX search took approximately 22 hours and 42 minutes and the GENEMARK.HMM prediction took 4 hours and 50 minutes, with an internet connection speed of 1 Mbps on a Pentium IV machine with 2 GB RAM running Microsoft Windows XP. MASS BLASTER automatically stored the result for each individual sequence as a file on the system for further analysis (the existence of result files for all 6840 inputs was verified manually). This demonstrates the performance and usability of the tool in speeding up analyses carried out with bioinformatics prediction tools.
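The reported run times can be converted into an average time per sequence. The short calculation below uses only the figures given above; it assumes the work was spread evenly over the 6840 sequences, so the averages include network latency and server-side waiting and say nothing about individual queries. `seconds_per_sequence` is a helper introduced here for illustration.

```python
def seconds_per_sequence(hours: int, minutes: int, n_sequences: int) -> float:
    """Average wall-clock seconds per sequence for a batch run."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds / n_sequences


N_SEQUENCES = 6840  # intergenic sequences in the case study

blastx = seconds_per_sequence(22, 42, N_SEQUENCES)   # 81720 s in total
genemark = seconds_per_sequence(4, 50, N_SEQUENCES)  # 17400 s in total

print(f"BLASTX:       {blastx:.1f} s per sequence")    # about 11.9 s
print(f"GENEMARK.HMM: {genemark:.1f} s per sequence")  # about 2.5 s
```

On average, then, each BLASTX query completed in roughly 12 seconds and each GENEMARK.HMM prediction in roughly 2.5 seconds, with no manual intervention between submissions.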
MASS BLASTER is designed and developed in PERL and is structured to perform simple repetitive tasks in an unambiguous manner. The tool is supplied as open source software, so anyone can study and modify it for further improvement. The intention of this work is to automate regular, routine basic sequence analysis, thereby speeding up biological research that relies on bioinformatics prediction tools.