ISSN: 0974-276X
Research Article - (2013) Volume 6, Issue 9
Cancer is a complex process in which abnormalities in many genes appear to be involved. Synchronized patterns of gene mutations may reveal functional relations between genes and pathways in tumorigenesis and help identify targets for treatment. The co-occurrence database represents a comprehensive core collection of data on published coincidental mutations in nuclear genes underlying human cancers. By August 2013, the database contained 126 different coincidental lesions in 43 different genes, across 25 different types of human cancer and 8 cellular signaling pathways. As a next step, this model for representing knowledge about coincidental mutations in human cancers can be extended to other synchronized patterns of gene alterations in human diseases. Although the co-occurrence database was originally established for the scientific study of coincidental mutational mechanisms in human cancer genes, it is also applicable for physicians and genetic counselors. The database is freely available from http://lbb.ut.ac.ir/project/Co-Occurrence/home.php
Keywords: Gene Mutations; Human Cancers; Co-occurrence database; Database design; Family-specific mutational databases
Cancers arise from the accumulation of mutations in proto-oncogenes and tumor suppressor genes [1]. Mutations have a variety of effects on gene products, including truncations, frame shifts, reduced transcript levels, and disruption of three-dimensional protein structure [2]. These harmful alterations confer a selective advantage on the cell and its progeny for tumorigenesis [3]. Knowledge of mutations is necessary to understand the biology of cancer and to develop molecularly targeted therapies [1]. Tumorigenesis is a multi-step process, and several mutations in different genes are necessary to transform normal cells into malignant ones [4]. Examples of cancer evolving from a single gene mutation are rare [5]. Although a single mutation can initiate tumorigenesis, multiple preliminary perturbations are required to transform normal cells into solid tumors, including cell cycle deregulation, evasion of apoptosis, limitless replication, acquisition of genomic instability, metastasis, and angiogenesis [6].
The co-occurrence database, maintained at the Laboratory of Systems Biology and Bioinformatics (LBB) in Iran, represents a comprehensive core collection of data on coincidental mutations underlying human cancers. It comprises published single base-pair substitutions as well as duplications, insertions, deletions, repeat expansions, and indels in coding, regulatory, and splicing-relevant regions of human cancer nuclear genes. Mitochondrial genome mutations are not included. Bringing these together, the co-occurrence database includes different recurrent combinatorial mutational patterns in 25 different human cancers.
The co-occurrence database aims to facilitate combinational mutation analysis, targeted chemotherapy, and the study of functional relations of genes in major cancer pathways and of novel patterns in tumorigenesis. Although the co-occurrence database was originally established for the scientific study of mutational mechanisms in human cancer genes, it can also provide information of practical importance to researchers in molecular oncology and human molecular genetics, to genetic counselors, and to physicians interested in a particular inherited condition in a given patient or family. In view of its potential usefulness, the curators made the database publicly available through the World Wide Web in March 2011.
Database design
The co-occurrence database was implemented as a MySQL database on a Linux server. We used PHP scripts for all data retrieval and output. Its modular design is compatible with future expansion and contains the following subsections: EntrezGene, RefSeq [7], Ensembl [8], Protein (Swiss-Prot/UniProt), Gene Ontology [9], mutation frequency in 25 cancer types [10,11], cancer data, and protein interactions (ROCK) [3,11]. We retrieved gene sequences, including genomic DNA, mRNA, and protein, from the UCSC browser [12]. We downloaded interaction datasets from [11]. Process annotations were made with information from CancerGenes [13]. Figure 1 presents the diagram of entities and relationships in the co-occurrence database.
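The modular layout described above can be illustrated with a minimal sketch. The article does not list the actual table or column names, so every identifier below is hypothetical, and SQLite from the Python standard library stands in for MySQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the actual layout of the co-occurrence database.
SCHEMA = """
CREATE TABLE gene (
    gene_id       INTEGER PRIMARY KEY,
    symbol        TEXT NOT NULL UNIQUE,  -- gene symbol
    entrez_id     TEXT,                  -- EntrezGene identifier
    refseq_id     TEXT,                  -- RefSeq accession
    ensembl_id    TEXT,                  -- Ensembl identifier
    uniprot_id    TEXT,                  -- Swiss-Prot/UniProt accession
    chromosome    TEXT                   -- chromosomal location
);
CREATE TABLE cancer_type (
    cancer_id     INTEGER PRIMARY KEY,
    tissue        TEXT NOT NULL UNIQUE   -- e.g. a row label from Table 1
);
CREATE TABLE co_occurrence (
    pair_id       INTEGER PRIMARY KEY,
    gene1_id      INTEGER NOT NULL REFERENCES gene(gene_id),
    gene2_id      INTEGER NOT NULL REFERENCES gene(gene_id),
    cancer_id     INTEGER NOT NULL REFERENCES cancer_type(cancer_id),
    co_sequenced  INTEGER,               -- samples screened for both genes
    gene1_mutated INTEGER,
    gene2_mutated INTEGER,
    both_mutated  INTEGER,
    p_value       REAL
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Keeping genes, cancer types, and co-mutated pairs in separate tables is one way to let new annotation modules be added later without changing the pair records, which is the kind of expansion the modular design is meant to allow.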
Data coverage and structure
By August 2013, the co-occurrence database contained 126 different recurrent combinatorial mutational patterns in 43 different genes in 25 types of human cancer (Table 1). The content of the co-occurrence database originates from the published scientific literature [11]. In that article, Yeang et al. [11] retrieved co-occurring gene mutations in cancer through a combination of manual and computerized search procedures and statistical analysis [14]. They initially compiled collections of samples from an exhaustive search of the scientific literature in the PubMed database at NCBI (http://www.ncbi.nih.gov/pubmed) and from the COSMIC database. They considered gene mutations only in tumor samples that had been screened for at least two genes. Finally, they classified candidate genes by cancer type and mutation [11].
Row | Cancer Tissue | Frequency in DB | Percent |
---|---|---|---|
1 | Acute lymphoblastic leukemia | 11 | 8.7 |
2 | Acute myeloid leukae | 2 | 1.6 |
3 | Acute myeloid leukemia | 6 | 4.8 |
4 | B cell lymphoma | 1 | 0.8 |
5 | Biliary tract | 2 | 1.6 |
6 | Breast | 3 | 2.4 |
7 | Central nervous system | 8 | 6.3 |
8 | Endometrium | 2 | 1.6 |
9 | Esophagus | 1 | 0.8 |
10 | Hematopoietic and lymphoid others | 1 | 0.8 |
11 | Kidney | 2 | 1.6 |
12 | Large intestine | 20 | 15.9 |
13 | Lung adenocarcinoma | 11 | 8.7 |
14 | Lung large cell carcinoma | 1 | 0.8 |
15 | Lung small cell carcinoma | 4 | 3.2 |
16 | Lung squamous cell carcinoma | 1 | 0.8 |
17 | Pancreas | 10 | 7.9 |
18 | Prostate | 10 | 7.9 |
19 | Skin | 16 | 12.7 |
20 | Soft tissue | 2 | 1.6 |
21 | T cell lymphoma | 1 | 0.8 |
22 | T cell lymphoma acute lymphoblastic | 1 | 0.8 |
23 | Thyroid | 3 | 2.4 |
24 | Upper aerodigestive tract | 2 | 1.6 |
25 | Urinary tract | 5 | 4.0 |
Total | | 126 | 100 |
Table 1: List of types and frequencies of cancers in the co-occurrence database.
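As a quick consistency check, the Percent column of Table 1 can be recomputed from the Frequency column. The short sketch below copies the counts and tissue labels exactly as they appear in the table and assumes each percentage is the frequency divided by the total, rounded to one decimal place.

```python
# Frequencies copied from Table 1 (patterns per cancer tissue).
frequencies = {
    "Acute lymphoblastic leukemia": 11,
    "Acute myeloid leukae": 2,
    "Acute myeloid leukemia": 6,
    "B cell lymphoma": 1,
    "Biliary tract": 2,
    "Breast": 3,
    "Central nervous system": 8,
    "Endometrium": 2,
    "Esophagus": 1,
    "Hematopoietic and lymphoid others": 1,
    "Kidney": 2,
    "Large intestine": 20,
    "Lung adenocarcinoma": 11,
    "Lung large cell carcinoma": 1,
    "Lung small cell carcinoma": 4,
    "Lung squamous cell carcinoma": 1,
    "Pancreas": 10,
    "Prostate": 10,
    "Skin": 16,
    "Soft tissue": 2,
    "T cell lymphoma": 1,
    "T cell lymphoma acute lymphoblastic": 1,
    "Thyroid": 3,
    "Upper aerodigestive tract": 2,
    "Urinary tract": 5,
}

total = sum(frequencies.values())

# Percentage of all patterns contributed by each tissue, one decimal place.
percent = {tissue: round(100 * n / total, 1) for tissue, n in frequencies.items()}

for tissue, value in percent.items():
    print(f"{tissue}: {value}")
print("Total patterns:", total)
```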
The co-occurrence database could be extended in later steps using the Catalogue of Somatic Mutations in Cancer (COSMIC) database [15]. COSMIC is a large database of somatic gene mutations in cancer extracted from the literature [15]. Currently, COSMIC collects only small mutations in protein-coding regions, including missense mutations [16]. Figure 2 shows an example of a co-occurrence database page, which includes the related annotations of a gene with combinatorial patterns of mutations. The list can be sorted alphabetically or by the number of samples, the number of combinational mutations, or P-value. Gene names are hyperlinked to gene-annotation pages, and cancer types are hyperlinked to pathways of co-occurrence mutations (Figure 2).
Figure 2: List of genes with synchronized patterns of mutations with KRAS. The table includes the annotations related to each synchronized pattern: number of samples sequenced for both genes (co-sequenced), samples with mutations in KRAS (KRAS mutated), samples with mutations in gene-2 (gene-2 mutated), number of samples with mutations in both genes (both), and P-values.
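The four counts listed in the Figure 2 caption are enough to compute a co-occurrence P-value. The article does not state which test produced the reported values, so the sketch below is only an assumption: it uses a one-sided hypergeometric tail (equivalent to a one-sided Fisher exact test), a common choice for asking whether two genes are mutated together more often than expected by chance. The function name and the toy counts are hypothetical.

```python
from math import comb


def co_occurrence_p_value(co_sequenced, gene1_mutated, gene2_mutated, both):
    """P(overlap >= both) if the two mutations were independent.

    Assumed test: one-sided hypergeometric tail. The source does not
    confirm that this is the statistic reported in the database.
    """
    max_overlap = min(gene1_mutated, gene2_mutated)
    if not 0 <= both <= max_overlap <= co_sequenced:
        raise ValueError("counts are inconsistent")
    # Number of ways to choose the gene-2 mutated samples among all samples.
    denominator = comb(co_sequenced, gene2_mutated)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(gene1_mutated, k) * comb(co_sequenced - gene1_mutated, gene2_mutated - k)
        for k in range(both, max_overlap + 1)
    )
    return tail / denominator


# Toy counts invented for illustration; they are not taken from the database.
p = co_occurrence_p_value(co_sequenced=10, gene1_mutated=5, gene2_mutated=5, both=5)
print(p)
```

A small P-value under this assumption would indicate that the two genes are co-mutated more often than independent mutation would predict; a test for mutual exclusivity would sum the opposite tail instead.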
The entry points on the database homepage are gene chromosomal location, cancer tissue type, and the ‘search’ box, where any text can be typed for a database search. Links from the results page direct the user to more details about mutations, publications, and summaries. In addition, hypertext links lead from each gene page to mutation frequencies in human cancers and to the list of co-occurrence mutations. Each entry comprises the gene name, alternative names, different gene IDs, chromosomal location, GO category, references to the first literature report of the mutations, and mutation frequency. Users can search the co-occurrence database by gene accession numbers, gene symbols, or HUGO-approved gene names.
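A search that accepts an accession number, a gene symbol, or an approved gene name could look roughly like the sketch below. The production site uses PHP and MySQL; here Python with SQLite stands in so the example is runnable, and the table name, column names, and sample rows are all hypothetical placeholders.

```python
import sqlite3


def search_genes(conn, term):
    """Return symbols of genes whose accession, symbol, or name matches term."""
    term = term.strip()
    if not term:
        return []
    query = """
        SELECT symbol FROM gene
        WHERE accession = :t COLLATE NOCASE
           OR symbol = :t COLLATE NOCASE
           OR approved_name LIKE '%' || :t || '%'
        ORDER BY symbol
    """
    # Named parameter binding keeps user-typed text out of the SQL string.
    return [row[0] for row in conn.execute(query, {"t": term})]


# Hypothetical table and placeholder rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (accession TEXT, symbol TEXT, approved_name TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        ("ACC_PLACEHOLDER_1", "KRAS", "placeholder approved name for KRAS"),
        ("ACC_PLACEHOLDER_2", "GENE2", "placeholder approved name for GENE2"),
    ],
)

print(search_genes(conn, "kras"))
print(search_genes(conn, "acc_placeholder_2"))
```

Accessions and symbols are matched exactly (ignoring case), while the approved name is matched as a substring, which is one plausible reading of a free-text search box.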
The impact of a mutation on cell signaling pathways and protein interactions is key to determining the impact of combinatorial patterns of gene mutations [17]. Toward this goal, general relations between co-mutated genes and cell signaling are provided for acute lymphoblastic leukemia, acute myeloid leukemia, central nervous system, large intestine, lung adenocarcinoma, lung small cell carcinoma, pancreas, and skin cancers (Figure 3) [11].
Figure 3: Pathway of co-occurrence mutations in human acute lymphoblastic leukemia [11]. The dotted arrows indicate co-mutations.
The co-occurrence database has been established as a database of coincidental gene mutations in human cancers, linking current knowledge on co-mutated genes to human cancers and their relevance for tumorigenesis [1,18]. For each relation, candidate genes are selected together with the examined samples referred to and a panoramic view of functional relations [2,11]. Our major focus was to provide a filtered and annotated reference set of co-occurring mutations in cancer-linked genes. The co-occurrence database can be used to study functional relations of genes with combinational mutations, to gain insight into mutated driver pathways, and to identify date and party hubs in cancer networks [17,19,20]. It combines automated and expert annotation of coincidental mutations, oriented toward the specific needs of the cancer research community.
We designed the co-occurrence database with a user-friendly interface to help assess which gene malfunctions are causal in most human cancers and which co-occurring mutations are cancer drivers. As a next step, we will extend the database to include more genes, mutations, and pathway information. In our view, the co-occurrence database annotation model can also be applied to annotate co-mutated gene entries in other human inherited genetic diseases, drawing on family-specific mutational databases such as SH2base [21] and KinMutBase [22].
Availability
The database is freely available via the World Wide Web from http://lbb.ut.ac.ir/project/Co-Occurrence/home.php. The database may be cited by referencing this article.