ISSN: 0974-276X
Research Article - (2013) Volume 6, Issue 9
Mutability varies significantly along nucleotide sequences, and mutations occur at higher frequencies at certain positions of a genome. The high mutation rate in some regions is independent of the function and structure of the corresponding regions in the protein products. This raises the possibility that the DNA sequence and cis-elements themselves cause mutations. We used computational methods to examine the regions surrounding 20 mutation hotspots related to different human genetic disorders, as well as combinatorial patterns of gene mutations in cancers. We identified A(C/G)AA(C/G)(A/T) as a sequence motif associated with fourteen mutation hotspots in different cancers. These observations support a correlation between DNA motifs and mutation hotspots in cancer, and point to the motif as a promising marker for selecting suitable regions for gene mutation screening.
Keywords: Co-occurrence; DNA motif; Intrinsic mutation; Mutation hot spot
Mutations in specific genes are the etiology of many human diseases. These mutations are either induced by external environmental insults (such as chemicals and radiation) or arise spontaneously. Induced mutagenesis results, for example, from interactions between DNA and mutagens, whereas spontaneous mutagenesis results from deficiencies in the replication machinery. The frequency of mutations varies significantly along genomic sequences [1], and it is particularly high at some nucleotide positions [2]. These positions are called mutation hotspots [3]. Some properties of intrinsic mutation hotspots, such as repetitive DNA sequences, are related to mutagenesis processes [4]. Computer analysis of the nucleotide sequence context of identified hotspots can provide information about biomarkers and their use in predicting suitable regions for mutation screening [5].
DNA motifs are short, recurring patterns with a biological function [6]. They often carry the fingerprint of interactions between DNA and regulatory or modification enzymes [7]. The association of DNA motifs with hotspots of intrinsic mutation in the genome can be evidence of such interactions [6]. DNA motifs are normally short (5 to 20 bp) and are believed to recur in different genes or several times within a gene [8]. Many algorithms exist to detect and analyze DNA motifs [9]. Motif-finding algorithms fall into two categories: word-based models and probabilistic sequence models. Word-based algorithms mostly rely on exhaustive comparison of oligonucleotide frequencies. In probabilistic algorithms, the model parameters are estimated using the maximum-likelihood principle or Bayesian inference [9]. Most of these algorithms were designed to find motifs in higher organisms, including humans, by incorporating suitable parameters [7].
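The word-based idea described above can be illustrated with a minimal sketch. It is not the method of any particular motif finder: the function names, the add-one smoothing, and the `min_ratio` threshold are assumptions chosen for illustration, and the toy sequences are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping word of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def overrepresented(target, background, k, min_ratio=2.0):
    """Return words whose smoothed frequency in `target` is at least
    `min_ratio` times their smoothed frequency in `background`."""
    t, b = kmer_counts(target, k), kmer_counts(background, k)
    t_total, b_total = sum(t.values()), sum(b.values())
    vocab = 4 ** k  # number of possible DNA words of length k
    hits = {}
    for word, n in t.items():
        # Add-one smoothing keeps unseen background words from dividing by zero.
        t_freq = (n + 1) / (t_total + vocab)
        b_freq = (b[word] + 1) / (b_total + vocab)
        ratio = t_freq / b_freq
        if ratio >= min_ratio:
            hits[word] = ratio
    return hits

# Hypothetical toy sequences, not taken from the study.
target = "AGAACTAGAACTAGAACT"
background = "ACGTACGTACGTACGTAC"
print(overrepresented(target, background, k=6, min_ratio=3.5))
```

Real word-based tools add statistical significance tests and allow degenerate positions; the sketch only shows the exhaustive counting step.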
Comparative studies have identified several common short sequence features within recombination hotspots [10-14]. We hypothesized that DNA motifs are involved in the locally high mutation rate of cancer genes. To test this hypothesis, we used computational methods to examine the regions surrounding hotspots in human genes, together with combinatorial mutational patterns.
In this study, we collected DNA motifs adjacent to mutation hotspots and then searched for motifs common to multiple motif sets. In the first step, based on a careful inspection of the available literature and the Catalog of Somatic Mutations in Cancer (COSMIC) database, mutation hotspots in 20 cancer genes were selected (Table 1, Supplementary file 1) [15,16]. We analyzed the 660 nucleotides around each mutation hotspot with the most common motif finder programs, including AlignACE, ANN-Spec, BioProspector, Improbizer, MDScan, MEME, MITRA, Motif Sampler, SPACE, Weeder, and Trawler [17]. Analysis of the selected regions of the human genome yielded a large number of DNA motifs. To recognize the motifs common to the motif sets, a C++ program was prepared and used. The input of this algorithm is a list of unique motifs as a text file, and the output is the motifs common between clusters (Figure 1).
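The comparison step can be sketched as follows. This is a minimal illustration in Python rather than the C++ program used in the study, whose internals are not described here. It assumes that motifs are compared as exact strings and that each motif set is labeled by gene; the function name, the `min_sets` parameter, and the toy motif sets are hypothetical.

```python
from collections import defaultdict

def common_motifs(motif_sets, min_sets=2):
    """Map each motif found in at least `min_sets` sets to the sorted
    labels of the sets that contain it (exact string comparison)."""
    seen_in = defaultdict(set)
    for label, motifs in motif_sets.items():
        for motif in motifs:
            seen_in[motif.upper()].add(label)
    return {m: sorted(labels) for m, labels in seen_in.items()
            if len(labels) >= min_sets}

# Hypothetical motif sets for illustration only.
motif_sets = {
    "GENE_A": {"AGAACT", "TTGACA"},
    "GENE_B": {"agaact", "CCGGTA"},
    "GENE_C": {"AGAACT", "CCGGTA", "GGGCGG"},
}
print(common_motifs(motif_sets, min_sets=3))
```

Exact matching is the simplest possible choice; a fuller comparison would also have to handle degenerate positions and motifs of different lengths.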
Row | Gene Name | Cancer Tissue & Disease Names | Gene Position on Human Genome | Hotspot Position on cDNA | Number of Motifs (Hot-spot) | Number of Motifs (Cold-spot) |
---|---|---|---|---|---|---|
1 | PIK3CA | large intestine, breast, endometrium | Chromosome 3: 180,349,005-180,435,189 forward strand | c.3140A>G | 3 | 1 |
2 | APC | large intestine, Stomach | Chromosome 5: 112,101,483-112,209,834 forward strand | c.4348C>T | 2 | 1 |
3 | CDKN2A | haematopoietic and lymphoid tissue, central nervous system, lung | Chromosome 9: 21,957,751-21,984,490 reverse strand | c.238C>T | 2 | 1 |
4 | ATM | haematopoietic and lymphoid tissue | Chromosome 11: 107,598,769-107,745,036 forward strand. | c.9023G>A | 3 | 2 |
5 | RB1 | Lung, eye | Chromosome 13: 47,775,884-47,954,027 forward strand. | c.958C>T | 0 | 2 |
6 | VHL | kidney | Chromosome 3: 10,158,319-10,168,744 forward strand | c.241C>T | 1 | 0 |
7 | PTEN | Endometrium, glioma, breast | Chromosome 10: 89,613,175-89,718,511 forward strand | c.388C>T | 4 | 0 |
8 | FLT3 | haematopoietic and lymphoid tissue | Chromosome 13: 28576811:28675329:1 | c.2503G>T | 1 | 0 |
9 | MSH6 | central nervous system, large intestine, stomach | Chromosome 7: 48009621:48034692:1 | c.3656C>T | 2 | 0 |
10 | SMARCB1 | soft tissue, central nervous system | Chromosome 22: 24,129,150-24,176,701 forward strand. | c.601C>T | 0 | 1 |
11 | ALK | autonomic ganglia | Chromosome 2: 29,415,640-30,144,432 | c.3824G>A | 0 | 1 |
12 | CTNNB1 | Liver, soft tissue | Chromosome 3: 41,236,328-41,281,939 forward strand. | c.121A>G & c.134C>T | 4 | 2 |
13 | FGFR3 | Achondroplasia, Skin, Urinary tract | Chromosome 4: 1,795,560-1,810,598 | c.746C>G | 3 | 0 |
14 | CDK6 | Skin | Chromosome 7: 92,072,175-92,301,148 reverse strand. | c.588C>T | 0 | 3 |
15 | PDGFRA | soft tissue, stomach | Chromosome 4: 55,095,264-55,164,412 forward strand. | c.2525A>T | 0 | 0 |
16 | JAK2 | haematopoietic and lymphoid tissue | Chromosome 17: 7,512,445-7,531,642 reverse strand. | c.1849G>T | 2 | 1 |
17 | KIT | haematopoietic and lymphoid tissue, soft tissue | Chromosome 4: 55,524,095-55,606,879 forward strand. | c.2447A>T | 5 | 0 |
18 | MAP2K4 | urinary tract, testis | Chromosome 17: 11,864,866-11,987,865 forward strand. | c.551C>T | 2 | 0 |
19 | RET | Thyroid, adrenal gland | Chromosome 10: 42,892,523-42,945,805 forward strand. | c.2753T>C | 0 | 0 |
20 | P53 | Breast, Large intestine, Upper aerodigestive tract | Chromosome 17: 7572927-7579912 forward strand. | c.743G>A | 2 | 0 |
Table 1: Selected genes for analysis of common sequence motifs associated with intrinsic mutation hot spots.
For each mutation hotspot, we selected an identically sized coldspot region in the same gene, where there was no evidence of a high mutation frequency in human cancers [15]. For each hot-spot and cold-spot pair, we recorded the number of A(C/G)AA(C/G)(A/T) motifs and compared the frequency of this motif between the two groups with an unpaired t-test [5].
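Counting the motif in a region can be sketched with a regular expression. This is an illustrative example under stated assumptions: it scans only the strand it is given and counts overlapping occurrences, neither of which is specified by the study, and the function name and test sequences are hypothetical.

```python
import re

# A(C/G)AA(C/G)(A/T) as a regular expression. The lookahead allows
# overlapping occurrences (for example in ACAACAACA) to be counted.
MOTIF = re.compile(r"(?=(A[CG]AA[CG][AT]))")

def count_motif(region):
    """Count occurrences of the motif on the given strand of a DNA region."""
    return len(MOTIF.findall(region.upper()))

# Hypothetical sequences, not taken from the study.
print(count_motif("ttAGAACTgg"))  # one occurrence: AGAACT
print(count_motif("ACAACAACA"))   # two overlapping occurrences
```

Applying `count_motif` to each hot-spot region and its matched cold-spot region would give paired counts of the kind listed in Table 1.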
As a first step, we considered the frequency of the A(C/G)AA(C/G)(A/T) motif, the recognition site of phenylacetyl nitrogen mustard, in mutation hot-spots and cold-spots. We found this DNA motif associated with mutation hotspots in different human cancers, including breast cancer, blood cancer, large intestine cancer, lung cancer, kidney cancer, glioma, liver cancer, and stomach cancer, among others (Table 1). In all cases, the mutation hotspot and the motif were on the same DNA strand. The frequency of A(C/G)AA(C/G)(A/T) differed significantly between hot-spot and cold-spot regions (t (4.81)=2.62, P<0.05) (Table 1).
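The t statistic can be recomputed from the motif counts in Table 1. The sketch below uses the pooled-variance form of the unpaired t-test, with n1 + n2 - 2 degrees of freedom; this variant is an assumption, since the text does not state which form was used. The counts are transcribed from the hot-spot and cold-spot columns of Table 1 (rows 1 to 20), and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# A(C/G)AA(C/G)(A/T) counts per gene, transcribed from Table 1 (rows 1-20).
hot = [3, 2, 2, 3, 0, 1, 4, 1, 2, 0, 0, 4, 3, 0, 0, 2, 5, 2, 0, 2]
cold = [1, 1, 1, 2, 2, 0, 0, 0, 0, 1, 1, 2, 0, 3, 0, 1, 0, 0, 0, 0]

def pooled_t(a, b):
    """Unpaired two-sample t statistic with pooled variance (assumed variant).
    Returns the statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, dof

t_stat, dof = pooled_t(hot, cold)
print(f"mean hot = {mean(hot):.2f}, mean cold = {mean(cold):.2f}")
print(f"t = {t_stat:.2f}, df = {dof}")  # t = 2.62, df = 38
```

With these counts the statistic comes out at about 2.62, which agrees with the reported value. Because both groups contain 20 regions, the unequal-variance (Welch) form gives the same statistic and differs only in its degrees of freedom.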
The association of a specific DNA motif with mutation hotspots can indicate a shared sequence-dependent mutagenesis mechanism. We tested this idea by studying combinatorial mutational patterns in the fourteen human cancer genes containing the A(C/G)AA(C/G)(A/T) motif [16]. Co-occurrence of mutations has been detected in PIK3CA with APC, RB1 with PTEN, KIT with CTNNB1, PDGFRA and FLT3, and P53 with RB1, PTEN, PIK3CA and APC [18].
In the next step, we analyzed the selected regions of the human genome around the mutation hotspots with MotifVoter to find further motifs common to them. We compared hundreds of motifs in the resulting sets but did not detect any statistically significant common motif.
Mutation screening is one of the most popular methods for genetic diagnosis and for confirming the genetic cause of diseases. The large number of exons in many genes and the limited length of sequence that common sequencing machines can analyze are two major difficulties in mutation screening [19]. For example, the APC gene is composed of 15 exons, yet about two thirds of all its mutations are clustered in the 5’ region of exon 15 [20]. We hypothesized that DNA motifs can cause a locally high mutation rate and that such a motif can serve as a marker for selecting the most probable regions for mutation screening [21]. To test this hypothesis, we used computational analysis to examine the regions surrounding mutation hotspots in 20 human cancer genes.
Based on the results of this study, we identified A(C/G)AA(C/G)(A/T) as a DNA motif associated with fourteen mutation hotspots. In addition, the co-occurrence of mutations in different human genes containing the A(C/G)AA(C/G)(A/T) motif could be evidence of a shared mutagenesis mechanism.
The A(C/G)AA(C/G)(A/T) motif is a glucocorticoid receptor (GR) binding sequence and a cyclic polyamide recognition site [22]. The glucocorticoid response element (GRE) consists of a palindromic GR binding sequence to which a hormone-bound GR dimer can bind. GR can also behave as a ligand-inducible regulator that binds to DNA as a monomer at GRE half-sites (GRE1/2). GRE1/2s can act without associated elements [23]. For example, in the mouse mammary tumor virus (MMTV) enhancer, the GRE half-site can be bound by GR [24]. The glucocorticoid recruits an ATP-dependent chromatin remodeling complex to nucleosomes positioned over the GRE half-sites. The absence of this chromatin remodeling prevents the binding of NF-1, Oct and TBP to the DNA. In contrast, nucleosomes do not hinder GR [25]. It is likely that some features of chromatin structure affect the mutational frequency of neighboring sequences [26]. The A(C/G)AA(C/G)(A/T) motif is also the recognition site of phenylacetyl nitrogen mustard (PNA) and hairpin polyamides [27,28]. PNA shows specificity for its match sequence (5’-AGAACT-3’) in the minor groove of DNA. Replacing the sulfur of mustard gas with nitrogen yields alkylating agents such as PNA. Such agents were formerly used as toxicants and powerful mutagens, but are now used to prevent the development of neoplasms. PNA and hairpin polyamides exhibit exquisite specificity for the AGAACT sequence, and PNA conjugates have been exploited to direct the modification of DNA targets [29,30].
Some previous studies have characterized the relation between nucleotide sequences and mutation hotspots [19,21,31]. One well-known example of an intrinsic mutation hotspot is the CpG dinucleotide [32]. Another example is the signal of retroposable elements (LINEs and SINEs) in mammals [33]. The (CG)4 motif was found to be a hotspot of spontaneous frameshifts in Salmonella typhimurium [34]. GC sequences are hotspots of frameshifts during in vitro DNA synthesis by Sulfolobus solfataricus DNA polymerase IV [35]. The CATCGCTTRRT motif is a recombination signal in the Bacillus subtilis mal gene [36]. Although these studies have characterized the role of DNA motifs in replication, recombination, and mutagenesis at the sequence level, none has investigated the relation between sequence motifs and mutation hotspots in human cancer genes [21].
Sequence motifs as biomarkers hold promise as uniform, molecular, database-driven predictors of mutation hotspots in the coding regions of candidate human disease genes. In addition, the connection between gene mutations and a specific DNA motif that is a target of small molecules offers a serendipitous opportunity to apply small molecules capable of reprogramming the cellular machinery to the challenge of treating genetic diseases.
Supplementary file 1 includes histograms of mutation frequency for the 20 cancer genes and for the identically sized regions around their cold-spots and hot-spots.