ISSN: 2329-8936
Michael G Sadovsky
Accepted Abstracts: Transcriptomics
The aim is to reveal the relation between the structure of organelle genomes and the taxonomy of their bearers. Here the structure is a frequency dictionary of triplets. The clusterization of these dictionaries in 63-dimensional space has been studied. It yields several clusters of entities, and the clade composition of those clusters has been investigated. The clusters were developed through both linear (K-means) and non-linear (rigid map) techniques. An ensemble of chloroplast genomes (251 entries) and an ensemble of mitochondrial genomes (2997 entries) were used. We developed a series of clusterizations through K-means. The series differ in the number of clusters, starting from 8 clusters in the first series and ending with 3 clusters in the last one. The clade composition of the clusters in each series has been studied. A change of series (say, the transfer from seven clusters to six) resulted in the fusion of some clusters into a single one. The clade composition of the clusters was extremely non-random: closely related species occupied the same cluster, and exceptions were very rare. It was found that Eucalyptus, Glycine and Magnolia occupy the same cluster, Nannochloropsis, Cyanophora and Pavlova occupy another one, and Selaginella and Chlorophyta occupy a third; these are among the clusters leading in size and in the abundance of the species composing them. Remarkably, there are few clusters that split into two "elder" ones. Graphically, this is shown as two vertices with two outgoing edges. A similar pattern is observed for mitochondrial genomes. The clusterization into two classes yields an extremely stable and exact separation of the genomes of vertebrates from the genomes of invertebrates: the figure shows the distribution of mitochondrial genomes into two classes obtained through K-means (left) vs. the coloring of the same genomes (right), with vertebrates marked in red and invertebrates marked in dark.
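The procedure described above (a triplet frequency dictionary per genome, followed by K-means over the dictionaries) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the sliding reading window with step 1, the dropping of one triplet to obtain 63 coordinates (the 64 frequencies sum to one), the farthest-point seeding of the cluster centers, and the toy sequences are all assumptions made for illustration.

```python
from itertools import product

# All 64 triplets over the four-letter nucleotide alphabet.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(seq, step=1):
    """Frequency dictionary of triplets for one sequence.

    Assumption: a sliding window moved by `step` nucleotides; windows
    containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(0, len(seq) - 2, step):
        word = seq[i:i + 3]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid triplets")
    return {w: c / total for w, c in counts.items()}


def to_point(freqs):
    """Map a dictionary to a point in 63-dimensional space.

    Assumption: the frequencies sum to 1, so one coordinate is
    redundant and the last triplet is dropped.
    """
    return [freqs[w] for w in TRIPLETS[:-1]]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd-style K-means with deterministic farthest-point seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for genomes (hypothetical, not real organelle sequences).
toy_genomes = ["AT" * 10, "AT" * 10 + "A", "GC" * 10, "GC" * 10 + "G"]
toy_points = [to_point(triplet_frequencies(g)) for g in toy_genomes]
toy_labels = kmeans(toy_points, k=2)
print(toy_labels)
```

In this toy run the two AT-rich sequences fall into one cluster and the two GC-rich sequences into the other. Repeating the call with k running from 8 down to 3 would give a series of clusterizations of the kind described above.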
The taxonomic composition of the clusters is very far from random: closely related species tend to occupy the same cluster (vertex). The most intriguing point is that proximity in terms of genome structure was determined over the organelle (chloroplast or mitochondrion) genomes, while proximity in terms of taxonomy is derived from morphology (determined by the nuclear genome); these two genetic entities are physically isolated. This strong and evident correlation between the clusters identified through proximity in triplet frequencies and their taxonomic composition clearly and unambiguously demonstrates a highly expressed synchrony in the evolution of host and organelle genomes.
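One way to quantify how far the taxonomic composition of clusters is from a random one is sketched below. The abstract does not state which measure was used. Cluster purity, the label-permutation test, the function names and the toy labels are all assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def composition(cluster_labels, clades):
    """Count how many genomes of each clade fall into each cluster."""
    table = defaultdict(Counter)
    for cluster, clade in zip(cluster_labels, clades):
        table[cluster][clade] += 1
    return table


def purity(cluster_labels, clades):
    """Share of genomes that belong to the dominant clade of their cluster."""
    table = composition(cluster_labels, clades)
    return sum(max(cnt.values()) for cnt in table.values()) / len(clades)


def permutation_p_value(cluster_labels, clades, n_perm=1000, seed=0):
    """Estimate how often randomly shuffled clade labels reach the observed purity."""
    rng = random.Random(seed)
    observed = purity(cluster_labels, clades)
    shuffled = list(clades)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if purity(cluster_labels, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy labels: two clusters that coincide with two clades.
toy_clusters = [0] * 6 + [1] * 6
toy_clades = ["vertebrate"] * 6 + ["invertebrate"] * 6
toy_purity = purity(toy_clusters, toy_clades)
toy_p = permutation_p_value(toy_clusters, toy_clades)
print(toy_purity, toy_p)
```

A purity close to 1 together with a small permutation estimate would indicate that closely related species share clusters far more often than chance allows. Other measures of agreement between two partitions could serve the same purpose.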