ISSN: 0974-276X
Taner Arslan
Germany
Research Article
Robust Detection of Outlier Samples and Genes in Expression Datasets
Author(s): Ahmad Barghash, Taner Arslan and Volkhard Helms
Expression and methylation profiling are standard genomic techniques, and a growing number of computational methods are being developed to help analyze the large and complex volumes of data they generate. These datasets often contain a sizeable fraction of outliers that can produce misleading results in downstream analyses. Here, we present a comprehensive approach to detect sample and gene outliers in expression or methylation datasets. The core algorithms detected most of the outliers that we introduced artificially. Sample outliers detected by hierarchical clustering are validated with the Silhouette coefficient. At the gene level, the GESD, Boxplot, and MAD algorithms detected the simulated outlier genes in non-intersecting distributions with an F-measure of at least 83%. This combined approach detected many outliers in publicly available datasets from the TCGA and GEO portals. Frequent..
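To make the gene-level rules and the evaluation metric concrete, here is a minimal Python sketch of two of the detectors named above (Boxplot and MAD) and of the F-measure used to score them against artificially injected outliers. It is an illustration under stated assumptions, not the authors' implementation. The thresholds (Tukey's 1.5 × IQR fences, a robust z-score cutoff of 3, and the 1.4826 normal-consistency factor) are common textbook defaults assumed here. Applying the rules to one gene's values across samples is likewise an assumed setup. GESD is omitted because it requires Student's t quantiles that the standard library does not provide. The function names are hypothetical, and the code relies only on Python's `statistics` module (3.8+ for `quantiles`).

```python
from statistics import median, quantiles


def boxplot_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional default and an assumption here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]


def mad_outliers(values, threshold=3.0):
    """Return indices whose robust z-score |x - median| / (1.4826 * MAD) exceeds threshold.

    The factor 1.4826 makes the MAD consistent with the standard deviation
    for normally distributed data; the cutoff of 3 is an assumption.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # degenerate case (e.g. many tied values): no robust scale available
    scale = 1.4826 * mad
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]


def f_measure(predicted, truth):
    """Harmonic mean of precision and recall between flagged and true outlier indices."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical expression values for one gene across 10 samples,
    # with an artificial outlier injected at index 9.
    gene = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.0, 9.5]
    print(boxplot_outliers(gene), mad_outliers(gene))  # [9] [9]
    print(f_measure(mad_outliers(gene), [9]))  # 1.0
```

Both rules use the median and quartiles instead of the mean and standard deviation. As a result, a single extreme value cannot mask itself by inflating the spread estimate, which is the usual reason robust statistics are preferred for outlier screening.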
DOI: 10.4172/jpb.1000387