Journal of Chromatography & Separation Techniques

Open Access

ISSN: 2157-7064

On variable selection methods for analyzing high dimensional data from analytical instruments


3rd International Conference and Exhibition on Advances in Chromatography & HPLC Techniques

July 13-14, 2017 Berlin, Germany

J Jay Liu

Pukyong National University, Korea

Posters & Accepted Abstracts: J Chromatogr Sep Tech

Abstract:

The need to identify a few important variables that strongly affect an outcome of interest arises in nearly all research areas. Moreover, analyzing a smaller number of variables reduces the complexity of a problem. In this work, selected variable selection methods are applied to three real case studies from the field of analytical chemistry. The selected methods include the genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), the firefly algorithm (FA), the flower pollination algorithm (FPA), interval partial least squares (iPLS), sparse PLS (sPLS), the least absolute shrinkage and selection operator (LASSO), the least angle regression algorithm (LARS), and uninformative variable elimination-PLS (UVE-PLS). The three case studies are (i) development of quantitative structure-retention relationship (QSRR) models, (ii) multivariate calibration of soil carbonate content using Fourier transform mid-infrared (FT-MIR) spectral information, and (iii) diagnosis of prostate cancer patients using gene expression information. In the latter two case studies, besides the quantitative performance measures of error and accuracy often used in variable selection studies, a qualitative measure, the selection index (SI), was introduced to evaluate the methods in terms of the quality of the selected features. The results of the first case study show that all five metaheuristic variable selection methods outperform iPLS, sPLS, and the full PLS model, with GA superior owing to its lowest computational cost and higher accuracy with a smaller number of variables. The results of the second case study show that, in order of decreasing predictive ability and robustness, GA > FA > PSO > LASSO > LARS is recommended for application in regression involving spectral information.
In the third case study, the trend GA > PSO > FA > LASSO > LARS (accuracies of 100, 95.12 and 90.24%) was observed. In this classification case, only LARS exhibited a considerable decrease in accuracy upon the introduction of noise features.
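As a rough illustration of the GA-based variable selection compared above, the following is a minimal sketch on synthetic data. It is not the study's actual implementation: the fitness function, GA parameters, and data are all illustrative assumptions. It evolves binary masks over the columns of a data matrix, scoring each mask by its least-squares fit plus a parsimony penalty on the number of variables kept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples, 30 candidate variables, only 3 informative.
n, p = 100, 30
X = rng.normal(size=(n, p))
true_idx = [2, 7, 15]
y = X[:, true_idx] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.normal(size=n)


def fitness(mask):
    """Score a binary variable mask: least-squares error on the
    selected columns, plus a penalty per variable kept (lower is better)."""
    if mask.sum() == 0:
        return np.inf
    Xs = X[:, mask]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return resid @ resid + 0.5 * mask.sum()  # parsimony penalty


def ga_select(pop_size=40, generations=60, p_mut=0.05):
    """Minimal GA over binary masks: elitism, tournament selection,
    uniform crossover, and bit-flip mutation."""
    pop = rng.random((pop_size, p)) < 0.2  # sparse initial masks
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        new_pop = [pop[scores.argmin()].copy()]  # keep the best (elitism)
        while len(new_pop) < pop_size:
            # Tournament selection of two parents
            a, b = rng.integers(pop_size, size=2)
            pa = pop[a] if scores[a] < scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            pb = pop[a] if scores[a] < scores[b] else pop[b]
            # Uniform crossover followed by bit-flip mutation
            cross = rng.random(p) < 0.5
            child = np.where(cross, pa, pb)
            child ^= rng.random(p) < p_mut
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmin()]


best = ga_select()
print(sorted(int(i) for i in np.flatnonzero(best)))
```

With a strong signal and a per-variable penalty, the best mask found typically contains the three informative columns; the same mask-and-fitness framing carries over to PSO, ABC, FA, and FPA by swapping the search strategy while keeping the fitness function.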

Biography:

Email: jayliu@pknu.ac.kr
