ISSN: 0974-276X
Brett A. Lidbury
Scientific Tracks Abstracts: J Proteomics Bioinform
Diagnostic pathology laboratories are essential for modern health systems, providing high-quality blood test results that reflect health and disease status. These laboratories sample the surrounding population continuously, and in doing so accumulate enormous volumes of human physiological and biochemical data, for example on liver and kidney function, lipid metabolism, and blood cell biology. Pathology data also reflect naturally acquired disease processes in human subjects which, while more diverse than those observed under research laboratory conditions, represent true human disease biology. Patient health is currently evaluated by comparing individual pathology results against laboratory reference ranges. The availability of massive data sets and machine-learning methods provides opportunities to advance both laboratory diagnostics and fundamental research through "knowledge discovery" bioinformatics, particularly pattern recognition methods. Pattern recognition in aggregated pathology data is being explored via combinations of tree-based machine learning and support vector machines (SVMs), executed through R statistical computing packages. Tree methods (recursive partitioning) bring the advantage of multiple decision boundaries, while SVMs provide powerful categorisation and regression modelling in high-dimensional feature space. Tree methods and SVMs have been applied to large data sets comprising immunoassay data for hepatitis B virus (HBV) or Chlamydia pneumoniae (the response variable) and associated routine pathology test results (the predictor variables). Challenges involving unbalanced data and missing values have been met, and data patterns of high predictive value have been identified for future biological validation. As well as offering benefits for laboratory medicine, this strategy is included in a novel system designed to provide an alternative to mouse models in fundamental research.
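
The abstract names R packages but gives no code, so the following is only a minimal sketch of three ideas it mentions: filling missing values, reweighting an unbalanced response, and a single recursive-partitioning split. It is written in plain Python for illustration and is not the author's implementation. The analyte columns, the toy values, median imputation, inverse-frequency weighting, and the Gini criterion are all assumptions made for this example, and the SVM step is not shown.

```python
from statistics import median

# Hypothetical toy data: two routine test results per patient (columns are
# illustrative only), with None marking a missing value.
rows = [[20, 70], [25, None], [22, 80], [30, 75],
        [28, 90], [None, 85], [95, 72], [110, 88]]
# Hypothetical immunoassay outcome (1 = positive); deliberately unbalanced.
labels = [0, 0, 0, 0, 0, 0, 1, 1]

def impute_median(rows):
    """Replace each None with the median of the observed values in its column."""
    columns = list(zip(*rows))
    medians = [median([v for v in col if v is not None]) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def class_weights(labels):
    """Weight each class inversely to its frequency (an assumed balancing scheme)."""
    n, classes = len(labels), sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}

def weighted_gini(labels, weights):
    """Gini impurity of a binary node, with each case counted by its class weight."""
    total = sum(weights[y] for y in labels)
    if total == 0:
        return 0.0
    p = sum(weights[y] for y in labels if y == 1) / total
    return 2 * p * (1 - p)

def best_split(rows, labels, weights):
    """One partitioning step: return (impurity, feature index, threshold)."""
    total = sum(weights[y] for y in labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            w_left = sum(weights[y] for y in left)
            w_right = sum(weights[y] for y in right)
            score = (w_left * weighted_gini(left, weights)
                     + w_right * weighted_gini(right, weights)) / total
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

imputed = impute_median(rows)
weights = class_weights(labels)
split = best_split(imputed, labels, weights)
print(split)
```

A full tree would apply `best_split` again to each resulting subset until a stopping rule is met, which is what gives tree methods their multiple decision boundaries.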