ISSN: 2376-130X
Review Article - (2014) Volume 1, Issue 1
The process of drug discovery requires the integration of biochemical and genetic tests to analyze the effects of drug molecules on biological systems. Comparative proteomic/lipidomic methods have identified a large number of differentially expressed novel proteins and lipids that can be used as prominent biomarkers for disease classification and drug resistance. Lipidomics and proteomics are used not only for target identification and deconvolution but also for off-target analysis and for studying the mode of action of drug molecules. In addition, they play significant roles in toxicity studies and preclinical trials at very early stages of drug development, as well as in the analysis of adverse effects of existing drug molecules. Since large-scale ‘omics’ data are now available in the public domain, bioinformatics and statistical analysis tools are needed to extract knowledge from this vast amount of data. This review gives a brief overview of advances in technological and computational methods in lipidomics- and proteomics-based drug design.
Keywords: Proteomics, Lipidomics, Mass spectrometry, Target deconvolution, Off-target, Toxicity
Drug discovery is a time-consuming and cost-intensive process. It takes around 12-15 years and costs up to €800 million to bring a new drug to market [1,2]. Most drugs exert their therapeutic effect by binding to and regulating the activity of a particular therapeutic target. Identification and validation of such targets is the first and a critical step in drug discovery. Therapeutic targets can currently be identified using both structure-based and sequence-based approaches. Selectivity and specificity are not only major challenges in drug design but also important reasons for the withdrawal of drug molecules from the market [2,3]. Because the human body contains a vast number of proteins, it is practically impossible to ascertain whether a drug molecule binds with high affinity only to its intended target or also interacts with off-targets.
The drug discovery process requires several biochemical and genetic assays to delineate the effects of drug candidates on cellular systems and model organisms [4]. State-of-the-art proteomics/lipidomics techniques quantitatively measure changes in proteins/lipids and their isoforms upon drug exposure and are important tools at various stages of small-molecule drug discovery. Advances in high-throughput technology and a better understanding of biology have helped in this respect. ‘Omics’ technologies use high-throughput techniques to generate vast amounts of data, opening new directions in drug discovery [5]. In general, the ‘-omics’ suffix denotes the study of the entire set of entities in a class. ‘Omics’ data provide comprehensive descriptions of nearly all components and interactions within a system, which are required for a system-level understanding [6]. Genomics, proteomics, toxicogenomics, lipidomics, pharmacogenomics, metabolomics and other areas of ‘omics’ have become useful tools in modern drug discovery. These ‘omics’ technologies are widely used in disease biomarker identification [4,7], drug target identification [8-11] and profiling of drug molecules [12-14].
Proteomics-based methods are now popular in drug target identification as well as in off-target analysis. Recent technological advances in mass spectrometry (MS) and rapid improvements in chromatographic techniques have led to the rapid expansion of proteomics and lipidomics. Recent developments in computational database search algorithms [15] and pathway mapping [16] have opened new dimensions in biomarker and target identification. Various software packages for MS data processing and analysis are already available, but they produce many false-positive and false-negative hits, so new components that overcome this problem still need to be integrated [6-18]. This review provides an overview of applications of lipidomics and proteomics in drug design.
Proteins are the principal targets of small-molecule drugs. Common applications of proteomics in drug discovery include target identification and validation, identification of toxicity biomarkers, efficacy estimation, and understanding the mode of action and toxicity of drug molecules. MS-based proteomics technologies are ideally suited to biomarker discovery without prior knowledge of quantitative and qualitative changes in proteins. The following are the major areas in drug discovery where proteomics has become popular.
Drug target deconvolution is the process of identifying the complete spectrum of proteins associated with a bioactive drug molecule [4,19]. Knowing the spectrum of target proteins of a bioactive drug molecule aids drug toxicity research by revealing off-targets, which helps in prioritizing drugs. It also aids in identifying additional, unexplored targets of existing drug molecules. It is therefore worthwhile to include detailed target deconvolution in every drug discovery process. The deconvolution of therapeutic drug targets should be performed in consecutive steps: experiments are repeated at least twice, and only the resulting consensus proteins are considered valid. Proteins that are observed repeatedly in independent experiments using unrelated drugs and/or matrices without immobilized drug are removed from the list. Finally, the most frequent proteins present across several cell lines, known as ‘core-proteome’ proteins, are also removed from the final list [20].
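The filtering steps above can be sketched as set operations. The Python sketch below is a minimal illustration that assumes each pull-down has already been reduced to a set of protein identifiers; the function name, its parameters and the protein IDs are hypothetical, and by default a protein must appear in every replicate to count as a consensus protein.

```python
from collections import Counter


def deconvolute_targets(replicates, control_hits, core_proteome, min_replicates=None):
    """Filter affinity pull-down hits down to candidate drug targets (hypothetical helper).

    replicates:     list of sets of protein IDs, one set per independent pull-down
    control_hits:   proteins seen with unrelated drugs or drug-free control matrices
    core_proteome:  frequently observed 'core-proteome' proteins across cell lines
    min_replicates: replicates a protein must appear in (default: all of them)
    """
    required = len(replicates) if min_replicates is None else min_replicates
    counts = Counter(protein for rep in replicates for protein in set(rep))
    consensus = {protein for protein, n in counts.items() if n >= required}
    # Drop non-specific binders first, then ubiquitous core-proteome proteins.
    return consensus - set(control_hits) - set(core_proteome)


# Hypothetical protein IDs, for illustration only.
runs = [{"NAMPT", "HSP90AA1", "TUBB", "PARP1"}, {"NAMPT", "HSP90AA1", "TUBB"}]
print(sorted(deconvolute_targets(runs, control_hits={"HSP90AA1"}, core_proteome={"TUBB"})))
# prints ['NAMPT']
```

PARP1 is dropped because it appears in only one replicate, HSP90AA1 because it also binds the control matrix, and TUBB because it belongs to the core proteome.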
The direct way to identify the molecular target of a drug candidate involves immobilizing the drug molecule on a solid matrix, e.g. agarose, sepharose or streptavidin magnetic beads [21-24]. Incubating complex protein mixtures, such as cell, tissue or organ lysates, with the matrix-bound drug molecule captures the target proteins. The matrix and linker molecule are selected for little or no non-specific protein binding. A control consisting of beads carrying the linker but no drug molecule is included in every experiment to identify non-specific binders. Chemoproteomic target deconvolution based on classical drug affinity chromatography has previously been used successfully to identify the molecular targets of immunosuppressants [25,26] and histone deacetylase inhibitors [27]. Protein kinases are major therapeutic targets, and their involvement in cancer and inflammation has been well explored. Although several successful cancer drugs, such as imatinib and dasatinib, are associated with well-defined protein kinase target profiles, several off-targets have been identified for these drugs [28,29]. The use of single immobilized kinase inhibitors allows the capture of specific target proteins [30,31] as well as off-targets.
Fleischer et al. [32] used affinity-based proteomics to identify nicotinamide phosphoribosyltransferase as a target of the potent and selective cytotoxic agent CB30865. Huang et al. [33] used a chemoproteomic approach to identify tankyrases as the targets of the small molecule XAV939. Filippakopoulos et al. [34] demonstrated that the small molecule JQ1 displaces BET proteins from chromatin and is therefore effective in patient-derived xenograft models of squamous carcinoma. Dawson et al. [35] used a multitier proteomic strategy to characterize the BET-dependent histone binding of various protein complexes, including the super elongation complex (SEC) and the polymerase-associated factor complex. These success stories suggest that chemoproteomic approaches enable the identification of direct drug targets and provide insights into regulatory mechanisms that depend on protein-protein interactions.
Binding-mode-centric profiling, based on the binding or activity of a small drug molecule against the proteins of a particular target class, can help in analyzing the selectivity and specificity of drug molecules. The affinity of a given compound for all members of a target class is determined by quantifying the amount of each protein captured by the affinity matrix. Specifically, binding-inhibition curves are obtained and used to calculate apparent Kd values [36-38]. This is a robust and reliable approach because proteins are assayed under physiological conditions. In addition, the multiplexing capability of MS for protein identification can provide ranked affinities of a compound against all members of the target class in a single experiment.
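As an illustration of how a binding-inhibition curve yields a potency value, the sketch below fits a Hill-type curve, f = 1/(1 + (c/IC50)^h), where f is the fraction of a protein still captured by the matrix at free drug concentration c. It assumes noise-free fractions strictly between 0 and 1 and fits the logit-linear form by ordinary least squares, a simplification of the nonlinear curve fitting normally used. Converting an IC50 into an apparent Kd requires additional assay-specific corrections, which are not shown; the function name and data are hypothetical.

```python
import math


def fit_ic50(concs, fractions):
    """Estimate IC50 and Hill slope from a competition binding curve (sketch).

    concs:     free competitor (drug) concentrations, e.g. in nM (all > 0)
    fractions: fraction of the target captured by the matrix relative to the
               vehicle control, strictly between 0 and 1
    Assumes f = 1 / (1 + (c / IC50)**h) and fits its linear form
    ln(1/f - 1) = h*ln(c) - h*ln(IC50) by least squares.
    """
    xs = [math.log(c) for c in concs]
    ys = [math.log(1.0 / f - 1.0) for f in fractions]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    hill = sxy / sxx
    intercept = mean_y - hill * mean_x  # equals -h * ln(IC50)
    return math.exp(-intercept / hill), hill


concs = [10.0, 30.0, 100.0, 300.0, 1000.0]
fractions = [1.0 / (1.0 + c / 100.0) for c in concs]  # synthetic, noise-free curve
ic50, hill = fit_ic50(concs, fractions)
print(round(ic50, 3), round(hill, 3))  # 100.0 1.0
```

The linearization distorts measurement errors near f = 0 and f = 1, which is why real analyses usually prefer direct nonlinear fitting.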
In the case of protein kinases, the conserved ATP-binding site has been used to generate non-selective ATP-competitive affinity matrices (e.g. ‘kinobeads’) that allow the determination of IC50 values, and thus of drug selectivity and specificity, for up to 150 kinases in a single experimental run [28,29,39]. Such ‘kinobeads’ have been successfully applied to the selectivity profiling of clinical BCR-ABL inhibitors in the chronic myeloid leukemia cell line K562 [28], EGFR inhibitors in HeLa cells [37], and 13 other multi-kinase inhibitors under clinical investigation in chronic lymphocytic leukemia cells [29]. Wu et al. [40] used immobilized kinase inhibitors to identify targets in head and neck cancer by analyzing the kinase complement of 34 squamous cell carcinoma cell lines established from patients.
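Once apparent Kd values are available for many kinases, a compound's profile can be ranked and summarized. The sketch below sorts targets by affinity and reports the fraction of tested kinases bound below a cutoff; this fraction-below-threshold score is one common selectivity convention, used here as an assumption, and the 100 nM cutoff, function name and Kd values are hypothetical. A lower score indicates a more selective compound.

```python
import math


def rank_and_selectivity(kd_nm, threshold_nm=100.0):
    """Rank targets by apparent Kd and compute a threshold-based selectivity score.

    kd_nm: dict target -> apparent Kd in nM (math.inf for no detectable binding)
    Returns (list of (target, Kd) sorted by affinity,
             fraction of tested targets with Kd below threshold_nm).
    """
    ranked = sorted(kd_nm.items(), key=lambda item: item[1])
    hits = sum(1 for _, kd in ranked if kd < threshold_nm)
    return ranked, hits / len(kd_nm)


# Hypothetical kinase profile of a single compound.
profile = {"ABL1": 2.0, "SRC": 15.0, "KIT": 80.0, "EGFR": math.inf}
ranked, s100 = rank_and_selectivity(profile)
print(ranked[0], s100)  # ('ABL1', 2.0) 0.75
```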
Chemoproteomic target deconvolution of lead molecules does not necessarily identify well-annotated and well-characterized proteins. Hence, an initial challenge is to link these proteins to disease biology and to elucidate the mode of action by which the drug molecules generate the observed response phenotype.
Building protein-protein interaction networks by affinity proteomic approaches can help characterize the functional roles of proteins under experimental conditions. Ideally, placing a protein into an interaction network directly identifies it as a player in the disease process under investigation. In addition, protein-protein interaction studies can shed light on mechanisms, other than direct inhibition or activation, by which a drug can modulate target activity. Comparing protein complex formation with and without compound treatment, either in cell lysate or during the purification procedure, allows the identification of compound-sensitive protein-protein interactions [41]. The generation of large-scale protein interaction maps also enables the identification of more favorable drug target candidates.
Proteomic approaches are becoming important tools for characterizing the mode of action of drug compounds that modulate enzymes, shedding light on post-translational modifications of substrate proteins such as phosphorylation, acetylation and ubiquitination. Differential phosphoproteomic analysis using selective small-molecule inhibitors of particular kinases has been used to identify kinase substrates in human cells and to characterize the effects of kinase inhibition on signaling. The chemogenetic kinase trapping approach allows direct and unequivocal identification of kinase substrates. It uses a kinase with a genetically engineered ATP-binding pocket that can accept an unnatural, bulky ATP analog. Because this analog cannot bind wild-type kinases, they cannot transfer its phosphate group to substrate proteins, so only substrates of the engineered kinase are labeled. The use of thio-ATP, followed by a covalent capture step and identification of the modified peptides by MS, has been successfully applied to characterize human CDK1 and other CDK proteins [42].
Another current focus of drug discovery efforts is the identification of epigenetic targets that modulate the post-translational modification state of histones. Quantitative proteomics has been successfully used to study the effects of small drug molecules by monitoring protein acetylation and methylation. Proteomic approaches are not restricted to identifying the mode of action; they can also be applied to identify cellular mechanisms of drug resistance [43].
Proteomics-based biomarker discovery has gained substantial attention in recent years. The identification of prominent biomarkers of disease, drug efficacy and drug toxicity is important in drug discovery and disease diagnosis. The overall goal of biomarker profiling is to identify the proteins that are differentially expressed in diseased compared with normal cells. For example, Korolainen et al. [44] identified 26 proteins that show statistically significant changes in Alzheimer’s disease.
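A minimal sketch of the differential-expression step behind biomarker profiling: it compares log2-transformed replicate intensities between disease and control samples and calls a protein up- or down-regulated when both the log2 fold change and Welch's t statistic exceed cutoffs. The cutoffs (a 2-fold change and |t| >= 3), the function names and the data are assumptions for illustration; real studies typically derive p-values and correct them for multiple testing.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def call_differential(disease, control, min_log2fc=1.0, min_abs_t=3.0):
    """Label each protein 'up', 'down' or None (not differential).

    disease, control: dict protein -> replicate intensities (> 0, at least 2 each)
    """
    calls = {}
    for protein in sorted(disease.keys() & control.keys()):
        log_d = [math.log2(x) for x in disease[protein]]
        log_c = [math.log2(x) for x in control[protein]]
        log2fc = mean(log_d) - mean(log_c)
        t = welch_t(log_d, log_c)
        if abs(log2fc) >= min_log2fc and abs(t) >= min_abs_t:
            calls[protein] = "up" if log2fc > 0 else "down"
        else:
            calls[protein] = None
    return calls


# Hypothetical replicate intensities for three proteins.
disease = {"P1": [400, 420, 380], "P2": [50, 55, 45], "P3": [100, 130, 80]}
control = {"P1": [100, 110, 90], "P2": [200, 210, 190], "P3": [100, 90, 120]}
print(call_differential(disease, control))  # {'P1': 'up', 'P2': 'down', 'P3': None}
```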
A mechanistic biomarker of drug efficacy can be identified by monitoring the levels of post-translational modifications (PTMs), such as kinase-mediated phosphorylation, protein acetylation and deacetylation, or protein fragmentation by protease activity, through quantitative and qualitative global proteome profiling. The power of MS-based proteomics lies in its ability to discover these modifications on a large scale and to monitor their responses to drug treatment. It can also estimate quantitative changes in protein levels caused by other system perturbations [45]. For example, the output of an enzymatic activity can serve as a pharmacodynamic biomarker, with global protein levels monitored as a readout of the effect of the applied treatment.
Proteomic studies of drug selectivity and mode of action can provide appropriate molecular toxicity biomarkers. Liver toxicity is one of the most common problems. Global proteomic profiling of human hepatocytes or rodent livers treated with a drug can identify proteins whose abundance changes markedly in response to the drug and that may be useful as surrogate pharmacodynamic biomarkers [46]. It is important to translate such findings from cell lines to relevant animal models of disease and, eventually, to humans.
Proteomics is being applied to identify biomarkers of cancer and drug resistance, thereby leading to personalized therapeutic strategies for cancer patients. Besada et al. [47] performed comparative proteomic analysis of tamoxifen-sensitive and tamoxifen-resistant breast tumor xenografts and observed that twelve proteins were up-regulated and nine were down-regulated. Umar et al. [48] performed comparative proteomic analyses of tamoxifen-sensitive and tamoxifen-resistant human breast tumor cells purified by laser capture microdissection (LCM). They found that a set of biomarkers, including extracellular matrix metalloproteinase inducer, ENPP1, EIF3E and GNB4, is associated with tamoxifen resistance. Recent research suggests that several proteins involved in modulating the response to cisplatin in ovarian cancer, such as annexin IV and claudin-4, are potential biomarkers of treatment response.
Biomarker discovery studies typically produce a lengthy list of candidate proteins that are differentially detected in cases versus controls, and these candidates require further verification in large numbers of samples. Verification requires targeted, multiplexed assays that screen and quantify the candidate proteins in patient plasma samples with high sensitivity and specificity. Because no quantitative assay exists for the majority of human proteins, assays such as enzyme-linked immunosorbent assays (ELISA) must be developed de novo for clinical testing of candidate protein biomarkers, and de novo assay development is very expensive when many candidates must be tested. Proteomics has become an integral part of biomarker discovery and of the quantification and validation of candidates in body fluids [49,50]. Selected reaction monitoring/multiple reaction monitoring (SRM/MRM) mass spectrometry holds promise for overcoming this bottleneck, as it is highly reproducible across complex samples. Keshishian et al. [51] quantified in serum six biomarkers that had previously been measured by ELISA. Later, Whiteaker et al. [52] reported fibulin-2 as a marker for breast cancer, and Nicol et al. [53] reported carcinoembryonic antigen as a marker of lung cancer using MRM-MS. Recently, Muraoka et al. [54] identified and quantified 5122 proteins with high confidence in tissue samples from 18 breast cancer patients using shotgun proteomics coupled with isobaric tags for relative and absolute quantitation (iTRAQ) and SRM/MRM. A total of 61 proteins were altered by 2-fold or more between high- and low-risk breast cancer tissues, and 49 of these were subsequently verified by targeted proteomics using SRM/MRM. Twenty-three proteins were shown to be differentially expressed between the high- and low-risk groups. Narumi et al. [55] performed large-scale differential phosphoproteome analysis of human breast cancer tissues from high- and low-risk recurrence groups using iTRAQ, with subsequent validation by SRM/MRM. They successfully quantified 15 probable cancer biomarker phosphopeptides by SRM using stable isotope-labeled peptides.
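The stable isotope dilution principle behind such SRM quantification can be sketched in a few lines: the endogenous (light) peptide amount is estimated from the light-to-heavy peak-area ratio multiplied by the known amount of spiked heavy peptide. The function name, the use of the median ratio across transitions, and the numbers are assumptions for illustration.

```python
from statistics import median


def sid_amount(light_areas, heavy_areas, heavy_fmol):
    """Stable isotope dilution estimate of an endogenous peptide amount (sketch).

    light_areas: SRM peak areas of the endogenous (light) peptide, one per transition
    heavy_areas: peak areas of the matching transitions of the spiked heavy peptide
    heavy_fmol:  known amount of heavy standard spiked into the sample (fmol)
    Assumes the light and heavy forms ionize and fragment identically.
    """
    ratios = [light / heavy for light, heavy in zip(light_areas, heavy_areas)]
    return median(ratios) * heavy_fmol


print(sid_amount([2000, 1800, 2200], [4000, 4000, 4000], heavy_fmol=50.0))  # 25.0
```

Taking the median over transitions is one simple way to limit the influence of a single interfered transition.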
Lipids in the human body are enormously complex and are fundamental components of biological membranes. They play multiple important roles in biological systems, including the formation of cellular membranes, energy storage and cell signaling, and can therefore be expected to reflect health and disease. Lipidomics is a branch of metabolomics that focuses on the comprehensive analysis of lipids in biological systems. Lipidomics research involves the identification and quantification of thousands of cellular lipid molecular species and their interactions with other lipids, proteins, sugars and metabolites. Lipidomics has recently attracted attention because of the well-recognized roles of lipids in numerous human diseases, such as diabetes, obesity, atherosclerosis and Alzheimer’s disease. Lipidomics not only provides insights into the specific roles of lipid molecular species in health and disease but also assists in identifying potential biomarkers for preventive or therapeutic approaches to human health. The major objective of lipidomics is to link lipid metabolites and/or lipid metabolic pathways in complex biological systems and to interpret changes in lipid metabolism, or in the regulation of these pathways, in metabolic and inflammatory diseases from a physiological and/or pathological perspective. Lipidomics usually focuses on measuring system-level alterations of lipids that indicate disease or that result from environmental perturbations, diet, drugs, toxins or genetics [56].
Recent advances in MS and innovations in chromatographic technologies have largely driven the progress of lipidomics. Lipidomics extends traditional lipid research by addressing two major questions: (i) how lipid metabolites and/or lipid metabolic pathways in complex biological systems are linked to an individual’s metabolic health; and (ii) how changes in lipid metabolism, or in the regulation of these pathways, are linked to metabolic and inflammatory diseases from a pathophysiological perspective. In clinical investigations, the lipid profiles of individuals in a disease state or with specific genetic profiles, compared with those of controls, often form the basis for detecting potential biomarkers of disease or of specific gene expression [57,58].
Lipidomic analyses are usually performed by shotgun and/or targeted approaches, depending on the research question. In shotgun lipidomics, multiple lipid classes are analyzed in one run by infusing lipid extracts directly into a mass spectrometer. The advantage of the shotgun approach is that it enables the identification and quantification of hundreds of lipids in less than 30 min per sample, making it suitable for initial screening. Importantly, the shotgun approach has been shown to be highly reproducible, meeting good laboratory practice (GLP) requirements [59]. In targeted lipidomics, lipid extracts are first separated by liquid chromatography and then analyzed by online MS [60]. Lipidomic approaches are applicable to all therapeutic areas, including cardiovascular disease, autoimmune disease, diabetes, neurological disease, cancer and inflammatory diseases [61]. The following are the major areas in drug discovery where lipidomics has become popular.
Lipid metabolic disorders or abnormalities are involved in several human diseases, such as inborn diseases and syndromes, coronary heart disease, brain injuries and cancer, as well as the other diseases mentioned above. For example, obesity is very common and is one of the most important risk factors for heart disease and diabetes. High levels of low-density lipoprotein (LDL) and triacylglycerol and decreased levels of high-density lipoprotein (HDL) are common indicators of abdominal obesity. Therefore, monitoring alterations of lipid metabolites in biological samples can help identify lipid metabolites indicative of metabolic disorders or disease. Quehenberger et al. [62] described MS-based lipidomic tools, developed by the LIPID MAPS Consortium [63], that were used for the systematic identification and quantification of the human lipidome. They reported plasma concentrations of more than 500 different lipid species from six main lipid categories [64,65]. Jung et al. [66] developed a high-throughput toolkit and anticipate that it will contribute to basic and nutritional research and promote the discovery of new disease biomarkers, disease-related mechanisms of action and drug targets. Min et al. [67] performed qualitative and quantitative profiling of six categories of urinary phospholipids from patients with prostate cancer to develop an analytical method for candidate biomarker discovery by shotgun lipidomics. Using nanoflow liquid chromatography-electrospray ionization-tandem mass spectrometry, they found that one phosphatidylcholine, one phosphatidylethanolamine, six phosphatidylserines and one phosphatidylinositol differed significantly between controls and cancer patients.
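MS-based identification of lipid species, as in the shotgun workflows described earlier, starts by matching measured m/z values to reference masses within a parts-per-million (ppm) tolerance. The Python sketch below illustrates this step; the three reference [M+H]+ values are approximate monoisotopic masses included only for illustration and should be checked against a curated resource such as the LIPID MAPS structure database, and the 5 ppm tolerance and function names are assumptions.

```python
def ppm_error(observed_mz, reference_mz):
    """Mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def annotate_peaks(peaks, library, tol_ppm=5.0):
    """Map each observed m/z to the library entries lying within tol_ppm."""
    return {mz: [name for name, ref in library.items()
                 if abs(ppm_error(mz, ref)) <= tol_ppm]
            for mz in peaks}


# Approximate monoisotopic [M+H]+ values, for illustration only.
LIBRARY = {
    "PC 34:1 [M+H]+": 760.5851,
    "PE 34:1 [M+H]+": 718.5381,
    "SM d34:1 [M+H]+": 703.5749,
}
print(annotate_peaks([760.5860, 718.5500], LIBRARY))
```

A precursor-mass match alone cannot distinguish isobaric or isomeric species, so MS/MS fragmentation data are normally needed to confirm an annotation.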
Recently, Zhou et al. [68] identified plasma lipid biomarkers for prostate cancer using lipidomics and bioinformatics. They identified 15 candidate lipid markers that classify disease and normal samples with 97.3% accuracy, demonstrating the power of lipidomics in disease biomarker discovery. Drug toxicity marker analysis is another promising area for high-throughput lipidomics. Ximelagatran, an oral thrombin inhibitor, was withdrawn from the market owing to an increased risk of severe liver damage of unknown cause, and Sergent et al. [69] investigated its hepatotoxicity by lipidomic analysis. Based on their results, the investigators concluded that the lipid changes led to a loss of membrane integrity and leakage of cellular proteins, and they identified distinct molar phospholipid ratios as novel biomarkers of ximelagatran hepatotoxicity. Recently, Jänis et al. [70] reported lipid biomarkers of drug efficacy. Several proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors are being developed by pharmaceutical companies because these compounds have been identified as potent LDL cholesterol-lowering drugs. Lipidomic analysis of humans carrying a well-characterized PCSK9 loss-of-function mutation showed that PCSK9 inhibition lowered plasma concentrations of certain cholesteryl esters and short-chain sphingolipid species much more efficiently than it lowered LDL cholesterol. The authors suggested that these specific lipid species could be used to characterize novel PCSK9 inhibitors and as sensitive efficacy markers of PCSK9 inhibition.
Lipids play vital roles in several biological functions, and differential changes in the concentrations of different lipids can be used as probes of the functionality of various metabolic pathways in disease. This area is still unexplored, but we believe that integrating gene expression, flux lipidomics and other omics data can play a vital role in target identification in the future.
Large amounts of proteomics and lipidomics data are now available in the public domain. Omics bioinformatics is therefore an emerging and challenging field for proteomics and lipidomics. Changes in protein/lipid concentrations in living biological systems reflect regulation at multiple spatial and dynamic scales, e.g. cellular biochemical reactions, intracellular trafficking of proteins and lipids, changes in cell membrane composition, protein biosynthesis and degradation, and lipid metabolism and oxidation. To address protein/lipid regulation, the following bioinformatics steps are required: (a) data processing and identification, (b) statistical analysis of the data, and (c) pathway analysis.
Preprocessing of data
The specific workflow for proteomics data processing depends on the biological problem. Data pre-processing and identification methods facilitate turning raw omics data from experiments into a final proteomics/lipidomics dataset that can be interpreted and analyzed. This may include tools for automatic data processing, identification and mining. Current proteomics is dominated by MS-based approaches, which use either direct infusion or liquid chromatography coupled to MS (LC/MS). Several free and commercial software packages are available for background correction and data processing; R-based [71] packages such as msProcess and PROcess, and MATLAB-based (http://www.mathworks.com/) tools such as Backcor, are among the most popular. Several freely available software packages for processing mass spectrometry data are described below.
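Background correction can be illustrated with a deliberately simple technique. Backcor fits a polynomial baseline by minimizing a non-quadratic cost function; the Python sketch below instead uses a moving-minimum filter, a different and cruder method, chosen only to show the idea of estimating and subtracting a baseline. The window size and function names are assumptions, and the window must be wider than the peaks, otherwise the estimated baseline rises under them.

```python
def moving_min_baseline(intensities, half_window):
    """Estimate a baseline as the minimum intensity in a sliding window."""
    n = len(intensities)
    return [min(intensities[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]


def subtract_baseline(intensities, half_window=2):
    """Subtract the moving-minimum baseline, clipping negative values to zero."""
    baseline = moving_min_baseline(intensities, half_window)
    return [max(0.0, y - b) for y, b in zip(intensities, baseline)]


spectrum = [10.0, 10.0, 10.0, 50.0, 10.0, 10.0, 10.0]
print(subtract_baseline(spectrum))  # [0.0, 0.0, 0.0, 40.0, 0.0, 0.0, 0.0]
```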
OpenMS
OpenMS is an open-source framework for LC-MS-based proteomics [72]. It offers data structures and algorithms for processing mass spectrometry data. The library is written in C++ and runs on all major platforms, such as Windows XP/7/8, Linux and Mac OS. OpenMS is freely downloadable at http://open-ms.sourceforge.net/.
MZmine
MZmine 2 is an improved version of the popular MZmine framework for differential analysis of mass spectrometry data [73]. It is open-source software for mass spectrometry data processing, with the main focus on LC-MS data [74]. MZmine 2 is freely available at http://mzmine.sourceforge.net/. It can read and process both unit mass resolution and exact mass resolution data, in both continuous and centroided modes, including fragmentation scans. It can visualize raw data together with peak picking and identification results, which is very useful for evaluating different peak detection methods.
Peak detection in MZmine 2 is performed in three steps. First, mass values are detected within each spectrum. Second, a chromatogram is constructed for each mass value that spans a certain time range. Finally, deconvolution algorithms are applied to each chromatogram to recognize the actual chromatographic peaks. MZmine 2 can report quantification results as tables, which can be exported as comma-separated values (CSV) files, or as charts. Several modules are available for further processing of peak detection results, including deisotoping, filtering and alignment. Peak identification can be performed by searching a custom database or by connecting to the PubChem Compound database [75]. MZmine 2 also contains basic methods for statistical analysis of processed data.
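The three-step peak detection described above can be sketched on centroided data as follows. This is a simplified illustration, not MZmine 2 code: the noise level, m/z tolerance and minimum peak height are assumed parameters, chromatograms are built by greedy m/z matching without checking scan continuity, and the deconvolution step is reduced to local-maximum apex detection. All function names are hypothetical.

```python
def detect_masses(scan, noise_level):
    """Step 1: keep centroided (m/z, intensity) pairs at or above the noise level."""
    return [(mz, inten) for mz, inten in scan if inten >= noise_level]


def build_chromatograms(scans, noise_level, mz_tol=0.01):
    """Step 2: link detected masses across scans into extracted-ion chromatograms.

    scans: list of (retention_time, [(mz, intensity), ...]) in acquisition order
    """
    chromatograms = []
    for rt, scan in scans:
        for mz, inten in detect_masses(scan, noise_level):
            for chrom in chromatograms:
                if abs(chrom["mz"] - mz) <= mz_tol:
                    chrom["points"].append((rt, inten))
                    break
            else:  # no chromatogram within tolerance: start a new one
                chromatograms.append({"mz": mz, "points": [(rt, inten)]})
    return chromatograms


def find_apexes(points, min_height):
    """Step 3 (simplified deconvolution): local intensity maxima above min_height."""
    apexes = []
    for k, (rt, y) in enumerate(points):
        left = points[k - 1][1] if k > 0 else 0.0
        right = points[k + 1][1] if k + 1 < len(points) else 0.0
        if y >= min_height and y > left and y >= right:
            apexes.append((rt, y))
    return apexes


def detect_peaks(scans, noise_level, min_height, mz_tol=0.01):
    """Run all three steps; returns (chromatogram m/z, apex RT, apex intensity)."""
    return [(chrom["mz"], rt, y)
            for chrom in build_chromatograms(scans, noise_level, mz_tol)
            for rt, y in find_apexes(chrom["points"], min_height)]


scans = [
    (1.0, [(300.100, 100.0), (450.200, 5.0)]),
    (1.1, [(300.102, 800.0), (450.200, 6.0)]),
    (1.2, [(300.101, 2000.0)]),
    (1.3, [(300.099, 700.0)]),
    (1.4, [(300.100, 90.0)]),
]
print(detect_peaks(scans, noise_level=50.0, min_height=500.0))  # [(300.1, 1.2, 2000.0)]
```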
OpenChrom
OpenChrom is open-source software for chromatography and mass spectrometry based on the Eclipse Rich Client Platform (RCP). Mass spectrometry data generated, for example, by GC/MS, LC/MS, HPLC-MS, ICP-MS or MALDI-MS can be imported directly, without prior conversion, for subsequent visualization and evaluation. The focus is on handling data files from different GC/MS systems and vendors. OpenChrom supports various vendor data formats, and data may also be imported in common formats such as NetCDF, CSV or mzXML. All data format converters are provided as separate plug-ins. OpenChrom has an adaptable graphical user interface, is available for various operating systems (e.g. Windows, Linux, Solaris and Mac OS X), and is freely available at https://www.openchrom.net/
ProteoWizard
The ProteoWizard Library and Tools are a set of modular, extensible, open-source, cross-platform tools and software libraries that facilitate proteomics data analysis. The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access and performs standard chemistry and LC-MS dataset computations. ProteoWizard can read the major vendor raw data formats as well as open formats such as mzML, mzXML and MGF, and convert between file formats. ProteoWizard is freely available at http://proteowizard.sourceforge.net/
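To show what downstream code sees after conversion to an open format, the sketch below reads spectrum identifiers and MS levels from an mzML fragment using only the Python standard library. The fragment is hypothetical and heavily truncated (real mzML files also hold base64-encoded peak arrays and are often wrapped in an index element), so dedicated readers are preferable for real files. The namespace and the PSI-MS accession MS:1000511 ("ms level") follow mzML conventions; the function name is an assumption.

```python
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL = "MS:1000511"  # PSI-MS controlled-vocabulary accession for "ms level"


def spectrum_levels(mzml_text):
    """Return (spectrum id, MS level) pairs from an mzML document string."""
    root = ET.fromstring(mzml_text)
    pairs = []
    for spectrum in root.iter(MZML_NS + "spectrum"):
        level = None
        for param in spectrum.findall(MZML_NS + "cvParam"):
            if param.get("accession") == MS_LEVEL:
                level = int(param.get("value"))
        pairs.append((spectrum.get("id"), level))
    return pairs


# Heavily truncated, hypothetical mzML fragment (not schema-valid).
SNIPPET = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

print(spectrum_levels(SNIPPET))  # [('scan=1', 1), ('scan=2', 2)]
```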
XCMS
XCMS is a popular R-based [71] Bioconductor [76] package developed for processing and visualization of LC-MS and GC-MS data [77]. The XCMS package reads full-scan LC/MS data in AIA/ANDI NetCDF, mzXML and mzData formats; all data to be analyzed must first be converted to one of these formats, and all exported NetCDF/mzXML/mzData files should be kept in the same location throughout the analysis. During peak identification, XCMS reports the processing status of each sample on a separate line, showing two numbers separated by a colon: the first is the m/z currently being processed, and the second is the number of peaks identified so far. XCMS has several advanced tools for processing, peak detection, filling missing data, retention time correction, and analysis and visualization of results, including selecting and visualizing peaks.
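Cross-sample peak grouping, which precedes missing-value filling and retention time correction in XCMS-style workflows, can be sketched as greedy matching within m/z and retention time windows. This is a simplification: XCMS itself groups peaks with a density-based method, and the tolerances, function name and data below are hypothetical.

```python
def group_features(samples, mz_tol=0.01, rt_tol=10.0):
    """Greedily group peaks from several samples into shared features (sketch).

    samples: dict sample name -> list of (mz, rt_seconds, intensity)
    Returns a list of features, each with a reference mz/rt and a per-sample
    intensity dict in which None marks a peak that was not detected.
    """
    features = []
    for name, peaks in samples.items():
        for mz, rt, intensity in peaks:
            for feat in features:
                if (abs(feat["mz"] - mz) <= mz_tol and abs(feat["rt"] - rt) <= rt_tol
                        and name not in feat["intensity"]):
                    feat["intensity"][name] = intensity
                    break
            else:  # no matching feature: this peak starts a new one
                features.append({"mz": mz, "rt": rt, "intensity": {name: intensity}})
    for feat in features:
        for name in samples:
            feat["intensity"].setdefault(name, None)  # gap left for a later filling step
    return features


samples = {
    "A": [(300.100, 120.0, 5.0e4), (410.200, 300.0, 1.0e4)],
    "B": [(300.104, 125.0, 4.5e4)],
}
for feat in group_features(samples):
    print(feat["mz"], feat["rt"], feat["intensity"])
```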
PrepMS
PrepMS is an easy-to-use graphical application for MS data preprocessing, peak detection and visual data quality assessment. PrepMS is a compiled stand-alone application written in MATLAB and is freely available at http://sourceforge.net/projects/prepms/
Trans-Proteomic Pipeline (TPP)
TPP is a mature suite of tools for MS- and MS/MS-based proteomics, providing statistical validation, quantitation, visualization, and converters from raw MS data to the open mzXML format. TPP is freely available at http://sourceforge.net/projects/sashimi/
Isobar
Isobar is a tool for the analysis and quantitation of isobarically tagged MS/MS proteomics data. It provides methods for preprocessing, normalization and report generation for quantitative mass spectrometry proteomics data labeled with isobaric tags such as iTRAQ and TMT. Isobar is a Bioconductor [76] package freely available at http://bioconductor.org/packages/release/bioc/html/isobar.html
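A common first step for isobaric-tag data is to correct channel-wide loading differences. The sketch below uses simple median scaling, assuming most proteins are unchanged between channels, and then expresses each spectrum as ratios to a reference channel; it does not reproduce Isobar's own normalization or noise model, and the function names, channel labels and intensities are hypothetical.

```python
from statistics import median


def median_normalize(spectra):
    """Scale each reporter channel so that all channel medians become equal.

    spectra: list of dicts channel -> reporter ion intensity, one per spectrum
    """
    channels = list(spectra[0])
    medians = {ch: median(s[ch] for s in spectra) for ch in channels}
    target = median(medians.values())
    return [{ch: s[ch] * target / medians[ch] for ch in channels} for s in spectra]


def ratios_to_reference(spectra, reference):
    """Express every non-reference channel as a ratio to the reference channel."""
    return [{ch: s[ch] / s[reference] for ch in s if ch != reference} for s in spectra]


# Hypothetical two-channel data in which channel 115 carries a 2x loading bias.
spectra = [{"114": 100.0, "115": 200.0},
           {"114": 300.0, "115": 600.0},
           {"114": 50.0, "115": 100.0}]
print(ratios_to_reference(median_normalize(spectra), "114"))
# prints [{'115': 1.0}, {'115': 1.0}, {'115': 1.0}]
```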
TargetSearch
This package provides a targeted pre-processing method for GC-MS data. TargetSearch can currently read only NetCDF files. It has advanced features such as baseline correction, peak identification, retention index correction, normalization, library search, metabolite profiling, and peak and spectrum visualization. TargetSearch is freely available at http://www.bioconductor.org/packages/release/bioc/html/TargetSearch.html
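Retention index correction maps retention times onto a scale anchored by reference compounds. The sketch below uses piecewise-linear interpolation between bracketing markers (the van den Dool and Kratz form commonly used for temperature-programmed GC); it illustrates the general technique under that assumption and is not TargetSearch's implementation, and the function name and marker times are hypothetical.

```python
import bisect


def retention_index(rt, markers):
    """Piecewise-linear retention index from bracketing reference markers.

    markers: (retention_time, assigned_RI) pairs sorted by retention time, e.g.
             n-alkanes whose RI is 100 x carbon number. Times outside the marker
             range are extrapolated from the first or last segment.
    """
    times = [t for t, _ in markers]
    k = bisect.bisect_left(times, rt)
    k = min(max(k, 1), len(markers) - 1)  # pick the bracketing segment
    (t0, ri0), (t1, ri1) = markers[k - 1], markers[k]
    return ri0 + (ri1 - ri0) * (rt - t0) / (t1 - t0)


# Hypothetical marker retention times (s) for C10, C11 and C12 n-alkanes.
MARKERS = [(300.0, 1000.0), (360.0, 1100.0), (430.0, 1200.0)]
print(retention_index(330.0, MARKERS), retention_index(395.0, MARKERS))  # 1050.0 1150.0
```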
MassSpecWavelet
MassSpecWavelet is an R package for processing MS data, mainly based on wavelet transforms [78]. The current version supports only peak detection based on the continuous wavelet transform (CWT); functions for baseline removal, smoothing and alignment are planned for future versions. The algorithms have been evaluated with low-resolution mass spectra (SELDI and MALDI data), but the authors believe that some of them can also be applied to other kinds of spectra. MassSpecWavelet is freely available at http://bioconductor.org/packages/release/bioc/html/MassSpecWavelet.html
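The core idea of CWT-based peak detection is that a true peak produces a local maximum in the wavelet coefficients across many scales. The Python sketch below correlates a spectrum with a sampled Ricker (Mexican-hat) wavelet at several scales and keeps maxima that recur at enough scales. MassSpecWavelet additionally links maxima into ridge lines and applies a signal-to-noise criterion, which this simplified version omits; the scales, tolerances, function names and synthetic spectrum are assumptions.

```python
import math


def ricker_cwt_row(signal, a):
    """CWT coefficients of `signal` at scale `a` with a sampled Ricker wavelet."""
    half = int(4 * a)
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    kernel = [norm * (1 - (k / a) ** 2) * math.exp(-k * k / (2.0 * a * a))
              for k in range(-half, half + 1)]
    n = len(signal)
    row = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half  # zero padding outside the signal
            if 0 <= idx < n:
                acc += signal[idx] * w
        row.append(acc)
    return row


def local_maxima(row, rel_height=0.05):
    """Interior local maxima above a fraction of the row's largest coefficient."""
    floor = rel_height * max(row)
    return [i for i in range(1, len(row) - 1)
            if row[i] > row[i - 1] and row[i] >= row[i + 1] and row[i] > floor]


def cwt_peaks(signal, scales=(1, 2, 3, 4, 5, 6), min_scales=4, tol=2):
    """Keep smallest-scale maxima that recur (within tol points) at >= min_scales scales."""
    maxima = [local_maxima(ricker_cwt_row(signal, a)) for a in scales]
    peaks = []
    for p in maxima[0]:
        support = sum(1 for row_max in maxima if any(abs(p - q) <= tol for q in row_max))
        if support >= min_scales:
            peaks.append(p)
    return peaks


def gaussian(i, center, height, sigma):
    return height * math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))


# Synthetic spectrum with two Gaussian peaks at indices 60 and 140.
spectrum = [gaussian(i, 60, 10.0, 3.0) + gaussian(i, 140, 5.0, 4.0) for i in range(200)]
print(cwt_peaks(spectrum))
```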
MetAlign
MetAlign is a tool for preprocessing LC-MS and GC-MS data [79]. It is capable of automatic format conversion, accurate mass calculation, baseline correction, peak picking, saturation and mass-peak artifact filtering, and alignment of up to 1000 data sets. MetAlign output is compatible with most multivariate statistics programs.
MSPtool
The Mass Spectra Preprocessing tool (MSPtool) is a user-friendly, versatile tool for preprocessing MS data [80]. It offers a wide set of MS preprocessing steps through an easy-to-use graphical interface. It has also been embedded in a time-series-based framework for clustering MS data.
Other packages
There are several other R packages for mass spectrometry, such as msProcess, PROcess, caMassClass, FTICRMS, RProteomics, and caBIG.
Several specialized databases and software tools are available for lipid and peptide identification.
SEQUEST
SEQUEST is a database search algorithm that matches experimental spectra against theoretical spectra generated in silico from peptide sequences. It then calculates scores to evaluate how well they match [81]. A proportion of the top candidate peptides, ranked by a preliminary score, is selected for cross-correlation analysis, so several scores and rankings are determined for each candidate peptide identification. To distinguish correct from incorrect identifications, filters based on a set of database search scores are then applied.
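The cross-correlation step can be illustrated in a simplified form. Both spectra are binned on the m/z axis. The score is their dot product at zero offset minus the mean dot product over offsets of -75 to +75 bins, as in the usual description of SEQUEST's XCorr. This rewards an alignment that is better than chance. The unit bin width, the crude intensity scaling, the helper names, and the data are assumptions, and SEQUEST's own spectrum preprocessing is more involved.

```python
import numpy as np

def binned(peaks, n_bins, bin_width=1.0):
    """Put (m/z, intensity) peaks into fixed-width m/z bins, keeping the
    highest intensity per bin. Unit-width bins are an assumption."""
    v = np.zeros(n_bins)
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        if 0 <= b < n_bins:
            v[b] = max(v[b], intensity)
    return v

def xcorr(observed, theoretical, n_bins=2000, max_offset=75):
    """Simplified cross-correlation score: the dot product of the binned
    spectra at zero offset minus their mean dot product over offsets
    -max_offset..+max_offset (a background estimate)."""
    x = binned(observed, n_bins)
    y = binned(theoretical, n_bins)
    if x.max() > 0:
        x = x / x.max()  # crude intensity scaling (assumption)

    def dot_at(offset):
        # np.roll wraps around; harmless while peaks lie far from the array ends.
        return float(np.dot(x, np.roll(y, offset)))

    background = np.mean([dot_at(t) for t in range(-max_offset, max_offset + 1)])
    return dot_at(0) - background

# Hypothetical fragment masses for a candidate peptide and a decoy.
candidate = [(175.1, 1.0), (262.1, 1.0), (375.2, 1.0), (488.3, 1.0), (601.4, 1.0)]
decoy = [(180.1, 1.0), (290.1, 1.0), (410.2, 1.0), (520.3, 1.0), (640.4, 1.0)]
spectrum = [(175.2, 50.0), (262.0, 80.0), (375.3, 100.0), (488.2, 60.0),
            (601.5, 40.0), (333.0, 30.0)]
print(xcorr(spectrum, candidate), xcorr(spectrum, decoy))
```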
Mascot
Mascot uses a probability-based scoring method for searching MS data. This approach has several advantages: (i) a simple rule can be used to judge whether a result is significant; (ii) scores can be compared with those from other types of search, such as sequence homology; and (iii) search parameters can readily be optimized by iteration [82].
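Mascot reports a probability-based score S = -10 log10(P), where P is the probability that the observed match is a random event. The sketch below shows that transformation. It also applies an assumed Bonferroni-style rule in which a match is significant when P < α/N for N candidate sequences tested. This is one way to read the "simple rule" in point (i), and Mascot's exact threshold calculation may differ. Function names are hypothetical.

```python
import math

def mascot_style_score(p_random):
    """Probability-based score S = -10 * log10(P), where P is the probability
    that the observed match is a random event."""
    return -10.0 * math.log10(p_random)

def significance_threshold(n_candidates, alpha=0.05):
    """Score above which a match counts as significant at level alpha,
    under the assumed rule P < alpha / n_candidates."""
    return -10.0 * math.log10(alpha / n_candidates)

print(round(mascot_style_score(1e-5), 6))        # 50.0
print(round(significance_threshold(1000), 1))    # 43.0
```

Because the score is a log-probability, a match scoring 50 against a threshold of 43 is significant regardless of the search type that produced it. This is what makes scores comparable across searches (point ii).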
PeptideProphet
PeptideProphet works on the output of database search tools. It applies statistical methods to estimate the probability that each identification is correct [83]. Using the expectation-maximization (EM) algorithm, it learns to distinguish correct from incorrect database search results. It computes the probability that each peptide assignment to a spectrum is correct from the database search scores and the number of tryptic termini of the peptide.
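The EM idea can be sketched as fitting a two-component mixture to the search scores, with one component for incorrect and one for correct assignments. The algorithm alternates between computing each score's membership probabilities (E-step) and re-estimating the components (M-step). For simplicity, the NumPy sketch below assumes two Gaussian components. PeptideProphet itself uses different component distributions and additional information such as the number of tryptic termini. Names and data are hypothetical.

```python
import numpy as np

def em_two_gaussians(scores, n_iter=200):
    """Fit a two-component Gaussian mixture to search scores by expectation
    maximization and return, for every score, the probability that it belongs
    to the high-scoring ("correct") component.

    Assumption: both components are Gaussian. This simplifies PeptideProphet's
    model and is not a reimplementation of it.
    """
    s = np.asarray(scores, dtype=float)
    mu = np.percentile(s, [25, 90])   # start "incorrect" low, "correct" high
    sd = np.array([s.std(), s.std()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior membership of each score in each component.
        dens = w * np.exp(-0.5 * ((s[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / s.size
        mu = (resp * s[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (s[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return resp[:, 1]

# Simulated discriminant scores: 800 incorrect and 200 correct assignments.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 0.5, 800), rng.normal(4.0, 0.7, 200)])
p_correct = em_two_gaussians(scores)
print(p_correct[:800].mean(), p_correct[800:].mean())
```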
ProFound, PepFrag
ProFound (http://prowl.rockefeller.edu/prowl-cgi/profound.exe) is a tool for searching protein sequence collections with peptide mass maps. It uses a Bayesian algorithm to rank the protein sequences in the database by their probability of producing the observed peptide map. PepFrag (http://prowl.rockefeller.edu/prowl/pepfrag.html) is a tool for identifying proteins, from a collection of sequences, that match a single tandem mass spectrum.
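Peptide mass mapping compares observed peptide masses with those predicted from an in silico digest of each database sequence. The sketch below digests sequences with the usual trypsin rule (cleave after K or R, but not before P). It computes monoisotopic [M+H]+ masses from standard residue masses and ranks sequences by the number of matching masses. This match count is a crude stand-in, not ProFound's Bayesian score. Function names, the tolerance, and the toy database are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def mh_plus(peptide):
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON

def count_matches(observed_masses, protein, tol=0.02):
    """Number of observed [M+H]+ values explained by the protein's tryptic peptides."""
    theo = [mh_plus(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - t) <= tol for t in theo) for m in observed_masses)

def rank_proteins(observed_masses, database, tol=0.02):
    """Rank database entries by match count (a crude stand-in for a Bayesian score)."""
    return sorted(database, key=lambda name: -count_matches(observed_masses, database[name], tol))

database = {"protein_A": "AKRPGSEK", "protein_B": "GGGGR"}   # hypothetical entries
observed = [218.150, 673.363]                                # hypothetical peptide map
print(rank_proteins(observed, database))
```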
InsPecT
InsPecT is a tool for identifying post-translationally modified peptides from tandem mass spectra [84]. It constructs database filters of the kind that proved very successful in genomic searches. Peptide sequence tags serve as efficient filters that reduce the size of the database by a few orders of magnitude while retaining the correct peptide with very high probability. In addition to filtering, InsPecT uses novel algorithms for scoring and validating in the presence of modifications, without explicitly enumerating all variants.
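Tag-based filtering can be sketched in two steps. First, short amino-acid tags are read from the mass gaps between fragment peaks. Then only the database sequences that contain one of those tags are kept. The Python code below is a simplified illustration that assumes unmodified residues, singly charged fragments, and a fixed mass tolerance. InsPecT's own tag generation scores candidate tags and handles modifications, which this sketch does not. Names and data are hypothetical.

```python
# Monoisotopic residue masses (Da); I is omitted because it has the same mass as L.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def sequence_tags(peaks, length=3, tol=0.02):
    """Read amino-acid tags of the given length from gaps between fragment peaks.

    Two peaks whose m/z difference matches a residue mass (within tol) are
    linked by that residue; every chain of `length` links yields a tag.
    """
    mz = sorted(peaks)
    edges = {i: [] for i in range(len(mz))}
    for i in range(len(mz)):
        for j in range(i + 1, len(mz)):
            for aa, mass in RESIDUE.items():
                if abs(mz[j] - mz[i] - mass) <= tol:
                    edges[i].append((j, aa))
    tags = set()

    def extend(i, tag):
        if len(tag) == length:
            tags.add(tag)
            return
        for j, aa in edges[i]:
            extend(j, tag + aa)

    for i in range(len(mz)):
        extend(i, "")
    return tags

def filter_database(database, tags):
    """Keep sequences containing a tag read in either direction
    (the ion series, and hence the reading direction, is unknown)."""
    return {name: seq for name, seq in database.items()
            if any(t in seq or t[::-1] in seq for t in tags)}

# Hypothetical b-ion ladder of the peptide SAMPLE (singly charged).
peaks, mass = [], 1.00728
for aa in "SAMPLE":
    mass += RESIDUE[aa]
    peaks.append(mass)
tags = sequence_tags(peaks)
print(sorted(tags))  # ['AMP', 'MPL', 'PLE']
print(filter_database({"hit": "GGSAMPLERGG", "miss": "GGGGWWWWGG"}, tags))
```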
LIMSA
LIMSA (Lipid Mass Spectrum Analysis) is a program for quantitative analysis of mass spectra of complex lipid samples. It performs peak finding, integration, assignment, isotope correction, and quantitation with internal standards. Lipids can be searched one at a time (single search) or in batch mode, and the results can be analyzed and summarized. The source code of LIMSA is freely available at http://www.helsinki.fi/science/lipids/software.html.
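Quantitation against an internal standard combined with isotope correction can be sketched as follows. The monoisotopic peak holds only the fraction of molecules that contain no 13C, roughly (1 - 0.0107)^n for n carbon atoms. Each peak is therefore divided by that fraction before the analyte/standard ratio is scaled by the known amount of standard. This is a simplified illustration that assumes equal ionization response for analyte and standard. It ignores isotopes of other elements and overlap between neighbouring species, and it is not LIMSA's implementation. Names and numbers are hypothetical.

```python
C13_ABUNDANCE = 0.0107  # natural abundance of carbon-13

def monoisotopic_fraction(n_carbon):
    """Fraction of molecules with no 13C atom; isotopes of H, N, O, P are ignored."""
    return (1.0 - C13_ABUNDANCE) ** n_carbon

def quantify(intensity, n_carbon, is_intensity, is_n_carbon, is_amount):
    """Amount of a lipid estimated from its monoisotopic peak relative to an
    internal standard (IS) of known amount, after isotope correction.
    Assumes the analyte and the IS ionize with the same efficiency."""
    corrected = intensity / monoisotopic_fraction(n_carbon)
    is_corrected = is_intensity / monoisotopic_fraction(is_n_carbon)
    return is_amount * corrected / is_corrected

# Hypothetical example: a PC species with 42 carbons quantified against a
# 36-carbon PC standard spiked at 10 nmol; equal raw peak heights.
print(round(quantify(1000.0, 42, 1000.0, 36, 10.0), 2))
```

Without the correction, the two equal peaks would suggest equal amounts. With it, the larger molecule comes out slightly more abundant because a larger share of its signal sits in 13C isotopologues.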
Fatty acid analysis tool (FAAT)
FAAT is an algorithm developed for analysing Fourier transform mass spectral data from lipid extracts [85]. It is a rapid tool based on Microsoft Visual Basic and typically takes tens of seconds to interpret multiple spectra. FAAT reduces data by scaling, identifying monoisotopic ions, and assigning isotope packets. Its unique features are: (1) it can distinguish overlapping saturated and unsaturated lipid species; (2) known ions are assigned from a user-defined library, including species that possess methylene heterogeneity; and (3) isotopic shifts from stable isotope labeling experiments are identified and assigned. FAAT can determine differences in abundance between samples grown under normal and stressed conditions.
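Isotope-packet assignment can be illustrated by grouping peaks spaced by the 13C-12C mass difference (1.003355 Da divided by the charge). The lightest peak of each group is then taken as the monoisotopic ion. The sketch below assumes a fixed m/z tolerance and does not check isotope intensity ratios. Names and peak values are hypothetical, and this is not FAAT's Visual Basic code.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C (Da)

def isotope_packets(peaks, charge=1, tol=0.005):
    """Group (m/z, intensity) peaks into isotope packets.

    Starting from the lowest unassigned m/z, peaks lying one isotope spacing
    (1.003355 / charge, within tol) above the last packet member are appended.
    The first peak of each packet is taken as the monoisotopic ion.
    Simplified: the expected isotope intensity pattern is not checked.
    """
    remaining = sorted(peaks)
    packets = []
    while remaining:
        packet = [remaining.pop(0)]
        while True:
            target = packet[-1][0] + C13_SPACING / charge
            nxt = next((p for p in remaining if abs(p[0] - target) <= tol), None)
            if nxt is None:
                break
            packet.append(nxt)
            remaining.remove(nxt)
        packets.append(packet)
    return packets

# Hypothetical singly charged lipid peaks: two overlapping isotope envelopes.
peaks = [(760.5851, 100.0), (761.5885, 45.0), (762.5918, 12.0),
         (782.5670, 80.0), (783.5704, 36.0)]
packets = isotope_packets(peaks)
print([p[0][0] for p in packets])  # monoisotopic m/z: [760.5851, 782.567]
```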
As with other omics data, high-dimensional proteomics and lipidomics data need accurate statistical analysis. Methods such as principal component analysis, correlation, and multivariate analysis are commonly used to find co-regulated lipids and proteins, and various free R packages are available for these analyses. Cluster analysis provides a statistical framework for finding proteins or lipids that separate sample groups from each other and/or co-vary in a specific study. The main goal of clustering is to group samples, variables, or both into homogeneous groups. Several freely available R tools support supervised and unsupervised analysis, such as MASS, PLS-DA, AMORE, hclust, PLS, and PLSR.
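Principal component analysis is often the first look at such data. If a treatment changes a set of lipids or proteins, the samples separate along the leading components. The NumPy sketch below computes PCA by singular value decomposition after autoscaling each feature. Autoscaling is an assumption here, and other scalings are common. Names and the simulated data are hypothetical.

```python
import numpy as np

def pca(data, n_components=2):
    """Principal component analysis of a samples x features matrix via SVD.

    Each feature (e.g. a lipid or protein abundance) is mean-centred and
    scaled to unit variance first (autoscaling); features must not be
    constant. Returns sample scores and the fraction of variance explained
    by each retained component.
    """
    x = np.asarray(data, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical study: 10 control and 10 treated samples, 20 lipids,
# with the first 10 lipids raised in the treated group.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, (10, 20))
treated = rng.normal(0.0, 1.0, (10, 20))
treated[:, :10] += 3.0
scores, explained = pca(np.vstack([control, treated]))
print(np.round(explained, 2))
```

The rows of `vt` (the loadings) indicate which lipids drive each component. Here the first component should load mainly on the 10 shifted lipids.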
Statistical methods alone provide information only about the key metabolites affected within a specific group of samples. Pathway analysis takes this information further to identify the affected metabolic pathways, often by combining different omics data such as proteomics, lipidomics, and transcriptomics. Several resources provide information on global metabolic schemes, metabolites, and enzymes and their links to drugs. These include KEGG [86], LIPID MAPS [87], the Human Metabolome Database (HMDB) [88], the Human Protein Reference Database (HPRD) [89], the Plasma Proteome Database (PPD) [90], PubChem [75], DrugBank [91,92], and ChemProt [93]. The time is ripe to integrate these individual components into biological networks for advanced drug design. Several visualization tools and plugins are now available for Cytoscape, which can be used to construct biological networks. Knowledge-based and genome-scale pathway reconstruction methods that can deal with large-scale metabolite data and biochemical reactions are still needed.
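One widely used way to go from a list of changed molecules to affected pathways is an over-representation test. The hypergeometric distribution gives the probability of seeing at least the observed number of changed molecules in a pathway if the changed set were drawn at random from everything measured. The sketch below assumes that test. Databases such as those listed above supply pathway memberships but do not prescribe the statistic. The function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p(n_universe, n_pathway, n_hits, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_universe: all measured metabolites, lipids or proteins
    n_pathway:  how many of them belong to the pathway
    n_hits:     how many were significantly changed
    n_overlap:  changed molecules that are in the pathway
    Returns P(X >= n_overlap) when n_hits molecules are drawn at random.
    """
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    return sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
               for k in range(n_overlap, upper + 1)) / total

# Hypothetical: 500 lipids measured, 40 in the pathway, 50 changed, 12 of them in it
# (about 4 would be expected by chance).
print(enrichment_p(500, 40, 50, 12))
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing.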
With so many tools available for MS data processing and analysis, it is very difficult to conclude which one is best. Each package has advantages and disadvantages, so it may be better to use different software for different steps of the analysis rather than a single tool. In general, it has been observed that MZmine 2 and XCMS perform better than others for data processing, and that the wavelet-transform-based MassSpecWavelet performs well for peak identification. For identification, LIMSA suits lipids, while SEQUEST and PeptideProphet suit peptides. For downstream analysis, free R-based software such as limma, hclust, PLS, and PLSR works best.
Target identification and validation involve identifying proteins whose expression levels or activities change in disease states. These proteins may serve as potential therapeutic targets or may be used to classify patients for clinical trials. Proteomics technologies can also help identify protein–protein interactions that influence either the disease state or the proposed therapy. Efficient biomarkers are used to assess whether target modulation has occurred. They are also used to characterize disease models and to assess the effects and mechanism of action of lead candidates in animal models. Toxicity (safety) biomarkers are used to screen compounds for target-organ toxicities in pre-clinical studies and are subsequently employed during clinical trials.
Proteomic approaches contribute significantly to our understanding of potential biomarkers, drug target identification and deconvolution, the mode of action of drug molecules, and mechanisms of drug resistance. Chemotherapeutic drug resistance is one of the major problems in cancer therapy, and advances in proteomic approaches may play a major role in addressing it in the near future. Taking advantage of the sensitivity of current analytical methods, future research needs to focus on using qualitative and quantitative proteomic and lipidomic data from cell lines, animal models, and humans. Similarly, genotypic and phenotypic data for different human ethnic populations are now publicly available in the HapMap database [94,95]. Using HapMap and other genotype and phenotype data, researchers will be able to find genes that affect health, disease [96], and individual responses to medications [97,98] and environmental factors. It is high time to integrate these genotypic and phenotypic data with global proteomics and lipidomics. Such integration would give a better understanding of disease causes, the mode of action of drug molecules, and the adverse and toxic effects of drugs in advanced drug design. Fortunately, free computational tools can help integrate such data, e.g. mixOmics and canonical correlation analysis (CCA).
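Canonical correlation analysis looks for linear combinations of two data blocks, measured on the same samples, that are maximally correlated. The blocks might be, for example, genotype or transcript features and lipid abundances. The NumPy sketch below computes only the leading canonical correlation, by whitening each block and taking the top singular value of the whitened cross-covariance. The small ridge term `reg` is an assumption added for numerical stability. This is not the mixOmics API, and names and data are hypothetical.

```python
import numpy as np

def first_canonical_correlation(x, y, reg=1e-3):
    """Leading canonical correlation between two data blocks whose rows are
    the same samples (e.g. x = transcript features, y = lipid abundances).

    Whitens each block with the inverse square root of its (ridge-regularized)
    covariance and returns the largest singular value of the whitened
    cross-covariance matrix.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)

    def inv_sqrt(c):
        # Symmetric inverse square root via the eigendecomposition.
        vals, vecs = np.linalg.eigh(c)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    m = inv_sqrt(cxx) @ cxy @ inv_sqrt(cyy)
    return np.linalg.svd(m, compute_uv=False)[0]

# Hypothetical blocks sharing one latent factor z across 200 samples.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = np.column_stack([z + 0.3 * rng.normal(size=200), rng.normal(size=200), rng.normal(size=200)])
y = np.column_stack([rng.normal(size=200), -z + 0.3 * rng.normal(size=200)])
print(first_canonical_correlation(x, y))
```

With many more features than samples, as is common in omics, the ridge term has to be much larger (or a sparse variant used), otherwise spurious correlations near 1 appear.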
Diseases often affect only a few cells. Direct whole-proteome analysis by MS can therefore be difficult, because the biomarker signal is diluted by the other components of the cell. There is an urgent need to develop new statistical methods for background noise correction, and to apply existing ones, so that maximum information can be extracted. At present, existing quantitative proteomics and lipidomics methods are not yet up to the mark, and improving them and developing new robust methods is a pressing need. Recently, single-cell proteomics has given new insight into differentially dynamic proteins in individual cells. Cellular response to drugs is a highly dynamic process. The overall effect of a drug molecule is an ensemble of proteome dynamics in individual cells, both spatially and temporally. Single-cell proteomics offers a way to understand how seemingly identical cells respond differently to signals and drugs, and it can be of immense aid in designing better drug molecules.
Although many computational tools are available for lipidomics and proteomics data analysis, they still need improvement to reduce the numbers of false positives and false negatives. Recent work has tried to address these problems, but much remains to be done to improve sensitivity and specificity [15]. One major problem in lipidomics and proteomics is that different instruments produce different mass spectra for the same sample. New robust computational algorithms that overcome this problem and make such data comparable are still needed. Machine learning techniques have great potential for recognizing patterns in complex datasets, and it is high time to use them in lipidomics- and proteomics-based drug design. Changes in the expression of metabolites in various metabolic pathways in disease can be used to identify druggable target enzymes that control the pathway of interest [61]. Integrating lipidomics with gene expression data can help in understanding multiple changes in complex pathways [16,99]. Metabolic tracing experiments (flux lipidomics) enable quantitative measurement of molecular metabolism, including synthesis and degradation in real time, and can reveal the kinetics of individual molecules. In future, advanced bioinformatics tools will be needed for comparative metabolomics, lipidomics [100], and pathway analysis [101]. Pathway mapping combined with gene expression analysis and flux experiments will help reveal insights into metabolism and may be the future of target discovery.
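Turnover kinetics from a labeling time course can be sketched under a simple first-order model. If the lipid pool is at steady state and precursor enrichment is constant (both assumptions), the newly synthesised fraction follows f(t) = 1 - exp(-kt), so -ln(1 - f) is linear in t. The code below fits k by least squares through the origin and reports the half-life ln 2 / k. The function name and data are hypothetical.

```python
import math

def turnover_rate(times, labeled_fraction):
    """Estimate a first-order turnover constant k from a labeling time course.

    Assumes a steady-state pool and constant precursor enrichment, so the
    newly synthesised fraction follows f(t) = 1 - exp(-k t). Linearising gives
    -ln(1 - f) = k t, fitted by least squares through the origin.
    Returns (k, half_life) in the time units of `times`.
    """
    ys = [-math.log(1.0 - f) for f in labeled_fraction]
    k = sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
    return k, math.log(2) / k

# Hypothetical time course (hours) generated with k = 0.1 per hour.
times = [1.0, 2.0, 4.0, 8.0, 24.0]
fractions = [1.0 - math.exp(-0.1 * t) for t in times]
k, half_life = turnover_rate(times, fractions)
print(round(k, 3), round(half_life, 2))  # 0.1 6.93
```

With real data, noise at late time points (f close to 1) is amplified by the logarithm. A direct nonlinear fit of f(t) would then be more robust.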