ISSN: 0974-276X
Short Communication - (2012) Volume 5, Issue 12
Among the MS-based quantitative methods using stable isotope labelling, the Isotope-Coded Protein Label (ICPL) technique has emerged as a powerful tool to identify and relatively quantify thousands of proteins within complex protein mixtures. The ICPL_ESIQuant 3.0 software package is one of the key components of the ICPL-ESI workflow. It covers data processing steps such as LC-MS feature detection and ICPL doublet/triplet/quadruplet quantification, as well as the merging of LC-MS features with Mascot search results. As unique features, the software performs isotope pattern overlap corrections and uses additional chemical knowledge, e.g. the physico-chemical properties of the ICPL labels, to discard false positive isotope patterns, which significantly improves the quality of the final peptide and protein results.
ICPL_ESIQuant is the first freeware tool on the market that supports both the shotgun proteomics strategy using Data Dependent Acquisition (DDA) and the directed proteomics strategy using mass inclusion lists for precursor ion selection.
ICPL_ESIQuant 3.0 (32 and 64 bit versions) can be downloaded from https://sourceforge.net/projects/icplquant/files/
An important task of most MS-based proteomics experiments is to measure a difference in abundance of proteins between two or more biological conditions.
Using the Isotope-Coded Protein Label (ICPL) approach [1,2], up to four different proteomic states can be compared with regard to their relative protein amounts. Measuring the samples by Liquid Chromatography coupled to Mass Spectrometry (LC-MS) results in hundreds and thousands of MS signals, each comprising the m/z, intensity and retention time information of a certain compound ion. In order to detect and quantify peptides and proteins, many different software tools have been developed in recent years, most of them comprising sophisticated peak picking and deisotoping algorithms [3].
Unfortunately, there is no software solution on the market that satisfactorily analyzes MS data from ICPL experiments generated on LC-ESI platforms. This is due to (a) search algorithms that do not consider the physico-chemical properties of the ICPL labels (which would decrease the detection rate of false positive peptide patterns), and (b) the missing overlap correction of interfering isotope patterns, which is indispensable when working with the ICPL triplex or quadruplex approach (ICPL4 and ICPL6 labels differ by only 2 Da) or when dealing with complex protein mixtures such as plasma or tissue samples.
Moreover, many software pipelines are limited to the DDA strategy, which selects precursor ions from the MS1 data for further fragmentation based on their abundances. With this method only identified peptides can be quantified, while unidentified (but possibly regulated) analytes are neglected. The directed proteomics strategy is based on a user-defined peptide mass selection at the MS1 level using inclusion lists, which contain the m/z and Retention Time (RT) information of peptides not yet identified in the first MS2 analysis [4]. An inclusion list is submitted back to the instrument for a subsequent MS2 analysis. This application note describes ICPL_ESIQuant, developed to analyze LC-ESI-MS2 peptide data generated from ICPL labelled protein samples.
ICPL_ESIQuant reads LC-MS data in .mzXML raw file format [5]. LC-MS data processing is done in three main steps: Peak picking, Deisotoping and Quantification (Figure 1).
Figure 1: Overview of the LC-MS data processing steps of ICPL_ESIQuant. The example shows an ICPL quadruplex peak pattern in three dimensions (m/z, RT and intensity): 1. Peak picking recognizes peak profiles in the raw data and converts them to peak m/z centroids (single sticks); 2. Deisotoping recognizes peptide isotope patterns (including overlapping peptide patterns) and reduces each peptide pattern to its monoisotopic centroided peak; 3. Quantification integrates the chromatographic elution profile of each peptide, which results in one single intensity value represented as a single stick. In order to detect ICPL multiplets in the data, the algorithm checks the known mass distances (in Dalton) between the peptide signals as well as the isotopic effect introduced by the deuterated labels ICPL4 and ICPL10 (detailed information in [11]).
To detect peaks in the raw data, the local m/z maxima as well as the lower and upper m/z interval limits for each two-dimensional (2D) peak (comprising m/z and intensity information) are determined in each MS1 scan. Each detected 2D peak is then represented by its centroid m/z value, calculated as the median of the five most intense data points of a 2D peak (Peak picking step) [6].
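The peak picking step can be illustrated with a minimal Python sketch. It is not the ICPL_ESIQuant implementation: the function name `pick_peaks`, the `noise` threshold and the rule used to find the interval limits (walking downhill from the local maximum) are assumptions made for illustration. Only the centroid rule, the median m/z of the five most intense data points of a 2D peak, is taken from the description above.

```python
from statistics import median

def pick_peaks(mz, intensity, noise=0.0):
    """Return centroid m/z values for one MS1 profile scan (illustrative sketch)."""
    centroids = []
    n = len(mz)
    for i in range(1, n - 1):
        # A local maximum: higher than the left neighbour, not lower than the right.
        is_max = intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
        if not is_max or intensity[i] <= noise:
            continue
        # Assumed interval rule: walk downhill on both sides of the maximum.
        lo = i
        while lo > 0 and intensity[lo - 1] < intensity[lo]:
            lo -= 1
        hi = i
        while hi < n - 1 and intensity[hi + 1] < intensity[hi]:
            hi += 1
        # Centroid = median m/z of the (up to) five most intense data points.
        top = sorted(range(lo, hi + 1), key=lambda k: intensity[k], reverse=True)[:5]
        centroids.append(median(mz[k] for k in top))
    return centroids

scan_mz = [500.00, 500.01, 500.02, 500.03, 500.04, 500.05, 500.06]
scan_int = [0.0, 10.0, 50.0, 100.0, 50.0, 10.0, 0.0]
centroids = pick_peaks(scan_mz, scan_int)
```

For the symmetric profile in the usage example, the single centroid coincides with the m/z of the apex data point.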
Since every peptide species generates a group of related 2D peaks (= isotope pattern) in an LC-MS dataset, the next algorithmic step is to recognize overlapping and non-overlapping peptide isotope patterns in each MS1 scan (deisotoping step). This is accomplished by matching the observed peptide patterns with multiple calculated isotope patterns (natural isotope pattern and overlapping variants) generated by using a Poisson distribution [7,8]. All theoretical isotope patterns are estimated by using the formula of an average amino acid, C10H16N3O3, which was obtained by averaging over all amino acids in the SWISSPROT protein database. Based on a distance score (Hellinger distance), the algorithm decides which of the multiple theoretical patterns fits the observed pattern best [9,10] and whether or not the observed pattern interferes with another pattern. In case of interference (e.g. an ICPL4 and an ICPL6 labelled peptide, differing by 2 Da), the isotope peaks are corrected by subtracting the overlapping peak areas from each other (Supplementary Figure 1).
In addition, this approach increases the number of peptide peak patterns detected in an LC-MS experiment by discovering “hidden isotope patterns”, which arise if the isotope patterns of low abundant peptides are (partially) overlaid by the patterns of higher abundant peptides having similar physico-chemical LC-MS properties (Supplementary Figure 2).
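The pattern matching described for the deisotoping step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm as implemented in ICPL_ESIQuant: the Poisson mean is derived from the average amino acid formula given above together with approximate natural heavy-isotope abundances, the candidate overlap ratios and the number of isotope peaks are arbitrary choices, and all function names are hypothetical.

```python
import math

# Average amino acid formula given in the text.
AVERAGE_UNIT = {"C": 10, "H": 16, "N": 3, "O": 3}
MONO_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}
# Assumption: expected extra neutrons per atom from approximate natural
# abundances (13C, 2H, 15N, 17O + 2 * 18O).
EXTRA_NEUTRONS = {"C": 0.0107, "H": 0.000115, "N": 0.00364,
                  "O": 0.00038 + 2 * 0.00205}

UNIT_MASS = sum(n * MONO_MASS[e] for e, n in AVERAGE_UNIT.items())
UNIT_LAMBDA = sum(n * EXTRA_NEUTRONS[e] for e, n in AVERAGE_UNIT.items())

def poisson_pattern(mass, n_peaks=6):
    """Normalized Poisson isotope pattern for a peptide of the given mass."""
    lam = mass / UNIT_MASS * UNIT_LAMBDA
    p = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]
    total = sum(p)
    return [x / total for x in p]

def overlap_pattern(mass, shift, ratio, n_peaks=6):
    """Pattern of two interfering peptides; the second one starts `shift`
    isotope peaks later and has relative abundance `ratio`."""
    base = poisson_pattern(mass, n_peaks)
    mixed = list(base)
    for k in range(shift, n_peaks):
        mixed[k] += ratio * base[k - shift]
    total = sum(mixed)
    return [x / total for x in mixed]

def hellinger(p, q):
    """Hellinger distance between two normalized intensity vectors."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def best_model(observed, mass, shift=2, ratios=(0.25, 0.5, 1.0, 2.0)):
    """Return the label of the theoretical pattern closest to `observed`."""
    total = sum(observed)
    obs = [x / total for x in observed]
    n = len(obs)
    candidates = {("natural", 0.0): poisson_pattern(mass, n)}
    for r in ratios:
        candidates[("overlap", r)] = overlap_pattern(mass, shift, r, n)
    return min(candidates.items(), key=lambda kv: hellinger(obs, kv[1]))[0]

clean = poisson_pattern(1500.0)
mixed = overlap_pattern(1500.0, shift=2, ratio=1.0)
```

In this sketch a shift of two isotope peaks mimics the 2 Da distance between an ICPL4 and an ICPL6 labelled peptide. The label returned by `best_model` indicates whether the observed pattern is better explained by a single peptide or by two interfering ones; the subsequent subtraction of overlapping peak areas is not shown.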
After all isotope patterns in all MS1 scans have been collected, ICPL_ESIQuant generates peptide features by combining all identical monoisotopic peptide masses eluting over a continuous and limited period of time. The assembly of features into ICPL multiplets and their subsequent quantification are performed as described in the ICPLQuant manuscript (Quantification step) [11]. Importantly, the slightly earlier LC elution time of peptides carrying deuterated ICPL labels (ICPL4 and ICPL10 isotopologues) is a predictor of true positive ICPL triplet or quadruplet peak patterns and can therefore be used to eliminate false positives. The algorithm therefore checks all isotopologues of an ICPL multiplet for their expected retention time behaviour in the LC-MS data (Supplementary material).
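A minimal Python sketch of the multiplet assembly and the retention time check is given below. It is an illustration only and does not reproduce the procedure of [11]: the `Feature` class, the function names, the tolerances and the greedy matching are assumptions, and the label offsets are nominal values (0, 4, 6 and 10 Da) that would have to be replaced by the exact mass shifts of the labels and scaled by the number of labelled sites per peptide.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    mass: float       # monoisotopic mass (Da)
    rt: float         # apex retention time (s)
    intensity: float  # integrated elution profile

# Assumption: nominal mass offsets relative to ICPL0, one label per peptide.
LABEL_OFFSETS = {"ICPL0": 0.0, "ICPL4": 4.0, "ICPL6": 6.0, "ICPL10": 10.0}
DEUTERATED = {"ICPL4", "ICPL10"}

def rt_consistent(multiplet, tol=2.0):
    """Deuterated isotopologues must not elute later than the other ones
    (apart from a small tolerance in seconds)."""
    deut = [f.rt for lab, f in multiplet.items() if lab in DEUTERATED]
    rest = [f.rt for lab, f in multiplet.items() if lab not in DEUTERATED]
    if not deut or not rest:
        return True
    return max(deut) <= min(rest) + tol

def find_multiplets(features, n_labels=1, mass_tol=0.02, rt_window=30.0):
    """Greedily group features into ICPL multiplets by mass distance and RT."""
    ordered = sorted(features, key=lambda f: f.mass)
    used = set()
    multiplets = []
    for i, light in enumerate(ordered):
        if i in used:
            continue
        members = {"ICPL0": (i, light)}
        for label, offset in LABEL_OFFSETS.items():
            if label == "ICPL0":
                continue
            target = light.mass + n_labels * offset
            for j in range(i + 1, len(ordered)):
                cand = ordered[j]
                if j in used:
                    continue
                if (abs(cand.mass - target) <= mass_tol
                        and abs(cand.rt - light.rt) <= rt_window):
                    members[label] = (j, cand)
                    break
        multiplet = {lab: f for lab, (_, f) in members.items()}
        # Keep only multiplets with a partner and plausible elution order.
        if len(multiplet) >= 2 and rt_consistent(multiplet):
            used.update(idx for idx, _ in members.values())
            multiplets.append(multiplet)
    return multiplets

features = [
    Feature(1000.0, 600.0, 1.0e6), Feature(1004.0, 597.0, 2.0e6),
    Feature(1006.0, 600.5, 1.0e6), Feature(1010.0, 596.0, 0.5e6),
    # Candidate pair whose "deuterated" partner elutes 20 s later: rejected.
    Feature(2000.0, 600.0, 1.0e6), Feature(2004.0, 620.0, 1.0e6),
]
multiplets = find_multiplets(features)
```

In the usage example the quadruplet is retained because both deuterated isotopologues elute slightly earlier than their non-deuterated partners, whereas the second candidate pair is discarded as a likely false positive.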
ICPL_ESIQuant uses a small software tool called “MzXML2Search.exe” [12] to extract MS2 peak lists in Mascot Generic file format (mgf). These files can be directly loaded into the Mascot search engine [13].
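For orientation, the structure of an MS2 peak list in Mascot Generic Format can be sketched in Python as follows. This is not the output of MzXML2Search.exe; the function `to_mgf` and the dictionary keys of the input spectra are hypothetical, and only commonly used MGF parameters (TITLE, PEPMASS, CHARGE, RTINSECONDS) are written.

```python
def to_mgf(spectra):
    """Serialize MS2 spectra (list of dicts) into MGF text (illustrative)."""
    lines = []
    for s in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={s['title']}")
        lines.append(f"PEPMASS={s['precursor_mz']:.4f}")
        lines.append(f"CHARGE={s['charge']}+")
        lines.append(f"RTINSECONDS={s['rt']:.1f}")
        # One fragment ion per line: m/z and intensity.
        for mz, inten in s["peaks"]:
            lines.append(f"{mz:.4f} {inten:.1f}")
        lines.append("END IONS")
        lines.append("")
    return "\n".join(lines)

mgf_text = to_mgf([{
    "title": "scan_1234", "precursor_mz": 650.32, "charge": 2, "rt": 1820.5,
    "peaks": [(175.119, 1200.0), (276.1554, 830.0)],
}])
```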
ICPL_ESIQuant imports Mascot result files (.dat files) and the intermediate results generated during the quantification step to assemble peptide and protein result lists in .txt file format, which can easily be edited in Microsoft Excel/Access or Open Office applications. These files can also be loaded into visualization tools like Spotfire or into statistical programs like Matlab for further bioinformatics calculations (Figure 2).
The user can decide whether to stop the analysis workflow at this point (DDA) or to use automatically generated mass inclusion lists of not yet identified (but quantified) ICPL multiplets for a further MS2 identification. The latter option can be used iteratively in order to maximize the peptide and protein identification rate.
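The generation of a mass inclusion list from quantified but not yet identified ICPL multiplets can be sketched in Python as follows. The sketch rests on several assumptions: the function name, the fixed precursor charge state, the width of the RT window and the mass tolerance used to decide whether a feature has already been identified are illustrative choices and are not taken from ICPL_ESIQuant.

```python
PROTON_MASS = 1.007276  # Da

def build_inclusion_list(quantified, identified_masses, charge=2,
                         rt_half_width=60.0, mass_tol=0.02):
    """Return (m/z, RT start, RT end) rows for quantified features that
    have no identification yet. `quantified` holds (mass, rt) tuples."""
    rows = []
    for mass, rt in quantified:
        if any(abs(mass - m) <= mass_tol for m in identified_masses):
            continue  # already identified, no further MS2 needed
        mz = (mass + charge * PROTON_MASS) / charge
        rows.append((round(mz, 4), max(0.0, rt - rt_half_width),
                     rt + rt_half_width))
    # Sort by the start of the RT window, as the instrument run proceeds.
    return sorted(rows, key=lambda r: r[1])

quantified = [(1000.0, 600.0), (1500.0, 300.0)]
identified = [1000.01]
inclusion = build_inclusion_list(quantified, identified)
```

In the usage example only the feature at 1500.0 Da enters the list, since the feature at 1000.0 Da matches an already identified peptide mass within the tolerance.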
If working with the ICPL technology, ICPL_ESIQuant is the first choice to quantitatively analyse peptides and proteins measured on LC-ESI mass spectrometers for the following reasons:
(1) There is no other freeware on the market that is able to analyze multiplexed MS data generated from ICPL experiments; (2) By using .mzXML as file input format, the software is compatible with the most common MS instruments; (3) ICPL_ESIQuant considers overlapping isotope patterns, which is essential for an accurate quantification of ICPL triplex and quadruplex labelled samples and for the handling of complex MS data in general; (4) ICPL_ESIQuant takes advantage of the physico-chemical properties of the ICPL labels in its algorithm, significantly improving the true positive rate (and hence decreasing the false positive rate) of detected ICPL multiplet patterns; (5) ICPL_ESIQuant supports both the shotgun proteomics strategy using DDA and the use of mass inclusion lists, enabling the user to preselect biologically interesting peptides for further MS2 identifications.
ICPL_ESIQuant has already been successfully applied to proteomics experiments in the field of cancer biomarker discovery and anti-diabetic hemorphins [14,15].
This project was funded by the BMBF (German Federal Ministry for Education and Research) Grants FKZ 031U101 and FKZ 031U201.
The authors have declared no conflict of interest.