See this page online at: http://www.laboratoryfocus.com/AnovelsequencespecificalgorithmforpeptideretentionpredictioninionpairRPHPLC
Proteomics benefits greatly from the use of classical chromatographic techniques: reversed-phase, ion-pairing, ion-exchange and immobilized metal ion affinity chromatography. In particular, ion-pairing RP HPLC of peptides is a key component of many approaches for protein identification and characterization, enabling the separation of complex mixtures before mass spectrometry. However, RP HPLC is often regarded simply as a sample-preparation step, neglecting the valuable information that can be extracted from retention data.
The retention time of a peptide under water/acetonitrile gradient conditions depends mainly on the overall peptide hydrophobicity. Earlier models1,2 were based on the assumption that a peptide’s hydrophobicity can be calculated as the sum of the hydrophobicities of its component amino acids, but they failed to account for the influence of amino acid position within the peptide chain. Despite reported correlations as high as R-squared = 0.99 for model peptides, they gave disappointing results on real proteomic samples.
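The limitation of a purely additive model can be seen in a minimal sketch. The coefficient values below are hypothetical placeholders chosen for illustration (they are not the published values from references 1 or 2), and the function name is our own. Because the sum ignores position, two peptides with the same composition, such as IYLR and YILR, necessarily receive the same score.

```python
# Hypothetical per-residue retention coefficients (illustrative only,
# NOT taken from any published model); larger means more hydrophobic.
ILLUSTRATIVE_COEFFS = {"I": 7.0, "L": 8.0, "Y": 4.5, "R": -0.5}


def additive_hydrophobicity(sequence, coeffs=ILLUSTRATIVE_COEFFS):
    """Sum per-residue coefficients, ignoring where each residue sits."""
    return sum(coeffs[residue] for residue in sequence)


h_iylr = additive_hydrophobicity("IYLR")
h_yilr = additive_hydrophobicity("YILR")

# Same composition, different order: an additive model cannot tell them apart.
print(h_iylr, h_yilr, h_iylr == h_yilr)
```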
Retention prediction in various LC modes is the subject of fundamental studies, allowing fast optimization of chromatographic conditions to improve peak resolution and decrease analysis time. These studies are mostly directed towards retention prediction of a limited number of sample components under various chromatographic conditions. By contrast, RP HPLC in proteomics deals with samples that carry a virtually unlimited number of components. Since complete chromatographic resolution of all analytes in such a case is impossible, peptide retention prediction in RP HPLC is mainly concerned with one question: assuming constant chromatographic conditions, how will the composition of a given peptide affect its retention time?
A critical point for the development of such an algorithm is collection of the starting data set. The data used for our algorithm was derived from a library of about 3,000 tryptic peptides, separated on a 300Å pore size C18 RP column in micro-flow mode and confidently identified by off-line MALDI-QqTOF MS and MS/MS. The off-line combination of HPLC and MALDI MS was chosen due to its unique archiving capability: chromatographic fractions can be collected and stored for further analysis (Figure 1). Separation is completely decoupled from the mass spectrometry part of the analysis, so the two techniques can be optimized separately.
We chose to work with tryptic digests of known commercially available proteins, since real biological samples may contain proteins of significantly different abundances. Samples were carefully composed to avoid overloading the column. Each HPLC-MALDI run was calibrated using the digest of a standard protein. Since we used known proteins for tryptic digestion, the identity of tryptic peptides could be assigned first by mass (with 10 ppm mass accuracy) and then confirmed by MS/MS, leaving no room for false identification of peptides in our data set.
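As an illustration of the mass-assignment step, the following sketch checks whether an observed m/z lies within a 10 ppm window of a theoretical value. The function names and the example m/z values are assumptions made for this sketch; only the 10 ppm tolerance comes from the text.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed value lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Made-up example values: about 8 ppm off (accepted) and 15 ppm off (rejected).
print(within_tolerance(1000.008, 1000.000))
print(within_tolerance(1000.015, 1000.000))
```

At m/z 1000, a 10 ppm window corresponds to about 0.01 m/z units, which is why the MS/MS confirmation remains useful for peptides of nearly identical mass.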
We believe that the reason for the limited applicability of previous models lies in their neglect of the ion-pairing nature of the peptide separation.3 Formic, acetic or trifluoroacetic acid is typically added as a modifier to both components of the water/acetonitrile gradient. Apart from providing acidic conditions, the respective anions form ion pairs with the positively charged groups of the peptide (i.e., the N-terminal amino group and the side chains of Lys, Arg, and His). These interactions significantly change the hydrophobic properties of the neighboring amino acids. Thus, if the peptide N-terminal region contains hydrophobic amino acids (Leu, Ile, Trp, Phe, Val, Met), the screening effect of the ion-pairing modifier decreases the apparent hydrophobicity of these residues.
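The screening effect can be pictured as a position-dependent weight applied to hydrophobic residues near the N-terminus. The sketch below is a toy illustration under stated assumptions, not the correction actually used in our algorithm: the coefficients and the damping weights for the first three positions are hypothetical, and only the list of hydrophobic residues is taken from the text.

```python
# Hypothetical coefficients and weights, chosen only to illustrate the idea.
COEFFS = {"I": 7.0, "L": 8.0, "Y": 4.5, "R": -0.5}
HYDROPHOBIC = set("LIWFVM")        # Leu, Ile, Trp, Phe, Val, Met
N_TERM_WEIGHTS = (0.4, 0.7, 0.9)   # assumed damping for positions 1 to 3


def position_corrected_hydrophobicity(sequence):
    """Additive score with hydrophobic residues damped near the N-terminus."""
    total = 0.0
    for index, residue in enumerate(sequence):
        weight = 1.0
        if residue in HYDROPHOBIC and index < len(N_TERM_WEIGHTS):
            weight = N_TERM_WEIGHTS[index]
        total += weight * COEFFS[residue]
    return total


h_iylr = position_corrected_hydrophobicity("IYLR")
h_yilr = position_corrected_hydrophobicity("YILR")

# With these made-up numbers, Ile at position 1 is damped more than Ile at
# position 2, so IYLR scores lower than YILR despite identical composition.
print(h_iylr, h_yilr)
```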
To a first approximation, our algorithm relies on the summation of retention coefficients of the individual amino acids, as in previous approaches. Indeed, the starting point for our optimization was the model developed in R. Hodges’s group.2 However, we introduce corrections that reflect the influence of the ion-pairing modifier and the overall distribution of positively charged and hydrophobic groups within the peptide chain. In particular, we introduce a correction that accounts for significant changes in the retention coefficients of amino acids located at the N-terminus of the peptide and close to amino acids that carry a positive charge during separation at acidic pH. Other correction coefficients compensate for the influence of the peptide length, the value of the isoelectric point, the uniformity of the distribution of hydrophobic residues along the peptide chain, and the peptide’s propensity to form helical structures. Optimizing the model allowed us to linearize the dependence of retention time on hydrophobicity (RT = A × (Hydrophobicity) + B) with an R-squared value of about 0.98 for our test peptide library. The slope A in this equation depends on the slope of the acetonitrile gradient, and the intercept B depends on the dead volume of the system.
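In practice, A and B for a given system can be estimated from a calibration run by ordinary least squares. The sketch below assumes a handful of calibration peptides with known hydrophobicity values and measured retention times; the numbers are invented for illustration and the function names are our own.

```python
def fit_line(hydrophobicities, retention_times):
    """Ordinary least-squares fit of RT = A * H + B; returns (A, B)."""
    n = len(hydrophobicities)
    mean_h = sum(hydrophobicities) / n
    mean_rt = sum(retention_times) / n
    s_hh = sum((h - mean_h) ** 2 for h in hydrophobicities)
    s_hr = sum((h - mean_h) * (rt - mean_rt)
               for h, rt in zip(hydrophobicities, retention_times))
    slope = s_hr / s_hh
    intercept = mean_rt - slope * mean_h
    return slope, intercept


def r_squared(hydrophobicities, retention_times, slope, intercept):
    """Coefficient of determination for the fitted line."""
    mean_rt = sum(retention_times) / len(retention_times)
    ss_tot = sum((rt - mean_rt) ** 2 for rt in retention_times)
    ss_res = sum((rt - (slope * h + intercept)) ** 2
                 for h, rt in zip(hydrophobicities, retention_times))
    return 1.0 - ss_res / ss_tot


# Invented calibration data: hydrophobicity (arbitrary units) vs. RT (min).
calib_h = [5.0, 10.0, 20.0, 30.0, 40.0]
calib_rt = [9.0, 14.2, 23.8, 34.1, 44.0]

A, B = fit_line(calib_h, calib_rt)
r2 = r_squared(calib_h, calib_rt, A, B)
print(f"A = {A:.3f} min/unit, B = {B:.2f} min, R^2 = {r2:.4f}")
```

Once A and B are known for a particular gradient and system, a predicted retention time for any peptide follows directly from its calculated hydrophobicity.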
We call this algorithm the Sequence Specific Retention Calculator (SSRCalc).4 Its first test was its ability to predict the separation of isobaric peptides. These peptides are, of course, indistinguishable by MS measurements of any accuracy, since they have the same mass. For example, the peptides IYLR (human integrin α5) and YILR (human fibronectin), both at m/z = 564.352, differ only in the order of the Ile-Tyr pair. In this case, SSRCalc predicts retention times of 19.8 and 21.7 min (for a gradient of 0.66% acetonitrile per minute) for IYLR and YILR, respectively. The actual retention times were 19.9 and 20.5 min, showing the effect of moving the more hydrophobic Ile residue away from the N-terminus of the peptide. Overall, our data set contained 39 such isobaric peptide pairs, and the retention order was predicted correctly in 35 cases.
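That such peptides cannot be separated by mass alone follows from the way peptide mass is calculated: it is a sum over residues and does not depend on their order. A minimal sketch, using standard monoisotopic residue masses and assuming singly protonated ions (the function name is our own):

```python
import math

# Standard monoisotopic residue masses (Da) for the residues needed here.
RESIDUE_MASS = {"I": 113.08406, "L": 113.08406, "Y": 163.06333, "R": 156.10111}
WATER = 18.01056    # mass of H2O added for the free termini
PROTON = 1.00728    # mass of the charge-carrying proton


def singly_protonated_mz(sequence):
    """Monoisotopic m/z of the [M+H]+ ion of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER + PROTON


mz_iylr = singly_protonated_mz("IYLR")
mz_yilr = singly_protonated_mz("YILR")

# Both values agree to within floating-point rounding.
print(round(mz_iylr, 3), round(mz_yilr, 3), math.isclose(mz_iylr, mz_yilr))
```

The calculated value agrees with the m/z quoted above to within a few thousandths of a unit, and it is the same for both sequences, so only the retention time can distinguish them.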
A valuable application of SSRCalc is its use in supplying an additional retention time constraint for protein identification by peptide mass fingerprinting. After separation, complex protein digests may generate hundreds to thousands of peaks in a single MS run. Matching these masses against theoretical protein digests in a database always generates a significant number of false positive identifications. Applying a retention time filter eliminates most of these false matches. Figure 2A shows the correlation obtained for a calibrating protein digest (human transferrin tryptic digest). Figure 2B shows the correlation obtained for protein identification in a yeast invertase sample. Here, a simple visual inspection is enough to assign yeast invertase and fructose 1,6-bisphosphate aldolase as correct, whereas calmodulin binding protein is a false positive identification.
Such examples extend our knowledge of the nature of complex chromatographic processes. In return, a valuable new application emerges that improves the ability of the HPLC-MS combination to deal with proteomics samples.
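A retention time filter of this kind can be sketched as follows. The sketch assumes that each mass-matched candidate already has a calculated hydrophobicity and an observed retention time, and that the calibration line RT = A × (Hydrophobicity) + B is known. The protein and peptide labels, the numbers and the 2-minute tolerance are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class MassMatch:
    protein: str            # database protein the peptide was matched to
    peptide: str            # matched peptide sequence (placeholder strings)
    hydrophobicity: float   # calculated hydrophobicity, arbitrary units
    observed_rt: float      # retention time of the matched peak, minutes


def retention_time_filter(matches, slope, intercept, tol_min=2.0):
    """Keep matches whose observed RT lies within tol_min of the prediction."""
    kept = []
    for match in matches:
        predicted_rt = slope * match.hydrophobicity + intercept
        if abs(match.observed_rt - predicted_rt) <= tol_min:
            kept.append(match)
    return kept


# Invented candidates: the third one matches by mass but elutes far from
# where its sequence predicts, so it is treated as a likely false positive.
candidates = [
    MassMatch("protein_A", "PEPTIDE_1", 12.0, 16.5),
    MassMatch("protein_A", "PEPTIDE_2", 25.0, 28.4),
    MassMatch("protein_B", "PEPTIDE_3", 30.0, 12.0),
]

accepted = retention_time_filter(candidates, slope=1.0, intercept=4.0)
print([m.peptide for m in accepted])
```

The tolerance would in practice be chosen from the scatter of the calibration digest around its fitted line.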
We have made the original SSRCalc algorithm available on the internet,4 and it is already in use in a number of research laboratories, as well as in pharmaceutical and bioinformatics companies. Work continues to improve the algorithm and widen its applicability. A new version for 100Å pore size sorbents will be available in early 2006. The algorithm will also be extended to allow for a number of chemical and post-translational protein modifications.
References
1. Meek, J. L. Proc Natl Acad Sci U.S.A. 1980; 77, 1632-1636.
2. Guo, D., C. T. Mant, A.K. Taneja, J.M.R. Parker, R.S. Hodges J Chromatogr 1986; 359, 499-517.
3. Krokhin, O. V., R. Craig, V. Spicer, W. Ens, K.G. Standing, R.C. Beavis, J. A. Wilkins. Molecular and Cellular Proteomics 2004; 3(9), 908-919.
4. Sequence Specific Retention Calculator. Manitoba Centre for Proteomics. December 20 2005 <http://hs2.proteome.ca/SSRCalc/SSRCalc.html>.
Oleg Krokhin, PhD and Ken Standing, PhD are members of the Manitoba Centre for Proteomics and Systems Biology (MCPSB) at the University of Manitoba (U of M) (Winnipeg, MB), and are also a research associate and professor emeritus, respectively, with U of M’s department of physics and astronomy. John Wilkins, PhD is the director of the MCPSB.