Proteoform Family Identification, Quantification, and Visualization

2 minute read

Proteoform families

I created software named Proteoform Suite that automated the identification, quantification, and visualization of proteoform families, i.e. the diverse forms of a protein from the same gene. This software uses only the intact masses of these proteoforms to perform this analysis, whereas most proteoform studies use fragmentation to reveal more sequence information. We also contributed the valuable concept of the “proteoform family,” which clarifies the important relationships between proteoforms. I used this software tool to identify thousands of yeast proteoforms and quantify the abundance changes in these proteoforms in response to salt stress. This revealed expected significant changes in acetylation and phosphorylation states of the SBH1 proteoform family and many new significant changes.

Beyond my study of yeast proteoforms, Proteoform Suite has been used to show we can expand identifications with top-down mass spectrometry by around 50% in both yeast and mouse and perform label-free proteoform quantification (Schaffer, et al., Anal. Chem., 2017; Schaffer, et al., J. Proteome Res., in press.). It has also been used to reveal proteoforms in Escherichia coli (Dai, et al., J. Proteome Res., 2017). We are now applying this software to identify human proteoforms.

Beyond this work, I worked with a subcommittee of the Consortium for Top-Down Proteomics (CTDP) to develop a standard notation for proteoforms that allows the communication of a proteoform sequence with post-translational modifications at specific positions. We are now building a standard development kit to use this notation and extending it to account for ambiguity in modification localizations.

Research Papers: Cesnik, A. J.; Shortreed, M. R.; Schaffer, L. V.; Knoener, R. A.; Frey, B. L.; Scalf, M.; Solntsev, S. K.; Dai, Y. Gasch, A. P.; Smith, L. M. “Proteoform Suite: Software for Constructing, Quantifying, and Visualizing Proteoform Families.” Journal of Proteome Research, 2018 17, 568–578. LeDuc, R.*; Schwämmle, V.*; Shortreed, M. R.*; Cesnik, A. J. *; Solntsev, S. K.*; Shaw, J.*; Martin, M. J.; Vizcaíno, J. A.; Alpi, E.; Danis, P.; Kelleher, N. L.; Smith, L. M.; Ge, Y.; Agar, J. N.; Chamot-Rooke, J.; Loo, J.; Paša-Tolić, L.; Tsybin, Y. O. “ProForma: a Standard Proteoform Notation.” Journal of Proteome Research, 2018, 17, 1321–1325. *Contributed equally.

Alternative splicing and PTM identification

Each of us has thousands of variations in our genomes, alternative splicing in our transcriptomes, and post-translational modifications (PTMs) adorning our proteins that make us unique. These variations are not captured in canonical protein databases used for proteomic searches, and so they are invisible to traditional database-centric means of identifying proteins. I combined and refined methods to construct databases with these variations and modifications, and I was able to identify thousands of peptides that were unique to each of 10 human cell lines, illustrating the power of sample-specific databases to reveal variation unique to individuals, cancers, and non-human primates in another study using the software I developed (Proffitt, et al., BMC Genomics, 2017).

Research Papers: Cesnik, A. J.; Shortreed, M. R.; Sheynkman, G. M.; Frey, B. L.; Smith, L. M. “Human Proteomic Variation Revealed by Combining RNA-Seq Proteogenomics and Global Post-Translational Modification (G-PTM) Search Strategy.” Journal of Proteome Research, 2016, 15, 800–808. **American Chemical Society Editor’s Choice

Updated: