Accessible single-cell proteomics

Recently, single-cell mass-spectrometry analysis has made it possible to quantify thousands of proteins in single mammalian cells. Yet, these technologies have been adopted by relatively few mass-spectrometry laboratories. Broader adoption can help reveal biochemical mechanisms that underpin health and disease, and it requires robust methods that can be deployed in any mass-spectrometry laboratory.

This aim for a “model T” single-cell proteomics has been the guiding philosophy in the development of Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) and its version 2 (SCoPE2). We aimed to make every step easy to reproduce, from sample preparation and the optimization of experimental parameters to an open-source data analysis pipeline. The emphasis has been on accuracy and accessibility, which has facilitated replication (video) and adoption of SCoPE2. Yet, we have found that some groups adopting these single-cell technologies fail to make quantitatively accurate protein measurements because they skip important quality controls of sample preparation (such as negative controls and labeling efficiency) and of mass-spectrometry analysis (such as apex sampling and the purity of MS2 spectra).

These observations motivated us to write a detailed protocol for multiplexed single-cell proteomics. The protocol emphasizes quality controls that are required for accurately quantifying protein abundance in single cells and scaling up the analysis to thousands of single cells. The protocol and its associated video and web resources should make single-cell proteomics accessible to the wider research community.
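As an illustration of one sample-preparation quality control, labeling efficiency can be estimated directly from standard search results by counting how many identified peptides actually carry the isobaric label. The sketch below is only a minimal example; the column name `modifications` and the input file are hypothetical, and the exact strings to match depend on the search engine.

```python
import pandas as pd

def labeling_efficiency(psms: pd.DataFrame) -> float:
    """Fraction of peptide-spectrum matches whose N-terminus carries a TMT label.

    Assumes the database search treated the TMT label as a *variable*
    modification, so that unlabeled peptides are also identified and counted.
    """
    # Hypothetical column listing the modifications reported for each PSM
    mods = psms["modifications"].fillna("")
    labeled = mods.str.contains("TMT") & mods.str.contains("N-term")
    return labeled.mean()

# Hypothetical usage:
# psms = pd.read_csv("psms.tsv", sep="\t")
# print(f"N-terminal labeling efficiency: {labeling_efficiency(psms):.1%}")
```

A low efficiency indicates problems with the labeling reaction and is worth troubleshooting before committing instrument time to single cells.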

Label-free single-cell proteomics

Recently, Matthias Mann and colleagues published a preprint (doi: 10.1101/2020.12.22.423933v1) reporting a label-free mass-spectrometry method for single-cell proteomics. Many colleagues asked me what I think about the preprint, and I summarized a few comments in the peer review below. I did not examine all aspects of the work, but I hope my comments are useful:

Dear Matthias and colleagues,

I found your preprint interesting, especially as it focuses on an area that has recently received much attention. Methods for single-cell protein analysis by label-free mass-spectrometry have made significant gains over the last few years, and the method that you report looks promising. Below, I suggest how it might be improved further and benchmarked more rigorously.

To analyze single HeLa cells, you combined the recently reported diaPASEF method with Evosep LC and timsTOF improvements developed in collaboration with Bruker. This is a logical next step and sounds like a good approach for label-free MS. The method quantifies about 1000 proteins per HeLa cell, a coverage comparable to that of published label-free DDA methods (doi: 10.1039/D0SC03636F) and to that reported by the Aebersold group for a DIA method performed on a Lumos instrument (data presented at the third Single-Cell Proteomics Conference). This is good coverage, though given the advantages of diaPASEF and the timsTOF improvements, there is potential for even better performance. I look forward to exploring the raw data.

The major advantage of your label-free MS approach is its speed. It is faster than previously reported label-free single-cell proteomics methods, which allowed you to analyze over 400 single HeLa cells, generating the largest label-free dataset to date. This increased speed is a major advance for label-free single-cell proteomics. The speed (and thus throughput) could be increased further by multiplexing using the isobaric carrier approach.

You combine HeLa data from single-cell MS analysis with HeLa data from two scRNA-seq methods. This is good, and I think such joint analysis of protein and RNA should be an integral part of analyzing single-cell MS proteomics data. The results shown in Fig. 5A,B are straightforward to interpret and indicate that your method compares favorably to scRNA-seq in terms of reproducibility and missing data. The interpretation of Fig. 5C is more confounded by systematic biases. Both mass-spec and sequencing have significant biases, such as sequence-specific biases and peptide-specific ionization propensities. These biases contribute to estimates of absolute abundances (doi: 10.1038/nmeth.2031, 10.1038/nbt.2957), might contribute to the variance captured by PC2 in Fig. 5C, and thus may alter your conclusions.



I have a few suggestions:

— Benchmark the accuracy of relative quantification. Ideally, this can be done by measuring protein abundance in single cells by an independent method (such as fluorescent proteins measured by a FACS sorter) and comparing those measurements to the MS estimates. Other approaches, such as spiked-in protein/peptide standards, are also possible. Benchmarks of accuracy (rather than merely reproducibility) would strengthen your study.

— Order the unperturbed HeLa cells by cell division cycle (CDC) phase and display the abundances of the periodic proteins.

— Provide more discussion positioning your work in the context of the field and other approaches, in terms of technology, depth of coverage, throughput, and biological applications.


Nikolai Slavov
slavovlab.net

The cost of omics

Next-generation DNA sequencing is ubiquitously integrated into modern biomedical research, while mass-spectrometry proteomics remains far less so. In fact, mass-spectrometry proteomics is conspicuously missing from projects that desperately need it.

Why is DNA sequencing better integrated with biomedical research? This question comes up often in my conversations with colleagues. A commonly suggested answer is the difference in cost. I am not convinced by this answer, so I decided to evaluate it with a bit more rigour than in my usual casual conversations. The metric of merit for the comparison will be the cost of quantifying 10,000 genes in a sample at the transcriptome and at the proteome level.

This simple metric has many dimensions, such as the quantification of proteoforms and transcript isoforms, that are beyond the scope of my comparison. Also, both the RNA and the protein analysis might use different analytical methods, and the cost will vary somewhat between methods. So, my estimates will be based on high-quality, economical options and on representative facility fees charged in Boston, MA.

Proteomics

A good and economical option for quantifying >10,000 proteins in a sample is TMT 16-plex with offline fractionation and DDA analysis, which needs about 1-2 hours of instrument time per sample. Another up-and-coming option is label-free DIA analysis, which also needs about 1-2 hours of instrument time per sample, but at present it will struggle to quantify 10,000 proteins / sample. With facility fees of about 100 – 200 USD / hour, the cost of analysis is about 200 – 400 USD / sample. This cost does not include sample preparation, which is relatively simple and which almost any biology lab can perform in house. The reagents for sample prep are less than 100 USD, so I will add 100 USD to the cost of instrument time (this also covers the cost of offline fractionation). So the final estimate is 300 – 500 USD / sample.

 

Transcriptomics

For RNA sequencing, I will base the estimate on an Illumina NextSeq 500 run with 150-cycle paired-end reads. The cost of a run is about 3000 – 4000 USD, which provides 200 – 400 million reads. Depending on the number of reads per sample, one can analyze a different number of samples. For RNA data of quality comparable to the mass-spec data, I will assume that we need 15 million reads per sample and that we can analyse about 20 samples per run. Fewer reads per sample can reduce the cost while still providing usable, albeit less quantitative, data. Again, sample prep can be performed in house for much less than 100 USD / sample and is more expensive if performed at a facility. So the final cost estimate is lower than, but comparable to, the one for mass-spec proteomics: about 250 – 350 USD / sample.
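To make the arithmetic behind both estimates explicit, here is a minimal sketch that reproduces the back-of-the-envelope numbers above. All inputs are the ranges stated in the text; using facility-based sample prep instead of in-house prep would push the upper ends higher.

```python
def per_sample_cost(instrument_range_usd, prep_usd):
    """Per-sample cost = instrument (or sequencing) cost + sample preparation."""
    low, high = instrument_range_usd
    return (low + prep_usd, high + prep_usd)

# Proteomics: 1-2 h of instrument time at 100-200 USD/h -> 200-400 USD,
# plus ~100 USD for in-house sample prep and offline fractionation.
proteomics = per_sample_cost((200, 400), 100)        # -> (300, 500) USD / sample

# Transcriptomics: 3000-4000 USD per NextSeq run, ~20 samples per run at
# ~15 million reads each -> 150-200 USD of sequencing per sample,
# plus <100 USD for in-house library prep.
sequencing_per_sample = (3000 / 20, 4000 / 20)
transcriptomics = per_sample_cost(sequencing_per_sample, 100)  # -> (250, 300) USD / sample

print(f"Proteomics:      {proteomics[0]:.0f} - {proteomics[1]:.0f} USD / sample")
print(f"Transcriptomics: {transcriptomics[0]:.0f} - {transcriptomics[1]:.0f} USD / sample")
```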

Cost of transcriptomics and proteomics

Cost of transcriptomics and proteomics analysis of a sample.

Data analysis

The above estimates considered only the cost of data generation, while the cost of analysis (human hours) is the dominant expense for many studies. The analysis cost is quite similar for DNA sequencing and mass-spectrometry data, and it varies more with the number of samples and the aims of the analysis.

Why the difference?

The above rough estimates suggest that cost may not be the main reason why research projects that need mass-spectrometry do not use it. If not cost, then what? Here are my main hypotheses:

  • Availability of service: While good mass-spec labs can quantify >10,000 proteins as sketched above, relatively few facilities can accomplish that. Finding a DNA sequencing facility is easier than finding a mass-spec facility that can perform the analysis outlined above.
  • Knowledge: Many biologists are familiar with next-generation sequencing and its capabilities, while fewer are familiar with mass-spectrometry. This reflects both how long the technologies have had to mature and sociological factors. So, I am passionate about making mass-spec technology accessible and about teaching its basic concepts.

What do you think? Why the difference?

 

Discussion from Twitter: 

When do you need single-cell analysis?

Single-cell analysis is trendy for good reasons: It has enabled asking and answering important questions. Of course, the substantive reasons are surrounded by much hype. Sometimes colleagues tell me they want to add single-cell RNA-seq analysis since it will help them publish their paper in a more prestigious journal, and sadly there is perhaps more truth to that than I want to believe.

On the other end of the spectrum, some colleagues from the mass-spec community are puzzled by our efforts to develop methods for single-cell mass-spec analysis. At HUPO, I have been repeatedly asked: “Why analyze single cells when you can identify more peptides in bulk samples?”

So, when do we need single-cell analysis? Can’t we just FACS-sort cells based on markers and analyze the sorted cells? Indeed, that may be a good strategy when the cells we analyze fall into relatively homogeneous clusters (they will never be perfectly homogeneous) and we have a reliable marker for each cluster. If these assumptions hold, averaging out the differences between individual cells will give us very useful coarse-graining. Unfortunately, bulk analysis of the sorted cells cannot validate the assumption of homogeneity. For example, we can easily sort B cells and T cells from blood samples because we have well-defined markers for each cell type. However, the bulk analysis of the sorted cells will not provide any information on the homogeneity of the sorted T cells. Yet, a wealth of single-cell analysis has demonstrated the existence of multiple states within T-cell subpopulations, states for which we rarely have well-defined markers allowing efficient FACS sorting and follow-up bulk analysis.

FACS sorting is especially inadequate when the cell heterogeneity is not easily captured by discrete subpopulations / clusters of cells. Consider, for example, the continuous gradient of macrophage states that we recently observed in our SCoPE2 data:

To explore the heterogeneity within the macrophage-like cells, we sorted them based on the Laplacian vector. See Specht et al., 2019 for details.
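The general idea of ordering cells along such a continuum can be sketched with a graph Laplacian: build a nearest-neighbor graph over the single-cell protein profiles and order the cells by the second eigenvector of the Laplacian (the Fiedler vector). The code below is only an illustrative sketch of this general approach, not the exact procedure used in Specht et al., 2019.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def order_cells_by_laplacian(profiles: np.ndarray, n_neighbors: int = 15) -> np.ndarray:
    """Order cells along a continuum using the Fiedler vector of a kNN graph.

    profiles: cells x proteins matrix of (normalized) protein levels.
    Returns the indices that sort the cells along the inferred gradient.
    """
    # Symmetric k-nearest-neighbor adjacency over cells
    adj = kneighbors_graph(profiles, n_neighbors=n_neighbors, mode="connectivity")
    adj = 0.5 * (adj + adj.T)

    # Normalized graph Laplacian; its second-smallest eigenvector (the Fiedler
    # vector) orders cells along the dominant axis of connectivity.
    lap = laplacian(adj, normed=True).toarray()
    eigvals, eigvecs = np.linalg.eigh(lap)
    fiedler = eigvecs[:, 1]
    return np.argsort(fiedler)

# Hypothetical usage with a cells x proteins matrix:
# order = order_cells_by_laplacian(protein_levels)
# sorted_profiles = protein_levels[order]
```

Ordering by the Fiedler vector places cells with similar protein profiles next to each other, revealing gradual transitions that a discrete clustering would obscure.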

In some cases, e.g., the analysis of small clonal populations, the benefits of single-cell analysis may be too small to justify the increased cost. Sometimes, we can gain single-cell information from analyzing small groups of cells, e.g., Shaffer et al., 2018. Sometimes, nobody can be sure whether single-cell analysis is needed. If we assume it is needed and perform it, the data can refute our assumption and show us that there is not much heterogeneity, at least at the level of what we could measure. If we assume that there is no heterogeneity and thus no need for single-cell analysis, e.g., we FACS-sort T cells, the bulk analysis of the sorted cells will not correct our assumption. We can feel the assumption is validated while being blinded to what might be the most meaningful cellular diversity in the system. So, single-cell analysis is not always needed, but it is much better at correcting our assumptions and teaching us whether it is needed or not.

Direct causal mechanisms

Understanding biological systems: In search of direct causal mechanisms

The advent of DNA microarrays spurred a vigorous effort to reverse engineer biological networks. Recently, these efforts have been reinvigorated by the availability of RNA-seq data from perturbed and unperturbed single cells. In the talk below, I discuss the opportunities and limitations of using such data for inferring networks of direct causal interactions, with emphasis on the distinctions between models based on direct and indirect interactions. This discussion motivates the need to model proteins, since most biological interactions involve proteins. I then introduce key ideas and technological capabilities of high-throughput single-cell proteomics methods that we have developed and focus on the opportunities of using such data for inferring direct causal mechanisms in biological systems.

Some of the ideas that I discuss above involve single-cell mass-spec measurements with SCoPE-MS. Thus, if you do not have a strong background in quantitative mass-spec, you may first want to learn some of the key ideas that make SCoPE-MS possible from this primer by Harrison Specht. Below is its summary.

Quantifying proteins by mass-spec

Mass spectrometry-based proteomics is a suite of high-throughput and sensitive approaches for identifying and quantifying proteins in biological samples. These methods allow for quantifying >10,000 proteins in bulk samples. However, these techniques have not yet been widely applied to single cells despite the fact that modern mass spectrometers can detect single ions. To explain why, this primer talk will explore core concepts of mass spectrometry-based proteomics with emphasis on developing intuition for the physical processes underpinning peptide sequencing and quantification. In particular, I will cover what is called “shotgun” or “discovery” proteomics using isobaric barcoding, a technology used by Single Cell Proteomics by Mass Spectrometry (SCoPE-MS). The primer talk will outline the obstacles that have limited the broad application of quantitative mass spectrometry to single-cell analysis and how SCoPE-MS overcomes these obstacles to enable profiling thousands of proteins across thousands of single cells.
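To make the quantification step concrete, here is a minimal, hypothetical sketch of how relative protein levels can be derived from isobaric (TMT) reporter-ion intensities: normalize each channel to the mean across channels for every peptide-spectrum match, then collapse peptides to proteins by the median. The column names are assumptions for illustration; this shows the general logic, not the full SCoPE-MS pipeline.

```python
import numpy as np
import pandas as pd

def reporter_ions_to_protein_levels(psms: pd.DataFrame, channels: list[str]) -> pd.DataFrame:
    """Convert reporter-ion intensities of PSMs into relative protein levels.

    psms: one row per peptide-spectrum match, with a 'protein' column and one
          column of reporter-ion intensity per TMT channel (hypothetical names).
    """
    rel = psms.copy()
    # Relative quantification: divide each channel by the mean across channels,
    # so every PSM reports fold changes rather than absolute intensities.
    rel[channels] = rel[channels].div(rel[channels].mean(axis=1), axis=0)

    # Collapse PSMs to proteins by taking the median relative level per channel.
    proteins = rel.groupby("protein")[channels].median()

    # Log2-transform so that up- and down-regulation are symmetric around 0.
    return np.log2(proteins)

# Hypothetical usage with three TMT channels:
# protein_levels = reporter_ions_to_protein_levels(psms, ["RI_126", "RI_127N", "RI_127C"])
```

Working with relative (fold-change) values per PSM removes much of the peptide-to-peptide variation in ionization efficiency, which is why isobaric quantification is naturally relative rather than absolute.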

Single-cell analysis


Imaging is the most widely used method for single-cell analysis

The success of imaging technologies

The molecular and functional differences among the cells making up our bodies have been appreciated for many decades. Yet, the tools to study them were very limited. In the last couple of decades, we have begun developing increasingly powerful technologies for molecular single-cell measurements. Currently, the most widely used high-throughput methods for molecular single-cell analysis have two things in common: (1) they quantify nucleic acids, and (2) they are based on imaging. The imaging can be done in situ (e.g., fluorescent in situ hybridization, FISH) or in vitro (e.g., single-cell RNA-seq based on next-gen DNA sequencing). Imaging has been applied to single-cell protein analysis as well, though most applications have been hampered by their dependence on antibodies. A recent break from this antibody dependence is the single-molecule Edman degradation developed by the group of Edward Marcotte. If developed further, imaging could become a workhorse for single-cell protein analysis as well.

Emerging mass-spec methods

Efforts to apply mass-spectrometry to single-cell analysis started in the 1990s. As comprehensively reviewed by Rubakhin et al., these efforts focused on ionizing biological molecules via Secondary Ion MS (SIMS) or via Matrix-Assisted Laser Desorption/Ionization (MALDI). These methods can ionize biological molecules with minimal processing and losses but remain rather limited in their quantification accuracy and in identifying the chemical composition of the analyzed ions. In contrast, the methods that afford robust high-throughput identification (based on analyte separation and tandem MS analysis, e.g., LC-MS/MS or CE-MS/MS) have been very challenging to apply to small samples. Still, the typical mammalian cell contains thousands of metabolites and proteins whose abundances are well within the sensitivity of mass-spec instruments. Based on this realization, we outlined directions for multiplexed analysis of single cells by LC-MS/MS that can enable quantifying thousands of proteins across many thousands of single cells. We recently published a proof of principle that has since been superseded by a higher-throughput single-cell proteomics method. These initial steps need much further development, both experimental and computational, before they reach the transformative potential that single-cell mass-spec could have.

 

Understanding biology

Single-cell analysis is not merely about measurements. It’s about understanding them. Our progress in understanding single-cell data has been limited, even for the data coming from the more mature technologies. Conceptual progress has been much slower than technological progress. So, how do we make sense of the data?

I will reserve my musings on this question for a forthcoming post. For now, I’ll just say that I like an idea articulated by Munsky et al., 2012 and Padovan-Merhar and Raj, 2013: using the variability between single cells as a natural perturbation for studying gene regulation. I think that this approach can be very powerful. More thoughts on that coming soon.

 

 

 

Single-cell proteomics

Ever since my lab posted the SCoPE-MS preprint, I have been repeatedly asked about the future potential and the cost of quantifying proteins by high-throughput mass-spectrometry in single cells. I will summarize a few thoughts that hopefully will be helpful and will reduce email traffic.

Why quantify proteins and PTMs in single cells?

Single-cell RNA-seq has made great strides and has become a widely available and preferred method for high-throughput single-cell measurements. That is great! These measurements are very useful, and their usefulness will continue to grow as we invent new ways to think about these data and reduce their noise. Yet, measuring transcript levels alone is insufficient for studying and understanding many physiological and pathological processes, not least because the changes of protein levels across human tissues and during cell differentiation are poorly predicted by the corresponding changes in mRNA levels:

 

The usefulness of mRNA levels as a surrogate for signaling activity mediated by post-translational modifications (PTMs, e.g., phosphorylation, ubiquitylation) is even more limited.

 

What is the history of single-cell proteomics by mass-spectrometry?

Quantifying proteins in single cells directly, without relying on antibodies, has been a long-standing aim and dream for many scientists. There are over a dozen reports of doing so over the last decade, but they all used cells with over 1000-fold larger volumes than the typical mammalian cell (e.g., muscle cells and oocytes) and quantified only a few proteins in a few cells. To my knowledge, SCoPE-MS is the first method to quantify over a thousand proteins across hundreds of mammalian cells of typical size, i.e., 10 – 15 μm in diameter.

 

How expensive is it to do SCoPE-MS?

This question comes up frequently. The answer depends a lot on what we factor into the price. If you own a suitable high-resolution MS instrument/system, the current cost is about a dollar per cell, but very soon that will drop significantly; stay tuned for our next preprint. If you do not own a suitable high-resolution MS instrument, the price depends on the service charges of your preferred MS facility. The cost of a suitable instrument ranges from ~100k USD (a low-end refurbished instrument) to ~700k USD (the high-end benchtop instruments on the market, new).

 

How easy is it to do SCoPE-MS?

For our lab, quite easy. I am proud of the fact that SCoPE-MS is enabled by a simple idea and not by access to the newest corporate technology with limited accessibility. We used an old, low-end instrument for developing SCoPE-MS. We are writing up more detailed protocols and hope to release a robust data processing pipeline soon. Anyway, there is nothing particularly tricky in the method, and I expect that any good lab should be able to quantify single-cell proteomes by SCoPE-MS.

 

How noisy are the data?

Like all methods using tandem mass tags, SCoPE-MS measurements are affected by coisolation interference, which means that about 5 – 10 % of the reporter-ion signal for a typical peptide comes from other peptides. This undesirable contribution can be reduced by using newer instruments with better mass-filters that allow for smaller ion isolation windows. It can also be reduced by simply filtering out peptides with more coisolation and focusing on those with very limited coisolation, or by computationally compensating for it.
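The filtering mentioned above can be implemented by thresholding each peptide-spectrum match on its precursor isolation purity, i.e., the fraction of the ion current in the isolation window attributable to the selected precursor, which several search engines report. A minimal sketch, assuming a hypothetical `precursor_purity` column in a pandas DataFrame of PSMs:

```python
import pandas as pd

def filter_coisolation(psms: pd.DataFrame, min_purity: float = 0.9) -> pd.DataFrame:
    """Keep PSMs whose precursor isolation purity exceeds a threshold.

    A purity of 0.9 means that at most ~10% of the reporter-ion signal is
    expected to come from coisolated peptides.
    """
    kept = psms[psms["precursor_purity"] >= min_purity]
    removed = len(psms) - len(kept)
    print(f"Removed {removed} of {len(psms)} PSMs with purity < {min_purity}")
    return kept
```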

There is, of course, also nonsystematic (random) noise. In our current data (Supplemental Figure 2c), the reliability of the measurements for the proteins with the smallest fold changes is over 50 %, and for those with the largest fold changes it is about 80 %. The reliability is higher for data acquired on newer instruments that use high-quality quadrupole mass-filters, i.e., Q Exactive Orbitraps.
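One way to estimate such reliability is a split-half correlation: quantify each protein from two disjoint subsets of its peptides and correlate the two estimates across single cells. The sketch below illustrates this general approach under assumed column names; it is not the exact computation behind the figure referenced above.

```python
import numpy as np
import pandas as pd

def split_half_reliability(peptides: pd.DataFrame, cell_columns: list[str]) -> pd.Series:
    """Estimate per-protein measurement reliability by split-half correlation.

    peptides: one row per peptide, with a 'protein' column and one column of
              relative peptide levels per single cell (hypothetical names).
    Returns one correlation per protein quantified by at least two peptides.
    """
    rng = np.random.default_rng(0)
    reliabilities = {}
    for protein, group in peptides.groupby("protein"):
        if len(group) < 2:
            continue
        # Randomly split this protein's peptides into two halves
        shuffled = group.sample(frac=1.0, random_state=int(rng.integers(10**9)))
        half = len(shuffled) // 2
        estimate_1 = shuffled.iloc[:half][cell_columns].mean()
        estimate_2 = shuffled.iloc[half:][cell_columns].mean()
        # Correlate the two independent protein estimates across single cells
        reliabilities[protein] = estimate_1.corr(estimate_2)
    return pd.Series(reliabilities, name="split_half_reliability")
```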

 

Can you measure post-translational modifications (PTMs)?

Yes, we can. Stay tuned for the preprint.

 

What is the future potential for building up on SCoPE-MS?

That is my favorite question! We have outlined ideas and technologies that can advance single cell proteomics methods by several orders of magnitude. In short:

  • Throughput: The throughput will grow as we increase the number of mass-tags. These should go up to 16 in the fall. As the demand for single-cell proteomics increases, Thermo or the community will come up with a much higher plex. Since MS does the measurements on groups of identical ions (not individual molecules, as in the case of next-generation DNA sequencing), a higher multiplex will increase the number of quantified samples without affecting the depth of coverage. The higher multiplex will also reduce the need for the carrier channel, first by reducing the number of carrier cells and ultimately by eliminating the need for them.
  • Accuracy: SCoPE-MS does minimal processing of the samples, and the measurement is based on hundreds, even thousands, of ions for the quantification of each peptide in each cell. There are no fundamental limits to achieving very high accuracy. Since proteins are much more abundant than mRNAs (on average over 1000 protein molecules per mRNA), the counting of low-copy-number molecules or ions is much less problematic than in single-cell RNA sequencing. As we improve our ability to deliver and capture all ions, we should be able to measure even the least abundant proteins and expand the depth of coverage tremendously. This is not just a distant promise. I think it is an imminent possibility.

CSHL Meeting: Single Cell Analyses 2017

 

Increasingly direct evidence

The results in our Cell report are particularly satisfying to me since they bring clarity to a puzzle that I have pursued for almost a decade. The puzzle started with an observation that I made while a graduate student in the Botstein laboratory at Princeton University.

As growth rate increases, RPs are transcriptionally induced to varying degrees; some are even repressed.

I studied the transcriptional responses of yeast cells growing across a wide range of growth rates. These data allowed us to evaluate a suggestion that Ole Maaløe had proposed for bacteria over 30 years earlier: cells growing faster should induce the transcription of ribosomal proteins since they need to make more ribosomes to meet the increased demands for protein synthesis. While most mRNAs coding for ribosomal proteins (RPs) exhibited this logical trend (their levels increased with the growth rate), others did not. The RP transcript levels that deviated from the expectation were reproducible across biological replicates and even across the different nutrient limitations used to control the cell growth rate. Furthermore, the number of RP transcripts defying the expectation was even larger when I grew the yeast cells on an ethanol carbon source. I also observed uncorrelated variability in RP transcripts across human cancers, but this observation was based on public data without biological replicates and with many confounding factors.

My observations of differential RP transcriptional induction puzzled me deeply. According to the decades-old model of the ribosome, each ribosome has exactly one copy of each core RP. Thus, the simplest mechanism for making more ribosomes is to induce the transcription of each RP by the same amount, not to induce some RPs and repress others. Still, biology often defies simplistic expectations; one can easily imagine that RP levels are controlled mostly post-transcriptionally. Transcript levels for RPs were enough to pique my curiosity but ultimately too indirect to serve as evidence for the protein composition of the ribosomes. Thus, I neglected the large differences in RP transcriptional responses and interpreted our data within the satisfyingly simple framework suggested by Ole Maaløe. Many other research groups have also reported differential transcription of RP genes, but those observations have the same limitations as my transcriptional data. The puzzle remained latent in my mind until, years later, I quantified the yeast proteome by mass-spectrometry as part of investigating trade-offs of aerobic glycolysis. This time, the clue for an altered protein composition of the ribosomes was at the level of the ribosomal proteins, not their transcripts. While still indirect and inconclusive, I found this observation compelling. It motivated me to design experiments specifically aiming to find out whether the protein composition of the ribosome can vary within a cell and across growth conditions.

The data from these experiments showed that unperturbed cells build ribosomes with different protein compositions that depend both on the number of ribosomes bound per mRNA and on the growth conditions. I find this an exciting result because it opens the door to conceptual questions such as: What is the extent, scope, and specificity of ribosome-mediated translational regulation? What are the advantages of regulating gene expression by modulating the ribosomal composition as compared to the other layers of gene regulation, from histone modifications through RNA processing to protein degradation? Do altered ribosomal compositions involve trade-offs, such as higher translational accuracy at the expense of a lower translation-elongation rate via more kinetic proofreading? Some of these questions may (hopefully will) reveal general principles. These questions are fascinating to speculate about, but they can also be answered by direct measurements. Designing experiments that can rigorously explore and discriminate among different conceptual models should be a lot of fun!

 

CSHL Translational Control Meeting 2016