Accessible single-cell proteomics

Recently, single-cell mass-spectrometry analysis has made it possible to quantify thousands of proteins in single mammalian cells. Yet, these technologies have been adopted by relatively few mass-spectrometry laboratories. Increasing their adoption can help reveal biochemical mechanisms that underpin health and disease, but it requires robust methods that can be deployed in any mass-spectrometry laboratory.

This aim for a “model T” of single-cell proteomics has been the guiding philosophy in the development of Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) and its version 2 (SCoPE2). We aimed to make every step easy to reproduce, from sample preparation and the optimization of experimental parameters to an open-source data analysis pipeline. The emphasis has been on accuracy and accessibility, which has facilitated replication (video) and adoption of SCoPE2. Yet, we found that some groups adopting these single-cell technologies fail to make quantitatively accurate protein measurements because they skip important quality controls of sample preparation (such as negative controls and labeling efficiency) and of mass-spectrometry analysis (such as apex sampling and the purity of MS2 spectra).

These observations motivated us to write a detailed protocol for multiplexed single-cell proteomics. The protocol emphasizes quality controls that are required for accurately quantifying protein abundance in single cells and scaling up the analysis to thousands of single cells. The protocol and its associated video and web resources should make single-cell proteomics accessible to the wider research community.

Label-free single-cell proteomics

Recently, Matthias Mann and colleagues published a preprint (doi: 10.1101/2020.12.22.423933) reporting a label-free mass-spectrometry method for single-cell proteomics. Many colleagues asked me what I think about the preprint, and I summarized a few comments in the peer review below. I did not examine all aspects of the work, but I hope my comments are useful:

Dear Matthias and colleagues,

I found your preprint interesting, especially as it focuses on an area that has recently received much attention. Methods for single-cell protein analysis by label-free mass spectrometry have made significant gains over the last few years, and the method that you report looks promising. Below, I suggest how it might be improved further and benchmarked more rigorously.

To analyze single HeLa cells, you combined the recently reported diaPASEF method with Evosep LC and timsTOF improvements developed in collaboration with Bruker. This is a logical next step and sounds like a good approach for label-free MS. The method quantifies about 1,000 proteins per HeLa cell, a coverage comparable to that of published DDA label-free methods (doi: 10.1039/D0SC03636F) and to a DIA method performed on a Lumos instrument, reported by the Aebersold group (data presented at the third Single-Cell Proteomics Conference). This is good coverage, though given the advantages of diaPASEF and the timsTOF improvements, there is potential for even better performance. I look forward to exploring the raw data.

The major advantage of your label-free MS approach is its speed. It is faster than previously reported label-free single-cell proteomics methods, which allowed you to analyze over 400 single HeLa cells, generating the largest label-free dataset to date. This increased speed is a major advance for label-free single-cell proteomics. The speed (and thus throughput) could be increased further by multiplexing with the isobaric carrier approach.

You combine HeLa data from single-cell MS analysis with HeLa data from two scRNA-seq methods. This is good, and I think such joint analysis of protein and RNA should be an integral part of analyzing single-cell MS proteomics data. The results shown in Fig. 5A,B are straightforward to interpret and indicate that your method compares favorably to scRNA-seq in terms of reproducibility and missing data. The interpretation of Fig. 5C is more confounded by systematic biases. Both mass spectrometry and sequencing have significant biases, such as sequence-specific biases and peptide-specific ionization propensities. These biases contribute to estimates of absolute abundances (doi: 10.1038/nmeth.2031, 10.1038/nbt.2957), might contribute to the variance captured by PC2 in Fig. 5C, and thus may alter your conclusions.

I have a few suggestions:

— Benchmark the accuracy of relative quantification. Ideally, this can be done by measuring protein abundance in single cells with an independent method (such as fluorescent proteins measured by FACS) and comparing those measurements to the MS estimates. You may also choose other methods, such as spiked-in protein/peptide standards. Benchmarks of accuracy (rather than merely reproducibility) would strengthen your study.

— Order the unperturbed HeLa cells by cell division cycle (CDC) phase and display the abundances of the periodic proteins.

— Provide more discussion positioning your work in the context of the field and other approaches, in terms of technology, depth of coverage, throughput, and biological applications.


Nikolai Slavov
slavovlab.net

My experience with elite journals


Over the last 15 years, my opinion about the significance of publishing in elite journals has evolved considerably. Below are some of the main phases and the factors that have shaped my opinion:

  • As a beginning PhD student, I took several classes based on discussing primary research papers. Many of these papers, especially in my biochemistry course, reported milestone results and were published in elite journals. This created the impression that a disproportionate number of influential papers are published in elite journals and that elite journals are the place where one finds such papers.
  • Later in my PhD, I became an expert in a small field. The two best-known papers in that field were published in Science, and I was as confident as I could be that these papers were deeply problematic and incorrectly interpreted. (I still believe that, and I think that Science is a good journal.) Being well known and influential, these papers stymied further developments in the field. It did not help that the senior author of these papers made a death threat against me.
  • Simultaneously, my PhD mentor (David Botstein) expressed strong misgivings about elite journals and even declined to be a senior author on one of my papers if we submitted it to Nature or Science. I have always had deep respect for David (he is a brilliant scientist), and his opinion was very influential for me. Given that he was accomplished and successful by any metric of merit, I became convinced that one does not need to publish in elite journals to be successful in academic research. (David has published many influential milestone papers, but only a small fraction of them appeared in Nature or Science.)

During my postdoctoral and PI years, my opinion about elite journals grew increasingly complex and nuanced. Here is what I think now:

  • Elite journals can significantly promote papers in the short term and make a big difference for papers that do not stand out on their own.
  • The establishment generally resists innovation. For various reasons (mostly not related to the editors), papers reporting very original results seem to encounter much more resistance in the competition for the limited space in elite journals. This may be frustrating for their authors, but it is much less frustrating once we realize that these are the papers that need advertisement the least.
  • Ultimately, I aim to publish our papers in good journals that are read by colleagues interested in our results and that have broad visibility. Yet, my opinion continues to evolve: I see that our results are read, cited, and reproduced even when shared via preprints, and so in my view the importance of the publication venue is declining.

Promoting (your) research

“I am a scientist focused on conducting research, not on promoting it.” This thinking strongly resonated with me when I was a PhD student. If it resonates with you, read on to learn why and how you should promote (your) research.

Common approaches to promoting your research include aiming to publish your papers in elite journals (they invest in advertising, and some employ professionals dedicated to it) and attending conferences. I will not focus on these approaches because they are commonly known and often used more than I think is useful. Rather, I prefer the more thoughtful and organic approaches outlined below.

The first (and my favorite) approach is clear and compelling communication. If you make important discoveries but fail to communicate them clearly and compellingly, you may be the only one (or one of the few people) who knows they are important. Such results are unlikely to drive scientific progress. Clear communication means clear logic, without hype, vague phrases, or unnecessary adjectives. It means good framing, with the relevant background needed to understand the questions and approaches, and without extraneous clutter or meandering into tangential discussions.

The second and related approach is to communicate your results to the communities interested in them, which includes presenting at relevant conferences. It also includes scientific social networks. Since the most prominent of those is Twitter, I will make a few suggestions with Twitter in mind:

  • When I tweet about a colleague’s paper, I think of the tweet as a mini (and thus very limited) peer review that highlights substantive findings or elements of the paper. The format does not allow rigorous scholarly treatment, but it does allow pointing to something specific that you genuinely find exciting. If you tell me you are excited about sharing your paper or publishing it in a particular journal, I do not learn new scientific information. Make your tweet as informative about the science as you can.
  • Promote all good work that you come across. This includes your work, but also the work of your colleagues more broadly. I think of it as a good service to the scientific community.
  • I particularly like highlighting research that otherwise may not get noticed. Papers that are promoted by a sophisticated advertising system don’t need my help as much.

As you can tell from these brief remarks, my definition of promoting research is enhancing its communication, both through the formal and rigorous description of the research itself and through thoughtful and informative comments that attract attention to that formal description. I think such communication is an important component of the scientific ecosystem, and I strongly encourage all students to participate in it. It helps you, and it helps your scientific community.

Bibliography for NIH proposals

NIH requires adding PMCIDs to references cited in proposals, and PMCIDs may be difficult to add because many reference managers and citation styles omit them. As a result, some colleagues add them manually, which is rather time-consuming. To avoid this drudgery, I wrote a couple of scripts, available from this GitHub repository, that add PMCIDs automatically to proposals typeset with LaTeX/BibTeX/Biber. The scripts use each paper’s DOI to find the corresponding PMCID and then add both types of IDs to each reference that has them.

Scripts and usage

  • The script get_doi.pl extracts digital object identifiers (DOIs) from the bbl file. Run it as: perl get_doi.pl Your_bbl_file.bbl. It will output the DOIs in a file named DOIs_Your_bbl_file.bbl.
  • Take the DOIs from DOIs_Your_bbl_file.bbl and input them into https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/ to retrieve all corresponding PMCIDs in batch. Save the output as a CSV file named ids.csv, which is an export option of the website.
  • The script add_PMCID.pl adds the PMCIDs to the bbl file. Run it as: perl add_PMCID.pl ids.csv Your_bbl_file.bbl. This will output a file named _Your_bbl_file.bbl, in which a PMCID is appended to each reference whose DOI has a corresponding PMCID. A simplified sketch of this step follows below.
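
For readers who want to see the logic before adapting it, here is a minimal, hypothetical Perl sketch of the add_PMCID step. It is not the actual add_PMCID.pl from the repository, which handles more edge cases; in particular, the CSV column order (assumed here to be PMID,PMCID,DOI) and the DOI regular expression are assumptions to check against your own files, as noted under Customization below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the add_PMCID.pl logic; the actual script in
# the repository differs in details and handles more edge cases.
# Usage: perl sketch.pl ids.csv Your_bbl_file.bbl
my ( $ids_csv, $bbl_file ) = @ARGV;

# Load the DOI -> PMCID map from the CSV exported by the NCBI converter.
# Adjust the column order if your CSV header differs (assumed: PMID,PMCID,DOI).
my %pmcid_for;
open my $csv, '<', $ids_csv or die "cannot open $ids_csv: $!";
while ( my $row = <$csv> ) {
    chomp $row;
    my ( $pmid, $pmcid, $doi ) = split /,/, $row;
    next unless defined $doi && defined $pmcid && $pmcid =~ /^PMC\d+/;
    $pmcid_for{ lc $doi } = $pmcid;
}
close $csv;

# Walk the bbl file; when a reference line prints a DOI that has a
# PMCID, append the PMCID to that line. The DOI regex must match how
# your bibliography style prints DOIs -- the part most likely to need
# customization.
open my $in,  '<', $bbl_file    or die "cannot open $bbl_file: $!";
open my $out, '>', "_$bbl_file" or die "cannot write _$bbl_file: $!";
while ( my $line = <$in> ) {
    if ( $line =~ m{\b(10\.\d{4,9}/[^\s,;"\}]+)} ) {
        my $pmcid = $pmcid_for{ lc $1 };
        if ($pmcid) {
            chomp $line;
            $line .= " PMCID: $pmcid.\n";
        }
    }
    print $out $line;
}
close $in;
close $out;
```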

Working example
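
Assuming the proposal’s bibliography compiles to a file named refs.bbl (a hypothetical name), the full workflow would be:

  • perl get_doi.pl refs.bbl extracts the DOIs into DOIs_refs.bbl.
  • Paste the contents of DOIs_refs.bbl into the PMC ID converter linked above and save the batch output as ids.csv.
  • perl add_PMCID.pl ids.csv refs.bbl produces _refs.bbl with the PMCIDs appended.
  • Compile the final PDF with the updated bibliography, e.g., after replacing refs.bbl with _refs.bbl.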

Customization

  • This strategy requires that your bibliography style outputs DOIs in the bbl file, so any modifications should preserve this feature.
  • Depending on how the DOIs are formatted in your bbl file, you may need to adjust the regular expressions in the Perl scripts.

The cost of omics

Next-generation DNA sequencing is ubiquitous in modern biomedical research, while mass-spectrometry proteomics remains far less integrated. In fact, mass-spectrometry proteomics is conspicuously missing from projects that desperately need it.

Why is DNA sequencing better integrated with biomedical research? This question comes up often in my conversations with colleagues. A commonly suggested answer is the difference in cost. I am not convinced by this answer, so I decided to evaluate it with a bit more rigour than in my usual casual conversations. The metric of merit for the comparison will be the cost of quantifying 10,000 genes in a sample at the transcriptome level and at the proteome level.

This simple metric has many dimensions, such as quantification of proteoforms and transcript isoforms, that are beyond the scope of my comparison. Also, both RNA and protein analysis can use different analytical methods, and the cost will vary somewhat between methods. So, my estimates will be based on high-quality economical options and on representative facility fees charged in Boston, MA.

Proteomics

A good and economical option for quantifying >10,000 proteins in a sample is TMT 16-plex with offline fractionation and DDA analysis, which needs about 1–2 hours of instrument time per sample. Another up-and-coming option is label-free DIA analysis, which also needs about 1–2 hours of instrument time per sample but at present will struggle to quantify 10,000 proteins per sample. With facility fees of about 100–200 USD per hour, the cost of analysis is about 200–400 USD per sample. This cost does not include sample preparation, which is relatively simple and which almost any biology lab can perform in house. The reagents for sample prep cost less than 100 USD, but I will add 100 USD to the cost of instrument time (which also covers offline fractionation). So the final estimate is 300–500 USD per sample.

Transcriptomics

For RNA sequencing, I will base the estimate on an Illumina NextSeq 500 run with 150-cycle paired-end reads. The cost for a run is about 3,000–4,000 USD, which provides 200–400 million reads. Depending on the number of reads per sample, one can analyze a different number of samples per run. For RNA data of quality comparable to that of the mass-spec data, I will assume that we need 15 million reads per sample and that we can analyze about 20 samples per run. Fewer reads per sample can reduce the cost while still providing usable, albeit less quantitative, data. Again, sample prep can be performed in house for much less than 100 USD per sample and is more expensive if performed at a facility. So the final cost estimate is lower than, but comparable to, the one for mass-spec proteomics: about 250–350 USD per sample.
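
To make the arithmetic behind both estimates explicit, here is a minimal sketch that simply multiplies out the assumptions stated above. All numbers come from the text, not from any particular facility’s price list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Proteomics: TMT 16-plex, offline fractionation, DDA.
# Assumptions: 1-2 instrument hours/sample at 100-200 USD/hour,
# plus ~100 USD for in-house sample prep and offline fractionation.
my ( $hours_lo, $hours_hi ) = ( 1, 2 );
my ( $fee_lo,   $fee_hi )   = ( 100, 200 );    # USD per instrument hour
my $ms_prep = 100;                             # USD per sample
printf "Proteomics:      %d-%d USD/sample\n",
    $hours_lo * $fee_lo + $ms_prep,            # 300
    $hours_hi * $fee_hi + $ms_prep;            # 500

# Transcriptomics: NextSeq 500 run, 150-cycle paired-end reads.
# Assumptions: 3000-4000 USD/run, ~300 million reads/run,
# 15 million reads/sample, plus ~100 USD in-house library prep.
my ( $run_lo, $run_hi ) = ( 3000, 4000 );      # USD per run
my $samples_per_run = int( 300e6 / 15e6 );     # = 20 samples
my $rna_prep = 100;                            # USD per sample
printf "Transcriptomics: %d-%d USD/sample\n",
    $run_lo / $samples_per_run + $rna_prep,    # 250
    $run_hi / $samples_per_run + $rna_prep;    # 300; costlier facility
                                               # prep pushes this toward 350
```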

[Figure: Cost of transcriptomics and proteomics analysis of a sample.]

Data analysis

The above estimates considered only the cost of data generation, while the cost of analysis (human hours) is the dominant expense for many studies. The analysis cost is quite similar for DNA-sequencing and mass-spectrometry data, and it varies more with the number of samples and the aims of the analysis.

Why the difference?

The above rough estimates suggest that cost may not be the main reason why research projects that need mass spectrometry do not use it. If not cost, then what? Here are my main hypotheses:

  • Availability of service: While good mass-spec labs can quantify > 10,000 proteins as sketched above, relatively few facilities can accomplish that. Finding a DNA sequencing facility is easier than finding a mass-spec facility that can perform the analysis outlined above.
  • Knowledge: Many biologists are familiar with next-generation sequencing and its capabilities, while fewer are familiar with mass spectrometry. This reflects both how long the technologies have had to mature and sociological factors. So, I am passionate about making mass-spec technology accessible and about teaching its basic concepts.

What do you think? Why the difference?

Research success

There can be many definitions of successful research and many factors that contribute to it. Discussing all of these comprehensively could fill several books and is beyond the aims of this post. Rather, I want to focus on the two factors that I consider most important for success as I define it: accelerating the rate of progress and making discoveries that stand out and stand the test of time.

Vision

The first factor is the direction that we choose for our research. Choosing a worthwhile direction is essential: no amount of work can compensate for a misguided direction. Choosing a fruitful and original direction is very hard, and I consider it one of the most limiting factors in advancing biomedical research at the moment. By fruitful, I mean a question that is worth asking and that can be meaningfully answered with existing tools and resources. By original, I mean a question not already pursued by investigators with similar skill sets and tools.

People and culture

The second factor is the people: both their individual abilities and their ability to work as a team. The pool of colleagues whom a PI can successfully recruit is defined by the research itself: the vision, the tools, and past success. Having recruited strong people, a PI should help them grow both individually and as members of a team pursuing a shared vision. This mentoring is the second major role of a PI (in addition to coming up with a vision) and deserves a dedicated future post.

Of course, other factors can be influential and even intertwined with the vision and the people seeking to realize it; e.g., prestige and resources can help attract more capable people. Still, I consider all other factors secondary. Overthinking some of these secondary factors may waste precious time. For example, should you buy all equipment as soon as you start your lab, or buy equipment when you need it? I am not sure how to determine which is better, but I think either way can work well, and the benefit of the “better” option is unlikely to make the difference between success and failure, or even to deserve the time spent on deliberation.

So, I think we — the research community — can be much more successful if we invest more time and effort in what matters: Coming up with original new leads and helping each other grow as scientists and people. Happy & successful new year to everybody!

The bigger picture

It is tempting to think of scientific progress in discrete units: papers. Indeed, graduate students often devote many years to a single paper, and it looms large. Its significance may then be codified by neat numeric metrics. Yet, this view is rather myopic. Revising it changes not only the way we feel about our work (principles) but also the way we choose to communicate research (practice).

Some papers stand out as exceptionally important, but even such exceptional papers depend critically on a body of related research. Take, for example, Einstein’s theories of special and general relativity. As brilliant as they are, they would mean very little without the follow-up experimental papers that confirmed the theoretical predictions. More generally, a single paper can never establish a big discovery. A paper can report a big discovery, but the discovery will not be established until it is independently reproduced and cross-validated in subsequent studies.

What about the more typical paper? It is part of a continuous body of research that is reported in discrete units mostly because of old customs. The discrete units are intertwined to shape a bigger picture, and thus the significance of a single unit depends intimately on its role in the bigger picture. This thinking shifts the focus from the success of a particular paper to the success of the overall research agenda: if the overall body of work is visible and important, so are its components, even if some of them are published in relatively obscure journals. Thus, if your research is important and at least some parts of it are visible, the need for, or benefit of, publishing a particular part of it in a top journal is relatively small.

When do you need single-cell analysis?

Single-cell analysis is trendy for good reasons: it has enabled asking and answering important questions. Of course, the substantive reasons are surrounded by much hype. Sometimes colleagues tell me they want to add single-cell RNA-seq analysis because it will help them publish their paper in a more prestigious journal, and sadly there is perhaps more truth to that than I want to believe.

On the other end of the spectrum, some colleagues from the mass-spec community are puzzled by our efforts to develop methods for single-cell mass-spec analysis: At HUPO, I have been repeatedly asked: “Why analyze single cells when you can identify more peptides in bulk samples?”

So, when do we need single-cell analysis? Can’t we just FACS-sort cells based on markers and analyze the sorted cells? Indeed, that may be a good strategy when the cells we analyze fall into relatively homogeneous clusters (they will never be perfectly homogeneous) and we have a reliable marker for each cluster. If these assumptions hold, averaging out the differences between individual cells gives us very useful coarse-graining. Unfortunately, bulk analysis of the sorted cells cannot validate the assumption of homogeneity. For example, we can easily sort B cells and T cells from blood samples because we have well-defined markers for each cell type. However, the bulk analysis of the sorted cells will not provide any information on the homogeneity of the sorted T cells. Yet, a wealth of single-cell analysis has demonstrated the existence of multiple states within T-cell subpopulations, states for which we rarely have well-defined markers allowing efficient FACS sorting and follow-up bulk analysis.

FACS sorting is especially inadequate when cell heterogeneity is not easily captured by discrete subpopulations or clusters of cells. Consider, for example, the continuous gradient of macrophage states that we recently observed in our SCoPE2 data:

[Figure: To explore the heterogeneity within the macrophage-like cells, we sorted them based on the Laplacian vector. See Specht et al., 2019 for details.]

In some cases, e.g., analysis of small clonal populations, the benefits of single-cell analysis may be too small to justify the increased cost. Sometimes, we can gain single-cell information from analyzing small groups of cells, e.g., Shaffer et al., 2018. Sometimes, nobody can be sure whether single-cell analysis is needed. If we assume it is needed and perform it, the data can refute our assumption and show us that there is not much heterogeneity, at least at the level of what we could measure. If we assume that there is no heterogeneity and thus no need for single-cell analysis, e.g., FACS-sort T cells, the bulk analysis of the sorted cells will not correct our assumption. We can feel the assumption is validated while being blinded to what might be the most meaningful cellular diversity in the system. So, single-cell analysis is not always needed, but it is much better at correcting our assumptions and teaching us whether it is needed or not.

Direct causal mechanisms

Understanding biological systems: In search of direct causal mechanisms

The advent of DNA microarrays spurred a vigorous effort to reverse-engineer biological networks. Recently, these efforts have been reinvigorated by the availability of RNA-seq data from perturbed and unperturbed single cells. In the talk below, I discuss the opportunities and limitations of using such data for inferring networks of direct causal interactions, with emphasis on the distinctions between models based on direct and indirect interactions. This discussion motivates the need to model proteins, since most biological interactions involve proteins. I then introduce key ideas and technological capabilities of the high-throughput single-cell proteomics methods that we have developed and focus on the opportunities of using such data for inferring direct causal mechanisms in biological systems.

Some of the ideas that I discuss above involve single-cell mass-spec measurements with SCoPE-MS. Thus, if you do not have a strong background in quantitative mass spectrometry, you may first want to learn some of the key ideas that make SCoPE-MS possible from this primer by Harrison Specht. Below is its summary.

Quantifying proteins by mass-spec

Mass spectrometry-based proteomics is a suite of high-throughput and sensitive approaches for identifying and quantifying proteins in biological samples. These methods allow for quantifying >10,000 proteins in bulk samples. However, these techniques have not yet been widely applied to single cells despite the fact that modern mass spectrometers can detect single ions. To explain why, this primer talk will explore core concepts of mass spectrometry-based proteomics with emphasis on developing intuition for the physical processes underpinning peptide sequencing and quantification. In particular, I will cover what is called “shotgun” or “discovery” proteomics using isobaric barcoding, a technology used by Single Cell Proteomics by Mass Spectrometry (SCoPE-MS). The primer talk will outline the obstacles that have limited the broad application of quantitative mass spectrometry to single-cell analysis and how SCoPE-MS overcomes these obstacles to enable profiling thousands of proteins across thousands of single cells.