Research fest

Academics like to discuss paper rejections and gatekeeping. The other end of this spectrum is highlighting research that deserves our attention.

Let’s promote good research: Share it in accessible and engaging ways. Put it in context and help your colleagues appreciate it. The more we can put substance ahead of hype, the more science and our colleagues benefit from our highlights.

Below are links to tweets about papers that I find interesting and worth sharing!

Accessible single-cell proteomics

Recently, single-cell mass-spectrometry analysis has made it possible to quantify thousands of proteins in single mammalian cells. Yet these technologies have been adopted in relatively few mass-spectrometry laboratories. Increasing their adoption can help reveal biochemical mechanisms that underpin health and disease, and it requires robust methods that can be deployed in any mass-spectrometry laboratory.

This aim for a “model T” of single-cell proteomics has been the guiding philosophy in the development of Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) and its version 2 (SCoPE2). We aimed to make every step easy to reproduce, from sample preparation and optimization of experimental parameters to an open-source data analysis pipeline. The emphasis has been on accuracy and accessibility, which has facilitated replication (video) and adoption of SCoPE2. Yet we still found that some groups adopting these single-cell technologies fail to make quantitatively accurate protein measurements because they skip important quality-control steps, both in sample preparation (such as negative controls and labeling efficiency) and in mass-spectrometry analysis (such as apex sampling and purity of MS2 spectra).

These observations motivated us to write a detailed protocol for multiplexed single-cell proteomics. The protocol emphasizes quality controls that are required for accurately quantifying protein abundance in single cells and scaling up the analysis to thousands of single cells. The protocol and its associated video and web resources should make single-cell proteomics accessible to the wider research community.

Magnanimity pays off

Earlier this year, I read an inspiring recollection (by Sydney Brenner) of a grand scientific milestone: the elucidation of the genetic code. How do DNA nucleotides code for the amino-acid sequence of proteins? This fundamental question had captivated numerous scientists, including Francis Crick and Sydney Brenner. The punchline of this wonderful interview/recollection is a magnanimous act by Francis Crick:

In August 1961, more than 5,000 scientists came to Moscow for five days of research talks at the International Congress of Biochemistry. A couple of days in, Matt Meselson, a friend of Crick’s, told him the news: The first word of the genetic code had been solved, by somebody else. In a small Friday afternoon talk at the Congress, in a mostly empty room, Marshall Nirenberg, an American biochemist and a complete unknown to Crick and Brenner, reported that he had fed a single repeated letter into a system for making proteins, and had produced a protein made of repeating units of just one of the amino acids. The first word of the code was solved. And it was clear that Nirenberg’s approach would soon solve the entire code.

Here’s where I like to imagine what I would have done if I were Crick. For someone driven solely by curiosity, Nirenberg’s result was terrific news: The long-sought answer was arriving. The genetic code would be cracked. But for someone with the human urge to attach one’s name to discoveries, the news could not have been worse. Much of nearly a decade’s worth of Crick and Brenner’s work on the coding problem was about to be made redundant.

I’d like to believe I would have reacted honorably. I wouldn’t have explained away Nirenberg’s finding to myself, concocting reasons why it wasn’t convincing. I wouldn’t have returned to my lab and worked a little faster to publish my own work sooner. I’ve seen scientists react like this to competition. I’d like to believe that I would have conceded defeat and congratulated Nirenberg. Of course, I’ll never know what I would have done.

Crick’s response was, to me, remarkable and exemplary. He implored Nirenberg to give his talk again, this time to announce the result to more than 1,000 people in a large symposium that Crick was chairing. Crick’s Moscow meeting booklet survives as an artifact of his decision, with a hand-written “Nirenberg” in blue ink, and a long arrow inserting into an already-packed schedule the scientist who had just scooped him. And when Nirenberg reached the stage, he reported that his lab had just solved a second word of the code.

by Bob Goldstein 

I admire Crick’s reaction. It is very honorable. In the long run, it helped both science and Crick’s reputation. Nirenberg had a correct result and, sooner or later, he was going to receive credit for it. Crick sped this up, and in doing so he only added to his own credit. Our current admiration for Crick’s reaction at the Moscow conference is the only proof I need.

Any interpretation that sees Crick’s magnanimous act as being good only for the science but bad for Crick’s personal reputation is myopic; it misses the long run. It misses my (and hopefully your) opinion of Crick’s magnanimous act.

Premature human engineering

The news is abuzz with excitement about human genome editing, even human germline engineering. Successful germline engineering requires (1) a technology for editing DNA safely and (2) knowledge of what to edit and how to edit it, based on an understanding of the underlying biology. We are approaching (1), which is the easier part; we do not have (2), and we are far from achieving it for most desired “edits”.

A huge hurdle to germline engineering is that, beyond a few simple cases, our understanding does not allow achieving desired effects while avoiding unintended consequences. Unlike DNA sequencing, silicon chips and DNA editing, our understanding of complex combinatorial multi-gene interactions has made very little progress over the last few decades. Until we make more progress and better understand gene interactions and the respective health outcomes, germline engineering is akin to medieval quack therapies: they had the technology to bleed patients and feed them various concoctions, but very limited understanding of the medical consequences, and they produced plenty of unintended consequences. We can fix the unintended consequences later and then fix the unintended consequences from the fixing, and we will keep trying!

Deceptive Numbers

You want to estimate an important quantity. You compute an exact number purporting to estimate it. You compute another exact number purporting to estimate it. The two numbers differ significantly. The only logical conclusion is that these estimates are less exact than they seem.

This clearly seems to be the case with the notion of “impact” as quantified by different metrics that purport to estimate the same quantity from the same data:


Perhaps a good antidote to innumeracy is playing with data interactively. You can search these data yourself and see how different metrics of impact may differ by over 300%!
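To make the arithmetic concrete, here is a minimal sketch of how one might quantify the disagreement between two metrics that purport to estimate the same quantity. The item names and metric values are made up for illustration; they are not the data referred to above, and `percent_difference` is a hypothetical helper.

```python
def percent_difference(a: float, b: float) -> float:
    """Percent by which the larger of two estimates exceeds the smaller one."""
    low, high = sorted((a, b))
    if low <= 0:
        raise ValueError("estimates must be positive")
    return 100.0 * (high - low) / low


# Hypothetical values of two impact metrics for the same items (illustrative only).
metrics = {
    "journal_A": (4.0, 5.0),
    "journal_B": (2.0, 8.5),
    "journal_C": (10.0, 9.0),
}

# Disagreement between the two metrics, item by item.
disagreement = {
    name: percent_difference(m1, m2) for name, (m1, m2) in metrics.items()
}

# The item on which the two metrics disagree the most.
worst = max(disagreement, key=disagreement.get)

for name, pct in disagreement.items():
    print(f"{name}: the two metrics differ by {pct:.0f}%")
print(f"Largest disagreement: {worst}")
```

If two numbers computed from the same data can disagree this much, neither deserves to be reported with more precision than that disagreement allows.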


Increasingly direct evidence

The results in our Cell report are particularly satisfying to me since they bring clarity to a puzzle that I have pursued for almost a decade. The puzzle started with an observation that I made while a graduate student in the Botstein laboratory at Princeton University.

As growth rate increases, RPs are transcriptionally induced to varying degrees; some are even repressed.

I studied the transcriptional responses of yeast cells growing across a wide range of growth rates. These data allowed us to evaluate a suggestion that Ole Maaløe had proposed for bacteria over 30 years earlier: cells growing faster should induce the transcription of ribosomal proteins since they need to make more ribosomes to meet the increased demand for protein synthesis. While most mRNAs coding for ribosomal proteins (RPs) exhibited this logical trend (their levels increased with the growth rate), others did not. The RP transcript levels that deviated from the expectation were reproducible across biological replicates and even across the different nutrient limitations used to control the cell growth rate. Furthermore, the number of RP transcripts defying the expectation was even larger when I grew the yeast cells on ethanol as a carbon source. I also observed uncorrelated variability in RP transcripts across human cancers, but this observation was based on public data without biological replicates and with many confounding factors.
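The comparison described above can be sketched as a per-gene fit: regress each transcript's log level on growth rate and check the sign and size of the slope. The growth rates, gene labels and expression values below are invented for illustration, and the least-squares fit is one simple choice, not necessarily the analysis used in the original study.

```python
def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical growth rates (per hour) at which the cultures were sampled.
growth_rates = [0.05, 0.10, 0.20, 0.30]

# Hypothetical log2 transcript levels, relative to the slowest growth rate.
log2_levels = {
    "RP_gene_1": [0.0, 0.5, 1.4, 2.1],     # strongly induced
    "RP_gene_2": [0.0, 0.3, 0.9, 1.5],     # induced, but less so
    "RP_gene_3": [0.0, -0.2, -0.6, -1.1],  # repressed as growth rate increases
}

# Growth-rate slope for each transcript.
slopes = {gene: slope(growth_rates, levels) for gene, levels in log2_levels.items()}

# Transcripts that defy the expectation of induction with growth rate.
repressed = [gene for gene, s in slopes.items() if s < 0]

for gene, s in slopes.items():
    print(f"{gene}: growth-rate slope = {s:+.2f} log2 units per 1/h")
```

If every RP were induced by the same amount, all slopes would be equal; unequal or negative slopes are the kind of deviation that the puzzle is about.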

My observations of differential RP transcriptional induction puzzled me deeply. According to the decades-old model of the ribosome, each ribosome has exactly one copy of each core RP. Thus, the simplest mechanism for making more ribosomes is to induce the transcription of each RP by the same amount, not to induce some RPs and repress others. Still, biology often defies simplistic expectations; one can easily imagine that RP levels are controlled mostly post-transcriptionally. Transcript levels for RPs were enough to pique my curiosity but ultimately too indirect to serve as evidence for the protein composition of the ribosomes. Thus, I neglected the large differences in RP transcriptional responses and interpreted our data with the satisfyingly simple framework suggested by Ole Maaløe. Many other research groups have also reported differential transcription of RP genes, but these observations have the same limitations as my transcriptional data.

The puzzle remained latent in my mind until, years later, I quantified the yeast proteome by mass spectrometry as part of investigating trade-offs of aerobic glycolysis. This time, the clue for altered protein composition of the ribosomes was at the level of the ribosomal proteins, not their transcripts. While still indirect and inconclusive, I found this observation compelling. It motivated me to design experiments specifically aiming to find out whether the protein composition of the ribosome can vary within a cell and across growth conditions.

The data from these experiments showed that unperturbed cells build ribosomes with different protein compositions that depend both on the number of ribosomes bound per mRNA and on the growth conditions. I find this an exciting result because it opens the door to conceptual questions such as: What is the extent, scope and specificity of ribosome-mediated translational regulation? What are the advantages of regulating gene expression by modulating the ribosomal composition as compared to the other layers of gene regulation, from histone modifications through RNA processing to protein degradation? Do altered ribosomal compositions offer trade-offs, such as higher translational accuracy at the expense of lower translation-elongation rate via more kinetic proofreading? Some of these questions may (hopefully will) reveal general principles. These questions are fascinating to speculate about, but they can also be answered by direct measurements. Designing experiments that can rigorously explore and discriminate among different conceptual models should be a lot of fun!


CSHL Translational Control Meeting 2016


Science and failures of authority

One of the key milestones in the emergence of the scientific method was placing empirical evidence above authority. This shift is beautifully articulated by both William Harvey and Nicolaus Copernicus. Both of them tactfully and forcefully defended their choice to side with the empirical results even when they contradicted the most venerable authorities in their respective fields. These early examples and many later ones, however, do not indicate that empirical evidence always trumps authority, at least not on the scales of years and decades. Even contemporary science provides compelling examples of highly accomplished scientists, i.e., venerable authorities, trumping empirical evidence, and thus halting progress for years. A few prominent examples are listed below. Suggestions for other examples are most welcome!

1) Linus Pauling ridiculed the existence of quasicrystals (demonstrated by electron diffraction data) by saying: “There is no such thing as quasicrystals, only quasi-scientists.” This scepticism held back the acceptance of quasicrystals.

2) William Shockley opposed continuing research on silicon-based semiconductors, the strategy that enabled the rise of the semiconductor industry, microprocessors and modern electronics.

3) Wilhelm Ostwald strongly opposed the atomic theory of matter, delaying its acceptance until the experiments of Jean Perrin and Ernest Rutherford.

4) Charles Hapgood rejected the continental drift theory in his influential book The Earth’s Shifting Crust, which featured a foreword by Albert Einstein. This rejection, alongside other authoritative rejections unsubstantiated by data, delayed the acceptance of the continental drift theory by decades.

5) Many senior scientists opposed the existence of prions and delayed scientific progress in the field for many years. This vehement opposition is described in Madness and Memory: The Discovery of Prions—A New Biological Principle of Disease, by Stanley Prusiner.

Linus Pauling, William Shockley, and Wilhelm Ostwald made some of the most important contributions to modern science, and these contributions have been deservedly recognized as such. Their demonstrated abilities and authority should bring attention to their arguments and opinions; however, their abilities and authority should not change the standards of evidence appraisal. That is, the opinions and arguments of accomplished scientists may deserve more attention but not authority-based acceptance; the evidence should be evaluated based on objective standards of appraisal.

This larger point about the relationship between evidence appraisal, bias and authority relates to recent debates on whether the authors of a research article should be revealed to its peer reviewers. My opinion is that the identity of the authors should have very little, if any, influence on the peer reviewers. I understand that some labs are known for having expertise in certain fields or for being very careful, and this prior information may be relevant. However, even data from such labs should stand on their own and not be contingent on prior reputations. More importantly, the data from labs without a long track record should be taken seriously and evaluated with reasonable and objective standards. Thus, I think that the identity of the authors should not be an influential factor in evaluating their research results. However, anonymizing the authors is not practical, especially since it is incompatible with the publication of preprints. The benefits of preprints (in my opinion) vastly outweigh the benefits of hiding author identities.

Which publications are citable?

The number of references to a scientific publication is frequently used as an objective measure for the significance of the publication. This metric is far less precise than it may appear and in the short/medium-term it certainly fails to capture the most visionary and creative research. Consider, for example, that the publications of Richard Feynman are currently cited 6-10 times more frequently than during the peak of Feynman’s career. The respective comparison for the citations to publications by Albert Einstein is even more extreme. Still, citations are used as an influential metric, and thus the community standards of what is citable are very influential as well.

The community standards should be set by a community-wide discussion. Here, I will express my opinion in the hope of stimulating discussion and soliciting more opinions. To be citable, a publication must (i) be permanent and traceable (e.g., it must have a DOI), and (ii) provide scientific support for the claim it is cited to support. The second criterion is loaded and needs clarification. By scientific support, I mean data, reasoning and theoretical/computational results that are verifiable and refutable. Crucially, this assessment of “scientific support” must be made by the authors referring to the work. Most magazines and journals do not publish the names of the editors and the peer reviewers, or even the contents of the peer reviews. This type of hidden assessment and the anonymous people involved in it cannot possibly assume responsibility for the scientific merits of what is published. The scientific merits of a paper and the extent to which it provides scientific support must be evaluated by the authors referring to it.

These two criteria of what is citable apply equally to traditional papers having undergone open/hidden peer-review and to preprints uploaded on permanent servers guaranteeing timestamps and traceability. In fact, the data suggest that some communities have long recognized and adopted these standards. To my delight, I saw much enthusiasm among broader communities who have not yet adopted them:

I am optimistic that scientists will embrace their duties of independent critical assessment of the publications they refer to. I would love to hear your thoughts and ideas on what should be the criteria and the community practices for citing scientific publications.


I have heard many biomedical researchers express the opinion: “In biology, ideas are cheap. It is the doing that matters.” I do not share this opinion. However, I would like to understand it, particularly since it seems quite prevalent and shared by prominent professors and institutional directors.

One aspect contributing to this thinking is perhaps the fact that biological systems are complex, and this complexity makes it hard for ideas and theories alone to resolve important problems. Of course there are prominent examples to the contrary, such as kinetic proofreading, but the majority of prominent and widely-celebrated triumphs have been driven by experimental data, increasingly experimental data acquired by large consortia.

The main aspect, I think, is what people mean by the word “ideas”. In biomedical research, ideas for informative and low-risk next-step experiments float in the community, at conferences and in review articles. Securing the resources to implement these ideas first and implementing them well is crucial to the realization of these kinds of ideas. However, there are also the non-trivial and original ideas, such as kinetic proofreading. It is clearly this interpretation of the word “idea” that Albert Einstein had in mind when he said:

You ask me if I keep a notebook to record my great ideas. I’ve only ever had one.

I think that ideas of this kind are what our data-rich discipline most prominently lacks. They are a scarce and essential element of research. They are precious!

Evolving scientific culture

Technological progress is continuously expanding our ability to measure and study new phenomena at ever higher sensitivity and accuracy. This progress is empowering research and sometimes drives new discoveries. Such progress comes at a high price: expensive equipment and ever-increasing dependence on resources. Resource-driven investigations are becoming common enough to stimulate the appearance of specialized types of articles in many journals. Resource-driven research contrasts with idea-driven research, e.g., the Luria–Delbrück experiment, general relativity, and Feynman diagrams. Most research is driven both by ideas and by resources, and the contrast between the extremes (principal eigen-components) is useful mostly for emphasising the evolving shift in their relative contributions.

Both idea-driven and resource-driven research can be very productive. However, they demand different sets of skills and create very different cultures. Privileged background and political skills are far more important for resource-driven research than for idea-driven research. Consequently, resource-driven research is less conducive to a meritocratic culture. Furthermore, priority is generally harder to assign objectively in resource-driven research, which invites political attributions; see this excellent post and discussion by Arjun Raj. Resource-driven research usually involves many repetitive steps that make for a suboptimal learning experience for graduate students. On the other hand, resource-driven research is fairly “safe” in terms of producing visible and popular publications, soliciting future funding, and building careers.

The shift towards greater resource dependence is sometimes demanded even by the most creative, idea-driven research; some brilliant and original ideas are impotent unless combined with empirical data whose collection requires resources. I consider this use of resources essential. Beyond this essential use of resources, much resource-driven research seems to be motivated by the relative safety of this approach for those with power and resources. Some of the increasing reliance on resources seems to be autocatalytic; as the scientific culture evolves, some scientists seem to become more accustomed to the inequality. As one prominent PI put it: “The world is unfair. That is nothing new and nothing to worry about”. I find this attitude defeatist. While perfect fairness is hard to define and perhaps impossible to achieve in practice, this should not be a reason for resignation or for a lack of motivation to improve the system as much as we can.