Birthdays, black magic and codon optimization

There are certain norms that should not be questioned. I mean, take birthdays. Personally, I love my birthday: it’s in early May and normally there’s beautiful spring weather. Sometimes it overlaps with a May Day bank holiday. All in all, pretty good. However, I know people who really hate the date of their birthday, and so, a couple of years ago, I came up with a great idea: what if we could choose the day we celebrate our birthdays? Every year you could pick a different date (or the same, if you wanted), and obviously, you could pick only one day per calendar year, so as not to deflate the value of birthdays, as happens with unbirthdays.

It seems a simple and practical solution to the birthday problem to me, but when I pitch this idea to people, many are up in arms about it. There seems to be a widespread belief that celebrating your birthday on the anniversary of the day you were born is a custom set in stone, which must not, under any circumstances, be changed. Of course, this is not true. For refugees who cannot prove their date of birth, immigration officials often put 1st January as a placeholder. Some immigrants embrace this new birthday as a symbol of a fresh start. Then there are religions in which celebrating birthdays is not part of the culture anyway. So, really, if you look at the bigger picture, birthdays don’t have to be on the day you were born at all. But if you’ve grown up following this tradition, normally you don’t question it.

Similar set-in-stone “norms” and traditions also abound in science. For example, every lab I’ve been to has had a different protocol for transforming RbCl heat-shock-competent E. coli cells, and they’ve all insisted their protocol is best. Protocols differ in the time used for heat shock and recovery, and in whether cells are left (at room temperature or on ice) for a couple of minutes after the heat shock or not, and one lab mate even swore that using green tape made all the difference. I wouldn’t be surprised if I found a lab that thinks bacteria should be harvested on a full moon to make transformation more efficient. There seems to be a lot of black magic around…

The most recent “norm” that’s been pissing me off is codon optimization in mammalian gene expression. Honestly, nowadays everybody seems to be codon optimizing, regardless of whether it’s for overexpression, phenotype rescue or genome editing, and no matter whether they are expressing a bacterial gene, a yeast gene or a mouse gene in human cells. Last week, a colleague exclaimed in horror when I asked to use a construct of his: “But it won’t work in human cells! It’s been optimized for zebrafish!”

What is codon optimization, and what’s the problem with it, you may ask? Well, codon optimization is based on the observation that there are 20 different amino acids (the building blocks of proteins), but 61 codons that can code for these amino acids. Thus, most amino acids can be encoded by multiple codons, and theoretically these could be used interchangeably. In reality, however, there is a bias in which codons are used, and many people think that codon choice may affect the efficiency of translation, mainly because some tRNAs (the “translators” that mediate between codons and amino acids) are more abundant and therefore more readily available during translation. So codon optimization (or codon adaptation) is often applied when proteins from one organism are to be expressed in the cells of another organism (e.g. when bacterial proteins are expressed in mammalian cells). In the process of codon optimization, codons are modified to match the “preferences” of the host organism, without changing the sequence of the protein.
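To make the idea concrete, here is a minimal sketch of the most naive form of codon optimization: swap every codon for the synonymous codon that the host uses most often. It builds the standard genetic code, and everything else is an assumption made for illustration: the function names are mine, and `host_reference` and `gene` are made-up toy sequences, not real genes or real codon usage data. Actual optimization tools are more elaborate than this.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(cds):
    """Split a coding sequence into triplets, dropping any trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    """Translate a coding sequence into a one-letter protein string."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def preferred_codons(host_cds_list):
    """For each amino acid (and stop), pick the codon seen most often in the host sequences."""
    counts = Counter(c for cds in host_cds_list for c in codons(cds))
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def naive_optimize(cds, preferred):
    """Replace every codon with the host's preferred synonymous codon."""
    return "".join(preferred[CODON_TABLE[c]] for c in codons(cds))


# Toy data (hypothetical, for illustration only).
host_reference = ["ATGCTGCTGGCCGCCAAGTGA"]
gene = "ATGTTAGCGAAATAA"

sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
print(len(sense_codons), len(set(AMINO_ACIDS) - {"*"}))  # 61 20

optimized = naive_optimize(gene, preferred_codons(host_reference))
print(optimized)                                  # ATGCTGGCCAAGTGA
print(translate(gene) == translate(optimized))    # True
```

Four of the five codons change, yet the encoded protein stays the same. Note that this sketch only looks at codon counts in the host and knows nothing else about the mRNA it produces.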

Now, codon optimization can certainly make a difference. There are plenty of examples showing that it matters when expressing human genes in bacteria. There are also some examples showing that it can matter when expressing heterologous genes in mammalian cells. Expression of the green fluorescent protein (GFP) from the jellyfish Aequorea victoria was greatly enhanced in human cells after codon optimization [1, 2]. Even the expression levels of human proteins can be further increased in human cells by altering the codons [3]. But do we really understand why and to what extent codon usage (particularly in mammalian cells) matters? Translation is a complex process, and multiple factors, from tRNA gene copy number and expression levels, through mRNA stability and folding, to translation initiation and/or elongation, have all been implicated in playing a role. For example, a couple of years ago, it was shown that a silent polymorphism in the MDR1 gene can affect the folding and subsequent function of the MDR1 protein [4]. At the time the authors hypothesized that this may be due to the usage of a rare codon, which affects the kinetics of protein translation, but later multiple alternative explanations were offered, such as altered mRNA folding and nearest-neighbour codon context effects. Another example involves a synonymous variant in the IRGM gene [5], which can be considered a risk allele for Crohn’s disease. Not because it affects codon usage, but because it alters the binding of a miRNA, thus affecting mRNA stability.

As you can see, mRNA sequence matters for protein expression and function, beyond its “direct” coding function. But there’s no simple relationship between codon bias and tRNA abundance. This month a beautiful article by Tamir Tuller was published [6], which provides a great overview of how current studies on codon usage and translation efficiency are often contradictory. He also provides possible explanations for these contradictions. The paper is largely hypothetical, but it includes plenty of references to original publications. It also highlights both theoretical and technical problems in dissecting causal relationships, such as hidden relationships between factors that could produce “fake” causality, or the difficulty of experimentally modifying one biological variable without affecting others. Another, older review [7] also highlights the complexity and contradictions in understanding the mechanistic origins and implications of codon bias, with a special outlook on mammalian codon bias.

So, overall, what angers me when people “codon optimize” in mammalian systems is that they often don’t have a clue about the complex science behind codon choice. It’s not sexy to know the theory and ins-and-outs of a method, especially something as boring as protein expression. Seriously, do we know what the relationship between zebrafish and human codon usage is? And so, slowly, while everyone codon optimizes, the actual rationale for doing so becomes black magic, a norm set in stone. Just as people are up in arms about my “choose-your-birthday” idea, they are also up in arms when I suggest that codon optimization might not be important (e.g. between zebrafish or mouse and human). Granted, the “choose-your-birthday” plan is just a crazy idea, while the concept of codon optimization has a scientific basis, so it’s definitely not a perfect comparison. Nevertheless, research, and the way we do research, should be evidence-based, even in its smallest details, and not simply a tradition that we follow just because that’s how everybody else does it. I feel that’s where we’re headed with codon optimization.


1. Yang et al. (1996) Optimized codon usage and chromophore mutations provide enhanced sensitivity with the green fluorescent protein. Nucleic Acids Res 24(22):4592-3
2. Zolotukhin et al. (1996) A “humanized” green fluorescent protein cDNA adapted for high-level expression in mammalian cells. J Virol 70(7):4646-54
3. Kotsopoulou et al. (2010) Optimised mammalian expression through the coupling of codon adaptation with gene amplification: Maximum yields with minimum effort. J Biotechnol 146(4):186-93
4. Kimchi-Sarfaty et al. (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315(5811):525-8
5. Brest et al. (2011) A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat Genet 43(3):242-5
6. Tuller T (2014) Challenges and obstacles related to solving the codon bias riddles. Biochem Soc Trans 42(1):155-9. DOI: 10.1042/BST20130095
7. Plotkin and Kudla (2011) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12(1):32-42


2 thoughts on “Birthdays, black magic and codon optimization”

  1. Good post.

    I think the most striking thing I noticed when moving between labs is the role of superstition – or more charitably, habit – in the protocols carried out in the lab. Few people know where their protocols come from or the rationale behind them (and I’m not excluding myself from that), which I guess is probably inevitable: we carry out a great many procedures, most of which are largely tangential to what we’re actually interested in, so who has the time to research every one? Who has the time to trial every protocol and pick the “best”? And so long as it works, does it even really matter?

    • Yes, I agree. I have come to the point where I use the different protocols in circulation to assess how “robust”/flexible a method is. If incubation times are different in different labs, it’s probably not so crucial to be super precise. But I still think it’s important to be honest with yourself about whether you know what the basis for a protocol is. It’s OK not to know (especially as long as it works ;), but then one should be aware of this, and not insist on “this is how it has to be done”.
