Back in 2007 I did a brilliant course called Computational Genomics. I enjoyed it so much that I stayed on to do a PhD.

One project was on the Genetic Code. This code is what our bodies, and all lifeforms, use to convert DNA to functional protein chains.

Most interesting to me is that there are many genetic codes. Animals all share the same language. Bacteria have their own. Yeast have yet another.

I use the word language here because each of the genetic codes has the same set of letters and words. The differences between them are minor and relate to how these words are interpreted.

This is designed to be a short introduction to DNA and the Genetic Code. We’ll look at DNA itself, how it is processed in order to create proteins, and how this can differ across various branches of the tree of life. What’s cool is these differences are tiny.

What is DNA?

DNA is an instruction set for life. It takes a set of 4 bases, which can describe roughly 20 amino acids, which themselves chain into proteins. These proteins are what power life, helping us turn sugar and oxygen into energy, or to break down and digest various foodstuffs in our stomachs.

DNA is our means of storing the instructions to make these proteins. It is like a recipe book for life, built up over billions of years.

There’s so much to talk about, but we’ll focus today on the Genetic Code.

What does the Genetic Code look like?

DNA is composed of four letters: T, C, A, G. Sometimes you might see U in place of T. This is down to a difference between DNA and RNA, which we can ignore for the time being.

We refer to the words in the language of DNA as codons. Each codon is 3 letters long, and with 4 possible letters in each position that gives 4 x 4 x 4 = 64 words. For example, the following snippet of DNA:

ATGCTAATA

can be read as three codons: ATG, CTA, ATA. Each is an instruction to insert a different amino acid into the resulting protein.
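
As a rough sketch in Python, splitting a sequence into codons is just a matter of reading three letters at a time. The name split_into_codons is my own, and the sketch assumes the sequence starts on the first letter of a codon and simply drops any leftover letters at the end.

```python
from itertools import product

BASES = "TCAG"

def split_into_codons(dna):
    """Read a DNA string three letters at a time, dropping any incomplete tail."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

# Every possible three-letter word built from the four bases.
ALL_CODONS = ["".join(letters) for letters in product(BASES, repeat=3)]

print(split_into_codons("ATGCTAATA"))  # ['ATG', 'CTA', 'ATA']
print(len(ALL_CODONS))                 # 64
```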

The table we use to translate from codons to amino acids, or from words to their meaning, is called The Genetic Code. From our perspective then, genetic codes are like dictionaries.

Even though there are many codes, we usually refer to the Standard Genetic Code. You can see this code below:

The Standard Genetic Code

To read this table, first look up the codon you wish to translate, and it’ll show you the amino acid it codes for.
Doing this for our example DNA, ATGCTAATA, gives us:

Codon   Amino Acid Code   Amino Acid Name
ATG     M                 Methionine
CTA     L                 Leucine
ATA     I                 Isoleucine

During translation our bodies would create a small chain of amino acids, MLI. The protein this creates is something I just made up, but you can view and download real proteins online.
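
The lookup itself can be sketched in a few lines of Python. The 64-letter string below is the standard code written out with the codons in TCAG order (TTT, TTC, TTA, TTG, TCT, and so on), with * marking a stop codon. The names STANDARD_CODE and translate are mine, and the sketch assumes reading starts at the first letter and ends at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in TCAG order; '*' means STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna, code=STANDARD_CODE):
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGCTAATA"))  # MLI
```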

Features of Genetic Codes

Looking at the image above will reveal some interesting patterns. You’ll see that:

  • there are more codons than amino acids
  • multiple codons point at the same amino acid
  • codons that point to the same amino acid tend to look similar
  • the first two letters in a codon are often all you need to know to determine the amino acid (e.g., CT., CC., AC., GC., …)
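
These patterns can be checked rather than eyeballed. The sketch below rebuilds the standard code from a compact 64-letter string (codons in TCAG order, * for stop), groups the codons by amino acid, and lists the two-letter prefixes whose four codons all mean the same thing. The variable names are my own.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the codons by the amino acid (or stop signal) they code for.
codons_for = defaultdict(list)
for codon, amino_acid in STANDARD_CODE.items():
    codons_for[amino_acid].append(codon)

# Two-letter prefixes where the third letter makes no difference.
fixed_prefixes = [
    first + second
    for first, second in product(BASES, repeat=2)
    if len({STANDARD_CODE[first + second + third] for third in BASES}) == 1
]

print(len(STANDARD_CODE), "codons for", len(codons_for) - 1, "amino acids plus stop")
print(codons_for["L"])
print(fixed_prefixes)
```

Run against the standard code, eight of the sixteen two-letter prefixes fix the amino acid outright. In the rest, the third letter still matters.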

What’s important here is that we have many ways to write down the same amino acid. If you imagine yourself writing out DNA with pen and paper, you’ll probably imagine making the odd mistake along the way. However, not all mistakes are equal.

A mistake writing the first two letters of a codon could make a big difference to your protein, but a mistake in the final letter might not make any difference at all.
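
One way to put numbers on this is to try every possible single-letter change to every codon and count how many leave the meaning unchanged. This is only a sketch: it treats all mistakes as equally likely, which real mutations are not, and it counts a stop codon like any other meaning. The function name is my own.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def silent_changes_by_position(code):
    """Count single-letter changes that keep the meaning, per codon position."""
    silent = [0, 0, 0]
    for codon, amino_acid in code.items():
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a change
                mutant = codon[:position] + base + codon[position + 1:]
                if code[mutant] == amino_acid:
                    silent[position] += 1
    return silent

first, second, third = silent_changes_by_position(STANDARD_CODE)
# Each position has 64 codons x 3 alternative letters = 192 possible changes.
print("silent changes by position:", first, second, third)
```

For the standard code the third letter comes out far ahead of the other two, which matches the pattern described above.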

This will be important when we look at the process of copying DNA, known as replication, and the effect of mistakes, or mutations, on this process. Mutations are key to our evolution as they are one way in which variation is introduced into species.

For now though, we’ll focus on genetic codes and will start with mould.

A Mouldy Genetic Code

Let’s look now at a different genetic code: mould mitochondria. This code is exactly the same as the standard genetic code, except for a single codon:

TGA -> W

The block of codons beginning TG now looks like this:

A portion of the Mould Genetic Code

First off you’ll see the difference is minor. How minor, I’m not sure, but we can totally investigate. Some questions to get us started:

  • how many times does the codon TGA appear in mould DNA?
  • how many times does it appear in other DNA?
  • if we decoded DNA using the wrong genetic code, would the difference be critical?
  • how commonly used are the amino acids in species that rely on each code?
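
As a starting point for the first two questions, counting codons is straightforward once you have a sequence. The sketch below uses a made-up sequence, not real mould DNA, and assumes the reading frame starts at the first letter. A real investigation would need real gene sequences and their correct reading frames.

```python
from collections import Counter

def codon_counts(dna):
    """Count each codon, reading three letters at a time from the first letter."""
    usable = len(dna) - len(dna) % 3
    return Counter(dna[i:i + 3] for i in range(0, usable, 3))

example = "ATGTGACTATGATAA"  # invented for illustration
counts = codon_counts(example)

print(counts["TGA"])  # 2
```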

We can keep listing questions and hunting down answers but for now we’ll think about how this might affect the lifeforms that use this code.

Mould versus Standard Genetic Code?

The only difference occurs when we encounter the codon TGA. In the Standard Code this codon means STOP, a signal that the end of the protein has been reached.

Under the mould genetic code, we end up instead with the amino acid, W, which stands for Tryptophan.
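
To see the effect, here is a small sketch that translates the same made-up sequence with both codes. It assumes, as described above, that the two codes differ only at TGA. MOULD_CODE and translate are names of my own.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption from the text: the only difference is that TGA means W, not STOP.
MOULD_CODE = {**STANDARD_CODE, "TGA": "W"}

def translate(dna, code):
    """Translate codon by codon until a stop codon ('*') or the end."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

example = "ATGTGACTA"  # invented sequence with TGA in the middle

print(translate(example, STANDARD_CODE))  # M
print(translate(example, MOULD_CODE))     # MWL
```

Read with the standard code, the protein is cut short at TGA. Read with the mould code, translation carries on through it.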

Why is this? I don’t know the answer but we can form some hypotheses and investigate later:

  • W is encoded by only one codon (TGG) in the standard code. Perhaps mould rely on it more than other lifeforms?
  • Mutations between A <> G or T <> C are more likely than others. With both TGA and TGG meaning W, an A <> G mistake in the final letter no longer changes the protein. Perhaps there was a need to decrease the chance of mutations away from Tryptophan?
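
The second hypothesis can at least be illustrated in code. The sketch below applies each A <> G or T <> C swap to the Tryptophan codon TGG and looks up the result in both codes. It shows what the swap does on paper only. It says nothing about why the mould code came about. The names are my own, and MOULD_CODE assumes TGA is the only difference.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
MOULD_CODE = {**STANDARD_CODE, "TGA": "W"}  # assumption: only TGA differs

# The more likely swaps mentioned above: A <> G and T <> C.
SWAP = {"A": "G", "G": "A", "T": "C", "C": "T"}

def swapped_neighbours(codon):
    """Apply the likely swap to each of the three letters in turn."""
    return [codon[:i] + SWAP[codon[i]] + codon[i + 1:] for i in range(3)]

for name, code in (("standard", STANDARD_CODE), ("mould", MOULD_CODE)):
    outcomes = [(mutant, code[mutant]) for mutant in swapped_neighbours("TGG")]
    unchanged = sum(1 for _, amino_acid in outcomes if amino_acid == "W")
    print(name, outcomes, "still W:", unchanged)
```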

Summary

We’ve looked at DNA and how it is composed of 4 bases. These bases describe around 20 amino acids, which chain together to form proteins.

The meaning of these bases, read three at a time as codons, is dictated by the Genetic Code. This code is shared across almost all lifeforms, with a few variations.

We finished with questions on the mould genetic code and why this might exist in the first place. Although we don’t have the answer, we certainly have the resources to go and find out for ourselves.