Introduction to Peptide Synthesis
Last updated: September 12th, 2019
Peptide bonds: Forming peptides from amino acids with the use of protecting groups
Today we’ll go deeper on how to synthesize the most important amides of all – peptides – with an important contribution from protecting group chemistry.
Table of Contents
- What Are Peptide Bonds?
- The “Proteinogenic” Amino Acids
- Synthesis of a Dipeptide Without Protecting Groups
- Synthesis of a Dipeptide Using A Protecting Group Strategy
- Synthesis of Longer Peptides – Tripeptides and Tetrapeptides
- Bonus Topic: Solid-Phase Peptide Synthesis
A “peptide bond” is an amide linkage (see Amides: Properties, Synthesis, and Nomenclature) that connects two amino acids, as in the “dipeptides” L-phenylalanyl-L-valine (below left) and L-leucyl-L-alanine (below right):
Proteinogenic amino acids are the building blocks of proteins. In addition to the 20 amino acids directly encoded by the genome, two other amino acids are coded into proteins under special circumstances: selenocysteine (present in eukaryotes, including humans) and pyrrolysine (found in some methane-producing archaea and a few bacteria).
With the exception of (achiral) glycine, all proteinogenic amino acids are L-amino acids, where the “L-” prefix relates the stereochemistry of the amino acid to that of L-glyceraldehyde [See post: D and L Sugars].
Of the chiral amino acids, all are S, with the exception of cysteine and selenocysteine, which are R (because sulfur and selenium give the side chain a higher priority under the CIP system).
Let’s build a simple dipeptide between two of these amino acids. For simplicity’s sake, we’ll pick two from the “hydrophobic sidechain” group, alanine (Ala) and leucine (Leu), since their sidechains don’t need additional protecting groups.
What do we need to do to make L-Ala-L-Leu?
Surveying the methods previously covered to make amides, it might seem simple.
Why not take 1 equivalent each of L-alanine and L-leucine, add a coupling agent like N,N-dicyclohexylcarbodiimide (DCC) and patiently wait for our product to appear?
What could possibly go wrong?
Well, this will give us some of our desired product. But it won’t do so efficiently!
That’s because each amino acid has two reactive termini – an amine and a carboxylic acid – and they can bond together in multiple ways.
Just like the letters A and L can combine to make the words AL and LA, in addition to Ala-Leu (our desired product) we will also get Leu-Ala.
Furthermore, since we’re not adding single molecules together but molar quantities (even a millionth of a mole, a “micromole”, contains roughly 6 × 10¹⁷ molecules), we also have the possibility of forming the “homo-dipeptides” AA (Ala-Ala) and LL (Leu-Leu).
And that’s just a start. No matter how you slice it you’re looking at a low yield (<25%) of the desired material.
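To see where the 25% ceiling comes from, here is a minimal sketch that simply enumerates every ordered pairing of the two unprotected amino acids. It assumes (as a simplification, not a claim about real kinetics) that each pairing is equally likely and that the reaction stops at the dipeptide; the function name `unprotected_dipeptides` is made up for illustration.

```python
from itertools import product

def unprotected_dipeptides(amino_acids):
    """List every ordered pair, written N-terminal residue first."""
    return ["-".join(pair) for pair in product(amino_acids, repeat=2)]

products = unprotected_dipeptides(["Ala", "Leu"])
print(products)  # ['Ala-Ala', 'Ala-Leu', 'Leu-Ala', 'Leu-Leu']

# Assumption: all four products form in equal amounts.
target_share = products.count("Ala-Leu") / len(products)
print(target_share)  # 0.25
```

In a real flask the dipeptides still have free termini and can keep reacting, so the share of Ala-Leu can only fall below this idealized figure.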
That’s inefficient, wasteful, and expensive! Isn’t there a better way?
Yes. Rather than using the native amino acids and just praying for a good yield, we can use protected versions of L-alanine and L-leucine.
If we protect the carboxylic acid of L-leucine as an ester (e.g. a methyl ester) and protect the amine of L-alanine as a carbamate (See: Carbamates as protecting groups) then we set up a situation where we have a single nucleophile and a single electrophile.
This results in a high yield (>95%) of a single product!
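The “single nucleophile, single electrophile” idea can be sketched as a small bookkeeping model. This is only an illustration: the `Residue` class, its two flags, and `possible_couplings` are hypothetical names, and the model tracks nothing but which termini are free to react.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Residue:
    name: str
    free_amine: bool  # True if the N-terminus is unprotected (can act as nucleophile)
    free_acid: bool   # True if the C-terminus is unprotected (can be activated by DCC)

def possible_couplings(pool):
    """Name each product acid-donor first, i.e. N-terminal residue first."""
    return [f"{acid.name}-{amine.name}"
            for acid, amine in product(pool, repeat=2)
            if acid.free_acid and amine.free_amine]

# Unprotected: both termini of both amino acids are free.
unprotected = [Residue("Ala", True, True), Residue("Leu", True, True)]
print(possible_couplings(unprotected))  # ['Ala-Ala', 'Ala-Leu', 'Leu-Ala', 'Leu-Leu']

# Protected: Boc on the amine of Ala, methyl ester on the acid of Leu.
boc_ala = Residue("Ala", free_amine=False, free_acid=True)
leu_ome = Residue("Leu", free_amine=True, free_acid=False)
protected = possible_couplings([boc_ala, leu_ome])
print(protected)  # ['Ala-Leu']
```

With the protecting groups in place, only one acid and one amine are available, so only one pairing survives the filter.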
[Note: the Boc group is a popular carbamate protecting group for amines; “Boc” stands for t-butoxycarbonyl]
The good news is that we don’t have to stop at the dipeptide. If we choose protecting groups that can be removed selectively (and the carbamate / ester pair qualifies) then we can deprotect the carbamate and add a third amino acid.
The choice of carbamate protecting group here was t-butoxycarbonyl (Boc) which is removed with strong acid (trifluoroacetic acid, abbreviated as TFA).
Treatment with TFA removes the Boc group but leaves the methyl ester alone.
So if we treat the dipeptide with TFA, we liberate the amine nitrogen, and can react with another Boc-protected amino acid in the presence of DCC to get a tripeptide.
If we’re keen, we can even extend the same method to build a tetrapeptide, a pentapeptide… or beyond!
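The deprotect-then-couple cycle is easy to express as a loop. The sketch below treats the peptide as a plain string and uses hypothetical helper names (`deprotect_boc`, `couple`, `elongate`); valine and phenylalanine are arbitrary choices for the added residues, picked only because their sidechains need no extra protection.

```python
def deprotect_boc(peptide):
    """TFA step: remove the N-terminal Boc group, leaving the methyl ester alone."""
    if not peptide.startswith("Boc-"):
        raise ValueError("no Boc group to remove")
    return "H-" + peptide[len("Boc-"):]

def couple(boc_amino_acid, peptide):
    """DCC step: attach a Boc-protected amino acid to the free N-terminus."""
    if not peptide.startswith("H-"):
        raise ValueError("N-terminus is not free")
    return f"Boc-{boc_amino_acid}-" + peptide[len("H-"):]

def elongate(start, residues):
    """Run one deprotect/couple cycle per residue to add."""
    peptide = start
    for residue in residues:
        peptide = couple(residue, deprotect_boc(peptide))
    return peptide

tetrapeptide = elongate("Boc-Ala-Leu-OMe", ["Val", "Phe"])
print(tetrapeptide)  # Boc-Phe-Val-Ala-Leu-OMe
```

Note the direction of growth: each new residue lands on the N-terminus, so the chain is built from the C-terminal end backwards.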
It’s not unreasonable to consider this method for longer peptides.
For instance, take something like bradykinin, a nine-residue peptide that causes dilation of blood vessels, leading to a rapid drop in blood pressure. (Your body releases bradykinin in response to snake bites, which is how it was originally discovered.)
It might be interesting to synthesize variants of bradykinin where some of the amino acids are swapped out for other ones. In order to do that, we’d need to be able to synthesize it.
So how effective could it be?
If each peptide coupling step has a yield of about 95%, then our overall yield for making bradykinin would be (0.95)⁹, or about 63%. That’s actually pretty good! A lot of chemists would be happy to get a yield of 63% for a single reaction, let me tell you.
If the yields are high enough, one can even imagine building something crazy like insulin (51 amino acid residues). At 95% per step, that works out to about 7% overall yield for 51 steps.
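The arithmetic behind these numbers is just repeated multiplication of the per-step yield. Here is a short sketch, assuming every step has the same yield and following the step counts used above (9 and 51); the function names are illustrative only.

```python
def overall_yield(step_yield, steps):
    """Overall yield of a linear sequence where every step has the same yield."""
    return step_yield ** steps

def required_step_yield(target_overall, steps):
    """Per-step yield needed to reach a target overall yield."""
    return target_overall ** (1 / steps)

print(round(overall_yield(0.95, 9), 2))   # 0.63
print(round(overall_yield(0.95, 51), 2))  # 0.07

# Per-step yield needed for 50% overall after 51 steps (just under 99%).
needed = required_step_yield(0.50, 51)
print(needed)
```

The second function shows why long peptides are so demanding: small losses per step compound quickly, so each step has to be nearly quantitative.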
Is this possible?
Yes… but it requires a clever modification that won its inventor, Bruce Merrifield, the 1984 Nobel Prize in Chemistry.
What follows below is more supplemental than anything else, but given the importance of the topic, it is both interesting and useful.
In 1963 a chemist at Rockefeller University named Bruce Merrifield published a paper that would revolutionize how peptides were synthesized, and eventually make the synthesis of long peptides routine.
It was entitled: “Solid-Phase Peptide Synthesis. I. The Synthesis of a Tetrapeptide”.
Here’s the key idea.
Recall that in our original scheme (above) we protected the carboxylic acid as a methyl ester, which stays the same throughout the whole peptide synthesis.
Merrifield’s idea was: what if we find a way to attach the carboxylic acid to a functional group that is itself linked to a polymer bead? Not only would this also protect the carboxylic acid, it would drastically improve the ease of separations.
Why? Because instead of having to purify the final product by crystallization or column chromatography, you purify by filtering off the polymer beads (each 200-500 μm) and washing them to remove excess reagents.
The polymer beads themselves are pretty small. A typical size is 200 micrometers. Each bead can load about 4 nanomoles of amino acid.
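As a rough feel for scale, the loading figure above can be turned into a bead count. This sketch takes the 4 nanomoles per bead quoted here at face value and assumes every site on every bead is used; `beads_needed` is a made-up helper name.

```python
import math

NMOL_PER_BEAD = 4  # loading per bead, taken from the figure quoted above

def beads_needed(target_nmol, nmol_per_bead=NMOL_PER_BEAD):
    """Smallest whole number of beads that can carry the target amount."""
    return math.ceil(target_nmol / nmol_per_bead)

# One micromole is 1000 nanomoles.
print(beads_needed(1000))  # 250
```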
The video posted below is not mine, but it gives you an idea of the process.
At about 0:34 you can see how small the beads are.
The starting point for the Merrifield process is crosslinked polystyrene, which behaves like one big interlinked molecule. Polystyrene is then attached to a linker, which usually terminates with an NH2 group. This itself is usually protected; in order to activate the linker, you need to remove the protecting group cap.
The polymer bead needs to swell in a solvent in order for functional groups on the solid support to undergo reactions efficiently.
The essential procedure is: swell –> add reagents –> wait –> filter –> wash, and repeat. Beads stay in the reaction vessel the whole time. There’s also usually some kind of capping step to make sure any unreacted amines don’t participate in the next reaction.
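The repeating cycle can be sketched as a loop that records each operation. This is bookkeeping only, not a protocol: deprotection and capping are left out, the step names are copied from the sequence above, and `spps_cycle_log` is a hypothetical name.

```python
CYCLE = ["swell", "add reagents", "wait", "filter", "wash"]

def spps_cycle_log(residues_c_to_n):
    """Return the resin-bound chain and the ordered list of operations.

    Residues are supplied starting from the C-terminal one, because the
    first amino acid is the one attached to the bead.
    """
    chain = "resin"
    log = []
    for residue in residues_c_to_n:
        for step in CYCLE:
            # Only the reagent step depends on which residue is being added.
            log.append(f"{step} ({residue})" if step == "add reagents" else step)
        chain = f"{residue}-{chain}"
    return chain, log

chain, log = spps_cycle_log(["Leu", "Ala"])
print(chain)     # Ala-Leu-resin
print(len(log))  # 10
```

Because the beads never leave the vessel, every purification in the loop is a filter-and-wash, which is what makes the procedure so easy to automate.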
It’s possible to make peptides up to about 50 units this way. In highly automated systems one can be even more ambitious.
Merrifield started knocking off peptides in the 1960s. Bradykinin, made in 8 days and 68% overall yield, is one example. Insulin was made two years later. The crowning achievement of this initial period was probably ribonuclease A, which has 124 amino acid residues.
The original Merrifield process has been significantly modified and improved. Originally, removal of the linker required harsh conditions (strong acid). Today, procedures usually employ Fmoc protecting groups instead of Boc, which allow for deprotection with mild amine base (piperidine). A galaxy of new resins, linkers, and coupling procedures has subsequently been developed. The Wikipedia article on solid-phase peptide synthesis is an OK place to start.
Note 1. Cysteine and selenocysteine are L, but R, because the sulfur (or selenium) in the sidechain gives it a higher priority than the carboxyl group within the Cahn-Ingold-Prelog system.