Introduction to protein

Introduction to Protein Structure (Part 1: Basic structural principles and folding of protein)

Carl Branden & John Tooze, Karolinska Institute & Imperial Cancer Research

Taylor & Francis Group

The preface of this book sets out the tenet that biological reactions cannot be understood except through the structure of the participating molecules. The functional properties of proteins depend on their three-dimensional structure. Proteins are formed from chains of amino acids, known as polypeptide chains, that fold into three-dimensional structures. Predicting the three-dimensional structure of a folded protein from its sequence remains a major challenge. The 20 amino acids can be combined into more different protein sequences than there are atoms in the universe. The earliest researchers into proteins were disappointed by the complexity and lack of symmetry they discovered, and this has made protein structure harder to understand than that of DNA.
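The claim about the number of possible proteins is simple combinatorics: with 20 choices at each position there are 20^n sequences of length n. The sketch below finds the shortest chain for which 20^n exceeds 10^80, a commonly quoted rough estimate of the number of atoms in the observable universe (that estimate is an assumption, not a figure from the book).

```python
import math

AMINO_ACIDS = 20
ATOMS_IN_UNIVERSE_LOG10 = 80  # assumed rough estimate: about 10^80 atoms


def min_length_exceeding(log10_target: float, alphabet: int = AMINO_ACIDS) -> int:
    """Smallest chain length n such that alphabet**n > 10**log10_target."""
    # alphabet**n > 10**t  <=>  n * log10(alphabet) > t
    return math.floor(log10_target / math.log10(alphabet)) + 1


n = min_length_exceeding(ATOMS_IN_UNIVERSE_LOG10)
print(n)  # 62
print(f"20^{n} is about 10^{n * math.log10(AMINO_ACIDS):.1f}")
```

Even a modest 100-residue chain therefore has 20^100, or about 10^130, possible sequences.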

Protein molecules:  Protein molecules are polymers built up from the 20 amino acids. The amino acid sequence of the polypeptide chain is referred to as its primary structure. Different sections of a chain can fold into secondary structures, of which the alpha helix and the beta strand are the most common. The tertiary structure is formed when these secondary structures are packed into compact globular units known as domains. A protein may also be built from several polypeptide chains, and the arrangement of these chains is known as the quaternary structure. Folding into a tertiary structure allows amino acids that are far apart in the primary sequence to be brought close together.

Structure of amino acids:  All amino acids have a central carbon atom, referred to as the alpha carbon. Two groups are attached to it: the amino group, which contains one nitrogen and two hydrogen atoms, and the carboxyl group, which has a carbon atom attached to two oxygen atoms and one hydrogen atom. A further hydrogen atom is attached directly to the alpha carbon, as is the side chain. The side chain is what distinguishes one amino acid from another. The amino acids, which repeat over and over again in the polypeptide chain, are also referred to as residues. Amino acids in the polypeptide chain are joined by peptide bonds, which are covalent bonds between the carbon atom of the carboxyl group of one amino acid and the nitrogen atom of the amino group of the next. A water molecule (two hydrogen atoms and one oxygen atom) is released when this bond is formed. The chain that is held together by these peptide bonds is also referred to as the main chain or backbone. The amino acids are divided into three main classes according to the chemical properties of their side chains: hydrophobic, polar and electrically charged. The conformation of the main chain is determined by two torsion angles for each residue: phi, the angle of rotation about the bond between the nitrogen atom and the alpha carbon, and psi, the angle of rotation about the bond between the alpha carbon and the carbon atom of the carboxyl group. Metal atoms such as iron, zinc, magnesium and calcium are often bound to the side chains of amino acids.
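Phi and psi are torsion (dihedral) angles, each defined by four consecutive backbone atoms: phi by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below computes a dihedral angle from four 3D coordinates with the standard projection formula; the helper names `phi` and `psi` are illustrative, and the coordinates in the checks are made-up points rather than a real structure.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)  # vector from p1 back to p0
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # vector from p2 on to p3
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = tuple(a - _dot(b0, b1) * c for a, c in zip(b0, b1))
    w = tuple(a - _dot(b2, b1) * c for a, c in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """Rotation about the N-CA bond of residue i."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    """Rotation about the CA-C bond of residue i."""
    return dihedral(n, ca, c, n_next)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

With this sign convention, 0 degrees means the outer two atoms are eclipsed (cis) and 180 degrees means they are on opposite sides (trans).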

The folding of proteins is mainly driven by the need to pack hydrophobic side chains into the interior of the molecule, creating a hydrophobic core and a hydrophilic surface. The book compares the close packing of the hydrophobic core to solving a three-dimensional jigsaw puzzle. Where there are holes in the interior of the protein, these are usually occupied by tightly bound water molecules, which are regarded as integral parts of the protein.

Creating a hydrophobic core presents a problem, because the backbone itself is hydrophilic: its polar NH and C=O groups must be neutralised if they are to be buried. This problem is solved by forming secondary structures, usually either an alpha helix or a beta sheet, within the protein. In these structures, the polar backbone groups are neutralised by hydrogen bonds between NH and C=O groups, in regions where a number of consecutive amino acids have the same phi and psi angles.

The alpha helix:  Each turn of the alpha helix contains 3.6 residues, and the helix is held together by hydrogen bonds between the C=O group of each residue and the NH group of the residue four positions further along the chain. The ends of the helix are polar, and are usually on the surface of the protein. In the alpha helix, the backbone atoms are closely packed, giving a stable structure. All the hydrogen bonds in the alpha helix point in the same direction, because the peptide units are aligned roughly parallel to the axis of the helix. Each peptide unit also has a dipole moment, arising from its polar NH and C=O groups, and these dipoles are likewise aligned along the helical axis. As a result, the alpha helix has a significant net dipole, which is positive at the amino-terminal end and negative at the carboxyl-terminal end, so the ends of the helix can attract ligands of opposite charge. Side chains project outwards from the alpha helix, and in many helices the sequence changes from hydrophilic to hydrophobic residues every three to four residues, matching the turn of the helix, so that one face of the helix is hydrophobic. The most common location for an alpha helix is at the surface of the protein core, with its hydrophobic side facing the interior of the protein and its hydrophilic side facing the solvent. An isolated alpha helix is only marginally stable, but it is stabilised by packing against other helices through its hydrophobic side chains. These side chain interactions are maximised if the helices are wound around each other in a coiled-coil arrangement.
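The helix geometry can be turned into a small model. This sketch assumes the idealized textbook values of 3.6 residues per turn (100 degrees of rotation per residue) and a rise of 1.5 Å per residue along the axis, giving a pitch of 5.4 Å; real helices deviate slightly. It lists the backbone hydrogen bonds (C=O of residue i to NH of residue i+4) and the angular position of each side chain on a "helical wheel", which shows why residues three to four apart end up on the same face.

```python
from typing import List, Tuple

RESIDUES_PER_TURN = 3.6
DEG_PER_RESIDUE = 100.0    # 360 / 3.6, idealized
RISE_PER_RESIDUE_A = 1.5   # angstrom per residue, idealized


def helix_hbonds(n_residues: int) -> List[Tuple[int, int]]:
    """(C=O residue i, NH residue i+4) pairs, 0-based residue numbers."""
    return [(i, i + 4) for i in range(n_residues - 4)]


def helix_length_angstrom(n_residues: int) -> float:
    """Length along the helix axis."""
    return n_residues * RISE_PER_RESIDUE_A


def wheel_angle(i: int) -> float:
    """Angle (0-360 degrees) of residue i's side chain around the helix axis."""
    return (i * DEG_PER_RESIDUE) % 360.0


print(helix_hbonds(8))
print(f"pitch = {RESIDUES_PER_TURN * RISE_PER_RESIDUE_A:.1f} A")
# Residues 0, 3, 4 and 7 cluster within 60 degrees of each other: one face
print([wheel_angle(i) for i in (0, 3, 4, 7)])  # [0.0, 300.0, 40.0, 340.0]
```

Placing hydrophobic residues at positions such as 0, 3, 4 and 7 gives the one-sided hydrophobic helix described above.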

Beta sheets: Beta sheets are the other main form of secondary structure. While an alpha helix is formed from a single continuous region of the polypeptide chain, a beta sheet can be built from several regions of the chain. These regions, the beta strands, are usually five to ten residues long. The strands are aligned side by side so that hydrogen bonds form between the NH groups of one strand and the C=O groups of the adjacent strand. The sheet is pleated, with the alpha carbons lying slightly above or below the plane of the sheet and the side chains pointing alternately above and below it. There are three forms of beta sheet: parallel, anti-parallel and mixed. In parallel sheets, all strands run in the same direction along the chain, for instance from amino terminal to carboxyl terminal. In anti-parallel sheets, adjacent strands run in alternating directions. Mixed sheets contain both parallel and anti-parallel pairs of strands, but these are less common than the parallel and anti-parallel forms.
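The three sheet types can be told apart from the direction of each strand alone, taken in the order in which the strands lie side by side in the sheet. This sketch encodes each direction as +1 or -1 (a representation chosen for illustration), labels each neighbouring pair, and then classifies the whole sheet.

```python
from typing import List


def pair_types(directions: List[int]) -> List[str]:
    """Label each pair of spatially adjacent strands."""
    return ["parallel" if a == b else "antiparallel"
            for a, b in zip(directions, directions[1:])]


def classify_sheet(directions: List[int]) -> str:
    """Classify a sheet from strand directions (+1/-1) listed in spatial order."""
    if len(directions) < 2:
        raise ValueError("a sheet needs at least two strands")
    kinds = set(pair_types(directions))
    if kinds == {"parallel"}:
        return "parallel"
    if kinds == {"antiparallel"}:
        return "antiparallel"
    return "mixed"  # both kinds of neighbouring pair occur


print(classify_sheet([+1, +1, +1, +1]))  # parallel
print(classify_sheet([+1, -1, +1, -1]))  # antiparallel
print(classify_sheet([+1, -1, -1, +1]))  # mixed
```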

Loop regions:  Most proteins are built from combinations of alpha helices and beta strands, connected by what are called loop regions. The helices and sheets make up the hydrophobic core of the protein, while the loop regions are on the surface of the molecule. Loops contain charged and polar hydrophilic residues, which can form hydrogen bonds with water molecules. Loop regions are frequently involved in binding sites and enzyme active sites. Adjacent anti-parallel beta strands are joined by hairpin loops, sometimes referred to as reverse turns or simply turns. Loops often take part in conformational changes of the protein, and can switch from an open to a closed conformation.

Motifs and domains:  Some combinations of secondary structures occur frequently, and these are referred to as motifs. The motif of two alpha helices joined by a loop region is found in many proteins. One version of it is related to calcium binding, with the loop region and the two alpha helices providing a scaffold for calcium ligands. Another common motif consists of two adjacent anti-parallel beta strands joined by a loop. Four adjacent anti-parallel beta strands form a common motif referred to as the Greek key motif. Secondary structures are usually arranged in one or another of such simple motifs, and several motifs may combine to make up a folded globular structure known as a domain. Polypeptide chains are folded into one or several such domains. These domains are classified as alpha, beta or alpha/beta domains. Alpha domains consist of alpha helices, beta domains consist of beta sheets, and alpha/beta domains have beta sheets surrounded by alpha helices.
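Motifs and domain classes can be read off a per-residue secondary-structure string. This sketch assumes a simplified three-letter code (H = helix, E = strand, L = loop, similar in spirit to the output of assignment programs such as DSSP) and deliberately crude rules: a helix-loop-helix is an H, L, H run of segments, a hairpin is E, L, E, and the domain class depends only on which element types occur. A real hairpin also requires the two strands to be hydrogen-bonded neighbours, which a string alone cannot show, and real classification schemes are far more careful.

```python
from itertools import groupby
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (code, start, end), end exclusive

MOTIFS = {("H", "L", "H"): "helix-loop-helix", ("E", "L", "E"): "beta hairpin"}


def segments(ss: str) -> List[Segment]:
    """Split e.g. 'HHLL' into runs of identical secondary-structure code."""
    out: List[Segment] = []
    pos = 0
    for code, run in groupby(ss):
        length = sum(1 for _ in run)
        out.append((code, pos, pos + length))
        pos += length
    return out


def find_motifs(ss: str) -> List[Tuple[str, int]]:
    """(motif name, start residue) for each matching triple of segments."""
    segs = segments(ss)
    hits = []
    for a, b, c in zip(segs, segs[1:], segs[2:]):
        name = MOTIFS.get((a[0], b[0], c[0]))
        if name is not None:
            hits.append((name, a[1]))
    return hits


def domain_class(ss: str) -> str:
    """Crude class assignment from which element types occur."""
    has_h, has_e = "H" in ss, "E" in ss
    if has_h and has_e:
        return "alpha/beta"
    if has_h:
        return "alpha"
    if has_e:
        return "beta"
    return "unassigned"


# Hypothetical assignment string, not taken from a real protein
example = "L" + "H" * 7 + "L" * 3 + "H" * 7 + "L" * 2 + "E" * 5 + "L" * 2 + "E" * 5 + "L"
print(find_motifs(example))  # [('helix-loop-helix', 1), ('beta hairpin', 20)]
print(domain_class(example))  # alpha/beta
```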

Coiled-coil alpha helix structures are found in many globular proteins. In coiled-coil structures, the side chains of one helix fit into holes between the side chains of the other. Alpha domains are built from bundles of alpha helices packed together around a hydrophobic core. The four-helix bundle is a common motif in these domains. Another important motif is the globin fold, found in a large group of related proteins and comprising eight short alpha helices connected by loop regions. Packing interactions in the core hold the helices together in a globular structure. Alpha helices whose surfaces are covered by hydrophobic side chains are often found in membrane proteins, where they are in contact with the lipids.
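The packing of side chains "into holes" in coiled coils is usually described with a heptad repeat, which the text above does not spell out: positions are labelled a to g, and positions a and d carry mostly hydrophobic residues that form the interface between the helices. This sketch scores a sequence against that pattern for each of the seven possible registers; the hydrophobic set and the test sequence are hypothetical choices for illustration.

```python
from typing import Dict

HEPTAD = "abcdefg"
HYDROPHOBIC = set("AVILMFWC")  # assumed simple hydrophobic set


def heptad_positions(seq: str, register_start: int = 0) -> Dict[str, str]:
    """Group residues by heptad position; register_start is the index of an 'a'."""
    groups = {p: "" for p in HEPTAD}
    for i, aa in enumerate(seq):
        groups[HEPTAD[(i - register_start) % 7]] += aa
    return groups


def core_hydrophobic_fraction(seq: str, register_start: int = 0) -> float:
    """Fraction of a and d residues that are hydrophobic."""
    groups = heptad_positions(seq, register_start)
    core = groups["a"] + groups["d"]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(seq: str) -> int:
    """Register whose a and d positions are most hydrophobic."""
    return max(range(7), key=lambda r: core_hydrophobic_fraction(seq, r))


demo = "IEALEKK" * 3  # hypothetical idealized heptads
groups = heptad_positions(demo)
print(groups["a"], groups["d"])                            # III LLL
print(best_register(demo), core_hydrophobic_fraction(demo))  # 0 1.0
```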

Alpha/beta structures are the most common domain structures, and there are three main types. The first is a barrel of eight parallel beta strands surrounded by alpha helices, often referred to as the TIM barrel, which is the most common of all domain structures. The second is a beta sheet with alpha helices on both sides, sometimes called the Rossmann fold. The third is a large number of parallel beta strands in a curved sheet with alpha helices on the outside, sometimes referred to as the horseshoe fold.

Anti-parallel beta structures are another major class of protein domain, and can contain between four and ten beta strands. Two twisted beta sheets can pack together into a barrel-like structure, with a core of hydrophobic side chains inside. The beta strands usually pack against each other to form a distorted barrel at the core of the molecule. The three main types of barrel are the up-and-down barrel, the Greek key barrel and the jelly roll barrel. In up-and-down barrels, consecutive beta strands are connected by short hairpin loops, and the barrel can bind large hydrophobic ligands inside it. Such barrels appear well suited to binding chemically diverse ligands; which ligands can be bound depends on the size of the barrel and on the amino acids lining it.


Folding of proteins:  The process by which a protein acquires its three-dimensional, biologically active structure is called protein folding. Some polypeptide chains fold spontaneously, while others require the help of enzymes, or of a class of proteins called chaperones. When a chaperone binds to a polypeptide chain, it prevents the chain from associating with other chains and promotes its folding.

After this the protein reaches an intermediate, or molten globule, state before spontaneously completing its folding. The molten globule is an intermediate stage in the folding of some proteins, in which the packing of the interior of the protein is not yet complete. It is not clear what determines the choice of a particular tertiary structure when a protein folds. It is suggested that the hydrophobic side chains are involved at an early stage, and the burial of part of the chain's backbone in the core suggests that its polar groups form hydrogen bonds with each other at an early stage, by forming secondary structures. The protein progresses from a high-energy unfolded state to a low-energy folded state, passing through local energy minima separated by unstable intermediate states.

There are two major contributions to the free energy difference between the folded and unfolded protein: enthalpy and entropy. The enthalpy derives from the non-covalent interactions within the polypeptide chain, including hydrophobic interactions, hydrogen bonds, ionic bonds and van der Waals forces. Covalent bonds are essentially the same in the folded and unfolded states, and so do not contribute to the difference. The non-covalent interactions, however, differ considerably between the two states: they are more numerous and stronger in the folded state, where they hold together the hydrophobic core, and they make up the enthalpic contribution, which favours folding. On the other hand, the folded protein is far more ordered, so folding involves a large loss of entropy, which favours the unfolded state. The free energy difference between the folded and unfolded states is therefore small, being the difference between two relatively large quantities, and the folded state is only marginally stable relative to the unfolded state.
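The point about two large quantities nearly cancelling can be made concrete with the Gibbs relation dG = dH - T*dS for unfolding, using a simple two-state model and treating dH and dS as independent of temperature (a simplification; real proteins also show a large heat-capacity change). The values of `DH` and `DS` below are hypothetical, chosen only for illustration.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)


def delta_g_unfold(dh: float, ds: float, temp_k: float) -> float:
    """Free energy of unfolding (kJ/mol); positive means the folded state is favoured."""
    return dh - temp_k * ds


def fraction_folded(dh: float, ds: float, temp_k: float) -> float:
    """Two-state model: K_u = [unfolded]/[folded] = exp(-dG_u / RT)."""
    k_u = math.exp(-delta_g_unfold(dh, ds, temp_k) / (R * temp_k))
    return 1.0 / (1.0 + k_u)


DH = 300.0  # kJ/mol, hypothetical enthalpy of unfolding
DS = 0.9    # kJ/(mol*K), hypothetical entropy of unfolding

for t in (298.0, 320.0, DH / DS):
    print(f"T={t:6.1f} K  dH={DH:.0f}  T*dS={t * DS:6.1f}  "
          f"dG_u={delta_g_unfold(DH, DS, t):5.1f} kJ/mol  "
          f"folded={fraction_folded(DH, DS, t):.3f}")
```

At 298 K the enthalpy (300 kJ/mol) and entropy term (about 268 kJ/mol) are both large, yet their difference is only about 32 kJ/mol; at T = dH/dS the two cancel and half of the molecules are unfolded.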

For any given amino acid sequence, the folded conformation appears to be the one with the lowest free energy. It follows that the protein needs a defined folding pathway, both to avoid sampling an impractically large number of possible conformations and to avoid being trapped in a local minimum that is not the global energy minimum. Living cells need particular globular proteins in particular quantities at particular times, and it is important that proteins that are no longer needed can be degraded. In practice, globular proteins in cells have a rapid turnover, and a more rigid structure would be incompatible with the functioning of organisms.
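The "impractically large number of possible conformations" is often illustrated with a Levinthal-style estimate. This sketch assumes, purely for illustration, three backbone conformations per residue, a 100-residue chain and 10^13 conformations sampled per second; none of these numbers come from the book. It works in log10 to avoid overflowing floating point for long chains.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_search_years(n_residues: int, states_per_residue: int = 3,
                       samples_per_second: float = 1e13) -> float:
    """log10 of the years needed to visit every conformation once."""
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


print(f"3^100 is about 10^{100 * math.log10(3):.1f} conformations")
print(f"exhaustive search: about 10^{log10_search_years(100):.1f} years")
```

The result, around 10^27 years, is vastly longer than the roughly 1.4 x 10^10-year age of the universe, while real proteins fold in seconds or less.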

Folded proteins are not static, but are subject to constant fluctuations of atoms and groups of atoms. The functioning of proteins also depends on ligand binding, which can trigger major conformational changes. Globular folded proteins are only marginally stable, and their biological activity can be lost through small changes in temperature or acidity. Every atom of the folded protein is constantly in motion, and this can result in side chains flipping between conformations, loop regions that are not fixed in a single conformation, helices moving relative to each other, and domains changing their packing. Some of these movements occur on a picosecond timescale, and small changes in conditions can produce conformational changes in proteins.

Introduction to Protein Structure (Part 2)

Carl Branden & John Tooze, Karolinska Institute & Imperial Cancer Research

Taylor & Francis Group

INTRODUCTION:  Part II of Branden and Tooze’s book has sections dealing with K+ ion channels and photosynthetic proteins. These are usefully read in conjunction with sections on this site dealing with the ideas of Gustav Bernroider, and with sections on studies of quantum coherence in photosynthetic proteins made since the publication of this book. Other sections of the book deal with G-proteins and second messenger signalling in cells, and with the interaction of proteins and DNA. For readers interested in the possibility that quantum states in proteins are related to consciousness, reading this or a similar good textbook is recommended.

The folding problem:  The folding problem, the difficulty of predicting how a protein will fold from its amino acid sequence alone, is usually described in terms of the computing power needed to search through all the possible conformations of a polypeptide chain for the one with the lowest energy (the energy minimum). The authors offer no particular solution to this problem, which might, however, be resolved by quantum computing within proteins.
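The search problem can be seen in miniature in the HP lattice model, a standard toy model that the book does not discuss: each residue is either hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a 2D square lattice, and each contact between two H residues that are not chain neighbours lowers the energy by 1. This sketch enumerates every conformation by brute force. The number of walks grows roughly as 2.6^n with chain length, which is why exhaustive search stops being practical long before real protein lengths; the demo sequence is hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def energy(seq: str, path: List[Point]) -> int:
    """-1 per pair of H residues touching on the lattice but not bonded."""
    index = {p: i for i, p in enumerate(path)}
    e = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS:
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1
    return e


def fold(seq: str) -> Tuple[int, List[Point], int]:
    """Exhaustive search: (lowest energy, one best path, conformations visited).

    The first bond is fixed along +x so rotated copies are not recounted.
    """
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    best_e, best_path, count = 0, [], 0

    def grow(path: List[Point], occupied: Set[Point]) -> None:
        nonlocal best_e, best_path, count
        if len(path) == len(seq):
            count += 1
            e = energy(seq, path)
            if not best_path or e < best_e:
                best_e, best_path = e, list(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # self-avoidance: one residue per lattice site
            path.append(nxt)
            occupied.add(nxt)
            grow(path, occupied)
            occupied.remove(nxt)
            path.pop()

    start = [(0, 0), (1, 0)]
    grow(start, set(start))
    return best_e, best_path, count


for n in range(4, 11):
    print(n, "residues:", fold("P" * n)[2], "conformations")

demo = "HPHPPHHPHH"  # hypothetical sequence
e, path, visited = fold(demo)
print(f"{demo}: lowest energy {e} out of {visited} conformations")
```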

DNA and protein:  DNA replication, transcription and the regulation of gene expression all involve recognition of DNA by proteins. Proteins that regulate the transcription of DNA recognise specific DNA sequences by means of DNA-binding domains within their polypeptide chains. These domains are usually relatively small, with fewer than 100 amino acid residues. The complex of protein and DNA is stabilised by interactions between the sugar-phosphate backbone of the DNA and regions of the protein. Proteins in contact with DNA often contain a helix-turn-helix motif. Recognition of DNA targets by the helix-turn-helix structure involves interactions between side chains of the recognition helix and bases in the major groove of the DNA. Interactions also include hydrogen bonds between atoms in the protein backbone and atoms in the DNA backbone. Many DNA-binding proteins have homologous domains (similarities in their amino acid sequence and folding structure), each containing a zinc atom as part of the binding domain. The DNA-binding motifs of proteins have three-dimensional ‘scaffolds’ that match the contours of the DNA, ensuring the correct positioning of the interacting protein surfaces and the DNA. Many transcription factors use zinc as an element in their DNA-binding domains. The polypeptide chains involved are usually short, with only about 50 amino acids or fewer, and have a regular pattern of cysteine and/or histidine residues along the chain. These residues bind to zinc atoms, which provide a scaffold for folding into a small compact domain. Side chains can be involved both in stabilising the structure and in providing hydrogen bonds to bind the DNA. In typical instances, the DNA recognition module consists of a small polypeptide chain wrapped round a zinc atom, with an alpha helix able to bind in the major groove of the DNA.
The frequent role of zinc and its ligands is to fix the conformation of the polypeptide chain, holding the alpha helix so that it makes specific contacts with the DNA bases.
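The "regular pattern of cysteine and/or histidine residues" can be searched for with a regular expression. This sketch uses a simplified Cys2His2 zinc-finger pattern, C-x(2,4)-C-x(12)-H-x(3,5)-H, where x is any residue; the pattern is an assumption for illustration, since motif databases such as PROSITE use more specific consensus patterns. The test sequence is made up.

```python
import re
from typing import List, Tuple

# Simplified C2H2 zinc-finger pattern (assumption for illustration):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His
C2H2 = re.compile(r"C[A-Z]{2,4}C[A-Z]{12}H[A-Z]{3,5}H")


def find_zinc_fingers(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2.finditer(seq.upper())]


demo = "MACPECGKSFSQSSNLQKHQRTHTGEK"  # hypothetical sequence
print(find_zinc_fingers(demo))  # [(2, 'CPECGKSFSQSSNLQKHQRTH')]
```

The two cysteines and two histidines picked out by the pattern are the residues that would coordinate the zinc atom in a real finger.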

K+ ion channels:  Cells are bounded by membranes, consisting mainly of a lipid bilayer about 4.5 nanometres thick. Protein molecules are embedded in this lipid bilayer. In one form of this arrangement, a protein has a hydrophobic trans-membrane segment and two hydrophilic segments, one projecting outside the cell and one projecting into the interior of the cell. Some trans-membrane proteins, usually formed from alpha helices, cross the membrane several times; in these, the N-terminal and C-terminal ends are in the hydrophilic regions. Ion channels are one important class of trans-membrane protein, and are central to the transmission of the spike along the axon. Most channel proteins have narrow, selective pores specialised in transporting potassium (K+), sodium (Na+), calcium (Ca2+) or chloride (Cl-) ions across the cell membrane, altering the balance of electrical potential between the two sides of the membrane.

The polypeptide chain of one bacterial K+ channel has 158 amino acids and contains two trans-membrane alpha helices; four such chains come together around a central hole, which constitutes the pore through the membrane. Main chain atoms line the walls of the narrow part of this pore, with the oxygen atoms of the backbone C=O groups pointing into the pore and forming binding sites for the K+ ions. The C-terminal helix of each chain faces the central pore, while the N-terminal helix faces the lipid bilayer. At the intracellular end, the pore is a water-filled channel, which widens out into a water-filled cavity in the middle of the membrane. Finally, there is a narrow passage between the cavity and the outside of the cell.

The pore can contain three K+ ions, two in the narrow passage and one in the water-filled cavity. The alpha helices are oriented so that their C-terminal ends, which carry a partial negative charge from the helix dipole, are close to the positively charged K+ ions. The main chain C=O groups provide closely spaced binding sites. The helices are firmly packed by means of their side chains, holding the C=O oxygen atoms in position so that they provide strong binding sites for the potassium ions. The K+ ions are driven through the channel by the difference in electrical potential between the two sides of the membrane and are attracted by the partial negative charges of the oxygen atoms. Within the filter, the ions cascade from one oxygen binding site to the next.
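The electrical driving force on K+ can be estimated with the Nernst equation, E = (RT / zF) ln([K+]out / [K+]in), which gives the membrane potential at which there is no net flow of K+ through an open channel. The concentrations below (5 mM outside, 140 mM inside, 37 °C) are typical textbook values assumed for illustration, not figures from the book.

```python
import math

R = 8.314        # gas constant, J/(mol*K)
F = 96485.0      # Faraday constant, C/mol
TEMP_K = 310.15  # 37 degrees C


def nernst_mv(c_out_mm: float, c_in_mm: float, z: int = 1,
              temp_k: float = TEMP_K) -> float:
    """Equilibrium potential (inside relative to outside) in millivolts."""
    return 1000.0 * (R * temp_k) / (z * F) * math.log(c_out_mm / c_in_mm)


print(f"E_K = {nernst_mv(5.0, 140.0):.1f} mV")  # about -89 mV
```

A negative value means that at equilibrium the inside is negative relative to the outside; whenever the actual membrane potential differs from E_K, K+ flows through the open channel in the direction that moves the potential towards E_K.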

Photosynthetic proteins:  The photosynthetic reaction centre is a large protein complex that is crucial for life on Earth. It converts sunlight into electrical and chemical energy by pumping protons from one side of a membrane to the other, and is itself embedded in a membrane. Each reaction centre is surrounded by about 100 membrane proteins, the antenna pigment-protein molecules, which together contain several hundred chlorophyll molecules that channel the energy of absorbed photons into the reaction centre. The reaction centre comprises four polypeptide chains plus four bacteriochlorophyll molecules, two of which form a dimer known as the ‘special pair’. Two of the polypeptide chains in the reaction centre each have five trans-membrane alpha helices. These two chains have a similar sequence, differing only in some of their loop regions, and their alpha helices account for most of the membrane-spanning part of the complex. The photosynthetic pigments are bound to these two chains, mostly to the trans-membrane helices. The pigments form two possible pathways for electron transfer across the membrane, which meet at the ‘special pair’, whose two molecules are in close contact. They are bound in a hydrophobic pocket.

In photosynthesis, light energy is converted to electrical energy by an electron flow that separates positively and negatively charged molecules. Some molecules absorb photons and use this energy to donate an electron to an electron acceptor molecule. In most cases, there is a rapid flow back from the acceptor to the donor, but in the reaction centre there is a very fast forward reaction and a slow back reaction. This leads to a separation of charge and a consequent storage of energy, because energy would be released if the charges were able to come together again. The authors admit that the detailed mechanism was not understood at the time of writing, which can be related to the later papers in Nature by Engel and by Collini, commented on elsewhere on this site. At this stage, the authors merely note that photosynthetic reaction centres work almost without energy loss, compared with an efficiency of only 20% for man-made solar cells.

All photosynthetic organisms have a system of light-harvesting complexes that surround the reaction centre and increase the light-capturing area. The reaction centres receive most of their light energy from these complexes rather than from their own chlorophyll molecules. In bacteria, the pigments are bound to hydrophobic protein molecules embedded in the membrane, whereas light-harvesting in plants uses a different structure.

In purple bacteria, one light-harvesting complex, the inner antenna, is directly associated with the reaction centre, with its ring of pigments thought to surround it, while there is a second, peripheral antenna. Both antennae are built from two kinds of small polypeptide chain, called alpha and beta. There is a downhill flow of energy from the light-harvesting proteins to the reaction centre. When a photon is absorbed by any of the chlorophyll molecules in a ring, the excitation becomes delocalised over all the chlorophyll molecules of the ring.

Trans-membrane helices in the reaction centre are embedded in a hydrophobic environment, and are thus built up from continuous stretches of hydrophobic amino acids. About 20 amino acids are required for a helix to span the lipid bilayer. In the reaction centre, some residues extend outside the hydrophobic membrane, and their positions may be determined by the loop regions. Binding pockets for ligands can be located between the membrane-spanning helices.
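Both points can be checked with simple arithmetic and a hydropathy scan. At the idealized alpha-helix rise of 1.5 Å per residue, 20 residues span about 30 Å, roughly the thickness of the hydrophobic core of a lipid bilayer. Continuous hydrophobic stretches are commonly located with a sliding-window hydropathy plot; this sketch uses the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6, values suggested by Kyte and Doolittle for membrane-spanning segments. It is a rough screen rather than a reliable predictor, it only accepts the 20 standard one-letter codes, and the demo sequence is made up.

```python
from typing import List, Tuple

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
RISE_PER_RESIDUE_A = 1.5  # idealized alpha helix


def hydropathy(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge above-threshold windows into (start, end) ranges, end exclusive."""
    segments: List[Tuple[int, int]] = []
    for i, h in enumerate(hydropathy(seq, window)):
        if h < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


print(20 * RISE_PER_RESIDUE_A, "angstrom spanned by a 20-residue helix")
seq = "MSKDERNQ" + "LLIVALFGLVAVILAIGLL" + "KRDESNQT"  # hypothetical
print(candidate_tm_segments(seq))
```

The reported range is wider than the hydrophobic stretch itself (residues 8 to 26 here), because any window that overlaps it enough can exceed the threshold.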

G-proteins and intracellular signalling:  Many signal receptors are membrane proteins that bind specific extracellular molecules, including neurotransmitters, growth factors and hormones. Binding gives rise to signals that can be amplified by cascades of enzyme reactions with many different functions within the cell. There are three main types of membrane receptor: ion channel receptors, G-protein-coupled receptors and enzyme-linked receptors. The receptors have an extracellular domain that recognises molecular signals and an intracellular region that produces a response, while G-proteins act as molecular amplifiers inside the cell.

Sensory responses are often linked to G-proteins. The receptors involved have seven helices spanning the membrane. The intracellular signals are amplified by a family of proteins called G-proteins, so called because they bind guanine nucleotides. They act as molecular switches: they are active when GTP is bound and inactive when GDP is bound. In the active state, G-proteins activate downstream effector molecules, thus amplifying the signal. Further molecules can bind to the G-protein to accelerate the hydrolysis of GTP to GDP, which switches it off. Most G-proteins are trimers consisting of an alpha, a beta and a gamma unit. The trimer is anchored to the membrane by lipids that are covalently bound to the N-terminal region of the alpha unit and to the C-terminal region of the gamma unit. The beta unit is bound to the gamma unit by a coiled-coil structure, although the beta unit is not itself anchored directly to the lipids. This lipid anchoring facilitates the functioning of the G-proteins. The alpha unit has two domains, one of which is catalytically active, and it switches between an active GTP-bound form and an inactive GDP-bound form.

In its inactive state, the G-protein is bound to the intracellular domain of the trans-membrane receptor. When a specific ligand binds to the external domain of the receptor, the signal activates the intracellular domain, which in turn activates the G-protein. The G-protein transmits the signal to effector proteins, also embedded in the membrane, and these in turn generate second messenger molecules such as cyclic AMP.
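The amplification in such a cascade is multiplicative: if each active molecule at one stage activates several molecules at the next, the overall gain is the product of the per-stage gains. The stage names follow the text; the gain values are hypothetical placeholders, not measured figures.

```python
from math import prod
from typing import List, Tuple

# Hypothetical per-stage gains: molecules activated per active upstream molecule
CASCADE: List[Tuple[str, float]] = [
    ("receptor -> G-proteins", 10.0),
    ("G-protein -> effector enzymes", 1.0),
    ("effector enzyme -> cyclic AMP", 100.0),
]


def overall_gain(stages: List[Tuple[str, float]]) -> float:
    """Total amplification of a linear cascade = product of stage gains."""
    return prod(gain for _, gain in stages)


print(overall_gain(CASCADE))  # 1000.0
```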

G-protein signalling depends on a cycle between a resting state, in which the G-protein is bound to GDP, and an active state, in which it is bound to GTP. This process involves hydrogen bonds between the G-protein and the GTP. Essentially all the molecular mechanisms described above stem from the trimeric G-proteins. The complicated processes of signalling in cells are assisted by small modules within the polypeptide chains that guide protein catalysts to their target molecules.
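The switching cycle can be modelled as a small state machine. This sketch follows one G-protein alpha unit through the steps described in the text: resting with GDP bound, activated by a ligand-bound receptor (GDP exchanged for GTP), signalling to effectors while GTP is bound, and switched off by hydrolysis of GTP to GDP. The class and method names are hypothetical, and the model ignores kinetics and the beta and gamma units.

```python
from enum import Enum


class State(Enum):
    INACTIVE_GDP = "inactive, GDP bound"
    ACTIVE_GTP = "active, GTP bound"


class GProtein:
    """Minimal model of the GDP/GTP switch of a G-protein alpha unit."""

    def __init__(self) -> None:
        self.state = State.INACTIVE_GDP
        self.signals_sent = 0

    def receptor_activated(self) -> None:
        """A ligand-bound receptor promotes exchange of GDP for GTP."""
        if self.state is State.INACTIVE_GDP:
            self.state = State.ACTIVE_GTP

    def signal_effector(self) -> bool:
        """Only the GTP-bound form activates downstream effectors."""
        if self.state is State.ACTIVE_GTP:
            self.signals_sent += 1
            return True
        return False

    def hydrolyse_gtp(self) -> None:
        """GTP -> GDP + phosphate returns the switch to its resting state."""
        if self.state is State.ACTIVE_GTP:
            self.state = State.INACTIVE_GDP


g = GProtein()
print(g.signal_effector())  # False: resting state
g.receptor_activated()
print(g.signal_effector())  # True
g.hydrolyse_gtp()
print(g.state.value)        # inactive, GDP bound
```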
