Protein Families

Overview

Protein families are groups of homologous proteins; that is, proteins that share a common evolutionary origin and therefore have similar amino acid sequences and three-dimensional structures. Protein families usually arise through gene duplication, in which an additional copy of a gene is inserted into the genome of an organism. Mutations that change the amino acid sequence but still allow the protein to be properly synthesized can give rise to new protein family members. If these new proteins retain similar amino acids in key locations, their protein domains, and possibly their overall three-dimensional structure, can remain similar. Proteins within a family can have as little as 30% amino acid sequence identity but still perform related functions.

Protein Superfamilies

Protein superfamilies are larger groups of proteins that have evolved from a more distant common ancestor. Their members generally have lower sequence similarity than the members of a protein family but still have significant structural features in common. Each superfamily can contain several protein families with more closely related structures and functions. Some larger families are further divided into subfamilies. The exact boundaries between superfamily, family, and subfamily vary between classification systems and continue to shift as the amount of protein sequence and structural data grows.

The immunoglobulin superfamily (IgSF) is one of the largest protein superfamilies; over 700 superfamily members are found in the human genome. All members of the superfamily contain one or more immunoglobulin (Ig) domains. This domain has a characteristic three-dimensional structure composed of a sandwich of two anti-parallel beta-sheets. Most IgSF members are involved in cell adhesion or ligand binding. The IgSF contains many families, including antigen receptors, cell adhesion molecules (CAMs), cytoskeletal proteins, and several growth-factor and cytokine receptor groups. Several of the larger families are further divided into subfamilies. The antigen receptor family can be divided into the antibody (immunoglobulin) family and the T-cell receptor family; the CAMs can be divided into the NCAM, ICAM, and CD2-related protein families.

Classification Databases

Protein family classifications allow scientists to understand functional and evolutionary relationships between proteins. Several online resources can be used to search for known protein families or classify newly discovered proteins. Pfam is one of several online databases where a scientist can search for known proteins and their family members. A researcher can also enter the amino acid sequence of a newly discovered protein to see whether it might belong to a known protein family based on sequence similarity. This can provide a testable hypothesis about the possible role of the novel protein, as family members often have similar structures and functions.

Procedure

A protein family is a group of proteins that have evolved from a common genetic ancestor. These proteins have similarities in their three-dimensional structures and the functions they perform.

Proteins within a family are called homologs.

Orthologs are homologous proteins in different species that evolved from the same protein in a common ancestor. Generally, orthologous proteins perform similar functions in the different species.

Alpha hemoglobin in humans and alpha hemoglobin in mice are orthologous proteins. In contrast, paralogs are homologous proteins produced by gene duplication and can be present in the same or different species. Alpha and beta hemoglobin in humans are paralogous proteins.
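The distinction between orthologs and paralogs can be sketched in code. The example below is a minimal illustration, assuming a hand-written toy gene tree in which each internal node is labeled with the event that split its branches; the tree, the leaf names, and the `relationship` function are hypothetical and are not taken from any database or library. Two proteins are treated as orthologs if their most recent common ancestor node is a speciation event, and as paralogs if it is a duplication event.

```python
# Toy gene tree for the globin example (an illustrative assumption,
# not real phylogenetic data). Leaves are strings; internal nodes are
# dicts that record the event separating their children.
TREE = {
    "event": "duplication",  # ancestral globin gene duplicated into alpha and beta
    "children": [
        {"event": "speciation", "children": ["human_alpha", "mouse_alpha"]},
        {"event": "speciation", "children": ["human_beta", "mouse_beta"]},
    ],
}


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node["children"]:
        found |= leaves(child)
    return found


def relationship(node, a, b):
    """Classify two distinct leaves as 'orthologs' or 'paralogs'."""
    if a == b or isinstance(node, str):
        raise ValueError("need two distinct proteins under an internal node")
    # If both proteins sit under the same child, their common ancestor is deeper.
    for child in node["children"]:
        under_child = leaves(child)
        if a in under_child and b in under_child:
            return relationship(child, a, b)
    under_node = leaves(node)
    if a not in under_node or b not in under_node:
        raise ValueError("protein not found in tree")
    # This node is the most recent common ancestor; its event decides the label.
    return "orthologs" if node["event"] == "speciation" else "paralogs"


print(relationship(TREE, "human_alpha", "mouse_alpha"))
print(relationship(TREE, "human_alpha", "human_beta"))
print(relationship(TREE, "human_alpha", "mouse_beta"))
```

In this toy tree, human and mouse alpha hemoglobin come out as orthologs, while alpha and beta hemoglobin come out as paralogs whether they are taken from the same species or from different ones.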

A superfamily is composed of two or more families that have evolved from a more distant common ancestor than that of a protein family. Proteins in a superfamily have larger variations in structure and function.

Protein families can help us form a hypothesis about a protein whose amino acid sequence is known but whose shape or function is not. Such a protein can be compared to other proteins in the family to help predict its three-dimensional structure and determine its function.

Protein families often arise through gene duplication, when genetic mechanisms in an organism create an extra copy of a gene. The original copy of the gene and its duplicate can then mutate independently and diverge in function, as their gene sequences come to code for amino acid sequences that differ from that of their genetic ancestor.
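Duplication followed by divergence can be illustrated with a small simulation. This is a deliberately simplified sketch: it assumes random substitutions only, with no insertions, deletions, or natural selection, and the function names, mutation rate, and ancestor sequence are all made up for illustration rather than taken from real data.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard residues


def mutate(sequence, rate, rng):
    """Replace each residue, with probability `rate`, by a different residue."""
    mutated = []
    for residue in sequence:
        if rng.random() < rate:
            choices = [aa for aa in AMINO_ACIDS if aa != residue]
            mutated.append(rng.choice(choices))
        else:
            mutated.append(residue)
    return "".join(mutated)


def duplicate_and_diverge(ancestor, rate, generations, seed=0):
    """Duplicate an ancestral sequence and let both copies mutate independently."""
    rng = random.Random(seed)
    copy_a, copy_b = ancestor, ancestor  # the duplication event
    for _ in range(generations):
        copy_a = mutate(copy_a, rate, rng)
        copy_b = mutate(copy_b, rate, rng)
    return copy_a, copy_b


def identity(seq_a, seq_b):
    """Percent of positions with identical residues (equal-length sequences)."""
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


ancestor = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence, not a real protein
copy_a, copy_b = duplicate_and_diverge(ancestor, rate=0.02, generations=20)
print(f"identity between the two copies: {identity(copy_a, copy_b):.1f}%")
```

Because only substitutions are modeled, the two copies keep the same length and can be compared position by position. Real homologs usually require a sequence alignment before identity can be measured.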

Proteins within a family may have sequence identity as low as 30%; that is, only 30% of the residues in their primary amino acid sequences may be identical. However, the overall protein structure and the domains within the protein are often highly similar.

For example, hemoglobin and myoglobin are proteins that are thought to have evolved by gene duplication. The α and β subunits of hemoglobin have only 49% amino acid sequence identity but share the same general pattern of secondary and tertiary structure; that is, their alpha helices and the turns of their amino acid chains occupy similar positions.

The hemoglobin alpha subunit and myoglobin have only 26% sequence identity but also have similar secondary and tertiary structures. All three proteins can bind oxygen through a heme group as these homologous proteins are members of an oxygen-binding protein family.
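Percent identity figures like those quoted above are computed from a sequence alignment. The sketch below shows one common way to do the arithmetic, assuming the two sequences have already been aligned to equal length with `-` marking gaps. The function name and the toy sequences are hypothetical, and counting only the columns where neither sequence has a gap is just one of several conventions in use, so results can differ between tools.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which the residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy pre-aligned sequences (made up for illustration, not real globins).
seq_a = "MKV-LSA"
seq_b = "MKILLTA"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

In this toy alignment, six columns are gap-free and four of them match, giving about 66.7% identity.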