ISSN: 0974-276X
Research Article - (2009) Volume 2, Issue 7
More than 30 years have passed since the initial report of the discovery of ubiquitin as an 8.5 kDa protein of unknown function expressed universally in living cells. Even so, protein modification by covalent conjugation of ubiquitin remains one of the most dynamic posttranslational modifications studied in terms of biochemistry and cell physiology. Ubiquitination plays a central regulatory role in a number of eukaryotic cellular processes such as receptor endocytosis, growth-factor signaling, cell-cycle control, transcription, DNA repair, gene silencing, and stress response. Ubiquitin conjugation is a three-step concerted action of the E1, E2 and E3 enzymes that produces a modified protein. In this review we examine studies undertaken to identify both ubiquitin and SUMO (small ubiquitin-related modifier) substrates, with the goal of understanding how lysine selectivity is achieved. The SUMOylation pathway, though distinct from that of ubiquitination, has many parallels with it. Based on recent findings, we present a model to explain how an individual ubiquitin ligase may target specific lysine residue(s) with the cooperation of a scaffold protein.
Keywords: Mass spectrometry, Ubiquitination, SUMOylation, Ubl, E3 ligase, Sequence motif.
Ub: Ubiquitin; MS: Mass Spectrometry; Ubl: Ubiquitin-Like; UBD: Ubiquitin-Binding Domain; UIM: Ubiquitin-Interacting Motif; UBA: Ubiquitin-Associated; RING: Really Interesting New Gene; TrkA: Tyrosine kinase receptor A; NRIF: Neurotrophin Receptor Interacting Factor.
Ubiquitination was originally described as a mechanism by which cells dispose of short-lived, damaged or abnormal proteins. However, its involvement in diverse cellular processes is coming to light, and its importance is now considered to rival that of phosphorylation. Ubiquitination is an ATP-requiring process, and at the center of this modification is ubiquitin, a 76-amino acid (~9 kDa) protein (Figure 1) that is highly conserved across eukaryotes and is synthesized as a fusion protein, either to itself or to one of two ribosomal proteins (Schlesinger et al., 1987). Conjugation involves attachment of the C-terminal glycine of ubiquitin (Ub) to the ε-amino group of a lysine residue in the targeted protein. The conserved conjugation reaction is achieved by the sequential actions of three enzymes (Hershko et al., 1998). The reaction commences with the formation of a thiol-ester linkage between the glycine residue at the C terminus of Ub and the active-site cysteine (Cys) residue of the first enzyme of the system, the Ub-activating enzyme (commonly referred to as E1). The ubiquitin molecule is then transferred to the cysteinyl group of the second enzyme, the Ub-conjugating enzyme (E2). Lastly, through the action of a Ub ligase (E3), ubiquitin and the marked substrate are linked together via an amide (isopeptide) bond. The ability of an E3 to recognize and bind both the target substrate and the Ub-E2 enzyme suggests that this enzyme provides specificity to the Ub reaction. At this point, the ubiquitination reaction may result in the addition of a single Ub molecule to a single target site, termed mono-ubiquitination (Figure 2). Alternatively, single molecules of ubiquitin may be added to other Lys residues in the target protein, giving rise to multi-ubiquitination. After the initial ubiquitin is conjugated to a substrate, it can also be conjugated to another molecule of ubiquitin through one of its seven lysines.
An isopeptide bond is formed between Gly76 of one ubiquitin and the ε-NH2 group of one of the seven potential lysines (K6, K11, K27, K29, K33, K48 or K63) of the preceding ubiquitin, giving rise to many different types of poly-ubiquitinated proteins (Adhikari and Chen, 2009). These poly-ubiquitin chains can vary in length with respect to the number of ubiquitin molecules, resulting in different topologies and, ultimately, different functional consequences. For example, Lys48-linked polyubiquitination primes proteins for proteolytic destruction by the proteasome (Chau et al., 1989), whereas Lys63-linked polyubiquitination plays a key role in regulating processes such as DNA repair (Spence et al., 1995; Hofmann and Pickart, 1999), stress responses (Arnason and Ellison, 1994), signal transduction (Sun and Chen, 2004; Mukhopadhyay and Riezman, 2007), and intracellular trafficking of membrane proteins (Hicke, 1999; Geetha et al., 2005; Mukhopadhyay and Riezman, 2007). Proteins tagged with ubiquitin are most often destined for degradation by the proteasome. Recent studies reveal that all non-K63 linkages may target proteins for degradation (Xu et al., 2009). However, this is still a matter of debate, since K63 chains have also been shown to serve as a targeting signal for the 26S proteasome (Seibenhener et al., 2004; Saeki et al., 2009). Both mono-ubiquitination and poly-ubiquitination also have non-proteasomal regulatory functions, such as targeting proteins to the nucleus, cytoskeleton and endocytic machinery, or modulating enzymatic activity and protein-protein interactions (Hershko et al., 1998; Pickart, 2001). Recent reports have indicated that non-lysine moieties can also serve as ubiquitin acceptor sites. Ubiquitination at a noncanonical site, the N terminus, has been reported for the transcription factor MyoD, the latent membrane protein-1 of Epstein-Barr virus, and p21, in each case leading to proteasome-mediated degradation (Aviel et al., 2000; Breitschopf et al., 1998; Bloom et al., 2003).
Moreover, studies have shown that a cysteine residue is required for ubiquitination of major histocompatibility complex class I proteins by viral E3 ligases (Cadwell and Coscoy, 2005).
Figure 1: Ubiquitination reaction. The protein substrate is ubiquitinated in a reaction involving three types of ubiquitinating enzymes: the ubiquitin-activating protein E1, a ubiquitin carrier protein E2, and a ubiquitin-protein ligase E3. Following addition of a single ubiquitin molecule to a protein substrate (monoubiquitination), further ubiquitin molecules can be added to the first, yielding a polyubiquitin chain. The fate of the protein depends on the type of ubiquitin chain formed on the protein substrate.
Figure 2: Ubiquitin modifications. A) Mono-ubiquitination is involved in transcription, histone function, endocytosis and membrane trafficking. B) Multi-monoubiquitination is involved in protein regulation. C) Polyubiquitination is involved in signal transduction, endocytosis, DNA repair, stress response, and targeting proteins to the proteasome.
Like other posttranslational modifications (e.g. phosphorylation), ubiquitination is a highly regulated and reversible process. It is controlled by the opposing activities of the E3 ubiquitin-protein ligases, which attach Ub molecules covalently to target proteins, and the de-ubiquitinating enzymes (DUBs), which remove ubiquitin from target proteins (Wilkinson et al., 1997). Reversible covalent modification allows cells to convey signals rapidly and efficiently across different sub-cellular locations. It has been predicted that the human genome encodes three Ub-protein E1 enzymes, about fifty Ub-protein E2 conjugating complexes, over 600 ubiquitin ligases and about 100 DUBs (Kaiser and Huang, 2005).
Lysine residues are targets for diverse posttranslational modification enzymes, which attach methyl, acetyl, hydroxyl, ubiquitin or SUMO moieties to them. Except for hydroxylation, all of these attachments are reversible. In addition to ubiquitin, several ubiquitin-like proteins (Ubls) can be conjugated at lysine residues to alter the function of the substrate proteins. These small molecular modifiers include NEDD8 (neural precursor cell expressed, developmentally down-regulated 8), ISG15 (interferon-stimulated gene 15), FAT10, FUB1 (FBR-MuSV associated ubiquitously expressed gene), UBL5 (ubiquitin-like 5), URM1 (ubiquitin-related modifier 1), ATG8 (autophagy associated protein 8), ATG12 (autophagy associated protein 12), and three SUMO isoforms, to which ubiquitin bears much resemblance (Kerscher et al., 2006). However, conjugation of each of these Ubls requires its own unique combination of E1, E2 and E3 enzymes, and addition of these tags to the target protein likely serves a different function compared with ubiquitination. These protein tags have been implicated in numerous cellular activities, including DNA synthesis and repair, transcription, translation, organelle biogenesis, cell cycle control, signal transduction, protein quality control in the endoplasmic reticulum, and immune function (Kerscher et al., 2006). The different Ubls are activated and conjugated to their substrates by a process very similar to the biochemical reactions of ubiquitination. All the structurally characterized Ubls share the ubiquitin or β-grasp fold, even when their primary sequences have little similarity (Kerscher et al., 2006).
Like several other posttranslational modifications, ubiquitination changes the molecular conformation of a protein, thereby influencing protein-protein interactions. Ubiquitin modification is known to alter protein localization, activity and/or stability through interaction with various proteins. These modifications on the target protein (either monoubiquitination or polyubiquitination) act as attachment sites for proteins with ubiquitin-binding domains (UBDs) (Bertolaet et al., 2001; Wilkinson et al., 2001). The first UBD was characterized in a proteasome subunit, the S5a/RPN10 protein. Similarity searches with a short sequence of S5a that binds ubiquitin led to the identification of a sequence pattern known as the ubiquitin-interacting motif (UIM) (Hofmann and Falquet, 2001). The ubiquitin-associated domain (UBA) was identified as a common sequence motif present in multiple proteins participating in ubiquitin-dependent signaling pathways (Hofmann and Bucher, 1996). Of the sixteen UBDs reported to date, the discovery of the UIM and UBA domains was the most important, as it propelled the study of ubiquitination. Both UBA and UIM domains are known to bind mono-ubiquitin as well as poly-ubiquitin chains. The other ubiquitin-binding domains form a diverse family of structurally dissimilar protein domains, such as MIU, DUIM, CUE, GAT, NZF, A20 ZnF, UBP ZnF, UBZ, Ubc, Uev, UBM, GLUE, Jab1/MPN, and PFU (Hurley et al., 2006). Many UBA-containing proteins are reported to bind polyubiquitin chains, and some serve as shuttling factors that deliver ubiquitinated proteins to the proteasome (e.g. hHR23A, p62 and Dsk2) (Seibenhener et al., 2004). This function is thought to be achieved by binding of the UBA domain to the ubiquitinated substrate while another domain (such as a Ubl domain) simultaneously interacts with the proteasome (Seibenhener et al., 2004).
Ubiquitin-protein ligases (E3s) are the last (but likely the most important) components in the ubiquitin conjugation system because they play an important role in controlling target specificity. The E3s recruit target proteins, position them for optimal transfer of the Ub moiety from the E2 to a lysine residue in the target protein, and initiate the conjugation. Ubiquitin E3 ligases can be either monomeric proteins or multimeric complexes, and the most common types are grouped into two classes depending on their modular architecture and catalytic mechanism. E3s containing a HECT domain (Homologous to E6-AP C Terminus) typically form a direct thioester bond with ubiquitin. Their approximately 350-amino acid HECT domains contain a conserved Cys residue that participates in the direct transfer of activated ubiquitin from the E2 to a target protein (Hershko et al., 1998; Pickart, 2001). In contrast, the RING (Really Interesting New Gene) finger domain consists of Cys and His residues that coordinate two Zn2+ ions. The globular architecture of the domain primarily functions as a scaffold for the interaction of E2s with their target proteins (Hershko et al., 1998; Pickart, 2001). These ligases rely on a structural and/or catalytic motif that facilitates ubiquitination without directly forming a bond with ubiquitin. RING finger domain-containing E3s comprise the largest ligase family and include both monomeric and multimeric ubiquitin ligases. There are three types of multisubunit E3s in which a small RING finger protein is an essential component: SCF (Skp1-Cullin-F-box protein), the APC, and the VHL (von Hippel Lindau protein) E3(s). A lesser-known family of Ub E3 ligases, the U-box adaptor E3 ligases, contains an E2-binding domain called the U-box. The U-box was first identified in yeast Ufd2, which acts as an accessory protein (E4) promoting polyubiquitination of another E3's substrate (Kuhlbrodt et al., 2005).
Bioinformatics studies placed the U-box ligases among the conventional RING E3 ligases, as they adopt a RING domain-like conformation via electrostatic interactions (Aravind and Koonin, 2000). Genome-wide annotation of the human E3 superfamily genes (Li et al., 2008) revealed that the number of putative E3 genes, 617, is greater than the number of human genes for protein kinases, 518, which suggests how extensive the biological targets of ubiquitination are.
Substrate Selection for Ubiquitination
One salient question is what determines whether or not a protein is tagged by Ub. Although this cannot yet be fully answered, recent research has uncovered some interesting clues. It has been proposed that proteins contain an “embedded code” that is recognized by the Ub machinery (Figure 3). For example, E3 ubiquitin ligases recognize their corresponding protein substrates via a variety of structural determinants, including primary sequence, post-translational modifications and protein folding state. Herein, we consider some of the examples discovered thus far for directing target specificity.
The N-end Rule
There exists a correlation between the half-life of a protein and its N-terminal residue (Bachmair et al., 1986). The stability of a protein depends on the nature of its N-terminal amino acid residue, which is classified as either stabilizing or destabilizing. Proteins with N-terminal Met, Ser, Ala, Thr, Val, or Gly are known to have half-lives greater than 20 hours. In contrast, proteins with N-terminal Phe, Leu, Asp, Lys, or Arg have half-lives of 3 min or less. The N-end rule pathway is a proteolytic pathway that targets proteins for degradation through destabilizing N-terminal residues (N-degrons). An N-degron consists of a protein's destabilizing N-terminal residue and an internal Lys residue. E3 Ub ligases that recognize these N-degrons are called N-recognins, which share a ≈70-residue motif called the UBR box. UBR1 (also known as E3α) is the recognition component of the N-end rule pathway; it binds to a destabilizing N-terminal residue of a substrate protein and participates in the formation of a substrate-linked polyubiquitin chain. Mutations in human UBR1 have been associated with Johansson–Blizzard Syndrome (JBS), which includes mental retardation, physical malformations and pancreatic dysfunction (Zenker et al., 2005). The N-end rule has a hierarchical structure in which primary, secondary and tertiary destabilizing N-terminal residues participate differentially based on their requirements for enzymatic modification. Recent studies have shown that although the N-end rule pathways in prokaryotes and eukaryotes employ distinct proteolytic machineries, they share common principles of substrate recognition (Mogk et al., 2007). The processes that control the N-end rule have just begun to be unraveled, and only a few in vivo substrates have been identified.
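The residue classes described above can be expressed as a simple lookup. The following is a minimal sketch, assuming only the two residue lists given in the text; the function names are hypothetical, residues not named in the text are reported as unclassified, and the hierarchical (primary, secondary, tertiary) structure of the rule is not modeled.

```python
# Residue classes exactly as listed in the text (one-letter codes).
# Residues the text does not mention are deliberately left unclassified.
STABILIZING = set("MSATVG")    # Met, Ser, Ala, Thr, Val, Gly
DESTABILIZING = set("FLDKR")   # Phe, Leu, Asp, Lys, Arg


def classify_n_terminus(sequence: str) -> str:
    """Classify a protein by its N-terminal residue."""
    if not sequence:
        raise ValueError("empty sequence")
    first = sequence[0].upper()
    if first in STABILIZING:
        return "stabilizing"
    if first in DESTABILIZING:
        return "destabilizing"
    return "unclassified"


def has_candidate_n_degron(sequence: str) -> bool:
    """True if the sequence has a destabilizing N-terminal residue and at
    least one Lys after position 1 (a simplification of 'internal Lys')."""
    seq = sequence.upper()
    return classify_n_terminus(seq) == "destabilizing" and "K" in seq[1:]
```

A sequence-only check like this can flag candidates at most; whether a protein is a genuine in vivo substrate still has to be established experimentally.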
PEST Sequences
Particular amino acid sequences within a polypeptide act as proteolytic recognition signals. Analysis of sequence motifs in rapidly degraded proteins led Rogers and colleagues to identify stretches rich in proline (P), glutamate (E), serine (S), and threonine (T) (and, to a lesser extent, aspartic acid) that serve as a destruction signal, the so-called "PEST sequences" (Rogers et al., 1986). Ubiquitination of proteins by multisubunit ligases consisting of Ubc3/Cdc34, Skp1, cullin/Cdc53 and F-box proteins has been shown to be preceded by phosphorylation within the PEST motif (Feldmann et al., 1997). Furthermore, phosphorylation of Ser or Thr residues in the PEST regions of proteins has been shown to activate their recognition and processing by the ubiquitin-proteasome pathway (Yaglom et al., 1995; Lanker et al., 1996; Willems et al., 1996; Won and Reed, 1996).
D-box and the KEN Box
Short sequence motifs are by far the primary signals for degradation. This specific degradation mechanism is involved in regulating cell cycle proteins. Ubiquitination of mitotic cyclins is mediated by a small NH2-terminal motif known as the "destruction box" or “D-box” (Glotzer et al., 1991). The minimal motif is nine residues long, with the following consensus sequence: R-A/T-A-L-G-X-I/V-G/T-N. The destruction box, while either phosphorylated or ubiquitinated, serves as a binding site for the ligase subunit of the APC/cyclosome complex. Deletion experiments suggested that the NH2-terminal sequences of cyclin B (the first 90 residues in sea urchins (Murray and Kirschner, 1989) and the first 72 in humans (Lorca et al., 1992)) play a critical role in targeting cyclins for degradation. The resistance of the truncated proteins to degradation indicated that the NH2-terminal portion of cyclin interacts with the destruction machinery. Mutations in the D-box of cyclins severely reduce or abolish their ubiquitination (Glotzer et al., 1991; Lorca et al., 1992; Amon et al., 1994; Stewart et al., 1994). Moreover, the cyclin B destruction box is portable: chimeras in which the N-terminus of cyclin B is fused to other proteins are rapidly degraded.
A new targeting signal, the KEN box, present in Cdc20, was identified by Pfleger and Kirschner (2000). Mutation studies identified four key residues necessary for substrate recognition in the motif K-E-N-X-X-X-N (aspartic acid in the final position supported polyubiquitination similar to that seen with asparagine). Active KEN boxes have been reported within other proteins and, like D-boxes, are transposable to other proteins. Both the D-box and the KEN box are recognized by Cdh1 and/or Cdc20, which subsequently recruit the APC/cyclosome complex, leading to ubiquitination and proteasome-mediated degradation of the target protein. The D-box is recognized by both Cdc20 and Cdh1, whereas the KEN box is preferentially recognized by Cdh1. Cdc20 itself contains a KEN box, which is therefore recognized by Cdh1, ensuring the temporal degradation of Cdc20.
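The two consensus sequences quoted in this section translate directly into regular expressions. The sketch below is illustrative only: it assumes X means any residue, accepts N or D in the last KEN-box position as described above, and uses a hypothetical function name and made-up test sequences. A match indicates a candidate motif, not a demonstrated degron.

```python
import re

# D-box consensus from the text: R-A/T-A-L-G-X-I/V-G/T-N (X = any residue).
D_BOX = re.compile(r"R[AT]ALG[A-Z][IV][GT]N")
# KEN box from the text: K-E-N-X-X-X-N, with D tolerated in the last position.
KEN_BOX = re.compile(r"KEN[A-Z]{3}[ND]")


def find_degrons(sequence: str):
    """Return (motif name, 1-based start, matched residues) for every
    non-overlapping D-box or KEN-box match, ordered by position."""
    seq = sequence.upper()
    hits = []
    for name, pattern in (("D-box", D_BOX), ("KEN-box", KEN_BOX)):
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequences, invented for illustration.
print(find_degrons("MAARTALGDIGNKK"))   # one D-box starting at residue 4
print(find_degrons("AAKENSTQNAA"))      # one KEN box starting at residue 3
```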
Sugar Recognition
N-glycans were recently found to act as ubiquitination signaling molecules. It was demonstrated that Fbx2, a component of a large SCF-type E3 ubiquitin ligase complex, specifically binds N-linked glycoproteins and ubiquitinates them, leading to degradation via the endoplasmic reticulum-associated protein degradation (ERAD) pathway (Yoshida et al., 2002). Fbx2 recognizes high-mannose glycans on its substrates to eliminate glycoproteins in neuronal cells. In yeast, the HRD/DER pathway is the main ubiquitination system known to be involved in ERAD. More E3 ligases outside the HRD/DER pathway that target their substrates by sugar recognition are being identified (Yoshida, 2003).
Hydroxyproline
Hypoxia inducible factor-1 (HIF1) is a heterodimeric transcription factor, composed of alpha and beta subunits, which responds to changes in cellular oxygen content. In the presence of oxygen, HIF1α is targeted for destruction by the E3 Ub ligase VHL. Human VHL protein recognizes and binds to the conserved hydroxylated proline 564 in the alpha subunit (Ivan et al., 2001). Prolyl-hydroxylation of HIF1α by HIF prolyl-hydroxylase is the key regulator of the interaction between the VHL ligase and HIFα (Jaakkola et al., 2001). HIF1 is known to play a key role in various cellular responses to hypoxia, such as the regulation of genes involved in energy metabolism, angiogenesis, and apoptosis. Thus, the absolute requirement of prolyl-hydroxylase for dioxygen as a co-substrate suggests that HIF1 is a master regulator of metabolic adaptation to hypoxia in vivo (Semenza, 2000).
Protein Misfolding
Molecular chaperones are known to bind misfolded or unfolded proteins to prevent protein aggregation. They either catalyze the refolding of the protein through an ATP-dependent mechanism (if feasible) or target these misfolded proteins for ubiquitination. CHIP (C-terminus of Hsc70-interacting protein) is an excellent example of the U-box E3 ligase family, as it targets misfolded proteins (Connell et al., 2001; Jiang et al., 2001). Molecular chaperones such as the heat shock proteins Hsp70 and Hsp90 work in concert with co-chaperones such as CHIP to promote substrate degradation. CHIP is the E3 ubiquitin ligase responsible for the ubiquitination of misfolded Hsp70 substrates such as the serine/threonine kinase Raf-1, glucocorticoid receptor, tau and immature CFTR proteins (Connell et al., 2001; Shimura et al., 2004; Petrucelli et al., 2004; Jiang et al., 2001).
Phosphorylation Based
Additionally, studies have revealed that a specific ubiquitin ligase recognizes phosphorylated IκBα (pIκBα) through a short peptide stretch composed of a 6-aa motif, DS(PO3)GXXS(PO3). The high conservation of this region suggests a well-defined E3 recognition motif. A similar motif is also present in β-catenin, and mutating any of the conserved residues within these recognition sites results in stabilization of both IκBs and β-catenin. A lysine residue, located 9–12 aa N-terminal to the recognition site, is also conserved between IκBs and β-catenin, suggesting that a single enzyme mediates both the recognition and conjugation of ubiquitin to these substrates via two functional sites residing in one or two distinct proteins (Hunter, 2007).
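The arrangement described here, a six-residue recognition motif with a lysine 9–12 residues upstream, can be searched for in a sequence. The following is a rough sketch under stated assumptions: the function name and the toy sequence are hypothetical, X is taken to mean any residue, and the 9–12 aa distance is measured from the lysine to the first residue (D) of the motif, which the text does not specify. Sequence alone cannot show whether the serines are phosphorylated.

```python
import re

# Sequence backbone of the DS(PO3)GXXS(PO3) motif; phosphorylation state
# cannot be read from the primary sequence and is not modeled here.
DEGRON = re.compile(r"DSG[A-Z]{2}S")


def find_phosphodegrons(sequence: str, min_offset: int = 9, max_offset: int = 12):
    """Return (1-based motif start, motif, [1-based Lys positions]) where the
    lysines lie min_offset..max_offset residues N-terminal to the motif's D.
    The way the offset is measured is an assumption made for illustration."""
    seq = sequence.upper()
    results = []
    for match in DEGRON.finditer(seq):
        start = match.start()            # 0-based index of the D
        lo = max(0, start - max_offset)  # furthest upstream index to inspect
        hi = start - min_offset          # closest upstream index to inspect
        lysines = [i + 1 for i in range(lo, hi + 1) if seq[i] == "K"]
        results.append((start + 1, match.group(), lysines))
    return results


# Toy sequence: Lys at position 3, motif starting at position 13 (offset 10).
toy = "MAK" + "A" * 9 + "DSGLDSM"
print(find_phosphodegrons(toy))
```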
Altogether, these studies illustrate the diversity of the determinants used by individual Ub E3 ligases. Thus, there is a need to focus on a single Ub E3 ligase system to understand how individual ligases select their targets for modification and achieve site specificity. Numerous large-scale studies have been undertaken to identify ubiquitinated substrates. However, the identification of ubiquitinated lysines has proven to be difficult for many proteins.
Approaches Taken to Identify Ubiquitinated Proteins
There is a need for novel techniques designed to identify and characterize protein modifications on a large or global scale. For example, there are more than 500 E3s in the human genome, yet functional information is available for only a small fraction. Linking an E3 with its substrates is difficult and generally depends on either a functional connection or a physical association between the proteins. Given the large number of potentially ubiquitinated substrates and E3s, new strategies to deduce E3-substrate pairs are needed, since biochemical screens for E3 substrates are labor-intensive and are hampered both by low substrate levels and by the intrinsically weak interactions between E3s and their substrates.
Mass Spectrometry Approaches
Most of the studies done to date are either targeted towards identifying the ubiquitinated site in a single protein (such as EGFR) or geared toward large-scale approaches (i.e., identifying the ‘ubiquitome’ in a cell). These large-scale analyses of ubiquitinated proteins usually employ multi-step approaches that include affinity purification and mass spectrometry (MS) analysis of proteins. This approach has been successful in yeast (Peng et al., 2003), human cell lines (Matsumoto et al., 2005), and transgenic mice (Jeon et al., 2007). MS-based approaches to identify precise ubiquitination sites rely on the fact that isopeptide-linked ubiquitin can be cleaved by trypsin between Arg74 and Gly75, producing a signature diglycine peptide. Ubiquitination can be detected based on two properties: first, peptides containing a ubiquitinated site (or sites) show a mass increment of 114 Da for each modified lysine residue; second, ubiquitin conjugation to a lysine residue inhibits proteolytic cleavage by trypsin at the modified site. In their landmark approach for large-scale screening of ubiquitinated sites, Peng and colleagues disrupted the endogenous yeast Ub genes and replaced them with His epitope-tagged ubiquitin (Peng et al., 2003), in what was the most comprehensive study conducted. Their large-scale approach using shotgun sequencing generated a dataset of more than 1000 candidate substrates, and database searching revealed 110 ubiquitinated sites on 72 different proteins. The use of tagged ubiquitin in vivo in a transgenic mouse model has also been described (Tsirigotis et al., 2001). Immunoaffinity purification of ubiquitinated substrates from mammalian cells, combined with trypsin digestion and LC-MS/MS analysis, identified over 70 ubiquitinated proteins and 16 signature Ub attachment sites (Vasilescu et al., 2005).
In a variation of this method, potential Ub ligase substrates were identified by purifying immunoaffinity fractions from human cells under both native and denaturing conditions (Matsumoto et al., 2005). Several proteomic studies are summarized with regard to the purification strategies, methods used and total number of Ub-tagged candidates identified (Table 1).
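The two MS signatures described above, a missed tryptic cleavage at the modified lysine and a mass increment of about 114 Da per site, can be illustrated with a small in silico digest. This is a minimal sketch, not a search-engine implementation: the function name and toy peptide are hypothetical, the "no cleavage before proline" rule is a common convention rather than something stated in the text, and 114.0429 Da is the monoisotopic mass of the diglycine remnant that the text rounds to 114 Da.

```python
GG_SHIFT = 114.0429  # monoisotopic mass of the Gly-Gly remnant, in Da


def tryptic_digest(sequence: str, ub_sites=()):
    """Digest a sequence with idealized trypsin rules.

    ub_sites holds 1-based positions of ubiquitinated lysines. Cleavage
    occurs after K or R, except before P (assumed convention) and except at
    a modified lysine. Returns (peptide, number of GG sites, mass shift in Da).
    """
    seq = sequence.upper()
    blocked = set(ub_sites)
    peptides = []
    start = 0  # 0-based index where the current peptide begins
    for i, residue in enumerate(seq):
        pos = i + 1
        is_last = pos == len(seq)
        before_proline = (not is_last) and seq[i + 1] == "P"
        cleavable = residue in "KR" and pos not in blocked and not before_proline
        if cleavable or is_last:
            # Count modified lysines that fall inside this peptide.
            n_gg = sum(1 for p in blocked if start < p <= pos and seq[p - 1] == "K")
            peptides.append((seq[start:pos], n_gg, round(n_gg * GG_SHIFT, 4)))
            start = pos
    return peptides


# Toy peptide: Lys6 carries the diglycine remnant, so cleavage there is missed.
print(tryptic_digest("AAKGGKLLR"))
print(tryptic_digest("AAKGGKLLR", ub_sites=(6,)))
```

With the modification at Lys6 the digest yields one longer peptide spanning the site instead of two shorter ones, which is the pattern a database search looks for alongside the mass shift.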
Purification strategies | Screen | Substrates/sites identified | References |
---|---|---|---|
Mass spectrometric approaches | | | |
(HIS)6-biotin-Ub; Ni-chelate chromatography; LC/LC-MS/MS | HeLa cells | 100 proteins (included both ubiquitinated and ubiquitin-associated proteins) | Gururaja et al. |
Membrane associated | Yeast proteome | 211 proteins identified overall; 83 proteins ERAD substrates; > 30 sites | Hitchcock et al. |
FT-ICR MS | Ubc5 | 15 sites | Cooper et al. |
In-gel digestion; LC-MS/MS | Breast cancer cells | 96 sites | Denis et al. |
SCX cation exchange; LC/LC-MS/MS | Yeast proteome | 1075 proteins; 110 sites | Peng et al. |
No Ub tag; immunoaffinity; GeLC-MS/MS | Breast cancer cells | 70 proteins | Vasilescu et al. |
No Ub tag; immunoaffinity (native and denaturing); LC/LC-MS/MS | Human cells | 670 proteins (native conditions); 345 proteins (denaturing conditions); 18 sites | Matsumoto et al. |
MALDI-TOF MS/MS of sulfonated tryptic peptides | CHIP | 3 proteins; 1 site | Wang et al. |
In vitro Ub assay | BRCA1/BARD1 | 2 proteins | Sato et al.; Starita et al. |
(HIS)6-biotin-Ub; native nickel chromatography; LC/LC-MS/MS | Human cells | 22 proteins; 4 sites | Kirkpatrick et al. |
Subtractive Ub profiling; affinity purification; LC/LC-MS/MS | Proteasome receptor Rpn10 in yeast | 54 substrates | Mayor et al. |
Non-mass spectrometric approaches | | | |
Two-hybrid screen | Yeast proteome | Some positive substrates | Uetz et al. |
Luminescent assay; Ub-biotin | 188 purified GST-tagged yeast proteins | 7 novel Rsp5 substrates | Kus et al. |
Protein microarrays | Yeast proteome | 150 potential substrates; 40 strong candidates | Gupta et al. |
UBA association | Adult human brain cDNA library screen | 11 proteins | Pridgeon et al. |
S5a-affinity chromatography; two-dimensional analysis | Mammalian tissues | Some proteins; hHR23B identified | Layfield et al. |
Affinity purification with GST-fused UBDs; LC-MS/MS-based (MudPIT) analysis | Arabidopsis proteome | 294 proteins; 85 sites | Maor et al. |
Table 1: Comparison of Mass-spectrometric approaches and non-mass spectrometric approaches to identify ubiquitinated proteins and target sites.
While recent advances in mass spectrometry have quickly expanded our repository of proteins modified by the ubiquitin family, MS-based approaches are still biased towards identifying highly abundant and stable complexes. Ub ligase-substrate complexes are known to be transient, and only a fraction of the sampled protein is ubiquitinated at a given time. It has also been reported that miscleavage at Arg74 in the ubiquitin sequence generates a longer tag (LRGG) that is difficult to identify. The peptides generated by trypsin are sometimes too large for standardized analytical procedures. Most of the purification strategies use tagged ubiquitin, but there are still no reports on how the ubiquitination machinery handles tagged ubiquitin as compared to the wild type. Moreover, the accurate identification of Ub substrates is hindered because some ubiquitin-like proteins (Nedd8 and ISG15) also target lysine residues and generate the same GG peptides upon trypsin digestion as ubiquitin does, which leads to false positives. Thus, MS-based proteomics surveys a broad range of post-translationally modified substrates, but it does not by itself distinguish ubiquitin from these Ubl modifications. In addition, relatively few ubiquitinated substrates have been identified because of the difficulty of detecting small quantities of transient Ub-tagged proteins mixed with highly abundant proteins in the purified sample. An additional step is therefore required in the identification procedure to separate the abundant proteins from the ubiquitinated ones. While various fractionation strategies have been applied prior to MS to overcome these barriers, issues regarding resolution and sample loss remain. Thus, despite extensive efforts to accurately identify Ub substrates and their target sites, the MS-based methods used have been laborious and the results far from accurate.
As a result, novel methods, such as stable-isotope-based quantification strategies and non-MS-based approaches that help differentiate Ub-targeted proteins from background proteins without the need to enrich the pool of ubiquitinated substrates in the sample, are much needed.
Non-mass Spectrometry Approaches
Another approach toward developing tools for the purification of ubiquitinated substrates makes use of the fact that UBA domains bind polyubiquitin chains with high affinity. The relative ease of producing UBA–agarose conjugates, as compared with anti-ubiquitin antibodies, makes these domains an attractive resource in ubiquitin pull-down experiments. Ubiquitin-binding proteins have been described based on the type of ubiquitin-binding domains/motifs they possess. Their ubiquitin-binding properties have just begun to be exploited in characterizing the ‘ubiquitome’, which consists of all ubiquitinated proteins in the cell. The ability of the UBA domain to bind polyubiquitin was employed in a screen coupled with in vitro transcription/translation of a human cDNA library from adult brain to identify proteins interacting with the p62 UBA domain (Pridgeon et al., 2003). A total of 11 proteins were identified as putative ubiquitinated proteins, most of which are important in neuropathologies. With approximately 5% of the total Arabidopsis proteins known to be involved in the UPS/proteasome system, more and more studies are being directed towards identifying ubiquitinated substrates. The first large-scale study conducted in plants used recombinant GST-tagged ubiquitin-binding domains (UIM and double UBA domain). Affinity-purified ubiquitinated proteins were separated by SDS-PAGE and trypsin-digested before analysis by a multidimensional protein identification technology (MudPIT) system; more than 290 putative ubiquitinated proteins were identified, and 85 ubiquitinated lysine residues in 56 proteins were characterized (Maor et al., 2007). More recently, affinity purification employing the UBA domain of p62 yielded a total of 200 putative ubiquitinated proteins from Arabidopsis (Manzano et al., 2008).
Proteins bound to the p62-agarose matrix were digested with trypsin, separated by HPLC, and identified by MALDI-TOF/TOF. However, affinity purification of ubiquitinated substrates using a UBA domain has its drawbacks. Apart from interacting with ubiquitin, some UBA domains interact with UBL domains (Walters et al., 2003; Lowe et al., 2006; Kang et al., 2007; Layfield et al., 2001) as well as with other proteins (Dieckmann et al., 1998; Feng et al., 2004; Gao et al., 2003; Boutet et al., 2007; Gwizdek et al., 2006; Ota et al., 2008), which raises questions regarding their specificity for ubiquitin chains. A combination of SILAC (stable isotope labeling with amino acids in cell culture), parallel affinity purification (PAP), and mass spectrometry was used to identify F-box ligase substrates in yeast. This approach was successful in identifying transiently modified substrates and proteins tagged with Lys48-linked polyubiquitin chains for degradation; however, it failed to detect previously reported substrates such as Fzo1p (Fritz et al., 2003; Escobar-Henriques et al., 2006; Cohen et al., 2008) and Gal4p (Muratani et al., 2005).
Using a yeast protein microarray, numerous known and novel ubiquitinated substrates of the E3 ligase Rsp5 were recently identified in a high-throughput manner (Gupta et al., 2007). These protein microarrays contained more than 4000 GST- and 6×His-tagged yeast proteins from S. cerevisiae spotted on nitrocellulose slides and directly tested for ubiquitination by Rsp5 in vitro.
However, not all known Rsp5 substrates were identified in this screen, since some of the known substrates were not printed on the array and some Rsp5 substrates are known to require adaptor proteins to bind to Rsp5. Moreover, some substrates might have been lost in the purification process because of their weak and transient interaction with the enzyme, and it was not possible to determine what impact the tags had on the accessibility of some substrates. A more powerful approach, global protein stability (GPS) profiling, consists of a fluorescence-based multiplex system for assessing protein stability on a high-throughput scale for SCF substrates (Yen and Elledge, 2008). A powerful feature of this technique was that it monitored the E3 ligase activity. This screen recovered 73% of the previously reported SCF substrates and found a total of 359 proteins as likely substrates. Since the technique measured indirect effects of the SCF ligase activity on proteins, all proteins whose stability was either increased or decreased in response to various drugs or stimuli were reported. However, the GPS technique can fail to detect a protein whose functionality, rather than stability, is altered as a result of ubiquitination, or a protein that changes its localization in the cell or acquires different binding partners. Again, it was impossible to assess what role the fusion tag may have played in the stability of these proteins.
Recent advances in this field have been made by the generation of antibodies that are capable of recognizing ubiquitin linkages of a specific conformation. Two groups have independently generated K63-chain specific antibodies for use in Western blotting (Newton et al., 2008; Wang et al., 2008). These reagents should enhance the identification of K63 ubiquitinated substrates and further define the functional role for this tag.
Clearly, it has been difficult to achieve a robust approach for the large-scale identification of ubiquitinated substrates in the cell. Each of the methods employed to date has inherent advantages and disadvantages; therefore, there is a need for an alternative solution to the problem of identifying the “embedded code” that predicts lysine selectivity in a target substrate. Lessons can be learnt from computational investigations aimed at identification of a SUMOylation motif required for target selection (Rodriguez et al., 2001).
Lessons from SUMO: Examining the Nearest Kin
Of the several new Ubl modifiers that have been discovered in the past few years, the SUMO pathway has received the most intense scrutiny. SUMO was identified in 1996 as a peptide conjugated to the nucleocytoplasmic-transport protein RanGAP1, resulting in a change in its cellular localization (Matunis et al., 1996). Since the discovery of SUMO as a post-translational protein modifier over 10 years ago, more than 200 protein targets have been reported, the majority being nuclear proteins. SUMOylation is known to cause an alteration in protein localization, a change in protein activity, or differences in interaction with binding partners (Geiss-Friedlander and Melchior, 2007). SUMO is about 20% similar to ubiquitin in its primary sequence and contains ~15 additional N-terminal amino acid residues (Bayer et al., 1998). Like ubiquitination, SUMOylation is achieved by the sequential action of three enzymes: the activating (E1), conjugating (E2), and ligating (E3) enzymes. Nevertheless, the SUMO E1, E2, and E3s are quite distinct from the E1, E2 and E3 of the ubiquitination system (Yeh et al., 2000). Despite the similarities in structure and conjugation mechanism, the two modifications have distinct physiological effects in the cell. To date, only one E1 (the SAE1/SAE2 heterodimer) and one E2 (UBC9) have been reported for SUMOylation, in contrast to the large number of E1s and E2s reported for the ubiquitination pathway. As in the ubiquitination system, several SUMO E3 ligases have been identified, most of which have a Siz/PIAS (SP)-RING motif required for their function. There are three known types of SUMO E3 ligases (PIAS proteins, RanBP2, and Pc2), each conferring substrate specificity on the SUMOylation reaction. As additional SUMO targets and pathways influenced by SUMO regulation are recognized, the significance of this pathway is beginning to be appreciated.
SUMOylation is known to participate in diverse cellular events, including chromosome segregation and cell division, DNA replication and repair, transcriptional regulation, nuclear transport and signal transduction (Müller et al., 2001). Four different SUMO isoforms (SUMO-1 to SUMO-4) are reported in mammals. SUMO-1 is the most commonly found conjugated isoform under normal conditions. SUMO-2 and SUMO-3 are very similar in sequence and appear to be conjugated in response to stress signals. SUMO-4 is more tissue-specific, as it has been identified in human kidney, suggesting its involvement in more tissue-dependent functions. Both SUMO-2/3 and SUMO-4 contain an internal consensus motif ψKXE (where ψ represents a large hydrophobic amino acid, and X represents any amino acid) that is required for SUMO modification both in vivo and in vitro (Rodriguez et al., 2001), and which is missing in SUMO-1. Exploiting the fact that Ubc9 binds to this motif directly (Sampson et al., 2001), a number of SUMO targets have been identified via their interaction with Ubc9 in yeast two-hybrid screens. Not all ψKXE motifs found in proteins are modified, as SUMO E3s are presumed to enhance specificity by interacting with other features of the substrate. In addition to the consensus sequence, amino acids upstream or downstream of the acceptor lysine may help to ensure accessibility of the substrate to the conjugation apparatus. For some SUMO substrates, additional interactions occur outside the consensus sequence (Anckar and Sistonen, 2007; Bernier-Villamor et al., 2002), demonstrating the involvement of multiple, co-operating interactions in regulating the target selection process. In this regard, the consensus sequence can be seen as a local mediator of the interaction between substrate and conjugation apparatus, fine-tuning the SUMO conjugation event by facilitating the correct positioning of the target lysine residue in the active site of Ubc9.
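To make the ψKXE consensus concrete, the following is a minimal Python sketch of how candidate SUMO acceptor lysines could be flagged in a protein sequence. It is an illustration only, not the method of any study cited here. The residue set taken for ψ (I, L, V, M, F) is an assumption, since the text specifies only "a large hydrophobic amino acid", and the function name and example sequence are hypothetical.

```python
import re

# Assumption: "large hydrophobic" (psi) is taken to mean I, L, V, M or F.
# The review does not enumerate these residues.
PSI = "ILVMF"

# A lookahead is used so that overlapping motif occurrences are all reported.
SUMO_MOTIF = re.compile(rf"(?=([{PSI}]K[A-Z]E))")


def find_sumo_candidates(seq):
    """Return (1-based lysine position, matched tetrapeptide) per psi-K-X-E hit."""
    seq = seq.upper()
    # The lysine is the second residue of the motif, hence start + 2 (1-based).
    return [(m.start(1) + 2, m.group(1)) for m in SUMO_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(find_sumo_candidates("MALKAEQQIKQEPK"))
```

As the text notes, a motif hit marks only a candidate site: not every ψKXE motif is modified, and some true sites lie outside the consensus, so a scan of this kind can both over-predict and under-predict.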
Approaches similar to those used for the identification of ubiquitinated substrates have been utilized in identifying novel SUMO targets and/or total SUMOylated substrates in the cell. These methods rely upon purification of SUMOylated proteins from cell lysates via affinity tags, followed by MS analysis (Li et al., 2004; Zhao et al., 2004; Zhou et al., 2004; Vertegaal et al., 2004; Wohlschlegel et al., 2004; Panse et al., 2004). A variety of affinity-tagged SUMOs have been described and overexpressed to overcome the low levels of SUMOylated proteins in cells, a major barrier to MS sensitivity. Moreover, at a given time only a small fraction of proteins in the cell are SUMOylated, since SUMOylation is a dynamic process in which conjugation and de-conjugation work in concert. It has been suggested that <1% of the proteins in a cell are SUMO modified at any given time (Johnson, 2004), thus making efforts at detecting these modified proteins difficult. The use of several genomic/proteomic and in silico combinatorial approaches to identify the global pool of the ‘Sumo-tome’ has led to the identification of ~500 potential SUMO substrates (Wohlschlegel et al., 2004; Gocke et al., 2005; Zhou et al., 2005). However, bona fide SUMOylation sites may still remain to be identified or confirmed in vivo. Thus, as experimental proteomics approaches become more and more labor-intensive and time-consuming, there is a growing need to develop prediction tools that would aid in successfully predicting the target substrate. In this regard, computational techniques present a promising approach toward identifying SUMOylation sites. The first computational prediction tool, SUMOplot, was developed to predict the probability of SUMO attachment. The SUMOplot prediction depended heavily on identification of the SUMO consensus motif, which limited the prediction results because many non-consensus true positives were missed.
SUMOsp was developed based on a manually curated set of 239 experimentally verified SUMOylation sites from the literature (Xue et al., 2006). GPS (here the group-based phosphorylation scoring algorithm, not the global protein stability profiling described above) and MotifX, two previously described strategies, were applied to the dataset, yielding a good (89.12%) prediction platform for SUMOylation sites. Another bioinformatic study accurately predicted SUMO-modified sites by employing a statistical method based on the properties of the individual amino acids surrounding the SUMO site (Xu et al., 2008).
Status Quo on Ubiquitination Sites
To better understand lysine selectivity within a protein destined for ubiquitination (Figure 3), it is first important to survey the literature for reported proteins and their ubiquitination sites. The first report exploring the preferences for a specific ubiquitination site was conducted on the human red blood cell protein α-spectrin (Galluzzi et al., 2001). Using site-directed mutagenesis, the investigators demonstrated that the leucine zipper was a potential ubiquitin recognition motif. Moreover, in addition to the primary sequence, it has been suggested that secondary folding also plays a role in directing the lysine selected for ubiquitination. The leucine zipper described in multi-ubiquitination of c-Jun (Treier et al., 1994) is observed in a number of other gene regulatory proteins with 75% similarity to the flanking regions of the ubiquitinated α-spectrin lysine (Muratani and Tansey, 2003). This suggests a conformational recognition mechanism in which positioning of the Lys plays an important role in directing specificity. In another study, K187 (one of six available lysines) was found to be a preferred ubiquitin target site in the transcription activator Rpn4 (Ju and Xie, 2006). Primary sequence analysis revealed the close proximity of K187 to the N-terminal acidic domain, which acts as a ubiquitination signal for transcription activators. Additionally, surface hydrophobic residues are known to be required for ubiquitination of several proteins for proteasomal degradation (Bogusz et al., 2006; Johnson et al., 1998). The neurotrophin receptor TrkA was one of the first receptors shown to be K63-polyubiquitinated, at K485 (Geetha et al., 2005).
Recently, ubiquitination of a lysine within the membrane-proximal region of the granulocyte colony-stimulating factor receptor (G-CSFR) was reported (Wolfler et al., 2009), and K63-ubiquitination of K338 was reported for the Jen1 transporter (Paiva et al., 2009). Altogether, a picture is emerging in which K63-chains may play a role in regulating internalization and sorting of receptors.
Studies conducted on both huntingtin and the androgen receptor support the importance of a conserved pentapeptide pattern (FQXL(L/F)) as a determinant of their degradation by the proteasome (Chandra et al., 2008). Another report on the E3 substrate selection process analyzed the ubiquitinated yeast proteome based on subcellular localization (Catic et al., 2004). This study revealed the presence of compartment-specific sequence patterns for ubiquitinated substrates. Structural analysis of ubiquitinated proteins demonstrates a preference for an exposed lysine residue on the surface of the molecule. Additionally, a survey of 40 ubiquitination sites from 23 proteins showed a clear secondary structure preference for lysine ubiquitination. Modifications were most prominent at lysines occurring in loop regions (26/40), followed by lysines in α-helices (10/40) (Catic et al., 2004). Among the compartment-specific motifs within the dataset, nuclear proteins showed a preference for ubiquitination of lysines near phosphorylatable residues. A similar bias was observed for ubiquitinated plasma membrane proteins, which had either Glu or Asp at the -1 or -2 position relative to the acceptor lysine (Catic et al., 2004). Thus, investigating the overall primary and secondary structure, as well as the protein's subcellular localization, could yield important information regarding the targeting of the substrates.
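The plasma membrane bias described above (Glu or Asp at the -1 or -2 position relative to the acceptor lysine) is simple enough to express as a sequence filter. The Python sketch below is a hypothetical illustration of such a filter. It is not the analysis performed by Catic et al., and the function name and toy sequence are invented for this example.

```python
def lysines_with_acidic_upstream(seq):
    """Return 1-based positions of lysines with Asp or Glu at position -1 or -2."""
    seq = seq.upper()
    hits = []
    for i, residue in enumerate(seq):
        if residue != "K":
            continue
        # Up to two residues immediately N-terminal to the lysine.
        upstream = seq[max(0, i - 2):i]
        if any(r in "DE" for r in upstream):
            hits.append(i + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(lysines_with_acidic_upstream("AEKGGKDAK"))
```

A filter of this kind only narrows the list of candidate lysines. On the evidence reviewed here, surface exposure and secondary structure (loop versus helix) would have to be considered alongside it.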
Specificity Provided by a Scaffold
Many E3 ligases are known to interact with specific substrates either directly or through scaffold proteins. Scaffold proteins facilitate interaction between the E3 enzymes and their substrates through their multi-domain architecture. One such scaffold is p62, a highly conserved and transcriptionally regulated protein that plays important roles in ubiquitination, receptor trafficking, protein aggregation, and inclusion formation (Seibenhener et al., 2004). p62 acts as a scaffold by interacting with the RING E3, TRAF6, through a TRAF-binding site (TBS), as well as with other proteins through one of its many protein-protein interaction domains. Interaction between p62 and TRAF6 has been shown to auto-activate TRAF6 (Wooten et al., 2001; 2006). Functional domains in p62 include a Phox and Bem1p (PB1) domain, a TRAF6-binding region, and a UBA domain (Geetha et al., 2002). The C-terminal UBA domain of p62 has been shown to bind ubiquitin non-covalently (Mueller et al., 2002). Moreover, p62 functions as a shuttling factor for polyubiquitinated substrates by binding the ubiquitinated proteins through its UBA domain and the 26S proteasome through its N-terminal PB1 domain (Wooten et al., 2005). The tyrosine kinase receptor A (TrkA) (Geetha et al., 2005) and the neurotrophin receptor interacting factor (NRIF) (Geetha et al., 2005) have both been shown to be K63-polyubiquitinated by the TRAF6/p62 complex. In a recent study, in an attempt to understand the lysine selection process employed by TRAF6/p62, the primary sequences surrounding the lysines that were targeted for ubiquitination in both TrkA and NRIF were examined for a possible consensus motif (Jadhav et al., 2008). A close look at these two substrates revealed the presence of a conserved consensus pattern for ubiquitination by the TRAF6/p62 complex. This consensus pattern has also been observed in other members of the Trk receptor family, TrkB and TrkC (Jadhav et al., 2008).
Interestingly, the consensus pattern identified in these proteins was a 10-amino-acid stretch {[–(hydrophobic)–K–(hydrophobic)–X–X–(hydrophobic)–(polar)–(hydrophobic)–(polar)–(hydrophobic)], where K is the ubiquitinated lysine residue and X any other amino acid} required to successfully target the primary lysine residue (Jadhav et al., 2008). These studies further suggest the possibility that an “embedded code” exists whereby an E3 ligase targets specific lysine residues for modification over others. Therefore, to better understand the lysine selection process during ubiquitination, it is important to examine the enzyme-specific selection process. An algorithm that searches a training dataset of p62/TRAF6 interactors could serve as a first step in the development of a computational tool to aid in the discovery of TRAF6 targets.
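As a first approximation of such a search, the 10-residue pattern above can be written as a regular expression and scanned across candidate sequences. The Python sketch below is an illustration under stated assumptions, not the algorithm of Jadhav et al. The membership of the "hydrophobic" and "polar" classes is not given in the text, so the residue sets used here are assumptions, and the function name and test sequence are hypothetical.

```python
import re

# Assumptions: the text does not define these classes, so one common
# grouping is used. Changing either set changes which sites are reported.
HYDROPHOBIC = "AVLIMFWPG"
POLAR = "STNQYCHKRDE"  # assumed here to include charged residues

H = f"[{HYDROPHOBIC}]"
P = f"[{POLAR}]"
X = "[A-Z]"  # any amino acid

# hydrophobic-K-hydrophobic-X-X-hydrophobic-polar-hydrophobic-polar-hydrophobic
# The lookahead lets overlapping candidate windows be reported.
TRAF6_P62_PATTERN = re.compile(f"(?=({H}K{H}{X}{X}{H}{P}{H}{P}{H}))")


def find_traf6_p62_candidates(seq):
    """Return (1-based lysine position, matched 10-mer) for each pattern hit."""
    seq = seq.upper()
    # The lysine is the second residue of the pattern, hence start + 2 (1-based).
    return [(m.start(1) + 2, m.group(1)) for m in TRAF6_P62_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence constructed to contain one match.
    print(find_traf6_p62_candidates("GGLKVSTAQLSVGG"))
```

Because the residue classes are broad, a scan of this kind would be expected to return many spurious matches in a whole proteome. It is perhaps best regarded as a pre-filter whose hits still require experimental confirmation.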
Model for Substrate Selection
Substrate selection and site specificity constitute a multi-step process that depends on two types of signals, primary and secondary. The primary signals are structural motifs, such as α-helices or β-sheets, that influence the local architecture of the primary sequence. Secondary signals, on the other hand, are features inherent in the primary sequence that are essential for recognition of the primary ubiquitination site. Of the two, secondary signals can vary slightly depending on the localization of the protein in the cell.
What can be learned from the E3 TRAF6? In the case of TrkA site-specific ubiquitination (Geetha et al., 2005), the E3, TRAF6, exists as a complex with the E2, UbcH7, in the cytosol. Following receptor stimulation, the E2/E3 pair forms a transient complex that is recruited to the scaffold, p62, to mediate the ubiquitination of TrkA (Geetha et al., 2005). The target lysine within a protein can either be buried inside a hydrophobic pocket of the globular protein structure or be masked while the protein is interacting with a different binding partner. Binding of the scaffold protein likely induces a conformational change in the protein’s structure, exposing the buried target site (Figure 4A). Thereafter, the scaffold recruits the activated E3/E2 complex to the substrate protein. The enzyme complex then scans the exposed surface for an acceptor lysine that possesses the appropriate conformation. Once an accessible lysine is recognized, and if the nearby flanking residues present an appropriate environment, transfer of the ubiquitin molecule occurs. In other cases, the active E3/E2 enzyme complex first binds to the substrate protein and produces a similar type of conformational change (i.e., exposure of the target site). This binding of substrate to the E3 produces structural changes that accommodate the scaffold protein in the complex, which aids in the enzymatic process (Figure 4B). Our results suggest that the former model is more likely operative for site-specific ubiquitination of the target (Geetha et al., 2005).
Figure 4: Model for substrate selection mechanism for Ub E3 ligase/scaffold complex. The target lysine site can either be masked or buried inside the hydrophobic pocket of the globular protein structure or be exposed to the exterior surface on the substrate. A) The scaffold protein interacts with the E3/E2 complex providing specificity for ubiquitination. Employing an embedded code the complex, with the assistance of the scaffold, directs ubiquitination of the target substrate on one or more specific lysine residues. This model is supported by studies with p62/TRAF6 complex (Geetha et al., 2005). B) Alternatively, the interaction of the E3 with the putative substrate changes the conformation of the substrate and allows it to recruit scaffold protein which in turn provides a platform for the ubiquitination reaction to take place.
The analysis of the ‘ubiquitome’ presents one of the most exciting and challenging tasks in current proteomics research. The ultimate limiting factor in studying the mechanism of ubiquitination substrate selection is the lack of curated data sets of ubiquitinated proteins. This makes it difficult to evaluate and compare target sites in order to decode selectivity and specificity. With the identification of some 500 or more ubiquitin ligases, there exists a need to rapidly and precisely identify enzyme-specific substrates. This task demands that we take multiple novel approaches, as well as a combination of techniques, to precisely identify target sites for these ligases. With rapid advancement in mass spectrometric analysis, more sophisticated proteomic tools, and novel approaches, we can expect the number of precisely identified sites to rise. Moreover, the use of bioinformatic methods to predict site modification in silico could yield more efficient results. These prediction tools should be closely integrated into the interpretation of proteomic experiments. Also, as proteomics methods identify more and more in vivo ubiquitination sites, prediction algorithms can be fine-tuned and improved with this information. The model that we propose here can be applied to other E3 Ub ligases that are known to employ scaffold proteins to aid in their substrate selection process (Figure 4). For example, BTB-domain proteins were identified as substrate-specific scaffolds for the Ub E3 ligase CUL-3 in C. elegans (Xu et al., 2003). Lysine ubiquitination interplays actively with other post-translational modifications, either agonistically or antagonistically, to form a coded message for intramolecular signaling programs that are crucial for governing cellular functions. Given the intricacy of the ubiquitin system, research into its functions and mechanisms should continue to yield novel insights into cell regulation.
This work was funded by NIH (NINDS 33661) to MWW. We thank Drs. Scott Santos and Michael Wooten for reading and reviewing a draft of this manuscript.