ISSN: 0974-276X
Research Article - (2009) Volume 2, Issue 11
Lung cancer is a leading cause of cancer death worldwide, and lung cancer proteomics studies have been carried out to reveal the molecular background of cancer phenotypes and to develop clinically relevant applications. Here, we report an open-access proteome expression database derived from the study of 262 lung cancer cases using data extracted by two-dimensional difference gel electrophoresis (2D-DIGE) and mass spectrometry. Proteins extracted from primary tumor tissues were labeled with CyDye DIGE Fluor saturation dye, and separated using a large format electrophoresis device, generating 3179 protein spots. Mass spectrometry following in-gel digestion identified 487 proteins corresponding to 721 protein spots. Multiple proteins were observed from single protein spots, and single proteins generated multiple protein spots, suggesting diversity of the proteome. The results of 2D-DIGE and protein identification, and part of the corresponding clinico-pathological data are freely accessible in the public proteome database Genome Medicine Database of Japan Proteomics (GeMDBJ Proteomics, http://gemdbj.nibio.go.jp/dgdb/DigeTop.do).
Keywords: Lung cancer proteomics, GeMDBJ proteomics, Proteome database, Two-dimensional difference gel electrophoresis (2D-DIGE).
Lung cancer is a leading cause of cancer death in Japan, claiming 55,000 lives annually, and is a major health problem in many countries. Despite modern therapeutic strategies, early recurrence is common and the prognosis for patients with lung cancer is generally poor, with an overall 5-year survival rate of only 14% for patients receiving treatment (Hoffman et al., 2000). A more detailed characterization of the molecular background of the carcinogenesis and progression of lung cancer is required to obtain information relevant to early tumor detection and to develop novel targeted therapeutics.
Lung cancer proteomics studies have been conducted to identify the proteins that correspond to certain clinico-pathological parameters of value in lung cancer. An open-access proteome database is a useful platform to integrate the proteome data derived from different patients and different malignancies, allowing the proteomics community to share the proteome data. However, there is no proteome expression database practically applicable in cancer proteomics studies to date. For this reason, we constructed a proteome database for lung cancer using 262 surgically resected frozen tissue samples, two-dimensional difference gel electrophoresis (2D-DIGE) using highly sensitive fluorescent dyes (CyDye DIGE Fluor saturation dye) and an original large format electrophoresis device.
In 2D-DIGE, different protein samples are labeled with fluorescent dyes that have different excitation and emission wavelengths, mixed together, and separated by two-dimensional gel electrophoresis (Unlu et al., 1997). By including a common internal control sample labeled with a fluorescent dye different from that used for the individual samples, gel-to-gel variations can be canceled out, and reproducible results can be expected across a large number of samples. 2D-DIGE overcomes critical limitations of classical 2D-PAGE and provides a platform for unique applications, such as the analysis of laser-microdissected tissues (Kondo et al., 2003; Kondo and Hirohashi, 2006). We have extensively applied 2D-DIGE to samples derived from surgical specimens with the aim of developing clinically relevant biomarkers (Kondo, 2008).
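The way a common internal control cancels gel-to-gel variation can be illustrated with a minimal sketch. It assumes that each spot is expressed as the log2 ratio of the individual sample signal to the internal control signal on the same gel; the function name and all intensity values are hypothetical, and the exact computation performed by the analysis software is not described here.

```python
import math

def standardized_log_ratios(sample_signal, control_signal):
    """Return log2(sample / internal control) for each spot on one gel.

    Assumption: both lists hold intensities for the same spots, in the
    same order, read from the two dye channels of a single gel.
    """
    if len(sample_signal) != len(control_signal):
        raise ValueError("spot lists must have the same length")
    return [math.log2(s / c) for s, c in zip(sample_signal, control_signal)]

# Hypothetical intensities for the same three spots on two gels.
# Gel B is twice as bright overall, mimicking gel-to-gel variation.
gel_a = standardized_log_ratios([200.0, 50.0, 100.0], [100.0, 100.0, 100.0])
gel_b = standardized_log_ratios([400.0, 100.0, 200.0], [200.0, 200.0, 200.0])

print(gel_a)  # [1.0, -1.0, 0.0]
print(gel_b)  # [1.0, -1.0, 0.0]
```

Because the internal control is affected by the same gel-specific factor as the individual sample, the factor drops out of the ratio, and the two gels give identical standardized values.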
In this study, primary tumor tissues from 262 lung cancer patients were subjected to proteomic analysis. The corresponding clinico-pathological information is available in GeMDBJ Proteomics (Figure 1) and includes the tumor size, status of lymph node metastases and pathological staging. This project was approved by the ethical board of the National Cancer Center and written informed consent was obtained from all donors.
Protein samples were prepared by homogenizing frozen lung cancer tissues as previously described (Kondo and Hirohashi, 2006). In brief, proteins were extracted using a urea lysis buffer (2 M thiourea, 6 M urea, 3% CHAPS, 1% Triton X-100) from tumor tissue powdered by a Multi-beads shocker (Yasui-kikai, Osaka, Japan). For preparative purposes, 100 micrograms of the extracted proteins were labeled with a CyDye DIGE Fluor saturation dye according to the manufacturer’s instructions. For analytical purposes, the internal control sample was prepared by mixing a small portion of all 262 individual samples. Five micrograms of the internal control sample and the individual samples were labeled with Cy3 and Cy5, respectively, and mixed together. The labeled protein samples were then separated by 2D-PAGE using a large format electrophoresis device (Kondo and Hirohashi, 2006). The gel images were obtained by scanning the gels with a laser scanner (Typhoon Trio, GE Healthcare Biosciences, Uppsala, Sweden) at the appropriate wavelength. All protein spots were numbered by the Progenesis SameSpots software (Nonlinear Dynamics, Newcastle, UK) according to the spot numbers in the master gel image. A typical 2D image with the merged and numbered protein spots is shown in GeMDBJ Proteomics. Proteins in the recovered protein spots were subjected to in-gel digestion, and the trypsin digests were analyzed by liquid chromatography coupled with tandem mass spectrometry, using a Finnigan LTQ linear ion trap mass spectrometer (Thermo Electron Co., San Jose, CA) equipped with a nano-electrospray ion (NSI) source (AMR Inc., Tokyo, Japan). The Mascot software (version 2.1, Matrix Science, London, UK) was used to search the masses of the peptide ion peaks against the SWISS-PROT database. Proteins with a Mascot score of 34 or more were subjected to protein identification.
When multiple proteins were identified in a single spot, the proteins with the highest number of peptides were considered as those corresponding to the spot, while the proteins with lower but significant scores were also recorded in the database. All procedures for protein identification were described in our previous report (Kondo and Hirohashi, 2006).
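The assignment rule described above can be sketched as follows. This is an illustration only: the `Hit` record, the function name, and the example hits are hypothetical, and treating every protein that ties for the highest peptide count as corresponding to the spot is an assumption.

```python
from dataclasses import dataclass

SCORE_CUTOFF = 34  # Mascot score threshold stated in the text


@dataclass
class Hit:
    protein: str         # protein label (hypothetical)
    mascot_score: float  # Mascot score for this protein in this spot
    n_peptides: int      # number of peptides matched to this protein


def assign_spot(hits, cutoff=SCORE_CUTOFF):
    """Split the hits of one spot into (primary, others).

    primary: significant hits with the highest peptide count
             (all tied hits are kept; this tie rule is an assumption).
    others:  remaining significant hits, which are still recorded.
    Hits scoring below the cutoff are discarded.
    """
    significant = [h for h in hits if h.mascot_score >= cutoff]
    if not significant:
        return [], []
    top = max(h.n_peptides for h in significant)
    primary = [h for h in significant if h.n_peptides == top]
    others = [h for h in significant if h.n_peptides < top]
    return primary, others


# Hypothetical search results for a single spot.
hits = [
    Hit("protein_A", 120.0, 8),
    Hit("protein_B", 60.0, 3),
    Hit("protein_C", 20.0, 1),  # below the cutoff, so it is dropped
]
primary, others = assign_spot(hits)
print([h.protein for h in primary])  # ['protein_A']
print([h.protein for h in others])   # ['protein_B']
```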
The reproducibility of the 2D-DIGE system was high when we ran the same lung cancer tissue sample twice: the correlation coefficient of the intensities of the 3170 protein spots detected was 0.85, and the intensities of 3029 of these protein spots (i.e., 95.5% of all spots detected) fell within a two-fold difference from the mean. Mass spectrometry of randomly selected protein spots resulted in the positive identification of the proteins contained in 721 protein spots. The two-dimensional gel image and the results of protein identification, as well as the supporting mass spectrometric data, are shown in GeMDBJ Proteomics.
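The two reproducibility measures can be sketched as below. The sketch assumes a Pearson correlation over paired spot intensities and reads the two-fold criterion as the ratio between the duplicate runs lying between 0.5 and 2; the text does not specify either computation exactly, and the intensity values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def fraction_within_fold(run1, run2, fold=2.0):
    """Fraction of spots whose run1/run2 ratio lies within the given fold range."""
    limit = math.log2(fold)
    inside = sum(1 for a, b in zip(run1, run2) if abs(math.log2(a / b)) <= limit)
    return inside / len(run1)


# Hypothetical intensities of four spots from duplicate runs of one sample.
run1 = [100.0, 200.0, 400.0, 800.0]
run2 = [110.0, 190.0, 1000.0, 790.0]

r = pearson(run1, run2)
within = fraction_within_fold(run1, run2)
print(f"correlation = {r:.2f}, within two-fold = {within:.0%}")
```

In this toy example the third spot differs by a factor of 2.5 between runs, so three of the four spots fall within the two-fold range.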
Among the 721 protein spots identified, we found that 391 protein spots contained multiple proteins, accounting for 45.8% of all protein spots with annotations. Figure 2a demonstrates the number of proteins that may be observed in a single protein spot. We obtained similar results in our database study of the pancreatic cancer proteome (Yamada et al., 2008), which, in turn, are consistent with the results of previous proteome studies (Campostrini et al., 2005; Westbrook et al., 2001). Multiple proteins are probably detected in single protein spots because of the limited separation performance of 2D gels, the relatively large number of protein spots with detectable intensity, and the high sensitivity of protein identification by mass spectrometry. However, the protein overlap may not be a serious problem when two-dimensional gels are used for semi-quantitative comparative studies because, as discussed by Hunsucker et al. (2006), only a few of the proteins at each location may contribute to the detectable signal, the intensity of the rest being below the detection limit. Indeed, in our experience, gels with a longer separation distance have a higher number of protein spots, as the protein spots that overlap in smaller gels are separated. However, to design further experiments based on 2D-DIGE results, western blotting may need to be employed to confirm the contribution of each identified protein to the differential intensity of the spots between the sample groups.
We also found that 254 proteins were identified in multiple protein spots, accounting for 65.3% of all identified proteins. We obtained similar results in our previous study on the pancreatic cancer proteome (Yamada et al., 2008). Figure 2b demonstrates the number of proteins only detected in single protein spots. Actin, vimentin, and enolase A, among others, appeared repeatedly in different protein spots, a finding that may be explained by alternative splicing or posttranslational modifications. These observations may indicate a unique advantage of gel-based proteomics: the proteins are separated in a way that reflects their whole structure. Once proteins are cleaved with protease for mass spectrometric studies, such information would otherwise be lost. These observations may also reveal a critical problem in employing 2D gel-based proteomics for biomarker studies. In our experience, 2D-DIGE and SDS-PAGE/western blotting often exhibit discordant results. Such discordance may in part be due to the presence of protein spots other than those identified as containing biomarker candidates.
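The many-to-many relationship between spots and proteins discussed above can be summarized from a list of identifications, as in the following minimal sketch. The (spot number, protein) pairs and the function name are hypothetical and do not reproduce the database contents.

```python
from collections import defaultdict


def multiplicity(pairs):
    """Count spots with several proteins and proteins found in several spots.

    pairs: iterable of (spot_number, protein) identifications.
    """
    proteins_by_spot = defaultdict(set)
    spots_by_protein = defaultdict(set)
    for spot, protein in pairs:
        proteins_by_spot[spot].add(protein)
        spots_by_protein[protein].add(spot)
    return {
        "spots": len(proteins_by_spot),
        "spots_with_multiple_proteins": sum(
            1 for p in proteins_by_spot.values() if len(p) > 1
        ),
        "proteins": len(spots_by_protein),
        "proteins_in_multiple_spots": sum(
            1 for s in spots_by_protein.values() if len(s) > 1
        ),
    }


# Hypothetical identifications: spot 1 holds two proteins,
# and protein "A" appears in two different spots.
pairs = [(1, "A"), (1, "B"), (2, "A"), (3, "C")]
summary = multiplicity(pairs)
print(summary)
```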
We are planning to upload the proteome data derived from a range of malignancies to GeMDBJ Proteomics. Presently, GeMDBJ Proteomics includes the 2D-DIGE proteome data derived from pancreatic cancer cell lines, esophageal cancer, Ewing’s sarcoma, lung adenocarcinoma, and malignant pleural mesothelioma tissues. Proteome data of other malignancies such as colorectal cancer, hepatocellular carcinoma, cholangiocarcinoma, synovial sarcoma, osteosarcoma, rhabdomyosarcoma, and gastrointestinal stromal tumor will be included soon. GeMDBJ Proteomics will thus be the largest proteome expression database containing data from a wide range and number of clinical cases. The integration of proteome data of different malignancies will be our next challenge.
This work was supported by a grant from the Ministry of Health, Labor and Welfare and by the Program for the Promotion of Fundamental Studies in Health Sciences of the National Institute of Biomedical Innovation of Japan.