ISSN: 2379-1764
Review Article - (2017) Volume 5, Issue 3
Outbreaks caused by foodborne microbes pose serious public health and food safety concerns worldwide. There is a great demand for rapid, sensitive and high-throughput methods to detect and track these pathogens in food, water and other environments. Recent advances in genomic DNA technology have enabled high-throughput analyses of strains by capturing their total genomic content and supporting concomitant comparative phylogenies. Microarrays are particularly adept at distilling large amounts of genomic DNA sequence information, such as the gene(s) or genetic traits of hundreds of foodborne isolates, in a single experiment. Over the past two decades, microarray technology has advanced tremendously due to the accessibility of thousands of complete and draft microbial genomes, and this progress has led to the design and manufacturing of newer microarrays which can now identify gene sequence variations down to a single nucleotide polymorphism. The DNA microarray remains a useful tool for rapid and refined genomic analysis of foodborne microbes. In this review, we focus primarily on pathogen detection, serotype identification, and tracking the genetic diversity and source of contamination of foodborne strains, drawing on our first-hand experience in using this technology.
Keywords: DNA microarray; Foodborne pathogens; SNPs; Public health and food safety
Foodborne diseases result in an estimated 9.4 million illnesses in the United States every year [1], and this estimate only reflects a minority of foodborne illnesses, hospitalizations, and deaths which occur as part of recognized outbreaks [2]. Thus, foodborne pathogens such as Salmonella, Escherichia coli O157:H7 and other Shiga toxin-producing E. coli (STEC), Listeria monocytogenes and Cronobacter are public health concerns [1]. There is a growing demand for analytical tools that can be used for rapid, comprehensive, and robust pathogen detection and identification, with further subtyping that allows for microbial source tracking and for epidemiological and phylogenetic investigations [3-6]. Whilst pathogen detection and control are essential in food safety, other applications have been developed for the surveillance of products containing live microbial ingredients (beneficial microbes), such as cultured foods and dietary supplements [7].
Genome sequencing efforts and the ability to immobilize thousands of DNA fragments onto a surface, such as a coated glass slide or membrane, have led to the development of DNA microarray technology. Numerous microbial genomes can be easily represented in a single array, making it affordable to perform genome-wide microarray analysis on a large scale. A microarray is a pattern of ssDNA probes which are immobilized onto a chip or a slide. The principle behind the microarray involves complementary base-pairing hybridization between two ssDNAs, followed by a detection strategy for positive hybridization. In many cases, fluorescently labeled target sequences that bind to an immobilized probe sequence on the array surface generate a signal that is a measure of correct sequence-specific hybridization. DNA microarray technology may be defined as a high-throughput and versatile technology. Traditionally, genomic DNA microarray technology was applied primarily to transcriptional profiling of gene expression, whereas in recent years it has also been used to measure the similarities or differences in genetic content between or within different microbial strains or groups of strains. This latter development was due to recognition of the extensive genomic plasticity in pathogen genomes (differing on the Mbp scale) that was observed from the early whole genome sequencing (WGS) efforts. Microarray applications can be generally stratified into four types depending on the subject to be analyzed:
In gene expression profiling, the cDNA is derived from the mRNA of a strain or cell line that has been exposed to or challenged in some manner. This is a comparative metric measured against a respective control scenario. Hybridization intensities translate to gene expression profiles related to the specific condition [8-14]. At present, this remains a preferred method over RNA-seq due to the heavy bioinformatic burden and cost associated with WGS approaches, especially for the large numbers of samples typically required for functional genetic studies.
Mutation analysis
There are generally three approaches that involve probe tiling and/or re-sequencing for de novo mutation detection and identification. A single base difference between two sequences is known as a single nucleotide polymorphism (SNP), and identifying SNPs in the genome of an organism is known as SNP detection. Additionally, pre-defined variant analysis panels that have SNP targets of epidemiological or phylogenetic significance can be captured with specifically targeted probes that are degenerate at a key central nucleotide that affects productive hybridization. By extension, spans of probe tiles strategically spaced apart can provide mutation detection and, if spaced at 1-nucleotide resolution, can in fact provide a direct “solid state” short DNA sequence readout. Using the Affymetrix platform, for example, scientists at the Food and Drug Administration (FDA) have taken advantage of each of these strategies for foodborne microbial hazards involving viruses and bacteria [15-17].
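The definition of SNP detection given above can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned and of equal length (no insertions or deletions); the function name `find_snps` and the example sequences are hypothetical and are not taken from any of the cited platforms.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are pre-aligned and of equal length,
    so every difference is a single-base substitution.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


if __name__ == "__main__":
    # Hypothetical 10-nt sequences differing at position 5 (A -> T).
    print(find_snps("ACGTACGTAC", "ACGTTCGTAC"))
```

On an array, the same comparison is made indirectly: a probe whose central base matches the sample allele hybridizes more productively than one that does not.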
Comparative genomic analysis
Comparative genomic analysis can be used to investigate the presence or absence of genes, which also helps define an organism’s phylogenetic relatedness to related organisms. For example, two strains may have orthologous genes whose sequences have diverged along species’ epithet lines. When two strains share a high degree of sequence similarity for these genes, a higher hybridization signal is captured; the more divergent the gene sequence, the lower the hybridization signal. In the design shown in Figure 1, each gene target is represented on the microarray by 22 individual oligonucleotide probes (25-mers). The 22 probes make up 11 probe pairs, each consisting of one perfect-match (PM) probe and one mismatch (MM) probe, for a total of 11 PM and 11 MM probes per gene. A PM probe matches the reference sequence perfectly, while an MM probe contains a single nucleotide mismatch in the central (13th) position of the probe [18,19]. In essence, this becomes true allele-based profiling of genetic capability and capacity. A particular application of this discriminatory capacity involves targeting serotyping loci, providing true molecular-based serotype identification.
Figure 1: Probe set design used in DNA microarray (adapted from Li et al. [19]).
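The probe set layout described above (11 PM/MM pairs of 25-mers per gene, with the mismatch at the 13th base) can be sketched in Python as follows. Two details are assumptions made for illustration only and are not stated in the text: probe start positions are spread evenly along the gene, and the MM base is taken to be the complement of the PM base. The sketch also writes probes in the orientation of the reference sequence, whereas physical probes are complementary to the labeled target. The function name `make_probe_pairs` is hypothetical.

```python
PROBE_LEN = 25        # 25-mer probes, as described in the text
CENTER = 12           # 0-based index of the central (13th) base
PAIRS_PER_GENE = 11   # 11 PM/MM probe pairs per gene target

# Assumption: the mismatch base is the complement of the perfect-match base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def make_probe_pairs(gene: str, n_pairs: int = PAIRS_PER_GENE) -> list[tuple[str, str]]:
    """Return n_pairs (PM, MM) probe pairs for a gene containing only A/C/G/T."""
    gene = gene.upper()
    last_start = len(gene) - PROBE_LEN
    if last_start < 0:
        raise ValueError("gene is shorter than one probe")
    # Assumption: probe windows are spread evenly from the first to the last
    # possible start position (real designs pick probes by other criteria).
    if n_pairs == 1:
        starts = [0]
    else:
        starts = [round(i * last_start / (n_pairs - 1)) for i in range(n_pairs)]
    pairs = []
    for start in starts:
        pm = gene[start:start + PROBE_LEN]
        # MM probe: identical to PM except for the central base.
        mm = pm[:CENTER] + COMPLEMENT[pm[CENTER]] + pm[CENTER + 1:]
        pairs.append((pm, mm))
    return pairs


if __name__ == "__main__":
    toy_gene = "ACGT" * 25  # hypothetical 100-nt sequence
    for pm, mm in make_probe_pairs(toy_gene)[:2]:
        print(pm)
        print(mm)
```

Comparing the PM and MM signal within each pair is what allows cross-hybridization to be distinguished from sequence-specific binding.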
Multiple species component array
In keeping with the comparative genomic strategy described above, it is feasible to analyze mixed populations of organisms in a single sample using allele-based profiling. This is particularly useful in verifying content claims for foods marketed with live bacteria such as probiotics [7]. There would also be industry implications and utility for monitoring Good Manufacturing Practices in the product production cycle [20]. Such a genomic design is purposefully very broad and is well suited to species-level identification; it has been demonstrated to successfully identify several species from a single product in a culture-independent manner.
Currently, DNA microarray technology is widely used for the investigation of foodborne pathogenic and environmental isolates [4,15,19,21] for source attribution, phylogeny, serotyping [16,19,22,23] and sequence typing (ST) analysis [24]. For instance, during the last decade, scientists at the Center for Food Safety and Applied Nutrition (CFSAN), FDA, in conjunction with Affymetrix, Inc., have developed several microarray assays [25] for the investigation of the genetic diversity and phylogenetic relatedness of Escherichia coli O157:H7 [23,26], STEC [22,27], Salmonella [19], Listeria monocytogenes [21], Cronobacter [28-30] and enteric viruses [15,17].
As a case in point, one of the FDA custom microarrays was used to investigate the genetic diversity of Salmonella isolates obtained from irrigation ponds associated with produce farms in the southeastern United States. Salmonella is a diverse group of pathogens that have evolved to survive in a wide range of environments and hosts [31]. There are more than 2600 serovars [32], and the majority (over 1500) of them belong to Salmonella enterica subsp. enterica, which encompasses most of the serovars of greatest clinical relevance. Our previous work indicated a predominance (56.8%) of Salmonella Newport among isolates recovered from irrigation ponds used in produce farms over a 2-year period [19]. To investigate the environmental survival of Salmonella, we utilized a novel microarray chip previously employed in investigating several Salmonella outbreaks, most notably an egg-associated outbreak in 2010 [25], and in exploring the global genomic diversity of Salmonella. In brief, this design included genomes of Salmonella (n=38), E. coli (n=27), Shigella (n=10) and Vibrio cholerae (n=10). In total, this chip covers over 80,000 unique genes representing the pan-genomes of these four foodborne pathogens, including known antibiotic resistance and virulence genes [12].
In the case of irrigation water, the microarray analysis not only correctly identified all the isolates, but also differentiated the S. Newport isolates into two phylogenetic lineages (Salmonella Newport II, III). Serovar distribution based on microarray data showed no instances where the same serovar was recovered from a pond for more than a month. Furthermore, during the study, numerous isolates with an indistinguishable genotype were recovered from different ponds over 2 years. Although isolates within either lineage were phylogenetically related, subtle genotypic differences were detected within the lineages by microarray, suggesting that isolates in either lineage could have come from several different Salmonella-harboring hosts. Based on this comparative genomic evidence derived from microarray, and on the spatial and temporal factors of the irrigation pond environment where the isolates were recovered, we were able to speculate that the presence of Salmonella in the ponds was likely due to numerous punctuated reintroduction events from several different hosts in the environment. These findings may have implications for the development of strategies for efficient and safe irrigation to minimize the risk of Salmonella outbreaks associated with fresh produce [19].
A prime example of distilling WGS data into a useful framework is the research and development behind the FDA E. coli identification array. For food safety, Shiga toxigenic E. coli presents a formidable challenge in determining risk for potential pathogenicity in humans. This risk is based on several factors, including virulence genes (and their subtypes), other accessory genes required for disease manifestation (such as adherence factors), and serotypes with an etiological history of causing disease. Because E. coli is one of the best-characterized bacteria, undoubtedly attributable to its status as commensal, pathogen and classical molecular genetic mainstay, the availability of genomic data enabled a highly refined and comprehensive design covering the various aspects required for regulatory needs. This one platform provides a toolbox for serotyping, virulence profiling, molecular epidemiology, and comparative phylogenies as a one-stop shop. It has been used for regulatory decisions with isolates from at least 39 foods [27]. In a particularly notable case involving strain-level discrimination of isolates from cookie dough, E. coli O157 strains from food and clinical sources were clearly epidemiologically different [16]. This strain-level discriminatory capacity was also demonstrated in high-profile cases involving spinach (2006) and O104:H4 in fenugreek seeds (2011).
Tall et al. [28] showed another example of how a pan genomic microarray approach can assist in foodborne investigations as a highly discriminatory characterization and identification tool for source attribution of Cronobacter. Surveillance studies have shown that Cronobacter contaminates a multitude of foods and environments, including water, infant foods (such as powdered infant formula (PIF) and follow-up formulas), dried milk protein products, cheese, licorice, candies, dried spices, teas, nuts, herbs, filth and stable flies, and PIF or milk powder production facilities and household environments [33]. The microarray was developed during 2009-2013 through the leveraging of WGS efforts of a five-member International Cronobacter consortium. The microarray was able to distinguish the seven Cronobacter species from one another and from non-Cronobacter species [28] as shown in Figure 2.
Figure 2: Neighbor net (SplitsTree4) analysis of Cronobacter (n=126) and phylogenetically related strains, generated from the gene-difference matrix shown in Table 1 as reported by Tall et al. [28]. The microarray experimental protocol described by Jackson et al. [26] was used for the interrogation of the strains and for the analysis. The phylogenetic tree illustrates that the Cronobacter microarray could clearly separate the seven species of Cronobacter, with each species forming its own distinct cluster. The tree was generated using SplitsTree4 neighbor net [34]. C. sakazakii subclades are denoted by Roman numerals I-VII. The scale bar represents 0.01 base substitutions per site.
These results also support the phylogenetic divergence of members of the genus and clearly highlight the genomic diversity among the members of the genus. This analysis demonstrated that the majority of strains grouped according to sequence types (STs). These results support the hypothesis that microarray analysis is more resolving than the seven-allele MLST scheme. These studies establish a powerful platform for further genomics research on this diverse genus, an important prerequisite for the expansion of future countermeasures against this important foodborne pathogen in food safety.
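A gene-difference matrix of the kind used as input for the neighbor net analysis can be sketched in Python as follows. The exact computation used by Tall et al. is not described here, so this sketch assumes the simplest definition: for each pair of strains, count the genes called present in one strain but not in the other. The function name, strain labels and gene labels are all hypothetical.

```python
def gene_difference_matrix(calls: dict[str, set[str]]) -> tuple[list[str], list[list[int]]]:
    """Return (sorted strain names, symmetric matrix of pairwise gene differences).

    `calls` maps each strain to the set of genes called present on the array.
    Assumption: the difference between two strains is the size of the
    symmetric difference of their present-gene sets.
    """
    strains = sorted(calls)
    matrix = [[len(calls[a] ^ calls[b]) for b in strains] for a in strains]
    return strains, matrix


if __name__ == "__main__":
    # Hypothetical presence calls for three strains and four genes.
    example = {
        "strain_A": {"gene1", "gene2", "gene3"},
        "strain_B": {"gene2", "gene3", "gene4"},
        "strain_C": {"gene1", "gene2", "gene3"},
    }
    names, dist = gene_difference_matrix(example)
    for name, row in zip(names, dist):
        print(name, row)
```

A matrix of this form is symmetric with a zero diagonal, so it can be treated as a distance matrix by network or tree-building tools; strains with identical presence calls, such as the first and third in the example, are indistinguishable at this resolution.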
Microarray offers higher discriminatory power than pulsed-field gel electrophoresis (PFGE) and multi-locus sequence typing (MLST). PFGE is still widely used and is considered the “gold standard” for subtyping. PFGE is an efficient tool for subtyping foodborne pathogens to the species level or strain level. However, when high discriminatory power is needed to characterize closely related isolates such as those discussed above, PFGE is not adequate. For example, PFGE could not detect differences among the Salmonella Newport lineages or Salmonella Enteritidis strains. In contrast, microarray detected subtle differences within the Salmonella Newport lineages and among the Salmonella Enteritidis strains [19]. It is noteworthy that the genetic differences revealed by microarray were confirmed by WGS [20]. Additionally, microarray can be used to detect virulence genes, antibiotic resistance genes, and other genetic traits [23,28]. For instance, a multidrug resistance (MDR) phenotype was found among Salmonella Newport isolates; the microarray analysis identified the genes that coded for the MDR phenotype, indicating the versatility of microarray [19].
The most logical way to assess the analytical ability of microarray may be to compare its advantages and limitations with those of two common subtyping technologies, PFGE and WGS. PFGE, as mentioned above, is the conventional method and has been the gold standard for subtyping for decades, while WGS, a newer technology, offers great potential for analyzing the genetic diversity and structure of microbes between and within species and is becoming part of the routine laboratory setting. All three technologies are widely used in epidemiological studies and outbreak investigations, and each has its advantages and limitations, as shown in Table 1. For example, a comparison between WGS and microarray datasets was conducted with a broad representation of E. coli strains that span the species landscape. The SNP datasets generated by WGS and microarray were well correlated (99.7%) for phylogrouping. However, at the sub-clonal group or strain level, the array performed better for gene-based strain identification (allele profiling) than for SNP-based relationships when compared with WGS SNP analysis [7].
Technology | Capacity | Discriminatory power | Data analysis | Speed | Operation | Cost |
---|---|---|---|---|---|---|
PFGE | Low | Low | Easy | Medium | Easy | Low |
Microarrays | High | Medium | Medium | High | Easy | Medium |
WGS | High | High | High | Medium | Medium | Medium |
Table 1: Comparison of three common molecular analytical technologies for analysis of foodborne pathogens.
With the advent of affordable WGS technology, it is now possible to sequence an entire genome at low cost in just a few days, making WGS an ideal tool for subtyping and surveillance [35]. By providing definitive genotype information, including genetic mutations, SNPs, and the presence of antibiotic resistance and virulence genes, WGS offers the highest discriminatory power for characterizing individual microbes. More importantly, WGS can cover the entire genome and, at the same time, detect subtle genomic changes such as SNPs and other genetic divergences. No other technology can match the capabilities of WGS. However, a practical issue facing WGS is how to efficiently handle the massive amount of data generated and carry out downstream analyses such as genome assembly, gene annotation and comparative genomic analysis. Undertaking this formidable task requires not only powerful and robust analytical software but also well-trained bioinformaticians to interpret the raw data.
In this respect, microarray has an advantage over WGS, because microarray analysis is relatively easy to perform and requires less bioinformatic expertise. The microarray assay is rapid, simple and economical. Once DNA is extracted from an isolate, a result can be obtained within 24 h with less than 2 h of actual hands-on time, as opposed to traditional serology, which typically takes a few weeks to perform [22]. Furthermore, microarray results are very reproducible and do not require highly trained bioinformaticians to interpret. Thus, microarray can serve as a suitable tool for molecular epidemiological analysis in outbreak investigations. With robust comparative databases that allow microarray users to deposit and analyze their microarray data [4], similar to PFGE, it is possible for investigators to compare their pathogen isolates from an ongoing outbreak with those from an existing outbreak. The importance of database depth was highlighted, for instance, when we subtyped environmental Salmonella isolates: PFGE could not type some isolates that microarray did, whereas some isolates identified by PFGE were untypable by microarray due to a lack of reference strains in our microarray database.
Microarray technology plays an important role in the identification and analysis of foodborne pathogens. Assessment of pathogens at the gene level across a pan-genome spectrum offers a capability that cannot be provided by traditional culture-based identification methods, PCR or PFGE. This capability expands the analytical outcomes beyond species identification to include virulence and antibiotic resistance genes as well as other important genomic traits. In terms of turn-around time, microarray assays are much faster than culture-based methods and relatively faster than PFGE and WGS.
Microarray technologies have undergone rapid development in the last two decades. As improvements in manufacturing of microarrays continue, the cost of high density pan-genome arrays will become more affordable. The coverage of target probes in most high density pan-genome arrays is extensive for foodborne pathogen detection and identification. Overall, microarray analysis is still the most robust and rapid tool for pathogen detection and epidemiological study as well as source tracking in outbreak investigation where high discriminatory power is needed.