Central Veterinary Institute (CVI) part of Wageningen University and Research Centre (WUR), Lelystad, The Netherlands
Department of Infectious Diseases and Immunology, Utrecht University, Utrecht, The Netherlands
PU-PH des Disciplines Pharmaceutiques, 1-URMITE CNRS IRD UMR 6236, IHU Méditerranée Infection, Valorization and Transfer, Aix Marseille Université, Faculté de Médecine et de Pharmacie, Marseille, France
Corresponding author: N. Woodford, Antimicrobial Resistance and Healthcare Associated Infections (AMRHAI) Reference Unit, National Infection Service, Public Health England, 61 Colindale Avenue, London, NW9 5EQ, UK.
Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility testing (AST).
The published evidence for using WGS as a tool to infer antimicrobial susceptibility accurately is currently either poor or non-existent and the evidence/knowledge base requires significant expansion. The primary comparators for assessing genotypic–phenotypic concordance from WGS data should be changed to epidemiological cut-off values in order to improve differentiation of wild-type from non-wild-type isolates (harbouring an acquired resistance). Clinical breakpoints should be a secondary comparator. This assessment will reveal whether genetic predictions could also be used to guide clinical decision making. Internationally agreed principles and quality control (QC) metrics will facilitate early harmonization of analytical approaches and interpretive criteria for WGS-based predictive AST. Only data sets that pass agreed QC metrics should be used in AST predictions. Minimum performance standards should exist and comparative accuracies across different WGS laboratories and processes should be measured. To facilitate comparisons, a single public database of all known resistance loci should be established, regularly updated and strictly curated using minimum standards for the inclusion of resistance loci. For most bacterial species the major limitations to widespread adoption of WGS-based AST in clinical laboratories remain the current high cost and limited speed of inferring antimicrobial susceptibility from WGS data, as well as the dependency on previous culture, because analysis directly on specimens remains challenging.
For most bacterial species there is currently insufficient evidence to support the use of WGS-inferred AST to guide clinical decision making. WGS-AST should be a funding priority if it is to become a rival to phenotypic AST. This report will be updated as the available evidence increases.
The remit of the European Committee on Antimicrobial Susceptibility Testing (EUCAST) Subcommittee on the Role of Whole Genome Sequencing (WGS) in Antimicrobial Susceptibility Testing of Bacteria was:
to perform a review of the literature describing the role of WGS in antimicrobial susceptibility testing (AST) of bacteria;
to assess the sensitivity and specificity of WGS compared with standard phenotypic AST;
to consider how WGS for AST may be applied in clinical microbiology laboratories and the likely implications for phenotypic and other genotypic methods in use;
to consider the epidemiological implications of using WGS;
to consider the clinical implications of WGS for the selection of antimicrobial therapy;
to consider the principles of how the results of WGS for AST could be presented to clinical users;
to describe the drivers and barriers to routine use of WGS;
to report at ECCMID 2016.
We chose to tackle this task on a ‘by organism’ basis (for Enterobacteriaceae (non-Salmonellae); Salmonella spp.; Staphylococcus aureus; Streptococcus pneumoniae; Neisseria gonorrhoeae; Mycobacterium tuberculosis; Clostridium difficile; Acinetobacter baumannii and Pseudomonas aeruginosa), with particular focus on the use of technology for characterizing cultured isolates of bacteria that have been identified as critical antimicrobial resistance (AMR) threats by the World Health Organization. There are encouraging signs, but our report makes clear that more robust data are needed across these diverse ‘bug/drug’ combinations. Furthermore, work is needed to overcome problems currently posed by particular species and/or certain antimicrobial classes. We highlight these gaps, make recommendations (summarized below), and encourage others to use these to generate the analyses that will move this important topic forwards.
The committee met by teleconference between July 2015 and March 2016 to agree the organisms and detailed scope of the report and appoint specialists to undertake non-systematic reviews of the literature and write the corresponding sections of the report. The findings of the report were agreed via teleconference and/or email, and were presented at the 26th European Congress on Clinical Microbiology and Infectious Diseases (ECCMID) in April 2016. Shortly after that, a draft for public consultation was posted online (at www.eucast.org/documents/consultations). The Subcommittee considered and responded to comments and several were included in or affected the content of this print version.
Summary of the conclusions and recommendations
For most bacteria considered in this report, the available evidence for using WGS as a tool to infer antimicrobial susceptibility (i.e. to rule-in as well as to rule-out resistance) accurately is either poor or non-existent. More focused studies and additional funding resources are needed as a priority to improve knowledge.
The primary comparator for WGS-based prediction of antimicrobial susceptibility should, whenever possible, be the epidemiological cut-off value (ECOFF).
Assessing genotypic data against clinical breakpoints represents a tougher challenge, but will be necessary if WGS-based testing is to guide clinical decision making. Clinical breakpoints should therefore be used as a secondary comparator, ideally using the same data sets as used for ECOFF-based assessments.
Most published evidence does not currently support use of WGS-inferred susceptibility to guide clinical decision making (i.e. to replace routine phenotypic AST in most or all cases).
There should be international agreement on the most appropriate and effective principles and quality control (QC) metrics to facilitate early standardization and harmonization of analytical approaches and interpretative criteria for WGS-based predictive AST. Only data sets passing agreed QC metrics should be used in antimicrobial susceptibility predictions, as resistance genes or mutations might be missed in sequences of poor quality.
Different bioinformatics tools for predicting AST should perform to minimum standards and should be calibrated and shown to be equivalent in terms of the results generated.
A single database of all known resistance genes/mutations should be established to ensure that there is parity of analysis and to facilitate measurement of comparative accuracies across different systems and bioinformatics tools. This database should be updated regularly, and must have strictly curated minimum standards for the inclusion of new resistance genes and mutations. An important function of a centralized database would be to control resistance gene nomenclature.
Expansion of the evidence base is a critical priority if WGS is to be considered seriously as a rival to phenotypic AST.
For most bacterial species and in most countries the current cost and speed of inferring antimicrobial susceptibility from WGS data remain prohibitive to wide adoption in routine clinical laboratories.
This report should be considered as a baseline discussion document, which should be revisited and updated at regular intervals (likely every 18–24 months) as sequencing technologies become more affordable and more widely applied, and the available evidence increases.
MIC distributions of wild-type bacteria, ECOFFs and their relationship to clinical breakpoints
In 2002, EUCAST introduced the concept of gathering large numbers of MIC values from many contributors to present on a web site as aggregated reference MIC distributions for every important combination of microorganisms and antimicrobial agents. The original conditions of acceptance of individual MIC distributions were:
that each contribution of MIC values must consist of a minimum number of isolates;
that the species was well defined;
that MIC determinations were performed using standardized methodology (or a method calibrated to a standardized method);
that the concentrations tested were not truncated at the lower end of the concentration series.
A EUCAST Subcommittee is currently reviewing the rules for inclusion and exclusion of data sets in aggregated MIC distributions and the process by which ECOFF values are defined.
Contributors of MIC data are not informed about whether or not their contributions are accepted. There are currently more than 29 000 MIC distributions in the EUCAST database, which amounts to many millions of MIC values. The distributions are from breakpoint committees, individual researchers in human and veterinary medicine, programmes for the surveillance of AMR in humans and animals, EUCAST development projects, pharmaceutical companies as part of programmes for the development of new agents, and more. The distributions are freely available on www.eucast.org.
A typical aggregated MIC distribution, in this case for Escherichia coli and cefotaxime, is shown in Fig. 1. Of 72 cefotaxime MIC distributions for E. coli available in the database, 41 fulfilled the criteria for acceptance, were aggregated and the aggregated distribution displayed on the EUCAST web site as in Fig. 1.
MIC distributions are uni-modal or multi-modal. The left-hand, most often dominating and Gaussian-shaped part of the distribution, represents the isolates devoid of phenotypically detectable acquired resistance mechanisms, otherwise known as ‘the wild-type (WT) MIC distribution’. Furthermore, MIC values for bacteria from humans and animals are distributed in the same way. The WT is not affected by the geographical location where the isolates are collected, the specimen source (e.g. humans or animals, or healthy or sick individuals) or the era of collection (some of the distributions date from the 1950s whereas others are very recent). There are several ways (biological or statistical) to assess the WT distribution to define the MIC value best representing the upper end of the WT distribution. Despite the fact that there is often no absolute and distinguishing threshold to mark the upper end of the WT or beginning of the non-wild-type (NWT) distribution, it has been useful to define the ECOFF as the ‘highest MIC for organisms devoid of phenotypically detectable acquired resistance mechanisms’. It provides a means to distinguish between resistant and susceptible populations in a biological sense.
From a clinical point of view, categorizing isolates into WT and NWT informs the clinician whether the isolate causing infection is devoid of acquired resistance mechanisms, irrespective of its clinical susceptibility categorization as susceptible (S), intermediate (I) or resistant (R).
There is no immediate relationship between the categorization of WT and NWT on one side and the clinical categorization ‘S’, ‘I’ and ‘R’ on the other. A WT microorganism can be categorized as ‘S’, ‘I’ or ‘R’ to a particular antimicrobial agent and a NWT organism may still be categorized as ‘S’. This means that if one wants to encompass both WT and NWT on one hand and clinical ‘S’, ‘I’ and ‘R’ on the other, the possible susceptibility categories for an isolate to any antimicrobial agent are SWT, SNWT, IWT, INWT, RWT and RNWT. There are many examples of each of these categories in the EUCAST breakpoint tables. An E. coli isolate with a ciprofloxacin MIC of 0.25 mg/L exemplifies the SNWT category. Conversely, Pseudomonas aeruginosa and tigecycline or Stenotrophomonas maltophilia and carbapenems both exemplify the RWT category. For ampicillin, TEM-1-producing E. coli represents RNWT.
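The six combined categories can be illustrated with a minimal Python sketch. The `Criteria` type, the `categorize` function and the numeric ECOFF and breakpoint values below are assumptions made for illustration only; they are not taken from the EUCAST tables, which should be consulted for real values.

```python
from typing import NamedTuple


class Criteria(NamedTuple):
    ecoff: float         # highest MIC of the wild-type distribution (mg/L)
    s_breakpoint: float  # clinically susceptible if MIC <= this value (mg/L)
    r_breakpoint: float  # clinically resistant if MIC > this value (mg/L)


def categorize(mic: float, c: Criteria) -> str:
    """Combine the clinical (S/I/R) and biological (WT/NWT) categories."""
    if mic <= c.s_breakpoint:
        clinical = "S"
    elif mic > c.r_breakpoint:
        clinical = "R"
    else:
        clinical = "I"
    biological = "WT" if mic <= c.ecoff else "NWT"
    return clinical + biological


# Illustrative (assumed) criteria for E. coli and ciprofloxacin.
cipro_ecoli = Criteria(ecoff=0.064, s_breakpoint=0.5, r_breakpoint=1.0)
print(categorize(0.25, cipro_ecoli))  # SNWT

# Hypothetical agent whose breakpoints lie below the ECOFF (an RWT situation).
intrinsic = Criteria(ecoff=8.0, s_breakpoint=1.0, r_breakpoint=2.0)
print(categorize(4.0, intrinsic))     # RWT
```

Keeping the two comparisons independent mirrors the point made above: the WT/NWT call depends only on the ECOFF, while the S/I/R call depends only on the clinical breakpoints.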
At the outset of the review of all breakpoints for all antimicrobial classes in 2002, EUCAST decided that clinical breakpoints should not divide WT MIC distributions. If breakpoints are allowed to bisect WT MIC (or zone diameter) distributions, the methodological variation would preclude reproducible ‘S’, ‘I’ and ‘R’ categorization. We have not had reason to change this view. Following on from this, once it has been established that the species is a good clinical target for the agent in question, the ECOFF is the lowest possible susceptible breakpoint. The ECOFF is also the relevant ‘cut-off’ to screen for low-level resistance using phenotypic susceptibility testing.
ECOFFs provide an opportunity to compare AMR and resistance development when clinical breakpoints: (a) differ among committees and agencies, e.g. EUCAST, CLSI, US Food and Drug Administration and European Medicines Agency; (b) change over time; or (c) differ between humans and animals. There is no difference in principle between MIC distributions and ECOFFs for fast-growing non-fastidious and fastidious bacteria, and those for slow-growing bacteria, such as Mycobacterium tuberculosis.
More information on EUCAST in general and wild-type MIC distributions and ECOFFs in particular can be obtained from the EUCAST website (www.eucast.org) and the recently published review of EUCAST activities since 2001.
This Subcommittee sought to assess whether available data are sufficient to test the hypothesis that the closest relationship between sequencing and phenotypic testing will be achieved by using the WT versus NWT categories.
Molecular mechanisms of antimicrobial resistance
For most of the clinically relevant bacterial pathogens, phenotypic analysis of bacterial susceptibility to antimicrobial agents is relatively straightforward and relies on well-proven methods, such as agar dilution and broth microdilution (the latter being the reference standard) or disc diffusion, followed by interpretation according to agreed guidelines.
With the introduction of Sanger sequencing in the mid-1970s and PCR in the 1980s, it became possible to study some of the molecular mechanisms responsible for the observed non-susceptibility to various antimicrobial agents. Common examples of these molecular mechanisms are: (a) transferable AMR genes (e.g. extended-spectrum β-lactamases (ESBLs)); (b) upregulation of AMR gene expression by point mutations (e.g. ampC in E. coli or regulatory mutations affecting efflux in many taxa); (c) porin modification or loss (e.g. by deletion events and/or lack of expression); (d) point mutations in essential single-copy (e.g. gyrA and/or parC of Enterobacteriaceae) and multi-copy (e.g. mutations in one or more loci of the 23S rRNA gene) housekeeping genes. To complicate the matter, some bacterial species (or higher taxonomic orders) can be intrinsically resistant to a given antimicrobial agent. This intrinsic resistance can be caused by some of the mechanisms listed above but can also be a result of the lack or unavailability of targets for the antimicrobial agent.
Traditional Sanger sequencing and rapid molecular methods, e.g. PCR, allow screening for a limited number of resistance genes, which are often selected because they confer resistance to key antibiotics (e.g. genes encoding ESBLs or carbapenemases). The data offer very limited opportunities to compare genotype with phenotype. By contrast, WGS has potential to yield data about any resistance gene or mutation present and the data might therefore be analysed to create a genotypically inferred antimicrobial resistance profile (or antibiogram) or, perhaps, to infer susceptibility.
Using next-generation sequencing data for in silico (genotypic) detection of AMR
Next-generation sequencing data producing WGS information can originate from a variety of different sequencing platforms. A review of these technologies is beyond the scope of this report, but can be found elsewhere. Short-read (e.g. 100 to 500 bp) output with high accuracy may be complemented by that produced by much longer reads. At the time of writing, the newer long-read platforms come with significantly greater cost and higher error rates than the short-read technology. The dominant short-read technology produces single (raw) reads that are in most cases shorter than the gene(s) responsible for the reduced susceptibility to a given antimicrobial agent. These reads either need to undergo de novo assembly to obtain larger fragments of the originally contiguous DNA (‘contigs’) or need to be aligned by reference (‘mapping’) to known genetic targets (in this case, AMR genetic determinants). Repeat regions in DNA fragments are particularly challenging and correct assembly may be problematic.
Using WGS data for detection of the many different molecular mechanisms leading to AMR yields far more information from a single physical test than other methods (e.g. PCR or microarrays) and, at its most fundamental level, does not require previous knowledge of the resistance phenotype of the isolate. Nevertheless, there is a need to understand the potential ‘added value’ of WGS with regard to the clinical implications of AMR and so the validity of data generated by these novel technologies must be challenged against phenotypic methods to differentiate WT isolates from NWT, or S isolates from R isolates. In this regard WGS is a genetic test that defines a genotype as WT or NWT and compares most directly with phenotypic criteria that do the same (ECOFFs).
Although more informative than conventional molecular techniques, WGS is no simple task, especially when the data have been generated by short-read (‘second generation’) technology. Detection of defined resistance genes can be achieved either by BLAST (Basic Local Alignment Search Tool) analysis of draft genomes against a gene-based database or by mapping individual reads to the same type of database. Such solutions are already available, for example as downloadable tools such as ARG-ANNOT. The gene-based solutions have the obvious requirement for full-length genes identical to already characterized (and preferably published) AMR genes. The bioinformatics solutions mentioned above are able to identify less-than-perfect hits (<100% nucleotide identity, truncated genes because of non-perfect de novo assembly), but such hits will always need to be subjected to some sort of assessment if they are to be translated into a predicted phenotype.
Accurate prediction of resistance by WGS can be complicated by insufficient knowledge about all genetic variation leading to reduced susceptibility for a given antimicrobial agent (such as colistin or daptomycin), by the emergence of new mechanisms, and by resistance that arises from altered expression of intrinsic genes (e.g. those encoding efflux pumps). Also, shortcomings of second-generation sequencing technology may hamper accuracy. An example of the latter could be Enterococcus faecium, where a point mutation (G2576T) in two or three copies of the six 23S rRNA loci would lead to phenotypic resistance to linezolid. De novo assembly of second-generation sequence data from the same isolate would most likely lead to assembly into a WT version of the 23S gene due to the consensus-based nature of the assemblers, where only the most abundant base is reported in the draft genome data.
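The masking effect of consensus calling on a multi-copy gene can be shown with a small Python sketch. It assumes a hypothetical read pile-up at 23S rRNA position 2576 in which one third of reads (two of six gene copies) carry the T allele; the function names and the 10% reporting threshold are illustrative choices, not part of any published pipeline.

```python
from collections import Counter
from typing import Dict


def consensus_base(bases: str) -> str:
    """Most abundant base among the reads covering one position."""
    return Counter(bases).most_common(1)[0][0]


def minor_alleles(bases: str, min_fraction: float = 0.1) -> Dict[str, float]:
    """Non-consensus bases whose read fraction reaches min_fraction."""
    counts = Counter(bases)
    total = len(bases)
    major = consensus_base(bases)
    return {
        base: n / total
        for base, n in counts.items()
        if base != major and n / total >= min_fraction
    }


# Hypothetical pile-up: 60 reads, 40 with wild-type G and 20 with mutant T.
pileup = "G" * 40 + "T" * 20

print(consensus_base(pileup))  # G: the draft genome looks wild type
print(minor_alleles(pileup))   # the T allele is visible only at read level
```

In this sketch the consensus reports G, so an assembly-based search would miss the mutation, whereas inspecting allele fractions in the mapped reads reveals it.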
Quality metrics for WGS
Like any other test, the quality of WGS data can vary between individual test runs. Therefore before any actual bioinformatics analysis, QC steps are essential to assess whether the WGS data have reached a suitable standard. Only data sets passing these QC metrics should be used in antimicrobial susceptibility predictions, as resistance genes or mutations might be missed in sequences of poor quality. These QC steps include (a) assessing the quality and quantity of the raw reads to ensure sufficient coverage (e.g. >30 times coverage) of the bacterial genome, (b) assessing the quality of the de novo assembly (leading to a draft genome sequence) and (c) detecting possible contaminant DNA, originating either from upstream handling of the bacterial isolates and DNA purification or from the preparation and running of the DNA samples on the sequencer. Some of the different sequence QC parameters that have been used are listed in Table 1. The parameters most frequently used are highlighted in bold. There are currently no international standards for QC thresholds to use for assessing quality. This seems to be individually decided by researchers and may also depend on the purpose of the study or the methods used for sequencing. A high error rate for a sequencing method can be compensated to some extent by greater depth of coverage. The necessary QC threshold also depends on the species analysed. Hence, before WGS can be routinely implemented into accredited clinical practice there is a need to establish necessary minimum QC thresholds (e.g. by multiple sequencing of reference isolates) for identification of resistance genes and their variants.
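The read-depth check in step (a) can be sketched in Python as follows. This is a minimal illustration that assumes depth is estimated as total sequenced bases divided by genome size; the function names, the example run and the 30x default are assumptions, since no international QC thresholds have been agreed.

```python
from typing import Iterable


def depth_of_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Total number of sequenced bases divided by the genome size."""
    return sum(read_lengths) / genome_size


def passes_depth_qc(read_lengths: Iterable[int], genome_size: int,
                    min_depth: float = 30.0) -> bool:
    """True if the estimated depth reaches the chosen minimum."""
    return depth_of_coverage(read_lengths, genome_size) >= min_depth


# Hypothetical run: 1.2 million reads of 150 bp against a 5 Mb genome.
reads = [150] * 1_200_000
depth = depth_of_coverage(reads, 5_000_000)
print(f"{depth:.0f}x", passes_depth_qc(reads, 5_000_000))  # 36x True
```

A data set failing this gate would be excluded before any resistance prediction is attempted.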
Table 1. Selected quality control (QC) parameters used to evaluate whole genome sequencing (WGS) data (most commonly used are shown in bold)
Parameter: Description
Number of reads: Sequence yield (the amount sequenced).
Average read length: The average length of all reads, measured in base pairs.
Number of reads mapped to reference sequence: The number of reads that map to a closed (finished) genome (same strain).
Proportion of reads mapped to reference sequence (%): The proportion of reads that map to a closed genome (same strain).
Number of reads mapped to reference chromosome: The number of reads that map to a closed chromosome (same strain).
Proportion of reads mapped to reference chromosome (%): The proportion of reads that map to a closed genome’s chromosome (same strain). This cannot exceed 100%.
Reads mapped to reference plasmids: The number of reads that map to plasmids, if present.
Proportion of reads mapped to reference plasmids (%): The proportion of reads that map to plasmids (if present) of the closed genomes. This cannot exceed 100%.
Depth of coverage, total DNA sequence: Describes the number of times the sequenced base pairs cover the reference DNA. Number of base pairs sequenced divided by the total size (both chromosome and plasmids) of the closed genome (same strain), often expressed with an “x” (e.g. 30x). A minimum depth of 30x is usually preferred.
Depth of coverage, chromosome: As for total DNA coverage, but describes the number of base pairs sequenced divided by the total size of the closed chromosome (same strain).
Depth of coverage, plasmid: As for total DNA coverage, but describes the number of base pairs sequenced divided by the total size of the closed plasmid (same strain).
Size of assembled genome: Often used to identify contamination. If the calculated size of all the contigs in base pairs exceeds that expected it could indicate more than one genome.
Size of assembled genome per total size of DNA sequence (%): The proportion of contigs that map directly to the closed genome (same strain). This cannot exceed 100%.
Total number of contigs: The total number of contigs assembled. Generally, <1000 contigs indicates good quality. For organisms with genomes 5–6 Mb in size, <100 contigs is (generally) realistic.
Number of contigs >500 bp: The total number of contigs assembled that have a sequence length >500 bp. This should correspond well to the total number of contigs.
Longest contig length: The length of the longest contig.
Shortest contig length: The length of the shortest contig.
Mean, median and standard deviation: Mean, median and standard deviation of the contigs, used to evaluate quality.
N50: The length for which the collection of all contigs of that length or longer contains at least half of the sum of the lengths of all contigs, and for which the collection of all contigs of that length or shorter also contains at least half of the sum of the lengths of all contigs. N50 >15 000 normally indicates good quality, but minimum size of 30 000 bp is often preferred.
NG50: Helpful for comparisons between assemblies. As N50, except that 50% of the genome size must be of the NG50 length or longer. Where the assembly size ≤ the genome size then NG50 cannot exceed N50.
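The N50 and NG50 entries in Table 1 can be made concrete with a short Python sketch. It uses the usual computation (sort contigs from longest to shortest and accumulate until half of the target total is reached); the helper name `nx50` and the toy contig lengths are illustrative assumptions.

```python
from typing import Optional, Sequence


def nx50(contig_lengths: Sequence[int], target_total: int) -> Optional[int]:
    """Length of the contig at which the running sum of the longest
    contigs first reaches half of target_total (None if never reached)."""
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= target_total:
            return length
    return None


def n50(contig_lengths: Sequence[int]) -> Optional[int]:
    """N50: half of the assembly size is the target."""
    return nx50(contig_lengths, sum(contig_lengths))


def ng50(contig_lengths: Sequence[int], genome_size: int) -> Optional[int]:
    """NG50: half of the expected genome size is the target."""
    return nx50(contig_lengths, genome_size)


contigs = [80, 70, 50, 40, 30, 20, 10]  # toy assembly, total 300
print(n50(contigs))        # 70
print(ng50(contigs, 400))  # 50
```

Because the toy assembly (300) is smaller than the assumed genome size (400), NG50 does not exceed N50, as the table states. NG50 is undefined (`None` here) when the assembly covers less than half of the genome.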
] in proficiency testing of WGS data, and isolates have been distributed to 50 laboratories worldwide. This and similar initiatives are important first steps towards setting objective QC thresholds. There is, however, a need to expand this using more isolates as well as developing standard data sets of raw sequences to facilitate the assessment of performances across different laboratories.
The need for a standardized, open-access database
Most of the genomes released now are not closed, so there is a need for better standardization of annotation to facilitate detection of AMR genes because standard BLAST analysis will retrieve plenty of hits within annotated or raw sequences available in GenBank, and those hits will be inconsistently annotated even where the actual sequences are identical.
Currently several AMR databases exist; some are downloadable for use locally (e.g. ARG-ANNOT). In addition to a fully curated database of accurately annotated genes that seeks to avoid the pitfalls posed by erroneously annotated genes, there also exists a need for a single, standardized ‘challenge database’ solution that contains all validated AMR genes as well as those point mutations in chromosomal target genes that are known to be associated with antimicrobial resistance. This can then be used as a standard reference data set for different bioinformatics analysis tools. Any such solution should be flexible so that stringency of detection can be changed to allow detection from partial gene sequences (length <100%) and/or AMR genes with identities <100%. Looking at conserved sequence motifs in gene families (e.g. β-lactamases) should also help in assessing the validity of newly detected genes and in particular those exhibiting low sequence identity matches.
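Adjustable stringency of detection can be sketched in Python as a simple filter on alignment hits. The `Hit` record, the `classify_hit` function, the default thresholds and the example values are all hypothetical; they do not describe the behaviour of ARG-ANNOT or any other existing tool.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    identity: float  # percent nucleotide identity over the aligned region
    coverage: float  # percent of the reference gene length that is aligned


def classify_hit(hit: Hit, min_identity: float = 90.0,
                 min_coverage: float = 60.0) -> str:
    """Label a hit as 'exact', 'partial' (needs assessment) or 'rejected'."""
    if hit.identity == 100.0 and hit.coverage == 100.0:
        return "exact"
    if hit.identity >= min_identity and hit.coverage >= min_coverage:
        return "partial"
    return "rejected"


# Hypothetical hits against a reference AMR gene database.
hits = [
    Hit("blaCTX-M-15", 100.0, 100.0),
    Hit("blaTEM-1", 99.2, 100.0),
    Hit("aadA1", 85.0, 40.0),
]
for hit in hits:
    print(hit.gene, classify_hit(hit))
```

Lowering `min_identity` or `min_coverage` widens detection to more divergent or truncated genes, at the cost of more hits that need expert assessment before any phenotype is inferred.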
This single web solution should be iterative and enhanced by regular, validated updates of newly identified gene sequences and point mutations at a frequency that remains to be decided. Machine learning should be explored as a way to improve the detection algorithms iteratively and automatically.
However, to achieve this goal there must also be clear international consensus on the criteria used to define a gene as ‘new’ (i.e. % of identity with existing genes) or as a variant of known genes. This is inextricably linked to issues of gene nomenclature. Currently, different criteria are used depending on the antimicrobial class to which a particular gene confers resistance. For example, a ‘new’ β-lactamase gene can be defined by as little as one amino acid difference from a known sequence and regardless of any impact that this change might have on the conferred resistance phenotype.
There should be minimum standards for inclusion of new resistance determinants in the standardized database, and these standards would probably differ from those currently required for publication (e.g. they may be more demanding). It seems reasonable that new genes should have a full gene sequence, which can be translated into a protein sequence, and that they should have been unequivocally linked to a predicted resistance phenotype, as was recently exemplified by mcr-1 plasmid-mediated colistin resistance.
Categories of systematic errors in WGS predictions of AMR
When comparing the concordance between phenotypic and genotypic AMR it is essential to consider the reasons that errors may occur. Three broad reasons for systematic errors are:
An inadequate limit of detection of WGS.
Flaws with phenotypic AST.
Incomplete understanding of the genotypic basis of phenotypic resistance.
Of these, the limit of detection of WGS applies to the detection of hetero-resistance, which is most applicable to Mycobacterium tuberculosis, as for most other organisms WGS is usually performed from single bacterial colonies. Flaws due to phenotypic detection issues become most apparent when the knowledge base of the genetic basis of resistance is relatively complete for a given organism and can point to such problems. For the purposes of this report it will likely only apply to well-progressed / well-characterized species (such as Mycobacterium tuberculosis and Staphylococcus aureus). At this relatively early stage of development of WGS-based genotype–phenotype comparisons it can be anticipated that there may be many gaps in the knowledge base and these will be explored and highlighted in the following evidence reports.
Evidence reports for in silico prediction of antimicrobial resistance
Enterobacteriaceae (other than Salmonella spp.)
Multidrug-resistant Enterobacteriaceae are emerging as a serious infectious disease challenge. They can accumulate many antimicrobial resistance genes through horizontal transfer of genetic elements, those conferring resistance to β-lactams (e.g. genes encoding ESBLs and carbapenemases), fluoroquinolones and aminoglycosides being of particular concern.
A small number of studies have assessed the feasibility of using WGS to infer AMR in E. coli and Klebsiella pneumoniae genomes; they are largely based on screening for known acquired AMR genes and a small number of known resistance-conferring mutations, such as those associated with ciprofloxacin resistance. In one study, Stoesser et al. reported 95% concordance between phenotypic and WGS-predicted susceptibility for seven commonly used agents (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin, ceftriaxone, ceftazidime and meropenem) by querying 143 assembled genomes from E. coli and K. pneumoniae with a compiled database of acquired AMR sequences and mutations in the quinolone-resistance-determining regions of gyrA and parC. An even higher level of concordance (99.74%) between phenotypic susceptibility testing and the WGS-predicted resistance to five agents or classes of agents (β-lactams, chloramphenicol, sulphonamides, tetracycline and trimethoprim) in E. coli genomes was reported in an earlier study using the same approach with a database of 1411 different AMR sequences, confirmed using simple blotting and PCR approaches to the expected genes. A recent study investigating 76 E. coli isolates from farm cattle also showed good phenotype–genotype correlation (97.8%), with the majority of discordant results attributed to the prediction of streptomycin resistance.
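Concordance figures of this kind can be computed from paired phenotypic and genotypic calls, as in the following Python sketch. It treats the phenotype as the reference, which is an assumption of the sketch; the function name and the example pairs are made up for illustration and do not reproduce data from the cited studies.

```python
from typing import Dict, Iterable, Optional, Tuple


def concordance_metrics(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Optional[float]]:
    """pairs holds (phenotype, genotype) calls, each 'R' or 'S'."""
    tp = fn = tn = fp = 0
    for phenotype, genotype in pairs:
        if phenotype == "R":
            if genotype == "R":
                tp += 1   # resistance correctly predicted
            else:
                fn += 1   # resistance missed by WGS
        else:
            if genotype == "S":
                tn += 1   # susceptibility correctly predicted
            else:
                fp += 1   # resistance predicted but not expressed
    total = tp + fn + tn + fp

    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None

    return {
        "concordance": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up example: 20 isolate/agent pairs.
pairs = [("R", "R")] * 8 + [("R", "S")] + [("S", "S")] * 10 + [("S", "R")]
print(concordance_metrics(pairs))
```

Reporting sensitivity and specificity alongside overall concordance matters because a high concordance can hide poor detection of resistance when most isolates are susceptible.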
There are important limitations to identifying the mode of transmission of these acquired genes in short-read sequences due to exclusion of repeat regions during the ‘cleaning’ stages while initial contigs are assembled. This typically results in inaccuracies in annotation. These are particularly marked in highly recombinant plasmids, which unfortunately carry most of the AMR genes that are relevant to β-lactam and aminoglycoside resistance in the Enterobacteriaceae. Nevertheless, using sequencing technologies with longer reads (and greater cost), such as PacBio and MinION, has improved their detection. These bioinformatics challenges also include the development of tools that can detect signature sequences of AMR determinants (e.g. β-lactamase motifs) to identify potentially new variants conferring acquired AMR, which can be explored in more detail.
Problems found or anticipated—gaps in the knowledge base
Chromosomal mutations that alter the cell membrane permeability due to modification in the structure or the levels of expression of outer membrane proteins, antimicrobial efflux due to efflux pumps such as resistance-nodulation-division and major facilitator superfamily pumps, or changes in the lipopolysaccharide structure have been linked to decreased susceptibility and resistance to β-lactams, quinolones, chloramphenicol, tetracyclines, tigecycline and colistin in Enterobacteriaceae, but have yet to be fully elucidated. This makes comprehensive phenotypic–genotypic comparisons difficult [
] by limiting the sensitivity of the WGS-based data. In particular, the relationships between chromosomal mutations and the related phenotypic changes responsible for resistance are not always well characterized, and screening genome sequences for insertion sequences interrupting or modifying the expression of resistance-associated genes, including intrinsic β-lactamases, could be problematic due to constraints inherent in using short reads. Therefore, and in contrast to horizontally acquired resistance genes, the ability of WGS to predict resistance due to, or modulated by, chromosomal alterations is likely to be restricted by existing knowledge, as in the case of carbapenem non-susceptibility resulting from the combination of decreased permeability and AmpC or ESBL enzymes and also for antimicrobial agents for which the underlying genetic backgrounds of resistance are yet to be fully characterized (e.g. amoxicillin-clavulanic acid, nitrofurantoin, temocillin, colistin and tigecycline). Screening for loss-of-function mutations caused by nonsense mutations, frameshifts or insertion elements is realistically achievable, as these are considerably less complex to interpret than substitutions affecting the structure, dynamics and substrate specificity of resistance-conferring proteins [
]. The effect of amino acid changes in the transmembrane β-strand loop 3 that constitutes the porin channel eyelet that was associated with diminished carbapenem uptake in the endemic K. pneumoniae carbapenemase-producing K. pneumoniae ST258 clone illustrates the complexity of interpreting amino acid substitutions identified by WGS in the absence of experimental evidence [
]. Recent studies have shown that the genetic basis of resistance to colistin in K. pneumoniae clinical isolates can be attributed in the majority of cases to alterations in the mgrB regulator or the two-component systems pmrAB or phoPQ that regulate the expression of the biosynthesis pathway of lipid A [
]. However, incorrectly inferring susceptibility remains a risk if resistance is mediated by genuinely novel, undiscovered genetic factors, as evidenced by the description of mcr-1, the first known transferable colistin resistance determinant.
The gaps in the existing knowledge of genotype–phenotype relationships could be filled in some cases by directly detecting the levels of gene expression by sequencing RNA extracts. This approach was used successfully as a proof-of-concept for the detection of ompF down-regulation associated with cephalosporin resistance and over-expression of the resistance-nodulation-division pump component acrB leading to decreased susceptibility to quinolones, tetracycline and chloramphenicol in an E. coli laboratory-selected mutant [
]. Although a similar approach should also be feasible for the detection of hyper-production of intrinsic chromosomal β-lactamases (e.g. AmpC in Enterobacter spp.) the use of such methods, for which accuracy is highly dependent on bacterial growth conditions, is likely to be confined to a sub-set of laboratories in the foreseeable future and will not be considered further for the purposes of this document, which seeks to examine widely used techniques only.
The relatively limited number of acquired resistance genes and resistance-associated mutations that dominate epidemiologically in the Enterobacteriaceae (compared with the large number of those that have been reported in the resistome) could explain the high levels of accuracy of genotype–phenotype correlation in published studies and means that well-informed screening approaches can be very accurate. However, susceptibility to some antimicrobial agents will be harder to predict than for others and understanding the full range of mechanisms and their interplay will require more study if improved levels of accuracy across large, genetically diverse data sets are to be achieved.
Molecular mechanisms conferring reduced susceptibility to antimicrobial agents are relatively well characterized in Salmonella spp. The majority of these are encoded by horizontally transferable genes, which potentially makes genotypic detection a reliable alternative to phenotypic testing as these genes generally assemble into full-length genes when using short-read sequencing data, as long as the quality and quantity of these are adequate to produce good assemblies. In addition to acquired gene-based AMR, mutationally acquired AMR also exists in Salmonella spp. The clinically most important examples currently are single or double mutations in the gyrA DNA gyrase and parC topoisomerase genes leading to reduced susceptibility to quinolones and fluoroquinolones. Resistance to third-generation cephalosporins due to acquired extended-spectrum and AmpC β-lactamases is also clinically important to consider.
Very few comprehensive studies have been published where phenotypic susceptibility data have been compared with the underlying molecular mechanisms identified in WGS data sets from a collection of Salmonella isolates. Zankari et al. used a set of 50 Salmonella enterica serovar Typhimurium isolates originating from pigs and previously tested phenotypically against 17 different antimicrobial agents as part of the DANMAP surveillance program [
]. WGS was performed on these 50 isolates, of which 49 produced sufficient WGS data to create draft genomes for analysis with the ResFinder web-tool. Here, complete agreement (100% sensitivity and specificity) between tested and predicted susceptibility/resistance phenotypes (S/R) was observed [
]. However, this perfect correlation between phenotypes and genotypes generated by ResFinder was to some extent biased by (a) the fact that none of the isolates showed phenotypic resistance to quinolones or fluoroquinolones—which would, in most cases, have been unnoticed by ResFinder as it currently does not detect chromosomally acquired mutations leading to AMR; and (b) there was a relatively low level of diversity among the resistance phenotypes and hence resistance genes found.
In addition, a few studies exist where only a small number of isolates have been analysed using both phenotypic and genotypic methods. In a study of extremely drug-resistant Salmonella enterica serovar Senftenberg, two isolates from Zambia were analysed both by conventional phenotypic methods and by WGS analysis [
]. Here, genes conferring reduced susceptibility to nine drug classes, including fluoroquinolones and extended-spectrum cephalosporins, were identified, again with the use of the ResFinder web-tool. These isolates also demonstrated high-level resistance to fluoroquinolones caused by mutations in GyrA (S83Y and D87G) and ParC (S80I), which were identified manually. An underlying molecular mechanism was identified for all the AMR phenotypes displayed by the two isolates. In a similar study, two ESBL-producing Salmonella enterica serovar Typhi isolates were tested phenotypically against 25 different antimicrobial agents belonging to ten different classes [
]. Here, ResFinder found seven different AMR genes, which in combination could explain the observed AMR phenotypes of the two isolates.
Problems found or anticipated—gaps in the knowledge base
The overall degree of resistance within Salmonella spp. varies depending on the serovar and phage type; some may be associated with resistance to particular antimicrobial agents, whereas others may have an increased propensity for multidrug resistance. Diversity within the isolate panel tested will therefore be likely to impact on the conclusions drawn regarding the utility of WGS for AMR prediction in Salmonella spp. WGS should therefore be applied to further isolate panels reflecting the diversity of Salmonella serovars (and their common resistance phenotypes) associated with clinical and veterinary infections. Included in these panels should be representatives of some of the particularly multiresistant clones that are currently circulating in human and animal populations, e.g. serovars Kentucky, Infantis and monophasic Typhimurium. Nevertheless, recent data from the European Centre for Disease Prevention and Control indicate that nearly 55% of Salmonella spp. from humans are susceptible to all antimicrobial classes tested, suggesting that resistance prediction in Salmonella spp. may be more straightforward than in other organisms, where multidrug resistance is the norm [
As with the other Enterobacteriaceae, detection of chromosomal mutations leading to acquired antimicrobial resistance is a challenge that still needs to be addressed to be able to predict resistance phenotypes fully based on WGS data. Priority should be afforded to the detection of mutations leading to fluoroquinolone resistance, as this drug class has high clinical relevance and phenotypic resistance is commonly detected in salmonellae. Development of fluoroquinolone resistance can be a multifactorial process involving acquisition of mutations leading to amino acid substitutions within the topoisomerase genes and altered expression of outer membrane proteins and/or multidrug efflux pumps. Fortunately, the most common chromosomal mutations leading to acquired fluoroquinolone resistance in salmonellae are well characterized; decision rules to translate mutations into a predicted phenotype are therefore available and can, in principle, be incorporated into existing tools such as ResFinder. This, however, requires detection of amino acid variation rather than nucleotide variation, which is currently used to detect (transferable) resistance genes. Other possible candidates for mutation detection are pmrA and pmrB, in which chromosomal mutations lead to reduced susceptibility to colistin.
The relatively few studies available on the feasibility of using WGS data to predict antimicrobial resistance in salmonellae show promising results, but the impact of the genetic diversity of the sample sets tested needs to be explored in detail before further conclusions are drawn about the use of WGS data for AMR prediction in Salmonella spp.
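The decision rules for fluoroquinolone resistance mentioned above work at the amino acid level. As a minimal sketch of that idea (not the ResFinder implementation), the Python example below compares pre-aligned protein fragments and flags known substitutions. The rule table, the function names and the toy sequences are all assumptions made for illustration; only the three substitutions named earlier in this section (GyrA S83Y, GyrA D87G, ParC S80I) are included, and a usable table would need expert curation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule table: (protein, substitution) pairs treated as markers of
# reduced fluoroquinolone susceptibility. Illustrative only, not a curated set.
QRDR_MARKERS: Set[Tuple[str, str]] = {
    ("GyrA", "S83Y"),
    ("GyrA", "D87G"),
    ("ParC", "S80I"),
}


def find_substitutions(reference: str, query: str, first_position: int = 1) -> List[str]:
    """List substitutions (e.g. 'S83Y') between two pre-aligned, equal-length
    protein fragments; first_position is the residue number of the first letter."""
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        f"{ref_aa}{pos}{query_aa}"
        for pos, (ref_aa, query_aa) in enumerate(zip(reference, query), start=first_position)
        if ref_aa != query_aa
    ]


def predict_fluoroquinolone(substitutions: Dict[str, List[str]]) -> Dict[str, object]:
    """Apply the rule table: any known marker gives a non-wild-type call."""
    hits = sorted(
        (protein, sub)
        for protein, subs in substitutions.items()
        for sub in subs
        if (protein, sub) in QRDR_MARKERS
    )
    prediction = "non-wild-type" if hits else "no known marker detected"
    return {"prediction": prediction, "markers": hits}


# Toy fragments only (not real GyrA/ParC sequences).
observed = {
    "GyrA": find_substitutions("ADSAV", "ADYAV", first_position=81),
    "ParC": find_substitutions("GDSAC", "GDIAC", first_position=78),
}
result = predict_fluoroquinolone(observed)
print(result)
```

The absence of a marker is reported as "no known marker detected" rather than "susceptible", because altered efflux or porin expression can also contribute to resistance and would not be captured by a substitution table.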
Staphylococcus aureus exhibits intrinsic susceptibility to commonly used antimicrobial agents. Resistance is associated with mutations in core genes or the acquisition of specific antimicrobial resistance genes. Generally, the history of antimicrobial resistance in this species is associated with the evolution of resistance shortly after the first introductions of a new agent into clinical practice, for example resistance to methicillin was detected in clinical isolates of S. aureus within a year of introduction into the UK [
]. The problem of resistance in this species has driven extensive studies to identify the genetic basis of resistance, and as such there is a large body of literature documenting resistance mechanisms for most of the clinically relevant agents used. This has revealed a well-characterized spectrum of mechanisms bestowing resistance in S. aureus, and in the cases of some AMR determinants, in other staphylococcal species and also in other genera. This has proved to be a valuable resource for the in silico prediction of antimicrobial resistance and has contributed to the overall value of the results.
To date, several studies have been published that demonstrate the ability to predict antimicrobial resistance from genome sequence data [
The initial demonstration of the potential of WGS data to predict AMR as a clinical tool came from proof-of-concept studies using the benchtop Illumina MiSeq platform to investigate suspected methicillin-resistant S. aureus (MRSA) outbreaks. Köser et al. sequenced 14 isolates belonging to four different clonal complexes of S. aureus, and demonstrated 100% concordance of the in silico resistance prediction with the phenotypic results for 13 different agents (cefoxitin, erythromycin, ciprofloxacin, gentamicin, tetracycline, rifampicin, fusidic acid, mupirocin, clindamycin, kanamycin, tobramycin, trimethoprim and linezolid) [
]. The authors used an in-house database of resistance determinants derived from literature mining and mapped sequence reads to a resistome pseudo-molecule (concatenated resistance genes in a single DNA sequence), followed by manual inspection to predict the resistance profile of each isolate. Investigating an MRSA cluster in an intensive care unit, Eyre et al. sequenced ten isolates belonging to the same spa-type (t5973) and conducted in silico predictions for penicillin and tetracycline [
]. The authors took a different bioinformatics path to investigate the antimicrobial resistance, using the de novo assemblies to look for the presence and absence of two genes, tetK and blaZ. In all cases the presence of these genes correlated with the phenotypic resistances to respective agents.
Examining 13 isolates belonging to USA300 clone, Lee et al. predicted the antimicrobial resistance profiles for nine agents (ciprofloxacin, clindamycin, doxycycline, erythromycin, gentamicin, oxacillin, tetracycline, trimethoprim-sulfamethoxazole and vancomycin) in complete concordance with the phenotypic results [
]. In the case of this study, details of the antimicrobial resistance database used were not provided.
The effectiveness of in silico prediction for antimicrobial resistance in S. aureus has been further demonstrated in larger studies, both in terms of the number of isolates and also the agents investigated. Using the genome data of 193 isolates belonging to a global collection of ST22, Holden et al. applied a mapping based approach, coupled with manual inspection, to identify molecular determinants that explained 99.8% of the measured phenotypic resistance traits [
]. In total, 847 resistance traits were tested for 18 agents (penicillin, oxacillin, gentamicin, linezolid, erythromycin, clindamycin, ciprofloxacin, fusidic acid, mupirocin, moxifloxacin, trimethoprim-sulfamethoxazole, tetracycline, vancomycin, teicoplanin, rifampicin, fosfomycin, tigecycline and daptomycin), using an enhanced version of the library used by Köser et al.
Using the WGS data and phenotype data for 501 S. aureus isolates as a derivation set to optimize predictions for 12 agents (penicillin, methicillin, erythromycin, clindamycin, tetracycline, ciprofloxacin, vancomycin, trimethoprim, gentamicin, fusidic acid, rifampicin and mupirocin), Gordon et al. then conducted a blind validation of their refined method on a query set of 491 isolates and demonstrated sensitivity of 0.97 and specificity of 0.99 [
]. In their resistance database 18 acquired genes were included: blaZ, mecA, msr(A), erm(A), erm(B), erm(C), erm(T), tet(K), tet(L), tet(M), vanA, fusB, far, dfrA, dfrG, aacA-aphD, mupA and mupB, and variation in six core genes: gyrA (n = 6), grlA (n = 13), grlB (n = 6), fusA (n = 59), rpoB (n = 28) and dfrB (n = 8).
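A database of acquired genes of this kind lends itself to a simple presence/absence lookup. The Python sketch below illustrates the principle only: the gene-to-agent assignments are a simplified assumption based on the usual associations of these genes, not a reproduction of the Gordon et al. database, and the core-gene variants (gyrA, grlA, grlB, fusA, rpoB, dfrB) are deliberately left out. The names `GENE_TO_AGENTS` and `predict_profile` are hypothetical.

```python
from typing import Dict, Iterable, List

# Simplified, illustrative mapping of acquired genes to the agents they are
# usually associated with. Not a curated database.
GENE_TO_AGENTS: Dict[str, List[str]] = {
    "blaZ": ["penicillin"],
    "mecA": ["methicillin"],
    "msr(A)": ["erythromycin"],
    "erm(A)": ["erythromycin", "clindamycin"],
    "erm(B)": ["erythromycin", "clindamycin"],
    "erm(C)": ["erythromycin", "clindamycin"],
    "erm(T)": ["erythromycin", "clindamycin"],
    "tet(K)": ["tetracycline"],
    "tet(L)": ["tetracycline"],
    "tet(M)": ["tetracycline"],
    "vanA": ["vancomycin"],
    "fusB": ["fusidic acid"],
    "far": ["fusidic acid"],
    "dfrA": ["trimethoprim"],
    "dfrG": ["trimethoprim"],
    "aacA-aphD": ["gentamicin"],
    "mupA": ["mupirocin"],
    "mupB": ["mupirocin"],
}

# Agents covered by the acquired-gene table. Agents whose resistance is purely
# mutational (e.g. ciprofloxacin, rifampicin) are outside this sketch.
AGENTS: List[str] = sorted({a for agents in GENE_TO_AGENTS.values() for a in agents})


def predict_profile(detected_genes: Iterable[str]) -> Dict[str, str]:
    """Return 'R' for each agent with at least one detected gene, else 'S'."""
    profile = {agent: "S" for agent in AGENTS}
    for gene in detected_genes:
        for agent in GENE_TO_AGENTS.get(gene, []):
            profile[agent] = "R"
    return profile


example = predict_profile({"mecA", "erm(C)"})
print(example)
```

A presence/absence rule of this kind cannot capture plasmid loss during laboratory passage or hetero-resistance, so any real pipeline would need additional checks.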
In a significant departure from previous studies in S. aureus, Bradley et al. described a stand-alone tool, Mykrobe predictor (http://www.mykrobe.com/), for antimicrobial resistance prediction directly from fastq files and which does not rely on mapping or assembly-based approaches [
]. This tool uses a de Bruijn graph-based approach to compare sequence reads to a reference graph representation. The method has the advantage of being faster than the mapping and assembly-based methods, and it can also identify minority variants in sequencing data and therefore identify potential contamination issues. In their study, Bradley et al. used 495 isolates as a training set, and then validated the tool with a collection of WGS data from a further 471 isolates. The tool uses the Gordon et al. database [
] with some additional refinements and makes predictions for the same 12 agents. Using the tool, Bradley et al. were able to demonstrate sensitivity of 99.1% and specificity of 99.6% for the genotypic predictions in comparison with the phenotypes.
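The sensitivity, specificity and concordance figures quoted in these studies come from comparing genotypic predictions with phenotypic results, one isolate–agent pair at a time. The Python sketch below shows the arithmetic, treating the phenotype as the reference and resistance ("R") as the positive class. The function name and the toy data are assumptions for illustration and do not come from any of the cited studies.

```python
from typing import Dict, Optional, Sequence


def evaluate(predicted: Sequence[str], observed: Sequence[str]) -> Dict[str, Optional[float]]:
    """Compare predicted and observed S/R calls, with 'R' as the positive class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted, observed):
        if obs == "R":
            if pred == "R":
                tp += 1  # resistant isolate correctly predicted
            else:
                fn += 1  # resistant isolate predicted susceptible
        else:
            if pred == "R":
                fp += 1  # susceptible isolate predicted resistant
            else:
                tn += 1  # susceptible isolate correctly predicted
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "concordance": (tp + tn) / total if total else None,
    }


# Toy data: six isolate-agent comparisons.
predicted_calls = ["R", "R", "S", "S", "R", "S"]
observed_calls = ["R", "S", "S", "S", "R", "R"]
metrics = evaluate(predicted_calls, observed_calls)
print(metrics)
```

Reporting sensitivity and specificity separately matters because a false-susceptible prediction and a false-resistant prediction carry different clinical consequences, which a single concordance figure hides.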
Problems found or anticipated—gaps in the knowledge base
The evolution of antimicrobial resistance in S. aureus occurs by point mutation in core genes, and also by horizontal acquisition of resistance genes via mobile genetic elements. In the studies conducted so far, it is apparent that in some cases the relative genetic instability of mobile genetic elements carrying resistance genes can be a cause of discrepancy in the genotypic and phenotypic comparisons. One of the agents most prone to this problem is erythromycin. In S. aureus, genes encoding erythromycin resistance are often found on plasmids, such as in the case of erm(C). The instability of the erm(C)-carrying plasmid has been well documented, and it can be lost during passage of isolates in the laboratory. In the study by Holden et al. [
] discrepancies in erythromycin resistance prediction were thought to be due to the loss of the erm(C)-carrying plasmid during propagation and transfer between testing and genomics laboratories. Similar observations about the loss of the SCCmec element from the chromosome have also been made, which can account for discrepancies in cefoxitin resistance, albeit at a far lower frequency. In this case, evidence of the deletion of the whole SCCmec element carrying the mecA gene encoding cefoxitin resistance can be observed. Genetic stability of core components can also affect the observed resistance levels for some agents in S. aureus. Hetero-resistance has been observed whereby a sub-population of cells in a cultured population exhibits a higher MIC than its ‘siblings’. WGS of sub-populations has uncovered genetic variation associated with hetero-resistance to vancomycin, daptomycin and oxacillin [
Although the studies that have been published so far have generally demonstrated the effectiveness of in silico resistance prediction, there is evidence emerging that the performance for some agents will be less accurate than for others. Aanensen et al. [
] recently conducted a blinded study of 308 isolates and phenotypically tested a range of agents (16 agents were tested against all isolates: penicillin, cefoxitin, ciprofloxacin, moxifloxacin, amikacin, gentamicin, tobramycin, erythromycin, clindamycin, tetracycline, tigecycline, fusidic acid, linezolid, mupirocin, rifampicin, trimethoprim; and for MRSA isolates teicoplanin, vancomycin and daptomycin were also tested). Overall, the total performance of the in silico prediction was in line with previous studies, with 98.6% concordance, although for some agents, such as amikacin (92.5% concordance) and teicoplanin (97.5% concordance), the in silico prediction proved less effective.
For some agents there are clearly gaps in the knowledge base for the genetic basis of resistance that require further investigation. For example, in the case of glycopeptides, such as vancomycin, the multiplicity and diversity of mutational changes suspected to be linked with increased MICs in S. aureus are so great that they confound accurate prediction of susceptibility or resistance.
Discrepancies between the genotype and phenotype in some studies have been revealed to be laboratory artefacts and errors, where phenotypic re-testing led to concordance. Technical variation in some of the tests is also a possible contributing factor in mis-matches. For example, in the study by Aanensen et al. [
] it was noted that five isolates had incorrect in silico predictions for mupirocin. The inhibition zones of these isolates (all 29 mm) were so close to the ECOFF (WT ≥30 mm) that it cannot be ruled out that technical variation in the phenotypic AST influenced the results, or that a revision of the zone diameter ECOFF for mupirocin is needed.
The in silico prediction of AMR susceptibility for S. aureus is effective for most clinically relevant agents. There are, however, some agents for which it is more challenging to make predictions and further investigation is required to characterize the genetic and phenotypic basis of resistance.
Streptococcus pneumoniae is clinically a highly important community pathogen, in which genetic detection of acquired resistance is particularly challenging because most resistance results from development of mosaic genes or mutations in chromosomally encoded genes [
]. The most clinically important groups of antimicrobial agents with activity against pneumococci are the β-lactams, macrolides, tetracyclines and newer fluoroquinolones (e.g. moxifloxacin). WGS-based approaches have been used for characterizing resistance mechanisms for several of these antimicrobial groups. However, no specific, user-friendly database has been developed so far, and WGS has mostly been used to study new mechanisms, and not in the context of predicting phenotypic resistance from whole genome data.
For β-lactams, resistance is mostly mediated through the development of mosaic genes encoding altered penicillin-binding proteins (PBPs), as a result of intraspecies and interspecies DNA transfer by natural transformation [
]. Variants of PBP2x, PBP2b and PBP1a are considered most relevant to penicillin resistance in pneumococci. However, there have also been reports of non-PBP-mediated resistance, such as enrichment in branched-chain muropeptides and mutations in genes encoding other enzymes involved in the peptidoglycan synthesis [
]. Sequencing was done by the 454 platform, generating a genome assembly of 28× coverage, with 97% of the reads assembled into 78 large contigs. Comparative sequence analysis identified mutations that were confirmed by Sanger sequencing. PBP2x mutations were shown to be important, but the relationship between genotypic and phenotypic resistance was complex, with a mutated iron transport system found in several of the resistant mutants.
Later work by the same research group proposed phenotypic reconstruction by whole genome transformation of penicillin-susceptible S. pneumoniae of known genetic backgrounds with genomic DNA from resistant clinical isolates [
]. This procedure was then followed by WGS of the antimicrobial-resistant transformants. Transformants were selected using gradually increasing concentrations of penicillin. The genome sequences of the fully resistant and intermediate-step transformants were compared with the reference genome of the wild-type S. pneumoniae strains used in the experiments. The study confirmed the importance of mosaic PBP2x, PBP2b and PBP1a, but also suggested a role for PBP2a in some isolates. In another study, analysis of cefotaxime-resistant mutants revealed mosaic PBPs as well as mutations in other genes important for peptidoglycan synthesis [
]. Although these data suggest that predicting phenotypic β-lactam resistance based on WGS could be feasible in S. pneumoniae, there are so far no studies with clinical isolates to confirm this.
Macrolides also have clinically important activity against S. pneumoniae. Resistance is often mediated through RNA methylase (erm) or macrolide efflux (mef) genes, both of which are coupled to mobile genetic elements. One study was conducted in the USA with 147 pneumococcal isolates collected over an 18-year period both before and after the introduction of conjugate vaccines [
]. Genomes were then compared and mapping of macrolide resistance genes and their genetic environment was carried out. Resistance genes were detected in all isolates, but the study was in no way investigator-blinded, as all included isolates were macrolide-resistant by phenotypic methods.
Lupien et al. investigated mutants selected for resistance to tetracyclines [
]. Resistance to tetracycline in bacteria occurs through enzymatic inactivation or, more often, by active efflux (via intrinsic or acquired pumps) or by ribosome protection. Resistance to tetracyclines in pneumococci is very common, and most often mediated by tet genes, which are found on mobile genetic elements. Lupien et al. used WGS to investigate not only genomic DNA, but also RNA sequencing libraries depleted of rRNA. RNA expression was compared in parent strains and mutants, identifying differentially expressed genes. Quantitative RT-PCR was used to confirm over-expression of some of the genes identified by comparison of sequenced mRNA in mutants and wild-type strains. Gene ontology classification of genes whose expression is significantly altered in S. pneumoniae therefore seems to be a feasible way of studying new chromosomal resistance mechanisms, although this approach has not been used on clinical isolates. Finally, whole genomic DNA transformation combined with WGS has also been used to study isolates with resistance to ciprofloxacin [
]. In addition to identifying efflux (using quantitative RT-PCR) and quinolone-resistance-determining region mutations, the methodology could also point to the potential role of mutations in drug transporters and redox enzymes in ciprofloxacin resistance.
A number of mechanistic studies, including whole genome transformation, have been carried out with laboratory mutants. These studies have shed light on a number of putative new mechanisms, but there is at present a lack of studies of the utility of WGS for predicting phenotypic resistance to agents used in the treatment of S. pneumoniae.
Gonorrhoea is a global public health concern with the World Health Organization estimating 106 million cases every year [
]. However, treatment failures with cephalosporin monotherapy have recently been observed in a number of countries in cases where the cefixime MICs for the infecting gonococci were as low as 0.032 mg/L [
Numerous genetic mechanisms exist in N. gonorrhoeae for the development of elevated MICs to the extended-spectrum cephalosporins (ceftriaxone and cefixime). Alterations in penA, which encodes PBP2, have been described either through amino acid alterations (A501, G542, P551) or through the acquisition of a penA mosaic allele, which contains segments of penA from non-gonococcal Neisseria species [
]. Upregulation of the MtrCDE efflux pump via a deletion in the promoter at –35 (A-del) or alterations of the MtrR repressor protein at positions G45D and A39T have also been associated with decreased susceptibility to the extended-spectrum cephalosporins [
]. A third mechanism for increased MICs to extended-spectrum cephalosporins involves alterations in the PorB1b porin at amino acid positions G120 and A121. These permeability changes may reduce entry of extended-spectrum cephalosporins into the cell leading to reduced susceptibility [
Only a few published studies have used WGS to examine the phenotypic and genotypic antimicrobial resistance patterns observed in N. gonorrhoeae. The first such study examined the genomes of 236 isolates of N. gonorrhoeae collected in the USA from 2009 to 2010, and included 118 isolates with decreased susceptibility to cefixime (MIC ≥0.25 mg/L) [
]. The mosaic penA XXXIV allele once again had the best positive predictive value, with this locus detected in six of seven cefixime-resistant isolates. Mutations in the mtrR/mtrCDE operon promoter region and penB gene did not have such a strong predictive value, being found in only two of seven and three of seven cefixime-resistant isolates, respectively. A third study applied WGS to 169 Canadian isolates of N. gonorrhoeae with various antimicrobial resistance patterns [
]. There were 67 isolates with ceftriaxone MICs ranging from 0.125 to 2 mg/L. Of these, 40 (59.7%) harboured the penA mosaic, and all but one isolate had either porB mutations or the mtrR promoter mutations. Of the remaining 27 isolates without the penA mosaic, all had porB mutations and only a single isolate did not contain a mutation in the mtrR promoter. However, when isolates with low MICs of ceftriaxone (<0.032 mg/L; n = 65) were examined, one isolate (1.5%) was found to have the penA mosaic, 33 (50.7%) isolates contained mtrR promoter mutations, and five isolates also contained mutations in porB.
A second highly clinically significant phenotype that has been examined for N. gonorrhoeae is azithromycin resistance. The rates of azithromycin resistance in N. gonorrhoeae have been increasing in recent years in many countries, and the emergence of high-level resistance (MIC ≥256 mg/L) has been reported from the UK, Argentina, Canada and the USA [
]. The genetic mechanisms of resistance to azithromycin in N. gonorrhoeae include: accumulated changes in the four different alleles of the 23S rRNA genes; the presence of a 23S rRNA methylase encoded by erm(A), erm(B), erm(C) or erm(F); mutations in rplD and rplV; as well as the penB and mtr operon genes described above for cephalosporin resistance [
Although reports describing WGS-based detection of these mechanisms are limited, there have been attempts to compare the azithromycin resistance phenotype with the WGS genotype and these are discussed below.
Problems found or anticipated—gaps in the knowledge base
Ezewudo et al. examined two isolates that were resistant to azithromycin and found that one isolate contained the 23S rRNA mutation whereas the other did not contain any of the known mutations examined [
]. In a second study, involving the WGS analysis of 213 Canadian azithromycin-resistant isolates, 23S rRNA mutations A2045G and/or C2597T, disruptions in the mtrR promoter, or the presence of erm(C) were strongly associated with phenotypic resistance [
]. Seventy N. gonorrhoeae isolates contained only the mtrR –35 deletion, and 21 of these were susceptible to azithromycin, suggesting that other potential but unknown mechanisms conferring resistance may exist.
Although there is strong association with penA mosaic alleles and reduced susceptibility to extended-spectrum cephalosporins, caution is required with this data set as it relates to predicting the extended-spectrum cephalosporin MIC phenotype. The study by Demczuk et al. suggested that reduced susceptibility to ceftriaxone remains complex, involving additional genetic markers [
Inferring resistance to extended-spectrum cephalosporins and azithromycin in N. gonorrhoeae is possible with a high probability if certain genetic markers are detected in WGS data. However, the elevated MICs for some isolates result from combinations of multiple genetic changes, and further mechanisms of resistance have yet to be elucidated. Hence, predicting resistance to these antimicrobial agents can be problematic. Additional studies are required before WGS can be advocated for routine use to predict resistance to these antimicrobial agents.
Mycobacterium tuberculosis complex
Tuberculosis (TB) is caused by members of the Mycobacterium tuberculosis complex (MTBC) and, more rarely, by Mycobacterium canettii [
]. MTBC is monomorphic and strictly clonal and antimicrobial resistance is therefore only caused by chromosomal changes. These are single nucleotide polymorphisms (SNPs) in the vast majority of cases, but small in-frame insertions/deletions (indels) in essential resistance genes and large indels in non-essential genes are also possible [
]. Current molecular AST assays only interrogate the most frequent mutations conferring resistance to a limited number of drugs. In theory, this limitation can be overcome by WGS. In practice, however, routine WGS of all isolates from TB cases is unlikely to be cost-effective if performed to predict antimicrobial susceptibility alone, despite the decreasing cost of WGS [
]. Instead, the main driver for the introduction of WGS for all TB cases will be the desire to replace traditional typing techniques by the ultimate resolution provided by WGS to improve outbreak investigations [
]. Moreover, WGS from the initial liquid culture can replace current techniques for pathogen identification (WGS directly from a clinical sample is technically possible, but less reliable and prohibitively expensive for clinical practice at the moment [
]). Nevertheless, major gaps in this area remain, as discussed below. Several tools have been developed to automate WGS data analysis, although most do not meet clinical standards because they do not provide the necessary record keeping capabilities, have not been evaluated extensively, and often there are no plans to accredit them [CASTB (the comprehensive analysis server for the Mycobacterium tuberculosis complex): a publicly accessible web server for epidemiological analyses, drug-resistance prediction and phylogenetic comparison of clinical isolates].
Problems found/anticipated—gaps in the knowledge base
As discussed briefly in the section above on ‘Categories of systematic errors in WGS predictions of AMR’, three main challenges limit the utility of genotypic AST compared with phenotypic alternatives.
Systematic errors due to inadequate limit of detection of WGS. AST for TB is usually done on a significant fraction of the primary culture, as opposed to just one to three colonies from a primary agar plate, which is the approach taken for the vast majority of other clinically relevant bacterial pathogens. Resistance is deemed clinically significant if resistant organisms are present at or above a critical proportion, which is set at 10% for pyrazinamide and at 1% for the remaining antimicrobial agents, and reference standard phenotypic AST (i.e. the proportion method) is calibrated to detect resistance at this limit [
]. The limit of detection of traditional genotypic AST methods is poorer and depends on the assay and specific mutation, which can result in systematic false-negative results for strains with low-level hetero-resistance [
]. The magnitude of this source of error depends on several factors, including the frequency of mixed infections with unrelated strains that have different susceptibilities and the proportion of resistance that is transmitted versus resistance that evolves during treatment [
]. In practice, these factors vary between patient groups, geographic settings and antimicrobial agents. Moreover, the precise mechanism of resistance is relevant in this context. Low-level hetero-resistance SNPs can be identified by increasing the sequencing coverage, although this makes WGS prohibitively expensive in a clinical context at the moment [
]. This strategy is not an option for hetero-resistance indels, particularly large ones, because of the limited read lengths of the most commonly used platforms for clinical sequencing, coupled with the fact that most analysis algorithms are not optimized to identify indels [
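The relationship between sequencing depth and the limit of detection for a minority (hetero-resistant) SNP can be illustrated with a simple binomial model. The sketch below is illustrative only: it assumes that reads sample the bacterial population independently, it ignores sequencing error, and it uses a hypothetical calling rule (`min_reads` supporting reads) that is not taken from any validated pipeline.

```python
from math import comb


def detection_probability(depth, variant_fraction, min_reads):
    """Probability that at least `min_reads` of `depth` reads carry the variant,
    assuming each read independently carries it with probability `variant_fraction`."""
    p_below_threshold = sum(
        comb(depth, k)
        * variant_fraction ** k
        * (1 - variant_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below_threshold


def depth_for_detection(variant_fraction, min_reads, target=0.95, max_depth=100_000):
    """Smallest depth at which the detection probability reaches `target`,
    or None if `max_depth` is not enough."""
    for depth in range(min_reads, max_depth + 1):
        if detection_probability(depth, variant_fraction, min_reads) >= target:
            return depth
    return None


# Critical proportions named in the text: 10% (pyrazinamide) and 1% (other agents).
# The requirement of 5 supporting reads is a hypothetical calling rule.
for fraction in (0.10, 0.01):
    print(fraction, depth_for_detection(fraction, min_reads=5))
```

Under these assumptions the required depth rises steeply as the critical proportion falls from 10% to 1%, which is the cost problem described above.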
Systematic errors due to poorly defined breakpoints for phenotypic AST used as the reference standard for the validation of WGS-based AST. The clinical breakpoints, known as critical concentrations (CCs) in the TB field, are currently defined by the CLSI and the WHO [
]. Clinical breakpoints should be defined by committees based on representative MIC distributions, pharmacokinetic–pharmacodynamic data and, ideally, clinical outcome data, which, for a variety of reasons, are difficult to obtain for TB agents [
]. In practice, however, the evidence used to set the current CCs is not clear and emerging data from systematic MIC testing and pharmacokinetic–pharmacodynamic studies indicate that the CCs for some agents need to be revised [
Incomplete understanding of the genotypic basis of phenotypic resistance. The Bill & Melinda Gates Foundation has funded the Foundation for Innovative New Diagnostics and the Critical Path to TB Drug Regimens to create a clinical grade database, akin to the HIV Stanford resistance database, to enable the interpretation of TB WGS data for AST [
]. As part of this effort, the Foundation for Innovative New Diagnostics (FIND) and the Critical Path to TB Drug Regimens will, together with the World Health Organization, New Diagnostic Working Group of the Stop TB Partnership, the USA CDC and the National Institute of Allergy and Infectious Diseases, collect and analyse WGS with associated phenotypic AST results for tens of thousands of isolates to gain sufficient confidence in the association between particular mutations and resistance, as even the largest WGS studies published to date have been underpowered and were not designed to achieve this goal [
The complexity of this task depends on whether a resistance gene is essential or non-essential. In the former case, only a limited spectrum of resistance mutations is possible. Consequently, the correlation between genotype and phenotype should be relatively easy to resolve, provided that methodological problems such as poorly defined CCs are addressed (e.g. there is a near perfect correlation between genotype and phenotype for rifampicin resistance and rpoB mutations, although this can depend on the medium used for phenotypic AST) [
]. The situation with non-essential genes is more complicated. For genes that are non-essential in vitro as well as in vivo, it is impossible to study the genetic basis of resistance comprehensively given that there are too many possible resistance mutations. The best example of this type of resistance gene is pncA, which is responsible for the activation of the pro-drug pyrazinamide [
]. Any loss-of-function mutation in this gene can confer resistance and a wide variety of mutations are found clinically (e.g. more than 4000 single codon changes are possible in pncA, excluding start codon changes and nonsense mutations, not all of which will cause resistance). By combining large data sets it is possible to distinguish resistance mutations from neutral polymorphisms, but novel mutations will continue to be discovered, albeit at a lower rate over time [
]. The remaining isoniazid resistance is due to a large number of rare mutations that are impossible to study in their entirety.
Some of the aforementioned challenges to introducing and validating WGS for AST of TB can be overcome over time. For example, the ability of WGS to detect hetero-resistance will improve as the cost of sequencing decreases and read lengths improve. Similarly, the ongoing re-evaluation of CCs will probably resolve some of the current systematic differences between genotype and phenotype. Moreover, the pooling of large data sets will clarify the role of rare resistance mechanisms and the level of resistance conferred by different resistance mutations or mechanisms. For example, some low-level isoniazid-resistant strains due to inhA mutations remain treatable with higher doses of the agent and the same may apply for some strains with low-level resistance to new-generation fluoroquinolones (codon 90 mutations of gyrA) [
]. However, it is impossible to study the genetic basis of antimicrobial resistance to all clinically relevant agents comprehensively because of the large number of possible resistance mutations for some agents. This means that WGS can mainly be used to rule in resistance, as opposed to rule out resistance. Nevertheless, this constitutes a significant improvement over current clinical practice, as WGS directly from the first positive culture would allow for established resistance mutations to key agents to be identified rapidly, thereby allowing treatment regimens to be adjusted within days as opposed to weeks or even months for phenotypic AST [
]. Based on these results, reference laboratories could also immediately commence phenotypic AST for all remaining relevant agents, including second-line agents (which are usually only tested if resistance to first-line agents is found, which introduces long delays). Consequently, WGS is unlikely to completely replace phenotypic AST for TB in the near future, but will result in less phenotypic testing over time and in more rapid identification of resistant isolates in many cases. However, it is likely that different countries will adopt their own policies in terms of how much phenotypic confirmation of genotypic results is required, based on the resources available and the local rates of resistance.
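The rule-in logic described above can be sketched as follows. This is a minimal illustration, not a clinical tool: the catalogue contains only two well-known example mutations, the function and variable names are hypothetical, and a curated clinical-grade database would be needed in practice. The key design point is that the absence of a catalogued mutation is never reported as susceptibility.

```python
from typing import Dict, Set

# Illustrative catalogue only (assumption): a real catalogue would be curated
# and far larger. Mutation labels are written as gene_substitution.
KNOWN_RESISTANCE_MUTATIONS: Dict[str, Set[str]] = {
    "rifampicin": {"rpoB_S450L"},
    "isoniazid": {"katG_S315T"},
}


def rule_in_report(detected_mutations: Set[str]) -> Dict[str, str]:
    """Report resistance only when a catalogued mutation is present.

    When no catalogued mutation is found, the result is left open and
    phenotypic AST is requested rather than susceptibility being inferred."""
    report = {}
    for drug, catalogue in KNOWN_RESISTANCE_MUTATIONS.items():
        hits = sorted(catalogue & detected_mutations)
        if hits:
            report[drug] = "resistance predicted (" + ", ".join(hits) + ")"
        else:
            report[drug] = "no known resistance mutation; phenotypic AST required"
    return report


print(rule_in_report({"rpoB_S450L"}))
```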
Clostridium difficile
Clostridium difficile is the leading cause of healthcare-associated diarrhoea, the severity of which may vary from mild and self-limiting symptoms to fulminant disease, including pseudomembranous colitis. Hospital outbreaks are occurring with an increasing frequency, and the most severe outbreaks have been caused by hypervirulent C. difficile ribotype 027/NAP1 strains, although other ribotypes (such as 078/NAP7&8) also seem to have the ability both to cause outbreaks and severe disease in affected individuals.
Acquired phenotypic resistance to tetracyclines, clindamycin, fluoroquinolones and rifampicin, and the corresponding resistance genes, have frequently been reported in C. difficile. Moxifloxacin resistance is used as an epidemiological marker for hypervirulent strains, and for ribotype 027 (NAP1) in particular. However, resistance to the agents that are used as primary therapeutics for C. difficile infection (i.e. vancomycin, metronidazole and fidaxomicin) is less common.
Phenotypic AST of C. difficile suffers from some drawbacks. As anaerobic conditions are required, it is costly and time-consuming and is therefore often not routinely performed in the clinical laboratory. Moreover, the correlation between in vitro susceptibility and clinical outcome in the individual patient has not been thoroughly studied. In light of this, genotypic AST using WGS appears an attractive alternative. Although resistance to fidaxomicin has been associated with mutations in genes encoding RNA polymerase (rpoB and rpoC) or in the marR homologue CD22120, the mechanisms underlying resistance to vancomycin and metronidazole are less well defined.
Some of the main lineages of C. difficile contain a vanG locus, which is expressed but does not appear to play a role in resistance to vancomycin in C. difficile [
]. To date, no clinical isolates have been identified that are resistant to vancomycin. Two laboratory-derived vancomycin-resistant isolates have been described. One had a substitution mutation in the rpoC gene and the second had two mutations, one in murG (CD2725) and the second in a locus named CD3659 [
Nitroimidazole genes (nimA–E) associated with metronidazole resistance in other anaerobic species, including several species of the Clostridium genus, have not been described in C. difficile. The exact mechanism(s) behind reduced susceptibility to metronidazole in C. difficile still remains to be determined, although there have been several reports of strains exhibiting elevated MICs. Such reports of metronidazole resistance have all observed loss of the resistant phenotype after passaging or low temperature storage [
]. There has been one reported clinical isolate, 027/NAP1 from Canada, that initially had an unstable resistance phenotype, but after serial passage in the presence of metronidazole the phenotype became stable [
]. Hence, the genetic mechanism for metronidazole resistance in C. difficile remains elusive.
Gaps in the knowledge base
To date there have not been any publications of large-scale studies comparing phenotypic with WGS-based AST for C. difficile.
AST of C. difficile using WGS could be a useful tool, both for guiding the choice of treatment of the individual patient and for epidemiological purposes. However, the knowledge gaps regarding the mechanisms underlying resistance to several of the first-line treatment options pose a great challenge. Studies comparing WGS-based approaches with phenotypic testing are needed, and future work on resistance mechanisms to frontline antimicrobials is required.
Acinetobacter baumannii and Pseudomonas aeruginosa
Among non-fermentative Gram-negative bacteria Pseudomonas aeruginosa and Acinetobacter baumannii are important pathogens due to their ability to cause a variety of opportunistic infections, persist in the hospital environment and acquire antimicrobial resistance [
]. Genomic studies have shown that both P. aeruginosa and A. baumannii are associated with high genomic diversity and gene content due to frequent transfer/acquisition of mobile genetic elements, mobilization of insertion sequence elements, insertion sequence-mediated deletions and genome-wide homologous recombination [
]. The increase in multidrug-resistant and, in particular, carbapenem-resistant P. aeruginosa and A. baumannii has resulted in infections caused by extensively drug-resistant and even pan-drug-resistant isolates with very limited or no validated therapeutic options [
In both P. aeruginosa and A. baumannii acquired resistance genes are associated with various horizontally acquired resistance elements, although the majority of acquired resistance genes exist as gene cassettes in integron structures [
]. In addition, both species, and in particular P. aeruginosa, have an extraordinary capacity for modification of endogenous genes affecting functions such as membrane permeability, efflux, expression of intrinsic β-lactamases, antimicrobial targets and regulatory genes contributing to multidrug resistance [
To date, few comprehensive studies have investigated the concordance between phenotypic AST and WGS-based resistance prediction for P. aeruginosa or A. baumannii. Kos et al. related phenotypic susceptibility data for meropenem, levofloxacin and amikacin to the genome sequences of approximately 390 clinical isolates of P. aeruginosa [
]. The sensitivity and specificity for genotypic inference of meropenem and levofloxacin resistance were 91% and 94%, respectively. In contrast, a genotypic marker for amikacin resistance was identified for only 60% of the amikacin non-susceptible isolates. In addition, 30 of 283 amikacin-susceptible isolates were found to harbour genes associated with amikacin resistance. This is in contrast to a study by Wright et al., where a strong association between amikacin resistance and the presence of aphA6 and armA genes was observed in a collection of 75 clinical isolates of A. baumannii [
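The concordance figures quoted above follow from a standard two-by-two comparison of genotype against the phenotypic reference. The sketch below shows the calculation, treating the phenotype as the reference standard and "resistance-associated gene detected" as a positive test. Only the amikacin-susceptible counts (30 of 283) come from the text; the function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of phenotypically non-susceptible isolates with a genotypic marker."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of phenotypically susceptible isolates without a genotypic marker."""
    return true_neg / (true_neg + false_pos)


# From the text: 30 of 283 amikacin-susceptible isolates carried genes
# associated with amikacin resistance (genotypic false positives).
amikacin_specificity = specificity(true_neg=283 - 30, false_pos=30)
print(f"amikacin specificity: {amikacin_specificity:.3f}")
```

On these counts the specificity for amikacin is about 89%, alongside the 60% marker detection rate reported for non-susceptible isolates.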
Although there is a lack of phenotypic–genotypic comparison studies with respect to prediction of clinical resistance, several genomic studies have been performed for epidemiological purposes and to decipher mechanisms of resistance to various agents in selected resistant isolates [
]. These studies are important to identify both intrinsic and acquired genotypic resistance determinants associated with resistance to various agents. For instance, recent investigation of isogenic colistin-susceptible and colistin-resistant isolates of both P. aeruginosa and A. baumannii from single patients revealed novel determinants associated with colistin resistance [
]. Further, the use of WGS as a tool to predict antimicrobial resistance has recently been studied using 178 A. baumannii bacterial genomes to evaluate the antimicrobial resistance gene database ARG-ANNOT and it was shown that such an approach could be used as a routine test [
Problems found or anticipated—gaps in the knowledge base
Although prediction of antimicrobial resistance based on the presence of a relatively limited number of acquired resistance genes and chromosomal resistance-associated mutations might give high sensitivity and specificity, the major challenge with respect to both P. aeruginosa and A. baumannii lies in the identification or prediction of resistance due to chromosomal alterations resulting in modification of expression levels, particularly with respect to efflux pumps, outer membrane proteins and intrinsic β-lactamases. For instance, resistance to β-lactams in A. baumannii can occur due to the insertion of elements such as ISAba1 and ISAba125 upstream of the intrinsic β-lactamase genes blaADC and blaOXA-51, increasing the expression of these genes and consequently resistance to cephalosporins and carbapenems, respectively [
]. Screening of genomes for insertion sequence elements in close association with resistance-associated genes, as well as for gene loss, will pose a significant challenge.
For P. aeruginosa the challenge is expected to be even greater because of the plethora of genes associated with intrinsic resistance; alterations in these genes or in regulatory genes can confer resistance to several agents, even from different antimicrobial classes [
]. Further, alterations to one or, mostly, several of these mechanisms might be required to achieve clinical resistance (e.g. combination of decreased porin expression, increased efflux and/or increased β-lactamase expression) [
]. The problem of altered gene expression could be addressed by gene-expression analysis using RNA sequencing. However, specific studies on P. aeruginosa indicate that correlation between expression of genes on exposure to sub-MIC concentrations of antimicrobial agents and the genes implicated in intrinsic resistance is not always clearly observed [
In general these studies showed that prediction of resistance based on the detection of known acquired resistance genes and resistance-conferring mutations in antimicrobial targets can be used to investigate the phenotype–genotype relationship. However, additional comparative studies between phenotypic and genotypic methods using representative strain collections of P. aeruginosa and A. baumannii are required to evaluate the possibility of confidently predicting antimicrobial susceptibility/resistance by WGS. Further, for both species a greater understanding of the contribution to clinical resistance of alterations in intrinsic resistance genes is required. This will require not only WGS, but also knock-out and complementation studies of deleted/mutated determinants.
The epidemiological implications of using WGS
The epidemiology of AMR is determined by the spread of the host organisms harbouring resistance genes, and the spread of the resistance genes by different routes of horizontal gene transfer.
Classical methodologies used to study the epidemiology of AMR include strain genotyping with a variety of methods with a large variation in reproducibility and discriminatory power. This includes techniques such as multilocus sequence typing (MLST), pulsed field gel electrophoresis, variable number tandem repeat, multiple-locus variable number tandem repeat analysis, amplified fragment length polymorphism and Enterobacterial repetitive intergenic consensus-based PCR [
]. Resistance genes can be identified by micro-array approaches, PCR and sequence analysis. A variety of molecular techniques are needed to identify and characterize epidemiologically relevant mobile genetic elements involved in the horizontal transfer of AMR genes, such as plasmids, conjugative transposons or genomic islands. Dedicated PCRs and sequencing are needed to identify the genetic environment of the AMR genes such as integrons and/or transposons and insertion sequences, and this is crucial to understand the epidemiological behaviour of specific AMR genes [
PCR-based replicon typing is most commonly applied for plasmid characterization in Enterobacteriaceae. Relaxase typing is more comprehensive and phylogenetically more informative, but is less discriminatory within the Enterobacteriaceae, where there are major concerns regarding resistance at present. Plasmid MLST and similar techniques, such as Double Locus Sequence Typing or restriction fragment length polymorphism, are used to sub-type plasmids. In addition, toxin/anti-toxin systems encoded on plasmids in Enterobacteriaceae may be key epidemiological determinants. As a result of its complexity, plasmid epidemiology is currently beyond the capabilities of most clinical microbiology laboratories and is labour-intensive even for the reference laboratory.
WGS opens a world of opportunities for enhanced (molecular) epidemiology of AMR because, in principle, all essential information needed to study the epidemiology of AMR will be available in the sequences obtained. WGS is particularly effective for identifying and characterizing clonal distribution of monomorphic species such as S. aureus [
]. Importantly, WGS provides high-resolution typing information, making most if not all of the traditional molecular typing approaches redundant. With its potential for an objective assessment of the gene content, such as presence or absence of resistance genes of particular public health importance, multicentre surveillance approaches would greatly benefit from the reporting of genomic resistance markers, obviating the need to rely on phenotypic AST profiles of doubtful inter-laboratory reproducibility.
However, WGS also has its weaknesses. For example, WGS is weak in managing direct repeats and insertions in plasmids and current bioinformatic cleaning often omits those from contigs. As a result, short-read WGS data can be misleading if studying plasmid-mediated outbreaks in which a broad host-range plasmid is moving freely between different species, in each of which it has a different phenotype.
To be able to analyse sequence output rapidly and identify all information needed for epidemiology as listed above, the AMR genes and plasmid types need to be determined in these sequences using genomic databases such as ResFinder or PlasmidFinder [
]. In silico arrays or PCRs are also commonly applied but short-read sequencing (e.g. as obtained with Illumina) is generally not sufficient to study the genetic environment of AMR genes.
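The basic idea of screening assembled sequences against a reference set of resistance genes can be sketched as below. This is a deliberately simplified toy: it uses exact string matching on both strands, whereas real tools rely on alignment with identity and coverage thresholds, and it is not the algorithm used by ResFinder or PlasmidFinder. The gene names and sequences are made-up placeholders.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def screen_contigs(contigs, reference_genes):
    """Return {gene name: [contig names]} for genes found on either strand.

    `contigs` and `reference_genes` map names to DNA strings."""
    hits = {}
    for gene, gene_seq in reference_genes.items():
        gene_rc = reverse_complement(gene_seq)
        found = [
            name
            for name, contig in contigs.items()
            if gene_seq in contig or gene_rc in contig
        ]
        if found:
            hits[gene] = found
    return hits


# Hypothetical toy data, far shorter than real genes or contigs.
toy_genes = {"geneA": "ATGGCCTTA", "geneB": "ATGTTTGGG"}
toy_contigs = {
    "contig1": "CCC" + "ATGGCCTTA" + "GGG",
    "contig2": "AAA" + reverse_complement("ATGTTTGGG") + "TT",
    "contig3": "ACGTACGT",
}
print(screen_contigs(toy_contigs, toy_genes))
```

A match of this kind shows only that a gene is present. It says nothing about the genetic environment of the gene, which is the limitation of short-read data noted above.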
Transfer (by transformation or conjugation) of a plasmid of interest into a ‘workhorse’ bacterium with a known genetic background, with or without subsequent plasmid enrichment during DNA extraction, will facilitate complete and correct plasmid sequence assembly. Long-read or single-molecule sequencing (e.g. by PacBio or Oxford Nanopore) may be necessary [
], either alone or as a ‘scaffold’ for high-coverage short-read data (e.g. Illumina). However, both are beyond the capabilities of most clinical microbiology laboratories, and plasmid handling is relatively labour intensive. This may still leave specialized annotation problems, although direct annotation grammars can be helpful with these [
Since different users may have different demands for WGS data, a tiered approach can be applied.
Rapid identification of a targeted set of AMR genes may provide important information at a clinical level. The output could vary from answering a specific dichotomous question (e.g. does a sample contain an ESBL-producer or an MRSA isolate) to a more complete resistance/susceptibility profile [
]. The output should be based on the bacterial species and the information required for the clinicians.
The question of the positive and negative predictive values of WGS will be important, although their usefulness will depend on the targets to be identified, their diversity and their prevalence in the gene pools [
Identification of genes and subtypes (e.g. blaCTX-M-15 versus blaCTX-M-1, or blaCTX-M-3, or mecA versus mecC) may be important for outbreak management, infection control or even phylogeographic analysis.
In silico strain typing in cultured organisms is based on:
MLST (seven or more gene targets) for evolutionary relatedness;
Genomic islands such as SCCmec, SGI1, SXT;
SNPs, insertions/deletions (indels), and large structural DNA rearrangements (e.g. for tracking outbreaks/mapping transmission chains).
In silico plasmid typing by mapping to a reference database (in silico microarray)
Sub-typing of plasmids by in silico PCR (pMLST, double locus sequence typing, replicon sequence typing (RST))
Phylogenetic analysis of the total sequence output.
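SNP-based comparison of isolates, as used for tracking outbreaks, reduces at its simplest to counting differences between aligned sequences. The sketch below assumes that the isolates have already been aligned to a common reference and are of equal length; positions with ambiguous or missing calls are ignored. The isolate names and sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ,
    counting only positions where both carry an unambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def pairwise_distances(alignment):
    """Return {(name_1, name_2): SNP distance} for every pair of isolates."""
    return {
        (i, j): snp_distance(alignment[i], alignment[j])
        for i, j in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned sequences for three isolates.
toy_alignment = {
    "isolate1": "ACGTACGTAC",
    "isolate2": "ACGTACGTAT",
    "isolate3": "ACTTACNTAT",
}
print(pairwise_distances(toy_alignment))
```

Any threshold used to interpret such distances as evidence of transmission would depend on the species and setting, and is not implied by this sketch.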
WGS approaches can be used to track markers from the chromosome (e.g. MLST), from plasmids (e.g. incompatibility markers, or post-segregational killing/toxin/antitoxin markers) and from individual genes; third-generation approaches such as PacBio or Nanopore can then act as a form of barcoding that ties these markers together in individual isolates.
The ideal method will provide sufficient depth and coverage to answer all of these questions, but will vary with the starting material: metagenomic approaches to DNA extracted directly from clinical samples will require a considerably higher number of sequencing reads than for analysis of a microorganism in pure culture.
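The difference in read requirements between a pure culture and a direct clinical sample can be made concrete with a back-of-the-envelope estimate: reads needed ≈ depth × genome size / (read length × fraction of reads derived from the target organism). The sketch below assumes uniform coverage. The depth, read length and target fraction are hypothetical illustration values, and 4.4 Mb is used as an approximate M. tuberculosis genome size.

```python
from math import ceil


def reads_required(genome_size_bp, depth, read_length_bp, target_fraction=1.0):
    """Approximate number of reads needed to reach `depth` over the target genome
    when only `target_fraction` of reads come from the target organism."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return ceil(genome_size_bp * depth / (read_length_bp * target_fraction))


GENOME_SIZE = 4_400_000  # approximate genome size in bp

pure_culture = reads_required(GENOME_SIZE, depth=30, read_length_bp=150)
# Hypothetical clinical sample in which 1% of reads derive from the pathogen.
direct_sample = reads_required(
    GENOME_SIZE, depth=30, read_length_bp=150, target_fraction=0.01
)
print(pure_culture, direct_sample)
```

With these assumed values the direct sample needs roughly one hundred times as many reads as the pure culture, in line with the inverse dependence on target fraction.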
At the time of writing the availability of reference databases for epidemiological questions remains limited both in the number of species and typing methods that are represented. Further development in this area will be crucial.
Clinical and wider impacts
The routine use of WGS in diagnostic and public health laboratories holds the promise of a revolution in the identification, typing, antimicrobial susceptibility testing and determination of pathogenicity of potential pathogens [
]. At present, at the initiation of antimicrobial chemotherapy or thereafter when definitive therapy is selected based on phenotypic AST, clinicians have no routine data provided on the likely pathogenic potential of any pathogens isolated. Future data from WGS linking pathogenicity determinants to adverse clinical outcomes for certain highly pathogenic strains may have significant impacts on chemotherapy—perhaps by identifying those at higher risk of infection-related complications, those who may require more aggressive or combination chemotherapy or prolonged intravenous courses of antimicrobial agents. Conversely, reassurance that some potential pathogens are of low pathogenicity may allow for shorter duration therapy, oral therapy, less intensive patient monitoring, fewer investigations and perhaps earlier discharge. Such approaches are, at present, almost entirely speculative but may have a greater clinical impact than the work done so far on the value of WGS in predicting phenotypic susceptibility or resistance when tested by conventional methodologies.
At present, proof-of-principle studies have been completed for WGS on common pathogens already isolated in pure cultures and hence most data related to WGS for predicting antimicrobial susceptibility assume an initial culture step. This is an obvious limitation in terms of speed of diagnosis compared with direct testing of specimen material. To date, such studies have addressed common pathogens such as E. coli and K. pneumoniae [
], whereas for other organisms, including C. difficile, little has been demonstrated. Most work remains to be done where there are significant gaps in the knowledge base regarding resistance mechanisms. However, at present, we lack a clear understanding of how antimicrobial susceptibility data can be generated from WGS in a timely way for incorporation into clinical-care pathways and what the likely clinical impacts will be. In particular, we do not fully understand the barriers or facilitators to increased clinical use, assuming technical problems can be overcome. The costs of routine delivery of WGS data to predict AST have not been balanced against potential financial savings across the patient care pathway or the clinical impacts. At present, even feasibility studies to start answering these questions have not been reported.
One obvious potential advantage of WGS in AST is the increased speed of information flow, even if at present WGS would depend on an initial culture step. Increased speed of diagnosis has been identified as a way of improving antimicrobial stewardship and patient outcomes. If WGS could be made to deliver pathogen identification and predict susceptibility for common pathogens within 8 hours of initial culture, it may offer enough impact on measures of patient outcome and use of antimicrobial agents to justify higher costs within the laboratory. However, the longer it takes for data to become available, the poorer the potential clinical impacts.
Thought needs to be given as to how WGS data will be presented to end-users. It is possible we will move from categorical reporting of susceptible, intermediate or resistant to reporting the probability of an isolate being susceptible or resistant based on pre-test probabilities (perhaps different in different hospitals or different areas within a hospital), the completeness of our genetic database for a particular pathogen and the presence or absence of resistance determinants as determined by WGS. We may even be able to give measures of confidence to these predictions. Such approaches will require significant staff education and evaluation as it is not clear how prescribers would respond to such data.
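One way to picture probabilistic reporting is as a Bayesian update of a local pre-test probability by the WGS result. The sketch below is illustrative only: it treats detection of a resistance determinant as a binary test with an assumed sensitivity and specificity, and all numerical values are hypothetical rather than drawn from any study.

```python
def posterior_resistance(prior, sensitivity, specificity, marker_detected):
    """Probability that an isolate is resistant given the WGS result.

    `prior` is the pre-test probability of resistance (e.g. local prevalence);
    `sensitivity` and `specificity` describe the genotypic prediction relative
    to the phenotypic reference."""
    if marker_detected:
        numerator = sensitivity * prior
        denominator = numerator + (1 - specificity) * (1 - prior)
    else:
        numerator = (1 - sensitivity) * prior
        denominator = numerator + specificity * (1 - prior)
    return numerator / denominator


# Hypothetical values: the same WGS result in two settings with different
# pre-test probabilities of resistance.
for prior in (0.02, 0.30):
    p = posterior_resistance(prior, sensitivity=0.90, specificity=0.95,
                             marker_detected=True)
    print(f"prior {prior:.2f} -> probability resistant {p:.2f}")
```

The same genotypic result can therefore translate into quite different reported probabilities in different hospitals or wards, which is part of why education of prescribers would be needed.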
As >95% of pathogen identification and susceptibility testing in present clinical microbiology laboratories is based on around 20 bacterial species and a limited number of antimicrobial agents it may not be necessary to cover all possibilities to provide useful data rapidly, but rather we might focus on a limited number of agents for each pathogen initially and let more detailed data become available later.
At present, the use of WGS outside reference or research laboratories to determine antimicrobial susceptibility has not been tested. Preliminary data are promising and feasibility studies need to be conducted in a more clinical environment. It is likely that WGS will first be used as a tool to predict antimicrobial susceptibility in public-health microbiology laboratories in the coming years with subsequent use closer to the patient to predict susceptibility in pathogens such as M. tuberculosis before its wider application in diagnostic laboratories.
Conclusions and recommendations
This EUCAST Subcommittee report on the role of WGS in AST of bacteria has reviewed the state of the art as a first approach. It refers to almost 200 published works and describes where we are at the time of writing (late 2015 to early 2016). Despite the volume of published literature already available we conclude that, at present, there are insufficient data to present a definitive document on the topic. Instead, this report is intended to form a baseline discussion document, which can be revisited and updated at regular intervals (probably every 18–24 months). This will be important as sequencing technologies become more affordable and more widely applied. This first version will provide the baseline against which to compare and assess future progress in the area.
We are aware of many ongoing, as yet unpublished, studies of phenotypic–genotypic AST concordance and it is certain that the amount of available data will increase in the near future. However, the quality of those data needs to improve and to be assured via more rigorous and ‘standardized’ approaches to data analysis. Bacterial AST is a fundamental activity that can be undertaken in any microbiology laboratory, but it is important to appreciate that the MIC or zone diameter measured reflects more than gene presence/absence; these values reflect multiple and complex interplays between different systems including cellular permeability, influx/efflux, target availability and binding, as well as enzymatic expression levels and activities. So there are many challenges in gathering and assessing evidence to consider whether AST can be replaced by a genotypic method such as WGS, which does not assess bacterial growth in the presence of antimicrobial agents.
At the present time, WGS-based analyses cannot yield an inferred MIC or zone diameter. Hence the potential utility of WGS-based approaches for AST must be considered at the level of detecting gene presence or absence. We will need more powerful bioinformatics tools in future if we seek to make inferences about antimicrobial susceptibility based on combinations of multiple different genes or contributory mutations. Furthermore, WGS does not directly provide information on levels of gene expression. Although other technologies can do so, e.g. RNA sequencing, it seems unlikely that these will find a place in the clinical laboratory before WGS.
It is our recommendation that the primary AST comparator for WGS-based prediction should be the ECOFF, wherever possible, to assess WGS-inferred ‘antibiograms’ (based on gene positivity) against phenotypically defined categories of wild-type or non-wild-type. Adoption of ECOFFs as the primary comparator would make meta-analysis across different publications simpler, as comparison of data would not be subject to confounding factors such as differences in breakpoints from different organizations. Nevertheless, demonstrating concordance with interpretation based on clinical breakpoints will ultimately be necessary for the use of WGS-based testing to guide clinical decision making, but this will probably be more difficult to demonstrate for all organisms and antimicrobial agents. For this reason, assessing WGS-derived data against clinical breakpoints represents a tougher challenge, but should be encouraged as a secondary comparator and should ideally be done using the same data sets that are used for ECOFF-based assessments.
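The two-comparator assessment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated method: the `Isolate` fields, the cut-off values and the six-isolate panel are all hypothetical, and the WGS prediction is reduced to simple presence or absence of any known resistance locus.

```python
from dataclasses import dataclass


@dataclass
class Isolate:
    mic_mg_per_l: float           # phenotypic MIC from reference AST
    resistance_locus_found: bool  # WGS result: any known resistance gene/mutation detected


def exceeds_cutoff(mic_mg_per_l: float, cutoff_mg_per_l: float) -> bool:
    """True when the MIC lies above the cut-off (non-wild-type for an ECOFF,
    resistant for a clinical breakpoint)."""
    return mic_mg_per_l > cutoff_mg_per_l


def concordance(isolates, cutoff_mg_per_l: float) -> float:
    """Fraction of isolates whose WGS prediction matches the phenotypic category."""
    matches = sum(
        iso.resistance_locus_found == exceeds_cutoff(iso.mic_mg_per_l, cutoff_mg_per_l)
        for iso in isolates
    )
    return matches / len(isolates)


# Hypothetical cut-offs and data, for illustration only.
ECOFF = 0.5                # wild-type isolates have MIC <= 0.5 mg/L
CLINICAL_BREAKPOINT = 2.0  # resistant if MIC > 2 mg/L

panel = [
    Isolate(0.25, False),
    Isolate(0.5, False),
    Isolate(1.0, True),   # non-wild-type, but below the clinical breakpoint
    Isolate(4.0, True),
    Isolate(8.0, True),
    Isolate(4.0, False),  # elevated MIC with no known locus: a knowledge gap
]

# The same data set is scored against both comparators.
primary = concordance(panel, ECOFF)                  # 5 of 6 isolates agree
secondary = concordance(panel, CLINICAL_BREAKPOINT)  # 4 of 6 isolates agree
print(f"ECOFF concordance: {primary:.2f}")
print(f"Clinical breakpoint concordance: {secondary:.2f}")
```

In this toy panel the isolate with an acquired locus and an MIC between the two cut-offs agrees with the ECOFF-based category but not with the breakpoint-based one, which is one way the secondary comparator can be the tougher test.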
The challenges of harmonizing antimicrobial susceptibility breakpoints across multiple parallel and independent national and international systems have been ongoing for >50 years, and we still lack a globally harmonized system. When considering the introduction of WGS-based approaches, we need to balance the needs of clinical laboratories, where standardized and validated procedures are required to meet accreditation standards, with the need for intellectual and innovative academic challenge, which drives many of those who generate bioinformatics tools. We recommend that there should be international agreement on the most appropriate and effective principles to facilitate early standardization and harmonization of analytical approaches and interpretative criteria for WGS-based predictive AST. However, we also recommend pragmatism at the present time: bioinformatics algorithms will vary, and it is unrealistic to suggest a single analytical approach. We recommend instead that different bioinformatics tools should perform to minimum standards and should be calibrated and equivalent in terms of the results generated.
To facilitate such comparisons, we recommend that performance of different bioinformatics tools should be calibrated against a single database of all known resistance genes/mutations. There have been efforts and investments in this direction, but multiple solutions exist and are used, thereby confounding comparisons. Establishing a single database will ensure that there is parity of analysis and will facilitate measurement of comparative accuracies across different systems. Such a global reference database would need to be updated regularly, and must have strictly curated minimum standards for the inclusion of new resistance genes and mutations. An important function of a centralized database would be to control resistance gene nomenclature (since poor annotation can confound current analyses, where multiple ‘hits’ from searches may reflect inconsistent annotation of the same gene). The inclusion criteria for any new determinant would probably need to be set higher than those accepted for publication because strong evidence of causal association would maximize the predictive values of inferring AST phenotype from genotype.
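The nomenclature problem noted above, where several 'hits' reflect inconsistent annotation of the same gene, can be illustrated with a small sketch. The alias table below is an assumption made for this example and is not drawn from any curated resource; a real reference database would define the canonical names and the evidence required for inclusion.

```python
# Illustrative alias table mapping annotation strings to one canonical locus name.
# Entries are examples chosen for this sketch, not a curated resource.
CANONICAL_NAME = {
    "meca": "mecA",
    "pbp2a": "mecA",
    "penicillin-binding protein 2a": "mecA",
    "blactx-m-15": "blaCTX-M-15",
    "ctx-m-15": "blaCTX-M-15",
}


def normalize_hits(raw_hits):
    """Collapse search hits that differ only in annotation into canonical loci.

    Returns a tuple: (sorted canonical names, sorted unrecognized annotations).
    """
    known, unknown = set(), set()
    for hit in raw_hits:
        key = hit.strip().lower()
        if key in CANONICAL_NAME:
            known.add(CANONICAL_NAME[key])
        else:
            unknown.add(hit.strip())
    return sorted(known), sorted(unknown)


# Five raw hits that describe only two distinct resistance loci.
hits = ["mecA", "PBP2a", "CTX-M-15", "blaCTX-M-15 ", "hypothetical protein"]
loci, unresolved = normalize_hits(hits)
print(loci)
print(unresolved)
```

Annotations that cannot be resolved are reported separately rather than discarded, so that they can be reviewed against the database's inclusion criteria.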
The organisms considered in this report can be divided into three main groups in terms of the available evidence for predicting AMR using WGS. First, at present most is known for S. aureus and M. tuberculosis and it is apparent that there is now momentum behind their deeper investigation. For a second group of organisms, including the Enterobacteriaceae (including Salmonella), initial studies have shown promise, but serve to highlight, through poor concordance, where gaps exist in the knowledge base about resistance mechanisms either in some genera or species or for some agents. For a third group of organisms, including S. pneumoniae, N. gonorrhoeae, P. aeruginosa, A. baumannii and C. difficile it is apparent that more studies are required before we can even define the extent of the gaps in the knowledge base accurately. More focused study and additional funding resources are needed as a priority to improve knowledge for the second and third of these groups.
Expansion of the knowledge base, to better define resistance determinants across all organisms, is a critical priority if WGS is to be considered seriously as a rival to phenotypic AST. It seems likely that WGS may replace phenotypic testing ‘soon’ for surveillance purposes, where a low error rate has little impact. This would need to be phased to reflect the evidence base for the organism–agent combination being reported, and would require surveillance schemes to expand their inclusion criteria to accept WGS-inferred data. In reference laboratories, WGS-based AST may also be adopted ‘soon’, unless the reason for investigation relates to individual patient management, is for agents or species shown to have poor genotypic–phenotypic concordance, or is to assess the activity of novel agents.
Available published evidence does not currently support use of WGS-inferred susceptibility to guide clinical decision making. Such a paradigm shift would require large-scale education and behavioural change among microbiologists and prescribers. Gene (or mutation) absence cannot always reliably predict susceptibility, so robust evidence will be needed to show that the potential of genotypic tests for very major errors does not adversely impact on treatment outcomes. It seems likely that this may first be considered for M. tuberculosis, where the speed of WGS-generated results offers an advantage over traditional AST methods. However, even if the evidence can be generated and expectations changed, for most bacteria and in most countries the current cost and speed of inferring antimicrobial susceptibility from WGS data remain prohibitive to wide adoption in routine clinical laboratories. Nevertheless, as knowledge of polymorphisms associated with antimicrobial resistance advances, and as technology, data sharing and training become more widely available in high-burden countries, sequencing technologies will become more attractive and cost-effective as the cost of goods comes down.
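To make the concern about very major errors concrete, the sketch below counts them for a hypothetical panel. It assumes the conventional definitions (a very major error is a phenotypically resistant isolate predicted susceptible; a major error is a phenotypically susceptible isolate predicted resistant). The panel sizes and the function name `error_rates` are invented for illustration.

```python
def error_rates(pairs):
    """Summarize prediction errors.

    pairs: iterable of (phenotype_resistant, wgs_predicts_resistant) booleans.
    Very major error (VME): phenotypically resistant, predicted susceptible.
    Major error (ME): phenotypically susceptible, predicted resistant.
    """
    pairs = list(pairs)
    resistant = [p for p in pairs if p[0]]
    susceptible = [p for p in pairs if not p[0]]
    vme = sum(1 for _, predicted in resistant if not predicted)
    me = sum(1 for _, predicted in susceptible if predicted)
    predicted_susceptible = sum(1 for _, predicted in pairs if not predicted)
    return {
        "vme_rate": vme / len(resistant) if resistant else None,
        "me_rate": me / len(susceptible) if susceptible else None,
        # Negative predictive value: how often "no locus found" means susceptible.
        "npv": (predicted_susceptible - vme) / predicted_susceptible
        if predicted_susceptible
        else None,
    }


# Hypothetical panel: 20 resistant isolates (2 with no locus detected)
# and 80 susceptible isolates (4 with a resistance locus detected).
panel = (
    [(True, True)] * 18
    + [(True, False)] * 2
    + [(False, False)] * 76
    + [(False, True)] * 4
)
rates = error_rates(panel)
print(rates)
```

The negative predictive value is the figure most relevant to the statement that gene absence cannot always reliably predict susceptibility, and it depends on the prevalence of resistance in the panel as well as on the completeness of the knowledge base.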
Finally, there may even be scope for WGS-based approaches to be used to better understand and improve some areas of phenotypic AST. For some agents, there are technical challenges in measuring susceptibility in any way that meaningfully correlates with outcome. If WGS data could be correlated directly with outcome, then this revolutionary tool might aid development of improved criteria for interpreting phenotypic data.
During the 2015 ECCMID meeting in Copenhagen, Neil Woodford was approached (separately) by Derek Brown and Gunnar Kahlmeter, who were both very keen for him to put together a group to consider how well WGS can predict antimicrobial susceptibility patterns and how this game-changing technology could impact on clinical microbiology, now and in the future. This coordinated and two-pronged approach clearly achieved its goal, and shortly afterwards this EUCAST Subcommittee came into being.
Over the past year the team has ‘met’ virtually and this report marks its first output. We quickly agreed that there are too few data to present a definitive document on the topic, but that it would be necessary to review the state of the art as a first approach. This document is therefore presented as a baseline and as a discussion document, and should be considered as such. It marks where we are now, and we present it in the knowledge that it will require updating, probably regularly, as sequencing technologies become more affordable and more widely applied, as the analysis of WGS data becomes more rigorous and standardized, and as the quantity and quality of evidence for phenotypic–genotypic concordance (or lack thereof) relating to antimicrobial susceptibility improve.
Neil Woodford would like to thank all members of the Subcommittee for their efforts over the last year. It has been a pleasure working with them all. Special thanks are owed to Matt Ellington and Oskar Ekelund, who compiled the multiple contributions to the report.
Thanks also to those who contributed comments during the consultation phase, many of which served to improve our arguments and the report.
The Subcommittee thanks Thomas Schön for editing and assistance with the TB evidence report. He is a member of the EUCAST Subcommittee on Antimycobacterial Susceptibility Testing and is supported by research grants from the Swedish Heart and Lung Foundation and Marianne and Marcus Wallenberg Foundation.
Thanks also to Mrs Sushma Udani for providing administrative support for the Subcommittee.
CUK is a consultant for the Foundation for Innovative New Diagnostics and was technical advisor for the Tuberculosis Guideline Development Group of the World Health Organization. The Bill & Melinda Gates Foundation and Janssen Pharmaceutica covered his travel and accommodation to present at meetings. The European Society of Mycobacteriology awarded CUK the Gertrud Meissner Award, which is sponsored by Hain Lifescience. CUK collaborated with Illumina Inc. on a number of scientific projects.
MJE, KLH, MD and NW are part of Public Health England's AMRHAI Reference Unit which has received financial support for conference attendance, lectures, research projects or contracted evaluations from numerous sources, including: Accelerate Diagnostics, Achaogen Inc, Allecra Therapeutics, Amplex, AstraZeneca UK Ltd, Basilea Pharmaceutica, Becton Dickinson Diagnostics, BioMérieux, Bio-Rad Laboratories, The BSAC, Cepheid, Check-Points B.V., Cubist Pharmaceuticals, Department of Health, Enigma Diagnostics, Food Standards Agency, GlaxoSmithKline Services Ltd, Henry Stewart Talks, IHMA Ltd, Kalidex Pharmaceuticals, Melinta Therapeutics, Merck Sharpe & Dohme Corp, Meiji Seika Pharma Co, Mobidiag, Momentum Biosciences Ltd, Nordic Pharma Ltd, Norgine Pharmaceuticals, Rempex Pharmaceuticals Ltd, Roche, Rokitan Ltd, Smith & Nephew UK Ltd, Trius Therapeutics, VenatoRx Pharmaceuticals and Wockhardt Ltd.
CG received conference support and had research collaboration with AB Biodisk (later purchased by bioMérieux); research collaboration with bioMérieux, Checkpoints, Q-linea; received speaker’s honoraria from BioRad, Liofilchem, Pfizer, Cepheid, Cubist; and carried out consultancy work for the bioinformatics company 1928 Diagnostics.
JI has received travel support and honoraria from Astra Zeneca, MSD, and Pfizer in the last 5 years for advisory board attendance and lectures. All current funding is from the National Health and Medical Research Council of Australia (NHMRC) grant 1001021.
RC has participated in educational programmes from Cepheid, Roche, AstraZeneca, MSD and Novartis and has received financial research support from Amplex, AstraZeneca, Cepheid, Cubist Pharmaceuticals, Ferrer International Laboratories and MSD.
GK serves as consultant for Oxoid Ltd on technical matters related to antibiotic disc quality, media for AST performance and Quality Control.
DM leads the Dutch National Reference Laboratory for Antimicrobial Resistance in Animals at Central Veterinary Institute in Lelystad. Financial support is provided by the Ministry of Economic Affairs and the EU, and he coordinates several public-private partnerships in which 50% of the funds come from the public domain and 50% from animal-producing organizations including Aviagen, Vion Food Group and van Drie Group. Travel and accommodation are paid solely from public funds.
AMacG, JMR, MM, HH, OS, MTGH, OE, TN, HG, FMA and TP have stated explicitly that there are no conflicts of interest in connection with this article.
International Organization for Standardization. Clinical laboratory testing and in vitro diagnostic test systems. Susceptibility testing of infectious agents and evaluation of performance of antimicrobial susceptibility test devices. Part 1: Reference method for testing the in vitro activity of antimicrobial agents against rapidly growing aerobic bacteria involved in infectious diseases. International Standard 20776-1. ISO,