Molecular diagnostic and genetic characterization of highly pathogenic viruses: application during Crimean-Congo haemorrhagic fever virus outbreaks in Eastern Europe and the Middle East.

Several haemorrhagic fevers are caused by highly pathogenic viruses that must be handled in Biosafety level 4 (BSL-4) containment. These zoonotic infections have an important impact on public health, and the development of a rapid and differential diagnosis in case of outbreak in risk areas represents a critical priority. We have demonstrated the potential of a DNA resequencing microarray (PathogenID v2.0) for this purpose. The microarray was first validated in vitro using supernatants of cells infected with prototype strains from five different families of BSL-4 viruses (families Arenaviridae, Bunyaviridae, Filoviridae, Flaviviridae and Paramyxoviridae). RNA was amplified based on isothermal amplification by Phi29 polymerase before hybridization. We were able to detect and characterize Nipah virus and Crimean-Congo haemorrhagic fever virus (CCHFV) in the brains of experimentally infected animals. CCHFV was finally used as a paradigm for epidemics because of recent outbreaks in Turkey, Kosovo and Iran. Viral variants present in human sera were characterized by BLASTN analysis. Sensitivity was estimated to be 10^5-10^6 PFU/mL of hybridized cDNA. Detection specificity was limited to viral sequences having ~13-14% of global divergence with the tiled sequence, or stretches of ~20 identical nucleotides. These results highlight the benefits of using the PathogenID v2.0 resequencing microarray to characterize geographical variants in the follow-up of haemorrhagic fever epidemics; to manage patients and protect communities; and in cases of bioterrorism.


Introduction
Viruses recognized as highly pathogenic for humans must be manipulated in a Biosafety level 4 (BSL-4) laboratory. They include viruses associated with encephalitis and respiratory infections, such as recently emerged members of the genus Henipavirus, family Paramyxoviridae and haemorrhagic fever viruses in the families Arenaviridae, Filoviridae, Bunyaviridae and Flaviviridae [1]. Infections with these viruses lead to a wide spectrum of clinical outcomes, from flu-like and malaria-like symptoms to vascular complications that may cause death [1,2]. Most members of the genus Flavivirus (family Flaviviridae) are arthropod-borne, as are those of the family Bunyaviridae, except for the genus Hantavirus which is rodent-borne or insectivore-borne [2,3]. Viruses of the family Arenaviridae are also rodent-borne [2]. Those of the genus Henipavirus have bat reservoirs but may also infect humans through contact with infected horses or pigs [4]. Recent data indicate that bats are also probable reservoirs and vectors for viruses of the family Filoviridae [5,6]. Interhuman transmission and nosocomial infections also contribute to spreading the diseases [2,7].
Development of vaccines to prevent infection by these emerging zoonotic viruses is limited and only ribavirin has been used as an efficacious treatment for several of them [1], so early, rapid and specific diagnosis is critically important for disease control. At-risk areas should possess the necessary facilities and equipment, as well as rapid tests, to be prepared for public health emergencies [2-8]. Accurate diagnoses have traditionally relied on specific serological and virological analyses, which include western blotting, ELISA, immunofluorescence staining, genome detection by PCR and quantitative PCR, and ultimately, virus isolation [9-13]. Molecular methods are rapid and specific but are limited by the high genetic variability among different viral strains. To overcome this limitation, macroarray and microarray technology platforms have been developed to detect and identify a large number of pathogens in a single assay [14-20]. Long oligonucleotide probes have been used previously for the detection of viruses associated with haemorrhagic fevers [16]. Low-density macroarrays allowed different variants of Crimean-Congo haemorrhagic fever virus (CCHFV) to be rapidly detected [17], but were complicated by a requisite reverse transcription (RT-) PCR step. High-density resequencing microarrays not only detect pathogens but also determine nucleic acid sequences to single base-pair resolution. A large panel of viral genome sequences from different geographical origins can be characterized in a single test. The high-density resequencing DNA microarray, PATHOGENID v2.0, has been shown to be useful for rapid diagnosis during emerging viral infections, such as the 2009 influenza pandemic [18], and was useful for genotyping members of the family Rhabdoviridae [19].
Here, we used the PATHOGENID v2.0 microarray to detect highly pathogenic viruses. We first validated the microarray with in vitro samples by analysing supernatants from cells infected by prototype virus strains and variants belonging to five families of BSL-4 agents (Arenaviridae, Bunyaviridae, Filoviridae, Flaviviridae, Paramyxoviridae). We then evaluated its performance during a health emergency situation by testing human sera from CCHFV outbreaks in Turkey (2009), Kosovo (2001) and Iran (2009). CCHFV belongs to the genus Nairovirus, family Bunyaviridae and has the largest geographic distribution among haemorrhagic fever viruses [21,22]. Zoonotic infection occurs either directly through its vectors, which are various tick species from the genus Hyalomma, or indirectly through contact with infected livestock. Hospital environments are also vulnerable to inter-human transmissions [23]. CCHFV infection is associated with several clinical outcomes, some of which can become life threatening [22]. CCHFV outbreaks or sporadic cases have occurred in Mauritania [24], Iran [10], Turkey [25], Kosovo [26] and Sudan [23].

Ethics statement
This work includes a retrospective study on 12 human sera from clinical specimens submitted to France National-WHO-OIE Reference Centres for diagnosis during CCHF epidemics in Kosovo, Turkey and Iran.
The collection of the remaining samples to be used for scientific purposes was declared to and approved by the Comité de Protection des Personnes, Ile-de-France I and the French Research Ministry (no. DC 2011-1471) according to French regulations.
Animal experimental methods were approved by the Région Rhône-Alpes Ethics Committee (France).
(Table 1 and Table 2) were cultured and isolated in permissive Vero-E6 cells as previously described [11,27].

Animal biopsies
One non-human primate, a New World squirrel monkey (Saimiri sciureus), was experimentally infected intravenously with 10^3 PFU of the UM-MC1 Malaysian isolate of Nipah virus [28] as previously described [29]. It was imported from a breeding colony in French Guiana and housed in the BSL-4 animal-care facility in Lyon. The animal was observed daily for signs of disease onset; disease symptoms appeared at day 10 and lasted for 3 days before the moribund monkey was humanely euthanized. A brain biopsy was taken at necropsy and frozen at −80°C.
In another experiment, ten newborn Swiss mice were each intracranially inoculated with 20 µL of CCHFV (i.e. 10^3 PFU) in the BSL-4 animal-care facility in Lyon. Seven days after infection, the mice were euthanized. Brains were collected, crushed in 1× phosphate-buffered saline (1/10 weight/volume), and clarified by centrifugation for 15 min at 600 g before storage at −80°C.

RNA extraction
RNA extraction was performed using the QIAamp Viral RNA Mini Kit (Qiagen Inc., Valencia, CA, USA) as previously described [11]. For BSL-4 viruses, the cell lysis step was carried out at the Jean Mérieux BSL-4 Laboratory (Lyon, France) according to the validated BSL-4 procedure.

Amplification of viral RNA
Extracted viral RNAs were reverse transcribed into cDNA using SuperScript III reverse transcriptase (Invitrogen Inc., Carlsbad, CA, USA) then amplified by the whole transcriptome amplification (WTA) approach in the presence of random hexamer primers. An optimized protocol based on isothermal amplification by the Phi29 polymerase was applied to the QuantiTect Whole Transcriptome Kit (Qiagen) as previously described [30].

Quantitative RT-PCR and PCR
Quantitative RT-PCR and PCR amplifications of CCHFV sequences present in infected cell supernatants or human sera were performed in a LightCycler instrument (Roche Applied Sciences, Basel, Switzerland) [31]. The samples tested were: (i) extracted RNA, (ii) cDNA obtained following reverse transcription of extracted RNA using random primers, and (iii) WTA products obtained following amplification by Phi29 polymerase.

Hybridization to PATHOGENID v2.0 microarray and data analysis
The PATHOGENID v2.0 microarray is the second generation of a microarray developed through a collaboration between Affymetrix and Institut Pasteur [19,30]. It was designed to detect 949 genes, including 126 different viral sequences [18,19], 18 of which correspond to highly pathogenic viral agents (Table 1). The entire microarray experimental procedure is summarized in Figure 1. Total cDNA (20-25 µg in 25 µL) that had been amplified from 100 µL of cell culture supernatant or from 25 µL of a serum sample was fragmented, labelled and hybridized overnight at 45°C to the PATHOGENID v2.0 microarray. The array was then washed and scanned according to instructions provided by Affymetrix. Results were analysed using GENECHIP OPERATING SOFTWARE version 4.0 (GCOS), GENECHIP SEQUENCE ANALYSIS SOFTWARE version 4.0 (GSEQ), and the ABACUS algorithm [32]. The call rate value (the percentage of nucleotides identified by the microarray) obtained from each sample hybridized on the microarray was used to determine the degree of hybridization of that sample and to compare it with that of other samples. All the obtained sequences were exported into a FASTA-formatted file and then subjected to BLASTN analysis to identify viral variants.
After scanning and analysis, all the chips were destroyed according to BSL-4 waste guidelines.
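The call rate defined above (the percentage of nucleotides identified by the microarray) can be illustrated with a minimal sketch. It assumes, as a convention not stated in the text, that positions the microarray could not call are written as N in the exported FASTA file. The names call_rate and read_fasta are hypothetical and do not correspond to routines of GSEQ or ABACUS.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {record name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def call_rate(sequence):
    """Percentage of positions carrying a called base (A, C, G or T).

    Assumption: any other symbol (for example N) marks an uncalled position.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    called = sum(1 for base in seq if base in "ACGT")
    return 100.0 * called / len(seq)


if __name__ == "__main__":
    # Toy records, not data from the study.
    example = ">toy_sample_1\nACGTNNNN\n>toy_sample_2\nACGTACGT\n"
    for record_name, record_seq in read_fasta(example).items():
        print(record_name, call_rate(record_seq))
```

Under this convention, a sequence in which half of the tiled positions are reported as N has a call rate of 50%.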

Direct sequencing
All specimens used either for the validation steps of the PATHOGENID v2.0 microarray or for clinical investigation of the outbreaks were sequenced directly. To analyse the CCHFV strains, classical, nested or semi-nested PCRs were performed to amplify the region tiled on the microarray, i.e. the 531 bases of the L segment encoding the RNA-dependent RNA polymerase. Degenerate primer design and sequence analysis were performed using MACVECTOR software (MacVector Inc., Cary, NC, USA). Primer position refers to the L genome segment of the prototype CCHFV strain (IbAr10200): fw2645 (5′-TGCTCWTTYATTGCCTGTGC-3′); rev3269 (5′-TNACACCRTTGGGGTGACA-3′); fw2576 (5′-GGGAAAATAAGGACAGACCA-3′); rev3371 (5′-TCYGTTAAGCATTCATTRCT-3′). The PCR fragments were purified by ultrafiltration before sequencing (Millipore, Billerica, MA, USA). Sequencing was performed using a BigDye Terminator v1.1 cycle sequencing kit (Applied Biosystems, Carlsbad, CA, USA) and the products were purified by ethanol precipitation. Sequence chromatograms from both strands were obtained on an automated sequence analyser ABI3730XL (Applied Biosystems) with the PCR primers. The percentage of sequence divergence was calculated for each sample by determining the number of mutations relative to the prototype sequence tiled on the microarray.
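The divergence calculation described above can be sketched as a count of mismatches between a sample and the tiled prototype sequence. This is an illustration only: it assumes the two sequences are already aligned and of equal length, and it skips positions that are not A, C, G or T in either sequence, which is a choice made here rather than one stated in the text. The function name percent_divergence is hypothetical.

```python
def percent_divergence(sample, reference):
    """Percentage of compared positions at which sample differs from reference.

    Assumptions: the sequences are pre-aligned and of equal length;
    positions that are ambiguous or gapped in either sequence are skipped.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for s, r in zip(sample.upper(), reference.upper()):
        if s not in "ACGT" or r not in "ACGT":
            continue  # ambiguous or gap position: not counted
        compared += 1
        if s != r:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    # Toy 10-nucleotide example with one substitution, not study data.
    print(percent_divergence("ACGTACGTAA", "ACGTACGTAC"))
```

For the 531-nucleotide region used in this study, a divergence of about 10% would correspond to roughly 53 substitutions.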

Results
We used the high-density PATHOGENID v2.0 resequencing microarray to detect and identify a number of different highly pathogenic viruses. This work was divided into two parts: (i) a validation step, in which we used supernatants from cultured cells infected with viral strains that matched the prototype probes tiled on the microarray and their variants, and (ii) an exploratory step, in which we used human sera from CCHFV outbreaks to evaluate the potential of the microarray to be used in public health emergencies.

Detection and differential diagnosis of viral prototype strains and their variants
We assessed whether the PATHOGENID v2.0 microarray could be used to correctly detect and identify different viruses present in a single sample designed to resemble a complex biological specimen that might occur in nature or the laboratory (i.e. screening a pool of samples). Hence, total RNA was extracted from the supernatants of cells that had been infected with a single viral strain. Then, pools of RNA from up to three different supernatants were made to resemble likely combinations that might coexist in the same geographical area or animal host (Table 2). For two viruses that were absent from our laboratory collection (Junin virus and Sin Nombre virus), two plasmids encompassing the synthetic sequences tiled on the microarray were introduced into certain pools after the reverse transcription step and were then further amplified by WTA. The microarray detected and characterized each virus prototype to similar levels of sensitivity whether the viral RNA was tested alone or in a pool (Table 2). Similar results were obtained when viruses were mixed before RNA extraction [18]. These results indicate that detection of one virus was not affected by the presence of one or two others. For the family Arenaviridae, the Junin virus plasmid clearly validated the homologous sequence on the microarray (call rate: 98%).
In summary, the PATHOGENID v2.0 resequencing microarray very efficiently detected: (i) prototype virus strains and the two synthetic probes with excellent call rates (>97%); (ii) variants with high call rates similar to those of prototype strains (e.g. Lassa virus Guinea, CCHFV Mauritania, Tchoupitoulas virus, Marburg marburgvirus Musoke, Zaire ebolavirus Gabon); and (iii) variants with moderate call rates (e.g. 41.6% for Omsk haemorrhagic fever virus). Variants with low call rates (30.9% for CCHFV from China, 28.7% for Lassa virus from Ivory Coast) were also detected, although less reliably. Finally, highly divergent variants were not detected by the microarray (data not shown).

Application of the microarray to CCHFV outbreaks
We next evaluated the ability of the microarray to detect viruses in human serum samples that were collected during virus outbreaks. CCHFV was chosen as an example because this virus has emerged several times in recent years, particularly in Eastern Europe (Balkan region) and the Middle East. We used (i) sera from 12 CCHFV-positive patients from recent outbreaks in Turkey (2009) (Table 2). We sequenced 531 bp of the polymerase gene of each strain/isolate and constructed a phylogenetic tree that also included all the CCHFV sequences available in GenBank (Fig. 2). Phylogenetic analysis distinguished five genetic clusters, as has been previously described [26,33,34]. Two clusters are in Africa: one is spread from western (Mauritania, Senegal and Nigeria) to southern Africa (South African Republic) and includes the Nigerian and Mauritanian sequences; the other is restricted to Equatorial Africa (Congo, Uganda
The microarray clearly detected three out of the four CCHFV reference strains, the China strain being only poorly detected (call rate 30.9% but no BLASTN confirmation). It also allowed the geographical characterization of five out of the 12 CCHFV serum samples: two samples from Turkey and three from Kosovo, all belonging to the Eurasian cluster (Table 3). The two remaining Kosovo samples and all five samples from Iran were not detected.
To determine why these samples were not detected, we characterized the viral genetic material at each step of the detection process (Table 3). We used quantitative PCR to precisely measure the amounts of specific viral genetic material present before and after RNA amplification. The amount of viral RNA in each original sample was comparable to or slightly higher than (±10^1 maximum) the amount of specific cDNA after random priming. This indicated that reverse transcription did not substantially affect the amount of specific viral genetic material. In contrast, the WTA isothermal amplification of cDNA by Phi29 polymerase significantly increased this amount. The increase from the original amount of RNA in the sample to the cDNA after WTA ranged from 1.08 × 10^2-fold to 3.46 × 10^5-fold, with a mean increase of 3.72 × 10^4-fold. Interestingly, the lowest amount of amplified cDNA detected by the microarray was 2.8 × 10^5 PFU/mL for the Kosovo sample 429 (Kosovo samples 423 and 426, with 3.5 × 10^4 PFU/mL and 8.1 × 10^4 PFU/mL respectively, were not detected). On the other hand, the Turkey sample 090139 with 9.9 × 10^5 PFU/mL was not detected. As Kosovo and Turkey samples are equally divergent from the tiled sequence (∼10%; Table 3), the sensitivity limit of the microarray can therefore be estimated at between 10^5 and 10^6 PFU/mL of amplified cDNA.
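The reasoning behind this estimate can be made explicit with a small sketch that uses only the four amounts quoted in this paragraph. It finds the lowest amount that was detected and the highest amount that was missed, then rounds that interval outward to whole powers of ten. The function names bracket_limit and decade_bounds are hypothetical, and the outward rounding is a reading of how the 10^5 to 10^6 range follows from the data, not a procedure described in the study.

```python
import math

# (sample, amplified cDNA in equivalent PFU/mL, detected by the microarray)
# Values are those quoted in the paragraph above.
SAMPLES = [
    ("Kosovo 423", 3.5e4, False),
    ("Kosovo 426", 8.1e4, False),
    ("Kosovo 429", 2.8e5, True),
    ("Turkey 090139", 9.9e5, False),
]


def bracket_limit(samples):
    """Return (lowest detected amount, highest undetected amount)."""
    lowest_detected = min(a for _, a, hit in samples if hit)
    highest_undetected = max(a for _, a, hit in samples if not hit)
    return lowest_detected, highest_undetected


def decade_bounds(low, high):
    """Round an interval outward to the enclosing powers of ten."""
    lower = 10 ** math.floor(math.log10(low))
    upper = 10 ** math.ceil(math.log10(high))
    return lower, upper


if __name__ == "__main__":
    low, high = bracket_limit(SAMPLES)
    print("detection becomes inconsistent between", low, "and", high)
    print("enclosing decades:", decade_bounds(low, high))
```

Because one sample was detected at 2.8 × 10^5 PFU/mL while another was missed at 9.9 × 10^5 PFU/mL, the limit cannot be placed more precisely than within this decade from these samples alone.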
Limited genetic material did not explain why the five samples from Iran were not detected, because the amounts of amplified cDNA hybridized on the microarray (8.9 × 10^6 to 5.0 × 10^8 PFU/mL) were all well above the 10^5-10^6 PFU/mL detection limit (Table 3). The China strain was poorly detected despite the presence of sufficient material hybridized (5.0 × 10^8 PFU/mL). Therefore, a degree of divergence of about 13.7-14.7% from the tiled sequence (Nigerian IbAr10200 strain) is the specificity detection limit of the microarray.
For the sequences detected by the microarray, call rate values were between 29.2% (Kosovo 429) and 70.8% (Kosovo 427), which were globally lower than those obtained from the infected cell supernatants (62.9-99.6%). Nevertheless, the BLASTN analysis allowed the geographical origin of the different isolates to be assessed with a precision dependent on the quality of the call rate. Sequences from samples having call rates >70% were precisely segregated into their specific sub-cluster in the phylogenetic tree along with sequences obtained by their direct sequencing (Fig. 2). This is the case for the Nigeria and Mauritania strains and for the Kosovo 427 serum (Eurasian sub-cluster). The only difference consisted of a longer branch on the tree that was proportional to the number of nucleotides undetermined by the microarray. For the Kosovo 429 and Turkey 090137 sera, which yielded lower call rates (29.2% and 45.7%, respectively), the analysis nevertheless specified that they belonged in the Eurasian cluster.

Sample | Viral RNA (a) | cDNA (a) | WTA product (a) | Call rate, % (b) | Strain identified | Divergence, % (c)
CCHFV IbAr10200 (mouse brain) | 1.9 × 10^4 | 2.0 × 10^5 | 1.9 × 10^9 | 82.6 | IbAr10200 | 1.9
Nipah virus UMMC1 (monkey brain) | 3.08 × 10^5 | 4.4 × 10^5 | 2.3 × 10^9 | 60.9 | UM-MC1 | ND
CCHFV, Crimean-Congo haemorrhagic fever virus; ND, not done.
(a) Specific viral genetic material evaluated by quantitative PCR, expressed in equivalent PFU/mL.
(b) Call rate for the detection of the strain/isolate on the microarray.
(c) Percentage of divergence (531 bp region in the polymerase gene) against the sequence tiled on the microarray.

The capacity of the microarray to detect viruses in animal samples was tested (Table 4). CCHFV was detected in the brain of newborn mice experimentally infected intracranially for the purpose of virus isolation. The amplification of the cDNA by Phi29 was even more efficient than for human serum samples (increase ratio of 10^5 from the original RNA, 10^4 from the cDNA), which indicates that the complexity of genetic material of the sample did not impair WTA amplification. However, the call rate was lower than for supernatants of cells infected with the same CCHFV IbAr10200 strain (82.6% versus 99.6%), suggesting a higher background for hybridization (Table 3).
In addition, the neurotropic Nipah virus was detected in the brain of a monkey that became moribund after experimental intravenous infection. As observed above, the complexity of genetic material in the sample did not significantly affect the amplification of viral material (increase ratio of 5.2 × 10^3 from the cDNA) but generated a higher background for hybridization (call rate 60.9% versus 99.6%) compared with that obtained with cell supernatant infected with the same viral strain (Table 3).

Discussion
Highly pathogenic viruses are endemic in developing countries where their impact on public health is especially important in light of the absence of efficacious treatments and vaccines [1]. Occasionally, they can be brought into the developed world by travellers and could be misused for bioterrorism. These viruses produce haemorrhagic fevers, encephalitis or respiratory symptoms, but their aetiology is hard to establish in the absence of specific clinical symptoms. Hence, rapid differential diagnosis during outbreaks represents a critical public health priority.
Among the molecular techniques used in clinical and field diagnosis, (RT)-PCR is considered a reference standard because of its versatility and rapid turnaround. However, it may be limited by pitfalls such as the genetic variability of viral isolates or an uncertain aetiology, which require the design of a battery of specific or degenerate primers. Under these conditions, DNA microarray technology offers the advantage of performing a differential diagnosis in a single test. It has already proven effective for pathogen detection and epidemiological studies [14,20,35]. The GREENECHIP 60-mer oligonucleotide array provided a good level of sensitivity for the diagnosis of different infections including viral haemorrhagic fevers, but was problematic because it required correction of probe intensities and subtraction of the negative control [16]. The resequencing microarray approach rapidly identifies virus variants while simultaneously characterizing their genome sequences [20]. The confidence levels of these data depend on the virus's similarity to a tiled reference sequence [36]. It is a promising diagnostic alternative for RNA viruses, which have high levels of genetic variability [15,19,37]. The PATHOGENID v2.0 resequencing microarray has precisely identified the geographic origin of virus isolates, which is crucial for monitoring an epidemic or a pandemic [18]. It has also been used to help in genotyping of viruses for taxonomic purposes [19]. In our study, we evaluated the ability of this microarray to detect variants of highly pathogenic BSL-4 viruses from the families Arenaviridae, Bunyaviridae, Flaviviridae, Filoviridae and Paramyxoviridae. We first validated its spectrum in differential diagnosis, then explored its sensitivity and specificity for use with human serum samples from CCHFV outbreaks in Eastern Europe (Balkan region) and the Middle East.
Validation was performed using different types of samples (i.e. cell supernatants, human sera and animal brain) at different degrees of complexity and divergence from the tiled sequence. In single analyses containing multiple virus types, the microarray was able to identify specific viruses among pathogens that produce similar symptoms, and to discriminate between variants of different origins. This is crucial for clinical management of outbreaks that may involve viruses, bacteria or parasites [1,2,9]. The ∼48-h procedure required to complete the assay may appear less rapid than classical PCR-based methods. However, when a differential diagnosis is needed for an unknown aetiology, the PATHOGENID v2.0 microarray might be competitive because it does not require (i) designing specific primers for all potential etiologic agents, (ii) setting up the corresponding PCR assays, and (iii) performing the sequence and bioinformatic analyses.
Crimean-Congo haemorrhagic fever virus was chosen as a model infectious agent with which to test the microarray because it has a widespread geographic distribution [8,10,21,22] and substantial genetic diversity [24,25,33,34,38-40]. The Nigerian strain (from the African cluster [41]) tiled on the microarray: (i) perfectly detected variants of the same African cluster (e.g. Mauritania, call rate: >97%); (ii) correctly identified viruses of the Eurasian cluster, which are about 9% divergent (e.g. Turkey, call rate: 70%); and (iii) weakly detected viruses from the Middle East cluster (China, call rate: 30.9%). Analysis of human sera from recent epidemics in Kosovo, Turkey and Iran clearly demonstrated the utility of this microarray for detection and characterization by phylogenetic analysis of viruses circulating during outbreaks (Table 3 and Fig. 2). For example, it showed that variants from two sub-groups of the Eurasian cluster were co-circulating in the Balkan region (Kosovo/Turkey), which confirmed previous observations [42].
Using quantitative RT-PCR [28], the microarray sensitivity limit was estimated to be between 10^5 and 10^6 PFU/mL of hybridized cDNA per sample. As the mean amplification ratio from the original RNA to the cDNA after WTA was 3.72 × 10^4 (considering all samples) and 1.75 × 10^4 (considering only serum samples), dividing the hybridized-cDNA limit by this ratio places the detection limit of the microarray between 10^1 and 10^2 equivalent PFU of original viral RNA per mL of serum. This compares favourably with the sensitivity limit of the quantitative RT-PCR method described by Wölfel et al. [31], which detected 1164 copies/mL of plasma. The specificity limit in terms of divergence from the tiled L segment sequence (531 nucleotides) was estimated at about 13-14%, a value exceeded by the undetected Iranian samples, and approached by the poorly detected China strain (13.7%), which lacked a significant match by BLASTN analysis. The CCHFV isolate from Kosovo 429 and the Lassa virus strain AV, despite their low call rates (29.2% and 28.7%, respectively), were detected by BLASTN because they share, respectively, stretches of 21 and 25 consecutive nucleotides with the tiled sequence. This was not the case for the CCHFV isolates Iranian 407 and Turkey 090139, which had similar call rates (28.4% and 33.5%, respectively) but shared no stretches longer than 11 and 16 nucleotides, respectively. This indicates that the microarray may preferentially identify sequences that have stretches of ∼20 consecutive nucleotides identical to the tiled sequence, regardless of the overall similarity.
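The length of the longest stretch of consecutive nucleotides shared with the tiled sequence, the quantity compared above, can be computed with a short sketch. It assumes the sample and tiled sequences are already aligned position by position and of equal length, and it treats any uncalled or ambiguous symbol (for example N) as breaking a stretch. The function name longest_identical_stretch is hypothetical, and this is not the procedure BLASTN itself uses to seed matches.

```python
def longest_identical_stretch(sample, reference):
    """Length of the longest run of consecutive identical called bases.

    Assumptions: sequences are pre-aligned and of equal length; a position
    counts only if both sequences carry the same A, C, G or T.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    best = 0
    current = 0
    for s, r in zip(sample.upper(), reference.upper()):
        if s == r and s in "ACGT":
            current += 1
            best = max(best, current)
        else:
            current = 0  # mismatch or uncalled base ends the stretch
    return best


if __name__ == "__main__":
    # Toy sequences, not study data: the N splits the match in two.
    print(longest_identical_stretch("ACGTNACGTAC", "ACGTAACGTAC"))
```

Two sequences with the same call rate can differ widely in this value, which is consistent with the observation that samples with similar call rates were not equally identifiable by BLASTN.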
Apart from human samples, the potential of the microarray was also preliminarily tested in animal organ material. It was able to detect viruses in brain samples from experimentally infected animals. This has been demonstrated not only in mouse brain that was intentionally infected intracranially for virus isolation, but also in moribund Saimiri sciureus infected intravenously with the neurotropic Nipah virus. In both cases, the amplification of the viral sequences was not affected by the complexity of brain genetic material but a higher background was observed during the hybridization step (lower call rates).
Taking all results together, there is still room to enlarge the spectrum of pathogen detection by increasing the capacity of the microarray. This would allow not only the detection of all currently known isolates but also the discovery of new ones with reliable sequence information. For this purpose, the next-generation panvirological microarray, PATHOGENID v3.0, will include additional CCHFV sequences from the Middle East, Greece and Asian clusters, as well as geographical variants of families Filoviridae (Bundibugyo ebolavirus, Sudan ebolavirus and Ivory Coast strain), Arenaviridae (Ippy virus, Mopeia virus, Mobala virus and Tacaribe virus), Paramyxoviridae (Tioman virus) and Bunyaviridae (Prospect Hill virus). This improved coverage of the sequence space will allow detection of new emerging viruses substantially divergent from the tiled sequences.

Table S1. Raw sequences in FASTA format obtained following hybridization on the PATHOGENID v2.0 microarray of amplified viral RNA obtained from (A) cellular supernatants, (B) human sera and (C) animal brain. The sequences are listed in the same order as in Tables 3, 4 and 5.