1.

Genome-Wide Comparisons of Phylogenetic Similarities between Partial Genomic Regions and the Full-Length Genome in Hepatitis E Virus Genotyping

Shuai Wang Wei Wei Xuenong Luo Xuepeng Cai 《PloS one》2014,9(12)

Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare their results. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of every partial genomic sequence across the genome, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3′ terminus of the papain-like cysteine protease domain, showed relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions perform identically to the full-length genome in genotyping, dividing the HEV isolates into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
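The region-versus-genome comparison described above can be sketched as a sliding-window correlation of pairwise distances. This is a minimal illustration under stated assumptions, not the authors' statistical method: it assumes the isolates are already aligned to equal length, uses uncorrected p-distances, and scores each window by the Pearson correlation between its pairwise distances and those of the full alignment. The function names (`p_distance`, `window_correlations`) and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(seqs, start=0, end=None):
    """Upper-triangle p-distances computed over alignment columns [start, end)."""
    return [p_distance(a[start:end], b[start:end]) for a, b in combinations(seqs, 2)]


def pearson(x, y):
    """Pearson correlation; returns NaN when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def window_correlations(seqs, width, step):
    """Correlate each window's pairwise distances with the full-length distances."""
    full = pairwise_distances(seqs)
    length = len(seqs[0])
    return [
        (start, pearson(pairwise_distances(seqs, start, start + width), full))
        for start in range(0, length - width + 1, step)
    ]


aligned = ["AAAAAAAA", "AAAATTTT", "TTTTTTTT"]  # toy alignment, not HEV data
for start, r in window_correlations(aligned, width=4, step=4):
    print(start, round(r, 3))
```

Windows whose correlation approaches 1 reproduce the full-genome distance structure best; in practice one would still confirm, as the abstract describes, that trees built from those windows recover the same genotype assignments.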

2.

Genotyping of HBV gene sequences in the DDBJ database by phylogenetic tree analysis

Dai Erhei Yang Ruifu Liu Hengjun Song Yajun 《Letters in Biotechnology》2003,14(1):29-32,56

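To make full use of the HBV sequence information in nucleotide databases and to investigate the genotypes of HBV gene sequences in the DDBJ database, Clustal X (1.8) was used to compare pre-S region sequence differences among HBV gene sequences and to generate phylogenetic trees. Systematic analysis of 1471 downloaded HBV gene sequences yielded 228 HBV sequences with complete pre-S/S regions, of which 66 had genotypes already confirmed by various methods. A phylogenetic tree based on the pre-S region of the 228 HBV sequences was constructed. The genotypes assigned on the tree to the 66 sequences of known genotype agreed completely with their original genotypes. Of the 228 sequences, 207 belonged to the seven genotypes A, B, C, D, E, F and G, whereas the remaining 21 could not be assigned to any of these seven genotypes; moreover, they fell into two mutually independent clusters, provisionally designated untyped I and untyped II. Comparison of the pre-S nucleotide sequences of untyped I, untyped II and the other seven genotypes showed that untyped I, untyped II and genotype D all carry a 33-nucleotide deletion in the pre-S region, although the position and form of the deleted segment differ among the three, whereas the other six genotypes have no large deletions in the pre-S region. These results indicate that genotyping by pre-S-region-based phylogenetic trees is accurate and reliable, and that in addition to the seven confirmed genotypes, two further novel HBV genotypes may exist.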

3.

Pseudomonas aeruginosa isolates of distinct sub-genotypes exhibit similar potential of antimicrobial resistance by drugs exposure

Zhen-Hong Liu Yan Xu Li-Bo Duo Yu Liu Zhao-Zhen Xu Jane L. Burns Gui-Rong Liu Bao-Feng Yang Shu-Lin Liu 《Antonie van Leeuwenhoek》2013,103(4):797-807

Pseudomonas aeruginosa, a widespread opportunistic pathogen, often complicates clinical treatment because of its resistance to a large variety of antimicrobials, especially in immunocompromised patients, occasionally leading to death. However, antimicrobial resistance varies greatly among P. aeruginosa isolates, which raises the question of whether some sub-lineages of P. aeruginosa have greater potential to develop antimicrobial resistance than others. To explore this question, we divided 160 P. aeruginosa isolates collected from cities in the USA and China into distinct genotypes using I-CeuI, a special endonuclease previously shown to reveal phylogenetic relationships among bacteria reliably because of its highly conserved 26-bp recognition sequence. We resolved 10 genotypes by I-CeuI analysis and further divided them into 82 sub-genotypes by endonuclease cleavage with SpeI. Eight of the 10 genotypes contained both multi-drug resistant (MDR) and less resistant isolates based on comparisons of their antimicrobial resistance profiles (ARPs). When the less resistant or susceptible isolates from different genotypes were exposed to eight individual antimicrobials, they showed similar potential to become resistant, with minor exceptions. To our knowledge, this is the first report to examine correlations between phylogenetic sub-lineages of P. aeruginosa and their potential to become resistant to antimicrobials. This study further highlights the importance and urgency of controlling antimicrobial abuse.

4.

A quantitative genotype algorithm reflecting H5N1 Avian influenza niches

Wan XF Chen G Luo F Emch M Donis R 《Bioinformatics (Oxford, England)》2007,23(18):2368-2375

MOTIVATION: Computational genotyping analyses are critical for characterizing molecular evolutionary footprints, thus providing important information for designing strategies for influenza prevention and control. Most currently available methods are based on multiple sequence alignment and phylogenetic tree construction, which are time-consuming and limited by the number of taxa. Arbitrarily defined genotypes further complicate the interpretation of genotyping results. METHODS: In this study, we describe a quantitative influenza genotyping algorithm based on the theory of quasispecies. First, the complete composition vector (CCV) was utilized to calculate the pairwise evolutionary distance between genotypes. Next, Hierarchical Bayesian Modeling using the Gibbs Sampling algorithm was applied to identify the segment genotype threshold, which is used to identify influenza segment genotypes through a modularity calculation. The viral genotype was defined by combining eight segment genotypes based on the genetic reassortment feature of influenza A viruses. RESULTS: We applied this method to H5N1 avian influenza viruses and identified 107 niches among 283 viruses with a complete genome set. The diversity of viral genotypes and their correlation with geographic locations suggest that these viruses form local niches after being introduced to a new ecological environment through poultry trade or bird migration. This novel method allows us to define genotypes in a robust, quantitative and hierarchical manner. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
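The final step above, in which a viral genotype is defined by combining eight segment genotypes, can be illustrated with a short sketch. It deliberately simplifies the method: instead of CCV distances, Hierarchical Bayesian thresholds and a modularity calculation, it assumes a precomputed pairwise distance function and a fixed threshold, and groups the sequences of each segment by single linkage (union-find). The function names, the two-segment toy example and its distances are hypothetical.

```python
from collections import defaultdict


def threshold_clusters(names, dist, threshold):
    """Single-linkage clusters: join any pair whose distance is <= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(a, b) <= threshold:
                parent[find(a)] = find(b)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    root_label, labels = {}, {}
    for n in names:
        root = find(n)
        root_label.setdefault(root, len(root_label))
        labels[n] = root_label[root]
    return labels


def combine_segment_genotypes(per_segment):
    """Group viruses by their tuple of segment genotypes (one label per segment)."""
    groups = defaultdict(list)
    for virus in per_segment[0]:
        groups[tuple(segment[virus] for segment in per_segment)].append(virus)
    return dict(groups)


# Toy example with two segments instead of eight; distances are made up.
viruses = ["a", "b", "c"]
d1 = {frozenset("ab"): 0.01, frozenset("ac"): 0.20, frozenset("bc"): 0.20}
d2 = {frozenset("ab"): 0.30, frozenset("ac"): 0.02, frozenset("bc"): 0.30}
seg1 = threshold_clusters(viruses, lambda x, y: d1[frozenset((x, y))], 0.05)
seg2 = threshold_clusters(viruses, lambda x, y: d2[frozenset((x, y))], 0.05)
niches = combine_segment_genotypes([seg1, seg2])
print(niches)
```

Because each virus is keyed by its full tuple of segment labels, a reassortant that shares one segment genotype with a group but not another ends up in its own combined genotype, which is the intuition behind defining niches from all eight segments together.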

5.

Conversion of array‐based single nucleotide polymorphic markers for use in targeted genotyping by sequencing in hexaploid wheat (Triticum aestivum)

Amanda J. Burridge Paul A. Wilkinson Mark O. Winfield Gary L. A. Barker Alexandra M. Allen Jane A. Coghill Christy Waterfall Keith J. Edwards 《Plant biotechnology journal》2018,16(4):867-876

Wheat breeders and academics alike use single nucleotide polymorphisms (SNPs) as molecular markers to characterize regions of interest within the hexaploid wheat genome. A number of SNP-based genotyping platforms are available, and their utility depends upon factors such as the available technologies, the number of data points required, budgets and the technical expertise required. Unfortunately, markers can rarely be exchanged between existing and newly developed platforms, meaning that previously generated data cannot be compared, or combined, with more recently generated data sets. We predict that genotyping by sequencing will become the predominant genotyping technology within the next 5–10 years. With this in mind, to ensure that data generated from current genotyping platforms continue to be of use, we have designed and utilized SNP-based capture probes from several thousand existing and publicly available probes from the Axiom® and KASP™ genotyping platforms. We have validated our capture probes in a targeted genotyping by sequencing protocol using 31 previously genotyped UK elite hexaploid wheat accessions. Data comparisons between targeted genotyping by sequencing, Axiom® array genotyping and KASP™ genotyping assays identified a set of 3256 probes which reliably bring together targeted genotyping by sequencing data with the previously available marker data set. As such, these probes are likely to be of considerable value to the wheat community. The probe details, full probe sequences and a custom-built analysis pipeline may be freely downloaded from the CerealsDB website (http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/sequence_capture.php).

6.

Characterization of 19 single nucleotide polymorphism markers for coho salmon

C. T. SMITH L. PARK D. VANDOORNIK L. W. SEEB J. E. SEEB 《Molecular ecology resources》2006,6(3):715-720

We report 39 single nucleotide polymorphisms (SNPs) observed in 23 nuclear DNA sequences in coho salmon Oncorhynchus kisutch. High‐throughput genotyping assays based on the 5′‐nuclease reaction were developed for 17 of these nuclear SNPs and for two previously published mitochondrial DNA SNPs. Minor allele frequency differences (Δq) among collections were between 5.2% and 51.2%, resulting in per locus F_ST estimates of 0.00–0.24 with an average of 0.09.
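The per-locus F_ST values and Δq reported above summarize how allele frequencies differ among collections. As a rough illustration only, the sketch below uses Nei's heterozygosity-based estimator, F_ST = (H_T - H_S) / H_T, for a biallelic SNP with equally weighted collections, and takes Δq as the largest minus the smallest minor-allele frequency. Both choices are assumptions: the abstract does not state which estimator was used, and sample-size-corrected estimators such as Weir and Cockerham's θ give different values. The frequencies are hypothetical.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst_biallelic(freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T; equal weights, no sample-size correction."""
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0:
        return 0.0  # the locus is fixed for the same allele in every collection
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def delta_q(freqs):
    """Largest difference in minor-allele frequency between any two collections."""
    return max(freqs) - min(freqs)


# Hypothetical minor-allele frequencies for one SNP in three collections.
freqs = [0.10, 0.30, 0.50]
print(round(fst_biallelic(freqs), 3), round(delta_q(freqs), 3))
```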

7.

Thirty‐two single nucleotide polymorphism markers for high‐throughput genotyping of sockeye salmon

CARITA M. ELFSTROM CHRISTIAN T. SMITH JAMES E. SEEB 《Molecular ecology resources》2006,6(4):1255-1259

We characterize 32 single nucleotide polymorphism genotyping assays for resolving genotypic variation in sockeye salmon Oncorhynchus nerka in the Pacific Rim. These assays are based on the 5′‐nuclease reaction and thus facilitate high‐throughput genotyping with minimal optimization time. Minor allele frequency differences (Δq) among collections were between 4.7% and 97.9%, resulting in per locus F_ST estimates of 0.02–0.71 with an average of 0.22.

8.

An automated genotyping system for analysis of HIV-1 and other microbial sequences

de Oliveira T Deforche K Cassol S Salminen M Paraskevis D Seebregts C Snoeck J van Rensburg EJ Wensing AM van de Vijver DA Boucher CA Camacho R Vandamme AM 《Bioinformatics (Oxford, England)》2005,21(19):3797-3800

9.

Thirty‐eight single nucleotide polymorphism markers for high‐throughput genotyping of chum salmon

CARITA M. ELFSTROM CHRISTIAN T. SMITH LISA W. SEEB 《Molecular ecology resources》2007,7(6):1211-1215

We characterize 38 single nucleotide polymorphism genotyping assays for chum salmon (Oncorhynchus keta), an important species for both commercial and subsistence fisheries in western Alaska. These assays are based on the 5′‐nuclease reaction and thus facilitate high‐throughput genotyping with minimal optimization time. Minor allele frequency differences (Δq) among collections were between 0.01 and 0.50, resulting in per locus F_ST estimates of 0.00–0.08 with an average of 0.03.

10.

Discovery of a novel hsp65 genotype within Mycobacterium massiliense associated with the rough colony morphology

Kim BJ Yi SY Shim TS Do SY Yu HK Park YG Kook YH Kim BJ 《PloS one》2012,7(6):e38420

So far, genetic diversity among strains within Mycobacterium massiliense has rarely been studied. To investigate the genetic diversity among M. massiliense strains, we conducted phylogenetic analysis based on hsp65 (603-bp) and rpoB (711-bp) sequences from 65 Korean M. massiliense isolates. We found that hsp65 sequence analysis could clearly differentiate them into two distinct genotypes, Type I and Type II, which were isolated from 35 (53.8%) and 30 (46.2%) patients, respectively. The rpoB sequence analysis revealed a total of four genotypes (R-I to R-IV) within M. massiliense strains, three of which (R-I, R-II and R-III) correlated with hsp65 Type I, while the other (R-IV) correlated with Type II. Interestingly, genotyping by the hsp65 method agreed well with colony morphology: despite some exceptions, Type I and Type II correlated with smooth and rough colonies, respectively. In addition, the two types differed completely from one another in their MALDI-TOF mass spectrometry profiles of whole lipids. We also developed a PCR-restriction analysis (PRA) based on HinfI digestion of the 644-bp hsp65 PCR amplicon, which enables the two genotypes within M. massiliense to be easily and reliably separated. In conclusion, two distinct hsp65 genotypes exist within M. massiliense strains, which differ from one another in both colony morphology and lipid profile. Furthermore, our data indicate that Type II is a novel M. massiliense genotype, presented here for the first time. The differences in clinical traits between these two hsp65 genotypes should be investigated in future studies.
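The PCR-restriction analysis (PRA) above separates genotypes by the fragment sizes produced when HinfI cuts the hsp65 amplicon. HinfI recognizes G^ANTC and cuts after the G; because the site is palindromic, scanning one strand finds every site. The sketch below predicts top-strand fragment lengths for a given amplicon sequence. It ignores the 3-nt overhang and gel resolution, and the sequences are toy examples rather than the 644-bp M. massiliense amplicons, whose genotype-specific patterns are not given in the abstract. The function name `hinfi_fragments` is hypothetical.

```python
import re

# HinfI recognition site GANTC (N = any base); the enzyme cuts between G and A.
HINFI_SITE = re.compile(r"(?=GA[ACGT]TC)")


def hinfi_fragments(seq: str) -> list:
    """Predicted top-strand fragment lengths (bp) after complete HinfI digestion."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in HINFI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


print(hinfi_fragments("AAAGAATCAAAA"))      # toy amplicon with one site
print(hinfi_fragments("aagactcaagagtcaa"))  # toy amplicon with two sites
```

Two genotypes that differ by a single substitution inside or next to a GANTC site will give different fragment patterns, which is what makes a one-enzyme PRA a quick substitute for sequencing once the diagnostic sites are known.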

11.

HCV Genotyping from NGS Short Reads and Its Application in Genotype Detection from HCV Mixed Infected Plasma

Ping Qiu Richard Stevens Bo Wei Fred Lahser Anita Y. M. Howe Joel A. Klappenbach Matthew J. Marton 《PloS one》2015,10(4)

Genotyping of hepatitis C virus (HCV) plays an important role in the treatment of HCV. As new genotype-specific treatment options become available, it has become increasingly important to have accurate HCV genotype and subtype information to ensure that the most appropriate treatment regimen is selected. Most current genotyping methods are unable to detect mixed genotypes from two or more HCV infections. Next generation sequencing (NGS) allows for rapid and low-cost mass sequencing of viral genomes and provides an opportunity to probe the viral population from a single host. In this paper, the possibility of using short NGS reads for direct HCV genotyping without genome assembly was evaluated. We surveyed the publicly available genetic content of three HCV drug target regions (NS3, NS5A, NS5B) in terms of whether these genes contained genotype-specific regions that could predict genotype. Six genotypes and 38 subtypes were included in this study. An automated HCV genotyping method based on phylogenetic analysis was implemented and used to assess different HCV target gene regions. Candidate regions of 250 bp each were found for all three genes that have enough genetic information to predict HCV genotypes/subtypes. Validation using public datasets shows 100% genotyping accuracy. To test whether these 250-bp regions were sufficient to identify mixed genotypes, we developed a random primer-based method to sequence HCV plasma samples containing mixtures of two HCV genotypes in different ratios. We were able to determine the genotypes without ambiguity and to quantify the ratio of the abundances of the mixed genotypes in the samples. These data provide a proof-of-concept that this random-primed, NGS-based short-read genotyping approach does not need prior information about the viral population and is capable of detecting mixed viral infection.
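The idea of genotyping directly from short reads, and of estimating the ratio of two genotypes in a mixed sample, can be sketched without assembly. This is a deliberately simpler stand-in for the authors' phylogenetic approach: each read is assigned to the reference region with which it shares the most k-mers, ambiguous or unmatched reads are set aside, and the mixture ratio is estimated from the assigned read counts. The reference names, sequences and k are hypothetical.

```python
from collections import Counter


def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, references, k=5):
    """Count reads per reference; reads with no shared k-mer or a tied best score are 'unassigned'."""
    ref_kmers = {name: kmer_set(seq, k) for name, seq in references.items()}
    counts = Counter()
    for read in reads:
        rk = kmer_set(read, k)
        scores = {name: len(rk & ks) for name, ks in ref_kmers.items()}
        best = max(scores.values())
        winners = [name for name, s in scores.items() if s == best]
        counts[winners[0] if best > 0 and len(winners) == 1 else "unassigned"] += 1
    return counts


def mixture_fractions(counts):
    """Relative abundance of each genotype among assigned reads."""
    assigned = {n: c for n, c in counts.items() if n != "unassigned"}
    total = sum(assigned.values())
    return {n: c / total for n, c in assigned.items()} if total else {}


refs = {"genotype_1": "AAAAACCCCC", "genotype_2": "GGGGGTTTTT"}  # toy "regions"
reads = ["AAAAAC", "CCCCC", "GGGGGT", "ACGTA"]
counts = assign_reads(reads, refs)
print(counts, mixture_fractions(counts))
```

Real HCV regions share many k-mers across genotypes, so a practical version would need longer regions, a scoring rule robust to sequencing errors, and, as in the abstract, validation against samples of known mixture ratio.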

12.

The impact of molecular systematics on hypotheses for the evolution of root nodule symbioses and implications for expanding symbioses to new host plant genera

Swensen Susan M. Mullin Beth C. 《Plant and Soil》1997,194(1-2):185-192

Current taxonomic schemes place plants that can participate in root nodule symbioses among disparate groups of angiosperms. According to the classification scheme of Cronquist (1981), which is based primarily on the analysis of morphological characters, host plants of rhizobial symbionts are placed in subclasses Rosidae and Hamamelidae, and those of Frankia are distributed among subclasses Rosidae, Hamamelidae, Magnoliidae and Dilleniidae. This broad phylogenetic distribution of nodulated plants has engendered the notion that nitrogen fixing endosymbionts, particularly those of actinorhizal plants, can interact with a very broad range of unrelated host plant genotypes. New angiosperm phylogenies based on DNA sequence comparisons reveal a markedly different relationship among nodulated plants and indicate that they form a more coherent group than has previously been thought (Chase et al., 1993; Swensen et al., 1994; Soltis et al., 1995). Molecular data support a single origin of the predisposition for root nodule symbiosis (Soltis et al., 1995) and at the same time support the occurrence of multiple origins of symbiosis within this group (Doyle, 1994; Swensen, 1996; Swensen and Mullin, In Press).

13.

Genetic comparison of Bacillus anthracis and its close relatives using amplified fragment length polymorphism and polymerase chain reaction analysis

P. J. Jackson K. K. Hill M. T. Laker L. O. Ticknor P. Keim 《Journal of applied microbiology》1999,87(2):263-269

Amplified fragment length polymorphism (AFLP) analysis allows a rapid, relatively simple analysis of a large portion of a microbial genome, providing information about the species and its phylogenetic relationship to other microbes (Vos et al. 1995). The method simply surveys the genome for length and sequence polymorphisms. The AFLP pattern identified can be used for comparison to the genomes of other species. Unlike other methods, it does not rely on analysis of a single genetic locus that may bias the interpretation of results and does not require any prior knowledge of the targeted organism. Moreover, a standard set of reagents can be applied to any species without using species-specific information or molecular probes. We are using AFLP analysis to rapidly identify different bacterial species. A comparison of AFLP profiles generated from a large battery of Bacillus anthracis strains shows very little variability among different isolates (Keim et al. 1997). By contrast, there is a significant difference between AFLP profiles generated for any B. anthracis strain and even the most closely related Bacillus species. Sufficient variability is apparent among all known microbial species to allow phylogenetic analysis based on large numbers of genetically unlinked loci. These striking differences among AFLP profiles allow unambiguous identification of previously identified species and phylogenetic placement of newly characterized isolates relative to known species based on a large number of independent genetic loci. Data generated thus far show that the method provides phylogenetic analyses that are consistent with other widely accepted phylogenetic methods. However, AFLP analysis provides a more detailed analysis of the targets and samples a much larger portion of the genome. Consequently, it provides an inexpensive, rapid means of characterizing microbial isolates to further differentiate among strains and closely related microbial species. 
Such information cannot be rapidly generated by other means. AFLP sample analysis quickly generates a very large amount of molecular information about microbial genomes. However, this information cannot be analysed rapidly using manual methods. We are developing a large archive of electronic AFLP signatures that is being used to identify isolates collected from medical, veterinary, forensic and environmental samples. We are also developing the computational packages necessary to rapidly and unambiguously analyse the AFLP profiles and conduct a phylogenetic comparison of these data relative to information already in our database. We will use this archive and the associated algorithms to determine the species identity of previously uncharacterized isolates and place them phylogenetically relative to other microbes based on their AFLP signatures. This study provides significant new information about microbes with environmental, veterinary and medical significance. This information can be used in further studies to understand the relationships among these species and the factors that distinguish them from one another. It should also allow the identification of unique factors that contribute to important microbial traits, including pathogenicity and virulence. We are also using AFLP data to identify, isolate and sequence DNA fragments that are unique to particular microbial species and strains. The fragment patterns and sequence information provide insights into the complexity and organization of bacterial genomes relative to one another. They also provide the information necessary for the development of species-specific polymerase chain reaction primers that can be used to interrogate complex samples for the presence of B. anthracis, other microbial pathogens or their remnants.
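Comparing AFLP profiles, as described above, usually reduces to scoring shared and unshared bands between isolates. A common choice, assumed here because the abstract does not name one, is the Dice (Nei-Li) coefficient 2a / (2a + b + c), where a is the number of shared bands and b and c are the bands unique to each isolate; this equals 2a divided by the total band count of the two profiles. Bands are represented as fragment sizes in bp, and the profiles below are invented for illustration.

```python
from itertools import combinations


def dice_similarity(a: set, b: set) -> float:
    """Dice / Nei-Li band-sharing coefficient: 2 * shared / (bands in a + bands in b)."""
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_similarities(profiles):
    """Dice similarity for every unordered pair of isolates."""
    return {
        (x, y): dice_similarity(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical AFLP profiles: sets of scored fragment sizes (bp).
profiles = {
    "isolate_A": {102, 150, 233, 310},
    "isolate_B": {102, 150, 233, 312},
    "isolate_C": {98, 175, 240, 310},
}
for pair, s in pairwise_similarities(profiles).items():
    print(pair, round(s, 2))
```

The resulting similarity matrix (or 1 minus it, as a distance) is what clustering or tree-building software would consume to place a new isolate relative to the archived signatures.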

14.

Trait-to-gene: a computational method for predicting the function of uncharacterized genes

Levesque M Shasha D Kim W Surette MG Benfey PN 《Current biology : CB》2003,13(2):129-133

The function of unknown genes is often inferred from comparisons to well-characterized homologs. In this paper, we show that, even if all of the homologs of a gene are unannotated, its function may be deduced through phylogenetic profiling. We have designed a series of algorithms that make functional predictions of genes based on orthology and set theory, but our approach to predicting gene function requires no previous knowledge of homolog function. With this technique, we successfully identified 94% of the clusters of orthologous groups that are known to be involved in flagella development or function. As a test, we removed the function of three putative flagellar genes that had been previously uncharacterized in Bacillus subtilis. We observed a motility phenotype for two of these three genes. Thus, these algorithms allow for high-throughput functional prediction of genes beyond that provided by simple orthology-based annotation endeavors.
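Phylogenetic profiling, as used above, asks which orthologous groups are present in exactly those genomes that have a trait. The sketch below is a simplified illustration, not the authors' set-theory algorithms: it assumes a trait table (organism to has-trait) and, for each orthologous group, the set of organisms carrying a member, then ranks groups by how many organisms disagree with the trait pattern. All organism and group names are hypothetical.

```python
def match_profiles(trait, group_profiles, max_mismatches=0):
    """Return (group, mismatches) pairs whose presence pattern fits the trait within max_mismatches."""
    organisms = set(trait)
    with_trait = {org for org, has in trait.items() if has}
    hits = []
    for group, present in group_profiles.items():
        # Symmetric difference: organisms where group presence and trait disagree.
        mismatches = len((present & organisms) ^ with_trait)
        if mismatches <= max_mismatches:
            hits.append((group, mismatches))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


motility = {"org1": True, "org2": True, "org3": False, "org4": False}
groups = {
    "GROUP_A": {"org1", "org2"},
    "GROUP_B": {"org1", "org2", "org4"},
    "GROUP_C": {"org3"},
}
print(match_profiles(motility, groups))
print(match_profiles(motility, groups, max_mismatches=1))
```

Allowing a few mismatches trades precision for recall: real trait distributions are noisy (gene loss, incomplete annotation), so an exact-match rule alone would miss many trait-associated groups.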

15.

The Genographic Project public participation mitochondrial DNA database

Behar DM Rosset S Blue-Smith J Balanovsky O Tzur S Comas D Mitchell RJ Quintana-Murci L Tyler-Smith C Wells RS;Genographic Consortium 《PLoS genetics》2007,3(6):e104

The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor-based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
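The nearest-neighbour haplogroup prediction mentioned above can be sketched by treating each HVS-I sequence as a set of differences from a reference sequence and voting among the closest database entries. The distance (size of the symmetric difference), the value of k and the tie-breaking rule are assumptions of this sketch, and the variant labels and haplogroup names in the toy database are placeholders, not curated mtDNA motifs.

```python
from collections import Counter


def predict_haplogroup(query, database, k=3):
    """Majority haplogroup among the k entries whose variant sets differ least from the query.

    Ties in distance keep database order; ties in the vote go to the haplogroup
    encountered first among the ranked neighbours.
    """
    ranked = sorted(database, key=lambda entry: len(query ^ entry[0]))
    votes = Counter(haplogroup for _, haplogroup in ranked[:k])
    return votes.most_common(1)[0][0]


# Placeholder variant sets and labels for illustration only.
database = [
    ({"v1", "v2"}, "H1"),
    ({"v1", "v2", "v3"}, "H1"),
    ({"v4", "v5"}, "H2"),
    ({"v4", "v5", "v6"}, "H2"),
    ({"v7", "v8"}, "H3"),
]
print(predict_haplogroup({"v1", "v2", "v9"}, database))
print(predict_haplogroup({"v4", "v5"}, database))
```

Unlike a rule-based classifier, this approach needs no hand-written motif rules: a query carrying a private mutation still lands near the database entries it most resembles, which is why a large reference database matters.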

16.

An optimized microsatellite marker set for detection of Metarhizium anisopliae genotype diversity on field and regional scales

Catherine Oulevey Franco Widmer Roland Kölliker Jürg Enkerli 《Mycological Research》2009,113(9):1016-1024

Thirty-three Metarhizium anisopliae isolates sampled across Switzerland as well as 35 and 36 M. anisopliae isolates sampled from two field sites were assembled in three isolate collections. All isolates were analyzed using 27 newly developed and 14 previously published microsatellite markers. The 41 markers allowed for detection of 25 genotypes in the Swiss collection while 30 and 11 genotypes were detected in the two field collections. This indicated high genetic diversity on a regional as well as on a field scale. In order to improve genotyping efficiency, an optimized marker set, which allows discrimination of a large number of genotypes with as few markers as possible, was developed. The optimized marker set consisted of 16 common markers, which provided resolution close to maximal resolution in all three collections (91–93%). The results demonstrated that optimized marker sets have to be validated before large scale application to previously unassessed collections in order to avoid suboptimal resolution. This genetic tool will be valuable for analyses of genetic population structure of M. anisopliae in different habitats on a regional as well as on a field scale.
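An optimized marker set, as described above, keeps as much genotype resolution as possible with as few markers as possible. The abstract does not spell out the optimization procedure, so the sketch below assumes a greedy heuristic: at each step it adds the marker that most increases the number of distinct multilocus genotypes, and resolution is reported as a fraction of that achieved with all markers. Isolate names, marker names and allele sizes are invented.

```python
def resolved_genotypes(isolates, markers):
    """Number of distinct multilocus genotypes when only `markers` are scored."""
    return len({tuple(alleles[m] for m in markers) for alleles in isolates.values()})


def greedy_marker_set(isolates, candidates, size):
    """Greedily add the marker that resolves the most genotypes (ties: first candidate)."""
    chosen, remaining = [], list(candidates)
    for _ in range(min(size, len(remaining))):
        best = max(remaining, key=lambda m: resolved_genotypes(isolates, chosen + [m]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical allele sizes (bp) at three microsatellite markers for four isolates.
isolates = {
    "i1": {"m1": 120, "m2": 200, "m3": 310},
    "i2": {"m1": 120, "m2": 204, "m3": 310},
    "i3": {"m1": 124, "m2": 204, "m3": 310},
    "i4": {"m1": 124, "m2": 200, "m3": 314},
}
markers = ["m1", "m2", "m3"]
subset = greedy_marker_set(isolates, markers, size=2)
coverage = resolved_genotypes(isolates, subset) / resolved_genotypes(isolates, markers)
print(subset, coverage)
```

A greedy choice is not guaranteed to be optimal, and a set tuned on one collection can under-resolve another, which echoes the abstract's point that optimized marker sets need validation before being applied to new collections.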

17.

Tandem repeats analysis for the high resolution phylogenetic analysis of Yersinia pestis

C Pourcel F André-Mazeaud H Neubauer F Ramisse G Vergnaud 《BMC microbiology》2004,4(1):22

Background

Yersinia pestis, the agent of plague, is a young and highly monomorphic species. Three biovars, each thought to be associated with one of the last three Y. pestis pandemics, have been defined based on biochemical assays. More recently, DNA-based assays, including DNA sequencing, IS typing and DNA arrays, have significantly improved current knowledge of the origin and phylogenetic evolution of Y. pestis. However, these methods suffer either from a lack of resolution or from the difficulty of comparing data. Variable number of tandem repeats (VNTR) loci provide valuable polymorphic markers for genotyping and performing phylogenetic analyses in a growing number of pathogens and have given promising results for Y. pestis as well.

Results

In this study we have genotyped 180 Y. pestis isolates by multiple locus VNTR analysis (MLVA) using 25 markers. Sixty-one different genotypes were observed. The three biovars were distributed into three main branches, with some exceptions. In particular, the Medievalis phenotype is clearly heterogeneous, resulting from different mutation events in the napA gene. Antiqua strains from Asia appear to hold a central position compared to Antiqua strains from Africa. A subset of 7 markers is proposed for the quick comparison of a new strain with the collection typed here. This can be easily achieved using a Web-based facility, specifically set-up for running such identifications.
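Comparing a new strain against a typed collection using only a subset of VNTR markers, as proposed above, can be sketched with a categorical distance: the fraction of compared loci whose repeat counts differ. The locus names, repeat counts and the choice of distance are illustrative assumptions; the abstract does not name the seven markers or the distance used by the web service.

```python
def mlva_distance(a, b):
    """Fraction of loci, among those scored in both profiles, whose repeat counts differ."""
    shared = [locus for locus in a if locus in b]
    if not shared:
        raise ValueError("profiles share no scored loci")
    return sum(a[locus] != b[locus] for locus in shared) / len(shared)


def closest_strains(query, collection, loci, n=5):
    """Rank collection strains by distance to the query over the chosen loci only."""
    def restrict(profile):
        return {locus: profile[locus] for locus in loci if locus in profile}

    q = restrict(query)
    scored = sorted((mlva_distance(q, restrict(p)), name) for name, p in collection.items())
    return scored[:n]


# Hypothetical repeat counts at three loci.
collection = {
    "strain_1": {"VNTR_A": 3, "VNTR_B": 5, "VNTR_C": 7},
    "strain_2": {"VNTR_A": 3, "VNTR_B": 6, "VNTR_C": 7},
}
new_strain = {"VNTR_A": 3, "VNTR_B": 6, "VNTR_C": 9}
print(closest_strains(new_strain, collection, loci=["VNTR_A", "VNTR_B"]))
```

Treating repeat counts as categories (same or different) rather than as numbers is a common simplification for MLVA, since a difference of one repeat and a difference of five are not assumed to reflect proportional evolutionary distance.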

Conclusion

Tandem-repeat typing may prove to be a powerful complement to the existing phylogenetic tools for Y. pestis. Typing can be achieved quickly at a low cost in terms of consumables, technical expertise and equipment. The resulting data can be easily compared between different laboratories. The number and selection of markers will eventually depend upon the type and aim of investigations.

18.

High-resolution melting molecular signatures for rapid identification of human papillomavirus genotypes

TH Lee TS Wu CP Tseng JT Qiu 《PloS one》2012,7(8):e42051

Background

Genotyping of human papillomavirus (HPV) is crucial for patient management in a clinical setting. This study assesses the combined use of broad-range real-time PCR and high-resolution melting (HRM) analysis for rapid identification of HPV genotypes.

Methods

Genomic DNA sequences of 8 high-risk genotypes (HPV16/18/39/45/52/56/58/68) were subjected to bioinformatic analysis to select an appropriate PCR amplicon. Asymmetric broad-range real-time PCR in the presence of an HRM dye and two unlabeled probes specific to HPV16 and 18 was employed to generate HRM molecular signatures for HPV genotyping. The method was validated by assessing 119 clinical HPV isolates.

Results

A DNA fragment within the L1 region, ranging from 215 to 221 bp for different HPV genotypes, was selected as the PCR amplicon. Each genotype displayed a distinct HRM molecular signature with minimal inter-assay variability. Based on the HRM molecular signatures, HPV genotypes can be determined with a single PCR within 3 h of viral DNA isolation. In the validation assay, a 91% accuracy rate was achieved when the genotypes were in the database. In addition, HRM molecular signatures were established for 6 additional low-risk genotypes.
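HRM signatures are usually inspected as negative derivative melting curves (-dF/dT against temperature), where each melting domain or probe duplex shows up as a peak. The sketch below finds the temperature of the largest peak using central finite differences; it assumes fluorescence has already been normalized and covers only a fragment of what HRM software does (no background subtraction, curve alignment or shape comparison between genotypes). The synthetic curve and the function name `melting_peak` are invented for illustration.

```python
import math


def melting_peak(temps, fluorescence):
    """Temperature at which -dF/dT (central differences) is largest."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired temperature/fluorescence readings")
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic single-domain melt: a logistic drop in fluorescence centred on 80 °C.
temps = [70 + 0.1 * i for i in range(201)]
signal = [1 / (1 + math.exp((t - 80) / 0.5)) for t in temps]
print(round(melting_peak(temps, signal), 1))
```

A genotype signature built on this idea would record several peak positions and heights (amplicon and probe melts) rather than a single maximum.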

Conclusions

This assay provides a novel approach for HPV genotyping in a rapid and cost-effective manner.

19.

Phylogenetic diversity of fluorescent pseudomonads in agricultural soils from Korea

Kwon SW Kim JS Crowley DE Lim CK 《Letters in applied microbiology》2005,41(5):417-423

AIMS: To identify and compare the relative diversity and distribution of genotypes of culturable fluorescent pseudomonads from soils. METHODS AND RESULTS: Analysis of 160 isolates from seven soil samples using randomly amplified polymorphic DNA methods revealed 53 genotypes, which were subsequently identified by their 16S ribosomal DNA sequences. Phylogenetic analyses of the 53 genotypes along with 43 fluorescent pseudomonad type strains separated the genotypes into 10 distinct clusters that included two phylogenetic groups that were not represented by previously described type strains. CONCLUSIONS: The diversity of genotypes that was obtained from the soil samples was highly variable among the different soils and appeared to be associated with different soil management practices that also influence plant yields. SIGNIFICANCE AND IMPACT OF THE STUDY: The identification and phylogenetic analysis of these genotypes offers opportunities for study of phenotypic traits that may be associated within taxonomically related groups of fluorescent pseudomonad species and how these groups vary in relation to soil management practices.

20.

Nephele: genotyping via complete composition vectors and MapReduce

Marc E Colosimo Matthew W Peterson Scott Mardis Lynette Hirschman 《Source code for biology and medicine》2011,6(1):1-10

Background

Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because they increase in computational complexity quickly with the number of sequences.

Results

Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and uses affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours.
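The alignment-free representation described above can be illustrated with a single-k composition vector in the style of Hao and colleagues' CVTree method: each k-mer frequency is compared with the frequency expected from its overlapping (k-1)-mers and shared (k-2)-mer under a Markov model, and sequences are compared by the cosine of their vectors, d = (1 - cos) / 2. This is a simplified sketch under stated assumptions: Nephele's complete composition vector is not reproduced exactly (this uses one k only), and the affinity propagation clustering and Hadoop/MapReduce parallelization are not shown. Function names and k are hypothetical choices.

```python
import random
from collections import Counter
from itertools import product
from math import sqrt


def kmer_freqs(seq: str, k: int) -> dict:
    """Relative frequencies of overlapping k-mers."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(max(total, 0)))
    return {w: c / total for w, c in counts.items()}


def composition_vector(seq: str, k: int = 5) -> list:
    """Markov-corrected composition vector over all 4**k DNA words (k >= 3)."""
    if k < 3:
        raise ValueError("k must be at least 3")
    seq = seq.upper()
    fk, fk1, fk2 = kmer_freqs(seq, k), kmer_freqs(seq, k - 1), kmer_freqs(seq, k - 2)
    vector = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        middle = fk2.get(word[1:-1], 0.0)
        # Expected frequency from the two overlapping (k-1)-mers and their shared (k-2)-mer.
        expected = fk1.get(word[:-1], 0.0) * fk1.get(word[1:], 0.0) / middle if middle else 0.0
        vector.append((fk.get(word, 0.0) - expected) / expected if expected else 0.0)
    return vector


def cv_distance(a: str, b: str, k: int = 5) -> float:
    """Cosine-based distance (1 - cos) / 2 between composition vectors; 0 means identical."""
    va, vb = composition_vector(a, k), composition_vector(b, k)
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    if norm == 0:
        raise ValueError("a composition vector is all zeros; sequence too short for this k")
    cosine = sum(x * y for x, y in zip(va, vb)) / norm
    return (1 - cosine) / 2


rng = random.Random(1)
seq_a = "".join(rng.choice("ACGT") for _ in range(600))
seq_b = seq_a[:300] + "".join(rng.choice("ACGT") for _ in range(300))
print(round(cv_distance(seq_a, seq_b, k=4), 3))
```

Each vector is computed independently of every other sequence, which is what makes this kind of representation easy to parallelize across compute nodes before the all-pairs distance and clustering steps.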

Conclusions

We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.