Similar articles
1.
The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis of inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones regarding the correlation with and error ratios in emulating DDH. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.

2.
A total of 594 fish genomes have been sequenced in the past two decades, representing 1.85% of the total reported fish species (32,000). Despite this, no study has described the trends, and only a few studies have examined how genome size (GS) is shaped by species taxonomy. Moreover, all of these studies used data obtained by traditional cytometric methods and largely disregarded other genome attributes, namely GC content (GC), number of chromosomes (CR), number of genes (GE), and protein count (PC). The present study used the most current data on fish genome attributes generated by whole-genome sequencing projects to understand the trends, the effect of taxonomy on the genome attributes (GS, GC, CR, GE, and PC), and the interrelations among these attributes. The trends show that the largest number of fish genomes was sequenced in 2020, that the order Cichliformes accounts for the highest number of published genomes, and that Illumina is the most widely used technology for sequencing fish genomes. Our analyses reveal some concrete trends for fishes as a whole and indicate strong selection for smaller genomes among all vertebrates and a strong effect of taxonomy on all genome attributes. They also show that fish GS differs significantly from that of birds, amphibians, reptiles, mammals and insects, whereas GC differs only from that of insects. An inverse relation was observed between GS and GC, and a direct relation between GS and CR, GE and PC. The results also indicate that the per-Mb values of all genome attributes decline with increasing GS.

3.
The genomic DNA-DNA hybridization (DDH) method has been widely used as a practical method for the determination of phylogenetic relationships between closely related biological strains. Traditional DDH methods have serious limitations including low reproducibility, a high background and a time-consuming procedure. The DDH method using a genome-probing microarray (GPM) has been recently developed to complement conventional methods and could be used to overcome the limitations that are typically encountered. It is necessary to compare the GPM-based DDH method to the conventional methods before using the GPM for the estimation of genomic similarities since all of the previous scientific data have been entirely dependent on conventional DDH methods. In order to address this issue we compared the DDH values obtained using the GPM, microplate and nylon membrane methods to multi-locus sequence typing (MLST) data for 9 Salmonella genomes and an Escherichia coli type strain. The results showed that the genome similarity values and the degrees of standard deviation obtained using the GPM method were lower than those obtained with the microplate and nylon membrane methods. The dendrogram from the cluster analysis of GPM DDH values was consistent with the phylogenetic tree obtained from the multi-locus sequence typing (MLST) data but was not similar to those obtained using the microplate and nylon membrane methods. Although the signal intensity had to be maximal when the targets were hybridized to their own probe, the methods using membranes and microplates frequently produced higher signals in the heterologous hybridizations than those obtained in the homologous hybridizations. Only the GPM method produced the highest signal intensity in homologous hybridizations. These results show that the GPM method can be used to obtain results that are more accurate than those generated by the other methods tested.

4.
Microsporidia are obligate intracellular parasites related to fungi, and since their discovery their classification and origin have been controversial because of their unique morphology. Early taxonomic studies of microsporidia were based on ultrastructural spore features, characteristics of their life cycle and transmission modes. However, taxonomy and phylogeny based solely on these characteristics can be misleading. SSU rRNA is a traditional marker used in taxonomic classification, but its power to resolve phylogenetic relationships between microsporidia is considered weak at the species level, as it may not show enough variation to distinguish closely related species. Overall genome relatedness indices (OGRI), such as average nucleotide identity (ANI), allow fast and easy-to-implement comparative measurements between genomes to assess species boundaries in prokaryotes, with a 95% cutoff value for grouping genomes of the same species. Owing to the increasing availability of complete genomes, metrics of genome relatedness have been applied to the taxonomy of eukaryotic microbes such as microsporidia. However, the distribution of ANI values and the cutoff values for species delimitation have not yet been fully tested in microsporidia. In this study we examined the distribution of ANI values for 65 publicly available microsporidian genomes and tested whether the 95% cutoff value is a good estimate for circumscribing species based on their genetic relatedness.
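To make the 95% ANI cutoff concrete, the following is a minimal sketch of grouping genomes into candidate species from a table of pairwise ANI values. The function name `species_clusters`, the input layout and the single-linkage grouping rule are assumptions made for illustration. The abstract does not say how the study grouped genomes, and the example values are invented.

```python
def species_clusters(genomes, ani, cutoff=95.0):
    """Group genomes whose pairwise ANI (in percent) meets the cutoff.

    Assumption: single-linkage grouping via union-find; `ani` maps
    (genome_a, genome_b) pairs to ANI percentages. Hypothetical helper.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Invented example values, for illustration only.
example_ani = {("g1", "g2"): 97.2, ("g1", "g3"): 81.0, ("g2", "g3"): 80.5}
print(species_clusters(["g1", "g2", "g3"], example_ani))
```

With single linkage, two genomes below the cutoff can still end up in one group if a third genome links them, which is one reason a fixed cutoff needs testing on each lineage.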

5.
Taxonomy in the second decade of the 21st century is benefiting from technological advances in molecular microbiology, especially those related to genomics. Gene and genome databases are significantly increasing due to intense research activities in the field of molecular ecology and genomics. Taxa, and especially species, are tailored by means of the recognition of a phylogenetic, genomic and phenotypic coherence that reveal their uniqueness in the classification schema. Phylogenetic coherence is mainly revealed by means of 16S rRNA gene analyses for which curated databases such as EzTaxon and LTP provide a valuable tool for tree reconstruction to taxonomy users. On the other hand, in silico full or partial genomic sequence comparisons are called on to substitute cumbersome techniques such as DNA-DNA hybridization (DDH) to genomically circumscribe species. DDH similarity values around 70% would be equivalent to ANI values of 96%. Finally, finding an exclusive phenotypic property for the taxa to be classified is of paramount relevance to producing an operative and predictive classification system. The current methods used for taxonomic classification require significant laboratory experimentation, and generally will not produce interactive databases. The new high-throughput metabolomic technologies, such as ICR-FT and MALDI-TOF mass spectrometry methods, open the door to the construction of metabolic databases for taxonomic purposes. It is to be foreseen that, in the future, taxonomists will benefit significantly from public databases speeding up the classification process. However, serious effort will be needed to harmonize them and to prevent inaccurate material.

7.
DNA-DNA hybridizations (DDH) play a key role in microbial species discrimination in cases when 16S rRNA gene sequence similarities are 97% or higher. Using real-world 16S rRNA gene sequences and DDH data, we here re-investigate whether or not, and in which situations, this threshold value might be too conservative. Statistical estimates of these thresholds are calculated in general as well as more specifically for a number of phyla that are frequently subjected to DDH. Among several methods to infer 16S gene sequence similarities investigated, most of those routinely applied by taxonomists appear well suited for the task. The effects of using distinct DDH methods also seem to be insignificant. Depending on the investigated taxonomic group, a threshold between 98.2 and 99.0% appears reasonable. In that way, up to half of the currently conducted DDH experiments could safely be omitted without a significant risk for wrongly differentiated species.

8.
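Image registration is an important technique in image processing that can be used to analyze the similarity between two images. This paper proposes a new method for analyzing evolutionary relationships among species based on image registration. First, a first-order Markov chain is used to compute the oligonucleotide transition probability matrix of each genome sequence. Next, each transition probability matrix is converted into a color image matrix, and a joint histogram is drawn for the color image matrices of every pair of species. Finally, the distribution of the points in the joint histogram is analyzed, and a dispersion formula for the histogram point set is introduced as the similarity measure, which indicates how closely the species are related. Results computed on the whole genomes of 100 bacteria show that, compared with single-gene methods or methods based on differences in genomic oligonucleotide frequency composition, the proposed method offers higher accuracy and resolving power: it distinguishes taxa below the family level well and also discriminates taxa above the family level reasonably well. The method is expected to develop into an effective tool for species identification and phylogenetic inference.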
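The first step described above, estimating a first-order Markov transition probability matrix from a genome sequence, can be sketched as follows. This is an illustration only: it works on single nucleotides rather than the longer oligonucleotides the abstract mentions, the function name `transition_matrix` is hypothetical, and the later image-conversion and joint-histogram steps are not reproduced because the abstract gives no formulas for them.

```python
def transition_matrix(seq, alphabet="ACGT"):
    """Row-normalized first-order transition probabilities between symbols.

    Assumption: states are single nucleotides; pairs containing a symbol
    outside `alphabet` (for example N) are skipped.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    size = len(alphabet)
    counts = [[0] * size for _ in range(size)]

    # Count every adjacent pair (current symbol -> next symbol).
    for current, following in zip(seq, seq[1:]):
        if current in index and following in index:
            counts[index[current]][index[following]] += 1

    # Divide each row by its total so that it sums to 1 (or stays all zero).
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


for row in transition_matrix("AACAGT"):
    print(row)
```

Each row gives the estimated probability of the next nucleotide given the current one. A matrix like this is the kind of object the method then renders as an image for pairwise comparison.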

9.
Streptomyces hygroscopicus and related species are the best-known candidate producers of antibiotics and many other industrially and agronomically important secondary metabolites in the genus Streptomyces. Multilocus sequence analysis (MLSA) has been shown to be a powerful and pragmatic molecular method for unraveling streptomycete diversity. In this investigation, a multilocus phylogeny of 58 representatives of the S. hygroscopicus 16S rRNA gene clade, including S. violaceusniger and related species, was examined. The results demonstrated that the MLSA data were helpful in defining members of the S. hygroscopicus clade, providing further evidence that the MLSA scheme of five housekeeping genes (atpD, gyrB, recA, rpoB and trpB) is a valuable alternative for creating and maintaining operational protocols for Streptomyces species assignment. DNA-DNA hybridization (DDH) between strains with representative MLSA evolutionary distances, combined with previous data from the S. griseus and S. albidoflavus clades, revealed a high correlation between MLSA and DDH and supports the view that a five-gene nucleotide sequence distance of 0.007 could be considered the species cut-off for the whole genus. This significant correlation thus makes the MLSA scheme applicable to the construction of a theory-based taxonomy for both ecology and bioprospecting of streptomycetes. Based on the MLSA and DDH data, as well as phenotypic characteristics, 10 species and three subspecies of the S. hygroscopicus clade are considered to be later heterotypic synonyms of eight genomic species, and Streptomyces glebosus sp. nov., comb. nov. (type strain CGMCC 4.1873(T)=LMG 19950(T)=DSM 40823(T)) and Streptomyces ossamyceticus sp. nov., comb. nov. (type strain CGMCC 4.1866(T)=LMG 19951(T)=DSM 40824(T)) are also proposed.

10.
Amplified fragment length polymorphism (AFLP) analysis allows a rapid, relatively simple analysis of a large portion of a microbial genome, providing information about the species and its phylogenetic relationship to other microbes (Vos et al. 1995). The method simply surveys the genome for length and sequence polymorphisms. The AFLP pattern identified can be used for comparison to the genomes of other species. Unlike other methods, it does not rely on analysis of a single genetic locus that may bias the interpretation of results and does not require any prior knowledge of the targeted organism. Moreover, a standard set of reagents can be applied to any species without using species-specific information or molecular probes. We are using AFLP analysis to rapidly identify different bacterial species. A comparison of AFLP profiles generated from a large battery of Bacillus anthracis strains shows very little variability among different isolates (Keim et al. 1997). By contrast, there is a significant difference between AFLP profiles generated for any B. anthracis strain and even the most closely related Bacillus species. Sufficient variability is apparent among all known microbial species to allow phylogenetic analysis based on large numbers of genetically unlinked loci. These striking differences among AFLP profiles allow unambiguous identification of previously identified species and phylogenetic placement of newly characterized isolates relative to known species based on a large number of independent genetic loci. Data generated thus far show that the method provides phylogenetic analyses that are consistent with other widely accepted phylogenetic methods. However, AFLP analysis provides a more detailed analysis of the targets and samples a much larger portion of the genome. Consequently, it provides an inexpensive, rapid means of characterizing microbial isolates to further differentiate among strains and closely related microbial species. 
Such information cannot be rapidly generated by other means. AFLP sample analysis quickly generates a very large amount of molecular information about microbial genomes. However, this information cannot be analysed rapidly using manual methods. We are developing a large archive of electronic AFLP signatures that is being used to identify isolates collected from medical, veterinary, forensic and environmental samples. We are also developing the computational packages necessary to rapidly and unambiguously analyse the AFLP profiles and conduct a phylogenetic comparison of these data relative to information already in our database. We will use this archive and the associated algorithms to determine the species identity of previously uncharacterized isolates and place them phylogenetically relative to other microbes based on their AFLP signatures. This study provides significant new information about microbes with environmental, veterinary and medical significance. This information can be used in further studies to understand the relationships among these species and the factors that distinguish them from one another. It should also allow the identification of unique factors that contribute to important microbial traits, including pathogenicity and virulence. We are also using AFLP data to identify, isolate and sequence DNA fragments that are unique to particular microbial species and strains. The fragment patterns and sequence information provide insights into the complexity and organization of bacterial genomes relative to one another. They also provide the information necessary for the development of species-specific polymerase chain reaction primers that can be used to interrogate complex samples for the presence of B. anthracis, other microbial pathogens or their remnants.

11.

Background  

Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study.

12.
Given the considerable promise whole-genome sequencing offers for phylogeny and classification, it is surprising that microbial systematics and genomics have not yet been reconciled. This might be due to the intrinsic difficulties in inferring reasonable phylogenies from genomic sequences, particularly in the light of the significant amount of lateral gene transfer in prokaryotic genomes. However, recent studies indicate that the species tree and the hierarchical classification based on it are still meaningful concepts, and that state-of-the-art phylogenetic inference methods are able to provide reliable estimates of the species tree to the benefit of taxonomy. Conversely, we suspect that the current lack of completely sequenced genomes for many of the major lineages of prokaryotes and for most type strains is a major obstacle in progress towards a genome-based classification of microorganisms. We conclude that phylogeny-driven microbial genome sequencing projects such as the Genomic Encyclopaedia of Archaea and Bacteria (GEBA) project are likely to rectify this situation.

13.
The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA and MLSA, and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a small number of strains was enough to explain most of the CDS diversity in the core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genomes and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR.

14.
Identification of species has long been done by phenotype-based methodologies. Recently, genotype-based species identification has been shown to be possible by way of genome profiling, which is based on a temperature gradient gel electrophoresis (TGGE) analysis of random PCR products. However, the results provided by genome profiling, though rich in information, were complicated and difficult to deal with objectively. To cope with this, a technology utilizing species identification dots (spiddos), which correspond to structural transition points of DNAs, was introduced. The pattern similarity score (PaSS), derived from spiddos, was shown to be usable for quantitatively measuring the closeness between genomes. This was demonstrated with experiments on the genomes of Escherichia coli O157:H7 (19 strains). The same genomes were also examined by sequencing and RFLP methods in order to compare the effectiveness of these three methods. As a result, the spiddos method was shown to give reasonable results and to be the most advantageous for measuring the closeness between species in general. This means that spiddos is pushing the heavy gate open for genome microbiology.

15.
Composition Vector Tree (CVTree) is an alignment-free algorithm to infer phylogenetic relationships from genome sequences. It has been successfully applied to study the phylogeny and taxonomy of viruses, prokaryotes, and fungi based on whole genomes, as well as chloroplast genomes, mitochondrial genomes, and metagenomes. Here we present standalone software for the CVTree algorithm. In the software, an extensible parallel workflow for the CVTree algorithm was designed. Based on this workflow, new alignment-free methods were also implemented. By examining the phylogeny and taxonomy of 13,903 prokaryotes based on 16S rRNA sequences, we show that the CVTree software is an efficient and effective tool for studying phylogeny and taxonomy based on genome sequences. The code of the CVTree software is available at https://github.com/ghzuo/cvtree.

16.
Molecular chaperones are a wide group of unrelated protein families whose role is to assist other proteins. Comparably, under environmental stress, stress proteins behave as biocatalysts of protein stabilization. Stress proteins include a large class of proteins that were originally termed heat shock proteins (HSPs) due to their initial discovery in tissues exposed to elevated temperatures. Many, but not all, stress proteins and HSPs are molecular chaperones. Moreover, not all HSPs are derivable from stress. HSPs are structurally diversified by the contribution of various domains having specific roles. HSPs have been grouped, mainly on the basis of their molecular masses, into specific families that include small HSPs (sHSPs)/alpha-crystallins, HSP10s, HSP40s, HSP60s, HSP70s, HSP90s, HSP100s and HSP110s. The names of these major families are historical artefacts with limited information content. Using the current databases, names and protein domains of many molecular chaperones in different species were analyzed. Although traditional names of HSPs are trivial, it is unrealistic to suggest replacing them, because they are preferred and widely used. Here we suggest that these traditional names be chaperoned, in silico, by a systematic nomenclature. Thus, for example, with the same intent of use of [trioxygen: O3] for ozone, we propose here C7HSP70[Ehsa]ER-P11021 for GRP78 (78 kDa endoplasmic Human molecular chaperone in HSP70 superfamily with P11021 as its accession number in the database of the National Center for Biotechnology Information (NCBI)). The proposed systematic computer-oriented naming and classification method is designed for HSPs and also their partners based on the number of amino acids, domain structure, phylogenetic domain, localization in the cell and accession number as stated in the NCBI. Arabidopsis thaliana was analyzed as a model, because it contains a large number of various HSPs localized in several organelles.
Overall, this naming system helps in building, optimizing and managing a novel online database entirely devoted to HSPs. The proposed taxonomy, coupled with the newly constructed database, can contribute to studies involving large amounts of stored data on HSPs.

17.
18.
Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation.
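The genus rule stated above (all pairwise POCP values higher than 50%) can be sketched in a few lines. The abstract does not spell out how POCP is computed, so the formula used here, the conserved proteins of both genomes divided by the total proteins of both genomes, is an assumption, as are the function names and the example counts.

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes.

    Assumed formula (not given in the abstract):
    100 * (C1 + C2) / (T1 + T2), where C is the number of conserved
    proteins and T the total number of proteins in each genome.
    """
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def same_genus(pairwise_pocp, threshold=50.0):
    """True if every pairwise POCP value in the group exceeds the threshold."""
    return all(value > threshold for value in pairwise_pocp.values())


# Invented protein counts, for illustration only.
value = pocp(conserved_1=600, conserved_2=650, total_1=1000, total_2=1200)
print(round(value, 1))
print(same_genus({("sp1", "sp2"): value, ("sp1", "sp3"): 49.0}))
```

Because the rule requires every pair to pass, a single distant species is enough to break a proposed genus, which the second call illustrates.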

19.
In this review we discuss the use of non-coding DNA at the intraspecific level in plants. Both nuclear and organelle non-coding regions are widely used in interspecific phylogenetic approaches. However, they are also valuable in analyses at the intraspecific level. Besides taxonomy, that is, defining subspecies or varieties, large fields for the application of non-coding DNA are population genetic and phylogeographic studies. Population genetics tries to explain the genetic patterns within species mostly by the amount of extant gene flow among populations, while phylogeography explicitly tries to reconstruct historic events. Depending on the study, different molecular markers can be used, ranging from very fast-evolving microsatellites to more slowly changing regions such as intergenic spacers and introns. Here, we focus mainly on the use of non-coding regions in phylogeographic analyses. The regions most often used in this context are those of the chloroplast and mitochondrial genomes. In phylogeography, the correct estimation of allele or haplotype relationships is particularly important. As tree-based methods are mostly insufficient to depict relationships within species, network approaches are better suited to inferring gene or locus genealogies. Alleles shared among multiple species, which could result from either hybridization or incomplete lineage sorting, are problematic for phylogeographic studies. The latter in particular can severely influence the interpretation of phylogeographic patterns. Therefore, it seems necessary to us to also include close relatives of the species under study in phylogeographic analyses. Not only the sample design but also the analysis methods are currently changing, as some new methods such as statistical phylogeography have emerged recently and widely used methods like nested clade analysis might not be reliable in every case.
During the last few years, a multitude of studies have been published, which mainly analyzed phylogeographic patterns in European and North American plants. Phylogeographic studies in other regions of the earth are still comparatively rare, although questions like the influence of the ice age on the vegetation in the tropics or southern hemisphere are still open and phylogeography provides an excellent means of answering them.

20.
Ciliates are a large group of ubiquitous and highly diverse single-celled eukaryotes that play an essential role in the functioning of microbial food webs. However, their genomic diversity is far from clear due to the need to develop cultivation methods for most species, so most research is based on wild organisms that almost invariably contain contaminants. Here we establish an integrated Genome Decontamination Pipeline (iGDP) that combines homology search, telomere reads-assisted and clustering approaches to filter contaminated ciliate genome assemblies from wild specimens. We benchmarked the performance of iGDP using genomic data from a contaminated ciliate culture and the results showed that iGDP could recall 91.9% of the target sequences with 96.9% precision. We also used a synthetic dataset to offer guidelines for the application of iGDP in the removal of various groups of contaminants. Compared with several popular metagenome binning tools, iGDP showed better performance. To further validate the effectiveness of iGDP on real-world data, we applied it to decontaminate genome assemblies of three wild ciliate specimens and obtained their genomes with high quality comparable to that of previously well-studied model ciliate genomes. It is anticipated that the newly generated genomes and the established iGDP method will be valuable community resources for detailed studies on ciliate biodiversity, phylogeny, ecology and evolution. The pipeline ( https://github.com/GWang2022/iGDP ) can be implemented automatically to reduce manual filtering and classification and may be further developed to apply to other microeukaryotes.
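The recall and precision figures quoted for the benchmark can be read as set overlaps between the sequences a decontamination step keeps and the true target sequences. The sketch below shows that calculation with invented contig names. It is not part of iGDP, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(kept, target):
    """Precision and recall of a filtering step, computed on sequence IDs.

    precision = kept sequences that are true targets / all kept sequences
    recall    = kept sequences that are true targets / all true targets
    """
    kept, target = set(kept), set(target)
    true_positives = len(kept & target)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / len(target) if target else 0.0
    return precision, recall


# Invented contig IDs: three true ciliate contigs kept, one contaminant kept,
# one true ciliate contig lost.
kept_contigs = ["c1", "c2", "c3", "x1"]
ciliate_contigs = ["c1", "c2", "c3", "c4"]
print(precision_recall(kept_contigs, ciliate_contigs))
```

High precision means few contaminants survive the filter, while high recall means few genuine target sequences are discarded.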
