Similar literature
 20 similar articles found (search time: 15 ms)
1.
Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but also on the methods used to assess clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.

2.
Geographical distances between host populations are key determinants of how many parasite species they share. In principle, decay in similarity should also occur with increasing distance along any other dimension that characterizes some form of separation between communities. Here, we apply the biogeographical concept of distance decay in similarity to ontogenetic changes in the metazoan parasite communities of three species of marine fish from the Atlantic coast of South America. Using differences in body length between all possible pairs of size classes as measures of ontogenetic distances, we find that, using an index of similarity (Bray-Curtis) that takes into account the abundance of each parasite species, the similarity in parasite communities showed a very clear decay pattern; using an index (Jaccard) based on presence/absence of species only, we obtained slightly weaker but nevertheless similar patterns. As we predicted, the slope of the decay relationship was significantly steeper in the fish Cynoscion guatucupa, which goes through clear ontogenetic changes in diet and therefore in exposure to parasites, than in the other species, Engraulis anchoita and Micropogonias furnieri, which maintain a roughly similar diet throughout their lives. In addition, we found that for any given ontogenetic distance, i.e. for a given length difference between two size classes, the similarity in parasite communities was almost always higher if they were adult size classes, and almost always lower if they were juvenile size classes. This, combined with comparisons among individual fish within size classes, shows that parasite communities in juvenile fish are variable and subject to stochastic effects. We propose the distance decay approach as a rigorous and quantitative method to measure rates of community change as a function of host age, and for comparisons across host species to elucidate the role of host ecology in the development of parasite assemblages.
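
The two similarity indices named in this abstract can be illustrated with a minimal sketch. It assumes each size class is summarized as a list of parasite abundances in a fixed species order; the function names and the toy abundance vectors are hypothetical and do not come from the study.

```python
def bray_curtis_similarity(a, b):
    """Abundance-based similarity: 1 - sum|a_i - b_i| / sum(a_i + b_i)."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / total


def jaccard_similarity(a, b):
    """Presence/absence similarity: shared species / species found in either sample."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return len(present_a & present_b) / len(union)


# Hypothetical abundances of four parasite species in two size classes.
small_fish = [10, 0, 5, 1]
large_fish = [8, 2, 5, 0]

print(bray_curtis_similarity(small_fish, large_fish))
print(jaccard_similarity(small_fish, large_fish))
```

Computing either index for every pair of size classes and regressing it on the difference in body length would give a decay slope of the kind the abstract compares across host species.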

3.

Background

Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches that delineate molecular operational taxonomic units have been applied for this purpose, with arbitrary choices of distance threshold values and clustering algorithms.

Methodology

Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews.

Conclusions

A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.

4.
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers have been published citing works and discussions on ISP in the genomic era, scientists are yet to find all the factors responsible for such a universal phenomenon in the chromosomes. In the present work, we have tried to address the issue from a new perspective, which is a parallel feature to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of specific size. The frequency distributions of the mono/oligonucleotides among the regions were also compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides revealed statistical similarity, which we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and chromosomes of a eukaryote are found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and composition of forward-encoded sequences between the strands in chromosomes are found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.

5.
Samples of the ‘Himantura uarnak’ species complex (H. leoparda, H. uarnak, H. undulata under their current definitions), mostly from the Coral Triangle, were analyzed using nuclear markers and mitochondrial DNA sequences. Genotypes at five intron loci showed four reproductively isolated clusters of individuals. The COI sequences showed four major mitochondrial lineages, each diagnostic of a cluster as defined by nuclear markers. No mitochondrial introgression was detected. The average Kimura-2 parameter nucleotide distance separating clades was 0.061–0.120 (net: 0.055–0.114), while the distance separating individuals within a clade was 0.002–0.008. Additional, partial cytochrome-b gene sequences were used to link these samples with previously published sequences of reference specimens of the three nominal species. One of the clusters was identified as H. undulata and another one, as H. uarnak, while two cryptic species were uncovered within the recently-described H. leoparda, challenging the current morphology-based taxonomy of species within the H. uarnak species complex.
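
The Kimura 2-parameter distance reported above follows the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. The sketch below is illustrative only: the function name is hypothetical, skipping gaps and ambiguous bases is an assumption, and it is not the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises a math domain error if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.1 in the toy example) because the model allows for multiple substitutions at the same site.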

6.
High-elevation cold environments are considered ideal places to test hypotheses about mechanisms of bacterial colonization and succession, and about bacterial biogeography. Debris-covered glaciers (glaciers whose ablation area is mainly covered by a continuous layer of rock debris fallen from the surrounding mountains) have never been investigated in this respect so far. We used the Illumina technology to analyse the V5 and V6 hypervariable regions of the bacterial 16S rRNA gene amplified from 38 samples collected in July and September 2009 at different distances from the terminus on two debris-covered glaciers (Miage and Belvedere, Italian Alps). Heterotrophic taxa dominated the communities, and bacterial community structure changed according to ice ablation rate, organic carbon content of the debris and distance from the glacier terminus. Bacterial communities therefore change during downwards debris transport, and organic carbon of these recently exposed substrates is probably provided more by allochthonous deposition of organic matter than by primary production by autotrophic organisms. We also investigated whether phylotypes of the genus Polaromonas, which is ubiquitous in cold environments, show a biogeographical distribution by analysing the sequences retrieved in this study together with others available in the literature. We found that the genetic distance among phylotypes increased with geographic distance; however, more focused analyses using discrete distance classes revealed that sequences collected at sites <100 km apart and at sites 9400–13 500 km apart were more similar than those collected at other distance classes. Evidence for a biogeographic distribution of Polaromonas phylotypes was therefore conflicting.

7.
Correcting errors in synthetic DNA through consensus shuffling
Although efficient methods exist to assemble synthetic oligonucleotides into genes and genomes, these suffer from the presence of 1–3 random errors/kb of DNA. Here, we introduce a new method termed consensus shuffling and demonstrate its use to significantly reduce random errors in synthetic DNA. In this method, errors are revealed as mismatches by re-hybridization of the population. The DNA is fragmented, and mismatched fragments are removed upon binding to an immobilized mismatch binding protein (MutS). PCR assembly of the remaining fragments yields a new population of full-length sequences enriched for the consensus sequence of the input population. We show that two iterations of consensus shuffling improved a population of synthetic green fluorescent protein (GFPuv) clones from ~60 to >90% fluorescent, and decreased errors 3.5- to 4.3-fold to final values of ~1 error per 3500 bp. In addition, two iterations of consensus shuffling corrected a population of GFPuv clones where all members were non-functional, to a population where 82% of clones were fluorescent. Consensus shuffling should facilitate the rapid and accurate synthesis of long DNA sequences.

8.
9.
10.
The analysis of functional diversity and its dynamics in the environment is essential for understanding the microbial ecology and biogeochemistry of aquatic systems. Here we describe the development and optimization of a DNA microarray method for the detection and quantification of functional genes in the environment and report on its preliminary application to the study of the denitrification gene nirS in the Choptank River-Chesapeake Bay system. Intergenic and intragenic resolution constraints were determined by an oligonucleotide (70-mer) microarray approach. Complete signal separation was achieved when comparing unrelated genes within the nitrogen cycle (amoA, nifH, nirK, and nirS) and detecting different variants of the same gene, nirK, corresponding to organisms with two different physiological modes, ammonia oxidizers and denitrifying halobenzoate degraders. The limits of intragenic resolution were investigated with a microarray containing 64 nirS sequences comprising 14 cultured organisms and 50 clones obtained from the Choptank River in Maryland. The nirS oligonucleotides covered a range of sequence identities from approximately 40 to 100%. The threshold values for specificity were determined to be 87% sequence identity and a target-to-probe perfect match-to-mismatch binding free-energy ratio of 0.56. The lower detection limit was 10 pg of DNA (equivalent to approximately 10^7 copies) per target per microarray. Hybridization patterns on the microarray differed between sediment samples from two stations in the Choptank River, implying important differences in the composition of the denitrifier community along an environmental gradient of salinity, inorganic nitrogen, and dissolved organic carbon. This work establishes a useful set of design constraints (independent of the target gene) for the implementation of functional gene microarrays for environmental applications.

11.
Phylogenetic analysis of 16S ribosomal DNA (rDNA) clones obtained by PCR from uncultured bacteria inhabiting a wide range of environments has increased our knowledge of bacterial diversity. One possible problem in the assessment of bacterial diversity based on sequence information is that PCR is exquisitely sensitive to contaminating 16S rDNA. This raises the possibility that some putative environmental rRNA sequences in fact correspond to contaminant sequences. To document potential contaminants, we cloned and sequenced PCR-amplified 16S rDNA fragments obtained at low levels in the absence of added template DNA. 16S rDNA sequences closely related to the genera Duganella (formerly Zoogloea), Acinetobacter, Stenotrophomonas, Escherichia, Leptothrix, and Herbaspirillum were identified in contaminant libraries and in clone libraries from diverse, generally low-biomass habitats. The rRNA sequences detected possibly are common contaminants in reagents used to prepare genomic DNA. Consequently, their detection in processed environmental samples may not reflect environmentally relevant organisms.

12.

Background

The intra- and inter-species genetic diversity of bacteria and the absence of ‘reference’, or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia.

Methods

A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization.

Results

The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as ‘centroids’ in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

Conclusion

The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra-species variability.
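
The notion of a cluster 'centroid' used in this abstract, the sequence from which the distances to all other cluster members are minimized, can be sketched as a medoid search over a pairwise distance matrix. This is only an illustration under that reading: the function name and the toy matrix are hypothetical, and the authors' linear mapping (LM) algorithm itself is not reproduced here.

```python
def medoid(distances, members):
    """Return the member whose summed distance to all other members is smallest.

    distances: square matrix (list of lists) of pairwise sequence distances.
    members:   indices of the sequences that belong to one cluster.
    """
    return min(members, key=lambda i: sum(distances[i][j] for j in members))


# Toy 3x3 distance matrix for three sequences in a single cluster.
toy_distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]

print(medoid(toy_distances, [0, 1, 2]))
```

In the toy matrix, sequence 1 has the smallest summed distance (3.0) and would be taken as the most representative sequence of the cluster.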

13.
Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
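
The standard (Shannon) form of the weighted Jensen-Shannon divergence that this abstract starts from can be written as the entropy of the weighted mixture minus the weighted mean of the individual entropies. The sketch below covers only that baseline case for any number of distributions; the function names are hypothetical, and the Tsallis and Markovian generalizations proposed in the paper are not implemented.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(distributions, weights=None):
    """Weighted JSD: H(sum_i w_i P_i) - sum_i w_i H(P_i)."""
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    size = len(distributions[0])
    mixture = [
        sum(w * d[k] for w, d in zip(weights, distributions))
        for k in range(size)
    ]
    mean_entropy = sum(w * shannon_entropy(d) for w, d in zip(weights, distributions))
    return shannon_entropy(mixture) - mean_entropy


# Two distributions with no symbols in common are maximally divergent (1 bit).
print(jensen_shannon_divergence([[1.0, 0.0], [0.0, 1.0]]))
```

For sequence segmentation, the distributions would typically be nucleotide frequencies of adjacent segments, with weights proportional to segment length; that pairing is an assumption here rather than a detail given in the abstract.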

14.
E. coli Integration host factor (IHF) condenses the bacterial nucleoid by wrapping DNA. Previously, we showed that DNA flexibility compensates for structural characteristics of the four consensus recognition elements associated with specific binding (Aeling et al., J. Biol. Chem. 281, 39236–39248, 2006). If elements are missing, high-affinity binding occurs only if DNA deformation energy is low. In contrast, if all elements are present, net binding energy is unaffected by deformation energy. We tested two hypotheses for this observation: in complexes containing all elements, (1) stiff DNA sequences are less bent upon binding IHF than flexible ones; or (2) DNA sequences with differing flexibility have interactions with IHF that compensate for unfavorable deformation energy. Time-resolved Förster resonance energy transfer (FRET) shows that global topologies are indistinguishable for three complexes with oligonucleotides of different flexibility. However, pressure perturbation shows that the volume change upon binding is smaller with increasing flexibility. We interpret these results in the context of Record and coworkers' model for IHF binding (J. Mol. Biol. 310, 379–401, 2001). We propose that the volume changes reflect differences in hydration that arise from structural variation at IHF–DNA interfaces while the resulting energetic compensation maintains the same net binding energy.

15.
16.
The Sichuan snub-nosed monkey (Rhinopithecus roxellanae) is a species endemic to China, and its distribution is the widest among all snub-nosed monkeys in China. To clarify whether there is subspecific differentiation within this species, we determined partial sequences of the cytochrome-b gene from four populations of R. roxellanae. First, 402 bp of the partial sequences from R. roxellanae were compared with those from R. bieti and R. avunculus, and the phylogenetic tree was constructed by the neighbor-joining method. The genetic distance was only 0–0.002 among the four populations, and their sequences constituted a monophyletic group. Further, comparison of longer sequences (735 bp) among the four populations revealed that there were only four substitutions and the genetic distance was only 0.001–0.005 among them. Thus, we suggest that, at least on mtDNA phylogeny, the difference among the four populations does not reach the subspecies level, and that this species should be recognized as a monotypic species.

17.
Molecular phylogenies based on chloroplast gene rps4 sequences and nuclear ribosomal ITS sequences have been generated to investigate relationships among species and putative segregates in Plagiochila (Plagiochilaceae), the largest genus of leafy liverworts. About a fourth of the ca. 450 accepted binomials of Plagiochilaceae are included in these phylogenetic analyses, several represented by multiple accessions. A clade with Chiastocaulon, Pedinophyllum, and Plagiochilion is placed sister to a clade with numerous accessions of Plagiochila. Plagiochila pleurata and P. fruticella are resolved sister to the remainder of Plagiochilaceae and transferred to the new Australasian genus Proskauera which differs from all other Plagiochilaceae by the occurrence of spherical leaf papillae. The historical biogeography of Plagiochilaceae is explored based on the reconstructions of the phylogeny, biogeographic patterns and diversification time estimates. The results indicate that the current distribution of Plagiochilaceae cannot be explained exclusively by Gondwanan vicariance. A more feasible explanation of the range is a combination of short distance dispersal, rare long distance dispersal events, extinction, recolonization and diversification.

18.
We investigated the phylogeography and subspecies classification of the ostrich (Struthio camelus) by assessing patterns of variation in mitochondrial DNA control region (mtDNA-CR) sequence and across fourteen nuclear microsatellite loci. The current consensus taxonomy of S. camelus names five subspecies based on morphology, geographic range, mtDNA restriction fragment length polymorphism and mtDNA-CR sequence analysis: S. c. camelus, S. c. syriacus, S. c. molybdophanes, S. c. massaicus and S. c. australis. We expanded a previous mtDNA dataset from 18 individual mtDNA-CR sequences to 123 sequences, including sequences from all five subspecies. Importantly, these additional sequences included 43 novel sequences of the red-necked ostrich, S. c. camelus, obtained from birds from Niger. Phylogeographic reconstruction of these sequences matches previous results, with three well-supported clades containing S. c. camelus/syriacus, S. c. molybdophanes, and S. c. massaicus/australis, respectively. The 14 microsatellite loci assessed for 119 individuals of four subspecies (all but S. c. syriacus) showed considerable variation, with an average of 13.4 (±2.0) alleles per locus and a mean observed heterozygosity of 55.7 (±5.3)%. These data revealed high levels of variation within most subspecies, and a structure analysis revealed strong separation between each of the four subspecies. The level of divergence across both marker types suggests the consideration of separate species status for S. c. molybdophanes, and perhaps also for S. c. camelus/syriacus. Both the mtDNA-CR and microsatellite analyses also suggest that there has been no recent hybridization between the subspecies. These findings are of importance for management of the highly endangered red-necked subspecies (S. c. camelus) and may warrant its placement onto the IUCN red list of threatened animals.

19.
Classification of high-throughput genomic data is a powerful method to assign samples to subgroups with specific molecular profiles. Consensus partitioning is the most widely applied approach to reveal subgroups by summarizing a consensus classification from a list of individual classifications generated by repeatedly executing clustering on random subsets of the data. It is able to evaluate the stability of the classification. We implemented a new R/Bioconductor package, cola, that provides a general framework for consensus partitioning. With cola, various parameters and methods can be user-defined and easily integrated into different steps of an analysis, e.g., feature selection, sample classification or defining signatures. cola provides a new method named ATC (ability to correlate to other rows) to extract features and recommends spherical k-means clustering (skmeans) for subgroup classification. We show that ATC and skmeans have better performance than other commonly used methods by a comprehensive benchmark on public datasets. We also benchmark key parameters in the consensus partitioning procedure, which helps users to select optimal parameter values. Moreover, cola provides rich functionalities to apply multiple partitioning methods in parallel and directly compare their results, as well as rich visualizations. cola can automate the complete analysis and generates a comprehensive HTML report.
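
The summarizing step of consensus partitioning described above can be sketched as a consensus matrix: for each pair of samples, the fraction of clustering runs in which both were drawn and assigned to the same group. The sketch is written in Python for illustration and is not the cola API (cola is an R/Bioconductor package); the function name, the input layout and the toy runs are all hypothetical.

```python
def consensus_matrix(runs, n_samples):
    """Fraction of runs in which each pair of samples shared a cluster.

    runs: list of dicts mapping sample index -> cluster label for the samples
          drawn in that run; labels only need to be consistent within a run.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        items = list(labels.items())
        for i, label_i in items:
            for j, label_j in items:
                sampled[i][j] += 1
                if label_i == label_j:
                    together[i][j] += 1
    return [
        [
            together[i][j] / sampled[i][j] if sampled[i][j] else float("nan")
            for j in range(n_samples)
        ]
        for i in range(n_samples)
    ]


# Three hypothetical runs over three samples; the last run drew only samples 0 and 1.
toy_runs = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "y", 2: "y"},
    {0: 0, 1: 0},
]
matrix = consensus_matrix(toy_runs, 3)
print(matrix[0][1])
```

Entries close to 0 or 1 indicate pairs that are assigned consistently across runs, which is one way to read the stability of a classification.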

20.

Key message

The number of SNPs required for QTL discovery is justified by the distance at which linkage disequilibrium has decayed. Simulations and real potato SNP data showed how to estimate and interpret LD decay.

Abstract

The magnitude of linkage disequilibrium (LD) and its decay with genetic distance determine the resolution of association mapping, and are useful for assessing the desired numbers of SNPs on arrays. To study LD and LD decay in tetraploid potato, we simulated autotetraploid genotypes and used them to explore the dependence on: (1) the number of haplotypes in the population (the amount of genetic variation) and (2) the percentage of haplotype specific SNPs (hs-SNPs). Several estimators for short-range LD were explored, such as the average r², median r², and other percentiles of r² (80, 90, and 95 %). For LD decay, we looked at LD½,90, the distance at which the short-range LD is halved when using the 90th percentile of r² at short range as the estimator for LD. Simulations showed that the performance of various estimators for LD decay strongly depended on the number of haplotypes, although the real value of LD decay was not influenced very much by this number. The estimator LD½,90 was chosen to evaluate LD decay in 537 tetraploid varieties. LD½,90 values were 1.5 Mb for varieties released before 1945 and 0.6 Mb in varieties released after 2005. LD½,90 values within three different subpopulations ranged from 0.7 to 0.9 Mb. LD½,90 was 2.5 Mb for introgressed regions, indicating large haplotype blocks. In pericentromeric heterochromatin, LD decay was negligible. This study demonstrates that several related factors influencing LD decay could be disentangled, that no universal approach can be suggested, and that the estimation of LD decay has to be performed with great care and knowledge of the sampled material.