Similar documents
20 similar documents found
1.
In temperate forest soils, filamentous ectomycorrhizal and saprotrophic fungi affiliated with the Agaricomycetes and Pezizomycotina contribute to key biological processes. The diversity of soil fungal communities is usually estimated by studying molecular markers such as nuclear ribosomal gene regions amplified from soil-extracted DNA. However, this approach only reveals the presence of the corresponding genomic DNA in the soil sample and may not reflect the diversity of the metabolically active species. To circumvent this problem, we investigated the performance of the mitochondrial cytochrome c oxidase 1 (COX1)-encoding gene as a fungal molecular marker for environmental RNA-based studies. We designed PCR primers to specifically amplify Agaricomycetes and Pezizomycotina COX1 partial sequences and amplified them from both soil DNA and reverse-transcribed soil RNA. As a control, we also amplified the nuclear internal transcribed spacer ribosomal region from soil DNA. Fungal COX1 sequences were readily amplified from soil-extracted nucleic acids and were not significantly contaminated by nontarget sequences. We show that the relative abundance of fungal taxonomic groups differed between the sequence data sets; for example, ascomycete COX1 sequences were more abundant among sequences amplified from soil DNA than among those amplified from soil cDNA.

2.
A major part of the barcoding-of-life problem is assigning newly sequenced or sampled individuals to existing groups that have been identified externally (for example, by a taxonomist). This involves evaluating the statistical evidence for associating a sequence from a new individual with one group or another. The main goal of our current research is to perform this task quickly and accurately. To this end, we have developed a model-based, decision-theoretic framework grounded in coalescent theory. Within this framework, a new individual is assigned using both distance and the posterior probability of each group, given the sequences of that group's members and the sequence of the newly sampled individual. We believe this approach makes efficient use of the information available in the data. Our preliminary results indicate that it is more accurate than assignment based on a simple distance measure.

3.
4.
In this paper we address the problem of analysing Next Generation Sequencing samples with an expected high biodiversity content. We analysed several well-known 16S rRNA datasets from experimental samples, including both long and short sequences numbering in the tens of thousands, in addition to carefully crafted synthetic datasets containing more than 7000 OTUs. This analysis revealed several patterns, which we used to develop new guidelines for experimentation under conditions of high biodiversity. We analysed the suitability of different clustering packages for this type of situation, the problem of even sampling, and the relative effectiveness of the Chao1 and ACE estimators, as well as the relationship between these estimators and sampling size, for a variety of population distributions. As regards practical analysis procedures, we advocate an approach that retains as much high-quality experimental data as possible. By carefully applying selection rules that combine taxonomic assignment with clustering strategies, we derived a set of recommendations for ultra-sequencing data analysis at high biodiversity levels.
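The Chao1 and ACE richness estimators compared above can both be computed from per-OTU read counts. The sketch below assumes the bias-corrected Chao1 formula and the standard ACE formula with a rare-OTU cutoff of 10 reads; the function names, the cutoff default and the toy counts are our own illustrative choices, not taken from the paper.

```python
from collections import Counter

def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    counts = [c for c in otu_counts if c > 0]
    freq = Counter(counts)  # freq[i] = number of OTUs observed exactly i times
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

def ace(otu_counts, rare_threshold=10):
    """Abundance-based Coverage Estimator; OTUs with <= rare_threshold reads are 'rare'."""
    counts = [c for c in otu_counts if c > 0]
    rare = [c for c in counts if c <= rare_threshold]
    s_abund = len(counts) - len(rare)
    n_rare = sum(rare)
    if n_rare == 0:
        return float(s_abund)
    freq = Counter(rare)
    f1 = freq.get(1, 0)
    coverage = 1 - f1 / n_rare  # estimated sample coverage of the rare OTUs
    if coverage == 0:
        raise ValueError("ACE is undefined when every rare OTU is a singleton")
    s_rare = len(rare)
    pair_sum = sum(i * (i - 1) * f for i, f in freq.items())
    # Estimated squared coefficient of variation of the rare-OTU abundances.
    gamma_sq = max(s_rare / coverage * pair_sum / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / coverage + f1 / coverage * gamma_sq

counts = [1, 1, 1, 2, 2, 5, 10]  # toy read counts for seven OTUs
print(chao1(counts))  # 8.0
print(ace(counts))
```

With these toy counts, Chao1 adds 3·2/(2·3) = 1 to the seven observed OTUs, while ACE evaluates to 220/19 ≈ 11.58.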

5.
Metabarcoding of environmental samples on second‐generation sequencing platforms has rapidly become a valuable tool for ecological studies. A fundamental assumption of this approach is that tagged amplicons can be tracked back to the samples from which they originated. In this study, we address the problem of sequences in metabarcoding outputs that carry false combinations of used tags (tag jumps). Unless these sequences can be identified and excluded from downstream analyses, tag jumps that create sequences with false but already used tag combinations can cause incorrect assignment of sequences to samples and artificially inflate diversity. We document and investigate tag jumping in metabarcoding studies on Illumina sequencing platforms by amplifying mixed‐template extracts obtained from bat droppings and leech gut contents with tagged generic arthropod and mammal primers, respectively. On average, 2.6% and 2.1% of sequences had tag combinations that could be explained by tag jumping in the leech and bat diet studies, respectively. We suggest that tag jumping can occur during blunt‐ending of pools of tagged amplicons during library build, and as a consequence of chimera formation during bulk amplification of tagged amplicons in the library index PCR. We argue that tag jumping and contamination between libraries represent a considerable challenge for Illumina‐based metabarcoding studies, and we suggest measures to avoid false assignment of tag jumping‐derived sequences to samples.
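A minimal sketch of how detectable tag jumps could be counted, assuming reads have already been demultiplexed into (forward tag, reverse tag) pairs and that a sample sheet maps each used tag pair to a sample. Reads whose forward and reverse tags were both used in the run, but never together, are counted as tag jumps; as noted above, jumps that recreate a combination already assigned to another sample cannot be detected this way. The function name, argument layout and toy data are hypothetical.

```python
from collections import Counter

def count_tag_jumps(read_tag_pairs, sample_sheet):
    """Tally reads per sample, detectable tag jumps, and unexplained tag pairs.

    read_tag_pairs: iterable of (forward_tag, reverse_tag) tuples, one per read.
    sample_sheet: dict mapping each used (forward_tag, reverse_tag) pair to a sample.
    """
    used_forward = {fwd for fwd, _ in sample_sheet}
    used_reverse = {rev for _, rev in sample_sheet}
    per_sample = Counter()
    jumps = 0        # both tags were used in the run, but never as this pair
    unexplained = 0  # at least one tag was never used (e.g. a sequencing error in a tag)
    for fwd, rev in read_tag_pairs:
        sample = sample_sheet.get((fwd, rev))
        if sample is not None:
            per_sample[sample] += 1
        elif fwd in used_forward and rev in used_reverse:
            jumps += 1
        else:
            unexplained += 1
    return per_sample, jumps, unexplained

# Toy run: two samples, five reads with swapped tags, one read with an unknown tag.
reads = [("A", "X")] * 50 + [("B", "Y")] * 45 + [("A", "Y")] * 3 + [("B", "X")] * 2 + [("C", "X")]
sheet = {("A", "X"): "sample1", ("B", "Y"): "sample2"}
per_sample, jumps, unexplained = count_tag_jumps(reads, sheet)
print(per_sample, jumps, unexplained, f"{100 * jumps / len(reads):.1f}% tag jumps")
```

The jump fraction from unused-but-valid combinations gives a rough per-run estimate of how often jumps also land on combinations that are in use and therefore go unnoticed.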

6.
Millions to billions of DNA sequences can now be generated from ancient skeletal remains thanks to the massive throughput of next‐generation sequencing platforms. Except in cases of exceptional endogenous DNA preservation, most of the sequences isolated from fossil material do not originate from the specimen of interest but instead reflect environmental organisms that colonized the specimen after death. Here, we characterize the microbial diversity recovered from seven c. 200‐ to 13 000‐year‐old horse bones collected from northern Siberia. We use a robust, taxonomy‐based assignment approach to identify the microorganisms present in ancient DNA extracts and quantify their relative abundance. Our results suggest that molecular preservation niches exist within ancient samples that can potentially be used to characterize the environments from which the remains are recovered. In addition, microbial community profiling of the seven specimens revealed site‐specific environmental signatures. These microbial communities appear to comprise mainly organisms that colonized the fossils recently. Our approach significantly extends the amount of useful data that can be recovered from ancient specimens using a shotgun sequencing approach. In the future, it may be possible to correlate, for example, the accumulation of postmortem DNA damage with the presence and/or abundance of particular microbes.

7.
Determining haplotype-specific DNA sequence information is important in a wide range of research fields. However, no simple and robust approaches are currently available for obtaining such information. We have addressed this problem by developing a simple and robust haplotype-specific sequencing approach. It exploits the fact that DNA sequencing polymerases are sensitive to mismatches at the 3′ end of the sequencing primer. By using two sequencing primers whose 3′ ends correspond to the two alleles at a given SNP locus, we obtain allele-specific DNA sequences from both alleles. We evaluated this direct haplotype-specific approach by determining haplotypes within the intron 2 sequence of the fructan-6-fructosyltransferase (6-ft) gene in Lolium perenne L. We obtained reliable haplotype-specific sequences for all primers and genotypes evaluated. We conclude that haplotype-specific sequencing is robust and could potentially be applied widely to any diploid organism.
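The primer logic described above can be sketched as follows: for a SNP at a known position in the template, two forward primers share the same upstream sequence and differ only in their 3′ terminal base, each matching one allele. This is a simplified illustration with a hypothetical helper; it ignores melting temperature, secondary structure and the other constraints of practical primer design.

```python
def allele_specific_primers(template, snp_index, alleles, length=20):
    """Return one forward primer per allele, each with its 3' end on the SNP.

    template: forward-strand sequence written 5'->3'; snp_index: 0-based SNP position.
    """
    if not 1 <= length <= snp_index + 1:
        raise ValueError("primer length must fit within the sequence upstream of the SNP")
    stem = template[snp_index - length + 1:snp_index]  # bases shared by both primers
    return {allele: stem + allele for allele in alleles}

print(allele_specific_primers("ACGTACGTAC", snp_index=5, alleles=("C", "T"), length=4))
# {'C': 'GTAC', 'T': 'GTAT'}
```

Because the primers differ only at the terminal base, a polymerase that extends a matched 3′ end but not a mismatched one reads only the haplotype carrying the corresponding allele.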

8.
DNA barcoding of stylommatophoran land snails: a test of existing sequences   (total citations: 1; self-citations: 0; citations by others: 1)
DNA barcoding has attracted attention because it is a potentially simple and universal method for taxonomic assignment. One anticipated problem in applying the method to stylommatophoran land snails is that they frequently exhibit extreme divergence of mitochondrial DNA sequences, sometimes reaching 30% within species. We therefore trialled the utility of barcodes in identifying land snails by analysing the stylommatophoran cytochrome oxidase subunit I sequences from GenBank. Two alignments of 381 and 228 base pairs were used to determine potential error rates among test data sets of 97 and 127 species, respectively. Identification success rates using neighbour‐joining phylogenies were 92% for the longer sequence and 82% for the shorter sequence, indicating that a high degree of mitochondrial variation may actually be an advantage when using phylogeny‐based methods for barcoding. There was, however, a large overlap between intra‐ and interspecific variation, with assignment failure (per cent of samples not placed with the correct species) particularly associated with a low degree of mitochondrial variation (Kimura 2‐parameter distance < 0.05) and a small GenBank sample size (< 25 per species). Thus, while the optimum intra/interspecific threshold value was 4%, this was associated with an overall error of 32% for the longer sequences and 44% for the shorter sequences. Given this high error rate, barcoding is a potentially useful method for discriminating land snail species only when a baseline has first been established using conventional taxonomy and sampled DNA sequences. There is no evidence for a barcoding gap, ruling out species discovery based on a threshold value alone.
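The Kimura 2-parameter (K2P) distance and the 4% threshold mentioned above can be illustrated with a short sketch: compute K2P distances from a query to reference barcodes and accept the nearest species only if it lies within the threshold. This shows the threshold-based assignment evaluated in the abstract, not the neighbour-joining procedure; it assumes pre-aligned sequences, skips gaps and ambiguous bases, and uses hypothetical function names and toy sequences.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with gaps or ambiguous bases are skipped. Saturated pairs
    (1 - 2P - Q <= 0 or 1 - 2Q <= 0) make math.log raise ValueError.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

def assign_by_threshold(query, references, threshold=0.04):
    """Species of the nearest reference if its K2P distance is <= threshold, else None."""
    best_species, best_distance = None, math.inf
    for species, seq in references:
        d = k2p_distance(query, seq)
        if d < best_distance:
            best_species, best_distance = species, d
    return best_species if best_distance <= threshold else None

# Toy 100-bp barcodes: species_2 differs from species_1 by 20 transitions.
ref1 = "ACGT" * 25
swap = {"A": "G", "G": "A", "C": "T", "T": "C"}
ref2 = "".join(swap[c] if i < 20 else c for i, c in enumerate(ref1))
query = "G" + ref1[1:]  # one transition away from species_1
references = [("species_1", ref1), ("species_2", ref2)]
print(assign_by_threshold(query, references))  # species_1
```

Returning None when the nearest reference is beyond the threshold mirrors the abstract's point: with overlapping intra- and interspecific variation, a fixed cutoff cannot by itself decide whether a query belongs to a known or an undescribed species.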

9.
Assignment of orthologous genes via genome rearrangement   (total citations: 1; self-citations: 0; citations by others: 1)
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationships among genes of the same family. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at the genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario under genome rearrangement. First, the problem is formulated as computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared with a recent ortholog assignment method based entirely on homology search (INPARANOID), SOAR shows marginally better sensitivity on the real data set because it identifies several correct orthologous pairs that INPARANOID misses. The simulation results demonstrate that SOAR generally performs better than the iterated exemplar algorithm in computing the reversal distance and assigning correct orthologs.

10.
Accurate estimation of biological diversity in environmental DNA samples using high-throughput amplicon pyrosequencing must account for errors generated by PCR and sequencing. We describe a novel approach for distinguishing the underlying sequence diversity in environmental DNA samples from errors; it uses information on the abundance distribution of similar sequences across independent samples, as well as the frequency and diversity of sequences within individual samples. We have further refined this approach into a bioinformatics pipeline, the Amplicon Pyrosequence Denoising Program (APDP), which processes raw sequence datasets into a set of validated sequences in formats compatible with commonly used downstream analysis packages. By sequencing complex environmental samples and mock communities, we demonstrate that APDP effectively removes errors from deeply sequenced datasets comprising biological and technical replicates, and can efficiently denoise single-sample datasets. APDP provides more conservative diversity estimates for complex datasets than other approaches; however, for some applications this may provide a more accurate and appropriate level of resolution and greater confidence that the returned sequences reflect the diversity of the underlying sample.

11.
12.
13.
An Eulerian path approach to global multiple alignment for DNA sequences.   (total citations: 3; self-citations: 0; citations by others: 3)
With the rapid growth of genome sequence datasets, the multiple sequence alignment problem is increasingly important and frequently involves aligning a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing, which transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to a Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm whose running time is almost linear in the total size (number of letters) of the sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence, with pairwise identity as low as 70%) were aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project are used to demonstrate the performance.
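The fragment-assembly idea that motivates this work (reads broken into k-mers, a de Bruijn graph over (k-1)-mers, and an Eulerian path through that graph) can be sketched in a few lines. This illustrates only the assembly analogy, not the authors' multiple-alignment algorithm; for simplicity it keeps each distinct k-mer once, so genuine repeats are collapsed, and it assumes the graph actually has an Eulerian path. All function names are hypothetical.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes of the distinct k-mers in the reads."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm on a directed graph assumed to contain an Eulerian path."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_degree[v] += 1
    # Start at a node with one more outgoing than incoming edge, if there is one.
    start = next((u for u in graph if len(graph[u]) - in_degree[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the remaining walk
    return path[::-1]

def assemble(reads, k):
    path = eulerian_path(de_bruijn_graph(reads, k))
    # Spell the sequence: the first node, then the last base of each following node.
    return path[0] + "".join(node[-1] for node in path[1:])

print(assemble(["ACGTTG", "GTTGCA"], k=3))  # ACGTTGCA
```

Every k-mer becomes an edge rather than a node, so reconstructing the sequence means using each edge exactly once, which is why assembly reduces to an Eulerian path problem.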

14.
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, A, C, G, U}, where the symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesize that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primeval DNA repair system, could have made possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G ≡ C and A = U, and the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are sufficient to determine the coding constraints of both the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present-day coding DNA sequences could also have existed in ancient coding DNA sequences. Phylogenetic analyses performed with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences, and with the new evolutionary model presented here, also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

15.

Background

An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic match. An alternative approach involves phylogenetic methods, which take into account alignments and a model of evolution in order to more accurately define the taxonomic origin of sequences. Similarity-search based methods typically run faster than phylogenetic methods and work well when the organisms in the sample are well represented in the database. In contrast, phylogenetic methods have the capability to identify new organisms in a sample but are computationally quite expensive.

Results

We propose a two-step approach for metagenomic taxon identification: first, a rapid method that accurately classifies sequences using a reference database (a filtering step), and then a more complex phylogenetic method for the sequences left unclassified by the first step. In this work, we explore whether and when using the top BLAST hit(s) yields a correct taxonomic label. We develop a method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms. We used modified BILD (Bayesian Integral Log-Odds) scores, a multiple-alignment scoring function, to define the outliers within a subset of top BLAST hits and assign taxonomic labels. We compared the accuracy of our method with that of the RDP classifier and show that our method yields fewer misclassifications while properly classifying organisms that are not present in the database. Finally, we evaluated our method as a pre-processing step before more expensive phylogenetic analyses (in our case TIPP) on real 16S rRNA datasets.

Conclusion

Our experiments make a good case for using a two-step approach for accurate taxonomic assignment. We show that our method can be used as a filtering step before using phylogenetic methods and provides a way to interpret BLAST results using more information than provided by E-values and bit-scores alone.

16.
Expressed sequence tags (ESTs) have been obtained from several hundred brain cDNAs as an initial effort to characterize expressed brain genes. These ESTs will become tools for human genome mapping and will also provide candidate causative genes for inherited disorders affecting the central nervous system. We have developed a procedure for the rapid chromosomal assignment of these ESTs. cDNA sequences are first analyzed by a computer program to identify regions unlikely to be interrupted by introns in the genomic DNA. A pair of oligonucleotide primers is then designed to amplify such a region by the polymerase chain reaction, using DNA templates from human-rodent somatic cell hybrid chromosomal panels. The chromosomal assignment of the cDNA is determined by studying the segregation of the amplified products in these panels. In this paper we describe the mapping of 46 brain ESTs, as well as observations on the amplification of rodent sequences.

17.
A novel multivariate statistical approach is presented for extracting and exploiting intrinsic information present in our ever-growing sequence data banks. Information extraction avoids the pitfalls of intersequence alignment by analyzing secondary invariant functions derived from the sequences in the data bank rather than the sequences themselves. A typical invariant function is a 20 × 20 histogram of occurrences of amino acid pairs in a given sequence or fragment thereof. To illustrate the potential of the approach, an analysis of 10,000 protein sequences from the National Biomedical Research Foundation Protein Identification Resource is presented, which already reveals great biological detail. For example, zeta-hemoglobin is found to lie close to amphibian and fish chi-hemoglobin, which in turn is an important clue to the physiological function of this mammalian early embryonic hemoglobin. The multivariate statistical framework presented unifies such apparently unrelated issues as phylogenetic comparisons between a set of sequences and distance matrices between the constituents of biological sequences. The Multivariate Statistical Sequence Analysis (MSSA) principles can be used for a wide spectrum of sequence analysis problems, such as assignment of family membership to new sequences, validation of new incoming sequences to be entered into the database, prediction of structure from sequence, discrimination of coding from non-coding DNA regions, and automatic generation of an atlas of protein or DNA sequences. The MSSA techniques represent a self-contained approach to learning continuously and automatically from the growing stream of new sequences. The MSSA approach is particularly likely to play a significant role in major sequencing efforts such as the human genome project.
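A 20 × 20 amino acid pair histogram of the kind described above can be computed without any alignment. The sketch below assumes pairs of residues separated by a fixed gap (adjacent residues by default) and compares two sequences by the Euclidean distance between their frequency-normalised histograms; the distance choice and function names are illustrative assumptions, not the MSSA method itself.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def pair_histogram(seq, gap=1):
    """20 x 20 counts of residue pairs (seq[i], seq[i + gap]); non-standard residues skipped."""
    hist = [[0] * 20 for _ in range(20)]
    for a, b in zip(seq, seq[gap:]):
        if a in INDEX and b in INDEX:
            hist[INDEX[a]][INDEX[b]] += 1
    return hist

def histogram_distance(seq1, seq2, gap=1):
    """Euclidean distance between normalised pair histograms (no alignment needed)."""
    def frequencies(seq):
        flat = [count for row in pair_histogram(seq, gap) for count in row]
        total = sum(flat) or 1  # avoid dividing by zero for very short sequences
        return [count / total for count in flat]
    return sum((x - y) ** 2 for x, y in zip(frequencies(seq1), frequencies(seq2))) ** 0.5

print(histogram_distance("MKTAYIAKQR", "MKTAYIAKQR"))  # 0.0
```

Because every sequence maps to the same 400-dimensional vector regardless of its length, standard multivariate tools (principal components, clustering) can be applied directly to a whole data bank.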

18.
The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the sequence lengths, and the difficulty of obtaining suitable parameters that produce meaningful alignments. The running time problem is often addressed by reducing the search space, using techniques such as banding or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model to which Needleman-Wunsch is equivalent does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated nonconserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models (PHMMs) that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, for which we obtain a 2-approximation algorithm based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to the alignment of DNA sequences in practice, successfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.
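Banding, one of the standard search-space reductions mentioned above, can be sketched for plain Needleman-Wunsch scoring: only cells within a fixed distance of the main diagonal are filled, so the work drops from O(nm) to roughly O(n · band). This sketch assumes a linear gap penalty and illustrative scores; it is not the Manhattan-network construction for pair HMMs proposed in the paper.

```python
def banded_nw_score(a, b, band, match=1, mismatch=-1, gap=-2):
    """Global alignment score restricted to DP cells with |i - j| <= band."""
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")  # marks cells outside the band
    prev = [neg_inf] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap  # first row: leading gaps in a
    for i in range(1, n + 1):
        curr = [neg_inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        if lo == 0:
            curr[0] = i * gap  # first column: leading gaps in b
        for j in range(max(1, lo), hi + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]

print(banded_nw_score("ACGT", "AGGT", band=0))  # 2
```

With `band=0` only the main diagonal is filled, so the score is simply three matches and one mismatch; widening the band to the longer sequence length recovers ordinary Needleman-Wunsch.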

19.
We describe a new chromosomal assignment method based on polymerase chain reaction-mediated amplification of target sequences in DNA from somatic cell hybrids. The new method is faster, much more sensitive and less labor-intensive than the standard method of chromosome assignment by Southern hybridization analysis of somatic cell hybrid DNA. The feasibility of the new approach was demonstrated by verifying the assignment of the previously mapped acidic fibroblast growth factor gene to human chromosome 5. The method was employed to assign the related oncogene, FGF-5, to human chromosome 4.

20.
MOTIVATION: We explored the feasibility of using unaligned rRNA gene sequences as DNA barcodes, based on correlation analysis of composition vectors (CVs) derived from nucleotide strings. We tested this method with seven rRNA datasets (including 12S, 16S, 18S, 26S and 28S) from a wide variety of organisms (from archaea to tetrapods) at taxonomic levels ranging from class to species. RESULT: Our results indicate that grouping of taxa based on CV analysis is always in good agreement with the phylogenetic trees generated by traditional approaches, although in some cases the relationships among the higher systematic groups may differ. The effectiveness of our analysis might be related to the length of, and divergence among, the sequences in a dataset. Nevertheless, the correct grouping of sequences and accurate assignment of unknown taxa make our analysis a reliable and convenient approach for analyzing unaligned sequence datasets of various rRNAs for barcoding purposes. AVAILABILITY: The newly designed software (CVTree 1.0) is publicly available at the Composition Vector Tree (CVTree) web server http://cvtree.cbi.pku.edu.cn.
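A simplified composition-vector comparison can be sketched as follows: each sequence is reduced to its k-mer frequency vector, two sequences are compared by the correlation (cosine) C of their vectors, and C is converted to a distance as (1 - C)/2. As we understand it, CVTree additionally subtracts a Markov-model background from the raw frequencies before correlating, which this sketch omits; k = 4 and the function names are illustrative assumptions.

```python
import math
from collections import Counter

def composition_vector(seq, k):
    """Relative frequency of each k-mer (string of length k) occurring in seq."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def cv_distance(seq1, seq2, k=4):
    """(1 - C) / 2, where C is the cosine correlation of the two k-mer vectors."""
    a, b = composition_vector(seq1, k), composition_vector(seq2, k)
    dot = sum(v * b.get(kmer, 0.0) for kmer, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return (1 - dot / norm) / 2

print(cv_distance("AAAAAA", "CCCCCC", k=2))  # 0.5
```

No alignment is needed at any step: sequences of different lengths are compared only through their k-mer spectra, which is what makes the approach attractive for unaligned rRNA barcodes.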


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417