Similar Documents
1.
The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7 × 10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls, and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that communities should be compared using an equal number of sequences.
Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and found that, even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects remained that could potentially confound the interpretation of microbial community data.
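
An error rate such as the 0.0060 reported above is, in essence, the number of mismatched alignment columns divided by the number of aligned columns when reads are compared with their known mock-community references. The minimal sketch below assumes each read has already been aligned to its true reference, with `-` marking gaps, and counts substitutions, insertions and deletions equally; it is an illustration, not the authors' pipeline.

```python
def error_rate(aligned_pairs):
    """Pooled error rate over (read, reference) pairs aligned to the same length.

    A column counts as an error when read and reference differ, which covers
    substitutions, deletions ('-' in the read) and insertions ('-' in the reference).
    Columns that are gaps in both sequences are skipped.
    """
    errors = 0
    columns = 0
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must be aligned to the same length")
        for r, f in zip(read, ref):
            if r == "-" and f == "-":
                continue
            columns += 1
            if r != f:
                errors += 1
    return errors / columns if columns else 0.0


pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # perfect read
    ("ACGTTCGTAC", "ACGTACGTAC"),  # one substitution
]
print(error_rate(pairs))  # 1 error in 20 columns -> 0.05
```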

2.
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
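
Why shuffled controls can be too easy is visible in a toy case: a mononucleotide shuffle keeps base composition but scrambles order, so structure such as a tandem (CA)n repeat is usually lost. The sketch below is illustrative only; the toy sequence and helper names are assumptions, and the idea that lost repeats explain the underestimate is one plausible reading consistent with the authors modelling complexity and repeat content.

```python
import random
from collections import Counter


def mononucleotide_shuffle(seq, seed=None):
    """Random permutation of seq: base composition is kept, order is not."""
    rng = random.Random(seed)
    return "".join(rng.sample(seq, len(seq)))


def longest_tandem_run(seq, unit):
    """Largest number of back-to-back copies of `unit` found anywhere in seq."""
    k = len(unit)
    best = 0
    for phase in range(k):  # scan every reading phase of the repeat unit
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


real = "CA" * 15 + "GATTACAGGCTTAC"  # toy "intergenic" sequence with a (CA)n repeat
shuffled = mononucleotide_shuffle(real, seed=1)
print(Counter(real) == Counter(shuffled))  # composition is preserved: True
print(longest_tandem_run(real, "CA"), longest_tandem_run(shuffled, "CA"))
```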

3.
We examined a broad selection of protein-coding loci from a diverse array of clades and genomes to quantify three factors that determine whether nucleotide or amino acid characters should be preferred for phylogenetic inference. First, we quantified the difference in observed character-state space between nucleotides and amino acids. Second, we quantified the loss of potential phylogenetic signal from silent substitutions when amino acids are used. Third, we used the disparity index to quantify the relative compositional heterogeneity of nucleotides and amino acids and then determined how commonly convergent (rather than unique) shifts in nucleotide and amino acid composition occur in a phylogenetic context. The greater potential phylogenetic signal for nucleotide characters was found to be enormous (on average 440% that of amino acids), whereas the greater observed character-state space for amino acids was less impressive (on average 150.4% that of nucleotides). While matrices of amino acid sequences had less compositional heterogeneity than their corresponding nucleotide sequences, heterogeneity in amino acid composition may be more homoplasious than heterogeneity in nucleotide composition. Given the ability of increased taxon sampling to better utilize the greater potential phylogenetic signal of nucleotide characters and decrease the potential for artifacts caused by heterogeneous nucleotide composition among taxa, we suggest that increased taxon sampling be performed whenever possible instead of restricting analyses to amino acid characters.
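
The signal lost to silent substitutions can be seen directly in a toy pair of coding sequences: two of the three nucleotide differences below fall at synonymous third codon positions and vanish on translation. This sketch uses the standard genetic code; the sequences are invented for illustration and are not data from the study.

```python
from itertools import product

# Standard genetic code; codons enumerated with the first base varying slowest (TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


seq1 = "GCTAAAGAT"  # Ala-Lys-Asp
seq2 = "GCCAAGGAA"  # Ala-Lys-Glu
print(differences(seq1, seq2))                        # 3 nucleotide differences
print(differences(translate(seq1), translate(seq2)))  # 1 amino acid difference
```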

4.
A new method for detecting chimeras and other anomalies within 16S rRNA sequence records is presented. Using this method, we screened 1,399 sequences from 19 phyla, as defined by the Ribosomal Database Project, release 9, update 22, and found 5.0% to harbor substantial errors. Of these, 64.3% were obvious chimeras, 14.3% were unidentified sequencing errors, and 21.4% were highly degenerate. In all, 11 phyla contained obvious chimeras, accounting for 0.8 to 11% of the records for these phyla. Many chimeras (43.1%) were formed from parental sequences belonging to different phyla. While most comprised two fragments, 13.7% were composed of at least three fragments, often from three different sources. A separate analysis of the Bacteroidetes phylum (2,739 sequences) also revealed 5.8% of records to be anomalous, of which 65.4% were apparently chimeric. Overall, we conclude that, as a conservative estimate, 1 in every 20 public database records is likely to be corrupt. Our results support concerns recently expressed over the quality of the public repositories. With 16S rRNA sequence data increasingly playing a dominant role in bacterial systematics and environmental biodiversity studies, it is vital that steps be taken to improve screening of sequences prior to submission. To this end, we have implemented our method as a program with a simple-to-use graphical user interface that is capable of running on a range of computer platforms. The program is called Pintail, is released under the GNU General Public License, and is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.
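
The abstract does not describe how the method works internally. One general idea behind many chimera checks is that a chimera's divergence from a reference changes abruptly along its length. The sketch below computes a sliding-window divergence profile for two aligned sequences; it is a simplified illustration, not Pintail's algorithm, and because real 16S genes contain variable regions an uneven profile alone does not prove chimerism.

```python
def window_divergence(query, reference, window=100, step=25):
    """Fraction of mismatched columns in each sliding window of two aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        profile.append(sum(a != b for a, b in zip(q, r)) / window)
    return profile


def divergence_spread(profile):
    """Max minus min windowed divergence; large values suggest uneven divergence."""
    return max(profile) - min(profile)


reference = "ACGT" * 50  # 200-column toy reference
query = list(reference)
for i in range(100, 200, 5):  # diverge only in the second half, as a chimera might
    query[i] = "A" if reference[i] != "A" else "C"
query = "".join(query)
profile = window_divergence(query, reference, window=100, step=25)
print(profile)  # [0.0, 0.05, 0.1, 0.15, 0.2]
```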

5.
Kim SM, Lee JM, Yim KO, Oh MH, Park JW, Kim KH. Molecules and Cells, 2003, 16(3):407-412
The nucleotide sequences of the genomic RNAs of Cucumber green mottle mosaic virus Korean watermelon isolate (CGMMV-KW) and Korean oriental melon isolate (CGMMV-KOM) were determined and compared to the sequences of other tobamoviruses including CGMMV strains W and SH. Each CGMMV isolate had a genome of 6,424 nucleotides. Each also had 60 and 176 nucleotides of 5' and 3' untranslated regions (UTRs), respectively, and four open reading frames (ORF1-4). ORFs 1 to 4 encode proteins of 129, 186, 29, and 17.4 kDa, respectively. The nucleotide and deduced amino acid sequences of CGMMV-KOM and CGMMV-KW were more than 98.3% identical. When compared with other CGMMV strains in a phylogenetic analysis, they were found to form a distinct virus clade, and were more distantly related to other tobamoviruses (23.5-56.7% identity).

6.
DNA sequencing with direct blotting electrophoresis.
S Beck, F M Pohl. The EMBO Journal, 1984, 3(12):2905-2909
A method for transferring the DNA molecules of sequencing reaction mixtures onto an immobilizing matrix during electrophoresis has been developed. A blotting membrane moves with constant speed across the end of a very short, denaturing gel and collects the molecules according to size. A constant distance between bands for molecules differing in length by one nucleotide is obtained over a large range (approximately 600 nucleotides with a 5% gel), which simplifies the determination of DNA sequences considerably. Reliable sequences of 500 nucleotides can be read, and sequence features beyond 1000 nucleotides are revealed in a single experiment. The sequencing of a potential Z-DNA-forming fragment from Escherichia coli DNA is given as an example, and possible further developments are discussed.

7.
T Kumazaki, H Hori, S Osawa. Nucleic Acids Research, 1983, 11(20):7141-7144
The nucleotide sequences of 5S rRNAs from two nemerteans (ribbon worms), Lineus geniculatus and Emplectonema gracile, have been determined. Emplectonema has two 5S rRNA species, composed of 119 and 120 nucleotides, respectively. The sequences of these two 5S rRNAs differ at 22 positions. By contrast, only a single 5S rRNA species was found in Lineus. The percent sequence similarities are 88% (Lineus/Emplectonema longer 5S rRNA), 82% (Emplectonema longer/Emplectonema shorter) and 80% (Lineus/Emplectonema shorter). Comparison of these sequences with those of other organisms suggests that the phylum Nemertinea is most closely related to the Mollusca (91%) and the Rotifera (89%), but not to freshwater planarians (72%).
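
The 82% figure for the two Emplectonema 5S rRNAs is consistent with the 22 differences: assuming they are counted over a 120-column alignment (the 119-nucleotide species carrying one gap), identity is (120 − 22)/120 ≈ 81.7%. The sketch below computes percent identity on a toy alignment built to have exactly that shape; it does not use the real sequences.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences; columns that are gaps in both are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy 120-column alignment with 22 differing positions (not the real 5S rRNA sequences).
longer = "A" * 120
shorter = "C" * 21 + "-" + "A" * 98  # 119 nucleotides plus one gap
print(round(percent_identity(longer, shorter), 1))  # 81.7
```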

8.
Detection of chimeric artifacts formed when PCR is used to retrieve naturally occurring small-subunit (SSU) rRNA sequences may rely on demonstrating that different sequence domains have different phylogenetic affiliations. We evaluated the CHECK_CHIMERA method of the Ribosomal Database Project and another method which we developed, both based on determining nearest neighbors of different sequence domains, for their ability to discern artificially generated SSU rRNA chimeras from authentic Ribosomal Database Project sequences. The reliability of both methods decreases when the parental sequences which contribute to chimera formation are more than 82 to 84% similar. Detection is also complicated by the occurrence of authentic SSU rRNA sequences that behave like chimeras. We developed a naive statistical test based on CHECK_CHIMERA output and used it to evaluate previously reported SSU rRNA chimeras. Application of this test also suggests that chimeras might be formed by retrieving SSU rRNAs as cDNA. The amount of uncertainty associated with nearest-neighbor analyses indicates that such tests alone are insufficient and that better methods are needed.
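
The nearest-neighbour idea can be sketched as follows: split the query at a candidate breakpoint, find the most similar reference for each domain over the same aligned columns, and flag the query when the two domains pick different references. This assumes all sequences share one alignment and uses toy references; it is not the RDP CHECK_CHIMERA code, and it inherits the weakness noted above when the parents are highly similar.

```python
def fraction_identical(a, b):
    """Fraction of matching columns between two aligned, equal-length fragments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def nearest_neighbour(fragment, references, start):
    """Reference name most similar to `fragment` over columns start..start+len(fragment)."""
    end = start + len(fragment)
    return max(references, key=lambda name: fraction_identical(fragment, references[name][start:end]))


def check_two_domains(query, references, breakpoint):
    """Return (left neighbour, right neighbour, flagged) for one candidate breakpoint."""
    left = nearest_neighbour(query[:breakpoint], references, 0)
    right = nearest_neighbour(query[breakpoint:], references, breakpoint)
    return left, right, left != right


references = {"parent_A": "AAAAAAAAAA", "parent_B": "CCCCCCCCCC"}  # toy aligned references
print(check_two_domains("AAAAACCCCC", references, 5))  # ('parent_A', 'parent_B', True)
print(check_two_domains("AAAAAAAAAC", references, 5))  # ('parent_A', 'parent_A', False)
```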

9.
Even when the maximum likelihood (ML) tree is a better estimate of the true phylogenetic tree than those produced by other methods, the result of a poor ML search may be no better than that of a more thorough search under some faster criterion. The ability to find the globally optimal ML tree is therefore important. Here, I compare a range of heuristic search strategies (and their associated computer programs) in terms of their success at locating the ML tree for 20 empirical data sets with 14 to 158 sequences and 411 to 120,762 aligned nucleotides. Three distinct topics are discussed: the success of the search strategies in relation to certain features of the data, the generation of starting trees for the search, and the exploration of multiple islands of trees. As a starting tree, there was little difference among the neighbor-joining tree based on absolute differences (including the BioNJ tree), the stepwise-addition parsimony tree (with or without nearest-neighbor-interchange (NNI) branch swapping), and the stepwise-addition ML tree. The latter produced the best ML score on average but was orders of magnitude slower than the alternatives. The BioNJ tree was second best on average. As search strategies, star decomposition and quartet puzzling were the slowest and produced the worst ML scores. The DPRml, IQPNNI, MultiPhyl, PhyML, PhyNav, and TreeFinder programs with default options produced qualitatively similar results, each locating a single tree that tended to be in an NNI suboptimum (rather than the global optimum) when the data set had low phylogenetic information. For such data sets, there were multiple tree islands with very similar ML scores. The likelihood surface only became relatively simple for data sets that contained approximately 500 aligned nucleotides for 50 sequences and 3,000 nucleotides for 100 sequences. The RAxML and GARLI programs allowed multiple islands to be explored easily, but both programs also tended to find NNI suboptima. 
A newly developed version of the likelihood ratchet using PAUP* successfully found the peaks of multiple islands, but its speed needs to be improved.
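
The need for heuristic search follows from the size of tree space. For n taxa there are (2n − 5)!! unrooted binary topologies, while a single NNI step reaches only 2(n − 3) neighbours of the current tree, which helps explain why searches can stall in the NNI suboptima described above. These are standard combinatorial results shown for background; the sketch does not reproduce any of the programs compared.

```python
def unrooted_topologies(n_taxa):
    """Number of unrooted binary tree topologies for n labelled taxa, (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5  # taxon k can attach to any of the 2k - 5 branches of the previous tree
    return count


def nni_neighbours(n_taxa):
    """Each of the n - 3 internal branches admits two NNI rearrangements."""
    return 2 * (n_taxa - 3)


for n in (4, 10, 14):
    print(n, unrooted_topologies(n), nni_neighbours(n))
# 4 3 2
# 10 2027025 14
# 14 316234143225 22
```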

10.
Sequencing ribosomal RNA (rRNA) genes is currently the method of choice for phylogenetic reconstruction, nucleic acid based detection and quantification of microbial diversity. The ARB software suite with its corresponding rRNA datasets has been accepted by researchers worldwide as a standard tool for large scale rRNA analysis. However, the rapid increase of publicly available rRNA sequence data has recently hampered the maintenance of comprehensive and curated rRNA knowledge databases. A new system, SILVA (from Latin silva, forest), was implemented to provide a central comprehensive web resource for up to date, quality controlled databases of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains. All sequences are checked for anomalies, carry a rich set of sequence associated contextual information, have multiple taxonomic classifications, and the latest validly described nomenclature. Furthermore, two precompiled sequence datasets compatible with ARB are offered for download on the SILVA website: (i) the reference (Ref) datasets, comprising only high quality, nearly full length sequences suitable for in-depth phylogenetic analysis and probe design and (ii) the comprehensive Parc datasets with all publicly available rRNA sequences longer than 300 nucleotides suitable for biodiversity analyses. The latest publicly available database release 91 (August 2007) hosts 547 521 sequences split into 461 823 small subunit and 85 689 large subunit rRNAs.
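
The split into Parc and Ref datasets can be pictured as nested filters. In the sketch below only the "longer than 300 nucleotides" rule comes from the text; `ref_min_length` and `ref_max_ambiguity` are hypothetical stand-ins for the unspecified "high quality, nearly full length" criteria, and the records are toys.

```python
def ambiguity_fraction(seq):
    """Fraction of characters that are not unambiguous nucleotides (A, C, G, T/U)."""
    return sum(base not in "ACGTU" for base in seq.upper()) / len(seq)


def partition(records, parc_min_length=300, ref_min_length=1200, ref_max_ambiguity=0.01):
    """Split {name: sequence} into a Parc-like and a Ref-like subset.

    parc_min_length follows the text; ref_min_length and ref_max_ambiguity are
    hypothetical thresholds chosen for illustration only.
    """
    parc = {name: seq for name, seq in records.items() if len(seq) > parc_min_length}
    ref = {
        name: seq
        for name, seq in parc.items()
        if len(seq) >= ref_min_length and ambiguity_fraction(seq) <= ref_max_ambiguity
    }
    return parc, ref


records = {
    "short": "A" * 300,
    "partial": "ACGU" * 200,
    "full": "ACGU" * 375,
    "noisy": "N" * 20 + "ACGU" * 370,
}
parc, ref = partition(records)
print(sorted(parc), sorted(ref))  # ['full', 'noisy', 'partial'] ['full']
```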

11.
The use of hsp60 gene sequences for phylogenetic study and identification of pathogenic marine vibrios was investigated. A 600-bp partial hsp60 gene was amplified by PCR and sequenced from 29 strains representing 15 Vibrio species within the family Vibrionaceae. Sequence comparison of the amplified partial hsp60 gene revealed 71-82% sequence identity among different Vibrio species and 96-100% sequence identity among epidemiologically distinct strains with the same species designation. This degree of discrimination allows unambiguous differentiation of all Vibrio species included in the current study from each other, as well as from Aeromonas hydrophila and Plesiomonas shigelloides, which are often misidentified as Vibrio species by conventional biochemical methods. Based on the hsp60 gene sequences, two previously unidentified shrimp isolates were found to be more closely related to Vibrio alginolyticus (93-94% sequence identity) than to Vibrio parahaemolyticus (89% sequence identity), whereas 16S rRNA gene analysis was unable to differentiate among these closely related species (95-97% sequence identity). Our results indicate that the hsp60 gene may be a useful alternative target for phylogenetic analysis and species identification of marine vibrios to complement more conventional identification systems.
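
Identification by sequence identity amounts to finding the closest reference and asking whether the identity falls in the within-species range. The sketch below uses 96%, the lower end of the within-species range reported above, as a cutoff; treating that range as a decision threshold is an assumption, and the sequences are toys.

```python
def percent_identity(a, b):
    """Percent of matching columns between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def best_hit(query, references, species_cutoff=96.0):
    """Return (closest reference, its percent identity, whether it reaches the cutoff)."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] >= species_cutoff


references = {"species_1": "A" * 100, "species_2": "A" * 90 + "C" * 10}  # toy hsp60 fragments
print(best_hit("A" * 94 + "G" * 6, references))  # ('species_1', 94.0, False)
```

An isolate like the shrimp isolates above would come out closest to one species but below the cutoff, which matches "more closely related to" rather than "identified as".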

12.
A new species of the genus Rhodotorula was isolated from a tubeworm (Lamellibrachia sp.) collected at a depth of 1156 m in Sagami Bay, Japan. Strain SY-89 had physiological properties quite similar to R. aurantiaca. Two phylogenetic trees, one based on internal transcribed spacer (ITS) regions and 5.8S rDNA sequences and the other based on the D1/D2 region of the large subunit (26S) rDNA sequences, united strain SY-89 to the type strain of Sakaguchia dacryoides through a considerable evolutionary distance. Strain SY-89 was differentiated from S. dacryoides by the G+C content of the nuclear DNA and differences in the ability to utilize specific carbon and nitrogen compounds. The low complementarity of strain SY-89 DNA to that of the type strain of S. dacryoides confirmed that this strain was genetically unrelated to previously known species. The tubeworm isolates are described as R. lamellibrachii sp. nov. The type strain of R. lamellibrachii is strain SY-89 (= JCM 10907). R. lamellibrachii formed a cluster with Erythrobasidium hasegawianum, R. lactosa, S. dacryoides and Sporobolomyces elongatus on the ITS and 5.8S rDNA phylogenetic tree. These five species shared a signature sequence in 26S rDNA, although this relationship was not supported by phylogeny based on the D1/D2 region of 26S rDNA.
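
G+C content is simply the share of G and C among the bases. The abstract does not say how the nuclear DNA value was measured; in yeast taxonomy it is usually determined experimentally (for example by thermal denaturation or HPLC), so the sequence-based calculation below is only a proxy, shown for illustration.

```python
def gc_percent(seq):
    """G+C content (%) of a sequence, ignoring characters other than A, C, G and T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted if counted else 0.0


print(gc_percent("GGCCAATT"))  # 50.0
print(gc_percent("gcgcN"))     # 100.0
```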

13.
A new computer program, called Mallard, is presented for screening entire 16S rRNA gene libraries of up to 1,000 sequences for chimeras and other artifacts. Written in the Java computer language and capable of running on all major operating systems, the program provides a novel graphical approach for visualizing phylogenetic relationships among 16S rRNA gene sequences. To illustrate its use, we analyzed most of the large libraries of cloned bacterial 16S rRNA gene sequences submitted to the public repository during 2005. Defining a large library as one containing 100 or more sequences of 1,200 bases or greater, we screened 25 of the 28 libraries and found that all but three contained substantial anomalies. Overall, 543 anomalous sequences were found. The average anomaly content per clone library was 9.0%, 4 percentage points higher than that previously estimated for the public repository overall. In addition, 90.8% of anomalies had characteristic chimeric patterns, 25.4 percentage points more than found previously. One library alone was found to contain 54 chimeras, representing 45.8% of its content. These figures far exceed previous estimates of artifacts within public repositories and further highlight the urgent need for all researchers to adequately screen their libraries prior to submission. Mallard is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.
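
The "large library" definition and the per-library anomaly percentage can be written down directly. The sketch reads the definition as "at least 100 sequences that are each 1,200 bases or longer", which is an interpretation. As a consistency check, 54 chimeras making up 45.8% of a library implies roughly 54 / 0.458 ≈ 118 sequences.

```python
def is_large_library(sequence_lengths, min_sequences=100, min_length=1200):
    """True if the library holds at least min_sequences sequences of min_length bases or more."""
    return sum(length >= min_length for length in sequence_lengths) >= min_sequences


def anomaly_percent(n_anomalous, n_sequences):
    """Percentage of a library's sequences flagged as anomalous."""
    return 100.0 * n_anomalous / n_sequences


print(is_large_library([1300] * 100))              # True
print(is_large_library([1300] * 99 + [900] * 50))  # False
print(round(anomaly_percent(54, 118), 1))          # 45.8
```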

14.
Clinical immunoassays often display adequate sensitivity but limited specificity, or vice versa. As a trade-off between specificity improvement and sensitivity loss, biosensors were designed to perform indirect immunoassays with amperometric detection, using tailor-made chimeric receptors to react with the analyte, specific anti-Trypanosoma cruzi immunoglobulin G (IgG). The recombinant chimeras were designed to favor their oriented covalent attachment, which allows them to properly expose their epitopes, efficiently capture the analyte, and withstand the severe chemical treatment required to reuse the biosensors. When the secondary antibody, horseradish peroxidase-labeled anti-human IgG, was then bound in the presence of the soluble mediator and the enzyme substrate, a current that increased with the analyte concentration was measured. Biosensors using the chimeric constructions showed 100% specificity with samples that had given false-positive results with other bioreceptors. A protein bearing a poly-Lys chain and thioredoxin as directing elements displayed the highest signal-to-noise ratio (P < 0.05). The limit of detection was 62 ng ml⁻¹, eight times lower than that obtained with a currently used commercial Chagas enzyme-linked immunosorbent assay (ELISA) kit. Biosensor reusability was also assessed: after 10 consecutive determinations, the signal was approximately 80% of its original value.
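
The abstract does not state how the limit of detection was derived. A common convention is three standard deviations of the blank signal divided by the calibration slope, sketched below with toy numbers (not the study's data). By the same token, "eight times lower" implies a commercial-kit limit of roughly 8 × 62 ≈ 500 ng ml⁻¹.

```python
from statistics import mean, stdev


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def limit_of_detection(blank_signals, concentrations, signals, k=3.0):
    """LOD as k standard deviations of the blank, converted to concentration via the slope."""
    slope, _ = linear_fit(concentrations, signals)
    return k * stdev(blank_signals) / slope


concentrations = [0, 100, 200, 300]  # toy calibration, ng per ml
currents = [0.0, 10.0, 20.0, 30.0]   # toy signal, arbitrary current units
blanks = [1.0, 1.2, 0.8]             # toy repeated blank measurements
print(round(limit_of_detection(blanks, concentrations, currents), 3))  # 6.0
```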

15.
DNA barcoding shows enormous promise for the rapid identification of organisms at the species level. There has been much recent debate, however, about the need for longer barcode sequences, especially when these sequences are used to construct molecular phylogenies. Here, we have analysed a set of fungal mitochondrial sequences of various lengths, and we have monitored the effect of reducing sequence length on the utility of the data for both species identification and phylogenetic reconstruction. Our results demonstrate that reducing sequence length has a profound effect on the accuracy of resulting phylogenetic trees, but surprisingly short sequences still yield accurate species identifications. We conclude that the standard short barcode sequences (approximately 600 bp) are not suitable for inferring accurate phylogenetic relationships, but they are sufficient for species identification among the fungi.
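
A crude way to see how truncation affects identification is to ask what fraction of species still have a barcode that matches no other species. Real barcoding uses distance thresholds or tree-based assignment, so exact-match uniqueness here is a simplified stand-in, and the barcodes are toys.

```python
from collections import Counter


def identifiable_fraction(barcodes, length):
    """Fraction of species whose barcode, truncated to `length` bases, matches no other species."""
    truncated = {species: seq[:length] for species, seq in barcodes.items()}
    counts = Counter(truncated.values())
    return sum(counts[t] == 1 for t in truncated.values()) / len(truncated)


barcodes = {"sp1": "ACGTTAGC", "sp2": "ACGTTAGG", "sp3": "TCGTTAGC"}  # toy barcodes
print(identifiable_fraction(barcodes, 8))  # 1.0
print(identifiable_fraction(barcodes, 4))  # sp1 and sp2 collapse to "ACGT"
```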

16.
Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At a false-positive rate of 5%, the Rand-shuffle algorithm was 37.5% more sensitive than HMMER alone when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted sampling towards more informative sequence subsets. This approach improved performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improved performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
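
The sub-sampling loop can be sketched as: draw random three-sequence subsets of the parent alignment, score every target with each subset, and combine the results. The abstract does not say how results are combined; keeping the best score per target is an assumption, and `score_fn` is a placeholder for building a profile from the subset and searching with it (for example with HMMER). `toy_score` is purely hypothetical.

```python
import random


def rand_shuffle_scores(alignment, targets, score_fn, n_subsets=50, subset_size=3, seed=0):
    """Best score per target over random subsets of the parent alignment.

    score_fn(subset, target) stands in for profile building plus search;
    combining by maximum is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    best = {target: float("-inf") for target in targets}
    for _ in range(n_subsets):
        subset = rng.sample(alignment, subset_size)
        for target in targets:
            best[target] = max(best[target], score_fn(subset, target))
    return best


def toy_score(subset, target):
    """Hypothetical score: highest fraction identity between target and any subset member."""
    return max(sum(a == b for a, b in zip(seq, target)) / len(target) for seq in subset)


alignment = ["AAAA", "AAAC", "CCCC", "CCCA"]
print(rand_shuffle_scores(alignment, ["AAAA"], toy_score, n_subsets=10))
```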

17.
We have examined the organization of the repeated and single copy DNA sequences in the genomes of two insects, the honeybee (Apis mellifera) and the housefly (Musca domestica). Analysis of the reassociation kinetics of honeybee DNA fragments 330 and 2,200 nucleotides long shows that approximately 90% of the fragments of both sizes consist entirely of non-repeated sequences. Thus, honeybee DNA contains few or no repeated sequences interspersed with non-repeated sequences at distances of less than a few thousand nucleotides. In contrast, the reassociation kinetics of housefly DNA fragments 250 and 2,000 nucleotides long indicate that less than 15% of the longer fragments consist entirely of single copy sequences. A large fraction of the housefly DNA therefore contains repeated sequences spaced less than a few thousand nucleotides apart. Reassociated repetitive DNA from the housefly was treated with S1 nuclease and sized on agarose A-50. The S1-resistant sequences have a bimodal length distribution: 33% are longer than 1,500 nucleotide pairs, and 67% have an average length of about 300 nucleotide pairs. The genome of the housefly appears to have at least 70% of its DNA arranged as short repeats interspersed with single copy sequences, in a pattern qualitatively similar to that of most eukaryotic genomes.
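
Reassociation kinetics separate repeated from single-copy DNA because each kinetic component reassociates at a rate proportional to its copy number. In the ideal second-order model, the single-stranded fraction of a component falls as 1 / (1 + k·C0t), and a genome's curve is the weighted sum over components. The sketch below uses hypothetical component fractions and rate constants, not honeybee or housefly values.

```python
def fraction_single_stranded(c0t, components):
    """Ideal second-order reassociation summed over kinetic components.

    components: (fraction_of_genome, rate_constant) pairs; C0t in mol*s/L.
    Each component contributes fraction / (1 + k * C0t).
    """
    return sum(fraction / (1.0 + k * c0t) for fraction, k in components)


# Hypothetical genome: 10% fast-reassociating repeats, 90% slow single-copy DNA.
components = [(0.10, 100.0), (0.90, 0.01)]
for c0t in (0.0, 0.1, 100.0):
    print(c0t, round(fraction_single_stranded(c0t, components), 3))
# 0.0 1.0
# 0.1 0.908
# 100.0 0.45
```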

18.
The REG homologs, alpha, beta and gamma, activate mammalian proteasomes in distinct ways. REGalpha and REGbeta activate the trypsin-like, chymotrypsin-like and peptidylglutamyl-preferring active sites, whereas REGgamma activates only the proteasome's trypsin-like subunit. The three REG homologs differ in carboxyl-terminal sequences that are located next to activation loops on their proteasome-binding surface. To assess the importance of these carboxyl-terminal sequences in the activation of specific proteasome beta catalytic subunits, we characterized chimeras in which 8 or 12 residues were exchanged among the three proteins. Like the wild-type molecule, REGalpha chimeras activated all three proteasome catalytic subunits regardless of the carboxyl-terminal sequence. However, REGalpha-beta chimeras activated the proteasome at lower concentrations than wild-type REGalpha, whereas higher levels of REGalpha-gamma chimeras were needed for maximal activation. These differences arise because the exchanged carboxyl-terminal sequences can stabilize (REGalpha-beta) or destabilize (REGalpha-gamma) the REGalpha heptamer. REGgamma chimeras were equivalent to REGgamma in their activation properties, but they bound the proteasome less tightly than the wild-type molecule. REGbeta chimeras also bound the proteasome more weakly than wild-type REGbeta and were virtually unable to activate it. Our findings demonstrate that the carboxyl-terminal sequences of REG subunits can affect heptamer stability and proteasome affinity, but they do not determine which proteasome beta subunits become activated.
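
At the sequence level, the chimeras described above exchange a fixed number of carboxyl-terminal residues between two proteins. The sketch assumes a simple swap of the last n residues of equal-length C-terminal stretches; real constructs may be defined from an alignment, and the sequences are toys, not REG proteins.

```python
def swap_c_terminus(recipient, donor, n):
    """Replace the last n residues of `recipient` with the last n residues of `donor`."""
    if not 0 < n <= min(len(recipient), len(donor)):
        raise ValueError("n must be between 1 and the length of the shorter protein")
    return recipient[:-n] + donor[-n:]


alpha_like = "M" + "A" * 19  # toy 20-residue sequences, not real REG proteins
gamma_like = "M" + "G" * 19
for n in (8, 12):
    print(n, swap_c_terminus(alpha_like, gamma_like, n))
```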

19.
DNA contains coding and non-coding regions. The role of the non-coding regions is not known; they are hypothesized to maintain the structure of the DNA. This study aimed to investigate the structure of the non-coding sequences in honey bees using bioinformatics. The non-coding sequences of the mtDNA of three honey bee species, Apis dorsata, Apis florea and Apis cerana, and of ten subspecies of Apis mellifera were investigated. Several techniques were used to explore the non-coding regions of these bees, including sequence analysis, phylogenetic analysis, enzymatic digestion and statistical tests. Variation in size and nucleotide sequence was detected among the studied species and subspecies, but the relative nucleotide abundance was the same in all of them (A was more abundant than T, and G was less abundant than C). The phylogenetic tree based on the non-coding regions was partially consistent with the known phylogenetic relationships among these bees. Enzymatic digestion with four restriction enzymes confirmed the phylogenetic relationships. Statistical analysis based on numerical codes for nucleotides showed no significant differences among the sequences of the studied bees, consistent with the results of neutrality tests. This study suggests that the non-coding regions have the same functional role in all the studied bees, regardless of their length, and do not merely maintain the structure of the DNA. This is one of the first studies to shed light on the non-coding regions of the mtDNA of honey bees.
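
In silico restriction digestion predicts fragment sizes by locating recognition sites and cutting at a fixed offset. The abstract does not name the four enzymes, so EcoRI (G^AATTC) is used here purely for illustration, and the sequence is treated as a linear fragment (for example an amplified non-coding region) even though mtDNA itself is circular.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`.

    Defaults model EcoRI (G^AATTC, cut after the first base on the top strand).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAGAATTCTTTTGAATTCAC"  # toy sequence with two EcoRI sites
print(digest(toy))  # [3, 10, 7]
```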

20.
Since the discovery of microRNA (miRNA)-guided processing, a new type of RNA silencing, it has been proposed that such a mechanism could play a role in virus defense. In this work, we have analyzed whether Plum pox virus (PPV) chimeras bearing miRNA target sequences (miR171, miR167, and miR159), which have been reported to be functional in Arabidopsis, were affected by miRNA function in three different host plants. Some of these PPV chimeras had clearly impaired infectivity compared with those carrying nonfunctional miRNA target sequences. The behavior of the PPV chimeras was similar but not identical in all the plants tested, and the deleterious effect on virus infectivity depended on the miRNA sequence cloned and on the site of insertion in the viral genome. The effect of the miRNA target sequence was drastically alleviated in transgenic plants expressing the silencing suppressor P1/HCPro. Furthermore, we show that virus chimeras readily escape RNA silencing interference through mutations within the miRNA target sequence, which mainly affected nucleotides matching the 5'-terminal region of the miRNA.
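
Mapping an escape mutation back to the miRNA requires the antiparallel pairing rule: in a fully complementary site, target position j (counting 5' to 3') pairs with miRNA position n − j (1-based). The sketch below builds such a site for a hypothetical miRNA and reports which miRNA positions a mutated site leaves unpaired. Positions 2-8 are used as a common "seed" definition, which is an assumption here, and G:U wobble pairs are counted as mismatches for simplicity.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_target(mirna):
    """Target site (5'->3') that is fully complementary to the miRNA."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna))


def mismatched_mirna_positions(mirna, target):
    """1-based miRNA positions whose partner in the target is not Watson-Crick paired.

    Target position j (0-based, 5'->3') pairs with miRNA position len(mirna) - 1 - j.
    G:U wobble pairs are counted as mismatches in this simplified sketch.
    """
    if len(target) != len(mirna):
        raise ValueError("target site must be the same length as the miRNA")
    n = len(mirna)
    return sorted(n - j for j, base in enumerate(target) if COMPLEMENT[mirna[n - 1 - j]] != base)


def touches_seed(positions, seed=range(2, 9)):
    """True if any mismatch falls in miRNA positions 2-8 (a common 'seed' definition)."""
    return any(p in seed for p in positions)


mirna = "AUCGAUCGGCUAAGCUAGCUA"  # hypothetical 21-nt miRNA
site = perfect_target(mirna)
escape = site[:-3] + "A" + site[-2:]  # mutate the target base that pairs with miRNA position 3
positions = mismatched_mirna_positions(mirna, escape)
print(positions, touches_seed(positions))  # [3] True
```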


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417