Similar literature
 20 similar articles found (search time: 15 ms)
1.
Species identification through DNA barcoding or metabarcoding has become a key approach for biodiversity evaluation and ecological studies. However, the rapid accumulation of barcoding data has created some difficulties: for instance, global queries against a large reference library can take a very long time. We here devise a two-step searching strategy to speed up the identification of such queries. It first uses a Hidden Markov Model (HMM) algorithm to narrow the search scope to genus level and then determines the corresponding species using minimum genetic distance. Moreover, using a fuzzy membership function, our approach also estimates the credibility of the assignment result for each query. To perform this task, we developed a new software pipeline, FuzzyID2, using Python and C++. Performance of the new method was assessed using eight empirical data sets ranging from 70 to 234,535 barcodes. Five data sets (four animal, one plant) deployed the conventional barcode approach, one used metabarcodes, and two were eDNA-based. The results showed mean accuracies of generic and species identification of 98.60% (with a minimum of 95.00% and a maximum of 100.00%) and 94.17% (with a range of 84.40%–100.00%), respectively. Tests with simulated NGS sequences based on realistic eDNA and metabarcode data demonstrated that FuzzyID2 achieved a significantly higher identification success rate than the commonly used BLAST method, and that the TIPP method tends to find many fewer species than either FuzzyID2 or BLAST. Furthermore, data sets with tens of thousands of barcodes need only a few seconds for each query assignment using FuzzyID2. Our approach provides an efficient and accurate species identification protocol for biodiversity-related projects with large DNA sequence data sets.

2.
In order to use DNA sequences for specimen identification (e.g., barcoding, fingerprinting), an algorithm to compare query sequences with a reference database is needed. Precision and accuracy of query sequence identification were estimated for hierarchical clustering (parsimony and neighbor joining), similarity methods (BLAST, BLAT and megaBLAST), combined clustering/similarity methods (BLAST/parsimony and BLAST/neighbor joining), diagnostic methods (DNA–BAR and DOME ID), and a new method (ATIM). We offer two novel alignment-free algorithmic solutions (DOME ID and ATIM) to identify query sequences for the purposes of DNA barcoding. Publicly available gymnosperm nrITS2 and plastid matK sequences were used as test data sets. On the test data sets, almost all of the methods were able to accurately identify sequences to genus; however, no method was able to accurately identify query sequences to species at a frequency that would be considered useful for routine specimen identification (42–71% unambiguously correct). Clustering methods performed the worst (perhaps due to alignment issues). Similarity methods, ATIM, DNA–BAR, and DOME ID all performed at approximately the same level. Given the relative precision of the algorithms (median = 67% unambiguous), the low accuracy of species-level identification observed could be ascribed to the lack of correspondence between patterns of allelic similarity and species delimitations. Application of DNA barcoding to sequences of CITES-listed cycads (Cycadopsida) provides an example of the potential application of DNA barcoding to enforcement of conservation laws. © The Willi Hennig Society 2006.

3.
DNA barcoding as a method for species identification is rapidly increasing in popularity. However, there are still relatively few rigorous methodological tests of DNA barcoding. Current distance-based methods are frequently criticized for treating the nearest neighbor as the closest relative via a raw similarity score, for lacking an objective set of criteria to delineate taxa, or for being incongruent with classical character-based taxonomy. Here, we propose an artificial intelligence-based approach, inferring species membership via DNA barcoding with back-propagation neural networks (named BP-based species identification), as a new addition to the spectrum of available methods. We demonstrate the value of this approach with simulated data sets representing different levels of sequence variation under coalescent simulations with various evolutionary models, as well as with two empirical data sets of COI sequences from East Asian ground beetles (Carabidae) and Costa Rican skipper butterflies. With a 630- to 690-bp fragment of the COI gene, we identified 97.50% of 80 unknown sequences of ground beetles, and 95.63%, 96.10%, and 100% of 275, 205, and 9 unknown sequences of the neotropical skipper butterfly, respectively, to their correct species. Our simulation studies indicate that the success rates of species identification depend on the divergence of sequences, the length of sequences, and the number of reference sequences. Particularly in cases involving incomplete lineage sorting, this new BP-based method appears to be superior to commonly used methods for DNA-based species identification.

4.
5.
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern and it has been criticized for its local optimization. More accurate software, however, currently requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is utilized to reduce the search space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than BLASTn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform is able to deal with both large-scale and multilocus barcoding data with accuracy and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/ .
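
The two-stage idea in the abstract above (alignment-free screening, then alignment-based nearest neighbour) can be sketched roughly as follows. This is an illustrative sketch, not the VIP Barcoding implementation: plain k-mer frequencies with cosine similarity stand in for the composition vector method, the references are assumed to be pre-aligned to the query, and the names `kmer_vector`, `cosine`, `k2p_distance`, `identify` and the parameter `shortlist_size` are hypothetical. Only the Kimura 2-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), is standard.

```python
import math
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def kmer_vector(seq, k=3):
    """Relative k-mer frequencies of an ungapped sequence (simplified stand-in for a composition vector)."""
    seq = seq.replace("-", "")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(w * v.get(kmer, 0.0) for kmer, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned sequences; gapped or ambiguous sites are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return math.inf
    diffs = [{x, y} for x, y in pairs if x != y]
    transitions = sum(1 for d in diffs if d <= PURINES or d <= PYRIMIDINES)
    p = transitions / len(pairs)                 # proportion of transitions
    q = (len(diffs) - transitions) / len(pairs)  # proportion of transversions
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return math.inf  # too divergent for the formula to be defined
    return -0.5 * math.log(term1 * math.sqrt(term2))

def identify(query, references, shortlist_size=2, k=3):
    """Stage 1: keep the references with the most similar k-mer composition.
    Stage 2: return the (species, sequence) pair in that shortlist with the smallest K2P distance."""
    qv = kmer_vector(query, k)
    ranked = sorted(references, key=lambda ref: cosine(qv, kmer_vector(ref[1], k)), reverse=True)
    return min(ranked[:shortlist_size], key=lambda ref: k2p_distance(query, ref[1]))
```

The point of the split is that the cheap composition comparison touches every reference, while the alignment-dependent distance is computed only for the shortlist.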

6.
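DNA barcoding is a technique that uses relatively short, standardized DNA fragments to identify species rapidly and accurately. At the molecular level, it can compensate for some of the shortcomings of traditional identification methods. The technique is broadly applicable and speeds up species identification, and it has been widely used in studies identifying animal species. In recent years, with the rapid development of DNA barcoding research on medicinal plants, a well-developed system for identifying medicinal plants and plant-derived Chinese medicinal materials has gradually taken shape. This paper reviews the principles of identifying medicinal plants by DNA barcoding; describes the traditional methods of identifying Chinese herbal medicines and their limitations, the significance of using DNA barcoding to identify plant-derived medicinal materials, and the applications of DNA barcoding in medicinal plant identification; and discusses the prospects for its application.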

7.
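A DNA barcode is a short, standardized DNA sequence, and DNA barcoding identifies species effectively by analysing such sequences. As large numbers of barcode sequences have been determined, methods for analysing them have developed rapidly, promoting their use in the molecular identification of organisms. Since 2003, DNA barcoding has been widely applied to the identification of animals, plants, fungi and other species, and has strongly advanced taxonomy, biodiversity research, ecology and related disciplines. Building on a review of DNA barcoding technology, this paper summarizes five main classes of DNA barcode analysis methods (those based on genetic distance, genetic similarity, phylogenetic trees, sequence characters, and statistical classification) and looks ahead to the further development and application of DNA barcoding.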

8.
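DNA barcoding uses a short, standard DNA sequence to identify species rapidly. Compared with traditional classification and identification based on external plant morphology, DNA barcoding is efficient and accurate, and lends itself to automation and standardization. More than 70% of the species of Pedicularis L. with opposite (whorled) leaves occur in China; closely related species are morphologically very similar and difficult to identify. This study sampled 164 accessions of 43 species from the opposite- (whorled-) leaved group of Pedicularis and used barcode fragments from chloroplast genes (rbcL, matK, trnH-psbA) and a nuclear gene (ITS), applying tree-based and distance-based methods to test how well the four barcodes identify these species. The discrimination rates of ITS were 81.40% with the tree-based method and 89.57% with the distance-based method, higher than those of the three chloroplast fragments and of any combination of fragments. In addition, ITS resolved the classification of several problematic species. The practicality of DNA barcoding in Pedicularis strongly supports rapid and accurate species identification in the next-generation flora (iFlora), and can provide important information for taxonomy, ecology, evolutionary biology, population genetics, conservation genetics and related disciplines.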

9.
DNA barcoding has become a promising means for the identification of organisms of all life-history stages. Currently, distance-based and tree-based methods are most widely used to define species boundaries and uncover cryptic species. However, there is no universal threshold of genetic distance values that can be used to distinguish taxonomic groups. Alternatively, DNA barcoding can deploy a "character-based" method, whereby species are identified through discrete nucleotide substitutions. Our research focuses on the delimitation of moth species using DNA-barcoding methods. We analyzed 393 Lepidopteran specimens belonging to 80 morphologically recognized species with a standard cytochrome c oxidase subunit I (COI) sequencing approach, and deployed tree-based, distance-based, and diagnostic character-based methods to identify the taxa. The tree-based method divided the 393 specimens into 79 taxa (species), and the distance-based method divided them into 84 taxa (species). Although the diagnostic character-based method found only 39 so-identifiable species among the 80 species, the accuracy rate improved substantially with a reduction in sample size. For example, in the Arctiidae subset, all 12 species had diagnostic characters. Compared with the traditional morphological method, molecular taxonomy performed well. All three methods enable the rapid delimitation of species, although they have different characteristics and different strengths. The tree-based and distance-based methods can be used for accurate species identification and biodiversity studies in large data sets, while the character-based method performs well in small data sets and can also be used as the foundation of species-specific biochips.

10.
Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archaeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used "1-nearest-neighbor" (1-NN) method, which assigns the taxonomic information of the most similar sequences in a reference database (i.e., BLAST-top-hit reference sequence) to a query, displays the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined. Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research.
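
The behaviour of 1-NN described above, where a query is always assigned to its most similar reference even when its own species is missing from the database, can be illustrated with a small sketch. The uncorrected p-distance, the function names, and the 3% cutoff in `assign_with_cutoff` are assumptions made for illustration. The cutoff variant is a generic safeguard, not the QCauto method, whose details the abstract does not give.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def one_nn(query, references):
    """1-nearest-neighbour: return (species, distance) of the most similar reference.
    It always returns some species, however distant the best hit is."""
    return min(((sp, p_distance(query, seq)) for sp, seq in references), key=lambda hit: hit[1])

def assign_with_cutoff(query, references, max_distance=0.03):
    """Hypothetical variant: refuse to assign when even the best hit is too distant."""
    species, distance = one_nn(query, references)
    return species if distance <= max_distance else None

references = [("sp_A", "AAAAAAAAAA"), ("sp_B", "AAAAACCCCC")]
present = "AAAAAAAAAA"   # query whose species is in the reference set
absent = "GGGGGAAAAA"    # query from a species with no reference sequence
print(one_nn(present, references))
print(one_nn(absent, references))
print(assign_with_cutoff(absent, references))
```

With the toy data, the second query is half different from its best hit, yet plain 1-NN still names a species for it; that is the misidentification risk when reference coverage is incomplete.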

11.
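DNA barcoding uses a short, standard DNA sequence to identify species rapidly. Compared with traditional classification and identification based on external plant morphology, DNA barcoding is efficient and accurate, and lends itself to automation and standardization. More than 70% of the species of Pedicularis L. with opposite (whorled) leaves occur in China; closely related species are morphologically very similar and difficult to identify. This study sampled 164 accessions of 43 species from the opposite- (whorled-) leaved group of Pedicularis and used barcode fragments from chloroplast genes (rbcL, matK, trnH-psbA) and a nuclear gene (ITS), applying tree-based and distance-based methods to test how well the four barcodes identify these species. The discrimination rates of ITS were 81.40% with the tree-based method and 89.57% with the distance-based method, higher than those of the three chloroplast fragments and of any combination of fragments. In addition, ITS resolved the classification of several problematic species. The practicality of DNA barcoding in Pedicularis strongly supports rapid and accurate species identification in the next-generation flora (iFlora), and can provide important information for taxonomy, ecology, evolutionary biology, population genetics, conservation genetics and related disciplines.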

12.
Eight years after DNA barcoding was formally proposed on a large scale, CO1 sequences are rapidly accumulating from around the world. While studies to date have mostly targeted local or regional species assemblages, the recent launch of the global iBOL project (International Barcode of Life) highlights the need to understand the effects of geographical scale on barcoding's goals. Sampling has been central in the debate on DNA barcoding, but the effect of the geographical scale of sampling has not yet been thoroughly and explicitly tested with empirical data. Here, we present a CO1 data set of aquatic predaceous diving beetles of the tribe Agabini, sampled throughout Europe, and use it to investigate how the geographic scale of sampling affects (1) the estimated intraspecific variation of species, (2) the genetic distance to the most closely related heterospecific, (3) the ratio of intraspecific and interspecific variation, (4) the frequency of taxonomically recognized species found to be monophyletic, and (5) query identification performance based on six different species assignment methods. Intraspecific variation was significantly correlated with the geographical scale of sampling (R-squared = 0.7), and more than half of the species with 10 or more sampled individuals (N = 29) showed higher intraspecific variation than 1% sequence divergence. In contrast, the distance to the closest heterospecific showed a significant decrease with increasing geographical scale of sampling. The average genetic distance dropped from > 7% for samples within 1 km, to < 3.5% for samples up to > 6000 km apart. Over a third of the species were not monophyletic, and the proportion increased through locally, nationally, regionally, and continentally restricted subsets of the data. The success of identifying queries decreased with increasing spatial scale of sampling; liberal methods declined from 100% to around 90%, whereas strict methods dropped to below 50% at continental scales. The proportion of query identifications considered uncertain (more than one species < 1% distance from query) escalated from zero at local scale to 50% at continental scale. Finally, by resampling the most widely sampled species, we show that even if samples are collected to maximize the geographical coverage, up to 70 individuals are required to sample 95% of intraspecific variation. The results show that the geographical scale of sampling has a critical impact on the global application of DNA barcoding. Scale effects result from the relative importance of different processes determining the composition of regional species assemblages (dispersal and ecological assembly) and global clades (demography, speciation, and extinction). The incorporation of geographical information, where available, will be required to obtain identification rates at global scales equivalent to those in regional barcoding studies. Our results hence provide an impetus for both smarter barcoding tools and sprouting national barcoding initiatives: smaller geographical scales deliver higher accuracy.
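
The two quantities tracked in the abstract above, the largest intraspecific distance and the distance to the nearest heterospecific, can be computed per species as in the sketch below. It assumes aligned, equal-length sequences and an uncorrected p-distance. The function name `barcode_gap` and the toy "local" and "wide" samples are hypothetical, not data from the study.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_gap(samples):
    """samples: list of (species, aligned_sequence).
    Returns {species: (max intraspecific distance, min distance to any heterospecific)}."""
    result = {}
    for species in {name for name, _ in samples}:
        own = [seq for name, seq in samples if name == species]
        others = [seq for name, seq in samples if name != species]
        max_intra = max((p_distance(a, b) for a, b in combinations(own, 2)), default=0.0)
        min_inter = min((p_distance(a, b) for a in own for b in others), default=float("inf"))
        result[species] = (max_intra, min_inter)
    return result

# Toy data: adding geographically wider samples introduces new haplotypes.
local = [("sp_A", "AAAAAAAAAA"), ("sp_A", "AAAAAAAAAA"), ("sp_B", "CCCCCAAAAA")]
wide = local + [("sp_A", "CAAAAAAAAA"), ("sp_B", "CCCAAAAAAA")]
print(barcode_gap(local)["sp_A"])
print(barcode_gap(wide)["sp_A"])
```

In the toy data the wider sample both raises the intraspecific maximum and lowers the nearest heterospecific distance for sp_A, the same direction of change the study reports for increasing geographical scale.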

13.
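DNA barcoding is a molecular classification method that has developed rapidly and been widely applied to species identification in recent years. This study analysed barcode fragments of the mitochondrial cytochrome c oxidase subunit I (COI) gene from 61 individuals of 32 bird species in 27 genera from China, using threshold, clustering and diagnostic-nucleotide methods to assess the accuracy of DNA barcoding for identifying Chinese birds. Intraspecific COI sequence variation was very small, whereas many variable sites existed between species; interspecific genetic distances were significantly greater than intraspecific distances, and the DNA barcode sequences identified all of the birds studied.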

14.
The majority of the available methods for the molecular identification of species use pairwise sequence divergences between the query and reference sequences (DNA barcoding). The presence of multiple insertions and deletions (indels) in the target genomic regions is generally regarded as a problem, as it introduces ambiguities in sequence alignments. However, we have recently shown that a high level of species discrimination is attainable in all taxa of life simply by considering the length of hypervariable regions defined by indel variants. Each species is tagged with a numeric profile of fragment lengths: a true numeric barcode. In this study, we describe a multifunctional computational workbench (named SPInDel for SPecies Identification by Insertions/Deletions) to assist researchers using variable-length DNA sequences, and we demonstrate its applicability in molecular ecology. The SPInDel workbench provides a step-by-step environment for the alignment of target sequences, selection of informative hypervariable regions, design of PCR primers and the statistical validation of the species-identification process. In our test data sets, we were able to discriminate all species from two genera of frogs (Ansonia and Leptobrachium) inhabiting lowland rainforests and mountain regions of South-East Asia and species from the most common genus of coral reef fishes (Apogon). Our method can complement conventional DNA barcoding systems when indels are common (e.g. in rRNA genes) without the required step of DNA sequencing. The executable files, source code, documentation and test data sets are freely available at http://www.portugene.com/SPInDel/SPInDel_webworkbench.html .
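
The "numeric profile of fragment lengths" described above can be sketched as follows. This is a minimal illustration rather than the SPInDel workbench's own code: it assumes the hypervariable regions have already been chosen and are given as alignment column ranges, and the names `numeric_profile` and `group_by_profile` and the toy alignment are hypothetical.

```python
def numeric_profile(aligned_seq, regions):
    """Length of each hypervariable region once alignment gaps are ignored.
    regions: list of (start, end) alignment columns, end exclusive."""
    return tuple(sum(1 for base in aligned_seq[start:end] if base != "-") for start, end in regions)

def group_by_profile(alignment, regions):
    """Map each numeric profile to the species carrying it.
    Species that share a profile cannot be told apart by fragment lengths alone."""
    groups = {}
    for species, seq in alignment.items():
        groups.setdefault(numeric_profile(seq, regions), []).append(species)
    return groups

# Toy alignment: columns 4-5 and column 11 are the (assumed) hypervariable regions.
alignment = {
    "sp_A": "ACGT--TTGCA-AC",
    "sp_B": "ACGTAATTGCATAC",
    "sp_C": "ACGT-ATTGCA-AC",
}
regions = [(4, 6), (11, 12)]
for profile, species in group_by_profile(alignment, regions).items():
    print(profile, species)
```

Because only lengths are compared, such a profile could in principle be read from fragment sizes rather than from sequence, which is what the abstract means by avoiding the sequencing step.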

15.
The globalization of commerce carries with it significant biological risks concerning the spread of harmful organisms. International Standards for Phytosanitary Measures (ISPM) No. 27, "Diagnostic Protocols for Regulated Pests", sets out the standards governing protocols for the detection and identification of plant pest species. We argue that DNA barcoding (the use of short, standardized DNA sequences for species identification) is a methodology which should be incorporated into standard diagnostic protocols, as it holds great promise for the rapid identification of species of economic importance, notably arthropods. With a well-defined set of techniques and rigorous standards of data quality and transparency, DNA barcoding already meets or exceeds the minimum standards required for diagnostic protocols under ISPM No. 27. We illustrate the relevance of DNA barcoding to phytosanitary concerns and advocate the development of policy at the national and international levels to expand the scope of barcode coverage for arthropods globally.

16.
This paper describes a method for growing a recurrent neural network of fuzzy threshold units for the classification of feature vectors. Fuzzy networks seem natural for performing classification, since classification is concerned with set membership and objects generally belong to sets to various degrees. A fuzzy unit in the architecture proposed here determines the degree to which the input vector lies in the fuzzy set associated with the fuzzy unit. This is in contrast to perceptrons, which determine the correlation between the input vector and a weighting vector. The resulting membership value, in the case of the fuzzy unit, is compared with a threshold, which is interpreted as a membership value. Training of a fuzzy unit is based on an algorithm for linear inequalities similar to Ho-Kashyap recording. These fuzzy threshold units are fully connected in a recurrent network. The network grows as it is trained. The advantages of the network and its training method are: (1) the network is allowed to grow to the required size, which is generally much smaller than the size of the network that would be obtained otherwise, implying better generalization, smaller storage requirements and fewer calculations during classification; (2) the training time is extremely short; (3) recurrent networks such as this one are generally readily implemented in hardware; (4) classification accuracy obtained on several standard data sets is better than that obtained by the majority of other standard methods; and (5) the use of fuzzy logic is very intuitive since class membership is generally fuzzy.

17.
A decade ago, DNA barcoding was proposed as a standardised method for identifying existing species and speeding the discovery of new species. Yet, despite its numerous successes across a range of taxa, its frequent failures have brought into question its accuracy as a short-cut taxonomic method. We use a retrospective approach, applying the method to the classification of New Zealand skinks as it stood in 1977 (primarily based upon morphological characters), and compare it to the current taxonomy reached using both morphological and molecular approaches. For the 1977 dataset, DNA barcoding had moderate-high success in identifying specimens (78-98%), and correctly flagging specimens that have since been confirmed as distinct taxa (77-100%). But most matching methods failed to detect the species complexes that were present in 1977. For the current dataset, there was moderate-high success in identifying specimens (53-99%). For both datasets, the capacity to discover new species was dependent on the methodological approach used. Species delimitation in New Zealand skinks was hindered by the absence of either a local or global barcoding gap, a result of recent speciation events and hybridisation. Whilst DNA barcoding is potentially useful for specimen identification and species discovery in New Zealand skinks, its error rate could hinder the progress of documenting biodiversity in this group. We suggest that integrated taxonomic approaches are more effective at discovering and describing biodiversity.

18.
Zou S, Li Q, Kong L, Yu H, Zheng X. PLoS ONE 2011, 6(10): e26619

Background

DNA barcoding has recently been proposed as a promising tool for rapid species identification in a wide range of animal taxa. Two broad methods (distance-based and monophyly-based) have been used. The first is based on the degree of DNA sequence variation within and between species, while the second requires the recovery of species as discrete clades (monophyly) on a phylogenetic tree. Nevertheless, some issues complicate the use of both methods. A more recently applied technique, the character-based DNA barcode method, instead characterizes species through a unique combination of diagnostic characters.

Methodology/Principal Findings

Here we analyzed 108 COI and 102 16S rDNA sequences of 40 species of Neogastropoda from a wide phylogenetic range to assess the performance of distance, monophyly and character-based methods of DNA barcoding. The distance-based method for both COI and 16S rDNA genes performed poorly in terms of species identification. Obvious overlap between intraspecific and interspecific divergences for both genes was found. The “10× rule” threshold resulted in lumping about half of distinct species for both genes. The neighbour-joining phylogenetic tree of COI could distinguish all species studied. However, the 16S rDNA tree could not distinguish some closely related species. In contrast, the character-based barcode method for both genes successfully identified 100% of the neogastropod species included, and performed well in discriminating neogastropod genera.

Conclusions/Significance

The present study demonstrates the effectiveness of the character-based barcoding method for species identification at different taxonomic levels, especially for discriminating closely related species. While distance and monophyly-based methods commonly use COI as the ideal gene for barcoding, the character-based approach can perform well for species identification using relatively conserved gene markers (e.g., 16S rDNA in this study). Nevertheless, distance and monophyly-based methods, especially the monophyly-based method, can still be used to flag species.
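
The character-based idea discussed in this abstract can be illustrated with a simple search for single-site diagnostic characters: alignment positions where one species is fixed for a base that no other species carries. This sketch is deliberately narrower than the method in the study, which relies on unique combinations of characters; it assumes aligned, equal-length sequences, and the function name and toy alignment are hypothetical.

```python
def diagnostic_characters(alignment):
    """alignment: {species: [aligned sequences]}.
    Returns {species: [(position, base), ...]} listing positions where the species
    is fixed for one base and that base occurs in no other species."""
    length = len(next(iter(alignment.values()))[0])
    result = {species: [] for species in alignment}
    for pos in range(length):
        # Set of bases observed at this column, per species.
        states = {species: {seq[pos] for seq in seqs} for species, seqs in alignment.items()}
        for species, own in states.items():
            if len(own) != 1:
                continue  # polymorphic within the species, so not diagnostic
            base = next(iter(own))
            if all(base not in other for name, other in states.items() if name != species):
                result[species].append((pos, base))
    return result

alignment = {
    "sp_A": ["ACGTA", "ACGTA"],
    "sp_B": ["ACTTA", "ACTTG"],
    "sp_C": ["GCGTC", "GCGTC"],
}
for species, characters in diagnostic_characters(alignment).items():
    print(species, characters)
```

In the toy alignment sp_A has no single-site diagnostic even though its sequences differ from both other species, which is why combinations of characters are needed in practice.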

19.
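Species of Tacca differ little in morphology, which makes their classification difficult. DNA barcoding is a method that uses short, standard DNA fragments to identify and discover species. This study examined DNA barcodes for six species of Tacca using the nuclear ITS fragment and the chloroplast fragments trnH-psbA, rbcL and matK, analysing the usability of the four fragments, intra- and interspecific variation, and the barcode gap, and comparing the identification power of the different sequences with two methods, tree-based and BBA. Among single fragments, ITS gave the highest rate of correct identification; among fragment combinations, rbcL+matK gave the highest. The results support the rbcL+matK combination recommended by the CBOL Plant Working Group as a standard barcode for identifying Tacca species, and ITS is proposed as a candidate barcode. The population of Tacca integrifolia collected in Tibet formed a genetic lineage distinct from the Malaysian population, and the two also differ somewhat in morphology, so the Tibetan population is very likely a new species.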

20.
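This study examines how the choice of data included in an analysis affects the identification power of the ITS2 sequence as a DNA barcode in Cucurbitaceae. Three datasets of Cucurbitaceae ITS2 sequences were assembled: Dataset1 comprised the experimental samples, Dataset2 combined the experimental samples with samples from GenBank, and Dataset3 was obtained by removing some sequences from Dataset2. Inter- and intraspecific variation, the barcoding gap, and identification success rates were compared across the three datasets to evaluate the effect of data selection on the identification power of ITS2. At the genus level, ITS2 achieved 100% identification success in all three datasets. At the species level, success rates were 100%, 67.8% and 90.6% with the BLAST1 method and 100%, 52.5% and 66.5% with the Nearest Distance method. Differences in which data are included can therefore change identification success substantially. Of the three datasets, only Dataset2 lacked a clear barcoding gap for ITS2. Criteria for including data in DNA barcoding analyses therefore deserve further study.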
