Similar articles
1.
Genotyping errors are present in almost all genetic data and can affect biological conclusions of a study, particularly for studies based on individual identification and parentage. Many statistical approaches can incorporate genotyping errors, but usually need accurate estimates of error rates. Here, we used a new microsatellite data set developed for brown rockfish (Sebastes auriculatus) to estimate genotyping error using three approaches: (i) repeat genotyping 5% of samples, (ii) comparing unintentionally recaptured individuals and (iii) Mendelian inheritance error checking for known parent–offspring pairs. In each data set, we quantified genotyping error rate per allele due to allele drop-out and false alleles. Genotyping error rate per locus revealed an average overall genotyping error rate by direct count of 0.3%, 1.5% and 1.7% (0.002, 0.007 and 0.008 per allele error rate) from replicate genotypes, known parent–offspring pairs and unintentionally recaptured individuals, respectively. By direct-count error estimates, the recapture and known parent–offspring data sets revealed an error rate four times greater than estimated using repeat genotypes. There was no evidence of correlation between error rates and locus variability for all three data sets, and errors appeared to occur randomly over loci in the repeat genotypes, but not in recaptures and parent–offspring comparisons. Furthermore, there was no correlation in locus-specific error rates between any two of the three data sets. Our data suggest that repeat genotyping may underestimate true error rates and may not estimate locus-specific error rates accurately. We therefore suggest using methods for error estimation that correspond to the overall aim of the study (e.g. known parent–offspring comparisons in parentage studies).
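As an illustration of the direct-count idea in this abstract, the following minimal Python sketch tallies allele drop-out and false alleles between a reference genotype and a repeat genotype. The classification rule, the function names and the example genotypes are assumptions made for illustration; the abstract does not specify the authors' exact counting procedure.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # the two allele sizes scored at one locus


def classify_mismatch(reference: Genotype, repeat: Genotype) -> Tuple[int, int]:
    """Return (dropouts, false_alleles) for one locus.

    Assumption: the reference genotype is treated as the truth.
    """
    ref, rep = set(reference), set(repeat)
    if ref == rep:
        return (0, 0)
    # Drop-out: a heterozygote is re-scored as a homozygote for one of its alleles.
    if len(ref) == 2 and len(rep) == 1 and rep <= ref:
        return (1, 0)
    # Otherwise, every allele in the repeat that is absent from the reference
    # is counted as a false allele.
    return (0, len(rep - ref))


def per_allele_error_rates(
    pairs: List[Tuple[Genotype, Genotype]]
) -> Tuple[float, float, float]:
    """Return (dropout rate, false-allele rate, total rate) per allele."""
    if not pairs:
        raise ValueError("need at least one genotype pair")
    dropouts = false_alleles = 0
    for reference, repeat in pairs:
        d, f = classify_mismatch(reference, repeat)
        dropouts += d
        false_alleles += f
    alleles = 2 * len(pairs)  # two alleles compared per single-locus genotype
    return dropouts / alleles, false_alleles / alleles, (dropouts + false_alleles) / alleles


# Hypothetical single-locus comparisons: (reference, repeat)
example_pairs = [
    ((100, 104), (104, 100)),  # match (allele order is ignored)
    ((100, 104), (100, 100)),  # drop-out of allele 104
    ((110, 110), (110, 114)),  # false allele 114
    ((120, 124), (120, 124)),  # match
]
print(per_allele_error_rates(example_pairs))
```

The same function could be applied to recapture pairs or parent–offspring pairs, although Mendelian checks on parent–offspring data would need a different mismatch rule than the one assumed here.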

2.
Abstract Genotyping error, often associated with low-quantity/quality DNA samples, is an important issue when using genetic tags to estimate abundance using capture-mark-recapture (CMR). dropout, an MS-Windows program, identifies both loci and samples that likely contain errors affecting CMR estimates. dropout uses a 'bimodal test', which enumerates the number of loci that differ between each pair of samples, and a 'difference in capture history test' (DCH) to determine those loci producing the most errors. Importantly, the DCH test allows one to determine that a data set is error-free. dropout has been evaluated in McKelvey & Schwartz (2004) and is now available online.
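The counting step behind the 'bimodal test' described above can be sketched in a few lines of Python. This is only an illustration of enumerating locus differences between every pair of samples; it is not the dropout program's code, and the data layout and function names are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]


def loci_differences(a: List[Genotype], b: List[Genotype]) -> int:
    """Number of loci at which two multilocus genotypes differ (allele order ignored)."""
    return sum(1 for ga, gb in zip(a, b) if sorted(ga) != sorted(gb))


def pairwise_difference_histogram(samples: Dict[str, List[Genotype]]) -> Counter:
    """Histogram of the number of differing loci over all pairs of samples."""
    counts: Counter = Counter()
    for (_, a), (_, b) in combinations(samples.items(), 2):
        counts[loci_differences(a, b)] += 1
    return counts


# Hypothetical three-locus genotypes; s1 and s2 differ at a single locus,
# which is the kind of near-match that may indicate a genotyping error.
samples = {
    "s1": [(1, 2), (3, 4), (5, 5)],
    "s2": [(2, 1), (3, 4), (5, 6)],
    "s3": [(7, 8), (9, 9), (6, 6)],
}
histogram = pairwise_difference_histogram(samples)
print(sorted(histogram.items()))
```

In an error-free data set, pairs from different individuals should differ at many loci, so an excess of pairs differing at only one or two loci is the signal of interest.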

3.
Assessments of the effects of gene tree error in coalescent analyses have widely ignored coalescent branch lengths (CBLs) despite their potential utility in estimating ancestral population demographics and detecting species tree anomaly zones. However, the ability of coalescent methods to obtain accurate estimates remains largely unexplored. Errors in gene trees should lead to underestimates of the true CBL, and for a given set of comparisons, longer CBLs should be more accurate. Here, we furthered our empirical understanding of how gene tree quality (i.e., locus informativeness and gene tree resolution) affects CBLs using four datasets comprised of ultraconserved elements (UCE) or exons for clades that exhibit wide ranges of branch lengths. For each dataset, we compared the impact of locus informativeness (assessed using number of parsimony-informative sites) and gene tree resolution on CBL estimates. Our results, in general, showed that CBLs were drastically shorter when estimates included loci with low informativeness. Gene tree resolution also had an impact on UCE datasets, with polytomous gene trees producing longer branches than randomly resolved gene trees. However, resolution did not appear to affect CBL estimates from the more informative exon datasets. Thus, as expected, gene tree quality affects CBL estimates, though this can generally be minimized by using moderate filtering to select more informative loci and/or by allowing polytomies in gene trees. These approaches, as well as additional contributions to improve CBL estimation, should lead to CBLs that are useful for addressing evolutionary and biological questions.

4.
EST clustering error evaluation and correction (cited 4 times: 0 self-citations, 4 by others)
MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of obtained unique genes, have become a major obstacle in the analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of EST clustering error, namely Type I and Type II, in EST clustering using the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error to provide more accurate estimates of the true gene cluster profile.

5.
Nine polymorphic microsatellite markers were identified by screening of 2464 ESTs derived from a cDNA library of Atlantic cod (Gadus morhua L.). About 35 novel microsatellite loci were selected and characterised in 96 individual cod. Nine markers were successfully amplified, with 3 to 18 alleles per locus, and the average heterozygosity was 0.57 in the panel examined (range 0.29–0.86). All loci followed the Hardy–Weinberg expectation and no significant linkage disequilibrium was found in a test including all pairwise combinations. The gene identity was determined at four of the loci, confirming the associated microsatellites as Type I markers.

6.
ABSTRACT Use of non-invasive sources of DNA, such as hair or scat, to obtain a genetic mark for population estimates is becoming commonplace. Unfortunately, with such marks, potentials for genotyping errors and for the shadow effect have resulted in use of many loci and amplification of each specimen many times at each locus, drastically increasing time and cost of obtaining a population estimate. We proposed a method, the Genotyping Uncertainty Added Variance Adjustment (GUAVA), which statistically adjusts for genotyping errors and the shadow effect, thereby allowing use of fewer loci and one amplification of each specimen per locus. Using allele frequencies and estimates of genotyping error rates, we determined, for each pair of specimens, the probability that the pair was obtained from the same individual, whether or not their observed genotypes match. Using these probabilities, we reconstructed possible capture history matrices and used this distribution to obtain a population estimate. With simulated data, we consistently found our estimates had lower bias and smaller variance than estimates based on single amplifications in which genotyping error was ignored and that were comparable to estimates based on data free of genotyping errors. We also demonstrated the method on a fecal DNA data set from a population of red wolves (Canis rufus). The GUAVA estimate based on single-amplification genotypes compares favorably to the estimate based on consensus genotypes. A program to conduct the analysis is available from the first author for UNIX or Windows platforms. Application of GUAVA may allow for increased accuracy in population estimates at reduced cost.

7.
Summary. Sampling DNA noninvasively has advantages for identifying animals for uses such as mark–recapture modeling that require unique identification of animals in samples. Although it is possible to generate large amounts of data from noninvasive sources of DNA, a challenge is overcoming genotyping errors that can lead to incorrect identification of individuals. A major source of error is allelic dropout, which is failure of DNA amplification at one or more loci. This has the effect of heterozygous individuals being scored as homozygotes at those loci as only one allele is detected. If errors go undetected and the genotypes are naively used in mark–recapture models, significant overestimates of population size can occur. To avoid this it is common to reject low-quality samples but this may lead to the elimination of large amounts of data. It is preferable to retain these low-quality samples as they still contain usable information in the form of partial genotypes. Rather than trying to minimize error or discarding error-prone samples we model dropout in our analysis. We describe a method based on data augmentation that allows us to model data from samples that include uncertain genotypes. Application is illustrated using data from the European badger (Meles meles).

8.
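Early surveys of the Chinese white dolphin relied mainly on line-transect methods to describe its distribution and abundance, whereas recent studies have more often adopted mark-recapture methods to obtain information on population dynamics. Because it is based on individual identification, the latter approach can yield a range of population parameters for population viability analysis. This paper reviews progress in research on the population dynamics of Chinese white dolphins in Chinese waters and the accumulation of mark-recapture data for the populations in each region; uses data simulation to assess how survey effort affects the error and bias of population size estimates; discusses the potential effects of field survey design, mark selection and data processing on data analysis; emphasizes the importance of goodness-of-fit testing and model selection; and, finally, offers our views on common misunderstandings that arise when comparing population information obtained in different periods or by different methods. The paper aims to help improve future monitoring of Chinese white dolphins in China.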

9.
In noninvasive genetic sampling, when genotyping error rates are high and recapture rates are low, misidentification of individuals can lead to overestimation of population size. Thus, estimating genotyping errors is imperative. Nonetheless, conducting multiple polymerase chain reactions (PCRs) at multiple loci is time-consuming and costly. To address the controversy regarding the minimum number of PCRs required for obtaining a consensus genotype, we compared, consumer-style, the performance of two genotyping protocols (multiple-tubes and 'comparative method') with respect to genotyping success and error rates. Our results from 48 faecal samples of river otters (Lontra canadensis) collected in Wyoming in 2003, and from blood samples of five captive river otters amplified with four different primers, suggest that use of the comparative genotyping protocol can minimize the number of PCRs per locus. For all but five samples at one locus, the same consensus genotypes were reached with fewer PCRs and with reduced error rates with this protocol compared to the multiple-tubes method. This finding is reassuring because genotyping errors can occur at relatively high rates even in tissues such as blood and hair. In addition, we found that loci that amplify readily and yield consensus genotypes may still exhibit high error rates (7-32%) and that amplification with different primers resulted in different types and rates of error. Thus, assigning a genotype based on a single PCR for several loci could result in misidentification of individuals. We recommend that programs designed to statistically assign consensus genotypes should be modified to allow the different treatment of heterozygotes and homozygotes intrinsic to the comparative method.

10.
Accurate genotyping of complex systems, such as the major histocompatibility complex (MHC), often requires simultaneous analysis of multiple co-amplifying loci. Here we explore the utility of the massively parallel 454 sequencing method as a universal tool for genotyping complex MHC systems in nonmodel vertebrates. The power of this approach stems from the use of tagged polymerase chain reaction (PCR) primers to identify individual amplicons, which can be simultaneously sequenced to an arbitrarily chosen coverage. However, the error-prone sequencing technology poses considerable challenges, as it may be difficult to discriminate between sequencing errors and true rare alleles; due to the complex nature of artefacts and errors, efficient quality control is required. Nevertheless, our study demonstrates that parallel 454 sequencing can be an efficient genotyping platform for MHC and provides an alternative to classical genotyping methods. We introduced procedures to identify the threshold that can be used to reduce the number of genotyping errors by eliminating most artefactual alleles (AA) representing PCR or sequencing errors. Our procedures are based on two expectations: first, that AA should be relatively rare, both overall and on a per-individual basis, and second, that most AA result from errors introduced to sequences of true alleles. In our data set, alleles with an average per-individual frequency below 3% most likely represented artefacts. This threshold will vary in other applications according to the complexity of the genotyped system. We strongly suggest direct assessment of genotyping error in every experiment by running a fraction of duplicates: individuals amplified in independent PCRs.
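The frequency threshold mentioned in this abstract can be illustrated with a short Python sketch that flags sequence variants whose average per-individual read frequency falls below 3%. How the average is taken (here, over the individuals in which a variant was observed), the data layout and the function names are assumptions for illustration; the abstract does not give the authors' exact procedure.

```python
from typing import Dict, Set


def mean_per_individual_frequency(reads: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Average, over the individuals carrying a variant, of its share of that individual's reads."""
    sums: Dict[str, float] = {}
    carriers: Dict[str, int] = {}
    for counts in reads.values():
        total = sum(counts.values())
        if total == 0:
            continue  # individual with no reads contributes nothing
        for variant, n in counts.items():
            if n == 0:
                continue
            sums[variant] = sums.get(variant, 0.0) + n / total
            carriers[variant] = carriers.get(variant, 0) + 1
    return {variant: sums[variant] / carriers[variant] for variant in sums}


def putative_artefacts(reads: Dict[str, Dict[str, int]], threshold: float = 0.03) -> Set[str]:
    """Variants whose mean per-individual frequency is below the threshold."""
    freqs = mean_per_individual_frequency(reads)
    return {variant for variant, f in freqs.items() if f < threshold}


# Hypothetical read counts per individual and variant.
reads = {
    "ind1": {"A": 60, "B": 38, "x": 2},
    "ind2": {"A": 50, "C": 49, "x": 1},
}
print(putative_artefacts(reads))
```

As the abstract notes, the 3% value is specific to that data set, so the `threshold` argument would need to be re-derived for any other system.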

11.
Null hypothesis significance testing has been under attack in recent years, partly owing to the arbitrary nature of setting α (the decision-making threshold and probability of Type I error) at a constant value, usually 0.05. If the goal of null hypothesis testing is to present conclusions in which we have the highest possible confidence, then the only logical decision-making threshold is the value that minimizes the probability (or occasionally, cost) of making errors. Setting α to minimize the combination of Type I and Type II error at a critical effect size can easily be accomplished for traditional statistical tests by calculating the α associated with the minimum average of α and β at the critical effect size. This technique also has the flexibility to incorporate prior probabilities of null and alternate hypotheses and/or relative costs of Type I and Type II errors, if known. Using an optimal α results in stronger scientific inferences because it estimates and minimizes both Type I errors and relevant Type II errors for a test. It also results in greater transparency concerning assumptions about relevant effect size(s) and the relative costs of Type I and II errors. By contrast, the use of α = 0.05 results in arbitrary decisions about what effect sizes will likely be considered significant, if real, and results in arbitrary amounts of Type II error for meaningful potential effect sizes. We cannot identify a rationale for continuing to arbitrarily use α = 0.05 for null hypothesis significance tests in any field, when it is possible to determine an optimal α.
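The "minimum average of α and β" idea can be made concrete with a small Python sketch. It assumes a one-sided, one-sample z-test with known variance, a standardized critical effect size and equal weights on the two error types; the grid search and all function names are illustrative choices and not the authors' software.

```python
import math
from statistics import NormalDist
from typing import Tuple

_STD_NORMAL = NormalDist()  # standard normal distribution


def type_ii_error(alpha: float, effect_size: float, n: int) -> float:
    """Beta for a one-sided z-test at level alpha (assumed test, known variance)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(z_crit - effect_size * math.sqrt(n))


def optimal_alpha(effect_size: float, n: int, grid: int = 10_000) -> Tuple[float, float]:
    """Grid search for the alpha that minimizes the average of alpha and beta."""
    best_alpha, best_avg = 0.0, float("inf")
    for i in range(1, grid):
        alpha = i / grid
        avg = (alpha + type_ii_error(alpha, effect_size, n)) / 2
        if avg < best_avg:
            best_alpha, best_avg = alpha, avg
    return best_alpha, best_avg


# Hypothetical design: critical effect size of 0.5 standard deviations, n = 16.
alpha_star, avg_error = optimal_alpha(effect_size=0.5, n=16)
print(round(alpha_star, 4), round(avg_error, 4))
```

Under these assumptions the optimum places the critical value halfway between the null and alternative means, so α and β come out equal; unequal costs or prior probabilities, which the abstract mentions, would require weighting the two terms.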

12.
Although mark-recapture methods are among the most powerful tools for monitoring wildlife populations, the secretive nature of some species requires a comprehensive understanding of the factors that affect capture probability to maximize accuracy and precision of population parameter estimates (e.g., population size and survivorship). Here, we used aquatic snakes as a case study in applying rigorous mark-recapture methods to estimate population parameters for secretive species. Specifically, we used intensive field sampling and robust design mark-recapture analyses in Program MARK to test specific hypotheses about ecological and methodological factors influencing detectability of two species of secretive aquatic snakes, the banded watersnake (Nerodia fasciata), and the black swamp snake (Seminatrix pygaea). We constructed a candidate set of a priori mark-recapture models incorporating various combinations of time- and sex-varying capture and recapture probabilities, behavioral responses to traps (i.e., trap-happiness or trap-shyness), and temporary emigration, and we ranked models for each species using Akaike's Information Criterion. For both banded watersnakes and black swamp snakes we found strong support for time-varying capture and recapture probabilities and strong trap-happy responses, factors that can bias population estimation if not accommodated in the models. We also found evidence of sex-dependent temporary emigration in black swamp snakes. Our study is among the first comprehensive assessments of factors affecting detectability in snakes and provides a framework for studies aimed at monitoring populations of other secretive species. © 2010 The Wildlife Society.

13.
The extent to which populations are connected by dispersal influences all aspects of their biology and informs the spatial scale of optimal conservation strategies. Obtaining direct estimates of dispersal is challenging, particularly in marine systems, with studies typically relying on indirect approaches to evaluate connectivity. To overcome this challenge, we combine information from an eight-year mark-recapture study with high-resolution genetic data to demonstrate extremely low dispersal and restricted gene flow at small spatial scales for a large, potentially mobile marine vertebrate, the turtleheaded sea snake (Emydocephalus annulatus). Our mark-recapture study indicated that adjacent bays in New Caledonia (<1.15 km apart) contain virtually separate sea snake populations. Sea snakes could easily swim between bays but rarely do so. Of 817 recaptures of marked snakes, only two snakes had moved between bays. We genotyped 136 snakes for 11 polymorphic microsatellite loci and found statistically significant genetic divergence between the two bays (F_ST = 0.008, P < 0.01). Bayesian clustering analyses detected low mixed ancestry within bays and genetic relatedness coefficients were higher, on average, within than between bays. Our results indicate that turtleheaded sea snakes rarely venture far from home, which has strong implications for their ecology, evolution, and conservation.

14.
Sibship reconstruction from genetic data with typing errors (cited 13 times: 0 self-citations, 13 by others)
Wang J. Genetics 2004, 166(4): 1963-1979
Likelihood methods have been developed to partition individuals in a sample into full-sib and half-sib families using genetic marker data without parental information. They invariably make the critical assumption that marker data are free of genotyping errors and mutations and are thus completely reliable in inferring sibships. Unfortunately, however, this assumption is rarely tenable for virtually all kinds of genetic markers in practical use and, if violated, can severely bias sibship estimates as shown by simulations in this article. I propose a new likelihood method with simple and robust models of typing error incorporated into it. Simulations show that the new method can be used to infer full- and half-sibships accurately from marker data with a high error rate and to identify typing errors at each locus in each reconstructed sib family. The new method also improves previous ones by adopting a fresh iterative procedure for updating allele frequencies with reconstructed sibships taken into account, by allowing for the use of parental information, and by using efficient algorithms for calculating the likelihood function and searching for the maximum-likelihood configuration. It is tested extensively on simulated data with a varying number of marker loci, different rates of typing errors, and various sample sizes and family structures and applied to two empirical data sets to demonstrate its usefulness.

15.
Juvenile vital rates have important effects on population dynamics for many species, but this demographic is often difficult to locate and track. As such, we frequently lack reliable estimates of juvenile survival, which are necessary for accurately assessing population stability and potential management approaches to conserve biodiversity. We estimated survival rates for elusive juveniles of 3 species, the ringed salamander (Ambystoma annulatum), spotted salamander (A. maculatum), and small-mouthed salamander (A. texanum), using 2 approaches. First, we conducted an 11-month (2016–2017) mark-recapture study within semi-natural enclosures and used Bayesian Cormack-Jolly-Seber models to estimate survival and recapture probabilities. Second, we inferred the expected annual juvenile survival rate given published vital rates for pre-metamorphic and adult ambystomatids assuming stable population growth. For all 3 species, juvenile survival probabilities were constant across recapture occasions, whereas recapture probability estimates were time-dependent. Further, survival and recapture probabilities among study species were similar. Post-study sampling revealed that the initial study period median estimate of annual survival probability (0.39) underestimated the number of salamanders known alive at 11 months. We therefore appended approximately 1 year of opportunistic data, which produced a median annual survival probability of 0.50, encompassing salamanders that we knew to have been alive. Calculation from literature values suggested a mean annual terrestrial juvenile ambystomatid survival probability of 0.49. Similar results among our approaches indicated that juvenile survival estimates for the study species were robust and likely comparable to rates in nature. These estimates can now be confidently applied to research, monitoring, and management efforts for the study species and ecologically similar taxa. Our findings indicated that similarly robust vital rate estimates for subsets of ecologically and phylogenetically similar species can provide reasonable surrogate demographic information that can be used to reveal key factors influencing population viability for data-deficient species. © 2020 The Wildlife Society.

16.
Parentage analysis in natural populations presents a valuable yet unique challenge because of large numbers of pairwise comparisons, marker set limitations and few sampled true parent-offspring pairs. These limitations can result in the incorrect assignment of false parent-offspring pairs that share alleles across multi-locus genotypes by chance alone. I first define a probability, Pr(δ), to estimate the expected number of false parent-offspring pairs within a data set. This probability can be used to determine whether one can accept all putative parent-offspring pairs with strict exclusion. I next define the probability Pr(φ|λ), which employs Bayes' theorem to determine the probability of a putative parent-offspring pair being false given the frequencies of shared alleles. This probability can be used to separate true parent-offspring pairs from false pairs that occur by chance when a data set lacks sufficient numbers of loci to accept all putative parent-offspring pairs. Finally, I propose a method to quantitatively determine how many loci to let mismatch for study-specific error rates and demonstrate that few data sets should need to allow more than two loci to mismatch. I test all theoretical predictions with simulated data and find that, first, Pr(δ) and Pr(φ|λ) have very low bias, and second, that power increases with lower sample sizes, uniform allele frequency distributions, and higher numbers of loci and alleles per locus. Comparisons of Pr(φ|λ) to strict exclusion and CERVUS demonstrate that this method may be most appropriate for large natural populations when supplemental data (e.g. genealogies, candidate parents) are absent.

17.
B R Smith, C M Herbinger, H R Merry. Genetics 2001, 158(3): 1329-1338
Two Markov chain Monte Carlo algorithms are proposed that allow the partitioning of individuals into full-sib groups using single-locus genetic marker data when no parental information is available. These algorithms present a method of moving through the sibship configuration space and locating the configuration that maximizes an overall score on the basis of pairwise likelihood ratios of being full-sib or unrelated or maximizes the full joint likelihood of the proposed family structure. Using these methods, up to 757 out of 759 Atlantic salmon were correctly classified into 12 full-sib families of unequal size using four microsatellite markers. Large-scale simulations were performed to assess the sensitivity of the procedures to the number of loci and number of alleles per locus, the allelic distribution type, the distribution of families, and the independent knowledge of population allelic frequencies. The number of loci and the number of alleles per locus had the most impact on accuracy. Very good accuracy can be obtained with as few as four loci when they have at least eight alleles. Accuracy decreases when using allelic frequencies estimated in small target samples with skewed family distributions with the pairwise likelihood approach. We present an iterative approach that partly corrects that problem. The full likelihood approach is less sensitive to the precision of allelic frequencies estimates but did not perform as well with the large data set or when little information was available (e.g., four loci with four alleles).

18.
Summary: I examined the use of a mark-recapture technique to measure colony size and colony growth in the ant species Formica neorufibarbis. I addressed three questions: 1) Is the method reasonably accurate?, 2) is the method precise?, and 3) how many workers does the method kill? I found that estimates of colony sizes based on mark-recapture were similar to those estimated by colony excavation. The error in estimates of worker and cocoon number due to the binomial nature of the mark-recapture method was relatively small, with a mean coefficient of variation of twelve percent for workers and nine percent for cocoons. I estimated that the method killed less than two percent of the workers in a nest.
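To show how a two-sample mark-recapture estimate and its coefficient of variation can be computed, here is a minimal Python sketch using Chapman's bias-corrected form of the Lincoln-Petersen estimator. The abstract does not state which estimator was used, so the choice of estimator, the function name and the example counts are all assumptions.

```python
import math
from typing import Tuple


def chapman_estimate(marked: int, caught: int, recaptured: int) -> Tuple[float, float, float]:
    """Return (population estimate, variance, coefficient of variation).

    marked     -- animals marked and released in the first sample (M)
    caught     -- animals examined in the second sample (C)
    recaptured -- marked animals found in the second sample (R)
    """
    if recaptured > min(marked, caught) or min(marked, caught, recaptured) < 0:
        raise ValueError("recaptures must satisfy 0 <= R <= min(M, C)")
    estimate = (marked + 1) * (caught + 1) / (recaptured + 1) - 1
    variance = (
        (marked + 1) * (caught + 1) * (marked - recaptured) * (caught - recaptured)
        / ((recaptured + 1) ** 2 * (recaptured + 2))
    )
    cv = math.sqrt(variance) / estimate if estimate > 0 else float("nan")
    return estimate, variance, cv


# Hypothetical colony: 200 workers marked, 150 later sampled, 30 of them marked.
estimate, variance, cv = chapman_estimate(200, 150, 30)
print(round(estimate), round(cv, 3))
```

The coefficient of variation shrinks as the number of recaptures grows, which is one way to judge in advance how many workers need to be marked for a given precision.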

19.
Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree.
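The uncorrected GLASS estimate of a pairwise divergence time, as it is usually described, is the earliest interspecific coalescence time observed across all loci. The Python sketch below implements only that minimum; the data layout and function name are assumptions, and the iGLASS correction is deliberately not reproduced because the abstract does not give its formula.

```python
from typing import List, Sequence


def glass_pairwise_divergence(loci: Sequence[Sequence[float]]) -> float:
    """Minimum interspecific coalescence time across loci.

    loci -- one entry per locus; each entry lists the coalescence times between
            every pair of lineages drawn from the two different taxa.
    """
    per_locus_minima: List[float] = [min(times) for times in loci if times]
    if not per_locus_minima:
        raise ValueError("need at least one interspecific coalescence time")
    return min(per_locus_minima)


# Hypothetical coalescence times (in arbitrary units) at three loci.
loci = [
    [2.3, 1.9],
    [1.4, 3.0],
    [2.0],
]
print(glass_pairwise_divergence(loci))
```

Because interspecific coalescences cannot occur more recently than the species split, this minimum is never below the true divergence time, which is consistent with the upward bias that the abstract says iGLASS is designed to correct.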

20.
Although mark-recapture protocols produce inaccurate population estimates of termite colonies, they might be employed to estimate a relative change in colony size. This possibility was tested using two Australian, mound-building, wood-eating, subterranean Coptotermes species. Three different toxicants delivered in baits were used to decrease (but not eliminate) colony size, and a single mark-recapture protocol was used to estimate pre- and postbaiting population sizes. For both species, the numbers of termites retrieved from bait stations varied widely, resulting in no significant differences in the numbers of termites sampled between treatments in either the pre- or postbaiting protocols. There were significantly fewer termites sampled in all treatments, controls included, in the postbaiting protocol compared with the prebaiting protocol, suggesting a seasonal change in forager numbers. The comparison of population estimates shows a large decrease in toxicant-treated colonies compared with little change in control colonies, which suggests that estimating the relative decline in population size using mark-recapture protocols might be possible. However, the change in population estimate was due entirely to the significantly lower recapture rate in the control colonies relative to the toxicant-treated colonies, as numbers of unmarked termites did not change between treatments. The population estimates should be treated with caution because low recapture rates produce dubious population estimates and, in some cases, postbaiting mark-recapture population estimates could be much greater than those at prebaiting, despite consumption of bait in sufficient quantities to cause population decline. A possible interaction between fat-stain markers and toxicants should be investigated if mark-recapture population estimates are used. Alternative methods of measuring population change are advised, along with other indirect measures.
