Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Genotypes produced from samples collected non-invasively in harsh field conditions often lack the full complement of data from the selected microsatellite loci. Applying genetic mark-recapture methodology to wildlife species can therefore be prone to misidentifications, leading both to ‘true non-recaptures’ being falsely accepted as recaptures (Type I errors) and to ‘true recaptures’ going undetected (Type II errors). Here we present a new likelihood method that allows every pairwise genotype comparison to be evaluated independently. We apply this method to determine the total number of recaptures by estimating and optimising the balance between Type I errors and Type II errors. We show through simulation that the standard error of recapture estimates can be minimised through our algorithms. Interestingly, the precision of our recapture estimates actually improved when we included individuals with missing genotypes, as this increased the number of pairwise comparisons, potentially uncovering more recaptures. Simulations suggest that the method tolerates error rates of up to 5% per locus and can theoretically work in datasets with as little as 60% of loci genotyped. Our methods can be implemented in datasets where standard mismatch analyses fail to distinguish recaptures. Finally, we show that by assigning a low Type I error rate to our matching algorithms we can generate a dataset of individuals of known capture histories that is suitable for downstream analysis with traditional mark-recapture methods.
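The idea of scoring each pairwise genotype comparison independently, using only the loci typed in both samples, can be illustrated with a minimal sketch. This is not the authors' likelihood model: the function name `pairwise_llr`, the per-locus `error_rate`, and the single `p_random_match` value (the chance that two different individuals share a genotype at a locus) are all hypothetical simplifications.

```python
import math

def pairwise_llr(g1, g2, error_rate=0.05, p_random_match=0.1):
    """Log-likelihood ratio of 'same individual' vs 'different individuals'.

    Each genotype is a list with one entry per locus: a pair of allele
    sizes, or None when the locus failed to amplify. Both parameter
    values are illustrative assumptions, not estimates from the paper.
    """
    # Probability that at least one of the two samples is mis-scored at a locus.
    e_pair = 1.0 - (1.0 - error_rate) ** 2
    llr = 0.0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # missing data: the locus carries no information
        compared += 1
        if sorted(a) == sorted(b):
            llr += math.log((1.0 - e_pair) / p_random_match)
        else:
            llr += math.log(e_pair / (1.0 - p_random_match))
    return llr, compared

sample_a = [(120, 124), (98, 98), None, (150, 152)]
sample_b = [(124, 120), (98, 98), (201, 203), None]
sample_c = [(120, 122), (98, 100), (201, 203), (150, 152)]

llr_ab, n_ab = pairwise_llr(sample_a, sample_b)  # two shared loci, both match
llr_ac, n_ac = pairwise_llr(sample_a, sample_c)  # three shared loci, two mismatch
```

A positive ratio favours a recapture and a negative one favours distinct individuals; the cut-off applied to the ratio is what would trade Type I errors against Type II errors.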

3.
Systematic detection of errors in genetic linkage data.   Cited by: 41 (self-citations: 0, citations by others: 41)
S E Lincoln, E S Lander. Genomics, 1992, 14(3): 604-610
Construction of dense genetic linkage maps is hampered, in practice, by the occurrence of laboratory typing errors. Even relatively low error rates cause substantial map expansion and interfere with the determination of correct genetic order. Here, we describe a systematic method for overcoming these difficulties, based on incorporating the possibility of error into the usual likelihood model for linkage analysis. Using this approach, it is possible to construct genetic maps allowing for error and to identify the typings most likely to be in error. The method has been implemented for F2 intercrosses between two inbred strains, a situation relevant to the construction of genetic maps in experimental organisms. Tests involving both simulated and real data are presented, showing that the method detects the vast majority of errors.

4.

Background

Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This has led to the development of several novel methods and to several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data accumulating in these databases is huge, with thousands of different profiles stored. At the same time, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is also important to note that most definitions of genetic evolutionary distance rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles.

Results

We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method, and how it can be used to speed up querying local phylogenetic patterns over large typing databases.
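For orientation, the problem being solved (reporting all pairs of profiles whose Hamming distance does not exceed a threshold) can be stated as a brute-force baseline. The sketch below is quadratic in the number of profiles and is explicitly not the average-case linear-time algorithm proposed in the article; the function names and the example profiles are hypothetical.

```python
from itertools import combinations

def hamming_within(p, q, threshold):
    """Return the Hamming distance between p and q, or None if it exceeds threshold."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of loci")
    distance = 0
    for x, y in zip(p, q):
        if x != y:
            distance += 1
            if distance > threshold:
                return None  # early exit: this pair is too far apart
    return distance

def close_pairs(profiles, threshold):
    """Map each pair of profile names within the threshold to its distance."""
    result = {}
    for (name1, p1), (name2, p2) in combinations(profiles.items(), 2):
        d = hamming_within(p1, p2, threshold)
        if d is not None:
            result[(name1, name2)] = d
    return result

# Hypothetical allelic profiles over four loci.
profiles = {"A": (1, 1, 1, 1), "B": (1, 1, 1, 2), "C": (2, 2, 2, 2)}
pairs = close_pairs(profiles, threshold=1)
```

The early exit only bounds the cost of each comparison; every pair is still visited, which is the cost the article's algorithm is designed to avoid.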

5.
We apply a mathematical algorithm that processes discrete time series data to generate a complete list of Petri net structures containing the minimal number of nodes required to reproduce the data set. The completeness of the list, as guaranteed by a mathematical proof, makes it possible to define a minimal set of experiments required to discriminate between alternative network structures. In principle, this makes it possible to prove all possible minimal network structures by disproving all alternative candidate structures. The dynamic behaviour of the networks, in terms of a switching rule for the transitions of the Petri net, is part of the result. In addition to network reconstruction, the algorithm can be used to determine the minimum number of as yet undetected components that must be involved in a certain process. The algorithm also reveals all alternative structural modifications of a network that are required to generate a predefined behaviour.

6.
High-density genetic linkage maps can be used for purposes such as fine-scale targeted gene cloning and anchoring of physical maps. However, their construction is significantly complicated by even relatively small numbers of scoring errors. Currently available software is not able to solve the ordering ambiguities in marker clusters, which inhibits the application of high-density maps. A statistical method named SMOOTH was developed to remove genotyping errors from genetic linkage data during the mapping process. The program SMOOTH calculates the difference between the observed and predicted values of data points based on data points of neighbouring loci in a given marker order. Highly improbable data points are removed by the program in an iterative process with a mapping algorithm that recalculates the map after cleaning. SMOOTH has been tested with simulated data and experimental mapping data from potato. The simulations show that this method is able to detect a large number of scoring errors and demonstrate that the program enables mapping software to successfully construct a very accurate high-density map. In potato, the application of the program resulted in a reliable placement of nearly 1,000 markers in one linkage group.
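The core step described here, comparing each observed score with a value predicted from neighbouring loci in the current marker order, can be sketched roughly as follows. The unweighted neighbour mean, the `window` size, and the `threshold` are assumptions made for illustration; the abstract does not give SMOOTH's actual weighting or cut-off.

```python
def flag_suspect_scores(scores, window=3, threshold=0.8):
    """Return indices of scores that disagree strongly with their neighbours.

    `scores` holds one individual's 0/1 marker scores in map order, with
    None for missing data. The prediction is a plain mean of up to
    `window` scored neighbours on each side (an assumed, simplified rule).
    """
    flagged = []
    for i, observed in enumerate(scores):
        if observed is None:
            continue
        start = max(0, i - window)
        neighbours = [
            s
            for j, s in enumerate(scores[start:i + window + 1], start=start)
            if j != i and s is not None
        ]
        if not neighbours:
            continue
        predicted = sum(neighbours) / len(neighbours)
        if abs(observed - predicted) > threshold:
            flagged.append(i)
    return flagged

# A lone 1 inside a run of 0s looks like a scoring error; the genuine
# recombination breakpoint between positions 6 and 7 is left alone.
scores = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
suspects = flag_suspect_scores(scores)
```

In the iterative scheme the abstract describes, flagged points would be removed and the map recalculated before the next pass.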

7.
Markov chain Monte Carlo procedures allow the reconstruction of full-sibships using data from genetic marker loci only. In this study, these techniques are extended to allow the reconstruction of nested full- within half-sib families, and an efficient method is presented for calculating the likelihood of the observed marker data in a nested family. Simulation is used to examine the properties of the reconstructed sibships, and of estimates of heritability and common environmental variance of quantitative traits obtained from those populations. Accuracy of reconstruction increases with increasing marker information and with increasing size of the nested full-sibships, but decreases with increasing population size. Estimates of variance components are biased, with the direction and magnitude of bias being dependent upon the underlying errors made during pedigree reconstruction.

8.
MOTIVATION: Recent results related to horizontal gene transfer suggest that phylogenetic reconstruction cannot be determined conclusively from sequence data, resulting in a shift from approaches based on polymorphism information in DNA or protein sequence to studies aimed at understanding the evolution of complete biological processes. The increasing amount of available information on metabolic pathways for several species makes it more relevant to understand the similarities and differences among such pathways. These similarities can then be used to infer phylogenetic trees not based exclusively on sequence data, thereby avoiding the previously mentioned problems. RESULTS: In this article, we present a method to assess the structural similarity of metabolic pathways for several organisms. Our algorithms work by using one of the three possible enzyme similarity measures (hierarchical, information content, gene ontology), and one of the two clustering methods (neighbor-joining, unweighted pair group method with arithmetic mean), to produce a phylogenetic tree in both Newick and graphic format. The web server implementing our algorithms is optimized to answer queries in linear time. AVAILABILITY: The software is available for free public use on a web server, at the address http://www.jaist.ac.jp/~clemente/cgi-bin/phylo.pl. It is available on demand in source code form for research use to educational institutions, non-profit research institutes, government research laboratories and individuals, for non-exclusive use, without the right of the licensee to further redistribute the source code.

9.
Microsatellite data are widely used to test ecological and evolutionary hypotheses in wild populations. In this paper, we consider three typical sources of scoring errors capable of biasing biological conclusions: stuttering, large‐allele dropout and null alleles. We describe methods to detect errors and propose conventions to mitigate scoring errors and report error rates in studies of wild populations. Finally, we discuss potential bias in ecological or evolutionary conclusions based on data sets containing these scoring errors.

10.
Population genetic inference from resequencing data   Cited by: 1 (self-citations: 1, citations by others: 0)
Jiang R, Tavaré S, Marjoram P. Genetics, 2009, 181(1): 187-197
This article is concerned with statistical modeling of shotgun resequencing data and the use of such data for population genetic inference. We model data produced by sequencing-by-synthesis technologies such as the Solexa, 454, and polymerase colony (polony) systems, whose use is becoming increasingly widespread. We show how such data can be used to estimate evolutionary parameters (mutation and recombination rates), despite the fact that the data do not necessarily provide complete or aligned sequence information. We also present two refinements of our methods: one that is more robust to sequencing errors and another that can be used when no reference genome is available.

11.

Background

Human leukocyte antigen (HLA) genes play critical roles in important biomedical areas, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans, and in many cases two alleles differ by only a single base-pair substitution. Next generation sequencing (NGS) technologies could be used for high-throughput HLA typing, but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, methods for PacBio reads have received less attention, probably owing to the platform's high error rates. The PacBio system has the longest read length among available NGS platforms, and is therefore the only platform capable of capturing exon 2 and exon 3 of HLA genes on the same read, which unequivocally resolves the ambiguity caused by the “phasing” issue.

Results

We proposed a new method BayesTyping1 to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads.

Conclusions

The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes, to some extent.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-296) contains supplementary material, which is available to authorized users.

12.
Feil EJ, Smith JM, Enright MC, Spratt BG. Genetics, 2000, 154(4): 1439-1450
Multilocus sequence typing (MLST) is a highly discriminatory molecular typing method that defines isolates of bacterial pathogens using the sequences of approximately 450-bp internal fragments of seven housekeeping genes. This technique has been applied to 575 isolates of Streptococcus pneumoniae and identifies a number of discrete clonal complexes. These clonal complexes are typically represented by a single group of isolates sharing identical alleles at all seven loci, plus single-locus variants that differ from this group at only one out of the seven loci. As MLST is highly discriminatory, the members of each clonal complex can be assumed to have a recent common ancestor, and the molecular events that give rise to the single-locus variants can be used to estimate the relative contributions of recombination and mutation to clonal divergence. By comparing the sequences of the variant alleles within each clonal complex with the allele typically found within that clonal complex, we estimate that recombination has generated new alleles at a frequency approximately 10-fold higher than mutation, and that a single nucleotide site is approximately 50 times more likely to change through recombination than mutation. We also demonstrate how to estimate the average length of recombinational replacements from MLST data.
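The definition of a single-locus variant given above (a profile that differs from the central group at exactly one of the seven loci) translates directly into a few lines of code. The sketch below only illustrates that definition; the function names, isolate labels, and allele numbers are invented for the example and do not come from the study.

```python
def loci_differing(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)

def single_locus_variants(central_profile, profiles):
    """Return the names of profiles that differ from the central one at exactly one locus."""
    return [
        name
        for name, profile in profiles.items()
        if loci_differing(central_profile, profile) == 1
    ]

# Hypothetical seven-locus allelic profiles.
central = (7, 11, 10, 1, 6, 8, 1)
isolates = {
    "iso1": (7, 11, 10, 1, 6, 8, 1),    # identical to the central group
    "iso2": (7, 11, 10, 1, 6, 8, 14),   # differs at one locus
    "iso3": (2, 11, 10, 5, 6, 8, 1),    # differs at two loci
}
slvs = single_locus_variants(central, isolates)
```

Deciding whether each variant allele arose by recombination or by mutation requires comparing the allele sequences themselves, which this profile-level sketch does not attempt.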

13.
Inferring genetic regulatory logic from expression data   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: High-throughput molecular genetics methods allow the collection of data about the expression of genes at different time points and under different conditions. The challenge is to infer gene regulatory interactions from these data and to get an insight into the mechanisms of genetic regulation. RESULTS: We propose a model for genetic regulatory interactions, which has a biologically motivated Boolean logic semantics, but is of a probabilistic nature, and is hence able to confront noisy biological processes and data. We propose a method for learning the model from data based on the Bayesian approach and utilizing Gibbs sampling. We tested our method with previously published data of the Saccharomyces cerevisiae cell cycle and found relations between genes consistent with biological knowledge.

14.
15.
Dispersal is one of the most important factors determining the genetic structure of a population, but good data on dispersal distances are rare because it is difficult to observe a large sample of dispersal events. However, genetic data contain unbiased information about the average dispersal distances in species with a strong sex bias in their dispersal rates. By plotting the genetic similarity between members of the philopatric sex against some measure of the distance between them, the resulting regression line can be used for estimating how far dispersing individuals of the opposite sex have moved before settling. Dispersers showing low genetic similarity to members of the opposite sex will on average have originated from further away. Applying this method to a microsatellite dataset from lions (Panthera leo) shows that their average dispersal distance is 1.3 home ranges with a 95% confidence interval of 0.4-3.0 home ranges. These results are consistent with direct observations of dispersal from our study population and others. In this case, direct observations of dispersal distance were not detectably biased by a failure to detect long-range dispersal, which is thought to be a common problem in the estimation of dispersal distance.

16.
17.
MOTIVATION: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The last of these, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g. organelles). RESULTS: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size.

18.

Background  

Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Phylogenetic applications for autosomal data must therefore rely on other methods for first computationally inferring haplotypes from genotypes.
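A small sketch may help show why genotypes are harder to work with than haplotypes. It assumes biallelic sites with haplotypes written as 0/1 strings and genotypes as per-site allele counts (0, 1 or 2); this encoding and the function names are illustrative choices, not taken from the article.

```python
from itertools import product

def genotype_from_haplotypes(h1, h2):
    """Conflate two haplotypes into a genotype of per-site allele counts."""
    return [int(a) + int(b) for a, b in zip(h1, h2)]

def compatible_haplotype_pairs(genotype):
    """List every unordered haplotype pair that could have produced the genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: count 2 means allele 1 on both chromosomes.
    base = ["1" if g == 2 else "0" for g in genotype]
    if not het_sites:
        haplotype = "".join(base)
        return [(haplotype, haplotype)]
    pairs = []
    # Fix allele 0 at the first heterozygous site of the first haplotype so
    # that each unordered pair is generated exactly once.
    for bits in product("01", repeat=len(het_sites) - 1):
        first, second = base[:], base[:]
        for site, allele in zip(het_sites, ("0",) + bits):
            first[site] = allele
            second[site] = "1" if allele == "0" else "0"
        pairs.append(("".join(first), "".join(second)))
    return pairs

genotype = genotype_from_haplotypes("0101", "0011")
candidates = compatible_haplotype_pairs(genotype)
```

A genotype with k heterozygous sites is compatible with 2^(k-1) haplotype pairs, which is why haplotypes have to be inferred computationally before haplotype-based parsimony methods can be applied.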

19.
Haplotype reconstruction from genotype data using Imperfect Phylogeny   Cited by: 13 (self-citations: 0, citations by others: 13)
Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual's haplotype or which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes that shows that SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks, such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks, and for each block, we predict the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when taking into account the predictions for the uncommon haplotypes. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large datasets. AVAILABILITY: The algorithm is available via a Web server at http://www.calit2.net/compbio/hap/

20.
SUMMARY: Inferring genetic network architecture from time series data generated by high-throughput experimental technologies, such as cDNA microarrays, can help us to understand the system behavior of living organisms. We have developed an interactive tool, GeneNetwork, which provides four reverse engineering models and three data interpolation approaches to infer relationships between genes. GeneNetwork enables a user to readily reconstruct genetic networks based on microarray data without having intimate knowledge of the mathematical models. A simple graphical user interface enables rapid, intuitive mapping and analysis of the reconstructed network, allowing biologists to explore gene relationships at the system level. AVAILABILITY: Download from http://genenetwork.sbl.bc.sinica.edu.tw/. SUPPLEMENTARY INFORMATION: Supplementary documentation of algorithms for the four approaches is downloadable at the above location.
