Similar articles
 20 similar articles found
1.
Achaz G 《Genetics》2008,179(3):1409-1424
Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We thus show that in the presence of those errors, estimators of theta can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of theta that ignore singletons as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.
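
The idea of a singleton-free estimator of theta can be illustrated with a minimal sketch. It assumes an unfolded site frequency spectrum and the standard neutral expectation E[xi_i] = theta / i; the function names are hypothetical, and the formula below is the textbook moment estimator built from that expectation, not necessarily the exact estimators proposed in the abstract above.

```python
from typing import Sequence


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def theta_watterson(sfs: Sequence[int]) -> float:
    """Watterson-style estimate of theta from an unfolded spectrum.

    sfs[i - 1] is the number of sites whose derived allele is seen i times
    (i = 1 .. n-1), so the sample size is n = len(sfs) + 1.
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def theta_no_singletons(sfs: Sequence[int]) -> float:
    """Estimate theta while discarding derived singletons.

    Assumption: E[xi_i] = theta / i, hence E[S - xi_1] = theta * (a_n - 1).
    """
    n = len(sfs) + 1
    if n < 3:
        raise ValueError("need at least 3 sequences to drop singletons")
    return sum(sfs[1:]) / (harmonic(n) - 1.0)


# Illustrative (made-up) spectrum that matches theta / i exactly for theta = 12.
clean = [12, 6, 4, 3]
# The same spectrum with 10 extra singletons standing in for sequencing errors.
noisy = [22, 6, 4, 3]
```

With these made-up counts, the Watterson-style estimate rises when the extra singletons are added, while the singleton-free estimate does not change, which is the behaviour the abstract describes qualitatively.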

2.
Optical mapping is a novel technique for determining the restriction sites on a DNA molecule by directly observing a number of partially digested copies of the molecule under a light microscope. The problem is complicated by uncertainty as to the orientation of the molecules and by erroneous detection of cuts. In this paper we study the problem of constructing a restriction map based on optical mapping data. We give several variants of a polynomial reconstruction algorithm, as well as an algorithm that is exponential in the number of cut sites, and hence is appropriate only for a small number of cut sites. We give a simple probabilistic model for data generation and for the errors and prove probabilistic upper and lower bounds on the number of molecules needed by each algorithm in order to obtain a correct map, expressed as a function of the number of cut sites and the error parameters. To the best of our knowledge, this is the first probabilistic analysis of algorithms for the problem. We also provide experimental results confirming that our algorithms are highly effective on simulated data.

3.
Regressions of biological variables across species are rarely perfect. Usually, there are residual deviations from the estimated model relationship, and such deviations commonly show a pattern of phylogenetic correlations indicating that they have biological causes. We discuss the origins and effects of phylogenetically correlated biological variation in regression studies. In particular, we discuss the interplay of biological deviations with deviations due to observational or measurement errors, which are also important in comparative studies based on estimated species means. We show how bias in estimated evolutionary regressions can arise from several sources, including phylogenetic inertia and either observational or biological error in the predictor variables. We show how all these biases can be estimated and corrected for in the presence of phylogenetic correlations. We present general formulas for incorporating measurement error in linear models with correlated data. We also show how alternative regression models, such as major axis and reduced major axis regression, which are often recommended when there is error in predictor variables, are strongly biased when there is biological variation in any part of the model. We argue that such methods should never be used to estimate evolutionary or allometric regression slopes.

4.
5.
6.
In this paper, introducing stochastic dynamics into an optimal competitive Hopfield network model (OCHOM), we propose a new algorithm that permits temporary energy increases, which help the OCHOM escape from local minima. The goal of the maximum cut problem, which is an NP-complete problem, is to partition the node set of an undirected graph into two parts in order to maximize the cardinality of the set of edges cut by the partition. The problem has many important applications, including the design of VLSI circuits and the design of communication networks. Recently, Galán-Marín et al. proposed the OCHOM, which can guarantee convergence to a global/local minimum of the energy function and performs better than the other competitive neural approaches. However, the OCHOM has no mechanism to escape from local minima. The proposed algorithm introduces stochastic dynamics that help the OCHOM escape from local minima, and it is applied to the maximum cut problem. A number of instances have been simulated to verify the proposed algorithm.
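
The maximum cut objective stated above can be made concrete with a short sketch. This is not the OCHOM or its stochastic variant: it only evaluates the size of a cut and, as a swapped-in reference technique, finds the optimum by exhaustive search on very small graphs, which could serve to check a heuristic. All names are hypothetical.

```python
from itertools import product
from typing import Iterable, Sequence, Tuple

Edge = Tuple[int, int]


def cut_size(edges: Iterable[Edge], side: Sequence[int]) -> int:
    """Count edges whose endpoints lie on different sides of the partition.

    side[v] is 0 or 1 and gives the part that node v belongs to.
    """
    return sum(1 for u, v in edges if side[u] != side[v])


def brute_force_max_cut(n_nodes: int, edges: Sequence[Edge]) -> Tuple[int, Tuple[int, ...]]:
    """Return (best cut size, one optimal partition) by exhaustive search.

    Exponential in n_nodes, so only usable for tiny instances.
    """
    best_size, best_side = -1, ()
    # Node 0 is fixed on side 0: swapping the two sides gives the same cut.
    for rest in product((0, 1), repeat=n_nodes - 1):
        side = (0,) + rest
        size = cut_size(edges, side)
        if size > best_size:
            best_size, best_side = size, side
    return best_size, best_side


# A 4-cycle is bipartite, so every edge can be cut; a triangle cannot do better than 2.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(0, 1), (1, 2), (2, 0)]
```

Fixing node 0 on one side halves the search space without losing any cut, because a partition and its mirror image cut the same edges.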

7.
Friberg U  Rice WR 《Genetics》2008,179(4):2229-2238
Most recombination takes place in numerous, localized regions called hotspots. However, empirical evidence indicates that nascent hotspots are susceptible to removal due to biased gene conversion, so it is paradoxical that they should be so widespread. Previous modeling work has shown that hotspots can evolve due to genetic drift overpowering their intrinsic disadvantage. Here we synthesize recent theoretical and empirical results to show how natural selection can favor hotspots. We propose that hotspots are part of a cycle of antagonistic coevolution between two tightly linked chromosomal regions: an inducer region that initiates recombination during meiosis by cutting within a nearby region of DNA and the cut region itself, which can evolve to be resistant to cutting. Antagonistic coevolution between inducers and their cut sites is driven by recurrent episodes of Hill-Robertson interference, genetic hitchhiking, and biased gene conversion.

8.
Post-translational modifications (PTMs) occur on almost all proteins analyzed to date. The function of a modified protein is often strongly affected by these modifications and therefore increased knowledge about the potential PTMs of a target protein may increase our understanding of the molecular processes in which it takes part. High-throughput methods for the identification of PTMs are being developed, in particular within the fields of proteomics and mass spectrometry. However, these methods are still in their early stages, and it is indeed advantageous to cut down on the number of experimental steps by integrating computational approaches into the validation procedures. Many advanced methods for the prediction of PTMs exist and many are made publicly available. We describe our experiences with the development of prediction methods for phosphorylation and glycosylation sites and the development of PTM-specific databases. In addition, we discuss novel ideas for PTM visualization (exemplified by kinase landscapes) and improvements for prediction specificity (by using ESS--evolutionary stable sites). As an example, we present a new method for kinase-specific prediction of phosphorylation sites, NetPhosK, which extends our earlier and more general tool, NetPhos. The new server, NetPhosK, is made publicly available at the URL http://www.cbs.dtu.dk/services/NetPhosK/. The issues of underestimation, over-prediction and strategies for improving prediction specificity are also discussed.

9.
The aorta-gonad-mesonephros (AGM) region is a potent hematopoietic site within the mammalian embryo body, and the first place from which hematopoietic stem cells (HSCs) emerge. Within the complex embryonic vascular, excretory and reproductive tissues of the AGM region, the precise location of HSC development is unknown. To determine where HSCs develop, we subdissected the AGM into aorta and urogenital ridge segments and transplanted the cells into irradiated adult recipients. We demonstrate that HSCs first appear in the dorsal aorta area. Furthermore, we show that vitelline and umbilical arteries contain high frequencies of HSCs coincident with HSC appearance in the AGM. While later in development and after organ explant culture we find HSCs in the urogenital ridges, our results strongly suggest that the major arteries of the embryo are the most important sites from which definitive HSCs first emerge.

10.
11.
The mutation rate is known to vary between adjacent sites within the human genome as a consequence of context, the most well-studied example being the influence of CpG dinucleotides. We investigated whether there is additional variation by testing whether there is an excess of sites at which both humans and chimpanzees have a single-nucleotide polymorphism (SNP). We found a highly significant excess of such sites, and we demonstrated that this excess is not due to neighbouring nucleotide effects, ancestral polymorphism, or natural selection. We therefore infer that there is cryptic variation in the mutation rate. However, although this variation in the mutation rate is not associated with the adjacent nucleotides, we show that there are highly nonrandom patterns of nucleotides that extend ~80 base pairs on either side of sites with coincident SNPs, suggesting that there are extensive and complex context effects. Finally, we estimate the level of variation needed to produce the excess of coincident SNPs and show that there is a similar, or higher, level of variation in the mutation rate associated with this cryptic process than there is associated with adjacent nucleotides, including the CpG effect. We conclude that there is substantial variation in the mutation rate that has, until now, been hidden from view.

12.
Keightley PD  Halligan DL 《Genetics》2011,188(4):931-940
Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
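
The two modelling assumptions named above (binomial sampling of reads in heterozygotes, random sequencing error) can be sketched as a per-site read-count likelihood. This is a deliberately simplified illustration and not the method of the abstract: it assumes only two alleles, A and a, and a symmetric error rate with which a read of one allele is reported as the other. The function name and the genotype labels are hypothetical.

```python
from math import comb
from typing import Dict


def read_count_likelihoods(k: int, n: int, error_rate: float) -> Dict[str, float]:
    """P(k reads show allele A out of n reads | genotype), for each genotype.

    Assumptions (simplifications for illustration):
      - two alleles only, A and a;
      - each read of one allele is misread as the other with prob. error_rate;
      - reads from a heterozygote sample each chromosome with probability 1/2.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    # Probability that a single read is reported as A, by genotype.
    # For Aa: 0.5 * (1 - e) + 0.5 * e = 0.5, whatever the error rate.
    p_read_a = {"AA": 1.0 - error_rate, "Aa": 0.5, "aa": error_rate}
    return {
        genotype: comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for genotype, p in p_read_a.items()
    }


# Made-up example: 2 of 4 reads show allele A, with a 1% error rate.
example = read_count_likelihoods(2, 4, 0.01)
```

At low coverage the heterozygote likelihood for unbalanced read counts is not negligible, which is one way to see why genotype calls from few reads are uncertain and why the abstract estimates the spectrum by maximum likelihood instead of calling genotypes first.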

13.
Process life cycle assessment (PLCA) is widely used to quantify environmental flows associated with the manufacturing of products and other processes. As PLCA always depends on defining a system boundary, its application involves truncation errors. Different methods of estimating truncation errors are proposed in the literature; most of these are based on artificially constructed system complete counterfactuals. In this article, we review the literature on truncation errors and their estimates and systematically explore factors that influence truncation error estimates. We classify estimation approaches, together with underlying factors influencing estimation results according to where in the estimation procedure they occur. By contrasting different PLCA truncation/error modeling frameworks using the same underlying input-output (I-O) data set and varying cut-off criteria, we show that modeling choices can significantly influence estimates for PLCA truncation errors. In addition, we find that differences in I-O and process inventory databases, such as missing service sector activities, can significantly affect estimates of PLCA truncation errors. Our results expose the challenges related to explicit statements on the magnitude of PLCA truncation errors. They also indicate that increasing the strictness of cut-off criteria in PLCA has only limited influence on the resulting truncation errors. We conclude that applying an additional I-O life cycle assessment or a path exchange hybrid life cycle assessment to identify where significant contributions are located in upstream layers could significantly reduce PLCA truncation errors.

14.
Near-UV irradiation in the presence of vanadate cleaves the heavy chain of myosin subfragment 1 at three specific sites located at 23, 31, and 74 kDa from the N-terminus. Increasing the pH from 6.0 to 8.5 gradually reduces the efficiency of the cleavage and completely eliminates the 31-kDa cut. Actin specifically inhibits the photocleavage at the sites located 31 and 74 kDa from the N-terminus. ATP strongly protects from cleavage at the 23- and 31-kDa sites and less strongly from the cut at the 74-kDa site. ADP and pyrophosphate have similar, but less pronounced, effects as ATP. Orthophosphate inhibits the photocleavage at the 23- and 74-kDa sites with a similar efficiency. In the ternary actin-S-1-ATP complex, the photocleavage is inhibited at all sites, and the effects of actin and ATP are additive. Photocleavages affect the K+(EDTA)-, Ca2+-, and actin-activated ATPase activity of subfragment 1. Loss of all three ATPases is caused by cleavage at the 23-kDa site, while the cut at the 74-kDa site only leads to the loss of actin-activated ATPase activity. It is concluded that subfragment 1 contains at least two distinct phosphate binding sites, the first being part of the "consensus" ATP binding site wherein the 23-kDa photocleavage site is located. This site is responsible for the binding and hydrolysis of ATP. It is possible that the 31-kDa cleavage site is also associated with the "consensus" site through a loop. The 74-kDa cleavage site is a part of another phosphate binding site which may play a role in the regulation of the myosin-actin interaction.

15.
The input to a supertree problem is a collection of phylogenetic trees that intersect pairwise in their leaf sets; the goal is to construct a single tree that retains as much as possible of the information in the input. This task is complicated by inconsistencies due to errors. We consider the case where the input trees are rooted and are represented by the clusters they exhibit. The problem is to find the minimum number of flips needed to resolve all inconsistencies, where each flip moves a taxon into or out of a cluster. We prove that the minimum-flip problem is NP-complete, but show that it is fixed-parameter tractable and give approximation algorithms for special cases.
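
The notions of an inconsistency and a flip can be illustrated with a small sketch. It relies on the standard fact that a collection of clusters can be displayed by a single rooted tree exactly when every pair of clusters is either disjoint or nested. The sketch only checks consistency and applies one flip; it does not search for a minimum set of flips, and it ignores taxa that are missing from some input trees. All names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable


def is_tree_compatible(clusters: Iterable[Iterable[str]]) -> bool:
    """True if every pair of clusters is disjoint or nested (a hierarchy)."""
    sets = [frozenset(c) for c in clusters]
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a, b in combinations(sets, 2)
    )


def flip(cluster: Iterable[str], taxon: str) -> FrozenSet[str]:
    """Move a taxon into the cluster if absent, or out of it if present."""
    return frozenset(cluster) ^ {taxon}


# Made-up clusters: {a, b} and {b, c} overlap without nesting, so they conflict.
conflicting = [{"a", "b"}, {"b", "c"}]
# One flip (removing c from the second cluster) resolves the conflict.
repaired = [conflicting[0], flip(conflicting[1], "c")]
```

Removing b from the second cluster would resolve the same conflict by making the clusters disjoint, so even this tiny instance has more than one single-flip repair.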

16.
Models of gene regulatory networks (GRNs) attempt to explain the complex processes that determine cells' behavior, such as differentiation, metabolism, and the cell cycle. The advent of high-throughput data generation technologies has allowed researchers to fit theoretical models to experimental data on gene-expression profiles. GRNs are often represented using logical models. These models require that real-valued measurements be converted to discrete levels, such as on/off, but the discretization often introduces inconsistencies into the data. Dimitrova et al. posed the problem of efficiently finding a parsimonious resolution of the introduced inconsistencies. We show that reconstruction of a logical GRN that minimizes the errors is NP-complete, so that an efficient exact algorithm for the problem is not likely to exist. We present a probabilistic formulation of the problem that circumvents discretization of expression data. We phrase the problem of error reduction as a minimum entropy problem, develop a heuristic algorithm for it, and evaluate its performance on mouse embryonic stem cell data. The constructed model displays high consistency with prior biological knowledge. Despite the oversimplification of a discrete model, we show that it is superior to raw experimental measurements and demonstrates a highly significant level of identical regulatory logic among co-regulated genes. Software implementing the method is freely available at: http://acgt.cs.tau.ac.il/modent.

17.
Probing intrinsic properties of a robust morphogen gradient in Drosophila   (total citations: 1; self-citations: 0; citations by others: 1)
He F  Wen Y  Deng J  Lin X  Lu LJ  Jiao R  Ma J 《Developmental cell》2008,15(4):558-567
A remarkable feature of development is its reproducibility, the ability to correct embryo-to-embryo variations and instruct precise patterning. In Drosophila, embryonic patterning along the anterior-posterior axis is controlled by the morphogen gradient Bicoid (Bcd). In this article, we describe quantitative studies of the native Bcd gradient and its target Hunchback (Hb). We show that the native Bcd gradient is highly reproducible and is itself scaled with embryo length. While a precise Bcd gradient is necessary for precise Hb expression, it still has positional errors greater than Hb expression. We describe analyses further probing mechanisms for Bcd gradient scaling and correction of its residual positional errors. Our results suggest a simple model of a robust Bcd gradient sufficient to achieve scaled and precise activation of its targets. The robustness of this gradient is conferred by its intrinsic properties of "self-correcting" the inevitable input variations to achieve a precise and reproducible output.

18.
The BcgI endonuclease exemplifies a subset of restriction enzymes, the Type IIB class, which make two double-strand breaks (DSBs) at each copy of their recognition sequence, one on either side of the site, to excise the sequence from the remainder of the DNA. In this study, we show that BcgI is essentially inactive when bound to a single site and that to cleave a DNA with one copy of its recognition sequence, it has to act in trans, bridging two separate DNA molecules. We also show that BcgI makes the two DSBs at an individual site in a highly concerted manner. Intermediates cut on one side of the site do not accumulate during the course of the reaction: instead, the DNA is converted straight to the final products cut on both sides. On DNA with two sites, BcgI bridges the sites in cis and then generally proceeds to cut both strands on both sides of both sites without leaving the DNA. The BcgI restriction enzyme can thus excise two DNA segments together, by cleaving eight phosphodiester bonds within a single DNA-binding event.

19.
Coincident Gene Conversion Events in Yeast That Involve a Large Insertion   (total citations: 8; self-citations: 5; citations by others: 3)
In yeast, spontaneous gene conversion events involving sites that are far apart (16 cM) occur 1000 times more frequently in mitotic cells than is expected for two independent acts of recombination. It has been proposed that a major portion of these could be due to a long, continuous heteroduplex intermediate. We have examined this possibility in further detail by introducing, via transformation, a large plasmid insertion between the LEU1 and TRP5 loci and studying its behavior among coincident convertants involving the flanking sites. Among such convertants, there is frequent loss of the plasmid when it is present in hemizygous or homozygous configuration. Our results could support the long heteroduplex model for coincident recombination events, but only if novel assumptions regarding the formation and fate of mismatched DNA are made. Therefore, an alternative model that proposes multiple, concerted recombination events is discussed.

20.
J. E. Golin  H. Tampe 《Genetics》1988,119(3):541-547
In mitosis, coincident recombination events between widely separated markers occur more frequently than expected for two independent acts. Several different mechanisms have been proposed to account for this phenomenon. It has been argued that coincident recombination could be due to either an extensive region of heteroduplex DNA or some other distance-dependent mechanism. Alternatively, it has been suggested that at least some is due to subpopulations of cells which undergo recombination at very high frequencies. The purpose of these experiments is to evaluate the possible contribution of distance-dependent and distance-independent components. By comparing the coincident recombination frequencies for markers on the same homolog as well as pairs of unlinked sites, we show that there is a strong distance-dependent component for at least 8.8-35 kbp, depending on the type of recombination event (conversion or intrachromosomal exchange). For larger distances separating sites, a distance-independent mechanism(s) results in higher than expected frequencies.
