首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. In a seminal paper, Wang et al.(1) studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galled-tree". They gave a polynomial-time algorithm that was intended to determine whether or not a set of sequences could be generated on galled-tree. Unfortunately, the algorithm by Wang et al.(1) is incomplete and does not constitute a necessary test for the existence of a galled-tree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galled-tree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations. We also prove that when there is a galled-tree for the data, the galled-tree minimizing the number of recombinations is "essentially unique". We also note two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work.(8) PowerPoint slides of the conference talk can be found at our website.(7).  相似文献   

2.
Maximum likelihood haplotyping for general pedigrees   总被引:3,自引:0,他引:3  
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. The presented optimization algorithm also improves likelihood computations. We present experimental results for the new algorithms on a variety of real and semiartificial data sets, and use our software to evaluate MCMC approximations for haplotyping.  相似文献   

3.
In this paper, we give a complete characterization of the existence of a galled-tree network in the form of simple sufficient and necessary conditions for both root-known and root-unknown cases. As a by-product we obtain a simple algorithm for constructing galled-tree networks. We also introduce a new necessary condition for the existence of a galled-tree network similar to bi-convexity.  相似文献   

4.
The problem of inferring haplotype phase from a population of genotypes has received a lot of attention recently. This is partly due to the observation that there are many regions on human genomic DNA where genetic recombination is rare (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). A Haplotype Map project has been announced by NIH to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping problem, as an algorithmic implication of the no-recombination in long blocks observation, together with the standard population-genetic assumption of infinite sites. Gusfield's solution based on matroid theory was followed by direct theta(nm2) solutions that use simpler techniques (Bafna et al., 2003; Eskin et al., 2003), and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. (2003) and Eskin et al. (2003) be sped-up to O(nm + m2) time, which would imply an O(nm) time-bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes. We give reductions that suggests that the answer to both questions is "no." For the first problem, we show that computing the output of the first step (in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm(omega-1)), where omega < or = 2.52 is the exponent of matrix multiplication. Thus, any linear time solution to the PPH problem likely requires a different approach. For the second problem of computing a PPH solution that minimizes the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).  相似文献   

5.
Haplotyping as perfect phylogeny: a direct approach.   总被引:4,自引:0,他引:4  
A full haplotype map of the human genome will prove extremely valuable as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination in long blocks observation, for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution where the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping problem (PPH), which was introduced by Gusfield (2002) - given n genotypes of length m each, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) to solve this problem reduces it to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole that approach is quite involved, and in particular, challenging to program. Moreover, anyone wishing to fully establish, by reading existing literature, the correctness of the entire algorithm would need to read several deep and difficult papers in graph and matroid theory. However, as stated by Gusfield (2002), many simplifications are possible and the list of "future work" in Gusfield (2002) began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal, for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm(2))-time algorithm that determines whether there is a PPH solution for input genotypes and produces a linear-space data structure to represent all of the solutions. The approach allows complete, self-contained proofs. In addition to algorithmic simplicity, the approach here makes the representation of all solutions more intuitive than in Gusfield (2002), and solves another goal from that paper, namely, to prove a nontrivial upper bound on the number of PPH solutions, showing that that number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.  相似文献   

6.
The Gapped Consecutive-Ones Property (C1P) Problem, or the (k, δ)-C1P Problem is: given a binary matrix M and integers k and δ, decide if the columns of M can be ordered such that each row contains at most k blocks of 1's, and no two neighboring blocks of 1's are separated by a gap of more than δ 0's. This problem was introduced by Chauve et al. ( 2009b ). The classical polynomial-time solvable C1P Problem is equivalent to the (1, 0)-C1P problem. It has been shown that, for every unbounded or bounded k ≥ 2 and unbounded or bounded δ ≥ 1, except when (k, δ) = (2, 1), the (k, δ)-C1P Problem is NP-complete (Maňuch et al., 2011 ; Goldberg et al., 1995 ). In this article, we study the Gapped C1P Problem with a third parameter d, namely the bound on the maximum number of 1's in any row of M, or the bound on the maximum degree of M. This is motivated by the reconstruction of ancestral genomes (Ma et al., 2006 ; Chauve and Tannier, 2008 ), where, in binary matrices obtained from the experiments of Chauve and Tannier ( 2008 ), we have observed that the majority of the rows have low degree, while each high degree row contains many rows of low degree. The (d, k, δ)-C1P Problem has been shown to be polynomial-time solvable when all three parameters are fixed (Chauve et al., 2009b ). Since fixing d also fixes k (k ≤ d), the only case left to consider is the case when δ is unbounded, or the (d, k, ∞)-C1P Problem. Here we show that for every d > k ≥ 2, the (d, k, ∞)-C1P Problem is NP-complete.  相似文献   

7.
Meiotic recombination is a fundamental biological event and one of the principal evolutionary forces responsible for shaping genetic variation within species. In addition to its fundamental role, recombination is central to several critical applied problems. The most important example is "association mapping" in populations, which is widely hoped to help find genes that influence genetic diseases (Carlson et al., 2004; Clark, 2003). Hence, a great deal of recent attention has focused on problems of inferring the historical derivation of sequences in populations when both mutations and recombinations have occurred. In the algorithms literature, most of that recent work has been directed to single-crossover recombination. However, gene-conversion is an important, and more common, form of (two-crossover) recombination which has been much less investigated in the algorithms literature. In this paper, we explicitly incorporate gene-conversion into discrete methods to study historical recombination. We are concerned with algorithms for identifying and locating the extent of historical crossing-over and gene-conversion (along with single-nucleotide mutation), and problems of constructing full putative histories of those events. The novel technical issues concern the incorporation of gene-conversion into recently developed discrete methods (Myers and Griffiths, 2003; Song et al., 2005) that compute lower and upper-bound information on the amount of needed recombination without gene-conversion. We first examine the most natural extension of the lower bound methods from Myers and Griffiths (2003), showing that the extension can be computed efficiently, but that this extension can only yield weak lower bounds. We then develop additional ideas that lead to higher lower bounds, and show how to solve, via integer-linear programming, a more biologically realistic version of the lower bound problem. We also show how to compute effective upper bounds on the number of needed single-crossovers and gene-conversions, along with explicit networks showing a putative history of mutations, single-crossovers and gene-conversions. Both lower and upper bound methods can handle data with missing entries, and the upper bound method can be used to infer missing entries with high accuracy. We validate the significance of these methods by showing that they can be effectively used to distinguish simulation-derived sequences generated without gene-conversion from sequences that were generated with gene-conversion. We apply the methods to recently studied sequences of Arabidopsis thaliana, identifying many more regions in the sequences than were previously identified (Plagnol et al., 2006), where gene-conversion may have played a significant role. Demonstration software is available at www.csif.cs.ucdavis.edu/~gusfield.  相似文献   

8.
9.
Nishant KT  Plys AJ  Alani E 《Genetics》2008,179(2):747-755
Interference-dependent crossing over in yeast and mammalian meioses involves the mismatch repair protein homologs MSH4-MSH5 and MLH1-MLH3. The MLH3 protein contains a highly conserved metal-binding motif DQHA(X)(2)E(X)(4)E that is found in a subset of MLH proteins predicted to have endonuclease activities (Kadyrov et al. 2006). Mutations within this motif in human PMS2 and Saccharomyces cerevisiae PMS1 disrupted the endonuclease and mismatch repair activities of MLH1-PMS2 and MLH1-PMS1, respectively (Kadyrov et al. 2006, 2007; Erdeniz et al. 2007). As a first step in determining whether such an activity is required during meiosis, we made mutations in the MLH3 putative endonuclease domain motif (-D523N, -E529K) and found that single and double mutations conferred mlh3-null-like defects with respect to meiotic spore viability and crossing over. Yeast two-hybrid and chromatography analyses showed that the interaction between MLH1 and mlh3-D523N was maintained, suggesting that the mlh3-D523N mutation did not disrupt the stability of MLH3. The mlh3-D523N mutant also displayed a mutator phenotype in vegetative growth that was similar to mlh3Delta. Overexpression of this allele conferred a dominant-negative phenotype with respect to mismatch repair. These studies suggest that the putative endonuclease domain of MLH3 plays an important role in facilitating mismatch repair and meiotic crossing over.  相似文献   

10.
Inferring haplotype data from genotype data is a crucial step in linking SNPs to human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) problem deals with finding a set of haplotypes so that each given genotype can be formed by a combining a pair of haplotypes from the set. The perfect phylogeny haplotyping (PPH) problem is one of the many computational approaches to the HI problem. Though it was conjectured that the complexity of the PPH problem was O(nm), the complexity of all the solutions presented until recently was O(nm (2)). In this paper, we make complete use of the column-ordering that was presented earlier and show that there must be some interdependencies among the pairwise relationships between SNP sites in order for the given genotypes to allow a perfect phylogeny. Based on these interdependencies, we introduce the FlexTree (flexible tree) data structure that represents all the pairwise relationships in O(m) space. The FlexTree data structure provides a compact representation of all the perfect phylogenies for the given set of genotypes. We also introduce an ordering of the genotypes that allows the genotypes to be added to the FlexTree sequentially. The column ordering, the FlexTree data structure, and the row ordering we introduce make the O(nm) OPPH algorithm possible. We present some results on simulated data which demonstrate that the OPPH algorithm performs quiet impressively when compared to the previous algorithms. The OPPH algorithm is one of the first O(nm) algorithms presented for the PPH problem.  相似文献   

11.
MOTIVATION: An important tool for analyzing biological networks is the ability to perform homology searches, i.e. given a pattern network one would like to be able to search for occurrences of similar (sub)networks within a set of host networks. In the context of metabolic pathways, Pinter et al. [Bioinformatics, 2005] proposed to solve this computationally hard problem by restricting it to the case where both the pattern and host networks are trees. This restriction, however, severely limits the applicability of their algorithm. RESULTS: We propose a very fast and simple algorithm for the alignment of metabolic pathways that does not restrict the topology of the host or pattern network in any way; instead, our algorithm exploits a natural property of metabolic networks that we call 'local diversity property'. Experiments on a test bed of metabolic pathways from the BioCyc database indicate that our algorithm is much faster than the restricted algorithm of Pinter et al.-the metabolic pathways of two organisms can be aligned in mere seconds-and yet has a wider range of applicability and yields new biological insights. Our ideas can likely be extended to work for the alignment of various types of biological networks other than metabolic pathways. AVAILABILITY: Our algorithm has been implemented in C++ as a user-friendly metabolic pathway alignment tool called METAPAT. The tool runs under Linux or Windows and can be downloaded at http://theinf1.informatik.uni-jena.de/metapat/  相似文献   

12.
Plant architecture is considered to affect herbivory intensity, but it is one of the least studied factors in plant–insect interactions, especially for gall-inducing insects. This study aimed to investigate the influence of plant architecture on the speciose fauna of gall-inducing insects associated with 17 species of Baccharis. Five architectural variables were evaluated: plant height, number of fourth-level shoots, biomass, average level and number of ramifications. The number of galling species associated with each host plant species was also determined. To test the effects of plant architecture on gall richness at the individual level, we used another data set where the number of fourth-level shoots and gall richness were determined for B. concinna, B. dracunculifolia, and B. ramosissima every 3 weeks during 1 year. The average similarity between host species based on gall fauna was low (9%), but plants with the same architectural pattern tended to support similar gall communities. The most important architectural trait influencing gall richness at the species level was the number of fourth-level shoots, which is indicative of the availability of plant meristems, a fundamental tissue for gall induction and development. This variable also showed a positive correlation with gall richness at the individual level. We propose that variations in gall richness among host species are driven by interspecific differences in plant architecture via availability of young, undifferentiated tissue, which is genetically controlled by the strength of the apical dominance. Plant architecture should have evolutionary consequences for gall communities, promoting insect radiation among architecturally similar plants through host shift and sympatric speciation. We also discuss the role of plant architecture in the global biogeography of gall-inducing insects. Electronic supplementary material The online version of this article (doi:) contains supplementary material, which is available to authorized users.  相似文献   

13.
F Heffron  B J McCarthy  H Ohtsubo  E Ohtsubo 《Cell》1979,18(4):1153-1163
The complete nucleotide sequence of the transposon Tn3 and of 20 mutations which affect its transposition are reported. The mutations, generated in vitro by random insertion of synthetic restriction sites, proved to contain small duplications or deletions immediately adjacent to the new restriction site. By determining the phenotype and DNA sequence of these mutations we were able to generate an overlapping phenotypic and nucleotide map. This 4957 bp transposon encodes three polypeptides which account for all but 350 bp of its total coding capacity. These proteins are the transposase, a high molecular weight polypeptide (1015 amino acids) encoded by the tnpA gene; the Tn3-specific repressor, a low molecular weight polypeptide (185 amino acids) encoded by the tnpR gene; and the 286 amino acid beta-lactamase. The 38 bp inverted repeats flanking Tn3 appear to be absolutely required in cis for Tn3 to transpose. Genetic data suggest that Tn3 contains a third site (Gill et al., 1978), designated IRS (internal resolution site), whose absence results in the insertion of two complete copies of Tn3 as direct repeats into the recipient DNA. We suggest that these direct repeats of complete copies of Tn3 are intermediates in transposition, and that the IRS site is required for recombination and subsequent segregation of the direct repeats to leave a single copy of Tn3 (Gill et al., 1978). A 23 nucleotide sequence within the amino terminus of the transposase which shares strong sequence homology with the inverted repeat may be the internal resolution site.  相似文献   

14.
Analysis of haplotypes is an important tool in population genetics, familial heredity and gene mapping. Determination of haplotypes of multiple single nucleotide polymorphisms (SNPs) or other simple mutations is time consuming and expensive when analyzing large populations, and often requires the help of computational and statistical procedures. Based on double PCR amplification of specific alleles, described previously, we have developed a simple, rapid and low-cost method for direct haplotyping of multiple SNPs and simple mutations found within relatively short specific regions or genes (micro-haplotypes). Using this method, it is possible to directly determine the physical linkage of multiple heterozygous alleles, by conducting a series of double allele-specific PCR amplification sets with simple analysis by gel electrophoresis. Application of the method requires prior information as to the sequence of the segment to be haplotyped, including the polymorphic sites. We applied the method to haplotyping of nine sites in the chicken HSP108 gene. One of the haplotypes in the population apparently arose by recombination between two existing haplotypes, and we were able to locate the point of recombination within a segment of 19 bp. We anticipate rapidly growing needs for SNP haplotyping in human (medical and pharmacogenetics), animal and plant genetics; in this context, the multiple double PCR amplifications of specific alleles (MD-PASA) method offers a useful haplotyping tool.  相似文献   

15.
Summary The formation of crown gall tumours involves the transfer of the T-DNA region of the Ti plasmid from Agrobacterium to plant cells and its subsequent integration into plant chromosomes. When agrobacteria are incubated with plant protoplasts or exudates of plants, the T-DNA region is circularized by recombination or cleavage and rejoining between the 25 bp terminal repeats; the formation of circular T-DNAs is thought to be one step in T-DNA transfer (Koukolikova-Nicola et al. 1985; Machida et al. 1986). We previously showed that the virulence region of the Ti plasmid is required for T-DNA circularization. In the present paper, we examined the circularization event in agrobacteria harbouring octopine Ti plasmids with mutations in various loci of the virulence region. The results clearly demonstrate that the gene(s) encoded in the virD locus are necessary for T-DNA circularization. In particular, the gene(s) present in the region proximal to the virD promoter are essential. We propose that roduct(s) of this gene have recombinase or endonuclease activity which specifically recognizes the 25 bp terminal repeats of T-DNA.  相似文献   

16.
Parasitic nematode worms infect a variety of crop plants worldwide. Roots infected by these worms start to look rather unsavory – with knot like tumors (galls) developing all over them. At the core of each gall, a worm matures and lays its eggs. Olmo et al. (2018) looked into the developmental reprogramming that leads to gall formation and found an Arabidopsis protein to be a necessary component in this process.  相似文献   

17.
For a given set L of species and a set T of triplets on L, we seek to construct a phylogenetic network which is consistent with T i.e. which represents all triplets of T. The level of a network is defined as the maximum number of hybrid vertices in its biconnected components. When T is dense, there exist polynomial time algorithms to construct level-0,1 and 2 networks (Aho et al., 1981; Jansson, Nguyen and Sung, 2006; Jansson and Sung, 2006; Iersel et al., 2009). For higher levels, partial answers were obtained in the paper by Iersel and Kelk (2008), with a polynomial time algorithm for simple networks. In this paper, we detail the first complete answer for the general case, solving a problem proposed in Jansson and Sung (2006) and Iersel et al. (2009). For any k fixed, it is possible to construct a level-k network having the minimum number of hybrid vertices and consistent with T, if there is any, in time O(T(k+1)n([4k/3]+1)).  相似文献   

18.
This paper examines some of the rich structure of the syntenic distance model of evolutionary distance, introduced by Ferretti et al. (1996). The syntenic distance between two genomes is the minimum number of fissions, fusions, and translocations required to transform one into the other, ignoring gene order within chromosomes. We prove that the previously unanalyzed algorithm given by Ferretti et al. (1996) is a 2-approximation and no better, and that, further, it always outperforms the algorithm presented by DasGupta et al. (1998). We also prove the same results for an improved version of the Ferretti et al. algorithm. We then prove a number of properties which give insight into the structure of optimal move sequences. We give instances in which any move sequence working solely within connected components is nearly twice optimal and prove a general lower bound based on the spread of genes from each chromosome. We then prove a monotonicity property for the syntenic distance, and bound the difficulty of the hardest instance of any size. We discuss the results of implementing these algorithms and testing them on real and simulated synteny data.  相似文献   

19.
Sulfolobus solfataricus carboxypeptidase (CPSso) is a thermostable zinc-metalloenzyme, consisting of four identical subunits with a M(r) of 43,000. In a previous paper (Occhipinti et al., Biophys J 2003; 85:1165-1175), we developed a structure of the enzyme by molecular modeling and validated it by site-directed mutagenesis and small angle X-ray scattering. Here, we report investigations aimed at further validating the model, as well as at identifying molecular determinants responsible for thermostability. To this end, we took advantage of mass spectrometry techniques, notably LC-MS/MS. The structure was confirmed by such approaches, in that they lead to the identification of a disulfide bridge formed by Cys286 and Cys293, whose location in the model is well suited for giving rise to the crosslink. More notably, we also identified a protease-resistant core consisting of the N- and C-terminal antiparallel alpha-helices, which in the model are predicted to interact with each other via hydrophobic quadrants. On the basis of the model, we also tentatively identified the most tightly interacting residues as Leu7, Ala380, and Leu376. Although the replacement of Ala380 by serine did not detectably impair protein stability, a dramatic drop in thermostability was observed when the two leucines were replaced by either aspartate (L7D; L376D) or asparagine (L7N; L376N). We then investigated the kinetic thermal stability of the wild type and the mutants by determining the thermodynamic activation parameters, DeltaG++, DeltaH++, and DeltaS++. Besides highlighting the key role of the hydrophobic core in thermostability, these results suggest clearly different mechanisms of destabilization by the single mutations, depending on whether the leucines are replaced by asparagines or aspartates.  相似文献   

20.
The genetics of B-cell chronic lymphocytic leukemia (B-CLL) differ considerably from most other forms of hematologic malignancy which are usually characterized by chromosome translocations. B-CLL typically contains chromosomal deletions and chromosomes 13q14 and 11q22-->q23 are the most common. These two regions appear to share a common ancestral origin (Auer et al., 2007b). Overall, chromosomal abnormalities can be found in the majority of patients with B-CLL when using sensitive techniques (Dohneret al., 2000) and possibly reflects an underlying predisposition, with a small but significant number of familial cases. Although single and consistent abnormalities are most common, multiple rearrangements can occur, often with disease progression (Feganetal., 1995; Dohner et al., 2000). Regions of recurrent deletion suggest the presence of tumor suppressor genes if following Knudson's theoretical 2-hit model. However, despite extensive sequencing analysis over the last decade and lack of pathogenic mutations identified, there has been a move away from this suggested hypothesis and alternative mechanisms of gene inactivation involving epigenetic silencing or haploinsufficiency may be considered as more likely in this disease. This review focuses on the common genetic abnormalities in B-CLL and relates them to some of the more recent hypotheses on inactivation of genes within these regions of deletion.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号