Similar Articles
1.
Given a phylogenetic tree for a family of tandemly repeated genes and their signed order on the chromosome, we aim to find the minimum number of inversions compatible with an evolutionary history of this family. This is the first attempt to account for inversions in an evolutionary model of tandemly repeated genes. We present a branch-and-bound algorithm that finds the exact solution, and a polynomial-time heuristic based on the breakpoint distance. We show, on simulated data, that these algorithms can be used to improve phylogenetic inference of tandemly repeated gene families. An application to a published phylogeny of KRAB zinc finger genes is presented.
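The heuristic above is based on the breakpoint distance. The following is a minimal sketch of that distance for two signed linear permutations over the same genes 1..n, framed by 0 and n+1. It assumes an adjacency (a, b) is conserved when the reference contains (a, b) or its reverse-strand reading (-b, -a). It illustrates the distance only; it is not the authors' heuristic.

```python
def adjacencies(perm):
    """Adjacencies of a signed linear permutation framed by 0 and n+1."""
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    return list(zip(framed, framed[1:]))


def breakpoint_distance(perm, ref):
    """Number of adjacencies of `perm` that are not conserved in `ref`.

    (a, b) is conserved if `ref` has (a, b) or, read on the other strand, (-b, -a).
    """
    ref_adj = set(adjacencies(ref))
    return sum(
        1
        for a, b in adjacencies(perm)
        if (a, b) not in ref_adj and (-b, -a) not in ref_adj
    )
```

For example, inverting the single gene 2 in 1 2 3 creates two breakpoints, one at each end of the inverted segment.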

2.
We present the first practical algorithm for the optimal linear leaf ordering of trees that are generated by hierarchical clustering. Hierarchical clustering has been extensively used to analyze gene expression data, and we show how optimal leaf ordering can reveal biological structure that is not observed with an existing heuristic ordering method. For a tree with n leaves, there are 2^(n-1) linear orderings consistent with the structure of the tree. Our optimal leaf ordering algorithm runs in O(n^4) time, and we present further improvements that make the running time of our algorithm practical.
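To make the count 2^(n-1) concrete, this sketch enumerates every ordering reachable by flipping the two children of each internal node. It then picks the ordering that maximises the summed similarity of adjacent leaves. This brute force is exponential and is not the authors' O(n^4) algorithm. The nested-tuple tree format and the `sim` dictionary are assumptions made for illustration.

```python
from itertools import product


def leaf_orderings(tree):
    """Yield every leaf ordering obtainable by flipping children of internal nodes.

    A tree is either a leaf label (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        yield [tree]
        return
    left, right = tree
    for lo, ro in product(leaf_orderings(left), leaf_orderings(right)):
        yield lo + ro  # children kept in their current order
        yield ro + lo  # children flipped


def best_ordering(tree, sim):
    """Exhaustively find the ordering maximising summed adjacent-leaf similarity."""
    def score(order):
        return sum(sim[a][b] for a, b in zip(order, order[1:]))

    return max(leaf_orderings(tree), key=score)
```

A binary tree with 4 leaves has 3 internal nodes, so the generator yields 2^3 = 8 orderings. This exponential growth is why an exact polynomial-time method matters.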

3.
A preliminary step in most comparative genomics studies is the annotation of chromosomes as ordered sequences of genes. Different genetic mapping techniques often give rise to different maps with unequal gene content and sets of unordered neighboring genes. Only partial orders can thus be obtained from combining such maps. However, once a total order O is known for a given genome, it can be used as a reference to order the genes of a closely related species characterized by a partial order P. Our goal is to find a linearization of P that is as close as possible to O, in terms of a given genomic distance. We first prove NP-completeness results for the breakpoint and the common interval distances. We then focus on the breakpoint distance and give a dynamic programming algorithm whose running time is exponential for general partial orders, but polynomial when the partial order is derived from a bounded number of genetic maps. A time-efficient greedy heuristic is then given for the general case and is empirically shown to produce solutions within 10% of the optimal solution on simulated data. Applications to the analysis of grass genomes are presented.
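This tiny brute-force sketch states the linearization problem directly. It enumerates the linear extensions of P and keeps one with the fewest breakpoints against O. It is neither the authors' dynamic program nor their greedy heuristic, and it is feasible only for a handful of genes. Several simplifications are assumptions: P is given as a list of (before, after) pairs, the genes are unsigned, and P and O share the same gene content.

```python
from itertools import permutations


def linear_extensions(genes, before):
    """Yield every total order of `genes` that respects the pairs (a, b) in `before`."""
    for order in permutations(genes):
        pos = {g: i for i, g in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in before):
            yield list(order)


def unsigned_breakpoints(order, reference):
    """Count adjacent pairs of `order` that are not adjacent (either way) in `reference`."""
    ref_adj = {frozenset(p) for p in zip(reference, reference[1:])}
    return sum(1 for p in zip(order, order[1:]) if frozenset(p) not in ref_adj)


def closest_linearization(genes, before, reference):
    """Linear extension of the partial order with the fewest breakpoints to `reference`."""
    return min(
        linear_extensions(genes, before),
        key=lambda order: unsigned_breakpoints(order, reference),
    )
```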

4.
The total order of genes or markers on a chromosome is crucial for most comparative genomics studies. However, current gene mapping efforts might only provide a partial order of the genes on a chromosome. Several genes or markers might be mapped to the same position because of the low resolution of gene mapping or missing data. Moreover, conflicting datasets might give rise to ambiguous gene orders. In this paper, we consider the reversal distance and breakpoint distance problems for partially ordered genomes. We first prove that these problems are NP-hard, and then give an efficient heuristic algorithm to compute the breakpoint distance between partially ordered genomes. The algorithm is based on an efficient approximation algorithm for a natural generalization of the well-known feedback vertex set problem, and has been tested on both simulated and real biological datasets. The experimental results demonstrate that our algorithm is quite effective for estimating the breakpoint distance between partially ordered genomes and for inferring the (total) gene order.

5.
Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data. Their algorithm determines the minimum number of reversals required to rearrange one genome into another, but only in the absence of duplicated genes. However, duplicates often account for 40% of a genome. In this paper, we show how to extend Hannenhalli and Pevzner's approach to genomes with multi-gene families. We propose a new heuristic algorithm that computes the nearest reversal distance between two genomes with multi-gene families via binary integer programming. Experimental results on both synthetic and real biological data demonstrate that the proposed algorithm finds the reversal distance with high accuracy.

6.
The shortest common supersequence problem is a classical problem with many applications in fields such as planning, artificial intelligence and, especially, bioinformatics. Because the problem is NP-hard, we cannot expect to solve it efficiently with conventional exact techniques. This paper presents a heuristic for the problem that uses, at different levels, a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared with current approaches in the literature. Experiments show that it provides better-quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but the required execution times may increase considerably.
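For orientation, this is a minimal sketch of plain deterministic beam search for the shortest common supersequence. It is not the probabilistic multi-level variant described above. A state records how much of each input string is already covered. Each step appends one character that some string still needs and advances every string whose next character matches. The ranking heuristic, fewest characters left to cover, and the default `beam_width` are assumptions.

```python
def is_subsequence(s, t):
    """True if `s` can be read, in order, inside `t`."""
    it = iter(t)
    return all(ch in it for ch in s)


def beam_search_scs(strings, beam_width=3):
    """Deterministic beam search for a short common supersequence of `strings`."""
    beam = [("", tuple(0 for _ in strings))]
    while True:
        done = [seq for seq, pos in beam
                if all(p == len(s) for p, s in zip(pos, strings))]
        if done:
            return min(done, key=len)
        candidates = {}
        for seq, pos in beam:
            # Characters that at least one string still needs next.
            heads = {s[p] for p, s in zip(pos, strings) if p < len(s)}
            for ch in sorted(heads):
                new_pos = tuple(p + 1 if p < len(s) and s[p] == ch else p
                                for p, s in zip(pos, strings))
                candidates.setdefault(new_pos, seq + ch)
        # Keep the states with the fewest characters still to cover.
        ranked = sorted(candidates.items(),
                        key=lambda item: sum(len(s) - p for p, s in zip(item[0], strings)))
        beam = [(seq, pos) for pos, seq in ranked[:beam_width]]
```

Every step covers at least one more character, so the search ends after at most the total input length. A wider beam trades running time for solution quality.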

7.
The order of genes in the genomes of species can change during evolution and can provide information about their phylogenetic relationship. One approach to inferring phylogenetic relationships from gene orders is to consider different types of rearrangement operations and to find possible rearrangement scenarios using these operations. One of the most common rearrangement operations is the reversal, which reverses the order of a segment of neighboring genes. In this paper, we study the problem of finding the ancestral gene order for three species represented by their gene orders. The rearrangement scenario should use a minimal number of reversals and no other rearrangement operations. This problem is called the Median problem and is known to be NP-complete. We describe a heuristic algorithm for the Median problem that searches for rearrangement scenarios with the additional property that gene groups should not be destroyed by reversal operations. The concept of conserved intervals for signed permutations is used to describe such gene groups. We show experimentally, for different types of test problems, that the proposed algorithm produces very good results compared with other algorithms for the Median problem. We also integrate our reversal selection procedure into the well-known MGR and GRAPPA algorithms and show that they achieve a significant speedup while obtaining solutions of the same quality as the original algorithms on the test problems.
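A signed reversal, as described above, reverses a segment of neighboring genes and flips the orientation of each gene in it. This sketch shows the operation together with a simple check of whether a gene group still occupies consecutive positions. That contiguity check is a simplified, hypothetical stand-in for the conserved-interval criterion, which is more restrictive.

```python
def reversal(perm, i, j):
    """Apply a signed reversal to positions i..j (inclusive, 0-based)."""
    return perm[:i] + [-g for g in reversed(perm[i:j + 1])] + perm[j + 1:]


def is_contiguous(perm, group):
    """True if the genes in the non-empty `group` (compared by absolute value)
    occupy consecutive positions in `perm`."""
    positions = sorted(i for i, g in enumerate(perm) if abs(g) in group)
    return (len(positions) == len(group)
            and positions[-1] - positions[0] == len(group) - 1)
```

For instance, reversing positions 0..1 of 1 2 3 4 gives -2 -1 3 4, which separates genes 2 and 3. A reversal-selection rule in the spirit of the abstract would avoid such a move.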

8.
Breakpoint graph analysis is a key algorithmic technique in studies of genome rearrangements. However, breakpoint graphs are defined only for genomes without duplicated genes, which limits their applications in rearrangement analysis. We discuss a connection between breakpoint graphs and de Bruijn graphs that leads to a generalization of the notion of breakpoint graph for genomes with duplicated genes. We further use the generalized breakpoint graphs to study the genome halving problem (first introduced and solved by Nadia El-Mabrouk and David Sankoff). The El-Mabrouk-Sankoff algorithm is rather complex; in this paper, we present an alternative approach based on generalized breakpoint graphs. The generalized breakpoint graphs make the El-Mabrouk-Sankoff result more transparent and promise to be useful in future studies of genome rearrangements.

9.
The small-island effect (SIE) has become a widespread pattern in island biogeography and biodiversity research. However, most previous studies use only area to detect the SIE, while other causal factors, such as habitat diversity, are rarely considered. Therefore, the role of habitat diversity in generating SIEs is poorly known. Here, we compiled 86 global datasets that included the variables habitat diversity, area and species richness to systematically investigate the prevalence and underlying factors determining the role of habitat diversity in generating SIEs. For each dataset, we used both path analysis and breakpoint regressions to identify the existence of an SIE. We collected a number of system characteristics and employed logistic regression models and an information-theoretic approach to determine which combination of variables was important in determining the role of habitat diversity in generating SIEs. Among the 61 datasets with adequate fits, habitat diversity was found to influence the detection of SIEs in 32 cases (52.5%) when using path analysis. By contrast, SIEs were detected in 26 of 61 cases (42.6%) using breakpoint regressions. Model selection and model-averaged parameter estimates showed that Number of sites, Habitat range and Species range were the three key variables that determined the role of habitat diversity in generating SIEs. By contrast, Area range, Taxon group and Site type received considerably less support. Our study demonstrates that the effect of habitat diversity on generating SIEs is quite prevalent. Including habitat diversity is important because it provides a causal factor for the detection of SIEs. We conclude that, for a better understanding of the causes of SIEs, habitat diversity should be included in future studies.
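Breakpoint regressions for the SIE typically look for a threshold below which richness does not respond to island size. This sketch fits one common form, a continuous "left-horizontal" piecewise model, by grid search over the observed x values, in pure Python. The exact model forms used in the study are not stated here. The choice of this model, the candidate grid and the reading of x as (log) area are assumptions.

```python
def fit_line(z, y):
    """Ordinary least squares for y = a + b*z; returns (a, b, rss)."""
    n = len(z)
    zbar, ybar = sum(z) / n, sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    b = szy / szz
    a = ybar - b * zbar
    rss = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    return a, b, rss


def left_horizontal_breakpoint(x, y):
    """Grid-search the threshold T of
        y = c                for x <= T
        y = c + b * (x - T)  for x >  T
    over interior observed x values; returns (T, c, b, rss) with minimal RSS.
    """
    best = None
    for t in sorted(set(x))[1:-1]:
        z = [max(0.0, xi - t) for xi in x]
        if len(set(z)) < 2:
            continue  # slope is not identifiable for this threshold
        c, b, rss = fit_line(z, y)
        if best is None or rss < best[3]:
            best = (t, c, b, rss)
    return best
```

In practice, the piecewise fit would be compared with a simple linear fit, for example by an information criterion, before declaring an SIE.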

10.
Machado CA, Haselkorn TS, Noor MA. Genetics. 2007;175(3):1289-1306.
There is increasing evidence that chromosomal inversions may facilitate the formation or persistence of new species by allowing genetic factors conferring species-specific adaptations or reproductive isolation to be inherited together and by reducing or eliminating introgression. However, the genomic domain of influence of the inverted regions on introgression has not been carefully studied. Here, we present a detailed study on the consequences that distance from inversion breakpoints has had on the inferred level of gene flow and divergence between Drosophila pseudoobscura and D. persimilis. We identified the locations of the inversion breakpoints distinguishing D. pseudoobscura and D. persimilis in chromosomes 2, XR, and XL. Population genetic data were collected at specific distances from the inversion breakpoints of the second chromosome and at two loci inside the XR and XL inverted regions. For loci outside the inverted regions, we found that distance from the nearest inversion breakpoint had a significant effect on several measures of divergence and gene flow between D. pseudoobscura and D. persimilis. The data fitted a logarithmic relationship, showing that the suppression of crossovers in inversion heterozygotes also extends to loci located outside the inversion but close to it (within 1-2 Mb). Further, we detected a significant reduction in nucleotide variation inside the inverted second chromosome region of D. persimilis and near one breakpoint, consistent with a scenario in which this inversion arose and was fixed in this species by natural selection.

11.
The comparison of the gene orders in a set of genomes can be used to infer their phylogenetic relationships and to reconstruct ancestral gene orders. For three genomes this is done by solving the "median problem for breakpoints"; this solution can then be incorporated into a routine for estimating optimal gene orders for all the ancestral genomes in a fixed phylogeny. For the difficult (and most prevalent) case where the genomes contain partially different sets of genes, we present a general heuristic for the median problem for induced breakpoints. A fixed-phylogeny optimization based on this is applied in a phylogenetic study of a set of completely sequenced protist mitochondrial genomes, confirming some of the recent sequence-based groupings which have been proposed and, conversely, confirming the usefulness of the breakpoint method as a phylogenetic tool even for small genomes.
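To pin down what the "median problem for breakpoints" asks, this exhaustive sketch tries every signed permutation of 1..n. It returns one that minimises the summed breakpoint distance to three input genomes. It assumes identical gene content, so it does not handle the induced-breakpoint case that the heuristic above addresses. Its cost is n! * 2^n candidates, so it is only usable for toy inputs.

```python
from itertools import permutations, product


def breakpoint_distance(perm, ref):
    """Signed breakpoints of `perm` relative to `ref`, framed by 0 and n+1."""
    n = len(perm)

    def adj(p):
        framed = [0] + list(p) + [n + 1]
        return list(zip(framed, framed[1:]))

    ref_adj = set(adj(ref))
    return sum(1 for a, b in adj(perm)
               if (a, b) not in ref_adj and (-b, -a) not in ref_adj)


def breakpoint_median(genomes):
    """Exhaustively search for a signed permutation minimising the summed
    breakpoint distance to every genome in `genomes`."""
    n = len(genomes[0])
    best, best_score = None, None
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            cand = [s * g for s, g in zip(signs, order)]
            score = sum(breakpoint_distance(cand, g) for g in genomes)
            if best_score is None or score < best_score:
                best, best_score = cand, score
    return best, best_score
```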

12.
Ectopic exchange between transposable elements or other repetitive sequences along a chromosome can produce chromosomal inversions. As a result, genome sequence studies typically find sequence similarity between corresponding inversion breakpoint regions. Here, we identify and investigate the breakpoint regions of the X chromosome inversion distinguishing Drosophila mojavensis and Drosophila arizonae. We localize one inversion breakpoint to 13.7 kb and localize the other to a 1-Mb interval. Using this localization and assuming microsynteny between Drosophila melanogaster and D. arizonae, we pinpoint likely positions of the inversion breakpoints to windows of less than 3000 bp. These breakpoints define the size of the inversion to approximately 11 Mb. However, in contrast to many other studies, we fail to find significant sequence similarity between the 2 breakpoint regions. The localization of these inversion breakpoints will facilitate future genetic and molecular evolutionary studies in this species group, an emerging model system for ecological genetics.

13.
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between two previous extremes: mathematically rigorous but computationally prohibitive methods, and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology along a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies of two recombinant pathogens, Neisseria and HIV-1, as well as on simulated data. We show that we can detect not only recombinant regions of vastly different sizes but also the locations of breakpoints with great accuracy. Our method infers recombination breakpoints well while remaining practical for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.
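The decoding step of such an approach can be illustrated with a Viterbi pass in which the hidden states are candidate tree topologies. This sketch assumes the per-column log-likelihood under each topology is already available, as the hypothetical input `site_loglik`, and uses a single symmetric switching probability. It does not perform the authors' joint estimation of trees and HMM parameters. Recombination breakpoints are read off where the decoded topology changes.

```python
import math


def viterbi_segments(site_loglik, switch_prob=0.01):
    """Most likely topology path along an alignment.

    site_loglik[i][k] is the log-likelihood of alignment column i under topology k
    (an assumed, precomputed input). Returns (path, breakpoints), where breakpoints
    are the column indices at which the decoded topology changes.
    """
    n_states = len(site_loglik[0])
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (n_states - 1)) if n_states > 1 else float("-inf")
    score = [math.log(1.0 / n_states) + ll for ll in site_loglik[0]]
    back = []  # back[t][k]: best predecessor of state k at column t + 1
    for column in site_loglik[1:]:
        prev_best, new_score = [], []
        for k in range(n_states):
            options = [score[j] + (stay if j == k else move) for j in range(n_states)]
            j_best = max(range(n_states), key=options.__getitem__)
            prev_best.append(j_best)
            new_score.append(options[j_best] + column[k])
        back.append(prev_best)
        score = new_score
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    path.reverse()
    breakpoints = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, breakpoints
```

A smaller `switch_prob` penalises short spurious segments more heavily. This mirrors the trade-off between detecting small recombinant regions and avoiding false breakpoints.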

14.
The Hardy-Weinberg law is among the most important principles in the study of biological systems. Given its importance, many tests have been devised to determine whether a finite population follows Hardy-Weinberg proportions. Because asymptotic tests can fail, Guo and Thompson developed an exact test; unfortunately, the Monte Carlo method they proposed to evaluate their test has a running time that grows linearly in the size of the population N. Here, we propose a new algorithm whose expected running time is linear in the size of the table produced, and completely independent of N. In practice, this new algorithm can be considerably faster than the original method.
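For the simplest case, a single biallelic locus, the exact test can be computed by full enumeration rather than Monte Carlo. Conditional on allele counts n_a and n_b in N diploids, the heterozygote count has the closed-form null distribution used below. The p-value sums the probabilities of all counts no more likely than the observed one. This sketch is a different technique from both Guo and Thompson's Monte Carlo method and the new algorithm described above, and it does not cover multiple alleles.

```python
import math


def hwe_het_probs(n_a, n_b):
    """Null distribution {n_ab: probability} of heterozygote counts under
    Hardy-Weinberg equilibrium, given allele counts n_a + n_b = 2N:
    P(n_ab) = N! / (n_aa! n_ab! n_bb!) * 2**n_ab * n_a! n_b! / (2N)!
    """
    n = (n_a + n_b) // 2

    def lf(k):
        return math.lgamma(k + 1)  # log(k!)

    probs = {}
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa, n_bb = (n_a - n_ab) // 2, (n_b - n_ab) // 2
        log_p = (lf(n) - lf(n_aa) - lf(n_ab) - lf(n_bb) + n_ab * math.log(2)
                 + lf(n_a) + lf(n_b) - lf(2 * n))
        probs[n_ab] = math.exp(log_p)
    return probs


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact p-value: total probability of outcomes no more likely than observed."""
    n_a, n_b = 2 * n_aa + n_ab, 2 * n_bb + n_ab
    probs = hwe_het_probs(n_a, n_b)
    observed = probs[n_ab]
    # The small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs.values() if p <= observed * (1 + 1e-9)))
```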

15.
Rigid-body docking approaches are not sufficient to predict the structure of a protein complex from the unbound (native) structures of the two proteins. Accounting for side chain flexibility is an important step towards fully flexible protein docking. This work describes an approach that allows conformational flexibility for the side chains while keeping the protein backbone rigid. Starting from candidates created by a rigid-docking algorithm, we demangle the side chains of the docking site, thus creating reasonable approximations of the true complex structure. These structures are ranked with respect to the binding free energy. We present two new techniques for side chain demangling. Both approaches are based on a discrete representation of the side chain conformational space by the use of a rotamer library. This leads to a combinatorial optimization problem. For the solution of this problem, we propose a fast heuristic approach and an exact, albeit slower, method that uses branch-and-cut techniques. As a test set, we use the unbound structures of three proteases and the corresponding protein inhibitors. For each of the examples, the highest-ranking conformation produced was a good approximation of the true complex structure.
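With a rotamer library, side chain placement becomes a discrete problem: choose one rotamer per residue to minimise self energies plus pairwise energies. This generic sketch contrasts exhaustive search with a simple iterated-conditional-modes heuristic that re-optimises one residue at a time. The energy tables are hypothetical inputs. Neither function is the authors' fast heuristic or their branch-and-cut method.

```python
from itertools import product


def total_energy(choice, self_e, pair_e):
    """Energy of a rotamer assignment: self terms plus pairwise terms.

    self_e[i][r] is the energy of residue i in rotamer r; pair_e[(i, j)][r][s]
    is the interaction of residue i in rotamer r with residue j in rotamer s.
    """
    e = sum(self_e[i][r] for i, r in enumerate(choice))
    for (i, j), table in pair_e.items():
        e += table[choice[i]][choice[j]]
    return e


def exact_min(self_e, pair_e):
    """Exhaustive search over all rotamer combinations (exponential)."""
    spaces = [range(len(opts)) for opts in self_e]
    return min(product(*spaces), key=lambda c: total_energy(c, self_e, pair_e))


def greedy_min(self_e, pair_e, sweeps=10):
    """Iterated conditional modes: re-optimise one residue at a time."""
    choice = [min(range(len(opts)), key=opts.__getitem__) for opts in self_e]
    for _ in range(sweeps):
        changed = False
        for i in range(len(self_e)):
            best_r = min(
                range(len(self_e[i])),
                key=lambda r: total_energy(choice[:i] + [r] + choice[i + 1:],
                                           self_e, pair_e),
            )
            if best_r != choice[i]:
                choice[i], changed = best_r, True
        if not changed:
            break
    return tuple(choice)
```

The heuristic can stop in a local minimum. This is why an exact method, such as the branch-and-cut approach mentioned above, remains useful as a reference.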

16.
In comparative genomics, gene order data is often modeled as signed permutations. A classical problem in genome comparison is to detect common intervals in permutations, that is, genes that are colocalized in several species, indicating that they remained grouped during evolution. A second widely studied problem related to gene order is to compute a minimum scenario of reversals that transforms one signed permutation into another. Several studies have begun to combine the two problems, and it was observed that their results are not always compatible: parsimonious scenarios of reversals often break common intervals. A scenario that does not break any common interval is called perfect. In two recent studies, Berard et al. defined a class of permutations for which a perfect scenario of reversals sorting a permutation can be built in polynomial time. They left open whether it is possible to decide, given a permutation, if there exists a minimum scenario of reversals that is perfect. In this paper, we give a solution to this problem and prove that it widens the class of permutations addressed by the aforementioned studies. We implemented and tested this algorithm on gene order data of chromosomes from several mammal species and compared it with other methods. The algorithm helps to choose among several possible scenarios of reversals and indicates that the minimum scenario of reversals is not always the most plausible.
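Common intervals between a permutation and the identity can be listed with a simple quadratic scan. A segment perm[i..j] is a common interval exactly when its maximum minus its minimum equals j - i. Signs are ignored here, as is usual for common intervals. The sketch assumes the second genome has been relabelled as the identity 1..n; it is a textbook O(n^2) enumeration, not the algorithm of this paper.

```python
def common_intervals_with_identity(perm):
    """All (i, j), i < j, such that the absolute values in perm[i..j] form a set of
    consecutive integers, i.e. the common intervals of `perm` and the identity."""
    result = []
    for i in range(len(perm)):
        lo = hi = abs(perm[i])
        for j in range(i + 1, len(perm)):
            lo, hi = min(lo, abs(perm[j])), max(hi, abs(perm[j]))
            if hi - lo == j - i:  # j - i + 1 distinct values spanning hi - lo + 1
                result.append((i, j))
    return result
```

For example, in 3 1 2 4 the segments {3, 1, 2}, {1, 2} and the whole permutation are common intervals. A reversal scenario that splits any of them would not be perfect.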

17.
The evolutionary history of certain species, such as polyploids, is modeled by a generalization of phylogenetic trees called multi-labeled phylogenetic trees, or MUL trees for short. One problem related to inferring a MUL tree is how to construct the smallest possible MUL tree that is consistent with a given set of rooted triplets, the SMRT problem for short. This problem is NP-hard. There is one exact algorithm for the SMRT problem, which runs in … time, where … is the number of taxa. In this paper, we show that SMRT does not seem to be an appropriate formulation from a biological point of view. To this end, we present a heuristic algorithm named MTRT for this problem and run it on real and simulated datasets. The results of MTRT show that triplets alone cannot provide enough information to infer the true MUL tree. It is therefore inappropriate to infer a MUL tree from triplet information alone while minimizing the number of duplications. Finally, we introduce some new problems that are more suitable from a biological point of view.
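As background for "consistent with a given set of rooted triplets", this sketch lists the resolved triplets ab|c displayed by an ordinary, singly labelled rooted tree. For each three leaves, the pair whose lowest common ancestor is strictly deepest forms the cherry; a three-way tie is a fan and yields no triplet. The nested-tuple tree format is an assumption. MUL trees, where labels repeat, need a more careful definition than this sketch handles.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf label to the tuple of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for k, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (k,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    d = 0
    for a, b in zip(p, q):
        if a != b:
            break
        d += 1
    return d


def rooted_triplets(tree):
    """Set of resolved triplets ((a, b), c), meaning ab|c, displayed by `tree`."""
    paths = leaf_paths(tree)
    triplets = set()
    for x, y, z in combinations(sorted(paths), 3):
        pairs = [((x, y), z), ((x, z), y), ((y, z), x)]
        depths = [lca_depth(paths[a], paths[b]) for (a, b), _ in pairs]
        deepest = max(depths)
        if depths.count(deepest) == 1:  # otherwise the three leaves form a fan
            triplets.add(pairs[depths.index(deepest)])
    return triplets
```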

18.

Background

The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods

We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of \(k \le 3\) and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice.

Results

The developed methods compute accurate positional orthologs for genomes comparable in size to bacterial genomes, on both simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs as well as or better than the well-established gene family prediction tool MultiMSOAR.

Conclusions

We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.

19.
Combined analysis of fourteen nuclear genes refines the Ursidae phylogeny
Despite numerous studies, questions remain about the evolutionary history of Ursidae and additional independent genetic markers were needed to elucidate these ambiguities. For this purpose, we sequenced ten nuclear genes for all the eight extant bear species. By combining these new sequences with those of four other recently published nuclear markers, we provide new insights into the phylogenetic relationships of the Ursidae family members. The hypothesis that the giant panda was the first species to diverge among ursids is definitively confirmed and the precise branching order within the Ursus genus is clarified for the first time. Moreover, our analyses indicate that the American and the Asiatic black bears do not cluster as sister taxa, as had been previously hypothesised. Sun and sloth bears clearly appear as the most basal ursine species but uncertainties about their exact relationships remain. Since our larger dataset did not enable us to clarify this last question, identifying rare genomic changes in bear genomes could be a promising solution for further studies.

20.
MOTIVATION: Deciphering the location of gene duplications and multiple gene duplication episodes on the Tree of Life is fundamental to understanding the way gene families and genomes evolve. The multiple gene duplication problem provides a framework for placing gene duplication events onto nodes of a given species tree, and detecting episodes of multiple gene duplication. One version of the multiple gene duplication problem was defined by Guigó et al. in 1996. Several heuristic solutions have since been proposed for this problem, but no exact algorithms were known. RESULTS: In this article we solve this longstanding open problem by providing the first exact and efficient solution. We also demonstrate the improvement offered by our algorithm over the best heuristic approaches, by applying it to several simulated as well as empirical datasets.
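The usual starting point for placing duplications on a species tree is the LCA reconciliation of a single gene tree. Each gene-tree node maps to the lowest species-tree node containing all species of its leaves. A node is a duplication when its mapping equals that of one of its children. This sketch computes only that mapping and the duplication count. It does not solve the multiple gene duplication problem, which additionally moves duplications upward to group them into episodes. The nested-tuple tree format and the `gene_to_species` dictionary are assumptions.

```python
def species_paths(tree, path=()):
    """Map each species name to the tuple of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: path}
    out = {}
    for k, child in enumerate(tree):
        out.update(species_paths(child, path + (k,)))
    return out


def common_prefix(p, q):
    """Path to the lowest common ancestor of the nodes at paths p and q."""
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return p[:n]


def count_duplications(gene_tree, gene_to_species, species_tree):
    """LCA reconciliation: return (species node path of the root, number of duplications)."""
    paths = species_paths(species_tree)

    def walk(node):
        if isinstance(node, str):
            return paths[gene_to_species[node]], 0
        child_maps, dups = [], 0
        for child in node:
            m, d = walk(child)
            child_maps.append(m)
            dups += d
        m = child_maps[0]
        for cm in child_maps[1:]:
            m = common_prefix(m, cm)
        if m in child_maps:  # maps to the same species node as a child
            dups += 1
        return m, dups

    return walk(gene_tree)
```

For instance, with species tree ((A,B),C) and gene tree ((a1,b1),(a2,c1)), the gene-tree root maps to the species root, as does its child (a2,c1). The sketch therefore reports one duplication at the root.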

