Similar articles
 Found 20 similar articles (search time: 468 ms)
1.
We study three classical problems of genome rearrangement (sorting, halving, and the median problem) in a restricted double cut and join (DCJ) model. In the DCJ model, introduced by Yancopoulos et al., we can represent rearrangement events that happen in multichromosomal genomes, such as inversions, translocations, fusions, and fissions. Two DCJ operations can mimic transpositions or block interchanges by first extracting an appropriate segment of a chromosome, creating a temporary circular chromosome, and then reinserting it in its proper place. In the restricted model, we are concerned with multichromosomal linear genomes and we require that each circular excision is immediately followed by its reincorporation. Existing linear-time DCJ sorting and halving algorithms ignore this reincorporation constraint. In this article, we propose a new algorithm for the restricted sorting problem running in O(n log n) time, thus improving on the known quadratic time algorithm. We solve the restricted halving problem and give an algorithm that computes a multilinear halved genome in linear time. Finally, we show that the restricted median problem is NP-hard as conjectured.

2.
Given a phylogenetic tree involving whole genome duplication events, we contribute to solving the problem of computing the rearrangement and double cut-and-join (DCJ) distances on a branch of the tree linking a duplication node d to a speciation node or a leaf s. In the case of a genome G at s containing exactly two copies of each gene, the genome halving problem is to find a perfectly duplicated genome D at d minimizing the rearrangement distance with G. We generalize the existing exact linear-time algorithm for genome halving to the case of a genome G with missing gene copies. In the case of a known ancestral duplicated genome D, we develop a greedy approach for computing the distance between G and D, called the double distance. Two algorithms are developed, one for a genome G containing exactly two copies of each gene and one for a genome G containing at most two copies of each gene (with missing gene copies). These algorithms are shown to be time-efficient and very accurate for both the rearrangement and DCJ distances.

3.
MOTIVATION: Finding genomic distance based on gene order is a classic problem in genome rearrangements. Efficient exact algorithms for genomic distances based on inversions and/or translocations have been found but are complicated by special cases, rare in simulations and empirical data. We seek a universal operation underlying a more inclusive set of evolutionary operations and yielding a tractable genomic distance with simple mathematical form. RESULTS: We study a universal double-cut-and-join operation that accounts for inversions, translocations, fissions and fusions, but also produces circular intermediates which can be reabsorbed. The genomic distance, computable in linear time, is given by the number of breakpoints minus the number of cycles (b-c) in the comparison graph of the two genomes; the number of hurdles does not enter into it. Without changing the formula, we can replace generation and re-absorption of a circular intermediate by a generalized transposition, equivalent to a block interchange, with weight two. Our simple algorithm converts one multi-linear chromosome genome to another in the minimum distance.
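
The "breakpoints minus cycles" idea can be illustrated in a simplified setting. The sketch below is not the authors' algorithm: it assumes both genomes consist only of circular chromosomes over the same set of n genes, in which case the distance is commonly written as n minus the number of cycles formed by alternating between the adjacencies of the two genomes. The function names and the signed-integer gene encoding are illustrative assumptions.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene.

    A positive gene reads tail -> head; a negative gene reads head -> tail.
    """
    g = abs(gene)
    tail, head = (g, "t"), (g, "h")
    return (tail, head) if gene > 0 else (head, tail)


def partner_map(genome):
    """Map each gene extremity to the extremity adjacent to it.

    Assumption: every chromosome in `genome` is circular, so the last gene
    is adjacent to the first one.
    """
    partner = {}
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):
            right_of_a = extremities(a)[1]
            left_of_b = extremities(b)[0]
            partner[right_of_a] = left_of_b
            partner[left_of_b] = right_of_a
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """n - c for two circular genomes over the same gene set (a sketch)."""
    pa, pb = partner_map(genome_a), partner_map(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes")
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate: follow an adjacency of A, then an adjacency of B.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    n_genes = len(pa) // 2
    return n_genes - cycles
```

For example, `dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]])` evaluates a single inversion of gene 2, and two identical genomes are at distance zero. Linear chromosomes need extra bookkeeping for chromosome ends, which this sketch deliberately leaves out.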

4.
Nucleotide insertions and deletions (indels) are responsible for gaps in sequence alignments. Indels are one of the major sources of evolutionary change at the molecular level. We have examined the patterns of insertions and deletions in 19 mammalian genomes and found that deletion events are more common than insertions in the mammalian genomes. The numbers of both insertions and deletions decrease rapidly as the gap length increases, and single-nucleotide indels are the most frequent of all indel events. The frequencies of both insertions and deletions can be described well by a power law. Key Words: Insertion, deletion, gap, indel, mammalian genome.
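
A power-law description of indel frequencies means that the count of gaps of length L behaves roughly like c * L^(-alpha). The abstract does not say how the fit was done, so the sketch below is only one plausible way to estimate the exponent: ordinary least squares on log-transformed values. The function name and the sample counts are hypothetical and are not data from the study.

```python
import math


def fit_power_law(lengths, counts):
    """Fit count = c * length**(-alpha) by least squares in log-log space.

    Returns (alpha, c). All lengths and counts must be positive.
    """
    xs = [math.log(length) for length in lengths]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Hypothetical, noise-free counts generated from c = 1000 and alpha = 2.
lengths = list(range(1, 11))
counts = [1000.0 * length ** -2.0 for length in lengths]
alpha, c = fit_power_law(lengths, counts)
```

On real gap-length histograms a log-log regression is sensitive to sparse counts at long lengths, so a maximum-likelihood fit may be preferable; the regression is used here only because it is short and easy to check.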

5.
Lateral gene transfer has emerged as an important force in bacterial evolution. A substantial number of genes can be inserted into or deleted from genomes through the process of lateral transfer. In this study, we looked for atypical occurrence of genes among related organisms to detect laterally transferred genes. We have analyzed 50 complete bacterial genomes from nine groups. For each group we use a 16S rRNA phylogeny and a comparison of protein similarity to map gene insertions/deletions onto their species phylogeny. The results reveal that there is poor correlation of genes inserted, deleted, and duplicated with evolutionary branch length. In addition, the numbers of genes inserted, deleted, or duplicated within the same branch are not always correlated with each other. Nor is there any similarity within groups. For example, in the Rhizobiales group, the ratio of insertions to deletions in the evolutionary branch leading to Agrobacterium tumefaciens str. C58 (Cereon) is 0.52, but it is 39.52 for Mesorhizobium loti. Most strikingly, the number of insertions of foreign genes is much larger in the external branches of the trees. These insertions also greatly outnumber the occurrence of deletions, and yet the genome sizes of these bacteria remain roughly constant. This indicates that many of the insertions are specific to each organism and are lost before related species can evolve. Simulations of the process of insertion and deletion, tailored to each phylogeny, support this conclusion.

6.
MOTIVATION: The double cut and join operation (abbreviated as DCJ) has been extensively used for genomic rearrangement. Although the DCJ distance between signed genomes with both linear and circular (uni- and multi-) chromosomes is well studied, the only known result for the NP-complete unsigned DCJ distance problem is an approximation algorithm for unsigned linear unichromosomal genomes. In this article, we study the problem of computing the DCJ distance on two unsigned linear multichromosomal genomes (abbreviated as UDCJ). RESULTS: We devise a 1.5-approximation algorithm for UDCJ by exploiting the distance formula for signed genomes. In addition, we show that UDCJ admits a weak kernel of size 2k and hence an FPT algorithm running in O(2^{2k} n) time.

7.
Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data. Their algorithm determines the minimum number of reversals required to rearrange one genome into another, but only in the absence of gene duplicates. However, duplicates often account for 40% of a genome. In this paper, we show how to extend Hannenhalli and Pevzner's approach to deal with genomes with multi-gene families. We propose a new heuristic algorithm to compute the nearest reversal distance between two genomes with multi-gene families via binary integer programming. The experimental results on both synthetic and real biological data demonstrate that the proposed algorithm is able to find the reversal distance with high accuracy.

8.
We have developed a software tool, GenomeComp, for summarizing, parsing and visualizing genome sequence comparison results derived from voluminous BLAST textual output. With GenomeComp, variations between genomes, such as repeat regions, insertions, deletions and rearrangements of genomic segments, can be easily highlighted. This software provides a new visualization tool for microbial comparative genomics.

9.
A phylogenetic analysis of indel dynamics in the cotton genus   Total citations: 2, self-citations: 0, citations by others: 2
Genome size evolution is a dynamic process involving counterbalancing mechanisms whose actions vary across lineages and over time. Whereas the primary mechanism of expansion, transposable element (TE) amplification, has been widely documented, the evolutionary dynamics of genome contraction have been less thoroughly explored. To evaluate the relative impact and evolutionary stability of the mechanisms that affect genome size, we conducted a phylogenetic analysis of indel rates for 2 genomic regions in 4 Gossypium genomes: the 2 coresident genomes (A(T) and D(T)) of tetraploid cotton and its model diploid progenitors, Gossypium arboreum (A) and Gossypium raimondii (D). We determined the rates of sequence gain or loss along each branch, partitioned by mechanism, and how these changed during species divergence. In general, there has been a propensity toward growth of the diploid genomes and contraction in the polyploid. Most of the size difference between the diploid species occurred prior to polyploid divergence and was largely attributable to TE amplification in the A/A(T) genome. After separating from the true parents of the polyploid genomes, both diploid genomes experienced slower sequence gain than in the ancestor, due to fewer TE insertions in the A genome and a combination of increased deletions and decreased TE insertions in the D genome. Both genomes of the polyploid displayed increased rates of deletion and decreased rates of insertion, leading to a rate of near stasis in D(T) and overall contraction in A(T) resulting in polyploid genome contraction. As expected, TE insertions contributed significantly to the genome size differences; however, intrastrand homologous recombination, although rare, had the most significant impact on the rate of deletion. 
Small indel data for the diploids suggest the possibility of a bias as the smaller genomes add less or delete more sequence through small indels than do the larger genomes, whereas data for the polyploid suggest increased sequence turnover in general (both as small deletions and small insertions). Illegitimate recombination, although not demonstrated to be a dominant mechanism of genome size change, was biased in the polyploid toward deletions, which may provide a partial explanation of polyploid genomic downsizing.

10.
We propose new algorithms for computing pairwise rearrangement scenarios that conserve the combinatorial structure of genomes. More precisely, we investigate the problem of sorting signed permutations by reversals without breaking common intervals. We describe a combinatorial framework for this problem that allows us to characterize classes of signed permutations for which one can compute, in polynomial time, a shortest reversal scenario that conserves all common intervals. In particular, we define a class of permutations for which this computation can be done in linear time with a very simple algorithm that does not rely on the classical Hannenhalli-Pevzner theory for sorting by reversals. We apply these methods to the computation of rearrangement scenarios between permutations obtained from 16 synteny blocks of the X chromosomes of the human, mouse, and rat.

11.
12.

Background

There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes?

Results

Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.

Conclusions

The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.

13.
In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods ([1], [2]) have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G_1 and G_2: first, one establishes a one-to-one correspondence between genes of G_1 and genes of G_2; second, once this correspondence is established, it defines explicitly a permutation and it is then possible to quantify their similarity using classical measures defined for permutations, like the number of breakpoints. Hence these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes, and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, to compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy are deleted in both genomes for each gene family, and the matching model, which computes a maximal correspondence for each gene family. We show that for these two models, and for three (dis)similarity measures on permutations, namely the number of common intervals, the maximum adjacency disruption (MAD) number and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete, and even APX-hard for the MAD number and SAD number.
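
Once a one-to-one correspondence is fixed, measures such as the number of common intervals are easy to evaluate; the hardness lies in choosing the correspondence. The sketch below counts common intervals (sets of at least two genes that are contiguous in both orders) for two unsigned permutations of the same genes, using a simple quadratic scan. It is an illustration under that definition, not the method of the paper, and the function name is an assumption.

```python
def count_common_intervals(perm_a, perm_b):
    """Count sets of >= 2 genes that are contiguous in both permutations.

    Both arguments must be orderings of the same distinct genes.
    Runs in O(n^2) time, which is fine for illustration.
    """
    if sorted(perm_a) != sorted(perm_b):
        raise ValueError("permutations must contain the same genes")
    # Rewrite perm_b in terms of positions in perm_a, so perm_a becomes
    # the identity and contiguity in perm_a becomes a value range.
    position_in_a = {gene: i for i, gene in enumerate(perm_a)}
    q = [position_in_a[gene] for gene in perm_b]

    count = 0
    for i in range(len(q)):
        lo = hi = q[i]
        for j in range(i + 1, len(q)):
            lo = min(lo, q[j])
            hi = max(hi, q[j])
            # q[i..j] holds j - i + 1 distinct values; they form a range
            # exactly when max - min equals j - i.
            if hi - lo == j - i:
                count += 1
    return count
```

For instance, `count_common_intervals([1, 2, 3], [3, 2, 1])` counts {1, 2}, {2, 3} and {1, 2, 3}, since reversing an order preserves every interval.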

14.
The Hardness (Ha) locus controls grain hardness in hexaploid wheat (Triticum aestivum) and its relatives (Triticum and Aegilops species) and represents a classical example of a trait whose variation arose from gene loss after polyploidization. In this study, we investigated the molecular basis of the evolutionary events observed at this locus by comparing corresponding sequences of diploid, tetraploid, and hexaploid wheat species (Triticum and Aegilops). Genomic rearrangements, such as transposable element insertions, genomic deletions, duplications, and inversions, were shown to constitute the major differences when the same genomes (i.e., the A, B, or D genomes) were compared between species of different ploidy levels. The comparative analysis allowed us to determine the extent and sequences of the rearranged regions as well as rearrangement breakpoints and sequence motifs at their boundaries, which suggest rearrangement by illegitimate recombination. Among these genomic rearrangements, the previously reported loss of the Pina and Pinb genes from the Ha locus of polyploid wheat species was caused by a large genomic deletion that probably occurred independently in the A and B genomes. Moreover, the Ha locus in the D genome of hexaploid wheat (T. aestivum) is 29 kb smaller than in the D genome of its diploid progenitor Ae. tauschii, principally because of transposable element insertions and two large deletions caused by illegitimate recombination. Our data suggest that illegitimate DNA recombination, leading to various genomic rearrangements, constitutes one of the major evolutionary mechanisms in wheat species.

15.
The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.
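
The abstract does not spell out the SCJ formula. A minimal sketch, assuming the usual formulation in which each single cut removes one adjacency and each single join adds one: the distance between two genomes over the same genes is then the size of the symmetric difference of their adjacency sets. The sketch handles linear chromosomes only, and the encoding (lists of signed integers) and function names are illustrative assumptions.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene."""
    g = abs(gene)
    tail, head = (g, "t"), (g, "h")
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(genome):
    """Collect the adjacencies of a genome made of linear chromosomes.

    Each adjacency is an unordered pair of gene extremities, so reading a
    chromosome in the opposite direction yields the same set.
    """
    result = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:]):
            result.add(frozenset((extremities(a)[1], extremities(b)[0])))
    return result


def scj_distance(genome_a, genome_b):
    """Adjacencies to cut from A plus adjacencies to join to reach B."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    return len(adj_a - adj_b) + len(adj_b - adj_a)
```

Under this measure a fission such as `scj_distance([[1, 2, 3]], [[1, 2], [3]])` costs one cut, while an inversion of gene 2 costs two cuts and two joins, which illustrates why SCJ is described as a first approximation to more realistic models.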

16.
Long INterspersed Elements (LINE-1s or L1s) are abundant non-LTR retrotransposons in mammalian genomes that are capable of insertional mutagenesis. They have been associated with target site deletions upon insertion in cell culture studies of retrotransposition. Here, we report 50 deletion events in the human and chimpanzee genomes directly linked to the insertion of L1 elements, resulting in the loss of ~18 kb of sequence from the human genome and ~15 kb from the chimpanzee genome. Our data suggest that during the primate radiation, L1 insertions may have deleted up to 7.5 Mb of target genomic sequences. While the results of our in vivo analysis differ from those of previous cell culture assays of L1 insertion-mediated deletions in terms of the size and rate of sequence deletion, evolutionary factors can reconcile the differences. We report a pattern of genomic deletion sizes similar to those created during the retrotransposition of Alu elements. Our study provides support for the existence of different mechanisms for small and large L1-mediated deletions, and we present a model for the correlation of L1 element size and the corresponding deletion size. In addition, we show that internal rearrangements can modify L1 structure during retrotransposition events associated with large deletions.

17.
Retroelements (REs) occupy up to 40% of the human genome. Newly integrated REs can change the pattern of expression of pre-existing host genes and therefore might play a significant role in evolution. In particular, human- and primate-specific REs could affect the divergence of the Hominoidea superfamily. A comparative genome-wide analysis of RE sites of integration, neighboring genes, and their regulatory interplay in human and ape genomes would be of help in understanding the impact of REs on evolution and genome regulation. We have developed a technique for the genome-wide comparison of the integrations of transposable elements in genomic DNAs of closely related species. The technique, called targeted genome differences analysis (TGDA), is also useful for the detection of deletion/insertion polymorphisms of REs. The technique is based on an enhanced version of subtractive hybridization and does not require preliminary knowledge of the genome sequences under comparison. In this report, we describe its application to the detection and analysis of human-specific L1 integrations and their polymorphisms. We obtained a library highly enriched in human-specific L1 insertions and identified 24 such new insertions. Many of these insertions are polymorphic in human populations. The total number of human-specific L1 inserts was estimated to be approximately 4000. The results suggest that TGDA is a universal method that can be successfully used for the detection of evolutionary and polymorphic markers in any closely related genomes.

18.
Recently integrated Alu elements and human genomic diversity   Total citations: 8, self-citations: 0, citations by others: 8
A comprehensive analysis of two Alu Y lineage subfamilies was undertaken to assess Alu-associated genomic diversity and identify new Alu insertion polymorphisms for the study of human population genetics. Recently integrated Alu elements (283) from the Yg6 and Yi6 subfamilies were analyzed by polymerase chain reaction (PCR), and 25 of the loci analyzed were polymorphic for insertion presence/absence within the genomes of a diverse array of human populations. These newly identified Alu insertion polymorphisms will be useful tools for the study of human genomic diversity. Our screening of the Alu insertion loci also resulted in the recovery of several "young" Alu elements that resided at orthologous positions in nonhuman primate genomes. Sequence analysis demonstrated these "young" Alu insertions were the products of gene conversion events of older, preexisting Alu elements or independent parallel forward insertions of older Alu elements in the same short genomic region. The level of gene conversion between Alu elements suggests that it may have an influence on the single nucleotide polymorphism within Alu elements in the genome. We have also identified two genomic deletions associated with the retroposition and insertion of Alu Y lineage elements into the human genome. This type of Alu retroposition-mediated genomic deletion is a novel source of lineage-specific evolution within primate genomes.

19.
Efficient bacterial genetic engineering approaches with broad‐host applicability are rare. We combine two systems, mobile group II introns (‘targetrons’) and Cre/lox, which function efficiently in many different organisms, into a versatile platform we call GETR (Genome Editing via Targetrons and Recombinases). The introns deliver lox sites to specific genomic loci, enabling genomic manipulations. Efficiency is enhanced by adding flexibility to the RNA hairpins formed by the lox sites. We use the system for insertions, deletions, inversions, and one‐step cut‐and‐paste operations. We demonstrate insertion of a 12‐kb polyketide synthase operon into the lacZ gene of Escherichia coli, multiple simultaneous and sequential deletions of up to 120 kb in E. coli and Staphylococcus aureus, inversions of up to 1.2 Mb in E. coli and Bacillus subtilis, and one‐step cut‐and‐pastes for translocating 120 kb of genomic sequence to a site 1.5 Mb away. We also demonstrate the simultaneous delivery of lox sites into multiple loci in the Shewanella oneidensis genome. No selectable markers need to be placed in the genome, and the efficiency of Cre‐mediated manipulations typically approaches 100%.

20.
Presence/absence patterns of retroposon insertions at orthologous genomic loci constitute straightforward markers for phylogenetic or population genetic studies. In birds, the convenient identification and utility of these markers has so far been mainly restricted to the lineages leading to model birds (i.e., chicken and zebra finch). We present an easy-to-use, rapid, and cost-effective method for the experimental isolation of chicken repeat 1 (CR1) insertions from virtually any bird genome and potentially nonavian genomes. The application of our method to the little grebe genome yielded insertions belonging to new CR1 subfamilies that are scattered all across the phylogenetic tree of avian CR1s. Furthermore, presence/absence analysis of these insertions provides the first retroposon evidence grouping flamingos + grebes as Mirandornithes and several markers for all subsequent branching events within grebes (Podicipediformes). Five markers appear to be species-specific insertions, including the hitherto first evidence in birds for biallelic CR1 insertions that could be useful in future population genetic studies.
