首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
《Gene》1996,172(1):GC11-GC17
Algorithms inspired by comparative genomics calculate an edit distance between two linear orders based on elementary edit operations such as inversion, transposition and reciprocal translocation. All operations are generally assigned the same weight, simply by default, because no systematic empirical studies exist verifying whether algorithmic outputs involve realistic proportion of each. Nor de we have data on how weights should vary with the length of the inverted or transposed segment of the chromosome. In this paper, we present a rapid algorithm that allows each operation to take on a range of weights, producing an relatively tight bound on the distance between single-chromosome genomes, by means of a greedy search with look-ahead. The efficiency of this algorithm allows us to test random genomes for each parameter setting, to detect gene order similarity and to infer the parameter values most appropriate to the phylogenetic domain under study. We apply this method to genome segments in which the sa me gene order is conserved in Escherichia coli and Bacillus subtilis, as well as to the gene order in human versus Drosophila mitochondrial genomes. In both cases, we conclude that it is most appropriate to assign somewhat more than twice the weight to transpositions and inverted transpositions than to inversions. We also explore segment-length weighting for fungal mitochondrial gene orders.  相似文献   

3.
Compositional constraints and genome evolution   总被引:31,自引:0,他引:31  
  相似文献   

4.
Fast algorithms for large-scale genome alignment and comparison   总被引:35,自引:5,他引:30       下载免费PDF全文
We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters together matches. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.  相似文献   

5.
A central problem in genome rearrangement is finding a most parsimonious rearrangement scenario using certain rearrangement operations. An important problem of this type is sorting a signed genome by reversals and translocations (SBRT). Hannenhalli and Pevzner presented a duality theorem for SBRT which leads to a polynomial time algorithm for sorting a multi-chromosomal genome using a minimum number of reversals and translocations. However, there is one case for which their theorem and algorithm fail. We describe that case and suggest a correction to the theorem and the polynomial algorithm. The solution of SBRT uses a reduction to the problem of sorting a signed permutation by reversals (SBR). The best extant algorithms for SBR require quadratic time. The common approach to solve SBR is by finding a safe reversal using the overlap graph or the interleaving graph of a permutation. We describe a family of signed permutations which proves a quadratic lower bound on the number of affected vertices in the overlap/interleaving graph during any optimal sorting scenario. This implies, in particular, an Omega(n3) lower bound for Bergeron's algorithm.  相似文献   

6.
GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified criteria. Finally, the remaining regions of sequence identity are used to define significant collinear regions among the sequences. By locating collinear regions of sequence, GRIL provides a basis for multiple genome alignment using current alignment systems. GRIL also provides a basis for using current inversion distance tools to infer phylogeny. AVAILABILITY: GRIL is implemented in C++ and runs on any x86-based Linux or Windows platform. It is available from http://asap.ahabs.wisc.edu/gril  相似文献   

7.
By utilization of mid-repetitive sequences, the intracisternal A particle (IAP) gene, as a probe, genome rearrangement involving IAP genes and their neighboring sequences in rodent cells can be monitored. This is based on electrophoretic separation of the twice digested restriction fragments of genomic DNA in a 2-D pattern. The first digestion was done in solution followed by electrophoresis of the restriction fragments in the first dimension. A second restriction enzyme digestion was carried out in situ in the gel followed by electrophoresis in a second dimension perpendicular to the first electrophoresis. After Southern blotting, the DNA on the filter is hybridized with a probe that is a fragment located near the 5' end of the IAP gene, but does not overlap with the 5' long terminal repeat (LTR). The exposed X-ray film revealed about 370 distinct spots in the 2-D maps. In comparing the 2-D maps, genome rearrangement involving IAP was detected.  相似文献   

8.
Summary: ROBIN is a web server for analyzing genome rearrangementof block-interchanges between two chromosomal genomes. It takestwo or more linear/circular chromosomes as its input, and computesthe number of minimum block-interchange rearrangements betweenany two input chromosomes for transforming one chromosome intoanother and also determines an optimal scenario taking thisnumber of rearrangements. The input can be either bacterial-sizesequence data or landmark-order data. If the input is sequencedata, ROBIN will automatically search for the identical landmarksthat are the homologous/conserved regions shared by all theinput sequences. Availability: ROBIN is freely accessed at http://genome.life.nctu.edu.tw/ROBIN Contact: cllu{at}mail.nctu.edu.tw  相似文献   

9.

Background  

Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes become available. However, this phylogenetic reconstruction is a very challenging task. For the most simple distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes.  相似文献   

10.
11.
Gene sequences contain a gold mine of phylogenetic information. But unfortunately for taxonomists this information does not only tell the story of the species from which it was collected. Genes have their own complex histories which record speciation events, of course, but also many other events. Among them, gene duplications, transfers and losses are especially important to identify. These events are crucial to account for when reconstructing the history of species, and they play a fundamental role in the evolution of genomes, the diversification of organisms and the emergence of new cellular functions. We review reconciliations between gene and species trees, which are rigorous approaches for identifying duplications, transfers and losses that mark the evolution of a gene family. Existing reconciliation models and algorithms are reviewed and difficulties in modeling gene transfers are discussed. We also compare different reconciliation programs along with their advantages and disadvantages.  相似文献   

12.
工业微生物是微生物制造的核心,而微生物制造过程中微生物的生产性能是提高微生物制造过程效率的关键。结合基于约束条件的优化算法的基因组规模代谢网络模型是全局规模上认识、调控和优化工业微生物生理功能的重要平台。本文在简述基因组规模代谢网络模型构建的基础上,详细介绍了基于约束条件的优化算法的原理、分类和应用实例,为提高微生物制造过程效率奠定了坚实的基础。  相似文献   

13.
Let A be a sequence of n real numbers, L(1) and L(2) be two integers such that L(1) < or = L(2) , and R(1) and R(2) be two real numbers such that R(1) < or = R(2). An interval of A is feasible if its length is between L(1) and L(2) and its average is between R(1) and R(2). In this paper, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. The problems are motivated from the problem of locating CpG islands in biomolecular sequences. In this paper, we firstly show that all the problems have Omega (n log n)-time lower bound in the comparison model. Then, we use geometric approaches to design optimal algorithms for the problems. All the presented algorithms run in an on-line manner and use O(n) space.  相似文献   

14.
With the advent of experimental technologies like chemical cross-linking, it has become possible to obtain distances between specific residues of a newly sequenced protein. These types of experiments usually are less time consuming than X-ray crystallography or NMR. Consequently, it is highly desired to develop a method that incorporates this distance information to improve the performance of protein threading methods. However, protein threading with profiles in which constraints on distances between residues are given is known to be NP-hard. By using the notion of a maximum edge-weight clique finding algorithm, we introduce a more efficient method called FTHREAD for profile threading with distance constraints that is 18 times faster than its predecessor CLIQUETHREAD. Moreover, we also present a novel practical algorithm NTHREAD for profile threading with Non-strict constraints. The overall performance of FTHREAD on a data set shows that although our algorithm uses a simple threading function, our algorithm performs equally well as some of the existing methods. Particularly, when there are some unsatisfied constraints, NTHREAD (Non-strict constraints threading algorithm) performs better than threading with FTHREAD (Strict constraints threading algorithm). We have also analyzed the effects of using a number of distance constraints. This algorithm helps the enhancement of alignment quality between the query sequence and template structure, once the corresponding template structure is determined for the target sequence.  相似文献   

15.
Assignment of orthologous genes via genome rearrangement   总被引:1,自引:0,他引:1  
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.  相似文献   

16.
Small RNAs produced by an RNAi-related mechanism are involved in DNA elimination during development of the somatic macronucleus from the germline micronucleus in Tetrahymena. The properties of these small RNAs can explain how the primary sequence of the parental macronucleus epigenetically controls genome rearrangement in the new macronucleus and provide the first demonstration of an RNAi-mediated process that directly alters DNA sequence organization. Methylation of histone H3 on lysine 9 and accumulation of chromodomain proteins, hallmarks of heterochromatin, also occur specifically on sequences undergoing elimination and are dependent on the small RNAs. These findings contribute to a new paradigm of chromatin biology: regulation of heterochromatin formation by RNAi-related mechanisms in eukaryotes.  相似文献   

17.
18.
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of "symmetric inversions"-inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.  相似文献   

19.
MOTIVATION: In spite of a well-known fact that genome rearrangements are supposed to be viewed in the light of the evolutionary relationships within and between the species involved, no formal underlying framework based on the evolutionary considerations for treating the questions arising in the area has been proposed. If such an underlying framework is provided, all the basic questions in the area can be posed in a biologically more appropriate and useful form: e.g., the similarity between two genomes can then be computed via the nearest ancestor, rather than 'directly', ignoring the evolutionary connections. RESULTS: We outline an evolution-based general framework for answering questions related to the multiple genome rearrangement. In the proposed model, the evolutionary genome graph (EG-graph) encapsulates an evolutionary history of a genome family. For a set of all EG-graphs, we introduce a family of similarity measures, each defined via a fixed set of genome transformations. Given a set of genomes and restricting ourselves to the transpositions, an algorithm for constructing an EG-graph is presented. We also present the experimental results in the form of an EG-graph for a set of concrete genomes (for several species). This EG-graph turns out to be very close to the corresponding known phylogenetic tree.  相似文献   

20.
Despite the various processing steps involved in V(D)J recombination, which could potentially introduce many biases in the length distribution of complementarity determining region 3 (CDR3) segments, the observed CDR3 length distributions for complete repertoires are very close to a normal-like distribution. This raises the question of whether this distribution is simply a result of the random steps included in the process of gene rearrangement, or has been optimized during evolution. We have addressed this issue by constructing a simulation of gene rearrangement, which takes into account the DNA modification steps included in the process, namely hairpin opening, nucleotide additions, and nucleotide deletions. We found that the near-Gaussian- shape of CDR3 length distribution can only be obtained under a relatively narrow set of parameter values, and thus our model suggests that specific biases govern the rearrangement process. In both B-cell receptor (BCR) heavy chain and T-cell receptor beta chain, we obtained a Gaussian distribution using identical parameters, despite the difference in the number and the lengths of the D segments. Hence our results suggest that these parameters most likely reflect the optimal conditions under which the rearrangement process occurs. We have subsequently used the insights gained in this study to estimate the probability of occurrence of two exactly identical BCRs over the course of a human lifetime. Whereas identical rearrangements of the heavy chain are highly unlikely to occur within one human lifetime, for the light chain we found that this probability is not negligible, and hence the light chain CDR3 alone cannot serve as an indicator of B-cell clonality.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号