Similar documents
1.

Background

There has been a trend toward increasing the phylogenetic scope of genome sequencing without finishing the genome sequences. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole-genome data on gene order or syntenic block order. How, then, can we use rearrangement algorithms to compare genomes that are available only in scaffold form? Can the comparative evidence predict the location of unsequenced genes?

Results

Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between, on the one hand, the ratio of missing genes that are actually absent from the genome to those that are merely unsequenced and, on the other, the increase in genomic distance after scaffold filling. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.

Conclusions

The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.

2.

Background

Bacterial genomes develop new mechanisms to help them survive the challenging conditions they encounter during the course of their evolution. Acquisition of new genes by lateral gene transfer (LGT) may be one of the dominant modes of adaptation in bacterial genome evolution. Lateral gene transfer provides the bacterial genome with a new set of genes that help it to explore and adapt to new ecological niches.

Methods

A maximum likelihood analysis was performed on the five sequenced corynebacterial genomes to model the rates of gene insertion/deletion at various depths of the phylogeny.

Results

The study shows that most of the laterally acquired genes are transient and the inferred rates of gene movement are higher on the external branches of the phylogeny and decrease as the phylogenetic depth increases. The newly acquired genes are under relaxed selection and evolve faster than their older counterparts. Analysis of some of the functionally characterised LGTs in each species has indicated that they may have a possible adaptive role.

Conclusion

The five corynebacterial genomes sequenced to date have evolved by acquiring between 8% and 14% of their genomes through LGT, and some of these genes may have a role in adaptation.

3.

Background

The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions.

Results

We use the median reconstruction, at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to illustrate the effects of resolution (granularity) on comparative genomic analyses.

Conclusions

We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.

4.
Nikita Alexeev, Max A. Alekseyev. BMC Genomics, 2017, 18(4): 356-9

Background

The ability to estimate the evolutionary distance between extant genomes plays a crucial role in many phylogenomic studies. Such estimation is often based on the parsimony assumption, which implies that the distance between two genomes can be estimated as the rearrangement distance, equal to the minimal number of genome rearrangements required to transform one genome into the other. In reality, however, the parsimony assumption may not always hold, emphasizing the need for estimation that does not rely on the rearrangement distance. The distance that accounts for the actual (rather than minimal) number of rearrangements between two genomes is often referred to as the true evolutionary distance. Although a method for estimating the true evolutionary distance exists, it assumes that, in the course of evolution, genomes are equally likely to be broken by rearrangements at any position. This assumption, known as the random breakage model, has recently been refuted in favor of the more rigorous fragile breakage model, which postulates that only certain “fragile” genomic regions are prone to rearrangements.

Results

We propose a new method for estimating the true evolutionary distance between two genomes under the fragile breakage model. An evaluation of the proposed method on simulated genomes shows its high accuracy. We further apply the proposed method to estimate evolutionary distances within a set of five yeast genomes and a set of two fish genomes.

Conclusions

The true evolutionary distances between the five yeast genomes, estimated with the proposed method, reveal that some pairs of yeast genomes violate the parsimony assumption. The proposed method further shows that the rearrangement distance between the two fish genomes underestimates their evolutionary distance by about 20%. These results demonstrate how drastically the two distances can differ and justify the use of the true evolutionary distance in phylogenomic studies.

5.

Background

One of the important steps in the process of assembling a genome sequence from short reads is scaffolding, in which the contigs in a draft genome are ordered and oriented into scaffolds. Currently, several scaffolding tools based on a single reference genome have been developed. However, a single reference genome may not be sufficient alone for a scaffolder to generate correct scaffolds of a target draft genome, especially when the evolutionary relationship between the target and reference genomes is distant or some rearrangements occur between them. This motivates the need to develop scaffolding tools that can order and orient the contigs of the target genome using multiple reference genomes.

Results

In this work, we use a heuristic method to develop a new scaffolder called Multi-CSAR that can accurately scaffold a target draft genome based on multiple reference genomes, none of which needs to be complete. Our experimental results on real datasets show that Multi-CSAR outperforms two other multiple reference-based scaffolding tools, Ragout and MeDuSa, on many average metrics, such as sensitivity, precision, F-score, genome coverage, NGA50, scaffold number and running time.

Conclusions

Multi-CSAR is a multiple reference-based scaffolder that can efficiently produce more accurate scaffolds of a target draft genome by referring to multiple complete and/or incomplete genomes of related organisms. Its stand-alone program is available for download at https://github.com/ablab-nthu/Multi-CSAR.

6.

Background

An increasing number of microbial genomes are being sequenced and deposited in public databases. In addition, several closely related strains are being sequenced to understand the genetic basis of diversity and the mechanisms that lead to the acquisition of new genetic traits. These efforts have created a need to visualize microbial genomes and to compare them on a finer scale. We have developed GenomeViz to enable rapid visualization and subsequent comparison of several microbial genomes in an interactive environment.

Results

Here we describe a program that allows visualization of both qualitative and quantitative information from complete and partially sequenced microbial genomes. Using GenomeViz, data deriving from studies on genomic islands, gene/protein classifications, GC content, GC skew, whole genome alignments, microarrays and proteomics may be plotted. Several genomes can be visualized interactively at the same time from a comparative genomic perspective and publication quality circular genome plots can be created.
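GC content and GC skew are two of the per-window tracks mentioned above. As a rough illustration of what such tracks contain (this is not GenomeViz code; the function name `gc_windows` and its `window`/`step` parameters are hypothetical), both values can be computed over a sliding window, taking GC skew as (G - C)/(G + C):

```python
def gc_windows(seq, window, step):
    """Return (start, gc_content, gc_skew) for each sliding window.

    Hypothetical helper for illustration; windows that would run past the
    end of the sequence are skipped.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # GC skew = (G - C) / (G + C); defined here as 0 when the window has no G or C
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        results.append((start, gc_content, gc_skew))
    return results


print(gc_windows("GGGCATAT", 4, 4))
```

Plotting these values around a circular chromosome gives the kind of GC content and GC skew rings described above; the window and step sizes are free choices of this sketch.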

Conclusions

GenomeViz should allow researchers to perform visualization and comparative analysis of up to eight different microbial genomes simultaneously.

7.

Background

Genomic DNA frequently undergoes rearrangements of gene order that can be localized by comparing two DNA sequences. In mitochondrial genomes, different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. To study such effects, it is important to locate the breakpoint positions precisely.

Results

We define a partially local sequence alignment problem which assumes that, following a rearrangement of a sequence F, two fragments L and R are produced that may fit together exactly to match F, leave a gap of deleted DNA between L and R, or overlap with each other. We show that this alignment problem can be solved by dynamic programming in cubic space and time. We apply the new method to evaluate rearrangements of animal mitogenomes and find that a surprisingly large fraction of these events involved local sequence duplications.

Conclusions

The partially local sequence alignment method is an effective way to investigate the mechanism of genomic rearrangement events. While applied here only to mitogenomes there is no reason why the method could not be used to also consider rearrangements in nuclear genomes.

8.

Background

One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form \(\pi =(\pi _1 \pi _2 \ldots \pi _n)\), and in that case we can consider without loss of generality that one of them is the identity permutation \(\iota _n =(1 2 \ldots n)\), and that we just need to sort the other (i.e., transform it into \(\iota _n\)). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs).
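To make the operations above concrete, here is a small sketch of a super short reversal (reversing and negating one or two consecutive elements) and a super short transposition (swapping two adjacent elements without changing signs) on a signed circular permutation, plus a brute-force breadth-first search for the minimum number of such operations. Assumptions: positions wrap around, any rotation of \(\iota _n\) counts as sorted, and all names are hypothetical. The search is exponential and only usable for tiny n; it is not the polynomial algorithm described in this abstract.

```python
from collections import deque


def ssr(perm, i, length):
    """Super short reversal of `length` (1 or 2) elements starting at index i (circular)."""
    n = len(perm)
    p = list(perm)
    idx = [(i + k) % n for k in range(length)]
    vals = [-p[j] for j in reversed(idx)]  # reverse the segment and flip signs
    for j, v in zip(idx, vals):
        p[j] = v
    return p


def sst(perm, i):
    """Super short transposition: swap elements i and i+1 (circular), signs unchanged."""
    n = len(perm)
    p = list(perm)
    j = (i + 1) % n
    p[i], p[j] = p[j], p[i]
    return p


def sso_distance(perm):
    """Brute-force minimum number of SSOs to reach any rotation of the identity."""
    n = len(perm)
    ident = list(range(1, n + 1))
    targets = {tuple(ident[k:] + ident[:k]) for k in range(n)}
    start = tuple(perm)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur in targets:
            return seen[cur]
        for i in range(n):
            for nxt in (ssr(cur, i, 1), ssr(cur, i, 2), sst(cur, i)):
                t = tuple(nxt)
                if t not in seen:
                    seen[t] = seen[cur] + 1
                    queue.append(t)
    return None


print(sso_distance([-2, -1, 3]))
```

Whether rotations (or reflections) of the identity should count as sorted depends on the exact model; this sketch fixes one convention for illustration only.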

Results and conclusions

All problems considering SSOs in permutations have been shown to be in \(\mathsf {P}\), except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.

9.
10.

Background

Chicken anemia virus (CAV) is the causative agent of chicken infectious anemia. Putative intergenotypic CAV recombinants have been reported previously, based on the earlier classification of CAV sequences into three genotypes. However, it is unknown whether intersubtype recombination occurs among genome sequences under the recently reported classification into four CAV genotypes and five subtypes.

Results

Phylogenetic analysis, together with a variety of computational recombination detection algorithms, was used to investigate approximately full-length CAV genomes. Statistically significant evidence of intersubtype recombination was detected in the parent-like sequences and two putative CAV recombinant sequences. In the phylogenetic trees, this event was shown to have occurred between CAV subgroup A1 and A2 sequences.

Conclusions

We revealed that intersubtype recombination in CAV genome sequences played a role in generating genetic diversity within the natural population of CAV.

11.

Background

The ability of bacteria to acquire resistance to antibiotics relies to a large extent on their capacity for genome modification. Prokaryotic genomes are highly plastic and can use horizontal gene transfer, point mutations, and gene deletions or amplifications to achieve genome expansion and rearrangements. The contribution of point mutations to de novo acquisition of antibiotic resistance is well established. In this study, the internal genome rearrangements of Escherichia coli during de novo acquisition of antibiotic resistance were investigated using whole-genome sequencing.

Results

Cells were made resistant to one of four antibiotics and subsequently to one of the remaining three. In this way, the initial genetic rearrangements could be documented together with the effects of an altered genetic background on the subsequent development of resistance. A DNA fragment including ampC was amplified by a factor sometimes exceeding 100 as a result of exposure to amoxicillin. Excision of prophage e14 was observed in many samples with a double exposure history, but not in cells exposed to a single antibiotic, indicating that activation of the SOS stress response alone, normally the trigger for excision, was not sufficient to cause excision of prophage e14. Partial deletion of clpS and clpA occurred in strains exposed to enrofloxacin and tetracycline. Other deletions were observed in some strains, but not in replicates with exactly the same exposure history. Various insertion sequence transpositions correlated with exposure to specific antibiotics.

Conclusions

Many of these genome rearrangements have not previously been reported to occur during resistance development. The observed correlation between genome rearrangements and specific antibiotic pressure, as well as their presence in independent replicates, indicates that these events do not occur randomly. Taken together, the observed genome rearrangements illustrate the plasticity of the E. coli genome when exposed to antibiotic stress.

12.

Background

The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.
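For reference, the classical (family-based) breakpoint distance that the family-free model generalizes counts the adjacencies of one genome that are missing from the other, and the breakpoint median score of a candidate genome is the sum of these distances to the three inputs. The sketch below assumes single linear chromosomes over the same signed gene set, ignores telomeres, and uses hypothetical names; it does not implement the family-free objective or the ILP.

```python
def adjacencies(genome):
    """Adjacencies of a signed linear chromosome, stored in a strand-independent form."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # Adjacency (a, b) read on the opposite strand is (-b, -a); keep one canonical tuple
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not occur in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def median_score(candidate, genomes):
    """Sum of breakpoint distances from a candidate median to the input genomes."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


print(median_score([1, 2, 3], [[1, 2, 3], [1, -2, 3], [2, 1, 3]]))
```

The breakpoint median problem asks for the candidate that minimizes this score; the family-free variant replaces exact gene identity with similarity-weighted adjacencies.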

Methods

We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of \(k \le 3\) and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice.

Results

The developed methods compute accurate positional orthologs for genomes comparable in size to bacterial genomes, on both simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs as well as or better than the well-established gene family prediction tool MultiMSOAR.

Conclusions

We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.

13.

Background

Miniature inverted-repeat transposable elements (MITEs) are class II non-autonomous transposable elements that play a crucial role in evolution. Bioinformatics tools that can effectively identify MITEs on a genome-wide scale are urgently needed. However, most existing tools have limited ability to handle large eukaryotic genomes.

Methods

In this paper, we propose a novel tool, MiteFinderII, adapted from our previous algorithm MiteFinder, to efficiently detect MITEs in genomic sequences. It has six major steps: (1) building a k-mer index and searching for inverted repeats; (2) filtering out inverted repeats with low complexity; (3) merging inverted repeats; (4) filtering out candidates with low scores; (5) selecting the final MITE sequences; (6) selecting representative sequences.
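Step (1) can be pictured with a small sketch: index every k-mer, then, for each k-mer, look up positions of its reverse complement a bounded distance downstream, which yields seed pairs for terminal inverted repeats. This is an assumption-laden illustration, not MiteFinderII's code: the function names and the `min_span`/`max_span` parameters are hypothetical, seeds must match exactly, and none of the later filtering, merging or scoring steps are modeled.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase ACGT string."""
    return s.translate(COMPLEMENT)[::-1]


def build_kmer_index(seq, k):
    """Map each k-mer to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def inverted_repeat_seeds(seq, k, min_span, max_span):
    """Pairs (i, j) where the reverse complement of the k-mer at i starts at j,
    with min_span <= j - i <= max_span (hypothetical seed criterion)."""
    seq = seq.upper()
    index = build_kmer_index(seq, k)
    seeds = []
    for i in range(len(seq) - k + 1):
        rc = revcomp(seq[i:i + k])
        for j in index.get(rc, []):
            if min_span <= j - i <= max_span:
                seeds.append((i, j))
    return seeds


example = "GGCTA" + "T" * 10 + "TAGCC"  # left arm, spacer, reverse-complemented right arm
print(inverted_repeat_seeds(example, 5, 10, 50))
```

A real detector would extend and score these seeds and tolerate mismatches; the sketch only shows how a k-mer index turns the inverted-repeat search into dictionary lookups.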

Results

To test its performance, MiteFinderII and three other existing algorithms were applied to identify MITEs in the whole genome of Oryza sativa. The results suggest that MiteFinderII outperforms existing popular tools in terms of both specificity and recall. Additionally, it is much faster and more memory-efficient than the other tools.

Conclusion

MiteFinderII is an accurate and effective tool to detect MITEs hidden in eukaryotic genomes. The source code is freely accessible at the website: https://github.com/screamer/miteFinder.

14.
15.

Background

Traditionally, the merit of a rearrangement scenario between two gene orders has been measured by a parsimony criterion alone: two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. the physical proximity of the DNA segments involved or repetitive sequences.

Results

We propose optimization problems with the objective of maximizing overall likelihood, by weighting the rearrangements. We study a binary weight function suitable to the representation of sets of genome positions that are most likely to have swapped adjacencies. We give a polynomial-time algorithm for the problem of finding a minimum weight double cut and join scenario among all minimum length scenarios. In the process we solve an optimization problem on colored noncrossing partitions, which is a generalization of the Maximum Independent Set problem on circle graphs.
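The "minimum length" of a double cut and join (DCJ) scenario is the DCJ distance. For genomes made only of circular chromosomes over the same set of signed genes, the standard formula is d = N - C, where N is the number of genes and C is the number of cycles in the adjacency graph. The sketch below computes that quantity under exactly those assumptions (no linear chromosomes, no duplicated or missing genes; names are hypothetical). It does not attempt the weighted optimization over minimum-length scenarios studied here.

```python
def extremities(x):
    """(left, right) extremities of a signed gene in reading order: tail 't', head 'h'."""
    g = abs(x)
    return ((g, "t"), (g, "h")) if x > 0 else ((g, "h"), (g, "t"))


def adjacency_map(genome):
    """Map each gene extremity to its adjacent extremity; genome = list of circular chromosomes."""
    partner = {}
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):  # wrap around: chromosome is circular
            right = extremities(a)[1]
            left = extremities(b)[0]
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance N - C for two all-circular genomes with identical gene content."""
    pa, pb = adjacency_map(genome_a), adjacency_map(genome_b)
    n = len(pa) // 2  # each gene contributes two extremities
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # walk the cycle, alternating A- and B-adjacencies
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return n - cycles


print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))
```

The example inverts a single gene, which one DCJ operation achieves; a circular fusion such as `[[1], [2]]` to `[[1, 2]]` likewise has distance 1.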

Conclusions

We introduce a model for weighting genome rearrangements and show that under simple yet reasonable conditions, a fundamental distance can be computed in polynomial time. This is achieved by solving a generalization of the Maximum Independent Set problem on circle graphs. Several variants of the problem are also mentioned.

16.
17.

Background

Genome sequences and genome annotation data have become available at ever-increasing rates in response to rapid progress in sequencing technologies. As a consequence, the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize -omics data simultaneously for multiple species are sorely lacking. A first and crucial step in this direction is the construction of a common coordinate system. Since genomes differ not only by rearrangements but also by large insertions, deletions, and duplications, the use of a single reference genome is insufficient, in particular when the number of species becomes large.

Results

The computational problem then becomes that of determining an order and orientation of optimal local alignments that are as collinear as possible with all the genome sequences. We first review the most prominent approaches to modeling the problem formally and then show that it can be phrased as a particular variant of the Betweenness Problem, which is NP-hard in general. As exact solutions are beyond reach for problem sizes of practical interest, we introduce a collection of heuristic simplifiers to resolve ordering conflicts.
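In the classic Betweenness Problem, each constraint is a triple (a, b, c) asking that b lie between a and c in a linear order, and the goal is an order satisfying as many triples as possible. The sketch below shows that objective and an exhaustive search over all orders, which illustrates why exact solutions do not scale (n! candidates). It is not the paper's variant or its heuristic simplifiers; the item labels stand in for hypothetical local alignments.

```python
from itertools import permutations


def satisfied(order, triples):
    """Count triples (a, b, c) with b strictly between a and c in `order`."""
    pos = {x: i for i, x in enumerate(order)}
    return sum(
        1
        for a, b, c in triples
        if pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
    )


def best_order_bruteforce(items, triples):
    """Exhaustive search over all orders; feasible only for a handful of items."""
    return max(permutations(items), key=lambda order: satisfied(order, triples))


triples = [("A", "B", "C"), ("B", "C", "D")]
best = best_order_bruteforce("ABCD", triples)
print(best, satisfied(best, triples))
```

Practical methods replace the exhaustive search with heuristics that resolve conflicting triples.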

Conclusion

Benchmarks on real-life data ranging from bacterial to fly genomes demonstrate the feasibility of computing good common coordinate systems.

18.

Background

For most sequenced prokaryotic genomes, about a third of the protein coding genes annotated are "orphan proteins", that is, they lack homology to known proteins. These hypothetical genes are typically short and randomly scattered throughout the genome. This trend is seen for most of the bacterial and archaeal genomes published to date.

Results

In contrast we have found that a large fraction of the genes coding for such orphan proteins in the Methanopyrus kandleri AV19 genome occur within two large regions. These genes have no known homologs except from other M. kandleri genes. However, analysis of their lengths, codon usage, and Ribosomal Binding Site (RBS) sequences shows that they are most likely true protein coding genes and not random open reading frames.

Conclusions

Although these regions can be considered as candidates for massive lateral gene transfer, our bioinformatics analysis suggests that this is not the case. We predict many of the organism specific proteins to be transmembrane and belong to protein families that are non-randomly distributed between the regions. Consistent with this, we suggest that the two regions are most likely unrelated, and that they may be integrated plasmids.

19.

Background

With the advances in the next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomy assignments in metagenomic samples, we developed a method that allows multiple mismatch probabilities from different genomes.

Results

We extended the algorithm of taxonomic assignment of metagenomic sequence reads (TAMER) by developing an improved method that can set a different mismatch probability for each genome, rather than imposing a single parameter for all genomes, thereby achieving greater accuracy. This method, which we call TADIP (Taxonomic Assignment of metagenomics based on DIfferent Probabilities), was comprehensively tested on simulated and real datasets. The results show that TADIP improved on the performance of TAMER, especially for high-complexity datasets with large sample sizes.
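Why a per-genome mismatch probability can change an assignment is easiest to see with a toy likelihood. The sketch assumes, for illustration only, that mismatches along a read of length L are independent with a genome-specific rate p, so a hit with m mismatches has log-likelihood m·log p + (L - m)·log(1 - p), and the read goes to the best-scoring genome. This is not TADIP's or TAMER's statistical model (no abundance estimation or variance matrix is modeled); the function names, the default rate and the example rates are hypothetical.

```python
import math


def log_likelihood(read_len, mismatches, p):
    """Log-probability of `mismatches` mismatched bases, assuming independent errors at rate p."""
    return mismatches * math.log(p) + (read_len - mismatches) * math.log(1 - p)


def assign(read_len, hits, mismatch_prob, default_p=0.01):
    """Pick the genome with the highest likelihood.

    hits: {genome: number of mismatches of the read's best alignment to it}
    mismatch_prob: per-genome mismatch rates (hypothetical values); others use default_p
    """
    scores = {
        g: log_likelihood(read_len, m, mismatch_prob.get(g, default_p))
        for g, m in hits.items()
    }
    return max(scores, key=scores.get)


hits = {"A": 2, "B": 3}
print(assign(100, hits, {}))                        # one shared rate: fewer mismatches wins
print(assign(100, hits, {"A": 0.001, "B": 0.03}))   # genome-specific rates
```

With one shared rate the read goes to genome A, but if genome B is known to tolerate more divergence (higher p), three mismatches against B become more plausible than two against A.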

Conclusions

TADIP was developed as a statistical model to improve the accuracy of taxonomy assignments. Owing to its per-genome mismatch probability setting and its correlated variance matrix setting, it performed better than TAMER on high-complexity samples.

20.

Purpose of Review

Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens.

Recent Findings

Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host.

Summary

Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.
