Similar documents
 20 similar documents found (search time: 390 ms)
1.
When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.
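To make the average-error criterion concrete, here is a minimal sketch under stated assumptions: a linear gap penalty, complete (not partial) example alignments, match score fixed at 1, and a brute-force grid search standing in for whatever optimizer the authors actually use. For each example, the error is the optimal Needleman-Wunsch score of its unaligned sequences minus the score of the example alignment itself; the function names are hypothetical.

```python
# Sketch only: linear gap costs, complete examples, and grid search are assumptions.

def example_score(a: str, b: str, match: float, mismatch: float, gap: float) -> float:
    """Score a given alignment: rows a and b have equal length, '-' marks a gap."""
    assert len(a) == len(b)
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total -= gap
        elif x == y:
            total += match
        else:
            total -= mismatch
    return total


def optimal_score(s: str, t: str, match: float, mismatch: float, gap: float) -> float:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [-gap * j for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [-gap * i] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else -mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] - gap, cur[j - 1] - gap)
        prev = cur
    return prev[-1]


def average_error(examples, match, mismatch, gap):
    """Mean shortfall of each example's score below the optimal score (never negative)."""
    errors = []
    for a, b in examples:
        s, t = a.replace("-", ""), b.replace("-", "")
        errors.append(optimal_score(s, t, match, mismatch, gap)
                      - example_score(a, b, match, mismatch, gap))
    return sum(errors) / len(errors)


def grid_search(examples, grid):
    """Return the (mismatch, gap) pair in grid with the smallest average error; match is fixed at 1."""
    return min(grid, key=lambda p: average_error(examples, 1, p[0], p[1]))
```

For example, `average_error([("AC", "CA")], 1, 1, 1)` is 1 because the gapped alignment `-AC`/`CA-` scores better than the given ungapped one, while a zero-cost mismatch setting makes the given alignment optimal.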

2.
3.
Comparisons of gene orders between species permit estimation of the rate of chromosomal evolution since their divergence from a common ancestor. We have compared gene orders on three chromosomes of Drosophila pseudoobscura with its close relative, D. miranda, and the distant outgroup species, D. melanogaster, by using the public genome sequences of D. pseudoobscura and D. melanogaster and approximately 50 in situ hybridizations of gene probes in D. miranda. We find no evidence for extensive transfer of genes among chromosomes in D. miranda. The rates of chromosomal rearrangements between D. miranda and D. pseudoobscura are far higher than those found before in Drosophila and approach those for nematodes, the fastest rates among higher eukaryotes. In addition, we find that the D. pseudoobscura chromosome with the highest level of inversion polymorphism (Muller's element C) does not show an unusually fast rate of evolution with respect to chromosome structure, suggesting that this classic case of inversion polymorphism reflects selection rather than mutational processes. On the basis of our results, we propose possible ancestral arrangements for the D. pseudoobscura C chromosome, which are different from those in the current literature. We also describe a new method for correcting for rearrangements that are not detected with a limited set of markers.
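Gene-order comparisons like these are often summarized by counting breakpoints. The sketch below is an illustrative stand-in, not the correction method described in the abstract: it assumes unsigned marker orders (no strand information), compares only markers shared by both species, and uses the fact that one inversion alters at most two adjacencies to give a lower bound on the number of inversions. Function names are hypothetical.

```python
# Sketch only: unsigned orders and shared-marker filtering are assumptions.

def breakpoints(order_a, order_b):
    """Count adjacencies in order_a that are not adjacencies in order_b.

    Only markers present in both orders are compared, and orientation is
    ignored, so a pair counts as adjacent whichever way round it appears.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    adjacent_in_b = {frozenset(pair) for pair in zip(b, b[1:])}
    return sum(1 for pair in zip(a, a[1:]) if frozenset(pair) not in adjacent_in_b)


def min_inversions(order_a, order_b):
    """Lower bound on inversions: each inversion changes at most two adjacencies."""
    return (breakpoints(order_a, order_b) + 1) // 2
```

With a limited marker set, rearrangements falling between adjacent markers are invisible to this count, which is exactly the undercounting that the abstract's correction method addresses.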

4.
5.

Background

Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.

Methodology/Principal Findings

We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much of this variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence.

Conclusions

The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.
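The core- and pan-genome figures above follow from set operations over aligned segments: the core is what every genome shares, the pan-genome is everything seen in any genome. The sketch below is a simplification under stated assumptions: each aligned segment is reduced to one hypothetical ID with a single representative length in bp, whereas real segment lengths differ between genomes.

```python
# Sketch only: one representative length per segment is an assumption.

def core_and_pan(segment_lengths, presence):
    """Return (core_bp, pan_bp).

    segment_lengths: dict mapping segment ID -> length in bp.
    presence: dict mapping genome name -> set of segment IDs found in that genome.
    """
    sets = list(presence.values())
    core = set.intersection(*sets)  # segments conserved in every genome
    pan = set.union(*sets)          # segments present in at least one genome
    return (sum(segment_lengths[s] for s in core),
            sum(segment_lengths[s] for s in pan))
```

For three genomes sharing only segment `A` (100 bp), with `B` (50 bp) and `C` (30 bp) each missing from one genome, this gives a 100 bp core and a 180 bp pan-genome.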

6.
Two Drosophila pseudoobscura genomic clones have sequence similarity to the Drosophila melanogaster amylase region that maps to the 53CD region on the D. melanogaster cytogenetic map. The two clones with similarity to amylase map to sections 73A and 78C of the D. pseudoobscura third chromosome cytogenetic map. The complete sequences of both the 73A and 78C regions were compared to the D. melanogaster genome to determine if the coding region for amylase is present in both regions and to determine the evolutionary mechanism responsible for the observed distribution of the amylase gene or genes. The D. pseudoobscura 73A and 78C linkage groups are conserved with the D. melanogaster 41E and 53CD regions, respectively. The amylase gene, however, has not maintained its conserved linkage between the two species. These data indicate that amylase has moved via a transposition event in the D. melanogaster or D. pseudoobscura lineage. The predicted genes within the 73A and 78C regions show patterns of molecular evolution in synonymous and nonsynonymous sites that are consistent with previous studies of these two species.

7.
Substantial insights into basic strategies for embryonic body patterning have been obtained from genetic analyses of Drosophila melanogaster. This knowledge has been used in evolutionary comparisons to ask if genes and functions are conserved. To begin to ask how highly conserved are the mechanisms of mRNA localization, a process crucial to Drosophila body patterning, we have focused on the localization of bcd mRNA to the anterior pole of the embryo. Here we consider two components involved in that process: the exuperantia (exu) gene, required for an early step in localization; and the cis-acting signal that directs bcd mRNA localization. First, we use the cloned D. melanogaster exu gene to identify the exu genes from Drosophila virilis and Drosophila pseudoobscura and to isolate them for comparisons at the structural and functional levels. Surprisingly, D. pseudoobscura has two closely related exu genes, while D. melanogaster and D. virilis have only one each. When expressed in D. melanogaster ovaries, the D. virilis exu gene and one of the D. pseudoobscura exu genes can substitute for the endogenous exu gene in supporting localization of bcd mRNA, demonstrating that function is conserved. Second, we reevaluate the ability of the D. pseudoobscura bcd mRNA localization signal to function in D. melanogaster. In contrast to a previous report, we find that function is retained. Thus, among these Drosophila species there is substantial conservation of components acting in mRNA localization, and presumably the mechanisms underlying this process.

8.
9.
Recently developed algorithms for parametric alignment (Waterman et al., 1992, Proc. Natl Acad. Sci. USA 89, 6090–6093; Gusfield et al., 1992, Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms) find optimal scores for all penalty parameters, for both global and local sequence alignment. This paper reviews those techniques. The main part of the paper then uses dynamic programming to compute ensemble alignments, finding all alignment scores for all parameter values. Both global and local ensemble alignments are studied, and parametric alignment is used to compute near-optimal ensemble alignments.
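One way to see how a single dynamic program can yield "all alignment scores for all parameters" is to count alignments by their feature vector rather than by a score. The sketch below assumes a simple linear scoring model in which an alignment's score is matches × match + mismatches × mismatch + gap columns × gap (all signed additive values); each DP cell holds a counter over (matches, mismatches, gap columns) triples, so the score under any parameter setting can be read off afterwards. It illustrates the idea only and is not the paper's algorithm; the function names are hypothetical.

```python
from collections import Counter

# Sketch only: linear scoring over (matches, mismatches, gap columns) is an assumption.

def feature_spectrum(s: str, t: str) -> Counter:
    """Count every global alignment of s and t by its (matches, mismatches, gap columns) triple."""
    n, m = len(s), len(t)
    dp = [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]
    dp[0][0][(0, 0, 0)] = 1  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            cell = dp[i][j]
            if i > 0 and j > 0:  # align s[i-1] with t[j-1]
                same = s[i - 1] == t[j - 1]
                for (ma, mm, g), c in dp[i - 1][j - 1].items():
                    cell[(ma + 1, mm, g) if same else (ma, mm + 1, g)] += c
            if i > 0:  # s[i-1] against a gap
                for (ma, mm, g), c in dp[i - 1][j].items():
                    cell[(ma, mm, g + 1)] += c
            if j > 0:  # t[j-1] against a gap
                for (ma, mm, g), c in dp[i][j - 1].items():
                    cell[(ma, mm, g + 1)] += c
    return dp[n][m]


def best_score(spectrum, match, mismatch, gap):
    """Optimal score for one parameter setting, read off the precomputed spectrum."""
    return max(ma * match + mm * mismatch + g * gap for ma, mm, g in spectrum)
```

The total count over all triples is the number of global alignments (a Delannoy number, e.g. 13 for two length-2 sequences), and `best_score` reproduces the Needleman-Wunsch optimum for each parameter choice without rerunning the DP.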

10.
11.
12.
MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to reporting false homology, as each letter of one sequence is constrained to align to at most one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in the sequences; this, however, comes at the expense of a higher false-positive rate, because local aligners cannot take overall conservation maps into account. RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.
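The contrast between global and local alignment drawn above comes down to two details of the dynamic program: whether leading gaps are charged, and whether a partial score may reset to zero so that an alignment can start and end anywhere. The sketch below shows both modes with a linear gap penalty; the scoring values are arbitrary assumptions, and this is plain Needleman-Wunsch/Smith-Waterman, not the CHAOS, LAGAN, or Shuffle-LAGAN algorithms.

```python
# Sketch only: scoring values are assumptions; not the Shuffle-LAGAN method.

def align_score(s: str, t: str, local: bool, match=2, mismatch=-1, gap=-2) -> int:
    """Global (Needleman-Wunsch) or local (Smith-Waterman) alignment score."""
    n, m = len(s), len(t)
    # Global alignment charges leading gaps; local alignment may start anywhere.
    prev = [0 if local else gap * j for j in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        cur = [0 if local else gap * i] + [0] * m
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            v = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            if local:
                v = max(v, 0)        # reset: start a fresh local alignment here
                best = max(best, v)  # the best cell anywhere ends the local alignment
            cur[j] = v
        prev = cur
    return best if local else prev[m]
```

On `"XXXACGTYYY"` versus `"ZZACGTZZ"`, the local score isolates the shared `ACGT` (4 matches, score 8), while the global score must also pay for the unrelated flanks.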

13.
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance.
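The distinction between codon-sized indels and frameshifts rests on indel length modulo 3. The sketch below applies that rule to an alignment's SAM-style CIGAR string; it is a post-hoc illustration under that assumption, not how RAMICS works (RAMICS uses profile hidden Markov models during alignment), and the function name is hypothetical.

```python
import re

# Sketch only: a post-hoc length-mod-3 rule on CIGAR strings, not the RAMICS method.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_indels(cigar: str):
    """Return (op, length, kind) for each insertion (I) or deletion (D) in a CIGAR string.

    kind is 'in-frame' when the length is a multiple of 3 (whole codons),
    otherwise 'frameshift'.
    """
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        if op in "ID":
            n = int(length)
            out.append((op, n, "in-frame" if n % 3 == 0 else "frameshift"))
    return out
```

For `"50M3D20M1I10M"` this reports one in-frame 3 bp deletion and one 1 bp frameshifting insertion. In practice a 1 bp indel inside a homopolymer is often a sequencing error rather than a true frameshift, which is why the abstract stresses modeling machine-specific error biases.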

14.
Oxidative phosphorylation (OXPHOS) is the primary energy-producing process of all aerobic organisms and the only cellular function under the dual control of both the mitochondrial and the nuclear genomes. Functional characterization and evolutionary study of the OXPHOS system are of great importance for understanding many as-yet unclear aspects of nucleus-mitochondrion genomic co-evolution and co-regulatory gene networks. MitoDrome is a web-based database that provides genomic annotations of Drosophila melanogaster nuclear genes encoding mitochondrial proteins. MitoDrome has recently added a new section annotating genomic information about OXPHOS genes in Drosophila pseudoobscura and Anopheles gambiae, together with their comparative analysis against their Drosophila melanogaster and human counterparts. This new comparative annotation section is expected to be a useful resource for both functional and structural genomics of the OXPHOS system.

15.
16.
Gene identification in novel eukaryotic genomes by self-training algorithm (total citations: 8; self-citations: 0; citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for effective gene hunting in the DNA sequence of a new genome. Gene identification methods based on mapping cDNA and expressed sequence tags (ESTs) to genomic DNA, or on alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes to estimate gene model parameters. In practice, neither type of data may be available in sufficient amounts until fairly late stages of sequencing a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with estimation of the statistical model directly from still-anonymous genomic DNA. The proposed method, which interleaves gene prediction with estimation of the model parameters, follows the iterative Viterbi training scheme: rounds of labeling the genomic sequence into coding and non-coding regions alternate with rounds of model parameter estimation. Several dynamically changing restrictions on the permissible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed, and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
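The alternation between labeling and re-estimation described above can be illustrated with a toy model. The sketch below assumes a two-state hidden Markov model (0 = non-coding, 1 = coding) that emits single nucleotides, which is far simpler than a real eukaryotic gene model with codon periodicity, introns, and splice sites. Pseudocounts stand in, loosely, for the abstract's parameter restrictions. All function names are hypothetical.

```python
import math

# Toy sketch: a 2-state, single-nucleotide HMM is an assumption, not a real gene model.

def viterbi(seq, trans, emit, start):
    """Most probable state path for a non-empty sequence, computed in log space."""
    n_states = len(start)
    v = [[math.log(start[k]) + math.log(emit[k][seq[0]]) for k in range(n_states)]]
    back = []
    for ch in seq[1:]:
        row, ptr = [], []
        for k in range(n_states):
            p = max(range(n_states), key=lambda q: v[-1][q] + math.log(trans[q][k]))
            row.append(v[-1][p] + math.log(trans[p][k]) + math.log(emit[k][ch]))
            ptr.append(p)
        v.append(row)
        back.append(ptr)
    state = max(range(n_states), key=lambda k: v[-1][k])
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


def reestimate(seq, path, alphabet="ACGT", pseudo=1.0):
    """Re-estimate transition and emission probabilities from a labeled sequence."""
    t_counts = [[pseudo, pseudo], [pseudo, pseudo]]
    e_counts = [{a: pseudo for a in alphabet} for _ in range(2)]
    for a, b in zip(path, path[1:]):
        t_counts[a][b] += 1
    for state, ch in zip(path, seq):
        e_counts[state][ch] += 1
    trans = [[c / sum(row) for c in row] for row in t_counts]
    emit = [{a: c / sum(row.values()) for a, c in row.items()} for row in e_counts]
    return trans, emit


def self_train(seq, trans, emit, start=(0.5, 0.5), rounds=10):
    """Alternate Viterbi labeling and parameter re-estimation until the labels stop changing."""
    path = viterbi(seq, trans, emit, start)
    for _ in range(rounds):
        trans, emit = reestimate(seq, path)
        new_path = viterbi(seq, trans, emit, start)
        if new_path == path:
            break
        path = new_path
    return path, trans, emit
```

Starting from weak initial parameters (state 1 slightly favoring G/C), a sequence of 40 AT-rich bases followed by 40 GC-rich bases is split cleanly at position 40, and re-estimation sharpens the emission probabilities without any pre-labeled training set.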

17.
SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.

18.
Mai Huijun, Lam Tak-Wah, Ting Hing-Fung. BMC Genomics, 2017, 18(4): 362-5

Background

Recent advances in whole-genome alignment software have made it possible to align two genomes very efficiently with only a small sacrifice in sensitivity. However, these tools become very slow when extra sensitivity is needed. This paper proposes a simple but effective method to improve the sensitivity of existing whole-genome alignment software without much extra running time.

Results and conclusions

We applied our method to LAST, a popular whole-genome alignment tool, and call the resulting tool LASTM. Experimental results showed that LASTM finds more high-quality alignments with a little extra running time. For example, when comparing the human and mouse genomes, LASTM was about three times faster than LAST at producing a similar number of alignments with similar average length and similarity. We conclude that our method improves sensitivity at a small extra time cost, and is therefore worth implementing in existing tools.

19.
The sequencing of the genomes of 12 Drosophila species has created an opportunity for extensive comparative molecular analyses among these species. To aid that endeavor, we have made several transformation vectors based on the piggyBac transposon with 3xP3-EGFP and -ECFP transgenic markers that should be useful for mutagenesis and establishing the GAL4/UAS system in these species. We have tested the ability of mini-white to be used as a marker for insertional mutagenesis, and have observed mini-white-derived pigmentation of the testes sheath in a subset of lines from D. pseudoobscura and D. virilis. We have incorporated a source of piggyBac transposase into nine Drosophila species, and have demonstrated the functionality of these transposase lines for mobilization of marked inserts in vivo. Additionally, we tested the ability of a D. melanogaster nanos enhancer element to drive expression of GAL4 in D. melanogaster, D. simulans, D. erecta, D. yakuba, D. pseudoobscura, and D. virilis. The efficacy of the nos-Gal4 transgene was determined by measuring the response of UAS-EGFPtub in all six species. Our results show that D. melanogaster nos-Gal4 drives expression in other species, to varying degrees, in spatiotemporal domains in the ovaries, testes, and embryos similar to those seen in D. melanogaster. However, expression levels are variable, demonstrating the possible need to use species-specific promoters in some cases. In summary, we hope to provide a set of guidelines and basic tools, based upon this work, for both insertional mutagenesis and GAL4/UAS system-based experiments in multiple species of Drosophila.

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417