Similar articles
 Found 20 similar articles; search took 31 ms
1.
MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps. RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.
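
The contrast between global and local alignment drawn in this abstract can be illustrated with a minimal scoring sketch. It is not Shuffle-LAGAN, CHAOS, or LAGAN; it is the textbook Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear gap penalty. The function name `align_score` and the scoring values are assumptions chosen for illustration.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of strings a and b.

    local=False uses the Needleman-Wunsch (global) recurrence;
    local=True uses the Smith-Waterman (local) recurrence.
    Scoring values are illustrative assumptions, not taken from the abstract.
    """
    # prev[j] is the best score for the previous row of the DP table.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0  # best local score seen so far (only used when local=True)
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: every letter must be accounted for, so read the last cell.
    return best if local else prev[-1]
```

With two sequences that share only a short core, the local score stays positive while the global score is dragged down by the unrelated flanks, which is the trade-off the abstract describes.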

2.
Sequence conservation between species is useful both for locating coding regions of genes and for identifying functional noncoding segments. Hence interspecies alignment of genomic sequences is an important computational technique. However, its utility is limited without extensive annotation. We describe a suite of software tools, PipTools, and related programs that facilitate the annotation of genes and putative regulatory elements in pairwise alignments. The alignment server PipMaker uses the output of these tools to display detailed information needed to interpret alignments. These programs are provided in a portable format for use on common desktop computers and both the toolkit and the PipMaker server can be found at our Web site (http://bio.cse.psu.edu/). We illustrate the utility of the toolkit using annotation of a pairwise comparison of the mouse MHC class II and class III regions with orthologous human sequences and subsequently identify conserved, noncoding sequences that are DNase I hypersensitive sites in chromatin of mouse cells.

3.
Chromosome 7q22 has been the focus of many cytogenetic and molecular studies aimed at delineating regions commonly deleted in myeloid leukemias and myelodysplastic syndromes. We have compared a gene-dense, GC-rich sub-region of 7q22 with the orthologous region on mouse chromosome 5. A physical map of 640 kb of genomic DNA from mouse chromosome 5 was derived from a series of overlapping bacterial artificial chromosomes. A 296 kb segment from the physical map, spanning ACHE to Tfr2, was compared with 267 kb of human sequence. We identified a conserved linkage of 12 genes including an open reading frame flanked by ACHE and Asr2, a novel cation-chloride cotransporter interacting protein Cip1, Ephb4, Zan and Perq1. While some of these genes have been previously described, in each case we present new data derived from our comparative sequence analysis. Adjacent unfinished sequence data from the mouse contains an orthologous block of 10 additional genes including three novel cDNA sequences that we subsequently mapped to human 7q22. Methods for displaying comparative genomic information, including unfinished sequence data, are becoming increasingly important. We supplement our printed comparative analysis with a new, Web-based program called Laj (local alignments with java). Laj provides interactive access to archived pairwise sequence alignments via the WWW. It displays synchronized views of a dot-plot, a percent identity plot, a nucleotide-level local alignment and a variety of relevant annotations. Our mouse-human comparison can be viewed at http://web.uvic.ca/~bioweb/laj.html. Laj is available at http://bio.cse.psu.edu/, along with online documentation and additional examples of annotated genomic regions.
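
The percent identity plot mentioned in this abstract summarizes each aligned segment by the fraction of matching columns. A minimal sketch of that per-segment quantity follows. It is not code from Laj or PipMaker; the function name `percent_identity` and the choice to skip gap columns are assumptions, since definitions of percent identity vary.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned strings of equal length.

    Columns containing a gap character '-' are skipped; this is one common
    convention and an assumption here, not something stated in the abstract.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

Plotting this value against the position of each segment in the first sequence gives the general shape of such a plot.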

4.
5.
Non-coding DNA segments that are conserved between the human and mouse genomic sequence are good indicators of possible regulatory sequences. Here we report on a systematic approach to delineate such conserved elements from upstream regions of orthologous gene pairs from man and mouse. We focus on orthologous genes in order to maximize our chances to find functionally similar regulatory elements. The identification of conserved elements is effected using the Waterman-Eggert local suboptimal alignment algorithm. We have modified an implementation of this algorithm such that it integrates the determination of statistical significance for the local suboptimal alignments. This has the effect of outputting a dynamically determined number of suboptimal alignments that are deemed statistically significant. Comparison with experimentally determined annotation shows a striking enrichment of regulatory sites among the conserved regions. Furthermore, the conserved regions tend to cover the promoter region described in the EPD database.

6.
We describe a multiple alignment program named MAP2 based on a generalized pairwise global alignment algorithm for handling long, different intergenic and intragenic regions in genomic sequences. The MAP2 program produces an ordered list of local multiple alignments of similar regions among sequences, where different regions between local alignments are indicated by reporting only similar regions. We propose two similarity measures for the evaluation of the performance of MAP2 and existing multiple alignment programs. Experimental results produced by MAP2 on four real sets of orthologous genomic sequences show that MAP2 rarely missed a block of transitively similar regions and that MAP2 never produced a block of regions that are not transitively similar. Experimental results by MAP2 on six simulated data sets show that MAP2 found the boundaries between similar and different regions precisely. This feature is useful for finding conserved functional elements in genomic sequences. The MAP2 program is freely available in source code form at http://bioinformatics.iastate.edu/aat/sas.html for academic use.

7.
SUMMARY: In the segment-by-segment approach to sequence alignment, pairwise and multiple alignments are generated by comparing gap-free segments of the sequences under study. This method is particularly efficient in detecting local homologies, and it has been used to identify functional regions in large genomic sequences. Herein, an algorithm is outlined that calculates optimal pairwise segment-by-segment alignments in essentially linear space. AVAILABILITY: The program is available at the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/

8.
9.
10.
Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results are available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes are available in BED format.
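
For orientation, the standard Akaike Information Criterion named in this abstract is usually written as below, where k is the number of free parameters of the model and \hat{L} is its maximized likelihood. The abstract states only that the authors' criterion is similar to AIC, so this is the textbook form and not necessarily the exact criterion they use.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Among candidate models, here models with different numbers of conservation classes, the one with the lowest value is preferred: adding a class must improve the fit enough to offset the penalty on its extra parameters.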

11.
The determination of long segments of DNA sequences encompassing the beta- and alpha-globin gene clusters has provided an unprecedented data base for analysis of genome evolution and regulation of gene clusters. A newly developed computer tool kit generates local alignments between such long sequences in a space-efficient manner, helps the user analyze the alignments effectively, and finds consistently aligning blocks of sequences in multiple pairwise comparisons. Such sequence analyses among the beta-like globin gene clusters of human, galago, rabbit, and mouse have revealed the general patterns of evolution of this gene cluster. Alignments in the flanking regions are very useful in assigning orthologous relationships. Investigation of such matches between the mouse and human beta-like globin gene clusters has led to a reassessment of some orthologous assignments in mouse and to a revision of the proposed pathway for evolution of this gene cluster. In general, the interspersed repetitive elements have inserted independently, presumably via a retrotransposition mechanism, in the different mammalian lineages. However, some examples of ancient L1 repeats are found, including one between the epsilon- and gamma-globin genes that appears to have been in the ancestral eutherian gene cluster. Prominent matching sequences are found in a long region 5' to the epsilon-globin gene, the locus control region (LCR) that is a positive regulator of the entire gene cluster. Three-way alignments among the human, goat, and rabbit sequences can extend for ≥ 3 kb in part of the LCR (DNase hypersensitive site 3), indicating that the cis-acting components of this complex regulatory region cover a long segment of DNA. In contrast to the beta-like globin gene clusters, the alpha-like globin gene clusters of many mammals occur in very G+C-rich isochores and contain prominent CpG islands. The regions between the alpha-like globin genes are evolving faster than the intergenic regions of the beta-like globin gene clusters. The contrasts between the two gene clusters can be attributed to differences in DNA metabolism in the isochore. The proximal control elements of the rabbit alpha-globin gene are located both 5' to and within the gene. All of this region is part of a prominent CpG island that may be acting as an extended, enhancer-independent promoter. One can hypothesize that the analogue to the LCR in the alpha-globin gene cluster may interface with the distinctive alpha-globin promoter in ways different from the interaction between the beta LCR and the promoters of beta-like globin genes. (ABSTRACT TRUNCATED AT 400 WORDS)

12.
Producing complete and accurate alignments of multiple genomic sequences is complex and prone to errors, especially with sequences generated from highly diverged species. In this article, we show that multi-sequence (as opposed to pair-wise) alignment methods are substantially better at aligning (or 'capturing') all of the available orthologous sequence from phylogenetically diverse vertebrates (i.e. those separated by relatively long branch lengths). Maximum gains are obtained only when sequences from many species are aligned. Such multi-sequence alignments contain significant amounts of exonic and highly conserved non-exonic sequences that are not captured in pair-wise alignments, thus illustrating the importance of the alignment method used for performing comparative genome analyses.

13.
A flexible multiple sequence alignment program (cited 15 times in total: 3 self-citations, 12 citations by others)
The 'regions' method for multisequence alignment used in the previously reported program MALIGN has been generalized to include recursive refinement so that unaligned portions between two regions at the current level of resolution can be handled with increased resolution. In addition, the number of regions used at any level of resolution to abstract an alignment is now limited. This provides a significant increase in speed over the unlimited version. The program GENALIGN uses this improved regions method to execute fast pairwise alignments in the framework of Taylor's multisequence alignment procedure using clustered pairwise alignments. Pairwise alignments by dynamic programming are also provided in the program.

14.
Genes that show complex tissue-specific and temporal control by regulatory elements located outside their promoters present a considerable challenge to identify the sequences involved. The rapid accumulation of genomic sequence information for a number of species has enabled a comparative phylogenetic approach to find important regulatory elements. For some genes, which show a similar pattern of expression in humans and rodents, genomic sequence information for these two species may be sufficient. Others, such as the cystic fibrosis transmembrane conductance regulator (CFTR) gene, show significant divergence in expression patterns between mouse and human, necessitating phylogenetic approaches involving additional species. The ovine CFTR gene has a temporal and spatial expression pattern that is very similar to that of human CFTR. Comparative genomic sequence analysis of ovine and human CFTR identified high levels of homology between the core elements in several potential regulatory elements defined as DNase I hypersensitive sites in human CFTR. These data provide a case for the power of an artiodactyl genome to contribute to the understanding of human genetic disease.

15.
Human chromosome 5q11.2-q13.3 and its ortholog on mouse chromosome 13 contain candidate genes for an inherited human neurodegenerative disorder called spinal muscular atrophy (SMA) and for an inherited mouse susceptibility to infection with Legionella pneumophila (Lgn1). These homologous genomic regions also have unusual repetitive organizations that create practical difficulties in mapping and raise interesting issues about the evolutionary origin of the repeats. In an attempt to analyze this region in detail, and as a way to identify additional candidate genes for these diseases, we have determined the sequence of 179 kb of the mouse Lgn1/SMA interval. We have analyzed this sequence using BLAST searches and various exon prediction programs to identify potential genes. Since these methods can generate false-positive exon declarations, our alignments of the mouse sequence with available human orthologous sequence allowed us to discriminate rapidly among this collection of potential coding regions by indicating which regions were well conserved and were more likely to represent actual coding sequence. As a result of our analysis, we accurately mapped two additional genes in the SMA interval that can be tested for involvement in the pathogenesis of SMA. While no new Lgn1 candidates emerged, we have identified new genetic markers that exclude Smn as an Lgn1 candidate. In addition to providing important resources for studying SMA and Lgn1, our data provide further evidence of the value of sequencing the mouse genome as a means to help with the annotation of the human genomic sequence and vice versa.

16.
17.
MOTIVATION: Synteny mapping, or detecting regions that are orthologous between two genomes, is a key step in studies of comparative genomics. For completely sequenced genomes, this is increasingly accomplished by whole-genome sequence alignment. However, such methods are computationally expensive, especially for large genomes, and require rather complicated post-processing procedures to filter out non-orthologous sequence matches. RESULTS: We have developed a novel method that does not require sequence alignment for synteny mapping of two large genomes, such as the human and mouse. In this method, the occurrence spectra of genome-wide unique 16mer sequences present in both the human and mouse genome are used to directly detect orthologous genomic segments. Being sequence alignment-free, the method is very fast and able to map the two mammalian genomes in one day of computing time on a single Pentium IV personal computer. The resulting human-mouse synteny map was shown to be in excellent agreement with those produced by the Mouse Genome Sequencing Consortium (MGSC) and by the Ensembl team; furthermore, the syntenic relationship of segments found only by our method was supported by BLASTZ sequence alignment.
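
The core idea in this abstract, using k-mers that are unique within each genome and present in both as alignment-free markers, can be sketched in a few lines. This is a simplified illustration, not the authors' method: it ignores reverse complements, holds everything in memory, and does not model their "occurrence spectra". The function names `unique_kmers` and `shared_unique_anchors` are hypothetical.

```python
from collections import Counter

def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    starts = range(len(seq) - k + 1)
    counts = Counter(seq[i:i + k] for i in starts)
    return {seq[i:i + k]: i for i in starts if counts[seq[i:i + k]] == 1}

def shared_unique_anchors(seq_a, seq_b, k=16):
    """Return sorted (pos_in_a, pos_in_b) pairs for k-mers unique in both sequences.

    k=16 mirrors the 16mers mentioned in the abstract; everything else
    here is an illustrative assumption.
    """
    unique_a = unique_kmers(seq_a, k)
    unique_b = unique_kmers(seq_b, k)
    shared = unique_a.keys() & unique_b.keys()
    return sorted((unique_a[kmer], unique_b[kmer]) for kmer in shared)
```

Runs of anchors whose positions increase together in both sequences would then suggest candidate syntenic segments; how such runs are detected and filtered is not described in the abstract and is left out here.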

18.
19.
20.