Similar Articles
20 similar articles found.
1.
GMAP: a genomic mapping and alignment program for mRNA and EST sequences (total citations: 13; self-citations: 0; citations by others: 13)
MOTIVATION: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich DP for splice site detection, and microexon identification with statistical significance testing. RESULTS: On a set of human messenger RNAs with random mutations at rates of 1% and 3%, GMAP identified all splice sites accurately in over 99.3% of the sequences, which was one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than blat did. On a set of Arabidopsis cDNAs, GMAP performed comparably with GeneSeqer. In these experiments, GMAP demonstrated a several-fold increase in speed over existing programs. AVAILABILITY: Source code for gmap and associated programs is available at http://www.gene.com/share/gmap. SUPPLEMENTARY INFORMATION: http://www.gene.com/share/gmap.
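As a rough illustration of k-mer (oligomer) sampling for genomic mapping, the Python sketch below indexes every k-mer of a genome, samples query k-mers at a fixed step, and lets each hit vote for a coarse genomic bin. This is a generic seed-and-vote scheme, not GMAP's actual minimal sampling or oligomer chaining algorithm; the function names and the parameters `k`, `step` and `bin_size` are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the genome to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_query(query: str, index: dict[str, list[int]], k: int,
              step: int, bin_size: int) -> int | None:
    """Return the start of the genomic bin that receives the most k-mer votes."""
    votes: Counter = Counter()
    for q in range(0, len(query) - k + 1, step):  # sample every `step`-th k-mer
        for g in index.get(query[q:q + k], []):
            # g - q is where the query would start if this hit were colinear;
            # coarse binning lets hits from different exons of one gene agree.
            votes[max(g - q, 0) // bin_size] += 1
    if not votes:
        return None
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * bin_size
```

The coarse bin is the key design choice in this sketch: exact diagonals differ by the intron length between exons, so requiring identical diagonals would split the votes of a spliced query.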

2.
TANDEM: matching proteins with tandem mass spectra (total citations: 15; self-citations: 0; citations by others: 15)
SUMMARY: Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. AVAILABILITY: The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.
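The basic idea of sequence-to-spectrum matching can be sketched as follows: compute singly charged b- and y-ion masses from standard monoisotopic residue masses, then count observed peaks that fall within a tolerance of any theoretical fragment. This is a simplified shared-peak count, not TANDEM's own scoring function; the function names and the 0.5 Da default tolerance are assumptions.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b ions (N-terminal prefixes) and y ions (C-terminal suffixes)."""
    b, y = [], []
    prefix = 0.0
    for aa in peptide[:-1]:
        prefix += MONO[aa]
        b.append(prefix + PROTON)
    suffix = 0.0
    for aa in reversed(peptide[1:]):
        suffix += MONO[aa]
        y.append(suffix + WATER + PROTON)
    return b, y


def shared_peaks(peptide: str, peaks: list[float], tol: float = 0.5) -> int:
    """Number of observed peaks within `tol` Da of some theoretical b or y ion."""
    b, y = fragment_ions(peptide)
    theoretical = b + y
    return sum(1 for p in peaks if any(abs(p - t) <= tol for t in theoretical))


def best_match(candidates: list[str], peaks: list[float], tol: float = 0.5) -> str:
    """Candidate peptide sharing the most peaks with the spectrum."""
    return max(candidates, key=lambda pep: shared_peaks(pep, peaks, tol))
```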

3.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi.

4.
To evaluate the importance of the surrounding nucleotide sequence in the selection of a splice site for mRNA, we carried out computer studies of eukaryotic protein genes whose entire nucleotide sequences were available. Splice site-like sequences with significant homology to the consensus splice junction sequences are frequently found within introns and exons. The higher the homology of a candidate donor site sequence to the nine-nucleotide consensus sequence, the higher its probability of being a donor site. For most donors, the stability of presumed base-pairing with U1 RNA is higher than that of donor-like sequences, if any, in the adjacent exon and intron. However, homology of a candidate acceptor sequence to the 15-nucleotide consensus is a poor criterion for identifying an acceptor site. The presence of a sequence that could serve as a branch point 18 to 37 nucleotides upstream of an acceptor does not seem to be critical in distinguishing it from an acceptor-like sequence. Nucleotide frequencies around the splice junctions of many genes were calculated separately for human, rat, mouse and chicken. At some positions around the donor site, these frequencies appear to differ from species to species. The acceptors of these vertebrates have longer pyrimidine-rich regions than the previous consensus sequence. The newly derived nucleotide frequencies were used as the standard to calculate the weighted homology score of a candidate splice site sequence in a gene from each of the four species. This weighted homology score of the 40- to 60-nucleotide intron-exon sequence is a much better criterion for identifying an acceptor. These results suggest that the most important signal for splice site selection resides in the surrounding nucleotide sequence. They also suggest that the surrounding nucleotide sequence alone is not generally sufficient for the selection.
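A frequency-weighted homology score of this kind can be sketched as below: per-position nucleotide frequencies are estimated from known sites, and a candidate is scored by the sum of the frequencies of its observed bases, rescaled to [0, 1] between the lowest and highest achievable sums. The abstract does not give the exact formula, so this normalization and the function names are assumptions, and any training sites you pass in (for example 9-nt donor sequences) are illustrative.

```python
def position_frequencies(sites: list[str]) -> list[dict[str, float]]:
    """Per-position nucleotide frequencies from aligned, equal-length site sequences."""
    freqs = []
    for i in range(len(sites[0])):
        counts = {b: 0 for b in "ACGT"}
        for s in sites:
            counts[s[i]] += 1
        freqs.append({b: c / len(sites) for b, c in counts.items()})
    return freqs


def weighted_homology(candidate: str, freqs: list[dict[str, float]]) -> float:
    """Sum of observed-base frequencies, rescaled so the worst sequence is 0 and the best is 1."""
    observed = sum(f[b] for f, b in zip(freqs, candidate))
    best = sum(max(f.values()) for f in freqs)
    worst = sum(min(f.values()) for f in freqs)
    return (observed - worst) / (best - worst)
```

Scoring the most frequent base at every position (the frequency consensus of the training sites) returns 1.0 by construction.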

5.
Two alternative exons, BEK and K-SAM, code for part of the ligand binding site of fibroblast growth factor receptor 2. Splicing of these exons is mutually exclusive, and the choice between them is made in a tissue-specific manner. We identify here pre-mRNA sequences involved in controlling splicing of the K-SAM exon. The short K-SAM exon sequence 5'-TAGGGCAGGC-3' inhibits splicing of the exon. This inhibition can be overcome by mutating either the exon's 5' or 3' splice site to make it correspond more closely to the relevant consensus sequence. Two separate sequence elements in the intron immediately downstream of the K-SAM exon, one of which is a sequence rich in pyrimidines, are both needed for efficient K-SAM exon splicing. This is no longer the case if either the exon's 5' or 3' splice site is reinforced. Furthermore, if the exon inhibitory sequence is removed, the intron sequences are not required for splicing of the K-SAM exon in a cell line which normally splices this exon. At least three elements are thus involved in controlling splicing of the K-SAM exon: suboptimal 5' and 3' splice sites, an exon inhibitory sequence, and intron activating sequences.

6.
The complete nucleotide sequence of an HLA-DP beta 1 gene and part of the adjacent DP alpha 1 gene, up to and including the signal sequence exon, were determined. The sequence of the DP beta 1 gene identified it as the DPw4 allele. The six exons of the DP beta 1 gene spanned over 11,000 bp of sequence. The arrangement of the gene was broadly analogous to genes of other class II beta chains. The beta 1 exon was flanked by introns of over 4 kb. Comparisons with published sequences of cDNA clones indicated that an alternative splice junction, at the 3' end of the gene, is used in at least one allele. Variation in choice of splice junction indicates an additional mechanism for allelic variation in class II genes. The sequence also indicated that the DP beta 1 and DP alpha 1 genes are separated by only 2 kb at their 5' ends. Comparison of the 5' ends of the DP alpha 1 and beta 1 genes with other class II sequences, including the DZ alpha gene, showed conservation of several blocks of sequences thought to be involved in control of expression. Some areas of the introns were partially conserved in the DQ beta gene, and several other intron sequences were homologous to sequences found in other unrelated genes.

7.
A conserved 3' splice site YAG is essential for the second step of pre-mRNA splicing but no trans-acting factor recognizing this sequence has been found. A direct, non-Watson-Crick interaction between the intron terminal nucleotides was suggested to affect YAG selection. The mechanism of YAG recognition was proposed to involve 5' to 3' scanning originating from the branchpoint or the polypyrimidine tract. We have constructed a yeast intron harbouring two closely spaced 3' splice sites. Preferential selection of a wild-type site over mutant ones indicated that the two sites are competing. For two identical sequences, the proximal site is selected. As previously observed, an A at the first intron nucleotide spliced most efficiently with a 3' splice site UAC. In this context, UAA or UAU were also more efficient 3' splice sites than UAG and competed more efficiently than the wild-type sequence with a 3' splice site UAC. We observed that a U at the first intron nucleotide is used for splicing in combination with 3' splice sites UAG, UAA or UAU. Our data indicate that the 3' splice site is not primarily selected through an interaction with the first intron nucleotide. Selection of the 3' splice site depends critically on its distance from the branchpoint but does not occur by a simple leaky scanning mechanism.

8.
The conformation of RNA sequences spanning five 3' splice sites and two 5' splice sites in adenovirus mRNA was probed by partial digestion with single-strand-specific nucleases. Although cleavage of nucleotides near both 3' and 5' splice sites was observed, most striking was the preferential digestion of sequences near the 3' splice sites. At each 3' splice site, a region of very strong cleavage was observed at low enzyme concentrations near the splice site consensus sequence or the upstream branch point consensus sequence. Additional sites of moderately strong cutting near the branch point consensus sequence were observed in those sequences where the splice site was the preferred target. Since recognition of the 3' splice site and branch site appears to be an early event in mRNA splicing, these observations may indicate that the local conformation of the splice site sequences plays a direct or indirect role in enhancing the accessibility of sequences important for splicing.

9.
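A Bayesian network-based modeling approach was used to predict splice sites in eukaryotic DNA sequences. Separate models were built for donor sites and acceptor sites, and the model topology and the choice of upstream and downstream nodes were optimized according to the biological characteristics of each site type. After the model parameters were obtained with the maximum likelihood learning algorithm for Bayesian networks, splice sites in the test data were predicted using 10-fold cross-validation. The average prediction accuracy was 92.5% for acceptor sites, 94.0% for pseudo-acceptor sites, 92.3% for donor sites and 93.5% for pseudo-donor sites; overall performance was better than that of prediction methods based on independent and conditional probability matrices and on hidden Markov models. These results indicate that modeling splice sites with Bayesian networks is an effective approach to splice site prediction.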
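A minimal sketch of this kind of model follows, assuming the simplest non-trivial topology: a chain in which each position depends only on the preceding one. Separate models are fitted to true and pseudo sites from counts (the maximum-likelihood estimates, smoothed by a pseudocount when scoring), a sequence is assigned to whichever model gives it the higher log-likelihood, and accuracy is estimated with k-fold cross-validation. The chain topology, the pseudocount and all names are assumptions; the optimized topology described above is not reproduced here.

```python
import math
from collections import defaultdict

BASES = "ACGT"


class ChainBN:
    """Bayesian network whose DAG is a chain: position i depends only on position i-1."""

    def __init__(self, pseudocount: float = 1.0):
        self.pc = pseudocount

    def fit(self, seqs: list[str]) -> "ChainBN":
        """Collect counts for P(x0) and P(x_i | x_{i-1}) from equal-length sequences."""
        n = len(seqs[0])
        self.n_seqs = len(seqs)
        self.first = defaultdict(float)
        self.trans = [defaultdict(float) for _ in range(n)]  # trans[i][(prev, cur)]
        for s in seqs:
            self.first[s[0]] += 1
            for i in range(1, n):
                self.trans[i][(s[i - 1], s[i])] += 1
        return self

    def log_prob(self, s: str) -> float:
        """Log-likelihood of s, with pseudocounts so unseen bases do not give log(0)."""
        pc = self.pc
        lp = math.log((self.first[s[0]] + pc) / (self.n_seqs + 4 * pc))
        for i in range(1, len(s)):
            row_total = sum(self.trans[i][(s[i - 1], b)] for b in BASES)
            lp += math.log((self.trans[i][(s[i - 1], s[i])] + pc) / (row_total + 4 * pc))
        return lp


def cross_validate(true_sites: list[str], pseudo_sites: list[str], k: int = 10):
    """k-fold CV; returns (accuracy on true sites, accuracy on pseudo sites)."""
    hits_true = hits_pseudo = 0
    for fold in range(k):
        train_t = [s for i, s in enumerate(true_sites) if i % k != fold]
        train_p = [s for i, s in enumerate(pseudo_sites) if i % k != fold]
        model_t, model_p = ChainBN().fit(train_t), ChainBN().fit(train_p)
        for i, s in enumerate(true_sites):
            if i % k == fold:
                hits_true += model_t.log_prob(s) > model_p.log_prob(s)
        for i, s in enumerate(pseudo_sites):
            if i % k == fold:
                hits_pseudo += model_t.log_prob(s) <= model_p.log_prob(s)
    return hits_true / len(true_sites), hits_pseudo / len(pseudo_sites)
```

Reporting the two accuracies separately mirrors the abstract, which gives figures for true and pseudo sites of each type.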

11.
Empirical models of substitution are often used in protein sequence analysis because the large alphabet of amino acids requires that many parameters be estimated in all but the simplest parametric models. A similar situation occurs when structural information is used in the analysis of substitutions in structured RNA: the number of parameters needed to adequately describe the substitution process increases because substitutions of paired bases must also be modeled. We have developed a method to obtain substitution rate matrices empirically from RNA alignments that include structural information in the form of base pairs. Our data consisted of alignments from the European Ribosomal RNA Database of Bacterial and Eukaryotic Small Subunit and Large Subunit Ribosomal RNA (Wuyts et al. 2001. Nucleic Acids Res. 29:175-177; Wuyts et al. 2002. Nucleic Acids Res. 30:183-185). Using secondary structural information, we converted each sequence in the alignments into a sequence over a 20-symbol code: one symbol for each of the four individual bases, and one symbol for each of the 16 ordered pairs. Substitutions in the coded sequences are defined in the natural way, as observed changes between two sequences at any particular site. For given ranges (windows) of sequence divergence, we obtained substitution frequency matrices for the coded sequences. Using a technique originally developed for modeling amino acid substitutions (Veerassamy, Smith, and Tillier. 2003. J. Comput. Biol. 10:997-1010), we were able to estimate the actual evolutionary distance for each window. The actual evolutionary distances were used to derive instantaneous rate matrices, and from these we selected a universal rate matrix. The universal rate matrices were incorporated into the Phylip software package (Felsenstein 2002. http://evolution.genetics.washington.edu/phylip.html), and we analyzed the ribosomal RNA alignments using both distance and maximum likelihood methods.
The empirical substitution models performed well on simulated data and produced reasonable evolutionary trees for 16S ribosomal RNA sequences from sequenced Bacterial genomes. Empirical models have the advantage of being easy to implement, and because the code consists of 20 symbols, the models are easily incorporated into existing programs for protein sequence analysis. In addition, the models are useful for simulating the evolution of RNA sequence and structure simultaneously.
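The 20-symbol coding can be sketched as follows, assuming two ungapped aligned sequences that share one consensus secondary structure in dot-bracket notation. Each base pair (i, j) becomes one ordered-pair symbol placed at position i (position j is dropped), unpaired bases keep their own symbol, and substitutions are tallied site by site. Where the pair symbol is placed and how gaps would be handled are assumptions, not details given in the abstract; the function names are hypothetical.

```python
from collections import Counter


def pair_partners(structure: str) -> dict[int, int]:
    """Map each '(' index in a dot-bracket string to the index of its matching ')'."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs


def encode(seq: str, structure: str) -> list[str]:
    """Recode an RNA sequence over 4 single-base + 16 ordered-pair symbols."""
    pairs = pair_partners(structure)
    closing = set(pairs.values())
    coded = []
    for i, base in enumerate(seq):
        if i in pairs:
            coded.append(base + seq[pairs[i]])  # ordered pair: 5' base, then 3' base
        elif i not in closing:
            coded.append(base)  # unpaired base keeps its own symbol
    return coded


def substitution_counts(seq_a: str, seq_b: str, structure: str) -> Counter:
    """Count (symbol in a, symbol in b) at each coded site of two aligned sequences."""
    return Counter(zip(encode(seq_a, structure), encode(seq_b, structure)))
```

For example, `encode("GGAUCC", "((..))")` gives `["GC", "GC", "A", "U"]`, so a compensatory G-C to A-U change in the second pair is counted as a single substitution between pair symbols.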

13.
I. Seif, G. Khoury, R. Dhar. Nucleic Acids Research, 1979, 6(10):3387-3398.

15.
DBToolkit: processing protein databases for peptide-centric proteomics (total citations: 2; self-citations: 0; citations by others: 2)
SUMMARY: DBToolkit is a user-friendly, easily extensible tool that allows the processing of protein sequence databases to peptide-centric sequence databases. This processing is primarily aimed at enhancing the useful information content of these databases for use as optimized search spaces for efficient identification of peptide fragmentation spectra obtained by mass spectrometry. In addition, DBToolkit can be used to reliably solve a range of other typical tasks in processing sequence databases. AVAILABILITY: DBToolkit is open source under the GNU GPL license. The source code, full user and developer documentation and cross-platform binaries are freely downloadable from the project website at http://genesis.UGent.be/dbtoolkit/. CONTACT: lennart.martens@UGent.be
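One typical peptide-centric processing step is in silico digestion of each protein entry. The sketch below applies the common trypsin rule (cleave after K or R, except when the next residue is P) with an optional number of missed cleavages. It illustrates the kind of transformation described rather than DBToolkit's own code; the function and parameter names are hypothetical.

```python
def tryptic_digest(protein: str, missed_cleavages: int = 0, min_len: int = 1) -> list[str]:
    """Cleave after K or R unless the next residue is P; allow up to `missed_cleavages`."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Join up to missed_cleavages + 1 consecutive fragments beginning at `start`.
        for end in range(start + 1, min(start + 2 + missed_cleavages, len(bounds))):
            peptide = protein[bounds[start]:bounds[end]]
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides
```

For example, `tryptic_digest("MKWVRPLKAE")` returns `["MK", "WVRPLK", "AE"]`: the R in `RP` is not cleaved because it is followed by proline.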

16.
SUMMARY: Mixture models of mutagenetic trees constitute a class of probabilistic models for describing evolutionary processes that are characterized by the accumulation of permanent genetic changes. They have been applied to model the accumulation of chromosomal gains and losses in tumor development and the development of drug resistance-associated mutations in the HIV genome. Mtreemix is a software package for estimating mutagenetic tree mixture models from observed cross-sectional data and for using these models for predictions. We provide programs for model fitting, model selection, simulation, likelihood computation and waiting time estimation. AVAILABILITY: Mtreemix, including source code, documentation, sample data files and precompiled Solaris and Linux binaries, is freely available for non-commercial users at http://mtreemix.bioinf.mpi-sb.mpg.de/
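For a single mutagenetic tree, the probability of observing a given set of events can be sketched as a product over edges: if an event's parent has occurred, the event occurs with its edge probability; if the parent has not occurred, the event cannot occur. A mixture model then weights these probabilities across component trees. The dictionary-based tree representation and the function names below are assumptions and do not reflect Mtreemix's file formats or interface.

```python
from __future__ import annotations


def pattern_probability(parent: dict[str, str], prob: dict[str, float],
                        pattern: set[str]) -> float:
    """Probability of observing exactly `pattern` under one mutagenetic tree.

    parent[e] is the parent event of e ("root" for children of the root);
    prob[e] is the probability that e occurs given that its parent occurred.
    """
    p = 1.0
    for event, par in parent.items():
        parent_present = par == "root" or par in pattern
        if parent_present:
            p *= prob[event] if event in pattern else 1.0 - prob[event]
        elif event in pattern:
            return 0.0  # in a single tree, an event cannot occur without its parent
    return p


def mixture_probability(components: list[tuple[float, dict[str, str], dict[str, float]]],
                        pattern: set[str]) -> float:
    """Weighted sum of single-tree probabilities; the weights should sum to 1."""
    return sum(w * pattern_probability(par, pr, pattern) for w, par, pr in components)
```

For a tree with root -> A (0.6), A -> B (0.5) and root -> C (0.2), the pattern {A} has probability 0.6 x 0.5 x 0.8 = 0.24, and any pattern containing B without A has probability 0.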

17.
The T-->G mutation at nucleotide 705 in the second intron of the beta-globin gene creates an aberrant 5' splice site and activates a 3' cryptic splice site upstream from the mutation. As a result, the IVS2-705 pre-mRNA is spliced via the aberrant splice sites leading to a deficiency of beta-globin mRNA and protein and to the genetic blood disorder thalassemia. We have shown previously that in cell culture models of thalassemia, aberrant splicing of beta-thalassemic IVS2-705 pre-mRNA was permanently corrected by a modified murine U7 snRNA that incorporated sequences antisense to the splice sites activated by the mutation. To explore the possibility of using other snRNAs as vectors for antisense sequences, U1 snRNA was modified in a similar manner. Replacement of the U1 9-nucleotide 5' splice site recognition sequence with nucleotides complementary to the aberrant 5' splice site failed to correct splicing of IVS2-705 pre-mRNA. In contrast, U1 snRNA targeted to the cryptic 3' splice site was effective. A hybrid with a modified U7 snRNA gene under the control of the U1 promoter and terminator sequences resulted in the highest levels of correction (up to 70%) in transiently and stably transfected target cells.
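Designing a sequence antisense to a splice site amounts to taking the reverse complement of the targeted pre-mRNA region. The sketch below does this for RNA and accepts DNA input by reading T as U. The function name is hypothetical, and the example sequence in the note after the code is made up; it is not the actual IVS2-705 target.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(target: str) -> str:
    """Return the RNA sequence antisense (reverse complementary) to `target`.

    DNA input is accepted: T is read as U before complementing.
    """
    rna = target.upper().replace("T", "U")
    return rna.translate(COMPLEMENT)[::-1]  # complement, then read 5' to 3'
```

For example, `antisense_rna("GGTAAG")` returns `"CUUACC"`, and applying the function twice recovers the original RNA sequence.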

18.
A. J. Griffith, C. Schmauss, J. Craft. Gene, 1992, 114(2):195-201.
The cDNA and partial genomic nucleotide (nt) sequences were derived for the mouse Sm B polypeptide and compared with the cDNA and genomic sequences encoding human Sm B. The deduced amino acid (aa) sequences from the mouse and human genes are identical except for a single conserved aa substitution; this conservation accounts for the ability of anti-Sm antibodies to recognize the Sm polypeptides from a broad range of species. The genomic sequence of the mouse B gene is similar to the region of the human B genomic locus that extends from exon 6 to exon 7. Both loci conserve the two alternative 3' splice sites and the putative branch points required to process B and B' mRNAs in human cells. However, the nt sequence downstream of the putative distal 3' splice junction, and single nucleotides flanking the 3' splice site consensus sequence, differ between mouse and human B. As a result, the predicted secondary structure of the murine mRNA around the distal 3' splice site differs from that of the human mRNA. Thus, secondary structural constraints in the mRNA or changes in the exon sequence might prevent recognition of this alternative splice site to form B' mRNA in murine tissues.

20.
Optimal spliced alignment of homologous cDNA to a genomic DNA template (total citations: 17; self-citations: 0; citations by others: 17)
MOTIVATION: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching. RESULTS: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. Using examples, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity. AVAILABILITY: The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as a Web service at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively. CONTACT: vbrendel@iastate.edu
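The idea of scoring sequence matching and splice site strength together can be illustrated with a two-state dynamic program in the style of Gotoh's affine-gap recursion: an exon state aligns cDNA bases to genomic bases, and an intron state consumes genomic bases, entered with a donor score and left with an acceptor score. The `gt_donor` and `ag_acceptor` functions are toy placeholders for a real splice site model, all parameter values are hypothetical, and no minimum intron length is enforced; this is a sketch, not the published algorithm or its C subroutine. Genomic sequence before and after the aligned cDNA is free.

```python
NEG = float("-inf")


def gt_donor(g: str, d: int) -> float:
    """Toy donor strength for an intron whose first base is g[d]."""
    return 2.0 if g[d:d + 2] == "GT" else -20.0


def ag_acceptor(g: str, j: int) -> float:
    """Toy acceptor strength for an intron whose last base is g[j - 1]."""
    return 2.0 if j >= 2 and g[j - 2:j] == "AG" else -20.0


def spliced_align(c, g, donor=gt_donor, acceptor=ag_acceptor,
                  match=1.0, mismatch=-1.0, gap=2.0, intron_pen=5.0):
    """Return (score, introns) with introns as half-open genomic intervals."""
    m, n = len(c), len(g)
    M = [[NEG] * (n + 1) for _ in range(m + 1)]  # exon state
    I = [[NEG] * (n + 1) for _ in range(m + 1)]  # intron state
    pM = [[""] * (n + 1) for _ in range(m + 1)]
    pI = [[""] * (n + 1) for _ in range(m + 1)]
    M[0] = [0.0] * (n + 1)  # genomic bases before the cDNA starts are free
    for i in range(1, m + 1):
        M[i][0], pM[i][0] = -gap * i, "up"
        for j in range(1, n + 1):
            # Open an intron whose first base is g[j-1], or extend the current one.
            opened = M[i][j - 1] + donor(g, j - 1) - intron_pen
            I[i][j], pI[i][j] = max((opened, "open"), (I[i][j - 1], "extend"))
            s = match if c[i - 1] == g[j - 1] else mismatch
            M[i][j], pM[i][j] = max(
                (M[i - 1][j - 1] + s, "diag"),
                (M[i - 1][j] - gap, "up"),            # cDNA base against a gap
                (M[i][j - 1] - gap, "left"),          # genomic base against a gap
                (I[i][j] + acceptor(g, j), "close"),  # intron's last base is g[j-1]
            )
    j = max(range(n + 1), key=lambda jj: M[m][jj])  # trailing genomic bases are free
    score, i, state, introns, end = M[m][j], m, "M", [], 0
    while i > 0:
        if state == "M":
            move = pM[i][j]
            if move == "diag":
                i, j = i - 1, j - 1
            elif move == "up":
                i -= 1
            elif move == "left":
                j -= 1
            else:  # "close": switch to the intron state; the intron ends at j
                state, end = "I", j
        else:
            if pI[i][j] == "open":
                introns.append((j - 1, end))
                state = "M"
            j -= 1
    return score, introns[::-1]
```

With the default scores, aligning `"ATGGCCTTCGGA"` to `"CCCCATGGCC" + "GTAAAAAAAAAG" + "TTCGGACCCC"` recovers one intron spanning genomic positions 10 to 22 (half-open) with a score of 11.0. Replacing the toy splice functions with graded scores is where weak sites supported by good matching, and strong sites with little matching support, would trade off as described above.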
