Similar articles (20 found)
1.
GMAP: a genomic mapping and alignment program for mRNA and EST sequences   Total citations: 13, self-citations: 0, other citations: 13
MOTIVATION: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich DP for splice site detection, and microexon identification with statistical significance testing. RESULTS: On a set of human messenger RNAs with random mutations at rates of 1% and 3%, GMAP identified all splice sites accurately in over 99.3% of the sequences, one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than blat did. On a set of Arabidopsis cDNAs, GMAP performed comparably with GeneSeqer. In these experiments, GMAP was several times faster than existing programs. AVAILABILITY: Source code for gmap and associated programs is available at http://www.gene.com/share/gmap SUPPLEMENTARY INFORMATION: http://www.gene.com/share/gmap.
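The abstract names oligomer chaining as the step that turns scattered exact matches into an approximate alignment. The sketch below illustrates that general idea under simple assumptions of our own: exact k-mer hits between a cDNA and a genomic sequence are collected with a hash index, and an O(n²) dynamic program keeps the longest chain of hits that is collinear and non-overlapping on both sequences, allowing arbitrarily long genomic jumps so that introns can fall between hits. The function names and the unit score per hit are hypothetical; this is not GMAP's implementation, and its minimal sampling strategy and sandwich DP are not reproduced.

```python
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (position in cDNA, position in genome) of an exact k-mer match


def kmer_hits(query: str, genome: str, k: int) -> List[Hit]:
    """Collect every exact k-mer match between query and genome via a hash index."""
    index: Dict[str, List[int]] = {}
    for j in range(len(genome) - k + 1):
        index.setdefault(genome[j:j + k], []).append(j)
    hits: List[Hit] = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_chain(hits: List[Hit], k: int) -> List[Hit]:
    """Longest chain of hits that is collinear and non-overlapping on both sequences."""
    hits = sorted(hits)
    if not hits:
        return []
    score = [1] * len(hits)  # length of the best chain ending at each hit
    prev = [-1] * len(hits)  # back-pointer to the preceding hit in that chain
    for b, (qb, gb) in enumerate(hits):
        for a in range(b):
            qa, ga = hits[a]
            # Hit a may precede hit b only if it ends before b starts in both
            # sequences; the genomic jump may be arbitrarily long (an intron).
            if qa + k <= qb and ga + k <= gb and score[a] + 1 > score[b]:
                score[b] = score[a] + 1
                prev[b] = a
    best = max(range(len(hits)), key=score.__getitem__)
    chain: List[Hit] = []
    while best != -1:
        chain.append(hits[best])
        best = prev[best]
    return chain[::-1]


# Toy example: a two-exon cDNA against a genome with a 14-bp intron between the exons.
exon1, intron, exon2 = "ACGTTGCA", "GTAAAAAAAAAAAG", "GGCATTAC"
hits = kmer_hits(exon1 + exon2, exon1 + intron + exon2, k=4)
print(best_chain(hits, k=4))  # [(0, 0), (4, 4), (8, 22), (12, 26)]
```

In the toy output, the jump from genomic position 4 to 22 while the cDNA position advances only from 4 to 8 is where the chain spans the intron.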

2.
The resources available from Arabidopsis thaliana for interpreting the functional attributes of wheat ESTs are reviewed. The review focuses on a comparison between wheat EST sequences generated from developing endosperm tissue and the complete genomic sequence of Arabidopsis. The available information indicates not only that tentative annotations can be assigned to many wheat genes, but also that putative or unknown Arabidopsis gene annotations can be improved by comparative genomics.

3.
We describe a multiple alignment program named MAP2, based on a generalized pairwise global alignment algorithm that handles long, dissimilar intergenic and intragenic regions in genomic sequences. MAP2 produces an ordered list of local multiple alignments of similar regions among sequences; dissimilar regions between local alignments are indicated implicitly, because only similar regions are reported. We propose two similarity measures for evaluating the performance of MAP2 and existing multiple alignment programs. Experimental results produced by MAP2 on four real sets of orthologous genomic sequences show that MAP2 rarely missed a block of transitively similar regions and never produced a block of regions that are not transitively similar. Experimental results on six simulated data sets show that MAP2 found the boundaries between similar and dissimilar regions precisely. This feature is useful for finding conserved functional elements in genomic sequences. The MAP2 program is freely available in source code form at http://bioinformatics.iastate.edu/aat/sas.html for academic use.
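MAP2 builds on a generalized pairwise global alignment algorithm. For orientation only, the sketch below shows the standard global alignment recurrence (Needleman-Wunsch with a linear gap penalty) that such algorithms extend; it deliberately does not implement MAP2's special treatment of long dissimilar regions, and the match, mismatch and gap scores are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty (two DP rows)."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" with b[:j]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)            # aligning a[:i] with ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag,              # a[i-1] aligned with b[j-1]
                         prev[j] + gap,     # a[i-1] aligned with a gap
                         cur[j - 1] + gap)  # b[j-1] aligned with a gap
        prev = cur
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 1: three matches and one gap
```

Because every residue must be aligned, a long dissimilar region is charged gap by gap under this plain recurrence; avoiding that cost is exactly what a generalized algorithm for genomic sequences has to address.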

4.

Background

Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pairwise and multiple genomic alignment have been proposed. Some are extremely fast, but their efficiency often comes at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.

Results

Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pairwise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that, in this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.

Conclusion

We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
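The two-step anchored-alignment approach described in the Background can be sketched as follows, assuming anchors are given as exact matches in a hypothetical (start in A, start in B, length) format. The function only splits the sequences into the unanchored regions between consecutive anchors; each returned pair would then be handed to the slower, more sensitive aligner (DIALIGN in the abstract), a step not reproduced here. This is an illustration of the general idea, not CHAOS or DIALIGN code.

```python
from typing import List, Tuple

# (start in seq_a, start in seq_b, length) of an exact-match anchor; hypothetical format
Anchor = Tuple[int, int, int]


def inter_anchor_segments(seq_a: str, seq_b: str,
                          anchors: List[Anchor]) -> List[Tuple[str, str]]:
    """Return the unanchored region pairs that remain to be aligned between anchors."""
    segments = []
    pos_a = pos_b = 0
    for start_a, start_b, length in sorted(anchors):
        # Anchors must appear in the same order on both sequences without overlap.
        if start_a < pos_a or start_b < pos_b:
            raise ValueError("anchors must be collinear and non-overlapping")
        if seq_a[start_a:start_a + length] != seq_b[start_b:start_b + length]:
            raise ValueError("anchor is not an exact match")
        segments.append((seq_a[pos_a:start_a], seq_b[pos_b:start_b]))
        pos_a, pos_b = start_a + length, start_b + length
    segments.append((seq_a[pos_a:], seq_b[pos_b:]))
    # Regions that are empty on both sequences need no further alignment.
    return [(a, b) for a, b in segments if a or b]


anchors = [(0, 0, 3), (4, 5, 3), (7, 9, 3)]
print(inter_anchor_segments("AAAGCCCGGG", "AAATTCCCTGGG", anchors))
# [('G', 'TT'), ('', 'T')]
```

The speed-up comes from the fact that only the short leftover pairs reach the expensive aligner, instead of the full-length sequences.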

5.
Machine learning techniques have improved the prediction of secretory proteins from protein, genomic and expressed sequence tag (EST) sequences. Artificial neural networks, physical sequence analysis using high-performance optimization, and hidden Markov models identify highly variable signal peptides (the vehicles of protein transport across the endoplasmic reticulum membrane), transmembrane segments, and specific extracellular and intracellular domains as indicators of possible roles in intercellular and intracellular chemical signaling pathways. The major role of peptide hormones, blood coagulation factors, carcinogenesis agents and other secretory proteins in orchestrating multicellular life points to their pharmacological potential for treating major diseases and to numerous biotechnological applications.
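One feature these predictors look for, a long hydrophobic stretch such as a transmembrane segment, can be illustrated with a much older and simpler heuristic than the machine-learning methods reviewed: a sliding-window average of the Kyte-Doolittle hydropathy scale. The sketch below is that classical heuristic, not a neural network or HMM; the window of 19 residues and the cutoff of 1.6 are commonly quoted values for transmembrane segments and should be treated as adjustable assumptions (signal-peptide cores are shorter and would need a smaller window).

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> List[Tuple[int, float]]:
    """Return (start, mean hydropathy) for every window whose mean reaches threshold."""
    # Raises KeyError on non-standard residue letters.
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    hits = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, mean))
    return hits


# A toy sequence with a 20-residue leucine stretch flanked by charged residues.
print([start for start, _ in hydrophobic_windows("MKK" + "L" * 20 + "DDDEEE")])
```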

6.
7.
8.
MOTIVATION: Comparative sequence analysis is the essence of many approaches to genome annotation. Heuristic alignment algorithms use similar seed pairs to anchor an alignment. Some applications of local alignment algorithms (e.g. phylogenetic footprinting) would benefit from including prior knowledge (e.g. binding site motifs) in the alignment-building process. RESULTS: We introduce predefined sequence patterns as anchor points into a heuristic local alignment strategy, extending the BLASTZ program for this purpose. Seed patterns can be given either as consensus sequences in IUPAC code or as position weight matrices. Phylogenetic footprinting of promoter regions is one of many potential applications of the SITEBLAST software. AVAILABILITY: The source code is freely available to the academic community from http://corg.molgen.mpg.de/software
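Seed patterns given as IUPAC consensus sequences must first be located in each sequence before they can serve as anchors. A minimal sketch of that lookup step is shown below, using the standard IUPAC nucleotide ambiguity codes; the function name is hypothetical, only the forward strand is scanned, and the subsequent BLASTZ-style extension of the seeds is not shown.

```python
from typing import Dict, List

# IUPAC nucleotide codes and the bases each one allows.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_hits(seq: str, consensus: str) -> List[int]:
    """Start positions (forward strand only) where seq matches an IUPAC consensus."""
    seq, consensus = seq.upper(), consensus.upper()
    allowed = [set(IUPAC[code]) for code in consensus]
    width = len(allowed)
    return [i for i in range(len(seq) - width + 1)
            if all(seq[i + k] in allowed[k] for k in range(width))]


print(consensus_hits("ACGTTATAAAGC", "TATAWA"))  # [4]
```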

9.
10.
A multiple alignment program for protein sequences   Total citations: 1, self-citations: 0, other citations: 1
A program for the multiple alignment of protein sequences is presented. The program is an extension of the fast alignment program by Wilbur et al. (1984) into higher dimensions. The use of hash procedures on fragments of the protein sequences increases the speed of calculation. Thereby we also take into account fragments which are present in some, but not in all, sequences considered. The results of some multiple alignments are given. Received on September 11, 1986; accepted on March 18, 1987
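The abstract describes hashing fragments of the protein sequences and keeping fragments that occur in some, but not necessarily all, sequences. A minimal sketch of that indexing step is shown below, assuming fixed-length k-tuples and a hypothetical `min_seqs` threshold; how the program then builds a multiple alignment from the shared fragments is not described in the abstract and is not attempted here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def shared_ktuples(seqs: Sequence[str], k: int = 2,
                   min_seqs: int = 2) -> Dict[str, Dict[int, List[int]]]:
    """Map each k-tuple found in at least min_seqs sequences to its positions per sequence."""
    # Hash table: k-tuple -> {sequence index -> list of start positions}
    table: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for s_idx, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]][s_idx].append(i)
    # Keep tuples shared by some (not necessarily all) of the sequences.
    return {t: dict(by_seq) for t, by_seq in table.items() if len(by_seq) >= min_seqs}


print(shared_ktuples(["MKVL", "AKVL", "MKQQ"]))
```

With these three toy sequences, "MK" is shared by sequences 0 and 2 while "KV" and "VL" are shared by sequences 0 and 1, so none of the tuples is present in all three, which is the situation the abstract says the program accounts for.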

11.
12.
MICAS is a web server for extracting microsatellite information from completely sequenced prokaryotic and viral genomes or from user-submitted sequences. The server provides an integrated platform for MICdb (a database of prokaryotic and viral microsatellites), W-SSRF (a simple sequence repeat finding program) and Autoprimer (primer design software). Through dynamically generated HTML pages, MICAS supports the systematic extraction of microsatellite information from selected genomes hosted on MICdb or from user-submitted sequences. It also uses Autoprimer to design primers for sequences containing selected microsatellite tracts.
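As an illustration of what a simple sequence repeat finder does, the sketch below reports perfect tandem repeats of 1 to 6 bp motifs using a regular-expression backreference. It is not W-SSRF: the minimum copy numbers are illustrative assumptions, imperfect (interrupted) repeats are ignored, and motifs that are themselves repeats of a shorter unit (such as "ATAT") are skipped so each tract is reported once under its primitive motif.

```python
import re
from typing import Dict, List, Tuple

# Minimum number of tandem copies per motif length (illustrative thresholds).
MIN_COPIES: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if motif is itself a repeat of a shorter unit, e.g. "ATAT"."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_ssrs(seq: str, min_copies: Dict[int, int] = MIN_COPIES) -> List[Tuple[int, str, int]]:
    """Report perfect tandem repeats as (start, motif, copy number)."""
    seq = seq.upper()
    found = []
    for unit, copies in min_copies.items():
        # A unit-length motif followed by at least copies-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, copies - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                found.append((m.start(), m.group(1), len(m.group(0)) // unit))
    return sorted(found)


print(find_ssrs("GC" + "AT" * 7 + "GGC" + "CAG" * 5 + "T"))
# [(2, 'AT', 7), (19, 'CAG', 5)]
```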

13.
A system to use bovine EST data in conjunction with human genomic sequence to improve the bovine linkage map over the entire genome or on specific chromosomes was evaluated. Bovine EST sequence was used to provide primer sequences corresponding to bovine genes, while human genomic sequence directed primer design to flank introns and produce amplicons of appropriate size for efficient direct sequencing. The sequence tagged sites (STS) produced in this way from the four sires of the MARC reference families were examined for single nucleotide polymorphisms (SNPs) that could be used to map the corresponding genes. With this approach, along with a primer/extension mass spectrometry SNP genotyping assay, 100 ESTs were placed on the bovine genetic linkage map. The first 70 were chosen at random from bovine EST–human genomic comparisons. An additional 30 ESTs were successfully mapped to bovine Chromosome 19 (BTA19), and comparison of the resulting BTA19 map to the position of the corresponding human orthologs on the HSA17 draft sequences revealed differences in the spacing and order of genes. Over 80% of successful amplicons contained SNPs, indicating that this is an efficient approach to generating EST-associated genetic markers. We have demonstrated the feasibility of constructing a linkage map based on SNPs associated with ESTs and the plausibility of utilizing EST, comparative mapping information, and human sequence data to target regions of the bovine genome for SNP marker development.
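The STS sequences from the four sires were examined for SNPs. Assuming the amplicon reads have already been aligned to equal length, and that heterozygous positions from direct sequencing are called with two-base IUPAC codes (both assumptions, not details given in the abstract), a column-wise scan for polymorphic sites could look like this hypothetical helper:

```python
from typing import Dict, List, Sequence, Tuple

# Bases represented by each nucleotide code; two-base codes mark heterozygous calls.
CODE_TO_BASES: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def snp_sites(aligned: Sequence[str]) -> List[Tuple[int, str]]:
    """Alignment columns where more than one base is observed across the reads."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for col in range(length):
        alleles = set()
        for s in aligned:
            # Gaps, N and unrecognized codes contribute no alleles.
            alleles.update(CODE_TO_BASES.get(s[col].upper(), ""))
        if len(alleles) > 1:
            sites.append((col, "".join(sorted(alleles))))
    return sites


# Four hypothetical sire reads; the second sire is heterozygous (Y = C/T) at column 1.
print(snp_sites(["ACG-A", "AYG-A", "ACGTA", "ACGTA"]))  # [(1, 'CT')]
```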

14.
Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.
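To make the idea of a finite-state term recognizer concrete, the sketch below builds a token-level trie (an acyclic deterministic automaton) from a small term dictionary and scans text left to right, keeping the longest dictionary match at each position. Everything here is a hypothetical illustration, not Termino's implementation: the term-to-label dictionary, the naive whitespace tokenization (punctuation stays attached to tokens) and the function names are all assumptions.

```python
from typing import Dict, List, Tuple

_END = object()  # marks a trie node at which a complete term ends


def build_trie(terms: Dict[str, str]) -> dict:
    """Token-level trie mapping each lower-cased term to its semantic label."""
    root: dict = {}
    for term, label in terms.items():
        node = root
        for token in term.lower().split():
            node = node.setdefault(token, {})
        node[_END] = label
    return root


def recognize(text: str, trie: dict) -> List[Tuple[str, str]]:
    """Left-to-right, longest-match recognition of dictionary terms in text."""
    tokens = text.split()  # naive whitespace tokenization
    lowered = [t.lower() for t in tokens]
    i, found = 0, []
    while i < len(tokens):
        node, j, best = trie, i, None
        # Follow transitions as far as the text allows, remembering the last
        # position at which a complete term ended.
        while j < len(tokens) and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            if _END in node:
                best = (j, node[_END])
        if best is None:
            i += 1
        else:
            end, label = best
            found.append((" ".join(tokens[i:end]), label))
            i = end
    return found


trie = build_trie({"tumor necrosis factor": "protein", "tumor": "disease", "p53": "gene"})
print(recognize("Tumor necrosis factor and p53 in tumor cells", trie))
# [('Tumor necrosis factor', 'protein'), ('p53', 'gene'), ('tumor', 'disease')]
```

Longest-match scanning is why "Tumor necrosis factor" is labeled as a single protein term rather than "Tumor" being labeled as a disease.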

15.
We present 30 microsatellite loci isolated from expressed sequence tag (EST) and genomic libraries in Vaccinium corymbosum L. Allele number per locus in 11 tetraploid and one diploid V. corymbosum accessions ranged from two to 15 (mean = 8.16) in 24 single‐locus simple sequence repeats (SSRs). Cross‐species amplification in a panel of 12 species representing nine sections ranged from 30 to 100% (mean = 83%).

16.
17.
Microsatellite markers have been developed from standard enriched genomic libraries and a cDNA library for the genus Streptocarpus. Of 15 loci derived from ESTs (expressed sequence tags), four gave working primer pairs, with expected heterozygosities (HE) ranging from 0.42 to 0.86. Of 89 loci derived from the genomic libraries, six gave working primer pairs, with HE ranging from 0.63 to 0.93.
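Expected heterozygosity (HE) at a locus is usually computed from allele frequencies p_i as HE = 1 - sum(p_i^2), the probability that two randomly drawn alleles differ. The abstract does not say which estimator was used; the sketch below implements only this basic form, without Nei's small-sample correction factor 2n/(2n - 1), and the function name is our own.

```python
from collections import Counter
from typing import Hashable, Iterable


def expected_heterozygosity(alleles: Iterable[Hashable]) -> float:
    """HE = 1 - sum(p_i^2) over the allele frequencies p_i observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Two alleles at equal frequency give HE = 0.5; a monomorphic locus gives 0.0.
print(expected_heterozygosity(["A", "A", "B", "B"]))
```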

18.
The GoSh database is a collection of 58 990 Capra hircus and Ovis aries expressed sequence tags. A Perl pipeline was written to process the sequences, and the data were stored in a MySQL database. A PHP-based web interface allows the database to be browsed and queried. Putative single nucleotide polymorphism (SNP) detection and searches for repeats were performed, and links to related external resources are provided. Sequences were annotated against three different databases, and an algorithm was implemented to compute statistics on the distribution of the retrieved homologous ontologies among Gene Ontology categories. The GoSh database is a repository of data and links related to goat and sheep expressed genes. AVAILABILITY: The GoSh database is available at http://www.itb.cnr.it/gosh/

19.
DNA markers able to distinguish species or genera with high specificity are valuable for identifying introgressed regions in interspecific or intergeneric hybrids. Intergeneric hybridization between the genera Lolium and Festuca, leading to the reciprocal introgression of chromosomal segments, can produce novel forage grasses with unique combinations of characteristics. To characterize Lolium/Festuca introgressions, novel PCR-based expressed sequence tag (EST) markers were developed. These markers were designed around intronic regions, which are more polymorphic than exonic regions. Intronic regions of the grass genes were predicted from the sequenced rice genome. Two hundred and nine primer sets were designed from Lolium/Festuca ESTs that showed high similarity to unique rice genes dispersed uniformly throughout the rice genome. We selected 61 of these primer sets as insertion-deletion (indel)-type markers and 82 as cleaved amplified polymorphic sequence (CAPS) markers to distinguish between Lolium perenne and Festuca pratensis. The species specificity of these markers was evaluated by genotyping four cultivars and accessions (32 individuals) of L. perenne and F. pratensis, respectively. Evaluation using the specificity indices proposed in this study suggested that many indel-type markers had high species specificity to L. perenne and F. pratensis, including 15 markers completely specific to both species. Forty-nine of the CAPS markers completely distinguish between the two species at bulk level. Chromosome mapping of these markers using a Lolium/Festuca substitution line revealed syntenic relationships between Lolium/Festuca and rice largely consistent with previous reports. Because this intron-based marker system combines a high level of interspecific polymorphism with high species specificity, it should be a valuable tool in Festulolium breeding.
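A CAPS marker distinguishes alleles by whether an amplicon contains a restriction site, which changes the fragment sizes seen after digestion. The sketch below predicts fragment lengths for two invented amplicon sequences digested with EcoRI (recognition site GAATTC, cut after the first base, G^AATTC). The sequences and the function name are hypothetical, only the forward strand is scanned (adequate for palindromic sites like EcoRI), and real marker design would also need to consider gel resolution.

```python
from typing import List


def digest_fragments(amplicon: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting at every occurrence of site (forward strand).

    cut_offset is the number of bases into the site at which the enzyme cuts,
    e.g. 1 for EcoRI (G^AATTC).
    """
    amplicon, site = amplicon.upper(), site.upper()
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one SNP inside the EcoRI site.
print(digest_fragments("AAAAGAATTCAAAA", "GAATTC", 1))  # [5, 9]
print(digest_fragments("AAAAGAGTTCAAAA", "GAATTC", 1))  # [14]
```

The allele carrying the site yields two bands while the other remains uncut, which is the kind of species-diagnostic pattern the CAPS markers above rely on.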

20.
An optimized protocol for analysis of EST sequences   Total citations: 16, self-citations: 1, other citations: 16
