Similar literature
1.
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.

2.
A RecA-mediated exon profiling method   Total citations: 1 (self-citations: 0; citations by others: 1)
We have developed a simple, rapid and scalable RecA-mediated method for identifying novel alternatively spliced full-length cDNA candidates. The method is based on the principle that RecA proteins carry radioisotope-labeled probe DNAs to their homologous sequences, where they form triplexes. The resulting complex is easily detected by its mobility difference on electrophoresis. We applied this exon profiling method to four selected mouse genes as a feasibility study. To design probes for detection, information on known exonic regions was extracted from the public RefSeq database. For potentially transcribed novel exonic regions, an RNA mapping experiment using Affymetrix tiling arrays was performed. As a result, we were able to identify alternative splice variants of Thioredoxin domain containing 5, Interleukin 1β, Interleukin 1 family 6 and glutamine-rich hypothetical protein. In addition, full-length sequencing demonstrated that our method could profile exon structures with >90% accuracy. This reliable method allows novel splice variants to be screened effectively from very large sets of cDNA clones.

3.
4.
Optimal spliced alignment of homologous cDNA to a genomic DNA template   Total citations: 17 (self-citations: 0; citations by others: 17)
MOTIVATION: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching. RESULTS: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity. AVAILABILITY: The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as a Web service at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively. CONTACT: vbrendel@iastate.edu
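
The combined scoring idea in this abstract can be illustrated with a toy dynamic program. The sketch below is not the published algorithm or the GeneSeqer code: it assumes a gapless alignment, a crude splice site score (a bonus for a GT donor or AG acceptor, a penalty otherwise), and made-up parameter values. The function name `spliced_align` and all scores are hypothetical.

```python
def spliced_align(genome, cdna, match=1.0, mismatch=-1.0,
                  site_bonus=2.0, site_penalty=-5.0):
    """Toy gapless spliced alignment (illustrative assumption, not GeneSeqer).

    Returns (score, exons), where exons are half-open genomic intervals.
    """
    n, m = len(genome), len(cdna)
    NEG = float("-inf")

    def donor(p):
        # Score for starting an intron at genome[p].
        return site_bonus if genome[p:p + 2] == "GT" else site_penalty

    def acceptor(p):
        # Score for ending an intron just before genome[p].
        return site_bonus if p >= 2 and genome[p - 2:p] == "AG" else site_penalty

    # E[i][j]: best score with i genomic and j cDNA bases consumed, ending in an exon.
    # I[i][j]: same, but the last genomic base lies inside an intron.
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        E[i][0] = 0.0  # genomic sequence before the first exon is free

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if genome[i - 1] == cdna[j - 1] else mismatch
            E[i][j] = s + max(E[i - 1][j - 1],
                              I[i - 1][j - 1] + acceptor(i - 1))
            I[i][j] = max(I[i - 1][j], E[i - 1][j] + donor(i - 1))

    # Genomic sequence after the last exon is free as well.
    i = max(range(n + 1), key=lambda k: E[k][m])
    score = E[i][m]
    if score == NEG:
        raise ValueError("cDNA is longer than the genomic template")

    # Trace back, collecting the genomic positions aligned to cDNA bases.
    aligned, j, in_exon = [], m, True
    while j > 0:
        if in_exon:
            aligned.append(i - 1)
            in_exon = E[i - 1][j - 1] >= I[i - 1][j - 1] + acceptor(i - 1)
            i, j = i - 1, j - 1
        else:
            in_exon = E[i - 1][j] + donor(i - 1) > I[i - 1][j]
            i -= 1

    # Merge consecutive aligned positions into exon intervals.
    exons = []
    for p in reversed(aligned):
        if exons and exons[-1][1] == p:
            exons[-1][1] = p + 1
        else:
            exons.append([p, p + 1])
    return score, [tuple(e) for e in exons]


# Two exons (ACGT, TTGC) separated by the intron GTAAAAAG.
score, exons = spliced_align("CCACGTGTAAAAAGTTGCCC", "ACGTTTGC")
print(score, exons)
```

For this input the sketch reports a score of 12.0 (eight matches plus a donor and an acceptor bonus) with exons at (2, 6) and (14, 18). Because sequence matching and splice site terms share one score, lowering `site_penalty` lets strong flanking matches select a weak site, which is the trade-off the abstract describes.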

5.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

6.
7.
8.
A J Griffith, C Schmauss, J Craft. Gene, 1992, 114(2): 195-201
The cDNA and partial genomic nucleotide (nt) sequences were derived for the mouse Sm B polypeptide and compared to the cDNA and genomic sequences encoding human Sm B. The deduced amino acid (aa) sequences from the mouse and human genes are identical with the exception of a single conserved aa substitution, accounting for the ability of anti-Sm antibodies to recognize the Sm polypeptides from a broad range of species. The genomic sequence of the mouse B gene is similar to the human B genomic locus that extends from exon 6 to exon 7. These loci include conservation of both 3' alternative splice sites and putative branch points required to process B and B' mRNAs in human cells. However, the nt sequence downstream from the putative distal 3' splice junction and single nt flanking the 3' splice site consensus sequence differ between mouse and human B. This results in a murine mRNA with a different predicted secondary structure around the distal 3' splice site when compared to humans. Thus, secondary structural constraints in the mRNA or changes in the exon sequence might prevent recognition of this alternative splice site to form B' mRNA in murine tissues.

9.
Gene, 1998, 207(2): 259-266
ATP acts as a fast excitatory neurotransmitter by binding to a large family of membrane proteins, P2X receptors, that have been shown to be ligand-gated, non-selective cation channels. We report the cloning of a full-length and alternatively spliced form of the human P2X4 gene. Clones were identified from a human stomach cDNA library using a rat P2X4 probe. Nucleotide sequence analysis of positive clones identified the full-length human P2X4 cDNA, which codes for a 388-residue protein that is highly homologous (82%) to the rat gene, and an alternatively spliced cDNA. In the alternatively spliced cDNA, the 5′-untranslated region and the first 90 amino acids in the coding region of full-length human P2X4 are replaced by a 35 amino acid coding sequence that is highly homologous with a region of chaperonin proteins in the hsp-90 family. The open reading frames of the full-length and splice variant clones were confirmed by in vitro translation. Northern analysis indicated expression of the full-length P2X4 message in numerous human tissues including smooth muscle, heart, and skeletal muscles. Alternatively spliced RNAs were identified in smooth muscle and brain by RT–PCR and confirmed by RNase protection assays using a 710 bp anti-sense RNA probe that spanned the alternatively spliced and native P2X4 regions. Injection of full-length, but not alternatively spliced, cRNA into Xenopus oocytes resulted in the expression of ATP-gated non-selective cation currents.

10.
11.
12.
A small-scale full-length library construction approach was developed to facilitate production of a mouse full-length cDNA encyclopedia representing approximately 250 enriched, normalized, and/or subtracted cDNA libraries. One library produced using this approach was a subtracted adult mouse inner ear cDNA library (sIEa). The average size of the inserts was approximately 2.5 kb, with the majority ranging from 0.5 to 7.0 kb. From this library 22,574 sequence reads were obtained from 15,958 independent clones. Sequencing and chromosomal localization established 5240 clusters, with 1302 clusters being unique and 359 representing new ESTs. Our sIEa library contributed 56.1% of the 7773 nonredundant UniGene clusters associated with the four mouse inner ear libraries in the NCBI dbEST. Based on homologous chromosomal regions between human and mouse, we identified 1018 UniGene clusters associated with the deafness locus critical regions. Of these, 59 clusters were found only in our sIEa library and represented approximately 50% of the identified critical regions.

13.
Nucleoside triphosphate diphosphohydrolase 3 (NTPDase3) is a cell surface, membrane-bound enzyme that hydrolyzes extracellular nucleotides, thereby modulating purinergic signaling. An alternatively spliced variant of NTPDase3 was obtained and analyzed. This alternatively spliced variant, termed "NTPDase3beta", is produced through the use of an alternative terminal exon (exon 11) in place of the terminal exon (exon 12) in the full-length NTPDase3, now termed "NTPDase3alpha". This results in an expressed protein lacking the C-terminal cytoplasmic sequence, the C-terminal transmembrane helix, and apyrase conserved region 5. The cDNA encoding this truncated splice variant was detected in a human lung library by PCR. Like the full-length NTPDase3alpha, the alternatively spliced NTPDase3beta was expressed in COS cells after transfection, but only the full-length NTPDase3alpha is enzymatically active and properly trafficked to the plasma membrane. However, when the truncated NTPDase3beta was co-transfected with full-length NTPDase3alpha, there was a significant reduction in the amount of NTPDase3alpha that was properly processed and trafficked to the plasma membrane as active enzyme, indicating that the truncated form interferes with normal biosynthetic processing of the full-length enzyme. This suggests a role for the NTPDase3beta variant in the regulation of NTPDase3 nucleotidase activity, and therefore the control of purinergic signaling, in those cells and tissues expressing both NTPDase3alpha and NTPDase3beta.

14.
The Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) is a set of web-based tools for exploring RNA splicing and gene structure in Caenorhabditis elegans. It includes a display of cDNA alignments with the genomic sequence, a catalog of alternatively spliced genes and a database of introns. The cDNA alignments include >100 000 ESTs and almost 1000 full-length cDNAs. ESTs from embryos and mixed stage animals as well as full-length cDNAs can be compared in the alignment display with each other and with predicted genes. The alt-splicing catalog includes 844 open reading frames for which there is evidence of alternative splicing of pre-mRNA. The intron database includes 28 478 introns, and can be searched for patterns near the splice junctions.

15.
16.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but secondary structure prediction from a single sequence is known to be unreliable. To analyze a new non-coding RNA, we therefore have to collect similar RNA sequences that share a common secondary structure, without knowing the exact secondary structure itself. Consequently, sequence comparison in searching for similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on structural alignments of fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to long sequences in large-scale analyses. The accuracy of the alignments is better than or comparable to that of the much slower existing algorithms. AVAILABILITY: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/.
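
To make the notion of fixed-length stem fragments concrete, the sketch below enumerates pairs of fragments that could base-pair with each other in a single RNA sequence. It is an illustrative assumption about what a stem candidate might look like, not SCARNA's implementation: the function name `stem_candidates`, the fragment length, the minimum loop size and the inclusion of G-U wobble pairs are all hypothetical choices.

```python
# Watson-Crick pairs plus G-U wobble pairs (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_candidates(seq, length=3, min_loop=3):
    """Return (i, j) start indices of fragment pairs that can form a stem.

    seq[i:i+length] is the 5' fragment and seq[j:j+length] the 3' fragment;
    the two are read antiparallel, and at least `min_loop` unpaired bases
    must separate them.
    """
    n = len(seq)
    found = []
    for i in range(n - length + 1):
        for j in range(i + length + min_loop, n - length + 1):
            # Base k of the 5' fragment pairs with base (length-1-k) of the 3' fragment.
            if all((seq[i + k], seq[j + length - 1 - k]) in PAIRS
                   for k in range(length)):
                found.append((i, j))
    return found


print(stem_candidates("GGGAAAACCC"))
```

For the hairpin-like input `GGGAAAACCC` the only candidate is the pair starting at positions 0 and 7 (GGG with CCC). A structural aligner would then compare such candidates between two sequences rather than folding each sequence in full.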

17.
The GeneSeqer@PlantGDB Web server (http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi) provides a gene structure prediction tool tailored for applications to plant genomic sequences. Predictions are based on spliced alignment with source-native ESTs and full-length cDNAs or non-native probes derived from putative homologous genes. The tool is illustrated with applications to refinement of current gene structure annotation and de novo annotation of draft genomic sequences. The service should facilitate expert annotation as a community effort by providing convenient access to all public plant sequences via the PlantGDB database, a simple four-step protocol for spliced alignment and visually appealing displays of the predicted gene structures in addition to detailed sequence alignments.

18.
19.
20.
FULL-malaria is a database for a full-length-enriched cDNA library from the human malaria parasite Plasmodium falciparum (http://133.11.149.55/). Because of its medical importance, this organism is the first target for genome sequencing of a eukaryotic pathogen; the sequences of two of its 14 chromosomes have already been determined. However, for the full exploitation of this rapidly accumulating information, correct identification of the genes and study of their expression are essential. Using the oligo-capping method, we have produced a full-length-enriched cDNA library from erythrocytic stage parasites and performed one-pass reading. The database consists of nucleotide sequences of 2490 random clones that include 390 (16%) known malaria genes according to BLASTN analysis of the nr-nt database in GenBank; these represent 98 genes, and the clones for 48 of these genes contain the complete protein-coding sequence (49%). On the other hand, comparisons with the complete chromosome 2 sequence revealed that 35 of 210 predicted genes are expressed, and in addition led to detection of three new gene candidates that were not previously known. In total, 19 of these 38 clones (50%) were full-length. From these observations, it is expected that the database contains approximately 1000 genes, including 500 full-length clones. It should be an invaluable resource for the development of vaccines and novel drugs.
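
The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes figures stated in the text; the helper `pct` and the variable names are illustrative, and the reading of "38 clones" as 35 expressed predicted genes plus 3 new candidates is an inference from the wording.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to the nearest integer."""
    return round(100 * part / whole)


# 390 of the 2490 random clones match known malaria genes.
known_clone_share = pct(390, 2490)

# Clones for 48 of the 98 represented genes contain the complete coding sequence.
complete_cds_share = pct(48, 98)

# Chromosome 2: 35 expressed predicted genes plus 3 new candidates (assumed sum).
chr2_clones = 35 + 3
full_length_share = pct(19, chr2_clones)

print(known_clone_share, complete_cds_share, chr2_clones, full_length_share)
```

The recomputed values (16, 49, 38 and 50) agree with the rounded percentages given in the abstract.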

