Similar literature
 20 similar articles found (search time: 15 ms)
1.
SUMMARY: ESTminer is a collection of programs that use expressed sequence tag (EST) data from inbred genomes to identify unique genes within gene families. The algorithm uses Cap3 to perform an initial clustering of related EST sequences to produce a consensus sequence for a gene family. These consensus sequences are then used with BLAST to collect all related ESTs from the original EST library. A redundancy-based criterion is applied to each EST to identify reliable unique gene sequences. Using a highly inbred genome as the source of ESTs eliminates the need to compute covariance on each polymorphism to identify alleles of the same gene, making this algorithm more streamlined than alternatives that must computationally distinguish genes from alleles. AVAILABILITY: The programs were written in PERL and are freely available at http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html CONTACT: nelsonrt@iastate.edu SUPPLEMENTARY INFORMATION: Figures and dataset can be obtained from: http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html.

2.
PEDB: the Prostate Expression Database.   Cited by: 6 (self-citations: 1; by others: 5)
The Prostate Expression Database (PEDB) is a curated relational database and suite of analysis tools designed for the study of prostate gene expression in normal and disease states. Expressed Sequence Tags (ESTs) and full-length cDNA sequences derived from more than 40 human prostate cDNA libraries are maintained and represent a wide spectrum of normal and pathological conditions. Detailed library information, including tissue source, library construction methods, sequence diversity and abundance, is available in a library archive. Prostate ESTs are assembled into distinct species groups using the multiple alignment program CAP2 and are annotated with information from the GenBank, dbEST and Unigene public sequence databases. Annotated sequences in PEDB are searched using the BLAST algorithm. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons. PEDB may be accessed via the World Wide Web at http://www.mbt.washington.edu/PEDB/

3.
Optimal spliced alignment of homologous cDNA to a genomic DNA template   Cited by: 17 (self-citations: 0; by others: 17)
MOTIVATION: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching. RESULTS: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity. AVAILABILITY: The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as a Web service at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively. CONTACT: vbrendel@iastate.edu

4.
Tirunagaru VG, Sofer L, Cui J, Burnside J. Genomics 2000, 66(2):144-151
The cDNA and gene sequences of many mammalian cytokines and their receptors are known. However, corresponding information on avian cytokines is limited due to the lack of cross-species activity at the functional level or strong homology at the molecular level. To improve the efficiency of identifying cytokines and novel chicken genes, a directionally cloned cDNA library from T-cell-enriched activated chicken splenocytes was constructed, and the partial sequence of 5251 clones was obtained. Sequence clustering indicates that 2357 (42%) of the clones are present as a single copy, and 2961 are distinct clones, demonstrating the high level of complexity of this library. Comparisons of the sequence data with known DNA sequences in GenBank indicate that approximately 25% of the clones match known chicken genes, 39% have similarity to known genes in other species, and 11% have no match to any sequence in the database. Several previously uncharacterized chicken cytokines and their receptors were present in our library. This collection provides a useful database for cataloging genes expressed in T cells and a valuable resource for future investigations of gene expression in avian immunology. A chicken EST Web site (http://udgenome.ags.udel.edu/chickest/chick.htm) has been created to provide access to the data, and a set of unique sequences has been deposited with GenBank (Accession Nos. AI979741-AI982511). Our new Web site (http://www.chickest.udel.edu) will be active as of March 3, 2000, and will also provide keyword-searching capabilities for BLASTX and BLASTN hits of all our clones.

5.
6.
7.
8.
The ProDom database of protein domain families.   Cited by: 12 (self-citations: 1; by others: 11)
F Corpet, J Gouzy, D Kahn. Nucleic Acids Research 1998, 26(1):323-326
The ProDom database contains protein domain families generated from the SWISS-PROT database by automated sequence comparisons. It can be searched on the World Wide Web (http://protein.toulouse.inra.fr/prodom.html) or by E-mail (prodom@toulouse.inra.fr) to study domain arrangements within known families or new proteins. Strong emphasis has been put on the graphical user interface, which allows for interactive analysis of protein homology relationships. Recent improvements to the server include: ProDom search by keyword; links to PROSITE and PDB entries; more sensitive ProDom similarity search with BLAST or WU-BLAST; alignments of query sequences with homologous ProDom domain families; and links to the SWISS-MODEL server (http://www.expasy.ch/swissmod/SWISS-MODEL.html) for homology-based 3-D domain modelling where possible.

9.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

10.
MOTIVATION: Using bioinformatic approaches we aimed to characterize poorly understood abnormalities in splicing known as exon scrambling, exon repetition and trans-splicing. RESULTS: We developed a software package that allows large-scale comparison of all human expressed sequence tag (EST) sequences to the entire set of human gene sequences. Among 5,992,495 EST sequences, 401 cases of exon repetition and 416 cases of exon scrambling were found. The vast majority of identified ESTs contain fragments rather than full-length repeated or scrambled exons. Their structures suggest that the scrambled or repeated exon fragments may have arisen in the process of cDNA cloning and not from splicing abnormalities. Nevertheless, we found 11 cases of full-length exon repetition, showing that this phenomenon is real yet very rare. In searching for examples of trans-splicing, we looked only at reproducible events where at least two independent ESTs represent the same putative trans-splicing event. We found 15 ESTs representing five types of putative trans-splicing. However, all 15 cases were derived from human malignant tissues and could have resulted from genomic rearrangements. Our results provide support for a very rare but physiological occurrence of exon repetition, but suggest that apparent exon scrambling and trans-splicing result, respectively, from in vitro artifact and gene-level abnormalities. AVAILABILITY: Exon-Intron Database (EID) is available at http://www.meduohio.edu/bioinfo/eid. Programs are available at http://www.meduohio.edu/bioinfo/software.html. The Laboratory website is available at http://www.meduohio.edu/medicine/fedorov SUPPLEMENTARY INFORMATION: A supplementary file is available at http://www.meduohio.edu/bioinfo/software.html.
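
The distinction between exon repetition and exon scrambling described above can be illustrated with a minimal Python sketch. It is not the authors' software package. It assumes, purely for illustration, that an EST has already been aligned to a single gene and reduced to the list of genomic exon numbers it covers, read 5' to 3' along the EST. The function name `classify_exon_order` and the working definitions (an exon immediately followed by itself counts as repetition, an exon followed by an upstream exon counts as scrambling) are assumptions, and the sketch does not check whether exons are full-length or fragments.

```python
from typing import List


def classify_exon_order(exons: List[int]) -> str:
    """Classify the exon order seen in one EST (hypothetical helper).

    `exons` holds genomic exon numbers in the order they appear along
    the EST. Returns 'canonical', 'repetition', or 'scrambling'.
    """
    label = "canonical"
    for prev, curr in zip(exons, exons[1:]):
        if curr < prev:
            # A downstream exon is followed by an upstream one.
            return "scrambling"
        if curr == prev:
            # The same exon occurs twice in a row; keep scanning in case
            # a later junction turns out to be scrambled.
            label = "repetition"
    return label


# Toy inputs, not taken from the study.
EXAMPLES = {
    "est_a": [1, 2, 3, 4],
    "est_b": [1, 2, 2, 3],
    "est_c": [1, 3, 2, 4],
}

if __name__ == "__main__":
    for name, order in EXAMPLES.items():
        print(name, classify_exon_order(order))
```

A real analysis would also need the alignment coordinates of each exon match, since the abstract stresses that most hits involve exon fragments rather than full-length exons.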

11.
12.
13.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
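
The idea of grouping overlapping ESTs into assemblies can be sketched as follows. This is a toy illustration only, not the assembly method used in the study: it assumes error-free reads on the same strand, treats an exact suffix-prefix match of at least `min_len` bases (or full containment) as an overlap, and merges overlapping reads with a union-find structure. The names `has_overlap`, `group_ests`, and the example sequences are hypothetical.

```python
from typing import Dict, List


def has_overlap(a: str, b: str, min_len: int) -> bool:
    """True if `b` lies inside `a`, or a suffix of `a` equals a prefix of `b`."""
    if len(b) >= min_len and b in a:
        return True
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return True
    return False


def group_ests(ests: Dict[str, str], min_len: int = 5) -> List[List[str]]:
    """Group EST names into clusters connected by pairwise overlaps."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = sorted(ests)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if has_overlap(ests[a], ests[b], min_len) or has_overlap(ests[b], ests[a], min_len):
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Toy reads: est1, est2 and est3 chain together; est4 stands alone.
TOY_ESTS = {
    "est1": "ATGGCGTACGTT",
    "est2": "TACGTTGGCATC",
    "est3": "GGCATCAAATTT",
    "est4": "CCCCCCGGGGGG",
}

if __name__ == "__main__":
    print(group_ests(TOY_ESTS))
```

Real EST data contain sequencing errors and reads from both strands, so production assemblers rely on tolerant alignment rather than exact string matching. The all-pairs comparison here is also quadratic in the number of reads.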

14.
The knowledge base EPO-KB (Empirical Proteomic Ontology Knowledge Base) is based on an OWL ontology that represents current knowledge linking mass-to-charge (m/z) ratios to proteins on multiple platforms, including Matrix-Assisted Laser Desorption/Ionization (MALDI) and Surface-Enhanced Laser Desorption/Ionization (SELDI) Time-of-Flight (TOF). At present, it contains information on m/z-ratio-to-protein links that were extracted from 120 published research papers. It has a web interface that allows researchers to query and retrieve putative proteins that correspond to a user-specified m/z ratio. EPO-KB also allows automated entry of additional m/z-ratio-to-protein links and is expandable to the addition of gene-to-protein and protein-to-disease links. AVAILABILITY: http://www.dbmi.pitt.edu/EPO-KB

15.
WebFEATURE (http://feature.stanford.edu/webfeature/) is a web-accessible structural analysis tool that allows users to scan query structures for functional sites in both proteins and nucleic acids. WebFEATURE is the public interface to the scanning algorithm of the FEATURE package, a supervised learning algorithm for creating and identifying 3D, physicochemical motifs in molecular structures. Given an input structure or Protein Data Bank identifier (PDB ID), and a statistical model of a functional site, WebFEATURE will return rank-scored 'hits' in 3D space that identify regions in the structure where similar distributions of physicochemical properties occur relative to the site model. Users can visualize and interactively manipulate scored hits and the query structure in web browsers that support the Chime plug-in. Alternatively, results can be downloaded and visualized through other freely available molecular modeling tools, like RasMol, PyMOL and Chimera. A major application of WebFEATURE is in rapid annotation of function to structures in the context of structural genomics.

16.
To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs. Although this approach allows the origins of ESTs to be determined, it requires the production of multiple libraries. A hybrid approach is reported here. A cDNA library was prepared using 21 different pools of maize (Zea mays) mRNAs. DNA sequence "bar codes" were added during first-strand cDNA synthesis to uniquely identify the mRNA source pool from which individual cDNAs were derived. Using a decoding algorithm that included error correction, it was possible to identify the source mRNA pool of more than 97% of the ESTs. The frequency at which a bar code is represented in an EST contig should be proportional to the abundance of the corresponding mRNA in the source pool. Consistent with this, all ESTs derived from several genes (zein and adh1) that are known to be exclusively expressed in kernels or preferentially expressed under anaerobic conditions, respectively, were exclusively tagged with bar codes associated with mRNA pools prepared from kernel and anaerobically treated seedlings, respectively. Hence, by allowing for the retention of expression data, the bar coding of cDNA libraries can enhance the value of EST projects.
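
The abstract does not describe how its decoding algorithm corrects errors, so the following Python sketch shows only one common way such decoding could work, under stated assumptions: every bar code has the same length, an observed bar code is assigned to the pool whose reference bar code is uniquely closest in Hamming distance, and reads with more than `max_errors` mismatches or with tied best matches are left unassigned. The bar code sequences, pool names, and the functions `hamming` and `decode_barcode` are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(observed: str, barcodes: Dict[str, str],
                   max_errors: int = 1) -> Optional[str]:
    """Return the pool whose bar code is uniquely nearest to `observed`.

    Returns None when no bar code is within `max_errors` mismatches or
    when two bar codes are equally close (ambiguous read).
    """
    best_pool: Optional[str] = None
    best_dist = max_errors + 1
    tie = False
    for code, pool in barcodes.items():
        if len(code) != len(observed):
            continue  # skip bar codes of a different length
        dist = hamming(observed, code)
        if dist < best_dist:
            best_pool, best_dist, tie = pool, dist, False
        elif dist == best_dist:
            tie = True
    return None if tie else best_pool


# Hypothetical 6-nt bar codes mapped to made-up pool names.
POOLS = {
    "ACGTAC": "kernel",
    "TGCATG": "anaerobic_seedling",
    "GATCGA": "root",
}

if __name__ == "__main__":
    for read in ("ACGTAC", "ACGTAA", "TTTTTT"):
        print(read, decode_barcode(read, POOLS))
```

Correcting a single mismatch unambiguously requires every pair of reference bar codes to differ in at least three positions, which is one reason bar code sets are usually designed rather than chosen at random.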

17.
Progress in maize gene discovery: a project update   Cited by: 9 (self-citations: 0; by others: 9)
The Maize Gene Discovery Project (MGDP) is a 5-year NSF-funded plant genome initiative that began in 1998. The MGDP collaboration involves researchers at six universities from diverse disciplines with the common goal of discovering new maize genes and developing tools for the phenotypic characterization of maize mutants. The project utilizes several approaches: EST sequencing, cDNA microarray production, and the discovery of gene function and genomic sequence through the use of a recombinant Mu1 transposon (RescueMu). Current achievements of the MGDP (NSF 98–72657) include the sequencing of over 120,000 maize ESTs from diverse cDNA libraries, and over 70,000 RescueMu flanking sequences, as well as the cataloguing of mutant seed and cob phenotypes of 23,000 maize ears, 6,200 families of maize seedlings, and 4,000 families of adult maize plants carrying MuDR/Mu and RescueMu insertion alleles. A consolidation of over 24,000 unique sequences from 19 libraries has been made into the first two of the planned set of four "Unigene" microarray slides. In addition, slides for four EST libraries have been produced. These microarray slides, EST clones, library plates of immortalized RescueMu bacterial cultures, and seed are all available online (http://www.zmdb.iastate.edu). The ZmDB website posts periodic assemblies of all maize EST and genomic sequences available from GenBank. ZmDB is also a portal for sequence analysis software designed to aid in gene discovery: MuSeqBox, GeneSeqer, and SplicePredictor. In addition, ZmDB contains links to other plant and genetics websites. Electronic Publication

18.
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000-100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.

19.
The glucocorticoid receptor resource focuses on the structure-function relationships of the glucocorticoid receptor. As well as links to sequence and bibliographic databases via the World Wide Web, the database contains sequence comparisons of receptors from different species and source information for glucocorticoid receptor clones, probes, cell lines and antibodies. The resource allows the electronic publication of essays, unpublished data and reviews on steroid receptors. These publications will not be reviewed or edited and should allow the rapid dissemination of information to the scientific community. The resource can be reached at: http://biochem1.basic-sci.georgetown.edu/grr/grr.html.

20.
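Analysis of the mechanism of alternative transcription of two different transcripts of the human SDCT2 gene   Cited by: 2 (self-citations: 0; by others: 2)
To clone the human high-affinity sodium-dependent dicarboxylate transporter (SDCT2, also called NaDC3) gene and study its physiological function, the rat SDCT2 gene sequence was used as an in silico hybridization probe to screen the human EST database. This yielded a series of human EST sequences highly homologous to the rat SDCT2 sequence, which were assembled into two contigs. Specific PCR primers were designed, and two hybridization probes were amplified by RT-PCR and used to screen a human kidney cDNA library. Full-length cDNAs of two mRNA variants of the human SDCT2 gene (SDCT2α and SDCT2β) were cloned from kidney tissue. The first 3435 bp at the 5′ end of the two variants are identical, but their 3′ ends differ in length: SDCT2β carries an additional 585 bp of sequence after position 3435. Northern blotting and RT-PCR showed that SDCT2α is most abundantly expressed in human kidney, with low-level expression in liver, spleen, placenta, brain and colon, whereas SDCT2β is expressed mainly in kidney, with low-level expression in spleen. Genomic structure analysis showed that, although both mRNAs consist of 13 exons, exon 13 of SDCT2α contains one poly(A) signal (AATAAA), whereas exon 13 of SDCT2β contains two poly(A) signals. This suggests that, in kidney and spleen tissue, the human SDCT2 gene may be transcribed into two mRNA variants of different lengths through alternative use of two poly(A) signals located at different positions in exon 13.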
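
Locating candidate poly(A) signals of the kind described above amounts to a motif scan. The Python sketch below is only an illustration and is not part of the original study: it reports the 0-based start position of every occurrence of the canonical AATAAA hexamer in a sequence. The function name `find_polya_signals` and the toy sequence are hypothetical, and the toy sequence is not SDCT2 sequence.

```python
from typing import List


def find_polya_signals(seq: str, signal: str = "AATAAA") -> List[int]:
    """Return 0-based start positions of every `signal` occurrence in `seq`."""
    # Normalise case and accept RNA input by mapping U to T.
    seq = seq.upper().replace("U", "T")
    positions: List[int] = []
    start = seq.find(signal)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are not missed.
        start = seq.find(signal, start + 1)
    return positions


# Made-up 3' end fragment containing two AATAAA hexamers.
TOY_EXON = "GGCAATAAACCGTTTAATAAAGG"

if __name__ == "__main__":
    print(find_polya_signals(TOY_EXON))
```

Finding the hexamer does not show that a site is used. Cleavage and polyadenylation also depend on downstream sequence elements, so hits from a scan like this are candidates to be checked against cDNA 3′ ends.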
