Similar articles
 20 similar articles found (search time: 15 ms)
1.
Optimal spliced alignment of homologous cDNA to a genomic DNA template (cited 17 times: 0 self-citations, 17 by others)
MOTIVATION: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching. RESULTS: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity. AVAILABILITY: The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as a Web service at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively. CONTACT: vbrendel@iastate.edu

2.
We have created databases and software applications for the analysis of DNA mutations at the human p53 gene, the human hprt gene and both the rodent transgenic lacI and lacZ loci. The databases themselves are stand-alone dBASE files and the software for analysis of the databases runs on IBM-compatible computers with Microsoft Windows. Each database has a separate software analysis program. The software created for these databases permits the filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web. Open the following home page with a Web browser: http://sunsite.unc.edu/dnam/mainpage.html. Alternatively, the databases and programs are available via public FTP from: anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found beneath the subdirectory: pub/academic/biology/dna-mutations. Two other programs are available at the site, a program for comparison of mutational spectra and a program for entry of mutational data into a relational database.

3.
Many genes are involved in the mammalian cell apoptosis pathway. These apoptosis genes often contain characteristic functional domains, and can be classified into at least 15 functional groups, according to previous reports. Using an integrated bioinformatics platform for motif or domain search from three public mammalian proteomes (International Protein Index database for human, mouse, and rat), we systematically cataloged all of the proteins involved in the mammalian apoptosis pathway. By localizing those proteins onto the genomes, we obtained a gene-locus-centric apoptosis gene catalog for human, mouse and rat. Further phylogenetic analysis showed that most of the apoptosis-related gene loci are conserved among these three mammals. Interestingly, about one-third of apoptosis gene loci form gene clusters on mammalian chromosomes, and these clusters exist in all three species, which indicates that mammalian apoptosis gene orders are also conserved. In addition, some tandem duplicated gene loci were revealed by comparing gene loci clusters in the three species. All data produced in this work were stored in a relational database and may be viewed at http://pcas.cbi.pku.edu.cn/database/apd.php.

4.
5.
6.
GRIMM: genome rearrangements web server (cited 14 times: 0 self-citations, 14 by others)
SUMMARY: Genome Rearrangements In Man and Mouse (GRIMM) is a tool for analyzing rearrangements of gene orders in pairs of unichromosomal and multichromosomal genomes, with either signed or unsigned gene data. Although there are several programs for analyzing rearrangements in unichromosomal genomes, this is the first to analyze rearrangements in multichromosomal genomes. GRIMM also provides a new algorithm for analyzing comparative maps for which gene directions are unknown. AVAILABILITY: A web server, with instructions and sample data, is available at http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM.
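
To make "rearrangements of signed gene orders" concrete, the following is a minimal Python sketch that counts breakpoints in a signed unichromosomal gene order relative to the identity order 1..n. It is not GRIMM's algorithm, and the function names are hypothetical. It only illustrates the classical observation that a single reversal removes at most two breakpoints, so half the breakpoint count (rounded up) is a lower bound on the number of reversals.

```python
from typing import List


def count_breakpoints(perm: List[int]) -> int:
    """Count breakpoints of a signed gene order relative to 1, 2, ..., n.

    A sign gives the gene's orientation. The order is framed by 0 and n + 1,
    and a consecutive pair (a, b) is an adjacency only when b - a == 1.
    """
    n = len(perm)
    if sorted(abs(g) for g in perm) != list(range(1, n + 1)):
        raise ValueError("perm must contain each gene 1..n exactly once")
    framed = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(perm: List[int]) -> int:
    """Lower bound on reversals: each reversal removes at most 2 breakpoints."""
    return (count_breakpoints(perm) + 1) // 2
```

For example, the order 1, -3, -2, 4 differs from the identity by one reversal of the segment 2, 3, and the sketch reports two breakpoints for it.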

7.
The UCSC Known Genes (cited 17 times: 0 self-citations, 17 by others)
The University of California Santa Cruz (UCSC) Known Genes dataset is constructed by a fully automated process, based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from GenBank. The detailed steps of this process are described. Extensive cross-references from this dataset to other genomic and proteomic data were constructed. For each known gene, a details page is provided containing rich information about the gene, together with extensive links to other relevant genomic, proteomic and pathway data. As of July 2005, the UCSC Known Genes are available for human, mouse and rat genomes. The Known Genes dataset serves as a foundation to support several key programs: the Genome Browser, Proteome Browser, Gene Sorter and Table Browser offered at the UCSC website. All the associated data files and program source code are also available. They can be accessed at http://genome.ucsc.edu. The genomic coverage of UCSC Known Genes, RefSeq, Ensembl Genes, H-Invitational and CCDS is analyzed. Although UCSC Known Genes offers the highest genomic and CDS coverage among major human and mouse gene sets, more detailed analysis suggests all of them could be further improved.

8.
Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative dictionary learning (OMRNDL) algorithm to overcome this deficiency. Notably, OMRNDL casts visual tracking as a dictionary learning problem under the particle filter framework and captures the intrinsic knowledge about the target from multiple visual modalities, e.g., pixel intensity and texture information. To this end, OMRNDL adaptively learns an individual dictionary, i.e., template, for each modality from available frames, and then represents new particles over all the learned dictionaries by minimizing the fitting loss of data based on M-estimation. The resultant representation coefficient can be viewed as the common semantic representation of particles across multiple modalities, and can be utilized to track the target. OMRNDL incrementally learns the dictionary and the coefficient of each particle by using multiplicative update rules, which guarantee the non-negativity of both. Experimental results on a popular challenging video benchmark validate the effectiveness of OMRNDL for visual tracking, both quantitatively and qualitatively.
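
The abstract does not give OMRNDL's update equations, so the following is only a sketch of how a multiplicative update keeps coefficients non-negative. It substitutes the standard Lee-Seung rule for a plain squared-error loss in place of the M-estimation loss the paper uses. The function names, the fixed dictionary `D`, and the small constant `eps` are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def squared_error(D: Matrix, h: List[float], x: List[float]) -> float:
    """Return ||x - D h||^2 for dictionary D (rows = features, cols = atoms)."""
    total = 0.0
    for row, target in zip(D, x):
        approx = sum(d * c for d, c in zip(row, h))
        total += (target - approx) ** 2
    return total


def multiplicative_update(D: Matrix, h: List[float], x: List[float],
                          eps: float = 1e-12) -> List[float]:
    """One Lee-Seung step: h <- h * (D^T x) / (D^T D h + eps), elementwise.

    With non-negative D, x and h, every factor is non-negative, so the
    updated coefficients stay non-negative without any explicit projection.
    """
    n_atoms = len(h)
    approx = [sum(d * c for d, c in zip(row, h)) for row in D]   # D h
    numer = [sum(D[i][a] * x[i] for i in range(len(D))) for a in range(n_atoms)]
    denom = [sum(D[i][a] * approx[i] for i in range(len(D))) for a in range(n_atoms)]
    return [h[a] * numer[a] / (denom[a] + eps) for a in range(n_atoms)]
```

Because each step only rescales a coefficient by a non-negative ratio, a coefficient that starts positive can shrink toward zero but never changes sign, which is the property the abstract attributes to its multiplicative rules.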

9.
The pituitary is the master endocrine gland responsible for the regulation of various physiologic and metabolic processes. Proteomics offers an efficient means for a comprehensive analysis of pituitary protein expression. This paper reports on the application of proteomics for the mapping of major proteins in a normal (control) pituitary. Pituitary proteins were separated by two-dimensional gel electrophoresis with immobilized pH 3-10 gradient strips. Major protein spots that were visualized in the two-dimensional gel by silver staining were excised, and the proteins in these spots were digested with trypsin. The tryptic digests were analyzed by mass spectrometry, and the mass spectrometric data were used to identify the proteins through searches of the SWISS-PROT or NCBInr protein sequence databases. The majority of the proteins were identified on the basis of peptide mass fingerprinting data obtained by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Several proteins were also characterized based on product-ion spectra measured by post-source decay analysis and/or liquid chromatography-electrospray-quadrupole ion trap mass spectrometry. To date, 62 prominent protein spots, corresponding to 38 different proteins, were identified. The identified proteins include important pituitary hormones, structural proteins, enzymes, and other proteins. The protein identification data were used to establish a two-dimensional reference database of the human pituitary, which can be accessed over the Internet (http://www.utmem.edu/proteomics). This database will serve as a tool for further proteomics studies of pituitary protein expression in health and disease.

10.
Prophage loci often remain under-annotated or even unrecognized in prokaryotic genome sequencing projects. A PHP application, Prophage Finder, has been developed and implemented to predict prophage loci, based upon clusters of phage-related gene products encoded within DNA sequences. This application provides results detailing several facets of these clusters to facilitate rapid prediction and analysis of prophage sequences. Prophage Finder was tested using previously annotated prokaryotic genomic sequences with manually curated prophage loci as benchmarks. Additional analyses from Prophage Finder searches of several draft prokaryotic genome sequences are available through the Web site (http://bioinformatics.uwp.edu/~phage/DOEResults.php) to illustrate the potential of this application.

11.
The Prostate Gene Database (PGDB: http://www.ucsf.edu/pgdb) is a curated and integrated database of genes or genomic loci related to the human prostate and prostatic diseases. Currently, PGDB covers genes involved in a number of molecular and genetic events of the prostate including gene amplification, mutation, gross deletion, methylation, polymorphism, linkage and over-expression, as published in the literature. Genes that are specifically expressed in prostate, as evidenced by analysis of data from expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE), are also included. There are a total of 165 unique entries in the database. Users can either browse or query the PGDB through a web interface. For each gene, in addition to basic gene information and rich cross-references to other databases, inclusive and relevant literature references are provided to support the inclusion of the gene in the database. Detailed expression data calculated from the UniGene and SAGEmap databases are also presented.

12.
We have created databases and software applications for the analysis of DNA mutations in the human p53 gene, the human hprt gene and the rodent transgenic lacZ locus. The databases themselves are stand-alone dBase files and the software for analysis of the databases runs on IBM-compatible computers. The software created for these databases permits filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web (WWW). Open home page http://sunsite.unc.edu/dnam/mainpage.html with a WWW browser. Alternatively, the databases and programs are available via public ftp from anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found in subdirectory pub/academic/biology/dna-mutations. Two other programs are available at the WWW site, a program for comparison of mutational spectra and a program for entry of mutational data into a relational database.

13.
SUMMARY: Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently proposed extension to conditional random fields (CRFs) that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that recognizes phrases from the dictionary, as well as variations of these phrases. Standard training methods for HMMs can be used to learn which variants should be recognized. We compared the performance of our new approaches with that of Maximum Entropy (MaxEnt) and normal CRFs on three datasets, and improvement was obtained for all four methods over the best published results for two of the datasets. CRFs and semiCRFs achieved the highest overall performance according to the widely used F-measure, while the dictionary HMMs performed the best at finding entities that actually appear in the dictionary, which is the measure of most interest in our intended application. AVAILABILITY: Dictionary HMMs were implemented in Java. Algorithms are available through an information extraction package MINORTHIRD on http://minorthird.sourceforge.net

14.
MOTIVATION: Computational gene identification plays an important role in genome projects. The approaches used in gene identification programs are often tuned to one particular organism, and accuracy for one organism or class of organism does not necessarily translate to accurate predictions for other organisms. In this paper we evaluate five computer programs on their ability to locate coding regions and to predict gene structure in Neurospora crassa. One of these programs (FFG) was designed specifically for gene-finding in N. crassa, but the model parameters have not yet been fully 'tuned', and the program should thus be viewed as an initial prototype. The other four programs were neither designed nor tuned for N. crassa. RESULTS: We describe the data sets on which the experiments were performed, the approaches employed by the five algorithms: GenScan, HMMGene, GeneMark, Pombe and FFG, the methodology of our evaluation, and the results of the experiments. Our results show that, while none of the programs consistently performs well, overall the GenScan program has the best performance on sensitivity and Missing Exons (ME) while the HMMGene and FFG programs have good performance in locating the exons roughly. Additional work motivated by this study includes the creation of a tool for the automated evaluation of gene-finding programs, the collection of larger and more reliable data sets for N. crassa, parameterization of the model used in FFG to produce a more accurate gene-finding program for this species, and a more in-depth evaluation of the reasons that existing programs generally fail for N. crassa. AVAILABILITY: Data sets, the FFG program source code, and links to the other programs analyzed are available at http://jerry.cs.uga.edu/~wang/genefind.html. CONTACT: eileen@cs.uga.edu.

15.
CaGE: cardiac gene expression knowledgebase (cited 4 times: 0 self-citations, 4 by others)
CaGE is a Cardiac Gene Expression knowledgebase we have developed to facilitate the analysis of genes important to human cardiac function. CaGE integrates the functionality of the LocusLink database with data from several human cardiac expression libraries, phenotypic data from OMIM and data from large-scale microarray gene expression studies to create a knowledgebase of gene expression in human cardiac tissue. The knowledgebase is fully searchable via the web using several intuitive query interfaces. Results can be displayed in several concise, easy-to-navigate formats. AVAILABILITY: CaGE is located at http://www.cage.wbmei.jhu.edu

16.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome, with some regions of high LD interspersed with regions of low LD. Such LD patterns make it possible to select a set of single nucleotide polymorphisms (SNPs), known as tag SNPs, for genome-wide association studies. We have developed a suite of computer programs to analyze the block-like LD patterns and to select the corresponding tag SNPs. Compared to other programs for haplotype block partitioning and tag SNP selection, our program has several notable features. First, the dynamic programming algorithms implemented are guaranteed to find the block partition with the minimum number of tag SNPs for the given criteria of blocks and tag SNPs. Second, both haplotype data and genotype data from unrelated individuals and/or from general pedigrees can be analyzed. Third, several existing measures/criteria for haplotype block partitioning and tag SNP selection have been implemented in the program. Finally, the programs provide flexibility to include specific SNPs (e.g. non-synonymous SNPs) as tag SNPs. AVAILABILITY: The HapBlock program and its supplemental documents can be downloaded from the website http://www.cmb.usc.edu/~msms/HapBlock.
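
The dynamic programming idea mentioned in the abstract can be sketched in Python as follows. This is not the HapBlock implementation. The block criterion and tag-SNP cost below (a block is valid when it contains at most `max_distinct` distinct haplotypes, and its cost is the fewest SNP columns that tell those haplotypes apart) are toy assumptions, and all function names are hypothetical. The recurrence itself is the generic one: the best cost for the first j SNPs is the minimum, over every valid last block [i, j), of the best cost for the first i SNPs plus the cost of that block.

```python
from itertools import combinations
from typing import Callable, List, Optional, Tuple


def partition_blocks(
    n_snps: int, block_cost: Callable[[int, int], Optional[int]]
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (minimum total tag SNPs, blocks as half-open (start, end) ranges).

    block_cost(i, j) gives the tag SNPs needed for SNPs i..j-1, or None
    when that range does not satisfy the block criterion.
    """
    inf = float("inf")
    best = [0.0] + [inf] * n_snps   # best[j]: optimum for the first j SNPs
    back = [0] * (n_snps + 1)       # back[j]: start of the last block
    for j in range(1, n_snps + 1):
        for i in range(j):
            cost = block_cost(i, j)
            if cost is not None and best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = i
    if best[n_snps] == inf:
        raise ValueError("no valid block partition exists")
    blocks = []
    j = n_snps
    while j > 0:                    # walk the back-pointers to recover blocks
        blocks.append((back[j], j))
        j = back[j]
    blocks.reverse()
    return int(best[n_snps]), blocks


def make_toy_cost(haplotypes: List[str], max_distinct: int):
    """Toy criterion (an assumption, not HapBlock's): brute-force tag count."""
    def cost(i: int, j: int) -> Optional[int]:
        distinct = {h[i:j] for h in haplotypes}
        if len(distinct) > max_distinct:
            return None
        for k in range(j - i + 1):  # try the smallest column subsets first
            for cols in combinations(range(j - i), k):
                if len({tuple(s[c] for c in cols) for s in distinct}) == len(distinct):
                    return k
        return None
    return cost
```

Passing the block criterion in as a function mirrors the abstract's point that several block and tag-SNP criteria can share one partitioning algorithm. The brute-force cost is exponential in block width, so it is only suitable for small illustrative inputs.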

17.
18.
A number of statistical methods are widely used to describe allelic variation at specific genetic loci and its implications for the evolutionary history of these loci. Although the methods were developed primarily to study allelic variation at loci that are virtually always present in the genome, they are often applied to data on gene content variation (i.e., presence/absence of multiple homologous genes) at the killer cell immunoglobulin-like receptor (KIR) gene cluster. In this paper, we discuss methodological issues involved in the analysis of gene content variation data in the KIR region and also its covariation with polymorphism at the human leukocyte antigen class I loci, which encode ligands for KIR. A comparison of several statistical methods and measures (gene frequency, haplotype frequency, and linkage disequilibrium estimation) using the Centre d’Etude du Polymorphisme Humain data will be provided using KIR haplotypes that have been determined by segregation analysis, noting the strengths and weaknesses of the methods when only the presence/absence data are considered. Finally, application of these methods to a set of globally distributed populations is described (see Single et al., Nat Genet 39:1114–1119, 2007) in order to illustrate the challenges faced when inferring the joint effects of natural selection and demographic history on these immune-related genes.

19.
This paper presents the 12th update of the human obesity gene map, which incorporates published results up to the end of October 2005. Evidence from single-gene mutation obesity cases, Mendelian disorders exhibiting obesity as a clinical feature, transgenic and knockout murine models relevant to obesity, quantitative trait loci (QTL) from animal cross-breeding experiments, association studies with candidate genes, and linkages from genome scans is reviewed. As of October 2005, 176 human obesity cases due to single-gene mutations in 11 different genes have been reported, 50 loci related to Mendelian syndromes relevant to human obesity have been mapped to a genomic region, and causal genes or strong candidates have been identified for most of these syndromes. There are 244 genes that, when mutated or expressed as transgenes in the mouse, result in phenotypes that affect body weight and adiposity. The number of QTLs reported from animal models currently reaches 408. The number of human obesity QTLs derived from genome scans continues to grow, and we now have 253 QTLs for obesity-related phenotypes from 61 genome-wide scans. A total of 52 genomic regions harbor QTLs supported by two or more studies. The number of studies reporting associations between DNA sequence variation in specific genes and obesity phenotypes has also increased considerably, with 426 findings of positive associations with 127 candidate genes. A promising observation is that 22 genes are each supported by at least five positive studies. The obesity gene map shows putative loci on all chromosomes except Y. The electronic version of the map with links to useful publications and relevant sites can be found at http://obesitygene.pbrc.edu.

20.

Background

Next-generation sequencing (NGS) has yielded an unprecedented amount of data for genetics research. Processing the data from raw sequence reads to variant calls is a daunting task, and doing so manually can significantly delay downstream analysis and increase the possibility of human error. The research community has produced tools to properly prepare sequence data for analysis and has established guidelines on how to apply those tools to achieve the best results. However, existing pipeline programs that automate the process in its entirety are either inaccessible to investigators, or web-based and require a certain amount of administrative expertise to set up.

Findings

Advanced Sequence Automated Pipeline (ASAP) was developed to provide a framework for automating the translation of sequencing data into annotated variant calls with the goal of minimizing user involvement without the need for dedicated hardware or administrative rights. ASAP works both on computer clusters and on standalone machines with minimal human involvement and maintains high data integrity, while allowing complete control over the configuration of its component programs. It offers an easy-to-use interface for submitting and tracking jobs as well as resuming failed jobs. It also provides tools for quality checking and for dividing jobs into pieces for maximum throughput.

Conclusions

ASAP provides an environment for building an automated pipeline for NGS data preprocessing. This environment is flexible for use and future development. It is freely available at http://biostat.mc.vanderbilt.edu/ASAP.
