Similar articles
 20 similar articles found
1.
2.
Dictionary-driven prokaryotic gene finding   Total citations: 2 (self-citations: 0, citations by others: 2)
Gene identification, also known as gene finding or gene recognition, is among the important problems of molecular biology that have been receiving increasing attention with the advent of large scale sequencing projects. Previous strategies for solving this problem can be categorized into essentially two schools of thought: one school employs sequence composition statistics, whereas the other relies on database similarity searches. In this paper, we propose a new gene identification scheme that combines the best characteristics from each of these two schools. In particular, our method determines gene candidates among the ORFs that can be identified in a given DNA strand through the use of the Bio-Dictionary, a database of patterns that covers essentially all of the currently available sample of the natural protein sequence space. Our approach relies entirely on the use of redundant patterns as the agents on which the presence or absence of genes is predicated and does not employ any additional evidence, e.g. ribosome-binding site signals. The Bio-Dictionary Gene Finder (BDGF), the algorithm’s implementation, is a single computational engine able to handle the gene identification task across distinct archaeal and bacterial genomes. The engine exhibits performance that is characterized by simultaneous very high values of sensitivity and specificity, and a high percentage of correctly predicted start sites. Using a collection of patterns derived from an old (June 2000) release of the Swiss-Prot/TrEMBL database that contained 451 602 proteins and fragments, we demonstrate our method’s generality and capabilities through an extensive analysis of 17 complete archaeal and bacterial genomes. Examples of previously unreported genes are also shown and discussed in detail.
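The abstract above starts from the ORFs found on a given DNA strand and then scores them against Bio-Dictionary patterns. As a rough illustration of the first step only, the following Python sketch enumerates ORFs in the three reading frames of one strand. It is not BDGF: the function name `find_orfs`, the `min_codons` parameter, and the restriction to ATG start codons are assumptions made for this example (prokaryotes also use alternative starts such as GTG and TTG), and the pattern-based scoring is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only ATG starts are considered in this sketch


def find_orfs(dna, min_codons=2):
    """Return half-open (start, end) index pairs of ATG...stop ORFs.

    Scans the three reading frames of the given strand only; the reverse
    strand would need a separate pass over the reverse complement.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count coding codons between the start and the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATTTTAGGG"
    for begin, end in find_orfs(sequence):
        print(begin, end, sequence[begin:end])
```

Each reported ORF would then be a gene candidate to accept or reject with whatever evidence the gene finder uses.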

3.
4.
5.
Computational gene finding in plants   Total citations: 10 (self-citations: 0, citations by others: 10)

6.
7.
The quest for orthologs: finding the corresponding gene across genomes   Total citations: 2 (self-citations: 0, citations by others: 2)
Orthology is a key evolutionary concept in many areas of genomic research. It provides a framework for subjects as diverse as the evolution of genomes, gene functions, cellular networks and functional genome annotation. Although orthologous proteins usually perform equivalent functions in different species, establishing true orthologous relationships requires a phylogenetic approach, which combines both trees and graphs (networks) using reliable species phylogeny and available genomic data from more than two species, and an insight into the processes of molecular evolution. Here, we evaluate the available bioinformatics tools and provide a set of guidelines to aid researchers in choosing the most appropriate tool for any situation.

8.
9.
GAPSCORE: finding gene and protein names one word at a time   Total citations: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context. RESULTS: We evaluated GAPSCORE against the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Since the method is statistical, users can choose score cutoffs that adjust the performance according to their needs. AVAILABILITY: GAPSCORE is available at http://bionlp.stanford.edu/gapscore/
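The F-scores quoted above are consistent with the usual balanced F-measure, the harmonic mean of precision and recall. The short Python sketch below recomputes it from the rounded precision and recall figures in the abstract. The function name `f_score` is an assumption for this example, and because the inputs are already rounded, the recomputed values can differ from the published ones in the last digit.

```python
def f_score(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rounded figures taken from the abstract above.
    print(f"partial matches: {f_score(0.815, 0.833):.3f}")
    print(f"exact matches:   {f_score(0.567, 0.585):.3f}")
```

Raising the score cutoff typically trades recall for precision, which is why the abstract notes that users can tune the cutoff to their needs.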

10.
MOTIVATION: The increased availability of genome sequences of closely related organisms has generated much interest in utilizing homology to improve the accuracy of gene prediction programs. Generalized pair hidden Markov models (GPHMMs) have been proposed as one means to address this need. However, all GPHMM implementations currently available are either closed-source or the details of their operation are not fully described in the literature, leaving a significant hurdle for others wishing to advance the state of the art in GPHMM design. RESULTS: We have developed an open-source GPHMM gene finder, TWAIN, which performs very well on two related Aspergillus species, A. fumigatus and A. nidulans, finding 89% of the exons and predicting 74% of the gene models exactly correctly in a test set of 147 conserved gene pairs. We describe the implementation of this GPHMM and we explicitly address the assumptions and limitations of the system. We suggest possible ways of relaxing those assumptions to improve the utility of the system without sacrificing efficiency beyond what is practical. AVAILABILITY: Available at http://www.tigr.org/software/pirate/twain/twain.html under the open-source Artistic License.

11.
A gene team is a set of genes that appear in two or more species, possibly in a different order, such that on each chromosome the distance between adjacent genes of the team is never more than a threshold δ. A gene team tree is a succinct way to represent all gene teams for every possible value of δ. In this paper, improved algorithms are presented for the problem of finding the gene teams of two chromosomes and the problem of constructing a gene team tree of two chromosomes. For the problem of finding gene teams, Beal et al. had an O(n lg^2 n)-time algorithm. Our improved algorithm requires O(n lg t) time, where t ≤ n is the number of gene teams. For the problem of constructing a gene team tree, Zhang and Leong had an O(n lg^2 n)-time algorithm. Our improved algorithm requires O(n lg n lglg n) time. Similar to Beal et al.'s gene team algorithm and Zhang and Leong's gene team tree algorithm, our improved algorithms can be extended to k chromosomes with the time complexities increased only by a factor of k.
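To make the definition of a gene team concrete, here is a naive Python sketch for two chromosomes. It is not the O(n lg t) algorithm of the abstract, nor Beal et al.'s algorithm: it simply keeps splitting a candidate set wherever a gap larger than δ appears on either chromosome, which can take quadratic time or worse. The names `split_runs` and `gene_teams`, and the assumption that each gene occurs exactly once per chromosome, are choices made for this example.

```python
def split_runs(genes, pos, delta):
    """Split genes into maximal runs whose consecutive positions differ by <= delta."""
    ordered = sorted(genes, key=pos.__getitem__)
    runs, current = [], [ordered[0]]
    for gene in ordered[1:]:
        if pos[gene] - pos[current[-1]] <= delta:
            current.append(gene)
        else:
            runs.append(current)
            current = [gene]
    runs.append(current)
    return runs


def gene_teams(pos_a, pos_b, delta):
    """Return the gene teams of two chromosomes as a list of frozensets.

    pos_a and pos_b map gene name -> position on each chromosome
    (assumption: every gene occurs at most once per chromosome).
    """
    common = set(pos_a) & set(pos_b)
    if not common:
        return []
    teams, stack = [], [common]
    while stack:
        candidate = stack.pop()
        # A gap > delta on either chromosome separates any team inside the
        # candidate, so split there and re-examine each part.
        for pos in (pos_a, pos_b):
            runs = split_runs(candidate, pos, delta)
            if len(runs) > 1:
                stack.extend(runs)
                break
        else:
            # No gap on either chromosome: the candidate is a maximal team.
            teams.append(frozenset(candidate))
    return teams


if __name__ == "__main__":
    chrom_a = {"a": 1, "b": 2, "c": 3, "d": 10, "e": 11}
    chrom_b = {"a": 5, "b": 1, "c": 6, "d": 7, "e": 8}
    for team in gene_teams(chrom_a, chrom_b, delta=2):
        print(sorted(team))
```

Running the sketch for increasing values of δ shows teams merging as the threshold grows, which is the nesting that a gene team tree records.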

12.
13.
MOTIVATION: Genome-wide high density SNP association studies are expected to identify various SNP alleles associated with different complex disorders. Understanding the biological significance of these SNP alleles in the context of existing literature is a major challenge since existing search engines are not designed to search literature for SNPs or other genetic markers. The literature mining of gene and protein functions has received significant attention and effort while similar work on genetic markers and their related diseases is still in its infancy. Our goal is to develop a web-based tool that facilitates the mining of Medline literature related to genetic studies and gene/protein function studies. Our solution consists of four main function modules: (1) identification of different types of genetic markers or genetic variations in Medline records; (2) distinguishing positive versus negative linkage or association between genetic markers and diseases; (3) integrating marker genomic location data from different databases to enable the retrieval of Medline records related to markers in the same linkage disequilibrium region; and (4) a web interface called MarkerInfoFinder to search, display, sort and download Medline citation results. Tests using published data suggest MarkerInfoFinder can significantly increase the efficiency of finding genetic disorders and their underlying molecular mechanisms. The functions we developed will also be used to build a knowledge base for genetic markers and diseases. AVAILABILITY: The MarkerInfoFinder is publicly available at: http://brainarray.mbni.med.umich.edu/brainarray/datamining/MarkerInfoFinder.

14.
15.
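周海廷 (Zhou Haiting). 《生物技术》 (Biotechnology), 2002, 12(5): 33-34
Describes the hidden Markov model (HMM) in non-mathematical language, and introduces the principles of using HMMs for gene identification along with the more commonly used gene identification programs built on HMMs.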

16.
Interpolated Markov models for eukaryotic gene finding.   Total citations: 21 (self-citations: 0, citations by others: 21)
Computational gene finding research has emphasized the development of gene finders for bacterial and human DNA. This has left genome projects for some small eukaryotes without a system that addresses their needs. This paper reports on a new system, GlimmerM, that was developed to find genes in the malaria parasite Plasmodium falciparum. Because the gene density in P. falciparum is relatively high, the system design was based on a successful bacterial gene finder, Glimmer. The system was augmented with specially trained modules to find splice sites and was trained on all available data from the P. falciparum genome. Although a precise evaluation of its accuracy is impossible at this time, laboratory tests (using RT-PCR) on a small selection of predicted genes confirmed all of those predictions. With the rapid progress in sequencing the genome of P. falciparum, the availability of this new gene finder will greatly facilitate the annotation process.
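The title refers to interpolated Markov models, which blend predictions from Markov chains of several orders so that long contexts are trusted only when they were seen often enough in training. The Python sketch below illustrates that general idea under loud simplifications: the class name `SimpleIMM`, the `full_trust` threshold, and the count-based weight `min(1, count / full_trust)` are assumptions for this example and are not the weighting scheme used by Glimmer or GlimmerM.

```python
from collections import defaultdict


class SimpleIMM:
    """Toy interpolated Markov model over the DNA alphabet ACGT."""

    def __init__(self, max_order=3, full_trust=20):
        self.max_order = max_order
        # Hypothetical threshold: contexts seen this often get weight 1.
        self.full_trust = full_trust
        # counts[context][base] = times base followed context in training.
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                # Record the base under every context length from 0 to max_order.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[seq[i - k:i]][base] += 1

    def prob(self, context, base):
        """Interpolated probability of base given the preceding context."""
        p = 0.25  # uniform fallback below order 0
        for k in range(self.max_order + 1):
            if k > len(context):
                break
            ctx = context[len(context) - k:]
            seen = self.counts.get(ctx)
            total = sum(seen.values()) if seen else 0
            if total == 0:
                continue  # unseen context: keep the lower-order estimate
            weight = min(1.0, total / self.full_trust)
            p = weight * seen.get(base, 0) / total + (1.0 - weight) * p
        return p


if __name__ == "__main__":
    imm = SimpleIMM(max_order=3, full_trust=20)
    imm.train(["ACGTACGTACGT"])
    for b in "ACGT":
        print(b, round(imm.prob("ACG", b), 4))
```

A gene finder built on this idea would train separate models on coding and non-coding sequence and compare the scores they assign to a candidate region.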

17.
The human gene HSRAD51/RecA homologue has been investigated as a possible candidate gene involved in Bloom's syndrome. No mutations were found in the cDNA isolated from three different Bloom's syndrome cell lines, thus excluding the possibility that HSRAD51 is directly involved in the syndrome. Other possible candidates are discussed.

18.
Published analyses of the sequences of three genes from the 1918 Spanish influenza virus have cast doubt on the theory that it came from birds immediately before the pandemic. They showed that the virus was of the H1N1 subtype lineage but more closely related to mammal-infecting strains than any known bird-infecting strain. They provided no evidence that the virus originated by gene reassortment nor that the virus was the direct ancestor of the two lineages of H1N1 viruses currently found in mammals; one that mostly infects human beings, the other pigs. The unusual virulence of the virus and why it produced a pandemic have remained unsolved. We have reanalysed the sequences of the three 1918 genes and found conflicting patterns of relatedness in all three. Various tests showed that the patterns in its haemagglutinin (HA) gene were produced by true recombination between two different parental HA H1 subtype genes, but that the conflicting patterns in its neuraminidase and non-structural-nuclear export proteins genes resulted from selection. The recombination event that produced the 1918 HA gene probably coincided with the start of the pandemic, and may have triggered it.

19.
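MicroRNAs (miRNAs) are endogenous, non-coding small RNA molecules about 22 nucleotides (nt) long. As important gene regulators, miRNAs suppress the expression of their target mRNAs through multiple mechanisms. Abnormal miRNA expression and/or function is closely associated with many human diseases. Research on miRNAs and human disease has therefore attracted much attention in recent years, and finding miRNA genes has become particularly important. Earlier studies of miRNA genes were limited in scope and yielded few new miRNA genes. At present, additions to the miRNA gene catalogue depend mainly on the development of sophisticated computational tools. Although this development has produced a variety of simple methods for finding miRNA genes, these have still not contributed effectively to extending the catalogue. After a brief introduction to the biogenesis, function and mechanisms of action of animal and plant miRNAs, this review focuses on computational methods for finding animal and plant miRNA genes, with the aim of providing a useful reference for exploring new computational methods for miRNA gene finding in animals and plants.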

20.
The gene is not dead, merely orphaned and seeking a home   Total citations: 2 (self-citations: 0, citations by others: 2)
SUMMARY: Despite announcements and obituaries, news of the death of the gene has been greatly exaggerated, or so says the gene as it struggles to survive and find a safe haven from which to steer its course through development and evolution. In this short piece, I consider recent claims that the gene is dead. I conclude that the gene is alive and well, living and functioning in the cell, which is both its natural home and a fundamental unit of evo-devo.
