Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Using the transcriptome to annotate the genome   Total citations: 35, self-citations: 0, citations by others: 35
A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified approximately 15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another approximately 10,000-20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed. As the in silico approaches identified a smaller number of genes than anticipated, we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method--called long serial analysis of gene expression (LongSAGE), an adaption of the original SAGE approach--that can be used to rapidly identify novel genes and exons.

2.
We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates for false positive rates are based on simulations which preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously-published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions and furthermore depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 Kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA.

3.
Improved microbial gene identification with GLIMMER.   Total citations: 62, self-citations: 13, citations by others: 49
The GLIMMER system for microbial gene identification finds approximately 97-98% of all genes in a genome when compared with published annotation. This paper reports on two new results: (i) significant technical improvements to GLIMMER that improve its accuracy still further, and (ii) a comprehensive evaluation that demonstrates that the accuracy of the system is likely to be higher than previously recognized. A significant proportion of the genes missed by the system appear to be hypothetical proteins whose existence is only supported by the predictions of other programs. When the analysis is restricted to genes that have significant homology to genes in other organisms, GLIMMER misses <1% of known genes.

4.
H Dodemont, D Riemer, K Weber. The EMBO Journal, 1990, 9(12): 4083-4094
The structure of the single gene encoding the cytoplasmic intermediate filament (IF) proteins in non-neuronal cells of the gastropod Helix aspersa is described. Genomic and cDNA sequences show that the gene is composed of 10 introns and 11 exons, spanning greater than 60 kb of DNA. Alternative RNA processing accounts for two mRNA families which encode two IF proteins differing only in their C-terminal sequence. The intron/exon organization of the Helix rod domain is identical to that of the vertebrate type III IF genes in spite of low overall protein sequence homology and the presence of an additional 42 residues in coil 1b of the invertebrate sequence. Intron position homology extends to the entire coding sequence comprising both the rod and tail domains when the invertebrate IF gene is compared with the nuclear lamin LIII gene of Xenopus laevis presented in the accompanying report of Döring and Stick. In contrast the intron patterns of the tail domains of the invertebrate IF and the lamin genes differ from those of the vertebrate type III genes. The combined data are in line with an evolutionary descent of cytoplasmic IF proteins from a nuclear lamin-like progenitor and suggest a mechanism for this derivation. The unique position of intron 7 in the Helix IF gene indicates that the archetype IF gene arose by the elimination of the nuclear localization sequence due to the recruitment of a novel splice site. The presumptive structural organization of the archetype IF gene allows predictions with respect to the later diversification of metazoan IF genes. Whereas models proposing a direct derivation of neurofilament genes seem unlikely, the earlier speculation of an mRNA transposition mechanism is compatible with current results.

5.
6.
Homocracy, a term referring to shared regulatory gene expression patterns between organs in different animals, was introduced recently in order to prevent inappropriate inference of organ homology based on gene expression data. Non-homologous structures expressing homologous genes, and homologous structures expressing non-homologous genes illustrate that gene expression data is not sufficient on its own to identify morphological homology. However, gene expression data might be useful in testing hypotheses of organ homology, because parsimony can be applied on changes in the relation between expression of orthologous regulatory genes and the formation of homologous organs. A method of testing organ homology hypotheses with respect to change in regulatory gene expression required within a particular phylogenetic context is presented. Edited by R.J. Sommer

7.
The laboratory rat, Rattus norvegicus, and the laboratory mouse, Mus musculus, are key animal models in biomedical research. A deeper understanding of the genetic interrelationships between Homo sapiens and these two rodent species is desirable for extending the usefulness of the animal models. We present comprehensive rat-human and rat-mouse comparative maps, based on 1090 gene homology assignments available for rat genes. Radiation hybrid, FISH, and zoo-FISH mapping data have been integrated to produce comparative maps that are estimated to comprise 83-100% of the conserved regions between rat and mouse and 66-82% of the conserved regions between rat and human. The rat-mouse zoo-FISH analysis, supported by data for individual genes, revealed nine previously undetected conserved regions compared to earlier reports. Since there is almost complete genome coverage in the rat-mouse comparative map, we conclude that it is feasible to make accurate predictions of gene positions in the rat based on gene locations in the mouse.

8.
It is standard practice, whenever a researcher finds a new gene, to search databases for genes that have a similar sequence. It is not standard practice, whenever a researcher finds a new gene, to search for genes that have similar expression (co-expression). Failure to perform co-expression searches has led to incorrect conclusions about the likely function of new genes, and has led to wasted laboratory attempts to confirm functions incorrectly predicted. We present here the example of Glia Maturation Factor gamma (GMF-gamma). Despite its name, it has not been shown to participate in glia maturation. It is a gene of unknown function that is similar in sequence to GMF-beta. The sequence homology and chromosomal location led to an unsuccessful search for GMF-gamma mutations in glioma. We examined GMF-gamma expression in 1432 human cDNA libraries. Highest expression occurs in phagocytic, antigen-presenting and other hematopoietic cells. We found GMF-gamma mRNA in almost every tissue examined, with expre

9.
Hunting for genes by functional screens   Total citations: 1, self-citations: 0, citations by others: 1
Advances in high throughput sequencing technologies have led to an explosion of sequence information available for today's researchers. Efforts in the emerging next phase of the genomic era are focusing on the assignment of function to genes uncovered by genome sequencing programs. The main approaches include high throughput mutagenesis, predictions based on homology in primary sequence, microarray and proteomics. Despite the variety of strategies applied, only 30% of predicted human genes have any function assigned. There is a need, therefore, for additional tools to overcome some of the limitations of existing techniques. In this review we discuss some recent developments and their impact on gene function annotation, especially as they relate to the elucidation of signalling cascades activated by cytokines and growth factors.

10.
The discoidin I genes of Dictyostelium form a small, co-ordinately regulated multigene family. We have sequenced and compared the upstream regions of the DiscI-alpha, -beta and -gamma genes. For the most part the upstream regions of the three genes are non-homologous. The upstream sequences of the beta and gamma genes are exceedingly A + T-rich, while those of the alpha gene are less so. All three genes have a relatively G + C-rich region 20 to 40 base-pairs in length, found approximately 200 base-pairs 5' to the messenger RNA start site. This G + C-rich region 5' to the beta and gamma genes is flanked by short inverted repeats. Within this region, there is an 11 base-pair exact homology between the alpha and gamma genes, and a less perfect homology between these genes and the beta gene. The homology is flanked at a short distance by interspersed G and T residues. The gamma gene is greater than 90% A + T for greater than 800 base-pairs upstream. Further upstream there is a G + C-rich region that is also found inverted approximately 3.5 × 10^3 base-pairs away. The gamma and beta genes are tandemly linked, and the entire approximately 500 base-pair intergene region between the 3' end of the gamma gene and the 5' end of the beta gene is A + T-rich (approximately 90%) with the exception of the homology region 5' to the gamma gene. We demonstrate also the presence of a discoidin I pseudogene fragment having only 139 base-pairs of discoidin homology with greater than 8% mismatch. It is flanked upstream by five 39 base-pair G + C-rich repeats, and downstream by sequences that are extremely A + T-rich. We discuss the possible significance of the conserved G + C-rich structures on discoidin I gene expression.

11.
A cry1Ab-type gene was cloned from a new isolate of Bacillus thuringiensis by PCR. When its restriction pattern was compared with that of known genes, it was found to have an additional restriction site for ClaI. Nucleotide sequencing and homology search revealed that the toxin shared 95% homology with the known Cry1Ab proteins, as compared to more than 98% homology among the other reported Cry1Ab toxins. The gene encoded a sequence of 1,177 amino acids, compared to 1,155 amino acids encoded by all the other 16 cry1Ab genes reported so far. An additional stretch of 22 amino acids after the amino acid G793 in the new toxin sequence showed 100% homology with several other cry genes within the cry1 family. Homology search indicated that the new cry1Ab-type gene might have resulted from nucleotide rearrangement between cry1Ab and cry1Aa/cry1Ac genes.

12.
Nucleotide sequence of the immunity and lysis region of the ColE9-J plasmid   Total citations: 8, self-citations: 0, citations by others: 8
We have determined the nucleotide sequence of a 1500 bp fragment of the ColE9-J plasmid which encodes colicin E9 immunity and colicin E5 immunity and contains two lys genes. Open reading frames corresponding to the four genes have been located and their position confirmed by transposon mutagenesis of sub-clones of the ColE9-J plasmid. The E9imm gene shows 69% homology at both the nucleotide and the amino acid level to the previously sequenced E2imm gene. The E5imm gene shows little homology to any other E colicin immunity gene which has been sequenced. The lys gene distal to the 3' end of the E5imm gene shows considerable sequence homology to all other previously sequenced E colicin lys genes. The lys gene distal to the 3' end of the E9imm gene is identical to the pColE2 and pColE3 lys genes for the first 59 nucleotides but encodes a much smaller gene product than any other lys gene which has been sequenced. The two lys genes sequenced here are exceptions to Shepherd's rule concerning the number of RNY codons in the three possible reading frames.
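
The abstract above does not define Shepherd's rule. As an assumption, it is read here as the expectation that the true coding frame tends to contain more RNY codons (purine, any base, pyrimidine) than the other two frames. The following is a minimal sketch of the count that such a comparison would rely on; the function name `rny_counts` is hypothetical and is not taken from the paper.

```python
import re

# RNY codon: R = purine (A or G), N = any base, Y = pyrimidine (C or T).
RNY = re.compile(r"[AG][ACGT][CT]")


def rny_counts(seq):
    """Count RNY codons in each of the three forward reading frames.

    Returns a list [frame0, frame1, frame2]. RNA input is accepted
    (U is treated as T); incomplete trailing codons are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = []
    for frame in range(3):
        # Step through complete codons starting at this frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(1 for codon in codons if RNY.fullmatch(codon)))
    return counts
```

Under the stated assumption, a gene would count as an exception when its annotated frame does not have the highest of the three counts; the abstract gives no threshold, so none is applied here.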

13.
Summary: The small subunit (RbcS) of ribulose bisphosphate carboxylase (RuBPCase) is encoded by eight genes in Petunia (Mitchell). These genes can be divided into three subfamilies (51, 117 and 71) based upon hybridization to three petunia rbcS cDNA clones. The nucleotide sequence of six of the eight petunia rbcS genes is presented here and the structure of the genes is discussed with respect to their genomic linkage and their expression levels in petunia leaf tissue. The rbcS genes belonging to the same subfamily encode an identical mature RbcS polypeptide; however, the different subfamilies encode distinguishable polypeptides. All the genes, except one, contain two introns within the mature subunit coding region; one gene contains one extra intron within the coding region. There are large regions of nucleotide sequence homology within the introns of genes within a subfamily, but significantly less homology between the introns of genes of different subfamilies. A complex pattern of homology within the multiple genes of the 51 subfamily is observed. There are regions within these genes which share high levels of sequence homology; this homology does not extend throughout the whole gene and the regions of homology do not always occur in adjacent genes. Two 3' rbcS gene fragments which we isolated from the petunia genome show high levels of homology to two of the intact rbcS genes.

14.
15.
We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.

16.
We describe the intron-exon structure of and the homology among the four alpha-tubulin genes of Drosophila melanogaster. Three of the genes share a highly conserved 1.3 kb sequence which corresponds to most of the RNA complementary portion of the genes. The fourth gene is different. Its 5' half has weak homology and its 3' half has moderate homology to the other three genes. The homology maps were first determined by electron microscopy of heteroduplexes between pairs of genes. Higher resolution maps were then obtained by gel analysis of heteroduplexes that had been digested with S1 nuclease.

17.
In order to understand the coordinate regulation between the alpha-like and beta-like globins during the developmental switches in hemoglobin synthesis, we have studied the rabbit alpha-like globin gene family. A cluster of six linked genes arranged 5'-zeta 1-alpha 1-theta 1-zeta 2-zeta 3-theta 2-3' has been isolated as a set of overlapping clones from a library of rabbit genomic DNA. Blot-hybridization analysis of genomic DNA not only confirms this linkage arrangement but also reveals the presence of additional zeta and theta genes. We propose that this gene cluster was generated by a block duplication of a set of alpha-like genes; the proposed duplication unit is zeta-zeta-alpha-theta. Further duplications of a zeta-zeta-theta set are also proposed to have occurred. As expected for a duplicated locus, the rabbit alpha-like gene cluster contains long blocks of internal homology. The Z homology block is about 7.2 kilobase pairs long and contains the zeta genes; the T homology block is about 4.7 kilobase pairs long and contains a theta gene. Surprisingly, both Z and T homology blocks are flanked by a common junction sequence (J) which contains a region very similar to the 3'-untranslated sequence of an alpha-globin gene. Analysis of the J sequences suggests a recombination mechanism by which the alpha gene could have been deleted from the second set of genes in the cluster (zeta 2-zeta 3-theta 2). The relationships among the genes in characterized alpha-like gene clusters in mammals are summarized. The rabbit gene cluster differs from those of other mammals principally in the loss of a gene orthologous to the human psi alpha 1 and in the block duplication of the zeta-zeta-alpha-theta gene set.

18.
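Cloning and sequence analysis of cold-induced genes in the straw mushroom (Volvariella volvacea)   Total citations: 10, self-citations: 0, citations by others: 10
Cold-specific DNA fragments of the straw mushroom were isolated by differential display and verified by Southern hybridization against cDNA from normal and from cold-induced straw mushroom, yielding confirmed cold-specific fragments. These fragments were DIG-labeled by PCR and used as probes to screen a cDNA library from cold-treated straw mushroom; four positive clones were obtained and each was sequenced. Sequence homology comparison showed that the Cor3 gene has high homology with S-adenosyl-L-homocysteine hydrolase and the Cor4 gene has high homology with 40S ribosomal protein S9; these two genes may be related to the low-temperature autolysis of the straw mushroom. The Cor1 gene shows homology with a conserved hypothetical protein of Neurospora, and the Cor2 gene shows homology with a coenzyme A ligase. Semi-quantitative RT-PCR showed that Cor1 and Cor2 are not expressed under normal conditions but are expressed after cold treatment, whereas Cor3 and Cor4 are expressed under normal conditions and their expression increases after cold treatment.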

19.
T Toda, S Cameron, P Sass, M Zoller, M Wigler. Cell, 1987, 50(2): 277-287
We have isolated three genes (TPK1, TPK2, and TPK3) from the yeast S. cerevisiae that encode the catalytic subunits of the cAMP-dependent protein kinase. Gene disruption experiments demonstrated that no two of the three genes are essential by themselves but at least one TPK gene is required for a cell to grow normally. Comparison of the predicted amino acid sequences of the TPK genes indicates conserved and variable domains. The carboxy-terminal 320 amino acid residues have more than 75% homology to each other and more than 50% homology to the bovine catalytic subunit. The amino-terminal regions show no homology to each other and are heterogeneous in length. The TPK1 gene carried on a multicopy plasmid can suppress both a temperature-sensitive ras2 gene and adenylate cyclase gene.

20.
Among 30 conjugative plasmids of enteric bacteria from 23 incompatibility (Inc) groups, we found 19 (from 12 Inc groups) which can complement defects caused by a defective single-stranded DNA-binding protein of Escherichia coli K-12. The genes which are responsible for the complementation from three of these plasmids (Inc groups I1, Y, and 9) were cloned. These genes showed extensive homology with each other and with the E. coli F factor ssb gene (formerly denoted ssf), which codes for a single-stranded DNA binding protein. The proteins coded for by the cloned genes bound tightly to single-stranded DNA. Six other ssb⁻-complementing plasmids were tested for homology to the F factor ssb gene, and all of these showed homology, as did one of the ssb⁻-noncomplementing plasmids. Plasmids from a total of 13 different Inc groups of enteric bacteria were found to be likely to have genes with some homology to the ssb gene of the F factor. For plasmids from several different Inc groups, we found no evidence for strong homology with ssb of the F factor.
