Similar Documents
2.
In a number of programs for predicting gene structure in the genomic sequences of higher eukaryotes, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from this pool as sequences of nonoverlapping, frame-compatible exons. Genes are scored as a function of the scores of their assembled exons, and the highest-scoring candidate gene is taken to be the gene most likely encoded by the query DNA sequence. For additive gene-scoring functions, currently available algorithms for finding this highest-scoring gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. The existing quadratic algorithms rely on the fact that, while scanning the set of predicted exons, the highest-scoring gene ending in a given exon can be obtained by appending the exon to the best of the highest-scoring genes ending at each compatible preceding exon. The algorithm presented here relies on the simple observation that this best preceding gene can be stored and updated rather than recomputed for every exon. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. In addition, the algorithm described here does not assume an underlying gene-structure model. Instead, valid gene structures are defined externally in the so-called Gene Model, which simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows great flexibility in formulating the gene identification problem; in particular, it allows multiple-gene, two-strand predictions and the inclusion of gene features other than coding exons (such as promoter elements) in valid gene structures.
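The store-and-update idea can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the published algorithm: each exon is reduced to hypothetical fields `start` (acceptor side), `end` (donor side), `score`, `frame_in` and `frame_out`; two exons are treated as compatible when the first ends before the second starts and `frame_out` of the first equals `frame_in` of the second; and no external Gene Model is consulted. The main loop advances an acceptor-ordered scan and a donor-ordered scan together, touching each exon a constant number of times; the two initial sorts cost O(n log n) and can be skipped if the exon pool already arrives ordered by acceptor and by donor position.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Exon:
    start: int      # acceptor-side coordinate (first base of the exon)
    end: int        # donor-side coordinate (last base of the exon)
    score: float
    frame_in: int   # hypothetical frame label required on entry (0-2)
    frame_out: int  # hypothetical frame label left on exit (0-2)

def assemble(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Highest-scoring chain of nonoverlapping, frame-compatible exons (additive score)."""
    n = len(exons)
    if n == 0:
        return 0.0, []
    by_acceptor = sorted(range(n), key=lambda i: exons[i].start)
    by_donor = sorted(range(n), key=lambda i: exons[i].end)
    gene_score = [0.0] * n                  # best gene ending in exon i
    prev: List[Optional[int]] = [None] * n  # back pointer for traceback
    best = [float("-inf")] * 3              # stored best gene per frame_out value
    best_idx: List[Optional[int]] = [None] * 3
    d = 0                                   # position of the donor-side scan
    for i in by_acceptor:
        e = exons[i]
        # Fold in every exon that ends before e starts. Such an exon also starts
        # before e, so its gene_score is already final from the acceptor scan.
        while d < n and exons[by_donor[d]].end < e.start:
            k = by_donor[d]
            f = exons[k].frame_out
            if gene_score[k] > best[f]:
                best[f], best_idx[f] = gene_score[k], k
            d += 1
        # Append e to the stored best compatible gene, or start a new gene
        # when no compatible predecessor improves the score.
        if best[e.frame_in] > 0:
            gene_score[i] = e.score + best[e.frame_in]
            prev[i] = best_idx[e.frame_in]
        else:
            gene_score[i] = e.score
    top = max(range(n), key=lambda i: gene_score[i])
    chain: List[Exon] = []
    cur: Optional[int] = top
    while cur is not None:
        chain.append(exons[cur])
        cur = prev[cur]
    return gene_score[top], chain[::-1]
```

Storing one best gene per frame value keeps the update constant-time; a real Gene Model with more feature types would need one stored best per allowed upstream feature instead.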

3.
Aamodt E  Shen L  Marra M  Schein J  Rose B  McDermott JB 《Gene》2000,243(1-2):67-74
The Caenorhabditis briggsae homologue of the Caenorhabditis elegans pag-3 gene was cloned and sequenced. When transformed into a C. elegans pag-3 mutant, the C. briggsae pag-3 gene rescued the pag-3 reverse kinker and lethargic phenotypes. The C. elegans pag-3 gene fused to lacZ was expressed in the same pattern in C. elegans and C. briggsae. Unlike many gene homologues compared between C. elegans and C. briggsae, pag-3 showed extensive sequence conservation in the non-coding regions upstream of the exons, in several of the introns, and in the downstream non-coding region. Furthermore, the splice acceptor and splice donor sites were conserved, and the sizes of the introns and exons were surprisingly similar. The predicted protein sequence of C. briggsae PAG-3 was 85% identical to that of C. elegans PAG-3. Because so much of the non-coding region of pag-3 is conserved, the regulation of pag-3 may be quite complex, involving the binding of many trans-acting factors. These results indicate that the pag-3 gene sequence, its expression, and its function are evolutionarily conserved.

4.
Theoretical Prediction of Gene Sequences and Splice Sites in Arabidopsis and Nematode
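The genomes of Arabidopsis thaliana and Caenorhabditis elegans were divided into three classes of sequence: exons, introns, and intergenic regions. Diversity sources were constructed using the probabilities of 64, 40, and 20 triplet types, respectively, as signal parameters, and the class of each sequence was predicted from the increment of diversity. The overall prediction accuracy across all A. thaliana chromosomes reached 82.19% on the standard set and 87.95% on the test set; for C. elegans, the corresponding figures were 79.67% and 81.93%. In addition, the exons of each genome were divided into three classes. The three positions of the triplets near exon splice sites, translation start sites, and translation stop sites were treated as three sub-chains, and a diversity source was built from 12 parameters of each sub-chain. When the increment of diversity was used to predict these three sequence types, prediction accuracy exceeded 80% in every case.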
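The increment-of-diversity classifier can be sketched in Python, assuming the standard definitions: for a source with counts n_i summing to N, the diversity is D = N ln N - Σ n_i ln n_i, the increment of diversity between sources X and Y is ID(X, Y) = D(X + Y) - D(X) - D(Y), and a query is assigned to the class whose source gives the smallest increment. For simplicity this sketch counts all 64 overlapping triplets, whereas the study selected 64, 40, and 20 triplet types; the helper names and the toy training data are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

# All 64 DNA triplets in a fixed order (assumption: overlapping triplets, ACGT only).
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]

def diversity(counts: List[float]) -> float:
    """D = N ln N - sum_i n_i ln n_i; zero counts contribute nothing."""
    n = sum(counts)
    d = n * math.log(n) if n > 0 else 0.0
    return d - sum(c * math.log(c) for c in counts if c > 0)

def triplet_counts(seq: str) -> List[int]:
    """Overlapping triplet counts in TRIPLETS order; triplets with other letters are ignored."""
    c = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [c[t] for t in TRIPLETS]

def increment_of_diversity(x: List[float], y: List[float]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar composition."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)

def classify(seq: str, sources: Dict[str, List[int]]) -> str:
    """Assign seq to the class whose source gives the minimum increment of diversity."""
    x = triplet_counts(seq)
    return min(sources, key=lambda name: increment_of_diversity(x, sources[name]))
```

In practice each source would be the summed triplet counts of the training sequences for one class (exon, intron, intergenic), and classification accuracy would be measured on a held-out set.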

9.

Background  

ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction.

12.
Exon skipping that accompanies exonic mutation might be caused by an effect of the mutation on pre-mRNA secondary structure. Previous attempts to associate predicted secondary structure of pre-mRNA with exon skipping have been hindered by a small number of available mutations, sub-optimal structures, or weak effects on exon skipping. This report identifies more extensive sets of mutations from the human and hamster Hprt genes whose association with exon skipping is clear. Optimal secondary structures of the wild-type and mutant pre-mRNA surrounding each exon were predicted by energy minimization and compared using energy dot plots. A significant association was found between the occurrence of exon skipping and the disruption of a stem containing the acceptor-site consensus sequences of exon 8 of the human Hprt gene. However, no change in secondary structure was associated with skipping of exon 4 of the hamster Hprt gene. Using updated energy parameters, we found a different structure from that previously reported for exon 2 of the hamster Hprt gene, and, in contrast to the previous report, no significant association between predicted structural changes and skipping of exon 2. For all three Hprt exons studied, deoxythymidine substitutions were significantly more frequent among mutations accompanied by exon skipping than among mutations without exon skipping. For exon 8, deoxythymidine substitution was also associated with structural changes in the stem containing the acceptor-site consensus sequences. For exon 51 of the human fibrillin gene, structural differences from wild type were predicted for all four mutations accompanied by exon skipping, but not for a single mutation without exon skipping. Our results suggest that both primary and secondary pre-mRNA structure contribute to the definition of Hprt exons, which may involve exonic splicing enhancers.

15.
Gene identification in genomic DNA from eukaryotes is complicated by the vast combinatorial possibilities of potential exon assemblies. If the gene encodes a protein that is closely related to known proteins, gene identification is aided by matching similarity of potential translation products to those target proteins. The genomic DNA and protein sequences can be aligned directly by scoring the implied residues of in-frame nucleotide triplets against the protein residues in conventional ways, while allowing for long gaps in the alignment corresponding to introns in the genomic DNA. We describe a novel method for such spliced alignment. The method derives an optimal alignment based on scoring for both sequence similarity of the predicted gene product to the protein sequence and intrinsic splice site strength of the predicted introns. Application of the method to a representative set of 50 known genes from Arabidopsis thaliana showed significant improvement in prediction accuracy compared to previous spliced alignment methods. The method is also more accurate than ab initio gene prediction methods, provided sufficiently close target proteins are available. In view of the fast growth of public sequence repositories, we argue that close targets will be available for the majority of novel genes, making spliced alignment an excellent practical tool for high-throughput automated genome annotation.
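A toy dynamic program can make spliced alignment concrete. The Python sketch below is an assumption-laden illustration, not the method described in the abstract: it scores each translated codon against a protein residue with a flat match/mismatch score instead of a substitution matrix, replaces the splice-site strength model with a fixed `intron` penalty for any GT...AG intron, allows introns only between whole codons (phase 0), and expects uppercase ACGT input. The names `spliced_align` and `translate` and all scoring values are hypothetical.

```python
NEG = float("-inf")
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(codon: str) -> str:
    """Translate one uppercase ACGT codon with the standard genetic code."""
    idx = BASES.index(codon[0]) * 16 + BASES.index(codon[1]) * 4 + BASES.index(codon[2])
    return AMINO[idx]

def spliced_align(genomic: str, protein: str, match: float = 2.0, mismatch: float = -1.0,
                  gap: float = -2.0, intron: float = -3.0) -> float:
    """Best global score aligning genomic DNA to a protein, allowing GT...AG introns."""
    n, m = len(genomic), len(protein)
    # M[i][j]: genomic[:i] aligned to protein[:j], ending in exon at a codon boundary.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    # I[i][j]: same prefixes, but currently inside an open intron.
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # Intron state: extend by one nucleotide, or open at a GT donor.
            if i >= 1:
                I[i][j] = I[i - 1][j]
            if i >= 2 and genomic[i - 2:i] == "GT":
                I[i][j] = max(I[i][j], M[i - 2][j] + intron)
            best = M[i][j]
            if i >= 3 and j >= 1:  # codon aligned to a residue
                s = match if translate(genomic[i - 3:i]) == protein[j - 1] else mismatch
                best = max(best, M[i - 3][j - 1] + s)
            if j >= 1:             # residue with no codon
                best = max(best, M[i][j - 1] + gap)
            if i >= 3:             # codon with no residue
                best = max(best, M[i - 3][j] + gap)
            if i >= 2 and genomic[i - 2:i] == "AG":  # close intron at an AG acceptor
                best = max(best, I[i][j])
            M[i][j] = best
    return M[n][m]
```

For example, `spliced_align("ATGAAAGTCCCCCCAGTGG", "MKW")` aligns ATG/M and AAA/K, skips the 10-nt GT...AG intron, and aligns TGG/W. Replacing the fixed `intron` penalty with donor and acceptor site scores would move the sketch closer to the scoring idea in the abstract.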

16.
Liang H  Guo W  Nagarajan L 《Genomics》2000,66(2):226-228
A novel C2H2 zinc finger gene, ZNF277, has been localized to human chromosome 7q31.1. The gene is encoded by 12 exons in a genomic fragment of >100 kb between the microsatellite markers D7S523 and D7S471, a region deleted in a number of malignancies. The predicted open reading frame (ORF) of 438 amino acids shows an overall homology of 50% to the putative ORF F46B6.7 of Caenorhabditis elegans. The presence of a 30-amino-acid coiled-coil domain in both the C. elegans ORF F46B6.7 and ZNF277 suggests functional similarities. ESTs for the murine orthologue ZFP277 are found in early embryonic stem cells, 16-cell-stage embryos, and blastocysts. The evolutionary conservation and the expression profile suggest that ZNF277 is a critical regulator of development and differentiation.

18.
A Horii  M Emi  N Tomita  T Nishide  M Ogawa  T Mori  K Matsubara 《Gene》1987,60(1):57-64
We have determined the entire structure of the human pancreatic alpha-amylase (Amy2) gene. It is approx. 9 kb long and is split into ten exons. This gene (amy2) is very similar to the human salivary alpha-amylase (Amy1) gene [Nishide et al. Gene 41 (1986a) 299-304] in nucleotide sequence and in the size and location of the exons. The major difference is that amy1 has one extra exon on the 5' side. Other differences are at the 5' border of exon 1 and the 3' border of exon 10. The close similarity of these two genes, compared with the mouse pancreatic and salivary amylase genes, suggests that during evolution the divergence into the two amylase genes may have occurred after the divergence of mice and humans.

20.
The Caenorhabditis elegans genome contains more than 60 cytochrome P450 (CYP) genes. The exon-intron organizations of all of the available and potentially active C. elegans CYP genes were inferred by a newly developed program that predicts protein-coding exons from the alignment of a genomic DNA sequence with a protein profile. Based on the predicted amino acid sequences, all of the C. elegans CYP genes except one were classified into three groups, closely related to the mammalian drug-metabolizing P450 gene families CYP2, CYP3, and CYP4. The gene structures were strikingly divergent within each group: 20, 10, and 5 unique gene organizations were identified among 40, 18, and 5 genes in the CYP2-, CYP3-, and CYP4-related groups, respectively. The degree of divergence in gene organization was strongly correlated with that in the amino acid sequences of the encoded proteins, and changes in intron insertion sites were estimated to occur at a minimum rate about 90 times lower than that of amino acid substitutions. Parsimony analyses suggested that frequent loss and gain of introns has occurred during the evolution of CYP genes in each group after the divergence of nematodes, arthropods, and deuterostomes. Few, if any, cases of intron sliding were evident, and a model that did not allow intron insertions was highly inconsistent with the observations. All of these findings are better explained by the intron-late view than by the intron-early view.
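The abstract does not say which parsimony method was used. As a generic illustration only, Fitch's small-parsimony algorithm counts the minimum number of intron gain/loss events needed to explain intron presence (1) or absence (0) at one aligned insertion site on a given rooted binary tree; summing over sites gives a total. The nested-tuple tree encoding, the gene names, and the function names are hypothetical, and gains and losses are weighted equally here, unlike Dollo-style models that forbid repeated gains.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (candidate ancestral states, minimum number of changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: one gain or loss must sit on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1

def min_intron_changes(tree: Tree, sites: List[Dict[str, int]]) -> int:
    """Sum the minimum gain/loss counts over independent intron insertion sites."""
    return sum(fitch(tree, site)[1] for site in sites)
```

For a tree `(("cyp_a", "cyp_b"), ("cyp_c", "cyp_d"))`, an intron shared only by `cyp_a` and `cyp_b` needs one event, whereas one present only in `cyp_a` and `cyp_c` needs two, which is the kind of pattern a parsimony argument reads as repeated gain or loss.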
