Similar articles
Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation. RESULTS: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene. AVAILABILITY: The Java implementation is available for download from http://www.bioinformatics.uwaterloo.ca/software.
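
The abstract above reports exon sensitivity and exon specificity without defining them. A minimal sketch follows, assuming the conventional exact-match definitions used in gene-prediction benchmarks (sensitivity = correctly predicted exons / annotated exons; specificity = correctly predicted exons / predicted exons). The function name `exon_accuracy` and the coordinates are hypothetical illustrations, not taken from the paper.

```python
def exon_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for exact-match exon comparison.

    Exons are (start, end) tuples; an exon counts as correct only if both
    boundaries match an annotated exon exactly (an assumed convention).
    """
    pred = set(predicted)
    true = set(annotated)
    correct = len(pred & true)
    sensitivity = correct / len(true) if true else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates: three of four annotated exons are recovered exactly,
# one prediction has a wrong start, and one prediction is spurious.
annotated = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted = [(100, 200), (300, 450), (610, 700), (900, 1000), (1200, 1300)]
sn, sp = exon_accuracy(predicted, annotated)
print(sn, sp)
```

Under these definitions a prediction with one shifted splice site counts against both measures, which is why exact splice-site identification matters for the reported figures.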

2.
3.
4.
A low-resolution scoring function for the selection of native and near-native structures from a set of predicted structures for a given protein sequence has been developed. The scoring function, ProVal (Protein Validate), used several variables that describe an aspect of protein structure for which the proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. A partial least squares for latent variables (PLS) model was built for each candidate set of the 28 decoy sets of structures generated for 22 different proteins using the described parameters as independent variables. The C(alpha) RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all models derived, ensuring that the function was not optimized for specific fold classes or method of structure generation of the candidate folds. The results show that the crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all the other cases in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although ProVal could not distinguish between predicted structures that were similar overall in fold quality due to its inherently low resolution, it can clearly be used as a primary filter to eliminate approximately 90% of fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual C(alpha) RMS values varies considerably between the candidate fold sets.

5.
Computational gene prediction and the identification of alternatively spliced isoforms have always been challenging tasks. In this paper, we describe the performance of three gene/exon finding programmes, namely Fex, Gen view2 and Gene builder, capable of predicting open reading frames or exons for a given set of sequences from the C. elegans genome. The predicted exons were compared with the exons identified by the 'sequencing consortium', and the degree of consensus among them is discussed. We found that exon prediction by Fex was closer to the consortium prediction than the Gen view2 and Gene builder results. Interestingly, some exons (six exons in five genes) predicted only by Fex and not by the 'sequencing consortium' are found in the C. elegans EST database. These data are critical for further debate and discussion on gene finding in C. elegans.

6.
Exon skipping that accompanies exonic mutation might be caused by an effect of the mutation on pre-mRNA secondary structure. Previous attempts to associate predicted secondary structure of pre-mRNA with exon skipping have been hindered by either a small number of available mutations, sub-optimal structures, or weak effects on exon skipping. This report identifies more extensive sets of mutations from the human and hamster Hprt gene whose association with exon skipping is clear. Optimal secondary structures of the wild-type and mutant pre-mRNA surrounding each exon were predicted by energy minimization and were compared by energy dot plots. A significant association was found between the occurrence of exon skipping and the disruption of a stem containing the acceptor site consensus sequences of exon 8 of the human Hprt gene. However, no change in secondary structure was associated with skipping of exon 4 of the hamster Hprt gene. Using updated energy parameters we found a different structure than that previously reported for exon 2 of the hamster Hprt gene. In contrast to the previously reported structure, no significant association was found between predicted structural changes and skipping of exon 2. For all three Hprt exons studied, there was a significantly greater number of deoxythymidine substitutions among mutations accompanied by exon skipping than among mutations without exon skipping. For exon 8, deoxythymidine substitution was also associated with structural changes in the stem containing the acceptor site consensus sequences. For exon 51 of the human fibrillin gene, structural differences from wild type were predicted for all four mutations accompanied by exon skipping that were not predicted for a single mutation without exon skipping. Our results suggest that both primary and secondary pre-mRNA structure contribute to definition of Hprt exons, which may involve exonic splicing enhancers.

7.
Partial cDNAs of different isoforms of protein phosphatase 2Cbeta (PP2Cbeta or PPM1B) have been characterized in mammals. We disclose here the full cDNAs of two major PP2Cbeta isoforms from human, rat and mouse. These cDNAs (2.6 and 3.3 kb) are able to encode 53 kDa (PP2Cbetal) and 43 kDa (PP2Cbetas) polypeptides, respectively. The isoforms are co-expressed ubiquitously with the highest level in skeletal muscle, as assessed by Northern-blot analysis. Western and in situ analyses using monoclonal antibodies against PP2Cbeta confirmed the existence of two isoforms in the cytoplasm. Comparative sequence analysis revealed that both cDNAs consist of six exons with an alternate usage of the 3' exons that underlies the differences between them. The genomic structure of PP2Cbeta is similar to that of other PP2C paralogs and includes a non-coding first exon followed by a large intron and a large second exon that encodes most of the catalytic domain. Both variants of the ending exon include large non-coding regions. All non-translated regions (NTRs) are highly conserved between the orthologous genes, indicating their regulatory function. The 5'-NTR is long (379 bp), includes upstream start codons and is predicted to contain stable secondary structures. Such features inhibit translation initiation by the scanning mechanism. Introduction of this NTR element into a bi-luciferase expression cassette enabled expression of the second cistron, suggesting that it might serve as an internal ribosome entry site or that it contains a cryptic promoter. Overexpression of PP2Cbeta under the CMV promoter in 293 cells led to cell-growth arrest or cell death.

8.
The computer program exonsampler automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of exonsampler to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16 000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection.
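
The length and budget parameters described above can be illustrated with a short sketch. This is not exonsampler's actual code or interface: the function `sample_exons`, its parameter names, the default values, and the coordinates are all hypothetical, and the input is assumed to list each gene's exons in 5'-to-3' order with BED-style half-open coordinates.

```python
def sample_exons(genes, min_len=100, max_len=500,
                 max_bp_per_gene=1000, max_total_bp=3000):
    """Pick exons gene by gene, 5' exons first, within length and bp budgets.

    genes: dict mapping gene name -> list of (start, end) exons in
    5'-to-3' order (hypothetical input format).
    """
    selected = []
    total_bp = 0
    for gene, exons in genes.items():
        gene_bp = 0
        for start, end in exons:
            length = end - start
            if length < min_len:
                continue  # exon too short to be useful
            if length > max_len:
                # Partial sampling of a very large exon: keep its 5' portion.
                end = start + max_len
                length = max_len
            if gene_bp + length > max_bp_per_gene:
                break  # per-gene budget exhausted, move to the next gene
            if total_bp + length > max_total_bp:
                return selected  # collection-wide budget exhausted
            selected.append((gene, start, end))
            gene_bp += length
            total_bp += length
    return selected


# Gene names come from the abstract; coordinates are invented.
genes = {
    "IFNG": [(0, 150), (400, 450), (1000, 2000)],
    "TLR1": [(5000, 5300), (6000, 6900)],
}
picked = sample_exons(genes)
for gene, start, end in picked:
    print(gene, start, end, sep="\t")  # BED-like, tab-separated output
```

Iterating exons in 5'-to-3' order and stopping once a budget is reached is one simple way to favour upstream exons; other sampling preferences mentioned in the abstract would need a different ordering.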

9.
MOTIVATION: Sequencing of complete eukaryotic genomes and large syntenic fragments of genomes makes it possible to apply genomic comparison for gene recognition. RESULTS: This paper describes a spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species. The algorithm is implemented in Pro-Gen software. Unlike other algorithms, Pro-Gen does not assume conservation of the exon-intron structure. Amino acid sequences obtained by the formal translation of candidate exons are aligned instead of nucleotide sequences, which allows for distant comparisons. The algorithm was tested on a sample of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate (Drosophila) gene pairs. Surprisingly, the best results, 97-98% correlation between the actual and predicted genes, were obtained for more distant comparisons, whereas the correlation on the human-mouse sample was only 93%. The latter value increases to 95% if conservation of the exon-intron structure is assumed. This is caused by a large amount of sequence conservation in non-coding regions of the human and mouse genes probably due to regulatory elements. AVAILABILITY: Pro-Gen v. 3.0 is available to academic researchers free of charge at http://www.anchorgen.com/pro_gen/pro_gen.html.

10.
In dystrophin Kobe, exon 19 of the dystrophin gene is skipped during the process of mRNA precursor splicing even though the splice sites are unchanged (Matsuo et al. J. Clin. Invest. 87:2127-2131, 1991). In the predicted secondary structure of the mRNA precursor, exon 19 of dystrophin Kobe is paired with intron sequences, whereas a large part of the exon sequence from wild type is paired with itself and folded into a large hairpin structure. As all of 22 additional dystrophin exons analyzed also form intra-exon hairpin structures, these structures may be considered essential components of exons. We suggest that the abolishment of a hairpin structure in the truncated exon of dystrophin Kobe might prevent the splicing machinery from recognizing the splice sites and induce exon skipping.

11.
We have developed a computer program which predicts internal exons from naive genomic sequence data and which will run on any IBM-compatible 80286 (or higher) computer. The algorithm searches a sequence for 'spliceable open reading frames' (SORFs), which are open reading frames bracketed by suitable splice-recognition sequences, and then analyzes the region for codon usage. Potential exons are stratified according to the reliability of their prediction, from confidence levels 1 to 5. The program is designed to predict internal exons of length greater than 60 nucleotides. In an analysis of 116 genes of a training set, 384 out of 441 such exons (87.1%) are identified, with 280 (63.5%) of predictions matching the true exon exactly (at both 5' and 3' splice junctions and in the correct reading frame), and with 104 (23.6%) exons matching partially. In a similar analysis of 14 genes in a test set unrelated to the genes used to generate the parameters of the program, 70 out of 80 internal exons greater than 60 bp in length are identified (87.5%), with 47 completely and 23 partially matched. SORFs that partially match true internal exons share at least one splice junction with the exon, or share both splice junctions but are interpreted in an incorrect reading frame. Specificity (the percentage of SORFs that correspond to true exons) varies from 91% at confidence level 1 to 16% at confidence level 5, with an overall specificity of 35-40%. The output displays nucleotide position, confidence level, reading frame phase at the 5' and 3' ends, acceptor and donor sequences and scoring statistics and also gives an amino acid translation of the potential exon. SORFIND compares favourably with other programs currently used to predict protein-coding regions.
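
The idea of a 'spliceable open reading frame' can be sketched in a few lines. This is a deliberately simplified illustration and not SORFIND's algorithm: it assumes only the canonical AG acceptor and GT donor dinucleotides in place of scored splice-recognition sequences, and it omits the codon-usage analysis and confidence levels entirely. The names `open_frames` and `find_sorfs` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_frames(segment):
    """Return the frame offsets (0, 1, 2) in which segment has no stop codon."""
    frames = []
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOPS for codon in codons):
            frames.append(frame)
    return frames


def find_sorfs(seq, min_len=61):
    """Find regions bracketed by AG (acceptor) and GT (donor) that are open
    in at least one reading frame and longer than 60 nucleotides."""
    seq = seq.upper()
    # A candidate exon starts right after an AG and ends right before a GT.
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    hits = []
    for start in starts:
        for end in ends:
            if end - start < min_len:
                continue
            frames = open_frames(seq[start:end])
            if frames:
                hits.append((start, end, frames))
    return hits


# Toy sequence: a 63-nt stop-free region between one AG and one GT.
toy = "CCAG" + "GCC" * 21 + "GTCC"
print(find_sorfs(toy))
```

Because every AG/GT pair is tested, this sketch is quadratic in the number of candidate sites and would report many false positives on real genomic sequence, which is consistent with the abstract's point that further scoring is needed to rank candidates.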

12.
13.
14.
15.
Genomic organization of a new candidate tumor suppressor gene, LRP1B (total citations: 4; self-citations: 0; citations by others: 4)

16.
17.
18.
19.
MOTIVATION: A method for prediction of disease-relevant human genes from the phenotypic appearance of a query disease is presented. Diseases of known genetic origin are clustered according to their phenotypic similarity. Each cluster entry consists of a disease and its underlying disease gene. Potential disease genes from the human genome are scored by their functional similarity to known disease genes in these clusters, which are phenotypically similar to the query disease. RESULTS: For assessment of the approach, a leave-one-out cross-validation of 878 diseases from the OMIM database, using 10672 candidate genes from the human genome, is performed. Depending on the applied parameters, in roughly one-third of cases the true solution is contained within the top-scoring 3% of predictions and in two-thirds of cases the true solution is contained within the top-scoring 15% of predictions. The prediction results can be used either to identify target genes when searching for a mutation in monogenic diseases or to select loci in genotyping experiments in genetically complex diseases.
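
The "top-scoring 3%" and "top-scoring 15%" figures above can be read as follows: for each left-out disease, the rank of its true gene among the 10672 candidates is recorded, and the fraction of diseases whose true gene falls within the cutoff is reported. The sketch below illustrates only that final tally, assuming rank 1 is the best score; the function name `fraction_in_top` and the example ranks are invented, and the scoring method itself is not reproduced.

```python
def fraction_in_top(ranks, n_candidates, top_percent):
    """Fraction of cases whose true gene ranks within the top `top_percent`
    of `n_candidates` scored genes (rank 1 = best score)."""
    cutoff = n_candidates * top_percent / 100.0
    hits = sum(1 for rank in ranks if rank <= cutoff)
    return hits / len(ranks)


# Invented ranks of the true disease gene in eight leave-one-out runs.
ranks = [1, 50, 320, 321, 1500, 1600, 5000, 9000]
n_candidates = 10672  # candidate gene count stated in the abstract

print(fraction_in_top(ranks, n_candidates, 3))   # cutoff at rank 320.16
print(fraction_in_top(ranks, n_candidates, 15))  # cutoff at rank 1600.8
```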

20.
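Objective: To use RT-PCR to validate and confirm the expression of selected ischemia-related genes identified with a mouse exon array, and to determine whether exons of the candidate genes undergo alternative splicing, thereby verifying the exon array results. Methods: Based on the results of bioinformatics analysis, three genes from the mouse exon array (Ube3c, 6330439K17Rik, Atp7a) were selected. Forward and reverse primers were designed flanking the exons predicted to undergo alternative splicing; after PCR, the products were recovered from the gel, cloned into a vector and sequenced. Results: RT-PCR and sequencing showed that alternative splicing occurred at exon 6 of Ube3c, exon 12 of 6330439K17Rik and exon 3 of Atp7a, consistent with the array predictions. Conclusion: RT-PCR can be used to verify the reliability of exon array results and provides an effective means for studying the expression of alternatively spliced genes.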
