Similar articles
Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Current methods for multiplicity adjustment do not make use of the graph structure of Gene Ontology (GO) when testing for association of expression profiles of GO terms with a response variable. RESULTS: We propose a multiple testing method, called the focus level procedure, that preserves the graph structure of Gene Ontology (GO). The procedure is constructed as a combination of a Closed Testing procedure with Holm's method. It requires a user to choose a 'focus level' in the GO graph, which reflects the level of specificity of terms in which the user is most interested. This choice also determines the level in the GO graph at which the procedure has most power. We prove that the procedure strongly controls the family-wise error rate without any additional assumptions on the joint distribution of the test statistics used. We also present an algorithm to calculate multiplicity-adjusted P-values. Because the focus level procedure preserves the structure of the GO graph, it does not generally preserve the ordering of the raw P-values in the adjusted P-values. AVAILABILITY: The focus level procedure has been implemented in the globaltest and GlobalAncova packages, both of which are available on www.bioconductor.org.
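The abstract names Holm's method as one building block of the focus level procedure. The sketch below shows only plain Holm step-down adjustment on a flat list of P-values; it is not the focus level procedure itself and ignores the GO graph entirely. The function name `holm_adjust` is hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the input order.

    Illustration only: plain Holm adjustment, not the focus level
    procedure (which additionally uses closed testing on the GO graph).
    """
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank).
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    print(holm_adjust([0.01, 0.04, 0.03]))
```

A hypothesis is rejected at family-wise level alpha when its adjusted value is at most alpha. Unlike this flat version, the abstract notes that the focus level procedure need not preserve the ordering of the raw P-values.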

2.
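Wang Wei  Lu Weihong  Sun Yeqing 《生物信息学》 (Chinese Journal of Bioinformatics) 2010,8(3):228-232,236
Gene Ontology is a standard vocabulary for knowledge about genes and proteins, and it is the foundation for future unification, conversion, and mining of gene-related data. This paper uses the molecular function branch of Gene Ontology to compare how the molecular functions of gene products are distributed in different model organisms. The results show that, in animal, plant, and fungal model organisms, the distribution proportions of most genes with known function are broadly consistent, indicating a degree of homology. However, binding genes are more numerous in animals, whereas catalytic genes are more numerous in plants and fungi. The large number of signal transduction genes in animals indirectly supports the evolutionarily advanced status of animals, while the coding genes for macromolecule transport that are specific to plants may be related to the transport of nutrients and water within the plant body.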

3.
4.
We develop a new weighting approach of gene ontology (GO) terms for predicting protein subcellular localization. The weights of individual GO terms, corresponding to their contribution to the prediction algorithm, are determined by the term-weighting methods used in text categorization. We evaluate several term-weighting methods, which are based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed term-weighting method performs better than other term-weighting methods, and also outperforms state-of-the-art subcellular prediction methods. Our proposed method achieves 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% overall accuracies for the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Höglund IDS, fungal Höglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between high-weighted GO terms and subcellular localizations suggests that our proposed method appropriately weights GO terms according to their relevance to the localizations.
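As a rough illustration of chi-square based term weighting, the sketch below computes the standard 2x2 chi-square statistic for one GO term against one localization class and then applies a logarithm. The abstract does not give the exact transformation, so the form log(1 + chi-square) and the function names are assumptions, not the authors' formula.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: proteins in the class that carry the GO term
    b: proteins outside the class that carry the GO term
    c: proteins in the class without the GO term
    d: proteins outside the class without the GO term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - b * c) ** 2 / denom


def log_chi_square_weight(a, b, c, d):
    """Assumed log transform, log(1 + chi2); the paper's form may differ."""
    return math.log1p(chi_square(a, b, c, d))


if __name__ == "__main__":
    # A term present in every class member and absent elsewhere.
    print(log_chi_square_weight(10, 0, 0, 10))
```

The logarithm compresses very large chi-square values, so a few highly class-specific terms do not dominate the weights of all others.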

5.
An improved Bonferroni procedure for multiple tests of significance   Cited by: 24 (self-citations: 0, citations by others: 24)
SIMES  R. J. 《Biometrika》1986,73(3):751-754
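No abstract is listed for this entry. For orientation, the global test commonly attributed to this paper rejects the joint null hypothesis when, for some i, the i-th smallest of n P-values is at most i·alpha/n. The sketch below implements that rule; the function name and the default alpha are illustrative choices.

```python
def simes_global_test(pvalues, alpha=0.05):
    """Return True if the Simes test rejects the global null hypothesis.

    Rejects when p_(i) <= i * alpha / n for at least one i, where
    p_(1) <= ... <= p_(n) are the ordered p-values.
    """
    n = len(pvalues)
    ordered = sorted(pvalues)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    # Bonferroni (smallest p <= alpha/n = 0.0167) would not reject here,
    # but the second-smallest p-value meets its threshold 2 * 0.05 / 3.
    print(simes_global_test([0.02, 0.03, 0.2]))
```

The i = 1 condition is exactly the Bonferroni rule, so this test rejects whenever Bonferroni does.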

6.
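Liu Wuyi 《生物信息学》 (Chinese Journal of Bioinformatics) 2011,9(4):292-298,302
Gene Ontology is the international standard vocabulary for knowledge about gene and protein function. Using Gene Ontology functional enrichment, this study compares and analyzes the distribution of molecular functions of bHLH genes in two toad species. The results show that the bHLH genes of both species have significantly enriched GO annotation terms. Among these, transcription regulator activity (GO:0030528), regulation of transcription (GO:0045449), DNA binding (GO:0003677), regulation of RNA metabolic process (GO:0051252), regulation of transcription, DNA-dependent (GO:0006355), transcription (GO:0006350), and transcription factor activity (GO:0003700) occur at very high frequencies, indicating that these GO annotations are common functions of toad bHLH genes. In addition, toad bHLH genes play important roles in regulating gene expression in several important developmental or physiological processes, such as muscle organ development and neural tube and eye development.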

7.
Korn EL  Freidlin B 《Biometrics》2008,64(1):227-231
Summary: Lehmann and Romano (2005, Annals of Statistics 33, 1138–1154) discuss a Bonferroni-type procedure that bounds the probability that the number of false positives is larger than a specified number. We note that this procedure will have poor power as compared to a multivariate permutation test type procedure when the experimental design accommodates a permutation test. An example is given involving gene expression microarray data of breast cancer tumors.
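A minimal sketch of a Bonferroni-type rule of this kind, assuming the single-step form in which each of m hypotheses is rejected when its P-value is at most k·alpha/m, which bounds the probability of k or more false positives by alpha. The abstract does not spell out the rule, so this form and the function name are assumptions.

```python
def k_fwer_bonferroni(pvalues, k=1, alpha=0.05):
    """Return a rejection flag per hypothesis for a k-FWER Bonferroni rule.

    Assumed rule: reject H_i when p_i <= k * alpha / m. With k = 1 this
    reduces to the ordinary Bonferroni correction.
    """
    m = len(pvalues)
    if m == 0:
        return []
    threshold = k * alpha / m
    return [p <= threshold for p in pvalues]


if __name__ == "__main__":
    pvals = [0.004, 0.012, 0.03, 0.2, 0.5]
    print(k_fwer_bonferroni(pvals, k=1))  # threshold 0.01
    print(k_fwer_bonferroni(pvals, k=2))  # threshold 0.02
```

Tolerating more false positives (larger k) relaxes the threshold, but the rule still uses only marginal P-values, which is consistent with the authors' point that a multivariate permutation procedure can be more powerful.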

8.
9.
10.
11.
Sorghum bicolor (L.) is an important crop of arid and semi-arid zones, and most of its varieties tolerate drought, heat and salt stress. Many salt-tolerance proteins have been functionally identified in Arabidopsis, rice and other plants, but little functional information has been predicted for sorghum to date. A 2-D gel electrophoresis based proteomic approach with a MALDI-TOF mass spectrometer was used to analyze the salt stress response of sorghum. Major changes in the protein complement were observed at 200 mM NaCl in hydroponic culture after 96 h of salt stress. Five highly expressed proteins were excised for functional identification. We developed a shortest path (SP) analysis method on the Gene Ontology (GO) hierarchy that uses the sum of the semantic similarities of GO terms. We observed that the majority of expressed proteins belonged to the functional categories of energy production and conversion, signal transduction mechanisms, and ribosome maturation. These identified functions suggest a distinct mechanism of salt-stress adaptation in the sorghum plant. The method proposed in this paper is potentially of great importance for further understanding newly identified proteins that can help in plant development.

Electronic supplementary material

The online version of this article (doi:10.1007/s12298-012-0121-y) contains supplementary material, which is available to authorized users.

12.

Background

Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.

Results

We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.

Conclusions

We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users.

13.
Based on the recent development in the gene ontology and functional domain databases, a new hybridization approach is developed for predicting protein subcellular location by combining the gene product, functional domain, and quasi-sequence-order effects. As a showcase, the same prokaryotic and eukaryotic datasets, which were studied by many previous investigators, are used for demonstration. The overall success rate by the jackknife test for the prokaryotic set is 94.7% and that for the eukaryotic set is 92.9%. These are so far the highest success rates achieved for the two datasets by following a rigorous cross-validation test procedure, suggesting that such a hybrid approach may become a very useful high-throughput tool in the area of bioinformatics, proteomics, as well as molecular cell biology. The very high success rates also reflect the fact that the subcellular localization of a protein is closely correlated with: (1) the biological objective to which the gene or gene product contributes, (2) the biochemical activity of a gene product, and (3) the place in the cell where a gene product is active.

14.
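Gene expression profiles measured under a limited number of experimental conditions can only support effective prediction of the gene functional classes related to those conditions, so it is necessary to restrict the range of functional classes that are predicted. Accordingly, we first use Gene Ontology (GO) to select the functional classes that are enriched in differentially expressed genes and are therefore related to the experimental conditions. A support vector machine classifier is then used to refine the annotation of genes that have so far been annotated only to the parent node of such a class, by predicting whether each gene belongs to the condition-related class. When the method was applied to a set of yeast gene expression profiles, and after highly imbalanced training sets were removed, the average precision and the average recall (coverage) exceeded 71% and 47%, respectively.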
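The two evaluation measures reported in this abstract can be computed from binary labels as sketched below. The function name and the toy labels are illustrative only and are not taken from the paper's data.

```python
def precision_recall(true_labels, predicted_labels):
    """Return (precision, recall) for binary labels given as 0/1 values."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)       # correctly predicted members
    fp = sum(1 for t, p in pairs if not t and p)   # predicted but not members
    fn = sum(1 for t, p in pairs if t and not p)   # members that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example: 4 genes truly in the class, 3 genes predicted in it.
    truth = [1, 1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 0, 1, 0]
    print(precision_recall(truth, predicted))
```

Precision is the fraction of predicted class members that are correct, and recall is the fraction of true class members that are recovered, which is why a classifier can score well on one and poorly on the other.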

15.
HOMMEL  G. 《Biometrika》1988,75(2):383-386

16.
17.
18.
An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO, expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.

19.
Information on proteins' subcellular localization is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for timely identifying their subcellular locations based on the sequence information alone. The current study is focused on the Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved yet. This is because mounting evidence has indicated that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions important for both basic research and drug design. In this study, by using the multi-label theory, we developed a new predictor called “pLoc-mGneg” for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high quality benchmark dataset indicated that the proposed predictor is remarkably superior to “iLoc-Gneg”, the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.

20.
Yang Y  Degruttola V 《Biometrics》2008,64(2):329-336
Summary: Identifying genetic mutations that cause clinical resistance to antiretroviral drugs requires adjustment for potential confounders, such as the number of active drugs in an HIV-infected patient's regimen other than the one of interest. Motivated by this problem, we investigated resampling-based methods to test equal mean response across multiple groups defined by HIV genotype, after adjustment for covariates. We consider construction of test statistics and their null distributions under two types of model: parametric and semiparametric. The covariate function is explicitly specified in the parametric but not in the semiparametric approach. The parametric approach is more precise when models are correctly specified, but suffers from bias when they are not; the semiparametric approach is more robust to model misspecification, but may be less efficient. To help preserve type I error while also improving power in both approaches, we propose resampling approaches based on matching of observations with similar covariate values. Matching reduces the impact of model misspecification as well as imprecision in estimation. These methods are evaluated via simulation studies and applied to a data set that combines results from a variety of clinical studies of salvage regimens. Our focus is on relating HIV genotype to viral susceptibility to abacavir after adjustment for the number of active antiretroviral drugs (excluding abacavir) in the patient's regimen.
