1.
Background: Identifying cancer genes is an urgent task: it helps us understand the mechanisms of biochemical processes at the biomolecular level and advances bioinformatics. Although many methods for identifying cancer genes have been proposed recently, most of them draw on limited biological data and therefore give insufficient consideration to the many factors that relate genes to diseases. Results: In this paper, we propose a two-round random walk algorithm for identifying cancer genes based on multiple biological data (TRWR-MB), including a protein-protein interaction (PPI) network, a pathway network, a microRNA similarity network, a lncRNA similarity network, a cancer similarity network and protein complexes. In the first round, all cancer nodes, cancer-related genes, cancer-related microRNAs and cancer-related lncRNAs associated with any cancer are used as seed nodes, and a random walker walks on a four-layer heterogeneous network constructed from the multiple biological data. The first round selects the k top-scoring potential cancer genes. In the second round, the genes, microRNAs and lncRNAs associated with one specific cancer in the corresponding cancer class are used as seed nodes, and the walker walks on a new four-layer heterogeneous network constructed from lncRNAs, microRNAs, cancers and the selected potential cancer genes. After both walks finish, we combine the results of the two rounds of random walk with restart (RWR) into a ranking score for experimental analysis. As a result, a higher area under the receiver operating characteristic curve (AUC) is obtained. Case studies on identifying new cancer genes are also presented.
Conclusion: In summary, TRWR-MB integrates multiple biological data to identify cancer genes by analyzing the relationship between genes and cancer from a variety of biomolecular perspectives.
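The core operation in the abstract above, a random walk with restart from a set of seed nodes, can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the TRWR-MB implementation: the single-layer graph, the function name `random_walk_with_restart`, the restart probability of 0.5, and the convergence tolerance are all hypothetical choices made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping node -> dict of neighbor -> edge weight.
    seeds: set of nodes the walker jumps back to when it restarts.
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes (an assumption).
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    out_weight = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # A node without outgoing edges sends its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            # Otherwise the walker moves to a neighbor in proportion to edge weight.
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed node A.
toy_graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(toy_graph, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed receive higher scores, which is the property a seed-based ranking of candidate genes relies on.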
2.
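MicroRNAs (miRNAs) are a class of small non-coding RNAs about 22 nucleotides (nt) long that are widely present in animal and plant cells; they cleave mRNA or inhibit translation initiation through imperfect complementary pairing with their target genes. Accurately predicting miRNA target genes and correctly understanding the mechanisms by which miRNAs act on their targets have become a focus of current research. The authors review the implementation principles, intended applications and algorithmic innovations of more than ten commonly used miRNA target prediction programs for higher organisms, to provide a reference for designers of target prediction algorithms and better theoretical guidance for experimental validation.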
3.
MicroRNAs (miRNAs) are a class of small single-stranded RNAs of about 22 nt that serve as important negative gene regulators. In animals, miRNAs mainly repress protein translation by binding to the 3′ UTR regions of mRNAs with imperfect complementary pairing. Although bioinformatics investigations have produced a number of target prediction tools, all of them share a common shortcoming: a high false positive rate. It is therefore important to further filter the predicted targets. In this paper, based on the miRNA:target duplex, we construct a second-order Hidden Markov Model, implement the Baum-Welch training algorithm and apply this model to further process predicted targets. The classifier is trained on 244 positive and 49 negative miRNA:target interaction pairs and achieves a sensitivity of 72.54%, a specificity of 55.10% and an accuracy of 69.62% in 10-fold cross-validation experiments. To further verify the applicability of the algorithm, previously collected datasets, comprising 195 positive and 38 negative pairs, were used to test it, with consistent results. We believe that our method will provide some guidance for experimental biologists, especially in choosing miRNA targets for validation.
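The three reported rates can be checked against the stated class sizes. The sketch below assumes pooled counts of 177 true positives and 27 true negatives; these counts are not given in the abstract and are back-calculated here as the integers that reproduce the reported percentages for 244 positive and 49 negative pairs.

```python
def classification_rates(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    accuracy = (true_pos + true_neg) / (true_pos + false_neg + true_neg + false_pos)
    return sensitivity, specificity, accuracy


# Assumed counts (inferred, not reported): 177 of 244 positives and
# 27 of 49 negatives classified correctly.
sens, spec, acc = classification_rates(true_pos=177, false_neg=244 - 177,
                                       true_neg=27, false_pos=49 - 27)
rates_percent = tuple(round(100 * r, 2) for r in (sens, spec, acc))
```

Because positives outnumber negatives about five to one, the accuracy sits much closer to the sensitivity than to the specificity, which is worth keeping in mind when reading the 69.62% figure.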
4.
Background: Several studies have demonstrated that synthetic lethal genetic interactions between gene mutations provide an indication of functional redundancy between molecular complexes and pathways. These observations help explain the finding that organisms are able to tolerate single gene deletions for a large majority of genes. For example, system-wide gene knockout/knockdown studies in S. cerevisiae and C. elegans revealed non-viable phenotypes for a mere 18% and 10% of the genome, respectively. It has been postulated that the low percentage of essential genes reflects the extensive amount of genetic buffering that occurs within genomes. Consistent with this hypothesis, systematic double-knockout screens in S. cerevisiae and C. elegans show that, on average, 0.5% of tested gene pairs are synthetic sick or synthetic lethal. While knowledge of synthetic lethal interactions provides valuable insight into molecular functionality, testing all combinations of gene pairs represents a daunting task for molecular biologists, as the combinatorial nature of these relationships imposes a large experimental burden. Still, the task of mapping pairwise interactions between genes is essential to discovering functional relationships between molecular complexes and pathways, as they form the basis of genetic robustness. Towards the goal of alleviating the experimental workload, computational techniques that accurately predict genetic interactions can potentially aid in targeting the most likely candidate interactions. Building on previous studies that analyzed properties of network topology to predict genetic interactions, we apply random walks on biological networks to accurately predict pairwise genetic interactions. Furthermore, we incorporate all published non-interactions into our algorithm for measuring the topological relatedness between two genes. We apply our method to S. cerevisiae and C. elegans datasets and, using a decision tree classifier, integrate diverse biological networks and show that our method outperforms established methods.
5.
RNA research is advancing at an ever increasing pace. The newest and most state-of-the-art instruments and techniques have made possible the discoveries of new RNAs, and they have carried the field to new frontiers of disease research, vaccine development, therapeutics, and architectonics. Like proteins, RNAs show a marked relationship between structure and function. A deeper grasp of RNAs requires a finer understanding of their elaborate structures. In pursuit of this, cutting-edge experimental and computational structure-probing techniques output several candidate geometries for a given RNA, each of which is perfectly aligned with experimentally determined parameters. Identifying which structure is the most accurate, however, remains a major obstacle. In recent years, several algorithms have been developed for ranking candidate RNA structures in order from most to least probable, though their levels of accuracy and transparency leave room for improvement. Most recently, advances in both areas are demonstrated by rsRNASP, a novel algorithm proposed by Tan et al. rsRNASP is a residue-separation-based statistical potential for three-dimensional structure evaluation, and it outperforms the leading algorithms in the field.
6.
MOTIVATION: The fast-growing number of protein structures in the Protein Data Bank that carry the information needed to predict protein-protein interaction sites motivates us to develop methods for identifying residues that participate in protein-protein interactions. We compare a conditional random fields (CRFs)-based method with conventional classification-based methods, which omit the relation between the labels of neighboring residues, to show the advantages of the CRFs-based method in predicting protein-protein interaction sites. RESULTS: The prediction of protein-protein interaction sites is solved as a sequential labeling problem by applying CRFs with features including the protein sequence profile and residue accessible surface area. The CRFs-based method achieves performance comparable with state-of-the-art methods when 1276 nonredundant hetero-complex protein chains are used as the training and test set. Experimental results show that the CRFs-based method is a powerful and robust protein-protein interaction site prediction method and can be used to guide biologists in designing specific experiments on proteins. AVAILABILITY: http://www.insun.hit.edu.cn/~mhli/site_CRFs/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
7.
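Forest fires directly destroy forest resources, alter forest structure and function, affect local and even global climate, and threaten human life and property. Under climate warming, forest fires will become more frequent, so research on forest fire prediction and forecasting is essential. Daily forest fire data were obtained from the MODIS (Moderate-resolution Imaging Spectroradiometer) thermal anomalies/fire product (MOD14A1), and the spatiotemporal distribution of forest fires in southwestern China during 2001-2018 was analyzed. Using the random forest algorithm and jointly considering fire drivers including meteorology, topography, fuel condition and vegetation, prediction models of forest fire occurrence in the dry and wet seasons were built for southwestern China, and the main drivers of fire occurrence in each season were systematically analyzed. The results show that: (1) Forest fires in southwestern China are concentrated in most of Yunnan, southwestern Sichuan and southern Guizhou, with a clustered distribution. Fires occur mostly in the dry season, which accounts for 96.5% of all fire occurrences. The annual number of fires changes in stages, with a significant increasing trend during 2001-2014 followed by a non-significant decreasing trend. (2) The dry- and wet-season models simulate fire occurrence fairly accurately: in the training period, model accuracy ranged from 82.94% to 83.99% and from 85.12% to 90.31%, respectively, and AUC (Area Under Curve) values ranged from 0.908 to 0.914 and from 0.922 to 0.965, respectively; in the testing period, accuracy was 79.73% and 83.27%, and AUC was 0.886 and 0.855, respectively. (3) Elevation is the most critical limiting factor for fire occurrence in the southwest: fires concentrate at middle elevations and rarely occur at low or high elevations, which is closely related to human activity. Same-day weather conditions are the second most important driver of dry-season fires, whereas fuel temperature and moisture conditions are the second most important driver of wet-season fires. The FWI (Fire Weather Index) system indices are well suited to the southwest and strongly influence fire occurrence in both seasons, so they should be incorporated into forest fire prediction and forecasting in the region.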
8.
An algorithm has been developed to improve the success rate in the prediction of the secondary structure of proteins by taking into account the predicted class of the proteins. This method has been called the 'double prediction method' and consists of a first prediction of the secondary structure from a new algorithm which uses parameters of the type described by Chou and Fasman, and the prediction of the class of the proteins from their amino acid composition. These two independent predictions allow one to optimize the parameters calculated over the secondary structure database to provide the final prediction of secondary structure. This method has been tested on 59 proteins in the database (i.e. 10,322 residues) and yields 72% success in class prediction, 61.3% of residues correctly predicted for three states (helix, sheet and coil) and a good agreement between observed and predicted contents in secondary structure.
9.
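Accurate annotation of operon structure in prokaryotes is important for studying gene function and gene regulatory networks, and computational prediction by bioinformatics methods is currently the main source of operon annotation in genomes. Most current prediction algorithms require experimentally confirmed operons as a training set, but the scarcity of such data has long been a bottleneck for algorithm development. Based on an understanding of operon structure, and starting from features such as intergenic distance, regulatory signals related to transcription and translation, and COG functional annotation, a probabilistic model describing the complex structure of operons was built, and an iterative self-learning algorithm was proposed that does not rely on operon data from a specific species as a training set. Tests and comparisons on experimentally verified operon datasets show that the algorithm is very effective at predicting operon structure. Without relying on any known operon information, the algorithm exceeds the best current operon prediction methods in overall prediction performance, and this self-learning algorithm outperforms algorithms trained on a specific species. These properties make the algorithm applicable to newly sequenced species, unlike the operon prediction methods in common use. Large-scale comparative analysis of bacterial and archaeal genomes further improved understanding of the general features and species specificity of operon structure in genomes.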
10.
Management of wildlife and protection of endangered species depend on determination of population trends. Because population changes are stochastic and autoregressive, there is reason to believe that population trends might not be properly determined by simple regression over short time periods. A bounded random walk (BRW) model is introduced as a null model for evaluating population trends. The BRW model shows long-term stability but rising and falling sequences of up to many decades. For a given variability and survey length, there will be an expected probability of finding a greater than X% slope simply by chance. This false positive probability needs to be considered when evaluating trends. Breeding Bird Survey data for 128 species over 46 years for two states were analyzed for trends for different series lengths. Trends estimated from short series were unlikely to agree with the 46-year trends. Very short series (e.g., 5 years) tended to indicate no trend due to loss of statistical power. A 101-year series for the Sandwich tern (Sterna sandvicensis) revealed that even for 40-year series, 33% of subset series had a negative trend, in contrast to the strong positive trend of the full 101-year series. The BRW model simulations and both data sets pointed to 20 years as a minimum time period for estimating trends reliably, though this can be longer for species that tend to cycle. Proper inference should thus consider the implications of inherent time series variability.
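The idea of a chance slope under a bounded random walk null model can be illustrated with a short simulation. This is only a sketch: the abstract does not define the BRW model, so the Gaussian steps, the clamping at fixed bounds, the parameter values and the function names below are all assumptions made for illustration.

```python
import random


def bounded_random_walk(n_steps, start=100.0, step_sd=5.0,
                        lower=50.0, upper=150.0, rng=None):
    """Simulate a random walk whose values are clamped to [lower, upper]."""
    rng = rng or random.Random(0)
    series = [start]
    for _ in range(n_steps - 1):
        proposed = series[-1] + rng.gauss(0.0, step_sd)
        series.append(min(upper, max(lower, proposed)))  # clamp at the bounds
    return series


def ols_slope(values):
    """Ordinary least-squares slope of values against time 0, 1, 2, ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def chance_trend_rate(series_len, threshold_pct, n_sim=2000, seed=1):
    """Fraction of trendless simulated series whose fitted slope exceeds
    threshold_pct percent of the series mean per time step, in either direction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        series = bounded_random_walk(series_len, rng=rng)
        pct_per_step = 100 * ols_slope(series) / (sum(series) / len(series))
        if abs(pct_per_step) > threshold_pct:
            hits += 1
    return hits / n_sim


# Compare a 10-step and a 40-step survey at a 1% per-step threshold.
short_rate = chance_trend_rate(10, 1.0)
long_rate = chance_trend_rate(40, 1.0)
```

Comparing `short_rate` and `long_rate` for different thresholds gives a feel for how often a series with no underlying trend would still show an apparent one.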
11.
We used fluorescence recovery after photobleaching (FRAP) and single particle tracking (SPT) techniques to compare diffusion of class I major histocompatibility complex (MHC) molecules on normal and alpha-spectrin-deficient murine erythroleukemia (MEL) cells. Because the cytoskeleton mesh acts as a barrier to lateral mobility of membrane proteins, we expected that diffusion of membrane proteins in alpha-spectrin-deficient MEL cells would differ greatly from that in normal MEL cells. In the event, diffusion coefficients derived from either FRAP or SPT analysis were similar for alpha-spectrin-deficient and normal MEL cells, differing by a factor of approximately 2, on three different timescales: tens of seconds, 1-10 s, and 100 ms. SPT analysis showed that the diffusion of most class I MHC molecules was confined on both cell types. On the normal MEL cells, the mean diagonal length of the confined area was 330 nm with a mean residency time of 40 s. On the alpha-spectrin-deficient MEL cells, the mean diagonal length was 650 nm with a mean residency time of 45 s. Thus there are fewer barriers to lateral diffusion on cytoskeleton mutant MEL cells than on normal MEL cells, but this difference does not strongly affect lateral diffusion on the scales measured here.
12.
In this paper, we present a method based on local density and random walks (LDRW) for detecting core-attachment complexes in protein-protein interaction (PPI) networks, whether weighted or unweighted. Our LDRW method consists of two stages. First, it finds all the protein-complex cores based on the local density of subnetworks. Then it uses random walks with restarts to find the attachment proteins of each detected core and form complexes. We evaluate the effectiveness of our method using two different yeast PPI networks and validate the biological significance of the predicted protein complexes using known complexes in the Munich Information Center for Protein Sequence (MIPS) and Gene Ontology (GO) databases. We also perform a comprehensive comparison between our method and other existing methods. The results show that our method finds more protein complexes with high biological significance and obtains a significant improvement. Furthermore, our method is able to identify biologically significant overlapping protein complexes.
14.
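Genetic algorithms (GAs) derive from the evolutionary laws of nature and are adaptive, heuristic, probabilistic, iterative global search algorithms. This paper introduces the basic principles, procedure and advantages of GAs, and summarizes the models and execution strategies used when applying GAs to protein structure prediction, as well as research progress on combining multiple algorithms to predict protein structure.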
16.
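MicroRNAs (miRNAs) are effective regulators of many biological processes and act as quantitative regulators of gene expression. Emerging evidence indicates that miRNAs are involved in regulating the innate immune response. This regulation helps maintain the balance between the host immune response and protection of infected tissue. A deeper understanding of how miRNAs regulate the innate immune response will help identify new targets for immune modulation and establish effective miRNA-based therapies. This review focuses on the roles of miRNAs in regulating immune cell development, Toll-like receptor signaling and inflammatory cytokine signaling.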
17.
Summary: When a very large number of phytosociological types have to be compared, a reduction of the number of relevés is desirable. In this paper a method of relevé selection from given phytosociological tables is suggested. The method is based on a sum of squares criterion. The advantage, in comparison with other selection procedures, is that this method provides a means on the basis of which the efficiency of a relevé selection can be objectively measured. Contribution from the Working Group for Data-Processing in Phytosociology, International Society for Vegetation Science. The work was completed at the Department of Plant Sciences of the University of Western Ontario, London, Canada. We wish to thank Prof. L. Orlóci for the hospitality and the helpful discussions. The work was supported by Italian C.N.R., within the project Promozione qualità dell'ambiente subproject Metodologie matematiche e basi di dati.
18.
When observing the two-dimensional movement of animals or microorganisms, it is usually necessary to impose a fixed sampling rate, so that observations are made at certain fixed intervals of time and the trajectory is split into a set of discrete steps. A sampling rate that is too small will result in information about the original path and correlation being lost. If random walk models are to be used to predict movement patterns or to estimate parameters to be used in continuum models, then it is essential to be able to quantify and understand the effect of the sampling rate imposed by the observer on real trajectories. We use a velocity jump process with a realistic reorientation model to simulate correlated and biased random walks and investigate the effect of sampling rate on the observed angular deviation, apparent speed and mean turning angle. We discuss a method of estimating the values of the reorientation parameters used in the original random walk from the rediscretized data that assumes a linear relation between sampling time step and the parameter values.
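The effect of the observer's sampling rate on apparent speed can be seen in a small simulation. The sketch below swaps in a simpler model than the velocity jump process of the abstract: a discrete-step correlated random walk with unit-length steps and Gaussian turning angles. The function names and parameter values are hypothetical.

```python
import math
import random


def correlated_random_walk(n_steps, turn_sd=0.3, speed=1.0, seed=0):
    """Simulate a 2D walk in which each heading is the previous heading
    plus a Gaussian turning angle (a simplified stand-in model)."""
    rng = random.Random(seed)
    x = y = heading = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        heading += rng.gauss(0.0, turn_sd)
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        path.append((x, y))
    return path


def apparent_speed(path, every):
    """Speed an observer would measure by recording every `every`-th position
    and joining the recorded positions with straight lines."""
    sampled = path[::every]
    distance = sum(math.dist(a, b) for a, b in zip(sampled, sampled[1:]))
    elapsed = every * (len(sampled) - 1)
    return distance / elapsed


path = correlated_random_walk(1000)
speeds = {k: apparent_speed(path, k) for k in (1, 5, 10, 50)}
```

Sampling every step recovers the true speed, while coarser sampling cuts corners along the path and so can only report an equal or lower apparent speed.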
19.
Detecting protein complexes from protein-protein interaction (PPI) networks is a difficult challenge in computational biology. There is ample evidence that many disease mechanisms involve protein complexes, and being able to predict these complexes is important to the characterization of the relevant disease for diagnostic and treatment purposes. This article introduces a novel method for detecting protein complexes from PPI networks by using a protein ranking algorithm (ProRank). ProRank quantifies the importance of each protein based on the interaction structure and the evolutionary relationships between proteins in the network. A novel way of identifying essential proteins, which are known for their critical role in mediating cellular processes and constructing protein complexes, is proposed and analyzed. We evaluate the performance of ProRank using two PPI networks on two reference sets of protein complexes created from the Munich Information Center for Protein Sequence, containing 81 and 162 known complexes, respectively. We compare the performance of ProRank to some of the well-known protein complex prediction methods (ClusterONE, CMC, CFinder, MCL, MCode and Core) in terms of precision and recall. We show that ProRank predicts more complexes correctly at a competitive level of precision and recall. The level of accuracy achieved using ProRank in comparison to other recent methods for detecting protein complexes is a strong argument in favor of the proposed method. Proteins 2012. © 2012 Wiley Periodicals, Inc.
20.
Background: Glioma is the most lethal nervous system cancer. Recent studies have made great efforts to study the occurrence and development of glioma, but the molecular mechanisms are still unclear. This study was designed to reveal the molecular mechanisms of glioma based on protein-protein interaction (PPI) networks combined with machine learning methods. Key differentially expressed genes (DEGs) were screened and selected by using the PPI networks. Results: As a result, 19 genes between grade I and grade II, 21 genes between grade II and grade III, and 20 genes between grade III and grade IV were selected. Then, five machine learning methods were employed to predict the glioma stages based on the selected key genes. After comparison, the Complement Naive Bayes classifier was employed to build the prediction model for grade II-III, with an accuracy of 72.8%. Random forest was employed to build the prediction models for grade I-II and grade III-IV, with accuracies of 97.1% and 83.2%, respectively. Finally, the selected genes were analyzed by PPI networks, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results improve our understanding of the biological functions of the selected DEGs involved in glioma growth. We expect that the key genes identified have guiding significance for the occurrence of gliomas or, at the very least, that they are useful for tumor researchers. Conclusion: Machine learning combined with PPI networks and GO and KEGG analyses of selected DEGs improves our understanding of the biological functions involved in glioma growth.