Similar literature
 20 similar articles found (search time: 209 ms)
1.
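Long non-coding RNAs (lncRNAs) are RNA transcripts that are widespread in eukaryotes, longer than 200 nucleotides, lack protein-coding function, and regulate gene expression at the post-transcriptional level. Recent studies show that lncRNAs play important regulatory roles in many biological pathways. Bioinformatics, an interdisciplinary field arising from biology, mathematics, computer science and statistics, enables in-depth mining and analysis of large-scale data at the global and systems levels. Predicting and analysing lncRNAs with bioinformatics methods is currently one of the main strategies for discovering and identifying plant lncRNAs. This paper reviews and summarizes recent bioinformatics methods and strategies for predicting plant lncRNAs and their target genes, with the aim of providing a reference for further research into the roles of plant lncRNAs in growth and development, stress resistance and phylogenetic evolution.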
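As a rough illustration of the two criteria named in this abstract (length above 200 nucleotides, no protein-coding function), a first-pass filter could be sketched as below. This is not the method of the reviewed paper: the function names are hypothetical, the 100-codon ORF cutoff is an assumed heuristic, and only the forward strand is scanned. Real pipelines usually add a dedicated coding-potential tool on top of such a filter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest forward-strand ORF (ATG up to, not including, a stop), in codons.

    ORFs that never reach a stop codon are ignored in this sketch.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(seq: str, min_len: int = 200, max_orf_codons: int = 100) -> bool:
    """Keep transcripts longer than min_len whose longest ORF is short.

    max_orf_codons=100 is a hypothetical cutoff, not taken from the abstract.
    """
    return len(seq) > min_len and longest_orf_codons(seq) < max_orf_codons
```

For example, `is_lncrna_candidate("ACGT" * 60)` keeps a 240-nt toy sequence with no ORF, while a 366-nt sequence carrying a 121-codon ORF is rejected.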

2.
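Long non-coding RNA (lncRNA) refers to a class of RNAs whose transcripts exceed 200 nt in length and lack protein-coding capacity. A growing body of research shows that lncRNAs regulate gene expression at the epigenetic, transcriptional and post-transcriptional levels, participate widely in physiological and pathological processes, and play important roles in the onset and progression of various diseases. Epigenetics is the study of heritable changes in genes that occur without changes in the nucleotide sequence; epigenetic phenomena are numerous and mainly include DNA methylation, histone modification and chromatin remodeling. This review introduces the role of lncRNAs in epigenetic regulation, with the aim of providing ideas for further research on how lncRNAs regulate traits.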

3.
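Long non-coding RNAs (lncRNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides that play important regulatory roles in many key processes of cellular life. Research on lncRNAs has developed rapidly in recent years, and a number of bioinformatics tools and databases have emerged for lncRNA identification, quantification, structural analysis and function prediction. This article reviews these resources for lncRNA research.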

4.
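Transposon-derived long non-coding RNAs in plants   Total citations: 1, self-citations: 0, citations by others: 1
Transposons are an important component of genomes and affect genome structure and stability. Long non-coding RNAs (lncRNAs) regulate multiple biological processes at the transcriptional and post-transcriptional levels. Transposons and lncRNAs are important drivers of species evolution, and lncRNAs containing transposon sequences are widespread in nature. This article outlines strategies for discovering plant lncRNAs and research on their functions, reviews the distribution and functions of plant transposon-derived lncRNAs (TE-lncRNAs), and discusses the regulatory mechanisms, epigenetic modifications and breeding potential of plant TE-lncRNAs along with future prospects.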

5.
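Long non-coding RNAs (lncRNAs) are non-coding RNAs whose transcripts range from 200 to several thousand nucleotides in length and have no protein-coding potential. Compared with the more extensively studied small non-coding RNAs such as microRNA (miRNA) and small interfering RNA (siRNA), many functions of lncRNAs remain unclear. However, a growing number of studies have found that lncRNAs regulate central nervous system development in several ways, including epigenetic histone methylation, regulation of transcriptional cofactors and regulation of alternative splicing. Abnormalities in these pathways are closely associated with many major human diseases, such as Alzheimer's disease (AD), autism spectrum disorder (ASD) and schizophrenia (SZ). This article reviews how lncRNAs regulate nervous system development at the epigenetic, transcriptional, post-transcriptional and translational levels, and their roles in human neurological diseases.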

6.
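Transposons are an important component of genomes and affect genome structure and stability. Long non-coding RNAs (lncRNAs) regulate multiple biological processes at the transcriptional and post-transcriptional levels. Transposons and lncRNAs are important drivers of species evolution, and lncRNAs containing transposon sequences are widespread in nature. This article outlines strategies for discovering plant lncRNAs and research on their functions, reviews the distribution and functions of plant transposon-derived lncRNAs (TE-lncRNAs), and discusses the regulatory mechanisms, epigenetic modifications and breeding potential of plant TE-lncRNAs along with future prospects.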

7.
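Long non-coding RNAs (lncRNAs) are RNA molecules with transcripts longer than 200 nt that do not encode proteins; they act as RNA in regulation at multiple levels, including epigenetic, transcriptional and post-transcriptional regulation. Numerous studies have shown that many lncRNA molecules are derived from endogenous retrovirus (ERV) sequences that "invaded" the human genome millions of years ago. ERVs are an important component of the genome, accounting for about 5% to 8% of it; most exist as "proviruses", and their functions are largely unknown. This review covers recent advances in research on ERV-derived lncRNAs in innate immunity, antiviral defense and tumors.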

8.
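Long non-coding RNA
About 5% to 10% of the human genome sequence is stably transcribed. Protein-coding genes account for only about 1%; the remaining 4% to 9% is transcribed, but the functions of the transcripts are not yet clear. Nevertheless, it has been confirmed that non-protein-coding transcripts include non-coding RNAs (ncRNAs) with regulatory functions. Compared with regulatory short non-coding RNAs [such as microRNA, small interfering RNA (siRNA) and Piwi-RNA], long non-coding RNAs (lncRNAs) make up the majority in number. lncRNAs arise in a variety of ways, regulate target gene expression through multiple pathways, and take part in regulating the growth, development, aging and death of organisms; abnormal lncRNA function often leads to disease. This article reviews the origin, classification and molecular mechanisms of action of lncRNAs and the relationship between lncRNA abnormalities and disease, with the aim of giving a full understanding of this important new class of regulatory molecules.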

9.
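Long non-coding RNAs (lncRNAs) are the most predominant class of transcripts in the transcriptomes of many complex organisms. lncRNAs show poor sequence conservation across species and generally low expression levels. Compared with coding genes, lncRNAs have similar promoter regions and splice sites, and they show clear cell- and tissue-specific distribution; they are especially abundantly expressed in the nervous system, suggesting a role there that cannot be ignored. Focusing on the latest research findings on lncRNAs in the nervous system in recent years, this article summarizes the regulatory roles and mechanisms of lncRNAs in the development of the central and peripheral nervous systems and in nervous system function. It also looks ahead to new concepts and techniques in lncRNA research and how they may advance future neuroscience research.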

10.
路畅  黄银花 《遗传》2017,39(11):1054-1065
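Long non-coding RNAs (lncRNAs) are transcripts that are widespread in animals and plants, longer than 200 nt, and for the most part do not encode proteins. Studies have shown that lncRNAs can assist the transport of protein complexes and take part in regulating the activation and inactivation of genes and chromosomes, and that they play important roles in embryonic development, muscle growth, fat deposition and immune responses. In recent years, driven by the Human Genome Project and the ENCODE (The Encyclopedia of DNA Elements) project, large numbers of lncRNAs have been identified in animals, and breakthroughs have been made in research on the mechanisms by which lncRNAs regulate important biological processes such as lipid metabolism, muscle development, and immunity and disease resistance. These findings have overturned the traditional view that lncRNAs do not encode proteins and have led to a new model in which lncRNAs encode functional small peptides that regulate biological processes. This article introduces the characteristics and types of animal lncRNAs, commonly used databases, biological functions, molecular regulatory models and future directions of lncRNA research, with the aim of providing reference information for functional studies of animal lncRNAs.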

11.
12.
Prediction of RNA binding sites in a protein using SVM and PSSM profile   Total citations: 1, self-citations: 0, citations by others: 1
Kumar M  Gromiha MM  Raghava GP 《Proteins》2008,71(1):189-194

13.
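Long non-coding RNAs (lncRNAs) contain no open reading frame, are longer than 200 bp, and generally do not encode proteins. They are ubiquitous in organisms and perform a variety of biological roles. lncRNAs specifically expressed during plant seed germination have been little studied. BoNR8 lncRNA is a long non-coding RNA (about 272 bp) transcribed by RNA polymerase III in heading cabbage. Previous work found that it is specifically expressed during seed germination, and that overexpression of BoNR8 lncRNA in Arabidopsis inhibited seed germination under normal conditions and reduced the sensitivity of germinating seeds to ABA. In this study, sequence analysis of BoNR8 lncRNA showed that its transcribed region contains a cold-stress motif (ATATAAATAAAT), and that this motif lies in the largest stem-loop of the BoNR8 lncRNA secondary structure. Biochemical experiments further showed that BoNR8 lncRNA responds to low temperature and that its overexpression inhibits Arabidopsis seed germination under low-temperature stress. The discovery of this low-temperature-related function of BoNR8 lncRNA provides new material for research on plant cold tolerance and enriches the study of its mechanisms.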
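The kind of sequence scan described in this abstract (locating the cold-stress motif ATATAAATAAAT inside a transcribed region) can be sketched in a few lines. The function name is hypothetical and the toy sequence in the usage note is invented for illustration; it is not the BoNR8 sequence, and the abstract does not say which tool the authors used.

```python
COLD_MOTIF = "ATATAAATAAAT"  # motif quoted in the abstract


def find_motif(seq: str, motif: str = COLD_MOTIF) -> list:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper().replace("U", "T")      # treat RNA and DNA input alike
    motif = motif.upper().replace("U", "T")
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)       # step by one to allow overlaps
    return hits
```

With the invented sequence `"GGC" + COLD_MOTIF + "CCG"`, the function reports a single hit at position 3; a sequence without the motif yields an empty list.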

14.

Background

Long noncoding RNAs (lncRNAs) are widely involved in the initiation and development of cancer. Although some computational methods have been proposed to identify cancer-related lncRNAs, there is still a need to improve prediction accuracy and efficiency. In addition, rapidly updated cancer data and newly discovered mechanisms make it possible to improve cancer-related lncRNA prediction algorithms. In this study, we introduced CRlncRC, a novel Cancer-Related lncRNA Classifier that integrates manifold features with five machine-learning techniques.

Results

CRlncRC was built on the integration of four categories of features: genomic, expression, epigenetic and network. Five learning techniques were exploited to develop an effective classification model: Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR) and K-Nearest Neighbors (KNN). Using ten-fold cross-validation, we showed that RF is the best model for classifying cancer-related lncRNAs (AUC = 0.82). The feature importance analysis indicated that epigenetic and network features play key roles in the classification. In addition, compared with other existing classifiers, CRlncRC performed better in both sensitivity and specificity. We further applied CRlncRC to lncRNAs from the TANRIC (The Atlas of non-coding RNA in Cancer) dataset and identified 121 cancer-related lncRNA candidates. These potential cancer-related lncRNAs showed cancer-related indications, and many of them are supported by convincing literature.

Conclusions

Our results indicate that CRlncRC is a powerful method for identifying cancer-related lncRNAs. Machine-learning-based integration of multiple features, especially epigenetic and network features, contributed greatly to cancer-related lncRNA prediction. RF outperforms the other learning techniques in terms of model sensitivity and specificity. In addition, using the CRlncRC method, we predicted a set of cancer-related lncRNAs, all of which displayed strong relevance to cancer and offer a valuable starting point for further functional studies of cancer-related lncRNAs.
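The evaluation protocol mentioned in this abstract (ten-fold cross-validation scored by AUC) can be illustrated with a small dependency-free sketch. The classifier itself is left out, the function names are hypothetical, and the interleaved fold assignment is an assumption made for brevity; it is not a description of how CRlncRC split its data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold f tests samples f, f+k, f+2k, ..."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test
```

In use, a model would be fitted on each `train` split, scored on the matching `test` split, and the per-fold AUC values averaged. Shuffling or stratifying the samples before splitting is usually advisable and is omitted here.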

15.
Prediction of protein domain with mRMR feature selection and analysis   Total citations: 2, self-citations: 0, citations by others: 2
Li BQ  Hu LL  Chen L  Feng KY  Cai YD  Chou KC 《PloS one》2012,7(6):e39308
Domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains from sequence information alone, so as to facilitate protein structure prediction and speed up functional annotation. However, although many efforts have been made in this regard, predicting protein domains from sequence information remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), and by incorporating features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28-40% higher than that of the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to existing domain prediction methods, and that the findings about the key features with high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, and HIV protease cleavage sites, among many other important topics in protein science and biomedicine.
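To make the mRMR and IFS steps named in this abstract more concrete, the sketch below ranks discrete features by relevance minus mean redundancy (the mutual-information difference form of mRMR) and then picks the best-scoring prefix of that ranking. All function names are hypothetical, the features are assumed to be already discretized, and `evaluate` stands in for whatever classifier score is used (a random forest in the abstract); none of this is taken from the authors' code.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_order(features, target):
    """Rank feature names greedily by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected, remaining = [], sorted(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining:
        best = max(remaining, key=score)  # ties go to the alphabetically first name
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(order, evaluate):
    """Score each prefix of the ranked list and return the best one."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(order) + 1):
        current = evaluate(order[:k])
        if current > best_score:
            best_subset, best_score = order[:k], current
    return best_subset, best_score
```

The point of the redundancy term is visible on a toy input: an exact copy of an already selected feature is pushed to the end of the ranking even though its relevance to the target is high.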

16.
《Genomics》2020,112(5):2928-2936
Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms, including binding to RNA-binding proteins. The majority of plant lncRNAs are functionally uncharacterized; thus, accurate prediction of plant lncRNA-protein interactions is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI. Its uniqueness is that it predicts by multi-feature fusion. Structural features and four groups of sequence features are used, including tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana yielded an area under the precision-recall curve (AUPRC) of 0.9820 and 0.9652, respectively. The proposed method shows a significant improvement in prediction performance compared with existing state-of-the-art methods.
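Two of the sequence feature groups listed in this abstract, tri-nucleotide composition and gapped k-mers, can be sketched as follows. The abstract does not define them precisely, so the definitions here are assumptions: composition is the frequency of each of the 64 overlapping 3-mers, and the gapped feature counts nucleotide pairs separated by a fixed number of arbitrary positions. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]  # 64 entries


def trinucleotide_composition(seq: str) -> list:
    """Frequency of each overlapping 3-mer, in TRINUCLEOTIDES order."""
    seq = seq.upper().replace("U", "T")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:          # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]


def gapped_pair_counts(seq: str, gap: int = 1) -> dict:
    """Count (first, second) nucleotide pairs with `gap` positions between them."""
    seq = seq.upper().replace("U", "T")
    pairs = Counter(
        (seq[i], seq[i + gap + 1]) for i in range(len(seq) - gap - 1)
    )
    return dict(pairs)
```

For the toy input `"ACGACG"` the four windows are ACG, CGA, GAC and ACG, so the ACG entry of the composition vector is 0.5.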

17.
18.
Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs), account for the majority of human inherited diseases. It is important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods for classifying SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe a better rationale for deleterious SAP prediction is this: if a SAP lies in a protein with important functions and can severely change the protein sequence and structure, it is more likely to be related to disease. We therefore established a method to predict deleterious SAPs based on both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the prediction model was the Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested on an independent dataset, where the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be the most important for accurate prediction and can significantly improve prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate a SAP's functional association. This research will facilitate post-genome-wide association studies.
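The evaluation scheme in this abstract, a nearest neighbor predictor assessed by jackknife (leave-one-out) cross-validation, is simple enough to sketch directly. The Euclidean distance used here is an assumption, since the abstract does not state the distance measure, and the function names are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(x, y):
    """Leave each sample out in turn and predict it from all the others."""
    correct = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(rest_x, rest_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)
```

On two well separated toy clusters every held-out point is recovered, so the jackknife accuracy is 1.0; when each point's nearest neighbor carries the opposite label the accuracy drops to 0.0.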

19.
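Objective: Long non-coding RNAs (lncRNAs) take part in many important biological processes and are closely associated with various human diseases, so research on predicting lncRNA-disease associations helps with the diagnosis and treatment of disease and with understanding, at the molecular level, the mechanisms by which human diseases arise and progress. At present, most lncRNA-disease association prediction methods tend to integrate lncRNA and disease information only shallowly, ignoring the deep embedding features in the network topology; in addition, they build the negative training set by randomly selecting non-associated lncRNA-disease pairs, which affects the robustness of the prediction method. Methods: This paper proposes NELDA, a method based on network embedding, to predict potential lncRNA-disease associations. NELDA first uses lncRNA expression profiles, the disease ontology and known lncRNA-disease associations to build an lncRNA similarity network, a disease similarity network and an lncRNA-disease association network. It then designs four deep autoencoders to learn low-dimensional network embedding features of lncRNAs and diseases from the lncRNA and disease similarity networks and from the lncRNA-disease association network. The similarity-network embedding features of lncRNAs and diseases are concatenated, as are their association-network embedding features, and each concatenation is fed into one of two support vector machine classifiers to predict lncRNA-disease associations. Finally, a weighted fusion strategy combines the predictions of the two support vector machine classifiers to give the final prediction. In addition, based on known lncRNA-disease association pairs and disease semantic similarity, a negative sample selection strategy is designed to build a set of relatively reliable non-associated lncRNA-disease pairs in order to improve classifier robustness; this strategy uses a scoring function to score every lncRNA-disease pair and selects low-scoring pairs as non-associated samples (that is, negative samples). Results: Ten-fold cross-validation shows that NELDA predicts lncRNA-disease associations effectively, reaching an AUC of 0.9827, which is 0.0627 and 0.0207 higher than the existing LDASR and LDNFSGB methods, respectively. Moreover, the negative sample selection strategy and the decision-level weighted fusion strategy effectively improve NELDA's prediction performance. In case studies of gastric cancer and breast cancer, supporting evidence was found in recent literature and public databases for 29 of 40 (72.5%) lncRNAs predicted to be associated with gastric cancer and breast cancer. Conclusion: These experimental results show that NELDA is an effective method for predicting lncRNA-disease associations and is able to uncover potential lncRNA-disease associations.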
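Two steps of the pipeline described in this abstract lend themselves to a short sketch: the decision-level weighted fusion of two classifier outputs, and the selection of low-scoring pairs as negative samples. The abstract gives neither the fusion weight nor the scoring function, so the default weight of 0.5, the precomputed `pair_scores` input and both function names are assumptions for illustration only.

```python
def fuse_scores(similarity_scores, association_scores, weight=0.5):
    """Weighted average of two classifiers' scores for the same list of pairs.

    `weight` is applied to the first classifier; 0.5 is a hypothetical default.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(similarity_scores) != len(association_scores):
        raise ValueError("score lists must have the same length")
    return [
        weight * s + (1.0 - weight) * a
        for s, a in zip(similarity_scores, association_scores)
    ]


def select_negatives(pair_scores, known_positives, n):
    """Pick the n lowest-scoring pairs that are not known associations.

    pair_scores maps (lncRNA, disease) -> score from some scoring function,
    which is treated as given here.
    """
    candidates = sorted(
        (score, pair)
        for pair, score in pair_scores.items()
        if pair not in known_positives
    )
    return [pair for _, pair in candidates[:n]]
```

Choosing negatives from the low end of the score range, rather than at random, is meant to reduce the chance that an undiscovered true association is mislabeled as a negative training example.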

20.