Similar articles
1.
《Genomics》2020,112(5):2928-2936
Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms including binding to RNA binding proteins. The majority of plant lncRNAs are functionally uncharacterized, thus, accurate prediction of plant lncRNA–protein interaction is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI. Its uniqueness is that it predicts by multi-feature fusion. Structural and four groups of sequence features are used, including tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana obtained 0.9820 and 0.9652 area under precision/recall curve (AUPRC) respectively. The proposed method shows significant enhancement in the prediction performance compared with existing state-of-the-art methods.

2.
3.
Many proteins and genes belong to families that share a common evolutionary history. Sequence comparison has become an important way to outline evolutionary relationships and to recognize conserved patterns. The current work critically investigates the role of k-mers in the composition vector method for comparing genome sequences. Composition vector methods are generally applied with different choices of k; for some values of k the results are satisfactory, but for others they are not. In the proposed work, the standard composition vector method is carried out using 3-mers, and a special type of information-based similarity index is used as the distance measure. The work establishes that the combination of 3-mers and the information-based similarity index gives satisfactory results in all cases examined, especially for the comparison of whole genome sequences. These choices provide a unified approach to genome sequence comparison.
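The composition vector idea in the abstract above can be illustrated with a minimal sketch. It builds a normalized 3-mer frequency vector for each sequence and compares two vectors with the Jensen-Shannon divergence. The abstract does not define its information-based similarity index, so the Jensen-Shannon divergence is a stand-in chosen here as an assumption, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return the normalized k-mer frequency vector of seq in a fixed k-mer order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing characters outside the alphabet are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency vectors."""
    def kl(a, b):
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy sequences (hypothetical): seq_b differs from seq_a by one base,
# seq_c shares no 3-mer with seq_a.
seq_a = "ATGCGATACGCTTGAGGCTA"
seq_b = "ATGCGATACGCTTGAGGCTT"
seq_c = "TTTTTTTTAAAAAAAACCCC"

d_ab = js_divergence(kmer_frequencies(seq_a), kmer_frequencies(seq_b))
d_ac = js_divergence(kmer_frequencies(seq_a), kmer_frequencies(seq_c))
```

With k = 3 over the DNA alphabet each vector has 64 entries, so the comparison cost does not grow with genome length once the counts are collected. A pairwise distance matrix built this way could then be passed to any standard tree-building method.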

4.
《Genomics》2020,112(3):2233-2240
MicroRNA-like small RNAs (milRNAs), 21–22 nucleotides in length, are a type of small non-coding RNA first found in Neurospora crassa in 2010. Identifying milRNAs in species without genomic information is a difficult problem. Here, knowledge-based energy features are developed to identify milRNAs by incorporating the k-mer scheme and a distance-dependent pair potential. Compared with the k-mer scheme, the features developed here alleviate the curse of dimensionality inherent in the k-mer scheme once k becomes large. In addition, milRNApredictor, built on the novel features, performs comparably to the k-mer scheme and achieves a sensitivity of 74.21% and a specificity of 75.72% in 10-fold cross-validation. Furthermore, for novel miRNA prediction, the results from milRNApredictor overlap strongly with those of the state-of-the-art tool mirnovo. However, milRNApredictor is simpler to use, with reduced requirements for input data and dependencies. Taken together, milRNApredictor can be used for de novo identification of fungal milRNAs and other very short small RNAs of non-model organisms.

5.
《Genomics》2019,111(6):1298-1305
Based on the k-mer model for protein sequences, a novel k-mer natural vector method is proposed to characterize the features of k-mers in a protein sequence, taking into account both the numbers and the distributions of k-mers. It is proved that the relationship between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily without requiring evolutionary models or human intervention. However, there is no criterion for choosing a suitable k, and k has a great influence on both the results and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results obtained from phylogenetic analysis of three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improve computational efficiency.
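As a rough illustration of what a k-mer natural vector might record, the sketch below stores three numbers for each observed k-mer: its count, its mean position, and a normalized second central moment of its positions. The normalization (dividing by the count times the number of windows) follows the usual natural vector convention and is an assumption here, since the abstract gives no formulas. The function name and the toy peptide are hypothetical.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k=2):
    """Map each observed k-mer to (count, mean position, normalized 2nd central moment)."""
    n_windows = len(seq) - k + 1
    positions = defaultdict(list)
    for i in range(n_windows):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position of the window

    vector = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mu = sum(pos) / n
        # Assumed normalization: second central moment divided by n * n_windows.
        d2 = sum((p - mu) ** 2 for p in pos) / (n * n_windows)
        vector[kmer] = (n, mu, d2)
    return vector


# Toy peptide (hypothetical): the 2-mer "MK" occurs at positions 1 and 5.
nv = kmer_natural_vector("MKVLMKAL", k=2)
```

A compound vector in the sense of the abstract could plausibly be formed by concatenating the vectors for several values of k, although the exact construction is not stated in the source.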

6.

Background

Long noncoding RNAs (lncRNAs) are widely involved in the initiation and development of cancer. Although some computational methods have been proposed to identify cancer-related lncRNAs, there is still a need to improve prediction accuracy and efficiency. In addition, rapidly updated cancer data and newly discovered mechanisms create opportunities to improve cancer-related lncRNA prediction algorithms. In this study, we introduce CRlncRC, a novel Cancer-Related lncRNA Classifier that integrates manifold features with five machine-learning techniques.

Results

CRlncRC was built on the integration of four categories of features: genomic, expression, epigenetic and network. Five learning techniques were used to develop the classification model: Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR) and K-Nearest Neighbors (KNN). Using ten-fold cross-validation, we showed that RF is the best model for classifying cancer-related lncRNAs (AUC = 0.82). The feature importance analysis indicated that epigenetic and network features play key roles in the classification. In addition, compared with other existing classifiers, CRlncRC performed better in both sensitivity and specificity. We further applied CRlncRC to lncRNAs from the TANRIC (The Atlas of non-coding RNA in Cancer) dataset and identified 121 cancer-related lncRNA candidates. These candidates showed cancer-related indications, and many are supported by convincing evidence in the literature.

Conclusions

Our results indicate that CRlncRC is a powerful method for identifying cancer-related lncRNAs. Machine-learning-based integration of multiple features, especially epigenetic and network features, contributed greatly to cancer-related lncRNA prediction. RF outperformed the other learning techniques in model sensitivity and specificity. In addition, using CRlncRC, we predicted a set of cancer-related lncRNAs, all of which displayed strong relevance to cancer and offer a valuable starting point for further functional studies of cancer-related lncRNAs.

7.
Signal transduction by growth factor receptors is essential for cells to maintain proliferation and differentiation and requires tight control. Signal transduction is initiated by binding of an external ligand to a transmembrane receptor and activation of downstream signaling cascades. A key regulator of mitogenic signaling is Grb2, a modular protein composed of an internal SH2 (Src Homology 2) domain flanked by two SH3 domains that lacks enzymatic activity. Grb2 is constitutively associated with the GTPase Son-Of-Sevenless (SOS) via its N-terminal SH3 domain. The SH2 domain of Grb2 binds to growth factor receptors at phosphorylated tyrosine residues, thus coupling receptor activation to the SOS-Ras-MAP kinase signaling cascade. In addition, other roles for Grb2 as a positive or negative regulator of signaling and receptor endocytosis have been described. The modular composition of Grb2 suggests that it can dock to a variety of receptors and transduce signals along a multitude of different pathways [1-3]. Described here is a simple microscopy assay that monitors recruitment of Grb2 to the plasma membrane. It is adapted from an assay that measures changes in sub-cellular localization of green-fluorescent protein (GFP)-tagged Grb2 in response to a stimulus [4-6]. Plasma membrane receptors that bind Grb2, such as activated Epidermal Growth Factor Receptor (EGFR), recruit GFP-Grb2 to the plasma membrane upon cDNA expression and subsequently relocate to endosomal compartments in the cell. In order to identify in vivo protein complexes of Grb2, this technique can be used to perform a genome-wide high-content screen based on changes in Grb2 sub-cellular localization. The preparation of cDNA expression clones, transfection and image acquisition are described in detail below.
Compared to other genomic methods used to identify protein interaction partners, such as yeast two-hybrid, this technique allows the visualization of protein complexes in mammalian cells at the sub-cellular site of interaction by a simple microscopy-based assay. Hence both qualitative features, such as patterns of localization, and the quantitative strength of the interaction can be assessed.

8.
9.
Long non-coding RNAs (lncRNAs) have emerged as critical factors for regulating multiple biological processes during organ fibrosis. However, the mechanism of lncRNAs in idiopathic pulmonary fibrosis (IPF) remains incompletely understood. In the present study, two sets of lncRNAs were defined: IPF pathogenic lncRNAs and IPF progression lncRNAs. Co-expression networks of IPF pathogenic and progression lncRNAs and mRNAs were constructed to identify essential lncRNAs. Network analysis revealed a key lncRNA, CTD-2528L19.6, which was up-regulated in early-stage IPF compared to normal lung tissue and subsequently down-regulated during advanced-stage IPF. CTD-2528L19.6 was indicated to regulate fibroblast activation in IPF progression by mediating the expression of fibrosis-related genes including LRRC8C, DDIT4, THBS1, S100A8 and TLR7. Further studies showed that silencing of CTD-2528L19.6 increased the expression of Fn1 and Collagen I at both the mRNA and protein levels, promoted the transition of fibroblasts into myofibroblasts, and accelerated the migration and proliferation of MRC-5 cells. In contrast, CTD-2528L19.6 overexpression alleviated fibroblast activation induced by TGF-β1 in MRC-5 cells. LncRNA CTD-2528L19.6 inhibited fibroblast activation by regulating the expression of LRRC8C in in vitro assays. Our results suggest that CTD-2528L19.6 may prevent the progression of IPF from the early stage and alleviate fibroblast activation during the advanced stage of IPF. Thus, exploring the regulatory effect of lncRNA CTD-2528L19.6 may provide new insights for the prevention and treatment of IPF.
Subject terms: Mechanisms of disease, Non-coding RNAs

10.
11.
Long noncoding RNAs (lncRNAs) have emerged as major regulators of cell physiology, but many have no known function. CDKN1A/p21 is an important inhibitor of the cell cycle, a regulator of the DNA damage response and an effector of the tumor suppressor p53, playing a crucial role in tumor development and prevention. In order to identify a regulator of tumor progression, we performed an siRNA screen of human lncRNAs required for cell proliferation and identified a novel lncRNA, APTR, that acts in trans to repress the CDKN1A/p21 promoter independently of p53 to promote cell proliferation. APTR associates with the promoter of CDKN1A/p21, and this association requires a complementary Alu sequence encoded in APTR. A different module of APTR associates with and recruits the Polycomb repressive complex 2 (PRC2) to epigenetically repress the p21 promoter. A decrease in APTR is necessary for the induction of p21 after heat stress and DNA damage by doxorubicin, and the levels of APTR and p21 are anti-correlated in human glioblastomas. Our data identify a new regulator of the cell-cycle inhibitor CDKN1A/p21 that acts as a proliferative factor in cancer cell lines and in glioblastomas, and demonstrate that Alu elements present in lncRNAs can contribute to targeting regulatory lncRNAs to promoters.

12.
Schizophyllum commune has emerged as the most promising model mushroom for studying the mycelium and primordium developmental stages, two primary stages of fruit body development. Long non-coding RNAs (lncRNAs) have been shown to participate in fruit body development and sex differentiation in fungi. However, potential lncRNAs have not been identified in S. commune across the mycelium and primordium developmental stages. In this study, lncRNA-seq was performed in S. commune, and 61.56 Gb of clean data were generated from the mycelium and primordium developmental stages. In total, 191 lncRNAs were obtained, of which 49 were classified as differentially expressed: 26 up-regulated and 23 down-regulated between the mycelium and primordium libraries. Further, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that the differentially expressed lncRNAs target genes from the MAPK pathway, phosphatidylinositol signaling, ubiquitin-mediated proteolysis, autophagy, and the cell cycle. This study provides a new resource for further research on the relationship between lncRNAs and the two developmental stages (mycelium, primordium) in S. commune.

13.
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
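To make the gapped k-mer idea concrete, here is a minimal sketch that enumerates gapped features by brute force. Each window of length l contributes one feature for every choice of k informative positions, with the remaining positions replaced by a gap symbol. This is not the tree-based kernel computation the abstract describes; the function names, the parameter names `l` and `k`, the gap symbol and the cosine-style similarity are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2, gap="-"):
    """Count gapped k-mers: windows of length l with k informative positions kept."""
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            feature = "".join(c if j in keep_set else gap for j, c in enumerate(window))
            counts[feature] += 1
    return counts


def gkm_similarity(a, b, l=4, k=2):
    """Normalized dot product of the gapped k-mer count vectors of two sequences."""
    ca, cb = gapped_kmer_counts(a, l, k), gapped_kmer_counts(b, l, k)
    dot = sum(v * cb[f] for f, v in ca.items())
    norm = (sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())) ** 0.5
    return dot / norm if norm else 0.0


# A single window "ACGT" with l=4, k=2 yields C(4, 2) = 6 gapped features.
features = gapped_kmer_counts("ACGT", l=4, k=2)
```

Because a gapped feature tolerates mismatches at the gap positions, two sequences that share no exact 4-mer can still share many gapped features, which is the robustness the abstract argues for when k grows large. The brute-force enumeration scales poorly with l, which is presumably why the authors developed a dedicated tree structure.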

14.
Breast cancer, the most common cancer in women worldwide, is associated with high mortality. Long non-coding RNAs (lncRNAs), which have little protein-coding capacity, are playing an increasingly important role in the cancer paradigm. Accumulating evidence demonstrates that lncRNAs have crucial connections with breast cancer prognosis, while the study of lncRNAs in breast cancer is still at an early stage. In this study, we collected 1052 clinical patient samples, a comparatively large sample size, including 13 159 lncRNA expression profiles of breast invasive carcinoma (BRCA) from The Cancer Genome Atlas database to identify prognosis-related lncRNAs. We randomly separated these clinical patient samples into training and testing sets. In the training set, we performed univariable Cox regression analysis for primary screening and ran the robust likelihood-based survival model 1000 times. Then 11 lncRNAs with a frequency of more than 600 were selected for predicting the prognosis of BRCA. Using multivariate Cox regression analysis, we established a signature risk-score formula for the 11 lncRNAs to identify the relationship between the lncRNA signature and overall survival (OS). The 11-lncRNA signature was validated in both the testing set and the complete set and could effectively separate high- and low-risk groups with different OS. We also verified our results in different stages. Moreover, we analyzed the connection between the 11 lncRNAs and the genes ESR1, PGR, and Her2, whose protein products (ESR, PGR, and HER2) are widely used to classify breast cancer subtypes. The results indicated correlations between the 11 lncRNAs and the genes PGR and ESR1. Thus, a prognostic model based on the expression of 11 lncRNAs was developed to classify the BRCA clinical patient samples, providing new avenues for understanding potential therapeutic methods for breast cancer.
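A signature risk-score formula of the kind mentioned above is commonly a weighted sum of expression values, with the weights taken from the multivariate Cox model. The sketch below shows that arithmetic only. The lncRNA names, coefficients, expression values and the median cutoff for the high- and low-risk groups are all hypothetical, since the abstract reports none of them.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Risk score per sample: sum over signature genes of coefficient * expression."""
    return {
        sample: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for sample, values in expression.items()
    }


def split_by_median(scores):
    """Label samples 'high' if their score exceeds the median score, else 'low'."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Hypothetical two-lncRNA signature and four samples, for illustration only.
coefficients = {"lncA": 0.5, "lncB": -0.25}
expression = {
    "s1": {"lncA": 2.0, "lncB": 4.0},
    "s2": {"lncA": 4.0, "lncB": 0.0},
    "s3": {"lncA": 0.0, "lncB": 4.0},
    "s4": {"lncA": 2.0, "lncB": 0.0},
}

scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```

In practice the coefficients would be fitted on the training set only and then applied unchanged to the testing set, which matches the validation order the abstract describes.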

15.
Hu  Jialu  Gao  Yiqun  Li  Jing  Zheng  Yan  Wang  Jingru  Shang  Xuequn 《BMC bioinformatics》2019,20(18):1-12
Background

Identifying cancer genes is an urgent task: it enables us to understand the mechanisms of biochemical processes at the biomolecular level and facilitates the development of bioinformatics. Although a large number of methods have recently been proposed to identify cancer genes, most of them use only a limited amount of biological data and therefore give insufficient consideration to the many factors that link genes and diseases.

Results

In this paper, we propose a two-round random walk algorithm to identify cancer genes based on multiple biological data (TRWR-MB), including a protein-protein interaction (PPI) network, a pathway network, a microRNA similarity network, an lncRNA similarity network, a cancer similarity network and protein complexes. In the first-round random walk, all cancer nodes, together with the cancer-related genes, microRNAs and lncRNAs associated with any cancer, are used as seed nodes, and a random walker walks on a four-layer heterogeneous network constructed from the multiple biological data. The first round aims to select the top k scoring potential cancer genes. In the second-round random walk, the genes, microRNAs and lncRNAs associated with a specific cancer in the corresponding cancer class are used as seed nodes, and the walker walks on a new four-layer heterogeneous network constructed from lncRNAs, microRNAs, cancers and the selected potential cancer genes. After both walks finish, we combine the results of the two rounds of random walk with restart (RWR) into a ranking score for experimental analysis. As a result, a higher area under the receiver operating characteristic curve (AUC) is obtained. In addition, case studies on identifying new cancer genes are presented in the corresponding section.

Conclusion

In summary, TRWR-MB integrates multiple biological data to identify cancer genes by analyzing the relationship between genes and cancer from a variety of molecular biology perspectives.
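The core operation in the abstract above, a random walk with restart, can be sketched on a single small network. At each step the walker either jumps back to a seed node with probability `restart` or moves to a uniformly chosen neighbor. This is a generic RWR under simplifying assumptions, not the TRWR-MB algorithm itself: it uses one unweighted layer instead of a four-layer heterogeneous network, and the node names, restart probability and convergence tolerance are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at the seeds."""
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seed distribution.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # Isolated node: the non-restart mass stays where it is.
                nxt[u] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path network g1 - g2 - g3 - g4 with g1 as the only seed.
network = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
scores = random_walk_with_restart(network, seeds={"g1"}, restart=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
```

A two-round procedure in the spirit of the abstract would call such a function once with all cancer-related seeds, keep the top k scoring genes, and call it again on the reduced network with seeds for one specific cancer. How the two score vectors are combined is not specified in the source.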


16.
Multiwavelength spectroscopy is a rapid analytical technique that can be applied to detect, identify, and quantify microorganisms such as Karenia brevis, the species known for frequent red-tide blooms in Florida's coastal waters. This research reports on a model-based interpretation of UV–vis spectra of K. brevis. The spectroscopy models are based on light scattering and absorption theories and on an approximation of the frequency-dependent optical properties of the basic constituents of living organisms. Properties of K. brevis that govern absorption and scattering, such as cell size and shape, internal structure, and chemical composition, are shown to predict the spectral features observed in the measured spectra. The parameters for the interpretation model were based on both reported literature values and experimental values obtained from live cultures and pigment standards. Measured and mathematically derived spectra were compared to determine the adequacy of the model, to contribute new spectral information, and to establish the proposed spectral interpretation approach as a new detection method for K. brevis.

17.
18.
19.
INTRODUCTION: The molecular mechanisms underlying aggressive versus indolent disease are not fully understood. Recent research has implicated a class of molecules known as long noncoding RNAs (lncRNAs) in tumorigenesis and progression of cancer. Our objective was to discover lncRNAs that differentiate aggressive and indolent prostate cancers. METHODS: We analyzed paired tumor and normal tissues from six aggressive Gleason score (GS) 8-10 and six indolent GS 6 prostate cancers. Extracted RNA was split for poly(A)+ and ribosomal RNA depletion library preparations, followed by RNA sequencing (RNA-Seq) using an Illumina HiSeq 2000. We developed an RNA-Seq data analysis pipeline to discover and quantify these molecules. Candidate lncRNAs were validated using RT-qPCR on 87 tumor tissue samples: 28 (GS 6), 28 (GS 3+4), 6 (GS 4+3), and 25 (GS 8-10). Statistical correlations between lncRNAs and clinicopathologic variables were tested using ANOVA. RESULTS: The 43 differentially expressed (DE) lncRNAs between aggressive and indolent prostate cancers included 12 annotated and 31 novel lncRNAs. The top six DE lncRNAs were selected based on large, consistent fold-changes in the RNA-Seq results. Three of these candidates passed RT-qPCR validation, including AC009014.3 (P < .001 in tumor tissue) and a newly discovered X-linked lncRNA named XPLAID (P = .049 in tumor tissue and P = .048 in normal tissue). XPLAID and AC009014.3 show promise as prognostic biomarkers. CONCLUSIONS: We discovered several dozen lncRNAs that distinguish aggressive and indolent prostate cancers, of which four were validated using RT-qPCR. The investigation into their biology is ongoing.

20.