Similar literature (20 results)
1.

Background

The human genome contains millions of single nucleotide polymorphisms (SNPs); many of these SNPs are intronic and have unknown functional significance. SNPs occurring within intron branchpoint sites, especially at the adenine (A), would presumably affect splicing; however, this has not been systematically studied. We employed a splicing prediction tool to identify human intron branchpoint sites and screened dbSNP to identify SNPs located in the predicted sites, generating a genome-wide branchpoint site SNP database.

Results

We identified 600 SNPs located within branchpoint sites, of which 216 showed a change in the A. After scoring the SNPs by counting the As in the ±10 nucleotide region, only four SNPs were identified without additional As (rs13296170, rs12769205, rs75434223, and rs67785924). Using minigene constructs, we examined the effects of these SNPs on splicing. The three SNPs (rs13296170, rs12769205, and rs75434223) with a nucleotide substitution at the A position resulted in abnormal splicing (exon skipping and/or intron inclusion). However, rs67785924, a 5-bp deletion that abolished the branchpoint A nucleotide, exhibited a normal RNA splicing pattern, presumably using two of the downstream As as alternative branchpoints. The influence of additional As on splicing was further confirmed by studying rs2733532, which contains three additional As in the ±10 nucleotide region.
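
The scoring step described above (counting additional As within ±10 nucleotides of the branchpoint A) can be illustrated with a minimal sketch. The function name, the 0-based indexing, and the example sequences are assumptions made for illustration; the abstract does not describe the authors' actual implementation.

```python
def count_additional_as(sequence: str, branchpoint_index: int, window: int = 10) -> int:
    """Count A nucleotides within +/- `window` positions of the branchpoint,
    excluding the branchpoint position itself (0-based index, an assumption)."""
    seq = sequence.upper()
    start = max(0, branchpoint_index - window)
    end = min(len(seq), branchpoint_index + window + 1)
    return sum(
        1
        for i in range(start, end)
        if i != branchpoint_index and seq[i] == "A"
    )


# Hypothetical intron fragments; the branchpoint A sits at index 10 in both.
no_backup = "CCCCCCCCCCACCCCCCCCCC"
with_backup = "CCACCCCCCCACCCACCCCCC"
print(count_additional_as(no_backup, 10))
print(count_additional_as(with_backup, 10))
```

Under this scheme, a score of zero would flag SNPs such as the four listed above, where no nearby A is available as an alternative branchpoint.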

Conclusions

We generated a high-confidence genome-wide branchpoint site SNP database, experimentally verified the importance of A in the branchpoint, and suggested that other nearby As can protect branchpoint A substitution from abnormal splicing.

2.
3.
Gao S, Xu S, Fang Y, Fang J. Proteome Science, 2012, 10(Z1): S7

Background

Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.

Methods

This study presents a multitask learning framework that learns four kinase families simultaneously, instead of studying the phosphorylation sites of each kinase family separately. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).

Results

Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.

Conclusions

The selected features can be used to build efficient multitask classifiers with good performance, suggesting they are important to protein phosphorylation across the four kinase families.

4.

Introduction

Chromosomal anomalies (CA) are the most frequent fetal anomalies.

Objective

To evaluate the diagnostic performance of a machine learning ensemble model based on the maternal serum metabolomic fingerprint of fetal aneuploidies during the second trimester.

Methods

This is a case-control pilot study. Metabolomic profiles were obtained from the serum of 328 mothers (220 controls and 108 cases), using gas chromatography coupled to mass spectrometry. Eight machine learning and classification models were built and optimized. An ensemble model was built using a voting scheme. All samples were randomly divided into two sets: one was used as the training set, the other for diagnostic performance assessment.
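
As an illustration of an ensemble built with a voting scheme, the following is a minimal sketch of plain majority voting over per-model class labels. The abstract does not specify which voting rule was used, so the unweighted majority rule, the tie-breaking behaviour, and all names here are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions into one label per sample.

    `predictions` is a list with one entry per model; each entry is a list of
    labels, one per sample. Ties are resolved in favour of the label that was
    encountered first (i.e. from the earliest model), because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    n_samples = len(predictions[0])
    consensus = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        consensus.append(votes.most_common(1)[0][0])
    return consensus


# Three hypothetical models voting on three samples.
model_outputs = [
    ["case", "control", "case"],
    ["case", "case", "control"],
    ["control", "control", "case"],
]
print(majority_vote(model_outputs))
```

With an even number of base models, such as the eight used here, ties are possible, so a real pipeline would need an explicit tie-breaking or weighting rule.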

Results

The ensemble machine learning model correctly classified all cases and controls. The accuracy was the same for trisomy 21 and 18; the other CA were also correctly detected. Elaidic, stearic, linolenic, myristic, benzoic, citric and glyceric acid, mannose, 2-hydroxy butyrate, phenylalanine, proline, alanine and 3-methyl histidine were selected as the most relevant metabolites in class separation.

Conclusion

The proposed model, based on the maternal serum metabolomic fingerprint of fetal aneuploidies during the second trimester, correctly identifies all the cases of chromosomal abnormalities. Overall, this preliminary analysis suggests a metabolic environment conducive to increased oxidative stress and a disturbance in fetal central nervous system development. Maternal serum metabolomics can be a promising tool in the screening of chromosomal defects. Moreover, metabolomics allows us to extend our knowledge of the biochemical alterations caused by aneuploidies and responsible for the observed phenotypes.

5.

Background

Hot spot residues are functional sites in protein interaction interfaces. Identifying hot spot residues with experimental methods is time-consuming and laborious. To address this issue, many computational methods have been developed to predict hot spot residues. Most of these prediction methods are based on structural features, sequence characteristics, and/or other protein features.

Results

This paper proposed an ensemble learning method to predict hot spot residues that only uses sequence features and the relative accessible surface area of amino acid sequences. In this work, a novel feature selection technique was developed, an auto-correlation function combined with a sliding window technique was applied to obtain the characteristics of amino acid residues in protein sequence, and an ensemble classifier with SVM and KNN base classifiers was built to achieve the best classification performance.
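
To make the idea of an auto-correlation function combined with a sliding window concrete, here is a minimal sketch that computes lagged autocovariance features over a window of per-residue property values centred on a target residue. The normalisation, window size, lag range, and function names are assumptions; the abstract does not give the exact formulation used in the paper.

```python
def autocorrelation(values, lag):
    """Mean-centred autocovariance of `values` at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    total = sum((values[j] - mean) * (values[j + lag] - mean) for j in range(n - lag))
    return total / (n - lag)


def window_features(values, center, half_width, max_lag):
    """Autocorrelation features for the window around residue `center`.

    The window is clipped at the sequence ends, and lags that do not fit
    inside the clipped window are skipped.
    """
    start = max(0, center - half_width)
    end = min(len(values), center + half_width + 1)
    window = values[start:end]
    usable_lags = min(max_lag, len(window) - 1)
    return [autocorrelation(window, lag) for lag in range(1, usable_lags + 1)]


# Hypothetical per-residue property values (e.g. a hydrophobicity scale).
profile = [1.0, -1.0, 1.0, -1.0]
print(window_features(profile, center=1, half_width=5, max_lag=2))
```

Each residue then yields a fixed-length feature vector (one value per lag) that a base classifier such as SVM or KNN could consume.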

Conclusion

The experimental results showed that our model yields the highest F1 score of 0.92 and an MCC value of 0.87 on the ASEdb dataset. Compared with other machine learning methods, our model achieves a substantial improvement in hot spot prediction.
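
For reference, the two metrics reported above can be computed from confusion-matrix counts using their standard definitions. The sketch below uses hypothetical counts, not data from the study.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts for a binary hot spot classifier.
print(f1_score(tp=8, fp=2, fn=2))
print(mcc(tp=8, tn=8, fp=2, fn=2))
```

Unlike F1, MCC also accounts for true negatives, which is why the two are often reported together on imbalanced datasets.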

6.
7.

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advancements of single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. Subsequently, a challenging computational problem is to cluster high dimensional noisy datasets with substantially fewer cells than the number of genes.

Methods

In this paper, we introduced a consensus clustering framework, conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions into consensus clusters.
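
One common way to fuse several basic partitions into consensus clusters is a co-association matrix, sketched below. This is a generic illustration of the ensemble idea and is not claimed to be conCluster's actual fusion procedure; the threshold and the connected-components step are assumptions.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of cells shares a cluster."""
    n = len(partitions[0])
    weight = 1.0 / len(partitions)
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight
    return matrix


def consensus_clusters(partitions, threshold=0.5):
    """Group cells connected by co-association values above `threshold`."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        # Depth-first search over the thresholded co-association graph.
        labels[seed] = next_label
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three hypothetical basic partitions of four cells.
base_partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(base_partitions))
```

Because only co-membership is compared, the arbitrary cluster label names used by each basic partition do not need to be aligned beforehand.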

Results

Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.

8.
9.
Lyu Chuqiao, Wang Lei, Zhang Juhua. BMC Genomics, 2018, 19(10): 905-165

Background

DNase I hypersensitive sites (DHSs) are associated with cis-regulatory DNA elements. An efficient method of identifying DHSs can enhance the understanding of chromatin accessibility. Despite a multitude of resources available online, including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.

Methods

Here, we address this challenge using an approach based on a state-of-the-art machine learning method. We present a novel convolutional neural network (CNN) that combines Inception-like networks with a gating mechanism, capturing the response of multiple patterns and long-term associations in DNA sequences, to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.

Results

Our method obtains 0.961 area under curve (AUC) on Arabidopsis, 0.969 AUC on rice and 0.918 AUC on Homo sapiens.
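
The AUC values above can be read as the probability that a randomly chosen positive sequence receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical and unrelated to the study's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two DHS (1) and two non-DHS (0) sequences.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The quadratic pairwise loop is chosen for clarity; rank-based formulations give the same value more efficiently on large datasets.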

Conclusions

Our method provides an efficient and accurate way to identify multi-scale DHS sequences by deep learning.

10.

Background

Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and aid in functional annotation; however, little work has been done on predicting heme binding residues from protein sequence information.

Methods

We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.
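
A feature selection scheme that combines a greedy strategy with correlation analysis could look like the following sketch: candidate properties are visited in order of a relevance score, and a property is kept only if it is not strongly correlated with any property already selected. The relevance scores, the 0.9 threshold, and the function names are assumptions; the abstract does not detail the authors' exact scheme.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def greedy_select(features, relevance, max_corr=0.9):
    """Greedily keep the most relevant, mutually non-redundant features.

    `features` maps a feature name to its list of values; `relevance` maps
    the same names to a score (higher is better).
    """
    selected = []
    for name in sorted(features, key=lambda k: relevance[k], reverse=True):
        redundant = any(
            abs(pearson(features[name], features[kept])) >= max_corr
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical physicochemical properties measured over four residues.
properties = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with "a"
    "c": [1.0, -1.0, 1.0, -1.0],
}
scores = {"a": 0.9, "b": 0.8, "c": 0.5}
print(greedy_select(properties, scores))
```

Dropping redundant properties in this way is one route to the reduced feature vector mentioned in the conclusions.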

Results

Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests.

Conclusions

The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.

11.

Background

Adverse drug reactions (ADRs) are unintended and harmful reactions caused by normal uses of drugs. Predicting and preventing ADRs in the early stage of the drug development pipeline can help to enhance drug safety and reduce financial costs.

Methods

In this paper, we developed machine learning models, including a deep learning framework, that can simultaneously predict ADRs and identify the molecular substructures associated with those ADRs without defining the substructures a priori.

Results

We evaluated the performance of our model against ten different state-of-the-art fingerprint models and found that neural fingerprints from the deep learning model outperformed all other methods in predicting ADRs. Via feature analysis on drug structures, we identified important molecular substructures that are associated with specific ADRs and assessed their associations via statistical analysis.

Conclusions

The deep learning model with feature analysis, substructure identification, and statistical assessment provides a promising solution for identifying risky components within molecular structures and can potentially help to improve drug safety evaluation.

12.

Background

Altered expression of mRNA splicing factors occurs with ageing in vivo and is thought to be an ageing mechanism. The accumulation of senescent cells also occurs in vivo with advancing age and causes much degenerative age-related pathology. However, the relationship between these two processes is opaque. Accordingly, we developed a novel panel of small molecules based on resveratrol, previously suggested to alter mRNA splicing, to determine whether altered splicing factor expression had the potential to influence features of replicative senescence.

Results

Treatment with resveralogues was associated with altered splicing factor expression and rescue of multiple features of senescence. This rescue was independent of cell cycle traverse and also independent of SIRT1, SASP modulation or senolysis. Under growth permissive conditions, cells demonstrating restored splicing factor expression also demonstrated increased telomere length, re-entered cell cycle and resumed proliferation. These phenomena were also influenced by ERK antagonists and agonists.

Conclusions

This is the first demonstration that moderation of splicing factor levels is associated with reversal of cellular senescence in human primary fibroblasts. Small molecule modulators of such targets may therefore represent promising novel anti-degenerative therapies.

13.

Introduction

Collecting feces is easy. It offers direct access to endogenous and microbial metabolites.

Objectives

In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.

14.

Background

Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein folding, and therefore advancing the annotation of protein functions.

Results

In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average sequence profile of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC.
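
Two of the ingredients described above, averaging profiles into a sequence profile center and randomly undersampling the negative data to match the positive set size, can be sketched as follows. The genetic algorithm classifier itself is not reproduced here; the vector representation of profiles and all function names are assumptions.

```python
import random


def profile_center(profiles):
    """Element-wise mean of equally long profile vectors (one SPC)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def balanced_negative_sample(negatives, n_positive, rng):
    """Randomly draw about as many negatives as there are positives."""
    k = min(n_positive, len(negatives))
    return rng.sample(negatives, k)


# Hypothetical profile vectors for contact (positive) and non-contact pairs.
positives = [[1.0, 2.0], [3.0, 4.0]]
negatives = [[float(i), float(-i)] for i in range(10)]

# Each ensemble member gets its own balanced negative subset.
for seed in range(3):
    subset = balanced_negative_sample(negatives, len(positives), random.Random(seed))
    print(profile_center(positives), profile_center(subset))
```

Drawing a different negative subset per ensemble member lets the ensemble see more of the imbalanced negative data than any single classifier does.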

Conclusions

Classifiers with the use of sequence profile centers may advance the long-range contact prediction. In line with this approach, key structural features in proteins would be determined with high efficiency and accuracy.

15.

Introduction

Prostate cancer (PCa) is one of the most common malignancies in men worldwide. Serum prostate specific antigen (PSA) level has been extensively used as a biomarker to detect PCa. However, PSA is not cancer-specific and various non-malignant conditions, including benign prostatic hyperplasia (BPH), can cause a rise in PSA blood levels, thus leading to many false positive results.

Objectives

In this study, we evaluated the potential of urinary metabolomic profiling for discriminating PCa from BPH.

Methods

Urine samples from 64 PCa patients and 51 individuals diagnosed with BPH were analysed using 1H nuclear magnetic resonance (1H-NMR). Comparative analysis of urinary metabolomic profiles was carried out using multivariate and univariate statistical approaches.

Results

The urine metabolomic profile of PCa patients is characterised by increased concentrations of branched-chain amino acids (BCAA), glutamate and pseudouridine, and decreased concentrations of glycine, dimethylglycine, fumarate and 4-imidazole-acetate compared with individuals diagnosed with BPH.

Conclusion

PCa patients have a specific urinary metabolomic profile. The results of our study underscore the clinical potential of metabolomic profiling to uncover metabolic changes that could be useful to discriminate PCa from BPH in a clinical context.

16.
17.

Background:

The wide availability of genome-scale data for several organisms has stimulated interest in computational approaches to gene function prediction. Diverse machine learning methods have been applied to unicellular organisms with some success, but few have been extensively tested on higher level, multicellular organisms. A recent mouse function prediction project (MouseFunc) brought together nine bioinformatics teams applying a diverse array of methodologies to mount the first large-scale effort to predict gene function in the laboratory mouse.

Results:

In this paper, we describe our contribution to this project, an ensemble framework based on the support vector machine that integrates diverse datasets in the context of the Gene Ontology hierarchy. We carry out a detailed analysis of the performance of our ensemble and provide insights into which methods work best under a variety of prediction scenarios. In addition, we applied our method to Saccharomyces cerevisiae and have experimentally confirmed functions for a novel mitochondrial protein.

Conclusion:

Our method consistently performs among the top methods in the MouseFunc evaluation. Furthermore, it exhibits good classification performance across a variety of cellular processes and functions in both a multicellular organism and a unicellular organism, indicating its ability to discover novel biology in diverse settings.

18.

Background

Inflammatory bowel disease is a group of pathologies characterised by chronic inflammation of the intestine and an unclear aetiology. Its main manifestations are Crohn’s disease and ulcerative colitis. Currently, biopsies are the most used diagnostic tests for these diseases and metabolomics could represent a less invasive approach to identify biomarkers of disease presence and progression.

Objectives

The lipid and polar metabolite profiles of plasma samples from patients affected by inflammatory bowel disease were compared with those of healthy individuals, with the aim of finding their metabolomic differences. In addition, a selected subset of samples was analysed following solid phase extraction to further characterise differences between pathological samples.

Methods

A total of 200 plasma samples were analysed using drift tube ion mobility coupled with time of flight mass spectrometry and liquid chromatography for the lipid metabolite profile analysis, while liquid chromatography coupled with triple quadrupole mass spectrometry was used for the polar metabolite profile analysis.

Results

Variations in the lipid profile between inflammatory bowel disease and healthy individuals were highlighted. Phosphatidylcholines, lyso-phosphatidylcholines and fatty acids were significantly changed among pathological samples suggesting changes in phospholipase A2 and arachidonic acid metabolic pathways. Variations in the levels of cholesteryl esters and glycerophospholipids were also found. Furthermore, a decrease in amino acids levels suggests mucosal damage in inflammatory bowel disease.

Conclusions

Given the good statistical results and predictive power of the model produced in our study, metabolomics can be considered a valid tool to investigate inflammatory bowel disease.

19.

Background

Hutchinson-Gilford progeria syndrome (HGPS) is a devastating premature aging disorder. It arises from a single point mutation in the LMNA gene. This mutation stimulates an aberrant splicing event and produces progerin, an isoform of the lamin A protein. Accumulation of progerin disrupts numerous physiological pathways and induces defects in nuclear architecture, gene expression, histone modification, cell cycle regulation, mitochondrial functionality, genome integrity and much more.

Objective

Among these phenotypes, genomic instability is tightly associated with physiological aging and considered a main contributor to the premature aging phenotypes. However, our understanding of the underlying molecular mechanisms of progerin-caused genome instability is far from clear.

Results and Conclusion

In this review, we summarize some of the recent findings and discuss potential mechanisms through which progerin affects DNA damage repair and compromises genome integrity.

20.

Background

Central nervous system anomalies represent a wide range of congenital birth defects, with an incidence of approximately 1% of all births. They are currently diagnosed using ultrasound evaluation. However, there is a strong need for a more accurate and less operator-dependent screening method.

Objectives

To perform a characterization of maternal serum in order to build a metabolomic fingerprint resulting from congenital anomalies of the central nervous system.

Methods

This is a case–control pilot study. Metabolomic profiles were obtained from the serum of 168 mothers (98 controls and 70 cases), using gas chromatography coupled to mass spectrometry. Nine machine learning and classification models were built and optimized. An ensemble model was built based on results from the individual models. All samples were randomly divided into two groups: one was used as the training set, the other for diagnostic performance assessment.

Results

The ensemble machine learning model correctly classified all cases and controls. Propanoic, lactic, gluconic, benzoic, oxalic, 2-hydroxy-3-methylbutyric, acetic, lauric, myristic and stearic acid, as well as myo-inositol and mannose, were selected as the most relevant metabolites in class separation.

Conclusion

The metabolomic signature of second trimester maternal serum from pregnancies affected by a fetal central nervous system anomaly is quantifiably different from that of a normal pregnancy. Maternal serum metabolomics is therefore a promising tool for the accurate and sensitive screening of such congenital defects. Moreover, the details of the most relevant metabolites and their respective biochemical pathways allow better understanding of the overall pathophysiology of affected pregnancies.
