首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.

Background

Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces, and thus more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored for analyzing specific complex motifs motivated by biological questions and hypotheses rather than acting as a screen based motif finding tool.

Methods

We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank based statistics. A modular setup and fast analytic p value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.

Results

We demonstrate use cases of combined motifs on simulated data and on expression data from micro RNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.

Conclusions

Regmex is a useful and flexible tool to analyze motif hypotheses that relates to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).
  相似文献   

2.

Background

Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data in order to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of similar content and distributions of predicted statistically significant sequence motifs in their upstream regions.

Results

We applied our method to several sets of co-expressed genes and were able to define subsets with enrichment in particular biological processes and specific upstream regulatory motifs.

Conclusions

These results show the potential of our technique for functional prediction and regulatory motif identification from microarray data.
  相似文献   

3.
4.

Background

The fundamental challenge in optimally aligning homologous sequences is to define a scoring scheme that best reflects the underlying biological processes. Maximising the overall number of matches in the alignment does not always reflect the patterns by which nucleotides mutate. Efficiently implemented algorithms that can be parameterised to accommodate more complex non-linear scoring schemes are thus desirable.

Results

We present Cola, alignment software that implements different optimal alignment algorithms, also allowing for scoring contiguous matches of nucleotides in a nonlinear manner. The latter places more emphasis on short, highly conserved motifs, and less on the surrounding nucleotides, which can be more diverged. To illustrate the differences, we report results from aligning 14,100 sequences from 3' untranslated regions of human genes to 25 of their mammalian counterparts, where we found that a nonlinear scoring scheme is more consistent than a linear scheme in detecting short, conserved motifs.

Conclusions

Cola is freely available under LPGL from https://github.com/nedaz/cola.
  相似文献   

5.

Background

DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the “words” based only on the DNA sequences.

Methods

We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract “DNA words” that have potential functional meaning. A Kolmogorov-Smirnov test was used for consistency test of uniform distribution of DNA sequences, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative control, and an English book was used as positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods.

Results

The results provide strong evidences that the algorithm is a promising tool for ab initio building a DNA dictionary.

Conclusions

Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
  相似文献   

6.
Lyu  Chuqiao  Wang  Lei  Zhang  Juhua 《BMC genomics》2018,19(10):905-165

Background

The DNase I hypersensitive sites (DHSs) are associated with the cis-regulatory DNA elements. An efficient method of identifying DHSs can enhance the understanding on the accessibility of chromatin. Despite a multitude of resources available on line including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.

Methods

Here, we address this challenge using an approach based on a state-of-the-art machine learning method. We present a novel convolutional neural network (CNN) which combined Inception like networks with a gating mechanism for the response of multiple patterns and longterm association in DNA sequences to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.

Results

Our method obtains 0.961 area under curve (AUC) on Arabidopsis, 0.969 AUC on rice and 0.918 AUC on Homo sapiens.

Conclusions

Our method provides an efficient and accurate way to identify multi-scale DHSs sequences by deep learning.
  相似文献   

7.
8.

Introduction

Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.

Objectives

In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a step filtration (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.
  相似文献   

9.

Background

Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder are achieving per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in human genome we observed a big difference in predicted disorder content between confirmed and putative human proteins. We investigated a hypothesis that this discrepancy is not correct, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which lead to high predicted disorder content.

Methods

To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment.

Results

Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins.

Conclusions

Our findings provide the first evidence that current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.
  相似文献   

10.

Introduction

Metabolomics has become a valuable tool in many research areas. However, generating metabolomics-based biochemical profiles without any related bioactivity is only of indirect value in understanding a biological process. Therefore, metabolomics research could greatly benefit from tools that directly determine the bioactivity of the detected compounds.

Objective

We aimed to combine LC–MS metabolomics with a cell based receptor assay. This combination could increase the understanding of biological processes and may provide novel opportunities for functional metabolomics.

Methods

We developed a flow through biosensor with human cells expressing both the TRPV1, a calcium ion channel which responds to capsaicin, and the fluorescent intracellular calcium ion reporter, YC3.6. We have analysed three contrasting Capsicum varieties. Two were selected with contrasting degrees of spiciness for characterization by HPLC coupled to high mass resolution MS. Subsequently, the biosensor was then used to link individual pepper compounds with TRPV1 activity.

Results

Among the compounds in the crude pepper fruit extracts, we confirmed capsaicin and also identified both nordihydrocapsaicin and dihydrocapsaicin as true agonists of the TRPV1 receptor. Furthermore, the biosensor was able to detect receptor activity in extracts of both Capsicum fruits as well as a commercial product. Sensitivity of the biosensor to this commercial product was similar to the sensory threshold of a human sensory panel.

Conclusion

Our results demonstrate that the TRPV1 biosensor is suitable for detecting bioactive metabolites. Novel opportunities may lie in the development of a continuous functional assay, where the biosensor is directly coupled to the LC–MS.
  相似文献   

11.

Background

Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs in IDPs using computational methods is a challenging task.

Methods

In this study, we introduce hidden Markov model (HMM) profiles to accurately identify the location of MoRFs in disordered protein sequences. Using windowing technique, HMM profiles are utilised to extract features from protein sequences and support vector machines (SVM) are used to calculate a propensity score for each residue. Two different SVM kernels with high noise tolerance are evaluated with a varying window size and the scores of the SVM models are combined to generate the final propensity score to predict MoRF residues. The SVM models are designed to extract maximal information between MoRF residues, its neighboring regions (Flanks) and the remainder of the sequence (Others).

Results

To evaluate the proposed method, its performance was compared to that of other MoRF predictors; MoRFpred and ANCHOR. The results show that the proposed method outperforms these two predictors.

Conclusions

Using HMM profile as a source of feature extraction, the proposed method indicates improvement in predicting MoRFs in disordered protein sequences.
  相似文献   

12.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies, were reanalyzed.

Results

Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.

Conclusion

Incomplete raw data processing shows unexplored potential of current and legacy data.
  相似文献   

13.

Background

Pattern mining for biological sequences is an important problem in bioinformatics and computational biology. Biological data mining yield impact in diverse biological fields, such as discovery of co-occurring biosequences, which is important for biological data analyses. The approaches of mining sequential patterns can discover all-length motifs of biological sequences. Nevertheless, traditional approaches of mining sequential patterns inefficiently mine DNA and protein data since the data have fewer letters and lengthy sequences. Furthermore, gap constraints are important in computational biology since they cope with irrelative regions, which are not conserved in evolution of biological sequences.

Results

We devise an approach to efficiently mine sequential patterns (motifs) with gap constraints in biological sequences. The approach is the Depth-First Spelling algorithm for mining sequential patterns of biological sequences with Gap constraints (termed DFSG).

Conclusions

PrefixSpan is one of the most efficient methods in traditional approaches of mining sequential patterns, and it is the basis of GenPrefixSpan. GenPrefixSpan is an approach built on PrefixSpan with gap constraints, and therefore we compare DFSG with GenPrefixSpan. In the experimental results, DFSG mines biological sequences much faster than GenPrefixSpan.
  相似文献   

14.

Background

Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences and these can be exploited to perform an unbiased SSU--based taxonomic analysis.

Results

Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.

Conclusions

RiboTagger permits fast recovery of SSU-RNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.
  相似文献   

15.

Background

It is unclear to what extent pre-clinical studies in genetically homogeneous animal models of amyotrophic lateral sclerosis (ALS), an invariably fatal neurodegenerative disorder, can be informative of human pathology. The disease modifying effects in animal models of most therapeutic compounds have not been reproduced in patients. To advance therapeutics in ALS, we need easily accessible disease biomarkers which can discriminate across the phenotypic variants observed in ALS patients and can bridge animal and human pathology. Peripheral blood mononuclear cells alterations reflect the rate of progression of the disease representing an ideal biological substrate for biomarkers discovery.

Methods

We have applied TMTcalibrator?, a novel tissue-enhanced bio fluid mass spectrometry technique, to study the plasma proteome in ALS, using peripheral blood mononuclear cells as tissue calibrator. We have tested slow and fast progressing SOD1G93A mouse models of ALS at a pre-symptomatic and symptomatic stage in parallel with fast and slow progressing ALS patients at an early and late stage of the disease. Immunoassays were used to retest the expression of relevant protein candidates.

Results

The biological features differentiating fast from slow progressing mouse model plasma proteomes were different from those identified in human pathology, with only processes encompassing membrane trafficking with translocation of GLUT4, innate immunity, acute phase response and cytoskeleton organization showing enrichment in both species. Biological processes associated with senescence, RNA processing, cell stress and metabolism, major histocompatibility complex-II linked immune-reactivity and apoptosis (early stage) were enriched specifically in fast progressing ALS patients. Immunodetection confirmed regulation of the immunosenescence markers Galectin-3, Integrin beta 3 and Transforming growth factor beta-1 in plasma from pre-symptomatic and symptomatic transgenic animals while Apolipoprotein E differential plasma expression provided a good separation between fast and slow progressing ALS patients.

Conclusions

These findings implicate immunosenescence and metabolism as novel targets for biomarkers and therapeutic discovery and suggest immunomodulation as an early intervention. The variance observed in the plasma proteomes may depend on different biological patterns of disease progression in human and animal model.
  相似文献   

16.

Background

Synthetic biology aims to engineer biological systems for desired behaviors. The construction of these systems can be complex, often requiring genetic reprogramming, extensive de novo DNA synthesis, and functional screening.

Results

Herein, we present a programmable, multipurpose microfluidic platform and associated software and apply the platform to major steps of the synthetic biology research cycle: design, construction, testing, and analysis. We show the platform’s capabilities for multiple automated DNA assembly methods, including a new method for Isothermal Hierarchical DNA Construction, and for Escherichia coli and Saccharomyces cerevisiae transformation. The platform enables the automated control of cellular growth, gene expression induction, and proteogenic and metabolic output analysis.

Conclusions

Taken together, we demonstrate the microfluidic platform’s potential to provide end-to-end solutions for synthetic biology research, from design to functional analysis.
  相似文献   

17.
18.
19.

Introduction

Understanding the changes occurring in the oral ecosystem during development of gingivitis could help improve prevention and treatment strategies for oral health. Erythritol is a non-caloric polyol proposed to have beneficial effects on oral health.

Objectives

To examine the effect of experimental gingivitis and the effect of erythritol on the salivary metabolome and salivary functional biochemistry.

Methods

In a two-week experimental gingivitis challenge intervention study, non-targeted, mass spectrometry-based metabolomic profiling was performed on saliva samples from 61 healthy adults, collected at five time-points. The effect of erythritol was studied in a randomized, controlled trial setting. Fourteen salivary biochemistry variables were measured with antibody- or enzymatic activity-based assays.

Results

Bacterial amino acid catabolites (cadaverine, N-acetylcadaverine, and α-hydroxyisovalerate) and end-products of bacterial alkali-producing pathways (N-α-acetylornithine and γ-aminobutyrate) increased significantly during the experimental gingivitis. Significant changes were found in a set of 13 salivary metabolite ratios composed of host cell membrane lipids involved in cell signaling, host responses to bacteria, and defense against free radicals. An increase in mevalonate was also observed. There were no significant effects of erythritol. No significant changes were found in functional salivary biochemistry.

Conclusions

The findings underline a dynamic interaction between the host and the oral microbial biofilm during an experimental induction of gingivitis.
  相似文献   

20.

Background

Protein synthetic lethal genetic interactions are useful to define functional relationships between proteins and pathways. However, the molecular mechanism of synthetic lethal genetic interactions remains unclear.

Results

In this study we used the clusters of short polypeptide sequences, which are typically shorter than the classically defined protein domains, to characterize the functionalities of proteins. We developed a framework to identify significant short polypeptide clusters from yeast protein sequences, and then used these short polypeptide clusters as features to predict yeast synthetic lethal genetic interactions. The short polypeptide clusters based approach provides much higher coverage for predicting yeast synthetic lethal genetic interactions. Evaluation using experimental data sets showed that the short polypeptide clusters based approach is superior to the previous protein domain based one.

Conclusion

We were able to achieve higher performance in yeast synthetic lethal genetic interactions prediction using short polypeptide clusters as features. Our study suggests that the short polypeptide cluster may help better understand the functionalities of proteins.
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号