2.
Background
Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions).
Results
Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.
Conclusion
Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computational pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.
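The junction-based distinction in this abstract can be illustrated with a small sketch. It is not the published pipeline: the block representation, the function name `classify_pseudogene`, and the `min_gap` and `tolerance` parameters are all assumptions made for illustration. The idea shown is that a large gap in the pseudogene locus counts as intron-like only when it falls at an exon-exon junction of the parent transcript; a gap elsewhere is treated as an insertion in a processed pseudogene.

```python
def classify_pseudogene(blocks, junctions, min_gap=30, tolerance=3):
    """Classify a pseudogene as 'duplicated' or 'processed' (illustrative only).

    blocks:    aligned segments as (parent_start, parent_end, locus_start, locus_end),
               with parent coordinates on the spliced parent transcript.
    junctions: exon-exon junction positions on the parent transcript.
    min_gap, tolerance: hypothetical thresholds, not taken from the abstract.
    """
    blocks = sorted(blocks)
    intron_like = 0
    for (_, p_end, _, l_end), (_, _, l_start, _) in zip(blocks, blocks[1:]):
        gap = l_start - l_end  # unaligned stretch in the pseudogene locus
        if gap < min_gap:
            continue
        # The gap is intron-like only if the alignment breaks at a parent junction.
        if any(abs(p_end - j) <= tolerance for j in junctions):
            intron_like += 1
    return "duplicated" if intron_like > 0 else "processed"


# Toy data: a parent transcript with junctions at positions 120 and 300.
junctions = [120, 300]
with_introns = [(0, 120, 1000, 1120), (120, 300, 1900, 2080), (300, 450, 2500, 2650)]
with_insertion = [(0, 200, 1000, 1200), (200, 450, 1500, 1750)]

print(classify_pseudogene(with_introns, junctions))
print(classify_pseudogene(with_insertion, junctions))
```
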
4.
Background
Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression.
Results
We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes.
Conclusion
This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.
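As a rough illustration of the two statistical metrics named in this abstract, the sketch below computes Pearson and Spearman correlations between a query expression profile and a set of annotated genes, then keeps the genes above a cutoff. The gene names, expression values, `coexpressed` helper, and the 0.8 threshold are invented for the example; the geometrical metrics (Sobolev and Fisher) are not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpressed(query, profiles, metric, threshold=0.8):
    """Names of genes whose |correlation| with the query meets the threshold."""
    return sorted(g for g, p in profiles.items() if abs(metric(query, p)) >= threshold)


# Toy expression values across five tissues (hypothetical numbers).
lncrna = [1.0, 2.0, 4.0, 8.0, 16.0]
profiles = {
    "geneA": [2.0, 4.0, 8.0, 16.0, 32.0],  # proportional to the query
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],    # monotonically decreasing
    "geneC": [3.0, 1.0, 4.0, 1.0, 3.0],    # no clear trend
}

print(coexpressed(lncrna, profiles, pearson))
print(coexpressed(lncrna, profiles, spearman))
```

Because Spearman works on ranks, it scores any monotonic relationship as strongly as a linear one, which is one reason the two metrics can select different gene sets.
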
10.
Sonia Liggi, Christine Hinz, Zoe Hall, Maria Laura Santoru, Simone Poddighe, John Fjeldsted, Luigi Atzori, Julian L. Griffin. Metabolomics: Official Journal of the Metabolomic Society, 2018, 14(4):52
Introduction
Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need for multiple software packages for each step of the processing workflow.
Objectives
To merge the steps required for metabolomics data processing into a single platform.
Methods
KniMet is a workflow for the processing of mass spectrometry-metabolomics data based on the KNIME Analytics platform.
Results
The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.
Conclusion
KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.
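To make the first three processing steps concrete, here is a minimal plain-Python sketch of feature filtering, missing value imputation, and normalization on a toy feature table. It does not reproduce KniMet or any KNIME node: the function names, the 50% missing-value cutoff, half-minimum imputation, and total-intensity normalization are assumptions chosen as common simple options. Batch correction and annotation are left out.

```python
def filter_features(table, max_missing_fraction=0.5):
    """Keep features missing in at most the given fraction of samples."""
    kept = {}
    for feature, values in table.items():
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_fraction:
            kept[feature] = values
    return kept


def impute_half_minimum(table):
    """Replace missing values with half the feature's smallest observed value.

    Assumes every feature has at least one observed value, which holds after
    filtering with a cutoff below 1.0.
    """
    imputed = {}
    for feature, values in table.items():
        fill = min(v for v in values if v is not None) / 2
        imputed[feature] = [fill if v is None else v for v in values]
    return imputed


def normalize_total_intensity(table):
    """Divide each value by the summed intensity of its sample (column)."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(values[i] for values in table.values()) for i in range(n_samples)]
    return {
        feature: [v / totals[i] for i, v in enumerate(values)]
        for feature, values in table.items()
    }


# Toy table: feature -> intensities in four samples, None marks a missing value.
raw = {
    "f1": [100.0, None, 300.0, 200.0],
    "f2": [None, None, None, 50.0],
    "f3": [10.0, 20.0, 30.0, 40.0],
}

filtered = filter_features(raw)
imputed = impute_half_minimum(filtered)
normalized = normalize_total_intensity(imputed)
print(sorted(normalized))
```
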
13.
Objective
To selectively enrich an electrogenic mixed consortium capable of utilizing dark fermentative effluents as substrates in microbial fuel cells and to further enhance the power outputs by optimization of influential anodic operational parameters.
Results
A maximum power density of 1.4 W/m³ was obtained by an enriched mixed electrogenic consortium in microbial fuel cells using acetate as substrate. This was further increased to 5.43 W/m³ by optimization of influential anodic parameters. By utilizing dark fermentative effluents as substrates, the maximum power densities ranged from 5.2 to 6.2 W/m³ with an average COD removal efficiency of 75% and a coulombic efficiency of 10.6%.
Conclusion
A simple strategy is provided for selective enrichment of electrogenic bacteria that can be used in microbial fuel cells for generating power from various dark fermentative effluents.
15.
Background
Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and aid in functional annotation; however, little work has been done on predicting heme binding residues from protein sequence information.
Methods
We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.
Results
Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests.
Conclusions
The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.
16.
Jack W. Kent Jr. BMC Genetics, 2016, 17(Z2):S5
Background
New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing.
Methods
The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge.
Results
Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data.
Conclusions
The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.
17.
Alexandre Seyer, Samia Boudah, Simon Broudin, Christophe Junot, Benoit Colsch. Metabolomics: Official Journal of the Metabolomic Society, 2016, 12(5):91
Introduction
Due to its proximity to the brain, cerebrospinal fluid (CSF) could be a medium of choice for the discovery of biomarkers of neurological and psychiatric diseases using untargeted analytical approaches.
Objectives
This study explored the CSF lipidome in order to generate a robust mass spectral database using an untargeted lipidomic approach.
Methods
Cerebrospinal fluid samples from 45 individuals were analyzed by liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS). A dedicated data processing workflow was implemented using XCMS software and adapted filters to select reliable features. In addition, an automatic annotation using an in silico lipid database and several MS/MS experiments were performed to identify CSF lipid species.
Results
Using this complete workflow, 771 analytically relevant monoisotopic lipid species corresponding to 550 unique lipids which represent five major lipid families (i.e., free fatty acids, sphingolipids, glycerophospholipids, glycerolipids, and sterol lipids) were detected and annotated. In addition, MS/MS experiments improved the annotation of 304 lipid species. LC-HRMS made it possible to discriminate between isobaric and also isomeric lipid species; interestingly, our study showed that isobaric ions represent about 50% of the total annotated lipid species in human CSF.
Conclusion
This work provides an extensive LC-HRMS database of the human CSF lipidome which constitutes a relevant foundation for future studies aimed at finding biomarkers of neurological disorders.
18.
Discovery of A-type procyanidin dimers in yellow raspberries by untargeted metabolomics and correlation based data analysis
Elisabete Carvalho, Pietro Franceschi, Antje Feller, Lorena Herrera, Luisa Palmieri, Panagiotis Arapitsas, Samantha Riccadonna, Stefan Martens. Metabolomics: Official Journal of the Metabolomic Society, 2016, 12(9):144
Introduction
Raspberries are becoming increasingly popular due to their reported health-beneficial properties. Despite the presence of only trace amounts of anthocyanins, yellow varieties seem to show similar or better effects in comparison to conventional raspberries.
Objectives
The aim of this work is to characterize the metabolic differences between red and yellow berries, focusing on the compounds showing a higher concentration in yellow varieties.
Methods
The metabolomic profile of 13 red and 12 yellow raspberries (of different varieties, locations and collection dates) was determined by UPLC–TOF-MS. A novel approach based on Pearson correlation on the extracted ion chromatograms was implemented to extract the pseudospectra of the most relevant biomarkers from high energy LC–MS runs. The raw data will be made publicly available on MetaboLights (MTBLS333).
Results
Among the metabolites showing higher concentration in yellow raspberries it was possible to identify a series of compounds showing a pseudospectrum similar to that of A-type procyanidin polymers. The annotation of this group of compounds was confirmed by specific MS/MS experiments and by injecting standards.
Conclusions
In berries lacking anthocyanins the polyphenol metabolism might be shifted to the formation of a novel class of A-type procyanidin polymers.
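The correlation-based extraction of pseudospectra mentioned in the Methods can be sketched in a few lines. This is only a guess at the general idea, not the authors' implementation: ions whose extracted ion chromatograms (EICs) correlate strongly with the EIC of a seed ion are grouped with it. The `pseudospectrum` function, the 0.9 cutoff, and the m/z keys and intensities below are arbitrary toy values, not measured data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either trace is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def pseudospectrum(seed_mz, eics, min_r=0.9):
    """m/z values whose EIC correlates with the seed's EIC at r >= min_r."""
    seed = eics[seed_mz]
    return sorted(mz for mz, trace in eics.items() if pearson(seed, trace) >= min_r)


# Toy EICs: m/z -> intensity at seven consecutive retention-time points.
eics = {
    100.0: [0, 5, 40, 100, 35, 4, 0],     # seed ion
    200.0: [0, 2, 18, 45, 16, 2, 0],      # same elution profile, lower intensity
    300.0: [0, 1, 9, 22, 8, 1, 0],        # same elution profile, lower intensity
    400.0: [60, 30, 10, 5, 10, 30, 60],   # unrelated profile
}

print(pseudospectrum(100.0, eics))
```

Ions that arise from the same compound elute together, so their EICs rise and fall in step; the correlation cutoff is what separates them from ions that merely appear in the same time window.
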