Found 20 similar articles (search time: 20 ms)
1.
2.
Background
Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which, in contrast to previous work, explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions).
Results
Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.
Conclusion
Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computational pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.
3.
4.
5.
6.
Background
A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.
Results
AUGUSTUS can be used as an ab initio program, that is, as a program that uses only one single genomic sequence as input information. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, was taken into account.
Conclusion
AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover, it is very flexible because it can take information from several sources simultaneously into consideration.
7.
8.
Alexandre Seyer Samia Boudah Simon Broudin Christophe Junot Benoit Colsch 《Metabolomics : Official journal of the Metabolomic Society》2016,12(5):91
Introduction
Due to its proximity to the brain, cerebrospinal fluid (CSF) could be a medium of choice for the discovery of biomarkers of neurological and psychiatric diseases using untargeted analytical approaches.
Objectives
This study explored the CSF lipidome in order to generate a robust mass spectral database using an untargeted lipidomic approach.
Methods
Cerebrospinal fluid samples from 45 individuals were analyzed by liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS). A dedicated data processing workflow was implemented using XCMS software and adapted filters to select reliable features. In addition, automatic annotation using an in silico lipid database and several MS/MS experiments were performed to identify CSF lipid species.
Results
Using this complete workflow, 771 analytically relevant monoisotopic lipid species, corresponding to 550 unique lipids representing five major lipid families (i.e., free fatty acids, sphingolipids, glycerophospholipids, glycerolipids, and sterol lipids), were detected and annotated. In addition, MS/MS experiments improved the annotation of 304 lipid species. Thanks to LC-HRMS, it was possible to discriminate between isobaric as well as isomeric lipid species. Interestingly, our study showed that isobaric ions represent about 50% of the total annotated lipid species in human CSF.
Conclusion
This work provides an extensive LC-HRMS database of the human CSF lipidome, which constitutes a relevant foundation for future studies aimed at finding biomarkers of neurological disorders.
9.
Background
Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder are achieving per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in the human genome, we observed a large difference in predicted disorder content between confirmed and putative human proteins. We investigated the hypothesis that this discrepancy is not genuine and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which leads to high predicted disorder content.
Methods
To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment.
Results
Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins.
Conclusions
Our findings provide the first evidence that the current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.
10.
11.
12.
Background
The current literature establishes the importance of gene functional category and expression in promoting or suppressing duplicate gene loss after whole genome doubling in plants, a process known as fractionation. Inspired by studies that have reported gene expression to be the dominating factor in preventing duplicate gene loss, we analyzed the relative effect of functional category and expression.
Methods
We use multivariate methods to study data sets on gene retention, function and expression in rosids and asterids to estimate effects and assess their interaction.
Results
Our results suggest that the effects of functional category and expression on duplicate gene retention during fractionation are independent and have no statistical interaction.
Conclusion
In plants, functional category is the more dominant factor in explaining duplicate gene loss.
13.
Nguyen Si-Tuan Hua My Ngoc Pham Thi Thu Hang Cuong Nguyen Pham Hung Van Nguyen Thuy Huong 《Annals of clinical microbiology and antimicrobials》2017,16(1):74
Background
Acinetobacter baumannii is an important nosocomial pathogen that can develop multidrug resistance. In this study, we characterized the genome of the A. baumannii strain DMS06669 (isolated from the sputum of a male patient with hospital-acquired pneumonia) and focused on identification of genes relevant to antibiotic resistance.
Methods
Whole genome analysis of A. baumannii DMS06669, isolated from a patient with hospital-acquired pneumonia, included de novo assembly, gene prediction, functional annotation against public databases, phylogenetic tree construction, and identification of antibiotic resistance genes.
Results
After sequencing the A. baumannii DMS06669 genome and performing quality control, de novo genome assembly was carried out, producing 24 scaffolds. Public databases were used for gene prediction and functional annotation, and a phylogenetic tree was constructed relating the DMS06669 strain to 21 other A. baumannii strains. A total of 18 possible antibiotic resistance genes, conferring resistance to eight distinct classes of antibiotics, were identified. Eight of these genes have not previously been reported to occur in A. baumannii.
Conclusions
Our results provide important information regarding mechanisms that may contribute to antibiotic resistance in the DMS06669 strain, and have implications for treatment of patients infected with A. baumannii.
14.
Background
Bacterial genomes develop new mechanisms to cope with the demanding conditions they encounter during the course of their evolution. Acquisition of new genes by lateral gene transfer (LGT) may be one of the dominant ways of adaptation in bacterial genome evolution. Lateral gene transfer provides the bacterial genome with a new set of genes that help it to explore and adapt to new ecological niches.
Methods
A maximum likelihood analysis was done on the five sequenced corynebacterial genomes to model the rates of gene insertions/deletions at various depths of the phylogeny.
Results
The study shows that most of the laterally acquired genes are transient and the inferred rates of gene movement are higher on the external branches of the phylogeny and decrease as the phylogenetic depth increases. The newly acquired genes are under relaxed selection and evolve faster than their older counterparts. Analysis of some of the functionally characterised LGTs in each species has indicated that they may have a possible adaptive role.
Conclusion
The five corynebacterial genomes sequenced to date have evolved by acquiring between 8% and 14% of their genomes by LGT, and some of these genes may have a role in adaptation.
15.
16.
17.
JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions
Background
Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to emerge that can provide near perfect levels of sensitivity and specificity at the level of whole genes. As an incremental step in this direction, it is hoped that controlled gene finding experiments in the ENCODE regions will provide a more accurate view of the relative benefits of different strategies for modeling and predicting gene structures.
Results
Here we describe our general-purpose eukaryotic gene finding pipeline and its major components, as well as the methodological adaptations that we found necessary in accommodating human DNA in our pipeline, noting that a similar level of effort may be necessary by ourselves and others with similar pipelines whenever a new class of genomes is presented to the community for analysis. We also describe a number of controlled experiments involving the differential inclusion of various types of evidence and feature states into our models and the resulting impact these variations have had on predictive accuracy.
Conclusion
While in the case of the non-comparative gene finders we found that adding model states to represent specific biological features did little to enhance predictive accuracy, for our evidence-based 'combiner' program the incorporation of additional evidence tracks tended to produce significant gains in accuracy for most evidence types, suggesting that improved modeling efforts at the hidden Markov model level are of relatively little value. We relate these findings to our current plans for future research.
18.
Xiaoxuan Xia Haoyi Weng Ruoting Men Rui Sun Benny Chung Ying Zee Ka Chun Chong Maggie Haitian Wang 《BMC genetics》2018,19(1):78
Background
An accumulation of evidence has revealed the important role of epigenetic factors in explaining the etiopathogenesis of human diseases. Several empirical studies have successfully incorporated methylation data into models for disease prediction. However, it is still a challenge to integrate different types of omics data into prediction models, and the contribution of methylation information to prediction remains to be fully clarified.
Results
A stratified drug-response prediction model was built based on an artificial neural network to predict the change in the circulating triglyceride level after fenofibrate intervention. Associated single-nucleotide polymorphisms (SNPs), methylation of selected cytosine-phosphate-guanine (CpG) sites, age, sex, and smoking status were included as predictors. The model with selected SNPs achieved a mean 5-fold cross-validation prediction error rate of 43.65%. After adding methylation information into the model, the error rate dropped to 41.92%. The combination of significant SNPs, CpG sites, age, sex, and smoking status achieved the lowest prediction error rate of 41.54%.
Conclusions
Compared to using SNP data only, adding methylation data to prediction models slightly reduced the prediction error rate; a further reduction was achieved by combining genomic, methylation, and environmental factors.
19.
20.