共查询到20条相似文献,搜索用时 203 毫秒
1.
Background
Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages.Results
Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10(-9) change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10(-9) change/site/year) was approximately half of the overall rate (1.9–2.0 × 10(-9) change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%.Conclusion
This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.2.
Background
Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of sequencing sequences may be classified as unknown, which greatly impairs our understanding of the whole sample.Result
Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of MetaSUB Inter-City Challenge provided by CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.Conclusion
Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.Reviewers
This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.3.
4.
Rachel A. Spicer Christoph Steinbeck 《Metabolomics : Official journal of the Metabolomic Society》2018,14(1):16
Introduction
Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.Objectives
(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.Methods
A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.Results
Journals that support data sharing are not necessarily those with the most papers associated to open metabolomics data.Conclusion
Further efforts are required to improve data sharing in metabolomics.5.
Background
For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure.Results
We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.Conclusion
RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.6.
Background
Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data in order to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of similar content and distributions of predicted statistically significant sequence motifs in their upstream regions.Results
We applied our method to several sets of co-expressed genes and were able to define subsets with enrichment in particular biological processes and specific upstream regulatory motifs.Conclusions
These results show the potential of our technique for functional prediction and regulatory motif identification from microarray data.7.
Introduction
Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.Objectives
Assessment of the quality of raw data processing in untargeted metabolomics.Methods
Five published untargeted metabolomics studies, were reanalyzed.Results
Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.Conclusion
Incomplete raw data processing shows unexplored potential of current and legacy data.8.
Background
This study estimates atrial repolarization activities (Ta waves), which are typically hidden most of the time from body surface electrocardiography when diagnosing cardiovascular diseases. The morphology of Ta waves has been proven to be an important marker for the early sign of inferior injury, such as acute atrial infarction, or arrhythmia, such as atrial fibrillation. However, Ta waves are usually unseen except during conduction system malfunction, such as long QT interval or atrioventricular block. Therefore, justifying heart diseases based on atrial repolarization becomes impossible in sinus rhythm.Methods
We obtain TMPs in the atrial part of the myocardium which reflects the correct excitation sequence starting from the atrium to the end of the apex.Results
The resulting TMP shows the hidden atrial part of ECG waves.Conclusions
This extraction makes many diseases, such as acute atrial infarction or arrhythmia, become easily diagnosed.9.
Sonia Liggi Christine Hinz Zoe Hall Maria Laura Santoru Simone Poddighe John Fjeldsted Luigi Atzori Julian L. Griffin 《Metabolomics : Official journal of the Metabolomic Society》2018,14(4):52
Introduction
Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need of multiple software packages for each step of the processing workflow.Objectives
Merge in the same platform the steps required for metabolomics data processing.Methods
KniMet is a workflow for the processing of mass spectrometry-metabolomics data based on the KNIME Analytics platform.Results
The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.Conclusion
KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.10.
Background
There are now several ways to generate fluorescent fusion proteins by randomly inserting DNA encoding the Green Fluorescent Protein (GFP) into another protein's coding sequence. These approaches can be used to map regions in a protein that are permissive for GFP insertion or to create novel biosensors. While remarkably useful, the current insertional strategies have two major limitations: (1) they only produce one kind, or color, of fluorescent fusion protein and (2) one half of all GFP insertions within the target coding sequence are in the wrong orientation.Results
We have overcome these limitations by incorporating two different fluorescent proteins coding sequences in a single transposon, either in tandem or antiparallel. Our initial tests targeted two mammalian integral membrane proteins: the voltage sensitive motor, Prestin, and an ER ligand gated Ca2+ channel (IP3R).Conclusions
These new designs increase the efficiency of random fusion protein generation in one of two ways: (1) by creating two different fusion proteins from each insertion or (2) by being independent of orientation.11.
Background
In recent years the visualization of biomagnetic measurement data by so-called pseudo current density maps or Hosaka-Cohen (HC) transformations became popular.Methods
The physical basis of these intuitive maps is clarified by means of analytically solvable problems.Results
Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.Conclusion
Hardware realizations of the HC-transformation and some similar transformations are discussed which could advantageously support cross-platform comparability of biomagnetic measurements.12.
Benjamin H Natelson Roxann Intriligator Neil S Cherniack Helena K Chandler Julian M Stewart 《Dynamic medicine : DM》2007,6(1):2
Context
Patients with chronic fatigue syndrome and those with orthostatic intolerance share many symptoms, yet questions exist as to whether CFS patients have physiological evidence of orthostatic intolerance.Objective
To determine if some CFS patients have increased rates of orthostatic hypotension, hypertension, tachycardia, or hypocapnia relative to age-matched controls.Design
Assess blood pressure, heart rate, respiratory rate, end tidal CO2 and visual analog scales for orthostatic symptoms when supine and when standing for 8 minutes without moving legs.Setting
Referral practice and research center.Participants
60 women and 15 men with CFS and 36 women and 4 men serving as age matched controls with analyses confined to 62 patients and 35 controls showing either normal orthostatic testing or a physiological abnormal test.Main outcome measures
Orthostatic tachycardia; orthostatic hypotension; orthostatic hypertension; orthostatic hypocapnia or combinations thereof.Results
CFS patients had higher rates of abnormal tests than controls (53% vs 20%, p < .002), but rates of orthostatic tachycardia, orthostatic hypotension, and orthostatic hypertension did not differ significantly between patients and controls (11.3% vs 5.7%, 6.5% vs 2.9%, 19.4% vs 11.4%, respectively). In contrast, rates of orthostatic hypocapnia were significantly higher in CFS than in controls (20.6% vs 2.9%, p < .02). This CFS group reported significantly more feelings of illness and shortness of breath than either controls or CFS patients with normal physiological tests.Conclusion
A substantial number of CFS patients have orthostatic intolerance in the form of orthostatic hypocapnia. This allows subgrouping of patients with CFS and thus reduces patient pool heterogeneity engendered by use of a clinical case definition.13.
Background
Nucleosomes are DNA-histone complex, each wrapping about 150 pairs of double-stranded DNA. Their function is fundamental for one of the primary functions of Chromatin i.e. packing the DNA into the nucleus of the Eukaryote cells. Several biological studies have shown that the nucleosome positioning influences the regulation of cell type-specific gene activities. Moreover, computational studies have shown evidence of sequence specificity concerning the DNA fragment wrapped into nucleosomes, clearly underlined by the organization of particular DNA substrings. As the main consequence, the identification of nucleosomes on a genomic scale has been successfully performed by computational methods using a sequence features representation.Results
In this work, we propose a deep learning model for nucleosome identification. Our model stacks convolutional layers and Long Short-term Memories to automatically extract features from short- and long-range dependencies in a sequence. Using this model we are able to avoid the feature extraction and selection steps while improving the classification performances.Conclusions
Results computed on eleven data sets of five different organisms, from Yeast to Human, show the superiority of the proposed method with respect to the state of the art recently presented in the literature.14.
Dimitrios J. Floros Paul R. Jensen Pieter C. Dorrestein Nobuhiro Koyama 《Metabolomics : Official journal of the Metabolomic Society》2016,12(9):145
Introduction
Natural products from culture collections have enormous impact in advancing discovery programs for metabolites of biotechnological importance. These discovery efforts rely on the metabolomic characterization of strain collections.Objective
Many emerging approaches compare metabolomic profiles of such collections, but few enable the analysis and prioritization of thousands of samples from diverse organisms while delivering chemistry specific read outs.Method
In this work we utilize untargeted LC–MS/MS based metabolomics together with molecular networking to inventory the chemistries associated with 1000 marine microorganisms.Result
This approach annotated 76 molecular families (a spectral match rate of 28 %), including clinically and biotechnologically important molecules such as valinomycin, actinomycin D, and desferrioxamine E. Targeting a molecular family produced primarily by one microorganism led to the isolation and structure elucidation of two new molecules designated maridric acids A and B.Conclusion
Molecular networking guided exploration of large culture collections allows for rapid dereplication of know molecules and can highlight producers of uniques metabolites. These methods, together with large culture collections and growing databases, allow for data driven strain prioritization with a focus on novel chemistries.15.
Background
There is a considerable literature on the source of the thermostability of proteins from thermophilic organisms. Understanding the mechanisms for this thermostability would provide insights into proteins generally and permit the design of synthetic hyperstable biocatalysts.Results
We have systematically tested a large number of sequence and structure derived quantities for their ability to discriminate thermostable proteins from their non-thermostable orthologs using sets of mesophile-thermophile ortholog pairs. Most of the quantities tested correspond to properties previously reported to be associated with thermostability. Many of the structure related properties were derived from the Delaunay tessellation of protein structures.Conclusions
Carefully selected sequence based indices discriminate better than purely structure based indices. Combined sequence and structure based indices improve performance somewhat further. Based on our analysis, the strongest contributors to thermostability are an increase in ion pairs on the protein surface and a more strongly hydrophobic interior.16.
Yves Frank Tomas Hruz Thomas Tschager Valentin Venzin 《Algorithms for molecular biology : AMB》2018,13(1):14
Background
Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequences of a peptide from this measurement data. Past de novo sequencing algorithms solely consider the mass spectrum of the fragments for reconstructing a sequence.Results
We propose to additionally exploit the information obtained from liquid chromatography. We study the problem of computing a sequence that is not only in accordance with the experimental mass spectrum, but also with the chromatographic retention time. We consider three models for predicting the retention time and develop algorithms for de novo sequencing for each model.Conclusions
Based on an evaluation for two prediction models on experimental data from synthesized peptides we conclude that the identification rates are improved by exploiting the chromatographic information. In our evaluation, we compare our algorithms using the retention time information with algorithms using the same scoring model, but not the retention time.17.
N. Cesbron A.-L. Royer Y. Guitton A. Sydor B. Le Bizec G. Dervilly-Pinel 《Metabolomics : Official journal of the Metabolomic Society》2017,13(8):99
Introduction
Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.Objectives
In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.Methods
The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.Results
A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a step filtration (10 kDa) was developed.Conclusion
The workflow generated repeatable and informative fingerprints for robust metabolome characterization.18.
Background
Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.Results
We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.Conclusion
HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.19.