Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful and widely applied method for the study of biological systems, biomarker discovery and pharmacological interventions. LC-MS measurements are, however, significantly complicated by several technical challenges, including: (1) ionisation suppression/enhancement, which hinders accurate quantification of analytes, and (2) the detection of large numbers of separate derivative ions, which increases the complexity of the spectra but not their information content. Here we introduce an experimental and analytical strategy that yields robust metabolome profiles despite these challenges. Our method rigorously filters the measured signals using a series of sample dilutions. Such data sets also allow a more robust assessment of detection signal quality for each metabolite. Using our method, almost 80% of the recorded signals can be discarded as uninformative, while important information is retained. As a consequence, we obtain a broader understanding of the information content of our analyses and a better assessment of the metabolites detected in the analysed data sets. We illustrate the applicability of this method using standard mixtures, as well as cell extracts from bacterial samples. The method can be applied to many types of LC-MS analyses, in particular untargeted metabolomics.
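The dilution-series idea can be illustrated with a small sketch. It rests on one assumption: a genuine analyte signal should scale proportionally with concentration, which gives a slope near 1 on a log-log plot of intensity against relative concentration. Saturated, suppressed or noise features deviate from that line. The abstract does not state the actual criteria. The function names and the slope and R² thresholds below are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical thresholds; the abstract does not give the real filtering criteria.
SLOPE_TOLERANCE = 0.2
MIN_R_SQUARED = 0.9


def fit_log_log(dilution_factors: List[float], intensities: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(intensity) against log(relative concentration).

    Relative concentration is 1 / dilution factor, so a signal that scales
    proportionally with concentration has a slope of 1. Returns (slope, r_squared).
    """
    xs = [math.log(1.0 / d) for d in dilution_factors]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


def keep_feature(dilution_factors, intensities,
                 slope_tol=SLOPE_TOLERANCE, min_r2=MIN_R_SQUARED) -> bool:
    """Keep a feature only if it is detected at every dilution and responds proportionally."""
    if any(i <= 0 for i in intensities):
        return False  # signal missing at some dilution level
    slope, r2 = fit_log_log(dilution_factors, intensities)
    return abs(slope - 1.0) <= slope_tol and r2 >= min_r2


if __name__ == "__main__":
    dilutions = [1, 2, 4, 8]
    features = {
        "proportional": [8000, 4000, 2000, 1000],
        "saturated": [5000, 5000, 4900, 4800],
        "dropout": [1000, 0, 900, 1050],
    }
    for name, values in features.items():
        print(name, keep_feature(dilutions, values))
```

Only the proportional feature passes. A real pipeline would also need to group adducts and fragments belonging to the same metabolite, and that step is not sketched here.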
Only a small fraction of spectra acquired in LC-MS/MS runs match peptides from target proteins in database searches. The remaining spectra, operationally termed background, originate from a variety of poorly controlled sources and reduce the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries improved protein identification in searches that lacked spectrum-to-sequence matching specificity. In sequence-similarity searches, it reduced the number of orphan hits (hits that were not explicitly related to background protein contaminants and required manual validation) by 30-fold on average. Removing high-quality background MS/MS spectra, while preserving the genuine spectra from target proteins in the data set, decreased the false-positive rate of stringent database searches and improved the identification of low-abundance proteins.
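Screening queries against a background library can be sketched as follows. The abstract does not name its dissimilarity measure, so this sketch substitutes a common choice: cosine distance between peak lists binned at a hypothetical 1 Da width. A query is dropped when its nearest library spectrum lies within a hypothetical cutoff of 0.3. All names here are illustrative, not the published software's API.

```python
import math
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # list of (m/z, intensity) peaks


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_distance(a: Dict[int, float], b: Dict[int, float]) -> float:
    """1 - cosine similarity of two binned spectra (0 = identical shape)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def filter_background(queries: List[Spectrum], library: List[Spectrum],
                      max_distance: float = 0.3) -> List[Spectrum]:
    """Keep only queries that are not close to any background library spectrum."""
    library_binned = [bin_spectrum(s) for s in library]
    kept = []
    for query in queries:
        query_binned = bin_spectrum(query)
        nearest = min((cosine_distance(query_binned, lb) for lb in library_binned), default=1.0)
        if nearest > max_distance:
            kept.append(query)
    return kept


if __name__ == "__main__":
    background = [(100.1, 50.0), (200.2, 100.0), (300.3, 30.0)]
    background_like = [(100.3, 48.0), (200.4, 105.0), (300.1, 29.0)]
    genuine = [(150.0, 80.0), (250.0, 60.0), (350.0, 40.0)]
    print(filter_background([background_like, genuine], [background]))
```

In practice the bin width and cutoff would depend on instrument resolution, and precursor m/z would usually be checked before comparing fragment spectra.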
MPP is a Java application, encompassing both new and established algorithms, for the analysis of gene and marker content datasets arising from high-throughput microarray techniques. MPP analyses flat-file output from microarray experiments to determine the probability of the presence or absence of genes or markers within a genome. MPP can construct gene or marker content datasets for a number of genomes and can use the data to estimate an evolutionary tree or network. Results from gene content analyses may be validated by comparing them to known gene contents. MPP was initially developed to analyse data derived from comparative genome hybridization (CGH) microarray experiments in fungi and bacteria. It has recently been adapted to analyse retrotransposon-based insertion polymorphism (RBIP) marker scores derived from tagged microarray marker (TAM) experiments in pea. New analytical procedures can easily be added to MPP as plugins to extend the scope of the software. AVAILABILITY: MPP source code, executables and online help are available at http://cbr.jic.ac.uk/dicks/software/
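The workflow described above runs from presence/absence probabilities to gene content profiles, then to a distance matrix for tree or network estimation, with validation against known contents. The sketch below illustrates that path. It is not MPP's code: the 0.5 call threshold, the Hamming-style distance and all function names are assumptions, and tree building (e.g. neighbour joining) is not shown.

```python
from typing import Dict, List, Tuple


def call_presence(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-gene presence probabilities into 1 (present) / 0 (absent) calls."""
    return [1 if p >= threshold else 0 for p in probabilities]


def content_distance_matrix(profiles: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Pairwise fraction of genes whose presence/absence differs between genomes."""
    names = sorted(profiles)
    distances = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pa, pb = profiles[a], profiles[b]
            distances[(a, b)] = sum(x != y for x, y in zip(pa, pb)) / len(pa)
    return distances


def agreement(called: List[int], known: List[int]) -> float:
    """Fraction of calls that match a known gene content (validation step)."""
    return sum(c == k for c, k in zip(called, known)) / len(known)


if __name__ == "__main__":
    profiles = {"A": [1, 1, 0, 1], "B": [1, 0, 0, 1], "C": [0, 0, 1, 1]}
    print(content_distance_matrix(profiles))
    print(call_presence([0.9, 0.2, 0.5]))
```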
ABSTRACT: BACKGROUND: Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating 'noisy' data with misleading phylogenetic signal has also increased, and such data may disproportionately influence the topology of weakly supported nodes and of lineages featuring rapid radiations and/or elevated rates of evolution. RESULTS: We investigated the influence of phylogenetic noise in large data sets by applying two fundamental strategies, variable site removal and long-branch exclusion, to the phylogenetic analysis of a full plastome alignment of 107 species of Pinus and six Pinaceae outgroups. While high overall phylogenetic resolution resulted from inclusion of all data, three historically recalcitrant nodes remained in conflict with previous analyses. Close investigation of these nodes revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for two clades peaked with removal of highly variable sites, the third clade resolved most strongly when all sites were included. Similar trends were observed using long-branch exclusion, but the patterns were neither as strong nor as clear. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies in the Pinus plastome analysis are congruent for the two clades gaining support from variable site removal and long-branch exclusion, but in conflict for the clade with highest support from the full data set. CONCLUSIONS: These results suggest that removal of misleading signal in phylogenomic datasets can not only increase resolution for poorly supported nodes, but may also serve as a tool for identifying erroneous yet highly supported topologies. For Pinus chloroplast genomes, removal of variable sites appears to be more effective than long-branch exclusion for clarifying phylogenetic hypotheses.
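Variable site removal can be sketched on a toy alignment. The abstract does not say how site variability was measured. This sketch uses the simplest proxy, the number of distinct character states per column (ignoring gaps, `?` and `N`), and drops columns above a hypothetical `max_states` cutoff. Rate-based measures would be a more realistic choice.

```python
from typing import Dict, List


def site_variability(alignment: Dict[str, str]) -> List[int]:
    """Number of distinct nucleotide states in each alignment column."""
    sequences = list(alignment.values())
    length = len(sequences[0])
    return [len({s[i] for s in sequences if s[i] not in "-?N"}) for i in range(length)]


def remove_variable_sites(alignment: Dict[str, str], max_states: int = 2) -> Dict[str, str]:
    """Drop columns with more than max_states distinct states (hypothetical criterion)."""
    variability = site_variability(alignment)
    keep = [i for i, v in enumerate(variability) if v <= max_states]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"t1": "AACGT", "t2": "AATGT", "t3": "ACGGA"}
    print(site_variability(toy))       # [1, 2, 3, 1, 2]
    print(remove_variable_sites(toy))  # {'t1': 'AAGT', 't2': 'AAGT', 't3': 'ACGA'}
```

Re-running the tree search on alignments filtered at successively stricter cutoffs would give the kind of support-versus-removal curves described above.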
MicroRNAs are short non-coding RNAs that reduce gene expression by binding to their target mRNAs. Accurate prediction of microRNA targets is essential to understanding the function of microRNAs. Computational predictions indicate that all human genes may be regulated by microRNAs, with each microRNA possibly targeting thousands of genes. Here we discuss computational methods for identifying mammalian microRNA targets and refining them for further experimental validation. We describe microRNA target prediction resources and procedures, and how they integrate with various experimental techniques that aim to validate the predicted targets or further explore their function. We also provide a list of target prediction databases and explain how these are curated.
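Most target prediction methods start from complementarity between the microRNA "seed" and the 3' UTR. The sketch below finds only canonical 7mer-m8 sites: the reverse complement of nucleotides 2-8 of the microRNA. It ignores the wobble pairing, site context and conservation that real prediction resources also weigh. The UTR fragment is made up, and the function names are illustrative.

```python
from typing import List

# Watson-Crick complements for RNA; G:U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions of 7mer-m8 seed matches in a 3' UTR.

    The seed is nucleotides 2-8 of the microRNA (5' to 3'); a target site is
    the reverse complement of that seed in the mRNA.
    """
    site = reverse_complement(mirna[1:8])
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"      # human let-7a mature sequence
    toy_utr = "AAACUACCUCAAGGCUACCUCA"    # made-up UTR fragment
    print(reverse_complement(let7a[1:8]))   # CUACCUC
    print(seed_match_sites(let7a, toy_utr)) # [3, 14]
```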
Observation of a novel food-processing technique is reported for captive zoo gorillas (Gorilla g. gorilla). It is similar in function to the wheat placer-mining behaviour of Japanese macaques and consists of puffing or blowing air with the mouth onto a mixture of oat grains and chaff to separate out the oat grains. Three females in two of four groups regularly use this behaviour. Neither the other individuals in these groups nor the individuals of the two other groups in the same zoo use it. However, very similar behaviour has been observed in three other individuals in a gorilla group at another zoo. The existence of this technique in spatially separated groups implies that several individuals invented it independently. The possible role of social transmission remains to be investigated.
Members of the Alu Yc1 subfamily are distinguished from the older Alu Y subfamily by a signature G-->A substitution at base 148 of their 281-bp consensus sequence. Members of the much older and larger Alu Y subfamily could have accumulated this signature G-->A substitution by chance and thus be misclassified as belonging to the Alu Yc1 subfamily. A Mahalanobis classification method estimated that the "authentic" Alu Yc1 subfamily consists of approximately 262 members in the human genome. PCR amplification and further analysis were successfully completed on 225 of the Yc1 Alu family members. One hundred and seventy-seven Yc1 Alu elements were monomorphic (fixed for presence) in a panel of diverse human genomes. Forty-eight of the Yc1 Alu elements were polymorphic for insertion presence/absence in diverse human genomes. This insertion polymorphism rate of 21% is similar to rates reported previously for other "young" Alu subfamilies. The polymorphic Yc1 Alu elements will be useful genetic loci for the study of human population genetics.
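Mahalanobis classification assigns each element to the subfamily whose mean is nearest once feature covariance is taken into account. The abstract does not list the features used. The sketch below assumes two hypothetical per-element features (differences from the Yc1 consensus, differences from the Y consensus) and uses made-up training values with a pooled covariance matrix. It also checks the arithmetic behind the reported 21% polymorphism rate.

```python
import numpy as np


def mahalanobis_classify(training: dict, x: np.ndarray) -> str:
    """Assign x to the class whose mean is nearest in Mahalanobis distance.

    The covariance matrix is pooled across classes, as in linear discriminant analysis.
    """
    means = {label: X.mean(axis=0) for label, X in training.items()}
    scatter = sum((X - means[label]).T @ (X - means[label]) for label, X in training.items())
    dof = sum(len(X) for X in training.values()) - len(training)
    cov_inv = np.linalg.inv(scatter / dof)

    def squared_distance(label: str) -> float:
        diff = x - means[label]
        return float(diff @ cov_inv @ diff)

    return min(means, key=squared_distance)


if __name__ == "__main__":
    # Hypothetical features: (differences from Yc1 consensus, differences from Y consensus)
    training = {
        "Yc1": np.array([[1.0, 9.0], [2.0, 8.0], [1.0, 8.0], [2.0, 9.0]]),
        "Y": np.array([[8.0, 2.0], [9.0, 1.0], [8.0, 1.0], [9.0, 2.0]]),
    }
    print(mahalanobis_classify(training, np.array([1.5, 8.0])))  # Yc1

    # Polymorphism rate from the reported counts: 48 polymorphic of 177 + 48 analysed.
    fixed, polymorphic = 177, 48
    print(f"{100 * polymorphic / (fixed + polymorphic):.1f}%")  # 21.3%
```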
Current models for the export of messenger RNA share the notion that the highly abundant class of nuclear RNA-binding proteins--the hnRNP proteins--have a key role in exporting RNA. But recent studies have led to a new understanding of several non-hnRNP proteins, including SR proteins and the conserved mRNA export factor ALY, which are recruited to the mRNA during pre-mRNA splicing. These studies, together with older work on hnRNP particles and assembly of the spliceosome, lead us to a new view of mRNA export. In our model, the non-hnRNP factors form a splicing-dependent mRNP complex that specifically targets mature mRNA for export, while hnRNP proteins retain introns in the nucleus. A machinery that is conserved between yeast and higher eukaryotes functions to export the mRNA.
Background
Fuelled by the advent and subsequent development of next-generation sequencing technologies, metagenomics has become a powerful tool for the analysis of microbial communities, both scientifically and diagnostically. The biggest challenge is extracting relevant information from the huge sequence datasets generated in metagenomics studies. Although a plethora of tools is available, data analysis is still a bottleneck.
Results
To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS (Reliable Information Extraction from Metagenomic Sequence datasets). RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses of decreasing stringency, using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. Comparisons with other tools for metagenomics data analysis on simulated sequencing read datasets demonstrated the high accuracy and performance of RIEMS.
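The cascading principle can be illustrated with a minimal sketch. Reads pass through analysis stages ordered from most to least stringent. Each read is assigned by the first stage that can classify it, and the results are then tallied per taxon. The toy stages here (an exact-match lookup and a prefix rule) are hypothetical stand-ins for the external tools RIEMS actually chains together, which this sketch does not model.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# A stage has a name and a function mapping a read sequence to a taxon, or None.
Stage = Tuple[str, Callable[[str], Optional[str]]]


def cascade_assign(reads: Dict[str, str],
                   stages: List[Stage]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Assign each read at the first (most stringent) stage that returns a taxon."""
    assignments: Dict[str, Tuple[str, Optional[str]]] = {}
    pending = dict(reads)
    for stage_name, classify in stages:
        remaining = {}
        for read_id, sequence in pending.items():
            taxon = classify(sequence)
            if taxon is None:
                remaining[read_id] = sequence  # pass on to a less stringent stage
            else:
                assignments[read_id] = (taxon, stage_name)
        pending = remaining
    for read_id in pending:
        assignments[read_id] = ("unassigned", None)
    return assignments


def summarise(assignments: Dict[str, Tuple[str, Optional[str]]]) -> Counter:
    """Count reads per taxon for a taxonomically organised report."""
    return Counter(taxon for taxon, _ in assignments.values())


if __name__ == "__main__":
    reference = {"ACGTACGT": "Escherichia coli"}  # toy exact-match index
    stages = [
        ("exact", lambda s: reference.get(s)),
        ("relaxed", lambda s: "Bacteria" if s.startswith("TTG") else None),
    ]
    result = cascade_assign({"r1": "ACGTACGT", "r2": "TTGACC", "r3": "GGGG"}, stages)
    print(result)
    print(summarise(result))
```

Recording the stage that produced each assignment lets a report distinguish high-confidence assignments from those reached only at a relaxed stringency.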
Conclusions
RIEMS has the potential to fill the gap that still exists in data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets were demonstrated with an early version of RIEMS in 2011, when it was used to detect the orthobunyavirus sequences that led to the discovery of Schmallenberg virus.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0503-6) contains supplementary material, which is available to authorized users.
Direct gradient analyses in spatial genetics provide unique opportunities to describe the inherent complexity of genetic variation in wildlife species and are the object of many methodological developments. However, multicollinearity among explanatory variables is a systemic issue in multivariate regression analyses and is likely to cause serious difficulties in properly interpreting the results of direct gradient analyses, with the risk of erroneous conclusions, misdirected research and inefficient or counterproductive conservation measures. Using simulated data sets along with linear and logistic regressions on distance matrices, we illustrate how commonality analysis (CA), a detailed variance-partitioning procedure that was recently introduced in the field of ecology, can be used to deal with nonindependence among spatial predictors. By decomposing model fit indices into unique and common (or shared) variance components, CA makes it possible to identify the location and magnitude of multicollinearity and to reveal spurious correlations, thus substantially improving the interpretation of multivariate regressions. Despite a few inherent limitations, especially in the case of resistance model optimization, this review highlights the great potential of CA to account for complex multicollinearity patterns in spatial genetics and identifies future applications and lines of research. We strongly urge spatial geneticists to systematically investigate commonalities when performing direct gradient analyses.
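For two predictors, commonality analysis reduces to three differences of R² values. The unique part of x1 is R²(x1, x2) − R²(x2), the unique part of x2 is R²(x1, x2) − R²(x1), and the common part is R²(x1) + R²(x2) − R²(x1, x2). The three parts sum to the full-model R². The sketch uses ordinary least squares on simulated vectors. With distance matrices, the same idea would be applied to the vectorised lower triangles, and permutation-based significance testing is not shown. The simulated design is an assumption: x2 is correlated with x1 but has no direct effect on y.

```python
import numpy as np


def r_squared(y: np.ndarray, *predictors: np.ndarray) -> float:
    """Coefficient of determination of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


def commonality_two(y: np.ndarray, x1: np.ndarray, x2: np.ndarray) -> dict:
    """Partition the full-model R^2 into unique and common components."""
    r2_full = r_squared(y, x1, x2)
    r2_x1 = r_squared(y, x1)
    r2_x2 = r_squared(y, x2)
    return {
        "unique_x1": r2_full - r2_x2,
        "unique_x2": r2_full - r2_x1,
        "common": r2_x1 + r2_x2 - r2_full,
        "total": r2_full,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=500)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)  # collinear with x1, no direct effect on y
    y = x1 + rng.normal(size=500)
    for name, value in commonality_two(y, x1, x2).items():
        print(f"{name}: {value:.3f}")
```

Here x2 alone would look like a strong predictor. Its unique component is near zero, and the common component shows that its apparent effect is shared with x1.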
The advent of massively parallel sequencing of immunopurified chromatin and its determinants has opened new avenues for researchers to map epigenome-wide changes, and there is tremendous interest in uncovering regulatory signatures to answer fundamental questions about chromatin structure and function. Indeed, the rapid development of large genome annotation projects has brought a resurgence of chromatin immunoprecipitation (ChIP) based protocols, in which immunoprecipitation of protein-DNA interactions is coupled with large-scale sequencing (Seq) to map epigenome-wide interactions precisely. Despite great advances in our understanding of chromatin-modifying complexes and their determinants, the development of ChIP-Seq technologies also places specific demands on the integration of data for visualization, manipulation and analysis. In this article we discuss some of the considerations for experimental design planning, quality control, and bioinformatic analysis. The key aspects of post-sequencing analysis are the identification of regions of interest, differentiation between biological conditions and the characterization of sequence differences for chromatin modifications. We provide an overview of best-practice approaches with background information and considerations for integrative analysis of ChIP-Seq experiments.
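Identifying regions of interest usually means testing whether ChIP read counts exceed what background predicts. The sketch below applies a simplified Poisson test to fixed-width bins. For each bin it takes the larger of a genome-wide rate and the scaled local control count, loosely in the spirit of peak callers such as MACS, but it is not any tool's actual algorithm. The 200 bp bin size and the alpha cutoff are hypothetical, and no multiple-testing correction or bin merging is done.

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_enriched_bins(chip_counts: List[int], control_counts: List[int],
                       alpha: float = 1e-5, bin_size: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of bins with significant ChIP enrichment."""
    total_chip, total_control = sum(chip_counts), sum(control_counts)
    scale = total_chip / total_control if total_control else 1.0  # sequencing-depth scaling
    genome_lambda = total_chip / len(chip_counts)
    peaks = []
    for i, (chip, control) in enumerate(zip(chip_counts, control_counts)):
        lam = max(genome_lambda, control * scale)  # conservative local background
        if poisson_sf(chip, lam) < alpha:
            peaks.append((i * bin_size, (i + 1) * bin_size))
    return peaks


if __name__ == "__main__":
    chip = [5] * 20 + [60] + [5] * 19
    control = [5] * 40
    print(call_enriched_bins(chip, control))  # [(4000, 4200)]
```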
Photosynthetic eukaryotes contain primary, secondary or tertiary plastids, depending on the source of the organelle (a cyanobacterium or a photosynthetic eukaryote). Plastid phylogeny is relatively well investigated, but molecular phylogenies have conflicted depending on gene choice, taxon representation and analytical method. To better understand the influences of these variables, we analysed a multi-gene data set based on 62 plastid-associated genes of 15 taxa representing the major plastid lineages. In an attempt to distinguish phylogenetic signal from non-phylogenetic patterns, we analysed the data using a wide range of phylogenetic methods and examined the effect of covarion evolution and compositional bias. The data suggest that the chlorophyll c-containing plastids are monophyletic and were acquired from the red algae after the emergence of the Cyanidiales. The relationships among chl c-containing plastids are particularly hard to resolve. Using the largest data set yet applied for this purpose, the analyses show that cryptophyte plastids are sister to the other chl c-containing plastids, and that haptophyte and peridinin-containing dinoflagellate plastids are closely related.
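Compositional bias is often screened with a chi-square test of homogeneity. It compares each taxon's character counts with the counts expected if all taxa shared one composition. The sketch computes only that statistic, with degrees of freedom (taxa − 1)(states − 1). It is not the authors' procedure. The default nucleotide alphabet is an assumption and can be swapped for amino acids via the `alphabet` parameter. The test also treats taxa as independent, ignoring their shared ancestry, so it is at best a rough screen.

```python
from typing import Dict


def composition_chi_square(alignment: Dict[str, str], alphabet: str = "ACGT") -> float:
    """Chi-square statistic for homogeneity of character composition across taxa."""
    counts = {taxon: [seq.count(c) for c in alphabet] for taxon, seq in alignment.items()}
    row_totals = {taxon: sum(c) for taxon, c in counts.items()}
    col_totals = [sum(c[j] for c in counts.values()) for j in range(len(alphabet))]
    grand_total = sum(row_totals.values())
    chi2 = 0.0
    for taxon, row in counts.items():
        for j, observed in enumerate(row):
            expected = row_totals[taxon] * col_totals[j] / grand_total
            if expected > 0:  # skip states absent from the whole alignment
                chi2 += (observed - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    print(composition_chi_square({"a": "AACCGGTT", "b": "ACGTACGT"}))  # 0.0
    print(composition_chi_square({"a": "AAAAAAAA", "b": "CCCCCCCC"}))  # 16.0
```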
Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to drive rapid, community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas chromatography/mass spectrometry (GC/MS) makes it an obvious target for such development. While a number of web tools offer access to datasets and/or tools for raw-data processing and statistical analysis, none of these systems is currently set up to act as a public repository that readily accepts, processes and presents publicly submitted GC/MS metabolomics datasets for public re-analysis.