1.

PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Yu Hu, Yichuan Liu, Xianyun Mao, Cheng Jia, Jane F. Ferguson, Chenyi Xue, Muredach P. Reilly, Hongzhe Li, Mingyao Li. Nucleic acids research, 2014, 42(3):e20

Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, this estimation is challenging because various biases in RNA-Seq (RNA sequencing) data complicate the analysis and, if not appropriately corrected, can distort isoform expression estimates and downstream analyses. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we use a non-parametric approach that gives adequate weight to the observed data. Our rationale is that regardless of what causes the non-uniformity, whether hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown factors, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach therefore reflects the true underlying non-uniform read distribution as closely as possible. We evaluate the performance of PennSeq using simulated data with known ground truth and two real Illumina RNA-Seq data sets, one of which includes quantitative real-time polymerase chain reaction measurements. Our results indicate that PennSeq outperforms existing methods, particularly for isoforms with severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
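
The core idea, that each isoform's own empirical read distribution determines how likely a fragment is under that isoform, can be illustrated with a generic expectation-maximization (EM) sketch. This is not PennSeq's implementation: the start-position histograms, the pseudocount smoothing, and the names `empirical_start_prob` and `estimate_isoform_fractions` are hypothetical, and a real model would also account for fragment length and isoform-specific coordinates.

```python
from typing import Dict, List


def empirical_start_prob(hist: List[float], bin_idx: int, pseudo: float = 1.0) -> float:
    """Probability that a fragment from one isoform starts in bin `bin_idx`,
    estimated non-parametrically from that isoform's observed start-position
    histogram (pseudocount smoothing is an assumption of this sketch)."""
    total = sum(hist) + pseudo * len(hist)
    return (hist[bin_idx] + pseudo) / total


def estimate_isoform_fractions(
    frag_likelihoods: List[Dict[str, float]],
    max_iter: int = 500,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """EM estimate of isoform fractions.

    frag_likelihoods[f][iso] = P(fragment f | isoform iso); isoforms missing from
    a fragment's dict are treated as incompatible with that fragment.
    """
    isoforms = sorted({iso for fl in frag_likelihoods for iso in fl})
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}  # uniform start
    for _ in range(max_iter):
        # E-step: expected number of fragments attributed to each isoform.
        expected = {iso: 0.0 for iso in isoforms}
        for fl in frag_likelihoods:
            denom = sum(theta[iso] * p for iso, p in fl.items())
            if denom == 0.0:
                continue  # fragment not explained by the current estimates
            for iso, p in fl.items():
                expected[iso] += theta[iso] * p / denom
        # M-step: fractions proportional to the expected counts.
        total = sum(expected.values())
        new_theta = {iso: c / total for iso, c in expected.items()}
        delta = max(abs(new_theta[iso] - theta[iso]) for iso in isoforms)
        theta = new_theta
        if delta < tol:
            break
    return theta
```

Fragments shared by several isoforms are split in proportion to each isoform's current abundance times its position-specific likelihood, so a strongly biased read distribution changes how shared fragments are apportioned.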

2.

Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq

Hu M, Zhu Y, Taylor JM, Liu JS, Qin ZS. Bioinformatics (Oxford, England), 2012, 28(1):63-68

3.

Using unclassified continuous remote sensing data to improve distribution models of red-listed plant species

Miia Parviainen, Niklaus E. Zimmermann, Risto K. Heikkinen, Miska Luoto. Biodiversity and Conservation, 2013, 22(8):1731-1754

Remote sensing (RS) data may play an important role in developing cost-effective means for modelling, mapping, planning and conserving biodiversity. Specifically, at the landscape scale, spatial models of the occurrences of species of conservation concern may be improved by including RS-based predictors, helping managers to better meet different conservation challenges. In this study, we examine whether predicted distributions of 28 red-listed plant species in the boreal landscapes of north-eastern Finland, at a resolution of 25 ha, are improved when advanced RS variables (normalized difference vegetation index, normalized difference soil index and Tasseled Cap transformations) are included as unclassified continuous predictors alongside the more commonly used climate and topography variables. Using generalized additive models (GAMs), we fitted species-environment models with three different sets of explanatory variables: (1) climate-topography only; (2) remote sensing only; and (3) combined climate-topography and remote sensing variables. The models were evaluated by four-fold cross-validation using the area under the receiver operating characteristic curve (AUC). Including RS variables improved both the explanatory power (by 8.1% on average) and the cross-validation performance (by 2.5%) of the models. Hybrid models produced ecologically more reliable distribution maps than models using only climate-topography variables, especially for mire and shore species. In conclusion, Landsat ETM+ data integrated with climate and topographical information have the potential to improve biodiversity and rarity assessments in northern landscapes, especially in predictive studies covering extensive and remote areas.
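
The evaluation described above (four-fold cross-validation scored by AUC) can be sketched as follows. The model-fitting step is left as a placeholder callable `fit` rather than an actual GAM, and the stratified fold assignment (so that every fold contains both presences and absences) is an assumption of this sketch, not a detail given in the abstract.

```python
import random
from typing import Callable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random presence outscores a random
    absence (Mann-Whitney form; ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_kfold_auc(
    xs: Sequence[object],
    ys: Sequence[int],
    fit: Callable[[List[object], List[int]], Callable[[object], float]],
    k: int = 4,
    seed: int = 0,
) -> float:
    """Mean held-out AUC over k stratified folds.

    `fit` is a hypothetical placeholder for any model-fitting routine (a GAM in
    the study above); it takes training data and returns a scoring function.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(ys) if y == 1]
    neg = [i for i, y in enumerate(ys) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Deal presences and absences round-robin so each fold gets both classes.
    folds = [pos[f::k] + neg[f::k] for f in range(k)]
    fold_aucs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_aucs.append(auc([model(xs[i]) for i in fold], [ys[i] for i in fold]))
    return sum(fold_aucs) / k
```

Comparing the three predictor sets would amount to calling `stratified_kfold_auc` once per set with the same seed, so that all models are scored on identical folds.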

4.

Towards pan-genome read alignment to improve variation calling

Daniel Valenzuela, Tuukka Norri, Niko Välimäki, Esa Pitkänen, Veli Mäkinen. BMC genomics, 2018, 19(2):87

Background

A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, which contain more than 15,000 whole genomes and more than 126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing workflows are still based on a single human reference genome. Genetic variants are typically identified and genotyped from short-read data aligned to a single reference, disregarding the underlying variation.

Results

We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation – a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC.

Conclusions

Our experiments show that replacing a standard human reference with a pan-genomic one improves both single-nucleotide variant calling accuracy and short indel calling accuracy in difficult genomic regions, compared with the widely adopted Genome Analysis Toolkit (GATK).

5.

Using RPB1 sequences to improve phylogenetic inference among mushrooms (Inocybe, Agaricales)

Matheny PB, Liu YJ, Ammirati JF, Hall BD. American journal of botany, 2002, 89(4):688-698

An investigation of mushroom phylogeny was conducted using gene sequences of the largest subunit of RNA polymerase II (RPB1), in comparison with nuclear ribosomal large subunit RNA gene sequences (nLSU), for the same set of taxa in the genus Inocybe (Agaricales, Basidiomycota). The two data sets, though not significantly incongruent, conflict in the placement of two taxa that have long branches in the nLSU data set. In contrast, RPB1 terminal branch lengths are fairly uniform. Clades receive higher bootstrap support with RPB1, and combining the data sets increases confidence in several relationships. Overall, nLSU data do not yield a robust phylogeny when independently assessed against RPB1 sequences. This multigene study indicates that Inocybe is a monophyletic group composed of at least four distinct lineages: subgenus Mallocybe, section Cervicolores, section Rimosae, and subgenus Inocybe sensu Kühner, Kuyper, non Singer. Within subgenus Inocybe, two additional lineages, one composed of species with smooth basidiospores (clade I) and a second characterized by nodulose-spored species (clade II), are recovered by RPB1 and the combined data. The nLSU data recover only clade I. The genera Astrosporina and Inocybella cannot be recognized phylogenetically. "Supersections" Cortinatae and Marginatae are not monophyletic groups.

6.

Characterizing short read sequencing for gene discovery and RNA-Seq analysis in Crassostrea gigas

Mackenzie R. Gavery, Steven B. Roberts. Comparative biochemistry and physiology. Part D, Genomics & proteomics, 2012, 7(2):94-99

7.

Using RNA-Seq to Profile Soybean Seed Development from Fertilization to Maturity

Sarah I. Jones, Lila O. Vodkin. PloS one, 2013, 8(3)

8.

Using Synthetic Mouse Spike-In Transcripts to Evaluate RNA-Seq Analysis Tools

Dena Leshkowitz, Ester Feldmesser, Gilgi Friedlander, Ghil Jona, Elena Ainbinder, Yisrael Parmet, Shirley Horn-Saban. PloS one, 2016, 11(4)

9.

Robust Bayesian inference in lq-spherical models

Jacek Osiewalski. Biometrika, 1993, 80(2):456-460

10.

Population intervention models in causal inference

Hubbard AE, Laan MJ. Biometrika, 2008, 95(1):35-47

We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study.
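
As a rough illustration of the population intervention parameter, the sketch below compares an inverse-probability-of-treatment-weighted (IPTW) estimate of the mean outcome under a fixed treatment level with the observed mean outcome. It assumes the propensity scores `g` are already known or estimated, uses a normalized (Hajek-type) weighting, and omits the doubly robust estimators the authors also propose; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def iptw_population_intervention(
    y: Sequence[float],
    a: Sequence[int],
    g: Sequence[float],
    level: int,
) -> Tuple[float, float]:
    """IPTW contrast between setting everyone's treatment to `level` and the
    population as actually observed.

    y[i]: outcome, a[i]: observed treatment, g[i]: P(A = level | W_i), assumed
    known or estimated beforehand. Returns (difference, ratio) of means, i.e. the
    risk difference and relative risk when y is binary. The ratio is undefined
    (ZeroDivisionError) if the observed mean is zero.
    """
    num = den = 0.0
    for yi, ai, gi in zip(y, a, g):
        if ai == level:
            if gi <= 0.0:
                raise ValueError("positivity violated: g must be > 0 for these units")
            w = 1.0 / gi  # inverse-probability-of-treatment weight
            num += w * yi
            den += w
    if den == 0.0:
        raise ValueError("no units observed at the requested treatment level")
    mean_counterfactual = num / den  # normalized (Hajek-type) IPTW mean of Y_level
    mean_observed = sum(y) / len(y)
    return mean_counterfactual - mean_observed, mean_counterfactual / mean_observed
```

Contrasting against the observed population, rather than against a second counterfactual treatment level, is what distinguishes this parameter from the usual average treatment effect.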

11.

Using species distribution models to identify suitable areas for biofuel feedstock production

Jason M. Evans, Robert J. Fletcher Jr., Janaki Alavalapati. Global Change Biology Bioenergy, 2010, 2(2):63-78

The 2007 Energy Independence and Security Act mandates a five‐fold increase in US biofuel production by 2022. Given this ambitious policy target, spatially explicit estimates of landscape suitability for growing biofuel feedstocks are needed. We developed a suitability modeling approach for two major US biofuel crops, corn (Zea mays) and switchgrass (Panicum virgatum), based on two presence‐only species distribution models (SDMs): maximum entropy (Maxent) and support vector machines (SVM). SDMs are commonly used to model animal and plant distributions in natural environments, but have rarely been used to develop landscape models for cultivated crops. AUC, Kappa, and correlation measures derived from test data indicate that SVM slightly outperformed Maxent in modeling US corn production, although both models produced significantly accurate results. For switchgrass models fit with county‐scale point inputs derived from expert‐opinion estimates of production, SVM results correlated more strongly than Maxent results with a mechanistic switchgrass model recently developed by Oak Ridge National Laboratory (ORNL). However, for an alternative switchgrass model fit with point inputs from research trial sites, Maxent results correlated more strongly with the ORNL model than the corresponding SVM results. Further analysis indicates that both modeling approaches were effective in predicting county‐scale increases in corn production from 2006 to 2007, a period in which US corn production increased by 24%. We conclude that presence‐only methods are a powerful first‐cut tool for estimating relative land suitability across geographic regions in which candidate biofuel feedstocks can be grown, and may also provide important insight into the land‐use change patterns likely to be associated with increased biofuel demand.
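
Kappa, one of the test metrics listed above, requires binary predictions, so continuous suitability scores from Maxent or SVM must first be converted to presence/absence at some cutoff. A minimal sketch of Cohen's kappa follows; the cutoff value and the helper names are assumptions for illustration.

```python
from typing import List, Sequence


def threshold(scores: Sequence[float], cutoff: float) -> List[int]:
    """Convert continuous suitability scores to 0/1 presence predictions."""
    return [1 if s >= cutoff else 0 for s in scores]


def cohens_kappa(pred: Sequence[int], obs: Sequence[int]) -> float:
    """Cohen's kappa for binary (0/1) predicted vs observed presence."""
    n = len(pred)
    if n == 0 or n != len(obs):
        raise ValueError("pred and obs must be non-empty and equally long")
    p_agree = sum(p == o for p, o in zip(pred, obs)) / n  # observed agreement
    p_pred1 = sum(pred) / n
    p_obs1 = sum(obs) / n
    # Agreement expected by chance if predictions and observations were independent.
    p_chance = p_pred1 * p_obs1 + (1 - p_pred1) * (1 - p_obs1)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both series are the same constant")
    return (p_agree - p_chance) / (1 - p_chance)
```

Unlike AUC, kappa depends on the chosen cutoff, which is one reason the two metrics are often reported together.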

12.

Bias Correction in RNA-Seq Short-Read Counts Using Penalized Regression

David Dalpiaz, Xuming He, Ping Ma. Statistics in biosciences, 2013, 5(1):88-99

13.

Using hidden Markov models to analyze gene expression time course data

Schliep A, Schönhuth A, Steinhoff C. Bioinformatics (Oxford, England), 2003, 19(Suppl 1):i255-i263

MOTIVATION: Cellular processes cause changes over time. Observing and measuring those changes provides insight into how and why regulation occurs. Microarray technology provides the experimental platform for the large-scale experiments needed to obtain time courses of expression levels. However, how best to analyze the resulting time-course data is still very much under investigation. The inherent time dependencies in the data suggest that clustering techniques that reflect those dependencies will perform better. RESULTS: We propose using hidden Markov models (HMMs) to account for the horizontal dependencies along the time axis in time-course data and to cope with the prevalent errors and missing values. The HMMs are used within a model-based clustering framework. Given a number of clusters, each represented by one HMM from a finite collection encompassing typical qualitative behavior, our method iteratively finds cluster models and an assignment of data points to these models that maximizes the joint likelihood of clustering and models. Partially supervised learning (adding groups of labeled data to the initial collection of clusters) is supported. A graphical user interface allows querying an expression-profile dataset for time courses similar to a prototype defined graphically as a sequence of levels and durations. We also propose a heuristic approach to automatically determine the number of clusters. We evaluate the method on published yeast cell-cycle and fibroblast serum-response datasets and compare it favorably with the autoregressive curves method.
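
The assignment step of such a model-based clustering loop can be sketched with the forward algorithm: each time course is scored under every cluster HMM and assigned to the model with the highest likelihood. Gaussian emissions, the example model topologies, and skipping the emission term for missing values (which marginalizes them out) are assumptions of this sketch; the re-estimation of model parameters that the full iterative procedure requires is not shown.

```python
import math
from typing import List, Optional, Sequence


def _log(p: float) -> float:
    return math.log(p) if p > 0.0 else -math.inf


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


class GaussianHMM:
    """Discrete-state HMM with one Gaussian emission per state."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 means: List[float], sds: List[float]) -> None:
        self.start, self.trans, self.means, self.sds = start, trans, means, sds

    def _emit(self, state: int, x: Optional[float]) -> float:
        if x is None:  # missing value: marginalize it out (log 1 = 0)
            return 0.0
        mu, sd = self.means[state], self.sds[state]
        return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

    def log_likelihood(self, obs: Sequence[Optional[float]]) -> float:
        """Forward algorithm in log space: log P(obs | model)."""
        n = len(self.start)
        alpha = [_log(self.start[s]) + self._emit(s, obs[0]) for s in range(n)]
        for x in obs[1:]:
            alpha = [
                _logsumexp([alpha[r] + _log(self.trans[r][s]) for r in range(n)])
                + self._emit(s, x)
                for s in range(n)
            ]
        return _logsumexp(alpha)


def assign_to_clusters(series: List[Sequence[Optional[float]]],
                       models: List[GaussianHMM]) -> List[int]:
    """Assignment step: give each time course to its most likely cluster HMM."""
    return [max(range(len(models)), key=lambda k: models[k].log_likelihood(ts))
            for ts in series]
```

A two-state left-to-right model (state 0 may move to state 1 but never back) is one simple way to encode a qualitative "rising" prototype as a sequence of levels and durations.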

14.

Identification of Candidate Genes Related to Stem Development in Brassica napus Using RNA-Seq

Yuan Rong, Zeng Xinhua, Zhao Shengbo, Wu Gang, Yan Xiaohong. Plant Molecular Biology Reporter, 2019, 37(4):347-364

Plant stems support the entire plant body and therefore have an important effect on the yield of oilseed rape. The current understanding of the...

15.

Using habitat distribution models to evaluate large-scale landscape priorities for spatially dynamic species

Regan Early, Barbara Anderson, Chris D. Thomas. Journal of Applied Ecology, 2008, 45(1):228-238

16.

Characterization of FUS Mutations in Amyotrophic Lateral Sclerosis Using RNA-Seq

Marka van Blitterswijk, Eric T. Wang, Brad A. Friedman, Pamela J. Keagle, Patrick Lowe, Ashley Lyn Leclerc, Leonard H. van den Berg, David E. Housman, Jan H. Veldink, John E. Landers. PloS one, 2013, 8(4)

17.

Correction to Protein Identification Using Customized Protein Sequence Databases Derived from RNA-Seq Data

X Wang, RJ Slebos, D Wang, PJ Halvey, DL Tabb, DC Liebler, B Zhang. Journal of proteome research, 2012, 11(9):4764

18.

Pairwise likelihood methods for inference in image models (total citations: 3; self-citations: 0; citations by others: 3)

Nott DJ, Ryden T. Biometrika, 1999, 86(3):661-676

19.

Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation (total citations: 1; self-citations: 0; citations by others: 1)

McCarthy DJ, Chen Y, Smyth GK. Nucleic acids research, 2012, 40(10):4288-4297

A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It can analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model (GLM) methods to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate that adjusted profile likelihood estimators return accurate estimates of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming a common dispersion or separate genewise dispersions. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
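
The compromise between a common dispersion and genewise dispersions can be illustrated with a deliberately crude sketch under the negative binomial variance model Var(Y) = mu + phi * mu^2. It uses method-of-moments estimates and a simple weighted average controlled by a hypothetical `prior_df` weight, assumes equal library sizes and a one-group design, and is not the adjusted profile likelihood and empirical Bayes procedure implemented in edgeR.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def moment_dispersion(counts: Sequence[float]) -> float:
    """Method-of-moments NB dispersion from Var = mu + phi * mu**2 (floored at 0).
    Requires at least two replicates."""
    m = mean(counts)
    if m == 0:
        return 0.0
    return max((variance(counts) - m) / (m * m), 0.0)


def shrink_dispersions(rows: List[Sequence[float]],
                       prior_df: float = 10.0) -> Tuple[float, List[float]]:
    """Pull each genewise dispersion toward a shared value.

    prior_df is a hypothetical tuning weight: larger values trust the shared
    dispersion more; prior_df = 0 returns the raw genewise estimates.
    """
    genewise = [moment_dispersion(r) for r in rows]
    common = mean(genewise)  # crude shared value, not edgeR's common-dispersion estimate
    shrunk = []
    for r, phi in zip(rows, genewise):
        resid_df = len(r) - 1  # one mean per gene, no design matrix
        w = prior_df / (prior_df + resid_df)
        shrunk.append(w * common + (1.0 - w) * phi)
    return common, shrunk
```

With few replicates the residual degrees of freedom are small, so each gene's estimate leans toward the shared value; as replicates accumulate, the genewise estimate dominates.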

20.

Genome-specific higher-order background models to improve motif detection

Marchal K, Thijs G, De Keersmaecker S, Monsieurs P, De Moor B, Vanderleyden J. Trends in microbiology, 2003, 11(2):61-66

Motif detection based on Gibbs sampling is a common procedure used to retrieve regulatory motifs in silico. Using a species-specific background model was previously shown to increase the robustness of the algorithm. Here, we demonstrate that selecting a non-species-adapted background model can have an adverse effect on the results of motif detection. The large differences in the average nucleotide composition of prokaryotic sequences exacerbate the problem of exchanging background models. Therefore, we have developed complex background models for all prokaryotic species with available genome sequences.
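
The effect of the background model on motif scoring can be illustrated with a small sketch: an order-k Markov background trained on genome sequence, and a log-odds score comparing a candidate window under a position weight matrix (PWM) with the same window under that background. The pseudocounts, the uniform fallback for positions lacking a full context, and all names are assumptions; this is not the authors' Gibbs sampler.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

ALPHABET = "ACGT"


def train_background(seqs: Sequence[str], order: int = 3,
                     pseudo: float = 1.0) -> Callable[[str, str], float]:
    """Fit an order-`order` Markov background: P(base | preceding `order` bases)."""
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context: str, base: str) -> float:
        c = counts.get(context, {})
        return (c.get(base, 0.0) + pseudo) / (sum(c.values()) + pseudo * len(ALPHABET))

    return prob


def motif_log_odds(seq: str, start: int, pwm: List[Dict[str, float]],
                   bg: Callable[[str, str], float], order: int) -> float:
    """log P(window | PWM) - log P(window | background).

    Background probabilities condition on the bases preceding each position;
    positions without a full context fall back to a uniform 1/4 (a simplification).
    """
    score = 0.0
    for j, col in enumerate(pwm):
        i = start + j
        base = seq[i]
        if i >= order:
            p_bg = bg(seq[i - order:i], base)
        else:
            p_bg = 1.0 / len(ALPHABET)
        score += math.log(col[base]) - math.log(p_bg)
    return score
```

Under an AT-rich background, an AT-rich motif window scores lower than it would under a GC-rich background, which illustrates why exchanging background models between species with very different nucleotide compositions can distort motif detection.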