期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Mapping transcriptome-wide protein-RNA interactions to elucidate RNA regulatory programs

Molly M. Hannigan Leah L. Zagore Donny D. Licatalosi 《Quantitative Biology.》2018,6(3):228

相似文献

2.

Comprehensive simulation of metagenomic sequencing data with non-uniform sampling distribution

Shansong Liu Kui Hua Sijie Chen Xuegong Zhang 《Quantitative Biology.》2018,6(2):175

Background: Metagenomic sequencing is a complex sampling procedure from unknown mixtures of many genomes. Having metagenome data with known genome compositions is essential for both benchmarking bioinformatics software and for investigating influences of various factors on the data. Compared to data from real microbiome samples or from defined microbial mock community, simulated data with proper computational models are better for the purpose as they provide more flexibility for controlling multiple factors. Methods: We developed a non-uniform metagenomic sequencing simulation system (nuMetaSim) that is capable of mimicking various factors in real metagenomic sequencing to reflect multiple properties of real data with customizable parameter settings. Results: We generated 9 comprehensive metagenomic datasets with different composition complexity from of 203 bacterial genomes and 2 archaeal genomes related with human intestine system. Conclusion: The data can serve as benchmarks for comparing performance of different methods at different situations, and the software package allows users to generate simulation data that can better reflect the specific properties in their scenarios. 相似文献

3.

Protein–DNA/RNA interactions: Machine intelligence tools and approaches in the era of artificial intelligence and big data

Feifei Cui Zilong Zhang Chen Cao Quan Zou Dong Chen Xi Su 《Proteomics》2022,22(8):2100197

With the development of artificial intelligence (AI) technologies and the availability of large amounts of biological data, computational methods for proteomics have undergone a developmental process from traditional machine learning to deep learning. This review focuses on computational approaches and tools for the prediction of protein – DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development progress of computational methods and summarize the advantages and shortcomings of these methods. We further compiled applications in tasks related to the protein – DNA/RNA interactions, and pointed out possible future application trends. Moreover, biological sequence-digitizing representation strategies used in different types of computational methods are also summarized and discussed. 相似文献

4.

NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data

Ye Yuting Li Jingyi Jessica 《BMC genomics》2016,17(1):127-140

相似文献

5.

DE-kupl: exhaustive capture of biological variation in RNA-seq data through <Emphasis Type="Italic">k</Emphasis>-mer decomposition

Jérôme Audoux Nicolas Philippe Rayan Chikhi Mikaël Salson Mélina Gallopin Marc Gabriel Jérémy Le Coz Emilie Drouineau Thérèse Commes Daniel Gautheret 《Genome biology》2017,18(1):243

相似文献

6.

Biases in small RNA deep sequencing data

Carsten A. Raabe Thean-Hock Tang Juergen Brosius Timofey S. Rozhdestvensky 《Nucleic acids research》2014,42(3):1414-1426

相似文献

7.

Towards precise reconstruction of gene regulatory networks by data integration

Zhi-Ping Liu 《Quantitative Biology.》2018,6(2):113

Background: More and more high-throughput datasets are available from multiple levels of measuring gene regulations. The reverse engineering of gene regulatory networks from these data offers a valuable research paradigm to decipher regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks. Results: In this paper, we provide a review of bioinformatics methods for inferring gene regulatory network from omics data. To achieve the precision reconstruction of gene regulatory networks, an intuitive alternative is to integrate these available resources in a rational framework. We also provide computational perspectives in the endeavors of inferring gene regulatory networks from heterogeneous data. We highlight the importance of multi-omics data integration with prior knowledge in gene regulatory network inferences. Conclusions: We provide computational perspectives of inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize on prior knowledge and data integration in network inferences owing to their abilities of identifying regulatory causality. 相似文献

8.

Bias and Correction in RNA-seq Data for Marine Species

Kai Song Li Li Guofan Zhang 《Marine biotechnology (New York, N.Y.)》2017,19(5):541-550

相似文献

9.

An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era

Zhenqiang Su Hong Fang Huixiao Hong Leming Shi Wenqian Zhang Wenwei Zhang Yanyan Zhang Zirui Dong Lee J Lancashire Marina Bessarabova Xi Yang Baitang Ning Binsheng Gong Joe Meehan Joshua Xu Weigong Ge Roger Perkins Matthias Fischer Weida Tong 《Genome biology》2014,15(12)

Background

Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in enormous data, predictive models, and biomarkers accrued. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment?

Results

We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined.

Conclusions

Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples; while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0523-y) contains supplementary material, which is available to authorized users. 相似文献

10.

scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets

Qianqian Song Jing Su Lance D.Miller Wei Zhang 《基因组蛋白质组与生物信息学报(英文版)》2021,19(2):330-341

In gene expression profiling studies, including single-cell RNA sequencing(sc RNA-seq)analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in sc RNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present single-cell Latent-variable Model(sc LM), a gene coclustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biologic context. Importantly, sc LM can simultaneously cluster multiple single-cell datasets, i.e., consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. sc LM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulation data and experimental data demonstrate that sc LM outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of sc LM, we apply it to our in-house and public experimental sc RNA-seq datasets. sc LM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the sc LM method is available at https://github.com/QSong-github/sc LM. 相似文献

11.

BowStrap v1.0: Assigning statistical significance to expressed genes using short-read transcriptome data

Larsen PE Collart FR 《BMC research notes》2012,5(1):275

相似文献

12.

Computational methods for transcriptome annotation and quantification using RNA-seq 总被引：2，自引：0，他引：2

Garber M Grabherr MG Guttman M Trapnell C 《Nature methods》2011,8(6):469-477

相似文献

13.

Estimating cell type composition using isoform expression one gene at a time

Hillary M. Heiling Douglas R. Wilson Naim U. Rashid Wei Sun Joseph G. Ibrahim 《Biometrics》2023,79(2):854-865

Human tissue samples are often mixtures of heterogeneous cell types, which can confound the analyses of gene expression data derived from such tissues. The cell type composition of a tissue sample may itself be of interest and is needed for proper analysis of differential gene expression. A variety of computational methods have been developed to estimate cell type proportions using gene-level expression data. However, RNA isoforms can also be differentially expressed across cell types, and isoform-level expression could be equally or more informative for determining cell type origin than gene-level expression. We propose a new computational method, IsoDeconvMM, which estimates cell type fractions using isoform-level gene expression data. A novel and useful feature of IsoDeconvMM is that it can estimate cell type proportions using only a single gene, though in practice we recommend aggregating estimates of a few dozen genes to obtain more accurate results. We demonstrate the performance of IsoDeconvMM using a unique data set with cell type–specific RNA-seq data across more than 135 individuals. This data set allows us to evaluate different methods given the biological variation of cell type–specific gene expression data across individuals. We further complement this analysis with additional simulations. 相似文献

14.

A ratiometric-based measure of gene co-expression

Anna CT Abelin Georgi K Marinov Brian A Williams Kenneth McCue Barbara J Wold 《BMC bioinformatics》2014,15(1)

相似文献

15.

A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data

Kvam VM Liu P Si Y 《American journal of botany》2012,99(2):248-256

RNA-Seq technologies are quickly revolutionizing genomic studies, and statistical methods for RNA-seq data are under continuous development. Timely review and comparison of the most recently proposed statistical methods will provide a useful guide for choosing among them for data analysis. Particular interest surrounds the ability to detect differential expression (DE) in genes. Here we compare four recently proposed statistical methods, edgeR, DESeq, baySeq, and a method with a two-stage Poisson model (TSPM), through a variety of simulations that were based on different distribution models or real data. We compared the ability of these methods to detect DE genes in terms of the significance ranking of genes and false discovery rate control. All methods compared are implemented in freely available software. We also discuss the availability and functions of the currently available versions of these software. 相似文献

16.

General statistical modeling of data from protein relative expression isobaric tags

Breitwieser FP Müller A Dayon L Köcher T Hainard A Pichler P Schmidt-Erfurth U Superti-Furga G Sanchez JC Mechtler K Bennett KL Colinge J 《Journal of proteome research》2011,10(6):2758-2766

Quantitative comparison of the protein content of biological samples is a fundamental tool of research. The TMT and iTRAQ isobaric labeling technologies allow the comparison of 2, 4, 6, or 8 samples in one mass spectrometric analysis. Sound statistical models that scale with the most advanced mass spectrometry (MS) instruments are essential for their efficient use. Through the application of robust statistical methods, we developed models that capture variability from individual spectra to biological samples. Classical experimental designs with a distinct sample in each channel as well as the use of replicates in multiple channels are integrated into a single statistical framework. We have prepared complex test samples including controlled ratios ranging from 100:1 to 1:100 to characterize the performance of our method. We demonstrate its application to actual biological data sets originating from three different laboratories and MS platforms. Finally, test data and an R package, named isobar, which can read Mascot, Phenyx, and mzIdentML files, are made available. The isobar package can also be used as an independent software that requires very little or no R programming skills. 相似文献

17.

Transcriptome analysis of bacterial pathogens in vivo: problems and solutions

Skvortsov TA Azhikina TL 《Bioorganicheskaia khimiia》2010,36(5):596-606

相似文献

18.

A case study on the detailed reproducibility of a Human Cell Atlas project

Kui Hua Xuegong Zhang 《Quantitative Biology.》2019,7(2):162

Background: Reproducibility is a defining feature of a scientific discovery. Reproducibility can be at different levels for different types of study. The purpose of the Human Cell Atlas (HCA) project is to build maps of molecular signatures of all human cell types and states to serve as references for future discoveries. Constructing such a complex reference atlas must involve the assembly and aggregation of data from multiple labs, probably generated with different technologies. It has much higher requirements on reproducibility than individual research projects. To add another layer of complexity, the bioinformatics procedures involved for single-cell data have high flexibility and diversity. There are many factors in the processing and analysis of single-cell RNA-seq data that can shape the final results in different ways. Methods: To study what levels of reproducibility can be reached in current practices, we conducted a detailed reproduction study for a well-documented recent publication on the atlas of human blood dendritic cells as an example to break down the bioinformatics steps and factors that are crucial for the reproducibility at different levels. Results: We found that the major scientific discovery can be well reproduced after some efforts, but there are also some differences in some details that may cause uncertainty in the future reference. This study provides a detailed case observation on the on-going discussions of the type of standards the HCA community should take when releasing data and publications to guarantee the reproducibility and reliability of the future atlas. Conclusion: Current practices of releasing data and publications may not be adequate to guarantee the reproducibility of HCA. We propose building more stringent guidelines and standards on the information that needs to be provided along with publications for projects that evolved in the HCA program. 相似文献

19.

XSAnno: a framework for building ortholog models in cross-species transcriptome comparisons

Ying Zhu Mingfeng Li André MM Sousa Nenad ?estan 《BMC genomics》2014,15(1)

相似文献

20.

Nutritional Systems Biology Modeling: From Molecular Mechanisms to Physiology

Albert A. de Graaf Andreas P. Freidig Baukje De Roos Neema Jamshidi Matthias Heinemann Johan A.C. Rullmann Kevin D. Hall Martin Adiels Ben van Ommen 《PLoS computational biology》2009,5(11)

The use of computational modeling and simulation has increased in many biological fields, but despite their potential these techniques are only marginally applied in nutritional sciences. Nevertheless, recent applications of modeling have been instrumental in answering important nutritional questions from the cellular up to the physiological levels. Capturing the complexity of today''s important nutritional research questions poses a challenge for modeling to become truly integrative in the consideration and interpretation of experimental data at widely differing scales of space and time. In this review, we discuss a selection of available modeling approaches and applications relevant for nutrition. We then put these models into perspective by categorizing them according to their space and time domain. Through this categorization process, we identified a dearth of models that consider processes occurring between the microscopic and macroscopic scale. We propose a “middle-out” strategy to develop the required full-scale, multilevel computational models. Exhaustive and accurate phenotyping, the use of the virtual patient concept, and the development of biomarkers from “-omics” signatures are identified as key elements of a successful systems biology modeling approach in nutrition research—one that integrates physiological mechanisms and data at multiple space and time scales. 相似文献