Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
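Processing and analysis of next-generation high-throughput RNA sequencing data   Total citations: 4 (self-citations: 0, citations by others: 4)
With the rapid development of next-generation high-throughput DNA sequencing technology, RNA sequencing (RNA-seq) has become an important new tool for gene expression and transcriptome analysis. The massive amounts of data produced by RNA-seq bring new opportunities and challenges to bioinformatics. Effective, targeted bioinformatic processing and analysis of the sequencing data is key to whether RNA-seq can play a major role in scientific discovery. Taking data from the next-generation Illumina/Solexa sequencing platform as an example, this paper briefly introduces the high-throughput RNA-seq workflow, then gives a fairly comprehensive review of the methods and existing software for RNA-seq data processing and analysis, and discusses open problems that await further research.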

2.
3.
4.
RNA sequencing (RNA-seq) not only measures total gene expression but may also measure allele-specific gene expression in diploid individuals. RNA-seq data collected from F1 reciprocal crosses in mice can powerfully dissect strain and parent-of-origin effects on allelic imbalance of gene expression. In this article, we develop a novel statistical approach to analyze RNA-seq data from F1 and inbred strains. Method development was motivated by a study of F1 reciprocal crosses derived from highly divergent mouse strains, to which we apply the proposed method. Our method jointly models the total number of reads and the number of allele-specific reads of each gene, which significantly boosts power for detecting strain and particularly parent-of-origin effects. The method deals with the overdispersion problem commonly observed in read counts and can flexibly adjust for the effects of covariates such as sex and read depth. The X chromosome in mouse presents particular challenges. As in other mammals, X chromosome inactivation silences one of the two X chromosomes in each female cell, although the choice of which chromosome is silenced can be highly skewed by alleles at the X-linked X-controlling element (Xce) and by stochastic effects. Our model accounts for these chromosome-wide effects on an individual level, allowing proper analysis of chromosome X expression. Furthermore, we propose a genomic control procedure to properly control type I error for RNA-seq studies. A number of these methodological improvements can also be applied to RNA-seq data from other species as well as to other types of next-generation sequencing data sets. Finally, we show through simulations that increasing the number of samples is more beneficial than increasing the library size for mapping both the strain and parent-of-origin effects. Unless recruiting samples is prohibitively expensive, we recommend sequencing more samples with lower coverage.
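The idea of jointly modelling total and allele-specific read counts while allowing for overdispersion can be illustrated with a small log-likelihood sketch. The abstract does not name the distributions used, so the choice here is an assumption: a negative binomial for total reads and a beta-binomial for allele-specific reads, treated as independent given the parameters. All function and parameter names are hypothetical and covariates are omitted.

```python
import math

def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi  # "size" parameter; smaller phi means less overdispersion
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def betabinom_logpmf(k, n, pi, theta):
    """Beta-binomial log-pmf: k of n allele-specific reads come from allele 1.

    pi is the expected allelic proportion, theta > 0 the overdispersion.
    """
    a, b = pi / theta, (1.0 - pi) / theta

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

def joint_loglik(samples, mu, phi, pi, theta):
    """Sum the two components over samples for one gene.

    samples: iterable of (total_reads, allele_specific_reads, allele_1_reads).
    Assumption: the two components are independent given the parameters.
    """
    return sum(nb_logpmf(t, mu, phi) + betabinom_logpmf(k, n, pi, theta)
               for t, n, k in samples)
```

In such a setup, a strain or parent-of-origin effect would enter through `mu` and `pi`; maximizing `joint_loglik` over those parameters is left out of this sketch.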

5.
6.
7.
Identifying somatic mutations is critical for cancer genome characterization and for prioritizing patient treatment. DNA whole exome sequencing (DNA-WES) is currently the most popular technology; however, it yields low sensitivity in low-purity tumors. RNA sequencing (RNA-seq) covers the expressed exome with depth proportional to expression. We hypothesized that integrating DNA-WES and RNA-seq would enable superior mutation detection versus DNA-WES alone. We developed a first-of-its-kind method, called UNCeqR, that detects somatic mutations by integrating patient-matched RNA-seq and DNA-WES. In simulation, the integrated DNA and RNA model outperformed the DNA-WES-only model. Validation by patient-matched whole genome sequencing demonstrated superior performance of the integrated model over DNA-WES-only models, including a published method and published mutation profiles. Genome-wide mutational analysis of breast and lung cancer cohorts (n = 871) revealed remarkable tumor genomics properties. Low-purity tumors experienced the largest gains in mutation detection by integrating RNA-seq and DNA-WES. RNA provided greater mutation signal than DNA in expressed mutations. Compared to earlier studies on this cohort, UNCeqR increased mutation rates of driver and therapeutically targeted genes (e.g. PIK3CA, ERBB2 and FGFR2). In summary, integrating RNA-seq with DNA-WES increases mutation detection performance, especially for low-purity tumors.

8.
9.
10.
11.

Background

High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.

Results

We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset.

Conclusion

Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.
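The key step described above, removing potential DEGs before calculating the normalization factor, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization formula or the DEG test, so a median-of-ratios factor and a plain fold-change ranking stand in for them here. The function names and the `drop_fraction` parameter are hypothetical.

```python
import math
from statistics import median

def median_ratio_factors(counts, keep=None):
    """Per-sample scaling factors: median ratio of a count to its gene's geometric mean.

    counts: list of genes, each a list of per-sample counts.
    keep:   indices of the genes to use (default: all genes).
    """
    idx = range(len(counts)) if keep is None else keep
    # Genes containing a zero have no usable geometric mean, so they are skipped.
    rows = [counts[i] for i in idx if all(c > 0 for c in counts[i])]
    geo = [math.exp(sum(math.log(c) for c in row) / len(row)) for row in rows]
    n_samples = len(counts[0])
    return [median(row[j] / g for row, g in zip(rows, geo)) for j in range(n_samples)]

def deg_elimination_factors(counts, group_a, group_b, drop_fraction=0.25):
    """Normalize, flag likely DEGs, then re-normalize without them."""
    # Step 1: provisional factors from all genes.
    first_pass = median_ratio_factors(counts)
    # Step 2: rank genes by absolute log2 fold change between the two groups.
    # Assumption: this crude ranking stands in for a real DEG test.
    scores = []
    for i, row in enumerate(counts):
        scaled = [c / f for c, f in zip(row, first_pass)]
        mean_a = sum(scaled[j] for j in group_a) / len(group_a)
        mean_b = sum(scaled[j] for j in group_b) / len(group_b)
        scores.append((abs(math.log2((mean_a + 1) / (mean_b + 1))), i))
    scores.sort(reverse=True)
    dropped = sorted(i for _, i in scores[:int(len(counts) * drop_fraction)])
    # Step 3: recompute the factors using only the genes that were not flagged.
    keep = [i for i in range(len(counts)) if i not in dropped]
    return median_ratio_factors(counts, keep), dropped
```

Calling `deg_elimination_factors(counts, group_a=[0, 1], group_b=[2, 3])` returns the final factors and the indices of the genes that were set aside; dividing each sample's counts by its factor gives the normalized data.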

12.
13.
Quick and accurate identification of microbial pathogens is essential for both diagnosis and response to emerging infectious diseases. The advent of next-generation sequencing technology offers an unprecedented platform for rapid sequencing-based identification of novel viruses. We have developed a customized bioinformatics data analysis pipeline, VirusHunter, for the analysis of data from Roche/454 and other long-read next-generation sequencing platforms. To illustrate the utility of VirusHunter, we performed Roche/454 GS FLX Titanium sequencing on two unclassified virus isolates from the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA). VirusHunter identified sequences derived from a novel bunyavirus and a novel reovirus in the two samples, respectively. Further sequence analysis demonstrated that the viruses were novel members of the Phlebovirus and Orbivirus genera. Both genera include many economically important viruses and serious human pathogens.

14.
Duchenne/Becker muscular dystrophies are the most frequent inherited neuromuscular diseases and are caused by mutations of the dystrophin gene. However, approximately 30% of patients with the disease do not receive a molecular diagnosis because of the complex mutational spectrum and the large size of the gene. The introduction and use of next-generation sequencing have advanced clinical genetic research and might be a suitable method for the detection of various types of mutations in the dystrophin gene. To identify the mutational spectrum using a single platform, whole dystrophin gene sequencing was performed using next-generation sequencing. The entire dystrophin gene, including all exons, introns and promoter regions, was target enriched using a DMD whole gene enrichment kit. The enrichment libraries were sequenced on an Illumina HiSeq 2000 sequencer using 100 bp paired-end sequencing. We studied 26 patients: 21 had known large deletions/duplications and 5 did not have large deletions/duplications detectable by multiplex ligation-dependent probe amplification (MLPA). We applied whole dystrophin gene analysis by next-generation sequencing to the five patients without detectable large deletions/duplications and to five randomly chosen patients from the 21 with large deletions/duplications. The sequencing data covered almost 100% of the exonic region of the dystrophin gene by ≥10 reads, with a mean read depth of 147. Five small mutations were identified in the first five patients, of which four variants were unreported in the dmd.nl database. The deleted or duplicated exons and the breakpoints in the five large deletion/duplication patients were precisely identified. Whole dystrophin gene sequencing by next-generation sequencing may be a useful tool for the genetic diagnosis of Duchenne and Becker muscular dystrophies.
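The two coverage figures quoted above (the fraction of exonic positions covered by ≥10 reads and the mean read depth) can be computed from per-position depths as in the following sketch. The function name and the input format, a plain list of integer depths for the exonic positions, are assumptions made for illustration; the abstract does not describe the tooling actually used.

```python
def coverage_summary(depths, min_depth=10):
    """Return (fraction of positions with depth >= min_depth, mean depth).

    depths: per-position read depths over the target region (hypothetical input).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)
```

For example, `coverage_summary([0, 9, 10, 150, 200, 300])` reports that four of the six positions reach the 10-read threshold, together with the mean depth over all six.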

15.
16.
17.
18.
19.
20.
