Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
2.
3.
4.
We report here the identification and validation of the first papillomavirus-encoded microRNAs expressed in human cervical lesions and cell lines. We established small RNA libraries from ten human papillomavirus-associated cervical lesions, including cancer, and two human papillomavirus-harboring cell lines. These libraries were sequenced using SOLiD 4 technology. We used the sequencing data to predict putative viral microRNAs and discovered nine putative papillomavirus-encoded microRNAs. Validation was performed for five candidates, four of which were successfully validated by qPCR from cervical tissue samples and cell lines: two were encoded by HPV 16, one by HPV 38 and one by HPV 68. The expression of HPV 16 microRNAs was further confirmed by in situ hybridization, and colocalization with p16INK4A was established. Prediction of cellular target genes of HPV 16-encoded microRNAs suggests that they may play a role in cell cycle, immune functions, cell adhesion and migration, development, and cancer. Two putative viral target sites for the two validated HPV 16 miRNAs were mapped to the E5 gene, one to the E1 gene, two to the L1 gene and one to the LCR region. This is the first report to show that papillomaviruses encode their own microRNA species. Importantly, microRNAs were found in libraries established from human cervical disease and carcinoma cell lines, and their expression was confirmed in additional tissue samples. To our knowledge, this is also the first paper to use in situ hybridization to show the expression of a viral microRNA in human tissue.

5.
Constructing mixtures of tagged or bar-coded DNAs for sequencing is an important requirement for the efficient use of next-generation sequencers in applications where limited sequence data are required per sample. There are many applications in which next-generation sequencing can be used effectively to sequence large mixed samples; an example is the characterization of microbial communities where ≤1,000 sequences per sample are adequate to address research questions. Thus, it is possible to examine hundreds to thousands of samples per run on massively parallel next-generation sequencers. However, the cost savings from efficient utilization of sequence capacity are realized only if the production and management costs associated with construction of multiplex pools are also scalable. One critical step in multiplex pool construction is the normalization process, whereby equimolar amounts of each amplicon are mixed. Here we compare three approaches (spectroscopy, size-restricted spectroscopy, and quantitative binding) for normalization of large, multiplex amplicon pools for performance and efficiency. We found that the quantitative binding approach was superior and represents an efficient scalable process for construction of very large, multiplex pools with hundreds and perhaps thousands of individual amplicons included. We demonstrate the increased sequence diversity identified with higher throughput. Massively parallel sequencing can dramatically accelerate microbial ecology studies by allowing appropriate replication of sequence acquisition to account for temporal and spatial variations. Further, population studies to examine genetic variation, which require even lower levels of sequencing, should be possible where thousands of individual bar-coded amplicons are examined in parallel.

Emergent technologies that generate DNA sequence data are designed primarily to perform resequencing projects at reasonable cost.
The result is a substantial decrease in per base costs from traditional methods. However, these next-generation platforms do not readily accommodate projects that require obtaining moderate amounts of sequence from large numbers of samples. These platforms also have per run costs that are significant and generally preclude large numbers of single-sample, nonmultiplexed runs. One example of research that is not readily supported is rRNA-directed metagenomics study of some human clinical samples or environmental rRNA analysis of samples from communities with low community diversity that require only thousands of sequences. Thus, strategies to utilize next-generation DNA sequencers efficiently for applications that require lower throughput are critical to capitalize on the efficiency and cost benefits of next-generation sequencing platforms.

Directed metagenomics based on amplification of rRNA genes is an important tool to characterize microbial communities in various environmental and clinical settings. In diverse environmental samples, large numbers of sequences are required to fully characterize the microbial communities (15). However, a lower number of sequences is generally adequate to answer specific research questions. In addition, the levels of diversity in human clinical samples are usually lower than what is observed in environmental samples (for example, see reference 7).

The Roche 454 genome sequencer system FLX pyrosequencer (which we will refer to as 454 FLX hereafter) is the most useful platform for rRNA-directed metagenomics because it currently provides the longest read lengths of any next-generation sequencing platform (1, 14).
Computational analysis has shown that the 250-nucleotide read length (available from the 454 FLX-LR chemistry) is adequate for identification of bacteria if the amplified region is properly positioned within variable regions of the small-subunit rRNA (SSU-rRNA) gene (9, 10).

In this study, we used the 454 FLX-LR genome sequencing platform and chemistry, which provides >400,000 sequences of ∼250 bp per run. After we conducted this study, a new reagent set (454 FLX-XLR titanium chemistry) was released, which further increases reads to >1,000,000 and read lengths to >400 bp (Roche). The 454 FLX platform dramatically reduces per base costs of obtaining sequence, and the plate can be physically separated into between 2 and 16 lanes; this separation, however, reduces overall sequencing output by up to 40% when 16 lanes are compared with 2 lanes. For applications where modest sequencing depth (∼1,000 sequences per sample) is adequate to address research questions, physical separation does not allow adequate sample multiplexing because even a 1/16 454 FLX-LR plate run is expected to produce ∼15,000 reads. Further, the utility of the platform as a screening tool at 16-plex is limited by cost per run.

A solution to make next-generation sequencing economical for projects such as rRNA-directed metagenomics is to use bar-coded primers to multiplex amplicon pools so they can be sequenced together and computationally separated afterward (6). For this strategy to succeed, the DNA concentrations of the individual amplicons in the multiplex pools must be precisely normalized, particularly when large numbers of pooled samples are sequenced in parallel.
There are several potential methods available for normalizing concentrations of amplicons included in multiplex pools, but the relative and absolute performance of each approach has not been compared.

In this study, we present a direct quantitative comparison of three available methods for amplicon pool normalization for downstream next-generation sequencing. The central goal of the study was to identify the most effective method for normalizing multiplex pools containing >100 individual amplicons. We evaluated each pooling approach by 454 sequencing and compared the observed frequencies of sequences from different pooled bar-coded amplicons. From these data, we determined the efficacy of each method based on the following factors: (i) how well normalized the sequences within the pool were, (ii) the proportion of samples failing to meet a minimum threshold of sequences per sample, and (iii) the overall efficiency (speed and labor required) of the process to multiplex samples.
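
The normalization step in this abstract, mixing equimolar amounts of each amplicon, comes down to simple arithmetic once each amplicon's concentration is known. The Python sketch below is illustrative only and is not the authors' protocol. It assumes concentrations measured in ng/µL, an average mass of about 660 g/mol per double-stranded base pair, and hypothetical sample names and values. The last function reproduces, under the further assumption that the >400,000-read figure is the baseline to which the 40% output loss applies, a read count of the same size as the ∼15,000 reads quoted for a 1/16 plate region.

```python
# Illustrative sketch, not the authors' protocol: equimolar pooling arithmetic.
AVG_BP_MASS = 660.0  # assumed average molar mass of one dsDNA base pair (g/mol)


def molarity_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to a molar concentration (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(samples, fmol_each):
    """Return the volume (uL) of each amplicon that contributes fmol_each femtomoles.

    samples maps a sample name to (concentration in ng/uL, amplicon length in bp).
    Note that 1 nM is the same as 1 fmol/uL.
    """
    return {
        name: fmol_each / molarity_nM(conc, length)
        for name, (conc, length) in samples.items()
    }


def reads_per_region(total_reads, regions, output_loss):
    """Estimate reads per plate region after a fractional loss from physical separation."""
    return total_reads * (1.0 - output_loss) / regions


if __name__ == "__main__":
    # Hypothetical amplicons: name -> (ng/uL, length in bp)
    amplicons = {"sample_A": (16.5, 250), "sample_B": (33.0, 250)}
    for name, volume in pooling_volumes(amplicons, fmol_each=100).items():
        print(f"{name}: {volume:.2f} uL")
    print(f"reads per 1/16 region: {reads_per_region(400_000, 16, 0.40):.0f}")
```

At roughly 1,000 reads per sample, a region of that size holds only about 15 samples, which is why bar-coded pooling rather than physical separation is needed to reach hundreds of samples per run.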

6.
The high‐throughput capacities of the Illumina sequencing platforms and the possibility to label samples individually have encouraged wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for Illumina sequencing machines with exclusion amplification chemistry. This may make use of these platforms prohibitive, particularly in studies that rely on low‐quantity and low‐quality samples, such as historical and archaeological specimens. Here, we use barcodes, short sequences that are ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100‐year‐old museum‐preserved gorilla (Gorilla beringei) samples. Correcting for multiple sources of noise, we identify on average 0.470% of reads containing a hopped index. We show that the sample‐specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. This particularly affects ancient DNA samples, as these frequently differ in their DNA quantity and endogenous content. Through simulations we show that even low rates of index hopping, as reported here, can lead to biases in ancient DNA studies when multiplexing samples with vastly different quantities of endogenous material.
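
The dependence of misassignment on pool share can be illustrated with a small expected-value calculation. The Python sketch below is not the authors' simulation: it assumes every read hops with the same probability and that a hopped read lands in any other sample with equal probability, and the sample names and read counts are hypothetical. Only the 0.470% rate is taken from the abstract.

```python
# Illustrative sketch, not the authors' simulation: expected share of
# misassigned reads per sample under a uniform index-hopping model.
def misassigned_fraction(read_counts, hop_rate):
    """Return, per sample, the expected fraction of its assigned reads that hopped in.

    Assumptions: each read hops with probability hop_rate, and a hopped read
    is reassigned uniformly at random to one of the other samples.
    """
    k = len(read_counts)
    if k < 2:
        raise ValueError("index hopping needs at least two pooled samples")
    total = sum(read_counts.values())
    fractions = {}
    for name, n in read_counts.items():
        incoming = hop_rate * (total - n) / (k - 1)  # reads arriving from other samples
        kept = n * (1.0 - hop_rate)                  # own reads that did not hop
        fractions[name] = incoming / (kept + incoming)
    return fractions


if __name__ == "__main__":
    # Hypothetical pool: one high-yield and one low-yield library.
    pool = {"high_yield": 10_000_000, "low_yield": 100_000}
    for name, frac in misassigned_fraction(pool, hop_rate=0.0047).items():
        print(f"{name}: {frac:.4%} of assigned reads are expected to be misassigned")
```

When all samples contribute equally, the expected misassigned fraction equals the hop rate itself; the imbalance between samples is what inflates it for low-yield libraries.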

7.
8.
Formalin fixing with paraffin embedding (FFPE) has been a standard sample preparation method for decades, and archival FFPE samples are still very useful resources. Nonetheless, the use of FFPE samples in cancer genome analysis using next-generation sequencing, which is a powerful technique for the identification of genomic alterations at the nucleotide level, has been challenging due to poor DNA quality and artificial sequence alterations. In this study, we performed whole-exome sequencing of matched frozen samples and FFPE samples of tissues from 4 cancer patients and compared the next-generation sequencing data obtained from these samples. The major differences between data obtained from the 2 types of sample were the shorter insert size and artificial base alterations in the FFPE samples. A high proportion of short inserts in the FFPE samples resulted in overlapping paired reads, which could lead to overestimation of certain variants; >20% of the inserts in the FFPE samples were double sequenced. A large number of soft clipped reads was found in the sequencing data of the FFPE samples, and about 30% of total bases were soft clipped. The artificial base alterations, C>T and G>A, were observed in FFPE samples only, and the alteration rate ranged from 200 to 1,200 per 1M bases when sequencing errors were removed. Although high-confidence mutation calls in the FFPE samples were comparable to those in the frozen samples, caution should be exercised regarding the artifacts, especially for low-confidence calls. Despite the clearly observed artifacts, archival FFPE samples can be a good resource for discovery or validation of biomarkers in cancer research based on whole-exome sequencing.

9.
Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial consideration or experimental convenience, with limited understanding of the effects on the experimental results. Here we set out to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates, size, position and statistical significance of peak detection, and changes in gene annotation. We found that, for histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane per sample (~181 million reads). Furthermore, there are no variations introduced by indexing or lane batch effects and, importantly, there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well characterized antibody and, therefore, model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal.

10.
11.
MicroRNA (miRNA) expression profiling has proven useful in diagnosing and understanding the development and progression of several diseases. Microarray is the standard method for analyzing miRNA expression profiles; however, it has several disadvantages, including its limited detection of miRNAs. In recent years, advances in genome sequencing have led to the development of next-generation sequencing (NGS) technologies, which significantly advance genome sequencing speed and discovery. In this study, we compared the expression profiles obtained by NGS with the profiles created using microarray to assess if NGS could produce a more accurate and complete miRNA profile. Total RNA from 14 hepatocellular carcinoma tumors (HCC) and 6 matched non-tumor control tissues were sequenced with Illumina MiSeq 50-bp single-end reads. MicroRNA expression profiles were estimated using miRDeep2 software. As a comparison, miRNA expression profiles for 11 out of 14 HCCs were also established by microarray (Agilent human microRNA microarray). The average total sequencing exceeded 2.2 million reads per sample and of those reads, approximately 57% mapped to the human genome. The average correlation for miRNA expression between microarray and NGS and subtraction were 0.613 and 0.587, respectively, while miRNA expression between technical replicates was 0.976. The diagnostic accuracy of HCC, p-value, and AUC were 90.0%, 7.22×10⁻⁴, and 0.92, respectively. In summary, NGS created an miRNA expression profile that was reproducible and comparable to that produced by microarray. Moreover, NGS discovered novel miRNAs that were otherwise undetectable by microarray. We believe that miRNA expression profiling by NGS can be a useful diagnostic tool applicable to multiple fields of medicine.

12.
13.
14.
The retinal vascular endothelium is essential for angiogenesis and is involved in maintaining barrier selectivity and vascular tone. The aim of this study was to identify and quantify microRNAs and other small regulatory non-coding RNAs (ncRNAs) which may regulate these crucial functions. Primary bovine retinal microvascular endothelial cells (RMECs) provide a well-characterized in vitro system for studying angiogenesis. RNA extracted from RMECs was used to prepare a small RNA library for deep sequencing (Illumina Genome Analyzer). A total of 6.8 million reads were mapped to 250 known microRNAs in miRBase (release 16). In many cases, the most frequent isomiR differed from the sequence reported in miRBase. In addition, five novel microRNAs, 13 novel bovine orthologs of known human microRNAs and multiple new members of the miR-2284/2285 family were detected. Several ~30 nucleotide sno-miRNAs were identified, with the most highly expressed being derived from snoRNA U78. Highly expressed microRNAs previously associated with endothelial cells included miR-126 and miR-378, but the most highly expressed was miR-21, comprising more than one-third of all mapped reads. Inhibition of miR-21 with an LNA inhibitor significantly reduced proliferation, migration, and tube-forming capacity of RMECs. The independence from prior sequence knowledge provided by deep sequencing facilitates analysis of novel microRNAs and other small RNAs. This approach also enables quantitative evaluation of microRNA expression, which has highlighted the predominance of a small number of microRNAs in RMECs. Knockdown of miR-21 suggests a role for this microRNA in regulation of angiogenesis in the retinal microvasculature.

15.
16.
Using massively parallel sequencing on the Illumina platform, we identified eight microRNAs whose expression changed significantly, and in opposite directions, in cells of the hormone-sensitive LNCaP prostate cancer cell line and in cells of the hormone-resistant DU-145 cell line, relative to microRNA expression in normal prostate tissue cells. We found that the insulin-like growth factor 1 receptor (IGF1R) gene is a target of five microRNAs whose expression is increased in LNCaP cells and reduced in DU-145 cells.

17.
18.
19.
20.

Background

MicroRNAs are required for maintenance of pluripotency as well as differentiation, but since more microRNAs have been computationally predicted in the genome than have been found, there are likely to be undiscovered microRNAs expressed early in stem cell differentiation.

Methodology/Principal Findings

SOLiD ultra-deep sequencing identified >10⁷ unique small RNAs from human embryonic stem cells (hESC) and neural-restricted precursors that were fit to a model of microRNA biogenesis to computationally predict 818 new microRNA genes. These predicted genomic loci are associated with chromatin patterns of modified histones that are predictive of regulated gene expression. 146 of the predicted microRNAs were enriched in Ago2-containing complexes along with 609 known microRNAs, demonstrating association with a functional RISC complex. This Ago2 IP-selected subset was consistently expressed in four independent hESC lines and exhibited complex patterns of regulation over development similar to previously known microRNAs, including pluripotency-specific expression in both hESC and iPS cells. More than 30% of the Ago2 IP-enriched predicted microRNAs are new members of existing families since they share seed sequences with known microRNAs.
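
Assigning predicted microRNAs to existing families by shared seed sequence can be sketched in a few lines of Python. This is an illustration, not the authors' pipeline: it assumes the seed is nucleotides 2 to 8 counted from the 5' end (a common convention; some analyses use 2 to 7), and all names and sequences below are made-up placeholders rather than real miRBase entries.

```python
# Illustrative sketch, not the authors' pipeline: group predicted microRNAs
# with known ones that share the same seed sequence.
def seed(seq, start=2, end=8):
    """Return nucleotides start..end (1-based, inclusive) from the 5' end as RNA."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
    return rna[start - 1:end]


def assign_families(predicted, known):
    """Map each predicted microRNA to the known microRNAs sharing its seed."""
    seed_to_known = {}
    for name, seq in known.items():
        seed_to_known.setdefault(seed(seq), []).append(name)
    return {name: seed_to_known.get(seed(seq), []) for name, seq in predicted.items()}


if __name__ == "__main__":
    # Hypothetical sequences for demonstration only.
    known = {"known-1": "UGAGGUAGUAGGUUGUAUAGUU"}
    predicted = {
        "pred-1": "AGAGGUAGUCCCAAAGGGUUUC",  # same seed as known-1
        "pred-2": "UCCCAAAGGGUUUCCCAAAGGG",  # no seed match
    }
    for name, family in assign_families(predicted, known).items():
        print(name, "->", family or "no known family")
```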

Conclusions/Significance

Extending the classic definition of microRNAs, this large number of new microRNA genes, the majority of which are less conserved than their canonical counterparts, likely represents evolutionarily recent regulators of early differentiation. The enrichment in Ago2-containing complexes, the presence of chromatin marks indicative of regulated gene expression, and differential expression over development all support the identification of 146 new microRNAs active during early hESC differentiation.

