Similar literature
 20 similar articles found; search took 15 ms
1.
Together with computational approaches, NGS-led technologies have had a major impact on discoveries in miRNA biology, including the identification of novel miRNAs. However, to date all microRNA discovery tools depend on the availability of reference or genomic sequences. Here, a novel approach, miReader, is introduced for the first time that can discover novel miRNAs without any dependence on genomic/reference sequences. The approach uses NGS read data to build highly accurate miRNA models, trained with a Multi-boosting algorithm that uses Best-First Tree as its base classifier. It was comprehensively tested on a large amount of experimental data from a wide range of species, including human, plants, nematode, zebrafish and fruit fly, and performed consistently with >90% accuracy. Applying the same tool to Illumina read data for Miscanthus, a plant whose genome has not been sequenced, the study reported 21 novel mature miRNA duplex candidates. Because miRNA discovery requires handling high-throughput data, the entire approach has been implemented in a standalone parallel architecture. This work is expected to benefit miRNA discovery in the majority of species, for which the availability of a genomic sequence would no longer be a requirement.

2.
MicroRNA (miRNA) expression profiling has proven useful in diagnosing and understanding the development and progression of several diseases. Microarray is the standard method for analyzing miRNA expression profiles; however, it has several disadvantages, including its limited detection of miRNAs. In recent years, advances in genome sequencing have led to the development of next-generation sequencing (NGS) technologies, which significantly advance genome sequencing speed and discovery. In this study, we compared the expression profiles obtained by NGS with the profiles created using microarray to assess whether NGS could produce a more accurate and complete miRNA profile. Total RNA from 14 hepatocellular carcinoma (HCC) tumors and 6 matched non-tumor control tissues was sequenced with Illumina MiSeq 50-bp single-end reads. MicroRNA expression profiles were estimated using miRDeep2 software. As a comparison, miRNA expression profiles for 11 out of 14 HCCs were also established by microarray (Agilent human microRNA microarray). The average total sequencing exceeded 2.2 million reads per sample and, of those reads, approximately 57% mapped to the human genome. The average correlation for miRNA expression between microarray and NGS and subtraction were 0.613 and 0.587, respectively, while miRNA expression between technical replicates was 0.976. The diagnostic accuracy of HCC, p-value, and AUC were 90.0%, 7.22×10⁻⁴, and 0.92, respectively. In summary, NGS created an miRNA expression profile that was reproducible and comparable to that produced by microarray. Moreover, NGS discovered novel miRNAs that were otherwise undetectable by microarray. We believe that miRNA expression profiling by NGS can be a useful diagnostic tool applicable to multiple fields of medicine.
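The abstract above reports correlations between microarray and NGS expression values without naming the coefficient. As a minimal sketch, assuming a Pearson correlation over per-miRNA expression values, the comparison for one sample could be computed as follows. The function name `pearson` and the expression values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 expression values for the same five miRNAs in one sample.
microarray = [8.1, 5.4, 10.2, 3.3, 7.7]
ngs = [9.0, 4.8, 11.1, 2.9, 6.5]
r = pearson(microarray, ngs)
print(f"platform correlation: {r:.3f}")
```

Averaging such per-sample coefficients over all samples profiled on both platforms would give a single summary figure of the kind quoted in the abstract.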

3.
With the development of next-generation sequencing (NGS) techniques, many software tools have emerged for the discovery of novel microRNAs (miRNAs) and for analyzing miRNA expression profiles. An overall evaluation of these diverse software tools is lacking. In this study, we evaluated eight software tools based on their common features and key algorithms. Three deep-sequencing data sets were collected from different species and used to assess the computational time, sensitivity and accuracy of detecting known miRNAs as well as their capacity for predicting novel miRNAs. Our results provide useful information for researchers to facilitate their selection of the optimal software tools for miRNA analysis depending on their specific requirements, i.e. novel miRNA discovery or miRNA expression profile analysis of sequencing data sets.

4.
The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently, supervised algorithms have been used to address this problem, taking into account sequence, structure and comparative genomics information. In most of these studies miRNA gene predictions are rarely supported by experimental evidence and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) utilizing a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. Via the simultaneous integration of biological features such as sequence, structure and conservation, SSCprofiler achieves a performance of 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html.

5.
6.
BACKGROUND: Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).
RESULTS: We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new faster approach: target-region mapping with subsequent 'read-backmapping' to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
CONCLUSIONS: We recommend applying our general 'two-step' mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results.

7.
Parallel analysis of RNA ends (PARE) is a technique utilizing high-throughput sequencing to profile uncapped, mRNA cleavage or decay products on a genome-wide basis. Tools currently available to validate miRNA targets using PARE data employ only annotated genes, whereas important targets may be found in unannotated genomic regions. To handle such cases and to scale to the growing availability of PARE data and genomes, we developed a new tool, ‘sPARTA’ (small RNA-PARE target analyzer) that utilizes a built-in, plant-focused target prediction module (aka ‘miRferno’). sPARTA not only exhibits an unprecedented gain in speed but also shows greater predictive power by validating more targets, compared to a popular alternative. In addition, the novel ‘seed-free’ mode, optimized to find targets irrespective of complementarity in the seed-region, identifies novel intergenic targets. To fully capitalize on the novelty and strengths of sPARTA, we developed a web resource, ‘comPARE’, for plant miRNA target analysis; this facilitates the systematic identification and analysis of miRNA-target interactions across multiple species, integrated with visualization tools. This collation of high-throughput small RNA and PARE datasets from different genomes further facilitates re-evaluation of existing miRNA annotations, resulting in a ‘cleaner’ set of microRNAs.

8.
MicroRNA profiling represents an important first step in deducing individual RNA-based regulatory function in a cell, tissue, or at a specific developmental stage. Currently there are several different platforms to choose from in order to make the initial miRNA profiles. In this study we investigate recently developed digital microRNA high-throughput technologies. Four different platforms were compared including next generation SOLiD ligation sequencing and Illumina HiSeq sequencing, hybridization-based NanoString nCounter, and miRCURY locked nucleic acid RT-qPCR. For all four technologies, full microRNA profiles were generated from human cell lines that represent noninvasive and invasive tumorigenic breast cancer. This study reports the correlation between platforms, as well as a more extensive analysis of the accuracy and sensitivity of data generated when using different platforms and important considerations when verifying results by the use of additional technologies. We found all the platforms to be highly capable for microRNA analysis. Furthermore, the two NGS platforms and RT-qPCR all have equally high sensitivity, and the fold change accuracy is independent of individual miRNA concentration for NGS and RT-qPCR. Based on these findings we propose new guidelines and considerations when performing microRNA profiling.

9.
10.
As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
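The minimum error correction (MEC) value mentioned in the abstract above can be illustrated with a small example. This is only a sketch of the usual definition (each read is assigned to the haplotype it disagrees with least, and those disagreements are summed); it is not HapTree's code, and the function name `mec_score`, the read encoding, and the toy triploid data are assumptions made for illustration.

```python
def mec_score(reads, haplotypes):
    """Sum, over reads, of the fewest mismatches against any haplotype.

    reads: list of dicts mapping SNP index -> observed allele (0 or 1).
    haplotypes: list of allele lists, one per chromosome copy.
    """
    total = 0
    for read in reads:
        # Count mismatches against each haplotype, restricted to the
        # SNP sites this read actually covers, and keep the best fit.
        total += min(
            sum(1 for site, allele in read.items() if hap[site] != allele)
            for hap in haplotypes
        )
    return total


# Hypothetical triploid assembly over four SNP sites.
haplotypes = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
reads = [
    {0: 0, 1: 1, 2: 0},        # matches the first haplotype exactly
    {1: 0, 2: 1, 3: 0},        # matches the second haplotype exactly
    {0: 0, 1: 0, 2: 1, 3: 0},  # one mismatch against its best haplotype
]
print(mec_score(reads, haplotypes))  # prints 1
```

A lower MEC means the assembled haplotypes explain the reads with fewer corrections, which is why it is commonly reported alongside switch accuracy.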

11.
Yang  Yang  Xu  Zhuangdi  Song  Dandan 《BMC bioinformatics》2016,17(1):109-116
Missing values are commonly present in microarray data profiles. Instead of discarding genes or samples with incomplete expression levels, missing values need to be properly imputed for accurate data analysis. The imputation methods can be roughly categorized as expression level-based and domain knowledge-based. The first type of method relies only on expression data without the help of external data sources, while the second type incorporates available domain knowledge into expression data to improve imputation accuracy. In recent years, microRNA (miRNA) microarrays have been widely developed and used for identifying miRNA biomarkers in complex human disease studies. Similar to mRNA profiles, miRNA expression profiles with missing values can be treated with the existing imputation methods. However, the domain knowledge-based methods are hard to apply due to the lack of direct functional annotation for miRNAs. With the rapid accumulation of miRNA microarray data, there is a growing need to develop domain knowledge-based imputation algorithms specific to miRNA expression profiles to improve the quality of miRNA data analysis. We connect miRNAs with domain knowledge of Gene Ontology (GO) via their target genes, and define miRNA functional similarity based on the semantic similarity of GO terms in GO graphs. A new measure combining miRNA functional similarity and expression similarity is used in the imputation of missing values. The new measure is tested on two miRNA microarray datasets from breast cancer research and achieves improved performance compared with the expression-based method on both datasets. The experimental results demonstrate that biological domain knowledge can benefit the estimation of missing values in miRNA profiles as well as mRNA profiles. In particular, functional similarity defined by GO terms annotated for the target genes of miRNAs can be useful complementary information for the expression-based method to improve the imputation accuracy of miRNA array data. Our method and data are available to the public upon request.
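The abstract above does not give the formula for its combined measure, so the following is only a sketch under stated assumptions: the two similarities are blended linearly with a hypothetical weight `alpha`, expression similarity is taken as an inverse root-mean-square distance, and the missing value is filled by a similarity-weighted average over the `k` most similar miRNAs. All names (`expression_similarity`, `impute`, `miR-a`, and so on) and all numbers are invented for illustration.

```python
import math


def expression_similarity(a, b):
    """Inverse-distance similarity over samples observed in both profiles."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    rms = math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))
    return 1.0 / (1.0 + rms)


def impute(profiles, functional_sim, target, sample, alpha=0.5, k=2):
    """Estimate profiles[target][sample] from the k most similar miRNAs."""
    scored = []
    for name, values in profiles.items():
        if name == target or values[sample] is None:
            continue  # a neighbour must have the value we are missing
        # Assumed blend: alpha weights the GO-derived functional similarity.
        func = functional_sim.get(frozenset((target, name)), 0.0)
        expr = expression_similarity(profiles[target], values)
        scored.append((alpha * func + (1 - alpha) * expr, values[sample]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        raise ValueError("no informative neighbours for imputation")
    return sum(sim * value for sim, value in top) / weight


# Hypothetical profiles over three samples; None marks a missing value.
profiles = {
    "miR-a": [1.0, None, 3.0],
    "miR-b": [1.0, 2.0, 3.0],
    "miR-c": [5.0, 9.0, 7.0],
}
# Hypothetical functional similarities, e.g. derived from GO term overlap.
functional_sim = {
    frozenset(("miR-a", "miR-b")): 0.9,
    frozenset(("miR-a", "miR-c")): 0.1,
}
print(impute(profiles, functional_sim, "miR-a", sample=1, k=1))  # prints 2.0
```

Setting `alpha=0` reduces this sketch to a purely expression-based neighbour method, which is the baseline the abstract compares against.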

12.
Meng F  Hackenberg M  Li Z  Yan J  Chen T 《PloS one》2012,7(3):e34394
MicroRNAs (miRNAs) are small non-coding RNAs that regulate a variety of biological processes. The latest version of the miRBase database (Release 18) includes 1,157 mouse and 680 rat mature miRNAs. Only one new rat mature miRNA was added to the rat miRNA database from version 16 to version 18 of miRBase, suggesting that many rat miRNAs remain to be discovered. Given the importance of rat as a model organism, discovery of the complete set of rat miRNAs is necessary for understanding rat miRNA regulation. In this study, next generation sequencing (NGS), microarray analysis and bioinformatics technologies were applied to discover novel miRNAs in rat kidneys. MiRanalyzer was utilized to analyze the sequences of the small RNAs generated from NGS analysis of rat kidney samples. Hundreds of novel miRNA candidates were examined according to the mappings of their reads to the rat genome, presence of sequences that can form a miRNA hairpin structure around the mapped locations, Dicer cleavage patterns, and the levels of their expression determined by both NGS and microarray analyses. Nine novel rat hairpin precursor miRNAs (pre-miRNA) were discovered with high confidence. Five of the novel pre-miRNAs are also reported in other species while four of them are rat specific. In summary, 9 novel pre-miRNAs (14 novel mature miRNAs) were identified via a combination of NGS, microarray and bioinformatics high-throughput technologies.

13.
Terai G  Komori T  Asai K  Kin T 《RNA (New York, N.Y.)》2007,13(12):2081-2090
The identification of novel miRNAs has significant biological and clinical importance. However, none of the known miRNA features alone is sufficient for accurately detecting novel miRNAs. The aim of this paper is to integrate these features in a straightforward manner for detecting miRNAs with better accuracy. Since most miRNA regions are highly conserved among vertebrates for the ability to form stable hairpin structures, we implemented a hidden Markov model that outputs multidimensional feature vectors composed of both evolutionary features and secondary structural ones. The proposed method, called miRRim, outperformed existing ones in terms of detection/prediction performance: the total number of predictions was smaller than with existing methods when the number of miRNAs detected was adjusted to be the same. Moreover, there were several candidates predicted only by our method that are clustered with the known miRNAs, suggesting that our method is able to detect novel miRNAs. Genomic coordinates of predicted miRNAs can be obtained from http://mirrim.ncrna.org/.

14.
Mapping small reads to a reference genome is an essential and common approach to identifying microRNAs (miRNAs) in an organism. Using the genomes of closely related species as proxy references can facilitate miRNA expression studies in non-model species whose genomes are not available. However, the level of error this introduces is mostly unknown, as it results from the evolutionary distance between the proxy reference and the species of interest. To evaluate the accuracy of miRNA discovery pipelines in non-model organisms, small RNA library data from a mosquito, Aedes aegypti, were mapped to three well annotated insect genomes as proxy references using miRanalyzer with two mapping criteria, strict and loose. In addition, another web-based miRNA discovery pipeline (DSAP) was used as a control for program performance. Using miRanalyzer, a reduction of more than 80% was observed in the number of mapped reads under the strict criterion when proxy genome references were used; however, only a 20% reduction was recorded for reads mapped to known mature miRNA datasets of other species. Except for a few changes in ranking, mapping criteria did not make any significant difference in the profile of the most abundant miRNAs in A. aegypti when its original or a proxy genome was used as reference. However, more variation was observed in the miRNA ranking profile when DSAP was used as the analysis tool. Overall, the results also suggested that using a proxy reference did not change the differential expression profiles of the most abundant miRNAs when infected and non-infected libraries were compared. However, usage of a proxy reference could provide about 67% of the original outcome from more extremely up- or down-regulated miRNA profiles. Although using the genome of a closely related species incurred some losses in the number of miRNAs, the most abundant miRNAs along with their differential expression profile would be acceptable, depending on the sensitivity required by each project.

15.
《Genomics》2020,112(5):3201-3206
Identification of microRNAs from plants is a crucial step for understanding the mechanisms of pathways and regulation of genes. A number of tools have been developed for the detection of microRNAs from small RNA-seq data. However, there is no pipeline for detecting miRNAs from EST datasets, even though a huge resource is publicly available and the method is known. Here we present miRDetect, a Python implementation to detect novel miRNA precursors from plant EST data using a combined homology and machine learning approach. 10-fold cross validation was applied to choose the best classifier based on ROC, accuracy, MCC and F1-scores using 112 features. miRDetect achieved a classification accuracy of 93.35% with a Random Forest classifier and outperformed other precursor detection tools. The miRDetect pipeline aids in identifying novel plant precursors using a mixed approach and will be helpful to researchers with limited informatics background.
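Three of the selection criteria named in the abstract above (accuracy, MCC and F1-score) follow directly from a confusion matrix. The sketch below uses the standard textbook formulas; it is not taken from miRDetect, and the function name `classification_metrics` and the counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, F1-score and Matthews correlation coefficient."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    # MCC uses all four cells, so it stays informative on unbalanced sets.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return accuracy, f1, mcc


# Hypothetical counts from one cross-validation fold.
accuracy, f1, mcc = classification_metrics(tp=90, tn=85, fp=15, fn=10)
print(f"accuracy={accuracy:.3f} f1={f1:.3f} mcc={mcc:.3f}")
```

In a 10-fold setting these values would be computed per fold and averaged before candidate classifiers are compared.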

16.
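The emergence of next-generation sequencing (NGS) technologies has advanced genomics research and has played a particularly important role in studies of disease-associated genetic variation. Although most types of genetic variation can be detected with various NGS analysis tools, limitations remain, for example for length variation of short tandem repeats (STRs). Many genetic diseases are caused by STR length expansions, especially neurological disorders such as Huntington's disease. However, almost no current tool can use NGS to detect STR variants that are longer than the sequencing read length. To overcome this limitation, we developed a new method that identifies STR length variation from paired-end NGS data and can estimate the expansion length. Applied to a clinical study of motor neuron disease based on whole-exome sequencing, it successfully identified pathogenic STR length expansions. The method is the first to use read coverage depth features to address STR variant detection; it has broad application value in human genetic disease research and may inspire the development of other NGS analysis methods.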

17.
We developed a novel method for identifying SNPs widely distributed throughout the coding and non-coding regions of a genome. The method uses large-scale parallel pyrosequencing technology in combination with bioinformatics tools. We used this method to generate approximately 23,000 candidate SNPs throughout the Macaca mulatta genome. We estimate that over 60% of the SNPs will be of high frequency and useful for mapping QTLs, genetic management, and studies of individual relatedness, whereas other less frequent SNPs may be useful as population specific markers for ancestry identification. We have created a web resource called MamuSNP to view the SNPs and associated information online. This resource will also be useful for researchers using a wide variety of Macaca species in their research.

18.
19.
20.
Mining long noncoding RNA in livestock (cited 2 times in total: 0 self-citations, 2 citations by others)
