1.

Adjacent Nucleotide Dependence in ncRNA and Order-1 SCFG for ncRNA Identification

Thomas K. F. Wong Tak-Wah Lam Wing-Kin Sung Siu-Ming Yiu 《PloS one》2010,5(9)

Background

Non-coding RNAs (ncRNAs) are known to be involved in many critical biological processes, and identifying ncRNAs is an important task in biological research. Infernal, a popular software package, is the most successful prediction tool and exhibits high sensitivity. Its application has mainly focused on small suspected regions. We applied Infernal at the chromosome level; the results have high sensitivity, yet contain many false positives. Further enhancing Infernal for chromosome-level or genome-wide studies is desirable.

Methodology

Based on the conjecture that adjacent nucleotide dependence affects the stability of the secondary structure of an ncRNA, we first conduct a systematic study on human ncRNAs and find that adjacent nucleotide dependence in human ncRNA should be useful for identifying ncRNAs. We then incorporate this dependence in the SCFG model and develop a new order-1 SCFG model for identifying ncRNAs.
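As a rough illustration of what "adjacent nucleotide dependence" can mean in practice, the sketch below counts adjacent nucleotide pairs, measures their dependence with mutual information, and derives order-1 emission probabilities P(next | previous). The abstract does not say which statistic the authors used, so the choice of mutual information and all function names here are assumptions, not the paper's method.

```python
import math
from collections import Counter

def count_adjacent_pairs(seqs, alphabet="ACGU"):
    """Count (previous, next) nucleotide pairs over all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for a, b in zip(s, s[1:]):
            if a in alphabet and b in alphabet:
                counts[(a, b)] += 1
    return counts

def adjacent_mutual_information(pair_counts):
    """Mutual information (bits) between a nucleotide and its right neighbour.

    Hypothetical dependence measure: 0 means adjacent positions look independent.
    """
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    left, right = Counter(), Counter()
    for (a, b), n in pair_counts.items():
        left[a] += n
        right[b] += n
    mi = 0.0
    for (a, b), n in pair_counts.items():
        p_ab = n / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi

def order1_emissions(pair_counts):
    """Order-1 emission estimates P(next | previous) from the pair counts."""
    totals = Counter()
    for (a, _), n in pair_counts.items():
        totals[a] += n
    return {(a, b): n / totals[a] for (a, b), n in pair_counts.items()}
```

An order-0 model would emit each nucleotide independently of its neighbour; the order-1 table conditions each emission on the preceding nucleotide, which is the kind of dependence the abstract proposes to exploit.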

Conclusions

In our experiments on human chromosomes, the proposed model eliminates more than 50% of the false positives reported by Infernal while maintaining the same sensitivity. The executable and the source code of the programs are freely available at http://i.cs.hku.hk/~kfwong/order1scfg.

2.

Structured RNAs and synteny regions in the pig genome

Christian Anthon Hakim Tafer Jakob H Havgaard Bo Thomsen Jakob Hedegaard Stefan E Seemann Sachin Pundhir Stephanie Kehr Sebastian Bartschat Mathilde Nielsen Rasmus O Nielsen Merete Fredholm Peter F Stadler Jan Gorodkin 《BMC genomics》2014,15(1)

Background

Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals.

Results

We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class-specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology-based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage-specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).

Conclusions

We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.

3.

Meiotic recombination hotspots of fission yeast are directed to loci that express non-coding RNA

Wahls WP Siegel ER Davidson MK 《PloS one》2008,3(8):e2887

4.

Identifying Cis-Regulatory Sequences by Word Profile Similarity

Garmay Leung Michael B. Eisen 《PloS one》2009,4(9)

Background

Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples.

Methodology/Principal Findings

We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila.
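To make the idea of compositional (word profile) similarity concrete, the following sketch builds k-mer count profiles and scores genomic windows against a known regulatory sequence. The abstract does not specify the similarity measure, so cosine similarity, the default word length `k=3`, and the function names are all assumptions for illustration; this is not the WPH-finder implementation.

```python
import math
from collections import Counter

def word_profile(seq, k=3):
    """Count every overlapping word (k-mer) in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(p, q):
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def scan(genome, known, k=3, window=8, step=8):
    """Score each genomic window against the profile of a known regulatory sequence."""
    ref = word_profile(known, k)
    return [
        (start, cosine_similarity(word_profile(genome[start:start + window], k), ref))
        for start in range(0, len(genome) - window + 1, step)
    ]
```

For example, `scan("TTTTTTTTACGTACGT", "ACGTACGT")` scores the windows starting at 0 and 8; the second window shares all of its words with the known sequence and scores close to 1, while the first shares none and scores 0. No motif is defined in advance, which mirrors the abstract's point that the search is not limited to predefined motifs.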

Conclusions/Significance

Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz.

5.

Small non-coding RNA profiling and the role of piRNA pathway genes in the protection of chicken primordial germ cells

Deivendran Rengaraj Sang In Lee Tae Sub Park Hong Jo Lee Young Min Kim Yoon Ah Sohn Myunghee Jung Seung-Jae Noh Hojin Jung Jae Yong Han 《BMC genomics》2014,15(1)

6.

Identification of Intermediate-Size Non-Coding RNAs Involved in the UV-Induced DNA Damage Response in C. elegans

Aqian Li Guifeng Wei Yunfei Wang Ying Zhou Xian-en Zhang Lijun Bi Runsheng Chen 《PloS one》2012,7(11)

7.

SHEAR: sample heterogeneity estimation and assembly by reference

Sean R Landman Tae Hyun Hwang Kevin AT Silverstein Yingming Li Scott M Dehm Michael Steinbach Vipin Kumar 《BMC genomics》2014,15(1)

Background

Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often have a heterogeneous mix of various subpopulations with different variants, making assembly extremely difficult using existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results

By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the lead competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases, and demonstrates an improved efficiency and better ability to handle tandem duplications.

Conclusion

SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-84) contains supplementary material, which is available to authorized users.

8.

Identification of ncRNAs as potential therapeutic targets in multiple sclerosis through differential ncRNA – mRNA network analysis

Haritz Irizar Maider Muñoz-Culla Matías Sáenz-Cuesta Iñaki Osorio-Querejeta Lucía Sepúlveda Tamara Castillo-Triviño Alvaro Prada Adolfo Lopez de Munain Javier Olascoaga David Otaegui 《BMC genomics》2015,16(1)

Background

Several studies have revealed a potential role for both small nucleolar RNAs (snoRNAs) and microRNAs (miRNAs) in the physiopathology of relapsing-remitting multiple sclerosis (RRMS). This potential implication has been mainly described through differential expression studies. However, it has been suggested that, in order to extract additional information from large-scale expression experiments, differential expression studies must be complemented with differential network studies. Thus, the present work is aimed at the identification of potential therapeutic ncRNA targets for RRMS through differential network analysis of ncRNA – mRNA coexpression networks. ncRNA – mRNA coexpression networks have been constructed from both selected ncRNA (specifically miRNAs, snoRNAs and sdRNAs) and mRNA large-scale expression data obtained from 22 patients in relapse, the same 22 patients in remission and 22 healthy controls. Condition-specific (relapse, remission and healthy) networks have been built and compared to identify the parts of the system most affected by perturbation and aid the identification of potential therapeutic targets among the ncRNAs.
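The comparison of condition-specific networks described above can be sketched as follows. This is a minimal illustration, assuming that an ncRNA – mRNA link is drawn when the absolute Pearson correlation of two expression profiles reaches a cutoff; the cutoff value `0.9`, the function names, and the edge definition are hypothetical and are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)

def coexpression_edges(ncrna, mrna, threshold=0.9):
    """Link each ncRNA to each mRNA whose |correlation| meets the assumed cutoff."""
    return {
        (n, m)
        for n, n_values in ncrna.items()
        for m, m_values in mrna.items()
        if abs(pearson(n_values, m_values)) >= threshold
    }

def condition_specific(edges_by_condition):
    """Keep, per condition, only the links that appear in no other condition."""
    specific = {}
    for cond, edges in edges_by_condition.items():
        others = set().union(
            *(e for c, e in edges_by_condition.items() if c != cond)
        )
        specific[cond] = edges - others
    return specific
```

Calling `coexpression_edges` once per condition (relapse, remission, healthy) and passing the three edge sets to `condition_specific` yields the links unique to each network, which is one simple way to look for the rewiring between conditions that the study reports.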

Results

All the coexpression networks we built present a scale-free topology and many snoRNAs are shown to have a prominent role in their architecture. The differential network analysis (relapse vs. remission vs. controls’ networks) has revealed that, although both network topology and the majority of the genes are maintained, few ncRNA – mRNA links appear in more than one network. We have selected as potential therapeutic targets the ncRNAs that appear in the disease-specific network and were found to be differentially expressed in a previous study.

Conclusions

Our results suggest that the diseased state of RRMS has a strong impact on the ncRNA – mRNA network of peripheral blood leukocytes, as a massive rewiring of the network happens between conditions. Our findings also indicate that the role snoRNAs have in targeted gene silencing is a widespread phenomenon. Finally, among the potential therapeutic target ncRNAs, SNORA40 seems to be the most promising candidate.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1396-5) contains supplementary material, which is available to authorized users.

9.

CAR: contig assembly of prokaryotic draft genomes using rearrangements

Chin Lung Lu Kun-Tze Chen Shih-Yuan Huang Hsien-Tai Chiu 《BMC bioinformatics》2014,15(1)

Background

Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed.

Results

In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR outputs a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we tested CAR on a real dataset composed of several prokaryotic genomes and compared its performance with several other reference-based contig assembly tools. Our experimental results show that CAR performs better than all of these tools in terms of sensitivity, precision and genome coverage.

Conclusions

CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0381-3) contains supplementary material, which is available to authorized users.

10.

Searching for non-coding RNAs in genomic sequences using ncRNAscout

Bao M Cervantes Cervantes M Zhong L Wang JT 《Genomics, Proteomics & Bioinformatics》2012,10(2):114-121

11.

Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays

S Boerner KM McGinnis 《PloS one》2012,7(8):e43047

12.

PMS: A Panoptic Motif Search Tool

Hieu Dinh Sanguthevar Rajasekaran 《PloS one》2013,8(12)

Background

Identification of DNA/Protein motifs is a crucial problem for biologists. Computational techniques could be of great help in this identification. In this direction, many computational models for motifs have been proposed in the literature.

Methods

One such important model is the motif model. In this paper we describe a motif search web tool that predominantly employs this motif model. This web tool exploits state-of-the-art algorithms for solving the motif search problem.

Results

The online tool has been helping scientists identify many unknown motifs. Many of our predictions have been successfully verified as well. We hope that this paper will expose this crucial tool to many more scientists.

Availability and requirements

Project name: PMS - Panoptic Motif Search Tool. Project home page: http://pms.engr.uconn.edu or http://motifsearch.com. Licence: PMS tools will be readily available to any scientist wishing to use them for non-commercial purposes, without restrictions. The online tool is freely available without login.

13.

MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence

Turki Turki Usman Roshan 《BMC genomics》2014,15(1)

Background

Programs based on hash tables and the Burrows-Wheeler transform are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm, but it can take hours or days to map millions of reads even for bacterial genomes.

Results

We introduce a GPU program called MaxSSmap with the aim of achieving accuracy comparable to Smith-Waterman but with faster runtimes. Like most programs, MaxSSmap first identifies a local region of the genome and then performs exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part, MaxSSmap calculates the maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap’s accuracy and runtime when mapping simulated Illumina E. coli and human chromosome one reads of different lengths and 10% to 30% mismatches with gaps to the E. coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and unmapped paired reads from NA12878 in 1000 genomes.
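The fragment-selection step can be illustrated with a small CPU-only sketch. It assumes a simple +1/-1 match/mismatch scoring, slides the read along each disjoint fragment, and uses Kadane's algorithm to find the maximum scoring contiguous run; the scoring values, function names, and the serial loop are assumptions for illustration, whereas the actual program evaluates fragments in parallel on a GPU. Reads that span a fragment boundary are not handled here.

```python
def max_scoring_subsequence(scores):
    """Kadane's algorithm: best sum over any contiguous run (empty run scores 0)."""
    best = current = 0
    for s in scores:
        current = max(0, current + s)
        best = max(best, current)
    return best

def fragment_score(read, fragment, match=1, mismatch=-1):
    """Best maximum-scoring-subsequence score of the read over all offsets in a fragment."""
    best = 0
    for offset in range(len(fragment) - len(read) + 1):
        window = fragment[offset:offset + len(read)]
        scores = [match if r == g else mismatch for r, g in zip(read, window)]
        best = max(best, max_scoring_subsequence(scores))
    return best

def best_fragment(read, genome, fragment_len):
    """Return (start, score) of the highest scoring disjoint fragment.

    Serial stand-in for the parallel GPU step; the selected fragment would
    then be passed to an exact aligner such as Smith-Waterman.
    """
    candidates = [
        (start, fragment_score(read, genome[start:start + fragment_len]))
        for start in range(0, len(genome), fragment_len)
    ]
    return max(candidates, key=lambda t: t[1])
```

Because a run of mismatches only resets the running score instead of invalidating the whole comparison, a read with scattered mismatches can still give its true fragment a high score, which is the property that makes this kind of filter tolerant of divergent reads.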

Conclusions

We show that MaxSSmap attains high accuracy and low error comparable to fast Smith-Waterman programs, yet has much lower runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error much faster than if Smith-Waterman were used. On short read lengths of 36 and 51, both MaxSSmap and Smith-Waterman have lower accuracy than at longer read lengths. On real data, MaxSSmap produces many alignments with high score and mapping quality that are not given by NextGenMap and BWA. The MaxSSmap source code in CUDA and OpenCL is freely available from http://www.cs.njit.edu/usman/MaxSSmap.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-969) contains supplementary material, which is available to authorized users.

14.

SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences

Federico Agostini Davide Cirillo Riccardo Delli Ponti Gian Gaetano Tartaglia 《BMC genomics》2014,15(1)

15.

Comparative genomic analysis of clinical and environmental strains provides insight into the pathogenicity and evolution of Vibrio parahaemolyticus

Lei Li Hin-chung Wong Wenyan Nong Man Kit Cheung Patrick Tik Wan Law Kai Man Kam Hoi Shan Kwan 《BMC genomics》2014,15(1)

Background

Vibrio parahaemolyticus is a Gram-negative halophilic bacterium. Infections with the bacterium could become systemic and can be life-threatening to immunocompromised individuals. Genome sequences of a few clinical isolates of V. parahaemolyticus are currently available, but the genome dynamics across the species and virulence potential of environmental strains on a genome-scale have not been described before.

Results

Here we present genome sequences of four V. parahaemolyticus clinical strains from stool samples of patients and five environmental strains in Hong Kong. Phylogenomics analysis based on single nucleotide polymorphisms revealed a clear distinction between the clinical and environmental isolates. A new gene cluster belonging to the biofilm associated proteins of V. parahaemolyticus was found in clinical strains. In addition, a novel small genomic island frequently found among clinical isolates was reported. A few environmental strains were found harboring virulence genes and prophage elements, indicating their virulence potential. A unique biphenyl degradation pathway was also reported. A database for V. parahaemolyticus (http://kwanlab.bio.cuhk.edu.hk/vp) was constructed here as a platform to access and analyze genome sequences and annotations of the bacterium.

Conclusions

We have performed a comparative genomics analysis of clinical and environmental strains of V. parahaemolyticus. Our analyses could facilitate understanding of the phylogenetic diversity and niche adaptation of this bacterium.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1135) contains supplementary material, which is available to authorized users.

16.

Simulated unbound structures for benchmarking of protein docking in the Dockground resource

Tatsiana Kirys Anatoly M. Ruvinsky Deepak Singla Alexander V. Tuzikov Petras J. Kundrotas Ilya A. Vakser 《BMC bioinformatics》2015,16(1)

17.

Differential motif enrichment analysis of paired ChIP-seq experiments

Tom Lesluyes James Johnson Philip Machanick Timothy L Bailey 《BMC genomics》2014,15(1)

18.

CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Rachid Ounit Steve Wanamaker Timothy J Close Stefano Lonardi 《BMC genomics》2015,16(1)

19.

YOC, A new strategy for pairwise alignment of collinear genomes

Raluca Uricaru Célia Michotey Hélène Chiapello Eric Rivals 《BMC bioinformatics》2015,16(1)

Background

Comparing and aligning genomes is a key step in analyzing closely related genomes. Despite the development of many genome aligners in the last 15 years, the problem is not yet fully resolved, even when aligning closely related bacterial genomes of the same species. In addition, no procedures are available to assess the quality of genome alignments or to compare genome aligners.

Results

We designed an original method for pairwise genome alignment, named YOC, which employs a highly sensitive similarity detection method together with a recent collinear chaining strategy that allows overlaps. YOC improves the reliability of collinear genome alignments, while preserving or even improving sensitivity. We also propose an original qualitative evaluation criterion for measuring the relevance of genome alignments. We used this criterion to compare and benchmark YOC with five recent genome aligners on large bacterial genome datasets, and showed it is suitable for identifying the specificities and the potential flaws of their underlying strategies.

Conclusions

The YOC prototype is available at https://github.com/ruricaru/YOC. It has several advantages over existing genome aligners: (1) it is based on a simplified two phase alignment strategy, (2) it is easy to parameterize, (3) it produces reliable genome alignments, which are easier to analyze and to use.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0530-3) contains supplementary material, which is available to authorized users.

20.

Effective Automated Feature Construction and Selection for Classification of Biological Sequences

Uday Kamath Kenneth De Jong Amarda Shehu 《PloS one》2014,9(7)

Background

Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.

Methodology

We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.

Results

To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.