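期刊界 All Journals: searching journals worldwide and disseminating academic results; professional journal search, journal information, academic search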

1.

MCALIGN2: Faster, accurate global pairwise alignment of non-coding DNA sequences based on explicit models of indel evolution

Jun Wang, Peter D Keightley, Toby Johnson 《BMC bioinformatics》2006,7(1):292-15

Background

Non-coding DNA sequences comprise a very large proportion of the total genomic content of mammals, most other vertebrates, many invertebrates, and most plants. Unraveling the functional significance of non-coding DNA depends on how well we are able to align non-coding DNA sequences. However, the alignment of non-coding DNA sequences is more difficult than aligning protein-coding sequences.
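
MCALIGN2 scores alignments with explicit models of indel evolution, which this sketch does not attempt. As a baseline for what "global pairwise alignment" means, here is a minimal Needleman-Wunsch sketch, assuming simple match/mismatch scores and a linear gap penalty. The function name and default scores are illustrative choices, not taken from MCALIGN2.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (illustrative scores)."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back one optimal path from the bottom-right cell
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

A linear gap penalty treats every gap position independently; models of indel evolution such as MCALIGN2's exist precisely because real indel lengths are not well described that way.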

2.

RIPCAL: a tool for alignment-based analysis of repeat-induced point mutations in fungal genomic sequences

James K Hane, Richard P Oliver 《BMC bioinformatics》2008,9(1):478

Background

Repeat-induced point mutation (RIP) is a fungal-specific genome defence mechanism that alters the sequences of repetitive DNA, thereby inactivating coding genes. Repeated DNA sequences align between mating and meiosis and both sequences undergo C:G to T:A transitions. In most fungi these transitions preferentially affect CpA di-nucleotides thus altering the frequency of certain di-nucleotides in the affected sequences. The majority of previously published in silico analyses were limited to the comparison of ratios of pre- and post-RIP di-nucleotides in putatively RIP-affected sequences – so-called RIP indices. The analysis of RIP is significantly more informative when comparing sequence alignments of repeated sequences. There is, however, a dearth of bioinformatics tools available to the fungal research community for alignment-based RIP analysis of repeat families.
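
The two kinds of analysis contrasted above can be sketched side by side. The ratios below, TpA/ApT and (CpA+TpG)/(ApC+GpT), are dinucleotide RIP indices widely used in the literature; treat that choice as an assumption rather than RIPCAL's documented output. The alignment-based count is a simplified, hypothetical stand-in that looks for CpA-to-TpA changes (and the reverse-strand TpG-to-TpA equivalent) between a reference repeat and an aligned copy.

```python
def dinucleotide_counts(seq: str) -> dict[str, int]:
    """Count overlapping dinucleotides in an ungapped DNA sequence."""
    seq = seq.upper()
    counts: dict[str, int] = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def rip_indices(seq: str) -> tuple[float, float]:
    """Return (TpA/ApT, (CpA+TpG)/(ApC+GpT)); NaN if a denominator is zero."""
    c = dinucleotide_counts(seq)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    product_index = ratio(c.get("TA", 0), c.get("AT", 0))
    substrate_index = ratio(c.get("CA", 0) + c.get("TG", 0),
                            c.get("AC", 0) + c.get("GT", 0))
    return product_index, substrate_index


def count_cpa_mutations(ref: str, query: str) -> int:
    """Count CpA->TpA changes (and TpG->TpA, i.e. CpA on the other strand) between
    two aligned rows of equal length. Simplification: adjacent alignment columns
    are treated as adjacent bases, even across gaps."""
    if len(ref) != len(query):
        raise ValueError("aligned rows must have equal length")
    ref, query = ref.upper(), query.upper()
    hits = 0
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CA" and query[i] == "T":
            hits += 1  # C->T in a CpA context
        elif ref[i:i + 2] == "TG" and query[i + 1] == "A":
            hits += 1  # G->A in a TpG context
    return hits
```

The index functions see only one sequence at a time, whereas the alignment-based count attributes each change to a specific position, which is the extra information the abstract argues for.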

3.

Multiple sequence alignments of partially coding nucleic acid sequences

Roman R Stocsits, Ivo L Hofacker, Claudia Fried, Peter F Stadler 《BMC bioinformatics》2005,6(1):160

Background

High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Because of the redundancy of the genetic code, however, nucleic acid sequences are much more heterogeneous than the protein sequences they encode. It is therefore desirable to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only part of the sequence of interest is translated. In addition, overlapping reading frames may encode multiple alternative proteins, possibly interspersed with non-coding segments. RNA virus genomes are a prominent example.
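
Using the amino acid sequence starts with translation, and overlapping reading frames mean that more than one frame can matter. A minimal sketch, assuming the standard genetic code (NCBI table 1), forward-strand frames only, and "X" for codons containing non-ACGT letters:

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated with each position in TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    return "".join(CODON_TABLE.get(dna[pos:pos + 3], "X")
                   for pos in range(frame, len(dna) - 2, 3))


def three_frame_translations(dna: str) -> list[str]:
    """Translations of all three forward reading frames ('*' marks stop codons)."""
    return [translate(dna, frame) for frame in range(3)]


print(three_frame_translations("AATGAAA"))  # ['NE', 'MK', '*']
```

Mixing coding and non-coding regions, as the abstract describes, would additionally require deciding per region whether to align at the nucleotide or amino acid level; this sketch only supplies the translations.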

4.

Position specific variation in the rate of evolution in transcription factor binding sites

Alan M Moses, Derek Y Chiang, Manolis Kellis, Eric S Lander, Michael B Eisen 《BMC evolutionary biology》2003,3(1):19

5.

Non-coding sequence retrieval system for comparative genomic analysis of gene regulatory elements

Sung Tae Doh, Yunyu Zhang, Matthew H Temple, Li Cai 《BMC bioinformatics》2007,8(1):94

6.

Sigma: multiple alignment of weakly-conserved non-coding DNA sequence

Rahul Siddharthan 《BMC bioinformatics》2006,7(1):143-15

Background

Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.
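
The basic operation behind a gapless local alignment strategy is finding the best ungapped segment shared by two sequences. The sketch below scans every diagonal with Kadane's maximum-subarray algorithm under a simple +1/-1 match/mismatch score; that scoring is an assumption and differs from Sigma's, which rates fragments by length against a background model.

```python
def best_gapless_segment(a: str, b: str, match: int = 1,
                         mismatch: int = -1) -> tuple[int, int, int, int]:
    """Highest-scoring ungapped local alignment between a and b.

    Returns (score, start_in_a, start_in_b, length), or (0, 0, 0, 0) if no
    segment scores above zero. Each diagonal is scanned with Kadane's algorithm."""
    n, m = len(a), len(b)
    best = (0, 0, 0, 0)
    for offset in range(-(n - 1), m):      # offset = position in b minus position in a
        i0, j0 = max(0, -offset), max(0, offset)
        run, run_start = 0, 0
        for k in range(min(n - i0, m - j0)):
            run += match if a[i0 + k] == b[j0 + k] else mismatch
            if run <= 0:
                run, run_start = 0, k + 1  # restart the segment after this column
            elif run > best[0]:
                best = (run, i0 + run_start, j0 + run_start, k + 1 - run_start)
    return best


print(best_gapless_segment("GGACGTAC", "TTACGTTT"))  # (4, 2, 2, 4)
```

A progressive aligner in this style would repeatedly accept the best such fragment that is consistent with fragments already accepted, which is the consistency step the abstract mentions.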

7.

VisualRepbase: an interface for the study of occurrences of transposable element families

Sébastien Tempel, Matthew Jurka, Jerzy Jurka 《BMC bioinformatics》2008,9(1):345

Background

Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Repbase already has software for entering new sequence families and for comparing the user's sequence with the database of consensus sequences.

8.

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf RP Bininda-Emonds 《BMC bioinformatics》2005,6(1):156

Background

Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.
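
The final step of a translated alignment maps the aligned amino acids back onto the original codons, so that each amino-acid gap becomes a three-nucleotide gap. A minimal sketch of that step for a single row, assuming the coding sequence has exactly three nucleotides per residue (no trailing stop codon, no frameshifts); the function name is hypothetical, not transAlign's interface.

```python
def backtranslate_row(aligned_protein: str, cds: str) -> str:
    """Thread an ungapped coding sequence onto its gapped protein alignment row."""
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * residues:
        raise ValueError("coding sequence length does not match protein row")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")        # one amino-acid gap becomes one codon gap
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Two rows of a (hypothetical) protein alignment and their coding sequences
print(backtranslate_row("M-K", "ATGAAA"))  # ATG---AAA
print(backtranslate_row("MEK", "ATGGAAAAA"))  # ATGGAAAAA
```

Because gaps are inserted only in whole codons, the resulting DNA alignment never breaks a reading frame, which is one reason translated alignments tend to be more accurate for coding sequences.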

9.

RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure (total citations: 1; self-citations: 0; citations by others: 1)

Qi Liu, Yu Yang, Chun Chen, Jiajun Bu, Yin Zhang, Xiuzi Ye 《BMC bioinformatics》2008,9(1):176

Background

With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them is suitable for compressing RNA sequences together with their secondary structures. This kind of compression not only facilitates the maintenance of RNA data but also supplies a novel way to measure the informational complexity of RNA structural data. It raises the possibility of using compression to study the relationship between the functional activities of RNA structures and their complexity, as well as various structural properties of RNA.
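
The idea of measuring informational complexity by compression can be illustrated with a general-purpose compressor: the more redundant a string, the smaller its compressed size relative to its raw size. This sketch substitutes DEFLATE (Python's `zlib`) for RNACompress's grammar-based scheme, and the interleaved sequence/dot-bracket encoding is a hypothetical choice for compressing both together.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size / raw size under DEFLATE (zlib, level 9).
    Lower values indicate more redundancy, i.e. lower informational complexity."""
    raw = text.encode("ascii")
    if not raw:
        raise ValueError("empty input")
    return len(zlib.compress(raw, 9)) / len(raw)


def joint_encoding(seq: str, dot_bracket: str) -> str:
    """Interleave bases with their dot-bracket symbols so both are compressed together."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return "".join(base + sym for base, sym in zip(seq, dot_bracket))


print(joint_encoding("GGAAACC", "((...))"))   # G(G(A.A.A.C)C)
print(compression_ratio("((((....))))" * 20))  # small: the structure is highly repetitive
```

Fixed header and checksum overhead dominates for short inputs, so ratios are only comparable between strings of similar length; a grammar-based compressor also has the advantage that its rules can be read as structural motifs.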

10.

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

Markus Bauer, Gunnar W Klau, Knut Reinert 《BMC bioinformatics》2007,8(1):271

Background

The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account.

11.

Clustering exact matches of pairwise sequence alignments by weighted linear regression

Alvaro J González, Li Liao 《BMC bioinformatics》2008,9(1):102

Background

At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when it is available. The interest is not to analyze a detailed alignment between a contig and the reference genome at the base level, but rather to have a rough estimate of where the contig aligns to the reference genome, specifically, by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analysis such as gap closure and resolving repeats. There exist programs, such as BLAST and MUMmer, that can quickly align and identify high similarity segments between two sequences, which, when seen in a dot plot, tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and practically impossible task to visually inspect the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects. A forced global alignment between a contig and the reference is not only time consuming but often meaningless.
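
The core of placing a contig by weighted linear regression can be sketched as a weighted least-squares line through the exact matches of a dot plot, with longer matches weighted more heavily. This is a minimal sketch under stated assumptions: matches arrive as hypothetical `(contig_pos, ref_pos, length)` tuples on the same strand, and the clustering and outlier handling of the published method are not reproduced.

```python
def fit_contig_placement(matches: list[tuple[int, int, int]],
                         contig_length: int) -> tuple[float, float, float]:
    """Weighted least-squares line ref_pos = slope * contig_pos + intercept.

    Each match is (contig_pos, ref_pos, length) and is weighted by its length.
    Returns (slope, ref_start, ref_end): the fitted reference positions of the
    contig's first and last base."""
    if not matches:
        raise ValueError("no matches to fit")
    w_sum = sum(w for _, _, w in matches)
    x_mean = sum(w * x for x, _, w in matches) / w_sum
    y_mean = sum(w * y for _, y, w in matches) / w_sum
    sxx = sum(w * (x - x_mean) ** 2 for x, _, w in matches)
    sxy = sum(w * (x - x_mean) * (y - y_mean) for x, y, w in matches)
    # With one distinct contig position the slope is undefined; assume 1 (no indels)
    slope = sxy / sxx if sxx else 1.0
    intercept = y_mean - slope * x_mean
    return slope, intercept, intercept + slope * contig_length


# Hypothetical matches lying on the diagonal ref = contig + 1000
print(fit_contig_placement([(0, 1000, 50), (100, 1100, 30), (200, 1200, 20)], 300))
```

Weighting by match length lets long, reliable matches dominate the fit, while short matches shifted off the main diagonal by mismatches pull the line only slightly.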

12.

Conservation of regulatory elements between two species of Drosophila

Eldon Emberly, Nikolaus Rajewsky, Eric D Siggia 《BMC bioinformatics》2003,4(1):57

Background

One of the important goals in the post-genomic era is to determine the regulatory elements within the non-coding DNA of a given organism's genome. The identification of functional cis-regulatory modules has proven difficult since the component factor binding sites are small and the rules governing their arrangement are poorly understood. However, the genomes of suitably diverged species help to predict regulatory elements based on the generally accepted assumption that conserved blocks of genomic sequence are likely to be functional. To judge the efficacy of strategies that prefilter by sequence conservation it is important to know to what extent the converse assumption holds, namely that functional elements common to both species will fall within these conserved blocks. The recently completed sequence of a second Drosophila species provides an opportunity to test this assumption for one of the experimentally best studied regulatory networks in multicellular organisms, the body patterning of the fly embryo.
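
Prefiltering by sequence conservation needs some definition of a "conserved block". One simple, commonly used definition, assumed here rather than taken from this study, is a run of sliding windows over a pairwise alignment whose percent identity meets a threshold. The window size and threshold below are illustrative defaults.

```python
def conserved_blocks(a: str, b: str, window: int = 10,
                     min_identity: float = 0.8) -> list[tuple[int, int]]:
    """Merged [start, end) column intervals of an aligned pair that are covered by
    windows with at least `min_identity` identical, non-gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    same = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    blocks: list[list[int]] = []
    run = sum(same[:window])  # identical columns in the current window
    for start in range(len(same) - window + 1):
        if start > 0:
            run += same[start + window - 1] - same[start - 1]  # slide by one column
        if run >= min_identity * window:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1][1] = end  # overlapping window: extend the current block
            else:
                blocks.append([start, end])
    return [(s, e) for s, e in blocks]


print(conserved_blocks("A" * 10 + "C" * 10, "A" * 10 + "G" * 10, window=5))  # [(0, 11)]
```

Testing the converse assumption described above then amounts to asking what fraction of experimentally known binding sites fall inside the returned intervals; note that window-based blocks can extend a few columns past the truly conserved region, as the example output shows.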

13.

Quantization of chloroplast DNA using dot blot filter hybridization with double-stranded probes

Marvin A. Smith 《Molecular and cellular biochemistry》1984,63(2):149-156

Summary

Hybridization characteristics of purified chloroplast DNA immobilized in dot blots on nitrocellulose filters were investigated using radiolabeled chloroplast DNA restriction fragments or recombinant DNA probes. Conditions are described that provide a near-linear relationship between the amount of hybridization and the amount of immobilized DNA. A standard curve constructed from such data provided a simple means of quantifying specific chloroplast DNA sequences in partially purified total DNA from protoplast extracts. Using this technique, DNA sequences corresponding to about 0.01% of the total immobilized DNA could be detected.
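
Quantifying with a standard curve means fitting hybridization signal against known amounts of DNA and then inverting the fit for an unknown sample. A minimal sketch, assuming an ordinary least-squares straight line and valid only within the near-linear range described above; the amounts and signal values in the example are hypothetical.

```python
def fit_standard_curve(amounts: list[float], signals: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line: signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the amount behind an unknown signal."""
    return (signal - intercept) / slope


# Hypothetical standards (e.g. ng of chloroplast DNA) and their dot-blot signals
slope, intercept = fit_standard_curve([0, 1, 2, 4], [10, 30, 50, 90])
print(quantify(70, slope, intercept))  # 3.0
```

Extrapolating beyond the highest or lowest standard is unsafe, because hybridization is only approximately linear over a limited range of immobilized DNA.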

14.

Kangaroo – A pattern-matching program for biological sequences

Doron Betel, Christopher WV Hogue 《BMC bioinformatics》2002,3(1):20

15.

Word correlation matrices for protein sequence analysis and remote homology detection

Thomas Lingner, Peter Meinicke 《BMC bioinformatics》2008,9(1):259

Background

Classification of protein sequences is a central problem in computational biology. Currently, among computational methods discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences usually are computationally expensive.

16.

MatGAT: An application that generates similarity/identity matrices using protein or DNA sequences (total citations: 1; self-citations: 0; citations by others: 1)

James J Campanella, Ledion Bitincka, John Smalley 《BMC bioinformatics》2003,4(1):29

Background

The rapid increase in the amount of protein and DNA sequence information available has become almost overwhelming to researchers. So much information is now accessible that high-quality, functional gene analysis and categorization has become a major goal for many laboratories. To aid in this categorization, there is a need for non-commercial software that is able to both align sequences and also calculate pairwise levels of similarity/identity.
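
Once sequences are aligned, a pairwise identity matrix is straightforward to compute. This sketch assumes the rows come from one existing alignment (MatGAT itself aligns each pair), counts a gap opposite a residue as a difference, skips gap-gap columns, and omits similarity scoring with a substitution matrix. Identity definitions vary between tools, so the denominator used here is an assumption.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned rows; columns where both rows are gaps are
    skipped, and a gap in only one row counts as a difference."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def identity_matrix(rows: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise percent identities for the rows of one alignment."""
    return [[percent_identity(r1, r2) for r2 in rows] for r1 in rows]


for row in identity_matrix(["ACGT", "ACGA", "AC-T"]):
    print(row)
# [100.0, 75.0, 75.0]
# [75.0, 100.0, 50.0]
# [75.0, 50.0, 100.0]
```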

17.

Characterization of DXZ4 conservation in primates implies important functional roles for CTCF binding, array expression and tandem repeat organization on the X chromosome

McLaughlin CR, Chadwick BP 《Genome biology》2011,12(4):R37

Background

Comparative sequence analysis is a powerful means with which to identify functionally relevant non-coding DNA elements through conserved nucleotide sequence. The macrosatellite DXZ4 is a polymorphic, uninterrupted, tandem array of 3-kb repeat units located exclusively on the human X chromosome. While not obviously protein coding, its chromatin organization suggests differing roles for the array on the active and inactive X chromosomes.

18.

Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans (total citations: 2; self-citations: 0; citations by others: 2)

Vavouri T, Walter K, Gilks WR, Lehner B, Elgar G 《Genome biology》2007,8(2):R15

19.

Efficient and accurate P-value computation for Position Weight Matrices

Hélène Touzet, Jean-Stéphane Varré 《Algorithms for molecular biology : AMB》2007,2(1):15-12

Background

Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or protein sequences. Using PWMs requires knowing the statistical significance of a word given its score. This significance is expressed as the P-value of a score: the probability that a word drawn from the background model achieves a score greater than or equal to the observed value. This gives rise to the following problem: given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions, but for many PWMs they fail to give accurate results in a reasonable amount of time.
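
The dynamic-programming approach mentioned above can be sketched as building the exact score distribution of the PWM one column at a time, then reading P-values and thresholds off its upper tail. This sketch assumes integer scores (real-valued PWMs would first be rounded, which introduces an error this sketch does not control) and an i.i.d. background model; it is the baseline method, not Touzet and Varré's faster algorithm.

```python
from typing import Optional


def score_distribution(pwm: list[dict[str, int]],
                       background: dict[str, float]) -> dict[int, float]:
    """Exact distribution of integer PWM scores for words drawn from an i.i.d.
    background model, built column by column."""
    dist = {0: 1.0}  # score distribution of the empty prefix
    for column in pwm:
        nxt: dict[int, float] = {}
        for score, prob in dist.items():
            for base, q in background.items():
                s = score + column[base]
                nxt[s] = nxt.get(s, 0.0) + prob * q
        dist = nxt
    return dist


def p_value(pwm: list[dict[str, int]], background: dict[str, float], threshold: int) -> float:
    """P(score >= threshold) under the background model."""
    return sum(p for s, p in score_distribution(pwm, background).items() if s >= threshold)


def threshold_for_p_value(pwm: list[dict[str, int]], background: dict[str, float],
                          alpha: float) -> Optional[int]:
    """Smallest score threshold whose P-value does not exceed alpha (None if none does)."""
    dist = score_distribution(pwm, background)
    tail, best = 0.0, None
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail = P(score >= s)
        if tail > alpha:
            break
        best = s
    return best


# Hypothetical two-column PWM with a uniform background
pwm = [{"A": 2, "C": 0, "G": 0, "T": -1}, {"A": 0, "C": 1, "G": 0, "T": 0}]
bg = {b: 0.25 for b in "ACGT"}
print(p_value(pwm, bg, 2), threshold_for_p_value(pwm, bg, 0.25))  # 0.25 2
```

The number of distinct scores grows with the score range, so finer rounding gives more accurate P-values at the cost of time and memory, which is the trade-off the abstract identifies.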

20.

The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage

Hamilton Ganesan, Anna S Rakitianskaia, Colin F Davenport, Burkhard Tümmler, Oleg N Reva 《BMC bioinformatics》2008,9(1):333

Background

Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation. The tool is also available as a downloadable package.
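
Finding atypical regions through oligonucleotide usage boils down to comparing word frequencies in a sliding window with those of the whole genome. A minimal sketch, assuming tetranucleotide words and a plain L1 distance between frequency profiles; SeqWord's own OU statistics are different, and the window and step sizes are illustrative.

```python
from collections import Counter
from itertools import product


def word_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequencies of all 4**k words over ACGT (words with other letters are skipped)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


def ou_distance(f1: dict[str, float], f2: dict[str, float]) -> float:
    """L1 distance between two word-frequency profiles (0 = identical usage)."""
    return sum(abs(f1[w] - f2[w]) for w in f1)


def scan_genome(genome: str, window: int = 5000, step: int = 2500,
                k: int = 4) -> list[tuple[int, float]]:
    """Distance of each window's word usage from the whole-genome usage."""
    genome_freqs = word_frequencies(genome, k)  # computed once, reused for every window
    return [(start, ou_distance(word_frequencies(genome[start:start + window], k), genome_freqs))
            for start in range(0, len(genome) - window + 1, step)]


print(scan_genome("A" * 10 + "C" * 10, window=10, step=10, k=1))  # [(0, 1.0), (10, 1.0)]
```

Windows with unusually large distances are candidates for horizontally acquired islands or other atypical regions; a real analysis would also normalize for window length and base composition.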