Similar articles
1.
Benchmarking tools for the alignment of functional noncoding DNA (cited 1 time: 0 self-citations, 1 by others)

Background

Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.

Results

Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25–3.0 substitutions per site.
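The abstract scores each tool against the simulated "correct" alignment. One common way to do this (an assumption here; the paper's exact definitions may differ) is to compare sets of aligned residue pairs: sensitivity is the fraction of truly aligned pairs that a tool recovers, and specificity is taken to be the fraction of the tool's aligned pairs that are correct (often called precision). A minimal Python sketch, with hypothetical function names:

```python
def aligned_pairs(row1: str, row2: str) -> set:
    """Return the set of (i, j) residue-index pairs placed in the same column.

    Rows are gapped strings of equal length; '-' marks a gap.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # ungapped positions in sequence 1 and sequence 2
    for a, b in zip(row1, row2):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def sensitivity_specificity(true_aln, test_aln):
    """Score a test alignment (row1, row2) against the reference (row1, row2)."""
    true_pairs = aligned_pairs(*true_aln)
    test_pairs = aligned_pairs(*test_aln)
    correct = len(true_pairs & test_pairs)
    sensitivity = correct / len(true_pairs) if true_pairs else 0.0
    specificity = correct / len(test_pairs) if test_pairs else 0.0
    return sensitivity, specificity


# Reference alignment has an indel; the test alignment ignores it.
print(sensitivity_specificity(("ACGT-A", "AC-TTA"), ("ACGTA", "ACTTA")))
```

Under these definitions, a local tool that aligns only a conserved core can have low sensitivity (many true pairs left unaligned) yet high specificity, which matches the local/global contrast described above.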

Conclusion

For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.

2.
The vast majority of the mammalian genome does not code for proteins, and a fundamental question in genomics is: what proportion of the noncoding mammalian genome is functional? Most attempts to address this issue use sequence comparisons between highly diverged mammals such as human and mouse to identify conservation due to negative selection. But such comparisons will underestimate the true proportion of functional noncoding DNA if there is turnover, that is, if patterns of negative selection change over time. Here we test whether the inferred level of negative selection differs between different pairwise species comparisons. Using a multiple alignment of more than a megabase of contiguous sequence from eight mammalian species, we find a strong negative relationship between inferred levels of negative selection and pairwise divergence across 21 pairwise comparisons. This result suggests that there is a high rate of turnover of functional noncoding elements in the mammalian genome, so measures of functional constraint based on human-mouse comparisons may seriously underestimate the true value.
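The abstract does not state its estimator. A simple, commonly used idea (assumed here, not necessarily the authors' method) is to compare divergence in a region of interest with divergence at putatively neutral sites: the inferred constrained fraction is 1 - d_region / d_neutral. Under turnover, d_region creeps toward d_neutral as the compared species grow more distant, so this estimate shrinks with divergence. A minimal sketch with hypothetical helper names:

```python
def p_distance(row1: str, row2: str) -> float:
    """Proportion of differing sites among ungapped columns of an aligned pair."""
    compared = diffs = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped, not counted as differences
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def constraint_estimate(d_region: float, d_neutral: float) -> float:
    """Inferred fraction of sites under negative selection: 1 - d_region / d_neutral."""
    if d_neutral <= 0:
        raise ValueError("neutral divergence must be positive")
    return 1.0 - d_region / d_neutral
```

In practice one would compute both divergences per species pair (ideally with a substitution-model correction rather than the raw p-distance) and then plot the estimate against pairwise divergence.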

3.
4.
5.
Projection neurons are the most common neuronal type in the mammalian forebrain, and characterizing them individually is a crucial step toward understanding how neural circuitry operates. These cells have an axon whose arborizations extend over long distances, branching in complex patterns and/or in multiple brain regions. Axon length is a principal indicator of a neuron's functional impact, as it directly correlates with the number of synapses the axon forms in its target regions; however, measuring it by direct 3D axonal tracing is slow and labor-intensive. By contrast, axon length estimation has recently been proposed as an effective and accessible alternative, offering a fast way to assess the functional significance of a single neuron. Here, we analyze the accuracy and efficiency of the most widely used length estimation tools (design-based stereology with virtual planes or spheres, and mathematical correction of the 2D projected axon length), compared with direct measurement, for quantifying individual axon length. To this end, we computationally simulated each tool, applied them to a dataset of 951 3D-reconstructed axons (from NeuroMorpho.org), and compared the resulting length values with their 3D reconstruction counterparts. The reliability of each axon length estimation method was then weighed against the required human effort, experience and know-how, and economic affordability. Computational results were subsequently contrasted with measurements performed on actual brain tissue sections. We show that the plane-based stereological method balances acceptable errors (~5%) with robustness to biases, whereas the projection-based method, despite its accuracy, is prone to inherent biases when implemented in the laboratory.
This work, therefore, aims to provide a constructive benchmark to help guide the selection of the most efficient method for measuring specific axonal morphologies according to the particular circumstances of each study.
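To make the two families of estimators concrete, here is a minimal sketch (not the authors' simulation code) that treats an axon as a 3D polyline. The projection estimator rescales the 2D (xy) projected length by 4/π, the mean ratio of projected to true length for isotropically oriented segments. The plane estimator counts crossings Q with parallel planes spaced t apart and uses L ≈ 2·t·Q, which also assumes isotropy. Real axons are rarely isotropic, which is one route to the biases mentioned above. The isotropic random walk is a hypothetical test object.

```python
import math
import random


def length_3d(points):
    """Exact polyline length: the 'direct measurement' reference."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def projected_length_corrected(points):
    """xy-projected length scaled by 4/pi (unbiased only for isotropic segments)."""
    proj = sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:]))
    return proj * 4.0 / math.pi


def plane_intersection_length(points, spacing, offset=0.0):
    """Count crossings with planes z = offset + k*spacing, then estimate L = 2 * spacing * Q."""
    crossings = 0
    for p, q in zip(points, points[1:]):
        lo, hi = sorted((p[2] - offset, q[2] - offset))
        # number of integers k with lo < k*spacing <= hi
        crossings += math.floor(hi / spacing) - math.floor(lo / spacing)
    return 2.0 * spacing * crossings


def random_walk(n_steps, seed=0):
    """Hypothetical test 'axon': unit steps in isotropically random directions."""
    rng = random.Random(seed)
    x = y = z = 0.0
    points = [(x, y, z)]
    for _ in range(n_steps):
        dz = rng.uniform(-1.0, 1.0)  # cos(theta) uniform on [-1, 1] gives isotropy
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - dz * dz)
        x, y, z = x + r * math.cos(phi), y + r * math.sin(phi), z + dz
        points.append((x, y, z))
    return points


walk = random_walk(100_000, seed=7)
print(length_3d(walk), projected_length_corrected(walk), plane_intersection_length(walk, 0.5))
```

On an isotropic walk both estimates land close to the true length; applying the same functions to a polyline that runs mostly along one axis shows how quickly the isotropy assumption breaks down.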

6.
7.
8.
Tandem-repetitive noncoding DNA: forms and forces (cited 8 times: 1 self-citation, 7 by others)
A model of sequence-dependent unequal crossing-over and gene amplification (slippage replication) has been simulated in order to account for various structural features of tandemly repeated DNA sequences. It is shown that DNA whose sequence is not maintained by natural selection will exhibit repetitive patterns over a wide range of recombination rates as a result of the interaction between unequal crossing-over and slippage replication, processes that depend on sequence similarity. At high crossing-over frequencies, the nucleotide patterns generated in the simulations are simple and highly regular, with short, nearly identical sequences repeated in tandem. Decreasing recombination rates increase the tendency toward longer and more complex repeat units. Periodicities have been observed down to very low recombination rates (one or more orders of magnitude lower than the mutation rate). At such low rates, most of the sequences contain repeats that have extensive substructure and a high degree of heterogeneity among themselves; often, higher-order structures are superimposed on a tandem array. These results are compared with various structural properties of tandemly repeated DNAs known from eukaryotes, a spectrum ranging from simple-sequence DNAs, particularly the hypervariable minisatellites, to the classical satellite DNAs located in chromosomal regions of low recombination, e.g., heterochromatin.
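The interplay of slippage and sequence-dependent unequal crossing-over can be explored with a toy simulation. The sketch below is a deliberately simplified single-lineage model of our own (not the original author's model); the parameter names, rates, and the k-mer matching rule for choosing crossover breakpoints are all assumptions. `best_period` is a crude lag-identity score for spotting emerging tandem periodicity.

```python
import random

ALPHABET = "ACGT"


def slippage_duplicate(seq: str, pos: int, k: int) -> str:
    """Slippage replication: insert a second copy of seq[pos:pos+k] right after the first."""
    segment = seq[pos:pos + k]
    return seq[:pos + k] + segment + seq[pos + k:]


def unequal_crossover(a: str, b: str, i: int, j: int) -> tuple:
    """Reciprocal exchange after position i of `a` and j of `b`; i != j gives unequal products."""
    return a[:i] + b[j:], b[:j] + a[i:]


def matching_site(seq: str, i: int, k: int, rng: random.Random):
    """Sequence dependence: pick another breakpoint j whose preceding k-mer equals seq[i-k:i]."""
    motif = seq[i - k:i]
    candidates = [j for j in range(k, len(seq) + 1) if j != i and seq[j - k:j] == motif]
    return rng.choice(candidates) if candidates else None


def best_period(seq: str, max_period: int) -> int:
    """Lag p in 1..max_period at which seq[x] == seq[x + p] most often."""
    def identity(p: int) -> float:
        n = len(seq) - p
        return sum(seq[x] == seq[x + p] for x in range(n)) / n if n > 0 else 0.0
    return max(range(1, max_period + 1), key=identity)


def simulate(length=200, generations=2000, k=3, p_mut=0.5, p_slip=0.1, p_cross=0.2, seed=1):
    """Toy model: point mutation, slippage, and unequal exchange between identical sister copies."""
    rng = random.Random(seed)
    seq = "".join(rng.choice(ALPHABET) for _ in range(length))
    for _ in range(generations):
        if rng.random() < p_mut:  # point substitution at a random site
            pos = rng.randrange(len(seq))
            seq = seq[:pos] + rng.choice(ALPHABET) + seq[pos + 1:]
        if rng.random() < p_slip:  # slippage: tandem duplication of a 1..k bp segment
            pos = rng.randrange(len(seq) - k)
            seq = slippage_duplicate(seq, pos, rng.randint(1, k))
        if rng.random() < p_cross:  # unequal crossing-over at a homologous k-mer
            i = rng.randrange(k, len(seq) + 1)
            j = matching_site(seq, i, k, rng)
            if j is not None:
                product = rng.choice(unequal_crossover(seq, seq, i, j))
                if len(product) > k:  # keep the sequence long enough for the next round
                    seq = product
    return seq


s = simulate()
print(len(s), best_period(s, 10))
```

Raising `p_cross` relative to `p_mut` is the knob that corresponds to the recombination-rate axis discussed in the abstract.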

9.
Noncoding DNA in eukaryotes encodes functionally important signals for the regulation of chromosome assembly, DNA replication, and gene expression. The increasing availability of whole-genome sequences of related taxa has led to interest in the evolution of these signals, and the phylogenetic footprints they produce. Cis-regulatory sequences controlling gene expression are often conserved among related species, but are rarely conserved between distantly related taxa. Several experimentally characterized regulatory elements have failed to show sequence similarity even between closely related species.
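Phylogenetic footprints are often located by scanning a pairwise alignment for windows of unusually high identity. A minimal sliding-window sketch follows; the window size and threshold are arbitrary assumptions, and, as the paragraph notes, functional elements can lack detectable similarity, so a missing footprint is not evidence of absent function.

```python
def window_identity(row1: str, row2: str, window: int = 20, step: int = 1):
    """Yield (start_column, percent_identity) for each window; gap columns count as mismatches."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    for start in range(0, len(row1) - window + 1, step):
        a, b = row1[start:start + window], row2[start:start + window]
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        yield start, 100.0 * matches / window


def conserved_windows(row1: str, row2: str, window: int = 20, threshold: float = 80.0):
    """Start columns of windows whose identity reaches the threshold (candidate footprints)."""
    return [start for start, pid in window_identity(row1, row2, window) if pid >= threshold]


row1 = "A" * 10 + "C" * 10
row2 = "A" * 10 + "G" * 10
print(conserved_windows(row1, row2, window=5, threshold=100.0))
```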

10.

Background  

With completed genomic sequences accumulating at an ever-increasing rate and complexity, the need to align longer sequences has become more urgent, and many more tools have been developed as a result. In the initial stage of genomic sequence analysis, a biologist usually faces the questions of how to choose the best tool to align the sequences of interest and how to analyze and visualize the alignment results, and then whether poorly aligned regions produced by the tool are genuinely nonhomologous or merely artifacts of an inappropriate alignment tool or scoring system. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been carried out, they may not serve as a reference for most biologists, because the poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows the alignment results obtained by different tools to be cross-compared simultaneously could help a biologist evaluate their correctness and accuracy.
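The core of such a cross-comparison can be sketched in a few lines: map each residue of the first sequence to its partner in the second sequence under each tool's alignment, then report the stretches where the tools disagree. This is a hypothetical illustration, not the tool the abstract describes; alignments are assumed to be pairs of gapped strings over the same two sequences.

```python
def residue_map(row1: str, row2: str) -> list:
    """For each residue of sequence 1, the index of its aligned residue in sequence 2 (None if gapped)."""
    mapping = []
    j = 0  # index of the next residue of sequence 2
    for a, b in zip(row1, row2):
        if a != "-":
            mapping.append(j if b != "-" else None)
        if b != "-":
            j += 1
    return mapping


def disagreement_regions(aln_a, aln_b) -> list:
    """Half-open (start, end) ranges of sequence-1 residues that the two alignments place differently."""
    map_a, map_b = residue_map(*aln_a), residue_map(*aln_b)
    if len(map_a) != len(map_b):
        raise ValueError("alignments must cover the same sequence 1")
    regions, start = [], None
    for i, (x, y) in enumerate(zip(map_a, map_b)):
        if x != y and start is None:
            start = i  # a run of disagreement begins
        elif x == y and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(map_a)))
    return regions


print(disagreement_regions(("ACGTA", "ACTTA"), ("ACGT-A", "AC-TTA")))
```

Regions flagged this way are natural candidates for the "poorly aligned or genuinely nonhomologous?" question raised above.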

11.
12.
Over the past decade, a battery of powerful tools encompassing forward and reverse genetic approaches has been developed to dissect the molecular and cellular processes that regulate development and disease. The advent of genetically encoded fluorescent proteins expressed in wild-type and mutant mice, together with advances in imaging technology, makes it possible to study these biological processes in many dimensions. Importantly, these technologies allow direct visual access to complex events as they happen in their native environment, which provides greater insight into mammalian biology than ever before.

13.
Awareness of the complex structure and evolutionary dynamics of noncoding DNA has improved both noncoding sequence alignment and the use of microstructural changes as characters in phylogenetic analysis. The next step is to consider improvements in the use and selection of phylogenetic models for noncoding sequence data. Models of character evolution are central to phylogeny estimation, but the use of an inadequate model can mislead topology selection and branch length estimations. This is particularly likely when sequence divergence is either limited (nearly invariable, as in population-level or species-level studies) or extreme (nearly saturated, as in deep-level studies that focus on conserved secondary structures). Noncoding data sets are often at these extremes, and they can be particularly awkward for model definition and model selection. This paper introduces the goals of model use in phylogenetics and identifies ten issues that arise from the application of models to noncoding sequence data. It is concluded that most of these issues derive from small data set sizes, very low or very high sequence variability, limitations of current phylogenetic models, and possibly character definition and nonindependence. Recommendations are made that should help to improve alignment, character quality, model selection, and phylogeny estimation based on noncoding sequence data.
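The saturation problem can be seen in the simplest substitution model, Jukes-Cantor (JC69), chosen here only as an illustration; real analyses compare richer models (for example HKY or GTR) with likelihood-based criteria. The JC69 distance is d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. As p approaches 0.75 the logarithm diverges, so small sampling errors in p yield large errors in d, while at very low p there are too few differences to inform any model.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) distance in substitutions per site from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The same 0.05 step in p costs far more distance near saturation.
for p in (0.10, 0.15, 0.65, 0.70):
    print(p, round(jc69_distance(p), 3))
```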

14.
15.

Background

With advances in DNA re-sequencing methods and next-generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For highly polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that helps biologists, who often have limited programming experience, to track, edit, display, and export multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor.

Results

We have developed a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manually editing multiple sequence alignments, with functions for input, editing, and output of sequence alignments. Color-coding of nucleotide identity and display of the associated quality scores aid the manual editing process. DNAAlignEditor works as a client/server tool with two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users connected concurrently.

Conclusion

We anticipate that this software will be of general interest to biologists and population geneticists for editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences from species with high levels of polymorphism.
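The abstract describes a relational database holding processed alignments together with per-base quality scores, but does not give DNAAlignEditor's schema. The following is a hypothetical minimal layout of that idea; SQLite is used only so the sketch runs standalone, whereas the described tool connects to a server database through data access drivers. Table, column, and function names are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS alignment (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS aligned_sequence (
    id           INTEGER PRIMARY KEY,
    alignment_id INTEGER NOT NULL REFERENCES alignment(id),
    individual   TEXT NOT NULL,
    gapped_seq   TEXT NOT NULL,  -- one alignment row: residues plus '-' gaps
    qualities    TEXT NOT NULL   -- space-separated quality scores, one per residue
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_alignment(conn, name, rows):
    """rows: iterable of (individual, gapped_seq, list_of_quality_scores)."""
    rows = list(rows)
    for individual, gapped, quals in rows:
        if len(quals) != len(gapped.replace("-", "")):
            raise ValueError(f"{individual}: need one quality score per non-gap residue")
    with conn:  # one transaction: the alignment and all its rows, or nothing
        cur = conn.execute("INSERT INTO alignment (name) VALUES (?)", (name,))
        aln_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO aligned_sequence (alignment_id, individual, gapped_seq, qualities) "
            "VALUES (?, ?, ?, ?)",
            [(aln_id, ind, seq, " ".join(map(str, q))) for ind, seq, q in rows],
        )
    return aln_id


def load_alignment(conn, aln_id):
    cur = conn.execute(
        "SELECT individual, gapped_seq, qualities FROM aligned_sequence "
        "WHERE alignment_id = ? ORDER BY id",
        (aln_id,),
    )
    return [(ind, seq, [int(x) for x in q.split()]) for ind, seq, q in cur]
```

Storing the gapped row as text keeps manual edits (inserting or removing a gap) as simple string updates, and the quality check guards the one invariant that such edits must preserve.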

16.
17.
As the deciphering of basic sequence information on a genomic scale yields complete genome sequences at ever-shorter intervals, experimental procedures for elucidating the cellular effects and consequences of the DNA-encoded information become critical for further analyses. In recent years, DNA microarray technology has emerged as a prime candidate for performing many such functional assays. Technically, array technology has come a long way since its conception some 15 years ago, when it was initially designed as a means for large-scale mapping and sequencing. The basic arrangement, however, could readily be adapted to serve as an analytical tool in a wide variety of applications. On their own or in combination with other methods, microarrays open up many new avenues of functional analysis.

18.
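With the rapid development of high-throughput sequencing technology, next-generation sequencing has quickly become a mainstream technology in the life sciences, and the most important step in interpreting next-generation sequencing data is alignment. Alignment is the cornerstone of all downstream bioinformatic analyses, and it has therefore given rise to many alignment programs. This paper selects four commonly used aligners, Bowtie2, BWA, MAQ, and SOAP2, reviews these programs and their algorithms, and compares and evaluates them on real sequencing data, providing a theoretical and practical basis for biologists choosing the best short-read aligner.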
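Bowtie2 and BWA both index the reference genome with the Burrows-Wheeler transform (BWT) and an FM index, which lets a read be matched by "backward search" one character at a time. The sketch below shows only exact-match counting; production aligners add inexact matching, sampled suffix arrays to recover positions, and compressed occurrence tables, none of which are reproduced here.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; fine for a demo)."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text: str):
        self.bwt = bwt(text)
        counts = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        # C[c]: number of characters in text$ that sort before c
        self.C, total = {}, 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in bwt[:i]
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c, table in self.occ.items():
                table[i + 1] = table[i] + (c == ch)

    def count(self, pattern: str) -> int:
        """Backward search: number of exact occurrences of pattern in the text."""
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("ACGTACGTAC")
print(index.count("AC"), index.count("GTA"))
```

Each step narrows the interval of sorted suffixes that begin with the pattern's suffix seen so far, so search cost depends on the read length rather than the genome size.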

19.
20.
The development of efficient DNA sequencing methods has made it possible to determine the complete genome sequences of (to date) 55 prokaryotes and 5 eukaryotic organisms, as well as 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available, and even more will be forthcoming in the near future. Analyzing this overwhelming amount of data requires bioinformatic tools to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include understanding gene regulation and reconstructing metabolic pathways, including through DNA chip technology, which holds tremendous potential for biomedicine and for the biotechnological production of valuable compounds. The sheer volume of information often confuses scientists. This review aims to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest using current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and growing biological databases will be discussed.
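A first pass at gene identification in a new prokaryotic sequence is often an open reading frame (ORF) scan. The naive sketch below searches only the forward strand for ATG...stop stretches in all three frames; the 100-codon default minimum is an arbitrary assumption, and real gene finders also scan the reverse complement, consider alternative start codons, and use statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Return (start, end, frame) for forward-strand ATG..stop ORFs.

    `end` is exclusive and includes the stop codon; length is counted in codons, stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next start codon in this frame
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=2))
```

Translating each ORF and searching the protein databases discussed above is the usual next step toward a functional assignment.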

