Similar literature
 20 similar documents found.
1.

Background

Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.

Results

Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels (splids) that define an approximate bipartition of the taxon set. We show both in simulated data and in case studies on real-life data that splids can be efficiently extracted from phylogenomic data sets.
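
The idea of an indel that splits the taxon set can be illustrated with a small sketch. It treats each maximal gap run with identical start and end coordinates as one indel and reports the bipartition it induces. The function names `gap_runs` and `gap_splits`, the `min_side` parameter, and the toy alignment are hypothetical, and the noise filtering mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict

def gap_runs(row):
    """Return half-open (start, end) intervals of maximal gap runs in one aligned row."""
    runs, start = [], None
    for i, ch in enumerate(row):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs

def gap_splits(alignment, min_side=2):
    """Group taxa by identical gap runs and keep runs that split the taxa non-trivially.

    Assumption for illustration: a gap run shared with identical coordinates
    counts as one indel; real data would need filtering of alignment noise.
    """
    carriers = defaultdict(set)
    for taxon, row in alignment.items():
        for run in gap_runs(row):
            carriers[run].add(taxon)
    taxa = set(alignment)
    splits = {}
    for run, with_gap in carriers.items():
        without_gap = taxa - with_gap
        if len(with_gap) >= min_side and len(without_gap) >= min_side:
            splits[run] = (frozenset(with_gap), frozenset(without_gap))
    return splits

# Toy alignment (made up): A and B share a three-column deletion.
toy = {
    "A": "ACG---TTGA",
    "B": "ACG---TTGA",
    "C": "ACGTTATTGA",
    "D": "ACGTTATT-A",
    "E": "ACGTTATTGA",
}
for run, (inside, outside) in gap_splits(toy).items():
    print(run, sorted(inside), "|", sorted(outside))
```

The single-taxon gap in row D is discarded because it separates only one taxon from the rest and so carries no grouping information.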

Conclusions

Suitably processed gap patterns extracted from genome-wide alignments provide a surprisingly clear phylogenetic signal and allow the inference of accurate phylogenetic trees.

2.

Background

Aligning multiple sequences arises in many tasks in bioinformatics. However, the alignments produced by current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she can analyse and choose the most plausible alignment.

Methods

We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments.
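
The trade-off between the two objectives can be sketched as follows. The example scores a handful of candidate alignments on (substitution score, gap count) and keeps the non-dominated ones. The +1/-1 match/mismatch scoring, the function names, and the candidate alignments are assumptions made for illustration; this is not the local search procedure described above.

```python
from itertools import combinations

def objectives(alignment, match=1, mismatch=-1):
    """Return (sum-of-pairs substitution score, gap count) for equal-length aligned rows.

    Assumption: a simple match/mismatch scheme stands in for a substitution matrix,
    and columns pairing a residue with a gap contribute nothing to the score.
    """
    score = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x != "-" and y != "-":
                score += match if x == y else mismatch
    gaps = sum(row.count("-") for row in alignment)
    return score, gaps

def pareto_front(points):
    """Keep points not dominated by another (higher score and fewer gaps are better)."""
    front = set()
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.add(p)
    return sorted(front)

# Three hypothetical alignments of the same two sequences.
candidates = [
    ["ACGT", "AGGT"],
    ["AC-GT", "A-GGT"],
    ["A-CGT", "AGG-T"],
]
points = [objectives(c) for c in candidates]
print(pareto_front(points))
```

The first two candidates survive because neither is better on both objectives; the third has the lowest score and as many gaps as the second, so it is dropped.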

Results and conclusions

The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
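
A ratio of correctly aligned pairs can be computed roughly as below. The sketch assumes the common definition in which a residue pair is correct if it shares a column in both the test and the reference alignment; the function names and the toy alignments are hypothetical, and the exact definition used in the study may differ.

```python
def aligned_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    Each residue is identified as (row index, position in the ungapped sequence).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row, ch in enumerate(column):
            if ch != "-":
                present.append((row, counters[row]))
                counters[row] += 1
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                pairs.add((present[i], present[j]))
    return pairs

def pairs_ratio(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(aligned_pairs(test) & ref) / len(ref)

reference = ["AC-GT", "ACTGT"]
test_alignment = ["ACG-T", "ACTGT"]
print(pairs_ratio(test_alignment, reference))
```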

3.
Benchmarking tools for the alignment of functional noncoding DNA   (cited by 1: 0 self-citations, 1 by others)

Background

Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.

Results

Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25–3.0 substitutions per site.

Conclusion

For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.

4.
5.

Background

In proteomics studies, liquid chromatography coupled to mass spectrometry (LC-MS) has proven to be a powerful technology to investigate differential expression of proteins/peptides that are characterized by their peak intensities, mass-to-charge ratio (m/z), and retention time (RT). The variable complexity of peptide mixtures and occasional drifts lead to substantial variations in m/z and RT dimensions. Thus, label-free differential protein expression studies by LC-MS technology require alignment with respect to both RT and m/z to ensure that same proteins/peptides are compared from multiple runs.

Methods

In this study, we propose a new strategy to align LC-MALDI-TOF data by combining quality threshold cluster analysis and support vector regression. Our method performs alignment on the basis of measurements in three dimensions (RT, m/z, intensity).

Results and conclusions

We demonstrate the suitability of our proposed method for alignment of LC-MALDI-TOF data through a previously published spike-in dataset and a new in-house generated spike-in dataset. A comparison of our method with other methods that utilize only RT and m/z dimensions reveals that the use of intensity measurements enhances alignment performance.

6.

Background

Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.
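
The first step of an anchored-alignment approach, selecting a chain of local similarities, can be sketched with a generic quadratic-time dynamic program. This is a textbook chaining routine used here as a stand-in, not the CHAOS algorithm; the tuple layout `(start_a, start_b, length, score)`, the function name `best_chain`, and the anchor values are assumptions.

```python
def best_chain(anchors):
    """Return the highest-scoring collinear, non-overlapping chain of anchors.

    Each anchor is (start_a, start_b, length, score): an ungapped local match
    of the given length starting at start_a in sequence A and start_b in B.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    best = [a[3] for a in anchors]   # best chain score ending at each anchor
    prev = [-1] * n                  # back-pointers for chain recovery
    for j in range(n):
        aj, bj, _, sj = anchors[j]
        for i in range(j):
            ai, bi, li, _ = anchors[i]
            # Anchor i must end before anchor j starts in both sequences.
            if ai + li <= aj and bi + li <= bj and best[i] + sj > best[j]:
                best[j] = best[i] + sj
                prev[j] = i
    j = max(range(n), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(anchors[j])
        j = prev[j]
    return chain[::-1]

# Hypothetical anchors; the second one is off-diagonal and conflicts with the rest.
anchors = [(0, 0, 5, 5), (6, 20, 5, 9), (12, 8, 4, 4), (20, 14, 6, 6)]
print(best_chain(anchors))
```

In the second step, only the regions between consecutive anchors of the returned chain would be passed to the slower, more sensitive aligner.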

Results

Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.

Conclusion

We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.

7.

Background

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

Results

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only one single genomic sequence as input information. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, was taken into account.

Conclusion

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover it is very flexible because it can take information from several sources simultaneously into consideration.

8.

Background

Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences.

Results

In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining an accuracy very competitive to the state of the art.
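
To give a feel for why q-grams suit circular sequences, the sketch below computes the classic q-gram (L1 profile) distance over q-grams read with wrap-around, which makes the result independent of where the circular sequence was linearised. It is an illustrative stand-in, not necessarily the measure introduced in the paper; the function names and test strings are hypothetical, and each sequence is assumed to be at least q characters long.

```python
from collections import Counter

def circular_qgrams(s, q):
    """Count the q-grams of s read circularly (one q-gram per start position)."""
    extended = s + s[: q - 1]          # wrap around the end of the sequence
    return Counter(extended[i : i + q] for i in range(len(s)))

def qgram_distance(x, y, q=3):
    """L1 distance between the circular q-gram profiles of x and y."""
    px, py = circular_qgrams(x, q), circular_qgrams(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())

print(qgram_distance("ACGTAC", "GTACAC"))  # rotation of the same circular sequence
print(qgram_distance("ACGTAC", "ACGTAG"))  # one substitution
```

Building both profiles takes time linear in the sequence lengths, in contrast to the super-quadratic cost of alignment-based comparison noted above.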

9.

Objectives

To screen the phylogenetically-nearest members of Cellulosimicrobium cellulans for the production of cellulosome-like multienzyme complexes and extracellular β-xylosidase activity against 7-xylosyltaxanes and to get corresponding molecular insights.

Results

Cellulosimicrobium (family Promicromonosporaceae) and all genera of the family Cellulomonadaceae produced both cellulosome-like multienzyme complexes and extracellular β-xylosidase activity, while the other genera of the family Promicromonosporaceae did not. Multiple sequence alignments further indicated that hypothetical protein M768_06655 might be a key subunit.

Conclusion

This is the first report that many actinobacteria species can produce cellulosome-like multienzyme complexes. The production of cellulosome-like complexes and the extracellular β-xylosidase activity against 7-xylosyltaxanes might be used to differentiate the genus Cellulosimicrobium from other genera of the family Promicromonosporaceae.

10.

Background

Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families.

Results

The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function.
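
The interdependency between two aligned sites can be sketched as a normalized mutual information between alignment columns. Normalizing by the joint entropy is one common choice and is assumed here; the abstract does not state which normalization is used. The function names and the toy columns are hypothetical.

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def normalized_mi(col_x, col_y):
    """Mutual information of two alignment columns divided by their joint entropy.

    Assumption: I(X;Y) / H(X,Y), which is 1 for perfectly coupled sites and
    0 for independent ones; fully conserved column pairs are scored as 0.
    """
    n = len(col_x)
    hx = entropy(Counter(col_x), n)
    hy = entropy(Counter(col_y), n)
    hxy = entropy(Counter(zip(col_x, col_y)), n)
    if hxy == 0:
        return 0.0
    return (hx + hy - hxy) / hxy

# Columns read down a toy four-sequence alignment (made-up residues).
site_a = "AAGG"
site_b = "LLVV"   # changes together with site_a
site_c = "LVLV"   # varies independently of site_a
print(normalized_mi(site_a, site_b), normalized_mi(site_a, site_c))
```

A k-modes style site clustering would then group columns so that such pairwise scores are high within each cluster.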

Conclusions

Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.

11.

Objectives

To express a Δ6-desaturase gene and produce gamma-linolenic acid (GLA) and stearidonic acid (SDA) in a prokaryotic expression system (Escherichia coli), and to analyze its substrate specificity in the omega-3 fatty acid biosynthetic pathway.

Results

Full-length ORF (1448 bp) of Δ6Des-Iso was isolated from Isochrysis sp. and characterized using multiple sequence alignment, phylogenetic analysis, transmembrane domain, and protein tertiary structure. Δ6Des-Iso is a front-end desaturase consisting of three conserved histidine domains and a cytochrome b5 domain. Δ6Des-Iso was cloned and expressed in E. coli with the production of GLA and SDA. Recombinant E. coli utilized 27 and 8% of exogenously supplied alpha-linolenic acid (ALA) and linoleic acid (LA) to produce 6.3% of SDA and 2.3% of GLA, respectively, suggesting that isolated Δ6Des-Iso is specific to the omega-3 pathway.

Conclusion

Production of GLA and SDA in a prokaryotic system was achieved for the first time.

12.
13.

Key message

Heterodimer formation between the B-class MADS-box proteins GsAP3a and GsPI2 plays a core role in petal formation in Japanese gentian plants.

Abstract

We previously isolated six B-class MADS-box genes (GsAP3a, GsAP3b, GsTM6, GsPI1, GsPI2, and GsPI3) from Japanese gentian (Gentiana scabra). To study the roles of these MADS-box genes in determining floral organ identities, we investigated protein–protein interactions among them and produced transgenic Arabidopsis and gentian plants overexpressing GsPI2 alone or in combination with GsAP3a or GsTM6. Yeast two-hybrid and bimolecular fluorescence complementation analyses revealed that among the GsPI proteins, GsPI2 interacted with both GsAP3a and GsTM6, and that these heterodimers were localized to the nuclei. The heterologous expression of GsPI2 partially converted sepals into petaloid organs in transgenic Arabidopsis, and this petaloid conversion phenomenon was accelerated by combined expression with GsAP3a but not with GsTM6. In contrast, there were no differences in morphology between vector-control plants and transgenic Arabidopsis plants expressing GsAP3a or GsTM6 alone. Transgenic gentian ectopically expressing GsPI2 produced an elongated tubular structure that consisted of an elongated petaloid organ in the first whorl and stunted inner floral organs. These results imply that the heterodimer formation between GsPI2 and GsAP3a plays a core role in determining petal and stamen identities in Japanese gentian, but other B-function genes might be important for the complete development of petal organs.

14.

Background

Previous studies have revealed that the C-terminal region of the S-layer protein from Lactobacillus is responsible for cell wall anchoring, which provides an approach for targeting heterologous proteins to the cell wall of lactic acid bacteria (LAB). In this study, we developed a new surface display system in lactic acid bacteria with the C-terminal region of S-layer protein SlpB of Lactobacillus crispatus K2-4-3 isolated from chicken intestine.

Results

Multiple sequence alignment revealed that the C-terminal region (LcsB) of Lb. crispatus K2-4-3 SlpB had a high similarity with the cell wall binding domains SA and CbsA of Lactobacillus acidophilus and Lb. crispatus. To evaluate the potential application as an anchoring protein, the green fluorescent protein (GFP) or beta-galactosidase (Gal) was fused to the N-terminus of the LcsB region, and the fused proteins were successfully produced in Escherichia coli, respectively. After mixing them with the non-genetically modified lactic acid bacteria cells, the fused GFP-LcsB and Gal-LcsB were functionally associated with the cell surface of various lactic acid bacteria tested. In addition, the binding capacity could be improved by SDS pretreatment. Moreover, both of the fused proteins could simultaneously bind to the surface of a single cell. Furthermore, when the fused DNA fragment of gfp:lcsB was inserted into the Lactococcus lactis expression vector pSec:Leiss:Nuc, the GFP could not be secreted into the medium under the control of the nisA promoter. Western blot, in-gel fluorescence assay, immunofluorescence microscopy and SDS sensitivity analysis confirmed that the GFP was successfully expressed onto the cell surface of L. lactis with the aid of the LcsB anchor.

Conclusion

The LcsB region can be used as a functional scaffold to target heterologous proteins to the cell surfaces of lactic acid bacteria in vitro and in vivo, and also has potential for biotechnological application.

15.

Background

In antibody purification processes, the acidic buffer commonly used to elute the bound antibodies during conventional affinity chromatography can damage the antibody. Herein we describe the development of several types of affinity ligands which enable the purification of antibodies under much milder conditions.

Results

Staphylococcal protein A variants were engineered by using both structure-based design and combinatorial screening methods. The frequency of amino acid residue substitutions was statistically analyzed using the sequences isolated from a histidine-scanning library screening. The positions where the frequency of occurrence of a histidine residue was more than 70% were thought to be effective histidine-mutation sites. Consequently, we identified PAB variants with a D36H mutation whose binding of IgG was highly sensitive to pH change.
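
The frequency analysis described above can be sketched as a per-position count of histidine across the isolated sequences, flagging positions above the 70% cut-off. The function name `histidine_sites` and the five short sequences are made up for illustration; they are not protein A sequences and the positions they yield have no biological meaning.

```python
def histidine_sites(sequences, threshold=0.7):
    """Return (position, frequency) for positions where His frequency exceeds threshold.

    Positions are 1-based; all sequences are assumed to be aligned and equal in length.
    """
    n = len(sequences)
    sites = []
    for pos, column in enumerate(zip(*sequences), start=1):
        freq = column.count("H") / n
        if freq > threshold:
            sites.append((pos, freq))
    return sites

# Hypothetical clones recovered from a library screen (toy data).
clones = ["AHDK", "AHDH", "GHDK", "AHEK", "ADDK"]
print(histidine_sites(clones))
```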

Conclusion

The affinity column elution chromatograms demonstrated that antibodies could be eluted at a higher pH (ΔpH ≥ 2.0) than ever reported (ΔpH = 1.4) when the Staphylococcal protein A variants developed in this study were used as affinity ligands. The interactions between Staphylococcal protein A and IgG-Fab were shown to be important for the behavior of IgG bound on a SpA affinity column, and alterations in the affinity of the ligands for IgG-Fab clearly affected the conditions for eluting the bound IgG. Thus, a histidine-scanning library combined with a structure-based design was shown to be effective in engineering novel pH-sensitive proteins.

16.

Background

Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. In order to study such effects it is important to locate the breakpoint positions with precision.

Results

We define a partially local sequence alignment problem that assumes that, following a rearrangement of a sequence F, two fragments L and R are produced that may exactly fit together to match F, leave a gap of deleted DNA between L and R, or overlap with each other. We show that this alignment problem can be solved by dynamic programming in cubic space and time. We apply the new method to evaluate rearrangements of animal mitogenomes and find that a surprisingly large fraction of these events involved local sequence duplications.

Conclusions

The partially local sequence alignment method is an effective way to investigate the mechanism of genomic rearrangement events. While applied here only to mitogenomes there is no reason why the method could not be used to also consider rearrangements in nuclear genomes.

17.

Background

The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data.

Methods

Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database - rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1 creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2 creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment.

Results

The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed.

Conclusions

Both CRWAlign programs can be used to align the very large volume of RNA sequence data generated by rapid next-generation sequencing technology. This technology is reinforcing the new paradigm that RNA is intimately implicated in a significant number of functions within the cell. In addition, the use of bacterial 16S rRNA sequencing in the identification of the microbiome in many different environmental systems creates a need for rapid and highly accurate alignment of bacterial 16S rRNA sequences.

18.

Background

A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed.

Results

We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t + log σ)) time and just 2n + o(n) + O(max{ℓσ log n, K log m}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure.
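
The role of the union-find structure on the set of reads can be illustrated with a deliberately simplified sketch: reads that share an exact k-mer are merged into one cluster. A plain Python dictionary replaces the bidirectional Burrows-Wheeler index here, so none of the space bounds above apply; the class, the function `cluster_reads`, the merging rule, and the toy reads are all assumptions.

```python
class UnionFind:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def cluster_reads(reads, k):
    """Cluster read indices so that reads sharing any exact k-mer end up together.

    Assumption: a dict from k-mer to the first read containing it stands in
    for queries on a compressed index.
    """
    uf = UnionFind(len(reads))
    first_seen = {}
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if kmer in first_seen:
                uf.union(first_seen[kmer], idx)
            else:
                first_seen[kmer] = idx
    clusters = {}
    for idx in range(len(reads)):
        clusters.setdefault(uf.find(idx), []).append(idx)
    return sorted(clusters.values())

reads = ["ACGTACGT", "TACGTTTT", "GGGGCCCC", "CCCCAAAA"]
print(cluster_reads(reads, k=4))
```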

Conclusions

Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art.

19.
20.