20 similar articles found; search time: 0 ms
1.
Background
While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.
2.
MOTIVATION: Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions). RESULTS: We have developed a new Partial Order-Partial Order alignment algorithm that optimally aligns a pair of MSAs and which therefore can be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show the combined Progressive POA alignment method yields results comparable with the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10-30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than possible for previous methods. AVAILABILITY: The POA source code is available at http://www.bioinformatics.ucla.edu/poa
3.
The ability to align pairs of very large molecular sequences is essential for a range of comparative genomic studies. However, given the complexity of genomic sequences, it has been difficult to devise a systematic method that can align, even within the same species, pairs of large sequences. Most existing approaches typically attempt to align nucleotide sequences while ignoring valuable features contained within them; e.g., they filter out low-complexity regions and retroelements before aligning the sequences. Features are then added post-alignment for visualisation and analysis purposes. We argue that repetitive elements and other features (such as genes, exons and regulatory elements) should be part of the alignment process. A hierarchical approach that aligns the biologically relevant features before aligning the detailed nucleotide sequences has a number of interesting characteristics: (1) features define 'alignment anchor points' that can guide meaningful nucleotide alignment; (2) features can be weighted; (3) a hierarchical approach would identify only meaningful regions to be aligned; (4) nucleotide sequences can be described as sequences of features and non-features, providing a natural mechanism to divide the sequences for processing; and (5) computation is significantly faster than with other approaches. In this paper, we describe and discuss a feature-based approach to aligning large genome sequences. We refer to this as 'feature-based sequence alignment'.
4.
Background
For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis, without the context of fixed memory length. A simple example would be the ability to infer the homology between two sequences solely by comparing the coordinates of any two homologous units.
Results
We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enables scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4-unit-type sequences (such as DNA) as an order-free Markov chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/.
Conclusions
USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
5.
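The Chaos Game Representation that entry 4 builds on can be illustrated with a short sketch. This is a minimal example of the classic two-dimensional CGR iteration for DNA, not the USM implementation itself: the corner assignment in `CORNERS`, the starting point (0.5, 0.5), and the function names are assumptions chosen for illustration. Each step moves the current point halfway toward the corner of the incoming symbol, so two sequences that end in the same k symbols finish within 2^-k of each other in every coordinate.

```python
# Assumed corner assignment for the DNA alphabet; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each symbol of seq."""
    x, y = start
    coords = []
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        # Move halfway from the current point toward the symbol's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        coords.append((x, y))
    return coords


def max_norm_distance(p, q):
    """Largest coordinate-wise difference between two CGR points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


if __name__ == "__main__":
    # Two hypothetical sequences that share the 7-symbol suffix "GATTACA".
    a = cgr_coordinates("ACGTTGCAGATTACA")[-1]
    b = cgr_coordinates("TTTTGATTACA")[-1]
    print(max_norm_distance(a, b) <= 2 ** -7)
```

Because every shared symbol halves the distance, a small map distance indicates a long shared suffix; the converse is only an estimate, since unrelated prefixes can land close together by chance.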
6.
The long-standing problem of constructing protein structure alignments is of central importance in computational biology. The main goal is to provide an alignment of residue correspondences, in order to identify homologous residues across chains. A critical next step is the alignment of protein complexes and their interfaces. Here, we introduce the program CMAPi, a two-dimensional dynamic programming algorithm that, given a pair of protein complexes, optimally aligns the contact maps of their interfaces: it produces polynomial-time near-optimal alignments in the case of multiple complexes. We demonstrate the efficacy of our algorithm on complexes from PPI families listed in the SCOPPI database and from highly divergent cytokine families. In comparison to existing techniques, CMAPi generates more accurate alignments of interacting residues within families of interacting proteins, especially for sequences with low similarity. While previous methods that use an all-atom based representation of the interface have been successful, CMAPi's use of a contact map representation allows it to be more tolerant of conformational changes and thus to align more of the interaction surface. These improved interface alignments should enhance homology modeling and threading methods for predicting PPIs by providing a basis for generating template profiles for sequence-structure alignment.
7.
Chu W, Ghahramani Z, Podtelezhnikov A, Wild DL 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(2):98-113
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of varying length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long-range interactions in β-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.
8.
《Journal of Asia》2014,17(3):581-585
The western flower thrips, Frankliniella occidentalis, is the most economically important agronomic pest within Thysanoptera because it is both a direct pest of horticulture crops and an efficient vector of plant viruses. Sixty-seven polymorphic SSR loci were identified in the contigs (containing redundant ESTs) generated by assembling 13,839 F. occidentalis ESTs from the public sequence database. Nineteen SSR markers exhibited polymorphism among 860 samples from 43 F. occidentalis populations, with alleles per SSR marker ranging from two to eight, the effective number of alleles (Ne) ranging from 0.73 to 2.64, and the observed (Ho) and expected (He) heterozygosities ranging from 0.09 to 0.77 and 0.12 to 0.96, respectively. The PIC values ranged from 0.24 to 0.73. AMOVA revealed that most genetic variation resided within, rather than between, greenhouse and field isolates. The Mantel test showed no significant differences between genetic and geographical distances. We demonstrated the value of mining the redundant sequences in public sequence databases for the development of polymorphic SSR markers, which can be used for better understanding population variation and spreading of the invasive pest F. occidentalis.
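The marker statistics named in this abstract (He, Ne, PIC) are commonly computed from per-locus allele frequencies. The sketch below uses the textbook definitions: expected heterozygosity He = 1 - Σp², effective number of alleles Ne = 1 / Σp², and the Botstein polymorphism information content. The abstract does not say which estimators or software the authors used, so these formulas, the function names, and the toy genotype data are all assumptions for illustration.

```python
from itertools import combinations


def allele_frequencies(alleles):
    """Relative frequency of each allele in a list of observed alleles."""
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    total = len(alleles)
    return {allele: n / total for allele, n in counts.items()}


def diversity_stats(freqs):
    """Textbook He, Ne and PIC for one locus, given allele frequencies."""
    p = list(freqs.values())
    homozygosity = sum(pi ** 2 for pi in p)  # sum of squared frequencies
    he = 1.0 - homozygosity                  # expected heterozygosity
    ne = 1.0 / homozygosity                  # effective number of alleles
    # Botstein PIC: He minus the uninformative-mating correction term.
    pic = he - sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(p, 2))
    return {"He": he, "Ne": ne, "PIC": pic}


if __name__ == "__main__":
    # Hypothetical alleles scored at a single SSR locus.
    observed = ["148", "148", "152", "152"]
    print(diversity_stats(allele_frequencies(observed)))
```

For two equally frequent alleles this gives He = 0.5, Ne = 2 and PIC = 0.375, which is a convenient hand check.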
9.
Malde K 《Bioinformatics (Oxford, England)》2008,24(7):897-900
Motivation: The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses. Results: This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA. Availability: Software is available at http://malde.org/~ketil/qaa. Contact: ketil.malde{at}imr.no Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
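One way to picture a quality-aware scoring scheme is to weight a substitution score by the probability that the base call is wrong. The sketch below is a simplified expected-score illustration, not the formulation used in the article: it converts a Phred quality value Q to an error probability 10^(-Q/10), assumes an erroneous call is equally likely to hide any of the other three bases, and uses hypothetical match and mismatch scores of +1 and -1.

```python
BASES = "ACGT"


def phred_to_error(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)


def quality_adjusted_score(observed, reference, q, match=1.0, mismatch=-1.0):
    """Expected substitution score of a base call against a reference base.

    Assumption: with probability e the call is wrong, and the true base is
    then each of the three other bases with probability e / 3.
    """
    e = phred_to_error(q)

    def plain_score(base):
        return match if base == reference else mismatch

    others = [b for b in BASES if b != observed]
    return (1 - e) * plain_score(observed) + (e / 3) * sum(
        plain_score(b) for b in others
    )


if __name__ == "__main__":
    for q in (10, 20, 40):
        print(q, quality_adjusted_score("A", "A", q),
              quality_adjusted_score("A", "C", q))
```

Under these assumptions a low-quality match earns less than the full match score and a low-quality mismatch is penalised slightly less, while high-quality calls approach the unadjusted scores. The extra work per cell is a few multiplications, consistent with the small constant-factor cost the abstract mentions.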
10.
11.
12.
13.
Background
The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, position the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required.
Results
In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison. That is, the solution does not require dynamic programming and relies instead on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units, with no resort to dynamic programming.
Conclusions
The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing.
Availability
Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".
14.
15.
The many faces of sequence alignment (total citations: 9, self-citations: 0, citations by others: 9)
Batzoglou S 《Briefings in bioinformatics》2005,6(1):6-22
Starting with the sequencing of the mouse genome in 2002, we have entered a period where the main focus of genomics will be to compare multiple genomes in order to learn about human biology and evolution at the DNA level. Alignment methods are the main computational component of this endeavour. This short review aims to summarise the current status of research in alignments, emphasising large-scale genomic comparisons and suggesting possible directions that will be explored in the near future.
16.
Labbé J, Zhang X, Yin T, Schmutz J, Grimwood J, Martin F, Tuskan GA, Le Tacon F 《The New phytologist》2008,180(2):316-328
A genetic linkage map for the ectomycorrhizal basidiomycete Laccaria bicolor was constructed from 45 sib-homokaryotic haploid mycelial lines derived from the parental S238N strain progeny. For map construction, 294 simple sequence repeats (SSRs), single-nucleotide polymorphisms (SNPs), amplified fragment length polymorphisms (AFLPs) and random amplified polymorphic DNA (RAPD) markers were employed to identify and assay loci that segregated in backcross configuration. Using SNP, RAPD and SSR sequences, the L. bicolor whole-genome sequence (WGS) assemblies were aligned onto the linkage groups. A total of 37.36 Mbp of the assembled sequences was aligned to 13 linkage groups. Most mapped genetic markers used in alignment were colinear with the sequence assemblies, indicating that both the genetic map and sequence assemblies achieved high fidelity. The resulting matrix of recombination rates between all pairs of loci was used to construct an integrated linkage map using JoinMap. The final map consisted of 13 linkage groups spanning 812 centiMorgans (cM) at an average distance of 2.76 cM between markers (range 1.9-17 cM). The WGS and the present linkage map represent an initial step towards the identification and cloning of quantitative trait loci associated with development and functioning of the ectomycorrhizal symbiosis.
17.
Edmonson MN, Zhang J, Yan C, Finney RP, Meerzaman DM, Buetow KH 《Bioinformatics (Oxford, England)》2011,27(6):865-866
SUMMARY: Bambino is a variant detector and graphical alignment viewer for next-generation sequencing data in the SAM/BAM format, which is capable of pooling data from multiple source files. The variant detector takes advantage of SAM-specific annotations, and produces detailed output suitable for genotyping and identification of somatic mutations. The assembly viewer can display reads in the context of either a user-provided or automatically generated reference sequence, retrieve genome annotation features from a UCSC genome annotation database, display histograms of non-reference allele frequencies, and predict protein-coding changes caused by SNPs. AVAILABILITY: Bambino is written in platform-independent Java and available from https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html, along with documentation and example data. Bambino may be launched online via Java Web Start or downloaded and run locally.
18.
Robinson AK, Leal BZ, Nanyes DR, Kaur Y, Ilangovan U, Schirf V, Hinck AP, Demeler B, Kim CA 《Biochemistry》2012,51(27):5379-5386
Sterile alpha motifs (SAMs) are frequently found in eukaryotic genomes. An intriguing property of many SAMs is their ability to self-associate, forming an open-ended polymer structure whose formation has been shown to be essential for the function of the protein. What remains largely unresolved is how polymerization is controlled. Previously, we had determined that the stretch of unstructured residues N-terminal to the SAM of a Drosophila protein called polyhomeotic (Ph), a member of the polycomb group (PcG) of gene silencers, plays a key role in controlling Ph SAM polymerization. Ph SAM with its native linker created shorter polymers compared to Ph SAM attached to either a random linker or no linker. Here, we show that the SAM linker for the human Ph ortholog, polyhomeotic homolog 3 (PHC3), also controls PHC3 SAM polymerization but does so in the opposite fashion. PHC3 SAM with its native linker allows longer polymers to form compared to when attached to a random linker. Attaching the PHC3 SAM linker to Ph SAM also resulted in extending Ph SAM polymerization. Moreover, in the context of full-length Ph protein, replacing the SAM linker with PHC3 SAM linker, intended to create longer polymers, resulted in greater repressive ability for the chimera compared to wild-type Ph. These findings show that polymeric SAM linkers evolved to modulate a wide dynamic range of SAM polymerization abilities and suggest that rationally manipulating the function of SAM-containing proteins through controlling their SAM polymerization may be possible.
19.
A cholecystokinin molecule larger than any previously chemically characterized was purified from canine proximal small intestine mucosa. The purification procedure consisted of sequential steps of affinity chromatography, gel filtration, and high pressure liquid chromatography. Activity was detected and quantitated by radioimmunoassay with an antibody that recognized the carboxyl terminal sequence of porcine cholecystokinin. Microsequencing of the purified peptide revealed an amino terminal nonadecapeptide sequence (AQKVNSGEPRAHLGALLAR) not present in known cholecystokinin molecules followed by a nonadecapeptide sequence (YIQQARKAPSGRMSVIKNL) that corresponds exactly to the amino terminal sequence of porcine cholecystokinin 39 except for reversed positions of a Met and a Val residue. Based on the sequence analysis, immunoreactivity, and presence of biological activity in two bioassay systems, this peptide, tentatively named cholecystokinin 58, may be a biosynthetic precursor of the smaller forms previously characterized in gastrointestinal and brain tissues.
20.
An autotetraploid linkage map of rose (Rosa hybrida) validated using the strawberry (Fragaria vesca) genome sequence (total citations: 1, self-citations: 0, citations by others: 1)
Polyploidy is a pivotal process in plant evolution as it increases gene redundancy and morphological intricacy, but due to the complexity of polysomic inheritance we have only a few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the garden, peach red and fragrant cultivar Fragrant Cloud (FC) and a cut-rose yellow cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (total 632 cM) and GG (616 cM), which were validated by markers that segregated in both parents as well as the diploid integrated consensus map. The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 rose sequenced markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome.