
1.

Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

Michael L Sierk Michael E Smoot Ellen J Bass William R Pearson 《BMC bioinformatics》2010,11(1):146

Background

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.
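To make the notion of an "optimal" alignment score concrete, here is a minimal sketch of the classic Needleman-Wunsch dynamic program for a global alignment score. It is not the method or scoring scheme used in the paper: the function name `global_alignment_score`, the linear gap penalty, and the default parameter values are all assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score under a linear gap penalty.

    Illustrative parameters only; real protein searches typically use a
    substitution matrix and affine gap penalties.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty string
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # The optimal score shifts as the gap penalty changes.
    for gap in (-1, -2, -4):
        print(gap, global_alignment_score("HEAGAWGHEE", "PAWHEAE", gap=gap))
```

Any alignment scoring below the value returned here is suboptimal in the sense used above; the abstract's point is that such alignments may still be the structurally more accurate ones.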

2.

Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems (total citations: 4; self-citations: 0; citations by others: 4)

Grasso C Lee C 《Bioinformatics (Oxford, England)》2004,20(10):1546-1556

MOTIVATION: Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions). RESULTS: We have developed a new Partial Order-Partial Order alignment algorithm that optimally aligns a pair of MSAs and which therefore can be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show the combined Progressive POA alignment method yields results comparable with the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10-30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than possible for previous methods. AVAILABILITY: The POA source code is available at http://www.bioinformatics.ucla.edu/poa

3.

FBSA: feature-based sequence alignment technique for very large sequences

Bellgard M Kenworthy W 《Applied bioinformatics》2003,2(3):145-150

The ability to align pairs of very large molecular sequences is essential for a range of comparative genomic studies. However, given the complexity of genomic sequences, it has been difficult to devise a systematic method that can align, even within the same species, pairs of large sequences. Most existing approaches typically attempt to align nucleotide sequences while ignoring valuable features contained within them; e.g., they filter out low-complexity regions and retroelements before aligning the sequences. Features are then added post-alignment for visualisation and analysis purposes. We argue that repetitive elements and other features (such as genes, exons and regulatory elements) should be part of the alignment process. A hierarchical approach that aligns the biologically relevant features before aligning the detailed nucleotide sequences has a number of interesting characteristics: (1) features define 'alignment anchor points' that can guide meaningful nucleotide alignment; (2) features can be weighted; (3) a hierarchical approach would identify only meaningful regions to be aligned; (4) nucleotide sequences can be described as sequences of features and non-features, providing a natural mechanism to divide the sequences for processing; and (5) computation is significantly faster than with other approaches. In this paper, we describe and discuss a feature-based approach to aligning large genome sequences. We refer to this as 'feature-based sequence alignment'.

4.

Universal sequence map (USM) of arbitrary discrete sequences

Jonas S Almeida Susana Vinga 《BMC bioinformatics》2002,3(1):6-11

Background

For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis, without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units.

Results

We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4-unit-type sequences (like DNA) as an order-free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/.

Conclusions

USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
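The Chaos Game Representation (CGR) on which USM builds can be sketched in a few lines. This is only the classic four-corner CGR iteration for DNA, not the full USM procedure (which the abstract describes as handling arbitrary alphabets). The corner assignment in `CORNERS`, the starting point, and the function name `cgr_coordinates` are assumptions made for illustration; published CGR variants differ in these conventions.

```python
# Assumed corner layout on the unit square; conventions vary between papers.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq, start=(0.5, 0.5)):
    """Return the CGR point reached after each symbol of a DNA sequence.

    Each step moves halfway from the current point toward the corner
    assigned to the next nucleotide.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Two sequences ending in the same 3-mer land in the same 1/8-wide cell.
    p = cgr_coordinates("TTTTACG")[-1]
    q = cgr_coordinates("GGACG")[-1]
    print(p, q)
```

Because every step halves the distance to a corner, two sequences that share their last k symbols end within 2^-k of each other in each coordinate, which is the sense in which map distance reflects local sequence similarity.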

5.

SISEQ: manipulation of multiple sequence and large database files for common platforms

Sato N 《Bioinformatics (Oxford, England)》2000,16(2):180-181


6.

Optimal contact map alignment of protein-protein interfaces

Pulim V Berger B Bienkowska J 《Bioinformatics (Oxford, England)》2008,24(20):2324-2328

The long-standing problem of constructing protein structure alignments is of central importance in computational biology. The main goal is to provide an alignment of residue correspondences, in order to identify homologous residues across chains. A critical next step of this is the alignment of protein complexes and their interfaces. Here, we introduce the program CMAPi, a two-dimensional dynamic programming algorithm that, given a pair of protein complexes, optimally aligns the contact maps of their interfaces: it produces polynomial-time near-optimal alignments in the case of multiple complexes. We demonstrate the efficacy of our algorithm on complexes from PPI families listed in the SCOPPI database and from highly divergent cytokine families. In comparison to existing techniques, CMAPi generates more accurate alignments of interacting residues within families of interacting proteins, especially for sequences with low similarity. While previous methods that use an all-atom based representation of the interface have been successful, CMAPi's use of a contact map representation allows it to be more tolerant to conformational changes and thus to align more of the interaction surface. These improved interface alignments should enhance homology modeling and threading methods for predicting PPIs by providing a basis for generating template profiles for sequence-structure alignment.

7.

Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction

Chu W Ghahramani Z Podtelezhnikov A Wild DL 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(2):98-113

In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of various length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long range interactions in β-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.

8.

Development of polymorphic EST-SSR markers by sequence alignment in Frankliniella occidentalis (Pergande)

《Journal of Asia》2014,17(3):581-585

The western flower thrips, Frankliniella occidentalis, is the most economically important agronomic pest within Thysanoptera because it is both a direct pest of horticulture crops and an efficient vector of plant viruses. Sixty-seven polymorphic SSR loci were identified in the contigs (containing redundant ESTs) generated by assembling 13,839 F. occidentalis ESTs from the public sequence database. Nineteen SSR markers exhibited polymorphism among 860 samples from 43 F. occidentalis populations, with alleles per SSR marker ranging from two to eight; the effective number of alleles (Ne) ranged from 0.73 to 2.64; and the observed (Ho) and expected (He) heterozygosities ranged from 0.09 to 0.77 and 0.12 to 0.96, respectively. The PIC values ranged from 0.24 to 0.73. AMOVA revealed that most genetic variation resided within, rather than between, greenhouse and field isolates. The Mantel test showed no significant differences between genetic and geographical distances. We demonstrated the value of mining the redundant sequences in public sequence databases for the development of polymorphic SSR markers, which can be used for better understanding the population variation and spread of the invasive pest F. occidentalis.
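The per-marker statistics quoted above (Ne, He, PIC) are usually derived from allele frequencies. The sketch below uses the common textbook definitions: expected heterozygosity He = 1 - Σp², effective number of alleles Ne = 1 / Σp², and the Botstein-style polymorphism information content. The abstract does not state which formulas or software the authors used, so treat these definitions and the function name `diversity_stats` as assumptions.

```python
def diversity_stats(freqs):
    """Compute He, Ne and PIC for one locus from its allele frequencies.

    `freqs` must sum to 1. Definitions are the standard textbook ones and
    are assumed here, not taken from the paper.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    he = 1.0 - homozygosity        # expected heterozygosity
    ne = 1.0 / homozygosity        # effective number of alleles
    n = len(freqs)
    # PIC subtracts the share of uninformative matings between
    # identical heterozygotes from He.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return {"He": he, "Ne": ne, "PIC": he - cross}


if __name__ == "__main__":
    # Hypothetical three-allele SSR locus.
    print(diversity_stats([0.6, 0.3, 0.1]))
```

With these definitions, a locus with k equally frequent alleles has Ne = k, so Ne is a convenient way to compare markers with different numbers of alleles.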

9.

The effect of sequence quality on sequence alignment

Malde K 《Bioinformatics (Oxford, England)》2008,24(7):897-900

Motivation: The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses. Results: This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA. Availability: Software is available at http://malde.org/~ketil/qaa. Contact: ketil.malde{at}imr.no Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Limsoon Wong
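One way to see how a quality value could enter a scoring scheme is to weight the match and mismatch scores by the probability that the called base is correct. The sketch below is a hypothetical illustration and not the scoring scheme from the article: it relies only on the standard Phred relation p = 10^(-Q/10), and the function names, the uniform-miscall assumption, and the default scores are assumptions.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality value Q to a base-call error probability."""
    return 10 ** (-q / 10)


def quality_adjusted_score(called, reference, q, match=1.0, mismatch=-1.0):
    """Expected substitution score for one called base against a reference.

    Assumes that when the call is wrong, the true base is equally likely
    to be any of the other three nucleotides (an illustrative assumption).
    """
    p_err = phred_to_error_prob(q)
    if called == reference:
        p_true_match = 1.0 - p_err   # the call is right
    else:
        p_true_match = p_err / 3.0   # the call is wrong and truly matches
    return p_true_match * match + (1.0 - p_true_match) * mismatch


if __name__ == "__main__":
    # A low-quality mismatch is penalised less than a high-quality one.
    for q in (5, 20, 40):
        print(q, quality_adjusted_score("A", "G", q))
```

Under this sketch a high-quality base scores almost exactly like an ordinary match or mismatch, while a low-quality base is pulled toward a neutral score, which is the qualitative behaviour the abstract describes.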

10.

Optimization of sequence alignment for simple sequence repeat regions

Abdulqader Jighly Aladdin Hamwieh Francis C Ogbonnaya 《BMC research notes》2011,4(1):239


11.

Expansion of the pig comparative map by expressed sequence tags (EST) mapping

A. K. Fridolfsson T. Hori A. K. Winterø M. Fredholm M. Yerle A. Robic L. Andersson Hans Ellegren 《Mammalian genome》1997,8(12):907-912


12.

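Research on parametric sequence alignment algorithms

Zhang Taotao Guo Maozu Zou Quan 《生物信息学》 (Chinese Journal of Bioinformatics) 2008,6(2):65-68

Sequence alignment is an important task in bioinformatics; it can reveal functional, structural, and evolutionary information in biological sequences. The biological significance of an alignment result is closely tied to the scoring functions chosen for matches, mismatches, insertions, deletions, and gaps. This paper introduces a parametric sequence alignment method that treats the optimal alignment as a function of the weights and penalties, so that the effect of parameter choice on the optimal alignment can be obtained systematically. The method is then applied to RNA sequence alignment to analyze how different parameter choices affect the alignment results. Finally, the paper points out applications of parametric sequence alignment algorithms and directions for future development.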
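The idea that the optimal alignment is a function of the scoring parameters can be illustrated with a brute-force parameter sweep. The sketch below is not the parametric algorithm from the paper, which derives the dependence systematically rather than by re-running the alignment. It simply re-runs a global dynamic program for several gap penalties and reports the match, mismatch, and gap counts of one optimal alignment. The function name, the linear gap model, and the example sequences are assumptions.

```python
def best_alignment_counts(a, b, match, mismatch, gap):
    """Return (score, matches, mismatches, gaps) of one optimal global alignment.

    The score is match*matches - mismatch*mismatches - gap*gaps, so
    `mismatch` and `gap` are given as positive penalties.
    """
    def extend(cell, kind):
        score, m, x, g = cell
        if kind == "match":
            return (score + match, m + 1, x, g)
        if kind == "mismatch":
            return (score - mismatch, m, x + 1, g)
        return (score - gap, m, x, g + 1)

    # First row: only gaps are possible.
    prev = [(0, 0, 0, 0)]
    for _ in b:
        prev.append(extend(prev[-1], "gap"))
    for i in range(1, len(a) + 1):
        curr = [extend(prev[0], "gap")]
        for j in range(1, len(b) + 1):
            kind = "match" if a[i - 1] == b[j - 1] else "mismatch"
            candidates = [
                extend(prev[j - 1], kind),   # substitution
                extend(prev[j], "gap"),      # gap in b
                extend(curr[j - 1], "gap"),  # gap in a
            ]
            # Ties are broken in favour of the first candidate.
            curr.append(max(candidates, key=lambda c: c[0]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical RNA fragments; watch the counts change with the gap penalty.
    for gap in (0.5, 1.0, 2.0, 4.0):
        print(gap, best_alignment_counts("GCAUGCU", "GAUUACA", 1.0, 1.0, gap))
```

Because the score of any fixed alignment is linear in the parameters, the parameter space splits into regions in which the same alignment stays optimal; a sweep like this only samples those regions, whereas a parametric method characterises their boundaries.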

13.

Fractal MapReduce decomposition of sequence alignment

Almeida JS Grüneberg A Maass W Vinga S 《Algorithms for molecular biology : AMB》2012,7(1):12-12

Background

The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required.

Results

In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison, that is, a solution that does not require dynamic programming and relies instead on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units, with no resort to dynamic programming.

Conclusions

The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing.

Availability

Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".

14.

Model-based prediction of sequence alignment quality

Ahola V Aittokallio T Vihinen M Uusipaikka E 《Bioinformatics (Oxford, England)》2008,24(19):2165-2171


15.

The many faces of sequence alignment (total citations: 9; self-citations: 0; citations by others: 9)

Batzoglou S 《Briefings in bioinformatics》2005,6(1):6-22

Starting with the sequencing of the mouse genome in 2002, we have entered a period where the main focus of genomics will be to compare multiple genomes in order to learn about human biology and evolution at the DNA level. Alignment methods are the main computational component of this endeavour. This short review aims to summarise the current status of research in alignments, emphasising large-scale genomic comparisons and suggesting possible directions that will be explored in the near future.

16.

A genetic linkage map for the ectomycorrhizal fungus Laccaria bicolor and its alignment to the whole-genome sequence assemblies

Labbé J Zhang X Yin T Schmutz J Grimwood J Martin F Tuskan GA Le Tacon F 《The New phytologist》2008,180(2):316-328

A genetic linkage map for the ectomycorrhizal basidiomycete Laccaria bicolor was constructed from 45 sib-homokaryotic haploid mycelial lines derived from the parental S238N strain progeny. For map construction, 294 simple sequence repeats (SSRs), single-nucleotide polymorphisms (SNPs), amplified fragment length polymorphisms (AFLPs) and random amplified polymorphic DNA (RAPD) markers were employed to identify and assay loci that segregated in backcross configuration. Using SNP, RAPD and SSR sequences, the L. bicolor whole-genome sequence (WGS) assemblies were aligned onto the linkage groups. A total of 37.36 Mbp of the assembled sequences was aligned to 13 linkage groups. Most mapped genetic markers used in alignment were colinear with the sequence assemblies, indicating that both the genetic map and sequence assemblies achieved high fidelity. The resulting matrix of recombination rates between all pairs of loci was used to construct an integrated linkage map using JoinMap. The final map consisted of 13 linkage groups spanning 812 centiMorgans (cM) at an average distance of 2.76 cM between markers (range 1.9-17 cM). The WGS and the present linkage map represent an initial step towards the identification and cloning of quantitative trait loci associated with development and functioning of the ectomycorrhizal symbiosis.

17.

Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format

Edmonson MN Zhang J Yan C Finney RP Meerzaman DM Buetow KH 《Bioinformatics (Oxford, England)》2011,27(6):865-866

SUMMARY: Bambino is a variant detector and graphical alignment viewer for next-generation sequencing data in the SAM/BAM format, which is capable of pooling data from multiple source files. The variant detector takes advantage of SAM-specific annotations, and produces detailed output suitable for genotyping and identification of somatic mutations. The assembly viewer can display reads in the context of either a user-provided or automatically generated reference sequence, retrieve genome annotation features from a UCSC genome annotation database, display histograms of non-reference allele frequencies, and predict protein-coding changes caused by SNPs. AVAILABILITY: Bambino is written in platform-independent Java and available from https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html, along with documentation and example data. Bambino may be launched online via Java Web Start or downloaded and run locally.

18.

Human Polyhomeotic Homolog 3 (PHC3) Sterile Alpha Motif (SAM) Linker Allows Open-Ended Polymerization of PHC3 SAM

Robinson AK Leal BZ Nanyes DR Kaur Y Ilangovan U Schirf V Hinck AP Demeler B Kim CA 《Biochemistry》2012,51(27):5379-5386

Sterile alpha motifs (SAMs) are frequently found in eukaryotic genomes. An intriguing property of many SAMs is their ability to self-associate, forming an open-ended polymer structure whose formation has been shown to be essential for the function of the protein. What remains largely unresolved is how polymerization is controlled. Previously, we had determined that the stretch of unstructured residues N-terminal to the SAM of a Drosophila protein called polyhomeotic (Ph), a member of the polycomb group (PcG) of gene silencers, plays a key role in controlling Ph SAM polymerization. Ph SAM with its native linker created shorter polymers compared to Ph SAM attached to either a random linker or no linker. Here, we show that the SAM linker for the human Ph ortholog, polyhomeotic homolog 3 (PHC3), also controls PHC3 SAM polymerization but does so in the opposite fashion. PHC3 SAM with its native linker allows longer polymers to form compared to when attached to a random linker. Attaching the PHC3 SAM linker to Ph SAM also resulted in extending Ph SAM polymerization. Moreover, in the context of full-length Ph protein, replacing the SAM linker with PHC3 SAM linker, intended to create longer polymers, resulted in greater repressive ability for the chimera compared to wild-type Ph. These findings show that polymeric SAM linkers evolved to modulate a wide dynamic range of SAM polymerization abilities and suggest that rationally manipulating the function of SAM containing proteins through controlling their SAM polymerization may be possible.

19.

Partial structure of a large canine cholecystokinin (CCK₅₈): Amino acid sequence

V.E. Eysselein J.R. Reeve J.E. Shively D. Hawke J.H. Walsh 《Peptides》1982,3(4):687-691

A cholecystokinin molecule larger than any previously chemically characterized was purified from canine proximal small intestine mucosa. The purification procedure consisted of sequential steps of affinity chromatography, gel filtration, and high pressure liquid chromatography. Activity was detected and quantitated by radioimmunoassay with an antibody that recognized the carboxyl terminal sequence of porcine cholecystokinin. Microsequencing of the purified peptide revealed an amino terminal nonadecapeptide sequence (AQKVNSGEPRAHLGALLAR) not present in known cholecystokinin molecules followed by a nonadecapeptide sequence (YIQQARKAPSGRMSVIKNL) that corresponds exactly to the amino terminal sequence of porcine cholecystokinin 39 except for reversed positions of a Met and a Val residue. Based on the sequence analysis, immunoreactivity, and presence of biological activity in two bioassay systems, this peptide, tentatively named cholecystokinin 58, may be a biosynthetic precursor of the smaller forms previously characterized in gastrointestinal and brain tissues.

20.

An autotetraploid linkage map of rose (Rosa hybrida) validated using the strawberry (Fragaria vesca) genome sequence (total citations: 1; self-citations: 0; citations by others: 1)

Gar O Sargent DJ Tsai CJ Pleban T Shalev G Byrne DH Zamir D 《PloS one》2011,6(5):e20463

Polyploidy is a pivotal process in plant evolution as it increases gene redundancy and morphological intricacy, but due to the complexity of polysomic inheritance we have only a few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the garden, peach red and fragrant cultivar Fragrant Cloud (FC) and a cut-rose yellow cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (total 632 cM) and GG (616 cM), which were validated by markers that segregated in both parents as well as the diploid integrated consensus map. The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 rose sequenced markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high, with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome.