
1.

Systematic exploration of guide-tree topology effects for small protein alignments

Fabian Sievers Graham M Hughes Desmond G Higgins 《BMC bioinformatics》2014,15(1)

Background

Guide-trees are used as part of an essential heuristic to enable the calculation of multiple sequence alignments. They have been the focus of much method development, but there has been little effort to determine systematically which guide-trees, if any, give the best alignments. Some guide-tree construction schemes are based on pair-wise distances amongst unaligned sequences. Others try to emulate an underlying evolutionary tree and involve various iteration methods.

Results

We explore all possible guide-trees for a set of protein alignments of up to eight sequences. We find that pairwise distance based default guide-trees sometimes outperform evolutionary guide-trees, as measured by structure-derived reference alignments. However, default guide-trees fall well short of the optimum attainable scores. On average, chained guide-trees perform better than balanced ones but are not better than default guide-trees for small alignments.
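Exhaustive exploration is only practical for small sets because the number of distinct rooted binary guide-trees on n labelled sequences is (2n-3)!!, already 135,135 for n = 8. The Python sketch below is our own illustration, not the authors' code: it enumerates the trees as nested tuples and counts the fully chained (caterpillar) ones. Building an alignment along each tree and scoring it against a reference is out of scope.

```python
from itertools import combinations


def n_rooted_trees(n):
    """Number of rooted binary trees on n labelled leaves: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # product of the odd numbers 3, 5, ..., 2n - 3
        count *= k
    return count


def rooted_trees(leaves):
    """Yield every rooted binary tree on `leaves` as nested 2-tuples."""
    leaves = tuple(leaves)
    if len(leaves) == 1:
        yield leaves[0]
        return
    first, rest = leaves[0], leaves[1:]
    # The subtree holding `first` is always written on the left, so each tree appears once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            right = tuple(x for x in rest if x not in extra)
            for left_tree in rooted_trees((first,) + extra):
                for right_tree in rooted_trees(right):
                    yield (left_tree, right_tree)


def is_chained(tree):
    """True for a fully chained (caterpillar) tree: every internal node has a leaf child."""
    if not isinstance(tree, tuple):
        return True
    left, right = tree
    has_leaf_child = not isinstance(left, tuple) or not isinstance(right, tuple)
    return has_leaf_child and is_chained(left) and is_chained(right)


all_trees = list(rooted_trees("ABCDEFGH"))
print(len(all_trees), sum(map(is_chained, all_trees)))  # 135135 20160
```

Fixing the side that holds the first leaf is what prevents mirror images from being counted twice; chained trees make up n!/2 of the total, so most trees for eight sequences are neither fully chained nor fully balanced.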

Conclusions

Alignment methods that use consistency or hidden Markov models to make alignments are less susceptible to sub-optimal guide-trees than simpler methods that essentially use conventional sequence alignment between profiles. The latter appear to be affected positively by evolutionary based guide-trees for difficult alignments and negatively for easy alignments. One phylogeny-aware alignment program can strongly discriminate between good and bad guide-trees. The results for randomly chained guide-trees improve with the number of sequences.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-338) contains supplementary material, which is available to authorized users.

2.

REDHORSE-REcombination and Double crossover detection in Haploid Organisms using next-geneRation SEquencing data

Jahangheer S Shaik Asis Khan Stephen M Beverley L David Sibley 《BMC genomics》2015,16(1)

Background

Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges, as the alignments of next-generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform-independent suite that provides efficient analysis of genetic crosses based on next-generation sequencing data.

Results

We demonstrated the utility of REDHORSE using simulated data and real next-generation sequencing data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess the performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the read alignments from NGS to the genome to generate multiple sequence alignments. The recombination detection algorithm in REDHORSE detected conventional crossovers and double crossovers typically associated with gene conversions, while filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers.
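The abstract does not give REDHORSE's detection rules, so the following is only a generic sketch of the idea: markers ordered by genomic position are labelled with their parent of origin, runs supported by too few markers are discarded as artifacts, a short tract of one parent flanked by the other parent on both sides is called a double crossover, and any other switch is a conventional crossover. The thresholds `min_markers` and `max_dco_bp` are hypothetical.

```python
def tracts(markers):
    """Collapse (position, parent) markers, sorted by position, into runs of one parent."""
    runs = []
    for pos, parent in markers:
        if runs and runs[-1][0] == parent:
            runs[-1][2] = pos   # extend the current run
            runs[-1][3] += 1    # one more supporting marker
        else:
            runs.append([parent, pos, pos, 1])  # [parent, first_pos, last_pos, n_markers]
    return runs


def call_events(markers, min_markers=2, max_dco_bp=10_000):
    """Call crossovers and double crossovers from parental-origin markers (illustrative only)."""
    merged = []
    for run in tracts(markers):
        if run[3] < min_markers:
            continue  # too few markers: treat as a sequencing/alignment artifact
        if merged and merged[-1][0] == run[0]:
            merged[-1][2] = run[2]   # neighbours now touch: fuse them
            merged[-1][3] += run[3]
        else:
            merged.append(list(run))
    events, i = [], 1
    while i < len(merged):
        prev, cur = merged[i - 1], merged[i]
        nxt = merged[i + 1] if i + 1 < len(merged) else None
        if nxt and nxt[0] == prev[0] and cur[2] - cur[1] <= max_dco_bp:
            # short tract of the other parent flanked by the same parent on both sides
            events.append(("double_crossover", cur[1], cur[2]))
            i += 2
        else:
            # breakpoint lies between the last marker of prev and the first marker of cur
            events.append(("crossover", prev[2], cur[1]))
            i += 1
    return events
```

Filtering single-marker runs before merging is what lets an isolated miscalled marker disappear instead of producing two spurious crossovers.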

Conclusion

REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for next-generation sequencing data. REDHORSE is a portable, platform-independent, Java-based utility that provides efficient analysis of genetic crosses based on next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.

3.

Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information

Xin Deng Jianlin Cheng 《BMC bioinformatics》2014,15(1)

Background

Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis.

Results

In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent.
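To make the idea concrete, here is a minimal sketch of a column-pair score that adds predicted solvent-accessibility and torsion-angle agreement to a profile term. The abstract does not say how the HMM-based method combines these features; the dot-product profile term, the cosine torsion term and the weights `w_rsa` and `w_tor` are our assumptions.

```python
import math


def profile_term(p, q):
    """Similarity of two amino-acid frequency columns (dot product; one simple choice)."""
    return sum(a * b for a, b in zip(p, q))


def torsion_similarity(a, b):
    """Mean cosine of (phi, psi) differences in degrees: 1 = identical, -1 = opposite."""
    return sum(math.cos(math.radians(x - y)) for x, y in zip(a, b)) / len(a)


def column_score(p, q, rsa_p, rsa_q, tor_p, tor_q, w_rsa=0.5, w_tor=0.5):
    """Additive score for aligning profile column p with q; weights are hypothetical."""
    rsa_term = 1.0 - abs(rsa_p - rsa_q)  # predicted relative solvent accessibility in [0, 1]
    return profile_term(p, q) + w_rsa * rsa_term + w_tor * torsion_similarity(tor_p, tor_q)
```

The cosine handles the periodicity of angles, so -179 and 179 degrees count as nearly identical; such a score could replace the plain profile score inside any dynamic-programming aligner.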

Conclusion

Incorporating the new structural information such as predicted solvent accessibility and torsion angles into the profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.

4.

progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement (total citations: 1; self-citations: 0; citations by others: 1)

Aaron E. Darling Bob Mau Nicole T. Perna 《PloS one》2010,5(6)

Background

Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.

Methodology/Principal Findings

We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy, which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence.
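Once a whole-genome alignment is split into homologous blocks, core- and pan-genome sizes follow from which genomes each block is present in. The sketch below assumes blocks are already available as (length, set of genome indices) pairs; it does not parse progressiveMauve output, and giving each block a single length is a simplification, since a block's length usually differs slightly between genomes.

```python
def core_and_pan(blocks, n_genomes):
    """Core- and pan-genome size from aligned blocks.

    blocks: list of (length_bp, genomes) pairs, where `genomes` is the set of genome
    indices in which the block is present. Each homologous block is counted once.
    """
    core = sum(length for length, genomes in blocks if len(genomes) == n_genomes)
    pan = sum(length for length, _ in blocks)
    return core, pan


def presence_spectrum(blocks, n_genomes):
    """Total block length shared by exactly k genomes, for k = 1..n_genomes."""
    spectrum = [0] * (n_genomes + 1)
    for length, genomes in blocks:
        spectrum[len(genomes)] += length
    return spectrum[1:]
```

Because blocks include non-coding sequence, this definition covers intergenic regions as well as annotated genes, which is the extension the abstract describes.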

Conclusions

The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.

5.

SFESA: a web server for pairwise alignment refinement by secondary structure shifts

Jing Tong Jimin Pei Nick V. Grishin 《BMC bioinformatics》2015,16(1)

Background

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate.

Results

We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server searches a sequence database for the closest homolog with an available 3D structure to use as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. SFESA also improves alignments generated by other software.
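The shift-and-rescore loop described above can be sketched as follows. Assumptions: an alignment is a dict mapping template positions to query positions, a block is the (start, end) template range of one secondary-structure element, and `score` is a hypothetical callable standing in for SFESA's combined profile-profile and contact-energy score, which is not implemented here.

```python
def is_valid(pairs, query_len):
    """pairs maps template position -> query position; both orders must agree."""
    qs = [pairs[t] for t in sorted(pairs)]
    return all(0 <= q < query_len for q in qs) and all(a < b for a, b in zip(qs, qs[1:]))


def shift_block(pairs, block, shift):
    """Move the query positions aligned to template positions in block = (start, end)."""
    start, end = block
    return {t: q + shift if start <= t <= end else q for t, q in pairs.items()}


def refine_block(pairs, block, score, query_len, max_shift=3):
    """Try every local shift of one block and keep the best-scoring valid variant."""
    best_pairs, best_shift, best_score = pairs, 0, score(pairs)
    for shift in range(-max_shift, max_shift + 1):
        cand = shift_block(pairs, block, shift)
        if shift and is_valid(cand, query_len) and score(cand) > best_score:
            best_pairs, best_shift, best_score = cand, shift, score(cand)
    return best_pairs, best_shift
```

Rejecting shifts that break collinearity or run off the query keeps every variant a legal alignment, so only the scoring function decides between them.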

Conclusions

SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.

6.

ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach

Dimitrios P Lyras Dirk Metzler 《BMC bioinformatics》2014,15(1)

Background

Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem can be solved efficiently when only two sequences are considered, exact inference of the optimal alignment quickly becomes computationally intractable for multiple sequences. To cope with the high computational cost, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. However, these methods cannot revise the alignment of an already aligned group of sequences in light of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of alignments produced by popular aligners. We call ReformAlign a meta-aligner because it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is straightforward: first, an existing alignment is used to construct a standard profile that summarizes the initial alignment, and then all sequences are individually re-aligned against this profile. From each sequence-profile comparison, the alignment of the sequence against the profile is recorded, and the final alignment is inferred indirectly by merging all the individual sub-alignments into a unified set. Using ReformAlign may often result in alignments that are significantly more accurate than the starting alignments.
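The first two steps (profile construction, then re-aligning each sequence against it) can be sketched as below. This is not ReformAlign's implementation: the frequency-based match score, the linear gap penalty `gap` and the returned column mapping are simplifying assumptions, and the final merging of sub-alignments is omitted.

```python
from collections import Counter


def build_profile(alignment):
    """Per-column residue frequencies, ignoring gap characters."""
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        profile.append({r: c / len(residues) for r, c in counts.items()})
    return profile


def align_to_profile(seq, profile, gap=-0.5):
    """Global sequence-to-profile alignment; returns the profile column of each residue.

    A residue scores its frequency in the column; None marks a residue placed in a gap.
    """
    n, m = len(seq), len(profile)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    columns, i, j = [None] * n, n, m
    while i > 0 and j > 0:  # traceback, preferring match, then insertion, then deletion
        if dp[i][j] == dp[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0):
            columns[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return columns
```

Because every sequence is placed against the same profile columns, the per-sequence mappings share a coordinate system, which is what makes the later merge into one alignment possible.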

Results

We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign results in more substantial improvement in cases where the starting alignment is of relatively inferior quality or when the input sequences are harder to align.

Conclusions

The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-265) contains supplementary material, which is available to authorized users.

7.

Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

Russell J. Dickson Lindi M. Wahl Andrew D. Fernandes Gregory B. Gloor 《PloS one》2010,5(6)

Background

There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

Methodology/Principal Findings

We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.
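Mutual information (MI) between two alignment columns is a common starting point for covariation analysis. It is used here only as a familiar example and is not claimed to be one of the statistics the authors propose. The sketch shows what such a statistic measures: a joint residue pattern across sequences, which a misaligned residue in either column disrupts.

```python
import math
from collections import Counter


def column_mi(col_i, col_j):
    """Mutual information (nats) between two alignment columns given as equal-length strings."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    return sum((c / n) * math.log((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
               for (a, b), c in p_ij.items())
```

Perfectly covarying columns such as "AAKK" and "DDEE" give log 2, whereas "AAKK" and "DEDE" give 0; shifting one sequence by a single residue can move a pair of columns between these extremes.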

Conclusions/Significance

Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors because they emphasize pairwise covariation over group covariation, providing increased confidence in contact prediction when analyzing alignments with erroneous regions.

8.

YOC, a new strategy for pairwise alignment of collinear genomes

Raluca Uricaru Célia Michotey Hélène Chiapello Eric Rivals 《BMC bioinformatics》2015,16(1)

Background

Comparing and aligning genomes is a key step in analyzing closely related genomes. Despite the development of many genome aligners in the last 15 years, the problem is not yet fully resolved, even when aligning closely related bacterial genomes of the same species. In addition, no procedures are available to assess the quality of genome alignments or to compare genome aligners.

Results

We designed an original method for pairwise genome alignment, named YOC, which employs a highly sensitive similarity detection method together with a recent collinear chaining strategy that allows overlaps. YOC improves the reliability of collinear genome alignments, while preserving or even improving sensitivity. We also propose an original qualitative evaluation criterion for measuring the relevance of genome alignments. We used this criterion to compare and benchmark YOC with five recent genome aligners on large bacterial genome datasets, and showed it is suitable for identifying the specificities and the potential flaws of their underlying strategies.
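YOC's chaining step allows consecutive anchors to overlap. A minimal O(n^2) dynamic-programming sketch of that idea follows. It assumes anchors are exact matches given as half-open intervals of equal length in both genomes, and it charges an overlap by subtracting the larger of the two overlaps; YOC's actual scoring and its similarity detection are not reproduced.

```python
def chain_with_overlaps(anchors):
    """Best collinear chain of anchors (x_start, x_end, y_start, y_end), overlaps allowed."""
    if not anchors:
        return [], 0
    anchors = sorted(anchors)
    best = [a[1] - a[0] for a in anchors]  # best chain score ending at each anchor
    prev = [None] * len(anchors)
    for j, (xs, xe, ys, ye) in enumerate(anchors):
        length = xe - xs
        for i in range(j):
            pxs, pxe, pys, pye = anchors[i]
            # collinear: anchor i starts and ends before j in both genomes (overlap allowed)
            if pxs < xs and pxe < xe and pys < ys and pye < ye:
                overlap = max(0, pxe - xs, pye - ys)
                if best[i] + length - overlap > best[j]:
                    best[j], prev[j] = best[i] + length - overlap, i
    j = max(range(len(anchors)), key=best.__getitem__)
    score, chain = best[j], []
    while j is not None:
        chain.append(anchors[j])
        j = prev[j]
    return chain[::-1], score
```

Subtracting the overlap means the score equals the number of positions covered once, so overlapping anchors are allowed without being double-counted.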

Conclusions

The YOC prototype is available at https://github.com/ruricaru/YOC. It has several advantages over existing genome aligners: (1) it is based on a simplified two-phase alignment strategy, (2) it is easy to parameterize, and (3) it produces reliable genome alignments, which are easier to analyze and to use.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0530-3) contains supplementary material, which is available to authorized users.

9.

5-Methoxyleoligin, a Lignan from Edelweiss, Stimulates CYP26B1-Dependent Angiogenesis In Vitro and Induces Arteriogenesis in Infarcted Rat Hearts In Vivo

Barbara Messner Johann Kern Dominik Wiedemann Stefan Schwaiger Adrian Türkcan Christian Ploner Alexander Trockenbacher Klaus Aumayr Nikolaos Bonaros Günther Laufer Hermann Stuppner Gerold Untergasser David Bernhard 《PloS one》2013,8(3)

Background

Insufficient angiogenesis and arteriogenesis in cardiac tissue after myocardial infarction (MI) is a significant factor hampering the functional recovery of the heart. To overcome this problem we screened for compounds capable of stimulating angiogenesis, and herein investigate the most active molecule, 5-Methoxyleoligin (5ML), in detail.

Methods and Results

5ML potently stimulated endothelial tube formation, angiogenic sprouting, and angiogenesis in a chicken chorioallantoic membrane assay. Further, microarray- and knockdown-based analyses revealed that 5ML induces angiogenesis by upregulating CYP26B1. In an in vivo rat MI model, 5ML potently increased the number of arterioles in the peri-infarction and infarction area, reduced myocardial muscle loss, and led to a significant increase in LV function (+21% at 28 days after MI).

Conclusion

The present study shows that 5ML induces CYP26B1-dependent angiogenesis in vitro, and arteriogenesis in vivo. Whether or not CYP26B1 is relevant for in vivo arteriogenesis is not clear at the moment. Importantly, 5ML-induced arteriogenesis in vivo makes the compound even more interesting for a post-MI therapy. 5ML may constitute the first low molecular weight compound leading to an improvement of myocardial function after MI.

10.

AliGROOVE – visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support

Patrick Kück Sandra A Meid Christian Groß Johann W Wägele Bernhard Misof 《BMC bioinformatics》2014,15(1)

Background

Masking of multiple sequence alignment blocks has become a powerful method to enhance the tree-likeness of the underlying data. However, existing masking approaches are insensitive to heterogeneous sequence divergence which can mislead tree reconstructions. We present AliGROOVE, a new method based on a sliding window and a Monte Carlo resampling approach, that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa within a multiple sequence alignment and tags suspicious branches on a given tree.
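A sliding window combined with Monte Carlo resampling can be illustrated for a single aligned pair: within each window, compare the observed identity with identities obtained after randomly permuting one sequence, and report how many windows are no better than random. This is our own simplified sketch, not AliGROOVE's scoring; the within-window permutation null, the window size `w` and the 95th-percentile cut-off are assumptions.

```python
import random


def window_identity(a, b):
    """Fraction of identical residues among columns where neither sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in sites) / len(sites) if sites else 0.0


def randomness_score(a, b, w=20, n_rand=100, seed=0):
    """Share of sliding windows whose identity is no better than shuffled windows.

    Values near 1 suggest the aligned pair is essentially random in most windows.
    """
    rng = random.Random(seed)
    n_windows = len(a) - w + 1
    random_like = 0
    for s in range(n_windows):
        wa, wb = a[s:s + w], list(b[s:s + w])
        observed = window_identity(wa, wb)
        null = []
        for _ in range(n_rand):
            rng.shuffle(wb)  # Monte Carlo resampling: permute one sequence within the window
            null.append(window_identity(wa, wb))
        null.sort()
        if observed <= null[int(0.95 * (n_rand - 1))]:
            random_like += 1
    return random_like / n_windows
```

Computing such a score for every pair of taxa yields a matrix in which a taxon with predominantly random similarity to the others stands out.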

Results

We used simulated multiple sequence alignments to show that the extent of alignment ambiguity in pairwise sequence comparison is correlated with the frequency of misplaced taxa in tree reconstructions. The approach implemented in AliGROOVE makes it possible to detect nodes within a tree that are supported despite the absence of phylogenetic signal in the underlying multiple sequence alignment. We show that AliGROOVE detects heterogeneous sequence divergence equally well in a case study based on an empirical data set of mitochondrial DNA sequences of chelicerates.

Conclusions

The AliGROOVE approach has the potential to identify single taxa or subsets of taxa which show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment. It also offers a novel way to evaluate the reliability of node support.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-294) contains supplementary material, which is available to authorized users.

11.

Gene Classification Based on Amino Acid Motifs and Residues: The DLX (distal-less) Test Case

Nuno A. Fonseca Cristina P. Vieira Jorge Vieira 《PloS one》2009,4(6)

Background

Comparative studies using hundreds of sequences can give a detailed picture of the evolution of a given gene family. Nevertheless, retrieving only the sequences of interest from public databases can be difficult, in particular, when working with highly divergent sequences. The difficulty increases substantially when one wants to include in the study sequences from many (or less well studied) species whose genomes are non-annotated or incompletely annotated.

Methodology/Principal Findings

In this work we evaluate the usefulness of different approaches of gene retrieval and classification, using the distal-less (DLX) gene family as a test case. Furthermore, we evaluate whether the use of a large number of gene sequences from a wide range of animal species, the use of multiple alternative alignments, and the use of amino acids aligned with high confidence only, is enough to recover the accepted DLX evolutionary history.

Conclusions/Significance

The canonical DLX homeobox gene sequence derived here, together with the characteristic amino acid variants identified here in the DLX homeodomain region, can be used to retrieve and classify DLX genes in a simple and efficient way. A program is made available that allows the easy retrieval of synteny information that can be used to classify gene sequences. Maximum likelihood trees using hundreds of sequences can be used for gene identification. Nevertheless, for the DLX case, the proposed DLX evolutionary history is not recovered even when multiple alignment algorithms are used.
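One simple way to use a canonical sequence and characteristic residues for retrieval is sketched below: derive a majority-rule consensus from an alignment, then test whether a new aligned sequence carries the diagnostic residues at given columns. The signature positions and residues in the example are hypothetical, not the DLX variants reported in the paper.

```python
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus of an alignment (ties go to the residue seen first)."""
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        result.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(result)


def has_signature(aligned_seq, signature):
    """True if every diagnostic column holds one of its allowed residues."""
    return all(aligned_seq[col] in allowed for col, allowed in signature.items())
```

For example, `has_signature(seq, {0: {"A"}, 2: {"D", "E"}})` accepts sequences with A at column 0 and D or E at column 2, so a handful of diagnostic columns can act as a quick pre-filter before tree-based classification.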

12.

MDAT- Aligning multiple domain arrangements

Carsten Kemena Tristan Bitard-Feildel Erich Bornberg-Bauer 《BMC bioinformatics》2015,16(1)

Background

Proteins are composed of domains, protein segments that fold independently from the rest of the protein and have a specific function. During evolution the arrangement of domains can change: domains are gained, lost or their order is rearranged. To facilitate the analysis of these changes we propose the use of multiple domain alignments.

Results

We developed an alignment program, called MDAT, which aligns multiple domain arrangements. MDAT extends earlier programs which perform pairwise alignments of domain arrangements. MDAT uses a domain similarity matrix to score domain pairs and aligns the domain arrangements using a consistency supported progressive alignment method.
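The core ingredient named above, scoring domain pairs with a similarity matrix while aligning arrangements, can be shown with a pairwise global (Needleman-Wunsch) alignment over domain identifiers. This is a simplified sketch, not MDAT's code: the similarity function, the gap penalty and the Pfam-style identifiers in the example are placeholders, and the consistency and progressive multiple-alignment steps are omitted.

```python
def align_arrangements(a, b, sim, gap=-1.0):
    """Global alignment of two domain arrangements; returns (aligned pairs, score)."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # traceback; "-" marks a domain gained or lost
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sim(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    return pairs[::-1], dp[n][m]
```

A gap opposite a domain corresponds directly to the gains and losses the Background describes, which is why alignment is a natural fit for comparing arrangements.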

Conclusion

MDAT will be useful for analysing changes in domain arrangements within and between protein families and will thus provide valuable insights into the evolution of proteins and their domains. MDAT is coded in C++, and the source code is freely available for download at http://www.bornberglab.org/pages/mdat.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0442-7) contains supplementary material, which is available to authorized users.

13.

W-Curve Alignments for HIV-1 Genomic Comparisons

Douglas J. Cork Steven Lembark Sodsai Tovanabutra Merlin L. Robb Jerome H. Kim 《PloS one》2010,5(6)


14.

BED estimates of HIV incidence: resolving the differences, making things simpler

Hargrove J van Schalkwyk C Eastwood H 《PloS one》2012,7(1):e29736

Objective

Develop a simple method for optimal estimation of HIV incidence using the BED capture enzyme immunoassay.

Design

Use existing BED data to estimate mean recency duration, false recency rates and HIV incidence with reference to a fixed time period, T.

Methods

Compare BED and cohort estimates of incidence referring to identical time frames. Generalize this approach to suggest a method for estimating HIV incidence from any cross-sectional survey.

Results

Follow-up and BED analyses of the same, initially HIV negative, cases followed over the same set time period T, produce estimates of the same HIV incidence, permitting the estimation of the BED mean recency period for cases who have been HIV positive for less than T. Follow-up of HIV positive cases over T, similarly, provides estimates of the false-recent rate appropriate for T. Knowledge of these two parameters for a given population allows the estimation of HIV incidence during T by applying the BED method to samples from cross-sectional surveys. An algorithm is derived for providing these estimates, adjusted for the false-recent rate. The resulting estimator is identical to one derived independently using a more formal mathematical analysis. Adjustments improve the accuracy of HIV incidence estimates. Negative incidence estimates result from the use of inappropriate estimates of the false-recent rate and/or from sampling error, not from any error in the adjustment procedure.
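The adjustment can be illustrated with a simplified derivation that is ours, not the paper's, so its details may differ from the published estimator. Assume incidence I is constant over the period T and the N HIV-negatives are the population at risk. About I*N*T of the P positives were then infected within T, contributing I*N*Omega_T recent classifications, where Omega_T is the mean recency period. The remaining P - I*N*T positives are misclassified as recent at the false-recent rate epsilon. Setting R = I*N*Omega_T + epsilon*(P - I*N*T) and solving gives I = (R - epsilon*P) / (N*(Omega_T - epsilon*T)).

```python
def bed_incidence(n_recent, n_positive, n_negative, omega_T, epsilon, T):
    """Incidence per person-year from one cross-sectional survey (simplified sketch).

    n_recent:   R, positives classified as recent by BED
    n_positive: P, all HIV positives in the survey
    n_negative: N, HIV negatives (population at risk)
    omega_T:    mean recency period (years) for people infected less than T ago
    epsilon:    false-recent rate among people infected more than T ago
    T:          reference period (years)
    """
    denominator = n_negative * (omega_T - epsilon * T)
    if denominator <= 0:
        raise ValueError("need omega_T > epsilon * T")
    return (n_recent - epsilon * n_positive) / denominator
```

The numerator shows how a negative estimate arises: it happens whenever R < epsilon*P, for example when epsilon is overestimated or R is low by chance. This matches the abstract's point that negative estimates come from the false-recent rate or sampling error rather than from the adjustment itself.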

Conclusions

Referring all estimates of mean recency periods, false-recent rates and incidence estimates to a fixed period T simplifies estimation procedures and allows the development of a consistent method for producing adjusted estimates of HIV incidence of improved accuracy. Unadjusted BED estimates of incidence, based on life-time recency periods, would be both extremely difficult to produce and of doubtful value.

15.

Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing

Mathieu Giraud Mikaël Salson Marc Duez Céline Villenet Sabine Quief Aurélie Caillault Nathalie Grardel Christophe Roumier Claude Preudhomme Martin Figeac 《BMC genomics》2014,15(1)

Background

V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not yet understood.

Results

We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because, in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also to data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols.
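The alignment-free first phase can be illustrated as follows: exact k-mer seeds locate the V and J parts of each read, a fixed-width window centred between them is taken as the junction, and reads with identical windows are grouped into a clone. This is a simplified sketch rather than Vidjil's actual heuristic; the exact-match seeds, the centring rule and the parameters `k` and `w` are assumptions.

```python
from collections import defaultdict


def kmer_set(sequences, k):
    """All k-mers occurring in a set of germline sequences."""
    return {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}


def junction_window(read, v_kmers, j_kmers, k, w):
    """w-long window centred between the last V seed hit and the first J seed hit, or None."""
    starts = range(len(read) - k + 1)
    v_hits = [i for i in starts if read[i:i + k] in v_kmers]
    j_hits = [i for i in starts if read[i:i + k] in j_kmers]
    if not v_hits or not j_hits or max(v_hits) >= min(j_hits):
        return None
    centre = (max(v_hits) + k + min(j_hits)) // 2
    start = centre - w // 2
    if start < 0 or start + w > len(read):
        return None
    return read[start:start + w]


def cluster_reads(reads, v_germlines, j_germlines, k=5, w=8):
    """Group reads sharing an identical junction window; no alignment is performed."""
    v_kmers, j_kmers = kmer_set(v_germlines, k), kmer_set(j_germlines, k)
    clones = defaultdict(list)
    for read in reads:
        window = junction_window(read, v_kmers, j_kmers, k, w)
        if window is not None:
            clones[window].append(read)
    return dict(clones)
```

Each read costs only set lookups, so run time grows linearly with the number of reads; a slower germline alignment could then be restricted to one representative per clone.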

Conclusions

The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-409) contains supplementary material, which is available to authorized users.

16.

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates

Paul B Frandsen Brett Calcott Christoph Mayer Robert Lanfear 《BMC evolutionary biology》2015,15(1)

Background

Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses.

Results

We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias.
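The iterative division can be sketched as a recursive two-means split on site rates. The published method estimates site rates from the data and judges splits by model fit (the abstract mentions AICc and BIC). Here the rates are taken as given, and `accept_split` is a hypothetical caller-supplied placeholder for that decision.

```python
def two_means(sites, rates, max_iter=100):
    """Split site indices into two groups by 1-D k-means (Lloyd's algorithm) on their rates."""
    values = [rates[s] for s in sites]
    c0, c1 = min(values), max(values)
    for _ in range(max_iter):
        g0 = [s for s in sites if abs(rates[s] - c0) <= abs(rates[s] - c1)]
        g1 = [s for s in sites if abs(rates[s] - c0) > abs(rates[s] - c1)]
        new0 = sum(rates[s] for s in g0) / len(g0) if g0 else c0
        new1 = sum(rates[s] for s in g1) / len(g1) if g1 else c1
        if (new0, new1) == (c0, c1):
            break  # centroids stable: assignment has converged
        c0, c1 = new0, new1
    return g0, g1


def partition_sites(sites, rates, accept_split, min_size=2):
    """Recursively split subsets while accept_split(parent, left, right) approves."""
    left, right = two_means(sites, rates)
    if len(left) < min_size or len(right) < min_size or not accept_split(sites, left, right):
        return [sites]
    return (partition_sites(left, rates, accept_split, min_size)
            + partition_sites(right, rates, accept_split, min_size))
```

Because subsets are defined by rates rather than by gene or codon boundaries, the same procedure applies to alignments such as ultra-conserved elements that lack such features.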

Conclusions

Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.

17.

A novel method of characterizing genetic sequences: genome space with biological distance and applications

Deng M Yu C Liang Q He RL Yau SS 《PloS one》2011,6(3):e17293

Background

Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences.

Methodology

To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance, which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV) and mammalian mitochondrial genomes. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses.
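A minimal sketch of an alignment-free natural vector follows. The abstract does not spell out the vector. This version follows our reading of the usual definition (per-nucleotide counts, mean positions and normalised second central moments), truncated after the second moment. The normalisation by n_k*n and the truncation are assumptions, and the truncated vector is not guaranteed to keep the one-to-one property stated above.

```python
import math


def natural_vector(seq):
    """12-D vector: counts, mean positions and normalised second central moments of A, C, G, T."""
    n = len(seq)
    counts, means, moments = [], [], []
    for base in "ACGT":
        positions = [i + 1 for i, b in enumerate(seq) if b == base]  # 1-based
        n_k = len(positions)
        mu = sum(positions) / n_k if n_k else 0.0
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n) if n_k else 0.0
        counts.append(n_k)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def natural_distance(s1, s2):
    """Euclidean distance between natural vectors."""
    return math.dist(natural_vector(s1), natural_vector(s2))
```

Each vector is computed once per genome and stored, so adding a new genome costs one vector rather than a realignment, which is the practical advantage the Conclusions describe.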

Conclusions

Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than using multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.

18.

Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data

Helal M Kong F Chen SC Bain M Christen R Sintchenko V 《PloS one》2011,6(6):e19517

Background

The intra- and inter-species genetic diversity of bacteria, together with the absence of ‘reference’ (most representative) sequences for individual species, presents a significant challenge for sequence-based identification. The aims of this study were to determine the utility of several clustering and classification algorithms, and to compare their performance, in identifying the species of 364 16S rRNA gene sequences with a defined species in GenBank and 110 16S rRNA gene sequences with no defined species, all within the genus Nocardia.

Methods

A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the genus Nocardia at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization.
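The abstract does not say which features were fed to Principal Components Analysis. The sketch below assumes that each sequence is represented by a numeric feature vector, for example its row of the pairwise distance matrix. It projects those vectors onto the leading principal axes using NumPy's SVD. This is generic PCA, not the authors' LM algorithm.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of `features` onto the first `n_components` principal axes."""
    # centre each feature column so the axes pass through the data mean
    centered = features - features.mean(axis=0)
    # rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

With `n_components=2`, the result can be plotted directly as a scatter of sequences.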

Results

The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded to the original species. The most representative 16S rRNA sequence for each Nocardia species was identified as the ‘centroid’ of its cluster, i.e. the sequence whose distances to all other sequences in the cluster were minimal. The 110 16S rRNA gene sequences identified only at the genus level were classified using machine learning methods. A simple k-nearest-neighbor (kNN) classifier showed the highest performance, classifying Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.
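A kNN classifier assigns an unlabeled sequence the majority species label among its k closest labeled sequences. The sketch below makes two assumptions not given in the abstract: the sequences are already aligned to equal length, and the distance is the simple p-distance (the proportion of differing sites). Both `p_distance` and `knn_classify` are hypothetical helpers.

```python
from collections import Counter


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_classify(query: str, references: list[tuple[str, str]], k: int = 3) -> str:
    """references: (aligned_sequence, species_label) pairs with known species."""
    # sort reference sequences from nearest to farthest
    ranked = sorted(references, key=lambda ref: p_distance(query, ref[0]))
    votes = Counter(label for _, label in ranked[:k])
    # on a tie, most_common keeps first-encountered order, i.e. the nearer label wins
    return votes.most_common(1)[0][0]
```

Here k is a tuning parameter. The abstract does not report the value the authors used.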

Conclusion

The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequence for each Nocardia species and allows quantitation of inter- and intra-species variability.
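A cluster "centroid" in the sense used above is a medoid: the member sequence whose summed distance to the other members is smallest. The sketch below assumes a precomputed symmetric distance matrix and one cluster label per sequence. It also reports mean pairwise distance within a cluster as one simple measure of intra-species variability. The function names are hypothetical.

```python
def medoid(members: list[int], dist: list[list[float]]) -> int:
    """Index of the member minimizing total distance to all members of the cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def cluster_medoids(labels: list[str], dist: list[list[float]]) -> dict[str, int]:
    """Map each cluster label to the index of its medoid sequence."""
    clusters: dict[str, list[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    return {label: medoid(members, dist) for label, members in clusters.items()}


def mean_intra_distance(members: list[int], dist: list[list[float]]) -> float:
    """Mean pairwise distance within a cluster (0.0 for singletons)."""
    pairs = [(i, j) for i in members for j in members if i < j]
    if not pairs:
        return 0.0
    return sum(dist[i][j] for i, j in pairs) / len(pairs)
```

Unlike an arithmetic mean, a medoid is always an actual sequence. This makes it usable directly as a reference sequence.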

19.

Mannose-Binding Lectin Deficiency Is Associated with Myocardial Infarction: The HUNT2 Study in Norway

IT Vengen HO Madsen P Garred C Platou L Vatten V Videm 《PloS one》2012,7(7):e42113

Objectives

Mannose-binding lectin (MBL) and ficolins activate the complement cascade, which is involved in atherogenesis. Based on a pilot study, we hypothesized that functional polymorphisms in the MBL gene (MBL2) leading to dysfunctional protein are related to the development of myocardial infarction (MI). The aim of the present study was to examine polymorphisms in MBL2 and the ficolin genes in relation to the risk of MI.

Methods and Results

Using the population-based HUNT Study in Norway, 57133 persons were followed for a first-time MI from 1995–1997 until the end of 2008. The 370 youngest MI patients were matched by age (range 29–62 years) and gender to 370 controls. A younger population was selected because disease in this group might be less dependent on non-genetic risk factors. The study size was based on a power calculation. Polymorphisms in MBL2 and in the genes of ficolin-1, ficolin-2 and ficolin-3 were genotyped by pyrosequencing and related to the risk of MI, estimated as odds ratios (OR). Functional haplotypes were analyzed, and stringent significance (alpha) levels were set by permutation testing. Variant MBL2 haplotypes causing MBL deficiency were associated with a two-fold higher risk of MI (OR 2.04, 95% CI 1.29–3.24). Adjustment for conventional cardiovascular risk factors did not substantially change the association. The ficolins were not associated with MI risk.
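The Python sketch below shows how an odds ratio and its 95% CI can be computed in a 1:1 matched case-control design. It assumes a binary exposure, such as carrying an MBL-deficiency haplotype. For matched pairs, the conditional odds ratio is the ratio of the two kinds of discordant pairs. The CI comes from the usual log-scale standard error. This is a crude, unadjusted illustration, not the authors' haplotype analysis, and the counts used below are hypothetical.

```python
import math


def matched_pair_odds_ratio(n10: int, n01: int, z: float = 1.96) -> tuple[float, float, float]:
    """Conditional OR for 1:1 matched pairs with a binary exposure.

    n10: pairs where only the case is exposed
    n01: pairs where only the control is exposed
    Concordant pairs carry no information and are ignored.
    """
    if n10 == 0 or n01 == 0:
        raise ValueError("need at least one discordant pair of each kind")
    odds_ratio = n10 / n01
    # standard error of log(OR) for matched-pair data
    se = math.sqrt(1 / n10 + 1 / n01)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

For example, `matched_pair_odds_ratio(30, 15)` (hypothetical counts) gives an OR of 2.0, with a CI that is symmetric around it on the log scale.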

Conclusion

In a young to middle-aged, relatively healthy Caucasian population, MBL2 variants related to functional MBL deficiency were associated with a doubling of the risk of MI, independent of conventional risk factors. This supports the hypothesis that MBL deficiency may lead to increased atherosclerosis or the development of vulnerable plaques.

20.

An evolutionary relationship between Stearoyl-CoA Desaturase (SCD) protein sequences involved in fatty acid metabolism

Mohammad Salmani Izadi Abbas Ali Naserian Mohammad Reza Nasiri Reza Majidzadeh Heravi 《Reports of Biochemistry & Molecular Biology》2014,3(1):1-6

Background:

Stearoyl-CoA desaturase (SCD) is a key enzyme in fat biosynthesis that converts saturated fatty acids (SFAs) to monounsaturated fatty acids (MUFAs). Although the evolutionary relationship of SCD proteins across species is crucial for interpreting their roles, it has yet to be elucidated. This study aims to describe this evolutionary relationship based on amino acid sequences.

Methods:

Using multiple sequence alignment (MSA) and phylogenetic construction methods, a hypothetical evolutionary relationship was generated among the stearoyl-CoA desaturase (SCD) protein sequences of 18 different species.
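The abstract does not name the tree-building method. As one example of a distance-based phylogenetic construction, the sketch below implements UPGMA. It starts from a symmetric pairwise distance matrix, such as distances computed from an MSA, and returns a Newick topology without branch lengths. UPGMA is a swapped-in illustration here, not necessarily what the authors used.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Minimal UPGMA: repeatedly merge the closest pair of clusters."""
    # cluster id -> (newick subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda kv: kv[1])
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # size-weighted average distance from the merged cluster to cluster k
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        del d[(i, j)]
        clusters[next_id] = (f"({tree_i},{tree_j})", size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

UPGMA assumes a constant evolutionary rate (an ultrametric tree). Neighbor-joining is the usual alternative when rates vary among lineages.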

Results:

SCD protein sequences from Homo sapiens, Pan troglodytes (chimpanzee) and Pongo abelii (orangutan) had the lowest genetic distance (0.006) among the 18 species studied. Capra hircus (goat) and Ovis aries (sheep) had the next lowest genetic distance, 0.023. These farm animals are 99.987% identical at the amino acid level.

Conclusions:

The SCD proteins are conserved across these 18 species, and their evolutionary relationships are similar.

Key Words: Phylogenetic analysis, Stearoyl-CoA desaturase (SCD) proteins, Multiple sequence alignment