Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

Authors:	David Dineen, Toby J Gibson, Kevin Karplus, Weizhong Li, Rodrigo Lopez, Hamish McWilliam, Michael Remmert, Johannes Söding, Julie D Thompson, Desmond G Higgins

Affiliation:	1. School of Medicine and Medical Science, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland; 2. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; 3. Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA; 4. EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; 5. Gene Center Munich, University of Munich (LMU), Muenchen, Germany; 6. Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France

Abstract:	Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly while delivering accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to existing alignments and for exploiting the information they contain, making use of the vast amount of precomputed information in public databases like Pfam.

Keywords:	bioinformatics; hidden Markov models; multiple sequence alignment
