1.

HIPPI: highly accurate protein family classification with ensembles of HMMs

Nguyen Nam-phuong, Nute Michael, Mirarab Siavash, Warnow Tandy 《BMC genomics》2016,17(10):765-100

Background

Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.

Results

We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.
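The ensemble idea can be illustrated with a minimal sketch: each family is represented by several profile HMMs, and a query is assigned to the family whose best-scoring HMM gives the highest score. The scoring rule, the family names, the score values, and the `min_score` cutoff below are all assumptions made for illustration. In practice the scores would come from HMMER searches, and this is not HIPPI's actual implementation.

```python
from typing import Dict, List, Optional


def classify_query(scores: Dict[str, List[float]],
                   min_score: float = 0.0) -> Optional[str]:
    """Pick the family whose ensemble contains the single best-scoring HMM.

    `scores` maps a (hypothetical) family name to the scores of one query
    against every HMM in that family's ensemble. Returns None when no HMM
    scores above `min_score`.
    """
    best_family = None
    best_score = min_score
    for family, hmm_scores in scores.items():
        if not hmm_scores:
            continue  # empty ensemble: nothing to compare
        top = max(hmm_scores)  # best HMM within this family's ensemble
        if top > best_score:
            best_family, best_score = family, top
    return best_family


# Made-up scores for one query against three hypothetical families.
example_scores = {
    "family_A": [12.5, 48.1, 30.0],
    "family_B": [25.3, 27.9],
    "family_C": [5.2],
}
print(classify_query(example_scores))  # family_A
```

Taking the maximum over an ensemble lets a single well-matched HMM decide the assignment, which is one plausible reason an ensemble could handle fragmentary or divergent queries better than one profile HMM built from the whole alignment.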

Conclusion

HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can better represent multiple sequence alignments than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.

Similar articles

2.

Metagenomic Analysis Reveals Previously Undescribed Bat Coronavirus Strains in Eswatini

Shapiro Julie Teresa, Mollerup Sarah, Jensen Randi Holm, Olofsson Jill Katharina, Nguyen Nam-phuong D., Hansen Thomas Arn, Vinner Lasse, Monadjem Ara, McCleery Robert A., Hansen Anders J. 《EcoHealth》2021,18(4):421-428

We investigated the prevalence of coronaviruses in 44 bats from four families in northeastern Eswatini using high-throughput sequencing of fecal samples. We found evidence of...

3.

Design and analysis of a robust genetic Muller C-element

Nam-phuong Nguyen, Chris Myers 《Journal of theoretical biology》2010,264(2):174-187

This paper presents results on the design and analysis of a robust genetic Muller C-element. The Muller C-element is a standard logic gate commonly used to synchronize independent processes in most asynchronous electronic circuits. Synthetic biological logic gates have been previously demonstrated, but there remain many open issues in the design of sequential (state-holding) logic operations. Three designs are considered for the genetic Muller C-element: a majority gate, a toggle switch, and a speed-independent implementation. While the three designs are logically equivalent, each design requires different assumptions to operate correctly. The majority gate design requires the most timing assumptions, the speed-independent design requires the fewest, and the toggle switch design is a compromise between the two. This paper examines the robustness of these designs as well as the effects of parameter variation using stochastic simulation. The results show that robustness to timing assumptions does not necessarily increase reliability, suggesting that modifications to existing logic design tools will be necessary for synthetic biology. Parameter variation simulations yield further insights into the design principles necessary for building robust genetic gates. The results suggest that high gene count, cooperativity of at least two, tight repression, and balanced decay rates are necessary for robust gates. Finally, this paper presents a potential application of the genetic Muller C-element as a quorum-mediated trigger.
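The state-holding behavior of a Muller C-element can be sketched in a few lines: the output takes the shared input value when both inputs agree and otherwise keeps its previous value. This is logically the same as a majority vote over the two inputs and the fed-back output. The sketch is purely Boolean and the function names are illustrative. It does not model the genetic circuits, the timing assumptions, or the stochastic simulation discussed in the paper.

```python
from itertools import product


def c_element(a: bool, b: bool, prev: bool) -> bool:
    """Muller C-element: follow the inputs when they agree, else hold state."""
    if a == b:
        return a
    return prev


def majority_gate(a: bool, b: bool, prev: bool) -> bool:
    """Majority of the two inputs and the fed-back previous output."""
    return (a and b) or (a and prev) or (b and prev)


# The two formulations agree on all eight input/state combinations.
assert all(
    c_element(a, b, p) == majority_gate(a, b, p)
    for a, b, p in product([False, True], repeat=3)
)

# Drive the element with a short input sequence, starting from output 0.
state = False
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    state = c_element(bool(a), bool(b), state)
    trace.append(int(state))
print(trace)  # [0, 0, 1, 1, 0]
```

The output rises only after both inputs are 1 and falls only after both are 0, which is the synchronizing behavior the abstract refers to.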

4.

Ultra-large alignments using phylogeny-aware profiles

Nam-phuong D. Nguyen, Siavash Mirarab, Keerthana Kumar, Tandy Warnow 《Genome biology》2015,16(1)

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users.