
1.

Error statistics of hidden Markov model and hidden Boltzmann model results

Lee A Newberg 《BMC bioinformatics》2009,10(1):212

Background

Hidden Markov models and hidden Boltzmann models are employed in computational biology and a variety of other scientific fields for a variety of analyses of sequential data. Whether the associated algorithms are used to compute an actual probability or, more generally, an odds ratio or some other score, a frequent requirement is that the error statistics of a given score be known. What is the chance that random data would achieve that score or better? What is the chance that a real signal would achieve a given score threshold?
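
As an illustration of the two quantities the abstract refers to, the sketch below computes a sequence's HMM probability with the standard forward algorithm (in log space) and then estimates "the chance that random data would achieve that score or better" by Monte Carlo sampling. This is a swapped-in technique, not the paper's method: the paper derives these error statistics itself, whereas here the null model (uniform i.i.d. symbols), the sample count, the function names, and the (hits + 1)/(n + 1) estimator are all assumptions made for the example.

```python
import math
import random


def forward_log_prob(seq, states, start_p, trans_p, emit_p):
    """Return log P(seq) under an HMM using the forward algorithm in log space."""

    def log(p):
        return math.log(p) if p > 0 else -math.inf

    def logsumexp(xs):
        m = max(xs)
        if m == -math.inf:
            return -math.inf
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s] = log P(x_1..x_t, state at t = s)
    alpha = {s: log(start_p[s]) + log(emit_p[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: logsumexp([alpha[r] + log(trans_p[r][s]) for r in states])
            + log(emit_p[s][symbol])
            for s in states
        }
    return logsumexp(list(alpha.values()))


def empirical_tail(score_fn, observed, alphabet, length, n_samples=1000, seed=0):
    """Monte Carlo estimate of P(score of a random sequence >= observed).

    Assumed null model: symbols drawn uniformly and independently from `alphabet`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        seq = [rng.choice(alphabet) for _ in range(length)]
        if score_fn(seq) >= observed:
            hits += 1
    return (hits + 1) / (n_samples + 1)
```

A log-odds score would subtract the background log-likelihood from `forward_log_prob`; sampling only resolves tail probabilities down to roughly 1/n_samples, which is one reason analytic error statistics are valuable.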

2.

Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

Crystal L Kahn Shay Mozes Benjamin J Raphael 《Algorithms for molecular biology : AMB》2010,5(1):11

Background

Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences.

3.

Logo2PWM: a tool to convert sequence logo to position weight matrix

Zhen Gao Lu Liu Jianhua Ruan 《BMC genomics》2017,18(6):709

4.

HMM-ModE – Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences

Prashant K Srivastava Dhwani K Desai Soumyadeep Nandi Andrew M Lynn 《BMC bioinformatics》2007,8(1):104

Background

Profile hidden Markov models (HMMs) are statistical representations of protein families derived from patterns of sequence conservation in multiple alignments, and they have been used with considerable success to identify remote homologues. These conservation patterns arise from fold-specific signals, shared across multiple families, and function-specific signals unique to the families. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimising the threshold cutoff and by modifying emission probabilities to minimise the influence of fold-specific signals. A protocol to generate family-specific HMMs is described that first constructs a profile HMM from an alignment of the family's sequences and then uses this model to identify sequences belonging to other classes that score above the default threshold (false positives). Ten-fold cross-validation is used to optimise the discrimination threshold score for the model. The advent of fast multiple alignment methods enables the use of profile alignments to align the true- and false-positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model.
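
The threshold-optimisation step can be sketched as follows: given HMM scores for true family members (positives) and for other-class sequences (negatives), pick the cutoff that best separates them, and average that cutoff over cross-validation folds. The abstract does not say which criterion is optimised; the Matthews correlation coefficient (MCC), the round-robin fold split, and averaging the per-fold cutoffs are assumptions of this sketch, and all function names are hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(pos_scores, neg_scores):
    """Return (cutoff, mcc) maximising MCC, predicting positive when score >= cutoff."""
    best_mcc, best_cut = -2.0, None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= t for s in pos_scores)
        fp = sum(s >= t for s in neg_scores)
        value = mcc(tp, fp, len(neg_scores) - fp, len(pos_scores) - tp)
        if value > best_mcc:  # strict: ties keep the lower cutoff
            best_mcc, best_cut = value, t
    return best_cut, best_mcc


def cv_threshold(pos_scores, neg_scores, k=10):
    """Average the MCC-optimal cutoff found on each of k training folds."""
    cutoffs = []
    for fold in range(k):
        train_pos = [s for i, s in enumerate(pos_scores) if i % k != fold]
        train_neg = [s for i, s in enumerate(neg_scores) if i % k != fold]
        cutoff, _ = best_threshold(train_pos, train_neg)
        cutoffs.append(cutoff)
    return sum(cutoffs) / len(cutoffs)
```

For example, `best_threshold([10, 12, 15], [1, 3, 5])` returns `(10, 1.0)`, a cutoff that separates the two sets perfectly.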

5.

Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

Arumugam M Wei C Brown RH Brent MR 《Genome biology》2006,7(Z1):S5.1-S510

6.

Splitting statistical potentials into meaningful scoring functions: Testing the prediction of near-native structures from decoy conformations

Patrick Aloy Baldo Oliva 《BMC structural biology》2009,9(1):71-22

Background

Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has increased only modestly. This has prompted the development of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. Selecting the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models using energies, knowledge-based potentials, or a combination of both.

7.

QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information

Pascal Benkert Torsten Schwede Silvio CE Tosatto 《BMC structural biology》2009,9(1):35-17

Background

The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction, in both template-based and ab initio approaches. Scoring functions have been developed that either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better at selecting good candidate models, but they tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus.
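
The combination described above can be sketched generically: each model's consensus score is its mean structural similarity to a reference set of models, and the reference set is first pre-filtered to the models ranked highest by a single-model quality score. This is not QMEANclust's actual scoring. The pairwise similarity matrix (for instance a GDT-like score computed elsewhere), the `keep_fraction` parameter, and the function name are assumptions for illustration only.

```python
def consensus_scores(similarity, single_scores=None, keep_fraction=1.0):
    """Mean similarity of each model to a (optionally pre-filtered) reference set.

    similarity[i][j]: pairwise structural similarity of models i and j, supplied by the
    caller. If single_scores is given, only the top keep_fraction of models by that
    single-model score are used as the reference set for the consensus.
    """
    n = len(similarity)
    reference = list(range(n))
    if single_scores is not None:
        k = max(1, int(round(keep_fraction * n)))
        reference = sorted(range(n), key=lambda i: single_scores[i], reverse=True)[:k]
    scores = []
    for i in range(n):
        others = [j for j in reference if j != i]  # never compare a model to itself
        scores.append(sum(similarity[i][j] for j in others) / len(others) if others else 0.0)
    return scores
```

Pre-filtering shrinks the reference set toward models that look good on their own, so a large cluster of poor models has less pull on the consensus.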

8.

Szymon M Kielbasa Didier Gonze Hanspeter Herzel 《BMC bioinformatics》2005,6(1):1-11

9.

GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes

David MA Martin Matthew Berriman Geoffrey J Barton 《BMC bioinformatics》2004,5(1):178

Background

The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences. We describe a novel method, GOtcha, for predicting gene product function by annotation with Gene Ontology (GO) terms. GOtcha predicts GO term associations with term-specific probability (P-score) measures of confidence. Term-specific probabilities are a novel feature of GOtcha and allow the identification of conflicts or uncertainty in annotation.

10.

Local sequence alignments statistics: deviations from Gumbel statistics in the rare-event tail

Stefan Wolfsheimer Bernd Burghardt Alexander K Hartmann 《Algorithms for molecular biology : AMB》2007,2(1):9

Background

The optimal score for ungapped local alignments of infinitely long random sequences is known to follow a Gumbel extreme value distribution. Less is known about the important case where gaps are allowed. For this case, the distribution is known only empirically in the high-probability region, which is biologically less relevant.
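
For the ungapped case, the Gumbel law takes the familiar Karlin-Altschul form: the expected number of chance hits scoring at least x is E = K·m·n·e^(−λx) for sequence lengths m and n, and P(S ≥ x) = 1 − e^(−E). The sketch below evaluates that baseline. The parameter values used in any call (λ, K) are assumptions: in practice they depend on the scoring matrix and residue composition, and the paper's point is that gapped alignments can deviate from this law in the rare-event tail.

```python
import math


def expected_hits(score, m, n, lam, K):
    """Karlin-Altschul expected number of chance local alignments scoring >= score."""
    return K * m * n * math.exp(-lam * score)


def gumbel_pvalue(score, m, n, lam, K):
    """P(S >= score) under the Gumbel extreme value law: 1 - exp(-E)."""
    # expm1 keeps precision when E is tiny, i.e. deep in the rare-event tail
    return -math.expm1(-expected_hits(score, m, n, lam, K))
```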

11.

Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

Olivier Bastien Eric Maréchal 《BMC bioinformatics》2008,9(1):332

Background

Confidence in pairwise alignments of biological sequences, obtained by methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by Monte Carlo simulation. Following the TULIP theorem, the Z-value yields an upper bound on the P-value of 1/Z². Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support.
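
The Lipman-Pearson Z-value and the TULIP bound can be sketched as follows: score the real pair, re-score it many times after shuffling one sequence (which preserves its composition), standardise the real score against the shuffled scores, and bound the P-value by 1/Z². The ungapped identity count used here is a hypothetical toy stand-in for a real alignment score such as Smith-Waterman, and the shuffle count and seed are arbitrary choices.

```python
import random
import statistics


def identity_score(a, b):
    """Toy stand-in for an alignment score: identical positions, no gaps."""
    return sum(x == y for x, y in zip(a, b))


def z_value(seq_a, seq_b, score_fn=identity_score, n_shuffles=200, seed=0):
    """Z = (real score - mean shuffled score) / sd of shuffled scores."""
    rng = random.Random(seed)
    real = score_fn(seq_a, seq_b)
    shuffled = []
    for _ in range(n_shuffles):
        letters = list(seq_b)
        rng.shuffle(letters)  # keeps seq_b's composition, destroys its order
        shuffled.append(score_fn(seq_a, "".join(letters)))
    sd = statistics.stdev(shuffled)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z is undefined")
    return (real - statistics.mean(shuffled)) / sd


def tulip_bound(z):
    """Upper bound on the P-value from a Z-value: min(1, 1/Z^2) for Z > 0."""
    return min(1.0, 1.0 / (z * z)) if z > 0 else 1.0
```

The bound is deliberately loose: a Z-value of 10 guarantees only P ≤ 0.01, which is why a fitted Gumbel law for Z-values would give much sharper estimates.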

12.

Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER

Markus Wistrand Erik LL Sonnhammer 《BMC bioinformatics》2005,6(1):99

Background

Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet, the critical features for successful modelling are not fully known. In the present work we approached this by using two of the most popular HMM packages: SAM and HMMER. The programs' abilities to build models and score sequences were compared on a SCOP/Pfam based test set. The comparison was done separately for local and global HMM scoring.

13.

Dinucleotide Weight Matrices for Predicting Transcription Factor Binding Sites: Generalizing the Position Weight Matrix

Rahul Siddharthan 《PloS one》2010,5(3)

14.

Detailed protein sequence alignment based on Spectral Similarity Score (SSS)

Kshitiz Gupta Dina Thomas SV Vidya KV Venkatesh S Ramakumar 《BMC bioinformatics》2005,6(1):105

Background

The chemical properties and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed to determine the alignment and similarity of primary protein sequences. However, character-based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but align poorly when amino acids are treated as mere characters. This approach computes a similarity score between sequences based on any given attribute, such as the hydrophobicity of amino acids, using spectral information obtained after partial conversion to the frequency domain.

15.

Weighted bootstrapping: a correction method for assessing the robustness of phylogenetic trees

Vladimir Makarenkov Alix Boc Jingxin Xie Pedro Peres-Neto François-Joseph Lapointe Pierre Legendre 《BMC evolutionary biology》2010,10(1):250

Background

Non-parametric bootstrapping is a widely used statistical procedure for assessing confidence in model parameters based on the empirical distribution of the observed data [1], and, as such, it has become a common method for assessing tree confidence in phylogenetics [2]. Traditional non-parametric bootstrapping does not weight the trees inferred from resampled (i.e., pseudo-replicated) sequences. Hence, the quality of these trees is not taken into account when computing the bootstrap scores associated with the clades of the original phylogeny. As a consequence, trees with different bootstrap support, or that fit the corresponding pseudo-replicated sequences differently (fit quality can be expressed through the LS, ML or parsimony score), contribute equally to the computation of the bootstrap support of the original phylogeny.
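
The difference between classical and weighted bootstrap support can be sketched with each replicate tree represented as a set of clades (frozensets of taxon names). Classical support is the fraction of replicate trees that contain a clade; a weighted version lets each tree contribute its own weight instead of 1. The paper derives the weights from tree fit quality, but the exact weighting formula is not given here, so the weights are simply supplied by the caller in this sketch, and the function name is hypothetical.

```python
def clade_support(replicate_trees, clade, weights=None):
    """Bootstrap support for `clade` across replicate trees.

    Each tree is a set of clades (frozensets of taxon names). With weights=None every
    replicate counts equally (classical bootstrap); otherwise tree i contributes
    weights[i] (weighted bootstrap; how weights are derived is left to the caller).
    """
    if weights is None:
        weights = [1.0] * len(replicate_trees)
    total = sum(weights)
    supporting = sum(w for tree, w in zip(replicate_trees, weights) if clade in tree)
    return supporting / total
```

With three replicates of which the first and third contain clade AB, classical support is 2/3; giving the second tree (which lacks AB) double weight lowers the weighted support to 0.5.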

16.

Hypercholesterolemia and a candidate gene within the 12q24 locus

Claudia Gragnoli 《Cardiovascular diabetology》2011,10(1):1-3

Background

The 12q24 locus contains at least one gene responsible for hypercholesterolemia. Within the 12q24 locus lies the gene encoding proteasome modulator 9 (PSMD9). PSMD9 is linked to type 2 diabetes (T2D), T2D nephropathy and macrovascular pathology in Italian families, and rare PSMD9 mutations contribute to T2D.

Aims

In the present study, we aimed at determining whether the PSMD9 T2D risk single nucleotide polymorphisms (SNPs) IVS3 + nt460 A > G, IVS3 + nt437 T > C and E197G A > G are linked to hypercholesterolemia in 200 T2D Italian families.

Methods

We characterized 200 Italian families for the presence or absence of hypercholesterolemia, defined as LDL levels ≥ 100 mg/dl in drug-naïve patients and/or a diagnosis of hypercholesterolemia in a patient treated with statin medication. Phenotypes were recorded as unknown in all cases in which the diagnosis was unclear or the data were missing. We tested the 200 Italian families for evidence of linkage between the PSMD9 SNPs and hypercholesterolemia. Non-parametric linkage analysis of the qualitative phenotype was performed with the Merlin software, and the LOD score and corresponding P-value were calculated. For the significant linkage score, 1000 replicates were performed to calculate the empirical P-value.
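
Two generic calculations underlie this kind of analysis, sketched below under stated assumptions. An empirical P-value from replicates counts how often a replicate score reaches the observed one; the (r + 1)/(N + 1) form is one common convention, and the abstract does not state which estimator was used. Separately, a LOD score is often converted to an asymptotic one-sided P-value by treating 2·ln(10)·LOD as a 50:50 mixture of 0 and a chi-square with 1 degree of freedom; whether Merlin computes its P-value exactly this way is not confirmed here. Function names are hypothetical.

```python
import math


def empirical_pvalue(observed, replicate_scores):
    """(r + 1) / (N + 1), where r replicates score at least as high as observed."""
    exceed = sum(s >= observed for s in replicate_scores)
    return (exceed + 1) / (len(replicate_scores) + 1)


def lod_to_pvalue(lod):
    """Asymptotic one-sided P-value: half the chi-square(1 df) tail of 2*ln(10)*LOD."""
    chi2 = 2.0 * math.log(10) * max(lod, 0.0)
    # P(chi2_1 > c) = erfc(sqrt(c / 2)); halving gives the one-sided value
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))
```

With 1000 replicates, the smallest attainable empirical P-value under this convention is 1/1001, so replicate counts cap how strong a significance claim can be; a LOD of 3 corresponds to roughly P ≈ 10⁻⁴ under the asymptotic conversion.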

Results

The PSMD9 gene SNPs studied show linkage to hypercholesterolemia. The results are not due to random chance.

Conclusions

PSMD9 should be tested in all populations reporting linkage to hypercholesterolemia within the chromosome 12q24 locus. The impact of this gene on hypercholesterolemia and contribution to cardio- and cerebrovascular events may be high.

17.

HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs

Daniel E Russ Kwan-Yuet Ho Nancy S Longo 《BMC bioinformatics》2015,16(1)

Background

Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels), and it improves efficiency on the large datasets produced by high-throughput sequencing.

Results

In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.

Conclusions

HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy in mutated sequences when the mutation probability is around 30% and 20%, respectively. The D-segment is much harder to fit, even at a 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

18.

Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

Sven Warris Sander Boymans Iwe Muiser Michiel Noback Wim Krijnen Jan-Peter Nap 《BMC research notes》2014,7(1):1-10

Background

Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings.

Results

Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine, by interpolation, the MFE distribution for any given sequence composition. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition.
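
The lookup described above can be sketched as follows: a pre-computed table gives the mean and standard deviation of random-sequence MFEs at a few compositions, the two parameters are linearly interpolated for the candidate's composition, and the candidate's MFE is converted to a lower-tail normal P-value (lower MFE means a more stable structure). Reducing composition to a single GC fraction, using linear interpolation, and the function names are all simplifying assumptions; the table itself must come from real pre-computed data, and the numbers in the test are toy values only.

```python
import bisect
from statistics import NormalDist


def interpolate_params(table, gc):
    """Linearly interpolate (mean, sd) of random-sequence MFE at GC fraction `gc`.

    table: list of (gc_fraction, mean_mfe, sd_mfe) rows sorted by gc_fraction;
    values outside the table range are clamped to the nearest row.
    """
    xs = [row[0] for row in table]
    if gc <= xs[0]:
        return table[0][1], table[0][2]
    if gc >= xs[-1]:
        return table[-1][1], table[-1][2]
    i = bisect.bisect_right(xs, gc)
    (x0, m0, s0), (x1, m1, s1) = table[i - 1], table[i]
    f = (gc - x0) / (x1 - x0)
    return m0 + f * (m1 - m0), s0 + f * (s1 - s0)


def mfe_pvalue(mfe, gc, table):
    """Lower-tail P-value of a candidate MFE under the interpolated normal."""
    mean, sd = interpolate_params(table, gc)
    return NormalDist(mean, sd).cdf(mfe)
```

Because only a table lookup and a normal CDF are evaluated per candidate, no folding of shuffled sequences is needed at screening time, which is where the speedup comes from.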

Conclusion

The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters of choice when selecting potential miRNA candidates for experimental verification.

19.

Comparing the probability of stroke by the Framingham risk score in hypertensive Korean patients visiting private clinics and tertiary hospitals

Cheol Ung Choi Chang Gyu Park 《BMC neurology》2010,10(1):78

Background

The purpose of this study was to investigate the pattern of distribution of risk factors for stroke and the 10-year probability of stroke by the Framingham risk score in hypertensive patients visiting private clinics vs. tertiary hospitals.

20.

Optimizing amino acid substitution matrices with a local alignment kernel

Hiroto Saigo Jean-Philippe Vert Tatsuya Akutsu 《BMC bioinformatics》2006,7(1):246-12

Background

Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We had previously developed a similarity score between sequences, called a local alignment kernel, that exhibits good performance for this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Since commonly used BLOSUM or PAM matrices for scoring amino acid matches have been optimized to be used in combination with the Smith-Waterman algorithm, the matrices optimal for the local alignment kernel can be different.