1.

Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM

Liqi Li Sanjiu Yu Weidong Xiao Yongsheng Li Lan Huang Xiaoqi Zheng Shiwen Zhou Hua Yang 《BMC bioinformatics》2014,15(1)

Background

Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated method for reliably and quickly identifying recombination spots is thus urgently needed.

Results

Here we proposed a novel approach by fusing features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). A recursive feature extraction by linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features. SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (by 0.37% to 3.79%) than those of state-of-the-art tools.

Conclusions

Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
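
The nucleic acid composition (NAC) features mentioned in this abstract can be illustrated with a minimal sketch. It computes normalised k-mer frequencies for a DNA sequence, which is one common reading of NAC (k = 1) and dinucleotide composition (k = 2). The function name `kmer_composition` is hypothetical, and the sketch does not cover the physicochemical correlation terms of PseDNC or the SVM-based feature ranking, so it should not be taken as the authors' implementation.

```python
from itertools import product


def kmer_composition(seq, k):
    """Return normalised k-mer frequencies of `seq` over the alphabet ACGT.

    The output vector follows lexicographic k-mer order (AA, AC, AG, ...).
    Windows containing characters outside ACGT are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k].upper()
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # No valid window: return an all-zero feature vector.
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]
```

Feature vectors for several values of k could be concatenated before being passed to a classifier; that fusion step is left out here.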

2.

Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

Samuel A Andersson Jens Lagergren 《Journal of computational biology》2007,14(5):682-697

In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still has a performance similar to the original version.

3.

Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

Bo Chen Minhua Chen John Paisley Aimee Zaas Christopher Woods Geoffrey S Ginsburg Alfred HeroIII Joseph Lucas David Dunson Lawrence Carin 《BMC bioinformatics》2010,11(1):552

Background

Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis.

4.

An image processing approach to computing distances between RNA secondary structures dot plots

Tor Ivry Shahar Michal Assaf Avihoo Guillermo Sapiro Danny Barash 《Algorithms for molecular biology : AMB》2009,4(1):1-19

Background

In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces.
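
The plain rejection sampler referred to in this abstract can be sketched as follows. This is the basic von Neumann scheme with a uniform proposal on [0, 1], not the auto-validating interval-analysis version the abstract introduces. The function name `rejection_sample`, the uniform proposal, and the requirement that the caller supplies a valid upper bound are all assumptions made for illustration.

```python
import random


def rejection_sample(density, bound, n, rng):
    """Draw n samples on [0, 1] from an unnormalised density.

    A candidate x is proposed uniformly and accepted with probability
    density(x) / bound. `bound` must satisfy density(x) <= bound on [0, 1];
    this sketch does not check that, which is the gap an auto-validating
    sampler would close.
    """
    samples = []
    while len(samples) < n:
        x = rng.random()          # uniform proposal
        u = rng.random()          # uniform acceptance variable
        if u * bound <= density(x):
            samples.append(x)
    return samples
```

For example, calling it with `density=lambda x: x * (1 - x)` and `bound=0.25` would draw from a Beta(2, 2) shape, since 0.25 is the maximum of that kernel on [0, 1].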

5.

Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning

Philippe Lemey Martin Lott Darren P Martin Vincent Moulton 《BMC bioinformatics》2009,10(1):126-18

Background

Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively less attention in the recombination detection framework. Here, we extend a quartet-mapping based recombination detection method to enable identification of recombinant sequences without prior specification of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences.

6.

On Gibbs sampling for state space models (total citations: 26; self-citations: 0; citations by others: 26)

CARTER C. K.; KOHN R. 《Biometrika》1994,81(3):541-553

We show how to use the Gibbs sampler to carry out Bayesian inference on a linear state space model with errors that are a mixture of normals and coefficients that can switch over time. Our approach simultaneously generates the whole of the state vector given the mixture and coefficient indicator variables and simultaneously generates all the indicator variables conditional on the state vectors. The states are generated efficiently using the Kalman filter. We illustrate our approach by several examples and empirically compare its performance to another Gibbs sampler where the states are generated one at a time. The empirical results suggest that our approach is both practical to implement and dominates the Gibbs sampler that generates the states one at a time.

7.

Cost effectiveness of recombinant factor VIIa for treatment of intracerebral hemorrhage

Brett M Kissela Mark H Eckman 《BMC neurology》2008,8(1):17

Background

Phase I/II placebo-controlled clinical trials of recombinant Factor VIIa (rFVIIa) suggested that administration of rFVIIa within 4 hours after onset of intracerebral hemorrhage (ICH) is safe, limits ICH growth, and improves outcomes. We sought to determine the cost-effectiveness of rFVIIa for acute ICH treatment, using published Phase II data. We hypothesized that rFVIIa would have a low marginal cost-effectiveness ratio (mCER) given the poor neurologic outcomes after ICH with conventional management.

8.

Validation of models for analysis of ranks in horse breeding evaluation

Anne Ricard Andrés Legarra 《Genetics Selection Evolution》2010,42(1):3

Background

Ranks have been used as phenotypes in the genetic evaluation of horses for a long time through the use of earnings, normal score or raw ranks. A model (the "underlying model", in which an unobservable underlying variable is responsible for ranks) exists. Recently, a full Bayesian analysis using this model was developed. In reality, moreover, competitions are structured into categories according to the technical level of difficulty linked to the technical ability of horses (horses considered to be the "best" meet their peers). The aim of this article was to validate the underlying model through simulations and to propose a more appropriate model with a mixture distribution of horses in the case of a structured competition. The simulations involved 1000 horses with 10 to 50 performances per horse and 4 to 20 horses per event with unstructured and structured competitions.

Results

The underlying model responsible for ranks performed well with unstructured competitions by drawing liabilities in the Gibbs sampler according to the following rule: the liability of each horse must be drawn in the interval formed by the liabilities of horses ranked before and after the particular horse. The estimated repeatability was the simulated one (0.25) and regression between estimated competing ability of horses and true ability was close to 1. Underestimations of repeatability (0.07 to 0.22) were obtained with other traditional criteria (normal score or raw ranks), but in the case of a structured competition, repeatability was underestimated (0.18 to 0.22). Our results show that the effect of an event, or category of event, is irrelevant in such a situation because ranks are independent of such an effect. The proposed mixture model pools horses according to their participation in different categories of competition during the period observed. This last model gave better results (repeatability 0.25), in particular, it provided an improved estimation of average values of competing ability of the horses in the different categories of events.

Conclusions

The underlying model was validated. A correct drawing of liabilities for the Gibbs sampler was provided. For a structured competition, the mixture model with a group effect assigned to horses gave the best results.

9.

Bayesian latent class models for capture–recapture in the presence of missing data

Davide Di Cecco Marco Di Zio Brunero Liseo 《Biometrical journal. Biometrische Zeitschrift》2020,62(4):957-969

We propose a method for estimating the size of a population in a multiple record system in the presence of missing data. The method is based on a latent class model where the parameters and the latent structure are estimated using a Gibbs sampler. The proposed approach is illustrated through the analysis of a data set already known in the literature, which consists of five registrations of neural tube defects.

10.

Optimal Step Length EM Algorithm (OSLEM) for the estimation of haplotype frequency and its application in lipoprotein lipase genotyping

Peisen Zhang Huitao Sheng Alfredo Morabia T Conrad Gilliam 《BMC bioinformatics》2003,4(1):3

Background

Haplotype based linkage disequilibrium (LD) mapping has become a powerful and cost-effective method for performing genetic association studies, particularly in the search for genetic markers in linkage disequilibrium with complex disease loci. Various methods (e.g. Monte-Carlo (Gibbs sampling); EM (expectation maximization); and Clark's method) have been used to estimate haplotype frequencies from routine genotyping data.
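
As a rough illustration of the EM approach named in this abstract, the sketch below estimates haplotype frequencies for two biallelic loci from unphased genotypes. It is the textbook EM update with a fixed step, not the optimal-step-length variant (OSLEM) of the title. The function name `em_haplotype_freqs` and the genotype encoding (number of copies of allele 1 at each locus) are assumptions made for this example.

```python
def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 by plain EM.

    `genotypes` is a list of (g1, g2) pairs, where g1 and g2 count the
    copies (0, 1 or 2) of allele 1 at locus 1 and locus 2. Only double
    heterozygotes (1, 1) have ambiguous phase: 00/11 or 01/10.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    fixed = [0.0] * 4      # haplotype counts from phase-unambiguous genotypes
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        a, b = alleles[g1], alleles[g2]
        for i in range(2):
            fixed[2 * a[i] + b[i]] += 1
    total = 2 * len(genotypes)
    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = [
            fixed[0] + n_double_het * w,
            fixed[1] + n_double_het * (1 - w),
            fixed[2] + n_double_het * (1 - w),
            fixed[3] + n_double_het * w,
        ]
        freqs = [c / total for c in counts]
    return freqs
```

A fixed iteration count keeps the sketch short; a practical implementation would more likely stop when the change in log-likelihood falls below a tolerance.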

11.

CRANKITE: A fast polypeptide backbone conformation sampler

Alexei A Podtelezhnikov David L Wild 《Source code for biology and medicine》2008,3(1):1-7

Background

CRANKITE is a suite of programs for simulating backbone conformations of polypeptides and proteins. The core of the suite is an efficient Metropolis Monte Carlo sampler of backbone conformations in continuous three-dimensional space in atomic details.

Methods

In contrast to other programs relying on local Metropolis moves in the space of dihedral angles, our sampler utilizes local crankshaft rotations of rigid peptide bonds in Cartesian space.

Results

The sampler allows fast simulation and analysis of secondary structure formation and conformational changes for proteins of average length.
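
The geometric core of a crankshaft move in Cartesian space, as described in the Methods above, is a rotation of atoms about an axis through two anchor points. The sketch below shows that single step using Rodrigues' rotation formula. The function name `rotate_about_axis` is hypothetical, and the Metropolis acceptance test and the choice of anchor atoms used by CRANKITE are not reproduced here.

```python
import math


def rotate_about_axis(point, a, b, angle):
    """Rotate `point` by `angle` radians about the axis running from a to b.

    Uses Rodrigues' formula: v' = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t)),
    where k is the unit axis vector and v is the point relative to a.
    """
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0:
        raise ValueError("axis points a and b must differ")
    k = [c / norm for c in axis]
    v = [point[i] - a[i] for i in range(3)]
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
        for i in range(3)
    ]
    return [a[i] + rotated[i] for i in range(3)]
```

Points lying on the axis are left unchanged, which is why the two anchor atoms stay fixed while the atoms between them move.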

12.

*omeSOM: a software for clustering and visualization of transcriptional and metabolite data mined from interspecific crosses of crop plants

Diego H Milone Georgina S Stegmayer Laura Kamenetzky Mariana López Je Min Lee James J Giovannoni Fernando Carrari 《BMC bioinformatics》2010,11(1):438

13.

Assessing the reliability of eBURST using simulated populations with known ancestry

Katherine ME Turner William P Hanage Christophe Fraser Thomas R Connor Brian G Spratt 《BMC microbiology》2007,7(1):30

Background

The program eBURST uses multilocus sequence typing data to divide bacterial populations into groups of closely related strains (clonal complexes), predicts the founding genotype of each group, and displays the patterns of recent evolutionary descent of all other strains in the group from the founder. The reliability of eBURST was evaluated using populations simulated with different levels of recombination in which the ancestry of all strains was known.

14.

Comparative genomic assessment of Multi-Locus Sequence Typing: rapid accumulation of genomic heterogeneity among clonal isolates of Campylobacter jejuni

Eduardo N Taboada Joanne M MacKinnon Christian C Luebbert Victor PJ Gannon John HE Nash Kris Rahn 《BMC evolutionary biology》2008,8(1):229

Background

Multi-Locus Sequence Typing (MLST) has emerged as a leading molecular typing method owing to its high ability to discriminate among bacterial isolates, the relative ease with which data acquisition and analysis can be standardized, and the high portability of the resulting sequence data. While MLST has been successfully applied to the study of the population structure for a number of different bacterial species, it has also provided compelling evidence for high rates of recombination in some species. We have analyzed a set of Campylobacter jejuni strains using MLST and Comparative Genomic Hybridization (CGH) on a full-genome microarray in order to determine whether recombination and high levels of genomic mosaicism adversely affect the inference of strain relationships based on the analysis of a restricted number of genetic loci.

15.

Prediction of MHC class II binding peptides based on an iterative learning model

Naveen Murugan Yang Dai 《Immunome research》2005,1(1):1-10

Background

Prediction of the binding ability of antigen peptides to major histocompatibility complex (MHC) class II molecules is important in vaccine development. The variable length of each binding peptide complicates this prediction. Motivated by a text mining model designed for building a classifier from labeled and unlabeled examples, we have developed an iterative supervised learning model for the prediction of MHC class II binding peptides.

Results

A linear programming (LP) model was employed for the learning task at each iteration, since it is fast and can re-optimize the previous classifier when the training sets are altered. The performance of the new model has been evaluated with benchmark datasets. The outcome demonstrates that the model achieves an accuracy of prediction that is competitive compared to the advanced predictors (the Gibbs sampler and TEPITOPE). The average areas under the ROC curve obtained from one variant of our model are 0.753 and 0.715 for the original and homology reduced benchmark sets, respectively. The corresponding values are respectively 0.744 and 0.673 for the Gibbs sampler and 0.702 and 0.667 for TEPITOPE.

Conclusion

The iterative learning procedure appears to be effective in prediction of MHC class II binders. It offers an alternative approach to this important prediction problem.
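
The area under the ROC curve reported in the Results above can be computed directly from classifier scores and binary labels. The sketch below uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` is hypothetical and the code is not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and 0/1 labels.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count as 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The double loop is quadratic in the number of examples; a rank-based version would scale better but is longer to write out.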

16.

Comparative performance of contact plates, electrostatic wipes, swabs and a novel sampling device for the detection of Staphylococcus aureus on environmental surfaces

J.K. Lutz J. Crawford A.E. Hoet J.R. Wilkins III J. Lee 《Journal of applied microbiology》2013,115(1):171-178

Aims

To evaluate the performance of four sampling methods [contact plates, electrostatic wipes (wipe), swabs and a novel roller sampler] for recovery of Staphylococcus aureus from a stainless steel surface.

Methods and Results

Stainless steel test plates were inoculated with Staph. aureus, dried for 24 h and sampled using each of the four methods. Samples were either incubated directly (roller, contact plate) or processed using elution and membrane filtration (swab, wipe). Performance was assessed by calculating the apparent sampling efficiency (ASE), analytical sensitivity (Sn) and percentage of replications with positive growth. The wipe demonstrated the best performance across all inoculating concentrations (ASE_48 h = 18%; Sn_48 h = 7 CFU per 100 cm²). The swab performed well when corrected for area actually sampled (ASE_48 h = 24%; Sn_48 h = 76 CFU per 100 cm²). Of the contact‐based methods, the newly developed roller sampler outperformed the contact plate (roller: ASE_48 h = 10%; Sn_48 h = 17 CFU per 100 cm²; contact plate: ASE_48 h = 0·04%; Sn_48 h = 1412 CFU per 100 cm²); both contact samplers performed better at higher inoculating concentrations (6E3 CFU per 100 cm² for the roller and 6E6 CFU per 100 cm² for the contact plate). Overall, the electrostatic wipe produced the highest number of replications resulting in positive growth (74%_24 h, 91%_48 h).

Conclusions

This study demonstrates that selection of the sampling method must be carefully considered, given that different methods have varying performance.

Significance and Impact of the Study

This is the first study assessing static wipes for sampling and one that uses a more real‐world‐relevant 24‐h drying time. The results can help infection control and environmental health professionals choose better sampling methodologies.

17.

The value of position-specific priors in motif discovery using MEME

Timothy L Bailey Mikael Bodén Tom Whitington Philip Machanick 《BMC bioinformatics》2010,11(1):179

Background

Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types (including sequence conservation, nucleosome positioning, and negative examples) can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM).

18.

A response to Yu et al. "A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array", BMC Bioinformatics 2007, 8: 145

Oscar M Rueda Ramon Diaz-Uriarte 《BMC bioinformatics》2007,8(1):394

Background

Yu et al. (BMC Bioinformatics 2007, 8: 145) have recently compared the performance of several methods for the detection of genomic amplification and deletion breakpoints using data from high-density single nucleotide polymorphism arrays. One of the methods compared is our non-homogeneous Hidden Markov Model approach. Our approach uses Markov Chain Monte Carlo for inference, but Yu et al. ran the sampler for a severely insufficient number of iterations for a Markov Chain Monte Carlo-based method. Moreover, they did not use the appropriate reference level for the non-altered state.

19.

The pharmacokinetics of the interstitial space in humans

David G Levitt 《BMC clinical pharmacology》2003,3(1):3

Background

The pharmacokinetics of extracellular solutes is determined by the blood-tissue exchange kinetics and the volume of distribution in the interstitial space in the different organs. This information can be used to develop a general physiologically based pharmacokinetic (PBPK) model applicable to most extracellular solutes.

20.

Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis

Yuanxin Xi David M Rocke 《BMC bioinformatics》2008,9(1):324

Background

We propose a statistically principled baseline correction method, derived from a parametric smoothing model. It uses a score function to describe the key features of baseline distortion and constructs an optimal baseline curve to maximize it. The parameters are determined automatically by using LOWESS (locally weighted scatterplot smoothing) regression to estimate the noise variance.