Similar literature
Found 20 similar articles; search took 31 ms
1.

Background

Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated method for reliably and quickly identifying recombination spots is thus urgently needed.

Results

Here we propose a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). Recursive feature extraction with a linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract the optimal features. An SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was run on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (by 0.37% to 3.79%) than those of state-of-the-art tools.

Conclusions

Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
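As a rough illustration of the composition features mentioned in this abstract, the following minimal sketch computes plain mononucleotide and dinucleotide frequencies and concatenates them. It is not the authors' PseNAC or PseDNC implementation (PseDNC also needs physicochemical properties of dinucleotides, which are omitted here), and the function names `kmer_composition` and `fused_features` are hypothetical. The feature ranking and SVM steps are left out.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers over ACGT.

    Frequencies are listed in lexicographic k-mer order (AA, AC, AG, ...).
    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def fused_features(seq):
    """Concatenate mononucleotide (4) and dinucleotide (16) frequencies.

    Illustrative stand-in for feature fusion; assumption, not the paper's code.
    """
    return kmer_composition(seq, 1) + kmer_composition(seq, 2)
```

A vector such as `fused_features("ACGTACGT")` could then be passed to any classifier; which features survive ranking depends on the training data.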

2.
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient for both synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still has a performance similar to the original version.

3.

Background  

Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis.

4.

Background  

In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces.
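For readers unfamiliar with the plain von Neumann rejection sampler that this abstract builds on, here is a minimal sketch. It assumes the caller supplies a constant `bound` such that the target density never exceeds `bound` times the proposal density; the paper's auto-validating version instead obtains such bounds rigorously via interval analysis, which is not shown. All names (`rejection_sample`, `target_density`, and so on) are illustrative.

```python
import random


def rejection_sample(target_density, proposal_sample, proposal_density, bound, rng):
    """Draw one sample from an (unnormalized) target density.

    Assumes target_density(x) <= bound * proposal_density(x) for every x;
    if that assumption fails, the output is no longer distributed correctly.
    """
    while True:
        x = proposal_sample(rng)            # candidate from the proposal
        u = rng.random()                    # uniform on [0, 1)
        if u * bound * proposal_density(x) <= target_density(x):
            return x                        # accept; otherwise try again


def example_draws(n, seed=0):
    """Sample the unnormalized density x * (1 - x) on [0, 1] (a Beta(2, 2) shape)."""
    rng = random.Random(seed)
    return [
        rejection_sample(
            target_density=lambda x: x * (1.0 - x),
            proposal_sample=lambda r: r.random(),   # uniform proposal
            proposal_density=lambda x: 1.0,
            bound=0.25,                             # max of x * (1 - x)
            rng=rng,
        )
        for _ in range(n)
    ]
```

The acceptance rate equals the area under the target divided by `bound`, so a loose bound wastes proposals but does not bias the samples.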

5.

Background  

Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively less attention in the recombination detection framework. Here, we extend a quartet-mapping based recombination detection method to enable identification of recombinant sequences without prior specification of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences.

6.
On Gibbs sampling for state space models   Cited by: 26 (self-citations: 0; citations by others: 26)
Carter C. K.; Kohn R. Biometrika, 1994, 81(3): 541-553
We show how to use the Gibbs sampler to carry out Bayesian inference on a linear state space model with errors that are a mixture of normals and coefficients that can switch over time. Our approach simultaneously generates the whole of the state vector given the mixture and coefficient indicator variables and simultaneously generates all the indicator variables conditional on the state vectors. The states are generated efficiently using the Kalman filter. We illustrate our approach by several examples and empirically compare its performance to another Gibbs sampler where the states are generated one at a time. The empirical results suggest that our approach is both practical to implement and dominates the Gibbs sampler that generates the states one at a time.

7.

Background  

Phase I/II placebo-controlled clinical trials of recombinant Factor VIIa (rFVIIa) suggested that administration of rFVIIa within 4 hours after onset of intracerebral hemorrhage (ICH) is safe, limits ICH growth, and improves outcomes. We sought to determine the cost-effectiveness of rFVIIa for acute ICH treatment, using published Phase II data. We hypothesized that rFVIIa would have a low marginal cost-effectiveness ratio (mCER) given the poor neurologic outcomes after ICH with conventional management.

8.

Background

Ranks have long been used as phenotypes in the genetic evaluation of horses, through earnings, normal scores or raw ranks. A model exists (the "underlying model") in which an unobservable underlying variable is responsible for the ranks. Recently, a full Bayesian analysis using this model was developed. In practice, however, competitions are structured into categories according to the technical level of difficulty, which is linked to the technical ability of the horses (horses considered to be the "best" meet their peers). The aim of this article was to validate the underlying model through simulations and to propose a more appropriate model with a mixture distribution of horses in the case of a structured competition. The simulations involved 1000 horses with 10 to 50 performances per horse and 4 to 20 horses per event, with unstructured and structured competitions.

Results

The underlying model responsible for ranks performed well with unstructured competitions when liabilities were drawn in the Gibbs sampler according to the following rule: the liability of each horse must be drawn in the interval formed by the liabilities of the horses ranked immediately before and after it. The estimated repeatability matched the simulated value (0.25), and the regression between the estimated competing ability of horses and their true ability was close to 1. Underestimations of repeatability (0.07 to 0.22) were obtained with other traditional criteria (normal score or raw ranks), but in the case of a structured competition, repeatability was underestimated (0.18 to 0.22). Our results show that the effect of an event, or category of event, is irrelevant in such a situation because ranks are independent of such an effect. The proposed mixture model pools horses according to their participation in different categories of competition during the period observed. This last model gave better results (repeatability 0.25); in particular, it provided an improved estimation of the average values of competing ability of the horses in the different categories of events.

Conclusions

The underlying model was validated. A correct drawing of liabilities for the Gibbs sampler was provided. For a structured competition, the mixture model with a group effect assigned to horses gave the best results.
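The liability-drawing rule described in this abstract (each horse's liability is drawn between the liabilities of its neighbours in the ranking) can be sketched as a truncated-normal update. This is only an illustration under stated assumptions: liabilities are taken to be normal around a per-horse mean with a common standard deviation, and the helper names `draw_liability` and `update_event` are hypothetical rather than taken from the paper.

```python
import random
from statistics import NormalDist


def draw_liability(mean, sd, lower, upper, rng):
    """Sample N(mean, sd^2) truncated to (lower, upper) by inverting the CDF.

    A bound of None means the interval is open on that side.
    """
    dist = NormalDist(mean, sd)
    lo = dist.cdf(lower) if lower is not None else 0.0
    hi = dist.cdf(upper) if upper is not None else 1.0
    u = lo + (hi - lo) * rng.random()
    # Keep u strictly inside (0, 1) because inv_cdf rejects 0 and 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


def update_event(liabilities, means, sd, rng):
    """One sweep over an event; lists are ordered from winner to last place.

    Each liability is redrawn between the current liabilities of the horse
    ranked just ahead (upper bound) and just behind (lower bound).
    The starting liabilities must already respect the observed ranking.
    """
    n = len(liabilities)
    for i in range(n):
        upper = liabilities[i - 1] if i > 0 else None
        lower = liabilities[i + 1] if i < n - 1 else None
        liabilities[i] = draw_liability(means[i], sd, lower, upper, rng)
    return liabilities
```

In a full sampler the `means` would be refreshed from the current estimates of each horse's ability between sweeps; here they are simply inputs.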

9.
We propose a method for estimating the size of a population in a multiple record system in the presence of missing data. The method is based on a latent class model where the parameters and the latent structure are estimated using a Gibbs sampler. The proposed approach is illustrated through the analysis of a data set already known in the literature, which consists of five registrations of neural tube defects.

10.

Background  

Haplotype based linkage disequilibrium (LD) mapping has become a powerful and cost-effective method for performing genetic association studies, particularly in the search for genetic markers in linkage disequilibrium with complex disease loci. Various methods (e.g. Monte-Carlo (Gibbs sampling); EM (expectation maximization); and Clark's method) have been used to estimate haplotype frequencies from routine genotyping data.

11.

Background

CRANKITE is a suite of programs for simulating backbone conformations of polypeptides and proteins. The core of the suite is an efficient Metropolis Monte Carlo sampler of backbone conformations in continuous three-dimensional space in atomic details.

Methods

In contrast to other programs relying on local Metropolis moves in the space of dihedral angles, our sampler utilizes local crankshaft rotations of rigid peptide bonds in Cartesian space.

Results

The sampler allows fast simulation and analysis of secondary structure formation and conformational changes for proteins of average length.
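The Metropolis acceptance rule at the core of such samplers can be sketched on a toy problem. The example below uses a one-dimensional energy and a uniform random displacement as the local move; it does not implement CRANKITE's crankshaft rotations or any protein geometry, and the function name `metropolis` and its parameters are illustrative assumptions.

```python
import math
import random


def metropolis(energy, x0, step, n_steps, temperature, rng):
    """Metropolis Monte Carlo on a scalar coordinate.

    A candidate is proposed by a symmetric uniform move of width `step`;
    it is accepted with probability min(1, exp(-dE / temperature)).
    Returns the chain of visited states, including repeats after rejections.
    """
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        cand = x + rng.uniform(-step, step)
        e_cand = energy(cand)
        # Downhill moves are always accepted; uphill moves sometimes.
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / temperature):
            x, e = cand, e_cand
        samples.append(x)
    return samples
```

With `energy = lambda x: 0.5 * x * x` and `temperature = 1.0`, the chain targets a standard normal distribution, which gives a simple way to sanity-check the acceptance rule.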

12.
13.

Background  

The program eBURST uses multilocus sequence typing data to divide bacterial populations into groups of closely related strains (clonal complexes), predicts the founding genotype of each group, and displays the patterns of recent evolutionary descent of all other strains in the group from the founder. The reliability of eBURST was evaluated using populations simulated with different levels of recombination in which the ancestry of all strains was known.

14.

Background  

Multi-Locus Sequence Typing (MLST) has emerged as a leading molecular typing method owing to its high ability to discriminate among bacterial isolates, the relative ease with which data acquisition and analysis can be standardized, and the high portability of the resulting sequence data. While MLST has been successfully applied to the study of the population structure for a number of different bacterial species, it has also provided compelling evidence for high rates of recombination in some species. We have analyzed a set of Campylobacter jejuni strains using MLST and Comparative Genomic Hybridization (CGH) on a full-genome microarray in order to determine whether recombination and high levels of genomic mosaicism adversely affect the inference of strain relationships based on the analysis of a restricted number of genetic loci.

15.

Background

Prediction of the binding ability of antigen peptides to major histocompatibility complex (MHC) class II molecules is important in vaccine development. The variable length of each binding peptide complicates this prediction. Motivated by a text mining model designed for building a classifier from labeled and unlabeled examples, we have developed an iterative supervised learning model for the prediction of MHC class II binding peptides.

Results

A linear programming (LP) model was employed for the learning task at each iteration, since it is fast and can re-optimize the previous classifier when the training sets are altered. The performance of the new model has been evaluated with benchmark datasets. The outcome demonstrates that the model achieves an accuracy of prediction that is competitive compared to the advanced predictors (the Gibbs sampler and TEPITOPE). The average areas under the ROC curve obtained from one variant of our model are 0.753 and 0.715 for the original and homology reduced benchmark sets, respectively. The corresponding values are respectively 0.744 and 0.673 for the Gibbs sampler and 0.702 and 0.667 for TEPITOPE.

Conclusion

The iterative learning procedure appears to be effective in the prediction of MHC class II binders. It offers an alternative approach to this important prediction problem.
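The area under the ROC curve used to compare predictors in this abstract can be computed directly from scores and binary labels. The sketch below uses the pairwise (Mann-Whitney) formulation; it is a generic illustration, not the evaluation code behind the reported values, and the name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for binders, 0 for non-binders.
    The AUC is the fraction of (binder, non-binder) pairs in which the binder
    receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in the number of peptides, which is fine for small benchmark sets; a rank-based version would be preferable for large ones.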

16.

Aims

To evaluate the performance of four sampling methods [contact plates, electrostatic wipes (wipe), swabs and a novel roller sampler] for recovery of Staphylococcus aureus from a stainless steel surface.

Methods and Results

Stainless steel test plates were inoculated with Staph. aureus, dried for 24 h and sampled using each of the four methods. Samples were either incubated directly (roller, contact plate) or processed using elution and membrane filtration (swab, wipe). Performance was assessed by calculating the apparent sampling efficiency (ASE), analytical sensitivity (Sn) and percentage of replications with positive growth. The wipe demonstrated the best performance across all inoculating concentrations (ASE at 48 h = 18%; Sn at 48 h = 7 CFU per 100 cm²). The swab performed well when corrected for area actually sampled (ASE at 48 h = 24%; Sn at 48 h = 76 CFU per 100 cm²). Of the contact-based methods, the newly developed roller sampler outperformed the contact plate (roller: ASE at 48 h = 10%; Sn at 48 h = 17 CFU per 100 cm²; contact plate: ASE at 48 h = 0.04%; Sn at 48 h = 1412 CFU per 100 cm²); both contact samplers performed better at higher inoculating concentrations (6 × 10³ CFU per 100 cm² for the roller and 6 × 10⁶ CFU per 100 cm² for the contact plate). Overall, the electrostatic wipe produced the highest number of replications resulting in positive growth (74% at 24 h, 91% at 48 h).

Conclusions

This study demonstrates that selection of the sampling method must be carefully considered, given that different methods have varying performance.

Significance and Impact of the Study

This is the first study assessing electrostatic wipes for sampling and one that uses a more real-world-relevant 24-h drying time. The results help infection control and environmental health professionals choose better sampling methodologies.

17.

Background  

Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM).

18.

Background  

Yu et al. (BMC Bioinformatics 2007, 8: 145+) have recently compared the performance of several methods for the detection of genomic amplification and deletion breakpoints using data from high-density single nucleotide polymorphism arrays. One of the methods compared is our non-homogeneous Hidden Markov Model approach. Our approach uses Markov Chain Monte Carlo for inference, but Yu et al. ran the sampler for far too few iterations for a Markov Chain Monte Carlo-based method. Moreover, they did not use the appropriate reference level for the non-altered state.

19.

Background  

The pharmacokinetics of extracellular solutes is determined by the blood-tissue exchange kinetics and the volume of distribution in the interstitial space in the different organs. This information can be used to develop a general physiologically based pharmacokinetic (PBPK) model applicable to most extracellular solutes.

20.

Background  

We propose a statistically principled baseline correction method, derived from a parametric smoothing model. It uses a score function to describe the key features of baseline distortion and constructs an optimal baseline curve to maximize it. The parameters are determined automatically by using LOWESS (locally weighted scatterplot smoothing) regression to estimate the noise variance.
