Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Inference of bacterial microevolution using multilocus sequence data   Total citations: 5, self-citations: 0, citations by others: 5
Didelot X  Falush D 《Genetics》2007,175(3):1251-1266
We describe a model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance. The key assumption of our model is that recombination events introduce a constant rate of substitutions to a contiguous region of sequence. The method is applicable both to multilocus sequence typing (MLST) data from a few loci and to alignments of multiple bacterial genomes. It can be used to decide whether a subset of isolates share common ancestry, to estimate the age of the common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. It should also be useful in associating particular genetic events with the changes in phenotype that they cause. We show that the model outperforms existing methods of subdividing recombinogenic bacteria using MLST data and provide examples from Salmonella and Bacillus. The software used in this article, ClonalFrame, is available from http://bacteria.stats.ox.ac.uk/.

2.
3.
Predicting gene expression from sequence   Total citations: 36, self-citations: 0, citations by others: 36
Beer MA  Tavazoie S 《Cell》2004,117(2):185-198

4.
5.
6.
7.
8.
The main objective of this study was to develop feasible, easy to apply models for early prediction of full flowering (FF) and maturing (MA) in apricot (Prunus armeniaca L.). Phenological data for 20 apricot cultivars grown in the Belgrade region were modeled against averages of daily temperature records over ten seasons for FF and eight seasons for MA. A much stronger correlation was found between the phenological timing and temperature at the very beginning than at the end of phenophases. Also, the length of developmental periods was better correlated to daily maximum than to daily minimum and mean air temperatures. Using prediction models based on daily maximum temperatures averaged over 30-, 45- and 60-day periods, starting from 1 January for FF prediction and from the date of FF for MA prediction, the onset of examined phenophases in apricot cultivars could be predicted from a few weeks to up to 2 months ahead with acceptable accuracy. The mean absolute differences between the observations and cross-validated predictions obtained by 30-, 45- and 60-day models were 8.6, 6.9 and 5.7 days for FF and 6.1, 3.6 and 2.8 days for MA, respectively. The validity of the results was confirmed using an independent data set for the year 2009.
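The modeling idea in this abstract (predict a phenophase date from daily maximum temperature averaged over a fixed window, then report the mean absolute difference under cross-validation) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the model form, so a simple linear fit and leave-one-season-out cross-validation are assumed here, and all function names and data values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def window_mean_tmax(daily_tmax, days):
    """Average the first `days` daily maximum temperatures of a series."""
    return mean(daily_tmax[:days])


def loo_mean_abs_error(xs, ys):
    """Leave-one-season-out CV; mean |observed - predicted| in days."""
    errors = []
    for i in range(len(xs)):
        # Fit on every season except season i, then predict season i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(ys[i] - (a + b * xs[i])))
    return mean(errors)


# Hypothetical seasons (not from the study): 30-day mean Tmax in deg C
# counted from 1 January, and the observed full-flowering day of year.
MEAN_TMAX_30 = [4.0, 6.5, 3.2, 8.1, 5.5, 7.0, 2.8, 6.0, 4.8, 7.6]
FLOWERING_DOY = [92, 84, 95, 78, 87, 82, 97, 85, 90, 80]

if __name__ == "__main__":
    print(round(loo_mean_abs_error(MEAN_TMAX_30, FLOWERING_DOY), 2))
```

Repeating the same calculation with 45- and 60-day windows would give the kind of window-by-window comparison the abstract reports.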

9.
The ability to infer relationships between groups of sequences, either by searching for their evolutionary history or by comparing their sequence similarity, can be a crucial step in hypothesis testing. Interpreting relationships of human immunodeficiency virus type 1 (HIV-1) sequences can be challenging because of their rapidly evolving genomes, but it may also lead to a better understanding of the underlying biology. Several studies have focused on the evolution of HIV-1, but there is little information to link sequence similarities and evolutionary histories of HIV-1 to the epidemiological information of the infected individual. Our goal was to correlate patterns of HIV-1 genetic diversity with epidemiological information, including risk and demographic factors. These correlations were then used to predict epidemiological information through analyzing short stretches of HIV-1 sequence. Using standard phylogenetic and phenetic techniques on 100 HIV-1 subtype B sequences, we were able to show some correlation between the viral sequences and the geographic area of infection and the risk of men who engage in sex with men. To help identify more subtle relationships between the viral sequences, the method of multidimensional scaling (MDS) was performed. That method identified statistically significant correlations between the viral sequences and the risk factors of men who engage in sex with men and individuals who engage in sex with injection drug users or use injection drugs themselves. Using tree construction, MDS, and newly developed likelihood assignment methods on the original 100 samples we sequenced, and also on a set of blinded samples, we were able to predict demographic/risk group membership at a rate statistically better than by chance alone. Such methods may make it possible to identify viral variants belonging to specific demographic groups by examining only a small portion of the HIV-1 genome. 
Such predictions of demographic epidemiology based on sequence information may become valuable in assigning different treatment regimens to infected individuals.

10.
A method is presented that uses beta-strand interactions to predict the parallel right-handed beta-helix super-secondary structural motif in protein sequences. A program called BetaWrap implements this method and is shown to score known beta-helices above non-beta-helices in the Protein Data Bank in cross-validation. It is demonstrated that BetaWrap learns each of the seven known SCOP beta-helix families, when trained primarily on beta-structures that are not beta-helices, together with structural features of known beta-helices from outside the family. BetaWrap also predicts many bacterial proteins of unknown structure to be beta-helices; in particular, these proteins serve as virulence factors, adhesins, and toxins in bacterial pathogenesis and include cell surface proteins from Chlamydia and the intestinal bacterium Helicobacter pylori. The computational method used here may generalize to other beta-structures for which strand topology and profiles of residue accessibility are well conserved.

11.
Although much of the information regarding genes' expression is encoded in the genome, deciphering such information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to the 73% accuracy of BT. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.
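A naïve Bayes classifier over per-gene motif matching scores, as described in this abstract, might look like the sketch below. It assumes Gaussian class-conditional densities for each score, which the abstract does not specify, and the function names, cluster labels and score values are all hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mu, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for value, (mu, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical motif matching scores (two motifs per gene) and expression clusters.
SCORES = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
CLUSTERS = ["A", "A", "B", "B"]

if __name__ == "__main__":
    nb_model = train_gaussian_nb(SCORES, CLUSTERS)
    print(predict(nb_model, [0.15, 0.15]))
```

Note that the classifier sees only the scores themselves, with no position or orientation features, which matches the restriction described in the abstract.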

12.
In this postgenomic era, it is no longer necessary to argue the need for automated methods for sequence annotation. Many researchers have designed tools for analyzing DNA sequences, but running multiple tools and interpreting the results can be tedious and confusing. In the last few years, many analysis workbenches have been developed to help streamline the process of sequence annotation. Genotator, developed in 1996, is still a popular choice owing to its ease of use and its configurability. This article will review annotating sequence data using Genotator.

13.
Using DNA sequence data from pathogens to infer transmission networks has traditionally been done in the context of epidemics and outbreaks. Sequence data could analogously be applied to cases of ubiquitous commensal bacteria; however, instead of inferring chains of transmission to track the spread of a pathogen, sequence data for bacteria circulating in an endemic equilibrium could be used to infer information about host contact networks. Here, we show, using simulated data, that multilocus DNA sequence data, based on multilocus sequence typing schemes (MLST), from isolates of commensal bacteria can be used to infer both local and global properties of the contact networks of the populations being sampled. Specifically, for MLST data simulated from small-world networks, the small-world parameter controlling the degree of structure in the contact network can robustly be estimated. Moreover, we show that pairwise distances in the network (degrees of separation) correlate with genetic distances between isolates, so that how far apart two individuals in the network are can be inferred from MLST analysis of their commensal bacteria. This result has important consequences, and we show an example from epidemiology: how this result could be used to test for infectious origins of diseases of unknown etiology.
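The two quantities this abstract relates, degrees of separation in a small-world contact network and genetic distance between MLST profiles, can be computed with a short sketch like the one below. The abstract does not name its network generator, so a Watts-Strogatz style ring lattice with rewiring probability `p` is assumed here; the function names and allelic profiles are hypothetical.

```python
import random
from collections import deque


def ring_lattice_small_world(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours per side;
    each edge is rewired to a random node with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            if rng.random() < p:
                # Rewire to a node that is neither i nor already a neighbour of i.
                choices = [c for c in range(n) if c != i and c not in adj[i]]
                if choices:
                    j = rng.choice(choices)
            adj[i].add(j)
            adj[j].add(i)
    return adj


def degrees_of_separation(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def allelic_distance(profile_a, profile_b):
    """Number of MLST loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


if __name__ == "__main__":
    network = ring_lattice_small_world(n=20, k=2, p=0.1)
    print(degrees_of_separation(network, 0))
    # Hypothetical seven-locus allelic profiles for two isolates.
    print(allelic_distance((1, 3, 4, 2, 2, 5, 1), (1, 3, 7, 2, 2, 6, 1)))
```

With `p = 0` the network is a regular ring and path lengths grow linearly with ring distance; raising `p` adds shortcuts and shortens typical path lengths.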

14.
15.
16.
17.
18.
MOTIVATION: Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. RESULTS: We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. AVAILABILITY: The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

19.
20.
The identification of management units (MUs) is central to the management of natural populations and is crucial for monitoring the effects of human activity upon species abundance. Here, we propose that the identification of MUs from population genetic data should be based upon the amount of genetic divergence at which populations become demographically independent instead of the current criterion that focuses on rejecting panmixia. MU status should only be assigned when the observed estimate of genetic divergence is significantly greater than a predefined threshold value. We emphasize the need for a demographic interpretation of estimates of genetic divergence given that it is often the dispersal rate of individuals that is the parameter of immediate interest to conservationists rather than the historical amount of gene flow.
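The decision rule proposed in this abstract (assign MU status only when divergence is significantly greater than a predefined threshold, rather than merely greater than zero) can be illustrated with a one-sided resampling check. The abstract does not prescribe a test, so the use of bootstrap replicates of the divergence estimate, the function name, the threshold and the significance level are all assumptions made for illustration.

```python
def significantly_above(replicates, threshold, alpha=0.05):
    """One-sided percentile check: True when fewer than a fraction `alpha`
    of the replicate divergence estimates fall at or below the threshold."""
    at_or_below = sum(1 for value in replicates if value <= threshold)
    return at_or_below / len(replicates) < alpha


if __name__ == "__main__":
    # Hypothetical bootstrap replicates of a divergence estimate (e.g. F_ST).
    clearly_divergent = [0.06] * 99 + [0.01]
    borderline = [0.03, 0.01, 0.04, 0.015]
    hypothetical_threshold = 0.02
    print(significantly_above(clearly_divergent, hypothetical_threshold))
    print(significantly_above(borderline, hypothetical_threshold))
```

The contrast with a panmixia test is that the comparison value is the threshold at which populations become demographically independent, not zero.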
