Monte Carlo feature selection for supervised classification   (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features. RESULTS: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancers, which is the case for the other methods. AVAILABILITY: Prototype available upon request.
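
The resampling idea in the abstract above (many classifiers, each trained on a random subset of samples and a random fraction of the features) can be sketched roughly as follows. This is a deliberately simplified illustration, not the authors' prototype: it assumes binary labels, swaps the tree classifier for a one-split decision stump, and scores a feature by the held-out accuracy of the stumps that chose it. The function names, the toy data generator, and all parameter values are hypothetical.

```python
import random


def fit_stump(rows, labels, features):
    """Pick the single-feature threshold rule with the most training hits.

    Returns (feature, threshold, label_above), or None if no split exists.
    A stump is an assumed stand-in for the tree classifier in the abstract.
    """
    best_hits, best_rule = -1, None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for label_above in (0, 1):
                hits = sum(
                    (label_above if row[f] > threshold else 1 - label_above) == y
                    for row, y in zip(rows, labels)
                )
                if hits > best_hits:
                    best_hits, best_rule = hits, (f, threshold, label_above)
    return best_rule


def mc_feature_ranking(rows, labels, n_features_per_draw, n_draws,
                       train_frac=0.66, seed=0):
    """Rank features by accumulated held-out accuracy over random draws."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    score = [0.0] * n_features
    indices = list(range(len(rows)))
    for _ in range(n_draws):
        # Random fraction of the features and a random train/test split.
        features = rng.sample(range(n_features), n_features_per_draw)
        rng.shuffle(indices)
        cut = int(train_frac * len(indices))
        train, test = indices[:cut], indices[cut:]
        rule = fit_stump([rows[i] for i in train],
                         [labels[i] for i in train], features)
        if rule is None:
            continue
        f, threshold, label_above = rule
        correct = sum(
            (label_above if rows[i][f] > threshold else 1 - label_above) == labels[i]
            for i in test
        )
        # Credit the chosen feature with the stump's held-out accuracy.
        score[f] += correct / len(test)
    ranking = sorted(range(n_features), key=lambda f: score[f], reverse=True)
    return ranking, score


def make_toy_data(n=60, n_features=8, seed=1):
    """Synthetic data: only feature 0 carries class information."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    rows = [
        [y + rng.gauss(0, 0.3) if j == 0 else rng.gauss(0, 1)
         for j in range(n_features)]
        for y in labels
    ]
    return rows, labels
```

Calling `mc_feature_ranking(*make_toy_data(), n_features_per_draw=3, n_draws=200)` returns a ranking and the raw scores; on this toy set the informative feature should come out on top because it wins almost every draw in which it is offered.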

We present an effective theory for water. Our goal is to formulate an accurate model for the effects of solvation on protein dynamics, without incurring the huge computational cost and the slow temporal evolution typical of molecular dynamics simulations of liquids. We replace the individual water molecules in an all-atom potential with a local dielectric density field, with self interactions given by the Landau-Ginzburg free energy and external interactions by Lennard-Jones forces at the surface of the protein atoms. We explore conformational space with finite temperature Monte Carlo dynamics, using parallel Langevin and Fourier acceleration algorithms well suited to data-parallel computer architectures such as the Connection Machine. To establish the validity of our approximations, we compare our electrostatic contribution to the solvation energy with the results of Lim, Bashford, and Karplus using a conventional static continuum dielectric cavity model, and the nonelectrostatic contributions with estimates of hydrophobic surface free energy. Our model can also accommodate ionic charges and temperature fluctuations. We propose future investigations extending our effective theory of solvation to include explicit orientational entropy and hydrogen-bonding terms. © 1995 John Wiley & Sons, Inc.

We introduce a Monte Carlo approach to combined segregation and linkage analysis of a quantitative trait observed in an extended pedigree. In conjunction with the Monte Carlo method of likelihood-ratio evaluation proposed by Thompson and Guo, the method provides for estimation and hypothesis testing. The greatest attraction of this approach is its ability to handle complex genetic models and large pedigrees. Two examples illustrate the practicality of the method. One is of simulated data on a large pedigree; the other is a reanalysis of published data previously analyzed by other methods.

Using a recently developed protein folding algorithm, a prediction of the tertiary structure of the KIX domain of the CREB binding protein is described. The method incorporates predicted secondary and tertiary restraints derived from multiple sequence alignments in a reduced protein model whose conformational space is explored by Monte Carlo dynamics. Secondary structure restraints are provided by the PHD secondary structure prediction algorithm that was modified for the presence of predicted U-turns, i.e., regions where the chain reverses global direction. Tertiary restraints are obtained via a two-step process: First, seed side-chain contacts are identified from a correlated mutation analysis, and then, a threading-based algorithm expands the number of these seed contacts. Blind predictions indicate that the KIX domain is a putative three-helix bundle, although the chirality of the bundle could not be uniquely determined. The expected root-mean-square deviation for the correct chirality of the KIX domain is between 5.0 and 6.2 Å. This is to be compared with the estimate of 12.9 Å that would be expected by a random prediction, using the model of F. Cohen and M. Sternberg (J. Mol. Biol. 138:321–333, 1980). Proteins 30:287–294, 1998. © 1998 Wiley-Liss, Inc.

Zhdanov VP, Kasemo B. Proteins 2000, 39(1):76-81
We present the results of three-dimensional lattice Monte Carlo simulations of protein diffusion on the liquid-solid interface in a wide temperature range including the most interesting temperatures (from slightly below Tf and up to Tc, where Tf and Tc are the folding and collapse temperatures). For the model under consideration (27 monomers of two types), the temperature dependence of the diffusion coefficient is found to obey the Arrhenius law with the normal value (approximately 10⁻²–10⁻³ cm²/s) of the preexponential factor. Proteins 2000;39:76-81.
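
For readers unfamiliar with the Arrhenius law mentioned above, D(T) = D0 · exp(-Ea / (kB·T)), the preexponential factor D0 and the activation energy Ea are usually read off a straight-line fit of ln D against 1/T. The sketch below shows that fit under stated assumptions: temperatures are in reduced units with kB = 1, the function name is hypothetical, and any numbers fed to it are illustrative rather than taken from the paper.

```python
import math


def arrhenius_fit(temperatures, diffusion_coeffs):
    """Least-squares line through (1/T, ln D).

    Returns (D0, Ea_over_kB): exp(intercept) and minus the slope.
    """
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(d) for d in diffusion_coeffs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

Given diffusion coefficients measured at several temperatures, `arrhenius_fit(temps, coeffs)` returns the fitted D0 and Ea/kB; systematic curvature in the ln D versus 1/T plot would indicate a departure from Arrhenius behaviour.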

V.P. Zhdanov, B. Kasemo. Proteins 1998, 30(2):168-176
Denaturation of model proteinlike molecules at the liquid–solid interface is simulated over a wide temperature range by employing the lattice Monte Carlo technique. Initially, the molecule containing 27 monomers of two types (A and B) is assumed to be adsorbed in the native folded state (a 3 × 3 × 3 cube) so that one of its sides is in contact with the surface. The details of the denaturation kinetics are found to be slightly dependent on the choice of the side, but the main qualitative conclusions hold for all the sides. In particular, the kinetics obey approximately the conventional first-order law at T > Tc (Tc is the collapse temperature for solution). With decreasing temperature, below Tc but above Tf (Tf is the folding temperature for solution), deviations appear from the first-order kinetics. For the most interesting temperatures, that is, below Tf, the denaturation kinetics are shown to be qualitatively different from the conventional ones. In particular, the denaturation process occurs via several intermediate steps due to trapping in metastable states. Mathematically, this means that (i) the transition to the denatured state of a given molecule is nonexponential, and (ii) the denaturation process cannot be described by a single rate constant kr. One should rather introduce a distribution of values of this rate constant (different values of kr correspond to the transitions to the altered state via different metastable states). Proteins 30:168–176, 1998. © 1998 Wiley-Liss, Inc.

A Caflisch, P Niederer, M Anliker. Proteins 1992, 13(3):223-230
A new two-step procedure has been developed for the docking of flexible oligopeptide chains of unknown conformation to static proteins of known structure. In the first step, positions and conformations are sampled and the association energy minimized starting from an approximate preselected docking position. The resulting conformations are further optimized in the second step by a Metropolis Monte Carlo minimization, which optimizes each of these structures. The method has been tested on the HIV-1 aspartic proteinase complex with an inhibitor, whose crystallographic structure is known at 2.3 Å resolution. Furthermore, the application of this method to the docking of the hendecapeptide 58-68 of the influenza A virus matrix protein to the HLA-A2 molecule produced results which are in agreement with experimental observations in identifying side chains critical for T cell recognition and residues responsible for MHC protein binding.

Yi N. Genetics 2004, 167(2):967-975
In this article, a unified Markov chain Monte Carlo (MCMC) framework is proposed to identify multiple quantitative trait loci (QTL) for complex traits in experimental designs, based on a composite space representation of the problem that has fixed dimension. The proposed unified approach includes the existing Bayesian QTL mapping methods using reversible jump MCMC algorithm as special cases. We also show that a variety of Bayesian variable selection methods using Gibbs sampling can be applied to the composite model space for mapping multiple QTL. The unified framework not only results in some new algorithms, but also gives useful insight into some of the important factors governing the performance of Gibbs sampling and reversible jump for mapping multiple QTL. Finally, we develop strategies to improve the performance of MCMC algorithms.

We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is aimed at, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.

Summary: It is shown how the shape of neutron-produced single event spectra for spherical cavities varies with sphere diameter and neutron energy. This variation is due to the changing contributions of the most frequently produced secondary particles to the different regions of the total single event spectrum. The latter is shown on the basis of single event spectra of protons, ¹²C-, ¹⁴N- and ¹⁶O-recoils, which have been calculated using the Monte Carlo method.

Purpose: To analyze breast screening randomized trials with a Monte Carlo simulation tool. Methods: A simulation tool previously developed to simulate breast screening programmes was adapted for that purpose. The history of women participating in the trials was simulated, including a model for survival after local treatment of invasive cancers. Distributions of time gained due to screening detection against symptomatic detection and the overall screening sensitivity were used as inputs. Several randomized controlled trials were simulated. Except for the age range of women involved, all simulations used the same population characteristics, which made it possible to analyze their external validity. The relative risks obtained were compared to those quoted for the trials, whose internal validity was addressed by further investigating the reasons for the disagreements observed. Results: The Monte Carlo simulations produce results that are in good agreement with most of the randomized trials analyzed, thus indicating their methodological quality and external validity. A reduction in breast cancer mortality of around 20% appears to be a reasonable value according to the results of the trials that are methodologically correct. Discrepancies observed with the Canada I and II trials may be attributed to low mammography quality and some methodological problems. The Kopparberg trial appears to show low methodological quality. Conclusion: Monte Carlo simulations are a powerful tool to investigate breast screening controlled randomized trials, helping to establish those whose results are reliable enough to be extrapolated to other populations and to design the trial strategies and, eventually, adapt them during their development.

MOTIVATION: Target selection strategies for structural genomic projects must be able to prioritize gene regions on the basis of significant sequence similarity with proteins that have already been structurally determined. With the rapid development of protein comparison software a robust prioritization scheme should be independent of the choice of algorithm and be able to incorporate different sequence similarity thresholds. RESULTS: A robust target selection strategy has been developed that can assign a priority level to all genes in any genome. Structural assignments to genome sequences are calculated at two thresholds and six levels (1-6) describe the prioritization of all whole genes and partial gene regions. This simple two-threshold approach can be implemented with any fold recognition or homology detection algorithms. The results for 10 genomes are presented using the SSEARCH and PSI-BLAST programs. AVAILABILITY: Programs are available on request from the authors.

Zhang H. Proteins 1999, 34(4):464-471
A new Hybrid Monte Carlo (HMC) algorithm has been developed to test protein potential functions and, ultimately, refine protein structures. The main principle of this algorithm is that, in each cycle, a new trial conformation is generated by carrying out a short period of molecular dynamics (MD) iterations with a set of random parameters (including the MD time step, the number of MD steps, the MD temperature, and the seed for initial MD velocity assignment); the new conformation is then accepted or rejected on the basis of the Metropolis criterion. The novelty in this paper is that the potential in MD iterations is different from that in the MC step. In the former, it is a molecular mechanics potential; in the latter, it is a knowledge-based potential (KBP). Directed by the KBP, the MD iteration is used to search conformational space for realistic conformations with low KBP energy. It circumvents the difficulty in using KBP functions directly in MD simulation, as KBP functions are typically incomplete, and do not always have continuous derivatives required for the calculation of the forces. The new algorithm has been tested in explorations of conformational space. In these test calculations the KBP energy was found to drop below the value for the native conformation, and the correlation between the root mean square deviation (RMSD) and the KBP energy was shown to be different from the test results in other references. At the present time, the algorithm is useful for testing new KBP functions. Furthermore, if a KBP function can be found for which the native conformation has the lowest energy and the energy/RMSD correlation is good, then this new algorithm also will be a tool for refinement of the theory-based structural models.
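
The cycle described above (short MD run with randomized parameters under one potential, Metropolis test under another) can be illustrated on a one-dimensional toy problem. Everything concrete here is an assumption made for the sketch: the harmonic "molecular mechanics" force, the non-differentiable `kbp_energy` standing in for a knowledge-based potential, the parameter ranges, and the function names. Because the two potentials differ, the sketch is a search procedure and makes no claim to sample a Boltzmann distribution.

```python
import math
import random


def md_force(x):
    """Force from a smooth stand-in MD potential U_md(x) = 0.5 * x**2."""
    return -x


def kbp_energy(x):
    """Stand-in knowledge-based potential; note the kink at x = 1."""
    return abs(x - 1.0)


def short_md(x, rng):
    """Velocity-Verlet run with randomly drawn step, length and temperature."""
    dt = rng.uniform(0.01, 0.1)
    n_steps = rng.randint(5, 20)
    md_temperature = rng.uniform(0.5, 2.0)
    v = rng.gauss(0.0, math.sqrt(md_temperature))  # unit mass, kB = 1
    f = md_force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f
        x += dt * v
        f = md_force(x)
        v += 0.5 * dt * f
    return x


def hybrid_mc(x0, n_cycles, mc_temperature=0.2, seed=0):
    """Propose with MD under U_md, accept with Metropolis on the KBP energy."""
    rng = random.Random(seed)
    x, e = x0, kbp_energy(x0)
    best_x, best_e = x, e
    for _ in range(n_cycles):
        trial = short_md(x, rng)
        e_trial = kbp_energy(trial)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / mc_temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

The point of the design is visible in `kbp_energy`: its derivative is undefined at the minimum, so it could not drive the MD integrator directly, yet it can still steer the search through the acceptance test.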

A subcloning strategy for DNA sequence analysis.   (total citations: 50; self-citations: 15; citations by others: 35)
We describe here a new strategy of fragment preparation for sequencing procedures using end-labelled DNA fragments as substrates (2,3) which is directly applicable to DNA fragments cloned into the Pst I site of pBR322, or in modified form, to inserts into the BamH I or Sal I site of the same plasmid. Ordered sets of subclones of predetermined overlap are generated. These can be sequenced directly without further strand- or fragment separation steps.



The engineering of fusion proteins has become increasingly important and most recently has formed the basis of many biosensors, protein purification systems, and classes of new drugs. Currently, most fusion proteins consist of three or fewer domains; however, more sophisticated designs could easily involve three or more domains. Using traditional subcloning strategies, this requires micromanagement of restriction enzyme sites that results in complex workaround solutions, if any at all.


Therefore, to aid in the efficient construction of fusion proteins involving multiple domains, we have created a new expression vector that allows us to rapidly generate a library of cassettes. Cassettes have a standard vector structure based on four specific restriction endonuclease sites and using a subtle property of blunt or compatible cohesive end restriction enzymes, they can be fused in any order and number of times. Furthermore, the insertion of PCR products into our expression vector or the recombination of cassettes can be dramatically simplified by screening for the presence or absence of fluorescence.


Finally, the utility of this new strategy was demonstrated by the creation of basic cassettes for protein targeting to subcellular organelles and for protein purification using multiple affinity tags.

CD1d-deficient (CD1d-/-) mouse lymphocytes were analyzed to classify the natural killer T (NKT) cells without reactivity to CD1d. The cells bearing a Vα19.1-Jα26 (AV19-AJ33) invariant TCR α chain, originally found in the peripheral blood lymphocytes, were demonstrated to be abundant in the NK1.1+ but not NK1.1- T cell population isolated from CD1d-/- mice. Moreover, more than half (11/21) of the hybrid cell lines established from CD1d-/- NKT cells expressed the Vα19.1-Jα26 invariant TCR α chain. The expression of the invariant Vα19.1-Jα26 mRNA was absent in β2-microglobulin-deficient mice. Collectively, the present findings suggest the presence of a second NKT cell repertoire characterized by an invariant TCR α chain (Vα19.1-Jα26) that is selected by an MHC class I-like molecule other than CD1d.

Amino acid sequences have already been examined in some detail in order to relate them to structural aspects, homology and gene duplication. This report introduces the concept of internal uniqueness of tripeptides within protein sequences and uses the Monte Carlo method to study this property. Some idea of internal uniqueness may be obtained from such an analysis using only a single sequence if the probability of the random occurrence is about 0.001 or less. This method of analysis is similar to that used in quantitative evaluations of homology. When the probability of the random occurrence is larger than 0.001 a homologous group of sequences is required and the random probabilities may be compared with the real occurrences within the group. From such an examination insulin and cytochrome c are identified as protein sequences with high internal uniqueness. A comparison of data from internal uniqueness and gene duplication analyses shows that these two properties need not be related. Results of the analysis point to internal uniqueness as an additional parameter for inclusion in speculations on why twenty amino acids are coded in protein structure.
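
The abstract above does not spell out how internal uniqueness is scored, so the following is only one plausible reading, offered as a sketch. It assumes that uniqueness is measured by how few overlapping tripeptides are repeated within a sequence, and that the "probability of the random occurrence" is estimated by shuffling the residues (which preserves composition) and counting how often a shuffled sequence is at least as repeat-free as the real one. Both definitions and both function names are hypothetical.

```python
import random


def repeated_tripeptides(seq):
    """Number of overlapping tripeptide positions that repeat an earlier one."""
    triples = [seq[i:i + 3] for i in range(len(seq) - 2)]
    return len(triples) - len(set(triples))


def uniqueness_p_value(seq, n_shuffles=1000, seed=0):
    """Monte Carlo estimate of P(shuffled sequence has <= observed repeats).

    Returns (observed_repeats, estimated_probability).
    """
    rng = random.Random(seed)
    observed = repeated_tripeptides(seq)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if repeated_tripeptides("".join(residues)) <= observed:
            hits += 1
    return observed, hits / n_shuffles
```

Resolving probabilities near the 0.001 level quoted in the abstract would need well over a thousand shuffles; the default here is kept small only to keep the illustration quick.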

T Noguti, N Go. Biopolymers 1985, 24(3):527-546
A powerful Monte Carlo method is described to simulate thermal conformational fluctuations in native proteins by using an empirical conformational energy function in which bond lengths and bond angles are kept fixed and only dihedral angles are independent variables. In this method, collective variables corresponding to eigenvectors of the second-derivative matrix of the energy function at its minimum point are scaled according to corresponding eigenvalues in such a way that the energy function in terms of the scaled collective variables is isotropic at the minimum point. Simulation is carried out with an isotropic step size in the space of these scaled collective variables. This simulation method is applied to a small protein, bovine pancreatic trypsin inhibitor (BPTI), and its model harmonic system defined by a quadratic energy function with the same second-derivative matrix as that of BPTI at its minimum point. Efficiency of the simulation method with an isotropic step size in the space of the scaled collective variables is found to be about 500–50 times greater than the conventional method with an isotropic step in the space of the usual nonscaled variables. One step of this new method generates conformational changes that occur in the real-time range of 0.05 ps. In a record of 5 × 10⁵-step simulation, the BPTI molecule is observed to migrate beyond a single minimum-energy region.
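
The scaling idea above can be seen on a toy harmonic system. The sketch assumes the coordinates are already the eigenvectors of the second-derivative matrix, so the energy is E = 0.5 · Σ k_i q_i² with eigenvalues k_i; dividing each trial displacement by sqrt(k_i) is then equivalent to taking an isotropic step in the scaled variables s_i = sqrt(k_i) · q_i. The eigenvalues, step size, temperature (kB = 1) and function names are illustrative choices, not values from the paper.

```python
import math
import random


def energy(q, k):
    """Harmonic energy in eigenvector coordinates."""
    return 0.5 * sum(ki * qi * qi for ki, qi in zip(k, q))


def metropolis_harmonic(k, n_steps, step, scaled, temperature=1.0, seed=0):
    """Metropolis walk; returns (acceptance rate, mean of q_i**2 per mode)."""
    rng = random.Random(seed)
    q = [0.0] * len(k)
    e = 0.0
    accepted = 0
    sum_sq = [0.0] * len(k)
    for _ in range(n_steps):
        # Scaled run: displacement of mode i shrinks by sqrt(k_i).
        trial = [
            qi + rng.uniform(-step, step) / (math.sqrt(ki) if scaled else 1.0)
            for qi, ki in zip(q, k)
        ]
        e_trial = energy(trial, k)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            q, e = trial, e_trial
            accepted += 1
        for i, qi in enumerate(q):
            sum_sq[i] += qi * qi
    return accepted / n_steps, [s / n_steps for s in sum_sq]
```

With one soft and one stiff mode, for example `k = [1.0, 10000.0]`, an unscaled step large enough to move the soft mode is almost always rejected by the stiff one, whereas the scaled walk moves both modes at a comparable rate relative to their thermal widths.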

