Related Articles
20 related articles found.
1.
Summary: Nonlinear sampling along the constant-time dimension is applied to the constant-time HNCO spectrum of the dimerization domain of Gal4. Nonlinear sampling was used for the nitrogen dimension, while the carbon and proton dimensions were sampled linearly. A conventional ct-HNCO spectrum is compared with a nonlinearly sampled spectrum, where the gain in experiment time obtained from nonlinear sampling is used to increase the resolution in the carbonyl dimension. Nonlinearly sampled data are processed by maximum entropy reconstruction. It is shown that the nonlinearly sampled spectrum has a higher resolution, although it was recorded in less time. The constant intensity of the signal in the constant-time dimension allows for a variety of sampling schedules. A schedule of randomly distributed sampling points yields the best results. This general method can be used to significantly increase the quality of heteronuclear constant-time spectra.
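As an illustration of the kind of schedule the abstract favors, here is a minimal sketch of drawing a randomly distributed set of sampling points along a constant-time dimension (the function name, sizes, and the convention of always keeping the t = 0 point are assumptions of this sketch, not details from the paper):

```python
import numpy as np

def random_ct_schedule(n_total, n_sampled, seed=0):
    """Pick a random subset of increments along a constant-time dimension.

    Because signal intensity is constant across a constant-time evolution
    period, points can be drawn uniformly at random rather than biased
    toward early evolution times.
    """
    rng = np.random.default_rng(seed)
    # Always keep the first point (t = 0) so the spectrum is properly scaled.
    rest = rng.choice(np.arange(1, n_total), size=n_sampled - 1, replace=False)
    return np.sort(np.concatenate(([0], rest)))

# e.g. sample 16 of 64 nitrogen increments -> roughly 4x time saving
print(random_ct_schedule(64, 16))
```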

2.
Constraint-based modeling results in a convex polytope that defines a solution space containing all possible steady-state flux distributions. The properties of this polytope have been studied extensively using linear programming to find the optimal flux distribution under various optimality conditions and convex analysis to define its extreme pathways (edges) and elementary modes. The work presented herein further studies the steady-state flux space by defining its hyper-volume. In low dimensions (i.e. for small sample networks), exact volume calculation algorithms were used. However, due to the #P-hard nature of the vertex enumeration and volume calculation problem in high dimensions, random Monte Carlo sampling was used to characterize the relative size of the solution space of the human red blood cell metabolic network. Distributions of the steady-state flux levels for each reaction in the metabolic network were generated to show the range of flux values for each reaction in the polytope. These results give insight into the shape of the high-dimensional solution space. The value of measuring uptake and secretion rates in shrinking the steady-state flux solution space is illustrated through singular value decomposition of the randomly sampled points. The Vmax values of various reactions in the network are varied to determine the sensitivity of the solution space to the maximum capacity constraints. The methods developed in this study are suitable for testing the implication of additional constraints on a metabolic network system and can be used to explore the effects of single nucleotide polymorphisms (SNPs) on network capabilities.
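A minimal hit-and-run sketch of the Monte Carlo sampling step described above, for a polytope {v : Sv = 0, lb ≤ v ≤ ub}; the toy network, function names, and the requirement of a strictly interior starting point are assumptions of this illustration:

```python
import numpy as np
from scipy.linalg import null_space

def hit_and_run(S, lb, ub, v0, n_samples, seed=0):
    """Sample {v : S v = 0, lb <= v <= ub} by hit-and-run.

    v0 must be a strictly interior feasible point.  Directions are drawn
    in the null space of S, so every step preserves S v = 0.
    """
    rng = np.random.default_rng(seed)
    N = null_space(S)                 # basis of feasible flux directions
    v = v0.copy()
    samples = []
    for _ in range(n_samples):
        d = N @ rng.standard_normal(N.shape[1])
        d /= np.linalg.norm(d)
        # Step limits before some bound is hit, in both directions.
        with np.errstate(divide="ignore", invalid="ignore"):
            t_lo = (lb - v) / d
            t_hi = (ub - v) / d
        t_min = np.max(np.minimum(t_lo, t_hi))
        t_max = np.min(np.maximum(t_lo, t_hi))
        v = v + rng.uniform(t_min, t_max) * d
        samples.append(v.copy())
    return np.array(samples)

# Toy chain A -> B -> C: two mass-balance constraints on three fluxes.
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
flux = hit_and_run(S, np.zeros(3), np.full(3, 10.0),
                   np.array([5.0, 5.0, 5.0]), n_samples=1000)
```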

3.
Particle swarm optimization algorithms have been successfully applied to discrete-valued optimization problems. However, in many cases the algorithms have been tailored specifically for the problem at hand. This paper proposes a generic set-based particle swarm optimization algorithm for use in discrete-valued optimization problems that can be formulated as set-based problems. A detailed sensitivity analysis of the parameters of the algorithm is conducted. The performance of the proposed algorithm is then compared against three other discrete particle swarm optimization algorithms from the literature using the multidimensional knapsack problem, and is shown to statistically outperform the existing algorithms.
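For flavor, a plain binary PSO on the multidimensional knapsack problem is sketched below. This is deliberately the generic binary variant, not the set-based algorithm the paper proposes; the parameter values and the greedy repair heuristic are assumptions of this sketch:

```python
import numpy as np

def binary_pso_mkp(values, weights, capacities, n_particles=30, iters=200, seed=0):
    """Minimal binary PSO for the multidimensional knapsack problem.

    values: (n,) item profits; weights: (m, n) resource use per item;
    capacities: (m,).  Infeasible picks are repaired by dropping the
    lowest value-density items first.
    """
    rng = np.random.default_rng(seed)
    n = len(values)

    def repair(x):
        order = np.argsort(values / weights.sum(axis=0))  # worst density first
        for i in order:
            if np.all(weights @ x <= capacities):
                break
            x[i] = 0
        return x

    X = np.array([repair(x) for x in rng.integers(0, 2, (n_particles, n))])
    V = np.zeros((n_particles, n))
    pbest = X.copy()
    pval = X @ values
    g = pbest[np.argmax(pval)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, n))
        V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (g - X)
        X = (rng.random((n_particles, n)) < 1 / (1 + np.exp(-V))).astype(int)
        X = np.array([repair(x) for x in X])
        val = X @ values
        better = val > pval
        pbest[better], pval[better] = X[better], val[better]
        g = pbest[np.argmax(pval)].copy()
    return g, g @ values
```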

4.
There has been a recent trend in genetic studies of wild populations where researchers have changed their sampling schemes from sampling pre-defined populations to sampling individuals uniformly across landscapes. This reflects the fact that many species under study are continuously distributed rather than clumped into obvious "populations". Once individual samples are collected, many landscape genetic studies use clustering algorithms and multilocus genetic data to group samples into subpopulations. After clusters are derived, landscape features that may be acting as barriers are examined and described. In theory, if populations were evenly sampled, this course of action should reliably identify population structure. However, genetic gradients and irregularly collected samples may impact the composition and location of clusters. We built genetic models where individual genotypes were either randomly distributed across a landscape or contained gradients created by neighbor mating for multiple generations. We investigated the influence of six different sampling protocols on population clustering using program STRUCTURE, the most commonly used model-based clustering method for multilocus genotype data. For models where individuals (and their alleles) were randomly distributed across a landscape, STRUCTURE correctly predicted that only one population was being sampled. However, when gradients created by neighbor mating existed, STRUCTURE detected multiple, but different numbers of clusters, depending on sampling protocols. We recommend testing for fine-scale autocorrelation patterns prior to sample clustering, as the scale of the autocorrelation appears to influence the results. Further, we recommend that researchers pay attention to the impacts that sampling may have on subsequent population and landscape genetic results. The U.S. Government's right to retain a non-exclusive, royalty-free license in and to any copyright is acknowledged.
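The recommended pre-clustering check for fine-scale autocorrelation could be done with a statistic such as Moran's I; a minimal sketch (the distance-band weighting and names are assumptions of this illustration, not the paper's protocol):

```python
import numpy as np

def morans_i(values, coords, max_dist):
    """Moran's I for one locus: spatial autocorrelation of individual
    allele counts among neighbours within max_dist of one another."""
    n = len(values)
    z = values - values.mean()
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = ((d > 0) & (d <= max_dist)).astype(float)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)
```

Under spatial randomness I is near -1/(n-1); values well above that at short distances indicate the fine-scale gradients that lead STRUCTURE to report spurious clusters.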

5.
Complementarity-based reserve selection algorithms efficiently prioritize sites for biodiversity conservation, but they are data-intensive and most regions lack accurate distribution maps for the majority of species. We explored implications of basing conservation planning decisions on incomplete and biased data using occurrence records of the plant family Proteaceae in South Africa. Treating this high-quality database as 'complete', we introduced three realistic sampling biases characteristic of biodiversity databases: a detectability sampling bias and two forms of roads sampling bias. We then compared reserve networks constructed using complete, biased, and randomly sampled data. All forms of biased sampling performed worse than both the complete data set and equal-effort random sampling. Biased sampling failed to detect a median of 1-5% of species, and resulted in reserve networks that were 9-17% larger than those designed with complete data. Spatial congruence and the correlation of irreplaceability scores between reserve networks selected with biased and complete data were low. Thus, reserve networks based on biased data require more area to protect fewer species and identify different locations than those selected with randomly sampled or complete data.
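The complementarity-based selection the abstract refers to is typically a greedy set-cover-style heuristic; a minimal sketch with an invented three-site example:

```python
def greedy_reserve(sites):
    """Greedy complementarity heuristic: repeatedly pick the site adding
    the most unrepresented species.  `sites` maps site id -> set of species.
    """
    chosen, covered = [], set()
    remaining = dict(sites)
    while remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered

sites = {"A": {"p1", "p2"}, "B": {"p2", "p3", "p4"}, "C": {"p1", "p5"}}
print(greedy_reserve(sites))  # picks B first, then C; A adds nothing new
```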

6.
Current methods for aligning biological sequences are based on dynamic programming algorithms. If large numbers of sequences or a number of long sequences are to be aligned, the required computations are expensive in memory and central processing unit (CPU) time. In an attempt to bring the tools of large-scale linear programming (LP) methods to bear on this problem, we formulate the alignment process as a controlled Markov chain and construct a suggested alignment based on policies that minimise the expected total cost of the alignment. We discuss the LP associated with the total expected discounted cost and show the results of a solution of the problem based on a primal-dual interior point method. Model parameters, estimated from aligned sequences, along with cost function parameters are used to construct the objective and constraint conditions of the LP problem. This article concludes with a discussion of some alignments obtained from the LP solutions of problems with various cost function parameter values.
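A hedged sketch of the LP for the total expected discounted cost of a controlled Markov chain, solved here with SciPy's HiGHS interior-point method rather than the authors' own primal-dual code; the toy array shapes and names are assumptions (in the alignment application, states would encode alignment positions and costs would come from the estimated model parameters):

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, cost, gamma):
    """LP for a discounted-cost controlled Markov chain.

    P[a, s, s'] are transition probabilities, cost[s, a] one-step costs.
    The optimal value v solves:  maximize sum(v)  subject to
    v(s) <= cost(s, a) + gamma * sum_s' P(s'|s, a) v(s')  for all (s, a).
    """
    n_a, n_s, _ = P.shape
    A_ub, b_ub = [], []
    for s in range(n_s):
        for a in range(n_a):
            row = np.zeros(n_s)
            row[s] = 1.0
            row -= gamma * P[a, s]
            A_ub.append(row)
            b_ub.append(cost[s, a])
    res = linprog(-np.ones(n_s), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_s, method="highs-ipm")
    v = res.x
    # Greedy policy with respect to the LP value function.
    policy = np.argmin(cost + gamma * np.einsum("asx,x->sa", P, v), axis=1)
    return v, policy
```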

7.
Aim: Environmental niche models that utilize presence-only data have been increasingly employed to model species distributions and test ecological and evolutionary predictions. The ideal method for evaluating the accuracy of a niche model is to train a model with one dataset and then test model predictions against an independent dataset. However, a truly independent dataset is often not available, and instead random subsets of the total data are used for 'training' and 'testing' purposes. The goal of this study was to determine how spatially autocorrelated sampling affects measures of niche model accuracy when using subsets of a larger dataset for accuracy evaluation. Location: The distribution of Centaurea maculosa (spotted knapweed; Asteraceae) was modelled in six states in the western United States: California, Oregon, Washington, Idaho, Wyoming and Montana. Methods: Two types of niche modelling algorithms - the genetic algorithm for rule-set prediction (GARP) and maximum entropy modelling (as implemented with Maxent) - were used to model the potential distribution of C. maculosa across the region. The effect of spatially autocorrelated sampling was examined by applying a spatial filter to the presence-only data (to reduce autocorrelation) and then comparing predictions made using the spatial filter with those using a random subset of the data, equal in sample size to the filtered data. Results: The accuracy of predictions from both algorithms was sensitive to the spatial autocorrelation of sampling effort in the occurrence data. Spatial filtering led to lower values of the area under the receiver operating characteristic curve plot but higher similarity statistic (I) values when compared with predictions from models built with random subsets of the total data, meaning that spatial autocorrelation of sampling effort between training and test data led to inflated measures of accuracy. Main conclusions: The findings indicate that care should be taken when interpreting the results from presence-only niche models when training and test data have been randomly partitioned but occurrence data were non-randomly sampled (in a spatially autocorrelated manner). The higher accuracies obtained without the spatial filter are a result of spatial autocorrelation of sampling effort between training and test data inflating measures of prediction accuracy. If independently surveyed data for testing predictions are unavailable, then it may be necessary to explicitly account for the spatial autocorrelation of sampling effort between randomly partitioned training and test subsets when evaluating niche model predictions.
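One simple form of spatial filter consistent with the Methods is minimum-distance thinning of occurrence records; a sketch (the exact filter used in the study may differ):

```python
import numpy as np

def spatial_filter(coords, min_dist):
    """Thin presence-only records so that no two retained points lie
    within min_dist of each other -- a simple way to reduce the spatial
    autocorrelation of sampling effort before train/test splitting."""
    kept = []
    for p in coords:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)
```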

8.
Zhang Z, Shi Y, Liu H. Biophysical Journal, 2003, 84(6):3583-3593
We present a novel method that uses the collective modes obtained with a coarse-grained model/anisotropic network model to guide the atomic-level simulations. Based on this model, local collective modes can be calculated according to a single configuration in the conformational space of the protein. In the molecular dynamics simulations, the motions along the slowest few modes are coupled to a higher temperature by the weak coupling method to amplify the collective motions. This amplified-collective-motion (ACM) method is applied to two test systems. One is an S-peptide analog. We realized the refolding of the denatured peptide in eight simulations out of 10 using the method. The other system is bacteriophage T4 lysozyme. Much more extensive domain motions between the N-terminal and C-terminal domain of T4 lysozyme are observed in the ACM simulation compared to a conventional simulation. The ACM method allows for extensive sampling in conformational space while still restricting the sampled configurations within low energy areas. The method can be applied in both explicit and implicit solvent simulations, and may be further applied to important biological problems, such as long timescale functional motions, protein folding/unfolding, and structure prediction.
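The collective modes that ACM couples to a higher temperature come from an anisotropic network model; a minimal sketch of computing the slowest ANM modes from C-alpha coordinates (the cutoff and unit spring constant are conventional assumptions, not values from the paper):

```python
import numpy as np

def anm_modes(coords, cutoff=13.0, n_modes=3):
    """Slowest collective modes from an anisotropic network model (ANM).

    coords: (N, 3) C-alpha positions.  Builds the standard ANM Hessian
    and returns the n_modes lowest non-trivial eigenvectors, i.e. the
    modes a method like ACM would amplify.
    """
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            r = coords[j] - coords[i]
            d2 = r @ r
            if d2 > cutoff ** 2:
                continue
            block = -np.outer(r, r) / d2   # off-diagonal superelement
            H[3*i:3*i+3, 3*j:3*j+3] = block
            H[3*j:3*j+3, 3*i:3*i+3] = block
            H[3*i:3*i+3, 3*i:3*i+3] -= block
            H[3*j:3*j+3, 3*j:3*j+3] -= block
    w, v = np.linalg.eigh(H)
    return v[:, 6:6 + n_modes]   # skip the six rigid-body zero modes
```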

9.
SUMMARY: Biological and engineered networks have recently been shown to display network motifs: a small set of characteristic patterns that occur much more frequently than in randomized networks with the same degree sequence. Network motifs were demonstrated to play key information processing roles in biological regulation networks. Existing algorithms for detecting network motifs act by exhaustively enumerating all subgraphs with a given number of nodes in the network. The runtime of such algorithms increases strongly with network size. Here, we present a novel algorithm that allows estimation of subgraph concentrations and detection of network motifs at a runtime that is asymptotically independent of the network size. This algorithm is based on random sampling of subgraphs. Network motifs are detected with a surprisingly small number of samples in a wide variety of networks. Our method can be applied to estimate the concentrations of larger subgraphs in larger networks than was previously possible with exhaustive enumeration algorithms. We present results for high-order motifs in several biological networks and discuss their possible functions. AVAILABILITY: A software tool for estimating subgraph concentrations and detecting network motifs (mfinder 1.1) and further information is available at http://www.weizmann.ac.il/mcb/UriAlon/
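The idea of estimating subgraph concentrations by sampling can be illustrated for 3-node subgraphs. Note that this naive version is biased, whereas the mfinder algorithm reweights each sample by its inclusion probability; the undirected simplification and names below are assumptions of this sketch:

```python
import random
from collections import Counter

def sample_triads(adj, n_samples, seed=0):
    """Rough estimate of connected 3-node subgraph concentrations by
    sampling: pick a random edge, then a random neighbour of either
    endpoint.  adj: node -> set of neighbours (undirected).
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in adj for v in adj[u]]
    counts = Counter()
    for _ in range(n_samples):
        u, v = rng.choice(edges)
        cand = (adj[u] | adj[v]) - {u, v}
        if not cand:
            continue
        w = rng.choice(list(cand))
        # classify: triangle if all three edges present, else path
        kind = "triangle" if (w in adj[u] and w in adj[v]) else "path"
        counts[kind] += 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}
```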

10.
ABSTRACT: BACKGROUND: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. RESULTS: In this work, we consider the SCFG-based approach in order to analyze how the quality of generated sample sets and the corresponding prediction accuracy change when different degrees of disturbance are incorporated into the required sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG-based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst case, without sacrificing much of the accuracy of the results. CONCLUSIONS: Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only a small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms (see [25]).
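The two disturbance models compared in the study can be illustrated directly; a sketch of relative versus absolute errors on a probability vector (the values and names are invented for illustration):

```python
import numpy as np

def perturb(p, eps, mode, rng):
    """Disturb a vector of sampling probabilities and renormalise.

    mode='relative': p_i -> p_i * (1 + u_i),  u_i ~ U(-eps, eps)
    mode='absolute': p_i -> p_i + u_i, clipped at 0
    The abstract's finding: relative errors barely hurt, absolute errors
    do, because absolute noise swamps the many small probabilities.
    """
    u = rng.uniform(-eps, eps, size=len(p))
    q = p * (1 + u) if mode == "relative" else np.clip(p + u, 0, None)
    return q / q.sum()

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.05, 0.04, 0.01])
print(perturb(p, 0.1, "relative", rng))  # stays close to p
print(perturb(p, 0.1, "absolute", rng))  # small entries badly distorted
```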

11.
A previous comparison of whistles using data sampled at 48 kHz suggested that certain frequency parameters vary along a latitudinal gradient. This geographical pattern may be biased because whistles sampled at higher frequencies could potentially have very different frequency contents. The current study compared the acoustic parameters of Guiana dolphin (Sotalia guianensis) whistles recorded at a higher sampling rate (96 kHz) and from groups occupying two previously unsampled sites, Benevente Bay, Espírito Santo, Brazil, and Formosa Bay, Rio Grande do Norte, Brazil, with recordings of other populations in South America. By only considering data sampled at a rate of at least 96 kHz, we aimed to detect differences in whistles across locations. Contrary to previous findings, our analyses do not indicate any clear separation between northern and southern populations based on whistles, and do not corroborate the hypothesis of latitudinal acoustic variation in this species. The variation in Guiana dolphin whistle parameters found here appears to be influenced by latitude to some extent, but several other factors, including sampling method, environmental fluctuations, and social influence on vocal learning, may be confounding the detection of a geographic pattern in these whistle samples.

12.
The uniform sampling of convex polytopes is an interesting computational problem with many applications in inference from linear constraints, but the performance of sampling algorithms can be affected by ill-conditioning. This is the case when inferring the feasible steady states in models of metabolic networks, since they can show heterogeneous time scales. In this work we focus on rounding procedures based on building an ellipsoid that closely matches the sampling space, which can be used to define an efficient hit-and-run (HR) Markov chain Monte Carlo. In this way the uniformity of the sampling of the convex space of interest is rigorously guaranteed, in contrast to non-Markovian methods. We analyze and compare three rounding methods in order to sample the feasible steady states of metabolic networks of three models of growing size up to genomic scale. The first is based on principal component analysis (PCA), the second on linear programming (LP), and finally we employ the Lovász ellipsoid method (LEM). Our results show that a rounding procedure dramatically improves the performance of HR in these inference problems and suggest that a combination of LEM or LP with a subsequent PCA performs best. We finally compare the distributions obtained with the HR against those of two heuristics based on the artificially centered hit-and-run (ACHR), gpSampler and optGpSampler. They show good agreement with the results of the HR for the small network, while on genome-scale models they present inconsistencies.
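A sketch of the PCA variant of the rounding step: build a whitening map from pilot samples so that hit-and-run proposals move over comparable scales in all directions (the LEM and LP variants construct the ellipsoid differently; the names here are assumptions of this illustration):

```python
import numpy as np

def pca_rounding(samples):
    """PCA-based rounding: from a pilot set of (possibly non-uniform)
    samples, build an affine map that whitens the polytope so that
    hit-and-run mixes over comparable scales in every direction.

    Returns (T, T_inv, centre): run HR in y-space and map points back
    via  v = centre + T_inv @ y.
    """
    centre = samples.mean(axis=0)
    cov = np.cov(samples.T)
    w, U = np.linalg.eigh(cov)
    w = np.maximum(w, 1e-12)          # guard against degenerate directions
    T = np.diag(w ** -0.5) @ U.T      # whitening transform
    T_inv = U @ np.diag(w ** 0.5)
    return T, T_inv, centre
```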

13.
Least-squares methods for blind source separation based on nonlinear PCA
In standard blind source separation, one tries to extract unknown source signals from their instantaneous linear mixtures by using a minimum of a priori information. We have recently shown that certain nonlinear extensions of principal component type neural algorithms can be successfully applied to this problem. In this paper, we show that a nonlinear PCA criterion can be minimized using least-squares approaches, leading to computationally efficient and fast converging algorithms. Several versions of this approach are developed and studied, some of which can be regarded as neural learning algorithms. A connection to the nonlinear PCA subspace rule is also shown. Experimental results are given, showing that the least-squares methods usually converge clearly faster than stochastic gradient algorithms in blind separation problems.
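For context, a sketch of the stochastic-gradient nonlinear PCA subspace rule that the paper's least-squares methods are designed to out-converge (the learning rate, initialization, and tanh nonlinearity are conventional assumptions; the data are assumed pre-whitened and zero-mean):

```python
import numpy as np

def nonlinear_pca_bss(X, n_sources, mu=0.01, epochs=50, seed=0):
    """Stochastic-gradient nonlinear PCA subspace rule for blind source
    separation:

        g = tanh(W^T x),   W <- W + mu * (x - W g) g^T

    X: (n_samples, n_mixtures) pre-whitened mixtures.
    Returns the estimated sources, one row per sample.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_sources)) * 0.1
    for _ in range(epochs):
        for x in X:
            g = np.tanh(W.T @ x)
            W += mu * np.outer(x - W @ g, g)
    return X @ W
```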

14.
A common pattern found in phylogeny-based empirical studies of diversification is a decrease in the rate of lineage accumulation toward the present. This early-burst pattern of cladogenesis is often interpreted as a signal of adaptive radiation or density-dependent processes of diversification. However, incomplete taxonomic sampling is also known to artifactually produce patterns of rapid initial diversification. The Monte Carlo constant rates (MCCR) test, based upon Pybus and Harvey's gamma (γ) statistic, is commonly used to accommodate incomplete sampling, but this test assumes that missing taxa have been randomly pruned from the phylogeny. Here we use simulations to show that preferentially sampling disparate lineages within a clade can produce severely inflated type-I error rates of the MCCR test, especially when taxon sampling drops below 75%. We first propose two corrections for the standard MCCR test: the proportionally-deeper-splits MCCR, which assumes that missing taxa are more likely to be recently diverged, and the deepest-splits-only MCCR, which assumes that all missing taxa are the youngest lineages in the clade; we then assess their statistical properties. We next extend these two tests into a generalized form that allows the degree of nonrandom sampling (NRS) to be controlled by a scaling parameter, α. This generalized test is then applied to two recent studies. This new test allows systematists to account for nonrandom taxonomic sampling when assessing temporal patterns of lineage diversification in empirical trees. Given the dramatic effect NRS can have on the behavior of the MCCR test, we argue that evaluating the sensitivity of this test to NRS should become the norm when investigating patterns of cladogenesis in incompletely sampled phylogenies.
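A sketch of the γ statistic underlying the MCCR test, computed from the branching times of an ultrametric tree (the array conventions are assumptions of this sketch; the formula is Pybus and Harvey's):

```python
import numpy as np

def gamma_statistic(branching_times):
    """Pybus & Harvey's gamma from an ultrametric tree's branching times
    (ages of the n-1 internal nodes, in any order).  gamma < 0 indicates
    an early burst; the MCCR test compares gamma against a null
    distribution built from trees with taxa randomly pruned.
    """
    t = np.sort(np.asarray(branching_times))[::-1]  # oldest first
    n = len(t) + 1                                  # number of taxa
    # g[k] = interval during which exactly k lineages exist, k = 2..n
    ages = np.concatenate((t, [0.0]))
    g = ages[:-1] - ages[1:]
    k = np.arange(2, n + 1)
    T = np.sum(k * g)
    inner = np.cumsum(k * g)[:-1]                   # partial sums, i = 2..n-1
    return (inner.mean() - T / 2) / (T * np.sqrt(1 / (12 * (n - 2))))
```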

15.
Balanced minimum evolution (BME) is a statistically consistent distance-based method to reconstruct a phylogenetic tree from an alignment of molecular data. In 2000, Pauplin showed that the BME method is equivalent to optimizing a linear functional over the BME polytope, the convex hull of the BME vectors obtained from Pauplin's formula applied to all binary trees. The BME method is related to the Neighbor Joining (NJ) Algorithm, now known to be a greedy optimization of the BME principle. Further, the NJ and BME algorithms have been studied previously to understand when the NJ Algorithm returns a BME tree for small numbers of taxa. In this paper we aim to elucidate the structure of the BME polytope and strengthen knowledge of the connection between the BME method and NJ Algorithm. We first prove that any subtree-prune-regraft move from a binary tree to another binary tree corresponds to an edge of the BME polytope. Moreover, we describe an entire family of faces parameterized by disjoint clades. We show that these clade-faces are smaller dimensional BME polytopes themselves. Finally, we show that for any order of joining nodes to form a tree, there exists an associated distance matrix (i.e., dissimilarity map) for which the NJ Algorithm returns the BME tree. More strongly, we show that the BME cone and every NJ cone associated to a tree T have an intersection of positive measure.
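Pauplin's formula, on which the BME polytope is built, assigns each leaf pair the weight 2^(1-p_ij), where p_ij is the number of edges between leaves i and j; a sketch with a quartet check (the tree encoding is an assumption of this illustration):

```python
from collections import deque

def bme_length(adj, leaves, D):
    """Balanced minimum evolution length via Pauplin's formula:
        L(T) = sum_{i<j} D[i][j] * 2^(1 - p_ij).
    adj: node -> list of neighbours for an unrooted binary tree;
    D: nested dict of pairwise distances, D[i][j] for i before j in leaves.
    """
    def hops(src):
        dist, q = {src: 0}, deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist
    total = 0.0
    for a, i in enumerate(leaves):
        d = hops(i)
        for j in leaves[a + 1:]:
            total += D[i][j] * 2.0 ** (1 - d[j])
    return total

# Quartet ((1,2),(3,4)) with all external branches 1 and internal branch 2:
adj = {1: ["u"], 2: ["u"], 3: ["v"], 4: ["v"],
       "u": [1, 2, "v"], "v": [3, 4, "u"]}
D = {1: {2: 2.0, 3: 4.0, 4: 4.0}, 2: {3: 4.0, 4: 4.0}, 3: {4: 2.0}}
print(bme_length(adj, [1, 2, 3, 4], D))  # 6.0 = true total tree length
```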

16.
Most structure prediction algorithms consist of initial sampling of the conformational space, followed by rescoring and possibly refinement of a number of selected structures. Here we focus on protein docking, and show that while decoupling sampling and scoring facilitates method development, integration of the two steps can lead to substantial improvements in docking results. Since decoupling is usually achieved by generating a decoy set containing both non-native and near-native docked structures, which can be then used for scoring function construction, we first review the roles and potential pitfalls of decoys in protein-protein docking, and show that some types of decoys are better than others for method development. We then describe three case studies showing that complete decoupling of scoring from sampling is not the best choice for solving realistic docking problems. Although some of the examples are based on our own experience, the results of the CAPRI docking and scoring experiments also show that performing both sampling and scoring generally yields better results than scoring the structures generated by all predictors. Next we investigate how the selection of training and decoy sets affects the performance of the scoring functions obtained. Finally, we discuss pathways to better alignment of the two steps, and show some algorithms that achieve a certain level of integration. Although we focus on protein-protein docking, our observations most likely also apply to other conformational search problems, including protein structure prediction and the docking of small molecules to proteins. Proteins 2013; 81:1874-1884.

17.
Waples RS, Yokota M. Genetics, 2007, 175(1):219-233
The standard temporal method for estimating effective population size (Ne) assumes that generations are discrete, but it is routinely applied to species with overlapping generations. We evaluated bias in Ne estimates caused by violation of this assumption, using simulated data for three model species: humans (type I survival), sparrow (type II), and barnacle (type III). We verify a previous proposal by Felsenstein that weighting individuals by reproductive value is the correct way to calculate parametric population allele frequencies, in which case the rate of change in age-structured populations conforms to that predicted by discrete-generation models. When the standard temporal method is applied to age-structured species, typical sampling regimes (sampling only newborns or adults; randomly sampling the entire population) do not yield properly weighted allele frequencies and result in biased Ne estimates. The direction and magnitude of the bias are shown to depend on the sampling method and the species' life history. Results for populations that grow (or decline) at a constant rate paralleled those for populations of constant size. If sufficient demographic data are available and certain sampling restrictions are met, the Jorde-Ryman modification of the temporal method can be applied to any species with overlapping generations. Alternatively, spacing the temporal samples many generations apart maximizes the drift signal compared to sampling biases associated with age structure.
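A hedged sketch of the discrete-generation temporal estimator whose bias the study quantifies, using Nei and Tajima's Fc with Waples' sample-size correction (the per-locus handling and sampling-plan details are simplified assumptions of this sketch):

```python
import numpy as np

def temporal_ne(x, y, S0, St, t):
    """Temporal estimate of Ne from allele frequencies at two time points
    t generations apart; x, y are allele-frequency arrays from samples of
    S0 and St individuals.  Assumes discrete generations, which is exactly
    the assumption whose violation the paper examines.
    """
    x, y = np.asarray(x), np.asarray(y)
    fc = np.mean((x - y) ** 2 / ((x + y) / 2 - x * y))   # Nei & Tajima Fc
    return t / (2 * (fc - 1 / (2 * S0) - 1 / (2 * St)))
```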

18.
A statistical theory for sampling species abundances
Green JL, Plotkin JB. Ecology Letters, 2007, 10(11):1037-1045
The pattern of species abundances is central to ecology. But direct measurements of species abundances at ecologically relevant scales are typically unfeasible. This limitation has motivated a long-standing interest in the relationship between the abundance distribution in a large, regional community and the distribution observed in a small sample from the community. Here, we develop a statistical sampling theory to describe how observed patterns of species abundances are influenced by the spatial distributions of populations. For a wide range of regional-scale abundance distributions we derive exact expressions for the sampled abundance distributions, as a function of sample size and the degree of conspecific spatial aggregation. We show that if populations are randomly distributed in space then the sampled and regional-scale species-abundance distribution typically have the same functional form: sampling can be expressed by a simple scaling relationship. In the case of aggregated spatial distributions, however, the shape of a sampled species-abundance distribution diverges from the regional-scale distribution. Conspecific aggregation results in sampled distributions that are skewed towards both rare and common species. We discuss our findings in light of recent results from neutral community theory, and in the context of estimating biodiversity.
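The random-placement case, where sampling reduces to per-species binomial draws and the sampled distribution keeps the regional form, is easy to simulate; a sketch with an invented lognormal regional community:

```python
import numpy as np

def sample_sad(regional_abundances, fraction, rng):
    """Random-placement sampling of a species-abundance distribution:
    each individual enters the sample independently, so each species'
    sampled count is Binomial(n_i, fraction).  Under conspecific
    aggregation the theory predicts the sampled shape diverges from this.
    """
    n = np.asarray(regional_abundances)
    counts = rng.binomial(n, fraction)
    return counts[counts > 0]          # species observed in the sample

rng = np.random.default_rng(1)
regional = rng.lognormal(mean=5, sigma=2, size=1000).astype(int) + 1
observed = sample_sad(regional, 0.01, rng)
print(len(observed), "of 1000 species detected in a 1% sample")
```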

19.
The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations, to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context free grammars and RNA structure prediction.
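A minimal sketch of the standard sampling algorithm mentioned here: a scaled forward pass followed by a stochastic backtrack, so that repeated draws estimate posterior probabilities of features such as exons (the matrix conventions are assumptions of this sketch):

```python
import numpy as np

def sample_hmm_path(pi, A, E, obs, rng):
    """Sample a hidden state path from the posterior P(path | obs):
    forward pass, then stochastic backtracking.
    pi: initial probs (K,), A: transitions (K, K), E: emissions (K, M).
    """
    T, K = len(obs), len(pi)
    f = np.zeros((T, K))
    f[0] = pi * E[:, obs[0]]
    f[0] /= f[0].sum()                       # scale to avoid underflow
    for t in range(1, T):
        f[t] = E[:, obs[t]] * (f[t - 1] @ A)
        f[t] /= f[t].sum()
    path = np.empty(T, dtype=int)
    path[-1] = rng.choice(K, p=f[-1])
    for t in range(T - 2, -1, -1):
        # P(x_t | x_{t+1}, obs) is proportional to f_t(x_t) * A[x_t, x_{t+1}]
        w = f[t] * A[:, path[t + 1]]
        path[t] = rng.choice(K, p=w / w.sum())
    return path
```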

20.