Similar Literature
20 similar documents found.
1.
Four major problems can affect the efficiency of methods developed to estimate relatedness between individuals from molecular marker information: (i) some depend on knowledge of the true allelic frequencies in the base population; (ii) they assume that all loci are unlinked and in Hardy-Weinberg and linkage equilibrium; (iii) pairwise methods can lead to incongruous assignations because they consider only two individuals at a time; (iv) most are constructed for particular structured populations (they consider only a few relationship classes, e.g. full-sibs vs. unrelated). We have developed a new approach to estimating relatedness that is free from the above limitations. The method uses a 'blind search algorithm' (in fact, simulated annealing) to find the genealogy that yields a co-ancestry matrix with the highest correlation with the molecular co-ancestry matrix calculated from the markers. Thus (i and ii) it makes no direct assumptions about allelic frequencies or Hardy-Weinberg and linkage equilibrium; (iii) it always provides congruent relationships, as it considers all individuals simultaneously; (iv) degrees of relatedness can be as complex as desired simply by increasing the 'depth' (i.e. number of generations) of the proposed genealogies. Computer simulations have shown that the accuracy and robustness against genotyping errors of this new approach are comparable to those of other proposed methods in the particular situations for which they were developed, but it is more flexible and can cope with more complex situations.
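Though the abstract gives no implementation details, the objective it describes — searching over candidate genealogies for the one whose co-ancestry matrix best correlates with the marker-based co-ancestry matrix — can be sketched as follows. This is a minimal, hypothetical Python illustration restricted to a single-generation genealogy with unrelated, non-inbred founders; the state representation, perturbation move and cooling schedule are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch: propose one-generation genealogies (sire/dam assignments)
# and accept/reject them by simulated annealing so that the pedigree co-ancestry
# matrix correlates maximally with a given molecular co-ancestry matrix.
import numpy as np

rng = np.random.default_rng(0)

def coancestry_from_parents(sires, dams):
    """Kinship matrix for one generation of offspring of unrelated, non-inbred founders."""
    n = len(sires)
    theta = np.full((n, n), 0.5)               # self-kinship of non-inbred individuals
    for i in range(n):
        for j in range(i + 1, n):
            t = 0.125 * ((sires[i] == sires[j]) + (dams[i] == dams[j]))
            theta[i, j] = theta[j, i] = t      # 0.25 full sibs, 0.125 half sibs, 0 unrelated
    return theta

def upper(m):
    return m[np.triu_indices_from(m, k=1)]

def score(sires, dams, mol):
    """Correlation between pedigree and molecular co-ancestries (off-diagonal entries)."""
    a, b = upper(coancestry_from_parents(sires, dams)), upper(mol)
    return -1.0 if a.std() == 0 else np.corrcoef(a, b)[0, 1]

def anneal(mol, n_sires=5, n_dams=5, steps=5000, t0=0.1):
    n = mol.shape[0]
    sires = rng.integers(n_sires, size=n)
    dams = rng.integers(n_dams, size=n)
    cur = best = score(sires, dams, mol)
    best_state = (sires.copy(), dams.copy())
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-6
        i = rng.integers(n)                    # perturb one individual's parent assignment
        old = (sires[i], dams[i])
        if rng.random() < 0.5:
            sires[i] = rng.integers(n_sires)
        else:
            dams[i] = rng.integers(n_dams)
        new = score(sires, dams, mol)
        if new >= cur or rng.random() < np.exp((new - cur) / temp):
            cur = new                          # accept the move
            if cur > best:
                best, best_state = cur, (sires.copy(), dams.copy())
        else:
            sires[i], dams[i] = old            # reject: restore previous state
    return best_state, best

# Toy molecular co-ancestry matrix standing in for marker-based estimates.
true_theta = coancestry_from_parents(rng.integers(5, size=30), rng.integers(5, size=30))
mol = true_theta + rng.normal(0, 0.02, true_theta.shape)
mol = (mol + mol.T) / 2
print(anneal(mol)[1])                          # best correlation found
```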

2.
A relatively simple method is proposed for the estimation of parameters of stage-structured populations from sample data for situations where (a) unit time survival rates may vary with time, and (b) the distribution of entry times to stage 1 is too complicated to be fitted with a simple parametric model such as a normal or gamma distribution. The key aspects of this model are that the entry time distribution is approximated by an exponential function with p parameters, the unit time survival rates in stages are approximated by an r-parameter exponential polynomial in the stage number, and the durations of stages are assumed to be the same for all individuals. The new method is applied to four zooplankton data sets, with parametric bootstrapping used to assess the bias and variation in estimates. It is concluded that good estimates of demographic parameters from stage-frequency data from natural populations will usually only be possible if extra information, such as the durations of stages, is known.

3.
4.
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material from up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM algorithm that uses a multinormal approximation for the pooled allele frequencies but does not utilize prior information about the haplotypes. The method has been implemented in Matlab and the code is available upon request from the authors.
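As a rough illustration of how pooled allele frequencies constrain haplotype frequencies (not the Bayesian model of the abstract), one can treat the pooled frequencies as an approximately linear mixture of known candidate haplotypes and solve for the mixture weights; the haplotype set, frequencies and noise level below are invented.

```python
# Hypothetical sketch: pooled allele frequencies are approximately a linear
# mixture, p = H^T f, where H is the 0/1 matrix of candidate haplotypes (rows)
# by loci (columns) and f the haplotype frequencies. This solves for f by
# non-negative least squares and renormalizes; it is a deterministic stand-in
# for the Bayesian model described in the abstract.
import numpy as np
from scipy.optimize import nnls

# Candidate haplotypes over 4 loci (e.g., taken from a reference panel).
H = np.array([[0, 0, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 1, 0, 1]], dtype=float)

true_f = np.array([0.4, 0.3, 0.2, 0.1])
pool_allele_freqs = H.T @ true_f + np.random.default_rng(1).normal(0, 0.01, 4)

f_hat, _ = nnls(H.T, pool_allele_freqs)   # non-negative least squares fit
f_hat /= f_hat.sum()                      # renormalize to a frequency vector
print(np.round(f_hat, 3))
```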

5.
The use of non-invasive genetic sampling to estimate population size in elusive or rare species is increasing. The data generated from this sampling differ from traditional mark-recapture data in that individuals may be captured multiple times within a session, or there may be only a single sampling event. To accommodate this type of data, we develop a method, named capwire, based on a simple urn model containing individuals with two capture probabilities. The method is evaluated using simulations of an urn and of a more biologically realistic system where individuals occupy space and display heterogeneous movement and DNA deposition patterns. We also analyse a small number of real data sets. The results indicate that when the data contain capture heterogeneity, the method provides estimates with small bias and good coverage, along with high accuracy and precision. Performance is not as consistent when capture rates are homogeneous and when dealing with populations substantially larger than 100. For the few real data sets where N is approximately known, capwire's estimates are very good. We compare capwire's performance to commonly used rarefaction methods and to two heterogeneity estimators in program CAPTURE: Mh-Chao and Mh-jackknife. No method works best in all situations. While less precise, the Chao estimator is very robust. We also examine how large samples should be to achieve a given level of accuracy using capwire. We conclude that capwire provides an improved way to estimate N for some DNA-based data sets.
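The simplest urn intuition behind such estimators can be sketched as follows; this toy example uses an equal-capture occupancy likelihood rather than capwire's two-rate model, and the sample sizes are invented.

```python
# Hypothetical sketch: if s genetic samples are drawn uniformly (with
# replacement) from N individuals, the number k of distinct individuals
# observed follows an occupancy distribution,
#   P(k | N, s) = C(N, k) * S2(s, k) * k! / N^s,
# where S2 is a Stirling number of the second kind. Maximizing over N gives a
# simple equal-capture estimate; capwire itself uses a richer two-rate model.
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(s, k):
    """Stirling numbers of the second kind, S2(s, k)."""
    if k == 0:
        return 1 if s == 0 else 0
    if k > s:
        return 0
    return k * stirling2(s - 1, k) + stirling2(s - 1, k - 1)

def log_lik(N, s, k):
    if N < k:
        return float("-inf")
    return (math.lgamma(N + 1) - math.lgamma(N - k + 1)   # log of C(N, k) * k!
            + math.log(stirling2(s, k)) - s * math.log(N))

def mle_N(s, k, n_max=1000):
    return max(range(k, n_max + 1), key=lambda N: log_lik(N, s, k))

# Example (invented): 60 non-invasive samples yielded 35 distinct genotypes.
print(mle_N(s=60, k=35))
```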

6.
A linear regression method that allows survival rates to vary from stage to stage is described for the analysis of stage-frequency data. It has advantages over previously suggested methods since the calculations are not iterative, and it is not necessary to have independent estimates of stage durations, numbers entering stages, or the rate of entry to stage 1. Simulation is proposed to determine standard errors for estimates of population parameters, and to assess the goodness of fit of models.

7.
Occupational, environmental, and nutritional epidemiologists are often interested in estimating the prospective effect of time-varying exposure variables such as cumulative exposure or cumulative updated average exposure, in relation to chronic disease endpoints such as cancer incidence and mortality. From exposure validation studies, it is apparent that many of the variables of interest are measured with moderate to substantial error. Although the ordinary regression calibration (ORC) approach is approximately valid and efficient for measurement error correction of relative risk estimates from the Cox model with time-independent point exposures when the disease is rare, it is not adaptable for use with time-varying exposures. By recalibrating the measurement error model within each risk set, a risk set regression calibration (RRC) method is proposed for this setting. An algorithm for a bias-corrected point estimate of the relative risk using an RRC approach is presented, followed by the derivation of an estimate of its variance, resulting in a sandwich estimator. Emphasis is on methods applicable to the main study/external validation study design, which arises in important applications. Simulation studies under several assumptions about the error model were carried out, which demonstrated the validity and efficiency of the method in finite samples. The method was applied to a study of diet and cancer from Harvard's Health Professionals Follow-up Study (HPFS).

8.
Diagnoses of HIV infection are reported to the Public Health Laboratory Service (PHLS) AIDS Centre under a voluntary surveillance scheme. Names are not held in the data set, but the date of birth of the individual concerned is usually available. This paper describes a statistical method for identifying whether there are likely to be individuals repeatedly represented in the resulting data set, which is considered by birth year. A partial ordering method is used that is especially useful for years where the number of birth years in the sample is too small for chi-squared tests to be used. At the 5% level, one of the five birth years tested in the data supplied to us by the PHLS shows evidence of more replication than would be expected from independent random sampling from the population. The results are compared with an alternative maximum-likelihood-based test that reaches the same conclusions. Maximum likelihood methods are further used to estimate the percentage of overcounting of individuals in the sample at 2.7%.
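As a rough, hypothetical illustration of the null hypothesis being tested (independent random sampling of individuals), one can simulate how many shared dates of birth would be expected by chance within a single birth year; the counts below are invented, and the paper's partial-ordering and maximum-likelihood machinery is not reproduced.

```python
# Hypothetical sketch of the null model: if the n reports in a given birth year
# were independent random draws of individuals, how many repeated dates of
# birth would be expected by chance alone?
import numpy as np

rng = np.random.default_rng(42)

def excess_duplicates(n_reports, observed_repeats, n_days=365, n_sim=10_000):
    """Monte Carlo p-value for seeing >= observed_repeats shared birth dates."""
    sims = np.empty(n_sim)
    for i in range(n_sim):
        dates = rng.integers(n_days, size=n_reports)
        sims[i] = n_reports - np.unique(dates).size   # reports sharing a date with another
    p_value = np.mean(sims >= observed_repeats)
    return sims.mean(), p_value

expected, p = excess_duplicates(n_reports=120, observed_repeats=30)
print(f"expected repeats under random sampling: {expected:.1f}, p = {p:.3f}")
```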

9.

Background

Cross-sectional surveys utilizing biomarkers that test for recent infection provide a convenient and cost-effective way to estimate HIV incidence. In particular, the BED assay has been developed for this purpose. Controversy surrounding the way in which false positive results from the biomarker should be handled has led to a number of different estimators that account for imperfect specificity. We compare the estimators proposed by McDougal et al., Hargrove et al. and McWalter & Welte.

Methodology/Principal Findings

The three estimators are analyzed and compared. An identity relating the calibration parameters in the McDougal methodology is derived. When the three estimators are tested under a steady-state epidemic, which includes individuals who fail to progress on the biomarker, only the McWalter/Welte method recovers an unbiased result.

Conclusions/Significance

Our analysis shows that the McDougal estimator can be reduced to a formula that only requires calibration of a mean window period and a long-term specificity. This allows simpler calibration techniques to be used and shows that all three estimators can be expressed using the same set of parameters. The McWalter/Welte method is applicable under the least restrictive assumptions and is the least prone to bias of the methods reviewed.

10.
A note on 'plotless' methods for estimating bacterial cell densities
Roser, D., Nedwell, D.B. & Gordon, A. 1984. A note on "plotless" methods for estimating bacterial cell densities. Journal of Applied Bacteriology 56, 343–347.
'Plotless' techniques for determining population densities have been developed for, and applied to, higher plant populations. They can often be carried out more rapidly than techniques involving total counts of individuals in plots, or quadrats, but such plotless techniques have not been generally applied to the estimation of densities of bacterial cells. Direct microscopical counting of cell numbers in a field of view, an example of a plot-related method, has traditionally been used for microbial cell counts. In this study, 'plot' and 'plotless' methods are compared on a variety of bacterial samples. Estimates of bacterial cell density were obtained by measuring the distance of cells from a fixed point in a field of view. These values, which were more rapidly obtained, were directly correlated with total cell counts. Although there was some apparent deviation from a perfect 1:1 relationship with total counts, as indicated by a correlation coefficient less than 1.0, there were no significant differences between the replicated counts of bacteria on samples of tissue from the surface of Hypholoma basidiocarps (P < 0.05). This indicated that the methods of enumeration were comparable. The distance-related estimates could readily be obtained from fields of view with cell densities varying over several orders of magnitude. The distance method was also more rapidly applied, particularly at high density, and was applicable not only to random cell distributions but also to the non-random distributions encountered when microbial cells aggregate into micro-colonies. The method appears to be particularly well suited to automated, digitized, direct counting procedures, as well as to estimating bacterial numbers on membrane filters and natural substrates.
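A classical 'plotless' point-to-nearest-individual estimator — offered here only as a hypothetical illustration of the distance-based idea, not necessarily the exact procedure of Roser et al. — looks like this:

```python
# Hypothetical sketch of a classical plotless distance estimator: for a random
# (Poisson) arrangement of cells with density lambda, the squared distance from
# a random point to the nearest cell satisfies pi*d^2 ~ Exponential(lambda), so
# the maximum-likelihood estimate from n point-to-nearest-cell distances is
# lambda_hat = n / (pi * sum(d_i^2)).
import numpy as np

rng = np.random.default_rng(3)

def density_from_distances(distances):
    d = np.asarray(distances, dtype=float)
    return len(d) / (np.pi * np.sum(d ** 2))

# Simulated field of view: 200 cells scattered uniformly on a 10 x 10 area
# (true density 2.0 cells per unit area), measured from 50 random points.
cells = rng.uniform(0, 10, size=(200, 2))
points = rng.uniform(0, 10, size=(50, 2))
nearest = np.min(np.linalg.norm(points[:, None, :] - cells[None, :, :], axis=2), axis=1)
print(density_from_distances(nearest))   # should be close to 2.0
```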

11.

Background

A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need.

Methodology/Findings

We introduce the Bayesian network posterior probability (BNPP) method, which addresses these difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can be used not only to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are obtained.

Conclusions/Significance

We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations.
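The scoring idea — closed-form marginal likelihoods for competing hypotheses under Dirichlet priors, normalized into posterior probabilities — can be illustrated with a minimal single-SNP example; the counts, priors and two-hypothesis setup below are assumptions for illustration, not the BNPP software.

```python
# Hypothetical sketch: each hypothesis is a small DAG over {SNP genotype,
# disease}; its marginal likelihood has a closed form with Dirichlet priors on
# the conditional distributions, and posteriors follow by normalizing over the
# competing hypotheses (equal prior probabilities assumed).
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logml(counts, alpha=1.0):
    """Log marginal likelihood of one multinomial with a symmetric Dirichlet prior."""
    counts = np.asarray(counts, dtype=float)
    a = np.full_like(counts, alpha)
    return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

# Invented case/control counts by genotype (rows: AA, Aa, aa; columns: control, case).
table = np.array([[220, 180],
                  [150, 190],
                  [ 40,  70]])

# h0: disease independent of genotype -> one disease multinomial.
log_h0 = dirichlet_multinomial_logml(table.sum(axis=0))
# h1: disease depends on genotype -> one disease multinomial per genotype.
log_h1 = sum(dirichlet_multinomial_logml(row) for row in table)
# Both hypotheses share the same marginal model for the genotype itself,
# so that factor cancels when the posterior is normalized.

logs = np.array([log_h0, log_h1])
post = np.exp(logs - logs.max())
post /= post.sum()
print(dict(zip(["no association", "association"], np.round(post, 3))))
```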

12.
Variation in the age of manifestation is typical of many mitochondrial diseases. The penetrance of pathogenic mutations causing such diseases is usually estimated from samples of individuals whose age exceeds the maximum age of disease manifestation. In the case of rare diseases, samples of sufficient size sometimes cannot be formed. In this study, we propose a method for estimating penetrance that can include individuals of any age. The efficiency of the method is demonstrated using Leber hereditary optic neuropathy as an example. It is shown that the method provides an unbiased estimate of penetrance and considerably reduces the error of this estimate in comparison with estimation from a sample restricted to individuals whose age exceeds the maximum age of disease manifestation.

13.
We developed a new field method for reconstructing the three-dimensional positions of swarming mosquitoes. This method overcomes certain difficulties inherent in conventional stereoscopic methods and is applicable to three-dimensional measurements of other insect species. First, we constructed a probabilistic model for stereoscopy: if mosquitoes and six reference points with known coordinates are photographed simultaneously from two or more perspectives, then from the positions of the images of the mosquitoes and the reference points on the photographs, 1) the position of each camera with respect to the reference points is estimated; 2) stereo images that correspond to an identical real mosquito are matched; and 3) the spatial positions of the mosquitoes are determined. We automated processes 1), 2) and 3) by developing computer programs based on our model. We then constructed a portable system for three-dimensional measurements of swarming mosquitoes in the field. Initial data that illustrate the application of our method to studying mosquito swarming are presented.
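The triangulation step (process 3) can be illustrated with the standard linear (DLT) least-squares construction; the camera matrices and image coordinates below are invented, and the paper's probabilistic camera estimation and stereo matching (processes 1 and 2) are not reproduced.

```python
# Hypothetical sketch of the triangulation step only: given the 3x4 projection
# matrices of two calibrated cameras and matched image coordinates of one
# mosquito, its 3-D position is the least-squares solution of the standard
# linear (DLT) system.
import numpy as np

def triangulate(P1, P2, xy1, xy2):
    """Linear triangulation of one point seen by two cameras."""
    (x1, y1), (x2, y2) = xy1, xy2
    A = np.vstack([x1 * P1[2] - P1[0],
                   y1 * P1[2] - P1[1],
                   x2 * P2[2] - P2[0],
                   y2 * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]            # de-homogenize

# Two toy cameras: one at the origin, one translated 1 m along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.3, 0.2, 2.0, 1.0])         # true 3-D position (homogeneous)
xy1 = (P1 @ point)[:2] / (P1 @ point)[2]
xy2 = (P2 @ point)[:2] / (P2 @ point)[2]
print(np.round(triangulate(P1, P2, xy1, xy2), 3))   # recovers [0.3, 0.2, 2.0]
```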

14.
Knowledge of genetic correlations is essential to understand the joint evolution of traits through correlated responses to selection, a task that is difficult, and seldom very precise, even with easy-to-breed species. Here, a simulation-based method is proposed to estimate genetic correlations and genetic covariances that relies only on phenotypic measurements. The method does not require any degree of relatedness in the sampled individuals. Extensive numerical results suggest that the proposed method may provide relatively efficient estimates regardless of sample sizes and contributions from common environmental effects.

15.
The objective of this study was to develop methods to estimate the optimal threshold of a longitudinal biomarker, and its credible interval, when the diagnostic test is based on a criterion that reflects a dynamic progression of that biomarker. Two methods are proposed: one parametric and one non-parametric. In both cases, Bayesian inference was used to derive the posterior distribution of the optimal threshold, from which an estimate and a credible interval could be obtained. A numerical study shows that the bias of the parametric method is low and the coverage probability of the credible interval close to the nominal value, with a small coverage asymmetry in some cases. This is also true for the non-parametric method for large sample sizes. Both methods were applied to estimate the optimal prostate-specific antigen nadir value to diagnose prostate cancer recurrence after a high-intensity focused ultrasound treatment. The parametric method can also be applied to non-longitudinal biomarkers.

16.
A Bayesian missing value estimation method for gene expression profile data
MOTIVATION: Gene expression profile analyses have been used in numerous studies covering a broad range of areas in biology. When unreliable measurements are excluded, missing values are introduced in gene expression profiles. Although existing multivariate analysis methods have difficulty with the treatment of missing values, this problem has received little attention. There are many options for dealing with missing values, each of which can reach drastically different results. Ignoring missing values is the simplest method and is frequently applied. This approach, however, has its flaws. In this article, we propose an estimation method for missing values which is based on Bayesian principal component analysis (BPCA). Although the idea of estimating a probabilistic model and latent variables simultaneously within the framework of Bayesian inference is not new in principle, an actual BPCA implementation that makes it possible to estimate arbitrary missing variables is new in terms of statistical methodology. RESULTS: When applied to DNA microarray data from various experimental conditions, the BPCA method exhibited markedly better estimation ability than other recently proposed methods, such as singular value decomposition and K-nearest neighbors. While the estimation performance of existing methods depends on model parameters whose determination is difficult, our BPCA method is free from this difficulty. Accordingly, the BPCA method provides accurate and convenient estimation of missing values. AVAILABILITY: The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/.
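A simplified, non-Bayesian analogue of the idea — iterative low-rank (PCA) imputation — can be sketched as follows; BPCA itself additionally places priors on the principal components and estimates them within a Bayesian framework, which this toy version does not do.

```python
# Hypothetical sketch of iterative low-rank imputation (a simplified stand-in
# for BPCA): missing entries are initialized with column means, the matrix is
# approximated by its leading principal components, and the missing cells are
# refilled from the approximation until convergence.
import numpy as np

def iterative_pca_impute(X, n_components=3, n_iter=100, tol=1e-6):
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])     # mean initialization
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        approx = mu + U[:, :n_components] * s[:n_components] @ Vt[:n_components]
        change = np.max(np.abs(X[missing] - approx[missing]))
        X[missing] = approx[missing]                          # refill missing cells only
        if change < tol:
            break
    return X

# Toy "expression matrix" of true rank 3 with 10% of entries deleted at random.
rng = np.random.default_rng(7)
genes = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))
holes = rng.random(genes.shape) < 0.1
observed = np.where(holes, np.nan, genes)
imputed = iterative_pca_impute(observed)
print(np.sqrt(np.mean((imputed[holes] - genes[holes]) ** 2)))   # reconstruction RMSE
```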

17.
Sensitive assay methods for tyrosinase are essential not only for understanding the process of pigment production but also for the development of effective inhibitors of tyrosinase. To develop an efficient assay method, we applied thymol blue to the reaction mixtures. The enzyme kinetic study revealed that the DOPA oxidase activity of tyrosinase was measured more sensitively in the thymol blue reaction system, even at lower enzyme units than in the previous report, with a significant enhancement of Vmax while no change in substrate affinity was observed. To test whether this method could be applied to inhibition and inactivation kinetic studies of tyrosinase, the effects of kojic acid, a well-known tyrosinase inhibitor, and of sodium chloride, respectively, were studied. In conclusion, the thymol blue method can assay tyrosinase activity with sensitivity and is applicable to inhibition and inactivation studies of tyrosinase.

18.
Usually, genetic correlations are estimated from breeding designs in the laboratory or greenhouse. However, estimates of the genetic correlation for natural populations are lacking, mostly because pedigrees of wild individuals are rarely known. Recently, Lynch (1999) proposed a formula to estimate the genetic correlation in the absence of pedigree data. This method has been shown to be particularly accurate provided there is a large sample size and a minimum (20%) proportion of relatives. Lynch (1999) proposed the use of the bootstrap to estimate standard errors associated with genetic correlations, but did not test the reliability of such a method. We tested the bootstrap and showed that the jackknife can provide valid estimates of the genetic correlation calculated with the Lynch formula. The occurrence of undefined estimates, combined with the high number of replicates involved in the bootstrap, means there is a high probability of obtaining an upwardly biased, incomplete bootstrap, even when there is a high fraction of related pairs in a sample. It is easier to obtain complete jackknife estimates for which all the pseudovalues have been defined. We therefore recommend the use of the jackknife to estimate the genetic correlation with the Lynch formula. Provided data can be collected for more than two individuals at each location, we propose a group sampling method that produces low standard errors associated with the jackknife, even when there is a low fraction of relatives in a sample.
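A minimal sketch of the delete-one jackknife recommended here, with an ordinary Pearson correlation standing in for the Lynch (1999) genetic-correlation estimator (the resampling logic is the same: drop one pair, or one group, at a time and recompute the estimate):

```python
# Hypothetical sketch of a delete-one jackknife standard error. The statistic is
# an ordinary correlation used as a stand-in; the data are simulated trait pairs.
import numpy as np

def jackknife_se(pairs, statistic):
    """Delete-one jackknife standard error of statistic(pairs)."""
    n = len(pairs)
    pseudo = np.array([statistic(np.delete(pairs, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((pseudo - pseudo.mean()) ** 2))

def corr(data):
    return np.corrcoef(data[:, 0], data[:, 1])[0, 1]

rng = np.random.default_rng(11)
trait_pairs = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
print(corr(trait_pairs), jackknife_se(trait_pairs, corr))
```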

19.
Temperature is widely accepted as a critical indicator of aerobic microbial activity during composting but, to date, little effort has been made to devise an appropriate statistical approach for the analysis of temperature time series. Nonlinear, time-correlated effects have not previously been considered in the statistical analysis of temperature data from composting, despite their importance and the ubiquity of such features. A novel mathematical model is proposed here, based on a modified Gompertz function, which includes nonlinear, time-correlated effects. Methods are shown to estimate initial values for the model parameters. Algorithms in SAS are used to fit the model to different sets of temperature data from passively aerated compost. Methods are then shown for testing the goodness of fit of the model to data. Next, a method is described to determine, in a statistically rigorous manner, the significance of differences among the time-correlated characteristics of the data sets as described by the proposed model. An extra-sum-of-squares method was selected for this purpose. Finally, the model and methods are used to analyze a sample data set and are shown to be useful tools for the statistical comparison of temperature data in composting.
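Fitting only the deterministic part of such a model — a Zwietering-type modified Gompertz curve — can be sketched as follows; the paper's nonlinear, time-correlated error terms and SAS procedures are not reproduced, and the starting values are simple heuristics of the kind the abstract describes estimating.

```python
# Hypothetical sketch: fit a modified (Zwietering-type) Gompertz curve to a
# composting temperature time series. The temperatures below are simulated.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, t0, A, mu, lam):
    """Baseline t0, asymptotic rise A, maximum slope mu, lag time lam."""
    return t0 + A * np.exp(-np.exp(mu * np.e / A * (lam - t) + 1))

# Simulated compost temperatures over 20 days with measurement noise.
rng = np.random.default_rng(5)
days = np.linspace(0, 20, 60)
temps = gompertz(days, 15, 45, 12, 2) + rng.normal(0, 1.0, days.size)

p0 = [temps.min(), np.ptp(temps), 5.0, 1.0]      # heuristic starting values
params, cov = curve_fit(gompertz, days, temps, p0=p0)
print(dict(zip(["t0", "A", "mu", "lam"], np.round(params, 2))))
```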

20.
Lynch (1999) proposed a method for estimating genetic correlations from phenotypic measurements of individuals for which no pedigree information is available. This method assumes that shared environmental effects do not contribute to the similarity of relatives, and it is expected to perform best when sample sizes are large, many individuals in the sample are paired with close relatives, and the heritability of the traits is high. We tested the practicality of this method for field biologists by using it to estimate genetic correlations from measurements of field-caught waterstriders (Aquarius remigis). Results for sample sizes of less than 100 pairs were often unstable or undefined, and even with more than 500 pairs only half of the correlations that had been found to be significant in standard laboratory experiments were statistically significant in this study. Statistically removing the influence of environmental effects (shared between relatives) weakened the estimates, possibly by removing some of the genetic similarity between relatives. However, the method did generate statistically significant estimates for some genetic correlations. Lynch (1999) anticipated the problems found, and proposed another method that uses estimates of relatedness between members of pairs (from molecular marker data) to improve the estimates of genetic correlations, but that approach has yet to be tested in the field.
