Similar articles
 20 similar articles found
1.
Wang J. Genetics 2012, 191(1): 183-194
Quite a few methods have been proposed to infer sibship and parentage among individuals from their multilocus marker genotypes. They are all based on Mendelian laws either qualitatively (exclusion methods) or quantitatively (likelihood methods), have different optimization criteria, and use different algorithms in searching for the optimal solution. The full-likelihood method assigns sibship and parentage relationships among all sampled individuals jointly. It is by far the most accurate method, but is computationally prohibitive for large data sets with many individuals and many loci. In this article I propose a new likelihood-based method that is computationally efficient enough to handle large data sets. The method uses the sum of the log likelihoods of pairwise relationships in a configuration as the score to measure its plausibility, where log likelihoods of pairwise relationships are calculated only once and stored for repeated use. By analyzing several empirical and many simulated data sets, I show that the new method is more accurate than pairwise likelihood and exclusion-based methods, but is slightly less accurate than the full-likelihood method. However, the new method is computationally much more efficient than the full-likelihood method, and when both sexes are polygamous and markers have genotyping errors, it can be several orders of magnitude faster. The new method can handle a large sample with thousands of individuals, with the number of markers limited only by computer memory.
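
The scoring idea in this abstract (sum the stored pairwise log likelihoods implied by a configuration) can be sketched as follows. This is a minimal illustration, not the author's implementation: the relationship labels `"sib"` and `"unrelated"`, the function name `configuration_score`, and all log-likelihood values are hypothetical.

```python
from itertools import combinations


def configuration_score(family_of, pair_loglik):
    """Score a configuration as the sum of cached pairwise log likelihoods.

    family_of:   dict mapping each individual to a family label (assumed encoding).
    pair_loglik: dict mapping a sorted pair (a, b) to a dict of log likelihoods,
                 one per candidate relationship, computed once beforehand.
    """
    score = 0.0
    for a, b in combinations(sorted(family_of), 2):
        # The configuration fixes the relationship of every pair.
        relationship = "sib" if family_of[a] == family_of[b] else "unrelated"
        score += pair_loglik[(a, b)][relationship]
    return score


# Hypothetical cached values; in practice they would come from marker genotypes.
pair_loglik = {
    ("A", "B"): {"sib": -2.0, "unrelated": -5.0},
    ("A", "C"): {"sib": -6.0, "unrelated": -3.0},
    ("B", "C"): {"sib": -7.0, "unrelated": -3.5},
}

together = configuration_score({"A": 1, "B": 1, "C": 2}, pair_loglik)
all_one = configuration_score({"A": 1, "B": 1, "C": 1}, pair_loglik)
print(together, all_one)  # the higher score marks the more plausible configuration
```

Because the table is filled once, evaluating a new candidate configuration during the search costs only dictionary lookups and additions.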

2.
In the Haseman-Elston approach the squared phenotypic difference is regressed on the proportion of alleles shared identical by descent (IBD) to map a quantitative trait to a genetic marker. In applications the IBD distribution is estimated and usually cannot be determined uniquely owing to incomplete marker information. At Genetic Analysis Workshop (GAW) 13, Jacobs et al. [BMC Genet 2003, 4(Suppl 1):S82] proposed to improve the power of the Haseman-Elston algorithm by weighting for information available from marker genotypes. The authors did not show, however, the validity of the employed asymptotic distribution. In this paper, we use the simulated data provided for GAW 14 and show that weighting Haseman-Elston by marker information results in increased type I error rates. Specifically, we demonstrate that the number of significant findings throughout the chromosome is significantly increased with weighting schemes. Furthermore, we show that the classical Haseman-Elston method keeps its nominal significance level when applied to the same data. We therefore recommend using Haseman-Elston with marker informativity weights only in conjunction with empirical p-values. Whether this approach in fact yields an increase in power needs to be investigated further.
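
The regression at the heart of the classical (unweighted) Haseman-Elston approach can be sketched as an ordinary least-squares fit. This is only an illustration under assumptions: the function name `haseman_elston_fit` and the sib-pair values are hypothetical, and no significance test or weighting scheme from the paper is reproduced.

```python
def haseman_elston_fit(pairs):
    """Regress squared sib-pair trait differences on IBD sharing proportions.

    pairs: list of (trait_sib1, trait_sib2, ibd_proportion) tuples.
    Returns (slope, intercept) of the least-squares line.
    """
    y = [(t1 - t2) ** 2 for t1, t2, _ in pairs]
    x = [ibd for _, _, ibd in pairs]
    n = len(pairs)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sib pairs: pairs sharing more alleles IBD differ less in the trait.
example_pairs = [
    (13.0, 10.0, 0.0),
    (12.0, 10.0, 0.5),
    (11.0, 10.0, 1.0),
]
slope, intercept = haseman_elston_fit(example_pairs)
print(slope, intercept)
```

A clearly negative slope is the pattern that suggests linkage between the marker and a trait locus; judging its significance is where the asymptotic versus empirical p-value question raised in the abstract comes in.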

3.
Summary: For the situation of a Mendelian disease linked to a genetic marker, a new method is described that allows evaluating, for genetic counseling, the information potentially available from the linked marker before the marker data are actually obtained, that is, prior to drawing blood for marker typing. For a consultand in a family pedigree, the method determines the risk distribution (small families) or an approximation to it (larger families) and calculates the probability that the risk will deviate beyond certain limits from its a priori value, which exists without marker data, for example, that the risk will be smaller than 0.10 or larger than 0.90. The method was applied here to a pedigree of 15 individuals for which analytical calculations would be difficult to carry out.

4.
Genetic marker‐based identification of distinct individuals and recognition of duplicated individuals has important applications in many research areas in ecology, evolutionary biology, conservation biology and forensics. The widely applied genotype mismatch (MM) method, however, is inaccurate because it relies on a fixed and suboptimal threshold number (TM) of mismatches, and often yields self‐inconsistent pairwise inferences. In this study, I improved the MM method by calculating an optimal TM to accommodate the number, mistyping rates, missing data and allele frequencies of the markers. I also developed a pairwise likelihood relationship (LR) method and a likelihood clustering (LC) method for individual identification, using poor‐quality data that may have high and variable rates of allelic dropouts and false alleles at genotyped loci. The 3 methods together with the relatedness (RL) method were then compared in accuracy by analysing an empirical frog data set and many simulated data sets generated under different parameter combinations. The analysis results showed that LC is generally one or two orders of magnitude more accurate for individual identification than the other methods. Its accuracy is especially superior when the sampled multilocus genotypes have poor quality (i.e. teeming with genotyping errors and missing data) and are highly replicated, a situation typical of noninvasive sampling used in estimating population size. Importantly, LC is the only method that guarantees to produce self‐consistent results by partitioning the entire set of multilocus genotypes into distinct clusters, each cluster containing one or more genotypes that all represent the same individual. The LC and LR methods were implemented in the computer program COLONY, which is available for free download from the Internet.
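
The basic mismatch (MM) rule that the abstract criticises can be sketched as below. The genotype encoding (a tuple of two alleles per locus, `None` for missing data), the function names, and the threshold values are assumptions made for illustration; the optimal-TM calculation and the LR and LC methods are not reproduced here.

```python
def count_mismatches(genotype1, genotype2):
    """Count loci at which two multilocus genotypes differ, skipping missing data."""
    mismatches = 0
    for locus1, locus2 in zip(genotype1, genotype2):
        if locus1 is None or locus2 is None:
            continue  # a locus missing in either sample is not informative
        if sorted(locus1) != sorted(locus2):  # allele order within a locus is irrelevant
            mismatches += 1
    return mismatches


def same_individual(genotype1, genotype2, threshold):
    """Apply the fixed-threshold rule: at most `threshold` mismatching loci."""
    return count_mismatches(genotype1, genotype2) <= threshold


# Hypothetical samples typed at four loci.
sample_a = [(1, 2), (3, 3), None, (5, 6)]
sample_b = [(2, 1), (3, 4), (7, 7), (5, 6)]

print(count_mismatches(sample_a, sample_b))
print(same_individual(sample_a, sample_b, threshold=1))
print(same_individual(sample_a, sample_b, threshold=0))
```

Since each pair is judged in isolation, sample A can match B and B can match C while A fails to match C, which is the kind of self-inconsistency a clustering approach avoids.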

5.
The female gametophyte is an absolutely essential structure for angiosperm reproduction, and female sterility has been reported in a number of crops. In this paper, a maximum-likelihood method is presented for estimating the position and effect of a female partial-sterile locus in a backcross population using the observed data of dominant or codominant markers. The ML solutions are obtained via Bailey’s method. The procedure for estimating the recombination fractions and the viabilities of female gametes is described, and the variances of the parameter estimates are also presented. Application of the method is demonstrated using a set of simulated data. This method circumvents the problems of the traditional mapping methods for female sterile genes which were based on data from seed set or embryo-sac morphology and anatomy.

6.
MOTIVATION: The observation of positive selection acting on a mutant indicates that the corresponding mutation has some form of functional relevance. Determining the fitness effects of mutations thus has relevance to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. We here describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data. RESULTS: The method accurately reproduces complex marker frequency trajectories. In simulations for which positive selection is close to 5% per generation, we obtain correlations upwards of 0.91 between correct and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to being correct. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability and initial fitness defect.

7.
Surrogate marker evaluation from an information theory perspective
Alonso A, Molenberghs G. Biometrics 2007, 63(1): 180-186
The last 20 years have seen much work in the area of surrogate marker validation, partly devoted to framing the evaluation in a multitrial framework, leading to definitions in terms of the quality of trial- and individual-level association between a potential surrogate and a true endpoint (Buyse et al., 2000, Biostatistics 1, 49-67). A drawback is that different settings have led to different measures at the individual level. Here, we use information theory to create a unified framework, leading to a definition of surrogacy with an intuitive interpretation, offering interpretational advantages, and applicable in a wide range of situations. Our method provides a better insight into the chances of finding a good surrogate endpoint in a given situation. We further show that some of the previous proposals follow as special cases of our method. We illustrate our methodology using data from a clinical study in psychiatry.

8.
MOTIVATION: DNA microarray data analysis has been used previously to identify marker genes which discriminate cancer from normal samples. However, due to the limited sample size of each study, there are few common markers among different studies of the same cancer. With the rapid accumulation of microarray data, it is of great interest to integrate inter-study microarray data to increase sample size, which could lead to the discovery of more reliable markers. RESULTS: We present a novel, simple method of integrating different microarray datasets to identify marker genes and apply the method to prostate cancer datasets. In this study, by applying a new statistical method, referred to as the top-scoring pair (TSP) classifier, we have identified a pair of robust marker genes (HPN and STAT6) by integrating microarray datasets from three different prostate cancer studies. Cross-platform validation shows that the TSP classifier built from the marker gene pair, which simply compares relative expression values, achieves high accuracy, sensitivity and specificity on independent datasets generated using various array platforms. Our findings suggest a new model for the discovery of marker genes from accumulated microarray data and demonstrate how the great wealth of microarray data can be exploited to increase the power of statistical analysis. CONTACT: leixu@jhu.edu.
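
A top-scoring pair classifier of the kind described above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the gene names `g1` to `g3`, the expression values, and the function names are hypothetical, and the pair score used here is the commonly cited definition (the absolute difference between the two classes in the frequency with which one gene is expressed below the other).

```python
from itertools import combinations


def prob_less(samples, gene_i, gene_j):
    """Fraction of samples in which gene_i is expressed below gene_j."""
    return sum(1 for s in samples if s[gene_i] < s[gene_j]) / len(samples)


def tsp_score(class_a, class_b, gene_i, gene_j):
    """How strongly the ordering of the two genes differs between the classes."""
    return abs(prob_less(class_a, gene_i, gene_j) - prob_less(class_b, gene_i, gene_j))


def best_pair(class_a, class_b):
    """Return the gene pair with the highest score (first one wins on ties)."""
    genes = sorted(class_a[0])
    return max(combinations(genes, 2), key=lambda p: tsp_score(class_a, class_b, *p))


def classify(sample, gene_i, gene_j, class_a, class_b, label_a, label_b):
    """Assign the class whose training samples more often show the same ordering."""
    a_is_less = prob_less(class_a, gene_i, gene_j) > prob_less(class_b, gene_i, gene_j)
    return label_a if (sample[gene_i] < sample[gene_j]) == a_is_less else label_b


# Hypothetical training data: each sample maps gene name to expression value.
normal = [{"g1": 1.0, "g2": 3.0, "g3": 2.0}, {"g1": 1.5, "g2": 2.5, "g3": 2.7}]
tumor = [{"g1": 4.0, "g2": 2.0, "g3": 3.0}, {"g1": 3.5, "g2": 1.0, "g3": 3.8}]

pair = best_pair(normal, tumor)
new_sample = {"g1": 5.0, "g2": 1.0, "g3": 2.0}
prediction = classify(new_sample, pair[0], pair[1], normal, tumor, "normal", "tumor")
print(pair, prediction)
```

Only the within-sample ordering of the two genes is used, so the rule is unaffected by any monotone rescaling of a sample's expression values, which is one plausible reason such a classifier transfers across array platforms.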

9.
The number of marker loci required to answer a given research question satisfactorily is especially important for dominant markers since they have a lower information content than co‐dominant marker systems. In this study, we used simulated dominant marker data sets to determine the number of dominant marker loci needed to obtain satisfactory results from two popular population genetic analyses: STRUCTURE and AMOVA (analysis of molecular variance). Factors such as migration, level of population differentiation, and unequal sampling were varied in the data sets to mirror a range of realistic research scenarios. AMOVA performed well under all scenarios with a modest quantity of markers while STRUCTURE required a greater number, especially when populations were closely related. The popular ΔK method of determining the number of genetically distinct groups worked well when sampling was balanced, but underestimated the true number of groups with unbalanced sampling. These results provide a window through which to interpret previous work with dominant markers and we provide a protocol for determining the number of markers needed for future dominant marker studies.

10.
An issue often encountered in statistical genetics is whether, or to what extent, it is possible to estimate the degree to which individuals sampled from a background population are related to each other, on the basis of the available genotype data and some information on the demography of the population. In this article, we consider this question using explicit modelling of the pedigrees and gene flows at unlinked marker loci, but then restricting ourselves to a relatively recent history of the population, that is, considering the genealogy at most some tens of generations backwards in time. As a computational tool we use a Markov chain Monte Carlo numerical integration on the state space of genealogies of the sampled individuals. As illustrations of the method, we consider the question of relatedness at the level of genes/genomes (IBD estimation), using both simulated and real data.

11.
Korol AB, Ronin YI, Kirzhner VM. Biometrics 1996, 52(2): 426-441
This paper presents a comparison of three methods of parameter estimation in analysis of linkage between a quantitative trait locus (QTL) and a marker locus: maximum likelihood, mean square for trait cumulative distribution function, and method of moments, employing simulated backcross data. The sensitivity of estimates to violation of assumptions of normality and equal variances was also studied. Some measures of discrepancy between the trait distributions in the QTL groups are considered to evaluate the potential dependence of the resolution capacity of the QTL substitution effect with respect to trait mean value and variance.

12.
The use of faecal marker concentration curves, in conjunction with compartmental analysis, is examined as a method for predicting faecal output in ruminants. Formulae for faecal production are derived for the various multicompartment models currently used to interpret marker concentration data. A comparison of observed and model-derived estimates of faecal dry matter production using three different markers is given for sheep consuming hay or a mixed diet.

13.
BACKGROUND: Knowledge of normal cardiac kinematics is important when attempting to understand the mechanisms that impair the contractile function of the heart during disease. The complex kinematics of the heart can be studied by inserting radiopaque markers in the cardiac wall and studying the pumping heart with biplane cineradiography. In order to study the local strain, the bead array was developed where small radiopaque beads are inserted along three columns transmurally in the left ventricle. METHOD: This paper suggests a straightforward method for strain computation, based on polynomial least-squares fitting and tailored for combined marker and bead array analyses. RESULTS: This polynomial method gives small errors for a realistic bead array on an analytical test case. The method delivers an explicit expression of the Lagrangian strain tensor as a polynomial function of the coordinates of material points in the reference configuration. The method suggested in this paper is validated with analytical strains on a deforming cylinder resembling the heart, compared to a previously suggested finite element method, and applied to in vivo ovine data. The errors in the estimated strain components are shown to remain unchanged on an analytical test case when evaluating the effects of one missing bead. In conclusion, the proposed strain computation method is accurate and robust, with errors smaller than or comparable to the current gold standard when applied on an analytical test case.

14.

Background  

Simple Sequence Repeats (SSRs), or microsatellites, are among the most powerful genetic markers known. A common method for the development of SSR markers is the construction of genomic DNA libraries enriched for SSR sequences, followed by DNA sequencing. However, designing optimal SSR markers from bulk sequence data is a laborious and time-consuming process.

15.
Piepho HP, Koch G. Genetics 2000, 155(3): 1459-1468
Amplified fragment length polymorphisms (AFLPs) currently are among the most widely used marker systems. In many studies, AFLPs are analyzed on the basis of the presence or absence of a band on an electrophoretic gel. As a result, dominant homozygous individuals are not distinguished from heterozygous individuals, resulting in a considerable loss of information. This article shows how codominant information can be obtained if the amount of PCR products is quantified. Due to measurement variation, genotyping on the basis of such information is not error-free. We propose use of normal mixture distributions to determine the most likely genotype, given the data. The method is exemplified using AFLP data from sugar beet.
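
The genotype-calling step described above can be sketched as a posterior calculation under a normal mixture. The mixture parameters below (weights, means, standard deviations of band intensity per genotype) are hypothetical placeholders; in a real analysis they would be estimated from the data, and the function names are assumptions for illustration only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def most_likely_genotype(intensity, components):
    """Return the genotype with the highest posterior probability and all posteriors.

    components: dict mapping genotype to (mixing weight, mean, standard deviation).
    """
    weighted = {
        genotype: weight * normal_pdf(intensity, mean, sd)
        for genotype, (weight, mean, sd) in components.items()
    }
    total = sum(weighted.values())
    posteriors = {genotype: value / total for genotype, value in weighted.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical mixture: band intensity roughly proportional to allele dosage.
components = {
    "aa": (0.25, 0.0, 0.2),  # no band
    "Aa": (0.50, 1.0, 0.2),  # one dose
    "AA": (0.25, 2.0, 0.2),  # two doses
}

call_low, post_low = most_likely_genotype(1.2, components)
call_high, post_high = most_likely_genotype(1.9, components)
print(call_low, call_high)
```

The posterior probabilities also make the residual uncertainty explicit: an intensity midway between two component means yields a call with low confidence, in line with the abstract's remark that such genotyping is not error-free.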

16.
Laura E. Timm. Molecular Ecology 2020, 29(12): 2133-2136
From its inception, population genetics has been nearly as concerned with the genetic data type to which analyses are brought to bear as it is with the analysis methods themselves. The field has traversed allozymes, microsatellites, segregating sites in multilocus alignments and, currently, single nucleotide polymorphisms (SNPs) generated by high‐throughput genomic sequencing methods, primarily whole genome sequencing and reduced representation library (RRL) sequencing. As each emerging data type has gained traction, it has been compared to existing methods, based on its relative ability to discern population structural complexity at increasing levels of resolution. However, this is usually done by comparing the gold standard in one data type to the gold standard in the new data type. These gold standards frequently differ in power and in sampling density, both across a genome and throughout a spatial range. In this issue of Molecular Ecology, D’Aloia et al. apply the high‐throughput approach as fully as possible to microsatellites, nuclear loci and SNPs genotyped through an RRL method; this is coupled with a spatially dense sampling scheme. Completing a battery of population genetics analyses across data types (including a series of down‐sampled data sets), the authors find that SNP data are slightly more sensitive to fine‐scale genetic structure, and the results are more resilient to down‐sampling than microsatellites and nonrepetitive nuclear loci. However, their results are far from an unqualified victory for RRL SNP data over all previous data types: the authors note that modest additions to the microsatellites and nuclear loci data sets may provide the necessary analytical power to delineate the fine‐scale genetic structuring identified by SNPs. As always, as the field begins to fully embrace the newest thing, good science reminds us that traditional data types are far from useless, especially when combined with a well‐designed sampling scheme.

17.
Genetic structure is ubiquitous in wild populations and is the result of the processes of natural selection, genetic drift, mutation, and gene flow. Genetic drift and divergent selection promote the generation of genetic structure, while gene flow homogenizes the subpopulations. The ability to detect genetic structure from marker data diminishes rapidly with a decreasing level of differentiation among subpopulations. Weak genetic structure may be unimportant over evolutionary time scales but could have important implications in ecology and conservation biology. In this paper we examine methods for detecting and quantifying weak genetic structures using simulated data. We simulated populations consisting of two putative subpopulations evolving for up to 50 generations with varying degrees of gene flow (migration), and varying amounts of information (allelic diversity). There are a number of techniques available to detect and quantify genetic structure but here we concentrate on four methods: F(ST), population assignment, relatedness, and sibship assignment. Under the simple mating system simulated here, the four methods produce qualitatively similar results. However, the assignment method performed relatively poorly when genetic structure was weak and we therefore caution against using this method when the analytical aim is to detect fine-scale patterns. Further work should examine situations with different mating systems, for example where a few individuals dominate reproductive output of the population. This study will help workers to design their experiments (e.g., sample sizes of markers and individuals), and to decide which methods are likely to be most appropriate for their particular data.
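
To make the notion of weak structure concrete, F(ST) for two equally sized subpopulations at a single biallelic locus can be computed from heterozygosities as (H_T - H_S) / H_T. The abstract does not say which F(ST) estimator the authors used, so this textbook form, the function name, and the allele frequencies are assumptions for illustration.

```python
def fst_two_subpopulations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, two equal-sized subpopulations.

    p1, p2: frequency of the same allele in subpopulations 1 and 2.
    """
    # Mean expected heterozygosity within subpopulations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall, so no differentiation is measurable
    return (h_t - h_s) / h_t


strong = fst_two_subpopulations(0.2, 0.8)
weak = fst_two_subpopulations(0.45, 0.55)
print(strong, weak)
```

With frequencies of 0.45 and 0.55 the value is close to 0.01, the kind of weak differentiation that sampling noise in allele frequencies can easily mask.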

18.
A reliable new cell marker in Xenopus
A new reliable and durable method for marking cells in Xenopus is described. It is based on the differential staining of the nuclei of different Xenopus species, e.g., X. laevis and X. borealis, with the fluorescent dye quinacrine. This method permits us to recognize with certainty each cell in mitosis and interphase of X. borealis origin in any tissue combination with most of the other Xenopus species tested so far. This holds for all stages of development following grafting experiments, including adult tissues. The method is applicable in smears and squash preparations as well as in microtome sections. The method is particularly useful for marking migrating cells which are difficult to track, for instance, in embryos and in the circulatory system.

19.

Background

Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of great interest, and a challenge, to merge data sets from multiple studies to increase the sample size, which may in turn increase the power of statistical inferences. In this study, we combined two lung cancer studies using microarray GeneChip®, employed two gene shaving methods and a two-step survival test to identify genes with expression patterns that can distinguish diseased from normal samples, and to indicate patient survival, respectively.

Results

In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Among these marker genes, 8 and 7 were found to be cancer-related in other published reports. Furthermore, based on these marker genes, the classifiers we built from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazard regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05). Twenty-six of these 36 genes were reported as survival-related genes from the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16.

Conclusion

This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy.
