Similar articles
 20 similar articles found (search time: 34 ms)
1.
Xiao J  Wang X  Hu Z  Tang Z  Xu C 《Heredity》2007,98(6):427-435
Segregation analysis is a method of detecting major genes for quantitative traits without using marker information. It serves as an important tool in helping investigators to plan further studies such as quantitative trait loci mapping or more sophisticated genomic analyses. However, current methods of segregation analysis for a single trait typically have low statistical power. We propose a multivariate segregation analysis (MSA) that takes advantage of the correlation structure of multiple quantitative traits to detect major genes. This method not only increases the statistical power, but also allows dissection of the genetic architecture underlying the trait complex. In MSA the observed phenotypes of multiple correlated traits are fitted to a multivariate Gaussian mixture model. Model parameters are estimated under the maximum likelihood framework via the expectation-maximization algorithm. The presence of major genes is tested using likelihood ratio test statistics. Pleiotropy is distinguished from close linkage by comparing three possible models using the Bayesian information criterion. Two simulation experiments were performed based on the F2 mating design. In the first, the statistical properties of MSA under varying heritabilities and sample sizes were investigated and the results compared with those obtained from single-trait analysis. In the second simulation the efficacy of MSA in separating pleiotropy from close linkage was demonstrated. Finally, the new method was applied to real data and detected a major gene responsible for both plant height and tiller number in rice.
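As a rough illustration of the mixture-model machinery this abstract describes, the sketch below fits a two-component Gaussian mixture by EM and compares it with a single normal using a likelihood ratio statistic and BIC. It is a deliberately simplified, univariate stand-in with a common variance, not the authors' multivariate MSA; the function names, starting values and simulated data are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def loglik_single_normal(data):
    """Maximized log-likelihood of one normal (the 'no major gene' model in this toy)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def fit_two_component_em(data, n_iter=300):
    """EM for a two-component normal mixture with a common standard deviation."""
    data = sorted(data)
    n = len(data)
    # Illustrative starting values: quartiles for the means, overall spread for sd.
    mu1, mu2 = data[n // 4], data[(3 * n) // 4]
    grand_mean = sum(data) / n
    sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    w = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to component 1.
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu1, sd)
            b = (1.0 - w) * normal_pdf(x, mu2, sd)
            resp.append(a / (a + b))
        # M-step: weighted updates of the mixing proportion, means and common sd.
        s1 = sum(resp)
        s2 = n - s1
        w = s1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / s1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / s2
        var = sum(r * (x - mu1) ** 2 + (1.0 - r) * (x - mu2) ** 2
                  for r, x in zip(resp, data)) / n
        sd = math.sqrt(var)
    loglik = sum(math.log(w * normal_pdf(x, mu1, sd) + (1.0 - w) * normal_pdf(x, mu2, sd))
                 for x in data)
    return w, mu1, mu2, sd, loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller values indicate the preferred model."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Simulated phenotypes from two well-separated groups (purely synthetic data).
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(300)]

ll_single = loglik_single_normal(sample)
w, mu1, mu2, sd, ll_mix = fit_two_component_em(sample)
lrt = 2.0 * (ll_mix - ll_single)
bic_single = bic(ll_single, 2, len(sample))   # mean, variance
bic_mix = bic(ll_mix, 4, len(sample))         # weight, two means, common variance
print(round(mu1, 2), round(mu2, 2), round(w, 2), round(lrt, 1))
```

The null distribution of such a likelihood ratio statistic is nonstandard for mixtures, so the value printed here should be read as a descriptive comparison rather than a calibrated test.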

2.
In this paper the detection of associations between rare variants and continuous phenotypes of interest is investigated via the likelihood-ratio-based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component is located on the boundary of its parameter space. In this situation the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution we resort to the bootstrap method because it is generically applicable and easy to implement. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are implemented to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test we propose an effective mixture approximation to the bootstrap null distribution. The GAW17 data are used to illustrate the proposed test.
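The boundary problem mentioned in this abstract can be seen in a much simpler setting than a linear mixed model. The sketch below is a toy under stated assumptions, not the authors' variance component test: it tests a normal mean constrained to be non-negative (known unit variance), where the null value sits on the boundary, and obtains a p-value by parametric bootstrap. All function names and the simulated data are hypothetical.

```python
import random


def lrt_stat(sample):
    """LRT statistic for H0: mu = 0 against mu >= 0, normal data with known unit variance.

    The constrained MLE is max(sample mean, 0), which gives n * max(mean, 0)^2.
    """
    n = len(sample)
    mean = sum(sample) / n
    return n * max(mean, 0.0) ** 2


def parametric_bootstrap_pvalue(sample, n_boot, rng):
    """Simulate data under the fitted null model and compare statistics."""
    observed = lrt_stat(sample)
    n = len(sample)
    null_stats = []
    for _ in range(n_boot):
        fake = [rng.gauss(0.0, 1.0) for _ in range(n)]  # data generated under H0
        null_stats.append(lrt_stat(fake))
    exceed = sum(1 for t in null_stats if t >= observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_boot + 1), null_stats


rng = random.Random(7)
observed_sample = [rng.gauss(1.0, 1.0) for _ in range(50)]  # synthetic data, true mean 1
p_value, null_stats = parametric_bootstrap_pvalue(observed_sample, 500, rng)

# Under H0 the statistic is exactly zero whenever the sample mean is negative,
# so about half of the bootstrap null distribution is a point mass at zero.
zero_fraction = sum(1 for t in null_stats if t == 0.0) / len(null_stats)
print(p_value, zero_fraction)
```

In this toy the bootstrap null distribution is roughly a half-and-half mixture of a point mass at zero and a continuous part, which is why a plain chi-square reference can mislead at a boundary.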

3.
The assumption of Hardy-Weinberg equilibrium (HWE) is generally required for association analysis using a case-control design on autosomes; otherwise, the size may be inflated. There has been increasing interest in exploring the association between diseases and markers on the X chromosome and the effect of departure from HWE on association analysis on the X chromosome. Note that there are two hypotheses of interest regarding the X chromosome: (i) the frequencies of the same allele at a locus in males and females are equal and (ii) the inbreeding coefficient in females is zero (without excess homozygosity). Thus, excess homozygosity and significantly different minor allele frequencies between males and females are used to filter X-linked variants. There are two existing methods to test for (i) and (ii), respectively. However, their size and power have not yet been studied. Further, no existing method tests both hypotheses simultaneously. Therefore, in this article, we propose a novel likelihood ratio test for both (i) and (ii) on the X chromosome. To further investigate the underlying reason why the null hypothesis is statistically rejected, we also develop two likelihood ratio tests for detecting (i) and (ii), respectively. Moreover, we explore the effect of population stratification on the proposed tests. From our simulation study, the size of the test for (i) is close to the nominal significance level. However, the excess homozygosity test and the test for both (i) and (ii) are conservative. So, we propose parametric bootstrap techniques to evaluate their validity and performance. Simulation results show that the proposed methods with bootstrap techniques control the size well under the respective null hypothesis. Power comparison demonstrates that the methods with bootstrap techniques are more powerful than those without the bootstrap procedure and the existing methods. The application of the proposed methods to a rheumatoid arthritis dataset indicates their utility.

4.
The interplant variation in sexual and asexual reproduction in an Oregon population of the alpine perennial Antennaria media was investigated. Four polymorphic loci were assayed by enzyme electrophoresis of the progeny of 72 families from two subpopulations of A. media. The population was divided into two spatially distinct subpopulations. A multilocus model, incorporating a mixture of apomixis and random outcrossing, was used to estimate the mating system of pistillate plants both on the population and individual levels with statistical significance of the estimates based on bootstrap methods. The population contained a mixture of sexual individuals, partial apomicts, and obligate apomicts. The first subpopulation contained individuals that were partially apomictic and presumably produced both reduced and unreduced embryo sacs. There was a conspicuous difference in the breeding system composition between the two subpopulations. The first subpopulation had a “female” biased gender ratio and contained mostly obligate apomicts, some partial apomicts, and some outcrossing amphimicts. The second subpopulation, which had a nearly balanced gender ratio, contained mostly amphimicts, some obligate apomicts, but no facultative apomicts. This is the first study to document partial apomixis in individual plants by the use of genetic markers.

5.
Seasonal fitness declines are common, but the relative contribution of different reproductive components to the seasonal change in the production of reproductive young, and the component-specific drivers of this change, are generally poorly known. We used long-term data (17 years) on breeding time (i.e. date of first egg laid) in northern wheatears (Oenanthe oenanthe) to investigate seasonal reproductive patterns and estimate the relative contributions of reproductive components to the overall decline in reproduction, while accounting for factors potentially linked to seasonal declines, i.e. individual and habitat quality. All reproductive components, namely nest success (reflecting nest predation rate), clutch size, fledging success and recruitment success, showed a clear decline with breeding time, whereas subsequent adult survival did not. A non-linear increase in nest predation rate caused nest success to decline rapidly early in the season and level off at ~80% success late in the breeding season. The combined seasonal decline in all reproductive components caused the mean production of recruits per nest to drop from around 0.7 to 0.2, with the relative contribution greatest for recruitment success, which accounted for ~50% of the decline. Our data suggest that changing environmental conditions together with effects of nest predation have strong effects on the seasonal decline in fitness. Our demonstration of the combined effects of all reproductive components and their relative contribution shows that omitting data from later stages of breeding (recruitment) can greatly underestimate seasonal fitness declines.

6.
A statistical method for parametric density estimation based upon a mixture‐of‐genotypes model is developed for the thermostable phenol sulfotransferase (SULT1A1) activity, which has a putative role in modifying risk for colon and prostate cancer/polyps. The EM algorithm for the general mixture model is modified to accommodate the genetic constraints and is used to estimate genotype frequencies from the distribution of the SULT1A1 phenotype. A parametric bootstrap likelihood ratio test is considered as a testing method for the number of mixing components. The size and power of the test are then investigated and compared with the conventional chi‐squared test. The relative risk associated with genotypes defined by this model is also investigated through the generalized linear model. This analysis revealed that a genotype with the highest mean value of SULT1A1 activity has greater impact on cancer risk than others. This result suggests that the phenotype with a higher SULT1A1 activity might be important in studying the association between the cancer risk and SULT1A1 activity. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

7.
You N  Xuan Mao C 《Biometrics》2008,64(2):371-376
Summary. Capture–recapture methods are widely adopted to estimate sizes of populations of public health interest using information from surveillance systems. For a two-list surveillance system with a discrete covariate, a population is divided into several subpopulations. A unified framework is proposed in which the logits of presence probabilities are decomposed into case effects and list effects. The estimators for the whole population and subpopulation sizes, their adjusted versions, and asymptotic standard errors admit closed-form expressions. Asymptotic and bootstrap individual and simultaneous confidence intervals are easily constructed. Conditional likelihood ratio tests are used to select one from three possible models. Real examples are investigated.
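For orientation, the classical two-list estimators that this kind of framework builds on can be written in a few lines. The sketch below shows the Lincoln–Petersen estimator and Chapman's bias-adjusted version, applied separately within each level of a discrete covariate and then summed. It is not the logit decomposition proposed in the abstract, and the stratum labels and counts are invented for illustration.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list population size estimate: n1 and n2 are list sizes, m is the overlap."""
    if m == 0:
        raise ValueError("no cases appear on both lists; the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's adjusted estimator, which remains defined when the overlap is zero."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts per covariate level: (cases on list 1, cases on list 2, on both).
strata = {
    "under_40": (200, 150, 50),
    "40_and_over": (120, 90, 30),
}

per_stratum = {name: lincoln_petersen(*counts) for name, counts in strata.items()}
total = sum(per_stratum.values())
print(per_stratum, total)
```

Summing stratum-wise estimates allows capture probabilities to differ between covariate levels, which a single pooled estimate would ignore.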

8.
Population genetics studies are crucial because they help in understanding the epidemiological aspects of dengue and help improve vector control measures. This research aims to investigate the population genetic structure of two common species of Aedes mosquitoes in Penang, Aedes aegypti and Aedes albopictus, using the cytochrome oxidase I (COI) mitochondrial DNA (mtDNA) marker. Molecular investigations were based on 440 bp and 418 bp of mtDNA COI from 125 and 334 larvae of Aedes aegypti and Aedes albopictus, respectively, from 32 locations in Penang. All samples were submitted to BLASTn for species identification. The haplotype diversity, nucleotide diversity, neutrality test and mismatch distribution analysis were conducted in DnaSP version 5.10.1. AMOVA was conducted in ARLEQUIN version 3.5, and the phylogenetic reconstructions based on maximum likelihood (ML) and neighbor-joining (NJ) methods were implemented in MEGA X. The relationships among haplotypes were further tested by creating a minimum spanning tree using Network version 4.6.1. All samples were genetically identified and clustered into six distinct species. Among the species, Ae. albopictus was the most abundant (67.2%), followed by Ae. aegypti (25.2%); the remainder were Culex sp. and Toxorhynchites sp. Both Ae. aegypti and Ae. albopictus show low nucleotide diversity (π) and high haplotype diversity (h), while the neutrality test shows a negative value in most of the populations for both species. A total of 39 and 64 haplotypes were recorded for Ae. aegypti and Ae. albopictus, respectively. AMOVA revealed that most of the variation occurred within populations for both species. Mismatch distribution analysis showed a bimodal distribution for Ae. aegypti, whereas Ae. albopictus showed a unimodal distribution. Genetic distance based on the Tamura-Nei parameter showed low genetic divergence within populations and high genetic divergence among populations for both species. The maximum likelihood tree showed no obvious pattern of population genetic structure for either Ae. aegypti or Ae. albopictus from Penang, and moderate to high bootstrap values supported this conclusion. The minimum spanning networks for Ae. aegypti and Ae. albopictus showed five and three dominant haplotypes, respectively, which indicates a mixture of haplotypes from the regions analysed. This study revealed no population genetic structure in either Ae. aegypti or Ae. albopictus in Penang. Mutation has occurred rapidly in both species, and this will make controlling the populations challenging. However, further analysis is needed to confirm this statement.

9.
In a case study of fungi of the class Sordariomycetes, we evaluated the effect of multiple sequence alignment (MSA) on the reliability of the phylogenetic trees, topology and confidence of major phylogenetic clades. We compared two main approaches for constructing MSA based on (1) the knowledge of the secondary (2D) structure of ribosomal RNA (rRNA) genes, and (2) automatic construction of MSA by four alignment programs characterized by different algorithms and evaluation methods, CLUSTAL, MAFFT, MUSCLE, and SAM. In the primary fungal sequences of the two functional rRNA genes, the nuclear small and large ribosomal subunits (18S and 28S), we identified four and six, respectively, highly variable regions, which correspond mainly to hairpin loops in the 2D structure. These loops are often positioned in expansion segments, which are missing or are not completely developed in the Archaeal and Eubacterial kingdoms. Proper sorting of these sites was key to constructing an accurate MSA. We utilized DNA sequences from 28S as an example for one-gene analysis. Five different MSAs were created and analyzed with maximum parsimony and maximum likelihood methods. The phylogenies inferred from the alignments improved with 2D structure with identified homologous segments, and those constructed using the MAFFT alignment program, with all highly variable regions included, provided the most reliable phylograms with higher bootstrap support for the majority of clades. We illustrate and provide examples demonstrating that re-evaluating ambiguous positions in the consensus sequences using 2D structure and covariance is a promising means of improving the quality and reliability of sequence alignments.

10.

Background and scope

Differential equation systems modeling biochemical reaction networks can only give quantitative predictions when they are in accordance with experimental data. However, even if a model can recapitulate given data well, it is often the case that some of its kinetic parameters can be chosen arbitrarily without significantly affecting the simulation results. This indicates a lack of appropriate data to determine those parameters. In this case, the parameter is said to be practically non-identifiable. Well-identified parameters are paramount for reliable quantitative predictions and, therefore, identifiability analysis is an important topic in modeling of biochemical reaction networks. Here, we describe a hidden feature of the free modeling software COPASI, which can be exploited to easily and quickly conduct a parameter identifiability analysis of differential equation systems by calculating likelihood profiles. The proposed combination of an established method for parameter identifiability analysis with the user-friendly features of COPASI offers easy and rapid access to parameter identifiability analysis even for non-experts.

Availability

COPASI is freely available for academic use at http://www.copasi.org.
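The idea of a likelihood profile can be sketched without any modeling software. In the toy below, which makes no use of COPASI or its interfaces, one parameter is fixed on a grid while the other is re-optimized in closed form, and the resulting sum of squared errors is recorded. A curved profile indicates an identifiable parameter, whereas a flat profile indicates a parameter the data cannot determine. The two algebraic models, the data and all function names are assumptions chosen for brevity; the second model is flat for structural reasons, a simpler cousin of the practical non-identifiability discussed in the abstract.

```python
def profile_slope(b, xs, ys):
    """Profile SSE for the slope b of y = a + b*x, with the intercept a re-optimized."""
    n = len(xs)
    a_hat = sum(y - b * x for x, y in zip(xs, ys)) / n  # best intercept for this b
    return sum((y - (a_hat + b * x)) ** 2 for x, y in zip(xs, ys))


def profile_product(a, xs, ys):
    """Profile SSE for a in y = (a*b)*x, with b re-optimized; only a*b matters."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b_hat = sxy / (a * sxx)  # best b for this fixed a
    return sum((y - a * b_hat * x) ** 2 for x, y in zip(xs, ys))


# Synthetic measurements lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

curved = [profile_slope(b, xs, ys) for b in [1.0, 1.5, 2.0, 2.5, 3.0]]
flat = [profile_product(a, xs, ys) for a in [0.5, 1.0, 2.0, 4.0, 8.0]]

curved_range = max(curved) - min(curved)
flat_range = max(flat) - min(flat)
print(curved_range, flat_range)
```

With Gaussian errors of known variance the profile SSE differs from the profile log-likelihood only by a constant factor and offset, so the shape of the curve carries the same information.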

11.
Model-based (likelihood and Bayesian) and non-model-based (PCA and K-means clustering) methods were developed to identify populations and assign individuals to the identified populations using marker genotype data. Model-based methods are favoured because they are based on a probabilistic model of population genetics with biologically meaningful parameters and thus produce results that are easily interpretable and applicable. Furthermore, they often yield more accurate structure inferences than non-model-based methods. However, current model-based methods either are computationally demanding and thus applicable to small problems only or use simplified admixture models that could yield inaccurate results in difficult situations such as unbalanced sampling. In this study, I propose new likelihood methods for fast and accurate population admixture inference using genotype data from a few multiallelic microsatellites to millions of diallelic SNPs. The methods conduct first a clustering analysis of coarse-grained population structure by using the mixture model and the simulated annealing algorithm, and then an admixture analysis of fine-grained population structure by using the clustering results as a starting point in an expectation maximisation algorithm. Extensive analyses of both simulated and empirical data show that the new methods compare favourably with existing methods in both accuracy and running speed. They can analyse small datasets with just a few multiallelic microsatellites but can also handle in parallel terabytes of data with millions of markers and millions of individuals. In difficult situations such as many and/or lowly differentiated populations, unbalanced or very small samples of individuals, the new methods are substantially more accurate than other methods.
Subject terms: Population genetics, Evolutionary ecology

12.
Carcinogenesis is commonly described as a multistage process, in which stem cells are transformed into cancer cells via a series of mutations. In this article, we consider extensions of the multistage carcinogenesis model by mixture modeling. This approach allows us to describe population heterogeneity in a biologically meaningful way. We focus on finite mixture models, for which we prove identifiability. These models are applied to human lung cancer data from several birth cohorts. Maximum likelihood estimation does not perform well in this application due to the heavy censoring in our data. We thus use analytic graduation instead. Very good fits are achieved for models that combine a small high-risk group with a large group that is quasi-immune.

13.
Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus × Brahman reciprocal full-sib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one QTL per chromosome model. The CIs from the non-selective bootstrap method were largest (87.7 cM average or 79.2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM average). However, CI sizes from the Selective II procedure were more variable than those produced by the two LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.

14.
We present two tests for seasonal trend in monthly incidence data. The first approach uses a penalized likelihood to choose the number of harmonic terms to include in a parametric harmonic model (which includes time trends and autoregression as well as seasonal harmonic terms) and then tests for seasonality using a parametric bootstrap test. The second approach uses a semiparametric regression model to test for seasonal trend. In the semiparametric model, the seasonal pattern is modeled nonparametrically, parametric terms are included for autoregressive effects and a linear time trend, and a parametric bootstrap test is used to test for seasonality. For both procedures, a null distribution is generated under a null Poisson model with time trends and autoregression parameters. We apply the methods to skin melanoma incidence rates collected by the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute, and perform simulation studies to evaluate the type I error rate and power for the two procedures. These simulations suggest that both procedures are alpha-level procedures. In addition, the harmonic model/bootstrap test had similar or larger power than the semiparametric model/bootstrap test for a wide range of alternatives, and the harmonic model/bootstrap test is much easier to implement. Thus, we recommend the harmonic model/bootstrap test for the analysis of seasonal incidence data.

15.
Determining the origin of individuals in mixed population samples is key in many ecological, conservation and management contexts. Genetic data can be analyzed using genetic stock identification (GSI), where the origin of single individuals is determined using Individual Assignment (IA) and population proportions are estimated with Mixed Stock Analysis (MSA). In such analyses, allele frequencies in a reference baseline are required. Unknown individuals or mixture proportions are assigned to source populations based on the likelihood that their multilocus genotypes occur in a particular baseline sample. Representative sampling of populations included in a baseline is important when designing and performing GSI. Here, we investigate the effects of family sampling on GSI, using both simulated and empirical genotypes for Atlantic salmon (Salmo salar). We show that nonrepresentative sampling leading to inclusion of close relatives in a reference baseline may introduce bias in estimated proportions of contributing populations in a mixed sample, and increases the number of incorrectly assigned individual fish. Simulated data further show that the induced bias increases with increasing family structure, but that it can be partly mitigated by increased baseline population sample sizes. Results from standard accuracy tests of GSI (using only a reference baseline and/or self‐assignment) gave a false and elevated indication of the baseline power and accuracy to identify stock proportions and individuals. These findings suggest that family structure in baseline population samples should be quantified and its consequences evaluated, before carrying out GSI.

16.
It is proposed that the orientation of elongate objects, such as bones, may be used to identify the flow direction of ancient river deposits. If true, elongate objects could be of great value when ancient bedforms such as ripples and dunes are not visible. Two sandstone quarries were investigated wherein the paleoflow direction was determined from both bedforms and elongate dinosaur bones. A mixture of two von Mises distributions captures the observation that elongate bones transported under unidirectional flow conditions will align both parallel and perpendicular to the flow direction. Likelihood ratio tests for a mixture of two von Mises distributions are given. The power of these tests is investigated by simulation, because the direction of the dinosaur bones is judged to agree with the primary bedforms when the hypothesis test comparing the dominant mean direction of the bones to the paleoflow direction fails to reject. The likelihood ratio test on the dominant mean direction has reasonable power. If the two mean directions in the mixture distribution are pi apart, a more powerful likelihood ratio test can be used. The likelihood ratio test on the hypothesis that the two mean directions are exactly pi apart is useful in determining if the assumptions of the more powerful test are satisfied.
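To make the model concrete, the sketch below writes out the density and log-likelihood of a two-component von Mises mixture using only the standard library. The parameterization (mixing proportion, two mean directions, two concentrations), the series used for the Bessel function and the example values are assumptions for illustration; the abstract does not specify how the authors parameterized or fitted the model.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order zero, via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (kappa / (2.0 * k)) ** 2  # ratio of consecutive series terms
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_pdf(theta, p, mu1, kappa1, mu2, kappa2):
    """Two-component mixture: weight p on the dominant direction mu1."""
    return p * von_mises_pdf(theta, mu1, kappa1) + (1.0 - p) * von_mises_pdf(theta, mu2, kappa2)


def mixture_loglik(angles, p, mu1, kappa1, mu2, kappa2):
    """Log-likelihood of observed orientations (radians) under the mixture."""
    return sum(math.log(mixture_pdf(t, p, mu1, kappa1, mu2, kappa2)) for t in angles)


# Hypothetical parameters: a dominant direction at 0.5 rad and a second mode pi/2 away.
params = (0.7, 0.5, 3.0, 0.5 + math.pi / 2.0, 2.0)

# Numerical check that the density integrates to one over the circle.
steps = 2000
width = 2.0 * math.pi / steps
total_mass = sum(mixture_pdf(i * width, *params) for i in range(steps)) * width
print(round(total_mass, 6))
```

A likelihood ratio test of a hypothesized dominant direction would compare `mixture_loglik` maximized with `mu1` free against the value obtained with `mu1` fixed at the paleoflow direction; the maximization step is omitted here.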

17.
We have examined the statistical requirements for the detection of mixtures of two lognormal distributions in doubly truncated data when the sample size is large. The expectation-maximization algorithm was used for parameter estimation. A bootstrap approach was used to test for a mixture of distributions using the likelihood ratio statistic. Analysis of computer simulated mixtures showed that as the ratio of the difference between the means to the minimum standard deviation increases, the power for detection also increases and the accuracy of parameter estimates improves. These procedures were used to examine the distribution of red blood cell volume in blood samples. Each distribution was doubly truncated to eliminate artifactual frequency counts and tested for best fit to a single lognormal distribution or a mixture of two lognormal distributions. A single population was found in samples obtained from 60 healthy individuals. Two subpopulations of cells were detected in 25 of 27 mixtures of blood prepared in vitro. Analyses of mixtures of blood from 40 patients treated for iron-deficiency anemia showed that subpopulations could be detected in all by 6 weeks after onset of treatment. To determine if two-component mixtures could be detected, distributions were examined from untransfused patients with refractory anemia. In two patients with inherited sideroblastic anemia a mixture of microcytic and normocytic cells was found, while in the third patient a single population of microcytic cells was identified. In two family members previously identified as carriers of inherited sideroblastic anemia, mixtures of microcytic and normocytic subpopulations were found. Twenty-five patients with acquired myelodysplastic anemia were examined. A good fit to a mixture of subpopulations containing abnormal microcytic or macrocytic cells was found in two. We have demonstrated that with large sample sizes, mixtures of distributions can be detected even when distributions appear to be unimodal. These statistical techniques provide a means to characterize and quantify alterations in erythrocyte subpopulations in anemia but could also be applied to any set of grouped, doubly truncated data to test for the presence of a mixture of two lognormal distributions.
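Double truncation changes the likelihood in a simple way: the mixture density is renormalized by the probability mass that falls between the two cut-offs. The sketch below shows that renormalization for a two-component lognormal mixture. The cut-offs and parameter values are invented, loosely on the scale of cell volumes, and the grouping of data into frequency counts mentioned in the abstract is not handled.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Lognormal density; mu and sigma are the mean and sd of log(x)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, mu, sigma):
    """Lognormal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def truncated_mixture_pdf(x, p, mu1, sigma1, mu2, sigma2, lower, upper):
    """Density of a two-lognormal mixture restricted to the interval [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    density = p * lognormal_pdf(x, mu1, sigma1) + (1.0 - p) * lognormal_pdf(x, mu2, sigma2)
    # Probability that an untruncated observation lands inside the window.
    mass = (p * (lognormal_cdf(upper, mu1, sigma1) - lognormal_cdf(lower, mu1, sigma1))
            + (1.0 - p) * (lognormal_cdf(upper, mu2, sigma2) - lognormal_cdf(lower, mu2, sigma2)))
    return density / mass


# Hypothetical parameters: two subpopulations centred near 70 and 95 units.
args = (0.6, math.log(70.0), 0.1, math.log(95.0), 0.1, 50.0, 130.0)

# Midpoint-rule check that the truncated density integrates to one over the window.
steps = 4000
width = (130.0 - 50.0) / steps
area = sum(truncated_mixture_pdf(50.0 + (i + 0.5) * width, *args) for i in range(steps)) * width
print(round(area, 6))
```

An EM or direct maximization routine would use the logarithm of this density as each observation's contribution to the likelihood.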

18.
Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169–SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.

19.
Phylogenetic inference and the evaluation of support for inferred relationships are at the core of many studies testing evolutionary hypotheses. Despite the popularity of nonparametric bootstrap frequencies and Bayesian posterior probabilities, the interpretation of these measures of tree branch support remains a source of discussion. Furthermore, both methods are computationally expensive and become prohibitive for large data sets. Recent fast approximate likelihood-based measures of branch supports (approximate likelihood ratio test [aLRT] and Shimodaira-Hasegawa [SH]-aLRT) provide a compelling alternative to these slower conventional methods, offering not only speed advantages but also excellent levels of accuracy and power. Here we propose an additional method: a Bayesian-like transformation of aLRT (aBayes). Considering both probabilistic and frequentist frameworks, we compare the performance of the three fast likelihood-based methods with the standard bootstrap (SBS), the Bayesian approach, and the recently introduced rapid bootstrap. Our simulations and real data analyses show that with moderate model violations, all tests are sufficiently accurate, but aLRT and aBayes offer the highest statistical power and are very fast. With severe model violations aLRT, aBayes and Bayesian posteriors can produce elevated false-positive rates. With data sets for which such violation can be detected, we recommend using SH-aLRT, the nonparametric version of aLRT based on a procedure similar to the Shimodaira-Hasegawa tree selection. In general, the SBS seems to be excessively conservative and is much slower than our approximate likelihood-based methods.

20.
Markatou M 《Biometrics》2000,56(2):483-486
Problems associated with the analysis of data from a mixture of distributions include the presence of outliers in the sample, the fact that a component may not be well represented in the data, and the problem of biases that occur when the model is slightly misspecified. We study the performance of weighted likelihood in this context. The method produces estimates with low bias and mean squared error, and it is useful in that it unearths data substructures in the form of multiple roots. This in turn indicates multiple potential mixture model fits due to the presence of more components than originally specified in the model. To compute the weighted likelihood estimates, we use as starting values the method-of-moments estimates computed on bootstrap subsamples drawn from the data. We address a number of important practical issues involving bootstrap sample size selection, the role of starting values, and the behavior of the roots. The algorithm used to compute the weighted likelihood estimates is competitive with EM, and it is similar to EM when the components are not well separated. Moreover, we propose a new statistical stopping rule for the termination of the algorithm. An example and a small simulation study illustrate the above points.
