
1.

An EM algorithm for mapping quantitative resistance loci

Xu C Zhang YM Xu S 《Heredity》2005,94(1):119-128

Many disease resistance traits in plants have a polygenic background and the disease phenotypes are modified by environmental factors. As a consequence, the phenotypic values usually show a quantitative variation. The phenotypes of such disease traits, however, are often measured in discrete but ordered categories. These traits are called ordinal traits. In terms of disease resistance, they are called quantitative resistance traits, as opposed to qualitative resistance traits, and are controlled by the quantitative resistance loci (QRL). Classical quantitative trait locus mapping methods are not optimal for ordinal trait analysis because the assumption of normal distribution is violated. Methods for mapping binary trait loci are not suitable either because there are more than two categories in ordinal traits. We developed a maximum likelihood method to map these QRL. The method is implemented via a multicycle expectation-conditional-maximization (ECM) algorithm under the threshold model, where we can estimate both the QRL effects and the thresholds that link the disease liability and the categorical phenotype. The method is verified in simulated data under various combinations of the parameters. An SAS program is available to implement the multicycle ECM algorithm. The program can be downloaded from our website at www.statgen.ucr.edu.
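The threshold model mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' SAS program or their ECM algorithm; it only shows how fixed thresholds on a normally distributed liability turn into probabilities for ordered categories. The function name, the unit residual standard deviation, and the example thresholds are assumptions made for illustration.

```python
from statistics import NormalDist


def category_probabilities(mean_liability, thresholds, sd=1.0):
    """Probability of each ordered category under a threshold model.

    The latent liability is assumed normal with the given mean and sd
    (an illustrative assumption). `thresholds` must be sorted ascending;
    k categories require k - 1 thresholds.
    """
    dist = NormalDist(mu=mean_liability, sigma=sd)
    # Cumulative probability of falling below each threshold.
    cuts = [dist.cdf(t) for t in thresholds]
    bounds = [0.0] + cuts + [1.0]
    # Each category is the probability mass between adjacent cut points.
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical example: three disease categories, two thresholds.
probs = category_probabilities(mean_liability=0.0, thresholds=[-1.0, 1.0])
```

In a mapping setting the mean liability would depend on the QRL genotype, so a genotype effect shifts probability mass between categories while the thresholds stay fixed.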

2.

A likelihood approach for mapping growth trajectories using dominant markers in a phase-unknown full-sib family

Ma CX Lin M Littell RC Yin T Wu R 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2004,108(4):699-705

Dominant markers have been commonly used in mapping quantitative trait loci (QTLs) in outcrossing species, for which little prior genome information is available. But the dominant nature of these markers may lead to reduced QTL mapping precision and power. A new statistical method is proposed to incorporate growth laws into a QTL mapping framework, under which the efficiency of dominant markers can be increased. This new method can be used to identify specific QTLs affecting differentiation in growth trajectories, and further to estimate the time at which a QTL turns on or off in affecting growth during the entire ontogeny of a species. Using this method based on dominant markers, we have successfully mapped a QTL for stem height growth trajectories to a linkage group in a forest tree. The implications of this method for the understanding of the genetic architecture of growth using dominant markers are discussed. Communicated by F. Salamini

3.

The EM algorithm for maximum likelihood estimation in the mover-stayer model

C Fuchs J B Greenhouse 《Biometrics》1988,44(2):605-613

The discrete-time mover-stayer model (Blumen, Kogan, and McCarthy, 1955, The Industrial Mobility of Labor as a Probability Process, Ithaca, New York: Cornell University Press) is a useful model for studying changes over time in heterogeneous populations. Using the EM algorithm, we present an alternative method for obtaining maximum likelihood estimates of the parameters of the mover-stayer model, and consider an extension of the basic model to the problem of incomplete follow-up in panel studies. The models and the methods are illustrated with data from a community-based survey of changes in mental health status over a 1-year period.

4.

Use of the EM algorithm for maximum likelihood estimation in electron microscope autoradiography

AYKROYD R. G.; ANDERSON C. W. 《Biometrika》1994,81(1):41-52


5.

EM chromosome mapping using surface-spread polytene chromosomes

W.-E. Kalisch 《Genetica》1982,60(1):21-24

Electron micrographs as well as light micrographs of individual surface-spread polytene (SSP) chromosomes indicate more detailed banding patterns than standard squash preparations do. For EM preparations of SSP chromosomes a simple technique is described, avoiding thin-sectioning of chromosomes.

6.

A gene selection algorithm based on the gene regulation probability using maximal likelihood estimation

Wang HQ Huang DS 《Biotechnology letters》2005,27(8):597-603

A novel gene selection algorithm based on the gene regulation probability is proposed. In this algorithm, a probabilistic model is established to estimate gene regulation probabilities using the maximum likelihood estimation method, and these probabilities are then used to select key genes related to class distinction. The application to the leukemia data set suggests that the defined gene regulation probability can identify the key genes for the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) class distinction and that the result of our proposed algorithm is competitive with those of the previous algorithms.

7.

Finding maximum likelihood estimates of patterned covariance matrices by the EM algorithm (Total citations: 1; self-citations: 0; citations by others: 1)

RUBIN DONALD B.; SZATROWSKI TED H. 《Biometrika》1982,69(3):657-660


8.

A genomewide single-nucleotide-polymorphism panel for Mexican American admixture mapping

Tian C Hinds DA Shigeta R Adler SG Lee A Pahl MV Silva G Belmont JW Hanson RL Knowler WC Gregersen PK Ballinger DG Seldin MF 《American journal of human genetics》2007,80(6):1014-1023

For admixture mapping studies in Mexican Americans (MAM), we define a genomewide single-nucleotide-polymorphism (SNP) panel that can distinguish between chromosomal segments of Amerindian (AMI) or European (EUR) ancestry. These studies used genotypes for >400,000 SNPs, defined in EUR and both Pima and Mayan AMI, to define a set of ancestry-informative markers (AIMs). The use of two AMI populations was necessary to remove a subset of SNPs that distinguished genotypes of only one AMI subgroup from EUR genotypes. The AIMs set contained 8,144 SNPs separated by a minimum of 50 kb with only three intermarker intervals >1 Mb and had EUR/AMI FST values >0.30 (mean FST = 0.48) and Mayan/Pima FST values <0.05 (mean FST < 0.01). Analysis of a subset of these SNP AIMs suggested that this panel may also distinguish ancestry between EUR and other disparate AMI groups, including Quechuan from South America. We show, using realistic simulation parameters that are based on our analyses of MAM genotyping results, that this panel of SNP AIMs provides good power for detecting disease-associated chromosomal segments for genes with modest ethnicity risk ratios. A reduced set of 5,287 SNP AIMs captured almost the same admixture mapping information, but smaller SNP sets showed substantial drop-off in admixture mapping information and power. The results will enable studies of type 2 diabetes, rheumatoid arthritis, and other diseases among which epidemiological studies suggest differences in the distribution of ancestry-associated susceptibility.

9.

A hybrid likelihood algorithm for risk modelling

A. M. Kellerer M. Kreisheimer D. Chmelevsky D. Barclay 《Radiation and environmental biophysics》1995,34(1):13-20

The risk of radiation-induced cancer is assessed through the follow-up of large cohorts, such as atomic bomb survivors or underground miners who have been occupationally exposed to radon and its decay products. The models relate to the dose, age and time dependence of the excess tumour rates, and they contain parameters that are estimated in terms of maximum likelihood computations. The computations are performed with the software package EPICURE, which contains the two main options of person-by-person regression or of Poisson regression with grouped data. The Poisson regression is most frequently employed, but there are certain models that require an excessive number of cells when grouped data are used. One example involves computations that account explicitly for the temporal distribution of continuous exposures, as they occur with underground miners. In past work such models had to be approximated, but it is shown here that they can be treated explicitly in a suitably reformulated person-by-person computation of the likelihood. The algorithm uses the familiar partitioning of the log-likelihood into two terms, L₁ and L₀. The first term, L₁, represents the contribution of the events (tumours). It needs to be evaluated in the usual way, but constitutes no computational problem. The second term, L₀, represents the event-free periods of observation. It is, in its usual form, unmanageable for large cohorts. However, it can be reduced to a simple form, in which the number of computational steps is independent of cohort size. The method requires less computing time and computer memory, but more importantly it leads to more stable numerical results by obviating the need for grouping the data. The algorithm may be most relevant to radiation risk modelling, but it can facilitate the modelling of failure-time data in general.

10.

An improved procedure of mapping a quantitative trait locus via the EM algorithm using posterior probabilities

Saurabh Ghosh Partha P. Majumder 《Journal of genetics》2000,79(2):47-53

Mapping a locus controlling a quantitative genetic trait (e.g. blood pressure) to a specific genomic region is of considerable contemporary interest. Data on the quantitative trait under consideration and several codominant genetic markers with known genomic locations are collected from members of families and statistically analysed to estimate the recombination fraction, θ, between the putative quantitative trait locus and a genetic marker. One of the major complications in estimating θ for a quantitative trait in humans is the lack of haplotype information on members of families. We have devised a computationally simple two-stage method of estimation of θ in the absence of haplotypic information using the expectation-maximization (EM) algorithm. In the first stage, parameters of the quantitative trait locus (QTL) are estimated on the basis of data of a sample of unrelated individuals, and Bayes' rule is used to classify each parent into a QTL genotypic class. In the second stage, we have proposed an EM algorithm for obtaining the maximum-likelihood estimate of θ based on data of informative families (which are identified upon inferring parental QTL genotypes in the first stage). The purpose of this paper is to investigate whether, instead of using genotypically ‘classified’ data of parents, the use of posterior probabilities of QTL genotypes of parents at the second stage yields better estimators. We show, using simulated data, that the proposed procedure using posterior probabilities is statistically more efficient than our earlier classification procedure, although it is computationally heavier.
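The difference between "classified" genotypes and posterior probabilities can be sketched with Bayes' rule. This is a minimal illustration, not the authors' procedure: it assumes the trait is normally distributed within each QTL genotype class with a common standard deviation, and the function names, priors, and class means are hypothetical.

```python
from statistics import NormalDist


def genotype_posteriors(trait_value, priors, means, sd):
    """Posterior probability of each QTL genotype class given a trait value.

    Assumes (for illustration only) a normal trait distribution within
    each genotype class, with class-specific means and a shared sd.
    """
    weighted = [
        prior * NormalDist(mu=mean, sigma=sd).pdf(trait_value)
        for prior, mean in zip(priors, means)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def classify(posteriors):
    """Hard classification: index of the most probable genotype class."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


# Hypothetical parameters for three genotype classes.
post = genotype_posteriors(0.4, priors=[0.25, 0.5, 0.25],
                           means=[-1.0, 0.0, 1.0], sd=1.0)
hard_call = classify(post)
```

The hard call keeps only the index of the largest posterior, whereas carrying the full `post` vector into a later stage retains the uncertainty about each parent's genotype.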

11.

A structural EM algorithm for phylogenetic inference.

Nir Friedman Matan Ninio Itsik Pe'er Tal Pupko 《Journal of computational biology》2002,9(2):331-353

A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

12.

A genomewide admixture mapping panel for Hispanic/Latino populations (Total citations: 4; self-citations: 1; citations by others: 4)

Mao X Bigham AW Mei R Gutierrez G Weiss KM Brutsaert TD Leon-Velarde F Moore LG Vargas E McKeigue PM Shriver MD Parra EJ 《American journal of human genetics》2007,80(6):1171-1178

Admixture mapping (AM) is a promising method for the identification of genetic risk factors for complex traits and diseases showing prevalence differences among populations. Efficient application of this method requires the use of a genomewide panel of ancestry-informative markers (AIMs) to infer the population of origin of chromosomal regions in admixed individuals. Genomewide AM panels with markers showing high frequency differences between West African and European populations are already available for disease-gene discovery in African Americans. However, no such map is yet available for Hispanic/Latino populations, which are the result of two-way admixture between Native American and European populations or of three-way admixture of Native American, European, and West African populations. Here, we report a genomewide AM panel with 2,120 AIMs showing high frequency differences between Native American and European populations. The average intermarker genetic distance is ~1.7 cM. The panel was identified by genotyping, with the Affymetrix GeneChip Human Mapping 500K array, a population sample with European ancestry, a Mesoamerican sample comprising Maya and Nahua from Mexico, and a South American sample comprising Aymara/Quechua from Bolivia and Quechua from Peru. The main criteria for marker selection were both high information content for Native American/European ancestry (measured as the standardized variance of the allele frequencies, also known as "f value") and small frequency differences between the Mesoamerican and South American samples. This genomewide AM panel will make it possible to apply AM approaches in many admixed populations throughout the Americas.

13.

A likelihood approach for functional response models

Toshinori Okuyama 《Biological Control》2012,60(2):103-107

Functional response is an important determinant of community dynamics, and thus empirical methods for characterizing functional responses are equally important for understanding ecological processes. The most commonly used method is based on the sum of squares, and the maximum likelihood method is rarely used. When the likelihood method is used, potentially inappropriate probability distributions such as binomial distributions are typically assumed for the number of prey eaten in experiments. In this study, I present a likelihood approach in which the probability distributions are generated by mechanistic understanding of predation processes using Monte Carlo simulations. An example is given on the Holling type II functional response model, but the method is flexible and allows characterization of a wide variety of functional response models. In the example, the likelihood method consistently gave better estimates than the least squares method.
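For readers unfamiliar with the model named in this abstract, the Holling type II curve and the sum-of-squares criterion it is usually fitted with can be sketched as follows. This shows only the conventional least squares objective that the abstract compares against, not the author's simulation-based likelihood. The parameter names (`attack_rate`, `handling_time`, `total_time`) and the data layout are assumptions for illustration.

```python
def holling_type2(prey_density, attack_rate, handling_time, total_time=1.0):
    """Expected number of prey eaten under the Holling type II disc equation.

    N_e = a * N * T / (1 + a * h * N), which saturates at T / h as
    prey density N grows.
    """
    a, h, n = attack_rate, handling_time, prey_density
    return a * n * total_time / (1.0 + a * h * n)


def sum_of_squares(observations, attack_rate, handling_time, total_time=1.0):
    """Least squares objective for (prey_density, prey_eaten) pairs."""
    return sum(
        (eaten - holling_type2(density, attack_rate, handling_time,
                               total_time)) ** 2
        for density, eaten in observations
    )


# Hypothetical observations: (initial prey density, number eaten).
data = [(5, 2.1), (10, 3.0), (20, 5.2), (40, 6.4)]
objective = sum_of_squares(data, attack_rate=0.5, handling_time=0.1)
```

Minimizing `sum_of_squares` over the two parameters gives the least squares fit; a likelihood approach would replace this objective with the log-probability of the observed counts under an assumed or simulated distribution.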

14.

A comparison of physical mapping algorithms based on the maximum likelihood model

Huang J Bhandarkar SM 《Bioinformatics (Oxford, England)》2003,19(11):1303-1310

MOTIVATION: Physical mapping of chromosomes using the maximum likelihood (ML) model is a problem of high computational complexity entailing both discrete optimization to recover the optimal probe order as well as continuous optimization to recover the optimal inter-probe spacings. In this paper, two versions of the genetic algorithm (GA) are proposed, one with heuristic crossover and deterministic replacement and the other with heuristic crossover and stochastic replacement, for the physical mapping problem under the maximum likelihood model. The genetic algorithms are compared with two other discrete optimization approaches, namely simulated annealing (SA) and large-step Markov chains (LSMC), in terms of solution quality and runtime efficiency. RESULTS: The physical mapping algorithms based on the GA, SA and LSMC have been tested using synthetic datasets and real datasets derived from cosmid libraries of the fungus Neurospora crassa. The GA, especially the version with heuristic crossover and stochastic replacement, is shown to consistently outperform the SA-based and LSMC-based physical mapping algorithms in terms of runtime and final solution quality. Experimental results on real datasets and simulated datasets are presented. Further improvements to the GA in the context of physical mapping under the maximum likelihood model are proposed. AVAILABILITY: The software is available upon request from the first author.

15.

A nearest-neighboring-end algorithm for genetic mapping

Crane CF Crane YM 《Bioinformatics (Oxford, England)》2005,21(8):1579-1591

MOTIVATION: High-throughput methods are beginning to make possible the genotyping of thousands of loci in thousands of individuals, which could be useful for tightly associating phenotypes to candidate loci. Current mapping algorithms cannot handle so many data without building hierarchies of framework maps. RESULTS: A version of Kruskal's minimum spanning tree algorithm can solve any genetic mapping problem that can be stated as marker deletion from a set of linkage groups. These include backcross, recombinant inbred, haploid and double-cross recombinational populations, in addition to conventional deletion and radiation hybrid populations. The algorithm progressively joins linkage groups at increasing recombination fractions between terminal markers, and attempts to recognize and correct erroneous joins at peaks in recombination fraction. The algorithm is O(mn³) for m individuals and n markers, but the mean run time scales close to mn². It is amenable to parallel processing and has recovered true map order in simulations of large backcross, recombinant inbred and deletion populations with up to 37,005 markers. Simulations were used to investigate map accuracy in response to population size, allelic dominance, segregation distortion, missing data and random typing errors. It produced accurate maps when marker distribution was sufficiently uniform, although segregation distortion could induce translocated marker orders. The algorithm was also used to map 1003 loci in the F7 ITMI population of bread wheat, Triticum aestivum L. emend Thell., where it shortened an existing standard map by 16%, but it failed to associate blocks of markers properly across gaps within linkage groups. This was because it depends upon the rankings of recombination fractions at individual markers, and is susceptible to sampling error, typing error and joint selection involving the terminal markers of nearly finished linkage groups. Therefore, the current form of the algorithm is useful mainly to improve local marker ordering in linkage groups obtained in other ways. AVAILABILITY: The source code and supplemental data are available at http://www.iubio.bio.indiana.edu/soft/molbio/qtl/flipper/ CONTACT: ccrane@purdue.edu.

16.

Methods for high-density admixture mapping of disease genes (Total citations: 26; self-citations: 0; citations by others: 26)

Patterson N Hattangadi N Lane B Lohmueller KE Hafler DA Oksenberg JR Hauser SL Smith MW O'Brien SJ Altshuler D Daly MJ Reich D 《American journal of human genetics》2004,74(5):979-1000

Admixture mapping (also known as mapping by admixture linkage disequilibrium, or MALD) has been proposed as an efficient approach to localizing disease-causing variants that differ in frequency (because of either drift or selection) between two historically separated populations. Near a disease gene, patient populations descended from the recent mixing of two or more ethnic groups should have an increased probability of inheriting the alleles derived from the ethnic group that carries more disease-susceptibility alleles. The central attraction of admixture mapping is that, since gene flow has occurred recently in modern populations (e.g., in African and Hispanic Americans in the past 20 generations), it is expected that admixture-generated linkage disequilibrium should extend for many centimorgans. High-resolution marker sets are now becoming available to test this approach, but progress will require (a) computational methods to infer ancestral origin at each point in the genome and (b) empirical characterization of the general properties of linkage disequilibrium due to admixture. Here we describe statistical methods to estimate the ancestral origin of a locus on the basis of the composite genotypes of linked markers, and we show that this approach accurately estimates states of ancestral origin along the genome. We apply this approach to show that strong admixture linkage disequilibrium extends, on average, for 17 cM in African Americans. Finally, we present power calculations under varying models of disease risk, sample size, and proportions of ancestry. Studying approximately 2500 markers in approximately 2500 patients should provide power to detect many regions contributing to common disease. A particularly important result is that the power of an admixture mapping study to detect a locus will be nearly the same for a wide range of mixture scenarios: the mixture proportion should be 10%-90% from both ancestral populations.

17.

Semiparametric estimation of random effects using the Cox model based on the EM algorithm.

J P Klein 《Biometrics》1992,48(3):795-806

Consider a survival experiment where individuals within a certain subset of the population share a common, unobservable, random frailty. Such a frailty could be an unobservable genetic or early environmental effect if individuals were in sibling groups or an environmental effect if individuals were grouped by households. Suppose that if the frailty, omega, is known, the Cox proportional hazards model for the observable covariates is valid with the consequence of the random effect being a multiplicative factor on the hazard rate. Assuming that the random frailties follow a gamma distribution, estimates of the fixed and random effects are obtained by using an EM algorithm based on a profile likelihood construction. The method developed is applied to the Framingham Heart Study to examine the risks of smoking and cholesterol levels, adjusting for potential random effects.

18.

A simulated annealing algorithm for maximum likelihood pedigree reconstruction

Almudevar A 《Theoretical population biology》2003,63(2):63-75

The calculation of maximum likelihood pedigrees for related organisms using genotypic data is considered. The problem is formulated so that the domain of optimization is a permutation space. This is a feature shared by the travelling salesman problem, for which simulated annealing is known to be effective. Using this technique it is found that pedigrees can be reconstructed with minimal error using genotypic data of a quality currently realizable. In complex pedigrees accurate reconstruction can be done with no a priori age or sex information. For smaller numbers of individuals a method of efficiently enumerating all admissible pedigrees of nonzero likelihood is given.

19.

Analytical correction for multiple testing in admixture mapping

Sha Q Zhang X Zhu X Zhang S 《Human heredity》2006,62(2):55-63

Admixture mapping, using unrelated individuals from the admixture populations that result from recent mating between members of each parental population, is an efficient approach to localize disease-causing variants that differ in frequency between two or more historically separated populations. Recently, several methods have been proposed to test linkage between a susceptibility gene and a disease locus by using admixture-generated linkage disequilibrium (LD) for each of the genotyped markers. In a genome scan, admixture mapping usually tests 2,000 to 3,000 markers across the genome. Currently, either a very conservative Sidak (or Bonferroni) correction or a very time-consuming simulation-based method is used to correct for the multiple tests and evaluate the overall p value. In this report, we propose a computationally efficient analytical approach for correction of the multiple tests and for calculating the overall p value for an admixture genome scan. Except for the Sidak (or Bonferroni) correction, our proposed method is the first analytical approach for correction of the multiple tests and for calculating the overall p value for a genome scan. Our simulation studies show that the proposed method gives correct overall type I error rates for genome scans in all cases, and is much more computationally efficient than simulation-based methods.
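The two baseline corrections named in this abstract are simple enough to write down. The sketch below computes the per-marker significance threshold under each; it does not reproduce the authors' analytical method. The function names and the choice of 2,000 markers with an overall level of 0.05 are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test significance level under the Sidak correction.

    Exact for independent tests: solves 1 - (1 - t)**n_tests = alpha for t.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Hypothetical genome scan with 2,000 markers and overall level 0.05.
n_markers = 2000
per_test_bonferroni = bonferroni_threshold(0.05, n_markers)
per_test_sidak = sidak_threshold(0.05, n_markers)
```

Both thresholds treat the tests as if they were independent or worst-case dependent. Markers in an admixture scan are strongly correlated along the genome, which is why these corrections are described as very conservative.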

20.

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Louis Alberto García-Cortés Daniel Sorensen 《Genetics Selection Evolution》2001,33(4):443-452

Two methods of computing Monte Carlo estimators of variance components using restricted maximum likelihood via the expectation-maximisation algorithm are reviewed. A third approach is suggested and the performance of the methods is compared using simulated data.