Similar literature
20 similar articles found
1.
The relative contribution of two parental populations to a hybrid group (the admixture proportions) can be estimated using not only the frequencies of different alleles, but also the degree of molecular divergence between them. In this paper, we extend this possibility to the case of any number of parental populations. The newly derived multiparental estimator is tested by Monte Carlo simulations and by generating artificial hybrid groups by pooling mtDNA samples from human populations. The general properties (including the variance) of the two-parental estimator seem to be retained by the multiparental estimator. When mixed human populations are considered and hypervariable single-locus data are analyzed (mtDNA control region), errors in the estimated contributions appear reasonably low only when highly differentiated parental populations are involved. Finally, the method applied to the hybrid Canary Island population points to a much lower female contribution from Spain than has previously been estimated.

2.
Maximum-likelihood estimation of admixture proportions from genetic data   Total citations: 9 (self-citations: 0; citations by others: 9)
Wang J 《Genetics》2003,164(2):747-765
For an admixed population, an important question is how much genetic contribution comes from each parental population. Several methods have been developed to estimate such admixture proportions, using data on genetic markers sampled from parental and admixed populations. In this study, I propose a likelihood method to estimate jointly the admixture proportions, the genetic drift that occurred in the admixed population and each parental population during the period between the hybridization and sampling events, and the genetic drift in each ancestral population within the interval between their split and hybridization. The results from extensive simulations using various combinations of relevant parameter values show that in general much more accurate and precise estimates of admixture proportions are obtained from the likelihood method than from previous methods. The likelihood method also yields reasonable estimates of the genetic drift that occurred in each population, which translate into relative effective sizes (N(e)) or absolute average N(e)'s if the times when the relevant events (such as population split, admixture, and sampling) occurred are known. The proposed likelihood method also has features such as a relatively low computational requirement compared with previous ones and flexibility in admixture models and marker types. In particular, it allows for missing data from a contributing parental population. The method is applied to a human data set and a wolflike canids data set, and the results obtained are discussed in comparison with those from other estimators and from previous studies.

3.
Rubin BE  Ree RH  Moreau CS 《PloS one》2012,7(4):e33394
Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD)--the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). 
RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.

4.
5.
This paper introduces a likelihood method of estimating ethnic admixture that uses individuals, pedigrees, or a combination of individuals and pedigrees. For each founder of a pedigree, admixture proportions are calculated by conditioning on the pedigree-wide genotypes at all ancestry-informative markers. These estimates are then propagated down the pedigree to the nonfounders by a simple averaging process. The large-sample standard errors of the founders' proportions can be similarly transformed into standard errors for the admixture proportions of the descendants. These standard errors are smaller than the corresponding standard errors when each individual is treated independently. Both hard and soft information on a founder's ancestry can be accommodated in this scheme, which has been implemented in the genetic software package Mendel. The utility of the method is demonstrated on simulated data and a real data example involving Mexican families of mixed Amerindian and Spanish ancestry.
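
The propagation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each nonfounder's admixture vector is the plain average of its two parents' vectors; the function name `propagate_admixture` and the dictionary-based pedigree encoding are hypothetical and are not taken from Mendel or from the paper.

```python
from typing import Dict, List, Tuple

Admixture = List[float]


def propagate_admixture(
    founders: Dict[str, Admixture],
    parents: Dict[str, Tuple[str, str]],
) -> Dict[str, Admixture]:
    """Give each nonfounder the average of its parents' admixture proportions.

    founders: admixture vector for every founder (assumed already estimated).
    parents:  maps each nonfounder to its (father, mother) identifiers.
    """
    result = dict(founders)

    def resolve(person: str) -> Admixture:
        # Founders are already present; nonfounders are computed on demand
        # from their parents, so generations can be listed in any order.
        if person not in result:
            father, mother = parents[person]
            f, m = resolve(father), resolve(mother)
            result[person] = [(a + b) / 2.0 for a, b in zip(f, m)]
        return result[person]

    for person in parents:
        resolve(person)
    return result
```

Under this assumption, a child of one fully Amerindian and one fully Spanish founder would receive proportions of one half each, and a grandchild inherits the average of whatever its own parents carry.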

6.
7.
Jiang R  Marjoram P  Borevitz JO  Tavaré S 《Genetics》2006,173(4):2257-2267
This article is concerned with a statistical modeling procedure to call single-feature polymorphisms from microarray experiments. We use this new type of polymorphism data to estimate the mutation and recombination parameters in a population. The mutation parameter can be estimated via the number of single-feature polymorphisms called in the sample. For the recombination parameter, a two-feature sampling distribution is derived in a way analogous to that for the two-locus sampling distribution with SNP data. The approximate-likelihood approach using the two-feature sampling distribution is examined and found to work well. A coalescent simulation study is used to investigate the accuracy and robustness of our method. Our approach allows the utilization of single-feature polymorphism data for inference in population genetics.

8.
Inferring genetic regulatory logic from expression data   Total citations: 1 (self-citations: 0; citations by others: 1)
MOTIVATION: High-throughput molecular genetics methods allow the collection of data about the expression of genes at different time points and under different conditions. The challenge is to infer gene regulatory interactions from these data and to get an insight into the mechanisms of genetic regulation. RESULTS: We propose a model for genetic regulatory interactions, which has a biologically motivated Boolean logic semantics, but is of a probabilistic nature, and is hence able to confront noisy biological processes and data. We propose a method for learning the model from data based on the Bayesian approach and utilizing Gibbs sampling. We tested our method with previously published data of the Saccharomyces cerevisiae cell cycle and found relations between genes consistent with biological knowledge.

9.
10.
11.
12.
Using DNA sequence data from pathogens to infer transmission networks has traditionally been done in the context of epidemics and outbreaks. Sequence data could analogously be applied to cases of ubiquitous commensal bacteria; however, instead of inferring chains of transmission to track the spread of a pathogen, sequence data for bacteria circulating in an endemic equilibrium could be used to infer information about host contact networks. Here, we show--using simulated data--that multilocus DNA sequence data, based on multilocus sequence typing schemes (MLST), from isolates of commensal bacteria can be used to infer both local and global properties of the contact networks of the populations being sampled. Specifically, for MLST data simulated from small-world networks, the small world parameter controlling the degree of structure in the contact network can robustly be estimated. Moreover, we show that pairwise distances in the network--degrees of separation--correlate with genetic distances between isolates, so that how far apart two individuals in the network are can be inferred from MLST analysis of their commensal bacteria. This result has important consequences, and we show an example from epidemiology: how this result could be used to test for infectious origins of diseases of unknown etiology.

13.
Many important cellular protein interactions are mediated by peptide recognition domains. The ability to predict a domain's binding specificity directly from its primary sequence is essential to understanding the complexity of protein-protein interaction networks. One such recognition domain is the PDZ domain, functioning in scaffold proteins that facilitate formation of signaling networks. Predicting the PDZ domain's binding specificity was a part of the DREAM4 Peptide Recognition Domain challenge, the goal of which was to describe, as position weight matrices, the specificity profiles of five multi-mutant ERBB2IP-1 domains. We developed a method that derives multi-mutant binding preferences by generalizing the effects of single point mutations on the wild type domain's binding specificities. Our approach, trained on publicly available ERBB2IP-1 single-mutant phage display data, combined linear regression-based prediction for ligand positions whose specificity is determined by few PDZ positions, and single-mutant position weight matrix averaging for all other ligand columns. The success of our method as the winning entry of the DREAM4 competition, as well as its superior performance over a general PDZ-ligand binding model, demonstrates the advantages of training a model on a well-selected domain-specific data set.

14.
15.
Within a community, the abundance of any given species depends in large part on a network of direct and indirect, positive and negative interactions with other species, including shared enemies. In communities where experimental manipulations are often impossible (e.g., parasite communities), census data can be used to evaluate the strength or frequency of positive and negative associations among species. In ectoparasite communities, competitive associations can arise because of limited space or food, but facilitative associations can also exist if one species suppresses host immune defenses. In addition, positive associations among parasites could arise merely due to shared preferences for the same host, without any interaction going on. We used census data from 28 regional surveys of gamasid mites parasitic on small mammals throughout the Palaearctic, to assess how the abundance of individual mite species is influenced by the abundance and diversity of other mite species on the same host. After controlling for several confounding variables, the abundance of individual mite species was generally positively correlated with the combined abundances of all other mite species in the community. This trend was confirmed by meta-analysis of the results obtained for separate mite species. In contrast, there were generally no consistent relationships between the abundance of individual mite species and either the species richness or taxonomic diversity of the community in which they occur. These patterns were independent of mite feeding mode. Our results indicate either that synergistic facilitative interactions among mites increase the host’s susceptibility to further attacks (e.g., via immunosuppression) and lead to different species all having increased abundance on the same host, or that certain characteristics make some host species preferred habitats for many parasite species. 

16.
ABSTRACT: BACKGROUND: Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries. RESULTS: Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method, SupportMix, to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. CONCLUSIONS: By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought.

17.
We analyzed the European genetic contribution to 10 populations of African descent in the United States (Maywood, Illinois; Detroit; New York; Philadelphia; Pittsburgh; Baltimore; Charleston, South Carolina; New Orleans; and Houston) and in Jamaica, using nine autosomal DNA markers. These markers either are population-specific or show frequency differences >45% between the parental populations and are thus especially informative for admixture. European genetic ancestry ranged from 6.8% (Jamaica) to 22.5% (New Orleans). The unique utility of these markers is reflected in the low variance associated with these admixture estimates (SEM 1.3%-2.7%). We also estimated the male and female European contribution to African Americans, on the basis of informative mtDNA (haplogroups H and L) and Y Alu polymorphic markers. Results indicate a sex-biased gene flow from Europeans, the male contribution being substantially greater than the female contribution. mtDNA haplogroup analysis shows no evidence of a significant maternal Amerindian contribution to any of the 10 populations. We detected significant nonrandom association between two markers located 22 cM apart (FY-null and AT3), most likely due to admixture linkage disequilibrium created in the interbreeding of the two parental populations. The strength of this association and the substantial genetic distance between FY and AT3 emphasize the importance of admixed populations as a useful resource for mapping traits with different prevalence in two parental populations.
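
To make the role of large allele-frequency differences concrete, the sketch below estimates a single admixture proportion from allele frequencies at several markers. It assumes the textbook two-parent mixing model, in which the admixed frequency at each marker is m times the first parental frequency plus (1 - m) times the second, and combines markers by least squares. This is an illustration only, not necessarily the estimator used in the study, and the function name `admixture_proportion` is hypothetical.

```python
from typing import Sequence


def admixture_proportion(
    p_admixed: Sequence[float],
    p_parent1: Sequence[float],
    p_parent2: Sequence[float],
) -> float:
    """Least-squares estimate of m in p_admixed = m*p_parent1 + (1-m)*p_parent2.

    Each sequence holds the frequency of the same allele at each marker.
    """
    numerator = 0.0
    denominator = 0.0
    for ph, p1, p2 in zip(p_admixed, p_parent1, p_parent2):
        delta = p1 - p2  # frequency difference between parental populations
        numerator += (ph - p2) * delta
        denominator += delta * delta
    if denominator == 0.0:
        raise ValueError("markers carry no information: parental frequencies are identical")
    return numerator / denominator
```

The denominator is the sum of squared parental frequency differences, so markers with large differences dominate the estimate and markers with none contribute nothing, which is consistent with the abstract's emphasis on highly differentiated markers.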

18.
Spatial interactions are key determinants in the dynamics of many epidemiological and ecological systems; therefore it is important to use spatio-temporal models to estimate essential parameters. However, spatially-explicit data sets are rarely available; moreover, fitting spatially-explicit models to such data can be technically demanding and computationally intensive. Thus non-spatial models are often used to estimate parameters from temporal data. We introduce a method for fitting models to temporal data in order to estimate parameters which characterise spatial epidemics. The method uses semi-spatial models and pair approximation to take explicit account of spatial clustering of disease without requiring spatial data. The approach is demonstrated for data from experiments with plant populations invaded by a common soilborne fungus, Rhizoctonia solani. Model inferences concerning the number of sources of disease and primary and secondary infections are tested against independent measures from spatio-temporal data. The applicability of the method to a wide range of host-pathogen systems is discussed.

19.

Background

Modern approaches to treating genetic disorders, cancers and even epidemics rely on a detailed understanding of the underlying gene signaling network. Previous work has used time series microarray data to infer gene signaling networks given a large number of accurate time series samples. Microarray data available for many biological experiments is limited to a small number of arrays with little or no time series guarantees. When several samples are averaged to examine differences in mean value between a diseased and normal state, information from individual samples that could indicate a gene relationship can be lost.

Results

Asynchronous Inference of Regulatory Networks (AIRnet) provides gene signaling network inference using more practical assumptions about the microarray data. By learning correlation patterns for the changes in microarray values from all pairs of samples, accurate network reconstructions can be performed with data that is normally available in microarray experiments.

Conclusions

By focussing on the changes between microarray samples, instead of absolute values, increased information can be gleaned from expression data.
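
The idea of working with changes between samples rather than absolute values can be sketched as follows. This is a simplified illustration and not the AIRnet algorithm itself: it assumes that the change for a gene is the plain difference between two samples, that every pair of samples is used once, and that gene relationships are scored by the Pearson correlation of those changes. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pairwise_changes(values: List[float]) -> List[float]:
    """Differences between every pair of samples (later sample minus earlier)."""
    return [values[j] - values[i] for i, j in combinations(range(len(values)), 2)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def change_correlations(
    expression: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """Correlate per-gene changes across all sample pairs for every gene pair."""
    changes = {gene: pairwise_changes(vals) for gene, vals in expression.items()}
    return {
        (a, b): pearson(changes[a], changes[b])
        for a, b in combinations(sorted(changes), 2)
    }
```

Because n samples yield n(n-1)/2 sample pairs, even a small set of arrays gives many change observations per gene, and no time ordering of the samples is required.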

20.
MOTIVATION: Identifying protein-protein interactions is critical for understanding cellular processes. Because protein domains represent binding modules and are responsible for the interactions between proteins, computational approaches have been proposed to predict protein interactions at the domain level. The fact that protein domains are likely evolutionarily conserved allows us to pool information from data across multiple organisms for the inference of domain-domain and protein-protein interaction probabilities. RESULTS: We use a likelihood approach to estimate domain-domain interaction probabilities by integrating large-scale protein interaction data from three organisms, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. The estimated domain-domain interaction probabilities are then used to predict protein-protein interactions in S. cerevisiae. Based on a thorough comparison of sensitivity and specificity, Gene Ontology term enrichment and gene expression profiles, we have demonstrated that it may be far more informative to predict protein-protein interactions from diverse organisms than from a single organism. AVAILABILITY: The program for computing the protein-protein interaction probabilities and supplementary material are available at http://bioinformatics.med.yale.edu/interaction.
