Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Heritable variation is essential for evolution by natural selection. In Neotropical army ants, the ecological role of a given species is linked intimately to the morphological variation within the sterile worker caste. Furthermore, the army ant Eciton burchellii is highly polyandrous, presenting a unique opportunity to explore heritability of morphological traits among related workers sharing the same colonial environment. In order to exploit the features of this organismal system, we generated a large genetic and morphological dataset and applied our new method that employs geometric morphometrics (GM) to detect the heritability of complex morphological traits. After validating our approach with an existing dataset of known heritability, we simulated our ability to detect heritable variation given our sampled genotypes, demonstrating the method can robustly recover heritable variation of small effect size. Using this method, we tested for genetic caste determination and heritable morphological variation using genetic and morphological data on 216 individuals of E. burchellii. Results reveal that this ant lineage (1) has the highest mating frequency known in ants, (2) shows no paternal genetic caste determination, and (3) appears to lack heritable morphological variation in this complex trait associated with paternal genotype. We recommend this method for leveraging the increased resolution of GM data to explore and understand heritable morphological variation in nonmodel organisms.

2.
High-throughput genotyping and sequencing techniques are rapidly and inexpensively providing large amounts of human genetic variation data. Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability and have been implicated in several human diseases, including cancer. Amino acid mutations resulting from non-synonymous SNPs in coding regions may generate protein functional changes that affect cell proliferation. In this study, we developed a machine learning approach to predict cancer-causing missense variants. We present a Support Vector Machine (SVM) classifier trained on a set of 3163 cancer-causing variants and an equal number of neutral polymorphisms. The method achieves 93% overall accuracy, a correlation coefficient of 0.86, and an area under the ROC curve of 0.98. When compared with previously developed algorithms such as SIFT and CHASM, our method yields higher prediction accuracy and a higher correlation coefficient in identifying cancer-causing variants.
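The two summary statistics quoted in this abstract can be related with a short sketch. It assumes that the "correlation coefficient" is the Matthews correlation coefficient (the abstract does not say so), and the confusion-matrix counts are hypothetical, chosen only to illustrate a balanced test set; they are not the study's data.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Denominator of the Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical balanced example: 100 positives, 100 negatives,
# 7 errors in each class.
acc, mcc = accuracy_and_mcc(tp=93, tn=93, fp=7, fn=7)
print(round(acc, 2), round(mcc, 2))
```

With balanced classes and equal error rates in both classes, the Matthews coefficient reduces to 2 × accuracy − 1, so an accuracy of 93% would correspond to a coefficient of 0.86 under those assumptions.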

3.
While the cost and speed of generating genomic data have come down dramatically in recent years, the slow pace of collecting medical data for large cohorts continues to hamper genetic research. Here we evaluate a novel online framework for obtaining large amounts of medical information from a recontactable cohort by assessing our ability to replicate genetic associations using these data. Using web-based questionnaires, we gathered self-reported data on 50 medical phenotypes from a generally unselected cohort of over 20,000 genotyped individuals. Of a list of genetic associations curated by NHGRI, we successfully replicated about 75% of the associations that we expected to (based on the number of cases in our cohort and reported odds ratios, and excluding a set of associations with contradictory published evidence). Altogether we replicated over 180 previously reported associations, including many for type 2 diabetes, prostate cancer, cholesterol levels, and multiple sclerosis. We found significant variation across categories of conditions in the percentage of expected associations that we were able to replicate, which may reflect systematic inflation of the effects in some initial reports, or differences across diseases in the likelihood of misdiagnosis or misreport. We also demonstrated that we could improve replication success by taking advantage of our recontactable cohort, offering more in-depth questions to refine self-reported diagnoses. Our data suggest that online collection of self-reported data from a recontactable cohort may be a viable method for both broad and deep phenotyping in large populations.

4.
Gene flow among invertebrate populations inhabiting bodies of nonflowing freshwater such as ponds or lakes must at some stage involve transport across habitat unsuitable for adult stages. Consequently the potential for interpopulational differentiation is high in these species, yet empirical studies of lake populations of cladocerans such as Daphnia have failed to reveal high levels of genetic distinctiveness among populations and have led to much speculation about how these populations exchange genes and remain cohesive evolutionary units. In this study we surveyed 42 Oregon lake populations of Daphnia from the D. pulex species complex for genetic variation within the mitochondrial DNA control region. We have used these data to test the relative abilities of various ecological factors to explain the observed patterns in genetic differentiation among lakes. Despite limited genetic variation detected among our samples--11 very similar RFLP-defined mtDNA genotypes from 388 individuals--analyses of nucleotide variance using analogs to Wright's F statistics indicate that when multilake populations are defined in terms of the river drainage basin to which they belong, strong and significant amounts of among-population genetic variation can be detected at this locus (F(ST) estimates between 0.5 and 0.6). In contrast, we fail to detect consistent significant among-population variation when populations are defined on the basis of regional physical geography, bird migratory flyways, or lake trophic status. The manner in which the data are compiled, that is, whether RFLPs or nucleotide sequences are used, has little effect on the overall conclusions, yet it is clear that nucleotide sequence data would lower the standard errors of F(ST) estimates. We propose that periodic widescale flooding during the late Pleistocene may be an important mechanism to homogenize genetic differences among lake Daphnia continent-wide south of the southern-most extent of Pleistocene glaciation.
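To make the F(ST) values in this abstract concrete, here is a minimal sketch of a frequency-based F(ST) computed from haplotype counts. It is a simplified analog, not the nucleotide-variance analysis the authors describe: it uses uncorrected haplotype diversity and weights populations equally. The function names and the example counts are hypothetical.

```python
def haplotype_diversity(freqs):
    """Probability that two haplotypes drawn with replacement differ."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_counts):
    """F_ST = (H_T - H_S) / H_T from per-population haplotype counts.

    pop_counts: one list of counts per population, with haplotypes
    listed in the same order in every population.
    """
    freqs = [[c / sum(counts) for c in counts] for counts in pop_counts]
    n_pops = len(freqs)
    # H_S: mean within-population diversity.
    h_s = sum(haplotype_diversity(f) for f in freqs) / n_pops
    # H_T: diversity of the unweighted mean haplotype frequencies.
    n_haps = len(pop_counts[0])
    mean_freqs = [sum(f[i] for f in freqs) / n_pops for i in range(n_haps)]
    h_t = haplotype_diversity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Two hypothetical drainage basins with strongly skewed haplotype counts.
print(round(fst([[9, 1], [1, 9]]), 2))
```

Values near 0 mean the groups share the same haplotype frequencies, and values near 1 mean they are fixed for different haplotypes, which is the scale on which the reported 0.5 to 0.6 estimates should be read.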

5.
Advances in high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome that are likely to be polymorphic. In addition, our predicted polymorphic repeats included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats.
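The idea of calling repeat-length changes from fragment-size deviations can be illustrated with a simple point estimate. This is not the Bayesian method of the abstract; it is a sketch that assumes read pairs span the whole repeat, that the library fragment-size mean and standard deviation are known, and that all names and numbers are hypothetical.

```python
import statistics


def estimate_repeat_length_change(spanning_sizes, library_mean, library_sd):
    """Estimate the repeat-length change relative to the reference.

    spanning_sizes: apparent fragment sizes (measured on the reference)
    of read pairs that span the repeat. If the sample carries a longer
    repeat than the reference, its fragments appear shorter when mapped,
    so a positive return value indicates an expansion.
    """
    n = len(spanning_sizes)
    delta = library_mean - statistics.fmean(spanning_sizes)
    # Standard error of the mean apparent size under the library spread.
    standard_error = library_sd / n ** 0.5
    return delta, standard_error


# Hypothetical library: mean fragment size 300 bp, SD 20 bp.
delta, se = estimate_repeat_length_change([280, 290, 270, 280], 300, 20)
print(delta, se)
```

A fuller treatment would model the fragment-size distribution explicitly and place a prior on the length change, which is the role the Bayesian framework plays in the abstract.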

6.
Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

7.
Evolutionary change results from selection acting on genetic variation. For migration to be successful, many different aspects of an animal’s physiology and behaviour need to function in a coordinated way. Changes in one migratory trait are therefore likely to be accompanied by changes in other migratory and life-history traits. At present, we have some knowledge of the pressures that operate at the various stages of migration, but we know very little about the extent of genetic variation in various aspects of the migratory syndrome. As a consequence, our ability to predict which species is capable of what kind of evolutionary change, and at which rate, is limited. Here, we review how our evolutionary understanding of migration may benefit from taking a quantitative-genetic approach and present a framework for studying the causes of phenotypic variation. We review past research, which has mainly studied single migratory traits in captive birds, and discuss how this work could be extended to study genetic variation in the wild and to account for genetic correlations and correlated selection. In the future, reaction-norm approaches may become very important, as they allow the study of genetic and environmental effects on phenotypic expression within a single framework, as well as of their interactions. We advocate making more use of repeated measurements on single individuals to study the causes of among-individual variation in the wild, as they are easier to obtain than data on relatives and can provide valuable information for identifying and selecting traits. This approach will be particularly informative if it involves systematic testing of individuals under different environmental conditions. We propose extending this research agenda by using optimality models to predict levels of variation and covariation among traits and constraints.
This may help us to select traits in which we might expect genetic variation, and to identify the most informative environmental axes. We also recommend an expansion of the passerine model, as this model does not apply to birds, like geese, where cultural transmission of spatio-temporal information is an important determinant of migration patterns and their variation.

8.
MOTIVATION: A new type of expression data is now being collected that aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are constructed for multiple strains of the same model organism under the same condition. The goal of analyses of these data is to find differences in regulatory patterns due to genetic variation between strains, often without a phenotype of interest in mind. We present a new method based on notions of tight regulation and differential expression to look for sets of genes which appear to be significantly affected by genetic variation. RESULTS: When we use categorical phenotype information, as in the Alzheimer's and diabetes datasets, our method finds many of the same gene sets as gene set enrichment analysis. In addition, our notion of correlated gene sets allows us to focus our efforts on biological processes subjected to tight regulation. In murine hematopoietic stem cells, we are able to discover significant gene sets independent of a phenotype of interest. Some of these gene sets are associated with several blood-related phenotypes. AVAILABILITY: The programs are available by request from the authors.

9.
Large amounts of population-scale genetic variation data are being collected. One potentially important biological problem is to infer the population genealogical history from these genetic variation data. Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, genealogy is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each for a short region of the genome and possibly with a different topology. Inference of genealogical history for a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve previous work. We first show that the "tree scan" method can be converted to a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method called RENT that is both accurate and scalable to larger data. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden-Markov-model-based method is comparable with the original method in terms of accuracy. We also show that RENT is competitive with other methods in terms of inference accuracy: its inference error rate is often lower, and it can handle large data.

10.
Recent studies have suggested that females of the field cricket Gryllus bimaculatus exercise post-copulatory choice over the paternity of their offspring. There is evidence that these choices are made in relation to the genetic compatibility of mates rather than their absolute quality, but the magnitude of heritable differences in males has not been thoroughly examined. Using a half-sib breeding design we measured additive genetic variance and dam effects in a suite of reproductive and non-reproductive traits. Both components explained relatively little of the phenotypic variance across traits. The dam component in our design contains variance caused by both maternal effects and dominance. If maternal effects are negligible as suggested by previous studies, our data suggest that dominance variance is an important source of variation in these traits. The lack of additive genetic variation, but possible existence of large amounts of non-additive genetic variation, is consistent with the idea that female mate choice and multiple mating may be driven by differences in genetic compatibility between potential mates rather than by differences in genetic quality.
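The reasoning about the dam component can be written out using the textbook expectations for a nested half-sib design, in which the sire component estimates one quarter of the additive variance and the dam component estimates one quarter of the additive variance plus one quarter of the dominance variance plus maternal effects. The sketch below assumes no epistasis and negligible maternal and common-environment effects, as the abstract's argument does; the function name and the input values are hypothetical and are not the study's estimates.

```python
def genetic_variances_from_half_sib(sire_var, dam_var):
    """Convert nested half-sib variance components to V_A and V_D.

    sire_var: among-sire variance component (expected value V_A / 4).
    dam_var: dam-within-sire component (expected value V_A / 4 + V_D / 4
    when maternal and common-environment effects are assumed to be zero).
    """
    additive = 4.0 * sire_var
    dominance = 4.0 * (dam_var - sire_var)
    return additive, dominance


# Hypothetical components: a dam component larger than the sire component
# implies non-additive (dominance) variance under these assumptions.
v_a, v_d = genetic_variances_from_half_sib(sire_var=0.5, dam_var=2.0)
print(v_a, v_d)
```

If maternal effects were not negligible, the excess of the dam component over the sire component could not be attributed to dominance alone, which is why the abstract states that assumption explicitly.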

11.
Bayesian logistic regression using a perfect phylogeny (cited by: 1; self-citations: 0; cited by others: 1)
Haplotype data capture the genetic variation among individuals in a population and among populations. An understanding of this variation and the ancestral history of haplotypes is important in genetic association studies of complex disease. We introduce a method for detecting associations between disease and haplotypes in a candidate gene region or candidate block with little or no recombination. A perfect phylogeny demonstrates the evolutionary relationship between single-nucleotide polymorphisms (SNPs) in the haplotype blocks. Our approach extends the logic regression technique of Ruczinski and others (2003) to a Bayesian framework, and constrains the model space to that of a perfect phylogeny. Environmental factors, as well as their interactions with SNPs, may be incorporated into the regression framework. We demonstrate our method on simulated data from a coalescent model, as well as data from a candidate gene study of sarcoidosis.

12.
Recent studies have suggested that females of the field cricket Gryllus bimaculatus exercise post-copulatory choice over the paternity of their offspring. There is evidence that these choices are made in relation to the genetic compatibility of mates rather than their absolute quality, but the magnitude of heritable differences in males has not been thoroughly examined. Using a half-sib breeding design we measured additive genetic variance and dam effects in a suite of reproductive and non-reproductive traits. Both components explained relatively little of the phenotypic variance across traits. The dam component in our design contains variance caused by both maternal effects and dominance. If maternal effects are negligible as suggested by previous studies, our data suggest that dominance variance is an important source of variation in these traits. The lack of additive genetic variation, but possible existence of large amounts of non-additive genetic variation, is consistent with the idea that female mate choice and multiple mating may be driven by differences in genetic compatibility between potential mates rather than by differences in genetic quality.

13.
Haplotype reconstruction from genotype data using Imperfect Phylogeny (cited by: 13; self-citations: 0; cited by others: 13)
Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual's haplotype or which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes that shows that SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks, such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks, and for each block, we predict the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when taking into account the predictions for the uncommon haplotypes. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large datasets. AVAILABILITY: The algorithm is available via a Web server at http://www.calit2.net/compbio/hap/

14.
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need to be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth–based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth–based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.

15.
Reducing disease prevalence through selection for host resistance offers a desirable alternative to chemical treatment. Selection for host resistance has proven difficult, however, due to low heritability estimates. These low estimates may be caused by a failure to capture all the relevant genetic variance in disease resistance, as genetic analysis currently is not tailored to estimate genetic variation in infectivity. Host infectivity is the propensity of transmitting infection upon contact with a susceptible individual, and can be regarded as an indirect effect to disease status. It may be caused by a combination of physiological and behavioural traits. Though genetic variation in infectivity is difficult to measure directly, Indirect Genetic Effect (IGE) models, also referred to as associative effects or social interaction models, allow the estimation of this variance from more readily available binary disease data (infected/non-infected). We therefore generated binary disease data from simulated populations with known amounts of variation in susceptibility and infectivity to test the adequacy of traditional and IGE models. Our results show that a conventional model fails to capture the genetic variation in infectivity inherent in populations with simulated infectivity. An IGE model, on the other hand, does capture some of the variation in infectivity. Comparison with expected genetic variance suggests that there is scope for further methodological improvement, and that potential responses to selection may be greater than values presented here. Nonetheless, selection using an index of estimated direct and indirect breeding values was shown to have a greater genetic selection differential and reduced future disease risk than traditional selection for resistance only.
These findings suggest that if genetic variation in infectivity substantially contributes to disease transmission, then breeding designs which explicitly incorporate IGEs might help reduce disease prevalence.

16.
Moorad JA, Wade MJ. Genetics, 2005, 170(3): 1373-1384
Inbreeding depression is expected to play an important but complicated role in evolution. If we are to understand the evolution of inbreeding depression (i.e., purging), we need quantitative genetic interpretations of its variation. We introduce an experimental design in which sires are mated to multiple dams, some of which are unrelated to the sire but others are genetically related owing to an arbitrary number of prior generations of selfing or sib-mating. In this way we introduce the concept of "inbreeding depression effect variance," a parameter more relevant to selection and the purging of inbreeding depression than previous measures. We develop an approach for interpreting the genetic basis of the variation in inbreeding depression by: (1) predicting the variation in inbreeding depression given arbitrary initial genetic variance and (2) estimating genetic variance components given half-sib covariances estimated by our experimental design. As quantitative predictions of selection depend upon understanding genetic variation, our approach reveals the important difference between how inbreeding depression is measured experimentally and how it is viewed by selection.

17.
Large-scale, multilocus genetic association studies require powerful and appropriate statistical-analysis tools that are designed to relate genotype and haplotype information to phenotypes of interest. Many analysis approaches consider relating allelic, haplotypic, or genotypic information to a trait through use of extensions of traditional analysis techniques, such as contingency-table analysis, regression methods, and analysis-of-variance techniques. In this work, we consider a complementary approach that involves the characterization and measurement of the similarity and dissimilarity of the allelic composition of a set of individuals' diploid genomes at multiple loci in the regions of interest. We describe a regression method that can be used to relate variation in the measure of genomic dissimilarity (or "distance") among a set of individuals to variation in their trait values. Weighting factors associated with functional or evolutionary conservation information of the loci can be used in the assessment of similarity. The proposed method is very flexible and is easily extended to complex multilocus-analysis settings involving covariates. In addition, the proposed method actually encompasses both single-locus and haplotype-phylogeny analysis methods, which are two of the most widely used approaches in genetic association analysis. We showcase the method with data described in the literature. Ultimately, our method is appropriate for high-dimensional genomic data and anticipates an era when cost-effective exhaustive DNA sequence data can be obtained for a large number of individuals, over and above genotype information focused on a few well-chosen loci.

18.
Genetic mapping in the presence of genotyping errors (cited by: 1; self-citations: 0; cited by others: 1)
Cartwright DA, Troggio M, Velasco R, Gutin A. Genetics, 2007, 176(4): 2521-2527
Genetic maps are built using the genotypes of many related individuals. Genotyping errors in these data sets can distort genetic maps, especially by inflating the distances. We have extended the traditional likelihood model used for genetic mapping to include the possibility of genotyping errors. Each individual marker is assigned an error rate, which is inferred from the data, just as the genetic distances are. We have developed a software package, called TMAP, which uses this model to find maximum-likelihood maps for phase-known pedigrees. We have tested our methods using a data set in Vitis and on simulated data and confirmed that our method dramatically reduces the inflationary effect caused by increasing the number of markers and leads to more accurate orders.
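The inflationary effect mentioned here can be sketched with a toy model. It is not the TMAP likelihood: it assumes two-state (backcross-style) genotypes, an independent error probability `e` that flips each genotype call, equally spaced markers, and the Haldane map function. All names and numbers are hypothetical.

```python
import math


def observed_recombination(r, e):
    """Apparent recombination fraction between adjacent markers.

    A pair looks recombinant if it truly is and neither or both calls
    are wrong, or if it is not and exactly one call is wrong.
    """
    one_error = 2.0 * e * (1.0 - e)
    return r * (1.0 - one_error) + (1.0 - r) * one_error


def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def apparent_map_length(true_length_cm, n_intervals, e):
    """Map length inferred when genotyping errors are ignored."""
    d = true_length_cm / n_intervals
    r = 0.5 * (1.0 - math.exp(-d / 50.0))  # inverse Haldane function
    return n_intervals * haldane_cm(observed_recombination(r, e))


# The same 100 cM chromosome with a 1% error rate, mapped with
# 10 intervals and then with 100 intervals.
print(round(apparent_map_length(100.0, 10, 0.01), 1))
print(round(apparent_map_length(100.0, 100, 0.01), 1))
```

Under this toy model each interval gains a fixed extra distance that depends only on the error rate, so the inflation grows in proportion to the number of markers. That is the pattern an explicit per-marker error rate, as described in the abstract, is meant to remove.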

19.
MOTIVATION: Large-scale association studies, investigating the genetic determinants of a phenotype of interest, are producing increasing amounts of genomic variation data on human cohorts. A fundamental challenge in these studies is the detection of genotypic patterns that discriminate individuals exhibiting the phenotype under study from individuals that do not possess it. The difficulty stems from the large number of single nucleotide polymorphism (SNP) combinations that have to be tested. The discrimination problem becomes even more involved when additional high-throughput data, such as gene expression data, are available for the same cohort. RESULTS: We have developed a graph theoretic approach for identifying discriminating patterns (DPs) for a given phenotype in a genotyped population. The method is based on representing the SNP data as a bipartite graph of individuals and their SNP states, and identifying fully connected subgraphs of this graph that relate individuals enriched for a given phenotypic group. The method can handle additional data types such as expression profiles of the genotyped population. It is reminiscent of biclustering approaches with the crucial difference that its search process is guided by the phenotype under consideration in a supervised manner. We tested our approach in simulations and on real data. In simulations, our method was able to retrieve planted patterns with a high success rate. We then applied our approach to a dataset of 72 breast cancer patients with available gene expression profiles, genotyped over 695 SNPs. We detected several DPs that were highly significant with respect to various clinical phenotypes, and investigated the groups of patients and the groups of genes they defined. We found the patient groups to be highly enriched for other phenotypes and to display expression coherency among their profiles.
The gene groups displayed functional coherency and involved genes with a known role in cancer, providing additional support to their involvement. AVAILABILITY: The program is available upon request.

20.
The red flour beetle, Tribolium castaneum, is a common pest, which has become an important model study organism, especially in genetic, ecological and evolutionary research. Although almost all studies on this species have been conducted using established laboratory strains, very little is known about the loss of genetic diversity within the strains and genetic divergence between different laboratory stocks. In this study, five long‐term laboratory strains and one wild strain were examined for genetic variation at 12 microsatellite loci, which were designed using publicly available sequences. One of the laboratory strains is resistant to phosphine and one to organophosphorus insecticides. All strains had significant amounts of molecular variation, but genetic diversity in the laboratory strains was lower than in the wild‐derived strain used as control. We observed significant molecular divergence among the strains; however, the relationship between them reflected resistance status rather than geographic origins. We found no evidence for recent bottlenecks, but the wild‐derived population showed signs of demographic expansion. A novel multivariate method, multiple co‐inertia analysis, revealed that the two loci contributing most to the divergence between the resistant strains were located on the eighth chromosome, near genes associated with insecticide resistance.
