Similar literature
20 similar documents found (search time: 15 ms)
1.
It is crucial to understand the genetic health and implications of inbreeding in wildlife populations, especially of vulnerable species. Using extensive demographic and genetic data, we investigated the relationships among pedigree inbreeding coefficients, metrics of molecular heterozygosity and fitness for a large population of endangered African wild dogs (Lycaon pictus) in South Africa. Molecular metrics based on 19 microsatellite loci were significantly but modestly correlated with inbreeding coefficients in this population. Inbred wild dogs with inbreeding coefficients of ≥0.25 and subordinate individuals had shorter lifespans than outbred and dominant contemporaries, suggesting some deleterious effects of inbreeding. However, this trend was confounded by pack-specific effects, as many inbred individuals originated from a single large pack. Despite wild dogs being endangered and existing in small populations, findings within our sample population indicated that molecular metrics were not robust predictors in models of fitness based on breeding pack formation, dominance, reproductive success or lifespan of individuals. Nonetheless, our approach has generated a vital database for future comparative studies to examine these relationships over longer periods of time. Such detailed assessments are essential given knowledge that wild canids can be highly vulnerable to inbreeding effects over a few short generations.

2.
Knowledge of kin relationships between members of wild animal populations has broad application in ecology and evolution research by allowing the investigation of dispersal dynamics, mating systems, inbreeding avoidance, kin recognition, and kin selection as well as aiding the management of endangered populations. However, the assessment of kinship among members of wild animal populations is difficult in the absence of detailed multigenerational pedigrees. Here, we first review the distinction between genetic relatedness and kinship derived from pedigrees and how this makes the identification of kin using genetic data inherently challenging. We then describe useful approaches to kinship classification, such as parentage analysis and sibship reconstruction, and explain how the combined use of marker systems with biparental and uniparental inheritance, demographic information, likelihood analyses, relatedness coefficients, and estimation of misclassification rates can yield reliable classifications of kinship in groups with complex kin structures. We outline alternative approaches for cases in which explicit knowledge of dyadic kinship is not necessary, but indirect inferences about kinship on a group‐ or population‐wide scale suffice, such as whether more highly related dyads are in closer spatial proximity. Although analysis of highly variable microsatellite loci is still the dominant approach for studies on wild populations, we describe how the long‐awaited use of large‐scale single‐nucleotide polymorphism and sequencing data derived from noninvasive low‐quality samples may eventually lead to highly accurate assessments of varying degrees of kinship in wild populations.

3.
Functional data are smooth, often continuous, random curves, which can be seen as an extreme case of multivariate data with infinite dimensionality. Just as componentwise inference for multivariate data naturally performs feature selection, subsetwise inference for functional data performs domain selection. In this paper, we present a unified testing framework for domain selection on populations of functional data. In detail, p-values of hypothesis tests performed on pointwise evaluations of functional data are suitably adjusted for providing control of the familywise error rate (FWER) over a family of subsets of the domain. We show that several state-of-the-art domain selection methods fit within this framework and differ from each other by the choice of the family over which the control of the FWER is provided. In the existing literature, these families are always defined a priori. In this work, we also propose a novel approach, coined thresholdwise testing, in which the family of subsets is instead built in a data-driven fashion. The method seamlessly generalizes to multidimensional domains in contrast to methods based on a priori defined families. We provide theoretical results with respect to consistency and control of the FWER for the methods within the unified framework. We illustrate the performance of the methods within the unified framework on simulated and real data examples and compare their performance with other existing methods.
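As a rough illustration of what adjusting pointwise p-values for FWER control can look like, the sketch below applies the generic Holm step-down correction to p-values computed on a grid of domain points and keeps the points that remain significant. This is a stand-in technique chosen for brevity, not the thresholdwise testing procedure the abstract proposes; the function names `holm_adjust` and `selected_points` and the grid representation are assumptions made for this example.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def selected_points(grid, p_values, alpha=0.05):
    """Grid points whose Holm-adjusted p-value is at most alpha."""
    adjusted = holm_adjust(p_values)
    return [t for t, p in zip(grid, adjusted) if p <= alpha]
```

Holm's correction controls the FWER over the finite set of grid points only; the framework described above is concerned with families of subsets of a continuous domain, which this sketch does not attempt to reproduce.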

4.
The study of the interactions of cellular components is an essential first step toward understanding the structure and dynamics of biological networks. Various methods were recently developed for this purpose. While most of them combine different types of data and a priori knowledge, methods based on graphical Gaussian models are capable of learning the network directly from raw data. They consider the full-order partial correlations, which are partial correlations between two variables given the remaining ones, for modeling direct links between variables. Statistical methods were developed for estimating these links when the number of observations is larger than the number of variables. However, the rapid advance of new technologies that allow the simultaneous measurement of genome expression has led to large-scale datasets where the number of variables is far larger than the number of observations. To get around this dimensionality problem, different strategies and new statistical methods were proposed. In this study we focused on recently published statistical methods. All are based on the assumption that the number of direct relationships between two variables is very small relative to the number of possible relationships, p(p-1)/2. In the biological context, this assumption is not always satisfied over the whole graph. It is essential to know precisely how the methods behave with regard to the characteristics of the studied object before applying them. For this purpose, we evaluated the validity domain of each method from wide-ranging simulated datasets. We then illustrated our results using recently published biological data.
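To make the notion of full-order partial correlation concrete, here is a minimal sketch that computes it from a covariance matrix through the precision (inverse covariance) matrix, using the standard identity that the partial correlation between variables i and j equals minus the (i, j) precision entry divided by the square root of the product of the two diagonal entries. The function names and the pure-Python matrix inversion are assumptions of this example; the sparse estimators compared in the abstract are not reproduced here.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (e.g. more variables than observations)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Full-order partial correlation matrix derived from a covariance matrix."""
    prec = invert(cov)
    p = len(cov)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)]
            for i in range(p)]


def n_possible_edges(p):
    """Number of possible undirected links among p variables: p(p-1)/2."""
    return p * (p - 1) // 2
```

For a chain-like covariance such as `[[1, 0.5, 0.25], [0.5, 1, 0.5], [0.25, 0.5, 1]]`, the partial correlation between the first and third variables is zero, which corresponds to a missing direct link. The singular-matrix error is the situation the abstract describes: when variables outnumber observations, the sample covariance cannot be inverted directly.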

5.
Ionita I, Lo SH. Human Heredity, 2005, 60(4): 227-240
OBJECTIVE: The conventional affected sib pair methods evaluate the linkage information at a locus by considering only marginal information. We describe a multilocus linkage method that uses both the marginal information and information derived from the possible interactions among several disease loci, thereby increasing the significance of loci with modest effects. METHODS: Our method is based on a statistic that quantifies the linkage information contained in a set of markers. By a marker selection-reduction process, we screen a set of polymorphisms and select a few that seem linked to disease. RESULTS: We test our approach on genome scan data for inflammatory bowel disease (InfBD) and on simulated data. On real data we detect 6 of the 8 known InfBD loci; on simulated data we obtain improvements in power of up to 40% compared to a conventional single-locus method. CONCLUSION: Our extensive simulations and the results on real data show that our method is in general more powerful than single-locus methods in detecting disease loci responsible for complex traits. A further advantage of our approach is that it can be extended to make use of both the linkage and the linkage disequilibrium between disease loci and nearby markers.

6.
Elmer KR, Dávila JA, Lougheed SC. Heredity, 2007, 99(5): 506-515
We assess patterns of genetic diversity of a neotropical leaflitter frog, Eleutherodactylus ockendeni, in the upper Amazon of Ecuador without a priori delineation of biological populations and with sufficiently intensive sampling to assess inter-individual patterns. We mapped the location of each collected frog across a 5.4 x 1 km landscape at the Jatun Sacha Biological Station, genotyped 185 individuals using five species-specific DNA microsatellite loci, and sequenced a fragment of mitochondrial cytochrome b for a subset of 51 individuals. The microsatellites were characterized by high allelic diversity and homozygote excess across all loci, suggesting that when pooled the sample is not a panmictic population. We conclude that the lack of panmixia is not attributable to the influence of null alleles or biased sampling of consanguineous family groups. Multiple methods of population cluster analysis, using both Bayesian and maximum likelihood approaches, failed to identify discrete genetic clusters across the sampled area. Using multivariate spatial autocorrelation, kinship coefficients and relatedness coefficients, we identify a continuous isolation by distance population structure, with a first patch size of ca. 260 m and apparently large population sizes. Analysis of mtDNA corroborates the observation of high genetic diversity at fine scales: there are multiple haplotypes, they are non-randomly distributed and a binary haplotype correlogram shows significant spatial genetic autocorrelation. We demonstrate the utility of inter-individual genetic methods and caution against making a priori assumptions about population genetic structure based simply on arbitrary or convenient patterns of sampling.

7.
Clustering of microarray gene expression data is performed routinely, for genes as well as for samples. Clustering of genes can reveal functional relationships between genes; clustering of samples, on the other hand, is important for finding, for example, disease subtypes, relevant patient groups for stratification or related treatments. Usually this is done by first filtering the genes for high variance, under the assumption that they carry most of the information needed for separating different sample groups. If this assumption is violated, important groupings in the data might be lost. Furthermore, classical clustering methods do not facilitate the biological interpretation of the results. Therefore, we propose to methodologically integrate the clustering algorithm with prior biological information. This is different from other approaches, as knowledge about classes of genes can be directly used to ease the interpretation of the results and possibly boost clustering performance. Our approach computes dendrograms that resemble decision trees, with gene classes used to split the data at each node, which can help to find biologically meaningful differences between the sample groups. We have tested the proposed method on both simulated and real data and conclude that it is useful as a complementary method, especially when the assumptions of few differentially expressed genes and an informative mapping of genes to different classes are met.

8.
9.
Use of microsatellite loci to classify individuals by relatedness (total citations: 19; self-citations: 1; citations by others: 18)
This study investigates the use of microsatellite loci for estimating relatedness between individuals in wild, outbred, vertebrate populations. We measured allele frequencies at 20 unlinked, dinucleotide-repeat microsatellite loci in a population of wild mice (Mus musculus), and used these observed frequencies to generate the expected distributions of pairwise relatedness among full sib, half sib, and unrelated pairs of individuals, as would be estimated from the microsatellite data. In this population one should be able to discriminate between unrelated and full-sib dyads with at least 97% accuracy, and to discriminate half-sib pairs from unrelated pairs or from full-sib pairs with better than 80% accuracy. If one uses the criterion that parent-offspring pairs must share at least one allele per locus, then only 15% of full-sib pairs, 2% of half-sib pairs, and 0% of unrelated pairs in this population would qualify as potential parent-offspring pairs. We verified that the simulation results (which assume a random mating population in Hardy-Weinberg and linkage equilibrium) accurately predict results one would obtain from this population in real life by scoring laboratory-bred full- and half-sib families whose parents were wild-caught mice from the study population. We also investigated the effects of using different numbers of loci, or loci of different average heterozygosities (He), on misclassification frequencies. Both variables have strong effects on misclassification rate. For example, it requires almost twice as many loci of He = 0.62 to achieve the same accuracy as a given number of loci of He = 0.75. Finally, we tested the ability of UPGMA clustering to identify family groups in our population. Clustering of allele matching scores among the offspring of four sets of independent maternal half sibships (four females, each mated to two different males) perfectly recovered the true family relationships.
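Two quantities in this abstract are simple enough to sketch directly: the expected heterozygosity of a locus under Hardy-Weinberg equilibrium (one minus the sum of squared allele frequencies) and the exclusion criterion that a parent-offspring pair must share at least one allele at every locus. The function names and the genotype representation (one pair of allele sizes per locus) are assumptions of this example, not the authors' software.

```python
def expected_heterozygosity(allele_freqs):
    """He at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in allele_freqs)


def shares_allele_at_every_locus(genotype_a, genotype_b):
    """True if the two multilocus genotypes share at least one allele at each locus.

    Each genotype is a sequence of (allele1, allele2) tuples, one per locus,
    with loci listed in the same order for both individuals.
    """
    return all(set(a) & set(b) for a, b in zip(genotype_a, genotype_b))


def could_be_parent_offspring(genotype_a, genotype_b):
    """Apply the exclusion criterion; passing it does not prove parentage."""
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotypes must cover the same loci")
    return shares_allele_at_every_locus(genotype_a, genotype_b)
```

A locus with four equally frequent alleles gives `expected_heterozygosity([0.25] * 4) == 0.75`, the higher of the two He values quoted above. The check is deliberately one-directional: as the abstract notes, some full-sib and half-sib pairs also pass it, so a `True` result only marks a pair as a candidate.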

10.
A pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. Along with rapidly growing knowledge of genetics and accumulation of genealogy information, pedigree data is becoming increasingly important. In large pedigree graphs, path-based methods for efficiently computing genealogical measurements, such as inbreeding and kinship coefficients of individuals, depend on efficient identification and processing of paths. In this paper, we propose a new compact path encoding scheme on large pedigrees, accompanied by an efficient algorithm for identifying paths. We demonstrate the use of our proposed method by applying it to the inbreeding coefficient computation. We present time and space complexity analysis, and show experimentally, using pedigree graphs with real and synthetic data, that our method evaluates inbreeding coefficients more efficiently than previous methods. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements.
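For readers unfamiliar with the quantity being computed, the sketch below evaluates inbreeding coefficients with the textbook recursive kinship definition: an individual's inbreeding coefficient is the kinship coefficient of its parents. This is not the path encoding scheme proposed in the abstract; it is a simpler, memoized recursion included only to show what the output means. The pedigree format (a dict mapping each individual to a `(sire, dam)` tuple, with `None` for unknown parents) and all names are assumptions of this example.

```python
from functools import lru_cache


def make_inbreeding_calculator(pedigree):
    """Build an inbreeding-coefficient function for a pedigree dict.

    pedigree maps individual -> (sire, dam); founders are either absent
    from the dict or mapped to (None, None).
    """
    def parents(ind):
        return pedigree.get(ind, (None, None))

    @lru_cache(maxsize=None)
    def depth(ind):
        # Longest chain of known ancestors; an ancestor always has smaller depth.
        known = [p for p in parents(ind) if p is not None]
        return 1 + max(depth(p) for p in known) if known else 0

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are treated as unrelated founders
        if a == b:
            sire, dam = parents(a)
            return 0.5 * (1.0 + kinship(sire, dam))
        # Expand the deeper individual, which cannot be an ancestor of the other.
        if depth(a) < depth(b):
            a, b = b, a
        sire, dam = parents(a)
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    def inbreeding(ind):
        sire, dam = parents(ind)
        return kinship(sire, dam)

    return inbreeding
```

With founders A and B, full sibs C and D, and their offspring E, the pedigree `{"C": ("A", "B"), "D": ("A", "B"), "E": ("C", "D")}` yields an inbreeding coefficient of 0.25 for E. The recursion relies on Python's call stack, so very deep pedigrees would need an iterative formulation.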

11.
One of the most powerful and commonly used approaches for detecting local adaptation in the genome is the identification of extreme allele frequency differences between populations. In this article, we present a new maximum likelihood method for finding regions under positive selection. It is based on a Gaussian approximation to allele frequency changes and it incorporates admixture between populations. The method can analyze multiple populations simultaneously and retains power to detect selection signatures specific to ancestry components that are not representative of any extant populations. Using simulated data, we compare our method to related approaches, and show that it is orders of magnitude faster than the state-of-the-art, while retaining similar or higher power for most simulation scenarios. We also apply it to human genomic data and identify loci with extreme genetic differentiation between major geographic groups. Many of the genes identified are previously known selected loci relating to hair pigmentation and morphology, skin, and eye pigmentation. We also identify new candidate regions, including various selected loci in the Native American component of admixed Mexican-Americans. These involve diverse biological functions, such as immunity, fat distribution, food intake, vision, and hair development.

12.
Complex genetic interactions lie at the foundation of many diseases. Understanding the nature of these interactions is critical to developing rational intervention strategies. In mammalian systems hypothesis testing in vivo is expensive, time consuming, and often restricted to a few physiological endpoints. Thus, computational methods that generate causal hypotheses can help to prioritize targets for experimental intervention. We propose a Bayesian statistical method to infer networks of causal relationships among genotypes and phenotypes using expression quantitative trait loci (eQTL) data from genetically randomized populations. Causal relationships between network variables are described with hierarchical regression models. Prior distributions on the network structure enforce graph sparsity and have the potential to encode prior biological knowledge about the network. An efficient Monte Carlo method is used to search across the model space and sample highly probable networks. The result is an ensemble of networks that provide a measure of confidence in the estimated network topology. These networks can be used to make predictions of system-wide response to perturbations. We applied our method to kidney gene expression data from an MRL/MpJ × SM/J intercross population and predicted a previously uncharacterized feedback loop in the local renin-angiotensin system.

13.
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.

14.
Scoring clustering solutions by their biological relevance (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. RESULTS: We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. AVAILABILITY: The software is available from the authors upon request.

15.
For wildlife populations, it is often difficult to determine biological parameters that indicate breeding patterns and population mixing, but knowledge of these parameters is essential for effective management. A pedigree encodes the relationship between individuals and can provide insight into the dynamics of a population over its recent history. Here, we present a method for the reconstruction of pedigrees for wild populations of animals that live long enough to breed multiple times over their lifetime and that have complex or unknown generational structures. Reconstruction was based on microsatellite genotype data along with ancillary biological information: sex and observed body size class as an indicator of relative age of individuals within the population. Using body size‐class data to infer relative age has not been considered previously in wildlife genealogy and provides a marked improvement in accuracy of pedigree reconstruction. Body size‐class data are particularly useful for wild populations because it is much easier to collect noninvasively than absolute age data. This new pedigree reconstruction system, PR‐genie, performs reconstruction using maximum likelihood with optimization driven by the cross‐entropy method. We demonstrated pedigree reconstruction performance on simulated populations (comparing reconstructed pedigrees to known true pedigrees) over a wide range of population parameters and under assortative and intergenerational mating schema. Reconstruction accuracy increased with the presence of size‐class data and as the amount and quality of genetic data increased. We provide recommendations as to the amount and quality of data necessary to provide insight into detailed familial relationships in a wildlife population using this pedigree reconstruction technique.

16.
MOTIVATION: In the context of sample (e.g. tumor) classifications with microarray gene expression data, many methods have been proposed. However, almost all the methods ignore existing biological knowledge and treat all the genes equally a priori. On the other hand, because some genes have been identified by previous studies to have biological functions or to be involved in pathways related to the outcome (e.g. cancer), incorporating this type of prior knowledge into a classifier can potentially improve both the predictive performance and interpretability of the resulting model. RESULTS: We propose a simple and general framework to incorporate such prior knowledge into building a penalized classifier. As two concrete examples, we apply the idea to two penalized classifiers, nearest shrunken centroids (also called PAM) and penalized partial least squares (PPLS). Instead of treating all the genes equally a priori as in standard penalized methods, we group the genes according to their functional associations based on existing biological knowledge or data, and adopt group-specific penalty terms and penalization parameters. Simulated and real data examples demonstrate that, if prior knowledge on gene grouping is indeed informative, our new methods perform better than the two standard penalized methods, yielding higher predictive accuracy and screening out more irrelevant genes.

17.
MOTIVATION: The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. This list is only the starting point for a meaningful biological interpretation. Modern methods identify relevant biological processes or functions from gene expression data by scoring the statistical significance of predefined functional gene groups, e.g. based on Gene Ontology (GO). We develop methods that increase the explanatory power of this approach by integrating knowledge about relationships between the GO terms into the calculation of the statistical significance. RESULTS: We present two novel algorithms that improve GO group scoring using the underlying GO graph topology. The algorithms are evaluated on real and simulated gene expression data. We show that both methods eliminate local dependencies between GO terms and point to relevant areas in the GO graph that remain undetected with state-of-the-art algorithms for scoring functional terms. A simulation study demonstrates that the new methods detect relevant biological terms at a higher rate than competing methods.

18.

Background

In classical pedigree-based analysis, additive genetic variance is estimated from between-family variation, which requires the existence of larger phenotyped and pedigreed populations involving numerous families (parents). However, estimation is often complicated by confounding of genetic and environmental family effects, with the latter typically occurring among full-sibs. For this reason, genetic variance is often inferred based on covariance among more distant relatives, which reduces the power of the analysis. This simulation study shows that genome-wide identity-by-descent sharing among close relatives can be used to quantify additive genetic variance solely from within-family variation using data on extremely small family samples.

Methods

Identity-by-descent relationships among full-sibs were simulated assuming a genome size similar to that of humans (effective number of loci ~80). Genetic variance was estimated from phenotypic data assuming that genomic identity-by-descent relationships could be accurately re-created using information from genome-wide markers. The results were compared with standard pedigree-based genetic analysis.

Results

For a polygenic trait and a given number of phenotypes, the most accurate estimates of genetic variance were based on data from a single large full-sib family only. Compared with classical pedigree-based analysis, the proposed method is more robust to selection among parents and to confounding of environmental and genetic effects. Furthermore, in some cases, satisfactory results can be achieved even with less ideal data structures, i.e., for selectively genotyped data and for traits for which the genetic variance is largely under the control of a few major genes.

Conclusions

Estimation of genetic variance using genomic identity-by-descent relationships is especially useful for studies aiming at estimating additive genetic variance of highly fecund species, using data from small populations with limited pedigree information and/or few available parents, i.e., parents originating from non-pedigreed or even wild populations.

19.
Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We use a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model, we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of comparisons to test for over-dispersion of genetic values among populations. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles, and they also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
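The phrase "simple weighted sums of allele frequencies" can be illustrated with a few lines of code. The sketch below estimates a population's mean additive genetic value as twice the sum, over loci, of the GWAS effect size times the population frequency of the effect allele. The factor of 2 (two allele copies per diploid individual), the function names, and the data layout are assumptions of this example; the neutral drift model and the tests built on top of these values are not reproduced.

```python
def mean_genetic_value(effect_sizes, allele_freqs):
    """Estimated mean additive genetic value for one population.

    effect_sizes: additive effect of the effect allele at each locus.
    allele_freqs: frequency of that same allele in the population.
    The factor 2 assumes diploid individuals (an assumption of this sketch).
    """
    if len(effect_sizes) != len(allele_freqs):
        raise ValueError("one effect size is required per locus")
    return 2.0 * sum(a * p for a, p in zip(effect_sizes, allele_freqs))


def genetic_values_by_population(effect_sizes, freqs_by_population):
    """Apply mean_genetic_value to each population in a dict of frequency lists."""
    return {name: mean_genetic_value(effect_sizes, freqs)
            for name, freqs in freqs_by_population.items()}
```

Because each population's value pools small frequency differences across many loci, consistent shifts in the direction of the effect sizes add up even when no single locus stands out, which is the intuition behind the power gain described in the abstract.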

20.
Next-generation sequencing and phylogenomics hold great promise for elucidating complex relationships among large plant families. Here, we performed targeted capture of low copy sequences followed by next-generation sequencing on the Illumina platform in the large and diverse angiosperm family Compositae (Asteraceae). The family is monophyletic, based on morphology and molecular data, yet many areas of the phylogeny have unresolved polytomies and interpreting phylogenetic patterns has been historically difficult. In order to outline a method and provide a framework for future phylogenetic studies in the Compositae, we sequenced 23 taxa from across the family in which the relationships were well established as well as a member of the sister family Calyceraceae. We generated nuclear data from 795 loci and assembled chloroplast genomes from off-target capture reads, enabling the comparison of nuclear and chloroplast genomes for phylogenetic analyses. We also analyzed multi-copy nuclear genes in our data set using a clustering method during orthology detection, and we applied a network approach to these clusters, analyzing all related locus copies. Using these data, we produced hypotheses of phylogenetic relationships employing both a conservative (restricted to only loci with one copy per targeted locus) and a multigene approach (including all copies per targeted locus). The methods and bioinformatics workflow presented here provide a solid foundation for future work aimed at understanding gene family evolution in the Compositae as well as providing a model for phylogenomic analyses in other plant mega-families.
