Similar Articles
1.

Background  

Single nucleotide polymorphisms (SNPs) and genes that exhibit presence/absence variation have provided informative marker sets for bacterial and viral genotyping. Marker sets optimised for these purposes have been identified either by maximising generalized discriminatory power, as measured by Simpson's Index of Diversity, or by their ability to identify specific variants. Here we describe the Not-N algorithm, which is designed to identify small sets of genetic markers diagnostic for user-specified subsets of known genetic variants. The algorithm does not treat the user-specified subset and the remaining genetic variants equally. Rather, Not-N analysis is designed to underpin assays that yield 0% false negatives, which is essential for, e.g., diagnostic procedures targeting clinically significant subgroups within microbial species.
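Simpson's Index of Diversity has a standard closed form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total. The sketch below computes it and also illustrates the zero-false-negative idea with a simple greedy search. The greedy procedure and the names `simpsons_index` and `greedy_zero_fn_markers` are hypothetical, assuming binary markers and one required allele per chosen marker; the abstract does not describe how Not-N actually searches, so this is not the published algorithm.

```python
from collections import Counter


def simpsons_index(types):
    """Simpson's Index of Diversity: D = 1 - sum n_j(n_j - 1) / (N(N - 1))."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def greedy_zero_fn_markers(profiles, targets):
    """Hypothetical greedy sketch (NOT the published Not-N algorithm).

    profiles: dict mapping variant name -> tuple of 0/1 marker states.
    targets:  set of variant names the assay must always call positive.
    A variant is called positive only if it carries the chosen allele at every
    chosen marker. A marker is eligible only when all targets share one allele
    there, so no target can ever be called negative (0% false negatives).
    Returns (chosen (marker, allele) pairs, non-targets still not excluded).
    """
    n_markers = len(next(iter(profiles.values())))
    remaining = {v for v in profiles if v not in targets}
    chosen = []
    while remaining:
        best, best_excluded = None, set()
        for m in range(n_markers):
            alleles = {profiles[t][m] for t in targets}
            if len(alleles) != 1:
                continue  # targets disagree here: using m could cause false negatives
            allele = alleles.pop()
            excluded = {v for v in remaining if profiles[v][m] != allele}
            if len(excluded) > len(best_excluded):
                best, best_excluded = (m, allele), excluded
        if best is None:
            break  # no eligible marker excludes any further non-target
        chosen.append(best)
        remaining -= best_excluded
    return chosen, remaining


profiles = {"A": (1, 0, 1), "B": (1, 1, 1), "C": (0, 0, 1), "D": (1, 0, 0)}
print(greedy_zero_fn_markers(profiles, {"A", "B"}))  # ([(0, 1), (2, 1)], set())
```

Because a marker is eligible only when every target variant shares the same state at it, adding markers can only shrink the set of non-targets that still test positive; it never drops a target. The cost is that some non-targets may remain unexcluded (the second return value), i.e. false positives.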

2.
Many genetic variants that are significantly correlated with gene expression changes across human individuals have been identified, but the ability of these variants to predict expression in unseen individuals has rarely been evaluated. Here, we devise an algorithm that, given training expression and genotype data for a set of individuals, predicts gene expression in unseen test individuals from their genotypes alone in the local genomic vicinity of each predicted gene. Notably, the resulting predictions are remarkably robust in that they agree well between the training and test sets, even when the training and test sets consist of individuals from distinct populations. Thus, although the overall number of genes that can be predicted is relatively small, as expected from our choice to ignore effects such as environmental factors and trans sequence variation, the robust nature of the predictions means that the identity of the predictable genes, and the quantitative degree to which they can be predicted, is known in advance. We also present an extension that incorporates heterogeneous types of genomic annotations to differentially weight the importance of the various genetic variants, and we show that assigning higher weights to variants with particular annotations, such as proximity to genes and high regional G/C content, can further improve the predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted from their cis genetic variation.

5.
Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been treated as a potential confounder, and various methods exist to “correct” for population stratification. However, these methods apply a mean correction that does not account for heterogeneity of marker effects. A few recent studies in the animal breeding literature model genetic heterogeneity in multibreed data using multivariate models. However, these methods have received little attention in plant breeding, where population structure can take different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations ranged from null to moderate, depending on the genetic distance between subpopulations and on the trait. Our assessment of prediction accuracy features cases where ignoring population structure yields a more parsimonious and more powerful model, as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either A-GBLUP or W-GBLUP.
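All three GBLUP variants above rest on a genomic relationship matrix built from marker data. The abstract does not say which construction was used; a common choice is VanRaden's first method, G = ZZ' / (2 Σ_j p_j(1 - p_j)), where Z is the 0/1/2 genotype matrix centred by twice each marker's allele frequency p_j. The plain-Python sketch below assumes that construction; `vanraden_g` and `within_group_g` are hypothetical helper names. The stratified helper mirrors the W-GBLUP idea of analysing each subpopulation separately, while the between-group genetic covariance that MG-GBLUP estimates is not shown.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes: list of individuals, each a list of marker codes 0/1/2
    (copies of the counted allele). Z centres each column by 2 * p_j.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic in this sample")
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]


def within_group_g(genotypes, groups):
    """Stratified (W-GBLUP-style) kernels: one G per subpopulation, with
    allele frequencies re-estimated inside each group.

    groups: dict mapping group label -> list of row indices into genotypes.
    """
    return {g: vanraden_g([genotypes[i] for i in idx]) for g, idx in groups.items()}


print(vanraden_g([[0, 2], [2, 0]]))  # [[2.0, -2.0], [-2.0, 2.0]]
```

Re-estimating allele frequencies within each group is what makes the stratified kernels differ from simply sub-setting one across-group G; markers that are monomorphic inside a small group contribute nothing to that group's matrix.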

6.
Pathogens are believed to drive genetic diversity at host loci involved in immunity to infectious disease. To date, studies exploring the genetic basis of pathogen resistance in the wild have focussed almost exclusively on genes of the Major Histocompatibility Complex (MHC); the role of genetic variation elsewhere in the genome as a basis for variation in pathogen resistance has rarely been explored in natural populations. Cytokines are signalling molecules with a role in many immunological and physiological processes. Here we use a natural population of field voles (Microtus agrestis) to examine how genetic diversity at a suite of cytokine and other immune loci impacts the immune response phenotype and resistance to several endemic pathogen species. By using linear models to first control for a range of non-genetic factors, we demonstrate strong effects of genetic variation at cytokine loci both on host immunological parameters and on resistance to multiple pathogens. These effects were primarily localized to three cytokine genes (Interleukin 1 beta (Il1b), Il2, and Il12b), rather than to other cytokines tested, or to membrane-bound, non-cytokine immune loci. The observed genetic effects were as large as those of other intrinsic factors such as sex and body weight. Our results demonstrate that genetic diversity at cytokine loci is a novel and important source of individual variation in immune function and pathogen resistance in natural populations. The products of these loci are therefore likely to affect interactions between pathogens and help determine survival and reproductive success in natural populations. Our study also highlights the utility of wild rodents as a model of ecological immunology, to better understand the causes and consequences of variation in immune function in natural populations including humans.

7.
IBDSim is a package for simulating genotypic data under isolation by distance. It is based on a backward, generation-by-generation coalescent algorithm that accommodates various isolation-by-distance models with discrete subpopulations as well as continuous populations. Many dispersal distributions can be considered, as can spatial and temporal heterogeneity in demographic parameters. Typical applications of the program include (i) studying the effect of various sampling, mutational and demographic factors on the pattern of genetic variation; and (ii) producing test data sets to assess the influence of these factors on inferential methods for analysing genotypic data.

8.
Any given human individual carries multiple genetic variants that disrupt protein-coding genes, through structural variation as well as nucleotide variants and indels. Predicting the phenotypic consequences of a gene disruption remains a significant challenge. Current approaches use information from a range of biological networks to predict which human genes are haploinsufficient (meaning two copies are required for normal function) or essential (meaning at least one copy is required for viability). Using recently available study gene sets, we show that these approaches are strongly biased towards providing accurate predictions for well-studied genes. By contrast, we derive a haploinsufficiency score from a combination of unbiased large-scale high-throughput datasets, including gene co-expression and genetic variation in over 6000 human exomes. Compared with three commonly used approaches, our approach provides haploinsufficiency predictions for more than twice as many genes that are not yet associated with any paper listed in PubMed, and it outperforms those approaches at predicting haploinsufficiency for less-studied genes. We also show that fine-tuning the predictor on a set of well-studied ‘gold standard’ haploinsufficient genes does not improve the prediction for less-studied genes. This new score can readily be used to prioritize gene disruptions resulting from any genetic variant, including copy number variants, indels and single-nucleotide variants.

9.
For North American river otters (Lontra canadensis) in Louisiana, statewide distribution, availability of aquatic habitats, and the absence of physical barriers to dispersal might suggest that they exist as a large, panmictic population. However, the wide variety of habitat types in this region, and the dynamic nature of these habitats over time, could potentially structure river otter populations in accordance with cryptic landscape features. Recently developed landscape genetic models offer a spatially explicit approach that could be useful in identifying potential barriers to the movement of river otters through the dynamic aquatic landscape of Louisiana. We used georeferenced multilocus microsatellite genotypes in spatially implicit (STRUCTURE) and spatially explicit (GENELAND) models to characterize patterns of landscape genetic structure. All models identified 3 subpopulations of river otters in Louisiana, corresponding to Inland, Atchafalaya River, and Mississippi River regions. Variation in breeding seasonality, brought about by variation in prey abundance between inland and coastal populations, may have contributed to genetic differentiation among populations. It is also possible that the genetic discontinuities we observed indicate a correlation between otter distribution and access to freshwater. Regardless of the mechanism, it is likely that any genetic differentiation among subpopulations is exacerbated by relatively poor dispersal.

10.
Metagenomics holds the promise of greatly advancing the study of diversity in natural communities, but novel theoretical and methodological approaches must first be developed and adjusted for these data sets. We evaluated widely used macroecological metrics of taxonomic diversity on a simulated set of metagenomic samples, using phylogenetically meaningful protein-coding genes as ecological proxies. To our knowledge, this is the first approach of this kind to evaluate taxonomic diversity metrics derived from metagenomic data sets. We demonstrate that abundance matrices derived from protein-coding marker genes reproduce the structure of the original community more faithfully than those derived from SSU rRNA genes. We also found that the most commonly used diversity metrics are biased estimators of community structure and differ significantly from their corresponding true parameters, and that these biases are most likely caused by insufficient sampling and by differences in community phylogenetic composition. Our results suggest that ranking samples using multidimensional metrics provides a good qualitative alternative for contrasting community structure, and that these comparisons can be greatly improved by incorporating metrics of both community structure and phylogenetic diversity. These findings will help to achieve a standardized framework for community diversity comparisons derived from metagenomic data sets.
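The abstract does not list which metrics were evaluated. As an assumption, the sketch below computes a few widely used ones (richness, Shannon entropy, Gini-Simpson index, Pielou evenness) from one sample's abundance vector, such as counts of reads assigned to each taxon through a protein-coding marker gene, and ranks samples by a chosen metric. `diversity_metrics` and `rank_samples` are hypothetical names.

```python
import math


def diversity_metrics(counts):
    """Plug-in alpha-diversity metrics from one sample's abundance vector."""
    counts = [c for c in counts if c > 0]  # unobserved taxa contribute nothing
    total = sum(counts)
    props = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in props)  # natural-log Shannon H
    richness = len(counts)
    return {
        "richness": richness,
        "shannon": shannon,
        "gini_simpson": 1 - sum(p * p for p in props),
        # Pielou's evenness J = H / ln(S); undefined for a single taxon
        "pielou": shannon / math.log(richness) if richness > 1 else float("nan"),
    }


def rank_samples(samples, metric):
    """Order sample names from most to least diverse under one metric.

    samples: dict mapping sample name -> abundance vector.
    """
    scores = {name: diversity_metrics(c)[metric] for name, c in samples.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_samples({"even": [5, 5, 5, 5], "skewed": [17, 1, 1, 1]}, "shannon"))
# ['even', 'skewed']
```

These are plug-in estimates: with too few reads, rare taxa go unobserved and the Shannon estimate tends to be biased low, which is one concrete form of the undersampling bias described above.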

12.
Polymorphism for immune functions can explain significant variation in health and reproductive success within species. Drastic loss in genetic diversity at such loci constitutes an extinction risk and should be monitored in species of conservation concern. However, effective implementations of genome-wide immune polymorphism sets into high-throughput genotyping assays are scarce. Here, we report the design and validation of a microfluidics-based amplicon sequencing assay to comprehensively capture genetic variation in Alpine ibex (Capra ibex). This species represents one of the most successful large mammal restorations recovering from a severely depressed census size and a massive loss in diversity at the major histocompatibility complex (MHC). We analysed 65 whole-genome sequencing sets of the Alpine ibex and related species to select the most representative markers and to prevent primer binding failures. In total, we designed ~1,000 amplicons densely covering the MHC, further immunity-related genes as well as randomly selected genome-wide markers for the assessment of neutral population structure. Our analysis of 158 individuals shows that the genome-wide markers perform equally well at resolving population structure as RAD-sequencing or low-coverage genome sequencing data sets. Immunity-related loci show unexpectedly high degrees of genetic differentiation within the species. Such information can now be used to define highly targeted individual translocations. Our design strategy can be realistically implemented into genetic surveys of a large range of species. In conclusion, leveraging whole-genome sequencing data sets to design targeted amplicon assays allows the simultaneous monitoring of multiple genetic risk factors and can be translated into species conservation recommendations.

14.
Although the synthesis of cell wall polysaccharides is a critical process during plant cell growth and differentiation, many of the wall biosynthetic genes have not yet been identified. This review focuses on the synthesis of non-cellulosic matrix polysaccharides formed in the Golgi apparatus. Our consideration is limited to two types of plant cell wall biosynthetic enzymes: glycan synthases and glycosyltransferases. Classical means of identifying these enzymes and the genes that encode them rely on biochemical purification of enzyme activity to obtain amino acid sequence data that is then used to identify the corresponding gene. This type of approach is difficult, especially when acceptor substrates for activity assays are unavailable, as is the case for many enzymes. However, bioinformatics and functional genomics provide powerful alternative means of identifying and evaluating candidate genes. Database searches using various strategies and expression profiling can identify candidate genes. The involvement of these genes in wall biosynthesis can be evaluated using genetic, reverse genetic, biochemical, and heterologous expression methods. Recent advances using these methods are considered in this review.

15.
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.
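The block-free tagging-SNP approach can be illustrated with pairwise linkage disequilibrium (r²) and a greedy cover. This sketch assumes phased haplotypes coded 0/1 and a user-chosen r² threshold (0.8 here); `r_squared` and `greedy_tag_snps` are hypothetical names, and the study's actual tagging settings are not given in the abstract.

```python
def r_squared(haplotypes, i, j):
    """LD r^2 between biallelic SNPs i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    pa = sum(h[i] for h in haplotypes) / n
    pb = sum(h[j] for h in haplotypes) / n
    pab = sum(h[i] * h[j] for h in haplotypes) / n  # freq. of the 1-1 haplotype
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP has undefined r^2; treat as untaggable
    d = pab - pa * pb  # disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that captures the most
    still-untagged SNPs at r^2 >= threshold (each SNP always tags itself)."""
    n_snps = len(haplotypes[0])
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        best, best_cov = None, set()
        for s in range(n_snps):
            cov = {t for t in untagged
                   if t == s or r_squared(haplotypes, s, t) >= threshold}
            if len(cov) > len(best_cov):
                best, best_cov = s, cov
        tags.append(best)
        untagged -= best_cov
    return tags


haps = [[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
print(greedy_tag_snps(haps))  # [0, 2]
```

Because every SNP trivially tags itself, each round covers at least one SNP and the loop always terminates. Unlike block-based definitions, the result does not depend on where block borders fall, only on pairwise r² among the typed SNPs.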

16.
To control for hidden population stratification in genetic-association studies, statistical methods that use marker genotype data to infer population structure have been proposed as a possible alternative to family-based designs. In principle, it is possible to infer population structure from associations between marker loci and from associations of markers with the trait, even when no information about the demographic background of the population is available. In a model in which the total population is formed by admixture between two or more subpopulations, confounding can be estimated and controlled. Current implementations of this approach have limitations, the most serious of which is that they do not allow for uncertainty in estimations of individual admixture proportions or for lack of identifiability of subpopulations in the model. We describe methods that overcome these limitations by a combination of Bayesian and classical approaches, and we demonstrate the methods by using data from three admixed populations (African American, African Caribbean, and Hispanic American), in which there is extreme confounding of trait-genotype associations because the trait under study (skin pigmentation) varies with admixture proportions. In these data sets, as many as one-third of marker loci show crude associations with the trait. Control for confounding by population stratification eliminates these associations, except at loci that are linked to candidate genes for the trait. With only 32 markers informative for ancestry, the efficiency of the analysis is 70%. These methods can deal with both confounding and selection bias in genetic-association studies, making family-based designs unnecessary.
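The core confounding problem can be shown with a simple regression adjustment: a marker whose allele frequency tracks ancestry appears associated with a trait that also tracks ancestry, and that association vanishes once individual admixture proportion is conditioned on. This sketch is not the authors' combined Bayesian/classical method; it treats admixture proportions as known without error, which is exactly the limitation the abstract says their methods remove. Function names and the toy data are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residualize(v, covariate):
    """Residuals of v after simple linear regression on covariate."""
    b = ols_slope(covariate, v)
    mc, mv = sum(covariate) / len(covariate), sum(v) / len(v)
    return [vi - mv - b * (ci - mc) for vi, ci in zip(v, covariate)]


def adjusted_slope(genotype, trait, admixture):
    """Genotype effect on trait conditional on admixture proportion
    (Frisch-Waugh-Lovell: regress residualized trait on residualized genotype)."""
    rg = residualize(genotype, admixture)
    rt = residualize(trait, admixture)
    return sum(a * b for a, b in zip(rg, rt)) / sum(a * a for a in rg)


admixture = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
genotype = [0, 0, 1, 1, 2, 1]          # allele count rises with ancestry
trait = [3 * a for a in admixture]     # trait driven by ancestry alone
print(ols_slope(genotype, trait), adjusted_slope(genotype, trait, admixture))
```

Here the crude slope is clearly positive (about 1.16), while the adjusted slope is zero up to rounding error, because the trait carries no information beyond ancestry.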

18.
A greater understanding of the function of the human immune system at the single-cell level in healthy individuals is critical for discerning aberrant cellular behavior that occurs in settings such as autoimmunity, immunosenescence, and cancer. To achieve this goal, a systems-level approach capable of capturing the response of the interdependent immune cell types to external stimuli is required. In this study, an extensive characterization of signaling responses in multiple immune cell subpopulations within PBMCs from a cohort of 60 healthy donors was performed using single-cell network profiling (SCNP). SCNP is a multiparametric flow cytometry-based approach that enables the simultaneous measurement of basal and evoked signaling in multiple cell subsets within heterogeneous populations. In addition to establishing the interindividual degree of variation within a broad panel of immune signaling responses, the possible association of any observed variation with demographic variables including age and race was investigated. Using half of the donors as a training set, multiple age- and race-associated variations in signaling responses in discrete cell subsets were identified, and several were subsequently confirmed in the remaining samples (test set). Such associations may provide insight into age-related immune alterations associated with high infection rates and diminished protection following vaccination and into the basis for ethnic differences in autoimmune disease incidence and treatment response. SCNP allowed for the generation of a functional map of healthy immune cell signaling responses that can provide clinically relevant information regarding both the mechanisms underlying immune pathological conditions and the selection and effect of therapeutics.

19.
Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any network-discovery algorithm that uses directed probabilistic graphs requires perturbations, produced either experimentally or by naturally occurring genetic variation, to infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1) and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population-level genetic and gene expression data.
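The adaptive lasso penalises each coefficient with its own weight, w_j = 1/|b̂_j|^γ, taken from an initial fit, so strong effects are shrunk less and weak ones are removed more aggressively. The plain-Python sketch below shows only that feature-selection step for a single target gene (for example, regressing it on candidate regulators); it omits the cis-eQTL-based orientation of edges that the abstract describes. It assumes centred data with no intercept and uses an unpenalised least-squares fit for the initial weights (one common option among several); all function names are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, lam, weights, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j|.

    X: list of rows (samples x features); assumes centred data, no intercept.
    With lam = 0 this converges to the least-squares fit (full-rank X).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Partial-residual correlation for feature j (adds back its own fit).
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    resid[i] -= row[j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-stage adaptive lasso: least-squares initial fit, then
    per-feature weights 1 / (|b_init| + eps)^gamma."""
    p = len(X[0])
    b_init = weighted_lasso_cd(X, y, 0.0, [1.0] * p)
    weights = [1.0 / (abs(b) + eps) ** gamma for b in b_init]
    return weighted_lasso_cd(X, y, lam, weights)


X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]  # orthogonal columns
y = [3 * r[0] + 0.1 * r[2] for r in X]                  # y = 3*x1 + 0.1*x3
print(adaptive_lasso(X, y, lam=0.05))
print(weighted_lasso_cd(X, y, 0.05, [1.0, 1.0, 1.0]))   # plain lasso
```

On this toy design, a plain lasso at the same λ keeps a small x3 coefficient of about 0.05, whereas the adaptive version sets it to zero and shrinks the x1 coefficient less (to about 2.98 rather than 2.95).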

20.
Community association populations are composed of phenotypically and genetically diverse accessions. Once these populations are genotyped, the resulting marker data can be reused by different groups investigating the genetic basis of different traits. Because the same genotypes are observed and scored for a wide range of traits in different environments, these populations represent a unique resource to investigate pleiotropy. Here, we assembled a set of 234 separate trait datasets for the Sorghum Association Panel, a group of 406 sorghum genotypes widely employed by the sorghum genetics community. Comparison of genome-wide association studies (GWAS) conducted with two independently generated marker sets for this population demonstrates that existing genetic marker sets do not saturate the genome and likely capture only 35–43% of potentially detectable loci controlling variation for traits scored in this population. While limited evidence for pleiotropy was apparent in cross-GWAS comparisons, a multivariate adaptive shrinkage approach recovered both known pleiotropic effects of existing loci and new pleiotropic effects, particularly significant impacts of known dwarfing genes on root architecture. In addition, we identified new loci with pleiotropic effects consistent with known trade-offs in sorghum development. These results demonstrate the potential of mining existing trait datasets from widely used community association populations to enable new discoveries as new, denser genetic marker datasets are generated for these populations.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP Registration No. 09084417)