Similar literature
 Found 20 similar articles (search time: 62 ms)
1.
Whole‐genome duplications have occurred in the recent ancestors of many plants, fish and amphibians. Signals of these whole‐genome duplications still exist in the form of paralogous loci. Recent advances have allowed reliable identification of paralogs in genotyping‐by‐sequencing (GBS) data such as that generated from restriction‐site‐associated DNA sequencing (RADSeq); however, excluding paralogs from analyses is still routine due to difficulties in genotyping. This exclusion of paralogs may filter a large fraction of loci, including loci that may be adaptively important or informative for population genetic analyses. We present a maximum‐likelihood method for inferring allele dosage in paralogs and assess its accuracy using simulated GBS, empirical RADSeq and amplicon sequencing data from Chinook salmon. We accurately infer allele dosage for some paralogs from a RADSeq data set and show how accuracy is dependent upon both read depth and allele frequency. The amplicon sequencing data set, using RADSeq‐derived markers, achieved sufficient depth to infer allele dosage for all paralogs. This study demonstrates that RADSeq locus discovery combined with amplicon sequencing of targeted loci is an effective method for incorporating paralogs into population genetic analyses.
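The idea of inferring allele dosage by maximum likelihood can be illustrated with a minimal sketch. This is not the authors' method: it assumes a duplicated locus carrying four allele copies, a simple binomial read-sampling model and a fixed per-read error rate, and it ignores population allele frequency, which the abstract says also affects accuracy. The function names and the default parameters (`copies=4`, `error=0.01`) are hypothetical.

```python
from math import comb, log


def dosage_log_likelihoods(ref_reads, total_reads, copies=4, error=0.01):
    """Binomial log-likelihood of the read counts for each dosage 0..copies.

    Assumption: each read shows the reference allele with probability
    (d / copies), adjusted by a symmetric per-read error rate.
    """
    lls = []
    for d in range(copies + 1):
        frac = d / copies
        p = frac * (1 - error) + (1 - frac) * error
        ll = (log(comb(total_reads, ref_reads))
              + ref_reads * log(p)
              + (total_reads - ref_reads) * log(1 - p))
        lls.append(ll)
    return lls


def ml_dosage(ref_reads, total_reads, copies=4, error=0.01):
    """Return the dosage (number of reference copies) with the highest likelihood."""
    lls = dosage_log_likelihoods(ref_reads, total_reads, copies, error)
    return max(range(len(lls)), key=lls.__getitem__)


if __name__ == "__main__":
    # 10 reference reads out of 40 is closest to one copy in four.
    print(ml_dosage(10, 40))
```

Under this toy model the gaps between the per-dosage log-likelihoods shrink as `total_reads` falls, which is one way to see why read depth limits how reliably dosage can be called.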

2.
Variation in cognitive performance, which strongly predicts functional outcome in schizophrenia (SZ), has been associated with multiple immune‐relevant genetic loci. These loci include complement component 4 (C4A), structural variation at which was recently associated with SZ risk and synaptic pruning during neurodevelopment and cognitive function. Here, we test whether this genetic association with cognition and SZ risk is specific to C4A, or extends more broadly to genes related to the complement system. Using a gene‐set with an identified role in “complement” function (excluding C4A), we used MAGMA to test if this gene‐set was enriched for genes associated with human intelligence and SZ risk, using genome‐wide association summary statistics (IQ; N = 269 867, SZ; N = 105 318). We followed up this gene‐set analysis with a complement gene‐set polygenic score (PGS) regression analysis in an independent data set of patients with psychotic disorders and healthy participants with cognitive and genomic data (N = 1000). Enrichment analysis suggested that genes within the complement pathway were significantly enriched for genes associated with IQ, but not SZ. In a gene‐based analysis of 90 genes, SERPING1 was the most enriched gene for the phenotype of IQ. In a PGS regression analysis, we found that a complement pathway PGS associated with IQ genome‐wide association studies statistics also predicted variation in IQ in our independent sample. This association (observed across both patients and controls) remained significant after controlling for the relationship between C4A and cognition. These results suggest a robust association between the complement system and cognitive function, extending beyond structural variation at C4A.

3.
Individuals born of consanguineous union have segments of their genomes that are homozygous as a result of inheriting identical ancestral genomic segments through both parents. One consequence of this is an increased incidence of recessive disease within these sibships. Theoretical calculations predict that 6% (1/16) of the genome of a child of first cousins will be homozygous and that the average homozygous segment will be 20 cM in size. We assessed whether these predictions held true in populations that have preferred consanguineous marriage for many generations. We found that in individuals with a recessive disease whose parents were first cousins, on average, 11% of their genomes were homozygous (n = 38; range 5%-20%), with each individual bearing 20 homozygous segments exceeding 3 cM (n = 38; range of number of homozygous segments 7-32), and that the size of the homozygous segment associated with recessive disease was 26 cM (n = 100; range 5-70 cM). These data imply that prolonged parental inbreeding has led to a background level of homozygosity increased approximately 5% over and above that predicted by simple models of consanguinity. This has important clinical and research implications.
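The 1/16 figure quoted above follows from the standard path-counting formula for the inbreeding coefficient, F = sum over common ancestors of (1/2)^(n1 + n2 + 1), where n1 and n2 are the numbers of generations from each parent back to the shared ancestor. The sketch below assumes the common ancestors are themselves not inbred; the function name and the pedigree encoding are hypothetical and used only for illustration.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Sum (1/2)^(n1 + n2 + 1) over the loops through each common ancestor.

    paths: list of (n1, n2) pairs, the generations from the father and from
    the mother back to one shared ancestor (assumed not inbred).
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# First-cousin parents share two grandparents, each two generations back
# from both the father and the mother.
first_cousins = [(2, 2), (2, 2)]
F = inbreeding_coefficient(first_cousins)

if __name__ == "__main__":
    print(F, float(F))
```

For first cousins each of the two loops contributes 1/32, giving 1/16 (6.25%). The observed mean of 11% exceeds this by a little under 5 percentage points, in line with the abstract's estimate of the background homozygosity.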

4.
Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies have successfully detected many common causal variants, they are underpowered in identifying disease variants that are too rare or population-specific to be imputed from a general reference panel and thus are poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent. We have developed an algorithm, DASH, which builds upon pairwise identical-by-descent shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph with nodes representing individuals and links on the basis of such segments spanning a locus and uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations we show this approach to be significantly more powerful than single-marker testing in an isolated population that is from Kosrae, Federated States of Micronesia and has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations, five such loci in the WTCCC data and ten in the isolated population, that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequence of the cluster carriers.

5.
6.
The analysis of contiguous homozygosity (runs of homozygous loci) in human genotyping datasets is critical in the search for causal disease variants in monogenic disorders, studies of population history and the identification of targets of natural selection. Here, we report methods for extracting homozygous segments from high-density genotyping datasets, quantifying their local genomic structure, identifying outstanding regions within the genome and visualizing results for comparative analysis between population samples.
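Extracting runs of homozygous loci can be sketched as a single pass over ordered genotypes. The abstract does not describe the authors' algorithm, so this is only an illustration under stated assumptions: genotypes are coded 0, 1 or 2 copies of the alternate allele (1 meaning heterozygous), markers are sorted by position on one chromosome, there are no missing calls, and `min_snps` is a hypothetical threshold.

```python
def runs_of_homozygosity(genotypes, positions, min_snps=3):
    """Return (start_pos, end_pos, n_snps) for each run of homozygous markers.

    genotypes: list of 0/1/2 calls, where 1 is heterozygous.
    positions: marker coordinates, same length and order as genotypes.
    min_snps:  shortest run (in markers) that is reported.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run, if any.
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(genotypes) - start))
    return runs


if __name__ == "__main__":
    calls = [0, 0, 2, 1, 2, 2, 1, 0, 0, 0, 0]
    coords = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
    print(runs_of_homozygosity(calls, coords))
```

A practical tool would also need to tolerate occasional genotyping errors and missing calls inside a run, which this sketch deliberately leaves out.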

7.
A recent approach for gene mapping based on confidence set inference (CSI) promises several advantages, including avoidance of corrections for multiple tests, availability of confidence intervals with known statistical properties, and sufficient localizations of disease genes. This paper proposes an extended CSI procedure that can handle markers with incomplete polymorphism, thereby increasing the applicability of the set of CSI methods in practical situations. Simulation studies show that the new procedure retains the main advantages of the original CSI. Although it generally requires more data to achieve a similar power, this increase is moderate for markers with 80% heterozygosity or higher. We also investigate the effects of relative risk estimates and disease models. Our analyses show that perturbation from actual relative risks or multilocus disease models generally leads to reduction in power or inflation in type I error, as expected. Nevertheless, for certain classes of two-locus disease models, CSI can still perform well, with reasonably high actual coverage probabilities for at least one of the disease loci. Application of CSI to the data provided by the Genetic Analysis Workshop 13 yields encouraging results, as they compare favorably to those obtained from GENEHUNTER using its NPL sib-pair method.

8.
Xu C, Li Z, Xu S. Genetics, 2005, 169(2): 1045-1059
Joint mapping for multiple quantitative traits has shed new light on genetic mapping by pinpointing pleiotropic effects and close linkage. Joint mapping also can improve statistical power of QTL detection. However, such a joint mapping procedure has not been available for discrete traits. Most disease resistance traits are measured as one or more discrete characters. These discrete characters are often correlated. Joint mapping for multiple binary disease traits may provide an opportunity to explore pleiotropic effects and increase the statistical power of detecting disease loci. We develop a maximum-likelihood method for mapping multiple binary traits. We postulate a set of multivariate normal disease liabilities, each contributing to the phenotypic variance of one disease trait. The underlying liabilities are linked to the binary phenotypes through some underlying thresholds. The new method actually maps loci for the variation of multivariate normal liabilities. As a result, we are able to take advantage of existing methods of joint mapping for quantitative traits. We treat the multivariate liabilities as missing values so that an expectation-maximization (EM) algorithm can be applied here. We also extend the method to joint mapping for both discrete and continuous traits. Efficiency of the method is demonstrated using simulated data. We also apply the new method to a set of real data and detect several loci responsible for blast resistance in rice.

9.
Modern analytical methods for population genetics and phylogenetics are expected to provide more accurate results when data from multiple genome‐wide loci are analysed. We present the results of an initial application of parallel tagged sequencing (PTS) on a next‐generation platform to sequence thousands of barcoded PCR amplicons generated from 95 nuclear loci and 93 individuals sampled across the range of the tiger salamander (Ambystoma tigrinum) species complex. To manage the bioinformatic processing of this large data set (344 330 reads), we developed a pipeline that sorts PTS data by barcode and locus, identifies high‐quality variable nucleotides and yields phased haplotype sequences for each individual at each locus. Our sequencing and bioinformatic strategy resulted in a genome‐wide data set with relatively low levels of missing data and a wide range of nucleotide variation. STRUCTURE analyses of these data in a genotypic format resulted in strongly supported assignments for the majority of individuals into nine geographically defined genetic clusters. Species tree analyses of the most variable loci using a multi‐species coalescent model resulted in strong support for most branches in the species tree; however, analyses including more than 50 loci produced parameter sampling trends that indicated a lack of convergence on the posterior distribution. Overall, these results demonstrate the potential for amplicon‐based PTS to rapidly generate large‐scale data for population genetic and phylogenetic‐based research.

10.
Druet T, Gautier M. Molecular Ecology, 2017, 26(20): 5820-5841
Inbreeding results from the mating of related individuals and may be associated with reduced fitness because it brings together deleterious variants in one individual. In general, inbreeding is estimated with respect to an arbitrary base population consisting of ancestors that are assumed unrelated. We herein propose a model‐based approach to estimate and characterize individual inbreeding at both global and local genomic scales by assuming the individual genome is a mosaic of homozygous‐by‐descent (HBD) and non‐HBD segments. The HBD segments may originate from ancestors tracing back to different periods in the past defining distinct age‐related classes. The lengths of the HBD segments are exponentially distributed with class‐specific parameters reflecting that inbreeding of older origin generates on average shorter stretches of observed homozygous markers. The model is implemented in a hidden Markov model framework that uses marker allele frequencies, genetic distances, genotyping error rates and the sequences of observed genotypes. Note that genotyping errors, low‐fold sequencing or genotype‐by‐sequencing data are easily accommodated under this framework. Based on simulations under the inference model, we show that the genomewide inbreeding coefficients and the parameters of the model are accurately estimated. In addition, when several inbreeding classes are simulated, the model captures them if their ages are sufficiently different. Complementary analyses, either on data sets simulated under more realistic models or on human, dog and sheep real data, illustrate the range of applications of the approach and how it can reveal recent demographic histories among populations (e.g., very recent bottlenecks or founder effects). The method also allows clear identification of individuals resulting from extreme consanguineous matings.

11.
Sets of substitution lines have advantages over segregating populations for the rigorous analysis of loci influencing quantitative traits. A general strategy for the rapid production of substitution lines was developed. It involved the systematic application of marker-assisted selection over 2-4 generations of backcrossing. The effectiveness of this strategy was demonstrated by the production of intervarietal substitution lines in Brassica napus. A genetic map containing 158 loci, distributed across all 19 B. napus linkage groups and assayed in 200 B1 individuals, was generated. Six complementary B1 individuals enriched for recurrent genotype and collectively carrying almost all the donor genome were selected. A total of 288 B2 plants derived from the selected B1 individuals were analysed and complementary individuals carrying five or fewer donor segments were identified. Similar selection, carried out on 250 B3 plants from two distinct B1 lineages, identified 74 B3 individuals carrying one or two donor segments. Together, 12 of these isolated segments represented 33% of the mapped genome. Lines homozygous for single substituted segments were derived from selfed progeny of selected B3 plants. A full set of substitution lines will be used to elucidate the genetic control of quantitative production traits in oilseed rape over several environments. Key words: QTL mapping, quantitative genetics, backcross, genetic linkage map, plant breeding, restriction fragment length polymorphism.

12.
Ongoing hybridization and retained ancestral polymorphism in rapidly radiating lineages could mask recent cladogenetic events. This presents a challenge for the application of molecular phylogenetic methods to resolve differences between closely related taxa. We reanalyzed published genotyping‐by‐sequencing (GBS) data to infer the phylogeny of four species within the Ophrys sphegodes complex, a recently radiated clade of orchids. We used different data filtering approaches to detect different signals contained in the dataset generated by GBS and estimated their effects on maximum likelihood trees, global FST and bootstrap support values. We obtained a maximum likelihood tree with high bootstrap support, separating the species by using a large dataset based on loci shared by at least 30% of accessions. Bootstrap and FST values progressively decreased when filtering for loci shared by a higher number of accessions. However, when filtering more stringently to retain homozygous and organellar loci, we identified two main clades. These clades group individuals independently from their a priori species assignment, but were associated with two organellar haplotype clusters. We infer that a less stringent filtering preferentially selects for rapidly evolving lineage‐specific loci, which might better delimit lineages. In contrast, when using homozygous/organellar DNA loci the signature of a putative hybridization event in the lineage prevails over the most recent phylogenetic signal. These results show that using differing filtering strategies on GBS data could dissect the organellar and nuclear DNA phylogenetic signal and yield novel insights into relationships between closely related species.

13.
Array-based comparative genomic hybridization (arrayCGH) is a microarray-based comparative genomic hybridization technique that has been used to compare tumor genomes with normal genomes, thus providing rapid genomic assays of tumor genomes in terms of copy-number variations of those chromosomal segments that have been gained or lost. When properly interpreted, these assays are likely to shed important light on genes and mechanisms involved in the initiation and progression of cancer. Specifically, chromosomal segments, deleted in one or both copies of the diploid genomes of a group of patients with cancer, point to locations of tumor-suppressor genes (TSGs) implicated in the cancer. In this study, we focused on automatic methods for reliable detection of such genes and their locations, and we devised an efficient statistical algorithm to map TSGs, using a novel multipoint statistical score function. The proposed algorithm estimates the location of TSGs by analyzing segmental deletions (hemi- or homozygous) in the genomes of patients with cancer and the spatial relation of the deleted segments to any specific genomic interval. The algorithm assigns, to an interval of consecutive probes, a multipoint score that parsimoniously captures the underlying biology. It also computes a P value for every putative TSG by using concepts from the theory of scan statistics. Furthermore, it can identify smaller sets of predictive probes that can be used as biomarkers for diagnosis and therapeutics. We validated our method using different simulated artificial data sets and one real data set, and we report encouraging results. We discuss how, with suitable modifications to the underlying statistical model, this algorithm can be applied generally to a wider class of problems (e.g., detection of oncogenes).

14.
Lo SH, Zheng T. Human Heredity, 2002, 53(4): 197-215
The mapping of complex traits is one of the most important and central areas of human genetics today. Recent attention has been focused on genome scans using a large number of marker loci. Because complex traits are typically caused by multiple genes, the common approaches of mapping them by testing markers one after another fail to capture the substantial information of interactions among disease loci. Here we propose a backward haplotype transmission association (BHTA) algorithm to address this problem. The algorithm can administer a screening on any disease model when case-parent trio data are available. It identifies the important subset of an original larger marker set by eliminating the markers of least significance, one at a time, after a complete evaluation of its importance. In contrast with the existing methods, three major advantages emerge from this approach. First, it can be applied flexibly to arbitrary markers, regardless of their locations. Second, it takes into account haplotype information; it is more powerful in detecting the multifactorial traits in the presence of haplotypic association. Finally, the proposed method can potentially prove to be more efficient in future genomewide scans, in terms of greater accuracy of gene detection and substantially reduced number of tests required in scans. We illustrate the performance of the algorithm with several examples, including one real data set with 31 markers for a study on the Gilles de la Tourette syndrome. Detailed theoretical justifications are also included, which explain why the algorithm is likely to select the 'correct' markers.

15.
We have simulated the evolution of sexually reproducing populations composed of individuals represented by diploid genomes. A series of eight bits formed an allele occupying one of 128 loci of one haploid genome (chromosome). The environment required a specific activity of each locus, this being the sum of the activities of both alleles located at the corresponding loci on two chromosomes. This activity is represented by the number of bits set to zero. In a constant environment the best fitted individuals were homozygous with alleles’ activities corresponding to half of the environment requirement for a locus (in diploid genome two alleles at corresponding loci produced a proper activity). Changing the environment under a relatively low recombination rate promotes generation of more polymorphic alleles. In the heterozygous loci, alleles of different activities complement each other fulfilling the environment requirements. Nevertheless, the genetic pool of populations evolves in the direction of a very restricted number of complementing haplotypes and a fast changing environment kills the population. If simulations start with all loci heterozygous, they stay heterozygous for a long time.
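The allele encoding described above can be made concrete with a small sketch. Only the encoding comes from the abstract (an eight-bit allele whose activity is its number of zero bits, and a locus activity equal to the sum over both alleles). The `mismatch` score is an assumption introduced here as a stand-in for fitness, since the abstract does not give the fitness function, and all function names are hypothetical.

```python
def allele_activity(allele):
    """Activity of an 8-bit allele: the number of bits set to zero."""
    return 8 - bin(allele & 0xFF).count("1")


def locus_activity(allele_a, allele_b):
    """Activity of a diploid locus: the sum of both alleles' activities."""
    return allele_activity(allele_a) + allele_activity(allele_b)


def mismatch(chrom_a, chrom_b, required):
    """Total absolute deviation from the environment's required activities.

    Assumption: used here only as an illustrative fitness proxy
    (0 means every locus meets the requirement exactly).
    """
    return sum(abs(locus_activity(a, b) - r)
               for a, b, r in zip(chrom_a, chrom_b, required))


if __name__ == "__main__":
    # Homozygous locus: each allele supplies half of a requirement of 8.
    print(locus_activity(0b11110000, 0b11110000))
    # Heterozygous locus: alleles with 2 and 6 zero bits complement each other.
    print(locus_activity(0b11111100, 0b11000000))
```

Both example loci reach an activity of 8, which illustrates how a homozygous pair of "half-requirement" alleles and a pair of complementing heterozygous alleles can satisfy the same environment.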

16.
The maintenance or breakdown of reproductive isolation is an observable outcome of secondary contact between species. In cases where hybrids beyond the F1 are formed, the representation of each species' ancestry can vary dramatically among genomic regions. This genomic heterogeneity in ancestry and introgression can offer insight into evolutionary processes, particularly if introgression is compared in multiple hybrid zones. Similarly, considerable heterogeneity exists across the genome in the extent to which populations and species have diverged, reflecting the combined effects of different evolutionary processes on genetic variation. We studied hybridization across two hybrid zones of two phenotypically well‐differentiated bird species in Mexico (Pipilo maculatus and P. ocai), to investigate genomic heterogeneity in differentiation and introgression. Using genotyping‐by‐sequencing (GBS) and hierarchical Bayesian models, we genotyped 460 birds at over 41 000 single nucleotide polymorphism (SNP) loci. We identified loci exhibiting extreme introgression relative to the genome‐wide expectation using a Bayesian genomic cline model. We also estimated locus‐specific FST and identified loci with exceptionally high genetic divergence between the parental species. We found some concordance of locus‐specific introgression in the two independent hybrid zones (6–20% of extreme loci shared across zones), reflecting areas of the genome that experience similar gene flow when the species interact. Additionally, heterogeneity in introgression and divergence across the genome revealed another subset of loci under the influence of locally specific factors. These results are consistent with a history in which reproductive isolation has been influenced by a common set of loci in both hybrid zones, but where local environmental and stochastic factors also lead to genomic differentiation.

17.
Although tomato has been the subject of extensive quantitative trait loci (QTLs) mapping experiments, most of this work has been conducted on transient populations (e.g., F2 or backcross) and few homozygous, permanent mapping populations are available. To help remedy this situation, we have developed a set of inbred backcross lines (IBLs) from the interspecific cross between Lycopersicon esculentum cv. E6203 and L. pimpinellifolium (LA1589). A total of 170 BC2F1 plants were selfed for five generations to create a set of homozygous BC2F6 lines by single-seed descent. These lines were then genotyped for 127 marker loci covering the entire tomato genome. These IBLs were evaluated for 22 quantitative traits. In all, 71 significant QTLs were identified, 15% (11/71) of which mapped to the same chromosomal positions as QTLs identified in earlier studies using the same cross. For 48% (34/71) of the detected QTLs, the wild allele was associated with improved agronomic performance. A number of new QTLs were identified including several of significant agronomic importance for tomato production: fruit shape, firmness, fruit color, scar size, seed and flower number, leaf curliness, plant growth, fertility, and flowering time. To improve the utility of the IBL population, a subset of 100 lines giving the most uniform genome coverage and map resolution was selected using a randomized greedy algorithm as implemented in the software package MapPop (http://www.bio.unc.edu/faculty/vision/lab/mappop/). The map, phenotypic data, and seeds for the IBL population are publicly available (http://soldb.cit.cornell.edu) and will provide tomato geneticists and breeders with a genetic resource for mapping, gene discovery, and breeding.

18.
For complex diseases, recent interest has focused on methods that take into account joint effects at interacting loci. Conditioning on effects of disease loci at known locations can lead to increased power to detect effects at other loci. Moreover, use of joint models allows investigation of the etiologic mechanisms that may be involved in the disease. Here we present a method for simultaneous analysis of the joint genetic effects at several loci that uses affected relative pairs. The method is a generalization of the two-locus LOD-score analysis for affected sib pairs proposed by Cordell et al. We derive expressions for the relative risk, λR, to a relative of an affected individual, in terms of the additive and epistatic components of variance at an arbitrary number of disease loci, and we show how these can be used to fit a likelihood model to the identity-by-descent sharing among pairs of affected relatives in extended pedigrees. We implement the method by use of a stepwise strategy in which, given evidence of linkage to disease at m-1 locations on the genome, we calculate the conditional likelihood curve across the genome for an mth disease locus, using multipoint methods similar to those proposed by Kruglyak et al. We evaluate the properties of our method by use of simulated data and present an application to real data from families with insulin-dependent diabetes mellitus.

19.
Gompert Z, Buerkle CA. Molecular Ecology, 2011, 20(10): 2111-2127
We developed a Bayesian genomic cline model to study the genetic architecture of adaptive divergence and reproductive isolation between hybridizing lineages. This model quantifies locus‐specific patterns of introgression with two cline parameters that describe the probability of locus‐specific ancestry as a function of genome‐wide admixture. ‘Outlier’ loci with extreme patterns of introgression relative to most of the genome can be identified. These loci are potentially associated with adaptive divergence or reproductive isolation. We simulated genetic data for admixed populations that included neutral introgression, as well as loci that were subject to directional, epistatic or underdominant selection, and analysed these data using the Bayesian genomic cline model. Under many demographic conditions, underdominance or directional selection had detectable and predictable effects on cline parameters, and ‘outlier’ loci were greatly enriched for genetic regions affected by selection. We also analysed previously published genetic data from two transects through a hybrid zone between Mus domesticus and M. musculus. We found considerable variation in rates of introgression across the genome and particularly low rates of introgression for two X‐linked markers. There were similarities and differences in patterns of introgression between the two transects, which likely reflects a combination of stochastic variability because of genetic drift and geographic variation in the genetic architecture of reproductive isolation. By providing a robust framework to quantify and compare patterns of introgression among genetic regions and populations, the Bayesian genomic cline model will advance our understanding of the genetics of reproductive isolation and the speciation process.

20.
Genomic data sets are increasingly central to ecological and evolutionary biology, but far fewer resources are available for invertebrates. Powerful new computational tools and the rapidly decreasing cost of Illumina sequencing are beginning to change this, enabling rapid genome assembly and reference marker extraction. We have developed and tested a practical workflow for developing genomic resources in nonmodel groups with real‐world data on Collembola (springtails), one of the most dominant soil animals on Earth. We designed universal molecular marker sets, single‐copy orthologues (BUSCOs) and ultraconserved elements (UCEs), using three existing and 11 newly generated genomes. Both marker types were tested in silico via marker capture success and phylogenetic performance. The new genomes were assembled with Illumina short reads and 9,585–14,743 protein‐coding genes were predicted with ab initio and protein homology evidence. We identified 1,997 benchmarking universal single‐copy orthologues (BUSCOs) across 14 genomes and created and assessed a custom BUSCO data set for extracting single‐copy genes. We also developed a new UCE probe set containing 46,087 baits targeting 1,885 loci. We successfully captured 1,437–1,865 BUSCOs and 975–1,186 UCEs across 14 genomes. Phylogenomic reconstructions using these markers proved robust, giving new insight on deep‐time collembolan relationships. Our study demonstrates the feasibility of generating thousands of universal markers from highly efficient whole‐genome sequencing, providing a valuable resource for genome‐scale investigations in evolutionary biology and ecology.
