Similar articles
Found 20 similar articles; search took 15 ms
1.
2.
Genomic structural variations represent an important source of genetic variation in mammalian genomes and are therefore commonly related to phenotypic expression. In this work, ∼770,000 single nucleotide polymorphism genotypes from 506 animals from 19 cattle breeds were analyzed. A simple LD-based structural variation was defined, and a genome-wide analysis was performed. After applying quality control filters, for each breed and each chromosome we calculated the short-range (≤100 Kb) linkage disequilibrium (r²). We sorted SNP pairs by distance and obtained a set of LD means (called the expected means) using bins of 5 Kb. Across the 19 breeds we identified 15,246 segments of at least 1 Kb, each consisting of at least 3 adjacent SNPs such that, for each SNP, the r² values with its neighbors within 100 Kb to the right of that SNP were all larger than, or all smaller than, the corresponding expected mean, and their P-values were significant after a Benjamini-Hochberg multiple testing correction. In addition, to account only for homogeneously distributed regions, we considered only SNPs having at least 15 SNP neighbors within 100 Kb. We defined such segments as structural variations. By grouping all variations across all animals in the sample we defined 9,146 regions, involving a total of 53,137 SNPs and representing 6.40% (160.98 Mb) of the bovine genome. The identified structural variations covered 3,109 genes. Clustering analysis showed the relatedness of breeds according to the geographic region in which they are evolving. In summary, we present an analysis of structural variations based on the deviation from the expected short-range LD between SNPs in the bovine genome. With an intuitive and simple definition based only on SNP data, it was possible to discern the closeness of breeds through grouping by the geographic region in which they are evolving.
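The "expected means" step in the abstract above (pairwise r² for SNPs up to 100 Kb apart, averaged in 5 Kb distance bins) can be sketched as follows. This is an illustration only: the function names are hypothetical, genotypes are assumed to be coded 0/1/2, and r² is approximated by the squared genotype correlation, which the abstract does not specify. The significance testing and Benjamini-Hochberg correction are not reproduced.

```python
import numpy as np

def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return float("nan")  # monomorphic SNP: r^2 is undefined
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def expected_ld_by_bin(positions, genotypes, max_dist=100_000, bin_size=5_000):
    """Mean r^2 of SNP pairs at most max_dist bp apart, grouped by distance bin.

    positions: sorted base-pair coordinates, one per SNP
    genotypes: array of shape (n_snps, n_animals), coded 0/1/2
    Returns {bin_index: mean r^2}; bin k covers [k*bin_size, (k+1)*bin_size).
    """
    genotypes = np.asarray(genotypes)
    last_bin = max_dist // bin_size - 1
    sums, counts = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = int(positions[j] - positions[i])
            if dist > max_dist:
                break  # positions are sorted, so later SNPs are even farther
            r2 = r_squared(genotypes[i], genotypes[j])
            if np.isnan(r2):
                continue
            k = min(dist // bin_size, last_bin)  # a pair at exactly max_dist joins the last bin
            sums[k] = sums.get(k, 0.0) + r2
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```

Each observed r² can then be compared with the expected mean of its distance bin to decide whether a SNP's neighborhood lies consistently above or below expectation.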

3.
Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in these fields, most studies have paid little attention to the modeling of hybridization of probes to off-target allele sequences, which can affect the accuracy greatly. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding through binding free energy and then binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the estimation of ACs at a given SNP through statistical regression. Furthermore, we demonstrate with cell-line data of known true copy numbers that the PICR model can achieve reasonable accuracy in copy number estimation at a single SNP locus, by using the ratio of the estimated AC of each sample to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even across array platforms.

4.
One of the persistent challenges of genetic association studies is the replication of genetic marker-disease associations across ethnic groups. Here, we conducted high-density association mapping of PARK2/PACRG SNPs with leprosy and identified 69 SNPs significantly associated with leprosy in 198 single-case Vietnamese leprosy families. A total of 56 associated SNPs localized to the overlapping promoter regions of PARK2/PACRG. For this region, multivariate analysis identified four SNPs belonging to two major SNP bins (rs1333955, rs7744433) and two single SNP bins (rs2023004, rs6936895) that capture the combined statistical evidence (P = 1.1 × 10⁻⁵) for association among Vietnamese patients. Next, we enrolled a case–control sample of 364 leprosy cases and 370 controls from Northern India. We genotyped all subjects for 149 SNPs that capture >80% of the genetic variation in the Vietnamese sample and found 24 SNPs significantly associated with leprosy. Multivariate analysis identified three SNPs (rs1333955, rs9356058 and rs2023004) that capture the association with leprosy (P < 10⁻⁸). Hence, two SNPs (rs1333955 and rs2023004) were replicated by multivariate analysis between both ethnic groups. Marked differences in the linkage disequilibrium pattern explained some of the differences in univariate analysis between the two ethnic groups. In addition, the strength of association for two promoter region SNP bins was significantly stronger among young leprosy patients in the Vietnamese sample. The same trend was observed in the Indian sample, but due to the higher age-at-diagnosis of the patients the age effect was less pronounced.

5.
Threespine stickleback populations are model systems for studying adaptive evolution and the underlying genetics. In lakes on the Haida Gwaii archipelago (off western Canada), stickleback have undergone a remarkable local radiation and show phenotypic diversity matching that seen throughout the species distribution. To provide a historical context for this radiation, we surveyed genetic variation at >1000 single nucleotide polymorphism (SNP) loci in stickleback from over 100 populations. SNPs included markers evenly distributed throughout the genome and candidate SNPs tagging adaptive genomic regions. Based on evenly distributed SNPs, the phylogeographic pattern differs substantially from the disjunct pattern previously observed between two highly divergent mtDNA lineages. The SNP tree instead shows extensive within-watershed population clustering and different watersheds separated by short branches deep in the tree. These data are consistent with separate colonizations of most watersheds, despite underlying genetic connections between some independent drainages. This supports previous suppositions that morphological diversity observed between watersheds has been shaped independently, with populations exhibiting complete loss of lateral plates and giant size each occurring in several distinct clades. Throughout the archipelago, we see repeated selection of SNPs tagging candidate freshwater adaptive variants at several genomic regions differentiated between marine–freshwater populations on a global scale (e.g. EDA, Na/K ATPase). In estuarine sites, both marine and freshwater allelic variants were commonly detected. We also found typically marine alleles present in a few freshwater lakes, especially those with completely plated morphology. These results provide a general model for postglacial colonization of freshwater habitat by sticklebacks and illustrate the tremendous potential that genome-wide SNP data sets hold for resolving patterns and processes underlying recent adaptive divergences.

6.
Han F, Pan W. Biometrics, 2012, 68(1): 307-315
Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.
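The generative idea in the abstract above, discrete SNP genotypes obtained by discretizing a latent multivariate Gaussian, might look like the sketch below. The cut points (0.0 and 1.0), the function names and the 0/1/2 coding are assumptions made for illustration; the abstract does not give the thresholds, and the composite likelihood estimation is not attempted here.

```python
import numpy as np

def discretize_latent(latent, thresholds=(0.0, 1.0)):
    """Map latent Gaussian values to genotype codes 0, 1 or 2.

    Values below thresholds[0] become 0, values from thresholds[0] up to
    (but not including) thresholds[1] become 1, and the rest become 2.
    The thresholds are hypothetical.
    """
    return np.digitize(latent, thresholds)

def simulate_genotypes(mean, cov, n_subjects, thresholds=(0.0, 1.0), seed=0):
    """Draw latent multivariate Gaussian vectors and discretize them per SNP."""
    rng = np.random.default_rng(seed)
    latent = rng.multivariate_normal(mean, cov, size=n_subjects)
    return discretize_latent(latent, thresholds)

# Two SNPs in linkage disequilibrium, modeled by a latent correlation of 0.8.
genotypes = simulate_genotypes([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], n_subjects=50)
```

Under this view, cases and controls would differ only through the latent mean vector and covariance matrix, which is what makes testing those two sets of parameters sufficient.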

7.
This study explored a semi-parametric method built upon reproducing kernels for estimating and testing the joint effect of a set of single nucleotide polymorphisms (SNPs). The kernel adopted is the identity-by-state kernel that measures SNP similarity between subjects. In this article, through simulations we first assessed its statistical power under different situations. It was found that in addition to the effect of sample size, the testing power was impacted by the strength of association between SNPs and the outcome of interest, and by the SNP similarity among the subjects. A quadratic relationship between SNP similarity and testing power was identified, and this relationship was further affected by sample sizes. Next we applied the method to a SNP-lung function data set to estimate and test the joint effect of a set of SNPs on forced vital capacity, one type of lung function measure. The findings were then connected to the patterns observed in simulation studies and further explored via variable importance indices of each SNP inferred from a variable selection procedure.
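One common form of the identity-by-state (IBS) kernel named in the abstract above is shown below as a sketch. It assumes genotypes coded as 0/1/2 allele counts and takes the similarity of two subjects to be the average fraction of alleles they share across SNPs; the abstract does not state which variant of the kernel the authors used, and the function name is hypothetical.

```python
import numpy as np

def ibs_kernel(genotypes):
    """Pairwise identity-by-state similarity between subjects.

    genotypes: array of shape (n_subjects, n_snps), coded 0/1/2.
    Entry (i, j) is sum over SNPs of (2 - |g_i - g_j|), divided by 2 * n_snps,
    so it is 1.0 for identical genotype vectors and 0.0 when subjects are
    opposite homozygotes at every SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]
    diff = np.abs(g[:, None, :] - g[None, :, :])  # shape (n, n, n_snps)
    return (2.0 - diff).sum(axis=2) / (2.0 * n_snps)

kernel = ibs_kernel([[0, 1, 2], [0, 1, 2], [2, 1, 0]])
```

The resulting matrix is symmetric with ones on the diagonal, and its off-diagonal entries are the "SNP similarity among the subjects" that the simulations relate to testing power.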

8.
We searched for SNPs in 417 regions distributed throughout the genome of three Oryza sativa ssp. japonica cultivars, two indica cultivars, and a wild rice (O. rufipogon). We found 2800 SNPs in approximately 250,000 aligned bases for an average of one SNP every 89 bp, or one SNP every 232 bp between two randomly selected strains. Graphic representation of the frequency of SNPs along each chromosome showed uneven distribution of polymorphism-rich and -poor regions, but little obvious association with the centromere or telomere. The 94 SNPs that we found between the closely related cultivars 'Nipponbare' and 'Koshihikari' can be converted into molecular markers. Our establishment of 213 co-dominant SNP markers distributed throughout the genome illustrates the immense potential of SNPs as molecular markers not only for genome research, but also for molecular breeding of rice.

9.
We have developed a genotyping system for detecting genetic contamination in the laboratory mouse based on assaying single-nucleotide polymorphism (SNP) markers positioned on all autosomes and the X chromosome. This system provides a fast, reliable, and cost-effective way for genetic monitoring, while maintaining a very high degree of confidence. We describe the allelic distribution of 235 SNPs in 48 mouse strains, thereby creating a database of polymorphisms useful for genotyping purposes. The SNP markers used in this study were chosen from publicly available SNP databases. Four genotyping methods were evaluated, and dynamic two-tube allele-specific PCR assays were developed for each marker and tested on a set of 48 inbred mouse strains. The minimal number of assays sufficient to distinguish groups consisting of different numbers of mouse strains was estimated, and a panel of 28 SNPs sufficient to distinguish virtually all of the inbred strains tested was selected. Amplifluor SNP detection assays were developed for these markers and tested on an extended list of 96 strains. This panel was used as a genetic quality control approach to monitor the genotypes of nearly 300 inbred, wild-derived, congenic, consomic, and recombinant inbred strains maintained at The Jackson Laboratory. We have concluded that this marker panel is sufficient for genetic contamination monitoring in colonies containing a large number of genetically diverse mouse strains and that reduced versions of the panel could be implemented in facilities housing a lower number of strains.

10.
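Application of dual-color fluorescence hybridization microarrays in the genetic monitoring of inbred mice   Total citations: 2 (self-citations: 0; citations by others: 2)
A new high-throughput SNP detection method, dual-color fluorescence hybridization microarray technology, was applied to the genetic monitoring of inbred mice. The technique was used to genotype SNPs in multiple genomic DNA samples from four inbred mouse strains, and the hybridization information from six SNP loci was combined to determine the strain of origin of each sample. The results show that the method can genotype the six selected SNP loci with high accuracy. Dual-color fluorescence hybridization microarray technology is a good tool for high-throughput SNP detection, is suitable for genetic contamination monitoring and strain identification of large numbers of mice derived from a small number of inbred strains, and has potential for wider application.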

11.
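Single nucleotide polymorphisms in rice and the current status of their application   Total citations: 6 (self-citations: 0; citations by others: 6)
Liu Chuanguang (刘传光), Zhang Guiquan (张桂权). 《遗传》 (Hereditas, Beijing), 2006, 28(6): 737-744
Single nucleotide polymorphisms (SNPs) in rice are numerous, densely distributed, and genetically highly stable. The main methods for discovering rice SNPs are direct sequencing of PCR products from sample DNA, detection of SNPs in SSR regions, and direct searching of genomic sequences. A variety of genotyping technologies have now been applied to rice SNP detection, and the high degree of automation in SNP detection makes rice SNP genotyping very convenient. SNPs have great potential for application in the construction of rice genetic maps, gene cloning and functional genomics research, marker-assisted selection breeding, classification of genetic resources, and the study of species evolution.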

12.
Genomic selection (GS) using high-density single-nucleotide polymorphisms (SNPs) is promising to improve response to selection in populations that are under artificial selection. High-density SNP genotyping of all selection candidates each generation, however, may not be cost effective. Smaller panels with SNPs that show strong associations with phenotype can be used, but this may require separate SNPs for each trait and each population. As an alternative, we propose to use a panel of evenly spaced low-density SNPs across the genome to estimate genome-assisted breeding values of selection candidates in pedigreed populations. The principle of this approach is to utilize cosegregation information from low-density SNPs to track effects of high-density SNP alleles within families. Simulations were used to analyze the loss of accuracy of estimated breeding values from using evenly spaced and selected SNP panels compared to using all high-density SNPs in a Bayesian analysis. Forward stepwise selection and a Bayesian approach were used to select SNPs. Loss of accuracy was nearly independent of the number of simulated quantitative trait loci (QTL) with evenly spaced SNPs, but increased with number of QTL for the selected SNP panels. Loss of accuracy with evenly spaced SNPs increased steadily over generations but was constant when the smaller number of individuals that are selected for breeding each generation were also genotyped using the high-density SNP panel. With equal numbers of low-density SNPs, panels with SNPs selected on the basis of the Bayesian approach had the smallest loss in accuracy for a single trait, but a panel with evenly spaced SNPs at 10 cM was only slightly worse, whereas a panel with SNPs selected by forward stepwise selection was inferior. Panels with evenly spaced SNPs can, however, be used across traits and populations, and their performance is independent of the number of QTL affecting the trait and of the methods used to estimate effects in the training data; they are, therefore, preferred for broad applications in pedigreed populations under artificial selection.

13.
Ensuring the genetic homogeneity of the mice used in laboratory experiments contributes to the Reduction aspect of the Three Rs, by maximising the quality of the data obtained from any animals that are used for these purposes, and ultimately reducing the numbers of animals used. Single nucleotide polymorphism (SNP) genotyping is especially suitable for use in the analysis of the genetic purity of model organisms such as the mouse, because bi-allelic markers remain fully informative when used to characterise crosses between inbred strains. Here, we attempted to apply a microarray-based method for a SNP marker to monitor the genetic quality of inbred mouse strains, so as to validate the reliability, stability and applicability of this SNP genotyping panel. The amplified PCR products containing four different SNP loci from four inbred mouse strains were spotted and immobilised onto amino-modified glass slides to generate a microarray. This was then interrogated through hybridisation with dual-colour probes, to determine the SNP genotypes of each sample. The results indicated that this microarray-based method could effectively determine the genotypes of the four selected SNPs with a high degree of accuracy. We have developed a new SNP genotyping technique for effective use in the genetic monitoring of inbred mouse strains.

14.
This study was designed to address issues regarding sample size and marker location that have arisen from the discovery of SNPs in the genomes of poorly characterized primate species and the application of these markers to the study of primate population genetics. We predict the effect of discovery sample size on the probability of discovering both rare and common SNPs and then compare this prediction with the proportion of common and rare SNPs discovered when different numbers of individuals are sequenced. Second, we examine the effect of genomic region on estimates of common population genetic data, comparing markers from both coding and non-coding regions of the rhesus macaque genome and the population genetic data calculated from these markers, to measure the degree and direction of bias introduced by SNPs located in coding versus non-coding regions of the genome. We found that both discovery sample size and genomic region surveyed affect SNP marker attributes and population genetic estimates, even when these are calculated from an expanded data set containing more individuals than the original discovery data set. Although none of the SNP detection methods or genomic regions tested in this study was completely uninformative, these results show that each has a different kind of genetic variation that is suitable for different purposes, and each introduces specific types of bias. Given that each SNP marker has an individual evolutionary history, we calculated that the most complete and unbiased representation of the genetic diversity present in the individual can be obtained by incorporating at least 10 individuals into the discovery sample set, to ensure the discovery of both common and rare polymorphisms.
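The dependence of SNP discovery on discovery sample size, discussed in the abstract above, can be illustrated with a simple textbook calculation. It is not the authors' prediction method, which the abstract does not describe. The sketch assumes a biallelic SNP, diploid individuals, and chromosomes sampled independently; the function name is hypothetical.

```python
def discovery_probability(maf, n_individuals):
    """Probability that a biallelic SNP appears polymorphic in the discovery sample.

    A SNP is "discovered" only if both alleles are seen among the
    2 * n_individuals sampled chromosomes, so the chance of seeing only the
    major allele or only the minor allele is subtracted from 1.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - maf) ** chromosomes - maf ** chromosomes

# Compare a rare (5%) and a common (30%) SNP for growing discovery panels.
for n in (2, 5, 10, 20):
    print(n, round(discovery_probability(0.05, n), 3),
          round(discovery_probability(0.30, n), 3))
```

Under these assumptions common SNPs are found with very small panels, while rare SNPs need many more chromosomes, which is the general pattern behind a recommended minimum discovery panel size.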

15.
Currently, single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) of >5% are preferentially used in case-control association studies of common human diseases. Recent technological developments enable inexpensive and accurate genotyping of a large number of SNPs in thousands of cases and controls, which can provide adequate statistical power to analyze SNPs with MAF <5%. Our purpose was to determine whether evaluating rare SNPs in case-control association studies could help identify causal SNPs for common diseases. We suggest that slightly deleterious SNPs (sdSNPs) subjected to weak purifying selection are major players in genetic control of susceptibility to common diseases. We compared the distribution of MAFs of synonymous SNPs with that of nonsynonymous SNPs (1) predicted to be benign, (2) predicted to be possibly damaging, and (3) predicted to be probably damaging by PolyPhen. Our sources of data were the International HapMap Project, ENCODE, and the SeattleSNPs project. We found that the MAF distribution of possibly and probably damaging SNPs was shifted toward rare SNPs compared with the MAF distribution of benign and synonymous SNPs that are not likely to be functional. We also found an inverse relationship between MAF and the proportion of nsSNPs predicted to be protein disturbing. On the basis of this relationship, we estimated the joint probability that a SNP is functional and would be detected as significant in a case-control study. Our analysis suggests that including rare SNPs in genotyping platforms will advance identification of causal SNPs in case-control association studies, particularly as sample sizes increase.

16.
Single nucleotide polymorphism (SNP) markers have become a genetic technology of choice because of their automation and high precision of allele calls. In this study, our goal was to develop 94 SNPs and test them across well-chosen common bean (Phaseolus vulgaris L.) germplasm. We validated and assessed SNP diversity at 84 gene-based and 10 non-genic loci using KASPar technology in a panel of 70 genotypes that have been used as parents of mapping populations and have been previously evaluated for SSRs. SNPs exhibited high levels of genetic diversity, an excess of middle frequency polymorphism, and a within-genepool mismatch distribution as expected for populations affected by sudden demographic expansions after domestication bottlenecks. This set of markers was useful for distinguishing Andean and Mesoamerican genotypes but less useful for distinguishing within each gene pool. In summary, slightly greater polymorphism and race structure was found within the Andean gene pool than within the Mesoamerican gene pool but polymorphism rate between genotypes was consistent with genepool and race identity. Our survey results represent a baseline for the choice of SNP markers for future applications because gene-associated SNPs could themselves be causative SNPs for traits. Finally, we discuss that the ideal genetic marker combination with which to carry out diversity, mapping and association studies in common bean should consider a mix of both SNP and SSR markers.

17.
The number of metastases associated with a primary tumor can be a major determinant for the chance of cure. A model is proposed here where the frequency distribution of the number of experimental organ metastases is affected by Poisson statistics, and by stochastic variations in regional blood flow. The model predicts that the mean E(X) and variance var(X) of the number of metastases per organ should relate according to a power function, var(X) = α·E(X)^p + E(X), where α and p are constants, and that the actual distribution has a Poisson-negative binomial form. This model was found consistent with the data derived from a meta-analysis of over 47 000 murine experimental metastases. This frequency distribution (together with knowledge of the size distribution for metastases and the fraction of clonogenic cells) could permit more accurate assessments for tumor control probabilities with adjuvant therapies.
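The variance-to-mean power function stated in the abstract above can be evaluated directly, as in the sketch below. The leading constant is called `a` here; its symbol is lost in the extracted text, so the name and the example values are assumptions chosen only for illustration.

```python
def metastasis_variance(mean, a, p):
    """Variance predicted by var(X) = a * E(X)**p + E(X).

    The trailing + E(X) term is the Poisson contribution (variance equal to
    the mean); a * E(X)**p is the extra, over-dispersed part.
    """
    return a * mean ** p + mean

def dispersion_index(mean, a, p):
    """Variance-to-mean ratio; exactly 1.0 for a pure Poisson count (a = 0)."""
    return metastasis_variance(mean, a, p) / mean

# Hypothetical constants: the excess variance grows faster than the mean.
for mean in (1.0, 10.0, 100.0):
    print(mean, metastasis_variance(mean, a=0.5, p=2.0))
```

With `a = 0` the relation reduces to the Poisson case, so a fitted `a` greater than zero measures how strongly the counts are over-dispersed relative to Poisson.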

18.
OBJECTIVE: Evaluate the consistency of the contribution of interactions between single nucleotide polymorphism (SNP) genotype effects to variation in measures of lipid metabolism across ethnic strata within gender. METHODS AND RESULTS: We considered 80 SNPs within the apolipoprotein (APO) A1/C3/A4/A5 gene cluster using an over-parameterized general linear model to identify SNPs whose genotype effects combine non-additively to influence plasma levels of high density lipoprotein cholesterol (HDL-C), total cholesterol (TC) and triglycerides (TG) in a consistent manner across ethnic strata. We analyzed population-based samples of unrelated 18 to 30 year old African-Americans (n = 1,858) and European-Americans (n = 1,973) ascertained without regard to health at four field centers (Birmingham, Ala.; Chicago, Ill.; Minneapolis, Minn. and Oakland, Calif., USA) by the Coronary Artery Risk Development in Young Adults (CARDIA) study. To identify which SNP genotype effects combine non-additively we used a two-tier analysis strategy. We first required that pairs of SNPs show statistically significant non-additivity in both ethnic strata within a gender, where experiment-wise significance was evaluated using a permutation test to determine the probability of observing the number of tests significant in both ethnic strata by chance alone. Second, we required no significant evidence of heterogeneity of the relationship between the phenotype and the two SNP genotypes across ethnic strata and across field centers within each ethnic group. From this strategy we identified ten pairs of SNPs, involving thirteen SNPs, that displayed statistically significant non-additivity of SNP genotype effects on TC. Only one of these thirteen SNPs had statistically significant genotype effects that were consistent across samples. 
CONCLUSION: Our analyses suggest that ignoring the contribution of interactions between SNP genotype effects when modeling multi-SNP genotype-phenotype relationships may result in an underestimate of the contribution of genetic variation to variation in quantitative cardiovascular disease (CVD) risk factor traits.

19.
An improved approach for increasing the multiplex level of single nucleotide polymorphism (SNP) typing by adapter ligation-mediated allele-specific amplification (ALM-ASA) has been developed. Based on an adapter ligation, each reaction requires n allele-specific primers plus an adapter-specific primer that is common for all SNPs. Thus, only n+1 primers are used for an n-plex PCR amplification. The specificity of ALM-ASA was increased by a special design of the adapter structure and PCR suppression. Given that the genetic polymorphisms in the liver enzyme cytochrome P450 CYP2D6 (debrisoquine 4-hydroxylase) have profound effects on responses of individuals to a particular drug, we selected 17 SNPs in the CYP2D6 gene as an example for the multiplex SNP typing. Without extensive optimization, we successfully typed 17-plex SNPs in the CYP2D6 gene by ALM-ASA. The results for genotyping 70 different genome samples by the 17-plex ALM-ASA were completely consistent with those obtained by both Sanger's sequencing and PCR restriction fragment length polymorphism (PCR-RFLP) analysis. ALM-ASA is a potential method for SNP typing at an ultra-low cost because of a high multiplex level and a simple optimization step for PCR. High-throughput SNP typing could be readily realized by coupling ALM-ASA with a well-developed automation device for sample processing.

20.
Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads and these were processed using our pipeline. A total of 10,120 high quality SNPs were obtained and the distribution of these SNPs mirrored closely the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate. We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in the throughput and a significant decrease in the per sample cost. The approach to obtain high quality SNPs developed here will be helpful for marker assisted genomics as well as assessment of available genetic resources for effective utilisation in a wide number of species.
