Similar Articles
1.
2.
Random Forest is a prediction technique based on growing trees on bootstrap samples of the data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate-gene approach, with a high proportion of true genes among the predictors, while the second set represents a genome-scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
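A minimal sketch of the two sources of randomness described above (a bootstrap sample per tree and a random subset of explanatory variables at the split), using only the Python standard library. For brevity each "tree" here is a single split (a regression stump) rather than a fully grown tree, and the toy sib-pair data, in which IBD sharing at one hypothetical locus drives the outcome, are invented for illustration. Counting how often each variable is chosen for a split gives a crude importance measure.

```python
import random
from statistics import mean


def best_stump(X, y, features):
    """Best single split (feature, threshold, left mean, right mean) by squared error."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:  # no candidate variable varies in this bootstrap sample
        m = mean(y)
        return (None, 0.0, m, m)
    return best[1:]


def grow_forest(X, y, n_trees=200, mtry=2, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of sib pairs
        feats = rng.sample(range(p), mtry)          # random subset of loci for the split
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Average the leaf means of all stumps (regression forest)."""
    return mean(lm if f is None or row[f] <= t else rm for f, t, lm, rm in forest)


def split_counts(forest, p):
    """How often each variable was used for a split (crude importance)."""
    counts = [0] * p
    for f, *_ in forest:
        if f is not None:
            counts[f] += 1
    return counts


# Toy data: IBD sharing (0, 0.5, 1) at 3 loci per sib pair; only locus 0 matters.
rng = random.Random(1)
X = [[rng.choice([0.0, 0.5, 1.0]) for _ in range(3)] for _ in range(60)]
y = [4.0 - 3.0 * row[0] + rng.gauss(0, 0.3) for row in X]  # e.g. squared trait difference
forest = grow_forest(X, y)
print(split_counts(forest, 3))
print(predict(forest, [1.0, 0.5, 0.5]), predict(forest, [0.0, 0.5, 0.5]))
```

With `mtry=2` of 3 loci, the informative locus is a split candidate in about two thirds of the trees and wins whenever it is offered, so its split count dominates.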

3.
See D, Kanazin V, Talbert H, Blake T. BioTechniques, 2000, 28(4):710-4, 716
Single-nucleotide polymorphisms (SNPs) represent the most prevalent class of genetic markers available for linkage disequilibrium or cladistic analyses. PCR primers may be labeled with fluorescent dyes and used to rapidly and accurately differentiate among alleles that are defined by a single-nucleotide difference. Here, we describe the primer-mediated detection of SNPs based on primer mismatch during allele-specific amplification of preamplified target sequences. Primers are labeled with different fluors at their 5' nucleotides, and their 3' termini sit on the transition mutation that defines allelic variation at the target locus. Each primer perfectly matches one of the two alleles at each locus. Electrophoretic detection permits characterization of the product both by size and by fluor. This report demonstrates some of the capabilities of this assay, including heterozygote determination and multiplexed analysis.
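The primer layout described above can be illustrated with a short sketch: one forward primer per allele, identical except for the 3'-terminal base, which sits on the SNP, each carrying its own 5' dye label. The sequence, primer length, and dye names below are hypothetical placeholders, not the assay's actual reagents; real designs also match melting temperatures and sometimes add a deliberate extra mismatch near the 3' end, both omitted here.

```python
def allele_specific_primers(upstream, alleles, dyes, length=20):
    """Build one forward primer per allele whose 3'-terminal base is the SNP allele.

    upstream: sequence immediately 5' of the SNP (written 5'->3').
    Returns a list of (5' dye label, primer sequence) pairs.
    """
    if length < 1 or len(upstream) < length - 1:
        raise ValueError("upstream sequence is shorter than the primer stem")
    stem = upstream[len(upstream) - (length - 1):]  # last length-1 bases before the SNP
    return [(dye, stem + allele) for allele, dye in zip(alleles, dyes)]


# Hypothetical A/G transition SNP; "FAM" and "HEX" are placeholder dye names.
upstream = "GATTACAGGCTTACCGATGCAT"
for dye, primer in allele_specific_primers(upstream, ("A", "G"), ("FAM", "HEX")):
    print(f"5'-[{dye}]{primer}-3'")
```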

4.
Mining single-nucleotide polymorphisms from hexaploid wheat ESTs.   Citations: 20 (self-citations: 0, by others: 20)
Single-nucleotide polymorphisms (SNPs) represent a new form of functional marker, particularly when they are derived from expressed sequence tags (ESTs). A bioinformatics strategy was developed to discover SNPs within a large wheat EST database and to demonstrate the utility of SNPs in genetic mapping and genetic diversity applications. A collection of more than 90,000 wheat ESTs was assembled into contiguous sequences (contigs), and 45 random contigs were then visually inspected to identify primer pairs capable of amplifying specific alleles. We estimate that homoeologue sequence variants occurred once every 24 bp and that the frequency of SNPs between wheat genotypes was 1 SNP per 540 bp (theta = 0.0069). Furthermore, we estimate that one diagnostic SNP test can be developed from every contig with 10-60 EST members. Thus, EST databases are an abundant source of SNP markers. Polymorphism information content (PIC) for SNPs ranged from 0.04 to 0.50, and ESTs could be mapped onto a framework of microsatellite markers using segregating populations. The results showed that SNPs in wheat can be discovered in ESTs, validated, and applied to conventional genetic studies.
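The abstract does not say which PIC definition was used. The reported ceiling of 0.50 is consistent with the simpler gene-diversity form 1 - Σp_i², whose maximum for a biallelic SNP is 0.5, whereas Botstein's PIC formula peaks at 0.375 for two alleles. A small sketch computing both for illustrative allele frequencies:

```python
def gene_diversity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2); at most 0.5 for a biallelic SNP."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """Botstein et al. (1980) PIC = H - sum_{i<j} 2 p_i^2 p_j^2; at most 0.375 for two alleles."""
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - correction


for p in (0.98, 0.9, 0.5):  # illustrative major-allele frequencies
    freqs = (p, 1 - p)
    print(p, round(gene_diversity(freqs), 4), round(pic_botstein(freqs), 4))
```

A major-allele frequency of 0.98 gives a gene diversity of about 0.04, the low end of the reported range.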

5.
Structural location of disease-associated single-nucleotide polymorphisms   Citations: 7 (self-citations: 0, by others: 7)
Non-synonymous single-nucleotide polymorphisms (nsSNPs) introduce amino acid changes into proteins and play an important role in providing genetic functional diversity. To understand the structural characteristics of disease-associated SNPs, we mapped a set of nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database onto the structural surfaces of the encoded proteins. These nsSNPs are disease-associated or have distinctive phenotypes. As a control dataset, we mapped a set of nsSNPs derived from the SNP database dbSNP onto the structural surfaces of the proteins they encode. Using the alpha shape method from computational geometry, we examine the geometric locations of the structural sites of these nsSNPs. We classify each nsSNP site into one of three categories of geometric location: those in a pocket or void (type P); those on a convex region or a shallow depressed region (type S); and those buried completely in the interior (type I). We find that the majority (88%) of disease-associated nsSNPs are located in voids or pockets, and that they are infrequently observed in the interior of proteins (3.2% in the data set). nsSNPs mapped from dbSNP are less likely to be located in pockets or voids (68%). We further introduce a novel application of hidden Markov models (HMMs) for analyzing the sequence homology of SNPs at various geometric sites. For SNPs in surface pockets or voids, we find no strong tendency for them to occur at conserved residues. For SNPs buried in the interior, we find that disease-associated mutations are more likely to be conserved. The approach of classifying nsSNPs with alpha shapes and HMMs developed in this study can be integrated with additional methods to improve the accuracy of predicting whether a given nsSNP is likely to be disease-associated.

6.
PURPOSE OF REVIEW: The identification of regulatory polymorphisms has become a key problem in human genetics. In the past few years there has been a conceptual change in the way regulatory single-nucleotide polymorphisms are studied. We review the new approaches and discuss how gene expression studies can contribute to a better understanding of the genetics of common diseases. RECENT FINDINGS: New techniques for associating single-nucleotide polymorphisms with changes in gene expression have recently been developed. Together with more comprehensive use of the established in-vitro methods, these techniques have produced a large amount of genetic information. When added to current databases, this information will help in designing better tools for detecting regulatory single-nucleotide polymorphisms. SUMMARY: Functional regulatory single-nucleotide polymorphisms cannot be identified by simple inspection of the DNA sequence. In-vivo techniques based on primer extension, and the more recently developed 'haploChIP', allow gene variants to be associated with changes in gene expression. Gene expression analysis by conventional in-vitro techniques is the only way to identify the functional consequences of regulatory single-nucleotide polymorphisms. The information produced in the last few years will help to refine the tools for future analysis of regulatory gene variants.

7.
The efficacy of linkage studies using microsatellites and single-nucleotide polymorphisms (SNPs) was evaluated. The analyzed data were supplied by the Collaborative Study on the Genetics of Alcoholism (COGA). Alcoholism was analyzed together with a simulated trait caused by a gene of known position, using a nonparametric linkage (NPL) test. For the alcoholism trait, four SNP densities (1 SNP per 0.2 cM, 0.5 cM, 1 cM, and 2 cM) gave higher peaks of NPL z scores and smaller, more significant p-values than the usual 10-cM microsatellite map. However, the two highest SNP densities produced unstable z-score signals and were therefore difficult to interpret. Analyzing a simulated trait with the same markers in the same pedigrees, we confirmed the higher power of all four SNP densities compared with the 10-cM microsatellite panel, although additional confounding peaks also appeared for maps denser than 1 SNP/cM. We further showed that estimating the gene position using SNPs is far less biased than using the usual microsatellite panel (biases of 0-2 cM for SNPs vs. 8.9 cM for microsatellites). We conclude that dense SNP maps are more powerful and less biased in linkage analysis than 10-cM microsatellite maps. However, linkage signals can be unstable and difficult to interpret when several SNPs are genotyped per centimorgan. The power and accuracy of 1 SNP/cM or 1 SNP/2 cM may be sufficient for a genome-wide linkage scan, while denser maps may be most useful in fine-mapping studies that exploit linkage disequilibrium.

8.
Mapping the genetic architecture of complex traits in experimental populations   Citations: 18 (self-citations: 0, by others: 18)
SUMMARY: Understanding how interactions among sets of genes affect diverse phenotypes is having a growing impact on biomedical research, agriculture, and evolutionary biology. Mapping and characterizing the isolated effects of single quantitative trait loci (QTLs) is a first step, but we also need to assemble networks of QTLs and define non-additive interactions (epistasis), together with a host of potential environmental modulators. In this article, we present a full-QTL model with which to explore the genetic architecture of complex traits in multiple environments. Our model includes the effects of multiple QTLs, epistasis, QTL-by-environment interactions, and epistasis-by-environment interactions. A new mapping strategy, comprising marker interval selection, detection of marker interval interactions, and genome scans, is used to evaluate the putative locations of multiple QTLs and their interactions. All mapping procedures are performed within a mixed linear model framework, which can accommodate environmental factors whether they are treated as fixed or random effects. An F-statistic based on Henderson method III is used for hypothesis tests; it is less computationally demanding than the corresponding likelihood ratio test. In each mapping procedure, permutation testing is used to control the genome-wide false-positive rate, and model selection is used to reduce ghost peaks in the F-statistic profile. Parameters of the full-QTL model are estimated by a Bayesian method via Gibbs sampling. Monte Carlo simulations help define the reliability and efficiency of the method. Two real-world phenotypes (BXD mouse olfactory bulb weight data and rice yield data) are used as exemplars to demonstrate our methods. AVAILABILITY: A software package is freely available at http://ibi.zju.edu.cn/software/qtlnetwork
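The permutation step mentioned above controls the genome-wide error rate by comparing each observed statistic with the distribution of the genome-wide maximum under shuffled phenotypes. A minimal sketch, assuming a single environment and substituting a simple per-marker regression F statistic for the Henderson method III mixed-model F statistic used by the authors; marker codes, effect size, and sample size are made up.

```python
import random


def f_stat(x, y):
    """F statistic (1, n-2 df) for a simple regression of phenotype y on genotype code x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = sxy * sxy / (sxx * syy)
    return float("inf") if r2 >= 1.0 else r2 * (n - 2) / (1 - r2)


def genome_scan(markers, y):
    return [f_stat(m, y) for m in markers]


def permutation_threshold(markers, y, n_perm=500, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum F under permuted phenotypes."""
    rng = random.Random(seed)
    y_perm = list(y)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any marker-phenotype association
        max_stats.append(max(genome_scan(markers, y_perm)))
    max_stats.sort()
    return max_stats[min(int((1 - alpha) * n_perm), n_perm - 1)]


# Toy data: 10 markers coded 0/1/2 in 80 individuals; marker 3 has an additive effect.
rng = random.Random(42)
n, m = 80, 10
markers = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(m)]
y = [1.5 * markers[3][i] + rng.gauss(0, 1) for i in range(n)]
threshold = permutation_threshold(markers, y)
hits = [j for j, f in enumerate(genome_scan(markers, y)) if f > threshold]
print(round(threshold, 2), hits)
```

Using the maximum over all markers in each permutation is what makes the threshold genome-wide rather than per-marker.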

9.
A major challenge in single-nucleotide polymorphism (SNP) fingerprinting of bacteria and higher organisms is combining genome-wide screening with the potential for multiplexing and accurate SNP detection. Single-nucleotide extension by the minisequencing principle is a technology that is both highly accurate and amenable to multiplexing. A current bottleneck for direct genome analysis by minisequencing, however, is sensitivity, since minisequencing relies on linear signal amplification. Here, we present SNPtrap, a novel approach that combines the specificity and multiplexing capability of minisequencing with the sensitivity obtained by exponential signal amplification by polymerase chain reaction (PCR). We show a SNPtrap proof of principle in a model system for two polymorphic SNP sites in the Salmonella tetrathionate reductase gene (ttrC).
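The sensitivity argument rests on how product yield grows with cycle number. Under an idealized model (an assumption: constant efficiency and no plateau phase), cycled single-base extension adds at most one product per template per cycle, while PCR multiplies the product every cycle:

```python
def linear_yield(templates, cycles):
    """Cycled primer extension: at most one extension product per template per cycle."""
    return templates * cycles


def pcr_yield(templates, cycles, efficiency=1.0):
    """Idealized PCR: product grows by a factor (1 + efficiency) every cycle."""
    return templates * (1 + efficiency) ** cycles


for cycles in (10, 20, 30):
    print(cycles, linear_yield(100, cycles), f"{pcr_yield(100, cycles):.3g}")
```

Real reactions fall short of the ideal because efficiency drops below 1 and product accumulation plateaus, but the gap between the two growth laws is the one SNPtrap is described as exploiting.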

10.
The authors describe a method by which single-nucleotide polymorphisms (SNPs) can be efficiently detected in a population and their allele frequencies accurately measured. Selected SNPs in TNFbeta, IL-4, and CTLA-4 were used to demonstrate the method. Blood from 4000 individuals was pooled, DNA was extracted, and target sequences were PCR-amplified and analyzed by denaturant capillary electrophoresis. Alleles were separated into peaks based on the melting properties of the DNA double helix. Frequencies of the different alleles were determined by calculating the area under each peak. Allele frequencies and Hardy-Weinberg equilibrium estimated from the pooled data were verified by analyzing 7.5% of the samples, randomly selected from the blood donor series. The method is equally suitable for single-sample and pooled-sample analysis of SNPs, and sample treatment is kept to a minimum. The potential throughput of the method exceeds the number of samples that can realistically be obtained.
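The two calculations described above, an allele frequency from peak areas and a Hardy-Weinberg check on individually typed samples, can be sketched as follows. The sketch assumes both alleles amplify and are detected with equal efficiency, so relative peak area equals allele frequency; the peak areas and genotype counts are invented. The 300 individuals correspond to 7.5% of 4000.

```python
def pooled_allele_frequency(area_a, area_b):
    """Allele-A frequency as its share of total peak area (assumes equal amplification)."""
    return area_a / (area_a + area_b)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 df) for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


pool_freq = pooled_allele_frequency(3400.0, 1600.0)  # hypothetical peak areas
n_aa, n_ab, n_bb = 140, 128, 32                      # hypothetical genotypes, 300 donors
individual_freq = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
chi2 = hwe_chi_square(n_aa, n_ab, n_bb)
# 3.84 is the 5% critical value of the chi-square distribution with 1 df.
print(round(pool_freq, 3), round(individual_freq, 3), round(chi2, 3))
```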

11.
There has been great interest in the prospects of using single-nucleotide polymorphisms (SNPs) in the search for complex disease genes, and several initiatives devoted to identifying and mapping SNPs throughout the human genome are currently underway. However, actual data on the use of SNPs for identifying complex disease genes are scarce. To begin to examine issues surrounding the use of SNPs in complex disease studies, we initiated a collaborative SNP mapping study around APOE, the well-established susceptibility gene for late-onset Alzheimer disease (AD). Sixty SNPs in a 1.5-Mb region surrounding APOE were genotyped in samples of unrelated AD cases, in controls, and in families with AD. Standard tests for association of SNP alleles with AD were conducted in cases and controls. We also used family-based association analyses, including recently developed methods to look for haplotype association. Evidence of association (P
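A common form of the standard case-control test of SNP alleles mentioned above is an allelic chi-square on a 2×2 table of allele counts. A minimal sketch; the counts are invented and are not the study's data, and an allele-count test of this kind assumes Hardy-Weinberg equilibrium in the combined sample.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    case_alleles / control_alleles: (count of allele 1, count of allele 2).
    """
    table = (case_alleles, control_alleles)
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value for 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 200 cases and 200 controls, i.e. 400 alleles per group.
chi2 = allelic_chi_square((120, 280), (60, 340))
print(round(chi2, 2), chi2_1df_pvalue(chi2))
```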

12.
Kostem E, Lozano JA, Eskin E. Genetics, 2011, 188(2):449-460
Genome-wide association studies (GWASs) have been effective in identifying genomic regions associated with disease traits. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs in these regions is costly, and validating all of them biologically is infeasible. In this article we address how to characterize these regions cost-effectively, with the goal of giving investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method that aims to maximize the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data, such as the HapMap Project, can be used to identify the follow-up SNPs. Using simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium, we demonstrate that our method outperforms the traditional correlation- and distance-based approaches to follow-up SNP selection. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.
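For contrast, here is a sketch of the correlation-based baseline the abstract compares against, not the proposed selection method: rank untyped candidate SNPs by LD r² with the most significant tag SNP, computed from a phased reference panel such as HapMap. The function names and the tiny panel are hypothetical.

```python
def r_squared(snp_a, snp_b):
    """LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(snp_a)
    pa, pb = sum(snp_a) / n, sum(snp_b) / n
    pab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = pab - pa * pb  # LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0


def correlation_followup(panel, top_tag, candidates, k):
    """Baseline: keep the k untyped SNPs with the highest r^2 to the top tag SNP."""
    ranked = sorted(candidates, key=lambda j: r_squared(panel[top_tag], panel[j]), reverse=True)
    return ranked[:k]


# Hypothetical reference panel: panel[snp] = alleles across 8 phased haplotypes.
panel = [
    [1, 1, 0, 0, 1, 0, 1, 0],  # SNP 0: most significant tag SNP
    [1, 1, 0, 0, 1, 0, 1, 0],  # SNP 1: perfect proxy (r^2 = 1)
    [1, 1, 0, 0, 1, 0, 0, 0],  # SNP 2: strong LD
    [0, 1, 1, 0, 0, 1, 0, 1],  # SNP 3: weak LD
]
print(correlation_followup(panel, top_tag=0, candidates=[1, 2, 3], k=2))
```

This baseline ignores the observed association statistics of the other tag SNPs, which is the information the authors' method is described as exploiting.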

13.
14.
The overlap of 94 single-nucleotide polymorphisms (SNPs) between the 4,720 and 11,120 SNPs contained in the Illumina and Affymetrix linkage panels, respectively, allows an assessment of the discrepancy rate between these two platforms. Although the no-call rate for the Affymetrix platform is approximately 8.6 times greater than for the Illumina platform, when both platforms make a genotype call, the agreement is an impressive 99.85%. To determine whether disputed genotypes can be resolved without sequencing, we studied recombination in the region of the discrepancy for the most discrepant SNP, rs958883 (typed by Illumina) and tsc02060848 (typed by Affymetrix). We find that the number of inferred recombinants is substantially higher for the Affymetrix genotypes than for the Illumina genotypes. We illustrate this with pedigree 10043, in which 3 of 7 versus 0 of 7 offspring must be double recombinants using the genotypes from the Affymetrix and Illumina platforms, respectively. Of the 36 SNPs with one or more discrepancies, we identified a subset that appears to cluster in families. Some of this clustering may be due to the presence of a second segregating SNP that abolishes an XbaI site (XbaI being the restriction enzyme used in the Affymetrix platform), resulting in a fragment too long (>1,000 bp) to be amplified.
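The two quantities reported above are a per-platform no-call rate and the concordance among genotypes that both platforms called; 99.85% agreement corresponds to about 15 discordant calls per 10,000 jointly called genotypes. A minimal sketch of both calculations; the "NC" no-call code and the example calls are assumptions, and unordered genotypes such as "AG" and "GA" are treated as identical.

```python
NO_CALL = "NC"  # hypothetical no-call code


def normalize(genotype):
    """Treat unordered genotypes such as 'AG' and 'GA' as the same call."""
    return "".join(sorted(genotype))


def compare_platforms(calls_a, calls_b):
    """Return (no-call rate A, no-call rate B, concordance among jointly called genotypes)."""
    n = len(calls_a)
    nc_a = sum(c == NO_CALL for c in calls_a) / n
    nc_b = sum(c == NO_CALL for c in calls_b) / n
    both = [(a, b) for a, b in zip(calls_a, calls_b) if NO_CALL not in (a, b)]
    agree = sum(normalize(a) == normalize(b) for a, b in both)
    concordance = agree / len(both) if both else float("nan")
    return nc_a, nc_b, concordance


# Invented calls for the same SNP in five individuals.
illumina = ["AA", "AG", "GG", "NC", "AG"]
affymetrix = ["AA", "GA", "AG", "AG", "NC"]
print(compare_platforms(illumina, affymetrix))
```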

15.
Although there has been great success in identifying disease genes for simple, monogenic Mendelian traits, deciphering the genetic mechanisms involved in complex diseases remains challenging. One major approach is to identify configurations of interacting factors, such as single nucleotide polymorphisms (SNPs), that confer susceptibility to disease. Traditional methods, such as the multifactor dimensionality reduction method and the combinatorial partitioning method, provide good tools for deciphering such interactions in a disease population with a single genetic cause. However, these traditional methods have not resolved the issue of genetic heterogeneity, which is believed to be very common in complex diseases. There is rarely prior knowledge of the genetic heterogeneity of a disease, and traditional methods based on estimation over the entire population are unlikely to succeed in the presence of heterogeneity. We present a novel Boosted Generative Modeling (BGM) approach for modeling the structure of the interactions leading to disease in the context of genetic heterogeneity. Our BGM method bridges the ensemble and generative modeling approaches to genetic association studies under a case-control design. Generative modeling is employed to model the interaction network configuration and the causal relationships, while boosting is used to address genetic heterogeneity. We apply our method to simulated data on complex diseases. The results indicate that our method can model the structure of interaction networks among disease-susceptibility loci and can address genetic heterogeneity where traditional methods, such as multifactor dimensionality reduction, fail to apply. Our BGM method provides an exploratory tool that identifies the variables (e.g., disease-susceptibility loci) that are likely to correlate with and contribute to the disease.
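For reference, the core step of multifactor dimensionality reduction (MDR), one of the traditional methods the abstract contrasts with BGM, pools multilocus genotype cells into high- and low-risk groups by their case:control ratio. A minimal sketch for SNP pairs; it omits the cross-validation and balanced accuracy used in full MDR analyses, does not implement BGM, and uses a made-up interaction in which disease status depends jointly on SNPs 0 and 1.

```python
from collections import Counter
from itertools import combinations, product


def mdr_accuracy(genotypes, status, pair):
    """Label each two-locus genotype cell high-risk if cases/controls >= overall ratio."""
    cases, controls = Counter(), Counter()
    for g, s in zip(genotypes, status):
        cell = (g[pair[0]], g[pair[1]])
        if s == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(status)
    threshold = n_cases / (len(status) - n_cases)
    correct = 0
    for cell in set(cases) | set(controls):
        high_risk = cases[cell] >= threshold * controls[cell]
        # high-risk cells predict "case", low-risk cells predict "control"
        correct += cases[cell] if high_risk else controls[cell]
    return correct / len(status)


def best_pair(genotypes, status):
    n_snps = len(genotypes[0])
    return max(combinations(range(n_snps), 2),
               key=lambda pair: mdr_accuracy(genotypes, status, pair))


# Toy data: every genotype combination at 3 SNPs (coded 0/1/2) appears once;
# individuals are cases when exactly one of SNP 0 and SNP 1 carries a minor allele.
genotypes = [list(g) for g in product(range(3), repeat=3)]
status = [1 if (g[0] > 0) != (g[1] > 0) else 0 for g in genotypes]
print(best_pair(genotypes, status), mdr_accuracy(genotypes, status, (0, 1)))
```

Because the high/low-risk labels are fitted to the whole sample, this step assumes one shared interaction pattern, which is the limitation under genetic heterogeneity that the abstract points to.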

16.
Single-nucleotide polymorphisms (SNPs) are increasingly used as genetic markers. Although a large number of SNP-genotyping techniques have been described, most still have low throughput or require major investment. For laboratories that have access to an automated sequencer, a single-base extension (SBE) assay can be implemented using the ABI SNaPshot™ kit. Here we present a modified protocol comprising multiplex template generation, multiplex SBE reaction, and multiplex sample analysis on a gel-based sequencer such as the ABI 377. These sequencers run on the Macintosh platform, where the software available for analyzing ABI 377 data has limitations. First, the software does not readily support analysis of the size standard included with the kit, so a new size standard was designed. Second, analyzing the data with Genotyper (ABI) is very tedious and time-consuming. To enable automated batch analysis of 96 samples with 10 SNPs each, we developed SNPtyper, a spreadsheet-based tool that takes the output of Genotyper and offers a convenient interface for setting the parameters required for correct allele calling. In conclusion, the method described enables any laboratory with access to an ABI sequencer to genotype up to 1000 SNPs per day with a single experimenter, without investing in new equipment.
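SNPtyper's actual calling rules are not given in the abstract. As a hypothetical illustration of the kind of parameters an allele-calling step exposes, the sketch below calls a genotype from the two allele peak heights of a single-base extension using a minimum-signal cutoff and an allele-fraction cutoff; both thresholds are invented, and real SNaPshot data may also need per-dye intensity normalization, which is omitted.

```python
def call_genotype(height_x, height_y, allele_x, allele_y,
                  min_signal=200.0, homo_cutoff=0.85):
    """Call a genotype from two SBE peak heights (illustrative thresholds only)."""
    total = height_x + height_y
    if total < min_signal:
        return "NC"  # too little signal: no call
    frac_x = height_x / total
    if frac_x >= homo_cutoff:
        return allele_x + allele_x
    if frac_x <= 1 - homo_cutoff:
        return allele_y + allele_y
    return allele_x + allele_y


def call_batch(peaks, snp_alleles, **params):
    """peaks: {(sample, snp): (height_x, height_y)}; snp_alleles: {snp: (allele_x, allele_y)}."""
    return {
        key: call_genotype(hx, hy, *snp_alleles[key[1]], **params)
        for key, (hx, hy) in peaks.items()
    }


# Hypothetical peak heights for two samples at two SNPs.
snp_alleles = {"snp1": ("A", "G"), "snp2": ("C", "T")}
peaks = {
    ("s01", "snp1"): (1000.0, 50.0),
    ("s01", "snp2"): (500.0, 450.0),
    ("s02", "snp1"): (40.0, 900.0),
    ("s02", "snp2"): (50.0, 60.0),
}
for key, genotype in sorted(call_batch(peaks, snp_alleles).items()):
    print(key, genotype)
```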

17.
In recent years, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI) has emerged as a very powerful method for genotyping single nucleotide polymorphisms. Accuracy, speed of data accumulation, and data structure are the major features of MALDI. Several SNP genotyping methods have been implemented with a high degree of automation and are being applied in large-scale association studies. Most methods for SNP genotyping with MALDI mass spectrometric detection, and their potential for high-throughput application, are reviewed here.

18.
We identified 37 single-nucleotide polymorphisms (SNPs) in sheep and screened 16 individuals from 8 different sheep breeds selected throughout Europe. Population genetic measures based on the genotyping of about 30 sheep from the same 8 breeds are reported. To date, no sheep SNPs are documented in the National Center for Biotechnology Information (NCBI) dbSNP database. The markers presented here therefore contribute significantly to those currently available.

19.
We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. The two major categories of analytical strategy are the univariate and set-selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set-selection approach tests the disease association of a set of SNP markers simultaneously. We examined various test statistics that can be used to test disease association and also reviewed several multiple testing procedures that properly control the family-wise error rate when the univariate approach is applied to multiple markers. Set-association methods were then briefly reviewed. Finally, we applied these methods to data from the Collaborative Study on the Genetics of Alcoholism (COGA).
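Two standard procedures that control the family-wise error rate across m univariate tests are the Bonferroni correction and Holm's step-down method; the abstract does not name the procedures it reviewed, so these are offered only as common examples. Holm rejects at least every hypothesis that Bonferroni rejects, as the made-up p-values below show:

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha/m, alpha/(m-1), ...; stop at first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvalues[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


pvalues = [0.01, 0.04, 0.02, 0.005]  # hypothetical per-SNP p-values
print(bonferroni(pvalues))
print(holm(pvalues))
```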

20.
Many investigators now use haplotype-tagging single-nucleotide polymorphisms (htSNPs) to screen regions of the genome for association with disease. A common approach is to genotype htSNPs in a study population and use this information to draw inferences about each individual's haplotypic makeup, including SNPs that were not directly genotyped. To test the validity of this approach, we simulated typing htSNPs in a large sample of individuals and compared the true and inferred haplotypes. The accuracy of haplotype inference varied depending on the method of selecting htSNPs, the linkage-disequilibrium structure of the region, and the amount of missing data. At the htSNP selection stage, haplotype-block-based methods required more htSNPs than unstructured methods but gave lower error rates in haplotype inference, particularly when there was a significant amount of missing data. We present a Web-based utility that allows investigators to compare the likely error rates of different sets of htSNPs and to arrive at an economical set of htSNPs that provides acceptable accuracy in haplotype inference.
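As an illustration of an unstructured selection method, here is a sketch of greedy pairwise tagging: repeatedly pick the SNP that captures the most not-yet-tagged SNPs at r² at or above a threshold. This is one common approach, not necessarily one of the methods compared in the study or the logic of the Web utility, and the 0.8 threshold and the tiny phased panel are assumptions.

```python
def r_squared(snp_a, snp_b):
    """LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(snp_a)
    pa, pb = sum(snp_a) / n, sum(snp_b) / n
    pab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    return (pab - pa * pb) ** 2 / denom if denom > 0 else 0.0


def greedy_tags(panel, threshold=0.8):
    """Choose tag SNPs until every SNP has r^2 >= threshold with at least one tag."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    m = len(panel)
    r2 = [[r_squared(panel[i], panel[j]) for j in range(m)] for i in range(m)]
    for i in range(m):
        r2[i][i] = 1.0  # every SNP tags itself (also covers monomorphic SNPs)
    untagged = set(range(m))
    tags = []
    while untagged:
        # SNP covering the most untagged SNPs; ties go to the lower index
        best = max(range(m), key=lambda i: (sum(r2[i][j] >= threshold for j in untagged), -i))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


panel = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 0],  # identical to SNP 0
    [0, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],  # complement of SNP 2, so r^2 = 1
]
print(greedy_tags(panel))
```

Lowering the threshold shrinks the tag set, which trades genotyping cost against the accuracy of inferring untyped SNPs.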


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417