Similar literature
1.

Background

The recent development of high-resolution single nucleotide polymorphism (SNP) arrays allows detailed genome-wide assessment of human genome variations. There is increasing recognition of the importance of SNPs for medicine and developmental biology. However, a SNP data set typically has a large number of SNPs (e.g., 400 thousand SNPs in a genome-wide Parkinson disease data set) and only a few hundred samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data.

Results

In this paper, we use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples using HapMap data and Parkinson disease (PD) data are given to demonstrate the effectiveness of the proposed method and to illustrate its potential as a useful analysis tool for SNP data sets. We use the Parkinson disease data as an example and perform a whole-genome analysis. From the 367440 SNPs with a missing percentage below 1% across all 22 chromosomes, we select 357 SNPs. For the unique genes in which those SNPs are located, a gene-gene similarity value is computed using GOSemSim, and gene pairs with a similarity value greater than a threshold are selected to construct several groups of genes. For the SNPs involved in these groups of genes, the statistical software PLINK is employed to compute the pair-wise SNP-SNP interactions, and SNPs with a significance of P < 0.01 are chosen to identify SNP networks based on their P values. Here the SNP networks are constructed based on Gene Ontology knowledge, and therefore each SNP network plays a role in a biological process. An analysis shows that such networks are related, directly or indirectly, to Parkinson disease.

Conclusions

Experimental results show that our approach is suitable for handling genetic variation and provides useful knowledge in a genome-wide SNP study.
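The network step in the abstract above (keep SNP-SNP pairs with P < 0.01, then group the SNPs into networks) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the pairwise P values have already been computed elsewhere (for example by PLINK), treats a "network" as a connected component of the thresholded graph, and uses made-up SNP identifiers and P values.

```python
from collections import defaultdict


def snp_networks(pair_pvalues, alpha=0.01):
    """Group SNPs into networks (connected components), using only
    SNP-SNP pairs whose interaction P value is below alpha.

    pair_pvalues maps (snp_a, snp_b) tuples to P values. Treating a
    network as a connected component is an assumption of this sketch.
    """
    neighbours = defaultdict(set)
    for (snp_a, snp_b), p_value in pair_pvalues.items():
        if p_value < alpha:
            neighbours[snp_a].add(snp_b)
            neighbours[snp_b].add(snp_a)

    seen, networks = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over the significant edges.
        component, stack = set(), [start]
        while stack:
            snp = stack.pop()
            if snp in component:
                continue
            component.add(snp)
            stack.extend(neighbours[snp] - component)
        seen |= component
        networks.append(sorted(component))
    return networks


# Hypothetical SNP identifiers and P values, for illustration only.
example = {
    ("rs1", "rs2"): 0.004,
    ("rs2", "rs3"): 0.0009,
    ("rs3", "rs4"): 0.2,
    ("rs5", "rs6"): 0.008,
}
networks = snp_networks(example)
```

SNPs that take part in no significant pair (here the hypothetical `rs4`) are left out of every network.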

2.
Genetic variation analysis holds much promise as a basis for disease-gene association. However, due to the tremendous number of candidate single nucleotide polymorphisms (SNPs), there is a clear need to expedite genotyping by selecting and considering only a subset of all SNPs. This process is known as tagging SNP selection. Several methods for tagging SNP selection have been proposed, and have shown promising results. However, most of them rely on strong assumptions such as prior block-partitioning, bi-allelic SNPs, or a fixed number or location of tagging SNPs. We introduce BNTagger, a new method for tagging SNP selection, based on conditional independence among SNPs. Using the formalism of Bayesian networks (BNs), our system aims to select a subset of independent and highly predictive SNPs. Similar to previous prediction-based methods, we aim to maximize the prediction accuracy of tagging SNPs, but unlike them, we neither fix the number nor the location of predictive tagging SNPs, nor require SNPs to be bi-allelic. In addition, for newly-genotyped samples, BNTagger directly uses genotype data as input, while producing as output haplotype data of all SNPs. Using three public data sets, we compare the prediction performance of our method to that of three state-of-the-art tagging SNP selection methods. The results demonstrate that our method consistently improves upon previous methods in terms of prediction accuracy. Moreover, our method retains its good performance even when a very small number of tagging SNPs are used.

3.
Elucidating the effects of genetic polymorphisms on genes and gene networks is an important step in disease association studies. We developed the SNP2NMD database for human SNPs (single nucleotide polymorphisms) that result in PTCs (premature termination codons) and trigger nonsense-mediated mRNA decay (NMD). The SNP2NMD Web interfaces provide extensive genetic information on and graphical views of the queried SNP, gene, and disease terms. Availability: SNP2NMD is available from http://variome.net, or directly from http://bioportal.kobic.re.kr/SNP2NMD. Supplementary information: http://bioportal.kobic.re.kr/SNP2NMD/Wiki.jsp?page=Statistics.

4.
Li C  Zhang G  Li X  Rao S  Gong B  Jiang W  Hao D  Wu P  Wu C  Du L  Xiao Y  Wang Y 《Gene》2008,408(1-2):104-111
The advent of high-throughput single nucleotide polymorphism (SNP) omics technologies has brought tremendous amounts of genetic data. Systematic evaluation of genome-wide SNPs is expected to provide breakthroughs in the understanding of complex diseases. In this study, we developed a new systematic method for mapping multiple loci and applied it to construct a genetic network for rheumatoid arthritis (RA) via analysis of 746 multiplex families genotyped with more than five thousand genome-wide SNPs. We successfully identified 41 significant SNPs relevant to RA, 25 associated genes and a number of important SNP-SNP interactions (SNP patterns). Many findings (loci, genes and interactions) have experimental support from previous studies, while novel findings may define unknown genetic pathways for this complex disease. Finally, we constructed a genetic network by integrating the results from this analysis with the rapidly accumulating knowledge in biomedical domains, which gave us a more detailed insight into the RA etiology. The results suggest that the proposed systematic method is powerful when applied to genome-wide association studies. Integrating the analysis of high-throughput SNP data with knowledge-based SNP functional annotation offers a promising way to reverse-engineer the underlying genetic networks for complex human diseases.

5.
Li C  Li Y  Xu J  Lv J  Ma Y  Shao T  Gong B  Tan R  Xiao Y  Li X 《Gene》2011,489(2):119-129
Detection of the synergetic effects between variants, such as single-nucleotide polymorphisms (SNPs), is crucial for understanding the genetic characteristics of complex diseases. Here, we propose a two-step approach to detect differentially inherited SNP modules (synergetic SNP units) from a SNP network. First, SNP-SNP interactions are identified based on prior biological knowledge, such as their adjacency on the chromosome or the degree of relatedness between the functions of their genes. These interactions form SNP networks. Second, disease-risk SNP modules (or sub-networks) are prioritised by their differentially inherited properties in IBD (Identity by Descent) profiles of affected and unaffected sibpairs. The search process is driven by the disease information and follows the structure of a SNP network. Simulation studies have indicated that this approach achieves high accuracy and a low false-positive rate in the identification of known disease-susceptible SNPs. Applying this method to an alcoholism dataset, we found that flexible patterns of susceptible SNP combinations do play a role in complex diseases, and some known genes were detected through these risk SNP modules. One example is GRM7, a known alcoholism gene successfully detected by a SNP module composed of two SNPs, neither of which was significantly associated with the disease in single-locus analysis. These identified genes are also enriched in some pathways associated with alcoholism, including the calcium signalling pathway, axon guidance and neuroactive ligand-receptor interaction. The integration of network biology and genetic analysis provides putative functional bridges between genetic variants and candidate genes or pathways, thereby providing new insight into the aetiology of complex diseases.

6.
By using assembled expressed sequence tags (ESTs) from 14 different cDNA libraries that contain 84 132 sequence reads, 556 Populus candidate single nucleotide polymorphisms (SNPs) were identified. Because traces were not available from dbEST (http://www.ncbi.nlm.nih.gov/dbEST/index.html), stringent filters were used to identify reliable candidate SNPs. Sequence analysis indicated that the main types of substitutions among candidate SNPs were A/G and T/C transitions, which accounted for 22.0% and 30.8%, respectively. One hundred and ten candidate SNPs were tested. As a result, 38 candidate SNPs were confirmed by direct sequencing of PCR products amplified from six different individuals. Thirteen new SNPs in intron regions were found, and multiple SNPs were found to be located in both intron and exon regions of four contigs. Heterozygosity was found in all 47 candidate sites, and five SNP sites were heterozygous in all six samples. This is the first report of SNP identification in a tree species, and it reveals that assembled ESTs from multiple libraries of the public database may provide a rich source of comparative sequences for an SNP search in the poplar genome.
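The A/G and T/C substitutions reported above are transitions (purine to purine, or pyrimidine to pyrimidine); every other base change is a transversion. A small sketch of how candidate SNPs could be tallied by class is given below. The function names and the example allele pairs are hypothetical and are not taken from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(allele_1, allele_2):
    """Return 'transition' or 'transversion' for a bi-allelic SNP."""
    a, b = allele_1.upper(), allele_2.upper()
    if a == b or not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid base substitution: {allele_1}/{allele_2}")
    # Both alleles in the same chemical family means a transition.
    same_family = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


def class_fractions(snps):
    """Fraction of SNPs in each substitution class."""
    counts = Counter(substitution_class(a, b) for a, b in snps)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical allele pairs, for illustration only.
fractions = class_fractions([("A", "G"), ("T", "C"), ("A", "T"), ("G", "C")])
```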

7.

Background

Single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) are the most common types of polymorphism and are frequently used for molecular marker development. Such markers have become very popular for all kinds of genetic analysis, including haplotype reconstruction. Haplotypes can be reconstructed for whole chromosomes but also for specific genes, based on the SNPs present. Haplotypes in the latter context represent the different alleles of a gene. The computational approach to SNP mining is becoming increasingly popular because of the continuously increasing number of sequences deposited in databases, which allows a more accurate identification of SNPs. Several software packages have been developed for SNP mining from databases. Of these, QualitySNP is the only tool that combines SNP detection with the reconstruction of alleles, which results in a lower number of false positive SNPs and also works much faster than other programs. We have built a web-based SNP discovery and allele detection tool (HaploSNPer) based on QualitySNP.

Results

HaploSNPer is a flexible web-based tool for detecting SNPs and alleles in user-specified input sequences from both diploid and polyploid species. It includes BLAST for finding homologous sequences in public EST databases, CAP3 or PHRAP for aligning them, and QualitySNP for discovering reliable allelic sequences and SNPs. All possible and reliable alleles are detected by a mathematical algorithm using potential SNP information. Reliable SNPs are then identified based on the reconstructed alleles and on sequence redundancy.

Conclusion

Thorough testing of HaploSNPer (and the underlying QualitySNP algorithm) has shown that EST information alone is sufficient for the identification of alleles and that reliable SNPs can be found efficiently. Furthermore, HaploSNPer supplies a user-friendly interface for visualization of SNPs and alleles. HaploSNPer is available from http://www.bioinformatics.nl/tools/haplosnper/.

8.
The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs accurately representing the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or (2) necessary reduction of the huge SNP sets (obtained, e.g. from Affymetrix) for further fine haplotype analysis. A novel informative SNP selection method for unphased genotype data based on multiple linear regression (MLR) is implemented in the software package MLR-tagging. This software can be used for informative SNP (tag) selection and genotype prediction. The stepwise tag selection algorithm (STSA) selects positions of the given number of informative SNPs based on a genotype sample population. The MLR SNP prediction algorithm predicts a complete genotype based on the values of its informative SNPs, their positions among all SNPs, and a sample of complete genotypes. An extensive experimental study on various datasets including 10 regions from HapMap shows that the MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. (2005). AVAILABILITY: MLR-Tagging software package is publicly available at http://alla.cs.gsu.edu/~software/tagging/tagging.html

9.
As a consequence of the Human Genome Project and single nucleotide polymorphism (SNP) discovery projects, several million SNPs, which include possible susceptibility SNPs for multifactorial diseases, have been revealed. Accordingly, there has been a strong drive to investigate all candidate SNPs for a given disease without reducing the number of SNPs analyzed. We developed the DigiTag assay, which uses well-designed oligonucleotides called DNA coded numbers (DCNs) in multiplex SNP genotype analysis. During the analysis, the information of a genotype is converted to one of the DCNs in a one-to-one manner using an oligonucleotide ligation assay (encoding). After the encoding reaction, only the DCN regions, and not the SNP-specific regions, are amplified using universal primers, and the SNP genotype is then read out using DNA capillary arrays. The DigiTag assay was found to be successful in SNP genotyping, giving a high success rate (24 of 27 SNPs) for randomly chosen SNPs. Moreover, this assay has the potential to analyze almost all kinds of target SNPs at low cost by applying mismatch-induced probes and redesigned primer pairs.

10.
Genome-wide association studies (GWAS) have greatly facilitated the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNPs individually and are limited by low power and reproducibility, since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and the principal components analysis based approach (PCA) using simulated datasets under scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with a linear or identical-by-state kernel function are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.

11.
An international effort is underway to generate a comprehensive haplotype map (HapMap) of the human genome represented by an estimated 300 000 to 1 million ‘tag’ single nucleotide polymorphisms (SNPs). Our analysis indicates that the current human SNP map is not sufficiently dense to support the HapMap project. For example, 24.6% of the genome currently lacks SNPs at the minimal density and spacing that would be required to construct even a conservative tag SNP map containing 300 000 SNPs. In an effort to improve the human SNP map, we identified 140 696 additional SNP candidates using a new bioinformatics pipeline. Over 51 000 of these SNPs mapped to the largest gaps in the human SNP map, leading to significant improvements in these regions. Our SNPs will be immediately useful for the HapMap project, and will allow for the inclusion of many additional genomic intervals in the final HapMap. Nevertheless, our results also indicate that additional SNP discovery projects will be required both to define the haplotype architecture of the human genome and to construct comprehensive tag SNP maps that will be useful for genetic linkage studies in humans.

12.
We have developed a computer-based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertions/deletions identified using this approach represent true genetic variation in maize.
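The redundancy idea in the abstract above can be illustrated with a short sketch: a column of an alignment is reported as a candidate SNP only if its second most common base is seen at least twice, so that a single sequencing error does not create a candidate. Defining the redundancy score as the count of the second most common base is an assumption made here for illustration; the abstract does not give the exact definition, and the aligned reads below are invented.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_redundancy=2, gap="-"):
    """Scan alignment columns and report candidate SNPs.

    Returns (position, major_base, minor_base, redundancy) tuples, where
    redundancy is the number of reads carrying the minor base (an
    assumed definition, see the lead-in).
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    candidates = []
    for pos in range(length):
        # Ignore gaps and ambiguous calls when counting bases.
        counts = Counter(
            read[pos] for read in aligned_reads if read[pos] not in (gap, "N")
        )
        ranked = counts.most_common()
        if len(ranked) < 2:
            continue  # monomorphic column
        minor_base, minor_count = ranked[1]
        if minor_count >= min_redundancy:
            candidates.append((pos, ranked[0][0], minor_base, minor_count))
    return candidates


# Invented aligned reads: column 2 has two reads with A, column 7 only one.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACGA",
    "ACGTACGT",
]
snps = candidate_snps(reads)
```

With the default threshold, the lone `A` in the last column is treated as a probable sequencing error and is not reported.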

13.
14.
We have used linkage disequilibrium (LD) to identify single nucleotide polymorphisms (SNPs) on the Illumina Equine SNP50 BeadChip, which may be incorrectly positioned on the genome map. A total of 1201 Thoroughbred horses were genotyped using the Illumina Equine SNP50 BeadChip. LD was evaluated in a pairwise fashion between all autosomal SNPs, both within and across chromosomes. Filters were then applied to the data, firstly to identify SNPs that may have been mapped to the wrong chromosome and secondly to identify SNPs that may have been incorrectly positioned within chromosomes. We identified a single SNP on ECA28, which showed low LD with neighbouring SNPs but considerable LD with a group of SNPs on ECA10. Furthermore, a cluster of SNPs on ECA5 showed unusually low LD with surrounding SNPs. A total of 39 SNPs met the criteria for unusual within-chromosome LD. The results of this study indicate that some SNPs may be misplaced. This finding is significant, as misplaced SNPs may lead to difficulties in the application of genomic methods, such as homozygosity mapping, for which SNP order is important.
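A rough sketch of the kind of filter described above: compute pairwise LD between SNPs and flag any SNP whose LD with its immediate map neighbours is unusually low. The sketch approximates r² by the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of one allele), which is a common shortcut for unphased data and not necessarily the estimator used in the study. The 0.1 threshold, the one-neighbour window and the genotype vectors are all hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors.

    Missing calls are given as None and skipped pairwise.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def low_neighbour_ld(genotypes_in_map_order, threshold=0.1):
    """Indices of SNPs whose best r2 with an adjacent SNP is below threshold."""
    flagged = []
    for i, genotype in enumerate(genotypes_in_map_order):
        neighbours = (
            genotypes_in_map_order[max(0, i - 1):i]
            + genotypes_in_map_order[i + 1:i + 2]
        )
        if neighbours and max(genotype_r2(genotype, n) for n in neighbours) < threshold:
            flagged.append(i)
    return flagged


# Hypothetical dosages for four animals at five SNPs in map order.
typical = [0, 2, 0, 2]
odd_one = [0, 0, 2, 2]  # uncorrelated with its neighbours
suspects = low_neighbour_ld([typical, typical, odd_one, typical, typical])
```

A flagged SNP is only a candidate for misplacement; the study additionally looks for high LD with SNPs elsewhere in the genome before drawing that conclusion.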

15.
Schizophrenia (SZ) is a complex disorder resulting from both genetic and environmental causes, with a worldwide lifetime prevalence of 1%; however, there are no specific, sensitive and validated biomarkers for SZ. A general unifying hypothesis has been put forward that disease-associated single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) are more likely to be associated with gene expression quantitative trait loci (eQTL). We describe this hypothesis and review the primary methodology, with refinements, for testing this paradigmatic approach in SZ. We describe biomarker studies of SZ and tests for enrichment of SNPs that are associated both with eQTLs and with existing GWAS of SZ. SZ-associated SNPs that overlap with eQTLs can be placed into gene-gene expression, protein-protein and protein-DNA interaction networks. Further, those networks can be tested by reducing or silencing the gene expression levels of critical nodes. We present pilot data to support these methods of investigation, such as the use of eQTLs to annotate GWAS of SZ, which could be applied to the field of biomarker discovery. Those networks that are associated with SNP markers, especially those showing cis-regulated expression, might lead to a clearer understanding of important candidate genes that predispose to disease and alter expression. This method has general application to many complex disorders.

16.
MOTIVATION: Single nucleotide polymorphism (SNP) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large-scale sequencing projects. The increasing availability of sequence trace data in public repositories makes it feasible to evaluate SNP predictions at the DNA chromatogram level. MAVIANT, a platform-independent Multipurpose Alignment VIewing and Annotation Tool, provides DNA chromatogram and alignment views and facilitates evaluation of predictions. In addition, it supports direct manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non-synonymous SNPs were analyzed for their potential effect on protein structure/function using the PolyPhen and SIFT prediction programs. Predicted SNPs and annotations are stored in a web-based database. Using MAVIANT, SNPs can be verified visually based on the DNA sequencing traces. A subset of candidate SNPs was selected for experimental validation by resequencing and genotyping. This study provides a web-based DNA chromatogram and contig browser that facilitates the evaluation and selection of candidate SNPs, which can be applied as genetic markers for genome-wide genetic studies. AVAILABILITY: The stand-alone version of the MAVIANT program for local use is freely available under GPL license terms at http://snp.agrsci.dk/maviant. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

17.
We have developed an online program, WCLUSTAG, for tag SNP selection that allows the user to specify variable tagging thresholds for different SNPs. Tag SNPs are selected such that a SNP with user-specified tagging threshold C will have a minimum R² of C with at least one tag SNP. This flexible feature is useful for researchers who wish to prioritize genomic regions or SNPs in an association study. AVAILABILITY: The online WCLUSTAG program is available at http://bioinfo.hku.hk/wclustag/
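The selection criterion stated above (each SNP with threshold C must reach an R² of at least C with some tag SNP) can be met by a simple greedy cover, sketched below. This is not the WCLUSTAG algorithm itself, whose details are not given in the abstract; the greedy strategy, the function name and the R² values are assumptions for illustration.

```python
def select_tags(r2, thresholds):
    """Greedy tag selection with a per-SNP tagging threshold.

    r2 maps unordered SNP pairs (as tuples) to R-squared values; pairs
    that are absent are treated as R-squared 0. thresholds maps each SNP
    to its required minimum R-squared with at least one tag.
    """
    snps = sorted(thresholds)

    def covers(tag, snp):
        if tag == snp:
            return True  # a tag always tags itself
        value = r2.get((tag, snp), r2.get((snp, tag), 0.0))
        return value >= thresholds[snp]

    uncovered = set(snps)
    tags = []
    while uncovered:
        # Pick the SNP that tags the most still-uncovered SNPs;
        # ties go to the first SNP in sorted order.
        best = max(snps, key=lambda t: sum(covers(t, s) for s in uncovered))
        tags.append(best)
        uncovered = {s for s in uncovered if not covers(best, s)}
    return tags


# Hypothetical R-squared values and thresholds, for illustration only.
pairwise_r2 = {("rs1", "rs2"): 0.9, ("rs2", "rs3"): 0.9, ("rs3", "rs4"): 0.6}
wanted = {"rs1": 0.8, "rs2": 0.8, "rs3": 0.95, "rs4": 0.5}
tags = select_tags(pairwise_r2, wanted)
```

In this example the strict threshold on the hypothetical `rs3` means no other SNP can tag it, so it is selected itself, while the relaxed threshold on `rs4` lets `rs3` tag it at R² = 0.6.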

18.
Event-related oscillations (EROs) represent highly heritable neuroelectric correlates of cognitive processes that manifest deficits in alcoholics and in offspring at high risk of developing alcoholism. Theta ERO to targets in the visual oddball task has been shown to be an endophenotype for alcoholism. A family-based genome-wide association study was performed for the frontal theta ERO phenotype using 634 583 autosomal single nucleotide polymorphisms (SNPs) genotyped in 1560 family members from 117 families densely affected by alcohol use disorders, recruited in the Collaborative Study on the Genetics of Alcoholism. Genome-wide significant association was found with several SNPs on chromosome 21 in KCNJ6 (a potassium inward rectifier channel; KIR3.2/GIRK2), with the most significant SNP at P = 4.7 × 10⁻¹⁰. The same SNPs were also associated with EROs from central and parietal electrodes, but with less significance, suggesting that the association is frontally focused. One imputed synonymous SNP in exon four, highly correlated with our top three SNPs, was significantly associated with the frontal theta ERO phenotype. These results suggest that KCNJ6 or its product GIRK2 accounts for some of the variation in frontal theta band oscillations. GIRK2 receptor activation contributes to slow inhibitory postsynaptic potentials that modulate neuronal excitability and therefore influence neuronal networks.

19.

Background

There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C).

Methodology/Principal Findings

A 384-SNP GoldenGate genotyping array was built from (i) 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium, and (ii) 200 SNPs screened from ESTs (in silico SNPs), selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate was very low (0.54%, dropping to 0.06% when four SNPs showing elevated error rates were removed).

Conclusions/Significance

This study demonstrates that ESTs provide a resource for SNP identification in non-model species that requires no additional bench work and little bioinformatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than that of in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per-SNP error rate should help identify markers that are not suitable for the GoldenGate technology in species characterized by a large and complex genome.

20.