Similar literature
Found 20 similar articles (search time: 15 ms)
1.
The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, employing the clustering algorithm before testing a large set of data helps narrow down the genetic regions that harbor susceptibility markers.
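The Hamming-distance idea in the abstract above can be sketched in a few lines. This is only an illustration, not the authors' clustering rule or the HDAT test: the genotype coding (one character per individual, 0/1/2 minor-allele counts), the single-linkage merge rule, the `max_distance` cutoff, and the SNP names are all assumptions made for the example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions (individuals) at which two genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_snps(genotypes, max_distance):
    """Single-linkage agglomerative clustering with a distance cutoff.

    The linkage rule and the cutoff are illustrative assumptions; the
    abstract does not specify how SNP-sets are merged.
    """
    clusters = [{name} for name in genotypes]

    def linkage(c1, c2):
        # Smallest Hamming distance between any member of c1 and any member of c2.
        return min(hamming(genotypes[s], genotypes[t]) for s in c1 for t in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j is guaranteed by combinations).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical SNPs: each string holds one genotype code per individual.
snps = {
    "rs_a": "0120121",
    "rs_b": "0120120",
    "rs_c": "2102001",
    "rs_d": "2102011",
}
print(cluster_snps(snps, max_distance=1))
```

Recording the merge distances instead of stopping at a cutoff would give the heights needed to draw a dendrogram.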

2.
To detect the role of a candidate gene for a trait in a sample of individuals, we may test SNP haplotype or diplotype effects. For a limited sample size, many haplotype or diplotype categories may contain few individuals. This reduces power when testing the association between the trait and the haplotypes or diplotypes, because these categories provide little additional information while increasing the degrees of freedom. The present paper proposes a new strategy to group rare categories based on a measure of similarity between haplotypes or diplotypes and compares it to two other possible strategies for dealing with rare categories: a SNP selection strategy based on haplotype diversity, and a grouping strategy that pools all rare categories into a single baseline group. This comparison is performed by means of simulation under four scenarios. We show that the new strategy yields the largest increase in power irrespective of the model underlying the effect of the candidate gene on the studied trait. This strategy therefore provides a powerful alternative to currently used methods for reducing the number of rare categories.

3.
4.
Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. For more insights on human evolution, ethnic variation, and population assignment, we propose to find out which SNPs are significant in determining the population groups and then to classify different populations using these relevant SNPs as input features. In this study, we developed a modified t-test ranking measure and applied it to the HapMap genotype data. Firstly, we rank all SNPs in comparison with other feature importance measures including F-statistics and the informativeness for assignment. Secondly, we select different numbers of the most highly ranked SNPs as the input to a classifier, such as the support vector machine, so as to find the best feature subset corresponding to the best classification accuracy. Experimental results showed that the proposed method is very effective in finding SNPs that are significant in determining the population groups, with reduced computational burden and better classification accuracy.
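A minimal sketch of ranking SNPs by a two-sample t-statistic follows. The abstract does not define its modified t-test, so the ordinary Welch statistic is used here as a stand-in. The genotype coding (0/1/2 allele counts), the population labels, and the SNP names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Absolute Welch t-statistic for two samples (each needs at least 2 values)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    diff = abs(mean(x) - mean(y))
    if se == 0:
        # No within-group variation: identical means score 0, otherwise infinity.
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def rank_snps(genotypes, labels, group_a, group_b):
    """Return (snp, score) pairs sorted from most to least discriminating."""
    scores = {}
    for snp, calls in genotypes.items():
        a = [g for g, lab in zip(calls, labels) if lab == group_a]
        b = [g for g, lab in zip(calls, labels) if lab == group_b]
        scores[snp] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: six individuals, two populations, three SNPs.
labels = ["POP1", "POP1", "POP1", "POP2", "POP2", "POP2"]
genotypes = {
    "snp1": [0, 0, 1, 2, 2, 1],
    "snp2": [1, 1, 1, 1, 1, 1],
    "snp3": [0, 1, 2, 1, 2, 2],
}
ranking = rank_snps(genotypes, labels, "POP1", "POP2")
print(ranking)
```

The top-k entries of `ranking` would then serve as the input features for a classifier; with more than two populations, a pairwise or one-versus-rest variant of the score would be needed.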

7.
The acetylating activity of N-acetyltransferase 2 (NAT2) has critical implications for therapeutics and disease susceptibility. To date, several polymorphisms that alter the enzymatic activity and/or protein stability of NAT2 have been identified. We examined the distribution and frequency of NAT2 genotypes in the Mexican population. Among 250 samples amplified and sequenced for the NAT2 gene, we found seven different SNPs; the most frequent allele was 803 A>G (35.8%), followed by 282 C>T, 341 T>C, and 481 C>T. There were no differences in the distribution of SNPs between healthy subjects and cancer patients. These eight polymorphisms defined 26 diplotypes; 11.6% were wild type (NAT2*4/NAT2*4), while the most common diplotype was NAT2*4/NAT2*5B, present in 17.2%. We did not identify other common polymorphisms. The results were compared with the NAT2 SNPs reported from other populations. All populations except the Turkish one differed significantly from ours. We conclude that the mixed-race Mexican population requires special attention because NAT2 genotype frequencies differ from those in other regions of the world.

8.
Insulin receptor-related receptor (IRR) is an orphan receptor tyrosine kinase of the insulin receptor family and is involved in the growth and reproduction processes of the Pacific oyster Crassostrea gigas. Polymorphisms of the IRR gene were evaluated for associations with growth performance in 336 individuals from five families, and further confirmed in 206 individuals from three strains selectively bred for fast growth. Two of the six identified synonymous mutations (C.1996G > A and C.2110C > T) were significantly associated with growth performance in the families and strains. Five diplotypes were constructed based on the two growth-related SNPs, and diplotype analysis revealed that D3 (GGTT) might be the most advantageous diplotype for growth traits. The results suggest that two SNPs (C.1996G > A and C.2110C > T) in the IRR gene are potentially associated with growth performance of C. gigas and could serve as genetic markers for fast growth in oyster breeding.

9.
Estrogen receptor (ERalpha) modifies the expression of genes involved in cell growth, proliferation and differentiation through binding to estrogen response elements (EREs) located in a number of gene promoters, so the ERalpha gene is considered an important factor affecting reproductive endocrinology in Japanese flounder (Paralichthys olivaceus). In this study, twelve single nucleotide polymorphisms (SNPs) within eight CDS exons and 1 kb of the 3'-UTR of the ERalpha gene were tested for association with four reproductive traits in a population of 119 Japanese flounder individuals using polymerase chain reaction-single stranded conformational polymorphism (PCR-SSCP). The association analysis of SNPs within the Japanese flounder ERalpha gene with the reproductive traits was carried out using General Linear Model (GLM) estimation. Results indicated that two SNPs in exon 4 of the ERalpha gene, P1 (A803G and C864T), were significantly associated with hepatosomatic index (HSI) (P<0.05) in female Japanese flounder. Analysis of the other ten SNPs in the 3'-UTR for association with serum 17beta-estradiol (E(2)) and HSI showed that P2 (A1982T) was significantly associated with E(2) (P<0.01) and P3 (A2149G, 2181TTACAAG2182 insertion or deletion, T2324G, A2359G and G2391A) was significantly associated with HSI (P<0.05) in female Japanese flounder. However, P2 (A1982T) and P4 (G2256T, T2294C, T2309G and A2333T) had significant effects on E(2) (P<0.05 and P<0.01, respectively) in male Japanese flounder. In addition, there were significant associations between diplotype D1 based on fourteen SNPs and reproductive traits. The genetic effects of diplotype D1 for HSI (female) or E(2) (male) were significantly higher than those of the other eight diplotypes (P<0.05), respectively. Our findings imply that P1 of the ERalpha gene, which affects the reproductive traits, could be a potential QTN (quantitative trait nucleotide) and a useful genetic marker for the selection of some reproductive traits in Japanese flounder.

10.
11.
Model-based (likelihood and Bayesian) and non-model-based (PCA and K-means clustering) methods were developed to identify populations and assign individuals to the identified populations using marker genotype data. Model-based methods are favoured because they are based on a probabilistic model of population genetics with biologically meaningful parameters and thus produce results that are easily interpretable and applicable. Furthermore, they often yield more accurate structure inferences than non-model-based methods. However, current model-based methods either are computationally demanding and thus applicable to small problems only or use simplified admixture models that could yield inaccurate results in difficult situations such as unbalanced sampling. In this study, I propose new likelihood methods for fast and accurate population admixture inference using genotype data from a few multiallelic microsatellites to millions of diallelic SNPs. The methods conduct first a clustering analysis of coarse-grained population structure by using the mixture model and the simulated annealing algorithm, and then an admixture analysis of fine-grained population structure by using the clustering results as a starting point in an expectation maximisation algorithm. Extensive analyses of both simulated and empirical data show that the new methods compare favourably with existing methods in both accuracy and running speed. They can analyse small datasets with just a few multiallelic microsatellites but can also handle in parallel terabytes of data with millions of markers and millions of individuals. In difficult situations such as many and/or lowly differentiated populations, unbalanced or very small samples of individuals, the new methods are substantially more accurate than other methods.
Subject terms: Population genetics, Evolutionary ecology

12.
Many studies in human genetics compare the informativeness of single‐nucleotide polymorphisms (SNPs) and microsatellites (simple sequence repeats; SSR) in genome scans, but it is difficult to transfer the results directly to livestock because of different population structures. The aim of this study was to determine the number of SNPs needed to obtain the same differentiation power as with a given standard set of microsatellites. Eight chicken breeds were genotyped for 29 SSRs and 9216 SNPs. After filtering, only 2931 SNPs remained. The differentiation power was evaluated using two methods: partitioning of the Euclidean distance matrix based on a principal component analysis (PCA) and a Bayesian model‐based clustering approach. Generally, with PCA‐based partitioning, 70 SNPs provide a comparable resolution to 29 SSRs. In model‐based clustering, the similarity coefficient showed significantly higher values between repeated runs for SNPs compared to SSRs. For the membership coefficients, reflecting the proportion to which a fraction of the genome belongs to the ith cluster, the highest values were obtained for 29 SSRs and 100 SNPs respectively. With a low number of loci (29 SSRs or ≤100 SNPs), neither marker type could detect the admixture in the Gödöllö Nhx population. Using more than 250 SNPs allowed a more detailed insight into the genetic architecture, so the admixed population could be detected. It is concluded that breed differentiation studies will substantially gain power even with moderate numbers of SNPs.

13.
Sequences associated with human iris pigmentation   Total citations: 7, self-citations: 0, citations by others: 7
To determine whether and how common polymorphisms are associated with natural distributions of iris colors, we surveyed 851 individuals of mainly European descent at 335 SNP loci in 13 pigmentation genes and 419 other SNPs distributed throughout the genome and known or thought to be informative for certain elements of population structure. We identified numerous SNPs, haplotypes, and diplotypes (diploid pairs of haplotypes) within the OCA2, MYO5A, TYRP1, AIM, DCT, and TYR genes and the CYP1A2-15q22-ter, CYP1B1-2p21, CYP2C8-10q23, CYP2C9-10q24, and MAOA-Xp11.4 regions as significantly associated with iris colors. Half of the associated SNPs were located on chromosome 15, which corresponds with results that others have previously obtained from linkage analysis. We identified 5 additional genes (ASIP, MC1R, POMC, and SILV) and one additional region (GSTT2-22q11.23) with haplotype and/or diplotypes, but not individual SNP alleles associated with iris colors. For most of the genes, multilocus gene-wise genotype sequences were more strongly associated with iris colors than were haplotypes or SNP alleles. Diplotypes for these genes explain 15% of iris color variation. Apart from representing the first comprehensive candidate gene study for variable iris pigmentation and constituting a first step toward developing a classification model for the inference of iris color from DNA, our results suggest that cryptic population structure might serve as a leverage tool for complex trait gene mapping if genomes are screened with the appropriate ancestry informative markers.

14.
A genotype calling algorithm for Affymetrix SNP arrays   Total citations: 11, self-citations: 0, citations by others: 11
MOTIVATION: A classification algorithm, based on a multi-chip, multi-SNP approach is proposed for Affymetrix SNP arrays. Current procedures for calling genotypes on SNP arrays process all the features associated with one chip and one SNP at a time. Using a large training sample where the genotype labels are known, we develop a supervised learning algorithm to obtain more accurate classification results on new data. The method we propose, RLMM, is based on a robustly fitted, linear model and uses the Mahalanobis distance for classification. The chip-to-chip non-biological variance is reduced through normalization. This model-based algorithm captures the similarities across genotype groups and probes, as well as across thousands of SNPs for accurate classification. In this paper, we apply RLMM to Affymetrix 100 K SNP array data, present classification results and compare them with genotype calls obtained from the Affymetrix procedure DM, as well as to the publicly available genotype calls from the HapMap project.
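The classification step mentioned above, assigning a sample to the genotype cluster with the smallest Mahalanobis distance, can be sketched as follows. This is not RLMM itself: the robust linear model and the normalization are omitted. The two-dimensional allele-intensity features, the cluster means, and the covariance matrices are hypothetical values chosen for the example.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a cluster.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def call_genotype(x, clusters):
    """Return the genotype whose cluster is nearest in Mahalanobis distance."""
    return min(clusters, key=lambda g: mahalanobis_sq(x, *clusters[g]))


# Hypothetical per-genotype cluster parameters: (mean, covariance) of
# (allele A intensity, allele B intensity), as if learned from training data.
clusters = {
    "AA": ((2.0, 0.5), ((0.04, 0.0), (0.0, 0.01))),
    "AB": ((1.2, 1.2), ((0.04, 0.01), (0.01, 0.04))),
    "BB": ((0.5, 2.0), ((0.01, 0.0), (0.0, 0.04))),
}
print(call_genotype((1.9, 0.6), clusters))
```

Unlike Euclidean distance, the Mahalanobis distance scales each direction by the cluster's own spread, so a tight cluster does not attract points that merely lie near its centre on the raw scale.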

15.
MOTIVATION: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. RESULTS: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence and structure based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.
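One common way to turn a similarity network into per-protein affinities is a random walk with restart, sketched below for a single attribute. The abstract does not say which random-walk variant the authors use, and the Bayesian combination step is not shown. The restart probability, the iteration count, the protein names, and the edge weights are all assumptions.

```python
def random_walk_with_restart(weights, start, restart=0.3, iterations=100):
    """Approximate the stationary visiting probabilities of a walker that
    follows edges in proportion to their weight and jumps back to `start`
    with probability `restart` at every step.

    weights maps each node to a dict {neighbour: similarity}; every node
    must appear as a key.
    """
    nodes = sorted(weights)
    p = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            if total == 0:
                # Isolated node: return its probability mass to the start node.
                nxt[start] += (1 - restart) * p[u]
                continue
            for v, w in weights[u].items():
                nxt[v] += (1 - restart) * p[u] * w / total
        p = nxt
    return p


# Hypothetical sequence-similarity network over four proteins (a path graph).
network = {
    "P1": {"P2": 0.9},
    "P2": {"P1": 0.9, "P3": 0.5},
    "P3": {"P2": 0.5, "P4": 0.2},
    "P4": {"P3": 0.2},
}
affinity = random_walk_with_restart(network, "P1")
print(affinity)
```

Because the walker explores multi-step paths, a protein reachable through several strong intermediate links can receive a higher affinity than one joined by a single weak direct edge, which is the "implicit clustering information" a purely local ranking misses.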

16.
We employed a multi-scale clustering methodology known as “data cloud geometry” to extract functional connectivity patterns derived from functional magnetic resonance imaging (fMRI) protocol. The method was applied to correlation matrices of 106 regions of interest (ROIs) in 29 individuals with autism spectrum disorders (ASD), and 29 individuals with typical development (TD) while they completed a cognitive control task. Connectivity clustering geometry was examined at both “fine” and “coarse” scales. At the coarse scale, the connectivity clustering geometry produced 10 valid clusters with a coherent relationship to neural anatomy. A supervised learning algorithm employed fine scale information about clustering motif configurations and prevalence, and coarse scale information about intra- and inter-regional connectivity; the algorithm correctly classified ASD and TD participants with sensitivity of and specificity of . Most of the predictive power of the logistic regression model resided at the level of the fine-scale clustering geometry, suggesting that cellular versus systems level disturbances are more prominent in individuals with ASD. This article provides validation for this multi-scale geometric approach to extracting brain functional connectivity pattern information and for its use in classification of ASD.

17.
The ability to maintain an appropriate physical distance (i.e., interpersonal distance) from others is a critical aspect of social interaction and contributes importantly to real-life social functioning. In Study 1, using parent-report data that had been acquired on a large number of individuals (ages 4–18 years) for the Autism Genetic Resource Exchange and the Simons Simplex Collection, we found that those with Autism Spectrum Disorder (ASD; n = 766) more often violated the space of others compared to their unaffected siblings (n = 766). This abnormality held equally across ASD diagnostic categories, and correlated with clinical measures of communication and social functioning. In Study 2, laboratory experiments in a sample of high-functioning adults with ASD demonstrated an altered relationship between interpersonal distance and personal space, and documented a complete absence of personal space in 3 individuals with ASD. Furthermore, anecdotal self-report from several participants confirmed that violations of social distancing conventions continue to occur in real-world interactions through adulthood. We suggest that atypical social distancing behavior offers a practical and sensitive measure of social dysfunction in ASD, and one whose psychological and neurological substrates should be further investigated.

18.
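Correlation analysis between ODC1 genotypes and weight gain in Jian carp   Total citations: 1, self-citations: 0, citations by others: 1
PCR-RFLP methods were established for six SNP loci in the jlODC1a gene and four SNP loci in the jlODC1b gene, the ornithine decarboxylase (ODC) genes of Jian carp (Cyprinus carpio var. jian), and the genotypes at these 10 loci were determined in a selectively bred population of about 900 Jian carp from 12 families. Genotype frequencies differed among loci, and minor allele frequencies (MAF) ranged from 0.14 to 0.48. Association analysis between the genotypes at each locus and weight gain showed that seven SNPs were significantly associated with weight gain in Jian carp, and that more SNP loci in the ODC1 genes were associated with weight gain in females (seven) than in males (four). Marker enrichment analysis showed that the more SNPs at which an individual carried a favorable genotype for weight gain, the faster it gained weight: individuals enriched for four markers gained weight significantly faster on average than those enriched for 0 to 3, and about 14% faster than those with no favorable marker, which reflects that growth is a quantitative trait. Diplotype analysis of the tested loci further showed that group 4567 (XXXXXXACCTCTCT), which carried four favorable genotypes, all of them heterozygous, gained weight fastest, up to 26.6% faster than individuals with no favorable genotype, and could be considered for future breeding programs for fast-growing Jian carp. In addition, the diplotype analysis indicated that different loci may interact antagonistically or synergistically; for example, loci 1 and 4 are antagonistic. Future breeding programs should therefore consider the relationships among markers as well as marker enrichment, and should enrich as many SNP markers as possible while selecting favorable genotypes that act synergistically.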

19.
Interval-based distance function for identifying RNA structure candidates   Total citations: 1, self-citations: 0, citations by others: 1
Many clustering approaches have been developed for biological data analysis; however, applying traditional clustering algorithms to RNA structure data remains challenging because of the complex secondary structures involved. One of the most critical issues in cluster analysis is the development of appropriate distance measures in high-dimensional space. Traditional distance measures focus on scale issues but ignore the correlation between two values. This article develops a novel interval-based distance (Hausdorff) measure for computing the similarity between characterized structures. Three relationships are considered: perfect match, partially overlapped and non-overlapped. Finally, we demonstrate the methods by analyzing a data set of RNA secondary structures from the Rfam database.
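For two closed intervals on a line, the Hausdorff distance reduces to the larger of the two endpoint differences, which makes the three relationships named above easy to illustrate. The sketch below assumes each structural element is summarized as a (start, end) span of sequence positions. That encoding, and the choice to count containment as a partial overlap, are assumptions rather than the paper's definition.

```python
def interval_hausdorff(p, q):
    """Hausdorff distance between closed intervals p = (a, b) and q = (c, d).

    For intervals on the real line this equals max(|a - c|, |b - d|).
    """
    (a, b), (c, d) = p, q
    return max(abs(a - c), abs(b - d))


def relationship(p, q):
    """Classify two intervals as one of the three cases named in the abstract."""
    if p == q:
        return "perfect match"
    if p[0] <= q[1] and q[0] <= p[1]:
        # Any shared position counts as overlap (containment included).
        return "partially overlapped"
    return "non-overlapped"


# Hypothetical spans (start, end positions) of structural elements.
pairs = [((3, 10), (3, 10)), ((3, 10), (5, 14)), ((3, 10), (12, 20))]
for p, q in pairs:
    print(p, q, relationship(p, q), interval_hausdorff(p, q))
```

A perfect match gives distance 0, and the distance grows as the spans shift apart, so the measure orders the three cases consistently without a separate rule for each.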

20.
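This study aimed to investigate the effect of the growth hormone-releasing hormone (GHRH) gene on growth traits in channel catfish (Ictalurus punctatus). Single nucleotide polymorphism (SNP) loci in the GHRH gene were screened by pooled DNA sequencing, the identified SNPs were genotyped with the SNaPshot method, and linkage disequilibrium and haplotype analyses were performed on these loci. Four SNP loci were detected in the intron regions of the GHRH gene, and three of them were successfully genotyped; no strong linkage disequilibrium was found among the three loci, which formed six effective haplotypes in 176 channel catfish. Association analysis showed that body weight of the AA genotype at SNP locus g.6301 G>A was significantly higher than that of the AG and GG genotypes (P<0.05) and 14% higher than the population mean. Body weight and body length of individuals carrying the haplotype combinations H1/H4 and H1/H5 were highly significantly greater than those of other haplotype combinations (P<0.01): body weight was 30% and 15% higher than the population mean, respectively, and body length was 7% and 6% higher than the population mean, respectively. The study provides a reference for molecular marker-assisted selection and QTL mapping of growth traits in channel catfish.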
