The extended Simes’ test (known as GATES) and the scaled chi-square test were proposed to combine a set of dependent genome-wide association signals at multiple single-nucleotide polymorphisms (SNPs) for assessing the overall significance of association at the gene or pathway level. The two tests use different strategies to combine association p values, and each can outperform the other depending on the number of SNPs and the linkage disequilibrium between them. In this paper, we introduce a hybrid set-based test (HYST) combining the two tests for genome-wide association studies (GWASs). We describe how HYST can be used to evaluate statistical significance for association at the protein-protein interaction (PPI) level in order to increase power for detecting disease-susceptibility genes of moderate effect size. Computer simulations demonstrated that HYST had a reasonable type I error rate and was generally more powerful than its parent tests and other alternative tests in detecting a PPI pair where both genes are associated with the disease of interest. We applied the method to three complex disease GWAS data sets in the public domain; the method detected a number of highly connected significant PPI pairs involving multiple confirmed disease-susceptibility genes not found in the SNP- and gene-based association analyses. These results indicate that HYST can be effectively used to examine a collection of predefined SNP sets based on prior biological knowledge for revealing additional disease-predisposing genes of modest effects in GWASs.
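
As a rough illustration of the p-value combination that the extended Simes' test builds on, the following sketch computes the classic Simes combined p-value for a set of SNP-level p values. It is not GATES itself: the linkage-disequilibrium adjustment that the extended test applies is not reproduced here, and the function name `simes_p` is a hypothetical one chosen for this example.

```python
def simes_p(pvalues):
    """Classic Simes combined p-value: min over ranks j of m * p_(j) / j.

    Assumption: plain Simes procedure, without the LD-based adjustment
    used by the extended (GATES) test described in the abstract.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    m = len(pvalues)
    ordered = sorted(pvalues)  # p_(1) <= p_(2) <= ... <= p_(m)
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)  # cap at 1 so the result is a valid p-value


if __name__ == "__main__":
    # Hypothetical SNP-level p values for one gene.
    print(simes_p([0.01, 0.04, 0.03]))
```

Because the combined value is driven by the best rank-adjusted p value, a single strong signal in a gene is enough to make the gene-level result small, which is one reason this style of test behaves differently from a sum-based chi-square test.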

Deep sequencing will soon generate comprehensive sequence information in large disease samples. Although the power to detect association with an individual rare variant is limited, pooling variants by gene or pathway into a composite test provides an alternative strategy for identifying susceptibility genes. We describe a statistical method for detecting association of multiple rare variants in protein-coding genes with a quantitative or dichotomous trait. The approach is based on the regression of phenotypic values on individuals' genotype scores subject to a variable allele-frequency threshold, incorporating computational predictions of the functional effects of missense variants. Statistical significance is assessed by permutation testing with variable thresholds. We used a rigorous population-genetics simulation framework to evaluate the power of the method, and we applied the method to empirical sequencing data from three disease studies.
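
The variable-threshold idea with permutation testing can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the statistic is a plain normalized covariance between phenotype and carrier score, the functional-prediction weights for missense variants are omitted, the test is one-sided (variants that raise the trait), and the names `vt_statistic` and `vt_permutation_p` are hypothetical.

```python
import math
import random


def vt_statistic(genotypes, mafs, phenotype):
    """Maximum over allele-frequency thresholds of a burden-style z score.

    genotypes: one row per individual, one minor-allele count per variant.
    mafs: minor allele frequency of each variant.
    phenotype: one trait value per individual.
    """
    mean_y = sum(phenotype) / len(phenotype)
    centered = [y - mean_y for y in phenotype]
    best = -math.inf
    for threshold in sorted(set(mafs)):
        # Keep only variants at or below the current frequency threshold.
        keep = [j for j, f in enumerate(mafs) if f <= threshold]
        scores = [sum(row[j] for j in keep) for row in genotypes]
        denom = math.sqrt(sum(s * s for s in scores))
        if denom == 0:
            continue  # nobody carries a variant below this threshold
        z = sum(s * c for s, c in zip(scores, centered)) / denom
        best = max(best, z)
    return best


def vt_permutation_p(genotypes, mafs, phenotype, n_perm=1000, seed=0):
    """Permutation p-value for the maximized statistic (phenotypes shuffled)."""
    rng = random.Random(seed)
    observed = vt_statistic(genotypes, mafs, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if vt_statistic(genotypes, mafs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)
```

Maximizing over thresholds inside each permutation is what keeps the procedure valid: the same search over thresholds is repeated on every shuffled data set, so the selection of the best threshold is accounted for in the null distribution.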

Conflict analysis has been used as an important tool in economic, business, governmental and political disputes, games, management negotiations, military operations, and other settings. Many formal mathematical models have been proposed to handle conflict situations, and one of the most popular is rough set theory. With its ability to handle vagueness in conflict data sets, rough set theory has been used successfully. However, computational time is still an issue when determining the certainty, coverage, and strength of conflict situations. In this paper, we present an alternative approach to handling conflict situations, based on ideas from soft set theory. The novelty of the proposed approach is that, unlike rough set theory, which uses decision rules, it is based on the concept of co-occurrence of parameters in soft set theory. We illustrate the proposed approach by means of a tutorial example of voting analysis in conflict situations. Furthermore, we apply the proposed approach to a real-world data set of political conflict in the Indonesian Parliament. We show that the proposed approach reduces computational time by up to 3.9% compared with rough set theory.

We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare-variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare-variant tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by instead calculating score statistics, which only require fitting the null model for each study, and then aggregating these score statistics across studies. Our proposed meta-analysis rare-variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual-level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
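
The aggregation of study-specific summary statistics can be illustrated with a minimal burden-type sketch. It assumes homogeneous genetic effects across studies (the heterogeneity handling mentioned in the abstract is not covered), treats each study's input as a vector of per-variant score statistics plus a between-variant covariance matrix, and uses a hypothetical function name, `meta_burden_test`.

```python
import math


def meta_burden_test(studies, weights):
    """Burden test from summed per-study score statistics.

    studies: list of (scores, cov) pairs, where scores[j] is the score
             statistic of variant j and cov is the matching covariance matrix.
    weights: one burden weight per variant.
    Returns (Q, p) with Q referred to a chi-square distribution with 1 df.
    Assumption: genetic effects are homogeneous across studies.
    """
    m = len(weights)
    total_u = [0.0] * m
    total_v = [[0.0] * m for _ in range(m)]
    for scores, cov in studies:
        for j in range(m):
            total_u[j] += scores[j]
            for k in range(m):
                total_v[j][k] += cov[j][k]
    numerator = sum(w * u for w, u in zip(weights, total_u))
    variance = sum(
        weights[j] * total_v[j][k] * weights[k]
        for j in range(m)
        for k in range(m)
    )
    if variance <= 0:
        raise ValueError("weighted variance must be positive")
    q = numerator * numerator / variance
    # Upper tail of chi-square(1): P(X > q) = erfc(sqrt(q / 2)).
    return q, math.erfc(math.sqrt(q / 2.0))
```

Only the null model has to be fitted within each study to obtain these inputs, which is why no per-variant regression coefficient is ever estimated in this sketch.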

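An optimized method for C-terminal sequencing of peptides by MALDI-TOF mass spectrometry
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry was used to measure the masses of the truncated peptide fragments produced when proteins and peptides are digested with carboxypeptidase Y, so that the peptide mass ladder formed over the different digestion times is obtained in a single spectrum. The mass difference between two adjacent peptide peaks in the spectrum identifies the amino acid that was removed, and the C-terminal amino acid sequence is read from these differences. The C-terminal sequences of a human adrenocorticotropic hormone fragment (ACTH 1 3 9) and of human angiotensin fragments (angiotensin I, angiotensin II) were determined at the pmol level. The sequencing results obtained from digestion at different concentrations, times, and temperatures are discussed. Under optimized conditions, the sequence of 20 C-terminal amino acid residues was obtained for the human ACTH fragment, which is the longest sequence obtained by C-terminal sequence analysis to date.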
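
Reading a sequence from the mass differences between adjacent ladder peaks can be sketched as below. The table holds standard monoisotopic residue masses; whether monoisotopic or average masses suit a given spectrum is an assumption here, as are the matching tolerance and the function name `read_c_terminal_ladder`. Leucine and isoleucine share one mass and cannot be told apart this way.

```python
# Monoisotopic amino acid residue masses in Da (assumption: monoisotopic peaks).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_c_terminal_ladder(peaks, tolerance=0.02):
    """Call residues from adjacent peak differences, C-terminus first.

    peaks: peptide masses from one spectrum, in any order.
    tolerance: hypothetical matching window in Da.
    """
    ordered = sorted(peaks, reverse=True)  # intact peptide first
    calls = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [
            residue
            for residue, mass in RESIDUE_MASS.items()
            if abs(mass - delta) <= tolerance
        ]
        # "?" marks a gap that matches no single residue within tolerance.
        calls.append("/".join(matches) if matches else "?")
    return calls
```

The first call corresponds to the C-terminal residue, because the largest peak is the intact peptide and each successive peak has lost one more residue from that end.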

Consider k independent exponential populations with location parameters μ1, …, μk and a common scale parameter or standard deviation θ. Let μ(k) be the largest of the μ's and define a population to be good if its location parameter exceeds μ(k) − δ1. A selection procedure is proposed to select a subset of the k populations which includes the good populations with probability at least P*, a pre-assigned value. Simultaneous confidence intervals that can be derived with the proposed selection procedure are discussed. Moreover, if populations with locations below μ(k) − δ2 (δ2 > δ1) are “bad”, a selection procedure is proposed and a sample size is determined so that the probability of omitting a “good” population or selecting a “bad” population is at most 1 − P*.

Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, more and more markers are being imputed well, but at the same time even more markers with relatively low minor allele frequency (MAF) are being imputed with low imputation quality. Here, we propose new methods that incorporate imputation uncertainty for downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: I) when posterior probabilities of all potential genotypes are estimated; and II) when only the one-dimensional summary statistic, imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test (EM-LRT) for association based on posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages, and then apply the EM-LRT to the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is controlled under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a similar level of statistical power to EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women’s Health Initiative Study, provide further support for the validity and efficiency of our proposed methods.

Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors in diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR) procedures, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure, which can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dismissed as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and the quality of significance rankings. Extensive simulation studies across a wide range of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR procedures are exceedingly conservative for rare-variant association studies. In the analyses of the CoLaus dataset, the AFNC identified the individual variants most responsible for gene-level significance. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.

In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.
This is a PLOS Computational Biology Software article.

The sequential procedure for testing up to k upper outliers proposed by Kimber (1982) for the one-parameter exponential distribution is extended to the two-parameter exponential distribution. Furthermore, null distributions of some test statistics for an upper outlier-pair in a complete or censored sample from a two-parameter exponential distribution are given. Percentage points of the statistic T1 are tabulated.

Summary. Hidden population substructure in case–control data has the potential to distort the performance of Cochran–Armitage trend tests (CATTs) for genetic associations. Three possible scenarios that may arise are investigated here: (i) heterogeneity of genotype frequencies across unidentified subpopulations (PSI), (ii) heterogeneity of genotype frequencies and disease risk across unidentified subpopulations (PSII), and (iii) cryptic correlations within unidentified subpopulations. A unified approach is presented for deriving the bias and variance distortion under the three scenarios for any CATT in a general family. Using these analytical formulas, we evaluate the excess type I errors of the CATTs numerically in the presence of population substructure. Our results provide insight into the properties of some proposed corrections for bias and variance distortion and show why they may not fully correct for the effects of population substructure.
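
For orientation, the standard Cochran–Armitage trend test on a 2 x 3 case–control genotype table can be sketched as follows. This is the textbook statistic with no correction for population substructure, so it is exactly the kind of test whose type I error the abstract describes as distorted. The default additive scores and the function name `cochran_armitage_trend` are assumptions for this example.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0.0, 0.5, 1.0)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: counts for genotypes with 0, 1, and 2 risk alleles.
    scores: genotype scores; (0, 0.5, 1) is the additive choice (assumption).
    Returns (z, two_sided_p) under the usual normal approximation.
    """
    r_total = sum(cases)
    s_total = sum(controls)
    n_total = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]  # genotype totals

    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * n for x, n in zip(scores, col))
    sum_x2n = sum(x * x * n for x, n in zip(scores, col))

    # T = sum_i x_i (r_i S - s_i R), rewritten in terms of column totals.
    t_stat = n_total * sum_xr - r_total * sum_xn
    variance = (r_total * s_total / n_total) * (n_total * sum_x2n - sum_xn ** 2)
    if variance <= 0:
        raise ValueError("degenerate table: trend variance is not positive")

    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical counts: risk-allele genotypes enriched among cases.
    print(cochran_armitage_trend([10, 20, 30], [30, 20, 10]))
```

The z score does not change under a linear rescaling of the scores, so (0, 1, 2) and (0, 0.5, 1) give the same result; dominant or recessive codings such as (0, 1, 1) or (0, 0, 1) give different members of the same family.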

There is strong evidence that rare variants are involved in complex disease etiology. The first step in implicating rare variants in disease etiology is their identification through sequencing in both randomly ascertained samples (e.g., the 1000 Genomes Project) and samples ascertained according to disease status. We investigated to what extent rare variants will be observed across the genome and in candidate genes in randomly ascertained samples, the magnitude of variant enrichment in diseased individuals, and biases that can occur due to how variants are discovered. Although sequencing cases can enrich for causal variants, when a gene or genes are not involved in disease etiology, limiting variant discovery to cases can lead to association studies with dramatically inflated false positive rates.

The availability of a large number of dense SNPs, high-throughput genotyping, and computational methods promotes the application of family-based association tests. While most current family-based analyses focus only on individual traits, joint analyses of correlated traits can extract more information and potentially improve statistical power. However, current TDT-based methods are low-powered. Here, we develop a method for testing association with bivariate quantitative traits in families. In particular, we correct for population stratification by integrating principal component analysis with the TDT. A score test statistic in the variance-components model is proposed. Extensive simulation studies indicate that the proposed method not only outperforms approaches limited to individual traits when a pleiotropic effect is present, but also surpasses the power of two popular bivariate association tests, FBAT-GEE and FBAT-PC, while correcting for population stratification. When applied to the GAW16 datasets, the proposed method successfully identifies, at the genome-wide level, the two SNPs that have pleiotropic effects on the HDL and TG traits.

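To determine whether tomato (Lycopersicon esculentum) in Yinchuan has been affected by tomato spotted wilt virus (TSWV), the national-standard TSWV RT-PCR detection method was used for molecular identification of 14 diseased leaf samples with suspected TSWV infection collected from tomato plants in Yinchuan. The cloned sequences of the nucleocapsid protein gene (N) were subjected to multiple sequence alignment and phylogenetic tree analysis, and the PCR-positive samples were then tested at the protein level. A 394 bp TSWV N gene sequence was amplified from 8 of the 14 diseased leaf samples, and the 8 sequences were identical. The TSWV isolate obtained from Yinchuan tomato was relatively closely related to TSWV isolates from Yunnan tomato, Chinese lettuce (Lactuca sativa), Chinese iris (Iris tectorum), and Chongqing pepper (Capsicum annuum), and relatively distantly related to isolates from Shandong, Heilongjiang, Beijing, and other regions of China and to isolates from abroad. Further testing of the 8 PCR-positive samples by Western blot with a TSWV antibody also confirmed TSWV infection in all 8 samples. This study is the first to demonstrate, by molecular identification and protein detection, that TSWV infects tomato in Yinchuan, and the breeding of TSWV-resistant tomato cultivars needs to be accelerated.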

