Similar articles
 20 similar articles found
1.
Shriner D 《Heredity》2011,107(5):413-420
Principal components analysis of genetic data is used to avoid inflation in type I error rates in association testing due to population stratification by covariate adjustment using the top eigenvectors and to estimate cluster or group membership independent of self-reported or ethnic identities. Eigendecomposition transforms correlated variables into an equal number of uncorrelated variables. Numerous stopping rules have been developed to identify which principal components should be retained. Recent developments in random matrix theory have led to a formal hypothesis test of the top eigenvalue, providing another way to achieve dimension reduction. In this study, I compare Velicer's minimum average partial test to a test based on the Tracy-Widom distribution as implemented in EIGENSOFT, the most widely used implementation of principal components analysis in genome-wide association analysis. In computer simulations of vicariance based on coalescent theory, EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation. Velicer's minimum average partial test is shown to have both smaller bias and smaller variance, often with a mean squared error of 0, in estimating the number of principal components to retain. Velicer's minimum average partial test is implemented in R code and is suitable for genome-wide genotype data with or without population labels.
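The minimum average partial (MAP) stopping rule mentioned in this abstract can be illustrated with a short sketch. This is not the author's R code; it is a minimal Python version of the commonly described procedure, assuming the input is a correlation matrix and that the name `velicer_map` and the toy matrix are purely illustrative. For each candidate number of components m, the first m components are partialled out and the average squared off-diagonal partial correlation is recorded; the m with the smallest average is retained.

```python
import numpy as np

def velicer_map(corr):
    """Return (number of components to retain, average squared partial correlations).

    Hypothetical helper for illustration; expects a p x p correlation matrix.
    """
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    # Eigendecomposition, sorted from largest to smallest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)
    # m = 0: average squared correlation before removing any component.
    avg_sq = [float(np.mean(corr[off_diag] ** 2))]
    for m in range(1, p - 1):
        a = loadings[:, :m]
        resid = corr - a @ a.T                      # covariance left after m components
        d = np.sqrt(np.clip(np.diag(resid), 1e-12, None))
        partial = resid / np.outer(d, d)            # partial correlation matrix
        avg_sq.append(float(np.mean(partial[off_diag] ** 2)))
    return int(np.argmin(avg_sq)), avg_sq

# Toy example: six variables driven by one common factor (all loadings 0.8).
toy = np.full((6, 6), 0.64)
np.fill_diagonal(toy, 1.0)
n_keep, curve = velicer_map(toy)
print(n_keep)
```

In the toy matrix every pair of variables shares a single factor, so the average squared partial correlation drops sharply once one component is removed and rises again afterwards.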

2.
3.
One of the goals of molecular bioinformatics is decoding amino acid sequences to extract information on the principles of protein folding. However, this is difficult to perform with standard bioinformatics techniques such as multiple sequence alignment. Thus, we propose a technique based on inter-residue average distance statistics to make predictions regarding the protein folding mechanisms of amino acid sequences. Our method involves constructing a kind of predicted contact map called an Average Distance Map (ADM) based on average distance statistics to pinpoint regions of possible folding nuclei for proteins. Only information on the amino acid sequence of a given protein is required for the present method. In this article, we summarize the results of studies using our method to analyze how specific protein sequences affect folding properties. In particular, we present studies on proteins in the phage lysozyme, such as the globin, fatty acid binding protein-like, and the cupredoxin-like fold families. In the present review, we characterize the 3D architectures of these proteins through the properties of the protein ADMs. Furthermore, we combine the information on the conserved residues within the regions predicted by the ADMs with our results obtained so far. Such information may help identify the folding characteristics of each protein. We discuss this possibility in the present review.

4.
5.
Cancer survival is one of the most important measures to evaluate the effectiveness of treatment and early diagnosis. The ultimate goal of cancer research and patient care is the cure of cancer. As cancer treatments progress, cure becomes a reality for many cancers if patients are diagnosed early and get effective treatment. If a cure does exist for a certain type of cancer, it is useful to estimate the time of cure. For cancers that impose excess risk of mortality, it is informative to understand the difference in survival between cancer patients and the general cancer-free population. In population-based cancer survival studies, relative survival is the standard measure of excess mortality due to cancer. Cure is achieved when the survival of cancer patients is equivalent to that of the general population. This definition of cure is usually called the statistical cure, which is an important measure of burden due to cancer. In this paper, a minimum version of the log-rank test is proposed to test the equivalence of cancer patients' survival using the relative survival data. Performance of the proposed test is evaluated by simulation. Relative survival data from population-based cancer registries in SEER Program are used to examine patients' survival after diagnosis for various major cancer sites.
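As a rough illustration of the relative survival measure described in this abstract, the sketch below divides the observed survival of patients by the expected survival of a comparable general population, then computes interval-specific (conditional) ratios. It does not implement the proposed log-rank-type test; the function names and the numbers are hypothetical and chosen only to show the arithmetic. Interval-specific ratios close to 1 are what one would expect once patients' mortality matches that of the general population.

```python
def relative_survival(observed, expected):
    """Cumulative relative survival: observed / expected at each follow-up time."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    ratios = []
    for s_obs, s_exp in zip(observed, expected):
        if not 0.0 < s_exp <= 1.0:
            raise ValueError("expected survival must lie in (0, 1]")
        ratios.append(s_obs / s_exp)
    return ratios

def interval_relative_survival(cumulative):
    """Conditional relative survival per interval: R(t_k) / R(t_{k-1})."""
    previous = 1.0
    out = []
    for r in cumulative:
        out.append(r / previous)
        previous = r
    return out

# Hypothetical yearly survival proportions (illustrative values only).
observed = [0.80, 0.70, 0.665]
expected = [1.00, 0.98, 0.95]
cumulative = relative_survival(observed, expected)
print(cumulative)
print(interval_relative_survival(cumulative))
```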

6.
Gene(s) for the autosomal dominant endocrine cancer syndromes, multiple endocrine neoplasia type 2A (MEN2A), multiple endocrine neoplasia type 2B (MEN2B), and familial medullary thyroid carcinoma (MTC1) all map to the pericentromeric region of chromosome 10. Predictive testing for the inheritance of mutant alleles in individuals at risk for these disorders has been limited by the availability of highly informative and closely linked flanking markers. We describe the development of eight new markers, including two PCR-based dinucleotide repeat polymorphisms and six RFLPs that flank the disease loci. One of the dinucleotide repeat markers (sJRH-1) derives from the RBP3 locus on 10q11.2 and has a PIC of .88. The other dinucleotide repeat (sTCL-1) defines a new locus, D10S176, that maps by in situ hybridization to 10p11.2 and has a PIC of .68. We have constructed a new genetic linkage map of the pericentromeric region of chromosome 10, on the basis of 13 polymorphisms at six loci, which places the MEN2A locus between the dinucleotide repeat markers, with odds of 5,750:1 over the next most likely position. Using this set of markers, predictive genetic testing of 130 at-risk individuals from six families segregating MEN2A revealed that 95% were jointly informative with flanking markers, representing a significant improvement in genetic testing capabilities.
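The PIC (polymorphism information content) values quoted for the two dinucleotide repeat markers can be related to allele frequencies with the standard formula PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The sketch below is a minimal illustration of that formula; the abstract does not report allele frequencies, so the frequencies used here are made up and the function name `pic` is an assumption.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content for a marker with the given allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings in which transmission cannot be resolved.
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross

# Illustrative frequencies only: a biallelic marker and a ten-allele repeat marker.
print(pic([0.5, 0.5]))
print(pic([0.1] * 10))
```

A marker with many alleles at similar frequencies approaches a PIC of 1, which is why dinucleotide repeats tend to be more informative than biallelic RFLPs.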

7.
Bayesian statistical methods for the estimation of hidden genetic structure of populations have gained considerable popularity in recent years. Utilizing molecular marker data, Bayesian mixture models attempt to identify a hidden population structure by clustering individuals into genetically divergent groups, whereas admixture models aim at separating the ancestral sources of the alleles observed in different individuals. We discuss the difficulties involved in the simultaneous estimation of the number of ancestral populations and the levels of admixture in studied individuals' genomes. To resolve this issue, we introduce a computationally efficient method for the identification of admixture events in the population history. Our approach is illustrated by analyses of several challenging real and simulated data sets. The software (baps), implementing the methods introduced here, is freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.

8.
Wang Y  Chen YH  Yang Q 《PloS one》2012,7(3):e32485
For many complex traits, single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) only explain a small percentage of heritability. Next generation sequencing technology makes it possible to explore unexplained heritability by identifying rare variants (RVs). Existing tests designed for RVs look for optimal strategies to combine information across multiple variants. Many of the tests have good power when the true underlying associations are either in the same direction or in opposite directions. We propose three tests for examining the association between a phenotype and RVs, where two of them jointly consider the common association across RVs and the individual deviations from the common effect. On one hand, similar to some of the best existing methods, the individual deviations are modeled as random effects to borrow information across multiple RVs. On the other hand, unlike the existing methods which pool individual effects towards zero, we pool them towards a possibly non-zero common effect by adding a pooled variant into the model. The common effect and the individual effects are jointly tested. We show through extensive simulations that at least one of the three tests proposed here is the most powerful or very close to being the most powerful in various settings of true models. This is appealing in practice because the direction and size of the true effects of the associated RVs are unknown. Researchers can apply the developed tests to improve power under a wide range of true models.

9.
Disease association with a genetic marker is often taken as a preliminary indication of linkage with disease susceptibility. However, population subdivision and admixture may lead to disease association even in the absence of linkage. In a previous paper, we described a test for linkage (and linkage disequilibrium) between a genetic marker and disease susceptibility; linkage is detected by this test only if association is also present. This transmission/disequilibrium test (TDT) is carried out with data on transmission of marker alleles from parents heterozygous for the marker to affected offspring. The TDT is a valid test for linkage and association, even when the association is caused by population subdivision and admixture. In the previous paper, we did not explicitly consider the effect of recent history on population structure. Here we extend the previous results by examining in detail the effects of subdivision and admixture, viewed as processes in population history. We describe two models for these processes. For both models, we analyze the properties of (a) the TDT as a test for linkage (and association) between marker and disease and (b) the conventional contingency statistic used with family data to test for population association. We show that the contingency test statistic does not have a χ² distribution if subdivision or admixture is present. In contrast, the TDT remains a valid χ² statistic for the linkage hypothesis, regardless of population history.
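For a biallelic marker, the TDT described in this abstract is usually computed as a McNemar-type statistic, (b - c)² / (b + c), where b and c count how often heterozygous parents transmit or do not transmit a given allele to affected offspring; under the null hypothesis it is compared with a χ² distribution on 1 degree of freedom. The sketch below is a minimal illustration under that standard formulation; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math

def tdt(transmitted, not_transmitted):
    """Return (chi-square statistic, p-value) for a biallelic TDT.

    transmitted / not_transmitted: counts of heterozygous parents who did or
    did not pass the allele of interest to an affected child.
    """
    total = transmitted + not_transmitted
    if total <= 0:
        raise ValueError("need at least one informative transmission")
    stat = (transmitted - not_transmitted) ** 2 / total
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat, p_value = tdt(60, 40)
print(stat, p_value)
```

Only heterozygous parents contribute, which is why the statistic is unaffected by allele-frequency differences between subpopulations.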

10.
Hereditary multiple exostoses (EXT) is an autosomal dominant disorder characterized by the presence of multiple cartilage-capped exostoses in the juxta-epiphyseal regions of the long bones. EXT is heterogeneous, with at least three different loci identified so far, on chromosomes 8, 11 and 19. We have tested a series of 29 EXT families for possible linkage to the three disease loci and estimated the probability of linkage of the disease to each locus in our series, by using an extension of the admixture test, which makes modelling of heterogeneous monogenic disease feasible. The maximum likelihood was obtained for proportions of 44%, 28% and 28% of families being linked to chromosome 8, 11 and 19, respectively. The a posteriori probability of linkage of the disease to EXT1, EXT2 and EXT3 was greater than 80% for 8/29, 5/29 and 3/29 families, respectively, and did not give evidence of a fourth locus for the disease. The present approach can be generalized to the investigation of genetic heterogeneity in other monogenic diseases, as it simultaneously estimates the location of each disease gene and the proportion of families linked to each locus. Received: 28 May 1996 / Revised: 7 October 1996

11.
12.
13.
SA Stanhope  AD Skol 《PloS one》2012,7(9):e42367
In a two stage genome-wide association study (2S-GWAS), a sample of cases and controls is allocated into two groups, and genetic markers are analyzed sequentially with respect to these groups. For such studies, experimental design considerations have primarily focused on minimizing study cost as a function of the allocation of cases and controls to stages, subject to a constraint on the power to detect an associated marker. However, most treatments of this problem implicitly restrict the set of feasible designs to only those that allocate the same proportions of cases and controls to each stage. In this paper, we demonstrate that removing this restriction can improve the cost advantages demonstrated by previous 2S-GWAS designs by up to 40%. Additionally, we consider designs that maximize study power with respect to a cost constraint, and show that recalculated power maximizing designs can recover a substantial amount of the planned study power that might otherwise be lost if study funding is reduced. We provide open source software for calculating cost minimizing or power maximizing 2S-GWAS designs.

14.
15.
Baffled 500 ml Erlenmeyer flasks were compared with conventional 2800 ml Fernbach flasks for Xanthomonas campestris to produce xanthan. Bacterial growth rates were similar in both types of flask although the Fernbach flasks gave higher biomass concentrations. Xanthan production was similar in both types of flasks but different viscosities were attained. On a weight basis, the xanthan produced in baffled flasks was up to three times more viscous and more pseudoplastic or shear thinning. For screening purposes, baffled flasks are better because the rheological quality of the gum produced in them is more like that obtained in stirred fermentors than the gum from Fernbach flasks and considerably less shaker space is required, thus allowing a larger number of tests to be performed.

16.
17.
The leucocyte migration inhibition test (LMT) was performed by the agarose plate method with thyroid and pancreatic antigens in patients with insulin-dependent or independent diabetes mellitus. The mean migration indices with thyroglobulin, thyroid mitochondria and beef insulin were not significantly different in insulin-dependent diabetics from those in insulin-independent diabetics or normal controls. However, significant inhibition of leucocyte migration was observed in insulin-dependent diabetics when thyroid microsome or pancreatic extract was used as antigen. Although no significant difference was found in the percentages of T and B lymphocytes between insulin-dependent diabetics and insulin-independent diabetics or normal controls, the results of LMT strongly suggest the presence of cellular immunity against the thyroid and pancreas in insulin-dependent juvenile-onset diabetes.

18.
ABSTRACT: BACKGROUND: Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries. RESULTS: Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method SupportMix to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. CONCLUSIONS: By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought.

19.
20.
The lactate minimum test (LACmin) has been considered an important indicator of endurance exercise capacity and a single session protocol can predict the maximal lactate steady state (MLSS). The objective of this study was to determine the best swimming protocol to induce hyperlactatemia in order to assure the LACmin in rats (Rattus norvegicus), standardized to four different protocols (P) of lactate elevation. The protocols were P1: 6 min of intermittent jumping exercise in water (load of 50% of body weight, bw); P2: two 13% bw load swimming bouts until exhaustion (tlim); P3: one tlim 13% bw load swimming bout; and P4: two 13% bw load swimming bouts (1st 30 s, 2nd to tlim), separated by a 30 s interval. The incremental phase of LACmin began with an initial load of 4% bw, increased by 0.5% every 5 min. Peak lactate concentration (mmol L⁻¹) was collected after 5, 7 and 9 min and differed among the protocols P1 (15.2 ± 0.4, 14.9 ± 0.7, 14.8 ± 0.6) and P2 (14.0 ± 0.4, 14.9 ± 0.4, 15.5 ± 0.5) compared to P3 (5.1 ± 0.1, 5.6 ± 0.3, 5.6 ± 0.3) and P4 (4.7 ± 0.2, 6.8 ± 0.2, 7.1 ± 0.2). The LACmin determination success rates were 58%, 55%, 80% and 91% in P1, P2, P3 and P4 protocols, respectively. The MLSS did not differ from LACmin in any protocol. The LACmin obtained from P4 protocol showed better assurance for the MLSS identification in most of the tested rats.
