Similar articles
Found 20 similar articles (search time: 31 ms)
1.
C A Beam, H S Wieand. Biometrics 1991, 47(3):907-919
In this paper we study a statistic that is suitable for comparing a discrete diagnostic marker to one or more continuous diagnostic markers. Test procedures and confidence intervals are based on asymptotic normality. The statistic is applicable for correlated data in which all the markers are obtained for each subject. The statistic was studied for use in comparing two markers for rectal bleeding. Examples for this application and two more general applications are presented.

2.
A test statistic to detect errors in sib-pair relationships. Cited by: 4 (self-citations: 2; citations by others: 2)
Several authors have proposed algorithms to detect Mendelian errors in human genetic linkage data. Most currently available algorithms apply likelihood-based methods to multiplex family data to identify typing or pedigree errors. These algorithms cannot be applied in many sib-pair collections, because of lack of parental-genotype information. Nonetheless, misspecifying the relationships between individuals has serious consequences for sib-pair linkage studies: false relationships bias the statistics designed to identify linkage with disease phenotypes. To test the hypothesis that two individuals are sibs, we propose a test statistic based on the summation, over a large number of genetic markers, of the number of alleles shared identical by state by a pair of individuals, for each marker. The test statistic has an approximately normal distribution under the null hypothesis, and extreme negative values correspond to nonsib pairs. Power and significance studies show that the test statistic calculated by use of 50 unlinked markers has 96% power to detect half-sibs and has 100% power to detect unrelated individuals as not full-sib pairs, with a 5% false-positive rate. Furthermore, extreme positive values of the test statistic identify sibs as MZ twins.
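The raw quantity summed in the abstract above can be illustrated with a minimal Python sketch. The function names and the genotype encoding (an unordered pair of allele labels per marker) are assumptions made for illustration. The standardization against the null mean and variance under the full-sib hypothesis depends on allele frequencies and is not reproduced here, so this is not the authors' test statistic itself.

```python
from collections import Counter

def ibs_shared(genotype_a, genotype_b):
    """Count alleles (0, 1, or 2) shared identical by state at one marker.

    Each genotype is an unordered pair of allele labels, e.g. ("A", "B").
    Multiset intersection keeps the minimum count of each allele.
    """
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())

def total_ibs(markers_a, markers_b):
    """Sum the IBS allele counts over all markers typed in both individuals."""
    if len(markers_a) != len(markers_b):
        raise ValueError("both individuals must be typed at the same markers")
    return sum(ibs_shared(a, b) for a, b in zip(markers_a, markers_b))

# Hypothetical genotypes for two individuals at three markers.
person_1 = [("A", "A"), ("A", "B"), ("C", "D")]
person_2 = [("A", "B"), ("B", "A"), ("E", "E")]
print(total_ibs(person_1, person_2))
```

A low total relative to what full sibs are expected to share would point toward a nonsib pair, and a very high total toward MZ twins, in line with the abstract's description.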

3.
Understanding the genetics of biological diversification across micro- and macro-evolutionary time scales is a vibrant field of research for molecular ecologists as rapid advances in sequencing technologies promise to overcome former limitations. In palms, an emblematic, economically and ecologically important plant family with high diversity in the tropics, studies of diversification at the population and species levels are still hampered by a lack of genomic markers suitable for the genotyping of large numbers of recently diverged taxa. To fill this gap, we used a whole genome sequencing approach to develop target sequencing for molecular markers in 4,184 genome regions, including 4,051 genes and 133 non-genic putatively neutral regions. These markers were chosen to cover a wide range of evolutionary rates allowing future studies at the family, genus, species and population levels. Special emphasis was given to the avoidance of copy number variation during marker selection. In addition, a set of 149 well-known sequence regions previously used as phylogenetic markers by the palm biological research community were included in the target regions, making it possible to combine and jointly analyse already available data sets with genomic data to be produced with this new toolkit. The bait set was effective for species belonging to all three palm sub-families tested (Arecoideae, Ceroxyloideae and Coryphoideae), with high mapping rates, specificity and efficiency. The number of high-quality single nucleotide polymorphisms (SNPs) detected at both the sub-family and population levels facilitates efficient analyses of genomic diversity across micro- and macro-evolutionary time scales.

4.
5.
Both theoretical and applied studies have proven that single nucleotide polymorphism (SNP) markers are more powerful and cost-effective in linkage analysis than current microsatellite marker assays. Here we performed a whole-genome scan on 115 White, non-Hispanic families segregating for alcohol dependence, using one 10.3-cM microsatellite marker set and two SNP data sets (0.33-cM and 0.78-cM spacing). Two definitions of alcohol dependence (ALDX1 and ALDX2) were used. Our multipoint nonparametric linkage analysis found that alcoholism was nominally linked to 12 genomic regions. The linkage peaks obtained by using the microsatellite marker set and the two SNP sets had a high degree of correspondence in general, but the microsatellite marker set was insufficient to detect some nominal linkage peaks. The presence of linkage disequilibrium between markers did not significantly affect the results. Across the entire genome, the SNP data sets had a much higher average linkage information content (0.33 cM: 0.93; 0.78 cM: 0.91) than did the microsatellite marker set (0.57). The linkage peaks obtained with the two SNP data sets were very similar, with some minor differences. We conclude that genome-wide linkage analysis using approximately 5,000 SNP markers evenly distributed across the human genome is sufficient and might be more powerful than current 10-cM microsatellite marker assays.

6.
An entropy-based statistic for genomewide association studies. Cited by: 8 (self-citations: 0; citations by others: 8)
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard χ² statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard χ² statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard χ² statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard χ² statistic.
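The two ingredients named in the abstract above, the standard allelic χ² statistic and Shannon entropy, can be sketched in Python as follows. The function names and the example counts are hypothetical. The abstract does not give the formula of the authors' entropy-based statistic, so the sketch stops at the textbook building blocks and does not attempt to reconstruct it.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a vector of allele or haplotype frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def allelic_chi_square(case_a1, case_a2, control_a1, control_a2):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles of a biallelic marker.
    """
    a, b, c, d = case_a1, case_a2, control_a1, control_a2
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denominator

# Hypothetical allele counts: 30/70 in cases, 50/50 in controls.
print(allelic_chi_square(30, 70, 50, 50))
print(shannon_entropy([0.3, 0.7]), shannon_entropy([0.5, 0.5]))
```

Entropy is a nonlinear function of the frequencies, which is the property the abstract relies on; how that nonlinearity is turned into a test statistic is described in the paper itself.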

7.
Accurate determination of patterns of genetic variation provides a powerful inferential tool for studies of evolution and conservation. For more than 30 years, enzyme electrophoresis was the preferred method for elucidating these patterns. As a result, evolutionary geneticists have acquired considerable understanding of the relationship between patterns of allozyme variation and aspects of evolutionary process. Myriad molecular markers and statistical analyses have since emerged, enabling improved estimates of patterns of genetic diversity. With these advances, there is a need to evaluate results obtained with different markers and analytical methods. We present a comparative study of gene statistic estimates (F_ST, G_ST, F_IS, H_S, and H_T) calculated from an inter-simple sequence repeat (ISSR) and an allozyme data set derived from the same populations using both standard and Bayesian statistical approaches. Significant differences were found between estimates, owing to the effects of marker and analysis type. Most notably, F_ST estimates for codominant data differ between Bayesian and standard approaches. Levels of statistical significance are greatly affected by methodology and, in some cases, are not associated with similar levels of biological significance. Our results suggest that caution should be used in equating or comparing results obtained using different markers and/or methods of analysis.
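For readers unfamiliar with the gene statistics listed above, the following Python sketch computes H_S, H_T and G_ST for a single locus using the simplest textbook definition, G_ST = (H_T - H_S) / H_T. It assumes equal weighting of populations and known allele frequencies, with no sample-size correction. It is not the standard or Bayesian estimator compared in the study, and the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one population at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def gst(pop_freqs):
    """G_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs is a list of per-population allele-frequency vectors,
    all listing the same alleles in the same order.
    """
    n_pops = len(pop_freqs)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    if h_t == 0:
        raise ValueError("locus is monomorphic overall; G_ST is undefined")
    return (h_t - h_s) / h_t

# Two populations fixed for different alleles are completely differentiated.
print(gst([[1.0, 0.0], [0.0, 1.0]]))
```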

8.
Mapping multiple Quantitative Trait Loci by Bayesian classification. Cited by: 2 (self-citations: 0; citations by others: 2)
Zhang M, Montooth KL, Wells MT, Clark AG, Zhang D. Genetics 2005, 169(4):2305-2318
We developed a classification approach to multiple quantitative trait loci (QTL) mapping built upon a Bayesian framework that incorporates the important prior information that most genotypic markers are not cotransmitted with a QTL or their QTL effects are negligible. The genetic effect of each marker is modeled using a three-component mixture prior with a class for markers having negligible effects and separate classes for markers having positive or negative effects on the trait. The posterior probability of a marker's classification provides a natural statistic for evaluating credibility of identified QTL. This approach performs well, especially with a large number of markers but a relatively small sample size. A heat map to visualize the results is proposed so as to allow investigators to be more or less conservative when identifying QTL. We validated the method using a well-characterized data set for barley heading values from the North American Barley Genome Mapping Project. Application of the method to a new data set revealed sex-specific QTL underlying differences in glucose-6-phosphate dehydrogenase enzyme activity between two Drosophila species. A simulation study demonstrated the power of this approach across levels of trait heritability and when marker data were sparse.

9.
The number of marker loci required to answer a given research question satisfactorily is especially important for dominant markers since they have a lower information content than co-dominant marker systems. In this study, we used simulated dominant marker data sets to determine the number of dominant marker loci needed to obtain satisfactory results from two popular population genetic analyses: STRUCTURE and AMOVA (analysis of molecular variance). Factors such as migration, level of population differentiation, and unequal sampling were varied in the data sets to mirror a range of realistic research scenarios. AMOVA performed well under all scenarios with a modest quantity of markers while STRUCTURE required a greater number, especially when populations were closely related. The popular ΔK method of determining the number of genetically distinct groups worked well when sampling was balanced, but underestimated the true number of groups with unbalanced sampling. These results provide a window through which to interpret previous work with dominant markers and we provide a protocol for determining the number of markers needed for future dominant marker studies.

10.
To reduce the effects of skin movement artefacts and apparent joint dislocations in the kinematics of whole body movement derived from marker locations, global optimisation procedures with a chain model have been developed. These procedures can also be used to reduce the number of markers when self-occlusions are hard to avoid. This paper assesses the kinematic precision of three marker sets (16, 11 and 7 markers) for movements on high bar with straddled piked posture. A three-dimensional person-specific chain model was defined with 9 parameters and 12 degrees of freedom, and an iterative procedure optimised the gymnast posture for each frame of the three marker sets. The time histories of joint angles obtained from the reduced marker sets were compared with those from the 16 marker set by means of a root mean square difference measure. Occlusions of medial markers fixed on the lower limb occurred when the legs were together, and the pelvis markers disappeared primarily during the piked posture. Despite these occlusions, reconstruction was possible with 16, 11 and 7 markers. The time histories of joint angles were similar; the main differences were for the thigh mediolateral rotation and the knee flexion because the knee was close to full extension. When five markers were removed, the average angle difference was about 3 degrees. This difference increased to 9 degrees for the seven marker set. It is concluded that kinematics of sports movement can be reconstructed using a chain model and a global optimisation procedure for a reduced number of markers.
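The root mean square difference used above to compare joint-angle time histories can be sketched in a few lines of Python. The function name and the sample angles are hypothetical; the sketch only assumes that both time histories are sampled at the same frames and expressed in the same units.

```python
import math

def rms_difference(reference_angles, reduced_angles):
    """Root mean square difference between two joint-angle time histories.

    Both sequences must hold one angle per frame, in the same units (e.g. degrees).
    """
    if len(reference_angles) != len(reduced_angles):
        raise ValueError("time histories must have the same number of frames")
    if not reference_angles:
        raise ValueError("time histories must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(reference_angles, reduced_angles)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical knee-flexion angles (degrees) from a full and a reduced marker set.
full_set = [10.0, 12.0, 15.0, 11.0]
reduced_set = [13.0, 9.0, 18.0, 8.0]
print(rms_difference(full_set, reduced_set))
```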

11.
Dense sets of hundreds of thousands of markers have been developed for genome-wide association studies. These marker sets are also beneficial for linkage analysis of large, deep pedigrees containing distantly related cases. It is impossible to analyse all genotypes in large pedigrees jointly using the Lander–Green algorithm. However, as marker density increases, it becomes less crucial to analyse all individuals' genotypes simultaneously. In this report, an approximate multipoint non-parametric technique is described, in which large pedigrees are split into many small pedigrees, each containing just two cases. This technique is demonstrated using phased data from the International HapMap Project to simulate sets of 10,000, 50,000 and 250,000 markers, showing that it becomes increasingly accurate as more markers are genotyped. This method allows routine linkage analysis of large families with dense marker sets and represents a more easily applied alternative to Monte Carlo Markov Chain methods.

12.
We present a class of likelihood-based score statistics that accommodate genotypes of both unrelated individuals and families, thereby combining the advantages of case-control and family-based designs. The likelihood extends the one proposed by Schaid and colleagues (Schaid and Sommer 1993, 1994; Schaid 1996; Schaid and Li 1997) to arbitrary family structures with arbitrary patterns of missing data and to dense sets of multiple markers. The score statistic comprises two component test statistics. The first component statistic, the nonfounder statistic, evaluates disequilibrium in the transmission of marker alleles from parents to offspring. This statistic, when applied to nuclear families, generalizes the transmission/disequilibrium test to arbitrary numbers of affected and unaffected siblings, with or without typed parents. The second component statistic, the founder statistic, compares observed or inferred marker genotypes in the family founders with those of controls or those of some reference population. The founder statistic generalizes the statistics commonly used for case-control data. The strengths of the approach include both the ability to assess, by comparison of nonfounder and founder statistics, the potential bias resulting from population stratification and the ability to accommodate arbitrary family structures, thus eliminating the need for many different ad hoc tests. A limitation of the approach is the potential power loss and/or bias resulting from inappropriate assumptions on the distribution of founder genotypes. The systematic likelihood-based framework provided here should be useful in the evaluation of both the relative merits of case-control and various family-based designs and the relative merits of different tests applied to the same design. It should also be useful for genotype-disease association studies done with the use of a dense set of multiple markers.

13.
Linkage analysis can be problematic in humans because of the lack of large, multigenerational pedigrees and the difficulties in obtaining phenotypic data on all family members. In contrast, large, captive colonies of rhesus macaque are a potentially valuable resource for linkage studies because detailed phenotypic and genealogical data are kept, inbreeding is avoided, and DNA samples can usually be obtained. Microsatellite marker sets for genome-wide screening are available in a number of species, but not for the rhesus macaque. We tested primers to 400 human microsatellite markers from a genome-wide mapping set using DNA from nine unrelated female rhesus macaques. We found that 76 (19%) of the primers amplified a polymorphic product using the standard protocols for human DNA. The average heterozygosity of the markers in humans was 0.80, compared to 0.65 in the rhesus macaques. This study provides preliminary data, which could be used toward the development of a linkage mapping set in this species. There would be a need, however, to confirm the Mendelian inheritance of the markers.

14.
The affected-pedigree-member (APM) method of linkage analysis is a nonparametric statistic that tests for nonrandom cosegregation of a disease and marker loci. The APM statistic is based on the observation that if a marker locus is near a disease-susceptibility locus, then affected individuals within a family should be more similar at the marker locus than is expected by chance. The APM statistic measures marker similarity in terms of identity by state (IBS) of marker alleles; that is, two alleles are IBS if they are the same, regardless of their ancestral origin. Since the APM statistic measures increased marker similarity, it makes no assumptions concerning how the disease is inherited; this can be an advantage when dealing with complex diseases for which the mode of inheritance is difficult to determine. We investigate here the power of the APM statistic to detect linkage in the context of a genomewide search. In such a search, the APM statistic is evaluated at a grid of markers. Then regions with high APM statistics are investigated more thoroughly by typing more markers in the region. Using simulated data, we investigate various search strategies and recommend an optimal search strategy that maximizes the power to detect linkage while minimizing the false-positive rate and number of markers. We determine an optimal series of three increasing cut-points and an independent criterion for significance.

15.
Bjørnstad A, Westad F, Martens H. Hereditas 2004, 141(2):149-165
The utility of a relatively new multivariate method, bi-linear modelling by cross-validated partial least squares regression (PLSR), was investigated in the analysis of QTL. The distinguishing feature of PLSR is that it reveals reliable covariance structures in data of different types relating to the same set of objects. Two matrices X (here: genetic markers) and Y (here: phenotypes) are interactively decomposed into latent variables (PLS components, or PCs) in a way which facilitates statistically reliable and graphically interpretable model building. Natural collinearities between input variables are utilized actively to stabilise the modelling, instead of being treated as a statistical problem. The importance of cross-validation/jack-knifing as an intuitively appealing way to avoid overfitting is emphasized. Two data sets from chromosomal mapping studies of different complexity were chosen for illustration (QTL for tomato yield and for oat heading date). Results from PLSR analysis were compared to published results and to results using the package PLABQTL in these data sets. In all cases PLSR gave at least similar explained validation variances as the reported studies. An attractive feature is that PLSR allows the analysis of several traits/replicates in one analysis, and the direct visual identification of individuals with desirable marker genotypes. It is suggested that PLSR may be useful in structural and functional genomics and in marker-assisted selection, particularly in cases with a limited number of objects.

16.
Molecular markers are frequently used to study genetic variation among individuals within or between populations. Differences in marker banding patterns can be used to verify whether or not individuals represent distinct groups or populations. In 2005 alone, more than 500 studies used molecular markers to group individuals into clusters. Such studies make use of an arbitrary number of molecular markers from each of an arbitrary number of individuals presumed to represent distinct genotypes. However, the greater the genetic variation, the more likely it is that a larger number of individuals and markers will be needed to capture a population's genetic signature. The numbers of both markers and individuals included thus affect the way in which individuals are organized through cluster analyses, thereby affecting the conclusions drawn. Here we present a method that provides statistical criteria to verify that individual and marker sample sizes are sufficient to accurately depict genetic differentiation among different populations. Our method uses a resampling technique to assess the reproducibility of obtaining a particular grouping pattern for specific data sets. It thus allows the robustness of the results to be estimated without including additional individuals or markers.

17.
Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in a genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers at the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) if markers with heritability lower than 0.975 are discarded. Checking of Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers underwent very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Unlike other techniques, our method uses all information in the population simultaneously, can be used in any population with markers and pedigree recordings, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
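The definition of gene content given above (the number of copies of a reference allele, 0, 1, or 2) can be made concrete with a small Python sketch. The function names and the example genotypes are hypothetical. The frequency computed here is the naive sample frequency, not the base-population estimate that the authors obtain from REML, and the heritability estimation itself is not shown.

```python
def gene_content(genotype, reference_allele):
    """Number of copies (0, 1, or 2) of the reference allele in a diploid genotype."""
    return sum(1 for allele in genotype if allele == reference_allele)

def sample_allele_frequency(gene_contents):
    """Naive reference-allele frequency from gene contents of diploid individuals."""
    if not gene_contents:
        raise ValueError("at least one genotyped individual is required")
    return sum(gene_contents) / (2 * len(gene_contents))

# Hypothetical genotypes of four animals at one SNP, reference allele "A".
genotypes = [("G", "G"), ("A", "G"), ("A", "A"), ("G", "A")]
contents = [gene_content(g, "A") for g in genotypes]
print(contents, sample_allele_frequency(contents))
```

In the approach described in the abstract, a vector like `contents` would be treated as the phenotype in a pedigree-based REML analysis, one SNP at a time.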

18.
The capability of molecular markers to provide information of genetic structure is influenced by their number and the way they are chosen. This study evaluates the effects of single nucleotide polymorphism (SNP) number and selection strategy on estimates of germplasm diversity and population structure for different types of barley germplasm, namely cultivar and landrace. One hundred and sixty-nine barley landraces from Syria and Jordan and 171 European barley cultivars were genotyped with 1536 SNPs. Different subsets of 384 and 96 SNPs were selected from the 1536 set, based on their ability to detect diversity in landraces or cultivated barley in addition to corresponding randomly chosen subsets. All SNP sets except the landrace-optimised subsets underestimated the diversity present in the landrace germplasm, and all subsets of SNP gave similar estimates for cultivar germplasm. All marker subsets gave qualitatively similar estimates of the population structure in both germplasm sets, but the 96 SNP sets showed much lower data resolution values than the larger SNP sets. From these data we deduce that pre-selecting markers for their diversity in a germplasm set is very worthwhile in terms of the quality of data obtained. Second, we suggest that a properly chosen 384 SNP subset gives a good combination of power and economy for germplasm characterization, whereas the rather modest gain from using 1536 SNPs does not justify the increased cost and 96 markers give unacceptably low performance. Lastly, we propose a specific 384 SNP subset as a standard genotyping tool for middle-eastern landrace barley.

19.
Zhao J, Boerwinkle E, Xiong M. Human Genetics 2007, 121(3-4):357-367
Availability of a large collection of single nucleotide polymorphisms (SNPs) and efficient genotyping methods enable the extension of linkage and association studies for complex diseases from small genomic regions to the whole genome. Establishing global significance for linkage or association requires small P-values of the test. The original TDT statistic compares the difference in linear functions of the number of transmitted and nontransmitted alleles or haplotypes. In this report, we introduce a novel TDT statistic, which uses Shannon entropy as a nonlinear transformation of the frequencies of the transmitted or nontransmitted alleles (or haplotypes), to amplify the difference in the number of transmitted and nontransmitted alleles or haplotypes in order to increase statistical power with a large number of marker loci. The null distribution of the entropy-based TDT statistic and the type I error rates in both homogeneous and admixture populations are validated using a series of simulation studies. By analytical methods, we show that the power of the entropy-based TDT statistic is higher than that of the original TDT, and this difference increases with the number of marker loci. Finally, the new entropy-based TDT statistic is applied to two real data sets to test the association of the RET gene with Hirschsprung disease and the Fcγ receptor genes with systemic lupus erythematosus. Results show that the entropy-based TDT statistic can reach P-values that are small enough to establish significance in genome-wide linkage or association analyses.
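For reference, the original TDT mentioned above has a simple closed form for a biallelic marker: with b transmissions and c non-transmissions of a given allele from heterozygous parents to affected offspring, the statistic is (b - c)² / (b + c), compared against a χ² distribution with 1 degree of freedom. The Python sketch below shows only this classical form. The function name and the counts are hypothetical, and the entropy-based variant is not reconstructed because the abstract does not state its formula.

```python
def tdt_statistic(transmitted, not_transmitted):
    """Classical biallelic TDT: (b - c)^2 / (b + c).

    transmitted (b): times heterozygous parents passed the allele to affected offspring.
    not_transmitted (c): times heterozygous parents passed the other allele instead.
    """
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - not_transmitted) ** 2 / informative

# Hypothetical counts: the allele was transmitted 30 times and not transmitted 10 times.
print(tdt_statistic(30, 10))
```

The statistic is linear in the counts before squaring, which is the property the entropy-based version replaces with a nonlinear transformation.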

20.
Individual differences scaling is a multidimensional scaling method for finding a common ordination for several data sets. An individual ordination for each data set can then be derived from the common ordination by adjusting the axis lengths so as to maximize the correlations between observed proximities and individual ordination distances. The importance of the various axes for each data set and the mutual similarities and goodness of fit for the individual data sets are described by weight plots. As an example, 46 soft-water lakes in eastern Finland are ordinated on two dimensions according to 3 chemical data sets (water in summer and autumn, sediment) and 4 biological sets (major phytoplankton groups, phytoplankton, surface sediment diatom and cladoceran assemblages). The method seems to be effective as a means of obtaining the common ordination for the data sets. The major taxonomic groups gave the ordination which differed most clearly from the ordinations of the other data sets. Phytoplankton was most poorly ordinated in all the analyses. The other data sets were fairly coherent. When only biological data sets were ordinated, the diatoms and cladocerans showed rather different patterns. It seems that the cladocerans are best correlated with water chemistry, both according to weights in the joint analysis, and according to correlation between the axes from the biological data sets and the chemical variables.
Abbreviations: CCA = canonical correspondence analysis; IDS = individual differences scaling; MDS = multidimensional scaling; PCA = principal components analysis
