Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: The analysis of gene expression data in its chromosomal context has been a recent development in cancer research. However, currently available methods fail to account for variation in the distance between genes, gene density and genomic features (e.g. GC content) in identifying increased or decreased chromosomal regions of gene expression. RESULTS: We have developed a model-based scan statistic that accounts for these aspects of the complex landscape of the human genome in the identification of extreme chromosomal regions of gene expression. This method may be applied to gene expression data regardless of the microarray platform used to generate it. To demonstrate the accuracy and utility of this method, we applied it to a breast cancer gene expression dataset and tested its ability to predict regions containing medium-to-high level DNA amplification (DNA ratio values >2). A classifier was developed from the scan statistic results that had a 10-fold cross-validated classification rate of 93% and a positive predictive value of 88%. This result strongly suggests that the model-based scan statistic and the expression characteristics of an increased chromosomal region of gene expression can be used to accurately predict chromosomal regions containing amplified genes. AVAILABILITY: Functions in the R-language are available from the author upon request. CONTACT: fcouples@umich.edu.
2.
Summary: Closely related proteins show an obvious kinship by having numerous matching amino acids in their aligned sequences. Kinship between anciently separated proteins requires a statistical evaluation to rule out fortuitous similarities. A simple statistic is developed which assumes equal probability for all codon pairs, and a table of critical values for amino acid sequence alignments of length 200 or less is presented. Applying this statistic to V and C regions of immunoglobulin chains, aligned on the basis of shared features of three-dimensional structure, provides evidence that the V and C sequences descended from a common ancestor. Similarly the distant evolutionary relationship of dehydrogenases, flavodoxin, and subtilisin, suggested by structural alignments, is verified. On the other hand, the statistic does not verify a common evolutionary origin for the heme binding pocket in globins and cytochrome b5. Empirical evidence from the distribution of MMD values of amino acid pairs in comparisons of misaligned polypeptide chains and from Monte Carlo trials of sequences aligned with arbitrary gaps supports the validity of the statistic.
3.
An entropy-based statistic, TPE, has been proposed for genomic association studies of disease-susceptibility loci. The statistic TPE may be directly adopted and/or extended to quantitative-trait locus (QTL) mapping for quantitative traits. In this article, the statistic TPE was extended and applied to quantitative traits for association analysis of QTL by means of selective genotyping. The statistical properties (the type I error rate and the power) were examined under a range of parameters and population-sampling strategies (e.g., various genetic models, various heritabilities, and various sample-selection threshold values) by simulation studies. The results indicated that the statistic TPE is robust and powerful for genomic association studies of QTL. A simulation study based on the haplotype frequencies of 10 single nucleotide polymorphisms (SNPs) of angiotensin-I converting enzyme genes was conducted to evaluate the performance of the statistic TPE for genetic association studies.
4.
A spatial scan statistic for multiple clusters (total citations: 1; self-citations: 0; citations by others: 1)
Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of the clusters’ shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic which can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In the real study of hand-foot-mouth disease data in Pingdu city, our proposed method successfully detects a true cluster town that the standard method cannot find statistically significant because of another cluster’s shadowing effect.
5.
Spatial scan statistics with Bernoulli and Poisson models are commonly used for geographical disease surveillance and cluster detection. These models, suitable for count data, were not designed for data with continuous outcomes. We propose a spatial scan statistic based on an exponential model to handle either uncensored or censored continuous survival data. The power and sensitivity of the developed model are investigated through intensive simulations. The method performs well for different survival distribution functions including the exponential, gamma, and log-normal distributions. We also present a method to adjust the analysis for covariates. The cluster detection method is illustrated using survival data for men diagnosed with prostate cancer in Connecticut from 1984 to 1995.
6.
A genomewide scan for age-related macular degeneration provides evidence for linkage to several chromosomal regions (total citations: 12; self-citations: 0; citations by others: 12)
Seddon JM, Santangelo SL, Book K, Chong S, Cote J. American journal of human genetics. 2003, 73(4): 780-790
We report the results of a genomewide scan for age-related macular degeneration (AMD) in 158 multiplex families. AMD classification was based on fundus photography and was assigned a grade ranging from 1 (no disease) to 5 (exudative disease). Genotyping was performed by the National Heart, Lung, and Blood Institute Mammalian Genotyping Service at Marshfield (404 short tandem repeat markers). The sample included 158 families with two or more siblings with AMD, 490 affected individuals, 101 unaffected individuals, and 38 whose affection status was unknown. Relative pairs included 511 affected sibling, 28 avuncular, 53 cousin, 7 grandparent-grandchild, and 9 grand-avuncular pairs. Two-point parametric and multipoint parametric and nonparametric analyses were performed. Maximum two-point LOD scores of 1.0-2.0 were found for markers on chromosomes 1, 2, 8, 10, 14, 15, and 22. Multipoint analyses were consistent with the two-point results for chromosomes 1, 2, 8, 10, and 22 and provided evidence for additional linkage regions on chromosomes 3, 6, 8, 12, 16, and X. Our signals on chromosomes 1q, 6p, and 10q are consistent with some other previously published results. Significant linkage to AMD was found for one marker on chromosome 2, two adjacent markers on chromosome 3, two adjacent markers on chromosome 6, and seven contiguous markers on chromosome 8, with empirical P values of .00001. The consistency of many of the other signals across both two-point and multipoint, as well as parametric and nonparametric, analyses indicates several other regions worthy of follow-up.
7.
We have adapted the originally described electronic PCR (e-PCR) algorithm to perform string searches more accurately and much more rapidly than previously possible. Our implementation [multithreaded e-PCR (me-PCR)] runs sufficiently fast to allow even desktop machines to query quickly large genomes with very large genomic element sets. In addition, me-PCR is multithreaded, interprets all IUPAC nucleotide symbols, allows searches with elements specified by long sequences (such as SNPs), accepts ranges in the expected PCR size input field, requires substantially less memory for analysis of large sequences and corrects a number of minor flaws causing misreporting of hits in exceptional cases. Thus, me-PCR provides increased annotation capabilities for complex genomes to non-expert laboratories.
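To illustrate what interpreting IUPAC nucleotide symbols means in a string search, here is a minimal sketch. It is a naive position-by-position scan, not the me-PCR implementation; the function name `find_primer` and the example sequences are hypothetical, and only the standard IUPAC ambiguity codes are assumed.

```python
# Standard IUPAC nucleotide ambiguity codes, mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def find_primer(primer: str, sequence: str) -> list[int]:
    """Return 0-based start positions where `primer` matches `sequence`.

    The primer may contain IUPAC ambiguity symbols; the sequence is
    assumed to contain only A, C, G and T. Hypothetical helper for
    illustration; symbols outside the table raise KeyError.
    """
    primer = primer.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(primer) + 1):
        # Every primer symbol must allow the base found at that offset.
        if all(sequence[start + k] in IUPAC[symbol]
               for k, symbol in enumerate(primer)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # R stands for A or G, so "ACR" matches both "ACG" and "ACA".
    print(find_primer("ACR", "TTACGACATT"))
```

The scan costs time proportional to primer length times sequence length, which is adequate for a demonstration but says nothing about how me-PCR achieves its reported speed.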
8.
Many databases exist with which it is possible to study the relationship between health events and various potential risk factors. Among these databases, some have variables that naturally form a hierarchical tree structure, such as pharmaceutical drugs and occupations. It is of great interest to use such databases for surveillance purposes in order to detect unsuspected relationships to disease risk. We propose a tree-based scan statistic, by which the surveillance can be conducted with a minimum of prior assumptions about the group of occupations/drugs that increase risk, and which adjusts for the multiple testing inherent in the many potential combinations. The method is illustrated using data from the National Center for Health Statistics Multiple Cause of Death Database, looking at the relationship between occupation and death from silicosis.
9.
Background: The ability to detect disease outbreaks early is important in order to minimize morbidity and mortality through timely implementation of disease prevention and control measures. Many national, state, and local health departments are launching disease surveillance systems with daily analyses of hospital emergency department visits, ambulance dispatch calls, or pharmacy sales for which population-at-risk information is unavailable or irrelevant. Conclusion: If such results hold up over longer study times and in other locations, the space–time permutation scan statistic will be an important tool for local and national health departments that are setting up early disease detection surveillance systems.
10.
Fu & Curnow (1990) derive recursive equations to find the level of significance and power of a likelihood ratio test for a changed segment of specified length, based on the scan statistic, the maximum number of successes within the specified length. Their method is computationally feasible for segment lengths of 20 or less. We present and evaluate highly accurate approximations as well as bounds for the power function of this test that are computationally feasible even for very large segment lengths. We also evaluate power when the duration of the increased length used in the test statistic does not correspond to the actual length.
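The scan statistic named in this abstract, the maximum number of successes within a window of specified length, can be sketched as follows. This shows only the statistic itself on a 0/1 sequence; it does not reproduce the recursive equations of Fu & Curnow or the approximations and bounds the abstract refers to. The function name `scan_statistic` and the sample data are assumptions made for illustration.

```python
from typing import Sequence


def scan_statistic(trials: Sequence[int], window: int) -> int:
    """Maximum number of successes in any `window` consecutive trials.

    `trials` is a sequence of 0/1 outcomes. Hypothetical helper for
    illustration, using a sliding-window running sum.
    """
    if window <= 0 or window > len(trials):
        raise ValueError("window must be between 1 and len(trials)")
    current = sum(trials[:window])  # successes in the first window
    best = current
    for i in range(window, len(trials)):
        # Slide the window one step: add the new trial, drop the oldest.
        current += trials[i] - trials[i - window]
        best = max(best, current)
    return best


if __name__ == "__main__":
    outcomes = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
    print(scan_statistic(outcomes, 3))
```

Because the running sum is updated in constant time per step, the statistic is cheap to compute for any window length; the computational difficulty discussed in the abstract concerns its significance level and power, not the statistic.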
11.
MOTIVATION: We consider the problem of identifying low-complexity regions (LCRs) in a protein sequence. LCRs are regions of biased composition, normally consisting of different kinds of repeats. RESULTS: We define new complexity measures to compute the complexity of a sequence based on a given scoring matrix, such as BLOSUM 62. Our complexity measures also consider the order of amino acids in the sequence and the sequence length. We develop a novel graph-based algorithm called GBA to identify LCRs in a protein sequence. In the graph constructed for the sequence, each vertex corresponds to a pair of similar amino acids. Each edge connects two pairs of amino acids that can be grouped together to form a longer repeat. GBA finds short subsequences as LCR candidates by traversing this graph. It then extends them to find longer subsequences that may contain full repeats with low complexities. Extended subsequences are then post-processed to refine repeats to LCRs. Our experiments on real data show that GBA has significantly higher recall compared to existing algorithms, including 0j.py, CARD, and SEG. AVAILABILITY: The program is available on request.
12.
The simultaneous testing of a large number of hypotheses in a genome scan, using individual thresholds for significance, inherently leads to inflated genome-wide false positive rates. There exist various approaches to approximating the correct genomewide p-values under various assumptions, either by way of asymptotics or simulations. We explore a philosophically different criterion, recently proposed in the literature, which controls the false discovery rate. The test statistics are assumed to arise from a mixture of distributions under the null and non-null hypotheses. We fit the mixture distribution using both a nonparametric approach and commingling analysis, and then apply the local false discovery rate to select cut-off points for regions to be declared interesting. Another criterion, the minimum total error, is also explored. Both criteria seem to be sensible alternatives to controlling the classical type I and type II error rates.
13.
Horizontally transferred DNA is largely responsible for the dissemination of virulence traits amongst bacteria. Rapid identification of acquired DNA remains difficult as whole-genome sequencing of outbreak strains is impractical, and microarray-based approaches, while powerful, are limited to genes present only in the reference strains. Here we present a novel bacterial comparative genomic hybridization method that directly compares the genomes of related strains at sub-kilobase resolution in order to identify acquired DNA. Bacterial comparative genomic hybridization utilizes the concept of metaphase chromosome comparative genomic hybridization, and exploits the resolving power of two-dimensional DNA electrophoresis. Comparison of isogenic variants of the pathogen Pseudomonas aeruginosa detected a single-copy gene insertion responsible for gentamicin resistance.
14.
15.
Background
Repeat-rich regions such as centromeres receive less attention than their gene-rich euchromatic counterparts because the former are difficult to assemble and analyze. Our objectives were to 1) map all ten centromeres onto the maize genetic map and 2) characterize the sequence features of maize centromeres, each of which spans several megabases of highly repetitive DNA. Repetitive sequences can be mapped using special molecular markers that are based on PCR with primers designed from two unique "repeat junctions". Efficient screening of large amounts of maize genome sequence data for repeat junctions, as well as key centromere sequence features required the development of specific annotation software.
16.
Genome scans using large numbers of randomly selected markers have revealed a small proportion of loci that deviate from neutral expectations and so may mark genomic regions that contribute to local adaptation. Measurements of sequence differentiation and identification of genes in these regions is important but difficult, especially in organisms with limited genetic information available. We have followed up a genome scan in the marine gastropod, Littorina saxatilis, by searching a bacterial artificial chromosome library with differentiated and undifferentiated markers, sequencing four bacterial artificial chromosomes and then analysing sequence variation in population samples for fragments at, and close to the original marker polymorphisms. We show that sequence differentiation follows the patterns expected from the original marker frequencies, that differentiated markers identify independent and highly localized sites and that these sites fall outside coding regions. Two differentiated loci are characterized by insertions of putative transposable elements that appear to have increased in frequency recently and which might influence expression of downstream genes. These results provide strong candidate loci for the study of local adaptation in Littorina. They demonstrate an approach that can be applied to follow up genome scans in other taxa and they show that the genome scan approach can lead rapidly to candidate genes in nonmodel organisms.
17.
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi-square statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi-square statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi-square statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi-square statistic.
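As background for the Shannon entropy this abstract builds on, the sketch below computes the entropy of a set of allele or haplotype frequencies, H = -Σ p log p. It shows only that building block, not the test statistic proposed in the article, whose exact form the abstract does not give. The function name `shannon_entropy` and the example frequencies are hypothetical.

```python
import math
from typing import Iterable


def shannon_entropy(frequencies: Iterable[float]) -> float:
    """Shannon entropy (natural log) of a frequency distribution.

    Frequencies are assumed to be non-negative and to sum to 1.
    Zero frequencies contribute nothing, following the convention
    that 0 * log(0) = 0. Hypothetical helper for illustration.
    """
    return -sum(p * math.log(p) for p in frequencies if p > 0)


if __name__ == "__main__":
    # Two illustrative haplotype frequency vectors (made-up values).
    balanced = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.70, 0.10, 0.10, 0.10]
    print(shannon_entropy(balanced), shannon_entropy(skewed))
```

Each term -p log p is a nonlinear function of the frequency p, in contrast to the linear function of allele frequencies that the abstract attributes to the standard chi-square statistic.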
18.
Gangnon RE. Biometrics. 2012, 68(1): 174-182
The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset.
19.
Aneuploidy plays a significant role in adverse human health conditions including birth defects, pregnancy wastage, and cancer. Currently, there is no screening method sufficiently validated that can be used routinely to identify aneugenic agents in vitro because most conventional test systems rely on the labor-intensive microscopic assessment of the aneuploid cell population. Our laboratory has recently developed a flow cytometry-based procedure for assessing numerical chromosomal aberrations in mitotic populations of lymphocytes on the basis of DNA content. Studies were conducted in 24 h treated human lymphocyte cultures to determine the sensitivity of this flow cytometry-based procedure to detect aneugenic agents. A comparison between the microscopic and the flow cytometry-based procedures for scoring polyploidy shows a strong agreement exists between the two methods. Treatments with two known aneugenic agents, griseofulvin, and paclitaxel (taxol), resulted in a dose-related increase in the mitotic index, aneuploidy, and polyploidy. In contrast, results from the treatments with two known clastogenic agents, mitomycin-C, and etoposide, show a dose-related decrease in the mitotic index with a slight increase in the frequency of hypodiploidy at concentrations that produce severe chromosomal breakage. There were no increases in hyperdiploidy and polyploidy observed. In conclusion, the reproducibility of the results obtained in this study indicates that this flow cytometry-based procedure for assessing numerical chromosomal effects in mitotic populations on the basis of DNA content is promising for the routine detection and characterization of aneugenic agents.
20.
Elo LL, Filén S, Lahesmaa R, Aittokallio T. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 2008, 5(3): 423-431
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.