Found 20 similar articles (search time: 0 ms)
1.
2.
3.
4.
5.
6.
7.
8.
Daniel J. Schaid, Jason P. Sinnwell, Shannon K. McDonnell, Stephen N. Thibodeau. Human Genetics, 2013, 132(11): 1301-1309
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in and around a gene would benefit genetic association analyses and would likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango’s statistic, to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios.
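To make the idea of a kernel-weighted clustering statistic concrete, here is a minimal sketch of the classic Tango quadratic form C = (r - p)ᵀ A (r - p) applied to variant positions. It is not the authors' Kernel Distance implementation: the exponential kernel, the scale `tau`, the use of controls as the reference distribution, and the function name `tango_statistic` are all assumptions made for illustration.

```python
import math

def tango_statistic(positions, case_counts, control_counts, tau=100.0):
    """Tango-style clustering statistic C = (r - p)^T A (r - p).

    positions      : base-pair coordinate of each variant site
    case_counts    : rare-allele counts among cases, one per site
    control_counts : rare-allele counts among controls, one per site
    tau            : kernel scale in bp (hypothetical default)
    """
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one rare allele in each group")
    # Difference between each site's share of rare alleles in cases (r)
    # and in controls (p, used here as the reference distribution).
    diff = [c / n_case - k / n_ctrl
            for c, k in zip(case_counts, control_counts)]
    stat = 0.0
    for i, xi in enumerate(positions):
        for j, xj in enumerate(positions):
            # Exponential closeness kernel: nearby sites reinforce each other.
            a_ij = math.exp(-abs(xi - xj) / tau)
            stat += a_ij * diff[i] * diff[j]
    return stat

positions = [100, 120, 140, 5000, 5020, 5040]
controls = [1, 1, 1, 1, 1, 1]
clustered = tango_statistic(positions, [5, 5, 5, 0, 0, 0], controls)
dispersed = tango_statistic(positions, [5, 0, 5, 0, 5, 0], controls)
print(clustered > dispersed)
```

With the same number of case alleles, the statistic is larger when the excess sits in adjacent sites than when it alternates along the gene. Turning the statistic into a p value (for example through the scaled χ² approximation mentioned in the abstract) is deliberately left out of this sketch.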
9.
MOTIVATION: Analysis of oligonucleotide array data, especially to select genes of interest, is a highly challenging task because of the large volume of information and various experimental factors. Moreover, interaction effects (i.e. expression changes that depend on probe effects) complicate the analysis because current methods often use an additive model to analyze data. We propose an approach to address these issues with the aim of producing a more reliable selection of differentially expressed genes. The approach uses the rank for normalization, employs the percentile-range to measure expression variation, and applies various filters to monitor expression changes. RESULTS: We compare our approach with the MAS and dChip models. A data set from an angiogenesis study is used for illustration. Results show that our approach performs better than the other methods both in identification of the positive control gene and in PCR confirmatory tests. In addition, the invariant set of genes in our approach provides an efficient way for normalization.
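The two building blocks named in the abstract, rank-based normalization and a percentile range, can be sketched as follows. The abstract does not say how ties are handled or which percentiles are used, so the average-rank tie rule, the 25th/75th defaults, and the function names are assumptions for illustration only.

```python
def rank_normalize(values):
    """Map each intensity to its fractional rank in (0, 1].

    Tied intensities share the average of their ranks (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank / n
        i = j + 1
    return ranks

def percentile(sorted_vals, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def percentile_range(values, low=25, high=75):
    """Spread of the values between two percentiles (hypothetical defaults)."""
    s = sorted(values)
    return percentile(s, high) - percentile(s, low)

print(rank_normalize([10, 30, 20, 20]))
print(percentile_range([1, 2, 3, 4, 5]))
```

Working on ranks makes arrays with different overall brightness comparable without assuming an additive intensity model, which is the motivation the abstract gives for avoiding ratio-based normalization.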
10.
11.
12.
Small genome sequencing and annotations are leading to the definition of metabolic genotypes in an increasing number of organisms. Proteomics is beginning to give insights into the use of the metabolic genotype under given growth conditions. These data sets give the basis for systemically studying the genotype-phenotype relationship. Methods of systems science need to be employed to analyze, interpret, and predict this complex relationship. These endeavors will lead to the development of a new field, tentatively named phenomics. This article illustrates how the metabolic characteristics of annotated small genomes can be analyzed using flux balance analysis (FBA). A general algorithm for the formulation of in silico metabolic genotypes is described. Illustrative analyses of the in silico Escherichia coli K-12 metabolic genotypes are used to show how FBA can be used to study the capabilities of this strain.
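For readers unfamiliar with FBA, its standard textbook formulation is a linear program over reaction fluxes. The symbols below are the conventional ones rather than notation taken from the article: S is the stoichiometric matrix (metabolites by reactions), v the flux vector, c the objective weights (often selecting a biomass reaction), and v^min, v^max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The equality constraint encodes the steady-state assumption that every internal metabolite is produced as fast as it is consumed; the bounds encode reaction reversibility and uptake limits.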
13.
RNAi, the inhibition of gene expression by double-stranded RNA molecules, has rapidly become a powerful laboratory technique to study gene function. The effectiveness of the procedure raised the question of whether this laboratory technique may actually mimic a natural cellular control mechanism that works on similar principles. Indeed, recent evidence is accumulating to suggest that RNAi is a natural control mechanism that might even serve as a primitive immune response against RNA viruses and retroposons. Three different interference scenarios seem to be utilized by various RNAi mechanisms. One of the mechanisms involves degradation of mRNA molecules. Here we suggest a method to systematically scan entire genomes simultaneously for RNAi elements and for the presence of cellular genes that are degraded by these RNAi elements via exact short base-pair matching. The method is based on scanning the genomes using a suffix tree data structure that was specifically modified to identify sets of combinations of repeated and inverted repeated sequences of 20 bp or more. An initial scan suggests that a large number of genes, about 7% of C. elegans and 3% of C. briggsae genes, have the potential to be subject to natural RNAi control. Two methods are proposed to further analyze these genes to select the cases that are more likely to be actual cases of RNAi control. One method involves looking for ESTs that can provide direct evidence that RNAi control elements are indeed expressed. The other method looks for synteny between C. elegans and C. briggsae, assuming that genes that might be under RNAi control in both organisms are more likely to be biologically significant. Taken together, supportive evidence was found for about 70 genes to be under RNAi control. Among these genes are: transposase, hormone receptors, homeobox proteins, defensin, actins, and several types of collagens.
While our method is not capable of detecting all cases of natural RNAi control, it points to a large number of potential cases that can be further verified by experimental work.
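The core matching step, finding exact inverted repeats of at least 20 bp between a gene and the rest of a genome, can be illustrated with a simple k-mer hash index. This is a swapped-in technique for clarity, not the modified suffix tree the authors describe; it also reports only fixed-length matches, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))

def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index

def antisense_matches(gene, genome, k=20):
    """Return (gene_pos, genome_pos) pairs where a k-mer of the gene
    occurs reverse-complemented (i.e. as an inverted repeat) in the genome."""
    index = kmer_index(genome, k)
    hits = []
    for i in range(len(gene) - k + 1):
        rc = reverse_complement(gene[i:i + k])
        for j in index.get(rc, []):
            hits.append((i, j))
    return hits

gene = "ACGTTGCAAGGCTTAACCGGTATCG"
genome = "TTTT" + reverse_complement(gene[2:22]) + "GGGG"
print((2, 4) in antisense_matches(gene, genome))
```

A hash index needs memory proportional to the number of k-mers and has to be rebuilt for each k, which is one reason a suffix tree is attractive when matches of any length of 20 bp or more must be found across whole genomes.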
14.
RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets (total citations: 1; self-citations: 0; citations by others: 1)
Lazzarato F, Franceschinis G, Botta M, Cordero F, Calogero RA. Bioinformatics (Oxford, England), 2004, 20(16): 2848-2850
RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. AVAILABILITY: RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html
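The kind of extraction described above can be sketched in a few lines once exon coordinates are known. This is not RRE's code or interface: the function `extract_regions`, its 0-based half-open coordinate convention, and the `flank` parameter are assumptions, and UTRs are omitted because they additionally require the coding-sequence boundaries.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))

def extract_regions(chrom, exons, strand="+", flank=10):
    """Extract upstream, intron, and downstream sequences for one gene.

    chrom  : chromosome sequence (forward strand)
    exons  : sorted list of (start, end), 0-based half-open, forward strand
    strand : "+" or "-"
    flank  : number of bases to take on each side of the gene (hypothetical)
    """
    gene_start, gene_end = exons[0][0], exons[-1][1]
    left = chrom[max(0, gene_start - flank):gene_start]
    right = chrom[gene_end:gene_end + flank]
    # An intron is the gap between one exon's end and the next exon's start.
    introns = [chrom[prev_end:next_start]
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    if strand == "-":
        # On the minus strand the upstream region lies to the right of the
        # gene, and every sequence is read as its reverse complement.
        upstream = reverse_complement(right)
        downstream = reverse_complement(left)
        introns = [reverse_complement(s) for s in reversed(introns)]
    else:
        upstream, downstream = left, right
    return {"upstream": upstream, "introns": introns, "downstream": downstream}

chrom = "AAAAACCCCCGGGGGTTTTTACGTA"
print(extract_regions(chrom, [(5, 10), (15, 20)], strand="+", flank=5))
```

Handling the strand explicitly matters: for a minus-strand gene, the promoter-containing upstream region is the sequence that follows the gene on the forward strand.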
15.
16.
Microarrays are widely used for gene expression profiling. In the case of prokaryotes, such arrays usually provide data about the composition of modulons, groups of genes whose expression is influenced by a single regulatory system or external stimulus. Unlike modulons, regulons include only genes directly controlled by regulatory systems. Here we compared the structures of the Fnr and ArcA modulons and regulons. The data about modulon composition were taken from published microarray assays, whereas regulons were characterized using comparative genomic approaches. The Fnr and ArcA regulons were shown to contain 26 and 16 operons, respectively. Ten operons had high-scoring, highly conserved sites for both Fnr and ArcA. These genes are the "core of regulons". Remarkably, all "core genes" encode enzymes involved in aerobic respiration and central metabolism. The Fnr-ArcA regulatory cascade plays an important role in expansion of the Fnr modulon.
17.
High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to efficiently integrate genomic data and metadata. Such NPR models consider multiple pathways simultaneously, allow complex interactions among genes within the pathways, and can be applied to identify pathways and genes that are related to variations of the phenotypes. These methods also provide an alternative way of mitigating the problem of a large number of potential interactions, by limiting analysis to biologically plausible interactions between genes in related pathways. Our simulation studies indicate that the proposed boosting procedure can indeed identify relevant pathways. Application to a gene expression data set on breast cancer distant metastasis indicated that the Wnt, apoptosis, and cell cycle-regulated pathways are more likely related to the risk of distant metastasis among lymph-node-negative breast cancer patients. Results from analysis of two other breast cancer gene expression data sets indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance, are important to breast cancer relapse and survival. We also observed that by incorporating the pathway information, we achieved better prediction for cancer recurrence.
18.
Background
Normalization of gene expression data refers to the comparison of expression values using reference standards that are consistent across all conditions of an experiment. In PCR studies, genes designated as "housekeeping genes" have been used as internal reference genes under the assumption that their expression is stable and independent of experimental conditions. However, verification of this assumption is rarely performed. Here we assess the use of gene microarray analysis to facilitate selection of internal reference sequences with higher expression stability across experimental conditions than can be expected using traditional selection methods.
19.
Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. (total citations: 21; self-citations: 0; citations by others: 21)
Although two-color fluorescent DNA microarrays are now standard equipment in many molecular biology laboratories, methods for identifying differentially expressed genes in microarray data are still evolving. Here, we report a refined test for differentially expressed genes which does not rely on gene expression ratios but directly compares a series of repeated measurements of the two dye intensities for each gene. This test uses a statistical model to describe multiplicative and additive errors influencing an array experiment, where model parameters are estimated from observed intensities for all genes using the method of maximum likelihood. A generalized likelihood ratio test is performed for each gene to determine whether, under the model, these intensities are significantly different. We use this method to identify significant differences in gene expression among yeast cells growing in galactose-stimulating versus non-stimulating conditions and compare our results with current approaches for identifying differentially-expressed genes. The effect of sample size on parameter optimization is also explored, as is the use of the error model to compare the within- and between-slide intensity variation intrinsic to an array experiment.
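The per-gene test mentioned above follows the general pattern of a generalized likelihood ratio test. In generic textbook form (the abstract does not give the error model itself, so the symbols here are placeholders rather than the authors' notation), with L the likelihood, Θ₀ the parameter set under the null hypothesis of equal mean intensities for the two dyes, and Θ the unrestricted set:

```latex
\lambda \;=\; -2 \,\ln
\frac{\displaystyle \sup_{\theta \in \Theta_0} L(\theta \mid x, y)}
     {\displaystyle \sup_{\theta \in \Theta} L(\theta \mid x, y)}
```

Under standard regularity conditions, λ is approximately χ²-distributed under the null hypothesis, with degrees of freedom equal to the number of parameters constrained by Θ₀, so a large λ for a gene indicates that its two intensity series differ significantly.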
20.
GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomics analysis of shotgun DNA sequences. The program includes:
- a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
- a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
- two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
- effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.
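The first two features in the list, oligonucleotide profiles of reference genomes and a Bayesian prediction of the most probable origin, can be illustrated with a plain naive Bayes classifier over k-mer frequencies. The program's own "modification of the Bayesian method" is not described here, so this sketch should not be read as its algorithm; the pseudocount smoothing and the function names are assumptions.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seq, k=4, pseudocount=1.0):
    """Log-probability of every possible k-mer in a reference sequence.

    A pseudocount keeps unseen k-mers from getting zero probability.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: math.log((counts[m] + pseudocount) / total) for m in kmers}

def classify(read, profiles, k=4):
    """Return (best_label, scores): the reference whose profile gives the
    read's k-mers the highest summed log-probability, under equal priors."""
    scores = {}
    for name, profile in profiles.items():
        scores[name] = sum(
            profile[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in profile  # skip k-mers with ambiguous bases
        )
    return max(scores, key=scores.get), scores

profiles = {
    "at_rich": kmer_profile("ATATTAATTATAAATTTATA" * 20, k=3),
    "gc_rich": kmer_profile("GCGGCCGCGCCGGGCCCGCG" * 20, k=3),
}
label, _ = classify("ATTATAAT", profiles, k=3)
print(label)
```

The value of `k` passed to `classify` must match the one used to build the profiles. Predicting at different taxonomic ranks could then be done by pooling the reference genomes of each taxon into one profile per rank, though that step is not shown.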