Similar Articles
20 similar articles found.
1.
New high-throughput sequencing technologies can generate millions of short sequences in a single experiment. As data volumes grow, comparing multiple experiments on different cell lines under different experimental conditions becomes a major challenge. In this paper, we investigate ways to compare multiple ChIP-sequencing experiments. Specifically, we studied epigenetic regulation in breast cancer and the effect of estrogen using 50 ChIP-sequencing data sets from the Illumina Genome Analyzer II. First, we evaluate the correlation among experiments based on the total number of reads in the transcribed and promoter regions of the genome. Then, we adopt a method used to identify the most stable genes in RT-PCR experiments to characterize the background signal across all experiments and to identify the most variable transcribed and promoter regions of the genome. We observed that the most variable genes for transcribed regions are very distinct from those for promoter regions. Gene Ontology and functional enrichment analyses of these most variable genes demonstrate the biological relevance of the results. In this study, we present a method that can effectively select differential regions of the genome based on protein-binding profiles across multiple experiments, using real data points without any normalization among samples.
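The abstract does not name the RT-PCR stability method it borrows. One widely used choice is the geNorm stability measure M, assumed here purely for illustration: for each region j, M_j is the average, over all other regions k, of the standard deviation across experiments of log2(reads_j / reads_k). Regions with low M track the shared background; regions with high M are the most variable. The sketch works on raw read counts, mirroring the no-normalization setting; the pseudocount of 1, the function name `genorm_m_values` and the toy counts are hypothetical.

```python
import math
from statistics import stdev


def genorm_m_values(counts, pseudocount=1.0):
    """Return a geNorm-style stability value M for each region.

    counts: dict mapping region name -> list of read counts, one per
    experiment (equal-length lists, at least 2 experiments).
    Higher M means the region varies more relative to the other regions.
    """
    names = list(counts)
    # log2-transform with a pseudocount so zero counts do not break log()
    logs = {n: [math.log2(c + pseudocount) for c in counts[n]] for n in names}
    m_values = {}
    for j in names:
        pairwise_sd = []
        for k in names:
            if k == j:
                continue
            # log-ratio of region j to region k in every experiment
            ratios = [a - b for a, b in zip(logs[j], logs[k])]
            pairwise_sd.append(stdev(ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


if __name__ == "__main__":
    reads = {
        "promA": [100, 200, 400],  # scales with sequencing depth
        "promB": [50, 100, 200],   # also roughly proportional
        "promC": [300, 20, 900],   # erratic across experiments
    }
    m = genorm_m_values(reads)
    for region in sorted(m, key=m.get, reverse=True):
        print(region, round(m[region], 3))
```

Because only log-ratios between regions enter M, a uniform change in sequencing depth across an experiment cancels out, which is why this kind of measure can be applied without cross-sample normalization.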

2.
3.
Time-course expression analysis accounts for a large portion of microarray applications. A primary goal of such experiments is to detect genes that change over a period of time or at specific time points of interest. Data with a small number of replicates at only a few unaligned time points in multiple groups pose challenges for efficient statistical analysis. Some existing methods are limited by unverifiable assumptions or apply to only two groups. We present a new method for detecting differentially expressed genes in nonhomogeneous time-course experiments with multiple groups. First, the method models the time-course curve of each gene with a Gaussian process, which aligns the nonhomogeneous time-course data and also yields the gradient of the curve; the gradient is used as directional information to increase the sensitivity of detecting temporal changes. Second, we adopt a nonparametric method to test a surrogate hypothesis based on the augmented data from the Gaussian process model. The proposed method is robust in terms of both model fitting and testing. It requires no distributional assumptions for the observations or the test statistic, and it works with as few as three replicate samples at four or five time points in multiple groups. We demonstrate the effectiveness and superiority of the new method relative to several existing methods using simulated models and two real data sets.
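The Gaussian-process step can be sketched as follows. With a squared-exponential kernel k(t, t') = s² exp(-(t - t')² / (2ℓ²)) and noise variance σ², the posterior mean at a new time t* is m(t*) = Σ_i k(t*, t_i) α_i with α = (K + σ²I)⁻¹ y, and its slope is m'(t*) = Σ_i -(t* - t_i)/ℓ² · k(t*, t_i) α_i. Evaluating both on a shared grid puts groups sampled at different times on a common footing. The kernel choice and the fixed hyperparameters (`length`, `signal`, `noise`) are assumptions; in practice they would be estimated from data. The nonparametric test on the augmented data is not shown, and the toy values in the usage block are made up.

```python
import math


def rbf(a, b, length=1.0, signal=1.0):
    """Squared-exponential covariance between two time points."""
    return signal ** 2 * math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def gp_mean_and_slope(times, values, grid, length=1.0, signal=1.0, noise=0.1):
    """GP posterior mean and its time derivative at each grid point."""
    n = len(times)
    k = [[rbf(times[i], times[j], length, signal) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, values)  # (K + noise^2 I)^-1 y
    means, slopes = [], []
    for t in grid:
        weights = [rbf(t, ti, length, signal) for ti in times]
        means.append(sum(w * a for w, a in zip(weights, alpha)))
        # derivative of the kernel in t: -(t - ti) / length^2 * k(t, ti)
        slopes.append(sum(-(t - ti) / length ** 2 * w * a
                          for ti, w, a in zip(times, weights, alpha)))
    return means, slopes


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0, 3.0, 4.0]
    # two groups sampled at different (unaligned) times, toy values
    _, slope_a = gp_mean_and_slope([0.0, 1.0, 2.5, 4.0], [1.0, 1.4, 2.6, 3.1], grid)
    _, slope_b = gp_mean_and_slope([0.5, 1.5, 3.0, 4.0], [1.1, 1.0, 0.9, 1.2], grid)
    for t, sa, sb in zip(grid, slope_a, slope_b):
        print(f"t={t}: slope A={sa:+.2f}, slope B={sb:+.2f}")
```

The slope sequences on the common grid are the kind of directional information the abstract describes feeding into the test.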

4.
Modern microarray technology can provide data on the expression of thousands of genes, and even of whole genomes. An important question is how this technology can be used most effectively to unravel the workings of the cellular machinery. Here, we propose a method to infer genetic networks from data generated by appropriately designed microarray experiments. In addition to identifying the genes that directly affect a given gene, the method also estimates the strength of these effects. We discuss both the experimental setup and the theoretical background.

5.
Inferring genetic regulatory logic from expression data
MOTIVATION: High-throughput molecular genetics methods allow the collection of data on gene expression at different time points and under different conditions. The challenge is to infer gene regulatory interactions from these data and to gain insight into the mechanisms of genetic regulation. RESULTS: We propose a model for genetic regulatory interactions that has biologically motivated Boolean logic semantics but is probabilistic in nature, and is therefore able to cope with noisy biological processes and data. We propose a method for learning the model from data based on the Bayesian approach, using Gibbs sampling. We tested our method on previously published data from the Saccharomyces cerevisiae cell cycle and found relationships between genes consistent with biological knowledge.

6.
Duarte CW, Zeng ZB. Genetics. 2011;187(3):955-964.
Expression QTL (eQTL) studies collect microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that map to a single locus, or eQTL "hotspot," and it would be desirable to find the mechanism of coregulation for these gene groups. However, current network reconstruction algorithms face many difficulties, such as low power and high computational cost. A common observation about biological networks is that they have a scale-free, or power-law, architecture. In such an architecture, there are highly influential nodes with many connections to other nodes. If we assume that this type of architecture applies to genetic networks, we can simplify the problem of genetic network reconstruction by focusing on discovering the key regulatory genes at the top of the network. We introduce the concept of "shielding," in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We iteratively build networks downward from the eQTL through the shielders using tests of conditional independence. We propose a novel test that controls the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we demonstrate that we can control the shielder false-positive rate while obtaining high shielder and edge specificity. In addition, we show that our method is robust to violation of the latent variable assumption, an important feature for its practical application. We applied our method to a yeast eQTL data set in which microarray and marker data were collected from the progeny of a backcross of two strains of Saccharomyces cerevisiae (Brem et al. 2002). Seven genetic networks were discovered, and bioinformatic analysis of the discovered regulators and their regulated genes has generated plausible hypotheses about mechanisms of regulation that can be tested in future experiments.

7.
Linden R, Bhaya A. BioSystems. 2007;88(1-2):76-91.
This paper develops an algorithm that extracts explanatory rules from microarray data, treated as time series, using genetic programming (GP) and fuzzy logic. Reverse Polish notation (RPN) is used to describe the rules and to facilitate the GP approach. The algorithm also allows prior knowledge to be inserted, making it possible to find rule sets that include already known relationships between genes. The proposed algorithm is applied to problems arising in the construction of gene regulatory networks, using two different sets of real data from biological experiments on the Arabidopsis thaliana cold response and the rat central nervous system, respectively. The results show that the proposed technique can fit data to a predefined precision even when the data set has thousands of features but only a limited number of time points, a situation in which traditional statistical alternatives encounter difficulties because of the scarcity of time points.
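Because the rules are encoded in RPN, each one can be evaluated in a single left-to-right pass with a stack, with no parentheses or precedence rules to parse. A minimal sketch, assuming the common Zadeh fuzzy operators (AND = min, OR = max, NOT = 1 - x) and membership degrees already computed for each gene; the abstract does not specify the paper's actual operator set or membership functions, and the function name and gene names are hypothetical.

```python
def eval_fuzzy_rpn(tokens, memberships):
    """Evaluate a fuzzy rule written in reverse Polish notation.

    tokens: e.g. ["geneA", "geneB", "NOT", "AND"] meaning geneA AND NOT geneB
    memberships: dict mapping gene name -> membership degree in [0, 1]
    """
    stack = []
    for tok in tokens:
        if tok == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(min(a, b))        # Zadeh t-norm
        elif tok == "OR":
            b, a = stack.pop(), stack.pop()
            stack.append(max(a, b))        # Zadeh t-conorm
        elif tok == "NOT":
            stack.append(1.0 - stack.pop())
        else:
            stack.append(memberships[tok])  # operand: a gene's degree
    if len(stack) != 1:
        raise ValueError("malformed RPN rule")
    return stack[0]


if __name__ == "__main__":
    degrees = {"geneA": 0.8, "geneB": 0.25}
    print(eval_fuzzy_rpn(["geneA", "geneB", "NOT", "AND"], degrees))
```

A flat token list is also easy for GP operators to mutate or splice, although crossover must then preserve the stack balance so that every rule still leaves exactly one value.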

8.
Diseases such as obesity, diabetes, and atherosclerosis result from multiple genetic and environmental factors and, importantly, from interactions between them. The identification of susceptibility genes for these diseases using genetic and genomic technologies is accelerating, and a number of genes for common diseases are expected to be identified over the next several years. However, identifying single genes for disease has limited utility, because in complex systems diseases do not originate from single gene changes. Furthermore, the identification of single disease genes may not lead directly to genes that can be targeted for therapeutic intervention. Therefore, uncovering single disease genes in isolation from the broader network of molecular interactions in which they operate will generally limit the overall utility of such discoveries. Several integrative approaches have been developed and applied to network reconstruction. Here we review several of these approaches, which integrate genetic, expression, and clinical data to elucidate the networks underlying disease. Networks reconstructed from these data provide a richer context in which to interpret associations between genes and disease. Consequently, these networks can help define the pathways underlying disease more objectively and identify biomarkers and more robust points for therapeutic intervention.

9.
MOTIVATION: The reconstruction of genetic networks is the holy grail of functional genomics. Its core task is to identify the causal structure of a gene network, that is, to distinguish direct from indirect regulatory interactions among gene products. In other words, to reconstruct a genetic network is to identify, for each gene in the network, which other genes' activity it directly influences. Perturbations of gene activity are crucial to this task. Genomic technology permits large-scale experiments that perturb the activity of many genes and assess the effect of each perturbation on all other genes in a genome. However, such experiments cannot distinguish between direct and indirect effects of a genetic perturbation. RESULTS: I present an algorithm to reconstruct direct regulatory interactions in gene networks from the results of gene perturbation experiments. The algorithm is based on a graph representation of genetic networks and applies to networks of arbitrary size and complexity. Its algorithmic complexity in both storage and time is low, less than O(n^2). In practice, the algorithm can reconstruct networks of several thousand genes in mere CPU seconds on a desktop workstation. AVAILABILITY: A Perl implementation of the algorithm is given in the Appendix. CONTACT: wagnera@unm.edu
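In graph terms, a perturbation experiment yields an edge i → j whenever perturbing gene i changes gene j, and this graph mixes direct and indirect effects. The core idea can be framed as a transitive reduction: for an acyclic graph, removing every edge that is implied by a longer path leaves only the candidate direct interactions. The sketch below assumes acyclic input (feedback loops need extra handling that is not shown), is written in Python rather than the paper's Perl, is not the Appendix implementation, and makes no attempt to meet the sub-quadratic bound the abstract reports. The function name and gene labels are hypothetical.

```python
def transitive_reduction(graph):
    """Transitive reduction of a directed acyclic graph.

    graph: dict mapping node -> set of nodes whose activity changes when
    that node is perturbed. Returns a new dict keeping only the edges
    that are not explained by a longer path.
    """
    def reachable(start):
        # all nodes reachable from start by a path of length >= 1
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced = {}
    for u, succs in graph.items():
        indirect = set()
        for w in succs:
            indirect |= reachable(w)  # targets reachable through another child
        reduced[u] = {v for v in succs if v not in indirect}
    return reduced


if __name__ == "__main__":
    perturbation_effects = {
        "A": {"B", "C", "D"},
        "B": {"C", "D"},
        "C": {"D"},
        "D": set(),
    }
    print(transitive_reduction(perturbation_effects))
```

Here perturbing A affects B, C and D, but only A → B survives, because the effects on C and D are already explained by the path through B.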

10.
The male germ line stem cell is the only cell type in the adult that can contribute genes to the next generation and is characterized by postnatal proliferation. It has not been determined whether this cell population can be used to deliberately introduce genetic modification into the germ line to generate transgenic animals or whether human somatic cell gene therapy has the potential to accidentally introduce permanent genetic changes into a patient's germ line. Here we report that several techniques can be used to achieve both in vitro and in vivo gene transfer into mouse male germ line stem cells using a retroviral vector. Expression of a retrovirally delivered reporter lacZ transgene in male germ line stem cells and differentiated germ cells persisted in the testis for more than 6 months. At least one in 300 stem cells could be infected. The experiments demonstrate a system to introduce genes directly into the male germ line and also provide a method to address the potential of human somatic cell gene therapy DNA constructs to enter a patient's germ line.

11.
This report describes an Escherichia coli genetic system that allows bacterial genetic methods to be applied to the study of essentially any prokaryotic or eukaryotic site-specific DNA-binding protein. It consists of two parts. The first is a set of tools that facilitate the construction of customized E. coli strains bearing single-copy lacZYA reporters that are repressed by a specific target protein. The second is a pair of regulatable protein expression vectors that permit in vivo production of the target protein at levels appropriate for genetic experiments. When expressed in a properly designed reporter strain, the target protein represses the lac genes, producing an E. coli phenotype that can be quantitatively measured or exploited in large-scale genetic screens or selections. As a result, large plasmid-based libraries of protein genes, or pools of mutagenized variants of a given gene, can be examined in relatively simple genetic experiments. The strain construction technique is also useful for generating E. coli strains bearing reporters for other types of genetic systems, including repression-based and activation-based systems in which chimeric proteins are used to examine interactions between foreign protein domains.

12.
13.
In recent years, the growing amount of available genomic data has made it easier to appreciate the extent to which organisms increase their genetic diversity through horizontally transferred genetic material. Such transfers can give rise to extremely dynamic genomes in which a significant proportion of the coding DNA has been contributed by external sources. Because of the impact of these horizontal transfers on the ecological and pathogenic character of the recipient organisms, methods are continually sought that can computationally determine which genes of a given genome are products of transfer events. In this paper, we introduce and discuss a novel computational method for identifying horizontal transfers that relies on a gene's nucleotide composition and obviates the need for knowledge of codon boundaries. In addition to being applicable to individual genes, the method can easily be extended to clusters of horizontally transferred genes. Using an extensive and carefully designed set of experiments on 123 archaeal and bacterial genomes, we demonstrate that the new method shows a significant improvement in sensitivity compared with previously published approaches. In fact, it achieves an average relative improvement across genomes of between 11% and 41% over the Codon Adaptation Index method in distinguishing native from foreign genes. Our method's horizontal gene transfer predictions for the 123 microbial genomes are available online at http://cbcsrv.watson.ibm.com/HGT/.
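The abstract says the method is composition-based and frame-independent but does not give its statistic. As a stand-in (an assumption, not the authors' scoring), the sketch below counts overlapping trinucleotides at every position, so no codon boundaries are needed, and ranks genes by the L1 distance between each gene's profile and the pooled genome profile. The choice of k = 3, the L1 distance, the function names and the toy sequences are all illustrative.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers at every position, so reading frame is irrelevant."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def to_profile(counts, k):
    """Relative frequencies over the 4**k unambiguous ACGT k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) or 1
    return {m: counts[m] / total for m in kmers}


def rank_by_atypicality(genes, k=3):
    """Rank genes by L1 distance between their k-mer profile and the genome's.

    genes: dict mapping gene name -> nucleotide sequence.
    Returns gene names, most compositionally atypical first.
    """
    pooled = Counter()
    for seq in genes.values():
        pooled.update(kmer_counts(seq, k))
    background = to_profile(pooled, k)
    scores = {}
    for name, seq in genes.items():
        profile = to_profile(kmer_counts(seq, k), k)
        scores[name] = sum(abs(profile[m] - background[m]) for m in background)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    toy_genome = {
        "native1": "GCGATCGCGGCATCGC" * 4,
        "native2": "CGGCATCGCGATGCCG" * 4,
        "native3": "GCCGATGCGCGATCGG" * 4,
        "foreign": "ATTATAAATTTATAAT" * 4,  # AT-rich, compositionally atypical
    }
    print(rank_by_atypicality(toy_genome))
```

Extending this to clusters of transferred genes, as the abstract mentions, could be as simple as scoring concatenated neighbors, though the paper's actual extension is not described here.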

14.
Genes are gained and lost over the course of evolution. A recent study found that over 1,800 new genes have appeared during primate evolution and that an unexpectedly high proportion of these genes are expressed in the human brain. But what are the molecular functions of newly evolved genes and what is their impact on an organism's fitness? The acquisition of new genes may provide a rich source of genetic diversity that fuels evolutionary innovation. Although gene manipulation experiments are not feasible in humans, studies in model organisms, such as Drosophila melanogaster, have shown that new genes can quickly become integrated into genetic networks and become essential for survival or fertility. Future studies of new genes, especially chimeric genes, and their functions will help determine the role of genetic novelty in the adaptation and diversification of species.

15.
MOTIVATION: Most biological traits may be correlated with underlying gene expression patterns that are partially determined by DNA sequence variation. The correlations between gene expression and quantitative traits are essential for understanding gene function and dissecting gene regulatory networks. RESULTS: In the present study, we adopted a novel statistical method, the stochastic expectation-maximization (SEM) algorithm, to analyze the associations between gene expression levels and quantitative trait values and to identify genetic loci controlling variation in gene expression. In the first step, gene expression levels measured in microarray experiments were assigned to two clusters based on the strength of their association with the phenotypes of the quantitative trait under investigation. In the second step, genes associated with the trait were mapped to genetic loci in the genome. Because gene expression levels are quantitative, the genetic loci controlling these expression traits are called expression quantitative trait loci. We applied the same SEM algorithm to a real data set from a barley genetic experiment with both quantitative traits and gene expression traits. For the first time, we identified genes associated with eight agronomic traits of barley. These genes were then mapped to seven chromosomes of the barley genome. The SEM algorithm and the results of the barley data analysis are useful to scientists in bioinformatics and plant breeding. AVAILABILITY AND IMPLEMENTATION: The R program for the SEM algorithm can be downloaded from our website: http://www.statgen.ucr.edu.
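The first step described above, splitting genes into trait-associated and non-associated clusters, can be sketched as a stochastic EM (SEM) fit of a two-component normal mixture to one association score per gene. Each iteration refits the components from the current hard labels (M-step), computes each gene's posterior probability of belonging to the associated component (E-step), and then draws a new hard label for every gene from that probability (S-step). Using a normal mixture on a per-gene score, the median-split start, the seeded generator and the function names are assumptions for illustration; the authors' R program may differ.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit(values):
    """Mean and standard deviation, with the SD floored to avoid collapse."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(math.sqrt(var), 1e-3)


def sem_two_clusters(scores, iterations=100, seed=0):
    """Stochastic EM for a two-component normal mixture.

    scores: one association statistic per gene (e.g. |correlation| with
    the trait). Returns (labels, params); label 1 marks the component
    with the larger mean, params = (weight1, (mean0, sd0), (mean1, sd1)).
    """
    if len(scores) < 4:
        raise ValueError("need at least 4 scores")
    rng = random.Random(seed)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for i in order[len(scores) // 2:]:      # start: upper half in cluster 1
        labels[i] = 1
    params = None
    for _ in range(iterations):
        groups = [[s for s, lab in zip(scores, labels) if lab == c] for c in (0, 1)]
        if min(len(g) for g in groups) >= 2:  # M-step on sampled labels
            params = (len(groups[1]) / len(scores), fit(groups[0]), fit(groups[1]))
        weight1, (m0, s0), (m1, s1) = params
        new_labels = []
        for s in scores:
            p0 = (1 - weight1) * normal_pdf(s, m0, s0)
            p1 = weight1 * normal_pdf(s, m1, s1)
            post1 = p1 / (p0 + p1) if p0 + p1 > 0 else 0.5       # E-step
            new_labels.append(1 if rng.random() < post1 else 0)  # S-step
        labels = new_labels
    if params[2][0] < params[1][0]:         # make label 1 the high-mean cluster
        labels = [1 - lab for lab in labels]
        params = (1 - params[0], params[2], params[1])
    return labels, params


if __name__ == "__main__":
    toy_scores = [0.02, 0.05, 0.07, 0.08, 0.10, 0.12, 1.9, 1.95, 2.0, 2.05, 2.1, 2.2]
    print(sem_two_clusters(toy_scores)[0])
```

SEM's random S-step helps the chain escape poor starting partitions; in practice one would typically average parameters over iterations after a burn-in rather than keep only the last draw.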

16.
Meta-analysis of information from quantitative trait loci (QTL) mapping experiments was used to derive distributions of the effects of genes affecting quantitative traits. Two limitations of such information were accounted for: reported QTL effects include experimental error, and mapping experiments can detect only QTL above a certain size. Data from pig and dairy mapping experiments were used. Gamma distributions of QTL effects were fitted by maximum likelihood. The derived distributions were moderately leptokurtic, consistent with many genes of small effect and few of large effect. The leading 17% and 35% of QTL explained 90% of the genetic variance for the dairy and pig distributions, respectively. The number of segregating genes affecting a quantitative trait in dairy populations was predicted assuming that these genes are neutral with respect to fitness. Between 50 and 100 genes were predicted, depending on the assumed effective population size. Because the data for the analysis included no QTL of small effect, the ability to estimate the number of small-effect QTL is necessarily weak, and there may be more small-effect QTL than our gamma distributions predict. Nevertheless, the distributions have important implications for QTL mapping experiments and marker-assisted selection (MAS). Powerful mapping experiments, able to detect QTL of 0.1σ_p (one-tenth of the phenotypic standard deviation), will be required to detect enough QTL to explain 90% of the genetic variance for a quantitative trait.
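Percentages like "the leading 17% of QTL explain 90% of the genetic variance" can be derived from a fitted gamma distribution by simulation: draw QTL effects a from Gamma(shape, scale), take each QTL's share of genetic variance as proportional to a² (which assumes, for illustration, biallelic QTL at equal allele frequencies), and count how many of the largest QTL are needed to reach 90%. The shape values in the usage block are arbitrary, not the paper's estimates, and the function name is hypothetical; the scale parameter cancels out of the resulting fraction.

```python
import random


def fraction_explaining(shape, scale, n_qtl=10_000, target=0.9, seed=1):
    """Fraction of the largest-effect QTL that together explain `target`
    of the genetic variance, when QTL effects follow Gamma(shape, scale)."""
    rng = random.Random(seed)
    effects = sorted((rng.gammavariate(shape, scale) for _ in range(n_qtl)),
                     reverse=True)
    variances = [a * a for a in effects]  # variance share assumed ∝ effect²
    total = sum(variances)
    running = 0.0
    for count, v in enumerate(variances, start=1):
        running += v
        if running >= target * total:
            return count / n_qtl
    return 1.0


if __name__ == "__main__":
    for shape in (0.4, 1.0, 4.0):  # smaller shape = more leptokurtic effects
        print(shape, round(fraction_explaining(shape, 1.0), 3))
```

The smaller the shape parameter, the more the variance concentrates in a few large QTL, which is the pattern the abstract describes as "many genes of small effect and few of large effect."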

17.
Spermatogenesis is an elaborate process involving both cell division and differentiation, and cell-cell interactions. Defects in any of these processes can result in infertility, and in some cases these can be genetic in cause. Mapping experiments have defined at least three regions of the human Y chromosome that are required for normal spermatogenesis. Two of these contain the genes encoding the RNA binding proteins RBM and DAZ, suggesting that the control of RNA metabolism is likely to be an important control point for human spermatogenesis. A similar analysis in mice has shown that at least two regions of the mouse Y chromosome are essential for spermatogenesis. Both genetic and reverse genetic approaches have been used to identify mouse autosomal genes required for spermatogenesis. These studies have shown that genes in a number of different pathways are essential for normal spermatogenesis, and also provide putative models of human infertility.

18.
An improved algorithm for clustering gene expression data
MOTIVATION: Recent advances in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, we propose a two-stage clustering algorithm that employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership in multiple classes. An iterated version of the well-known Fuzzy C-Means algorithm is also used for clustering. RESULTS: The significant superiority of the proposed two-stage clustering algorithm over the average linkage method, the Self-Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), which are widely used for clustering gene expression data, is established on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed.
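The two-stage algorithm also iterates the standard Fuzzy C-Means (FCM) procedure. Plain FCM alternates two updates: each cluster center becomes the mean of all points weighted by their memberships raised to the fuzzifier m, and each membership becomes u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m-1)), where d_ik is the distance from point k to center i. The sketch below shows only this FCM core; the variable string length genetic scheme, the multiobjective genetic stage and the paper's iteration scheme are not reproduced, and m = 2 with a seeded random start are illustrative defaults.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples.

    Returns (centers, u) where u[i][k] is the degree to which point k
    belongs to cluster i; each column of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships, normalized so each point's degrees sum to 1
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    centers = []
    for _ in range(iterations):
        # center update: membership^m weighted mean of the points
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / total
                                 for d in range(dim)))
        # membership update from distances to the new centers
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dists = [math.dist(points[k], centers[i]) for i in range(c)]
            if min(dists) == 0.0:  # point sits on a center: crisp membership
                new_u[dists.index(0.0)][k] = 1.0
                continue
            for i in range(c):
                new_u[i][k] = 1.0 / sum((dists[i] / dists[j]) ** (2 / (m - 1))
                                        for j in range(c))
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, u = fuzzy_c_means(pts, c=2)
    print([round(x, 2) for x in u[0]])
```

Unlike hard k-means, every point keeps a graded membership in every cluster, which is what allows an algorithm to single out points with significant membership in more than one class.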

19.
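Using a mixture distribution model and the maximum likelihood method, we propose a method for the genetic analysis of qualitative-quantitative traits controlled by genes at two loci in the F2 and backcross populations of early hybrid generations. With this method, major genes and their modes of action can be tested, and major-gene and minor-gene (polygene) effects can be estimated. The scope of application and the effective range of using backcross populations for the genetic analysis and detection of qualitative-quantitative traits are also discussed.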
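In a backcross such as F1 (AaBb) × P1 (AABB), two unlinked major genes segregate into four genotype classes in an expected 1:1:1:1 ratio, so the phenotype can be modeled as a four-component normal mixture whose common variance absorbs polygenic and environmental variation. The following is a minimal EM sketch of that maximum-likelihood fit, assuming unlinked loci, equal within-class variances and unconstrained class means; the original method's model tests (for example, constraining class means to particular additive, dominance or epistatic forms) and the F2 case are not reproduced. The function name and toy phenotypes are hypothetical.

```python
import math


def em_backcross_two_major_genes(y, iterations=500, tol=1e-8):
    """EM for a four-component normal mixture with fixed 1/4 weights.

    y: phenotype values of backcross individuals. Returns (means, sd):
    one mean per two-locus genotype class and a shared SD that absorbs
    polygenic plus environmental variation.
    """
    n = len(y)
    ys = sorted(y)
    # start the four class means at the 1/8, 3/8, 5/8 and 7/8 quantiles
    means = [ys[min(n - 1, (2 * q + 1) * n // 8)] for q in range(4)]
    grand = sum(y) / n
    sd = math.sqrt(sum((v - grand) ** 2 for v in y) / n) / 2
    norm = 1.0 / math.sqrt(2 * math.pi)
    prev_ll = -math.inf
    for _ in range(iterations):
        # E-step: posterior probability of each genotype class per individual
        resp, ll = [], 0.0
        for v in y:
            dens = [0.25 * norm / sd * math.exp(-0.5 * ((v - mu) / sd) ** 2)
                    for mu in means]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: class means and the pooled within-class SD
        weights = [max(sum(r[j] for r in resp), 1e-300) for j in range(4)]
        means = [sum(r[j] * v for r, v in zip(resp, y)) / weights[j] for j in range(4)]
        sd = math.sqrt(sum(r[j] * (v - means[j]) ** 2
                           for r, v in zip(resp, y) for j in range(4)) / n)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return means, sd


if __name__ == "__main__":
    toy = [9.8, 10.0, 10.1, 10.2, 11.8, 12.0, 12.1, 12.2,
           13.9, 14.0, 14.1, 14.2, 15.8, 16.0, 16.1, 16.3]
    class_means, pooled_sd = em_backcross_two_major_genes(toy)
    print([round(x, 2) for x in class_means], round(pooled_sd, 3))
```

Testing for major genes would then compare the maximized likelihood of this model with that of simpler models (for example, a single normal distribution) by a likelihood-ratio test.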

20.
The cultivated peanut (Arachis hypogaea L.) is an allotetraploid of recent origin, with an AABB genome and low genetic diversity. Perhaps because of its limited genetic diversity, this species lacks resistance to a number of important pests and diseases. In contrast, wild Arachis species are genetically diverse and are rich sources of disease resistance genes. Consequently, studying wild peanut relatives is attractive for two reasons: to help understand peanut genetics and to characterize wild alleles that could confer disease resistance. With this in mind, we developed a diploid population from a cross between two wild peanut relatives, in order to construct a dense genetic map that could serve as a reference for peanut genetics and to characterize the regions of the Arachis genome that encode disease resistance. We tested two methods for developing and genotyping single nucleotide polymorphisms (SNPs) in candidate disease resistance genes: one based on single-base primer extension and the other based on the amplification refractory mutation system-polymerase chain reaction (ARMS-PCR). We found single-base primer extension to be an efficient method, suitable for high-throughput SNP mapping; it allowed us to locate five candidate resistance genes on our genetic map.
