Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Yu  Yun  Jermaine  Christopher  Nakhleh  Luay 《BMC genomics》2016,17(10):784-124

Background

Phylogenetic networks are leaf-labeled graphs used to model and display complex evolutionary relationships that do not fit a single tree. There are two classes of phylogenetic networks: Data-display networks and evolutionary networks. While data-display networks are very commonly used to explore data, they are not amenable to incorporating probabilistic models of gene and genome evolution. Evolutionary networks, on the other hand, can accommodate such probabilistic models, but they are not commonly used for exploration.

Results

In this work, we show how to turn evolutionary networks into a tool for statistical exploration of phylogenetic hypotheses via a novel application of Gibbs sampling. We demonstrate the utility of our work on two recently available genomic data sets, one from a group of mosquitos and the other from a group of modern birds. We demonstrate that our method allows the use of evolutionary networks not only for explicit modeling of reticulate evolutionary histories, but also for exploring conflicting treelike hypotheses. We further demonstrate the performance of the method on simulated data sets, where the true evolutionary histories are known.

Conclusion

We introduce an approach to explore phylogenetic hypotheses over evolutionary phylogenetic networks using Gibbs sampling. The hypotheses could involve reticulate and non-reticulate evolutionary processes simultaneously as we illustrate on mosquito and modern bird genomic data sets.

2.

Background

Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.

Results

In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores.
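
The tree case that the abstract builds on can be illustrated with a minimal sketch of the Fitch algorithm. It assumes a rooted binary tree, a single character, and unit substitution costs; the nested-tuple tree encoding and the names `fitch_score`, `tree`, and `states` are hypothetical choices made for this illustration. It does not implement the paper's extension to networks or to unequal cost matrices.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at the root, minimum number of changes)."""
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("A", "B"), ("C", "D"))
states = {"A": "G", "B": "G", "C": "T", "D": "G"}
root_set, score = fitch_score(tree, states)
print(root_set, score)
```

With the leaf states above, a single change on the branch leading to leaf C explains the data, so the score is 1.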

Conclusion

The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.

3.

Background

Although the patterns of co-substitutions in RNA are now well characterized, detection of coevolving positions in proteins remains a difficult task. It has been recognized that the signal is typically weak, due to the fact that (i) amino acids are characterized by various biochemical properties, so that distinct amino-acid changes are not functionally equivalent, and (ii) a given mutation can be compensated by more than one mutation, at more than one position.

Results

We present a new method based on phylogenetic substitution mapping. The two above-mentioned problems are addressed by (i) the introduction of a weighted mapping, which accounts for the biochemical effects (volume, polarity, charge) of amino-acid changes, (ii) the use of a clustering approach to detect groups of coevolving sites of virtually any size, and (iii) the distinction between biochemical compensation and other coevolutionary mechanisms. We apply this methodology to a previously studied data set of bacterial ribosomal RNA, and to three protein data sets (myoglobin of vertebrates, S-locus Receptor Kinase and Methionine Amino-Peptidase).

Conclusion

We succeed in detecting groups of sites which significantly depart from the null hypothesis of independence. Group sizes range from pairs to groups of size ? 10, depending on the substitution weights used. The structural and functional relevance of these groups of sites is assessed, and the various evolutionary processes potentially generating correlated substitution patterns are discussed.

4.

Background

The availability of sequences from whole genomes to reconstruct the tree of life has the potential to enable the development of phylogenomic hypotheses in ways that have not previously been possible. A significant bottleneck in the analysis of genomic-scale views of the tree of life is the time required for manual curation of genomic data into multi-gene phylogenetic matrices.

Results

To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multigene/multi species matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis.

Conclusion

Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data.

5.

Background

The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.

Results

Using conventional benchmark tests we demonstrate that on average MergeAlign MSAs are more accurate than MSAs generated using any single matrix of sequence substitution. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.

Conclusion

Using multiple tests of alignment performance we demonstrate that this novel method has broad general application in biological research.

6.
7.

Background

The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set.

Results

In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results.
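
The scoring idea described above (a mean of absolute weighted gene t-scores, with larger weights for genes that appear in fewer gene sets) can be sketched as follows. This is an illustration under stated assumptions, not the PADOG package: the weight formula in `gene_weights` is an assumed example of a down-weighting scheme, the t-scores are taken as given rather than computed by a moderated t-test, and all names and data are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def gene_weights(gene_sets: Dict[str, List[str]]) -> Dict[str, float]:
    """Weight genes between 1 and 2; genes in fewer sets get larger weights.

    Assumed formula for illustration: w = 1 + sqrt((max_f - f) / (max_f - min_f)),
    where f is the number of gene sets containing the gene.
    """
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:
        # Every gene is equally frequent, so no gene is down-weighted.
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}


def gene_set_scores(gene_sets: Dict[str, List[str]],
                    t_scores: Dict[str, float]) -> Dict[str, float]:
    """Score each set as the mean of |t| * weight over its genes."""
    weights = gene_weights(gene_sets)
    return {
        name: sum(abs(t_scores[g]) * weights[g] for g in genes) / len(genes)
        for name, genes in gene_sets.items()
    }


# Hypothetical toy data: g2 appears in every set and is therefore down-weighted.
gene_sets = {"P1": ["g1", "g2"], "P2": ["g2", "g3"], "P3": ["g2", "g4"]}
t_scores = {"g1": 2.0, "g2": -4.0, "g3": 1.0, "g4": 0.5}
weights = gene_weights(gene_sets)
scores = gene_set_scores(gene_sets, t_scores)
print(weights)
print(scores)
```

In the toy data, the ubiquitous gene `g2` receives the minimum weight, so its large t-score contributes less to each set than it would under equal weighting.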

Conclusions

PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org.

8.

Background

The retention of ancestral juvenile characters by adult stages of descendants is called paedomorphosis. However, this process can mislead phylogenetic analyses based on morphological data, even in combination with molecular data, because it is difficult to assess whether a character is primarily absent or secondarily lost. Thus, the detection of incongruence between morphological and molecular data is necessary to investigate the reliability of simultaneous analyses. Different methods have been proposed to detect data congruence or incongruence. Five of them (PABA, PBS, NDI, LILD, DRI) are used herein to assess incongruence between morphological and molecular data in a case study addressing salamander phylogeny, which comprises several supposedly paedomorphic taxa. Therefore, previously published data sets were compiled herein. Furthermore, two strategies ameliorating effects of paedomorphosis on phylogenetic studies were tested herein with statistical rigor. Additionally, efficiency of the different methods to assess incongruence was analyzed using this empirical data set. Finally, a test statistic is presented for all these methods except DRI.

Results

The addition of morphological data to molecular data results in both different positions of three of the four paedomorphic taxa and strong incongruence, but treating the morphological data using different strategies ameliorating the negative impact of paedomorphosis revokes these changes and minimizes the conflict. Of these strategies, the strategy of simply excluding paedomorphic character traits seems to be most beneficial. Of the three molecular partitions analyzed herein, the RAG1 partition seems to be the most suitable to resolve deep salamander phylogeny. The rRNA and mtDNA partitions are either too conserved or too variable, respectively. Of the different methods to detect incongruence, the NDI and PABA approaches are more conservative in the indication of incongruence than LILD and PBS.

Conclusion

Paedomorphosis induces strong conflicts and can mislead the phylogenetic analyses even in combined analyses. However, different strategies efficiently minimize these problems. Although exploring several methods to detect incongruence is preferable, NDI and PABA are more conservative than the others, and NDI is computationally less demanding than PABA.

9.
Survey of human mitochondrial diseases using new genomic/proteomic tools   Cited by: 1 (self-citations: 0, citations by others: 1)
Thomas N Plasterer  Temple F Smith  Scott C Mohr 《Genome biology》2001,2(6):research0021.1-research002116

Background

We have constructed Bayesian prior-based, amino-acid sequence profiles for the complete yeast mitochondrial proteome and used them to develop methods for identifying and characterizing the context of protein mutations that give rise to human mitochondrial diseases. (Bayesian priors are conditional probabilities that allow the estimation of the likelihood of an event - such as an amino-acid substitution - on the basis of prior occurrences of similar events.) Because these profiles can assemble sets of taxonomically very diverse homologs, they enable identification of the structurally and/or functionally most critical sites in the proteins on the basis of the degree of sequence conservation. These profiles can also find distant homologs with determined three-dimensional structures that aid in the interpretation of effects of missense mutations.

Results

This survey reports such an analysis for 15 missense mutations, one insertion and three deletions involved in Leber's hereditary optic neuropathy, Leigh syndrome, mitochondrial neurogastrointestinal encephalomyopathy, Mohr-Tranebjaerg syndrome, iron-storage disorders related to Friedreich's ataxia, and hereditary spastic paraplegia. We present structural correlations for seven of the mutations.

Conclusions

Of the 19 mutations analyzed, 14 involved changes in very highly conserved parts of the affected proteins. Five out of seven structural correlations provided reasonable explanations for the malfunctions. As additional genetic and structural data become available, this methodology can be extended. It has the potential for assisting in identifying new disease-related genes. Furthermore, profiles with structural homologs can generate mechanistic hypotheses concerning the underlying biochemical processes - and why they break down as a result of the mutations.

10.

Background

Co-evolution is the process in which two (or more) sets of orthologs exhibit a similar or correlative pattern of evolution. Co-evolution is a powerful way to learn about the functional interdependencies between sets of genes and cellular functions and to predict physical interactions. More generally, it can be used for answering fundamental questions about the evolution of biological systems. Orthologs that exhibit a strong signal of co-evolution in a certain part of the evolutionary tree may show a mild signal of co-evolution in other branches of the tree. The major reasons for this phenomenon are noise in the biological input, genes that gain or lose functions, and the fact that some measures of co-evolution relate to rare events such as positive selection. Previous publications in the field dealt with the problem of finding sets of genes that co-evolved along an entire underlying phylogenetic tree, without considering the fact that often co-evolution is local.

Results

In this work, we describe a new set of biological problems that are related to finding patterns of local co-evolution. We discuss their computational complexity and design algorithms for solving them. These algorithms outperform other bi-clustering methods as they are designed specifically for solving the set of problems mentioned above. We use our approach to trace the co-evolution of fungal, eukaryotic, and mammalian genes at high resolution across the different parts of the corresponding phylogenetic trees. Specifically, we discover regions in the fungi tree that are enriched with positive evolution. We show that metabolic genes exhibit a remarkable level of co-evolution and different patterns of co-evolution in various biological datasets. In addition, we find that protein complexes that are related to gene expression exhibit non-homogenous levels of co-evolution across different parts of the fungi evolutionary line. In the case of mammalian evolution, signaling pathways that are related to neurotransmission exhibit a relatively higher level of co-evolution along the primate subtree.

Conclusions

We show that finding local patterns of co-evolution is a computationally challenging task and we offer novel algorithms that allow us to solve this problem, thus opening a new approach for analyzing the evolution of biological systems.

11.
Improved method for predicting linear B-cell epitopes   Cited by: 2 (self-citations: 0, citations by others: 2)

Background

B-cell epitopes are the sites of molecules that are recognized by antibodies of the immune system. Knowledge of B-cell epitopes may be used in the design of vaccines and diagnostics tests. It is therefore of interest to develop improved methods for predicting B-cell epitopes. In this paper, we describe an improved method for predicting linear B-cell epitopes.

Results

In order to do this, three data sets of linear B-cell epitope annotated proteins were constructed. A data set was collected from the literature, another data set was extracted from the AntiJen database and a data set of epitopes in the proteins of HIV was collected from the Los Alamos HIV database. An unbiased validation of the methods was made by testing on data sets on which they were neither trained nor optimized. We have measured the performance in a non-parametric way by constructing ROC-curves.
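
A non-parametric performance measure of this kind can be sketched by computing the area under the ROC curve directly from ranked scores. The function below uses the rank-statistic formulation (the fraction of positive/negative pairs in which the positive item scores higher, counting ties as one half); the name `roc_auc` and the toy scores are hypothetical, and this is not the evaluation code used by the authors.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for epitope (positive), 0 for non-epitope (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue prediction scores and annotations.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(auc)
```

Because the measure depends only on the ordering of the scores, it needs no decision threshold and makes no assumption about the score distribution.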

Conclusion

The best single method for predicting linear B-cell epitopes is the hidden Markov model. Combining the hidden Markov model with one of the best propensity scale methods, we obtained the BepiPred method. When tested on the validation data set this method performs significantly better than any of the other methods tested. The server and data sets are publicly available at http://www.cbs.dtu.dk/services/BepiPred.

12.

Background and Aims

The Neotropical tribe Trimezieae are taxonomically difficult. They are generally characterized by the absence of the features used to delimit their sister group Tigridieae. Delimiting the four genera that make up Trimezieae is also problematic. Previous family-level phylogenetic analyses have not examined the monophyly of the tribe or relationships within it. Reconstructing the phylogeny of Trimezieae will allow us to evaluate the status of the tribe and genera and to examine the suitability of characters traditionally used in their taxonomy.

Methods

Maximum parsimony and Bayesian phylogenetic analyses are presented for 37 species representing all four genera of Trimezieae. Analyses were based on nrITS sequences and a combined plastid dataset. Ancestral character state reconstructions were used to investigate the evolution of ten morphological characters previously considered taxonomically useful.

Key Results

Analyses of nrITS and plastid datasets strongly support the monophyly of Trimezieae and recover four principal clades with varying levels of support; these clades do not correspond to the currently recognized genera. Relationships within the four clades are not consistently resolved, although the conflicting resolutions are not strongly supported in individual analyses. Ancestral character state reconstructions suggest considerable homoplasy, especially in the floral characters used to delimit Pseudotrimezia.

Conclusions

The results strongly support recognition of Trimezieae as a tribe but suggest that both generic- and species-level taxonomy need revision. Further molecular analyses, with increased sampling of taxa and markers, are needed to support any revision. Such analyses will help determine the causes of discordance between the plastid and nuclear data and provide a framework for identifying potential morphological synapomorphies for infra-tribal groups. The results also suggest Trimezieae provide a promising model for evolutionary research.

13.

Background

The orders Ascaridida, Oxyurida, and Spirurida represent major components of zooparasitic nematode diversity, including many species of veterinary and medical importance. Phylum-wide nematode phylogenetic hypotheses have mainly been based on nuclear rDNA sequences, but more recently complete mitochondrial (mtDNA) gene sequences have provided another source of molecular information to evaluate relationships. Although there is much agreement between nuclear rDNA and mtDNA phylogenies, relationships among certain major clades are different. In this study we report that mtDNA sequences do not support the monophyly of Ascaridida, Oxyurida and Spirurida (clade III) in contrast to results for nuclear rDNA. Results from mtDNA genomes show promise as an additional independently evolving genome for developing phylogenetic hypotheses for nematodes, although substantially increased taxon sampling is needed for enhanced comparative value with nuclear rDNA. Ultimately, topological incongruence (and congruence) between nuclear rDNA and mtDNA phylogenetic hypotheses will need to be tested relative to additional independent loci that provide appropriate levels of resolution.

Results

For this comparative phylogenetic study, we determined the complete mitochondrial genome sequences of three nematode species, Cucullanus robustus (13,972 bp) representing Ascaridida, Wellcomia siamensis (14,128 bp) representing Oxyurida, and Heliconema longissimum (13,610 bp) representing Spirurida. These new sequences were used along with 33 published nematode mitochondrial genomes to investigate phylogenetic relationships among chromadorean orders. Phylogenetic analyses of both nucleotide and amino acid sequence datasets support the hypothesis that Ascaridida is nested within Rhabditida. The position of Oxyurida within Chromadorea varies among analyses; in most analyses this order is sister to the Ascaridida plus Rhabditida clade, with representative Spirurida forming a distinct clade; however, in one case Oxyurida is sister to Spirurida. Ascaridida, Oxyurida, and Spirurida (the sampled clade III taxa) do not form a monophyletic group based on complete mitochondrial DNA sequences. Tree topology tests revealed that constraining clade III taxa to be monophyletic, given the mtDNA datasets analyzed, was a significantly worse result.

Conclusion

The phylogenetic hypotheses from comparative analysis of the complete mitochondrial genome data (analysis of nucleotide and amino acid datasets, and nucleotide data excluding 3rd positions) indicate that nematodes representing Ascaridida, Oxyurida and Spirurida do not share an exclusive most recent common ancestor, in contrast to published results based on nuclear ribosomal DNA. Overall, mtDNA genome data provide reliable support for nematode relationships that often corroborates findings based on nuclear rDNA. It is anticipated that additional taxonomic sampling will provide a wealth of information on mitochondrial genome evolution and sequence data for developing phylogenetic hypotheses for the phylum Nematoda.

14.

Background

The availability of hundreds of bacterial genomes allowed a comparative genomic study of the Type VI Secretion System (T6SS), recently discovered as being involved in pathogenesis. By combining comparative and phylogenetic approaches using more than 500 prokaryotic genomes, we characterized the global T6SS genetic structure in terms of conservation, evolution and genomic organization.

Results

This genome wide analysis allowed the identification of a set of 13 proteins constituting the T6SS protein core and a set of conserved accessory proteins. 176 T6SS loci (encompassing 92 different bacteria) were identified and their comparison revealed that T6SS-encoded genes have a specific conserved genetic organization. Phylogenetic reconstruction based on the core genes showed that lateral transfer of the T6SS is probably its major way of dissemination among pathogenic and non-pathogenic bacteria. Furthermore, the sequence analysis of the VgrG proteins, proposed to be exported in a T6SS-dependent way, confirmed that some C-terminal regions possess domains showing similarities with adhesins or proteins with enzymatic functions.

Conclusion

The core of T6SS is composed of 13 proteins, conserved in both pathogenic and non-pathogenic bacteria. Subclasses of T6SS differ in regulatory and accessory protein content suggesting that T6SS has evolved to adapt to various microenvironments and specialized functions. Based on these results, new functional hypotheses concerning the assembly and function of T6SS proteins are proposed.

15.

Background

Sexual size dimorphism (SSD) is widespread and variable among animals. Sexual selection, fecundity selection and ecological divergence between males and females are the major evolutionary forces of SSD. However, the influences of mating system and habitat types on SSD have received little attention. Here, using phylogenetic comparative methods, we first examine the hypotheses that mating system (intensity of sexual selection) and habitat types significantly affect variation in SSD in anurans (39 species and 18 genera).

Results

Our data set encompasses 39 species with female-biased SSD. We provide evidence that the effects of mating system and habitat types on SSD were non-significant across species, also when the analyses were phylogenetically corrected.

Conclusions

Contrary to the hypotheses, our findings suggest that mating system and habitat types do not play an important role in shaping macro-evolutionary patterns of SSD in anurans. Mating system and habitat types cannot explain the variation in SSD when correcting for phylogenetic effects.

16.
17.

Background

Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because they increase in computational complexity quickly with the number of sequences.

Results

Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours.
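
The alignment-free idea of representing each sequence by its k-mers and comparing the resulting vectors can be sketched as below. This is a simplified illustration, not Nephele's code: it uses plain k-mer counts rather than the complete composition vector, it assumes cosine distance as the distance measure, and it omits the affinity propagation clustering step. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence (a sparse vector)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one k-mer")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy sequences; no alignment is needed to compare them.
near = cosine_distance(kmer_vector("ACGTACGT", 3), kmer_vector("ACGTACGA", 3))
far = cosine_distance(kmer_vector("ACGTACGT", 3), kmer_vector("TTTTTTTT", 3))
print(near, far)
```

A pairwise distance matrix built this way could then be passed to a clustering or tree-building step, which is where the approach described in the abstract groups sequences into genotypes.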

Conclusions

We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.

18.

Background

Although it has proven to be an important foundation for investigations of carnivoran ecology, biology and evolution, the complete species-level supertree for Carnivora of Bininda-Emonds et al. is showing its age. Additional, largely molecular sequence data are now available for many species and the advancement of computer technology means that many of the limitations of the original analysis can now be avoided. We therefore sought to provide an updated estimate of the phylogenetic relationships within all extant Carnivora, again using supertree analysis to be able to analyze as much of the global phylogenetic database for the group as possible.

Results

In total, 188 source trees were combined, representing 114 trees from the literature together with 74 newly constructed gene trees derived from nearly 45,000 bp of sequence data from GenBank. The greater availability of sequence data means that the new supertree is almost completely resolved and also better reflects current phylogenetic opinion (for example, supporting a monophyletic Mephitidae, Eupleridae and Prionodontidae; placing Nandinia binotata as sister to the remaining Feliformia). Following an initial rapid radiation, diversification rate analyses indicate a downturn in the net speciation rate within the past three million years as well as a possible increase some 18.0 million years ago; numerous diversification rate shifts within the order were also identified.

Conclusions

Together, the two carnivore supertrees remain the only complete phylogenetic estimates for all extant species and the new supertree, like the old one, will form a key tool in helping us to further understand the biology of this charismatic group of carnivores.

19.

Background

Speciose clades usually harbor species with a broad spectrum of adaptive strategies and complex distribution patterns, and thus constitute ideal systems to disentangle biotic and abiotic causes underlying species diversification. The delimitation of such study systems to test evolutionary hypotheses is difficult because they often rely on artificial genus concepts as starting points. One of the most prominent examples is the bellflower genus Campanula with some 420 species, but up to 600 species when including all lineages to which Campanula is paraphyletic. We generated a large alignment of petD group II intron sequences to include more than 70% of described species as a reference. By comparison with partial data sets we could then assess the impact of selective taxon sampling strategies on phylogenetic reconstruction and subsequent evolutionary conclusions.

Methodology/Principal Findings

Phylogenetic analyses based on maximum parsimony (PAUP, PRAP), Bayesian inference (MrBayes), and maximum likelihood (RAxML) were first carried out on the large reference data set (D680). Parameters including tree topology, branch support, and age estimates, were then compared to those obtained from smaller data sets resulting from “classification-guided” (D088) and “phylogeny-guided sampling” (D101). Analyses of D088 failed to fully recover the phylogenetic diversity in Campanula, whereas D101 inferred significantly different branch support and age estimates.

Conclusions/Significance

A short genomic region with high phylogenetic utility allowed us to easily generate a comprehensive phylogenetic framework for the speciose Campanula clade. Our approach recovered 17 well-supported and circumscribed sub-lineages. Knowing these will be instrumental for developing more specific evolutionary hypotheses and guide future research, we highlight the predictive value of a mass taxon-sampling strategy as a first essential step towards illuminating the detailed evolutionary history of diverse clades.

20.

Background

Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult.

Results

STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module.
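
The precomputation step described above, all pairwise Pearson correlation coefficients between expression profiles, can be sketched with the standard library alone. The gene names, the profile values, and the function names `pearson` and `correlation_table` are hypothetical; the tool's actual pipeline and database schema are not shown in the abstract and are not reproduced here.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlation_table(
    profiles: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered gene pair, keyed by sorted names."""
    return {
        (g1, g2): pearson(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Hypothetical expression values across four arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
table = correlation_table(profiles)
print(table)
```

Storing one row per gene pair, as in this dictionary, is one way such precompiled results could be laid out for queries centered on a gene of interest.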

Conclusion

STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to compare two networks. The list of genes in a STARNET network may be useful in developing a list of candidate genes to use for the inference of causal networks. The tool is freely available at http://vanburenlab.medicine.tamhsc.edu/starnet2.html, and does not require user registration.
