Found 20 similar articles (search time: 0 ms)
1.
2.
3.
A novel gene selection algorithm based on the gene regulation probability is proposed. In this algorithm, a probabilistic model is established to estimate gene regulation probabilities using the maximum likelihood estimation method, and these probabilities are then used to select key genes related to class distinction. Application to the leukemia data set suggests that the defined gene regulation probability can identify the key genes for the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) class distinction, and that the results of our proposed algorithm are competitive with those of previous algorithms.
4.
Energy constraints on the evolution of gene expression (total citations: 8; self-citations: 0; citations by others: 8)
Wagner A 《Molecular biology and evolution》2005,22(6):1365-1374
5.
6.
The microarray technique has become a standard means of simultaneously examining the expression of all genes measured under different circumstances. As microarray data are typically characterized by high-dimensional features with a small number of samples, feature selection needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and accountable for the sample variation. In this article, we present a simple yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results.
7.
A single determinant dominates the rate of yeast protein evolution (total citations: 21; self-citations: 0; citations by others: 21)
A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
8.
Maximum likelihood methods for detecting adaptive evolution after gene duplication (total citations: 12; self-citations: 0; citations by others: 12)
The rapid accumulation of genomic sequences in public databases will finally allow large-scale studies of gene family evolution, including evaluation of the role of positive Darwinian selection following a duplication event. This will be possible because recent statistical methods of comparing synonymous and nonsynonymous substitution rates permit reliable detection of positive selection at individual amino acid sites and along evolutionary lineages. Here, we summarize maximum-likelihood-based methods and present a framework for their application to the analysis of gene families. Using these methods, we investigated the role of positive Darwinian selection in the ECP-EDN gene family of primates and the Troponin C gene family of vertebrates. We also comment on the limitations of these methods and discuss directions for further improvements.
9.
The ordinary-, penalized-, and bootstrap t-test, least squares and best linear unbiased prediction were compared for their false discovery rates (FDR), i.e. the fraction of falsely discovered genes, which was empirically estimated in a duplicate of the data set. The bootstrap t-test yielded up to 80% lower FDRs than the alternative statistics, and its FDR was always as good as or better than any of the alternatives. Generally, the predicted FDR from the bootstrapped P-values agreed well with their empirical estimates, except when the number of mRNA samples was smaller than 16. In a cancer data set, the bootstrap t-test discovered 200 differentially regulated genes at an FDR of 2.6%, and in a knock-out gene expression experiment 10 genes were discovered at an FDR of 3.2%. It is argued that, in the case of microarray data, control of the FDR takes sufficient account of the multiple testing, whilst being less stringent than Bonferroni-type multiple testing corrections. Extensions of the bootstrap simulations to more complicated test statistics are discussed.
10.
11.
The function of individual sites within a protein influences their rate of accepted point mutation. During the computation of phylogenetic likelihoods, rate heterogeneity can be modeled on a site-per-site basis with relative rates drawn from a discretized Gamma distribution. Site-rate estimates (e.g., the rate of highest posterior probability given the data at a site) can then be used as a measure of evolutionary constraints imposed by function. However, if the sequence availability is limited, the estimation of rates is subject to sampling error. This article presents a simulation study that evaluates the robustness of evolutionary site-rate estimates for both small and phylogenetically unbalanced samples. The sampling error on rate estimates was first evaluated for alignments that included 5-45 sequences, sampled by jackknifing, from a master alignment containing 968 sequences. We observed that the potentially enhanced resolution among site rates due to the inclusion of a larger number of rate categories is negated by the difficulty in correctly estimating intermediate rates. This effect is marked for data sets with fewer than 30 sequences. Although the computation of likelihood theoretically accounts for phylogenetic distances through branch lengths, the introduction of a single long-branch outlier sequence had a significant negative effect on site-rate estimates. Finally, the presence of a shift in rates of evolution between related lineages can be diagnostic of a gain/loss of function within a protein family. Our analyses indicate that detecting these rate shifts is a harder problem than estimating rates. This is partly because the difference in rates depends on two rate estimates, each with an intrinsic uncertainty. The performances of four methods to detect these site-rate shifts are evaluated and compared. Guidelines are suggested for preparing data sets minimally influenced by error introduced by sequence sampling.
12.
It is known that the conservation or diversity of homeobox genes is responsible for the similarity and variability of some of the morphological or physiological characters among different organisms. To gain some insight into the evolutionary pattern of homeobox genes in bilateral animals, we studied the change in the numbers of these genes during the evolution of bilateral animals. We analyzed 2,031 homeodomain sequences compiled from 11 species of bilateral animals ranging from Caenorhabditis elegans to humans. Our phylogenetic analysis using a modified reconciled-tree method suggested that there were at least about 88 homeobox genes in the common ancestor of bilateral animals. About 50-60 of these genes have left at least one descendant gene in each of the 11 species studied, suggesting that about 30-40 genes were lost in a lineage-specific manner. Although similar numbers of ancestral genes have survived in each species, vertebrate lineages gained many more genes by duplication than invertebrate lineages, resulting in more than 200 homeobox genes in vertebrates and about 100 in invertebrates. After these gene duplications, a substantial number of old duplicate genes have also been lost in each lineage. Because many old duplicate genes were lost, it is likely that lost genes had already been differentiated from other groups of genes at the time of gene loss. We conclude that both gain and loss of homeobox genes were important for the evolutionary change of phenotypic characters in bilateral animals.
13.
《Expert review of proteomics》2013,10(1):67-75
The rapid expansion of methods for measuring biological data ranging from DNA sequence variations to mRNA expression and protein abundance presents the opportunity to utilize multiple types of information jointly in the study of human health and disease. Organisms are complex systems that integrate inputs at myriad levels to arrive at an observable phenotype. Therefore, it is essential that questions concerning the etiology of phenotypes as complex as common human diseases take the systemic nature of biology into account, and integrate the information provided by each data type in a manner analogous to the operation of the body itself. While limited in scope, the initial forays into the joint analysis of multiple data types have yielded interesting results that would not have been reached had only one type of data been considered. These early successes, along with the aforementioned theoretical appeal of data integration, provide impetus for the development of methods for the parallel, high-throughput analysis of multiple data types. The idea that the integrated analysis of multiple data types will improve the identification of biomarkers of clinical endpoints, such as disease susceptibility, is presented as a working hypothesis.
14.
Wang H 《Genetica》2009,136(1):149-161
The Bmal1 (Brain and muscle ARNT-like 1) gene is a key circadian clock gene. Tetrapods also have a second Bmal gene, Bmal2. The fruit fly has only one bmal1/cycle gene. Interrogation of the five teleost fish genome sequences coupled with phylogenetic and splice site analyses found that zebrafish have two bmal1 genes, bmal1a and bmal1b, and bmal2a; Japanese pufferfish (fugu), green spotted pufferfish (tetraodon) and Japanese medaka fish each have two bmal2 genes, bmal2a and bmal2b, and bmal1a; and three-spine stickleback have bmal1a and bmal2b. Syntenic analysis further indicated that zebrafish bmal1a/bmal1b, and fugu, tetraodon and medaka bmal2a/bmal2b are ancient duplicates. Although the dN/dS ratios of these four fish bmal duplicates are all <1, implying that they have been under purifying selection, the Tajima relative rate test showed that fugu, tetraodon and medaka bmal2a/bmal2b have asymmetric evolutionary rates, suggesting that one of these duplicates has been subject to positive selection or relaxed functional constraint. These results support the notion that teleost fish bmal genes were derived from the fish-specific genome duplication (FSGD) and that divergent resolution following the duplication led to the retention of different ancient bmal duplicates in different fishes, which could have shaped the evolution of the complex teleost fish timekeeping mechanisms.
15.
The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses (total citations: 17; self-citations: 0; citations by others: 17)
Kilian J Whitehead D Horak J Wanke D Weinl S Batistic O D'Angelo C Bornberg-Bauer E Kudla J Harter K 《The Plant journal : for cell and molecular biology》2007,50(2):347-363
16.
Our present work focuses on the set of genes involved in primary brain tumors: the glioma pathway. Gliomas are mostly malignant (cancerous) and difficult to cure, which is why they attract considerable research attention. To understand the relative functionality of these genes, we analyzed the expression patterns of all of them at the genomic level using gene expression data. To check whether these patterns are universal, we then compared their expression levels and patterns across all other cancer types using gene expression graphs, observing whether each gene was over- or under-expressed. We found that every gene has its own unique expression pattern and level, on the basis of which it can be classified. We also found that oncogenes and tumor suppressor genes involved in the glioma pathway showed similar expression patterns in other cancers, but their expression levels were low.
17.
18.
Barkman TJ 《Molecular biology and evolution》2003,20(2):168-172
Isoeugenol-O-methyltransferase (IEMT) is an enzyme involved in the production of the floral volatile compounds methyl eugenol and methyl isoeugenol in Clarkia breweri (Onagraceae). IEMT likely evolved by gene duplication from caffeic acid-O-methyltransferase followed by amino acid divergence, leading to the acquisition of its novel function. To investigate the selective context under which IEMT evolved, maximum likelihood methods that estimate variable d(N)/d(S) ratios among lineages, among sites, and among a combination of both lineages and sites were utilized. Statistically significant support was obtained for a hypothesis of positive selection driving the evolution of IEMT since its origin. Subsequent Bayesian analyses identified several sites in IEMT that have experienced positive selection. Most of these positions are in the active site of IEMT and have been shown by site-directed mutagenesis to have large effects on substrate specificity. Although the selective agent is unknown, the adaptive evolution of this gene may have resulted in increased effectiveness of pollinator attraction or herbivore repellence.
19.
Theories focused on kinship and the genetic conflict it induces are widely considered to be the primary explanations for the evolution of genomic imprinting. However, many competing ideas that do not involve kinship/conflict have appeared. These ideas are often overlooked because kinship/conflict is entrenched in the literature, especially outside evolutionary biology. Here we provide a critical overview of these non-conflict theories, offering an accessible perspective on this literature. We suggest that some of these alternative hypotheses may, in fact, provide tenable explanations of the evolution of imprinting for at least some loci.
20.
Kaburagi H Sugano N Oshikawa M Koshi R Senda N Kawamoto K Ito K 《Acta biochimica et biophysica Sinica》2007,39(6):399-405
To analyze the molecular events that occur in the developing mandible, we examined the expression of 8803 genes from samples taken at different time points during rat postnatal mandible development. Total RNA was extracted from the mandibles of 1-day-old, 1-week-old, and 2-week-old rats. Complementary RNA (cRNA) was synthesized from cDNA and biotinylated. Fragmented cRNA was hybridized to RGU34A GeneChip arrays. Among the 8803 genes tested, 4344 were detectable. We identified 148 genes with significantly increased expression, and 19 genes with significantly decreased expression. A comprehensive analysis appears to be an effective method of studying the complex process of development.