Similar articles
Found 20 similar articles (search time: 31 ms)
1.
DNA microarray technology provides useful tools for profiling global gene expression patterns in different cell/tissue samples. One major challenge is the large number of genes relative to the number of samples. Using all genes can degrade the performance of a classification rule because of the noise contributed by nondiscriminatory genes. Selecting an optimal subset from the original gene set therefore becomes an important prestep in sample classification. In this study, we propose a family-wise error (FWE) rate approach to the selection of discriminatory genes for two-sample or multiple-sample classification. The FWE approach controls the probability of one or more false positives at a prespecified level. A public colon cancer data set is used to evaluate the performance of the proposed approach for two classification methods: k nearest neighbors (k-NN) and support vector machine (SVM). The gene sets selected by the proposed procedure appear to perform better than, or comparably to, several results reported in the literature, using univariate analysis without a multivariate search. In addition, we apply the FWE approach to a toxicogenomic data set with nine treatments (a control and eight metals, As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV) and a total of 55 samples for multisample classification. Two gene sets are considered: the gene set omegaF formed by the ANOVA F-test, and the gene set omegaT formed by the union of one-versus-all t-tests. The predicted accuracies are evaluated using internal and external crossvalidation. Using SVM classification, the overall accuracies for assigning the 55 samples to one of the nine treatments are above 80% for internal crossvalidation, with omegaF giving slightly higher accuracy rates than omegaT. The overall predicted accuracies are above 70% for external crossvalidation, where the two gene sets omegaT and omegaF performed equally well.
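
The abstract above does not say which FWE-controlling procedure is used, so the following is only a minimal sketch under the assumption that per-gene p-values are already available and that a Bonferroni or Holm step-down correction is acceptable. The function names `bonferroni_select` and `holm_select` are hypothetical and are not taken from the paper.

```python
def bonferroni_select(pvalues, alpha=0.05):
    """Return indices of genes whose p-value passes the Bonferroni threshold.

    Assumption: one p-value per gene; FWE is controlled at level alpha.
    """
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def holm_select(pvalues, alpha=0.05):
    """Holm step-down selection (also controls FWE, never selects fewer genes
    than Bonferroni at the same alpha)."""
    m = len(pvalues)
    # Visit genes from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    selected = []
    for rank, i in enumerate(order):
        # The threshold relaxes as hypotheses are rejected.
        if pvalues[i] <= alpha / (m - rank):
            selected.append(i)
        else:
            break  # stop at the first p-value that fails
    return sorted(selected)


if __name__ == "__main__":
    # Illustrative p-values only, not data from the study.
    p = [0.001, 0.012, 0.03, 0.2, 0.0005]
    print(bonferroni_select(p))
    print(holm_select(p))
```

The selected indices would then define the reduced gene set passed to a classifier such as k-NN or SVM.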

2.
3.
Accurate quantification by real-time RT-PCR relies on normalisation of the measured gene expression data. Normalisation with multiple reference genes is becoming the standard, but the best reference genes for gene expression studies within one organism may depend on the applied treatments or the organs and tissues studied. Ideally, reference genes should be evaluated in all experimental systems. A number of candidate reference genes for Arabidopsis have been proposed, which can be used as a starting point to evaluate their expression stability in individual experimental systems with available computer algorithms such as geNorm and NormFinder. Using this approach, we identified the best three reference genes from a set of ten candidates, which included three traditional “housekeeping” genes, for normalisation of gene expression when roots and leaves of Arabidopsis thaliana are exposed to cadmium (Cd) and copper (Cu). The expression stabilities of AT5G15710 (F-box protein), AT2G28390 (SAND family protein) and AT5G08290 (mitosis protein YLS8) were the highest when considering the effects of Cd and Cu treatments on roots and shoots. Even though the effects of Cd and excess Cu on the plants are very different, the same best reference genes were identified when considering Cd or Cu treatments separately. This suggests that these three genes may also be suitable when studying gene expression after exposure of Arabidopsis thaliana to increased concentrations of other metals. Electronic supplementary material The online version of this article (doi:) contains supplementary material, which is available to authorized users.
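
To make the notion of expression stability concrete, here is a rough sketch of a geNorm-style stability measure M, assuming the usual description of that measure: for each candidate gene, average the standard deviations of its log2 expression ratios against every other candidate across samples. This is our reading of the measure, not the geNorm tool's code, and the function name `stability_m` and the input layout are hypothetical.

```python
import math
import statistics


def stability_m(expr):
    """Compute a geNorm-style stability value M for each candidate gene.

    expr maps gene name -> list of relative expression quantities, one per
    sample, with samples in the same order for every gene.
    A lower M suggests more stable expression relative to the other candidates.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            # Pairwise variation: spread of that ratio across samples.
            pairwise.append(statistics.stdev(log_ratios))
        m_values[j] = sum(pairwise) / len(pairwise)
    return m_values


if __name__ == "__main__":
    # Made-up quantities for three hypothetical candidates.
    demo = {"refA": [1, 2, 4], "refB": [2, 4, 8], "refC": [1, 4, 1]}
    for gene, m in sorted(stability_m(demo).items(), key=lambda kv: kv[1]):
        print(gene, round(m, 3))
```

In this toy input, `refA` and `refB` keep a constant ratio across samples and therefore receive the lowest M, while `refC` varies independently and ranks last.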

4.
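Li Li  Li Xia  Chen Yihan  Guo Zheng  Jiang Wei  Zhang Ruijie  Rao Shaoqi 《遗传》 (Hereditas (Beijing)) 2006,28(9):1129-1134
Microarray technology provides a powerful tool for studying disease heterogeneity. Current methods based on conventional cluster analysis generally use the large number of genes on the array as features to discover disease subtypes, and therefore do not account for the fact that the many irrelevant genes among those features can mask meaningful partitions of the disease samples. To avoid this drawback, we propose Heterogeneous Analysis Based on Coupled Two-Way Clustering (HCTWC), which searches for meaningful gene clusters in order to reveal the intrinsic partition of the samples. The method was applied to a diffuse large B-cell lymphoma (DLBCL) microarray data set; clustering the DLBCL samples with the identified gene clusters as features revealed two DLBCL subtypes with survival rates of 55% and 25%, respectively (P<0.05). The HCTWC method is therefore effective in addressing disease heterogeneity.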

5.
To identify and evaluate potentially useful biomarkers of oxidative stress as early warning indices in the polychaete Perinereis nuntia, we exposed P. nuntia to copper (Cu) and measured several biomarker enzymes (glutathione S-transferase, GST; glutathione peroxidase, GPx; metallothionein-like proteins, MTLPs; and catalase, CAT) and genes (Pn-GSTs, Pn-CAT, and Pn-MT), together with a cellular oxidative index, the reactive oxygen species (ROS) level. Accumulated Cu concentrations in P. nuntia increased in a time-dependent manner. Intracellular ROS reached high levels 6 h after exposure, accompanied by an increase in GST activity and glutathione (GSH) content. In particular, GSH showed a positive correlation with the Cu content accumulated in P. nuntia. Messenger RNA expression of GST sigma and GST omega was relatively high at 50 μg/L of Cu exposure, although a moderate increase in the remaining GST isoforms was also observed. For long-term exposure, we reared P. nuntia in sediments for 15 days and found an obvious increase in expression of the Pn-GSTs, Pn-CAT, and Pn-MT genes, with elevated concentrations of Cu and Cd in the polychaete body compared to initial levels, suggesting that P. nuntia in sediment was affected by metals as well as by other organic pollutants, inducing oxidative stress genes and enzymes. These findings suggest that oxidative stress is a potential modulator of the defense system of P. nuntia. Several potential biomarker genes are available as early warning signals for environmental biomonitoring.

6.
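Gene expression profiles can be used to study the relationship between gene functional modules and disease heterogeneity. Using two leukemia gene expression data sets, Gene Ontology functional modules enriched in highly variable genes were taken as feature functional modules, and the disease samples were clustered into two classes. By comparison with the original multiclass labels, clustering evaluation indices were used to assess the two-class clustering results and to explore the relationship between the feature functional modules and disease heterogeneity. The results show that the feature functional modules obtained from the two different leukemia expression data sets are similar and have strong power to discriminate leukemia subtypes.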

7.
Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of uncertainty in a set of information. In this study, for each gene and its set of exons, the entropy was calculated for orders one to four. Based on the relative entropy of genes and exons, the Kullback-Leibler divergence was calculated. The Kullback-Leibler distances obtained for the gene and exon sets were then used as input to 7 clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. To aggregate the clustering results, the AdaBoost algorithm was used. Finally, the results of the AdaBoost algorithm were examined with the GeneMANIA prediction server to explore them from a gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). Examination of the metabolic pathways of the clustered genes, based on their annotations, indicated that the proposed clustering method yielded correct, logical, and fast results. At the same time, the method avoids the disadvantages of alignment, allows genes to be considered with their actual length and content, and does not require large amounts of memory for long sequences. We believe that the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
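
The entropy and Kullback-Leibler steps can be sketched as follows. The abstract does not define "order" precisely, so this sketch assumes that order k means the distribution of overlapping k-mers in the sequence. The smoothing constant `eps` and all function names are illustrative choices, and the original work used MATLAB rather than Python.

```python
import math
from collections import Counter


def kmer_distribution(seq, k):
    """Relative frequencies of overlapping k-mers (assumed meaning of 'order k')."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q) in bits.

    A small pseudo-probability eps is added to every k-mer and both
    distributions are renormalised, so k-mers missing from q do not
    cause a division by zero. This smoothing is an assumption.
    """
    keys = sorted(set(p) | set(q))
    norm = 1.0 + eps * len(keys)
    total = 0.0
    for key in keys:
        pi = (p.get(key, 0.0) + eps) / norm
        qi = (q.get(key, 0.0) + eps) / norm
        total += pi * math.log2(pi / qi)
    return total


if __name__ == "__main__":
    gene = "ATGGCGTACGTTAGC"   # toy sequences, not real genes
    exon = "ATGGCGTAC"
    for k in range(1, 5):
        p, q = kmer_distribution(gene, k), kmer_distribution(exon, k)
        print(k, round(shannon_entropy(p), 3), round(kl_divergence(p, q), 3))
```

The resulting divergences could then be arranged into a distance matrix and passed to hierarchical or K-means clustering, as the abstract describes.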

8.
This study aimed at investigating both the individual and combined effects of cadmium (Cd) and arsenate (AsV) on the physiology and behaviour of the crustacean Gammarus pulex at three temperatures (5, 10 and 15 °C). G. pulex was exposed for 96 h to (i) two [Cd] alone, (ii) two [AsV] alone, and (iii) four combinations of [Cd] and [AsV] to obtain a complete factorial design. After exposure, survival, [AsV] or [Cd] in body tissues, behavioural responses (ventilatory and locomotor activities) and physiological responses (iono-regulation of [Na+] and [Cl-] in haemolymph) were examined. The interactive effects (antagonistic, additive or synergistic) of binary mixtures were evaluated for each tested temperature using a predictive model for the theoretically expected interactive effect of chemicals. In single metal exposure, both the internal metal concentration in body tissues and the mortality rate increased along the metal concentration gradient. Cd alone significantly impaired both [Na+] and [Cl-], while AsV alone had a weak impact only on [Cl-]. The behavioural responses of G. pulex declined with increasing metal concentration, suggesting a reallocation of energy from behavioural responses to maintenance functions. The interaction between AsV and Cd was considered "additive" for all the tested binary mixtures and temperatures (except for the lowest combination at 10 °C, considered "antagonistic"). In binary mixtures, the decrease in both ventilatory and locomotor activities and the decline in haemolymphatic [Cl-] were amplified compared to those observed with the same concentrations of AsV or Cd alone. However, the presence of AsV decreased the haemolymphatic [Na+] loss when G. pulex was exposed to the lowest Cd concentration. Finally, the observed physiological and behavioural effects (except ventilation) in G. pulex exposed to AsV and/or Cd were exacerbated at the highest temperature. The discussion encompasses both the toxicity mechanisms of these metals and their interaction with rising temperature.

9.
Use of whole genome sequence data to infer baculovirus phylogeny   Cited by: 18 (self-citations: 0, citations by others: 18)
Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance analysis and a novel method developed here, termed neighbor pair analysis. The third set recorded gene content by scoring gene presence or absence in each genome. All three data sets yielded phylogenies supporting the separation of the Nucleopolyhedrovirus (NPV) and Granulovirus (GV) genera, the division of the NPVs into groups I and II, and species relationships within group I NPVs. Generation of phylogenies based on the combined sequences of all 63 shared genes proved to be the most effective approach to resolving the relationships among the group II NPVs and the GVs. The history of gene acquisitions and losses that have accompanied baculovirus diversification was visualized by mapping the gene content data onto the phylogenetic tree. This analysis highlighted the fluid nature of baculovirus genomes, with evidence of frequent genome rearrangements and multiple gene content changes during their evolution. Of more than 416 genes identified in the genomes analyzed, only 63 are present in all nine genomes, and 200 genes are found only in a single genome. Despite this fluidity, the whole genome-based methods we describe are sufficiently powerful to recover the underlying phylogeny of the viruses.

10.
11.
Gene expression profiling has been widely used to study molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. Gene selection, as an important step for improved diagnostics, screens tens of thousands of genes and identifies a small subset that discriminates between disease types. A two-step gene selection method is proposed to identify informative gene subsets for accurate classification of multiclass phenotypes. In the first step, individually discriminatory genes (IDGs) are identified by using the one-dimensional weighted Fisher criterion (wFC). In the second step, jointly discriminatory genes (JDGs) are selected by sequential search methods, based on their joint class separability measured by the multidimensional weighted Fisher criterion (wFC). The performance of the selected gene subsets for multiclass prediction is evaluated by artificial neural networks (ANNs) and/or support vector machines (SVMs). By applying the proposed IDG/JDG approach to two microarray studies, namely small round blue cell tumors (SRBCTs) and muscular dystrophies (MDs), we successfully identified a much smaller yet efficient set of JDGs for diagnosing SRBCTs and MDs with high prediction accuracies (96.9% for SRBCTs and 92.3% for MDs, respectively). These experimental results demonstrated that the two-step gene selection method is able to identify a subset of highly discriminative genes for improved multiclass prediction.
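
As an illustration of the first (IDG) step only, the sketch below ranks genes by a plain one-dimensional multiclass Fisher criterion, that is, between-class scatter divided by within-class scatter. The class-pair weighting that defines the paper's wFC is not given in the abstract and is deliberately omitted here, so this is an unweighted stand-in rather than the authors' criterion. The names `fisher_score` and `rank_genes` are hypothetical.

```python
from statistics import fmean


def fisher_score(values, labels):
    """Unweighted one-dimensional Fisher criterion for a single gene.

    values: expression of one gene across samples.
    labels: class label of each sample, in the same order.
    """
    overall = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        mu = fmean(group)
        between += len(group) * (mu - overall) ** 2   # class mean vs. grand mean
        within += sum((v - mu) ** 2 for v in group)   # spread inside the class
    return between / within if within > 0 else float("inf")


def rank_genes(expr, labels, top=10):
    """Return the names of the `top` genes with the highest Fisher score."""
    scores = {gene: fisher_score(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]                      # toy class labels
    expr = {"g1": [1.0, 1.2, 5.0, 5.2],                # separates the classes
            "g2": [1.0, 5.0, 1.2, 5.2]}                # does not
    print(rank_genes(expr, labels, top=1))
```

A second step in the spirit of the JDG search would start from the top-ranked genes and add or remove genes sequentially according to a multidimensional separability measure.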

12.
The P1B-type heavy metal ATPases (HMAs) are diverse in terms of tissue distribution, subcellular localization, and metal specificity. Functional studies of HMAs have shown that these transporters can be divided into two subgroups based on their metal-substrate specificity: a copper (Cu)/silver (Ag) group and a zinc (Zn)/cobalt (Co)/cadmium (Cd)/lead (Pb) group. Studies on Arabidopsis thaliana and metal hyperaccumulator plants indicate that HMAs play an important role in the translocation or detoxification of Zn and Cd in plants. Rice possesses nine HMA genes, of which OsHMA1–OsHMA3 belong to the Zn/Co/Cd/Pb subgroup. OsHMA2 plays an important role in root-to-shoot translocation of Zn and Cd, and participates in Zn and Cd transport to developing seeds in rice. OsHMA3 transports Cd and plays a role in the sequestration of Cd into vacuoles in root cells. Modification of the expression of these genes might be an effective approach for reducing the Cd concentration in rice grains.

13.
14.
Analysis of microbial community structure by multivariate ordination methods, using data obtained by high-throughput sequencing of amplified markers (i.e., DNA metabarcoding), often requires clustering of DNA sequences into operational taxonomic units (OTUs). Parameters for the clustering procedure tend not to be justified but are set by tradition rather than being based on explicit knowledge. In this study, we explore the extent to which ordination results are affected by variation in parameter settings for the clustering procedure. Amplicon sequence data from nine microbial community studies, representing different sampling designs, spatial scales and ecosystems, were subjected to clustering into OTUs at seven different similarity thresholds (clustering thresholds) ranging from 87% to 99% sequence similarity. The 63 data sets thus obtained were subjected to parallel DCA and GNMDS ordinations. The resulting community structures were highly similar across all clustering thresholds. We explain this pattern by the existence of strong ecological structuring gradients and phylogenetically diverse sets of abundant OTUs that are highly stable across clustering thresholds. Removing low-abundance, rare OTUs had negligible effects on community patterns. Our results indicate that microbial data sets with a clear gradient structure are highly robust to choice of sequence clustering threshold.

15.
Clustering of microarray gene expression data is performed routinely, for genes as well as for samples. Clustering of genes can reveal functional relationships between genes; clustering of samples, on the other hand, is important for finding, e.g., disease subtypes, relevant patient groups for stratification, or related treatments. Usually this is done by first filtering the genes for high variance, under the assumption that they carry most of the information needed for separating different sample groups. If this assumption is violated, important groupings in the data might be lost. Furthermore, classical clustering methods do not facilitate the biological interpretation of the results. Therefore, we propose to methodologically integrate the clustering algorithm with prior biological information. This differs from other approaches in that knowledge about classes of genes can be used directly to ease the interpretation of the results and possibly boost clustering performance. Our approach computes dendrograms that resemble decision trees, with gene classes used to split the data at each node, which can help to find biologically meaningful differences between the sample groups. We have tested the proposed method on both simulated and real data and conclude that it is useful as a complementary method, especially when the assumptions of few differentially expressed genes and an informative mapping of genes to different classes are met.

16.
In an attempt to develop a method to discriminate among isolates of Listeria monocytogenes, the sequences of all of the annotated genes from the fully sequenced strain L. monocytogenes EGD-e (serotype 1/2a) were compared by BLASTn to a file of the unfinished genomic sequence of L. monocytogenes ATCC 19115 (serotype 4b). Approximately 7% of the matching genes demonstrated 90% or lower identity between the two strains, and the lowest observed identity was 80%. Nine genes (hisJ, cbiE, truB, ribC, comEA, purM, aroE, hisC, and addB) in the 80 to 90% identity group and two genes (gyrB and rnhB) with approximately 97% identity were selected for multilocus sequence analysis in two sets of L. monocytogenes isolates (a 15-strain diversity set and a set of 19 isolates from a single food-processing plant). Based on concatenated sequences, a total of 33 allotypes were differentiated among the 34 isolates tested. Population genetics analyses revealed three lineages of L. monocytogenes that differed in their history of apparent recombination. Lineage I appeared to be completely clonal, whereas representatives of the other lineages demonstrated evidence of horizontal gene transfer and recombination. Although most of the gene sequences for lineage II strains were distinct from those of lineage I, a few strains with the majority of genes characteristic of lineage II had some that were characteristic of lineage I. Genes from lineage III organisms were mostly similar to lineage I genes, with instances of genes appearing to be mosaics with lineage II genes. Even though lineage I and lineage II generally demonstrated very distinct sequences, the sequences for the 11 selected genes demonstrated little discriminatory power within each lineage. In the L. monocytogenes isolate set obtained from one food-processing plant, lineage I and lineage II were found to be almost equally prevalent. While it appears that different lineages of L. monocytogenes can share habitats, they appear to differ in their histories of horizontal gene transfer.

17.
Morphological data support monotremes as the sister group of Theria (extant marsupials + eutherians), but phylogenetic analyses of 12 mitochondrial protein-coding genes have strongly supported the grouping of monotremes with marsupials: the Marsupionta hypothesis. Various nuclear genes tend to support Theria, but a comprehensive study of long concatenated sequences and broad taxon sampling is lacking. We therefore determined sequences from six nuclear genes and obtained additional sequences from the databases to create two large and independent nuclear data sets. One (data set I) emphasized taxon sampling and comprised five genes, with a concatenated length of 2,793 bp, from 21 species (two monotremes, six marsupials, nine placentals, and four outgroups). The other (data set II) emphasized gene sampling and comprised eight genes and three proteins, with a concatenated length of 10,773 bp or 3,669 amino acids, from five taxa (a monotreme, a marsupial, a rodent, human, and chicken). Both data sets were analyzed by parsimony, minimum evolution, maximum likelihood, and Bayesian methods using various models and data partitions. Data set I gave bootstrap support values for Theria between 55% and 100%, while support for Marsupionta was at most 12.3%. Taking base compositional bias into account generally increased the support for Theria. Data set II exclusively supported Theria, with the highest possible values, and significantly rejected Marsupionta. Independent phylogenetic evidence in support of Theria was obtained from two single amino acid deletions and one insertion, while no supporting insertions and deletions were found for Marsupionta. On the basis of our data sets, the time of divergence between Monotremata and Theria was estimated at 231-217 MYA and between Marsupialia and Eutheria at 193-186 MYA. The morphological evidence for a basal position of Monotremata, well separated from Theria, is thus fully supported by the available molecular data from nuclear genes.

18.
Yang Z  Nielsen R  Goldman N  Pedersen AM 《Genetics》2000,155(1):431-449
Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (omega = dN/dS) is an important indicator of selective pressure at the protein level, with omega = 1 meaning neutral mutations, omega < 1 purifying selection, and omega > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying omega ratios. We develop models that account for heterogeneous omega ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of omega among sites. In all data sets analyzed, the selective pressure indicated by the omega ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average omega ratio across sites is <1, but in which some sites are clearly under diversifying selection with omega > 1. Genes undergoing positive selection include the beta-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for omega and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of omega among sites from real data sets.
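
The point that a gene-wide average omega below 1 can hide positively selected sites is easy to see with a toy discrete mixture of site classes. The sketch below is an illustration only: the class proportions and omega values are invented, the function names are hypothetical, and nothing here reproduces the likelihood models fitted in the paper.

```python
def selection_regime(omega, tol=1e-9):
    """Interpret an omega (dN/dS) value using the thresholds in the abstract."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    if omega < 1.0:
        return "purifying selection"
    return "diversifying (positive) selection"


def mean_omega(site_classes):
    """Average omega over site classes given as (proportion, omega) pairs."""
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


if __name__ == "__main__":
    # Invented mixture: most sites conserved, a few under positive selection.
    classes = [(0.85, 0.1), (0.10, 1.0), (0.05, 3.0)]
    print("gene-wide mean omega:", round(mean_omega(classes), 3))
    for proportion, omega in classes:
        print(f"{proportion:.0%} of sites, omega={omega}: {selection_regime(omega)}")
```

Here the mean omega is well below 1 even though 5% of sites have omega = 3, which is why site-heterogeneous models can detect selection that a single averaged ratio would miss.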

19.
mdclust--exploratory microarray analysis by multidimensional clustering   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Unsupervised clustering of microarray data may detect potentially important but not obvious characteristics of samples, for instance subgroups of diagnoses with distinct gene profiles or systematic errors in experimentation. RESULTS: Multidimensional clustering (mdclust) is a method that identifies sets of sample clusters and associated genes. It iteratively applies two-means clustering and score-based gene selection. For any phenotype variable, the best-matching sets of clusters can be selected. This provides a method to identify gene-phenotype associations that is suited even to settings with a large number of phenotype variables. An optional model-based discriminant step may further reduce the number of selected genes.

20.
Sugarcane (Saccharum spp.) offers the potential to be a phytoremediator species due to its outstanding biomass production, but its prospective metal accumulation and tolerance have not been fully characterized. Sugarcane plantlets were able to tolerate up to 100 μM of copper in nutrient solution for 33 days, with no significant reduction in fresh weight, while accumulating 45 mg Cu kg⁻¹ shoot dry weight. Higher levels of copper in solution (250 and 500 μM) were lethal. Sugarcane displayed tolerance to 500 μM Cd without symptoms of toxicity, accumulating 451 mg Cd kg⁻¹ shoot dry weight after 33 days, indicating its potential as a Cd phytoremediator. DNA gel blot analyses detected 8 fragments using a metallothionein (MT) Type I probe, while 10 were revealed for MT Type II and 8 for MT Type III. The number of genes for each type of MT in sugarcane might be similar to the ones identified in rice, considering the interspecific origin of sugarcane cultivars. The MT Type I gene appeared to present the highest level of constitutive expression, mainly in roots, followed by MT Type II, corroborating the expression pattern described based on large-scale expressed sequence tag sequencing. MT Type II and III genes were more highly expressed in shoots, where MT I was also substantially expressed. Increasing Cu concentration had little or no effect in modulating MT gene expression, while an apparent minor modulation of some of the MT genes could be detected in Cd treatments. However, the level of response was too small to explain the tolerance and/or accumulation of Cd in sugarcane tissues. Thus, cadmium tolerance and accumulation in sugarcane might derive from other mechanisms, although MT may be involved in oxidative responses to high levels of Cd. Sugarcane can be considered a potential candidate to be tested in Cd phytoremediation.
