Similar literature
 Found 20 similar documents; search took 15 ms
1.
An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes rather than protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value.
Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
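
The estimation idea in this abstract can be illustrated with a rough sketch. It does not reproduce the authors' plot-based estimator: it swaps in a simpler threshold rule (count the p-values above an assumed tuning value `lam` and scale by `1 - lam`) to estimate the number of true nulls, then derives approximate false discovery and false nondiscovery rates at a cutoff. The function name, the `lam` parameter, and the returned keys are all hypothetical.

```python
def estimate_error_rates(pvalues, cutoff, lam=0.5):
    """Rough estimates of the number of true nulls and of the error rates at a cutoff.

    Assumption: null p-values are uniform on [0, 1], so p-values above `lam`
    come almost entirely from true null hypotheses.
    """
    m = len(pvalues)
    # Estimated number of true null hypotheses (capped at the number of tests).
    m0 = min(m, sum(p > lam for p in pvalues) / (1.0 - lam))
    # Genes declared "affected" at this cutoff.
    selected = sum(p <= cutoff for p in pvalues)
    # Expected number of nulls falling below the cutoff by chance.
    expected_fp = m0 * cutoff
    fdr = min(1.0, expected_fp / selected) if selected else 0.0
    # Affected genes that the cutoff is expected to miss.
    expected_tp = max(selected - expected_fp, 0.0)
    expected_fn = max((m - m0) - expected_tp, 0.0)
    unselected = m - selected
    fndr = expected_fn / unselected if unselected else 0.0
    return {"m0": m0, "fdr": fdr, "fndr": fndr}
```

Calling the function over a grid of cutoffs gives the kind of rate-versus-cutoff summary the abstract describes; choosing among cutoffs by relative misclassification cost is left out of this sketch.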

2.
Nonrandom patterns associated with adaptively evolving genes can shed light on how selection and mutation produce rapid changes in sequences. I examine such patterns in two independent families of antimicrobial peptide genes: those in frogs, which are known to have evolved under positive selection, and those in flatfishes, which I show have also evolved under positive selection. I address two recently proposed hypotheses about the molecular evolution of antimicrobial peptide genes. The first is that the mature peptide region is replicated by an error-prone polymerase that increases the mutation rate and the transversion/transition ratio compared to the signal sequence of the same genes. The second is that mature peptides evolve in a coordinated fashion with their propieces, such that a change in net charge in one molecular region prompts an opposite change in charge in the other region. I test these hypotheses using alternative methods that minimize alignment errors, correct for phylogenetic nonindependence, reduce sequence saturation, and account for differing selection pressures on different regions of the gene. In both gene families I show that divergence at both synonymous and nonsynonymous sites within the mature peptide region is enhanced. However, in neither gene family is there evidence of an increased mutational transversion/transition ratio or coordinated evolution. My observations are consistent with either an elevated mutation rate in an adaptively evolving gene region or widespread selection on “silent” sites. These hypotheses challenge the assumption that mutations are random and can be measured by the synonymous substitution rate. [Reviewing Editor: Dr. Willie J. Swanson]

3.
Père David's deer (Elaphurus davidianus) is a highly inbred species that arose from 11 founders but now comprises a population of about 3,000 individuals, making it interesting to investigate the adaptive variation of this species from the major histocompatibility complex (MHC) perspective. In this study, we isolated Elda-MHC class I loci using magnetic bead-based cDNA hybridization, and examined the molecular variations of these loci using single-strand conformation polymorphism (SSCP) and sequence analysis. We obtained seven MHC class I genes, which we designated F1, F12, G2, I7, AF, I8, and C1. Our analyses of stop codons, phylogenetic trees, amino acid conservation, and G+C content revealed that F1, F12, G2, and I7 were classical genes, AF was a nonclassical gene, and I8 and C1 were pseudogenes. Our subsequent molecular examinations showed that the diversity pattern in the Père David's deer was unusual. In most mammals, classical class I loci are more polymorphic than nonclassical and neutral genes. In contrast, the Père David's deer was found to be monomorphic at classical genes F1, F12, G2, and I7, dimorphic at the nonclassical AF gene, dimorphic at pseudogene I8, and tetramorphic at pseudogene C1. The adverse polymorphism patterns of Elda-I genes might provide evidence that selection depletes MHC variation faster than drift in bottlenecked populations, while the postbottleneck tetramorphism of the C1 pseudogene appears to be evidence of strong historical balancing selection.

4.
The duplicate tuf genes on the Salmonella enterica serovar Typhimurium chromosome co-evolve by a RecA-, RecB-dependent gene conversion mechanism. Gene conversion is defined as a non-reciprocal transfer of genetic information. However, in a replicating bacterial chromosome there is a possibility that a reciprocal genetic exchange between different tuf genes sitting on sister chromosomes could result in "apparent" gene conversion. We asked whether the major mechanism of tuf gene conversion was classical or apparent. We devised a genetic selection that allowed us to isolate and examine both expected products from a reciprocal recombination event between the tuf genes. Using this selection we tested within individual cultures for a correlation in the frequency of jackpots as expected if recombination were reciprocal. We found no correlation, either in the frequency of each type of recombinant product, or in the DNA sequences of the products resulting from each recombination event. We conclude that the evidence argues in favor of a non-reciprocal gene conversion mechanism as the basis for tuf gene co-evolution.

5.
Human epidermal type I transglutaminase coexists in keratinocytes with another cross-linking enzyme, tissue type II transglutaminase. There are at least five different forms of the enzyme in mammals. Gene mapping studies allowed us to determine whether the different transglutaminases are products of the same gene or separate genes. The gene encoding factor XIII subunit a transglutaminase (F13A1) was previously assigned to human chromosome 6, p24→p25. We demonstrate using somatic cell hybrids that the human epidermal type I transglutaminase gene (gene symbol is designated TGM1) is located on human chromosome 14, providing evidence that at least two human transglutaminases are encoded by separate genes.

6.
The genes of the major histocompatibility complex (MHC) are a central component of the immune system in vertebrates and have become important markers of functional, fitness-related genetic variation. We have investigated the evolutionary processes that generate diversity at MHC class I genes in a large population of an archaic reptile species, the tuatara (Sphenodon punctatus), found on Stephens Island, Cook Strait, New Zealand. We identified at least 2 highly polymorphic (UA type) loci and one locus (UZ) exhibiting low polymorphism. The UZ locus is characterized by low nucleotide diversity and weak balancing selection and may be either a nonclassical class I gene or a pseudogene. In contrast, the UA-type alleles have high nucleotide diversity and show evidence of balancing selection at putative peptide-binding sites. Twenty-one different UA-type genotypes were identified among 26 individuals, suggesting that the Stephens Island population has high levels of MHC class I variation. UA-type allelic diversity is generated by a mixture of point mutation and gene conversion. As has been found in birds and fish, gene conversion obscures the genealogical relationships among alleles and prevents the assignment of alleles to loci. Our results suggest that the molecular mechanisms that underpin MHC evolution in nonmammals make locus-specific amplification impossible in some species.

7.
The adaptive alpha-spending algorithm incorporates additional contextual evidence (including correlations among genes) about differential expression to adjust the initial p-values to yield the alpha-spending adjusted p-values. The alpha-spending algorithm is so named because of its similarity to the alpha-spending algorithm in interim analysis of clinical trials, in which stage-specific significance levels are assigned to each stage of the clinical trial. We show that the Bonferroni correction applied to the alpha-spending adjusted p-values approximately controls the family-wise error rate under the complete null hypothesis. Using simulations we also show that the use of the alpha-spending algorithm yields increased power over the unadjusted p-values while controlling the FDR. We found greater benefits of the alpha-spending algorithm with increasing sample sizes and correlation among genes. The use of the alpha-spending algorithm will result in microarray experiments that make more efficient use of their data and may help conserve resources.

8.
9.
It has recently been shown that the three metazoan superphyla that are recognized on the basis of 18S rDNA phylogenies--ecdysozoans, lophotrochozoans, and deuterostomes--each have characteristic Hox genes. This observation has been taken further, and these "signature" Hox genes have been looked for in taxa of uncertain affinity such as the mesozoa, in order to link them to one of the three superphyla. Here I point out that, in the absence of an out-group, these so-called signature Hox genes are unpolarized characters and, as such, should not be used in this cladistic sense to determine phylogeny. Taking the example of the mesozoans, which have the Lox5 gene in common with the lophotrochozoans, I show that it is possible to polarize this character using paralogous Hox genes as proxy out-groups; however, due to the impossibility of reliable alignment outside the homeobox, only two residues of the Lox5 peptide are susceptible to this method. With this in mind, I find slim evidence for an association between mesozoans and lophotrochozoans. I demonstrate that the lophotrochozoan genes Lox2 and Lox4 would provide many more reliable residues that are truly indicative of lophotrochozoan affinity. Finally, I point out the potential problems in using unpolarized signatures to address the question of the position of the acoel flatworms.

10.

Background  

Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values.
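
To make the permutation approach concrete, here is a minimal sketch assuming an unweighted running-sum enrichment score (a step up for each gene in the category, a step down otherwise). The abstract does not specify the statistic, so this form, the function names, and the `n_perm` and `seed` parameters are assumptions for illustration only.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum along the ranked list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a category member, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Estimate a two-sided p-value by shuffling the order of the gene list."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)
```

The loop shows why the procedure is costly and only approximate: the resolution of the estimate is limited to about `1 / (n_perm + 1)`, so small p-values require many permutations.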

11.
Major histocompatibility complex (MHC) class I genes play a crucial role in the immune defence against intracellular pathogens. An important evolutionary strategy is to generate and maintain a high level of diversity in these genes. Humans express three highly polymorphic classical MHC class I genes (HLA-A, HLA-B and HLA-C). In contrast, some species, for example rat and rhesus macaque, maintain diversity by generation of haplotypes that vary considerably with regard to the number and combination of transcribed genes. Cattle appear to use both strategies. We show that various combinations of six apparently classical genes, three of which are highly polymorphic, are transcribed on different haplotypes. Although additional sequences were identified in both cDNA and gDNA, it was not possible to assign them to any of these defined genes. Most were highly divergent or were non-classical class I genes. Thus, we found little evidence for frequent duplication and deletion of classical class I genes as reported in some other species. However, the maintenance of class I diversity in cattle may involve limited gene shuffling and deletion, possibly as a result of unequal crossing-over within the class I region. The first two authors made an equal contribution to this work.

12.
Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. This power-law function in other similar ranked plots is known as Zipf's law, observed in many natural and social phenomena. The presence of this power-law function prevents an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known χ² distribution of likelihood ratios. We discuss the implication of this Zipf's law on gene selection in a microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.
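
One simple way to check for the power-law fall-off described here is to regress the log of the ranked measure on the log of the rank; a roughly straight line suggests a power law, and its slope gives the exponent. The sketch below assumes strictly positive scores and uses ordinary least squares, which is only a crude diagnostic; the function name is hypothetical.

```python
import math

def fit_power_law(values):
    """Least-squares fit of log(value) against log(rank), values ranked descending.

    Returns (slope, intercept); a slope near -1 corresponds to classic Zipf behaviour.
    Assumes at least two strictly positive values.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A smooth fit of this kind is what makes a natural cutoff hard to find: there is no break in the curve separating "important" from "irrelevant" genes.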

13.
Significance of gene ranking for classification of microarray samples (total citations: 1; self-citations: 0; citations by others: 1)
Many methods for classification and gene selection with microarray data have been developed. These methods usually give a ranking of genes. Evaluating the statistical significance of the gene ranking is important for understanding the results and for further biological investigations, but this question has not been well addressed for machine learning methods in existing works. Here, we address this problem by formulating it in the framework of hypothesis testing and propose a solution based on resampling. The proposed r-test methods convert gene ranking results into position p-values to evaluate the significance of genes. The methods are tested on three real microarray data sets and three simulation data sets with support vector machines as the method of classification and gene selection. The obtained position p-values help to determine the number of genes to be selected and enable scientists to analyze selection results by sophisticated multivariate methods under the same statistical inference paradigm as for simple hypothesis testing methods.

14.

Background  

This paper presents a unified framework for finding differentially expressed genes (DEGs) from microarray data. The proposed framework has three interrelated modules: (i) gene ranking, (ii) significance analysis of genes and (iii) validation. The first module uses two gene selection algorithms, namely, (a) two-way clustering and (b) combined adaptive ranking, to rank the genes. The second module converts the gene ranks into p-values using an R-test and fuses the two sets of p-values using Fisher's omnibus criterion. The DEGs are selected using FDR analysis. The third module performs a three-fold validation of the obtained DEGs. The robustness of the proposed unified framework in gene selection is first illustrated using false discovery rate analysis. In addition, the clustering-based validation of the DEGs is performed by employing an adaptive subspace-based clustering algorithm on the training and the test datasets. Finally, a projection-based visualization is performed to validate the DEGs obtained using the unified framework.
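
The fusion step can be sketched with the standard form of Fisher's combined test: for k independent p-values, X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null. Because the degrees of freedom are even, the tail probability has a closed form and needs no statistics library. Whether the paper applies exactly this form, and whether its two rankings are independent enough to justify it, is not stated in the abstract; the function name is hypothetical.

```python
import math

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-squared survival function for 2k degrees of freedom:
    P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1] and the list must be non-empty")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total
```

For a gene with one p-value from each ranking, `fisher_combine([p_a, p_b])` would give the fused p-value that then enters the FDR analysis.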

15.
In the analysis of gene expression by microarrays there are usually few subjects, but high-dimensional data. By means of techniques such as the theory of spherical tests or suitable permutation tests, it is possible to sort the endpoints or to give weights to them according to specific criteria determined by the data while controlling the multiple type I error rate. The procedures developed so far are based on a sequential analysis of weighted p-values (corresponding to the endpoints), including the most extreme situation of weighting leading to a complete order of p-values. When the data for the endpoints have approximately equal variances, these procedures show good power properties. In this paper, we consider an alternative procedure, which is based on completely sorting the endpoints, but smoothed in the sense that some perturbations in the sequence of the p-values are allowed. The procedure is relatively easy to perform, but has high power under the same restrictions as for the weight-based procedures.

16.
The major histocompatibility complex (MHC) genes code for proteins that play a critical role in the immune system response. The MHC genes are among the most polymorphic genes in vertebrates, presumably due to balancing selection. The two MHC classes appear to differ in the rate of evolution, but the reasons for this variation are not well understood. Here, we investigate the level of polymorphism and the evolution of sequences that code for the peptide-binding regions of MHC class I and class II DRB genes in the Alpine marmot (Marmota marmota). We found evidence for four expressed MHC class I loci and two expressed MHC class II loci. MHC genes in marmots were characterized by low polymorphism, as one to eight alleles per putative locus were detected in 38 individuals from three French Alps populations. The generally limited degree of polymorphism, which was more pronounced in class I genes, is likely due to a bottleneck that the populations have undergone. Additionally, gene duplication within each class might have compensated for the loss of polymorphism at particular loci. The two gene classes showed different patterns of evolution. The most polymorphic of the putative loci, Mama-DRB1, showed clear evidence of historical positive selection for amino acid replacements. However, no signal of positive selection was evident in the MHC class I genes. These contrasting patterns of sequence evolution may reflect differences in selection pressures acting on class I and class II genes.

17.
Cystinuria type I is an autosomal recessive disorder with an exclusively renal phenotype caused by inactivating mutations in SLC3A1. Recently 3 similar but distinct syndromes associated with cystinuria type I have been described: 2p21 deletion syndrome, Hypotonia-Cystinuria Syndrome (HCS) and atypical HCS. Genetic analysis indicated that these are recessive contiguous gene deletion syndromes which differ in the number of genes affected. Patients with HCS are missing both alleles of SLC3A1 and PREPL. In atypical HCS an additional gene (C2orf34) is deleted, and finally, in the 2p21 deletion syndrome the open reading frame of PPM1B is also disrupted. With the exception of SLC3A1, the gene products have not been fully characterized. The severity of the different syndromes reflects the number of genes which are deleted. HCS, a relatively mild syndrome, is characterised by cystinuria type I, generalised hypotonia at birth, growth retardation and minor facial dysmorphic features. On the other end of the spectrum is the 2p21 deletion syndrome, a severe syndrome with a number of additional features including a moderate to severe psychomotor retardation and a decrease in activity of the respiratory chain complexes I, III, IV and V. Finally, atypical HCS displays an intermediate phenotype comparable with classical HCS but associated with mild to moderate mental retardation and a decrease in activity of only the respiratory chain complex IV. This review will focus on the phenotypic similarities and differences observed in these syndromes. Furthermore, we speculate on the function of the gene products, based on the available data.

18.
19.
A number of statistical tests have been proposed to detect positive Darwinian selection affecting a few amino acid sites in a protein, exemplified by an excess of nonsynonymous nucleotide substitutions. These tests are often more powerful than pairwise sequence comparison, which averages synonymous (d(S)) and nonsynonymous (d(N)) rates over the whole gene. In a recent study, however, Hughes AL and Friedman R (2005. Variation in the pattern of synonymous and nonsynonymous difference between two fungal genomes. Mol Biol Evol. 22: 1320-1324) argue that d(S) and d(N) are expected to fluctuate along the sequence by chance and that an excess of nonsynonymous differences in individual codons is no evidence for positive selection. The authors compared codons in protein-coding genes from the genomes of 2 yeast species, Saccharomyces cerevisiae and Saccharomyces paradoxus. They calculated the proportions of synonymous and nonsynonymous differences per site (p(S) and p(N)) in every codon and discovered that p(N) is often greater than p(S) and that among some codons p(S) and p(N) are negatively correlated. The authors argued that these results invalidate previous tests of codons under positive selection. Here I discuss several errors of statistics in the analysis of Hughes and Friedman, including confusion of statistics with parameters, arbitrary data filtering, and derivation of hypotheses from data. I also apply likelihood ratio tests of positive selection to the yeast data and illustrate empirically that Hughes and Friedman's criticisms of such tests are not valid.

20.
Summary: Defects in the enzyme steroid 21-hydroxylase result in congenital adrenal hyperplasia (CAH), a common autosomal recessive disorder of cortisol biosynthesis. The gene encoding this protein (CYP21B) and a closely linked pseudogene (CYP21A) have been mapped in the HLA complex on chromosome 6p, adjacent to the complement genes C4B and C4A, about 80 kb from the factor B gene. Molecular analyses of patients with CAH have shown that the cause of the defect may be either a deletion, a point mutation or a conversion of the active gene. Linkage of the disease to HLA has previously been studied by several groups. We have analyzed DNAs from patients with classical and non-classical CAH and from their family members, by probing with CYP21, C4 and BF cDNAs. In 70% of the CAH haplotypes studied, the defective CYP21B gene was indistinguishable from its structurally intact corresponding gene in Southern blot analysis, and presumably bore point mutations. In the remaining chromosomes, evidence for gene conversions, deletions and various deleterious mutations of the CYP21B gene is given. Moreover, our linkage studies show that a polymorphic TaqI cleavage site in the factor B gene, recently described by us, may be a new and useful genetic marker, because we found this TaqI restriction site only in unaffected haplotypes carrying functional CYP21B genes and, therefore, in negative association with the defective CYP21B gene.
