Similar literature
 20 similar documents found
1.
2.
Scanning of the human genome by use of affected relative pairs and dense sets of highly polymorphic markers, or by emerging techniques such as genomic mismatch scanning (GMS), is making it possible to identify the genetic etiology of a disease through detection of susceptibility loci. We present a general statistical model and test to detect disease genes, using affected relative pairs and either markers or GMS technologies in a genome search. There are an exact test and a large-sample normal approximation that control for the elevated probability of false detection of linkage in a genome search. The approach can be used to determine the sample size needed to obtain a prespecified power to detect a disease gene in the presence of etiologic heterogeneity for a single class or mixture of relative classes, with any number of markers or clones, marker PIC values, or mapping function. The approach is used to examine differences in performance of markers and GMS technologies in a common statistical framework and to provide practical information for designing studies of complex traits.

3.
Statistical validation of gene clusters is imperative for many important applications in comparative genomics which depend on the identification of genomic regions that are historically and/or functionally related. We develop the first rigorous statistical treatment of max-gap clusters, a cluster definition frequently used in empirical studies. We present exact expressions for the probability of observing an individual cluster of a set of marked genes in one genome, as well as upper and lower bounds on the probability of observing a cluster of h homologs in a pairwise whole-genome comparison. We demonstrate the utility of our approach by applying it to a whole-genome comparison of E. coli and B. subtilis. Code for statistical tests is available at.

4.
Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

5.
The ArcAB two-component system of Escherichia coli regulates the aerobic/anaerobic expression of genes that encode respiratory proteins whose synthesis is coordinated during aerobic/anaerobic cell growth. A genomic study of E. coli was undertaken to identify other potential targets of oxygen and ArcA regulation. A group of 175 genes generated from this study and our previous study on oxygen regulation (Salmon, K., Hung, S. P., Mekjian, K., Baldi, P., Hatfield, G. W., and Gunsalus, R. P. (2003) J. Biol. Chem. 278, 29837-29855), called our gold standard gene set, have p values <0.00013 and a posterior probability of differential expression value of 0.99. These 175 genes clustered into eight expression patterns and represent genes involved in a large number of cell processes, including small molecule biosynthesis, macromolecular synthesis, and aerobic/anaerobic respiration and fermentation. In addition, 119 of these 175 genes were also identified in our previous study of the fnr allele. A MEME/weight matrix method was used to identify a new putative ArcA-binding site for all genes of the E. coli genome. 16 new sites were identified upstream of genes in our gold standard set. The strict statistical analyses that we have performed on our data allow us to predict that 1139 genes in the E. coli genome are regulated either directly or indirectly by the ArcA protein with a 99% confidence level.

6.
Arquès DG  Lacan J  Michel CJ 《Bio Systems》2002,66(1-2):73-92
A new statistical approach using functions based on the circular code classifies correctly more than 93% of bases in protein (coding) genes and non-coding genes of human sequences. Based on this statistical study, a research software called 'Analysis of Coding Genes' (ACG) has been developed for identifying protein genes in the genomes and for determining their frame. Furthermore, the software ACG also allows an evaluation of the length of protein genes, their position in the genome, their relative position between themselves, and the prediction of internal frames in protein genes.

7.
8.
Lang GI  Murray AW 《Genetics》2008,178(1):67-82
Although mutation rates are a key determinant of the rate of evolution, they are difficult to measure precisely, and global mutation rates (mutations per genome per generation) are often extrapolated from the per-base-pair mutation rate assuming that the mutation rate is uniform across the genome. Using budding yeast, we describe an improved method for the accurate calculation of mutation rates based on the fluctuation assay. Our analysis suggests that the per-base-pair mutation rates at two genes differ significantly (3.80 × 10⁻¹⁰ at URA3 and 6.44 × 10⁻¹⁰ at CAN1) and we propose a definition for the effective target size of genes (the probability that a mutation inactivates the gene) that acknowledges that the mutation rate is nonuniform across the genome.

9.
10.

   

Understanding the evolutionary plasticity of the genome requires a global, comparative approach in which genetic events are considered both in a phylogenetic framework and with regard to population genetics and environmental variables. In the mechanisms that generate adaptive and non-adaptive changes in genomes, segmental duplications (duplication of individual genes or genomic regions) and polyploidization (whole genome duplications) are well-known driving forces. The probability of fixation and maintenance of duplicates depends on many variables, including population sizes and selection regimes experienced by the corresponding genes: a combination of stochastic and adaptive mechanisms has shaped all genomes. A survey of experimental work shows that the distinction made between fixation and maintenance of duplicates still needs to be conceptualized and mathematically modeled. Here we review the mechanisms that increase or decrease the probability of fixation or maintenance of duplicated genes, and examine the outcome of these events on the adaptation of the organisms.

11.
Many microsatellite sequences have been described in the bovine genome. Being highly polymorphic, these have been suggested as markers for parentage verification and individual identification in cattle. We have evaluated the use of five highly polymorphic microsatellite markers for parentage verification in 14 breeds of cattle in the UK. Three of the microsatellite loci occur within introns in genes: BoLA DRB3, steroid 21-hydroxylase, and the beta subunit of the follicle-stimulating hormone. The other two are anonymous sites, ETH131 and HEL6. Results were analysed by a statistical approach that takes into account deviations from Hardy-Weinberg equilibrium and linkage disequilibrium for multiple loci. The method of determining the probability of random sire exclusion uses observed genotype frequencies instead of allele frequencies. Independently, the markers used have a probability of between 0.62 and 0.72 of identifying a parentage error, while used together the five markers give, on average across breeds, a probability of 0.99 of excluding an incorrect sire.
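
The jump from per-marker exclusion probabilities of 0.62 to 0.72 to a combined value of about 0.99 can be illustrated with a short calculation. The sketch below is only a rough illustration under a simplifying assumption that the loci act independently. The abstract's own method uses observed genotype frequencies and accounts for linkage disequilibrium, so it is not reproduced here. The function name `combined_exclusion` is hypothetical.

```python
from math import prod


def combined_exclusion(per_marker_probs):
    """Probability that at least one marker excludes an incorrect sire.

    Assumption (not from the abstract): markers are independent, so the
    chance that every marker fails to exclude is the product of the
    individual failure probabilities.
    """
    return 1.0 - prod(1.0 - p for p in per_marker_probs)


# Five markers at the low end of the reported per-marker range.
low_end = combined_exclusion([0.62] * 5)
print(f"combined exclusion probability: {low_end:.3f}")
```

Even at the low end of the reported range, five independent markers would already give a combined probability above 0.99, which is in line with the figure quoted in the abstract.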

12.
DAGchainer: a tool for mining segmental genome duplications and synteny   Total citations: 8, self-citations: 0, citations by others: 8
SUMMARY: Given the positions of protein-coding genes along genomic sequence and probability values for protein alignments between genes, DAGchainer identifies chains of gene pairs sharing conserved order between genomic regions, by identifying paths through a directed acyclic graph (DAG). These chains of collinear gene pairs can represent segmentally duplicated regions and genes within a single genome or syntenic regions between related genomes. Automated mining of the Arabidopsis genome for segmental duplications illustrates the use of DAGchainer.
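
The idea of finding chains of collinear gene pairs as paths through a DAG can be sketched with a small dynamic program. This is a simplified illustration, not DAGchainer's actual algorithm or interface: it assumes each gene pair is given as a hypothetical `(pos_a, pos_b, score)` tuple, connects pair i to pair j whenever both positions increase, and ignores the gap penalties and distance limits a real tool would need.

```python
def best_chain(pairs):
    """Return (total_score, chain) for the highest-scoring collinear chain.

    pairs: list of (pos_a, pos_b, score) tuples (a hypothetical input format).
    An edge i -> j exists when pair j lies strictly after pair i in both
    regions, so the graph is acyclic and a longest-path DP applies.
    """
    if not pairs:
        return 0.0, []
    pairs = sorted(pairs)                 # topological order by pos_a
    n = len(pairs)
    best = [p[2] for p in pairs]          # best chain score ending at each pair
    prev = [-1] * n                       # back-pointers for chain recovery
    for j in range(n):
        for i in range(j):
            if pairs[i][0] < pairs[j][0] and pairs[i][1] < pairs[j][1]:
                candidate = best[i] + pairs[j][2]
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i
    end = max(range(n), key=best.__getitem__)
    total = best[end]
    chain = []
    while end != -1:                      # walk back-pointers to the chain start
        chain.append(pairs[end])
        end = prev[end]
    chain.reverse()
    return total, chain


score, chain = best_chain([(1, 1, 1.0), (2, 5, 1.0), (3, 2, 1.0), (4, 3, 1.0)])
print(score, chain)
```

The quadratic loop is adequate for a sketch. The pair at `(2, 5)` is left out of the chain because including it would break the conserved order in the second region.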

13.
We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. Similarity comparison of genomes is usually used to determine the evolutionary distance among different genomes. Few studies focus on the overall statistical characteristics of each gene compared with other genes in the same genome. In our model, each genome is represented as a weighted graph, whose topology describes the similarity relationship among genes in the same genome. Based on weighted graph theory, we extract some quantified statistical variables from the topology, and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are clearly classified into two different groups by their double logarithmic point-plots describing the similarity relationship among genes of the largest social structure in the genome. The results show that the proposed model may provide some new insights into the structures and evolution patterns determined from the complete genomes.

14.
Zakharov IA  Markov AV 《Genetika》2005,41(12):1624-1633
The gene orders in the genomes of nine alpha-proteobacteria were compared using quantitative indices S (the relative number of common pairs of adjacent genes) and L (the mean difference between intergenic distances). A sample of 200 homologous genes, occurring in all 11 strains, was studied. In all of the genomes examined, 20 conserved, "uninterrupted" regions, including in total 63 out of 200 genes, were found. The rate of evolutionary change in the gene order varied widely in different evolutionary lineages. The highest rate (40 to 60 genome rearrangements per 100 Myr) was characteristic of the intracellular parasite Wolbachia (Rickettsiales). Computer simulation has shown that the S to L ratio observed in the sample indicates that the probability of large genome rearrangements was somewhat lower than that of small ones.

15.
MOTIVATION: Establishment of intra-cellular life involved a profound re-configuration of the genetic characteristics of bacteria, including genome reduction and rearrangements. Understanding the mechanisms underlying these phenomena will shed light on the genome rearrangements essential for the development of an intra-cellular lifestyle. Comparison of genomes with differences in their sizes poses statistical as well as computational problems. Little effort has been made to develop flexible computational tools with which to analyse genome reduction and rearrangements. RESULTS: Investigation of genome reduction and rearrangements in endosymbionts using a novel computational tool (GRAST) identified gathering of genes with similar functions. Conserved clusters of functionally related genes (CGSCs) were detected. Heterogeneous gene and gene cluster non-functionalization/loss is identified between genome regions, functional gene categories and during evolution. Results show that gene non-functionalization has accelerated during the last 50 MY of Buchnera's evolution while CGSCs have been static.

16.
Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50-100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a "Phenotype Sequencing" approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110-$340.

17.
Whether higher-order chromatin organization is related to genome stability over evolutionary time remains elusive. We find that regions of conserved gene order across the genus Drosophila are larger if they harbor genes bound by B-type lamin (Lam) and Suppressor of Under-Replication (SUUR), two proteins located at the nuclear periphery. Low recombination rates and coexpression of genes in regions of conserved gene order do not explain the lower probability of disruption in these regions by genome rearrangements. Instead, we find a significant colocalization between evolutionarily stable genomic regions associated with Lam and sequences thought to regulate local gene expression, which have the potential to impose constraints on genome rearrangement. At least in the genus Drosophila, localization of particular genomic regions at the nuclear periphery is intimately associated with their long-term integrity during evolution.

18.
J I Weller  J Z Song  D W Heyen  H A Lewin  M Ron 《Genetics》1998,150(4):1699-1706
Saturated genetic marker maps are being used to map individual genes affecting quantitative traits. Controlling the "experimentwise" type-I error severely lowers power to detect segregating loci. For preliminary genome scans, we propose controlling the "false discovery rate," that is, the expected proportion of true null hypotheses within the class of rejected null hypotheses. Examples are given based on a granddaughter design analysis of dairy cattle and simulated backcross populations. By controlling the false discovery rate, power to detect true effects is not dependent on the number of tests performed. If no detectable genes are segregating, controlling the false discovery rate is equivalent to controlling the experimentwise error rate. If quantitative loci are segregating in the population, statistical power is increased as compared to control of the experimentwise type-I error. The difference between the two criteria increases with the increase in the number of false null hypotheses. The false discovery rate can be controlled at the same level whether the complete genome or only part of it has been analyzed. Additional levels of contrasts, such as multiple traits or pedigrees, can be handled without the necessity of a proportional decrease in the critical test probability.
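
To make the contrast between false discovery rate control and experimentwise control concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used way to control the false discovery rate. The abstract does not say which procedure the authors applied, so this choice is an assumption, and the p-values below are made-up illustrative numbers rather than data from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank                      # keep the largest qualifying rank
    return sorted(order[:k])


def bonferroni(p_values, alpha=0.05):
    """Experimentwise (familywise) control for comparison."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Illustrative p-values only (not from the paper).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("FDR rejections:       ", benjamini_hochberg(p))
print("Bonferroni rejections:", bonferroni(p))
```

On these illustrative values the step-up rule rejects two hypotheses where the Bonferroni threshold rejects one, which matches the abstract's point that controlling the false discovery rate gives more power when some null hypotheses are false.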

19.
With the production of whole genome microarray chips the ability arises to investigate whether the regulation of particular groups of genes may be influenced by their chromosomal localization. Chromosome Co-Localization probability calculator (ChroCoLoc) is a publicly available web-based tool for the analysis of co-localization of co-expressed genes identified by microarray experiments. AVAILABILITY: http://www.ebi.ac.uk/expressionprofiler/

20.
In the present study, we sequenced the complete mt genome (14,022 bp) of the parasitic nematode Contracaecum rudolphii B and compared its structure and organization with those of Anisakis simplex s.l. The mt genome of C. rudolphii B is slightly longer than that of A. simplex s.l. (13,916 bp). The C. rudolphii B mt genome is circular, and consists of 36 genes, including 12 genes for proteins, 2 genes for rRNA and 22 genes for tRNA. This genome has a high A+T content (70.5%). The mt gene order for C. rudolphii B is the same as that for A. simplex s.l., but it is distinctly different from that of the other nematodes compared. The start codons inferred in the mt genome of C. rudolphii B are TTG and ATT. Six protein-coding genes use TAA as a stop codon, whereas five genes use T and one gene uses TAG as a termination codon. This pattern of codon usage reflects the strong bias for A and T in the mt genome of C. rudolphii B. Phylogenetic analyses using concatenated amino acid sequences of the 12 protein-coding genes, with three different computational algorithms (Bayes, ML and MP), all revealed distinct groups with high statistical support, indicating that C. rudolphii B and A. simplex s.l. are distinct but closely related species. These data provide additional novel mtDNA markers for studying the molecular epidemiology and population genetics of C. rudolphii B, and should have implications for the molecular diagnosis, prevention and control of anisakidosis in humans and animals.
