20 similar documents found.
6.
Background
The increasing number of sequenced prokaryotic genomes provides a wealth of genomic data that needs to be analysed effectively. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-size oligonucleotides with expected values, which can be determined from genomic nucleotide content or from smaller oligonucleotide frequencies, or be based on specific statistical distributions. Advantages of these statistical methods include measuring phylogenetic relationships with relatively small pieces of DNA sampled from almost anywhere within genomes, detecting foreign/conserved DNA, and performing homology searches. Our aim was to explore the reliability and best-suited applications of some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zeroth-order Markov methods (ZOM) and the second-order Markov chain method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection of foreign/conserved DNA, and plasmid-host similarity comparisons. Additionally, the reliability of the methods was tested by comparing both real and random genomic DNA.
Results
Our findings show that the optimal method is context dependent. ROFs were best suited for distant homology searches, whilst the hexanucleotide ZOM and MCM measures were more reliable in terms of phylogeny. The dinucleotide ZOM method produced high correlation values when used to compare real genomes with an artificially constructed random genome of similar %GC, and should therefore be used with care. The tetranucleotide ZOM measure was well suited to detecting horizontally transferred regions, and when it was used to compare the phylogenetic relationships between plasmids and hosts, a significant correlation (R² = 0.4) was found with genomic GC content and intra-chromosomal homogeneity.
Conclusion
The statistical methods examined are fast, easy to implement, and powerful for a number of different applications involving genomic sequence comparisons. However, none of the measures examined was superior in all tests, and therefore the choice of statistical method should depend on the task at hand.
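The shared core of these observed-versus-expected measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the expectation is the zeroth-order (base-composition) model, similarity between two genomes is scored with a Pearson correlation, and the names `zom_ratios`, `pearson` and `random_genome` are hypothetical. The exact ROF, ZOM and MCM definitions used in the study may differ from this sketch.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def zom_ratios(seq: str, k: int) -> list[float]:
    """Observed/expected ratio for every ACGT k-mer (4**k values, fixed order).

    Expected frequencies come from a zeroth-order Markov model: the product
    of the sequence's single-base frequencies. k-mers containing characters
    other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in ALPHABET)
    n_bases = sum(base_counts.values())
    base_freq = {b: base_counts[b] / n_bases for b in ALPHABET}

    kmer_counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(kmer_counts[w] for w in words)

    ratios = []
    for w in words:
        expected = math.prod(base_freq[b] for b in w)
        observed = kmer_counts[w] / total
        # A word containing an absent base has expectation 0; report 0.0.
        ratios.append(observed / expected if expected > 0 else 0.0)
    return ratios


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def random_genome(length: int, gc: float, seed: int = 0) -> str:
    """Structureless random sequence with the requested expected GC fraction."""
    rng = random.Random(seed)
    at, cg = (1 - gc) / 2, gc / 2
    return "".join(rng.choices(ALPHABET, weights=[at, cg, cg, at], k=length))


genome_a = random_genome(50_000, gc=0.5, seed=1)
genome_b = random_genome(50_000, gc=0.5, seed=2)
print(pearson(zom_ratios(genome_a, 4), zom_ratios(genome_b, 4)))
```

Comparing a real genome with `random_genome(len(real), gc=...)` at k = 2 and again at k = 4 or 6 is one way to probe the caution above about dinucleotide ZOM correlations with random DNA of similar %GC.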
Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences (total citations: 8; self-citations: 0; citations by others: 8)
The compact genome of the pufferfish, Fugu rubripes, has been proposed as a 'reference' genome to aid in annotating and analysing the human genome. We have annotated and compared 85 kb of Fugu sequence containing 17 genes with its homologous loci in the human draft genome and identified three 'novel' human genes that were missed or incompletely predicted by previous gene prediction methods. Two of the novel genes contain zinc finger domains and are designated ZNF366 and ZNF367. They map to human chromosomes 5q13.2 and 9q22.32, respectively. The third novel gene, designated C9orf21, maps to chromosome 9q22.32. This gene is unique to vertebrates, and the protein encoded by it does not contain any known domains. We could not find human homologs for two Fugu genes, a novel chemokine gene and a kinase gene. These genes are either specific to teleosts or lost in the human lineage. The Fugu-human comparison identified several conserved non-coding sequences in the promoter and intronic regions. These sequences, conserved during 450 million years of vertebrate evolution, are likely to be involved in gene regulation. The 85 kb Fugu locus is dispersed over four human loci, occupying about 1.5 Mb. Contiguity is conserved in the human genome between six out of 16 Fugu gene pairs. These contiguous chromosomal segments should share a common evolutionary history dating back to the common ancestor of mammals and teleosts. We propose contiguity as strong evidence for identifying orthologous genes in distant organisms. This study confirms the utility of the Fugu as a supplementary tool to uncover and confirm novel genes and putative gene regulatory regions in the human genome.
8.
Hydrophobins are small, secreted proteins that play important roles in the development of pathogenic and symbiotic fungi. Evolutionary mechanisms generating sequence and expression divergence among members of hydrophobin gene families are largely unknown. Seven hydrophobin (hyd) genes and one hyd pseudogene were isolated from strains of the ectomycorrhizal fungus Paxillus involutus. Sequences were analysed using phylogenetic methods. Expression profiles were inferred from microarray experiments. The hyd genes included both young (recently diverged) and old duplicates. Some young hyd genes exhibited an initial phase of enhanced sequence evolution owing to relaxed or positive selection. There was no significant association between sequence divergence and variation in expression levels. However, three hyd genes displayed a shift in expression levels or an altered tissue specificity following duplication. The Paxillus hyd genes evolve according to the so-called birth-and-death model, in which some duplicates are maintained for a long time, whereas others are inactivated through mutations. The role of subfunctionalization and/or neofunctionalization in preserving the hyd duplicates in the genome is discussed.
9.
Amphioxus and lamprey AP-2 genes: implications for neural crest evolution and migration patterns (total citations: 6; self-citations: 0; citations by others: 6)
The neural crest is a uniquely vertebrate cell type present in the most basal vertebrates, but not in cephalochordates. We have studied differences in regulation of the neural crest marker AP-2 across two evolutionary transitions: invertebrate to vertebrate, and agnathan to gnathostome. Isolation and comparison of amphioxus, lamprey and axolotl AP-2 reveals its extensive expansion in the vertebrate dorsal neural tube and pharyngeal arches, implying co-option of AP-2 genes by neural crest cells early in vertebrate evolution. Expression in non-neural ectoderm is a conserved feature in amphioxus and vertebrates, suggesting an ancient role for AP-2 genes in this tissue. There is also common expression in subsets of ventrolateral neurons in the anterior neural tube, consistent with a primitive role in brain development. Comparison of AP-2 expression in axolotl and lamprey suggests an elaboration of cranial neural crest patterning in gnathostomes. However, migration of AP-2-expressing neural crest cells medial to the pharyngeal arch mesoderm appears to be a primitive feature retained in all vertebrates. Because AP-2 has essential roles in cranial neural crest differentiation and proliferation, the co-option of AP-2 by neural crest cells in the vertebrate lineage was a potentially crucial event in vertebrate evolution.
10.
Guo FB. Journal of Biomolecular Structure & Dynamics, 2007, 25(2): 127-133
The distribution patterns of bases in DNA fragments from different regions of the P. aeruginosa genome are analyzed in this paper. It is shown that 5565 protein-coding genes, 17315 non-coding ORFs, and 1104 intergenic sequences fall into seven clusters based on their base frequencies. Almost all the protein-coding genes are contained in one of the seven clusters. The significant differences in base frequencies among the three codon positions in this high-GC genome, which cause the base-distribution patterns of the six reading frames of protein-coding genes to diverge, are responsible for the clustering phenomenon. In light of the clustering phenomenon, the author suggests that the antisense-strand ORFs, particularly those corresponding to Frame 2' and Frame 3', may not code for proteins in the P. aeruginosa genome.
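The base frequencies at the three codon positions, read in all six frames, can be computed with a short Python sketch. It assumes that "six reading frames" means the three offsets of the given strand plus the three offsets of its reverse complement; how these map onto the paper's Frame 1-3 and Frame 1'-3' labels is an assumption, and the function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def codon_position_profile(seq: str) -> list[float]:
    """12 values: A, C, G, T frequencies at codon positions 1, 2 and 3.

    The sequence is read from its first base; an incomplete trailing codon
    is ignored.
    """
    usable = len(seq) - len(seq) % 3
    profile = []
    for pos in range(3):
        column = seq[pos:usable:3]
        counts = Counter(column)
        total = len(column) or 1  # avoid dividing by zero on very short input
        profile.extend(counts[b] / total for b in BASES)
    return profile


def six_frame_profiles(seq: str) -> dict[str, list[float]]:
    """Codon-position base profiles for three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for shift in range(3):
        frames[f"+{shift + 1}"] = codon_position_profile(seq[shift:])
        frames[f"-{shift + 1}"] = codon_position_profile(rc[shift:])
    return frames
```

Each ORF or intergenic fragment then becomes a 12-dimensional vector per frame. The abstract does not state which clustering algorithm was applied to such vectors, so that step is left out here.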
14.
R Nussinov, A Sarai, G W Smythers, D Wang, R L Jernigan. Journal of Biomolecular Structure & Dynamics, 1989, 7(3): 707-722
Previous studies of the dinucleotides flanking both the 5' and 3' ends of homooligomer tracts have shown that some flanks are consistently preferred over others (1,2). In the first, preferred group, the homooligomer tracts are flanked by the same nucleotide and/or the complementary nucleotides, e.g., ATAn, TTAn, CCGn, where n = 2-5. Runs flanked by nucleotides with which they cannot base pair are distinctly disfavored. (In this group, An/Tn are flanked by C and/or G, and Gn/Cn are flanked by A/T, e.g., CGAn, TnGG, GnAT.) The frequencies of runs flanked by A or T and G or C (the "mixed" group) are as expected. Here we seek the origin of this effect and its relevance to protein-DNA interactions. Surprisingly, within the first group, runs flanked by their complements with a pyrimidine-purine junction (e.g., TTAn, CnGG) are greatly preferred, whereas the frequencies of their purine-pyrimidine junction mirror images are just as expected. This effect, as well as additional ones enumerated below, is seen universally in eukaryotes and in prokaryotes, although it is stronger in the former. Detailed analysis of regulatory regions shows these strong trends, particularly in GC sequences. The potential relationship to DNA conformation and DNA-protein interaction is discussed.
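Observed counts of flanked runs like these can be tallied with a regular expression over maximal homooligomer runs. This is a minimal sketch, assuming each run of length 2-5 is recorded together with the dinucleotide immediately 5' and immediately 3' of it (so TTAn corresponds to a 5' flank of "TT" before a run of A); the function name is hypothetical. Judging whether a count is "as expected" would additionally require a background model, which is not shown.

```python
import re
from collections import Counter

# A maximal run: one base followed by any number of repeats of itself.
RUN = re.compile(r"([ACGT])\1*")


def flanked_runs(seq: str, min_len: int = 2, max_len: int = 5) -> Counter:
    """Tally maximal homooligomer runs with their flanking dinucleotides.

    Keys are (five_prime_flank, base, run_length, three_prime_flank). Runs
    too close to either end to have a full dinucleotide flank are skipped.
    """
    seq = seq.upper()
    tally = Counter()
    for m in RUN.finditer(seq):
        start, end = m.start(), m.end()
        n = end - start
        if not (min_len <= n <= max_len) or start < 2 or end + 2 > len(seq):
            continue
        tally[(seq[start - 2:start], m.group(1), n, seq[end:end + 2])] += 1
    return tally


print(flanked_runs("GGTTAAACC"))
```

Summing the tally over the 5' flank alone, or over the 3' flank alone, gives the one-sided flank counts the abstract discusses.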
17.
Constitutive expression of Slp genes in mouse strain B10.WR directed by C4 regulatory sequences (total citations: 6; self-citations: 0; citations by others: 6)
P A Rosa, D S Sepich, D M Robins, R T Ogata. Journal of Immunology (Baltimore, Md.: 1950), 1987, 139(5): 1568-1577
The murine fourth component of complement (C4) and sex-limited protein (Slp) are two closely related serum proteins that exhibit very disparate patterns of gene expression: all mice constitutively express C4, whereas only adult male mice from a limited number of standard inbred strains express Slp. Several exceptional strains exhibit constitutive (C4-like) Slp expression, a phenotype that correlates with multiple copies of the Slp gene. To determine the molecular basis for constitutive Slp expression, we have isolated genomic clones and compared the sequences of 1.5 kb of 5' flanking DNA from one C4 gene and three different Slp genes from the Slp-constitutive strain B10.WR. These sequence comparisons demonstrate C4-like regulatory sequences adjacent to two of the Slp genes. By analysis of cDNA clones isolated from a B10.WR liver library, we demonstrate that the constitutive Slp phenotype is due primarily to expression of one of these C4/Slp hybrid genes. It appears likely that Slp gene duplication in strain B10.WR came about via homologous unequal crossover events between C4 and Slp genes; this would accommodate both the gene sequence data and the pattern of C4-like Slp expression in mouse strain B10.WR.
18.
Like many plants, Populus has an evolutionary history in which several genome duplication events, both recent and more ancient, have occurred, and it therefore constitutes an excellent model system for studying the functional evolution of genes. In the present study, we have focused on the properties of genes with tissue-specific differential expression patterns in poplar. We identified the genes by analyzing digital expression profiles derived by mapping 90,000+ expressed sequence tags (ESTs) from 18 sources to the predicted genes of Populus. Our sequence analysis suggests that tissue-specific differentially expressed genes have paralogs that are less diverged than average, indicating that gene duplication is an important event in the pathway leading to this type of expression pattern. The functional analysis showed that genes coding for proteins involved in processes of functional importance for the specific tissue(s) in which they are expressed, and genes coding for regulatory or responsive proteins, are the most common among the differentially expressed genes, demonstrating that the expression differentiation process is under strong selective pressure. Thus, our data support a model in which gene duplication followed by gene specialization or expansion of the regulatory and responsive networks leads to tissue-specific differential expression patterns. We have also searched for clustering of genes with similar expression patterns into gene-expression neighborhoods within the Populus genome. However, we could not detect any major clustering among the analyzed genes with highly specific expression patterns.
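A digital expression profile of the kind described above is, at its simplest, a table of EST counts per gene per library, normalized by library size. The sketch below assumes the ESTs have already been mapped to predicted genes and are supplied as hypothetical `(library, gene)` pairs, that library size means the number of mapped ESTs, and that counts are scaled per 10,000 ESTs. The `max_share` score is a hypothetical illustration of tissue specificity, not the criterion used in the study.

```python
from collections import Counter, defaultdict


def digital_profiles(est_hits, per=10_000):
    """Build EST-count expression profiles.

    est_hits: iterable of (library, gene) pairs, one per EST already mapped
    to a predicted gene. Returns {gene: {library: ESTs per `per` mapped ESTs
    of that library}}; libraries in which a gene has no EST are absent.
    """
    hits = list(est_hits)
    library_sizes = Counter(library for library, _ in hits)
    pair_counts = Counter(hits)
    profiles = defaultdict(dict)
    for (library, gene), n in pair_counts.items():
        profiles[gene][library] = n * per / library_sizes[library]
    return dict(profiles)


def max_share(profile):
    """Hypothetical specificity score: fraction of a gene's normalized
    expression that falls in its top library (1.0 = seen in one library only)."""
    values = list(profile.values())
    return max(values) / sum(values)


hits = [("leaf", "g1"), ("leaf", "g1"), ("leaf", "g2"), ("root", "g2")]
for gene, profile in digital_profiles(hits).items():
    print(gene, profile, max_share(profile))
```

Real EST data would also need a statistical test for differential expression between libraries; the share score only ranks genes by how concentrated their counts are.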
19.
Olga N. Danilevskaya, Elena V. Kurenova, Maria N. Pavlova, Dmitrii V. Bebehov, Andrew J. Link, Akihiko Koga, Ann Vellek, Daniel L. Hartl. Chromosoma, 1991, 100(2): 118-124
The genome of Drosophila melanogaster contains a class of repetitive DNA sequences called the He-T family, which is unusual in being confined to telomeric and heterochromatic regions. The specific He-T fragment designated Dm665 was cloned in yeast by selection for an autonomously replicating sequence (ARS). Dm665 contains a restriction fragment length polymorphism (RFLP) that is specific to males and thus derives from the Y chromosome. Deletion mapping using X-Y translocations indicates that sequences homologous to Dm665 occur in at least one major cluster in each arm of the Y chromosome. Among 20 yeast artificial chromosome (YAC) clones containing Drosophila sequences homologous with Dm665, four clones derive from defined regions of the long arm of the Y and two from the short arm. The sequence of Dm665 is 2443 bp long, is 59% A+T, and contains no significant open reading frames or direct or inverted repeats. However, Dm665 contains a region of 650 bp that shares homology with portions of the X-linked locus Stellate.
by W. Hennig
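Sequence statistics like the A+T content and the absence of long open reading frames reported for Dm665 can be computed with a few lines of Python. This is a minimal sketch, assuming an ORF means an ATG followed in frame by a stop codon on either strand; the abstract does not say what length counts as "significant", so no threshold is applied, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def at_content(seq: str) -> float:
    """Fraction of A+T among the unambiguous (A, C, G, T) bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in bases) / len(bases)


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop open reading frame on either strand, in codons.

    The stop codon itself is not counted; ORFs that run off the end of the
    sequence without a stop are ignored.
    """
    seq = seq.upper()
    longest = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    longest = max(longest, (i - start) // 3)
                    start = None
    return longest


print(at_content("ATGAAATAG"), longest_orf_codons("ATGAAATAG"))
```

For a 2443 bp fragment, comparing the longest ORF against ORF lengths in shuffled copies of the same sequence is one common way to decide whether it is significant; that comparison is not part of this sketch.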
20.
Mochizuki A. Journal of Theoretical Biology, 2008, 250(2): 307-321
The complexity of gene regulatory networks has been considered responsible for the diversity of cells. Different types of cells, characterized by the expression patterns of genes, are produced in early development through the dynamics of gene activities based on the regulatory network. However, very little is known about the relationship between the structure of regulatory networks and the dynamics of gene activities. In this paper, I introduce a new idea, “steady-state compatibility”, by which the diversity of possible gene activities can be determined from the topological structure of gene regulatory networks. The basic premise is very simple: the activity of a gene should be a function of the activities of its controlling genes. Thus, a gene should always show a unique expression activity if the activities of its controlling genes are unique. Based on this, the maximum possible diversity of steady states is determined using only information on regulatory linkages, without knowing the regulatory functions of genes. By extending this idea, some general properties were derived. For example, multiple loop structures in regulatory networks are necessary for increasing the diversity of gene activity. On the other hand, connected multiple loops sharing the same genes do not increase the diversity. The method was applied to a gene regulatory network responsible for early development in a sea urchin species. A set of important genes responsible for generating diversities of gene activities was derived based on the concept of compatibility of steady states.
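The basic premise, that a gene's activity is a function of its regulators' activities, can be illustrated by brute-force enumeration of steady states in small networks. This sketch swaps in Boolean (on/off) update rules purely for illustration, which is an assumption: the compatibility criterion described above works from network topology alone, without specifying regulatory functions. The two example networks and all names are hypothetical. Without a loop, each gene's value is forced by its upstream genes, so there is one steady state; a positive loop already admits more than one.

```python
from itertools import product
from typing import Callable, Tuple

State = Tuple[int, ...]


def steady_states(n_genes: int, update: Callable[[State], State]) -> list[State]:
    """All on/off states that the update rule maps to themselves."""
    return [s for s in product((0, 1), repeat=n_genes) if update(s) == s]


# Hypothetical 3-gene cascade with no loop: gene 0 is held on,
# gene 1 copies gene 0, and gene 2 is repressed by gene 1.
def cascade(s: State) -> State:
    return (1, s[0], 1 - s[1])


# Hypothetical 2-gene positive loop: each gene activates the other.
def positive_loop(s: State) -> State:
    return (s[1], s[0])


print(steady_states(3, cascade))        # [(1, 1, 0)]
print(steady_states(2, positive_loop))  # [(0, 0), (1, 1)]
```

Exhaustive enumeration grows as 2**n, which is why a topology-only bound such as the one proposed in the paper is attractive for larger networks.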