Similar Literature
20 similar documents found.
1.
2.
Given a long string of characters from a constant-size alphabet, we present an algorithm to determine whether its characters have been generated by a single i.i.d. random source. More specifically, consider all possible n-coin models for generating a binary string S, where each bit of S is generated by an independent toss of one of the n coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins, in which the probability of switching coins is much lower than the probability of reusing the same coin. We present a procedure to evaluate the likelihood of an n-coin model for a given S, subject to a uniform prior distribution over the parameters of the model (which represent mutation rates and probabilities of copying events). In the absence of detailed prior knowledge of these parameters, the algorithm can be used to determine whether the a posteriori probability for n = 1 is higher than for any n > 1. Our algorithm runs in O(l⁴ log l) time, where l is the length of S, using a dynamic programming approach that exploits the assumed convexity of the a posteriori probability as a function of n. Our test can be used in several ways to analyze long alignments between pairs of genomic sequences. For example, functional regions in genome sequences exhibit much lower mutation rates than non-functional regions; because our test detects variation in the mutation rate, it may be used to distinguish functional regions from non-functional ones. Another application is determining whether two highly similar, and thus evolutionarily related, genome segments are the result of a single copy event or of a complex series of copy events. This is a particular issue in evolutionary studies of genome regions rich in repeat segments (especially tandemly repeated segments).
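The abstract does not give the likelihood computation itself. As a much simpler, hypothetical illustration of the same kind of Bayesian model comparison, the Python sketch below compares a one-coin model against a two-coin model with exactly one switch point. This is a swapped-in simplification, not the authors' random-walk n-coin model or their O(l⁴ log l) dynamic program. Each coin's bias is integrated out under a uniform prior, which gives the closed form k! m! / (k + m + 1)! for a bit string with k ones and m zeros; the switch position is assumed to be uniform over the l - 1 possible cut points. All function names are illustrative.

```python
import math

def log_beta_marginal(ones, zeros):
    """log of the integral of p^ones * (1 - p)^zeros over p in [0, 1].

    With a uniform prior on p, this is the marginal likelihood of a fixed
    bit sequence containing `ones` ones and `zeros` zeros.
    """
    return (math.lgamma(ones + 1) + math.lgamma(zeros + 1)
            - math.lgamma(ones + zeros + 2))

def log_evidence_one_coin(bits):
    """Hypothetical n = 1 model: every bit comes from the same coin."""
    k = sum(bits)
    return log_beta_marginal(k, len(bits) - k)

def log_evidence_one_switch(bits):
    """Simplified two-coin model: one switch at t in 1..l-1 (needs l >= 2)."""
    l = len(bits)
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)  # prefix[t] = number of ones in bits[:t]
    terms = []
    for t in range(1, l):
        left_ones = prefix[t]
        right_ones = prefix[l] - prefix[t]
        terms.append(log_beta_marginal(left_ones, t - left_ones)
                     + log_beta_marginal(right_ones, (l - t) - right_ones))
    top = max(terms)  # log-sum-exp for numerical stability
    log_sum = top + math.log(sum(math.exp(x - top) for x in terms))
    return log_sum - math.log(l - 1)  # average over the l - 1 switch positions

blocky = [0] * 50 + [1] * 50
print(log_evidence_one_switch(blocky) > log_evidence_one_coin(blocky))  # True
```

On a string whose first half is all zeros and second half all ones, the one-switch model wins by a wide margin; on a perfectly alternating string the one-coin model wins, because the extra coin is penalized when it does not explain the data better.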

3.
4.
M de Zamaroczy, G Bernardi. Gene, 1992, 122(1): 91-99
The introns of three genes (oxi3, cob and 21S) from the mitochondrial (mt) genome of Saccharomyces cerevisiae contain closed reading frames (CRFs). In the present work, we analyzed the oligodeoxyribonucleotide (oligo; isostich) patterns of these sequences. We show that the relative amounts of di- to hexanucleotides, compared with random sequences of the same sizes and compositions, exhibit the same deviations as the intergenic noncoding sequences of the mt genome (except for the CRFs of the 21S intron). In contrast, intronic open reading frames (ORFs) showed oligo patterns that were generally quite distinct from those of CRFs, although similarities could be detected in some cases (especially for aI5 alpha). The mt introns of yeast therefore have a mosaic structure, in which CRFs derive from mt intergenic sequences, whereas ORFs have a different origin (indicated as exogenous by other evidence) yet, in some cases, show the effects of 'sequence assimilation' with CRFs.
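The comparison against random sequences of the same size and composition can be illustrated, for dinucleotides only, with the relative-abundance ratio rho(XY) = f(XY) / (f(X) f(Y)), which equals 1 when a dinucleotide occurs as often as base composition alone predicts. This is a minimal Python sketch that assumes a clean A/C/G/T sequence; it is not necessarily the isostich procedure used in the paper, which also covers tri- to hexanucleotides.

```python
import math
from collections import Counter
from itertools import product

def dinucleotide_relative_abundance(seq):
    """rho[XY] = f(XY) / (f(X) * f(Y)) for each of the 16 dinucleotides.

    Values near 1 match a random sequence of the same base composition;
    values far from 1 indicate over- or under-representation.
    Assumes seq contains only A, C, G, T.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        if fx == 0 or fy == 0:
            rho[x + y] = math.nan  # undefined when a base is absent
        else:
            rho[x + y] = (di[x + y] / n_di) / (fx * fy)
    return rho
```

For example, in the periodic sequence "ACGT" repeated 25 times, AC is strongly over-represented (rho about 4) while AA never occurs (rho = 0).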

5.
ORF organization and gene recognition in the yeast genome (cited 3 times: 0 self-citations, 3 by others)
Some rules of gene recognition and ORF organization in the Saccharomyces cerevisiae genome are demonstrated by statistical analyses of sequence data. This study includes: (a) the random frame rule: the six reading frames W1, W2, W3, C1, C2 and C3 in the double-stranded genome are randomly occupied by ORFs (related phenomena of ORF overlapping are also discussed); (b) the inhomogeneity rule: coding and non-coding ORFs differ in the inhomogeneity of base composition across the three codon positions. Using the inhomogeneity index (IHI), one can distinguish coding (IHI > 14) from non-coding (IHI …
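The random frame rule concerns how ORFs are distributed over the six reading frames. A hypothetical Python sketch for tallying ORFs per frame follows. It assumes that W1 to W3 are offsets 0 to 2 on the given (Watson) strand, that C1 to C3 are offsets 0 to 2 on its reverse complement, and that an ORF runs from ATG to the first in-frame stop codon with a minimum length counted in codons (stop included); the paper's exact ORF definition and frame labeling may differ.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orfs_in_frame(seq, offset, min_codons=100):
    """Yield (start, end) of ATG-initiated ORFs in one reading frame.

    Coordinates are relative to seq; end is exclusive and includes the stop.
    The first ATG after a stop opens an ORF; internal ATGs are ignored.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i + 3 - start) // 3 >= min_codons:
                yield (start, i + 3)
            start = None

def six_frame_orf_counts(seq, min_codons=100):
    """Number of ORFs in each of the assumed frames W1-W3 and C1-C3."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement (Crick strand)
    counts = {}
    for k in range(3):
        counts[f"W{k + 1}"] = sum(1 for _ in orfs_in_frame(seq, k, min_codons))
        counts[f"C{k + 1}"] = sum(1 for _ in orfs_in_frame(rc, k, min_codons))
    return counts
```

For a 15-nt toy sequence such as "ATGGCTGCTGCTTAA" with min_codons=5, only W1 is occupied.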

6.
7.
Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.

8.
9.
The Kaposi sarcoma associated herpesvirus (KSHV) genome encodes more than 85 open reading frames (ORFs). Serological evaluation of KSHV infection now generally relies on reactivity to just one latent and/or one lytic protein (commonly ORF73 and K8.1). The antigenic profiles of most other virus-encoded polypeptides are unknown. We systematically expressed and purified the products of 72 KSHV ORFs in recombinant systems and analyzed seroreactivity in US patients with KSHV-associated malignancies and in US blood donors (a population with low KSHV seroprevalence). We identified several KSHV proteins (ORF38, ORF61, ORF59 and K5) that elicited significant responses in individuals with KSHV-associated diseases. In these patients, patterns of reactivity were heterogeneous; however, HIV infection appeared to be associated with broader and more intense serological responses. Improved antigenic characterization of additional ORFs may increase the sensitivity of serologic assays, speed progress in understanding immune responses to KSHV, and improve understanding of the natural history of KSHV infection. To this end, we developed a bead-based multiplex assay that detects antibodies to six KSHV antigens.

10.
Null‐model analysis of co‐occurrence patterns is a powerful tool to identify ‘structure’ in community ecology data sets. We evaluated the community structure of chameleons in rainforest regions of Nigeria and Cameroon using data available in the literature, including peer‐reviewed articles and unpublished environmental reports prepared for industry. We performed Monte Carlo simulations (5000 iterations, using the sequential swap algorithm) under several model assumptions to derive co‐occurrence patterns among species. Food and spatial (habitat) segregation patterns were investigated in both lowland rainforest and montane forest. We subjected four indices of co‐occurrence (C‐ratio, number of checkerboard species pairs, number of species combinations, and V‐score) to randomization procedures. Overall, the chameleon communities are not randomly organized but instead exhibit precise deterministic patterns. In lowland rainforest, chameleon communities are assembled deterministically along the food niche resource axis, but not along the habitat niche resource axis; the opposite holds for chameleon communities in montane rainforest. We predict that these patterns can be generalized to other regions of tropical Africa, helping to determine the general structure of chameleon communities in tropical African forests.
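Two of the four indices named above, the number of checkerboard species pairs and the number of species combinations, are simple to compute on a species-by-site presence/absence matrix, and the sequential swap algorithm randomizes such a matrix while preserving every row and column total. The Python sketch below is illustrative only: the function names, the one-swap-per-iteration chain, and the one-sided tail probability are assumptions, not the authors' settings.

```python
import random
from itertools import combinations

def checkerboard_pairs(m):
    """Species pairs (rows) that never share a site, each present somewhere."""
    count = 0
    for a, b in combinations(m, 2):
        if any(a) and any(b) and not any(x and y for x, y in zip(a, b)):
            count += 1
    return count

def species_combinations(m):
    """Number of distinct species assemblages (unique site columns)."""
    return len(set(zip(*m)))

def sequential_swap(m, n_swaps, rng, max_tries=100_000):
    """Randomize m in place while preserving all row and column totals."""
    rows, cols = len(m), len(m[0])
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        r1, r2 = rng.sample(range(rows), 2)
        c1, c2 = rng.sample(range(cols), 2)
        a, b = m[r1][c1], m[r1][c2]
        c, d = m[r2][c1], m[r2][c2]
        if a == d and b == c and a != b:  # [[1,0],[0,1]] or [[0,1],[1,0]]
            m[r1][c1], m[r1][c2] = b, a   # flip the 2x2 checkerboard
            m[r2][c1], m[r2][c2] = d, c
            done += 1
    return m

def null_model_p(m, index, iterations=5000, seed=0):
    """Observed index and fraction of null matrices scoring at least as high."""
    rng = random.Random(seed)
    observed = index(m)
    null = [row[:] for row in m]  # leave the observed matrix untouched
    hits = 0
    for _ in range(iterations):
        sequential_swap(null, 1, rng)
        if index(null) >= observed:
            hits += 1
    return observed, hits / iterations
```

Because successive matrices in a sequential swap chain are correlated, a real analysis would typically add burn-in swaps before sampling; the sketch omits this for brevity.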

11.
The protozoans Trypanosoma cruzi, Trypanosoma brucei and Leishmania major (Tritryps) are evolutionarily ancient eukaryotes that cause human parasitic diseases worldwide. They present unique biological features; for instance, canonical DNA/RNA cis-acting elements remain mostly elusive. Repetitive sequences, originally considered selfish DNA, have lately been recognized as potentially important functional sequence elements in cell biology. In particular, dinucleotide patterns have been related to genome compartmentalization, gene evolution and regulation of gene expression. Here we perform a comparative analysis of the occurrence, length and location of dinucleotide repeats (DRs) in the Tritryp genomes and their putative associations with known biological processes. We observe that most types of DRs are more abundant than expected by chance. Complementary DRs usually display an asymmetrical strand distribution, with TT and GT repeats favored on the coding strands. In addition, GT repeats are among the longest DRs in all three genomes. We also show that specific DRs are non-uniformly distributed along the polycistronic unit, decreasing toward its boundaries. Distinctive non-uniform density patterns were also found in the intergenic regions, predominantly in the vicinity of the ORFs. These findings further support the hypothesis that DRs may help control genome structure and gene expression.
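A minimal Python sketch of dinucleotide-repeat detection on one strand, assuming a DR is a tandem run (XY)n of at least a chosen number of units (here 4, a hypothetical threshold). Comparing a motif with its reverse complement on the same strand (for example GT with AC) gives the kind of strand asymmetry the abstract describes. Each motif is searched independently, so a single run can be reported under two phases (GT and TG); the paper's exact counting rules are not stated, and the function names are illustrative.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def dinucleotide_repeats(seq, min_units=4):
    """Return {dinucleotide: [(start, n_units), ...]} for tandem (XY)n runs.

    Each dinucleotide is searched independently, so a run such as GTGTGTGT
    is reported both as a GT run and as a (shorter) TG run.
    """
    seq = seq.upper()
    found = {}
    for x, y in product("ACGT", repeat=2):
        motif = x + y
        pattern = re.compile(f"(?:{motif}){{{min_units},}}")  # e.g. (?:GT){4,}
        hits = [(m.start(), len(m.group()) // 2) for m in pattern.finditer(seq)]
        if hits:
            found[motif] = hits
    return found

def strand_pair_counts(repeats, motif):
    """Runs of motif vs. runs of its reverse complement on the same strand."""
    partner = motif.translate(COMPLEMENT)[::-1]  # GT pairs with AC
    return len(repeats.get(motif, [])), len(repeats.get(partner, []))
```

On a coding strand, a GT count well above the AC count would correspond to the GT-favoring asymmetry reported above.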

12.
The small size of RNA virus genomes (2-to-32 kb) has been attributed to high mutation rates during replication, which is thought to lack proof-reading. This paradigm is being revisited owing to the discovery of a 3′-to-5′ exoribonuclease (ExoN) in nidoviruses, a monophyletic group of positive-stranded RNA viruses with a conserved genome architecture. ExoN, a homolog of canonical DNA proof-reading enzymes, is exclusively encoded by nidoviruses with genomes larger than 20 kb. All other known non-segmented RNA viruses have smaller genomes. Here we use evolutionary analyses to show that the two- to three-fold expansion of the nidovirus genome was accompanied by a large number of replacements in conserved proteins at a scale comparable to that in the Tree of Life. To unravel common evolutionary patterns in such genetically diverse viruses, we established the relation between genomic regions in nidoviruses in a sequence alignment-free manner. We exploited the conservation of the genome architecture to partition each genome into five non-overlapping regions: 5′ untranslated region (UTR), open reading frame (ORF) 1a, ORF1b, 3′ORFs (encompassing the 3′-proximal ORFs), and 3′ UTR. Each region was analyzed for its contribution to genome size change under different models. The non-linear model statistically outperformed the linear one and captured >92% of data variation. Accordingly, nidovirus genomes were concluded to have reached different points on an expansion trajectory dominated by consecutive increases of ORF1b, ORF1a, and 3′ORFs. Our findings indicate a unidirectional hierarchical relation between these genome regions, which are distinguished by their expression mechanism. In contrast, these regions cooperate bi-directionally on a functional level in the virus life cycle, in which they predominantly control genome replication, genome expression, and virus dissemination, respectively. 
Collectively, our findings suggest that genome architecture and the associated region-specific division of labor leave a footprint on genome expansion and may limit RNA genome size.

13.
14.
15.
16.
Exhaustive identification of open reading frames in complete genome sequences is a difficult task, and important genes may be missed. By reanalyzing the intergenic regions of Mycoplasma genitalium and Mycoplasma pneumoniae, we identified a number of new open reading frames (ORFs) in both species. The most significant finding was a ribonuclease H (RNase H) enzyme in both species, which had not previously been identified and had been assumed to be absent. In this paper we discuss the biological importance of RNase H and its evolutionary implications. We also stress the usefulness of our method for identifying new ORFs by reanalyzing the intergenic regions between existing ORFs in complete genome sequences.
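The reanalysis strategy, scanning the gaps between annotated genes for previously missed ORFs, can be sketched as follows. This hypothetical Python version assumes 0-based half-open gene coordinates on a linear sequence, ATG starts only, and an arbitrary minimum length. Because Mycoplasma species read TGA as tryptophan rather than as a stop, the stop-codon set is a parameter; alternative start codons are not modeled.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
CODE4_STOPS = frozenset({"TAA", "TAG"})  # Mycoplasma code: TGA reads as Trp
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def intergenic_regions(genome_length, genes):
    """Gaps between annotated genes; genes are 0-based half-open (start, end)."""
    gaps, cursor = [], 0
    for start, end in sorted(genes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping annotations
    if cursor < genome_length:
        gaps.append((cursor, genome_length))
    return gaps

def forward_orfs(seq, min_codons, stops):
    """ATG-to-stop ORFs in all three frames of seq, as half-open (start, end)."""
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if (i + 3 - start) // 3 >= min_codons:
                    hits.append((start, i + 3))
                start = None
    return hits

def intergenic_orfs(genome, genes, min_codons=50, stops=STANDARD_STOPS):
    """Candidate ORFs lying entirely inside intergenic regions, both strands."""
    genome = genome.upper()
    found = []
    for g_start, g_end in intergenic_regions(len(genome), genes):
        region = genome[g_start:g_end]
        n = len(region)
        for s, e in forward_orfs(region, min_codons, stops):
            found.append((g_start + s, g_start + e, "+"))
        rc = region.translate(COMPLEMENT)[::-1]
        for s, e in forward_orfs(rc, min_codons, stops):
            # rc[s:e] corresponds to region[n - e:n - s] on the forward strand
            found.append((g_start + n - e, g_start + n - s, "-"))
    return sorted(found)
```

For a Mycoplasma genome one would call `intergenic_orfs(genome, genes, stops=CODE4_STOPS)`; candidates found this way would still need homology or expression evidence, as with the RNase H gene described above.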

17.
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
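The two quantities behind these statistics, overlap length and the relative frame of the overlapping ORFs, can be computed from annotated coordinates. In this hypothetical Python sketch, coordinates are 0-based and half-open on the forward strand; same-strand shifts (+0, +1, +2) are measured in the reading direction relative to the first ORF, and for antisense pairs "-0" means the codon boundaries of the two ORFs coincide. Published conventions for labeling +1 versus +2 and -1 versus -2 differ, so the mapping used here is an assumption.

```python
def overlap_length(a, b):
    """Nucleotides shared by half-open intervals a and b, each (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def codon_phase(interval, strand):
    """Forward-strand codon-boundary phase (coordinate mod 3) of an ORF."""
    start, end = interval
    return start % 3 if strand == "+" else end % 3

def overlap_frame(a, strand_a, b, strand_b):
    """Frame of ORF b relative to ORF a, e.g. '+0', '+2' or '-0'."""
    pa, pb = codon_phase(a, strand_a), codon_phase(b, strand_b)
    if strand_a == strand_b == "+":
        return f"+{(pb - pa) % 3}"
    if strand_a == strand_b == "-":
        # minus-strand ORFs are read toward lower coordinates
        return f"+{(pa - pb) % 3}"
    # antisense pair: '-0' when the codon boundaries of a and b coincide
    return f"-{(pb - pa) % 3}"
```

For instance, a plus-strand ORF at (0, 90) and a minus-strand ORF at (30, 120) overlap by 60 nt with coinciding codon boundaries, which this convention labels "-0".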

18.
Fire A, Alcazar R, Tan F. Genetics, 2006, 173(3): 1259-1273
We describe a surprising long-range periodicity that underlies a substantial fraction of C. elegans genomic sequence. Extended segments (up to several hundred nucleotides) of the C. elegans genome show a strong bias toward occurrence of AA/TT dinucleotides along one face of the helix while little or no such constraint is evident on the opposite helical face. Segments with this characteristic periodicity are highly overrepresented in intron sequences and are associated with a large fraction of genes with known germline expression in C. elegans. In addition to altering the path and flexibility of DNA in vitro, sequences of this character have been shown by others to constrain DNA::nucleosome interactions, potentially producing a structure that could resist the assembly of highly ordered (phased) nucleosome arrays that have been proposed as a precursor to heterochromatin. We propose a number of ways that the periodic occurrence of An/Tn clusters could reflect evolution and function of genes that express in the germ cell lineage of C. elegans.
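A helical-face bias of this kind shows up as an excess of AA/TT dinucleotides spaced at multiples of roughly 10 bp (about one turn of B-DNA). The Python sketch below tallies pairwise distances between AA/TT dinucleotide positions; it is an illustrative assumption, not the authors' analysis pipeline, and a real analysis would compare the counts against a composition-matched background.

```python
def aatt_positions(seq):
    """0-based start positions of every AA or TT dinucleotide (overlaps allowed)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("AA", "TT")]

def spacing_histogram(seq, max_lag=30):
    """counts[d] = number of AA/TT pairs whose start positions differ by d."""
    pos = aatt_positions(seq)
    counts = [0] * (max_lag + 1)
    for idx, p in enumerate(pos):
        for q in pos[idx + 1:]:
            d = q - p
            if d > max_lag:
                break  # positions are sorted, so later ones are farther away
            counts[d] += 1
    return counts
```

On a toy sequence with one AA every 10 bp ("AAGCGCGCGC" repeated 20 times), the histogram has nonzero counts only at lags 10, 20 and 30.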

19.
In a study of the genome of egg drop syndrome virus (EDSV-76) Chinese strain AAV-2, part of the restriction endonuclease physical map was analyzed and a complete genomic library was constructed. On this basis, the complete genome nucleotide sequence (32,838 bp, including terminal structures) was determined. The analysis shows that, compared with other adenoviruses, strain AAV-2 differs considerably in genomic structure and in the distribution of open reading frames (ORFs). There are no clear E1, E3 and E4 regions in the AAV-2 genome. Two segments located at the two ends of the genome (1.1 kb and 8.3 kb in length, respectively) have no homology with other adenovirus genomes. In addition, the AAV-2 genome lacks the ORFs encoding E1A, pV and pIX, which are common adenovirus ORFs encoding early and late proteins. This reveals, for the first time, differences between EDSA-76, the sole standard strain of group III avian adenoviruses, and the other avian adenoviruses. It will aid research on avian adenoviruses

20.
In a study of the genome of egg drop syndrome virus (EDSV-76) Chinese strain AAV-2, part of the restriction endonuclease physical map was analyzed and a complete genomic library was constructed. On this basis, the complete genome nucleotide sequence (32,838 bp, including terminal structures) was determined. The analysis shows that, compared with other adenoviruses, strain AAV-2 differs considerably in genomic structure and in the distribution of open reading frames (ORFs). There are no clear E1, E3 and E4 regions in the AAV-2 genome. Two segments located at the two ends of the genome (1.1 kb and 8.3 kb in length, respectively) have no homology with other adenovirus genomes. In addition, the AAV-2 genome lacks the ORFs encoding E1A, pV and pIX, which are common adenovirus ORFs encoding early and late proteins. This reveals, for the first time, differences between EDSA-76, the sole standard strain of group III avian adenoviruses, and the other avian adenoviruses. It will aid research on avian adenoviruses and on adenoviruses in general.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417