Similar Articles
20 similar articles found
1.
We previously reported two graph algorithms for the analysis of genomic information: a graph comparison algorithm that detects locally similar regions, called correlated clusters, and an algorithm that finds a graph feature called P-quasi complete linkage. Based on these algorithms, we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, graph comparison is applied to pairwise genome comparisons, where each genome is treated as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarity are identified. In the next step, P-quasi complete linkage analysis is applied to group related clusters, and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established within each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter of them contained at least two genes that appear in the metabolic and regulatory pathways of the KEGG database. This collection of conserved gene clusters is used to refine and augment the ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.
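A minimal sketch of the first step follows. It assumes that each genome is reduced to gene positions along a line and that homologous gene pairs (i in genome A, j in genome B) are already known, for example from sequence-similarity hits. Pairs that lie close together in both genomes are grouped by single linkage. The function name correlated_clusters and its gap and min_size parameters are hypothetical. This grouping is a simplification, not the authors' published graph comparison algorithm.

```python
from itertools import combinations


def correlated_clusters(pairs, gap=2, min_size=2):
    """Group homologous gene pairs (i, j) into clusters that are local in both genomes.

    pairs: list of (i, j) tuples, where i is a gene index in genome A and j in genome B.
    Two pairs are linked when both indices differ by at most `gap` (hypothetical
    parameter); clusters are connected components with at least `min_size` pairs.
    Linear (non-circular) gene order is assumed.
    """
    parent = list(range(len(pairs)))

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(pairs)), 2):
        (i1, j1), (i2, j2) = pairs[a], pairs[b]
        if abs(i1 - i2) <= gap and abs(j1 - j2) <= gap:
            parent[find(a)] = find(b)

    groups = {}
    for k, pair in enumerate(pairs):
        groups.setdefault(find(k), []).append(pair)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


pairs = [(0, 10), (1, 11), (2, 12), (50, 3), (100, 200), (101, 202)]
print(sorted(correlated_clusters(pairs)))
```

For a circular chromosome, the index differences would instead need to be taken modulo the number of genes.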

2.
Analyses of 55 individual and 31 concatenated protein data sets encoded in Reclinomonas americana and Marchantia polymorpha mitochondrial genomes revealed that current methods for constructing phylogenetic trees are insufficiently sensitive (or artifact-insensitive) to ascertain the sister of mitochondria among the current sample of eight alpha-proteobacterial genomes using mitochondrially encoded proteins. However, Rhodospirillum rubrum came as close to mitochondria as any alpha-proteobacterium investigated. This prompted a search for methods to compare eukaryotic genomes directly with their prokaryotic counterparts, in order to investigate the origin of the mitochondrion and its host from the standpoint of nuclear genes. We examined pairwise amino acid sequence identity in comparisons of 6,214 nuclear protein-coding genes from Saccharomyces cerevisiae with 177,117 proteins encoded in the sequenced genomes of 45 eubacteria and 15 archaebacteria. The results reveal that approximately 75% of yeast genes that have homologues in the present prokaryotic sample share greater amino acid sequence identity with eubacterial than with archaebacterial homologues. In high-stringency comparisons, only the eubacterial component of the yeast genome is detectable. Our findings indicate that, at the levels of overall amino acid sequence identity and gene content, yeast shares a sister-group relationship with eubacteria, not with archaebacteria, in contrast to the current phylogenetic paradigm based on ribosomal RNA. Among eubacteria and archaebacteria, proteobacterial and methanogen genomes, respectively, shared more similarity with the yeast genome than the other prokaryotic genomes surveyed.
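The sketch below makes the central comparison concrete. It computes percent identity over the gap-free columns of a pairwise alignment. It then finds the fraction of genes whose best eubacterial hit is more identical than their best archaebacterial hit. Alignments and best-hit identities are assumed to be available already. The identity definition, the tie handling (ties are skipped) and the min_identity stringency cutoff are illustrative assumptions, not the study's exact protocol.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def fraction_closer_to_eubacteria(best_hits, min_identity=0.0):
    """Fraction of genes whose best eubacterial identity exceeds their best archaebacterial one.

    best_hits maps gene name -> (best eubacterial identity or None,
                                  best archaebacterial identity or None).
    Hits below min_identity are discarded (an assumed stand-in for a stringency
    cutoff); genes with no remaining hits, and exact ties, are not counted.
    """
    eubacterial = archaebacterial = 0
    for eu, ar in best_hits.values():
        eu = eu if eu is not None and eu >= min_identity else None
        ar = ar if ar is not None and ar >= min_identity else None
        if eu is None and ar is None:
            continue  # no homologue passes the cutoff
        if ar is None or (eu is not None and eu > ar):
            eubacterial += 1
        elif eu is None or ar > eu:
            archaebacterial += 1
    counted = eubacterial + archaebacterial
    return eubacterial / counted if counted else 0.0
```

Raising min_identity mimics a higher-stringency comparison. With toy hits such as {"g1": (40.0, 30.0), "g2": (None, 35.0), "g3": (50.0, None)}, the fraction is 2/3 with no cutoff and 1.0 with a cutoff of 45.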

3.

Background

Reconstruction of the evolutionary history of bacteriophages is a difficult problem because of fast sequence drift and the lack of omnipresent genes in phage genomes. Moreover, gene losses and recombinational exchanges are so pervasive in phages that the plausibility of phylogenetic inference in the phage kingdom has been questioned.

Results

We compiled the profiles of presence and absence of 803 orthologous genes in 158 completely sequenced phages with double-stranded DNA genomes and used these gene content vectors to infer the evolutionary history of phages. There were 18 well-supported clades, mostly corresponding to accepted genera, but in some cases appearing to define new taxonomic groups. Conflicts between this phylogeny and trees constructed from sequence alignments of phage proteins were exploited to infer 294 specific acts of intergenome gene transfer.

Conclusion

The notoriously reticulate evolutionary history of fast-evolving phages can be reconstructed in considerable detail by quantitative comparative genomics.

Open peer review

This article was reviewed by Eugene Koonin, Nicholas Galtier and Martijn Huynen.
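Gene content vectors like these can be compared directly. The minimal sketch below assumes 0/1 presence/absence profiles over a shared list of orthologous genes. It uses Jaccard distance, which is an assumed choice because the abstract does not name the distance the authors used. The helper names are hypothetical.

```python
from itertools import combinations


def gene_content_distance(profile_a, profile_b):
    """Jaccard distance between two presence/absence vectors (1 = ortholog present).

    Jaccard is an assumed choice here; genes absent from both genomes are ignored.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same orthologous genes")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return 1.0 - shared / either if either else 0.0


def distance_matrix(profiles):
    """Pairwise gene-content distances keyed by (genome, genome) name pairs."""
    return {(x, y): gene_content_distance(profiles[x], profiles[y])
            for x, y in combinations(sorted(profiles), 2)}


profiles = {"phageA": [1, 1, 1, 0], "phageB": [1, 1, 0, 0], "phageC": [0, 0, 1, 1]}
print(distance_matrix(profiles))
```

A matrix like this could then be passed to a distance-based tree method such as neighbor joining.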

4.
Clustering of main orthologs for multiple genomes (times cited: 1; self-citations: 0; citations by others: 1)
The identification of orthologous genes shared by multiple genomes is critical for both functional and evolutionary studies in comparative genomics. In practice this is usually done by sequence similarity search and reconciled tree construction. Recently, however, Fu et al. proposed a new combinatorial approach and high-throughput system, MSOAR, for ortholog identification between closely related genomes based on genome rearrangement and gene duplication. MSOAR assumes that orthologous genes correspond to each other in the most parsimonious evolutionary scenario, minimizing the number of genome rearrangement and (postspeciation) gene duplication events. However, the parsimony approach used by MSOAR limits it to pairwise genome comparisons. In this paper, we extend MSOAR to multiple (closely related) genomes and propose an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes. As a preliminary experiment, we apply MultiMSOAR to the rat, mouse, and human genomes, and validate our results using gene annotations and gene function classifications in public databases. We further compare our results with the ortholog clusters predicted by MultiParanoid, an extension of the well-known program InParanoid for pairwise genome comparisons. The comparison reveals that MultiMSOAR gives more detailed and accurate orthology information, since it can effectively distinguish main orthologs from inparalogs.

5.

Background  

Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded by the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping to find homologous relationships that would not be distinguishable from noise in simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins.
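The ORF screen described above can be sketched as a six-frame scan. The sketch assumes that an ORF runs from an ATG start codon to the first in-frame stop codon. It also assumes that the 30-codon threshold counts the start codon but not the stop codon. The abstract does not spell out these conventions, and find_orfs is a hypothetical helper.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, start, end) for ATG-initiated ORFs with >= min_codons codons.

    Assumed conventions: the codon count includes the start codon but not the stop;
    coordinates are 0-based, end-exclusive, include the stop codon, and refer to the
    strand's own sequence (the reverse complement for the '-' strand).
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF at the first in-frame ATG
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3))
                    start = None  # close it at the first in-frame stop
    return results


print(find_orfs("ATG" + "GCT" * 30 + "TAA"))
```

The translated ORFs could then be fed to a COG-style clustering step. That step is not shown here.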

6.
“Phylogenetic profiling” is based on the hypothesis that, during evolution, functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step. This largely explains why previous work in this area has focused mainly on presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses the genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity (from 30% to 100%), and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by conventional presence–absence profile comparisons, and it does not require any a priori fixed criteria to define orthologous genes.
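Below is a minimal sketch of comparing two copy-number profiles with a normalised Euclidean distance. The normalisation is an assumption, because the abstract does not give the authors' formula. Each profile is scaled by its maximum, and the distance is then divided by the square root of the number of genomes so that it falls in [0, 1].

```python
import math


def normalise(profile):
    """Scale copy numbers to [0, 1] by the profile's maximum (all-zero profiles stay zero)."""
    top = max(profile)
    return [c / top for c in profile] if top else [0.0] * len(profile)


def normalised_euclidean(p, q):
    """Euclidean distance between normalised copy-number profiles, divided by sqrt(n).

    p and q hold the copy number of a domain superfamily in each of n genomes.
    The normalisation scheme is an assumption made for this sketch.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    a, b = normalise(p), normalise(q)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Proportional copy numbers give distance 0; opposite patterns give 1.
print(normalised_euclidean([2, 4, 0, 8], [1, 2, 0, 4]))
print(normalised_euclidean([1, 0], [0, 1]))
```

Suppose profiles were built at each of the ten identity levels. The distance for a pair of families could then be computed at every level, which is one way to look for the level at which their profiles "auto-tune". This is an interpretation, not the published procedure.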

7.
An increasing number of complete sequences of mitochondrial (mt) genomes provides the opportunity to optimise the choice of molecular markers for phylogenetic and ecological studies. This is particularly the case where mt genomes from closely related taxa have been sequenced, e.g., within Schistosoma. These blood flukes include species that are the causative agents of schistosomiasis, where there has been a need to optimise markers for species and strain recognition. For many phylogenetic and population genetic studies, the choice of nucleotide sequences depends primarily on suitable PCR primers. Complete mt genomes allow individual genes or other mt markers to be assessed relative to one another for potential information content before broad-scale sampling. We assess the phylogenetic utility of individual genes and identify the regions that contain the greatest interspecific variation for molecular ecological and diagnostic markers. We show that variable characters are not randomly distributed along the genome and that there is a positive correlation between polymorphism and divergence. The mt genomes of African and Asian schistosomes were compared with the available intraspecific dataset of Schistosoma mansoni through sliding window analyses, in order to assess whether the observed polymorphism was at the level predicted from interspecific comparisons. We found a positive correlation except for the two genes (cox1 and nad1) adjoining the putative control region in S. mansoni. The genes nad1, nad4, nad5, cox1 and cox3 resolved phylogenies that were consistent with a benchmark phylogeny, and in general, longer genes performed better in phylogenetic reconstruction. Considering the information content of entire mt genome sequences, partial cox1 would not be the ideal marker for either species identification (barcoding) or population studies with Schistosoma species. Instead, we suggest the use of cox3 and nad5 for both phylogenetic and population studies.
Five primer pairs designed against Schistosoma mekongi and Schistosoma malayensis were tested successfully against Schistosoma japonicum. In combination, these fragments encompass 20-27% of the variation among the genomes (average total length approximately 14,000 bp), thus providing an efficient means of capturing the greatest amount of variation within the shortest sequence. Comparative mitogenomics provides the basis of a rational approach to molecular marker selection and optimisation.
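The sliding window analysis mentioned above can be sketched as follows, assuming an existing multiple alignment of equal-length sequences. The sketch reports the fraction of variable columns in each window. Treating gapped columns as invariant, and the default window and step sizes, are illustrative assumptions, and the function name is hypothetical.

```python
def sliding_window_variability(alignment, window=300, step=50):
    """Fraction of variable columns per window across equal-length aligned sequences.

    A column is 'variable' if it holds more than one base; columns containing a gap
    ('-') are counted as non-variable here (a simplifying assumption).
    Returns a list of (window_start, fraction_variable) tuples.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    variable = []
    for column in zip(*alignment):
        bases = set(column)
        variable.append("-" not in bases and len(bases) > 1)
    return [(start, sum(variable[start:start + window]) / window)
            for start in range(0, length - window + 1, step)]


aln = ["AAAACCCC", "AAAACCGG", "AAAACCGC"]
print(sliding_window_variability(aln, window=4, step=4))
```

The same per-window values could be computed once for interspecific and once for intraspecific alignments and then correlated, which mirrors the comparison the abstract describes.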

8.
Meyer TE, Bansal AK. Biochemistry, 2005, 44(34): 11458-11465
Based largely upon analysis of ribosomal RNA, a third domain of life, called archaea, had been proposed in addition to bacteria and eukaryotes. However, quantitative analysis of 73 whole genomes shows only a two-domain division of life: eukaryotes and prokaryotes. Thousands of orthologous genes in archaea and bacteria show an essentially unimodal distribution of sequence identities. Thus, whole-genome analyses indicate that archaea are a phylum of bacteria rather than a separate domain of life. In contrast, archaeal rRNA and that of hyperthermophilic bacteria differ from the rRNA of mesophilic bacteria. Thus, there is a bimodal distribution of rRNA sequence identities, with the two modes differing by 12%. This discrepancy between rRNA-based and gene-content-based analyses of whole genomes is likely due to a 15% elevated C:G content in the rRNA of archaea and hyperthermophilic bacteria. The elevated C:G content is consistent with stabilization against thermal denaturation through the additional hydrogen bonding in C:G pairs (3 bonds) compared with A:U pairs (2 bonds). Based upon this premise, there is no reliable way to correct rRNA for such differences in base composition, and it is not possible to quantitatively compare hyperthermophiles with mesophiles by the rRNA method. Furthermore, quantitative study of whole genomes shows that the extent of change in both bacterial and archaeal genes, including rRNA, has reached a limit. Thus, direct sequence comparisons work for closely related genomes, but it is not possible to differentiate the most divergent prokaryotic species, which are currently designated as separate phyla. We believe that the differences in characteristics of archaeal species are based primarily upon selection of genes and pathways compatible with an extreme environmental lifestyle, i.e., hyperthermophily.

9.
Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes is measured for word lengths of two to ten letters. In a scale-dependent way, the Shannon information in complete genomes is found to be much greater than that in matching random sequences, by a factor of thousands in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belongs to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduce the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral, which suggests independent corroboration of Kimura's neutral theory of evolution.
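Here is a sketch of measuring k-mer Shannon information and building a matching random sequence. It assumes sequences over A, C, G and T only. It also defines information as the shortfall of the k-mer entropy from its 2k-bit maximum. That definition and the helper names are assumptions, not the paper's stated formula.

```python
import math
import random
from collections import Counter


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (bits) of the overlapping k-mer frequency distribution of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shannon_information(seq: str, k: int) -> float:
    """Assumed definition: entropy shortfall from the 4-letter maximum of 2k bits."""
    return 2 * k - kmer_entropy(seq, k)


def shuffled_copy(seq: str, seed: int = 0) -> str:
    """A matching random sequence: same base composition, order shuffled."""
    letters = list(seq)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


genome = "ACGT" * 100
print(shannon_information(genome, 1), shannon_information(genome, 2))
```

Comparing shannon_information(genome, k) with the value for shuffled_copy(genome) over k = 2..10 would give the kind of genome-versus-random contrast the abstract reports.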

10.
We present a prototype of a new database tool, GeneCensus, which focuses on comparing genomes globally, in terms of the collective properties of many genes, rather than in terms of the attributes of a single gene (e.g. sequence similarity for a particular ortholog). The comparisons are presented visually over the web at GeneCensus.org. The system concentrates on two types of comparisons: (i) trees based on the sharing of generalized protein families between genomes, and (ii) whole-pathway analysis in terms of activity levels. For the trees, we have developed a module (TreeViewer) that clusters genomes in terms of the folds, superfamilies or orthologs they share (all of which can be considered generalized ‘families’ or ‘protein parts’). It compares the resulting trees side by side with those built from sequence similarity of individual genes (e.g. a traditional tree built on ribosomal similarity). We also include comparisons to trees built on whole-genome dinucleotide or codon composition. For pathway comparisons, we have implemented a module (PathwayPainter) that graphically depicts, in selected metabolic pathways, the fluxes or expression levels of the associated enzymes (i.e. generalized ‘activities’). Consequently, one can compare organisms (and organism states) in terms of representations of these systemic quantities. Development of this module involved compiling, calculating and standardizing flux and expression information from many different sources. We illustrate pathway analysis for enzymes involved in central metabolism. We are able to show that, to some degree, flux and expression fluctuations have characteristic values in different sections of central metabolism. Control points in this system (e.g. hexokinase, pyruvate kinase, phosphofructokinase, isocitrate dehydrogenase and citrate synthase) tend to be especially variable in flux and expression.
Both the TreeViewer and PathwayPainter modules connect to other information sources related to individual-gene or organism properties (e.g. a single-gene structural annotation viewer).

11.
Genomics, 2019, 111(6): 1590-1603
Genomes are not random sequences, because natural selection has injected information into biological sequences for billions of years. Inspired by this idea, we developed a simple method to compare genomes by considering nucleotide counts in subsequences (blocks) instead of their exact sequences. We introduce the Block Alignment method for comparing two genomes and, based on this comparison method, define a similarity score and a distance. The presented model ignores nucleotide order in the sequence. On the other hand, because this block comparison method excludes point mutations and small size variations, there is no need for high-coverage sequencing, which is responsible for the high costs of data production and storage. Moreover, the sequence comparisons can be performed at higher speed. Phylogenetic trees of two sets of bacterial genomes were constructed, and the results were in full agreement with their previously constructed phylogenetic trees. Furthermore, a weighted and directed similarity network of each set of bacterial genomes was inferred ab initio by this model. Remarkably, the communities of these networks agree with the clades of the corresponding phylogenetic trees, which means that these similarity networks also contain phylogenetic information about the genomes. Moreover, the block comparison method was used to distinguish rob(15;21)c-associated iAMP21 from sporadic iAMP21 rearrangements in subgroups of chromosome 21 in acute lymphoblastic leukemia. Our results show a meaningful difference between the numbers of contigs that mapped to chromosomes 15 and 21 in these cases. Furthermore, the presented block alignment model can select candidate blocks for more accurate analysis, and it can find conserved blocks across a set of genomes.
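A minimal sketch of the block idea follows. Each genome is cut into fixed-size blocks, and only the nucleotide counts of each block are kept. The sketch then scores how many of one genome's count profiles also occur in the other genome. The block size and the block_similarity score are hypothetical, because the abstract does not define the authors' score. A score of this form is directional, as their similarity networks are.

```python
from collections import Counter


def block_profiles(genome: str, block_size: int = 100):
    """(A, C, G, T) count tuples for consecutive non-overlapping blocks.

    Nucleotide order inside a block is deliberately ignored; a trailing partial
    block is dropped.
    """
    genome = genome.upper()
    return [tuple(genome[s:s + block_size].count(b) for b in "ACGT")
            for s in range(0, len(genome) - block_size + 1, block_size)]


def block_similarity(genome_a: str, genome_b: str, block_size: int = 100) -> float:
    """Hypothetical directed score: share of A's blocks whose count tuple also occurs in B.

    Matching is multiset-based, so each block of B can be used at most once.
    """
    a = Counter(block_profiles(genome_a, block_size))
    b = Counter(block_profiles(genome_b, block_size))
    total = sum(a.values())
    if total == 0:
        return 0.0
    return sum(min(n, b[key]) for key, n in a.items()) / total


# "AACC" and "CACA" differ in order but have the same counts, so they match.
print(block_similarity("AACC", "CACAGGGG", block_size=4))
print(block_similarity("CACAGGGG", "AACC", block_size=4))
```

The two calls give different values (1.0 and 0.5), which shows why such a score yields a directed network.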

12.
The genomes of pathogenic Haemophilus influenzae strains are larger than that of Rd KW20 (Rd), the nonpathogenic laboratory strain whose genome has been sequenced. To identify potential virulence genes, we examined genes possessed by Int1, an invasive nonencapsulated isolate from a meningitis patient, but absent from Rd. Int1 was found to have a novel gene termed lav, predicted to encode a member of the AIDA-I/VirG/PerT family of virulence-associated autotransporters (ATs). Associated with lav are multiple repeats of the tetranucleotide GCAA, implicated in translational phase variation of surface molecules. Laterally acquired by H. influenzae, lav is restricted in distribution to a few pathogenic strains, including H. influenzae biotype aegyptius and Brazilian purpuric fever isolates. The DNA sequence of lav is surprisingly similar to that of a gene previously described for Neisseria meningitidis. Sequence comparisons suggest that lav was transferred relatively recently from Haemophilus to Neisseria, shortly before the divergence of N. meningitidis and Neisseria gonorrhoeae. Segments of lav predicted to encode passenger and beta-domains differ sharply in G+C base content, supporting the idea that AT genes have evolved by fusing domains which originated in different genomes. Homology and base sequence comparisons suggest that a novel biotype aegyptius AT arose by swapping an unrelated sequence for the passenger domain of lav. The unusually mobile lav locus joins a growing list of genes transferred from H. influenzae to Neisseria. Frequent gene exchange suggests a common pool of hypervariable contingency genes and may help to explain the origin of invasiveness in certain respiratory pathogens.

13.
We report the complete 36,717 bp genome sequence of bacteriophage Mu and provide an analysis of the sequence. The analysis covers the new genes and other genetic features revealed by the sequence itself and by comparison with eight complete or nearly complete Mu-like prophage genomes found in the genomes of a diverse group of bacteria. The comparative studies confirm that members of the Mu-related family of phage genomes are genetically mosaic with respect to each other, as seen in other groups of phages such as the phage lambda-related group of phages of enteric hosts and the phage L5-related group of mycobacteriophages. Mu also possesses segments of similarity, typically gene-sized, to the genomes of otherwise non-Mu-like phages. The comparisons show that some well-known features of the Mu genome, including the invertible segment encoding tail fiber sequences, are not present in most members of the Mu genome sequence family examined here, suggesting that their presence may be relatively volatile over evolutionary time. The head- and tail-encoding structural genes of Mu have only very weak similarity to the corresponding genes of other well-studied phage types. However, these weak similarities, and in some cases biochemical data, can be used to establish tentative functional assignments for 12 of the head and tail genes. These assignments are strongly supported by the fact that the order of gene functions assigned in this way conforms to the strongly conserved order of head and tail genes established in a wide variety of other phages. We show that the Mu head assembly scaffolding protein is encoded by a gene nested in-frame within the C-terminal half of another gene that encodes the putative head maturation protease. This is reminiscent of the arrangement established for phage lambda.

14.
We are interested in quantifying the contribution of gene acquisition, loss, expansion and rearrangements to the evolution of microbial genomes. Here, we discuss factors influencing microbial genome divergence based on pairwise genome comparisons of closely related strains and species with different lifestyles. A particular focus is on intracellular pathogens and symbionts of the genera Rickettsia, Bartonella and Buchnera. Extensive gene loss and restricted access to phage and plasmid pools may explain why single-host pathogens are normally less successful than multihost pathogens. We note that species-specific genes tend to be shorter than orthologous genes, suggesting that a fraction of these may represent fossil ORFs, as also supported by multiple sequence alignments among species. The results of our genome comparisons are placed in the context of phylogenomic analyses of alpha and gamma proteobacteria. We highlight artefacts caused by different rates and patterns of mutation, suggesting that atypical phylogenetic placements cannot a priori be taken as evidence for horizontal gene transfer events. The flexibility in genome structure among free-living microbes contrasts with the extreme stability observed for the small genomes of aphid endosymbionts, in which no rearrangements or inflow of genetic material have occurred during the past 50 million years (1). Taken together, the results suggest that genomic stability correlates with the content of repeated sequences and mobile genetic elements, and thereby indirectly with bacterial lifestyles.

15.
Comparisons of Two Large Phaeoviral Genomes and Evolutionary Implications (times cited: 1; self-citations: 0; citations by others: 1)
The evolution of viral genomes has recently attracted considerable attention. We compare the sequences of two large viral genomes, EsV-1 and FirrV-1, belonging to the family of phaeoviruses, which infect different species of marine brown algae. Although their genomes differ substantially in size, these viruses share similar morphologies and similar latent infection cycles. In fact, sequence comparisons show that the viruses have more than 60% of their genes in common. However, the order of genes is completely different in the two genomes. This suggests that extensive recombinational events, in addition to several large deletions, occurred during the separate evolutionary routes from a common ancestor. We investigated genes encoding components of signal transduction pathways and genes encoding replicative functions in more detail. We found that the two genomes possess different, although overlapping, sets of genes in both classes, suggesting that different genes from each class were lost, perhaps randomly, after the two lineages separated from an ancestral genome. Random loss would also account for the fact that more than one-third of the genes in one viral genome have no counterparts in the other genome. We speculate that the ancestral genome belonged to a cellular organism that had once invaded a primordial brown algal host.

16.
Wu H, Mao F, Olman V, Xu Y. Nucleic Acids Research, 2007, 35(7): 2125-2140
Functional classification of genes is a fundamental problem in many biological studies. Most existing classification schemes are based on the concepts of homology and orthology. These concepts were originally introduced to study gene evolution and might not be the most appropriate for gene function prediction, particularly at high resolution. We have recently developed a scheme for hierarchical classification of genes (HCGs) in prokaryotes. In the HCG scheme, the functional equivalence relationships among genes are first assessed through a careful application of both sequence similarity and genomic neighborhood information. Genes are then classified into a hierarchical structure of clusters, where genes in each cluster are functionally equivalent at some resolution level, and the resolution increases as the clusters become smaller further down the hierarchy. The HCG scheme is validated through comparisons with the taxonomy of the prokaryotic genomes, Clusters of Orthologous Groups (COGs) of genes and the Pfam system. We have applied the HCG scheme to 224 complete prokaryotic genomes and constructed an HCG database consisting of a forest of 5339 multi-level and 15 770 single-level trees of gene clusters covering approximately 93% of the genes of these 224 genomes. The validation results indicate that the HCG scheme not only captures the key features of the existing classification schemes but also provides a much richer organization of genes. This organization can be used for functional prediction of genes at higher resolution and can help reveal the evolutionary traces of genes.

17.
Fragments of mitochondrial DNA (mtDNA) transferred to the nuclear genome are called nuclear mitochondrial DNAs (NUMTs). We report here a comparison of NUMT content between the genomes of two species of the same genus. Analysis of the genomes of Phytophthora sojae and P. ramorum revealed large differences in NUMT content: 16.27 × 10^-3 % and 2.28 × 10^-3 % of each genome, respectively. Substantial differences also exist between the two species in the sizes of the NUMTs found in each genome, with ranges of 20 to 405 bp for P. sojae and 19 to 137 bp for P. ramorum. Furthermore, in P. sojae, fragments from the mitochondrial genes rns, rnl, cox1, and nad (various subunits) are found most frequently, whereas P. ramorum NUMTs most often originate from the cox3, rps14, nad4, and nad5 genes. The large differences in the presumptive mtDNA insertions suggest that the insertions occurred after the divergence of the two species. This is supported by sequence comparisons among the NUMTs and the mtDNA sequences of the two species. P. sojae mtDNA sequences inserted in the nuclear genome appear to have been altered by insertions, deletions, inversions, and translocations, and they provide insights into active mechanisms of sequence divergence in this plant pathogen. No clear examples were found of NUMTs forming functional nuclear genes or of NUMTs inserted into exons or introns of any nuclear gene.

18.
MOTIVATION: Calculating the information content of motifs in genomes that are highly biased in nucleotide composition is likely to overestimate the amount of useful information in the motif. Calculating relative information can compensate for biases; however, the resulting information content is the amount seen by an observer, not by a macromolecule binding to the motif. The latter is needed to calculate the discriminatory power of the motif and to compare motifs between species. RESULTS: By treating a biased genome as a discrete channel with noise, in accordance with Shannon information theory, we were able to remove both 'distortion' and 'noise' from the motif and recover a more instructive biological 'signal'. A Java application, LogoPaint, was developed to remove nucleotide bias distortion and triplet frequency noise from motifs, calculate information content and present the motif as a logo. We demonstrate how this technique can 'unmask' motifs in the translation initiation regions of bacteria that are obscured by strong sequence biases. AVAILABILITY: LogoPaint is available to all users from the authors as an executable JAR file. Source code is available by arrangement.
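For reference, the relative-information calculation that the abstract describes as compensating for composition bias can be sketched as a per-position relative entropy against the genomic background. This is the standard calculation, not the LogoPaint channel-based correction. The input layout, a list of base-to-probability dicts, is an assumption.

```python
import math


def relative_information(pwm, background):
    """Relative information (bits): sum over positions of sum_b p * log2(p / q_b).

    pwm: list of dicts mapping base -> probability at each motif position.
    background: dict mapping base -> genomic frequency (the composition bias).
    Bases with p == 0 contribute nothing, following the 0 * log 0 = 0 convention.
    """
    total = 0.0
    for column in pwm:
        for base, p in column.items():
            if p > 0:
                total += p * math.log2(p / background[base])
    return total


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
print(relative_information([{"G": 1.0}], uniform))
print(relative_information([{"G": 1.0}], at_rich))
```

For a column fixed on G, a uniform background gives 2 bits, while a background with G at 0.1 gives log2(10), about 3.32 bits. This illustrates how strongly composition changes the apparent information.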

19.
A number of studies have indicated that lineages of animals with high rates of mitochondrial (mt) gene rearrangement might have high rates of mt nucleotide substitution. We chose the hemipteroid assemblage and the Insecta to test the idea that rates of mt gene rearrangement and mt nucleotide substitution are correlated. For this purpose, we sequenced the mt genome of a lepidopsocid from the Psocoptera, the only order of hemipteroid insects for which an entire mtDNA sequence is not available. The mt genome of this lepidopsocid is circular, 16,924 bp long, and contains 37 genes and a putative control region. Seven tRNA genes and one protein-coding gene in this genome have changed positions relative to the ancestral arrangement of insect mt genes. We then compared the relative rates of nucleotide substitution among species from each of the four orders of hemipteroid insects and among the 20 insects whose mt genomes have been sequenced entirely. All comparisons among the hemipteroid insects showed that species with higher rates of gene rearrangement also had statistically significantly higher rates of nucleotide substitution than species with lower rates of gene rearrangement. Among the 20 insects, in comparisons where the mt genomes of the two species differed by more than five breakpoints, the more rearranged species always had a significantly higher rate of nucleotide substitution than the less rearranged species. However, in comparisons where the mt genomes of two species differed by five or fewer breakpoints, the more rearranged species did not always have a significantly higher rate of nucleotide substitution than the less rearranged species. We tested the statistical significance of the correlation between the rates of mt gene rearrangement and mt nucleotide substitution with nine pairs of insects that were phylogenetically independent of one another.
We found that the correlation was positive and statistically significant (R² = 0.73, P = 0.01; Rs = 0.67, P < 0.05). We propose that increased rates of nucleotide substitution may lead to increased rates of gene rearrangement in the mt genomes of insects.
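The two statistics reported above can be computed as in this sketch. R² is the squared Pearson correlation, and Rs is Spearman's rank correlation, that is, the Pearson correlation of the ranks, with tied values given average ranks. P-values and the selection of phylogenetically independent pairs are not covered. The function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def r_squared(x, y):
    """Coefficient of determination of a simple linear fit: the squared Pearson r."""
    return pearson(x, y) ** 2


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Using ranks makes Rs insensitive to outlying rate values. R², in contrast, measures how well a straight line fits the raw rates.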

20.
The identification of conserved sequence tags (CSTs) through comparative genome analysis may reveal important regulatory elements involved in shaping the spatio-temporal expression of genetic information. It is well known that the largest fraction of CSTs observed in human–mouse comparisons corresponds to protein-coding exons, owing to their strong evolutionary constraints. Because we still do not know the complete gene inventory of the human and mouse genomes, it is of the utmost importance to establish whether detected conserved sequences are genes or not. We propose here a simple algorithm that, based on the specific evolutionary dynamics observed for coding sequences, efficiently discriminates between coding and non-coding CSTs. The application of this method may help validate predicted genes, predict alternative splicing patterns in known and unknown genes, and define a dictionary of non-coding regulatory elements.
