Similar articles
1.

Background  

In eukaryotic genomes, most genes are members of gene families. When comparing genes from two species, therefore, most genes in one species will be homologous to multiple genes in the second. This often makes it difficult to distinguish orthologs (separated through speciation) from paralogs (separated by other types of gene duplication). Combining phylogenetic relationships and genomic position in both genomes helps to distinguish between these scenarios. This kind of comparison can also help to describe how gene families have evolved within a single genome that has undergone polyploidy or other large-scale duplications, as in the case of Arabidopsis thaliana and probably most plant genomes.

2.

Background  

Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. They are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance we defined between two genomes was based on a measure combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited, so that it is hard to find enough OGs to properly evaluate their pairwise OG distances.

3.

Background  

The recent availability of an expanding collection of genome sequences, driven by technological advances, has facilitated comparative genomics and in particular the identification of synteny among multiple genomes. However, the development of effective and easy-to-use methods for identifying such conserved gene clusters among multiple genomes (synteny blocks) lags behind, as does the development of databases that host synteny blocks from various groups of species (especially eukaryotes) and also allow users to run synteny-identification programs.

4.

Background  

Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They come in two structural classes, the α-helical and the β-barrel membrane proteins, which differ in physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of Gram-negative bacteria, and discriminating those from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming at maximizing the probability of correct predictions rather than the likelihood of the sequences.

5.

Background  

Searching for small tandem/dispersed repetitive DNA sequences streamlines many biomedical research processes. For instance, whole genomic array analysis in yeast has revealed 22 PHO-regulated genes. The promoter regions of all but one of them contain at least one of the two core Pho4p binding sites, CACGTG and CACGTT. In humans, microsatellites play a role in a number of rare neurodegenerative diseases such as spinocerebellar ataxia type 1 (SCA1). SCA1 is a hereditary neurodegenerative disease caused by an expanded CAG repeat in the coding sequence of the gene. In bacterial pathogens, microsatellites are proposed to regulate expression of some virulence factors. For example, bacteria commonly generate intra-strain diversity through phase variation, which is strongly associated with virulence determinants. A recent analysis of the complete sequences of the Helicobacter pylori strains 26695 and J99 has identified 46 putative phase-variable genes among the two genomes through their association with homopolymeric tracts and dinucleotide repeats. Life scientists are increasingly interested in studying the function of small sequences of DNA. However, current search algorithms often generate thousands of matches, most of which are irrelevant to the researcher.
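As a rough illustration of the kind of short-motif search this abstract refers to, the sketch below scans a DNA string for the two core Pho4p binding sites quoted above (CACGTG and CACGTT) on both strands. It is a minimal example under stated assumptions, not the authors' tool: the function names `reverse_complement` and `find_sites` and the toy sequence in the usage note are hypothetical, and only exact matches are reported.

```python
import re

# Core Pho4p binding sites quoted in the abstract.
PHO4_SITES = ("CACGTG", "CACGTT")

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq, motifs=PHO4_SITES):
    """Return sorted (position, strand, motif) exact hits.

    Positions are 0-based on the forward strand. A palindromic motif such as
    CACGTG is reported once per strand at the same position.
    """
    seq = seq.upper()
    hits = set()
    for motif in motifs:
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            # A lookahead reports overlapping occurrences as well.
            for match in re.finditer("(?=%s)" % pattern, seq):
                hits.add((match.start(), strand, motif))
    return sorted(hits)
```

For example, `find_sites("TTCACGTGAAAACGTGCC")` reports CACGTG at position 2 on both strands and CACGTT at position 10 on the reverse strand. An exhaustive exact search like this is what tends to produce the many irrelevant matches the abstract mentions, so any real tool would presumably add filtering by genomic context.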

6.
7.
We describe the complete mitochondrial genomes of the green lacewing species Chrysoperla nipponensis (Okamoto, 1914) and Apochrysa matsumurae Okamoto 1912 (Neuroptera: Chrysopidae). The genomes were 16,057 and 16,214 bp in size, respectively, and comprised 37 genes (13 protein coding genes, 22 tRNA genes and two rRNA genes). A major noncoding (control) region was 1,244 bp in C. nipponensis and 1,407 bp in A. matsumurae, and the structure was simpler than that reported in other Neuroptera, lacking conserved blocks or long tandem repeats. The overall arrangement of genes was almost the same as that found in most arthropod mitochondrial genomes, with the one exception of a tRNA rearrangement to tRNA-Cys–tRNA-Trp–tRNA-Tyr, rather than the plesiomorphic tRNA-Trp–tRNA-Cys–tRNA-Tyr. A high A + T content (78.89 and 79.02%, respectively), A + T-rich codon bias, and a mismatch between the most-used codon and its corresponding tRNA anticodon were observed, as is typical of insect mitochondrial genomes.

8.

Background  

Transposable elements (TEs) are mobile genetic entities present in nearly all genomes. Previous work has shown that TEs tend to have a different nucleotide composition than the host genes, in terms of either codon usage bias or dinucleotide frequencies. We show here how these compositional differences can be used as a tool for detection and analysis of TE sequences.
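To make the idea of a compositional difference concrete, the sketch below computes dinucleotide frequency profiles for two sequences and a simple distance between them. It is an illustrative assumption only: the abstract does not say which statistic the authors use, and the names `dinucleotide_freqs` and `profile_distance`, as well as the mean absolute difference chosen as the distance, are hypothetical.

```python
from collections import Counter
from itertools import product

# The 16 dinucleotides over the alphabet A, C, G, T.
DINUCS = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_freqs(seq):
    """Return relative frequencies of the 16 dinucleotides (overlapping windows).

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS)
    if total == 0:
        return {d: 0.0 for d in DINUCS}
    return {d: counts[d] / total for d in DINUCS}


def profile_distance(seq_a, seq_b):
    """Mean absolute difference between two dinucleotide profiles (assumed metric)."""
    fa = dinucleotide_freqs(seq_a)
    fb = dinucleotide_freqs(seq_b)
    return sum(abs(fa[d] - fb[d]) for d in DINUCS) / len(DINUCS)
```

A candidate region whose profile lies far from the host-gene profile and close to a known TE profile would, under this simplified picture, be flagged for closer inspection.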

9.

Background  

Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution.

10.

Background  

Recent studies have demonstrated a selection pressure for reduced mRNA secondary-structure stability near the start codon of coding sequences. This selection pressure can be observed in bacteria, archaea, and eukaryotes, and is likely caused by the requirement of efficient translation initiation in cellular organisms.

11.

Background  

High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes.

12.
The compositional distributions of large DNA fragments reflect those of the isochores that make up vertebrate genomes and can provide novel phylogenetic insights in the case of mammalian genomes (see Sabeur et al. 1993). This approach has been complemented here by an analysis of the compositional patterns of coding sequences and their codon positions (which also reflect the isochore pattern) and by a comparison of the base compositions of codon positions from homologous genes in a number of pairs of species. The results obtained using these two approaches support the existence of a general compositional pattern for mammalian genomes and of a distinct pattern for Myomorpha. The other two “special” patterns identified in a megachiropteran and in pangolin could not be tested here. Presented at the NATO Advanced Research Workshop on Genome Organization and Evolution, Spetsai, Greece, 16–22 September 1992.

13.

Background  

Although extensive research has been performed to control the differentiation of neural stem cells, the response of those cells to diverse cell culture conditions often appears to be random and difficult to predict. To this end, we strove to obtain a stabilized protocol for NHA cell differentiation that allows for an increase in the percentage yield of neuronal cells.

14.

Background  

Non-coding RNAs (ncRNAs) have a multitude of roles in the cell, many of which remain to be discovered. However, it is difficult to detect novel ncRNAs in biochemical screens. To advance biological knowledge, computational methods that can accurately detect ncRNAs in sequenced genomes are therefore desirable. The increasing number of genomic sequences provides a rich dataset for computational comparative sequence analysis and detection of novel ncRNAs.

15.
16.

Background  

Understanding the compositional dynamics of genomes and their coding sequences is of great significance for gaining clues into molecular evolution, and the large number of publicly available genome sequences has allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.

17.

Background  

Retrotransposons are commonly occurring eukaryotic transposable elements (TEs). Among these, long terminal repeat (LTR) retrotransposons are the most abundant TEs and can comprise 50–90% of the genome in higher plants. By comparing the orthologous chromosomal regions of closely related species, the effects of TEs on the evolution of plant genomes can be studied in detail.

18.

Background  

DNA homopolymer tracts, poly(dA).poly(dT) and poly(dG).poly(dC), are the simplest of simple sequence repeats. Homopolymer tracts have been systematically examined in the coding, intron and flanking regions of a limited number of eukaryotes. As the number of DNA sequences publicly available increases, the representation (over and under) of homopolymer tracts of different lengths in these regions of different genomes can be compared.
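The tract counting described above can be sketched in a few lines. The example below locates homopolymer runs in a DNA string and tallies them by length, pooling A with T and G with C because poly(dA).poly(dT) and poly(dG).poly(dC) are double-stranded classes. It is a minimal sketch, not the authors' pipeline: the function names, the `min_len` threshold and the pooling labels are assumptions, and no over- or under-representation statistic is computed.

```python
import re
from collections import Counter

# Pool complementary bases into the two double-stranded tract classes.
POOL = {"A": "A/T", "T": "A/T", "G": "G/C", "C": "G/C"}


def homopolymer_tracts(seq, min_len=3):
    """Return (start, base, length) for every maximal run of at least min_len bases.

    Positions are 0-based; min_len is assumed to be >= 1.
    """
    # ([ACGT]) captures one base; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


def tract_length_histogram(seq, min_len=3):
    """Count tracts by (pooled class, length)."""
    return Counter((POOL[base], length)
                   for _, base, length in homopolymer_tracts(seq, min_len))
```

Comparing such histograms between coding, intron and flanking regions, against an expectation derived from base composition, would be one way to approach the representation question raised in the abstract.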

19.
20.

Background  

Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analyses of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons.
