20 similar articles found (search time: 15 ms)
1.
The translation start site (TSS) plays an important role in the control of the translational efficiency and cytoplasmic stability of eukaryotic mRNAs. The efficiency of TSS recognition is known to be influenced by sequence context, and mRNAs with weak TSSs are relatively abundant. We analyzed a sample of 4113 yeast genes in a search for features that might serve to compensate for the inefficient recognition of weak TSSs by initiating ribosomes. The first feature found to correlate with variations in TSS strength is differences in the stability of secondary structure upstream and downstream of the start AUG codon. The second feature concerns the characteristics of AUG triplets found at the beginning of the coding sequence, i.e., downstream of the predicted TSS. In particular, the proximal downstream AUG lies in frame with the CDS significantly more often if the TSS itself is located in a weak context. The accuracy of TSS annotation, the possibility of polypeptide heterogeneity due to the use of alternative downstream AUGs, and the influence of related features of mRNA sequences are discussed. Communicated by C. P. Hollenberg
2.
The translation start site (TSS) plays an important role in the control of the translational efficiency and cytoplasmic stability of eukaryotic mRNAs. The efficiency of TSS recognition is known to be influenced by sequence context, and mRNAs with "weak" TSSs are relatively abundant. We analyzed a sample of 4113 yeast genes in a search for features that might serve to compensate for the inefficient recognition of "weak" TSSs by initiating ribosomes. The first feature found to correlate with variations in TSS strength is differences in the stability of secondary structure upstream and downstream of the start AUG codon. The second feature concerns the characteristics of AUG triplets found at the beginning of the coding sequence, i.e., downstream of the predicted TSS. In particular, the proximal downstream AUG lies in frame with the CDS significantly more often if the TSS itself is located in a "weak" context. The accuracy of TSS annotation, the possibility of polypeptide heterogeneity due to the use of alternative downstream AUGs, and the influence of related features of mRNA sequences are discussed.
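The in-frame test described in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the function name, the 0-based `start` index, and the toy sequences are assumptions made for illustration only.

```python
def first_downstream_aug(mrna: str, start: int):
    """Locate the first AUG downstream of the annotated start codon.

    `start` is the 0-based index of the annotated start AUG (a hypothetical
    input format). Returns (position, in_frame), or None when no downstream
    AUG exists.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    if seq[start:start + 3] != "AUG":
        raise ValueError("no AUG at the annotated start position")
    pos = seq.find("AUG", start + 1)
    if pos == -1:
        return None
    # In frame means the offset from the start codon is a multiple of three.
    return pos, (pos - start) % 3 == 0


if __name__ == "__main__":
    print(first_downstream_aug("AUGAAAAUGCCC", 0))
    print(first_downstream_aug("GGAUGCCAUGGAUGA", 2))
```

Counting how often `in_frame` is true for genes grouped by start-codon context would give the kind of comparison the abstract reports; how contexts are classified as weak or strong is not specified here and would have to come from the paper.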
3.
Wuster A, Venkatakrishnan AJ, Schertler GF, Babu MM 《Bioinformatics (Oxford, England)》2010,26(22):2906-2907
MOTIVATION: Spial (Specificity in alignments) is a tool for the comparative analysis of two alignments of evolutionarily related sequences that differ in their function, such as two receptor subtypes. It highlights functionally important residues that are either specific to one of the two alignments or conserved across both alignments. It permits visualization of this information in three complementary ways: by colour-coding alignment positions, by sequence logos and optionally by colour-coding the residues of a protein structure provided by the user. This can aid in the detection of residues that are involved in the subtype-specific interaction with a ligand, other proteins or nucleic acids. Spial may also be used to detect residues that may be post-translationally modified in one of the two sets of sequences. AVAILABILITY: http://www.mrc-lmb.cam.ac.uk/genomes/spial/; supplementary information is available at http://www.mrc-lmb.cam.ac.uk/genomes/spial/help.html.
4.
Probing the relationship between Gram-negative and Gram-positive S1 proteins by sequence analysis
Philippe Salah, Marco Bisaglia, Pascale Aliprandi, Marc Uzan, Christina Sizun, François Bontems 《Nucleic acids research》2009,37(16):5578-5588
Escherichia coli ribosomal protein S1 is required for the translation initiation of messenger RNAs, in particular when their Shine–Dalgarno sequence is degenerated. Closely related forms of the protein, composed of the same number of domains (six), are found in all Gram-negative bacteria. More distant proteins, generally formed of fewer domains, have been identified, by sequence similarities, in Gram-positive bacteria and are also termed ‘S1 proteins’. However, in the absence of functional information, it is generally difficult to ascertain their relationship with Gram-negative S1. In this article, we report the solution structure of the fourth and sixth domains of the E. coli protein S1 and show that it is possible to characterize their β-barrel by a consensus sequence that allows a precise identification of all domains in Gram-negative and Gram-positive S1 proteins. In addition, we show that it is possible to discriminate between five domain types corresponding to the domains 1, 2, 3, 4–5 and 6 of E. coli S1 on the basis of their sequence. This enabled us to identify the nature of the domains present in Gram-positive proteins and, subsequently, to probe the filiations between all forms of S1.
5.
There is currently a gap in knowledge between complexes of known three-dimensional structure and those known from other experimental methods such as affinity purifications or the two-hybrid system. This gap can sometimes be bridged by methods that extrapolate interaction information from one complex structure to homologues of the interacting proteins. To do this, it is important to know if and when proteins of the same type (e.g. family, superfamily or fold) interact in the same way. Here, we study interactions of known structure to address this question. We found all instances within the structural classification of proteins database of the same domain pairs interacting in different complexes, and then compared them with a simple measure (interaction RMSD). When plotted against sequence similarity we find that close homologues (30-40% or higher sequence identity) almost invariably interact the same way. Conversely, similarity only in fold (i.e. without additional evidence for a common ancestor) is only rarely associated with a similarity in interaction. The results suggest that there is a twilight zone of sequence similarity where it is not possible to say whether or not domains will interact similarly. We also discuss the rare instances of fold similarities interacting the same way, and those where obviously homologous proteins interact differently.
6.
The underlying relationship between functional variables and sequence evolutionary rates is often assessed by partial correlation
analysis. However, this strategy is impeded by the difficulty of conducting meaningful statistical analysis using noisy biological
data. A recent study suggested that the partial correlation analysis is misleading when data is noisy and that the principal
component regression analysis is a better tool to analyze biological data. In this paper, we evaluate how these two statistical
tools (partial correlation and principal component regression) perform when data are noisy. Contrary to the earlier conclusion,
we found that these two tools perform comparably in most cases. Furthermore, when there is more than one ‘true’ independent
variable, partial correlation analysis delivers a better representation of the data. Employing both tools may provide a more
complete and complementary representation of the real data. In this light, and with new analyses, we suggest that protein
length and gene dispensability play significant, independent roles in yeast protein evolution.
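For readers unfamiliar with the first of the two tools compared in this abstract, the textbook first-order partial correlation can be sketched as follows. This is a generic illustration, not the authors' analysis; the toy vectors are invented so that x and y are correlated only through z.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


if __name__ == "__main__":
    z = [1, 2, 3, 4]
    x = [2, 1, 2, 5]  # z plus noise that is uncorrelated with z
    y = [0, 5, 0, 5]  # z plus different noise, uncorrelated with z and with x's noise
    print(pearson(x, y), partial_corr(x, y, z))
```

In this toy case the raw correlation between x and y is positive, while the partial correlation given z is zero up to rounding, which is the behaviour partial correlation analysis is meant to expose. Controlling for several variables at once requires a regression-based generalisation that this sketch does not cover.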
7.
On the relationship between residue structural environment and sequence conservation in proteins
Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that sequence conservation is closely correlated with the weighted contact number (WCN), a measure of the packing density of a residue's structural environment, calculated based only on the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment‐related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side‐chain atoms, or side‐chain centroid. To determine whether the Cα atomic positions are adequate to show the relationship between residue environment and sequence conservation, here we compared Cα atoms with other substructures in their contributions to sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between Cα atoms and the other substructures are high, yielding similar structure–conservation relationships. Taking the WCN as an example, the average overlapping contribution to sequence conservation is 87% between Cα and all‐atom substructures. These results indicate that the Cα atoms of a protein structure alone can reflect sequence conservation at the residue level.
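A minimal sketch of a Cα-based weighted contact number follows. The abstract does not give the formula, so the definition used here (the sum of inverse squared distances to every other Cα atom) is an assumption based on how WCN is commonly defined, and the function name and coordinate format are hypothetical.

```python
def weighted_contact_numbers(ca_coords):
    """Compute a WCN-style packing score for each residue.

    `ca_coords` is a list of (x, y, z) tuples, one Calpha position per residue.
    Assumed definition: WCN_i = sum over j != i of 1 / r_ij**2.
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(ca_coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(ca_coords):
            if i != j:
                dist_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                total += 1.0 / dist_sq
        wcn.append(total)
    return wcn


if __name__ == "__main__":
    # Three collinear toy "residues" spaced one unit apart.
    print(weighted_contact_numbers([(0, 0, 0), (1, 0, 0), (2, 0, 0)]))
```

The central point of the toy chain receives the highest score because it has two close neighbours, mirroring the idea that buried, densely packed residues score higher than exposed ones.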
8.
Insect endogenous retroviruses (IERVs) are present in the genome of several species. Previous studies have shown a relationship between the envelope glycoproteins (Envs) and fusion proteins (FPs) of several baculoviruses. We used this sequence similarity to predict fusion domains in the Envs of IERVs. We suggest that FPs and Envs share several specific sequence and structural motifs with other RNA viruses in the viral transmembrane protein superfamily.
9.
10.
Gene expression is known to correlate with the degree of codon bias in many unicellular organisms. However, such a correlation is not observed in some organisms. It was demonstrated that inverted complementary repeats within coding DNA sequences (ORFs) should be considered for proper estimation of the translation efficiency because they can form secondary structures that obstruct ribosome movement. A program was developed for estimating the potential expression of ORFs in unicellular organisms on the basis of their genome sequences. The program computes the elongation efficiency index (EEI) and takes into account three key factors: codon bias, the average number of inverted complementary repeats, and the free energies of potential stem-loop structures formed by these repeats. The influence of these factors on translation was numerically estimated. Their optimal ratio was computed for each organism. EEIs of 384 unicellular organisms (351 bacteria, 28 archaea, and 5 eukaryotes) were computed using the annotated genomes available from GenBank. Five potential evolutionary strategies of translational optimization were determined in the organisms studied. A considerable difference in preferential translational strategies was observed between bacteria and archaea. Significant correlations between EEIs and gene expression levels were shown for two species (Saccharomyces cerevisiae and Helicobacter pylori), using the available microarray data. The method allows the numerical estimation of the translation efficiency of an ORF and optimization of the nucleotide composition of heterologous genes in specified unicellular organisms. The program is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/eei-calculator.
11.
Gene expression is known to correlate with the degree of codon bias in many unicellular organisms. However, such a correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within a coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimating the potential expression of a coding DNA sequence in a defined unicellular organism using its genome sequence. The program computes the elongation efficiency index. The computation is based on an estimate of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, the average number of inverted complementary repeats, and the free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among the studied organisms. A considerable difference in preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between the elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method allows numerical estimation of the translation efficiency of a coding DNA sequence and optimization of the nucleotide composition of heterologous genes in unicellular organisms. Availability: http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.
12.
Quantitative analysis of the relationship between nucleotide sequence and functional activity.
Matrices can be used to evaluate sequences for functional activity. Multiple regression can solve for the matrix that gives the best fit between sequence evaluations and quantitative activities. This analysis shows that the best model for context effects on suppression by su2 involves primarily the two nucleotides 3' to the amber codon, and that their contributions are independent and additive. Context effects on 2AP mutagenesis also involve the two nucleotides 3' to the 2AP insertion, but their effects are not independent. In a construct for producing beta-galactosidase, the effects on translational yields of the tri-nucleotide 5' to the initiation codon are dependent on the entire triplet. Models based on these quantitative results are presented for each of the examples.
13.
14.
15.
16.
Knowledge about the structural features underlying cold adaptation is important for designing enzymes of different industrial relevance. Vibriolysin from Antarctic bacterium strain 643 (VAB) is at present the only enzyme of the thermolysin family from an organism that thrives in an extremely cold climate. In this study, comparative sequence-structure analysis and molecular dynamics (MD) simulations were used to reveal the molecular features of cold adaptation of VAB. Amino acid sequence analysis of 44 thermolysin enzymes showed that VAB, compared to the other enzymes, has: (1) fewer arginines, (2) a lower Arg/(Lys + Arg) ratio, (3) a lower fraction of large aliphatic side chains, expressed by the (Ile + Leu)/(Ile + Leu + Val) ratio, (4) more methionines, (5) more serines, and (6) more of the thermolabile amino acid asparagine. A model of the catalytic domain of VAB was constructed based on homology with pseudolysin. MD simulations for 3 ns of VAB, pseudolysin, and thermolysin supported the assumption that cold-adapted enzymes have a more flexible three-dimensional (3D) structure than their thermophilic and mesophilic counterparts, especially in some loop regions. The structural analysis indicated that VAB has fewer intramolecular cation-pi electron interactions and fewer hydrogen bonds than its mesophilic (pseudolysin) and thermophilic (thermolysin) counterparts. Lysine is the dominant cationic amino acid involved in salt bridges in VAB, while arginine is dominant in thermolysin and pseudolysin. VAB has a greater volume of inaccessible cavities than pseudolysin and thermolysin. The electrostatic potentials on the surface of the catalytic domain were also more negative for VAB than for thermolysin and pseudolysin. Thus, the MD simulations, the structural patterns, and the amino acid composition of VAB relative to other enzymes of the thermolysin family suggest that VAB possesses the biophysical properties generally associated with adaptation to a cold climate.
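The two composition ratios named in this abstract, Arg/(Lys + Arg) and (Ile + Leu)/(Ile + Leu + Val), are simple to compute from a one-letter amino acid sequence. The sketch below assumes that input format; the function name, the dictionary keys, and the toy sequence are illustrative and not taken from the study.

```python
from collections import Counter


def composition_ratios(protein: str):
    """Return the two residue ratios for a one-letter amino acid sequence."""
    counts = Counter(protein.upper())

    def ratio(numerator, denominator):
        # Return None rather than dividing by zero when the residues are absent.
        return numerator / denominator if denominator else None

    return {
        "arg_over_lys_plus_arg": ratio(counts["R"], counts["K"] + counts["R"]),
        "large_aliphatic_fraction": ratio(
            counts["I"] + counts["L"],
            counts["I"] + counts["L"] + counts["V"],
        ),
    }


if __name__ == "__main__":
    print(composition_ratios("RKKILV"))
```

Lower values of both ratios are the direction the abstract reports for the cold-adapted enzyme relative to the other family members.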
17.
The occurrence and relative positions of cysteine residues were investigated in proteins of various species. Considering random mathematical occurrence for an amino acid coded by two codons (3.28%), cysteine is underrepresented in all organisms investigated. Representation of cysteine appears to correlate positively with the complexity of the organism, ranging between 2.26% in mammals and 0.5% in some members of the Archaebacteria order. This observation, together with the results obtained from comparison of cysteine content of various ribosomal proteins, indicates that evolution takes advantage of increased use of cysteine residues. In all organisms studied except plants, two cysteines are frequently found two amino acid residues apart (C-(X)(2)-C motif). Such a motif is known to be present in a variety of metal-binding proteins and oxidoreductases. Remarkably, more than 21% of all cysteines were found within the C-(X)(2)-C motifs in Archaea. This observation may indicate that cysteine appeared in ancient metal-binding proteins first and was introduced into other proteins later.
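The 3.28% baseline quoted in this abstract is consistent with two codons out of the 61 sense codons of the standard genetic code, assuming every sense codon is equally likely. The sketch below reproduces that arithmetic and counts the share of cysteines that sit in a C-(X)(2)-C motif; the function names and the toy sequence are hypothetical, and the counting rule is one plausible reading of the abstract.

```python
import re

SENSE_CODONS = 64 - 3  # standard genetic code: 64 codons minus 3 stop codons


def expected_frequency(n_codons: int) -> float:
    """Expected residue frequency under uniform usage of sense codons."""
    return n_codons / SENSE_CODONS


# Lookahead so that overlapping motifs such as CxxCxxC are all found.
CXXC = re.compile(r"(?=C..C)")


def cxxc_fraction(protein: str) -> float:
    """Fraction of cysteines that take part in at least one C-X-X-C motif."""
    seq = protein.upper()
    in_motif = set()
    for match in CXXC.finditer(seq):
        in_motif.update((match.start(), match.start() + 3))
    total = seq.count("C")
    return len(in_motif) / total if total else 0.0


if __name__ == "__main__":
    print(round(100 * expected_frequency(2), 2))  # cysteine has two codons
    print(cxxc_fraction("CAACAAAC"))
```

In the toy sequence, two of the three cysteines form a motif, so the fraction is two thirds; applied to a whole proteome, the same count would correspond to the "more than 21%" figure reported for Archaea.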
18.
Comparative proteomic analysis reveals similar and distinct features of proteins in dry and wet stigmas
The angiosperm stigma supports compatible pollen germination and tube growth, resulting in fertilization and seed production. Stigmas are mainly divided into two types, dry and wet, according to the absence or presence of exudates on their surfaces. Here, we used 2DE and MS to identify proteins specifically and preferentially expressed in the stigmas of maize (Zea mays, dry stigma) and tobacco (Nicotiana tabacum, wet stigma), as well as proteins rinsed from the surface of the tobacco stigma. We found that the specifically and preferentially expressed proteins in maize and tobacco stigmas share similar distributions in functional categories. However, these proteins showed important differences between dry and wet stigmas in a few aspects, such as protein homology in "signal transduction" and "lipid metabolism," relative expression levels of proteins containing signal peptides, and proteins in "defense and stress response." These different features might be related to the specific structures and functions of dry and wet stigmas. The possible roles of some stigma-expressed proteins are discussed. Our results provide important information on the functions of proteins in dry and wet stigmas and reveal aspects of conservation and divergence between dry and wet stigmas at the proteomic level.
19.
The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria, responsible for extreme acid sensitivity especially in Escherichia coli and Shigella flexneri. Here we found molecular evidence that proved the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC-coded acid-sensitive antiporter showed many conserved residue patterns at regular intervals in the N-terminal region. It was observed that as the alignment approaches the C-terminus, the number of conserved residues decreases, indicating that the N-terminal region of this protein has a much more active role than the carboxyl terminal. The motif FHLVFFLLLGG is well conserved at the amino terminal across the gadC-coded proteins. The motif is also partially conserved among other antiporters that are not coded by gadC but are involved in acid sensitivity/resistance mechanisms. Phylogenetic cluster analysis proves the relationship of Escherichia coli and Shigella flexneri. The gadC-coded proteins converge as a clade and diverge from other antiporters belonging to the amino acid-polyamine-organocation (APC) superfamily.
20.
《Saudi Journal of Biological Sciences》2022,29(3):1618-1627
Pinus is a widely dispersed genus of conifers in the Northern Hemisphere. However, the limited availability of genomic data restricts our understanding of the molecular phylogeny and evolution of Pinus species. In this study, the evolutionary features of the complete plastid genome and the phylogeny of the genus Pinus were studied. A total of thirteen divergent hotspot regions (trnK-UUU, matK, trnQ-UUG, atpF, atpH, rpoC1, rpoC2, rpoB, ycf2, ycf1, trnD-GUC, trnY-GUA, and trnH-GUG) were identified that could be used as genetic markers for phylogenetic and population genetic analyses of Pinus species. Furthermore, seven genes (petD, psaI, psaM, matK, rps18, ycf1, and ycf2) with positive selection sites in Pinus species were identified. Based on the whole genome, this phylogenetic study showed that twenty-four Pinus species form a significant genealogical clade. Divergence time estimates showed that the Pinus species originated about 100 million years ago (MYA) (95% HPD, 101.76.35–109.79 MYA), in the later stages of the Cretaceous. Moreover, two of the subgenera subsequently originated about 85.05 MYA (95% HPD, 81.04–88.02 MYA). This study provides a phylogenetic relationship and a chronological framework for the future study of the molecular evolution of the Pinus species.