Similar Articles
20 similar articles found.
1.
R R Robinson  N Davidson 《Cell》1981,23(1):251-259
A recombinant DNA phage containing a cluster of Drosophila melanogaster tRNA genes has been isolated and analyzed. The insert of this phage has been mapped by in situ hybridization to chromosomal region 50AB, a known tRNA site. Nucleotide sequencing of the entire Drosophila tRNA coding region reveals seven tRNA genes spanning 2.5 kb of chromosomal DNA. This cluster is separated from other tRNA regions on the chromosome by at least 2.7 kb on one side and 9.6 kb on the other. Two tRNA genes are nearly identical and contain intervening sequences of 38 and 45 bases, respectively, in the anticodon loop. These two genes are assigned as tRNALeu genes because of significant sequence homology with yeast tRNA3Leu and secondary-structure homology with the yeast tRNA3Leu intervening sequence. In addition, an 8-base sequence (AAAAUCUU) is conserved at the same location in the intervening sequences of the Drosophila tRNALeu genes and a yeast tRNA3Leu gene. Similar sequences occur in all other tRNAs containing intervening sequences. The remaining five genes are identical tRNAIle genes, which are also identical to a tRNAIle gene from chromosomal region 42A. The 5' flanking regions are only weakly homologous, but each set of isoacceptors contains short regions of strong homology approximately 20 nucleotides preceding the tRNA coding sequences: GCNTTTTG preceding the tRNAIle genes and GANTTTGG preceding the tRNALeu genes. The genes are irregularly distributed on both DNA strands; the spacing regions are divergent in sequence and length.
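The conserved upstream boxes quoted above (GCNTTTTG before the tRNAIle genes, GANTTTGG before the tRNALeu genes) are degenerate motifs in which N stands for any nucleotide. As an illustration only, the sketch below scans a window upstream of a gene start for such a motif with Python's `re` module; the function names, the 30-nt window and the toy sequence are assumptions, not part of the original study.

```python
import re

# Minimal IUPAC support for this sketch: the four bases plus N (any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'GCNTTTTG' into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_upstream_motif(seq: str, gene_start: int, motif: str, window: int = 30) -> list:
    """Return 0-based start positions of `motif` in the `window` nt preceding gene_start.

    A zero-width lookahead is used so that overlapping matches are all reported.
    """
    begin = max(0, gene_start - window)
    region = seq[begin:gene_start].upper()
    pattern = "(?=" + motif_to_regex(motif) + ")"
    return [begin + m.start() for m in re.finditer(pattern, region)]


# Hypothetical toy sequence: the motif starts 20 nt before a gene beginning at index 22.
seq = "CC" + "GCATTTTG" + "A" * 12 + "GGGCCC"
print(find_upstream_motif(seq, 22, "GCNTTTTG"))  # [2]
print(find_upstream_motif(seq, 22, "GANTTTGG"))  # []
```

Other IUPAC codes (R, Y, W, ...) could be added to the lookup table as character classes in the same way.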

2.
We present the sequence of the 5'-terminal 585 nucleotides of mouse 28S rRNA as inferred from the DNA sequence of a cloned gene fragment. Comparison of the mouse 28S rRNA sequence with its yeast homolog, the only known complete sequence of a eukaryotic nucleus-encoded large rRNA (see refs. 1, 2), reveals strong conservation of two large stretches interspersed with completely divergent sequences. These two blocks of homology span the two segments recently proposed to participate directly in the 5.8S-large rRNA complex in yeast (see ref. 1) through base-pairing with both termini of 5.8S rRNA. Comparative analysis of the mouse and yeast sequences strongly supports the proposed structural model for the 5.8S-28S rRNA complex in eukaryotes: despite a number of mutations in the interacting regions of the 28S and 5.8S rRNA sequences, the secondary structure that can be proposed for the mouse complex is identical to that of yeast, with all 41 base pairs between the two molecules maintained through 11 pairs of compensatory base changes. The other regions of the mouse 28S rRNA 5'-terminal domain, which have extensively diverged in primary sequence, can nevertheless be folded into a secondary structure pattern highly reminiscent of their yeast homolog. A minor revision is proposed for the mouse 5.8S rRNA sequence.
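The support for the 5.8S-28S model rests on compensatory changes: both partners of a base pair mutate between species, yet the pair survives. A minimal sketch of that bookkeeping, assuming the intermolecular pairs are already known as index pairs and treating Watson-Crick pairs plus the G-U wobble as allowed; the 4-bp helix and all names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair, which is common in rRNA helices.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def all_paired(x: str, y: str, pairs) -> bool:
    """True if every (i, j) in `pairs` joins x[i] to y[j] with an allowed pair."""
    return all((x[i], y[j]) in ALLOWED for i, j in pairs)


def count_compensatory(x1: str, y1: str, x2: str, y2: str, pairs) -> int:
    """Count pairs where both partners differ between species 1 and 2
    yet the pair is allowed in both species (a compensatory change)."""
    n = 0
    for i, j in pairs:
        both_changed = x1[i] != x2[i] and y1[j] != y2[j]
        if both_changed and (x1[i], y1[j]) in ALLOWED and (x2[i], y2[j]) in ALLOWED:
            n += 1
    return n


# Hypothetical 4-bp helix between a 28S segment (x) and a 5.8S segment (y).
pairs = [(0, 3), (1, 2), (2, 1), (3, 0)]
x_sp1, y_sp1 = "GCAU", "AUGC"
x_sp2, y_sp2 = "ACAU", "AUGU"  # the G-C pair of species 1 is an A-U pair in species 2
print(all_paired(x_sp1, y_sp1, pairs), all_paired(x_sp2, y_sp2, pairs))  # True True
print(count_compensatory(x_sp1, y_sp1, x_sp2, y_sp2, pairs))  # 1
```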

3.
This paper describes a computer program designed to look for similarities between pairs of nucleic acid or amino acid sequences. The program looks both for segments of perfect identity and for regions where, using a scoring matrix, a minimum score is exceeded. The results of comparisons are presented as a matrix displayed on a simple graphics terminal. The graphics terminal allows the user to display the whole of the two sequences on one screen or to home in on regions of interest and examine them in more detail. Because the program is interactive, the user can easily see the effect of changing variables and can use built-in editing functions to make insertions that produce alignments of the two sequences. These aligned sequences can then be saved to disk files for further processing.
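The program's core idea, a comparison matrix in which a cell is marked when a window of the two sequences scores at least a cutoff, can be sketched in a few lines. This is not the original program: the window length, threshold, identity scoring and text rendering are assumptions chosen for illustration, and any substitution-matrix lookup could be passed as `score`.

```python
def identity(x: str, y: str) -> int:
    """Simplest scoring matrix: 1 for a match, 0 otherwise."""
    return 1 if x == y else 0


def dot_matrix(s1: str, s2: str, window: int = 4, threshold: int = 4, score=identity) -> set:
    """Return the set of (i, j) where s1[i:i+window] vs s2[j:j+window] scores
    at least `threshold`. With identity scoring and threshold == window, only
    perfectly identical segments are reported."""
    hits = set()
    for i in range(len(s1) - window + 1):
        for j in range(len(s2) - window + 1):
            total = sum(score(s1[i + k], s2[j + k]) for k in range(window))
            if total >= threshold:
                hits.add((i, j))
    return hits


def render(s1: str, s2: str, hits: set) -> list:
    """Text version of the plot: one row per position of s1, one column per position of s2."""
    return ["".join("*" if (i, j) in hits else "." for j in range(len(s2))) for i in range(len(s1))]


hits = dot_matrix("ACGTACGT", "ACGT")
for row in render("ACGTACGT", "ACGT", hits):
    print(row)
```

Lowering `threshold` below `window` switches from the perfect-identity search to the scored-similarity search described in the abstract.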

4.
Sequence alignments are fundamental to a wide range of applications, including database searching, functional residue identification and structure prediction techniques. These applications predict or propagate structural/functional/evolutionary information based on a presumed homology between the aligned sequences. If the initial hypothesis of homology is wrong, no subsequent application, however sophisticated, can be expected to yield accurate results. Here we present a novel method, LEON, to predict homology between proteins based on a multiple alignment of complete sequences (MACS). In MACS, weak signals from distantly related proteins can be considered in the overall context of the family. Intermediate sequences and the combination of individual weak matches are used to increase the significance of low-scoring regions. Residue composition is also taken into account by incorporating several existing methods for detecting compositionally biased sequence segments. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons with structural and sequence family databases, where the specificity was shown to be >99% and the sensitivity was estimated to be ~76%. LEON can thus be used to reliably identify the complex relationships between large multidomain proteins and should be useful for automatic high-throughput genome annotation, 2D/3D structure prediction, protein–protein interaction prediction, etc.

5.
S. Rackovsky 《Proteins》2015,83(11):1923-1928
We examine the utility of informatic‐based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge‐based correlation between the sequences and structures of proteins. It is shown that there are well‐defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined. Proteins 2015; 83:1923–1928. © 2015 Wiley Periodicals, Inc.

6.
MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps. RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.
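To make the global/local distinction concrete, the sketch below scores the same pair of sequences both ways with a basic dynamic-programming recurrence (Needleman-Wunsch for global, Smith-Waterman for local, linear gap penalty). It is not Shuffle-LAGAN, CHAOS or LAGAN, and the scoring values are arbitrary assumptions.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch (global); every letter of both sequences is aligned.
    local=True:  Smith-Waterman (local); scores are floored at 0 and the best cell wins.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            H[i][j] = cell
    return best if local else H[rows - 1][cols - 1]


print(alignment_score("TTTTACGT", "ACGT"))              # -4: the TTTT overhang must be gapped
print(alignment_score("TTTTACGT", "ACGT", local=True))  # 4: only the shared ACGT is reported
```

The global score is penalized by the unmatched overhang, while the local score reports only the best-matching segment; this is the trade-off the glocal approach is designed to reconcile.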

7.
The myosin 20,000-D regulatory light chain (RLC) has a central role in smooth muscle contraction. Previous work has suggested either the presence of two RLC isoforms, one specific for nonmuscle and one specific for smooth muscle, or the absence of a true smooth muscle-specific isoform, in which case smooth muscle cells would use nonmuscle isoforms. To address this issue directly, we have isolated rat RLC cDNAs and the corresponding genomic sequences of two smooth muscle RLCs based on homology to the amino acid sequence of the chicken gizzard RLC. These cDNAs are highly homologous in their amino acid coding regions and contain unique 3'-untranslated regions. RNA analyses of rat tissue using these unique 3'-untranslated regions revealed that their expression is differentially regulated. However, one cDNA (RLC-B), judged to be predominantly a nonmuscle isoform because of its abundant expression in nonmuscle tissues including brain, spleen, and lung, is also easily detected in smooth muscle tissues. The other cDNA (RLC-A; see Taubman, M., J. W. Grant, and B. Nadal-Ginard. 1987. J. Cell Biol. 104:1505-1513) was detected in a variety of nonmuscle, smooth muscle, and sarcomeric tissues. RNA analyses comparing expression of both RLC genes with the actin gene family and smooth muscle-specific alpha-tropomyosin demonstrated that neither RLC gene was strictly smooth muscle specific. RNA analyses of cell lines demonstrated that both RLC genes are expressed in a variety of cell types. The complete genomic structure of RLC-A and its close linkage to RLC-B are described.

8.
The centromeric regions of human chromosomes are characterized by diverged chromosome-specific subsets of a tandemly repeated DNA family, alpha satellite, which is based on a fundamental monomer repeat unit 171 bp in length. We have compared the nucleotide sequences of 44 alphoid monomers derived from cloned representatives of the multimeric higher-order repeat units of human chromosomes 1, 11, 17, and X. The 44 monomers exhibit an average 16% divergence from a consensus alphoid sequence, and can be assigned to five distinct homology groups based on patterns of sequence substitutions and gaps relative to the consensus. Approximately half of the overall sequence divergence can be accounted for by sequence changes specific to a particular homology group; the remaining divergence appears to be independent of the five groups and is randomly distributed, both within and between chromosomal subsets. The data are consistent with the proposal that the contemporary tandem arrays on chromosomes 1, 11, 17, and X derive from a common multimeric repeat, consisting of one monomer each from the five homology groups. The sequence comparisons suggest that this pentameric repeat must have spread to these four chromosomal locations many millions of years ago, since which time evolution of the four, now chromosome-specific, alpha satellite subsets has been essentially independent.
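The 16% average divergence is measured against a consensus alphoid sequence. Assuming the monomers are already aligned to equal length, a majority-rule consensus and a per-monomer divergence could be computed as below; counting gaps as differences and the toy 10-nt "monomers" are assumptions of this sketch, not the authors' procedure.

```python
from collections import Counter


def consensus(aligned: list) -> str:
    """Majority-rule consensus of equal-length aligned sequences ('-' = gap).
    Ties go to the character seen first in the column (Counter keeps insertion order)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def divergence(seq: str, cons: str) -> float:
    """Fraction of alignment columns where `seq` differs from the consensus;
    substitutions and gaps both count as differences in this sketch."""
    return sum(a != b for a, b in zip(seq, cons)) / len(cons)


def mean_divergence(aligned: list) -> float:
    """Average divergence of all sequences from their own consensus."""
    cons = consensus(aligned)
    return sum(divergence(s, cons) for s in aligned) / len(aligned)


# Hypothetical aligned monomers (real alphoid monomers are ~171 bp).
monomers = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTAC-TAC"]
print(consensus(monomers))        # ACGTACGTAC
print(mean_divergence(monomers))  # about 0.05, i.e. 5% average divergence
```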

9.
T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We then show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source freeware available from http://www.tcoffee.org/.

10.
Genome annotation in differently evolved organisms presents challenges because the lack of sequence-based homology limits the ability to determine the function of putative coding regions. To provide an alternative to annotation by sequence homology, we developed a method that takes advantage of unusual trypanosomatid biology and skews in nucleotide composition between coding regions and upstream regions to rank putative open reading frames based on the likelihood of coding. The method is 93% accurate when tested on known genes. We have applied our method to the full complement of open reading frames on Chromosome I of Trypanosoma brucei, and we can predict with high confidence that 226 putative coding regions are likely to be functional. Methods such as the one described here for discriminating true coding regions are critical for genome annotation when other sources of evidence for function are limited.
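The abstract does not give the scoring function, so the following is only a generic stand-in: a per-base log-odds score that compares an ORF's nucleotides under a coding-region composition model and an upstream-region composition model, then ranks ORFs by that score. The mononucleotide model, pseudocount and toy training sequences are all assumptions; the published method also exploits trypanosomatid-specific features that are not modeled here.

```python
import math
from collections import Counter

BASES = "ACGT"


def base_freqs(seqs: list, pseudocount: float = 1.0) -> dict:
    """Mononucleotide frequencies with a pseudocount so no base has probability 0."""
    counts = Counter("".join(seqs).upper())
    total = sum(counts[b] for b in BASES) + 4 * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def log_odds(orf: str, coding: dict, upstream: dict) -> float:
    """Mean per-base log-likelihood ratio; > 0 means 'more coding-like'."""
    bases = [b for b in orf.upper() if b in coding]
    if not bases:
        return 0.0
    return sum(math.log(coding[b] / upstream[b]) for b in bases) / len(bases)


def rank_orfs(orfs: dict, coding: dict, upstream: dict) -> list:
    """Sort ORF names from most to least coding-like."""
    return sorted(orfs, key=lambda name: log_odds(orfs[name], coding, upstream), reverse=True)


# Hypothetical training data: GC-rich "coding" vs AT-rich "upstream" sequence.
coding = base_freqs(["GGCCGGCCGC"])
upstream = base_freqs(["AATTATTAAT"])
orfs = {"orf1": "GCGCGG", "orf2": "ATATTA"}
print(rank_orfs(orfs, coding, upstream))  # ['orf1', 'orf2']
```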

11.
Phylogenetic analyses of non-protein-coding nucleotide sequences such as ribosomal RNA genes, internal transcribed spacers, and introns are often impeded by regions of the alignments that are ambiguously aligned. These regions are characterized by the presence of gaps and their uncertain positions, no matter which optimization criteria are used. This problem is particularly acute in large-scale phylogenetic studies and when aligning highly diverged sequences. Accommodating these regions, where positional homology is likely to be violated, in phylogenetic analyses has been dealt with very differently by molecular systematists and evolutionists, ranging from the total exclusion of these regions to the inclusion of every position regardless of ambiguity in the alignment. We present a new method that allows the inclusion of ambiguously aligned regions without violating homology. In this three-step procedure, first homologous regions of the alignment containing ambiguously aligned sequences are delimited. Second, each ambiguously aligned region is unequivocally coded as a new character, replacing its respective ambiguous region. Third, each of the coded characters is subjected to a specific step matrix to account for the differential number of changes (summing substitutions and indels) needed to transform one sequence to another. The optimal number of steps included in the step matrix is the one derived from the pairwise alignment with the greatest similarity and the least number of steps. In addition to potentially enhancing phylogenetic resolution and support, by integrating previously nonaccessible characters without violating positional homology, this new approach can improve branch length estimations when using parsimony.
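The second and third steps (code each ambiguous region as one character, then give that character a step matrix of substitution-plus-indel counts) can be sketched as follows. Assuming unit cost per substitution and per single-base indel, the pairwise step counts come from a Levenshtein edit distance; how multi-base indels are weighted in the published method is not stated in the abstract, and the taxon names and sequences are hypothetical.

```python
def edit_steps(a: str, b: str) -> int:
    """Minimum number of substitutions plus single-base insertions/deletions
    needed to turn `a` into `b` (unit-cost Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def code_region(region: dict):
    """Code one ambiguously aligned region as a single multistate character.

    `region` maps taxon -> unaligned sequence of that region. Returns the state
    assigned to each taxon, the distinct sequences (one per state) and the
    symmetric step matrix between states."""
    states = sorted(set(region.values()))
    coding = {taxon: states.index(seq) for taxon, seq in region.items()}
    steps = [[edit_steps(x, y) for y in states] for x in states]
    return coding, states, steps


region = {"taxonA": "ACGT", "taxonB": "ACGT", "taxonC": "AGT", "taxonD": "ACCT"}
coding, states, steps = code_region(region)
print(coding)  # {'taxonA': 1, 'taxonB': 1, 'taxonC': 2, 'taxonD': 0}
print(steps)   # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

The resulting matrix is the kind of table a parsimony program can accept as a user-defined step matrix for the new character.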

12.
The nuclear and chloroplast ribosomal DNAs from Euglena were shown to have specific regions of nucleotide sequence homology. The regions of homology were identified by hybridization of restriction endonuclease DNA fragments of cloned chloroplast and nuclear ribosomal DNAs to one another. The regions of homology between these two ribosomal DNAs were in that part of the genes that code for the 3′ end of the small rRNAs (16S and 19S) and near or at the DNA sequences coding for the 5S RNAs. The nucleotide sequence homology between these regions was estimated to be approximately 94% by the melting point depression of a hybrid formed between the two ribosomal DNAs.

13.
Structure and evolution of the Xenopus laevis albumin genes (Times cited: 4; self-citations: 0; citations by others: 4)
The 68K and 74K albumin genes of Xenopus laevis arose by duplication approximately 30 million years ago. Electron microscopic analysis showed that both genes contain 15 coding sequences. The lengths of corresponding coding sequences are almost identical and are extremely similar to those of mammalian albumin genes. A block of four coding sequences, which in mammals codes for one protein domain, is repeated three times. The corresponding introns are usually different in length and have therefore diverged as a result of insertion/deletion events. The extensive homology between these gene sequences is neither confined to nor most extensive in the coding sequences and similar amounts of homologous sequences are found in the flanking DNAs as in the gene regions. Various structures were formed in the 5'-flanking DNA by mutually exclusive pairing of different homology regions. Analysis of the two 74K albumin gene sequences isolated suggests that the X. laevis genome may contain one 68K albumin gene and two very closely related 74K albumin genes.

14.
R C Yang  A Young    R Wu 《Journal of virology》1980,34(2):416-430
The DNA sequence of the early region of the human papovavirus BK (MM strain) was determined. A potential initiation signal for translation is located at nucleotides 3,047 to 3,045 or map position 0.614. Extending counterclockwise from this AUG signal there is only one open reading frame, which can code for a putative t antigen of 100 amino acids in length. If the early mRNA of BKV is spliced, then the regions between nucleotides 3,047 to 2,808 and 2,725 to 884 can code for a T antigen 694 amino acids in length. The sequences of the deduced T antigens in BK virus share 71% amino acid homology with those in simian virus 40, whereas the coding sequences of the two viruses share 70% DNA homology. Comparison of DNA sequences and evaluation of homology measurements between these two viruses are discussed.
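Percent identity figures such as the 71% amino acid and 70% DNA values above depend on how aligned columns are counted. One common convention, used in this sketch as an assumption, divides identical positions by the number of aligned columns in which neither sequence has a gap; the example sequences are invented.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical positions / aligned columns without a gap in either sequence, x100."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("MKV-LT", "MRVALT"))  # 80.0 (4 identical of 5 gap-free columns)
```

Dividing by the full alignment length or by the shorter sequence's length instead would give a different value for the same pair, so the convention should be stated alongside any reported figure.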

15.
The normal distribution of crossover events on meiotic bivalents depends on homolog recognition, alignment, and interference. We developed a method for precisely locating all crossovers on Caenorhabditis elegans chromosomes and demonstrated that wild-type animals have essentially complete interference, with each bivalent receiving one and only one crossover. A physical break in one homolog has previously been shown to disrupt interference, suggesting that some aspect of bivalent structure is required for interference. We measured the distribution of crossovers in animals heterozygous for a large insertion to determine whether a break in sequence homology would have the same effect as a physical break. Insertions disrupt crossing over locally. However, every bivalent still experiences essentially one and only one crossover, suggesting that interference can act across a large gap in homology. Although insertions did not affect crossover number, they did have an effect on crossover distribution. Crossing over was consistently higher on the side of the chromosome bearing the homolog recognition region and lower on the other side of the chromosome. We suggest that nonhomologous sequences cause heterosynapsis, which disrupts crossovers along the distal chromosome, even when those regions contain sequences that could otherwise align. However, because crossovers are not completely eliminated distal to insertions, we propose that alignment can be reestablished after a megabase-scale gap in sequence homology.

16.
The 3'- and 5'-terminal sequences of the five large double-stranded RNA species (L-dsRNA; 4.5-6.0 × 10^6 daltons) of EP713, a hypovirulent strain of Endothia parasitica, were determined by mobility-shift and enzymatic methods. All the L-dsRNAs appeared to have identical terminal sequences. A heteropolymer sequence was found at one 3'-terminus and a poly(A) sequence of variable length at the other. It was possible to label only one 5'-terminus using polynucleotide kinase and [gamma-32P]ATP, and this was shown to be a poly(U) sequence of variable length. We propose that the dsRNAs have the following structure, where X represents a blocking group: (Formula: see text). A recombinant plasmid containing dsRNA-related sequences was constructed. Hybridization analysis using the recombinant probe indicated that the sequence homology among the L-dsRNAs extended beyond these terminal regions and was also shared by small dsRNAs (0.3-0.45 × 10^6 daltons).

17.
Characterization of sarcomeric myosin heavy chain genes (Times cited: 28; self-citations: 0; citations by others: 28)
Myosin heavy chain is encoded by a large multigene family. Using pMHC-25, a recombinant cDNA clone isolated from the rat myogenic cell line L6E9, four members of this family in the rat have been isolated and shown to be tissue-specific and developmentally regulated. The coding regions of these genes share regions of homology interspaced with regions of non-homology. Detailed analysis of one embryonic and one adult myosin heavy chain gene shows that the coding sequences are interrupted by numerous intervening sequences whose number, size, and distribution do not appear to be conserved in the same organism or between species.

18.
Cloning and expression of rat homeo-box-containing sequences (Times cited: 2; self-citations: 0; citations by others: 2)
M Falzon  N Sanderson  S Y Chung 《Gene》1987,54(1):23-32

19.
Genomic DNA sequences sharing homology with the NBS-LRR (nucleotide binding site-leucine-rich repeat) resistance genes were isolated and cloned from apricot (Prunus armeniaca L.) using a PCR approach with degenerate primers designed from conserved regions of the NBS domain. Restriction digestion and sequence analyses of the amplified fragments led to the identification of 43 unique amino acid sequences grouped into six families of resistance gene analogs (RGAs). All of the RGAs identified belong to the Toll-Interleukin receptor (TIR) group of the plant disease resistance genes (R-genes). RGA-specific primers based on non-conserved regions of the NBS domain were developed from the consensus sequences of each RGA family. These primers were used to develop amplified fragment length polymorphism (AFLP)-RGA markers by means of an AFLP-modified procedure where one standard primer is substituted by an RGA-specific primer. Using this method, 27 polymorphic markers, six of which shared homology with the TIR class of the NBS-LRR R-genes, were obtained from 17 different primer combinations. Of these 27 markers, 16 mapped in an apricot genetic map previously constructed from the self-pollination of the cultivar Lito. The development of AFLP-RGA markers may prove to be useful for marker-assisted selection and map-based cloning of R-genes in apricot.

20.
Proteins have been classified into families based upon sequence homology. An accurate, systematic comparative model-building procedure for a homologous family of proteins would be very valuable scientifically. This paper presents such a procedure and applies it to the mammalian serine proteases, which are ubiquitous and involved in many important biological functions. Eleven proteins of this family are considered here, including a variety of blood serum, intestinal and pancreatic proteins as well as a closely related bacterial enzyme. The modeling method capitalizes upon the availability of three experimentally determined structures for mammalian serine proteases. These structures show that the molecule is divided into structurally conserved regions, which contain the strong sequence homology, and structurally variable regions, which include all the additions and deletions. We show that applying this structural distinction to new sequences greatly reduces erroneous alignments. For each aligned new sequence, the structurally conserved regions can be constructed from any of the known structures. In examining the variable regions, we have found that a variable region that has the same length and residue character in two different known structures usually has the same conformation in both. Thus, when the eight structurally unknown proteins are modeled, most of the variable regions can be constructed directly from the known structures. A minority of the variable regions require more sophisticated analysis to evaluate the relative merits of a small number of possible conformations. Only a very few are so different that modeling by homology is entirely ruled out. We demonstrate, therefore, that with this modeling procedure as much as possible of each of these mammalian serine proteases is constructed directly from the experimentally determined structures, greatly reducing the need to build from intuition or from energy considerations.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417