Similar articles
Found 20 similar articles (search time: 15 ms)
1.
DNA fragments containing T-DNA/plant DNA junctions were amplified from the total DNA of 17 transgenic tobacco plants using inverse polymerase chain reaction. Comparison of the nucleotide sequences of 34 fragments with GenBank sequences revealed homology with vector sequences outside the T-DNA in 10 cases and no homology with known nucleotide sequences in most clones. The AT content varied from 51% to 72%, which is close to the overall percentage of AT pairs in the tobacco genome. Alignment of the sequences truncated during integration of the left and right borders showed significant clustering of truncation sites (within a 10 bp region) for the left border, and five sequences had identical truncation sites (+23 T), indicating preferential use of this nucleotide. Nine of the obtained nucleotide sequences were homologous to repeated sequences in the tobacco genome, with homology ranging from 70% to 90%. The identified repeats belong to different types.
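The AT-content figure quoted in this abstract is a simple base-composition ratio. As a minimal sketch (the function name `at_content` and the choice to ignore ambiguous bases such as N are assumptions for illustration, not details from the paper), it could be computed like this:

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases of seq."""
    # Assumption: only A, C, G, T are counted; N and other symbols are skipped.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(base in "AT" for base in counted)
    return 100.0 * at / len(counted)
```

Applied to each junction fragment, the resulting percentages could then be compared with the genome-wide value, as the abstract describes.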

2.
The extent of homology between two sequences is generally expressed quantitatively by using a set of rules to assign numerical values to aligned residue pairs, where the values depend on some measure of the similarity between the pairs. A homology score for the alignment is obtained by summing the numerical value for each pair, and possibly also adding some appropriate penalty for insertions and deletions in the alignment. Whatever the set of rules used in arriving at the best homology score, the fundamental question that remains is whether the score implies that the sequences are in some way related. In the absence of biological criteria, such a question is generally given a preliminary answer by a Monte Carlo evaluation of the statistical significance of the score. By randomizing the two sequences a large number of times and noting the homology on each throw, the probability of a given score can be obtained for each sequence, and hence the likelihood that the observed homology could have occurred by chance for sequences with the given composition is easily assessed. In addition to this statistical assessment, however, there is a second statistical question that must also be evaluated, particularly for short homologous stretches (<15 residues). When homology searches are performed against databases containing several hundred thousand residues, an important number to know is the probability that the homologous sequence would have occurred simply as the result of the large number of arrangements of short sequences that must be present in any large collection of disparate sequences. In this note we evaluate different versions of this question using different sets of rules. Our results indicate that homologies of roughly six or more residues out of ten are statistically significant and that the best procedure for finding homologies and determining their significance is with the use of the Dayhoff mutation matrix, assigning a weight of at least -8 as a penalty for gaps.
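The Monte Carlo evaluation described in this abstract can be illustrated with a short sketch. It rests on simplifying assumptions that are not from the paper: the score is a plain count of identical residues in an ungapped, equal-length alignment (not the Dayhoff matrix with gap penalties), and the names `identity_score` and `monte_carlo_p_value` are hypothetical.

```python
import random

def identity_score(a, b):
    """Count identical residues at aligned positions (ungapped alignment)."""
    return sum(x == y for x, y in zip(a, b))

def monte_carlo_p_value(a, b, trials=2000, seed=0):
    """Estimate how often shuffled sequences score at least as well as a vs b.

    Shuffling preserves each sequence's composition, so the estimate answers
    "could this score arise by chance for sequences with this composition?".
    """
    rng = random.Random(seed)
    observed = identity_score(a, b)
    a_list, b_list = list(a), list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(a_list)
        rng.shuffle(b_list)
        if identity_score(a_list, b_list) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)
```

A small returned probability suggests the observed score is unlikely for that composition. It does not address the second question raised in the abstract, namely how often such a match appears by chance somewhere in a large database.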

3.
4.
Mammalian cells frequently depend on homologous recombination (HR) to repair DNA damage accurately and to help rescue stalled or collapsed replication forks. The essence of HR is an exchange of nucleotides between identical or nearly identical sequences. Although HR fulfills important biological roles, recombination between inappropriate sequence partners can lead to translocations or other deleterious rearrangements and such events must be avoided. For example, the recombination machinery must follow stringent rules to preclude recombination between the many repetitive elements in a mammalian genome that share significant but imperfect homology. This paper takes a conceptual approach in addressing the homology requirements for recombination in mammalian genomes as well as the general strategy used by cells to reject recombination between similar but imperfectly matched sequences. A mechanism of heteroduplex rejection that involves the unwinding of recombination intermediates that may form between mismatched sequences is discussed.

5.
Summary. Completion of the sequence determination of all 52 Escherichia coli ribosomal proteins enabled a final comparison of their sequences. Similarities in amino acid compositions were compared to the relatedness of the sequences, which was analyzed statistically with the aid of the computer programs RELATE and ALIGN. Among the examined 52×52 possible protein pairs, at least 40 pairs were found that can be regarded as distantly related (showing segment comparison score values slightly above 3.0 S.D. units). These protein pairs were further examined with the programs ALIGN and SEEK to locate homologous sequence stretches. In no case were two complete homologous sequences found (with the exception of the known identical pairs L7/L12 and S20/L26). However, short homologous sequence regions were observed. Besides those protein pairs that show significant although distant relatedness, other pairs were slightly below the threshold value of 3.0 S.D. units. Those pairs observed to be distantly related consisted either of two proteins from the same subunit or of one protein from each of the different subunits. A further analysis of these pairs revealed a correlation between their relatedness and their time of incorporation into the ribosome during assembly.

6.
Isolation and structure of a rat cytochrome c gene   Total citations: 18; self-citations: 0; citations by others: 18
We screened a Charon 4A-rat genomic library using the cloned iso-1 cytochrome c gene from Saccharomyces cerevisiae as a specific hybridization probe. Eight different recombinant phages homologous to a coding region subfragment of the yeast gene were isolated. Nucleotide sequence analysis of a 0.96-kilobase portion of one of these established the existence of a gene coding for a cytochrome c identical in amino acid sequence with that of mouse. The rat polypeptide chain sequence had not previously been determined. In contrast to the yeast iso-1 and iso-2 cytochrome c genes, neither of which has introns, the rat gene contains a single 105-base pair intervening sequence interrupting glycine codon 56. The overall nucleotide sequence homology between cytochrome c genes of yeast and rat is about 62%, with areas of greater homology coinciding with four regions of functionally constrained amino acid sequences. Two of these regions displayed 85-90% DNA sequence homology, including the longest consecutive homologous stretch of 14 nucleotides, corresponding to amino acids 47-52 of the rat protein. Somewhat less homology was observed in the DNA specifying amino acids 70-80, which are invariant residues in most known cytochrome c molecules. Thermal dissociation of the yeast probe from the homologous rat DNA was at about 58 °C in 0.39 M Na+. These results establish that cytochrome c genes may be isolated by interspecies hybridization between widely divergent organisms.

7.
RmI, a chimeric DNA molecule containing polyomavirus (Py) and mouse sequences, generates unit-length Py DNA via intramolecular recombination between two directly repeated viral sequences of 182 base pairs (S repeats). To analyze the contribution of the S repeats in this process, we produced mutants of RmI carrying deletions in either one or both S repeats and tested them for their ability to recombine in mouse 3T6 cells. Mutant DNAs were found to yield unit-length Py DNA as long as they carried a minimal internal homology of 40 to 50 base pairs. Unlike RmI itself, however, the mutants also gave rise to nonhomologous recombination products. These results suggest that when the generation of homologous products is hampered by a limiting homology, nonhomologous products may arise instead of homologous ones. Therefore, the initial step(s) in the mechanisms yielding the two kinds of products could be identical.

8.
E. Schramm, J. Mende, V. Braun, R. M. Kamp. Journal of Bacteriology, 1987, 169(7):3350-3357
Colicin B formed by Escherichia coli kills sensitive bacteria by dissipating the membrane potential through channel formation. The nucleotide sequence of the structural gene (cba) which encodes colicin B and of the upstream region was determined. A polypeptide consisting of 511 amino acids was deduced from the open reading frame. The active colicin had a molecular weight of 54,742. The carboxy-terminal amino acid sequence showed striking homology to the corresponding channel-forming region of colicin A. Of 216 amino acids, 57% were identical and an additional 19% were homologous. In this part 66% of the nucleotides were identical in the colicin A and B genes. This region contained a sequence of 48 hydrophobic amino acids. Sequence homology to the other channel-forming colicins, E1 and I, was less pronounced. A homologous pentapeptide was detected in colicins B, M, and I whose uptake required TonB protein function. The same consensus sequence was found in all outer membrane proteins involved in the TonB-dependent uptake of iron siderophores and of vitamin B12. Upstream of cba a sequence comprising 294 nucleotides was identical to the sequence upstream of the structural gene of colicin E1, with the exception of 43 single-nucleotide replacements, additions, or deletions. Apparently, the region upstream of colicins B and E1 and the channel-forming sequences of colicins A and B have a common origin.

9.
DNA fragments containing T-DNA/plant DNA junctions isolated from 17 transgenic tobacco plants were amplified using inverse PCR. Analysis of the nucleotide sequences of 34 cloned DNA fragments revealed 100% homology with vector sequences outside T-DNA in 10 cases. Nine nucleotide sequences had homology with the repeats in the tobacco genome. The percentage of homology varied from 70 to 90%, with the identified repeats belonging to different types. In most clones no homology was revealed with the GenBank sequences. Alignment of the sequences truncated during the integration of the left and the right borders of the T-DNA insertions demonstrated significant clustering (10 bp region) of truncation sites for the left border. Five sequences had identical truncation sites (+23 T), indicating the preferential use of this nucleotide. The AT content varied from 51 to 72%, which was close to the total percentage of AT pairs in the tobacco genome.

10.
The nucleotide sequence of the feline c-fes/fps proto-oncogene was analyzed. Comparison with v-fes and v-fps revealed that all v-fes/fps homologous sequences were dispersed over 11 kilobase pairs in 19 interspersed segments. All segments, numbered exon 1 to exon 19 as in the chicken and human loci, were flanked by consensus splice junctions. The putative promoter region contained a CATT sequence and three CCGCCC motifs which were also found in the human locus at similar positions. About 200 nucleotides downstream of a translational stop codon in exon 19, a putative poly(A) addition signal was identified. Using the putative translation initiation codon in exon 2, a 93,000-molecular-weight protein could be deduced. This protein closely resembled the putative protein of the human c-fes/fps proto-oncogene (94% overall homology) and, to a lesser degree, the putative protein of the chicken c-fes/fps proto-oncogene (70% overall homology). As far as the feline c-fes/fps proto-oncogene sequences transduced to the Gardner-Arnstein (GA) and Snyder-Theilen (ST) strains of feline sarcoma virus (FeSV) are concerned, homology in deduced amino acid sequences between the GA- and ST-v-fes viral oncogenes and the proto-oncogene was 99%. Analysis of the recombination junctions between feline leukemia virus and v-fes sequences in GA- and ST-FeSV proviral DNA revealed for the left-hand junction the involvement of homologous recombination, presumably at the DNA level. The right-hand junction, which appeared identical in the GA-FeSV and ST-FeSV genomes, could have been the result of a site-specific recombination at the RNA level.

11.
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.

12.
13.
Chromosomal restriction fragments of Corynebacterium ulcerans and C. diphtheriae, containing an integration site for corynephages of the beta family, show homology on Southern blots. Homologous DNA is also found in the soil isolate C. glutamicum, although this strain is not susceptible to beta-corynephages. Three of these DNA fragments, one for each bacterial strain, and a fragment of gamma-corynephage DNA previously shown to contain the phage integration site, were cloned and sequenced. Alignment of the 3 bacterial sequences shows a very high degree of homology in a stretch of ca. 120 nucleotides, whereas the rest of the sequences is generally non-homologous. Within this common bacterial portion, a segment of ca. 96 nucleotides (core sequence) is also highly homologous to the phage sequence. The first half (ca. 50 bp) of the core sequence is identical in all aligned sequences whereas the second half, which is largely occupied by a stem-and-loop structure, contains point mutations peculiar to each clone. The described sequences are likely to be involved in phage integration/excision processes.

14.
15.
D. R. Hyde, C. P. Tu. Nucleic Acids Research, 1982, 10(13):3981-3993
The nucleotide sequences at the ends of the Tn4 transposon (mercury, spectinomycin and sulfonamide resistance) have been determined. They are inverted repeated sequences of 38 nucleotides with three mismatched base pairs. These sequences are strongly homologous with the terminal sequences of Tn501 (mercury resistance) but less so with those of Tn3 (ampicillin resistance). The Tn4 transposon generates pentanucleotide members (Tn3, Tn1000, Tn501, Tn551, IS2) with the exception of Tn1721 and bacteriophage Mu. Among the three Tn4 insertion sites examined here, two of them occurred near a nonanucleotide sequence in perfect homology with part of the terminal inverted-repeat sequence of Tn4 and the third insertion occurred near a sequence of partial homology to one end of Tn4. All three insertions were in the same orientation such that IRb is proximal to its homologous sequence on the recipient DNA.
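The statement that the transposon ends are inverted repeats with a few mismatches can be checked mechanically: one end should match the reverse complement of the other. The sketch below is illustrative only; the function names are hypothetical and no actual Tn4 sequence is used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def inverted_repeat_mismatches(element: str, length: int) -> int:
    """Count mismatches between the first `length` bases of `element`
    and the reverse complement of its last `length` bases."""
    if length <= 0 or 2 * length > len(element):
        raise ValueError("length must be positive and at most half the element")
    left = element[:length]
    right_rc = reverse_complement(element[-length:])
    return sum(x != y for x, y in zip(left, right_rc))
```

For an element whose termini are perfect inverted repeats the function returns 0. Terminal repeats like those described in the abstract would give 3 when called with `length=38`.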

16.
17.
MOTIVATION: Filtration is an important technique used to speed up local alignment, as exemplified in the BLAST programs. Recently, Ma et al. discovered that better filtering can be achieved in their PatternHunter program by spacing out the matching positions according to a certain pattern, instead of requiring contiguous matching positions, to trigger a local alignment. Such a match pattern is called a spaced seed. RESULTS: Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with the sequence similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of DNA genomic sequences.
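To make the idea of a spaced seed concrete, the following sketch finds seed hits between two sequences. A pattern is written as a string of `1` (position must match) and `0` (position is ignored). The function name `seed_hits` and the toy pattern `1101` are assumptions chosen for illustration; they are not seeds from the paper or from PatternHunter.

```python
def seed_hits(s: str, t: str, pattern: str):
    """Return (i, j) pairs where s[i:] and t[j:] agree at every '1' of pattern."""
    care = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    # Index every window of s by the letters at the "care" positions.
    index = {}
    for i in range(len(s) - span + 1):
        key = "".join(s[i + k] for k in care)
        index.setdefault(key, []).append(i)
    # Look up every window of t in that index.
    hits = []
    for j in range(len(t) - span + 1):
        key = "".join(t[j + k] for k in care)
        for i in index.get(key, []):
            hits.append((i, j))
    return hits

# Both seeds have weight 3, but only the spaced one tolerates the
# mismatch at the third position of "ACTT" versus "ACGT".
spaced = seed_hits("ACGTACGT", "ACTT", "1101")
contiguous = seed_hits("ACGTACGT", "ACTT", "111")
```

Here `spaced` contains two hits and `contiguous` is empty. In a real filter each hit would then trigger a more expensive local alignment around it.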

18.
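Cloning and sequence analysis of an Actin gene fragment from Agropyron mongolicum   Total citations: 2; self-citations: 0; citations by others: 2
The aim was to isolate a homologous fragment of the Actin gene from Agropyron mongolicum Keng using the homologous-sequence method, in order to provide an internal reference for studying the expression and regulation of other genes in A. mongolicum. Two primer pairs, A4 and A5, were designed from the conserved sequence of the Actin gene (AB181991) of wheat, a member of the grass family, and Actin gene fragments of A. mongolicum were amplified by RT-PCR, yielding fragments of 656 bp and 848 bp, respectively. Sequence analysis was performed with molecular biology software such as DNAman and DNAuser. After the overlapping sequences of the two fragments were merged, a gene fragment of 962 bp encoding 237 amino acids was obtained, and the cloned Actin gene fragment was named MwACT. The nucleotide sequence shared more than 80% homology with the Actin gene nucleotide sequences of other plants, reaching 94% with wheat and barley; homology at the amino acid sequence level was above 90% in all cases.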

19.
Two new sets of scoring matrices are introduced: H2 for the protein sequence comparison and T2 for the protein sequence-structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k-residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. There are four types, corresponding to the k-values of 1 to 4, for both H2 and T2. These matrices were set up using a large number of structurally homologous protein pairs, with little sequence homology between the pair, that were recently generated using the structure comparison program SHEBA. The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which a set of 107 test sequences were aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested: the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single residue type substitution matrix; BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used Blosum62 and H2 matrices; and PASSC, which used Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.

20.
Twilight zone of protein sequence alignments   Total citations: 38; self-citations: 0; citations by others: 38
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.
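Two of the quantities in this abstract are easy to compute from an existing alignment: the percentage identity and the "more similar than identical" filter of result (iii). The sketch below is only an illustration under stated assumptions: identity is taken over aligned non-gap positions, the percentage similarity is assumed to be supplied by whatever scoring scheme the reader uses, and the names `percent_identity` and `passes_similarity_rule` are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned, non-gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def passes_similarity_rule(pct_identity: float, pct_similarity: float) -> bool:
    """Keep a pair unless its similarity is lower than its identity."""
    return pct_similarity >= pct_identity

# Ten matching residues over an alignment of length 16, the proportions
# used in the example from result (ii).
short_alignment_identity = percent_identity("AAAAAAAAAAAAAAAA", "AAAAAAAAAACCCCCC")
```

As result (ii) stresses, a high percentage over such a short alignment is not sufficient by itself, so any practical threshold would also have to take alignment length into account.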
