Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Genomics, 2019, 111(6): 1590-1603
Genomes are not random sequences, because natural selection has injected information into biological sequences for billions of years. Inspired by this idea, we developed a simple method to compare genomes by considering nucleotide counts in subsequences (blocks) instead of their exact sequences. We introduce the Block Alignment method for comparing two genomes and, based on this comparison method, define a similarity score and a distance. The presented model ignores nucleotide order in the sequence. On the other hand, because this block comparison method excludes point mutations and small size variations, there is no need for high-coverage sequencing, which is responsible for the high costs of data production and storage; moreover, the sequence comparisons can be performed at higher speed. Phylogenetic trees of two sets of bacterial genomes were constructed, and the results were in full agreement with their previously constructed phylogenetic trees. Furthermore, a weighted and directed similarity network of each set of bacterial genomes was inferred ab initio by this model. Remarkably, the communities of these networks agree with the clades of the corresponding phylogenetic trees, which means that these similarity networks also contain phylogenetic information about the genomes. Moreover, the block comparison method was used to distinguish rob(15;21)c-associated iAMP21 and sporadic iAMP21 rearrangements in subgroups of chromosome 21 in acute lymphoblastic leukemia. Our results show a meaningful difference between the number of contigs that mapped to chromosomes 15 and 21 in these cases. Furthermore, the presented block alignment model can select candidate blocks for more accurate analysis, and it is capable of finding conserved blocks in a set of genomes.
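The idea of comparing blocks by nucleotide counts rather than by exact sequence can be sketched as follows. This is only an illustration: the fixed block size, the L1 distance between count vectors, and the function names are assumptions made here, not the similarity score or distance defined in the paper.

```python
from collections import Counter


def block_counts(seq, block_size):
    """Split seq into consecutive blocks and return per-block (A, C, G, T) counts."""
    blocks = []
    for start in range(0, len(seq), block_size):
        counts = Counter(seq[start:start + block_size].upper())
        blocks.append(tuple(counts[base] for base in "ACGT"))
    return blocks


def block_distance(a, b):
    """L1 distance between two count vectors (an illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_block_distance(seq1, seq2, block_size):
    """Average distance over blocks at the same index.

    Extra blocks in the longer sequence are ignored (a simplification).
    """
    b1 = block_counts(seq1, block_size)
    b2 = block_counts(seq2, block_size)
    n = min(len(b1), len(b2))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    return sum(block_distance(x, y) for x, y in zip(b1, b2)) / n
```

Because only counts are compared, two blocks with the same composition but a different nucleotide order are at distance zero, which mirrors the statement that the model ignores nucleotide order.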

2.
A comparative analysis of human, mouse, and rabbit immunoglobulin (Ig) kappa-gene DNA sequences is presented. New formulas for determining the expected length and variance of the longest block identity (a succession of matching nucleotides) between multiple random sequences are given and are used to establish statistical criteria for ascertaining the significance of block identities shared in r out of s sequences. The statistically significant block identities within and between the Ig-kappa-gene sequences are ascertained, and alignment maps based on these similarities are constructed. The human and rabbit sequences (especially in the noncoding regions) and the human and mouse sequences (in the coding regions) show a similarity much stronger than that between the mouse and rabbit sequences. The existence of several highly significant shared oligonucleotides occurring in alignment with each other or with respect to the J- and C-gene segments suggests a configuration of multiple control sites. Discussion and interpretations of the form and distribution of the block identities are given.

3.
4.
Polymorphic sequence in the D-loop region of equine mitochondrial DNA   Total citations: 8 (self-citations: 0; citations by others: 8)
The D-loop regions of equine mitochondrial DNA were cloned from three thoroughbred horses by polymerase chain reaction (PCR). The D-loop regions were 1114 bp, 1115 bp and 1146 bp long. The equine D-loop region is A/T rich, like many other mammalian D-loops. The large central conserved sequence block and the small conserved sequence blocks 1, 2 and 3, which are common to other mammals, were observed. Between conserved sequence blocks 1 and 2 there were tandem repeats of an 8 bp equine-specific sequence, TGTGCACC, and the number of tandem repeats differed among individual horses. The base composition of the repeat unit is G/C rich, as are the short repeats in the D-loops of rabbit and pig. Comparison of the DNA sequences of horse and other mammals shows that the difference in D-loop length is mostly due to differences in the number of bases at both extremities, whereas the sequence similarities lie in the middle part of the D-loop. Comparison of the sequences among the three thoroughbred horses showed that the region between tRNA-Pro and the large central conserved sequence block was the richest in variation. PCR primers in the D-loop region were designed, and the expected maternal inheritance was confirmed by PCR-RFLP (restriction fragment length polymorphism).

5.
In this work, the mitochondrial genomes of spotted halibut (Verasper variegatus) and barfin flounder (Verasper moseri) were completely sequenced. The entire mitochondrial genome sequences of the spotted halibut and barfin flounder were 17,273 and 17,588 bp in length, respectively. The organization of the two mitochondrial genomes was similar to that reported for other fish mitochondrial genomes, containing 37 genes (2 rRNAs, 22 tRNAs and 13 protein-coding genes) and two non-coding regions (the control region (CR) and the WANCY region). In the CR, the extended termination-associated sequence (ETAS), six central conserved blocks (CSB-A, B, C, D, E, F), three conserved sequence blocks (CSB1-3) and a 61-bp tandem repeat cluster at the end of CSB-3 were identified by similarity comparison with fishes and other vertebrates. The tandem repeat sequences show polymorphism among different individuals of the two species. The complete mitochondrial genomes of spotted halibut and barfin flounder should be useful for evolutionary studies of flatfishes and other vertebrate species.

6.
Let A be a sequence of n real numbers, let $L_1$ and $L_2$ be two integers such that $L_1 \le L_2$, and let $R_1$ and $R_2$ be two real numbers such that $R_1 \le R_2$. An interval of A is feasible if its length is between $L_1$ and $L_2$ and its average is between $R_1$ and $R_2$. In this paper, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. The problems are motivated by the problem of locating CpG islands in biomolecular sequences. We first show that all the problems have an $\Omega(n \log n)$-time lower bound in the comparison model. Then, we use geometric approaches to design optimal algorithms for the problems. All the presented algorithms run in an on-line manner and use $O(n)$ space.
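To make the definition of a feasible interval concrete, here is a minimal brute-force sketch using prefix sums. It is not the optimal geometric algorithm described in the abstract: it simply enumerates every start position and every allowed length, so it runs in time proportional to n times the number of allowed lengths. The function name and the half-open (start, end) convention are assumptions made for illustration.

```python
from itertools import accumulate


def feasible_intervals(a, l1, l2, r1, r2):
    """Return all (start, end) pairs, end exclusive, whose length lies in
    [l1, l2] and whose average lies in [r1, r2]."""
    if l1 < 1 or l1 > l2 or r1 > r2:
        raise ValueError("need 1 <= l1 <= l2 and r1 <= r2")
    prefix = [0] + list(accumulate(a))  # prefix[i] = a[0] + ... + a[i-1]
    n = len(a)
    result = []
    for start in range(n):
        for length in range(l1, min(l2, n - start) + 1):
            total = prefix[start + length] - prefix[start]
            # Compare sums instead of averages to avoid a division.
            if r1 * length <= total <= r2 * length:
                result.append((start, start + length))
    return result
```

Counting the feasible intervals is then `len(feasible_intervals(...))`, and a longest or shortest one can be picked with `max` or `min` keyed on `end - start`.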

7.
The availability of mannuronan and mannuronan C-5 epimerases allows the production of a strictly alternating mannuronate-guluronate (MG) polymer and the MG-enrichment of natural alginates, providing a powerful tool for the analysis of the role of such sequences in the calcium-alginate gel network. In view of the calcium binding properties of long alternating sequences revealed by circular dichroism studies, which lead eventually to the formation of stable hydrogels, their direct involvement in the gel network is here suggested. In particular, 1H NMR results obtained from a mixed alginate sample containing three polymeric species, G blocks, M blocks, and MG blocks, without chemical linkages between the block structures, indicate for the first time the formation of mixed junctions between G and MG blocks. This is supported by the analysis of the Young's modulus of hydrogels from natural and epimerized samples obtained at low calcium concentrations. Furthermore, the "zipping" of long alternating sequences in secondary MG/MG junctions is suggested to account for the shrinking (syneresis) of alginate gels in view of its dependence on the length of the MG blocks. As a consequence, a partial network collapse, macroscopically revealed by a decrease in the Young's modulus, occurred as the calcium concentration in the gel was increased. The effect of such "secondary" junctions on the viscoelastic properties of alginate gels was evaluated by measuring their creep compliance under uniaxial compression. The experimental curves, fitted by a model composed of a Maxwell and a Voigt element in series, revealed an increase in the frictional forces between network chains with increasing length of the alternating sequences. This suggests the presence of an ion-mediated mechanism preventing the shear of the gel.

8.
Results of classification of terrestrial ecosystems using an average similarity matrix are reported for the West Siberian Plain. Initial indices are first calculated separately for four components of an ecosystem. These components (blocks) are the underground block (soil humus, mortmass, and underground phytomass), above-ground vegetation, invertebrates, and vertebrates. A mismatch of boundaries was demonstrated both among the separate blocks of ecosystems and in comparison with the inhomogeneity of ecosystems in general. These differences are observed in both the typological and the typological-chorological analysis. The indicated features of spatial succession within the blocks generate continuity of ecosystems and the conventional character of all the classifications and drawn boundaries.

9.
The amino acid sequences of most of the CH1, CH2 and CH3 domains of IgG Zie, a myeloma protein belonging to the IgG2 subclass, are presented. These data make possible a comparison of the sequences of residues 253-446 of all four subclasses of immunoglobulins; these residues make up almost the entire Fc region. A comparison can also be made of the CH1 domain of IgG1 Eu and the CH1 domain of IgG2 Zie. Earlier sequence analyses of the Fc regions of subclass 1 and 3 proteins, and of parts of the Fc regions of subclass 2 and 4 proteins, showed that about 95% of these sequences were identical. The extended comparisons made possible by the data presented here show that this very high degree of identity is maintained throughout the four subclasses. Similarly, the CH1 domains of gamma 1 and gamma 2 chains were found to have about 93% sequence identity. It is unlikely that the few single amino acid changes within the constant region domains can account for the marked differences between subclasses observed in the biological effector functions of immunoglobulin Fc regions, especially since most of the changes are highly conservative. Rather, it seems probable that these functional differences are caused by conformational differences between the subgroups, which result from sequence differences in the hinge regions.

10.
11.
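Construction of a folding-core-based substitution matrix for all-α proteins   Total citations: 1 (self-citations: 0; citations by others: 1)
Amino acid substitution matrices are an important factor affecting the quality of multiple sequence alignment, and existing substitution matrices perform poorly on sequences of low similarity. Building on the existing BLOSUM substitution matrix algorithm, we define sequence-structure data blocks based on protein folding-core structures and propose TOPSSUM25, a new amino acid substitution matrix based on the folding-core structures of all-α proteins, to improve the alignment of low-similarity sequences. TOPSSUM25 was imported into a multiple sequence alignment program. Tests on a set of four-helix-bundle sequence-structure data blocks with less than 25% similarity showed that multiple sequence alignments based on TOPSSUM25 were clearly better than those based on the BLOSUM30 matrix. Alignment tests on a BAliBASE subset further showed that TOPSSUM25 outperforms BLOSUM30 in pairwise alignment of all-α proteins. These results may help to further elucidate sequence-structure-function relationships in proteins of low homology.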

12.

Background

Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487–1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings.

Results

In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitutions per position can be accurately estimated from the position of a local maximum in the length distribution of their k-mismatch common substrings.
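A length distribution of k-mismatch common substrings can be illustrated with a brute-force sketch. The definition used here is an assumption made for illustration (for each position i in the first sequence, the longest extension against any position j in the second sequence that tolerates at most k mismatches); the paper's exact definition and its estimator for the substitution rate are not reproduced.

```python
from collections import Counter


def k_mismatch_extension(s, t, i, j, k):
    """Length of the longest common extension of s[i:] and t[j:]
    containing at most k mismatches."""
    mismatches = 0
    length = 0
    while i + length < len(s) and j + length < len(t):
        if s[i + length] != t[j + length]:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def length_distribution(s, t, k):
    """Histogram of the best k-mismatch extension length for each i in s,
    maximised over all start positions j in t (cubic time, small inputs only)."""
    hist = Counter()
    for i in range(len(s)):
        best = max(
            (k_mismatch_extension(s, t, i, j, k) for j in range(len(t))),
            default=0,
        )
        hist[best] += 1
    return hist
```

Plotting the histogram returned by `length_distribution` for two related sequences is one way to look for the local maximum that the abstract refers to.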

13.
The Tus protein of Escherichia coli is capable of arresting DNA replication in an orientation-dependent manner when bound to specific sequences in the bacterial chromosome called Ter sites. Arrest of DNA replication has been postulated to occur either by a barrier mechanism, where Tus acts as a physical block to replication fork progression, or through protein-protein interactions between Tus and some component of the replication fork. A previous mutational analysis of Tus suggested that the amino acids in the L1 loop might play a role in replication arrest. Site-directed mutagenesis of amino acids in the L1 loop and other amino acid residues on the "non-permissive" face of Tus was performed to identify residues that affected Tus function. One mutant, E47Q, gave results that are inconsistent with the barrier model, showing a greater affinity for the Ter site (with a half-life, t1/2, of 348 min versus 150 min for wild-type Tus) but a reduced ability to arrest DNA replication in vivo. In addition to the site-directed mutagenesis studies, the tus genes of Salmonella, Klebsiella, and Yersinia were sequenced and the proteins expressed in E. coli to assess their ability to arrest DNA replication. The results presented here support a role for protein-protein interactions in Tus function, and suggest that residues E47 and E49 participate in replication fork arrest.

14.
Chimeric PMA1::PMA2 sequences, placed under the control of the PMA1 promoter, were constructed by in vivo recombination between a gapped linearized plasmid containing the PMA2 gene and four different fragments of the PMA1 gene. Correct in-frame assembly of the PMA sequences was screened by the expression of the lacZ reporter gene fused to the PMA2 coding region. Restriction and sequencing analysis of 35 chimeras showed that in all cases, the hybrid sequences were obtained as fusions between continuous sequences specific to PMA1 and PMA2, separated by a region of identity. In all but three cases, the junction sequences were not located at regions of greatest identity. Strikingly, depending on the PMA1 fragment used, junction distribution fell into two categories. In the first, the junctions were scattered over several hundreds of nucleotides upstream of the extremity of the PMA1 fragment, while in the second, they were concentrated at this extremity. Analysis of the alignment of the PMA1 and PMA2 sequences suggests that the distribution is not related to the size of the region of identity at the PMA1-PMA2 boundary but depends on the degree of identity of the PMA genes upstream of the region of identity, with the accumulation of successive mismatches leading to a clustered distribution of the junctions. Moreover, the introduction of seven closely spaced mismatches near the end of a PMA1 segment with an otherwise high level of identity with PMA2 led to a significantly increased concentration of the junctions near this end. These data show that a low level of identity in the vicinity of the common boundary stretch is a strong barrier to recombination. In contrast, consecutive mismatches or regions of overall moderate identity that are located several hundreds of nucleotides upstream from the PMA1 end do not necessarily block recombination.

15.
The novel human papillomavirus type 199 (HPV199) was initially identified in a nasopharyngeal swab sample obtained from a 25-year-old immunocompetent male. The complete genome of HPV199 is 7,184 bp in length with a GC content of 36.5%. Comparative genomic characterization of HPV199 and its closest relatives showed the classical genomic organization of Gammapapillomaviruses (Gamma-PVs). HPV199 has seven major open reading frames (ORFs), encoding five early (E1, E2, E4, E6, and E7) and two late (L1 and L2) proteins, while lacking the E5 ORF. The long control region (LCR) of 513 bp is located between the L1 and E6 ORFs. Phylogenetic analysis additionally confirmed that HPV199 clusters into the Gamma-PV genus, species Gamma-12, which also contains HPV127, HPV132, HPV148, HPV165, and three putative HPV types: KC5, CG2 and CG3. HPV199 is most closely related to HPV127 (nucleotide identity 77%). The complete viral genome sequence of an additional HPV199 isolate was determined from an anal canal swab sample. The two complete HPV199 viral sequences exhibit 99.4% nucleotide identity. To the best of our knowledge, this is the first member of Gamma-PV with complete nucleotide sequences determined from two independent clinical samples. To evaluate the tissue tropism of the novel HPV type, 916 clinical samples were tested using HPV199 type-specific real-time PCR: HPV199 was detected in 2/76 tissue samples of histologically confirmed common warts, 2/108 samples of eyebrow hair follicles, 2/137 anal canal swabs obtained from individuals with clinically evident anal pathology, 4/184 nasopharyngeal swabs and 3/411 cervical swabs obtained from women with normal cervical cytology. Although HPV199 was found in only 1.4% of cutaneous and mucosal samples, it exhibits dual tissue tropism. According to the results of our study and literature data, dual tropism of all Gamma-12 members is highly possible.

16.
An algorithm is presented for localizing variable and constant regions in homologous protein sequences. A set of aligned protein sequences is divided into two groups consisting of m and n sequences. Each group contains sequences of the most closely related species. The position dissimilarity of proteins from the two groups of m and n sequences is defined as the number of mismatches among all possible m×n pairs of amino acid residues at the position (one residue from each group), divided by m×n. The position dissimilarity of the m protein sequences within a group is defined as the number of mismatches among all possible m×(m-1)/2 pairs of amino acid residues, divided by m×(m-1)/2. The ten-position average of the dissimilarity values is plotted against the number of the first position. The area enclosed between the profile of dissimilarity values and its mean value line characterizes the overall irregularity of amino acid substitutions along the protein sequences. If the area is greater than the average area for 1000 random profiles by more than two standard deviations, the profile extrema containing the "surplus" area are cut off. The cut-off stretches are likely to be variable and constant regions. For "between groups" comparisons, the overall irregularity of amino acid substitutions is found to be very high for all the protein families considered: phospholipases A2, aspartate aminotransferases, alpha-subunits of Na+,K(+)-ATPase, L- and M-subunits of the photoreaction centre of photosynthetic bacteria, and human rhodopsins.
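The two dissimilarity definitions and the ten-position average can be sketched as below. The function names are hypothetical, alignments are assumed to be equal-length strings, and the later steps of the algorithm (comparison against 1000 random profiles and cutting off the extrema) are not implemented.

```python
from itertools import combinations, product


def between_group_dissimilarity(col_a, col_b):
    """Fraction of mismatching residue pairs among all m*n cross-group pairs."""
    pairs = list(product(col_a, col_b))
    if not pairs:
        raise ValueError("both groups must contain at least one sequence")
    return sum(x != y for x, y in pairs) / len(pairs)


def within_group_dissimilarity(col):
    """Fraction of mismatching residue pairs among all m*(m-1)/2 pairs."""
    pairs = list(combinations(col, 2))
    if not pairs:
        raise ValueError("the group must contain at least two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def sliding_mean(values, window=10):
    """Mean of each run of `window` consecutive values, indexed by first position."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def between_group_profile(group_a, group_b, window=10):
    """Smoothed between-group dissimilarity profile for two aligned groups."""
    per_position = [
        between_group_dissimilarity(col_a, col_b)
        for col_a, col_b in zip(zip(*group_a), zip(*group_b))
    ]
    return sliding_mean(per_position, window)
```

Each alignment column is passed as a sequence of residues, so `zip(*group)` turns a list of aligned sequences into its columns.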

17.
Measuring in a quantitative, statistical sense the degree to which structural and functional information can be "transferred" between pairs of related protein sequences at various levels of similarity is an essential prerequisite for robust genome annotation. To this end, we performed pairwise sequence, structure and function comparisons on approximately 30,000 pairs of protein domains with known structure and function. Our domain pairs, which are constructed according to the SCOP fold classification, range in similarity from just sharing a fold to being nearly identical. Our results show that traditional scores for sequence and structure similarity have the same basic exponential relationship as observed previously, with structural divergence, measured in RMS, being exponentially related to sequence divergence, measured in percent identity. However, as the scale of our survey is much larger than that of any previous investigation, our results have greater statistical weight and precision. We have been able to express the relationship of sequence and structure similarity using more "modern scores," such as Smith-Waterman alignment scores and probabilistic P-values for both sequence and structure comparison. These modern scores address some of the problems with traditional scores, such as determining a conserved core and correcting for length dependency; they enable us to phrase the sequence-structure relationship in more precise and accurate terms. We found that the basic exponential sequence-structure relationship is very general: the same essential relationship is found in the different secondary-structure classes and is evident in all the scoring schemes. To relate function to sequence and structure, we assigned various levels of functional similarity to the domain pairs, based on a simple functional classification scheme. This scheme was constructed by combining and augmenting annotations in the enzyme and fly functional classifications and comparing subsets of these to the Escherichia coli and yeast classifications. We found sigmoidal relationships between similarity in function and sequence, with clear thresholds for different levels of functional conservation. For pairs of domains that share the same fold, precise function appears to be conserved down to approximately 40% sequence identity, whereas broad functional class is conserved to approximately 25%. Interestingly, percent identity is more effective at quantifying functional conservation than the more modern scores (e.g. P-values). Results of all the pairwise comparisons and our combined functional classification scheme for protein structures can be accessed from a web database at http://bioinfo.mbb.yale.edu/align. Copyright 2000 Academic Press.

18.
Summary. The haploid genomes of all known primates have two or more adult α-globin genes contained within tandemly arranged duplication units. Although the tandem duplication event generating these α-globin loci is believed to have occurred prior to the divergence of primates, a number of length polymorphisms exist within the loci among different primate species. In order to understand the molecular basis of these length polymorphisms, we have cloned and determined the nucleotide sequence of a major portion of the rhesus monkey adult α-globin locus. Sequence comparison to human suggests that the length difference between the adult α-globin loci of human and Old World monkey is the result of one or more DNA recombination processes, all of which appear to be related to the transposition of Alu family repeats. First, the finding of a monomeric Alu family repeat at the junction between nonhomology block I and homology block Y of the α2 gene-containing unit in rhesus macaque suggests that the dimeric Alu family repeat, Alu 3, at the orthologous position in human was generated by insertion of a monomeric Alu family repeat into the 3′ end of another preexisting Alu family repeat. Second, two Alu family repeats, Alu 1 and Alu 2, exist in human at the 3′ end of each of the two X homology blocks, respectively. However, this pair of paralogous Alu family repeats is absent at the corresponding positions in rhesus macaque. This raises interesting questions regarding the evolutionary origin of Alu 1 and Alu 2. Finally, DNA sequences immediately downstream from the insertion site of Alu 2 are completely different between human and rhesus macaque. This last event is similar to DNA rearrangements occurring near transposable elements in the chromosomes of bacteria, yeast, and plant cells. Its possible role in accelerating the genomic evolution of noncoding or spacer DNA is discussed.

19.
We discuss the statistical significance of local similarities found between DNA sequences, and illustrate the procedure with reference to the Queen and Korn algorithm. If the longest similarity found for two sequences has length L, this length is said to be significant at the 5% level if there is a probability of no more than 0.05 of finding a length of L or greater between a pair of sequences consisting of randomly chosen bases with the same overall base frequencies. The distribution of longest lengths is related to that of lengths from any particular pair of starting positions on the two sequences. For our implementation of the Queen and Korn algorithm, this latter distribution is constructed by combining the five different blocks of bases that may be added to extend a similarity. A table is given to assess the significance of longest similarities in sequences of length up to 1000 bases. Quite long similarities are expected to occur by chance alone. The critical values we calculate for assessing significance are preferable to expected numbers of similarities used by some commercial computer packages.
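The significance criterion above can be approximated empirically by a Monte Carlo sketch. Two simplifications are assumed here and are not taken from the abstract: similarity is reduced to the longest exact common substring (the Queen and Korn algorithm also allows imperfect matches), and random sequences are produced by shuffling, which preserves each sequence's base composition. The abstract's own approach derives the distribution analytically rather than by simulation.

```python
import random


def longest_common_substring_length(s, t):
    """Length of the longest exact common substring, by dynamic programming."""
    prev = [0] * (len(t) + 1)
    best = 0
    for ch in s:
        cur = [0] * (len(t) + 1)
        for j, tj in enumerate(t, start=1):
            if ch == tj:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def empirical_p_value(s, t, trials=200, seed=0):
    """Fraction of shuffled pairs whose longest match is at least as long
    as the one observed between s and t."""
    rng = random.Random(seed)
    observed = longest_common_substring_length(s, t)
    hits = 0
    for _ in range(trials):
        rs = "".join(rng.sample(list(s), len(s)))  # shuffle keeps base counts
        rt = "".join(rng.sample(list(t), len(t)))
        if longest_common_substring_length(rs, rt) >= observed:
            hits += 1
    return hits / trials
```

Under these assumptions, an observed longest match would be called significant at the 5% level when `empirical_p_value(s, t)` is at most 0.05; more trials give a finer estimate.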

20.
Comparison of latent and nominal rabbit Ig VHa1 allotype cDNA sequences   Total citations: 1 (self-citations: 0; citations by others: 1)
The genetic basis for the expression of a latent VH allotype in the rabbit was investigated. VH region cDNA libraries were produced from spleen mRNA derived from a homozygous a2a2 rabbit expressing an induced latent VHa1 allotype and, for comparison, from a normal homozygous a1a1 rabbit expressing nominal VHa1 allotype. The deduced amino acid sequences of the nominal VHa1 cDNA were concordant with previously published VHa1 protein sequences. A comparison of two complete VH-DH-JH and six partial VHa1 sequences reveals highly conserved sequence within VH framework regions (FR) and considerable diversity in complementarity-determining regions and D region sequences. Two functional JH genes or alleles are evident. Amino acid sequencing of the N-terminal 15 residues of pooled affinity-purified latent VHa1 H chain showed complete sequence identity with the nominal VHa1 sequences. Possible latent VHa1-encoding cDNA clones, derived from the a2a2 rabbit, were selected by hybridization with oligonucleotide probes corresponding to the VHa1 allotype-associated segments of the first and third framework regions (FR1 and FR3). cDNA sequence analysis reveals that the 5' untranslated regions of nominal and latent VHa1 cDNA were virtually identical to each other and to previously reported sequences associated with VHa2 and VHa-negative genes. Moreover, some latent VHa1 genes encode FR1 segments that are essentially homologous to the corresponding segment of a nominal VHa1 allotype. In contrast, other putative latent genes display blocks of VHa1 sequence in either FR1 or FR3 that are flanked by blocks of sequence identical to other rabbit VH genes (i.e., VHa2 or VHa-negative). These composite sequences may be directly encoded by composite germ-line VH genes or may be the products of somatically generated recombination or gene conversion between genes encoding latent and nominal allotypes. The data do not support the hypothesis that latent genes are the result of extensive modification by somatic point mutation.
