Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Genomics, 2019, 111(6):1590-1603
Genomes are not random sequences because natural selection has injected information in biological sequences for billions of years. Inspired by this idea, we developed a simple method to compare genomes considering nucleotide counts in subsequences (blocks) instead of their exact sequences. We introduce the Block Alignment method for comparing two genomes and, based on this comparison method, define a similarity score and a distance. The presented model ignores nucleotide order in the sequence. On the other hand, because this block comparison method excludes point mutations and small size variations, there is no need for high-coverage sequencing, which is responsible for the high costs of data production and storage; moreover, the sequence comparisons can be performed at higher speed. Phylogenetic trees of two sets of bacterial genomes were constructed and the results were in full agreement with their previously constructed phylogenetic trees. Furthermore, a weighted and directed similarity network of each set of bacterial genomes was inferred ab initio by this model. Remarkably, the communities of these networks agree with the clades of the corresponding phylogenetic trees, which means these similarity networks also contain phylogenetic information about the genomes. Moreover, the block comparison method was used to distinguish rob(15;21)c-associated iAMP21 and sporadic iAMP21 rearrangements in subgroups of chromosome 21 in acute lymphoblastic leukemia. Our results show a meaningful difference between the number of contigs that mapped to chromosomes 15 and 21 in these cases. Furthermore, the presented block alignment model can select candidate blocks for more accurate analysis and is capable of finding conserved blocks in a set of genomes.
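
The idea of comparing blocks by nucleotide counts rather than by exact sequence can be illustrated with a minimal Python sketch. The abstract does not define the similarity score, so the scaled L1 distance, the fixed non-overlapping blocks, and the function names `block_counts`, `block_distance`, and `similarity` are all assumptions made for illustration, not the published Block Alignment method.

```python
from collections import Counter


def block_counts(seq, block_size):
    """Split seq into consecutive non-overlapping blocks; count A, C, G, T in each."""
    blocks = []
    for start in range(0, len(seq) - block_size + 1, block_size):
        counts = Counter(seq[start:start + block_size])
        blocks.append(tuple(counts.get(base, 0) for base in "ACGT"))
    return blocks


def block_distance(a, b, block_size):
    """L1 distance between two count vectors, scaled to [0, 1] (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * block_size)


def similarity(seq1, seq2, block_size=4):
    """Mean of (1 - block distance) over blocks paired by position (hypothetical score)."""
    pairs = list(zip(block_counts(seq1, block_size), block_counts(seq2, block_size)))
    if not pairs:
        raise ValueError("sequences are shorter than one block")
    return sum(1 - block_distance(a, b, block_size) for a, b in pairs) / len(pairs)
```

Because only counts are compared, `similarity("ACGTACGT", "TGCAGTCA")` treats the two sequences as identical even though their nucleotide order differs, which mirrors the abstract's statement that the model ignores order within a block.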

2.
The small multidrug resistance (SMR) protein family is a bacterial multidrug transporter family. As their name suggests, SMR proteins are small: approximately 100-140 amino acids in length, forming four transmembrane alpha-helices. Since their designation as a family, many homologues have been identified and characterized both structurally and functionally. In this review the topology, structure, drug resistance, drug binding, and transport mechanisms of the entire SMR protein family are examined. An updated bioinformatic analysis of predicted and characterized SMR protein family members was also conducted. Based on SMR sequence alignments and phylogenetic analysis of current members, we propose that this small multidrug resistance transporter family should be expanded into three subclasses: (i) the small multidrug pumps (SMP), (ii) suppressor of groEL mutation proteins (SUG), and (iii) paired small multidrug resistance proteins (PSMR). The roles of these three SMR subclasses are examined, and the well-characterized members, such as Escherichia coli EmrE and SugE, are described in terms of their function and structural organization.

3.
Position-specific substitution matrices, known as profiles, derived from multiple sequence alignments are currently used to search sequence databases for distantly related members of protein families. The performance of the database searches is enhanced by using (i) a sequence weighting scheme which assigns higher weights to more distantly related sequences based on branch lengths derived from phylogenetic trees, (ii) exclusion of positions with mainly padding characters at sites of insertions or deletions and (iii) the BLOSUM62 residue comparison matrix. A natural consequence of these modifications is an improvement in the alignment of new sequences to the profiles. However, the accuracy of the alignments can be further increased by employing a similarity residue comparison matrix. These developments are implemented in a program called PROFILEWEIGHT which runs on Unix and Vax computers. The only input required by the program is the multiple sequence alignment. The output from PROFILEWEIGHT is a profile designed to be used by existing searching and alignment programs. Test results from database searches with four different families of proteins show the improved sensitivity of the weighted profiles.

4.
The primary structure of a novel adenoviral protein referred to as p32K and found exclusively in members of the proposed new genus Atadenovirus was analyzed. The p32K gene sequence was determined from two bovine and one snake adenovirus types. Altogether five different p32K sequences were examined, two of which were obtained from GenBank. The C-terminal part of the protein is conserved and shares similarity with certain bacterial small acid soluble proteins (SASPs). The sequence similarity seems coupled with functional relatedness, i.e. both protein groups are found in structures where the genome of the “dormant” organism is packaged in tight nucleoprotein complexes. In these complexes the DNA is protected against harmful environmental effects until the new reproductive cycle is started with specific protease cleavage of the packaging proteins. Although there is no experimental clue about the role of the p32K proteins, we hypothesize a phylogenetic relationship between the two protein groups based on the sequence similarity and the supposed functional similarity. The alignments of these protein groups show that the conserved part of the p32Ks is probably the result of the duplication of a shorter sequence similar to the SASPs of the Bacilli.

6.
Identification and classification of G-protein coupled receptors (GPCRs) using protein sequences is an important computational challenge, given that experimental screening of thousands of ligands is an expensive proposition. There are two distinct but complementary approaches to GPCR classification: machine learning and sequence motif analysis. Machine learning methodologies typically suffer from problems of class imbalance and lack of multi-class classification. Many sequence motif methods, meanwhile, are too dependent on the similarity of the primary sequence alignments. It is desirable to have a motif discovery and application methodology that is not strongly dependent on primary sequence similarity. It should also overcome limitations of machine learning. We propose and evaluate the effectiveness of a simple methodology that uses a reduced protein functional alphabet representation, where similar functional residues have similar symbols. Regular expression motifs can then be obtained by ClustalW-based multiple sequence alignment, using an identity matrix. Since evolutionary matrices such as BLOSUM and PAM are not used, this method can be useful for any set of sequences that do not necessarily share a common ancestry. Reduced alphabet motifs can accurately classify known GPCR proteins and the results are comparable to PRINTS and PROSITE. For well-known GPCR proteins from SWISSPROT, there were no false negatives and only a few false positives. This methodology covers most currently known classes of GPCRs, even if there are very few representative sequences. It also predicts more than one class for certain sequences, thus overcoming the limitation of machine learning methods. We also annotated 695 orphan receptors, of which 121 were identified as belonging to Family A. A simple JavaScript-based web interface has been developed to predict GPCR families and subfamilies (www.insilico-consulting.com/gpcrmotif.html).
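
A reduced-alphabet motif of the kind described above can be sketched in a few lines of Python. The residue grouping in `GROUPS` is an assumed physicochemical grouping chosen for illustration (the abstract does not list the alphabet used), and `column_motif` expects sequences that are already aligned and gap-free, standing in for the ClustalW step.

```python
import re

# Hypothetical functional grouping; not the alphabet used in the paper.
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic ring
    "p": "STNQ",    # polar, uncharged
    "b": "KR",      # basic
    "n": "DE",      # acidic (negative)
    "g": "GP",      # glycine / proline
}
REDUCE = {res: sym for sym, residues in GROUPS.items() for res in residues}


def reduce_sequence(seq):
    """Map each residue to its group symbol; unknown residues become 'x'."""
    return "".join(REDUCE.get(res, "x") for res in seq.upper())


def column_motif(aligned):
    """Build a regex from equal-length, gap-free reduced sequences:
    one literal or character class per alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must have equal length")
    parts = []
    for column in zip(*aligned):
        symbols = sorted(set(column))
        parts.append(symbols[0] if len(symbols) == 1 else "[" + "".join(symbols) + "]")
    return "".join(parts)


motif = column_motif([reduce_sequence("DRYL"), reduce_sequence("ERWS")])
hit = re.search(motif, reduce_sequence("AAERFTKK"))
```

Here the two training fragments share no identical residues beyond R, yet they reduce to nearly the same string, so the derived motif also matches the unseen fragment `ERFT`. That is the sense in which such motifs depend less on primary sequence identity.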

7.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole-genome sequences that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
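
The angle-based distance between tetrapeptide vectors can be illustrated as follows. This is a simplified sketch: it works on raw tetrapeptide counts and omits the singular value decomposition step that the abstract describes, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(protein, k=4):
    """Count overlapping k-mers (tetrapeptides by default) in a protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def angle_distance(c1, c2):
    """Angle in radians between two sparse count vectors."""
    dot = sum(value * c2[key] for key, value in c1.items())  # missing keys count as 0
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("sequence too short to contain any k-mer")
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))  # guard against rounding
    return math.acos(cosine)


def organism_vector(proteins, k=4):
    """Sum the k-mer vectors of all proteins to represent one organism."""
    total = Counter()
    for protein in proteins:
        total.update(kmer_counts(protein, k))
    return total
```

A matrix of pairwise `angle_distance` values between organism vectors could then be passed to any distance-based tree-building method. Because no alignment is involved, an insertion or deletion only disturbs the few k-mers that overlap it.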

8.
We have performed an amino acid composition (AAC) analysis of the complete sequences for 235 secondary transport proteins from Escherichia coli, which have functions in the uptake and export of organic and inorganic metabolites, efflux of drugs and in controlling membrane potential. This revealed the trends in content for specific amino acid types and for combinations of amino acids with similar physicochemical properties. In certain proteins or groups of proteins, so-called spikes of high content for a specific amino acid type or combination of amino acids were identified and confirmed statistically, which in some cases could be directly related to function and ligand specificity. This was prevalent in proteins with a function of multidrug or metal ion efflux. Any tool that can help in identifying bacterial multidrug efflux proteins is important for a better understanding of this mechanism of antibiotic resistance. Phylogenetic analysis based on sequence alignments and comparison of sequences at the N- and C-terminal ends confirmed transporter family classification. Locations of specific amino acid types in some of the proteins that have crystal structures (EmrE, LacY, AcrB) were also considered to help link amino acid content with protein function. Though there are limitations, this work has demonstrated that a basic analysis of AAC is a useful tool to use in combination with other computational and experimental methods for classifying and investigating function and ligand specificity in a large group of transport or other membrane proteins, including those that are molecular targets for development of new drugs.

9.
There are numerous examples of convergent evolution in nature. Major ecological adaptations such as flight, loss of limbs in vertebrates, pesticide resistance, adaptation to a parasitic way of life, etc., have all evolved more than once, as seen by their analogous functions in separate taxa. But what about protein evolution? Does the environment influence the intracellular processes carried out by enzymes and other functional proteins strongly enough for similar functional roles to evolve separately in different organisms? Manganese Superoxide Dismutase (MnSOD) is a manganese-dependent metallo-enzyme which plays a crucial role in protecting cells from oxidative stress by eliminating reactive oxygen species (superoxide). It is a ubiquitous housekeeping enzyme found in nearly all organisms. In this study we compare phylogenies based on MnSOD protein sequences to those based on scores from Hydrophobic Cluster Analysis (HCA). We calculated HCA similarity values for each pair of taxa to obtain a pair-wise distance matrix. A UPGMA tree based on the HCA distance matrix and a common tree based on the primary protein sequence for MnSOD were constructed. Differences between these two trees within animals, Enterobacteriaceae, Planctomycetes and Cyanobacteria are presented and cited as possible examples of convergence. We note that several residue changes result in changes in hydrophobicity at positions which apparently are under the effect of positive selection.
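
For readers unfamiliar with UPGMA, the following Python sketch shows how a tree topology is built from a pairwise distance matrix by repeatedly merging the two closest clusters. It is a generic textbook version, not the authors' pipeline: the input format (a dict keyed by `frozenset` pairs) is an assumption, and branch heights are omitted for brevity.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the UPGMA topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of taxa it contains
    d = dict(dist)                       # working copy, extended with merged clusters
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the distances from its two children.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))
```

Running `upgma` once on a distance matrix derived from HCA scores and once on a matrix derived from primary sequences would give two topologies whose differences can be inspected, which is the kind of comparison the abstract describes.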

10.
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success in detecting coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; this signature is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

11.
Classification of bacteria is mainly based on sequence comparisons of certain homologous genes such as 16S rRNA. Recently there have been attempts to classify bacteria using the oligonucleotide frequency patterns of nonhomologous sequences. However, the evolutionary significance of oligonucleotides longer than tetranucleotides has not been well studied. We performed phylogenetic analysis using the Euclidean distances calculated from the di- to deca-nucleotide frequencies in bacterial genomes, and compared these oligonucleotide frequency-based tree topologies with those for the 16S rRNA gene and seven concatenated genes. When oligonucleotide frequency-based trees were constructed for bacterial species with similar GC content, their topologies at genus and family level were congruent with those based on homologous genes. Our results suggest that oligonucleotide frequency is useful not only for classification of bacteria, but also for estimation of their phylogenetic relationships for closely related species.

13.
SimShift: identifying structural similarities from NMR chemical shifts (cited 3 times: 0 self-citations, 3 by others)
MOTIVATION: An important quantity that arises in NMR spectroscopy experiments is the chemical shift. The interpretation of these data is mostly done by human experts; to our knowledge there are no algorithms that predict protein structure from chemical shift sequences alone. One approach to facilitate this process could be to compare two such sequences, where the structure of one protein has already been resolved. Our claim is that similarity of chemical shifts thereby found implies structural similarity of the respective proteins. RESULTS: We present an algorithm to identify structural similarities of proteins by aligning their associated chemical shift sequences. To evaluate the correctness of our predictions, we propose a benchmark set of protein pairs that have high structural similarity, but low sequence similarity (because with high sequence similarity the structural similarities could easily be detected by a sequence alignment algorithm). We compare our results with those of HHsearch and SSEA and show that our method outperforms both in >50% of all cases.

14.
We have collected a set of 44 Arabidopsis proteins with similarity to the USPA (universal stress protein A of Escherichia coli) domain of bacteria. The USPA domain is found either in small proteins, or it makes up the N-terminal portion of a larger protein, usually a protein kinase. Phylogenetic tree analysis based upon a multiple sequence alignment of the USPA domains shows that these domains of protein kinases 1.3.1 and 1.3.2 form distinct groups, as do the protein kinases 1.4.1. This indicates that their USPA domain structures have diverged appreciably and suggests that they may subserve distinct cellular functions. Two USPA fold classes have been proposed: one based on Methanococcus jannaschii MJ0577 (1MJH) that binds ATP, and the other based on the Haemophilus influenzae universal stress protein (1JMV), highly similar to E. coli UspA, which does not bind ATP. A set of common residues involved in ATP binding in 1MJH and conserved in similar bacterial sequences is also found in a distinct cluster of Arabidopsis sequences. Threading analysis, which examines aspects of secondary and tertiary structure, confirms this Arabidopsis sequence cluster as highly similar to 1MJH. This structural approach can distinguish between the characteristic fold differences of 1MJH-like and 1JMV-like bacterial proteins and was used to assign the complete set of candidate Arabidopsis proteins to one of these fold classes. It is clear that all the plant sequences have arisen from a 1MJH-like ancestor.

15.
Although it is well known that there is no long-range colinearity in gene order in bacterial genomes, it is thought that there are several regions that are under strong structural constraints during evolution, in which gene order is extremely conserved. One such region is the str locus, containing the S10-spc-alpha operons. These operons contain genes coding for ribosomal proteins and a number of housekeeping genes. We compared the organisation of these gene clusters in 111 sequenced prokaryotic genomes (99 bacterial and 12 archaeal genomes). We also compared the organisation to the phylogeny based on 16S ribosomal RNA gene sequences and the sequences of the ribosomal proteins L22, L16 and S14. Our data indicate that there is much variation in gene order and content in these gene clusters, in both bacterial and archaeal genomes, and that differential gene loss has occurred on multiple occasions during evolution. We also noted several discrepancies between phylogenetic trees based on 16S rRNA gene sequences and sequences of ribosomal proteins L16, L22 and S14, suggesting that horizontal gene transfer played a significant role in the evolution of the S10-spc-alpha gene clusters.

17.
We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.

18.
Members of the immunoglobulin superfamily in bacteria. (cited 4 times: 0 self-citations, 4 by others)
We report a prediction that two prokaryotic proteins contain immunoglobulin superfamily domains. Immunoglobulin-like folds have been identified previously in prokaryotic proteins, but these share no recognizable sequence similarity with eukaryotic immunoglobulin superfamily (IgSF) folds, and may be the result of the physics and chemistry of proteins favoring certain common folds. In contrast, the prokaryotic proteins identified have sequences whose match to the immunoglobulin superfamily can be detected by hidden Markov modeling, BLASTP matches, key residue analysis, and secondary structure predictions. We propose that these prokaryotic immunoglobulin-like domains are almost certain to be related by divergence from a common ancestor to eukaryotic immunoglobulin superfamily domains.

19.
The nucleotide-binding-site and leucine-rich-repeat (NBS–LRR) class of R proteins is abundant and widely distributed in plants. By using degenerate primers designed on the NBS domain in lettuce, we amplified sequences in sugar pine that shared sequence identity with many of the NBS–LRR class resistance genes catalogued in GenBank. The polymerase chain reaction products were used to probe a cDNA library constructed from needle tissue of sugar pine seedlings. A full-length cDNA was obtained that demonstrated high predicted amino acid sequence similarity to the coiled coil (CC)–NBS–LRR subclass of NBS–LRR resistance proteins in GenBank. Sequence analyses of this gene in megagametophytes from two sugar pine trees segregating for the hypersensitive response to white pine blister rust revealed zero nucleotide variation. Moreover, there was no variation found in 24 unrelated sugar pine trees except for three single-nucleotide polymorphisms located in the 3′ untranslated region. Compared to other genes sequenced in Pinaceae, such a low level of sequence variation in unrelated individuals is unusual. Although numerous studies have reported that plant R genes are under diversifying selection for specificity to evolving pathogens, the resistance gene analog discussed here appears to be under intense purifying selection. An erratum to this article can be found at
