Similar Articles
1.
A novel sequence-analysis technique for detecting correlated amino acid positions in intermediate-size protein families (50-100 sequences) was developed and applied to study voltage-dependent gating of potassium channels. Most contemporary methods for detecting amino acid correlations within proteins use very large data sets, typically comprising hundreds or thousands of evolutionarily related sequences, to overcome the relatively low signal-to-noise ratio in the analysis of co-variation between pairs of amino acid positions. Such methods are impractical for voltage-gated potassium (Kv) channels and for many other protein families that have not yet been sequenced to that extent. Here, we used a phylogenetic reconstruction of paralogous Kv channels to follow the evolutionary history of every pair of amino acid positions within this family, thus increasing the accuracy of detecting correlated amino acids relative to contemporary methods. In addition, we used a bootstrapping procedure to eliminate statistically insignificant correlations. These and other measures increased the method's sensitivity and opened the way to reliable identification of correlated positions even in intermediate-size protein families. Principal-component analysis applied to the set of correlated amino acid positions in Kv channels detected a network of inter-correlated residues, a large fraction of which were identified as gating-sensitive upon mutation. Mapping the network of correlated residues onto the 3D structure of the Kv channel from Aeropyrum pernix disclosed correlations between residues in the voltage-sensor paddle and the pore region, including regions that are involved in the gating transition. We discuss these findings with respect to the evolutionary constraints acting on the channel's various domains. The software is available on our website.
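The abstract's method relies on a phylogenetic reconstruction and a bootstrapping step whose details it does not give. As a much simpler stand-in (explicitly not the authors' method, and ignoring phylogeny entirely), the sketch below scores a pair of alignment columns by mutual information and filters for significance with a permutation test. It only illustrates what "correlated positions" and a significance filter mean in practice.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two alignment columns of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Share of random shuffles of y whose MI with x reaches the observed MI.

    Shuffling y breaks any x-y association while keeping each column's
    residue composition, giving a simple null distribution.
    """
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 avoids ever reporting p = 0
```

Columns can be pulled from an alignment of equal-length strings with `[seq[i] for seq in alignment]`. With 50-100 sequences, phylogenetic redundancy inflates such naive scores, which is the problem the authors' phylogeny-based approach is meant to address.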

2.
The genetic architecture of resistance
Plant resistance genes (R genes), especially the nucleotide binding site leucine-rich repeat (NBS-LRR) family of sequences, have been extensively studied in terms of structural organization, sequence evolution and genome distribution. These studies indicate that NBS-LRR sequences can be split into two related groups that have distinct amino-acid motif organizations, evolutionary histories and signal transduction pathways. One NBS-LRR group, characterized by the presence of a Toll/interleukin receptor domain at the amino-terminal end, seems to be absent from the Poaceae. Phylogenetic analysis suggests that a small number of NBS-LRR sequences existed among ancient Angiosperms and that these ancestral sequences diversified after the separation into distinct taxonomic families. There are probably hundreds, perhaps thousands, of NBS-LRR sequences and other types of R gene-like sequences within a typical plant genome. These sequences frequently reside in 'mega-clusters' consisting of smaller clusters with several members each, all localized within a few million base pairs of one another. The organization of R-gene clusters highlights a tension between diversifying and conservative selection that may be relevant to gene families that are unrelated to disease resistance.

3.
Lv HJ, Huang Y. Zoological Research, 2012, 33(3): 319-328
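Based on complete COI gene sequences from 56 orthopteran species, this study reconstructed the phylogenetic relationships among selected taxa of the order Orthoptera and also assessed how reliably the amino acid sequences encoded by COI resolve orthopteran phylogeny. The COI sequences were partitioned into first, second and third codon positions, and partitioned Bremer support (PBS) values were calculated for each partition to evaluate the strength of phylogenetic signal at the different codon positions of this protein-coding gene. The results support the monophyly of the suborders Ensifera and Caelifera. None of the five families Acrididae, Catantopidae, Oedipodidae, Arcypteridae and Gomphoceridae is monophyletic; the genetic distances among these five families range from 0.107 to 0.153, smaller than those among the other families, which is consistent with the classification that merges them into a single family (Acrididae). Chrotogonidae and Pyrgomorphidae are placed together in Pyrgomorphoidea, and Pamphagidae stands as a separate family, which also agrees with the system of Otte (1997). The PBS values indicate that third and first codon positions contribute more support to the tree's branches than second positions, and that longer sequences contain more informative sites. The study also confirms that interspecific genetic distances in COI are a feasible tool for delimiting family-level taxa in Orthoptera.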
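The abstract partitions COI by codon position and compares families by genetic distance, but it does not say which distance model was used. A minimal sketch, assuming in-frame aligned sequences that start at a first codon position and using the uncorrected p-distance (corrected models such as Kimura 2-parameter are common alternatives):

```python
def codon_partitions(cds):
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    return cds[0::3], cds[1::3], cds[2::3]


def p_distance(a, b):
    """Uncorrected p-distance: share of compared sites that differ.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_by_codon_position(a, b):
    """p-distance computed separately for 1st, 2nd and 3rd codon positions."""
    return tuple(p_distance(pa, pb)
                 for pa, pb in zip(codon_partitions(a), codon_partitions(b)))
```

In most protein-coding genes, third positions typically diverge fastest because many of their changes are synonymous. That is one reason per-position partitions are worth examining separately.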

4.
Identifying variants from high-throughput sequencing data is currently challenging because true biological variants can be indistinguishable from technical artifacts. One source of technical artifacts is the incorrect alignment of experimentally observed sequences to their true genomic origin (‘mismapping’), followed by inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole-exome sequences derived from the reference genome, aligns these sequences with custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost entirely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. Using an RNA-seq glioblastoma cell line data set, we demonstrate how filtering against blacklist positions reduces the number of potential false variants. In summary, accounting for mapping-caused variants tuned to the experimental setup reduces false positives and therefore improves genome characterization by high-throughput sequencing.
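Filtering against a blacklist, as described above, amounts to a set lookup keyed on position and allele. A minimal sketch, assuming a hypothetical tab-separated blacklist with chromosome, 1-based position and alternate-allele columns (BlackOPs' real output format is not given in the abstract) and variant calls already parsed into simple records:

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based coordinate
    ref: str
    alt: str


def load_blacklist(lines):
    """Parse 'chrom<TAB>pos<TAB>alt' lines into a set of lookup keys.

    This three-column layout is a hypothetical example, not BlackOPs' documented format.
    """
    blacklist = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom, pos, alt = line.split("\t")[:3]
        blacklist.add((chrom, int(pos), alt))
    return blacklist


def filter_variants(variants, blacklist):
    """Split calls into (kept, removed).

    A call is removed only if the same position AND the same alternate
    allele are blacklisted, since the abstract describes blacklisting
    positions together with alleles.
    """
    kept, removed = [], []
    for v in variants:
        if (v.chrom, v.pos, v.alt) in blacklist:
            removed.append(v)
        else:
            kept.append(v)
    return kept, removed
```

Matching on the allele as well as the position avoids discarding a different, possibly genuine, alternate allele at a site that is only an artifact for one allele.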

5.
Amino acid sequences of proteinaceous proteinase inhibitors have been extensively analysed for information on the molecular evolution and functional relationships of these proteins. These sequences have been grouped into several well-defined families. It was found that the phylogeny constructed from the sequences of the exposed loop responsible for inhibition has several branches that resemble those obtained from comparisons of the entire sequence. The major branches of the unrooted tree corresponded to the families to which the inhibitors belonged. Further branching is related to the enzyme specificity of the inhibitor. Examination of the active-site loop sequences of trypsin inhibitors revealed strong preferences for specific amino acids at different positions of the loop. These preferences are inhibitor-class specific. Inhibitors active against more than one enzyme occur within a class and conform to the class-specific sequence in their loops. Hence, only a few positions in the loop seem to determine specificity. The ability of inhibitors belonging to different classes to inhibit the same enzyme appears to be a result of convergent evolution.

6.
On the phylogeny of t-RNA's
t-RNA sequences have been aligned to maximize matches in corresponding positions. The sequences have subsequently been divided into two parts, the “squelette” (skeleton) and “muscle” positions. A test of homology based on the binomial approach has been developed and was applied to the “muscle” positions of t-RNA. The phylogenetic relationship and parts of ancestral sequences have been obtained from Val and Ser t-RNA families. The known Leu t-RNA sequences have been shown to be a part of two different homologous families, indicating the possibility that degenerate codons give rise to non-homologous isoaccepting t-RNA's.
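The abstract names a binomial test of homology but does not give its details. One plausible form, offered only as a sketch under stated assumptions, counts identities at the chosen ("muscle") positions of two aligned sequences. It then asks how likely at least that many identities would be if matches arose only by chance, with the per-site chance-match probability estimated from the base composition at those positions (an assumption, not necessarily the author's null model):

```python
from collections import Counter
from math import comb


def chance_match_probability(a, b):
    """Probability that bases drawn from the compositions of a and b are identical."""
    fa, fb = Counter(a), Counter(b)
    return sum((fa[x] / len(a)) * (fb[x] / len(b)) for x in fa)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def homology_pvalue(seq_a, seq_b, positions):
    """Chance of seeing at least the observed number of identities at the
    given aligned positions if the two sequences were unrelated."""
    a = [seq_a[i] for i in positions]
    b = [seq_b[i] for i in positions]
    k = sum(x == y for x, y in zip(a, b))
    p = chance_match_probability(a, b)
    return binomial_upper_tail(k, len(a), p)
```

A small p-value suggests more identities than composition alone would produce, which is the kind of evidence a homology test of this sort relies on.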

7.
Thiamine diphosphate (ThDP)-dependent decarboxylases catalyze both cleavage and formation of C-C bonds in various reactions, which have been assigned to different homologous sequence families. This work compares 53 ThDP-dependent decarboxylases with known crystal structures. Sequence and structural information were analyzed together, and the data were examined for global and local properties by statistical approaches (principal component analysis and principal coordinate analysis) that reduce complexity. The differing results obtained locally and globally, that is, for individual positions compared with the overall protein sequence or structure, revealed challenges in assigning separate homologous families. The methods applied herein support the comparison of enzyme families and the identification of functionally relevant positions. For the family of ThDP-dependent decarboxylases, the findings underline that global sequence identity alone is not sufficient to distinguish enzyme function. Instead, local sequence similarity, defined by comparisons of structurally equivalent positions, allows better navigation within several groups of homologous enzymes. The differentiation between homologous sequences is further enhanced by taking structural information into account, such as BioGPS analysis of active-site properties or pairwise structural superimpositions. The methods applied herein are expected to be transferable to other enzyme families and to facilitate family assignment for homologous protein sequences.
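The abstract uses principal component analysis for complexity reduction without saying how sequences were encoded. A minimal sketch of one common choice, assuming aligned protein sequences one-hot encoded per position (20 amino acids plus gap) and PCA computed from the SVD of the centred matrix. It covers only the sequence side; the structural analyses and principal coordinate analysis are not shown.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap


def one_hot(alignment):
    """Encode equal-length aligned sequences as an (n_seqs, length * 21) 0/1 matrix.

    Characters outside ALPHABET are treated as gaps.
    """
    index = {ch: k for k, ch in enumerate(ALPHABET)}
    width = len(ALPHABET)
    X = np.zeros((len(alignment), len(alignment[0]) * width))
    for s, seq in enumerate(alignment):
        for pos, ch in enumerate(seq):
            X[s, pos * width + index.get(ch, index["-"])] = 1.0
    return X


def pca_scores(X, n_components=2):
    """Coordinates of each row of X on the leading principal components,
    computed from the SVD of the column-centred matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Restricting the alignment to a subset of structurally equivalent positions before encoding gives a "local" view, in contrast to the "global" view from full-length sequences that the abstract compares.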

8.
The 3D structural comparison of families of divergent homologous domains revealed two main populations of hydrophobic amino acids, one with a low and the other with a significantly higher mean solvent accessibility, allowing two regions of the core of protein globular domains to be distinguished. The side chains of hydrophobic amino acids in topologically conserved positions (positions in the structural alignment where only hydrophobic amino acids are found), which we call topohydrophobic positions, are considerably less dispersed than those of the other amino acids (hydrophobic or not). Mean distances between gravity centers of amino acids in topohydrophobic positions are significantly shorter than those for non-topohydrophobic positions and show that the corresponding amino acids are almost all in direct contact in the inner core of globular domains. This study also showed that the small number of topohydrophobic positions is a characteristic of the structural differences between proteins of a family. This criterion is independent of the sequence identity between the sequences and of the root-mean-square distance between their corresponding structures. Using sensitive sequence alignment processes, it will be possible, for many protein families, to identify topohydrophobic positions from sequences only. Proteins 33:329–342, 1998. © 1998 Wiley-Liss, Inc.
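As defined above, topohydrophobic positions are structural-alignment columns in which only hydrophobic residues occur. A minimal sketch, assuming the structural alignment is given as equal-length strings and taking V, I, L, F, M, Y, W as the hydrophobic set (an assumption; the abstract does not list the residues). A gap in any protein disqualifies the column.

```python
# Assumed hydrophobic set; adjust if a different definition is intended.
HYDROPHOBIC = frozenset("VILFMYW")


def topohydrophobic_positions(alignment, hydrophobic=HYDROPHOBIC):
    """0-based columns of a structural alignment in which every protein has a
    hydrophobic residue. Gaps are not in the set, so a gapped column never qualifies."""
    length = len(alignment[0])
    return [i for i in range(length)
            if all(seq[i] in hydrophobic for seq in alignment)]
```

The stricter the hydrophobic set and the more divergent the family, the fewer columns survive, consistent with the abstract's observation that topohydrophobic positions are few.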

9.
Sequence conservation in Alu evolution
A statistical analysis of a set of human genomic Alu elements is based on a published alignment and a recent classification of these sequences. After the Alu sequences are separated into families, the consensus sequences of these families are determined with correct weighting of the unidirectional decay of CG dinucleotides. Because the mutation rate at CG dinucleotides is tenfold greater, an independent clock must be considered separately at every stage of the analysis. When CG and non-CG nucleotide positions are treated separately, the distributions of substitutions relative to the new consensus sequences lie far closer to the expected distributions than does the total diversity. Computer analysis of the folding of RNAs derived from these sequences indicates that RNA secondary structure is conserved among Alu families, suggesting its importance for Alu proliferation and/or function. The folding pattern, further substantiated by a number of compensatory mutations, includes secondary structure domains that are homologous to those observed in 7SL RNA and a defined region of interaction between the two Alu subunits. These results are consistent with a model in which a small number of conserved Alu master genes give rise via retroposition to the numerous copies of Alu pseudogenes, which then diversify by random substitution. The master genes appeared at different periods during evolution, giving rise to different families of Alu sequences.
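The key bookkeeping point in the abstract is that CG dinucleotides mutate tenfold faster and so must be tallied separately from other sites. A minimal sketch of that split, assuming one aligned Alu copy and its family consensus given as equal-length strings (this is not the authors' published procedure):

```python
def classify_substitutions(consensus, seq):
    """Count substitutions relative to a consensus, separately for CpG and non-CpG sites.

    A consensus position counts as a CpG site if it is the C or the G of a
    'CG' dinucleotide in the consensus. Columns with a gap in either
    sequence are skipped.
    Returns {"CG": (substitutions, sites), "nonCG": (substitutions, sites)}.
    """
    cpg_sites = set()
    for i in range(len(consensus) - 1):
        if consensus[i] == "C" and consensus[i + 1] == "G":
            cpg_sites.update((i, i + 1))
    counts = {"CG": [0, 0], "nonCG": [0, 0]}
    for i, (c, s) in enumerate(zip(consensus, seq)):
        if c == "-" or s == "-":
            continue
        bucket = counts["CG" if i in cpg_sites else "nonCG"]
        bucket[1] += 1          # one more site of this class compared
        if c != s:
            bucket[0] += 1      # and it carries a substitution
    return {key: tuple(val) for key, val in counts.items()}
```

Dividing substitutions by sites within each class gives two separate rates, one for each "clock", instead of a single pooled diversity figure.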

10.

Background  

When several hundred or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences from large gene families, are aligned to reconstruct their epidemiological history or phylogeny, analyzing and visualizing the resulting alignment of so many sequences has become a new challenge for computational biologists. Although several tools are available for visualizing very long sequence alignments, few of them are applicable to alignments of many sequences.

11.
P Thuriaux. Biochimie, 1983, 65(10): 585-588
The nucleotide occupancy of 288 tRNA sequences has been analyzed at every position of the standard tRNA sequence, except for the anticodon and the variable regions of the D and V loops. Modified nucleotides were assigned to the canonical nucleotide from which they derive. A χ² test applied at the P = 0.01 level of significance showed family-specific patterns in each of the 6 isoacceptor families (tRNAMet, tRNAPhe, tRNALeu, tRNASer, tRNAVal and tRNAGly) for which enough sequences are known to apply the test. The number of positions showing such a pattern ranged from 6 in the tRNASer and tRNAVal families to 15 in tRNAMet, which consists mostly of initiator tRNAs. Seven positions (12, 22, 31, 39, 44, 59 and 73) showed homologies in at least four families. The location of most homologous nucleotides on the tRNA molecule makes it plausible that they are involved in recognition by the aminoacyl-tRNA synthetase or, in a few cases, in anticodon-codon recognition. A few positions (44, 59, 63) show homologies that are difficult to explain by a common functional constraint according to current ideas on tRNA structure and function.
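A minimal sketch of the per-position test described above, assuming modified nucleotides have already been mapped to A, C, G or U and that one isoacceptor family is compared against all other sequences in a 2 x 4 table (the exact table layout used in the paper is not stated). The critical values are standard χ² quantiles for P = 0.01. With small expected counts the χ² approximation is unreliable, so treat this as illustrative only.

```python
# Chi-square critical values at P = 0.01 for 1-3 degrees of freedom.
CRITICAL_P001 = {1: 6.635, 2: 9.210, 3: 11.345}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Empty rows/columns contribute nothing and are excluded from the df count.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (sum(r > 0 for r in row_sums) - 1) * (sum(c > 0 for c in col_sums) - 1)
    return stat, df


def is_family_specific(family_bases, other_bases, alphabet="ACGU"):
    """True if base occupancy at one position differs between one family
    and all other sequences at P < 0.01 (both inputs must be non-empty)."""
    table = [[family_bases.count(b) for b in alphabet],
             [other_bases.count(b) for b in alphabet]]
    stat, df = chi_square(table)
    return df > 0 and stat > CRITICAL_P001[df]
```

Running `is_family_specific` once per position and per family reproduces the shape of the analysis, although with this many tests some positions would pass at P = 0.01 by chance alone.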

14.
Gene, 1996, 172(1): GC33-GC41
We have developed a fast heuristic algorithm for multiple sequence alignment that provides near-optimal results for sufficiently homologous sequences. The algorithm applies the standard dynamic programming procedure to all pairs of sequences. The resulting score matrices for pairwise alignment give rise to secondary matrices containing the additional charges imposed by forcing the alignment path to run through a particular vertex. Such a constraint corresponds to slicing the sequences at the positions defining that vertex and aligning the remaining pairs of prefix and suffix sequences separately. From these secondary matrices, one can compute, for any given family of sequences, suitable positions for cutting all of these sequences simultaneously. In Divide and Conquer fashion, this reduces the problem of aligning a family of n sequences of average length l to aligning two families of n sequences of approximately half that length. In this paper, we explain the method for the case of 3 sequences in detail, and we demonstrate its potential and its limits by discussing its behaviour for several test families. A generalization for aligning more than 3 sequences is outlined, and some actual alignments constructed by our algorithm for various user-defined parameters are presented.
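The central quantity in this abstract is the additional cost of forcing a pairwise alignment path through a given vertex. A minimal sketch, assuming unit-cost edit distance instead of the paper's scoring scheme and a fixed cut at the midpoint of the first sequence (both assumptions), computes that matrix from a forward pass on prefixes and a backward pass on suffixes. It then picks the cheapest simultaneous cut for three sequences.

```python
def edit_distance_matrix(a, b):
    """D[i][j] = unit-cost edit distance between the prefixes a[:i] and b[:j]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,                          # gap in b
                          D[i][j - 1] + 1,                          # gap in a
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # (mis)match
    return D


def additional_cost_matrix(a, b):
    """C[i][j] = extra cost of forcing the optimal path through vertex (i, j)."""
    n, m = len(a), len(b)
    fwd = edit_distance_matrix(a, b)
    rev = edit_distance_matrix(a[::-1], b[::-1])
    best = fwd[n][m]
    # rev[n - i][m - j] is the distance between the suffixes a[i:] and b[j:].
    return [[fwd[i][j] + rev[n - i][m - j] - best for j in range(m + 1)]
            for i in range(n + 1)]


def best_cut_three(a, b, c):
    """Cut a at its midpoint and choose cut points j in b and k in c that
    minimise the summed additional cost over the three sequence pairs.
    Returns (cost, i, j, k)."""
    i = len(a) // 2
    cab = additional_cost_matrix(a, b)
    cac = additional_cost_matrix(a, c)
    cbc = additional_cost_matrix(b, c)
    cost, j, k = min((cab[i][j] + cac[i][k] + cbc[j][k], j, k)
                     for j in range(len(b) + 1)
                     for k in range(len(c) + 1))
    return cost, i, j, k
```

Every entry of `additional_cost_matrix` is non-negative and is zero exactly on vertices that lie on some optimal path. Recursing on the prefix and suffix triples produced by `best_cut_three` gives the Divide and Conquer scheme in outline.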

15.
Coordinated amino acid changes in homologous protein families
In the tobamovirus coat protein family, amino acid residues at some spatially close positions are found to be substituted in a coordinated manner [Altschuh et al. (1987) J. Mol. Biol., 193, 693]. Therefore, these positions show an identical pattern of amino acid substitutions when amino acid sequences of these homologous proteins are aligned. Based on this principle, coordinated substitutions have been searched for in three additional protein families: serine proteases, cysteine proteases and the haemoglobins. Coordinated changes have been found in all three protein families mostly within structurally constrained regions. This method works with a varying degree of success depending on the function of the proteins, the range of sequence similarities and the number of sequences considered. By relaxing the criteria for residue selection, the method was adapted to cover a broader range of protein families and to study regions of the proteins having weaker structural constraints. The information derived by these methods provides a general guide for engineering of a large variety of proteins to analyse structure-function relationships.
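The criterion above is that coordinated positions show an identical pattern of substitutions across the aligned sequences. A minimal sketch of that strict criterion only (not the relaxed variant the abstract also mentions), assuming an alignment of equal-length strings: each column is reduced to a canonical pattern by relabelling residues in order of first appearance, and variable columns that share a pattern are reported as pairs.

```python
from collections import defaultdict
from itertools import combinations


def substitution_pattern(column):
    """Canonical pattern of a column: residues relabelled by order of first
    appearance, so 'AAGGA' and 'LLVVL' both become (0, 0, 1, 1, 0)."""
    labels = {}
    return tuple(labels.setdefault(r, len(labels)) for r in column)


def coordinated_pairs(alignment):
    """Sorted pairs of 0-based columns whose substitution patterns are identical."""
    groups = defaultdict(list)
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        if len(set(column)) > 1:  # invariant columns carry no substitution signal
            groups[substitution_pattern(column)].append(i)
    return sorted(pair for cols in groups.values()
                  for pair in combinations(cols, 2))
```

With few sequences, unrelated columns can match by chance, which fits the abstract's remark that success depends on the number of sequences considered.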

16.
Throughout evolution, eukaryotic genomes have been invaded by transposable elements (TEs). Little is known about the factors leading to genomic proliferation of TEs, their preferred integration sites and the molecular mechanisms underlying their insertion. We analyzed hundreds of thousands of nested TEs in the human genome, i.e. insertions of TEs into existing ones. We first discovered that most TEs insert within specific ‘hotspots’ along the targeted TE. In particular, retrotransposed Alu elements contain a non-canonical single-nucleotide hotspot for insertion of other Alu sequences. We next devised a method for identifying integration sequence motifs of inserted TEs that are conserved within the targeted TEs. This method revealed novel sequence motifs characterizing insertions of various important TE families: Alu, hAT, ERV1 and MaLR. Finally, we performed a global assessment of the extent to which young TEs tend to nest within older transposed elements and identified a 4-fold higher tendency of TEs to insert into existing TEs than into non-TE intergenic regions. Our analysis demonstrates that TEs are highly biased to insert within certain TEs, in specific orientations and at specific positions within the targeted TE. TE nesting events also reveal new characteristics of the molecular mechanisms underlying transposition.
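The abstract does not describe how hotspots were called or how the 4-fold figure was normalised. A minimal sketch, assuming insertion sites have already been mapped to 0-based offsets along the target TE's consensus and using a simple fold-over-uniform threshold (a hypothetical criterion, not the authors'). A length-normalised enrichment ratio of the kind behind a statement such as "4-fold higher tendency" is also shown.

```python
from collections import Counter


def insertion_hotspots(offsets, consensus_length, fold=10.0):
    """Consensus positions hit at least `fold` times more often than expected
    if insertions were spread uniformly along the target consensus.
    The fold threshold is a hypothetical choice, not a published criterion."""
    counts = Counter(offsets)
    expected = len(offsets) / consensus_length
    return sorted(pos for pos, c in counts.items() if c >= fold * expected)


def insertion_enrichment(n_in_te, te_bp, n_in_other, other_bp):
    """Ratio of insertion density (insertions per bp) inside existing TEs to
    insertion density in other (e.g. non-TE intergenic) sequence."""
    return (n_in_te / te_bp) / (n_in_other / other_bp)
```

A uniform null is the simplest possible baseline; a real analysis would likely need to account for mappability and for sequence composition along the consensus.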

17.
KpnI families of long interspersed repetitive DNA are ubiquitous repetitive elements that occur in tens of thousands of copies in primate genomes. KpnI 1.2-kb, 1.5-kb and two different KpnI 1.8-kb families were found within and flanking a 6.4-kb repeat that begins 3 kb 3' of the human β-globin gene. Thus, six different types of KpnI families have now been identified, and four of these are found next to each other in a specific 6.4-kb repeat. Clones of the distinct KpnI families were hybridized to clones of the 6.4-kb repeat and adjacent sequences encompassed within some 17.6 kb of DNA lying 3' to the β-globin gene cluster. The four KpnI families appear to make up the entire length of the 6.4-kb repeat. The linear order of the various cloned KpnI sequences in the repeat is 5'-pBK(1.8)26-pBK(1.5)54-pBK(1.2)11-pBK(1.8)11-3'. KpnI 1.2-kb sequences were also detected downstream of the 6.4-kb repeat. As with the KpnI 1.2- and 1.5-kb families, each of the two KpnI 1.8-kb sequence families described here hybridized with about 15% of all plaques in two independently generated human genome libraries.

18.
MOTIVATION: With hundreds of completely sequenced microbial genomes available and advances in DNA microarray technology, the detection of genes in microbial communities consisting of hundreds of thousands of sequences may be possible. Existing strategies for DNA probe design, geared toward identifying specific sequences, are not suitable for applications in metagenomics because they lack the necessary coverage, flexibility and efficiency. METHODS: ProDesign is a tool for selecting oligonucleotide probes to detect members of gene families present in environmental samples. Gene family-specific probe sequences are generated from specific and shared words, which are found with the spaced seed hashing algorithm. To detect more sequences, those sharing some common words are re-clustered into new families, and probes specific to the new families are then generated. RESULTS: The program is very flexible in that it can be used to design probes for detecting many gene families simultaneously and specifically in one or more genomes. Neither the length nor the melting temperature of the probes needs to be predefined. We have found that ProDesign provides more flexibility, coverage and speed than other software programs used to select probes for genomic and gene family arrays. AVAILABILITY: ProDesign is licensed free of charge to academic users. ProDesign and Supplementary Material can be obtained by contacting the authors. A web server for ProDesign is available at http://www.uhnresearch.ca/labs/tillier/ProDesign/ProDesign.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
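The abstract says family-specific words are found with spaced seed hashing but does not give ProDesign's seed or data structures. A minimal sketch of the idea, using a hypothetical seed pattern "1101011" ('1' = position kept, '0' = position ignored) and Python sets as the hash tables. It collects spaced words present in every member of a family and absent from all non-members; turning such words into actual probes (length, melting temperature, re-clustering) is not shown.

```python
def spaced_words(seq, seed="1101011"):
    """All spaced words of seq under a seed pattern; the default seed is hypothetical."""
    keep = [i for i, ch in enumerate(seed) if ch == "1"]
    span = len(seed)
    return {"".join(seq[start + i] for i in keep)
            for start in range(len(seq) - span + 1)}


def family_specific_words(family, others, seed="1101011"):
    """Spaced words shared by every sequence in `family` (non-empty) and
    found in none of the sequences in `others`."""
    shared = set.intersection(*(spaced_words(s, seed) for s in family))
    background = set().union(*(spaced_words(s, seed) for s in others))
    return shared - background
```

Ignoring the '0' positions lets two sequences share a word despite a mismatch at those positions, which is why spaced seeds tolerate divergence better than contiguous words of the same weight.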

19.
Fishes of the order Cypriniformes are almost completely restricted to freshwater bodies and number more than 3,400 species placed in 5 families, each with poorly defined subfamilies and/or tribes. The present study represents the first attempt to resolve the higher-level relationships of the world’s largest freshwater-fish clade based on whole mitochondrial (mt) genome sequences from 53 cypriniforms (including 46 newly determined sequences) plus 6 outgroups. Unambiguously aligned, concatenated mt genome sequences (14,563 bp) were divided into 5 partitions (first, second, and third codon positions of the protein-coding genes, rRNA genes, and tRNA genes), and partitioned Bayesian analyses were conducted, with protein-coding genes treated in 3 different ways (all positions included; third codon positions converted into purine [R] and pyrimidine [Y] [RY-coding]; third codon positions excluded). The resultant phylogenies strongly supported monophyly of the Cypriniformes as well as that of the families Cyprinidae, Catostomidae, and a clade comprising Balitoridae + Cobitidae, with the 2 latter loach families being reciprocally paraphyletic. Although all of the data sets yielded nearly identical tree topologies with regard to the shallower relationships, deeper relationships among the 4 major clades (the above 3 major clades plus Gyrinocheilidae, represented by a single species Gyrinocheilus aymonieri in this study) were incongruent depending on the data set. Treatment of the rapidly saturated third codon–position transitions appeared to be a source of such incongruities, and we advocate RY-coding, which takes only transversions into account, effectively removes this likely “noise” from the data set and avoids the apparent lack of signal by retaining all available positions in the data set. [Reviewing Editor: Rafael Zardoya]
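RY-coding, as used above, keeps only transversion information at third codon positions. Transitions (A↔G, C↔T) stay within the purine or pyrimidine class, so they disappear once bases are recoded as R or Y. A minimal sketch, assuming an in-frame protein-coding sequence that starts at a first codon position:

```python
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}  # purines -> R, pyrimidines -> Y


def ry_code_third_positions(cds):
    """Recode every third codon position as R or Y; first and second positions,
    gaps and ambiguity codes are left unchanged."""
    out = list(cds)
    for i in range(2, len(out), 3):
        out[i] = RY.get(out[i].upper(), out[i])
    return "".join(out)
```

For example, third positions G and A both become R, so an A↔G transition at a third position no longer registers as a difference. A G↔C transversion still does (R versus Y).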


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417