Similar articles
 20 similar articles found
1.
The uroporphyrinogen III synthase (UROS) enzyme (also known as hydroxymethylbilane hydro-lyase) catalyzes the cyclization of hydroxymethylbilane to uroporphyrinogen III during heme biosynthesis. A deficiency of this enzyme is associated with the very rare Gunther's disease, or congenital erythropoietic porphyria, an autosomal recessive inborn error of metabolism. The current study investigated the possible role of UROS (Homo sapiens [EC: 4.2.1.75; 265 aa; 1371 bp mRNA; Entrez Pubmed ref NP_000366.1, NM_000375.2]) in evolution by studying the phylogenetic relationship and divergence of this gene using computational methods. The UROS protein sequences from various taxa were retrieved from the GenBank database and compared using Clustal-W (multiple sequence alignment) with default settings, and a first-pass phylogenetic tree was built using the neighbor-joining method as in DELTA-BLAST version 2.2.27+. A total of 163 BLAST hits were found for the uroporphyrinogen III synthase query sequence, and these hits showed a putative conserved domain, the HemD superfamily (as on 14th Nov 2012). We then narrowed down the search by manually deleting the proteins that were not UROS sequences and the sequences belonging to phyla other than Chordata. A repeat phylogenetic analysis of 39 taxa was performed using PhyML and TreeDyn software to confirm that UROS is a highly conserved protein, with approximately 85% conserved sequences in almost all chordate taxa, emphasizing its importance in heme synthesis.
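
A conservation figure such as the "approximately 85%" quoted above depends on how conservation is counted, which the abstract does not specify. As a minimal sketch, assuming conservation means the fraction of alignment columns in which every sequence carries the same residue (the function name and the toy alignment are hypothetical, not taken from the study), it could be computed like this:

```python
def conserved_fraction(alignment):
    """Return the fraction of columns that are identical in every sequence.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    A column containing a gap is never counted as conserved (an assumption).
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = sum(
        1
        for column in zip(*alignment)
        if column[0] != "-" and all(residue == column[0] for residue in column)
    )
    return conserved / length


# Hypothetical toy alignment, not real UROS sequences.
toy_alignment = [
    "MKVLLT",
    "MKVLIT",
    "MRVL-T",
]
print(round(conserved_fraction(toy_alignment), 3))
```

Other definitions (for example, allowing conservative substitutions or ignoring gapped columns) would give different percentages for the same alignment.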

2.
PhyloBLAST is an internet-accessed application based on CGI/Perl programming that compares a user's protein sequence to a SwissProt/TREMBL database using BLAST2 and then allows phylogenetic analyses to be performed on selected sequences from the BLAST output. Flexible features, such as the ability to input your own multiple sequence alignment and to use PHYLIP program options, provide additional web-based phylogenetic analysis functionality beyond the analysis of a BLAST result.

3.
Cubilin (CUBN; also known as intrinsic factor-cobalamin receptor [Homo sapiens Entrez Pubmed ref NM_001081.3; NG_008967.1; GI: 119606627]), located in the epithelium of the intestine and kidney, acts as a receptor for intrinsic factor-vitamin B12 complexes. Mutations in CUBN may play a role in autosomal recessive megaloblastic anemia. The current study investigated the possible role of CUBN in evolution using phylogenetic testing. A total of 588 BLAST hits were found for the cubilin query sequence, and these hits showed a putative conserved domain, the CUB superfamily (as on 27th Nov 2012). A first-pass phylogenetic tree was constructed to identify the taxa that most often contained the CUBN sequences. Following this, we narrowed down the search by manually deleting sequences that were not CUBN. A repeat phylogenetic analysis of 25 taxa was performed using PhyML, RAxML and TreeDyn software to confirm that CUBN is a conserved protein, emphasizing its importance as an extracellular domain present in proteins mostly known to be involved in development in many chordate taxa but not found in prokaryotes, plants and yeast. No horizontal gene transfers have been found between different taxa.

4.
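[Background] Lactobacillus plantarum harbors abundant natural plasmids, and analyzing the sequence features of these plasmids helps in understanding the genetic information they carry. [Objective] To analyze pLP224, a new plasmid isolated from Lactobacillus plantarum PC518, and to examine by cluster analysis the conservation and diversity of the plasmid family to which it belongs. [Methods] Plasmids were extracted from L. plantarum PC518, digested with restriction enzymes and used to construct a plasmid DNA library; new sequences in the library were identified by sequencing and BLAST. The complete plasmid sequence was determined by inverse PCR, and the new plasmid was annotated. A phylogenetic tree of the plasmid Rep proteins was constructed with MEGA X, and variation in the binding sequences was analyzed. [Results] A plasmid, pLP224, was isolated from L. plantarum PC518. It is 1766 bp in size, has a (G+C) mol% content of 41.39%, and shows a maximum sequence similarity of 86.85% to known plasmids. It is inferred to replicate by rolling-circle replication and to be a member of the pMV158 family. Analysis of the Rep proteins of 17 pMV158-family plasmids showed that the closer the evolutionary distance between Rep proteins, the higher the similarity of the binding sequences at their dso sites, and the greater the distance, the lower the similarity. [Conclusion] pLP224 is a new member of the pMV158 family. Plasmids of the pMV158 family are conserved in the nick sequence of the dso site and diverse in the binding sequence. Their Rep proteins differ as the binding sequence varies. This difference favors the coexistence of different members of the pMV158 family in the same host, and is ...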

5.
The Alzheimer's disease amyloid protein precursor (APP) gene is part of a multi-gene super-family from which sixteen homologous amyloid precursor-like proteins (APLP) and APP species homologues have been isolated and characterised. Comparison of exon structure (including the uncharacterised APL-1 gene), construction of phylogenetic trees, and analysis of the protein sequence alignment of known homologues of the APP super-family were performed to reconstruct the evolution of the family and to assess the functional significance of conserved protein sequences between homologues. This analysis supports an adhesion function for all members of the APP super-family, with specificity determined by those sequences which are not conserved between APLP lineages, and provides evidence for an increasingly complex APP super-family during evolution. The analysis also suggests that Drosophila APPL and Caenorhabditis elegans APL-1 may be a fourth APLP lineage, indicating that these proteins, while not functional homologues of human APP, are similarly likely to regulate cell adhesion. Furthermore, the betaA4 sequence is highly conserved only in APP orthologues, strongly suggesting this sequence is of significant functional importance in this lineage.

6.
MOTIVATION: Phylogenomic approaches towards functional and evolutionary annotation of unknown sequences have been suggested to be superior to those based only on pairwise local alignments. User-friendly software tools making the advantages of phylogenetic annotation available for the ever widening range of bioinformatically uninitiated biologists involved in genome/EST annotation projects are, however, not available. We were particularly confronted with this issue in the annotation of sequences from different groups of complex algae originating from secondary endosymbioses, where the identification of the phylogenetic origin of genes is often more problematic than in taxa well represented in the databases (e.g. animals, plants or fungi). RESULTS: We present a flexible pipeline with a user-friendly, interactive graphical user interface running on desktop computers that automatically performs a basic local alignment search tool (BLAST) search of query sequences, selects a representative subset of them, then creates a multiple alignment from the selected sequences, and finally computes a phylogenetic tree. The pipeline, named PhyloGena, uses public domain software for all standard bioinformatics tasks (similarity search, multiple alignment, and phylogenetic reconstruction). As the major technological innovation, selection of a meaningful subset of BLAST hits was implemented using logic programming, mimicking the selection procedure (BLAST tables, multiple alignments and phylogenetic trees) are displayed graphically, allowing the user to interact with the pipeline and deduce the function and phylogenetic origin of the query. PhyloGena thus makes phylogenomic annotation available also for those biologists without access to large computing facilities and with little informatics background. Although phylogenetic annotation is particularly useful when working with composite genomes (e.g. from complex algae), PhyloGena can be helpful in expressed sequence tag and genome annotation also in other organisms. AVAILABILITY: PhyloGena (executables for LINUX and Windows 2000/XP as well as source code) is available by anonymous ftp from http://www.awi.de/en/phylogena.

7.
MOTIVATION: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution. RESULTS: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor-joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.

8.
Nine proteins have been assigned to date to the superfamily of mammalian small heat shock proteins (sHsps): Hsp27 (HspB1, Hsp25), myotonic dystrophy protein kinase-binding protein (MKBP) (HspB2), HspB3, alphaA-crystallin (HspB4), alphaB-crystallin (HspB5), Hsp20 (p20, HspB6), cardiovascular heat shock protein (cvHsp [HspB7]), Hsp22 (HspB8), and HspB9. The most pronounced structural feature of sHsps is the alpha-crystallin domain, a conserved stretch of approximately 80 amino acid residues in the C-terminal half of the molecule. Using the alpha-crystallin domain of human Hsp27 as query in a BLAST search, we found sequence similarity with another mammalian protein, the sperm outer dense fiber protein (ODFP). ODFP occurs exclusively in the axoneme of sperm cells. Multiple alignment of human ODFP with the other human sHsps reveals that the primary structure of ODFP fits into the sequence pattern that is typical for this protein superfamily: alpha-crystallin domain (conserved), N-terminal domain (less conserved), central region (variable), and C-terminal tails (variable). In a phylogenetic analysis of 167 proteins of the sHsp superfamily, using Bayesian inference, mammalian ODFPs form a clade and are nested within previously identified sHsps, some of which have been implicated in cytoskeletal functions. Both the multiple alignment and the phylogeny suggest that ODFP is the 10th member of the superfamily of mammalian sHsps, and we propose to name it HspB10 in analogy with the other sHsps. The C-terminal tail of HspB10 has a remarkable low-complexity structure consisting of 10 repeats of the motif C-X-P. A BLAST search using the C-terminal tail as query revealed similarity with sequence elements in a number of Drosophila male sperm proteins, and mammalian type I keratins and cornifin-alpha. Taken together, the following findings suggest a specialized role of HspB10 in the cytoskeleton: (1) the exclusive location in sperm cell tails, (2) the phylogenetic relationship with sHsps implicated in cytoskeletal functions, and (3) the partial similarity with cytoskeletal proteins.

9.

Background

An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic match. An alternative approach involves phylogenetic methods, which take into account alignments and a model of evolution in order to more accurately define the taxonomic origin of sequences. Similarity-search based methods typically run faster than phylogenetic methods and work well when the organisms in the sample are well represented in the database. In contrast, phylogenetic methods have the capability to identify new organisms in a sample but are computationally quite expensive.

Results

We propose a two-step approach for metagenomic taxon identification; i.e., use a rapid method that accurately classifies sequences using a reference database (this is a filtering step) and then use a more complex phylogenetic method for the sequences that were unclassified in the previous step. In this work, we explore whether and when using top BLAST hit(s) yields a correct taxonomic label. We develop a method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms. We used modified BILD (Bayesian Integral Log-Odds) scores, a multiple-alignment scoring function, to define the outliers within a subset of top BLAST hits and assign taxonomic labels. We compared the accuracy of our method to the RDP classifier and show that our method yields fewer misclassifications while properly classifying organisms that are not present in the database. Finally, we evaluated the use of our method as a pre-processing step before more expensive phylogenetic analyses (in our case TIPP) in the context of real 16S rRNA datasets.

Conclusion

Our experiments make a good case for using a two-step approach for accurate taxonomic assignment. We show that our method can be used as a filtering step before using phylogenetic methods and provides a way to interpret BLAST results using more information than provided by E-values and bit-scores alone.
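
The two-step idea above can be illustrated with a small filter over BLAST hits. This is only a sketch under stated assumptions: it replaces the BILD-based outlier detection described in the abstract with a much simpler bit-score drop-off rule, and the `Hit` record, the `max_drop` threshold and the function names are hypothetical. A query is labeled only when all retained top hits agree on the taxon; otherwise it is deferred to the slower phylogenetic step.

```python
from collections import namedtuple

# Hypothetical minimal hit record: a taxon label plus the BLAST bit-score.
Hit = namedtuple("Hit", ["taxon", "bit_score"])


def split_top_hits(hits, max_drop=0.1):
    """Keep hits whose bit-score is within `max_drop` (a fraction) of the best.

    This score drop-off rule is a stand-in for the BILD-score outlier
    detection in the abstract, not a reimplementation of it.
    """
    if not hits:
        return []
    ranked = sorted(hits, key=lambda hit: hit.bit_score, reverse=True)
    cutoff = ranked[0].bit_score * (1.0 - max_drop)
    return [hit for hit in ranked if hit.bit_score >= cutoff]


def assign_taxon(hits, max_drop=0.1):
    """Return a taxon if all retained hits agree; otherwise return None.

    None means "unclassified in the fast step", i.e. the sequence would be
    passed on to a phylogenetic placement method.
    """
    taxa = {hit.taxon for hit in split_top_hits(hits, max_drop)}
    return taxa.pop() if len(taxa) == 1 else None


# Hypothetical example: the distant third hit is filtered out.
hits = [Hit("Escherichia", 200.0), Hit("Escherichia", 195.0), Hit("Vibrio", 120.0)]
print(assign_taxon(hits))
```

The fixed 10% threshold is arbitrary; the point of the published method is precisely that a data-driven criterion separates close matches from distant ones better than a hand-picked cutoff.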

10.
PhyloGenie: automated phylome generation and analysis
Phylogenetic reconstruction is the method of choice to determine the homologous relationships between sequences. Difficulties in producing high-quality alignments, which are the basis of good trees, and in automating the analysis of trees have unfortunately limited the use of phylogenetic reconstruction methods to individual genes or gene families. Due to the large number of sequences involved, phylogenetic analyses of proteomes preclude manual steps and therefore require a high degree of automation in sequence selection, alignment, phylogenetic inference and analysis of the resulting set of trees. We present a set of programs that automates the steps from seed sequence to phylogeny and a utility to extract all phylogenies that match specific topological constraints from a database of trees. Two example applications that show the type of questions that can be answered by phylome analysis are provided. The generation and analysis of the Thermoplasma acidophilum phylome with regard to lateral gene transfer between Thermoplasmata and Sulfolobus showed best BLAST hits to be far less reliable indicators of lateral transfer than the corresponding protein phylogenies. The generation and analysis of the Danio rerio phylome provided more than twice as many proteins as described previously, supporting the hypothesis of an additional round of genome duplication in the actinopterygian lineage.

11.
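Uncoupling protein 1 (UCP1) is an uncoupling protein located in the inner mitochondrial membrane of brown adipose tissue (BAT) that can induce proton leak and thereby generate heat. Using degenerate primers for RT-PCR, the core cDNA sequence of the UCP1 gene was obtained from the BAT of the Yunnan red-backed vole (Eothenomys miletus). The RT-PCR product was about 458 bp long and contained an open reading frame (ORF) of 456 bp encoding 151 amino acids. A BLAST search showed that the deduced amino acid sequence of E. miletus UCP1 shared more than 80% homology with the UCP1 amino acid sequences of mammals such as the striped dwarf hamster, prairie vole, golden hamster, house mouse and brown rat, but less than 61% with those of fish and amphibians. The results indicate that UCP1 is highly conserved among mammals. In addition, a phylogenetic tree constructed from UCP1 sequences by the NJ method showed that E. miletus clustered with the prairie vole, forming a vole branch.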
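
Percentages such as the "above 80%" and "below 61%" figures in this abstract are usually derived from a pairwise alignment. As a minimal sketch, assuming identity is counted over aligned positions where neither sequence has a gap (tools differ in the denominator they use, and the function name and example strings here are hypothetical), the calculation is:

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs are gapped strings of equal length ('-' marks a gap).
    Using the gap-free columns as the denominator is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip gapped columns
        compared += 1
        if residue_a == residue_b:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real UCP1 sequences.
print(percent_identity("MKV-LT", "MRVALT"))
```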

12.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which the protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics.

13.
The evolutionary relationships among arthropod hemocyanins and insect hexamerins were investigated. A multiple sequence alignment of 12 hemocyanin and 31 hexamerin subunits was constructed and used for studying sequence conservation and protein phylogeny. Although hexamerins and hemocyanins belong to a highly divergent protein superfamily and only 18 amino acid positions are identical in all the sequences, the core structures of the three protein domains are well conserved. Under the assumption of maximum parsimony, a phylogenetic tree was obtained that matches perfectly the assumed phylogeny of the insect orders. An interesting common clade of the hymenopteran and coleopteran hexamerins was observed. In most insect orders, several paralogous hexamerin subclasses were identified that diversified after the splitting of the major insect orders. The dipteran arylphorin/LSP-1-like hexamerins were subject to closer examination, demonstrating hexamerin gene amplification and gene loss in the brachyceran Diptera. The hexamerin receptors, which belong to the hexamerin/hemocyanin superfamily, diverged early in insect evolution, before the radiation of the winged insects. After the elimination of some rapidly or slowly evolving sequences, a linearized phylogenetic tree of the hexamerins was constructed under the assumption of a molecular clock. The inferred time scale of hexamerin evolution, which dates back to the Carboniferous, agrees with the available paleontological data and reveals some previously unknown divergence times among and within the insect orders. Received: 4 August 1997 / Accepted: 29 October 1997

14.
A global alignment of EF-G(2) sequences was corrected by reference to protein structure. The selection of characters eligible for construction of phylogenetic trees was optimized by searching for regions arising from the artifactual matching of sequence segments unique to different phylogenetic domains. The spurious matchings were identified by comparing all sections of the global alignment with a comprehensive inventory of significant binary alignments obtained by BLAST probing of the DNA and protein databases with representative EF-G(2) sequences. In three discrete alignment blocks (one in domain II and two in domain IV), the alignment of the bacterial sequences with those of Archaea–Eucarya was not retrieved by database probing with EF-G(2) sequences, and no EF-G homologue of the EF-2 sequence segments was detected by using partial EF-G(2) sequences as probes in BLAST/FASTA searches. The two domain IV regions (one of which comprises the ADP-ribosylatable site of EF-2) are almost certainly due to the artifactual alignment of insertion segments that are unique to Bacteria and to Archaea–Eucarya. Phylogenetic trees have been constructed from the global alignment after deselecting positions encompassing the unretrieved, spuriously aligned regions, as well as positions arising from misalignment of the G′ and G″ subdomain insertion segments flanking the "fifth" consensus motif of the G domain (Ævarsson, 1995). The results show inconsistencies between trees inferred by alternative methods and alternative (DNA and protein) data sets with regard to Archaea being a monophyletic or paraphyletic grouping. Neither maximum-likelihood nor maximum-parsimony methods allow discrimination (by log-likelihood difference and difference in number of inferred substitutions) between the conflicting (monophyletic vs. paraphyletic Archaea) topologies. No specific EF-2 insertions (or terminal accretions) supporting a crenarchaeal–eucaryal clade are detectable in the new EF-G(2) sequence alignment.

15.
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program. In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling.
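
The "fraction of residues correctly aligned" used above can be made concrete with a short sketch. It assumes each pairwise alignment is given as two gapped strings and that a residue pair counts as correct when the same pair of residue indices is also aligned in the reference (structure-based) alignment; the function names and toy alignments are hypothetical and not taken from the study.

```python
def aligned_pairs(gapped_a, gapped_b):
    """Return the set of (index_in_a, index_in_b) residue pairs that are aligned.

    Indices are 0-based positions in the ungapped sequences; '-' marks a gap.
    """
    if len(gapped_a) != len(gapped_b):
        raise ValueError("both rows of an alignment must have the same length")

    pairs = set()
    index_a = index_b = 0
    for residue_a, residue_b in zip(gapped_a, gapped_b):
        if residue_a != "-" and residue_b != "-":
            pairs.add((index_a, index_b))
        if residue_a != "-":
            index_a += 1
        if residue_b != "-":
            index_b += 1
    return pairs


def fraction_correct(test_alignment, reference_alignment):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    reference_pairs = aligned_pairs(*reference_alignment)
    if not reference_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    test_pairs = aligned_pairs(*test_alignment)
    return len(test_pairs & reference_pairs) / len(reference_pairs)


# Hypothetical toy example: the test alignment misplaces one gap.
reference = ("ACD-EF", "ACDGEF")
test = ("ACDE-F", "ACDGEF")
print(fraction_correct(test, reference))
```

Dividing by the number of pairs in the reference alignment rewards longer correct alignments, which matches the abstract's observation that longer PSI-BLAST alignments recover a larger fraction of the structurally aligned residues.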

16.
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
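
For readers unfamiliar with ORF prediction, the following sketch shows the naive baseline that statistical gene finders improve upon. It is not GeneMark: it uses no Markov chain model and simply scans the forward strand in three reading frames for an ATG followed in frame by a stop codon. The function name, the `min_codons` parameter and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) coordinates of simple ORFs on the forward strand.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon.
    `start` is 0-based, `end` is exclusive and includes the stop codon.
    `min_codons` counts coding codons (ATG included, stop codon excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical toy sequence: ATG AAA CCC GGG TAA starting at position 2.
example = "CCATGAAACCCGGGTAACC"
print(find_orfs(example))
```

A real analysis would also scan the reverse complement and, as in the abstract, combine coding-potential scores with similarity searches, since most naive ORFs in intergenic DNA are not expressed genes.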

17.
18.
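To reveal the molecular variation and structural features of the pipo gene of Potato virus Y (PVY), a pair of degenerate primers was designed from conserved regions of the pipo gene of the genus Potyvirus reported in the literature, the full-length cDNA sequence of the pipo gene was cloned from PVY-infected potato leaves, its nucleotide and amino acid sequence features were analyzed, and a phylogenetic tree of Potyvirus was reconstructed from the amino acid sequences using the Bayesian method. The results showed that a specific fragment of the expected size (about 235 bp) was successfully amplified from 20 PVY isolates, and that its nucleotide sequences shared more than 92% identity with the pipo nucleotide sequences of other reported PVY strains. The 5′ end contained the typical G1-2A6-7 motif in all cases, with no base insertions or deletions; all nucleotide variation consisted of base substitutions. In total, 13 polymorphic sites were found, comprising 4 parsimony-informative sites and 9 singleton variable sites, indicating that the gene is highly conserved but that some molecular variation exists among isolates. The PIPO protein has a theoretical isoelectric point of 11.26 to 11.62, has no signal peptide or transmembrane region, and is a soluble hydrophilic protein. The whole protein contains 3 conserved regions, of which the motif at residues 10 to 59 is the most conserved. The protein localizes mainly to mitochondria and may be a mitochondrial targeting peptide. Phylogenetic analysis showed that the different PVY strains clustered together first, while Sunflower chlorotic mottle virus (SuCMoV) and Pepper severe mosaic virus (PepSMV) were more closely related to each other than to PVY, consistent with previous results, indicating that the PIPO protein can serve as a new molecular marker for studying phylogenetic relationships within Potyvirus.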

19.
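Abstract: [Objective] repC is the initiator protein gene essential for plasmid replication. This study aimed to clone and characterize the repC gene from Mesorhizobium huakuii strain HN3015 and its plasmid-cured mutants. [Methods] The repC gene was amplified by PCR with the universal primers RC1 and RC3; the amplification products were cloned into the vector pMD-18T and then sequenced. Southern hybridization was used to localize the repC gene. Online software was used to analyze the sequence features of the gene, and BLAST was used for homology searches; ExPASy was used to deduce the amino acid sequence; ClustalW was used for multiple comparison of homologous nucleotide and amino acid sequences; and PredictProtein was used to analyze protein secondary structure. [Results]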

20.