Similar articles
20 similar articles found.
1.
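With the completion of genome sequencing for a wide range of organisms, large volumes of DNA sequence data have become available, which greatly facilitates the search for horizontally transferred genes in genomes. This study combines gene sequence feature analysis with support vector machines (SVM) to detect horizontally transferred genes from differences in sequence features. Building on earlier work, absolute codon usage frequency (FCU) was chosen as the sequence feature, mainly because it captures both the codon usage bias of a gene and the amino acid composition of the encoded protein; an SVM that uses this information to analyze and predict horizontally transferred genes can achieve higher prediction accuracy. In addition, a new strand-specific prediction method is proposed, in which genes on the leading and lagging strands of a bacterial genome are treated separately and horizontal transfer is predicted for each strand independently. The results show that the basic method outperforms the scoring algorithm based on eight-nucleotide frequencies proposed by Tsirigos et al., which previously gave the best predictions, with a relative improvement in hit rate of up to 31.47%, and that the strand-specific method predicts horizontally transferred genes better still.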
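The codon-frequency feature described in this abstract can be illustrated with a minimal sketch. It assumes that "absolute codon usage frequency" means the count of each codon divided by the total number of codons in the gene; the function name `codon_frequencies` and the fixed 64-codon ordering are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a vector of the same length.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> list[float]:
    """Return 64 codon frequencies for an in-frame coding sequence.

    Assumption: "absolute codon usage frequency" is taken here to mean
    count(codon) / total number of codons in the gene.
    """
    cds = cds.upper()
    # Split into complete triplets; a trailing partial codon is ignored.
    triplets = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # Skip triplets that contain ambiguous bases such as N.
    counts = Counter(t for t in triplets if set(t) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]
```

Vectors of this kind, one per gene, could then be passed to any SVM implementation (for example scikit-learn's `SVC`); the kernel and training setup used by the authors are not stated in the abstract.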

2.
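G protein-coupled receptor 161 (GPR161) is an orphan member of the G protein-coupled receptor family and plays an important role in regulating lens development and in neurulation in mammals. Recent studies have found that, unusually, this protein has structural features resembling those of scaffold proteins, suggesting that its signal transduction mechanism differs from that of other G protein-coupled receptors. Using the domestic chicken Gallus gallus as an animal model, this study examined the sequence of the GPR161 gene, its molecular phylogenetic relationships, and its tissue expression profile. The coding region of chicken GPR161 is 1,566 bp long and encodes a precursor protein of 521 amino acids. Sequence analysis showed that the amino acid similarity between the chicken GPR161 coding region and those of human Homo sapiens, mouse Mus musculus, and zebrafish Danio rerio is 83.0%, 82.6%, and 65.8%, respectively. Molecular phylogenetic analysis showed that chicken GPR161 is more distantly related to zebrafish GPR161 than to that of human or mouse. Fluorescence-based quantitative PCR was used to examine the tissue distribution of chicken GPR161 expression; GPR161 mRNA was expressed at relatively high levels in the testis or ovary, brain, heart, and muscle. This is the first report on the GPR161 gene in birds, and the results provide a reference for further study of its physiological effects in birds.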

3.
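The water strider Aquarius paludum (Fabricius, 1794) (Insecta: Hemiptera: Heteroptera: Gerridae) has become one of the ideal materials for biological research. To characterize its molecular biology more fully, this study determined the complete mitochondrial genome sequence of Aquarius paludum. The genome is 15,380 bp long and is a double-stranded circular DNA molecule containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and one control region. Its gene order is the same as that reported for most other heteropteran taxa. The genes are tightly packed: 64 bp of intergenic spacers (excluding the 781 bp control region) and 33 bp of gene overlaps were observed. The AT content of the whole genome is 75.7%, whereas that of the control region is only 66.2%; codon usage also shows a bias toward A and T. Of the 13 protein-coding genes, COI and ND5 use TTG as the start codon and the rest use ATV. In addition, 7 protein-coding genes use the conventional triplet stop codons TAA or TAG, and the others end with a single T as the stop codon, followed downstream by a tRNA gene encoded on the same strand. In the secondary structure of tRNA-Ser (GCT), the DHU arm is missing, so the typical cloverleaf structure is not formed.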

4.
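刘飞 (Liu Fei), 张幼怡 (Zhang Youyi). 《生命科学》, 2008, 20(1): 53-57
G protein-coupled receptors form the largest receptor superfamily in the body and are involved in regulating many physiological functions and pathological processes. Intramolecular conformational changes of the receptor, coupling to G proteins, and receptor dimerization are key basic steps in G protein-coupled receptor activation. Single-molecule techniques have yielded important advances in understanding this activation. This article briefly reviews these aspects.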

5.
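To investigate the phylogenetic relationships within this superfamily and how they correlate with mitochondrial gene order, the complete mitochondrial genome of Grapsus albolineatus (Grapsidae) was sequenced as a representative species. The genome is 15,577 bp long and contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region. Its base composition is 33.4% A, 12.0% G, 20.6% C, and 34.0% T, showing a clear AT bias (67.4%). ATP8 and ND1 use GTG as the start codon, and all other protein-coding genes use ATN; COII and Cyt b end with an incomplete stop codon T, and all other genes use TAN. Leucine (Leu) and cysteine (Cys) are the most (15.28%) and least (0.81%) frequently used amino acids, respectively. All tRNAs except tRNA-Ser1, which lacks the DHU arm, can form the typical cloverleaf structure. Bayesian inference (BI) and maximum likelihood (ML) trees of the superfamily Grapsoidea were built from the nucleotide sequences of the 13 protein-coding genes; the two methods gave trees with the same topology, both showing that all Grapsidae species cluster together, with G. albolineatus most closely related to its congener G. tenuicrustatus;...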
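The AT bias quoted in this abstract is simply the sum of the A and T percentages (33.4% + 34.0% = 67.4%). A minimal sketch of how such figures could be computed from a sequence follows; the function names are hypothetical, and ambiguous bases are assumed to be excluded from the total.

```python
def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T in a nucleotide sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    # Assumption: ambiguous bases (N, R, Y, ...) are left out of the total.
    total = sum(counts.values())
    return {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}


def at_content(seq: str) -> float:
    """Return the combined A+T percentage of a nucleotide sequence."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]
```

For example, `at_content("AAATTTGGCC")` evaluates to 60.0, since six of the ten bases are A or T.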

6.
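A preliminary study of the lamin-like gene in the euglenoid Astasia longa   Cited by: 1 (self-citations: 1, citations by others: 0)
The lamin gene of the euglenoid Astasia longa was studied by PCR and clone sequencing. Primers for amplifying the tail region of the lamin gene were designed with reference to known sequences from several relatively primitive multicellular animals, and two main fragments were amplified: sequence I (650 bp) and sequence II (797 bp). Sequencing showed that sequence II contains sequence I and includes a segment with the characteristic features of the lamin gene tail (four codons encoding the “CaaX” sequence followed by a stop codon).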

7.
《遗传》2020,(8)
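G protein-coupled receptors (GPCRs), the largest family of membrane protein receptors, can be activated by many kinds of ligands to carry out the corresponding signal transduction and take part in important physiological processes. G protein-coupled receptor associated sorting proteins (GASPs) play an important role in sorting GPCRs after endocytosis, directing receptors into degradation or recycling pathways and thereby regulating cellular signal transduction and related processes. Functional defects in GASPs have been linked to a range of diseases, including neurological disorders, tumors, and deafness. This review focuses on the functional characteristics of GASPs and their associated signaling pathways, and describes the link between GASP defects and disease, the interactions between family members and GPCRs, the discovery of the GASP sorting pathway, the signaling pathways involved, and the regulation of gene transcription, with the aim of suggesting new ideas and strategies for treating GASP-related diseases.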

8.
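The water strider Aquarius paludum (Fabricius, 1794) (Insecta: Hemiptera: Heteroptera: Gerridae) has become one of the ideal materials for biological research. To characterize its molecular biology more fully, this study determined the complete mitochondrial genome sequence of Aquarius paludum. The genome is 15,380 bp long and is a double-stranded circular DNA molecule containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and one control region. Its gene order is the same as that reported for most other heteropteran taxa. The genes are tightly packed: 64 bp of intergenic spacers (excluding the 781 bp control region) and 33 bp of gene overlaps were observed. The AT content of the whole genome is 75.7%, whereas that of the control region is only 66.2%; codon usage also shows a bias toward A and T. Of the 13 protein-coding genes, COI and ND5 use TTG as the start codon and the rest use ATV. In addition, 7 protein-coding genes use the conventional triplet stop codons TAA or TAG, and the others end with a single T as the stop codon, followed downstream by a tRNA gene encoded on the same strand. In the secondary structure of tRNA-Ser (GGT), the DHU arm is missing, so the typical cloverleaf structure is not formed.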

9.
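Effect of codon bias on the expression efficiency of vaccinia virus vectors   Cited by: 1 (self-citations: 0, citations by others: 1)
To study the effect of codon bias on the expression efficiency of vaccinia virus vectors, the green fluorescent protein (GFP) gene was re-engineered using the preferred codons of either vaccinia virus or its host cells, and its expression in the vaccinia virus vector system was analyzed by fluorescence, Western blot, and flow cytometry (FCM). Three GFP genes, one using only vaccinia-preferred codons (A+T-rich), one using only host-preferred codons (G+C-rich), and one using host-preferred codons in part, were all expressed efficiently and at similar levels, indicating that the vaccinia virus vector is highly tolerant of the codon usage of the target gene. To explore the mechanism of this tolerance, GFP genes with different codon biases were expressed from plasmid vectors that are transcribed either in the nucleus or in the cytoplasm. The pcDNA3 plasmid vector, which transcribes the target gene in the nucleus, expressed the G+C-rich GFP gene efficiently but not the A+T-rich one, whereas the pSCA plasmid vector, which transcribes the target gene in the cytoplasm, expressed the genes with different codon biases equally well. These results indicate that A+U-rich transcripts located in the cytoplasm are expressed efficiently, whereas A+U-rich transcripts produced in the nucleus may fail to be expressed efficiently because of the nuclear membrane barrier or other nuclear factors. Replication in the cytoplasm is therefore the main reason for the codon tolerance of vaccinia virus vectors. This study provides important experimental evidence for choosing between vaccinia virus vectors and commonly used eukaryotic expression vectors.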

10.
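Sequencing and analysis of the complete mitochondrial genome of the Chinese penduline tit   Cited by: 1 (self-citations: 0, citations by others: 1)
The complete mitochondrial genome of the Chinese penduline tit (Remiz consobrinus) was determined by long PCR amplification and primer walking. After assembly and annotation, its structure, sequence composition, and codon usage in protein-coding genes were analyzed; the secondary structures of the 22 tRNAs and 2 rRNAs and the structure of the control region were predicted, and a phylogenetic analysis was carried out, providing new information for phylogenetic studies of passerine birds. The genome is 16,737 bp long (GenBank accession number KC463856), with A, T, C, and G contents of 27.8%, 21.5%, 35.4%, and 15.3%, respectively. The order of its 37 genes is essentially the same as in other birds reported so far; it comprises 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 non-coding control region (D-loop). There are 77 bp of intergenic spacers in total across 18 gene pairs and 30 bp of overlaps in total across 7 gene pairs. The start codon of ND3 is ATT and all others are the standard ATG; 11 protein-coding genes end with TAA, TAG, AGA, or AGG, and 2 (COIII, ND4) end with an incomplete stop codon T. Except for tRNA-Ser(AGN), which lacks the DHU arm, the other 21 tRNAs form the typical cloverleaf structure; of the 27 base mismatches found, 19 are the common G-U mismatch. The secondary structures of SrRNA and LrRNA contain 3 domains with 47 stem-loops and 6 domains with 60 stem-loops, respectively, broadly consistent with published rRNA secondary structures of other birds. The conserved boxes F-box, D-box, C-box, B-box, Bird similarity-box, and CSB1-box, which are also present in the control regions of other birds, were found in the control region of the Chinese penduline tit. The study supports treating Remizidae as a separate family and also supports the monophyly of Sylvioidea and Remizidae.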

11.
12.
Abstract: A novel approach to gene classification is proposed that uses codon usage bias as the input feature vector for classification by support vector machines (SVM). The DNA sequence is first converted to a 59-dimensional feature vector in which each element corresponds to the relative synonymous usage frequency of a codon. Because the input to the classifier is independent of sequence length and variance, the approach is useful when the sequences to be classified are of different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated on 1,841 Human Leukocyte Antigen (HLA) sequences, which are classified into two major classes, HLA-I and HLA-II; each major class is further subdivided into sub-groups of HLA-I and HLA-II molecules. Using codon usage frequencies, a binary SVM achieved an accuracy of 99.3% for HLA major class classification, and a multi-class SVM achieved accuracies of 99.73% and 98.38% for sub-class classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
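The 59-dimensional vector mentioned in this abstract matches the number of sense codons left after removing the three stop codons and the single-codon amino acids Met and Trp (64 - 5 = 59). The sketch below uses the commonly used relative synonymous codon usage (RSCU) formula under the standard genetic code; whether the authors normalized in exactly this way is an assumption, and the names `FEATURE_CODONS` and `rscu_vector` are illustrative.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group synonymous codons by amino acid, leaving out stop codons.
FAMILIES = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    if amino_acid != "*":
        FAMILIES[amino_acid].append(codon)

# Keep only codons that have at least one synonym (drops ATG and TGG).
FEATURE_CODONS = sorted(c for fam in FAMILIES.values() if len(fam) > 1 for c in fam)


def rscu_vector(cds: str) -> list[float]:
    """Return the 59 RSCU values of an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    vector = []
    for codon in FEATURE_CODONS:
        family = FAMILIES[CODON_TABLE[codon]]
        family_total = sum(counts[c] for c in family)
        # RSCU = observed count / (family total / family size).
        # Assumption: an amino acid absent from the gene contributes 0.0.
        vector.append(counts[codon] * len(family) / family_total if family_total else 0.0)
    return vector
```

Because each value is a ratio within a synonymous family, the vector has the same length and scale for genes of any length, which is the property the abstract relies on.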

13.
Codon usage in bacteria: correlation with gene expressivity   Cited by: 153 (self-citations: 53, citations by others: 100)
The nucleic acid sequence bank now contains over 600 protein coding genes, of which 107 are from prokaryotic organisms. Codon frequencies in each new prokaryotic gene are given. Analysis of genetic code usage in the 83 sequenced genes of the Escherichia coli genome (chromosome, transposons and plasmids) is presented, taking into account new data on gene expressivity and regulation as well as iso-tRNA specificity and cellular concentration. The codon composition of each gene is summarized using two indexes: one is based on the differential usage of iso-tRNA species during gene translation, the other on the choice between cytosine and uracil for the third base. A strong relationship between codon composition and mRNA expressivity is confirmed, even for genes transcribed in the same operon. The influence of codon use on peptide elongation rate and protein yield is discussed. Finally, the evolutionary aspect of codon selection in mRNA sequences is studied.

14.
Methods to determine periodicity in protein sequences are useful for inferring function. Fourier transformation is one approach, but care is required to ensure the periodicity is genuine. Here we have shown that empirically derived statistical tables can be used as a measure of significance. Genuine protein sequence data rather than randomly generated sequences were used as the statistical backdrop. The method has been applied to G-protein coupled receptor (GPCR) sequences, by Fourier transformation of hydrophobicity values, codon frequencies and the extent of over-representation of codon pairs, the latter being related to translational step times. Genuine periodicity was observed in the hydrophobicity, whereas the apparent periodicity (as inferred from previously reported measures) in the translation step times was not validated statistically. GCR2 has recently been proposed as the plant GPCR for the hormone abscisic acid. It has homology to the Lanthionine synthetase C-like family of proteins, an observation confirmed by fold recognition. Application of the Fourier transform algorithm to the GCR2 family revealed strongly predicted seven-fold periodicity in hydrophobicity, suggesting why GCR2 has been reported to be a GPCR, despite negative indications in most transmembrane prediction algorithms. The underlying multiple sequence alignment, also required for the Fourier transform analysis of periodicity, indicated that the hydrophobic regions around the 7 GXXG motifs commence near the C-terminal end of each of the 7 inner helices of the alpha-toroid and continue to the N-terminal region of the helix. The results clearly explain why GCR2 has been understandably but erroneously predicted to be a GPCR.
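The Fourier analysis of hydrophobicity described in this abstract can be sketched as the spectral power of a hydropathy profile at a chosen period. The sketch assumes the Kyte-Doolittle hydropathy scale (the abstract does not say which scale was used) and does not reproduce the empirically derived significance tables; `periodicity_power` is a hypothetical name.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodicity_power(sequence: str, period: float) -> float:
    """Return the Fourier power of the hydropathy profile at a given period.

    The period is measured in residues; non-standard residues are skipped.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    # Subtract the mean so that overall hydrophobicity does not dominate.
    total = sum(
        (h - mean) * cmath.exp(-2j * math.pi * k / period)
        for k, h in enumerate(values)
    )
    return abs(total) ** 2 / len(values)
```

Scanning `period` over a range and looking for a peak near 3.6 residues is the usual way to look for alpha-helical amphipathicity; a peak on its own says nothing about significance, which is the point the abstract makes about needing a statistical backdrop.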

15.
This paper introduces a new subcellular localization system (TSSub) for eukaryotic proteins. The system extracts features from both profiles and amino acid sequences. Four different features are extracted from profiles, each handled by its own probabilistic neural network (PNN) classifier: the amino acid composition of whole profiles; the amino acid composition of the N-terminus of profiles; the dipeptide composition of whole profiles; and the amino acid composition of profile fragments. In addition, a support vector machine (SVM) classifier is added to handle the residue-couple feature extracted from amino acid sequences. The results of the five classifiers are fused by an additional SVM classifier. The overall accuracy of TSSub reaches 93.0% and 77.4% on Reinhardt and Hubbard's eukaryotic protein dataset and Huang and Li's eukaryotic protein dataset, respectively. Comparison with existing methods shows that TSSub provides better prediction performance. AVAILABILITY: The web server is available from http://166.111.24.5/webtools/TSSub/index.html.

16.
Naveed M, Khan A, Khan AU. 《Amino acids》, 2012, 42(5): 1809-1823
G protein-coupled receptors (GPCRs) are transmembrane proteins that transduce signals from extracellular ligands to intracellular G proteins. Automatic classification of GPCRs can provide important information for the development of novel drugs in the pharmaceutical industry. In this paper, we propose an evolutionary approach, GPCR-MPredictor, which combines individual classifiers for predicting GPCRs. GPCR-MPredictor is a web predictor that can efficiently predict GPCRs at five levels. The first level determines whether a protein sequence is a GPCR or a non-GPCR. If the sequence is predicted to be a GPCR, it is further classified at the family, subfamily, sub-subfamily, and subtype levels. In this work, our aim is to analyze the discriminative power of different feature extraction and classification strategies for GPCR prediction and then to use an evolutionary ensemble approach for enhanced prediction performance. Features are extracted using the amino acid composition, pseudo amino acid composition, and dipeptide composition of protein sequences. Different classification approaches, such as k-nearest neighbor (KNN), support vector machine (SVM), probabilistic neural networks (PNN), J48, Adaboost, and Naive Bayes, have been used to classify GPCRs. The proposed hierarchical GA-based ensemble classifier exploits the prediction results of SVM, KNN, PNN, and J48 at each level. The GA-based ensemble yields accuracies of 99.75, 92.45, 87.80, 83.57, and 96.17% at the five levels on the first dataset. We further perform predictions on a dataset consisting of 8,000 GPCRs at the family, subfamily, and sub-subfamily levels, and on two other datasets of 365 and 167 GPCRs at the second and fourth levels, respectively. In comparison with existing methods, the results demonstrate the effectiveness of the proposed GPCR-MPredictor in classifying GPCR families. It is accessible at .
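Two of the sequence features named in this abstract, amino acid composition and dipeptide composition, are simple enough to sketch. The functions below are illustrative only (names and residue ordering are assumptions), and pseudo amino acid composition is not covered.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20 amino acid frequencies of a protein sequence."""
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(sequence: str) -> list[float]:
    """Return the 400 frequencies of adjacent residue pairs."""
    sequence = sequence.upper()
    pairs = (sequence[i:i + 2] for i in range(len(sequence) - 1))
    # Pairs that contain a non-standard residue (X, B, Z, ...) are skipped.
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

Concatenating the two lists gives a fixed-length 420-element vector per protein, which is the kind of input the classifiers listed in the abstract could consume; the exact feature encoding used by GPCR-MPredictor is not given here.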

17.
Guo J, Lin Y, Liu X. 《Proteomics》, 2006, 6(19): 5099-5105
This paper proposes a new integrative system (GNBSL, Gram-negative bacteria subcellular localization) for subcellular localization specialized for Gram-negative bacterial proteins. First, the system generates a position-specific frequency matrix (PSFM) and a position-specific scoring matrix (PSSM) for each protein sequence by searching the Swiss-Prot database. Then different features are extracted by four modules from the PSFM and the PSSM. The features include whole-sequence amino acid composition, N- and C-terminus amino acid composition, dipeptide composition, and segment composition. Four probabilistic neural network (PNN) classifiers are used to classify these modules. To further improve performance, two modules trained by support vector machine (SVM) are added to the system. One module extracts the residue-couple distribution from the amino acid sequence, and the other applies a pairwise profile alignment kernel to measure the local similarity between every two sequences. Finally, an additional SVM is used to fuse the outputs of the six modules. Tests on a benchmark dataset show that the overall success rate of GNBSL is higher than those of PSORT-B, CELLO, and PSLpred. The GNBSL web server can be visited at http://166.111.24.5/webtools/GNBSL/index.htm.

18.
Abstract: Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequence have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that combining nucleic acid and translated amino acid character states in the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and the overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may impart information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed, and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.

19.
Genomics projects have resulted in a flood of sequence data. Functional annotation currently relies almost exclusively on inter-species sequence comparison and is restricted in cases of limited data from related species and widely divergent sequences with no known homologs. Here, we demonstrate that codon composition, a fusion of codon usage bias and amino acid composition signals, can accurately discriminate, in the absence of sequence homology information, cytoplasmic ribosomal protein genes from all other genes of known function in Saccharomyces cerevisiae, Escherichia coli and Mycobacterium tuberculosis using an implementation of support vector machines, SVM(light). Analysis of these codon composition signals is instructive in determining features that confer individuality to ribosomal protein genes. Each of the sets of positively charged, negatively charged and small hydrophobic residues, as well as codon bias, contribute to their distinctive codon composition profile. The representation of all these signals is sensitively detected, combined and augmented by the SVMs to perform an accurate classification. Of special mention is an obvious outlier, yeast gene RPL22B, highly homologous to RPL22A but employing very different codon usage, perhaps indicating a non-ribosomal function. Finally, we propose that codon composition be used in combination with other attributes in gene/protein classification by supervised machine learning algorithms.

20.