Similar literature
1.
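The 8-mer spectrum of a genome sequence is species-specific, and deciphering its underlying regularities is important for revealing the structural composition and evolutionary patterns of genome sequences. This study computed the 8-mer spectrum distributions of 66 species and found that the spectra of higher mammals are predominantly trimodal, those of birds and reptiles predominantly bimodal, and those of fish and invertebrates predominantly unimodal. To further investigate the composition of genomic 8-mer spectra, 16 XY dinucleotide classification schemes were applied. The results show that only the CG classification has the following two features: (1) the 8-mer spectra of the CG0, CG1 and CG2 subsets are each unimodal, and the three peaks are separated from one another; (2) relative to the random center, the spectra of the CG1 and CG2 subsets lie far from it, whereas the spectrum of the CG0 subset lies around it. To further test the relationship between the spectra of the CG0, CG1 and CG2 subsets and species evolution, a phylogenetic tree of the 66 species was constructed from separability indices of the three CG subset spectra; the tree divides the species into four clusters: higher mammals, birds and reptiles, fish, and invertebrates. These results indicate that the spectrum distributions of the three CG subsets are closely related to the evolutionary information of species genomes.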

2.
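Using the DNA sequence of human chromosome 1 as the sample, the frequencies of 8-mers were computed separately for five sequence classes (CDS, 5'UTR, 3'UTR, introns and intergenic regions), and the distribution of the relative number of 8-mer motifs as a function of frequency was obtained. Intron and intergenic sequences show a clearly trimodal distribution, CDS a unimodal distribution, and 5'UTR and 3'UTR an approximately unimodal distribution. To reveal the cause of these differences, the 8-mer set was partitioned according to whether an 8-mer contains two or more copies of a given dinucleotide (XY2), exactly one copy (XY1), or none (XY0). Of the 16 classifications, only the CGj classification yields three subsets that each form an independent unimodal distribution. This indicates that DNA sequences are composed of three classes of CGj subsets whose frequencies of occurrence result from independent evolution; that the observed 8-mer frequency distribution is the superposition of the three CGj frequency distributions; and that the 8-mer distributions differ among the five sequence classes because the distances between these three distributions differ. Analysis of the relative frequencies of dinucleotides and trinucleotides in the three CGj subsets of the five sequence classes shows that the relative frequencies of CG2 motifs are essentially the same across the five classes, those of CG1 motifs clearly separate the five classes, and those of CG0 motifs clearly separate coding from non-coding sequences. In summary, the CGj motif sets exhibit specific regularities in the composition of DNA sequences and play an important role in DNA sequence evolution.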
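The CG0/CG1/CG2 partition described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`count_kmers`, `cg_class`, `split_by_cg`, `spectrum`) are hypothetical, windows containing characters other than A, C, G, T are simply skipped, and the "spectrum" here is the raw number of distinct 8-mers per frequency rather than the relative motif number used in the paper.

```python
from collections import Counter

def count_kmers(seq, k=8):
    """Count every k-mer in seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def cg_class(kmer):
    """Return 'CG0', 'CG1' or 'CG2' (two or more CG dinucleotides)."""
    n = kmer.count("CG")  # "CG" cannot overlap itself, so this is exact
    return "CG2" if n >= 2 else f"CG{n}"

def split_by_cg(counts):
    """Partition a k-mer count table into the three CGj subsets."""
    subsets = {"CG0": {}, "CG1": {}, "CG2": {}}
    for kmer, c in counts.items():
        subsets[cg_class(kmer)][kmer] = c
    return subsets

def spectrum(subset):
    """Map each frequency to the number of distinct k-mers observed that often."""
    return Counter(subset.values())

counts = count_kmers("ACGTACGTACGT")
subsets = split_by_cg(counts)
```

Plotting `spectrum(subsets[name])` for each of the three subsets of a real genome would show whether the three distributions are separated; the toy sequence above is far too short for that.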

3.
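Tandem repeats are widespread in eukaryotic genomes; by affecting the spatial structure of chromatin and gene expression, they influence heredity and evolution. Using the genome of Arabidopsis lyrata, this study analyzed the characteristics of tandem repeats with repeat units of 1-50 bp. Tandem repeat density was highest in the 5'UTRs and promoter regions of genes (8757 bp/Mb and 8430 bp/Mb, respectively) and lowest in coding sequences (CDS; 2406 bp/Mb). The most abundant repeat motif in the genome was the mononucleotide repeat T/A; 5'UTRs contained large numbers of dinucleotide repeat motifs, whereas CDS contained mainly trinucleotide repeat motifs. The differences in tandem repeat characteristics among regions of the A. lyrata genome indicate that they are adapted to gene expression and regulatory functions. This study examines the characteristics and roles of tandem repeats in plant genomes and provides a reference for research on how repeats regulate gene expression and on plant genome evolution.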

4.
5.
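Comparative genomic in situ hybridization (cGISH) with a biotin-labeled Arabidopsis genomic DNA probe at 75% hybridization stringency was performed on the chromosomes of the dicots tomato and broad bean and the monocots rice, maize and barley, in order to reveal homology between the genomes of Arabidopsis and distantly related plants. The cGISH signals represent hybridization of repetitive DNA in Arabidopsis genomic DNA to homologous sequences on the chromosomes of the target species. The probe DNA produced hybridization signals on all chromosomes of all target species. The signals were dispersed and tended to become more numerous and more scattered as genome size increased. In all target species the nucleolus organizer regions (NORs) showed markedly stronger signals than other regions, indicating that the Arabidopsis genomic DNA probe can be used for physical mapping of plant NORs. In all target species the signals were located mainly in the interstitial and terminal regions of chromosome arms, with a few signals in centromeric or pericentromeric regions. Barley chromosomes showed a distinctive cGISH banding pattern that differs from C- and N-banding, indicating that this probe can be used to identify chromosomes of different plants. These results show that, in addition to rDNA and telomeric repeats, other homologous repetitive DNA exists between the genomes of Arabidopsis and distantly related plants; some repetitive DNA sequences already existed before angiosperms diverged into monocots and dicots and, despite a long evolutionary history, still retain high homology among distantly related species. The results also suggest that ancient, conserved repetitive DNA in large genomes has undergone marked amplification during evolution.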

6.
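Hybridizing the total genomic DNA of one plant to the chromosomes of closely or distantly related species makes it possible to study the evolutionary relationships among their genomes. Using total genomic DNA of Sorghum propinquum as a probe, hybridization was carried out on the genomes of cultivated sorghum and sweet sorghum. The results show that repetitive sequences in the genomes of cultivated sorghum, sweet sorghum and S. propinquum are highly homologous, and that the evolutionary relationships among these genomes are conserved. The homology of repetitive sequences between cultivated sorghum and S. propinquum is higher than that between sweet sorghum and S. propinquum.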

7.
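Hybridizing the total genomic DNA of one plant to the chromosomes of closely or distantly related species makes it possible to study the evolutionary relationships among their genomes. Using total genomic DNA of Sorghum propinquum as a probe, hybridization was carried out on the genomes of cultivated sorghum and sweet sorghum. The results show that repetitive sequences in the genomes of cultivated sorghum, sweet sorghum and S. propinquum are highly homologous, and that the evolutionary relationships among these genomes are conserved. The homology of repetitive sequences between cultivated sorghum and S. propinquum is higher than that between sweet sorghum and S. propinquum.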

8.
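Reconstructing the tree of life has long been a dream of evolutionary biologists. The sequencing of complete genomes for many species makes it possible to build phylogenetic trees at the whole-genome level and to study the evolutionary relationships among species. This paper uses two statistical methods and three distance measures to build protein-structure-based phylogenetic trees at the whole-genome level. The complete genomes of 93 species were analyzed, covering the three superkingdoms: eukaryotes, bacteria and archaea. The resulting trees correctly divide these species into three major groups, and the clustering of species within each major branch is largely consistent with their morphological classification. Comparing the clustering results of these methods with the species taxonomy shows that combining the abundance statistic with the distance based on the angle between two vectors performs better for tree construction than the other combinations.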
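The distance "based on the angle between two vectors" can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: each genome is represented by a hypothetical abundance vector (for example, counts of each protein structural class), and the names `angle_distance` and `distance_matrix` are introduced here only for the example.

```python
import math

def angle_distance(u, v):
    """Angle in radians between two abundance vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("abundance vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)

def distance_matrix(profiles):
    """Pairwise angle distances for a dict mapping genome name -> vector."""
    names = sorted(profiles)
    return {(a, b): angle_distance(profiles[a], profiles[b])
            for a in names for b in names}

# Hypothetical abundance profiles for three genomes.
profiles = {"genome_A": [10, 0, 5], "genome_B": [20, 0, 10], "genome_C": [0, 7, 0]}
matrix = distance_matrix(profiles)
```

Because the angle ignores vector length, two genomes with proportional abundances (such as `genome_A` and `genome_B` above) are at distance zero regardless of genome size. A matrix of this kind could then be passed to any standard distance-based tree-building method.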

9.
Zhang Taikui, Yuan Zhaohe. 《遗传》 (Hereditas), 2018, 40(1): 44-56
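Plant paleogenomics is an emerging branch of genomics that reconstructs ancestral genomes from extant species and infers the evolutionary or speciation events in ancient history that gave rise to extant species. Continuing advances in high-throughput sequencing have made reads longer and more accurate, accelerating the assembly of plant reference genomes and providing paleogenomics research with a large supply of reliable genome sequences from extant species. Whole-genome duplication (WGD), also called paleopolyploidization, causes rapid reorganization of plant genomes, loss of large numbers of genes and increased structural variation, and is extremely important for plant evolution. This paper reviews progress in plant genome sequencing and assembly, the principles of plant paleogenomics, WGD events in plant genomes, and evolutionary scenarios of plant ancestral genomes, and discusses prospects for future research in plant paleogenomics.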

10.
Sun Gaofei, He Shoupu, Pan Zhao'e, Du Xiongming. 《遗传》 (Hereditas), 2015, 37(2): 192-203
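Simple sequence repeats (SSRs) are short tandemly repeated DNA sequences that are widespread in animal and plant genomes and are important genomic molecular markers. Comparing homologous SSRs across genomes helps in understanding the evolution of closely related species. Using the whole-genome sequences of Gossypium raimondii (D5) and G. arboreum (A2) together with restriction-enzyme-digested genome sequencing data of G. hirsutum (AD1), this study performed genome-wide SSR scans, compared the SSR distributions of the A and D genomes, and, by identifying homologous SSRs among the three genomes, compared the differences in their repeat sequences. The distribution patterns of homologous SSRs in the A and D genomes were very similar, but homologous SSRs were more conserved between the A and AD genomes than between the D and AD genomes. Relative to the homologous SSRs in the AD genome, the number of SSRs with longer repeat sequences in the A genome was about 5 times the number with shorter repeat sequences; in the D genome this ratio was about 3. It can be inferred that, during the parallel evolution of the tetraploid AD genome alongside the A and D genomes, genome merger caused the rate of change of SSR repeat length to differ from that in the diploid A and D genomes, and that this difference may have led SSR repeat lengths in the AD genome to tend to become shorter than in the diploids during evolution. This is the first systematic comparison of homologous SSRs among three cotton genomes; it reveals significant differences in homologous SSRs between the tetraploid and diploid Gossypium genomes and provides a basis for further elucidating the evolution of Gossypium genomes.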
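A genome-wide SSR scan of the kind mentioned in this abstract can be approximated with a regular expression. The sketch below is only an illustration: the abstract does not state which tool or thresholds were used, so the function name `find_ssrs` and the defaults (unit length 1-6 bp, at least 4 copies) are assumptions.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return sorted (start, unit, copy_number) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of unit_len bases followed by at least min_repeats - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter unit,
            # e.g. "AA" is already reported as the mononucleotide "A".
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

ssrs = find_ssrs("TTGAGAGAGAGACC")
```

Comparing the copy number of the same locus in two genomes (for example, A versus AD) would then indicate whether the repeat has lengthened or shortened; identifying which loci are homologous requires flanking-sequence alignment, which this sketch does not attempt.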

11.
The spectra of k-mer frequencies can reveal the structure and evolution of genome sequences. We confirmed that the trimodal spectrum of 8-mers in human genome sequences is resolved only by the CG2, CG1 and CG0 8-mer sets, which contain 2, 1 or 0 CpG, respectively. This phenomenon is called the independent selection law. The three types of CG 8-mers were considered to be different functional elements. We conjectured that (1) nucleosome binding motifs are mainly characterized by CG1 8-mers and (2) the core structural units of CpG island (CGI) sequences are predominantly characterized by CG2 8-mers. To validate our conjectures, nucleosome-occupied sequences and CGI sequences were extracted, and sequence parameters were then constructed from the information in each of the three CG 8-mer sets. ROC analysis showed that CG1 8-mers are preferentially found in nucleosome-occupied segments (AUC > 0.7) and CG2 8-mers in CGI sequences (AUC > 0.99). This validates our conjectures in principle.

12.
The contribution of slippage-like processes to genome evolution   Cited by: 19 (self-citations: 0, citations by others: 19)
Simple sequences present in long (>30 kb) sequences representative of the single-copy genome of five species (Homo sapiens, Caenorhabditis elegans, Saccharomyces cerevisiae, E. coli, and Mycobacterium leprae) have been analyzed. A close relationship was observed between genome size and the overall level of sequence repetition. This suggested that the incorporation of simple sequences had accompanied increases of genome size during evolution. Densities of simple sequence motifs were higher in noncoding regions than in coding regions in eukaryotes but not in eubacteria. All five genomes showed very biased frequency distributions of simple sequence motifs, particularly the eukaryotes, where AAA and TTT predominated. Interspecific comparisons showed that noncoding sequences in eukaryotes had highly significantly similar frequency distributions of simple sequence motifs, but this was not true of coding sequences. ANOVA of the frequency distributions of simple sequence motifs indicated strong contributions from motif base composition and repeat unit length, but much of the variation remained unexplained by these parameters. The sequence composition of simple sequences therefore appears to reflect both underlying sequence biases in slippage-like processes and the action of selection. Frequency distributions of simple sequence motifs in coding sequences correlated weakly or not at all with those in noncoding sequences. Selection on coding sequences to eliminate undesirable sequences may therefore have been strong, particularly in the human lineage.

13.
Compositional constraints and genome evolution   Cited by: 31 (self-citations: 0, citations by others: 31)

14.
Many proteins and genes are members of families that share a common evolutionary history. Sequence comparison has therefore become an important way to outline evolutionary relationships and to recognize conserved patterns. The current work critically investigates the role of the k-mer in the composition vector method for comparing genome sequences. Composition vector methods are generally applied with different choices of k; for some values of k the results are satisfactory, but for others they are not. In the proposed work, the standard composition vector method is carried out using a 3-mer string length, and a special type of information-based similarity index is used as the distance measure. The study establishes that the combination of 3-mers and the information-based similarity index provides satisfactory results in all cases, especially for the comparison of whole genome sequences. These choices provide a kind of unified approach to the comparison of genome sequences.

15.
Summary. Fifty random clones (350–2300 bp), derived from sheared nuclear DNA, were studied via Southern analysis in order to make deductions about the organization and evolution of the tomato genome. Thirty-four of the clones were mapped genetically and determined to represent points on 11 of the 12 tomato chromosomes. Under moderate stringency conditions (80% homology required) 44% of the clones were classified as single copy. Under higher stringency, the majority of the clones (78%) behaved as single copy. Most of the remaining clones belonged to multicopy families containing 2–20 copies, while a few contained moderately or highly repeated sequences (10% at moderate stringency, 4% at high stringency). Divergence rates of sequences homologous to the 50 random genomic clones were compared with those corresponding to 20 previously described cDNA (coding sequence) clones. Rates were measured by probing each clone (random genomic clones and cDNAs) onto filters containing DNA from various species from the family Solanaceae (including potato, Datura, petunia and tobacco) as well as one species (watermelon) from another plant family, Cucurbitaceae. Under moderate stringency conditions, the majority of the random clones (single copy and repetitive) failed to detect homologous sequences in the more distantly related species, whereas approximately 90% of the 20 coding sequences analyzed could still be detected in all solanaceous species. The most highly repeated sequences appear to be the fastest evolving, and homologous copies could be detected only in the species most closely related to tomato. Dispersion of repetitive sequences, as opposed to tandem clustering, appears to be the rule for the tomato genome. None of the repetitive sequences discovered by this random sampling of the genome were tandemly arranged, a finding consistent with the notion that the tomato genome contains only a small fraction of satellite DNA. This study, along with a companion paper (Ganal et al. 1988), provides the first general sketch of the tomato genome at the molecular level and indicates that it is composed largely of single copy sequences and that these sequences, together with repetitive sequences, are evolving at a rate faster than the coding portion of the genome. The small genome and paucity of highly repetitive DNA are favourable attributes with respect to the possibilities of conducting chromosome walking experiments in tomato, and the fact that coding regions are well conserved among solanaceous species may be useful for distinguishing clones that contain coding regions from those that do not.

16.
Essentially all of the sequences in the pea (Pisum sativum) genome which reassociate with single copy kinetics at standard (Tm − 25°C) criterion follow repetitive kinetics at lower temperatures (about Tm − 35°C). Analysis of thermal stability profiles for presumptive single copy duplexes shows that they contain substantial mismatch even when formed at standard criterion. Thus most of the sequences in the pea genome which are conventionally defined as single copy are actually fossil repeats; that is, they are members of extensively diverged (mutated) and thus presumably ancient families of repeated sequences. Coding sequences, as represented by a cDNA probe prepared from polysomal poly(A)+ mRNA, reassociate with single copy kinetics regardless of criterion and do not form mismatched duplexes. The coding regions thus appear to be composed of true single copy sequences, but they cannot represent more than a few percent of the pea genome. Ancient diverged repeats are present, but are not a prominent feature of the smaller mung bean (Vigna radiata) genome. An extension of a simple evolutionary model is proposed in which these and other differences in genome organization are considered to reflect different rates of sequence amplification or genome turnover during evolution. The model accounts for some of the differences between typical plant and animal genomes.

17.
Microbial genome sequences provide us with fossil records for inferring their origin and evolution. Assuming that current microbial genomes are the evolutionary results of ancient genomes or fragments, and that genes neighboring each other in ancient genomes are more likely to be neighbors in current genomes, we propose in this paper a paleontological algorithm and assemble the orthologous gene groups from 66 complete current microbial genome sequences into a pseudo-ancient genome, which consists of continuous fragments of various sizes. We performed bootstrap resampling and correlation analyses, and the results showed that the assembled ancient genome and fragments are statistically significant and that the genes of the same fragment are inherently related and likely derived from common ancestors. This method provides a new computational tool for studying microbial genome structure and evolution.

18.
Tomato genomic libraries were screened for the presence of simple sequence repeats (SSRs) with seventeen synthetic oligonucleotide probes, consisting of 2- to 5-basepair motifs repeated in tandem. GAn and GTn sequences were found to occur most frequently in the tomato genome (every 1.2 Mb), followed by ATTn and GCCn (every 1.4 Mb and 1.5 Mb, respectively). In contrast, only ATn and GAn microsatellites (n > 7) were found to be frequent in the GenBank database, suggesting that other motifs may be preferentially located away from genes. Polymorphism of microsatellites was measured by PCR amplification of individual loci or by Southern hybridization, using a set of ten tomato cultivars. Surprisingly, only two of the nine microsatellite clones surveyed (five GTn, three GAn and one ATTn) showed length variation among these accessions. Polymorphism was also very limited between Lycopersicon esculentum and L. pennellii, two distant species. Southern analysis using the seventeen oligonucleotide probes identified GATAn and GAAAn as useful motifs for the detection of multiple polymorphic fragments among tomato cultivars. To determine the structure of microsatellite loci, a GAn probe was used for hybridization at low stringency on a small insert genomic library, and randomly selected clones were analyzed. GAn based motifs of increasing complexity were found, indicating that simple dinucleotide sequences may have evolved into larger tandem repeats such as minisatellites as a result of basepair substitution, replication slippage, and possibly unequal crossing-over. Finally, we genetically mapped loci corresponding to two amplified microsatellites, as well as nine large hypervariable fragments detected by Southern hybridization with a GATA8 probe. All loci are located around putative tomato centromeres. This may contribute to the understanding of the structure of centromeric regions in tomato.

19.
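Analysis of minisatellite repeat sequences in the genome of the Chinese shrimp (Fenneropenaeus chinensis)   Cited by: 4 (self-citations: 0, citations by others: 4)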
Gao Huan, Kong Jie. 《动物学报》 (Acta Zoologica Sinica), 2005, 51(1): 101-107
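By sequencing random genomic DNA fragments of Fenneropenaeus chinensis, we obtained genomic DNA sequences totaling about 641,000 bases, in which 1,720 repeat sequences were found. Of these, 398 were minisatellites, accounting for 23.14% of all repeats. The repeat units of these minisatellites were 7-165 bases long and concentrated in the range of 7-21 bases; repeats with a 12-base unit were the most numerous, at 58, or 14.57% of all minisatellite repeats. As for the number of repeats at each copy number, repeats consisting of units with a copy number of 2 were the most numerous (137), followed by those with a copy number of 3 (122), and the number of repeats declined as copy number increased. Some of these sequences have been deposited in GenBank under accession numbers AY699072-AY699076. The 398 repeats are composed of 398 different repeat units, so there are many types of minisatellite repeats. We provisionally divided them into three categories (composed of two, three or four kinds of bases) and further divided these into subclasses by ordering the bases in each repeat from most to least abundant. This classification shows that minisatellites in the F. chinensis genome are, overall, A/T-rich repeats with a certain "hierarchy", which reveals their relationship to microsatellite repeats: some minisatellite repeats may have originated from microsatellites.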

20.
