Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
An alignment-free method is proposed for comparing different genome sequences. First, we presented a new graphical representation of DNA sequences without degeneracy, which is conducive to intuitive comparison of sequences. Then, a new numerical characterization based on the representation was introduced to quantitatively depict the intrinsic nature of genome sequences; it is treated as a 10-dimensional vector. Alignment-free comparison of sequences was performed by computing the distances between the vectors of the corresponding numerical characterizations, which define the evolutionary relationship. Two data sets of DNA sequences were constructed to assess the performance on sequence comparison. The results illustrate the validity of the method. The new numerical characterization provides a powerful tool for genome comparison.

2.
The currently available yeast mitochondrial DNA (mtDNA) sequence is incomplete, contains many errors and is derived from several polymorphic strains. Here, we report that the mtDNA sequence of the strain used for nuclear genome sequencing assembles into a circular map of 85 779 bp which includes 10 kb of new sequence. We give a list of seven small hypothetical open reading frames (ORFs). Hot spots of point mutations are found in exons near the insertion sites of optional mobile group I intron-related sequences. Our data suggest that shuffling of mobile elements plays an important role in the remodelling of the yeast mitochondrial genome.

3.
The structure of macronuclear DNA of a hypotrichous ciliate, Stylonychia pustulata, was examined by both electron microscopy and nucleotide sequence analysis. The DNA in the macronucleus consists of small linear molecules with an average length of about 3400 base pairs (bp). Most, if not all, of these DNA molecules have an identical inverted terminal repeat sequence of 20 nucleotide residues. This sequence is 5'-CCCCAAAACCCCAAAACCCC.

4.
5.
The nucleotide sequences of nine clones, pKA191/1-4 from Drosophila kitumensis and pMR.190/1–5 from D. microlabis, were determined. They represent a tandemly arranged and highly repetitive satellite DNA family, KM190, which is specific for the two species.

6.
The nucleotide sequences of nine clones, pKA191/1-4 from Drosophila kitumensis and pMR.190/1–5 from D. microlabis, were determined. They represent a tandemly arranged and highly repetitive satellite DNA family, KM190, which is specific for the two species.

7.
Libraries of cosmid and plasmid clones covering the entire region of mtDNA from the liverwort Marchantia polymorpha were constructed. These clones were used to determine the complete nucleotide sequence of the liverwort mtDNA, totaling 186,608 bp (GenBank no. M68929) and including genes for 3 species of ribosomal RNAs, 29 genes for 27 species of transfer RNAs, and 30 genes for functionally known proteins (16 ribosomal proteins, 3 subunits of cytochrome c oxidase, apocytochrome b protein, 3 subunits of H+-ATPase, and 7 subunits of NADH ubiquinone oxidoreductase). The genome also contains 32 unidentified open reading frames. Thus the complete nucleotide sequences of both the chloroplast and mitochondrial genomes have been determined in the same organism. Plasmid clones are available upon request. Gene names are represented according to Lonsdale and Leaver (1988) with modifications recommended by Lonsdale (personal communication).

8.
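The smallest highly repetitive DNA fragment obtained by BamHI digestion of musk shrew DNA was recombined into plasmid pAT153, and transformation yielded a clone containing the musk shrew BMS(BamHI)-1 highly repetitive DNA fragment. The fragment was then recombined into M13mp19 phage DNA. The complete nucleotide sequence, determined by the chain-termination method, is 495 base pairs long. The structural features of the musk shrew BMS(BamHI)-1 fragment were analyzed and compared with the tree shrew TSr(BglII)-1 highly repetitive DNA. The comparison provides some molecular genetic evidence for determining the taxonomic position of the tree shrew.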

9.
We have cloned and sequenced the displacement-loop (D-loop) region of the mitochondrial DNA (mtDNA) from the European seabass Dicentrarchus labrax (Dl). This sequencing revealed the presence of four tandemly repeated elements (R1, R2, R3 and R4); the individual variation in mtDNA total length is entirely accounted for by their variable number. The individuals examined also possessed an imperfect copy of one of the tandem repeats (ΨR2). At least one termination-associated sequence (TAS) is present in each of the repeats and in two copies 5′ upstream from the tandem array as well. The alignment of the Dl D-loop region with D-loop sequences from four other Teleosts and one Chondrosteus showed the Dl sequence to be larger than that of other fish. The extraordinary length of the Dl D-loop sequence is also due to the 5′ and 3′ regions flanking the tandem array, which are the largest analyzed in fish to date. In this study, we also report the unique organization and localization of putative TAS and conserved-sequence block (CSB) elements, and the presence of a conserved 218-bp sequence in the Dl D-loop region.

10.
Alignment-free classifiers are especially useful in the functional classification of protein classes with variable homology and different domain structures. Thus, the Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and the MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of the amino acids into four classes of hydrophobicity and polarity, revealing higher sequence-order information beyond the amino acid composition level. The predictive power of such TIs was evaluated for the first time on the RNase III family, due to the high diversity of its members (primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: Decision Tree Model (DTM), Artificial Neural Networks (ANN)-model and Hidden Markov Model (HMM). The first two are alignment-free approaches, using TIs as input predictors. Their performances were compared with a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models showed similar performances on the training and the test sets, reaching values above 90% in the overall classification. The non-classical HMM showed the highest classification rate, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM offered a simple RNase III classification at low computational cost. This simplicity was evaluated with respect to the HMM and ANN models for the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.

11.
We have identified and cloned a new member of the mammalian tandem pore domain K+ channel subunit family, TWIK-originated similarity sequence, from a human testis cDNA library. The 939 bp open reading frame encodes a 313 amino acid polypeptide with a calculated Mr of 33.7 kDa. Despite the same predicted topology, there is a relatively low sequence homology between TWIK-originated similarity sequence and other members of the mammalian tandem pore domain K+ channel subunit family group. TWIK-originated similarity sequence shares a low (< 30%) identity with the other mammalian tandem pore domain K+ channel subunit family group members and the highest identity (34%) with TWIK-1 at the amino acid level. Similar low levels of sequence homology exist between all members of the mammalian tandem pore domain K+ channel subunit family. Potential glycosylation and consensus PKC sites are present. Northern analysis revealed species and tissue-specific expression patterns. Expression of TWIK-originated similarity sequence is restricted to human pancreas, placenta and heart, while in the mouse, TWIK-originated similarity sequence is expressed in the liver. No functional currents were observed in Xenopus laevis oocytes or HEK293T cells, suggesting that TWIK-originated similarity sequence may be targeted to locations other than the plasma membrane or that TWIK-originated similarity sequence may represent a novel regulatory mammalian tandem pore domain K+ channel subunit family subunit.

12.
The nucleotide (nt) sequence of the 5508-nt intergenic spacer (IGS), between the 25S- and the 18S-coding regions of Cucurbita maxima rDNA, was determined. The fragment sequenced is 6142 nt long and includes 472 nt of 25S- and 162 nt of 18S-coding regions. The IGS has a complex primary structure, composed of five repetitive families (A-E) and three unique domains. It is dominated by the presence of nine tandemly repeating units of approximately 250 nt (repeat D), each unit containing four copies of an internal subrepeat (repeat E). The repetitive units show sequence variability consisting of nt changes, insertions and deletions. Upstream of the nine D repeats and between two copies of the B repeat is a 575-nt region, highly G + C rich (83%) and heavily biased toward C (58%) in the sense strand. Within this region are six repetitive units, averaging 42 nt (repeat C) each, containing but a single A nt. Downstream from the terminus of the 25S-coding sequence are two tandem copies of the 103-nt A repeat. The IGS of C. maxima is longer and more complex than the other plant IGSs described to date. The 600 nt at the 5' portion of the cucurbit IGS is more conserved in evolution than the remainder, as revealed by comparison of C. maxima and C. pepo IGS restriction maps and by nucleotide sequence comparison of C. maxima and Cucumis sativa IGSs.

13.
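Properties and nucleotide sequence of pC1, a plasmid-like DNA from cucumber mitochondria   Total citations: 2 (self-citations: 0, citations by others: 2)
In addition to the main circular DNA, the mitochondria of the cucumber cultivar Jinyan No. 4 contain four plasmid-like DNAs: pC1, pC2, pC3 and pC4. The circular plasmid-like DNA pC1 was inserted into the EcoRI site of pUC19 and cloned into E. coli JM109. Homology tests using the cloned pC1 as a probe showed that pC1 is not homologous to the nuclear genome, the chloroplast genome, the mitochondrial genome, or the other mitochondrial plasmid-like DNAs of Jinyan No. 4 cucumber. Sequencing and analysis showed that pC1 is 2,889 bp long and contains multiple direct and inverted repeats, including three 8…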

14.
Protein sequences are treated as stochastic processes on the basis of a reduced amino acid alphabet of 10 types of amino acids. The realization of a stochastic process is described by an associated transition probability matrix that corresponds uniquely to the process. New distances between transition probability matrices are then defined for sequence similarity analysis. Two separate datasets are prepared and tested to assess the validity of the method. The results demonstrate that the new method is powerful and efficient.
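
The transition-matrix idea in item 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the 10-group reduction in `GROUPS` is a hypothetical grouping chosen only so that the example runs, and the Frobenius norm stands in for the "new distances" that the abstract mentions but does not define.

```python
from math import sqrt

# Hypothetical 10-group reduction of the 20 amino acids (illustrative only;
# the abstract does not say which reduced alphabet it uses).
GROUPS = ["AG", "ST", "C", "P", "DE", "NQ", "KR", "H", "ILMV", "FWY"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def transition_matrix(protein):
    """Row-normalised first-order transition counts between reduced-alphabet states."""
    n = len(GROUPS)
    counts = [[0] * n for _ in range(n)]
    # Map residues to group indices, skipping symbols outside the 20 standard residues.
    states = [GROUP_OF[aa] for aa in protein.upper() if aa in GROUP_OF]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that never occur as a source stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def frobenius_distance(p, q):
    """Frobenius norm of the difference of two matrices (a stand-in distance)."""
    return sqrt(sum((x - y) ** 2 for rp, rq in zip(p, q) for x, y in zip(rp, rq)))
```

Two proteins can then be compared with `frobenius_distance(transition_matrix(a), transition_matrix(b))`; identical sequences give a distance of 0.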

15.
Several proteins and genes are members of families that share a common evolutionary origin. Sequence comparison is an emerging way to outline evolutionary relationships and to recognize conserved patterns. The current work critically investigates the role of k-mers in the composition vector method for comparing genome sequences. Generally, composition vector methods based on k-mers are applied with different values of k to compare genome sequences. For some values of k the results are satisfactory, but for others they are not. In the proposed work, the standard composition vector method is carried out with 3-mer strings. In addition, a special type of information-based similarity index is used as a distance measure. The work establishes that the use of 3-mers with the information-based similarity index provides satisfactory results in all cases, especially for the comparison of whole genome sequences. These choices provide a sort of unified approach to the comparison of genome sequences.
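
As a rough illustration of the 3-mer idea in item 15, the sketch below counts overlapping 3-mers in a DNA string and compares two frequency vectors. The function names and the choice of Jensen-Shannon divergence are assumptions made for illustration: the abstract does not specify its information-based similarity index, and the standard composition vector method also subtracts a background expectation, which is omitted here.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_frequencies(seq, k=3):
    """Relative frequencies of all 4**k DNA k-mers, counted over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing symbols other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency vectors, in [0, 1]."""
    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)
```

A pairwise distance matrix for a set of genomes would call `js_divergence(kmer_frequencies(a), kmer_frequencies(b))` for each pair; changing `k` is a one-argument change, which makes it easy to see how sensitive a result is to that choice.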

16.
The plasmid pBR322 was one of the first EK2 multipurpose cloning vectors to be designed and constructed (ten years ago) for the efficient cloning and selection of recombinant DNA molecules in Escherichia coli. This 4363-bp DNA molecule has been extensively used as a cloning vehicle because of its simplicity and the availability of its nucleotide sequence. The widespread use of pBR322 has prompted numerous studies into its molecular structure and function. These studies revealed two features that detract from the plasmid's effectiveness as a cloning vector: (a) plasmid instability in the absence of selection and (b) the lack of a direct selection scheme for recombinant DNA molecules. Several vectors based on pBR322 have been constructed to overcome these limitations and to extend the vector's versatility to accommodate special cloning purposes. The objective of this review is to provide a survey of these derivative vectors and to summarize information currently available on pBR322.

17.
    
The gain in fitness during adaptation depends on the supply of beneficial mutations. Despite a good theoretical understanding of how evolution proceeds for a defined set of mutations, there is little understanding of constraints on net fitness: whether fitness will reach a limit despite ongoing selection and mutation, and if there is a limit, what determines it. Here, the dsDNA bacteriophage SP6, a virus of Salmonella, was adapted to Escherichia coli K-12. From an isolate capable of modest growth on E. coli, four lines were adapted for rapid growth by protocols differing in use of mutagen, propagation method, and duration, but using the same media, temperature, and a continual excess of the novel host. Nucleotide changes underlying those adaptations differed greatly in number and identity, but the four lines achieved similar absolute fitness at the end, an increase of more than 4000-fold phage descendants per hour. Thus, the fitness landscape allows multiple genetic paths to the same approximate fitness limit. The existence and causes of fitness limits have ramifications for genome engineering, vaccine design, and "lethal mutagenesis" treatments to cure viral infections.

18.
We isolated cDNA for regenectin, a C-type lectin of the American cockroach (Periplaneta americana), and analysed expression of the regenectin gene in the regenerating legs. Regenectin was found to be a member of the Periplaneta lectin-related protein family. We found that the regenectin gene was expressed specifically in the epidermal cells of the newly formed regenerating legs. Together with our previous results, these results suggest that regenectin is synthesized by epidermal cells, secreted into the regenerating leg saccule, and assembles around myoblasts to form leg muscle fibers in situ.

19.
The concept of nucleic acid sequence base alterations is presented. The number of base alterations for sequences of different length is established. The definition of "enlarged similarity" of nucleic acid sequences on the basis of sequence base alterations is introduced. Mutual information between sequences is used as a quantitative measure of enlarged similarity for two compared sequences. A method of mutual information calculation is developed that considers the correlation of bases in the compared sequences. Definitions of correlated similarity and evolution similarity between compared sequences are given. Results of using the enlarged similarity approach for DNA sequence analysis are discussed.
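
To make the role of mutual information in item 19 concrete, the sketch below computes the plain position-wise mutual information between two equal-length sequences. It is a simplified stand-in: pairing bases by position and the function name `mutual_information` are assumptions, and the correlation-aware calculation described in the abstract is not reproduced.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (bits) between symbols at matching positions of x and y."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # counts of (x_i, y_i) pairs
    px, py = Counter(x), Counter(y)
    # sum of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )
```

Under this definition a sequence and a consistently substituted copy of it (for example every A replaced by T and every C by G) score as high as two identical sequences, because mutual information only asks whether one sequence predicts the other. That property fits the intuition of a similarity that tolerates base alterations.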

20.
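In studies of DNA sequence similarity, the commonly used dynamic programming algorithms lack a theoretical basis for the gap penalty function, so the choice is subjective and leads to differing results. This paper proposes a DNA sequence similarity measure based on the DTW (Dynamic Time Warping) distance that addresses this problem. A graphical representation is used to convert each DNA sequence into a time series, and the DTW distance is then computed to measure sequence similarity and characterize the properties of the sequences, yielding a method for comparing DNA sequences. The method was used to compare the gene sequences of seven neurotoxins from the East Asian scorpion Buthus martensi Karsch, which verified its validity and accuracy.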
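
The DTW step in item 20 can be sketched with the textbook dynamic programming recurrence. The base-to-number mapping in `STEP` and the cumulative-walk conversion in `dna_to_series` are hypothetical placeholders, since the abstract does not say which graphical representation it uses; only `dtw_distance` is the standard algorithm.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b
                                 cost[i][j - 1],      # b[j-1] repeats against a
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical numeric step per base (an assumption, not taken from the paper).
STEP = {"A": 2, "G": 1, "C": -1, "T": -2}


def dna_to_series(seq):
    """Turn a DNA string into a cumulative-walk time series using STEP."""
    total, series = 0, []
    for base in seq.upper():
        total += STEP[base]
        series.append(total)
    return series
```

A pair of sequences is then compared with `dtw_distance(dna_to_series(s1), dna_to_series(s2))`. Because warping lets one point align with several, sequences of different lengths can be compared without choosing a gap penalty, which is the motivation the abstract gives.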
