Similar articles
Found 20 similar articles (search took 31 ms)
1.
Deng M, Yu C, Liang Q, He RL, Yau SS. PLoS One, 2011, 6(3): e17293

Background

Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences.

Methodology

To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance, which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV) and mammalian mitochondria. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses.

Conclusions

Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than with multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.
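As a rough illustration of the natural-vector idea in entry 1, the sketch below computes, for each nucleotide, its count, its mean position and a scaled second central moment, and then takes the Euclidean distance between two such vectors. The abstract does not state the formula, so the 12-dimensional truncation, the 1-based positions, the scaling by n_k * n, and the function names `natural_vector` and `natural_distance` are all assumptions made for illustration. A truncated vector like this one is not guaranteed to be one-to-one with the sequence.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Return 12 numbers: per-nucleotide counts, mean positions, scaled 2nd moments.

    Assumed layout (not taken from the abstract): [n_A..n_T, mu_A..mu_T, D2_A..D2_T].
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        # 1-based positions of this nucleotide (an assumption of this sketch)
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            # Absent nucleotide: use zeros for all three components
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        # Second central moment, scaled by n_k * n (assumed normalization)
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def natural_distance(seq_a, seq_b):
    """Euclidean distance between the natural vectors of two sequences."""
    va, vb = natural_vector(seq_a), natural_vector(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Because each sequence is reduced to a fixed-length vector once, adding a new genome only requires computing one more vector, which is the property the Conclusions paragraph contrasts with realignment.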

2.
Gene identification in novel eukaryotic genomes by self-training algorithm   (total citations: 8; self-citations: 0; citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for efficacious gene hunting in DNA sequences of a new genome. Gene identification methods based on cDNA and expressed sequence tag (EST) mapping to genomic DNA, or those using alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, neither of these types of data may be available in sufficient amounts until rather late stages of the sequencing of a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with the estimation of statistical models directly from as yet anonymous genomic DNA. The suggested method of running gene prediction in parallel with model parameter estimation follows the path of iterative Viterbi training. Rounds of genomic sequence labeling into coding and non-coding regions are followed by rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
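The alternation of labeling rounds and parameter estimation rounds described in entry 2 can be sketched with a toy two-state hidden Markov model. This is only a minimal illustration of Viterbi training in general: the two states, the single-nucleotide emissions, the pseudocount and every function name below are assumptions, and the sketch omits both the gene model and the dynamically changing parameter restrictions mentioned in the abstract.

```python
import math

STATES = ("coding", "noncoding")
ALPHABET = "ACGT"


def viterbi(seq, start, trans, emit):
    """Most likely state path for a non-empty seq, computed in log space."""
    score = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position
            prev = max(STATES, key=lambda p: score[-1][p] + math.log(trans[p][s]))
            row[s] = score[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][ch])
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def reestimate(seq, path, pseudo=1.0):
    """Re-estimate transition and emission probabilities from one labeling."""
    emit = {s: {a: pseudo for a in ALPHABET} for s in STATES}
    trans = {s: {t: pseudo for t in STATES} for s in STATES}
    for ch, s in zip(seq, path):
        emit[s][ch] += 1
    for s, t in zip(path, path[1:]):
        trans[s][t] += 1
    for s in STATES:
        e_total = sum(emit[s].values())
        t_total = sum(trans[s].values())
        emit[s] = {a: c / e_total for a, c in emit[s].items()}
        trans[s] = {t: c / t_total for t, c in trans[s].items()}
    return trans, emit


def viterbi_training(seq, start, trans, emit, rounds=10):
    """Alternate labeling and re-estimation until the labeling stops changing."""
    path = None
    for _ in range(rounds):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
    return path, trans, emit
```

The pseudocount keeps every probability strictly positive so that the logarithms stay defined. Starting from rough initial parameters, no labeled training set is needed, which is the sense in which such a procedure is self-training.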

3.
In order to study the evolution of mitochondrial genomes in the early branching lineages of the monocotyledons, i.e., the Acorales and Alismatales, we are sequencing complete genomes from a suite of key taxa. As a starting point, the present paper describes the mitochondrial genome of Butomus umbellatus (Butomaceae) based on next-generation sequencing data. The genome was assembled into a circular molecule, 450,826 bp in length. Coding sequences cover only 8.2% of the genome and include 28 protein coding genes, four rRNA genes, and 12 tRNA genes. Some of the tRNA genes and a 16S rRNA gene are transferred from the plastid genome. However, the total amount of recognized plastid sequences in the mitochondrial genome is only 1.5% and the amount of DNA transferred from the nucleus is also low. RNA editing is abundant and a total of 557 edited sites are predicted in the protein coding genes. Compared to the 40 angiosperm mitochondrial genomes sequenced to date, the GC content of the Butomus genome is uniquely high (49.1%). The overall similarity between the mitochondrial genomes of Butomus and Spirodela (Araceae), the closest relative yet sequenced, is low (less than 20%), and the two genomes differ in size by a factor of 2. Gene order is also largely unconserved. However, based on its phylogenetic position within the core alismatids, Butomus will serve as a good reference point for subsequent studies in the early branching lineages of the monocotyledons.

4.
Background and Aims

Some plant groups, especially on islands, have been shaped by strong ancestral bottlenecks and rapid, recent radiation of phenotypic characters. Single molecular markers are often not informative enough for phylogenetic reconstruction in such plant groups. Whole plastid genomes and nuclear ribosomal DNA (nrDNA) are viewed by many researchers as sources of information for phylogenetic reconstruction of groups in which expected levels of divergence in standard markers are low. Here we evaluate the usefulness of these data types to resolve phylogenetic relationships among closely related Diospyros species.

Methods

Twenty-two closely related Diospyros species from New Caledonia were investigated using whole plastid genomes and nrDNA data from low-coverage next-generation sequencing (NGS). Phylogenetic trees were inferred using maximum parsimony, maximum likelihood and Bayesian inference on separate plastid and nrDNA and combined matrices.

Key Results

The plastid and nrDNA sequences were, singly and together, unable to provide well supported phylogenetic relationships among the closely related New Caledonian Diospyros species. In the nrDNA, a 6-fold greater percentage of parsimony-informative characters compared with plastid DNA was found, but the total number of informative sites was greater for the much larger plastid DNA genomes. Combining the plastid and nuclear data improved resolution. Plastid results showed a trend towards geographical clustering of accessions rather than following taxonomic species.

Conclusions

In plant groups in which multiple plastid markers are not sufficiently informative, an investigation at the level of the entire plastid genome may also not be sufficient for detailed phylogenetic reconstruction. Sequencing of complete plastid genomes and nrDNA repeats seems to clarify some relationships among the New Caledonian Diospyros species, but the higher percentage of parsimony-informative characters in nrDNA compared with plastid DNA did not help to resolve the phylogenetic tree because the total number of variable sites was much lower than in the entire plastid genome. The geographical clustering of the individuals against a background of overall low sequence divergence could indicate transfer of plastid genomes due to hybridization and introgression following secondary contact.

5.
Next generation sequencing is quickly emerging as the go-to tool for plant virologists when sequencing whole virus genomes and undertaking plant metagenomic studies for new virus discoveries. This study aims to compare the genomic and biological properties of Bean yellow mosaic virus (BYMV) (genus Potyvirus) isolates from Lupinus angustifolius plants with black pod syndrome (BPS), systemic necrosis or non-necrotic symptoms, and from two other plant species. When one Clover yellow vein virus (ClYVV) (genus Potyvirus) and 22 BYMV isolates were sequenced on the Illumina HiSeq2000, one new ClYVV and 23 new BYMV sequences were obtained. When the 23 new BYMV genomes were compared with 17 other BYMV genomes available on GenBank, phylogenetic analysis provided strong support for the existence of nine phylogenetic groupings. Biological studies involving seven isolates of BYMV and one of ClYVV gave no symptoms or reactions that could be used to distinguish BYMV isolates from L. angustifolius plants with black pod syndrome from other isolates. Here, we propose that the current system of nomenclature based on biological properties be replaced by numbered groups (I–IX). This is because use of whole genomes revealed that the previous phylogenetic grouping system based on partial sequences of virus genomes and original isolation hosts was unsustainable. This study also demonstrated that, where next generation sequencing is used to obtain complete plant virus genomes, consideration needs to be given to issues regarding sample preparation, adequate levels of coverage across a genome and methods of assembly. It also provided important lessons that will be helpful to other plant virologists using next generation sequencing in the future.

6.
To investigate the phylogenetic relationships among species with awnless lemmas in Roegneria and their related diploid genera, and to identify the possible genomic constitution and genome donors of these species, we carried out phylogenetic analyses of disrupted meiotic cDNA (DMC1) sequences. The results showed that: (1) Roegneria alashanica-1 grouped with the Y-type sequences and Roegneria alashanica-2 grouped with the St-type sequences, confirming that Roegneria alashanica has the StY genomes. (2) Roegneria grandis-1 grouped with the Y-type sequences and Roegneria grandis-2 grouped with the St-type sequences, confirming that Roegneria grandis has the StgY genomes, where the St genome from R. grandis is different from the St genome but is homologous with the Y genome. (3) Two Roegneria elytrigioides sequences grouped with the St-type sequences, confirming that Roegneria elytrigioides has the St1St2 genomes and should therefore be classified as Pseudoroegneria elytrigioides. (4) We prefer the suggestion that the Y genome is closely related to the St genome; however, the data do not confirm that the St and Y genomes have the same origin.

7.

Background

Mitochondria are the main producers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes.

Methodology/Principal Findings

We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on Cycas taitungensis and 24 angiosperm mt genomes. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found to be conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes.

Conclusion

The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

8.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole-genome sequences that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
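The angle-based distance in entry 8 can be illustrated on raw tetrapeptide counts. The sketch below is a simplification under stated assumptions: it skips the singular value decomposition step entirely and works on sparse count vectors, and the function names `tetrapeptide_counts`, `organism_vector` and `angle_distance` are hypothetical. It shows only how an angle between vectors becomes a distance and how protein vectors can be summed into one organism vector.

```python
import math
from collections import Counter


def tetrapeptide_counts(protein):
    """Count overlapping 4-residue substrings of one protein sequence."""
    return Counter(protein[i:i + 4] for i in range(len(protein) - 3))


def organism_vector(proteins):
    """Sum the per-protein count vectors into one vector for the organism."""
    total = Counter()
    for protein in proteins:
        total.update(tetrapeptide_counts(protein))
    return total


def angle_distance(u, v):
    """Angle in radians between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("both vectors must contain at least one tetrapeptide")
    # Clamp to guard against rounding slightly outside [-1, 1]
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

Because counts are non-negative, the angle always lies between 0 (identical composition) and pi/2 (no tetrapeptide in common). An insertion or deletion changes only the few tetrapeptides that span it, which is one way to read the abstract's closing remark about sequences remaining recognizably similar.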

9.
Ribosomal gene sequences are a popular choice for identification of bacterial species and, often, for making phylogenetic interpretations. Although very popular, the sequences of 16S rDNA and 16-23S intergenic sequences often fail to differentiate closely related species of bacteria. The availability of complete genome sequences of bacteria in recent years has accelerated the search for new genome targets for phylogenetic interpretations. The recently published full genome data of nine strains of R. solanacearum, which causes bacterial wilt of crop plants, has provided enormous genomic choices for phylogenetic analysis in this globally important plant pathogen. We have compared a gene candidate, recN, which codes for DNA repair and recombination function, with 16S rDNA/16-23S intergenic ribosomal gene sequences for identification and intraspecific phylogenetic interpretations in R. solanacearum. recN gene sequence analysis of R. solanacearum revealed subgroups within phylotypes (or newly proposed species within the plant pathogenic genus Ralstonia), indicating its usefulness for intraspecific genotyping. The taxonomic discriminatory power of the recN gene sequence was found to be superior to that of ribosomal DNA sequences. In all, the recN-sequence-based phylogenetic tree generated with the Bayesian model depicted 21 haplotypes against 15 and 13 haplotypes obtained with 16S rDNA and 16-23S rDNA intergenic sequences, respectively. Besides this, we have observed a high percentage of polymorphic sites (S 23.04%), a high rate of mutations (Eta 276) and a high codon bias index (CBI 0.60), which make recN an ideal gene candidate for intraspecific molecular typing of this important plant pathogen.

10.
Comparing DNA sequences reveals their degree of similarity, which helps in inferring their relationships in terms of structure, function and evolution. In this paper, we introduce the fundamentals of a weighted relative entropy based on a 2-step Markov model for comparing DNA sequences. A DNA sequence, consisting of the four characters A, T, C and G, can be considered a Markov chain. By taking the state space I = {A, T, C, G} and describing the DNA sequence with a 2-step transition probability matrix, we obtain the eigenvalue of the DNA sequence, which is used to define the similarity metric. This gives a new method for comparing DNA sequences, which we use to classify chromosomal DNA sequences obtained from 30 species. The phylogenetic tree built by this alignment-free method from the distance matrix derived from the weighted relative entropy shows a clearer and more accurate division.
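To make the Markov-chain view in entry 10 concrete, the sketch below estimates a transition matrix from a DNA sequence, squares it to get 2-step transition probabilities, and compares two sequences with a symmetrized relative entropy. Several things here are assumptions rather than the paper's method: "2-step" is read as the two-step transition matrix of a first-order chain, the weighting and eigenvalue steps mentioned in the abstract are left out because the abstract does not specify them, and the pseudocount and function names are illustrative.

```python
import math

BASES = "ATCG"


def transition_matrix(seq, pseudo=1.0):
    """One-step transition probabilities estimated from adjacent base pairs."""
    counts = {a: {b: pseudo for b in BASES} for a in BASES}
    for a, b in zip(seq, seq[1:]):
        if a in counts and b in counts:  # skip ambiguous symbols such as N
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


def two_step(p):
    """Two-step transition probabilities: the matrix product of p with itself."""
    return {
        a: {c: sum(p[a][b] * p[b][c] for b in BASES) for c in BASES}
        for a in BASES
    }


def relative_entropy(p, q):
    """Sum over rows of the Kullback-Leibler divergence of p from q."""
    return sum(
        p[a][b] * math.log(p[a][b] / q[a][b]) for a in BASES for b in BASES
    )


def markov_distance(seq_a, seq_b):
    """Symmetrized relative entropy between two-step transition matrices."""
    p = two_step(transition_matrix(seq_a.upper()))
    q = two_step(transition_matrix(seq_b.upper()))
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))
```

The pseudocount keeps every probability positive so that the logarithm is always defined. Pairwise values of `markov_distance` over a set of sequences would form the kind of distance matrix from which a tree can be built.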

11.
Praxelis (Eupatorium catarium Veldkamp) is a new hazardous invasive plant species that has caused serious economic losses and environmental damage in the tropical and subtropical regions of the Northern Hemisphere. Although previous studies focused on detecting the biological characteristics of this plant to prevent its expansion, little effort has been made to understand the impact of Praxelis on the ecosystem in an evolutionary process. The genetic information of Praxelis is required for further phylogenetic identification and evolutionary studies. Here, we report the complete Praxelis chloroplast (cp) genome sequence. The Praxelis chloroplast genome is 151,410 bp in length, including a small single-copy region (18,547 bp) and a large single-copy region (85,311 bp) separated by a pair of inverted repeats (IRs; 23,776 bp). The genome contains 85 unique and 18 duplicated genes in the IR region. The gene content and organization are similar to those of other Asteraceae tribe cp genomes. We also analyzed the whole cp genome sequence, repeat structure, codon usage, contraction of the IR and gene structure/organization features between native and invasive Asteraceae plants, in order to understand the evolution of organelle genomes between native and invasive Asteraceae. Comparative analysis identified 14 markers containing more than 2% parsimony-informative characters, indicating that they are potentially informative markers for barcoding and phylogenetic analysis. Moreover, a sister relationship between Praxelis and seven other species in Asteraceae was found based on phylogenetic analysis of 28 protein-coding sequences. Complete cp genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family.

12.
DNA condensation with polyamines. II. Electron microscopic studies   (total citations: 24; self-citations: 0; citations by others: 24)
Approximately 75% of the wheat and rye genomes consist of repeated sequence DNA. Three-quarters of the non-repeated or few copy sequences in wheat are less than 1000 base-pairs long, whilst in rye approximately half of the non-repeated or few copy sequences are in this size class. Most of the remaining non-repeated or few copy sequences appear to be a few thousand base-pairs long.
In this paper a somewhat novel approach has been used to quantitatively analyse the linear organisation of the large proportion of repeated sequence DNA as well as the non-repeated DNA in the wheat and rye genomes. Repeated sequences in the genomes of oats, barley, wheat and rye have been used as probes to distinguish and isolate four different groups of repeated sequences and their neighbouring sequences from the wheat and rye genomes. Radioactively labelled wheat or rye DNA fragments ranging from 200 to over 9000 nucleotides long were incubated separately with large excesses of denatured unlabelled oats, barley, wheat and rye DNAs to Cot values which enable all the repeated sequences of the unlabelled DNA to renature. The following parameters were then determined from the proportions of total labelled DNA in fragments which had at least partially renatured: (1) the proportions of the repeated sequences in the labelled DNAs that were able to hybridise to each unlabelled DNA; (2) the mean distance apart of the hybridising sequences on the longer labelled fragments; and (3) the proportion of the genome in which the hybridising sequences were concentrated.
Analysis of these results, together with those of separate experiments designed to quantitatively estimate the nature of sequences unable to reanneal with the repeated sequences of each of the probe DNAs, has enabled schematic maps to be drawn which show how the repeated and non-repeated sequences are arranged in the wheat and rye genomes.
Both genomes are constructed from millions of relatively short sequences, most of them considerably shorter than 3000 base-pairs. This structure was recognised because adjacent sequences can be distinguished by their frequency of repetition (i.e. repeated or non-repeated) or by their evolutionary origin. Approximately 40 to 45% of the wheat genome and 30 to 35% of the rye genome consists of short non-repeated sequences interspersed between short repeated sequences. Approximately 50% of the wheat genome and 60% of the rye genome consists of tandemly arranged repeated sequences of different evolutionary origins. It is postulated that much of this complex repeated sequence DNA could have arisen from amplification of compound sequences, each containing repeated and non-repeated sequence DNA.
Short repeated sequences with a number average length of around 200 base-pairs and which occupy about 20% of the wheat and rye genomes are related to repeated sequences also found in oats and barley. They are concentrated in 60 to 70% of the wheat and rye genomes, being interspersed with different short repeated sequences and a significant proportion of the short non-repeated sequences.
Rye chromosomes contain more DNA than wheat chromosomes. This is principally, but not entirely, due to additional repeated sequence DNA. Many quantitative changes appear to have occurred in both genomes, possibly affecting most families of repeated sequences, since wheat and rye diverged from a common ancestor. Both species contain species-specific repeated sequences (24% of rye genome; 16% of wheat genome) but a large proportion of these are closely interspersed with repeated sequences found in both genomes.

13.
Arctica islandica is known as the longest-lived non-colonial metazoan species on earth and is therefore increasingly being investigated as a new model in aging research. As the mitochondrial genome is associated with the process of aging in many species and bivalves are known to possess a peculiar mechanism of mitochondrial genome inheritance including doubly uniparental inheritance (DUI), we aimed to assess the genomic variability of the A. islandica mitochondrial DNA (mtDNA). We sequenced the complete mitochondrial genomes of A. islandica specimens from three different sites in the Western Palaearctic (Iceland, North Sea, Baltic Sea). We found the A. islandica mtDNA to fall within the normal size range (18 kb) and to exhibit similar coding capacity to other animal mtDNAs. The concatenated protein sequences of all currently known Veneroidea mtDNAs were used to robustly place A. islandica in a phylogenetic framework. Analysis of the observed single nucleotide polymorphism (SNP) patterns in further specimens revealed two prevailing haplotypes. Populations in the Baltic and the North Sea are very homogeneous, whereas the Icelandic population, from which exceptionally old individuals have been collected, is the most diverse one. Homogeneity in Baltic and North Sea populations points to either stronger environmental constraints or more recent colonization of the habitat. Our analysis lays the foundation for further studies on A. islandica population structures, for age research with this organism, and for phylogenetic studies. Accessions for the mitochondrial genome sequences: KC197241 Iceland; KF363951 Baltic Sea; KF363952 North Sea; KF465708 to KF465758 individual amplified regions from different specimens.

14.
Circoviruses are highly prevalent porcine and avian pathogens. In recent years, novel circular ssDNA genomes have been detected in a variety of fecal and environmental samples using deep sequencing approaches. In this study, the identification of genomes of novel circoviruses and cycloviruses in feces of insectivorous bats is reported. Pan-reactive primers targeting the conserved rep region of circoviruses and cycloviruses were used to screen DNA from bat fecal samples. Using this approach, partial rep sequences were detected which formed five phylogenetic groups distributed among the Circovirus and the recently proposed Cyclovirus genera of the Circoviridae. Further analysis using inverse PCR and Sanger sequencing led to the characterization of four new putative members of the family Circoviridae with genome sizes ranging from 1,608 to 1,790 nt, two inversely arranged ORFs, and canonical nonamer sequences atop a stem loop.

15.
Genomics, 2020, 112(1): 659-668
The NCBI database has >15 chloroplast (cp) genome sequences available for different Camellia species but none for C. assamica. There is no report of any mitochondrial (mt) genome in the Camellia genus or the Theaceae family. In the strong belief that these organelle genomes can serve as a valuable tool for taxonomic and phylogenetic analysis, we assembled and analyzed the cp and mt genomes of C. assamica. We assembled the complete mt genome of C. assamica in a single circular contig of 707,441 bp comprising a total of 66 annotated genes, including 35 protein-coding genes, 29 tRNAs and two rRNAs. The first ever cp genome of C. assamica resulted in a circular contig of 157,353 bp with a typical quadripartite structure. Phylogenetic analysis based on these organelle genomes showed that C. assamica was closely related to C. sinensis and C. leptophylla. The analysis also supports placing Caryophyllales within the Superasterids.

16.
Mitochondrial genomic investigation of flatfish monophyly   (total citations: 1; self-citations: 0; citations by others: 1)
We present the first study to use whole mitochondrial genome sequences to examine phylogenetic affinities of the flatfishes (Pleuronectiformes). Flatfishes have attracted attention in evolutionary biology since the early history of the field because understanding the evolutionary history and patterns of diversification of the group will shed light on the evolution of novel body plans. Because recent molecular studies based primarily on DNA sequences from nuclear loci have yielded conflicting results, it is important to examine phylogenetic signal in different genomes and genome regions. We aligned and analyzed mitochondrial genome sequences from thirty-nine pleuronectiforms, including nine that are newly reported here, and sixty-six non-pleuronectiforms (twenty additional clade L taxa [Carangimorpha or Carangimorpharia] and forty-six secondary outgroup taxa). The analyses yield strong support for clade L and weak support for the monophyly of Pleuronectiformes. The suborder Pleuronectoidei receives moderate support, and as with other molecular studies, the putatively basal lineage of Pleuronectiformes, the Psettodoidei, is frequently not most closely related to other pleuronectiforms. Within the Pleuronectoidei, the basal lineages in the group are poorly resolved; however, several flatfish subclades receive consistent support. The affinities of Lepidoblepharon and Citharoides among pleuronectoids are particularly uncertain with these data.

17.
Despite Diplostomum baeri (Dubois, 1937) being one of the most widely distributed parasites of freshwater fish, there is no complete mitochondrial (mt) genome currently available. The complicated systematics presented by D. baeri has hampered investigations into the species distributions and infective dynamics of the species. In this study we obtained complete mt genome sequences of D. baeri and assessed its phylogenetic relationship with other species of Digenea. The complete mitochondrial genome of D. baeri is 14,480 bp in length, containing 36 genes in total. The phylogenetic tree resulting from Bayesian inference of 12 concatenated protein coding gene sequences placed D. baeri alongside published mt genomes of Diplostomidae, with the overall taxonomic placement of the genus being a sister lineage of the order Plagiorchiida. The characterization of further mitochondrial genomes within the family Diplostomidae will help progress phylogenetic and epidemiological investigations as well as providing a framework for the analysis of diagnostic markers to be used in further monitoring of the parasite worldwide.

18.
FORRepeats: detects repeats on entire chromosomes and between genomes   (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: As more and more whole genomes become available, there is a need for new methods to compare large sequences and transfer biological knowledge from annotated genomes to related new ones. BLAST is not suitable for comparing multimegabase DNA sequences. MegaBLAST is designed to compare closely related large sequences. Some tools to detect repeats in large sequences, such as MUMmer or REPuter, have already been developed. They also have time or space restrictions. Moreover, in terms of applications, REPuter only computes repeats and MUMmer works better with related genomes. RESULTS: We present a heuristic method, named FORRepeats, which is based on a novel data structure called the factor oracle. In the first step it detects exact repeats in large sequences. Then, in the second step, it computes approximate repeats and performs pairwise comparison. We compared its computational characteristics with BLAST and REPuter. Results demonstrate that it is fast and space economical. We show FORRepeats' ability to perform intra-genomic comparison and to detect repeated DNA sequences in the complete genome of the model plant Arabidopsis thaliana.
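For readers unfamiliar with the factor oracle named in entry 18, the sketch below shows the standard online construction of that automaton for a string, together with a helper that checks whether a word is accepted. This is not FORRepeats itself, whose repeat-detection steps the abstract does not describe; the function names `build_factor_oracle` and `accepts` and the dictionary-based representation are assumptions chosen for brevity.

```python
def build_factor_oracle(text):
    """Build the factor oracle of text; return (transitions, supply links).

    transitions[i] maps a character to the target state; state 0 is the start.
    """
    transitions = [{}]
    supply = [-1]  # the supply link of the start state is undefined (-1)
    for i, ch in enumerate(text, start=1):
        transitions.append({})
        supply.append(0)
        # Spine transition from the previous state to the new state
        transitions[i - 1][ch] = i
        # Walk the supply links, adding shortcut transitions where ch is missing
        k = supply[i - 1]
        while k > -1 and ch not in transitions[k]:
            transitions[k][ch] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else transitions[k][ch]
    return transitions, supply


def accepts(transitions, word):
    """Return True if word can be read from the start state."""
    state = 0
    for ch in word:
        if ch not in transitions[state]:
            return False
        state = transitions[state][ch]
    return True
```

The automaton has exactly one state per text position plus a start state, which is what makes it space economical. It accepts every substring of the text but may also accept a few strings that are not substrings, which fits the abstract's description of the method as heuristic: candidate matches still need verification.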

19.
Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms, it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

20.
Genomics, 2019, 111(6): 1590-1603
Genomes are not random sequences because natural selection has injected information in biological sequences for billions of years. Inspired by this idea, we developed a simple method to compare genomes considering nucleotide counts in subsequences (blocks) instead of their exact sequences.
We introduce the Block Alignment method for comparing two genomes and, based on this comparison method, define a similarity score and a distance. The presented model ignores nucleotide order in the sequence. On the other hand, in this block comparison method, due to exclusion of point mutations and small size variations, there is no need for high coverage sequencing, which is responsible for the high costs of data production and storage; moreover, the sequence comparisons can be performed at higher speed.
Phylogenetic trees of two sets of bacterial genomes were constructed and the results were in full agreement with their already constructed phylogenetic trees. Furthermore, a weighted and directed similarity network of each set of bacterial genomes was inferred ab initio by this model. Remarkably, the communities of these networks are in agreement with the clades of the corresponding phylogenetic trees, which means these similarity networks also contain phylogenetic information about the genomes. Moreover, the block comparison method was used to distinguish rob(15;21)c-associated iAMP21 and sporadic iAMP21 rearrangements in subgroups of chromosome 21 in acute lymphoblastic leukemia. Our results show a meaningful difference between the number of contigs that mapped to chromosomes 15 and 21 in these cases. Furthermore, the presented block alignment model can select candidate blocks for more accurate analysis and is capable of finding conserved blocks in a set of genomes.
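The idea in entry 20 of comparing nucleotide counts per block instead of exact sequences can be sketched as follows. The abstract does not define its similarity score, so everything specific here is an assumption: fixed-size non-overlapping blocks, pairing blocks by position, the score 1 - L1/(2 * block_size), and the function names. It is meant only to show why such a comparison ignores nucleotide order inside a block.

```python
from collections import Counter


def block_counts(genome, block_size):
    """Split a genome into full blocks and return (A, C, G, T) counts per block."""
    genome = genome.upper()
    blocks = []
    for start in range(0, len(genome) - block_size + 1, block_size):
        counts = Counter(genome[start:start + block_size])
        blocks.append(tuple(counts[base] for base in "ACGT"))
    return blocks


def block_similarity(counts_a, counts_b, block_size):
    """Assumed score: 1.0 for identical composition, 0.0 for disjoint composition."""
    l1 = sum(abs(x - y) for x, y in zip(counts_a, counts_b))
    return 1.0 - l1 / (2 * block_size)


def genome_similarity(genome_a, genome_b, block_size=4):
    """Mean block similarity over blocks paired by position (an assumption)."""
    pairs = list(zip(block_counts(genome_a, block_size),
                     block_counts(genome_b, block_size)))
    if not pairs:
        raise ValueError("both genomes must contain at least one full block")
    return sum(block_similarity(a, b, block_size) for a, b in pairs) / len(pairs)
```

Two blocks that are permutations of each other score 1.0 under this sketch, which mirrors the abstract's statement that the model ignores nucleotide order. A distance could then be taken as one minus the similarity, again as an assumption.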
