
1.

An information-based sequence distance and its application to whole mitochondrial genome phylogeny  Total citations: 12; self-citations: 0; citations by others: 12

Li M Badger JH Chen X Kwong S Kearney P Zhang H 《Bioinformatics (Oxford, England)》2001,17(2):149-154

MOTIVATION: Traditional sequence distances require an alignment and therefore are not directly applicable to the problem of whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences using the information-theoretic concept of Kolmogorov complexity, and a program to estimate this distance. RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals. AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
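
To illustrate the idea of an alignment-free, compression-based distance, here is a minimal sketch. It is not the authors' program: it assumes zlib as a stand-in for a DNA-specific compressor such as GenCompress, and it uses the normalized compression distance, a related formulation, rather than the exact distance defined in the paper. The helper names (compressed_size, ncd) and the test sequences are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a crude proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two unaligned sequences."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy data: a random sequence, a close relative, and an unrelated one.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
# seq_b differs from seq_a by one substitution every 100 bases.
seq_b = "".join(swap[b] if i % 100 == 0 else b for i, b in enumerate(seq_a))
seq_c = "".join(rng.choice("ACGT") for _ in range(2000))

print("related:  ", ncd(seq_a, seq_b))
print("unrelated:", ncd(seq_a, seq_c))
```

The related pair should score lower than the unrelated pair, because the second sequence adds little new information once the first has been seen. A general-purpose compressor is a weak model of DNA, so real analyses would need a specialised one.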

2.

A mitochondrial genome phylogeny of Diptera: whole genome sequence data accurately resolve relationships over broad timescales with high precision

STEPHEN L. CAMERON CHRISTINE L. LAMBKIN STEPHEN C. BARKER MICHAEL F. WHITING 《Systematic Entomology》2007,32(1):40-59

Abstract Mitochondrial genomes provide a promising new tool for understanding deep‐level insect phylogenetics, but have yet to be evaluated for their ability to resolve intraordinal relationships. We tested the utility of mitochondrial genome data for the resolution of relationships within Diptera, the insect order for which the most data are available. We sequenced an additional three genomes, from a syrphid, nemestrinid and tabanid, representing three additional dipteran clades, ‘aschiza’, non‐heteroneuran muscomorpha and ‘basal brachyceran’, respectively. We assessed the influence of optimality criteria, gene inclusion/exclusion, data recoding and partitioning strategies on topology and nodal support within Diptera. Our consensus phylogeny of Diptera was largely consistent with previous phylogenetic hypotheses of the order, except that we did not recover a monophyletic Muscomorpha (Nemestrinidae grouped with Tabanidae) or Acalyptratae (Drosophilidae grouped with Calliphoridae). The results were very robust to optimality criteria, as parsimony, likelihood and Bayesian approaches yielded very similar topologies, although nodal support varied. The addition of ribosomal and transfer RNA genes to the protein coding genes traditionally used in mitochondrial genome phylogenies improved the resolution and support, contrary to previous suggestions that these genes would evolve too quickly or prove too difficult to align to provide phylogenetic signal at deep nodes. Strategies to recode data, aimed at reducing homoplasy, resulted in a decrease in tree resolution and branch support. Bayesian analyses were highly sensitive to partitioning strategy: biologically realistic partitions into codon groups produced the best results. The implications of this study for dipteran systematics and the effective approaches to using mitochondrial genome data are discussed.
Mitochondrial genomes resolve intraordinal relationships within Diptera accurately over very wide time ranges (1–200 million years ago) and genetic distances, suggesting that this may be an excellent data source for deep‐level studies within other, less studied, insect orders.
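
As a rough illustration of what partitioning a protein-coding alignment into codon groups can look like, the sketch below splits an in-frame sequence by codon position. It assumes that "codon groups" means first, second and third codon positions, which the abstract does not spell out; the function name codon_partitions and the example sequence are hypothetical.

```python
def codon_partitions(seq: str) -> tuple[str, str, str]:
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return seq[0::3], seq[1::3], seq[2::3]


# Hypothetical three-codon example: ATG GCC TAA
first, second, third = codon_partitions("ATGGCCTAA")
print(first, second, third)
```

Each partition could then be given its own substitution model in a partitioned analysis.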

3.

Extrapolating ENCODE data to the whole human genome

Costantini M Di Filippo M Bernardi G 《Gene》2008,419(1-2):66-69

The ENCODE (ENCyclopedia Of DNA Elements) project was launched three years ago with the purpose of identifying all of the functional elements in the human genome. ENCODE was started with 44 target sequences, which comprise 1% of the human genome. A crucial question about ENCODE is how representative it is of the human genome. Indeed, this is not a negligible problem if one considers that only 1% of the genome was selected for the project, and, more importantly, that the choice of the large DNA segments was based on two major criteria, namely the presence of extensively characterized genes and/or other functional elements, and the availability of a substantial amount of comparative sequence data. We found that the ENCODE data lead to an unbalanced representation of the compositional pattern of the human genome, especially for the GC-poorest and GC-richest regions. This imbalance can, however, be corrected by multiplying ENCODE data by a G/E factor (the ratio of whole-genome data to ENCODE data), thereby increasing the potential interest of ENCODE.
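
The G/E correction can be sketched as a per-class rescaling. The example below assumes the factor is computed separately for each compositional (GC) class as genome DNA amount divided by ENCODE DNA amount; the class names and all numbers are hypothetical and are not taken from the paper.

```python
def ge_factors(genome_mb: dict[str, float], encode_mb: dict[str, float]) -> dict[str, float]:
    """G/E factor per compositional class: whole-genome amount over ENCODE amount."""
    return {cls: genome_mb[cls] / encode_mb[cls] for cls in genome_mb}


def extrapolate(encode_counts: dict[str, float], factors: dict[str, float]) -> dict[str, float]:
    """Scale a quantity observed in ENCODE regions up to the whole genome, class by class."""
    return {cls: encode_counts[cls] * factors[cls] for cls in encode_counts}


# Hypothetical amounts of DNA (Mb) per GC class in the genome and in the ENCODE regions.
genome_mb = {"GC-poor": 600.0, "GC-mid": 1500.0, "GC-rich": 900.0}
encode_mb = {"GC-poor": 3.0, "GC-mid": 15.0, "GC-rich": 12.0}
# Hypothetical counts of some feature observed in the ENCODE regions.
encode_counts = {"GC-poor": 10.0, "GC-mid": 80.0, "GC-rich": 120.0}

factors = ge_factors(genome_mb, encode_mb)
genome_estimate = extrapolate(encode_counts, factors)
print(factors)
print(genome_estimate)
```

Classes that are under-sampled in ENCODE receive a larger factor, which is how the rescaling compensates for the unbalanced sampling.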

4.

Examining phylogenetic relationships of Erwinia and Pantoea species using whole genome sequence data

Yucheng Zhang Sai Qiu 《Antonie van Leeuwenhoek》2015,108(5):1037-1046


5.

How to improve gene phylogeny analysis from sequence data

Naruya S 《Tanpakushitsu kakusan koso. Protein, nucleic acid, enzyme》2002,47(9):1240-1242


6.

Origin and phylogeny of Oryza species with the CD genome based on multiple-gene sequence data

Y. Bao S. Ge 《Plant Systematics and Evolution》2004,249(1-2):55-66

The CD genome species in the genus Oryza are endemic to Latin America, including O. alta, O. grandiglumis and O. latifolia. The origins and phylogenetic relationships of these species have long been in dispute and are still ambiguous due to their homogeneous genome type, similar morphological characteristics and overlapping distribution. In the present study, we sequenced two chloroplast fragments (matK and trnL-trnF) and portions of three nuclear genes (Adh1, Adh2 and GPA1) from sixteen accessions representing seven species with the C, CD, and E genomes, as well as one G genome species as the outgroup. Phylogenetic analyses using parsimony and distance methods strongly supported the hypothesis that the CD genome originated from a single hybridization event, and that the C genome species (O. officinalis or O. rhizomatis instead of O. eichingeri) served as the maternal parent while the E genome species (O. australiensis) was the paternal donor during the formation of the CD genome. In addition, the consistent phylogenetic relationships among the CCDD species indicated that significant divergence existed between O. latifolia and the other two (O. alta and O. grandiglumis), which corroborated the suggestion of treating the latter two as a single species or as taxa within species. We thank Tao Sang of Michigan State University (East Lansing, USA) and Bao-rong Lu of Fudan University (Shanghai, China) for their encouragement and assistance. We are also grateful to the International Rice Research Institute (Manila, Philippines) for providing plant material for this study. This research was supported by the Chinese Academy of Sciences (kscxz-sw-101A), the National Natural Science Foundation of China (30025005) and the Program for Key International S & T Cooperation Project of P. R. China (2001CB711103).

7.

The impact of whole genome sequence data on drug discovery--a malaria case study.

M P Joachimiak C Chang P J Rosenthal F E Cohen 《Molecular medicine (Cambridge, Mass.)》2001,7(10):698-710

BACKGROUND: Identification and validation of a drug discovery target is a prominent step in drug development. In the post-genomic era it is possible to reevaluate the association of a gene with a specific biological function to see if a homologous gene can subsume this role. This concept has special relevance to drug discovery in human infectious diseases, like malaria. A trophozoite cysteine protease (falcipain-1) from the papain family, thought to be responsible for the degradation of erythrocyte hemoglobin, has been considered a promising target for drug discovery efforts owing to the antimalarial activity of peptide-based covalent cysteine protease inhibitors. This led to the development of non-peptidic non-covalent inhibitors of falcipain-1 and their characterization as antimalarials. It is now clear from sequencing efforts that the malaria genome contains more than one cysteine protease and that falcipain-1 is not the most important contributor to hemoglobin degradation. Rather, falcipain-2 and falcipain-3 appear to account for the majority of cysteine hemoglobinase activity in the plasmodium trophozoite. MATERIALS AND METHODS: We have modeled the falcipain-2 cysteine protease from one of the major human malaria species, Plasmodium falciparum, and compared it to our original work on falcipain-1. As with falcipain-1, computational screening of the falcipain-2 active site was conducted using DOCK. Using structural superpositions within the protease family and evolutionary analysis of substrate specificity sites, we focused on the commonalities and the protein-specific features to direct our drug discovery effort. RESULTS: Since 1993, the size of the Available Chemicals Directory had increased from 55313 to 195419 unique chemical structures. For falcipain-2, eight inhibitors were identified with IC50 values against the enzyme between 1 and 7 µM.
Application of three of these inhibitors to infected erythrocytes cured malaria in culture, but parasite death did not correlate with food vacuole abnormalities associated with the activity of mechanistic inhibitors of cysteine proteases like the epoxide E64. CONCLUSIONS: Using plasmodial falcipain proteases, we show how a protein family perspective can influence target discovery and inhibitor design. We suspect that parallel drug discovery programs where a family of targets is considered, rather than serial programs built on a single therapeutic focus, will become the dominant industrial paradigm. Economies of scale in assay development and in compound synthesis are expected owing to the functional and structural features of individual family members. One of the remaining challenges in post-genomic drug discovery is that inhibitors of one target are likely to show some activity against other family members. This lack of specificity may lead to difficulties in functional assignments and target validation as well as a complex side effect profile.

8.

Using matK sequence data to unravel the phylogeny of Casuarinaceae

Steane DA Wilson KL Hill RS 《Molecular phylogenetics and evolution》2003,28(1):47-59

Casuarinaceae are a Gondwanic family with a unique combination of morphological characters not comparable to any other family. Until recently, the 96 species in the family were classified in a single genus, Casuarina s.l. A recent morphological revision of the family resulted in the splitting of Casuarina s.l. into four genera: Allocasuarina, Casuarina s.s., Ceuthostoma, and Gymnostoma. This study uses matK sequence data from 76 species of Casuarinaceae and eight outgroup taxa to examine the phylogenetic structure within the Casuarinaceae. The study demonstrates the monophyly of the four genera and examines the relationships within the family; it tests the validity of the infra-generic subdivision of Allocasuarina; it discovers geography-based infra-generic subdivisions within Gymnostoma and Casuarina; and, finally, it provides a molecular framework on which to trace the evolution of xeromorphy in the Casuarinaceae.

9.

BSMAP: whole genome bisulfite sequence MAPping program

Yuanxin Xi Wei Li 《BMC bioinformatics》2009,10(1):232-9

Background

Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation.
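
The conversion and the asymmetric matching rule described above can be sketched in a few lines. This is an illustration only, not BSMAP's algorithm: it assumes same-strand, gap-free comparison of a read against an equal-length reference window, and the function names (bisulfite_convert, bs_match) are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: frozenset[int] = frozenset()) -> str:
    """Simulate bisulfite treatment plus PCR: unmethylated C becomes T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def bs_match(read: str, ref: str) -> bool:
    """Asymmetric comparison: a T in the read may align to a C in the reference, never the reverse."""
    return len(read) == len(ref) and all(
        r == g or (r == "T" and g == "C") for r, g in zip(read, ref)
    )


reference = "ACGTC"
read = bisulfite_convert(reference, methylated=frozenset({1}))  # only the C at index 1 is methylated
print(read)                        # ACGTT
print(bs_match(read, reference))   # True: read T over reference C is allowed
print(bs_match(reference, read))   # False: read C over reference T is a mismatch
```

Because a read T is compatible with both T and C in the reference, each read has more candidate positions, which is one way to see the enlarged search space mentioned in the abstract.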

10.

Use of whole genome amplification to rescue DNA from plasma samples

Lu Y Gioia-Patricola L Gomez JV Plummer M Franceschi S Kato I Canzian F 《BioTechniques》2005,39(4):511-515

While DNA of good quality and sufficient amount can be obtained easily from whole blood, buccal swabs, surgical specimens, or cell lines, these DNA-rich sources are not always available. This is particularly the case in studies for which biological specimens were collected when genotyping assays were not widely available. In those studies, serum or plasma is often the only source of DNA. Newly developed whole genome amplification (WGA) methods, based on phi29 polymerase, may play a significant role in recovering DNA in such instances. We tested a total of 528 plasma samples kept in storage at -40 °C for approximately 10 years for 8 single nucleotide polymorphisms (SNPs) using the 5' exonuclease (TaqMan) assay. These specimens yielded undetectable levels of DNA following extraction with an affinity column but produced an average 52.7 µg (standard deviation of 31.2 µg) of DNA when column-extracted DNA was used as a template for WGA. This increased the genotyping success rate from 54% to 93%. There were only 3 disagreements out of 364 paired genotyping results for pre- and post-WGA DNAs, indicating an error rate of 0.82%. These results are encouraging for expanding the use of poor DNA resources in genotyping studies.
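
The reported error rate follows directly from the counts given in the abstract (3 disagreements in 364 paired results). A small check, with a hypothetical helper name:

```python
def discordance_rate(discordant: int, pairs: int) -> float:
    """Fraction of paired genotype calls that disagree before and after amplification."""
    if pairs <= 0:
        raise ValueError("pairs must be positive")
    return discordant / pairs


rate = discordance_rate(3, 364)  # counts taken from the abstract
print(f"{rate:.2%}")  # 0.82%
```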

11.

Correction: Inference of past demography, dormancy and self-fertilization rates from whole genome sequence data

Thibaut Paul Patrick Sellinger Diala Abu Awad Markus Moest Aurélien Tellier 《PLoS genetics》2021,17(4)


12.

Use of amino acid sequence data in phylogeny and evaluation of methods using computer simulation.

D Peacock D Boulter 《Journal of molecular biology》1975,95(4):513-527


13.

Molecular phylogeny of Diploschistes inferred from ITS sequence data

《Lichenologist (London, England)》2003,35(1):27-32


14.

Calculating bootstrap probabilities of phylogeny using multilocus sequence data

Seo TK 《Molecular biology and evolution》2008,25(5):960-971

Phylogeny estimation is crucial in the study of molecular evolution. The increase in the amount of available genomic data facilitates phylogeny estimation from multilocus sequence data. Although maximum likelihood and Bayesian methods are available for phylogeny reconstruction using multilocus sequence data, these methods require heavy computation, and their application is limited to the analysis of a moderate number of genes and taxa. Distance matrix methods present suitable alternatives for analyzing huge amounts of sequence data. However, the manner in which distance methods can be applied to multilocus sequence data remains unknown. Here, we suggest new procedures to estimate molecular phylogeny using multilocus sequence data and evaluate its significance in the framework of the distance method. We found that concatenation of the multilocus sequence data may result in incorrect phylogeny estimation with an extremely high bootstrap probability (BP), which is due to incorrect estimation of the distances and deliberate disregard of intergene variation. Therefore, we suggest that the distance matrices for multilocus sequence data be estimated separately and that these matrices be subsequently combined to reconstruct the phylogeny, instead of reconstructing the phylogeny from concatenated sequence data. To calculate the BPs of the reconstructed phylogeny, we suggest that 2-stage bootstrap procedures be adopted: genes are resampled first, followed by resampling of the sequence columns within the resampled genes. By resampling the genes during calculation of BPs, intergene variations are properly considered. Via simulation studies and empirical data analysis, we demonstrate that our 2-stage bootstrap procedures are more suitable than the conventional bootstrap procedure that is adopted after sequence concatenation.
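
The 2-stage resampling described above can be sketched as follows. This shows only the resampling step, under the assumption that each gene is an alignment stored as a list of equal-length strings (one per taxon); distance estimation and tree building are left out, and the function name and toy alignments are hypothetical.

```python
import random


def two_stage_bootstrap(genes: list[list[str]], rng: random.Random) -> list[list[str]]:
    """One bootstrap replicate: resample genes, then resample columns within each chosen gene."""
    replicate = []
    for _ in range(len(genes)):
        gene = rng.choice(genes)  # stage 1: draw a gene with replacement
        n_cols = len(gene[0])
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]  # stage 2: draw columns with replacement
        replicate.append(["".join(row[c] for c in cols) for row in gene])
    return replicate


# Hypothetical toy data: two genes, three taxa each, alignment lengths 4 and 6.
genes = [
    ["ACGT", "ACGA", "TCGA"],
    ["GGCATT", "GGCATA", "GACATA"],
]
replicate = two_stage_bootstrap(genes, random.Random(1))
for gene in replicate:
    print(gene)
```

Repeating this many times, and estimating a distance matrix per resampled gene before combining them, would give the bootstrap probabilities; the conventional procedure would instead resample columns of the concatenated alignment only.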

15.

Replication of the baculovirus genome

Mikhaĭlov VS 《Molekuliarnaia biologiia》2003,37(2):288-299

The review describes the current state of research on baculovirus DNA replication. The structural organization of replication initiation sites and of replication intermediates is considered. Attention is focused on virus replication factors, including DNA polymerase, helicase, IE-1, LEF-1, LEF-2, and LEF-3.

16.

A contribution to sedentary polychaete phylogeny using 18S rRNA sequence data  Total citations: 6; self-citations: 0; citations by others: 6

C. Bleidorn L. Vogt T. Bartolomaeus 《Journal of Zoological Systematics and Evolutionary Research》2003,41(3):186-195

The phylogenetic position of Annelida as well as its ingroup relationships are a matter of ongoing debate. A molecular phylogenetic study of sedentary polychaete relationships was conducted based on 70 sequences of 18S rRNA, including unpublished sequences of 18 polychaete species. The data set was analysed with maximum parsimony and maximum likelihood methods. Clade robustness was estimated by parsimony-bootstrapping and jackknifing, decay index, and clade support, as well as a posteriori probability tests using Bayesian inference. Irrespective of the applied method, some traditional sedentary polychaete taxa, such as Cirratulidae, Opheliidae, Orbiniidae, Siboglinidae and Spionidae, were recovered by our phylogenetic reconstruction. A close relationship between Orbiniidae and Questa received particularly strong support. Echiura appears to be a polychaete ingroup taxon which is closely related to Dasybranchus (Capitellidae). As in previous molecular analyses, no support was found for the monophyly of Annelida or for that of Polychaeta. However, we suggest that an increase in taxon sampling may yield additional resolution in the reconstruction of polychaete ingroup phylogeny, although the difficulties in reconstructing the basal phylogenetic relationships within Annelida may be due to their rapid radiation.

17.

rpoB gene as a novel molecular marker to infer phylogeny in Planctomycetales

Joana Bondoso Jens Harder Olga Maria Lage 《Antonie van Leeuwenhoek》2013,104(4):477-488

The 16S rRNA gene has been used in recent decades as a gold standard for determining the phylogenetic position of bacteria and their taxonomy. It is a well-conserved gene, with some variations, present in all bacteria, and it allows the reconstruction of genealogies of microorganisms. Nevertheless, this gene has its limitations when inferring phylogenetic relationships between closely related isolates. To overcome this problem, DNA–DNA hybridization appeared as a solution to clarify interspecies relationships when the sequence similarity of the 16S rRNA gene is above 97%. However, this technique is time-consuming, expensive and laborious, so researchers developed other molecular markers, such as the sequencing of housekeeping or functional genes, for accurate determination of bacterial phylogeny. One such gene that has been used successfully, particularly in clinical microbiology, codes for the beta subunit of the RNA polymerase (rpoB). The rpoB gene is sufficiently conserved to be used as a molecular clock, is present in all bacteria and is a single-copy gene. In this study, rpoB gene sequencing was applied to the phylum Planctomycetes. Based on the genomes of 19 planctomycetes it was possible to determine the correlation between the rpoB gene sequence and the phylogenetic position of the organisms at a 95–96% sequence similarity threshold for a novel species. A 1200-bp fragment of the rpoB gene was amplified from several new planctomycetal isolates and their intra- and inter-species relationships to other members of this group were determined based on a 96.3% species border and 98.2% for intraspecies resolution.
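
A similarity threshold of this kind is applied to pairwise sequence identity. The sketch below assumes the two rpoB fragments are already aligned to equal length, counts a gap against a base as a mismatch, and treats identity at or above the 96.3% species border as "same species". That reading of the threshold, the function names and the toy sequences are assumptions, not the authors' procedure.

```python
SPECIES_BORDER = 96.3  # percent identity, value quoted in the abstract


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in pairs)  # a gap opposite a base counts as a mismatch
    return 100.0 * matches / len(pairs)


def same_species(a: str, b: str) -> bool:
    """Assumed rule: identity at or above the species border means the same species."""
    return percent_identity(a, b) >= SPECIES_BORDER


# Hypothetical toy alignment with one mismatch in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(same_species("ACGTACGTAC", "ACGTACGTAA"))      # False
```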

18.

Reconciling gene and genome duplication events: using multiple nuclear gene families to infer the phylogeny of the aquatic plant family Pontederiaceae

Ness RW Graham SW Barrett SC 《Molecular biology and evolution》2011,28(11):3009-3018

Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction, can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

19.

Using genome scans of DNA polymorphism to infer adaptive population divergence  Total citations: 21; self-citations: 0; citations by others: 21

Storz JF 《Molecular ecology》2005,14(3):671-688

Elucidating the genetic basis of adaptive population divergence is a goal of central importance in evolutionary biology. In principle, it should be possible to identify chromosomal regions involved in adaptive divergence by screening genome-wide patterns of DNA polymorphism to detect the locus-specific signature of positive directional selection. In the case of spatially separated populations that inhabit different environments or sympatric populations that exploit different ecological niches, it is possible to identify loci that underlie divergently selected traits by comparing relative levels of differentiation among large numbers of unlinked markers. In this review I first address the question of whether diversifying selection on polygenic traits can be expected to produce predictable patterns of allelic variation at the underlying quantitative trait loci (QTL), and whether the locus-specific effects of selection can be reliably detected against the genome-wide backdrop of stochastic variability. I then review different approaches that have been developed to identify loci involved in adaptive population divergence and I discuss the relative merits of model-based approaches that rely on assumptions about population structure vs. model-free approaches that are based on empirical distributions of summary statistics. Finally, I consider the evolutionary and functional insights that might be gained by conducting genome scans for loci involved in adaptive population divergence.

20.

V genes in primates from whole genome sequencing data

D. N. Olivieri F. Gambón-Deza 《Immunogenetics》2015,67(4):211-228
