Similar articles
1.
Background

Short-read resequencing of genomes produces abundant information on the genetic variation of individuals. Because these variants are so numerous, they are rarely exhaustively validated. Furthermore, low levels of undetected variant miscalling will have a systematic and disproportionate impact on the interpretation of individual genome sequence information, especially should these miscalls also be carried through into reference databases of genomic variation.

Results

We find that sequence variation from short-read sequence data is subject to recurrent-yet-intermittent miscalling that occurs in a sequence-intrinsic manner and is very sensitive to sequence read length. The miscalls arise from difficulties aligning short reads to redundant genomic regions, where the rate of sequencing error approaches the sequence diversity between redundant regions. We find the resultant miscalled variants to be sensitive to small sequence variations between genomes, and they are thereby often intrinsic to an individual, pedigree, strain or human ethnic group. In human exome sequences, we identify 2–300 recurrent false positive variants per individual, almost all of which are present in public databases of human genomic variation. From the exomes of non-reference strains of inbred mice, we identify 3–5000 recurrent false positive variants per mouse, a number that increases with greater distance between an individual mouse strain and the reference C57BL6 mouse genome. We show that recurrently miscalled variants may be reproduced for a given genome from repeated simulation rounds of read resampling, realignment and recalling. As such, it is possible to identify more than two-thirds of false positive variation from only ten rounds of simulation.

Conclusion

Identification and removal of recurrent false positive variants from specific individual variant sets will improve overall data quality. The variant miscalls that arise are highly sequence-intrinsic and are often specific to an individual, pedigree or ethnicity. Further, read length is a strong determinant of whether given false variants will be called for any given genome, which has profound significance for cohort studies that pool datasets collected and sequenced at different points in time.


2.
Hua  Kui  Zhang  Xuegong 《BMC genomics》2019,20(2):93-101
Background

Metagenomic sequencing is a powerful technology for studying mixtures of microbes, or microbiomes, on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, the limited availability of known reference genomes, and usually insufficient sequencing coverage.

Results

As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.

Conclusions

We posed the question of what the total genome length of all distinct species in a microbial community is, and introduced a method to answer it.
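
The k-mer bookkeeping that this abstract builds on can be illustrated with a minimal sketch. It assumes reads are plain strings, ignores reverse complements and sequencing errors, and does not reproduce the authors' estimator or their KRI; the function names `kmer_counts` and `coverage_histogram` are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer observed in a list of read strings.

    Simplification (an assumption of this sketch): k-mers are not
    canonicalized against their reverse complement.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(counts):
    """Map each coverage value c to the number of distinct k-mers seen c times."""
    return Counter(counts.values())


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "CGTACG", "TTTTTT"]
counts = kmer_counts(reads, k=3)
hist = coverage_histogram(counts)

n_distinct = len(counts)  # number of distinct observed k-mers
```

The coverage histogram is the kind of input a coverage-based estimator would start from: k-mers that are present in the community but were never sampled do not appear in `counts`, which is why the observed distinct count alone understates the true one.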


3.
Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive in detecting degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein-protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmic improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.

4.
Purpose

The aquaculture sector is a major contributor to the economic and nutritional security of a number of countries. India’s total seafood exports for the year 2017–2018 amounted to US$7,082 million. One of the major setbacks in this sector is the frequent outbreak of diseases, often due to bacterial pathogens. Vibriosis is one of the major diseases caused by bacteria of Vibrio spp., causing significant economic loss to the aquaculture sector. The objective of this study was to understand the genetic composition of Vibrio spp.

Methods

Thirty-five complete genomes were downloaded from GenBank comprising seven vibrio species, namely, Vibrio alginolyticus, V. anguillarum, V. campbellii, V. harveyi, V. furnissii, V. parahaemolyticus, and V. vulnificus. Pan-genome analysis was carried out with coding sequences (CDS) generated from all the Vibrio genomes. In addition, genomes were mined for genes coding for toxin-antitoxin systems, antibiotic resistance, genomic islands, and virulence factors.

Results

Results revealed an open pan-genome comprising 2004 core, 8249 accessory, and 6780 unique genes. Downstream analysis of genomes and the identified unique genes resulted in 312 antibiotic resistance genes and 430 genes coding for toxin and antitoxin systems, along with 4802 and 4825 putative virulence genes from genomic island regions and unique gene sets, respectively.

Conclusion

Pan-genome and other downstream analytical procedures followed in this study have the potential to predict strain-specific genes and their association with habitat and pathogenicity.
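
The core/accessory/unique split reported above can be sketched as a simple presence count. This is a minimal illustration under stated assumptions, not the pipeline used in the study: genes are assumed to be already clustered into families, "core" is taken to mean present in every genome, and `classify_pangenome` and the toy family names are hypothetical.

```python
def classify_pangenome(family_to_genomes, n_genomes):
    """Split gene families into core, accessory and unique sets.

    family_to_genomes maps a gene-family id to the set of genome ids
    in which that family was found.
    """
    core, accessory, unique = set(), set(), set()
    for family, genomes in family_to_genomes.items():
        if len(genomes) == n_genomes:
            core.add(family)        # present in every genome
        elif len(genomes) == 1:
            unique.add(family)      # present in exactly one genome
        else:
            accessory.add(family)   # present in some but not all genomes
    return core, accessory, unique


# Toy presence data for three genomes, invented for illustration only.
presence = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2"},
    "famC": {"g3"},
    "famD": {"g1", "g2", "g3"},
}
core, accessory, unique = classify_pangenome(presence, n_genomes=3)
```

Real pan-genome tools often relax the core definition (for example, presence in nearly all genomes) to tolerate assembly gaps; the strict threshold here is a design choice of the sketch.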


5.
Background: All human land use (LU) affects the distribution of plant species; however, the impacts vary with the type/intensity of LU. For managing ecosystems, it is therefore essential to understand the effects of LU types on the distribution of plant species on a macroscale.

Aims: The objectives of our study were to quantify the effects of various LU types on the distribution of vascular plant species in Japan and to determine in particular the extent to which LU was an important factor for the distribution of common species.

Methods: Based on a logistic regression model and variation partitioning being applied to each plant species, we evaluated the partial deviance by six LU types, four climatic types and three topographic and geological factors for 647 plant species at 14,412 sites in Japan.

Results: The effect of LU was significant for species present at multiple sites. Of the six LU types, secondary vegetation and plantation were the most important factors determining species distribution for many species.

Conclusions: Our results suggest that the distribution of common species is largely affected by LU on a macroscale. The design of LU relating to secondary vegetation and plantations will thus be important in determining changes in the vegetation composition within Japan.

6.
Berrios  Louis  Ely  Bert 《Plant and Soil》2020,449(1-2):81-95
Aims

Species within the Caulobacter genus have been termed ‘hub species’ in the plant microbiome. To understand these interactions, we assessed the interactions between several Caulobacter strains and a common host plant.

Methods

We identified a set of 11 Caulobacter strains that range in genetic diversity and tested them for their ability to increase the growth of Arabidopsis thaliana. In addition, biochemical assays were employed to determine if these Caulobacter strains produce common plant growth promoting (PGP) biosynthates. To identify potential PGP-related genes, genomic analyses were performed to compare the genomes of PGP Caulobacter strains to those of non-PGP Caulobacter strains.

Results

For the PGP Caulobacter strains, we observed that common PGP biosynthates did not contribute to the observed Caulobacter-mediated plant growth stimulation. Genomic analyses suggested that the genomes of PGP strains maintain similar metabolic pathways compared to those of non-PGP strains, and that common genes related to PGP factors do not explain the PGP mechanisms for the Caulobacter strains we analyzed.

Conclusions

Plant growth enhancement is not a conserved feature in the Caulobacter genus, and some Caulobacter strains even inhibit plant growth. Moreover, common PGP factors do not fully explain Caulobacter-mediated plant growth enhancement.


7.

Background  

In eukaryotic genomes, most genes are members of gene families. When comparing genes from two species, therefore, most genes in one species will be homologous to multiple genes in the second. This often makes it difficult to distinguish orthologs (separated through speciation) from paralogs (separated by other types of gene duplication). Combining phylogenetic relationships and genomic position in both genomes helps to distinguish between these scenarios. This kind of comparison can also help to describe how gene families have evolved within a single genome that has undergone polyploidy or other large-scale duplications, as in the case of Arabidopsis thaliana – and probably most plant genomes.

8.
9.
Despite a long history of investigation, many bacteria associated with the human oral cavity have yet to be cultured. Studies that correlate the presence or abundance of uncultured species with oral health or disease highlight the importance of these community members. Thus, we sequenced several single-cell genomic amplicons from Desulfobulbus and Desulfovibrio (class Deltaproteobacteria) to better understand their function within the human oral community and their association with periodontitis, as well as other systemic diseases. Genomic data from oral Desulfobulbus and Desulfovibrio species were compared to other available deltaproteobacterial genomes, including from a subset of host-associated species. While both groups share a large number of genes with other environmental Deltaproteobacteria genomes, they encode a wide array of unique genes that appear to function in survival in a host environment. Many of these genes are similar to virulence and host adaptation factors of known human pathogens, suggesting that the oral Deltaproteobacteria have the potential to play a role in the etiology of periodontal disease.

10.
Abstract

Allelopathy is defined as the suppression of any aspect of growth and/or development of one plant by another through the release of chemical compounds. Although allelopathic interference has been demonstrated many times using in vitro experiments, few studies have clearly demonstrated allelopathy in natural settings. This difficulty reflects the complexity in examining and demonstrating allelopathic interactions under field conditions. In this paper we address a number of issues related to the complexity of allelopathic interference in higher plants. These are: (i) is a demonstrated pattern or zone of inhibition important in documenting allelopathy? (ii) is it ecologically relevant to explain the allelopathic potential of a species based on a single bioactive chemical? (iii) what is the significance of the various modes of allelochemical release from the plant into the environment? (iv) do soil characteristics clearly influence allelopathic activity? (v) is it necessary to exclude other plant interference mechanisms? and (vi) how can new achievements in allelopathy research aid in solving problems related to relevant ecological issues encountered in research conducted upon natural systems and agroecosystems? A greater knowledge of plant interactions in ecologically relevant environments, as well as the study of biochemical pathways, will enhance our understanding of the role of allelopathy in agricultural and natural settings. In addition, novel findings related to the relevant enzymes and genes involved in production of putative allelochemicals, allelochemical persistence in the rhizosphere, the molecular target sites of allelochemicals in sensitive plant species and the influence of allelochemicals upon other organisms will likely lead to enhanced utilization of natural products for pest management or as pharmaceuticals and nutraceuticals. This review will address these recent findings, as well as the major challenges which continue to influence the outcomes of allelopathy research.

11.
Next generation sequencing is quickly emerging as the go-to tool for plant virologists when sequencing whole virus genomes and undertaking plant metagenomic studies for new virus discoveries. This study aims to compare the genomic and biological properties of Bean yellow mosaic virus (BYMV) (genus Potyvirus) isolates from Lupinus angustifolius plants with black pod syndrome (BPS), systemic necrosis or non-necrotic symptoms, and from two other plant species. When one Clover yellow vein virus (ClYVV) (genus Potyvirus) and 22 BYMV isolates were sequenced on the Illumina HiSeq2000, one new ClYVV and 23 new BYMV sequences were obtained. When the 23 new BYMV genomes were compared with 17 other BYMV genomes available on GenBank, phylogenetic analysis provided strong support for the existence of nine phylogenetic groupings. Biological studies involving seven isolates of BYMV and one of ClYVV gave no symptoms or reactions that could be used to distinguish BYMV isolates from L. angustifolius plants with black pod syndrome from other isolates. Here, we propose that the current system of nomenclature based on biological properties be replaced by numbered groups (I–IX). This is because use of whole genomes revealed that the previous phylogenetic grouping system based on partial sequences of virus genomes and original isolation hosts was unsustainable. This study also demonstrated that, where next generation sequencing is used to obtain complete plant virus genomes, consideration needs to be given to issues regarding sample preparation, adequate levels of coverage across a genome and methods of assembly. It also provided important lessons that will be helpful to other plant virologists using next generation sequencing in the future.

12.
13.
Background: The South Aegean Volcanic Arc (SAVA), one of the most notable geological structures of the Mediterranean Sea, is floristically well known. Nevertheless, the factors that contribute to shaping the plant species richness of the SAVA remain unclear.

Aims: To investigate the factors that affect plant species richness and identify plant diversity hotspots in the SAVA and other central Aegean islands.

Methods: We used stepwise multiple regression to test the relationship between a number of environmental factors and plant species richness in the SAVA, as well as the residuals from the species–area linear regressions of native, Greek and Cycladian endemic taxa as indicators of relative species richness.

Results: Area was confirmed to be the most powerful single explanatory variable of island species richness, while geodiversity, maximum elevation and mean annual precipitation explained a large proportion of variance for almost all the species richness measures. Anafi, Amorgos and Folegandros were found to be endemic plant diversity hotspots.

Conclusions: We have demonstrated that geodiversity is an important factor in shaping plant species diversity in the Cyclades, while mean annual precipitation, human population density and maximum elevation were significant predictors of the Greek endemics present in the Cyclades. Finally, Anafi was found to be a plant diversity hotspot in the South Aegean Sea.
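
The use of species–area regression residuals as a measure of relative richness can be sketched as follows. This is a minimal illustration, assuming the common log-log form log10(S) = c + z·log10(A); the abstract does not state which transformation was used, and the function name `species_area_residuals` and the toy numbers are hypothetical.

```python
import math


def species_area_residuals(areas, richness):
    """Fit log10(S) = c + z * log10(A) by ordinary least squares.

    Returns the slope z, the intercept c, and one residual per island.
    A positive residual marks an island richer than its area predicts.
    """
    x = [math.log10(a) for a in areas]
    y = [math.log10(s) for s in richness]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    z = sxy / sxx
    c = mean_y - z * mean_x
    residuals = [yi - (c + z * xi) for xi, yi in zip(x, y)]
    return z, c, residuals


# Toy islands following an exact power law S = 10 * A**0.25 (invented data).
areas = [1.0, 10.0, 100.0, 1000.0]
richness = [10.0 * a ** 0.25 for a in areas]
z, c, residuals = species_area_residuals(areas, richness)
```

With real data, islands would be ranked by residual, and those with the largest positive values are the candidates for relative richness hotspots.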

14.
The use of antimicrobials in human and veterinary medicine has coincided with a rise in antimicrobial resistance (AMR) in the food-borne pathogens Campylobacter jejuni and Campylobacter coli. Faecal contamination from the main reservoir hosts (livestock, especially poultry) is the principal route of human infection, but little is known about the spread of AMR among source and sink populations. In particular, questions remain about how Campylobacter resistomes interact between species and hosts, and the potential role of sewage as a conduit for the spread of AMR. Here, we investigate the genomic variation associated with AMR in 168 C. jejuni and 92 C. coli strains isolated from humans, livestock and urban effluents in Spain. AMR was tested in vitro and isolate genomes were sequenced and screened for putative AMR genes and alleles. Genes associated with resistance to multiple drug classes were observed in both species and were commonly present in multidrug-resistant genomic islands (GIs), often located on plasmids or mobile elements. In many cases, these loci had alleles that were shared among C. jejuni and C. coli, consistent with horizontal transfer. Our results suggest that specific antibiotic resistance genes have spread among Campylobacter isolated from humans, animals and the environment.

15.
16.
Gao  Xiaoyang  Zhang  Xuan  Meng  Honghu  Li  Jing  Zhang  Di  Liu  Changning 《BMC genomics》2018,19(10):133-144
Background

Species of Paris Sect. Marmorata are valuable medicinal plants that synthesize steroidal saponins with effective pharmacological activity. However, the wild resources of these species are threatened by predatory exploitation before molecular genetic studies have uncovered their genomes and evolutionary significance. Thus, the availability of complete chloroplast genome sequences of Sect. Marmorata is necessary and crucial to understanding the plastome evolution of this section and facilitating future population genetics studies. Here, we determined the chloroplast genomes of Sect. Marmorata and conducted a whole chloroplast genome comparison.

Results

This study presented detailed sequences and structural variations of the chloroplast genomes of Sect. Marmorata. Over 40 large repeats and approximately 130 simple sequence repeats, as well as a group of genomic hotspots, were detected. Inverted repeat contraction in this section was inferred by comparing the chloroplast genomes with that of P. verticillata. Additionally, almost all the plastid protein coding genes were found to prefer ending with A/U. Mutation bias and selection pressure predominantly shaped the codon bias of most genes. Most of the genes underwent purifying selection, whereas photosynthetic genes experienced a relatively relaxed purifying selection.

Conclusions

Repeat sequences and hotspot regions can be scanned to detect intraspecific and interspecific variability, and selected to infer the phylogenetic relationships of Sect. Marmorata and other species in subgenus Daiswa. Mutation and natural selection were the main forces driving the codon bias pattern of most plastid protein coding genes. Therefore, this study enhances the understanding of the evolution of Sect. Marmorata from the chloroplast genome, and provides genomic insights for genetic analyses of Sect. Marmorata.


17.
Background

The number of species with completed genomes, including those with evidence for recent whole genome duplication events, has exploded. The recently sequenced Atlantic salmon genome has been through two rounds of whole genome duplication since the divergence of teleost fish from the lineage that led to amniotes. This quadrupling of the number of potential genes has led to complex patterns of retention and loss among gene families.

Results

Methods have been developed to characterize the interplay of duplicate gene retention processes across both whole genome duplication events and additional smaller scale duplication events. Further, gene expression divergence data have become available for Atlantic salmon and the closely related, pre-whole genome duplication pike, and methods to describe expression divergence are also presented. These methods for the characterization of duplicate gene retention and gene expression divergence, as applied to salmon, are described.

Conclusions

With the growth in available genomic and functional data, the opportunities to extract functional inference from large scale duplicates using comparative methods have expanded dramatically. Recently developed methods that further this inference for duplicated genes have been described.


18.
Since their endosymbiotic origin from α-Proteobacteria, mitochondrial genomes have undergone extremely divergent evolutionary trajectories among eukaryotic lineages. Compared with the relatively compact and conserved animal mitochondrial genomes, plant mitochondrial genomes have many unique features, especially their large and complex genomic arrangements. The sizes of fully sequenced plant mitochondrial genomes span over a 100-fold range, from 66 kb in Viscum scurruloideum to 11 000 kb in Silene conica. In addition to the typical circular structure, some species of plants also possess linear, and even multichromosomal, architectures. In contrast with the thousands of fully sequenced animal mitochondrial genomes and plant plastid genomes, only around 200 fully sequenced land plant mitochondrial genomes have been published, with many being only draft assemblies. In this review, we summarize some of the known novel characteristics found in plant mitochondrial genomes, with special emphasis on multichromosomal structures described in recent publications. Finally, we discuss the future prospects for studying the inheritance patterns of multichromosomal plant mitochondria and examining architectural variation at different levels of taxonomic organization, including at the population level.

19.
The study of nematode genomes over the last three decades has relied heavily on the model organism Caenorhabditis elegans, which remains the best-assembled and annotated metazoan genome. This is now changing as a rapidly expanding number of nematodes of medical and economic importance have been sequenced in recent years. The advent of sequencing technologies to achieve the equivalent of the $1000 human genome promises that every nematode genome of interest will eventually be sequenced at a reasonable cost. As the sequencing of species spanning the nematode phylum becomes a routine part of characterizing nematodes, the comparative approach and the increasing use of ecological context will help us to further understand the evolution and functional specializations of any given species by comparing its genome to that of other closely and more distantly related nematodes. We review the current state of nematode genomics and discuss some of the highlights that these genomes have revealed and the trend and benefits of ecological genomics, emphasizing the potential for new genomes and the exciting opportunities this provides for nematological studies.

20.
Fungi comprise a large monophyletic group of uni- and multicellular eukaryotic organisms in which many species are of economic or medical importance. Fungal genomes are variable in size (13–42 Mb), and multicellular species support true spatial and temporal cell-type-specific regulation of gene expression. In a 38.8-kb Aspergillus nidulans contiguous genomic DNA region, a transposable element and 12 potential genes were identified, 7 similar to genes in other organisms. This observation is consistent with the prediction that multicellular ascomycetous fungi harbor 8000–9000 genes in a 36-Mb average genome. Thus, the genomic DNA sequence of filamentous fungi will provide substantial amounts of genetic and functional information that is not available in yeast, for the human and other metazoan minimal gene complement.
