Similar Literature
20 similar records found.
1.
Gene content has been shown to contain a strong phylogenetic signal, yet its use for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss, and until now it has required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that approximately 92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. In summary, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
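A minimal sketch of the profiling step described above, assuming each signature gene family maps to exactly one taxon and every detected family counts once with equal weight. The identifiers `OG1`, `taxon_A`, etc. are hypothetical, and the authors' actual scoring across nested taxonomic depths may differ.

```python
from collections import Counter


def taxonomic_profile(sample_genes, signature_genes):
    """Estimate taxon proportions from signature-gene hits in a sample.

    sample_genes: iterable of gene-family identifiers detected in the sample.
    signature_genes: dict mapping a gene-family identifier to the single taxon
        for which it is a signature (hypothetical input format).
    """
    # Count each detected family once; families that are not signatures are ignored.
    hits = Counter(signature_genes[g] for g in set(sample_genes) if g in signature_genes)
    total = sum(hits.values())
    if total == 0:
        return {}  # no signature genes detected: composition cannot be inferred
    return {taxon: n / total for taxon, n in hits.items()}


# Hypothetical example: OG9 is not a signature gene and is ignored.
signatures = {"OG1": "taxon_A", "OG2": "taxon_A", "OG3": "taxon_B"}
profile = taxonomic_profile(["OG1", "OG3", "OG9", "OG2"], signatures)
print(profile)
```

Counting distinct families rather than reads keeps the sketch independent of sequencing depth; a read-level weighting would be an equally plausible variant.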

2.
The present work describes three novel nonpolar host peptide sequences that provide a ready assessment of the 3₁₀- and α-helix compatibilities of natural and unnatural amino acids at different positions of small- to medium-size peptides. The nonpolar peptides containing Ala, Aib, and a C-terminal p-iodoanilide group were designed so that they could be rapidly assembled in a modular fashion, were highly soluble in trifluoroethanol/H₂O mixtures for CD and two-dimensional (2D) NMR spectroscopic analyses, and showed excellent crystallinity suited for X-ray structure analysis. To validate our approach we synthesized 9-mer peptides 79a–96 (Table IV), 12-mer peptides 99–110c (Table V), and 10-mer peptides 120a–125d and 129–133 (Table VI and Scheme 8) incorporating a series of optically pure cyclic and open-chain (R)- and (S)-α,α-disubstituted glycines 1–10 (Figure 2). These amino acids are known to significantly modulate the conformations of small peptides. Several interesting conformational observations were made based on X-ray structures of 9-mers 79a, 80, and 87 (Figures 4–7), 10-mers 124c, 131, and 132 (Figures 9–12), and 12-mer peptide 102b (Figure 13); CD spectra of all peptides recorded in acidic, neutral, and basic media; and detailed 2D-NMR analyses of 9-mer peptide 86 and 12-mer 102b. Especially interesting results were obtained using the convex constraint CD analysis proposed by Fasman on 9-mer peptides 79a–d, 80, 81, 86, and 87, which allowed us to determine the relative content of 3₁₀- and α-helical conformations. These results were fully supported by the corresponding X-ray and 2D-NMR analyses. As a striking example, we found that the (S)- and (R)-β-tetralin-derived amino acids (R)- and (S)-1 show excellent α-helix stabilisation, more pronounced than that of Aib and Ala.
These novel reference peptide sequences should help establish a scale for natural and unnatural amino acids concerning their intrinsic 3₁₀- and α-helix compatibilities at different positions of medium-sized peptides, and thus improve our understanding of peptide folding processes. © 1997 John Wiley & Sons, Inc. Biopoly 42: 575–626, 1997

3.
We studied whether the peptides of nine amino acids (9-mers) that are typically used in MHC class I presentation are sufficiently unique for self:nonself discrimination. The human proteome contains 28,783 proteins, comprising 10⁷ distinct 9-mers. Enumerating distinct 9-mers for a variety of microorganisms, we found that the average overlap, i.e., the probability that a foreign peptide also occurs in the human self, is about 0.2%. This self:nonself overlap increased when shorter peptides were used; e.g., it was 30% for 6-mers and 3% for 7-mers. Predicting all 9-mers that are expected to be cleaved by the immunoproteasome and translocated by TAP, we find that about 25% of the self and the nonself 9-mers are processed successfully. For the HLA-A*0201 and HLA-A*0204 alleles, we predicted which of the processed 9-mers from each proteome are expected to be presented on the MHC. Both alleles prefer to present processed 9-mers over nonprocessed 9-mers, and both have a small preference for presenting foreign peptides. Because a number of amino acids from each 9-mer bind the MHC and are therefore not exposed to the TCR, antigen presentation might be expected to involve a significant loss of information. Our results show that this is not the case, because the HLA molecules are fairly specific: removing the two anchor residues from each presented peptide, we find that the self:nonself overlap of these exposed 7-mers resembles that of 9-mers. In summary, the 9-mers used in MHC class I presentation tend to carry sufficient information to detect nonself peptides amongst self peptides.
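The self:nonself overlap defined above (the fraction of distinct foreign 9-mers that also occur in the self proteome) can be sketched as follows. The toy sequences are hypothetical, proteasomal cleavage, TAP transport and MHC binding prediction are not modelled, and the anchor positions used for the "exposed 7-mer" helper (P2 and P9 of a 9-mer) are an assumption for illustration.

```python
def kmers(seq, k):
    """Set of distinct k-mers in one protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def proteome_kmers(proteins, k):
    """Set of distinct k-mers across a whole proteome."""
    out = set()
    for p in proteins:
        out |= kmers(p, k)
    return out


def self_nonself_overlap(self_proteins, foreign_proteins, k=9):
    """Fraction of distinct foreign k-mers that also occur in the self proteome."""
    self_set = proteome_kmers(self_proteins, k)
    foreign_set = proteome_kmers(foreign_proteins, k)
    if not foreign_set:
        return 0.0
    return len(foreign_set & self_set) / len(foreign_set)


def exposed(peptide, anchors=(1, 8)):
    """Drop assumed anchor positions (0-based; here P2 and P9 of a 9-mer)."""
    return "".join(c for i, c in enumerate(peptide) if i not in anchors)


overlap = self_nonself_overlap(["ACDEFGHIKLMN"], ["ACDEFGHIKWWW"], k=9)
print(overlap, exposed("ACDEFGHIK"))
```

Using sets of distinct k-mers matches the abstract's wording ("distinct 9-mers"); weighting by occurrence counts would give a different, frequency-weighted overlap.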

4.
Based on the well-known k-mer model, we propose a k-mer natural vector model for representing a genetic sequence based on the numbers and distributions of k-mers in the sequence. We show that there exists a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be used easily and quickly to perform phylogenetic analysis of genetic sequences without requiring evolutionary models or human intervention. Whole or partial genomes can be handled more effectively with our proposed method. We apply it to the phylogenetic analysis of genetic sequences, and the results demonstrate that the k-mer natural vector method is a powerful tool for analysing and annotating genetic sequences and for determining evolutionary relationships, in terms of both accuracy and efficiency.
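A sketch of a k-mer natural vector, assuming the usual natural-vector construction: for every possible k-mer, its count n, the mean μ of its 1-based start positions, and the normalized second central moment D = Σ(p − μ)² / (n·L), where L is the sequence length (μ and D are set to 0 for absent k-mers). These exact normalizations are our assumption, not quoted from the abstract; sequences are then compared by Euclidean distance between vectors.

```python
import math
from itertools import product


def kmer_natural_vector(seq, k, alphabet="ACGT"):
    """Concatenate counts, mean positions and normalized 2nd moments of all k-mers."""
    seq = seq.upper()
    n = len(seq)
    # 1-based start positions of every possible k-mer, in lexicographic order.
    positions = {"".join(p): [] for p in product(alphabet, repeat=k)}
    for i in range(n - k + 1):
        word = seq[i:i + k]
        if word in positions:  # k-mers containing ambiguous bases (e.g. N) are skipped
            positions[word].append(i + 1)
    counts, means, moments = [], [], []
    for pos in positions.values():
        c = len(pos)
        mu = sum(pos) / c if c else 0.0
        d = sum((p - mu) ** 2 for p in pos) / (c * n) if c else 0.0
        counts.append(c)
        means.append(mu)
        moments.append(d)
    return counts + means + moments


def natural_vector_distance(a, b, k):
    """Euclidean distance between the k-mer natural vectors of two sequences."""
    return math.dist(kmer_natural_vector(a, k), kmer_natural_vector(b, k))


print(kmer_natural_vector("ACGA", 1))
print(natural_vector_distance("ACGTACGT", "ACGTTCGA", 2))
```

A pairwise distance matrix built with `natural_vector_distance` could then be fed to any standard clustering or tree-building routine, which is what makes the approach alignment-free.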

5.
In human recurrent cutaneous herpes simplex, there is a sequential infiltrate of CD4 and then CD8 lymphocytes into lesions. CD4 lymphocytes are the major producers of the key cytokine IFN-gamma in lesions. When restimulated in vitro, they recognize mainly structural proteins, especially glycoproteins D and B (gD and gB). Recent human vaccine trials using recombinant gD showed partial protection of HSV-seronegative women against genital herpes disease and also, in placebo recipients, showed protection by prior HSV1 infection. In this study, we have defined immunodominant peptide epitopes recognized by 8 HSV1(+) and/or 16 HSV2(+) patients using ⁵¹Cr-release cytotoxicity and IFN-gamma ELISPOT assays. Using a set of 39 overlapping 20-mer peptides, more than six immunodominant epitopes were defined in gD2 (two to six peptide epitopes were recognized by each subject). Further fine mapping of these responses for 4 of the 20-mers, using a panel of 9 internal 12-mers for each 20-mer, combined with MHC II typing and direct in vitro binding assays of these peptides to individual DR molecules, showed more than one epitope per 20-mer and promiscuous binding of individual 20-mers and 12-mers to multiple DR types. All four 20-mer peptides were cross-recognized by both HSV1(+)/HSV2(-) and HSV1(-)/HSV2(+) subjects, but the sites of recognition within the 20-mers differed where their sequences were divergent. This work provides a basis for CD4 lymphocyte cross-recognition of gD2 and possibly for the cross-protection observed in previous clinical studies and in vaccine trials.
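Overlapping peptide panels like those above can be generated by tiling a sequence with a fixed window and step. This is a minimal sketch; the step used for the 39 gD2 20-mers is not stated in the abstract, so any step passed in is a hypothetical choice. With a step of 1, a 20-mer contains exactly 20 − 12 + 1 = 9 internal 12-mers, consistent with the panel of 9 described.

```python
def overlapping_peptides(protein, length, step):
    """Tile a protein with overlapping peptides of a fixed length.

    Returns (0-based start, peptide) pairs; an extra window is appended when
    the regular tiling would leave the C-terminus uncovered.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(protein) <= length:
        return [(0, protein)]
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] != len(protein) - length:
        starts.append(len(protein) - length)
    return [(s, protein[s:s + length]) for s in starts]


twenty_mer = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical 20-residue peptide
panel = overlapping_peptides(twenty_mer, 12, 1)
print(len(panel), panel[0], panel[-1])
```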

6.
The discontinuous interleukin-10 (IL-10)/interleukin-10 receptor (IL-10R) combining site was mapped using sets of overlapping peptides derived from both binding partners bound to continuous cellulose membranes. Low-affinity binding of single regions of the discontinuous contact sites on IL-10 and IL-10R could be identified owing to (1) high peptide density on the membrane support, (2) incubation with high protein concentrations, (3) indirect immunodetection of the ligates after electrotransfer onto polyvinylidene difluoride membranes, and (4) use of highly overlapping peptide scans of different lengths (6-mers and 15-mers). The single binding regions identified for each protein species are separated in the protein sequences but form continuous areas on the surface of IL-10 (X-ray structure) and IL-10R (computer model). Furthermore, four epitopes of neutralizing anti-IL-10 and anti-IL-10R antibodies were mapped and overlap with these binding regions. As expected, soluble peptides (15- to 19-mers) each spanning one of the three identified IL-10-derived receptor binding regions displayed no significant affinity for IL-10R, whereas a peptide (35-mer) comprising two of these regions had considerably higher binding activity. The data are consistent with a previously published computer model of the IL-10/IL-10R complex. This approach should be generally applicable to the mapping of non-linear protein-protein contact sites.

7.
Selectively deuterated transmembrane peptides comprising alternating leucine-alanine subunits were examined in fluid bilayer membranes by solid-state nuclear magnetic resonance (NMR) spectroscopy in an effort to gain insight into the behavior of membrane proteins. Two groups of peptides were studied: 21-mers having a 17-amino-acid hydrophobic domain calculated to be close in length to the hydrophobic thickness of 1-palmitoyl-2-oleoyl phosphatidylcholine, and 26-mers having a 22-amino-acid hydrophobic domain calculated to exceed the membrane hydrophobic thickness. ²H NMR spectral features similar to those observed for transmembrane peptides from single-span receptors of higher animal cells were identified, which apparently correspond to effectively monomeric peptide. Spectral observations suggested significant distortion of the transmembrane alpha-helix and/or potential restriction of rotation about the tilted helix long axis, even for simple peptides. Quadrupole splittings arising from the 26-mer were consistent with greater peptide "tilt" than those of the analogous 21-mer. Quadrupole splittings associated with monomeric peptide were relatively insensitive to concentration and temperature over the range studied, indicating stable average conformations and a well-ordered rotation axis. At high peptide concentration (6 mol% relative to phospholipid), the peptide predicted to be longer than the membrane thickness appeared to have a particular tendency toward reversible peptide-peptide interactions occurring on a timescale comparable with or faster than approximately 10⁻⁵ s. This interaction may be direct or lipid-mediated and was manifest as line broadening. Peptide rotational diffusion rates within the membrane, calculated from quadrupolar relaxation times T₂ₑ, were consistent with such interactions.
For the peptide predicted to match the membrane thickness, the spectral lineshape at low peptide concentration indicated the additional presence of a population of peptide whose rotational motion was restricted on a timescale of 10⁻⁵ s.

8.
Certain short peptides do not occur in humans and are rare or non-existent in the universal proteome. Antigens that contain rare amino acid sequences are in general highly immunogenic and may activate different arms of the immune system. We first generated a list of rare, semi-common, and common 5-mer peptides using bioinformatics tools to analyze the UniProtKB database. Experimental observations indicated that rare and semi-common 5-mers generated stronger cellular responses than commonly occurring sequences. We hypothesized that the biological process responsible for this enhanced immunogenicity could be used to positively modulate immune responses, with potential application to vaccine development. Initially, twelve rare 5-mers, 9-mers, and 13-mers were incorporated in frame at the end of an H5N1 hemagglutinin (HA) antigen and expressed from a DNA vaccine. The presence of some 5-mer peptides induced improved immune responses. Adding one 5-mer peptide exogenously also improved clinical outcome and/or survival against a lethal H5N1 or H1N1 influenza virus challenge in BALB/c mice and ferrets, respectively. Interestingly, in combination with a commercial Hepatitis B vaccine (Engerix-B, GSK), anti-HBsAg antibody production in BALB/c mice was enhanced by up to 25-fold. Mechanistically, NK cell activation and dependency were observed with enhancing peptides ex vivo and in NK-depleted mice. Overall, the data suggest that rare or non-existent oligopeptides can be developed as immunomodulators, and they support the further evaluation of some 5-mer peptides as potential vaccine adjuvants.
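Classifying 5-mers as rare, semi-common or common amounts to counting every 5-mer across a protein database and binning the counts. This sketch works on an in-memory list of sequences rather than UniProtKB, and the cutoffs `rare_max` and `common_min` are hypothetical placeholders, since the abstract does not give the thresholds used.

```python
from collections import Counter


def count_kmers(proteins, k=5):
    """Occurrence counts of every k-mer across a collection of protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def occurrence_class(peptide, counts, rare_max=1, common_min=10):
    """Bin a peptide by its database count (hypothetical thresholds)."""
    c = counts.get(peptide, 0)
    if c == 0:
        return "absent"  # non-existent in the analysed proteome
    if c <= rare_max:
        return "rare"
    if c < common_min:
        return "semi-common"
    return "common"


db = ["AAAAAAA", "AAAAAC"]  # toy stand-in for a proteome
counts = count_kmers(db)
print(occurrence_class("AAAAA", counts, 1, 3), occurrence_class("WWWWW", counts))
```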

9.
Phage displaying random cyclic 7-mer and linear 7-mer and 12-mer peptides at the N terminus of the coat protein pIII were panned with the murine monoclonal antibody 9-2-L379, which is specific for meningococcal lipo-oligosaccharide. Five cyclic peptides with two sequence motifs, six linear 7-mers, and five linear 12-mers with different sequence motifs were identified. Only phage displaying cyclic peptides were specifically captured by, and were antigenic for, 9-2-L379. Monoclonal antibody 9-2-L379 exhibited "apparent" binding affinities to the cyclic peptides between 11 and 184 nM, comparable with lipo-oligosaccharide. All cyclic peptides competed with the binding of 9-2-L379 to lipo-oligosaccharide with EC₅₀ values in the range 10–105 μM, which correlated with their apparent binding affinities. Structural modifications of the cyclic peptides eliminated their ability to bind and compete with monoclonal antibody 9-2-L379. Mice (C3H/HeN) immunized with the cyclic peptide with the optimal apparent binding affinity and EC₅₀ of competition produced cross-reactive antibodies to meningococcal lipo-oligosaccharide, with end-point dilution serum antibody titers of 3200. Cyclic peptides were converted to T-cell-dependent immunogens without disrupting these properties by C-terminal biotinylation and complexing with NeutrAvidin. The data indicate that constrained peptides can cross-react with a carbohydrate-specific antibody with greater specificity than linear peptides, and that their structural conformation is critical to this specificity.

10.
Choe J, Moyersoen J, Roach C, Carter TL, Fan E, Michels PA, Hol WG. Biochemistry, 2003, 42(37): 10915–10922
Glycosome biogenesis in trypanosomatids occurs via a process that is homologous to peroxisome biogenesis in other eukaryotes. Glycosomal matrix proteins are synthesized in the cytosol and imported posttranslationally. The import process involves a series of protein-protein interactions, starting with the recognition of glycosomal matrix proteins by a receptor in the cytosol. Most proteins to be imported contain so-called PTS-1 or PTS-2 targeting sequences, recognized by the receptor proteins PEX5 and PEX7, respectively. PEX14, a protein associated with the peroxisomal membrane, has been identified as a component of the docking complex and a point of convergence of the PEX5- and PEX7-dependent import pathways. In this paper, the strength of the interactions between Trypanosoma brucei PEX14 and PEX5 was studied by a fluorescence assay, using (i) a panel of N-terminal regions of TbPEX14 protein variants and (ii) a series of different peptides derived from TbPEX5, each containing one of the three WXXXF/Y motifs present in this receptor protein. On the PEX14 side, the N-terminal region of TbPEX14 comprising residues 1-84 appeared to be responsible for TbPEX5 binding. The results from PEX14 mutants identified specific residues in the N-terminal region of TbPEX14 involved in PEX5 binding and showed that the hydrophobic residues F35 and F52 in particular are critical. On the PEX5 side, 13-mer peptides incorporating the first or the third WXXXF/Y motif bind to PEX14 with an affinity in the nanomolar range. However, the second WXXXF/Y motif peptide did not show any detectable affinity. Studies using variants of the second and third motif peptides suggest that the alpha-helical content of the peptides, as well as the charge of a residue at position 9 in the motif, may be important for PEX14 binding.
Assays with 7-, 10-, 13-, and 16-mer third-motif peptides showed that 16-mers and 13-mers have comparable binding affinity for PEX14, whereas 10-mers and 7-mers have about 10- and 100-fold lower affinity than the 16-mers, respectively. The low sequence identity of PEX14 and PEX5 between the parasite and its human host, and the vital importance of proper glycosome biogenesis to the parasite, render these peroxins highly promising drug targets.
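Locating WXXXF/Y motifs in a receptor sequence is a simple pattern search: W, any three residues, then F or Y. The sketch below uses a regular expression with a lookahead so that overlapping occurrences are all reported. The test sequence is hypothetical and is not TbPEX5.

```python
import re

# W, any three residues, then F or Y; the lookahead lets overlapping hits be found.
WXXXFY = re.compile(r"(?=(W...[FY]))")


def find_wxxxfy(seq):
    """Return (1-based position of W, matched pentapeptide) for each WXXXF/Y motif."""
    return [(m.start() + 1, m.group(1)) for m in WXXXFY.finditer(seq.upper())]


print(find_wxxxfy("MAWAEQFSSWKKLY"))
```

Peptides such as the 13-mers studied here would then be cut from windows around each reported position; how the windows were framed is not given in the abstract.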

11.

Background  

Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering together reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of the environments of interest to many metagenomics projects.

12.
HLA-B*3501 is associated with subacute thyroiditis and fast progression of AIDS. An important prerequisite for investigating T-cell recognition of HLA-B*3501-restricted antigens is the characterization of peptide-HLA-B*3501 interactions. In this study, peptide-HLA-B*3501 interactions were determined in quantitative peptide binding assays. The results were statistically analyzed to evaluate the influence of both anchor and nonanchor positions and the predictability of peptide binding. The binding data demonstrated that all anchor residues at position 2 and the C-terminus found in 9-mers functioned equally as anchors in 10-mers and 11-mers. These minimum requirements for peptide binding were refined by assessing positive and negative effects of nonanchor residues. Aliphatic hydrophobic residues at positions 3, 5, and 8 of 10-mers and at position 3 of 11-mers significantly enhanced HLA-B*3501 binding. Similar effects were exerted in 11-mers by aromatic, bulky residues at position 1 and by acidic or polar residues at positions 4, 8, and 10. Negative effects were observed for residues carrying positively charged side chains at position 7 of 11-mers. The refined HLA-B*3501 peptide binding motifs enhanced the identification of potential T-cell epitopes. The disparity between positive effects in the middle and C-terminal parts (positions 5–8 and 10) of 11-mers and those of shorter peptides supports the extrusion of 11-mer residues at positions 5, 6, and 7 away from the HLA-B*3501 binding cleft. Received: 29 May 1996 / Revised: 5 August 1996

13.
The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics, which studies genetic material recovered directly from an environment. Characterization of the genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantified through homology searches of sequence reads against known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, which compromises accurate estimates of the relative abundance of the genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree, as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After the proportion of reads generated by each genome has been estimated, sequence reads are assigned to the candidate genomes and the taxonomy tree based on estimated probabilities that take into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets, and it assigns reads to low taxonomic ranks very accurately. Our statistical approach to taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
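The two-stage idea above (estimate genome proportions first, then assign each read using both its alignment evidence and the estimated abundance) can be sketched with a generic expectation-maximization (EM) mixture estimator. This is not TAMER's actual model: we assume each read comes with a positive likelihood per candidate genome (for example, a hypothetical transform such as exp(score / scale) of its alignment score), and the genome labels are made up.

```python
def estimate_abundance(read_hits, n_iter=500, tol=1e-10):
    """EM estimate of the fraction of reads generated by each candidate genome.

    read_hits: list of dicts {genome: likelihood > 0}, one dict per read.
    """
    read_hits = [h for h in read_hits if h]  # reads without hits carry no information
    if not read_hits:
        return {}
    genomes = sorted({g for hits in read_hits for g in hits})
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        totals = dict.fromkeys(genomes, 0.0)
        for hits in read_hits:
            z = sum(pi[g] * lik for g, lik in hits.items())
            for g, lik in hits.items():
                totals[g] += pi[g] * lik / z  # E-step: responsibility of g for this read
        new_pi = {g: totals[g] / len(read_hits) for g in genomes}  # M-step
        converged = max(abs(new_pi[g] - pi[g]) for g in genomes) < tol
        pi = new_pi
        if converged:
            break
    return pi


def assign_reads(read_hits, pi):
    """Assign each read to the genome with the highest posterior probability."""
    assignments = []
    for hits in read_hits:
        z = sum(pi[g] * lik for g, lik in hits.items())
        if z == 0:
            assignments.append((None, 0.0))
            continue
        post = {g: pi[g] * lik / z for g, lik in hits.items()}
        best = max(post, key=post.get)
        assignments.append((best, post[best]))
    return assignments


# Three reads unique to A, one unique to B, two equally good hits to both.
reads = [{"A": 1.0}] * 3 + [{"B": 1.0}] + [{"A": 1.0, "B": 1.0}] * 2
pi = estimate_abundance(reads)
print(pi, assign_reads(reads, pi))
```

In this toy case the ambiguous reads are shared in proportion to abundance, so the fixed point is π_A = 0.75 and π_B = 0.25, rather than the ambiguous reads being pushed up to a higher taxonomic rank.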

14.
The activation function 2/ligand-dependent interaction between nuclear receptors and their coregulators is mediated by a short consensus motif, the so-called nuclear receptor (NR) box. Nuclear receptors exhibit distinct preferences for such motifs depending both on the bound ligand and on the NR box sequence. To better understand the structural basis of motif recognition, we characterized the interaction between estrogen receptor alpha and the NR box regions of the p160 coactivator TIF2. We have determined the crystal structures of complexes between the ligand-binding domain of estrogen receptor alpha and 12-mer peptides from the Box B2 and Box B3 regions of TIF2. The Box B3 module displays an unexpected binding mode that is distinct from the canonical LXXLL interaction observed in other ligand-binding domain/NR box crystal structures: the peptide is shifted along the coactivator binding site in such a way that the interaction motif becomes LXXYL rather than the classical LXXLL. However, analysis of the binding properties of wild-type NR box peptides, as well as of mutant peptides designed to probe the Box B3 orientation, suggests that the Box B3 peptide primarily adopts the "classical" LXXLL orientation in solution. These results highlight the potential difficulties in interpreting protein-protein interactions based on co-crystal structures using short peptide motifs.

15.
Synthetic peptides were used in this study to identify a structural element of apolipoprotein (apo) A-I that stimulates cellular cholesterol efflux and stabilizes the ATP binding cassette transporter A1 (ABCA1). Peptides (22-mers) based on helices 1 (amino acids 44-65) and 10 (amino acids 220-241) of apoA-I had high lipid binding affinity but failed to mediate ABCA1-dependent cholesterol efflux and lacked the ability to stabilize ABCA1. The addition of helix 9 (amino acids 209-219) to either helix 1 (creating a 1/9 chimera) or helix 10 (the 9/10 peptide) conferred cholesterol efflux capability and ABCA1 stabilization activity similar to those of full-length apoA-I. Adding helix 9 to helix 1 or 10 had only a small effect on lipid binding affinity compared with the 22-mer peptides, indicating that helix length and/or determinants on the polar surface of the amphipathic alpha-helices are important for cholesterol efflux. Cholesterol efflux was specific for the structure created by the 1/9 and 9/10 helical combinations, as 33-mers composed of helices 1 and 3 (1/3), 2/9, and 4/9 failed to mediate cholesterol efflux in an ABCA1-dependent manner. Transposing helices 9 and 10 (10/9 peptide) did not change the class Y structure, hydrophobicity, or amphiphilicity of the helical combination, but it altered the topography of negatively charged amino acids on the polar surface, and the 10/9 peptide neither mediated ABCA1-dependent cholesterol efflux nor stabilized ABCA1 protein. These results suggest that a specific structural element possessing a linear array of acidic residues spanning two apoA-I amphipathic alpha-helices is required to mediate cholesterol efflux and stabilize ABCA1.

16.
High-throughput DNA methods hold great promise for the study of the taxonomically intractable mesofauna of the soil. Here, we assess species diversity and community structure in a phylogenetic framework by sequencing total DNA from bulk specimen samples and assembling mitochondrial genomes. The combination of mitochondrial metagenomics and DNA barcode sequencing of 1494 specimens in 69 soil samples from three geographic regions in southern Iberia revealed >300 species of soil Coleoptera (beetles) from a broad spectrum of phylogenetic lineages. A set of 214 mitochondrial sequences longer than 3000 bp was generated and used to estimate a well-supported phylogenetic tree of the order Coleoptera. Shorter sequences, including cox1 barcodes, were placed on this mitogenomic tree. Raw Illumina reads were mapped against all available sequences to test for species present in local samples. This approach simultaneously established the species richness, phylogenetic composition, and community turnover at species and phylogenetic levels. We find a strong signature of vertical structuring in soil fauna, with high local community differentiation between deep soil and superficial horizons at phylogenetic levels. Within the two vertical layers, turnover among regions was primarily at the tip (species) level and was stronger in deep soil than in leaf litter communities, pointing to layer-mediated drivers determining species diversification, spatial structure, and evolutionary assembly of soil communities. This integrated phylogenetic framework opens up the application of phylogenetic community ecology to the mesofauna of the soil, among the most diverse and least well-understood ecosystems, and will propel both theoretical and applied soil science.

17.
Zhao Liang, Xie Jin, Bai Lin, Chen Wen, Wang Mingju, Zhang Zhonglei, Wang Yiqi, Zhao Zhe, Li Jinyan. BMC Genomics, 2018, 19(10): 1–10
Background

NGS data contain many machine-induced errors. The most advanced error-correction methods depend heavily on the selection of solid k-mers. A solid k-mer is a k-mer that occurs frequently in NGS reads; the other k-mers are called weak k-mers. A solid k-mer is unlikely to contain errors, while a weak k-mer most likely does. An intensively investigated problem is finding a good frequency cutoff f0 to balance the numbers of solid and weak k-mers. Once the cutoff is determined, a more challenging but less-studied problem is to (i) remove a small subset of solid k-mers that are likely to contain errors, and (ii) add a small subset of weak k-mers that are likely to contain no errors to the remaining set of solid k-mers. Identifying these two subsets of k-mers can improve correction performance.
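The basic solid/weak split described above can be sketched by counting every k-mer in the reads and thresholding at a cutoff f0. This is a simplification: the value of f0 here is hypothetical (the paper derives it from a Gamma/Gaussian-mixture fit), and real error correctors usually also merge each k-mer with its reverse complement, which this sketch does not do.

```python
from collections import Counter


def count_kmers(reads, k):
    """Occurrence counts of every k-mer across all reads (forward strand only)."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts


def split_solid_weak(counts, f0):
    """Solid k-mers occur at least f0 times; all others are weak."""
    solid = {w for w, c in counts.items() if c >= f0}
    weak = set(counts) - solid
    return solid, weak


reads = ["ACGTAC", "ACGTAC", "ACGTTC"]  # toy reads; the third has a likely error
solid, weak = split_solid_weak(count_kmers(reads, 4), f0=2)
print(sorted(solid), sorted(weak))
```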

Results

We propose using a Gamma distribution to model the frequencies of erroneous k-mers and a mixture of Gaussian distributions to model correct k-mers, and combining them to determine f0. To identify the two special subsets of k-mers, we use the z-score of k-mers, which measures how many standard deviations a k-mer's frequency is from the mean. These statistically solid k-mers are then used to construct a Bloom filter for error correction. Tested on both real and synthetic NGS data sets, our method is markedly superior to state-of-the-art methods.

Conclusion

The z-score is adequate to distinguish solid k-mers from weak k-mers and is particularly useful for pinpointing solid k-mers with very low frequency. Applying the z-score to k-mers can markedly improve error-correction accuracy.
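A sketch of the refinement step followed by Bloom-filter construction. The abstract does not say which population the z-score's mean and standard deviation come from; here we assume, purely for illustration, that they are taken over a k-mer's 3k Hamming-distance-1 neighbours (absent neighbours counted as 0). With that choice a rare k-mer whose neighbours are rarer still scores high, which is what makes "rescuing" weak k-mers possible. The thresholds `z_low` and `z_high` are hypothetical, and the Bloom filter is a minimal stand-alone implementation, not the authors' code.

```python
import hashlib
import math
from statistics import mean, pstdev


def hamming1_neighbours(kmer, alphabet="ACGT"):
    """All k-mers differing from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for b in alphabet:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]


def neighbour_zscore(kmer, counts):
    """z-score of a k-mer's frequency relative to its neighbours (assumed reference set)."""
    f = counts.get(kmer, 0)
    freqs = [counts.get(nb, 0) for nb in hamming1_neighbours(kmer)]
    mu, sigma = mean(freqs), pstdev(freqs)
    if sigma == 0:
        return math.inf if f > mu else 0.0
    return (f - mu) / sigma


def select_trusted(counts, f0, z_low=2.0, z_high=3.0):
    """Keep solid k-mers with z >= z_low; rescue weak k-mers with z >= z_high."""
    trusted = set()
    for kmer, f in counts.items():
        z = neighbour_zscore(kmer, counts)
        if (f >= f0 and z >= z_low) or (f < f0 and z >= z_high):
            trusted.add(kmer)
    return trusted


class BloomFilter:
    """Minimal Bloom filter sized with m = -n ln(p) / (ln 2)^2 bits."""

    def __init__(self, n_items, fp_rate=0.01):
        n = max(1, n_items)
        self.m = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):  # k independent hashes via different salts
            digest = hashlib.blake2b(item.encode(), digest_size=8,
                                     salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


counts = {"ACGT": 10, "ACGA": 1, "TTTT": 2}  # toy k-mer counts
trusted = select_trusted(counts, f0=3)
bf = BloomFilter(len(trusted))
for kmer in trusted:
    bf.add(kmer)
print(sorted(trusted), "ACGT" in bf)
```

Here ACGA is dropped because it is a low-count neighbour of the abundant ACGT (a typical error signature), while TTTT is rescued despite falling below f0 because none of its neighbours occur.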


18.
To produce a large quantity of the angiotensin-converting enzyme (ACE)-inhibiting peptide YG-1, which consists of ten amino acids derived from yeast glyceraldehyde-3-phosphate dehydrogenase, high-level expression of tandem multimers of the YG-1 gene was explored in Escherichia coli. The genes encoding YG-1 were tandemly multimerized into 9-mers, 18-mers, and 27-mers, in which each repeating unit was connected to its neighbors by a DNA linker encoding Pro-Gly-Arg, allowing the multimers to be cleaved by clostripain. The multimers were cloned into the expression vector pET-21b and expressed in E. coli BL21(DE3) with isopropyl β-D-thiogalactopyranoside induction. The multimeric peptides encoded by the 9-mer, 18-mer, and 27-mer accumulated intracellularly as inclusion bodies and comprised about 67%, 25%, and 15% of the total proteins in E. coli, respectively. The multimeric peptides expressed as inclusion bodies were cleaved with clostripain, and active monomers were purified to homogeneity by reversed-phase high-performance liquid chromatography. In total, 105 mg of pure recombinant YG-1 was obtained from 1 L of E. coli culture harboring pETYG9, which contained the 9-mer of the YG-1 gene. The recombinant YG-1 was identical to natural YG-1 in molecular mass, amino acid sequence, and ACE-inhibiting activity. Received: 6 January 1998 / Received revision: 23 February 1998 / Accepted: 24 February 1998

19.

Background  

The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collecting high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect the phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not.

20.
Viral metagenomics, also known as virome studies, has yielded an unprecedented number of novel sequences, essential for recognizing and characterizing the etiological agents and origins of emerging infectious diseases. Several tools and pipelines have been developed to date for the identification and assembly of viral genomes. Assembly pipelines often produce viral genomes contaminated with host genetic material, some of which are currently deposited in public databases. In the current report, we present a group of deposited sequences that contain ribosomal RNA (rRNA) contamination. We highlight the detrimental role in virus genome assembly of chimeric next-generation sequencing reads joining host rRNA sequences and viral sequences, and we present the hindrances these reads may pose to current methodologies. We have further developed a refining pipeline, the Zero Waste Algorithm (ZWA), that assists in the assembly of low-abundance viral genomes. ZWA performs context-dependent trimming of chimeric reads, precisely removing their rRNA moiety. These otherwise discarded reads were fed to the assembly pipeline and assisted in the construction of larger and cleaner contigs, making a substantial impact on current assembly methodologies. The ZWA pipeline may significantly enhance virus genome assembly from low-abundance samples and in virus metagenomics approaches in which a small number of reads determine genome quality and integrity.
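To illustrate what removing the rRNA moiety of a chimeric read involves, here is a simple exact k-mer sketch; it is not the published ZWA algorithm, whose context-dependent rules are not described in the abstract. We assume an rRNA reference is available, index its k-mers, and trim the run of rRNA k-mers from whichever end of the read it occupies. The sequences, `k`, and `min_keep` are hypothetical.

```python
def build_rrna_index(rrna_seqs, k):
    """Set of all k-mers in the rRNA reference sequences."""
    index = set()
    for s in rrna_seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            index.add(s[i:i + k])
    return index


def trim_rrna_moiety(read, index, k, min_keep=20):
    """Trim an rRNA segment from either end of a chimeric read.

    Returns the trimmed read, the read unchanged if no terminal rRNA k-mers
    are found, or None if too little non-rRNA sequence remains.
    """
    read = read.upper()
    hits = [read[i:i + k] in index for i in range(len(read) - k + 1)]
    if not any(hits):
        return read
    if hits[0]:                  # rRNA moiety at the 5' end
        j = 0
        while j < len(hits) and hits[j]:
            j += 1
        rest = read[j - 1 + k:]  # drop every base covered by a leading rRNA k-mer
    elif hits[-1]:               # rRNA moiety at the 3' end
        j = len(hits) - 1
        while j >= 0 and hits[j]:
            j -= 1
        rest = read[:j + 1]
    else:
        return read              # internal hits only: left unchanged in this sketch
    return rest if len(rest) >= min_keep else None


idx = build_rrna_index(["GGGGCCCCAAAATTTT"], k=5)
viral = "TCGTACGTACGTACGTACGTAC"
print(trim_rrna_moiety("GGGGCCCC" + viral, idx, k=5))
```

Exact k-mer matching can over-trim by a base or two when the viral sequence happens to continue the rRNA sequence at the junction; an alignment-based boundary search would be more precise.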


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP License No. 09084417)