Similar literature
 20 similar documents found (search time: 15 ms)
1.
2.
3.
Coenye, Tom; Vandamme, Peter. DNA Research, 2005, 12(4): 221-233
The increasing availability of prokaryotic genome sequences has shown that simple sequence repeats (SSRs) are widespread in prokaryotes and that there is extensive variation in their length, number and distribution. Considering their potential importance in generating genomic diversity, we determined the distribution of a specific group of SSRs, mononucleotide repeats of size between 5 and 13 nt, in 157 sequenced prokaryotic genomes. The data obtained in the present study show that (i) a large number of mononucleotide SSRs is present in all prokaryotic genomes investigated, (ii) shorter repeats are much more abundant than longer repeats, and (iii) in the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions, although we identified several organisms where mononucleotide SSRs are not excluded from the coding regions. We also observed that some genomes contain more mononucleotide SSRs than expected, while others contain significantly fewer. Bacterial genomes that contain far fewer mononucleotide SSRs than expected are generally larger and more GC-rich, while bacterial genomes that contain many more mononucleotide SSRs than expected are in general smaller and more AT-rich. Finally, we also noted that genomes that contain a high fraction of horizontally transferred genes have a lower mononucleotide SSR density and that A and T are generally overrepresented in mononucleotide SSRs.
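The counting step described in this abstract (mononucleotide repeats of 5 to 13 nt) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the function name `count_mononucleotide_ssrs` and its parameters are hypothetical, and it assumes a plain DNA string on a single strand, with runs counted only when their full (maximal) length falls inside the window.

```python
import re
from collections import Counter


def count_mononucleotide_ssrs(seq, min_len=5, max_len=13):
    """Count maximal single-base runs, keyed by (base, run length).

    Hypothetical helper for illustration: a run is counted only if its
    full length lies within [min_len, max_len]; longer runs are skipped
    rather than truncated.
    """
    counts = Counter()
    # Each alternative matches a maximal run of one base (greedy '+').
    for match in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        run = match.group(0)
        if min_len <= len(run) <= max_len:
            counts[(run[0], len(run))] += 1
    return counts


if __name__ == "__main__":
    demo = "A" * 5 + "GG" + "T" * 7 + "CCCC" + "A" * 14
    for (base, length), n in sorted(count_mononucleotide_ssrs(demo).items()):
        print(base, length, n)
```

Dividing such counts by genome length would give a repeat density of the kind the abstract compares across genomes. How the study computed the expected number of repeats is not stated here and is not reproduced.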

4.
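Recently, the discovery of small open reading frames (sORFs) with peptide-coding capacity in non-coding RNAs has stimulated interest in this long-neglected class of genomic elements, and sORFs have quickly become a major research focus. Because of their low expression levels and abundance and their short sequences, effective research methods and data resources for peptide-coding sORFs are still lacking. Existing studies concentrate on a few eukaryotic model organisms, and very little work has addressed prokaryotes, which are widespread in nature. The discovery of peptide-coding sORFs poses a serious challenge to genome annotation in the current context of precision. Against this background, this paper first systematically examines the distribution and functional characteristics of peptide-coding sORFs shorter than 100 amino acids in more than 80 prokaryotes of different types, and compares the sequence composition, distribution and evolutionary features of sORFs across length ranges. The results show that peptide-coding sORFs are ubiquitous in prokaryotic genomes; as sequence length decreases, sequence complexity decreases and the biological functions performed become relatively more concentrated. On this basis, and in light of the current state of research, the paper summarizes in depth the problems and challenges of peptide-coding sORF research, laying a solid theoretical foundation for future studies.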

5.
Porphyromonas gingivalis, a Gram-negative asaccharolytic anaerobe, is a major causative organism of chronic periodontitis. Because the bacterium utilizes amino acids as energy and carbon sources and incorporates them mainly as dipeptides, a wide variety of dipeptide production processes mediated by dipeptidyl-peptidases (DPPs) should be beneficial for the organism. In the present study, we identified the fourth P. gingivalis enzyme, DPP5. In a dpp4-7-11-disrupted P. gingivalis ATCC 33277, a DPP7-like activity still remained. PGN_0756 possessed an activity indistinguishable from that of the mutant, and was identified as a bacterial orthologue of fungal DPP5, because of its substrate specificity and 28.5% amino acid sequence identity with an Aspergillus fumigatus entity. P. gingivalis DPP5 was composed of 684 amino acids with a molecular mass of 77,453, and existed as a dimer while migrating at 66 kDa on SDS-PAGE. It preferred Ala and hydrophobic residues, had no activity toward Pro at the P1 position, and no preference for hydrophobic P2 residues, showed an optimal pH of 6.7 in the presence of NaCl, demonstrated Km and kcat/Km values for Lys-Ala-MCA of 688 μM and 11.02 μM⁻¹ s⁻¹, respectively, and was localized in the periplasm. DPP5 elaborately complemented DPP7 in liberation of dipeptides with hydrophobic P1 residues. Examinations of DPP- and gingipain gene-disrupted mutants indicated that DPP4, DPP5, DPP7, and DPP11 together with Arg- and Lys-gingipains cooperatively liberate most dipeptides from nutrient oligopeptides. This is the first study to report that DPP5 is expressed not only in eukaryotes, but also widely distributed in bacteria and archaea.

6.
7.
The new field of synthetic biology aims at the creation of artificially designed organisms. A major breakthrough in the field was the generation of the artificial synthetic organism Mycoplasma mycoides JCVI‐syn3A. This bacterium possesses only 452 protein‐coding genes, the smallest number for any organism that is viable independent of a host cell. However, about one third of the proteins have no known function indicating major gaps in our understanding of simple living cells. To facilitate the investigation of the components of this minimal bacterium, we have generated the database SynWiki (http://synwiki.uni-goettingen.de/). SynWiki is based on a relational database and gives access to published information about the genes and proteins of M. mycoides JCVI‐syn3A. To gain a better understanding of the functions of the genes and proteins of the artificial bacteria, protein–protein interactions that may provide clues for the protein functions are included in an interactive manner. SynWiki is an important tool for the synthetic biology community that will support the comprehensive understanding of a minimal cell as well as the functional annotation of so far uncharacterized proteins.

8.
Typically, the assembly and closure of a complete bacterial genome requires substantial additional effort spent in a wet lab for gap resolution and genome polishing. Assembly is further confounded by subspecies polymorphism when starting from metagenome sequence data. In this paper, we describe an in silico gap-resolution strategy that can substantially improve assembly. This strategy resolves assembly gaps in scaffolds using pre-assembled contigs, followed by verification with read mapping. It is capable of resolving assembly gaps caused by repetitive elements and subspecies polymorphisms. Using this strategy, we realized the de novo assembly of the first two Dehalobacter genomes from the metagenomes of two anaerobic mixed microbial cultures capable of reductive dechlorination of chlorinated ethanes and chloroform. Only four additional PCR reactions were required even though the initial assembly with Newbler v. 2.5 produced 101 contigs within 9 scaffolds belonging to two Dehalobacter strains. By applying this strategy to the re-assembly of a recently published genome of Bacteroides, we demonstrate its potential utility for other sequencing projects, both metagenomic and genomic.

9.
Spliceosomal small nuclear ribonucleoproteins (snRNPs) in trypanosomes contain either the canonical heptameric Sm ring or variant Sm cores with snRNA-specific Sm subunits. Here we show biochemically by a combination of RNase H cleavage and tandem affinity purification that the U4 snRNP contains a variant Sm heteroheptamer core in which only SmD3 is replaced by SSm4. This U4-specific, nuclear-localized Sm core protein is essential for growth and splicing. As shown by RNA interference (RNAi) knockdown, SSm4 is specifically required for the integrity of the U4 snRNA and the U4/U6 di-snRNP in trypanosomes. In addition, we demonstrate by in vitro reconstitution of Sm cores that under stringent conditions, the SSm4 protein suffices to specify the assembly of U4 Sm cores. Together, these data indicate that the assembly of the U4-specific Sm core provides an essential step in U4/U6 di-snRNP biogenesis and splicing in trypanosomes.
The excision of intronic sequences from precursor mRNAs is a critical step during eukaryotic gene expression. This reaction is catalyzed by the spliceosome, a macromolecular complex composed of small nuclear ribonucleoproteins (snRNPs) and many additional proteins. Spliceosome assembly and splicing catalysis occur in an ordered multistep process, which includes multiple conformational rearrangements (35). Spliceosomal snRNPs are assembled from snRNAs and protein components, the latter of which fall into two classes: snRNP-specific and common proteins. The common or canonical core proteins are also termed Sm proteins, specifically SmB, SmD1, SmD2, SmD3, SmE, SmF, and SmG (10; reviewed in reference 9), which all share an evolutionarily conserved bipartite sequence motif (Sm1 and Sm2) required for Sm protein interactions and the formation of the heteroheptameric Sm core complex around the Sm sites of the snRNAs (3, 7, 29).
Prior to this, the Sm proteins form three heteromeric subcomplexes: SmD3/SmB, SmD1/SmD2, and SmE/SmF/SmG (23; reviewed in reference 34). Individual Sm proteins or Sm subcomplexes cannot stably interact with the snRNA. Instead, a stable subcore forms by an association of the subcomplexes SmD1/SmD2 and SmE/SmF/SmG with the Sm site on the snRNA; the subsequent integration of the SmD3/SmB heterodimer completes Sm core assembly.
In addition to the canonical Sm proteins, other proteins carrying the Sm motif have been identified for many eukaryotes. Those proteins, termed LSm (like Sm) proteins, exist in distinct heptameric complexes that differ in function and localization. For example, a complex composed of LSm1 to LSm7 (LSm1-7) accumulates in cytoplasmic foci and participates in mRNA turnover (4, 8, 31). Another complex, LSm2-8, binds to the 3′ oligo(U) tract of the U6 snRNA in the nucleus (1, 15, 24). Finally, in the U7 snRNP, which is involved in histone mRNA 3′-end processing, the Sm proteins SmD1 and SmD2 are replaced by U7-specific LSm10 and LSm11 proteins, respectively (20, 21; reviewed in reference 28).
This knowledge is based primarily on the mammalian system, where spliceosomal snRNPs are biochemically well characterized (34). In contrast, for trypanosomes, comparatively little is known about the components of the splicing machinery and their assembly and biogenesis. In trypanosomes, the expression of all protein-encoding genes, which are arranged in long polycistronic units, requires trans splicing. Only a small number of genes are additionally processed by cis splicing (reviewed in reference 11). During trans splicing, a short noncoding miniexon, derived from the spliced leader (SL) RNA, is added to each protein-encoding exon. Regarding the trypanosomal splicing machinery, the U2, U4/U6, and U5 snRNPs are considered to be general splicing factors, whereas the U1 and SL snRNPs represent cis- and trans-splicing-specific components, respectively.
In addition to the snRNAs, many protein splicing factors in trypanosomes have been identified based on sequence homology (for example, see references 14 and 19).
Recent studies revealed variations in the Sm core compositions of spliceosomal snRNPs from Trypanosoma brucei. Specifically, in the U2 snRNP, two of the canonical Sm proteins, SmD3 and SmB, are replaced by two novel, U2 snRNP-specific proteins, Sm16.5K and Sm15K (33). In this case, an unusual purine nucleotide, interrupting the central uridine stretch of the U2 snRNA Sm site, discriminates between the U2-specific and the canonical Sm cores. A second case of Sm core variation was reported for the U4 snRNP, in which a single protein, SmD3, was suggested to be replaced by the U4-specific LSm protein initially called LSm2, and later called SSm4, based on a U4-specific destabilization after SSm4 knockdown (30). A U4-specific Sm core variation was also previously suggested and discussed by Wang et al. (33), based on the inefficient pulldown of U4 snRNA through tagged SmD3 protein. However, neither of these two studies conclusively demonstrated by biochemical criteria that the specific Sm protein resides in the U4 Sm core; a copurification of other snRNPs could not be unequivocally ruled out.
By using a combination of RNase H cleavage, tandem affinity purification, and mass spectrometry, we provide here direct biochemical evidence that in the variant Sm core of the U4 snRNP, only SmD3 is replaced by the U4-specific SSm4. SSm4 is nuclear localized, and the silencing of SSm4 leads to a characteristic phenotype: dramatic growth inhibition, general trans- and cis-splicing defects, a loss of the integrity of the U4 snRNA, as well as a destabilization of the U4/U6 di-snRNP. Furthermore, in vitro reconstitution assays revealed that under stringent conditions, SSm4 is sufficient to specify U4-specific Sm core assembly.
In sum, our data establish SSm4 as a specific component of the U4 Sm core and demonstrate its importance in U4/U6 di-snRNP biogenesis, splicing function, and cell viability.

10.
The genus Echinochloa (Poaceae) includes numerous problematic weeds that cause the reduction of crop yield worldwide. To date, DNA sequence information is still limited in the genus Echinochloa. In this study, we completed the entire chloroplast genomes of two Echinochloa species (Echinochloa oryzicola and Echinochloa crus-galli) based on high-throughput sequencing data from their fresh green leaves. The two Echinochloa chloroplast genomes are 139,891 and 139,800 base pairs in length, respectively, and contain 131 protein-coding genes, 79 indels and 466 substitutions helpful for discrimination of the two species. The divergence between the genus Echinochloa and Panicum occurred about 21.6 million years ago, whereas the divergence between E. oryzicola and E. crus-galli chloroplast genes occurred about 3.3 million years ago. The two reported Echinochloa chloroplast genome sequences contribute to better understanding of the diversification of this genus.

11.
The genome of the soil-dwelling heterotrophic N2-fixing Gram-negative bacterium Azotobacter chroococcum NCIMB 8003 (ATCC 4412) (Ac-8003) has been determined. It consists of 7 circular replicons totalling 5,192,291 bp comprising a circular chromosome of 4,591,803 bp and six plasmids pAcX50a, b, c, d, e, f of 10,435 bp, 13,852, 62,783, 69,713, 132,724, and 311,724 bp respectively. The chromosome has a G+C content of 66.27% and the six plasmids have G+C contents of 58.1, 55.3, 56.7, 59.2, 61.9, and 62.6% respectively. The methylome has also been determined and 5 methylation motifs have been identified. The genome also contains a very high number of transposase/inactivated transposase genes from at least 12 of the 17 recognised insertion sequence families. The Ac-8003 genome has been compared with that of Azotobacter vinelandii ATCC BAA-1303 (Av-DJ), a derivative of strain O, the only other member of the Azotobacteraceae determined so far which has a single chromosome of 5,365,318 bp and no plasmids. The chromosomes show significant stretches of synteny throughout but also reveal a history of many deletion/insertion events. The Ac-8003 genome encodes 4628 predicted protein-encoding genes of which 568 (12.2%) are plasmid borne. 3048 (65%) of these show > 85% identity to the 5050 protein-encoding genes identified in Av-DJ, and of these 99 are plasmid-borne. The core biosynthetic and metabolic pathways and macromolecular architectures and machineries of these organisms appear largely conserved including genes for CO-dehydrogenase, formate dehydrogenase and a soluble NiFe-hydrogenase. The genetic bases for many of the detailed phenotypic differences reported for these organisms have also been identified. Also many other potential phenotypic differences have been uncovered. Properties endowed by the plasmids are described including the presence of an entire aerobic corrin synthesis pathway in pAcX50f and the presence of genes for retro-conjugation in pAcX50c. 
All these findings are related to the potentially different environmental niches from which these organisms were isolated and to emerging theories about how microbes contribute to their communities.

12.
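The open reading frames (ORFs) of a genome are the basis of gene identification and genome analysis. Several software packages provide algorithms for generating them, but their results and criteria are not consistent. This paper gives a definition of the po-MORF and an algorithm for generating it, proves the existence and uniqueness of the po-MORF set determined by a genome, and shows that the algorithm yields all po-MORF sequences. We also compare the relationship between all CDS and po-MORF sequences in several prokaryotic genomes and discuss related issues in gene identification.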
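As a point of reference for what ORF enumeration involves, the sketch below lists forward-strand ORFs under one common convention: in each reading frame, an ORF runs from the first start codon after the previous stop to the next in-frame stop codon. This is not the po-MORF definition or algorithm from the paper, which the abstract does not spell out. The function name `forward_orfs`, the ATG-only start rule and the standard stop codon set are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed standard stop codons


def forward_orfs(seq, start_codon="ATG", min_codons=1):
    """Return sorted (start, end) half-open index pairs of forward-strand ORFs.

    Illustrative convention only: per frame, an ORF starts at the first
    start codon seen after the previous stop and ends with the next
    in-frame stop codon (stop included in the span).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == start_codon:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Number of codons before the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    demo = "CCATGAAATAGATGCCCTGA"
    for begin, end in forward_orfs(demo):
        print(begin, end, demo[begin:end])
```

Different packages vary exactly these choices (allowed start codons, minimum length, strand handling), which is one plausible source of the inconsistent results the abstract mentions.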

13.
Design and large-scale synthesis of DNA has been applied to the functional study of viral and microbial genomes. New and expanded technology development is required to unlock the transformative potential of such bottom-up approaches to the study of larger mammalian genomes. Two major challenges include assembling and delivering long DNA sequences. Here, we describe a workflow for de novo DNA assembly and delivery that enables functional evaluation of mammalian genes on the length scale of 100 kilobase pairs (kb). The DNA assembly step is supported by an integrated robotic workcell. We demonstrate assembly of the 101 kb human HPRT1 gene in yeast from 3 kb building blocks, precision delivery of the resulting construct to mouse embryonic stem cells, and subsequent expression of the human protein from its full-length human gene in mouse cells. This workflow provides a framework for mammalian genome writing. We envision utility in producing designer variants of human genes linked to disease and their delivery and functional analysis in cell culture or animal models.

14.
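Constructing a core collection of upland cotton using phenotypic data
Using 5,963 upland cotton germplasm accessions as material, the accessions were divided into 11 groups according to their main mutant traits and cultivar types. Within these groups, 21 phenotypic traits were analysed by unweighted pair-group average clustering to construct a core collection of 281 accessions, 4.71% of the entire collection. The representativeness of the core collection was tested and evaluated using, for the individual traits, t-tests of means, F-tests of variances, coefficients of variation, t-tests of diversity indices, means, ranges, phenotypic variances, the percentage of mean differences, the percentage of variance differences, the coincidence rate of ranges, the rate of change of the coefficient of variation, and principal component analysis. The results show that the constructed core collection can represent the genetic diversity of the entire collection.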

15.
Now in its 52nd year of continuous operations, the Protein Data Bank (PDB) is the premiere open‐access global archive housing three‐dimensional (3D) biomolecular structure data. It is jointly managed by the Worldwide Protein Data Bank (wwPDB) partnership. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) is funded by the National Science Foundation, National Institutes of Health, and US Department of Energy and serves as the US data center for the wwPDB. RCSB PDB is also responsible for the security of PDB data in its role as wwPDB‐designated Archive Keeper. Every year, RCSB PDB serves tens of thousands of depositors of 3D macromolecular structure data (coming from macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro‐electron diffraction). The RCSB PDB research‐focused web portal (RCSB.org) makes PDB data available at no charge and without usage restrictions to many millions of PDB data consumers around the world. The RCSB PDB training, outreach, and education web portal (PDB101.RCSB.org) serves nearly 700 K educators, students, and members of the public worldwide. This invited Tools Issue contribution describes how RCSB PDB (i) is organized; (ii) works with wwPDB partners to process new depositions; (iii) serves as the wwPDB‐designated Archive Keeper; (iv) enables exploration and 3D visualization of PDB data via RCSB.org; and (v) supports training, outreach, and education via PDB101.RCSB.org. New tools and features at RCSB.org are presented using examples drawn from high‐resolution structural studies of proteins relevant to treatment of human cancers by targeting immune checkpoints.

16.
Comparisons of Two Large Phaeoviral Genomes and Evolutionary Implications
The evolution of viral genomes has recently attracted considerable attention. We compare the sequences of two large viral genomes, EsV-1 and FirrV-1, belonging to the family of phaeoviruses which infect different species of marine brown algae. Although their genomes differ substantially in size, these viruses share similar morphologies and similar latent infection cycles. In fact, sequence comparisons show that the viruses have more than 60% of their genes in common. However, the order of genes is completely different in the two genomes, suggesting that extensive recombinational events in addition to several large deletions had occurred during the separate evolutionary routes from a common ancestor. We investigated genes encoding components of signal transduction pathways and genes encoding replicative functions in more detail. We found that the two genomes possess different, although overlapping, sets of genes in both classes, suggesting that different genes from each class were lost, perhaps randomly, after the separate evolution from an ancestral genome. Random loss would also account for the fact that more than one-third of the genes in one viral genome has no counterparts in the other genome. We speculate that the ancestral genome belonged to a cellular organism that had once invaded a primordial brown algal host.

17.
Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

18.
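Genomic comparison of the core apoptotic machinery in Drosophila
Comparative genomics is the main route for inferring regulatory networks from genome sequences. The apoptosis signalling network is a typical example of a regulatory network. EGL1, CED3, CED4 and CED9, together with their homologous proteins in nematodes and mammals, constitute a conserved core apoptotic machinery. The core apoptotic machinery of Drosophila is currently incomplete: proteins similar to EGL1 and CED9 have not yet been found. Through a series of bioinformatics-based comparative genomic analyses, genes encoding two BCL2/CED9 homologues and one EGL1 homologue were found in the Drosophila genome database, and the Drosophila …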

19.
Phenol extraction and cesium trifluoroacetate ultracentrifugation were compared for efficiency in the extraction of DNA from eggs and second-stage juveniles of four species of Meloidogyne. The second method proved to be more satisfactory in that it yielded larger amounts of DNA, shortened the extraction period, and reduced sample handling by eliminating phenol and ether extraction and RNAse treatment. It also made possible the extraction of DNA from more than one sample at a time. The mean base compositions (% GC) of the total DNA of M. incognita, M. javanica, M. arenaria, and M. hapla, as determined by thermal denaturation tests, were quite similar, as they ranged only between 31 and 33%. Similarly, the thermal stability of the DNA of all four species covered a narrow range from 82.97 to 83.63 °C.

20.
Plant viruses are known to infect most economically important crops and pose a major threat to global food security. Currently, few resistant host phenotypes have been delineated, and while chemicals are used for crop protection against insect pests and bacterial or fungal diseases, these are inefficient against viral diseases. Genetic engineering emerged as a way of modifying the plant genome by introducing functional genes in plants to improve crop productivity under adverse environmental conditions. Recently, new breeding technologies, and in particular the exciting CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR‐associated proteins) technology, have been shown to be a powerful alternative for engineering resistance against plant viruses, and thus have great potential for reducing crop losses and improving plant productivity to directly contribute to food security. Indeed, CRISPR/Cas could circumvent the “Genetic modification” issues because it allows for genome editing without the integration of foreign DNA or RNA into the genome of the host plant, and it is simpler and more versatile than other new breeding technologies. In this review, we describe the predominant features of the major CRISPR/Cas systems and outline strategies for the delivery of CRISPR/Cas reagents to plant cells. We also provide an overview of recent advances that have engineered CRISPR/Cas‐based resistance against DNA and RNA viruses in plants through the targeted manipulation of either the viral genome or susceptibility factors of the host plant genome. Finally, we provide insight into the limitations and challenges that CRISPR/Cas technology currently faces and discuss a few alternative applications of the technology in virus research.
