Similar Documents
3.
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes, all at high coverage. We summarize the number of genetic variants emerging from a study of this magnitude. We also provide a proof of concept for identifying rare, highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.
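A discovery curve of the kind summarized above can be computed by counting, for each newly added genome, the variants absent from every genome analysed before it. This sketch is illustrative only and is not the authors' pipeline. It assumes variant calls are already available and keyed by a hypothetical (chromosome, position, ref, alt) tuple. The abstract also does not say whether "novel" excludes variants already catalogued in public databases.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def novel_per_genome(genomes: Iterable[Set[Variant]]) -> List[int]:
    """For each genome in order, count variants not seen in any earlier genome."""
    seen: Set[Variant] = set()
    counts: List[int] = []
    for variants in genomes:
        new = variants - seen      # variants absent from all previous genomes
        counts.append(len(new))
        seen |= variants           # grow the catalogue of discovered variants
    return counts


if __name__ == "__main__":
    # Toy variant sets (not real data).
    g1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
    g2 = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")}
    g3 = {("chr2", 50, "G", "A"), ("chrX", 10, "T", "C"), ("chr1", 300, "A", "C")}
    print(novel_per_genome([g1, g2, g3]))  # [2, 1, 2]
```

Plotting these counts against genome order shows whether discovery levels off, as reported above after about 15 genomes.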

4.
Coenye, Tom; Vandamme, Peter. DNA Research, 2005, 12(4): 221-233
The increasing availability of prokaryotic genome sequences has shown that simple sequence repeats (SSRs) are widespread in prokaryotes and that there is extensive variation in their length, number and distribution. Considering their potential importance in generating genomic diversity, we determined the distribution of a specific group of SSRs, mononucleotide repeats between 5 and 13 nt long, in 157 sequenced prokaryotic genomes. The data obtained in the present study show three things. (i) A large number of mononucleotide SSRs is present in all prokaryotic genomes investigated. (ii) Shorter repeats are much more abundant than longer repeats. (iii) In the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions, although we identified several organisms in which mononucleotide SSRs are not excluded from coding regions. We also observed that some genomes contain more mononucleotide SSRs than expected, while others contain significantly fewer. Bacterial genomes that contain far fewer mononucleotide SSRs than expected are generally larger and more GC-rich. Bacterial genomes that contain far more mononucleotide SSRs than expected are in general smaller and more AT-rich. Finally, we also noted that genomes containing a high fraction of horizontally transferred genes have a lower mononucleotide SSR density, and that A and T are generally overrepresented in mononucleotide SSRs.
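Counting mononucleotide SSRs and comparing them with an expectation can be sketched as follows. The repeat window (5 to 13 nt) comes from the abstract. The expectation model is an assumption, because the abstract does not describe how expected counts were derived. Here a genome is treated as an independent, identically distributed sequence of bases. Under that model, a maximal run of exactly k copies of a base with frequency p is expected roughly (L - k + 1) * p^k * (1 - p)^2 times in a sequence of length L, ignoring edge effects.

```python
import re
from collections import Counter

MIN_LEN, MAX_LEN = 5, 13  # mononucleotide repeat sizes considered in the abstract


def count_mononucleotide_ssrs(seq: str) -> Counter:
    """Count maximal homopolymer runs of 5-13 nt, keyed by (base, run length).

    Non-ACGT characters (e.g. N) break runs and are never counted.
    """
    counts: Counter = Counter()
    # Greedy A+|C+|G+|T+ matches each maximal single-base run exactly once.
    for match in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        run = match.group()
        if MIN_LEN <= len(run) <= MAX_LEN:
            counts[(run[0], len(run))] += 1
    return counts


def expected_runs(seq_len: int, p: float, k: int) -> float:
    """Approximate expected number of maximal runs of exactly k copies of a base
    with frequency p in an i.i.d. random sequence (assumed model; edges ignored)."""
    if seq_len < k:
        return 0.0
    return (seq_len - k + 1) * p ** k * (1 - p) ** 2


if __name__ == "__main__":
    genome = "ATGAAAAAGCCCCCCGTTTTTTTTAC" * 10  # toy sequence, not real data
    observed = count_mononucleotide_ssrs(genome)
    for (base, k), n in sorted(observed.items()):
        p = genome.count(base) / len(genome)
        print(base, k, n, round(expected_runs(len(genome), p, k), 3))
```

An observed-to-expected ratio well above or below 1 for a given base and length would correspond to the "more" or "fewer than expected" genomes described above. Note that the verdict depends on the chosen background model.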

6.
Plastids are actively involved in numerous plant processes critical to growth, development and adaptation. They play a primary role in photosynthesis, pigment and monoterpene synthesis, gravity sensing, and starch and fatty acid synthesis, as well as oil and protein storage. We applied two complementary methods to the recently published apple genome (Malus × domestica) to identify putative plastid-targeted proteins. The first used TargetP; the second used a custom workflow built on a set of predictive programs. Apple shares roughly 40% of its 10,492 putative plastid-targeted proteins with the Arabidopsis (Arabidopsis thaliana) plastid-targeted proteome identified by the Chloroplast 2010 project, and ∼57% of its entire proteome with Arabidopsis. This suggests that the plastid-targeted proteomes of apple and Arabidopsis differ, and, interestingly, points to differential targeting of homologs between the two species. Co-expression analysis of 2,224 genes encoding putative plastid-targeted apple proteins suggests that they play a role in plant developmental and intermediary metabolism. Further, an interspecific comparison of Arabidopsis, Prunus persica (Peach), Malus × domestica (Apple), Populus trichocarpa (Black cottonwood), Fragaria vesca (Woodland Strawberry), Solanum lycopersicum (Tomato) and Vitis vinifera (Grapevine) identified a large number of novel species-specific plastid-targeted proteins. This analysis also revealed the presence of alternatively targeted homologs across species. Two separate analyses revealed that a small subset of proteins is conserved across the seven angiosperm plastid-targeted proteomes: 289 protein clusters in one analysis and 737 unique protein sequences in the other. The majority of the novel proteins were annotated as playing roles in stress response, transport, catabolic processes, and cellular component organization.
Our results suggest that the current state of knowledge of plastid biology, which is based predominantly on model systems, is deficient. New plant genomes are expected to enable the identification of potentially new plastid-targeted proteins, which will aid the study of novel roles of plastids.

8.
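Bacterial genomes contain large numbers of overlapping genes. These genes not only reduce genome size and make more efficient use of genetic information, but also take part in regulation at the transcriptional and post-transcriptional levels. Why overlapping genes form remains unclear, and there is little feature information for predicting whether overlapping genes are present, which hinders their annotation. In this study, we used a convolutional neural network (CNN), a machine-learning algorithm, to scan gene-associated regions. We found that the first 54 bp of the coding region can serve as a marker for identifying overlapping genes, and we used a support vector machine (SVM) algorithm to confirm the accuracy of these predictions. By training and optimizing the network, we built a CNN model and applied it to annotate overlapping genes in the Escherichia coli genome, which is of considerable significance for research on overlapping genes. The trained model and instructions for its use have been released on GitHub; see https://github.com/breadpot/Convolutional_Neural_Network_Bacteria_overlapping_genes_prediction for details.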
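The abstract does not state how the 54-bp segment is encoded before it reaches the network. One-hot encoding, with one channel per nucleotide, is a common input format for nucleotide CNNs, so this sketch assumes it. The function name and the handling of ambiguous bases are illustrative choices. Check the released model's actual preprocessing in its GitHub repository.

```python
from typing import List

BASES = "ACGT"  # column order of the one-hot matrix (assumed)
WINDOW = 54     # segment length reported in the abstract


def one_hot_window(cds: str, window: int = WINDOW) -> List[List[int]]:
    """One-hot encode the first `window` bases of a coding sequence.

    Returns a window x 4 matrix (rows = positions, columns = A, C, G, T).
    Ambiguous bases such as N become all-zero rows. Sequences shorter
    than the window are rejected rather than padded.
    """
    segment = cds.upper()[:window]
    if len(segment) < window:
        raise ValueError(f"need at least {window} bases, got {len(segment)}")
    return [[1 if base == b else 0 for b in BASES] for base in segment]


if __name__ == "__main__":
    matrix = one_hot_window("ATG" + "GCT" * 20)  # toy 63-nt sequence
    print(len(matrix), len(matrix[0]))  # 54 4
```

Rejecting short sequences, rather than zero-padding them, keeps every input the same length without inventing positions. A real pipeline might choose differently.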

9.
P-type ATPases play essential roles in numerous processes, which in humans include nerve impulse propagation, relaxation of muscle fibers, secretion and absorption in the kidney, acidification of the stomach and nutrient absorption in the intestine. Published evidence suggests that uncharacterized families of P-type ATPases with novel specificities exist. In this study, we analyzed the fully sequenced genomes of 26 eukaryotes, including animals, plants, fungi and unicellular eukaryotes, for P-type ATPases. We report the organismal distributions, phylogenetic relationships, probable topologies and conserved motifs of nine functionally characterized families and 13 uncharacterized families of these transport enzymes. We have classified these proteins according to the conventions of the functional and phylogenetic IUBMB-approved transporter classification system (, Saier et al. in Nucleic Acids Res 34:181–186, 2006; Nucleic Acids Res 37:274–278, 2009). Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

10.
Genome comparison permits identification of chromosome regions conserved during evolution. Bacillus subtilis and Escherichia coli are so distant that very few conserved landmarks exist in their genome organisation. Analysis of the conserved cmk rpsA cluster pinpointed the importance of cytosine nucleotide metabolism. In these bacteria, mRNA turnover provides an efficient means to fulfil the need for CDP as a precursor of DNA synthesis. The cmk rpsA operon is responsible for CDP synthesis. This function is self-evident in the case of the cmk gene, which codes for cytidylate kinase. The case of rpsA, which codes for ribosomal protein S1, is more subtle. We suggest here two possible roles for S1. It may be an RNA-binding protein that helps polynucleotide phosphorylase (PNPase, known to be phylogenetically related to S1) degrade mRNA, or a helper molecule involved in other RNase activities. This provides an explanation for the elusive function of PNPase, which generates nucleoside diphosphates (not monophosphates) when degrading RNA. It also accounts for the discovery that the B. subtilis comR gene product is PNPase. This article briefly discusses the availability of cytosine nucleotides in eukaryotes and suggests that they are derived from phospholipid turnover. Finally, the GC content of genomes is discussed in this new light.

11.
In this Genomics Era, vast amounts of next-generation sequencing data have become publicly available for multiple genomes across hundreds of species. Analyses of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset and among different datasets or organisms. To facilitate the exploration of allelic variation and diversity, we have developed and deployed in-house software to categorize and visualize these haplotypes. The SNPViz software enables users to analyze region-specific haplotypes from single nucleotide polymorphism (SNP) datasets for different sequenced genomes. Examining the allelic variation and diversity of important soybean [Glycine max (L.) Merr.] flowering time and maturity genes may provide additional insight into flowering time regulation and enhance researchers' ability to target soybean breeding for particular environments. For this study, we used two available soybean genomic datasets covering a total of 72 soybean genotypes, encompassing cultivars, landraces, and the wild species Glycine soja. We analyzed the major soybean maturity genes E1, E2, E3, and E4, along with the Dt1 gene for plant growth architecture. Our aims were to determine the number of major haplotypes for each gene, to evaluate the consistency of the haplotypes with characterized variant alleles, and to identify evidence of artificial selection. The results revealed a small number of predominant haplogroups for each gene and provided important insights into possible allelic diversity for each gene in the context of known causative mutations. The software has both a stand-alone and a web-based version and can be used to analyze other genes, examine additional soybean datasets, and view similar genome sequence and SNP datasets from other species.
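SNPViz's internal logic is not described here. Its core step, collapsing per-sample SNP calls in a gene region into haplotypes, can be sketched as grouping samples whose allele strings are identical. The sketch makes several assumptions: homozygous calls (reasonable for largely inbred soybean lines, but still an assumption), no missing data, and the same ordered SNP positions for every sample. The sample names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_haplotypes(
    genotypes: Dict[str, Sequence[str]],
) -> List[Tuple[str, List[str]]]:
    """Group samples whose alleles across a region's SNPs are identical.

    `genotypes` maps a sample name to its alleles at the same ordered SNP
    positions. Returns (haplotype, samples) pairs, most common haplotype
    first; ties are ordered by haplotype string.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample, alleles in genotypes.items():
        groups["".join(alleles)].append(sample)  # identical strings = same haplotype
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    calls = {"line1": "ACGT", "line2": "ACGT", "soja1": "GCGA"}  # hypothetical
    for haplotype, samples in group_haplotypes(calls):
        print(haplotype, len(samples), samples)
```

Exact-match grouping is the simplest choice. Assigning samples to broader haplogroups that tolerate a few differing sites would need an additional clustering step.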

13.
The Archaeplastida consists of three lineages: Rhodophyta, Viridiplantae and Glaucophyta. The extracellular matrix of most members of the Rhodophyta and Viridiplantae consists of a carbohydrate-based or highly glycosylated protein-based cell wall, while the nature of the Glaucophyte covering is poorly resolved. To elucidate possible evolutionary links between the three advanced lineages in Archaeplastida, we initiated a genomic analysis. Fully sequenced genomes from the Rhodophyta and Viridiplantae and the well-defined CAZy database of glycosyltransferases were included in the analysis. The number of glycosyltransferases found in the Rhodophyta and Chlorophyta is generally much lower than in land plants (Embryophyta). Three specific features of land plants increase the number of glycosyltransferases in their genomes: (1) cell wall biosynthesis, since the more complex land plant cell walls require a larger number of glycosyltransferases; (2) a richer set of protein glycosylation; and (3) glycosylation of secondary metabolites, demonstrated by a large proportion of family GT1 being involved in secondary metabolite biosynthesis. A comparative analysis of polysaccharide biosynthesis among the taxa in this study revealed clear distinctions or similarities in four areas. (1) N-linked protein glycosylation: the Chlorophyta have different mannosylation and glucosylation patterns. (2) GPI anchor biosynthesis: it is apparently missing in the Rhodophyta and truncated in the Chlorophyta. (3) Cell wall biosynthesis: land plants have unique cell wall-related polymers not found in green and red algae. (4) O-linked glycosylation: comprehensive orthology in glycosylation was observed between the Chlorophyta and land plants, but not between the target proteins.

14.
Native ribosomal subunits separated by sucrose gradient centrifugation are able to associate. The particles so formed sediment between 59S and 63S (designated 61S ribosomes).

15.
SK Behura, DW Severson. PLoS ONE, 2012, 7(8): e43111

Background

Codon bias is the non-uniform usage of synonymous codons, whereas codon context generally refers to sequential pairs of codons in a gene. Although the genomes of multiple dipteran and hymenopteran insect species have been sequenced, only a few of these species have been analyzed for codon usage bias.

Methods and Principal Findings

Here, we use bioinformatics approaches to analyze codon usage bias and codon context patterns genome-wide in 15 dipteran and 7 hymenopteran insect species. The results show that GAA is the most frequent codon in the dipteran species, whereas GAG is the most frequent codon in the hymenopteran species. The data reveal that codons ending in C or G are frequently used in the dipteran genomes, whereas codons ending in A or T are frequently used in the hymenopteran genomes. Synonymous codon usage orders (SCUO) vary within genomes in a pattern that seems to be distinct for each species. Based on a comparison of 30 one-to-one orthologous genes among 17 species, the fruit fly Drosophila willistoni shows the least codon usage bias, whereas the honey bee (Apis mellifera) shows the highest bias. Analysis of the codon context patterns of these insects shows that specific codons are frequently used as the 3′-context of start codons and the 5′-context of stop codons.

Conclusions

Codon bias patterns are distinct between dipteran and hymenopteran insects. In dipteran insects, codon bias is favored by the high GC content of their genomes. In hymenopteran insects, the high AT content of genes favors biased usage of synonymous codons. Codon context patterns also vary among these species, largely according to their phylogeny.
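The abstract reports synonymous codon usage order (SCUO) without defining it. This sketch uses the commonly cited entropy-based formulation. For each amino acid i with more than one codon, O_i = (H_max - H_i) / H_max. Here H_i is the Shannon entropy of its synonymous codon frequencies, and H_max = log2 of its number of codons. SCUO is then the average of O_i, weighted by each amino acid's share of the counted codons. The sketch assumes the standard genetic code and an in-frame coding sequence. Whether the authors used exactly this variant is an assumption.

```python
import math
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List

# Standard genetic code; codons enumerated in TCAG order ('*' marks stops).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Group codons by amino acid; drop stops and single-codon amino acids (Met, Trp).
_families: Dict[str, List[str]] = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        _families[_aa].append(_codon)
SYNONYMOUS_FAMILIES = {aa: cs for aa, cs in _families.items() if len(cs) > 1}


def scuo(cds: str) -> float:
    """Synonymous codon usage order of an in-frame coding sequence.

    0 means every amino acid uses its synonymous codons uniformly;
    1 means each amino acid uses a single codon. A trailing partial codon
    and codons with non-ACGT characters are ignored.
    """
    seq = cds.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = 0
    weighted = 0.0
    for codons in SYNONYMOUS_FAMILIES.values():
        family_counts = [counts[c] for c in codons]
        n = sum(family_counts)
        if n == 0:
            continue  # amino acid absent from this sequence
        h = -sum((x / n) * math.log2(x / n) for x in family_counts if x)
        h_max = math.log2(len(codons))
        weighted += n * (h_max - h) / h_max  # n * O_i, later divided by total
        total += n
    return weighted / total if total else 0.0


if __name__ == "__main__":
    print(round(scuo("ATGGAAGAAGAGAAAAAGCTGCTGTAA"), 3))  # toy gene
```

Averaging per-gene SCUO values over a genome, or over a set of orthologs as in the abstract, is one way to compare species.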

16.
Comparison of two hydrolytic murein transglycosylases of Escherichia coli (total citations: 8; self-citations: 0; citations by others: 8)
Escherichia coli has two murein transglycosylases, found in the soluble and the particulate fraction, respectively. The enzymes have been purified and shown to differ in some of their molecular properties [Mett, H., Keck, W., Funk, A. & Schwarz, U. (1980) J. Bacteriol. 144, 45-52]. We improved and simplified the purification procedure for the membrane-derived transglycosylase and characterized the two enzymes in more detail by peptide mapping and immunological procedures. The peptide patterns obtained after tryptic digestion of the purified enzymes differed between the two enzymes. Antisera to each transglycosylase reacted only with its own antigen. This was shown by specific inhibition of enzymatic activity, by double immunodiffusion, and by immunochemical staining of protein blots on nitrocellulose filters. We therefore conclude that the transglycosylases are two distinct proteins and that one is not a precursor of the other.

17.
Integrative genomics predictors perform well in predicting bacterial essential genes, but they are infeasible for most species because the required data sources are limited. We developed a universal approach and tool, designated Geptop, which is based on orthology and phylogeny and provides gene essentiality annotations. In a series of tests, Geptop yielded higher area under the curve (AUC) scores for receiver operating characteristic curves than the integrative approaches. In ten-fold cross-validation with randomly shuffled samples, Geptop yielded an AUC of 0.918. In cross-organism predictions for 19 organisms, it yielded AUC scores between 0.569 and 0.959. We also tested it on the very recently determined essential gene dataset of Porphyromonas gingivalis, which belongs to a different phylum from all 19 of the bacteria above; this gave an AUC of 0.77. Therefore, Geptop can be applied to any bacterial species whose genome has been sequenced. We compared the essential genes predicted only by Geptop with those identified only by lethality screening. The Geptop-only genes are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This may further illustrate the reliability and feasibility of our method to some extent. The web server and standalone version of Geptop are available free of charge at http://cefg.uestc.edu.cn/geptop/. The tool has been run on 968 bacterial genomes, and the results are accessible at the website.
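Geptop's orthology- and phylogeny-based scoring is not reproduced here. This sketch only illustrates the evaluation metric reported above. The AUC equals the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential one (the Mann-Whitney formulation), with ties counting as half. The scores and labels are assumed inputs, from any predictor and any reference set of essential genes.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 = essential, 0 = non-essential. A positive scoring above a
    negative counts as a win; equal scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical essentiality scores for four genes.
    print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic. For genome-scale gene sets, a rank-based computation gives the same value in O(n log n).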

20.
To characterize the dynamics of adaptation, it is important to be able to quantify how a population's mean fitness changes over time. Such measurements are especially important in experimental studies of evolution using microbes. The Long-Term Evolution Experiment (LTEE) with Escherichia coli provides one such system, in which mean fitness has been measured by competing derived and ancestral populations. However, the traditional method used to measure fitness in the LTEE and many similar experiments is subject to a potential limitation. As the relative fitness of the two competitors diverges, measurement error increases, because the less-fit population becomes increasingly small and cannot be enumerated as precisely. Here, we present and employ two alternatives to the traditional method. One reduces the fitness differential between the competitors by using a common reference competitor from an intermediate generation that has intermediate fitness. The other increases the initial population size of the less-fit, ancestral competitor. We performed a total of 480 competitions on samples taken over 50,000 generations from one of the LTEE populations. These competitions compared the statistical properties of estimates from the two alternative methods with those from the traditional method. On balance, neither alternative method yielded measurements that were more precise than the traditional method.
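The abstract does not spell out the traditional calculation. In the LTEE literature, relative fitness is usually expressed as the ratio of the two competitors' realized Malthusian parameters, W = ln(D1/D0) / ln(A1/A0). Here D and A are the densities of the derived and ancestral competitors at the start (0) and end (1) of a competition. The sketch assumes densities have already been corrected for plating dilution. Because A1 enters through a logarithm, a counting error on a small final ancestral population shifts W noticeably. This is the limitation that the alternative methods aim to reduce.

```python
import math


def relative_fitness(d0: float, d1: float, a0: float, a1: float) -> float:
    """Relative fitness as the ratio of realized Malthusian parameters.

    d0, d1: derived competitor density at the start and end of the competition.
    a0, a1: ancestral (reference) competitor density at the same two times.
    Returns W = ln(d1/d0) / ln(a1/a0); W > 1 means the derived competitor is fitter.
    """
    if min(d0, d1, a0, a1) <= 0:
        raise ValueError("densities must be positive")
    ancestor_growth = math.log(a1 / a0)
    if ancestor_growth == 0:
        raise ValueError("ancestor density did not change; W is undefined")
    return math.log(d1 / d0) / ancestor_growth


if __name__ == "__main__":
    # Toy densities: derived grows 1000-fold, ancestor 10-fold.
    print(relative_fitness(1.0, 1000.0, 1.0, 10.0))  # ln(1000)/ln(10), i.e. about 3
    # Overcounting the final ancestral density by 20% lowers the estimate:
    print(relative_fitness(1.0, 1000.0, 1.0, 12.0))  # about 2.78
```

Both alternatives in the abstract act on the denominator. A fitter reference competitor, or a larger initial ancestral population, keeps the less-fit competitor's final count larger and less noisy.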

