Similar literature
20 similar documents found.
1.
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variation that changes the termini of proteins during the evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
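The coding-effect categories named in this abstract (synonymous, non-synonymous, stop codon gained, stop codon lost) can be illustrated with a minimal sketch. This is not SnpEff's implementation or API; the function name `classify_snp` and the category labels are hypothetical, and the sketch assumes the standard nuclear genetic code and a single-base change inside one codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...), matching the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon: str, pos: int, alt: str) -> str:
    """Classify a single-base change at 0-based position `pos` of `codon`.

    Hypothetical helper for illustration only; it ignores start codons,
    splice sites, and every non-coding category.
    """
    ref_codon = codon.upper()
    alt_codon = ref_codon[:pos] + alt.upper() + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"
```

For example, `classify_snp("GAA", 2, "G")` compares GAA with GAG, which both encode glutamate, so the change is reported as synonymous.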

2.
3.
Contrary to the classical view, a large amount of non-coding DNA seems to be selectively constrained in Drosophila and other species. Here, using Drosophila miranda BAC sequences and the Drosophila pseudoobscura genome sequence, we aligned coding and non-coding sequences between D. pseudoobscura and D. miranda, and investigated their patterns of evolution. We found two patterns that have previously been observed in comparisons between Drosophila melanogaster and its relatives. First, there is a negative correlation between intron divergence and intron length, suggesting that longer non-coding sequences may contain more regulatory elements than shorter sequences. Our other main finding is a negative correlation between the rate of non-synonymous substitutions (dN) and codon usage bias (Fop), showing that fast-evolving genes have a lower codon usage bias, consistent with strong positive selection interfering with weak selection for codon usage.

4.
Toll-like receptor 2 (TLR2) plays an important role in the recognition of a variety of pathogenic microbes. In the present study, we compared polymorphisms of the TLR2 locus in two closely related Old World monkey species, rhesus monkey (Macaca mulatta) and Japanese monkey (Macaca fuscata). By nucleotide sequencing of the third exon of the TLR2 gene from 21 to 35 respective individuals, we could assign 17 haplotype combinations of 17 coding SNPs, comprising ten non-synonymous and seven synonymous substitutions. A non-synonymous substitution at codon position 326 appeared to be differentially fixed in each species, asparagine in M. mulatta and tyrosine in M. fuscata, and may contribute to certain functional properties because it is located in the region contributing to ligand binding and to interaction with the dimerization partner in the TLR2-TLR1 heterodimeric complex. Although TLR2 alleles have diverged to a similar extent in both species, they have evolved in significantly different ways; TLR2 of M. fuscata has undergone purifying selection, while the membrane-proximal part of the extracellular domain of M. mulatta TLR2 exhibits higher rates of non-synonymous substitutions, indicating a trace of Darwinian positive selection.

5.
In bacteria, synonymous codon usage can be considerably affected by base composition at neighboring sites. Such context-dependent biases may be caused by either selection against specific nucleotide motifs or context-dependent mutation biases. Here we consider the evolutionary conservation of context-dependent codon bias across 11 completely sequenced bacterial genomes. In particular, we focus on two contextual biases previously identified in Escherichia coli: the avoidance of out-of-frame stop codons and of AGG motifs. By identifying homologues of E. coli genes, we also investigate the effect of gene expression level in Haemophilus influenzae and Mycoplasma genitalium. We find that while context-dependent codon biases are widespread in bacteria, few are conserved across all species considered. Avoidance of out-of-frame stop codons does not apply to all stop codons or amino acids in E. coli, does not hold for different species, does not increase with gene expression level, and is not relaxed in Mycoplasma spp., in which the canonical stop codon, TGA, is recognized as tryptophan. Avoidance of AGG motifs shows some evolutionary conservation and increases with gene expression level in E. coli, suggestive of the action of selection, but the cause of the bias differs between species. These results demonstrate that strong context-dependent forces, both selective and mutational, operate on synonymous codon usage but that these differ considerably between genomes. Received: 6 May 1999 / Accepted: 29 October 1999
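To make "out-of-frame stop codons" concrete, the following minimal sketch counts stop triplets that appear when a coding sequence is read in the +1 and +2 frames. It is an illustration, not the authors' method; the function name `count_out_of_frame_stops` is hypothetical, and the stop set is an assumption (the standard TAA, TAG, TGA; for Mycoplasma, where TGA encodes tryptophan, one would pass a set without TGA).

```python
# Assumed stop set: the standard genetic code. Pass a different set for
# organisms such as Mycoplasma, in which TGA is read as tryptophan.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def count_out_of_frame_stops(cds: str, stops=STANDARD_STOPS) -> dict:
    """Count stop triplets in the +1 and +2 frames of an in-frame CDS.

    Returns a dict mapping the frame shift (1 or 2) to the number of
    stop triplets found when reading from that offset.
    """
    seq = cds.upper()
    counts = {}
    for shift in (1, 2):
        counts[shift] = sum(
            seq[i:i + 3] in stops
            for i in range(shift, len(seq) - 2, 3)
        )
    return counts
```

For the toy sequence `ATGATAGCTTGA`, the +1 frame reads TGA, TAG, CTT and therefore contains two stop triplets, while the +2 frame (GAT, AGC, TTG) contains none.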

6.
Pairwise comparison of whole plastid and draft nuclear genomic sequences of Arabidopsis thaliana and Oryza sativa L. ssp. indica shows that rice nuclear genomic sequences contain homologs of plastid DNA covering about 94 kb (83%) of the plastid genome and including one or more full-length intact (without mutations resulting in premature stop codons) homologues of 26 known protein-coding (KPC) plastid genes. By contrast, only about 20 kb (16%) of chloroplast DNA, including a single intact plastid-derived KPC gene, is present in the nucleus of A. thaliana. Sixteen rice plastid genes have at least one nuclear copy without any mutation or with only synonymous substitutions. Nuclear copies of the other ten plastid genes contain both synonymous and non-synonymous substitutions. Multiple ESTs for 25 out of 26 KPC genes were also found, as well as putative promoters for some of them. The study of substitution patterns shows that some of the nuclear homologues of plastid genes may be functional and/or under positive natural selection. A similar comparative analysis performed on rice chromosome 1 revealed 27 contigs containing plastid-derived sequences, totalling about 84 kb and covering two thirds of chloroplast DNA, with intact nuclear copies of 26 different KPC genes. One of these contigs, AP003280, includes almost 57 kb (45%) of the chloroplast genome with intact copies of 22 KPC genes. At the same time, we observed that the relative locations of homologues in plastid DNA and the nuclear genome are significantly different.

7.
The disease caused by the apicomplexan protozoan parasite Theileria parva, known as East Coast fever or Corridor disease, is one of the most serious cattle diseases in Eastern, Central, and Southern Africa. We performed whole-genome sequencing of nine T. parva strains, including one of the vaccine strains (Kiambu 5), field isolates from Zambia, Uganda, Tanzania, or Rwanda, and two buffalo-derived strains. Comparison with the reference Muguga genome sequence revealed 34 814–121 545 single nucleotide polymorphisms (SNPs) that were more abundant in buffalo-derived strains. High-resolution phylogenetic trees were constructed with selected informative SNPs that allowed the investigation of possible complex recombination events among ancestors of the extant strains. We further analysed the dN/dS ratio (non-synonymous substitutions per non-synonymous site divided by synonymous substitutions per synonymous site) for 4011 coding genes to estimate potential selective pressure. Genes under possible positive selection were identified that may, in turn, assist in the identification of immunogenic proteins or vaccine candidates. This study elucidated the phylogeny of T. parva strains based on genome-wide SNP analysis with prediction of possible past recombination events, providing insight into the migration, diversification, and evolution of this parasite species in the African continent.
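The dN/dS definition given in parentheses above can be written out as a short sketch. The abstract does not say which estimator the authors used, so this is only one common simplification and an assumption on my part: observed proportions of differing sites are corrected with the Jukes-Cantor formula before taking the ratio. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites.

    Only defined for p < 0.75.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of non-synonymous to synonymous substitutions per site.

    Inputs are counts of differences and of sites in each class, however
    they were obtained. Returns NaN when the synonymous distance is zero.
    """
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        return math.nan
    return d_n / d_s
```

With illustrative counts such as `dn_ds(10, 700, 20, 300)` the ratio falls below 1, the pattern usually read as purifying selection; values above 1 are the ones typically flagged as candidates for positive selection.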

8.
9.
Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed "Dark-fly", which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. 1.8% of SNPs were classified as non-synonymous SNPs (nsSNPs: i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched for runs of homozygosity (ROH) regions as putative regions selected during the population history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation.

10.
Single nucleotide polymorphisms (SNPs) are a fundamental source of genomic variation. Large SNP panels have been developed for Prunus species. Fruit quality traits are essential peach breeding program objectives since they determine consumer acceptance, fruit consumption, industry trends and cultivar adoption. For many cultivars, these traits are negatively impacted by cold storage, used to extend fruit market life. The major symptoms of chilling injury are lack of flavor, off flavor, mealiness, flesh browning, and flesh bleeding. A set of 1,109 SNPs was mapped previously and 67 were linked with these complex traits. The prediction of the effects associated with these SNPs on downstream products from the ‘peach v1.0’ genome sequence was carried out. A total of 2,163 effects were detected; 282 effects (non-synonymous, synonymous or stop codon gained) were located in exonic regions (13.04%) and 294 were located in intronic regions (13.59%). An extended list of genes and proteins that could be related to these traits was developed. Two SNP markers that explain a high percentage of the observed phenotypic variance, UCD_SNP_1084 and UCD_SNP_46, are associated with zinc finger (C3HC4-type RING finger) family protein and AOX1A (alternative oxidase 1a) protein groups, respectively. In addition, phenotypic variation suggests that the observed polymorphism for SNP UCD_SNP_1084 [A/G] mutation could be a candidate quantitative trait nucleotide affecting quantitative trait loci for mealiness. The interaction and expression of affected proteins could explain the variation observed in each individual and facilitate understanding of gene regulatory networks for fruit quality traits in peach.

11.
In this study the molecular evolution of duplicated HoxA genes in zebrafish and fugu has been investigated. All 18 duplicated HoxA genes studied have a higher non-synonymous substitution rate than the corresponding genes in either bichir or paddlefish, where these genes are not duplicated. The higher rate of evolution is not due solely to a higher non-synonymous-to-synonymous rate ratio but to an increase in both the non-synonymous and the synonymous substitution rate. The synonymous rate increase can be explained by a change in base composition, codon usage, or mutation rate. We found no changes in nucleotide composition or codon bias. Thus, we suggest that the HoxA genes may experience an increased mutation rate following cluster duplication. In the non-Hox nuclear gene RAG1 only an increase in non-synonymous substitutions could be detected, suggesting that the increased mutation rate is specific to duplicated Hox clusters and might be related to the structural instability of Hox clusters following duplication. The divergence among paralog genes tends to be asymmetric, with one paralog diverging faster than the other. In fugu, all b-paralogs diverge faster than the a-paralogs, while in zebrafish Hoxa-13a diverges faster. This asymmetry corresponds to the asymmetry in the divergence rate of conserved non-coding sequences, i.e., putative cis-regulatory elements. These results suggest that the 5′ HoxA genes in the same cluster belong to a co-evolutionary unit in which genes have a tendency to diverge together. Reviewing Editor: Dr. Axel Meyer

12.
Genes involved in the symbiotic interactions between the nitrogen-fixing endosymbiont Bradyrhizobium japonicum and its leguminous host are mostly clustered in a symbiotic island (SI), acquired by the bacterium through a process of horizontal transfer. A comparative analysis of the codon and amino acid usage in core and SI genes/proteins of B. japonicum has been carried out in the present study. Mutational bias, translational selection, and gene length are found to be the major sources of variation in synonymous codon usage in the core genome as well as in the SI, the strength of translational selection being higher in core genes than in the SI. In core proteins, hydrophobicity is the main source of variation in amino acid usage, expressivity and aromaticity being the second and third most important sources. But in SI proteins, aromaticity is the chief source of variation, followed by expressivity and hydrophobicity. In SI proteins, both the mean molecular weight and mean aromaticity of individual proteins exhibit significant positive correlation with gene expressivity, which violates the cost-minimization hypothesis. Investigation of nucleotide substitution patterns in B. japonicum and Mesorhizobium loti orthologous genes reveals that both synonymous and non-synonymous sites of highly expressed genes are more conserved than their lowly expressed counterparts, and this conservation is more pronounced in the genes present in the core genome than in the SI.

13.
Regularities of context-dependent codon bias in eukaryotic genes   Cited by: 10 (self-citations: 1; citations by others: 9)
Nucleotides surrounding a codon influence the choice of this particular codon from among the group of possible synonymous codons. The strongest influence on codon usage arises from the nucleotide immediately following the codon and is known as the N1 context. We studied the relative abundance of codons with N1 contexts in genes from four eukaryotes for which the entire genomes have been sequenced: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. For all the studied organisms it was found that 90% of the codons have a statistically significant N1 context-dependent codon bias. The relative abundance of each codon with an N1 context was compared with the relative abundance of the same 4mer oligonucleotide in the whole genome. This comparison showed that in about half of all cases the context-dependent codon bias could not be explained by the sequence composition of the genome. Ranking statistics were applied to compare context-dependent codon biases for codons from different synonymous groups. We found regularities in N1 context-dependent codon bias with respect to the codon nucleotide composition. Codons with the same nucleotides in the second and third positions and the same N1 context have a statistically significant correlation of their relative abundances.
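The N1 context described above, a codon together with the nucleotide that immediately follows it, can be tallied with a few lines of code. This is a minimal sketch of the counting step only, under the assumption that the input is a single in-frame coding sequence; the function name `n1_context_counts` is hypothetical, and the statistical testing and genome-wide 4-mer comparison from the abstract are not reproduced.

```python
from collections import Counter


def n1_context_counts(cds: str) -> Counter:
    """Count (codon, following nucleotide) pairs in an in-frame CDS.

    The last complete codon is skipped because no nucleotide follows it
    within the sequence.
    """
    seq = cds.upper()
    counts = Counter()
    for i in range(0, len(seq) - 3, 3):
        codon = seq[i:i + 3]
        n1 = seq[i + 3]
        counts[(codon, n1)] += 1
    return counts
```

For `ATGGCTGCTTAA` this yields one count each for (ATG, G), (GCT, G) and (GCT, T), showing how the same codon is split by its N1 context.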

14.
Plant chloroplast genes have a codon use that reflects the genome compositional bias of a high A+T content with the single exception of the highly translated psbA gene which codes for the photosystem II D1 protein. The codon usage of plant psbA corresponds more closely to the limited tRNA population of the chloroplast and is very similar to the codon use observed in the chloroplast genes of the green alga Chlamydomonas reinhardtii. This pattern of codon use may be an adaptation for increased translation efficiency. A correspondence between codon use of plant psbA and Chlamydomonas chloroplast genes and the tRNAs coded by the chloroplast genome, however, is not observed in all synonymous codon groups. It is shown here that the degree of correspondence between codon use and tRNA population in different synonymous groups is correlated with the second codon position composition. Synonymous groups with an A or T at the second codon position have a high representation of codons for which a complementary tRNA is coded by the chloroplast genome. Those with a G or C at the second position have an increased representation of codons that bind a chloroplast tRNA by wobble. It is proposed that the difference between synonymous groups in terms of codon adaptation to the tRNA population in plant psbA and Chlamydomonas chloroplast genes may be the result of differences in second position composition.

15.
Codon usage patterns and phylogenetic relationships in the actin multigene family have been analyzed for three dipteran species: Drosophila melanogaster, Bactrocera dorsalis, and Ceratitis capitata. In certain phylogenetic tree reconstructions, using synonymous distances, some gene relationships are altered due to a homogenization phenomenon. We present evidence to show that this homogenization phenomenon is due to codon usage bias. A survey of the pattern of synonymous codon preferences for 11 actin genes from these three species reveals that five out of the six Drosophila actin genes show high degrees of codon bias as indicated by scaled χ² values. In contrast to this, four out of the five actin genes from the other species have low codon bias values. A Monte Carlo contingency test indicates that for those Drosophila actin genes which exhibit codon bias, the patterns of codon usage are different compared to actin genes from the other species. In addition, the genes exhibiting codon bias also appear to have reduced rates of synonymous substitution. The homogenization phenomenon seen in terms of synonymous substitutions is not observed for nonsynonymous changes. Because of this homogenization phenomenon, trees constructed based on synonymous substitutions will be affected. These effects can be overt in the case of multigene families, but similar distortions may underlie reconstructions based on single-copy genes which exhibit codon usage bias. Correspondence to: M. He

16.
During the last decade, the Toll-like receptors (TLRs) have been extensively studied, and their immense importance in innate immunity is now being unveiled. Here, we report pronounced differences, probably reflecting the domestication process and differences in selective pressure, between wild boars and domestic pigs regarding single nucleotide polymorphisms (SNPs) in TLR genes. The open reading frames of TLR1, TLR2, and TLR6 were sequenced in 25 wild boars, representing three populations, and in 15 unrelated domestic pigs of Hampshire, Landrace, and Large White origin. In total, 20, 27, and 26 SNPs were detected in TLR1, TLR2, and TLR6, respectively. In TLR1 and TLR2, the numbers of SNPs detected were significantly lower (P ≤ 0.05, P ≤ 0.01) in the wild boars than in the domestic pigs. In the wild boars, one major high-frequency haplotype was found in all three genes, while the same pattern was exhibited only by TLR2 in the domestic pigs. The relative frequency of non-synonymous (dN) and synonymous (dS) SNPs was lower for the wild boars than for the domestic pigs in all three genes. In addition, differences in diversity between the genes were revealed: the mean heterozygosity at the polymorphic positions was markedly lower in TLR2 than in TLR1 and TLR6. Because of its location close to the bound ligand, one of the non-synonymous SNPs detected in TLR6 may represent a species-specific function at the protein level. Furthermore, the codon usage pattern in the genes studied deviated from the general codon usage pattern in Sus scrofa.

17.
The Périgord black truffle (Tuber melanosporum Vittad.), considered a gastronomic delicacy worldwide, is an ectomycorrhizal filamentous fungus that is ecologically important in Mediterranean French, Italian and Spanish woodlands. In this study, we developed a novel resource of single nucleotide polymorphisms (SNPs) for T. melanosporum using Illumina high-throughput resequencing. The genome from six T. melanosporum geographical accessions was sequenced to a depth of approximately 20×. These geographical accessions were selected from different populations within the northern and southern regions of the geographical species distribution. Approximately 80% of the reads for each of the six resequenced geographical accessions mapped against the reference T. melanosporum genome assembly, estimating the core genome size of this organism to be approximately 110 Mbp. A total of 442 326 SNPs corresponding to 3540 SNPs/Mbp were identified as being included in all seven genomes. The SNPs occurred more frequently in repeated sequences (85%), although 4501 SNPs were also identified in the coding regions of 2587 genes. Using the ratio of nonsynonymous mutations per nonsynonymous site (pN) to synonymous mutations per synonymous site (pS) and Tajima's D index scanning the whole genome, we were able to identify genomic regions and genes potentially subjected to positive or purifying selection. The SNPs identified represent a valuable resource for future population genetics and genomics studies.

18.
Sauvage C, Bierne N, Lapègue S, Boudry P. Gene, 2007, 406(1-2): 13-22
DNA sequence polymorphism and codon usage bias were investigated in a set of 41 nuclear loci in the Pacific oyster Crassostrea gigas. Our results revealed a very high level of DNA polymorphism in oysters, in the order of magnitude of the highest levels reported in animals to date. A total of 290 single nucleotide polymorphisms (SNPs) were detected, 76 of which were located in exons and 214 in non-coding regions. Average density of SNPs was estimated to be one SNP every 60 bp in coding regions and one every 40 bp in non-coding regions. Non-synonymous substitutions contributed substantially to the polymorphism observed in coding regions. The non-synonymous to silent diversity ratio was 0.16 on average, which is somewhat higher than the ratio reported in other invertebrate species recognised to display large population sizes. Therefore, purifying selection does not appear to be as strong as might have been expected for a species with a large effective population size. The level of non-synonymous diversity varied greatly from one gene to another, in accordance with varying selective constraints. We examined codon usage bias and its relationship with DNA polymorphism. The table of optimal codons was deduced from the analysis of an EST dataset, using EST counts as a rough assessment of gene expression. As recently observed in some other taxa, we found a strong and significant negative relationship between codon bias and non-synonymous diversity, suggesting correlated selective constraints on synonymous and non-synonymous substitutions. Codon bias as measured by the frequency of optimal codons for expression might therefore provide a useful indicator of the level of constraint upon proteins in the oyster genome.
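The "frequency of optimal codons" measure mentioned at the end of this abstract can be sketched as the share of a gene's informative codons that belong to a given optimal set. This is a minimal illustration rather than the authors' pipeline: the function name is hypothetical, the optimal set must be supplied by the caller (the study derived its own from EST data, which is not reproduced here), and excluding Met, Trp and stop codons is an assumption that follows common practice.

```python
# Codons with no synonymous alternative (ATG, TGG) and stop codons carry no
# information about codon choice; excluding them is an assumed convention.
UNINFORMATIVE = frozenset({"ATG", "TGG", "TAA", "TAG", "TGA"})


def frequency_of_optimal_codons(cds: str, optimal_codons,
                                excluded=UNINFORMATIVE) -> float:
    """Fraction of informative codons in `cds` that are in `optimal_codons`.

    Returns NaN if the sequence has no informative codons.
    """
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    informative = [c for c in codons if c not in excluded]
    if not informative:
        return float("nan")
    hits = sum(c in optimal_codons for c in informative)
    return hits / len(informative)
```

With a made-up optimal set `{"GCC", "AAG"}`, the sequence `ATGGCTGCCAAGAAATAA` has four informative codons (GCT, GCC, AAG, AAA), two of which are optimal, giving a value of 0.5.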

19.

Background  

In many bacteria, intragenomic diversity in synonymous codon usage among genes has been reported. However, no quantitative attempt has been made to compare the diversity levels among different genomes. Here, we introduce a mean dissimilarity-based index (Dmean) for quantifying the level of diversity in synonymous codon usage among all genes within a genome.

20.