Similar articles
Found 20 similar articles; search took 15 ms
1.
Amplification of monomer sequences into long contiguous arrays is the main feature distinguishing satellite DNA from other tandem repeats, yet it is also the main obstacle in its investigation because these arrays are in principle difficult to assemble. Here we explore an alternative, assembly‐free approach that utilizes ultra‐long Oxford Nanopore reads to infer the length distribution of satellite repeat arrays, their association with other repeats and the prevailing sequence periodicities. Using the satellite DNA‐rich legume plant Lathyrus sativus as a model, we demonstrated this approach by analyzing 11 major satellite repeats using a set of nanopore reads ranging from 30 to over 200 kb in length and representing 0.73× genome coverage. We found surprising differences between the analyzed repeats because only two of them were predominantly organized in long arrays typical for satellite DNA. The remaining nine satellites were found to be derived from short tandem arrays located within LTR‐retrotransposons that occasionally expanded in length. While the corresponding LTR‐retrotransposons were dispersed across the genome, this array expansion occurred mainly in the primary constrictions of the L. sativus chromosomes, which suggests that these genome regions are favourable for satellite DNA accumulation.

2.
The amplifiable AUD1 element of Streptomyces lividans 66 consists of two copies of a 4.7 kb sequence flanked by three copies of a 1 kb sequence. The DNA sequences of the three 1 kb repeats were determined. Two copies (the left and middle repeats) were identical (1009 bp in length), and the right repeat was 1012 bp long and differed at 63 positions. The repeats code for open reading frames (ORFs) with typical Streptomyces codon usage, which would encode proteins of about 36 kD molecular weight. The sequences of these ORFs suggest that they specify DNA-binding proteins, and potential palindromic binding sites are found adjacent to the genes. The putative amplification protein encoded by the right repeat was expressed in Escherichia coli.

3.
Nucleotide‐binding (NB‐ARC), leucine‐rich‐repeat genes (NLRs) account for 60.8% of resistance (R) genes molecularly characterized from plants. NLRs exist as large gene families prone to tandem duplication and transposition, with high sequence diversity among crops and their wild relatives. This diversity can be a source of new disease resistance, but difficulty in distinguishing specific sequences from homologous gene family members hinders characterization of resistance for improving crop varieties. Current genome sequencing and assembly technologies, especially those using long‐read sequencing, are improving resolution of repeat‐rich genomic regions and clarifying locations of duplicated genes, such as NLRs. Using the conserved NB‐ARC domain as a model, 231 tentative NB‐ARC loci were identified in a highly contiguous genome assembly of sugar beet, revealing diverged and truncated NB‐ARC signatures as well as full‐length sequences. The NB‐ARC‐associated proteins contained NLR resistance gene domains, including TIR, CC and LRR, as well as other integrated domains. Phylogenetic relationships of partial and complete domains were determined, and patterns of physical clustering in the genome were evaluated. Comparison of sugar beet NB‐ARC domains to validated R‐genes from monocots and eudicots suggested extensive Beta vulgaris‐specific subfamily expansions. The NLR landscape in the rhizomania resistance‐conferring Rz region of Chromosome 3 was characterized, identifying 26 NLR‐like sequences spanning 20 Mb. This work presents the first detailed view of NLR family composition in a member of the Caryophyllales, builds a foundation for additional disease resistance work in B. vulgaris, and demonstrates an additional nucleic‐acid‐based method for NLR prediction in non‐model plant species.

4.
Human gene catalogs are fundamental to the study of human biology and medicine, but they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs but also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average, 10% of these genes are located in genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
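
The idea of finding long ORFs and then testing how a polymorphism changes them can be illustrated with a small sketch. This is not the authors' pipeline: the function names `longest_orf` and `apply_snp`, the forward-strand-only scan, and the `min_codons` threshold are all assumptions made for illustration, and the toy sequence is invented.

```python
# Illustrative sketch only: scan the three forward reading frames of an
# mRNA/cDNA sequence for the longest ATG..stop ORF, then re-scan after
# applying a single-nucleotide substitution.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def longest_orf(seq, min_codons=100):
    """Return (start, end, n_codons) of the longest ORF, or None.

    start/end are 0-based, end-exclusive and include the stop codon;
    n_codons counts the codons from ATG up to, but excluding, the stop.
    min_codons is a hypothetical length cutoff, not a value from the abstract.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons and (best is None or n_codons > best[2]):
                    best = (start, i + 3, n_codons)
                start = None  # close the ORF and look for the next ATG
    return best


def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


reference = "ATGAAATAGCCCTTTTGA"          # toy sequence: ATG AAA TAG ...
variant = apply_snp(reference, 6, "C")     # TAG -> CAG removes the early stop
print(longest_orf(reference, min_codons=1))
print(longest_orf(variant, min_codons=1))
```

In the toy case a single substitution removes a premature stop codon and lengthens the ORF, which is the kind of effect the abstract describes for polymorphisms that create ORFs absent from the reference.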

5.
The red‐spotted grouper Epinephelus akaara (E. akaara) is one of the most economically important marine fish in China, Japan and South‐East Asia and is a threatened species. The species is also considered a good model for studies of sex inversion, development, genetic diversity and immunity. Despite its importance, molecular resources for E. akaara remain limited and no reference genome has been published to date. In this study, we constructed a chromosome‐level reference genome of E. akaara by taking advantage of long‐read single‐molecule sequencing and de novo assembly by Oxford Nanopore Technology (ONT) and Hi‐C. A red‐spotted grouper genome of 1.135 Gb was assembled from a total of 106.29 Gb polished Nanopore sequence (GridION, ONT), equivalent to 96‐fold genome coverage. The assembled genome represents 96.8% completeness (BUSCO) with a contig N50 length of 5.25 Mb and a longest contig of 25.75 Mb. The contigs were clustered and ordered onto 24 pseudochromosomes covering approximately 95.55% of the genome assembly with Hi‐C data, with a scaffold N50 length of 46.03 Mb. The genome contained 43.02% repeat sequences and 5,480 noncoding RNAs. Furthermore, combined with several RNA‐seq data sets, 23,808 (99.5%) genes were functionally annotated from a total of 23,923 predicted protein‐coding sequences. The high‐quality chromosome‐level reference genome of E. akaara was assembled for the first time and will be a valuable resource for molecular breeding and functional genomics studies of red‐spotted grouper in the future.
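
For readers unfamiliar with the contig and scaffold N50 statistics quoted above, the following sketch shows the usual definition. The function name `n50` and the toy lengths are assumptions for illustration; they are not data from the assembly.

```python
# Illustrative sketch: N50 is the length L of the contig (or scaffold) at
# which, going from longest to shortest, the running total first reaches
# at least half of the summed assembly length.
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    raise ValueError("n50() needs at least one non-zero length")


toy_contigs = [10, 8, 6, 4, 2]  # hypothetical lengths, arbitrary units
print(n50(toy_contigs))
```

With the toy lengths the total is 30, and the two longest contigs (10 + 8) already cover half of it, so the statistic equals the length of the second contig.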

6.
Although plant genome sizes are extremely diverse, the mechanism underlying the expansion of huge genomes that did not experience whole‐genome duplication has not been elucidated. The pepper, Capsicum annuum, is an excellent model for studies of genome expansion due to its large genome size (2700 Mb) and the absence of whole‐genome duplication. As most of the pepper genome structure has been identified as constitutive heterochromatin, we investigated the evolution of this region in detail. Our findings show that the constitutive heterochromatin in pepper was actively expanded 20.0–7.5 million years ago through a massive accumulation of single‐type Ty3/Gypsy‐like elements that belong to the Del subgroup. Interestingly, derivatives of the Del elements, such as non‐autonomous long terminal repeat retrotransposons and long‐unit tandem repeats, played important roles in the expansion of constitutive heterochromatic regions. This expansion occurred not only in the existing heterochromatic regions but also into the euchromatic regions. Furthermore, our results revealed a repeat of unit length 18–24 kb. This repeat was found not only in the pepper genome but also in other solanaceous species, such as potato and tomato. These results represent a characteristic mechanism for large genome evolution in plants.

7.
Complete sequence and genomic analysis of murine gammaherpesvirus 68.
Murine gammaherpesvirus 68 (gammaHV68) infects mice, thus providing a tractable small-animal model for analysis of the acute and chronic pathogenesis of gammaherpesviruses. To facilitate molecular analysis of gammaHV68 pathogenesis, we have sequenced the gammaHV68 genome. The genome contains 118,237 bp of unique sequence flanked by multiple copies of a 1,213-bp terminal repeat. The GC content of the unique portion of the genome is 46%, while the GC content of the terminal repeat is 78%. The unique portion of the genome is estimated to encode at least 80 genes and is largely colinear with the genomes of Kaposi's sarcoma herpesvirus (KSHV; also known as human herpesvirus 8), herpesvirus saimiri (HVS), and Epstein-Barr virus (EBV). We detected 63 open reading frames (ORFs) homologous to HVS and KSHV ORFs and used the HVS/KSHV numbering system to designate these ORFs. gammaHV68 shares with HVS and KSHV ORFs homologous to a complement regulatory protein (ORF 4), a D-type cyclin (ORF 72), and a G-protein-coupled receptor with close homology to the interleukin-8 receptor (ORF 74). One ORF (K3) was identified in gammaHV68 as homologous to both ORFs K3 and K5 of KSHV and contains a domain found in a bovine herpesvirus 4 major immediate-early protein. We also detected 16 methionine-initiated ORFs predicted to encode proteins at least 100 amino acids in length that are unique to gammaHV68 (ORFs M1 to 14). ORF M1 has striking homology to poxvirus serpins, while ORF M11 encodes a potential homolog of Bcl-2-like molecules encoded by other gammaherpesviruses (gene 16 of HVS and KSHV and the BHRF1 gene of EBV). In addition, clustered at the left end of the unique region are eight sequences with significant homology to bacterial tRNAs. The unique region of the genome contains two internal repeats: a 40-bp repeat located between bp 26778 and 28191 in the genome and a 100-bp repeat located between bp 98981 and 101170. 
Analysis of the gammaHV68, HVS, EBV, and KSHV genomes demonstrated that each of these viruses has large colinear gene blocks interspersed with regions containing virus-specific ORFs. Interestingly, genes associated with EBV cell tropism, latency, and transformation are all contained within these regions encoding virus-specific genes. This finding suggests that pathogenesis-associated genes of gammaherpesviruses, including gammaHV68, may be contained in similarly positioned genome regions. The availability of the gammaHV68 genomic sequence will facilitate analysis of critical issues in gammaherpesvirus biology via integration of molecular and pathogenetic studies in a small-animal model.

8.
Bacteriophage B3 is a transposable phage of Pseudomonas aeruginosa. In this report, we present the complete DNA sequence and annotation of the B3 genome. DNA sequence analysis revealed that the B3 genome is 38,439 bp long with a G+C content of 63.3%. The genome contains 59 proposed open reading frames (ORFs) organized into at least three operons. Of these ORFs, the predicted proteins from 41 ORFs (68%) display significant similarity to other phage or bacterial proteins. Many of the predicted B3 proteins are homologous to those encoded by the early genes and head genes of Mu and Mu-like prophages found in sequenced bacterial genomes. Only two of the predicted B3 tail proteins are homologous to other well-characterized phage tail proteins; however, several Mu-like prophages and transposable phage D3112 encode approximately 10 highly similar proteins in their predicted tail gene regions. Comparison of the B3 genomic organization with that of Mu revealed evidence of multiple genetic rearrangements, the most notable being the inversion of the proposed B3 immunity/early gene region, the loss of Mu-like tail genes, and an extreme leftward shift of the B3 DNA modification gene cluster. These differences illustrate and support the widely held view that tailed phages are genetic mosaics arising by the exchange of functional modules within a diverse genetic pool.

9.
Identifying all essential genomic components is critical for the assembly of minimal artificial life. In the genome-reduced bacterium Mycoplasma pneumoniae, we found that small ORFs (smORFs; < 100 residues), accounting for 10% of all ORFs, are the most frequently essential genomic components (53%), followed by conventional ORFs (49%). Essentiality of smORFs may be explained by their function as members of protein and/or DNA/RNA complexes. In larger proteins, essentiality applied to individual domains and not entire proteins, a notion we could confirm by expression of truncated domains. The fraction of essential non-coding RNAs (ncRNAs) non-overlapping with essential genes is 5% higher than that of non-transcribed regions (0.9%), pointing to the important functions of the former. We found that the minimal essential genome comprises 33% (269,410 bp) of the M. pneumoniae genome. Our data highlight an unexpected hidden layer of smORFs with essential functions, as well as non-coding regions, thus changing the focus when aiming to define the minimal essential genome.

10.
Tandem repeats are common in eukaryotic genomes, but due to difficulties in assaying them remain poorly studied. Here, we demonstrate the utility of Nanostring technology as a targeted approach to perform accurate measurement of tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10 fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs, and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variations of tandem repeat length are associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that have been systematically ignored by most previous studies, copy number variation of which can exert functionally significant effects. 
We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.

11.

Background  

Candida glabrata is a pathogenic yeast of increasing medical concern. It has been regarded as asexual since it was first described in 1917, yet phylogenetic analyses have revealed that it is more closely related to sexual yeasts than other Candida species. We show here that the C. glabrata genome contains many genes apparently involved in sexual reproduction.

12.
Spindle‐shaped halovirus His2 and spherical halovirus SH1 represent ecologically dominant virus morphotypes in high‐salt environments. Both have linear dsDNA genomes with inverted terminal repeat sequences and terminal proteins, and probably replicate using protein priming. As a first step towards conventional genetic analyses on these viruses, we show that purified viral DNAs can transfect host cells. Intact terminal proteins were essential for this process. Despite the narrow host ranges of these viruses, at least under laboratory conditions, their DNAs were able to transfect a wide range of haloarchaeal species, demonstrating that the cytoplasms of diverse haloarchaea possess all the factors necessary for viral DNA synthesis and virion assembly. Transposon mutagenesis of viral DNAs was then used in conjunction with transfection to produce recombinant viruses, and to then map the insertion sites to identify non‐essential genes. The inserts in 34 His2 mutants were mapped precisely, and most clustered in a few, specific regions, particularly in the inverted terminal repeats and near the ends of ORFs. The results are consistent with the small genome size and densely packed, often overlapping ORFs that are transcribed as long operons. This study is the first demonstration of transfection and transposon mutagenesis in protein‐primed archaeal viruses.

13.
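The complete mitochondrial genome sequence of Urochela quadrinotata Reuter was determined and analyzed. The mitochondrial genome is 16,585 bp long (GenBank accession number JQ743678), has an A+T content of 75.4%, and encodes 35 genes: 13 protein-coding genes, 20 tRNA genes (two tRNA genes, tRNAIle and tRNAGln, were not detected) and 2 rRNA genes, together with one long non-coding region (the control region, also called the A+T-rich region). The gene order is the same as in most insect mitochondrial genomes, and no gene rearrangement has occurred. Except for tRNASer(AGN), whose DHU arm cannot form a typical stem-loop structure, all tRNA genes fold stably into the typical cloverleaf secondary structure. The secondary structures of the U. quadrinotata 16S and 12S rRNAs were predicted; they comprise 6 domains with 43 stem-loops and 3 domains with 27 stem-loops, respectively. The control region contains a 1,652-bp tandem repeat region made up of 16 tandem repeat units.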

14.
Complete sequence determination of the brachiopod Lingula anatina mtDNA (28,818 bp) revealed an organization that is remarkably atypical for an animal mt-genome. In addition to the usual set of 37 animal mitochondrial genes, which make up only 57% (16,555 bp) of the entire sequence, the genome contains lengthy unassigned sequences. All the genes are encoded in the same DNA strand, generally in a compact way, whereas the overall gene order is highly divergent in comparison with known animal mtDNA. Individual genes are generally longer and deviate considerably in sequence from their homologues in other animals. The genome contains two major repeat regions, in which 11 units of unassigned sequences and six genes (atp8, trnM, trnQ, trnV, and part of cox2 and nad2) are found in repetition, in the form of nested direct repeats of unparalleled complexity. One of the repeat regions contains unassigned repeat units dispersed among several unique sequences, a novel repetitive structure for animal mtDNAs. Each of those unique sequences contains an open reading frame for a polypeptide between 80 and 357 amino acids long, potentially encoding a functional molecule, but none of them has been identified with known proteins. In both repeat regions, tRNA genes or tRNA gene-like sequences flank major repeated units, supporting the view that those structures play a role in the mitochondrial gene rearrangements. Although the intricate repeated organization of this genome can be explained by recurrent tandem duplications and subsequent deletions mediated by replication errors, other mechanisms, such as nonhomologous recombinations, appear to explain certain structures more easily.

15.
16.
Caper spurge, Euphorbia lathyris L., is an important energy crop and medicinal crop. Here, we generated a high-quality, chromosome-level genome assembly of caper spurge using Oxford Nanopore sequencing, Illumina sequencing, and Hi-C technology. The final genome assembly was ∼988.9 Mb in size, 99.8% of which could be grouped into 10 pseudochromosomes, with contig and scaffold N50 values of 32.6 and 95.7 Mb, respectively. A total of 651.4 Mb repetitive sequences and 36,342 protein-coding genes were predicted in the genome assembly. Comparative genomic analysis showed that caper spurge and castor bean clustered together. We found that no independent whole-genome duplication event had occurred in caper spurge after its split from the castor bean, and recent substantial amplification of long terminal repeat retrotransposons has contributed significantly to its genome expansion. Furthermore, based on gene homology searching, we identified a number of candidate genes involved in the biosynthesis of fatty acids and triacylglycerols. The reference genome presented here will be highly useful for the further study of the genetics, genomics, and breeding of this high-value crop, as well as for evolutionary studies of the spurge family and angiosperms.

17.
Jin X, Wang R, Xu T, Shi G. Mitochondrial DNA, 2012, 23(2): 142-144
The complete mitochondrial genome (mitogenome) of Oxuderces dentatus was determined for the first time. The genome was 17,116 bp in length and consisted of 13 protein-coding genes, 22 tRNA genes, 2 ribosomal RNA genes, and 2 main non-coding regions [the control region (CR) and the origin of light-strand replication]; its gene composition and order were similar to those of most other vertebrates. The overall base composition of the heavy strand was T 27.9%, C 26.8%, A 30.2%, and G 15.1%, with a slight A+T bias of 58.1%. In addition to the discrete and conserved sequence blocks, an unusually long tandem repeat (three 150-bp tandem repeat units and an incomplete copy of 146 bp) was also detected within the CR. These mitogenome sequence data should play an important role in population genetics and phylogenetic analysis of the Gobioidei.
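
The base composition and A+T figures reported above are simple per-base percentages, and the stated 58.1% is the sum of the T (27.9%) and A (30.2%) values. A minimal sketch of the calculation follows; the function names `base_composition` and `at_content` and the toy sequence are assumptions for illustration, not the authors' code or data.

```python
# Illustrative sketch: percentage of each base on one strand, and the
# resulting A+T content. Ambiguous symbols such as N are ignored.
from collections import Counter


def base_composition(seq):
    """Return a dict mapping each of A, C, G, T to its percentage in seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the A+T percentage of seq."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]


toy_sequence = "AATTGCNN"  # hypothetical sequence, not from the mitogenome
print(base_composition(toy_sequence))
print(round(at_content(toy_sequence), 1))
```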

18.
Haloxylon ammodendron is a xerophytic perennial shrub or small tree that has a high ecological value in anti-desertification due to its high tolerance to drought and salt stress. Here, we report a high-quality, chromosome-level genome assembly of H. ammodendron by integrating PacBio’s high-fidelity sequencing and Hi-C technology. The assembled genome size was 685.4 Mb, of which 99.6% was assigned to nine pseudochromosomes with a contig N50 value of 23.6 Mb. Evolutionary analysis showed that both the recent substantial amplification of long terminal repeat retrotransposons and tandem gene duplication may have contributed to its genome size expansion and arid adaptation. An ample amount of low-GC genes was closely related to functions that may contribute to the desert adaptation of H. ammodendron. Gene family clustering together with gene expression analysis identified differentially expressed genes that may play important roles in the direct response of H. ammodendron to water-deficit stress. We also identified several genes possibly related to the degraded scaly leaves and well-developed root system of H. ammodendron. The reference-level genome assembly presented here will provide a valuable genomic resource for studying the genome evolution of xerophytic plants, as well as for further genetic breeding studies of H. ammodendron.

19.
20.

Background

Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

Results

Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.

Conclusion

Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
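
The arithmetic behind the figures in this abstract can be checked with a short sketch. A length difference introduces a frameshift when it is not a multiple of three nucleotides; the function name `classify_length_difference` is an assumption for illustration, and the percentages are the ones quoted in the Results above.

```python
# Illustrative sketch: classify a coding-region length difference and
# reproduce the percentages quoted in the abstract.
def classify_length_difference(diff_nt):
    """Return 'in-frame' if diff_nt is a multiple of 3, else 'frameshift'."""
    if diff_nt <= 0:
        raise ValueError("length difference must be a positive number of nucleotides")
    return "in-frame" if diff_nt % 3 == 0 else "frameshift"


# Extremes of the observed range of length differences (2 to 144 nucleotides).
print(classify_length_difference(2))
print(classify_length_difference(144))

# 218 polymorphic UniGene clusters out of 13,749 proteins investigated.
detected_percent = 100 * 218 / 13_749
print(round(detected_percent, 2))

# Estimated 6% of proteins carry a polymorphism, of which 13.9% are not a
# multiple of three; the product stays below the "up to 1%" bound.
frameshift_fraction = 0.06 * 0.139
print(frameshift_fraction < 0.01)
```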

