Similar literature
 20 similar articles found (search time: 31 ms)
1.
MOTIVATION: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain. RESULTS: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families. AVAILABILITY: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.

2.
3.
DoriC: a database of oriC regions in bacterial genomes   Total citations: 1 (self-citations: 0, citations by others: 1)
Replication origins (oriCs) of bacterial genomes currently available in GenBank have been predicted by using a systematic method comprising the Z-curve analysis for nucleotide distribution asymmetry, DnaA box distribution, genes adjacent to candidate oriCs and phylogenetic relationships. These oriCs are organized into a MySQL database, DoriC, which provides extensive information and graphical views of the oriC regions. In addition, users can BLAST a query sequence or even a whole genome against DoriC to find a homologous one. DoriC will be updated regularly; the latest version is DoriC 1.8, in which oriCs of 425 genomes (468 chromosomes) are identified. AVAILABILITY: DoriC can be accessed from http://tubic.tju.edu.cn/doric/. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://tubic.tju.edu.cn/doric/supplementary.htm.
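The Z-curve analysis named in this abstract can be illustrated with a short Python sketch. It computes the three standard cumulative Z-curve components for every prefix of a sequence. The function name `z_curve` and the handling of ambiguous bases are assumptions made for illustration; the abstract does not describe how DoriC implements this step or how it combines the curve with DnaA box positions.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z-curve point (x_n, y_n, z_n) for each prefix of seq."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base in counts:  # assumption: ambiguous bases (e.g. N) leave the counts unchanged
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((
            (a + g) - (c + t),  # x: purines minus pyrimidines
            (a + c) - (g + t),  # y: amino bases minus keto bases
            (a + t) - (c + g),  # z: weak minus strong hydrogen-bonding bases
        ))
    return points


if __name__ == "__main__":
    for point in z_curve("AAGCTT"):
        print(point)
```

In origin-prediction work, extrema of such components along the chromosome are commonly inspected as candidate regions; any thresholding or scoring on top of the curve would be a further assumption.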

4.
We have assessed the degree of relatedness of several portions of the Escherichia coli genome to the corresponding portions of the genomes of representative enteric bacteria, using the Southern transfer and hybridization technique (E. Southern, J. Mol. Biol. 98:503-517, 1975). The degree of relatedness varied among the regions examined. Judging both by the relative amounts of deoxyribonucleic acid in the various enteric genomes that are highly homologous and by the conservation of positions of restriction enzyme cleavage sites in these regions, the enteric genomes have diverged to greater extents in some parts of the genomes than in others. Portions of the genomes (including the tnaA and thyA genes, the trp operon, and one other unassigned segment) appear to have evolved in concert with the genome as a whole. By contrast, the lacZ gene and portions of the genome that are homologous to phage lambda vary more widely, perhaps reflecting a separate evolutionary origin for these segments of deoxyribonucleic acid.

5.
Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome, which also hosts other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information. The MitoAln database, which was linked to MitoNuc in the previous release and reported the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among MitoNuc entries for homologous proteins, a new field has been defined in the database: the cluster identifier, an alphanumeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logical scheme of the MitoNuc database has been implemented in the ORACLE DBMS. This will allow end-users to retrieve data through a friendly interface that will soon be implemented.

6.
The relatively small package capacity (less than 5 kb) of adeno-associated virus (AAV) vectors has been effectively doubled with the development of dual-vector heterodimerization approaches. However, the efficiency of such dual-vector systems is limited not only by the extent to which intermolecular recombination occurs between two independent vector genomes, but also by the directional bias required for successful transgene reconstitution following concatemerization. In the present study, we sought to evaluate the mechanisms by which inverted terminal repeat (ITR) sequences mediate intermolecular recombination of AAV genomes, with the goal of engineering more efficient vectors for dual-vector trans-splicing approaches. To this end, we generated a novel AAV hybrid-ITR vector characterized by an AAV-2 and an AAV-5 ITR at opposite ends of the viral genome. This hybrid genome was efficiently packaged into either AAV-2 or AAV-5 capsids to generate infectious virions. Hybrid AV2:5 ITR viruses had a significantly lower capacity to form circular intermediates in infected cells than homologous AV2:2 and AV5:5 ITR vectors despite their similar capacity to express an encoded enhanced green fluorescent protein (EGFP) transgene. To examine whether the divergent ITR sequences contained within hybrid AV2:5 ITR vectors could direct intermolecular recombination in a tail-to-head fashion, we generated two hybrid ITR trans-splicing vectors (AV5:2LacZdonor and AV2:5LacZacceptor). Each delivered one exon of a beta-galactosidase minigene flanked by donor or acceptor splice sequences. These hybrid trans-splicing vectors were compared to homologous AV5:5 and AV2:2 trans-splicing vector sets for their ability to reconstitute beta-galactosidase gene expression. Results from this comparison demonstrated that hybrid ITR dual-vector sets had a significantly enhanced trans-splicing efficiency (6- to 10-fold, depending on the capsid serotype) compared to homologous ITR vectors. 
Molecular studies of viral genome structures suggest that hybrid ITR vectors provide more efficient directional recombination due to an increased abundance of linear-form genomes. These studies provide direct evidence for the importance of ITR sequences in directing intermolecular and intramolecular homologous recombination of AAV genomes. The use of hybrid ITR AAV vector genomes provides new strategies to manipulate viral genome conversion products and to direct intermolecular recombination events required for efficient dual-AAV vector reconstitution of the transgene.

7.
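Sun Gaofei, He Shoupu, Pan Zhao'e, Du Xiongming. Hereditas (Beijing) (《遗传》), 2015, 37(2): 192-203
SSRs (simple sequence repeats) are short tandem repeats of DNA that occur widely in plant and animal genomes and serve as important genomic molecular markers. Comparing homologous SSRs across genomes helps clarify the evolutionary history of closely related species. Using the whole-genome sequences of Gossypium raimondii (D5) and Gossypium arboreum (A2), together with restriction-digest genome sequencing data for upland cotton, Gossypium hirsutum (AD1), we scanned each genome for SSRs, compared the SSR distributions of the A and D genomes, identified homologous SSRs among the three genomes, and compared the differences in their repeat sequences. The distributions of homologous SSRs in the A and D genomes were very similar, but homologous SSRs were more strongly conserved between the A and AD genomes than between the D and AD genomes. Relative to their homologs in the AD genome, SSRs with longer repeat sequences in the A genome were about 5 times as numerous as SSRs with shorter repeat sequences; in the D genome this ratio was about 3. We infer that, during the parallel evolution of the tetraploid AD genome with the A and D genomes, genome merger caused the rate of change in SSR repeat length to differ from that in the diploid A and D genomes, and that this difference may have given SSR repeat sequences in the AD genome a tendency to shorten during evolution compared with the diploids. This is the first systematic comparison of homologous SSRs across three cotton genomes; it reveals significant differences in homologous SSRs between the tetraploid and diploid genomes of Gossypium and provides a basis for further work on the evolution of Gossypium genomes.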
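A whole-genome SSR scan of the kind described here can be sketched in a few lines of Python. The abstract does not state which tool or thresholds the authors used, so the regular expression, the function name `find_ssrs`, and the defaults (motifs of 1 to 6 bases, at least 5 tandem copies) are illustrative assumptions only.

```python
import re
from typing import Iterator, Tuple


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6,
              min_repeats: int = 5) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, motif, copies) for each perfect tandem repeat found in seq.

    The lazy quantifier makes the regex try the shortest motif first, so a run
    such as AAAAAA is reported with motif "A" rather than "AA" or "AAA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        yield match.start(), motif, copies


if __name__ == "__main__":
    demo = "GGC" + "AT" * 5 + "CC"
    for start, motif, copies in find_ssrs(demo):
        print(start, motif, copies)
```

Once homologous SSRs have been paired between two genomes (for example by matching their flanking sequences, which this sketch does not attempt), the `copies` values of each pair could be compared to count lengthened versus shortened repeats.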

8.
9.

Background

Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes; in fact, all-against-all gene comparisons are required to completely solve the problem. In the presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge in this field is to design fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

Results

We present PanDelos, a standalone tool for the discovery of pan-genome contents among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm.

Conclusions

PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality of content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and on synthetic benchmarks that represent a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.
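The alignment-free measure based on k-mer multiplicity mentioned in the Results can be illustrated with a generalized Jaccard index over k-mer counts. This is only a sketch: the abstract does not give the formula, so the choice of the generalized Jaccard index, the function names, and the caller-supplied k are assumptions rather than PanDelos's actual implementation.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq, keeping multiplicities."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def generalized_jaccard(seq_a: str, seq_b: str, k: int) -> float:
    """Similarity in [0, 1]: sum of per-k-mer minima over sum of per-k-mer maxima."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    kmers = counts_a.keys() | counts_b.keys()
    shared = sum(min(counts_a[x], counts_b[x]) for x in kmers)
    total = sum(max(counts_a[x], counts_b[x]) for x in kmers)
    return shared / total if total else 0.0


if __name__ == "__main__":
    print(generalized_jaccard("ACGTACGT", "ACGTACGA", 3))
```

Because the score needs only two k-mer tables, candidate pairs can be compared without computing an alignment; how the scores are thresholded and turned into a network is left to the tool itself.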

10.
We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.

11.
Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organellar genomes. Mitochondrial genomes have been extensively sequenced and analysed and the data collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc and MitoAln, two related databases containing, respectively, detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa and yeast, and the multiple alignments of the relevant homologous protein coding regions. Retrieving MitoNuc and MitoAln through SRS at http://bio-www.ba.cnr.it:8000/srs6/ makes it easy to extract sequence data, subsequences defined by specific features and nucleotide or amino acid multiple alignments.

12.
We study the detection of mutations, sequencing errors, and homologous recombination events (HREs) in a set of closely related microbial genomes. We base the model on single nucleotide polymorphisms (SNPs) and break the genomes into blocks to handle the rearrangement problem. Then we apply a dynamic programming algorithm to model whether changes within each block are likely a result of mutations, sequencing errors, or HREs. Results from simulation experiments show that we can detect 31%–61% of HREs and the precision of our detection is about 48%–90% depending on the rates of mutation and missing data. The HREfinder software for predicting HREs in a set of whole genomes is available as open source (http://sourceforge.net/projects/hrefinder/).
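The idea of labelling genome blocks by dynamic programming can be illustrated with a small Viterbi-style sketch. It is not the HREfinder algorithm, which the abstract does not specify: the two-state simplification (sequencing errors are left out), the Poisson emission model, the rates, and the switch probability are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, rate: float) -> float:
    """Log-probability of observing k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def label_blocks(snp_counts: Sequence[int],
                 mutation_rate: float = 0.5,        # hypothetical SNPs per block from mutation
                 recombination_rate: float = 5.0,   # hypothetical SNPs per block after an HRE
                 switch_prob: float = 0.05) -> List[str]:
    """Return the most probable label for each block given its SNP count."""
    if not snp_counts:
        return []
    states = ("mutation", "recombination")
    rates = (mutation_rate, recombination_rate)
    log_stay, log_switch = math.log(1 - switch_prob), math.log(switch_prob)
    # score[s]: best log-probability of any labelling that ends in state s
    score = [math.log(0.5) + poisson_logpmf(snp_counts[0], r) for r in rates]
    back = []  # back[i][s]: best predecessor of state s at block i + 1
    for count in snp_counts[1:]:
        new_score, pointers = [], []
        for s, rate in enumerate(rates):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in range(2)]
            best = max(range(2), key=cands.__getitem__)
            new_score.append(cands[best] + poisson_logpmf(count, rate))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(2), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


if __name__ == "__main__":
    print(label_blocks([0, 0, 1, 8, 9, 7, 0, 1, 0]))
```

The switch penalty is what separates this from thresholding each block independently: an isolated SNP-dense block must outweigh two state changes before it is labelled as recombined.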

13.
Comparison of heteroduplexes (HD) between DNAs of different transposable phages of Pseudomonas aeruginosa belonging to two previously described subgroups (D3112 and B3) revealed two types of structure (composition) of the bacteriophages, designated "type A" and "type B". The properties of genome structure of type A (phages of D3112 subgroup) are as follows: high level of conservation (up to 70% of genomes of different phages are represented as blocks of homologous DNA sequences); substitutions in genomes revealed as nonhomology regions in HD are, as a rule, small and located in certain sites; the distribution of the nonhomologous regions in HD of these phages is highly reproducible in independent experiments. Bacteriophages of subgroup B3 have genomes of type B: only a small part (approx. 30%) of genomes retain homology general for all of the phages; the nonhomologous regions are distributed in a large number of sites in HD; the sizes of nonhomologous regions are substantially larger than for the phages of subgroup D3112; distribution of the regions in HD is highly variable, which is characteristic of DNAs with partial homology. There is no difference between genomes of types A and B in G + C content (approx. 61-63%). Viable recombinants can be formed in crosses between phages of different genome types not only in regions with earlier revealed large DNA/DNA homology (right ends of genomes), but also in central portions of the genomes. Nevertheless, functional incompatibility of some regions of phage genomes of types A and B was demonstrated.

14.
The main focus of this article is to present the practical aspect of the code rules of variation and the search for a second set of genomic rules, including comparison of sequences to understand how to preserve compatible organisms in danger of extinction and how to generate biodiversity. Three new rules of variation are introduced: 1) homologous recombination, 2) a healthy fertile offspring, and 3) comparison of compatible genomes. The novel search in the natural world for fully compatible genomes capable of homologous recombination is explored by using examples of human polymorphisms and by the production of fertile offspring by crossbreeding. Through rational control of the natural crossbreeding of organisms with compatible genomes (something already happening in nature), the current work focuses on the generation of new varieties according to a careful plan. This study is presented within the context of biosemiotics, which studies the processing of information, signalling and signs by living systems. I define a group of organisms having compatible genomes as a single theme: the genomic species or population, able to speak the same molecular language through different accents, with each variety within a theme being a different version of the same book. These studies have a molecular, compatible genetics context. Population and ecosystem biosemiotics will be exemplified by a possible genetic damage capable of causing mutations by breaking the rules of variation through the coordinated patterns of atoms present in the 9/11 World Trade Center contaminated dust (U, Ba, La, Ce, Sr, Rb, K, Mn, Mg, etc.), a combination that may be able to overload the molecular quality control mechanisms of the human body. I introduce here the balance of codons in the circular genetic code: 2[1(1)+1(3)+1(4)+4(2)]=2[2(2)+3(4)].

15.
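An isozyme study of the genome constitution of intermediate wheatgrass (Thinopyrum intermedium)   Total citations: 8 (self-citations: 1, citations by others: 7)
Gao Mingjun, Hao Shui. Acta Genetica Sinica (《遗传学报》), 1992, 19(4): 336-343
Using polyacrylamide gel electrophoresis, we examined the isozyme patterns (zymograms) of esterase, malate dehydrogenase, and acid or alkaline phosphatase in intermediate wheatgrass and in various wheats carrying different genomes. Comparing each zymogram with the corresponding genome constitution indicates that intermediate wheatgrass contains no genome homologous to the wheat B or D genome, but may contain two genomes that show some homology to the wheat A genome and to the G genome of Triticum timopheevii, respectively. The genome constitution of intermediate wheatgrass can be written as E_(A1)E_(A2)N_G or E_(G1)E_(G2)N_A.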

16.
The DNA sequences of the genomes of the bovine type 1 and human type 1a papillomaviruses were compared. The overall organization of both genomes is very similar. Three areas of maximal homology were found in the L1 and E1/E2 genes, and at the beginning of L2. The conservation of homologous amino acid sequences encoded in the open reading frames argues that these segments represent real genes or exons. Within these segments, however, only certain domains of the putative proteins are preferentially conserved. Two polypeptide chains show homologous arrangement of the cysteine residue clusters Cys-X-X-Cys, despite a lack of conservation of the rest of the amino acid sequence. A significant sequence divergence in a region where the three reading frames are open suggests that papillomavirus genomes have evolved not solely by accumulation of point mutations. Conserved sequences were also found in the noncoding region, and their possible involvement in regulation of viral gene expression is discussed.

17.
BACKGROUND: Escherichia coli is an important species of bacteria that can live as a harmless inhabitant of the guts of many animals, as a pathogen causing life-threatening conditions or freely in the non-host environment. This diversity of lifestyles has made it a particular focus of interest for studies of genetic variation, mainly with the aim to understand how a commensal can become a deadly pathogen. Many whole genomes of E. coli have been fully sequenced in the past few years, which offer helpful data to help understand how this important species evolved. RESULTS: We compared 27 whole genomes encompassing four phylogroups of Escherichia coli (A, B1, B2 and E). From the core-genome we established the clonal relationships between the isolates as well as the role played by homologous recombination during their evolution from a common ancestor. We found strong evidence for sexual isolation between three lineages (A+B1, B2, E), which could be explained by the ecological structuring of E. coli and may represent on-going speciation. We identified three hotspots of homologous recombination, one of which had not been previously described and contains the aroC gene, involved in the essential shikimate metabolic pathway. We also described the role played by non-homologous recombination in the pan-genome, and showed that this process was highly heterogeneous. Our analyses revealed in particular that the genomes of three enterohaemorrhagic (EHEC) strains within phylogroup B1 have converged from originally separate backgrounds as a result of both homologous and non-homologous recombination. CONCLUSIONS: Recombination is an important force shaping the genomic evolution and diversification of E. coli, both by replacing fragments of genes with a homologous sequence and also by introducing new genes. In this study, several non-random patterns of these events were identified which correlated with important changes in the lifestyle of the bacteria, and therefore provide additional evidence to explain the relationship between genomic variation and ecological adaptation.

18.
The pattern of nucleotide substitution was examined at 2,129 orthologous loci among five genomes of Staphylococcus aureus, which included two sister pairs of closely related genomes (MW2/MSSA476 and Mu50/N315) and the more distantly related MRSA252. A total of 108 loci were unusual in lacking any synonymous differences among the five genomes; most of these were short genes encoding proteins highly conserved at the amino acid sequence level (including many ribosomal proteins) or unknown predicted genes. In contrast, 45 genes were identified that showed anomalously high divergence at synonymous sites. The latter genes were evidently introduced by homologous recombination from distantly related genomes, and in many cases, the pattern of nucleotide substitution made it possible to reconstruct the most probable recombination event involved. These recombination events introduced genes encoding proteins that differed in amino acid sequence and thus potentially in function. Several of the proteins are known or likely to be involved in pathogenesis (e.g., staphylocoagulase, exotoxin, Ser-Asp fibrinogen-binding bone sialoprotein-binding protein, fibrinogen and keratin-10 binding surface-anchored protein, fibrinogen-binding protein ClfA, and enterotoxin P). Therefore, the results support the hypothesis that exchange of homologous genes among S. aureus genomes can play a role in the evolution of pathogenesis in this species.

19.
The AFL genes (ABI3/VP1, FUS3 and LEC2) belong to the plant-specific B3 superfamily and play important roles in regulating seed development and maturation. It is unclear, however, whether these genes appeared at the same time as the origin of seed plants and whether all of them are necessary and sufficient for seed development in all seed plants. By conducting a genome-wide comparative analysis of the putative AFL genes in various plant species, we found that ABI3 homologs existed in all land plant genomes, whereas FUS3 homologs were present only in seed plant genomes and LEC2-like sequences only in dicot genomes. Phylogenetic analysis indicated that the AFL genes had undergone successive rounds of gene duplication and subsequent diversification during land plant evolution, resulting in the stepwise origin of the ABI3, FUS3 and LEC2 genes. Comparison of the gene structure of the AFL genes revealed a decreasing number of conserved domains from ABI3 to FUS3 and LEC2.

20.
There are four sequenced and publicly available plant genomes to date. With many more slated for completion, one challenge will be to use comparative genomic methods to detect novel evolutionary patterns in plant genomes. This research requires sequence alignment algorithms to detect regions of similarity within and among genomes. However, different alignment algorithms are optimized for identifying different types of homologous sequences. This review focuses on plant genome evolution and provides a tutorial for using several sequence alignment algorithms and visualization tools to detect useful patterns of conservation: conserved non-coding sequences, false positive noise, subfunctionalization, synteny, annotation errors, inversions and local duplications. Our tutorial encourages the reader to experiment online with the reviewed tools as a companion to the text.
