Similar literature
1.
Sequence similarity search is a fundamental way of analyzing nucleotide sequences. Despite decades of research, this is not a solved problem because there exist many similarities that are not found by current methods. Search methods are typically based on a seed-and-extend approach, which has many variants (e.g. spaced seeds, transition seeds), and it remains unclear how to optimize this approach. This study designs and tests seeding methods for inter-mammal and inter-insect genome comparison. By considering substitution patterns of real genomes, we design sets of multiple complementary transition seeds, which have better performance (sensitivity per run time) than previous seeding strategies. Often the best seed patterns have more transition positions than those used previously. We also point out that recent computer memory sizes (e.g. 60 GB) make it feasible to use multiple (e.g. eight) seeds for whole mammal genomes. Interestingly, the most sensitive settings achieve diminishing returns for human–dog and melanogaster–pseudoobscura comparisons, but not for human–mouse, which suggests that we still miss many human–mouse alignments. Our optimized heuristics find ∼20 000 new human–mouse alignments that are missing from the standard UCSC alignments. We tabulate seed patterns and parameters that work well so they can be used in future research.
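The idea of a transition seed mentioned in the abstract above can be illustrated with a minimal sketch. The pattern alphabet used here (`1` for a position that must match exactly, `0` for a don't-care position, `T` for a position that tolerates a transition, i.e. A<->G or C<->T) and the function names `seed_matches` and `find_seed_hits` are assumptions made for illustration; they are not taken from the study, and the quadratic scan stands in for the indexed lookup that real seed-and-extend tools would use.

```python
# Illustrative sketch only: pattern alphabet and function names are assumptions.
# '1' = bases must be identical, '0' = any pair allowed,
# 'T' = identical bases or a transition (A<->G, C<->T) allowed.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_matches(pattern, x, y):
    """Return True if windows x and y satisfy the seed pattern."""
    if not (len(pattern) == len(x) == len(y)):
        raise ValueError("pattern and both windows must have equal length")
    for p, a, b in zip(pattern, x, y):
        if p == "0":          # don't-care position
            continue
        if a == b:            # exact match satisfies '1' and 'T'
            continue
        if p == "T" and (a, b) in TRANSITIONS:
            continue          # transition tolerated at a 'T' position
        return False
    return True


def find_seed_hits(pattern, s, t):
    """Naive O(len(s) * len(t)) scan returning all (i, j) seed hits."""
    k = len(pattern)
    hits = []
    for i in range(len(s) - k + 1):
        for j in range(len(t) - k + 1):
            if seed_matches(pattern, s[i:i + k], t[j:j + k]):
                hits.append((i, j))
    return hits
```

In a seed-and-extend search, each hit returned by such a scan would then be extended into a full alignment; using several complementary patterns simply means running the hit-finding step once per pattern and pooling the hits.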

2.
MOTIVATION: Recent studies have revealed the importance of considering quality scores of reads generated by next-generation sequencing (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) provide more accurate alignment than conventional maximum score-based alignment. There exists, however, no study about probabilistic alignment that considers quality scores explicitly, although the method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns or gaps is important in those analyses. RESULTS: In this study, we propose methods of probabilistic alignment that consider quality scores of (one of) the sequences as well as a usual score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and can arbitrarily trade off sensitivity and positive predictive value (PPV) of prediction (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments can estimate aligned columns and gaps accurately, compared with other mapping algorithms such as SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.
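As a rough illustration of how a read's quality score can enter a probabilistic model, the sketch below converts a Phred score to an error probability using the standard definition Q = -10 log10(p) and derives a per-base likelihood from it. The uniform split of the error mass over the three other bases, the default ASCII offset of 33, and the function names are assumptions for illustration; this is not the model proposed in the abstract above.

```python
# Illustrative sketch only: not the probabilistic alignment model of the study.

def phred_to_error_prob(q):
    """Standard Phred definition: Q = -10 * log10(p_error)."""
    return 10 ** (-q / 10)


def phred_char_to_q(c, offset=33):
    """Decode one FASTQ quality character (offset 33 is an assumption)."""
    return ord(c) - offset


def base_likelihood(observed, true_base, q):
    """P(observed base | true base, quality q).

    Assumes a sequencing error is equally likely to produce any of the
    three other bases, which is a simplification.
    """
    e = phred_to_error_prob(q)
    return 1.0 - e if observed == true_base else e / 3.0
```

A quality-aware aligner could, under these assumptions, weight each column of a score matrix by such likelihoods, so that a mismatch at a low-quality base is penalized less than one at a high-quality base.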

3.
4.
5.
We have examined the mouse genome sequence to determine its VH gene segment repertoire. In all, 141 segments are mapped to a 3 Mb region of chromosome 12. There is evidence that 92 of these are functional in the mouse strain used for the genome sequence, C57BL/6J; 12 are functional in other mouse strains, and 37 are pseudogenes. The mouse VH gene segment repertoire is therefore twice the size of that in humans. The mouse and human loci bear no large-scale similarity to each other. The 104 functional segments belong to one of the 15 known sequence subgroups, which have been further clustered into eight sets here. Seven of these sets, comprising 101 sequences, are related to five of the human VH families and have the same canonical structures in their hypervariable regions. Duplication of members of one set in the distal half of the locus is mainly responsible for the larger size of the mouse repertoire. Phylogenetic analysis of the VH segments indicates that most of the sequences in the human and mouse VH loci have arisen subsequent to the divergence of the two organisms from their common ancestor.

6.
DNA methylation is a pervasive epigenetic DNA modification that strongly affects chromatin regulation and gene expression. To date, it remains largely unknown how patterns of DNA methylation differ between closely related species and whether such differences contribute to species-specific phenotypes. To investigate these questions, we generated nucleotide-resolution whole-genome methylation maps of the prefrontal cortex of multiple humans and chimpanzees. Levels and patterns of DNA methylation vary across individuals within species according to the age and the sex of the individuals. We also found extensive species-level divergence in patterns of DNA methylation and that hundreds of genes exhibit significantly lower levels of promoter methylation in the human brain than in the chimpanzee brain. Furthermore, we investigated the functional consequences of methylation differences in humans and chimpanzees by integrating data on gene expression generated with next-generation sequencing methods, and we found a strong relationship between differential methylation and gene expression. Finally, we found that differentially methylated genes are strikingly enriched with loci associated with neurological disorders, psychological disorders, and cancers. Our results demonstrate that differential DNA methylation might be an important molecular mechanism driving gene-expression divergence between human and chimpanzee brains and might potentially contribute to the evolution of disease vulnerabilities. Thus, comparative studies of humans and chimpanzees stand to identify key epigenomic modifications underlying the evolution of human-specific traits.

7.
8.
Comparative analysis of processed pseudogenes in the mouse and human genomes (total citations: 16; self-citations: 0; citations by others: 16)
Pseudogenes are important resources in evolutionary and comparative genomics because they provide molecular records of the ancient genes that existed in the genome millions of years ago. We have systematically identified approximately 5000 processed pseudogenes in the mouse genome, and estimated that approximately 60% are lineage specific, created after the mouse and human diverged. In both mouse and human genomes, similar types of genes give rise to many processed pseudogenes. These tend to be housekeeping genes, which are highly expressed in the germ line. Ribosomal-protein genes, in particular, form the largest sub-group. The processed pseudogenes in the mouse occur with a distinctly different chromosomal distribution than LINEs or SINEs - preferentially in GC-poor regions. Finally, the age distribution of mouse-processed pseudogenes closely resembles that of LINEs, in contrast to human, where the age distribution closely follows Alus (SINEs).

9.
The six hyaluronidase-like genes in the human and mouse genomes (total citations: 19; self-citations: 0; citations by others: 19)
The human genome contains six hyaluronidase-like genes. Three genes (HYAL1, HYAL2 and HYAL3) are clustered on chromosome 3p21.3, and another two genes (HYAL4 and PH-20/SPAM1) and one expressed pseudogene (HYALP1) are similarly clustered on chromosome 7q31.3. The extensive homology between the different hyaluronidase genes suggests ancient gene duplication, followed by en masse block duplication, events that occurred before the emergence of modern mammals. Very recently we have found that the mouse genome also has six hyaluronidase-like genes that are also grouped into two clusters of three, in regions syntenic with the human genome. Surprisingly, the mouse ortholog of HYALP1 does not contain any mutations, and unlike its human counterpart may actually encode an active enzyme. Hyal-1 is the only hyaluronidase in mammalian plasma and urine, and is also found at high levels in major organs such as liver, kidney, spleen, and heart. A model is proposed suggesting that Hyal-2 and Hyal-1 are the major mammalian hyaluronidases in somatic tissues, and that they act in concert to degrade high molecular weight hyaluronan to the tetrasaccharide. Twenty-kDa hyaluronan fragments are generated at the cell surface in unique endocytic vesicles resulting from digestion by the glycosylphosphatidyl-inositol-anchored Hyal-2, transported intracellularly by an unknown process, and then further digested by Hyal-1. The two beta-exoglycosidases, beta-glucuronidase and beta-N-acetyl glucosaminidase, remove sugars from reducing termini of hyaluronan oligomers, and supplement the hyaluronidases in the catabolism of hyaluronan.

10.
Shen X, Mao H, Miao S. Génome, 2011, 54(2): 144-150
CArG cis-elements bound by serum response factor (SRF) are presently being intensively studied, but little is known about the substitution pattern of functional CArG elements. Here, we have performed the first evolutionary analysis of the CArGome in the human and mouse genomes through bioinformatic methods and statistical tests. We calculated the substitution rate at each site of the functional CArG elements. The results showed that the core sites of the functional CArG elements evolved faster than did the background DNA, indicating that these sites were likely to evolve under positive selection. Moreover, a strong TATA "motif" was evident in the core region within the functional CArG elements in both human and mouse promoters. This motif could probably make a major contribution to the formation of the spatial structure, which was important for CArG-SRF recognition. Thus, the study further revealed the sequence character and substitution pattern of CArG elements and provided useful information for the study of the SRF-binding efficiencies of CArG promoters in functional assays.

11.
Finger millet is an allotetraploid (2n = 4x = 36) grass that belongs to the Chloridoideae subfamily. A comparative analysis has been carried out to determine the relationship of the finger millet genome with that of rice. Six of the nine finger millet homoeologous groups corresponded to a single rice chromosome each. Each of the remaining three finger millet groups was orthologous to two rice chromosomes, and in all three cases one rice chromosome was inserted into the centromeric region of a second rice chromosome to give the finger millet chromosomal configuration. All observed rearrangements were, among the grasses, unique to finger millet and, possibly, the Chloridoideae subfamily. Gene orders between rice and finger millet were highly conserved, with rearrangements being limited largely to single marker transpositions and small putative inversions encompassing at most three markers. Only some 10% of markers mapped to non-syntenic positions in rice and finger millet, and the majority of these were located in the distal 14% of chromosome arms, supporting a possible correlation between recombination and sequence evolution as has previously been observed in wheat. A comparison of the organization of finger millet, Panicoideae and Pooideae genomes relative to rice allowed us to infer putative ancestral chromosome configurations in the grasses.

12.
A comparative genome analysis on exon-intron distribution profiles is performed for human and mouse genomes to deduce similarities and differences between them. Interestingly, both in human and mouse genomes, the total length in introns and intergenic DNA on each chromosome is significantly correlated to the chromosome size. The results presented provide a framework for understanding the nature and patterns of exon-intron length distributions, the constraints on them and their role in genome design and evolution.

13.
Paralog gene trees, which reflect the increase of genomic complexity during evolution, can be complicated and ambiguous. A simpler complementary approach is analysis of the density distribution of paralog pairs. It can reveal general features of genome evolution, which may be hidden in the forest of gene trees. It is known that the distribution of human paralog pairs along the axis of protein divergence between pair members forms two main peaks. Here I show that there are three main peaks in the mouse genome. Thus, the multimodality of paralog pair distribution seems to be a fundamental feature of mammalian genomes. Despite the great diversity of domains presented in small amounts or in multidomain architectures with a few predominant domains, both in human and mouse the first peak consists mostly of gene pairs with zinc finger domains or olfactory receptor domain. In the mouse the olfactory receptor predominates, which stipulates the three-peak distribution (since in the olfactory receptors the second peak is closer to the first peak than in other genes). The mammalian-wide zinc finger orthologs are biased towards the second peak. Thus, the marsupial orthologs are nearly absent in the first peak of human and mouse. The gene pairs in the first peak show a lower ratio of nonsynonymous to synonymous substitutions, which suggests that their evolution is more constrained. The plausible explanation is that they are in a subfunctionalization state (partition of the initial function of the ancestral gene), whereas the second peak contains gene pairs that are already in a neofunctionalization state (acquisition of novel functions). These data suggest that the adaptive radiation of mammals was accompanied by a burst of duplication of zinc finger genes, which are located in the first (most recent) peak of paralog pairs.

14.
BACKGROUND: Polyploidization is considered one of the main mechanisms of plant genome evolution. The presence of multiple copies of the same gene reduces selection pressure and permits sub-functionalization and neo-functionalization leading to plant diversification, adaptation and speciation. In bread wheat, polyploidization and the prevalence of transposable elements resulted in massive gene duplication and movement. As a result, the number of genes which are non-collinear to genomes of related species seems markedly increased in wheat. RESULTS: We used new-generation sequencing (NGS) to generate sequence of a Mb-sized region from wheat chromosome arm 3DS. Sequence assembly of 24 BAC clones resulted in two scaffolds of 1,264,820 and 333,768 bases. The sequence was annotated and compared to the homoeologous region on wheat chromosome 3B and orthologous loci of Brachypodium distachyon and rice. Among 39 coding sequences in the 3DS scaffolds, 32 have a homoeolog on chromosome 3B. In contrast, only fifteen and fourteen orthologs were identified in the corresponding regions in rice and Brachypodium, respectively. Interestingly, five pseudogenes were identified among the non-collinear coding sequences at the 3B locus, while none was found at the 3DS locus. CONCLUSION: Direct comparison of two Mb-sized regions of the B and D genomes of bread wheat revealed similar rates of non-collinear gene insertion in both genomes with a majority of gene duplications occurring before their divergence. A relatively low proportion of pseudogenes was identified among non-collinear coding sequences. Our data suggest that the pseudogenes did not originate from insertion of non-functional copies, but were formed later during the evolution of hexaploid wheat. Some evidence was found for gene erosion along the B genome locus.

15.
MOTIVATION: Proteins of the same class often share a secondary structure packing arrangement but differ in how the secondary structure units are ordered in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing the compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence and compactness. RESULTS: The new program, SCALI (Structural Core ALIgnment), can efficiently find conserved packing arrangements, even if they are non-sequentially ordered in space. SCALI alignments conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. For example, the three-layer sandwich domain architecture may be divided into four structural subclasses based on internal packing arrangements. These subclasses represent an intermediate level of structure classification, more general than topology, but more specific than architecture as defined in CATH. A strategy is presented for developing a set of predictive hidden Markov models based on multiple SCALI alignments.

16.
17.

Background  

Although originally thought to be less frequent in plants than in animals, alternative splicing (AS) is now known to be widespread in plants. Here we report the characteristics of AS in legumes, one of the largest and most important plant families, based on EST alignments to the genome sequences of Medicago truncatula (Mt) and Lotus japonicus (Lj).

18.
AFLP markers reveal high polymorphic rates in ryegrasses (Lolium spp.) (total citations: 8; self-citations: 0; citations by others: 8)
An evaluation was performed of the potential use of AFLP markers to reveal polymorphisms among Lolium perenne plants with different degrees of kinship. Radioactive and fluorescent detection techniques were applied. The use of a fluorescent detection approach contributed greatly to the speed and ease of conducting and interpreting the AFLP patterns. The great discriminative power of AFLP markers and their capacity to represent genetic relationships among ryegrass plants was shown. Despite the high polymorphic value of the AFLP markers, standard statistical tests could not differentiate between two gene pools derived from different breeding programmes. It also proved impossible to correlate fodder and turf phenotypes with AFLP distance data. A very important point revealed by our data is the high degree of genetic diversity within commercial ryegrass varieties. Our findings are relevant to any outcrossing crop with a breeding strategy based on the production of synthetic populations.

19.
We aligned and analyzed 100 pairs of complete, orthologous intergenic regions from the human and mouse genomes (average length approximately 12 000 nucleotides). The alignments alternate between highly similar segments and dissimilar segments, indicating a wide variation of selective constraint. The average number of selectively constrained nucleotides within a mammalian intergenic region is at least 2000. This is threefold higher than within a nematode intergenic region and at least twofold higher than the number of selectively constrained nucleotides coding for an average protein. Because mammals possess only two- to threefold more proteins than Caenorhabditis elegans, the higher complexity of mammals might be primarily because of the functioning of intergenic DNA.

20.