Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Following recent technological advances, there has been increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs): large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered fundamental processes underlying gene duplication and loss, with duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends likely reflect CNV formation biases and natural selection, both of which differentially influence distinct protein families.

2.
Segmental duplications and copy-number variation in the human genome   Cited by: 33 (self-citations: 0; citations by others: 33)
The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < 0.000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched (>4-fold) within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders.

3.
Teshima KM, Innan H. Genetics 2012, 190(3):1077-1086
We develop a coalescent-based simulation tool to generate patterns of single nucleotide polymorphisms (SNPs) in a wide region encompassing both the original and duplicated genes. Selection on the new duplicated copy and interlocus gene conversion between the two copies are incorporated. This simulation enables us to explore how selection on duplicated copies affects the pattern of SNPs. The fixation of an advantageous duplicated copy causes a strong reduction in polymorphism not only in the duplicated copy but also in its flanking regions, which is a typical signature of a selective sweep by positive selection. After fixation, polymorphism gradually increases by accumulating neutral mutations and eventually reaches the equilibrium value if there is no gene conversion. When gene conversion is active, the number of SNPs in the duplicated copy quickly increases through the transfer of SNPs from the original copy; therefore, the time window during which the signature of selection can be recognized is shortened. Because this effect of gene conversion is restricted to the duplicated region, more power to detect selection is expected if a region flanking the duplicated copy is used.

4.
Horne I, Haritos VS. Gene 2008, 411(1-2):27-37
We have examined a highly dynamic section of the Drosophila melanogaster genome which contains neutral lipase family genes that have undergone multiple tandem duplication events. We have identified the orthologous clusters, encoding between five and eight apparently functional lipases, in other Drosophila genomes: yakuba, ananassae, pseudoobscura, virilis, mojavensis, persimilis, grimshawi and willistoni. We examined their gene structure, duplication and pseudogene formation, and the presence of transposable elements. Based on phylogenetic comparisons, the lipase genes contained in each of the clusters fall into four distinct clades. Clades I and II are under evolutionary constraints distinct from those on clades III and IV. Multiple gene duplications have occurred in different lineages of clades I and II, while clades III and IV contain a single lipase gene from each species. Compared with lipases from other clades, clade IV genes contain an additional 3' domain of tandemly repeated sequence of varying length and composition, and a substitution in the residue adjacent to the key catalytic serine in the encoded proteins. A comparison of non-synonymous to synonymous nucleotide substitution (dN/dS) rates within each clade showed that the highest rate of divergence was between paralogous lipase gene pairs, suggesting selection pressure on duplicated genes. Analysis of the encoded lipase protein sequences within each species using PAML identified positively selected sites; structure homology modeling based on human pancreatic lipase indicated that many of these residues formed part of the active site of the enzyme. As some of the cluster lipase genes are known to be expressed in the insect midgut and respond to changes in dietary components, we propose that the lipase cluster has undergone dynamic evolutionary changes to maximize absorption of lipid nutrients from the diet.

5.
Island models and the coalescent process   Cited by: 2 (self-citations: 1; citations by others: 1)
Using a coalescent approach, we derive several classical results and extend them to more general models. We find that the classic result for constant population size and constant migration rates holds in models with varying population size and varying migration rates with the obvious substitution of effective population size and mean migration fraction. In addition, the relationship of a 'local' F_ST to local gene flow is derived. This result may be useful for analysing gene flow in a regional subset of a large global population, using only data from the regional subset.
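The abstract does not state which classic result it means. Assuming it is Wright's island-model approximation for diploid populations, the substitution described above can be written as follows, where N_e (effective population size) and \bar{m} (mean migration fraction) are notation chosen here for illustration, not taken from the paper:

```latex
% Classical island-model approximation (constant N, constant m):
F_{ST} \approx \frac{1}{1 + 4Nm}
% Assumed form after the substitution described in the abstract:
F_{ST} \approx \frac{1}{1 + 4N_e\bar{m}}
```

For example, under this approximation one effective migrant per generation (N_e \bar{m} = 1) gives F_ST ≈ 0.2.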

6.
Ancient and recent duplications of the rainbow trout Wilms' tumor gene.   Cited by: 4 (self-citations: 0; citations by others: 4)
The Wilms' tumor suppressor (WT1) gene plays an important role in the development and functioning of the genitourinary system, and mutations in this gene are associated with nephroblastoma formation in humans. Rainbow trout (Oncorhynchus mykiss) is one of the rare animal models that readily form nephroblastomas, yet trout express three distinct WT1 genes, one of which is duplicated and inherited tetrasomically. Sequence analyses suggest that an ancient gene duplication in the common ancestor of bony fishes resulted in the formation of two WT1 gene families that conserve the splicing variations of tetrapod WT1, and that a second duplication event occurred in the trout lineage. The WT1 genes of one family map to linkage groups 6 and 27 in the trout genome map. Reverse transcribed polymerase chain reaction (RT-PCR) expression analysis demonstrated little difference in W

7.
Sheppard HW, Gutman GA. Cell 1982, 29(1):121-127
We have cloned DNA segments containing the Jk genes from LOUVAIN rat liver, and have determined their nucleotide sequence. Seven readily identifiable Jk-coding regions (six expressible) are evident in the rat, compared with five in the mouse (four expressible). The two additional J segments in the rat appear to be the result of two sequential gene duplications occurring since the divergence of rats and mice. The first involved a homologous but unequal crossing-over in a 14 bp region spanning the 3' end of the coding region of J1 and J2. The second involved a crossing-over following unequal pairing of the two newly duplicated regions. We propose that the probability of a second duplication was greatly increased following the first as a result of the increased target for unequal pairing (370 bp of good homology versus 27 bp in the original pairing). Comparisons of rat and mouse J genes show a surprisingly high degree of sequence conservation, both inside and outside the coding regions, similar to the pattern we reported previously for the kappa constant-region gene. This provides additional evidence that constraints exist on the nucleotide sequences of these genes independent of the function of the encoded proteins.

8.
We shall extend Kingman's coalescent to the geographically structured population model with migration among colonies. It is described by a continuous-time Markov chain, which is proved to be a dual process of the diffusion process of the stepping-stone model. We shall derive a system of equations for the spatial distribution of a common ancestor of genes sampled from colonies and for the mean time to reach a single common ancestor. These equations are solved in three particular models: a two-population model, the island model and the one-dimensional stepping-stone model with symmetric nearest-neighbour migration.
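As a point of reference for the mean time to a common ancestor, the following is a minimal sketch of the standard unstructured Kingman coalescent, which the abstract generalizes. It does not model colonies or migration. Time is measured in coalescent units in which each pair of lineages merges at rate 1, and all function names are illustrative rather than taken from the paper.

```python
import random


def expected_tmrca(n: int) -> float:
    """Expected time to the most recent common ancestor of n lineages.

    While k lineages remain, the waiting time to the next merger is
    exponential with rate k(k-1)/2, so its mean is 2 / (k(k-1)).
    """
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n: int, rng: random.Random) -> float:
    """Draw one time to the most recent common ancestor by simulation."""
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    t = 0.0
    for k in range(n, 1, -1):
        # Waiting time while k lineages remain.
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t


def mean_simulated_tmrca(n: int, replicates: int, seed: int = 0) -> float:
    """Average simulated time over independent replicates."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates
```

The sum in `expected_tmrca` telescopes to 2(1 - 1/n), so the simulated mean for any sample size should settle near that value as the number of replicates grows.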

9.
Chordoma is a rare bone cancer that is believed to originate from notochordal remnants. We previously identified germline T duplication as a major susceptibility mechanism in several chordoma families. Recently, a common genetic variant in T (rs2305089) was significantly associated with the risk of sporadic chordoma. We sequenced all T exons in 24 familial cases and 54 unaffected family members from eight chordoma families (three with T duplications), 103 sporadic cases, and 160 unrelated controls. We also measured T copy number variation in all sporadic cases. We confirmed the association between the previously reported variant rs2305089 and risk of familial [odds ratio (OR) = 2.6, 95% confidence interval (CI) = 0.93, 7.25, P = 0.067] and sporadic chordoma (OR = 2.85, 95% CI = 1.89, 4.29, P < 0.0001). We also identified a second common variant, rs1056048, that was strongly associated with chordoma in families (OR = 4.14, 95% CI = 1.43, 11.92, P = 0.0086). Among sporadic cases, another common variant (rs3816300) was significantly associated with risk when jointly analyzed with rs2305089. The association with rs3816300 was significantly stronger in cases with early age of onset. In addition, we identified three rare variants that were only observed among sporadic chordoma cases, all of which have potential functional relevance based on in silico predictions. Finally, we did not observe T duplication in any sporadic chordoma case. Our findings further highlight the importance of the T gene in the pathogenesis of both familial and sporadic chordoma and suggest a complex susceptibility related to T.
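To make the reported odds ratios and confidence intervals easier to read, here is a minimal sketch of how an odds ratio and a Wald-type 95% interval can be computed from a 2x2 table. The abstract does not say which estimation method the authors used, so this is an assumption for illustration only. The function names are hypothetical, and any counts passed in are the caller's own.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carrier cases, b: non-carrier cases,
    c: carrier controls, d: non-carrier controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def log_scale_midpoint(lower: float, upper: float) -> float:
    """Geometric mean of the bounds: the centre of a log-symmetric interval."""
    return math.sqrt(lower * upper)
```

An interval built this way is symmetric on the log scale, so the geometric mean of its bounds recovers the point estimate. Applied to the reported sporadic-case interval, `log_scale_midpoint(1.89, 4.29)` is about 2.85, which matches the reported OR.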

10.
Structural variation is an important cause of genetic variation. Whole genome analysis techniques can efficiently identify copy-number variable regions but there is a need for targeted methods, to verify and accurately size variable regions, and to diagnose large sample cohorts. We have developed a technique based on multiplex amplification of size-coded selectively circularized genomic fragments, which is robust, cheaper and more rapid than current multiplex targeted copy-number assays.

11.

Background  

Rice is an important staple food and, with the smallest cereal genome, serves as a reference species for studies on the evolution of cereals and other grasses. Therefore, decoding its entire genome will be a prerequisite for applied and basic research on this species and all other cereals.

12.
We describe a forward-time haploid reproduction model with a constant population size that includes life history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use these to predict patterns of genetic variation. Depending on the behavior of the underlying parameters of the model, the approximations are coalescent processes with simultaneous multiple mergers or Kingman's coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that our model predicts the observed data very well. We also show that a property of Kingman's coalescent and of general coalescent trees, namely that the most frequent allele at a biallelic locus is likely to be the ancestral allele, does not hold for our model. Our work suggests that the power to detect a "sweepstakes effect" in a sample of DNA sequences from marine organisms depends on the sample size.

13.
The interactions between tPA domains that are important for catalysis are poorly understood. We have probed the function of interdomain interactions by generating tPA variants in which domains are duplicated or rearranged. The proteins were expressed in a transient mammalian expression system and tested in vitro for their ability to activate plasminogen, induce fibrinolysis and bind to a forming fibrin clot. Duplication of the heavy chain domains of tPA produced enzymatically active tPA variants, many of which demonstrated in vitro amidolytic and fibrinolytic activity and fibrin affinity similar to those of the parent molecule. Zymographic analysis of the domain duplication tPA variants showed one major active species for each variant. Selection of the residues duplicated and the interdomain spacing were found to be critical considerations in the design of tPA variants with duplicated domains. We also rearranged the domains of tPA such that kringle 1 replaced the second kringle domain and vice versa. An analysis of these variants indicates that the first kringle domain can confer fibrin affinity to a tPA variant and function in place of kringle 2. Therefore, in wild-type tPA, the functions of kringle 1 and kringle 2 must be partially dependent on their orientation within the heavy chain of the protein. The functional autonomy of the heavy and light chains of tPA is demonstrated by the activity of a tPA variant in which the order of the heavy and light chains was reversed.

14.
A population genetic model with a single locus at which balancing selection acts and many linked loci at which neutral mutations can occur is analysed using the coalescent approach. The model incorporates geographic subdivision with migration, as well as mutation, recombination, and genetic drift of neutral variation. It is found that geographic subdivision can affect genetic variation even with high rates of migration, provided that selection is strong enough to maintain different allele frequencies at the selected locus. Published sequence data from the alcohol dehydrogenase locus of Drosophila melanogaster are found to fit the proposed model slightly better than a similar model without subdivision.

15.
Lohse K, Harrison RJ, Barton NH. Genetics 2011, 189(3):977-987
Analysis of genomic data requires an efficient way to calculate likelihoods across very large numbers of loci. We describe a general method for finding the distribution of genealogies: we allow migration between demes, splitting of demes [as in the isolation-with-migration (IM) model], and recombination between linked loci. These processes are described by a set of linear recursions for the generating function of branch lengths. Under the infinite-sites model, the probability of any configuration of mutations can be found by differentiating this generating function. Such calculations are feasible for small numbers of sampled genomes: as an example, we show how the generating function can be derived explicitly for three genes under the two-deme IM model. This derivation is done automatically, using Mathematica. Given data from a large number of unlinked and nonrecombining blocks of sequence, these results can be used to find maximum-likelihood estimates of model parameters by tabulating the probabilities of all relevant mutational configurations and then multiplying across loci. The feasibility of the method is demonstrated by applying it to simulated data and to a data set previously analyzed by Wang and Hey (2010) consisting of 26,141 loci sampled from Drosophila simulans and D. melanogaster. Our results suggest that such likelihood calculations are scalable to genomic data as long as the numbers of sampled individuals and mutations per sequence block are small.
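The final step described above, tabulating configuration probabilities and multiplying across unlinked loci, can be sketched as a sum of log-probabilities weighted by how often each configuration is observed. This is only an illustration under stated assumptions. The configuration labels and the probability table are hypothetical inputs. In the paper the probabilities come from the generating function, which is not reproduced here.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Mapping


def composite_log_likelihood(
    observed_configs: Iterable[Hashable],
    config_probs: Mapping[Hashable, float],
) -> float:
    """Log-likelihood of independent loci given per-configuration probabilities.

    observed_configs: one mutational-configuration label per locus.
    config_probs: probability of each configuration under one parameter set.
    """
    counts = Counter(observed_configs)
    total = 0.0
    for config, n_loci in counts.items():
        p = config_probs.get(config, 0.0)
        if p <= 0.0:
            # An observed configuration with zero probability rules out
            # this parameter set entirely.
            return float("-inf")
        # Multiplying p across n_loci loci is adding n_loci * log(p).
        total += n_loci * math.log(p)
    return total
```

Working in log space avoids numerical underflow when tens of thousands of loci are combined. A maximum-likelihood search would evaluate this function over a set of candidate parameter values, each with its own probability table.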

16.
Structural variation is an important cause of genetic variation. Whole genome analysis techniques can efficiently identify copy-number variable regions but there is a need for targeted methods, to verify and accurately size variable regions, and to diagnose large sample cohorts. We have developed a technique based on multiplex amplification of size-coded selectively circularized genomic fragments, which is robust, cheaper and more rapid than current multiplex targeted copy-number assays.

17.
The genealogical structure of neutral populations in which reproductive success is highly skewed has been the subject of many recent studies. Here we derive a coalescent dual process for a related class of continuous-time Moran models with viability selection. In these models, individuals can give birth to multiple offspring whose survival depends on both the parental genotype and the brood size. This extends the dual process construction for a multi-type Moran model with genic selection described in Etheridge and Griffiths (2009). We show that, in the limit of infinite population size, the non-neutral Moran models converge to a Markov jump process, which we call the Λ-Fleming-Viot process with viability selection, and we derive a coalescent dual for this process directly from the generator and as a limit from the Moran models. The dual is a branching-coalescing process similar to the Ancestral Selection Graph, which follows the typed ancestry of genes backwards in time with real and virtual lineages. As an application, the transition functions of the non-neutral Moran and Λ-coalescent models are expressed as mixtures of the transition functions of the dual process.

18.
Journal of Mathematical Biology - Compact coalescent histories are combinatorial structures that describe for a given gene tree G and species tree S possibilities for the numbers of coalescences of...

19.
Endogenous small interfering RNAs (siRNAs) are a class of naturally occurring regulatory RNAs found in fungi, plants, and animals. Some endogenous siRNAs are required to silence transposons or function in chromosome segregation; however, the specific roles of most endogenous siRNAs are unclear. The helicase gene eri-6/7 was identified in the nematode Caenorhabditis elegans by the enhanced response to exogenous double-stranded RNAs (dsRNAs) of the null mutant. eri-6/7 encodes a helicase homologous to small RNA factors Armitage in Drosophila, SDE3 in Arabidopsis, and Mov10 in humans. Here we show that eri-6/7 mutations cause the loss of 26-nucleotide (nt) endogenous siRNAs derived from genes and pseudogenes in oocytes and embryos, as well as deficiencies in somatic 22-nt secondary siRNAs corresponding to the same loci. About 80 genes are eri-6/7 targets that generate the embryonic endogenous siRNAs that silence the corresponding mRNAs. These 80 genes share extensive nucleotide sequence homology and are poorly conserved, suggesting a role for these endogenous siRNAs in silencing, and thereby directing the fate of, recently acquired, duplicated genes. Unlike most endogenous siRNAs in C. elegans, eri-6/7-dependent siRNAs require Dicer. We find that the eri-6/7-dependent siRNAs have a passenger strand that is ~19 nt and is inset by ~3-4 nt from both ends of the 26-nt guide siRNA, suggesting non-canonical Dicer processing. Mutations in the Argonaute ERGO-1, which associates with eri-6/7-dependent 26-nt siRNAs, cause passenger strand stabilization, indicating that ERGO-1 is required to separate the siRNA duplex, presumably through endonucleolytic cleavage of the passenger strand. Thus, like several other siRNA-associated Argonautes with a conserved RNaseH motif, ERGO-1 appears to be required for siRNA maturation.

20.

Background  

Most genes introduced into phototrophic eukaryotes during the process of endosymbiosis are either lost or relocated into the host nuclear genome. In contrast, groEL homologues are found in different genome compartments among phototrophic eukaryotes. Comparative sequence analyses of recently available genome data have allowed us to reconstruct the evolutionary history of these genes and propose a hypothesis that explains the unusual genome distribution of groEL homologues.
