Similar literature
20 similar articles found.
1.

Background

The National Children’s Study (NCS) is a prospective epidemiological study in the USA tasked with identifying a nationally representative sample of 100,000 children, and following them from their gestation until they are 21 years of age. The objective of the study is to measure environmental and genetic influences on growth, development, and health. Determination of the ancestry of these NCS participants is important for assessing the diversity of study participants and for examining the effect of ancestry on various health outcomes.

Results

We estimated the genetic ancestry of a convenience sample of 641 parents enrolled at the 7 original NCS Vanguard sites, by analyzing 30,000 markers on exome arrays, using the 1000 Genomes Project superpopulations as reference populations, and compared this with the measures of self-reported ethnicity and race. For 99% of the individuals, self-reported ethnicity and race agreed with the predicted superpopulation. NCS individuals self-reporting as Asian had genetic ancestry of either South Asian or East Asian groups, while those reporting as either Hispanic White or Hispanic Other had similar genetic ancestry. Of the 33 individuals who self-reported as Multiracial or Non-Hispanic Other, 33% matched the South Asian or East Asian groups, while these groups represented only 4.4% of the other reported categories.

Conclusions

Our data suggest that self-reported ethnicity and race have some limitations in accurately capturing Hispanic and South Asian populations. Overall, however, our data indicate that despite the complexity of the US population, individuals know their ancestral origins, and that self-reported ethnicity and race are a reliable indicator of genetic ancestry.

2.
This article combines social and genetic epidemiology to examine the influence of self-reported ethnicity on body mass index (BMI) among a sample of adolescents and young adults. We use genetic information from more than 5,000 single nucleotide polymorphisms in combination with principal components analysis to characterize population ancestry of individuals in this study. We show that non-Hispanic white and Mexican-American respondents differ significantly with respect to BMI and differ on the first principal component from the genetic data. This first component is positively associated with BMI and accounts for roughly 3% of the genetic variance in our sample. However, after controlling for this genetic measure, the observed ethnic differences in BMI remain large and statistically significant. This study demonstrates a parsimonious method to adjust for genetic differences among individual respondents that may contribute to observed differences in outcomes. In this case, adjusting for genetic background has no bearing on the influence of self-identified ethnicity.

3.
We conducted a nationwide study comparing self-identification to genetic ancestry classifications in a large cohort (n = 1752) from the National Marrow Donor Program. We sought to determine how various measures of self-identification intersect with genetic ancestry, with the aim of improving matching algorithms for unrelated bone marrow transplant. Multiple dimensions of self-identification, including race/ethnicity and geographic ancestry, were compared to classifications based on ancestry informative markers (AIMs), and the human leukocyte antigen (HLA) genes, which are required for transplant matching. Nearly 20% of responses were inconsistent between reporting race/ethnicity versus geographic ancestry. Despite strong concordance between AIMs and HLA, no measure of self-identification shows complete correspondence with genetic ancestry. In certain cases geographic ancestry reporting matches genetic ancestry not reflected in race/ethnicity identification, but in other cases geographic ancestries show little correspondence to genetic measures, with important differences by gender. However, when respondents assign ancestry to grandparents, we observe sub-groups of individuals with well-defined genetic ancestries, including important differences in HLA frequencies, with implications for transplant matching. While we advocate for tailored questioning to improve accuracy of ancestry ascertainment, collection of donor grandparents’ information will improve the chances of finding matches for many patients, particularly for mixed-ancestry individuals.

4.
As the social sciences expand their involvement in genetic and genomic research, more information is needed to understand how theoretical concepts are applied to genetic data found in social surveys. Given the layers of complexity of studying race in relation to genetics and genomics, it is important to identify the varying approaches used to discuss and operationalize race and identity by social scientists. The present study explores how social scientists have used race, ethnicity, and ancestry in studies published in four social science journals from 2000 to 2014. We identify not only how race, ethnicity, and ancestry are classified and conceptualized in this growing area of research, but also how these concepts are incorporated into the methodology and presentation of results, all of which structure the discussion of race, identity, and inequality. This research indicates the slippage between concepts, classifications, and their use by social scientists in their genetics-related research. The current study can assist social scientists with clarifying their use and interpretations of race and ethnicity with the incorporation of genetic data, while limiting possible misinterpretations of the complexities of the connection between genetics and the social world.

5.
This paper presents the analysis of familial cancer data collected in a hospital-based study of 159 childhood soft-tissue-sarcoma patients. Two different statistical models detected excess aggregation of cancer, which could be explained by a rare dominant gene. For each kindred, we estimated the probability of the observed cancer distribution under the dominant-gene model and identified 12 families that are the most likely to be segregating the gene. Two of those families have confirmed germ-line mutations in the p53 tumor-suppressor gene. The relative risk of affection for children who are gene carriers was estimated to be 100 times the background rate. Females were found to have a slightly higher age-specific penetrance, but maternal and paternal lineages made equal contributions to the evidence in favor of the dominant gene. The proband's histology, ethnicity, and age at diagnosis were evaluated to determine whether any of these altered the probability of affection in family members. Only embryonal rhabdomyosarcoma was found to be a significant covariate under the dominant-gene model. While molecular genetic studies of familial cancer will eventually provide answers to the questions of genetic heterogeneity, age- and site-specific penetrance, mutation rates, and gene frequency, information from statistical models is useful for setting priorities and defining hypotheses.

6.
We have analyzed genetic data for 326 microsatellite markers that were typed uniformly in a large multiethnic population-based sample of individuals as part of a study of the genetics of hypertension (Family Blood Pressure Program). Subjects identified themselves as belonging to one of four major racial/ethnic groups (white, African American, East Asian, and Hispanic) and were recruited from 15 different geographic locales within the United States and Taiwan. Genetic cluster analysis of the microsatellite markers produced four major clusters, which showed near-perfect correspondence with the four self-reported race/ethnicity categories. Of 3,636 subjects of varying race/ethnicity, only 5 (0.14%) showed genetic cluster membership different from their self-identified race/ethnicity. On the other hand, we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity--as opposed to current residence--is the major determinant of genetic structure in the U.S. population. Implications of this genetic structure for case-control association studies are discussed.

7.
In post-war Norway, only the 1970 national census has recorded ethnicity information about the indigenous Sámi, and then only for those living in selected areas in the north. In this study, we combine replies about Sámi ethnicity given by the same individuals in Norway’s 1970 census and in the population-based SAMINOR study in 2003–04, to compare self-reported Sámi ethnicity at two points in time that encompass a period when the effects of a long-standing assimilation policy gradually lost ground in favour of upcoming Sámi revitalization. We found self-reported Sámi ethnicity – measured as (1) Sámi as home language in each of three generations and (2) the respondent’s self-identification as Sámi – to have remained generally stable, but some changes were observed. We argue that the results reflect interplays between societal and individual factors. We conclude that any statistical study involving an indigenous people, when clarifying the ethnicity measures, should also address the issue of ethnic mobility.

8.
Latent amino acid repeats seem to be widespread in genetic sequences and to reflect their structure, function, and evolution. We have recently identified latent periodicity in more than 150 protein families including protein kinases and various nucleotide-binding proteins. The latent repeats in these families were correlated to their structure and evolution. However, a majority of known protein families were not identified with our latent periodicity search algorithm. The presumed main reason was the inability of our techniques to identify periodicities interspersed with insertions and deletions. We designed a new latent periodicity search algorithm that is capable of taking insertions and deletions into account. As a result, we identified many novel cases of latent periodicity peculiar to protein families. Possible origins of the periodic structure of these families are discussed. In summary, we presume that latent periodicity is present in a substantial portion of known protein families. The latent periodicity matrices and the results of Swiss-Prot scans are available from http://bioinf.narod.ru/del/.

9.
A current concern in genetic epidemiology studies in admixed populations is that population stratification can lead to spurious results. The Brazilian census classifies individuals according to self-reported "color", but several studies have demonstrated that stratifying according to "color" is not a useful strategy to control for population structure, due to the dissociation between self-reported "color" and genomic ancestry. We report the results of a study in a group of Brazilian siblings in which we measured skin pigmentation using a reflectometer, and estimated genomic ancestry using 21 Ancestry Informative Markers (AIMs). Self-reported "color", according to the Brazilian census, was also available for each participant. This made it possible to evaluate the relationship between self-reported "color" and skin pigmentation, self-reported "color" and genomic ancestry, and skin pigmentation and genomic ancestry. We observed that, although there were significant differences between the three "color" groups in genomic ancestry and skin pigmentation, there was considerable dispersion within each group and substantial overlap between groups. We also saw that there was no good agreement between the "color" categories reported by each member of the sibling pair: 30 out of 86 sibling pairs reported different "color", and in some cases, the sibling reporting the darker "color" category had lighter skin pigmentation. Socioeconomic status was significantly associated with self-reported "color" and genomic ancestry in this sample. This and other studies show that subjective classifications based on self-reported "color", such as the one that is used in the Brazilian census, are inadequate to describe the population structure present in recently admixed populations. Finally, we observed that one of the AIMs included in the panel (rs1426654), which is located in the known pigmentation gene SLC24A5, was strongly associated with skin pigmentation in this sample.

10.
Protein structure alignment using a genetic algorithm
Szustakowski JD, Weng Z. Proteins, 2000, 38(4): 428-440
We have developed a novel, fully automatic method for aligning the three-dimensional structures of two proteins. The basic approach is to first align the proteins' secondary structure elements and then extend the alignment to include any equivalent residues found in loops or turns. The initial secondary structure element alignment is determined by a genetic algorithm. After refinement of the secondary structure element alignment, the protein backbones are superposed and a search is performed to identify any additional equivalent residues in a convergent process. Alignments are evaluated using intramolecular distance matrices. Alignments can be performed with or without sequential connectivity constraints. We have applied the method to proteins from several well-studied families: globins, immunoglobulins, serine proteases, dihydrofolate reductases, and DNA methyltransferases. Agreement with manually curated alignments is excellent. A web-based server and additional supporting information are available at http://engpub1.bu.edu/-josephs.
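The idea of evaluating an alignment with intramolecular distance matrices can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the mean absolute distance difference used here is an assumption chosen for simplicity, and the function names are hypothetical. The point it shows is that such a score depends only on distances within each structure, so it does not change when one structure is rotated or translated.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between points given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mean_distance_difference(coords_a, coords_b):
    """Mean absolute difference between corresponding intramolecular distances.

    coords_a[i] and coords_b[i] are assumed to be the residues paired by the
    alignment. This score is an illustrative stand-in, not the published one.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("aligned residue lists must have equal length")
    n = len(coords_a)
    if n < 2:
        raise ValueError("need at least two aligned residues")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```

A score of zero means the two sets of aligned residues have identical internal geometry; larger values indicate a poorer structural match.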

11.
MOTIVATION: The constituent amino acids of a protein work together to define its structure and to facilitate its function. Their interdependence should be apparent in the evolutionary record of each protein family: positions in the sequence of a protein family that are intimately associated in space or in function should co-vary in evolution. A recent approach by Ranganathan and colleagues proposes to look at subsets of a protein family, selected for their sequence at one position, to see how this affects variation at other positions. RESULTS: We present a quantitative algorithm for assessing covariation with this approach, based on explicit likelihood calculations. By applying our algorithm to 138 Pfam families with at least one member of known structure, we demonstrate that our method has improved power in finding physically close residues in crystal structures, compared to that of Ranganathan and colleagues. SUPPLEMENTARY INFORMATION: www.afodor.net/bioinfosup.html

12.
Mapping a locus controlling a quantitative genetic trait (e.g. blood pressure) to a specific genomic region is of considerable contemporary interest. Data on the quantitative trait under consideration and several codominant genetic markers with known genomic locations are collected from members of families and statistically analysed to estimate the recombination fraction, θ, between the putative quantitative trait locus and a genetic marker. One of the major complications in estimating θ for a quantitative trait in humans is the lack of haplotype information on members of families. We have devised a computationally simple two-stage method of estimation of θ in the absence of haplotypic information using the expectation-maximization (EM) algorithm. In the first stage, parameters of the quantitative trait locus (QTL) are estimated on the basis of data of a sample of unrelated individuals and Bayes’ rule is used to classify each parent into a QTL genotypic class. In the second stage, we have proposed an EM algorithm for obtaining the maximum-likelihood estimate of θ based on data of informative families (which are identified upon inferring parental QTL genotypes performed in the first stage). The purpose of this paper is to investigate whether, instead of using genotypically ‘classified’ data of parents, the use of posterior probabilities of QTL genotypes of parents at the second stage yields better estimators. We show, using simulated data, that the proposed procedure using posterior probabilities is statistically more efficient than our earlier classification procedure, although it is computationally heavier.

13.
Intensive growth in 3D structure data on DNA-protein complexes as reflected in the Protein Data Bank (PDB) demands new approaches to the annotation and characterization of these data and will lead to a new understanding of critical biological processes involving these data. These data and those from other protein structure classifications will become increasingly important for the modeling of complete proteomes. We propose a fully automated classification of DNA-binding protein domains based on existing 3D-structures from the PDB. The classification, by domain, relies on the Protein Domain Parser (PDP) and the Combinatorial Extension (CE) algorithm for structural alignment. The approach involves the analysis of 3D-interaction patterns in DNA-protein interfaces, assignment of structural domains interacting with DNA, and clustering of domains based on structural similarity and DNA-interacting patterns. Comparison with existing resources describing structural and functional classifications of DNA-binding proteins was used to validate and improve the approach proposed here. In the course of our study we defined a set of criteria and heuristics allowing us to automatically build a biologically meaningful classification and define classes of functionally related protein domains. It was shown that taking into consideration interactions between protein domains and DNA considerably improves the classification accuracy. Our approach provides a high-throughput and up-to-date annotation of DNA-binding protein families which can be found at http://spdc.sdsc.edu.

14.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences, or predicting their structure by sequence comparison methods, is limited when sequence similarity is low. Recent works demonstrate that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The "Conservatism-of-Conservatism" is an effective profile analysis method to identify structural features between proteins having the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes which were compared using the FASTA program with an ad-hoc matrix. We tested the performance of our method to identify relationships between proteins with similar fold using a nonredundant subset of sequences having less than 40% identity. We then compared our results, using Coverage Versus Error per query curves, to those obtained by methods like PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), presented higher accuracy detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of evolutionary information enclosed in protein profiles.
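As a rough illustration of an entropy profile computed under an amino acid classification, the sketch below groups residues into classes and takes the Shannon entropy of each alignment column. The four-class grouping and the function names are assumptions made for this example; the abstract's own classifications and its pseudocode conversion step are not specified here and are not reproduced.

```python
import math
from collections import Counter

# Hypothetical reduced amino acid classification (illustrative only, not one
# of the classifications used in the abstract).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column after grouping residues.

    Gap characters and unknown symbols are ignored.
    """
    labels = [RESIDUE_TO_GROUP[aa] for aa in column if aa in RESIDUE_TO_GROUP]
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(alignment):
    """Per-column entropy profile for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*alignment)]
```

Changing `GROUPS` changes the profile, which is the dependence on the chosen classification that the abstract describes: a column that looks variable at the residue level can be fully conserved at the class level.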

15.
Jay F, François O, Blum MG. PLoS ONE, 2011, 6(1): e16227

Background

The mainland of the Americas is home to a remarkable diversity of languages, and the relationships between genes and languages have attracted considerable attention in the past. Here we investigate to what extent geography and languages can predict the genetic structure of Native American populations.

Methodology/Principal Findings

Our approach is based on a Bayesian latent cluster regression model in which cluster membership is explained by geographic and linguistic covariates. After correcting for geographic effects, we find that the inclusion of linguistic information improves the prediction of individual membership to genetic clusters. We further compare the predictive power of Greenberg's and The Ethnologue classifications of Amerindian languages. We report that The Ethnologue classification provides a better genetic proxy than Greenberg's classification at the stock and at the group levels. Although high predictive values can be achieved from The Ethnologue classification, we nevertheless emphasize that Choco, Chibchan and Tupi linguistic families do not exhibit a univocal correspondence with genetic clusters.

Conclusions/Significance

The Bayesian latent class regression model described here is efficient at predicting population genetic structure using geographic and linguistic information in Native American populations.

16.
The taxonomic position of the endemic New Zealand bat genus Mystacina has vexed systematists ever since its erection in 1843. Over the years the genus has been linked with many microchiropteran families and superfamilies. Most recent classifications place it in the Vespertilionoidea, although some immunological evidence links it with the Noctilionoidea (=Phyllostomoidea). We have sequenced 402 bp of the mitochondrial cytochrome b gene for M. tuberculata (Gray in Dieffenbach, 1843), and using both our own and published DNA sequences for taxa in both superfamilies, we applied different tree reconstruction methods to find the appropriate phylogeny and different methods of estimating confidence in the parts of the tree. All methods strongly support the classification of Mystacina in the Noctilionoidea. Spectral analysis suggests that parsimony analysis may be misleading for Mystacina's precise placement within the Noctilionoidea because of its long terminal branch. Analyses not susceptible to long-branch attraction suggest that the Mystacinidae is a sister family to the Phyllostomidae. Dating the divergence times between the different taxa suggests that the extant chiropteran families radiated around and shortly after the Cretaceous-Tertiary boundary. We discuss the biogeographical implications of classifying Mystacina within the Noctilionoidea and contrast our result with those classifications placing Mystacina in the Vespertilionoidea, concluding that evidence for the latter is weak.

17.
Self-reported race/ethnicity is frequently used in epidemiological studies to assess an individual’s background origin. However, in admixed populations such as Hispanics, self-reported race/ethnicity may not accurately represent them genetically because they are admixed with European, African and Native American ancestry. We estimated the proportions of genetic admixture in an ethnically diverse population of 396 mothers and 188 of their children with 35 ancestry informative markers (AIMs) using the STRUCTURE version 2.2 program. The majority of the markers showed significant deviation from Hardy-Weinberg equilibrium in our study population. In mothers self-identified as Black and White, the imputed ancestry proportions were 77.6% African and 75.1% European respectively, while the racial composition among self-identified Hispanics was 29.2% European, 26.0% African, and 44.8% Native American. We also investigated the utility of AIMs by showing the improved fitness of models in paraoxonase-1 genotype-phenotype associations after incorporating AIMs; however, the improvement was moderate at best. In summary, a minimal set of 35 AIMs is sufficient to detect population stratification and estimate the proportion of individual genetic admixture; however, the utility of these markers remains questionable.
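The deviation from Hardy-Weinberg equilibrium mentioned above is commonly checked per marker with a one-degree-of-freedom chi-square test on genotype counts. The sketch below shows that textbook test for a single biallelic marker. It is an illustration only: the abstract does not state which test the authors used, and the function name is hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 d.f.) for deviation from Hardy-Weinberg
    proportions at one biallelic marker, given observed genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

A statistic above 3.841 corresponds to p < 0.05 at one degree of freedom. A pooled sample drawn from populations with different allele frequencies tends to show a heterozygote deficit, which is one reason ancestry informative markers can fail this test in an ethnically diverse cohort.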

18.
The recognition of object categories is effortlessly accomplished in everyday life, yet its neural underpinnings remain not fully understood. In this electroencephalography (EEG) study, we used single-trial classification to perform a Representational Similarity Analysis (RSA) of categorical representation of objects in human visual cortex. Brain responses were recorded while participants viewed a set of 72 photographs of objects with a planned category structure. The Representational Dissimilarity Matrix (RDM) used for RSA was derived from confusions of a linear classifier operating on single EEG trials. In contrast to past studies, which used pairwise correlation or classification to derive the RDM, we used confusion matrices from multi-class classifications, which provided novel self-similarity measures that were used to derive the overall size of the representational space. We additionally performed classifications on subsets of the brain response in order to identify spatial and temporal EEG components that best discriminated object categories and exemplars. Results from category-level classifications revealed that brain responses to images of human faces formed the most distinct category, while responses to images from the two inanimate categories formed a single category cluster. Exemplar-level classifications produced a broadly similar category structure, as well as sub-clusters corresponding to natural language categories. Spatiotemporal components of the brain response that differentiated exemplars within a category were found to differ from those implicated in differentiating between categories. Our results show that a classification approach can be successfully applied to single-trial scalp-recorded EEG to recover fine-grained object category structure, as well as to identify interpretable spatiotemporal components underlying object processing. Finally, object category can be decoded from purely temporal information recorded at single electrodes.  
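One simple way to derive a Representational Dissimilarity Matrix from classifier confusions can be sketched as follows. The recipe (row-normalise, symmetrise, subtract from one) is an assumption made for illustration; the abstract does not give the exact transformation the authors applied, and the function name is hypothetical.

```python
def confusion_to_rdm(confusion):
    """Turn a square confusion matrix (rows = true class, columns = predicted
    class, entries = trial counts) into a symmetric dissimilarity matrix.

    Assumed recipe, for illustration only: row-normalise to estimated
    P(predicted | true), average with the transpose, then take
    1 - similarity off the diagonal and 0 on it.
    """
    k = len(confusion)
    if any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square")
    probs = []
    for row in confusion:
        total = sum(row)
        if total == 0:
            raise ValueError("every class needs at least one trial")
        probs.append([c / total for c in row])
    rdm = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            similarity = 0.5 * (probs[i][j] + probs[j][i])
            rdm[i][j] = rdm[j][i] = 1.0 - similarity
    return rdm
```

Classes that the classifier often confuses end up close together, and classes that are never confused sit at the maximum dissimilarity of 1. This sketch discards the diagonal of the confusion matrix, whereas the abstract reports using it as a self-similarity measure.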

19.
Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian–European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent–child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent–child pairs was largely due to intermarriage. The parent–child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.

20.
Sarcoidosis is a granulomatous inflammatory disorder of complex etiology with significant linkage to chromosome 5, and marginal linkage was observed to five other chromosomes in African Americans (AAs) in our previously published genome scan. Because genetic factors underlying complex disease are often population specific, genetic analysis of samples with diverse ancestry (i.e., ethnic confounding) can lead to loss of power. Ethnic confounding is often addressed by stratifying on self-reported race, a controversial and less-than-perfect construct. Here, we propose linkage analysis stratified by genetically determined ancestry as an alternative approach for reducing ethnic confounding. Using data from the 380 microsatellite markers genotyped in the aforementioned genome scan, we clustered AA families into subpopulations on the basis of ancestry similarity. Evidence of two genetically distinct groups was found: subpopulation one (S1) comprised 219 of the 229 families, subpopulation two (S2) consisted of six families (the remaining four families were a mixture). Stratified linkage results suggest that only the S1 families contributed to previously identified linkage signals at 1p22, 3p21-14, 11p15, and 17q21 and that only the S2 families contributed to those found at 5p15-13 and 20q13. Signals on 2p25, 5q11, 5q35, and 9q34 remained significant in both subpopulations, and evidence of a new susceptibility locus at 2q37 was found in S2. These results demonstrate the usefulness of stratifying on genetically determined ancestry, to create genetically homogeneous subsets--more reliable and less controversial than race-stratified subsets--in which to identify genetic factors. Our findings support the presence of sarcoidosis-susceptibility genes in regions identified elsewhere but indicate that these genes are likely to be ancestry specific.
