Similar Documents
20 similar documents found (search time: 31 ms)
1.
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries—such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”—with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
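The abstract does not spell out the test itself. As an illustration only, here is a minimal per-query log-likelihood-ratio sketch in Python, assuming the beacon's "yes" probabilities under the null hypothesis (individual absent) and the alternative (individual present), `p_null` and `p_alt`, are known quantities; the actual test in the article is frequency-free and more refined than this sketch.

```python
import math

def beacon_lrt(responses, p_null, p_alt):
    """Log-likelihood ratio for beacon membership.

    responses: list of 0/1 beacon answers to queries for alleles
               the target individual is known to carry.
    p_null:    assumed probability of a "yes" if the individual is
               NOT in the beacon (depends on beacon size).
    p_alt:     assumed probability of a "yes" if the individual IS
               in the beacon (near 1; below 1 only via errors).
    """
    llr = 0.0
    for x in responses:
        llr += x * math.log(p_alt / p_null)
        llr += (1 - x) * math.log((1 - p_alt) / (1 - p_null))
    return llr  # large positive values favor membership

# toy example: 10 queries, 9 answered "yes"
lam = beacon_lrt([1] * 9 + [0], p_null=0.6, p_alt=0.97)
```

A threshold on the statistic would then be set to achieve the desired false-positive rate.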

2.
One concern in human genetics research is maintaining the privacy of study participants. The growth in genealogical registries may contribute to loss of privacy, given that genotypic information is accessible online to facilitate discovery of genetic relationships. Through iterative use of two such web archives, FamilySearch and Sorenson Molecular Genealogy Foundation, I was able to discern the likely haplotypes for the Y chromosomes of two men, Joseph Smith and Brigham Young, who were instrumental in the founding of the Latter-Day Saints Church. I then determined whether any of the Utahns who contributed to the HapMap project (the “CEU” set) is related to either man, on the basis of haplotype analysis of the Y chromosome. Although none of the CEU contributors appear to be a male-line relative, I discovered that predictions could be made for the surnames of the CEU participants by a similar process. For 20 of the 30 unrelated CEU samples, at least one exact match was revealed, and for 17 of these, a potential ancestor from Utah or a neighboring state could be identified. For the remaining ten samples, a match was nearly perfect, typically deviating by only one marker repeat unit. The same query performed in two other large databases revealed fewer individual matches and helped to clarify which surname predictions are more likely to be correct. Because large data sets of genotypes from both consenting research subjects and individuals pursuing genetic genealogy will be accessible online, this type of triangulation between databases may compromise the privacy of research subjects.  相似文献   

3.
Human T-cell lymphotropic virus type 1 (HTLV-1) is mainly associated with two diseases: tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM) and adult T-cell leukaemia/lymphoma. This retrovirus infects 5 to 10 million individuals throughout the world. Previously, we developed a database that annotates sequence data from GenBank and the present study aimed to describe the clinical, molecular and epidemiological scenarios of HTLV-1 infection through the stored sequences in this database. A total of 2,545 registered complete and partial sequences of HTLV-1 were collected and 1,967 (77.3%) of those sequences represented unique isolates. Among these isolates, 93% contained geographic origin information and only 39% were related to any clinical status. A total of 1,091 sequences contained information about the geographic origin and viral subtype and 93% of these sequences were identified as subtype “a”. Ethnicity data are very scarce. Regarding clinical status data, 29% of the sequences were generated from TSP/HAM and 67.8% from healthy carrier individuals. Although the data mining enabled some inferences about specific aspects of HTLV-1 infection to be made, due to the relative scarcity of data of available sequences, it was not possible to delineate a global scenario of HTLV-1 infection.

4.
Neuroimaging activation maps typically color voxels to indicate whether the blood oxygen level-dependent (BOLD) signals measured among two or more experimental conditions differ significantly at that location. This data presentation, however, omits information critical for interpretation of experimental results. First, no information is represented about trends at voxels that do not pass the statistical test. Second, no information is given about the range of probable effect sizes at voxels that do pass the statistical test. This leads to a fundamental error in interpreting activation maps by naïve viewers, where it is assumed that colored, “active” voxels are reliably different from uncolored “inactive” voxels. In other domains, confidence intervals have been added to data graphics to reduce such errors. Here, we first document the prevalence of the fundamental error of interpretation, and then present a method for solving it by depicting confidence intervals in fMRI activation maps. Presenting images where the bounds of confidence intervals at each voxel are coded as color allows readers to visually test for differences between “active” and “inactive” voxels, and permits more appropriate interpretation of neuroimaging data. Our specific graphical methods are intended as initial proposals to spur broader discussion of how to present confidence intervals for fMRI data.

5.
Genotype-Imputation Accuracy across Worldwide Human Populations (total citations: 2; self-citations: 0; citations by others: 2)
A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the “portability” of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified “optimal” mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial “SNP chip,” again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.
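As a small illustration of how accuracy on randomly masked genotypes might be scored, here is a simple concordance function in Python. The study's exact accuracy measure may differ, so treat this as an assumption rather than the authors' metric.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes recovered exactly.

    Genotypes are compared as opaque labels (e.g. "AA", "AG");
    this is the simplest possible concordance measure.
    """
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)

# one of four masked genotypes imputed incorrectly -> 0.75
acc = imputation_accuracy(["AA", "AG", "GG", "AA"],
                          ["AA", "AG", "AG", "AA"])
```

Per-population averages of such a score would then allow the panel-mixture comparisons the abstract describes.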

6.
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

7.
Most societies prohibit some market transactions based on moral concerns, even when the exchanges would benefit the parties involved and would not create negative externalities. A prominent example is given by payments for human organs for transplantation, banned virtually everywhere despite long waiting lists and many deaths of patients who cannot find a donor. Recent research, however, has shown that individuals significantly increase their stated support for a regulated market for human organs when provided with information about the organ shortage and the potential beneficial effects of a price mechanism. In this study we focused on payments for human organs and on another “repugnant” transaction, indoor prostitution, to address two questions: (A) Does providing general information on the welfare properties of prices and markets modify attitudes toward repugnant trades? (B) Does additional knowledge on the benefits of a price mechanism in a specific context affect attitudes toward price-based transactions in another context? By answering these questions, we can assess whether eliciting a market-oriented approach may lead to a relaxation of moral opposition to markets, and whether there is a cross-effect of information, in particular for morally controversial activities that, although different, share a reference to the “commercialization” of the human body. Relying on an online survey experiment with 5,324 U.S. residents, we found no effect of general information about market efficiency, consistent with morally controversial markets being accepted only when they are seen as a solution to a specific problem. We also found some cross-effects of information about a transaction on the acceptance of the other; however, the responses were mediated by the gender and (to a lesser extent) religiosity of the respondent—in particular, women exposed to information about legalizing prostitution reduced their stated support for regulated organ payments. We relate these findings to prior research and discuss implications for public policy.

8.
Intraspecific variability (IV) has been proposed to explain species coexistence in diverse communities. Assuming, sometimes implicitly, that conspecific individuals can perform differently in the same environment and that IV increases niche overlap, previous studies have found contrasting results regarding the effect of IV on species coexistence. We aim to show that the large IV observed in data does not mean that conspecific individuals are necessarily different in their response to the environment, and that the role of high-dimensional environmental variation in determining IV has largely remained unexplored in forest plant communities. We first used a simulation experiment where an individual attribute is derived from a high-dimensional model, representing “perfect knowledge” of individual response to the environment, to illustrate how large observed IV can result from “imperfect knowledge” of the environment. Second, using growth data from clonal Eucalyptus plantations in Brazil, we estimated a major contribution of the environment in determining individual growth. Third, using tree growth data from long-term tropical forest inventories in French Guiana, Panama and India, we showed that tree growth in tropical forests is structured spatially and that despite a large observed IV at the population level, conspecific individuals perform more similarly at a local scale than heterospecific individuals do. As the number of environmental dimensions that are well quantified at fine scale is generally lower than the actual number of dimensions influencing individual attributes, a great part of observed IV might be represented as random variation across individuals when in fact it is environmentally driven. This misrepresentation has important consequences for inference about community dynamics. We emphasize that observed IV does not necessarily impact species coexistence per se but can reveal species response to a high-dimensional environment, which is consistent with niche theory and the observation of the many differences between species in nature.

9.
Privacy laws are intended to preserve human well-being and improve medical outcomes. We used the Sportstats website, a repository of competitive athletic data, to test how easily these laws can be circumvented. We designed a haphazard, unrepresentative case-series analysis and applied unscientific methods based on an Internet connection and idle time. We found it both feasible and titillating to breach anonymity, stockpile personal information and generate misquotations. We extended our methods to snoop on celebrities, link to outside databases and uncover refusal to participate. Throughout our study, we evaded capture and public humiliation despite violating these 6 privacy fundamentals. We suggest that the legitimate principle of safeguarding personal privacy is undermined by the natural human tendency toward showing off.
We are shocked! Shocked! Shocked! We are shocked at the amount of sensitive personal information being released on thousands of Canadians, including some of our country's most prominent citizens. The widespread dispersal of and the easy access to health data offends our sensibilities as medical scientists who are respectful of Canadian privacy laws. We prefer to jump through innumerable bureaucratic hoops to obtain data for research, and we believe that our rivals in other scientific fields ought to do the same.
We uphold traditional values. We reminisce about the golden age when conducting a chart review was the standard for measuring quality of care. Ethics submissions were like sustained foreplay, and privacy impact assessments provided another thrill verging on “joy of the forbidden.” The 3-week turnarounds gave us time to savour and appreciate every passing minute. And joy! Even more delays occurred when health records departments could not find the relevant charts.
Woe unto those who visit the Sportstats website (www.sportstats.ca) [1]. This site reveals personal data obtained from timers affixed to athletes competing in sporting events across North America. This database is thorough and is searchable for many past years. In fact, we recommend using these data if you need personal information about your neighbour, nemesis or boss. In this article, we offer pointers on 6 violations of privacy for those mavericks who flout the scientific establishment (not us!).

10.
Avoidance behavior is a critical component of many psychiatric disorders, and as such, it is important to understand how avoidance behavior arises, and whether it can be modified. In this study, we used empirical and computational methods to assess the role of informational feedback and ambiguous outcome in avoidance behavior. We adapted a computer-based probabilistic classification learning task, which includes positive, negative and no-feedback outcomes; the latter outcome is ambiguous as it might signal either a successful outcome (missed punishment) or a failure (missed reward). Prior work with this task suggested that most healthy subjects viewed the no-feedback outcome as strongly positive. Interestingly, in a later version of the classification task, when healthy subjects were allowed to opt out of (i.e. avoid) responding, some subjects (“avoiders”) reliably avoided trials where there was a risk of punishment, but other subjects (“non-avoiders”) never made any avoidance responses at all. One possible interpretation is that the “non-avoiders” valued the no-feedback outcome so positively on punishment-based trials that they had little incentive to avoid. Another possible interpretation is that the outcome of an avoided trial is unspecified and that lack of information is aversive, decreasing subjects’ tendency to avoid. To examine these ideas, we here tested healthy young adults on versions of the task where avoidance responses either did or did not generate informational feedback about the optimal response. Results showed that provision of informational feedback decreased avoidance responses and also decreased categorization performance, without significantly affecting the percentage of subjects classified as “avoiders.” To better understand these results, we used a modified Q-learning model to fit individual subject data. Simulation results suggest that subjects in the feedback condition adjusted their behavior faster following better-than-expected outcomes, compared to subjects in the no-feedback condition. Additionally, in both task conditions, “avoiders” adjusted their behavior faster following worse-than-expected outcomes, and treated the ambiguous no-feedback outcome as less rewarding, compared to non-avoiders. Together, results shed light on the important role of ambiguous and informative feedback in avoidance behavior.
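The modified Q-learning model is described only at a high level above. A minimal sketch of an asymmetric Q-value update with separate learning rates for better- and worse-than-expected outcomes (the parameter names `lr_pos` and `lr_neg` are assumptions, not the authors' notation) might look like this:

```python
def q_update(q, reward, lr_pos, lr_neg):
    """One Q-learning step with asymmetric learning rates.

    q:       current value estimate for the chosen action
    reward:  observed outcome on this trial
    lr_pos:  learning rate when the outcome beats expectation
    lr_neg:  learning rate when the outcome falls short
    """
    delta = reward - q          # prediction error
    lr = lr_pos if delta > 0 else lr_neg
    return q + lr * delta

# better-than-expected outcome moves the estimate up faster
q_up = q_update(0.5, 1.0, lr_pos=0.3, lr_neg=0.1)   # 0.5 -> 0.65
q_dn = q_update(0.5, 0.0, lr_pos=0.3, lr_neg=0.1)   # 0.5 -> 0.45
```

Fitting `lr_pos` and `lr_neg` per subject (plus a subjective value for the ambiguous no-feedback outcome) would reproduce the kinds of group comparisons reported above.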

11.
“Informed consent” sets a goal for investigators experimenting with human subjects, but little is known about how to achieve or evaluate it in an experiment. In a 3-year, double-blind study with incarcerated men, we attempted to provide a “free and informed consent” and evaluated our efforts with an unannounced questionnaire administered to subjects after they completed the experiment. At that time, approximately two-thirds had sufficient information for an informed consent, but only one-third was well informed about all key aspects of the experiment and one-third was insufficiently informed to give an informed consent. We found that institution- or study-based coercion was minimal in our experiment. From our evaluation of the questionnaire and experience at the study institution, we conclude that an experiment with human subjects should be designed to include an ongoing evaluation of informed consent, and active attempts should be made to avoid or minimize coercive inducements. Experiments with significant risk, which require a long duration and/or large sample size relative to the institution's population, should probably not be performed on prisoner subjects. The experimenter should be independent of the penal institution's power structure. Presenting and explaining a consent form to volunteers on one occasion is probably an inadequate procedure for obtaining and maintaining an informed consent.

12.
We employed a multi-scale clustering methodology known as “data cloud geometry” to extract functional connectivity patterns derived from a functional magnetic resonance imaging (fMRI) protocol. The method was applied to correlation matrices of 106 regions of interest (ROIs) in 29 individuals with autism spectrum disorders (ASD), and 29 individuals with typical development (TD) while they completed a cognitive control task. Connectivity clustering geometry was examined at both “fine” and “coarse” scales. At the coarse scale, the connectivity clustering geometry produced 10 valid clusters with a coherent relationship to neural anatomy. A supervised learning algorithm employed fine scale information about clustering motif configurations and prevalence, and coarse scale information about intra- and inter-regional connectivity; the algorithm correctly classified ASD and TD participants with sensitivity of and specificity of . Most of the predictive power of the logistic regression model resided at the level of the fine-scale clustering geometry, suggesting that cellular versus systems level disturbances are more prominent in individuals with ASD. This article provides validation for this multi-scale geometric approach to extracting brain functional connectivity pattern information and for its use in classification of ASD.

13.
CommunityRx (CRx), an information technology intervention, provides patients with a personalized list of healthful community resources (HealtheRx). In repeated clinical studies, nearly half of those who received clinical “doses” of the HealtheRx shared their information with others (“social doses”). Clinical trial design cannot fully capture the impact of information diffusion, which can act as a force multiplier for the intervention. Furthermore, experimentation is needed to understand how intervention delivery can optimize social spread under varying circumstances. To study information diffusion from CRx under varying conditions, we built an agent-based model (ABM). This study describes the model building process and illustrates how an ABM provides insight about information diffusion through in silico experimentation. To build the ABM, we constructed a synthetic population (“agents”) using publicly-available data sources. Using clinical trial data, we developed empirically-informed processes simulating agent activities, resource knowledge evolution and information sharing. Using RepastHPC and chiSIM software, we replicated the intervention in silico, simulated information diffusion processes, and generated emergent information diffusion networks. The CRx ABM was calibrated using empirical data to replicate the CRx intervention in silico. We used the ABM to quantify information spread via social versus clinical dosing, then conducted information diffusion experiments, comparing the social dosing effect of the intervention when delivered by physicians, nurses or clinical clerks. The synthetic population (N = 802,191) exhibited diverse behavioral characteristics, including activity and knowledge evolution patterns. In silico delivery of the intervention was replicated with high fidelity. Large-scale information diffusion networks emerged among agents exchanging resource information. Varying the propensity for information exchange resulted in networks with different topological characteristics. Community resource information spread via social dosing was nearly 4-fold that from clinical dosing alone and did not vary by delivery mode. This study, using CRx as an example, demonstrates the process of building and experimenting with an ABM to study information diffusion from, and the population-level impact of, a clinical information-based intervention. While the focus of the CRx ABM is to recreate the CRx intervention in silico, the general process of model building and computational experimentation presented is generalizable to other large-scale ABMs of information diffusion.

14.
15.
Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches: for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums.

16.
Detecting gene-gene interaction in complex diseases has become an important priority for common disease genetics, but most current approaches to detecting interaction start with disease-marker associations. These approaches are based on population allele frequency correlations, not genetic inheritance, and therefore cannot exploit the rich information about inheritance contained within families. They are also hampered by issues of rigorous phenotype definition, multiple test correction, and allelic and locus heterogeneity. We recently developed, tested, and published a powerful gene-gene interaction detection strategy based on conditioning family data on a known disease-causing allele or a disease-associated marker allele [4]. We successfully applied the method to disease data and used computer simulation to exhaustively test the method for some epistatic models. We knew that the statistic we developed to indicate interaction was less reliable when applied to more-complex interaction models. Here, we improve the statistic and expand the testing procedure. We computer-simulated multipoint linkage data for a disease caused by two interacting loci. We examined epistatic as well as additive models and compared them with heterogeneity models. In all our models, the at-risk genotypes are “major” in the sense that among affected individuals, a substantial proportion has a disease-related genotype. One of the loci (A) has a known disease-related allele (as would have been determined from a previous analysis). We removed (pruned) family members who did not carry this allele; the resultant dataset is referred to as “stratified.” This elimination step has the effect of raising the “penetrance” and detectability at the second locus (B). We used the lod scores for the stratified and unstratified data sets to calculate a statistic that either indicated the presence of interaction or indicated that no interaction was detectable. We show that the new method is robust and reliable for a wide range of parameters. Our statistic performs well both with the epistatic models (false negative rates, i.e., failing to detect interaction, ranging from 0 to 2.5%) and with the heterogeneity models (false positive rates, i.e., falsely detecting interaction, ≤1%). It works well with the additive model except when allele frequencies at the two loci differ widely. We explore those features of the additive model that make detecting interaction more difficult. All testing of this method suggests that it provides a reliable approach to detecting gene-gene interaction.

17.
18.
Maintaining privacy in network data publishing is a major challenge. This is because known characteristics of individuals can be used to extract new information about them. Recently, researchers have developed privacy methods based on k-anonymity and l-diversity to prevent re-identification or sensitive label disclosure through certain structural information. However, most of these studies have considered only structural information and have been developed for undirected networks. Furthermore, most existing approaches rely on generalization and node clustering and so may entail significant information loss, as all properties of all members of each group are generalized to the same value. In this paper, we introduce a framework for protecting the sensitive attribute, degree (the number of connected entities), and relationships, as well as the presence of individuals, in directed social network data whose nodes contain attributes. First, we define a privacy model that specifies privacy requirements for the above private information. Then, we introduce the technique of Ambiguity in Social Network data (ASN) based on anatomy, which specifies how to publish social network data. To employ ASN, individuals are partitioned into groups. Then, ASN publishes exact values of properties of individuals of each group with a common group ID in several tables. The lossy join of those tables based on group ID injects uncertainty into reconstruction of the original network. We also show how to measure different privacy requirements in ASN. Simulation results on real and synthetic datasets demonstrate that our framework, which protects from four types of private information disclosure, preserves data utility in tabular, topological and spectral aspects of networks at a satisfactory level.
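As a rough illustration of anatomy-style publication (not the full ASN technique, which additionally constrains how groups are composed to meet its privacy requirements), the following Python sketch partitions records into fixed-size groups and publishes identities and sensitive attributes in separate tables linked only by a group ID, so that joining them back is lossy:

```python
def anatomize(records, group_size):
    """Publish (node_id, sensitive_attr) pairs anatomy-style.

    records:    list of (node_id, sensitive_attr) tuples
    group_size: number of individuals per published group

    Returns two tables that share only a group ID; within a group,
    any identity could plausibly match any attribute, which is the
    source of the ambiguity.
    """
    id_table, attr_table = [], []
    for g, start in enumerate(range(0, len(records), group_size)):
        for node_id, attr in records[start:start + group_size]:
            id_table.append((node_id, g))
            attr_table.append((g, attr))
    return id_table, attr_table

ids, attrs = anatomize(
    [("n1", "flu"), ("n2", "none"), ("n3", "flu"), ("n4", "none")],
    group_size=2)
```

Here each group of two yields two equally plausible identity-to-attribute matchings, so an adversary who knows a node's identity cannot pin down its attribute beyond the group's distribution.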

19.
The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample and then to impute the rest of the study sample, using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the “most diverse reference panel”, defined as the subset with the maximal “phylogenetic diversity”, thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can substantially improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data.
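Phylogenetic diversity is defined on a tree built from the genotypes, which the abstract does not detail. As a stand-in illustration only, the following Python sketch uses a greedy farthest-point heuristic over a pairwise genotype-distance matrix to pick a spread-out panel (it assumes `k >= 2` and is not the authors' exact algorithm):

```python
def diverse_panel(dist, k):
    """Greedily pick k individuals with large pairwise distances.

    dist: symmetric matrix dist[i][j] of genotype distances
    k:    panel size, assumed >= 2

    A farthest-point heuristic: seed with the most distant pair,
    then repeatedly add the individual farthest from its nearest
    already-chosen panel member.
    """
    n = len(dist)
    # seed: the single most distant pair
    best = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: dist[ij[0]][ij[1]])
    panel = list(best)
    while len(panel) < k:
        rest = [i for i in range(n) if i not in panel]
        nxt = max(rest, key=lambda i: min(dist[i][j] for j in panel))
        panel.append(nxt)
    return sorted(panel)

# four individuals at "positions" 0, 2, 5, 6 on a line
v = [0, 2, 5, 6]
dist = [[abs(a - b) for b in v] for a in v]
panel = diverse_panel(dist, 3)  # picks the spread-out subset
```

On this toy matrix the heuristic keeps the two extremes and the point farthest from both, mirroring the intuition that a diverse panel should span the range of genotypes in the sample.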

20.
Oligonucleotide microarrays are commonly adopted for detecting and quantifying the abundance of molecules in biological samples. Analysis of microarray data starts with recording and interpreting hybridization signals from CEL images. However, many CEL images may be blemished by noise from various sources, observed as “bright spots”, “dark clouds”, and “shadowy circles”, etc. It is crucial that these image defects are correctly identified and properly processed. Existing approaches mainly focus on detecting defect areas and removing affected intensities. In this article, we propose to use a mixed effect model for imputing the affected intensities. The proposed imputation procedure is a single-array-based approach which does not require any biological replicate or between-array normalization. We further examine its performance by using Affymetrix high-density SNP arrays. The results show that this imputation procedure significantly reduces genotyping error rates. We also discuss the necessary adjustments for its potential extension to other oligonucleotide microarrays, such as gene expression profiling. The R source code for the implementation of the approach is freely available upon request.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)