Similar Literature
1.
High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.
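As a rough illustration of the clustering step such algorithms perform, the sketch below groups two-channel allele intensities into three genotype clusters and leaves low-confidence points as no-calls. The Gaussian-mixture model, confidence threshold, and simulated data are illustrative assumptions, not the authors' Perl implementation.

```python
# Minimal sketch of SNP genotype clustering on two-channel intensities.
# Assumes three clusters (AA, AB, BB); an illustration, not the authors' tool.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated allele-A / allele-B fluorescence intensities for 300 samples.
aa = rng.normal([2.0, 0.2], 0.15, size=(100, 2))
ab = rng.normal([1.0, 1.0], 0.15, size=(100, 2))
bb = rng.normal([0.2, 2.0], 0.15, size=(100, 2))
intensities = np.vstack([aa, ab, bb])

gmm = GaussianMixture(n_components=3, random_state=0).fit(intensities)
posterior = gmm.predict_proba(intensities)

# Call a genotype only when the posterior is confident; leave the rest
# as no-calls, which a complementary algorithm might later rescue.
calls = np.where(posterior.max(axis=1) > 0.95, posterior.argmax(axis=1), -1)
print(f"called: {(calls >= 0).sum()}, no-calls: {(calls < 0).sum()}")
```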

2.
Protein-protein docking plays an important role in the computational prediction of the complex structure formed by two proteins. Over the years, a variety of docking algorithms have been developed, as witnessed by the Critical Assessment of PRediction of Interactions (CAPRI) experiments. Despite their successes, however, many docking algorithms require a series of manual operations such as modeling structures from sequences, incorporating biological information, and selecting final models. The difficulty of these manual steps has significantly limited the application of protein-protein docking, as most users in the community are not docking experts. Automated docking, such as a web server that performs comparably to a human docking protocol, is therefore urgently needed. Accordingly, we have participated in the blind CAPRI experiments for Rounds 38-45 and the CASP13-CAPRI challenge for Round 46 with both our HDOCK automated docking web server and our human docking protocol. The HDOCK server achieved an "acceptable" or higher CAPRI-rated model among the top 10 submitted predictions for 65.5% and 59.1% of the targets in the CAPRI and CASP13-CAPRI docking experiments, respectively, comparable to 66.7% and 54.5% for the human docking protocol. Similar trends were observed in the scoring experiments. These results validate the HDOCK server as an efficient automated docking protocol for nonexpert users. Challenges and opportunities of automated docking are also discussed.

3.
Human behaviour is highly individual by nature, yet statistical structures emerge that seem to govern the actions of human beings collectively. Here we search for universal statistical laws dictating the timing of human actions in communication decisions. We focus on the distribution of the time interval between messages in human broadcast communication, as documented on Twitter, and study a collection of over 160,000 tweets for three user categories: personal (controlled by one person), managed (typically controlled by a PR agency) and bot-controlled (an automated system). To test our hypothesis, we investigate whether it is possible to differentiate between user types based on tweet timing behaviour alone, independently of message content. For this purpose, we developed a system to process a large number of tweets for reality mining and implemented two simple probabilistic inference algorithms: (1) a naive Bayes classifier, which distinguishes between two and three account categories with classification accuracies of 84.6% and 75.8%, respectively, and (2) an algorithm that predicts the timing of a user's next tweet. Our results show that we can reliably distinguish between the three user categories and predict the distribution of a user's inter-message time with reasonable accuracy. More importantly, we identify a characteristic power-law decrease in the tail of the inter-message time distribution for human users that differs from the distributions obtained for managed and automated accounts. This result is evidence of a universal law that permeates the timing of human decisions in broadcast communication, and it extends the findings of several previous studies of peer-to-peer communication.
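A minimal sketch of the first algorithm's idea, classifying accounts from inter-tweet timing alone; the Gaussian naive Bayes on two summary features and the simulated accounts are assumptions for illustration, not the paper's exact feature set.

```python
# Sketch: classify accounts from inter-tweet times with a naive Bayes
# model, in the spirit of the study; data and features are simulated.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

def features(intervals):
    """Summary features of one account's inter-message times (seconds)."""
    logs = np.log(intervals)
    return [logs.mean(), logs.std()]

# Simulated accounts: humans ~ heavy-tailed (lognormal), bots ~ regular.
X, y = [], []
for _ in range(200):
    X.append(features(rng.lognormal(mean=7.0, sigma=2.0, size=500))); y.append(0)  # personal
    X.append(features(rng.normal(3600, 60, size=500).clip(min=1.0))); y.append(1)  # bot

clf = GaussianNB().fit(X[:300], y[:300])
print("held-out accuracy:", clf.score(X[300:], y[300:]))
```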

4.
Detailed data on individual animals are critical to ecological and evolutionary studies, but attaching identifying marks can alter individual fates and behavior, leading to biased parameter estimates and ethical issues. Individual-recognition software has been developed to assist in identifying many species from non-invasive photographic data. These programs use algorithms to find unique individual characteristics and compare images to a catalogue of known individuals. Currently, all applications for individual identification require manual processing to crop images so that only the area of interest remains, or the area of interest must be manually delineated in each image. Thus, one of the main bottlenecks in processing data from photographic capture-recapture surveys is cropping to an area of interest so that matching algorithms can identify the individual. Here, we describe the development and testing of an automated cropping program. The methods and techniques we describe are broadly applicable to any system in which raw photos must be cropped to a specific area of interest before pattern-recognition software can be used for individual identification. We developed and tested the program for use with identification photos of wild giraffes.

5.
Autofocus is an important step in automated screening of C. elegans. In an optical microscope system, images of the same field of view are acquired at different focal planes, a sharpness evaluation function is computed on each image, and the plane giving the maximum value is taken as the best focus position. In this study, we evaluated 16 commonly used autofocus algorithms together with several recently proposed ones, in order to identify the algorithm best suited to images of lipid droplets in C. elegans and thereby build an automated lipid-droplet screening system for the worm. We analyzed focus accuracy, computation time, noise robustness, and the shape of the focus curve. The results show that most algorithms perform well on C. elegans lipid-droplet images; in particular, the absolute Tenengrad algorithm achieves the best focus accuracy, and we therefore selected it for our automated lipid-droplet screening system.
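A minimal sketch of the absolute Tenengrad measure in one common formulation (sum of absolute Sobel gradients), used to select the sharpest plane in a focal stack; the toy image stack below is simulated.

```python
# Sketch of the (absolute) Tenengrad focus measure used to pick the
# best focal plane from an image stack; a simplified illustration.
import numpy as np
from scipy import ndimage

def tenengrad_abs(image):
    """Absolute Tenengrad: sum of |Gx| + |Gy| over Sobel gradients."""
    gx = ndimage.sobel(image.astype(float), axis=1)
    gy = ndimage.sobel(image.astype(float), axis=0)
    return np.sum(np.abs(gx) + np.abs(gy))

def best_focus(stack):
    """Return the index of the sharpest image in a z-stack."""
    scores = [tenengrad_abs(img) for img in stack]
    return int(np.argmax(scores))

# Toy stack: blur increases with distance from the 'true' focal plane 2.
rng = np.random.default_rng(2)
sharp = rng.random((64, 64))
stack = [ndimage.gaussian_filter(sharp, abs(z - 2)) for z in range(5)]
print("best focal plane:", best_focus(stack))  # -> 2
```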

6.
The accuracy of the vast amount of genotypic information generated by high-throughput genotyping technologies is crucial in haplotype analyses and linkage-disequilibrium mapping for complex diseases. To date, most automated programs lack quality measures for the allele calls; therefore, human interventions, which are both labor intensive and error prone, have to be performed. Here, we propose a novel genotype clustering algorithm, GeneScore, based on a bivariate t-mixture model, which assigns each data point a set of probabilities of belonging to the candidate genotype clusters. Furthermore, we describe an expectation-maximization (EM) algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices (called "GenoSpectrum") as inputs. Combining these two model-based algorithms, we can perform haplotype inference directly on raw readouts from a genotyping machine, such as the TaqMan assay. By using both simulated and real data sets, we demonstrate the advantages of our probabilistic approach over the current genotype scoring methods, in terms of both the accuracy of haplotype inference and the statistical power of haplotype-based association analyses.
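The sketch below illustrates the probabilistic output at the heart of this approach: each intensity point receives a posterior probability for every genotype cluster rather than a hard call. The cluster centers, scale, and degrees of freedom are fixed, made-up values here, whereas GeneScore fits them with an EM t-mixture.

```python
# Sketch of a probabilistic genotype matrix ("GenoSpectrum"-style output):
# every sample gets a probability for each genotype cluster.
import numpy as np
from scipy.stats import multivariate_t

centers = {"AA": [2.0, 0.2], "AB": [1.0, 1.0], "BB": [0.2, 2.0]}

def genotype_posteriors(points, df=4.0, scale=0.05):
    """Posterior probability of each genotype for each 2-D intensity point."""
    dens = np.column_stack([
        multivariate_t(loc=c, shape=scale * np.eye(2), df=df).pdf(points)
        for c in centers.values()
    ])
    return dens / dens.sum(axis=1, keepdims=True)

points = np.array([[1.9, 0.3], [1.1, 0.9], [0.6, 1.5]])
for p, row in zip(points, genotype_posteriors(points)):
    print(p, dict(zip(centers, row.round(3))))
```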

7.
Autonomous acoustic recorders are an increasingly popular method for low-disturbance, large-scale monitoring of sound-producing animals, such as birds, anurans, bats, and other mammals. A specialized use of autonomous recording units (ARUs) is acoustic localization, in which a vocalizing animal is located spatially, usually by quantifying the time delay of arrival of its sound at an array of time-synchronized microphones. To describe trends in the literature, identify considerations for field biologists who wish to use these systems, and suggest advancements that will improve the field of acoustic localization, we comprehensively review published applications of wildlife localization in terrestrial environments. We describe the wide variety of methods used to complete the five steps of acoustic localization: (1) define the research question, (2) obtain or build a time-synchronizing microphone array, (3) deploy the array to record sounds in the field, (4) process recordings captured in the field, and (5) determine animal location using position estimation algorithms. We find eight general purposes in ecology and animal behavior for localization systems: assessing individual animals' positions or movements, localizing multiple individuals simultaneously to study their interactions, determining animals' individual identities, quantifying sound amplitude or directionality, selecting subsets of sounds for further acoustic analysis, calculating species abundance, inferring territory boundaries or habitat use, and separating animal sounds from background noise to improve species classification. We find that the labor-intensive steps of processing recordings and estimating animal positions have not yet been automated. In the near future, we expect that increased availability of recording hardware, development of automated and open-source localization software, and improvement of automated sound classification algorithms will broaden the use of acoustic localization. With these three advances, ecologists will be better able to embrace acoustic localization, enabling low-disturbance, large-scale collection of animal position data.
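A minimal sketch of step (5), assuming microphone positions, the speed of sound, and the measured delays are known; a coarse grid search stands in for the published position-estimation algorithms.

```python
# Sketch of position estimation from time delays of arrival (TDOA):
# grid-search the location whose predicted delays best match the
# measured ones. Mic positions and sound speed are illustrative.
import numpy as np

SPEED = 343.0  # m/s
mics = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0], [50.0, 50.0]])

def tdoas(source):
    """Delays of arrival at each mic relative to mic 0."""
    d = np.linalg.norm(mics - source, axis=1) / SPEED
    return d[1:] - d[0]

true_source = np.array([31.0, 18.0])
measured = tdoas(true_source)  # in practice: cross-correlation of recordings

# Coarse grid search minimizing squared TDOA mismatch.
xs, ys = np.meshgrid(np.linspace(0, 50, 201), np.linspace(0, 50, 201))
grid = np.column_stack([xs.ravel(), ys.ravel()])
errs = [np.sum((tdoas(p) - measured) ** 2) for p in grid]
print("estimated position:", grid[int(np.argmin(errs))])  # ~ [31, 18]
```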

8.
Multiplex DNA profiles are used extensively for biomedical and forensic purposes. However, while DNA profile data generation is automated, human analysis of those data is not, and the need for speed combined with accuracy demands a computer-automated approach to sample interpretation and quality assessment. In this paper, we describe an integrated mathematical approach to modeling the data and extracting the relevant information, while rejecting noise and sample artifacts. We conclude with examples showing the effectiveness of our algorithms.

9.
This work aimed to combine different segmentation approaches to produce a robust and accurate segmentation result. Three to five segmentations of the left ventricle were combined using the STAPLE algorithm, and the reliability of the resulting segmentation was evaluated against the result of each individual segmentation method. This comparison was first performed using a supervised approach based on a reference method. We then used an unsupervised statistical evaluation, extended Regression Without Truth (eRWT), which ranks different methods according to their accuracy in estimating a specific biomarker in a population. Segmentation accuracy was evaluated by estimating six cardiac function parameters derived from the left-ventricle contour delineation, using a public cardiac cine MRI database. Eight segmentation methods, comprising three expert delineations and five automated methods, were considered, and sixteen STAPLE combinations of the automated methods were investigated. The supervised and unsupervised evaluations demonstrated that in most cases STAPLE provided better estimates than the individual automated segmentation methods. Overall, combining different automated segmentation methods improved the reliability of the segmentation relative to any individual method and could approach the accuracy of an expert.
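A simplified sketch of the STAPLE idea for binary segmentations: EM alternates between a consensus probability map and per-method sensitivity/specificity estimates. This toy version on 1-D pixel vectors illustrates the principle, not the published algorithm.

```python
# Simplified binary STAPLE sketch: EM jointly estimates each method's
# sensitivity p and specificity q plus a consensus probability per pixel.
import numpy as np

def staple(D, n_iter=30, prior=0.5):
    """D: (n_raters, n_pixels) binary segmentations. Returns (W, p, q)."""
    R, N = D.shape
    p = np.full(R, 0.9)  # initial sensitivities
    q = np.full(R, 0.9)  # initial specificities
    for _ in range(n_iter):
        # E-step: posterior W that each pixel is foreground.
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W = a / (a + b)
        # M-step: re-estimate per-method performance.
        p = (D * W).sum(axis=1) / W.sum()
        q = ((1 - D) * (1 - W)).sum(axis=1) / (1 - W).sum()
    return W, p, q

rng = np.random.default_rng(3)
truth = (rng.random(2000) < 0.3).astype(int)
# Three automated 'methods' with different error rates.
D = np.array([np.where(rng.random(2000) < e, 1 - truth, truth)
              for e in (0.05, 0.15, 0.25)])
W, p, q = staple(D)
print("consensus accuracy:", ((W > 0.5).astype(int) == truth).mean())
```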

10.
Valid age-at-death estimation is required in historical as well as forensic anthropology. Tooth cementum annulation (TCA) is a method for age-at-death estimation in adult individuals. The method is based on light-microscope images of tooth-root cross sections: the age is estimated by manually counting the cementum incremental lines and adding this count to the chronological age at the assumed point of tooth eruption. Manual line counting, however, is time consuming and potentially subjective, and the number of individual counts is insufficient for statistical evaluation. Software developed for the automated evaluation of TCA images, using Fourier analysis and algorithms for image analysis and pattern recognition, is presented here. It performs "line-by-line" scanning and counts gray-scale peaks within a selected region of interest (ROI). Each scan of a particular ROI yields up to 400 counts that are subsequently evaluated statistically. This simple and time-saving program seeks to replace manual counting, supply consistent and reproducible results, and reduce human error by eliminating unavoidable factors such as subjectivity and fatigue.
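A minimal sketch of the line-by-line counting idea, with a simulated ROI standing in for a real cross-section image; the peak-prominence threshold is an arbitrary choice.

```python
# Sketch of "line-by-line" counting: within a region of interest, count
# gray-level peaks along each scanline, then evaluate the counts
# statistically. The periodic signal is simulated.
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(4)
true_lines = 40  # simulated number of cementum increments

# ROI of 400 scanlines; each scanline is a noisy periodic gray profile.
x = np.linspace(0, 2 * np.pi * true_lines, 1000)
roi = np.sin(x) + rng.normal(0, 0.4, size=(400, x.size))

counts = [len(find_peaks(line, prominence=0.5)[0]) for line in roi]
print(f"median count: {np.median(counts):.0f} "
      f"(IQR {np.percentile(counts, 25):.0f}-{np.percentile(counts, 75):.0f})")
```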

11.
Advances in dinucleotide-based genetic maps open possibilities for large-scale genotyping at high resolution. The current rate-limiting steps in the use of these dense maps are data interpretation (allele definition), data entry, and statistical calculation. We have recently reported automated allele-identification methods. Here we show that a 10-cM framework map of the human X chromosome can be analyzed on two lanes of an automated sequencer per individual (10–12 loci per lane). We use this map and analysis strategy to generate allele data for an X-linked recessive spastic paraplegia family with a known PLP mutation. We analyzed 198 genotypes in a single gel and used the data to test three methods of data analysis: manual meiotic breakpoint mapping, automated concordance analysis, and whole-chromosome multipoint linkage analysis. All methods pinpointed the correct location of the gene. We propose that multipoint exclusion mapping may permit valid inflation of LOD scores using the quantity max LOD − (next-best LOD).
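For reference, the two-point LOD score underlying the max LOD − (next-best LOD) comparison can be sketched as follows; the recombinant counts are a made-up example.

```python
# Sketch of a two-point LOD score: log10 of the likelihood ratio of
# linkage at recombination fraction theta versus no linkage (theta=0.5).
import numpy as np

def lod(recombinants, nonrecombinants, theta):
    """Two-point LOD score for observed meioses at a given theta."""
    n = recombinants + nonrecombinants
    l_linked = (recombinants * np.log10(theta)
                + nonrecombinants * np.log10(1 - theta))
    l_unlinked = n * np.log10(0.5)
    return l_linked - l_unlinked

# Example: 1 recombinant out of 12 informative meioses.
thetas = np.linspace(0.01, 0.4, 40)
scores = lod(1, 11, thetas)
print(f"max LOD {scores.max():.2f} at theta = {thetas[np.argmax(scores)]:.2f}")
```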

12.
MOTIVATION: Advances in microscopy technology have led to high-throughput microscopes capable of generating several hundred gigabytes of images in a few days. Analyzing such a wealth of data manually is nearly impossible and requires an automated approach. A number of open-source and commercial software packages now allow the user to apply algorithms of varying sophistication to images and extract desired metrics. However, the metrics that can be extracted are severely limited by the specific image-processing algorithms the application implements and by the expertise of the user. In most commercial software, code unavailability prevents the end user from implementing newly developed algorithms better suited to a particular type of imaging assay. While new algorithms can be implemented in open-source software, rewiring an image-processing application requires a high degree of expertise. To obviate these limitations, we have developed an open-source high-throughput application that supports different biological assays, such as cell tracking or ancestry recording, through small, relatively simple image-processing modules connected into sophisticated imaging pipelines. By connecting modules, non-expert users can apply the particular combination of well-established and novel algorithms, developed by us and others, best suited to each assay type. In addition, our data exploration and visualization modules make it easy to discover or select specific cell phenotypes from a heterogeneous population. AVAILABILITY: CellAnimation is distributed under the Creative Commons Attribution-NonCommercial 3.0 Unported license (http://creativecommons.org/licenses/by-nc/3.0/). CellAnimation source code and documentation may be downloaded from www.vanderbilt.edu/viibre/software/documents/CellAnimation.zip. Sample data are available at www.vanderbilt.edu/viibre/software/documents/movies.zip. CONTACT: walter.georgescu@vanderbilt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
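A minimal sketch of the module-pipeline idea, with hypothetical module names that are not CellAnimation's actual API.

```python
# Sketch: small single-purpose image modules chained into an
# assay-specific pipeline. Module names are invented for illustration.
import numpy as np
from scipy import ndimage

def gaussian_smooth(img, sigma=2.0):
    return ndimage.gaussian_filter(img, sigma)

def threshold(img):
    return img > img.mean()

def label_cells(mask):
    labels, _count = ndimage.label(mask)
    return labels

def run_pipeline(image, modules):
    """Feed the image through each module in order."""
    for module in modules:
        image = module(image)
    return image

frame = np.random.default_rng(5).random((128, 128))
labels = run_pipeline(frame, [gaussian_smooth, threshold, label_cells])
print("objects found:", labels.max())
```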

13.
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide-binding preference, which can be captured in prediction algorithms, allowing entire pathogen proteomes to be rapidly scanned for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use these data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions, mainly owing to its ability to generalize even from a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparing newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice, in which tool developers must generate reference predictions themselves, which can lead to underestimating the performance of methods they are less familiar with than their own. The overall goal of this effort is to provide a transparent prediction evaluation that allows bioinformaticians to identify promising features of prediction methods and gives immunologists guidance on the reliability of prediction tools.
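A minimal sketch of matrix-based prediction: a peptide is scored by summing per-position contributions from a position-specific scoring matrix. The matrix values and anchor positions below are invented for illustration.

```python
# Sketch of matrix-based MHC class I binding prediction: additive
# scoring of a 9-mer against a position-specific scoring matrix (PSSM).
import numpy as np

AAS = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(6)
# Hypothetical PSSM: rows = 9 positions, columns = 20 amino acids.
pssm = rng.normal(0, 1, size=(9, 20))
pssm[1, AAS.index("L")] += 3.0  # pretend anchor preference at P2
pssm[8, AAS.index("V")] += 3.0  # and at P9

def score(peptide):
    """Additive PSSM score of a 9-mer; higher = predicted better binder."""
    return sum(pssm[i, AAS.index(aa)] for i, aa in enumerate(peptide))

for pep in ("SLYNTVATV", "GGGGGGGGG"):
    print(pep, round(score(pep), 2))
```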

14.
A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is designed to distinguish humans from machines. Most existing tests require reading distorted text embedded in a background image. However, many existing CAPTCHAs are either too difficult for humans because of excessive distortion or trivial for automated algorithms to solve. These CAPTCHAs also suffer from inherent language and alphabet dependencies and are not equally convenient for people of different demographics. There is therefore a need to devise other Turing tests that mitigate these challenges. One such test is matching two faces to establish whether they belong to the same individual. Utilizing face recognition as the Turing test, we propose FR-CAPTCHA, based on finding matching pairs of human faces in an image. We observe that, compared to existing implementations, FR-CAPTCHA achieves a human accuracy of 94% and is robust against automated attacks.

15.
MOTIVATION: Biologists often employ clustering techniques in the explorative phase of microarray data analysis to discover relevant biological groupings. Given the numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for a given data set or application. While various validation measures have been proposed over the years to judge the quality of the clusters produced by a clustering algorithm, including their biological relevance, a given algorithm can unfortunately perform poorly under one validation measure while outperforming many others under another. Manually synthesizing results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures; an automated and objective way of reconciling the rankings is needed. RESULTS: Using a Monte Carlo cross-entropy algorithm, we combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows a far more objective and automated assessment of clustering results than simple visual inspection. We illustrate the procedure using one simulated and three real gene expression data sets from various platforms, ranking a total of eleven clustering algorithms by a combined examination of 10 different validation measures. Aggregate rankings were obtained both for a given number of clusters k and for an entire range of k. AVAILABILITY: R code for all validation measures and rank aggregation is available from the authors upon request. SUPPLEMENTARY INFORMATION: Supplementary information is available at http://www.somnathdatta.org/Supp/RankCluster/supp.htm.
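A toy sketch of weighted rank aggregation: find the consensus ordering that minimizes a weighted sum of distances to the per-measure rankings. Exhaustive enumeration stands in for the paper's Monte Carlo cross-entropy search, and the rankings and weights are made up.

```python
# Sketch of weighted rank aggregation over validation-measure rankings,
# using the Spearman footrule as the distance criterion.
import numpy as np
from itertools import permutations

algorithms = list(range(5))  # five clustering algorithms
rng = np.random.default_rng(7)
# Rankings of the 5 algorithms under 4 validation measures (best first).
rankings = [list(rng.permutation(algorithms)) for _ in range(4)]
weights = [0.4, 0.3, 0.2, 0.1]  # importance of each measure

def footrule(candidate, ranking):
    """Spearman footrule distance between two orderings."""
    pos_c = {a: i for i, a in enumerate(candidate)}
    pos_r = {a: i for i, a in enumerate(ranking)}
    return sum(abs(pos_c[a] - pos_r[a]) for a in candidate)

def objective(candidate):
    return sum(w * footrule(candidate, r) for w, r in zip(weights, rankings))

# Small enough to enumerate; larger problems need a stochastic search.
best = min(permutations(algorithms), key=objective)
print("consensus ranking:", best)
```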

16.
In a large variety of situations one would like an expressive and accurate model of observed animal or human behavior. While general-purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological fact. Because of ample empirical evidence for reward-based learning in visuomotor tasks, we use a computational model based on the assumption that the observed agent is balancing the costs and benefits of its behavior to meet its goals. This leads to the framework of reinforcement learning, which additionally provides well-established algorithms for learning solutions to visuomotor tasks. To quantify the agent's goals as rewards implicit in the observed behavior, we propose to use inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation: following a path while avoiding obstacles and approaching targets. We show how to recover the component reward weights for the individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. Simulations demonstrate that good estimates can be obtained with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations.
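A toy sketch of the weight-recovery idea: assuming the agent chooses among candidate actions with probability proportional to exp(w·φ), where φ holds per-module features (path following, obstacle avoidance, target approach), the reward weights w can be fit by gradient ascent on the choice likelihood. This softmax-choice model is a simplified stand-in for the paper's modular inverse RL algorithm.

```python
# Sketch: recover modular reward weights from observed choices under a
# softmax-choice model; data and module features are simulated.
import numpy as np

rng = np.random.default_rng(8)
true_w = np.array([1.0, 2.0, 0.5])  # hidden reward weights per module

# 2000 decisions, 4 candidate actions each, 3 module features per action.
phi = rng.normal(size=(2000, 4, 3))
probs = np.exp(phi @ true_w)
probs /= probs.sum(axis=1, keepdims=True)
choices = np.array([rng.choice(4, p=p) for p in probs])

w = np.zeros(3)
for _ in range(300):
    p = np.exp(phi @ w)
    p /= p.sum(axis=1, keepdims=True)
    chosen_phi = phi[np.arange(len(choices)), choices]
    # Gradient of the log-likelihood: observed minus expected features.
    grad = (chosen_phi - (p[..., None] * phi).sum(axis=1)).mean(axis=0)
    w += 0.5 * grad
print("recovered weights:", w.round(2))  # approx. true_w
```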

17.
The rapid growth of the literature on neuroimaging in humans has led to major advances in our understanding of human brain function but has also made it increasingly difficult to aggregate and synthesize neuroimaging findings. Here we describe and validate an automated brain-mapping framework that uses text-mining, meta-analysis and machine-learning techniques to generate a large database of mappings between neural and cognitive states. We show that our approach can be used to automatically conduct large-scale, high-quality neuroimaging meta-analyses, address long-standing inferential problems in the neuroimaging literature and support accurate 'decoding' of broad cognitive states from brain activity in both entire studies and individual human subjects. Collectively, our results have validated a powerful and generative framework for synthesizing human neuroimaging data on an unprecedented scale.
