Updated comparison of variant detection methods: Ensemble, FreeBayes and minimal BAM preparation pipelines

Variant evaluation overview

I previously discussed our approach for evaluating variant detection methods using a highly confident set of reference calls provided by NIST’s Genome in a Bottle consortium for the NA12878 human HapMap genome, In this post, I’ll update those conclusions based on recent improvements in GATK and FreeBayes.

The comparisons use bcbio-nextgen, an automated open-source pipeline for variant calling and evaluation that identifies concordant and discordant variants with the XPrize validation protocol. By having an automated validation workflow attached to a regularly updated, community developed, variant calling pipeline, we can actively track progress of variant callers and provide updates as algorithms improve.

Since the initial post, There have been two new GATK releases of UnifiedGenotyper and HaplotypeCaller, as well as multiple improvements to FreeBayes. Additionally we’ve enchanced our ensemble calling method, which combines inputs from multiple callers into a single final set of calls, to better handle comparisons with inputs from three callers.

The goal of this post is to re-evaluate these variant detection approaches and provide an updated set of recommendations:

FreeBayes detects more concordant SNPs and indels compared to GATK approaches, including GATK’s HaplotypeCaller method.
Post-alignment BAM processing steps like base quality recalibration and realignment have little impact on the quality of variant calls with variant callers that perform local realignment, including FreeBayes and GATK HaplotypeCaller.
The Ensemble calling method provides the best variant detection by combining inputs from GATK UnifiedGenotyper, HaplotypeCaller and FreeBayes.

Avoiding the post-alignment BAM recalibration and realignment steps allows us to save significant time and pipeline complexity. Combined with the improvements in FreeBayes, this enables a variant calling pipeline that can be freely used for academic, clinical and commercial work with equal quality variant calls compared to current GATK best-practice approaches.

Calling and evaluation methods

We called variants on a NA12878 exome dataset from EdgeBio’s clinical pipeline and assessed them against the NIST’s Genome in a Bottle reference material. Full instructions for replicating the analysis and installing the pipeline are available from the bcbio-nextgen documentation site. Following alignment with bwa-mem (0.7.5a), we post-processed the BAM files with two methods:

GATK’s best practices (2.7-2): This involves de-duplication with Picard MarkDuplicates, GATK base quality score recalibration and GATK realignment around indels.
Minimal post-processing, with de-duplication using samtools rmdup and no realignment or recalibration.

We then called variants with three general purpose callers:

FreeBayes (v0.9.9.2-18): A haplotype-based Bayesian caller from the Marth Lab. We filter calls with a hard filter based on depth, quality and strand bias.
GATK UnifiedGenotyper (2.7-2): GATK’s widely used Bayesian caller. Since this is single sample exome data, we filter calls using GATK’s recommended hard filters, instead of Variant Quality Score Recalibration (VQSR).
GATK HaplotypeCaller (2.7-2): GATK’s more recently developed haplotype caller which provides local assembly around variant regions. We also filtered these calls using recommended hard filters.

Finally, we evaluated the calls from each combination of variant caller and BAM post-alignment preparation method using the bcbio.variation framework. This provides a summary identifying concordant and discordant variants, separating SNPs and indels since they have different error profiles. Additionally it classifies discordant variants. where the reference material and evaluation variants differ, into three categories:

Extra variants, called in the evaluation data but not in the reference. These are potential false positives or missing calls from the reference materials.
Missing variants, found in the NA12878 reference but not in the evaluation data set. These are potential false negatives.
Shared variants, called in both the evaluation and reference but differently represented. This results from allele differences, such as heterozygote versus homozygote calls, or variant identification differences, such as indel start and end coordinates.

Variant caller comparison

Using this framework, we compared the 3 variant callers and combined ensemble method:

FreeBayes outperforms the GATK callers on both SNP and indel calling. The most recent versions of FreeBayes have improved sensitivity and specificity which puts them on par with GATK HaplotypeCaller. One area where FreeBayes performs better is in correctly resolving heterozygote/homozygote calls, reflected in the lower number of discordant shared variants.
GATK HaplotypeCaller is all around better than the UnifiedGenotyper. In the previous comparison, we found UnifiedGenotyper performed better on SNPs and HaplotypeCaller better on indels, but the recent improvements in GATK 2.7 have resolved the difference in SNP calling. If using a GATK pipeline, UnifiedGenotyper lags behind the realigning callers in resolving indels, and I’d recommend using HaplotypeCaller. This mirrors the GATK team’s current recommendations.
The ensemble calling approach provides the best overall resolution of both SNPs and indels. The one area where it lags slightly behind is in identification of homozygote/heterozygote calls, especially in indels. This is due to positions where HaplotypeCaller and FreeBayes both call variants but differ on whether it is a heterozygote or homozygote, reflected as higher discordant shared counts.

In addition to calling sensitivity and specificity, an additional factor to consider is the required processing time. Rough benchmarks on family-based calling of whole genome sequencing data indicate that HaplotypeCaller is roughly 7x slower than UnifiedGenotyper and FreeBayes is 2x slower. On multiple 30x whole genome samples, our experience is that calling can range from 10 hours for GATK UnifiedGenotyper to 70 hours for HaplotypeCallers. Ensemble calling requires running all three callers plus combining into a final call set, and for family-based whole genome samples can add another 100 hours of processing time. These estimates fluctuate greatly depending on the compute infrastructure and presence of longer difficult genomic regions with deeper coverage, but give some estimates of timing considerations.

Post-alignment BAM preparation comparison

Given the improved accuracy of local realignment haplotype-based callers like FreeBayes and HaplotypeCaller, we explored the accuracy cost of removing the post-alignment BAM processing steps. The recommended GATK best-practice is to follow up alignment with identification of duplicate reads, followed by base quality score recalibration and realignment around indels. Based on whole genome benchmarking work, these steps can take as long as the initial alignment and scale poorly due to the high IO costs of manipulating large BAM files. For multiple 30x whole genome samples running on 16 cores per sample, this can account for 12 to 16 hours of processing time.

To compare the quality impact of avoiding recalibration and realignment, we performed the identical alignment and variant calling steps as above, but did minimal post-alignment BAM preparation. Following alignment, the only step performed was deduplication using samtools rmdup. Unlike Picard MarkDuplicates, samtools rmdup handles piped streaming input to avoid IO penalties. This is at the cost of not handling some edge cases. Longer term, we’d like to explore biobambam’s markduplicates2, which implements a more efficient streaming version of the Picard MarkDuplicates algorithm.

Suprisingly, skipping base recalibration and indel realignment had almost no impact on the quality of resulting variant calls:

While GATK UnifiedGenotyper suffers during indel calling without recalibration and realignment, both HaplotypeCaller and FreeBayes perform as good or better without these steps. This allows us to save on processing time and complexity without sacrificing call quality when using a haplotype aware realigning caller.

Caveats and conclusions

Taken together, the improvements in FreeBayes and ability to avoid post-alignment BAM processing allow use of a commercially unrestricted GATK-free pipeline with equal quality to current GATK best practices. Adding in GATK’s two callers plus our ensemble combining method provides the most accurate overall calls, at the cost of additional processing time.

It’s also important to consider potential drawbacks of this analysis as we continue to design future evaluations. The comparison is in exome regions for single sample variant calling. In future work it would be helpful to have population or family based inputs. We’d also like to prepare test datasets that focus specifically on evaluating the quality of calls in more difficult repetitive regions within the whole genome. Using populations or whole genomes would also allow use of GATK’s Variant Quality Score Recalibration as part of the pipeline, which could provide improved filtering compared to the hard-filtering approach used here.

Another consideration is that the reference callset prepared by the Genome in a Bottle consortium makes extensive use of GATK tools during preparation. Evaluation of the reference materials with FreeBayes and other callers can help reduce potential GATK-specific biases when continuing to develop reliable reference materials.

All of these pipelines are freely available, open-source, community developed projects and we welcome feedback and contributors. By integrating validation into a scalable analysis pipeline, we hope to build a community interested in widely accessible calling pipelines coupled with well-evaluated reference datasets and methods.

Written by Brad Chapman

October 21, 2013 at 6:35 am

Posted in variation

Tagged with bioinformatics, clinical, ngs, variant

50 Responses

Subscribe to comments with RSS.

Seems to me realignment belongs to the aligners. I like where Novoalign is going with this.

Jeremy Leipzig

October 21, 2013 at 2:57 pm

Reply
- …oh and nice analysis
  
  Jeremy Leipzig
  
  October 21, 2013 at 3:22 pm
  
  Reply
  - Jeremy;
    Thanks much for the thoughts. I think the ideal situation is to have more compatibility between alignment and variant calling to better account for tricky alignment decisions. It’s valuable to be able to realign and adjust reads taking ploidy and haplotypes into account. This could be further improved by having information from the aligner about alternative locations, allowing final assignment of reads based on local assembly and haplotype graphs instead of strict mapping. I’m following Heng Li’s work (https://github.com/lh3/fermi2) thinking it might move in this direction.
    
    Brad Chapman
    
    October 22, 2013 at 9:48 am
    
    Reply
- Mappers treat each read independently, while both realigners and variant callers require sorted final alignment. In this sense, realigners are much closer to variant callers. It is definitely not mappers’ responsibility to do realignment.
  
  attractivechaos
  
  October 30, 2013 at 10:32 am
  
  Reply
great post Brad, really appreciate the effort you put in these posts, and in the bcbio pipelines.
Have you done any more systematic investigations into the cause of the discordancies? any obvious trends with DP, GQ, HRun or others?
cheers, Mark

Mark Cowley

October 22, 2013 at 5:15 am

Reply
- Mark;
  I’ve done a lot of digging into this and we’re working on a visualization system called o8 to identify useful discriminatory metrics (https://github.com/chapmanb/o8). The short answer is that it’s not obvious and the current variants callers and filters do a great job of catching the known problems and avoiding them. The most useful distinction is coverage depth, where calls with low depth (<13; http://www.ncbi.nlm.nih.gov/pubmed/23773188) are more difficult to resolve. The other currently target we look in assessing quality is strand bias, especially where a heterozygous call has additional quality support on one strand and not the other. But it's definitely a bit of a research project so we have ideas of what to look at right now more than specific recommendations.
  
  Thanks much for the feedback and thoughts.
  
  Brad Chapman
  
  October 22, 2013 at 10:04 am
  
  Reply
Great analysis. Do you know which version of freebayes improved its performance: I have this v9.9.2-13-gad718f9-dirty. Is this recent enough?

pico (@p_hi1)

October 22, 2013 at 11:10 am

Reply
- 9.9.2-13 is the version with the detection improvements, so it should show a lot of the benefits. It would be worth upgrading to the latest only because there were a few round of bug fixes for edge cases, so this would help avoid some of the error regions that -13 has. The lastest is -19. Hope this helps.
  
  Brad Chapman
  
  October 22, 2013 at 8:58 pm
  
  Reply
  - thank you for the info. i will update it. I was at BC and know G.M. so it good to hear FB is getting better.
    HI
    
    pico (@p_hi1)
    
    October 24, 2013 at 12:42 pm
    
    Reply
Hey Brad,
Thanks for the analysis :)
Do you think a Galaxy based visualisation solution will be more accessible? I am working on something similar.

Saket

Anonymous

October 22, 2013 at 5:42 pm

Reply
- For some reason, I am not able to comment via G+.
  Saket
  http://twitter.com/saketkc
  
  Anonymous
  
  October 22, 2013 at 5:45 pm
  
  Reply
  - Saket;
    I’d definitely like to integrate this pipeline into Galaxy and was hoping to put together a proof of concept as a CloudMan shared instance. It would be great to make running the pipeline and analyzing results more accessible so I’d be happy to explore integrating with your visualization tools once they’re available. Thanks much for the interest in this.
    
    Brad Chapman
    
    October 22, 2013 at 9:06 pm
    
    Reply
Could you please let me know the filters based on depth, quality and strand bias that you used for freebayes?

Anonymous

October 22, 2013 at 11:37 pm

Reply
- Definitely. We use both a very vanilla filter with only depth and quality (QUAL < 20, DP < 5; https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/vfilter.py#L72) and also have been experimenting with a more complicated filter that filters hets/homs differently and considers strand quality. This is part of bcbio.variation, with the algorithm here (https://github.com/chapmanb/bcbio.variation/blob/master/src/bcbio/variation/filter/custom.clj#L7) and the commandline call here (https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/vfilter.py#L63).
  
  I'm not convinced the more complex filter is currently worth it, which is why I underemphasized it in the post. I was working on filtering approaches at the same time FreeBayes improved calling approaches, and the calling approaches were far and away the bigger improvement. I hope to examine this further with the new FreeBayes updates and categorize what value it provides.
  
  Hope this helps.
  
  Brad Chapman
  
  October 23, 2013 at 2:25 pm
  
  Reply
  - An interesting exploration would be to look at the ROC curves of various callers rather than just at a fixed threshold. Presumably, different cutoffs might show that one caller can be more specific than the other, or more sensitive. Possibly one method has an overall higher area under the curve (AUC)?
    
    Erik Garrison
    
    October 31, 2013 at 10:02 am
    
    Reply
  - Hey, sorry for necromancing an old post, but I was hoping that you might be able to answer this:
    
    GATK recommends that you filter their calls (SNPs and InDels) separately. Would you recommend that we do the same for FreeBayes as well?
    
    Anonymous
    
    July 13, 2019 at 1:21 pm
    
    Reply
Brad, I recently added markdup in sambamba, which works according to Picard criteria but much faster on multicore (based on ideas from biobambam2 paper for handling read pairs). For now, external sort is not implemented, so expected memory usage is 2-2.5GB per each 100M reads. The latest build for linux 64-bit is at http://goo.gl/sPPX1Y

Artem Tarasov

October 24, 2013 at 6:00 am

Reply
- Artem;
  Thanks for the heads up. I’ll definitely investigate sambamba more. It would be worth emphasizing the current version and changes on your GitHub README and wiki page. I actually looked at sambamba a month ago and didn’t realize you’d added parameters to control core usage in recent versions (I had CloudBioLinux’s 0.2.9 on my machine). I’ll update to the latest and work more on it. Thanks again.
  
  Brad Chapman
  
  October 24, 2013 at 10:24 am
  
  Reply
Thanks for this- this is really useful. I was wondering if any work had been done in comparisons for these callers for low coverage data. Presumably, as calling of low-coverage data can have different problems, the ranking of the callers from high coverage data may not be generalisable to this?

Deepti Gurdasani

October 24, 2013 at 1:30 pm

Reply
- Deepti;
  Great point. This is a moderate coverage library (40-50x in exome regions). Here’s the binned coverage (http://i.imgur.com/mz2v3oA.png) and 0-50x coverage (http://i.imgur.com/yfXHUTg.png) in the target regions. As you mention, low coverage regions are definitely the hardest to resolve. The previous post (https://bcbio.wordpress.com/2013/05/06/framework-for-evaluating-variant-detection-methods-comparison-of-aligners-and-callers/) explicitly enumerates some of these challenges and provides numbers specifically in low coverage (< 10x in that post) regions.
  
  It would be interesting to downsample and repeat the comparisons to see how the results change depending on inputs. Thanks for the useful feedback.
  
  Brad Chapman
  
  October 25, 2013 at 7:12 am
  
  Reply
[…] Updated comparison of variant detection methods: Ensemble, FreeBayes and minimal BAM preparation pip… Variant evaluation overview I previously discussed our approach for evaluating variant detection methods using a highly confident set of reference calls provided by NIST’s Genome in a Bottle consor… […]

Raony Guimarães

October 27, 2013 at 9:51 pm

Reply
Hi Brad, thanks for the analysis. I am really interested in this and are trying to perform similar comparison between GATK, Freebayes and also ISAAC (although I still have difficulties getting ISAAC to run…), how do you use Freebayes when performing the calling? It seems like instead of having the realignment implemented within Freebayes, the author suggest the use of samtools fillmd, so did you perform the analysis by comparing the situation where Freebayes is used with samtools fillmd and without samtools fillmd?

Thank you

Sam

October 30, 2013 at 3:21 am

Reply
- Sam;
  I believe the fillmd/bamleftalign recommendations in the FreeBayes documentation are out of the date with the state of the caller. All of that work is now done by the caller itself. That’s reflected in the lack of difference in this comparison between with and without recalibration/realignment cases. So the calling used here is very vanilla. Here’s the code that constructs the commandline: https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/freebayes.py#L47
  
  Brad Chapman
  
  October 30, 2013 at 11:51 am
  
  Reply
  - This is absolutely correct. My apologies for the confusion in the documentation. I provide examples of how to stream various utilities together for time-of-flight BAM processing. At present, the optimal way to use freebayes is by specifying only the reference and set of BAM files.
    
    erik
    
    October 31, 2013 at 9:59 am
    
    Reply
    - Thanks Brad and Erik for answering. So basically, now we only need to perform the leftalignment then the freebayes calling right after in order to get the correct vcf without requiring to perform the realignment. (I assume markduplicate is still required?)
      
      And we can call both SNPs and Indel together as long as leftalign were performed. Is that right? Thanks
      
      Shing Wan Choi
      
      October 31, 2013 at 12:03 pm
      
      Reply
      - In freebayes, left realignment is done internally on a read-independent basis. You no longer need to use bamleftalign or any other independent realignment process. If you want to see what kind of effect the process has an option is provided to turn off the automatic realignment. The realignment is quite fast, using only a few percent of runtime, and in practice achieves identical performance to much more complicated methods.
        
        freebayes also always calls all short sequence variants together. The model is agnostic as to allele type at the point of detection, using only literal haplotypes observed in the reads. At times we describe it as the most local reference-free method— it is reference-free only within a dynamically-bounded window anchored by reference-matching reads.
        
        Many users are interested in turning off specific allele classes. I think that this is not a good idea to do in the detection phase, as errors can manifest as any allele type within alignments, and ignoring them may cause the caller to be overconfident at relatively noisy loci.
        
        Hope this clarifies things. I’m doing what I can to produce a system that is both user friendly in the default case and powerful when users have specific needs or unique experimental contexts. I’m writing up examples of both simple and complex use cases for a rewrite of the documentation, so please let me know off-thread if you have an specific questions or confusion.
        
        Erik Garrison
        
        October 31, 2013 at 1:49 pm
      - Erik, thank you for your answer, I will now proceeds to try the pipeline, hopefully I can come back to you guys with the results soon.
        
        Sam
        
        October 31, 2013 at 10:41 pm
      - @Sam — there isn’t much of a pipeline as freebayes is concerned. Just run freebayes on your alignments (in BAM format) and you’ll get a VCF, freebayes -f ref.fa file.bam >out.vcf.
        
        Erik Garrison
        
        November 4, 2013 at 4:26 pm
- freebayes has a lot of options, and can be used in many different contexts and processing patterns, but if you are working on diploids the best way to use it is to just run with the reference (-f) and a set of BAM files.
  
  I don’t really see the various base quality recalibrations and dbSNP-aware indel realignment has having much benefit. The algorithm internally implements left realignment.
  
  Erik Garrison
  
  October 31, 2013 at 10:07 am
  
  Reply
  - I second your saying, at least on high-coverage data. I used to compare the samtools calls on a very old data set with and without GATK base recalibration. Samtools worked a little better without recalibration. I also think realignment should be fast without a candidate list, at least faster than the BAQ calculation. We are simply lacking a good realigner implementation.
    
    Heng Li (@lh3lh3)
    
    November 1, 2013 at 12:02 pm
    
    Reply
    - Heng, the left-realignment implementation in freebayes is also factored into an independent executable (bamleftalign). See https://github.com/ekg/freebayes/blob/master/src/bamleftalign.cpp. I’d be happy for someone to show how this doesn’t work as well as a candidate-list based method for the newer variant detectors.
      
      For the local assembly or haplotype-based methods, the crucial thing is to get enough evidence in consistent representation to trigger consideration of the locus. Read-independent left-realignment provides this at only a few % of variant calling runtime.
      
      With the detection method in freebayes, clusters of multiply-aligned indels will typically result in haplotype calls across the inconsistently-represented region. Within these calls the exact representation of the alignment doesn’t matter. Genotype likelihoods are based on comparison of the read sequences and alternates, which handles the cases that might not be dealt with via read-independent left-realignment.
      
      Erik Garrison
      
      November 4, 2013 at 4:51 pm
      
      Reply
Brad,
Thanks for the post. Do you have a finer breakdown of the timings for each step of the Exome analysis ? It would be interesting to note the performance bottlenecks along with the variant concordance.

-Aniket

Aniket Deshpande

November 1, 2013 at 4:23 pm

Reply
- Aniket;
  Thanks for the feedback. I have a separate post with all the timing analysis I’ve put together so far:
  
  https://bcbio.wordpress.com/2013/05/22/scaling-variant-detection-pipelines-for-whole-genome-sequencing-analysis/
  
  I’m planning on updating that based on the new pipelines but haven’t started writing that up yet. Hope the previous post helps.
  
  Brad Chapman
  
  November 3, 2013 at 1:30 pm
  
  Reply
Hi,

Would it be possible to get the exact commands used for Freebayes in these tests? Thanks.

Robert Eveleigh

November 4, 2013 at 9:15 am

Reply
- Robert;
  Definitely, it’s pretty vanilla FreeBayes usage. The code to generate the calls is here:
  
  https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/freebayes.py#L37
  
  and command lines look like:
  
  freebayes -v out.vcf -f GRCh37.fa -b in.bam –use-mapping-quality –pvar 0.7 –ploidy 2
  
  Hope this helps.
  
  Brad Chapman
  
  November 4, 2013 at 1:40 pm
  
  Reply
  - I recommend just using freebayes -f $ref >out.vcf for diploid data. –ploidy 2 is default. –pvar 0.7 may reduce sensitivity. And, the docs should reflect this, but –use-mapping-quality has no effect unless –standard-gls is also specified.
    
    Hope this clarifies things.
    
    Brad, it’d be interesting if –pvar 0.7 improves performance over default.
    
    Erik Garrison
    
    November 4, 2013 at 4:23 pm
    
    Reply
    - To clarify further, mapping quality is accounted for by default now, using the “effective base depth” quantity developed by the developers of snpTools at Baylor College of Medicine.
      
      Erik Garrison
      
      November 4, 2013 at 4:24 pm
      
      Reply
      - Erik;
        Thanks for the heads up on the mapping quality change. This was added based on recommendations a year and a half ago (https://github.com/chapmanb/bcbio-nextgen/commit/680ebb3694765e945f0008471aea39edfb0136d7#diff-f56264ae93605f9df25e111b3d08512c) and I hadn’t realized it was no longer needed. It’s removed now to make the usage more vanilla.
        
        I use the –pvar flag to avoid variant calls with extremely low quality values (much smaller than 1). Some of these were so low they caused GATK post-processing to error out, which necessitated the flag. Since these were all getting filtered based on quality anyways, I set it to avoid losing sensitivity. I could try and put together a reproducible case of the extremely low values if it would be helpful.
        
        Brad Chapman
        
        November 5, 2013 at 11:53 am
    - And I should also correct myself, you’ll obviously want to do:
      
      freebayes -f $ref $bamfile >out.vcf
      
      Including the BAM files is important :)
      
      Erik Garrison
      
      November 4, 2013 at 4:26 pm
      
      Reply
Hi Brad,

As always, this is great work. There’s one piece, though, that I’d like to focus on that I think needs improvement and it would be good to garner feedback from the community. Your analysis above showing the “shared discordant” numbers uses hard genotype (GT) calls, which is really not the right thing to do; all downstream analysis of the VCF that includes genotypes really should be based on likelihoods. The GATK team is just starting to put together some documentation and best practices on this, but I’ll give a quick summary here.

Ultimately, what you want to see is how well calibrated the genotype likelihoods are; you can use the GQ (genotype quality) as a value for how confident the caller is in the genotype assignment. The question is how well that confidence matches to the real truth. So we actually expect 1 in 10 GTs with a GQ=10 to be incorrect (and anything off from that ratio is a mis-calibration). An incorrect GT with a high GQ is very bad (the caller was very confident in the wrong genotype) but an incorrect GT with a low confidence is within expectations.

I took a look at the VCF you released with the common sites of discordant genotypes. For the 38 records on chromosome 20 with GQ annotated, 25 (66%) of them have extremely low GQ (< Q20). These are generally sites with relatively low coverage and 1-2 sequencing errors and the likelihoods accurately reflect this.

So my question is: how should we (the GATK) communicate such cases to users in the VCFs we output? Currently, we greedily assign genotypes (GT) based on the likelihoods and assume that users will appropriately ignore those with low GQ. But clearly we need to take a more pro-active approach in cases where we know that a sample has the variant allele but where we have low confidence in the actual genotype (0/1 or 1/1). Should we assign such samples a GT of ./1? What do folks think?

Eric Banks

November 5, 2013 at 10:40 am

Reply
- Eric;
  Thanks for all this great feedback. I’m agreed that using likelihoods to stratify discordants into ‘well supported’ and ‘the caller was not sure’ would be a nice way to classify if callers have differing het/hom calls based on caller errors or low evidence. I’ll think of ways to incorporate this into future comparisons.
  
  On the other side, I do think it’s too simplistic to say that we should look at everything in terms of likelihoods. When reporting variants and using most downstream tools we practically need to translate the likelihoods into absolutes and report genotypes based on the caller’s best assessment. So we need to develop good ways to flatten the continuous likelihoods into something that’s easy to translate to end users.
  
  It would be useful to have variant calls supplemented with categorical classifications so you could more readily assess the reliability of a call. An INFO annotation or associated BED file that classifies a region as callable, non-callable or maybe would be a big help. Folks could decide whether to include ‘maybe’ calls and associated higher false positive rates. This might only be a documentation/perception issue and be as easy as saying that all calls with ‘GT < Q20' are less reliable and provide recommended practices for dealing with them based on desired risk of false positives.
  
  My view is that variant calls should come alongside ways to assess and report coverage and likelihood by region. Right now there are numerous tools to prioritize calls based on the experiment, but not to identify if important regions got missed, or are of low quality so might be wrong. Chanjo (https://github.com/robinandeer/chanjo) does a nice job of this and it would be useful to combine variant quality metrics alongside coverage.
  
  Thanks again for all the thoughts.
  
  Brad Chapman
  
  November 5, 2013 at 1:08 pm
  
  Reply
Could you point me to the hard filters you were using for freebayes? Thanks.

Heng Li (@lh3lh3)

November 19, 2013 at 10:24 pm

Reply
- Heng;
  It’s pretty vanilla (QUAL < 20, DP < 5). There is a little more information on other approaches I'm experimenting with in this comment above:
  
  https://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/#comment-1469
  
  Hope this helps
  
  Brad Chapman
  
  November 20, 2013 at 9:29 am
  
  Reply
Very nicely done post! Better than many peer-reviewed journal articles on this subject. One comment and one question for you:

(1) The Freebayes false positive rate seems significantly higher than the GATK FPR. I am curious what the sensitivity comparison looks like if the 3 callers were tuned for similar FPR.

(2) What was the total region size analyzed? i.e. did you use any of the GIB confidence-based region filters? they have BED filters for regions with known structural variation, etc.

John DiCarlo

November 21, 2013 at 12:17 am

Reply
- John;
  Thanks much for the thoughts and questions.
  
  1. The FreeBayes potential false positive counts are slightly higher than GATK HaplotypeCaller (913 versus 325 for SNPs; 228 versus 83 for Indels). However, one cavaet is that the preparation of the Genome in a Bottle reference makes heavy use of GATK HaplotypeCaller so these materials have some likely biases towards the GATK tools. The general notion is that many of these FreeBayes calls may be true calls that are missing from the reference. I’ve sent these off to Justin Zook at NIST and he’s evaluating if any of this may be helpful for improving the GiaB reference to be more unbiased.
  
  2. The total region size is 117 million bases. This is 78% of the total capture regions for the exome (150 million). The subsetting is two things: we only include callable regions from the actual exome data (4 or more reads aligned) and then by the Genome in a Bottle BED file. So it does avoid the known structural variations and difficult regions that GiaB also excludes.
  
  Thanks again
  
  Brad Chapman
  
  November 22, 2013 at 11:37 am
  
  Reply
  - Will be interesting to see what Justin Zook concludes, and that certainly is a reasonable (or maybe even likely) hypothesis. However, seems also plausible that GATK could achieve the Freebayes sensitivity level if tuned to allow 3X more false positives through.
    
    Also, thanks very much for providing the analysis region size – this allows us to compare your Freebayes FPR of about 10 per Mb to other published reports and benchmarks. Again, super nice post!
    
    John DiCarlo
    
    November 26, 2013 at 2:04 pm
    
    Reply
[…] to call variants, instead of GATK. The change over was largely inspired by an excellent blog post by Brad Chapman. His comparison shows that FreeBayes is comparable to GATK in terms of […]

A New Version of dDocent is here | dDocent: A guide to ddRAD sequencing

February 19, 2014 at 4:08 pm

Reply
[…] Updated comparison of variant detection methods: Ensemble, FreeBayes and minimal BAM preparation pip… […]

Updated comparison of variant detection methods: Ensemble, FreeBayes and minimal BAM preparation pipelines | Raony Guimarães

October 23, 2014 at 10:02 am

Reply
Did they ever publish FreeBayes, I can’t find the publication??

Anonymous

April 8, 2017 at 9:16 am

Reply
- I believe the correct reference is this arXiv paper: https://arxiv.org/abs/1207.3907
  
  Brad Chapman
  
  April 10, 2017 at 8:59 am
  
  Reply

Blue Collar Bioinformatics