syri

Synteny and Rearrangement Identifier


Variant calling benchmarking (2024)

Introduction

Syri was first benchmarked in 2019. Since then, it has received multiple updates and improvements. Other variant callers have also improved, and new ones have been introduced. Here, we benchmark the variant calling performance of syri, for both small variants and SVs, against some of the most popular variant callers. This should give users an estimate of the performance they can expect.

Benchmarking Results

Genomic variation can be grouped into small variations (SNPs and short indels), structural variations (large indels, tandem duplications) and structural rearrangements (inversions, translocations, segmental duplications). Here, we limit the benchmarking to the identification of small variants and SVs (large indels) because benchmark VCFs for structural rearrangements are lacking.
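The distinction between small variants and large indels can be illustrated with a minimal Python sketch that classifies a sequence-resolved VCF record by its REF and ALT alleles. The 50 bp cutoff is an assumption (a commonly used convention); this page does not state the threshold used in the benchmark, and the function name is hypothetical. Symbolic alleles such as `<DEL>` and other SV types are not handled.

```python
# Assumption: 50 bp is a commonly used cutoff between short indels and SVs.
# The cutoff used in this benchmark is not stated on this page.
SV_MIN_LENGTH = 50


def classify_variant(ref: str, alt: str, sv_min_length: int = SV_MIN_LENGTH) -> str:
    """Classify a sequence-resolved allele pair by the size of the change."""
    if len(ref) == len(alt):
        # Same length: single-base change or a multi-base substitution.
        return "SNP" if len(ref) == 1 else "substitution"
    # Different lengths: an insertion or deletion of |len(alt) - len(ref)| bases.
    size = abs(len(alt) - len(ref))
    return "large indel (SV)" if size >= sv_min_length else "short indel"


if __name__ == "__main__":
    print(classify_variant("A", "G"))             # single-base change
    print(classify_variant("A", "ATG"))           # 2 bp insertion
    print(classify_variant("A" + "C" * 75, "A"))  # 75 bp deletion
```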

Small variation

We compared syri, deepvariant, and GATK. Both syri and deepvariant performed very well, while GATK had comparatively lower recall. Deepvariant and GATK used short-read WGS data, which is significantly easier and cheaper to generate than high-quality genome assemblies, making them a suitable option when the project objective is to find small variations with sufficient quality. However, if the objective is to get the best possible variant calling, then using high-quality assemblies with syri might be the more suitable option.


Benchmarks for small variation calling. Panels show values for precision, recall and F1-Qscores.
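For readers unfamiliar with these metrics, the following Python sketch shows how precision, recall and F1 follow from true-positive, false-positive and false-negative counts. How the F1-Qscore in the panels was computed is not described on this page; the Phred-style scaling below (-10 * log10(1 - value)) is an assumption, and the function names are hypothetical.

```python
import math


def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall and F1 from benchmark comparison counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def phred_scale(value: float) -> float:
    """Assumed Q-score transform: -10 * log10(1 - value)."""
    if value >= 1.0:
        return math.inf  # a perfect score has no finite Phred value
    return -10 * math.log10(1 - value)


if __name__ == "__main__":
    # Illustrative counts only, not results from this benchmark.
    metrics = benchmark_metrics(tp=90, fp=10, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f} (Q{phred_scale(value):.1f})")
```

On such a scale, each additional 10 units corresponds to a tenfold reduction in the error term (1 - value), which makes differences between tools that all score close to 1 easier to see.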

Structural variation

Compared to small variants, all tools had lower performance for SV calling. pbsv had the highest precision, while cuteSV with diploid assembly had the highest recall. Overall, syri (without alignment filtering) performed best with the highest F1-Qscore, followed by syri (with default filtering). These were followed by svim-asm, another assembly-based method. All three long-read-based methods had lower performance.

Typically, long reads are sequenced with the objective of generating genome assemblies. In such a scenario, using assemblies with syri should result in better variant calling. However, when only long (HiFi) reads are available, they can still deliver robust variant calling performance.


Benchmarks for structural variation calling. Panels show values for precision, recall and F1-Qscores.

Observations

Methods

We benchmarked variant calls for the human HG002 genome using chromosome-level assemblies, HiFi reads, and WGS short reads as input data. The benchmark VCFs were obtained from the Genome in a Bottle (GIAB) consortium.

Tools compared

| Tool name | Genomic data used | Variants identified |
|---|---|---|
| Syri v1.7.1 | Chromosome-level assembly | Small variants, SVs and structural rearrangements |
| Svim-asm v1.0.3 | Chromosome-level assembly | SVs and structural rearrangements |
| Sniffles2 v2.2 | Long HiFi reads | SVs and structural rearrangements |
| cuteSV v2.1.1 | Chromosome-level assembly and long HiFi reads | SVs and structural rearrangements |
| pbsv v2.10.0 | Long HiFi reads | SVs and structural rearrangements |
| deepvariant (from parabricks) v4.4.0-1 | WGS short reads | Small variants |
| GATK (haplotypecaller) v4.6.1.0 | WGS short reads | Small variants |

Datasets used

Benchmark VCFs:

  1. Small variants: GIAB benchmark variants in HG002 called against Human reference GRCh38
  2. SVs: GIAB benchmark variants in HG002 called against Human reference GRCh37

Reference genomes

  1. Human reference genome version GRCh38 for small variant benchmarking
  2. Human reference genome version GRCh37 for SV benchmarking

HG002 data

  1. Assemblies:
    1. Paternal haplotype (NCBI accession: GCA_018852605.3)
    2. Maternal haplotype (NCBI accession: GCA_018852615.3)
  2. PacBio HiFi reads: Revio SPRQ chemistry (filename: GRCh38.m84039_241001_220042_s2.hifi_reads.bc2018.bam). 1 SMRT cell.
  3. Short reads: WGS of HG002 (NCBI run ID: SRR12898346). Paired-end (2 x 151 bp, 114.6 Gbp).
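As a rough guide to the depth of the short-read dataset, the yield listed above can be converted into an approximate mean coverage. The sketch below assumes a haploid human genome size of about 3.1 Gbp, which is not stated on this page; the constant and function names are hypothetical.

```python
SHORT_READ_YIELD_BP = 114.6e9   # total yield from the dataset description above
ASSUMED_GENOME_SIZE_BP = 3.1e9  # assumption: approximate haploid human genome size


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Approximate mean sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size


coverage = mean_coverage(SHORT_READ_YIELD_BP, ASSUMED_GENOME_SIZE_BP)

if __name__ == "__main__":
    print(f"Approximate short-read coverage: {coverage:.1f}x")
```

Under this assumption the short-read data corresponds to roughly 37x coverage. The value is only an estimate, since it ignores unmapped reads, duplicates and read trimming.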

Brief description of analysis pipeline

Alignment

Variant calling

| Tool name | Parameters and settings |
|---|---|
| Syri | 1) Using default settings and 2) by turning off alignment filtering |
| Svim-asm | Default settings for diploid assembly variant calling |
| Sniffles2 | Default settings |
| cuteSV | Default settings for both 1) HiFi read-based variant calling and 2) diploid assembly calling |
| pbsv | Default settings |
| deepvariant | Default settings |
| GATK | Default settings |

Benchmarking