Examples / Tutorial¶

On this page, we provide examples for the analysis of DNA-libraries published in three different studies to demonstrate the use of quicksand on different library- (single- and double-stranded) and experiment types (capture and shotgun). We demonstrate how the final summary report is evaluated and how filter thresholds can be adjusted.

The main columns used to evaluate the final summary report are FamPercentage and ProportionExpectedBreadth, which correspond to the PSF and PEB filter values, respectively (see the quicksand publication).

Importantly, the PEB filter is generally more informative than the PSF filter because the breadth of coverage is a key metric for determining assignment accuracy. However, since PEB assumes unbiased (random) mapping of sequences to the reference genome, lower values may occur when this expectation is violated; for instance, due to the enrichment strategy or lacking reference genome representation in the database.

Additional columns provided in the report for quality control include the KrakenUniq statistics: Kmers (unique kmer count), KmerCoverage (proportion of kmers observed), and KmerDupRate (average number of times each kmer was observed) as well as the ProportionMapped column.

Download quicksand-database¶

Please see the Quickstart section to download the required reference datastructure (refseq).

quicksand was developed and tested using a database built from the mammalian mtDNA genomes in RefSeq. For broader usability, the database provided on the FTP server was built from the full RefSeq release 221, including non-mammalian mtDNA genomes. This was done, because database construction is computationally intensive and may not be feasible for users with limited computational resources.

While this expanded database enables the detection of a wider range of organisms, quicksand (and the filter thresholds) have not been systematically tested for the validation of non-mammalian assignments. For this section, we therefore focus on the mammalian classifications only.

Single stranded Libraries¶

For the estimation of ancient DNA damage rates, quicksand assumes that sequences are derived from libraries prepared using a single stranded library preparation protocol (ssDNA). For the analysis of libraries prepared with a double stranded library preparation method, see Double stranded Libraries

We first demonstrate running quicksand on ssDNA libraries enriched for mammalian mtDNA (ssDNA / Mammalian mtDNA Capture) and human mtDNA (ssDNA / Human mtDNA Capture) as well as shallow shotgun sequencing data (ssDNA / Shotgun Sequencing).

As example, we use a librares created from the same sediment sample (SP3854), collected from Layer 15 of Denisova Cave, East Chamber, published in Slon et al. 2017. The data are available in the European Nucleotide Archive (ENA), and an overview of the libraries is provided below. An example for the analysis of multiple samples in parallel, see dsDNA / Mammalian mtDNA Capture.

Sample	Layer	Library	Library Type	Sequences	ENA Entry
SP3854	EC 15	R4095	Mammalian mtDNA Capture	1,837,960	ERR1883545
SP3854	EC 15	R4051	Human mtDNA Capture	1,512,964	ERR1883655
SP3854	EC 15	R5723	Shotgun	1,638,519	ERR1883446

ssDNA / Mammalian mtDNA Capture¶

Runtime: ~3 min.

With the quicksand database downloaded to the refseq directory (Quickstart), create a subdirectory for the mammalian mtDNA capture analysis and download the required library file into a fresh input directory:

mkdir ssDNA_mammMT && cd ssDNA_mammMT
wget -P split
ftp.sra.ebi.ac.uk/vol1/run/ERR188/ERR1883545/R4095.bam​

Then run quicksand with the default parameters:

nextflow run mpieva/quicksand \
    -r v2.6 \
    --split split/ \
    --db ../refseq/kraken/Mito_db_kmer22/ \
    --genomes ../refseq/genomes/ \
    --bedfiles ../refseq/masked/ \
    -profile singularity

After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. In total, 22 mammalian families are reported of which 20 show Ancientness levels of + or ++ (see Table below). Nine mammalian families pass the default PSF and PEB filter thresholds (see quicksand_v2.6/filtered_report_0.5p_0.5b.tsv).

Family	Best Reference	Reads	PSF	Coverage	PEB
Hyaenidae	Crocuta_crocuta	7616	34.98	24.84x	0.662
Rhinocerotidae	Coelodonta_antiquitatis	3965	19.15	11.32x	0.934
Bovidae	Bos_grunniens	3448	16.16	10.17x	0.824
Ursidae	Ursus_spelaeus	1066	5.04	3.26x	0.757
Canidae	Canis_lupus	970	4.52	2.71x	0.758
Equidae	Equus_ovodovi	884	4.24	2.72x	0.703
Elephantidae	Mammuthus_columbi	553	2.65	1.61x	0.711
Mustelidae	Martes_zibellina	514	2.45	1.51x	0.795
Cervidae	Cervus_nippon	281	1.32	0.82x	0.718
Felidae	Panthera_tigris	369	1.74	1.04x	0.346
Eupleridae	Galidia_elegans	225	1.10	0.7x	0.103
Phocidae	Monachus_monachus	166	0.81	0.47x	0.131
Sciuridae	Marmota_himalayana	155	0.75	0.44x	0.291
Muridae	Rattus_norvegicus	141	0.67	0.4x	0.193
Viverridae	Paguma_larvata	112	0.26	0.32x	0.331
Cercopithecidae	Macaca_sylvanus	70	0.33	0.2x	0.23
Cricetidae	Neodon_irene	65	0.31	0.18x	0.443
Cebidae	Sapajus_xanthosternos	64	0.24	0.19x	0.148
Hominidae	Homo_sapiens_subsp._'Denisova'	41	0.19	0.13x	1.005
Otariidae	Zalophus_japonicus	33	0.16	0.09x	0.402

The mammalian mtDNA enrichment analyzed here (targeting 242 mammalian mtDNA genomes) generally provides sufficiently random coverage across all mammalian families for the PEB filter to be effective. The lower PEB value observed for the Hyaenidae assignment (0.66), however, is likely due to the absence of Hyaenidae mtDNA in the capture probe set.

The Felidae, Eupleridae, Phocidae, and Sciuridae assignments pass the PSF filter but not the PEB filter. With the observed genomic coverages (between 0.4x and 1x), these cases are likely false positives. In contrast, the Hominidae assignment passes the PEB filter (PEB = 1.0) but not the PSF filter. The high PEB value for the Hominidae indicates an authentic assignment that would benefit from additional data for verification (human mtDNA capture).

In summary, for generic mammalian mtDNA capture libraries, the default quicksand filters (PSF = 0.5 and PEB = 0.5) provide a reliable set of family assignments.

ssDNA / Human mtDNA Capture¶

Runtime: ~3 min.

With the quicksand database downloaded to the refseq directory (Quickstart), create a subdirectory for the mammalian mtDNA capture analysis and download the required library file into a fresh input directory:

mkdir ssDNA_humMT && cd ssDNA_humMT
wget -P split ftp.sra.ebi.ac.uk/vol1/run/ERR188/ERR1883655/R4051.bam

Then run quicksand with the default parameters:

nextflow run mpieva/quicksand \
    -r v2.6 \
    --split split/ \
    --db ../refseq/kraken/Mito_db_kmer22/ \
    --genomes ../refseq/genomes/ \
    --bedfiles ../refseq/masked/ \
    -profile singularity

After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. A total of 15 mammalian families are detected (Table below). Only the Hominidae family passes the default PSF and PEB filter thresholds (quicksand_v2.6/filtered_report_0.5p_0.5b.tsv).

Family	Best Reference	Reads	PSF	Coverage	PEB
Hyaenidae	Crocuta_crocuta	2894	55.97	9.75x	0.19
Rhinocerotidae	Coelodonta_antiquitatis	587	12.61	1.94x	0.192
Bovidae	Bos_taurus	280	5.40	0.86x	0.231
Equidae	Equus_ovodovi	154	3.24	0.5x	0.252
Felidae	Neofelis_diardi	146	2.81	0.43x	0.107
Sciuridae	Dremomys_rufigenis	132	2.60	0.35x	0.19
Hominidae	Homo_sapiens_subsp._'Denisova'	118	2.28	0.4x	0.968
Cervidae	Cervus_hanglu	102	2.17	0.3x	0.202
Muridae	Rattus_norvegicus	99	2.10	0.29x	0.171
Cercopithecidae	Macaca_nigra	85	1.80	0.27x	0.152
Ursidae	Ursus_spelaeus	81	1.44	0.27x	0.354
Mustelidae	Martes_zibellina	75	1.50	0.25x	0.314
Canidae	Canis_aureus	64	1.35	0.2x	0.312
Elephantidae	Mammuthus_primigenius	55	1.18	0.15x	0.322
Cricetidae	Phodopus_sungorus	36	0.77	0.09x	0.351

Even in the human mtDNA capture data, assignments are dominated by faunal “bycatch,” i.e., mtDNA from other mammalian families that is also recovered by the human mtDNA probes. However, unlike for the mammalian mtDNA capture, non-human assignments do not meet the expectation of random mapping because they are biased towards the subset of sequences similar to the human mtDNA probes.

This substantially lowers the PEB-values of all non-human assignments, ranging from 0.1 to 0.35. As a consequence, the PEB filter cannot be used to identify true-positive assignments among the faunal bycatch; it can do so only for the Hominidae - the only family that meets the criterion of random mapping.

In addition, the PSF-values in single-species captures are less informative for distinguishing true- from false-positive families. The above-mentioned capture bias, combined with a reduced number of total detected families, changes the distribution of PSF values compared to the mammalian captures. We find an increased PSF for the dominant family (e.g., Hyaenidae: 55.97 vs. 34.98) and a leveling of PSF values across the remaining families. In this example, all mammalian families pass the default PSF threshold.

Because the human mtDNA capture experiment is intentionally biased toward Hominidae, applying the PSF filter is optional, as it could remove the sequences we are specifically trying to detect. Instead, we recommend relying on the PEB filter to assess the quality of Hominidae assignments.

For a more conservative approach, a stricter PEB threshold (e.g., 0.7) combined with a minimum number of reads or genomic coverage (e.g., >0.1x) could be applied.

ssDNA / Shotgun Sequencing¶

Runtime: ~3 min.

With the quicksand database downloaded to the refseq directory (Quickstart), create a subdirectory for the mammalian mtDNA capture analysis and download the required library file into a fresh input directory:

mkdir ssDNA_shotgun && cd ssDNA_shotgun
wget -P split ftp.sra.ebi.ac.uk/vol1/run/ERR188/ERR1883446/R5723.bam

Then run quicksand with the default parameters:

nextflow run mpieva/quicksand \
    -r v2.6 \
    --split split/ \
    --db ../refseq/kraken/Mito_db_kmer22/ \
    --genomes ../refseq/genomes/ \
    --bedfiles ../refseq/masked/ \
    -profile singularity

After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. A single mammalian family, the Hyaenidea is detected with 33 sequences (Table below), showing an Ancientness level of ++ and passing both the default PSF and PEB filter thresholds (quicksand_v2.6/filtered_report_0.5p_0.5b.tsv).

Family	Best Reference	Reads	PSF	Coverage	PEB
Hyaenidae	Crocuta_crocuta	33	40.50	0.08x	1.153

With the low sequencing depth (1,638,519 sequences), only the Hyaenidae are detected by quicksand. The assignments in the shotgun data are unaffected by capture bias, resulting in a PEB value of ~1 for the Hyaenidae.

The final summary report includes only assignments that pass the initial KrakenUniq filters. As we demonstrated with simulated data (see the quicksand publication), the default thresholds for the number of unique kmers can be lowered for a more sensitive screening, allowing additional families to pass the initial KrakenUniq step and proceed to mapping and downstream evaluation.

quicksand saves the KrakenUniq report for each sample, which can be examined to identify families that would be included if the filter thresholds were lowered. In this library, the report (quicksand_v2.6/stats/R5723.kraken.report) lists 10 mammalian family assignments, of which only the Hyaenidae meets the default thresholds of 129 unique kmers and 3 sequences (Table below).

TaxReads	Kmers	Rank	Name
39	317	family	Hyaenidae
16	82	family	Bovidae
6	48	family	Rhinocerotidae
15	30	family	Muridae
10	29	family	Vespertilionidae
15	24	family	Cricetidae
7	16	family	Pteropodidae
6	12	family	Soricidae
2	11	family	Otariidae
2	11	family	Ursidae

To override the default filters and process all potential mammalian families, repeat the quicksand-run with the flags --krakenuniq_min_kmers 11 and --krakenuniq_min_reads 2. This is done to apply more permissive KrakenUniq filter thresholds:

mv quicksand_v2.6/ quicksand_v2.6.old/

nextflow run mpieva/quicksand \
    -r v2.6 \
    --split split/ \
    --db ../refseq/kraken/Mito_db_kmer22/ \
    --genomes ../refseq/genomes/ \
    --bedfiles ../refseq/masked/ \
    --krakenuniq_min_kmers 11 \
    --krakenuniq_min_reads 2 \
    -profile singularity

The lower filter thresholds increase the runtime (~10 minutes) and allow 600 families to pass the KrakenUniq step. Of these, 62 assignments have at least one mapped sequence, and 57 pass the PEB and PSF filters. Only four mammalian families pass both filters (Table below). Given the low sequence counts per family, only the Hyaenidae shows significant Ancientness levels (++).

Although the number of sequences is very low, the detected families correspond to the four most abundant ones in the sample, as determined using the mammalian capture library.

Family	Best Reference	Reads	PSF	Coverage	PEB
Hyaenidae (++)	Crocuta_crocuta	33	17.58	0.085x	1.153
Rhinocerotidae	Coelodonta_antiquitatis	5	2.74	0.01x	1.1
Bovidae	Procapra_przewalskii	2	1.09	0.0x	1.135
Ursidae	Ursus_spelaeus	1	0.54	0.0x	1.134

To summarize, while we discourage using quicksand on shallow shotgun sequencing data to draw biological conclusions about the taxonomic composition of a sediment sample, it can provide a first impression of DNA preservation and help detect high-yielding samples for follow-up capture experiments.

In contrast, Section ds-shot shows the application of quicksand on a deeply sequenced double-stranded library.

Double stranded Libraries¶

Sequences from libraries prepared with a double-stranded (dsDNA) protocol display different deamination patterns compared to those from single-stranded libraries, showing G-to-A substitutions at the 3′ ends of DNA sequences rather than the C-to-T substitutions typical of single-stranded libraries.

To account for these G-to-A patterns in the Ancientness assessment, quicksand needs to be run with the --doublestranded flag

dsDNA / Mammalian mtDNA Capture¶

Runtime: ~1.5h

For the analysis of double-stranded mammalian mitochondrial capture libraries, we used published data from sediment samples collected at El Mirón Cave (Cantabria, Spain). These data were generated by Gelabert et al. 2025 and are available in the ENA under the accession ID PRJEB74514 (Table below). The libraries were enriched for 51 mammalian mtDNA genomes (Tejero et al. 2024) and sequenced to a depth of 5–10 million reads per library. In the original study, Gelabert et al. analyzed the data using euka to bin sequences and assess aDNA preservation, followed by a BLAST/MEGAN-based approach to refine taxonomic classifications to the genus and species levels.

ENA Entry	Sample	Layer	ENA Entry	Sample	Layer
ERR13916465	1	121	ERR13916459	16	122/124
ERR13916464	2	122/124	ERR13916456	17	125
ERR13916460	3	122/124	ERR13916454	18	126
ERR13916458	4	122/124	ERR13916450	19	127
ERR13916455	5	125	ERR13916448	20	128
ERR13916453	6	126	ERR13916440	21	129
ERR13916452	7	127	ERR13916436	22	130
ERR13916449	8	128	ERR13916433	23	130
ERR13916439	9	129	ERR13916430	24	130
ERR13916429	10	130	ERR13916467	26	119
ERR13916432	11	130	ERR13916443	28	128
ERR13916435	12	130	ERR13916446	30	128
ERR13916466	13	121	ERR13916445	32	128
ERR13916462	14	122/124	ERR13916442	34	128
ERR13916461	15	122/124	ERR13916438	36	130
			ERR13916437	38	130

With the quicksand database downloaded to the refseq directory (Quickstart), create a subdirectory for the mammalian mtDNA capture analysis:

mkdir dsDNA_miron && cd dsDNA_miron

Then, download the required 32 library files into a new input directory for parallel processing:

mkdir split && cd split \
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/038/ERR13916438/ERR13916438.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/067/ERR13916467/ERR13916467.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/037/ERR13916437/ERR13916437.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/066/ERR13916466/ERR13916466.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/082/ERR13953082/ERR13953082.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/043/ERR13916443/ERR13916443.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/046/ERR13916446/ERR13916446.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/059/ERR13916459/ERR13916459.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/035/ERR13916435/ERR13916435.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/040/ERR13916440/ERR13916440.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/052/ERR13916452/ERR13916452.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/045/ERR13916445/ERR13916445.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/030/ERR13916430/ERR13916430.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/062/ERR13916462/ERR13916462.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/053/ERR13916453/ERR13916453.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/036/ERR13916436/ERR13916436.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/032/ERR13916432/ERR13916432.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/056/ERR13916456/ERR13916456.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/061/ERR13916461/ERR13916461.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/054/ERR13916454/ERR13916454.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/049/ERR13916449/ERR13916449.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/058/ERR13916458/ERR13916458.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/055/ERR13916455/ERR13916455.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/060/ERR13916460/ERR13916460.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/029/ERR13916429/ERR13916429.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/065/ERR13916465/ERR13916465.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/064/ERR13916464/ERR13916464.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/042/ERR13916442/ERR13916442.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/033/ERR13916433/ERR13916433.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/048/ERR13916448/ERR13916448.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/050/ERR13916450/ERR13916450.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/039/ERR13916439/ERR13916439.bam
cd ../

Then, run quicksand using the default parameters and the --doublestranded flag:

quicksand run mpieva/quicksand \
-r v2.6 \
--split split \
--genomes refseq/genomes \
--db refseq/kraken/Mito_db_kmer22 \
--bedfiles refseq/masked \
--doublestranded \
-profile singularity

After completion, the summary report contains the results for all 32 libraries. Applying the default filters (PSF of 0.5 and PEB of 0.5; quicksand_v2.6/filtered_report_0.5p_0.5b.tsv), recovers the same mammalian taxa reported by Gelabert et al. (Table below).

Interestingly, we detect ancient DNA from several bird families across multiple samples: Accipitridae (11 samples), Columbidae (6), Falconidae (2), Phasianidae (3), and Strigidae (11), as well as ancient fish mtDNA (Salmonidae) in 14 samples, consistent with archaeological evidence for these families in El Mirón Cave. All these families pass the PEB-filter, even though they are not part of the mammalian enrichment set (Tejero et al. 2024; supplementary data).

Ancient Family	Best References (genera)	Number of Samples (Samples with > 5x Coverage)	Layers
Bovidae	Capra, Bos	31 (26)	All
Canidae	Canis, Cuon, Vulpes	32 (23)	All
Cervidae	Capreolus, Cervus, Rangifer	32 (28)	All
Cricetidae	Microtus	28 (15)	121, 122/124, 125, 126, 127, 128, 129, 130
Elephantidae	Mammuthus	4 (0)	128,130
Equidae	Equus	22 (3)	121, 122/124, 125, 126, 127, 128, 129, 130
Felidae	Lynx, Panthera, Felis	24 (3)	119, 122/124, 125, 126, 127, 128, 129, 130
Hominidae	Homo	13 (2)	121, 122/124, 125, 126, 127, 129
Hyaenidae	Crocuta	19 (9)	119, 121, 125, 126, 128, 129, 130
Leporidae	Lepus	13 (0)	121, 122/124, 125, 126, 127, 128, 129
Muridae	Apodemus	5 (0)	130
Mustelidae	Mustela	2 (2)	122/124, 128
Rhinocerotidae	Coelodonta	3 (0)	130
Soricidae	Sorex	7 (2)	119, 127, 128, 130
Suidae	Sus	3 (0)	130
Talpidae	Talpa	13 (0)	121, 122/124, 126, 127, 128, 130
Ursidae	Ursus	24 (2)	119, 122/124, 125, 126, 127, 128, 129, 130

The original study reported 70 assignments with coverage >5x. Re-analysis of the data using quicksand identifies more samples exceeding this threshold (115) and with higher coverage for all reported families.

To illustrate how quicksand output can be analysed further, we here repeat the human haplogroup analysis for three samples from the Solutrean levels 121, 122, and 126 reported by Gelabert et al. 2025. In all three cases, quicksand yields higher mtDNA coverage than previously reported: 4.86x, 10.71x, and 11.37x compared to 2.5x, 5.6x, and 6.4x for levels 121, 122, and 126, respectively.

Haplogroup calling with HaploGrep 3 (Schönherr et al., 2023) requires an mtDNA consensus sequence based on alignment to the rCRS. However, the “best” reference genome for the Hominidae family in default quicksand runs is often the Denisovan or Neanderthal mtDNA genome. This occurs because Neanderthals and Denisovans are classified as subspecies of modern humans in the NCBI taxonomy (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?command=show&mode=tree&id=9606). Consequently, if KrakenUniq matches kmers to both the modern human and Neanderthal/Denisovan mtDNA genomes, the ‘best’ assignment is placed at the subspecies (Neanderthal/Denisovan) level rather than at the Homo node.

This issue can be avoided by running quicksand with a set of predefined (“fixed”) reference genomes for selected families, either directly or in a second step using the --rerun flag.

In “rerun” mode, quicksand skips the preprocessing and KrakenUniq classification and starts from the ExtractedReads for the selected “fixed” families (e.g., quicksand_v2.6/out/Hominidae/1-extracted/*.bam). quicksand then processes these sequences using the fixed reference genomes and updates the final summary reports to include the statistics for the additional mappings.

This example shows the combined use of the --fixed with the --rerun flag.

The input for the --fixed flag is a tab-separated file specifying the family, a unique tag (or species name), and the absolute path to the mtDNA reference genome to be used. The number of fixed references per family is not limited, allowing alignments and statistics to be generated for multiple reference genomes per family if needed.

Run quicksand with the --rerun and --fixed options to map all Hominidae sequences to the rCRS. The rCRS is the Homo_sapiens reference mtDNA genome that is part of the quicksand database

Prepare the fixed.tsv input file:

export FAMILY=”Hominidae”​
export TAG=”rCRS”
export
GENOME=”$PWD/../refseq/genomes/Hominidae/Homo_sapiens.fasta”​
​
echo -e "Family\tTag\tGenome" > fixed.tsv
echo -e $FAMILY\t$TAG\t&GENOME >> fixed.tsv

Repeat the quicksand run with the --rerun and --fixed options, as well as the --doublestranded flag:

quicksand run mpieva/quicksand \
    -r v2.6 \
    --fixed fixed.tsv \
    --rerun \
    --doublestranded \
    -profile singularity

The mapped and deduplicated bam-files can be found in the freshly created fixed subdirectory of the family specific output-folder (quicksand_v2.6/out/Hominidae/fixed/3-deduped/). Construct a FASTA consensus sequence from the rCRS-mapped and deduplicated sequences of each sample using ANGSD v0.940 (Korneliussen et al. 2014), with the most common allele option (-doFasta 2 and -doCounts 1 flags), trimming the first and last base of each sequence to mitigate aDNA damage effects (-trim 1).

Index the sample BAM-files:

samtools index quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916465.Hominidae.rCRS_deduped.bam​
samtools index quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916462.Hominidae.rCRS_deduped.bam
samtools index quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916454.Hominidae.rCRS_deduped.bam

Create FASTA consensus sequences:

angsd -out ERR13916465.fasta -i quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916465.Hominidae.rCRS_deduped.bam -doFasta 2 -doCounts 1 -trim 1
angsd -out ERR13916462.fasta -i quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916462.Hominidae.rCRS_deduped.bam -doFasta 2 -doCounts 1 -trim 1
angsd -out ERR13916454.fasta -i quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916454.Hominidae.rCRS_deduped.bam -doFasta 2 -doCounts 1 -trim 1

Next, use the HaploGrep 3 web tool (https://haplogrep.i-med.ac.at/) to assign human mtDNA haplogroups to each sample (Table below).

Sample	Layer	Gelabert et al. 2015	This study
ERR13916465	121	R or U	U
ERR13916462	122	U2'3'4'7'8'9	U2'3'4'7'8'9
ERR13916454	126	U2'3'4'7'8'9	U4’9

The detected haplogroups are consistent with those reported by Gelabert et al. 2025. Due to the increased amount of data recovered using quicksand, we were able to refine the haplogroup assignments for sample ERR13916465 (U) and ERR13916454 (U4’9, a clade within U2'3'4'7'8'9).

While a complete re-evaluation of this dataset is beyond the scope of this example, it demonstrates that quicksand can be used to analyze mammalian capture enrichment data generated from double-stranded DNA libraries using a different probeset. The results are consistent with those obtained in the original publication, and quicksand classifies more sequences, allowing the identification of additional families and providing more data for mtDNA haplogroup assignment.

We also illustrate how quicksand output can be used for additional downstream analyses and how run options can be adjusted to meet the requirements of external tools. While Gelabert et al. applied several additional quality-control and filtering steps to remove modern faunal contaminants (not replicated here), we show that quicksand is suitable to provide a quick and easy starting point for subsequent in-depth, taxa-specific analyses.

dsDNA / Shotgun Sequencing¶

This chapter demonstrates the analysis of a deeply sequenced dsDNA shotgun library with quicksand.

For this example, we use data from sediment sample ‘SAT29’ from Satsurblia Cave (Georgia), described in Gelabert et al. 2021, containing ~522 million sequences. The BAM file available in the ENA includes raw reads processed with cutadapt v2.7 to remove sequencing adapters and poly-A–tailed reads.

With the quicksand database downloaded to the refseq directory (Quickstart), create a subdirectory for the shotgun library analysis and download the required library file into a fresh input directory:

mkdir dsDNA_sat && cd dsDNA_sat
​
wget -P split \
ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR602/004/ERR6024164.bam​

Then run quicksand with the default parameters and the --doublestranded flag:

quicksand run mpieva/quicksand \
    -r v2.6 \
    --split split \
    --genomes ../refseq/genomes \
    --db ../refseq/kraken/Mito_db_kmer22 \
    --bedfiles ../refseq/masked \
    --doublestranded \
    -profile singularity

After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. quicksand detects 1,237 families, of which 382 have at least one mapped sequence, and 14 pass the PSF and PEB filter thresholds. Five mammalian families are detected: Canidae (best reference: Canis lupus), Bovidae (best reference: Bison bonasus), Hominidae (best reference: Homo sapiens subsp. ‘Denisova’), and Cervidae (best reference: Cervus canadensis), with Ancientness ratings of ++ and mtDNA coverages of 2.96x, 1.84x, 1.94x, and 0.37x, respectively.

In addition, Cricetidae (best reference: Microtus ilaeus) is detected with Ancientness + and mtDNA coverage of 0.2x (Table below).

Family	Extracted	Kmers	KmerDupRate	Best Reference	Mapped	Deduped	Ancientness	PSF	Coverage	PEB
Canidae	2223	101 (8925)	1.69 (2.63)	Canis_lupus_familiaris	1666	1130	++	12.42	2.96	0.950
Bovidae	1783	1224 (6616)	2.69 (2.33)	Bison_bonasus	839	570	++	6.18	1.84	0.989
Cricetidae	259238	253 (1845)	1.38 (613.0)	Microtus_ilaeus	99	66	+	0.71	0.21	1.093
Hominidae	1080	27 (7082)	1.3 (2.35)	Homo_sapiens_subsp._'Denisova'	911	644	++	7.10	1.94	1.001
Cervidae	455	58 (1920)	2.21 (1.58)	Cervus_canadensis	191	133	++	1.51	0.37	1.081

We use this example to illustrate how the Kmers and KmerDupRate columns in the final summary report can be used for extended trouble-shooting. As shown in Table, the Cricetidae stands out with 259,238 sequences classified by KrakenUniq (Extracted). However, these sequences are based on only 1,845 unique kmers and exhibit a very high kmer duplication rate compared to the other families (613). Particularly telling is that only 99 of the 259,238 sequences successfully map to the best reference genome (Mapped). This combination of data indicates that sequences in the dataset containing a highly repetitive motif are being assigned to Cricetidae. Visual inspection of the sequences assigned to Cricetidae (quicksand_v2.6/out/Cricetidae/1-extracted/ERR6024164_extractedReads-Cricetidae.bam) shows that most classified sequences are identical in length (96 bp) and end with the same or highly similar sequence followed by a poly-G tail (ACTCCAGTCACCAGGAGGATCTCGTATGCCGTCTTCTGCTTGAAAA–PolyG), consistent with technical artifacts such as non-adapter-clipped reads. Although the 99 mapped Cricetidae sequences pass the PSF and PEB quicksand filters and appear authentic, this family-level assignment should be interpreted with caution.

We note that Gelabert et al. (2021) used only 226,880,778 sequences for their screening, after applying additional quality-filtering steps that are not replicated here that would probably remove these sequences.

The assignments of the Bovidae, Canidae, and Hominidae families are consistent with the analysis by Gelabert et al. 2021, who used CENTRIFUGE in combination with the whole non-redundant NCBI nucleotide database and a 2% classified-reads cutoff to screen the shotgun data to decide on organisms for follow-up targeted capture experiments and whole genome mapping. In this example, we show that in this case quicksand could be used to replace the CENTRIFUGE step for the detection of mammalian families in the sample. quicksand detects an additional family, the Cervidae, for which archaeological evidence was found in Satsurblia Cave.

In summary, this section demonstrates that quicksand can be run with default parameters on deeply sequenced shotgun data to extract the mammalian component that is consistent with the screening of the same data using a nuclear DNA database.

We also show how the KrakenUniq summary statistics in the final quicksand report can assist in trouble-shooting data-anomalies.

Examples / Tutorial¶

Download quicksand-database¶

Single stranded Libraries¶

ssDNA / Mammalian mtDNA Capture¶

ssDNA / Human mtDNA Capture¶

ssDNA / Shotgun Sequencing¶

Double stranded Libraries¶

dsDNA / Mammalian mtDNA Capture¶

dsDNA / Shotgun Sequencing¶

quicksand

Navigation

Related Topics