Examples / Tutorial¶
On this page, we provide examples for the analysis of DNA-libraries published in three different studies to demonstrate the use of quicksand on different library- (single- and double-stranded) and experiment types (capture and shotgun). We demonstrate how the final summary report is evaluated and how filter thresholds can be adjusted.
The main columns used to evaluate the final summary report are FamPercentage and ProportionExpectedBreadth, which correspond to the PSF and PEB filter values, respectively (see the quicksand publication).
Importantly, the PEB filter is generally more informative than the PSF filter because the breadth of coverage is a key metric for determining assignment accuracy. However, since PEB assumes unbiased (random) mapping of sequences to the reference genome, lower values may occur when this expectation is violated; for instance, due to the enrichment strategy or lacking reference genome representation in the database.
Additional columns provided in the report for quality control include the KrakenUniq statistics: Kmers (unique kmer count), KmerCoverage (proportion of kmers observed), and KmerDupRate (average number of times each kmer was observed) as well as the ProportionMapped column.
Download quicksand-database¶
Please see the Quickstart section to download the required reference datastructure (refseq).
quicksand was developed and tested using a database built from the mammalian mtDNA genomes in RefSeq. For broader usability, the database provided on the FTP server was built from the full RefSeq release 221, including non-mammalian mtDNA genomes. This was done, because database construction is computationally intensive and may not be feasible for users with limited computational resources.
While this expanded database enables the detection of a wider range of organisms, quicksand (and the filter thresholds) have not been systematically tested for the validation of non-mammalian assignments. For this section, we therefore focus on the mammalian classifications only.
Single stranded Libraries¶
For the estimation of ancient DNA damage rates, quicksand assumes that sequences are derived from libraries prepared using a single stranded library preparation protocol (ssDNA). For the analysis of libraries prepared with a double stranded library preparation method, see Double stranded Libraries
We first demonstrate running quicksand on ssDNA libraries enriched for mammalian mtDNA (ssDNA / Mammalian mtDNA Capture) and human mtDNA (ssDNA / Human mtDNA Capture) as well as shallow shotgun sequencing data (ssDNA / Shotgun Sequencing).
As example, we use a librares created from the same sediment sample (SP3854), collected from Layer 15 of Denisova Cave, East Chamber, published in Slon et al. 2017. The data are available in the European Nucleotide Archive (ENA), and an overview of the libraries is provided below. An example for the analysis of multiple samples in parallel, see dsDNA / Mammalian mtDNA Capture.
Sample |
Layer |
Library |
Library Type |
Sequences |
ENA Entry |
|---|---|---|---|---|---|
SP3854 |
EC 15 |
R4095 |
Mammalian mtDNA Capture |
1,837,960 |
ERR1883545 |
SP3854 |
EC 15 |
R4051 |
Human mtDNA Capture |
1,512,964 |
ERR1883655 |
SP3854 |
EC 15 |
R5723 |
Shotgun |
1,638,519 |
ERR1883446 |
ssDNA / Mammalian mtDNA Capture¶
Runtime: ~3 min.
With the quicksand database downloaded to the refseq directory (Quickstart),
create a subdirectory for the mammalian mtDNA capture analysis and download the
required library file into a fresh input directory:
mkdir ssDNA_mammMT && cd ssDNA_mammMT
wget -P split
ftp.sra.ebi.ac.uk/vol1/run/ERR188/ERR1883545/R4095.bam
Then run quicksand with the default parameters:
nextflow run mpieva/quicksand \
-r v2.6 \
--split split/ \
--db ../refseq/kraken/Mito_db_kmer22/ \
--genomes ../refseq/genomes/ \
--bedfiles ../refseq/masked/ \
-profile singularity
After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. In total, 22 mammalian families are reported
of which 20 show Ancientness levels of + or ++ (see Table below). Nine mammalian families pass the default PSF and PEB filter thresholds (see
quicksand_v2.6/filtered_report_0.5p_0.5b.tsv).
Family |
Best Reference |
Reads |
PSF |
Coverage |
PEB |
|---|---|---|---|---|---|
Hyaenidae |
Crocuta_crocuta |
7616 |
34.98 |
24.84x |
0.662 |
Rhinocerotidae |
Coelodonta_antiquitatis |
3965 |
19.15 |
11.32x |
0.934 |
Bovidae |
Bos_grunniens |
3448 |
16.16 |
10.17x |
0.824 |
Ursidae |
Ursus_spelaeus |
1066 |
5.04 |
3.26x |
0.757 |
Canidae |
Canis_lupus |
970 |
4.52 |
2.71x |
0.758 |
Equidae |
Equus_ovodovi |
884 |
4.24 |
2.72x |
0.703 |
Elephantidae |
Mammuthus_columbi |
553 |
2.65 |
1.61x |
0.711 |
Mustelidae |
Martes_zibellina |
514 |
2.45 |
1.51x |
0.795 |
Cervidae |
Cervus_nippon |
281 |
1.32 |
0.82x |
0.718 |
Felidae |
Panthera_tigris |
369 |
1.74 |
1.04x |
0.346 |
Eupleridae |
Galidia_elegans |
225 |
1.10 |
0.7x |
0.103 |
Phocidae |
Monachus_monachus |
166 |
0.81 |
0.47x |
0.131 |
Sciuridae |
Marmota_himalayana |
155 |
0.75 |
0.44x |
0.291 |
Muridae |
Rattus_norvegicus |
141 |
0.67 |
0.4x |
0.193 |
Viverridae |
Paguma_larvata |
112 |
0.26 |
0.32x |
0.331 |
Cercopithecidae |
Macaca_sylvanus |
70 |
0.33 |
0.2x |
0.23 |
Cricetidae |
Neodon_irene |
65 |
0.31 |
0.18x |
0.443 |
Cebidae |
Sapajus_xanthosternos |
64 |
0.24 |
0.19x |
0.148 |
Hominidae |
Homo_sapiens_subsp._'Denisova' |
41 |
0.19 |
0.13x |
1.005 |
Otariidae |
Zalophus_japonicus |
33 |
0.16 |
0.09x |
0.402 |
The mammalian mtDNA enrichment analyzed here (targeting 242 mammalian mtDNA genomes) generally provides sufficiently random coverage across all mammalian families for the PEB filter to be effective. The lower PEB value observed for the Hyaenidae assignment (0.66), however, is likely due to the absence of Hyaenidae mtDNA in the capture probe set.
The Felidae, Eupleridae, Phocidae, and Sciuridae assignments pass the PSF filter but not the PEB filter. With the observed genomic coverages (between 0.4x and 1x), these cases are likely false positives. In contrast, the Hominidae assignment passes the PEB filter (PEB = 1.0) but not the PSF filter. The high PEB value for the Hominidae indicates an authentic assignment that would benefit from additional data for verification (human mtDNA capture).
In summary, for generic mammalian mtDNA capture libraries, the default quicksand filters (PSF = 0.5 and PEB = 0.5) provide a reliable set of family assignments.
ssDNA / Human mtDNA Capture¶
Runtime: ~3 min.
With the quicksand database downloaded to the refseq directory (Quickstart),
create a subdirectory for the mammalian mtDNA capture analysis and download the
required library file into a fresh input directory:
mkdir ssDNA_humMT && cd ssDNA_humMT
wget -P split ftp.sra.ebi.ac.uk/vol1/run/ERR188/ERR1883655/R4051.bam
Then run quicksand with the default parameters:
nextflow run mpieva/quicksand \
-r v2.6 \
--split split/ \
--db ../refseq/kraken/Mito_db_kmer22/ \
--genomes ../refseq/genomes/ \
--bedfiles ../refseq/masked/ \
-profile singularity
After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. A total of 15 mammalian families are
detected (Table below). Only the Hominidae family passes the default PSF and PEB filter thresholds (quicksand_v2.6/filtered_report_0.5p_0.5b.tsv).
Family |
Best Reference |
Reads |
PSF |
Coverage |
PEB |
|---|---|---|---|---|---|
Hyaenidae |
Crocuta_crocuta |
2894 |
55.97 |
9.75x |
0.19 |
Rhinocerotidae |
Coelodonta_antiquitatis |
587 |
12.61 |
1.94x |
0.192 |
Bovidae |
Bos_taurus |
280 |
5.40 |
0.86x |
0.231 |
Equidae |
Equus_ovodovi |
154 |
3.24 |
0.5x |
0.252 |
Felidae |
Neofelis_diardi |
146 |
2.81 |
0.43x |
0.107 |
Sciuridae |
Dremomys_rufigenis |
132 |
2.60 |
0.35x |
0.19 |
Hominidae |
Homo_sapiens_subsp._'Denisova' |
118 |
2.28 |
0.4x |
0.968 |
Cervidae |
Cervus_hanglu |
102 |
2.17 |
0.3x |
0.202 |
Muridae |
Rattus_norvegicus |
99 |
2.10 |
0.29x |
0.171 |
Cercopithecidae |
Macaca_nigra |
85 |
1.80 |
0.27x |
0.152 |
Ursidae |
Ursus_spelaeus |
81 |
1.44 |
0.27x |
0.354 |
Mustelidae |
Martes_zibellina |
75 |
1.50 |
0.25x |
0.314 |
Canidae |
Canis_aureus |
64 |
1.35 |
0.2x |
0.312 |
Elephantidae |
Mammuthus_primigenius |
55 |
1.18 |
0.15x |
0.322 |
Cricetidae |
Phodopus_sungorus |
36 |
0.77 |
0.09x |
0.351 |
Even in the human mtDNA capture data, assignments are dominated by faunal “bycatch,” i.e., mtDNA from other mammalian families that is also recovered by the human mtDNA probes. However, unlike for the mammalian mtDNA capture, non-human assignments do not meet the expectation of random mapping because they are biased towards the subset of sequences similar to the human mtDNA probes.
This substantially lowers the PEB-values of all non-human assignments, ranging from 0.1 to 0.35. As a consequence, the PEB filter cannot be used to identify true-positive assignments among the faunal bycatch; it can do so only for the Hominidae - the only family that meets the criterion of random mapping.
In addition, the PSF-values in single-species captures are less informative for distinguishing true- from false-positive families. The above-mentioned capture bias, combined with a reduced number of total detected families, changes the distribution of PSF values compared to the mammalian captures. We find an increased PSF for the dominant family (e.g., Hyaenidae: 55.97 vs. 34.98) and a leveling of PSF values across the remaining families. In this example, all mammalian families pass the default PSF threshold.
Because the human mtDNA capture experiment is intentionally biased toward Hominidae, applying the PSF filter is optional, as it could remove the sequences we are specifically trying to detect. Instead, we recommend relying on the PEB filter to assess the quality of Hominidae assignments.
For a more conservative approach, a stricter PEB threshold (e.g., 0.7) combined with a minimum number of reads or genomic coverage (e.g., >0.1x) could be applied.
ssDNA / Shotgun Sequencing¶
Runtime: ~3 min.
With the quicksand database downloaded to the refseq directory (Quickstart),
create a subdirectory for the mammalian mtDNA capture analysis and download the
required library file into a fresh input directory:
mkdir ssDNA_shotgun && cd ssDNA_shotgun
wget -P split ftp.sra.ebi.ac.uk/vol1/run/ERR188/ERR1883446/R5723.bam
Then run quicksand with the default parameters:
nextflow run mpieva/quicksand \
-r v2.6 \
--split split/ \
--db ../refseq/kraken/Mito_db_kmer22/ \
--genomes ../refseq/genomes/ \
--bedfiles ../refseq/masked/ \
-profile singularity
After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. A single mammalian family, the Hyaenidea is
detected with 33 sequences (Table below), showing an Ancientness level of ++ and passing both the default PSF and PEB filter thresholds
(quicksand_v2.6/filtered_report_0.5p_0.5b.tsv).
Family |
Best Reference |
Reads |
PSF |
Coverage |
PEB |
|---|---|---|---|---|---|
Hyaenidae |
Crocuta_crocuta |
33 |
40.50 |
0.08x |
1.153 |
With the low sequencing depth (1,638,519 sequences), only the Hyaenidae are detected by quicksand. The assignments in the shotgun data are unaffected by capture bias, resulting in a PEB value of ~1 for the Hyaenidae.
The final summary report includes only assignments that pass the initial KrakenUniq filters. As we demonstrated with simulated data (see the quicksand publication), the default thresholds for the number of unique kmers can be lowered for a more sensitive screening, allowing additional families to pass the initial KrakenUniq step and proceed to mapping and downstream evaluation.
quicksand saves the KrakenUniq report for each sample, which can be examined to identify families that would be included if the filter thresholds
were lowered. In this library, the report (quicksand_v2.6/stats/R5723.kraken.report) lists 10 mammalian family assignments,
of which only the Hyaenidae meets the default thresholds of 129 unique kmers and 3 sequences (Table below).
TaxReads |
Kmers |
Rank |
Name |
|---|---|---|---|
39 |
317 |
family |
Hyaenidae |
16 |
82 |
family |
Bovidae |
6 |
48 |
family |
Rhinocerotidae |
15 |
30 |
family |
Muridae |
10 |
29 |
family |
Vespertilionidae |
15 |
24 |
family |
Cricetidae |
7 |
16 |
family |
Pteropodidae |
6 |
12 |
family |
Soricidae |
2 |
11 |
family |
Otariidae |
2 |
11 |
family |
Ursidae |
To override the default filters and process all potential mammalian families, repeat the quicksand-run with the flags --krakenuniq_min_kmers 11 and
--krakenuniq_min_reads 2.
This is done to apply more permissive KrakenUniq filter thresholds:
mv quicksand_v2.6/ quicksand_v2.6.old/
nextflow run mpieva/quicksand \
-r v2.6 \
--split split/ \
--db ../refseq/kraken/Mito_db_kmer22/ \
--genomes ../refseq/genomes/ \
--bedfiles ../refseq/masked/ \
--krakenuniq_min_kmers 11 \
--krakenuniq_min_reads 2 \
-profile singularity
The lower filter thresholds increase the runtime (~10 minutes) and allow 600 families to pass the KrakenUniq step. Of these, 62 assignments have at least one mapped sequence, and 57 pass the PEB and PSF filters. Only four mammalian families pass both filters (Table below). Given the low sequence counts per family, only the Hyaenidae shows significant Ancientness levels (++).
Although the number of sequences is very low, the detected families correspond to the four most abundant ones in the sample, as determined using the mammalian capture library.
Family |
Best Reference |
Reads |
PSF |
Coverage |
PEB |
|---|---|---|---|---|---|
Hyaenidae (++) |
Crocuta_crocuta |
33 |
17.58 |
0.085x |
1.153 |
Rhinocerotidae |
Coelodonta_antiquitatis |
5 |
2.74 |
0.01x |
1.1 |
Bovidae |
Procapra_przewalskii |
2 |
1.09 |
0.0x |
1.135 |
Ursidae |
Ursus_spelaeus |
1 |
0.54 |
0.0x |
1.134 |
To summarize, while we discourage using quicksand on shallow shotgun sequencing data to draw biological conclusions about the taxonomic composition of a sediment sample, it can provide a first impression of DNA preservation and help detect high-yielding samples for follow-up capture experiments.
In contrast, Section ds-shot shows the application of quicksand on a deeply sequenced double-stranded library.
Double stranded Libraries¶
Sequences from libraries prepared with a double-stranded (dsDNA) protocol display different deamination patterns compared to those from single-stranded libraries, showing G-to-A substitutions at the 3′ ends of DNA sequences rather than the C-to-T substitutions typical of single-stranded libraries.
To account for these G-to-A patterns in the Ancientness assessment, quicksand needs to be run with the --doublestranded flag
dsDNA / Mammalian mtDNA Capture¶
Runtime: ~1.5h
For the analysis of double-stranded mammalian mitochondrial capture libraries, we used published data from sediment samples collected at El Mirón Cave (Cantabria, Spain). These data were generated by Gelabert et al. 2025 and are available in the ENA under the accession ID PRJEB74514 (Table below). The libraries were enriched for 51 mammalian mtDNA genomes (Tejero et al. 2024) and sequenced to a depth of 5–10 million reads per library. In the original study, Gelabert et al. analyzed the data using euka to bin sequences and assess aDNA preservation, followed by a BLAST/MEGAN-based approach to refine taxonomic classifications to the genus and species levels.
ENA Entry |
Sample |
Layer |
ENA Entry |
Sample |
Layer |
|---|---|---|---|---|---|
ERR13916465 |
1 |
121 |
ERR13916459 |
16 |
122/124 |
ERR13916464 |
2 |
122/124 |
ERR13916456 |
17 |
125 |
ERR13916460 |
3 |
122/124 |
ERR13916454 |
18 |
126 |
ERR13916458 |
4 |
122/124 |
ERR13916450 |
19 |
127 |
ERR13916455 |
5 |
125 |
ERR13916448 |
20 |
128 |
ERR13916453 |
6 |
126 |
ERR13916440 |
21 |
129 |
ERR13916452 |
7 |
127 |
ERR13916436 |
22 |
130 |
ERR13916449 |
8 |
128 |
ERR13916433 |
23 |
130 |
ERR13916439 |
9 |
129 |
ERR13916430 |
24 |
130 |
ERR13916429 |
10 |
130 |
ERR13916467 |
26 |
119 |
ERR13916432 |
11 |
130 |
ERR13916443 |
28 |
128 |
ERR13916435 |
12 |
130 |
ERR13916446 |
30 |
128 |
ERR13916466 |
13 |
121 |
ERR13916445 |
32 |
128 |
ERR13916462 |
14 |
122/124 |
ERR13916442 |
34 |
128 |
ERR13916461 |
15 |
122/124 |
ERR13916438 |
36 |
130 |
ERR13916437 |
38 |
130 |
With the quicksand database downloaded to the refseq directory (Quickstart),
create a subdirectory for the mammalian mtDNA capture analysis:
mkdir dsDNA_miron && cd dsDNA_miron
Then, download the required 32 library files into a new input directory for parallel processing:
mkdir split && cd split \
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/038/ERR13916438/ERR13916438.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/067/ERR13916467/ERR13916467.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/037/ERR13916437/ERR13916437.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/066/ERR13916466/ERR13916466.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/082/ERR13953082/ERR13953082.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/043/ERR13916443/ERR13916443.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/046/ERR13916446/ERR13916446.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/059/ERR13916459/ERR13916459.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/035/ERR13916435/ERR13916435.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/040/ERR13916440/ERR13916440.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/052/ERR13916452/ERR13916452.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/045/ERR13916445/ERR13916445.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/030/ERR13916430/ERR13916430.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/062/ERR13916462/ERR13916462.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/053/ERR13916453/ERR13916453.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/036/ERR13916436/ERR13916436.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/032/ERR13916432/ERR13916432.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/056/ERR13916456/ERR13916456.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/061/ERR13916461/ERR13916461.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/054/ERR13916454/ERR13916454.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/049/ERR13916449/ERR13916449.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/058/ERR13916458/ERR13916458.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/055/ERR13916455/ERR13916455.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/060/ERR13916460/ERR13916460.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/029/ERR13916429/ERR13916429.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/065/ERR13916465/ERR13916465.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/064/ERR13916464/ERR13916464.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/042/ERR13916442/ERR13916442.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/033/ERR13916433/ERR13916433.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/048/ERR13916448/ERR13916448.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/050/ERR13916450/ERR13916450.bam
wget -nc ftp.sra.ebi.ac.uk/vol1/err/ERR139/039/ERR13916439/ERR13916439.bam
cd ../
Then, run quicksand using the default parameters and the --doublestranded flag:
quicksand run mpieva/quicksand \
-r v2.6 \
--split split \
--genomes refseq/genomes \
--db refseq/kraken/Mito_db_kmer22 \
--bedfiles refseq/masked \
--doublestranded \
-profile singularity
After completion, the summary report contains the results for all 32 libraries. Applying the default filters (PSF of 0.5 and PEB of 0.5;
quicksand_v2.6/filtered_report_0.5p_0.5b.tsv), recovers the same mammalian taxa reported by Gelabert et al. (Table below).
Interestingly, we detect ancient DNA from several bird families across multiple samples: Accipitridae (11 samples), Columbidae (6), Falconidae (2), Phasianidae (3), and Strigidae (11), as well as ancient fish mtDNA (Salmonidae) in 14 samples, consistent with archaeological evidence for these families in El Mirón Cave. All these families pass the PEB-filter, even though they are not part of the mammalian enrichment set (Tejero et al. 2024; supplementary data).
Ancient Family |
Best References (genera) |
Number of Samples (Samples with > 5x Coverage) |
Layers |
|---|---|---|---|
Bovidae |
Capra, Bos |
31 (26) |
All |
Canidae |
Canis, Cuon, Vulpes |
32 (23) |
All |
Cervidae |
Capreolus, Cervus, Rangifer |
32 (28) |
All |
Cricetidae |
Microtus |
28 (15) |
121, 122/124, 125, 126, 127, 128, 129, 130 |
Elephantidae |
Mammuthus |
4 (0) |
128,130 |
Equidae |
Equus |
22 (3) |
121, 122/124, 125, 126, 127, 128, 129, 130 |
Felidae |
Lynx, Panthera, Felis |
24 (3) |
119, 122/124, 125, 126, 127, 128, 129, 130 |
Hominidae |
Homo |
13 (2) |
121, 122/124, 125, 126, 127, 129 |
Hyaenidae |
Crocuta |
19 (9) |
119, 121, 125, 126, 128, 129, 130 |
Leporidae |
Lepus |
13 (0) |
121, 122/124, 125, 126, 127, 128, 129 |
Muridae |
Apodemus |
5 (0) |
130 |
Mustelidae |
Mustela |
2 (2) |
122/124, 128 |
Rhinocerotidae |
Coelodonta |
3 (0) |
130 |
Soricidae |
Sorex |
7 (2) |
119, 127, 128, 130 |
Suidae |
Sus |
3 (0) |
130 |
Talpidae |
Talpa |
13 (0) |
121, 122/124, 126, 127, 128, 130 |
Ursidae |
Ursus |
24 (2) |
119, 122/124, 125, 126, 127, 128, 129, 130 |
The original study reported 70 assignments with coverage >5x. Re-analysis of the data using quicksand identifies more samples exceeding this threshold (115) and with higher coverage for all reported families.
To illustrate how quicksand output can be analysed further, we here repeat the human haplogroup analysis for three samples from the Solutrean levels 121, 122, and 126 reported by Gelabert et al. 2025. In all three cases, quicksand yields higher mtDNA coverage than previously reported: 4.86x, 10.71x, and 11.37x compared to 2.5x, 5.6x, and 6.4x for levels 121, 122, and 126, respectively.
Haplogroup calling with HaploGrep 3 (Schönherr et al., 2023) requires an mtDNA consensus sequence based on alignment to the rCRS. However, the “best” reference genome for the Hominidae family in default quicksand runs is often the Denisovan or Neanderthal mtDNA genome. This occurs because Neanderthals and Denisovans are classified as subspecies of modern humans in the NCBI taxonomy (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?command=show&mode=tree&id=9606). Consequently, if KrakenUniq matches kmers to both the modern human and Neanderthal/Denisovan mtDNA genomes, the ‘best’ assignment is placed at the subspecies (Neanderthal/Denisovan) level rather than at the Homo node.
This issue can be avoided by running quicksand with a set of predefined (“fixed”) reference genomes for selected families, either directly or
in a second step using the --rerun flag.
In “rerun” mode, quicksand skips the preprocessing and KrakenUniq classification and starts from the ExtractedReads for the selected “fixed” families (e.g.,
quicksand_v2.6/out/Hominidae/1-extracted/*.bam). quicksand then processes these sequences using the fixed reference genomes and updates the final
summary reports to include the statistics for the additional mappings.
This example shows the combined use of the --fixed with the --rerun flag.
The input for the --fixed flag is a tab-separated file specifying the family, a unique tag (or species name), and the absolute path to the mtDNA reference genome to be
used. The number of fixed references per family is not limited, allowing alignments and statistics to be generated for multiple reference genomes per family if needed.
Run quicksand with the --rerun and --fixed options to map all Hominidae sequences to the rCRS. The rCRS is the Homo_sapiens
reference mtDNA genome that is part of the quicksand database
Prepare the fixed.tsv input file:
export FAMILY=”Hominidae”
export TAG=”rCRS”
export
GENOME=”$PWD/../refseq/genomes/Hominidae/Homo_sapiens.fasta”
echo -e "Family\tTag\tGenome" > fixed.tsv
echo -e $FAMILY\t$TAG\t&GENOME >> fixed.tsv
Repeat the quicksand run with the --rerun and --fixed options, as well as the --doublestranded flag:
quicksand run mpieva/quicksand \
-r v2.6 \
--fixed fixed.tsv \
--rerun \
--doublestranded \
-profile singularity
The mapped and deduplicated bam-files can be found in the freshly created fixed subdirectory of the family specific output-folder
(quicksand_v2.6/out/Hominidae/fixed/3-deduped/). Construct a FASTA consensus sequence from the rCRS-mapped and deduplicated sequences
of each sample using ANGSD v0.940 (Korneliussen et al. 2014), with the most common allele option
(-doFasta 2 and -doCounts 1 flags), trimming the first and last base of each sequence to mitigate aDNA damage effects (-trim 1).
Index the sample BAM-files:
samtools index quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916465.Hominidae.rCRS_deduped.bam
samtools index quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916462.Hominidae.rCRS_deduped.bam
samtools index quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916454.Hominidae.rCRS_deduped.bam
Create FASTA consensus sequences:
angsd -out ERR13916465.fasta -i quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916465.Hominidae.rCRS_deduped.bam -doFasta 2 -doCounts 1 -trim 1
angsd -out ERR13916462.fasta -i quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916462.Hominidae.rCRS_deduped.bam -doFasta 2 -doCounts 1 -trim 1
angsd -out ERR13916454.fasta -i quicksand_v2.6/out/Hominidae/fixed/3-deduped/ERR13916454.Hominidae.rCRS_deduped.bam -doFasta 2 -doCounts 1 -trim 1
Next, use the HaploGrep 3 web tool (https://haplogrep.i-med.ac.at/) to assign human mtDNA haplogroups to each sample (Table below).
Sample |
Layer |
Gelabert et al. 2015 |
This study |
|---|---|---|---|
ERR13916465 |
121 |
R or U |
U |
ERR13916462 |
122 |
U2'3'4'7'8'9 |
U2'3'4'7'8'9 |
ERR13916454 |
126 |
U2'3'4'7'8'9 |
U4’9 |
The detected haplogroups are consistent with those reported by Gelabert et al. 2025. Due to the increased amount of data recovered using quicksand, we were able to refine the haplogroup assignments for sample ERR13916465 (U) and ERR13916454 (U4’9, a clade within U2'3'4'7'8'9).
While a complete re-evaluation of this dataset is beyond the scope of this example, it demonstrates that quicksand can be used to analyze mammalian capture enrichment data generated from double-stranded DNA libraries using a different probeset. The results are consistent with those obtained in the original publication, and quicksand classifies more sequences, allowing the identification of additional families and providing more data for mtDNA haplogroup assignment.
We also illustrate how quicksand output can be used for additional downstream analyses and how run options can be adjusted to meet the requirements of external tools. While Gelabert et al. applied several additional quality-control and filtering steps to remove modern faunal contaminants (not replicated here), we show that quicksand is suitable to provide a quick and easy starting point for subsequent in-depth, taxa-specific analyses.
dsDNA / Shotgun Sequencing¶
This chapter demonstrates the analysis of a deeply sequenced dsDNA shotgun library with quicksand.
For this example, we use data from sediment sample ‘SAT29’ from Satsurblia Cave (Georgia), described in Gelabert et al. 2021, containing ~522 million sequences. The BAM file available in the ENA includes raw reads processed with cutadapt v2.7 to remove sequencing adapters and poly-A–tailed reads.
With the quicksand database downloaded to the refseq directory (Quickstart),
create a subdirectory for the shotgun library analysis and download the
required library file into a fresh input directory:
mkdir dsDNA_sat && cd dsDNA_sat
wget -P split \
ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR602/004/ERR6024164.bam
Then run quicksand with the default parameters and the --doublestranded flag:
quicksand run mpieva/quicksand \
-r v2.6 \
--split split \
--genomes ../refseq/genomes \
--db ../refseq/kraken/Mito_db_kmer22 \
--bedfiles ../refseq/masked \
--doublestranded \
-profile singularity
After completion, the final summary report is saved as quicksand_v2.6/final_report.tsv. quicksand detects 1,237 families, of which
382 have at least one mapped sequence, and 14 pass the PSF and PEB filter thresholds. Five mammalian families are detected: Canidae (best reference: Canis
lupus), Bovidae (best reference: Bison bonasus), Hominidae (best reference: Homo sapiens subsp. ‘Denisova’), and Cervidae (best reference: Cervus canadensis),
with Ancientness ratings of ++ and mtDNA coverages of 2.96x, 1.84x, 1.94x, and 0.37x, respectively.
In addition, Cricetidae (best reference: Microtus ilaeus) is detected with Ancientness + and mtDNA coverage of 0.2x (Table below).
Family |
Extracted |
Kmers |
KmerDupRate |
Best Reference |
Mapped |
Deduped |
Ancientness |
PSF |
Coverage |
PEB |
|---|---|---|---|---|---|---|---|---|---|---|
Canidae |
2223 |
101 (8925) |
1.69 (2.63) |
Canis_lupus_familiaris |
1666 |
1130 |
++ |
12.42 |
2.96 |
0.950 |
Bovidae |
1783 |
1224 (6616) |
2.69 (2.33) |
Bison_bonasus |
839 |
570 |
++ |
6.18 |
1.84 |
0.989 |
Cricetidae |
259238 |
253 (1845) |
1.38 (613.0) |
Microtus_ilaeus |
99 |
66 |
+ |
0.71 |
0.21 |
1.093 |
Hominidae |
1080 |
27 (7082) |
1.3 (2.35) |
Homo_sapiens_subsp._'Denisova' |
911 |
644 |
++ |
7.10 |
1.94 |
1.001 |
Cervidae |
455 |
58 (1920) |
2.21 (1.58) |
Cervus_canadensis |
191 |
133 |
++ |
1.51 |
0.37 |
1.081 |
We use this example to illustrate how the Kmers and KmerDupRate columns in the final summary report can be used for extended trouble-shooting.
As shown in Table, the Cricetidae stands out with 259,238 sequences classified by KrakenUniq (Extracted). However, these sequences are based on only
1,845 unique kmers and exhibit a very high kmer duplication rate compared to the other families (613). Particularly telling is that only 99 of the
259,238 sequences successfully map to the best reference genome (Mapped). This combination of data indicates that sequences in the dataset containing a
highly repetitive motif are being assigned to Cricetidae. Visual inspection of the sequences assigned to Cricetidae
(quicksand_v2.6/out/Cricetidae/1-extracted/ERR6024164_extractedReads-Cricetidae.bam) shows that most classified sequences are identical in length
(96 bp) and end with the same or highly similar sequence followed by a poly-G tail (ACTCCAGTCACCAGGAGGATCTCGTATGCCGTCTTCTGCTTGAAAA–PolyG), consistent with
technical artifacts such as non-adapter-clipped reads. Although the 99 mapped Cricetidae sequences pass the PSF and PEB quicksand filters and appear
authentic, this family-level assignment should be interpreted with caution.
We note that Gelabert et al. (2021) used only 226,880,778 sequences for their screening, after applying additional quality-filtering steps that are not replicated here that would probably remove these sequences.
The assignments of the Bovidae, Canidae, and Hominidae families are consistent with the analysis by Gelabert et al. 2021, who used CENTRIFUGE in combination with the whole non-redundant NCBI nucleotide database and a 2% classified-reads cutoff to screen the shotgun data to decide on organisms for follow-up targeted capture experiments and whole genome mapping. In this example, we show that in this case quicksand could be used to replace the CENTRIFUGE step for the detection of mammalian families in the sample. quicksand detects an additional family, the Cervidae, for which archaeological evidence was found in Satsurblia Cave.
In summary, this section demonstrates that quicksand can be run with default parameters on deeply sequenced shotgun data to extract the mammalian component that is consistent with the screening of the same data using a nuclear DNA database.
We also show how the KrakenUniq summary statistics in the final quicksand report can assist in trouble-shooting data-anomalies.