public sequences ready for download! May 2021 update: we are now at 86,377 sequences with normalized metadata on AWS OpenData!
COVID-19 PubSeq is a free and open online public sequence resource for bioinformatics. Its data are federated, carry unique identifiers, and come with unique metadata, such as disambiguated geolocation. PubSeq runs on-the-fly analysis of sequenced SARS-CoV-2 samples, which allows for a quick turnaround in identifying new virus strains. Anyone can upload sequence material as FASTA or FASTQ files, with accompanying metadata, through a web interface or REST API.
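Before uploading a FASTA file it can help to check locally that it parses cleanly. The following is a minimal sketch, not part of PubSeq itself: the function names `parse_fasta` and `check_records` are hypothetical, and the checks (non-empty records, IUPAC nucleotide codes only) are assumptions about what a sensible pre-upload check might look like, not PubSeq's actual validation rules.

```python
from typing import Dict, Iterable, List

# IUPAC nucleotide codes plus the gap character (assumed acceptable alphabet).
VALID_BASES = set("ACGTUNRYKMSWBDHV-")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a mapping of record id to upper-cased sequence."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current_id = header[0]  # id is the first word of the header
            if current_id in records:
                raise ValueError(f"duplicate record id: {current_id}")
            records[current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data before first header")
            records[current_id] += line.upper()
    return records


def check_records(records: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for record_id, sequence in records.items():
        if not sequence:
            problems.append(f"{record_id}: empty sequence")
        unexpected = sorted(set(sequence) - VALID_BASES)
        if unexpected:
            problems.append(f"{record_id}: unexpected characters {unexpected}")
    return problems
```

A typical use would be `check_records(parse_fasta(open("sample.fasta")))`, where `sample.fasta` is a placeholder file name; a non-empty result lists what to fix before uploading.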
PubSeq is not owned by anyone. There is no central authority, and no single company owns the data or workflows. Our goal is simply to help map the viral variants, because early identification of variants helps with testing and treatments. COVID-19 PubSeq accepts sequence material from all sources. In addition, PubSeq has specific workflows for analysing Oxford Nanopore data in FAST5 and FASTQ format. If you have an Oxford Nanopore sequencer and need (free) help analysing SARS-CoV-2 FAST5 or FASTQ data, feel free to contact us!
COVID-19 PubSeq is also a sequence repository with a low barrier to entry: data are uploaded following best practices, including FAIR data principles. Data are published with metadata using state-of-the-art standards. Perhaps most importantly, standardised workflows are triggered on upload, so that results are immediately available in standardised data formats. Note that, in general, there is no conflict in also uploading your data to other repositories, including EBI/ENA and GISAID.
Your uploaded sequence will automatically be processed and incorporated into the public pangenome, together with its metadata, using workflows from the High Performance Open Biology Lab defined here. Importantly, all data are published under a Creative Commons license (CC0 or CC-BY-4.0). Anyone can take the published (GFA/RDF/FASTA) data and use it for further processing.
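As an illustration of such further processing, the sketch below summarises a pangenome graph in GFA format. It assumes the file follows the GFA 1 line layout (tab-separated, `S` lines for segments, `L` lines for links); the function name `summarise_gfa` is hypothetical and whether the published PubSeq files match this layout exactly is an assumption.

```python
from typing import Dict, Iterable


def summarise_gfa(lines: Iterable[str]) -> Dict[str, int]:
    """Count segments, links, and total segment bases in GFA 1 text."""
    segments = 0
    links = 0
    total_bases = 0
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "S" and len(fields) >= 3:
            # S lines: S <name> <sequence> [tags]; "*" means sequence not stored.
            segments += 1
            if fields[2] != "*":
                total_bases += len(fields[2])
        elif record_type == "L":
            # L lines describe an edge between two oriented segments.
            links += 1
    return {"segments": segments, "links": links, "total_bases": total_bases}
```

For a downloaded graph this could be called as `summarise_gfa(open("pangenome.gfa"))`, where the file name is a placeholder.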
The repository will be maintained and expanded for the duration of the pandemic (and beyond). To contribute data, simply upload it! To contribute code and/or workflows, see the project repository. For more information, see the FAQ and the paper.