COVID-19 PubSeq is a free and open online bioinformatics resource for public sequences. It analyses sequenced SARS-CoV-2 samples on the fly, which allows a quick turnaround in the identification of new virus strains. PubSeq allows anyone to upload sequence material in the form of FASTA or FASTQ files, with accompanying metadata, through a web interface or REST API.
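Before uploading, it can help to sanity-check a FASTA file locally. The following is a minimal sketch, not PubSeq's own validator: the helper names `read_fasta` and `check_sequence` are hypothetical, and the accepted alphabet (IUPAC nucleotide codes plus `-` for gaps) is an assumption that may differ from the rules PubSeq applies on upload.

```python
from typing import Dict, List, Optional

# Assumption for illustration: accept IUPAC nucleotide codes plus '-' (gap).
# PubSeq's own server-side validation rules may be stricter or different.
ALLOWED = set("ACGTURYSWKMBDHVN-")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Hypothetical helper; a later duplicate header overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def check_sequence(seq: str) -> None:
    """Raise ValueError if the sequence is empty or contains unexpected characters."""
    if not seq:
        raise ValueError("empty sequence")
    bad = set(seq) - ALLOWED
    if bad:
        raise ValueError("unexpected characters: " + "".join(sorted(bad)))


if __name__ == "__main__":
    example = ">sample1\nACGTNNACGT\nACGT\n"
    for name, seq in read_fasta(example).items():
        check_sequence(seq)
        print(name, len(seq))  # prints: sample1 14
```

Lower-casing is a convenience here; the point of the sketch is only to catch obviously malformed files (missing headers, stray characters) before they reach the web interface or REST API.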
Our goal is to help map the viral variants in this pandemic. Early identification of variants helps with testing and treatment! COVID-19 PubSeq accepts sequence material from all sources (notably in FASTA format). In addition, PubSeq provides specific workflows for analysing Oxford Nanopore data in FAST5 and FASTQ format. If you have an Oxford Nanopore device and need (free) help analysing SARS-CoV-2 FAST5 or FASTQ data, feel free to contact us! For commercial support, you can also reach out.
COVID-19 PubSeq is also a sequence repository with a low barrier to entry for uploading sequence data following best practices, including FAIR data. Data are published with metadata using state-of-the-art standards and, perhaps most importantly, with standardised workflows that are triggered on upload, so that results are immediately available in standardised data formats. Note that there is generally no conflict in also uploading your data to other repositories, including EBI/ENA and GISAID.
Your uploaded sequence will automatically be processed and incorporated, together with its metadata, into the public pangenome using workflows from the High Performance Open Biology Lab defined here. Importantly, all data are published under a Creative Commons license (CC0 or CC-BY-4.0). Anyone can take the published data (GFA, RDF or FASTA) and use it for further processing.
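As one example of further processing, a downloaded pangenome in GFA format can be read with a few lines of code. This is a minimal sketch assuming GFA 1.0 text with tab-separated `S` (segment) and `L` (link) records; the function name `parse_gfa` is hypothetical, and path (`P`) lines and optional tags, which a real PubSeq export may contain, are simply ignored here.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str, str]  # (from_segment, from_orient, to_segment, to_orient)


def parse_gfa(text: str) -> Tuple[Dict[str, str], List[Link]]:
    """Collect segments (S lines) and links (L lines) from GFA 1.0 text.

    Other record types (H header, P paths, etc.) are ignored in this sketch.
    """
    segments: Dict[str, str] = {}
    links: List[Link] = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\r").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence> [tags]; sequence may be '*' when not stored
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap> [tags]
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


if __name__ == "__main__":
    gfa = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tTT\nL\t1\t+\t2\t+\t0M\n"
    segs, lnks = parse_gfa(gfa)
    print(len(segs), sum(len(s) for s in segs.values()), lnks)
    # prints: 2 6 [('1', '+', '2', '+')]
```

For large graphs or full GFA support (paths, walks, tags), a dedicated GFA library or toolkit is likely a better choice than hand-rolled parsing.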
The repository will be maintained and expanded for the duration of the pandemic (and beyond). To contribute data, simply upload it! To contribute code and/or workflows, see the project repository. For more information, see the FAQ and the paper.