COVID-19 PubSeq is a free and open online bioinformatics public
sequence resource with on-the-fly analysis of sequenced SARS-CoV-2
samples that allows for a quick turnaround in identification of new
virus strains. PubSeq allows anyone to upload sequence material in
the form of FASTA or FASTQ files with accompanying metadata through
a web interface or REST API.
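Before uploading, it can help to confirm that a file really is FASTA or FASTQ. The following is a minimal sketch, not part of PubSeq itself: the function name `sniff_sequence_format` is hypothetical. It relies only on the convention that FASTA records begin with `>` and FASTQ records begin with `@`.

```python
def sniff_sequence_format(text: str) -> str:
    """Guess the sequence format from the first non-blank line.

    Returns "fasta", "fastq", or "unknown". This is a hypothetical
    helper for illustration; it checks only the leading character
    and does not validate the rest of the file.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"
```

A check like this only catches obviously wrong files (for example, a spreadsheet export). It is not a substitute for the validation performed on upload.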
Our goal is to help map the viral variants in this pandemic. Early
identification of variants helps with testing and treatments!
COVID-19 PubSeq accepts sequence material from all sources (notably
in FASTA format). In addition, PubSeq provides specific workflows
for Oxford Nanopore analysis in FAST5 and FASTQ format. If you have
an Oxford Nanopore sequencer and need (free) help analysing SARS-CoV-2 FAST5
or FASTQ data, feel free to contact us! You can also
reach out for commercial support.
COVID-19 PubSeq is also a repository for sequences with a low
barrier to entry for uploading sequence data using best practices,
including FAIR
data. Data are published with metadata using state-of-the-art
standards and, perhaps most importantly, standardised
workflows are triggered on upload, so that results are
immediately available in standardised data formats. Note that, in
general, there is no conflict in also uploading your data to other
repositories, including EBI/ENA and GISAID.
Your uploaded sequence will automatically be processed and
incorporated into the public pangenome with metadata using workflows
from the High Performance Open Biology Lab
defined here. Importantly, all
data are published under
a Creative
Commons license (CC0 or CC-BY-4.0). Anyone can take the
published (GFA/RDF/FASTA) data and use it for
further processing.
The repository will be maintained and expanded for the
duration of the pandemic (and beyond). To contribute data,
simply upload it! To contribute code and/or workflows, see
the project
repository. For more information, see
the FAQ and
the paper.