COVID-19 PubSeq: Public SARS-CoV-2 Sequence Resource

public sequences ready for download!

May 2021 update: we are now at 86,377 sequences with normalized metadata on AWS OpenData!


COVID-19 PubSeq (part 6)

1 Generating output for EBI

Would it not be great if an uploader to PubSeq could also export samples to, say, EBI? That is what we discuss in this section. The EBI submission process is somewhat laborious, so once you have submitted to PubSeq, why not export the same data to EBI with the least amount of effort?

COVID-19 PubSeq is a data source - both sequence data and metadata - that can be used to push data to other sources, such as EBI. You can register samples programmatically with a specific XML interface. Note that (at this point) if you want to submit a sequence (FASTA) it can only be done through the Webin-CLI. Raw data (FASTQ) can go through the XML interface.

EBI sequence resources are presented through ENA; see, for example, sequence MT394864.1.

EBI has XML Formats for

  • SUBMISSION
  • STUDY
  • SAMPLE
  • EXPERIMENT
  • RUN
  • ANALYSIS
  • DAC
  • POLICY
  • DATASET
  • PROJECT

with the schemas listed here. Since we are submitting sequences, we should follow the guidelines for submitting full genome assemblies as well as the general ENA guidelines. The first step is to define the study, next the sample, and finally the sequence (assembly).

2 Defining the EBI study

A study is defined here and looks like

<PROJECT_SET>
   <PROJECT alias="COVID-19 Washington DC">
      <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
      <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
      <SUBMISSION_PROJECT>
         <SEQUENCING_PROJECT/>
      </SUBMISSION_PROJECT>
   </PROJECT>
</PROJECT_SET>
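Because PubSeq already holds the metadata, the project XML above could be generated rather than written by hand. The following is a minimal sketch using only the Python standard library; the function name `build_project_xml` and its parameters are hypothetical and not part of PubSeq or any EBI tool, and the element layout simply mirrors the example above.

```python
import xml.etree.ElementTree as ET


def build_project_xml(alias: str, title: str, description: str) -> str:
    """Return a PROJECT_SET document holding one sequencing project.

    Hypothetical helper: the structure copies the hand-written example,
    it is not validated against the ENA schema here.
    """
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(project_set, space="   ")
    return ET.tostring(project_set, encoding="unicode")


if __name__ == "__main__":
    print(build_project_xml(
        "COVID-19 Washington DC",
        "Sequencing SARS-CoV-2 in the Washington DC area",
        "This study collects samples from COVID-19 patients "
        "in the Washington DC area",
    ))
```

Writing the returned string to project.xml gives a file that can be used in the submission step. ElementTree escapes characters such as & and < in titles and descriptions, which is easy to get wrong when pasting metadata into a template by hand.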

A submission 'command' is also required, which looks like

<SUBMISSION>
   <ACTIONS>
      <ACTION>
         <ADD/>
      </ACTION>
      <ACTION>
         <HOLD HoldUntilDate="TODO: release date"/>
      </ACTION>
   </ACTIONS>
</SUBMISSION>
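The submission XML can be produced the same way, with the release date supplied by the caller instead of being hard-coded. This is a sketch under assumptions: `build_submission_xml` is a hypothetical name, and formatting the date as YYYY-MM-DD is an assumption that should be checked against the ENA documentation before use.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional


def build_submission_xml(hold_until: Optional[datetime.date] = None) -> str:
    """Return a SUBMISSION document with an ADD action.

    If hold_until is given, a HOLD action is added as well.
    Assumption: HoldUntilDate takes an ISO date (YYYY-MM-DD).
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    # Each action is wrapped in its own ACTION element.
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    if hold_until is not None:
        ET.SubElement(
            ET.SubElement(actions, "ACTION"),
            "HOLD",
            HoldUntilDate=hold_until.isoformat(),
        )
    return ET.tostring(submission, encoding="unicode")


if __name__ == "__main__":
    # The date below is only a placeholder for illustration.
    print(build_submission_xml(datetime.date(2030, 1, 1)))
```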

The Webin system accepts these files with a command like

curl -u username:password -F "SUBMISSION=@submission.xml" \
  -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"

as described here. Note that this is the test server. For the final version use www.ebi.ac.uk instead of wwwdev.ebi.ac.uk. You may also need the --insecure switch to circumvent certificate checking.
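To avoid mixing up the test and production servers, the curl invocation can be assembled in a small script. The sketch below only builds the argument list for the curl command shown above; the function name `build_curl_command` and its `production` and `insecure` parameters are hypothetical, and nothing is sent unless the list is explicitly passed to curl.

```python
from typing import List

TEST_HOST = "wwwdev.ebi.ac.uk"       # test server from the text above
PRODUCTION_HOST = "www.ebi.ac.uk"    # final submissions go here


def build_curl_command(
    username: str,
    password: str,
    submission_path: str,
    project_path: str,
    production: bool = False,
    insecure: bool = False,
) -> List[str]:
    """Return the curl argument list for a Webin XML submission.

    Defaults to the test server so a real submission has to be
    requested explicitly with production=True.
    """
    host = PRODUCTION_HOST if production else TEST_HOST
    command = ["curl", "-u", f"{username}:{password}"]
    if insecure:
        # Skips certificate checking; use only when really needed.
        command.append("--insecure")
    command += [
        "-F", f"SUBMISSION=@{submission_path}",
        "-F", f"PROJECT=@{project_path}",
        f"https://{host}/ena/submit/drop-box/submit/",
    ]
    return command


if __name__ == "__main__":
    # Printing the command keeps this example free of network access.
    print(build_curl_command("username", "password",
                             "submission.xml", "project.xml"))
```

The returned list can be run with `subprocess.run(command, check=True)`. Passing a list rather than a shell string means that passwords or file names containing spaces or quotes need no extra escaping.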

work in progress (WIP)

3 Define the EBI sample

work in progress (WIP)

4 Define the EBI sequence

work in progress (WIP)



Other documents

We fetch sequence data and metadata. We query the metadata in multiple ways using SPARQL and ontologies.
We submit a sequence to the database. In this BLOG we fetch a sequence from GenBank and add it to the database.
We modify a workflow to get new output
We modify metadata for all to use! In this BLOG we add a field for a creative commons license.
Dealing with PubSeq localisation data
We explore the Arvados command line and API
Generate the files needed for uploading to EBI/ENA
Documentation for PubSeq REST API