COVID-19 PubSeq (part 6)

1 Generating output for EBI

Would it not be great if an uploader to PubSeq could also export samples to, say, EBI? That is what we discuss in this section. The submission process is somewhat laborious, so once you have submitted to PubSeq, why not export the same data to EBI with the least amount of effort?

COVID-19 PubSeq is a data source, holding both sequence data and metadata, that can be used to push data to other repositories, such as EBI. You can register samples programmatically through a specific XML interface. Note that (at this point) a sequence (FASTA) can only be submitted through the Webin-CLI. Raw data (FASTQ) can go through the XML interface.

EBI sequence resources are presented through ENA, the European Nucleotide Archive. See, for example, Sequence: MT394864.1.

EBI has XML Formats for

  • RUN
  • DAC

with the schemas listed here. Since we are submitting sequences, we should follow the guidelines for submitting full genome assemblies as well as the general ENA guidelines. The first step is to define the study, next the sample, and finally the sequence (assembly).

2 Defining the EBI study

A study is defined here and looks like

   <PROJECT alias="COVID-19 Washington DC">
      <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
      <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
   </PROJECT>
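
A project file like this can also be generated from metadata instead of being written by hand. The following is a minimal Python sketch, not the PubSeq implementation. The function name build_project_xml is hypothetical, and the PROJECT_SET wrapper and the SUBMISSION_PROJECT/SEQUENCING_PROJECT elements are assumptions about what ENA expects that should be checked against the current project schema before submitting.

```python
import xml.etree.ElementTree as ET


def build_project_xml(alias: str, title: str, description: str) -> str:
    """Return a project (study) document as an XML string.

    Hypothetical helper for illustration only.
    """
    # Assumption: projects are wrapped in a PROJECT_SET root element.
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    # Assumption: the project type is declared as a sequencing project.
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(project_set, space="   ")
    return ET.tostring(project_set, encoding="unicode")


project_xml = build_project_xml(
    alias="COVID-19 Washington DC",
    title="Sequencing SARS-CoV-2 in the Washington DC area",
    description=(
        "This study collects samples from COVID-19 patients "
        "in the Washington DC area"
    ),
)
print(project_xml)
```

Writing project_xml to project.xml gives a file that can be passed to the submission command. Building the document with ElementTree rather than string formatting means that special characters in titles and descriptions are escaped automatically.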

A submission 'command' is also required, which looks like

         <HOLD HoldUntilDate="TODO: release date"/>
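
The HOLD element above is only one part of the submission file. As a hedged illustration, the Python sketch below builds a full submission document around it. The function name build_submission_xml is hypothetical, and the SUBMISSION/ACTIONS/ACTION wrapper, the ADD action, and the YYYY-MM-DD date format are assumptions about the ENA submission schema that should be verified before use.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional


def build_submission_xml(hold_until: Optional[datetime.date] = None) -> str:
    """Return a submission 'command' document as an XML string.

    Hypothetical helper for illustration only.
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    # Assumption: an ADD action asks the server to register the
    # objects sent along with this submission file.
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    if hold_until is not None:
        # Keep the data private until the given release date.
        hold = ET.SubElement(ET.SubElement(actions, "ACTION"), "HOLD")
        hold.set("HoldUntilDate", hold_until.isoformat())
    if hasattr(ET, "indent"):  # Python 3.9+
        ET.indent(submission, space="   ")
    return ET.tostring(submission, encoding="unicode")


# The release date used here is an arbitrary example value.
submission_xml = build_submission_xml(datetime.date(2021, 1, 31))
print(submission_xml)
```

Passing no date leaves out the HOLD action entirely, so the release date stays an explicit decision of the submitter rather than a hidden default.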

The Webin system accepts such XML files using a command like

curl -u username:password -F "SUBMISSION=@submission.xml" \
  -F "PROJECT=@project.xml" ""

as described here. Note that this is the test server. For the final version use the production server instead of the test server. You may also need the --insecure switch to circumvent certificate checking.
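
To switch between the test and the production server without editing the command by hand, the curl invocation can be assembled in a small script. This is a sketch under stated assumptions: build_curl_command is a hypothetical helper, and SERVER_URL is a placeholder that must be replaced with the actual Webin submission URL, which is not reproduced here.

```python
import shlex
from typing import List


def build_curl_command(url: str, username: str, password: str,
                       submission_path: str = "submission.xml",
                       project_path: str = "project.xml",
                       insecure: bool = False) -> List[str]:
    """Return the curl argument list for uploading both XML files.

    Hypothetical helper for illustration only.
    """
    command = ["curl", "-u", f"{username}:{password}"]
    if insecure:
        # Skips certificate checking; use only when really needed.
        command.append("--insecure")
    command += [
        "-F", f"SUBMISSION=@{submission_path}",
        "-F", f"PROJECT=@{project_path}",
        url,
    ]
    return command


# Placeholder only, not a real endpoint.
SERVER_URL = "https://example.invalid/submit"

command = build_curl_command(SERVER_URL, "username", "password", insecure=True)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

The returned list can be handed to subprocess.run(command, check=True) to perform the upload. Keeping the arguments as a list avoids shell quoting problems with passwords and file names.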

work in progress (WIP)

3 Define the EBI sample

work in progress (WIP)

4 Define the EBI sequence

work in progress (WIP)

