COVID-19 PubSeq: Public SARS-CoV-2 Sequence Resource

public sequences ready for download!

May 2021 update: we are now at 86,377 sequences with normalized metadata on AWS OpenData!

PubSeq REST API

1 PubSeq REST API

Here we document the public REST API that comes with PubSeq. The examples double as tests that run in Emacs org-babel; see the end of this document for how to run them inside Emacs.

1.1 Introduction

We built a REST API for COVID-19 PubSeq. The API source code can be found in api.py. To check that the service is up, try:

curl http://covid19.genenetwork.org/api/version
{
  "service": "PubSeq",
  "version": 0.1
}

The Python 3 equivalent is:

import requests
baseURL="http://localhost:5067" # for development
# baseURL="http://covid19.genenetwork.org"
response = requests.get(baseURL+"/api/version")
response_body = response.json()
assert response_body["service"] == "PubSeq", "PubSeq API not found"
response_body
service : PubSeq
version : 0.1
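
For scripted health checks it can help to separate the validation of the response from the network call and to set a timeout. The following is a minimal sketch and not part of api.py: the helper names validate_version and fetch_version and the 10 second timeout are assumptions for illustration, and it uses the standard library's urllib instead of requests so that it runs without extra packages.

```python
import json
from urllib.request import urlopen


def validate_version(body):
    """Check a decoded /api/version response and return its version."""
    if body.get("service") != "PubSeq":
        raise ValueError("PubSeq API not found")
    return body["version"]


def fetch_version(base_url, timeout=10):
    """Fetch /api/version; raises on network errors or a non-PubSeq reply."""
    with urlopen(base_url + "/api/version", timeout=timeout) as response:
        return validate_version(json.load(response))
```

Calling fetch_version(baseURL) with one of the base URLs above should return the version number when the service is reachable.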

1.2 Search for an entry

When you use the search box on PubSeq, it queries the REST endpoint for information on the search terms. For example:

requests.get(baseURL+"/api/search?s=MT533203.1").json()
collection : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126
fasta : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta
id : MT533203.1
info : http://identifiers.org/insdc/MT533203.1#sequence

where collection is the raw uploaded data. The hash value in c= is computed on the contents of the Arvados keep collection and effectively acts as a deduplication uuid.
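
Because the c= value identifies the uploaded content, it can be handy to pull it out of a search result, for instance to spot two records that point at the same upload. Below is a small sketch under the assumption that the value always has the shape seen on this page (32 hexadecimal digits, a plus sign, and a number); search_url and collection_hash are hypothetical helper names, not part of the PubSeq API.

```python
import re
from urllib.parse import urlencode, urlsplit


def search_url(base_url, term):
    """Build the /api/search URL for a search term, escaping it safely."""
    return base_url + "/api/search?" + urlencode({"s": term})


def collection_hash(collection_url):
    """Return the value after 'c=' in a collection URL, or None if absent."""
    path = urlsplit(collection_url).path
    # Assumed shape: 32 hex digits, '+', digits (as in the examples here)
    match = re.search(r"c=([0-9a-f]{32}\+\d+)", path)
    return match.group(1) if match else None
```

The same function works on both the collection and the fasta links, since the fasta URL only appends /sequence.fasta to the collection URL.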

1.3 Fetch metadata

Using the collection link above, you can fetch the metadata in JSON as it was originally uploaded from the ShEx expression, e.g. using https://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/

It is better, however, to use the more advanced sample metadata fetcher, because it does a bit more in terms of expansion:

requests.get(baseURL+"/api/sample/MT533203.1.json").json()
collection : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126
date : 2020-04-27
fasta : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta
id : MT533203.1
info : http://identifiers.org/insdc/MT533203.1#sequence
mapper : minimap v. 2.17
sequencer : http://www.ebi.ac.uk/efo/EFO_0008632
specimen : http://purl.obolibrary.org/obo/NCIT_C155831
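
The record returned by the sample fetcher is a flat JSON object, so it is straightforward to post-process. As an illustration only, the sketch below parses the date and shortens the ontology URIs to their final path segment. The function name summarize_sample is hypothetical, and the code assumes every field shown above is present, which may not hold for all records.

```python
from datetime import date


def summarize_sample(meta):
    """Pick a few fields from a sample record and parse its date."""
    return {
        "id": meta["id"],
        "date": date.fromisoformat(meta["date"]),
        # Keep only the last path segment of each ontology URI
        "sequencer": meta["sequencer"].rsplit("/", 1)[-1],
        "specimen": meta["specimen"].rsplit("/", 1)[-1],
    }


# Field values copied from the response shown above
sample = {
    "id": "MT533203.1",
    "date": "2020-04-27",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831",
}
summary = summarize_sample(sample)
print(summary["date"].year, summary["sequencer"], summary["specimen"])
# prints: 2020 EFO_0008632 NCIT_C155831
```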

1.4 Fetch EBI XML

PubSeq provides an API that is used to export formats that are suitable for uploading data to EBI/ENA from our EXPORT menu. This is documented here.

requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text
<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
    <SAMPLE_NAME>
      <TAXON_ID>2697049</TAXON_ID>
      <SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
      <COMMON_NAME>SARS-CoV-2</COMMON_NAME>
    </SAMPLE_NAME>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>investigation type</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>sequencing method</TAG>
        <VALUE>http://purl.obolibrary.org/obo/OBI_0000759</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (latitude)</TAG>
        <VALUE></VALUE>
        <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (longitude)</TAG>
        <VALUE></VALUE>
        <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (country and/or sea)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (region and locality)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>environment (material)</TAG>
        <VALUE>http://purl.obolibrary.org/obo/NCIT_C155831</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>ENA-CHECKLIST</TAG>
        <VALUE>ERC000011</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
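
Before uploading, it may be useful to check which attributes in the exported XML are still empty. The following sketch reads the TAG/VALUE pairs with the standard library's xml.etree.ElementTree. The helper names sample_attributes and empty_tags are assumptions for illustration, and the code relies only on the element names visible in the response above.

```python
import xml.etree.ElementTree as ET


def sample_attributes(xml_text):
    """Map each SAMPLE_ATTRIBUTE tag to its value ('' when the value is empty)."""
    root = ET.fromstring(xml_text)
    attributes = {}
    for attribute in root.iter("SAMPLE_ATTRIBUTE"):
        tag = attribute.findtext("TAG")
        value = attribute.findtext("VALUE") or ""
        attributes[tag] = value
    return attributes


def empty_tags(attributes):
    """List the tags whose value is empty, e.g. to review before submission."""
    return [tag for tag, value in attributes.items() if not value]
```

Applied to the response text above, sample_attributes maps "collection date" to 2020-03-21, and empty_tags lists "investigation type" together with the four geographic location tags.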

2 Configure emacs to run tests

Execute a code block with C-c C-c. You may need to set

(org-babel-do-load-languages
 'org-babel-load-languages
 '((python . t)))
(setq org-babel-python-command "python3")
(setq org-babel-eval-verbose t)

To skip confirmations you may also want to set

(setq org-confirm-babel-evaluate nil)

To see the output of the interpreter, open the Python buffer.


Other documents

We fetch sequence data and metadata. We query the metadata in multiple ways using SPARQL and ontologies.
We submit a sequence to the database. In this BLOG we fetch a sequence from GenBank and add it to the database.
We modify a workflow to get new output
We modify metadata for all to use! In this BLOG we add a field for a creative commons license.
Dealing with PubSeq localisation data
We explore the Arvados command line and API
Generate the files needed for uploading to EBI/ENA
Documentation for PubSeq REST API