Bravo API Tutorial

The Bravo variant browser allows authorized access to its data through an API. Two levels are supported:

  • Low-level API access. This is the most flexible way of accessing Bravo programmatically. It requires basic knowledge of HTTP requests and of command line tools such as curl.
  • High-level API access. This is the easiest way to query Bravo from the command line. It doesn't require any prior knowledge of HTTP requests or Bravo API endpoints.

What it is for

  • Programmatic lookups of candidate variants, genes, and regions.
  • Integration with web and command line tools.

What it is not for

  • Programmatic data dumps.
  • Querying millions of variants genome-wide at once.

This tutorial introduces high-level API access to the Bravo variant browser. We will cover how to:

  1. Set up the API on your machine.
  2. Retrieve a single variant.
  3. Retrieve all variants inside a region.
  4. Retrieve all variants inside a gene.
  5. Filter the results.
  6. Annotate your VCF file.

1. Setup

  1. Enable API access from your Profile->Settings page in Bravo.
  2. Download the bravo command line tool from https://bravo.sph.umich.edu/freeze5/hg38/static/tools/bravo. On Linux or macOS, you may need to make it executable before use by running chmod +x bravo. Windows users should instead download the bravo.exe command line tool from https://bravo.sph.umich.edu/freeze5/hg38/static/tools/windows/bravo.exe.
  3. Run ./bravo login and follow the instructions on your screen. Windows users should run bravo.exe login.

2. Retrieve a single variant

A single variant can be retrieved from Bravo using the ./bravo query-variant command.

Execute ./bravo query-variant -h to see all available options:

usage: bravo query-variant [-h] [-v chrom-pos-ref-alt/rs#] [-c name]
                           [-p base-pair] [-o {json,vcf}]

Query variant by identifier CHROM-POS-REF-ALT, or by chromosome name and
chromosomal position.

optional arguments:
  -h, --help            show this help message and exit
  -v chrom-pos-ref-alt/rs#, --variant chrom-pos-ref-alt/rs#
                        Variant identifier CHROM-POS-REF-ALT or rs#.
  -c name, --chromosome name
                        Chromosome name.
  -p base-pair, --position base-pair
                        Position.
  -o {json,vcf}, --output {json,vcf}
                        Output format.

If you know a variant's alleles, you can retrieve it by its identifier CHROM-POS-REF-ALT. The default output format is JSON.

./bravo query-variant -v 22-16389447-A-G
{u'allele_num': 125568, u'allele_freq': 0.0498614, u'ref': u'A', u'allele_count': 6261, u'pos': 16389447, u'filter': u'PASS', u'site_quality': 255.0, u'rsids': [u'rs34747326'], u'variant_id': u'22-16389447-A-G', u'alt': u'G', u'chrom': u'22'}
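
The record above is printed in the style of a Python dictionary (note the u'...' string prefixes), so a strict JSON parser may reject it. As a minimal sketch, assuming you have captured one output line as text, it can be parsed with Python's ast.literal_eval. The function name parse_record is hypothetical and not part of the bravo tool.

```python
import ast

def parse_record(line):
    """Parse one line of `bravo query-variant` output into a dict.

    Assumes the line is a Python-style dict literal, as in the example above.
    """
    record = ast.literal_eval(line.strip())
    if not isinstance(record, dict):
        raise ValueError("expected a dictionary literal")
    return record

# Output line copied from the example above.
line = (
    "{u'allele_num': 125568, u'allele_freq': 0.0498614, u'ref': u'A', "
    "u'allele_count': 6261, u'pos': 16389447, u'filter': u'PASS', "
    "u'site_quality': 255.0, u'rsids': [u'rs34747326'], "
    "u'variant_id': u'22-16389447-A-G', u'alt': u'G', u'chrom': u'22'}"
)
record = parse_record(line)
print(record["variant_id"], record["allele_freq"])
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text produced by an external tool.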

Alternatively, you can retrieve a variant by its rs#.

./bravo query-variant -v rs34747326
{u'allele_num': 125568, u'allele_freq': 0.0498614, u'ref': u'A', u'allele_count': 6261, u'pos': 16389447, u'filter': u'PASS', u'site_quality': 255.0, u'rsids': [u'rs34747326'], u'variant_id': u'22-16389447-A-G', u'alt': u'G', u'chrom': u'22'}

Or you can retrieve a variant by chromosome name and position. This way, you will see all alleles if the variant is multi-allelic.

./bravo query-variant -c 22 -p 16390137
{u'allele_num': 125568, u'allele_freq': 1.59276e-05, u'ref': u'T', u'allele_count': 2, u'pos': 16390137, u'filter': u'PASS', u'site_quality': 255.0, u'rsids': [], u'variant_id': u'22-16390137-T-A', u'alt': u'A', u'chrom': u'22'}
{u'allele_num': 125568, u'allele_freq': 7.96381e-06, u'ref': u'T', u'allele_count': 1, u'pos': 16390137, u'filter': u'PASS', u'site_quality': 255.0, u'rsids': [], u'variant_id': u'22-16390137-T-C', u'alt': u'C', u'chrom': u'22'}
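
In the two records above, allele_freq appears to equal allele_count divided by allele_num (for example, 2 / 125568 is about 1.59276e-05). The sketch below checks that relationship for both alleles. The values are copied from the output above; the helper name expected_allele_freq is hypothetical.

```python
import math

def expected_allele_freq(allele_count, allele_num):
    """Return allele_count / allele_num, or 0.0 when allele_num is zero."""
    return allele_count / allele_num if allele_num else 0.0

# (alt allele, allele_count, allele_num, reported allele_freq) from the output above.
alleles = [
    ("A", 2, 125568, 1.59276e-05),
    ("C", 1, 125568, 7.96381e-06),
]

for alt, ac, an, reported in alleles:
    computed = expected_allele_freq(ac, an)
    # The reported value is rounded, so compare with a relative tolerance.
    print(alt, computed, math.isclose(computed, reported, rel_tol=1e-5))
```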

The Bravo API also supports VCF output. Enable it with the -o (or --output) option.

./bravo query-variant -c 22 -p 16390137 -o vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=CEN,Description="Variant located in centromeric region with inferred sequences">
##FILTER=<ID=SVM,Description="Variant failed SVM filter">
##FILTER=<ID=DISC,Description="Mendelian or duplicate genotype discordance is high (3/5% or more)">
##FILTER=<ID=CHRXHET,Description="Excess heterozygosity in chrX in males">
##FILTER=<ID=EXHET,Description="Excess heterozygosity with HWE p-value < 1e-6">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	16390137	.	T	A	255.0	PASS	AN=125568;AC=2;AF=1.59276e-05
22	16390137	.	T	C	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
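
If you want to work with this VCF output in a script, each data line can be split into its eight tab-separated columns. The following is a minimal sketch, assuming one ALT allele per line and the AN, AC, and AF keys shown above; parse_vcf_line is a hypothetical helper, not part of the bravo tool.

```python
def parse_vcf_line(line):
    """Split one tab-separated VCF data line into a dict with typed values."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_fields[key] = value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,  # "." means no identifier
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "filter": flt,
        "AN": int(info_fields["AN"]),
        "AC": int(info_fields["AC"]),
        "AF": float(info_fields["AF"]),
    }

# Column header and data lines copied from the output above.
vcf_text = """#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
22\t16390137\t.\tT\tA\t255.0\tPASS\tAN=125568;AC=2;AF=1.59276e-05
22\t16390137\t.\tT\tC\t255.0\tPASS\tAN=125568;AC=1;AF=7.96381e-06
"""

# Skip empty lines and header lines, which start with "#".
records = [
    parse_vcf_line(line)
    for line in vcf_text.splitlines()
    if line and not line.startswith("#")
]
for rec in records:
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["AC"])
```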

3. Retrieve all variants inside a region

All variants inside a region can be retrieved from Bravo using the ./bravo query-region command.

./bravo query-region -c 22 -s 16387675 -e 16390908 -o vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=CEN,Description="Variant located in centromeric region with inferred sequences">
##FILTER=<ID=SVM,Description="Variant failed SVM filter">
##FILTER=<ID=DISC,Description="Mendelian or duplicate genotype discordance is high (3/5% or more)">
##FILTER=<ID=CHRXHET,Description="Excess heterozygosity in chrX in males">
##FILTER=<ID=EXHET,Description="Excess heterozygosity with HWE p-value < 1e-6">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	16387678	.	C	G	255.0	PASS	AN=125568;AC=2;AF=1.59276e-05
22	16387678	.	C	T	157.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387688	.	C	T	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387694	rs189315821	G	A	132.0	PASS	AN=125568;AC=2;AF=1.59276e-05
22	16387695	.	C	T	255.0	PASS	AN=125568;AC=2;AF=1.59276e-05
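
To run region queries from a script, the same command can be assembled and launched with Python's subprocess module. This is only a sketch: it assumes the bravo tool is in the current directory and that you have already logged in, and the function names are hypothetical. Only the -c, -s, -e, and -o options shown above are used.

```python
import subprocess

def build_region_command(chrom, start, end, output="vcf", tool="./bravo"):
    """Build the argument list for `bravo query-region`."""
    return [tool, "query-region", "-c", str(chrom),
            "-s", str(start), "-e", str(end), "-o", output]

def query_region(chrom, start, end, output="vcf", tool="./bravo"):
    """Run the query and return its standard output as text.

    Requires the bravo tool and a prior `bravo login`; raises
    subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        build_region_command(chrom, start, end, output, tool),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Building the command does not need the tool; running it does.
command = build_region_command(22, 16387675, 16390908)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems.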

4. Retrieve all variants inside a gene

All variants inside a gene can be retrieved from Bravo using the ./bravo query-gene command.

./bravo query-gene -n ABCD1P4 -o vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=CEN,Description="Variant located in centromeric region with inferred sequences">
##FILTER=<ID=SVM,Description="Variant failed SVM filter">
##FILTER=<ID=DISC,Description="Mendelian or duplicate genotype discordance is high (3/5% or more)">
##FILTER=<ID=CHRXHET,Description="Excess heterozygosity in chrX in males">
##FILTER=<ID=EXHET,Description="Excess heterozygosity with HWE p-value < 1e-6">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	16387695	.	C	T	255.0	PASS	AN=125568;AC=2;AF=1.59276e-05
22	16387700	.	C	T	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387702	.	T	C	116.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387706	.	T	C	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387714	.	C	T	255.0	PASS	AN=125568;AC=8;AF=6.37105e-05
22	16387718	.	G	T	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387719	.	C	A	255.0	PASS	AN=125568;AC=2;AF=1.59276e-05
22	16387721	.	T	G	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387722	.	G	T	255.0	PASS	AN=125568;AC=3;AF=2.38914e-05

5. Filtering

The bravo commands query-gene, query-region, and annotate (covered later) support filtering results with a filter expression.

Let's retrieve all singletons in the ABCD1P4 gene.

./bravo query-gene -n ABCD1P4 -f "allele_count==1" -o vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=CEN,Description="Variant located in centromeric region with inferred sequences">
##FILTER=<ID=SVM,Description="Variant failed SVM filter">
##FILTER=<ID=DISC,Description="Mendelian or duplicate genotype discordance is high (3/5% or more)">
##FILTER=<ID=CHRXHET,Description="Excess heterozygosity in chrX in males">
##FILTER=<ID=EXHET,Description="Excess heterozygosity with HWE p-value < 1e-6">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	16387700	.	C	T	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387702	.	T	C	116.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387706	.	T	C	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387718	.	G	T	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387721	.	T	G	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387728	.	G	A	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387737	.	C	G	135.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387738	.	C	A	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06
22	16387740	.	G	A	255.0	PASS	AN=125568;AC=1;AF=7.96381e-06

There are 478 singletons in this gene (the listing above is truncated). Now, let's see how many of them didn't pass the quality filters.

./bravo query-gene -n ABCD1P4 -f "allele_count==1&filter!=PASS" -o vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=CEN,Description="Variant located in centromeric region with inferred sequences">
##FILTER=<ID=SVM,Description="Variant failed SVM filter">
##FILTER=<ID=DISC,Description="Mendelian or duplicate genotype discordance is high (3/5% or more)">
##FILTER=<ID=CHRXHET,Description="Excess heterozygosity in chrX in males">
##FILTER=<ID=EXHET,Description="Excess heterozygosity with HWE p-value < 1e-6">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	16387844	.	G	C	50.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16388018	.	C	T	44.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16388182	.	C	A	32.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16388371	rs112139032	T	C	255.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16388398	rs112696740	G	T	255.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16388429	.	G	A	21.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16388598	.	C	T	32.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16389436	.	G	T	4.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16389699	.	C	A	255.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16389864	.	A	T	71.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16389873	.	T	A	60.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16390250	.	G	C	42.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16390363	.	C	T	50.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16390714	.	A	G	255.0	SVM	AN=125568;AC=1;AF=7.96381e-06
22	16390714	.	A	T	255.0	SVM	AN=125568;AC=1;AF=7.96381e-06

26 out of 476 singletons in the ABCD1P4 gene didn't pass quality filters.
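
Counts like these do not have to be read off the screen. As a sketch, assuming you have the VCF output as a list of lines, the FILTER column can be tallied with collections.Counter. The function name count_filters is hypothetical, and the sample uses only a few lines copied from the listings above.

```python
from collections import Counter

def count_filters(vcf_lines):
    """Count FILTER values (7th column) across VCF data lines, skipping headers."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        counts[line.split("\t")[6]] += 1
    return counts

sample = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "22\t16387700\t.\tC\tT\t255.0\tPASS\tAN=125568;AC=1;AF=7.96381e-06",
    "22\t16387844\t.\tG\tC\t50.0\tSVM\tAN=125568;AC=1;AF=7.96381e-06",
    "22\t16388018\t.\tC\tT\t44.0\tSVM\tAN=125568;AC=1;AF=7.96381e-06",
]
counts = count_filters(sample)
print(counts["PASS"], counts["SVM"])
```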

Bravo supports the ==, !=, <, >, <=, and >= comparison operators. They can be applied to the following variant attributes: allele_count (AC in VCF), allele_freq (AF in VCF), allele_num (AN in VCF), site_quality (QUAL in VCF), and filter (FILTER in VCF). As in the example above, conditions can be combined with &.
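
To make the meaning of such expressions concrete, here is a small client-side sketch that applies the same kind of expression to records you already hold in memory. It is an illustration only, not how Bravo evaluates filters: it assumes that & means logical AND (as the example output suggests), and the names parse_condition and matches are hypothetical.

```python
import operator

# Two-character operators come first so that "<=" is not mistaken for "<".
OPERATORS = [("==", operator.eq), ("!=", operator.ne), ("<=", operator.le),
             (">=", operator.ge), ("<", operator.lt), (">", operator.gt)]
# Numeric attributes are converted before comparing; filter stays a string.
NUMERIC = {"allele_count": int, "allele_num": int,
           "allele_freq": float, "site_quality": float}

def parse_condition(text):
    """Turn e.g. 'allele_count==1' into (attribute, comparison, typed value)."""
    for symbol, func in OPERATORS:
        if symbol in text:
            name, value = text.split(symbol, 1)
            name = name.strip()
            return name, func, NUMERIC.get(name, str)(value.strip())
    raise ValueError("no comparison operator in %r" % text)

def matches(record, expression):
    """Return True if the record satisfies every '&'-joined condition."""
    for condition in expression.split("&"):
        name, func, value = parse_condition(condition)
        if not func(record[name], value):
            return False
    return True

# Attribute values taken from variants shown in the listings above.
records = [
    {"allele_count": 1, "allele_num": 125568, "allele_freq": 7.96381e-06,
     "site_quality": 255.0, "filter": "PASS"},
    {"allele_count": 1, "allele_num": 125568, "allele_freq": 7.96381e-06,
     "site_quality": 50.0, "filter": "SVM"},
    {"allele_count": 2, "allele_num": 125568, "allele_freq": 1.59276e-05,
     "site_quality": 255.0, "filter": "PASS"},
]
selected = [r for r in records if matches(r, "allele_count==1&filter!=PASS")]
print(len(selected), selected[0]["filter"])
```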

6. Annotate your VCF

If you have a list of candidate variants, you can annotate them with information from Bravo using the bravo annotate command. Your list of variants must be in VCF format and, for best performance, should preferably be sorted by position.
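
If your list is not sorted yet, it can be sorted before annotation. The sketch below keeps header lines in place and orders data lines by chromosome and position. The chromosome ordering used here (numeric names first, then the rest alphabetically) is an assumption, since the tutorial does not specify one, and the function names are hypothetical. The INFO, QUAL, and FILTER values in the sample are placeholders.

```python
def chrom_key(chrom):
    """Sort numeric chromosomes numerically, then others (X, Y, ...) alphabetically."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)

def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by data lines sorted by position."""
    header = [line for line in lines if line.startswith("#")]
    data = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        fields = line.split("\t")
        return chrom_key(fields[0]), int(fields[1])

    return header + sorted(data, key=key)

lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "22\t16390137\t.\tT\tA\t.\t.\t.",
    "22\t16387678\t.\tC\tG\t.\t.\t.",
    "22\t16387694\t.\tG\tA\t.\t.\t.",
]
sorted_lines = sort_vcf_lines(lines)
print([line.split("\t")[1] for line in sorted_lines[1:]])
```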

Download the list of ClinVar variants in the APOE gene from https://bravo.sph.umich.edu/freeze5/hg38/static/tools/tutorial/clinvar_20180401_APOE.vcf. To annotate this list with allele frequencies from Bravo, run the following command on Linux or macOS:

cat clinvar_20180401_APOE.vcf | ./bravo annotate > clinvar_20180401_APOE_Bravo.vcf

and on Windows:

type clinvar_20180401_APOE.vcf | bravo.exe annotate > clinvar_20180401_APOE_Bravo.vcf

The BRAVO_AC, BRAVO_AN, BRAVO_AF, and BRAVO_FILTER keys will be appended to the INFO field of every variant in your list that is also present in Bravo.
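
Once the file is annotated, the added keys can be pulled out of the INFO column. The sketch below assumes the keys are written in the usual VCF KEY=value form, separated by semicolons. The sample lines are made up for illustration (they are not real ClinVar or Bravo records), and bravo_annotations is a hypothetical helper.

```python
BRAVO_KEYS = ("BRAVO_AC", "BRAVO_AN", "BRAVO_AF", "BRAVO_FILTER")

def bravo_annotations(vcf_line):
    """Return the BRAVO_* INFO entries of one VCF data line as a dict of strings."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    found = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep and key in BRAVO_KEYS:
            found[key] = value
    return found

# Hypothetical lines: one variant found in Bravo, one not found.
annotated_line = ("19\t1000\t.\tT\tC\t.\t.\t"
                  "OTHER=1;BRAVO_AC=10;BRAVO_AN=125568;"
                  "BRAVO_AF=7.96381e-05;BRAVO_FILTER=PASS")
unannotated_line = "19\t2000\t.\tG\tA\t.\t.\tOTHER=1"

annotated = bravo_annotations(annotated_line)
print(annotated)
print(bravo_annotations(unannotated_line))
```

A variant that is absent from Bravo yields an empty dictionary, which makes it easy to separate annotated from unannotated records.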