Genebass walk-through

Summary

Genebass is a resource of exome-based association statistics, made available to the public. The dataset encompasses 4,529 phenotypes with gene-based and single-variant testing across 394,841 individuals with exome sequence data from the UK Biobank.

We hope scientists will use this resource to identify novel gene-phenotype associations towards understanding disease biology and drug discovery. All data here are released openly and publicly (see the terms of use). Many teams of scientists and engineers across Broad Institute, Pfizer, Biogen, and AbbVie worked together to build Genebass.

Genebass serves up 993,280,477 gene-level association statistics (across 19,407 genes, 4 annotation sets, and 3 burden tests) and 28,158,190,538 single-variant association statistics across 8,074,878 exome variants. Because there is so much data, we designed a novel user interface that lets users quickly visualize many phenotypes and genes simultaneously. Users can jump between genes, single variants, and phenotypes, and they can explore data by mutation class (predicted loss-of-function, missense, synonymous). Please see the detailed walk-through below to see what is possible.

For details about the analysis, methods, and quality control, please see our accompanying paper, which was recently published! The Broad Institute's news post provides a nice big picture summary.

Navigation & Layout

Overview

A core concept of the Genebass Browser is its split-screen design intended for rapidly inspecting and comparing many association results. The left hand side displays a re-sizable Results Pane, which shows all hits for a given phenotype, gene, or variant. The right hand side displays detailed association data for genes and variants with selected phenotype(s).

Results Pane Navigation

The Results Pane displays PheWAS plots, Manhattan plots, or phenotype information depending on which navigational button is selected in the top bar.

The Results Pane can be shrunk, expanded, or hidden entirely by clicking the Results Pane Width preset buttons. The central dotted line can also be dragged left or right. Organizing the page in this way is intended to help users quickly inspect many associations without losing a sense of context, and either half can be hidden to create more screen room for what the user is focusing on. Depending on the width of the Results Pane, certain controls and/or table columns may be hidden automatically.

The Status Bar

The Status Bar displays the currently selected gene, phenotype, variant, or burden annotation set. Keep this in mind as you cycle through the different Results Pane options, because the data displayed will depend on the current state shown in the Status Bar.

Exploring associations by gene

When the Gene PheWAS button is active, the Results Pane displays all phenotypes associated with a particular gene in PheWAS plot and tabular formats, along with a set of controls.

How links work

In general, clicking on names or IDs (a phenotype name like LDL direct, a gene name like PCSK9, or a variant ID like 8-19953276-C-A) will bring you to a page focused on that item's top hits. In contrast, clicking on arrow buttons in the table will keep you on the same page and update the right hand side to display detailed data for that table row. In other words, clicking on a noun shows the top results for that noun, while clicking on an arrow keeps the current view and loads the genotype-phenotype relationship for the row that was clicked.
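The variant ID shown above appears to follow a chromosome-position-reference-alternate layout separated by dashes. The following is a minimal sketch for splitting such an ID into its parts, assuming that layout holds; the `VariantId` type and `parse_variant_id` function are hypothetical names for illustration and are not part of Genebass.

```python
from typing import NamedTuple


class VariantId(NamedTuple):
    """Assumed fields of a dash-separated variant ID."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantId:
    """Split an ID such as '8-19953276-C-A' into its four assumed fields."""
    parts = variant_id.split("-")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 dash-separated fields, got {len(parts)}: {variant_id!r}"
        )
    chrom, pos, ref, alt = parts
    # int() raises ValueError if the position is not numeric.
    return VariantId(chrom, int(pos), ref, alt)


variant = parse_variant_id("8-19953276-C-A")
print(variant.chrom, variant.pos, variant.ref, variant.alt)
```

Keeping the chromosome as a string rather than an integer leaves room for non-numeric chromosome names such as X.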

Caching and performance

Stronger associations are cached for quicker browsing. Pages load quickly if the P-value of the phenotype-gene or phenotype-variant association is below the cache threshold (1e-4 for genes, 1e-6 for variants) and take a bit longer if it is above the threshold. Pay attention to rows that have green or yellow indicators, since these will load faster.
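As an illustration of the thresholds above, here is a minimal sketch of the cache rule. Only the two threshold values come from this page; the `CACHE_THRESHOLDS` dictionary and `is_cached` function are hypothetical names and do not reflect how the browser implements caching.

```python
# Cache thresholds quoted in the walk-through. The names used here are
# hypothetical and chosen only for illustration.
CACHE_THRESHOLDS = {"gene": 1e-4, "variant": 1e-6}


def is_cached(kind: str, p_value: float) -> bool:
    """Return True when an association P-value is below the cache threshold."""
    return p_value < CACHE_THRESHOLDS[kind]


# The same P-value can be cached for a gene but not for a variant.
print(is_cached("gene", 5e-5))
print(is_cached("variant", 5e-5))
```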

Phenotype controls

The phenotype control panel is for fine-tuning which set of phenotypes and test statistics to display. Users can choose which of the three burden tests (Burden, SKAT, SKAT-O) and which burden set (pLoF, missense, synonymous -- i.e. the annotations used in the burden analysis) is shown in the table and plot. Phenotypes can be filtered by keywords such as phenotype description or trait type (continuous, categorical, or ICD10). The results can also be filtered by P-value or Beta using minimum and/or maximum threshold controls. Note that for genes, the Beta statistic is always derived from the Burden test (SKAT and SKAT-O tests do not produce Beta statistics). The PheWAS plot is colored and grouped by UK Biobank showcase category; the Categories section can be used to traverse the showcase tree and filter the phenotype list to those belonging to specific categories. P-values can be plotted on either log or double log scales using Plot options. Users can set the plot to show P-value only, Beta only, or P-value and Beta simultaneously (default).
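The filters described above can be thought of as a chain of simple predicates over the phenotype list. The sketch below is one way to express that idea; the `PhenotypeResult` record, the `filter_phenotypes` function, and the sample rows are all hypothetical and use made-up values, not Genebass data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PhenotypeResult:
    """Hypothetical record for one row of the PheWAS table."""
    description: str
    trait_type: str  # "continuous", "categorical", or "icd10"
    p_value: float
    beta: float


def filter_phenotypes(
    results: Iterable[PhenotypeResult],
    keyword: str = "",
    trait_type: Optional[str] = None,
    max_p: Optional[float] = None,
    min_beta: Optional[float] = None,
    max_beta: Optional[float] = None,
) -> List[PhenotypeResult]:
    """Keep rows that pass every filter that has been set."""
    kept = []
    for row in results:
        if keyword.lower() not in row.description.lower():
            continue
        if trait_type is not None and row.trait_type != trait_type:
            continue
        if max_p is not None and row.p_value > max_p:
            continue
        if min_beta is not None and row.beta < min_beta:
            continue
        if max_beta is not None and row.beta > max_beta:
            continue
        kept.append(row)
    return kept


# Made-up rows for illustration only.
results = [
    PhenotypeResult("example lipid measurement", "continuous", 3e-9, -0.42),
    PhenotypeResult("example diagnosis code", "icd10", 2e-3, 0.10),
    PhenotypeResult("example questionnaire item", "categorical", 5e-7, 0.35),
]
hits = filter_phenotypes(results, max_p=1e-4, min_beta=0.0)
print([row.description for row in hits])
```

Leaving a filter at its default (`None` or an empty keyword) means it is not applied, which mirrors optional minimum and maximum controls.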

Gene Burden Table

The gene burden table summarizes burden statistics and quality control metrics across all mutational classes and tests. It is useful for comparing P-values across the tests and for determining whether the results should be treated with caution based on Lambda GC or other gene quality control metrics. See the accompanying paper for methodology and more details on interpreting these values.

Gene & single variant plot and table

The gene plot displays single variants (A) mapped to genomic coordinates along the gene exons (B). Variant -log10(P) values are shown on the Y axis (C). The plot transitions from a log scale to a double log scale two-thirds of the way up the plot height in order to prevent variants with extremely low P-values from dominating the plot, allowing users to focus on novel rare variant associations near the significance threshold. Variants are depicted as circles, with the circle radii log-scaled by allele frequency in the non-Finnish European population. By default, variants are colored by their most severe VEP consequence across transcripts. If the selected phenotype is categorical, two additional case/control variant tracks (D) display variant positions with radii log-scaled to allele frequencies in cases and controls, respectively. If the selected phenotype is continuous, variant radii will be log-scaled by allele frequency among individuals measured for the trait.
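To make the axis and radius scaling more concrete, here is a minimal sketch of one way such a plot could map values to screen positions. It assumes the lower two-thirds of the axis is linear in -log10(P) and the upper third is linear in log10 of that value. The switch point (`switch_at`), the axis cap (`max_neg_log10`), the frequency floor (`min_af`), and the radius range are all hypothetical values chosen for illustration; the browser's actual parameters are not stated on this page.

```python
import math


def plot_height_fraction(p_value, switch_at=10.0, max_neg_log10=300.0):
    """Map a P-value to a fraction (0 to 1) of the plot height.

    Below `switch_at` (in -log10 units) the axis is a plain log scale that
    fills the lower two-thirds; above it the axis is double log and fills
    the upper third. Both parameters are assumptions.
    """
    if p_value <= 0:
        neg_log10 = max_neg_log10  # treat underflowed P-values as the cap
    else:
        neg_log10 = min(-math.log10(p_value), max_neg_log10)
    if neg_log10 <= switch_at:
        return (2 / 3) * max(neg_log10, 0.0) / switch_at
    span = math.log10(max_neg_log10) - math.log10(switch_at)
    upper = (math.log10(neg_log10) - math.log10(switch_at)) / span
    return 2 / 3 + upper / 3


def radius_for_frequency(allele_frequency, min_af=1e-6,
                         min_radius=2.0, max_radius=10.0):
    """Log-scale a circle radius by allele frequency (assumed pixel range)."""
    af = min(max(allele_frequency, min_af), 1.0)
    fraction = (math.log10(af) - math.log10(min_af)) / (0.0 - math.log10(min_af))
    return min_radius + fraction * (max_radius - min_radius)


for p in (1e-5, 1e-10, 1e-50, 1e-300):
    print(p, round(plot_height_fraction(p), 3))
```

With these assumed parameters the two pieces meet at the same height at the switch point, so the axis stays continuous and increases steadily as P-values get smaller.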

Each variant is represented as a row in the table containing detailed summary statistics. Column headers have tooltips for learning what each column means, and the headers can be clicked to sort the table by that column.

Single variant plot controls

The single variant analysis control panel is used to configure how single-variant data are displayed. Variants can be filtered by identifier or annotation using the search box (A). Users can focus on particular parts of the allele frequency spectrum by dragging the allele frequency filter slider (B). Users can specify which columns to display using the column selection checkboxes (C), or they can choose one of the column group presets (D). Each preset selects a particular set of columns that make sense to compare side-by-side (e.g. allele counts, frequencies, population counts, and columns best suited for categorical or continuous trait types). This section also enables features related to viewing multiple phenotypes and GWAS catalog data, discussed below.

Exploring associations by phenotype

When the Gene Manhattan results pane is active, the Results Pane displays all gene associations with a selected phenotype. The results are displayed in Manhattan plot, QQ plot, and tabular formats. The three burden test types are displayed as columns, and the burden set (pLoF, missense|LC, or synonymous) can be selected with the “Burden set” segmented control. Clicking on a gene name will navigate to the gene PheWAS view, and clicking on the “details” arrow will update the right hand side without leaving the gene Manhattan view.

When the Variant Manhattan results pane is active, the left hand side results index takes a similar format to the gene Manhattan view but displays single variant association P-values instead of gene test statistics. Single variant results can be filtered by consequence category (pLoF, missense, synonymous, and other). Clicking a variant ID will navigate to that variant’s PheWAS view, and clicking the “details” arrow will keep the single variant Manhattan view active.

Viewing phenotype information

Click on the Phenotype info button to see more information about the currently selected phenotype, including category path, data collection descriptions, and links to the official UKBB showcase page. The phenotype table will generally show more columns when the Results Pane is set to wider formats. If the phenotype table is too narrow to show all desired columns, hover over the info icon to see the remaining fields.

Comparing single variant associations across phenotypes

Genebass has functionality for exploring many single-variant associations simultaneously on the gene page. This feature aims to help users gain insight into pleiotropic patterns of variation across all high-scoring traits for a gene. Each row in the PheWAS table has a checkbox (A) that, when checked, will overlay the phenotype in the gene plot. The “select top” button (B) will load all phenotypes below the 1e-4 P-value threshold; the “clear selected” button (B) will unselect all phenotypes and return to the single phenotype view. When selected, phenotypes are assigned randomly generated colors to make them easier to distinguish in the plot and table (C). Tens or even hundreds of phenotypes can be loaded simultaneously; however, an automatic P-value threshold will be applied when there are too many variants to display and the user will be warned in the single variant control panel.

Wide table format

By default, the variant table uses the “long” table format: when multiple phenotypes are selected, each variant-phenotype association appears as its own row, so the same variant is repeated once per phenotype. To make rows unique and to compare association statistics across traits in a single row, the table can be set to “wide” format (A). The phenotypes pivot to the columns, creating a sort of genotype-phenotype matrix (B). The column selection controls affect both the long and wide table formats. When examining many phenotypes at once, users can click the “filter to selected” button (C) in the phenotype section to simplify the PheWAS plot by only showing the selected phenotypes; this effectively serves as a legend for the coloring-by-trait functionality.
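The long-to-wide switch is essentially a pivot on the phenotype column. The sketch below illustrates the idea with plain dictionaries; the `to_wide` function, the field names, and the sample rows (placeholder variant IDs, trait names, and statistics) are hypothetical and do not come from Genebass.

```python
from collections import defaultdict

# Long format: one row per variant-phenotype association.
# All values below are made up for illustration.
long_rows = [
    {"variant_id": "1-1000-A-G", "phenotype": "trait_A", "p_value": 1e-12, "beta": 0.8},
    {"variant_id": "1-1000-A-G", "phenotype": "trait_B", "p_value": 3e-7, "beta": 0.5},
    {"variant_id": "1-2000-C-T", "phenotype": "trait_A", "p_value": 2e-5, "beta": -0.3},
]


def to_wide(rows, value_fields=("p_value", "beta")):
    """Pivot long rows into one entry per variant with per-phenotype columns."""
    wide = defaultdict(dict)
    for row in rows:
        for field in value_fields:
            column = f"{row['phenotype']}:{field}"
            wide[row["variant_id"]][column] = row[field]
    return dict(wide)


wide_table = to_wide(long_rows)
for variant_id, columns in wide_table.items():
    print(variant_id, columns)
```

A variant that lacks a result for some phenotype simply has no entry for that column, which corresponds to an empty cell in the genotype-phenotype matrix.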

Hover interactions are especially useful when comparing multiple phenotypes; hovering over variants or phenotypes with the mouse will emphasize the relevant variants and bring them to the foreground (D). The transparency slider sets the opacity level for non-hovered variants (E), helping the user tune the multi-phenotype plot such that hovered selections can stand out better.

In the above image, 10 LDLR associations are compared simultaneously and one splice donor of interest is hovered in the variant table to highlight the plot.

Visualizing case/control data

For categorical phenotypes, case/control counts are available in the phenotype table (A). It can also be useful to get a visual sense of how variant positions and allele frequencies differ across the gene between cases and controls. When multiple phenotypes are selected, the “show case/control tracks” checkbox (B) will fold out a series of case/control tracks for all traits currently selected (C). Setting the variant table column preset to "categorical" (D) will display the columns for case/control counts in the table (E). For continuous traits, a single track is displayed with the circles scaled by allele frequency in individuals measured. By comparing burden results across phenotypes, genes, and individual variants, it becomes possible to get a sense of which specific variants may be driving gene burden signals in the context of the overall power of multiple analyses.

When the per-phenotype tracks are expanded, it can be useful to use the “Color by” switch to look for trends across genomic coordinates by coloring variants by (A) consequence, (B) association P-value, (C) effect size, (D) trait, or (E) zygosity.

PheWAS results for individual variants

When a variant ID is clicked on the single variant Manhattan plot or on the gene page, the page will focus on the selected variant, and a PheWAS plot displays all associations with that variant. Similar to the gene PheWAS page, multiple phenotypes can be selected and loaded at once using the wide table format. In the variant view, the table rows show statistics for the selected variant, and the columns show values across selected phenotypes (A). The variant position is displayed along the genomic coordinate. Clicking the “unselect” button will return to the gene page (B). In this way, users can easily flip back and forth between single variants and the gene context.

GWAS catalog annotations

Variants in Genebass can be cross-referenced with hits published in the NHGRI-EBI GWAS Catalog. Use the GWAS Catalog segmented control (A) to highlight or filter to these variants in the table and plot (B). Click the "GWAS catalog" column checkbox to annotate these variants with a purple indicator in the variant table (C).

It is also possible to see the specific catalog entries and citations for these variants: click on a variant ID, select the "wide" table format, and scroll to the bottom.