The Gamma & Histogram

Towards dot-driven data exploration







Tomas Petricek

University of Kent
tomas@tomasp.net | @tomaspetricek

Motivation

Data journalism practice

Motivation

Open data analyses published by journalists

Spreadsheets          Programming
Easy to use           Expert skills
Simple problems       Internet-scale
Not reproducible      Reproducible

Design principles

Dot-driven development by example

Design principles

Dot-driven data exploration

Work with concrete, not abstract

Record interactions to enable learning

Make simple things easy

Choosing over constructing

DEMO
Comparing CO2 emissions of China and US

Ways of slicing data cubes

Select a year: plane with indicators for countries
Select indicator: line with value for each country
Select country: data point
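The three ways of slicing can be sketched in a few lines of Python. This is a minimal illustration, not The Gamma's implementation: it assumes a toy cube stored as a dictionary keyed by (year, indicator, country), and the functions `slice_year`, `slice_indicator` and `slice_country`, like the values, are hypothetical placeholders.

```python
# A toy 3D data cube: (year, indicator, country) -> value.
# The values are made-up placeholders, not real data.
cube = {
    (2000, "CO2", "China"): 1.0,
    (2000, "CO2", "US"): 2.0,
    (2000, "GDP", "China"): 3.0,
    (2000, "GDP", "US"): 4.0,
    (2010, "CO2", "China"): 5.0,
    (2010, "CO2", "US"): 6.0,
}

def slice_year(cube, year):
    """Fix the year: a 2D plane of (indicator, country) -> value."""
    return {(ind, c): v for (y, ind, c), v in cube.items() if y == year}

def slice_indicator(plane, indicator):
    """Fix the indicator: a 1D line of country -> value."""
    return {c: v for (ind, c), v in plane.items() if ind == indicator}

def slice_country(line, country):
    """Fix the country: a single data point."""
    return line[country]

plane = slice_year(cube, 2000)
line = slice_indicator(plane, "CO2")
point = slice_country(line, "China")
print(line)   # {'China': 1.0, 'US': 2.0}
print(point)  # 1.0
```

Each selection removes one dimension, which is what lets a dot-driven editor offer it as a single choice from a list.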

DEMO
Aggregating table of Olympic medal winners

Wrapper over an SQL-like query language

Offer meaningful options in a given context
Joins are an interesting open problem
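To illustrate offering only meaningful options in a given context, here is a minimal Python sketch of a query builder over a hypothetical medals table. The `Query` class, its `options` and `choose` methods, and the schema are assumptions for illustration, not the API of the SQL-like language used in the demo. Inside a grouping, it offers only aggregations over numeric columns, and `then` returns to the top level.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical schema of a medals table: column name -> type name.
SCHEMA = {"Athlete": "string", "Team": "string", "Gold": "int", "Silver": "int"}

@dataclass(frozen=True)
class Query:
    """A query under construction; choose() returns a new Query."""
    ops: tuple[str, ...] = ()
    grouping: str | None = None  # key column while inside 'group by'

    def options(self) -> list[str]:
        """Operations that are meaningful in the current context."""
        if self.grouping is None:
            # Top level: group or sort by any column.
            return ([f"group by {c}" for c in SCHEMA]
                    + [f"sort by {c}" for c in SCHEMA])
        # Inside a grouping: only numeric non-key columns can be summed.
        sums = [f"sum {c}" for c, t in SCHEMA.items()
                if t == "int" and c != self.grouping]
        return sums + ["count", "then"]

    def choose(self, option: str) -> Query:
        if option not in self.options():
            raise ValueError(f"{option!r} is not available here")
        if option.startswith("group by "):
            return Query(self.ops + (option,), option.removeprefix("group by "))
        if option == "then":  # leave the grouping scope
            return Query(self.ops + (option,), None)
        return Query(self.ops + (option,), self.grouping)

grouped = Query().choose("group by Athlete")
print(grouped.options())  # ['sum Gold', 'sum Silver', 'count', 'then']
done = grouped.choose("sum Gold").choose("then")
print(done.ops)  # ('group by Athlete', 'sum Gold', 'then')
```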

DEMO
Querying the Dr Who graph database

Querying graph databases

// Allies of node 1 and the episodes they appeared in
START a=node(1)
MATCH
  (a)-[:ALLY_OF]->(b)
     -[:APPEARED_IN]->(c)
RETURN b, c
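A dot-driven wrapper can build a query like the one above one step at a time. The sketch below assumes a hypothetical `Traversal` class where each call to `follow` corresponds to one dot; it only generates the Cypher text (keeping the legacy `START` form used above) and does not talk to a database.

```python
from __future__ import annotations

class Traversal:
    """Hypothetical dot-driven wrapper: each 'dot' follows one relationship."""

    def __init__(self, start_id: int, steps: tuple[str, ...] = ()):
        self.start_id = start_id
        self.steps = steps  # relationship types followed so far

    def follow(self, rel: str) -> Traversal:
        # A real provider would offer only relationship types that exist
        # in the database; this sketch accepts any name.
        return Traversal(self.start_id, self.steps + (rel,))

    def to_cypher(self) -> str:
        if not self.steps:
            raise ValueError("follow at least one relationship")
        # Name the nodes along the path a, b, c, ... (fine for up to 25 steps).
        names = [chr(ord("a") + i) for i in range(len(self.steps) + 1)]
        pattern = f"({names[0]})"
        for rel, name in zip(self.steps, names[1:]):
            pattern += f"-[:{rel}]->({name})"
        return (f"START {names[0]}=node({self.start_id})\n"
                f"MATCH {pattern}\n"
                f"RETURN {', '.join(names[1:])}")

q = Traversal(1).follow("ALLY_OF").follow("APPEARED_IN")
print(q.to_cypher())
```

With these two steps, `to_cypher()` returns the same query as on the slide, with the pattern on a single line.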

Dot-driven development

Experimenting with a design principle

Everything is done by choosing from a list

Does this make programming easier?

Very different query languages underneath

Does knowledge of one transfer to another?

Next steps

How to get closer to spreadsheets

Next steps

What can we learn from spreadsheets?

Interaction with the system matters

Programming as interacting with data

Direct manipulation for program construction

How to publish ideas on interaction?

DEMO
Histogram
(Live 2019)

Dirty tricks

And crimes committed along the way

Type providers to hide fancy types

Reinventing scope with 'then'

I also need some lambdas sometimes!

Generating builder patterns

Linearization

Write everything as a single linear chain

Except that 'then' is reinventing parentheses

But it is a second-class construct...
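One way to read 'then' is as a closing parenthesis in a flat chain. The Python sketch below assumes a chain given as a list of member names (loosely modeled on the Olympic medals demo, and hypothetical) and recovers the nested structure, with `then` closing the scope of the current operation.

```python
from __future__ import annotations

def nest(chain: list[str]) -> list[tuple[str, list[str]]]:
    """Turn a flat dot-chain into (operation, nested members) pairs.

    Members after an operation belong to its scope until 'then'
    closes it, so 'then' plays the role of a closing parenthesis.
    """
    result: list[tuple[str, list[str]]] = []
    current: tuple[str, list[str]] | None = None
    for member in chain:
        if member == "then":
            current = None            # close the open scope
        elif current is None:
            current = (member, [])    # open a new operation scope
            result.append(current)
        else:
            current[1].append(member)  # member nested in the open scope
    return result

chain = ["group data", "by Athlete", "sum Gold", "then",
         "sort data", "by Gold descending", "then", "take 10"]
print(nest(chain))
# [('group data', ['by Athlete', 'sum Gold']), ('sort data', ['by Gold descending']), ('take 10', [])]
```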

Type providers

Encoding complex logic via member access

Type providers for member generation

Laziness for scaling to large hierarchies

Fancy types for the masses!
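Laziness can be sketched with a type whose members are only generated on first access. In this hypothetical Python example, the provider function `binary_tree` describes an infinite hierarchy, yet only the nodes actually reached through member access are generated; `LazyType` and `member` are illustrative names, not a real type provider API.

```python
from __future__ import annotations
from typing import Callable

class LazyType:
    """A provided type whose members are generated on first access."""

    def __init__(self, path: str, generate: Callable[[str], list[str]]):
        self.path = path
        self._generate = generate
        self._members: dict[str, LazyType] | None = None

    def member(self, name: str) -> LazyType:
        if self._members is None:
            # Ask the provider for member names only once, then cache them.
            self._members = {
                m: LazyType(f"{self.path}.{m}", self._generate)
                for m in self._generate(self.path)
            }
        return self._members[name]

calls: list[str] = []

def binary_tree(path: str) -> list[str]:
    """Hypothetical provider: every node has two children, forever."""
    calls.append(path)
    return ["left", "right"]

root = LazyType("root", binary_tree)
node = root.member("left").member("right")
print(node.path)  # root.left.right
print(calls)      # ['root', 'root.left']
```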

Row types to track schema

\[\definecolor{cc}{RGB}{204,82,34} \definecolor{mc}{RGB}{0,0,153} \frac {\Gamma \vdash e : {\color{cc}[f_1:\tau_1, \ldots, f_n:\tau_n]}} {\Gamma \vdash e.\text{drop}~f_i : {\color{cc} [f_1:\tau_1, \ldots, f_{i-1}:\tau_{i-1}, f_{i+1}:\tau_{i+1}, \ldots, f_n:\tau_n]}}\]
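As a concrete instance, take a hypothetical medals table with \(n = 3\) fields and drop the last one (\(i = 3\)):

\[\frac {\Gamma \vdash e : [\text{Athlete}:\text{string}, \text{Gold}:\text{int}, \text{Year}:\text{int}]} {\Gamma \vdash e.\text{drop}~\text{Year} : [\text{Athlete}:\text{string}, \text{Gold}:\text{int}]}\]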

Embed row types in nominal types

\[\frac {\Gamma \vdash e : {\color{mc} C_1}} {\Gamma \vdash e.\text{drop}~f_i : {\color{mc} C_2}} \quad{\small \text{where}}\]

\[\begin{array}{l} \\[-0.5em] {fields({\color{mc} C_1}) = {\color{mc} \{f_1:\tau_1, \ldots, f_n:\tau_n\}}}\\ {fields({\color{mc} C_2}) = {\color{mc} \{f_1:\tau_1, \ldots, f_{i-1}:\tau_{i-1}, f_{i+1}:\tau_{i+1}, \ldots, f_n:\tau_n\}}} \end{array}\]

Type providers

Fancy types for the masses!

Row types and phantom types

Session types for communication

Add your own fancy type here!

Structure of a type provider

Context \(L\) maps names to definitions and nested contexts

\(L(C) = {\color{mc}\text{type}}~C(x:\tau) = \overline{m}, L'\)

The pivot provider takes a schema \(F\) and provides a class \(C\) with context \(L\)

\(\text{pivot}(F) = C, L\)

Pivot type provider

Generate classes that drop individual columns
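The pivot provider can be sketched in Python under several simplifying assumptions: a schema is just a tuple of field names, the context \(L\) is flattened into a single dictionary from class names to definitions, and all classes are generated eagerly (a real provider would generate them lazily). `pivot`, `ClassDef` and the `C_...` naming scheme are hypothetical. Each `drop` member returns a class whose fields are those of the current class minus one, mirroring \(fields(C_2)\) above.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ClassDef:
    """Simplified 'type C(x: tau) = members': each member maps to a class name."""
    name: str
    fields: tuple[str, ...]
    members: dict[str, str]

def pivot(fields: tuple[str, ...]) -> tuple[str, dict[str, ClassDef]]:
    """Return a class name C and a (flattened) context L of class definitions.

    Every class has one 'drop <f>' member per field, returning the class
    for the schema without f.
    """
    context: dict[str, ClassDef] = {}

    def provide(fs: tuple[str, ...]) -> str:
        name = "C_" + "_".join(fs) if fs else "C_empty"
        if name not in context:
            members = {f"drop {f}": provide(tuple(g for g in fs if g != f))
                       for f in fs}
            context[name] = ClassDef(name, fs, members)
        return name

    return provide(fields), context

c, L = pivot(("Athlete", "Gold", "Year"))
print(c)                          # C_Athlete_Gold_Year
print(L[c].members["drop Year"])  # C_Athlete_Gold
```

Eager generation produces \(2^n\) classes for \(n\) fields, which is why laziness matters for larger schemas.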

Summary

Investigating programming

Conclusions

Towards dot-driven data exploration

  • Exploring an interesting design principle
  • Using a wide range of case studies
  • Can prove and evaluate a few things
  • Am I making falsifiable claims?

TheGamma: thegamma.net
Demos: turing.thegamma.net
Histogram: tomasp.net/histogram

Tomas Petricek, tomas@tomasp.net