The Gamma: Data exploration through dot-driven development






Tomas Petricek

The Alan Turing Institute, London
http://tomasp.net | tomas@tomasp.net | @tomaspetricek

Spreadsheets

Error-prone

Not reproducible

Easy to use

Programming

Fully reproducible

Can be analyzed

Even Python is hard

Rio medalists in Python

Athletes by number of gold medals from Rio 2016

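import pandas as pd

olympics = pd.read_csv("olympics.csv")
# Keep Rio 2016 rows, sum gold medals per athlete, show the top 8.
# The parentheses let the method chain span several lines.
(olympics[olympics["Games"] == "Rio (2016)"]
  .groupby("Athlete")
  .agg({"Gold": "sum"})
  .sort_values(by="Gold", ascending=False)
  .head(8))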

Rio medalists in Python

Language and data source features you need to know

Python dictionaries: {"key": value}

Generalised indexers: [ condition ]

Operation names: sort_values

Data column names: "Athlete"
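To make the four items above concrete, here is a minimal sketch that splits the Rio query into one step per feature. The rows are made-up placeholders standing in for olympics.csv (an assumption, not real results), and the intermediate names rio, golds and top are hypothetical.

```python
import pandas as pd

# Placeholder rows standing in for olympics.csv (made up, not real results).
olympics = pd.DataFrame({
    "Games":   ["Rio (2016)", "Rio (2016)", "Rio (2016)", "London (2012)"],
    "Athlete": ["Athlete A", "Athlete B", "Athlete A", "Athlete A"],
    "Gold":    [1, 2, 3, 5],
})

# Generalised indexer: a boolean Series inside [...] selects matching rows.
rio = olympics[olympics["Games"] == "Rio (2016)"]

# Data column names are plain strings; a dictionary maps column -> aggregation.
golds = rio.groupby("Athlete").agg({"Gold": "sum"})

# Operation names and keyword arguments have to be remembered exactly.
top = golds.sort_values(by="Gold", ascending=False).head(8)
print(top)
```

None of the strings or keyword names is checked before the code runs, so a typo in "Athlete" or sort_values only shows up as a runtime error.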

DEMO
Dot-driven data exploration

Tooling complexity

From: Transcript: End-user programming of social apps
Jonathan Edwards, YOW 2015

Dot-driven development

Encoding complex logic via simple member access

Type providers for member generation

Laziness for scaling to large hierarchies

Fancy types for the masses!
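The combination of member access and laziness can be sketched in a few lines of Python. This is only an illustration under assumptions, not how The Gamma or F# type providers are implemented: the Node class, the provide function and the left/right members are all hypothetical. Members of a node are generated the first time something is accessed on it, so even an infinite hierarchy costs only as much as the path actually visited.

```python
class Node:
    """One step in a dot-chain; its members are generated only when first needed."""

    def __init__(self, path, provide):
        self._path = path          # tuple of member names chosen so far
        self._provide = provide    # function: path -> list of member names
        self._members = None       # filled in lazily

    def members(self):
        if self._members is None:
            self._members = self._provide(self._path)
        return self._members

    def __dir__(self):
        # Lets completion tools that rely on dir() offer the generated members.
        return self.members()

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_") or name not in self.members():
            raise AttributeError(name)
        return Node(self._path + (name,), self._provide)


generated = []  # records which nodes actually had their members generated


def provide(path):
    generated.append(path)
    # Hypothetical infinite hierarchy: every node offers the same two members.
    return ["left", "right"]


root = Node((), provide)
node = root.left.right.left
print(node._path, len(generated))
```

Only the three nodes that were dotted into had their members generated; the members of the final node are not computed until something is accessed on it.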

Pivot type provider


Context \(L\) maps names to definitions and nested contexts

\(\definecolor{mc}{RGB}{0,32,172} L(C) = {\color{mc}\text{type}}~C(x:\tau) = \overline{m}, L'\)

Pivot provider takes schema and provides a class with context

\(\text{pivot}(F) = C, L\)

Pivot type provider

Generate class that constructs a relational algebra query
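As a rough illustration of a generated class that constructs a query, the sketch below derives members from a list of column names and collects relational operations as the chain grows. It is a simplified stand-in, not the pivot provider itself: the Query class, the drop_ and sort_by_ member names, the two supported operations, and the sample rows are all assumptions made for the example.

```python
class Query:
    """Immutable query; each member access returns a new, extended query."""

    def __init__(self, columns, ops=()):
        self._columns = tuple(columns)  # columns still available (the schema)
        self._ops = tuple(ops)          # relational operations collected so far

    def __dir__(self):
        # Members offered at this point depend on the remaining columns.
        return [p + c for c in self._columns for p in ("drop_", "sort_by_")]

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        for col in self._columns:
            if name == "drop_" + col:
                rest = tuple(c for c in self._columns if c != col)
                return Query(rest, self._ops + (("project", rest),))
            if name == "sort_by_" + col:
                return Query(self._columns, self._ops + (("sort", col),))
        raise AttributeError(name)

    def run(self, rows):
        # Interpret the collected operations over a list of dictionaries.
        for op, arg in self._ops:
            if op == "project":
                rows = [{c: r[c] for c in arg} for r in rows]
            else:  # "sort", descending
                rows = sorted(rows, key=lambda r: r[arg], reverse=True)
        return rows


# Made-up sample rows, for illustration only.
rows = [
    {"Athlete": "Athlete A", "Games": "Rio (2016)", "Gold": 1},
    {"Athlete": "Athlete B", "Games": "Rio (2016)", "Gold": 3},
]
query = Query(["Athlete", "Games", "Gold"]).drop_Games.sort_by_Gold
result = query.run(rows)
print(result)
```

Once Games has been dropped, drop_Games is no longer a member of the resulting query, which mirrors how the provided members change with the schema at each step.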

DEMO
Fancy types for the masses

Row types and phantom types


Row types to track names and types of fields

\[\definecolor{cc}{RGB}{172,0,32} \frac {\Gamma \vdash e : {\color{cc}[f_1:\tau_1, \ldots, f_n:\tau_n]}} {\Gamma \vdash e.\text{drop}~f_i : {\color{cc} [f_1:\tau_1, \ldots, f_{i-1}:\tau_{i-1}, f_{i+1}:\tau_{i+1}, \ldots, f_n:\tau_n]}}\]

Embed row types in provided nominal types

\[\frac {\Gamma \vdash e : {\color{mc} C_1}} {\Gamma \vdash e.\text{drop}~f_i : {\color{mc} C_2}} \quad{\small \text{where}}\]

\[\text{fields}({\color{mc} C_2}) = {\color{mc} \{f_1:\tau_1, \ldots, f_{i-1}:\tau_{i-1}, f_{i+1}:\tau_{i+1}, \ldots, f_n:\tau_n\}}\]
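The two rules above can be mimicked in Python as a sanity check. This is a sketch under assumptions: rows are modelled as dictionaries from field names to type names, field order is ignored, and the Provider class with its C1, C2, ... naming scheme is hypothetical. The drop function plays the role of the row-type rule, while Provider.drop plays the role of the nominal rule, with the fields table standing in for fields(C).

```python
def drop(row, field):
    """Row-type rule: [f_1:t_1, ..., f_n:t_n] without the field f_i."""
    if field not in row:
        raise KeyError(field)
    return {f: t for f, t in row.items() if f != field}


class Provider:
    """Gives each distinct row type a nominal class name (hypothetical scheme)."""

    def __init__(self):
        self._names = {}   # canonical row -> class name
        self.fields = {}   # class name -> row, playing the role of fields(C)

    def class_for(self, row):
        key = tuple(sorted(row.items()))  # field order is ignored here
        if key not in self._names:
            name = f"C{len(self._names) + 1}"
            self._names[key] = name
            self.fields[name] = dict(row)
        return self._names[key]

    def drop(self, cls, field):
        # e : C1  gives  e.drop f_i : C2, where fields(C2) is fields(C1) minus f_i
        return self.class_for(drop(self.fields[cls], field))


provider = Provider()
c1 = provider.class_for({"Athlete": "string", "Games": "string", "Gold": "int"})
c2 = provider.drop(c1, "Games")
print(c1, c2, provider.fields[c2])
```

Because classes are keyed by their set of fields, dropping Games then Gold yields the same class as dropping Gold then Games.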

Fancy types for the masses!

Powerful idea that works in other contexts

Row types and phantom types

Session types for communication

Add your own fancy type here!

DEMO
Unifying programming and spreadsheets

Thank you!

Data exploration through dot-driven development

Data science: Bridging spreadsheets and programming

Dot-driven: Can express more than you'd think!

Fancy types: Encoding row types via type providers



Tomas Petricek
Questions and suggestions: tomas@tomasp.net | @tomaspetricek
Implementation and paper: thegamma.net | tomasp.net/academic