The Alan Turing Institute, London
http://tomasp.net |
tomas@tomasp.net |
@tomaspetricek
The gap between spreadsheets and programming
Making programming languages a bit easier
Learning from spreadsheet interaction model
Unsafe dynamic access in a typed language
1: 2: 3: 4: 5: 6: 7: |
|
Unsafe dynamic access in a typed language
1: 2: 3: 4: 5: 6: 7: |
|
Accessing data from external data sources
Languages do not understand data
There is rarely explicit schema
Manually define types to caputre it
Easier in dynamic languages
Athletes by number of gold medals from Rio 2016
1: 2: 3: 4: 5: 6: |
|
Language and data source features you need to know
Python dictionaries {"key": value}
Generalised indexers .[ condition ]
Operation names sort_values
Data column names "Athlete"
\(\emptyset \vdash e : \tau\)
\(\pi(~~~~~~~) \vdash e : \tau\)
Type providers for structured data
Structural shape inference
Language integration via type providers
Relative type safety
{title : string, author : {age : int}} {author : {age : float}}
{ title : option<string>, author : {age : float} }
{ coordinates : {lng:num, lat:num} } string
{ coordinates : {lng:num, lat:num} } + string
Pragmatic design choices for usability
Prefers records for tooling
Predictable and stable
Open world assumption about sums
Encoding complex logic via simple member access
Type providers for member generation
Laziness for scaling to large hierarchies
Fancy types for the masses!
Row types to track names and types of fields
\[\definecolor{cc}{RGB}{204,82,34} \definecolor{mc}{RGB}{0,0,153} \frac {\Gamma \vdash e : {\color{cc}[f_1:\tau_1, \ldots, f_n:\tau_n]}} {\Gamma \vdash e.\text{drop}~f_i : {\color{cc} [f_1:\tau_1, \ldots, f_{i-1}:\tau_{i-1}, f_{i+1}:\tau_{i+1}, \ldots, f_n:\tau_n]}}\]
Embed row types in provided nominal types
\[\frac {\Gamma \vdash e : {\color{mc} C_1}} {\Gamma \vdash e.\text{drop}~f_i : {\color{mc} C_2}} \quad{\small \text{where}}\]
\[\begin{array}{l} \\[-0.5em] {fields({\color{mc} C_1}) = {\color{mc} \{f_1:\tau_1, \ldots, f_n:\tau_n\}}}\\ {fields({\color{mc} C_2}) = {\color{mc} \{f_1:\tau_1, \ldots, f_{i-1}:\tau_{i-1}, f_{i+1}:\tau_{i+1}, \ldots, f_n:\tau_n\}}} \end{array}\]
Powerful idea that works in other contexts
Row types and phantom types
Session types for communication
Add your own fancy type here!
Context \(L\) maps names to definitions and nested contexts
\(L(C) = {\color{mc}\text{type}}~C(x:\tau) = \overline{m}, L'\)
Pivot provider takes schema and provides a class with context
\(\text{pivot}(F) = C, L\)
Generate classes that drop individual columns
Generate class corresponding to a record shape
Well typed programs do not go wrong.
(As long as the world is well-behaved.)
Given representative samples and an input value
\(S(d)\sqsubset S(d_1, \ldots, d_n)\)
Any program written using a type provider reduces
\(e_{user}[x\leftarrow {\color{mc}\text{new}}~C(d)] \rightsquigarrow^* v\)
Provided type can change only in limited ways
\(C[e] \rightarrow C[e.M]\)
\(C[e] \rightarrow C[{\sf match}~e~{\sf with}~\ldots]\)
\(C[e] \rightarrow C[int(e)]\)
Making programming with data easier
Learning from spreadsheets
Understanding programmer interactions
Handling joins and data cleaning
Read, analyse and visualize!
Dot-driven Towards minimal calculus of interactions
Fancy types Encoding row types via type providers
Relative safety Necessity when working with data
Tomas Petricek
tomas@tomasp.net | @tomaspetricek | tomasp.net/academic
thegamma.net | fslab.org | gamma.turing.ac.uk
Don Syme, Keith Battocchi, Kenji Takeda, Donna Malayeri and Tomas Petricek. Themes in Information-Rich Functional Programming for Internet-Scale Data Sources. In proceedings of DDFP 2013
Tomas Petricek, Gustavo Guerra and Don Syme. Types from data: Making structured data first-class citizens in F#. PLDI 2016
Tomas Petricek. Data exploration through dot-driven development. In proceedings of ECOOP 2017