Tomas Petricek
Department of Distributed and Dependable Systems
petricek@d3s.mff.cuni.cz
https://d3s.mff.cuni.cz/~petricek
@tomasp.net




Restoring trust in data in society?
Understandability
Correctness
Reproducibility
Source code should give us all of these!





Invalid columns?
Language does not understand the data


Invalid columns?
Language does not understand the data
Complex abstractions
Designed for experts


Invalid columns?
Language does not understand the data
Complex abstractions
Designed for experts
Inconsistencies
Are we sure it is right?


Correctness meaning all
recommendations valid
Completeness meaning
all valid options listed

Dot-driven development ECOOP '17 (A)
Relative type safety PLDI '16 (A*)
Iterative prompting VL/HCC '22
Composable library design J. Funct. Program '21

Minimal formal language model
\(e\,:=\,x\,|\,v\,|\,e.N\)
Types in program are used correctly
\(~~~~~~~\Gamma \vdash e : \tau\)
Formal model of how programs run
\(~~~~~~~e\mapsto v\)




For any \(e\) such that \(\emptyset \vdash e : \tau\), the program evaluates to a value \(e \mapsto v\) and the result has the right type \(v \in \tau\)
It contains things from the real world! i.e. \(\Gamma_0 \neq \emptyset\)
For some data \(d\) and type provider \(\pi\), let \(\Gamma_0 = \pi(d)\)

For any \(e\) such that \(\emptyset \vdash e : \tau\), the program evaluates to a value \(e \mapsto v\) and the result has the right type \(v \in \tau\)
Given a data sample \(d\), type provider \(\pi\) and an actual input
\(d'\) such that
\(\pi(d') \sqsubseteq \pi(d)\) then for any program \(e\) such that \(x:\pi(d) \vdash e : \tau\),
it evaluates \(e[d'/x]\mapsto v\) and \(v \in \tau\).







Two Core A* conferences (POPL, PLDI) and
one D1 (98 percentile) journal (IEEE TKDE)


Make the IPCC
reports more understandable, transparent and reproducible?


PRIMUS (2024-2027)


Tomas Petricek
Department of Distributed and Dependable Systems
petricek@d3s.mff.cuni.cz
https://d3s.mff.cuni.cz/~petricek
@tomasp.net

v1




Spreadsheets, data science notebooks, business intelligence
I propose to view systems & tools for data exploration as programming tools


Major gap in tooling
for data science!
A small number of composable primitives
Non-programmers can complete more tasks



Makes accessing and exploring data easier for programmers


\(\emptyset \vdash e : \tau\)

\(\pi(~~~~~~) \vdash e : \tau\)

For any \(e\) such that \(\emptyset \vdash e : \tau\), the program evaluates to a value \(e \mapsto v\) and the result has the right type \(v \in \tau\)
Given sample inputs \(d_1, \ldots, d_n\) and an actual input \(d\) such that \(S(d) \sqsubseteq S(d1, \ldots, d_n)\) then for any program \(e\) such that \(x:S(d_1, \ldots, d_n) \vdash e : \tau\), it evaluates \(e[d/x]\mapsto v\) and \(v \in \tau\).



Lets non-programmers complete simple data programming tasks





Core A* (POPL, PLDI) and Core A (ECOOP)
conferences and journals (JFP, IEEE TKDE)



Programming systems
with Joel Jakubovic
User experience of proof assistants with Jan Verter
Logics for visual programming with Pablo Donato
PRIMUS (2024-2027) and collaboration
with prof. Jan Vitek and FIT CTU




Tomas Petricek
Department of Distributed and Dependable Systems
petricek@d3s.mff.cuni.cz
https://d3s.mff.cuni.cz/~petricek
@tomasp.net
