CO559: Software Engineering
Living with errors

Tomas Petricek

email: t.petricek@kent.ac.uk
twitter: @tomaspetricek
office: S129A

Trivial problem or a crisis?

The birth of a crisis

Programming is trivial

Charles Babbage (19th century)
ENIAC designers (1940s)

The birth of a crisis

Labour crisis (1950s)
NATO conference (1968)
Application crisis (1990s)

Software industry paradox (Ensmenger, 2010)

How can we explain the [seemingly perpetual] crisis
in what is considered one of the most successful and
profitable industries of all time?

How did software get reliable?

Combination of good engineering practices

Management
Testing
Debugging
Over-engineering

Learning to live with errors

Over time, software engineers developed a range of practical approaches for coping with errors.

Living with errors (1/2)

Depends on application domain

What happens if it goes wrong?
Is a human around to help?
How quickly do we need the system?

Business computing

Failure costs some money, but not lives
Balance between correctness and delivery time
Methods need to support teamwork

Living with errors (2/2)

Telecommunications

Hardware errors will happen
System should recover if possible
Redundancy can be built-in

Computer art

No lives or money are threatened
Error as a source of creativity
A human may find ways to incorporate errors

Different kinds of errors

How did it happen?

Mistake - following a wrong plan
This logic is not actually going to sort a list

Failure - error in translation from plan to reality
My algorithm is correct, but I implemented it poorly

Slip - mechanical error when implementing code
Syntax error or off-by-one error in array access

Malfunction - hardware failures
Wrong data received in network communication

When do we find out?

Static error - when developing the software

Immediate runtime error
right when it occurs

Latent runtime error
sometime after it occurs
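A small illustration of how a slip can turn into a latent runtime error. The sketch below uses hypothetical function names. In JavaScript, reading past the end of an array returns `undefined` instead of throwing, so an off-by-one access does not fail where the slip is. The error only shows up later, when the bad value is used.

```javascript
// Hypothetical helper containing a slip: the last index is length - 1, not length.
function lastItemSlip(items) {
  return items[items.length]; // off by one: always undefined, but no error here
}

// Correct version for comparison.
function lastItem(items) {
  return items[items.length - 1];
}

const names = ["ada", "grace", "barbara"];

const wrong = lastItemSlip(names); // undefined, silently
const right = lastItem(names);     // "barbara"

// The error only becomes visible later, when the value is used.
let laterError = null;
try {
  wrong.toUpperCase(); // TypeError: cannot read properties of undefined
} catch (e) {
  laterError = e;
}

console.log(right.toUpperCase());             // BARBARA
console.log(laterError instanceof TypeError); // true
```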

What can go wrong when executing a sub-routine?

Output error - no way of producing a valid result
e.g. the file does not exist

Input error - operation cannot be done because the inputs are wrong!

Runtime errors, technically

Input error - no character at index 99

`"hello world".charAt(99)`

Input error - null is never a valid replacement value

`"hello world".replace("world", null)`

Output error - there is no valid result if the file does not exist

`fs.readFileSync("c:/temp/non-existent.txt")`
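It is worth checking how the first two calls behave in JavaScript specifically. `charAt(99)` returns an empty string, and `replace("world", null)` converts `null` to the text `"null"`. So neither input error is reported immediately. Both become latent errors. In Node.js, reading a missing file with `fs.readFileSync` does throw immediately, with the error's `code` set to `"ENOENT"`. Below is a minimal sketch of turning a silent input error into an immediate one. `charAtChecked` and `readErrorCode` are hypothetical helpers.

```javascript
const fs = require("node:fs");

// Built-in behaviour: no exception for these bad inputs.
console.log(JSON.stringify("hello world".charAt(99))); // ""
console.log("hello world".replace("world", null));     // hello null

// Hypothetical checked wrapper: report the input error immediately.
function charAtChecked(text, index) {
  if (!Number.isInteger(index) || index < 0 || index >= text.length) {
    throw new RangeError(`No character at index ${index} in a string of length ${text.length}`);
  }
  return text.charAt(index);
}

// Hypothetical helper: the inputs are fine, but there may be no valid result (output error).
function readErrorCode(path) {
  try {
    fs.readFileSync(path, "utf8");
    return null; // file was read successfully
  } catch (e) {
    return e.code; // "ENOENT" when the file does not exist
  }
}

console.log(charAtChecked("hello world", 4)); // o
console.log(readErrorCode("c:/temp/non-existent.txt"));
```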

Engineering approach

Engineering approach to errors

Errors are inevitable

Programmers are just humans
Cannot possibly check everything
Time and budget are limited

Good practices

Do not assume errors will not happen
Make it easy to find out what's wrong
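One way to make it easy to find out what is wrong is to report errors together with their context. The sketch below assumes Node.js 16.9 or later, where `Error` accepts a `cause` option. `loadSettings` and the file name are hypothetical. The low-level error is kept as the cause, and the new message says what the program was trying to do.

```javascript
const fs = require("node:fs");

// Hypothetical function: reads and parses a JSON settings file.
function loadSettings(path) {
  let text;
  try {
    text = fs.readFileSync(path, "utf8");
  } catch (e) {
    // Say what we were trying to do, and keep the original error as the cause.
    throw new Error(`Could not read settings file '${path}'`, { cause: e });
  }
  try {
    return JSON.parse(text);
  } catch (e) {
    throw new Error(`Settings file '${path}' is not valid JSON`, { cause: e });
  }
}

let reported = null;
try {
  loadSettings("missing-settings.json");
} catch (e) {
  reported = e;
}

console.log(reported.message);    // Could not read settings file 'missing-settings.json'
console.log(reported.cause.code); // ENOENT (assuming the file really is missing)
```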

Indicating erroneous situations

Two ways of indicating errors

Error code - return null or -1
Exception - throw an exception (e.g. in Java or JavaScript)

Why do modern systems prefer exceptions?

They cannot be confused with a valid result
They propagate automatically
Their type and message explain the reason for the error
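To make the contrast concrete, here is a sketch of the same lookup written both ways. The function and class names are hypothetical. The error-code version returns `-1`, and a caller can silently use that as if it were a real index. The exception version cannot be mistaken for a result. It propagates until someone handles it, and its type names the problem.

```javascript
// Error code: -1 means "not found", but it is still a number the caller can misuse.
function indexOfUserCode(users, name) {
  for (let i = 0; i < users.length; i++) {
    if (users[i] === name) return i;
  }
  return -1;
}

// Exception: a missing user cannot be confused with a valid index.
class UserNotFoundError extends Error {
  constructor(name) {
    super(`User '${name}' not found`);
    this.name = "UserNotFoundError";
  }
}

function indexOfUser(users, name) {
  const i = users.indexOf(name);
  if (i === -1) throw new UserNotFoundError(name);
  return i;
}

const users = ["ada", "grace"];

// Forgetting to check the error code gives a wrong answer, not an error:
// users[-1] is simply undefined.
console.log(users[indexOfUserCode(users, "alan")]); // undefined

// The exception propagates until someone handles it, and its name says why.
try {
  console.log(users[indexOfUser(users, "alan")]);
} catch (e) {
  console.log(e.name, "-", e.message); // UserNotFoundError - User 'alan' not found
}
```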

Three ways of handling bad inputs (1/3)

Ignore bad inputs - assume that input is valid


function niceName(name) {
  return name[0].toUpperCase() + name.slice(1);
}

When will this function not work?

The function will fail if name is null
The function will fail if name is an empty string
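Running the first version on these inputs in Node.js shows what "fail" means here. The error is a generic `TypeError` raised inside the helper, and it says nothing about what the caller did wrong. The exact message text varies between JavaScript engines, so the sketch only reports the error type.

```javascript
// First version: assumes the input is a non-empty string.
function niceName(name) {
  return name[0].toUpperCase() + name.slice(1);
}

console.log(niceName("tomas")); // Tomas

for (const bad of [null, ""]) {
  try {
    niceName(bad);
  } catch (e) {
    // null[0] throws; ""[0] is undefined, so calling toUpperCase on it throws.
    console.log(JSON.stringify(bad), "->", e.constructor.name); // ... -> TypeError
  }
}
```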

Three ways of handling bad inputs (2/3)

Recover from bad inputs - try to do the best you can


function niceName(name) {
  if (!name || name.length === 0) return "";
  else return name[0].toUpperCase() + name.slice(1);
}

What happens with bad inputs?

You get an empty name as the result
Is this what you want? It depends...

Three ways of handling bad inputs (3/3)

Fail when input is bad - terminate execution immediately


function niceName(name) {
  if (name === null || name === undefined) throw new TypeError("Name cannot be null");
  if (name.length === 0) throw new TypeError("Name cannot be empty");
  return name[0].toUpperCase() + name.slice(1);
}

Terminating execution immediately

Most languages support throwing exceptions
The exception indicates what was wrong with the input
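Here is a short sketch of how a caller deals with the fail-fast version. It assumes the function throws `Error` objects (here `TypeError`) rather than plain strings. That is the more common JavaScript idiom, because error objects carry a name, a message and a stack trace. `greet` is a hypothetical caller that chooses its own fallback.

```javascript
// Fail-fast version: stop immediately and say what was wrong with the input.
function niceName(name) {
  if (name === null || name === undefined) throw new TypeError("Name cannot be null");
  if (name.length === 0) throw new TypeError("Name cannot be empty");
  return name[0].toUpperCase() + name.slice(1);
}

// Hypothetical caller: it decides what to do, here reporting the problem and using a fallback.
function greet(name) {
  try {
    return `Hello, ${niceName(name)}!`;
  } catch (e) {
    console.error(`Cannot greet: ${e.message}`);
    return "Hello!";
  }
}

console.log(greet("tomas")); // Hello, Tomas!
console.log(greet(""));      // Hello!  (after printing "Cannot greet: Name cannot be empty")
```

The function itself does not guess a default. That decision is left to the caller, who knows whether an imperfect result is acceptable.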

Three ways of handling bad inputs

Ignore bad inputs
Simple code, but unexpected things may happen
Good enough for local helper functions

Recover from bad inputs
Default behaviour may cause a serious error later
Good if imperfect result is better than no result

Fail when input is bad
Stops execution, but tells you what to fix
Good for public functions called by others

Defensive programming

"Better safe than sorry" programming

Assume that your code might not be called correctly

Protect your code by throwing for bad inputs

Use extra security measures in low-level code

Check for null values explicitly
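Below is a minimal sketch of these guidelines applied to a hypothetical `withdraw` function. The account shape (`{ balance: number }`) and the rules are assumptions made for illustration.

```javascript
// Hypothetical example: withdraw money from an account object { balance: number }.
function withdraw(account, amount) {
  // Check for null values explicitly instead of trusting the caller.
  if (account === null || account === undefined) {
    throw new TypeError("Account cannot be null");
  }
  // Reject values that would silently corrupt the balance (NaN, Infinity, non-numbers, negatives).
  if (!Number.isFinite(amount) || amount <= 0) {
    throw new RangeError(`Invalid amount: ${amount}`);
  }
  if (amount > account.balance) {
    throw new RangeError(`Insufficient funds: balance ${account.balance}, requested ${amount}`);
  }
  account.balance -= amount;
  return account.balance;
}

const acc = { balance: 100 };
console.log(withdraw(acc, 30)); // 70
```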

Four more approaches

Test-based engineering approach

Prevent errors using tests

Tests as a method for checking correctness
Tests as a lightweight specification

Testing in practice

Test that we get correct results for sample inputs
Test expected behaviour for bad inputs
Test-driven development and writing tests first
Write tests for known past issues

Mathematical approach

Prove programs correct

Testing can only prove the presence of bugs, not their absence. Rather than testing, we should prove programs correct.

Mathematical approach in practice

Use types to specify correct behaviour
More precise types to capture properties such as non-nullability
Verification tools for critical software
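JavaScript has no static types, so the following is only a runtime approximation of the idea. A hypothetical `NonEmptyName` class has a constructor that is the single place where validity is checked. Code that receives a `NonEmptyName` only needs to check that it got one. In a statically typed setting (for example TypeScript with `strictNullChecks`, or Kotlin's non-nullable types), the compiler can rule out `null` before the program runs.

```javascript
// Hypothetical wrapper: a value of this class always holds a non-empty string.
class NonEmptyName {
  #value; // private field, so it cannot be replaced with an invalid value later

  constructor(value) {
    if (typeof value !== "string" || value.length === 0) {
      throw new TypeError("Name must be a non-empty string");
    }
    this.#value = value;
  }

  get value() {
    return this.#value;
  }
}

// Only one check is needed here: that we were given a NonEmptyName at all.
function niceName(name) {
  if (!(name instanceof NonEmptyName)) throw new TypeError("Expected a NonEmptyName");
  const s = name.value;
  return s[0].toUpperCase() + s.slice(1);
}

console.log(niceName(new NonEmptyName("tomas"))); // Tomas
```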

Recovery-based engineering approach

The kill-yourself strategy

When the process does not know how to react, it should kill itself and let someone else deal with it.

Error handling in Erlang

Program consists of many small processes
Processes communicate via messages
Supervisor processes handle failures
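Erlang's actual mechanism (linked processes, supervisors and restart strategies) belongs to its runtime and the OTP libraries. The sketch below only imitates the idea in JavaScript, synchronously and in a single thread. A hypothetical worker does not try to handle unexpected messages. It simply throws, which stands in for "killing itself". A hypothetical `supervise` function then replaces it with a new worker, up to a fixed number of restarts.

```javascript
// Hypothetical worker: handles expected messages only; anything else makes it crash.
function makeWorker() {
  return {
    handle(message) {
      if (typeof message !== "number") {
        throw new TypeError(`Unexpected message: ${JSON.stringify(message)}`);
      }
      return message * 2;
    },
  };
}

// Hypothetical supervisor: when the worker crashes, log it and replace it with a new one.
function supervise(messages, maxRestarts) {
  let worker = makeWorker();
  let restarts = 0;
  const results = [];
  for (const message of messages) {
    try {
      results.push(worker.handle(message));
    } catch (e) {
      // Give up (and crash ourselves) if the worker keeps failing.
      if (restarts >= maxRestarts) throw new Error(`Too many restarts: ${e.message}`);
      restarts += 1;
      console.log(`Worker crashed (${e.message}), restarting`);
      worker = makeWorker();
    }
  }
  return { results, restarts };
}

console.log(JSON.stringify(supervise([1, 2, "oops", 3], 3)));
// {"results":[2,4,6],"restarts":1}
```

In this sketch, the message that caused the crash is simply dropped. The worker code stays free of error handling, and all recovery decisions live in the supervisor.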

Artistic approach

In live-coded music performance, there are no errors. Only more or less appropriate sounds. An error might be a source for creativity.

Is this just for artists?

Any system with live human involvement
Exploratory data science process
Writing live systems to process data

Summary

Living with errors

History, context and types of errors
Mistakes, failures, slips and malfunctions
Input errors and output errors

Engineering approach to error handling
Representing errors using error codes and exceptions
Three ways of handling bad inputs

Four more approaches to error handling
Mathematical proofs and engineering tests
Recovery in telecom and acceptance in art

CO559: Living with errors

What you should remember from this lecture

Appropriate approaches in different contexts
Different kinds of errors based on why, how and what
Strategies and mechanisms for handling errors


References

Selected chapters from books

Papers and articles
