CO559: Software Engineering
Living with errors






Tomas Petricek

email: t.petricek@kent.ac.uk
twitter: @tomaspetricek
office: S129A

Trivial problem or a crisis?

The birth of a crisis

Programming is trivial

  • Charles Babbage (19th century)
  • ENIAC designers (1940s)

The birth of a crisis

  • Labour crisis (1950s)
  • NATO conference (1968)
  • Application crisis (1990s)

Software industry paradox (Ensmenger, 2010)

How can we explain the [seemingly perpetual] crisis
in what is considered one of the most successful and
profitable industries of all time?

How did software get reliable?

Combination of good engineering practices

Management
Testing
Debugging
Over-engineering

Learning to live with errors

Over time, software engineers developed a range of practical approaches for coping with errors.

Living with errors (1/2)

Depends on application domain

  • What happens if it goes wrong?
  • Is a human around to help?
  • How quickly do we need the system?

Business computing

  • Failure costs some money, but not lives
  • Balance between correctness and delivery time
  • Methods need to support teamwork

Living with errors (2/2)

Telecommunications

  • Hardware errors will happen
  • System should recover if possible
  • Redundancy can be built-in

Computer art

  • No threat to lives or money
  • Error as a source of creativity
  • A human may find ways to incorporate errors

Different kinds of errors

How did it happen?

Mistake - following a wrong plan
This logic is not actually going to sort a list

Failure - error in translation from plan to reality
My algorithm is correct, but I implemented it poorly

Slip - mechanical error when implementing code
Syntax error or off-by-one error in array access

Malfunction - hardware failures
Wrong data received in network communication
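A slip can be as small as a single character. In the minimal JavaScript sketch below (`sumSlip` and `sumFixed` are hypothetical names), the plan of adding up every element is correct, but `<=` makes the loop read one element past the end of the array. JavaScript does not throw here: `numbers[numbers.length]` is `undefined`, so the sum silently becomes `NaN`.

```javascript
// Slip: the plan (add every element) is right, the typing is not
function sumSlip(numbers) {
  let total = 0;
  // `<=` also visits index numbers.length, which does not exist
  for (let i = 0; i <= numbers.length; i++) {
    total += numbers[i]; // adding undefined turns the total into NaN
  }
  return total;
}

// The same plan, typed correctly
function sumFixed(numbers) {
  let total = 0;
  for (let i = 0; i < numbers.length; i++) {
    total += numbers[i];
  }
  return total;
}

console.log(sumSlip([1, 2, 3]));  // prints NaN
console.log(sumFixed([1, 2, 3])); // prints 6
```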

When do we find out?

Static error - when developing the software

Immediate runtime error
right when it occurs

Latent runtime error
sometime after it occurs
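The difference between immediate and latent runtime errors can be sketched in a few lines of JavaScript. All the function names below are hypothetical. The latent error is a misspelled property name, which JavaScript accepts without complaint, so the failure only appears later in `greet`, a function that is itself correct.

```javascript
// Immediate runtime error: the faulty operation throws at the point of the mistake
function shout(user) {
  return user.name.toUpperCase(); // TypeError right here if user is null
}

// Latent runtime error: the mistake is made here, but nothing fails yet...
function makeUser(name) {
  return { nmae: name }; // slip: misspelled property name
}

// ...and it only surfaces later, in code that did nothing wrong
function greet(user) {
  return "Hello, " + user.name.toUpperCase(); // user.name is undefined
}

// Small helper that reports which error (if any) a call raises
function errorName(f) {
  try {
    f();
    return "no error";
  } catch (e) {
    return e.constructor.name;
  }
}

console.log(errorName(() => shout(null))); // prints TypeError
const user = makeUser("tomas");            // no error reported here
console.log(errorName(() => greet(user))); // prints TypeError, raised inside greet
```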

What can go wrong when executing a sub-routine?

Output error - there is no way of producing a valid result
e.g. the file does not exist

Input error - the operation cannot be done because the inputs are invalid

Runtime errors technically

Input error - no character at index 99

"hello world".charAt(99)

Input error - null always represents an invalid value

"hello world".replace("world", null)

Output error - there is no valid result if the file does not exist

fs.readFileSync("c:/temp/non-existent.txt")
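How these errors show up depends on the language. In Java, `"hello world".charAt(99)` throws a `StringIndexOutOfBoundsException` and `replace("world", null)` throws a `NullPointerException`. In JavaScript, neither call throws: `charAt(99)` returns `""` and `replace` converts `null` to the text `"null"`. The sketch below, assuming Node.js, shows this together with a hypothetical `checkedCharAt` helper that reports the bad index explicitly. The file read uses Node's `fs.readFileSync`, which throws an error whose `code` is `"ENOENT"` when the file does not exist.

```javascript
const fs = require("node:fs");

// In JavaScript, neither call reports the input error: both quietly "recover"
console.log(JSON.stringify("hello world".charAt(99))); // prints ""
console.log("hello world".replace("world", null));     // prints hello null

// Hypothetical helper that reports a bad index as an input error instead
function checkedCharAt(text, index) {
  if (!Number.isInteger(index) || index < 0 || index >= text.length) {
    throw new RangeError(`No character at index ${index} (length ${text.length})`);
  }
  return text.charAt(index);
}

try {
  checkedCharAt("hello world", 99);
} catch (e) {
  console.log(e.message); // prints No character at index 99 (length 11)
}

// Output error: the path is a perfectly good string, but there is no file to read
try {
  fs.readFileSync("c:/temp/non-existent.txt", "utf8");
} catch (e) {
  console.log(e.code); // "ENOENT" when the file does not exist
}
```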

Engineering approach

Engineering approach to errors

Errors are inevitable

  • Programmers are just humans
  • Cannot possibly check everything
  • Time and budget are limited

Good practices

  • Do not assume errors will not happen
  • Make it easy to find out what's wrong

Indicating erroneous situations

Two ways of indicating errors

  • Error code - return null or -1
  • Exception - throw Java exception etc.

Why do modern systems prefer exceptions?

  • They cannot be confused with a valid result
  • They propagate automatically
  • Their name explains the reason for the error
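The sketch below contrasts the two mechanisms in JavaScript. `indexOf` is the built-in string method that returns `-1` as an error code; passing that code on unchecked to `slice` yields a plausible but wrong result. `NotFoundError`, `findOrThrow` and `textAfter` are hypothetical names used to illustrate the advantages listed above: the exception cannot be mistaken for an index, it passes through `textAfter` without any extra code, and its name says what went wrong.

```javascript
// Error code: -1 means "not found", but nothing forces the caller to check it
const position = "hello world".indexOf("xyz"); // -1
console.log("hello world".slice(position));    // prints d (-1 is read as "last character")

// Exception: a dedicated error type whose name explains the problem
class NotFoundError extends Error {
  constructor(needle) {
    super(`"${needle}" not found`);
    this.name = "NotFoundError";
  }
}

function findOrThrow(text, needle) {
  const index = text.indexOf(needle);
  if (index === -1) throw new NotFoundError(needle);
  return index;
}

// This intermediate function has no error handling, yet the error still propagates
function textAfter(text, needle) {
  return text.slice(findOrThrow(text, needle) + needle.length);
}

console.log(textAfter("hello world", "hello ")); // prints world
try {
  textAfter("hello world", "xyz");
} catch (e) {
  console.log(`${e.name}: ${e.message}`); // prints NotFoundError: "xyz" not found
}
```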

Three ways of handling bad inputs (1/3)

Ignore bad inputs - assume that the input is valid

function niceName(name) {
  return name[0].toUpperCase() + name.slice(1);
}

When will this function not work?

The function will fail if name is null
The function will fail if name is an empty string
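Running the function on a few inputs confirms both failure cases. The sketch repeats the function so that it runs on its own (using `slice(1)`, which gives the same result as `substr(1)` here) and reports which exception each input causes.

```javascript
// The "ignore bad inputs" version: no checks at all
function niceName(name) {
  return name[0].toUpperCase() + name.slice(1);
}

for (const input of ["tomas", "", null]) {
  try {
    console.log(JSON.stringify(input), "->", niceName(input));
  } catch (e) {
    console.log(JSON.stringify(input), "->", e.constructor.name);
  }
}
// "tomas" -> Tomas
// "" -> TypeError    (""[0] is undefined, which has no toUpperCase method)
// null -> TypeError  (reading null[0] fails before toUpperCase is reached)
```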

Three ways of handling bad inputs (2/3)

Recover from bad inputs - try to do the best you can

function niceName(name) {
  if (!name || name.length === 0) return "";
  else return name[0].toUpperCase() + name.slice(1);
}

What happens for bad inputs?

You get an empty name as the result
Is this what you want? It depends...
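Whether the empty result is acceptable depends on who uses it. In the sketch below, `greeting` is a hypothetical caller: the recovered `""` does not crash anything, but it produces an odd salutation. For a draft letter that may be tolerable; if the name were used as an account identifier, the same silent default could cause a more serious problem later.

```javascript
// The "recover from bad inputs" version: bad input gives an empty name
function niceName(name) {
  if (!name || name.length === 0) return "";
  return name[0].toUpperCase() + name.slice(1);
}

// Hypothetical caller that uses the result without further checks
function greeting(name) {
  return `Dear ${niceName(name)},`;
}

console.log(greeting("tomas")); // prints Dear Tomas,
console.log(greeting(null));    // prints Dear ,  (no crash, but not quite right)
```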

Three ways of handling bad inputs (3/3)

Fail when input is bad - terminate execution immediately

function niceName(name) {
  if (name === null || name === undefined) throw new TypeError("Name cannot be null");
  if (name.length === 0) throw new Error("Name cannot be empty");
  return name[0].toUpperCase() + name.slice(1);
}

Terminating execution immediately

Most languages support throwing exceptions
An exception indicates what was wrong with the input
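A caller of the failing version can catch the exception and decide how to react. This sketch throws `Error` objects (a `TypeError` for a missing name), which, unlike plain strings, carry a `name`, a `message` and a stack trace. The choice of error types is an assumption, not part of the original example.

```javascript
// The "fail when input is bad" version, throwing Error objects
function niceName(name) {
  if (name === null || name === undefined) throw new TypeError("Name cannot be null");
  if (name.length === 0) throw new Error("Name cannot be empty");
  return name[0].toUpperCase() + name.slice(1);
}

// The caller decides how to react; the exception says what to fix
for (const input of ["tomas", "", null]) {
  try {
    console.log(niceName(input));
  } catch (e) {
    console.log(`Cannot format ${JSON.stringify(input)}: ${e.name}: ${e.message}`);
  }
}
// Tomas
// Cannot format "": Error: Name cannot be empty
// Cannot format null: TypeError: Name cannot be null
```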

Three ways of handling bad inputs

Ignore bad inputs
Simple code, but unexpected things may happen
Good enough for local helper functions

Recover from bad inputs
Default behaviour may cause a serious error later
Good if an imperfect result is better than no result

Fail when input is bad
Stops execution, but tells you what to fix
Good for public functions called by others

Defensive programming

"Better safe than sorry" programming

Assume that your code might not be called correctly

Protect your code by throwing for bad inputs

Add extra safety checks in low-level code

Check for null values explicitly
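As a minimal sketch of defensive programming, the hypothetical low-level helper `average` below does not trust its caller: it checks explicitly for `null`, for a non-array argument, for an empty array and for non-numeric elements, and throws a descriptive error in each case instead of quietly returning `NaN`.

```javascript
// Hypothetical low-level helper that assumes nothing about how it is called
function average(numbers) {
  if (numbers === null || numbers === undefined) {
    throw new TypeError("average: numbers cannot be null");
  }
  if (!Array.isArray(numbers)) {
    throw new TypeError("average: numbers must be an array");
  }
  if (numbers.length === 0) {
    // Without this check the result would be 0 / 0, which is NaN
    throw new RangeError("average: cannot average an empty array");
  }
  for (const n of numbers) {
    if (typeof n !== "number" || Number.isNaN(n)) {
      throw new TypeError(`average: not a number: ${n}`);
    }
  }
  let total = 0;
  for (const n of numbers) total += n;
  return total / numbers.length;
}

console.log(average([1, 2, 3, 4])); // prints 2.5
```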

Four more approaches

Test-based engineering approach

Prevent errors using tests

  • Tests as a method for checking correctness
  • Tests as a lightweight specification

Testing in practice

  • Test that we get correct results for sample inputs
  • Test expected behaviour for bad inputs
  • Test-driven development and writing tests first
  • Write tests for known past issues

Mathematical approach

Prove programs correct

Testing can only show the presence of bugs, not their absence. Rather than testing, we should prove programs correct.

Mathematical approach in practice

  • Use types to specify correct behaviour
  • More precise types to capture properties such as non-nullability
  • Verification tools for critical software

Recovery-based engineering approach

The kill-yourself strategy

When the process does not know how to react, it should kill itself and let someone else deal with it.

Error handling in Erlang

  • Program consists of many small processes
  • Processes communicate via messages
  • Supervisor processes handle failures
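Erlang supervisors restart crashed processes rather than trying to understand the failure. The JavaScript sketch below is only a single-process analogy of that idea (all names are hypothetical): a worker throws when it cannot continue, and `supervise` restarts it up to `maxRestarts` times before giving up and passing the failure upwards. Real OTP supervisors manage separate processes and support several restart strategies.

```javascript
// Hypothetical worker that throws ("kills itself") on its first few runs,
// simulating a transient fault it does not know how to handle
function makeFlakyWorker(failuresBeforeSuccess) {
  let attempts = 0;
  return function worker() {
    attempts++;
    if (attempts <= failuresBeforeSuccess) {
      throw new Error(`worker crashed on attempt ${attempts}`);
    }
    return `done after ${attempts} attempts`;
  };
}

// The supervisor does not inspect the failure; it simply restarts the worker.
// After maxRestarts restarts it gives up and escalates to its own caller.
function supervise(worker, maxRestarts) {
  for (let restarts = 0; ; restarts++) {
    try {
      return worker();
    } catch (e) {
      if (restarts === maxRestarts) {
        throw new Error(`giving up after ${maxRestarts} restarts: ${e.message}`);
      }
      console.log(`supervisor: ${e.message}, restarting`);
    }
  }
}

console.log(supervise(makeFlakyWorker(2), 5)); // prints done after 3 attempts
```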

Artistic approach

In live coded music performance, there are no errors. Only more or less appropriate sounds. An error might be a source for creativity.

Is this just for artists?

  • Any system with live human involvement
  • Exploratory data science process
  • Writing live systems to process data

Summary

Living with errors

History, context and types of errors
Mistakes, failures, slips and malfunctions
Input errors and output errors

Engineering approach to error handling
Representing errors using error codes and exceptions
Three ways of handling bad inputs

Four more approaches to error handling
Mathematical proofs and engineering tests
Recovery in telecom and acceptance in art

CO559: Living with errors

What you should remember from this lecture

  • Appropriate approaches in different contexts
  • Different kinds of errors based on why, how and what
  • Ways of handling errors and mechanisms



References

Selected chapters from books

Papers and articles