Design Principles

An analogy is often made between systems design and designing other things such as a house. To a certain extent this analogy holds true. We are attempting to place design components into a structure that meets a specification. The analogy breaks down when we consider their respective operating environments. It is generally assumed in the design of a house that the landscape, once suitably formed, will not change.

Software environments are slightly different. Systems are interactive and dynamic. Any system we design will be nested inside other systems, whether electronic, physical, or human. Just as the different layers in a computer network (application layer, transport layer, physical layer, and so on) nest different sets of meanings and functions, so too do the activities we carry out at different levels of a project.

As the designer of these systems we must also have a strong awareness of the setting, the domain in which we work. This knowledge gives us clues to patterns in our data and helps us give context to our work.

Machine learning projects can be divided into six distinct activities:

  • Defining the object and specification
  • Preparing and exploring the data
  • Model building
  • Implementation
  • Testing
  • Deployment

The designer is mainly concerned with the first three. However, they often play, and in many projects must play, a major role in the other activities. It should also be said that a project's timeline is not necessarily a linear sequence of these activities. The important point is that they are distinct activities. They may occur in parallel with each other, and interact in other ways, but they generally involve different types of tasks that can be separated out in terms of human and other resources, the stage of the project, and externalities. We also need to consider that different activities involve distinct operational modes. Consider the different ways your brain works when you are sketching out an idea compared to when you are working on a specific analytical task, say a piece of code.

Often the hardest question is where to begin. We can start drilling into the different elements of a problem, with an idea of a feature set, and perhaps an idea of the model or models we might use. This may lead to a defined object and specification, or we may have to do some preliminary research, such as checking possible data sets and sources or available technologies, or talking to other engineers, technicians, and users of the system. We need to explore the operating environment and its various constraints: is the system part of a web application, or a laboratory research tool for scientists?

In the early stages of design, our workflow will flip between the different elements. For instance, we start with a general problem, perhaps having an idea of the task, or tasks, necessary to solve it. We then divide it into what we think are the key features, try it out on a few models with a toy data set, go back to refine the feature set, adjust our model, define the tasks more precisely, and refine the model again. When we feel our system is robust enough, we can test it on some real data. Of course, we may then need to go back and change our feature set...

Selecting and optimising features is often a major activity (really a task in itself) for the machine learning designer. We cannot really decide what features we need until we have adequately described the task, and of course both the task and features are constrained by the types of feasible models we can build.

Types of questions

As designers we are asked to solve a problem. We are given some data and an expected output, a solution. The first step is to frame the problem in a way that a machine can understand and that carries meaning for a human. There are six broad approaches we can take to precisely define our machine learning problem:

  • 1. Exploratory: Here we are analysing data looking for patterns, such as a trend or a relationship between variables. Exploration will often lead to a hypothesis, such as linking diet with disease, or crime rate with suburb.
  • 2. Descriptive: Here we are trying to summarise specific features of our data, for instance the average life expectancy, the average temperature, or the number of left-handed people in a population.
  • 3. Inferential: An inferential question is one that attempts to support a hypothesis, for instance proving (or disproving) a general link between life expectancy and income by using different data sets.
  • 4. Predictive: Here we are trying to anticipate future behaviour, for instance predicting life expectancy by analysing income.
  • 5. Causal: This is an attempt to find what causes something. Does low income cause a lower life expectancy?
  • 6. Mechanistic: This tries to answer questions like 'what are the mechanisms that link income with life expectancy?'

Most machine learning problems involve several of these types of question during development. For instance, we may first explore the data looking for patterns or trends, and then describe certain key features of our data. This may enable us to make a prediction, or to find a cause or a mechanism behind a particular problem.

Are you asking the right question?

The question must be plausible and meaningful in its subject area. This domain knowledge enables you to understand the things that are important in your data, and see where a certain pattern or correlation has meaning.

The question should be as specific as possible while still giving a meaningful answer. It is common for it to begin as a generalised statement, such as 'I wonder if wealthy means healthy'. So you do some further research and find you can get statistics for wealth by geographic region, say from the tax office. We can measure health by its inverse, illness, say by hospital admissions, and test our initial proposition 'wealthy means healthy' by tying illness to geographic region. We can see that a more specific question relies on several, perhaps questionable, assumptions.

We should also consider that our results may be confounded by the fact that poorer people may not have health care insurance so are less likely to attend a hospital despite illness. There is an interaction between what we want to find out and what we are trying to measure. This interaction perhaps hides a true rate of illness. All is not lost however. Because we know about these things then perhaps we can account for them in our model.

We can make things a lot easier by learning as much as we can about the domain we are working in.

You could possibly save yourself a lot of time by checking that the question you are asking, or part of it, has not already been answered, or that there are data sets available that may shed some light. Often you have to approach a problem from several different angles at once.  Do as much preparatory research as you can. It is quite likely that other designers have done work that could shed light on your own.


A task is a specific activity conducted over a period of time. We have to distinguish between the human tasks (planning, designing, and implementing) and the machine tasks (classification, clustering, regression, and so on). Also consider where there is overlap between human and machine, as, say, in selecting features for a model. Our goal in machine learning is really to transform as many of these tasks as we can from human tasks into machine tasks.

It is not always easy to match a real world problem to a specific task. Many real world problems may seem to be conceptually linked, but require very different methods to solve. Alternatively, problems that appear completely different may require similar methods. Unfortunately, there is no simple lookup table to match a particular task to a problem. A lot depends on the setting and domain. A problem that is solvable in one domain may be unsolvable in another, perhaps because of a lack of data. There are, however, a small number of tasks that, combined with a large number of methods, solve many of the most common problem types. In other words, in the space of all possible programming tasks there is a subset of tasks that are useful to our particular problem. Within this subset there is a smaller subset of tasks that are easy and can actually be applied usefully to our problem.

Let’s first briefly introduce the basic machine tasks. Classification is probably the most common type of task, in part because it is relatively easy, well understood, and solves a lot of common problems. Multi class classification (for instance handwriting recognition) can sometimes be achieved by chaining binary classification tasks. However, we lose information this way, and we become unable to define a single decision boundary. For this reason multi class classification is often treated separately from binary classification.
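The idea of chaining binary classification tasks can be sketched as follows. This is a minimal illustration, not a recommended method: each "binary classifier" is just a pair of centroids (the class versus everything else), and the function names and toy data are assumptions made for the example.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train_one_vs_rest(samples, labels):
    """Train one binary 'this class vs. the rest' model per class.

    Each binary model is a (class centroid, rest centroid) pair, a
    deliberately simple stand-in for a real binary classifier.
    """
    models = {}
    for cls in sorted(set(labels)):
        positive = [s for s, l in zip(samples, labels) if l == cls]
        negative = [s for s, l in zip(samples, labels) if l != cls]
        models[cls] = (centroid(positive), centroid(negative))
    return models

def predict(models, x):
    """Pick the class whose binary model is most confident about x."""
    def score(cls):
        pos, neg = models[cls]
        return dist(x, neg) - dist(x, pos)  # larger means 'more like cls'
    return max(models, key=score)

# Toy data, invented for illustration: three well separated classes.
samples = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.0, 5.0), (0.3, 5.2)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(samples, labels)
print(predict(models, (4.9, 5.1)))
```

Note that there are three separate binary models here, each with its own boundary, rather than a single decision boundary over all three classes.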

There are cases where what we are interested in is not discrete classes but a real number, for instance a probability. These types of problems are regression problems. Both classification and regression require a training set of correctly labelled data. They are supervised learning problems.
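As a small sketch of a regression task, the following fits a straight line to labelled examples by ordinary least squares and then predicts a real number for an unseen input. The data are made-up toy numbers, and fit_line is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labelled training set (toy values): each x comes with its correct y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict a real number for an unseen x
print(slope, intercept, prediction)
```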

Clustering, on the other hand, is the task of grouping items without any prior information about the groups. It is an unsupervised learning task. Clustering is basically making a measurement of similarity.
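To illustrate clustering as a measurement of similarity, here is a minimal k-means sketch in which similarity is simply Euclidean distance. The starting centroids, the fixed number of iterations, and the toy points are all assumptions made for the example.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Group points around the nearest centroid, then move each centroid
    to the mean of its group, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Unlabelled toy points: no information about groups is supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
clusters, centroids = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```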

Related to clustering is association. This is an unsupervised task that finds a certain type of pattern in the data. This task is behind movie recommender systems, and the 'customers who bought this also bought ...' suggestions on the checkout pages of online stores.
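A very rough sketch of the 'customers who bought this also bought' idea is to count how often pairs of items appear in the same basket. Real association methods use more careful measures than raw counts; the baskets and the also_bought function here are hypothetical.

```python
from collections import Counter
from itertools import combinations

def also_bought(baskets, item, top=1):
    """Return the items most often found in the same basket as `item`."""
    pair_counts = Counter()
    for basket in baskets:
        # Count every unordered pair of distinct items once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    related = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            related[b] = count
        elif b == item:
            related[a] = count
    return related.most_common(top)

# Invented purchase history for illustration.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
print(also_bought(baskets, "bread"))
```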

From these basic machine tasks there are a number of derived tasks. In many applications this may simply be applying the learning model to a prediction to establish a causal relationship. We must remember that explaining and predicting are not the same. A model can make a prediction, but unless we know explicitly how it made the prediction we cannot begin to form a comprehensible explanation. An explanation requires human knowledge of the domain.

We can also use a prediction model to find exceptions to a general pattern. Here we are interested in the individual cases that deviate from the predictions. This is often called anomaly detection and has wide applications in things like detecting bank fraud, noise filtering, and even the search for extraterrestrial life.
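One simple way to sketch this is to compare each observation with the model's prediction and flag the cases whose error is much larger than is typical. The model, the data, and the choice of three times the median error as a cut-off are all assumptions for illustration.

```python
from statistics import median

def find_anomalies(xs, ys, predict, factor=3.0):
    """Flag observations that deviate from the prediction far more than usual."""
    residuals = [abs(y - predict(x)) for x, y in zip(xs, ys)]
    typical = median(residuals)  # a robust notion of a 'normal' error
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if r > factor * typical]

def model(x):
    """Hypothetical, already trained prediction model."""
    return 2 * x + 1

# Toy observations; one of them does not follow the general pattern.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 20.0, 10.9]
print(find_anomalies(xs, ys, model))
```

If the typical error is zero, every observation with any error at all is flagged, so in practice a minimum threshold would probably be needed as well.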

An important and potentially useful task is subgroup discovery. Our goal here is not, as in clustering, to partition the entire domain, but rather to find a subgroup that has a substantially different distribution. In essence, subgroup discovery is trying to find relationships between a dependent target variable and many independent explaining variables. We are not trying to find a complete relationship, but rather a group of instances that are different in ways that are important in the domain. An example would be establishing a subgroup 'smoker = true' and 'family history = true' for a target variable of 'heart disease = true'.
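A candidate subgroup can be checked by comparing the rate of the target inside the subgroup with the rate in the whole population. The sketch below does this for the smoker and family history example. The records are invented, and the ratio of the two rates (often called lift) is only one of several possible quality measures.

```python
def target_rate(records, target):
    """Fraction of records for which the target variable is true."""
    return sum(1 for r in records if r[target]) / len(records)

def subgroup_lift(records, conditions, target):
    """Target rate inside the subgroup divided by the overall target rate."""
    subgroup = [r for r in records if all(r[k] == v for k, v in conditions.items())]
    overall = target_rate(records, target)
    if not subgroup or overall == 0:
        return None  # nothing to compare
    return target_rate(subgroup, target) / overall

def person(smoker, family_history, heart_disease):
    return {"smoker": smoker, "family_history": family_history,
            "heart_disease": heart_disease}

# Invented records, for illustration only.
records = [
    person(True, True, True), person(True, True, True),
    person(True, False, False), person(False, True, False),
    person(False, False, False), person(False, False, False),
    person(True, True, False), person(False, False, True),
]

lift = subgroup_lift(records, {"smoker": True, "family_history": True}, "heart_disease")
print(lift)  # a value well above 1 suggests the subgroup is distinctive
```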

Finally, we consider control type tasks. These act to optimise control settings to maximise a pay off given different conditions. This can be achieved in several ways. We can clone expert behaviour: the machine learns directly from a human and makes predictions of actions given different conditions. The task is to learn a prediction model for the expert's actions. This is similar to reinforcement learning, where the task is to learn a relationship between conditions and optimal action.


In machine learning systems, software flaws can have very serious real world consequences. What happens if your algorithm, embedded in an assembly line robot, classifies a human as a production component? Clearly, in critical systems you need to plan for failure. There should be a robust fault and error detection procedure embedded in your design process and systems.

Sometimes it is necessary to design quite complex systems simply for the purpose of debugging and checking for logic flaws. It may be necessary to generate data sets with a specific statistical structure, or to create 'artificial humans' to mimic an interface. The aim is to develop a methodology for verifying that the logic of your design is sound at the data, model, and task levels. Errors can be hard to track, and as a scientist, you must assume there are errors and try to prove your hypothesis.

The idea of recognising and gracefully catching errors is important for the software designer, but as machine learning systems designers we must take it a step further. We need to be able to capture, in our models, the ability to learn from an error.


Consideration must be given to how we select our test set, and in particular how representative it is of the rest of the data set. For instance, if the test set is noisy compared to the training set, the model will give poor results on the test set, suggesting that it is overfitting when in fact this is not the case. To avoid this, a process of cross validation is used. This works by randomly dividing the data into, for example, ten chunks of equal size. We use nine chunks for training the model and one for testing. We do this ten times, using each chunk once for testing. Finally, we take an average of the test set performance. Cross validation is used with other supervised learning problems besides classification but, as you would expect, unsupervised learning problems need to be evaluated differently.
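The cross validation procedure described above can be sketched in a few lines. The helper names, the fixed random seed, and the toy threshold classifier used to exercise it are assumptions for the example; any training and evaluation functions with the same shape could be substituted.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Randomly divide the indices 0..n_items-1 into k near-equal chunks."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride deals the shuffled indices into k chunks.
    return [indices[i::k] for i in range(k)]

def cross_validate(data, train, evaluate, k=10, seed=0):
    """Train on k-1 chunks, test on the remaining one, and average over all k."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train([data[j] for j in train_idx])
        scores.append(evaluate(model, [data[j] for j in test_idx]))
    return sum(scores) / k

# Toy classifier: learn a threshold midway between the two classes.
def train_threshold(rows):
    largest_negative = max(x for x, label in rows if not label)
    smallest_positive = min(x for x, label in rows if label)
    return (largest_negative + smallest_positive) / 2

def accuracy(threshold, rows):
    correct = sum(1 for x, label in rows if (x >= threshold) == label)
    return correct / len(rows)

data = [(x, x >= 50) for x in range(100)]  # invented labelled data
score = cross_validate(data, train_threshold, accuracy, k=10)
print(score)
```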

With an unsupervised task we do not have a labelled training set. Evaluation can therefore be a little trickier, since we do not know what a correct answer looks like. In a clustering problem, for instance, we can compare the quality of different models by measures such as the ratio of cluster diameter to the distance between clusters. However, in problems of any complexity we can never tell whether there is another model, not yet built, that is better.
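One possible form of such a measure is sketched below: the largest cluster diameter divided by the smallest gap between any two clusters, so that a lower value indicates tighter, better separated clusters. This particular definition is an assumption (it is the reciprocal of what is commonly called the Dunn index), and the two toy clusterings are invented.

```python
from itertools import combinations
from math import dist

def diameter(cluster):
    """Largest distance between any two points in the same cluster."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def separation(cluster_a, cluster_b):
    """Smallest distance between a point in one cluster and a point in the other."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)

def quality_ratio(clusters):
    """Largest diameter over smallest separation; lower is better."""
    largest_diameter = max(diameter(c) for c in clusters)
    smallest_gap = min(separation(a, b) for a, b in combinations(clusters, 2))
    return largest_diameter / smallest_gap

# Two candidate clusterings, produced by two hypothetical models.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 6.0)], [(4.0, 0.0), (4.0, 6.0)]]
print(quality_ratio(tight), quality_ratio(loose))
```

A comparison like this only ranks the models we have actually built; it says nothing about models we have not tried.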


Optimisation problems are ubiquitous in many different domains, including finance, business, management, the sciences, mathematics, and engineering. Optimisation problems consist of:

  1. An objective function that we want to maximise or minimise.
  2. Decision variables, a set of controllable inputs. These inputs are varied within the specified constraints in order to satisfy the objective function.
  3. Parameters. These are uncontrollable or fixed inputs.
  4. Constraints. These are relations between decision variables and parameters. They define what values the decision variables can have.
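The four components listed above can be made concrete with a tiny, invented production problem. Every number here is a hypothetical parameter, and the exhaustive search over whole-number plans is only practical because the problem is so small.

```python
from itertools import product

# Parameters: fixed inputs we cannot control (hypothetical values).
PROFIT_PER_CHAIR, PROFIT_PER_TABLE = 30, 50
HOURS_PER_CHAIR, HOURS_PER_TABLE, HOURS_AVAILABLE = 2, 4, 40
WOOD_PER_CHAIR, WOOD_PER_TABLE, WOOD_AVAILABLE = 3, 4, 48

def objective(chairs, tables):
    """Objective function: the total profit we want to maximise."""
    return PROFIT_PER_CHAIR * chairs + PROFIT_PER_TABLE * tables

def feasible(chairs, tables):
    """Constraints: relations between decision variables and parameters."""
    hours = HOURS_PER_CHAIR * chairs + HOURS_PER_TABLE * tables
    wood = WOOD_PER_CHAIR * chairs + WOOD_PER_TABLE * tables
    return hours <= HOURS_AVAILABLE and wood <= WOOD_AVAILABLE

# Decision variables: how many chairs and tables to make.
candidates = [(c, t) for c, t in product(range(21), range(11)) if feasible(c, t)]
best = max(candidates, key=lambda plan: objective(*plan))
print(best, objective(*best))
```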

Most optimisation problems have a single objective function. In the cases where we may have multiple objective functions we often find that they conflict with each other, for example reducing costs and increasing output. In practice we try to reformulate multiple objectives into a single function, perhaps by creating a weighted combination of objective functions. In our costs and output example a variable along the lines of cost per unit might work.
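Both ways of collapsing the cost and output objectives into a single function can be sketched as follows. The production plans and the weights are invented for illustration; in practice choosing the weights is itself a design decision.

```python
# Hypothetical production plans: (name, total cost, units produced).
plans = [("small", 1000.0, 100), ("medium", 1800.0, 200), ("large", 3300.0, 300)]

def weighted_score(cost, output, cost_weight=1.0, output_weight=10.0):
    """Single objective to minimise: cost is penalised, output is rewarded."""
    return cost_weight * cost - output_weight * output

def cost_per_unit(cost, output):
    """Alternative single objective to minimise: cost divided by output."""
    return cost / output

best_weighted = min(plans, key=lambda p: weighted_score(p[1], p[2]))
best_ratio = min(plans, key=lambda p: cost_per_unit(p[1], p[2]))

# Valuing output more highly changes which plan looks best.
best_output_heavy = min(plans, key=lambda p: weighted_score(p[1], p[2], output_weight=20.0))
print(best_weighted[0], best_ratio[0], best_output_heavy[0])
```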

The decision variables are the variables we control to achieve the objective. They may include things like resources or labour. The parameters of the model are fixed for each run of the model. We may use several 'cases', in which we choose different parameters, to test variations in conditions.

There are literally thousands of solution algorithms for the many different types of optimisation problems. Most of them involve first finding a feasible solution, then iteratively improving on it by adjusting the decision variables to, hopefully, find an optimum solution. Many optimisation problems can be solved reasonably well with linear programming techniques. These assume that the objective function and all the constraints are linear with respect to the decision variables. Where these relationships are not linear, we often use a suitable quadratic function. If the system is non linear then the objective function may not be convex. That is, it may have more than one local minimum, and there is no assurance that a local minimum is a global minimum.
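The danger of local minima can be seen with a small sketch. Here plain gradient descent, one simple iterative improvement method, is run on a non-convex function chosen for illustration; the function, step size, and starting points are all assumptions.

```python
def f(x):
    """A non-convex objective with two local minima."""
    return x**4 - 3 * x**2 + x

def gradient(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, step_size=0.01, steps=1000):
    """Repeatedly move the decision variable a little way downhill."""
    x = start
    for _ in range(steps):
        x -= step_size * gradient(x)
    return x

from_right = gradient_descent(2.0)   # settles in the shallower minimum
from_left = gradient_descent(-2.0)   # settles in the deeper minimum
print(from_right, f(from_right))
print(from_left, f(from_left))
```

Both runs stop at a point where no small adjustment improves the objective, yet only the run started from the left reaches the lower of the two minima.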