Machine Learning Categories
Machine learning is a family of algorithms that allows computers to mimic human learning behaviour. That is, they can learn from experience, or more precisely, from historical data. As with a child, the more experience they have, the better their performance will be. Learning, also known as training, can be categorised as either supervised or unsupervised. The inputs from which a model learns are known as predictors, features or independent variables.
For supervised learning, the training data has both inputs and outputs. The outputs are known as outcomes, targets, responses or dependent variables.
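As a minimal sketch of what input-output training data looks like, the snippet below fits a linear regression with scikit-learn; the feature names and values are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row is one observation.
# Inputs (predictors / features / independent variables),
# e.g. temperature and pressure.
X = np.array([[20.5, 1.2],
              [21.0, 1.4],
              [19.8, 1.1],
              [22.3, 1.5]])

# Outputs (outcomes / targets / responses / dependent variables),
# e.g. yield.
y = np.array([75.2, 78.1, 73.9, 80.4])

# Supervised learning: the model is fitted on input-output pairs.
model = LinearRegression()
model.fit(X, y)
```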
For unsupervised learning, there are only inputs, and the model attempts to group the data into clusters based on the inputs alone. Effectively, it is looking for hidden patterns in the data. An example is predictive maintenance, where we try to separate healthy equipment from equipment that will fail after X weeks, based on characteristics such as speed of operation, temperature, hours run, vibration, etc.
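A minimal clustering sketch of that idea, using k-means as one possible algorithm; the sensor readings and the choice of two clusters are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical sensor readings per machine: speed (rpm),
# temperature (deg C), hours run, vibration (mm/s).
X = np.array([[1450, 62.0, 1200, 2.1],
              [1480, 63.5, 1350, 2.3],
              [1390, 71.2, 4100, 5.8],
              [1360, 73.0, 4600, 6.2]])

# Scale the features so no single unit dominates the distance
# metric, then look for two groups; note no labels are supplied.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X_scaled)
print(labels)  # e.g. [0 0 1 1] -- groups found from the inputs alone
```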
Supervised learning can be further broken down into classification and regression. For classification models, the output is a series of discrete states or categorical responses, e.g. pass or fail, or grade A, B or C. For regression models, the output is a continuous value, e.g. a prediction that the expected yield or energy consumption is X.
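The contrast is easy to see in code. This sketch fits a classifier and a regressor to the same invented inputs; only the type of output changes.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical process data: [temperature, pressure] per batch.
X = [[20.5, 1.2], [21.0, 1.4], [19.8, 1.1], [22.3, 1.5]]

# Classification: the outputs are discrete categories.
y_class = ["pass", "pass", "fail", "pass"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[20.0, 1.2]]))  # -> a category, e.g. 'pass'

# Regression: the outputs are continuous values, e.g. expected yield.
y_reg = [75.2, 78.1, 73.9, 80.4]
reg = DecisionTreeRegressor().fit(X, y_reg)
print(reg.predict([[20.0, 1.2]]))  # -> a number, e.g. 75.2
```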
Machine Learning Models
For each of the above categories, there are numerous algorithms available. This diagram shows some of the more popular ones. Generally, simpler models are preferred over complex ones, provided accuracy is not significantly sacrificed for simplicity.
The choice of model is important as:
- Some models cannot tolerate features that measure the same underlying quantity, i.e. multicollinearity or correlation between predictors. For example, suppose you believe that temperature would be a good predictor. Would that be the setpoint temperature or the measured value? Assuming the measured value tracks the setpoint, the two would be highly correlated. Only one is required, and including the second would add no new information (see the correlation check sketched after this list).
- Many models cannot use samples with any missing data.
- Some models are severely compromised when irrelevant predictors are used.
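A minimal sketch of how the first two issues can be checked with pandas; the column names and readings are invented, and the 0.9 correlation threshold is a common rule of thumb rather than a fixed rule.

```python
import pandas as pd

# Hypothetical data: the setpoint and measured temperatures track
# each other closely, so they carry essentially the same information.
df = pd.DataFrame({
    "temp_setpoint": [20.0, 21.0, 22.0, 23.0, 24.0],
    "temp_measured": [20.1, 20.9, 22.2, 22.8, 24.1],
    "pressure":      [1.2, 1.4, 1.1, 1.5, 1.3],
})

# A correlation matrix flags near-duplicate predictors; a common
# rule of thumb is to drop one of any pair correlated above ~0.9.
print(df.corr().round(2))

# Missing data: many models reject rows containing NaNs,
# so count the missing values per column before modelling.
print(df.isna().sum())
```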
There are therefore a series of trade-offs to be made, and the best model cannot be determined before the data has been fully explored. In practice, it is normally wise to try several disparate types of model against any particular dataset before settling on the best one.
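One common way to run that comparison is cross-validation over several model types, as sketched below; the synthetic dataset and the three chosen models are stand-ins for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in dataset, purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Try several disparate model types and compare
# their cross-validated accuracy on the same data.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(),
    "k-nearest neighbours": KNeighborsClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```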