A Gentle Introduction to Machine Learning

The Basics of Machine Learning

Oluwaseyi Okunlola
5 min read · Feb 10, 2024

What’s Machine Learning?

Machine learning is an application of artificial intelligence that gives a computer system the ability to learn from data without being explicitly programmed.

The objective of machine learning is to allow the computer system to learn without human intervention.

The learning process starts with feeding data into the system. The system then applies appropriate statistical tools to analyse the data.

Machine Learning Methods

Machine learning algorithms are often categorized into two types:

  • Supervised Machine Learning Algorithms and
  • Unsupervised Machine Learning Algorithms.

Machine Learning Hierarchy

Supervised Machine Learning Algorithms

This is when the system is trained on a labelled dataset.

The labelled datasets have both input and output parameters.

In this type of machine learning, both the training and validation datasets are labelled as shown below:

The input parameters are the features the model learns from, while the output parameter is the label it learns to predict.

In both Figure 1 and Figure 2, the output parameters are the highlighted columns, while the input parameters are the other, unhighlighted columns.

Figure 1 is a dataset from a shopping store. It is useful for predicting whether or not a customer will buy a particular product, based on the customer’s gender, age, and salary.

Input parameter: Gender, Age, Salary

Output parameter: Purchased, i.e., 0 or 1.

— 1 means yes, the customer will purchase

— 0 means no, the customer won’t purchase
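To make the split between input and output parameters concrete, here is a minimal sketch in plain Python. The rows are invented for illustration (they are not the values from Figure 1), and the helper name `split_inputs_outputs` is hypothetical.

```python
# Hypothetical rows shaped like the Figure 1 dataset (values are invented).
customers = [
    {"Gender": "Male", "Age": 19, "Salary": 19000, "Purchased": 0},
    {"Gender": "Female", "Age": 35, "Salary": 58000, "Purchased": 0},
    {"Gender": "Female", "Age": 47, "Salary": 125000, "Purchased": 1},
    {"Gender": "Male", "Age": 52, "Salary": 90000, "Purchased": 1},
]

INPUT_COLUMNS = ["Gender", "Age", "Salary"]
OUTPUT_COLUMN = "Purchased"


def split_inputs_outputs(rows, input_columns, output_column):
    """Return (X, y): one list of input values per row, and the output labels."""
    X = [[row[col] for col in input_columns] for row in rows]
    y = [row[output_column] for row in rows]
    return X, y


X, y = split_inputs_outputs(customers, INPUT_COLUMNS, OUTPUT_COLUMN)
print(X[0])  # ['Male', 19, 19000]
print(y)     # [0, 0, 1, 1]
```

By convention, the inputs are often named `X` and the outputs `y`, which is why those names are used here.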

Figure 2 is a meteorological dataset for predicting wind speed based on different parameters.

Input parameters: Temperature, Pressure, Relative Humidity, Wind Direction.

Output Parameter: Wind Speed.

How Supervised Learning works:

  • The algorithm works with an outcome (also called target or dependent) variable, which is to be predicted from a given set of predictors (independent variables).
  • Using this set of variables, a function that maps inputs to desired outputs is generated.
  • The training process continues until the model achieves a desired level of accuracy on the training data.
  • Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.
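As a rough illustration of mapping inputs to outputs, here is a small sketch of KNN, one of the examples listed above, in plain Python. The data points are invented, and the function name `knn_predict` is hypothetical. Note that KNN simply stores the labelled examples, so this sketch has no iterative training loop, and the features are left unscaled to keep it short.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented inputs: (age, salary in thousands). Output: 0 or 1 for "Purchased".
train_X = [(19, 19), (25, 30), (30, 40), (45, 110), (50, 95), (55, 130)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (22, 25)))   # 0
print(knn_predict(train_X, train_y, (48, 100)))  # 1
```

The labelled examples play the role of the predictors and the outcome variable, and `knn_predict` is the function that maps a new input to a predicted output.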

Types of Supervised Learning

The supervised learning method is classified into two:

1) Regression: This is supervised learning where the output has a continuous value, that is, a real number. The goal is to predict a value that is as close as possible to the actual output value. The model is then evaluated by calculating the error between the predicted and actual values. The smaller the error, the more accurate the regression model.

Regression problem

A regression problem is one where the output variable is a real (numerical) value, such as a count, an amount in dollars, or a weight.

Examples of regression algorithms:

— Simple Linear Regression

— Multiple Linear Regression

— Polynomial Regression

— Ridge Regression

— Lasso Regression

— Elastic net regression

— Decision tree regressor

— Random forest regressor

— Support vector regressor
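The following sketch shows simple linear regression, the first item in the list above, together with the error calculation used to evaluate it. It uses the standard least-squares formulas in plain Python. The temperature and wind speed values are invented, and mean squared error is one common choice of error measure, assumed here for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Invented data: one input (temperature) and a continuous output (wind speed).
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
wind_speed = [3.1, 4.0, 4.8, 6.1, 7.0]

slope, intercept = fit_simple_linear_regression(temperature, wind_speed)
predicted = [slope * t + intercept for t in temperature]
mse = mean_squared_error(wind_speed, predicted)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"MSE={mse:.4f}")
```

A smaller mean squared error means the predicted values sit closer to the actual ones.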

2) Classification: In classification, the output is defined by labels. That is, the output is a discrete value with no numerical significance. The goal is to predict the class each input belongs to, and the model is then evaluated on the basis of accuracy.

Binary classification: In binary classification, the model predicts one of two classes, for example 0 or 1; yes or no.

Multiclass classification: In multiclass classification, the model predicts one of more than two classes.
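Accuracy, the evaluation measure mentioned above, is the fraction of predictions that match the actual labels. Here is a minimal sketch in plain Python; the label lists are invented for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Binary classification: labels are 0 or 1.
actual_binary = [1, 0, 1, 1, 0]
predicted_binary = [1, 0, 0, 1, 0]
print(accuracy(actual_binary, predicted_binary))  # 0.8

# Multiclass classification: more than two possible labels.
actual_multi = ["A", "B", "C", "Fail"]
predicted_multi = ["A", "C", "C", "Fail"]
print(accuracy(actual_multi, predicted_multi))  # 0.75
```

The same function works for both cases because it only checks whether each predicted label equals the actual one.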

Unsupervised Machine Learning Algorithms

What differentiates unsupervised machine learning from supervised machine learning is that, in the unsupervised case, we do not have any target or outcome variable to predict (estimate).

Unsupervised algorithms are used for clustering a population into different groups. A common application is segmenting customers into groups for specific interventions.

Examples of unsupervised learning algorithms:

  • Apriori algorithm
  • K-means
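To give a feel for clustering, here is a simplified one-dimensional sketch of K-means in plain Python. The customer spending values and the starting centroids are invented, and the function name `k_means_1d` is hypothetical. Real implementations handle many dimensions and choose starting centroids more carefully.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the given starting centroids (basic K-means loop)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the groups are stable.
        centroids = new_centroids
    return centroids, clusters


# Invented customer spending amounts; note there is no output label.
spend = [12, 15, 14, 80, 85, 90]
centroids, clusters = k_means_1d(spend, centroids=[10, 100])
print(centroids)
print(clusters)  # [[12, 15, 14], [80, 85, 90]]
```

The algorithm never sees a target variable. It only uses how close the values are to each other, which is what makes it unsupervised.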

Machine Learning Process

Variable

A variable is any characteristic or quantity that can be measured or counted. Variables can take the form of numbers (numerical) or labels (attributes).

Examples:

  • Age (21, 22, 70, …)
  • Gender (male, female)
  • Income (N50, N100, … N1000)
  • Country of Birth (USA, Russia, …)
  • Degree Ranking (First Class, Second Class, Third Class, etc.)

Classifications of Variables

Most variables in a dataset can be broadly classified into two major types:

  • Numerical variables
  • Categorical variables

Numerical Variables

As the name implies, the values of numerical variables are numbers. They can be further classified into discrete and continuous variables.

Discrete numerical variable

This is a variable whose values are whole numbers. The values cannot be decimals, hence the name discrete. Examples:

— Number of children in a family

— Number of men working in a bank

Continuous Numerical Variable

This is a variable that can take any value within some range. For example, the amount paid in a shopping mall is continuous; a customer can pay $10.80. Another example is the amount of time taken to complete a race, e.g., 10.2 minutes.

Categorical Variables

These are variables that cannot be quantified by numbers but only by attributes, also called labels. Examples are gender (male or female) and marital status (single, married, divorced, widowed).

Categorical variables can be further categorized into:

— Ordinal variables

— Nominal variables

Ordinal categorical variable

Categorical variables whose categories are meaningfully and hierarchically ordered are called ordinal. Examples:

— Student’s grade in an exam (A, B, C, or Fail)

— Days of the week (Monday = 1, … , Sunday = 7)

— Degree Type (Bachelor’s, Master’s, PhD)

— Bachelor’s degree ranking (First Class, Second Class Upper, Third Class)

Nominal categorical variable

There is no particular ordering for labels in a nominal categorical variable. Examples:

— Country of birth (Nigeria, Ghana, etc.)

— Vehicle make (Peugeot, Mercedes, Toyota, etc.)

Sometimes categorical variables are coded as numbers when the data are recorded (e.g., gender may be coded as 0 for male and 1 for female). In this case, the numbers have no numerical significance. They are just representative labels for the categories “male” and “female”.
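Coding categories as numbers can be sketched in plain Python as follows. The two helper names, `ordinal_encode` and `one_hot_encode`, are hypothetical, and the sample values are taken from the examples in this article. Using ranks for ordinal variables and separate 0/1 columns for nominal variables is a common convention, assumed here for illustration.

```python
def ordinal_encode(values, ordered_categories):
    """Map each label to its rank in `ordered_categories` (the order is meaningful)."""
    rank = {category: i for i, category in enumerate(ordered_categories)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Give each distinct label its own 0/1 column (no order is implied)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Ordinal variable: the ranks preserve the order of the degree types.
degrees = ["Bachelor's", "PhD", "Master's"]
print(ordinal_encode(degrees, ["Bachelor's", "Master's", "PhD"]))  # [0, 2, 1]

# Nominal variable: separate columns avoid suggesting that one country ranks higher.
countries = ["Nigeria", "Ghana", "Nigeria"]
categories, encoded = one_hot_encode(countries)
print(categories)  # ['Ghana', 'Nigeria']
print(encoded)     # [[0, 1], [1, 0], [0, 1]]
```

In both cases the numbers are only stand-ins for the labels, so arithmetic such as averaging them carries no real meaning for nominal variables.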
