Machine Learning for beginners

May 18, 2025 - 09:16

Hello there, and welcome to yet another post on machine learning for beginners. Today I will walk you through one of the major fields in data, the one on everyone's lips: machine learning. We will look at what machine learning is, which kinds of problems call for it, and how to know when to use it, and then we will wrap up by creating a simple regression model. (Do not worry if you do not yet know what a model is or what regression involves; I'll walk you through all the basics you need to understand machine learning more intuitively.) Let's get started, shall we?

What is Machine Learning?

That's a good place to start. So, what is machine learning? Machine learning is the art of programming computers so that they can learn from data. The part of a machine learning system that learns and makes predictions is called a model. That's right, it's an art. And if you are paying close attention, you will have realized that machine learning is the art of developing models. If you are passionate about art, you will definitely enjoy machine learning. (Pun intended.)

What does Machine Learning involve?

Now that we can define machine learning in one long sentence, let's go a little deeper and look at the kinds of problems and projects that are best solved with machine learning:

1. Problems that require a lot of fine-tuning or long lists of rules. Traditional rule-based systems require explicitly coded instructions for every condition. When the number of rules becomes massive or unmanageable (e.g., keyword-based spam detection or transaction fraud detection), machine learning becomes the practical choice. A spam filter can learn which words and phrases are good predictors of spam by detecting patterns of words that are unusually frequent in the spam folder. That's a perfect example of how machine learning helps, because this seemingly simple problem would otherwise require thousands of else-if statements just to detect spam emails.

2. Complex problems where traditional hard coding yields little to no result. No matter how good your coding skills are, there are some real-world problems you cannot hard-code, because there is no simple algorithm for even detecting the problem, let alone solving it. For example, image recognition, natural language understanding, and speech-to-text conversion are governed by high-dimensional, non-linear patterns that cannot feasibly be encoded by hand. ML models like Convolutional Neural Networks (CNNs) or Transformers automatically extract abstract features and model these relationships through training. (Don't be scared by big words like Convolutional Neural Networks yet; there are simply no easier terms for this tough nut. But I hope you get the point.)

3. A highly fluctuating environment. A machine learning system can be retrained on new data. Think about a highly volatile setting such as stock market price prediction. This kind of system requires the model to be constantly updated with newly available information so that its performance stays as good as possible. This adaptability makes ML preferable to static, rule-based systems.

4. Getting insights from large amounts of data. Digging into large amounts of data to gain insights is called data mining, and machine learning excels at it.
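
To make the first point more concrete, here is a minimal sketch, in plain Python, of the idea behind a spam filter that learns its own keyword list instead of relying on hand-written rules. The tiny spam and ham message lists and the function name learn_spam_words are made up for illustration; a real filter would use far more data and a proper model.

```python
from collections import Counter

# Hypothetical toy data: a few messages we already know are spam or ham.
spam = ["win free money now", "free prize claim now", "claim your free money"]
ham = ["meeting moved to monday", "are you free for lunch", "see you at the meeting"]


def learn_spam_words(spam_msgs, ham_msgs, min_count=2):
    """Return words that appear at least `min_count` times in spam
    and more often in spam than in ham."""
    spam_counts = Counter(word for msg in spam_msgs for word in msg.split())
    ham_counts = Counter(word for msg in ham_msgs for word in msg.split())
    return {
        word
        for word, count in spam_counts.items()
        if count >= min_count and count > ham_counts[word]
    }


spam_words = learn_spam_words(spam, ham)
print(sorted(spam_words))  # ['claim', 'free', 'money', 'now']
```

Nobody typed the keyword list by hand: it came out of the examples. If new spam shows up with different wording, you rerun the function on the new messages instead of editing rules.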

A Simple Machine Learning Project

Machine learning models can be categorized in various ways, such as by the type of supervision (supervised, unsupervised, or reinforcement learning), their ability to learn incrementally (online vs. batch learning), and whether they rely on instance-based learning or model-based learning. In this post, we will build a simple model that falls under supervised learning, where the model is trained on labeled data to make predictions. It's a simple model, so just follow along.

Supervised Learning

Supervised learning covers several kinds of tasks:

  1. Classification. Training a model to assign each input to a class, e.g., ham or spam emails.

  2. Regression. Predicting a numeric target value from a set of given features.

  3. Logistic regression. Despite its name, this is a classification technique that outputs the probability of belonging to a class, e.g., a 20% chance of being spam.
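
To see where a figure like "20% chance of being spam" can come from, the sketch below shows the core step of logistic regression: a raw score is squashed into a probability by the sigmoid function, and the probability is then turned into a class label. The score values and the 0.5 threshold are assumptions chosen for illustration, not the output of a trained model.

```python
import math


def sigmoid(score):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score, threshold=0.5):
    """Turn a score into a (probability, label) pair."""
    probability = sigmoid(score)
    label = "spam" if probability >= threshold else "ham"
    return probability, label


# Hypothetical scores a trained model might assign to three emails.
for score in (-1.4, 0.0, 2.0):
    probability, label = classify(score)
    print(f"score={score:+.1f}  P(spam)={probability:.2f}  ->  {label}")
```

The first email gets a probability of about 0.20, so it is labeled ham. The output is a number, like in regression, but it is used to pick a class.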

The model we are going to create below is a very simple regression model that will give you the big picture of what creating machine learning models involves. It uses Python code that loads the data, separates the inputs X and the labels y, creates a scatter plot for visualization, and then trains a linear model that makes predictions.

A regression model

First, make sure you have Python set up on your machine and a code editor of your choice fired up and ready. For this demo, I am going to use VS Code. Press Ctrl+Shift+P to open the Command Palette, create a new notebook, and save it with a .ipynb extension. In our case, we are going to load the dataset from an online GitHub repo. For future models where your dataset is stored locally, make sure it is in the same directory as your .ipynb notebook file.

Installing the Libraries

pip install matplotlib numpy pandas scikit-learn
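
Once the install finishes, an optional sanity check like the sketch below can confirm that Python can find each library. The helper name missing_packages is made up for this post; note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names for the packages installed above.
# scikit-learn is imported as "sklearn".
required = ["matplotlib", "numpy", "pandas", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(required)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All libraries are installed.")
```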

Importing the Libraries

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd 
from sklearn.linear_model import LinearRegression

matplotlib.pyplot provides functions to create static, animated, and interactive visualizations (e.g., scatter plots, line charts).
numpy is the fundamental package for numerical computing in Python, used for handling arrays, mathematical operations, and linear algebra. (Today we are only going to use it for mathematical operations and array handling. No linear algebra today.)
pandas is a powerful data analysis library used for reading, writing, and manipulating structured data through DataFrames.
scikit-learn is a popular and versatile Python library for machine learning. It provides a wide range of tools and algorithms for tasks like classification, regression, clustering, and data preprocessing. Its LinearRegression class fits linear models to data for regression tasks.

Obtaining, extracting and storing the datasets


data_root = "https://github.com/ageron/data/raw/main/"
lifesat = pd.read_csv(data_root + "lifesat/lifesat.csv")
X = lifesat[["GDP per capita (USD)"]].values
y = lifesat[["Life satisfaction"]].values

Here is what the above code does:

data_root = "https://github.com/ageron/data/raw/main/"
Defines the base URL where the dataset is hosted; it's used to construct the full path to the CSV file.

lifesat = pd.read_csv(data_root + "lifesat/lifesat.csv")
Downloads the CSV file and loads it into a pandas DataFrame named lifesat. A DataFrame is a data structure, found in Python's pandas library and in other languages such as R, that organizes tabular data like a CSV file into rows and columns.

X = lifesat[["GDP per capita (USD)"]].values
Extracts the GDP per capita column as a NumPy array X, formatted as a 2D array with shape (n_samples, 1).

y = lifesat[["Life satisfaction"]].values
Extracts the life satisfaction scores as a NumPy array y, also as a 2D array for model training compatibility.
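
The double square brackets in the code above are what produce a 2D array. The sketch below shows the difference on a tiny DataFrame; the three rows of numbers are made up for illustration and are not taken from lifesat.csv.

```python
import pandas as pd

# Made-up numbers for illustration only; they are not from lifesat.csv.
toy = pd.DataFrame({
    "GDP per capita (USD)": [30_000.0, 40_000.0, 50_000.0],
    "Life satisfaction": [6.0, 6.5, 7.0],
})

X_2d = toy[["GDP per capita (USD)"]].values  # double brackets: DataFrame -> 2D array
x_1d = toy["GDP per capita (USD)"].values    # single brackets: Series -> 1D array

print(X_2d.shape)  # (3, 1)
print(x_1d.shape)  # (3,)
```

scikit-learn expects the features as a 2D array with one row per sample and one column per feature, which is why the double brackets are used for X.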

Visualizing the dataset

Why visualize the data before training? Visualizations (e.g., scatter plots, histograms) help reveal patterns, trends, outliers, and potential correlations, which guide feature selection and preprocessing.

lifesat.plot(kind='scatter', grid=True, x="GDP per capita (USD)", y="Life satisfaction")
plt.axis([23500, 62500, 4, 9])
plt.show()

lifesat.plot(...)
Creates a scatter plot from the lifesat DataFrame, plotting GDP per capita on the x-axis and Life satisfaction on the y-axis, with grid lines enabled.

plt.axis([23500, 62500, 4, 9])
Sets the range of the x-axis from 23,500 to 62,500 and the y-axis from 4 to 9 to focus on a specific region of the data.

plt.show()
Displays the plotted figure.
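
A scatter plot shows a trend visually. If you also want a number to go with the picture, one option is the Pearson correlation coefficient. The sketch below computes it with NumPy on made-up values (not the real lifesat.csv data); the variable names are assumptions for this example.

```python
import numpy as np

# Made-up values for illustration only.
gdp = np.array([25_000.0, 30_000.0, 40_000.0, 50_000.0, 60_000.0])
satisfaction = np.array([5.5, 5.9, 6.4, 7.1, 7.4])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the two variables.
r = np.corrcoef(gdp, satisfaction)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```

A value close to 1 means the points lie close to an upward-sloping straight line, which is a hint that a linear model is a reasonable first choice.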

Model selection

model = LinearRegression()
Creates an instance of LinearRegression, a class built into scikit-learn, and stores it in the variable model.

Model Training

model.fit(X, y)
This line trains the machine learning model (here, LinearRegression) by finding the parameters (here, the slope and intercept) that minimize the prediction error between the input features X and the target values y. If I lost you a little there, try to imagine your model as a student learning to draw a straight line through dots on a sheet of paper.

When you say model.fit(X, y), you're telling the student: "Look at these dots (X and y), and draw the best straight line that goes as close as possible to all of them." It learns the underlying relationship in the data to make future predictions on unseen input.
In short, fit does not pick features for you; it finds the parameter values that let the model make predictions with as little error as possible.
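
If you are curious what "finding the best line" looks like as arithmetic, here is a minimal sketch of ordinary least squares for a single feature, written with NumPy. It illustrates the idea only and is not scikit-learn's actual implementation; the four data points are made up and lie exactly on the line y = 2x + 1.

```python
import numpy as np

# Made-up points that lie exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Ordinary least squares for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

print(slope, intercept)  # 2.0 1.0
```

Because the dots already sit on a straight line, the student recovers it exactly. With real, noisy data the same formulas give the line with the smallest total squared error.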

Can Our Model Make A Prediction?

X_new = [[37_655.2]]  # Cyprus' GDP per capita in 2020
print(model.predict(X_new))  # output: [[6.30165767]]

This code tells the trained model: "If a country has a GDP per capita of 37,655.20, what life satisfaction should we expect?" The model answers: "Based on what I learned, about 6.30." It is using the line it learned to make a guess.
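
To show that predict really is just reading a value off the learned line, the sketch below trains LinearRegression on made-up data and reproduces its prediction by hand from the fitted intercept_ and coef_ attributes. The names X_toy, y_toy and toy_model are assumptions for this example, and the data lies exactly on y = 2x + 1 rather than coming from lifesat.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that lies exactly on y = 2x + 1.
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([[3.0], [5.0], [7.0], [9.0]])

toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

X_new_toy = np.array([[10.0]])
from_predict = toy_model.predict(X_new_toy)

# The same number, computed by hand from the learned line:
# prediction = intercept + slope * x
by_hand = toy_model.intercept_ + toy_model.coef_ @ X_new_toy.T

print(from_predict)  # approximately [[21.]]
print(by_hand)       # approximately [[21.]]
```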

And there you go: your first regression model. Was that difficult? So what's stopping you from creating more models?

Leave a comment with your critiques or anything else you have in mind. That's all for now, and see you in the next one.