Introduction to Probability


Unit 1 — Foundations of Probability

Probability theory is the mathematical study of uncertainty. In Data Science, uncertainty appears everywhere: in predicting user behaviour, measuring risk, forecasting demand, estimating disease spread, analysing noisy sensor data, and training machine‑learning models. Very often, we do not know the exact outcome of an experiment before it happens. Instead, we describe the likelihood of various possible outcomes using probability.

The central idea of probability theory is that every uncertain situation can be modelled in a logically consistent way. Once the situation is modelled, we can make predictions, quantify uncertainty, and support rational decision‑making.

Probability is also a way of thinking. A Data Scientist constantly reasons under uncertainty — whether evaluating the accuracy of a model, interpreting statistical output, or designing experiments. Therefore, a strong foundation in probability is one of the most valuable intellectual tools in Data Science.

In this unit, we introduce the fundamental building blocks of probability, including random experiments, outcomes, sample spaces, and events. Later units build on these ideas.


1.1 Random Experiments

An experiment is any process that produces an outcome.

A random experiment is an experiment whose outcome cannot be predicted with certainty, even if it is repeated under identical conditions. However, while the exact outcome is unknown, the set of all possible outcomes is known.

Examples of random experiments include:

  • Tossing a coin
  • Rolling a six‑sided die
  • Selecting a student at random from a class
  • Measuring the time taken for a web page to load
  • Observing whether a manufactured item is defective
  • Predicting tomorrow’s rainfall

Notice that in each case:

  1. Only one outcome happens in each trial.
  2. The outcome is uncertain before the trial.
  3. We can list all possible outcomes in advance.

These characteristics form the basis of probability modelling.
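A short simulation illustrates these three characteristics (a minimal sketch using Python's standard random module; the seed is an arbitrary choice made only for reproducibility):

```python
import random

random.seed(42)  # arbitrary seed, fixed only so the run is reproducible

sample_space = {"Heads", "Tails"}              # all possible outcomes, known in advance
outcome = random.choice(sorted(sample_space))  # unpredictable before the trial

# Exactly one outcome occurs per trial, and it always lies in the sample space
assert outcome in sample_space
print(outcome)
```

Running the script repeatedly without the fixed seed produces different outcomes, yet every outcome always belongs to the known sample space.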


1.2 Outcomes and Sample Space (S)

An outcome is a single possible result of a random experiment.

The sample space, denoted by S, is the set of all possible outcomes of the experiment.

Examples:

  • Tossing a coin → S = {Heads, Tails}
  • Rolling a die → S = {1, 2, 3, 4, 5, 6}
  • Selecting a blood group → S = {A, B, O, AB}
  • Measuring temperature → S consists of all real‑number temperature values in a realistic range

Sample spaces may be classified as follows:

1.2.1 Finite Sample Space

A finite sample space has a limited number of possible outcomes.

Example:

S = {1, 2, 3, 4, 5, 6}

Finite sample spaces frequently appear in games of chance and discrete probability models.

1.2.2 Countably Infinite Sample Space

A countably infinite sample space contains infinitely many outcomes, but they can still be listed one after another in an unending sequence (that is, matched one‑to‑one with the natural numbers).

Example:

S = {0, 1, 2, 3, 4, …}

An application is modelling the number of phone calls received at a call centre in a day.

1.2.3 Uncountable (Continuous) Sample Space

A continuous sample space contains infinitely many outcomes forming a continuum (like real numbers). You cannot list all outcomes one by one.

Example:

S = all real numbers between 0 and 100

This appears in measurement data such as:

  • Height
  • Weight
  • Temperature
  • Flight arrival delays

Continuous sample spaces are the foundation for continuous probability distributions such as the Normal distribution.


1.3 Events

An event is a subset of the sample space. It represents a collection of outcomes that share a particular property.

When a random experiment is performed, either an event occurs or it does not occur.

Example — Rolling a Die:

S = {1, 2, 3, 4, 5, 6}

Possible events:

  • A = {2, 4, 6} → event that an even number appears
  • B = {5, 6} → event that the outcome is greater than 4
  • C = {3} → event that the outcome equals 3

Events can be grouped as:

  • Simple events — containing a single outcome (e.g., {4})
  • Compound events — containing multiple outcomes (e.g., {1, 3, 5})
  • Certain event — the entire sample space S
  • Impossible event — the empty set, written as ∅

Events behave like sets, and probability theory uses the language of set theory extensively.
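Because events are subsets of the sample space, Python's built‑in set type is a convenient way to experiment with them (a minimal sketch using the die‑roll sample space from the example above):

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for rolling a die

A = {2, 4, 6}            # compound event: an even number appears
C = {3}                  # simple event: the outcome equals 3
certain = S              # certain event: the entire sample space
impossible = set()       # impossible event: the empty set ∅

# Every event is a subset of the sample space
assert A <= S and C <= S and certain <= S and impossible <= S
```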


1.4 Probability as a Measure of Uncertainty

Probability assigns a number between 0 and 1 to each event.

  • Probability 0 means the event is impossible.
  • Probability 1 means the event is certain.
  • Probability 0.5 means the event is equally likely to occur or not occur.

If we denote the probability of event A by P(A), then:

0 ≤ P(A) ≤ 1

These probabilities can be interpreted in multiple ways:

  1. Classical (theoretical) interpretation — all outcomes are equally likely.
  2. Frequentist interpretation — probability equals long‑run relative frequency.
  3. Subjective interpretation — probability expresses degree of belief.

In Data Science, all three interpretations appear depending on the context.
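The frequentist interpretation can be illustrated by simulation: the relative frequency of an event settles near its theoretical probability as the number of trials grows (a sketch; the seed and trial count are illustrative choices):

```python
import random

random.seed(0)  # fixed seed for reproducibility
trials = 100_000

# Count how often a fair die shows an even number
even_count = sum(1 for _ in range(trials) if random.randint(1, 6) % 2 == 0)

relative_frequency = even_count / trials
print(relative_frequency)  # close to the theoretical value 1/2
```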


1.5 Importance of Probability in Data Science

Data Science relies heavily on probability because real‑world data contain randomness, noise, and uncertainty.

Some key applications include:

  • Classification models predict probability of class membership
  • Logistic regression outputs probabilities rather than fixed labels
  • Naive Bayes uses conditional probabilities to classify observations
  • Bayesian learning updates prior beliefs as new data arrive
  • Hidden Markov Models model sequential probability
  • Markov Decision Processes support reinforcement learning
  • Risk models estimate probability of rare but costly events

Thus, probability is not just theoretical mathematics — it underpins nearly every analytical method used in professional practice.


Unit 2 — Relationships Among Events

Events can interact with each other in various ways. Understanding how events overlap, exclude, or depend on each other is critical because different probability rules apply depending on these relationships.

In this unit, we study:

  • Intersection of events
  • Union of events
  • Mutually exclusive events
  • Complementary events
  • Independent and dependent events
  • Conditional probability

Each idea builds logically on the ones before it, forming the grammar of probability calculus.


2.1 Intersection of Events (A ∩ B)

The intersection of two events A and B is the event containing all outcomes that belong to both A and B.

Symbolically:

A ∩ B = { outcomes that are in A and in B }

Example:

Let A = {students who passed}
Let B = {female students}

Then:

A ∩ B = {female students who passed}

Visual Description

Imagine two overlapping circles. The overlapping area represents A ∩ B — the event that satisfies both conditions simultaneously.

The intersection is key when calculating the probability that two events occur together.


2.2 Union of Events (A ∪ B)

The union of two events A and B is the event containing all outcomes that belong to A, or to B, or to both.

Symbolically:

A ∪ B = { outcomes in A or B or both }

Example:

Let A = {cars with sunroof}
Let B = {cars with diesel engines}

Then:

A ∪ B = {cars with sunroof or diesel engine or both}

In plain language, union corresponds to the logical word “OR”.


2.3 Mutually Exclusive (Disjoint) Events

Two events A and B are mutually exclusive if they cannot occur at the same time.

Formally:

A ∩ B = ∅

Example:

When rolling a die:

A = {1}
B = {6}

Only one number can appear per roll. Therefore A and B are mutually exclusive.

However, many real‑world events are not mutually exclusive. For example, the events “being left‑handed” and “wearing glasses” can occur together.


2.4 Complement of an Event (Aᶜ)

The complement of an event A, written Aᶜ, consists of all outcomes in the sample space that are not in A.

Example:

If A = {students who passed}

Then Aᶜ = {students who did not pass}

A and Aᶜ split the sample space into two non‑overlapping parts. Exactly one of them must occur when the experiment is performed.


2.5 Independent Events

Two events A and B are independent if the occurrence of one event does not change the probability of the other event.

In other words, knowing that one event occurred provides no useful information about the likelihood of the other event.

Example:

Let A = result of a coin toss
Let B = result of rolling a fair die

The outcome of the coin toss does not influence the die roll. Thus A and B are independent.

Independence is one of the most important but misunderstood ideas in probability. Many Data‑Science algorithms assume independence — for instance, Naive Bayes assumes conditional independence among features.


2.6 Dependent Events

Events are dependent when the occurrence of one event changes the probability of the other.

Example:

Drawing two cards from a deck without replacement:

Let A = first card is Ace
Let B = second card is Ace

Once the first card is drawn, the probabilities for the second card change. These events are dependent.

Dependence is common in real‑world data: time series, correlated financial returns, and behavioural data rarely satisfy independence.


2.7 Conditional Probability

Conditional probability measures the probability of an event, given that another event has already occurred. It is denoted by:

P(A | B)

which is read as “probability of A given B”.

The formal definition is:

P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0

This formula tells us that, once we know B has occurred, only outcomes inside B are relevant. Among these, we find the proportion that also lie in A.

Conditional probability is central to:

  • Bayesian inference
  • Spam filtering
  • Medical diagnosis
  • Risk assessment
  • Fraud detection
  • Recommendation systems

Whenever new information becomes available, conditional probability helps update beliefs.
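For a sample space with equally likely outcomes, the defining formula can be applied directly by counting outcomes (a sketch using the die‑roll sample space; the two events are our own illustration):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even number appears
B = {4, 5, 6}   # outcome is greater than 3

def prob(event):
    # With equally likely outcomes, P(E) = |E| / |S|
    return Fraction(len(event), len(S))

# P(A | B) = P(A ∩ B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 2/3: of the outcomes {4, 5, 6}, two are even
```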


2.8 Independent vs Dependent Events — A Formal View

Two events A and B are independent if and only if:

P(A | B) = P(A), provided P(B) > 0

This means the probability of A is the same, regardless of whether B occurs or not.

Equivalently, we can say:

P(A ∩ B) = P(A) P(B)

If either of these equalities does not hold, then the events are dependent.

This condition is essential in modelling assumptions and testing independence in data.
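The product condition can be verified exactly by counting outcomes in a combined sample space (a sketch for one coin toss followed by one die roll; the labels are our own):

```python
from fractions import Fraction
from itertools import product

# Combined sample space: 2 coin faces × 6 die faces = 12 equally likely pairs
S = set(product(["H", "T"], range(1, 7)))

A = {pair for pair in S if pair[0] == "H"}  # coin shows Heads
B = {pair for pair in S if pair[1] == 6}    # die shows 6

def prob(event):
    return Fraction(len(event), len(S))

# Independence: P(A ∩ B) = P(A) P(B)
assert prob(A & B) == prob(A) * prob(B) == Fraction(1, 12)
```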


Unit 3 — Rules of Probability

Probability theory is governed by several fundamental rules. These rules ensure that probabilities remain consistent and logically coherent.

Understanding these rules allows us to compute probabilities in complex situations by breaking them into simpler parts.

The most important rules are:

  1. The Complement Rule
  2. The Addition Rule (general form)
  3. The Addition Rule for mutually exclusive events
  4. The Multiplication Rule (general form)
  5. The Multiplication Rule for independent events

3.1 Complement Rule

For any event A:

P(Aᶜ) = 1 − P(A)

This rule follows from the fact that A and its complement Aᶜ together cover the entire sample space and do not overlap.

Therefore, their probabilities must sum to 1.

This rule is especially useful when calculating:

  • Probability of “at least one success”
  • Probability of “no defect”
  • Reliability of systems

Often it is easier to compute the probability of the complement and subtract from 1.
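As an illustration of the "at least one" pattern, consider the probability of seeing at least one six in four rolls of a fair die (our own example; the complement, "no six in any roll", is much easier to compute):

```python
# Complement: the event "no six" has probability 5/6 on each roll,
# and the four rolls are independent
p_no_six_in_four_rolls = (5 / 6) ** 4

# Complement rule: P(at least one six) = 1 - P(no six at all)
p_at_least_one_six = 1 - p_no_six_in_four_rolls
print(round(p_at_least_one_six, 4))  # 0.5177
```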


3.2 Addition Rule — General Case

The probability that either A or B (or both) occurs is given by:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Why subtract the intersection?

Because when adding P(A) and P(B), we count the overlapping outcomes twice. Subtracting P(A ∩ B) corrects this double‑counting.

This rule works for all events — whether independent, dependent, or overlapping.
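The double‑counting correction is easy to verify by direct counting for a die roll (a sketch; the two events are our own illustration):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even number
B = {5, 6}      # greater than 4

def prob(event):
    return Fraction(len(event), len(S))

# The outcome 6 lies in both events and must not be counted twice
lhs = prob(A | B)                      # direct count over {2, 4, 5, 6}
rhs = prob(A) + prob(B) - prob(A & B)  # Addition Rule
assert lhs == rhs == Fraction(2, 3)
```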


3.3 Addition Rule for Mutually Exclusive Events

If events A and B are mutually exclusive, then:

P(A ∩ B) = 0

Therefore, the Addition Rule simplifies to:

P(A ∪ B) = P(A) + P(B)

This is intuitive because mutually exclusive events cannot happen together.


3.4 Multiplication Rule — General Case

For any events A and B:

P(A ∩ B) = P(A | B) P(B)

This means:

Probability (A and B) = Probability (B occurs) × Probability (A occurs, given that B has occurred)

This is extremely important when analysing sequential processes such as:

  • Drawing balls from a bag
  • Customer purchase journeys
  • Multi‑stage manufacturing processes
  • Disease progression models
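A classic sequential example is drawing two balls from a bag without replacement (our own illustrative numbers: 3 red and 2 blue balls):

```python
from fractions import Fraction

# Bag with 3 red and 2 blue balls; draw twice without replacement
p_first_red = Fraction(3, 5)
# After a red ball is removed, 2 red balls remain among 4
p_second_red_given_first_red = Fraction(2, 4)

# Multiplication Rule: P(both red) = P(first red) × P(second red | first red)
p_both_red = p_first_red * p_second_red_given_first_red
print(p_both_red)  # 3/10
```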

3.5 Multiplication Rule for Independent Events

If A and B are independent, then:

P(A ∩ B) = P(A) P(B)

This simplifies calculation greatly. But independence must be justified — never assumed casually.


Unit 4 — Worked Examples (Step‑by‑Step)

Worked examples help convert theory into understanding. In this unit, we solve a range of problems, moving from basic to applied scenarios relevant to Data Science.


Example 1 — Tossing a Coin and Rolling a Die

Let A = event that the coin shows Heads.
Let B = event that the die shows 6.

Since the experiments are independent:

P(A) = 1/2
P(B) = 1/6

Therefore:

P(A ∩ B) = (1/2)(1/6) = 1/12

This means that in the long run, about one in twelve joint trials will produce Heads and a 6.
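A simulation agrees with this calculation: repeating the joint experiment many times, the fraction of trials showing both Heads and a 6 settles near 1/12 ≈ 0.083 (a sketch; seed and trial count are illustrative choices):

```python
import random

random.seed(1)  # fixed seed for reproducibility
trials = 120_000

hits = sum(
    1
    for _ in range(trials)
    if random.choice(["H", "T"]) == "H" and random.randint(1, 6) == 6
)

estimate = hits / trials
print(estimate)  # close to 1/12 ≈ 0.0833
```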


Example 2 — Website Conversion Rate

Suppose the probability that a website visitor clicks on an advertisement is 0.04. If 10,000 visitors access the website, the expected number of clicks is:

Expected clicks = 0.04 × 10,000 = 400

This interpretation of probability as long‑run relative frequency is widely used in digital marketing analytics.


Example 3 — Medical Testing and False Positives

Suppose:

  • 5% of people in a population have a particular disease
  • A test detects the disease correctly 95% of the time
  • The test gives a false positive 2% of the time for healthy individuals

Questions like:

“What is the probability that a person actually has the disease given that the test is positive?”

require conditional probability and Bayesian reasoning — a topic you will study in depth later.
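As a preview of that later topic, the answer can be computed with Bayes' theorem using the numbers above (a sketch; we assume "detects correctly 95% of the time" means P(positive | disease) = 0.95):

```python
prior = 0.05           # P(disease)
sensitivity = 0.95     # P(positive | disease)
false_positive = 0.02  # P(positive | no disease)

# Total probability of testing positive (diseased or healthy)
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: P(disease | positive)
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # ≈ 0.714
```

Even with a fairly accurate test, only about 71% of positive results correspond to actual disease, because the disease is rare in the population.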


Example 4 — Dependent Events: Drawing Cards

A standard deck has 52 cards.

Let A = first card drawn is an Ace.
Let B = second card drawn is an Ace (without replacement).

P(A) = 4/52 = 1/13

After one Ace is removed, only 3 Aces remain out of 51 cards. Thus:

P(B | A) = 3/51

Therefore:

P(A ∩ B) = P(A) P(B | A) = (1/13)(3/51) = 1/221

This is a case of dependent events.
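The same calculation can be carried out in code using exact fractions (a minimal sketch):

```python
from fractions import Fraction

p_first_ace = Fraction(4, 52)               # P(A)
p_second_ace_given_first = Fraction(3, 51)  # P(B | A)

# Multiplication Rule for dependent events
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221
```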


Unit 5 — Visualising Event Relationships

Although diagrams are not reproduced in this text, you should learn to picture event relationships visually.

Intersection (A ∩ B)

Visualise two overlapping circles. The overlapping region is shaded.

Union (A ∪ B)

Both circles are shaded completely, including overlap.

Complement (Aᶜ)

The space outside the event circle is shaded.

Mutually Exclusive Events

Two circles appear completely separate, with no overlap.

Being able to visualise events will greatly assist with intuition.


Unit 6 — Summary Tables and Comparisons

Table 1 — Types of Events and Their Meanings

Event Type — Meaning — Example

Mutually Exclusive — Cannot occur together — Rolling a 1 and rolling a 6 in one throw

Independent — One does not influence the other — Coin toss and die roll

Dependent — One influences the probability of the other — Drawing cards without replacement

Exhaustive — Together they cover the whole sample space — All outcomes of a die roll


Unit 7 — Proof Sketches of Key Probability Rules

Mathematical proof gives confidence that results are always true — not just observed by coincidence.


7.1 Proof of Complement Rule

We know that:

S = A ∪ Aᶜ

Also, A and Aᶜ are mutually exclusive.

Therefore:

P(S) = P(A) + P(Aᶜ)

But P(S) = 1 because the sample space always occurs.

So:

1 = P(A) + P(Aᶜ)

Rearranging gives the complement rule:

P(Aᶜ) = 1 − P(A)


7.2 Proof of Addition Rule

If we simply add P(A) and P(B), the outcomes in the intersection A ∩ B are counted twice. Therefore we subtract P(A ∩ B) once to correct this.

Thus:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

This proof relies on set‑theoretic logic.



Unit 8 — Multiple Choice Questions (MCQs)

  1. If A and B are mutually exclusive, then:

a) P(A ∩ B) = P(A) + P(B)

b) P(A ∩ B) = 0

c) P(A ∩ B) = 1

d) P(A ∩ B) = P(A)P(B)

Correct answer: b


  2. If P(A) = 0.4, then P(Aᶜ) equals:

a) 0.4

b) 0.6

c) 1.4

d) 0.2

Correct answer: b


  3. Conditional probability is written as:

a) P(A ∩ B)

b) P(A | B)

c) P(A ∪ B)

d) P(B | Aᶜ)

Correct answer: b


  4. Two events are independent if:

a) P(A | B) = P(A)

b) P(A ∩ B) = 0

c) P(A) = 1

d) P(B) = 0

Correct answer: a


  5. The sum of probabilities of all outcomes in the sample space equals:

a) 0

b) 1

c) 0.5

d) Depends on the experiment

Correct answer: b
