| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
Math 183 • Statistical Methods • Spring 2026
Observation
An individual unit from which data are collected.
Variable
A characteristic for which different observations can take on different values.
Constant
A characteristic that takes the same value for every observation.
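To make the three definitions concrete, here is a minimal Python sketch using the six iris rows from the table above. Each dictionary is one observation, each key is a variable, and the helper `constant_columns` (a hypothetical name introduced for illustration) reports the characteristics that never change within these rows.

```python
# The six iris rows shown above: one dict per observation, one key per variable.
rows = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 4.9, "Sepal.Width": 3.0, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 4.7, "Sepal.Width": 3.2, "Petal.Length": 1.3,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 4.6, "Sepal.Width": 3.1, "Petal.Length": 1.5,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 5.0, "Sepal.Width": 3.6, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 5.4, "Sepal.Width": 3.9, "Petal.Length": 1.7,
     "Petal.Width": 0.4, "Species": "setosa"},
]


def constant_columns(observations):
    """Return the names of columns whose value is identical in every row."""
    names = observations[0].keys()
    return [name for name in names
            if len({obs[name] for obs in observations}) == 1]


n_observations = len(rows)            # number of observations (rows)
variable_names = list(rows[0].keys()) # the measured characteristics (columns)
constants = constant_columns(rows)    # characteristics that never change here

print(n_observations, variable_names, constants)
```

Within these six rows only `Species` is constant. Whether a characteristic counts as a constant depends on the set of observations being considered, so the same column may vary in a larger table.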
A research team recruits 100 adults aged 25 to 40 to participate in a 6-month study. Participants log their daily coffee intake and wear sleep trackers at night to record their sleep quality.
150 office workers are surveyed over a 3-month period, during which they report their weekly exercise routines and undergo monthly stress tests.
200 households in a city participate in a year-long study in which their usage of cooking oil is recorded monthly. Additionally, all adult members undergo quarterly heart health check-ups.

Warning
Sometimes computers can’t (or won’t) understand the difference between different types of variables. It’s up to us to tell them!
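One way this can go wrong is sketched below in Python. The survey responses and the code-to-label mapping are hypothetical, invented for illustration. Categories stored as integer codes look like numbers to the computer, so it will average them without complaint unless we say they are labels.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses stored as integer codes (an assumption for
# illustration): 1 = tea, 2 = coffee, 3 = juice.
BEVERAGE_LABELS = {1: "tea", 2: "coffee", 3: "juice"}
responses = [1, 3, 3, 2, 1, 3]

# The computer treats the codes as quantities and averages them, but an
# "average beverage" of about 2.17 has no meaning.
naive_average = mean(responses)

# Telling the program that the variable is categorical: decode, then count.
decoded = [BEVERAGE_LABELS[code] for code in responses]
counts = Counter(decoded)

print(naive_average)
print(counts)
```

Counting the decoded labels gives a summary that makes sense for a categorical variable, while the average of the codes does not.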
Beverage Preference
A survey in a school asks students their preferred beverage among tea, coffee, or juice.
Daily Screen Time
A study measures the daily screen time in hours of 100 individuals.
Types of Pets Owned
A neighborhood survey asks households about the types of pets they own.
Monthly Savings
Individuals are asked about their monthly savings in dollars.
Work Commute Method
A city survey asks residents about their preferred method of commuting to work.
Number of Books Read
A library conducts a survey asking individuals about the number of books they read in a month.
Favorite Music Genre
A radio station surveys its listeners to know their favorite music genre.
Weekly Exercise Hours
A health app collects data on the weekly exercise hours logged by its users.
Example
A local gym aims to find the most effective workout routine for weight loss. They create a 3-month program where participants are divided into two groups. One group follows a cardio-centric routine, while the other engages in strength training. Participants’ weights are recorded at the start and end of the program. The gym seeks to understand which workout type leads to greater weight loss, to offer better guidance to its members.
Exercise Type & Weight Loss
A fitness center compares the effectiveness of two workout routines, HIIT and yoga, for weight loss. The type of exercise is the explanatory variable, while the amount of weight loss is the response variable.
Teaching Methods & Student Performance
An educator evaluates two teaching methods to understand which one enhances student performance. The teaching method is the explanatory variable, and the students’ performance is the response variable.
Diet Type & Energy Levels
A nutritionist compares vegetarian and non-vegetarian diets to assess their impact on energy levels. The diet type is the explanatory variable, while the energy level is the response variable.
Medication Dosage & Recovery Time
In a clinical trial, different dosages of a medication are administered to patients to observe the effects on recovery time. The dosage is the explanatory variable, and the recovery time is the response variable.
Sleep Hours & Productivity
A company explores the relationship between hours slept and next-day productivity among its employees. The number of hours slept is the explanatory variable, and productivity is the response variable.
| school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | reason | guardian | traveltime | studytime | failures | schoolsup | famsup | paid | activities | nursery | higher | internet | romantic | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | course | mother | 2 | 2 | 0 | yes | no | no | no | yes | yes | no | no | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | course | father | 1 | 2 | 0 | no | yes | no | no | no | yes | yes | no | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | other | mother | 1 | 2 | 3 | yes | no | yes | no | yes | yes | yes | no | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | home | mother | 1 | 3 | 0 | no | yes | yes | yes | yes | yes | yes | yes | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | home | father | 1 | 2 | 0 | no | yes | yes | no | yes | yes | no | no | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |
| GP | M | 16 | U | LE3 | T | 4 | 3 | services | other | reputation | mother | 1 | 2 | 0 | no | yes | yes | yes | yes | yes | yes | no | 5 | 4 | 2 | 1 | 2 | 5 | 10 | 15 | 15 | 15 |
| sex | mean grade |
|---|---|
| F | 30.98 |
| M | 33.22 |
\[ 33.22 - 30.98 = \color{green}{2.24} \]
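A comparison like this can be computed by grouping observations and averaging within each group. The Python sketch below uses only the six student rows shown above and assumes that "grade" means the total G1 + G2 + G3; that definition is a guess, not something the table states. With six rows it will not reproduce 30.98 and 33.22, which presumably come from the full dataset.

```python
from collections import defaultdict

# (sex, G1, G2, G3) for the six student rows shown above.
students = [
    ("F", 5, 6, 6),
    ("F", 5, 5, 6),
    ("F", 7, 8, 10),
    ("F", 15, 14, 15),
    ("F", 6, 10, 10),
    ("M", 15, 15, 15),
]


def group_means(records):
    """Mean of G1 + G2 + G3 within each sex.

    Using the total as the 'grade' is an assumption made for this sketch.
    """
    totals = defaultdict(list)
    for sex, g1, g2, g3 in records:
        totals[sex].append(g1 + g2 + g3)
    return {sex: sum(vals) / len(vals) for sex, vals in totals.items()}


means = group_means(students)
difference = means["M"] - means["F"]  # observed effect: M mean minus F mean

print(means, difference)
```

Here sex plays the role of the explanatory variable and the grade total is the response.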
Vaccine efficacy. A vaccine trial is conducted as follows:
| | 💉✅ | 💉❌ |
|---|---|---|
| COVID 🙁 | \(n_1\) | \(n_2\) |
| No COVID 🙂 | \(m_1\) | \(m_2\) |
\[ \begin{aligned} \%[🤒+✅💉] = \frac{n_1}{n_1 + m_1} \quad\text{ and }\quad \%[🤒+❌💉] = \frac{n_2}{n_2 + m_2} \end{aligned} \]
\[
\text{Efficacy} = 1 - \frac{\%[🤒+✅💉]}{\%[🤒+❌💉]}
\]
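The two formulas above translate directly into code. In this Python sketch the function names `attack_rate` and `efficacy` are chosen for illustration, and the counts in the usage lines are hypothetical.

```python
def attack_rate(cases, non_cases):
    """Fraction of a group that got COVID: cases / (cases + non_cases)."""
    return cases / (cases + non_cases)


def efficacy(n1, m1, n2, m2):
    """Vaccine efficacy from the 2x2 table.

    n1, m1: vaccinated participants with / without COVID.
    n2, m2: unvaccinated participants with / without COVID.
    """
    return 1 - attack_rate(n1, m1) / attack_rate(n2, m2)


# Hypothetical counts: 10 of 100 vaccinated and 30 of 100 unvaccinated got sick.
example = efficacy(10, 90, 30, 70)
print(round(example, 4))  # 1 - 0.10 / 0.30
```

An efficacy of 0 means the vaccinated group fared no better than the unvaccinated group, and an efficacy of 1 means no vaccinated participant got sick.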
Consider the following* examples from a vaccine trial. Which is more reliable?
| | 💉✅ | 💉❌ |
|---|---|---|
| COVID 🙁 | 5 | 1200 |
| No COVID 🙂 | 45 | 4800 |
Efficacy: \[
E = 1 - \frac{10\%}{20\%} = 50\%
\]
| | 💉✅ | 💉❌ |
|---|---|---|
| COVID 🙁 | 500 | 1200 |
| No COVID 🙂 | 4500 | 4800 |
Efficacy: \[
E = 1 - \frac{10\%}{20\%} = 50\%
\]
* hypothetical
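One informal way to probe reliability is to ask how much the estimate moves if a single vaccinated participant had a different outcome. The Python sketch below does this for the two hypothetical tables. It is a sensitivity check for intuition only, not a formal test.

```python
def efficacy(n1, m1, n2, m2):
    """1 - (vaccinated attack rate) / (unvaccinated attack rate)."""
    vaccinated_rate = n1 / (n1 + m1)
    unvaccinated_rate = n2 / (n2 + m2)
    return 1 - vaccinated_rate / unvaccinated_rate


# The two hypothetical trials from the tables above.
small_trial = efficacy(5, 45, 1200, 4800)
large_trial = efficacy(500, 4500, 1200, 4800)

# Suppose one more vaccinated participant had caught COVID in each trial.
small_shifted = efficacy(6, 44, 1200, 4800)
large_shifted = efficacy(501, 4499, 1200, 4800)

print(small_trial, small_shifted)  # 50% drops to about 40%
print(large_trial, large_shifted)  # 50% drops to about 49.9%
```

A single changed outcome moves the small trial's estimate by about ten percentage points but barely affects the large trial, which suggests why the second table is the more trustworthy of the two.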
Test of Significance
A principled statistical approach to examining whether an observed effect truly exists or is merely an artefact of random chance.
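As one illustration of the idea, the Python sketch below runs a permutation test on a difference in group means. This is only one of many possible tests of significance, and nothing here says it is the one used in this course. The data are hypothetical, and the function name `permutation_p_value` is made up for the example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often random relabelling gives a difference in means
    at least as large as the one observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical scores for two groups.
group_a = [30, 32, 35, 31, 33, 34]
group_b = [29, 30, 28, 31, 27, 30]
p_value = permutation_p_value(group_a, group_b)
print(p_value)
```

A small value means that chance relabelling rarely produces a gap as large as the observed one, which is evidence that the effect is not just noise.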
Data Summarization
Data summarization is the process of condensing large amounts of data into smaller, more informative representations that capture the essential information of the dataset. This is crucial for making the data more understandable, interpretable, and manageable.
Here are some key methods:
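| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |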
Histogram shapes (figure panels): Flat, Right Skewed, Symmetric, Left Skewed
Mean
The mean of a set of observations \(x_1, x_2, \dots, x_n\) of a quantitative variable is given by
\[ \bar{x} = \frac 1n \sum_{i=1}^n x_i = \frac{x_1 + x_2 + \dots + x_n}{n} \]
The mean of \(\{1, 2, 3, 4, 10, x\}\) is \(3.333\). What is \(x\)?
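Questions of this kind can be answered by rearranging the definition: if the mean of \(n\) values is \(\bar{x}\), their sum must be \(n\bar{x}\). The Python sketch below applies that to a different, hypothetical set of numbers. The helper name `missing_value_for_mean` is invented for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def missing_value_for_mean(known, target_mean):
    """Solve (sum(known) + x) / n = target_mean for x, where n = len(known) + 1."""
    n = len(known) + 1
    return n * target_mean - sum(known)


# Hypothetical example: the mean of {2, 4, 6, x} is 5.
x = missing_value_for_mean([2, 4, 6], 5)
print(x)                      # 8
print(mean([2, 4, 6, x]))     # 5.0
```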
Median
Let the data points be \(x_1, x_2, \ldots, x_n\) arranged in non-decreasing order, i.e., \(x_i \le x_{i+1}\) for all \(i\). Then the median, \(M\), is:
\[ M = \begin{cases} x_{\frac{n+1}{2}} & \text{if $n$ is odd}\\ \\ \frac{x_{\frac{n}{2}} + x_{(\frac{n}{2} + 1)}}{2} & \text{if $n$ is even} \end{cases} \]
The median of \(\{1, 2, 3, 4, 10, x\}\) is \(2.5\). What is \(x\)?
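The piecewise definition carries over to code once the 1-based subscripts are shifted to Python's 0-based indexing. This is a minimal sketch; the example lists are hypothetical.

```python
def median(values):
    """Median following the piecewise definition above."""
    xs = sorted(values)  # arrange in non-decreasing order
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: x_{(n+1)/2} in 1-based notation is xs[mid] in 0-based indexing.
        return xs[mid]
    # n even: average of x_{n/2} and x_{n/2 + 1}, i.e. xs[mid - 1] and xs[mid].
    return (xs[mid - 1] + xs[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```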
Mode
The mode of data points \(x_1, x_2, \ldots, x_n\) is the value that appears most frequently.
The mode of \(\{1, 2, 3, 4, 10, x\}\) is \(1\). What is \(x\)?
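A mode can be found by counting occurrences. The Python sketch below returns every value tied for the highest count, since a data set can have more than one mode. That tie-handling choice and the example lists are assumptions of this sketch.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most frequently, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```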
Which of the following histograms exhibits more variability?


Variance, as the name suggests, is a measure of this variability.
What effect does variance have on our perception of the data?
Variance
The variance of a set of observations \(x_1, x_2, \dots, x_n\) of a quantitative variable is given by
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^n({x_i - \bar{x}})^2 \]
The standard deviation, \(s\), is the square root of the variance.
The standard deviation of \(\{1, 2, 3, 4, 10, x\}\) is \(1\). What is \(x\)?
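The formula can be checked numerically. This Python sketch applies it to the six `mpg` values from the car table above; note the divisor \(n - 1\) rather than \(n\). The function names are chosen for illustration.

```python
import math


def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


# The mpg column of the six cars listed earlier.
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1]
print(round(sample_variance(mpg), 2))  # 3.12
print(round(sample_sd(mpg), 3))        # 1.766
```

The mean of these six values is 20.5 and the squared deviations sum to 15.6, so the variance is 15.6 / 5 = 3.12.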
Lower Quantile
For \(0 \le \alpha \le 1\), the lower \(\alpha\)-quantile of \(x_1, x_2, \ldots, x_n\) is the smallest value such that at least a fraction \(\alpha\) of the points are less than or equal to it.
\(Q_{0.1}\) quantile
\(Q_{0.6}\) quantile
\(Q_{0.95}\) quantile
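A lower quantile can be found by walking through the sorted data until the required fraction has been covered. This Python sketch returns the smallest data value that satisfies the definition. Software packages use several other conventions, so treat this one as an assumption. The data set is hypothetical.

```python
def lower_quantile(values, alpha):
    """Smallest data value v such that at least a fraction alpha
    of the points are less than or equal to v."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("quantile of an empty data set is undefined")
    for k, v in enumerate(xs, start=1):
        # At least k of the n points are <= v once we reach position k.
        if k / n >= alpha:
            return v


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical data
print(lower_quantile(data, 0.1))   # 1
print(lower_quantile(data, 0.6))   # 6
print(lower_quantile(data, 0.95))  # 10
```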
Upper Quantile
For \(0 \le \alpha \le 1\), the upper \(\alpha\)-quantile of \(x_1, x_2, \ldots, x_n\) is the largest value such that at least a fraction \(\alpha\) of the points are greater than or equal to it.
\(q_{0.05}\) quantile
\(q_{0.1}\) quantile
\(q_{0.95}\) quantile
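The upper quantile mirrors the lower one: scan the data from the largest value downward. This Python sketch returns the largest data value that satisfies the definition, which is one possible convention and an assumption here. The data set is hypothetical.

```python
def upper_quantile(values, alpha):
    """Largest data value v such that at least a fraction alpha
    of the points are greater than or equal to v."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    xs = sorted(values, reverse=True)  # largest first
    n = len(xs)
    if n == 0:
        raise ValueError("quantile of an empty data set is undefined")
    for k, v in enumerate(xs, start=1):
        # At least k of the n points are >= v once we reach position k.
        if k / n >= alpha:
            return v


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical data
print(upper_quantile(data, 0.05))  # 10
print(upper_quantile(data, 0.1))   # 10
print(upper_quantile(data, 0.95))  # 1
```

For a small \(\alpha\) the upper quantile sits in the right tail of the data, while for \(\alpha\) near 1 it sits in the left tail.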