Day 3: Machine Learning Essential Statistics & Math + Project: Advanced Statistical Analyzer
Self-Intro
Hi Everyone,
Welcome back to Day 3 of our Machine Learning series!
I’m Rohan Sai, aka AiKnight.
A Joke to Lighten the Mood
Why was the statistician good at baseball?
Because he had the best mean swing! 😄
Today, we’ll cover Essential Statistics & Math—the backbone of Machine Learning.
For hands-on exploration, try my Advanced Statistical Analysis Tool. It's a one-stop app for analyzing and visualizing data effortlessly.
Without any delay, let's dive into it!
Essential Statistics for Machine Learning
Statistics is the backbone of machine learning. It provides tools to understand, interpret, and manipulate data, enabling effective model building and evaluation.
1. Basic Concepts in Statistics
1.1. Descriptive Statistics
- Definition: Descriptive statistics summarize and describe the main features of a dataset.
- Types:
Measures of Central Tendency:
- Mean: The average of all data points.
- Formula: $ \text{Mean} = \frac{\sum X_i}{N} $
- Example: For data $[1, 2, 3, 4, 5]$, Mean = $ \frac{1+2+3+4+5}{5} = 3 $.
- Code:
```python
import numpy as np

data = [1, 2, 3, 4, 5]
mean = np.mean(data)
print(f"Mean: {mean}")
```
- Median: The middle value in a sorted dataset.
- Procedure:
- Sort the data.
- Find the middle value (or average of two middle values for even-sized data).
- Code:
```python
median = np.median(data)
print(f"Median: {median}")
```
- Mode: The most frequent value in the dataset.
- Code:
```python
from scipy import stats

# keepdims=False (SciPy >= 1.9) returns scalar mode and count;
# every value here appears once, so the smallest value is reported
mode_result = stats.mode(data, keepdims=False)
print(f"Mode: {mode_result.mode}, Frequency: {mode_result.count}")
```
Measures of Dispersion:
- Variance: Average squared deviation from the mean.
- Formula: $ \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} $
- Code:
```python
variance = np.var(data)
print(f"Variance: {variance}")
```
- Standard Deviation: Square root of variance.
- Formula: $ \sigma = \sqrt{\text{Variance}} $
- Code:
```python
std_dev = np.std(data)
print(f"Standard Deviation: {std_dev}")
```
- Range: Difference between the maximum and minimum values.
- Formula: $ \text{Range} = \text{Max} - \text{Min} $
- Code:
```python
data_range = max(data) - min(data)
print(f"Range: {data_range}")
```
1.2. Inferential Statistics
- Definition: Draw conclusions about a population based on sample data.
- Key Concepts:
- Population vs Sample:
- Population: Entire group being studied.
- Sample: Subset of the population.
- Hypothesis Testing:
- Null Hypothesis ($H_0$): Assumes no effect or relationship.
- Alternative Hypothesis ($H_a$): Assumes an effect or relationship.
- Code Example (t-test):
```python
from scipy.stats import ttest_1samp

sample = [5, 6, 7, 8, 9]
t_stat, p_value = ttest_1samp(sample, popmean=7)
# A p-value below 0.05 would reject H0 (that the population mean is 7)
print(f"T-statistic: {t_stat}, P-value: {p_value}")
```
1.3. Probability
- Definition: Measure of the likelihood of an event occurring.
- Key Formulas:
- $ P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total outcomes}} $
- Example:
- Tossing a coin: $ P(\text{Heads}) = \frac{1}{2} $.
- Code:
```python
from random import choice

outcomes = ['Heads', 'Tails']
print(f"Random Coin Toss: {choice(outcomes)}")
```
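To see the formula in action, here is a small sketch (not part of the original tool) that estimates $P(\text{Heads})$ empirically; by the law of large numbers, the estimate approaches 0.5 as the number of tosses grows (10,000 is an arbitrary choice):

```python
from random import choice

# Estimate P(Heads) by simulating many tosses
outcomes = ['Heads', 'Tails']
tosses = [choice(outcomes) for _ in range(10_000)]
p_heads = tosses.count('Heads') / len(tosses)
print(f"Empirical P(Heads): {p_heads}")  # close to 0.5
```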
2. Advanced Topics in Statistics
2.1. Correlation and Causation
- Correlation: Measures the strength of the relationship between two variables.
- Pearson Correlation Coefficient ($r$):
- Formula: $ r = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} $
- Values range from -1 to +1.
- Code:
```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r, p = pearsonr(x, y)
print(f"Pearson Correlation: {r}, P-value: {p}")
```
- Causation: Indicates a cause-and-effect relationship. Remember that correlation alone does not establish causation.
2.2. Distributions
- Normal Distribution:
- Symmetric bell-shaped curve.
- Code:
```python
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(0, 1, 1000)
plt.hist(data, bins=30, density=True)
plt.title("Normal Distribution")
plt.show()
```
- Other Distributions:
- Binomial, Poisson, Exponential, etc. (sampled in the sketch below).
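As a quick sketch (the parameters here are arbitrary, chosen only for illustration), NumPy can draw samples from each of these:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility

binomial = rng.binomial(n=10, p=0.5, size=5)      # successes in 10 coin flips
poisson = rng.poisson(lam=3, size=5)              # event counts, mean rate 3
exponential = rng.exponential(scale=1.0, size=5)  # waiting times, mean 1

print(f"Binomial: {binomial}")
print(f"Poisson: {poisson}")
print(f"Exponential: {exponential}")
```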
2.3. Statistical Tests
- Chi-Square Test:
- Tests independence between categorical variables.
- Code:
```python
from scipy.stats import chi2_contingency

table = [[10, 20, 30], [6, 9, 17]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"Chi-Square: {chi2}, P-value: {p}")
```
3. Applications in Machine Learning
- Why Statistics?
- Understand data distributions.
- Perform feature selection.
- Evaluate model performance.
- Example:
- Calculating correlation to check multicollinearity (see the sketch after this list).
- Using hypothesis testing for A/B testing.
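For instance, here is a minimal multicollinearity check with pandas (the feature names and data are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical features: feature_b is almost a linear copy of feature_a
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({
    "feature_a": a,
    "feature_b": 2 * a + rng.normal(scale=0.1, size=100),
    "feature_c": rng.normal(size=100),
})

# Off-diagonal values with |r| close to 1 flag multicollinear pairs
print(df.corr())
```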
4. Benefits and Demerits
Benefits:
- Provides insights into data patterns.
- Facilitates model evaluation and validation.
Demerits:
- Can be computationally intensive for large datasets.
- Misinterpretation of results can lead to incorrect conclusions.
Essential Math
Mathematics forms the foundation of machine learning. It helps understand how models work, optimize their parameters, and interpret results effectively.
1. Linear Algebra
Linear Algebra is central to data representation and manipulation in machine learning.
1.1. Vectors
- Definition: A vector is a one-dimensional array of numbers.
- Notation: $\mathbf{v} = [v_1, v_2, \dots, v_n]$
- Operations:
- Addition: $\mathbf{u} + \mathbf{v} = [u_1 + v_1, u_2 + v_2, \dots]$
- Scalar Multiplication: $c \cdot \mathbf{v} = [c \cdot v_1, c \cdot v_2, \dots]$
- Dot Product: $\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^n u_i v_i$
- Example:
```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])
dot_product = np.dot(u, v)
print(f"Dot Product: {dot_product}")
```
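The other vector operations above map onto NumPy just as directly (a quick sketch):

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])
print(f"Addition: {u + v}")               # element-wise: [4 6]
print(f"Scalar Multiplication: {3 * u}")  # [3 6]
```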
1.2. Matrices
- Definition: A matrix is a two-dimensional array of numbers.
- Notation: $\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$
- Operations:
- Addition: $\mathbf{A} + \mathbf{B}$
- Scalar Multiplication: $c \cdot \mathbf{A}$
- Matrix Multiplication: $\mathbf{A} \cdot \mathbf{B}$
- Example:
```python
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
product = np.dot(A, B)
print(f"Matrix Product:\n{product}")
```
1.3. Determinants and Inverses
- Determinant: A scalar value summarizing a square matrix's properties; a matrix is invertible only if its determinant is nonzero.
- Formula: $|\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21}$
- Example:
```python
from numpy.linalg import det

determinant = det(A)
print(f"Determinant: {determinant}")
```
- Inverse: $\mathbf{A}^{-1}$ satisfies $\mathbf{A} \cdot \mathbf{A}^{-1} = \mathbf{I}$
- Example:
```python
from numpy.linalg import inv

inverse = inv(A)
print(f"Inverse Matrix:\n{inverse}")
```
2. Calculus
Calculus is essential for optimization in machine learning.
2.1. Derivatives
- Definition: A derivative measures the rate of change of a function.
- Notation: $f'(x) = \frac{df(x)}{dx}$
- Rules:
- Power Rule: $\frac{d}{dx}[x^n] = n \cdot x^{n-1}$
- Chain Rule: $\frac{d}{dx}[f(g(x))] = f'(g(x)) \cdot g'(x)$
- Example:
```python
from sympy import symbols, diff

x = symbols('x')
f = x**2 + 3*x + 2
derivative = diff(f, x)
print(f"Derivative: {derivative}")
```
2.2. Gradients
- Definition: The gradient is a vector of partial derivatives.
- Usage: Gradients are used in optimization (e.g., gradient descent).
- Code:
```python
from sympy import Matrix, symbols, diff

x, y, z = symbols('x y z')
f = x**2 + y**2 + z**2
gradient = Matrix([diff(f, var) for var in (x, y, z)])
print(f"Gradient: {gradient}")
```
2.3. Optimization
- Gradient Descent:
- Update rule: $\theta := \theta - \alpha \cdot \nabla J(\theta)$
- Code:
```python
def gradient_descent(x, grad, lr=0.01, epochs=100):
    # Step opposite the gradient to minimize the objective
    for _ in range(epochs):
        x = x - lr * grad(x)
    return x
```
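As a toy usage example (not from the original post), minimizing $f(x) = (x - 3)^2$, whose gradient is $2(x - 3)$:

```python
# The minimum of f(x) = (x - 3)^2 is at x = 3
minimum = gradient_descent(x=0.0, grad=lambda x: 2 * (x - 3), lr=0.1, epochs=100)
print(f"x after descent: {minimum:.4f}")  # approaches 3.0
```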
3. Probability and Statistics
Understanding uncertainty and distributions is key in ML.
3.1. Probability Distributions
- Normal Distribution:
- Formula: $P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
- Code:
```python
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(0, 1, 1000)
plt.hist(data, bins=30, density=True)
plt.show()
```
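To tie the histogram back to the formula above, one can overlay the analytic density using scipy.stats.norm (a small sketch):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

data = np.random.normal(0, 1, 1000)
xs = np.linspace(-4, 4, 200)
plt.hist(data, bins=30, density=True)
plt.plot(xs, norm.pdf(xs, loc=0, scale=1))  # P(x) with mu=0, sigma=1
plt.show()
```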
4. Advanced Topics
4.1. Eigenvalues and Eigenvectors
- Definition: For $\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$, $\lambda$ is an eigenvalue, and $\mathbf{v}$ is an eigenvector.
- Code:
```python
from numpy.linalg import eig

eigenvalues, eigenvectors = eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")
```
4.2. Singular Value Decomposition (SVD)
- Definition: Factorizes a matrix into three matrices: $\mathbf{A} = \mathbf{U} \cdot \mathbf{\Sigma} \cdot \mathbf{V}^T$
- Code:
```python
from numpy.linalg import svd

U, S, Vt = svd(A)
print(f"U:\n{U}\nSingular Values:\n{S}\nVt:\n{Vt}")
```
5. Applications in Machine Learning
- Linear Algebra:
- Representing data as matrices.
- Feature transformation.
- Calculus:
- Loss function optimization.
- Probability:
- Bayesian inference (sketched below).
- Sampling from distributions.
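As a concrete sketch of Bayesian inference, here is Bayes' rule $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$ computed with hypothetical numbers (a diagnostic test with 99% sensitivity, a 5% false-positive rate, and 1% prevalence):

```python
# Hypothetical numbers for illustration only
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.99      # P(B|A): test sensitivity
p_b_given_not_a = 0.05  # P(B|~A): false-positive rate

# Total probability of a positive test, then Bayes' rule
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"Posterior P(A|B): {p_a_given_b:.3f}")  # ~0.167
```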
6. Benefits and Demerits
Benefits:
- Provides the mathematical tools to build, optimize, and interpret models.
- Enables better understanding of algorithms.
Demerits:
- May require significant computational resources for complex computations.
- Can be challenging for beginners.
That’s it for Day 3!
Learned something new? Put it into action with the Advanced Statistical Analysis Tool.
Stay tuned for Day 4 for more ML insights.
Follow me on LinkedIn and X for updates.
Happy learning! 🚀