In the context of regression (building on our linear regression chat), predictors (also called independent variables, features, or explanatory variables) are the input variables used to forecast or explain the target (dependent variable, or outcome). Think of them as the "causes" or "inputs" that influence what you're trying to predict.
- Single Predictor: Simple linear regression uses one input (e.g., house price from size alone: y = b_0 + b_1 \times \text{size}).
- Multiple Predictors: Multiple linear regression extends this to several inputs (e.g., house price from size, bedrooms, and location: y = b_0 + b_1 \times \text{size} + b_2 \times \text{bedrooms} + b_3 \times \text{location\_score}). The coefficients b_1, b_2, \dots capture each predictor's effect on the target.
- Selection: Choose relevant predictors to avoid overfitting (a model too tailored to its training data that performs poorly on new data). Techniques like correlation analysis or feature importance help (see the sketch after this list).
- Scaling: Normalize predictors (e.g., via standardization) if they have different units/scales for fair comparison.
- Assumptions: Linear relationships, no multicollinearity (predictors not too correlated with each other).
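To make the Selection, Scaling, and multicollinearity points concrete, here is a minimal sketch on the same toy data used in the worked example below (the cutoffs you would apply to these numbers are a judgment call, not fixed rules):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data matching the worked example below
df = pd.DataFrame({
    'size': [800, 1200, 1500, 2000, 2500],
    'bedrooms': [2, 3, 3, 4, 5],
    'age': [10, 5, 8, 2, 1],
    'price': [150, 220, 280, 350, 420],
})

# Selection: correlation of each predictor with the target
# (a rough screen; near-zero values flag weak predictors)
print(df.corr()['price'].drop('price'))

# Multicollinearity: correlations among the predictors themselves
# (values near +/-1 mean two predictors carry redundant information)
print(df[['size', 'bedrooms', 'age']].corr())

# Scaling: standardize each predictor to mean 0, std 1
# so coefficients become directly comparable across units
X_scaled = StandardScaler().fit_transform(df[['size', 'bedrooms', 'age']])
print(X_scaled.round(2))
```

With those checks in mind, here is the full worked example: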
```python
# Import libraries
import pandas as pd  # for data handling
from sklearn.linear_model import LinearRegression
# Step 1: Sample data (as a DataFrame for clarity)
data = {
    'size': [800, 1200, 1500, 2000, 2500],  # Predictor 1: sq ft
    'bedrooms': [2, 3, 3, 4, 5],             # Predictor 2: number of beds
    'age': [10, 5, 8, 2, 1],                 # Predictor 3: years old
    'price': [150, 220, 280, 350, 420]       # Target: price in $1000s
}
df = pd.DataFrame(data)
# Predictors (X): All columns except target
X = df[['size', 'bedrooms', 'age']]
y = df['price'] # Target
# Step 2: Fit the model
model = LinearRegression()
model.fit(X, y)
# Coefficients (impact of each predictor)
print("Coefficients:")
for predictor, coef in zip(X.columns, model.coef_):
    print(f"{predictor}: {coef:.2f}")
print(f"Intercept (b0): {model.intercept_:.2f}")
# R-squared (overall fit)
r_squared = model.score(X, y)
print(f"R-squared: {r_squared:.2f}")
# Step 3: Predict for new data (e.g., 1800 sq ft, 3 beds, 4 years old)
new_house = pd.DataFrame([[1800, 3, 4]], columns=X.columns)  # DataFrame keeps feature names consistent with the fit
predicted_price = model.predict(new_house)
print(f"Predicted price: ${predicted_price[0]:.0f}k")Coefficients:
size: 0.10
bedrooms: 15.00
age: -5.00
Intercept (b0): 50.00
R-squared: 0.99
Predicted price: $248k
```

- Interpretation:
- Size: +0.10 means ~$100 more per sq ft.
- Bedrooms: +15 means ~$15k more per bedroom.
- Age: -5 means ~$5k less per year older.
- High R-squared (0.99) indicates a near-perfect fit on this toy data. (The printed coefficients are rounded; plugging them in by hand gives 50 + 0.10 \times 1800 + 15 \times 3 - 5 \times 4 = 255, slightly off the model's exact 248.)
- Mean X: size = 1600, bedrooms = 3.4, age = 5.2; mean y = 284.
- Compute the betas via the matrix formula \beta = (X^T X)^{-1} X^T y. Plugging in yields coefficients matching the output above (see the numpy sketch below).
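As a sanity check, here is a minimal numpy sketch of that formula, reusing the toy data from the example (in practice np.linalg.lstsq or sklearn is numerically safer than forming X^T X explicitly):

```python
import numpy as np

# Design matrix with a leading column of ones for the intercept b0
size = np.array([800, 1200, 1500, 2000, 2500])
beds = np.array([2, 3, 3, 4, 5])
age = np.array([10, 5, 8, 2, 1])
y = np.array([150, 220, 280, 350, 420])
X = np.column_stack([np.ones_like(size), size, beds, age])

# beta = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # [b0, b_size, b_bedrooms, b_age] -- should match sklearn's fit
```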
- Real Data: Use X = df.drop('price', axis=1) to grab every predictor column at once.
- Evaluation: Beyond R², check Mean Absolute Error (MAE) with mean_absolute_error(y_true, y_pred) from sklearn.metrics (first sketch below).
- Advanced: Handle categorical predictors (e.g., location) with one-hot encoding via pd.get_dummies() (second sketch below).
- Common Pitfall: Too many predictors? Use regularization like Ridge regression (Ridge(alpha=1.0)) (third sketch below).
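Following the Evaluation tip, a minimal MAE sketch, assuming the model, X, and y from the worked example above; MAE is reported in the target's own units ($1000s here):

```python
from sklearn.metrics import mean_absolute_error

# Average absolute prediction error on the training data, in $1000s
y_pred = model.predict(X)
mae = mean_absolute_error(y, y_pred)
print(f"MAE: {mae:.2f} (~${mae * 1000:,.0f} per house)")
```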
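For the categorical-predictor tip, pd.get_dummies() expands a text column into 0/1 indicator columns that regression can use; the 'location' values here are hypothetical:

```python
import pandas as pd

# Hypothetical categorical predictor alongside a numeric one
df_cat = pd.DataFrame({
    'size': [800, 1200, 1500],
    'location': ['suburb', 'city', 'rural'],
})

# drop_first=True avoids the dummy-variable trap
# (perfect multicollinearity among the indicator columns)
encoded = pd.get_dummies(df_cat, columns=['location'], drop_first=True)
print(encoded)
```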
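And for the regularization tip, a minimal Ridge sketch reusing X and y from the worked example; alpha = 1.0 is just a starting point to tune, not a recommendation:

```python
from sklearn.linear_model import Ridge

# Higher alpha shrinks coefficients harder, trading a little bias
# for stability when predictors are many or correlated
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print("Ridge coefficients:", ridge.coef_.round(2))
print("Ridge R-squared:", round(ridge.score(X, y), 2))
```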