Pre-Read Notes: Optimising Numerical Code
Prerequisites: Basic Python programming, familiarity with NumPy arrays, and comfort with writing loops and functions.
What You'll Gain from This Pre-Read
After reading, you'll be able to:
- Recognize how vectorised patterns can replace
loops for faster computation.
- Understand why Cython and Numba are used to
speed up numerical code.
- Identify when and where to apply these
techniques in real-world applications.
- Ask informed questions as you explore
performance optimization in Python.
Think of this as: Learning
shortcuts for efficient work—you’ll still understand the steps, but your code
will run faster and scale better.
What This Pre-Read Covers
This pre-read will:
- Introduce core concepts of vectorised patterns
and JIT compilation.
- Explain why these topics matter for
performance in data-heavy applications.
- Show simple, foundational examples of
vectorisation and accelerated loops.
- Build curiosity and confidence in writing
efficient numerical Python code.
Part 1: The Big Picture - Why Does This
Matter?
Opening hook: Have you ever
waited minutes for a Python loop to process a large dataset, only to realize it
could have been done in seconds?
Expand on the hook: In data
science, finance, simulations, or machine learning, computations often involve
millions of numbers. Writing simple Python loops can be intuitive but slow.
Optimising numerical code ensures that your programs scale efficiently, saving
both time and computational resources.
Where You'll Use This:
Job roles:
- Data Scientist: Speed up preprocessing and
feature engineering on large datasets.
- Machine Learning Engineer: Train models faster
using vectorised operations and JIT compilation.
- Quantitative Analyst: Run complex simulations
efficiently without waiting hours.
Real products:
- Netflix and Spotify: Real-time recommendations
rely on fast numerical operations.
- Google Maps: Quick calculations for routing
millions of users simultaneously.
- Scientific simulations: Physics engines and
climate models compute millions of iterations efficiently.
What you can build:
- High-speed data pipelines for analytics.
- Machine learning models handling large-scale
datasets.
- Simulation engines for research and engineering
tasks.
Think of it like this: Optimising code
is like preparing ingredients for a recipe. Instead of chopping vegetables one
by one (loops), you use a food processor (vectorisation), making the process
faster without changing the outcome.
Limitation: The analogy
breaks down for specialized operations that require step-by-step
logic—vectorisation may not always replace every loop.
Part 2: Your Roadmap Through This Topic
Here's what we'll
explore together:
1. Vectorised Patterns
You'll discover how
applying operations to entire NumPy arrays at once can replace slow loops and
simplify your code.
2. NumPy Arrays and Broadcasting
We'll explore how
NumPy arrays work with vectorised patterns and broadcasting rules to perform
calculations across different shapes effortlessly.
3. Cython Primer
Learn how Cython
converts Python code to compiled C code to achieve near-native performance on
loops and numerical operations.
4. Numba Primer
See how Numba's
just-in-time (JIT) compilation accelerates Python functions dynamically, making
heavy computations faster.
The journey: We'll start
with understanding vectorised operations, learn to accelerate loops with Cython
and Numba, and then see how these tools improve real-world numerical tasks.
Part 3: Key Terms to Listen For
Vectorisation
Performing operations
on entire arrays rather than looping through elements.
Example: Adding two
NumPy arrays directly without iterating.
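A minimal sketch of that in NumPy:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# One expression adds every pair of elements; no Python loop needed
c = a + b  # array([11., 22., 33.])
```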
Broadcasting
Automatically
aligning arrays of different shapes to perform operations.
Think of it as: Stretching a
smaller dataset to match a bigger one for math operations, like applying a
single sauce recipe to multiple batches.
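A minimal sketch of broadcasting, using illustrative shapes:

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)       # shape (3, 4)
offsets = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,)

# NumPy stretches the 1-D offsets across every row of the matrix
result = matrix - offsets                  # shape (3, 4), no loop required
```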
Cython
A superset of Python
that compiles to C for faster execution.
In practice: Converts Python
loops into compiled code that runs much faster.
Numba
A just-in-time (JIT)
compiler that accelerates Python functions.
Example: Adding @jit to a Python function can make loops over large arrays run orders of magnitude faster.
Efficiency
Writing code that
reduces computation time and resource usage without sacrificing correctness.
💡 Key Insight: Vectorisation, Cython, and Numba all
aim to make Python numerical code faster, but in slightly different
ways—vectorisation leverages array operations, Cython compiles code
ahead-of-time, and Numba compiles code just-in-time.
Part 4: Concepts in Action
Seeing Vectorised
Patterns in Action
The scenario: You have daily
sales data for 10,000 stores. You want to calculate the percentage increase in
sales from one day to the next. Looping through each store is slow.
Our approach: Use NumPy array
operations to compute all percentage increases simultaneously.
```python
import numpy as np

# Step 1: Create random sales data for 10,000 stores over 7 days
sales = np.random.randint(100, 1000, size=(10000, 7))

# Step 2: Calculate daily percentage change using vectorised operations
pct_change = (sales[:, 1:] - sales[:, :-1]) / sales[:, :-1] * 100

# Result: each row now shows the percentage change day-to-day
print(pct_change.shape)
```
What's happening here: We subtract each day's sales from the next day's for all stores at once, then divide by the previous day's sales. NumPy performs the arithmetic across every row in compiled C code, with no Python-level loop.
The output/result: a (10000, 6) array of percentage changes.
Key takeaway:
Vectorised patterns allow fast, clean calculations over large datasets without
explicit loops.
⚠️ Common Misconception: Beginners often think loops are unavoidable; in
fact, most array-based calculations can be vectorised.
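To see the contrast, here is a sketch that places a loop-based version of the same calculation next to the vectorised one; both produce identical results, but the loop runs as slow Python bytecode:

```python
import numpy as np

sales = np.random.randint(100, 1000, size=(10000, 7))

# Loop version: one Python-level iteration per store and per day
pct_loop = np.zeros((10000, 6))
for store in range(10000):
    for day in range(6):
        prev = sales[store, day]
        pct_loop[store, day] = (sales[store, day + 1] - prev) / prev * 100

# Vectorised version: the same arithmetic in a single expression
pct_vec = (sales[:, 1:] - sales[:, :-1]) / sales[:, :-1] * 100

print(np.allclose(pct_loop, pct_vec))  # True
```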
Seeing Cython in
Action
The scenario: You need to sum
squares of numbers in a list of 10 million elements. Python loops are slow.
Our approach: Convert the
function to Cython for faster execution.
```cython
# my_cython_code.pyx
def sum_squares(double[:] arr):
    cdef int i
    cdef double result = 0
    for i in range(arr.shape[0]):
        result += arr[i] * arr[i]
    return result
```
What's happening here: Cython compiles the loop into C code. The cdef keyword declares C types for speed, and the loop executes as compiled C rather than interpreted Python.
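Cython code must be compiled before use. A minimal build sketch, assuming setuptools and Cython are installed (the file name is illustrative):

```python
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("my_cython_code.pyx"))
```

After running python setup.py build_ext --inplace, the compiled module imports like any other Python module:

```python
import numpy as np
from my_cython_code import sum_squares

arr = np.random.rand(10_000_000)  # 10 million elements, as in the scenario
print(sum_squares(arr))
```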
Key takeaway: Cython
drastically reduces runtime for heavy numerical loops while keeping Python-like
syntax.
⚠️ Common Misconception: Beginners often think Cython requires rewriting all of your code; in fact, only the performance-critical parts need to be converted.
Seeing Numba in
Action
The scenario: Calculating pairwise distances between points in 2D space. (A full distance matrix for 1 million points would need roughly 8 terabytes of memory, so the sketch below uses a few thousand points.)
Our approach: Use Numba's @jit decorator to accelerate the computation.
```python
from numba import jit
import numpy as np

# A full 1,000,000 x 1,000,000 distance matrix would not fit in memory,
# so we demonstrate with 2,000 points
points = np.random.rand(2000, 2)

@jit(nopython=True)
def pairwise_distances(points):
    n = points.shape[0]
    dists = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dists[i, j] = np.sqrt((points[i, 0] - points[j, 0]) ** 2
                                  + (points[i, 1] - points[j, 1]) ** 2)
    return dists

# Calculate distances
dist_matrix = pairwise_distances(points)
```
What's happening here: The double loop is compiled just-in-time to machine code, typically making the computation orders of magnitude faster than pure Python.
Key takeaway: Numba
can accelerate Python loops dynamically with minimal code changes.
⚠️ Common Misconception: Numba makes every call fast. In fact, the first call is slow because the function is compiled just-in-time when it runs; subsequent calls execute the cached machine code and are very fast.
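A quick way to observe this warm-up cost, reusing the pairwise_distances function and points array from the sketch above:

```python
import time

start = time.perf_counter()
pairwise_distances(points)   # first call: includes JIT compilation
print(f"first call:  {time.perf_counter() - start:.3f} s")

start = time.perf_counter()
pairwise_distances(points)   # second call: runs the cached machine code
print(f"second call: {time.perf_counter() - start:.3f} s")
```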
Seeing Combined
Approach in Action
The scenario: Normalize a
large dataset and compute a transformed value for every row.
Our approach: Combine
vectorised operations with Numba for optimal speed.
```python
import numpy as np
from numba import jit

data = np.random.rand(1000000, 5)

@jit(nopython=True)
def transform_data(arr):
    # nopython mode does not support the axis keyword for np.mean/np.std,
    # so we normalize row by row; Numba compiles this loop to machine code
    out = np.empty_like(arr)
    for i in range(arr.shape[0]):
        row = arr[i]
        mean = row.mean()
        std = row.std()
        # Vectorised arithmetic on the row: normalize, then square element-wise
        out[i] = ((row - mean) / std) ** 2
    return out

transformed = transform_data(data)
```
What's happening here: Numba compiles the outer loop over rows to machine code, while each row is normalized with vectorised array arithmetic inside it. Note that nopython mode does not support the axis keyword for np.mean and np.std, which is why the code loops over rows instead of calling np.mean(arr, axis=1).
Key takeaway:
Combining vectorisation and JIT compilation offers the best of both worlds:
readability and speed.
⚠️ Common Misconception: Vectorisation alone is always fastest. In fact, combining it with JIT compilation sometimes performs better on very large datasets, for example by avoiding large temporary arrays.
Part 5: Bringing It All Together
Vectorised Patterns in Practice
- Vectorised operations let you perform
calculations on entire arrays instead of individual elements.
- Useful for tasks like data normalization,
image transformations, and mathematical simulations.
Cython/Numba for Speed
- Cython compiles Python code to C for heavy
computations.
- Numba dynamically accelerates Python functions
with JIT compilation.
- Combining these with vectorisation results in
highly efficient numerical code.
In short: You can write
clear Python code that handles millions of data points efficiently,
using these tools together.
Part 6: Key Takeaways
| Concept | What It Means | Why It Matters |
| --- | --- | --- |
| Vectorised Patterns | Apply operations to whole arrays at once | Avoids slow Python loops |
| NumPy Arrays | Efficient storage for numbers | Foundation for vectorisation |
| Cython | Compiles Python to C | Speeds up loops and heavy computation |
| Numba | Just-in-time compiler | Accelerates Python functions dynamically |
| Efficiency | Optimised code for performance | Essential for handling large-scale data |
💬 Summary Thought:
Optimising numerical
code is crucial in data-heavy tasks. Vectorisation simplifies code, and
Cython/Numba dramatically improve execution speed. Mastering these gives you
a competitive edge in real-world data processing.
✅ By the end of this pre-read, you should be able to:
- Replace Python loops with vectorised
operations.
- Apply Cython and Numba to speed up numerical
functions.
- Understand performance trade-offs and choose
the right optimization.
- Recognize situations where these techniques
save time in practice.
Next Steps
- Practice vectorisation: Convert loops to array
operations in NumPy.
- Experiment with Numba: Apply @jit to Python functions and benchmark the speed-ups (see the sketch after this list).
- Explore Cython: Compile slow Python
code and compare performance.
- Try real-world data: Use large datasets and
observe differences in execution time.
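As a starting point for that benchmarking, here is a minimal sketch comparing a pure-Python loop against the same loop under @jit (the function names are illustrative, and timings vary by machine):

```python
import time
import numpy as np
from numba import jit

def sum_squares_py(arr):
    total = 0.0
    for x in arr:
        total += x * x
    return total

@jit(nopython=True)
def sum_squares_jit(arr):
    total = 0.0
    for x in arr:
        total += x * x
    return total

arr = np.random.rand(10_000_000)
sum_squares_jit(arr)  # warm-up call so compilation time isn't measured

start = time.perf_counter()
sum_squares_py(arr)
print(f"pure Python: {time.perf_counter() - start:.3f} s")

start = time.perf_counter()
sum_squares_jit(arr)
print(f"Numba JIT:   {time.perf_counter() - start:.3f} s")
```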