Lecture Notes: Optimising Numerical Code
Prerequisites:
- Basic Python programming
- Understanding of NumPy arrays and vector operations
- Familiarity with loops and basic function definitions
What you'll be able to do:
- Explain the benefits of vectorised computation and identify when to use it.
- Apply NumPy vectorised patterns to replace explicit Python loops.
- Understand Cython and Numba as tools to accelerate numerical code.
- Profile Python code to identify bottlenecks and optimise performance.
1. Introduction: What is Optimising Numerical Code and Why Should You Care?
Core Definition
Optimising numerical code refers to the practice of writing Python programs that perform numerical computations efficiently. It involves using strategies such as vectorised operations, just-in-time compilation, and profiling to reduce computation time, memory usage, and unnecessary overhead. The goal is to make code run faster while maintaining correctness.
A Simple Analogy
Think of solving a big puzzle. You could pick up one piece at a time (Python loops), or you could use a system that grabs many pieces at once and places them efficiently (vectorised operations). Both solve the puzzle, but the latter is faster and more efficient.
Limitation: This analogy captures the speed difference but not the low-level memory layout and compilation tricks that Cython or Numba use.
Why This Matters to You
Problem it solves: Python loops are slow for large numerical datasets. As data grows, computation time increases drastically if inefficient code is used.
What you'll gain:
- Faster computations, especially for large datasets.
- Cleaner, more readable code through vectorised patterns.
- The ability to profile and optimise real-world applications.
Real-world context: Companies handling financial analytics, scientific simulations, and machine learning rely on these optimisations to save hours of computation time.
2. The Foundation: Core Concepts Explained
Concept A: Vectorised Patterns
Definition: A vectorised pattern replaces explicit loops with operations on entire arrays, leveraging the optimised C and Fortran code that NumPy uses under the hood.
Key characteristics:
- Broadcasting: Automatically aligns arrays of different shapes for operations.
- Memory efficiency: Avoids per-element Python objects and interpreter overhead.
- Speed: Uses low-level compiled implementations for faster computation.
A concrete example:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Vectorised addition
c = a + b
print(c)
```

Output: [11 22 33 44]
Common confusion: Beginners often write loops for element-wise operations, not realising that NumPy can handle entire arrays at once.
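To make the contrast concrete, here is a minimal side-by-side sketch (the function names are illustrative, not part of any library):

```python
import numpy as np

# Loop version: one Python-level operation per element
def squares_loop(values):
    out = []
    for v in values:
        out.append(v * v)
    return out

# Vectorised version: a single NumPy expression; the loop runs in C
def squares_vectorised(values):
    arr = np.asarray(values)
    return arr * arr

data = [1, 2, 3, 4]
print(squares_loop(data))        # [1, 4, 9, 16]
print(squares_vectorised(data))  # [ 1  4  9 16]
```

Both produce the same squares; the vectorised form pushes the per-element work out of the interpreter.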
Concept B: Cython/Numba Primer
Definition: Cython and Numba are tools that compile Python code for speed:
- Cython: Translates Python (with optional static type annotations) to C, which is compiled ahead of time for faster execution.
- Numba: Uses just-in-time (JIT) compilation to optimise numerical functions at runtime.
How it relates to vectorisation: While vectorisation works on whole arrays, Cython/Numba can speed up loops and functions that cannot easily be vectorised.
Key characteristics:
- Cython: Static typing, explicit compilation step, generates .c files.
- Numba: Dynamic, minimal code change, uses the @jit decorator.
- Targeted acceleration: Both work well for math-heavy loops.
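For contrast with the Numba example below, a Cython version of the same idea might look like this sketch (a hypothetical squares.pyx module; it must be compiled with cythonize before use, so it is shown only to illustrate the static-typing style):

```
# squares.pyx -- hypothetical Cython module; compile with cythonize
def sum_squares(long n):
    cdef long i           # static C types: the loop body
    cdef long total = 0   # compiles down to plain C arithmetic
    for i in range(n):
        total += i * i
    return total
```

The cdef declarations are what let Cython drop Python object handling inside the loop.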
A concrete example:

```python
from numba import jit

@jit
def sum_squares(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

print(sum_squares(1000000))
```
Common confusion: Beginners think @jit magically optimises everything. It only accelerates numeric-heavy loops and functions, not general Python operations such as printing or I/O.
How Vectorised Patterns and JIT Work Together
Vectorisation removes Python loops where possible, while JIT optimises the loops that remain. Think of vectorisation as a first-level speedup and JIT as a second-level accelerator. Together, they can approach C-level performance in Python code.
3. Seeing It in Action: Worked Examples
Example 1: Vectorised Sum of Squares
Scenario: You need the sum of squares of the numbers 1 to 1,000,000.
Our approach: Use a vectorised NumPy expression instead of a Python loop.
```python
import numpy as np

# Step 1: Create an array (int64 so the sum does not overflow on
# platforms where NumPy's default integer type is 32-bit)
arr = np.arange(1, 1000001, dtype=np.int64)

# Step 2: Compute the sum of squares, vectorised
result = np.sum(arr**2)

# Result
print(result)
What's happening here: NumPy computes arr**2 in C internally, and np.sum reduces the array efficiently. No Python loop overhead exists.
The output/result: 333333833333500000
Key takeaway: Vectorised operations dramatically reduce runtime for large datasets.
⚠️ Common Misconception: Some believe Python loops are necessary for clarity. Vectorisation can be both clearer and faster.
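One way to check the speedup yourself is a quick wall-clock comparison (a rough sketch using time.perf_counter; the function names are illustrative, and exact timings vary by machine):

```python
import time
import numpy as np

# Pure Python loop version
def sum_squares_pyloop(n):
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total

# Vectorised version (int64 guards against overflow on
# platforms with a 32-bit default integer type)
def sum_squares_numpy(n):
    arr = np.arange(1, n + 1, dtype=np.int64)
    return int(np.sum(arr * arr))

n = 1_000_000
t0 = time.perf_counter()
loop_result = sum_squares_pyloop(n)
t1 = time.perf_counter()
vec_result = sum_squares_numpy(n)
t2 = time.perf_counter()

assert loop_result == vec_result == 333333833333500000
print(f"loop: {t1 - t0:.3f}s, vectorised: {t2 - t1:.3f}s")
```

On typical hardware the vectorised version is one to two orders of magnitude faster.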
Example 2: Using Numba to Speed Up Loops
Scenario: Computing the same sum of squares with a Python loop is too slow.
Our approach: Apply @jit from Numba to accelerate the function.
```python
from numba import jit

@jit
def sum_squares_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i**2
    return total

print(sum_squares_loop(1000000))
```
What's happening here: Numba compiles the Python loop to machine code at runtime, so the loop executes at near C speed.
The output/result: The same numeric result, with runtime reduced significantly after the first (compiling) call.
Key takeaway: JIT is a quick way to speed up numeric loops that cannot be vectorised.
⚠️ Common Misconception: Adding @jit to a function that mostly calls Python built-ins does not speed up computation.
Example 3: Broadcasting for Efficient Array Operations
Scenario: You have two arrays of shapes (1000, 1) and (1, 500). You want all pairwise sums.
Our approach: Use broadcasting instead of nested loops.
```python
import numpy as np

A = np.arange(1000).reshape(1000, 1)
B = np.arange(500).reshape(1, 500)

# Broadcasting addition
C = A + B
print(C.shape)
```
What's happening here: NumPy virtually stretches each array to the matching shape without materialising the stretched copies in memory; only the (1000, 500) result is allocated.
The output/result: (1000, 500)
Key takeaway: Broadcasting avoids explicit loops and unnecessary memory use.
⚠️ Common Misconception: Beginners write nested for loops over each element, which is far slower.
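The broadcasting rule itself is easy to check on small shapes: two dimensions are compatible when they are equal or one of them is 1. A small sketch:

```python
import numpy as np

# Compatible: (3, 1) and (1, 4) broadcast to (3, 4)
A = np.arange(3).reshape(3, 1)
B = np.arange(4).reshape(1, 4)
C = A + B
print(C.shape)   # (3, 4)
print(C[2, 3])   # 5, i.e. A[2, 0] + B[0, 3]

# Incompatible: (3,) and (4,) -- neither dimension is 1
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("broadcast error:", err)
```

Checking shapes this way on toy arrays is a cheap habit before running a computation on large data.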
Example 4: Combining Vectorisation with Numba
Scenario: You want to apply a complex formula to a 1D array that cannot be fully vectorised.
Our approach: Vectorise where possible, and use @jit for the remaining loops.
```python
import numpy as np
from numba import jit

@jit
def complex_transform(arr):
    # Allocate a float result explicitly: zeros_like(arr) would be
    # integer-typed for an integer input, silently truncating the output.
    res = np.zeros(arr.size, dtype=np.float64)
    for i in range(arr.size):
        res[i] = np.sin(arr[i])**2 + np.log1p(arr[i])
    return res

arr = np.arange(1, 1000000, dtype=np.float64)
result = complex_transform(arr)
```
What's happening here: Numba compiles the whole loop, including the supported NumPy functions np.sin and np.log1p, to machine code.
The output/result: An array of transformed values, computed much faster than pure Python.
Key takeaway: Combining vectorisation with JIT provides flexibility and speed for complex computations.
⚠️ Common Misconception: Some think JIT alone suffices. Combining techniques yields the best performance.
4. Common Pitfalls: What Can Go Wrong and How to Avoid It
The Mistake: Using Python loops for large arrays.
Why It's a Problem: Slow execution and high per-element overhead.
The Right Approach: Use vectorised NumPy operations.
Why This Works: Computations run in compiled C code with low overhead.
The Mistake: Assuming @jit optimises all code.
Why It's a Problem: Functions dominated by Python object manipulation or I/O are not accelerated.
The Right Approach: Only use JIT on numeric-heavy functions.
Why This Works: JIT optimises numeric loops; non-numeric operations stay in Python.
The Mistake: Misunderstanding broadcasting shapes.
Why It's a Problem: Operations fail or produce unexpected results.
The Right Approach: Ensure shapes follow the broadcasting rules: paired dimensions must be equal or 1.
Why This Works: NumPy aligns arrays predictably, avoiding memory bloat.
The Mistake: Premature optimisation without profiling.
Why It's a Problem: It may complicate code unnecessarily.
The Right Approach: Profile first, then optimise the hotspots.
Why This Works: It focuses effort where it has real impact.
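To make the last point actionable, here is a minimal profiling sketch with cProfile (the function names are illustrative; in a real project you would profile your own pipeline):

```python
import cProfile
import io
import pstats
import numpy as np

def slow_part(n):
    # Python-level loop: the likely hotspot
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

def fast_part(n):
    # Vectorised equivalent of the same computation
    return float(np.sqrt(np.arange(n)).sum())

def pipeline(n):
    return slow_part(n) + fast_part(n)

profiler = cProfile.Profile()
profiler.enable()
pipeline(200_000)
profiler.disable()

# Print the five entries with the highest cumulative time;
# slow_part typically dominates, telling you where to optimise.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Only after the profile confirms where the time goes is it worth reaching for vectorisation or @jit.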
5. Your Turn: Practice & Self-Assessment
Practice Task (Estimated: 20 minutes)
The Challenge: Optimise a function that computes f(x) = x^2 + sin(x) over 1 million elements.
Specifications:
- Write a pure Python loop version.
- Vectorise it using NumPy.
- Apply Numba's @jit to the loop version.
Hint: Measure execution time using time.perf_counter, or %timeit in Jupyter. Compare the results.
Extension (optional): Try a hybrid approach: vectorised computation inside a JIT-compiled function.
Check Your Understanding:
- Explain why vectorisation is faster than Python loops.
- When would you prefer JIT over vectorisation?
- Identify potential memory bottlenecks in broadcasting.
- Predict the effect of mixing Python loops with NumPy operations.
6. Consolidation: Key Takeaways & Next Steps
The Essential Ideas

| Concept | What It Means | Why It Matters |
| --- | --- | --- |
| Vectorised Patterns | Operations on whole arrays without explicit loops | Fast, readable, memory-efficient |
| Broadcasting | Automatic shape alignment for arrays | Avoids loops, reduces memory overhead |
| JIT Compilation (@jit) | Compiles Python functions to machine code at runtime | Speeds up numeric-heavy loops |
| Cython | Translates Python to compiled C code | Achieves near-C performance for complex functions |
| Profiling | Measures runtime and memory usage | Helps identify bottlenecks |
💬 Summary Thought:
Optimising numerical code combines smart use of libraries, compilation, and profiling. By learning these techniques, you reduce execution time, write cleaner code, and can scale computations to large datasets with confidence.
Next Steps
- Practice vectorised patterns: Replace loops in your previous assignments with NumPy operations.
- Explore Numba: Apply @jit to loops in your code and measure the speedups.
- Profile code: Use %timeit or cProfile to find bottlenecks.
- Combine techniques: Apply vectorisation, broadcasting, and JIT together for optimal performance.