Eigenfaces — User Guide & Technical Documentation
A complete interactive visualization of the Eigenfaces face recognition system,
built on real sklearn.decomposition.PCA running over the AT&T Olivetti
Faces dataset — 400 real face images, 40 identities, all processed through
Principal Component Analysis and visualized in the browser.
Quick Start
Four short commands in Git Bash are all you need. Make sure all project files are in the same folder before starting.
pip install scikit-learn numpy pillow
Downloads scikit-learn (includes sklearn PCA, the Olivetti dataset loader), numpy for array math, and Pillow for image resizing. Only needed once.
cd /c/Users/YourName/Downloads/eigenfaces
python generate_eigenfaces.py
The script fetches the real Olivetti faces, runs PCA(n_components=50).fit(X), encodes results, and generates the HTML file. Takes 15–30 seconds.
start eigenfaces_olivetti.html
Opens the self-contained visualization in your default browser. No server, no internet required after generation.
Open eigenfaces_sklearn.html right now — it contains the full interface loaded with simulated data so you can explore before running anything.
Windows Setup
Detailed step-by-step guide for running the Eigenfaces pipeline on Windows using Git Bash. The script downloads the AT&T Olivetti dataset, runs real sklearn PCA, and generates the self-contained HTML visualization — all from five simple commands.
Pipeline Overview
sklearn runs on your machine, fetches the real AT&T Olivetti dataset, computes PCA, then bakes everything into a self-contained HTML file you open in any browser.
Setup Steps — Git Bash on Windows
Open Git Bash and verify Python works. You need Python 3.7 or higher.
Git Bash
python --version
# Python 3.11.x
# If that fails, try:
python3 --version
Download from python.org/downloads — check "Add Python to PATH" during install, then restart Git Bash.
Git Bash uses Unix-style paths — drive letter becomes /c/ instead of C:\
Git Bash
# Change to match where you saved the files:
cd /c/Users/YourName/Downloads/eigenfaces
# Verify all files are there:
ls
generate_eigenfaces.py eigenfaces_template.html eigenfaces_sklearn.html
Only needs to be done once. Pillow is optional — the script has a numpy fallback if it's missing.
Git Bash
pip install scikit-learn numpy pillow
If pip is not found:
python -m pip install scikit-learn numpy pillow
Downloads the real Olivetti dataset (~5 MB from figshare.com), runs sklearn PCA, and generates the HTML. Takes about 15–30 seconds total.
Git Bash
python generate_eigenfaces.py
Expected output:
[1/5] Fetching Olivetti faces dataset via sklearn...
X.shape = (400, 4096)
[2/5] Running PCA(n_components=50).fit(X)...
[3/5] Downsampling 64×64 → 32×32...
[4/5] Encoding to base64...
[5/5] Building eigenfaces_olivetti.html...
✓ eigenfaces_olivetti.html (2100 KB)
eigenfaces_olivetti.html will appear in the same folder with all 400 real Olivetti faces embedded — no internet required to open it.
The generated HTML is fully self-contained — no server, no dependencies needed.
Git Bash
start eigenfaces_olivetti.html
Alternative:
# Double-click the file in Windows Explorer
# Or drag it into Chrome / Firefox
Troubleshooting
Try python3 instead of python, or reinstall Python from python.org and check "Add Python to PATH".
Run python -m pip install scikit-learn numpy pillow instead.
Your network is blocking figshare.com. Try a different network, mobile hotspot, or VPN. The script only needs internet for step 4 (first run only — dataset is cached after that).
The script must be in the same folder as the template. Run ls to confirm both files are present in the current directory.
Open eigenfaces_sklearn.html right now — it works with simulated data and shows the full interface with no setup required.
Project Files
Four files make up the project. Keep them all in the same folder — the Python script looks for the template by relative path.
eigenfaces_template.html contains a /* __SKLEARN_DATA__ */ placeholder comment; the script replaces it with the real PCA JSON payload.

| File | Size | Created by | Role |
|---|---|---|---|
| generate_eigenfaces.py | ~3.5 KB | Provided | Run once to generate the HTML |
| eigenfaces_template.html | ~41 KB | Provided | HTML shell, keep alongside the script |
| eigenfaces_sklearn.html | ~2.1 MB | Provided | Preview — open immediately |
| eigenfaces_olivetti.html | ~2+ MB | Script output | Final file with real Olivetti data |
Workflow
The core challenge is that browsers cannot run Python. The solution is a build-time bridge: Python runs once on your machine, does all the heavy computation, then serializes the results into a self-contained HTML file the browser can display with pure JavaScript.
fetch_olivetti_faces() → PCA(...).fit(X) → base64 encode → template .replace() → browser renders with JS
Why not just run sklearn in the browser?
Browsers execute only JavaScript. Python libraries like sklearn, numpy,
and scipy cannot run client-side. Pyodide (Python-in-WebAssembly) could in theory
work but would require downloading a 30+ MB runtime and would be far slower.
The chosen approach — pre-compute in Python, embed in HTML — gives the best of both worlds: real sklearn results with zero browser dependencies. The generated HTML is completely self-contained and can be shared, emailed, or opened offline.
Data Flow Detail
| Stage | Where | Input | Output |
|---|---|---|---|
| fetch_olivetti_faces() | Python | Internet (figshare) | X (400,4096), y (400,) |
| PCA.fit(X) | Python/sklearn | X | components_, mean_, EVR, projections |
| Downsample + encode | Python/Pillow | 64×64 images | 32×32 uint8 → base64 strings |
| JSON injection | Python | All PCA results | eigenfaces_olivetti.html |
| Decode + render | Browser JS | base64 JSON | Canvas drawings, charts |
| Recognition | Browser JS | User interaction | Nearest-neighbor in eigenspace |
Methodology
Eigenfaces is a classical computer vision technique introduced by Turk and Pentland (1991). It applies Principal Component Analysis (PCA) to a dataset of face images, discovering the directions of greatest variance — the "eigenfaces" — and uses projection into this low-dimensional space for recognition.
The AT&T Olivetti Faces Dataset
Collected at AT&T Laboratories Cambridge between 1992–1994. Contains 400 grayscale photographs of 40 distinct subjects, with 10 images per person. Variations include different lighting conditions, facial expressions (open/closed eyes, smiling/neutral), and accessories (glasses or no glasses). Each image is 64×64 pixels = 4,096 dimensions.
| Property | Value |
|---|---|
| Total images | 400 |
| Identities | 40 subjects |
| Images per person | 10 |
| Image size | 64×64 pixels (original) · 32×32 (display) |
| Feature dimension D | 4,096 (64×64 flattened) |
| Color | Grayscale, uint8, range [0, 255] |
| sklearn access | fetch_olivetti_faces() |
Core Methodology Steps
Build the Data Matrix
Flatten each 64×64 face image into a 4,096-dimensional row vector. Stack all 400 images to form the matrix X ∈ ℝ⁴⁰⁰ˣ⁴⁰⁹⁶. Each row is one face; each column is one pixel across all faces.
Compute and Subtract the Mean Face
Calculate μ = (1/N) Σ xᵢ — the average face across all 400 images (pca.mean_). Subtract from every face: X_c = X − μ. This centers the data before SVD.
Singular Value Decomposition (SVD)
Decompose the centered matrix: X_c = U Σ Vᵀ. sklearn computes a truncated SVD internally (LAPACK's full SVD or a randomized solver, chosen automatically by data size). The right singular vectors Vᵀ are the eigenfaces — directions of maximal variance in face-space.
Select Top-k Eigenfaces
Retain only the top k singular vectors from Vᵀ. These become pca.components_ with shape (k, 4096). Each row reshaped to 64×64 is one eigenface — a "face-like" basis image capturing one major mode of variation.
Project All Faces into Eigenspace
For each face, compute its weights: wᵢ = (xᵢ − μ) · Vᵏᵀ. This is pca.transform(X), producing a (400, k) matrix of projections. Recognition operates entirely in this low-dimensional eigenspace.
Nearest-Neighbor Recognition
Project query face into eigenspace: w_q = pca.transform(query). Find the training face with minimum Euclidean distance: match = argmin ||W − w_q||₂. The identity of the matched face is the prediction.
Reconstruction
Reconstruct any face from its eigenspace weights: x̂ = w · Vᵏ + μ — equivalent to pca.inverse_transform(w). Using all k=50 components gives near-perfect reconstruction; fewer components give a blurrier approximation.
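The steps above can be run end-to-end with a few lines of sklearn. A minimal sketch, using small random data in place of the Olivetti faces so it runs instantly (shapes are illustrative, not the real 400×4096):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the face matrix X: 100 "faces", 256 "pixels" each
rng = np.random.default_rng(0)
X = rng.random((100, 256)).astype(np.float32)

k = 20
pca = PCA(n_components=k).fit(X)       # mean subtraction + truncated SVD
W = pca.transform(X)                   # project all faces: (100, k)

# Nearest-neighbor recognition; the query is face 0, so it matches itself
w_q = pca.transform(X[:1])             # (1, k) query projection
dists = np.linalg.norm(W - w_q, axis=1)
match = int(np.argmin(dists))          # index of the closest training face

# Reconstruction from eigenspace weights: x̂ = w·Vᵏ + μ
x_hat = pca.inverse_transform(w_q)     # (1, 256)
```

Swapping in the real `fetch_olivetti_faces()` data changes only the shapes; the calls are identical.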
Mathematics
SVD Decomposition
sklearn PCA centers the data matrix and applies truncated SVD:
X_c = X − μ = U Σ Vᵀ,  with U ∈ ℝ⁴⁰⁰ˣᵏ, Σ diagonal (k×k), Vᵀ ∈ ℝᵏˣ⁴⁰⁹⁶
Explained Variance Ratio
Each singular value σᵢ corresponds to one eigenface. The fraction of total variance it explains:
EVRᵢ = σᵢ² / Σⱼ σⱼ²
Projection (Encoding)
w = (x − μ) · Vᵏᵀ, computed by pca.transform(); one k-dimensional weight vector per face.
Reconstruction (Decoding)
x̂ = w · Vᵏ + μ, computed by pca.inverse_transform(); rebuilds a face from its weights.
Recognition Distance
d(q, i) = ‖wᵢ − w_q‖₂; the query is assigned the identity of the training face with minimum distance.
Projecting into eigenspace removes high-frequency noise and retains only the principal modes of variation — lighting, expression, pose. Two images of the same person cluster closer in this space than images of different people, making simple L2 distance effective for recognition.
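These identities can be checked numerically against sklearn's own outputs. A small sketch on toy data (any matrix works in place of the face data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.random((50, 64))
pca = PCA(n_components=10).fit(X)

# Projection: w = (x − μ) · Vᵏᵀ reproduces pca.transform
W = (X - pca.mean_) @ pca.components_.T
assert np.allclose(W, pca.transform(X))

# Reconstruction: x̂ = w · Vᵏ + μ reproduces pca.inverse_transform
X_hat = W @ pca.components_ + pca.mean_
assert np.allclose(X_hat, pca.inverse_transform(W))

# EVR: each component's variance divided by the total variance of X
total_var = np.var(X, axis=0, ddof=1).sum()
assert np.allclose(pca.explained_variance_ / total_var,
                   pca.explained_variance_ratio_)
```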
Python Script Explained
generate_eigenfaces.py runs in five stages. Each stage is clearly printed
to the terminal so you can follow progress.
Stage 1 — Fetch the Dataset
sklearn downloads the AT&T Olivetti dataset from figshare.com (~5 MB) and caches it locally in ~/scikit_learn_data/. Future runs use the cache.
Python · generate_eigenfaces.py · Stage 1
from sklearn.datasets import fetch_olivetti_faces
dataset = fetch_olivetti_faces(shuffle=False)
X = dataset.data # (400, 4096) float32 — flattened images, range [0,1]
y = dataset.target # (400,) int — identity labels 0..39
images = dataset.images # (400,64,64) float32 — 2D pixel arrays
Stage 2 — Run Real sklearn PCA
PCA.fit(X) centers X by its mean, then computes a truncated SVD. All results are standard sklearn attributes.
Python · generate_eigenfaces.py · Stage 2
from sklearn.decomposition import PCA
K = 50
pca = PCA(n_components=K, whiten=False)
pca.fit(X)
# Standard sklearn PCA attributes:
eigenfaces = pca.components_ # (50, 4096) — right singular vectors
mean_face = pca.mean_ # (4096,) — mean over training set
evr = pca.explained_variance_ratio_ # (50,) — variance per component
singular_v = pca.singular_values_ # (50,) — the σᵢ from the SVD
# Project all 400 faces into eigenspace:
projections = pca.transform(X) # (400, 50) — real sklearn output
Stage 3 — Downsample for Display
Original images are 64×64 but displayed at 32×32 to keep the HTML file size manageable (~2 MB vs ~8 MB). Pillow's LANCZOS filter gives high quality downsampling.
Python · generate_eigenfaces.py · Stage 3
from PIL import Image as PILImage
import numpy as np

def normalize(arr):
    # Scale an eigenface (which has negative entries) into [0,1] for display
    return (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)

def resize32(arr_64):
    u8 = (arr_64.clip(0,1) * 255).astype('uint8')
    pil = PILImage.fromarray(u8, mode='L').resize((32,32), PILImage.LANCZOS)
    return np.array(pil, dtype='float32') / 255.0

faces_32 = np.stack([resize32(images[i]) for i in range(400)])     # (400,32,32)
mean_32 = resize32(mean_face.reshape(64,64))                       # (32,32)
ef_32 = np.stack([resize32(normalize(eigenfaces[i]).reshape(64,64))
                  for i in range(K)])                              # (50,32,32)
Stage 4 — Encode to base64
Float arrays are quantized to uint8 and base64-encoded so they can be embedded as strings inside the HTML/JavaScript. The browser decodes them back with atob().
Python · generate_eigenfaces.py · Stage 4
import base64, json
def b64(arr):
u8 = (arr.clip(0,1) * 255).astype('uint8')
return base64.b64encode(u8.tobytes()).decode('ascii')
payload = {
    "img_w": 32, "img_h": 32,                      # display size, read by the JS
    "faces_b64": b64(faces_32.reshape(400,-1)),    # 400 × 1024 pixels
    "mean_b64": b64(mean_32.flatten()),            # 1024 pixels
    "ef_b64": b64(ef_32.reshape(K,-1)),            # 50 × 1024 pixels
    "projections": projections.tolist(),           # (400, 50) full precision
    "evr": evr.tolist(),                           # (50,)
    "labels": y.tolist(),                          # (400,) identity IDs
}
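The cost of this uint8 quantization can be sanity-checked in isolation. A sketch where `unb64` mirrors what the browser's atob() decoding does (`unb64` is a name introduced here for illustration):

```python
import base64
import numpy as np

def b64(arr):
    # Quantize [0,1] floats to uint8 and base64-encode, as in Stage 4
    u8 = (arr.clip(0, 1) * 255).astype('uint8')
    return base64.b64encode(u8.tobytes()).decode('ascii')

def unb64(s, shape):
    # Reverse the encoding: base64 → bytes → uint8 → [0,1] floats
    u8 = np.frombuffer(base64.b64decode(s), dtype=np.uint8)
    return u8.reshape(shape).astype('float32') / 255.0

faces = np.random.default_rng(2).random((4, 1024)).astype('float32')
restored = unb64(b64(faces), faces.shape)

# Quantization loses less than one gray level (< 1/255 ≈ 0.004) per pixel
assert float(np.abs(restored - faces).max()) < 1/255 + 1e-6
```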
Stage 5 — Inject into HTML Template
The template contains a single placeholder comment. Python replaces it with the full JSON payload — resulting in a self-contained HTML file.
Python · generate_eigenfaces.py · Stage 5
json_str = json.dumps(payload, separators=(',',':')) # compact JSON
html = open('eigenfaces_template.html', encoding='utf-8').read()
# Replace placeholder with real sklearn data:
html = html.replace(
'/* __SKLEARN_DATA__ */',
f'const DATA = {json_str};'
)
open('eigenfaces_olivetti.html', 'w', encoding='utf-8').write(html)
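The replacement step can be verified on a stand-in template string (a sketch; the tiny payload here is illustrative, not the real Stage 4 payload):

```python
import json

template = '<script>/* __SKLEARN_DATA__ */</script>'   # stand-in for the template file
payload = {"evr": [0.2, 0.1], "labels": [0, 1]}        # tiny example payload
json_str = json.dumps(payload, separators=(',', ':'))
html = template.replace('/* __SKLEARN_DATA__ */', f'const DATA = {json_str};')

assert '__SKLEARN_DATA__' not in html    # placeholder fully replaced
assert 'const DATA =' in html            # data is now a native JS constant
```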
HTML Visualization Explained
The HTML file is pure vanilla JavaScript — no frameworks, no external dependencies.
All face rendering uses the HTML5 <canvas> API.
Decoding Embedded Data
When the page loads, the injected DATA constant is decoded back from base64 to pixel arrays.
JavaScript · eigenfaces_template.html · Data Decoding
// DATA is injected by Python — the const is already in the HTML
const FW = DATA.img_w, FH = DATA.img_h, FD = FW * FH; // 32, 32, 1024
function decodeU8B64(b64, rows, cols) {
const raw = atob(b64); // base64 → raw bytes
const u8 = new Uint8Array(raw.length);
for (let i = 0; i < raw.length; i++)
u8[i] = raw.charCodeAt(i);
const arr = [];
for (let i = 0; i < rows; i++) {
const row = new Float32Array(cols);
for (let j = 0; j < cols; j++)
row[j] = u8[i*cols + j] / 255; // uint8 → [0,1]
arr.push(row);
}
return arr;
}
const ALL_FACES = decodeU8B64(DATA.faces_b64, 400, FD);
const MEAN_FACE = decodeU8B64(DATA.mean_b64, 1, FD)[0];
const EIGENFACES = decodeU8B64(DATA.ef_b64, 50, FD);
const PROJECTIONS = DATA.projections; // (400, 50) full float64 from sklearn
const LABELS = DATA.labels; // (400,) identity IDs, used by recognition
Drawing a Face on Canvas
Every face image — original, eigenface, reconstructed — uses the same canvas drawing function.
JavaScript · Face Rendering
function drawFace(canvas, face, w, h, displayW, displayH) {
canvas.width = displayW; canvas.height = displayH;
const ctx = canvas.getContext('2d');
// Normalize to [0,1] for rendering (handles negative eigenface values)
let mn = Infinity, mx = -Infinity;
for (const v of face) { if (v < mn) mn=v; if (v > mx) mx=v; }
// Write pixel-by-pixel into ImageData at native resolution
const img = ctx.createImageData(w, h);
for (let i = 0; i < w*h; i++) {
const v = Math.round((face[i]-mn)/(mx-mn+1e-8) * 255);
img.data[i*4] = v; // R
img.data[i*4+1] = v; // G
img.data[i*4+2] = v; // B
img.data[i*4+3] = 255; // A (fully opaque)
}
// Scale up from 32×32 → display size with pixelated rendering
const tmp = document.createElement('canvas');
tmp.width=w; tmp.height=h;
tmp.getContext('2d').putImageData(img, 0, 0);
ctx.drawImage(tmp, 0, 0, displayW, displayH);
}
Reconstruction in JavaScript
The browser re-implements pca.inverse_transform() using the pre-computed projections and eigenfaces from sklearn.
JavaScript · Face Reconstruction
function reconstructFace(faceIdx, k) {
// Start from mean face (equivalent to pca.mean_)
const recon = new Float32Array(FD);
for (let j=0; j<FD; j++) recon[j] = MEAN_FACE[j];
// Add weighted eigenfaces: Σᵢ wᵢ · efᵢ
for (let ki=0; ki<k; ki++) {
const w = PROJECTIONS[faceIdx][ki]; // real sklearn projection weight
const ef = EIGENFACES[ki]; // real sklearn eigenface
for (let j=0; j<FD; j++) recon[j] += w * ef[j];
}
return recon;
}
Nearest-Neighbor Recognition
The recognition step projects the (noisy) query face into eigenspace using the stored eigenfaces, then finds the nearest training projection by Euclidean distance.
JavaScript · Recognition
function recognize(queryFace, k) {
// 1. Center the query face
const centered = queryFace.map((v, j) => v - MEAN_FACE[j]);
// 2. Project: w_q = (query - mean) · Vᵏᵀ
const qProj = EIGENFACES.slice(0, k).map(ef => {
let d = 0;
for (let j=0; j<FD; j++) d += centered[j] * ef[j];
return d;
});
// 3. Find nearest training projection by L2 distance
let minDist = Infinity, bestIdx = 0;
PROJECTIONS.forEach((p, i) => {
let d = 0;
for (let ki=0; ki<k; ki++) d += (p[ki] - qProj[ki]) ** 2;
const dist = Math.sqrt(d);
if (dist < minDist) { minDist = dist; bestIdx = i; }
});
return { match: bestIdx, dist: minDist, identity: LABELS[bestIdx] };
}
The sklearn Bridge Pattern
This project demonstrates a general pattern for bringing any Python ML computation into a static HTML page. The pattern has three parts:
Python computes, JavaScript renders
All heavy math (SVD, PCA, projection) happens in Python at build time. The browser only handles display and user interaction — tasks JavaScript does well with no performance issues.
Placeholder injection
The HTML template has a single placeholder /* __SKLEARN_DATA__ */ inside a <script> tag. Python replaces it with const DATA = {...}; — making the data a native JavaScript constant, not a runtime fetch.
Zero runtime dependencies
The generated HTML imports no external scripts, makes no HTTP requests, and needs no server. It works offline, can be emailed, shared via USB, or hosted on any static file server.
Replace fetch_olivetti_faces() with your own dataset. Replace PCA with any sklearn estimator. Serialize what your JavaScript needs (weights, predictions, embeddings). Inject via the same replace(placeholder, json_str) pattern.
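As an illustration of that generalization, here is a hedged sketch swapping in KMeans; the placeholder name, payload keys, and data are all hypothetical:

```python
import json
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(3).random((60, 8))        # your own dataset here
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Serialize only what the JavaScript side needs
payload = {
    "centers": km.cluster_centers_.tolist(),
    "labels": km.labels_.tolist(),
}
template = '<script>/* __MODEL_DATA__ */</script>'  # hypothetical placeholder
html = template.replace('/* __MODEL_DATA__ */',
                        f'const DATA = {json.dumps(payload)};')
```

The browser side then reads `DATA.centers` and `DATA.labels` exactly as the eigenfaces page reads its payload.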
Interface Guide
The visualization has a fixed two-column layout: a sidebar on the left for controls and stats, and a main panel on the right with five tabs corresponding to the five stages of the Eigenfaces pipeline.
Layout Regions
| Region | Contents |
|---|---|
| Header | Title, dataset info, badges |
| Pipeline tabs | Five clickable steps — click any to jump to that stage |
| Sidebar | k slider, noise slider, action buttons, live stats, sklearn code reference |
| Main panel | Content for the currently selected pipeline step |
Controls Reference
| Control | Range | Effect |
|---|---|---|
| k slider | 1 – 50 | Number of eigenfaces used. Higher k = more variance captured = better reconstruction and recognition. Lower k = faster, more compressed, less accurate. |
| Noise slider | 0 – 60% | Gaussian noise added to the query face before recognition. Tests robustness. At 0% the system uses a clean image; at 60% the face is heavily degraded. |
| Apply k & Refresh | Button | Re-renders all panels with the current k value. Updates eigenface grid, reconstruction pairs, variance chart, and stats. |
| Recognize Query | Button | Projects the current query face (with noise) into eigenspace using the current k, runs nearest-neighbor, and switches to Step 5 to show results. |
| New Query | Button | Selects a random face from the 400 Olivetti images as the new query. Also updates the person grid selection highlight. |
| Person grid | Click | Click any of the 40 identity cards in Step 1 to set a face from that person as the query. Highlights selected identity. |
The Five Pipeline Steps
Step 1 — Dataset · fetch_olivetti_faces()
Displays all 40 identity cards (one representative image each) and the full 400-face gallery. Click any identity card to select it as the recognition query. Hover any face in the gallery to see its person ID. This step corresponds to loading X and y from sklearn.
Step 2 — PCA / SVD · PCA.fit(X)
Shows the mean face (pca.mean_) and a bar chart of singular values σᵢ. Bars highlighted in green are the k selected components; grey bars are excluded. The SVD equation is displayed with the actual matrix shapes. Adjusting k updates the green/grey split.
Step 3 — Eigenfaces · pca.components_
Shows the cumulative explained variance curve with a 90% reference line and a green dot marking the current k. Below the chart, all k eigenfaces are displayed as thumbnails — these are the rows of pca.components_, reshaped to 32×32 and normalized for display. Earlier eigenfaces capture broad features; later ones capture fine detail.
Step 4 — Reconstruction · pca.inverse_transform()
Displays 20 side-by-side pairs: original face (left) and reconstruction (right). Reconstruction uses the stored sklearn projections plus the first k eigenfaces. Lower k gives blurrier, approximate faces; at k=50 the reconstruction is nearly perfect. The percentage label shows how much variance is retained.
Step 5 — Recognition · argmin(||W − w_q||₂)
Shows the noisy query face, the recognition result (HIT or MISS), and a horizontal bar chart of the top-8 nearest neighbors with their L2 distances in eigenspace. The green bar is the best match; blue bars are runners-up. Adjusting noise or k and re-running shows how both parameters affect accuracy.
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| python: command not found | Python not in PATH | Try python3 instead, or reinstall Python from python.org and check "Add Python to PATH" |
| pip: command not found | pip not in PATH | Use python -m pip install ... |
| URLError: Tunnel connection failed | figshare.com blocked by network | Use a mobile hotspot, VPN, or different Wi-Fi network. Only needed for the first download; cached locally after that. |
| eigenfaces_template.html not found | Files not in same folder | Run ls to confirm both generate_eigenfaces.py and eigenfaces_template.html are in the current directory |
| ModuleNotFoundError: PIL | Pillow not installed | Script has a numpy fallback — just re-run. Or install: pip install pillow |
| Blank page in browser | File too large / wrong browser | Use Chrome or Firefox. Edge and Safari may have issues with large inline scripts. Try python -m http.server 8080 and open localhost:8080 |
| Faces look pixelated | Expected — 32×32 display | This is intentional for file size. The recognition math still uses full-precision sklearn data. |
sklearn caches the Olivetti dataset in ~/scikit_learn_data/ after the first download. All subsequent runs of generate_eigenfaces.py use the cache and need no internet connection.
Glossary
| Term | Definition |
|---|---|
| PCA | Principal Component Analysis — dimensionality reduction technique that finds directions of maximum variance in data. |
| SVD | Singular Value Decomposition — matrix factorization A = UΣVᵀ. sklearn PCA uses truncated SVD internally. |
| Eigenface | A principal component of the face dataset — one row of pca.components_, reshaped to an image. Looks like a ghostly face pattern. |
| Eigenspace | The low-dimensional coordinate system spanned by the top-k eigenfaces. Each face maps to a point in this space. |
| Projection | The coordinates of a face in eigenspace — computed by pca.transform(). Shape: (k,) per face. |
| Reconstruction | Rebuilding a face from its eigenspace coordinates — pca.inverse_transform(). Quality improves with larger k. |
| EVR | Explained Variance Ratio — fraction of total variance captured by each component. Values sum to 1 across all components. |
| Nearest Neighbor | Recognition method — finds the training face whose eigenspace projection is closest (min L2 distance) to the query. |
| mean_ (μ) | pca.mean_ — the average pixel values across all training faces. Subtracted before projection, added during reconstruction. |
| components_ | pca.components_ — shape (k, D). The eigenfaces. Row i is the i-th principal direction in pixel space. |
| k | Number of eigenfaces (PCA components) to use. Trades off accuracy vs. dimensionality. |
| base64 | Binary-to-text encoding that lets pixel data be stored as ASCII strings inside HTML/JavaScript. |
| AT&T Olivetti | Standard face recognition benchmark dataset. 400 images, 40 subjects × 10 photos, 64×64 grayscale. |
Eigenfaces · sklearn PCA · AT&T Olivetti Faces · MSDE Cohort 4 · User Guide v1.0