pandas = Array.map/filter/reduce on Tabular Data
A pandas DataFrame is an array of objects with column-aware map, filter, reduce, groupBy, and join operations built in.
You've filtered arrays, mapped over objects, reduced datasets to summaries, and grouped items by category — all in JavaScript. A pandas DataFrame is that same toolkit, optimized for tabular data. If you can chain .map().filter().reduce(), you can read pandas code.
Before data reaches a model, it goes through wrangling: cleaning, transforming, splitting, normalizing. In Python, that's pandas. In JavaScript, that's the array methods you use every day. This lesson maps the most common pandas operations to their JavaScript equivalents.
Learning Objectives
- Read pandas DataFrame operations and understand what they do
- Map pandas filtering, selection, and transformation to JS array methods
- Understand groupby as the equivalent of a reduce-to-groups pattern
- Translate pandas data preparation pipelines to JavaScript
DataFrames Are Arrays of Objects
Frontend
Array of objects + .map/.filter/.reduce
data.filter(r => r.age > 25).map(r => ({ ...r, label: r.age > 50 ? 1 : 0 }))
Machine Learning
pandas DataFrame
df[df['age'] > 25].assign(label=lambda r: (r['age'] > 50).astype(int))
import pandas as pd
# Create a DataFrame — like an array of objects with typed columns
df = pd.DataFrame({
'name': ['Amina', 'Ravi', 'Leyla', 'Arjun'],
'age': [28, 35, 42, 23],
'score': [0.85, 0.72, 0.91, 0.68]
})
#     name  age  score
# 0  Amina   28   0.85
# 1   Ravi   35   0.72
# 2  Leyla   42   0.91
# 3  Arjun   23   0.68
// JavaScript equivalent
const data = [
{ name: 'Amina', age: 28, score: 0.85 },
{ name: 'Ravi', age: 35, score: 0.72 },
{ name: 'Leyla', age: 42, score: 0.91 },
{ name: 'Arjun', age: 23, score: 0.68 },
];
Selecting Columns
# pandas
names = df['name'] # Single column → Series
subset = df[['name', 'score']] # Multiple columns → DataFrame
# JavaScript
# const names = data.map(r => r.name);
# const subset = data.map(({ name, score }) => ({ name, score }));
Filtering Rows
# pandas — boolean indexing
adults = df[df['age'] > 30]
high_scorers = df[df['score'] >= 0.8]
combined = df[(df['age'] > 25) & (df['score'] > 0.7)]
# JavaScript
# const adults = data.filter(r => r.age > 30);
# const highScorers = data.filter(r => r.score >= 0.8);
# const combined = data.filter(r => r.age > 25 && r.score > 0.7);
Adding / Transforming Columns
# pandas
df['normalized'] = df['score'] / df['score'].max()
df['label'] = (df['score'] > 0.8).astype(int)
df['age_group'] = df['age'].apply(lambda x: 'senior' if x > 40 else 'junior')
# JavaScript
# const maxScore = Math.max(...data.map(d => d.score)); // compute once, not once per row
# const result = data.map(r => ({
#   ...r,
#   normalized: r.score / maxScore,
#   label: r.score > 0.8 ? 1 : 0,
#   ageGroup: r.age > 40 ? 'senior' : 'junior',
# }));
Aggregation (reduce)
# pandas
df['score'].mean() # Average
df['score'].sum() # Sum
df['age'].min() # Minimum
df.describe() # Summary statistics
# JavaScript
# const mean = data.reduce((s, r) => s + r.score, 0) / data.length;
# const sum = data.reduce((s, r) => s + r.score, 0);
# const min = Math.min(...data.map(r => r.age));
GroupBy
The groupby operation is the most important pattern to recognize. In JavaScript terms, it is a reduce that buckets rows by key, followed by an aggregation (here, the mean) over each bucket.
# pandas
grouped = df.groupby('age_group')['score'].mean()
# age_group
# junior 0.75
# senior 0.91
# JavaScript
# `result` is the array built in the previous section; it carries the ageGroup field
# const grouped = result.reduce((acc, r) => {
#   const key = r.ageGroup;
#   if (!acc[key]) acc[key] = [];
#   acc[key].push(r.score);
#   return acc;
# }, {});
# const means = Object.fromEntries(
#   Object.entries(grouped).map(([k, v]) =>
#     [k, v.reduce((s, x) => s + x, 0) / v.length]
#   )
# );
Sorting and Merging
# Sort
df_sorted = df.sort_values('score', ascending=False)
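# → [...data].sort((a, b) => b.score - a.score)
#   (copy first: sort_values returns a new DataFrame, while Array.prototype.sort mutates in place)
# Merge (like SQL JOIN); df1 and df2 stand for any two DataFrames that share a user_id column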
merged = pd.merge(df1, df2, on='user_id', how='left')
# → like a manual join with a Map lookup in JS
Data Prep for ML
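Real pipelines often start by joining tables, so the left merge shown above is worth spelling out before reading a full pipeline. The following is a minimal sketch of the "manual join with Map lookup" idea, not a full reimplementation of pd.merge. The `leftJoin` helper and both sample tables are hypothetical; only the `user_id` key comes from the merge call. It assumes the key is unique on the right side and that the two tables share no column names other than the key.

```javascript
// Hypothetical sample tables for illustration only.
const users = [
  { user_id: 1, name: 'Amina' },
  { user_id: 2, name: 'Ravi' },
  { user_id: 3, name: 'Leyla' },
];
const scores = [
  { user_id: 1, score: 0.85 },
  { user_id: 3, score: 0.91 },
];

// Left join: keep every row of `left` and attach the matching `right` row, if any.
// Simplification: assumes `key` is unique in `right`. pandas would emit one
// output row per match when the right side has duplicate keys.
function leftJoin(left, right, key) {
  // Index the right table by key so each lookup is O(1).
  const lookup = new Map(right.map(r => [r[key], r]));
  // Right-side columns, filled with null when there is no match
  // (pandas fills these with NaN).
  const rightCols = [...new Set(right.flatMap(r => Object.keys(r)))]
    .filter(c => c !== key);
  const blanks = Object.fromEntries(rightCols.map(c => [c, null]));
  return left.map(l => ({ ...l, ...blanks, ...(lookup.get(l[key]) ?? {}) }));
}

const merged = leftJoin(users, scores, 'user_id');
// → [
//   { user_id: 1, name: 'Amina', score: 0.85 },
//   { user_id: 2, name: 'Ravi',  score: null },
//   { user_id: 3, name: 'Leyla', score: 0.91 },
// ]
```

Building the Map once keeps the join linear in the size of both tables, whereas calling `right.find(...)` inside the `map` would rescan the right table for every left row.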
The real payoff: reading a data preparation pipeline in a Jupyter notebook.
# Typical pandas ML pipeline
df = pd.read_csv('data.csv') # Load
df = df.dropna() # Remove missing values
df = df[df['value'] > 0] # Keep only positive values
df['value_norm'] = (df['value'] - df['value'].mean()) / df['value'].std() # Normalize
X = df[['feature1', 'feature2', 'feature3']].values # → numpy array
y = df['label'].values # → numpy array
# JavaScript equivalent
# let data = rawData
#   .filter(r => Object.values(r).every(v => v != null)) // dropna(): drop rows with any missing value
#   .filter(r => r.value > 0);
# const mean = data.reduce((s, r) => s + r.value, 0) / data.length;
# // pandas .std() is the sample standard deviation, so divide by n - 1
# const std = Math.sqrt(data.reduce((s, r) => s + (r.value - mean) ** 2, 0) / (data.length - 1));
# data = data.map(r => ({ ...r, valueNorm: (r.value - mean) / std }));
# const X = data.map(r => [r.feature1, r.feature2, r.feature3]);
# const y = data.map(r => r.label);
Challenge
Translate a pandas data wrangling pipeline to JavaScript array operations.
Exercise
Data Wrangling in JavaScript
Translate a pandas data pipeline to JavaScript. You have an array of sensor reading objects. Implement the pipeline: (1) filter out rows with null values, (2) filter to only readings above a threshold, (3) add a normalized column, (4) group by sensorId and compute the mean of the normalized values. This mirrors a typical pandas preprocessing pipeline.
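The four steps can be sketched as follows. This is one possible shape for a solution, not a reference answer: the field names `sensorId` and `value`, the sample readings, the threshold of 10, and the choice to normalize by dividing by the largest remaining value are all assumptions, since the exercise does not fix them.

```javascript
// Hypothetical sample data; field names and values are assumptions.
const readings = [
  { sensorId: 'a', value: 20 },
  { sensorId: 'a', value: 40 },
  { sensorId: 'b', value: null },
  { sensorId: 'b', value: 5 },
  { sensorId: 'b', value: 40 },
];
const THRESHOLD = 10; // assumed cutoff

function wrangle(rows, threshold) {
  // (1) Drop rows containing null/undefined, (2) keep readings above the threshold.
  const kept = rows
    .filter(r => Object.values(r).every(v => v != null))
    .filter(r => r.value > threshold);

  // (3) Add a normalized column: value divided by the largest remaining value
  //     (assumed scheme).
  const max = Math.max(...kept.map(r => r.value));
  const normalized = kept.map(r => ({ ...r, normalized: r.value / max }));

  // (4) Group by sensorId, accumulating a running sum and count per sensor,
  //     then divide to get each mean.
  const totals = normalized.reduce((acc, r) => {
    const t = acc[r.sensorId] ?? (acc[r.sensorId] = { sum: 0, count: 0 });
    t.sum += r.normalized;
    t.count += 1;
    return acc;
  }, {});
  return Object.fromEntries(
    Object.entries(totals).map(([id, t]) => [id, t.sum / t.count])
  );
}

const meansBySensor = wrangle(readings, THRESHOLD);
// → { a: 0.75, b: 1 }
```

Accumulating a sum and a count per key avoids building an intermediate array for every group, which is the main difference from the bucket-then-average version in the GroupBy section.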
Key Takeaways
- A pandas DataFrame is an array of objects with column-aware operations built in
- df[condition] is .filter(), df['col'].apply() is .map(), df.groupby() is .reduce()
- pandas data preparation pipelines read like chained JS array methods
- These mappings cover most of the data preparation code you will meet in a typical Jupyter notebook