When NOT to Use Deep Learning

Model selection depends on dataset size, interpretability requirements, compute budget, and data type. Deep learning is not always the answer.

Instructor

I've seen teams spend months building a neural network that a logistic regression could have matched in an afternoon. The most expensive mistake in ML isn't picking the wrong hyperparameters — it's picking the wrong model entirely. Let's make sure you never make that mistake.

Choosing an ML model is like choosing a frontend framework. You wouldn't use Next.js for a static landing page, and you wouldn't use plain HTML for a real-time dashboard. The right tool depends on the job.

Learning Objectives

  • Apply a systematic decision framework for model selection
  • Identify when classical ML outperforms deep learning
  • Evaluate trade-offs between accuracy, interpretability, and compute cost
  • Match problem types (tabular, image, text, time-series) to model families

The Decision Framework

Frontend

Choosing npm packages
// Need routing? react-router. Need state? zustand. Need SSR? Next.js

Machine Learning

Model selection
// Tabular? XGBoost. Images? CNN. Text? Transformer. Small data? KNN
Structural Bridge
⚠ Where this breaks
Choosing npm packages weighs maintenance, downloads, and license. Choosing a model weighs accuracy on your data, inference cost, training cost, calibration, and interpretability, and you usually cannot know which model wins without training and evaluating the candidates on your own data.
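Because the winner has to be measured rather than assumed, a comparison harness can be sketched roughly as follows. This is a minimal illustration, not a production evaluation setup: the `Model` interface, the two toy models, and the tiny one-feature dataset are all hypothetical, and a real comparison would use cross-validation and a metric suited to the problem.

```typescript
// Hypothetical interface: anything that can be fit on rows and predict a label.
interface Model {
  name: string;
  fit(X: number[][], y: number[]): void;
  predict(x: number[]): number;
}

// Baseline: always predicts the most common training label.
class MajorityClass implements Model {
  name = 'majority-class';
  private label = 0;
  fit(_X: number[][], y: number[]): void {
    let best = -1;
    for (const candidate of y) {
      const n = y.filter((v) => v === candidate).length;
      if (n > best) {
        best = n;
        this.label = candidate;
      }
    }
  }
  predict(_x: number[]): number {
    return this.label;
  }
}

// 1-nearest-neighbour using squared Euclidean distance.
class NearestNeighbor implements Model {
  name = '1-nn';
  private X: number[][] = [];
  private y: number[] = [];
  fit(X: number[][], y: number[]): void {
    this.X = X;
    this.y = y;
  }
  predict(x: number[]): number {
    let bestDist = Infinity;
    let bestLabel = 0;
    this.X.forEach((row, i) => {
      let dist = 0;
      row.forEach((v, j) => {
        const diff = v - x[j];
        dist += diff * diff;
      });
      if (dist < bestDist) {
        bestDist = dist;
        bestLabel = this.y[i];
      }
    });
    return bestLabel;
  }
}

// Fit every candidate on the training split, score it on the held-out split,
// and return the results sorted from best to worst accuracy.
function compareModels(
  models: Model[],
  trainX: number[][], trainY: number[],
  testX: number[][], testY: number[],
): { name: string; accuracy: number }[] {
  const results = models.map((model) => {
    model.fit(trainX, trainY);
    const correct = testX.filter((x, i) => model.predict(x) === testY[i]).length;
    return { name: model.name, accuracy: correct / testX.length };
  });
  return results.sort((a, b) => b.accuracy - a.accuracy);
}

// Toy data (made up for illustration): one feature, two well-separated classes.
const trainX = [[0], [1], [2], [10], [11], [12]];
const trainY = [0, 0, 0, 1, 1, 1];
const testX = [[0.5], [11.5], [1.5], [9]];
const testY = [0, 1, 0, 1];

const ranking = compareModels(
  [new MajorityClass(), new NearestNeighbor()],
  trainX, trainY, testX, testY,
);
console.log(ranking);
```

The point of the sketch is the shape of the loop: every candidate sees the same training split and is judged on the same held-out split, so the ranking reflects your data rather than a rule of thumb.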
model-selector.ts (typescript)
type DataType = 'tabular' | 'image' | 'text' | 'time-series' | 'audio';
type Priority = 'accuracy' | 'interpretability' | 'speed' | 'low-data';

interface ProblemSpec {
  dataType: DataType;
  datasetSize: number;
  needsInterpretability: boolean;
  computeBudget: 'low' | 'medium' | 'high';
  priority: Priority;
}

function recommendModel(spec: ProblemSpec): string {
  // Rule 1: Unstructured data → deep learning.
  // Images and audio are handled here; text is handled separately below.
  if (['image', 'audio'].includes(spec.dataType)) {
    return spec.computeBudget === 'low'
      ? 'Pre-trained model (transfer learning)'
      : 'CNN / Vision Transformer';
  }

  // Text: with very little data, a linear baseline is the safer choice
  if (spec.dataType === 'text') {
    return spec.datasetSize < 1000
      ? 'TF-IDF + Logistic Regression'
      : 'Fine-tuned Transformer';
  }

  // Rule 2: Tabular data → classical ML is usually the stronger starting point
  if (spec.dataType === 'tabular') {
    if (spec.needsInterpretability) {
      return spec.datasetSize < 500
        ? 'Logistic Regression / Decision Tree'
        : 'Explainable Boosting Machine (EBM)';
    }

    if (spec.datasetSize < 100) return 'KNN or Logistic Regression';
    if (spec.datasetSize < 10000) return 'Random Forest';
    return 'XGBoost / LightGBM';
  }

  // Rule 3: Time-series → depends on complexity
  if (spec.dataType === 'time-series') {
    return spec.datasetSize < 1000
      ? 'ARIMA or Prophet'
      : 'LSTM / Temporal Fusion Transformer';
  }

  // Fallback for any data type not covered above
  return 'Start with logistic regression baseline';
}

When Classical ML Wins

Here are the scenarios where you should reach for classical ML first:

1. Tabular Data (Structured Data)

This is the biggest one. If your data lives in a database table with named columns, tree-based models (random forest, XGBoost) often match or outperform neural networks, usually with far less tuning.

tabular-wins.ts (typescript)
// Classic tabular problem: predict user churn
// Features: days_since_login, total_purchases, support_tickets, plan_type
// Label: churned (0 or 1)
// (The accuracy figures below are illustrative, not measured results.)

// Neural network approach:
//   - Needs feature engineering
//   - Needs normalization
//   - Needs architecture tuning
//   - Training time: minutes to hours
//   - Accuracy: ~85%

// XGBoost approach:
//   - Works on numeric features without scaling; categorical columns
//     like plan_type need encoding or explicit categorical support
//   - Handles missing values natively
//   - Minimal tuning needed
//   - Training time: seconds
//   - Accuracy: ~87%

// In this illustrative comparison, the simpler model wins on accuracy AND speed.

2. Small Datasets (< 1,000 samples)

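Neural networks are data-hungry. With small datasets, they tend to memorize the training set instead of learning patterns that generalize. Classical models, with fewer parameters or stronger built-in assumptions, usually generalize better from limited data.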
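One rough way to see why is to count trainable parameters. The sketch below assumes a fully connected network and four numeric input features (loosely based on the churn example above); the layer sizes and the 500-row dataset are hypothetical, and parameter count is only a crude proxy for how easily a model can memorize.

```typescript
// Count trainable parameters of a fully connected network.
// layerSizes = [inputs, hidden1, ..., outputs].
// Each layer contributes (inputs * outputs) weights plus one bias per output.
function countDenseParams(layerSizes: number[]): number {
  let total = 0;
  for (let i = 0; i < layerSizes.length - 1; i++) {
    total += layerSizes[i] * layerSizes[i + 1] + layerSizes[i + 1];
  }
  return total;
}

// Logistic regression is a single layer: 4 weights + 1 bias.
const logisticParams = countDenseParams([4, 1]); // 5

// A small MLP with two hidden layers of 64 units (hypothetical architecture).
const mlpParams = countDenseParams([4, 64, 64, 1]); // 320 + 4160 + 65 = 4545

// Hypothetical dataset size for comparison.
const samples = 500;
console.log(`logistic regression: ${samples / logisticParams} samples per parameter`);
console.log(`small MLP: ${(samples / mlpParams).toFixed(2)} samples per parameter`);
```

With 500 rows, the linear model has 100 samples per parameter, while even this small network has far fewer samples than parameters, which leaves it a lot of room to fit noise.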

3. Interpretability Required

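Regulated industries (finance, healthcare, insurance) often require model explanations. A statement like "The model denied your loan because your debt-to-income ratio exceeds 0.4" is straightforward to produce from an interpretable model; with a black-box model you have to rely on post-hoc approximations of its behavior.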
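To make this concrete, here is a minimal sketch of how a logistic regression yields a per-feature explanation: each feature's contribution to the score is simply its weight times its value. The weights, bias, feature names, and applicant values are all made up for illustration and do not come from a real credit model.

```typescript
interface Contribution {
  feature: string;
  contribution: number; // weight * value, in log-odds units
}

// Break a logistic regression prediction into per-feature contributions,
// sorted by absolute impact on the score.
function explainLogistic(
  weights: Record<string, number>,
  bias: number,
  input: Record<string, number>,
): { logit: number; probability: number; contributions: Contribution[] } {
  const contributions = Object.keys(weights).map((feature) => ({
    feature,
    contribution: weights[feature] * input[feature],
  }));

  let logit = bias;
  for (const c of contributions) logit += c.contribution;

  contributions.sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution));

  return { logit, probability: 1 / (1 + Math.exp(-logit)), contributions };
}

// Hypothetical model of "probability the loan is denied".
const weights = { debtToIncome: 6.0, latePayments: 0.8, yearsEmployed: -0.3 };
const bias = -3;

// Hypothetical applicant.
const applicant = { debtToIncome: 0.45, latePayments: 1, yearsEmployed: 4 };

const explanation = explainLogistic(weights, bias, applicant);
console.log(`top factor: ${explanation.contributions[0].feature}`);
console.log(explanation.contributions);
```

Because the score is a plain sum, the largest contribution can be reported directly as the main reason for the decision, which is what makes linear models attractive when explanations are mandatory.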

4. Tight Compute Budget

Training a large neural network typically calls for GPUs, while a random forest trains comfortably on a laptop. In production, inference cost matters too: a single decision tree evaluates in microseconds.
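The reason tree inference is so cheap is that a prediction is just a handful of comparisons, one per level of the tree. The sketch below illustrates this with a hand-written tree; the `TreeNode` shape, the thresholds, and the churn labels are hypothetical and not taken from a trained model.

```typescript
// A decision tree is either a leaf with a prediction,
// or a split that routes a row left or right based on one feature.
type TreeNode =
  | { kind: 'leaf'; prediction: string }
  | { kind: 'split'; feature: string; threshold: number; left: TreeNode; right: TreeNode };

// Walk from the root to a leaf: one comparison per level, no matrix math.
function predictTree(node: TreeNode, row: Record<string, number>): string {
  if (node.kind === 'leaf') return node.prediction;
  const next = row[node.feature] <= node.threshold ? node.left : node.right;
  return predictTree(next, row);
}

// Hypothetical churn tree with made-up thresholds.
const churnTree: TreeNode = {
  kind: 'split',
  feature: 'days_since_login',
  threshold: 30,
  left: {
    kind: 'split',
    feature: 'support_tickets',
    threshold: 3,
    left: { kind: 'leaf', prediction: 'stays' },
    right: { kind: 'leaf', prediction: 'churns' },
  },
  right: { kind: 'leaf', prediction: 'churns' },
};

console.log(predictTree(churnTree, { days_since_login: 5, support_tickets: 1 }));
console.log(predictTree(churnTree, { days_since_login: 45, support_tickets: 0 }));
```

A tree of depth d needs at most d comparisons per prediction, so the cost stays small even for fairly deep trees and needs no specialized hardware.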

When Deep Learning Wins

Deep learning is the right choice when:

deep-learning-wins.ts (typescript)
const useDeepLearning = (problem: ProblemSpec): boolean => {
  // Unstructured perceptual data: images and audio
  // (video would belong here too, but DataType has no 'video' member)
  if (['image', 'audio'].includes(problem.dataType)) return true;

  // Massive datasets (100k+ samples) with complex patterns,
  // as long as interpretability is not required
  if (problem.datasetSize > 100_000 && !problem.needsInterpretability) return true;

  // Text where accuracy is the top priority
  // (e.g. sequence-to-sequence tasks such as translation or summarization)
  if (problem.dataType === 'text' && problem.priority === 'accuracy') return true;

  // Multi-modal inputs (image + text, audio + video) also favor deep learning,
  // since classical ML can't naturally combine them. ProblemSpec has no field
  // for that case, so it is not checked here.
  return false;
};

The Production Checklist

Before choosing your model, answer these five questions:

production-checklist.ts (typescript)
interface ModelDecision {
  // 1. What type of data do you have?
  dataType: 'tabular' | 'image' | 'text' | 'time-series' | 'audio';

  // 2. How much labeled data do you have?
  datasetSize: number; // < 1k = small, 1k-100k = medium, 100k+ = large

  // 3. Does a human need to understand why?
  interpretable: boolean;

  // 4. What's your compute budget?
  hasGPU: boolean;
  maxTrainingTime: 'minutes' | 'hours' | 'days';

  // 5. What's your deployment target?
  deployTarget: 'browser' | 'server' | 'edge' | 'mobile';
}

// The golden rule: start with the simplest model that could work.
// Only add complexity when you have evidence it's needed.

// Always establish a baseline:
// 1. Logistic regression (classification) or linear regression (regression)
// 2. Random forest or XGBoost
// 3. Only then try a neural network
// If step 1 achieves your target metric, ship it.
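The escalation order above can be sketched as a small helper that walks the candidates from simplest to most complex and stops at the first one that meets the target. The `Candidate` shape and the validation scores are hypothetical; in practice the scores would come from evaluating each model on your own held-out data.

```typescript
interface Candidate {
  name: string;
  validationScore: number; // e.g. accuracy or F1 on a held-out set
}

// Candidates must be ordered from simplest to most complex.
// Returns the first one that reaches the target, or undefined if none does.
function simplestPassing(candidates: Candidate[], target: number): Candidate | undefined {
  for (const candidate of candidates) {
    if (candidate.validationScore >= target) return candidate;
  }
  return undefined;
}

// Made-up scores for illustration only.
const candidates: Candidate[] = [
  { name: 'logistic regression', validationScore: 0.84 },
  { name: 'random forest', validationScore: 0.88 },
  { name: 'neural network', validationScore: 0.89 },
];

const choice = simplestPassing(candidates, 0.85);
console.log(choice ? `ship: ${choice.name}` : 'no candidate meets the target yet');
```

With a target of 0.85 the helper stops at the random forest, even though the neural network scores slightly higher, because the extra complexity is not needed to meet the requirement.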

Challenge

Given real-world scenarios, choose the right model and justify your reasoning.

Exercise

Intermediate · Arithmetic · ~15 min

Choose the Right Model

Implement a recommendModel function that takes a problem specification and returns the best model family. Follow these rules:

  • (1) image/audio data → 'deep-learning'
  • (2) text data with < 1000 samples → 'logistic-regression'; text with >= 1000 → 'deep-learning'
  • (3) tabular data: if interpretability required → 'decision-tree'; if dataset < 100 → 'knn'; if dataset < 10000 → 'random-forest'; otherwise → 'xgboost'
  • (4) time-series with < 1000 samples → 'classical-stats'; otherwise → 'deep-learning'

Key Takeaways

  • Tabular data + classical ML beats neural networks more often than not
  • Always start with a simple baseline — logistic regression for classification
  • Deep learning shines with unstructured data (images, text, audio) and massive datasets
  • Model selection is about trade-offs: accuracy vs. interpretability vs. compute vs. time
  • The best model is the simplest one that meets your requirements
