Disclaimer: Signal Ward is an educational simulation. All clinical scenarios are fictional. Nothing in this course constitutes medical advice.
VikramKhalil, you've parsed JSON a thousand times, right? You take a string, run JSON.parse, and get a structured object. Turning text into ML features is the same idea — different format, same principle.
KhalilSo instead of key-value pairs, I'm getting... what, a list of numbers?
VikramExactly. A vector. Each position represents a word from your vocabulary, and the value is how many times that word appears. It's called bag-of-words.
When your frontend fetches an API response, JSON.parse() converts a raw string into structured data your app can use. Bag-of-words does the same thing for ML: it converts raw text into a numerical vector your model can process.
const data = JSON.parse('{"name": "aspirin"}')const bow = textToVector('patient takes aspirin daily')A bag-of-words representation is simple: define a vocabulary, then count occurrences.
import * as tf from '@tensorflow/tfjs';
// Our vocabulary — each word gets a position
const vocab = ['patient', 'reports', 'chest', 'pain', 'denies', 'fever'];
function textToVector(text: string): number[] {
const words = text.toLowerCase().split(/s+/);
return vocab.map(v => words.filter(w => w === v).length);
}
// "patient reports chest pain" → [1, 1, 1, 1, 0, 0]
const note1 = textToVector('patient reports chest pain');
console.log(note1); // [1, 1, 1, 1, 0, 0]
// "patient denies fever" → [1, 0, 0, 0, 1, 1]
const note2 = textToVector('patient denies fever');
console.log(note2); // [1, 0, 0, 0, 1, 1]
// Now these are tensors the model can work with
const tensor1 = tf.tensor(note1);
const tensor2 = tf.tensor(note2);Bag-of-words captures what words are present but ignores order. "Patient denied pain" and "Pain denied patient" produce the same vector. For clinical text, that's a problem we'll solve in later modules. But it's a solid starting point.
// These two notes have VERY different meanings
const noteA = textToVector('patient denied chest pain');
const noteB = textToVector('patient reported chest pain');
// But with our current vocabulary, they differ only
// in words not yet in the vocab!
// We need 'denied' and 'reported' in the vocabulary.
const expandedVocab = ['patient', 'reports', 'chest', 'pain',
'denies', 'fever', 'denied', 'reported'];
// With expanded vocab, the vectors ARE different.
// But bag-of-words still can't capture word ORDER.Build a bag-of-words function that converts clinical notes into numerical vectors.
Implement a textToVector function that takes a string and a vocabulary array, and returns a number array where each element is the count of the corresponding vocabulary word in the input text. Lowercase the input before counting.
import * as tf from '@tensorflow/tfjs'; const vocab = ['patient', 'chest', 'pain', 'allergy', 'medication']; function textToVector(text: string, vocabulary: string[]): number[] { // 1. Lowercase and split the text into words // 2. For each word in the vocabulary, count occurrences return null; // your code here } const vector = textToVector('Patient reports chest pain and chest tightness', vocab); const tensor = tf.tensor(vector);
Khalil successfully converts his first batch of clinical notes into numerical vectors.
Next: building a simple classifier that can categorize these vectors.