CBOW Interactive Visualizer

Visualizing the flow of data in a Continuous Bag of Words model.
Two context words (previous & next) predict the center target word.

[Network diagram: Input 1 and Input 2 feed Hidden (Avg), which feeds Output (Softmax)]

Weight Matrix (W)

The matrix $W$ has dimensions $V \times N$, where $V=5$ is the vocabulary size and $N=3$ is the hidden layer size (embedding dimension).

Hidden Layer & Output Logic

1. The Projection (Hidden Layer)

The hidden layer state $h$ is the average of the embedding vectors (rows of $W$) for the input context words, which compresses the context into a single vector of size $N$:
$$h = \frac{1}{C} \sum_{w \in \text{context}} \text{vec}(w)$$
where $C$ is the number of context words. In our case the window size is 1 (one word before, one after), so $C = 2$:
$$h = \frac{\text{vec}(w_{t-1}) + \text{vec}(w_{t+1})}{2}$$
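The averaging step can be sketched in a few lines of plain Python. The vocabulary words and the numbers in `W` below are made up for illustration; only the shape ($V=5$, $N=3$) comes from the visualizer, and `hidden_vector` is a hypothetical helper name.

```python
# Hypothetical 5-word vocabulary and a made-up V x N embedding matrix (V=5, N=3).
vocab = ["the", "cat", "sat", "on", "mat"]
W = [
    [0.1, 0.2, 0.3],  # vec("the")
    [0.4, 0.5, 0.6],  # vec("cat")
    [0.7, 0.8, 0.9],  # vec("sat")
    [1.0, 1.1, 1.2],  # vec("on")
    [1.3, 1.4, 1.5],  # vec("mat")
]


def hidden_vector(context_words, vocab, W):
    """Average the embedding rows of the context words (h in the formula)."""
    rows = [W[vocab.index(word)] for word in context_words]
    C = len(rows)  # number of context words
    # zip(*rows) walks the rows column by column, one embedding dimension at a time.
    return [sum(column) / C for column in zip(*rows)]


# Context for the target "sat": previous word "cat", next word "on".
h = hidden_vector(["cat", "on"], vocab, W)
print(h)
```

With these values, `h` is the element-wise mean of the rows for "cat" and "on", a single vector of length $N=3$ regardless of how many context words are averaged.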

2. The Prediction (Matrix $W'$)

To predict the target, the hidden vector $h$ is multiplied by the output matrix $W'$ (dimensions $N \times V$), which produces a raw score $z_j$ for every word in the vocabulary:
$$z = h \cdot W'$$
A high score for a word means its column in $W'$ aligns closely with the context vector $h$. Finally, softmax converts these scores into probabilities:
$$p_j = \frac{e^{z_j}}{\sum_{k=1}^{V} e^{z_k}}$$
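As a rough sketch of the prediction step, the snippet below computes the scores $z = h \cdot W'$ and applies softmax. The values of `h` and `W_out` (standing in for $W'$) are invented for illustration, and subtracting the maximum score before exponentiating is a common numerical-stability choice assumed here, not something the visualizer specifies.

```python
import math


def logits(h, W_out):
    """Compute z = h . W' where W_out has shape N x V."""
    vocab_size = len(W_out[0])
    return [
        sum(h_i * row[j] for h_i, row in zip(h, W_out))
        for j in range(vocab_size)
    ]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(z)  # assumed stability trick; does not change the result
    exps = [math.exp(score - shift) for score in z]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up hidden vector (N=3) and output matrix W' (N x V = 3 x 5).
h = [0.7, 0.8, 0.9]
W_out = [
    [0.2, -0.1, 0.5, 0.0, 0.3],
    [0.1, 0.4, -0.2, 0.6, 0.0],
    [-0.3, 0.2, 0.1, 0.1, 0.2],
]

z = logits(h, W_out)
p = softmax(z)
predicted = max(range(len(p)), key=p.__getitem__)  # index of the most likely word
print(z, p, predicted)
```

The word whose column of `W_out` has the largest dot product with `h` receives the highest probability; in this made-up example that is the word at index 3.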

Reflection: Toy Model vs. Industry Standard

While this visualizer illustrates the mechanism, real-world models operate at a vastly larger scale. The table below compares our toy model with the well-known Google News model (Mikolov et al., 2013).

Hyperparameter	This Toy Model	Google News Model
Vocabulary Size ($V$)	5 words	3,000,000 words
Embedding Size ($N$)	3 dimensions	300 dimensions
Window Size	1 (Total 2 context words)	5 (Total 10 context words)
Total Parameters	15 (in $W$)	900,000,000 (in $W$)
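The parameter counts in the table follow directly from $V \times N$. The short check below reproduces them; `embedding_params` is a hypothetical helper, and the doubled total assumes the output matrix $W'$ ($N \times V$) is stored at the same size as $W$, which the table itself does not count.

```python
def embedding_params(vocab_size, embed_dim):
    """Number of weights in a V x N embedding matrix W."""
    return vocab_size * embed_dim


toy_params = embedding_params(5, 3)
google_news_params = embedding_params(3_000_000, 300)

# Assumption: W' (N x V) holds as many weights as W, so both matrices together double the count.
toy_total = 2 * toy_params
google_news_total = 2 * google_news_params

print(toy_params, google_news_params)
print(toy_total, google_news_total)
```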

Note on Input Complexity: In a naive neural network approach where input vectors are concatenated, a context of 10 words with a vocabulary of 3 million would result in a massive input layer of 30 million nodes ($10 \times 3M$). Word2Vec avoids this by projecting inputs directly into a shared embedding space (lookup tables) and averaging them.
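To make the lookup-table point concrete, the sketch below shows that multiplying a one-hot vector by $W$ returns the same numbers as simply reading one row of $W$, so the large one-hot input never has to be built. The matrix values are invented, and `one_hot` and `vector_times_matrix` are hypothetical helper names.

```python
def one_hot(index, size):
    """Length-`size` vector with a 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(x, W):
    """Multiply a length-V vector by a V x N matrix, giving a length-N vector."""
    embed_dim = len(W[0])
    return [sum(x_i * row[j] for x_i, row in zip(x, W)) for j in range(embed_dim)]


# Made-up V x N embedding matrix (V=5, N=3).
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]

word_index = 2
via_matmul = vector_times_matrix(one_hot(word_index, len(W)), W)
via_lookup = W[word_index]  # direct row lookup, no multiplication needed

# Size of the naive concatenated input from the note: 10 context words x 3M vocabulary.
naive_input_nodes = 10 * 3_000_000

print(via_matmul, via_lookup, naive_input_nodes)
```

Both routes give the same embedding, but the lookup touches only $N$ numbers per context word instead of $V$.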
