Linear/Fully-Connected Layer

Fully Connected Layer/Linear Layer: The Token's Final Destination

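Right before the output probabilities are calculated, all the information gathered in the earlier steps of the transformer architecture passes through what is known as a fully connected layer, or linear layer. The layer itself is defined by a number of inputs and a number of outputs. It is applied to a whole batch of inputs at once, so the batch size is a property of the data passing through the layer rather than of the layer itself.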
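To make the three sizes concrete, here is a minimal sketch of a linear layer in plain Python, assuming the usual formulation y = x W^T + b. The function name `linear`, the toy sizes, and the random weights are illustrative assumptions, not taken from the Llama code; a real model would use a tensor library and far larger dimensions.

```python
import random


def linear(x, weight, bias):
    """Apply y = x @ weight^T + bias to a batch of input vectors.

    x:      batch_size rows, each holding n_inputs values
    weight: n_outputs rows, each holding n_inputs values
    bias:   n_outputs values
    """
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


# Toy sizes chosen only for illustration (hypothetical values).
batch_size, n_inputs, n_outputs = 2, 4, 3

rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(batch_size)]
weight = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_outputs)]
bias = [0.0] * n_outputs

logits = linear(x, weight, bias)

# One output row per batch item, one value per output unit.
print(len(logits), len(logits[0]))
```

Note that the weights have shape (n_outputs, n_inputs) regardless of how many rows are in the batch: the same layer can process one input or many without changing its parameters.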

Llama Final Architecture