---
tags:
  - ai
  - media
---
Visual Question Answering

- Combine visual input with a text sequence
- [CNN](../CNN/CNN.md) + [LSTM](LSTM.md)
- Generate text from images
- Automatic scene description
- Cross-modal

![cnn+lstm](../../../img/cnn+lstm.png)

- Word embeddings, not character embeddings

# Freeform

- Encode facts with two text streams

![vqa-block](../../../img/vqa-block.png)

# Limitations

- Repetitive answers
- Not much variation
- No creativity
- Won't generalise beyond taught concepts
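
A minimal sketch of the CNN + LSTM combination described in this note, using only the Python standard library. Everything specific here is an assumption made for illustration: the `image_features` vector stands in for the output of a CNN, the weights are random rather than trained, bias terms are omitted, the fusion step is an element-wise product, and the vocabulary and `ANSWERS` list are hypothetical. The question is embedded word by word (not character by character) and passed through a single LSTM cell.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4    # size of each word embedding (assumed)
HIDDEN_DIM = 4   # size of the LSTM hidden state (assumed)
VOCAB = ["what", "colour", "is", "the", "ball", "<unk>"]  # hypothetical vocabulary
ANSWERS = ["yes", "no", "red", "two"]                     # hypothetical fixed answer set


def rand_matrix(rows, cols):
    """Random weights; a real model would learn these during training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Word-level embedding table: one vector per word, not per character.
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}

# One weight matrix per LSTM gate, applied to [embedding ; previous hidden state].
GATES = {g: rand_matrix(HIDDEN_DIM, EMBED_DIM + HIDDEN_DIM) for g in ("i", "f", "o", "g")}

# Linear layer that scores each candidate answer from the fused vector.
W_OUT = rand_matrix(len(ANSWERS), HIDDEN_DIM)


def lstm_step(x, h, c):
    """One LSTM cell update (bias terms omitted for brevity)."""
    z = x + h  # list concatenation of input and previous hidden state
    i = [sigmoid(v) for v in matvec(GATES["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(GATES["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(GATES["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(GATES["g"], z)]  # candidate cell values
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def encode_question(question):
    """Run the LSTM over the question's word embeddings; return the final hidden state."""
    h = [0.0] * HIDDEN_DIM
    c = [0.0] * HIDDEN_DIM
    for word in question.lower().split():
        x = EMBEDDINGS.get(word, EMBEDDINGS["<unk>"])
        h, c = lstm_step(x, h, c)
    return h


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer(image_features, question):
    """Fuse image and question vectors, then pick the most probable answer."""
    q = encode_question(question)
    # Element-wise product fusion (an assumption); both vectors must have HIDDEN_DIM entries.
    fused = [a * b for a, b in zip(image_features, q)]
    probs = softmax(matvec(W_OUT, fused))
    best = max(range(len(ANSWERS)), key=probs.__getitem__)
    return ANSWERS[best], probs


# Stand-in for the feature vector a CNN would produce for one image.
image_features = [0.2, -0.1, 0.7, 0.4]
label, probs = answer(image_features, "What colour is the ball")
print(label, [round(p, 3) for p in probs])
```

Because the weights are untrained, the predicted label is arbitrary; the point is the data flow. The output can only ever be one of the entries in `ANSWERS`, which is one plausible reading of why such a model will not generalise beyond the concepts it was taught.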