Visual Question Answering
- Combine visual input with a text sequence
- [[CNN]] + [[LSTM]]
- Generate text from images
- Automatic scene description
- Cross-modal
![[cnn+lstm.png]]
- Word-level embeddings, not character-level
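
A minimal pure-Python sketch of how the pieces above could fit together: the question is split into words, each word is looked up in an embedding table, an LSTM folds the sequence into one vector, and that vector is combined with image features to score candidate answers. Everything named here is an assumption for illustration: `TinyLSTM` is a bias-free toy cell with random, untrained weights, `image_features` is a random placeholder standing in for the [[CNN]] output, and scoring a fixed `answers` list over concatenated features is one common simplification rather than the method these notes describe.

```python
import math
import random


def tokenize_words(question):
    """Split a question into lowercase word tokens (word-level, not character-level)."""
    return question.lower().replace("?", " ").split()


def build_vocab(questions):
    """Map each distinct word to an integer id; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for q in questions:
        for w in tokenize_words(q):
            vocab.setdefault(w, len(vocab))
    return vocab


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Toy LSTM cell without biases (hypothetical helper, not a library class)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x; h].
        self.w = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifog"}

    def encode(self, inputs):
        """Run the cell over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            xh = x + h  # list concatenation
            i = [sigmoid(z) for z in matvec(self.w["i"], xh)]
            f = [sigmoid(z) for z in matvec(self.w["f"], xh)]
            o = [sigmoid(z) for z in matvec(self.w["o"], xh)]
            g = [math.tanh(z) for z in matvec(self.w["g"], xh)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def answer_probs(image_features, question, vocab, embeddings, lstm, w_out):
    """Fuse image and question vectors by concatenation, then softmax over answers."""
    ids = [vocab.get(w, 0) for w in tokenize_words(question)]
    q_vec = lstm.encode([embeddings[i] for i in ids])
    logits = matvec(w_out, image_features + q_vec)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
vocab = build_vocab(["What colour is the car?", "How many dogs are there?"])
embed_dim, hidden_dim, image_dim = 4, 5, 6
answers = ["red", "two", "yes"]  # assumed fixed answer set
embeddings = rand_matrix(len(vocab), embed_dim, rng)
lstm = TinyLSTM(embed_dim, hidden_dim, rng)
w_out = rand_matrix(len(answers), image_dim + hidden_dim, rng)
image_features = [rng.uniform(0, 1) for _ in range(image_dim)]  # placeholder for CNN output
probs = answer_probs(image_features, "What colour is the car?", vocab, embeddings, lstm, w_out)
```

With untrained weights the probabilities are close to uniform, so the sketch only shows the data flow. Working at word level keeps the sequence short (five tokens for the example question) compared with feeding the LSTM one character at a time.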
# Freeform
- Encode facts with two text streams
![[vqa-block.png]]
# Limitations
- Repetitive answers
- Not much variation
- No creativity
- Won't generalise beyond taught concepts