In 2015, Facebook AI Research (FAIR) released the bAbI dataset. It consists of 20 synthetic question-answering tasks that require reasoning about agents, locations, objects, and intentions. Each instance (a story) is a sequence of clauses interleaved with questions. The paper "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks" gives a full description of the dataset, its motivation, and results for several systems.

Example Instance

 1 Mary moved to the bathroom.
 2 John went to the hallway.
 3 Where is Mary?    bathroom    1
 4 Daniel went back to the hallway.
 5 Sandra moved to the garden.
 6 Where is Daniel?  hallway 4
 7 John moved to the office.
 8 Sandra journeyed to the bathroom.
 9 Where is Daniel?  hallway 4
10 Mary moved to the hallway.
11 Daniel travelled to the office.
12 Where is Daniel?     office  11
13 John went back to the garden.
14 John moved to the bedroom.
15 Where is Sandra?     bathroom    8
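The example can be read back programmatically. The Python sketch below splits bAbI-formatted lines into stories. It then answers each "Where is X?" question by finding the agent's most recent move before the question, which is exactly the single supporting fact listed after the answer. It assumes that question lines separate the question, answer, and supporting-fact ID with tabs, and that line IDs restart at 1 for each new story. `parse_babi`, `answer_where`, `Story`, and `Question` are hypothetical names. The rule-based answerer only illustrates the reasoning the task requires; it is not a system from the paper.

```python
from dataclasses import dataclass, field

# The example story. Question lines are ASSUMED to use tabs between the
# question, the answer and the supporting-fact ID.
EXAMPLE = """1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary?\tbathroom\t1
4 Daniel went back to the hallway.
5 Sandra moved to the garden.
6 Where is Daniel?\thallway\t4
7 John moved to the office.
8 Sandra journeyed to the bathroom.
9 Where is Daniel?\thallway\t4
10 Mary moved to the hallway.
11 Daniel travelled to the office.
12 Where is Daniel?\toffice\t11
13 John went back to the garden.
14 John moved to the bedroom.
15 Where is Sandra?\tbathroom\t8"""


@dataclass
class Question:
    text: str
    answer: str
    supporting: list  # line IDs of the supporting statements


@dataclass
class Story:
    statements: dict = field(default_factory=dict)  # line ID -> clause text
    questions: list = field(default_factory=list)   # (line ID, Question) pairs


def parse_babi(lines):
    """Group bAbI-format lines into stories; a line ID of 1 starts a new story."""
    stories = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        id_str, rest = raw.split(" ", 1)
        line_id = int(id_str)
        if line_id == 1:
            stories.append(Story())
        story = stories[-1]
        if "\t" in rest:  # question line: question, answer, supporting IDs
            text, answer, support = rest.split("\t")
            ids = [int(s) for s in support.split()]
            story.questions.append((line_id, Question(text.strip(), answer.strip(), ids)))
        else:  # statement line: a single clause
            story.statements[line_id] = rest
    return stories


def answer_where(story, question_id, agent):
    """Return (location, line ID) of the agent's last move before question_id."""
    result = None
    for line_id in sorted(story.statements):
        if line_id >= question_id:
            break
        words = story.statements[line_id].rstrip(".").split()
        if words[0] == agent:  # every clause starts with the agent's name
            result = (words[-1], line_id)  # and ends with the location
    return result


story = parse_babi(EXAMPLE.splitlines())[0]
correct = 0
for qid, q in story.questions:
    agent = q.text.rstrip("?").split()[-1]  # "Where is Mary?" -> "Mary"
    if answer_where(story, qid, agent) == (q.answer, q.supporting[0]):
        correct += 1
print(f"{correct}/{len(story.questions)} questions answered correctly")  # 5/5
```

Keeping only the latest clause per agent is enough here because, in this story, every clause moves one agent to one location, so the most recent move determines where that agent is.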

Overview

Number of instances: 400
Train instances: 200
Test instances: 200

Vocabulary

Size: 19 words
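The vocabulary size can be checked by collecting the distinct lower-cased words across all lines. The sketch below (the function name `build_vocab` is hypothetical) assumes that line IDs, supporting-fact IDs, and punctuation do not count as words and that case is ignored. Applied to the example story alone, it finds 18 word types. This is consistent with 19 overall if one location word never occurs in this particular story: the clause count below assumes 6 locations, but only 5 appear in the example.

```python
import re

# The example story (question, answer and supporting-fact ID assumed tab-separated).
EXAMPLE_LINES = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary?\tbathroom\t1",
    "4 Daniel went back to the hallway.",
    "5 Sandra moved to the garden.",
    "6 Where is Daniel?\thallway\t4",
    "7 John moved to the office.",
    "8 Sandra journeyed to the bathroom.",
    "9 Where is Daniel?\thallway\t4",
    "10 Mary moved to the hallway.",
    "11 Daniel travelled to the office.",
    "12 Where is Daniel?\toffice\t11",
    "13 John went back to the garden.",
    "14 John moved to the bedroom.",
    "15 Where is Sandra?\tbathroom\t8",
]


def build_vocab(lines):
    """Return the set of lower-cased word types in bAbI-format lines.

    Assumption: line IDs, supporting-fact IDs and punctuation are not words.
    """
    vocab = set()
    for line in lines:
        body = line.strip().split(" ", 1)[1]  # drop the leading line ID
        # Keep alphabetic tokens only, which skips "?", "." and digit IDs.
        vocab.update(w.lower() for w in re.findall(r"[A-Za-z]+", body))
    return vocab


vocab = build_vocab(EXAMPLE_LINES)
print(len(vocab))  # 18
print(sorted(vocab))
```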

Agents

Mary, John, Daniel, Sandra

Locations

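Clause Templates

<agent> moved to the <location>.
<agent> went to the <location>.
<agent> went back to the <location>.
<agent> journeyed to the <location>.
<agent> travelled to the <location>.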

Number of Possible Clauses

\[n = \#\{\text{agents}\} \times \#\{\text{clause templates}\} \times \#\{\text{locations}\} = 4 \times 5 \times 6 = 120.\]
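The count can be verified by enumerating every agent, template, and location combination. This Python sketch uses the four agents and five clause templates that appear in the example story. Only five locations appear there, so the sixth, `kitchen`, is an assumption made solely to match the formula's count of six.

```python
from itertools import product

AGENTS = ["Mary", "John", "Daniel", "Sandra"]
TEMPLATES = [
    "{agent} moved to the {location}.",
    "{agent} went to the {location}.",
    "{agent} went back to the {location}.",
    "{agent} journeyed to the {location}.",
    "{agent} travelled to the {location}.",
]
# Five locations appear in the example story; "kitchen" is an ASSUMED sixth
# location, added only so the count matches the formula (6 locations).
LOCATIONS = ["bathroom", "hallway", "garden", "office", "bedroom", "kitchen"]

# One clause per (agent, template, location) triple.
clauses = [
    template.format(agent=agent, location=loc)
    for agent, template, loc in product(AGENTS, TEMPLATES, LOCATIONS)
]
print(len(clauses))  # 120
print(clauses[0])    # Mary moved to the bathroom.
```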

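Instance Format

Each line begins with a numeric line ID, and IDs restart at 1 at the start of every story. A statement line holds a single clause. A question line holds the question, its answer, and the ID of the supporting statement, separated by tabs.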

Instance Composition