CausalChaos!

Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes


Paritosh Parmar1,2, Eric Peh1,2, Ruirui Chen1, Ting En Lam3, Yuhan Chen4, Elston Tan5, Basura Fernando1,2

1IHPC, A*STAR, Singapore; 2CFAR, A*STAR, Singapore; 3Nanyang Technological University; 4National University of Singapore; 5Singapore Polytechnic

NeurIPS 2024




CausalChaos! Dataset Teaser Figure

Abstract

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation, which allow animators to create expressive, unambiguous causal relationships between events that form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors require models to solve more challenging, yet well-defined, causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced/explicit causal relationship modeling and joint modeling of vision and language as the immediate areas for future efforts to focus on. Along with other complementary datasets, our new challenging dataset will pave the way for these developments in the field.

Dataset Videos

We provide the annotations, but users must purchase the videos themselves. For example, users may buy them from here. The cost is reasonable: for example, the whole series is priced at around $60. Because video sources can differ, we also provide synchronization frames to help align our annotations with your video source.
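As a rough illustration of how a synchronization frame could be used, the sketch below shifts an annotated frame index from a reference video onto a user's copy. It is only a sketch under stated assumptions: the function name `remap_frame`, its parameters, the idea that annotations are stored as frame indices, and the default frame rates are all hypothetical and are not taken from the released annotation format.

```python
def remap_frame(ref_frame, ref_sync, user_sync, ref_fps=24.0, user_fps=24.0):
    """Map a frame index in the reference video to the user's video.

    All names and defaults here are illustrative assumptions:
      ref_frame - annotated frame index in the reference video
      ref_sync  - index of the synchronization frame in the reference video
      user_sync - index of the same frame, located in the user's video
      ref_fps / user_fps - frame rates of the two sources
    """
    # Time elapsed between the synchronization frame and the annotated frame.
    seconds_from_sync = (ref_frame - ref_sync) / ref_fps
    # Re-express that elapsed time as a frame count in the user's video.
    return user_sync + round(seconds_from_sync * user_fps)


if __name__ == "__main__":
    # Same frame rate: the annotation is simply shifted by the sync offset.
    print(remap_frame(1200, ref_sync=100, user_sync=130))
    # Different frame rates: the elapsed time is rescaled before shifting.
    print(remap_frame(340, ref_sync=100, user_sync=130, ref_fps=24.0, user_fps=30.0))
```

This assumes the two sources differ only by a constant offset and a constant frame rate; sources with cuts or inserted material would need more than one synchronization point.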

Material Acknowledgement and Disclaimer

Tom and Jerry is a material of Turner Entertainment Company (Warner Bros. Entertainment Inc.). All rights reserved. We do not claim any ownership of or rights to the Tom and Jerry material. All other trademarks, service marks, trade names and any other material referenced in this document are the property of their respective owners.

Citation

@article{parmar2024causalchaos,
  title={CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes},
  author={Parmar, Paritosh and Peh, Eric and Chen, Ruirui and Lam, Ting En and Chen, Yuhan and Tan, Elston and Fernando, Basura},
  journal={arXiv preprint arXiv:2404.01299},
  year={2024}
}