NVIDIA’s generalist AI tech plays Minecraft, wins conference award


NVIDIA’s generalist AI agent has won an award at the recent NeurIPS conference for playing Minecraft, or more precisely, for carrying out in-game actions from written prompts.

The work won the Outstanding Datasets and Benchmarks Paper Award at the 2022 NeurIPS (Neural Information Processing Systems) conference. To build the MineDojo framework for training agents to play Minecraft, NVIDIA researchers drew on a huge 730,000 Minecraft videos from YouTube.


We’re talking over 2.2 billion words transcribed from those videos, 7,000 webpages from the Minecraft wiki, and a huge 360,000 Reddit posts with 6.6 million comments describing Minecraft gameplay. From this data, NVIDIA researchers created a custom transformer model they call MineCLIP, which learns to associate video clips of specific in-game Minecraft activities with text descriptions.
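For the curious, the core idea behind a CLIP-style model like MineCLIP can be sketched in a few lines: video clips and text are mapped into the same vector space, and a clip "matches" a description when their vectors point in similar directions, measured by cosine similarity. The sketch below assumes the embeddings already exist; the helper names (`cosine_similarity`, `best_matching_prompt`) and the tiny 3-dimensional vectors are hypothetical and made up for illustration, not MineCLIP's actual code or outputs.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_matching_prompt(video_embedding: list[float],
                         prompt_embeddings: dict[str, list[float]]):
    """Score every text prompt against one video clip and return the best match."""
    scores = {prompt: cosine_similarity(video_embedding, emb)
              for prompt, emb in prompt_embeddings.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy embeddings, invented for illustration (NOT real MineCLIP outputs).
video = [0.9, 0.1, 0.2]
prompts = {
    "find a desert pyramid": [0.8, 0.2, 0.1],
    "build a nether portal and enter it": [0.1, 0.9, 0.3],
}

best, scores = best_matching_prompt(video, prompts)
print(best)  # → find a desert pyramid
```

The MineDojo paper describes using this kind of video-text similarity as a learned reward for the agent; the sketch above only shows the matching step, not the training.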

Using this, anyone can tell a MineDojo agent what to do in Minecraft in high-level natural language, with commands like “find a desert pyramid” or “build a nether portal and enter it”. The agent then carries out the instruction as a series of steps inside Minecraft. Very cool.


The award-winning paper, titled “MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge”, was released in June 2022. Its authors include Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, and Anima Anandkumar, from NVIDIA and various academic institutions.

NVIDIA explains: “While researchers have long trained autonomous AI agents in video-game environments such as StarCraft, Dota, and Go, these agents are usually specialists in only a few tasks. So NVIDIA researchers turned to Minecraft, the world’s most popular game, to develop a scalable training framework for a generalist agent, one that can successfully execute a wide variety of open-ended tasks”.