The new Dreamer AI system, developed by Google’s DeepMind team, has successfully learned how to mine diamonds in Minecraft — without any direct instruction on how to play the game. Mining diamonds is a challenging, multi-phase task that the AI model mastered through a trial-and-error approach known as reinforcement learning. The developers describe this as a significant breakthrough toward building AI systems capable of transferring knowledge from one domain to another.
Why train an AI model on Minecraft?
In Minecraft, players explore a 3D virtual world that includes many different environments, such as deserts, swamps, mountains, and forests. They collect resources from these environments and turn them into objects such as chests and swords. They also collect items, with diamonds being among the most prized possessions in the game.
Each time someone plays Minecraft, the game randomly generates a new world, so no two play-throughs are exactly the same. This makes the game a great choice when researchers want to train an AI system to generalize knowledge across different scenarios.
Mining diamonds is a “very hard task” for an AI to learn
Multiple AI researchers have focused specifically on finding diamonds in Minecraft because it’s a complex, multi-step process. First, players must gather resources to build the necessary tools, such as a pickaxe and crafting table. Next, they must dig down to a deep enough level, then search for diamonds by tunneling. They must also avoid hazards such as being burned by lava or falling into caverns.
Collecting diamonds is a “very hard task” for artificial intelligence, according to Jeff Clune, a computer scientist at the University of British Columbia in Vancouver, Canada. “There is no question this [Dreamer AI system] represents a major step forward for the field,” Clune told the scientific publication Nature.
Dreamer system uses reinforcement learning to achieve results
The Dreamer team used a trial-and-error machine learning technique called “reinforcement learning.” The AI model explores the game on its own, identifying which actions are more likely to result in rewards in the game (such as mining diamonds). The model then repeats those actions, while avoiding others that are less likely to reap rewards.
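The trial-and-error loop described above can be illustrated with a minimal sketch. Note this is not DeepMind’s actual Dreamer algorithm (which learns a “world model” of the game); it is a classic tabular Q-learning toy, where an agent in a five-cell corridor learns by trial and error that moving right eventually yields a reward — a stand-in for “actions that lead to diamonds get repeated”:

```python
import random

# Toy environment: a 5-cell corridor; only the last cell holds a reward
# (a stand-in for "finding a diamond"). Actions: 0 = left, 1 = right.
N_STATES = 5
GOAL = N_STATES - 1

def step(state, action):
    """Apply an action; reward 1.0 only when the goal cell is reached."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: value of each (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise exploit the best-known action.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Nudge the estimate toward reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q

q = train()
# After training, "move right" dominates in every non-goal state.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1]
```

The key idea matches the article: actions that lead toward the reward accumulate higher value estimates and are repeated, while dead-end actions are gradually avoided. Dreamer’s innovation is doing this planning inside a learned model of the environment rather than purely by acting in it.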
The researchers reset the game every half hour so that the Dreamer system didn’t become too well conditioned to any particular environment. It takes the model about nine days of continuous play to find one diamond. Even though this is far more time than a human player needs (experts can usually find a diamond in 20 to 30 minutes), researchers say it still represents a big step forward for AI models.
“This is a notoriously hard problem and the results are fantastic,” said Keyon Vafa, a computer scientist at Harvard University in Cambridge, Massachusetts.