Why AI struggles to grasp cause and effect

When you look at the following short video sequence, you can make inferences about causal relations between different elements. For instance, you can see the bat and the baseball player’s arm moving in unison, but you also know that it is the player’s arm that is causing the bat’s movement and not the other way around. You also don’t need to be told that the bat is causing the sudden change in the ball’s direction.

Likewise, you can think about counterfactuals, such as what would happen if the ball flew a bit higher and didn’t hit the bat.

Such inferences come to us humans intuitively. We learn them at a very early age, without being explicitly instructed by anyone and just by observing the world. But for machine learning algorithms, which have managed to outperform humans in complicated tasks such as go and chess, causality remains a challenge. Machine learning algorithms, especially deep neural networks, are especially good at ferreting out subtle patterns in huge sets of data. They can transcribe audio in real-time, label thousands of images and video frames per second, and examine x-ray and MRI scans for cancerous patterns. But they struggle to make simple causal inferences like the ones we just saw in the baseball video above.

In a paper titled “Towards Causal Representation Learning,” researchers at the Max Planck Institute for Intelligent Systems, the Montreal Institute for Learning Algorithms (Mila), and Google Research, discuss the challenges arising from the lack of causal representations in machine learning models and provide directions for creating artificial intelligence systems that can learn causal representations.

This is one of several efforts that aim to explore and solve machine learning’s lack of causality, which can be key to overcoming some of the major challenges the field faces today.

Independent and identically distributed data

Why do machine learning models fail at generalizing beyond their narrow domains and training data?

“Machine learning often disregards information that animals use heavily: interventions in the world, domain shifts, temporal structure — by and large, we consider these factors a nuisance and try to engineer them away,” write the authors of the causal representation learning paper. “In accordance with this, the majority of current successes of machine learning boil down to large scale pattern recognition on suitably collected independent and identically distributed (i.i.d.) data.”

i.i.d. is a term often used in machine learning. It supposes that random observations in a problem space are not dependent on each other and have a constant probability of occurring. The simplest example of i.i.d. is flipping a coin or tossing a die. The result of each new flip or toss is independent of previous ones and the probability of each outcome remains constant.

When it comes to more complicated areas such as computer vision, machine learning engineers try to turn the problem into an i.i.d. domain by training the model on very large corpora of examples. The assumption is that, with enough examples, the machine learning model will be able to encode the general distribution of the problem into its parameters. But in the real world, distributions often change due to factors that cannot be considered and controlled in the training data. For instance, convolutional neural networks trained on millions of images can fail when they see objects under new lighting conditions or from slightly different angles or against new backgrounds.

ImageNet images vs ObjectNet images