A bit of algebra beat a celebrated AI model at completing knowledge bases. And the same algebra turns out to write clean logical rules—just by multiplying matrices.
The puzzle—and a contrarian bet
Modern knowledge bases like Freebase and WordNet store facts as triples: (subject, relation, object). They’re huge and incomplete. The standard challenge is “link prediction”: given two parts of a triple, recover the third. Most systems learn low‑dimensional “embeddings” for entities and relations and score how plausible a triple is.
A team from Cornell and Microsoft Research took a contrarian bet. Instead of piling on layers and tensors, they stripped things down and asked: which simple interaction between two entity vectors works best? Addition or multiplication?
A unified playbook for relations
In their framework, every entity becomes a vector learned by a neural network. Every relation becomes an operator that scores a pair of entities. The score is a single number: higher means “more likely true.”
Two families cover nearly everything in the literature:
- Linear operators, which nudge the two entity vectors and compare them.
- Bilinear operators, which weigh how each dimension of one entity interacts with each dimension of the other.
Well‑known models fit neatly here. TransE, for example, tries to make y_subject + V_relation ≈ y_object (an additive move). Neural Tensor Networks stack a linear part on top of a bilinear tensor. The authors added a no‑frills option: a bilinear model with a diagonal matrix per relation—known as DistMult.
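Concretely, the three scoring shapes look roughly like this. This is a minimal sketch with made-up dimensions and random vectors standing in for learned embeddings, not the authors' code:

```python
import numpy as np

d = 50                                              # embedding dimension (illustrative)
y_s, y_o = np.random.randn(d), np.random.randn(d)   # subject/object entity vectors

# TransE-style (additive): the relation is a translation vector V_r; a plausible
# triple makes y_s + V_r land close to y_o, so the score is the negated distance.
V_r = np.random.randn(d)
score_transe = -np.linalg.norm(y_s + V_r - y_o)

# General bilinear: the relation is a full d x d matrix M_r weighing how every
# dimension of the subject interacts with every dimension of the object.
M_r = np.random.randn(d, d)
score_bilinear = y_s @ M_r @ y_o

# DistMult: the bilinear matrix is restricted to a diagonal, i.e. an
# element-wise product of the two entity vectors weighted by a vector r.
r = np.random.randn(d)
score_distmult = np.sum(y_s * r * y_o)   # same as y_s @ np.diag(r) @ y_o
```

In every case, a higher score means the triple is judged more plausible.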
Multiplication beats addition
On Freebase (FB15k and a frequent‑relations subset) and WordNet, multiplication won. DistMult beat TransE across the board, even though both have the same number of parameters.
- FB15k: DistMult HITS@10 57.7% vs. TransE 53.9%
- FB15k‑401: DistMult 58.5% vs. TransE 54.7%
- WordNet: DistMult 94.2% vs. TransE 90.9%
More complex didn’t mean better: a tensor‑heavy model (NTN) underperformed, likely overfitting. Why does multiplication help? Think of each relation as a mask that tells the model which dimensions of two entities must “light up together.” Multiplication rewards that coordinated alignment; addition tends to blur it.
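Here is a toy illustration of that intuition, with made-up numbers: a DistMult-style score on a relation that only "cares about" the first two dimensions.

```python
import numpy as np

# Relation mask: only dimensions 0 and 1 matter for this relation.
r = np.array([1.0, 1.0, 0.0, 0.0])

aligned    = (np.array([0.9, 0.8, 0.0, 0.0]), np.array([0.8, 0.9, 0.0, 0.0]))
misaligned = (np.array([0.9, 0.8, 0.0, 0.0]), np.array([0.0, 0.0, 0.9, 0.8]))

def distmult(a, b):
    # Rewards dimensions where both entities are active under the mask.
    return np.sum(a * r * b)

print(distmult(*aligned))      # ~1.44: both entities light up the masked dims
print(distmult(*misaligned))   # 0.0:  activity on different dims earns nothing
```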
Two small tweaks pushed the simple model much further. First, add a tanh layer when projecting entities to control scale and sharpen distinctions. Second, initialize entity vectors with pre‑trained “entity” embeddings (not averaged word vectors), reflecting that most Freebase entries are names and non‑compositional phrases. With those in place, the team reported a top‑10 accuracy of 73.2% on FB15k‑401—versus 54.7% for TransE trained under comparable conditions.
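A minimal sketch of those two tweaks, assuming a single linear layer followed by tanh and a pre-trained vector as the input representation; the shapes, initialization scale, and variable names are illustrative guesses, not the paper's exact architecture:

```python
import numpy as np

def project_entity(x, W):
    """Non-linear projection of a raw entity representation into embedding space."""
    return np.tanh(W @ x)     # tanh bounds outputs to [-1, 1], keeping scales comparable

d_in, d = 100, 50                          # illustrative dimensions
W = 0.1 * np.random.randn(d, d_in)         # projection weights, learned during training

# Tweak 2: start from a pre-trained *entity* vector rather than an average of
# word vectors, since most Freebase names are non-compositional phrases.
pretrained_entity_vec = np.random.randn(d_in)   # stand-in for a pre-trained vector
y_entity = project_entity(pretrained_entity_vec, W)
```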
From scores to rules
Numbers are nice; rules are better. Could these relation embeddings recover human‑readable logic? The team built a method, EmbedRule, that does exactly that.
The idea is simple. Many useful “Horn rules” describe chains like:
BornInCity(a, b) ∧ CityInCountry(b, c) ⇒ Nationality(a, c)
Treat the body as a composition of relations. If relations live as vectors (as in TransE), compose by addition. If relations live as matrices (as in the bilinear models), compose by matrix multiplication. Then search for candidate head relations whose embedding lies nearest to that composed body in embedding space. Use the schema to enforce type‑compatible chains and, after selecting promising candidates, compute each rule's confidence from the data (the share of its predictions that hold).
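A condensed sketch of that search for the bilinear case is below. The Frobenius distance used as the closeness measure, the omission of the schema-based type filter, and the function name are simplifying assumptions made for illustration:

```python
import numpy as np
from functools import reduce

def embed_rule_candidates(body, relation_matrices, top_k=5):
    """Rank candidate head relations for a chain of body relations.

    body:              ordered list of relation names B1, B2, ... in the chain
    relation_matrices: dict mapping relation name -> learned d x d matrix
    """
    # Compose the body by multiplying its relation matrices in order.
    composed = reduce(np.matmul, (relation_matrices[r] for r in body))

    # Score every other relation as a potential head by how close its matrix
    # lies to the composed body (smaller distance = more promising rule).
    # The real method would first filter heads for schema/type compatibility.
    dists = {head: np.linalg.norm(M - composed)
             for head, M in relation_matrices.items() if head not in body}
    return sorted(dists, key=dists.get)[:top_k]

# e.g. embed_rule_candidates(["BornInCity", "CityInCountry"], matrices) would
# ideally rank "Nationality" near the top; surviving candidates are then
# re-scored on the data to obtain each rule's confidence.
```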
On Freebase, matrix‑based embeddings shone. Using multiplication, EmbedRule consistently outperformed AMIE—a leading symbolic rule miner—on length‑2 rules, especially those involving composition. It surfaced clean, intuitive rules that AMIE missed because too few instances appeared explicitly in the data, for example:
TVProgramCountryOfOrigin(a, b) ∧ CountryOfficialLanguage(b, c) ⇒ TVProgramLanguage(a, c)
For length‑3 chains, full bilinear matrices were best at the very top ranks; the diagonal (DistMult) version caught up as more predictions were considered. That gap points to a known limitation: diagonal matrices can’t model cross‑dimension interactions or distinguish a relation from its inverse; longer chains need that extra expressiveness.
Why composition works
There’s a neat algebra hiding here. If a relation B is modeled by a matrix M_B and true triples make y_a^T M_B y_b large, then chaining two relations B1 then B2 tends to make y_a^T (M_B1 M_B2) y_c large. That’s exactly matrix multiplication. The assumptions are approximate, but the behavior holds well enough to mine rules without ever enumerating entities.
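Spelled out, the argument is roughly this; it is an idealized sketch, and the approximation step is an assumption carried over from the text rather than a proof:

```latex
% Idealization: if (a, B_1, b) holds and its score is near the maximum,
% treat the relation as carrying the subject onto the object:
%   y_a^{\top} M_{B_1} \approx y_b^{\top}.
% Then, for any c such that (b, B_2, c) also holds,
\[
  y_a^{\top} \, (M_{B_1} M_{B_2}) \, y_c
  \;\approx\; y_b^{\top} M_{B_2} \, y_c ,
\]
% which is large precisely because the second triple is true. The product
% M_{B_1} M_{B_2} therefore behaves like an embedding of the composed relation.
```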
Why this matters
This work delivers two practical lessons. First, in multi‑relational data, multiplicative interactions are a strong default: they’re simple, scalable, and they generalize. Second, relation embeddings aren’t just scores; they carry structure. With the right algebra—addition for vectors, multiplication for matrices—they write rules quickly, often better than symbolic search, and at a fraction of the cost.
A bit of algebra didn’t just beat a heavyweight model at filling in facts; it also showed how those facts fit together.


