Research Project

Extrapolative Correlation Attention

Understanding and overcoming the correlation plateau in attention-based regression models.

Illustration of the correlation plateau phenomenon and the proposed extrapolative aggregation mechanism.

Overview

Attention-based regression models are often trained with a joint objective that combines Mean Squared Error (MSE) and the Pearson Correlation Coefficient (PCC). In practice, however, correlation often plateaus early in training even while prediction error continues to decrease.
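As a concrete reference point, a joint MSE + PCC objective of the kind described above might be sketched as follows. This is a minimal NumPy illustration, not the project's training code: the names `pearson_corr` and `joint_loss`, the weighting parameter `lam`, and the choice of `1 - PCC` as the correlation term are all assumptions made for this example.

```python
import numpy as np


def pearson_corr(pred, target, eps=1e-8):
    """Pearson correlation between two 1-D arrays (eps avoids division by zero)."""
    pred_c = pred - pred.mean()
    target_c = target - target.mean()
    denom = np.sqrt((pred_c ** 2).sum() * (target_c ** 2).sum()) + eps
    return float((pred_c * target_c).sum() / denom)


def joint_loss(pred, target, lam=0.5):
    """Hypothetical joint objective: MSE + lam * (1 - PCC).

    `lam` is an assumed trade-off weight; the source does not specify one.
    """
    mse = float(np.mean((pred - target) ** 2))
    pcc = pearson_corr(pred, target)
    return mse + lam * (1.0 - pcc)


target = np.array([1.0, 2.0, 3.0, 4.0])

# Perfectly correlated but wrongly scaled predictions: the PCC term is
# (almost) zero, so the loss is driven by magnitude error alone.
scaled = 2.0 * target
loss_scaled = joint_loss(scaled, target)

# Anti-correlated predictions are penalized by both terms.
flipped = -target
loss_flipped = joint_loss(flipped, target)
```

The `scaled` case shows that the two terms measure different things: correlation is already perfect there, while the magnitude error is large.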

Our work provides the first theoretical analysis explaining this phenomenon. We show that optimizing magnitude accuracy can suppress the gradient signal required to improve correlation structure, leading to the observed plateau.

Key Insight

We show that standard attention mechanisms behave as convex aggregators: because softmax attention weights are non-negative and sum to one, predictions are constrained to the convex hull of the input representations. This limitation imposes a strict upper bound on achievable correlation improvement.
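The convex-hull constraint can be checked numerically. The sketch below assumes plain softmax attention pooling over scalar values, where the convex hull is simply the interval between the smallest and largest value. The function names and the random inputs are illustrative only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_pool(scores, values):
    """Aggregate `values` with softmax weights derived from `scores`."""
    weights = softmax(scores)
    return weights, weights @ values


rng = np.random.default_rng(0)
values = rng.normal(size=6)        # six scalar input representations
scores = rng.normal(size=(4, 6))   # four queries attending over them

weights, pooled = attention_pool(scores, values)

# Each row of weights is non-negative and sums to one, so every pooled
# output is a convex combination of the values.
inside_hull = bool(
    np.all(pooled >= values.min() - 1e-12)
    and np.all(pooled <= values.max() + 1e-12)
)
```

Whatever the scores are, `inside_hull` holds: no pooled output can exceed the largest input value or fall below the smallest one.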

To overcome this limitation, we propose Extrapolative Correlation Attention (ECA), an aggregation mechanism that enables controlled extrapolation beyond the convex hull while preserving stable optimization. Across diverse benchmarks, ECA consistently breaks the correlation plateau and improves predictive correlation without sacrificing MSE performance.