Resource: large-scale data exploration, MIC-style
Real-world data are messy. Relationships between two variables can take on an infinite number of forms, and while one doesn’t see, say, umbrella-shaped data very often, strange things can happen. When scientists talk about correlations or associations between variables, they’re usually referring to one very specific form of relationship–namely, a linear one. The assumption is that most associations between pairs of variables are reasonably well captured by positing that one variable increases in proportion to the other, with some added noise. In reality, of course, many associations aren’t linear, or even approximately so. For instance, many associations are cyclical (e.g., hours at work versus day of week), or curvilinear (e.g., heart attacks become precipitously more frequent past middle age), and so on.
Detecting a non-linear association is potentially just