Building a Media Mix Modeling Practice at The Knot Worldwide

By: James Pooley, Machine Learning Scientist

January 17, 2024

 

At The Knot Worldwide, we focus on helping couples navigate the complexities of their entire wedding journey and beyond. We offer couples a Vendor Marketplace to connect with wedding professionals as well as an assortment of planning tools, invitation and registry services, and much more.

As with any modern company, marketing plays a key role in informing our customers about our services; and as with all marketing investments, questions regarding the measurement of marketing performance are crucial for reporting and strategic planning. As a data scientist supporting marketing at The Knot Worldwide, I spend a lot of time thinking about these questions and the complexity of the problem.

In this post, we’ll provide an overview of how we’re approaching marketing measurement at The Knot Worldwide; in particular, we’ll focus on our utilization of media mix modeling (MMM).

 

Measuring Marketing Performance

Historically, marketing at The Knot Worldwide was heavily focused on digital channels and, consequently, last-touch attribution played a key role in measuring performance and informing planning decisions. However, as we have evolved and our marketing mix has expanded, a need has arisen to have a more holistic approach to measurement. Despite having been in use since the 1960s, MMM is uniquely suited to this role for the following reasons:

  • It provides a view of both online and offline marketing investments.
  • It uses aggregated historical data rather than raw data on individual users, and is privacy-friendly.
  • It both measures performance and, importantly, provides recommendations on how to better allocate our marketing spend.

The Knot Worldwide has recently expanded its marketing mix significantly with the launch of its largest-ever integrated marketing campaign (recently announced by our CMO, Jenny Lewis), adding a number of channels, including podcasts, OLV, CTV, and more.

With the increased investment in upper funnel brand-building channels to complement our lower funnel performance channels, the limitations of last-touch attribution and the benefits of MMM for reporting have become more pronounced. Because MMM is a relatively new core competency for us, and because I was tasked with helping get the practice up and running as quickly as possible, an early and important step was evaluating the MMM options available to us.

 

A Key Decision: Build vs. Buy

For companies wishing to utilize MMM, a number of options are on the table, including:

  • Outsourcing model building to advertising agencies or bespoke MMM consultancies
  • Utilizing new and constantly evolving SaaS MMM platforms
  • Building an MMM solution in house

There is no one-size-fits-all recommendation as to which of these options is the best fit for all companies. We chose to go the route of the third option. However, thanks to the hard work of open-source software developers, we don’t have to start from scratch and can leverage a variety of open-source tools and frameworks tailored to MMM.

 

Our Immediate Needs

Although we are a global company with many facets to our business, our first concerns were building national-level MMMs for digital marketplace leads in the United States, and finding an MMM framework that allowed us to build trustworthy models with both statistical and business validity.

With these considerations in mind, and having decided to go the route of building an MMM practice in house, how did we go about choosing the open-source frameworks to utilize? The remainder of this post will provide an abbreviated version of our approach to selecting and validating an MMM framework, while also demonstrating the challenge of marketing measurement.

 

Robyn and pymc-marketing: A High-Level Comparison

Two open-source modeling frameworks receiving significant buzz in the MMM community are Robyn (developed by Meta) and pymc-marketing (developed by the consultancy PyMC Labs). Both frameworks are useful, and as a data scientist, I naturally wanted to take both for a test drive before applying them to our internal data. I'll use a simple example with simulated data taken from pymc-marketing's documentation.

 

pymc-marketing

The newest MMM framework on the block, pymc-marketing, is a Python-based Bayesian approach developed by the consultancy PyMC Labs. Bayesian statistics brings with it a number of benefits, including:

  • Ability to model all uncertainty (e.g., regarding model parameters, missing data, etc.) using probability distributions
  • Ability to build arbitrarily complex models to match the complexities of your business

While we won’t be exploring these features (or the many additional benefits of a Bayesian approach) in this post, suffice it to say that pymc-marketing provides an out-of-the-box MMM of a kind widely used by media mix modelers. PyMC Labs has a nice example showing its ability to recover key marketing performance measurements. In particular, their simulated weekly dataset with two media channels has the following true values (a minimal sketch of fitting such a model follows the list below):

  • Channel 1: carryover effect of 40%, contribution share of 81%
  • Channel 2: carryover effect of 20%, contribution share of 19%
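
To make the exercise concrete, here is a minimal sketch of fitting pymc-marketing's out-of-the-box model to a weekly two-channel dataset like this one. The file name, column names ("date_week", "x1", "x2", "y"), and exact constructor arguments are assumptions based on the PyMC Labs example; the library's API has evolved across releases, so treat this as a sketch to adapt to your installed version rather than production code.

```python
import pandas as pd
from pymc_marketing.mmm import DelayedSaturatedMMM

# Weekly spend for two channels (x1, x2) plus the target KPI (y).
# The file and column names are placeholders for the simulated dataset.
df = pd.read_csv("simulated_mmm_data.csv")

mmm = DelayedSaturatedMMM(
    date_column="date_week",
    channel_columns=["x1", "x2"],
    adstock_max_lag=8,  # the simulation caps carryover at 8 weeks
)

X = df[["date_week", "x1", "x2"]]
y = df["y"]
mmm.fit(X, y)

# Posterior summaries of the adstock (carryover) and channel-contribution
# parameters are what we compare against the known simulation values.
```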

To demonstrate the inherent complexity of media measurement even in this simple scenario, we’ll repeat a subset of their exercise using another framework which we’ll now introduce. (For a more detailed illustration of tuning Bayesian MMMs such as those implemented by pymc-marketing, please see Slava Kisilevich’s excellent article Modeling Marketing Mix Using PyMC3.)

 

Robyn

Robyn is an approach to MMM based on traditional machine learning principles that also attempts to help incorporate business context in a semi-automated, “human-in-the-loop” fashion, reducing the time it takes to produce actionable insights. Its codebase is a mixture of R and Python, and it is currently limited to national-level models (i.e., models without a geographic split, such as by Nielsen DMA). Instead of utilizing probability distributions to model uncertainty, it performs a large search over the space of MMM parameters, building thousands of models in the process and whittling the options down to those with both a good statistical fit and a good “business fit” to the observed data. A data scientist can then choose from these models or further refine their MMM. (For anyone wishing for a more detailed overview of Robyn, this post by Recast as well as Robyn’s excellent documentation are highly recommended resources.)

While there are numerous differences compared to the model implemented in pymc-marketing (e.g., the saturation curves implemented, the various options for how carryover effects are implemented, etc.), we’d like to see what applying Robyn to PyMC Labs’s simulated dataset with minimal tuning returns.

  • Overall Model Fit: First, we’ll check the statistical fit of the model to the data. Although there are issues with the measure, R2 is commonly used to measure how well a model matches actual data. (For a discussion of the issues with relying on this measure, Recast has an excellent post on the topic. In practice, we look at other measures of “model goodness” to assess which of the MMMs we build is best for our application.) For a basic Robyn model applied to the simulated data, R2 is a respectable 90% (which can be seen in the top chart), indicating that the model accounts for a good portion of the variance in the data.
  • Carryover Recovery: Using the model, we can also gain insight into the decay rate of our media investments (shown in the bottom left chart). Using the fairly common geometric adstock decay, Robyn tells us, for example, that about 30% of our investment in channel 1 will carry over from week to week (see the sketch after this list). Such insights are crucial for marketers trying to account for the delayed effects of media exposure on the KPIs they care about.
  • Contribution Shares Recovery: Compared to the actual contribution shares of 81% and 19% for channels 1 and 2, respectively, Robyn understates channel 1’s contribution at 67% and overstates channel 2’s at 34% (as shown in the bottom right chart). However, the results are directionally aligned.
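
For readers unfamiliar with the geometric adstock transformation mentioned above, here is a minimal Python sketch of how it works. The function and example values are ours for illustration; Robyn implements the transformation in its own R/Python codebase.

```python
import numpy as np

def geometric_adstock(spend: np.ndarray, theta: float) -> np.ndarray:
    """Geometric adstock: each week retains a fraction theta of the prior
    week's adstocked value, so theta = 0.3 means roughly 30% of the effect
    carries over from one week to the next."""
    adstocked = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + theta * carry
        adstocked[t] = carry
    return adstocked

# Example: a single burst of spend in week 0 decays geometrically.
print(geometric_adstock(np.array([100.0, 0.0, 0.0, 0.0]), theta=0.3))
# [100.   30.    9.    2.7]
```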

 

Considerations for Marketing Data Scientists

Out of the box, Robyn did not recover the known source-of-truth values of the simulated data; however, its core estimates were directionally consistent with those values. This is not to suggest that Robyn is a bad framework to consider for MMM. Quite the opposite! In fact, there are many additional avenues for fine-tuning Robyn models not explored here (e.g., more complex adstock transformations, addition of different features, etc.). Additionally, it is important to note some differences between the simulated data and the model Robyn applies to these data. For example, the so-called geometric adstock in Robyn does not apply a maximum lag to the carryover effect, whereas in the simulated data the carryover effect was capped at 8 weeks.
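
To make that difference concrete, here is a sketch of a geometric adstock truncated at a maximum lag, the kind of cap present in the simulated data (8 weeks) but absent from Robyn's geometric adstock. It is a generic illustration, not code from either library.

```python
import numpy as np

def geometric_adstock_capped(spend: np.ndarray, theta: float, l_max: int = 8) -> np.ndarray:
    """Geometric adstock truncated at l_max lags: adstock[t] is the sum of
    theta**lag * spend[t - lag] for lag = 0..l_max, so media older than
    l_max weeks contributes nothing."""
    weights = theta ** np.arange(l_max + 1)            # decay weights for lags 0..l_max
    padded = np.concatenate([np.zeros(l_max), spend])  # pad so early weeks have a full window
    return np.array([
        padded[t : t + l_max + 1][::-1] @ weights      # most recent week gets weight theta**0
        for t in range(len(spend))
    ])
```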

Our main point is to highlight the challenges of accurate marketing measurement. As with all business applications of data science, working closely with stakeholders and utilizing all our business knowledge and all the measurement tools in our arsenal is critical to construct a useful model that matches our business reality as closely as possible. As the saying familiar to every statistician goes: “All models are wrong, but some are useful.” Knowing we’ll never be able to capture all of reality in our model, our goal is to have a useful model that empowers marketing stakeholders with actionable insights. Robyn and Bayesian models such as those supported by pymc-marketing both enable marketing data scientists to produce useful models.

In this post we focused on the recovery of selected high-level reporting measures from MMMs. Of course, at The Knot Worldwide we’re also using MMMs for their other key capabilities: understanding marketing channel saturation and, in turn, using that information for scenario planning and optimized budget allocation across channels (a toy illustration of such an allocation follows the list below). Both Robyn and pymc-marketing outputs support this functionality; so, when choosing an MMM framework, what additional considerations should data scientists take into account?

  • Geo-Level Models: Robyn does not currently support geo-level models, and neither do pymc-marketing’s default models. However, the flexibility of the Bayesian approach means that users can utilize functionality from pymc-marketing to build custom geo-level models. Robyn, in particular, will be most useful for data scientists building national-level MMMs.
  • Time-Varying Coefficients: Allowing modeled marketing performance to change over time rather than remaining static. While neither framework currently supports this out of the box, developers are actively working on integrating it into pymc-marketing.
  • Two-Stage Models: Allowing us to explicitly model the impact of upper funnel variables on lower funnel variables and metrics, thus allowing the complexity of the MMM to more closely match the complexity of the real-world marketing environment. Again, Robyn does not support such models, while the flexibility of the Bayesian approach means that users can again utilize functionality from pymc-marketing in the process of building custom models.
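
As promised above, here is a toy illustration of what an optimized budget allocation means once an MMM has estimated per-channel saturation curves: given a Hill-style response curve for each channel, find the split of a fixed budget that maximizes total predicted response. The curve parameters and channel names are made up, and the snippet uses scipy directly rather than either framework's allocator.

```python
import numpy as np
from scipy.optimize import minimize

def hill_response(spend: float, half_sat: float, slope: float, top: float) -> float:
    """Hill saturation curve: response grows with spend but with diminishing returns."""
    return top * spend**slope / (half_sat**slope + spend**slope)

# Hypothetical fitted curve parameters for two channels.
channels = {
    "channel_1": dict(half_sat=50_000.0, slope=1.2, top=8_000.0),
    "channel_2": dict(half_sat=20_000.0, slope=0.9, top=3_000.0),
}
total_budget = 100_000.0

def negative_response(split):
    # split[0] is channel_1's share of the budget; the remainder goes to channel_2.
    spends = np.array([split[0], 1.0 - split[0]]) * total_budget
    return -sum(hill_response(s, **p) for s, p in zip(spends, channels.values()))

result = minimize(negative_response, x0=[0.5], bounds=[(0.0, 1.0)])
print(f"Optimal share for channel_1: {result.x[0]:.0%}")
```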

For all these reasons, although Robyn has been a wonderful framework to work with to help “bootstrap” our MMM practice at The Knot Worldwide, a Bayesian approach is an increasingly attractive option for our future MMM work.

 

Concluding Thoughts

While TKWW is still at a relatively early stage in its MMM journey, we are excited about where we are and more excited about what’s coming. From stakeholder buy-in to implementation, building an in-house MMM practice is not simple. However, thanks to the work of open-source developers authoring the frameworks considered here, it becomes a much easier task.

As parting words, I’d like to thank the developers of both Robyn and pymc-marketing, as developing open-source software can often feel like a thankless job. However, I (and I’m sure the rest of the MMM community would agree) greatly appreciate all their hard work!