Is Machine Learning ready to take over the world?

5 Key Data Science Learnings from Zillow’s ‘iBuying’ Failures

Pitfalls and considerations when working on DS/ML features

Pritish Jadhav


Once upon a Time:

  • In December 2019, Zillow started a house-flipping hustle, “Offers”, which allowed sellers to sell their houses directly to Zillow, thereby eliminating the lengthy bidding and closing process.
  • Zillow saw this as an opportunity to monetize and profit from the rich real-estate data at its disposal.
  • The “Offers” program was powered by a time-series forecasting model — Zestimate.
  • Zillow even ran a $1.2M Kaggle competition to improve the accuracy of its prediction algorithm, Zestimate.

The Flippinggate:

  • In the Summer of 2021, Zillow went on a shopping spree by buying $1.2B worth of houses.
  • On 2nd November 2021, Zillow reported subpar financial results: revenue missed Wall Street estimates by $267M, while earnings per share came in at -$0.95, missing estimates by a whopping 702%.
  • Zillow’s CEO, Richard Barton, acknowledged the flaw in the company’s home-price forecasting model: Zillow had overpaid, losing approximately $81,000 on each of the 3,032 houses it bought over the summer of 2021.
  • “We’ve determined the unpredictability in forecasting home prices far exceeds what we anticipated,” Mr. Barton said in a statement accompanying its quarterly financials.
  • Zillow also announced that it would be shutting down its house-flipping business and laying off 25% of its workforce.

Reactions:

  • Zillow’s stock price nosedived more than 11% on 2nd November 2021, and it is down more than 50% YTD. That is brutal!!
  • Social media has been buzzing with opinions and post-mortems about how Zillow missed a trick, how the machine learning model failed, Covid’s impact on the housing market, the blame games, and so on.

A Million Dollar Question:

  • If we take a step back from all the commentary and criticism, a more burning question is:

Is this the last time a data product powered by machine learning will fail?

  • AI and machine learning have grown by leaps and bounds over the last decade and are a cornerstone of many businesses.
  • As these prediction models start contributing directly to the revenue stream, there will be instances where they fail.
  • These failures will have a more pronounced impact on a company’s bottom line and its future (oh, the irony!!!).
  • People are afraid of AI taking over the world, but the fact is that businesses are still figuring out ways to reliably monetize these predictive data products.
  • Having established that, it is imperative to identify the pitfalls in the process that can ultimately lead to a disaster.

1. Understanding and Defining the Problem:

  • Data Science and ML features are notoriously open-ended. However, it is important to distinguish between the problem space and the solution space.
  • Solving a problem that we don't completely understand is probably the very first pitfall.
  • It is absolutely crucial to take the time to define and understand the problem we are trying to solve.
  • Some of the questions that I ask myself and the stakeholders during this phase of the development cycle include:

What is the business problem that we are solving for?

Who are we solving for? What is our target audience?

What is the expected outcome?

What are the current processes and benchmarks?

What are the initial set of assumptions?

What is the expected impact?

  • It is important to note how the problem definition process does not mention Data Science, AI, ML, or Deep Learning.
  • Marrying the solution space before having a grasp of the problem space is a recipe for disaster.

2. Defining the Right Success Metrics:

  • Defining the right set of success metrics for a predictive feature is critical. In fact, this is where the development of predictive modeling differs from traditional software development.
  • Often, the DS/ML and analytics teams are focused on improving accuracy, MSE, MAE, RMSE, F-score, and other technical metrics.
  • Countless iterations are made to improve these technical metrics. Sometimes, Kaggle competitions are leveraged to extract every last drop of optimization on them.
  • It is easy to obsess over the technical metrics while completely losing sight of the business.
  • For instance, Zillow maintained its faith in Zestimate by reiterating its low error rates, but the fact is the business kept losing money quarter over quarter. In such a scenario, the “low” error rates are moot.
  • I highly recommend defining a mix of technical and business success metrics for a DS/ML project; a rough sketch of what such a mix could look like follows this list.
  • Visibility and continuous monitoring of these success metrics are also important considerations in the development process.
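To make the distinction concrete, here is a minimal Python sketch of reporting a technical forecasting metric (MAE/MAPE) side by side with a business metric (realized P&L per flip). The numbers, the 6% selling-cost assumption, and the house-flipping framing are purely illustrative and do not reflect Zillow’s actual data or internals.

import numpy as np

# Hypothetical predicted vs. realized sale prices for a batch of flipped houses.
# All numbers are illustrative only.
predicted_prices = np.array([410_000, 395_000, 520_000, 610_000])
realized_prices = np.array([388_000, 401_000, 471_000, 552_000])
purchase_prices = np.array([402_000, 380_000, 505_000, 598_000])  # what the model told us to pay
selling_costs = 0.06 * realized_prices  # assumed fees/renovation overhead

# Technical metrics: how accurate is the forecast?
mae = np.mean(np.abs(predicted_prices - realized_prices))
mape = np.mean(np.abs(predicted_prices - realized_prices) / realized_prices)

# Business metrics: did we actually make money on the flips?
profit_per_house = realized_prices - purchase_prices - selling_costs
total_pnl = profit_per_house.sum()
loss_making_share = np.mean(profit_per_house < 0)

print(f"MAE: ${mae:,.0f} | MAPE: {mape:.1%}")
print(f"Total P&L: ${total_pnl:,.0f} | Loss-making flips: {loss_making_share:.0%}")

A model can look healthy on the first two numbers and still be bleeding money on the last two, which is exactly the trap of monitoring only technical metrics.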

3. Iteration 1:

  • With a well-defined problem and a set of robust technical and business success metrics, it is now time to shift the focus on the solution space.
  • I always prefer keeping the first iteration of the feature/predictive model simple while focusing on visibility and ease of debugging.
  • The goal of the first iteration is NOT production deployment.
  • It is an opportunity to learn more about the data at our disposal and ways to model it in a way that makes intuitive sense.
  • During this phase of the project, focusing on error rates and error types (Type-1/Type-2) is more important than improving success metrics at all costs.
  • To be honest, I count on my first model to fail miserably. Understanding why a simple model fails and decoupling the failures from model complexity will pay higher dividends than blindly adding layers of complexity.
  • An unbiased first iteration can be a very powerful tool for understanding the problem and the underlying data on a deeper level; a minimal sketch of such a first pass follows this list.
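As an illustration only, here is a bare-bones first iteration in Python, framed as a binary “will this flip be profitable?” classifier on synthetic data (a toy framing of my own, not Zillow’s). The point is not the score; it is having something simple enough that the Type-1 and Type-2 errors can be inspected and explained.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for a couple of simple features,
# e.g., scaled price-per-sqft and days-on-market.
rng = np.random.default_rng(42)
n = 1_000
X = rng.normal(size=(n, 2))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Iteration 1: a deliberately simple, interpretable baseline.
baseline = LogisticRegression().fit(X_train, y_train)
y_pred = baseline.predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"Type-1 errors (false positives, i.e., 'buy' calls we should not have made): {fp}")
print(f"Type-2 errors (false negatives, i.e., good deals we passed on): {fn}")

Digging into the rows behind those false positives and false negatives teaches far more at this stage than squeezing out another point of accuracy.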

4. The Power of Kaizen:

  • Which of the following options would you choose?

Option 1: A one-shot optimization formulation that leverages a deep neural net with billions of parameters, trained on a GPU machine, with an estimated development time of 6 months and an estimated accuracy of 95% on the hold-out set.

VS

Option 2: A model with thousands of parameters, an estimated development time of 1 month, and an estimated accuracy of 80% on the hold-out set.

  • I would advocate for Option 2. Trading some accuracy for shorter development time and lower complexity lets me understand the failure cases before adding more layers of complexity.
  • For all you know, the accuracy can be improved significantly by adding more relevant features/data rather than more layers of complexity.
  • It is also important to remember that the hold-out set is a sample, not the population. There is always a risk of overfitting the model by optimizing for accuracy on the hold-out set.
  • Kaizen is a Japanese term for “continuous improvement”.
  • Applying an iterative process of improving your predictive model, as opposed to a one-shot optimization, minimizes the risk of the model failing in the real world with no visibility; a rough sketch of such a loop follows this list.
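Purely as a sketch (synthetic data, scikit-learn, and a ridge model chosen for convenience), here is what a small Kaizen-style loop could look like: start with a deliberately small feature set and only keep an addition when it measurably improves a cross-validated error.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression problem standing in for a price-forecasting task.
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

def cv_mae(features):
    # Cross-validated mean absolute error for a given feature subset.
    scores = cross_val_score(Ridge(), X[:, features], y,
                             scoring="neg_mean_absolute_error", cv=5)
    return -scores.mean()

selected = list(range(3))                # iteration 1: a deliberately small feature set
best_mae = cv_mae(selected)

for candidate in range(3, X.shape[1]):   # small, reviewable increments
    trial = selected + [candidate]
    mae = cv_mae(trial)
    if mae < 0.99 * best_mae:            # keep only meaningful improvements
        selected, best_mae = trial, mae
        print(f"Kept feature {candidate}: CV MAE improved to {mae:,.2f}")

Each accepted increment is small enough to review and reverse, and the metric history across iterations provides exactly the visibility that a single six-month “big bang” model would lack.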

5. Do NOT lose focus of the Original Problem:

  • This sounds ridiculous, but it is not that improbable a state to end up in.
  • The DS/ML field keeps evolving, with new approaches, algorithms, and technologies emerging every single day.
  • There are N different ways of formulating the same problem, M different algorithms to solve it, and T different technologies and frameworks.
  • This turns any given problem into a problem of N x M x T complexity.
  • With so many options and each one of them equally fascinating, it is easy to get stuck in a local optimum or even completely lose the way.

The development cycle of a successful data product with robust predictive power is exhausting and, for the most part, even boring. Avoiding these pitfalls is imperative to achieve the ultimate glory.

Comment below about the pitfalls and considerations you have experienced while working on such complex projects.

Let’s have a chat:

Reach out to me on LinkedIn to brainstorm ideas.
