July 3, 2020

SageMaker and Metaflow: Modern Machine Learning Deployment at Cleo

Everything you need to know, by the Cleo AI Data Science Team 💡

Successful data science projects typically need a process for building, improving, and deploying machine learning models on a continual basis. Organisations often build effective models but then struggle to get them into production. The problem is so acute that some organisations have hired engineers solely to bring machine learning from proof-of-concept to reality. In this blog post, we’ll talk about the technologies we’ve been implementing at Cleo to make our machine learning deployments simple yet robust.

For some time, we’ve been using Amazon SageMaker to train models in the cloud and deploy them as micro-services. For those unfamiliar with it, SageMaker is Amazon’s machine learning training and deployment service. By making use of Docker and separating the data from the code, SageMaker trains models efficiently: the training instances run only for as long as the training job needs them. Once the model is built, it allows single-click deployment to an endpoint, which our backend services can then query whenever they need an inference. You can read more about Cleo’s use of SageMaker in this guest blog for AWS.
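
As a rough illustration of that last step, here is how a backend service might query an endpoint. This is a sketch rather than Cleo’s actual code: the `predict` helper, the endpoint name, and the JSON payload shape are all assumptions. The only real API it relies on is `invoke_endpoint` from the boto3 SageMaker runtime client, and a fake client stands in for it so the example runs without AWS.

```python
import io
import json


def predict(runtime_client, endpoint_name, features):
    """Send one JSON request to a SageMaker endpoint and decode the JSON reply.

    `runtime_client` is expected to behave like boto3.client("sagemaker-runtime").
    The {"features": ...} payload shape is an assumption; a real model container
    defines its own request and response format.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"features": features}),
    )
    # The response body is a stream, so read it before decoding.
    return json.loads(response["Body"].read())


class FakeRuntimeClient:
    """Offline stand-in for the boto3 client so the sketch runs without AWS."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        features = json.loads(Body)["features"]
        reply = {"endpoint": EndpointName, "score": sum(features)}
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


result = predict(FakeRuntimeClient(), "hypothetical-endpoint", [0.5, 1.5])
print(result)
```

With AWS credentials configured, passing `boto3.client("sagemaker-runtime")` instead of `FakeRuntimeClient()` should send the same request to a live endpoint.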

SageMaker tracks data versions across the later steps of training and deployment, but it doesn’t capture details of the data extraction or preprocessing, and it doesn’t have an easy way to relate each endpoint back to the source code that went into it.  Until recently we were operating SageMaker through an untidy and poorly versioned collection of bash and Python scripts, mixed in with some old-fashioned clicking through the AWS console.

Enter Netflix Metaflow.  Open-sourced and released to the public in December 2019, Metaflow is a Python package that can be used to wrap both the training and deployment workflows of an organisation’s machine learning models.  It arrived just as we were looking to harden our deployment and version control and increase our iteration speed.  We became early adopters.

We were able to get up and running early in the new year with our first model.  We moved the deployment code into a single script — a Metaflow ‘flow’ — with Git and Docker commands handled through their Python SDKs.

With a single command we can now:

  • extract a new dataset
  • perform preprocessing
  • train a number of new models to get the best hyperparameters
  • evaluate the model
  • deploy the model as an endpoint
  • test that endpoint’s performance

Each time the flow runs, it increments the version number and logs all of the parameters, hyperparameters, and datasets of the run. It also tags our GitHub and Docker repositories and all our SageMaker cloud objects with the version number, so we can join the dots between them later.
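
The idea of one version number shared across systems can be sketched in a few lines. The naming scheme, the tag keys, and both helper functions below are assumptions made up for illustration; the only detail taken from AWS is that SageMaker resources accept tags as a list of `Key`/`Value` pairs.

```python
def next_version(previous_versions):
    """Return the next integer version, starting at 1 when there are none yet."""
    return max(previous_versions, default=0) + 1


def version_tags(model_name, version, git_commit):
    """Build matching labels for Git, Docker, and SageMaker from one version."""
    label = f"{model_name}-v{version}"
    return {
        # Name for a Git tag on the commit that produced the model.
        "git_tag": label,
        # Tag part of a Docker image reference, as in repository:tag.
        "docker_tag": f"v{version}",
        # Tag list in the Key/Value format SageMaker resources accept.
        "sagemaker_tags": [
            {"Key": "model_version", "Value": str(version)},
            {"Key": "git_commit", "Value": git_commit},
        ],
    }


version = next_version([1, 2, 3])
tags = version_tags("hypothetical-model", version, "abc1234")
print(tags["git_tag"], tags["docker_tag"], tags["sagemaker_tags"])
```

Because every label derives from the same integer, finding the code, image, and endpoint behind a given model becomes a lookup on one number.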

Metaflow also comes with inbuilt AWS integration: instead of local storage, objects can be persisted to Amazon’s data store, S3. It also comes with a fast S3 client that runs in the background, uploading all the logs and data. That’s a big help for our collaboration: it makes it easy to pick up where a colleague has left off, and allows multiple people to work on the same application without getting in each other’s way.

Building and deploying a new model with a single command means we can focus on data science rather than engineering, and makes switching between models and testing hypotheses speedy and fun.
