MLOps, DevOps, AIOps?
By Nico Lutz
Generally speaking, MLOps lies at the intersection of machine learning engineering, DevOps and data engineering. It’s a fast-growing field targeted at engineers and developers trying to operationalise all facets of machine learning development.
As such, it comes in different flavours, modes and practices, and a variety of tools have been developed to focus on different aspects of the discipline. In my opinion, at least, there’s no one-size-fits-all solution when it comes to these tools. In this article though, I want to highlight how MLOps integrates into the different stages of project development, while diving into the general lifecycle of a machine learning prototype and showcasing how different tools can help throughout a project. I’ll then take a look at how we’ve used MLOps in a recent project, AI Start.
The current state of MLOps
The common components of a machine learning development cycle consist of:
- Data exploration: each machine learning project starts with data and the need to find out what the data actually entails, what you can do with it, and where it comes from (this last part is mostly forgotten, but necessary to understand the bigger picture)
- Data collection and preparation: gathering the data, then cleaning, preparing and pre-processing it to fit your needs
- Model creation: selecting a model, then training and tuning it with respect to the task at hand. It's important to track specific metrics and performance indicators that allow us to compare the different models. Usually this step involves more than just finding the best model; in my experience, cost, deployment constraints and speed also need to be considered when selecting the right model for the job. There's no need to use a bleeding-edge architecture if it's slow and can't perform in a production environment: it's all a trade-off
- Model deployment: the best way to deploy inside a company's network or infrastructure usually needs to be evaluated
- Model monitoring: this groups together tasks that follow a successful deployment. Ideally, scripts for retraining are automated, and a fast and efficient way to switch the inference to newly trained models is set up. In addition, operational metrics (like traffic/latency) or qualitative metrics (like accuracy on retrained models) are collected
MLOps can help to navigate some of these steps, especially in the development phase of a project. For example, tools like MLflow or Weights & Biases can help to keep track of different experiments with various hyper-parameters. They provide a nice UI for comparing different models, and the graphs they offer help me, the developer, communicate to the client the choices I've made and why, which is something that shouldn't be underestimated.
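To make that concrete, here's a minimal sketch of what logging a single training run with MLflow can look like. The tracking URI, experiment name, parameters and metric value are all placeholders rather than anything from a real project:

```python
import mlflow

# Point the client at a tracking server (or leave unset to log to ./mlruns).
mlflow.set_tracking_uri("http://localhost:5000")  # placeholder URI
mlflow.set_experiment("startup-classifier")       # placeholder experiment name

with mlflow.start_run(run_name="baseline"):
    # Hyper-parameters for this run, so runs can be compared in the UI.
    mlflow.log_params({"n_estimators": 200, "max_depth": 8})

    # ... train the model here ...

    # Metrics end up next to the parameters, one row per run.
    mlflow.log_metric("f1", 0.87)  # placeholder value
```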
In an ideal world, an MLOps tool should be able to track experiments, the hyper-parameters and metrics; provide a way to highlight and compare these results; store the results and models in a unified way; and allow clients easy access to the above, for example through S3 buckets. It should be easy to set up and capable of handling everything from hand-coded PyTorch models to ready-to-use Hugging Face or scikit-learn models. Ideally it should also handle some form of data versioning.
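To illustrate the "unified storage" part, the sketch below logs a scikit-learn model as a run artifact and registers it under a single name. The model name is made up, registering models assumes the tracking server is backed by a database, and the artifact store behind it could just as well be an S3 bucket:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

with mlflow.start_run(run_name="rf-registry-example"):
    model = RandomForestClassifier(n_estimators=200, max_depth=8).fit(X, y)

    # Stores the serialised model in the configured artifact store
    # (local disk, S3, ...) and adds it to the model registry so that
    # downstream consumers can load it by name and version.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="startup-classifier",  # hypothetical name
    )
```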
As I mentioned above, there's no one-size-fits-all solution when selecting a tool. Although MLflow has a model registry, for example, it can't handle complex workflows or automated retraining. Other tools like Airflow or Dagster try to bridge this gap by leveraging a graph-based approach to executing scripts. Speaking more broadly, what these libraries have in common is that they try to formulate code (or scripts) as DAGs (directed acyclic graphs), which are then executed based on different conditions, such as time or changes in the database. Of course, it could be argued that these are nothing more than fancy cron job tools, but by providing a UI, debugging tools, deployment tools and more, they can greatly improve the maintainability of your model and the orchestration of large workflow dependencies.
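To give a feeling for what such a DAG looks like in code, here's a minimal retraining sketch, assuming a recent Airflow 2.x release and its TaskFlow API; the task bodies, paths and schedule are placeholders:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@weekly", start_date=datetime(2023, 1, 1), catchup=False)
def retraining_pipeline():
    @task
    def extract_training_data():
        # Pull fresh rows from the database and stage them somewhere.
        return "/tmp/training_data.parquet"  # placeholder path

    @task
    def retrain_model(data_path):
        # Fit a new model on the staged data and persist it.
        return "/tmp/model.pkl"  # placeholder path

    @task
    def publish_model(model_path):
        # Switch the inference service over to the newly trained model.
        ...

    # The call chain defines the edges of the DAG.
    publish_model(retrain_model(extract_training_data()))


retraining_pipeline()
```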
Overall, here’s my takeaway on what an MLOps tool should include:
- Some form of encapsulated versioning: imagine developing several machine learning models, each depending on a different version of packages — a tool should be able to handle these cases but still be able to interconnect the processes
- Easy integration into existing code bases
- An easy way to deploy models
- But also an easy way to deploy the tool
- User/group management
- The ability to track processes
- The ability to collect metrics of processes (time is money: ideally a tool should be able to track the amount of resources a certain process chain takes in order to improve the workflow)
That said, each model, workflow or infrastructure needs a different set-up. It might make sense to isolate some models in a docker container, while, in other cases, a processing pipeline with different models is grouped into one process. In my experience it depends on many factors and each new project offers new perspectives.
MLOps in practice: AI Start
AI Start is one project where we've been using MLOps so far. It's a collaborative effort between Seedstars, Katapult and Bakken & Bæck with the aim of using artificial intelligence and design techniques to provide detailed analysis of early-stage start-ups. You can read in more depth about how we developed an AI engine to predict the likelihood of raising a Series A/B investment round in an earlier article here.
The project relies heavily on a mixture of data sources to inform several data science pipelines developed during our research phase. As such, it’s a prime example of how to use MLOps tools not only during development but also in a production setting.
During the development phase we specifically used MLflow to track experiment outcomes across different teams. This helped us manage the different metrics and models, since working with many people across different companies made it challenging to coordinate duties and assignments. We also found it beneficial to use the open-source data science framework Kedro to enforce best practices when creating pipelines that are reproducible, modular and maintainable.
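For a rough idea of what that structure looks like: a Kedro pipeline is essentially a list of named nodes wired together by dataset names from the project's Data Catalog. The sketch below is a simplified, made-up example of that pattern rather than our actual pipeline; the function and dataset names are invented:

```python
from kedro.pipeline import Pipeline, node, pipeline


def preprocess_companies(raw_companies):
    # Clean and normalise the raw records into model-ready features.
    return raw_companies  # placeholder transformation


def train_classifier(company_features):
    # Fit and return the model; Kedro persists it via the Data Catalog.
    return "trained-model"  # placeholder model object


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(preprocess_companies, inputs="raw_companies",
                 outputs="company_features", name="preprocess_companies"),
            node(train_classifier, inputs="company_features",
                 outputs="classifier", name="train_classifier"),
        ]
    )
```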
While the models and pipelines we developed are powerful at discerning features of early-stage start-ups, they still need to be updated: models have to be retrained as the underlying data changes over time, and features recomputed when new information about a start-up becomes available.
In order to cover these cases and keep enriching the early-stage company data, we automated the relevant workflows by integrating Airflow into the backend stack. Airflow especially shines in building ETL pipelines. ETL is short for "extract, transform and load" and involves gathering data from multiple sources, then transforming it and loading it into a database, for example. As such, the backend has access to the underlying HubSpot data and other similar web sources and looks for changes in the provided data fields. When data fields change, we can trigger new computations of features, which then result in updated classifications.
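A rough sketch of that pattern, again with Airflow, could look like the following; all of the HubSpot-specific logic is left as hypothetical placeholder tasks rather than our real implementation:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def enrich_company_data():
    @task
    def extract():
        # Pull the relevant fields from HubSpot and other sources.
        return []  # placeholder: rows as dictionaries

    @task
    def detect_changes(rows):
        # Compare against what is already stored and return the ids
        # whose data fields have changed since the last run.
        return []  # placeholder: list of changed company ids

    @task
    def recompute_features(changed_ids):
        # Recompute features and classifications only for the changed
        # companies, then load the results back into the database.
        ...

    recompute_features(detect_changes(extract()))


enrich_company_data()
```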
And that’s how our backend utilises MLOps tools to automate the data flow and effectively enrich company data.