ML Ops
Machine learning operations
Hi all! Had a fun brainstorming discussion with my colleague about issues we have seen rolling out ML in production - who’s got one to add? :) (Note, I will use this in an upcoming talk, and reference you (if you want)) – David Aronchick
If you could tell a company about to deploy their first ML platform 3 things to know before they start, what would they be? – Peter Skomoroch
Responses:
- Log everything and archive artifacts from every step, training through prediction. Oh, and log everything else too. No, really, log EVERYTHING. Did I mention logging everything?
- Log the exact feature vector that was used to make a prediction. Don’t trust that you can recreate it later when you want to add it to a training set.
- 1) Prioritize use cases by value and effort: hit high value, low effort first for “quick wins” 2) Failure is a feature, not a bug. Fail fast and learn 3) Overcommunicate with all stakeholders, especially the business side providing the funding.
- More specific ones: 1) as per deronaucoin?, track your code and your data, and keep a journal 2) choose a framework that can do what you need but isn’t too painful for techops 3) enable rapid iteration (daily or faster) of ideas against prod data
- 1) Design the interface to match the user, e.g. full-stack data scientists aren’t afraid of Docker 2) stay language and framework agnostic 3) invest in monitoring and logging upfront and bring in good engineers to do this
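
To make the "log the exact feature vector" advice above concrete, here is a minimal Python sketch. None of it comes from the thread: the function names (`predict_and_log`, `load_logged_examples`), the record fields, and the JSON-lines format are all assumptions chosen for illustration. The idea is just that the inputs the model actually saw get written next to the prediction, so they never have to be recreated later.

```python
import json
import time
import uuid
from typing import Callable, Dict, List, Sequence, TextIO


def predict_and_log(
    model_fn: Callable[[Sequence[float]], float],  # hypothetical model callable
    features: Sequence[float],
    log_file: TextIO,
    model_version: str,
) -> float:
    """Run one prediction and append the exact inputs and output to a JSON-lines log."""
    prediction = model_fn(features)
    record = {
        "request_id": str(uuid.uuid4()),   # lets you join labels back later
        "timestamp": time.time(),
        "model_version": model_version,    # which model produced this output
        "features": list(features),        # the exact vector the model saw
        "prediction": prediction,
    }
    log_file.write(json.dumps(record) + "\n")
    return prediction


def load_logged_examples(log_file: TextIO) -> List[Dict]:
    """Read the log back, e.g. to build a training set from served traffic."""
    return [json.loads(line) for line in log_file if line.strip()]
```

In practice `log_file` would be something durable (an append-only file, a queue, a table) rather than an in-memory buffer, and the record would likely also carry feature names and a schema version. Those choices depend on your stack and are left open here.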