When it comes to datasets, bigger isn't always better, and it is definitely more likely to be irresponsible. Yet datasets are the unsung heroes of modern machine learning and AI systems, just as much as the algorithms, advanced hardware, and models that support them. In this talk, I share some tips and tricks from over 20 years of building and using datasets for search, natural language processing, and ML applications more generally. I discuss the importance of understanding your application task, gotchas with personalisation, the benefits of human diversity, a couple of patterns for dealing with too much or too little data, and, last but not least, some responsible AI considerations.
Building better and more responsible datasets
Peter Bailey
ML Engineering Lead, Search & Recommendations
Canva