One of the issues is that existing analytics and data science platforms aren’t always suited for today’s modern machine learning applications. They’re often not agile enough to deliver the real-time data needed to continuously — and quickly — train machine learning models.
Instead, data science teams can spend large chunks of their time building complex data pipelines, transforming the data and then training the models on the updated data. That complexity, along with the heavy volume of data in the pipelines and the pricing models of the data analytics tools, can hinder agility, auditability and overall organizational efficiency. Here, IT executives break down the challenges they’ve been facing in data pipeline management.
Data platform pricing models
One challenge in data pipeline management is that many commercial platforms charge by volume, according to Len Greski, vice president of technology at travel information company Travelport. The company currently handles more than 100 terabytes of data a day.