Broken data economics

For the past few years, big data has been all the rage, but it has not delivered on all its promise. We were expecting omniscience at our fingertips, and we’re getting moderately well-targeted advertising instead.

Don’t be fooled, though: big data is not going anywhere. On the contrary, data keeps growing at a dizzying rate, increasing in volume, variety, velocity, and veracity. IDC estimates that in 2025, the world will create and replicate 163 zettabytes of data (a zettabyte is a trillion gigabytes), a tenfold increase from the amount created in 2016. This is the big data era for which companies have been preparing for years.

And IT is undergoing transformative changes to manage big data and embrace technology innovation. Five years ago, the most common big data use case was “ETL offload” or “data warehouse optimization”, but now more projects have requirements to:

  • Focus on the velocity, variety, and veracity of data, e.g., moving to real time, ingesting streaming data, supporting more data sources in the cloud, and making sure data can be trusted.
  • Add automation and intelligence, such as machine learning, to data pipelines.
  • Run across different cloud platforms, leveraging containers and serverless computing.

Almost concurrently, because companies have all recognized the value of data, data consumers are multiplying across the enterprise. Everyone wants access to data to provide better insight, and new data roles, such as data scientists, data engineers, and data stewards, have emerged to analyze and manage it. A recent Gartner report states that by 2020, the number of data and analytics experts in business units will grow at three times the rate of experts in IT departments. And big data is addictive: once you use it in your analytics, you want more of it to provide even deeper insight.

And this is the challenge: companies, and in particular IT departments, can no longer keep pace with data demands or with the users requesting access. To harness data, companies have made ever-bigger annual investments in software solutions, in their infrastructure, and in their IT teams. However, simply increasing the IT budget and resources is not a sustainable strategy, especially as data keeps growing and more users pop up eager to get their hands on it. Current data economics are clearly broken, and the looming challenge looks particularly daunting with the rise of hybrid cloud and the proliferation of applications.

Gartner has stated that “through 2020, integration will consume 60% of the time and cost of building a digital platform.” At Talend, we believe we can completely change this statistic. Instead of throwing more money at big data, why not be smarter about how you enable users and use technology to manage data? Talend enables just that.

Enabling More People

If companies are to realize more value from more of their data, they need to make data management a team sport. Rather than each group going its own way, business and IT need to collaborate, so everyone can access, work with, and share trusted data. This is the balancing act between collaboration and self-service. At Talend, we believe the productivity gains and cost savings are significant when users across the organization collaborate and manage data from the same platform, just as it is better to standardize on a productivity suite like Microsoft Office than to use different tools for word processing, spreadsheets, and presentations. Talend Data Fabric supports this through a range of new self-service apps for developers, data scientists, and other data workers. With these new apps, business and IT can collaborate on integration, transformation, and governance tasks more efficiently, and easily share work between apps.

In addition to business productivity gains through persona-based apps, IT improves efficiency because the apps are all governed and managed through the Talend platform:

  • a central management console for managing users, projects, and licenses;
  • a common way to share data, data pipelines, and metadata across on-premises and cloud deployments;
  • a single DevOps framework;
  • one method for implementing security and privacy across all your data.

So instead of spending time integrating and managing each of your integration tools, you spend more time delivering data-driven insight.

By implementing governed, self-service, collaborative data management, trusted data flows faster through your business and everyone becomes more confident in the decisions they make.

Today, Talend provides the following apps:

  • Talend Studio – a developer power tool to quickly build cloud and big data integration pipelines.
  • Talend Cloud Data Preparation – an easy-to-use, self-service tool where IT, marketing, HR, finance, and other business users can easily access, cleanse, and transform data.

And we just announced:

  • Talend Cloud Data Stewardship – a team-based, self-service data curation and validation app where data users quickly identify, manage, and resolve data integrity issues.
  • Talend Cloud Data Streams – a self-service web UI, built in the cloud, that makes streaming data integration faster, easier, and more accessible, not only for data engineers but also for data scientists, data analysts, and other ad hoc integrators, so they can collect and access data easily.

Talend Data Fabric and its set of applications for different data workers are all about enabling everyone to work on data together.

Embracing Technology Innovation

Enabling more users is the first lever for companies; embracing innovation is the second.

The collective innovation of the entire technology ecosystem has been focused on reimagining what we do with data and how we do it. Everything in the data world gets continually reinvented from the bottom up at an accelerating pace. There is an endless supply of new technologies for consuming data that improve both performance and cost.


Companies need to be able to embrace all this innovation and get the benefits of cloud computing, containers, machine learning, and whatever comes next. This is why Talend chose to build on an open, native architecture from day one. Talend Studio is a model-driven code generator: you graphically design and deploy a data pipeline once, and can easily switch technologies to support new data requirements. With support for over 900 components, including databases, big data, messaging systems, and cloud technologies, it is easy to change a data source or target in your pipeline and then let Talend generate the optimized code for that platform. This abstraction layer between design and runtime, combined with Talend’s continued support for new technologies, means IT teams no longer have to worry about big migration projects, and can easily gain all the performance, security, functionality, and management capabilities of the underlying platform, such as Hadoop. Today, Talend includes support for AWS, Microsoft Azure, Google Cloud Platform, Apache Spark, Apache Beam, Docker, and serverless computing, providing the ultimate flexibility and portability to run your data pipelines anywhere.
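To make the design/runtime separation concrete, here is a minimal conceptual sketch of the model-driven idea: a pipeline is described once as an abstract model, and interchangeable runtime backends decide how it actually executes. This is not Talend's actual API; all names here (`Pipeline`, `LocalRuntime`) are illustrative assumptions.

```python
# Conceptual sketch of a design/runtime abstraction layer (hypothetical names,
# not Talend's API): the pipeline model is declared once, and any backend that
# knows how to execute it can be swapped in without changing the design.

from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Pipeline:
    """Abstract pipeline model: a source, a chain of transforms, a sink name."""
    source: Iterable
    transforms: List[Callable]
    sink: str


class LocalRuntime:
    """One possible backend: interprets the model in-process.

    A Spark or serverless backend would instead generate/submit
    platform-specific code for the same model.
    """
    def run(self, pipeline: Pipeline) -> list:
        rows = list(pipeline.source)
        for transform in pipeline.transforms:
            rows = [transform(row) for row in rows]
        # A real backend would write results to pipeline.sink (e.g., S3, HDFS).
        return rows


# Design the pipeline once...
model = Pipeline(source=[1, 2, 3], transforms=[lambda x: x * 10], sink="demo")
# ...then run it on whichever backend fits today's platform.
print(LocalRuntime().run(model))  # [10, 20, 30]
```

Swapping the target platform then means adding a new runtime class, not redesigning the pipeline, which is the portability property the paragraph above describes.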

The 2018 imperative for companies is to put more data to work for their business, faster. Because data is sprawling and users are multiplying, incremental improvement won’t cut it. Companies must become exponentially better at putting data to work. The only solution to this data challenge is a force multiplier: enable more users, at a lower cost, with more innovative technologies.