The New Data Economy: What Your Business Should Know
Arindam Sen, Senior Sales Director
We are all aware that AI is here. We are also aware that AI depends on data. Businesses increasingly recognize the importance of good data literacy and good data practices, sometimes as a precursor to a good AI strategy and sometimes as a critical element alongside it. But the data landscape is changing too.
AI, as a technology, builds everything by learning patterns from data. The last decade has seen remarkable growth in AI because of a perfect storm: massive quantities of data meeting really powerful computational hardware. While the amount of data required to build effective AI depends very much on the problem domain, it is commonly believed that larger datasets open up more options for powerful AI. As the world wakes up to the need for AI capabilities to improve national, regional and corporate competitiveness, access to data becomes a core element of this competitiveness.
Traditionally, data came from past experience. For example, a bank may have ten years' worth of data about its customers. A hospital has records of its patients. In such cases, data literacy and management became synonymous with ensuring that the data an organization had access to was protected, well managed and effectively used. Today, however, an organization's internal data is only one of the levers available.
It is now possible, thanks to advances in technology, to augment existing data (that is, to generate new data programmatically), to use synthetic data from simulations and other means, or simply to apply AI techniques that are less data-dependent. Depending on your domain, one or more of these may become a key part of your data strategy.
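To make the idea of generating new data programmatically a little more concrete, here is a minimal sketch of one simple form of augmentation: adding small random variations to existing numeric records. The function name `augment_with_jitter`, the sample values, and the 5% noise level are all assumptions chosen for illustration; a real project would pick augmentation techniques suited to its own domain and data types.

```python
import random

def augment_with_jitter(rows, copies=2, noise=0.05, seed=0):
    """Return the original rows plus `copies` slightly varied versions of each.

    Hypothetical helper for illustration: each new value is the original
    value scaled by a random factor within +/- `noise` (5% by default).
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    augmented = [list(row) for row in rows]  # keep the original records
    for row in rows:
        for _ in range(copies):
            augmented.append(
                [value * (1 + rng.uniform(-noise, noise)) for value in row]
            )
    return augmented

# Made-up example records, e.g. (customer age, monthly spend)
original = [[52.0, 1200.0], [37.0, 860.0], [45.0, 990.0]]
expanded = augment_with_jitter(original)
print(len(original), "rows ->", len(expanded), "rows")
```

In this sketch three records become nine. Whether such variations are realistic enough to help an AI model is a judgment that depends on the problem at hand.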
Another important capability of AI is the ability to develop one AI from another through mechanisms such as fine-tuning or transfer learning. A class of AIs known as Foundation Models is typical of this trend. They have often been trained on enormous datasets, but they may be deployed in applications without access to the original training data, and can even be tuned for a range of purposes, drawing on what they learned from the underlying dataset.
As the critical role of data in AI competitiveness is more and more widely recognized, concern arises that an AI race will become a race for data. To promote more equitable access to AI, entities are creating public data repositories, ranging from medical biopsy images (the Cancer Imaging Archive is one example) to public datasets for training Generative AI models that contain trillions of text tokens and billions of images.
Finally, it is even possible to buy and sell data in marketplaces. This is particularly interesting because it puts a price on data and so connects directly to its value. In some industries, custom datasets are also curated, labeled, and sold by companies that specialize in generating data.
What does this mean for your business? It means that a data strategy is integral to any business that seeks to exploit AI, and while a data strategy will always include careful management of internal data, it is no longer limited to that. A data strategy for a business, as I would consider it today, could include understanding which public data sources are relevant, what digested insights foundation models might bring, what synthetic data and augmentation options are available, and what data, if any, can be purchased. All of these can build toward a good and powerful data strategy that can then enable Return on Investment (ROI) from AI solutions.