API Best Practices Blog
Big Broad Data: Increasing the signal to noise ratio
In my previous post about Making the shift from Big to Broad Data, I made the case for thinking about Big Data not so much as “Big” but as “Broad.” We looked at the explosion of new data sources in today’s economy, which are typically smaller individually and more diverse than the enterprise systems of record of the past. Data comes from a variety of sources such as Twitter, Facebook, partners, tens or hundreds of apps (some built around your APIs), and more.
To respond quickly and make sound business decisions, an enterprise now has to pay attention to data from many more sources than in the past.
Signal extraction and stitching data from diverse sources
Whenever you collect a lot of data, you collect a lot of both signal and noise. In fact, many Big Data approaches to date have focused on extracting the signal from the noise in data collected from traditional enterprise sources.

As the data an enterprise collects has shifted from 5 to 7 primary "enterprise-centric" sources (point-of-sale data, supply records, customer records, warehousing records, and so on) to hundreds of diverse and typically smaller sources (hundreds of apps, hundreds of social networks, business networks, and so on), the size of the data matters far less than the number and diversity of data sources you need to stitch together to extract a meaningful signal.
Now that the footprints of customer and partner behavior are spread across hundreds and maybe thousands of data sources (see Making the shift from Big to Broad Data), the signal-to-noise ratio is almost certainly lower than before. Data must be stitched together so that the signal rises above the noise.
Successful enterprises will be those that understand and consolidate both the data they own and the data they can acquire. Today, the challenge is striking deals and forming partnerships to gain access to hundreds, not handfuls, of data sources. It's no longer about the old styles of purchasing data or obtaining it from other departments in your own organization.
Enterprises need to think about access and control: whether to push analytics to the data or push data to the analytics, and so on. In the past, bringing data into a central enterprise repository for analysis, whether a data warehouse or some other technique, was a less significant concern.
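To make "stitching" a little more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each source can be reduced to events tagged with a shared customer identifier and a topic; the source names, IDs, and topics below are made up. It also assumes a simple rule: a (customer, topic) pair counts as signal only when at least `min_sources` distinct sources report it, and everything else is treated as noise.

```python
from collections import defaultdict

# Hypothetical events as (source, customer_id, topic) tuples.
# Source names and fields are illustrative assumptions, not a real schema.
EVENTS = [
    ("pos", "c1", "late_delivery"),
    ("twitter", "c1", "late_delivery"),
    ("partner_app", "c1", "late_delivery"),
    ("twitter", "c2", "new_logo"),
    ("facebook", "c3", "price_increase"),
    ("support_app", "c3", "price_increase"),
]


def stitch(events):
    """Group events by (customer_id, topic) and record which sources saw each pair."""
    stitched = defaultdict(set)
    for source, customer_id, topic in events:
        stitched[(customer_id, topic)].add(source)
    return stitched


def signals(events, min_sources=2):
    """Keep only (customer_id, topic) pairs reported by at least min_sources
    distinct sources; every other pair is treated as noise."""
    return sorted(
        key for key, sources in stitch(events).items() if len(sources) >= min_sources
    )


if __name__ == "__main__":
    for customer_id, topic in signals(EVENTS):
        print(customer_id, topic)
```

Run as a script, it prints `c1 late_delivery` and `c3 price_increase`; the lone tweet about `c2` is filtered out. In practice, sources rarely share a clean customer key, so matching identities across sources is a hard problem this sketch deliberately skips.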
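The "push analytics to the data" versus "push data to the analytics" choice can be illustrated with a toy Python sketch. The three sources and their (customer_id, amount) rows are hypothetical. Centralizing copies every row before computing a total; pushing the analytics out computes a partial total inside each source and moves only those partial results.

```python
# Hypothetical sources, each holding (customer_id, amount) rows where they live.
SOURCES = {
    "pos": [("c1", 20.0), ("c2", 5.0)],
    "partner_app": [("c1", 7.5)],
    "web": [("c2", 12.0), ("c3", 3.0)],
}


def push_data_to_analytics(sources):
    """Copy every row into one central list, then analyze centrally.

    Returns (total, number_of_items_moved)."""
    central = [row for rows in sources.values() for row in rows]
    total = sum(amount for _, amount in central)
    return total, len(central)


def push_analytics_to_data(sources):
    """Compute a partial total inside each source, then move only the partials.

    Returns (total, number_of_items_moved)."""
    partials = [sum(amount for _, amount in rows) for rows in sources.values()]
    return sum(partials), len(partials)


if __name__ == "__main__":
    print(push_data_to_analytics(SOURCES))  # (47.5, 5)
    print(push_analytics_to_data(SOURCES))  # (47.5, 3)
```

Both functions return a total of 47.5, but the first moves 5 rows while the second moves only 3 partial sums; the gap grows with the size of each source. The trick works here because a sum can be computed piecewise and combined; an aggregate such as an exact median cannot be split this way, which is one reason the choice is a design decision rather than a default.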

Data APIs will lead the way for easy data consumption, flow, and interaction
The fundamental questions are: What mechanisms do enterprises need to stitch together data from their own systems as well as from syndicated and external data sources?
How will people interact with other people’s data? (We’ve got to understand the form in which this data will be exposed and how it will be consumed.)
In Web 1.0, techniques centered on Web crawling; in Web 2.0, they centered on Web pages, AJAX, and other "rich" interface technologies. I think that in Web 3.0, broad, diverse data will be accessed, consolidated, and correlated through the power of APIs.
Today, transactional and data APIs are the yin and yang of APIs, and the API conversation is largely dominated by transactional APIs (which perform tasks like sending messages, making trades, getting credit information, and so on). But I think we’re looking at a revolution in the world of Data APIs, because the future lies in the easy consumption, flow, and interaction of data. Just as APIs are central to the evolution and handling of big, broad, diverse data, data is central to the evolution of APIs. The structure of data APIs therefore becomes increasingly important.
In my next post, we'll look at some of the schools of thought around the structure of Data APIs. We'll explore the different techniques companies are using to collect and stitch data to create individual domains and to expose data out of those domains. See The role of Data APIs.
Making the shift from Big to Broad Data
In my previous post, I laid out why I think we need to move beyond the hype of Big Data technology and “bigness” to focus instead on the breadth and diversity of data, as well as signal extraction, analytics and deep insights from that broad data.
Here we’ll delve into what we mean by "Broad Data," as well as some of the fundamental changes in today’s marketplace that compel businesses to focus on breadth of data and on stitching data from disparate sources.
The shift of control to the edge of the enterprise
Social, mobile, and cloud influences have caused enterprises to undergo a tectonic shift in how they do business with customers. The real value for an enterprise, its interaction with end users (customers), has shifted one or two tiers away from the enterprise. Control is shifting to social networks, where people are talking about companies and products; to business networks, where interactions happen through partner channels; and to apps and the APIs they leverage.
The landscape for customer interaction with enterprises looked significantly different just a few years ago. Data was controlled within the enterprise: all of the data an enterprise gathered was collected when partners and customers interacted with systems produced and provided by the enterprise.

But today’s landscape shows the interaction with customers expanding one or two degrees away from the core of the enterprise. The evolution of the app and API economy means people use apps that may or may not have been created by the enterprise. Those apps, then, are the vehicles that inform the enterprise about how customers and partners are interacting with it.
Factor in the influence of social networks, partner networks, and business networks, and the effect is amplified. Simply put, the enterprise is no longer in control of the data it needs to stay informed and make accurate business decisions. That’s the fundamental shift of interaction to the edge (and even beyond the boundaries) of the enterprise.

This shift in the market has a fundamental implication for the Big Data conversation. The number and variety of data sources is much more important than the volume that comes from any one source.
Big Data becomes Broad Data
Data is not by itself "Big." Aggregated fragments of small, contextually related data make for "Big" (or, more accurately, "Broad") data. Taking advantage of the breadth of the data, its variety, its dynamism, and its disparate sources is the real future.
Just a few years ago, enterprises collected data from physical stores, Web sites, and partners, drawing on 5 to 7 primary data sources. Point-of-sale data, supply records, customer records, warehousing records, and so on reflected all the interesting things happening in an enterprise’s interaction with customers and partners.

Today the sources and types of data are expanding continuously: there are hundreds of new data sources, each generating data (which might be small or not so small) and, almost certainly, a lower signal-to-noise ratio.
The shift is significant: from 100% of data captured from 5 or 6 sources to a scenario in which perhaps less than 50% comes from those original sources. In time, I contend, the old enterprise sources may not even be the most important ones.
These new sources are much smaller and far more varied: Twitter, Facebook, partners, tens or hundreds of apps (some built around your APIs), and the list goes on. This essentially defines the need to shift from the deep, big focus of the old world to the broad, pervasive focus of the new world, which allows businesses to pay attention to all of the new places where a signal relevant to their enterprise might appear.
Whenever you collect lots of data, you of course collect lots of both signal and noise. Next time, we’ll look at increasing the signal-to-noise ratio of broad data in Big Broad Data: Increasing the signal to noise ratio.
Big Data: Beyond the ‘Bigness’ & the Technology (video & slides)
Thanks to all who participated in last week's Webcast, Big Data: Beyond the 'Bigness' & the Technology. We explored moving beyond the "bigness" and technology hype of the typical Big Data conversation to how businesses need to respond to the explosion of new, disparate and dynamic data sources as social, mobile and cloud influences shift customer interaction to the edge of the enterprise.
The video (~35 min.) and slides are below. Thanks @jhingran.
We'd love to continue the discussion on the api-craft forum.



