API Best Practices Blog
Making the shift from Big to Broad Data »
In my previous post, I laid out why I think we need to move beyond the hype of Big Data technology and “bigness” to focus instead on the breadth and diversity of data, as well as signal extraction, analytics and deep insights from that broad data.
Here we’ll delve into what we mean by "Broad Data" as well as some of the fundamental changes for businesses in today’s marketplace that compel the need to focus on breadth of data and on data stitching from disparate sources.
The shift of control to the edge of the enterprise
Social, mobile and cloud influences have caused enterprises to undergo a tectonic shift in how they do business with customers. The real value for an enterprise - the interaction with end users (customers) - has shifted one or two tiers away from the enterprise. The control is shifting to social networks where people are talking about companies and products; to business networks where interactions are happening through partner channels; and to apps and the APIs they leverage.
The landscape for customer interaction with enterprises looked significantly different just a few years ago. Data was controlled within the enterprise: everything an enterprise gathered was collected when partners and customers interacted with systems produced and provided by the enterprise.

But today’s landscape reveals an expansion of the interaction with customers by one or two degrees from the core of the enterprise. The evolution of the apps and API economy has resulted in people using apps that may or may not have been created by the enterprise. Apps, then, are the vehicles that inform the enterprise about how customers and partners are interacting with it.
Factor in the influence of social networks and of partner and business networks, and the effect is amplified. Simply put, the enterprise no longer controls the data it needs to make accurate, informed business decisions. That’s the fundamental shift of interaction to the edge (and even beyond the boundaries) of the enterprise.

This shift in the market has a fundamental implication for the Big Data conversation. The number and variety of data sources are much more important than the volume that comes from any one source.
Big Data becomes Broad Data
Data is not by itself "Big". Aggregated fragments of small and contextually related data make for "Big" or, more accurately, "Broad" data. Taking advantage of the breadth of the data, its variety, its dynamism, and its disparate sources is the real future.
Just a few years ago, an enterprise collected its data from physical stores, Web sites, and partners, through 5 to 7 primary data sources. Point-of-sale systems, supply records, customer records, warehousing records, and so on reflected all the interesting things happening in an enterprise’s interaction with customers and partners.

Today the sources and types of data are expanding continuously. There are hundreds of new data sources, each generating data (which might be small or not-so-small) and each almost certainly carrying a lower signal-to-noise ratio.
The shift is significant: from 100% of data captured from a handful of primary sources to a scenario in which maybe less than 50% comes from those original sources. In time, I contend that the old enterprise sources may not even be the most important ones.
The new sources are many and individually much smaller: Twitter, Facebook, partners, and tens or hundreds of apps, some built around your APIs. The list goes on. This is what drives the shift from the deep and big focus of the old world to the broad and pervasive focus of the new world, a shift that lets businesses pay attention to all of the new places where a signal relevant to their enterprise might appear.
Whenever you collect lots of data, you of course collect lots of both signal and noise. Next time, we’ll look at increasing the signal-to-noise ratio of broad data - Big Broad Data: Increasing the signal to noise ratio »
Big Data: Beyond the ‘Bigness’ & the Technology (video & slides) »
Thanks to all who participated in last week's Webcast, Big Data: Beyond the 'Bigness' & the Technology. We explored moving beyond the "bigness" and technology hype of the typical Big Data conversation to how businesses need to respond to the explosion of new, disparate and dynamic data sources as social, mobile and cloud influences shift customer interaction to the edge of the enterprise.
The video (~35 min.) and slides are below. Thanks @jhingran.
We'd love to continue the discussion on the api-craft forum.
Big Broad Data: Beyond the “bigness” and the technology to extracting meaning »
The amount of data in our world has been exploding, and the concept of “Big Data” (collecting and analyzing large data sets) needs no introduction. It’s the buzzword of 2012 where IT is concerned.
There's been a focus on the business side of Big Data, which of course is a critical component of the discussion. Big Data is most certainly the next frontier for innovation, competition and productivity (McKinsey Global Institute, 2011).
However, a quick Google search, a track of #bigdata in your Twitter feed, or 5 minutes in a conversation with folks about "Big Data" will show you how the weight of the discussion focuses primarily on two things - first on technology and then on "bigness".
While both are important, I think that the focus on technology and size is misplaced. It causes us to miss the point that the depth of analysis of the data and the insights we get from it are the most important and valuable things.
“What’s your tool set?”
Hardly a conversation happens in the big data space that doesn’t start with the pros and cons of Hadoop, NoSQL, Cassandra, HBase . . . the list goes on. Technology is of course extremely important because without it we couldn't determine the signal over the noise or handle large data sets. But the technology is almost a commodity. (And of course, trying to get two of us technologists to agree on a technology is a whole different discussion.)
"How big is your data anyway?"
Right behind the technology argument is the “Bigness”: the petabytes vs. terabytes argument. There are certainly technical complexities to dealing with petabytes of data. But terabytes and even kilobytes are big and, more importantly, they too hold valuable information.
Remember that a lot of the size will come from noise whether you’re dealing with kilobytes, terabytes, or petabytes. Big, noisy data is not valuable - the value will come from the signal that you can extract.
Extracting Meaning
To successfully glean value from big data, we've got to pivot the discussion to focus on the breadth of the data, signal extraction, and deep insights. This should make us think about the areas above or below the technology and not about the technology itself. Bottom line - the data itself is the real gold, the new currency.
The disruptive technologies of social, mobile, and cloud that are transforming how we do business serve up the breadth of data. Data about a business' customers is available and interpretable in all kinds of new contexts. A customer that checked in at the gym on Foursquare before visiting a retailer is likely to be interested in sports stuff. You can imagine hundreds of similar examples.
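As a rough illustration of what stitching such context together might look like, the sketch below joins check-in events from one source with store visits from another on a shared customer identifier. Everything here is hypothetical: the `checkins` and `store_visits` records, the `customer_id` key, and the two-hour window are assumptions made for the example, and real sources rarely share such a clean key.

```python
from datetime import datetime, timedelta

# Hypothetical check-in events from a social source (assumed shape).
checkins = [
    {"customer_id": "c1", "venue_type": "gym", "at": datetime(2012, 5, 1, 17, 0)},
    {"customer_id": "c2", "venue_type": "cafe", "at": datetime(2012, 5, 1, 9, 0)},
]

# Hypothetical store visits from the retailer's own systems (assumed shape).
store_visits = [
    {"customer_id": "c1", "at": datetime(2012, 5, 1, 18, 30)},
    {"customer_id": "c2", "at": datetime(2012, 5, 1, 15, 0)},
]


def recent_context(visit, checkins, window=timedelta(hours=2)):
    """Return venue types the same customer checked in at shortly before the visit."""
    return [
        c["venue_type"]
        for c in checkins
        if c["customer_id"] == visit["customer_id"]
        and timedelta(0) <= visit["at"] - c["at"] <= window
    ]


# Stitch the two sources: each store visit gains the customer's recent context.
stitched = [
    {**visit, "recent_venues": recent_context(visit, checkins)}
    for visit in store_visits
]

for row in stitched:
    print(row["customer_id"], row["recent_venues"])
```

With this sample data, only the first customer's visit picks up a gym check-in, because the second customer's check-in falls outside the assumed window.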
What's a good example of value from extracting signal over noise? A Klout Score uses data from social networks to measure reach and influence. It is a signal extracted from a superabundance of tweets and other social interactions.
Deep Insight is about how people can take the output of the machines and convert it into business value. For example, we might learn that shopping cart abandonment is higher from apps on Android devices than on iPhone devices, which may suggest that the Android apps are less persuasive.
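To make the cart abandonment comparison concrete, here is a minimal sketch of how such a rate might be computed per platform. The `sessions` data and the `abandonment_rate_by_platform` helper are invented for illustration; a real analysis would also need enough sessions to rule out noise before drawing any conclusion.

```python
from collections import defaultdict

# Hypothetical cart sessions as (platform, purchased) pairs.
sessions = [
    ("android", False), ("android", False), ("android", True), ("android", False),
    ("iphone", True), ("iphone", False), ("iphone", True), ("iphone", True),
]


def abandonment_rate_by_platform(sessions):
    """Share of started carts that did not end in a purchase, per platform."""
    started = defaultdict(int)
    abandoned = defaultdict(int)
    for platform, purchased in sessions:
        started[platform] += 1
        if not purchased:
            abandoned[platform] += 1
    return {platform: abandoned[platform] / started[platform] for platform in started}


rates = abandonment_rate_by_platform(sessions)
print(rates)
```

The number itself is only the machine output. Deciding why one platform abandons more often, and what to do about it, is the part that still needs people.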
There’s also a fundamental change for businesses because of the apps and API economy that compels the need to focus on breadth of data and data stitching from disparate sources.
I’ll talk more in upcoming posts about "broad" data and "data stitching" as well as how Data APIs will lead the way in the exploding apps and API economy. We also discussed these topics in a Webcast last week. (video and slides here)



