API Best Practices Blog

Big Broad Data: The role of Data APIs »

In previous posts on this topic of Big Broad Data, we looked at some of the reasons for and implications of enterprises shifting their focus from the “bigness” and technology hype of “Big Data” to breadth and diversity, signal extraction, analytics and deep insights.

The future lies in the easy consumption, flow, and interaction of data, and that is driving a revolution in the world of Data APIs. As a result, the structure of Data APIs becomes increasingly important.

Building an Information Halo around APIs

Let’s consider enterprises as one of two types from a data perspective: those for whom data is the core business and those who give data away to attract increased transactions to the core of their business. 

In the latter case, the data (information) itself is not necessarily monetizable, but it attracts people to the business.

For a great discussion about the notion of information halos around your core business, see Sam Ramji’s talk: Amundsen’s Dogs, Information Halos and APIs: The epic story of your API Strategy.

In both scenarios, data is a fundamental and critical part of an API strategy.

Enterprises that are monetizing data are beginning to plant flags in different domains. Domains are forming around weather, finance, real estate, Internet traffic, and dozens or hundreds of other areas.

The people building the data in any domain are doing so by collecting from disparate data sources. To build out any one domain, they’ve probably stitched together data from a large number of data sources, cleansed and standardized it before finally exposing it as an API.

What are companies using to collect and stitch the data?

A natural and familiar stitching technique is the linked data model (linkeddata.org). While linked data techniques are excellent at accessing individual data elements, I argue that this is not the model that these data providers need. Instead they need to crawl, bulk load, and access data in large quantities, before cleansing, standardizing, and delivering it.

If linked data is not the most effective method to stitch data together to create the domains (at the bottom of the stack), can linked data become the de-facto standard to express data out of the information halo (at the top of the stack and as the Data API for domains)?

The answer is probably yes, eventually, but I think it is an unlikely scenario just yet. Today, the challenge is how to cleanse, standardize, unify and use the data in individual domains. Linked data techniques have the right characteristics to bring together data that has already been cleansed, standardized, and stitched, but they are not a great model for doing the initial stitching. They will most likely become useful and common in the future, when the inter-linking of domains becomes more important than it is today.

If linked data is not the approach to expose data as APIs, what is?

The school of thought to which I subscribe is one of schema-based access to data APIs patterned after relational models. Here are a few examples of data APIs, which highlight three common kinds of data access patterns.

* Primary key lookup - to get to a specific data element.
* Imposed hierarchy-based lookup - in which you have classes with hierarchy and in effect traverse the hierarchy to get to the data elements.
* Rectangular lookups - defined by typical relational lookups of rows and columns.

All of these techniques are being built around single data sources as opposed to massively linked data sources.
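To make the three patterns concrete, here is a minimal sketch over a single, in-memory data source. The station records, field names, and function names are all hypothetical and chosen only for illustration; a real Data API would expose these lookups over HTTP rather than as local function calls.

```python
# Hypothetical in-memory "weather" domain; every name and value is illustrative.
STATIONS = {
    "st-1": {"id": "st-1", "country": "US", "region": "CA", "city": "Oakland", "temp_c": 18},
    "st-2": {"id": "st-2", "country": "US", "region": "CA", "city": "Fresno", "temp_c": 31},
    "st-3": {"id": "st-3", "country": "US", "region": "NY", "city": "Albany", "temp_c": 12},
}

# The imposed hierarchy, from the top level down.
LEVELS = ("country", "region", "city")


def by_key(station_id):
    """Primary key lookup: fetch one data element by its identifier."""
    return STATIONS.get(station_id)


def by_path(*path):
    """Hierarchy-based lookup: traverse country/region/city to reach elements."""
    return [
        station for station in STATIONS.values()
        if all(station[level] == value for level, value in zip(LEVELS, path))
    ]


def rectangle(columns, where=lambda row: True):
    """Rectangular lookup: pick columns from the rows that match a predicate."""
    return [
        {column: row[column] for column in columns}
        for row in STATIONS.values()
        if where(row)
    ]
```

For example, `by_key("st-2")` returns one record, `by_path("US", "CA")` returns every station under that branch of the hierarchy, and `rectangle(["city", "temp_c"], lambda row: row["temp_c"] > 15)` returns a rows-and-columns slice, much as a SELECT with a WHERE clause would.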

The structure and “RESTification” of Data APIs

There are several approaches to Data APIs. In addition to the perspective that is Pragmatic REST for SQL Developers, there’s Microsoft’s OData approach. As I asserted in a recent talk at an OData Meetup event at Microsoft, OData is a step in the right direction but there are certain things OData needs to do to become the de-facto standard.

The “RESTification” of the Data APIs is a fundamental imperative and both the Pragmatic REST for SQL and the OData approaches are good starting points.
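As a rough illustration of what "RESTifying" relational access can look like, the sketch below translates a resource-style URL into a parameterized SQL query and runs it against an in-memory SQLite database. The URL convention (`fields`, `limit`, and column filters as query parameters), the `customers` table, and the function names are my own assumptions for the example, not the Pragmatic REST for SQL or OData specification. OData expresses similar ideas with system query options such as `$select`, `$filter`, and `$top`.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Hypothetical schema: a whitelist of tables and columns, so that nothing
# taken from the URL is interpolated into SQL without being checked first.
SCHEMA = {"customers": ("id", "name", "city")}


def url_to_sql(url):
    """Translate /<table>?fields=a,b&<column>=<value>&limit=n into (sql, params)."""
    parts = urlsplit(url)
    table = parts.path.strip("/")
    if table not in SCHEMA:
        raise ValueError(f"unknown table: {table}")

    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    fields = query.pop("fields", ",".join(SCHEMA[table])).split(",")
    limit = query.pop("limit", None)

    # Whatever remains in `query` is treated as an equality filter on a column.
    for column in fields + list(query):
        if column not in SCHEMA[table]:
            raise ValueError(f"unknown column: {column}")

    sql = f"SELECT {', '.join(fields)} FROM {table}"
    params = list(query.values())
    if query:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in query)
    if limit is not None:
        sql += " LIMIT ?"
        params.append(int(limit))
    return sql, params


def demo():
    """Run one translated query against a small in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    db.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Boston"), (2, "Lin", "Austin"), (3, "Raj", "Boston")],
    )
    sql, params = url_to_sql("/customers?fields=name,city&city=Boston&limit=5")
    return sql, db.execute(sql, params).fetchall()
```

The point of the design is that the URL stays readable to a SQL developer (table, columns, filter, limit) while the values still travel as bound parameters rather than as string fragments.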

Whatever the solution is, it cannot be vendor specific. Data is too important, and the data revolution too fundamental for it to be associated with any one vendor.

OData technologies need to be available in all ecosystems, not just in the Windows Communication Foundation (WCF) library and the .NET Framework. Similarly, pragmatic REST and other techniques cannot be available only in Apigee or in any other single vendor's offering.

Let the call to action be to come together as a community; get the best of the linked data and OData ideas and techniques together and transform the world with Data APIs.

The conversation has already started and we’d love to hear more of your perspectives and arguments over on api-craft.

Big Broad Data: Increasing the signal to noise ratio »

In my previous post about Making the shift from Big to Broad Data, I made the case for thinking about Big Data not so much as “Big” but as “Broad.” We looked at the explosion of new data sources in today’s economy, which are individually typically smaller and more diverse than the enterprise systems of record of the past. Data comes from a variety of sources like Twitter, Facebook, partners, tens and hundreds of apps (some built around your APIs), and more. 

To stay responsive and make sound business decisions, an enterprise simply has to pay attention to data from many more sources than in the past.

Signal extraction and stitching data from diverse sources

Whenever you collect a lot of data, you collect a lot of both signal and noise. In fact, a lot of the Big Data approaches to date are focused on extracting the signal from the noise in the data collected from traditional enterprise sources.

As the data an enterprise collects has shifted from 5 to 7 primary "enterprise-centric" data sources (point-of-sales data sources, supply records, customer records, warehousing records, and so on) to hundreds of diverse and typically smaller sources (hundreds of apps, hundreds of social networks, business networks, and so on), the size of the data doesn’t matter nearly as much as the number and diversity of data sources you need to stitch together to extract a meaningful signal.

Now, with the footprints of customer and partner behavior spread across hundreds and maybe thousands of data sources (see Making the shift from Big to Broad Data), there's undoubtedly a lower signal-to-noise ratio in any one source than before. Data need to be stitched together to ensure that the signal rises above the noise.
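One simple way to see why stitching helps is a toy simulation. It rests on strong assumptions of my own: every source observes the same underlying signal, each adds independent random noise, and "stitching" is nothing more than averaging the sources point by point. Real stitching involves cleansing and matching records, so treat this purely as an illustration of the statistical intuition. The function names and parameters are hypothetical.

```python
import random
import statistics


def rms_error(estimates, truth):
    """Root-mean-square distance between an estimate and the true signal."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truth)]
    return (sum(squared) / len(squared)) ** 0.5


def simulate(num_sources, num_points=500, noise_sd=1.0, seed=7):
    """Return (error of one source alone, error after averaging all sources)."""
    rng = random.Random(seed)
    # Assumed underlying signal: a square wave alternating between 1 and 0.
    truth = [1.0 if i % 50 < 25 else 0.0 for i in range(num_points)]
    # Each source sees the signal plus its own independent Gaussian noise.
    sources = [
        [t + rng.gauss(0.0, noise_sd) for t in truth]
        for _ in range(num_sources)
    ]
    # "Stitching" here is just the point-by-point mean across sources.
    stitched = [statistics.fmean(column) for column in zip(*sources)]
    return rms_error(sources[0], truth), rms_error(stitched, truth)
```

Calling `simulate(25)` returns the error of a single source next to the error of the stitched series. Under these assumptions the noise in the average shrinks roughly with the square root of the number of sources, which is the sense in which many small, noisy sources can together yield a clearer signal than any one of them.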

Successful enterprises will be those that understand and consolidate the data they own as well as the data they can acquire. Today, the challenge is striking deals and forming partnerships to get access to hundreds, not handfuls, of data sources. It's no longer about the old styles of purchasing data or getting it from departments in your own organization.

Enterprises need to think about access and control: whether they need to push analytics to the data or push data to the analytics, and so on. In the past, bringing data to a central repository in the enterprise where it could be analysed, whether a warehouse or some other store, was a less significant concern.

Data APIs will lead the way for easy data consumption, flow, and interaction

The fundamental questions are these: What mechanisms does an enterprise need to stitch together data from its own systems as well as from syndicated and external data sources?

How will people interact with other people’s data? (We’ve got to understand the form in which this data will be exposed and how it will be consumed.)

In Web 1.0, techniques centered around Web crawling; in Web 2.0 it was about Web pages, AJAX and other "rich" interface technologies. I think that in Web 3.0, broad, diverse data will be accessed, consolidated, and correlated through the power of APIs.

Today, transactional and data APIs are the yin and yang of APIs, and the API conversation is largely dominated by transactional APIs (which achieve tasks like sending messages, making trades, getting credit information, and so on). But I think we’re looking at a revolution in the world of Data APIs because the future lies in the easy consumption, flow, and interaction of data. Just as APIs are central to the evolution and handling of big, broad, diverse data, so data is central to the evolution of APIs. The structure of Data APIs becomes increasingly important.

In my next post, we'll take a look at some of the schools of thought around the structure of Data APIs. We'll explore the different techniques companies are using to collect and stitch data to create individual domains and to expose data out of those domains. See The role of Data APIs »

Making the shift from Big to Broad Data »

In my previous post, I laid out why I think we need to move beyond the hype of Big Data technology and “bigness” to focus instead on the breadth and diversity of data, as well as signal extraction, analytics and deep insights from that broad data. 

Here we’ll delve into what we mean by "Broad Data" as well as some of the fundamental changes for businesses in today’s marketplace that compel the need to focus on breadth of data and on data stitching from disparate sources.

The shift of control to the edge of the enterprise

Social, mobile and cloud influences have caused enterprises to undergo a tectonic shift in how they do business with customers. The real value for an enterprise - the interaction with end users (customers) - has shifted one or two tiers away from the enterprise. The control is shifting to social networks where people are talking about companies and products; to business networks where interactions are happening through partner channels; and to apps and the APIs they leverage.

The landscape for customer interaction with enterprises looked significantly different just a few years ago than it looks today. Data was controlled within the enterprise – all of the data that an enterprise gathered were collected when partners and customers interacted with systems produced and provided by the enterprise.

But today’s landscape reveals an expansion of the interaction with customers by one or two degrees from the core of the enterprise. The evolution of the apps and API economy has resulted in people using apps that may or may not have been created by the enterprise. Apps then are the vehicles that inform the enterprise about how customers and partners are interacting with them.

Factor in the influence of social networks and of partner and business networks, and the effect is amplified. Simply put, the enterprise is no longer in control of the data it needs to make informed, accurate business decisions. That’s the fundamental shift of interaction to the edge (and even beyond the boundaries) of the enterprise.

This shift in the market has a fundamental implication for the Big Data conversation. The number and variety of data sources is much more important than the volume that comes from any one source.

Big Data becomes Broad Data

Data is not by itself "Big". Aggregated fragments of small and contextually related data make for "Big" or, more accurately, "Broad" data. Taking advantage of the breadth of the data, its variety, its dynamism, and its disparate sources is the real future.

Just a few years ago, an enterprise collected its data from physical stores, Web sites, and partners, through 5 to 7 primary data sources. Data from point-of-sales data sources, supply records, customer records, warehousing records, and so on reflected all the interesting things happening with respect to an enterprise’s interaction with customers and partners.

Today the sources and types of data are expanding continuously - there are hundreds of new data sources, each generating data (which might be small or not-so-small) and definitely yielding a lower signal-to-noise ratio.

The shift is significant - from 100% of data captured from 5 to 7 sources to a scenario in which maybe less than 50% comes from those original sources. In time, I contend, the old enterprise sources may not even be the most important ones.

The many new sources are much smaller and relatively recent: Twitter, Facebook, partners, and tens or hundreds of apps, some built around your APIs. The list goes on. This essentially defines the need for the shift from the deep and big focus of the old world to the broad and pervasive focus of the new world. That shift will allow businesses to focus on all of the new places in which there is the potential for a signal relevant to their enterprise.

Whenever you collect lots of data, you of course collect lots of both signal and noise. Next time, we’ll look at increasing the signal to noise ratio of broad data - Big Broad Data: Increasing the signal to noise ratio »