API Best Practices Blog
Big Broad Data: The role of Data APIs
In previous posts on this topic of Big Broad Data, we looked at some of the reasons for and implications of enterprises shifting their focus from the “bigness” and technology hype of “Big Data” to breadth and diversity, signal extraction, analytics and deep insights.
The future lies in the easy consumption, flow, and interaction of data, which is driving a revolution in the world of Data APIs. The structure of Data APIs is becoming increasingly important.
Building an Information Halo around APIs
Let’s consider enterprises as one of two types from a data perspective: those for whom data is the core business and those who give data away to attract increased transactions to the core of their business.
In the latter case, the data (information) itself is not necessarily monetizable, but it attracts people to the business.
For a great discussion about the notion of information halos around your core business, see Sam Ramji’s talk: Amundsen’s Dogs, Information Halos and APIs: The epic story of your API Strategy.
In both scenarios, data is a fundamental and critical part of an API strategy.
Enterprises that monetize data are beginning to plant flags in different domains: weather, finance, real estate, Internet traffic, and dozens or hundreds of other domains are forming.

The people building out the data in any domain do so by collecting it from disparate data sources. To build out any one domain, they have probably stitched together data from a large number of sources, then cleansed and standardized it before finally exposing it as an API.
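To make that collect, cleanse, standardize, and stitch sequence concrete, here is a minimal Python sketch for a weather domain. The two feeds, their field names, their units, and the rule that a later source overrides an earlier one are all illustrative assumptions, not a description of how any particular data provider works.

```python
# Minimal sketch: building one "weather" domain from two hypothetical feeds.
# Feed contents, field names, and units are illustrative assumptions.

FEED_A = [  # hypothetical feed: Fahrenheit, lowercase station IDs
    {"station": "sfo-01", "temp_f": 68.0},
    {"station": "nyc-02", "temp_f": None},  # missing reading
]
FEED_B = [  # hypothetical feed: Celsius, padded uppercase station IDs
    {"id": "NYC-02 ", "celsius": 21.5},
    {"id": "LON-03", "celsius": 14.0},
]


def cleanse(records: list[dict], temp_field: str) -> list[dict]:
    """Drop records whose temperature reading is missing."""
    return [r for r in records if r.get(temp_field) is not None]


def standardize_a(r: dict) -> dict:
    """Map a feed-A record onto the common schema (uppercase ID, Celsius)."""
    return {
        "station_id": r["station"].strip().upper(),
        "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1),
        "source": "A",
    }


def standardize_b(r: dict) -> dict:
    """Map a feed-B record onto the same common schema."""
    return {
        "station_id": r["id"].strip().upper(),
        "temp_c": r["celsius"],
        "source": "B",
    }


def stitch(*batches: list[dict]) -> dict[str, dict]:
    """Merge standardized batches into one domain keyed by station_id.
    If two sources report the same station, the later batch wins."""
    domain: dict[str, dict] = {}
    for batch in batches:
        for record in batch:
            domain[record["station_id"]] = record
    return domain


# The stitched domain is what would finally be exposed through a Data API.
weather_domain = stitch(
    [standardize_a(r) for r in cleanse(FEED_A, "temp_f")],
    [standardize_b(r) for r in cleanse(FEED_B, "celsius")],
)
print(weather_domain["SFO-01"])
# {'station_id': 'SFO-01', 'temp_c': 20.0, 'source': 'A'}
```

Note that the NYC-02 reading survives only because feed B supplied it after feed A's empty record was cleansed out.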

What are companies using to collect and stitch the data?
A natural and familiar stitching technique is the linked data model (linkeddata.org). While linked data techniques excel at accessing individual data elements, I argue that this is not the model these data providers need. Instead, they need to crawl, bulk load, and access data in large quantities before cleansing, standardizing, and delivering it.
If linked data is not the most effective method to stitch data together to create the domains (at the bottom of the stack), can linked data become the de facto standard to express data out of the information halo (at the top of the stack and as the Data API for domains)?
The answer is probably yes, eventually, but I think that scenario is unlikely just yet. Today, the challenge is how to cleanse, standardize, unify, and use the data in individual domains. Linked data techniques have the right characteristics to bring together data that has already been cleansed, standardized, and stitched, but linked data is not a great model for the initial stitching. It will most likely become useful and common when interlinking domains becomes more important than it is today.
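The sketch below illustrates why linked data shines only after that work is done. It assumes two hypothetical domains (weather and real estate) whose records are written in a JSON-LD style and already share one URI for London; every URI and the `ex:` vocabulary are placeholders. Grouping by the shared identifier is trivial, but only because the stitching and standardizing that produced that identifier happened first. The code reads the compact keys directly rather than running a full JSON-LD processor.

```python
import json
from collections import defaultdict

# JSON-LD-style sketch. All URIs and the "ex" vocabulary are hypothetical
# placeholders on example.org.
CONTEXT = {"ex": "https://example.org/vocab#"}
LONDON = "https://geo.example.org/cities/london"  # identifier both domains share

# Two domains that were each cleansed and standardized beforehand and
# already agree on the URI for London.
weather = [{
    "@context": CONTEXT,
    "@id": "https://weather.example.org/stations/LON-03",
    "ex:temperatureCelsius": 14.0,
    "ex:locatedIn": {"@id": LONDON},
}]
real_estate = [{
    "@context": CONTEXT,
    "@id": "https://homes.example.org/listings/42",
    "ex:bedrooms": 2,
    "ex:locatedIn": {"@id": LONDON},
}]


def link_by_place(*domains: list[dict]) -> dict[str, list[str]]:
    """Group node IDs from several domains by the place URI they link to."""
    by_place = defaultdict(list)
    for domain in domains:
        for node in domain:
            by_place[node["ex:locatedIn"]["@id"]].append(node["@id"])
    return dict(by_place)


print(json.dumps(link_by_place(weather, real_estate), indent=2))
```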
If linked data is not the approach to expose data as APIs, what is?
The school of thought to which I subscribe is one of schema-based access to Data APIs, patterned after relational models. Today's data APIs highlight three common kinds of data access patterns:

* Primary key lookup - to get to a specific data element.
* Imposed hierarchy-based lookup - in which you have classes with hierarchy and in effect traverse the hierarchy to get to the data elements.
* Rectangular lookups - defined by typical relational lookups of rows and columns.
All of these techniques are being built around single data sources as opposed to massively linked data sources.
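Here is a minimal Python sketch of the three access patterns over one hypothetical, single-source data set held in memory. The table, columns, and function names are assumptions made for illustration; in a Data API, each function would sit behind an endpoint rather than be called directly.

```python
# Sketch of three access patterns over one hypothetical, single-source data
# set held in memory. Table, column, and function names are illustrative.
from typing import Callable

ROWS = [
    {"id": 1, "continent": "Europe", "country": "UK", "city": "London"},
    {"id": 2, "continent": "Europe", "country": "France", "city": "Paris"},
    {"id": 3, "continent": "Europe", "country": "Italy", "city": "Rome"},
    {"id": 4, "continent": "North America", "country": "USA", "city": "Washington"},
]

# 1. Primary key lookup: an index from key to row.
BY_ID = {row["id"]: row for row in ROWS}


def lookup_by_key(pk: int) -> dict:
    return BY_ID[pk]


# 2. Imposed hierarchy: continent -> country -> list of cities.
def build_hierarchy(rows: list[dict]) -> dict:
    tree: dict = {}
    for row in rows:
        countries = tree.setdefault(row["continent"], {})
        countries.setdefault(row["country"], []).append(row["city"])
    return tree


HIERARCHY = build_hierarchy(ROWS)


def lookup_by_path(*path: str):
    """Walk the hierarchy one level per path segment."""
    node = HIERARCHY
    for segment in path:
        node = node[segment]
    return node


# 3. Rectangular lookup: pick columns (projection) and rows (selection).
def select(
    columns: list[str],
    where: Callable[[dict], bool] = lambda row: True,
) -> list[tuple]:
    return [tuple(row[c] for c in columns) for row in ROWS if where(row)]


print(lookup_by_key(2)["city"])                           # Paris
print(lookup_by_path("Europe", "Italy"))                  # ['Rome']
print(select(["city"], lambda r: r["country"] == "USA"))  # [('Washington',)]
```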
The structure and “RESTification” of Data APIs
There are several approaches to Data APIs. In addition to the Pragmatic REST for SQL Developers perspective, there’s Microsoft’s OData approach. As I argued in a recent talk at an OData Meetup at Microsoft, OData is a step in the right direction, but there are certain things OData needs to do to become the de facto standard.
The “RESTification” of Data APIs is a fundamental imperative, and both the Pragmatic REST for SQL and OData approaches are good starting points.
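As a rough illustration of RESTification, the sketch below expresses one relational query as two URLs. The OData version uses OData's standard `$select`, `$filter`, and `$top` system query options; the pragmatic REST version assumes common conventions (attribute filters, a `fields` parameter for partial responses, and `limit`) rather than any fixed specification. The base URL and resource names are hypothetical.

```python
# Sketch: one relational-style query expressed as two REST URLs.
# BASE and the resource names are hypothetical. The pragmatic REST
# parameter names (fields, limit) are assumed conventions.
from urllib.parse import quote, urlencode

BASE = "https://api.example.com"

# SQL equivalent:
#   SELECT city, country FROM cities WHERE continent = 'Europe' LIMIT 10


def odata_literal(value: str) -> str:
    """Quote a string as an OData literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def odata_url(entity_set: str, fields: list[str], where: dict[str, str], top: int) -> str:
    params = {
        "$select": ",".join(fields),
        "$filter": " and ".join(f"{k} eq {odata_literal(v)}" for k, v in where.items()),
        "$top": top,
    }
    # Keep $, commas, and quotes readable; spaces are percent-encoded as %20.
    return f"{BASE}/{entity_set}?" + urlencode(params, safe="$,'", quote_via=quote)


def pragmatic_rest_url(collection: str, fields: list[str], where: dict[str, str], limit: int) -> str:
    # One query parameter per filtered attribute, plus fields and limit.
    params = {**where, "fields": ",".join(fields), "limit": limit}
    return f"{BASE}/{collection}?" + urlencode(params, safe=",", quote_via=quote)


print(odata_url("Cities", ["city", "country"], {"continent": "Europe"}, 10))
# https://api.example.com/Cities?$select=city,country&$filter=continent%20eq%20'Europe'&$top=10
print(pragmatic_rest_url("cities", ["city", "country"], {"continent": "Europe"}, 10))
# https://api.example.com/cities?continent=Europe&fields=city,country&limit=10
```

In this sketch the OData URL carries a small query language inside `$filter`, while the pragmatic REST URL exposes one equality parameter per attribute: simpler to read, but less expressive.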
Whatever the solution is, it cannot be vendor-specific. Data is too important, and the data revolution too fundamental, for it to be associated with any one vendor.
OData technologies need to be available in all ecosystems, not just in Windows Communication Foundation (WCF) and the .NET Framework. Similarly, pragmatic REST and other techniques cannot be available only in Apigee or any other single vendor's offering.
Let the call to action be to come together as a community, bring the best of the linked data and OData ideas and techniques together, and transform the world with Data APIs.
The conversation has already started and we’d love to hear more of your perspectives and arguments over on api-craft.
Innovator Spotlight: Pearson Plug & Play Platform
Today we are proud to publish our first "Apigee Innovator Spotlight" featuring Diana Stepner, Head of Future Technologies at publishing giant Pearson. Apigee Innovator Spotlights will be a regular series of interviews with API experts sharing their experiences, strategies and best practices for innovating through APIs. Look for more soon!

Pearson is the world’s leading learning company. Pearson delivers content in a range of forms and through a variety of channels, including online services, ebooks, books and newspapers. It provides consumer publishing through the Penguin brand, educational materials and services, and business information through the Financial Times Group.
Diana Stepner, Head of Future Technologies at Pearson, discusses how Pearson, through the developer platform Plug & Play, has opened up some of the company’s award-winning content via APIs to foster the development of innovative new applications.
How are you using APIs today?
We're using APIs at Pearson to connect with developers, whether internally within our companies or externally. APIs are used in many different ways throughout Pearson. Some Pearson APIs are for internal development and others are for working with our partners. The Plug & Play platform is focused on delivering APIs that are available to third-party developers as well as internal teams.
The Plug & Play platform makes Pearson data available for developers to build new, innovative products that incorporate novel ways of using, displaying and blending Pearson content with other material and data.
The goal of the Plug & Play API program is to spawn innovation, connect with new audiences, and make it easier to create apps with Pearson content. Through Plug & Play, we also hope to explore new revenue opportunities.
What need were you addressing with your API strategy? 
The world is changing and becoming more open. There has been an important shift in the ecosystem around us, and APIs are becoming the norm. Companies are using them as the foundation of their development. We saw this as a big opportunity for Pearson.
How have your APIs evolved over time?
Initially, we offered three APIs: the DK Eyewitness Guide API, which provides access to information about the top sites and attractions in nine cities (London, Barcelona, Berlin, New York City, Paris, Prague, Rome, Venice and Washington); the Longman Dictionary of Contemporary English API, which provides access to the flagship Longman dictionary; and the FT Press API, which delivers insights from high-caliber business books and original writings by leading business thinkers. Later, we added the Pearson Kitchen Manager as a resource for food enthusiasts and chefs. Developers can go to the Pearson Plug & Play portal to explore and use all our APIs. We’re also expecting to add more APIs shortly.
In terms of the technology evolution of Pearson's APIs, we've been listening to feedback from developers, and they've been asking for more flexibility. In response, we've made changes like moving from a traditional relational environment to a NoSQL solution based on MongoDB. Also, sometimes the content we get is not as rich as it should be, so our developers are working with the businesses to augment the content we use to build APIs.
What kind of business benefits have you experienced as a result of your APIs?
One of the key benefits we've seen by offering our APIs is a change in the perception of Pearson. We have been looked upon as a traditional publisher. Our goal is to be seen as an innovator in the digital world and part of the developer ecosystem. We participate in developer-centric events and are hearing that awareness of Pearson’s activities is changing: we are increasingly thought of as a forward-thinking company.
Our API initiative has driven some important benefits internally too. Now, there is more focus on making content available. At the end of a production cycle, we want to make sure there's an asset – an API. Or better yet, we want to start with an API! But there's definitely an increased focus within Pearson on making our content more available for developer access, internally and externally.
And of course, we are excited about the applications being built with Pearson’s content.
How do you work with Apigee?
We use the Apigee Enterprise platform, specifically the Apigee Gateway, Key Management and Analytics products. Apigee Enterprise provides management and control of traffic flow, analysis and control of API usage, data security and protection, and support for scaling and performance with caching.
Pearson also uses Apigee Developer Connect for developer lifecycle management. This allows Pearson to engage and enable the developer community, and developer tools make it easy to explore and use Pearson APIs.
In the beginning, Apigee also gave us a lot of guidance to get our APIs up and running. Apigee's expertise was invaluable in helping us identify and establish API best practices, and we worked very collaboratively with them.
What is your vision for your API program?
For us, the ideal scenario would be that as soon as someone starts working on a project in Pearson, they think about how to offer a supporting API. We want to ensure a constant flow of content into the Pearson developer platform so it can remain a valuable resource for developers to tap into and get inspiration to build things we never would have thought of.
Big Broad Data: Increasing the signal to noise ratio
In my previous post about Making the shift from Big to Broad Data, I made the case for thinking about Big Data not so much as “Big” but as “Broad.” We looked at the explosion of new data sources in today’s economy, which are individually typically smaller and more diverse than the enterprise systems of record of the past. Data comes from a variety of sources like Twitter, Facebook, partners, tens and hundreds of apps (some built around your APIs), and more.
To make timely business decisions, an enterprise simply has to respond to data from many more sources than in the past.
Signal extraction and stitching data from diverse sources
Whenever you collect a lot of data, you collect a lot of both signal and noise. In fact, a lot of the Big Data approaches to date are focused on extracting the signal from the noise in the data collected from traditional enterprise sources.

As the data an enterprise collects has shifted from a handful (five to seven) of primary "enterprise-centric" data sources (point-of-sale data, supply records, customer records, warehousing records, and so on) to hundreds of diverse and typically smaller sources (hundreds of apps, hundreds of social networks, business networks, and so on), the size of the data matters far less than the number and diversity of data sources you need to stitch together to extract meaningful signal.
Now, with the footprints of customer and partner behavior spread across hundreds and maybe thousands of data sources (see Making the shift from Big to Broad Data), the signal-to-noise ratio is undoubtedly lower than before. Data needs to be stitched together so that the signal rises above the noise.
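A minimal, hypothetical Python sketch of that idea: each source holds at most one weak mention of a customer's interest, below an assumed threshold of three, and the signal appears only after events are stitched by customer identity. The events, the identity map, and the threshold are all invented for illustration; real identity resolution across sources is considerably harder than lowercasing and a lookup table.

```python
# Sketch: weak per-source signals that cross a threshold only once stitched
# by customer identity. Events, identity map, and threshold are hypothetical.
from collections import Counter

EVENTS = [  # (source, customer handle, topic of interest)
    ("mobile_app", "Ana@Example.com", "mortgage"),
    ("twitter", "@ana_r", "mortgage"),
    ("partner_crm", "ana@example.com ", "mortgage"),
    ("twitter", "@bob", "mortgage"),
]

# Hypothetical identity resolution: social handle -> canonical customer ID.
IDENTITY = {"@ana_r": "ana@example.com"}
THRESHOLD = 3  # assumed number of mentions before interest counts as signal


def canonical(handle: str) -> str:
    """Normalize a handle and resolve it to a canonical customer ID."""
    h = handle.strip().lower()
    return IDENTITY.get(h, h)


def per_source_counts(events) -> Counter:
    """Count mentions within each source separately (no stitching)."""
    return Counter((src, canonical(h), topic) for src, h, topic in events)


def stitched_signals(events, threshold: int = THRESHOLD) -> set:
    """Count mentions across all sources per customer and topic."""
    counts = Counter((canonical(h), topic) for _, h, topic in events)
    return {key for key, n in counts.items() if n >= threshold}


print(max(per_source_counts(EVENTS).values()))  # 1: no single source crosses the threshold
print(stitched_signals(EVENTS))                 # {('ana@example.com', 'mortgage')}
```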
Successful enterprises will be those that understand and consolidate the data they own as well as the data they can acquire. Today, the challenge is striking deals and forming partnerships to get access to hundreds, not handfuls, of data sources. It's no longer about old styles of purchasing data or getting it from departments in your own organization.
Enterprises need to think about access and control: whether they need to push analytics to the data or push data to the analytics, and so on. In the past, when data was brought into a central enterprise repository (a warehouse or some other store) for analysis, these were less significant concerns.
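The push-analytics-to-the-data option can be sketched for one simple case. The Python below computes the same mean order value both ways over two hypothetical sources and reports how many items had to move. It assumes the analytic decomposes into per-source partial results (a sum and a count); not every analytic does, and an exact median, for example, cannot be assembled from per-source summaries like these.

```python
# Sketch of both options for one decomposable analytic (a mean).
# Source names and order values are hypothetical.

SOURCES = {
    "store_api": [120.0, 80.0, 100.0],
    "partner_feed": [50.0, 70.0],
}


def push_data_to_analytics(sources: dict[str, list[float]]) -> tuple[float, int]:
    """Copy every row to a central place, then analyze.
    Returns (mean, number of items moved)."""
    central = [value for rows in sources.values() for value in rows]
    return sum(central) / len(central), len(central)


def push_analytics_to_data(sources: dict[str, list[float]]) -> tuple[float, int]:
    """Compute a partial aggregate at each source and move only (sum, count) pairs.
    Returns (mean, number of items moved)."""
    partials = [(sum(rows), len(rows)) for rows in sources.values()]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count, len(partials)


print(push_data_to_analytics(SOURCES))  # (84.0, 5)
print(push_analytics_to_data(SOURCES))  # (84.0, 2)
```

Both give the same answer; the difference is what crosses the boundary between the data's owner and the analytics, which is exactly where access and control questions arise.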

Data APIs will lead the way for easy data consumption, flow, and interaction
The fundamental questions are: What mechanisms do enterprises need to stitch together data from their own systems as well as from syndicated and external data sources?
How will people interact with other people’s data? (We’ve got to understand the form in which this data will be exposed and how it will be consumed.)
In Web 1.0, techniques centered around Web crawling; in Web 2.0 it was about Web pages, AJAX and other "rich" interface technologies. I think that in Web 3.0, broad, diverse data will be accessed, consolidated, and correlated through the power of APIs.
Today, transactional and data APIs are the yin and yang of APIs, and the API conversation is largely dominated by transactional APIs (which perform tasks like sending messages, making trades, getting credit information, and so on). But I think we’re looking at a revolution in the world of Data APIs because the future lies in the easy consumption, flow, and interaction of data. Just as APIs are central to the evolution and handling of big, broad, diverse data, data is central to the evolution of APIs. The structure of Data APIs is becoming increasingly important.
In my next post, we'll take a look at some of the schools of thought around the structure of Data APIs. We'll explore the different techniques companies are using to collect and stitch data to create individual domains and to expose data out of those domains. See Big Broad Data: The role of Data APIs.



