API Best Practices Blog
Big Broad Data: The role of Data APIs »
In previous posts on this topic of Big Broad Data, we looked at some of the reasons for and implications of enterprises shifting their focus from the “bigness” and technology hype of “Big Data” to breadth and diversity, signal extraction, analytics and deep insights.
The future is around the easy consumption, the flow and interaction of data, which drives a revolution in the world of Data APIs. The structure of the Data APIs becomes increasingly important.
Building an Information Halo around APIs
Let’s consider enterprises as one of two types from a data perspective: those for whom data is the core business and those who give data away to attract increased transactions to the core of their business.
In the latter case, the data (information) itself is not necessarily monetizable, but it attracts people to the business.
For a great discussion about the notion of information halos around your core business, see Sam Ramji’s talk: Amundsen’s Dogs, Information Halos and APIs: The epic story of your API Strategy.
In both scenarios, data is a fundamental and critical part of an API strategy.
Enterprises who are monetizing around data are beginning to plant flags in different domains. Weather, finance, real estate, Internet traffic and dozens or hundreds of other domains are forming.

The people building the data in any domain are doing so by collecting from disparate data sources. To build out any one domain, they’ve probably stitched together data from a large number of data sources, cleansed and standardized it before finally exposing it as an API.

What are companies using to collect and stitch the data?
A natural and familiar stitching technique is the linked data model (linkeddata.org). While linked data techniques are excellent at accessing individual data elements, I argue that this is not the model that these data providers need. Instead they need to crawl, bulk load, and access data in large quantities, before cleansing, standardizing, and delivering it.
If linked data is not the most effective method to stitch data together to create the domains (at the bottom of the stack), can linked data become the de-facto standard to express data out of the information halo (at the top of the stack and as the Data API for domains)?
The answer is probably yes – eventually. I think it is an unlikely scenario just yet. Today, the challenge is how to cleanse, standardize, unify and use the data in individual domains. Linked data techniques have the right characteristics to bring together data that has already been cleansed, standardized, and stitched, but they are not a great model for the initial stitching. Linked data will most likely become useful and common in the future, when the inter-linking of domains becomes more important than it is today.
If linked data is not the approach to expose data as APIs, what is?
The school of thought to which I subscribe is one of schema-based access to data APIs patterned after relational models. Here are a few examples of data APIs, which highlight three common kinds of data access patterns.

* Primary key lookup - to get to a specific data element.
* Imposed hierarchy-based lookup - in which you have classes with hierarchy and in effect traverse the hierarchy to get to the data elements.
* Rectangular lookups - defined by typical relational lookups of rows and columns.
All of these techniques are being built around single data sources as opposed to massively linked data sources.
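To make the three access patterns concrete, here is a minimal sketch in Python over a small in-memory data set. The weather-station rows, the field names, the function names, and the URL shapes in the comments are all hypothetical, chosen only for illustration; they are not taken from any particular Data API.

```python
from typing import Any, Optional

# Hypothetical weather-domain rows; every name here is illustrative only.
ROWS: list[dict[str, Any]] = [
    {"id": "stn-1", "country": "US", "state": "CA", "city": "San Jose", "temp_c": 21.0},
    {"id": "stn-2", "country": "US", "state": "CA", "city": "Fresno", "temp_c": 30.5},
    {"id": "stn-3", "country": "US", "state": "WA", "city": "Seattle", "temp_c": 15.2},
]


def by_primary_key(key: str) -> Optional[dict[str, Any]]:
    """Primary key lookup, e.g. GET /stations/stn-2 (assumed URL shape)."""
    return next((row for row in ROWS if row["id"] == key), None)


def by_hierarchy(*path: str) -> list[dict[str, Any]]:
    """Hierarchy lookup, e.g. GET /countries/US/states/CA (assumed URL shape)."""
    levels = ("country", "state", "city")
    return [
        row for row in ROWS
        if all(row[level] == value for level, value in zip(levels, path))
    ]


def rectangular(columns: list[str], **filters: Any) -> list[dict[str, Any]]:
    """Rectangular lookup: pick columns and filter rows,
    e.g. GET /stations?state=CA&fields=city,temp_c (assumed URL shape)."""
    return [
        {col: row[col] for col in columns}
        for row in ROWS
        if all(row[name] == value for name, value in filters.items())
    ]
```

In this sketch each pattern reads from one data source, which mirrors the point above: none of the three requires following links across sources.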
The structure and “RESTification” of Data APIs
There are several approaches to Data APIs. In addition to the perspective that is Pragmatic REST for SQL Developers, there’s Microsoft’s OData approach. As I asserted in a recent talk at an OData Meetup event at Microsoft, OData is a step in the right direction but there are certain things OData needs to do to become the de-facto standard.
The “RESTification” of the Data APIs is a fundamental imperative and both the Pragmatic REST for SQL and the OData approaches are good starting points.
Whatever the solution is, it cannot be vendor specific. Data is too important, and the data revolution too fundamental for it to be associated with any one vendor.
OData technologies need to be available in all ecosystems, not just in the Windows Communication Foundation (WCF) library and the .NET Framework. Similarly, pragmatic REST and other techniques cannot be available only in Apigee or any other single vendor offering.
Let the call to action be to come together as a community; get the best of the linked data and OData ideas and techniques together and transform the world with Data APIs.
The conversation has already started and we’d love to hear more of your perspectives and arguments over on api-craft.
Big Broad Data: Increasing the signal to noise ratio »
In my previous post about Making the shift from Big to Broad Data, I made the case for thinking about Big Data not so much as “Big” but as “Broad.” We looked at the explosion of new data sources in today’s economy, which are individually typically smaller and more diverse than the enterprise systems of record of the past. Data comes from a variety of sources like Twitter, Facebook, partners, tens and hundreds of apps (some built around your APIs), and more.
To make timely business decisions, an enterprise simply has to be responsive to data across many more sources than in the past.
Signal extraction and stitching data from diverse sources
Whenever you collect a lot of data, you collect a lot of both signal and noise. In fact, a lot of the Big Data approaches to date are focused on extracting the signal from the noise in the data collected from traditional enterprise sources.

As the data an enterprise collects has shifted from 5 to 7 primary "enterprise-centric" data sources (point-of-sales data sources, supply records, customer records, warehousing records, and so on) to hundreds of diverse and typically smaller sources (hundreds of apps, hundreds of social networks, business networks, and so on), the size of the data doesn’t matter nearly as much as the number and diversity of data sources you need to stitch together to extract meaningful signal.
Now with the footprints of customer and partner behavior spread across hundreds and maybe thousands of data sources (see Making the shift from Big to Broad Data), there's undoubtedly a smaller signal/noise ratio than before. Data need to be stitched together to ensure that the signal rises above the noise.
Successful enterprises will be those who understand and consolidate the data that they own as well as that which they can acquire. Today, the challenges are striking deals and forming partnerships to get access to hundreds, not handfuls of data sources. It's no longer about old styles of purchasing data or getting them from departments in your own organization.
Enterprises need to think about access and control; whether they need to push analytics to the data or push data to the analytics, and so on. In the past, bringing data to a central repository in the enterprise - whether a warehouse or other techniques - where it could be analysed was a less significant concern.

Data APIs will lead the way for easy data consumption, flow, and interaction
The fundamental questions are: What mechanisms do enterprises need to stitch together data from their own systems as well as from syndicated and external data sources?
How will people interact with other people’s data? (We’ve got to understand the form in which this data will be exposed and how it will be consumed.)
In Web 1.0, techniques centered around Web crawling; in Web 2.0 it was about Web pages, AJAX and other "rich" interface technologies. I think that in Web 3.0, broad, diverse data will be accessed, consolidated, and correlated through the power of APIs.
Today, transactional and data APIs are the Yin and Yang of APIs, and the API conversation is fairly dominated by transactional APIs (which achieve tasks like sending messages, making trades, getting credit information, and so on). But I think we’re looking at a revolution in the world of Data APIs because the future is around the easy consumption, the flow and interaction of data. As APIs are central to the evolution and handling of big, broad, diverse data, so is data central to the evolution of APIs. The structure of the data APIs becomes increasingly important.
In my next post, we'll take a look at some of the schools of thought around the structure of Data APIs. We'll explore the different techniques companies are using to collect and stitch data to create individual domains and to expose data out of those domains. See The role of Data APIs »
Making the shift from Big to Broad Data »
In my previous post, I laid out why I think we need to move beyond the hype of Big Data technology and “bigness” to focus instead on the breadth and diversity of data, as well as signal extraction, analytics and deep insights from that broad data.
Here we’ll delve into what we mean by "Broad Data" as well as some of the fundamental changes for businesses in today’s marketplace that compel the need to focus on breadth of data and on data stitching from disparate sources.
The shift of control to the edge of the enterprise
Social, mobile and cloud influences have caused enterprises to undergo a tectonic shift in how they do business with customers. The real value for an enterprise - the interaction with end users (customers) - has shifted one or two tiers away from the enterprise. The control is shifting to social networks where people are talking about companies and products; to business networks where interactions are happening through partner channels; and to apps and the APIs they leverage.
The landscape for customer interaction with enterprises looked significantly different just a few years ago than it looks today. Data was controlled within the enterprise – all of the data that an enterprise gathered were collected when partners and customers interacted with systems produced and provided by the enterprise.

But today’s landscape reveals an expansion of the interaction with customers by one or two degrees from the core of the enterprise. The evolution of the apps and API economy has resulted in people using apps that may or may not have been created by the enterprise. Apps then are the vehicles that inform the enterprise about how customers and partners are interacting with them.
Factor in the influence of social networks, partner- and business- networks and the effect is amplified. Simply put, the enterprise is no longer in control of the data it needs to inform and make accurate business decisions. That’s the fundamental shift of interaction to the edge (and even beyond the boundaries) of the enterprise.

This shift in the market has a fundamental implication for the Big Data conversation. The number and variety of data sources is much more important than the volume that comes from any one source.
Big Data becomes Broad Data
Data is not by itself "Big". Aggregated fragments of small and contextually related data make for "Big" - more accurately - "Broad" data. Taking advantage of the breadth of the data, its variety, its dynamism, and its disparate sources is the real future.
Just a few years ago, the data an enterprise collected came from physical stores, Web sites, and partners, through 5 to 7 primary data sources. Data from point-of-sales data sources, supply records, customer records, warehousing records, and so on reflected all the interesting things happening with respect to an enterprise’s interaction with customers and partners.

Today the sources and types of data are expanding continuously - there are hundreds of new data sources, each generating data (which might be small or not-so-small) and definitely generating a smaller signal/noise ratio.
The shift is significant - from 100% of data captured from 5 to 7 sources to a scenario in which maybe less than 50% comes from those original sources. In time, I contend that the old enterprise sources may not even be the most important sources.
The many new sources are much smaller and relatively new: Twitter, Facebook, partners, tens and hundreds of apps, some built around your APIs. The list goes on. This essentially defines the need for the shift from the deep and big focus of the old world to the broad and pervasive focus of the new world. This will allow businesses to focus on all of the new places in which there is the potential of a signal relevant to their enterprise.
Whenever you collect lots of data, you of course collect lots of both signal and noise. Next time, we’ll look at increasing the signal to noise ratio of broad data - Big Broad Data: Increasing the signal to noise ratio »
Big Data: Beyond the ‘Bigness’ & the Technology (video & slides) »
Thanks to all who participated in last week's Webcast, Big Data: Beyond the 'Bigness' & the Technology. We explored moving beyond the "bigness" and technology hype of the typical Big Data conversation to how businesses need to respond to the explosion of new, disparate and dynamic data sources as social, mobile and cloud influences shift customer interaction to the edge of the enterprise.
The video (~35 min.) and slides are below. Thanks @jhingran.
We'd love to continue the discussion on the api-craft forum.
Big Broad Data: Beyond the “bigness” and the technology to extracting meaning »
The amount of data in our world has been exploding, and the concept of “Big Data” - collecting and analyzing large data sets - needs no introduction. It’s the buzzword of 2012 where IT is concerned.
There's been a focus on the business side of Big Data, which of course is a critical component of the discussion. Big Data is most certainly the next frontier for innovation, competition and productivity (McKinsey Global Institute, 2011).
However, a quick Google search, a track of #bigdata in your Twitter feed, or 5 minutes in a conversation with folks about "Big Data" will show you how the weight of the discussion focuses primarily on two things - first on technology and then on "bigness".
While both are important, I think that the focus on the technology and size is misplaced and causing us to miss the point that the depth of analysis of the data and the insights we get from them are the most important and valuable things.
“What’s your tool set?”
Hardly a conversation happens in the big data space that doesn’t start with the pros and cons of Hadoop, NoSQL, Cassandra, HBase . . . the list goes on. Technology is of course extremely important because without it we couldn't determine the signal over the noise or handle large data sets. But the technology is almost a commodity. (And of course, trying to get two of us technologists to agree on a technology is a whole different discussion.)
"How big is your data anyway?"
Right behind the technology argument is the “Bigness” – the petabytes vs. terabytes argument. There are certainly technical complexities to dealing with petabytes of data. But terabytes and even kilobytes are big and more importantly they too hold valuable information.
Remember that a lot of the size will come from noise whether you’re dealing with kilobytes, terabytes, or petabytes. Big, noisy data is not valuable - the value will come from the signal that you can extract.
Extracting Meaning
To successfully glean value from big data, we've got to pivot the discussion to focus on the breadth of the data, signal extraction, and deep insights. This should make us think about the areas above or below the technology and not on the technology itself. Bottom line - the data itself is the real gold – the new currency.
The disruptive technologies of social, mobile, and cloud that are transforming how we do business serve up the breadth of data. Data about a business' customers is available and interpretable in all kinds of new contexts. A customer that checked in at the gym on Foursquare before visiting a retailer is likely to be interested in sports stuff. You can imagine hundreds of similar examples.
What's a good example of value from extracting signal over noise? A Klout Score uses data from social networks to measure reach and influence. It is a signal extracted from a superabundance of tweets and other social interactions.
Deep Insight is about how people can take the output of the machines and convert it into business value. We might come to know that shopping cart abandonment is higher from apps on Android devices than on iPhone devices, indicating that Android apps are less persuasive.
There’s also a fundamental change for businesses because of the apps and API economy that compels the need to focus on breadth of data and data stitching from disparate sources.
I’ll talk more in upcoming posts about "broad" data and "data stitching" as well as how Data APIs will lead the way in the exploding apps and API economy. We also discussed these topics in a Webcast last week. (video and slides here)
Insights from API data: The human factor »
In the last post after our Webinar Visibility at the Edge - Deep Insights from your API, I talked about what it means to get 360 degrees of visibility from your API. This time, I'll talk about using that 360 view to get deep insights.
Q: It is one thing to keep all the data, it is quite another to gain insights from it.
We've all seen examples of enterprises saying . . . "let's first store it, we'll worry about its uses later." Is that a good strategy? Is there a danger of collecting a lot of the data that is not very actionable or useful?
If you are generating 1000 TPS, you'll generate around 200 terabytes of data per year. We know that storage costs are going down but if you're storing 200 TBytes in a landfill fashion, you're looking at spending hundreds of thousands of dollars per month.
The basic question becomes - for the cost of collection and storage what value are you getting? You have to get incremental value for incremental investment. The only way to get this incremental value is to not "store and forget." Instead "store, analyze, synthesize, get insights" and then determine what else to store creating a tight loop between data, analysis, and business decisions.
Q: Are there any easy on-ramps to doing this?
There are certain technological considerations that help get started.
Cloud model: Any enterprise looking at an API strategy should look at a cloud-based strategy, either with the help of people who know how to do it or with your own people. It is no secret that getting infrastructure set up is not easy in any sized enterprise. Enterprise IT infrastructure has a 6-12 month procurement cycle and you have to plan it carefully. But how to plan in this space? You cannot know what API traffic will be or what value you'll get from the data 6-12 months ahead of time.
In a cloud model, you can avoid incurring up-front costs before you are able to show the value.
Out-of-the-box API data. Consider the kind of out-of-the-box API data you want to expose to your business users, which can show instantaneous value. It may be primarily operational value, but still instantaneous data that shows value.
Contextual data. Don't boil the ocean. Start with critical data from the enterprise that is “close” to the API. Consider what your core context is, and expand slightly around the API data, restricting the scope of what you take in. In other words, take a judicious and iterative approach to incorporating context data, which should help avoid paying the cost of context without knowing the value it adds.
Q: Is all that sufficient to affect the business?
You need data, you need analytics, and most importantly, for data and analytics to be relevant, you need to derive insights from these two.
Analytics is about running machine learning and computations on large amounts of data.
Insights is about how people can take the output of the machines and convert it into business value.
I believe that machines cannot replace human beings in deriving the insights that are critical to driving and pivoting your business. Deep insights come from API data, contextual data, machine analytics, and people augmenting the machine intelligence. 
Q: What are the organizational challenges to succeeding here?
It is important for those embarking on an API journey to convince people to invest not only in APIs but also in analytics. I've made the case in the past that enterprises should make analytics and APIs peers in their strategy. This might mean an 80:20 effort for APIs:analytics. Making an investment in this culture change will pay dividends as you'll be able to show value far earlier than if you spend 100% on APIs up front and zero on analytics.
Another organizational challenge can be that not all API strategies are fully sanctioned arms of their enterprises. Many are nascent and need nurturing. People can still be wary of "opening up" back-end systems to let people interact with them.
Like many things in the corporate world, APIs and analytics need executive sponsorship. Of course, they have to deliver value, but without the executive sponsorship, the skeptics might kill it before it has a chance to prove itself.
Q: Can an API provider do something in the design of an API to make it more "insight friendly?"
I assert it's not a good idea to try to design for insights or to think about insight as an end goal. In this world, you're either in the business of APIs, or in the business of running a business.
If you're in the former, you never want to optimize for insights. Instead, you want insights to help you optimize your APIs for developers, for building apps, for app end users, and so on.
If you're in the latter camp, where APIs are generating a large amount of the traffic and transactions of your business, then insights into the business of APIs now become insights into your overall business. Again, you don't want to, and probably can't, effectively design for insights. Rather, let the insights optimize your business.
That said, you can definitely think ahead to collecting those out-of-the-box data that will give you a jump start.
Q: Should you collect data yourself or gather it from systems already collecting it?
First work with people who have put APIs in front of data - figure out if you can use the data in this way. If not, and if you think you can derive a huge amount of value from the data, you want to be able to slice and dice it, so you might want a copy of it for yourself.
Insights from API data: 360 degrees of visibility »
Thanks to all who participated in the Webinar: Visibility at the Edge - Deep Insights from your API. Check out the video and slides here.
In the last post, I talked about what we mean by visibility at the edge of your business and the concept of 360 degrees of visibility. This time, I'll talk about the data that gets you the 360 degree view.
Next time, we'll talk about gaining insights from that big data and the analytics on that data.
In this post, I'll write out some of the concepts and ideas we explored on the Webinar.
Q: What’s needed to make this 360 degree visibility happen?
There are a few things.
- First obviously, you need to be collecting data from the APIs.
- Second is collecting data that adds context to the API data.
- Thirdly, you need analytics that model and predict both the business and operational metrics.
Let's elaborate on data that adds context. Visibility into APIs is the best aperture into enterprises but just API data is insufficient - you need a mechanism to collect other "contextual" data.

Say a user is interacting with a Telco API. A cell phone number is flowing through that API. Other than that phone number, there's a huge amount of context that is not captured in that API:
- whose number is it?
- what do you know about that person?
- which app is that interaction happening through?
- Are they using only your APIs, or others; are they happy with your API or not? . . .
Knowing the mobile number and knowing the name, address, payment history of the person using that number, and so on, are two very different things.
Clearly visibility into APIs gives you valuable operations information, and even business impact information like the number of products purchased.
But wouldn't it be interesting and useful to know in what context that product was purchased, in which context the app was built, and so on? API data is great for operational and business visibility. Having contextual data greatly enriches the information for business visibility. Ideally you can collect enough contextual information around the APIs so that your analytics is no longer just looking at the bits and bytes of APIs.
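As a rough illustration of adding context to API data, the sketch below joins API events to customer context keyed by phone number. The record layouts, field names, and the `enrich` helper are assumptions made up for this example; a real system would pull context from billing, CRM, or similar back-end sources.

```python
from typing import Any

# Hypothetical API events as they might be captured at the API layer.
api_events = [
    {"phone": "+15550100", "app": "billpay", "latency_ms": 120},
    {"phone": "+15550199", "app": "topup", "latency_ms": 340},
]

# Hypothetical context held elsewhere in the enterprise, keyed by phone number.
customer_context = {
    "+15550100": {"name": "A. Customer", "plan": "prepaid", "payment_history": "good"},
}


def enrich(event: dict[str, Any], context: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge an API event with whatever context is known for its phone number."""
    known = context.get(event["phone"], {})
    return {**event, **known, "has_context": bool(known)}


enriched = [enrich(event, customer_context) for event in api_events]
```

The `has_context` flag is one possible way to keep track of how much of the API traffic can actually be tied back to known customers, which is itself a useful measure of coverage.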
Q: Does this combination of API data and context data dictate a need for a BIG DATA solution for companies?
Yes. Let's look at the two factors that I think are driving to a big data solution. The first is the volume and chattiness of API traffic, and the second is the new ways in which you need to interact with data.
Think about how your APIs are designed. Are they optimized first for back-end efficiency or for easy consumption?
They should be (and probably are) optimized for consumption with the understanding that the back-end optimization can come later. Optimizing for back-end systems first can lead you down a path where you build a system that's efficient from a back-end perspective, but not necessarily one that is attractive for developers to consume and adopt.
This necessity to optimize for consumption tends to lead to API traffic that is chatty and voluminous. Even a small sized enterprise that has 1000 TPS can easily end up with 200 terabytes of data every year. That's not huge, but it's also not small for most companies.
Just as important as the volume of data is how you interact with that data. Traditional data warehouses were built to answer, repeatedly, questions you already know you have.
The new big data solutions are about determining the questions to ask. In this scenario, it's not just the volume that is at issue - but the variety and nature of questions you need to ask. In an API, app, and customer-centric world you don't really know the patterns and questions at the outset.
Next time, we'll talk about analytics and how to get those insights.
Insights from API data - visibility across the value chain »
Thanks to all who participated in last week's Webinar: Visibility at the Edge - Deep Insights from your API. The video and slides are here.
As promised on the Webinar, we'll write out some of the Q&A we got into in the live session. We'll start with what we mean by visibility at the edge and why we think it's important.
Next time, we'll talk about gaining insights from that big data and the analytics on that data.
In this post, I'll write out some of the concepts and ideas we explored on the Webinar.
Q: What do we mean by visibility at the edge?
In the past, enterprises could focus on controlling their interactions with customers through physical presence or websites. Now people are interacting more and more with core businesses through apps.
Although the interaction has moved farther away and businesses are no longer in control of the interaction in the same way, customers are more important to the business than ever before. More data is coming in from developers and end users through apps and APIs.
Q: Clearly, to get such visibility, we will collect lots of data.
Before we get into the data part, are there examples of the value such visibility might provide?
Think about the number of different constituents around API interactions (even non-API interactions). There are the people transacting with your business, developers building apps, teams in enterprises responsible for the building and the success of APIs, teams responsible for the operations of APIs, and business users interested in the top and bottom lines.

In addition to various constituents, there are lots of interactions through multiple channels with different systems, including billing, catalog, procurement, order management, and so on.
The biggest challenge is how to collect the disparate perspectives you get through these channels and generate a 360 degree sense of it. If you believe in the apps economy, you believe this will all get standardized around the APIs "interface."
Bottom line, your APIs and the management of APIs give you visibility into a larger set of interactions than any one back-end system can.
If you harness that visibility, you can get a lot of insights into your business, regardless of whether you run operations for the back-end systems, are responsible for making the APIs successful, or are a business person focused on the top and bottom line. For example:
- How well is the API being adopted?
- What is the response time per request?
- Does operations need to allocate more resources at peak times?
- What impact is an API having on my bottom line?
and much more . . .
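A few of these questions can be answered directly from raw request data. The following is a minimal sketch, assuming a hypothetical log of (hour of day, developer key, response time in ms) tuples; the log format and the `summarize` function are illustrative only.

```python
from collections import Counter
from statistics import mean

# Hypothetical request log: (hour of day, developer key, response time in ms).
requests = [
    (9, "dev-a", 110), (9, "dev-b", 95), (12, "dev-a", 240),
    (12, "dev-c", 310), (12, "dev-b", 280), (18, "dev-a", 130),
]


def summarize(log):
    """Derive adoption, latency, and peak-hour figures from raw request rows."""
    per_hour = Counter(hour for hour, _, _ in log)
    peak_hour, peak_count = per_hour.most_common(1)[0]
    return {
        "active_developers": len({dev for _, dev, _ in log}),  # adoption
        "avg_response_ms": mean(ms for _, _, ms in log),        # response time
        "peak_hour": peak_hour,                                 # when to add capacity
        "peak_requests": peak_count,
    }


summary = summarize(requests)
```

Bottom-line impact is deliberately left out of the sketch: it needs business data joined to the API data, not just the request log.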
But there's something even more transformative one can get from API data. Instead of thinking only about what the enterprise or the API team learns from API data, think about what developers and the app end users can learn from the API data.
For example, if you are a developer, you'd like to know what impact you're having on the business whose API you are using to build your app.
Visibility into the APIs gives insights for all of the constituents of the value chain, therefore getting us closer to a 360 degree and actionable view.
For more on the business implications, check out this recent post Engage customers where they are - at the edge of your enterprise.
Next time, we'll talk about what else is needed to make this 360 degree visibility a reality.
Visibility at the Edge - Deep Insights from your API (video & slides) »
Thanks to all who participated in last week's strategy webinar:
Visibility at the Edge - Deep Insights from your API
Thanks to our speakers @jhingran and @brianpagano.
Here are the slides and video. Also, check out our follow-up posts this week in which our speakers further explore the topics of this Webinar hour.
We'd love more of your thoughts, insights, or questions on the api-craft forum. Or tune in to the new IRC channel #api-craft.
Managing Big Data »
In my recent Data Analytics & APIs post, I talked about analytics as a peer to your API, and using analytics to gain insights to drive success of your API strategy.
An API strategy quickly gets an enterprise to a place where they have big data sets. As defined by Wikipedia, "Big Data consists of datasets that grow so large that they become awkward to work with using on-hand database management tools."
No question that Big Data comes with big opportunity for valuable insights but also with challenges and questions about data storage, how to effectively manage the growth of data sets, methods for understanding data analytics and gleaning the most valuable insights from it.
Many people have commented, "big data is great, but before I get an ounce of value out of it, I am stuck with the cost of storing and managing it."* This is an understandable pain. However there are some simple solutions to manage this pain. Once managed, the gains from Big Data are easily achieved.
Let’s take a look at an enterprise with APIs that are getting requests at about 1000 TPS. Even if, conservatively, each request generates 1K of data, we are looking at ~100GB/day, or ~30TB/year, for just the raw information, and the total grows in proportion to the data captured per request. While people tend to focus on the “storage” costs of managing this kind of information, it is almost always the people costs that dominate. Then add the complexity of running analytical tasks, and the simple “systems management” problem (not the byte storage costs) will tend to be overwhelming.
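The raw-volume estimate is simple arithmetic, sketched below so the assumptions can be varied. The function name and the use of decimal units (1 KB = 1000 bytes, 1 TB = 10^12 bytes) are choices made for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_volume(tps: float, bytes_per_request: float) -> tuple[float, float]:
    """Return (GB per day, TB per year) of raw data, using decimal units."""
    bytes_per_day = tps * bytes_per_request * SECONDS_PER_DAY
    gb_per_day = bytes_per_day / 1e9
    tb_per_year = bytes_per_day * 365 / 1e12
    return gb_per_day, tb_per_year


# Compare two assumed payload sizes at a steady 1000 TPS.
for size in (1_000, 10_000):
    gb_day, tb_year = raw_volume(1000, size)
    print(f"{size} bytes/request: {gb_day:.1f} GB/day, {tb_year:.1f} TB/year")
```

Under these assumptions, 1 KB per request at a steady 1000 TPS works out to about 86 GB per day and about 32 TB per year, while a payload nearer 10 KB per request gives about 315 TB per year. The per-request size therefore dominates the estimate, and totals in the hundreds of terabytes imply several kilobytes captured per request or several years of retention.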
I have argued in the past that APIs benefit from cloud deployment. Today I will argue that the analytics on APIs also benefits tremendously from cloud deployment.
People Costs
Let’s first tackle the people costs. Managing hundreds of TB is not for the faint of heart. Clearly, when you have such a large volume, one would like to use the cheapest storage, which is often significantly less reliable than SAN storage. The storage needs to be backed up. The hundreds of servers need to be monitored. When bad things happen (and when is it that bad things never happen?), corrective action needs to take place. Many of the Big Data technologies are immature, and have a fast release/patch cycle, so someone has to worry about patch management too. The list goes on and on.
One soon realizes that it is often better to have some experts “manage” this for you – typically a cloud delivered service. The service provider has in-house data management experts, might even have in-house data scientists – this allows you to focus on the business you have to run, and the value that Big Data will deliver, rather than all the blocking and tackling that needs to be done managing that data.
API Burstiness
APIs by nature are “bursty” in use. How does one provision the Big Data infrastructure that is needed to support this (over provision for peak, or under provision for average)?
It is absolutely clear that instantaneous provisioning of new data and storage is not an easy model for any enterprise to execute on, let alone a small to medium sized enterprise. Failure to provision leads to enterprise amnesia#. The data is either lost forever, or can only be analyzed days or weeks after the fact, when the new systems are provisioned.
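The provisioning dilemma can be made concrete with a toy model. The sketch below uses invented hourly traffic figures and a hypothetical `provisioning_report` helper; it simply compares how much capacity sits idle and how much traffic overflows when a fixed capacity is sized for the average versus the peak.

```python
# Hypothetical hourly request volumes (millions of requests) for one bursty day.
hourly_load = [2, 2, 1, 1, 1, 2, 4, 8, 12, 9, 6, 5,
               5, 6, 7, 20, 9, 6, 5, 4, 3, 3, 2, 2]

def provisioning_report(load, capacity):
    """Return (idle capacity, overflow) summed over all hours for a fixed capacity."""
    idle = sum(max(capacity - hour, 0) for hour in load)
    overflow = sum(max(hour - capacity, 0) for hour in load)
    return idle, overflow

average = sum(hourly_load) / len(hourly_load)
peak = max(hourly_load)

for label, capacity in (("average", average), ("peak", peak)):
    idle, overflow = provisioning_report(hourly_load, capacity)
    print(f"provisioned for {label} ({capacity:.1f}M/hour): "
          f"idle={idle:.1f}M, overflow={overflow:.1f}M")
```

Sizing for the peak leaves most of the capacity idle, while sizing for the average pushes the burst into overflow, which is the data that is lost or analyzed late.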
It therefore makes perfect sense to look at a cloud delivery model for your API analytics.
I am not suggesting that there aren’t issues to deal with as you make these decisions. For example, data security is always top of mind. However, as cloud-delivered services mature, these issues become less and less bothersome. The benefit is that moving your API analytics to the cloud involves very little pain and spares you the headaches of managing the infrastructure, so that you can focus on the gains.
* Hadoop Has Promise but Also Problems - Jessica E Vascellaro, Wall Street Journal
# Enterprise Amnesia: Organizations Have Lost Their Minds - Jeff Jonas
Data Analytics & APIs »
There's an explosion of developers building mobile and web apps. There's also an explosion in APIs as enterprises adopt API strategies to open up access to their back-end systems to enable developers. As a result, more transactions flow from end users and developers to enterprises increasing reach, revenue, and profit for enterprises.
In this short video, I talk about how to think about analytics as a peer of your APIs and the role analytics plays in having your API strategy succeed.
Every enterprise expects to see an increasing rate of transactions (sales, subscriptions, etc.) over time. API-delivered transactions are going to contribute an increasingly large fraction of this top or bottom line.
For an enterprise that is early in its API journey, a small fraction of total transactions come from APIs.
For an enterprise in a later stage of its API program, APIs contribute a much larger fraction of the total transactions.
For all these companies, analytics on APIs benefit many constituents including developers, API managers, operations teams, and business managers.
For the early adopters - the companies embarking on their API journey - analytics optimize the business of APIs. IT and API metrics such as throughput, availability, latency, and errors all help improve your APIs, which in turn accelerates the growth of your API strategy.
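As a rough illustration of those IT-level metrics, the sketch below derives throughput, availability, error rate, and median latency from a handful of request records. The log format, the `summarize` helper, and the rule that only 5xx responses count against availability are all assumptions made for the example.

```python
from statistics import median

# Hypothetical request log: (timestamp in seconds, HTTP status, latency in ms).
requests_log = [
    (0.0, 200, 120), (0.4, 200, 95), (1.1, 500, 310),
    (1.8, 200, 88), (2.5, 404, 40), (3.9, 200, 150),
]

def summarize(log):
    """Compute throughput, availability, client error rate, and median latency."""
    duration = (log[-1][0] - log[0][0]) or 1.0   # avoid dividing by zero
    total = len(log)
    server_errors = sum(1 for _, status, _ in log if status >= 500)
    client_errors = sum(1 for _, status, _ in log if 400 <= status < 500)
    return {
        "throughput_rps": total / duration,
        "availability": 1 - server_errors / total,
        "client_error_rate": client_errors / total,
        "median_latency_ms": median(latency for _, _, latency in log),
    }

print(summarize(requests_log))
```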
For the seasoned enterprises, analytics become even more important to optimize the business of the enterprise. Leveraging business-level analytics on APIs -- which products are purchased, which are tagged as favorite, which API resources are most used, etc. -- the enterprises in which APIs already contribute a high fraction of transactions will also see an uptick in the rate and relative contribution of API business to the bottom line.
Irrespective of the phase or size of your API program, your business will benefit from analytics.
Video and Slides: Is your API naked? API Platform and Ops Considerations »
Thanks to all that attended last week's API Best Practices Webinar #5 "Is your API Naked? API Platform and Operations Considerations" (and thanks to our presenters @gbrail and @landlessness). Video and slides are below.
Our next API webinar, "Your API Sucks! Why developers hang up and how to stop that" with @landlessness and @earth2marsh, is June 14th at 11am PST (sign up here!)
(And you can see all our API best practices webinars to date here)
Video and Slides: API Metrics - What to Measure »
Thanks to all that attended last week's API Best Practices Webinar #4, API Metrics - What to Measure (and thanks to our presenters @brianpagano and @landlessness). Video and slides are below.
Our next API webinar, "Is your API Naked? API Technology and Ops Considerations" with @landlessness and @gbrail, is June 14th at 11am PST (sign up here!)
Design your API for adoption with ‘true REST’ »
"The only reason you'd have only a SOAP API is because you hate 80% of your addressable market." - @sramji
There's usually little argument that a REST API is easier to use than a SOAP API.
But how important is it to be 'truly' or 'strictly' RESTful? That is, adhering to standard HTTP operations or 'verbs' - GET, PUT, DELETE, POST - on well defined resources, as opposed to the common practice of embedding 'verbs' or operations as methods in a GET URL.
Typically, security is cited as the big advantage of 'true REST' (with some good discussions here and here).
However, a truly RESTful API may help you boost developer adoption. For example, imagine a 'shopping cart' API:
| Operation | HTTP method | URL |
|---|---|---|
| Insert new item into the cart | POST | http://api.shopping.com/InsertNewItem |
| Delete item from the cart | POST | http://api.shopping.com/DeleteItem |
| List everything in the cart | GET | http://api.shopping.com/ListCart?cartId=X |
| Get an item in the cart | GET | http://api.shopping.com/ShowItem?cartId=X&itemId=Y |
| Delete the whole cart | POST | http://api.shopping.com/DeleteCart |
While the above API isn't 'truly RESTful', it's not that hard to use. But you do have to learn the individual operations and this can get cumbersome if there are a lot of them or as the API evolves.
Instead, this 'true REST API' may be easier to learn and predict as you use more features.
| Operation | HTTP method | URL |
|---|---|---|
| Insert new item into the cart | POST * | http://api.shopping.com/carts/X.xml |
| Delete item from the cart | DELETE | http://api.shopping.com/carts/X/items/Y.xml |
| List everything in the cart | GET | http://api.shopping.com/carts/X.xml |
| Get an item in the cart | GET | http://api.shopping.com/carts/X/items/Y.xml |
| Delete the whole cart | DELETE | http://api.shopping.com/carts/X.xml |
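One way to see why the resource-oriented style is predictable is to sketch the server side as a small dispatcher keyed on HTTP method and resource pattern. Everything here is illustrative: the in-memory `carts` store, the handler names, and the exact path patterns (including the `items` segment) are assumptions, not the API of any real service.

```python
import re

# In-memory stand-in for the cart service; cart and item ids are hypothetical.
carts = {"X": {"Y": {"name": "book"}}}

def list_cart(cart_id):
    return sorted(carts[cart_id])

def get_item(cart_id, item_id):
    return carts[cart_id][item_id]

def delete_item(cart_id, item_id):
    return carts[cart_id].pop(item_id)

def delete_cart(cart_id):
    return carts.pop(cart_id)

# One entry per (HTTP method, resource pattern): the verbs stay fixed, only resources vary.
ROUTES = [
    ("GET",    re.compile(r"^/carts/(\w+)\.xml$"), list_cart),
    ("DELETE", re.compile(r"^/carts/(\w+)\.xml$"), delete_cart),
    ("GET",    re.compile(r"^/carts/(\w+)/items/(\w+)\.xml$"), get_item),
    ("DELETE", re.compile(r"^/carts/(\w+)/items/(\w+)\.xml$"), delete_item),
]

def dispatch(method, path):
    """Find the handler for a method and path, and call it with the captured ids."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return handler(*match.groups())
    raise LookupError(f"no route for {method} {path}")

print(dispatch("GET", "/carts/X.xml"))   # ['Y']
```

Adding a new resource means adding patterns, not inventing new operation names for clients to learn.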
What if we want to list all the shopping carts in the system at any one time? We would add that via an HTTP GET to:
http://api.shopping.com/carts.xml
Query parameters can still serve a purpose - making it possible to specify additional options. For instance, imagine a very large shopping cart, and you want to "paginate" the results. To look at items 20-29, you might use a URL like:
http://api.shopping.com/carts/X.xml?start=20&count=10
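On the server side, handling such pagination options might look like the sketch below. The `paginate` helper, its default page size, and the generated cart contents are hypothetical; only the `start` and `count` parameter names come from the example URL.

```python
from urllib.parse import urlsplit, parse_qs

def paginate(url, items, default_count=10):
    """Return the slice of items selected by the start and count query parameters."""
    query = parse_qs(urlsplit(url).query)
    start = int(query.get("start", ["0"])[0])
    count = int(query.get("count", [str(default_count)])[0])
    return items[start:start + count]

cart_items = [f"item-{n}" for n in range(50)]   # hypothetical cart contents
page = paginate("http://api.shopping.com/carts/X.xml?start=20&count=10", cart_items)
print(page[0], page[-1])   # item-20 item-29
```

The resource URL stays the same; the query string only narrows which part of it comes back.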
Bonus: True REST makes analytics easier
It's a lot easier to build reports that segregate and analyze traffic by URL than to build logic that tries to do this by methods or combination of methods. Good API analytics helps you optimize features, debug problems, and weed out traffic that can slow down your service.
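A minimal sketch of that kind of report: collapse concrete ids in each path into a placeholder, then count hits per method and resource template. The access log entries and the `resource_template` helper are invented for illustration, and the sketch assumes numeric ids.

```python
import re
from collections import Counter

# Hypothetical access log entries: (HTTP method, request path).
access_log = [
    ("GET", "/carts/17.xml"), ("GET", "/carts/42.xml"),
    ("GET", "/carts/17/items/3.xml"), ("DELETE", "/carts/17/items/3.xml"),
    ("GET", "/carts/42.xml"),
]

def resource_template(path):
    """Collapse numeric ids so that requests group by resource type."""
    return re.sub(r"/\d+", "/{id}", path)

traffic = Counter((method, resource_template(path)) for method, path in access_log)
for (method, template), hits in traffic.most_common():
    print(f"{hits:3d}  {method:6s} {template}")
```

With verbs embedded in URLs or request bodies, the same report would need per-operation parsing logic instead of one regular expression.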
What if your 'non-strict' REST API is already out there?
It might not be that big a deal if your API is very simple, or is a 'read-only' API with information that isn't too sensitive (such as a free search API). Or you can map a non-RESTful API into a 'truly RESTful' API with custom code or API management tools that perform API transformations.
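Such a mapping can be sketched as a translation table from RESTful requests to the verb-in-URL calls of the first shopping cart table. The `to_legacy` function and the path patterns are hypothetical, and the sketch returns the parameters separately, assuming the caller would send them as a query string for GET or in the request body for POST.

```python
import re

LEGACY_BASE = "http://api.shopping.com"   # host taken from the example tables

# (REST method, REST path pattern, legacy method, legacy operation name)
MAPPINGS = [
    ("GET",    r"^/carts/(?P<cartId>\w+)\.xml$", "GET", "ListCart"),
    ("GET",    r"^/carts/(?P<cartId>\w+)/items/(?P<itemId>\w+)\.xml$", "GET", "ShowItem"),
    ("DELETE", r"^/carts/(?P<cartId>\w+)/items/(?P<itemId>\w+)\.xml$", "POST", "DeleteItem"),
    ("DELETE", r"^/carts/(?P<cartId>\w+)\.xml$", "POST", "DeleteCart"),
]

def to_legacy(method, path):
    """Translate a RESTful request into (legacy method, legacy URL, parameters)."""
    for rest_method, pattern, legacy_method, operation in MAPPINGS:
        match = re.match(pattern, path)
        if rest_method == method and match:
            return legacy_method, f"{LEGACY_BASE}/{operation}", match.groupdict()
    raise LookupError(f"no mapping for {method} {path}")

print(to_legacy("GET", "/carts/X/items/Y.xml"))
# ('GET', 'http://api.shopping.com/ShowItem', {'cartId': 'X', 'itemId': 'Y'})
```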
* A note on POST vs. PUT
One way to insert an item in the shopping cart is to use POST to update the shopping cart by sending it a new item. In this case, we are using POST to send the server an instruction that essentially tells it to insert some new content to the existing resource. This is why we use POST -- it is like an "update" in a database.
Alternately, we could add the item by using PUT to a new URL, such as:
PUT http://api.shopping.com/carts/X/items/Y.xml
But if we do this, then we need to somehow give our item some sort of unique URL by picking a value for "Y". This is kind of a strange thing to do, so it may be more natural to use POST and have the response include the URL for the new item, so that we may retrieve it later using GET or delete it using DELETE. Still, sometimes using PUT like this makes sense.
This all comes down to the difference between PUT and POST in the HTTP spec. POST modifies something that already exists, and how that thing is modified is up to the server. PUT replaces the entire contents of the resource at the URL with new data. Plus, like GET, HEAD, and DELETE, PUT is idempotent, which means that if you call it more than once, the result is the same every time, whereas POST may keep doing what you ask it to do over and over again.
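The difference in repeat behavior is easy to see in a toy model. The `CartResource` class below is a hypothetical in-memory stand-in for a cart, not a real HTTP server: `post` lets the server pick the item id, while `put` writes to an id chosen by the client.

```python
import itertools

class CartResource:
    """In-memory stand-in for /carts/X; ids and payloads are hypothetical."""

    def __init__(self):
        self.items = {}
        self._next_id = itertools.count(1)

    def post(self, item):
        """POST to the cart: the server picks the id, so each call adds a new item."""
        item_id = str(next(self._next_id))
        self.items[item_id] = item
        return f"/carts/X/items/{item_id}.xml"   # URL of the newly created item

    def put(self, item_id, item):
        """PUT to an item URL: the client picks the id and the content is replaced."""
        self.items[item_id] = item
        return f"/carts/X/items/{item_id}.xml"

cart = CartResource()
cart.put("Y", {"name": "book"})
cart.put("Y", {"name": "book"})    # repeating the PUT still leaves one book
cart.post({"name": "pen"})
cart.post({"name": "pen"})         # repeating the POST adds a second pen
print(len(cart.items))             # 3
```

This is why a client can safely retry a PUT after a timeout, but needs more care when retrying a POST.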
But can you hold the API to the SLA? »
Great article by Jonathon Feldman in Information Week recently with some steps for CIOs to take before getting into cloud computing. One is to insist on SLAs from cloud providers, especially considering the natural tension from the provider's perspective between high-availability and low-cost operations.
Absolutely agree. But to build on this - remember that scene from Seinfeld where Jerry is at the car rental counter - "Anybody can *take* a reservation, the important part is to *hold* the reservation."
Often, cloud and API providers will agree to SLAs, but have limited means to enforce or verify the SLA is held. Many SLAs are just 'on paper' with minimal enforcement or monitoring. This gets especially tricky if you need to discuss financial penalties.
Consider how you will monitor, meter, and audit API traffic and content between you and your partners - from your side - so that you can pinpoint problems, protect your organization, and, if necessary, enforce penalties for SLA misses.
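Verifying an SLA from your side can start with something as simple as the sketch below, which checks measured availability and 95th percentile latency against agreed thresholds. The thresholds in `SLA`, the sample data, and the helper names are all assumptions; real terms would come from the contract.

```python
# Hypothetical SLA terms; real values would come from the contract.
SLA = {"availability": 0.999, "p95_latency_ms": 500}

def p95(latencies):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]

def sla_report(samples):
    """samples: list of (succeeded, latency_ms) pairs measured from the caller's side."""
    availability = sum(1 for ok, _ in samples if ok) / len(samples)
    latency = p95([ms for _, ms in samples])
    return {
        "availability": availability,
        "p95_latency_ms": latency,
        "availability_met": availability >= SLA["availability"],
        "latency_met": latency <= SLA["p95_latency_ms"],
    }

samples = [(True, 120)] * 97 + [(True, 900)] * 2 + [(False, 3000)]
print(sla_report(samples))
```

Keeping the raw samples alongside the report gives you the audit trail needed if a penalty discussion ever comes up.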
Why modern applications need an API proxy »
Structures of control are spontaneously generated in every environment and every wave of computing.
Today on the web we have a model where browsers are the single point of control for much of what happens, not just at the level of applications, but at the meta-application level as well. Not simply usage (“point-click-type”), but things about usage – who is the user (browser cookie), what are they using the app through (user agent), where did they come from (referrer), what can we infer about their behavioral state, and so on – as well as modifications of usage (browser add-ins, content filters, security modes, local caching for performance). To be sure, some of these things can be and are performed using infrastructure between the browser and the website (such as content filtering, security, and caching), but the guaranteed component is the browser.
This is one of the reasons that Google Analytics is so popular and useful – you can rely on it to tell you useful things about your traffic because it can rely on the browser as a predictable point of control. Including an invisible piece of content on your web page makes the browser fetch data from Google, implicitly sending information that enables Google to report on your usage.
For web and cloud APIs, what is the equivalent structure of control?
Currently there is no one point like the browser. This is for great reasons – APIs are all about reusing application or service logic and rendering it to different form factors: pure logic (built into an internal application computation), web UIs (part of a mashup), and most notably, client applications on a wide range of devices (from PCs to mobile phones, set-top boxes, and tablets like the iPad). These devices are in the early part of a boom that will see over 10 billion individual units in use, representing at least hundreds of unique hardware/software designs. The sheer utility of these internet-connected devices predicts that their usage will drive high demand for APIs rather than standard websites. There are initial specifications like BONDI that suggest a standard contract across all of these for “mobile web applications” that include interaction with the features of the local device (such as a camera or GPS) but they are years from broad adoption and don’t attempt to unify all API access down to a common control point.
Given that APIs are to application logic what RSS is to content, we know they will be very important; at least as important as the visible web that we use today and possibly more important. This suggests that the other things that are spontaneously generated in value-exchange environments, like user/customer management, behavior analysis, content filtering, caching, and security, will show up for APIs as well.
The web API equivalent of the browser’s control structure is an API proxy.
This is a control point which, unlike a web proxy, is fully aware of API content and communication patterns, and is able to drive the meta-application controls discussed above. An architecture like Google Analytics, which is founded on a browser’s predictable algorithms, cannot work in an API setting. The same rule applies to add-ons that modify usage – they can’t rely on the local device if they are to be widely adopted. But an API proxy – a server or service on the internet, sitting between the client (regardless of type) and the API – is able to be that point of control. As traffic runs through it, meaningful data can be captured for immediate outcomes (block access, change the message, or respond from a cache) and later used for behavior analysis and business planning. Add-ons that modify usage of the API can be installed at this point (content filtering, adding new information such as advertising, or identity management). All of this can be done while adhering to the contracts of the APIs and supporting the web architecture and rules of HTTP-based applications, and without attempting to solve the exponentially complex problem of modifying all the world’s clients.
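The idea of a single control point can be sketched without any networking at all. The `ApiProxy` class below is a toy, with a hypothetical backend function and API keys; it shows the three immediate outcomes mentioned above (block, respond from cache, pass through) and the capture of usage data for later analysis.

```python
import time
from collections import Counter

class ApiProxy:
    """Toy control point between any client and a backend callable."""

    def __init__(self, backend, blocked_keys=()):
        self.backend = backend            # function(path) -> response body
        self.blocked_keys = set(blocked_keys)
        self.cache = {}
        self.calls_by_key = Counter()     # captured for later behavior analysis
        self.latencies = []

    def handle(self, api_key, method, path):
        self.calls_by_key[api_key] += 1
        if api_key in self.blocked_keys:
            return 403, "blocked"                      # block access
        if method == "GET" and path in self.cache:
            return 200, self.cache[path]               # respond from cache
        started = time.perf_counter()
        body = self.backend(path)
        self.latencies.append(time.perf_counter() - started)
        if method == "GET":
            self.cache[path] = body
        return 200, body

backend_hits = []

def backend(path):
    backend_hits.append(path)
    return f"<cart path='{path}'/>"

proxy = ApiProxy(backend, blocked_keys={"abusive-app"})
proxy.handle("mobile-app", "GET", "/carts/X.xml")
proxy.handle("set-top-box", "GET", "/carts/X.xml")    # served from the cache
proxy.handle("abusive-app", "GET", "/carts/X.xml")    # rejected at the proxy
print(len(backend_hits), dict(proxy.calls_by_key))
```

None of the three clients had to change for caching, blocking, or metering to take effect, which is the point of putting the controls in the proxy rather than on the device.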
So API proxies are likely to be necessary for the sustained growth of web and cloud API usage. There are likely to be several nuances that end up differentiating the different implementations and providers of API proxies. The key is to start experimenting with them now in order to build better apps and stay ahead of the competition.
Tech Talk: API Visibility and Metrics »
Earlier this week, Greg speculated that Twitter might have benefited from digging deeper into API metrics and usage patterns, so we thought it would be a good time to put him on the spot with a tech talk he recorded on API visibility a couple weeks ago.
For more, here are some sample API metrics considerations and a demo of our own API Analytics solution.
TrueCredit.com API case study »
Scott Metzger, CTO of TrueCredit.com was kind enough to take some time to talk about their Consumer Connect API program and some of the technical challenges that they have addressed using Apigee's API Gateway.
Scott wanted to make life easier on his development team as they ramped up their number of APIs, partners and traffic volumes. Here, he describes how he uses the technology as a 'policy layer' to provide API analytics, fine-grained data protection, and caching in an API Gateway. In this case, Apigee Enterprise is deployed as on-premise virtualized software.
We're very excited to be working with Scott and TrueCredit. Check out the full TrueCredit case study.
Is your API naked? 10 API roadmap considerations (part 1: visibility) »
We'd like to run a short series on product and technical requirements you might consider for an API roadmap or strategy. We’ll base this on trends we see in talking to hundreds of product and engineering managers that are either opening or consuming APIs (or aggregating and publishing large numbers of RSS feeds.)
Instead of talking about your API's functionality, this is about what's *around* the API features and data. Many APIs start out as just raw, naked back-end features. And there is often a big gap between a raw feature and a full-fledged service, which is what your API might eventually become.
So this series is about what's needed to "monetize", "productize", or "operationalize" an API. And not just if you are providing an API to customers – much of this applies if you are consuming APIs as well.
Topic #1: API Visibility
We're always surprised that almost every company we talk to says how little they know about their API traffic and usage. We see lots of teams sifting through web server logs to understand usage. As the API becomes more strategic and you want to make the case for more investment, this gets more painful.
This happens a lot when an API starts as an experiment, launched by the 'ask for forgiveness, not permission' types (you know who you are). Things like metrics or analytics stay on the back burner until an API either gets off the ground or doesn't.
But most APIs end up getting more important more quickly than expected, and as a product or engineering manager you may start asking:
Who is using the API and how much are they using?
- How many clients, apps, developers are out there?
- How do they break down by type, region, class of service?
- How does usage map to existing customer or partner organizations? How do developers map to applications, and applications to customers? (This can be tough with only key or IP based tracking.)
- What parts of the API and service are they using - on a method or operation level?
- How does traffic break down between your own products and 3rd party products? (If they use the same API.)
- What are the aggregate and per developer/app/customer transaction and data volumes?
How fast and 'good' is your service?
- What latency are customers experiencing, by developer, customer, region, or operation?
- Where are errors and user experienced bugs happening and how often?
- How is the API delivering vs. any formal SLA you have agreed to or paid for?
- How can you find out if a customer is having a problem (before they call you)?
- How is the API usage impacting the rest of the platform or web products that also use the same API?
- Can you quickly trap and debug based on a specific message? Based on what is in a cache? Or on what is streaming right now?
How does the API impact the business?
- Who are the top priority customers? Developers? Partners? Who should you call to up sell to a higher service tier or do a deal with?
- What do you need to show general management to make product strategy (or tactical) decisions?
- Will you need to create audit trails or metering reports for partners that are paying for API access?
- Do you need to create metrics based on a certain data value in the payload? (Such as a specific product SKU)
- What is the cost of the data that you are serving up? (if you are licensing this data)
- Are you in line with all the compliance standards that IT enforces for the on-premise apps?
Knowing this stuff is really important when opening an internal service as an API, because now customers, contract terms, and compliance regulations come into play. Analytics and metrics help you get proactive with customers and partners, and are critical when you need to make the business case for an API to executives. You probably use web analytics to help you improve your Web UI - at some point you need this for APIs as well to see where to invest.
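Several of the questions above reduce to grouping call records along different dimensions. The sketch below uses invented records and field names (`developer`, `operation`, `latency_ms`, `sku`) to show usage per developer, per operation, and per payload value, plus mean latency per developer.

```python
from collections import Counter, defaultdict

# Hypothetical API call records; field names are illustrative only.
calls = [
    {"developer": "acme", "operation": "GET /carts", "latency_ms": 90, "sku": "A-100"},
    {"developer": "acme", "operation": "POST /carts", "latency_ms": 210, "sku": "B-200"},
    {"developer": "globex", "operation": "GET /carts", "latency_ms": 130, "sku": "A-100"},
    {"developer": "acme", "operation": "GET /carts", "latency_ms": 70, "sku": "A-100"},
]

calls_by_developer = Counter(call["developer"] for call in calls)
calls_by_operation = Counter(call["operation"] for call in calls)
calls_by_sku = Counter(call["sku"] for call in calls)   # metric from a payload value

latencies = defaultdict(list)
for call in calls:
    latencies[call["developer"]].append(call["latency_ms"])
mean_latency = {dev: sum(ms) / len(ms) for dev, ms in latencies.items()}

print(calls_by_developer.most_common(1))   # heaviest consumer
print(calls_by_operation)
print(calls_by_sku)
print(mean_latency)
```

The hard part in practice is capturing records with these fields in the first place, which plain web server logs rarely do.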
What's your experience? We’d love to hear what you think. Next up: traffic management.
SaaS API management and operations »
This week we'll be at the O'Reilly Velocity conference on scalability and operations in San Jose. On the topic of API operations, below is a case study we did with Tim Madewell of Innotas, providers of on-demand IT Governance - where he talks about how they operationalize and scale their SaaS API.
Tim talks about the importance of having separation and visibility between front-end and back-end service traffic. We are seeing this use case more often as more web products are being built off the same API that is opened to customers and partners. Because your web app is the biggest customer of the API, it's critical to be able to understand and throttle traffic into the back-end to make sure your web app performance isn't compromised by API usage by other clients.
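One common way to throttle back-end traffic per class of client is a token bucket. The sketch below is a generic illustration, not a description of how Innotas does it: the `TokenBucket` class, the two client classes, and their limits are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow at most `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.updated = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may pass."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical limits: third-party clients share a small budget, the web app a large one.
limits = {
    "third_party": TokenBucket(rate=5, burst=5),
    "web_app": TokenBucket(rate=100, burst=100),
}

def admit(client_class, now):
    return limits[client_class].allow(now)

burst = [admit("third_party", now=0.0) for _ in range(8)]
print(burst.count(True))            # 5
print(admit("web_app", now=0.0))    # True
```

Because each class has its own bucket, a burst from third-party clients exhausts only their budget and leaves the web app's capacity untouched.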
From a competitive standpoint, Tim makes a great point that it's critical to be able to assure enterprise customers that a SaaS API is as robust as anything their customer could build or buy on-premise - not only from a functional standpoint, but operationally in terms of security, compliance, control and scale.
For more on this, Dana Gardner did a great podcast on Innotas API management at briefingsdirect.com



