API Best Practices Blog

In the cloud, scale means concurrency »

In enterprise computing, scale has traditionally meant “lots of transactions per second.”  On Wall Street for many years, “20,000 TPS” was the magic number, as it was the rate of a typical market data feed.  Infrastructure like TIBCO’s UDP-based information bus and then IBM’s MQSeries became the base platforms for much of this scale of computing, and are still heavily used alongside modern JMS and MSMQ implementations.
 
Relatively little attention was paid to concurrent connections.  Enterprise environments tend to be well-regulated, and most applications will have under 1000 simultaneous users (whether human or machine driven).  As a result, application servers and related technologies evolved to support high transaction throughput at limited concurrency.
 
The web, on the other hand, brought much higher concurrency requirements, and platforms like WebLogic became default components of web computing environments for sites serving thousands of people at the same time.  This was a breakthrough and led to significant market success in a short time period.
 
With the rise of cloud computing, two things change.  First, mobile applications and the API economy are driving an order of magnitude increase in the number of simultaneous users.  Second, these users are often machines rather than people, and therefore aren’t limited to the demand patterns of human users clicking links or refreshing their pages.
 
This produces a new set of demand patterns which increase both total throughput and peak concurrency.  As an example, travel sites like Kayak.com and Bing.com/travel issue hundreds of API requests to airline reservation system backends as a result of a single human-driven query.  Furthermore, these requests are being made not just by desktop or web applications but by mobile applications – especially iPhone applications.  As most people are aware, the next 10 billion devices that come online will be mobile devices (phones, MIDs, GPS, game units, media players).  Each of these is prized for its native application experiences.  Each of these devices will be making user-driven and automated calls to cloud services in order to deliver those experiences.
 
Where backend systems are not protected from this demand, they are being penalized in performance and load management.  This causes either outright outages, “web brownouts” where the core website that uses the same backend slows down, or erratic performance across both the web and cloud properties.  Again, mobile access exacerbates the issue due to the intermittent nature of mobile internet connectivity, which multiplies the number of connections that need to be set up and torn down as the device comes on and off the network.
 
So the explosion of concurrent usage is already beginning, as the traffic and backend impact is expanding.  To manage this and maintain stability of existing infrastructure, a new layer of infrastructure is emerging, much as HTTP load balancers have evolved to serve the needs of web computing.  What we’re seeing is the rise of cloud service controllers, a category of infrastructure that works well with existing systems and builds on top of the strengths of application servers, enterprise messaging systems, and application delivery controllers.
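
One way to picture the protective role described above is a gate that caps how many calls reach a backend at once, however many clients are asking. The following is only a minimal sketch under stated assumptions: the class name `ConcurrencyGate`, the limit of 5, and the simulated `slow_backend` function are all hypothetical and are not taken from any particular product.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_BACKEND_CONCURRENCY = 5  # assumed capacity of the backend (hypothetical)


class ConcurrencyGate:
    """Caps simultaneous backend calls and records the observed peak."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def call(self, backend, *args):
        with self._slots:  # blocks while `limit` calls are already in flight
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return backend(*args)
            finally:
                with self._lock:
                    self.in_flight -= 1


def slow_backend(request_id):
    """Stand-in for a reservation system that is slow to answer."""
    time.sleep(0.01)
    return f"fare-{request_id}"


def fan_out(gate, n_requests, n_clients=50):
    """Simulates many clients hitting the same backend through the gate."""
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return list(pool.map(lambda i: gate.call(slow_backend, i), range(n_requests)))


gate = ConcurrencyGate(MAX_BACKEND_CONCURRENCY)
results = fan_out(gate, 200)
print(len(results), "responses; peak backend concurrency:", gate.peak)
```

Fifty clients issue 200 requests, but the backend never sees more than the configured number at a time. A real controller would likely add queuing limits, timeouts, and per-client quotas on top of this.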

Tradeoffs in XML data transformations »

Daniel Jacobson of NPR posted a fascinating piece about how NPR tackles a common problem – what’s the best way to render content on a variety of devices, from modern web browsers with top-notch CSS implementations that look almost like typesetting (like Safari) to mobile phones using WAP to low-end devices like HD Radio receivers that don’t understand anything but plain ASCII text.

NPR’s clever solution is to strip markup out of the text and store it in a database table, indexed by position in the text document. To re-generate the content for a particular device, their software queries the database and re-applies the markup tags to the content according to what device it is rendering to.
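
As a rough illustration of the idea (not NPR's actual implementation), the sketch below keeps the plain text in one place and the markup as separate position-indexed records, then re-applies only the tags a given device understands. The in-memory list standing in for the database table, the role names, and the `render` function are assumptions made for this example.

```python
TEXT = (
    "This is a headline\n"
    "By I.M.A. Reporter\n"
    "And here is my first paragraph with something in italics."
)


def span(text, phrase, role):
    """Builds one (start, end, role) record for the first match of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase), role)


# Stand-in for the database table: markup stored apart from the text.
ANNOTATIONS = [
    span(TEXT, "This is a headline", "headline"),
    span(TEXT, "By I.M.A. Reporter", "byline"),
    span(TEXT, "And here is my first paragraph with something in italics.", "paragraph"),
    span(TEXT, "italics", "italics"),
]

# Per-device tag tables; a device with no entry for a role gets bare text.
HTML_TAGS = {
    "headline": ("<h1>", "</h1>"),
    "byline": ("<p><b>", "</b></p>"),
    "paragraph": ("<p>", "</p>"),
    "italics": ("<i>", "</i>"),
}
PLAIN_TAGS = {}


def render(text, annotations, tags):
    """Re-applies markup for one device; roles missing from `tags` are dropped."""
    inserts = []
    for start, end, role in annotations:
        if role not in tags:
            continue
        open_tag, close_tag = tags[role]
        length = end - start
        # Sort keys: at one position closes come before opens, outer spans
        # open first, and inner spans close first, so nesting stays valid.
        inserts.append((start, 1, -length, open_tag))
        inserts.append((end, 0, length, close_tag))
    inserts.sort()

    out, cursor = [], 0
    for pos, _, _, tag in inserts:
        out.append(text[cursor:pos])
        out.append(tag)
        cursor = pos
    out.append(text[cursor:])
    return "".join(out)


print(render(TEXT, ANNOTATIONS, HTML_TAGS))
print(render(TEXT, ANNOTATIONS, PLAIN_TAGS))
```

The same stored text yields HTML for a browser and untouched plain text for a device that understands no markup at all.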

This takes me back to the original reason SGML was invented and made an ISO standard in 1986. The idea was to describe the semantic meaning of text, and then to let a computer program figure out how to render it for human consumption.

SGML was a little over-engineered for that purpose, however, so a bunch of smart people got together in 1996 and started work on XML, a simpler subset of SGML. XML then begat technologies like XHTML and XSLT, which sit alongside HTML and CSS.

So today, instead of writing something like:

<h1 class="headline">This is a headline</h1><p class="byline"><b>By I.M.A. Reporter</b></p><p class="paragraph">And here is my first paragraph with something in <i>italics</i>.</p>

 

XML lets us write:

<main_headline>This is a headline</main_headline><byline>By I.M.A. Reporter</byline><p>And here is my first paragraph with something in <i>italics</i>.</p>

 

The difference is that my second example isn’t HTML – it’s part of a document that uses an XML schema that’s up to me, and when writing it I don’t care if I’m coding for an HTML browser or for a car radio – I just have to identify when I’m writing a headline, or a byline, or a caption, and so on. I can now use XSLT or another transformation technology to transform this XML into very simple HTML for a simple browser, or into very complex HTML with links to a CSS stylesheet for a more sophisticated browser, or just into plain text. And if I decide that part of my XML schema should look just like HTML (like I did above with the “p” and “i” tags) then that’s fine too.
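
To make the transformation step concrete, here is a small sketch that turns the semantic XML above into either class-annotated HTML or plain text. It deliberately swaps in Python's standard `xml.etree.ElementTree` for XSLT so that it runs without extra libraries; the wrapping `<article>` root element, the `HTML_RULES` mapping, and the function names are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The semantic document from above, wrapped in an assumed <article> root
# so that it is well-formed XML.
SOURCE = (
    "<article>"
    "<main_headline>This is a headline</main_headline>"
    "<byline>By I.M.A. Reporter</byline>"
    "<p>And here is my first paragraph with something in <i>italics</i>.</p>"
    "</article>"
)

# Semantic element -> (HTML tag, attributes) for a CSS-capable browser.
HTML_RULES = {
    "main_headline": ("h1", {"class": "headline"}),
    "byline": ("p", {"class": "byline"}),
    "p": ("p", {"class": "paragraph"}),
    "i": ("i", {}),
}


def to_html(node, rules):
    """Recursively maps each semantic element to its HTML counterpart."""
    tag, attrs = rules[node.tag]
    out = ET.Element(tag, attrs)
    out.text, out.tail = node.text, node.tail
    out.extend([to_html(child, rules) for child in node])
    return out


def render_html(xml_text):
    """Output for a browser: one HTML element per semantic element."""
    root = ET.fromstring(xml_text)
    return "".join(
        ET.tostring(to_html(child, HTML_RULES), encoding="unicode") for child in root
    )


def render_text(xml_text):
    """Output for a text-only device: tags dropped, one block per line."""
    root = ET.fromstring(xml_text)
    return "\n".join("".join(child.itertext()) for child in root)


print(render_html(SOURCE))
print(render_text(SOURCE))
```

An XSLT stylesheet would express the same mapping declaratively, with one template per semantic element and one stylesheet per target device.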

Other approaches and tradeoffs

NPR’s approach has a lot of benefits. Depending on your business and situation, though, it might mean a lot of database processing, which could be expensive to scale in either licenses or capacity.  Caching helps a lot in this case, since once content has been rendered for a device there’s no need to do it again.

You could also solve this problem by writing the original content in very simple HTML or XML (in whatever schema one desires) and then using something like XSLT to transform the content for each target device. This solution might be CPU-intensive but might compare favorably with database operations, depending on what you are doing. Plus, XSLT processing can be easily scaled across thousands of parallel nodes if necessary without buying any more database licenses.

If development resources and cycles are the constraint, a dedicated policy layer can help.  With our Sonoa ServiceNet technology, for example, you could configure transformation policies that leverage XPath or XSLT from within our proxy.  This might also make it easier to add and validate third-party APIs or feeds from outside your own database.  You can also handle other types of mediation, such as versioning or protocol transformations, if your use case calls for it, as some of our Sonoa media and consumer web services customers do.

 

 

The API is more than the new CLI »

This article by Lori MacVittie of F5 makes the good point that whoever provides the de facto API for cloud infrastructure might win, and goes as far as to say that the API replaces the CLI.

I generally agree, but I might take it a step further.

Just as we drove a 'de facto' standard CLI at Cisco, de facto standard "infrastructure APIs" will likely emerge. (We are already seeing this happen with the AWS API.)

But APIs represent a significant evolution.  Why?  CLI commands and output are unstructured.  API commands and output are structured. 

I can point to an experience I had at Cisco. There is a Cisco CLI command called 'show BGP summary' that gives the status of BGP peers, which is a good window into the status of the complete routing infrastructure.  In one of the releases, we changed the display a little bit and all hell broke loose, with a ton of P1s to fix.

It turns out nearly all the operators ran the command output through scripts, parsed it, and used the results in ops. The small formatting change we introduced broke their operations. We were forced to roll back the change.

With APIs, the output is structured and it would have been possible to introduce additional information without breaking integrations.  
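
The difference is easy to demonstrate. In the sketch below, the sample CLI text and JSON payloads are invented for illustration and are not real Cisco output or a real API. A script that reads the peer state by column position breaks when a column is added, while a consumer that reads a structured record by field name is unaffected by the extra field.

```python
import json

# Hypothetical "before" and "after" screens: release 2 adds an Up/Down column.
OLD_CLI = (
    "Neighbor    AS     State\n"
    "10.0.0.1    65001  Established\n"
    "10.0.0.2    65002  Idle\n"
)
NEW_CLI = (
    "Neighbor    AS     Up/Down   State\n"
    "10.0.0.1    65001  3d04h     Established\n"
    "10.0.0.2    65002  never     Idle\n"
)

# The same change expressed as structured output: one new named field.
OLD_API = json.dumps([
    {"neighbor": "10.0.0.1", "as": 65001, "state": "Established"},
    {"neighbor": "10.0.0.2", "as": 65002, "state": "Idle"},
])
NEW_API = json.dumps([
    {"neighbor": "10.0.0.1", "as": 65001, "uptime": "3d04h", "state": "Established"},
    {"neighbor": "10.0.0.2", "as": 65002, "uptime": "never", "state": "Idle"},
])


def states_from_cli(output):
    """Fragile: assumes the state is always the third whitespace-separated column."""
    rows = output.strip().splitlines()[1:]  # skip the header row
    return {row.split()[0]: row.split()[2] for row in rows}


def states_from_api(payload):
    """Tolerant of added fields: looks values up by name, not by position."""
    return {peer["neighbor"]: peer["state"] for peer in json.loads(payload)}


print(states_from_cli(OLD_CLI))  # peer states, as the script's author intended
print(states_from_cli(NEW_CLI))  # silently reports uptimes as states
print(states_from_api(NEW_API))  # still reports the states
```

Note that the positional parser does not even fail loudly after the change; it returns wrong data, which is the kind of breakage that is hardest to catch in operations.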

Infrastructure APIs will take over some of what used to be integration with the CLI, and beyond that, a properly managed API makes evolution, migration, and co-existence of multiple versions easy.