API Best Practices Blog
Indecent Exposure: API Data & Threat Protection
None of us ever wants to see customer data accidentally exposed through our APIs, especially the way Paris experienced last year (an API auth issue).
Last time in our series we talked about identity, authentication, and authorization. But another important aspect of API management is data protection - specifically:
> Encryption - protecting your API data from eavesdropping.
> Threat detection - ensuring client API requests won't cause back-end problems.
> Data masking - preventing inadvertent data exposure in API responses.
First, protecting sensitive data requires knowing what and where it is, and how sensitive it is to you and your customers. There are a lot of regulations and best practices for Social Security numbers, credit card numbers, home addresses, birthdays, and such (more on compliance in another post - data protection is just one aspect). Once you know which data is sensitive, you can think about how to best protect it in your API.
Encryption
Encryption is a basic technology that must be part of any API that handles sensitive data. Relevant technologies include SSL (whose current versions are called TLS) and XML Encryption. For most APIs, the most critical encryption mechanism is SSL. It works on every platform, and the overhead is minimal (less than 5% through our own products in benchmark tests). Using SSL to encrypt any sensitive data is the least your API should do.
The alternative is to encrypt all or part of each message using XML Encryption, a W3C standard implemented by many SOA products. However, this requires you and your clients to manage public/private key pairs, so deployment is more complex and the performance impact is larger. XML Encryption is most useful when sensitive data must stay protected behind the API: for instance, when API data must not only be transmitted securely over the Internet but also stored in encrypted form on disk or in a database in internal systems. Otherwise, stick with SSL.
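For the SSL case, here is a minimal sketch of what "the least your API should do" can look like from the client side, using Python's standard ssl module. The function name make_client_context and the choice of TLS 1.2 as a floor are assumptions for illustration, not something our products prescribe.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context that verifies the server certificate.

    create_default_context() already turns on certificate and hostname
    verification; we only raise the protocol floor (an assumed policy).
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context like this can be passed to urllib.request.urlopen(url, context=...) so that every call to the API is both encrypted and checked against the server's certificate.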
Threat Detection
Any server that receives data over the Internet is subject to attack. Some attacks are more specific to an API and merit additional consideration.
The first is SQL injection. This attack takes advantage of internal systems that construct database queries using string concatenation. If there’s a way to take data from the client and paste it inside a database query, then there may be a way to compromise the system using SQL injection. The best way to prevent SQL injection is to code the back-end systems so that an injection attempt cannot succeed, typically by using parameterized queries instead of concatenation. But it’s also important to stop SQL injection attempts before they get to the back end.
The second is an XML attack: constructing a document that exploits the flexibility of XML to cause problems for a back-end system. Examples include a document that makes the XML parser try to allocate more memory than is available, a document nested many levels deep, or one with extremely large entity names or comments.
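To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module. The users table, the helper names, and the in-memory database are all hypothetical; the point is only to contrast string concatenation with a parameterized query.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two users (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: client data is pasted straight into the SQL text."""
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    """Safer: the value is bound as a parameter and never parsed as SQL."""
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Calling find_user_unsafe with the input x' OR '1'='1 returns every row in the table, because the quote characters change the structure of the query. The same input passed to find_user_safe is treated as an odd-looking name and matches nothing.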
A simple check to ensure that XML or JSON is well-formed can save back-end resources that would otherwise be devoted to generating and logging error messages. Malformed input isn’t always an intentional attack. Ever used an API like StAX to construct an XML document, but forgotten to add all the “end tags”? An invalid XML document that appears to be nested very deep can cause problems for the back-end servers, or at least tie up more CPU and memory. Fortunately, there are many products, including our Apigee Enterprise, that contain pre-defined policy templates to defend against these types of attacks.
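If you want to see roughly what such a check involves, the sketch below uses Python's standard xml.parsers.expat module to reject XML that is oversized, not well-formed, too deeply nested, or that carries a DTD (a common vehicle for entity-based attacks). The function name, the exception class, and the limits are assumptions chosen for illustration; this is not how any particular product implements its policies.

```python
import xml.parsers.expat


class XmlRejected(Exception):
    """Raised when a payload fails one of the edge checks (hypothetical)."""


def check_xml_payload(payload: bytes, max_bytes: int = 1_000_000,
                      max_depth: int = 32) -> None:
    """Return quietly if the payload passes; raise XmlRejected otherwise."""
    if len(payload) > max_bytes:
        raise XmlRejected("payload too large")

    parser = xml.parsers.expat.ParserCreate()
    depth = 0

    def on_start(name, attrs):
        # Track nesting depth as elements open.
        nonlocal depth
        depth += 1
        if depth > max_depth:
            raise XmlRejected("nesting too deep")

    def on_end(name):
        nonlocal depth
        depth -= 1

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Refuse any document type declaration outright.
        raise XmlRejected("DTD not allowed")

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.StartDoctypeDeclHandler = on_doctype

    try:
        parser.Parse(payload, True)  # True marks the final chunk of input
    except xml.parsers.expat.ExpatError as exc:
        raise XmlRejected(f"not well-formed: {exc}") from exc
```

A document with a forgotten end tag, such as `<a><b></a>`, is rejected as not well-formed before the back end ever sees it. The depth and size limits are arbitrary here and would need tuning to the payloads your API legitimately accepts.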
Data Masking
In some cases, it may make sense to re-use internal services and data. However, you might need to screen (or mask) some private data for the API or for some users of the API. This means using an XML or JSON transformation to either remove or obfuscate certain data in an API response. This technique must be used with care, because there may be cases in which only certain API users are authorized to see certain information. For example, an API call might return a “user” record with all the details when the user calls the API themselves, but only limited data when a customer service rep (CSR) accesses the “user” record through the API.
In that case, you could build only one version of the API on a back-end server and add a data transformation rule that removes the user’s home address if the request comes from a CSR’s account. If you have many services, consider a common layer that performs these types of transformations, especially if you ever want to add or manipulate payload fields as well as mask or clip them (more on that in a section on API mediation).
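As a rough sketch of such a transformation rule, the Python below filters a “user” record by the caller's role. The role names, field names, and the VISIBLE_FIELDS table are hypothetical. Note one deliberate difference from the rule described above: instead of listing fields to remove, it lists fields each role may see, so an unknown role or a newly added back-end field is hidden by default.

```python
from typing import Any, Dict

# Hypothetical policy: which fields each kind of caller may see.
VISIBLE_FIELDS = {
    "owner": {"id", "name", "email", "home_address"},
    "csr": {"id", "name", "email"},
}


def mask_user_record(record: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return a copy of the record containing only the fields the role may see.

    Roles missing from VISIBLE_FIELDS get an empty record (deny by default).
    """
    allowed = VISIBLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}
```

With a record that includes a home_address field, a call with role "owner" returns it intact, while a call with role "csr" returns the same record without the address. The back-end service itself stays unchanged.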
Recommendations
- Use SSL when the API includes sensitive data, or if the authentication mechanism in use does not include an encryption component. (HTTP basic authentication, for instance, lets an eavesdropper intercept the password unless SSL is used; OAuth 1.0 request signing does not send the secret over the wire, though OAuth 2.0 bearer tokens do need SSL.)
- Always defend against SQL injection, either in the back end server, at the edge of the network, or both.
- If your API accepts XML input via HTTP POST or some other way, then defend against the many types of XML attacks. These include large inputs, payloads or attachments, ‘header bombs’, replay attacks, message tampering and more.
- Consider using data masking in a common transformation layer if your back end servers may return some data that should not be given out to all users of the API.
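To illustrate the first recommendation, the sketch below shows why HTTP basic authentication needs SSL: the Authorization header is only base64-encoded, not encrypted, so anyone who can read the request can recover the password. The helper names and the sample credentials are made up for the example.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client would send."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header: str) -> Tuple[str, str]:
    """What an eavesdropper can do with an intercepted header: just decode it."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```

Decoding requires no key at all, which is why the header is safe only inside an encrypted connection.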
Up next: Compliance (and thanks to carbonnyc for the photo)




