Data loss prevention isn’t a new idea, but it’s a concept that’s become increasingly important to IT as organizations recognize the threat to their operations from leaks by disgruntled insiders or intrusions by hostile outsiders. At least, that’s the pitch DLP (data loss prevention) vendors use.
But in most cases, notes Securosis analyst and CEO Rich Mogull, organizations that deploy DLP find data leaks are more likely to be caused by accident or bad procedures than by malice. As he explained in an interview with eWEEK, when the causes of leaks are explored, the “whoops” factor surfaces repeatedly. Someone transmits unencrypted medical data in violation of HIPAA (Health Insurance Portability and Accountability Act) rules, or a file containing credit card numbers is moved into an unsecured area. This sort of thing, when discovered during an audit, can be a career-killer; if it becomes a news story, it damages the reputation of the business itself.
Of course, that doesn’t mean that companies not subject to regulatory regimes such as HIPAA or the Sarbanes-Oxley Act can simply pass on implementing DLP. As Mogull explained, the risk of data loss isn’t always visible: When data is stolen, or merely mishandled, “you don’t even have the base monitoring to know about the problem.”
How can an organization introduce DLP in an effective manner, when the potential for leakage or loss is so pervasive? Let’s start with a conceptual discussion before moving on to specifics.
One can begin thinking about DLP by treating data as being in one of three states: data in motion, data at rest and data in use. But there’s a danger in focusing on only one of these aspects, because methods that work exceedingly well on, say, the network (data in motion) may be of little or no use against threats that seek to obtain data in use at an endpoint. A sound DLP strategy weighs all three states against the needs of the organization, whether those needs are regulatory, operational or cultural. The challenge, for both IT managers and security specialists, is that no single product adequately addresses all three categories.
Or, looking at DLP from another perspective, one can consider it from the standpoint of threat vectors. In this view, the three legs of the tripod are email, the Web and the endpoint. Protecting against the first two is fairly well understood and easily implemented. The third is a little more complicated: Removable media can be blocked or screened, but the near-ubiquity of phone-based cameras makes it possible to record on-screen data, albeit in a clumsy and terribly obvious fashion.
The first step in implementing a DLP strategy is data identification. Although it may be easy to specify the general nature of the data to be protected, such as financial records, customer information or product plans, it’s not always that simple to assign a risk value to an individual document. Mogull points out that one has to “understand what to protect.”
Context + Content
Perhaps it’s best to regard the context of data and its content as two sides of the same coin. Context can take the form of file metadata, email headers or the application that’s consuming the data. In more complex forms of context analysis, a DLP process might look at file formats or network protocols, or use network information from a DHCP server and a directory service to identify who’s consuming the data. This can be expanded to take specific Web services or network destinations into account, or to identify individual storage devices such as a USB drive.
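To make that concrete, here is a minimal sketch, in Python, of what gathering context signals for a single file might look like. The function name gather_context and the particular signals collected are illustrative, not drawn from any specific DLP product.

```python
import mimetypes
import os
import pwd  # Unix-only; used here to resolve the file's owner

def gather_context(path):
    """Collect simple context signals for a file: who owns it,
    how big it is, when it last changed and what format it appears to be."""
    st = os.stat(path)
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "owner": pwd.getpwuid(st.st_uid).pw_name,
        "size_bytes": st.st_size,
        "modified": st.st_mtime,
        "guessed_format": mime_type or "unknown",
        "extension": os.path.splitext(path)[1].lower(),
    }
```

A real DLP agent would fold in far richer context, such as directory-service lookups on the owner or the network destination of a transfer, but the principle is the same: these signals come from around the data, not from inside it.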
Content, as one might expect, is pretty self-explanatory: Being aware of the contents of data can often give a good indication of what kind of protection needs to be applied. Analyzing content is where things get tricky, because one has to start with the context of the data and then examine the contents. This might take a rules-based approach using regular expressions, file matching, database fingerprinting or statistical analysis. This “content awareness,” as Securosis’ Mogull puts it, is what defines true DLP.
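As an illustration of the rules-based approach, the sketch below pairs a regular expression for card-like digit runs with a Luhn checksum to weed out random numbers that merely look like card numbers. The patterns and rule names are examples, not a production rule set.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")   # card-like digit runs
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")     # U.S. Social Security format

def luhn_ok(number):
    """Luhn checksum: filters out most digit runs that only
    superficially resemble credit card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    odd = digits[-1::-2]
    even = [sum(divmod(d * 2, 10)) for d in digits[-2::-2]]
    return (sum(odd) + sum(even)) % 10 == 0

def scan_content(text):
    """Return the names of the rules that matched the text."""
    hits = []
    if any(luhn_ok(m.group()) for m in CARD_RE.finditer(text)):
        hits.append("credit-card-number")
    if SSN_RE.search(text):
        hits.append("us-ssn")
    return hits
```

Statistical analysis and database fingerprinting go well beyond this, but even a small rule set like this one demonstrates why content analysis is the heart of true DLP.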
Further stages of DLP implementation are where things can become complicated. For example, addressing DLP in email seems relatively straightforward because of the nature of the medium. Many products that address other email security threats also offer some DLP functionality, in a fashion that Mogull refers to as “DLP Light.” And if one chooses a dedicated system that inserts another mail transfer agent to provide a DLP layer, it’s unlikely to be noticed by users. The downside to such a solution is that it may cover external email traffic well but leave internal traffic unprotected.
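As a hedged sketch of the mail-transfer-agent approach, the following uses the third-party aiosmtpd package to stand up a filtering hop in front of the real mail server. The blocking rule shown is a deliberately crude placeholder for a fuller content-analysis pass.

```python
import re
from aiosmtpd.controller import Controller  # pip install aiosmtpd

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # placeholder DLP rule

class DlpHandler:
    """SMTP handler that screens each message before relaying it."""

    async def handle_DATA(self, server, session, envelope):
        body = envelope.content.decode("utf-8", errors="replace")
        if SSN_RE.search(body):
            # Reject at the SMTP level; the sending client sees the error.
            return "550 Message blocked by DLP policy"
        # In production, relay the accepted message to the downstream MTA here.
        return "250 Message accepted for delivery"

# Listen on a local port; mail routing to and from the real
# server is assumed to be configured separately.
controller = Controller(DlpHandler(), hostname="127.0.0.1", port=8025)
controller.start()
```

Note how this hop would typically sit at the network edge, which is exactly why it covers external traffic while internal mail can pass unseen.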
A similar situation can be seen with network-based DLP. DLP products will often work with the existing reverse proxy features of an Internet gateway to inspect SSL-encrypted traffic. A recent report from Palo Alto Networks found that, among sampled organizations in the United States, 20.7 percent of bandwidth consisted of SSL traffic, on port 443 or other ports. The same traffic analysis showed one or more implementations of the Tor onion router running on 15 percent of surveyed networks worldwide. For traffic the gateway cannot decrypt, the most a DLP solution can do is flag it or block it altogether, without actually identifying what the traffic consists of.
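That flag-or-block decision can be expressed as a simple policy function. In this sketch, the field names (decryptable, dst_port) are hypothetical, and the port set is just a pair of common Tor defaults rather than a live relay feed.

```python
# Common Tor relay defaults (ORPort 9001, DirPort 9030); a real
# deployment would consume a continuously updated relay list.
LIKELY_TOR_PORTS = {9001, 9030}

def policy_for(flow):
    """Return the action for one network flow.

    `flow` is a dict with hypothetical keys:
      'decryptable' -- True if the gateway's reverse proxy can
                       terminate and inspect the SSL session
      'dst_port'    -- destination TCP port
    """
    if flow["decryptable"]:
        return "inspect"   # full content analysis is possible
    if flow["dst_port"] in LIKELY_TOR_PORTS:
        return "block"     # probable anonymizing traffic
    return "flag"          # opaque SSL: log it, but contents stay unknown

# Example: an SSL session the proxy cannot terminate is merely flagged.
print(policy_for({"decryptable": False, "dst_port": 443}))  # -> "flag"
```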
In storage, Mogull and Securosis are observing less “DLP Light” but better integration, thanks in part to the ability to tap into databases and document management systems. Because of the nature of these systems, real-time DLP monitoring is often limited to filter-like techniques (categories, patterns and rules), because anything deeper can become an obstacle to system performance.
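A filter-like pass over data at rest might look something like the sketch below, which matches on category (path) and pattern (extension) before deferring to any deeper, slower content inspection. The directory names and extensions are invented for illustration.

```python
import os

SENSITIVE_DIRS = ("finance", "hr")            # category rules (illustrative)
RISKY_EXTENSIONS = {".csv", ".xlsx", ".db"}   # pattern rules (illustrative)

def scan_tree(root):
    """Cheap, filter-like pass over a directory tree: flag files that
    match a category and a pattern, leaving deep content inspection
    for a later, offline pass so this scan stays fast."""
    findings = []
    for dirpath, _, filenames in os.walk(root):
        if not any(d in dirpath.lower() for d in SENSITIVE_DIRS):
            continue
        for name in filenames:
            if os.path.splitext(name)[1].lower() in RISKY_EXTENSIONS:
                findings.append(os.path.join(dirpath, name))
    return findings
```

The design choice mirrors the performance constraint Mogull describes: cheap checks run in real time, while anything heavier is scheduled around production load.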
Whatever one chooses as part of a DLP strategy, it’s important to make sure it offers a clean user interface and solid reporting tools. Although those may seem like obvious criteria, Mogull observed that DLP tools are sometimes so engineering-driven that their designers forget that the users of the tools, who may not work in IT at all, need a simple and efficient way to address potential problems. After all, there will be occasions when an immediate response is needed, and the interface should be a help rather than a hindrance.
An area of DLP that isn’t often discussed is what to do when data appears to have been leaked. All too often these efforts, which are reactive by their very nature, take on the aspects of a witch hunt, and the hunt can do more damage than the actual data loss through its effect on the morale of the organization and its customers and partners. That’s why it’s important to keep Mogull’s point about intent in mind, or, to paraphrase a common saying, “don’t assume malice when simple carelessness will suffice.”
One thing that should give DLP implementers hope, according to Mogull, is that the market is starting to mature, even as the technology remains ahead of adoption. Arguably, the hardest thing for IT and security managers to cope with today is making room in their budgets for tools that are appropriate for their organizations, whether that’s viewed from a threat perspective or from the available skill sets within the company. Of course, that’s one problem that almost never goes away. At least, until it’s too late.