Reading the fine print for Data Loss Prevention (DLP) in Office 365
Alex Fields
After implementing DLP policies in your organization, you might consider testing them out. Let’s say you implemented Microsoft’s DLP policy for identifying U.S. Social Security Numbers, which are nine-digit numbers usually written as XXX-XX-XXXX (sometimes with dashes, sometimes without). You draft an email containing such a number and hit send. It shows up quickly in the recipient’s mailbox, and apparently DLP has taken no action.
You think to yourself: Why isn’t this working?
I have had a number of customers ask me about this, or they say that their experience with it is “hit or miss”: sometimes it seems to catch the sensitive content and sometimes not.
The usual reason is that there are no “keywords” within 300 characters of the number. For U.S. Social Security numbers, those would be words such as “Social Security” or “SSN.” Microsoft publishes a listing of all the sensitive information types in Exchange 2016 (it also applies to Office 365 Exchange Online), where you can see the detail for all the DLP “criteria” and confidence ratings.
Now, it is possible to adjust the sensitivity from the default minimum confidence level (75%) down to 55%. However, even at this level, a keyword will be necessary in addition to the number pattern.
Even the lowest defined confidence rating of 55% for Social Security Numbers looks like this:
A DLP policy is 55% confident that it’s detected this type of sensitive information if, within a proximity of 300 characters:
- The function Func_randomized_unformatted_ssn finds content that matches the pattern.
- A keyword from
- At least one of the following is true:
  - The function Func_us_date finds a date in the right date format.
  - The function Func_us_address finds an address in the right date format.
  - The function
The “Keyword_ssn” table contains the following keywords:
Example: if you sent a table that contained names and numbers, DLP would only flag the content and take action if the table has a header row that says “Name” and “SSN” or “Social Security” or similar, or if the table is given some other kind of description that matches the keyword list.
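To make the proximity idea concrete, here is a minimal Python sketch of "number pattern plus a keyword within 300 characters." It is not Microsoft's implementation: the keyword tuple, the regular expression, and the function name `find_flagged_numbers` are all hypothetical stand-ins, and the real rule also weighs dates, addresses, and other evidence.

```python
import re

# Hypothetical stand-ins: the real keyword list and pattern functions are
# defined by Microsoft and are not reproduced here.
KEYWORDS = ("social security", "ssn")
NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
PROXIMITY = 300  # characters, per the rule description above


def find_flagged_numbers(text: str) -> list[str]:
    """Return number-like matches that have a keyword within PROXIMITY characters."""
    flagged = []
    for match in NINE_DIGITS.finditer(text):
        # Look at the text surrounding the match, 300 characters each way.
        window_start = max(0, match.start() - PROXIMITY)
        window_end = min(len(text), match.end() + PROXIMITY)
        window = text[window_start:window_end].lower()
        if any(keyword in window for keyword in KEYWORDS):
            flagged.append(match.group())
    return flagged


bare = "Please update the record to 123-45-6789 before Friday."
labelled = "Name: Pat Doe, SSN: 123-45-6789"
print(find_flagged_numbers(bare))      # no keyword nearby, so nothing is flagged
print(find_flagged_numbers(labelled))  # keyword nearby, so the number is flagged
```

The same number is ignored in the first message and caught in the second, which is the “hit or miss” behavior customers describe.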
But why can’t DLP just recognize and take action based on the number pattern only?
Microsoft was very deliberate in how it wrote DLP. Without additional confidence criteria, there could be issues. For example, SSNs and ABA routing numbers are both nine digits long, but you may want a policy that takes different actions for each of these sensitive data types. Further, you could end up with more false positives if the actions were taken based on number strings alone.
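A quick way to see the problem is to run a pattern-only check over a few unrelated messages. The sketch below is purely illustrative: the sample values are made up, and `pattern_only_hits` is a hypothetical helper, not anything DLP exposes.

```python
import re

# Pattern-only detection: any standalone run of exactly nine digits.
NINE_DIGITS = re.compile(r"\b\d{9}\b")


def pattern_only_hits(text: str) -> list[str]:
    """Return every nine-digit run, with no keyword or context check."""
    return NINE_DIGITS.findall(text)


# Made-up sample messages; none of the numbers are real identifiers.
samples = {
    "SSN-like value": "Number on file: 123456789",
    "routing-number-like value": "Wire to routing number 987654321",
    "order reference": "Your order 555000111 has shipped",
}

for label, text in samples.items():
    print(f"{label}: {pattern_only_hits(text)}")
```

All three messages produce a hit, and nothing in the match itself says which kind of number was found. That is the gap the keyword and proximity criteria are meant to close.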
Writing your own rules: How to automatically encrypt sensitive data patterns
There are several ways you could approach this, if you wanted to take your chances with false positives and so on. It is possible to customize the XML files that inform DLP, for example. That sounds fairly complex for most small business admins. Now on the other hand, you can also use regular expressions in mail transport rules to create your own pattern recognition (without keyword criteria).
The example below is a rule that would encrypt messages containing what appear to be nine-digit strings resembling Social Security Numbers (formatted with hyphens as XXX-XX-XXXX). So even without DLP, you could write some of your own criteria and have the rule take a certain action based on those criteria (reject the message, encrypt the message, etc.).
Under Apply this rule if…, choose The subject or body matches…, and then type \d\d\d-\d\d-\d\d\d\d
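Before enabling a rule like this, it can help to try the pattern against a few sample messages. The sketch below uses Python's `re` module as a stand-in for the transport rule's matching, on the assumption that the two behave the same for this simple pattern. The helper name `would_trigger` and the sample messages are hypothetical.

```python
import re

# The pattern typed into the rule above, unchanged.
SSN_LIKE = re.compile(r"\d\d\d-\d\d-\d\d\d\d")


def would_trigger(message: str) -> bool:
    """True if the pattern appears anywhere in the message text."""
    return SSN_LIKE.search(message) is not None


print(would_trigger("My number is 123-45-6789"))  # hyphenated form matches
print(would_trigger("My number is 123456789"))    # no hyphens, so no match
print(would_trigger("Part no. 9123-45-67890"))    # matches inside a longer string
```

The last case shows the kind of false positive to expect: the pattern has no boundaries, so it also fires on part numbers or other identifiers that merely contain the same shape. The second case shows the opposite gap, since unhyphenated numbers pass straight through.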
However, be careful. None of the DLP policies, or indeed any other policies or transport rules that you may “slap together,” are in any way a guarantee of safety, security or compliance. So don’t think that this is going to be “set it and forget it” or that you can claim any kind of compliance simply because you have a DLP or transport rule in place. If someone is determined to share information outside the organization, they will figure out a way. And the technology isn’t capable of recognizing and catching everything, either. So go into this with eyes wide open. It’s a feature you will have to support once it is in place. It will require some user education, and it is far from being a panacea.
You have been warned. Best of luck.