Reading the fine print for Data Loss Prevention (DLP) in Office 365

After implementing DLP policies in your organization, you will probably want to test them. Let’s say you implemented Microsoft’s DLP policy for identifying U.S. Social Security Numbers, which are nine-digit numbers usually written as XXX-XX-XXXX (sometimes with dashes, sometimes without). You draft an email containing such a number and hit send. It shows up quickly in the recipient’s mailbox, and DLP has apparently taken no action.

You think to yourself: Why isn’t this working?

A number of customers have asked me about this, or they tell me their experience is “hit or miss”: sometimes DLP seems to catch the sensitive content and sometimes it doesn’t.

The reason is usually that there are no “keywords” within 300 characters of the number. For U.S. Social Security Numbers, that means words such as “Social Security” or “SSN.” Microsoft publishes a listing of all the sensitive information types in Exchange 2016 (it also applies to Office 365 Exchange Online), where you can see the detail for all the DLP “criteria” and confidence ratings.

It is possible to lower the minimum confidence level from the default (75%) to 55%. However, even at this level, a keyword is required in addition to the number pattern.

Even the lowest defined confidence rating of 55% for Social Security Numbers looks like this:

A DLP policy is 55% confident that it’s detected this type of sensitive information if, within a proximity of 300 characters:

  • The function Func_randomized_unformatted_ssn finds content that matches the pattern.
  • A keyword from Keyword_ssn is found.
  • At least one of the following is true:
    • The function Func_us_date finds a date in the right date format.
    • The function Func_us_address finds an address in the right address format.
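
The 55% rule above can be sketched roughly in Python. This is a simplified illustration, not Microsoft’s implementation: the regular expressions standing in for Func_randomized_unformatted_ssn, Func_us_date and Func_us_address are assumptions invented for this example, and the function name looks_like_ssn_55 is hypothetical. Only the keyword list and the 300-character proximity come from the rule text.

```python
import re

# Keywords from the Keyword_ssn list. Case-insensitive matching is an assumption.
KEYWORDS = ["Social Security", "Social Security#", "Soc Sec",
            "SSN", "SSNS", "SSN#", "SS#", "SSID"]
PROXIMITY = 300  # characters, per the rule text

# Hypothetical stand-ins for Microsoft's internal functions (the real ones are more elaborate).
NINE_DIGITS = re.compile(r"(?<!\d)\d{9}(?!\d)")        # unformatted nine-digit number
US_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")      # e.g. 4/25/1980
US_ADDRESS = re.compile(r"\b\d+ \w+ (Street|St|Avenue|Ave|Road|Rd)\b", re.IGNORECASE)
KEYWORD = re.compile(
    r"(?<![A-Za-z])(?:" + "|".join(re.escape(k) for k in KEYWORDS) + ")",
    re.IGNORECASE,
)


def looks_like_ssn_55(text: str) -> bool:
    """True if a nine-digit number has a keyword AND (a date OR an address) within 300 characters."""
    for match in NINE_DIGITS.finditer(text):
        # Look at the text surrounding the candidate number.
        window = text[max(0, match.start() - PROXIMITY): match.end() + PROXIMITY]
        has_keyword = KEYWORD.search(window) is not None
        has_support = (US_DATE.search(window) is not None
                       or US_ADDRESS.search(window) is not None)
        if has_keyword and has_support:
            return True
    return False
```

With this sketch, a bare nine-digit number is ignored, and so is a number with a keyword but no nearby date or address. That is the “hit or miss” behavior described earlier.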

The “Keyword_ssn” table contains the following keywords:

  • Social Security
  • Social Security#
  • Soc Sec
  • SSN
  • SSNS
  • SSN#
  • SS#
  • SSID

For example, if you sent a table of names and numbers, DLP would only flag the content and take action if the table had a header row reading “Name” and “SSN,” “Social Security,” or similar, or if the table carried some other description that matches the keyword list.

But why can’t DLP just recognize and take action based on the number pattern only?

Microsoft was very deliberate in how they wrote DLP. Without additional confidence criteria, there could be issues. For example, SSNs and ABA routing numbers are both nine digits long, but you may want a policy that takes different actions for each of these sensitive data types. Further, you would end up with more false positives if actions were taken based on number strings alone.

Writing your own rules: How to automatically encrypt sensitive data patterns

There are several ways you could approach this, if you are willing to take your chances with false positives. It is possible to customize the XML files that inform DLP, for example, but that is fairly complex for most small business admins. On the other hand, you can also use regular expressions in mail transport rules to create your own pattern recognition (without keyword criteria).

The example below is a rule that would encrypt messages containing what appear to be Social Security Numbers formatted with hyphens (XXX-XX-XXXX). So even without DLP, you can write some of your own criteria and have a rule take a certain action based on them (reject the message, encrypt the message, etc.).

Under Apply this rule if…, choose The subject or body matches… and then type \d\d\d-\d\d-\d\d\d\d
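
Before putting a pattern like this into a rule, it can help to try it against sample text. Below is a minimal sketch in Python. Python’s re module is not the engine Exchange uses, so treat the results as an approximation. The sample strings and the helper name would_trigger are made up for illustration.

```python
import re

# The pattern from the rule above, unchanged.
SSN_PATTERN = re.compile(r"\d\d\d-\d\d-\d\d\d\d")

# Made-up sample messages.
samples = {
    "hyphenated": "My number is 123-45-6789.",
    "no hyphens": "My number is 123456789.",
    "spaces": "My number is 123 45 6789.",
    "part number": "Order part 5551-23-45678 today.",
}


def would_trigger(text: str) -> bool:
    """True if the pattern appears anywhere in the text."""
    return SSN_PATTERN.search(text) is not None


results = {label: would_trigger(text) for label, text in samples.items()}
for label, hit in results.items():
    print(f"{label:12} -> {'match' if hit else 'no match'}")
```

The hyphenated sample matches, but so does the part number, because the pattern is found inside the longer string. The unhyphenated and space-separated samples are missed entirely. That is the trade-off of matching on the number pattern alone.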

However, be careful. Neither the DLP policies nor any other policies or transport rules that you may “slap together” are in any way a guarantee of safety, security or compliance. So don’t think that this is going to be ‘set it and forget it’ or that you can claim any kind of compliance simply because you have a DLP or transport rule in place. If someone is determined to share information out, they will figure out a way. And the technology isn’t capable of recognizing and catching everything, either. So go into this with eyes wide open. It’s a feature you will have to support once it is in place. It will require some user education, and it is far from being a panacea.

You have been warned. Best of luck.

 

Comments (8)

  • Pascal Reply

    Hi Alex,

    After reading many articles about OME and DLP, we have decided to implement both.
    Everything works well, the rules are hit. Mails are encrypted.
    We also use policy tips. But they were not shown to the user, although the rule is hit by OME and DLP.
    I found several documents about this issue, but none of them helped me.
    The file with the information about the policy tips is placed in the user profile folder, but it seems that the client doesn’t get the signal from the 365 or on-premises Exchange 2016 server (we have a hybrid configuration) to show the policy tip.

    From reading your articles, I think you know a lot about OME and DLP. Do you have an idea of how I can find the issue?

    Thanks for your reaction

    April 25, 2018 at 1:35 pm
    • Alex Reply

      Sorry for the delay in my response here–I know that policy tips aren’t going to work on older versions of the Outlook client. Assuming you’ve set them up in the wizards online, and they are activated, then any Outlook 2016 client that is up to date, should show the policy tips.

      May 18, 2018 at 2:09 pm
  • John Kane Reply

    We got DLP to work well. Your comments re keywords are absolutely spot on, but we learned that the hard way. However, what we also learned was that if you configure DLP to prompt you with an advisory message and an override facility to still send the message, this does not work with O365 for Mac. Be warned.

    May 15, 2018 at 2:28 am
  • Craig Reply

    I found another article stating that the expression below may work better: it doesn’t match when there are additional digits before or after the number, and it also handles periods or spaces used in place of dashes.

    January 11, 2021 at 1:13 pm
    • Craig Reply

      ^\d\d\d(\s|\.|-)\d\d(\s|\.|-)\d\d\d\d(\s|$)

      January 11, 2021 at 1:13 pm
      • Not Craig Reply

        You shouldn’t have a start anchor in that regex, or if you do, do it similar to your end anchor. `(^|\s)`

        November 17, 2022 at 1:25 pm
    • Alex Reply

      I find the built-in detection methods have been working pretty well overall, and I believe that holds no matter how the number is split up (or not split up). As long as the algorithm finds any references like SSN, social, social security, etc., it will treat the number string as an SSN and take action accordingly.

      January 14, 2021 at 9:06 am
