Split Microsoft Sentinel Tables with Multi-Destination Data Collection Rules

Thank you to my colleagues Maria de Sousa-Valadas Castano, Adi Biran, and the team for assisting in writing this content and the demos.

Looking to better manage where logs go when they are ingested? Enter the multi-destination data collection rule.

Recently, the team released new data collection rule functionality that allows a data ingestion stream to be split across more than one table. This builds on existing functionality and opens up use cases such as:

  • Table volume management: Break up larger tables that mix essential and non-essential security data.
  • Basic log configuration: Break out subsets of data to be configured for basic log usage at a lower cost.
  • Data normalization: Break out subsets of data to be normalized into dedicated tables.
  • Query performance improvement: With data split out of the main table, less data resides there, so queries perform better for day-to-day SOC operations.
  • and more.

This is a great option for SOC teams and organizations looking to separate high-value security data from general information that may exist in the same ingestion stream. It also provides additional flexibility for log management, as data split out from the main table can be placed into a custom table configured for lower-cost ingestion with the basic log tier.

Using with Basic Logs 

Once data has been split off to a custom table, that custom table is eligible to be moved to the basic log tier. This is enabled via the documented process. As a reminder, the basic log tier allows logs to be ingested at $1 per GB vs. the full price of the Analytics log tier. The logs remain available for on-demand querying for 8 days before the data moves to archive (if configured).
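As a sketch (not the only way to do this), the plan switch can also be made with the Azure CLI. The resource group, workspace, and table names below are placeholders:

```shell
# Sketch: move a custom table to the Basic log tier with the Azure CLI.
# All resource names below are placeholders for illustration.
az monitor log-analytics workspace table update \
  --resource-group my-resource-group \
  --workspace-name my-workspace \
  --name TransformedSyslog_CL \
  --plan Basic
```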

A popular example is modifying a Syslog ingestion stream that contains high-volume, low-value data so that part of it goes to a custom table. For example:

  • We are collecting log levels warning, error, critical, alert, and emergency from the local4 facility, and then applying one or more transformations.
  • We are filtering logs based on the SyslogMessage field of the Syslog table and using these criteria to send the matching logs to a custom log table we have converted to basic logs. This way, we still keep those logs in our workspace for compliance reasons, while reducing our ingestion volume into the Analytics tier.

Following the process highlighted in the document above, the template is modified to appear as the following:

{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "streams": [
            "Microsoft-Syslog"
          ],
          "facilityNames": [
            "local4"
          ],
          "logLevels": [
            "Warning",
            "Error",
            "Critical",
            "Alert",
            "Emergency"
          ],
          "name": "sysLogsDataSource-1688419672"
        }
      ]
    },
    "destinations": {
      "logAnalytics": [
        {
          "workspaceResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/my-workspace",
          "workspaceId": "532cd4d7-b4eb-41ad-80dc-f2f6435094e8",
          "name": "myworkspace"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": [
          "Microsoft-Syslog"
        ],
        "destinations": [
          "myworkspace"
        ],
        "transformKql": "source | where SyslogMessage !has 'Scheduled restart job'",
        "outputStream": "Microsoft-Syslog"
      },
      {
        "streams": [
          "Microsoft-Syslog"
        ],
        "destinations": [
          "myworkspace"
        ],
        "transformKql": "source | where SyslogMessage has 'Scheduled restart job' | extend RawData = SyslogMessage",
        "outputStream": "Custom-TransformedSyslog_CL"
      }
    ]
  }
}

How It Works

The new feature uses the following existing components within data collection rules:

  1. streams: Name of the data ingestion stream
  2. destinations: Workspace that the data will be sent to
  3. transformKql: KQL used to filter or transform data at ingestion time
  4. outputStream: Destination table that the data will be sent to post-transformation

The new functionality allows users to configure data collection rules that use transformKql as the logic for selecting the incoming data and outputStream to send that data to a different table. A simplified example looks like:

"dataFlows": [
  {
    "streams": [
      "Custom-MyTableRawData"
    ],
    "destinations": [
      "clv2ws1"
    ],
    "transformKql": "source | project TimeGenerated = Time, Computer, SyslogMessage = AdditionalContext",
    "outputStream": "Microsoft-Syslog"
  },
  {
    "streams": [
      "Custom-MyTableRawData"
    ],
    "destinations": [
      "clv2ws1"
    ],
    "transformKql": "source | extend jsonContext = parse_json(AdditionalContext) | project TimeGenerated = Time, Computer, AdditionalContext = jsonContext, ExtendedColumn=tostring(jsonContext.CounterName)",
    "outputStream": "Custom-MyTable_CL"
  }
]

In this example, the first data flow ingests the raw stream (Custom-MyTableRawData) into the Syslog table as normal (outputStream Microsoft-Syslog). The second data flow's transformKql looks for specific context within the data to determine which logs, in what shape, are sent to the custom table (Custom-MyTable_CL). Using this pattern, it is possible to ingest data into a main table while breaking off data into other specified tables. Syslog is just one of many tables that can benefit from this functionality.
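Once a split like this is flowing, one way to sanity-check it is to count recent rows in each destination table. A sketch using the Azure CLI; the workspace GUID and table names are illustrative values from the examples in this post:

```shell
# Count the last hour of rows landing in the main table vs. the custom table.
# Workspace GUID and table names below are illustrative placeholders.
az monitor log-analytics query \
  --workspace "532cd4d7-b4eb-41ad-80dc-f2f6435094e8" \
  --analytics-query "Syslog | where TimeGenerated > ago(1h) | count"

az monitor log-analytics query \
  --workspace "532cd4d7-b4eb-41ad-80dc-f2f6435094e8" \
  --analytics-query "MyTable_CL | where TimeGenerated > ago(1h) | count"
```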

Building It Out

Existing DCR

If modifying an existing DCR:

  1. Go to the Azure Portal.
  2. Go to Monitor.
  3. Go to Data Collection Rules.
  4. Find the existing DCR to modify.
  5. In the left menu, go to ‘Export Template'.


  6. Click the ‘Deploy' button.


  7. Click ‘Edit Template'.

  8. Within the body of the JSON, make the changes to split the table.

Note: This is just an example, not a recommendation to move these security events out of the native table.

  9. Once done, click ‘Done'.

  10. Make sure the required information is correct.

  11. Click ‘Review + create'.

  12. Once validation has passed, click ‘Create'.

New DCR

If creating a new DCR:

  1. Go to the Azure Portal.
  2. If building from scratch, follow the documented steps here.
  3. If using an existing template, either take an existing template (via the process above) or reference a documented template.
    1. Go to the Azure Portal.
    2. In the search bar within the portal, enter ‘deploy a custom template'. 
    3. Choose ‘build your own template'.
    4. Paste the template from step 3 into the editor.
    5. Fill out the key information in the template: name, location, dataCollectionEndpointId, stream details, workspaceResourceId, and the dataflow section.
    6. Once everything is ready, click ‘done'.
    7. Confirm that the required data parameters are correct. If so, click ‘review and create'.
    8. Once validation has passed, click ‘create'.
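If the portal wizard isn't preferred, the same template can also be deployed from the command line. A minimal sketch, assuming the template has been saved locally under the hypothetical name dcr-template.json:

```shell
# Deploy the DCR template into a resource group.
# Resource group and file names are placeholders for illustration.
az deployment group create \
  --resource-group my-resource-group \
  --template-file dcr-template.json
```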

Things to Consider

Types of Data Collection Rules

There are three types of data collection rules today:

  • WorkspaceTransform (default) rules
  • AMA based rules
  • Custom log rules

WorkspaceTransform rules, also referred to as default rules, are tied to tables ingesting data that does not come from the Azure Monitor Agent (AMA). If ingesting data via methods not tied to AMA, default DCRs should be used; the instructions for them can be found here. If ingesting data via AMA, DCRs created via the wizard in Azure Monitor should be used. Custom log rules are created when establishing a new table within the workspace; these instructions cover creating one.

Custom Tables

If looking to leverage a custom table as one of the output destinations, the table needs to be created before the table split is deployed. If attempting to split a table and send data to a custom table that does not exist, the DCR will generate an error upon deployment.
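A minimal sketch of creating such a table ahead of time with the Azure CLI; the table name and column schema here are assumptions for illustration:

```shell
# Create the custom table before deploying the split DCR.
# Table name and schema are hypothetical; adjust to your transform's output.
az monitor log-analytics workspace table create \
  --resource-group my-resource-group \
  --workspace-name my-workspace \
  --name TransformedSyslog_CL \
  --columns TimeGenerated=datetime RawData=string SyslogMessage=string
```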

Excluded Tables

You can use most streams as an input or output, but bear in mind that the following ones are not supported:

  • Microsoft-Heartbeat
  • Microsoft-Usage
  • Microsoft-OperationJson
  • Microsoft-OperationLog
  • Microsoft-AzureActivityV2

And that's it. This scenario is just another example of the expanding use case library for AMA and DCRs in combination with Microsoft Sentinel. May this assist in breaking down larger tables and improving cost management and query performance.

 

This article was originally published by Microsoft's Sentinel Blog. You can find the original article here.