If you've ever worked with CloudWatch Logs, you'll know it's a powerful tool with one huge caveat: getting the logs back out is nearly impossible. For my current client, we were tasked with exporting and translating CloudWatch Logs (in this case, VPC Flow Logs) into something consumable by a third-party security monitoring platform.
The platform had a format it was capable of ingesting, and we knew we could hook it up to S3 to pull down the logs. What we didn't know was how we were going to get the logs into a bucket, in the required format, in as close to real time as possible.
The tools being leveraged in this little Rube Goldberg inspired machine are:
- CloudWatch Logs (fed by VPC Flow Logs)
- AWS Lambda
- Kinesis Firehose
- S3
The process for setting this all up is fairly straightforward. I'll summarize it here (a scripted equivalent follows the list):
- Set up Flow Logs on the VPC of your choice.
- Create a Kinesis Firehose stream with a corresponding S3 bucket to write to. You can leave the default buffer settings for now (5 MB/5 min).
- Create a Lambda function (you can drop in the script below), modifying the Kinesis Firehose stream name.
- From the CloudWatch Logs page, create a subscription filter by selecting the Log Group, choosing Actions, and selecting "Start Streaming to Lambda Service".
- For the log format, select "Amazon VPC Flow Logs". You'll see a filter pattern displayed in the text box below. By default, this filter sends all valid traffic to Lambda. You can modify it to send only certain requests if needed.
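If you'd rather script those steps than click through the console, here's a rough boto3 equivalent. All of the names and ARNs are placeholders, and the filter pattern is my approximation of the one the console generates for "Amazon VPC Flow Logs":

```python
import boto3

ec2 = boto3.client('ec2')
firehose = boto3.client('firehose')
logs = boto3.client('logs')

# Enable Flow Logs on a VPC, delivered to a CloudWatch Logs log group.
ec2.create_flow_logs(
    ResourceIds=['vpc-12345678'],
    ResourceType='VPC',
    TrafficType='ALL',
    LogGroupName='flowlogs',
    DeliverLogsPermissionArn='arn:aws:iam::123456789012:role/flowlogs-role',
)

# Create the Firehose stream with the default 5 MB / 5 min buffer.
firehose.create_delivery_stream(
    DeliveryStreamName='flowlog-stream',
    S3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-role',
        'BucketARN': 'arn:aws:s3:::my-flowlog-bucket',
        'BufferingHints': {'SizeInMBs': 5, 'IntervalInSeconds': 300},
    },
)

# Subscribe the Lambda function to the log group. Note that CloudWatch
# Logs also needs permission to invoke the function; the console grants
# this automatically, or you can use lambda.add_permission() yourself.
logs.put_subscription_filter(
    logGroupName='flowlogs',
    filterName='flowlogs-to-lambda',
    filterPattern='[version, account_id, interface_id, srcaddr, dstaddr, '
                  'srcport, dstport, protocol, packets, bytes, start, end, '
                  'action, log_status]',
    destinationArn='arn:aws:lambda:us-east-1:123456789012:function:flowlog-fn',
)
```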
If your IAM permissions are set up properly, logs should start flowing into S3 in near real time. You can modify the output being sent to Kinesis, or perform further manipulation on the data if needed.
This Lambda script should be set as the subscription target for the log group, and must have permission to read CloudWatch Logs and write to the Kinesis Firehose stream.
I generally prefer using Python in Lambda, thanks to AWS's well-documented boto3 library.
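Here's a minimal sketch of what that function might look like. The stream name is a placeholder, and the pass-through of each message is where you'd do any reformatting your destination needs:

```python
import base64
import gzip
import json

import boto3

# Assumption: the delivery stream is named "flowlog-stream"; substitute
# the name of the Firehose stream you created earlier.
FIREHOSE_STREAM = 'flowlog-stream'

firehose = boto3.client('firehose')


def lambda_handler(event, context):
    # CloudWatch Logs delivers its payload base64-encoded AND gzipped
    # (see the observations below).
    payload = base64.b64decode(event['awslogs']['data'])
    log_data = json.loads(gzip.decompress(payload))

    records = []
    for log_event in log_data['logEvents']:
        # Each message is one space-delimited VPC Flow Log entry; reshape
        # or filter it here before forwarding if needed.
        records.append({'Data': (log_event['message'] + '\n').encode('utf-8')})

    # put_record_batch() caps out at 500 records per call.
    for i in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=FIREHOSE_STREAM,
            Records=records[i:i + 500],
        )
```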
A few observations about this whole process:
- When sending CloudWatch Logs to Lambda, the data is not only base64-encoded but also gzipped. You'll need to take this into account when working with the data.
- Kinesis may not be strictly necessary, but it's relatively inexpensive, lets me zip on the fly, and ensures the data actually makes it to S3.
- When sending bulk records to Kinesis using client.put_record_batch(), you're limited to 500 records per batch (see the batching sketch below).
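To make that last point concrete, here's a hypothetical helper that chunks records to stay under the cap. The retry logic is my own addition, since put_record_batch() can partially fail even when the call itself succeeds:

```python
def send_in_batches(firehose, stream_name, records, batch_size=500):
    """Push records to Firehose without exceeding the 500-record cap."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=batch,
        )
        # put_record_batch() is not all-or-nothing: individual records
        # can be rejected. Collect the failures and retry them once.
        if response['FailedPutCount'] > 0:
            failed = [
                batch[j]
                for j, result in enumerate(response['RequestResponses'])
                if 'ErrorCode' in result
            ]
            firehose.put_record_batch(
                DeliveryStreamName=stream_name,
                Records=failed,
            )
```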
That about wraps it up. While the workflow above focuses primarily on Flow Logs, it's applicable to any type of log being sent to CloudWatch Logs. It's also pretty powerful when you consider its price (cheap) and on-demand nature. Having the ability to translate data in stream isn't a new concept, but being able to perform a complex workflow such as this with relatively little code is a pretty big deal, in my opinion.
Straight to S3
If you'd prefer to send the data straight to S3, without worrying about the whole Kinesis piece, you can do that as well. I will note that doing so creates a ton of small files, and may not be the best option for organizations with many EC2 instances.
On the plus side, this does give you more granular control over the folder structure and file extensions.
The example below was used to export CloudWatch Flow Logs to ArcSight.
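Here's a sketch of that approach, assuming ArcSight is picking up CEF-formatted text files from the bucket. The bucket name, CEF mapping, and key layout are illustrative assumptions, not the exact mapping we used:

```python
import base64
import gzip
import json
import time

import boto3

# Illustrative placeholder; match this to your ArcSight connector's setup.
BUCKET = 'my-flowlog-bucket'

s3 = boto3.client('s3')

# Default VPC Flow Log (version 2) fields, in order.
FIELDS = ('version', 'account_id', 'interface_id', 'srcaddr', 'dstaddr',
          'srcport', 'dstport', 'protocol', 'packets', 'bytes',
          'start', 'end', 'action', 'log_status')


def to_cef(message):
    """Translate one space-delimited flow log line into a CEF event."""
    f = dict(zip(FIELDS, message.split()))
    extension = ('src={srcaddr} dst={dstaddr} spt={srcport} dpt={dstport} '
                 'proto={protocol} act={action}').format(**f)
    return 'CEF:0|AWS|VPC Flow Logs|2|{0}|Flow Record|3|{1}'.format(
        f['action'], extension)


def lambda_handler(event, context):
    # Same base64 + gzip unwrapping as before.
    payload = base64.b64decode(event['awslogs']['data'])
    log_data = json.loads(gzip.decompress(payload))

    lines = [to_cef(e['message']) for e in log_data['logEvents']]

    # Writing one object per invocation is where the "ton of files" comes
    # from, but it's also what gives you control over the folder structure
    # and extension.
    key = 'flowlogs/{0}/{1}.cef'.format(time.strftime('%Y/%m/%d'),
                                        context.aws_request_id)
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=('\n'.join(lines) + '\n').encode('utf-8'))
```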
And of course, I'm no developer, so if anyone out there has suggestions for making these snippets more effective, please let me know in the comments or on Twitter.