Centralised Logging for Lambda@Edge

This post presents a way to centralise Lambda@Edge logs in one place. As an architect and troubleshooter of a serverless application using Lambda@Edge, I find it difficult to access the logs produced by Lambda@Edge when users experience issues. That's because Lambda@Edge uses CloudFront edge locations to distribute your Lambda functions across the whole AWS world, and when a user accesses your CloudFront content, the logs appear in the region closest to that edge location, not in us-east-1 where Lambda@Edge is deployed! The logs can be hard to find, particularly when you have no idea where the affected user made the request from, or when you're using all the edge locations provided by the CloudFront service.

 

 

Methodology


First of all, let me explain how I approach this problem. For this project, I'll stream the CloudWatch logs produced by Lambda@Edge functions into a public Elasticsearch cluster using a Lambda function. The deployment of the services behind the logging is partially automated using Terraform. Unfortunately, a few things have to be done manually, as they're not yet supported by Terraform. However, the point of this post is not to limit the destination to AWS Elasticsearch only; the endpoint for the logs can be anything. You can use an HTTP POST request to send data to a cloud log-analysis service, or even point the endpoint at a Syslog server!

 

 

You can stream CloudWatch logs into Elasticsearch directly, without another Lambda function in between; however, there's a problem! CloudWatch logs can only be streamed into an Elasticsearch cluster located within the same region as the CloudWatch log group itself. Combined with Lambda@Edge creating logs in multiple regions, you'd need an Elasticsearch cluster in every region, ending up with the same problem of decentralised logs. Therefore, one public Elasticsearch cluster is created (in a preferred region) and the CloudWatch logs are proxied into its public endpoint.

 

 

The Project Itself


The project is split between multiple files, each creating or deploying a resource. It looks like this:

 

lambda_edge_centralised_logging
├── backend.tf
├── elasticsearch_cluster.tf
├── lambda_resource
│   └── log_proxy_function.zip
├── log_proxy_lambda_deployment.tf
├── provider.tf
└── vars.tf

 

backend.tf: Creates a backend for this project to back up the state file into an S3 bucket.

elasticsearch_cluster.tf: Deploys an Elasticsearch cluster. Of course, this is optional; what the endpoint will be is entirely up to you. I tried to use available AWS services, though I'm aware that if you run a small application, an AWS Elasticsearch cluster is a huge expense (at least for me personally).

lambda_resource/log_proxy_function.zip: A Lambda function written in Python that proxies CloudWatch logs into the Elasticsearch cluster.

log_proxy_lambda_deployment.tf: Deploys log_proxy_function.zip into all necessary AWS regions.

provider.tf: All Terraform providers used within this project, each corresponding to an AWS region.

vars.tf: Variables used across the project.

 

Log Proxy Lambda Function


The following block of code is the Lambda function required to proxy CloudWatch logs into Elasticsearch. The Python script ships as a Lambda deployment package, since it requires libraries not included in the standard AWS Lambda environment, such as elasticsearch and aws_requests_auth. The code also uses the gzip and base64 packages from the standard library, because CloudWatch delivers its log data gzipped and base64-encoded. The Log Proxy Lambda therefore has to unzip and decode the data before sending it to Elasticsearch.

 

from elasticsearch import Elasticsearch, RequestsHttpConnection
from aws_requests_auth.aws_auth import AWSRequestsAuth
import json
import os
import base64
import gzip


# Function to authenticate to the AWS Elasticsearch service
def authenticate_to_aws():
    aws_auth = AWSRequestsAuth(
        aws_access_key = os.environ['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY'],
        aws_token = os.environ['AWS_SESSION_TOKEN'],
        aws_host = os.environ['ES_HOST'],
        aws_region = os.environ['ES_REGION'],
        aws_service = 'es')
    return aws_auth


# Function to parse and send Cloudwatch logs into Elasticsearch cluster
def lambda_handler(event, context):
    es_host = os.environ['ES_CONNECT_STRING']

    es = Elasticsearch(
        es_host,
        use_ssl = True,
        verify_certs = True,
        scheme = "https",
        connection_class = RequestsHttpConnection,
        http_auth = authenticate_to_aws())
    # Extract data from the Cloudwatch event being sent here
    cw_data = event['awslogs']['data']
    # Decode Cloudwatch event data
    compressed_payload = base64.b64decode(cw_data)
    # Unzip compressed Cloudwatch event data
    uncompressed_payload = gzip.decompress(compressed_payload)
    # JSONify the payload, as we'll need to loop through it
    payload = json.loads(uncompressed_payload)

    # Send data over to Elasticsearch
    # This is in a for loop, because sometimes multiple log files are contained within a single Cloudwatch payload
    for log_event in payload['logEvents']:
        es.index(index = 'edge_application_logs', doc_type = 'lambda_edge', body = log_event)
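To see the unzip-and-decode steps in isolation, here's a short sketch that builds a fake CloudWatch Logs event the way the service delivers it (gzipped, then base64-encoded, under event['awslogs']['data']) and runs it through the same pipeline as the handler. The log group name and messages are hypothetical examples, not real output:

```python
import base64
import gzip
import json

# Build a fake CloudWatch Logs event; the log group name is a made-up example
payload = {
    "logGroup": "/aws/lambda/us-east-1.my-edge-function",
    "logEvents": [
        {"id": "1", "timestamp": 1545730073000, "message": "START RequestId: abc"},
        {"id": "2", "timestamp": 1545730074000, "message": "END RequestId: abc"},
    ],
}
# CloudWatch gzips the JSON payload, then base64-encodes it
encoded = base64.b64encode(gzip.compress(json.dumps(payload).encode("utf-8")))
event = {"awslogs": {"data": encoded}}

# The same steps the handler performs: decode, unzip, JSONify
decoded = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
for log_event in decoded["logEvents"]:
    print(log_event["message"])  # here the real handler calls es.index(...)
```

This also shows why the loop at the end of the handler matters: a single CloudWatch payload can carry several log events at once.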

 

 

Lambda Function Deployment


First of all, create all the necessary Terraform providers.

 

The file will change depending on your user base. For example, if your customers are and will remain located in Asia and Europe, you probably won't need to deploy the Log Proxy Lambda into regions such as us-east-1.

 

 

provider "aws" {
  region  = "eu-west-1"
  profile = "aws_terraform_profile"
  alias   = "ireland"
}

provider "aws" {
  region  = "us-east-1"
  profile = "aws_terraform_profile"
  alias   = "nvirginia"
}

provider "aws" {
  region  = "ap-southeast-2"
  profile = "aws_terraform_profile"
  alias   = "sydney"
}

 

The Terraform file creates the role needed to give the Log Proxy Lambda function the appropriate permissions, and deploys 3 identical Lambda functions to 3 different AWS regions:

 

  • Ireland (handled by aws.ireland provider)
  • Northern Virginia (handled by aws.nvirginia provider)
  • Sydney (handled by aws.sydney provider)

 

 

By default, the aws_lambda_function Terraform resource checks for the function's presence, not for the checksum of the deployment package, so if there are any changes to the Log Proxy Lambda function, Terraform will not upload a new deployment package. Therefore, the source_code_hash argument of the aws_lambda_function resource is used to prevent this behaviour. Of course, you can remove the checksum check if it's not desired in your use case. The deployment also contains a few variables inherited from the deployment of the Elasticsearch cluster.
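For intuition, the base64sha256() interpolation used below is just a SHA-256 digest of the file, base64-encoded. A minimal Python equivalent (the package contents here are placeholder bytes, not a real zip):

```python
import base64
import hashlib

def base64sha256(data: bytes) -> str:
    # Equivalent of Terraform's base64sha256(): SHA-256 digest, base64-encoded
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

# A single changed byte in the deployment package yields a different hash,
# which is what makes Terraform re-upload the package
old_pkg = b"zip bytes v1"
new_pkg = b"zip bytes v2"
print(base64sha256(old_pkg) != base64sha256(new_pkg))  # True
```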

 

data "aws_iam_policy_document" "main_lambda_assume_policy" {
  statement {
    actions = [
      "sts:AssumeRole",
    ]

    principals {
      type = "Service"

      identifiers = [
        "lambda.amazonaws.com",
        "events.amazonaws.com",
        "edgelambda.amazonaws.com",
      ]
    }
  }

  provider = "aws.ireland"
}

resource "aws_iam_role" "default_lambda_role" {
  name               = "log_proxy_lambda_role"
  assume_role_policy = "${data.aws_iam_policy_document.main_lambda_assume_policy.json}"
  provider           = "aws.ireland"
}

resource "aws_iam_role_policy_attachment" "attach_lambda_execution_policy_to_role" {
  role       = "${aws_iam_role.default_lambda_role.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
  provider   = "aws.ireland"
}

resource "aws_lambda_function" "log_proxy_lambda_ireland" {
  filename         = "lambda_resource/log_proxy_function.zip"
  function_name    = "log_proxy"
  role             = "${aws_iam_role.default_lambda_role.arn}"
  handler          = "lambda_function.lambda_handler"
  runtime          = "python3.6"
  source_code_hash = "${base64sha256(file("lambda_resource/log_proxy_function.zip"))}"
  timeout          = 60
  description      = "Proxies application logs to ES cluster"
  provider         = "aws.ireland"

  environment {
    variables = {
      ES_HOST           = "${aws_elasticsearch_domain.elasticsearch_log_cluster.endpoint}"
      ES_REGION         = "eu-west-1"
      ES_CONNECT_STRING = "https://${aws_elasticsearch_domain.elasticsearch_log_cluster.endpoint}:443"
    }
  }
}

resource "aws_lambda_function" "log_proxy_lambda_nvirginia" {
  filename         = "lambda_resource/log_proxy_function.zip"
  function_name    = "log_proxy"
  role             = "${aws_iam_role.default_lambda_role.arn}"
  handler          = "lambda_function.lambda_handler"
  runtime          = "python3.6"
  source_code_hash = "${base64sha256(file("lambda_resource/log_proxy_function.zip"))}"
  timeout          = 60
  description      = "Proxies application logs to ES cluster"
  provider         = "aws.nvirginia"

  environment {
    variables = {
      ES_HOST           = "${aws_elasticsearch_domain.elasticsearch_log_cluster.endpoint}"
      ES_REGION         = "eu-west-1"
      ES_CONNECT_STRING = "https://${aws_elasticsearch_domain.elasticsearch_log_cluster.endpoint}:443"
    }
  }
}

resource "aws_lambda_function" "log_proxy_lambda_sydney" {
  filename         = "lambda_resource/log_proxy_function.zip"
  function_name    = "log_proxy"
  role             = "${aws_iam_role.default_lambda_role.arn}"
  handler          = "lambda_function.lambda_handler"
  runtime          = "python3.6"
  source_code_hash = "${base64sha256(file("lambda_resource/log_proxy_function.zip"))}"
  timeout          = 60
  description      = "Proxies application logs to ES cluster"
  provider         = "aws.sydney"

  environment {
    variables = {
      ES_HOST           = "${aws_elasticsearch_domain.elasticsearch_log_cluster.endpoint}"
      ES_REGION         = "eu-west-1"
      ES_CONNECT_STRING = "https://${aws_elasticsearch_domain.elasticsearch_log_cluster.endpoint}:443"
    }
  }
}

 

 

Notice that the Log Proxy Lambda function also obtains AWS credentials from environment variables that are not provided by the Terraform deployment. This will not produce an error: the variables do exist at runtime, populated from the role attached to the Lambda function itself.

 

 

Cloudwatch Stream


It appears that Terraform does not provide facilities to set up the streaming of CloudWatch logs to Lambda, nor to create the CloudWatch Lambda trigger, both of which are essential. Therefore, manual interaction is required.

 

Go to CloudWatch, find the log group that Lambda@Edge has created, and set up a stream to the Log Proxy Lambda function. Now CloudWatch knows where to send the logs; however, the Lambda does not yet know to react to logs being sent to it, so a Lambda trigger needs to be configured as well.

 

 

 

Lambda Trigger


Go to the Log Proxy Lambda function, and add a CloudWatch Logs trigger with the CloudWatch log group that you want to send to your logging solution.
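If you prefer to script these two manual steps (the subscription filter on the log group, and the permission that lets CloudWatch Logs invoke the Lambda), the sketch below builds the parameters for the boto3 calls logs.put_subscription_filter and lambda.add_permission. The filter name, statement ID, and function name are my own hypothetical choices, and the actual boto3 calls are shown only in comments, since they need real AWS credentials:

```python
def subscription_filter_params(log_group_name: str, lambda_arn: str) -> dict:
    # Parameters for logs.put_subscription_filter
    return {
        "logGroupName": log_group_name,
        "filterName": "log_proxy_stream",  # hypothetical name
        "filterPattern": "",               # empty pattern forwards every log event
        "destinationArn": lambda_arn,
    }

def lambda_permission_params(region: str, account_id: str, log_group_name: str) -> dict:
    # Parameters for lambda.add_permission, so CloudWatch Logs may invoke the function
    return {
        "FunctionName": "log_proxy",
        "StatementId": "allow-cloudwatch-logs",  # hypothetical ID
        "Action": "lambda:InvokeFunction",
        "Principal": "logs.amazonaws.com",
        "SourceArn": f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*",
    }

# Applying them would look like (requires boto3 and AWS credentials):
#   boto3.client("lambda", region_name=region).add_permission(
#       **lambda_permission_params(region, account_id, log_group))
#   boto3.client("logs", region_name=region).put_subscription_filter(
#       **subscription_filter_params(log_group, lambda_arn))
```

Note the permission has to be in place before the subscription filter is created, otherwise CloudWatch Logs cannot validate the destination.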

 

 

Elasticsearch Cluster


Just to show you how the Elasticsearch cluster was created, here's the Terraform file. The cluster uses Cognito authentication; since it's open to the public, authentication is a must!

data "aws_iam_policy_document" "elasticsearch_log_cluster_access_policy" {
  statement {
    actions = [
      "es:*",
    ]

    resources = [
      "arn:aws:es:eu-west-1:619306035195:domain/${var.elasticsearch_domain_name}/*",
    ]

    principals {
      type = "AWS"

      identifiers = [
        "${aws_iam_role.default_lambda_role.arn}",
        "${aws_iam_role.cognito_authenticated_role.arn}",
      ]
    }
  }

  provider = "aws.ireland"
}

# generate assume roles and policies for cognito
data "aws_iam_policy_document" "cognito_assume_policy" {
  statement {
    actions = [
      "sts:AssumeRole",
    ]

    principals {
      type = "Service"

      identifiers = [
        "es.amazonaws.com",
      ]
    }
  }

  provider = "aws.ireland"
}

resource "aws_iam_role" "cognito_access_role" {
  name               = "access_cognito_role"
  assume_role_policy = "${data.aws_iam_policy_document.cognito_assume_policy.json}"
  provider           = "aws.ireland"
}

resource "aws_iam_role_policy_attachment" "attach_policy_to_cognito_access_role" {
  role       = "${aws_iam_role.cognito_access_role.name}"
  policy_arn = "arn:aws:iam::aws:policy/AmazonESCognitoAccess"
  provider   = "aws.ireland"
}

resource "aws_elasticsearch_domain" "elasticsearch_log_cluster" {
  domain_name           = "${var.elasticsearch_domain_name}"
  elasticsearch_version = "6.3"
  access_policies       = "${data.aws_iam_policy_document.elasticsearch_log_cluster_access_policy.json}"

  ebs_options {
    ebs_enabled = true
    volume_size = 10
    volume_type = "standard"
  }

  cluster_config {
    instance_type  = "t2.small.elasticsearch"
    instance_count = 2
  }

  cognito_options {
    enabled          = true
    user_pool_id     = "${aws_cognito_user_pool.cognito_user_pool.id}"
    identity_pool_id = "${aws_cognito_identity_pool.cognito_kibana_identity_pool.id}"
    role_arn         = "${aws_iam_role.cognito_access_role.arn}"
  }

  tags     = "${var.default_tags}"
  provider = "aws.ireland"
}

 

Potential Issue


The biggest issue with this approach is that you'll have to watch for new CloudWatch log groups being created in all AWS regions. For example, with the first deployment of your serverless application using Lambda@Edge, the logs might appear in only one or two regions; as your user base grows, new log groups will be created in regions where you might already have the Log Proxy Lambda deployed, but the logs won't be proxied through, because the stream has to be set up manually.

 

You can, however, create log groups with the same name as your Lambda@Edge function's log group in all AWS regions beforehand, and set up the streaming to the Log Proxy Lambda from the start, rather than waiting for the log groups to be created automatically whenever a new AWS region receives logs.
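Pre-creating those groups is easy to script, because Lambda@Edge names its regional log groups predictably: "/aws/lambda/us-east-1.<function-name>", where the us-east-1 prefix reflects where the function itself is deployed. A minimal sketch (the region list and function name are my own examples, and the boto3 call is shown only as a comment):

```python
# Regions where edge traffic is expected; extend as needed
EDGE_REGIONS = ["eu-west-1", "us-east-1", "ap-southeast-2"]

def edge_log_group_name(function_name: str) -> str:
    # Lambda@Edge log group naming convention in every regional CloudWatch
    return f"/aws/lambda/us-east-1.{function_name}"

for region in EDGE_REGIONS:
    name = edge_log_group_name("my-edge-function")
    print(region, name)
    # With boto3 this would be:
    #   boto3.client("logs", region_name=region).create_log_group(logGroupName=name)
```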

Katapult Cloud