
Building, Testing and Deploying Java applications on AWS Lambda using Maven and Jenkins

With continuous integration (the practice of continually integrating code into a shared code repository) and continuous deployment (the practice of building, testing, and deploying code often), developers can release software faster and more frequently.
This post shows how the principles of code testing, continuous integration, and continuous deployment can be applied to your AWS Lambda workflows. Using Git, Maven, and Jenkins, you can integrate, test, build and deploy your Lambda functions using these same paradigms.
As a side note, we will be hosting a webinar on Continuous Delivery to AWS Lambda, which covers more methods of Continuous Delivery, on Thursday, April 28th. Register for the webinar.

Prerequisites

  • Jenkins
    Many of our customers use Jenkins as an automation server to perform continuous integration. While the setup of Jenkins is out of scope for this post, you can still learn about unit testing and pushing your code to Lambda by working through this example.
  • A Git repository
    This method uses Git commit hooks to perform builds against code to be checked into Git. You can use your existing Git repository, or create a new Git repository with AWS CodeCommit or other popular Git source control managers.

Getting started

In this example, you are building a simple document analysis system, in which metadata is extracted from PDF documents and written to a database, allowing indexing and searching based on that metadata.
You use a Maven-based Java project to accomplish the PDF processing. To explain concepts, we show snippets of code throughout the post.

Overview of event-driven code

To accomplish the document analysis, an Amazon S3 bucket is created to hold PDF documents. When a new PDF document is uploaded to the bucket, a Lambda function analyzes the document for metadata (the title, author, and number of pages), and adds that data to a table in Amazon DynamoDB, allowing other users to search on those fields.
Lambda executes code in response to events. One such event would be the creation of an object in Amazon S3. When an object is created (or even updated or deleted), Lambda can run code using that object as a source.

Create an Amazon DynamoDB table

To hold the document metadata, create a table in DynamoDB, using the Title value of the document as the primary key.
For this example, you can set your provisioned throughput to 1 write capacity unit and 1 read capacity unit.
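If you prefer to script this step rather than use the console, the same table can be created with the AWS CLI. This is a minimal sketch, assuming the table name PDFMetadata that the code later in this post expects:
aws dynamodb create-table \
    --table-name PDFMetadata \
    --attribute-definitions AttributeName=Title,AttributeType=S \
    --key-schema AttributeName=Title,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1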

Write the Java code for Lambda

The Java function takes the S3 event as a parameter, extracts the PDF object, analyzes the document for metadata using Apache PDFBox, and writes the results to DynamoDB.
// Get metadata from the document
PDDocument document = PDDocument.load(objectData);
PDDocumentInformation metadata = document.getDocumentInformation();
...
String title = metadata.getTitle();

if (title == null) {
    title = "Unknown Title";
}
...
Item item = new Item()
    .withPrimaryKey("Title", title)
    .withString("Author", author)
    .withString("Pages", Integer.toString(document.getNumberOfPages()));
The Maven project comes with a sample S3 event (/src/test/resources/s3-event.put.json) from which you can build your tests.
{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-1",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "EXAMPLE"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "79104EXAMPLEB723",
        "x-amz-id-2": "IOWQ4fDEXAMPLEQM+ey7N9WgVhSnQ6JEXAMPLEZb7hSQDASK+Jd1vEXAMPLEa3Km"
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "testConfigRule",
        "bucket": {
          "name": "builtonaws",
          "ownerIdentity": {
            "principalId": "EXAMPLE"
          },
          "arn": "arn:aws:s3:::builtonaws"
        },
        "object": {
          "key": "blogs/lambdapdf/aws-overview.pdf",
          "size": 558985,
          "eTag": "ac265da08a702b03699c4739c5a8269e"
        }
      }
    }
  ]
}
Take care to replace the awsRegion, arn, and key to match your specific region, Amazon Resource Name, and key of the PDF document that you’ve uploaded.

Test your code

The sample code you’ve downloaded contains some basic unit tests. One test gets an item from the DynamoDB table and verifies that the expected metadata exists:
@Test
public void checkMetadataResult() {
    DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient());
    Table table = dynamoDB.getTable("PDFMetadata");
    Item item = table.getItem("Title", "Overview of Amazon Web Services");

    assertEquals(31, item.getInt("Pages"));
    assertEquals("sajee@amazon.com", item.getString("Author"));
    assertEquals("Overview of Amazon Web Services", item.getString("Title"));
}
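For those assertions to pass, the handler must first have run against the sample S3 event. A rough sketch of such a test is shown below; the exact helper classes in the downloaded sample may differ, and it assumes the handler does not use its Context argument:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.event.S3EventNotification;
import org.junit.Test;

public class S3EventProcessorExtractMetadataTest {

    @Test
    public void invokeHandlerWithSampleEvent() throws Exception {
        // Load the sample S3 event shipped with the project
        String json = new String(Files.readAllBytes(
                Paths.get("src/test/resources/s3-event.put.json")), StandardCharsets.UTF_8);

        // Deserialize it into an S3Event and invoke the handler directly
        S3EventNotification notification = S3EventNotification.parseJson(json);
        S3Event event = new S3Event(notification.getRecords());
        new example.S3EventProcessorExtractMetadata().handleRequest(event, null);
    }
}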
Before continuing, test your code to ensure that everything is working:
mvn test
After ensuring there are no errors, check your DynamoDB table to see the metadata now added to your table.
The code executes because of the sample event in your Maven project, but how does it work when a new PDF is added to your bucket? To test this, complete the connection between Amazon S3 and Lambda.

Create a Lambda function

Use mvn package to package your working code, then upload the resulting JAR file to a Lambda function.
  1. In the Lambda console, create a new function and set runtime to Java 8.
  2. Set function package to the project JAR file Maven created in the target folder.
  3. Set handler to “example.S3EventProcessorExtractMetadata”.
  4. Create a new value for role based on the Basic With DynamoDB option. A role gives your function access to interact with other AWS services; in this case, your function interacts with both Amazon S3 and Amazon DynamoDB. In the window that opens, choose View Policy Document, then choose Edit to edit your policy document.
While this document gives your function access to your DynamoDB resources, you need to add access to your S3 resources as well. Use the policy document below to replace the original.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
Note that this policy document allows your Lambda function access to all S3 and DynamoDB resources. You should lock your roles down so they can interact only with the specific resources the function needs, as in the sketch below.
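For example, the two resource statements could be scoped down to a single bucket and the PDFMetadata table. The bucket name, region, and account ID below are placeholders you would replace with your own values:
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::<your-bucket-name>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem"
      ],
      "Resource": "arn:aws:dynamodb:<region>:<account-id>:table/PDFMetadata"
    }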
After completing your policy document and reviewing the function settings, choose Create Function.

Create Amazon S3 bucket

  1. Create a bucket in the Amazon S3 console. Note that S3 bucket names share a global namespace, so you need to choose a unique name.
  2. After the bucket is created, upload the Overview of Amazon Web Services PDF to your bucket. We’ve included this white paper to use in unit tests for debugging your Lambda function.
  3. Manage events for your S3 bucket by going to the bucket properties and choosing Events.
  4. Give your event a name, such as “PDFUploaded”.
  5. For Events, choose Object Created (all).
  6. For Prefix, list the key prefix for the subdirectory that holds your PDFs, if any. If you want to upload PDF documents to the root bucket, you can leave this blank. If you made a “subdirectory” called “pdf”, then the prefix would be “pdf”.
  7. Leave Suffix blank, choose Lambda function for the Send To option, and then choose the Lambda function you created.
  8. Choose Save to save your S3 event.
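If you would rather script this configuration than use the console, a rough CLI equivalent is sketched below. The function ARN, account ID, region, bucket name, and prefix are placeholders; also note that S3 needs permission to invoke the function, which the console grants for you automatically:
# Allow S3 to invoke the Lambda function (the console does this automatically)
aws lambda add-permission \
    --function-name extractPDFMeta \
    --statement-id s3-invoke \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn arn:aws:s3:::<your-bucket-name>

# Attach the ObjectCreated notification to the bucket
aws s3api put-bucket-notification-configuration \
    --bucket <your-bucket-name> \
    --notification-configuration '{
      "LambdaFunctionConfigurations": [{
        "Id": "PDFUploaded",
        "LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:extractPDFMeta",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "pdf/"}]}}
      }]
    }'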

Test everything

Test the entire process by uploading a new PDF to your bucket. Verify that a new entry was added to your DynamoDB table.

(To troubleshoot any errors, choose Monitoring in your Lambda function to view logs generated by Amazon CloudWatch.)

Enter Jenkins

At this point, you have created a testable Java function for Lambda that uses an S3 event to analyze metadata from a PDF document and stores that information in a DynamoDB table.
In a CI/CD environment, changes to the code might be made and uploaded to a code repository on a frequent basis. You can bring those principles into this project now by configuring Jenkins to perform builds, package a JAR file, and ultimately push the JAR file to Lambda. This process can be driven from a Git repo, either by polling the repo for changes or by using Git’s built-in hooks for post-receive or post-commit actions.

Build hooks

Use the post-commit hook to trigger a Jenkins build when a commit is made to the repo. (For the purposes of this post, the repo was cloned to the Jenkins master, allowing you to use the post-commit hook.)
To enable Jenkins to build off Git commits, create a Jenkins project for your repo with the Git plugin, set Build Trigger to “Poll SCM”, and leave Schedule blank.
In your project folder, find .git/hooks/post-commit and add the following:
#!/bin/sh

curl http://<jenkins-master>:8080/job/<your-project-name>/build?delay=0sec
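Note that Git only runs hooks that are executable, so make sure the hook file has execute permissions:
chmod +x .git/hooks/post-commit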
This ensures that when a commit is made in this project, a request is made to your project’s build endpoint on Jenkins. You can try it by adding or modifying a file, committing it to your repo, and examining the build history and console output in your Jenkins dashboard for the status update.
(For more information about implementing a post-receive hook, see the Integrating AWS CodeCommit with Jenkins AWS DevOps Blog post.)

Deploy code to Lambda

You may notice in the console output a command for aws sns publish --topic-arn .... In this project, we’ve added a post-build step to publish a message via Amazon Simple Notification Service (Amazon SNS) as an SMS message. You can add a similar build step to do the same, or take advantage of SNS support for HTTP(S) endpoints to post status messages to team chat applications or a distribution list.
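A sketch of such a post-build Execute shell step might look like the following, where the topic ARN is a placeholder for a topic you have already created and subscribed to:
aws sns publish \
    --topic-arn arn:aws:sns:<region>:<account-id>:<your-topic> \
    --message "Jenkins build ${BUILD_NUMBER} for ${JOB_NAME} succeeded"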
To push the code to AWS Lambda after a successful commit and build, add the following post-build steps.
  1. In the configuration settings for your project, choose Add build step and Invoke top-level Maven targets, setting Goal to “package”. This packages up your project as a JAR file and places it into the target directory.
  2. Add a second build step by choosing Add build step and the Execute shell option.
  3. For Command, add the following Lambda CLI command (substitute the function-name variable and zip-file variable as necessary):
    aws lambda update-function-code --function-name extractPDFMeta --zip-file fileb://target/lambda-java-example-1.0.jar
You have now added the necessary build steps for Jenkins to test and package your code, then upload it to Lambda. Test the entire process start to finish by adding a new PDF document into your S3 bucket and checking the DynamoDB table for changes.

Take it a step further

The code and architecture described here are meant to serve as illustrations for testing your Lambda functions and building out a continuous deployment process using Maven, Jenkins, and AWS Lambda. If you’re running this in a production environment, there may be additional steps you would want to take. Here are a few:
  • Add additional unit tests
  • Build in additional features and sanity checks, for example, to make sure documents to be analyzed are actually PDF documents (a minimal guard is sketched after this list)
  • Adjust the Write Capacity Units (WCU) of your DynamoDB table to accommodate higher levels of traffic
  • Add an additional Jenkins post-build step to integrate Amazon SNS to send a notification about a successful Lambda deployment
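As an illustration of the second point, a hypothetical guard inside the handler could skip objects whose keys do not end in .pdf before attempting to parse them; this is not part of the original sample, and it assumes the bucket, key, and context variables from the handler sketch earlier in this post:
// Hypothetical guard: ignore non-PDF objects before calling PDDocument.load
if (!key.toLowerCase().endsWith(".pdf")) {
    context.getLogger().log("Skipping non-PDF object: " + key);
    return "Skipped";
}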
