Flying Server-less on the Cloud with AWS Lambda

Serkan ÖZAL

Who Am I?
● Senior Software Engineer @ OpsGenie
● Co-organizer of Serverless Meetup Turkey
● Oracle Open-Source Contributor
● PhD. Cand. @ METU Computer Eng.
● 8+ years in software development
● Hard-core JVM ninja
● Actively working on Serverless and AWS Lambda
● Part-time Big Data researcher
● Building a new product Thundra

Agenda
● Road to Serverless
● The Motivation
● Under the Hood
● Integrations
● Limitations
● Logging
● Config Management
● Security
● Error Handling

“If your PaaS can efficiently start
instances in 20 ms that run for half
a second, then call it serverless.”
Adrian Cockcroft, VP Cloud Architecture Strategy at AWS

What is AWS Lambda?
- AWS’s FaaS (Function as a Service)
- Run code without provisioning or managing servers
- Supported invocation types:
- Request/response (sync)
- Event driven (async)
- Supported languages:
- Go
- C#
- Java
- Node.js
- Python
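To make the "run code without provisioning servers" idea concrete, here is a minimal sketch of a Python handler for a request/response invocation. The `name` field in the event and the shape of the returned dictionary are assumptions made for illustration, not something the slides prescribe.

```python
import json


def handler(event, context):
    """Entry point: Lambda passes the event payload and a context object."""
    # "name" is a hypothetical field used only for this illustration.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

With a sync invocation the return value goes back to the caller; with an async invocation the caller does not receive it.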

Why AWS Lambda?
- PAYG - pay as you go
- Highly available
- Scale fast
- Horizontally
- Vertically
- Don’t manage servers
- Built-in integration with other AWS services
- Security

The Devil is in the Detail
- Container reuse
- Container freeze
- More memory => More CPU
- One execution per container at any time
- At-least-once delivery guarantee
- New container for
- new deployments
- configuration updates
- even for environment variable updates
- Container is destroyed on timeout
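Container reuse can be observed from inside a function. The sketch below assumes a Python function and relies on module-level code running once per container, while the handler runs once per invocation; the returned field names are made up for illustration.

```python
import time

# Module-level code runs once, when the container is initialized (cold start).
# The values survive for as long as the container is reused.
_INITIALIZED_AT = time.time()
_invocation_count = 0


def handler(event, context):
    """Reports whether this invocation hit a fresh or a reused container."""
    global _invocation_count
    _invocation_count += 1
    return {
        "cold_start": _invocation_count == 1,
        "invocations_in_this_container": _invocation_count,
        "container_age_seconds": time.time() - _INITIALIZED_AT,
    }
```

Because a container handles one execution at a time, the counter needs no locking here. Because delivery is at-least-once, state like this should only be treated as a cache, never as a source of truth.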

How to Limit Container?
- cgroup (Control Group)
- Engineers at Google started the work in 2006
- Merged into Linux kernel in January 2008
- Can limit
- CPU
- Memory
- Disk bandwidth
- Network bandwidth

CPU Throttling
- cgcreate
- cgcreate -g cpu:/cg1
- cgcreate -g cpu:/cg2
- cpu.cfs_quota_us / cpu.cfs_period_us
- how to configure cg1 to run 0.2 seconds out of every 1 second?
- cgset -r cpu.cfs_quota_us=200000 -r cpu.cfs_period_us=1000000 cg1
- cpu.shares
- how to configure 1:2 CPU usage ratio between cg1 and cg2?
- cgset -r cpu.shares=512 cg1
- cgset -r cpu.shares=1024 cg2
- why not used by AWS Lambda?
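The arithmetic behind the two knobs above can be checked with a few lines of Python. This is only a sketch of the math, with helper names invented for the example; it does not talk to cgroups.

```python
def cpu_fraction(quota_us: int, period_us: int) -> float:
    """CFS bandwidth: the group may run quota_us out of every period_us."""
    return quota_us / period_us


def share_ratio(shares_a: int, shares_b: int) -> tuple:
    """cpu.shares is a relative weight: each group gets its share of the total."""
    total = shares_a + shares_b
    return shares_a / total, shares_b / total


# 200000 us out of every 1000000 us is 0.2 seconds per second of wall time.
cg1_limit = cpu_fraction(200000, 1000000)

# 512 vs 1024 shares gives a 1:2 split, i.e. one third vs two thirds.
cg1_part, cg2_part = share_ratio(512, 1024)
```

Note the difference between the two: the quota is a hard ceiling, while shares only decide how CPU is divided when the groups actually compete for it.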

Built-in Integrations
- Direct
- DynamoDB
- Kinesis
- Firehose
- SNS
- S3
- API Gateway
- CloudWatch Logs
- CloudWatch Events
- Scheduled
- SES
- Cognito
- CodeCommit
- CloudFormation
- CloudFront
- Config
- Alexa
- Lex
- IoT Button

Resource Limits
- Max execution time: 5 minutes
- Memory allocation range: 128 MB - 3 GB
- Ephemeral disk capacity ("/tmp" space): 512 MB
- Invoke request body payload size (sync invocation): 6 MB
- Invoke request body payload size (async invocation): 128 KB
- Number of file descriptors: 1024
- Number of processes and threads (combined total): 1024
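A caller can check the payload limits before invoking. The sketch below uses the sync and async figures from this slide as constants (they may differ today) and assumes the payload is sent as UTF-8 encoded JSON; the function name is hypothetical.

```python
import json

# Limits as quoted on the slide; treat them as assumptions that may change.
SYNC_LIMIT_BYTES = 6 * 1024 * 1024
ASYNC_LIMIT_BYTES = 128 * 1024


def fits_invoke_limit(payload, asynchronous=False):
    """Returns True if the JSON-encoded payload is within the invoke limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    limit = ASYNC_LIMIT_BYTES if asynchronous else SYNC_LIMIT_BYTES
    return size <= limit
```

A payload that is too large for an async invocation is usually better stored elsewhere (for example in S3) with only a reference passed in the event.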

Deployment Limits
- Function deployment package size (compressed .zip/.jar file): 50 MB
- Size of code/dependencies that you can zip into a deployment package (uncompressed .zip/.jar size): 250 MB
- Total size of all the deployment packages per region: 75 GB
- Total size of environment variables set: 4 KB

Execution Limits
- Account level concurrent execution limit is 1000
- It is per region
- It is a soft limit
- Function level concurrent execution limit
- It is reserved
- the value is deducted from the unreserved concurrency pool
- ENI limit is 350
- It applies to Lambda functions in a VPC
- It is per region
- It is a soft limit

Writing Logs
- Logs are written to CloudWatch asynchronously
- Log group per function
- /aws/lambda/my-func
- Log stream per container under log group
- 2018/01/27/[$LATEST]f95da1aaf0384ed6ad642d8299f7503d
- How to log
- Standard output/error
- Lambda API
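A small sketch of the standard output route, assuming a Python function. It also uses the standard `logging` module, on the assumption that the Python runtime forwards those records to the same log stream; the logger name and the `records` field are made up for the example.

```python
import logging

# In Lambda the runtime is assumed to attach a handler to the root logger;
# locally, INFO records are simply dropped unless logging is configured.
logger = logging.getLogger("my-func")
logger.setLevel(logging.INFO)


def handler(event, context):
    # Anything written to stdout/stderr ends up in the container's log stream.
    print("received event")
    # "records" is a hypothetical field of the event.
    logger.info("processing %d records", len(event.get("records", [])))
    return "ok"
```

Since logs are shipped asynchronously, a line printed just before a crash or timeout may show up late in CloudWatch.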

Collecting Logs
- Subscribe to CloudWatch log groups
- Only one subscription per log group
- Filter by pattern
- Stream to AWS Lambda
- Stream to AWS Elasticsearch
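When a log group subscription streams to a Lambda function, the function receives the log events base64-encoded and gzip-compressed under `awslogs.data`. The following is a minimal sketch of decoding that envelope in Python; what happens to the messages afterwards is left out.

```python
import base64
import gzip
import json


def decode_log_event(event):
    """Unpacks the CloudWatch Logs subscription envelope into a dict."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_log_event(event)
    # Each entry in "logEvents" carries an id, a timestamp and the message.
    return [entry["message"] for entry in payload["logEvents"]]
```

The decoded payload also names the originating log group and log stream, which is useful when one collector function serves many functions.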

Environment Variables
- No limit to the number of env. variables
- Max total size is 4 KB
- Must start with letters [a-zA-Z]
- Can only contain alphanumeric char. and “_” [a-zA-Z0-9_]
- KMS
- Encrypt at rest (default)
- Encrypt in transit
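The naming rules and the size cap above can be checked before deployment. This is a sketch in Python with a hypothetical helper; how AWS counts the 4 KB exactly is not stated on the slide, so summing the UTF-8 bytes of names and values is an assumption.

```python
import re

# Must start with a letter, then letters, digits or underscore.
NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
MAX_TOTAL_BYTES = 4 * 1024


def validate_env(variables):
    """Returns a list of problems found in a dict of environment variables."""
    errors = []
    for name in variables:
        if not NAME_PATTERN.fullmatch(name):
            errors.append(f"invalid name: {name}")
    # Assumption: total size = bytes of all names plus bytes of all values.
    total = sum(
        len(k.encode("utf-8")) + len(v.encode("utf-8"))
        for k, v in variables.items()
    )
    if total > MAX_TOTAL_BYTES:
        errors.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    return errors
```

Running such a check in the build pipeline catches a bad variable name before it turns into a failed deployment.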

SSM
- Centralized config management
- share between functions
- update once
- Fine-grained access to sensitive data via IAM
- Integrates with KMS out-of-the-box
- Records a history of changes
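Reading shared configuration from SSM could look like the sketch below. It takes the client as a parameter so it can be exercised without AWS; in a real function the client would typically come from `boto3.client("ssm")`. The parameter names are hypothetical.

```python
def load_config(ssm_client, names):
    """Fetches parameters by name and returns them as a name -> value dict.

    WithDecryption=True asks SSM to decrypt SecureString values via KMS.
    """
    response = ssm_client.get_parameters(Names=list(names), WithDecryption=True)
    return {p["Name"]: p["Value"] for p in response["Parameters"]}


# Typical use inside a Lambda function (not executed here):
#   import boto3
#   CONFIG = load_config(boto3.client("ssm"), ["/my-app/db-url"])
# Loading at module level means the call happens once per container.
```

The function's role needs permission to read those parameters (and to use the KMS key for encrypted ones), which is where the fine-grained IAM access comes in.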

VPC
- Define/select VPC and configure
- Subnets (recommended one subnet in each AZ)
- Security Groups
- To be able to access the internet
- NAT Gateway
- Internet Gateway
- Route table configuration
- Be aware of ENI limit (default 350)
- Make sure that each subnet has a large enough IP address range for ENIs

Role
- Each Lambda function has an associated IAM role
- For accessing AWS resources
- grant the role the necessary permissions that your Lambda function needs
- for ex. permission to Lambda for putting item to DynamoDB table
- For non-stream based event sources
- grant the event source permissions to invoke function
- for ex. perm. to S3 bucket for invoking Lambda on upload
- For stream based event sources
- grant AWS Lambda permissions for the relevant stream actions
- for ex. perm. to Lambda for getting Kinesis stream records to be invoked
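For the first case, the permissions attached to the function's role are expressed as an IAM policy document. The sketch below builds one in Python for the DynamoDB example; the helper name and the table ARN in the usage line are hypothetical.

```python
import json


def dynamodb_put_policy(table_arn):
    """Builds a least-privilege policy allowing only PutItem on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical table ARN, for illustration only.
policy_json = json.dumps(
    dynamodb_put_policy("arn:aws:dynamodb:eu-west-1:123456789012:table/my-table"),
    indent=2,
)
```

Scoping the `Resource` to a single table rather than `*` keeps a compromised or buggy function from touching other tables.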

Others
- Inbound connections are blocked
- For outbound connections only TCP/IP sockets are supported
- “ptrace” (debugging) system calls are blocked
- TCP port 25 is also blocked as an anti-spam measure

Retries
- For sync invocations (Lambda API call, …)
- client is responsible for retries
- For async invocations
- Non-Stream based events (S3, SNS, CloudWatch, …)
- retry a few times (2 or more) with delays
- If it still fails, put into DLQ (if specified)
- Stream based events (Kinesis , DynamoDB streams)
- retry until succeeded or
- retry until data expires
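For sync invocations the retry logic lives in the client. One common way to do that is exponential backoff, sketched below in Python; the function name, attempt count and delays are assumptions, and `invoke` stands for whatever call actually reaches the function.

```python
import time


def invoke_with_retries(invoke, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Calls invoke() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, then 4x ... between attempts.
    The last failure is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Since retries can deliver the same request more than once, the function behind `invoke` should be idempotent.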

DLQ - Dead Letter Queue
- Can be
- SNS topic
- SQS queue
- Requests are redirected if the invocation is
- Asynchronous and
- Event source is non-stream based (S3, SNS, …)
- Requires permission to access the DLQ resource
- Monitor the “DeadLetterErrors” metric

Agenda
● Monitoring
● Alerting
● Testing
● Deployment
● Performance & Cold Start
● AWS Lambda @ OpsGenie

CloudWatch Metrics
- The following metrics are supported on a per-function basis:
- Invocations
- Errors
- Duration
- Dead Letter Errors
- Throttles
- Iterator Age
- The following metrics are supported across all functions:
- Concurrent Executions
- Unreserved Concurrent Executions

Distributed Tracing with AWS X-Ray
- Shows durations, responses and errors
- Segment for Lambda invocation
- Sub-Segments for
- initialization
- calls to external services
- custom ones
- Custom properties
- can be queried over “Annotation”
- can be stored on “Metadata” as raw

API Logging with CloudTrail
- CloudTrail can log
- function definition/configuration CRUD
- function invocations
- log entry contains information about
- who generated the request
- the requested action
- the action parameters
- ...
- CloudTrail logs can be published to
- S3
- SNS

Full Observability with Thundra
- Provides three pillars of observability:
- Trace
- Metric
- Log
- Zero overhead with async data publishing
- Has automated instrumentation and profiling support
- Integrated with AWS X-Ray
- www.thundra.io

Creating Alarm
- Create alarm on CloudWatch by metrics
- The following metrics are supported on a per-function basis:
- Duration
- Errors
- Invocations
- Throttles
- The following metrics are supported across all functions:
- Concurrent Executions
- Unreserved Concurrent Executions
- Notify through SNS
- E-Mail
- Lambda
- ...
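An alarm on a per-function metric could be defined as in the sketch below, which builds the keyword arguments for the CloudWatch `put_metric_alarm` call of boto3. The helper name, alarm name, period and threshold are assumptions chosen for illustration.

```python
def error_alarm_params(function_name, sns_topic_arn, threshold=1):
    """Builds put_metric_alarm arguments for the Errors metric of one function."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        # Per-function metrics are selected with the FunctionName dimension.
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The SNS topic fans the notification out to e-mail, Lambda, ...
        "AlarmActions": [sns_topic_arn],
    }


# Typical use (not executed here):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**error_alarm_params(...))
```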

Writing Test
- Unit Test
- do our objects work as expected on their own?
- Integration Test
- do our objects work well together?
- Functional Test
- does the whole system work from end to end?
- Local Lambda development
- SAM Local
- LocalStack
- Cloud9

SAM Local
- Works with SAM template
- Simulates some AWS service events (not services)
- S3, Kinesis, DynamoDB, Cloudwatch, Scheduled Event, API GW
- Runs API Gateway locally
- Allows local debugging
- https://github.com/awslabs/aws-sam-local

LocalStack
- Spins up many core cloud APIs on your local machine
- Lambda, API GW, DynamoDB, Kinesis, Firehose, S3, SNS, SQS, ...
- Supports error injection
- ProvisionedThroughputExceededException, ...
- Can be run on docker
- Integrated with some test frameworks
- JUnit for Java
- nosetests for Python
- https://github.com/localstack/localstack

Tools
- Serverless
- SAM (Serverless Application Model)
- APEX
- Zappa
- Sparta

Versioning
- Each deploy/upload is a new version
- Aliases map to versions
- There is an N:1 relation
- An alias can be mapped to only one version
- A version can be mapped by multiple aliases
- By default latest version (“$LATEST”) is invoked
- Shift traffic using aliases with weighted versions
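Weighted traffic shifting is configured on the alias itself. The sketch below builds the keyword arguments for the Lambda `update_alias` call of boto3, where the alias keeps pointing at the stable version and a fraction of traffic is routed to a second version; the helper name and the example values are hypothetical.

```python
def canary_alias_update(function_name, alias, stable_version, canary_version, canary_weight):
    """Builds update_alias arguments sending canary_weight of traffic to the canary."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0 and 1")
    return {
        "FunctionName": function_name,
        "Name": alias,
        # The alias's main version receives the remaining traffic.
        "FunctionVersion": stable_version,
        "RoutingConfig": {
            "AdditionalVersionWeights": {canary_version: canary_weight}
        },
    }


# Typical use (not executed here): send 10% of "prod" traffic to version 2.
#   import boto3
#   boto3.client("lambda").update_alias(
#       **canary_alias_update("my-func", "prod", "1", "2", 0.1))
```

Raising the weight step by step, while watching the error metrics, moves traffic gradually from the old version to the new one.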

New Version Release
- Step 1: version v1 exists; alias “prod” maps to v1
- Step 2: version v2 is published; alias “prod” still maps to v1
- Step 3: alias “prod” is remapped to v2

Canary Deployment
- Step 1: “app” calls alias “prod”; “prod” maps to v1
- Step 2: version v2 is deployed; “prod” still maps to v1
- Step 3: alias “prod2” is created and maps to v2
- Step 4: “app2” is added and calls alias “prod2”
- Step 5: “app” with v1 and “app2” with “prod2” and v2 run side by side (alias table: prod maps to v1, prod2 maps to v2)
- Step 6: only “app2”, “prod2” and v2 remain (alias table: prod2 maps to v2)

What Affects Cold Start?
- Depends on language
- Java and C# have more cold start overhead
- Depends on code size
- Smaller artifact size = less cold start (not significantly)
- Depends on memory size
- More memory = less cold start
- Depends on network configuration
- VPC has more cold start overhead (because of ENI)
- SSL handshake has more cold start overhead
- Depends on application and 3rd party libs

Cold start times by language + memory
read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76

Response times by language
- Charts: average response time and maximum response time
- Source: read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76

Cold Start on JVM
- Loading and initializing
- Application classes
- Core JDK classes
- Security (SSL, encryption, …) related JDK classes
- Initializing 3rd party libraries/frameworks
- AWS SDK
- Spring, Jackson, ...

How to Startup Faster on JVM? [1]
- Enable CDS (Class Data Sharing)
- -Xshare:on
- Already enabled by AWS Lambda
- Enable AppCDS (Application Class Data Sharing)
- -XX:+UseAppCDS -XX:SharedArchiveFile=hello.jsa
- For OpenJDK, only available from Java 10 (JEP 310) :(
- Use AOT (Ahead of Time Compilation)
- Build custom runtime image with “jlink”
- Only available since Java 9 :(

How to Startup Faster on JVM? [2]
- Use Tiered Compilation
- Tiered compilation is disabled on AWS Lambda
- -XX:+TieredCompilation -XX:TieredStopAtLevel=1
- Disable bytecode verification
- -Xverify:none
- No classpath scan
- Prefer programmatic or XML configuration for Spring
- Prefer lightweight libraries if possible
- Spring => Guice, Dagger, ...
- Jackson => Gson, ...

Warmup
- Periodically send empty messages
- So AWS Lambda might think that container is active
- Not a perfect solution for cold start
- AWS’s new experimental container pre-initializer
- How to keep multiple containers up?
- https://github.com/opsgenie/sirocco
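On the function side, warmup messages are usually short-circuited so they cost as little as possible. The sketch below assumes the periodic message carries a `warmup` flag; that marker and the `do_work` helper are hypothetical, and this is not a description of how sirocco itself works.

```python
def do_work(event):
    """Placeholder for the function's real business logic."""
    return len(event)


def handler(event, context):
    # Hypothetical marker set by the periodic warmup trigger.
    if event.get("warmup"):
        # Return immediately: the goal is only to keep the container alive.
        return {"warmed": True}
    return {"result": do_work(event)}
```

A single periodic message keeps at most one container warm, since each container handles one execution at a time; keeping several warm requires concurrent warmup invocations.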

Incident Management on Lambda
