AWS Design Resilient Architectures

varunkumar inbaraj
7 min read · Dec 16, 2019

| Cheat Sheets

AMIs can be created with software pre-installed. Because AMIs are scoped to a region, they must be copied to another region before instances can be launched from them there.

You can launch multiple instances from a single AMI when you need multiple instances with the same configuration.

CloudFront with S3 as origin helps cache the requests and reduce the direct calls to S3

Adding randomness to key name prefixes also helps distribute data across S3 partitions.
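One way to add that randomness is to prepend a short hash of the key name. The sketch below is only an illustration: the function name `prefixed_key` and the four-character prefix length are assumptions, not anything prescribed by S3.

```python
import hashlib


def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash so similar key names land under different prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Two keys that share a date-based prefix get hashed prefixes instead.
for name in ["logs/2019-12-16/a.log", "logs/2019-12-16/b.log"]:
    print(prefixed_key(name))
```

The hash is deterministic, so a reader can recompute the full key from the original name without a lookup table.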

Versioning on S3 helps maintain multiple versions of an object.

It also helps recover from accidental deletions or overwrites.

ELB is used to distribute traffic across EC2 instances.

It automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions.

With two AZs, attach the ELB to two public subnets (one per AZ), place the web servers in two private subnets, and use a Multi-AZ RDS deployment spanning both AZs.

AZs do not meet a requirement of being at least 500 miles apart (only separate regions can).

A weighted routing policy in Route 53 is well suited to blue-green deployments.

Route 53 does not load balance across Auto Scaled instances. That can be achieved through ELB.
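With weighted routing, each record receives a share of traffic equal to its weight divided by the sum of all weights. The sketch below only does that arithmetic for a blue-green cutover; the record names `blue` and `green` and the helper `traffic_shares` are hypothetical, and the all-zero-weights case is deliberately not modeled.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Return each record's expected fraction of traffic: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("this sketch does not model the all-zero-weights case")
    return {name: weight / total for name, weight in weights.items()}


# Start the cutover by sending a small slice of traffic to the new (green) stack.
print(traffic_shares({"blue": 90, "green": 10}))
# Finish it by draining the old (blue) stack entirely.
print(traffic_shares({"blue": 0, "green": 100}))
```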

Spot Instances that use a persistent request type and are EBS-backed can be stopped and started.

Spot Instances are interrupted with a two-minute warning when there is not enough capacity, and are terminated by default. Amazon CloudWatch Events can catch the warning and invoke an AWS Lambda function that launches On-Demand Instances.
Note: Spot Fleet does not guarantee availability, and its instances can be terminated.
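A minimal sketch of such a handler follows. It assumes the event carries the detail type "EC2 Spot Instance Interruption Warning" and the instance ID under `detail["instance-id"]`. The `launch_on_demand` callback is a hypothetical stand-in for whatever EC2 API call actually launches the replacement, so the example runs without AWS access.

```python
from typing import Callable, Optional

SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def handle_event(event: dict, launch_on_demand: Callable[[str], str]) -> Optional[str]:
    """Launch a replacement when the event is a Spot interruption warning.

    Returns the replacement's ID, or None for any other event type.
    """
    if event.get("detail-type") != SPOT_WARNING:
        return None
    interrupted_id = event["detail"]["instance-id"]
    return launch_on_demand(interrupted_id)


def fake_launcher(interrupted_id: str) -> str:
    # Hypothetical stub; a real function would call the EC2 API here.
    return f"replacement-for-{interrupted_id}"


sample_event = {
    "source": "aws.ec2",
    "detail-type": SPOT_WARNING,
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}
print(handle_event(sample_event, fake_launcher))
```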

Amazon Aurora provides high availability and reliability by replicating data across three Availability Zones by default.

Note:
With Multi-AZ RDS, the data is replicated across 2 AZs only. It is not multi-master (only one DB can be written to at a time) and does not span regions.

In the pilot light disaster recovery method, only critical systems such as RDS are backed up. For non-critical systems, configurations such as CloudFormation templates, snapshots, and AMIs are prepared so that they can be brought up quickly. This solution allows rapid provisioning of a working, fully scaled production environment.

The term pilot light is often used to describe a Disaster Recovery scenario in which a minimal version of an environment is always running in the cloud.

Note:
A Lambda function is not a reliable way to replicate database changes.
An ALB is not needed in pilot light.
Route 53 health checks cannot be used for deployment, and S3 does not suit application deployment.

Lambda provides serverless compute and high availability for the process, and removes the need to manage any compute infrastructure.

AWS Lambda can automatically run code in response to multiple events, such as
HTTP requests via Amazon API Gateway,
modifications to objects in Amazon S3 buckets,
table updates in Amazon DynamoDB,
and state transitions in AWS Step Functions.
Lambda also provides a scalable way to implement microservices and integrates easily with Kinesis.
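A single function can tell these triggers apart by inspecting the event payload. The sketch below assumes, to the best of my understanding, that S3, DynamoDB Streams and Kinesis deliver a `Records` list whose entries carry an `eventSource` field, and that an API Gateway REST proxy event carries `httpMethod`. The helper name `event_source` is hypothetical.

```python
def event_source(event: dict) -> str:
    """Guess which service invoked the function from the shape of the event."""
    records = event.get("Records")
    if records:
        # e.g. "aws:s3", "aws:dynamodb", "aws:kinesis"
        return records[0].get("eventSource", "unknown")
    if "httpMethod" in event:
        return "apigateway"
    return "unknown"


print(event_source({"Records": [{"eventSource": "aws:s3"}]}))
print(event_source({"httpMethod": "GET", "path": "/orders"}))
```

Step Functions passes the state's input as the event, so there is no fixed marker to check for in that case.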

A NAT Gateway is scoped to a single AZ, so one should be launched in each AZ to ensure high availability.
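The per-AZ rule can be checked with a small planning helper. This is a sketch with hypothetical subnet and gateway IDs: it maps every private subnet to the NAT gateway in its own AZ and fails loudly when an AZ has none.

```python
from typing import Dict


def nat_for_subnets(private_subnets: Dict[str, str], nat_gateways: Dict[str, str]) -> Dict[str, str]:
    """Map subnet ID -> NAT gateway ID in the same AZ.

    private_subnets maps subnet ID -> AZ; nat_gateways maps AZ -> NAT gateway ID.
    """
    routes = {}
    for subnet_id, az in private_subnets.items():
        if az not in nat_gateways:
            raise ValueError(f"no NAT gateway in {az} for {subnet_id}")
        routes[subnet_id] = nat_gateways[az]
    return routes


subnets = {"subnet-a": "us-east-1a", "subnet-b": "us-east-1b"}
gateways = {"us-east-1a": "nat-1", "us-east-1b": "nat-2"}
print(nat_for_subnets(subnets, gateways))
```

Each private subnet's route table would then send 0.0.0.0/0 to the gateway chosen for it, so losing one AZ does not cut off the other.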

Note: ELB is for load balancing and not high availability

Launching the instances in an Auto Scaling group with an Elastic Load Balancing health check helps automate recovery if a web server instance stops responding to requests.

Note:
1) Route 53 DNS only needs to point to the ELB.
2) ELB routes traffic only to the healthy instances.
3) Auto Scaling replaces the instance using the AMI (if any) specified in the Auto Scaling configuration.
4) ELB and Auto Scaling across multiple AZs also provide HA, but do not help recovery if health checks are not configured.
5) Route 53 needs to point to the Application Load Balancer. Route 53 itself does not load balance across instances.
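The setting that ties these notes together is the group's health check type. The sketch below only builds a request dictionary. Its keys follow the Auto Scaling `CreateAutoScalingGroup` parameters as I understand them, while the helper name `asg_request`, the sizes, the grace period and all IDs are assumptions for illustration.

```python
from typing import List


def asg_request(name: str, launch_template_id: str,
                subnet_ids: List[str], target_group_arn: str) -> dict:
    """Build parameters for an Auto Scaling group that trusts the ELB health check."""
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinSize": 2,
        "MaxSize": 4,
        # Subnets in different AZs, passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "TargetGroupARNs": [target_group_arn],
        # "ELB" makes the group replace instances the load balancer reports unhealthy.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,
    }


params = asg_request("web-asg", "lt-0123456789abcdef0",
                     ["subnet-a", "subnet-b"], "arn:aws:elasticloadbalancing:example")
print(params["HealthCheckType"], params["VPCZoneIdentifier"])
```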

SQS can help decouple requests between components, for example a payment processing service that sends orders to a fulfilment service, and lets them communicate asynchronously.

Scaling the processing nodes can also help match capacity to demand.
Note: An internal ELB still does not decouple the components.
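The decoupling idea can be shown locally. In this sketch Python's in-process `queue.Queue` stands in for SQS, which is an assumption made so the example runs without AWS; the function names are hypothetical. The producer never calls the consumer directly, it only enqueues messages.

```python
import queue

orders = queue.Queue()  # local stand-in for an SQS queue


def payment_service(order_ids):
    """Producer: enqueue each paid order and return immediately."""
    for order_id in order_ids:
        orders.put({"order_id": order_id})


def fulfilment_worker():
    """Consumer: drain whatever is queued, at its own pace."""
    shipped = []
    while True:
        try:
            message = orders.get_nowait()
        except queue.Empty:
            break
        shipped.append(message["order_id"])
    return shipped


payment_service([101, 102, 103])
shipped = fulfilment_worker()
print(shipped)  # [101, 102, 103]
```

Unlike this local queue, a standard SQS queue does not guarantee ordering, so a real consumer should not depend on it.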

Elastic Beanstalk worker environments support SQS dead letter queues, where a worker can send messages that for some reason could not be processed successfully.

An IAM role needs to be associated with the ECS task definition.

This gives each Amazon ECS task an IAM policy that limits the task’s privileges to only those required for its use of AWS services.
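A sketch of the two policy documents involved: a trust policy that lets ECS tasks assume the role, and a narrow permissions policy. The bucket name and the helper names are hypothetical, and read-only access to one bucket is just an example of a least-privilege grant.

```python
import json


def ecs_task_trust_policy() -> dict:
    """Trust policy allowing the ECS tasks service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def read_only_bucket_policy(bucket: str) -> dict:
    """Permissions policy limited to reading objects from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


print(json.dumps(ecs_task_trust_policy(), indent=2))
print(json.dumps(read_only_bucket_policy("example-task-bucket"), indent=2))
```

The role's ARN is then referenced from the task definition's `taskRoleArn` field.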

DynamoDB is a fully managed NoSQL solution that supports both key-value and document structures, and DynamoDB Auto Scaling adjusts provisioned capacity as traffic changes.

This suits, for example, a service handling 50,000 reads/second that expects 10% growth in traffic and data volume. DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database.
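The 50,000 reads/second figure with 10% growth can be turned into a capacity estimate. This sketch assumes one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (taken here as 4096 bytes), and that eventually consistent reads cost half. The function names are hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def required_rcus(reads_per_sec: int, growth_pct: int, item_bytes: int,
                  strongly_consistent: bool = True) -> int:
    """Estimate read capacity units after applying the expected growth."""
    grown_reads = ceil_div(reads_per_sec * (100 + growth_pct), 100)
    units_per_read = ceil_div(item_bytes, 4096)
    total = grown_reads * units_per_read
    return total if strongly_consistent else ceil_div(total, 2)


print(required_rcus(50_000, 10, 4096))                             # 55000
print(required_rcus(50_000, 10, 4096, strongly_consistent=False))  # 27500
```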

Application Load Balancer provides a dynamic port mapping capability with ECS.

Route 53, CloudFront with ALB and Auto Scaling can help create a geographically redundant and scalable solution.

Enable cross-region snapshots for the Redshift cluster.

This allows a recovery site to be set up in a separate region in case of a disaster. Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and existing Business Intelligence (BI) tools.

Backup and Restore is the most cost-effective solution for disaster recovery. It involves backing up all resources from the primary site so that they can be recreated in the secondary site if a disaster happens.

An Availability Zone becoming unavailable, or a failure of the primary database instance, can cause a Multi-AZ Amazon RDS failover to occur.

Note: This is due to automatic failover. The standby is in a different AZ within the same region.

Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools.

Firehose destinations include Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk. It is used to provide reliable data ingestion from any application into the data store.

Instance store volumes are ephemeral, i.e. short-lived, and the data is lost as soon as the instance is stopped or terminated.

Therefore, do not rely on instance store for valuable, long-term data. Instead, use more durable data storage, such as Amazon S3, Amazon EBS, or Amazon EFS.

The volume gateway represents the family of gateways that support block-based volumes

In the cached volume mode, your data is stored in Amazon S3 and a cache of the frequently accessed data is maintained locally by the gateway. With this mode, you can achieve cost savings on primary storage, and minimize the need to scale your storage on-premises, while retaining low-latency access to your most used data.

In the stored volume mode, data is stored on your local storage with volumes backed up asynchronously as Amazon EBS snapshots stored in Amazon S3. This provides durable and inexpensive off-site backups. You can recover these backups locally to your gateway or in-cloud to Amazon EC2, for example, if you need replacement capacity for disaster recovery.

Origin Access Identity is a special CloudFront user associated with the distribution. OAI allows exposing the content without making the S3 content public.

For a web distribution, the OAI is associated with the S3 origin.
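The bucket stays private because its policy grants read access to the OAI only. Below is a sketch of that bucket policy as a Python dictionary; the bucket name, the OAI ID and the helper name `oai_bucket_policy` are placeholders.

```python
import json


def oai_bucket_policy(bucket: str, oai_id: str) -> dict:
    """Bucket policy letting one CloudFront OAI read objects, and nobody else."""
    principal = f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


policy = oai_bucket_policy("example-content-bucket", "EXAMPLEOAIID")
print(json.dumps(policy, indent=2))
```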

Route 53 is an AWS managed DNS service. It is global and can be configured to route traffic across multiple regions, with support for health checks and failover routing.

Route 53 allows routing requests to instances outside AWS as well as providing health checks to route traffic to only healthy instances.
Note: CloudFront only enables caching and serving content from edge locations, and S3 is regional, so neither helps route traffic across regions.

Route 53 can be configured to failover to a static S3 website or CloudFront in case of an issue.

For the highest packets-per-second performance and lowest latency for your application, enable enhanced networking on all the Amazon EC2 instances.

Note: Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both, for example to optimize performance for a compute cluster that requires low inter-node latency.

If the application is degraded by the failure of an ElastiCache node, configure ElastiCache Multi-AZ with automatic failover.

The main benefits of running your ElastiCache for Redis in Multi-AZ mode are enhanced availability and a reduced need for administration.

Configuring a multivalue answer policy can be used to route user traffic to random web servers when users request the underlying web application.

Multivalue answer routing lets you configure Amazon Route 53 to return multiple values, such as IP addresses for your web servers, in response to DNS queries.
Note:
1) A latency policy directs traffic to the resource with the lowest latency.
2) To have Route 53 serve resources from two AWS regions while both are healthy, configure active-active failover using Route 53 latency DNS records.
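Multivalue answer routing can be modeled as picking a random subset of healthy records. This is a local simulation rather than a Route 53 call: it assumes a response contains at most eight healthy records, and the IP addresses and the helper name `multivalue_answer` are made up.

```python
import random
from typing import Dict, List, Optional

MAX_ANSWERS = 8  # assumed upper bound on records returned per query


def multivalue_answer(records: Dict[str, bool],
                      rng: Optional[random.Random] = None) -> List[str]:
    """Return up to MAX_ANSWERS healthy IPs in random order.

    records maps IP address -> whether its health check is passing.
    """
    rng = rng or random.Random()
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    return rng.sample(healthy, min(MAX_ANSWERS, len(healthy)))


servers = {"192.0.2.1": True, "192.0.2.2": False, "192.0.2.3": True}
print(multivalue_answer(servers))
```

The client then picks one of the returned addresses, which spreads traffic without a load balancer doing the selection.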

Creating a NAT gateway in a public subnet and routing all private subnet Internet traffic through it is the most cost-effective and scalable solution.

The AWS NAT Gateway is an AWS managed NAT solution.

Use Cost Explorer to generate the report for the past 12 months and to provide the forecast.

Cost Explorer allows you to generate cost reports for the past 13 months and forecast for 3 months, with the ability to filter the data over a variety of filters.
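When requesting such a report programmatically, the time period is a pair of ISO dates. The sketch below only computes that pair for the last N full months, assuming the end date is exclusive; the helper name `last_n_months` is hypothetical.

```python
from datetime import date


def last_n_months(today: date, months: int = 12) -> dict:
    """Return Start/End covering the `months` full months before the current one.

    End is the first day of the current month and is treated as exclusive.
    """
    end = today.replace(day=1)
    # Count months from year 0 so that crossing a year boundary is plain arithmetic.
    index = end.year * 12 + (end.month - 1) - months
    start = date(index // 12, index % 12 + 1, 1)
    return {"Start": start.isoformat(), "End": end.isoformat()}


print(last_n_months(date(2019, 12, 16)))
# {'Start': '2018-12-01', 'End': '2019-12-01'}
```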

AWS is responsible for the physical security of AWS data centers, hypervisors, and the decommissioning of storage devices, as well as for performing backups and patching the software that powers your RDS database. The RDS automated backup feature automatically creates a storage volume snapshot of your DB instance, backing up the entire DB instance. The latest restorable time for a DB instance is typically within 5 minutes of the current time.

Note: AWS is not responsible for anything within the VPC. That is the customer’s responsibility, including security of keys, the VPC, instances, and data traffic to the VPC and instances.

If traffic to an AWS service goes over the internet, this can be avoided by using a VPC endpoint, which keeps the communication within the AWS network.

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC.
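Each flow log record is a space-separated line. The sketch below parses one record, assuming the default (version 2) field order. The field names are written with underscores for use as Python keys, and the sample line uses made-up account and interface IDs.

```python
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log(line: str) -> dict:
    """Split one default-format flow log record into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_log(sample)
print(record["srcaddr"], "->", record["dstaddr"], record["dstport"], record["action"])
```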

AWS Organizations helps you manage policies for multiple AWS accounts. With Organizations, you can create groups of accounts, and then attach policies to a group to ensure the correct policies are applied across the accounts.

AWS Organizations enables you to set up a single payment method for all the AWS accounts in your organization through consolidated billing.
