Cloud Design Patterns
These design patterns are useful for building reliable, scalable, secure applications in the cloud.
Each pattern describes the problem that the pattern addresses, considerations for applying the pattern, and an example based on Microsoft Azure. Most patterns include code samples or snippets that show how to implement the pattern on Azure. However, most patterns are relevant to any distributed system, whether hosted on Azure or other cloud platforms.
Cloud workloads are prone to the fallacies of distributed computing. Some examples of cloud design fallacies are:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Component versioning is simple
- Observability implementation can be delayed
Design patterns don't eliminate misconceptions such as these, but they can raise awareness of them and provide compensations and mitigations. Each cloud pattern has its own trade-offs. Pay more attention to why you're choosing a certain pattern than to how to implement it.
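
As an illustration of compensating for the first fallacy ("The network is reliable"), the following is a minimal Python sketch of retrying a transient failure with exponential backoff and jitter. The names `TransientError` and `call_with_retry`, the attempt count, and the delay values are assumptions made for this example; they are not taken from any Azure SDK or from the pattern documentation.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    All parameter names and defaults are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do next
            # Random delay up to an exponentially growing cap, so that many
            # clients do not retry at the same moment.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Simulated remote call that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientError("simulated network drop")
        return "ok"

    print(call_with_retry(flaky), "after", calls["count"], "attempts")
```

The trade-off is visible even in this small sketch: retries hide brief outages from the caller, but they also add latency and load, which is one reason the choice of pattern matters more than its mechanics.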
Challenges in cloud development
Challenge | Description |
---|---|
Data Management | Data management is the key element of cloud applications, and it influences most of the quality attributes. Data is typically hosted in different locations and across multiple servers for performance, scalability or availability. This can present various challenges. For example, data consistency must be maintained, and data will typically need to be synchronized across different locations. |
Design and Implementation | Good design encompasses consistency and coherence in component design and deployment, maintainability to simplify administration and development, and reusability to allow components and subsystems to be used in other applications and scenarios. Decisions made during the design and implementation phase significantly impact the quality and total cost of ownership of cloud-hosted applications and services. |
Messaging | The distributed nature of cloud applications requires a messaging infrastructure that connects the components and services, ideally loosely coupled to maximize scalability. Asynchronous messaging is widely used and provides many benefits, but it also brings challenges such as ordering messages, poison message management, idempotency, and more. |
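
Two of the messaging challenges named above, idempotency and poison message management, can be sketched in a few lines of Python. This is a simplified in-memory illustration, not a real broker client: the `Message` and `IdempotentConsumer` types, the `max_deliveries` limit, and the returned status strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    # Hypothetical message shape: a unique ID plus a payload.
    message_id: str
    body: dict
    delivery_count: int = 0


@dataclass
class IdempotentConsumer:
    handler: Callable[[dict], None]
    max_deliveries: int = 3  # assumed limit before a message counts as poison
    processed_ids: set = field(default_factory=set)
    dead_letters: list = field(default_factory=list)

    def receive(self, message: Message) -> str:
        """Return 'processed', 'duplicate', 'retry', or 'dead-lettered'."""
        # Idempotency: a redelivered message that already succeeded is skipped.
        if message.message_id in self.processed_ids:
            return "duplicate"
        message.delivery_count += 1
        try:
            self.handler(message.body)
        except Exception:
            # Poison message handling: stop retrying after too many deliveries
            # and set the message aside for inspection.
            if message.delivery_count >= self.max_deliveries:
                self.dead_letters.append(message)
                return "dead-lettered"
            return "retry"
        self.processed_ids.add(message.message_id)
        return "processed"
```

In a real system the set of processed IDs would need durable storage shared by all consumer instances, and dead-lettering is usually a feature of the message broker rather than of the consumer; the sketch only shows the control flow.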
Catalog of patterns
Pattern | Summary | Category |
---|---|---|
Ambassador | Create helper services that send network requests on behalf of a consumer service or application. | Design and Implementation, Operational Excellence |
Anti-Corruption Layer | Implement a façade or adapter layer between a modern application and a legacy system. | Design and Implementation, Operational Excellence |
Asynchronous Request-Reply | Decouple backend processing from a frontend host, where backend processing needs to be asynchronous, but the frontend still needs a clear response. | Messaging |
Backends for Frontends | Create separate backend services to be consumed by specific frontend applications or interfaces. | Design and Implementation |
Bulkhead | Isolate elements of an application into pools so that if one fails, the others will continue to function. | Reliability |
Cache-Aside | Load data on demand into a cache from a data store. | Data Management, Performance Efficiency |
Choreography | Let each service decide when and how a business operation is processed, instead of depending on a central orchestrator. | Messaging, Performance Efficiency |
Circuit Breaker | Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource. | Reliability |
Claim Check | Split a large message into a claim check and a payload to avoid overwhelming a message bus. | Messaging |
Compensating Transaction | Undo the work performed by a series of steps, which together define an eventually consistent operation. | Reliability |
Competing Consumers | Enable multiple concurrent consumers to process messages received on the same messaging channel. | Messaging |
Compute Resource Consolidation | Consolidate multiple tasks or operations into a single computational unit. | Design and Implementation |
CQRS | Segregate operations that read data from operations that update data by using separate interfaces. | Data Management, Design and Implementation, Performance Efficiency |
Deployment Stamps | Deploy multiple independent copies of application components, including data stores. | Reliability, Performance Efficiency |
Edge Workload Configuration | The great variety of systems and devices on the shop floor can make workload configuration a difficult problem. | Design and Implementation |
Event Sourcing | Use an append-only store to record the full series of events that describe actions taken on data in a domain. | Data Management, Performance Efficiency |
External Configuration Store | Move configuration information out of the application deployment package to a centralized location. | Design and Implementation, Operational Excellence |
Federated Identity | Delegate authentication to an external identity provider. | Security |
Gatekeeper | Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. | Security |
Gateway Aggregation | Use a gateway to aggregate multiple individual requests into a single request. | Design and Implementation, Operational Excellence |
Gateway Offloading | Offload shared or specialized service functionality to a gateway proxy. | Design and Implementation, Operational Excellence |
Gateway Routing | Route requests to multiple services using a single endpoint. | Design and Implementation, Operational Excellence |
Geodes | Deploy backend services into a set of geographical nodes, each of which can service any client request in any region. | Reliability, Operational Excellence |
Health Endpoint Monitoring | Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. | Reliability, Operational Excellence |
Index Table | Create indexes over the fields in data stores that are frequently referenced by queries. | Data Management, Performance Efficiency |
Leader Election | Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. | Design and Implementation, Reliability |
Materialized View | Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations. | Data Management, Operational Excellence, Performance Efficiency |
Pipes and Filters | Break down a task that performs complex processing into a series of separate elements that can be reused. | Design and Implementation, Messaging |
Priority Queue | Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those with a lower priority. | Messaging, Performance Efficiency |
Publisher/Subscriber | Enable an application to announce events to multiple interested consumers asynchronously, without coupling the senders to the receivers. | Messaging |
Queue-Based Load Leveling | Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads. | Reliability, Messaging, Resiliency, Performance Efficiency |
Rate Limit Pattern | Use rate limiting to avoid or minimize throttling errors caused by a service's throttling limits and to predict throughput more accurately. | Reliability |
Retry | Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that's previously failed. | Reliability |
Saga | Manage data consistency across microservices in distributed transaction scenarios. A saga is a sequence of transactions that updates each service and publishes a message or event to trigger the next transaction step. | Messaging |
Scheduler Agent Supervisor | Coordinate a set of actions across a distributed set of services and other remote resources. | Messaging, Reliability |
Sequential Convoy | Process a set of related messages in a defined order, without blocking processing of other groups of messages. | Messaging |
Sharding | Divide a data store into a set of horizontal partitions or shards. | Data Management, Performance Efficiency |
Sidecar | Deploy components of an application into a separate process or container to provide isolation and encapsulation. | Design and Implementation, Operational Excellence |
Static Content Hosting | Deploy static content to a cloud-based storage service that can deliver it directly to the client. | Design and Implementation, Data Management, Performance Efficiency |
Strangler Fig | Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services. | Design and Implementation, Operational Excellence |
Throttling | Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. | Reliability, Performance Efficiency |
Valet Key | Use a token or key that provides clients with restricted direct access to a specific resource or service. | Data Management, Security |
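
To make one catalog entry concrete, the following is a minimal Python sketch of the Circuit Breaker idea: after repeated failures, calls are rejected immediately for a period instead of waiting on a remote service that is likely still down. The class name, the `failure_threshold` and `reset_timeout` parameters, and their default values are assumptions for illustration; a production implementation would also need thread safety and a more careful half-open state.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        # Parameter names and defaults are illustrative assumptions.
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote invocation, for example `breaker.call(lambda: fetch_profile(user_id))` where `fetch_profile` is a hypothetical function, and treat `CircuitOpenError` as a signal to fall back or degrade gracefully.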