Thursday, May 19, 2022

Cloud-Native

An attempt to clarify and demystify the term Cloud-Native

As with many other IT terms, there is no clear and generally accepted definition of Cloud-Native. The range of existing definitions is wide and usually depends on the publisher's view and opinion. That makes it difficult to find an objective characterization. Here are my thoughts about the term Cloud-Native, listed as theses.

In recent years, applications have found a new habitat in the Cloud, and the borders of the dual view on Enterprise IT – described as Bimodal IT (Gartner) or Two-Speed IT (McKinsey) – have become more and more blurred over time. The increased gravitational pull of the “Cloud” captures not only systems of innovation or differentiation (aka mode 2), but nowadays also the traditional systems of record (aka mode 1). So what do people mean when they say that the real benefits of the Cloud will come with Cloud-Native applications? What precisely does the term Cloud-Native mean? Where does the term Cloud-Native actually come from? In which context is it appropriate to use the term Cloud-Native?

Cloud-Native Question Mark

As mentioned above, there is no formal definition of Cloud-Native, but one can list and describe common characteristics that fit the label. My view comes from my own multi-year experience with well-running and, in particular, under-performing Cloud projects. Often these experiences and the resulting conclusions are congruent with the opinions of esteemed professionals who are much smarter than I am.

 

Thesis 1: “Cloud-Native” at its core is about applications and their architecture

Cloud-Native applications run on modern, API-driven, elastic infrastructure, packaged and deployed through DevOps processes and CI/CD automation workflows (CI/CD stands for Continuous Integration, Continuous Delivery and Continuous Deployment in this context). I think most people would agree with me that this seems to be today's formula for success.

However, the focal point is the application itself and its characteristic architecture, assembling cloud-based components in a way that is optimized for a cloud infrastructure. It is a change in how the architecture divides things up compared to the past (vague, I know).

Cloud-Native needs Cloud infrastructure. Nevertheless, Cloud infrastructure XaaS (X is variable) is a means and not an end in itself, like all the other “noisy things” from agile techniques to CIOps or, lately, GitOps. The foundations of agile development methodologies evolved long before the Cloud came up. Even with the best agile process, project teams have built poor traditional applications in the past and supposedly Cloud-Native applications these days. Moreover, even a very sophisticated deployment process with a high degree of automation can deploy applications that are inappropriate for the Cloud. Agile development and innovative CI/CD processes guide the creation and deployment of Cloud-Native applications but are not the centrepieces. It even tends to be the opposite: Cloud-Native applications facilitate agility; they are the basis for frequent, fast, independent deployments, fully automated across all staging environments.

The real problems arise from weak architectures with poor modularity that often lead to a big ball of mud. Neither an unmanageable, strongly coupled service mesh (also called Cloud-Native spaghetti) nor monolithic service blobs will give the stakeholders the outcome they hoped for.

The final objective behind Cloud-Native is to minimize time to market by working fast on cloud-fitting applications that deliver business value (speed factor). It is about speed to market by getting it right in a faster way. This requires an architecture that cares about “putting things apart”: distributed, discrete units with clearly defined service boundaries. Ideally, they are built Cloud-vendor-neutral and therefore able to run anywhere (freedom factor), with 24/7 uptime facilitated by redundancy together with fast self-recovery (trust factor). “Cattle” infrastructure with supporting applications swiftly replaces “pets” (never heard of the "Cattle, not Pets" paradigm of disposable server infrastructure? See the explanation by the creator of the expression). These applications are built with mature technologies and frameworks, which are important but are not Cloud-Native in themselves; it is the way they are used (= architectural and design patterns paired with best practices).

 

Thesis 2: “Cloud-Native” is the approach to build applications with services for Cloud environments from day one.

Cloud-Native applications have an inherent Cloud DNA; they are born in the cloud. “Native” is derived from the Latin word “nativus” for “innate, produced by birth”. It is about how applications are designed and built – not where they are deployed; the Cloud provider should not matter.
There are different Cloud-Native maturity models available (for example the models from HCL Technologies, JPMorgan Chase & Co. or Microsoft); they all have in common that Cloud-Native is the highest maturity level. Applications can climb the ladder up to being Cloud-Native. These matured applications must be built for the Cloud and – to be honest – it is hard to do it the right way.
Therefore, simply deploying an existing legacy application in the Cloud does not make it Cloud-Native. Cloud-Native is certainly not a Lift-and-Shift approach (aka rehosting) that brings existing applications as-is from a local non-cloud environment to a new Cloud habitat. For some applications, it is possible to reap certain benefits of the Cloud with a Lift-and-Shift or API-wrapped approach. In the end, both approaches buy some time to refactor and rewrite the existing legacy applications towards a Cloud-Native application. Nevertheless, re-architecting and rewriting is the only way to update the application architecture as part of an overall modernization strategy – a rebirth as a Cloud-Native application.

 

Thesis 3: “Cloud-Native” builds applications as Microservices, packaging each part into its own container and dynamically orchestrating those deployed containers in order to optimize resource utilization.

This thesis is the slightly modified original characterization taken from the Cloud Native Computing Foundation (CNCF), a non-profit technology consortium created with the charter to define the term Cloud-Native and to promote cloud-based projects. All big Cloud players like Amazon AWS, Microsoft Azure and Google Cloud are CNCF members. There is no getting around the CNCF, which gathers heavyweights of the IT industry around one table to form a shared understanding. Today the definition has changed to a more generic description, but the main message about the three original elements still holds in Cloud-Native reality: "application containerization; microservice-oriented architecture; application support container orchestration and scheduling".

Even though the characterization is a few years old, I still agree with the original CNCF characterization of Cloud-Native, especially with the wording “application as Microservices”, which revives and reifies thesis 2's claim to “build applications with services” – clarifying that we are talking about Microservices.

Netflix is often regarded as one of the pioneers of the Microservice movement, shaping new Cloud architectures to operate at massive scale. There are many magic frameworks from NetflixOSS that help fight distributed-systems challenges. Adrian Cockcroft, the former VP of Cloud architecture at Netflix, was one of the first to use the term Cloud-Native together with Microservices:

Adrian Cockcroft: “The application is designed from the ground up to take advantage of the elasticity and automation provided by the cloud. It uses cloud-native patterns and practices, such as microservices, containers, and declarative APIs to achieve high availability, resiliency, and agility.”

I fully share that early-days view. “Designed from the ground up” additionally underlines the significance of thesis 2, calling for a cloud-day-one architecture. “Cloud-native patterns and practices, such as microservices” is the reference to the Microservice architecture demanded in thesis 3.

Obviously, Microservices play an important role in the context of Cloud-Native; the two terms are closely interrelated. But what are Microservices exactly? Although Microservices are also defined very broadly and inconsistently, I would agree with the description given by James Lewis and Martin Fowler:

In short, the Microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

The previous theses and the given Microservice description all go hand in hand. It is therefore clear to me that a Cloud-Native application – and Cloud-Native is first and foremost about applications – is built and deployed as a suite of small Microservices, each running as its own process, packaged as a container. These small deployment units with one container per “bootiful” Microservice are what make fast, independent deployments and individual scalability possible in the first place. The open secret to success is to get the service boundaries right: high cohesion internally and loose coupling to each other (the best source to learn more about service boundaries is still Domain-Driven Design).
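Lewis and Fowler's “small service, own process, lightweight HTTP API” idea fits in a few dozen lines of standard-library code. The following is a hypothetical inventory service – a minimal sketch for illustration, not any particular framework's API; the SKU data and route are made up:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store standing in for this service's own database
# (database-per-service is the usual Cloud-Native pattern).
INVENTORY = {"sku-1": 12, "sku-2": 0}

class InventoryHandler(BaseHTTPRequestHandler):
    """One business capability, one process: GET /<sku> returns the stock."""

    def do_GET(self):
        sku = self.path.strip("/")
        if sku in INVENTORY:
            status, body = 200, {"sku": sku, "stock": INVENTORY[sku]}
        else:
            status, body = 404, {"error": "unknown sku"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def run(port=8080):
    # Each Microservice runs in its own process; in a Cloud-Native setup
    # this process would be packaged into its own container image.
    HTTPServer(("0.0.0.0", port), InventoryHandler).serve_forever()
```

Packaging this one process into its own container and deploying it independently of its neighbours is exactly the “one container per Microservice” unit the thesis describes.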

In the words of Gary Olliffe, Research Vice President of Gartner:

“The superpowers of a Microservice architecture can, in large part, be attributed to the benefits of loose coupling and decomposing systems into component services that can be developed, deployed, and operated independently of each other.”

Microservices should remain small to make them easier to understand, maintain, and rewrite/replace (if necessary). In order to have high cohesion, the service responsibility has to be right regarding the domain. Vaughn Vernon, a leading expert in Domain-Driven Design, expressly recommends looking into Microservices when the rate of change across business domains differs substantially.

Once Microservices are packaged as containers, the second part of the thesis is about “dynamically orchestrating those deployed containers in order to optimize resource utilization”. I think we no longer need any discussion about the advantages of container orchestration and the fact that Kubernetes (K8s) has won the game. Kubernetes makes containerized services radically easier to manage and has become a key part of the container revolution. I would even go so far as to say that Kubernetes has become the established operating platform for Cloud-Native applications; it is the de facto standard Cloud-Native “operating system”. Nowadays, a Cloud-Native application also means a Kubernetes-Native application.

Kubernetes became the standard for container orchestration because it allows you to treat your containers like Cattle. Meanwhile, we treat the Kubernetes clusters like Cattle, too, spinning them up and down on demand using appropriate infrastructure-as-code (IaC) tools (Terraform is the most popular representative of this type of IaC tool).

 

Thesis 4: “Cloud-Native” is using an Open Source software stack

I initially removed the Open Source commitment from the original CNCF characterization in thesis 3 – in full knowledge of diverging views, and to give it its own thesis. Is Cloud-Native associated with Open Source? I believe that is the case to a high degree.

Using a Closed Source provider that is most visible in advertising but not portable between Cloud platforms is not a good decision. Furthermore, lazily deciding for a software provider that has been used in a pre-cloud approach is also not a good basis for decision-making. Unfeasible auto-provisioning, portability problems and exploding costs can be the result.

The usage of Open Source enables freedom of choice. Using Amazon/Microsoft/Google source leads to a single-Cloud vendor lock-in. It is easy to yield to the temptation of the Cloud service providers and use their proprietary services, but it is very difficult to break out when a move to another Cloud provider or a hybrid approach with multiple providers is suddenly on the agenda. It is also conceivable that a Cloud-Exit strategy is demanded in order to avoid becoming dependent on a single supplier, requiring a move to another Cloud vendor within a specific timeframe at any point in the future. Proprietary Cloud source creates technology barriers that pull you into a particular Cloud; Open Source eliminates those barriers and enables freedom of choice.

An AWS-, Azure- or GCP-first approach using proprietary cloud-vendor services makes sense only in certain scenarios. One case might be a lack of Cloud-Native architectural and Open Source tooling knowledge (combined with the need to speed up application development). Another reason is deep trust in a Cloud provider, whatever may come. Nevertheless, an application design with a Cloud-Exit and migration strategy in mind offers more flexibility and avoids a Cloud lock-in.

I believe that the crowdsourced wisdom of a highly motivated developer community creates outstanding software: many developers coming from different directions, sharing experiences, letting software mature in public. The CNCF promotes trusted Open Source projects and gives them a home. Just like in the real world, where kids need stability to grow up, the CNCF provides stability and trust for the Cloud-Native Open Source ecosystem. There is no particular endorsement of any one project over another. The CNCF keeps the overview in a confusing tooling jungle and often provides the de facto toolset standard.

Dan Kohn, the recently deceased former CNCF executive director, summarized it neatly when Amazon AWS joined the CNCF as a platinum member in 2017: “Virtualization was the biggest trend in enterprise computing over the last decade. The age of virtualization is now ending and the cloud-native era has begun, with an open source software stack enabling portability between public, private and hybrid clouds. With the addition of AWS today, all the major cloud vendors are working together supporting open-source development of cloud-native technologies, with Kubernetes a primary focus of their collaboration. We believe AWS' participation will help shape the future of enterprise computing.”

The CNCF claims to provide a collection of Open Source projects anyone can trust, in order to avoid “cloud eating open source” and running into a cloud-vendor lock-in. They support common Open Source projects anyone can use on any cloud platform. The CNCF works projects-first; it enables competition and choice. It takes very good reasons not to choose the CNCF Open Source toolset!

 

Thesis 5: “Cloud-Native” is Serverless/FaaS only under strict conditions

The innovative leap to Serverless, or FaaS (Function as a Service), happened relatively quickly after the Microservice revolution. Project teams are still struggling to get Microservice architectures right, and we already have FaaS around the corner. I use “FaaS” as a shorthand for the development of “server-side” logic written by application developers (in distinction from pure “glue code” between cloud-vendor-specific managed services), but coming with certain limitations and constraints.

In my opinion, FaaS is not the correct approach for every problem (which is what we continually hear and read); it will definitely not replace all existing architectures. IMHO, the spectrum of suitable applications appears to be more limited than with a Microservice approach running on K8s.

FaaS comes with vendor-specific quotas applied to function configuration, deployments, and executions. Typically, these quotas cannot be changed; there are restrictions! FaaS functions are typically limited in how long each invocation is allowed to run and how much memory can be used. Environment variables are limited, payload size is limited, code package deployment size is limited, etc. Additionally, architects and developers have to deal with FaaS cold-start latency, which is particularly unfortunate and inopportune when an interactive application needs quick response times.
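To make the execution-time quota concrete: AWS Lambda's Python runtime, for example, exposes the remaining invocation time on the context object, and a function can use it to fail fast instead of being killed mid-work. In this sketch only `context.get_remaining_time_in_millis()` is the real Lambda API; the event shape and the batch-processing logic are illustrative:

```python
def handler(event, context):
    """Process a batch, but stop cleanly before the platform timeout hits.

    Only context.get_remaining_time_in_millis() is the real AWS Lambda
    Python API; the event shape and the "work" below are made up.
    """
    processed = []
    for item in event.get("items", []):
        # Fail fast: return a partial result rather than being killed
        # mid-batch once the (unchangeable) execution quota runs out.
        if context.get_remaining_time_in_millis() < 1000:
            return {"status": "partial", "processed": processed}
        processed.append(item.upper())  # stand-in for real work
    return {"status": "ok", "processed": processed}
```

The same defensive style applies to the other quotas (payload size, memory): check and degrade gracefully instead of running into the hard limit.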

Proponents argue that FaaS scaling is automatically managed, transparent and fine-grained, tied in with automatic resource provisioning and allocation. But an application running on K8s also has this autoscaling feature built in, with Horizontal Pod Autoscaling (HPA), in a more custom-metric-controlled manner.

But what bothers me most is the vendor lock-in factor, which can be high if project teams are not careful. There is a high risk of getting caught in a particular Cloud. Even when Open Source projects like the Serverless Framework or Spring Cloud Function are used, someone could be trapped by other chunks of vendor-specific infrastructure – for example, an upfront vendor-specific gateway solution, typically used for API-oriented FaaS, or a cloud-vendor-managed GraphQL service implementing the FaaS binding. A changeover would become more than difficult; the vendor solution will not be able to run on anything but the platform of choice.

One thing is for sure: the smaller the business functionality, the more orchestration is needed – yet another trap to get caught in a particular Cloud. Recently I read a tweet from Camille Fournier that expresses my thoughts: “I wonder if serverless services will become a thing like stored procedures, a good idea that quickly turns into massive technical debt”. In addition, there is another thing I am particularly concerned about: observability (logs, metrics, traces) from a holistic perspective. It is rather difficult to establish good observability solutions with Microservices, and much more difficult with FaaS.

In short, it may be that FaaS is a better choice for a short-lived, event-driven style with a few event types per application component that can tolerate occasional cold-start latency delays, whereas containers with Microservices are seen as a better choice for synchronous-request-driven components with many entry points (to share Mike Roberts' view). I am slightly surprised that we do not see more of a hybrid architectural approach combining both styles – FaaS together with Microservices, which might operate as FaaS aggregators.

 

Thesis 6: “Cloud-Native” applications need a Macro-Architecture

One might also wonder whether a “Cloud-Native” Microservice architecture performs miracles. Microservices come with the promise of being simpler to understand, faster to build, easier to maintain and simpler to rewrite/replace; further benefits come to bear because services are independently deployable and separately scalable (“unlimited” in the Cloud), with reduced downtime through fault isolation, self-healing mechanisms, service-degradation techniques, etc.
But as mentioned by Gary Olliffe, that is not the full story. The complexity removed from large monolithic applications by building simplified services has not gone away; it brings new challenges and amplifies old ones. Most of the complexity gets pushed outside the Microservices.

Gray Olliffe: “That complexity has moved and, I would argue, increased. It now lives in what I call the outer architecture

Microservice architectures trade inner complexity for outer complexity. Gartner calls it the Outer Architecture; others call it a Macro Architecture, in contrast to the Micro Architecture, which addresses all internal Microservice matters (read also about the Independent Systems Architecture). The Macro Architecture is about architectural decisions affecting the entire system and mostly lives in the space between the Microservices. It frees each service team to focus on just its own Microservice.

 

Thesis 7: Cloud-Native is NOT a big-bang approach, and attempting one will backfire incredibly hard

Cloud-Native architectures have plenty of challenges and are very different from traditional application-centric architectures. Just to name a few: session management, the database(-schema)-per-service pattern, limited transactionality countered with the Saga pattern (“the art of undoing”), the gateway pattern (“abstracting services”), etc.
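The Saga pattern mentioned above can be sketched without any framework: each local step is paired with a compensating action, and on failure the completed steps are undone in reverse order. The class and the step names in the test are illustrative, a minimal sketch of the idea:

```python
class Saga:
    """Run local steps in order; on failure, undo completed steps in reverse."""

    def __init__(self):
        self.steps = []  # list of (action, compensation) pairs

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))
        return self

    def run(self):
        completed = []  # compensations for the steps that succeeded so far
        try:
            for action, compensation in self.steps:
                action()
                completed.append(compensation)
        except Exception:
            # "The art of undoing": compensate in reverse order.
            for compensation in reversed(completed):
                compensation()
            return False
        return True
```

Real implementations add persistence and retries so a crashed orchestrator can resume compensating, but the core of the pattern is just this reverse walk instead of a distributed transaction.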

A mind shift has to take place. Long-established truths have to be questioned and sometimes discarded (I am thinking, for example, of the DRY principle). It requires a change in thinking, and that change cannot happen overnight. For example, we have long experience with architectures that try to operate fail-safe (MTTF focus); now we have to focus on fail-fast, recover-quickly solutions (MTTR focus). Be aware of and internalize the famous quote from AWS CTO Werner Vogels: “Everything fails all the time”.
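The MTTR mindset can be illustrated with a small helper: accept that a call will fail (“everything fails all the time”), fail fast on each attempt, and recover by retrying with backoff. All names here are illustrative, a minimal sketch rather than a production pattern:

```python
import time

def call_with_retries(operation, attempts=3, backoff_s=0.1):
    """Fail fast on each attempt, recover quickly by retrying (MTTR focus).

    `operation` is any zero-argument callable that may raise; in a real
    service it would be a network call with a tight client-side timeout.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # "Everything fails all the time"
            last_error = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_error
```

In a real service each attempt would also carry a short timeout of its own; patterns like circuit breakers build on the same fail-fast idea.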

The technology flexibility for each service can also turn into a mess when projects have gigantic technology stacks and services written in many different languages. It takes time and competent staff to handle such flexibility and get it right.

 

Conclusion:

Cloud-Native Summary


I have recently read that Cloud-Native is about culture. That is too general for me; the term was born as a technical matter, not a fuzzy cultural aspect. I often see the term Cloud-Native hijacked for all kinds of Cloud aspects, just to make them appear more interesting and meaningful. Cloud-Native is also much more than just another buzzword; it is a valuable term for applications built with an architecture that fits natively in the Cloud – as a result, it facilitates agility and maintainability. That is the quintessence. Cloud-Native is a technical and architectural matter – sometimes emotional, always intellectually discussed, difficult to grasp, hard and difficult to achieve.
