An attempt to clarify and demystify the term Cloud-Native
As with many other IT terms, there is no clear and generally accepted definition of Cloud-Native. The range of existing definitions is wide and usually depends on the publisher's view and opinion. That makes it difficult to find an objective characterization. Here are my thoughts about the term Cloud-Native, listed as theses.
As mentioned above, there is no formal definition of Cloud-Native, but one can list and describe common characteristics that fit the label. My view comes from my own multi-year experience with well-running and, in particular, under-performing Cloud projects. Often these experiences and the resulting conclusions are congruent with the opinions of esteemed professionals who are much smarter than I am.
Thesis 1: “Cloud-Native” at its core is about applications
and their architecture
Cloud-Native needs Cloud infrastructure. Nevertheless, Cloud infrastructure XaaS (X is variable) is a means and not an end in itself, like all the other “noisy things” from agile techniques to CIOps or, lately, GitOps. The foundations of agile development methodologies evolved long before the Cloud came up. Even with the best agile process, project teams have built poor traditional applications in the past, and supposedly Cloud-Native applications these days. Moreover, a very sophisticated deployment process with a high degree of automation can deploy applications that are inappropriate for the Cloud. Agile development and innovative CI/CD processes will guide the creation and deployment of Cloud-Native applications but are not the centrepieces. It even tends to be the opposite: Cloud-Native applications facilitate agility; they are the basis for frequent, fast, and independent deployments, fully automated to all staging environments.
The real problems arise from weak architectures with poor modularity, which often lead to a big ball of mud. Neither an unmanageable, tightly coupled service mesh (also called Cloud-Native spaghetti) nor monolithic service blobs will give the stakeholders the outcome they hoped for.
The final objective behind Cloud-Native is to minimize time to market by working fast on cloud-fitting applications that deliver business value (speed factor). It is about speed to market by getting it right in a faster way. This requires an architecture that cares about “putting things apart”: distributed, discrete units with clearly defined service boundaries. Ideally, they are built Cloud vendor-neutral and are therefore able to run anywhere (freedom factor), with 24/7 uptime facilitated by redundancy together with fast self-recovery (trust factor). “Cattle” infrastructure with supporting applications swiftly replaces “pets” (never heard of the “Cattle, not Pets” paradigm of disposable server infrastructure? See the explanation by the creator of the expression). These applications are built with mature technologies and frameworks, which are important but are not Cloud-Native in themselves; it is the way they are used (= architectural and design patterns paired with best practices).
Thesis 2: “Cloud-Native” is the approach to build applications
with services for Cloud environments from day one.
There are different Cloud-Native maturity models available (for example the models from HCL Technologies, JPMorgan Chase & Co. or Microsoft); they all have in common that Cloud-Native is the highest maturity level. Applications can climb the ladder up to become Cloud-Native. These matured applications must be built for the Cloud and – to be honest – it is hard to do it the right way.
Therefore, simply deploying an existing legacy application in the Cloud does not make it Cloud-Native. Cloud-Native is certainly not a Lift-and-Shift approach (aka rehosting) that brings existing applications as-is from a local non-cloud environment to a new Cloud habitat. For some applications, it is possible to reap certain benefits of the Cloud by going for a Lift-and-Shift or API-wrapped approach. In the end, both approaches buy some time to refactor and rewrite the existing legacy applications towards a Cloud-Native application. Nevertheless, re-architecting and rewriting is the only way to update the application architecture as part of an overall modernization strategy – a rebirth as a Cloud-Native application.
Thesis 3: “Cloud-Native” builds applications as Microservices, packaging each part into its own container and dynamically orchestrating those deployed containers in order to optimize resource utilization.
This thesis is the slightly modified original characterization taken from the Cloud Native Computing Foundation (CNCF), a non-profit technology consortium created with the charter to define the term Cloud-Native and to promote cloud-based projects. All big Cloud players like Amazon AWS, Microsoft Azure and Google Cloud are CNCF members. There is no getting around the CNCF, which gathers heavyweights of the IT industry around one table to form a shared understanding. Today the definition has changed to a more generic description, but the main message about the three original elements still holds true in Cloud-Native reality: "application containerization; microservice-oriented architecture; application support container orchestration and scheduling".
Even though the characterization is a few years old, I still agree with the original CNCF characterization of Cloud-Native, especially with the wording “application as Microservices”, which revives and reifies thesis 2, claiming to “build applications with services” – clarifying that we are talking about Microservices.
Netflix is often regarded as one of the
pioneers of the Microservice movement, shaping new Cloud architectures to
operate at massive scale. There are many magic frameworks from NetflixOSS that help with fighting distributed-systems challenges. Adrian Cockcroft, the former VP of Cloud Architecture at Netflix, was one of the first to use the term Cloud-Native together with Microservices:
Adrian Cockcroft: “The application is designed from the ground up to take advantage of the
elasticity and automation provided by the cloud. It uses cloud-native patterns
and practices, such as microservices, containers, and declarative APIs to
achieve high availability, resiliency, and agility.”
I fully share that early-days view; “designed from the ground up” additionally underlines the significance of thesis 2, claiming a cloud-from-day-one architecture. The phrase “cloud-native patterns and practices, such as microservices” is the reference to the Microservice architecture demanded in thesis 3.
Obviously, Microservices play an important role in the context of Cloud-Native. These two terms are closely interrelated. What exactly are Microservices? Although Microservices are also defined in very broad and mixed ways, I agree with the description given by James Lewis and Martin Fowler:
“In
short, the Microservice architectural style is an approach to developing a
single application as a suite of small services, each running in its own
process and communicating with lightweight mechanisms, often an HTTP
resource API. These services are built around business capabilities and independently
deployable by fully automated deployment machinery. There is a bare minimum
of centralized management of these services, which may be written in different
programming languages and use different data storage technologies.”
The previous theses and the given Microservice description all go hand in hand. It is therefore clear to me that a Cloud-Native application – and Cloud-Native is first and foremost about applications – is built and deployed as a suite of small Microservices, each running as its own process, packaged as a container. These small deployment units with one container per “bootiful” Microservice make fast, independent deployments and individual scalability possible in the first place. The open secret of success is to get the service boundaries right: high cohesion internally and loose coupling between the services (the best source to learn more about service boundaries is still Domain-Driven Design).
In the words of Gary Olliffe, Research Vice
President of Gartner:
“The
superpowers of a Microservice architecture can, in large parts, be attributed
to the benefits of loose coupling and decomposing systems into component
services that can be developed, deployed, and operated independently of each
other”.
Microservices should remain small to make them
easier to understand, maintain, and rewrite/replace (if necessary). In order
to achieve high cohesion, the service responsibility has to be right with regard to the
domain. Vaughn Vernon, a leading expert in Domain Driven Design, recommends expressly
looking into Microservices when the rate of change across business domains
differs substantially.
Once Microservices are packaged as containers, the second part of the thesis is about “dynamically orchestrating those deployed containers in order to optimize resource utilization”. I think that today we need no further discussion about the advantages of container orchestration, or about whether Kubernetes (K8s) has won the game. Kubernetes makes containerized services radically easier to manage and has become a key part of the container revolution. I would even go so far as to say that Kubernetes has become the established operating platform for Cloud-Native applications; it is the de facto standard Cloud-Native “operating system”. Nowadays a Cloud-Native application also means a Kubernetes-Native application.
Kubernetes became the standard for container orchestration because it allows you to treat your containers like Cattle. Meanwhile we treat the Kubernetes clusters like Cattle, too, spinning them up and down on demand using appropriate infrastructure-as-code (IaC) tools (Terraform is the most popular representative of this type of IaC tool).
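The declarative, desired-state model behind this “Cattle” handling can be sketched with a toy reconciliation loop: the orchestrator continuously converges the actual state towards the declared desired state. The function and the pod naming below are illustrative assumptions, not a real Kubernetes API.

```python
# Toy reconciliation loop illustrating declarative orchestration:
# given a desired replica count, converge the set of running pods.
# This mimics the idea behind a Kubernetes controller, nothing more.

def reconcile(desired_replicas, running):
    """Return the running pod list converged to the desired count."""
    running = list(running)
    while len(running) < desired_replicas:   # scale up: start new pods
        running.append(f"pod-{len(running)}")
    while len(running) > desired_replicas:   # scale down: stop surplus pods
        running.pop()
    return running
```

The operator never scripts *how* to get from three pods to one; they only declare the target state, and the loop does the rest – which is why replacing a lost pod (or a lost cluster) is routine instead of an incident.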
Thesis 4: “Cloud-Native”
is using an Open Source software stack
I initially removed the Open Source commitment from the original CNCF characterization in thesis 3 – in full knowledge of diverging views, and to give it its own thesis. Is Cloud-Native associated with Open Source? I believe that is the case to a high degree.
Using a Closed Source provider that is most visible in advertising but not portable between Cloud platforms is not a good decision. Furthermore, lazily picking a software provider just because it has been used in a pre-cloud approach is also not a good basis for decision-making. Unfeasible auto-provisioning, portability problems and exploding costs could be the result.
The usage of Open Source enables freedom of choice. Using Amazon/Microsoft/Google source leads to a single-Cloud vendor lock-in. It is easy to yield to the temptation of the Cloud service providers and use their proprietary services. But it is very difficult to break out when a move to another Cloud provider, or a hybrid approach with multiple providers, is suddenly on the agenda. It is also conceivable that a Cloud-Exit strategy is demanded in order to avoid becoming dependent on a single supplier, requiring a move to another Cloud vendor within a specific timeframe at any point in the future. Proprietary Cloud source creates technology barriers that pull you into a particular Cloud; Open Source eliminates those barriers and enables freedom of choice.
An AWS-, Azure- or GCP-first approach using proprietary cloud vendor services makes sense only in certain scenarios. One case might be a lack of Cloud-Native architectural and Open Source tooling knowledge (combined with the need to speed up application development). Another is deep trust in a Cloud provider, whatever may come. Nevertheless, an application design with a Cloud-Exit and migration strategy in mind offers more flexibility and avoids a Cloud lock-in.
I believe that the crowdsourced wisdom of a highly motivated developer community will create outstanding software: many developers coming from different directions, sharing experiences, letting software mature in public. The CNCF promotes trusted open source projects and gives them a home. Just like in the real world, where kids need stability to grow up, the CNCF gives stability and trust to the Cloud-Native open source ecosystem. There is no particular endorsement of any one project over another. The CNCF keeps the overview in a confusing tooling jungle and often provides the de facto toolset standard.
Dan Kohn, the recently deceased former CNCF executive director, summarized it neatly when Amazon AWS joined the CNCF as a platinum member in 2017: “Virtualization was the biggest trend in enterprise computing over the last decade. The age of virtualization is now ending and the cloud-native era has begun, with an open source software stack enabling portability between public, private and hybrid clouds. With the addition of AWS today, all the major cloud vendors are working together supporting open-source development of cloud-native technologies, with Kubernetes a primary focus of their collaboration. We believe AWS' participation will help shape the future of enterprise computing.”
The CNCF claims to provide a collection of Open Source projects anyone can trust, in order to avoid that “cloud is eating open source” and to prevent running into a cloud-vendor lock-in. They support common Open Source projects anyone can use on any cloud platform. The CNCF works projects-first; they enable competition and choice. It takes very good reasons not to choose the CNCF Open Source toolset!
Thesis 5: “Cloud-Native” is Serverless/FaaS only under strict conditions
The innovative leap to Serverless, or FaaS (Function as a Service), happened relatively quickly after the Microservice revolution. Project teams are still struggling to get Microservice architectures right, and we already have FaaS around the corner. I use “FaaS” as shorthand for the development of “server-side” logic written by application developers (in distinction from pure “glue code” between cloud-vendor-specific managed services), but coming with certain limitations and constraints.
In my opinion, FaaS is not the correct approach for every problem (contrary to what we continually hear and read); it will definitely not replace all existing architectures. IMHO, the spectrum of suitable applications appears to be more limited than with a Microservice approach running on K8s.
FaaS comes with vendor-specific quotas applied to function configuration, deployments, and executions. Typically, these quotas cannot be changed; there are restrictions! FaaS functions are typically limited in how long each invocation is allowed to run and how much memory can be used. Environment variables are limited, payload size is limited, code package deployment size is limited, etc. Additionally, architects and developers have to deal with FaaS cold-start latency, which is particularly unfortunate and inopportune when an interactive application needs quick response times.
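The cold-start effect can be illustrated with a common FaaS handler pattern: expensive initialisation is kept in module-level state so that only the first (“cold”) invocation pays for it, while subsequent “warm” invocations reuse it. The handler name and the simulated initialisation cost below are assumptions for illustration only.

```python
# Sketch of the FaaS cold-start effect: module-level state survives
# warm invocations, so expensive setup is paid only on a cold start.
import time

_CLIENT = None  # stands in for a DB connection, loaded config, etc.

def _expensive_init():
    time.sleep(0.2)  # simulated setup cost (illustrative assumption)
    return {"connected": True}

def handler(event):
    """A toy function handler; `event` carries the request payload."""
    global _CLIENT
    cold = _CLIENT is None
    if cold:
        _CLIENT = _expensive_init()  # the cold-start penalty
    return {"cold_start": cold, "result": event["value"] * 2}
```

An interactive user hitting the cold path experiences the full setup latency on top of the actual work, which is exactly why cold starts matter for response-time-sensitive applications.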
Proponents argue that FaaS scaling is automatically managed, transparent and fine-grained, tied in with automatic resource provisioning and allocation. But an application running on K8s also has this autoscaling feature built in, via Horizontal Pod Autoscaling (HPA), in a more custom-metric-controlled manner.
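For reference, the scaling rule the HPA controller applies, as documented by the Kubernetes project, can be sketched in a few lines; the function name here is an illustrative assumption.

```python
# The documented HPA scaling rule:
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# The metric can be CPU utilisation or any custom metric.
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Compute the replica count the HPA controller would converge to."""
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, three pods running at 200% of the target metric are scaled out to six, while four pods at half the target are scaled in to two.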
But what bothers me most is the vendor lock-in factor, which can be high if project teams are not careful. There is a high risk of getting caught in a particular Cloud. Even when Open Source projects like the Serverless Framework or Spring Cloud Function are used, someone could still be trapped by other chunks of the vendor-specific infrastructure – for example, an upfront vendor-specific gateway solution that is typically used for an API-oriented FaaS, or a cloud-vendor-managed GraphQL service to implement the FaaS binding. A changeover would become more than difficult; the vendor solution will not be able to run on anything but the platform of choice.
One thing is for sure: the smaller the business functionality, the more orchestration is needed – yet another trap to get caught in a particular Cloud. Recently I read a tweet from Camille Fournier that expresses my thoughts: “I wonder if serverless services will become a thing like stored procedures, a good idea that quickly turns into massive technical debt”. In addition, there is another thing I am particularly concerned about: observability (logs, metrics, traces) from a holistic perspective. It is rather difficult to establish good observability solutions with Microservices, and much more difficult with FaaS.
In short, it may be that FaaS is a better choice for a short-lived, event-driven style with a few event types per application component that can tolerate occasional cold-start delays, whereas containers with Microservices are seen as a better choice for synchronous-request-driven components with many entry points (to share Mike Roberts' view). I am slightly surprised that we do not see more of a hybrid architectural approach combining both styles – FaaS together with Microservices, which might operate as a FaaS aggregator.
Thesis 6: “Cloud-Native”
applications need a Macro-Architecture
But as mentioned by Gary Olliffe, that is not the full story. The complexity removed from large monolithic applications by building simplified services has not gone away; it brings new challenges and amplifies old ones. Most of the complexity gets pushed outside the Microservice.
Microservice architectures trade inner complexity for outer complexity. Gartner calls it the Outer Architecture; others call it a Macro Architecture, in contrast to the Micro Architecture, which addresses all internal Microservice matters (read also about the Independent Systems Architecture). The Macro Architecture is about architectural decisions affecting the entire system and is mostly the space between the Microservices. The Macro Architecture frees up each service team to focus on just its own Microservice.
Thesis 7: Cloud-Native is NOT a big-bang approach, and attempting one will backfire incredibly hard
A mind shift has to take place. Long-established truths have to be questioned and discarded (I am thinking, for example, of the DRY principle). It requires a change in thinking, and that change cannot happen overnight. For example, we have long experience working on architectures that try to operate fail-safe (MTTF focus); now we have to put the focus on fail-fast, recover-quickly solutions (MTTR focus). Be aware of and internalize the famous quote from AWS CTO Werner Vogels: “Everything fails all the time”.
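The fail-fast/MTTR mindset can be sketched with a minimal circuit breaker: after a few consecutive failures the caller stops waiting on a broken dependency and fails immediately, keeping recovery time short instead of piling up slow errors. The class name and threshold below are illustrative assumptions, not a production-ready implementation.

```python
# Minimal circuit-breaker sketch for the fail-fast (MTTR) mindset:
# once a dependency has failed repeatedly, reject calls immediately
# instead of letting every request wait on a broken downstream service.

class CircuitOpenError(Exception):
    """Raised when the breaker fails fast without calling the dependency."""

class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, func, *args):
        if self.failures >= self.max_failures:
            # Fail fast: the dependency is assumed to be down.
            raise CircuitOpenError("circuit open, failing fast")
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

A production breaker would additionally reopen the circuit after a cool-down period; the sketch only shows the core fail-fast decision.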
Also, the technology flexibility for each service could turn into a mess when projects have gigantic technology stacks and services written in many different languages. It takes time and competent staff to handle such flexibility and get it right.
Conclusion: