
Definitive Guide to Knative Serving—A Deep Dive into Theory and Architecture

If you missed the first installment of our Knative series, you can catch up by diving into our previous blog post: Dive into Knative—Explore Serverless with Kubernetes

Overview

Overview diagram (image credit: Technology Conversations)

The key aspects and benefits of Knative Serving:

  1. Serverless Platform: Knative Serving is a serverless platform built on top of Kubernetes.
  2. Deployment Simplification: It simplifies the deployment of containerized applications on Kubernetes (a minimal user-container sketch follows this list).
  3. Auto-scaling: Automatically scales applications based on demand, ensuring optimal resource utilization.
  4. Traffic Management: Provides features for managing traffic routing, allowing seamless updates and rollbacks.
  5. Focus on Development: Abstracts away infrastructure management complexities, enabling developers to focus on writing and deploying code.
  6. Cloud-Native Applications: Facilitates the development of modern, scalable, and resilient cloud-native applications.
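
To make item 2 concrete, here is a minimal HTTP application that could serve as the user container in a Knative Service. Reading the listening port from the PORT environment variable reflects how Knative injects it; the file name, greeting, and default port are illustrative assumptions, not part of Knative itself.

```python
# app.py - a minimal HTTP app suitable for packaging as a Knative user container.
# Knative injects the port to listen on via the PORT environment variable.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from a Knative-managed container!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8080"))  # Knative sets PORT; fall back to 8080 locally
    HTTPServer(("", port), Handler).serve_forever()
```

Packaged into a container image, this is essentially all the application code a Service needs; Knative takes care of routing, scaling, and revision tracking around it.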

For an introductory exploration of Knative Serving, delve into our dedicated Knative Serving section.

Knative Serving Architecture

Architecture Diagram

Knative Serving consists of several components that form the backbone of the serverless platform. This section explains the high-level architecture of Knative Serving.


Components

  • Activator: The Activator is part of the data plane. It is responsible for queuing incoming requests when a Knative Service is scaled to zero. It communicates with the autoscaler to bring scaled-to-zero Services back up and forwards the queued requests. The Activator can also act as a request buffer to handle traffic bursts.
  • Autoscaler: The autoscaler is responsible for scaling Knative Services based on configuration, metrics, and incoming requests.
  • Controller: The controller manages the state of Knative resources within the cluster. It watches several objects, manages the lifecycle of dependent resources, and updates the resource state.
  • Queue-Proxy: The Queue-Proxy is a sidecar container in the Knative Service's Pod. It is responsible for collecting metrics and enforcing the desired concurrency when forwarding requests to the user's container. It can also act as a queue if necessary, similar to the Activator (a simplified sketch follows this list).
  • Webhooks: Knative Serving has several webhooks responsible for validating and mutating Knative resources.
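
The Queue-Proxy's concurrency enforcement can be illustrated with a toy sidecar that forwards requests to the user container while capping the number of in-flight requests. This is only a simplified sketch of the idea, not the real queue-proxy; the upstream address, the proxy port, and the limit of 10 are assumptions for illustration.

```python
# toy_queue_proxy.py - a simplified illustration of how a queue-proxy-like sidecar
# could cap concurrent requests to the user container. Not the real Knative queue-proxy.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

USER_CONTAINER = "http://127.0.0.1:8080"   # assumed address of the user container
MAX_IN_FLIGHT = 10                          # assumed concurrency target
slots = threading.Semaphore(MAX_IN_FLIGHT)  # requests above the limit wait here (the "queue")

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with slots:  # blocks until a concurrency slot is free
            try:
                with urllib.request.urlopen(USER_CONTAINER + self.path) as upstream:
                    body = upstream.read()
                    self.send_response(upstream.status)
                    self.send_header("Content-Length", str(len(body)))
                    self.end_headers()
                    self.wfile.write(body)
            except OSError:
                self.send_error(502, "user container unreachable")

if __name__ == "__main__":
    # The real queue-proxy additionally exposes metrics that the autoscaler scrapes.
    ThreadingHTTPServer(("", 8012), ProxyHandler).serve_forever()
```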

HTTP Request Flows

This section explains the behavior and flow of HTTP requests to an application running on Knative Serving.


  1. Initial Request: When a user sends an HTTP request to your Knative service, it first hits the ingress gateway.
  2. Routing Decision: The ingress gateway examines the request to determine which Knative service should handle it based on the requested domain name.
  3. Service Activation: If the service has scaled down to zero and no instances are running, Knative Serving promptly activates a new instance by spinning up a pod when a request arrives.
  4. Scaling Decision: Knative Serving checks the current load and decides how many instances of the service need to be running to handle incoming requests efficiently.
  5. Activator Interaction: When no instances are running, the request goes to the activator. The activator asks the autoscaler to scale up a pod to serve the initial request, ensuring rapid response and availability.
  6. Request Handling: The request is then forwarded to one of the instances of your service, where your application code processes it.
  7. Containerized Environment: Within each pod, there are two containers:
    • User Container: This container hosts your application code, serving user requests.
    • Queue-Proxy Container: This container collects metrics and observes concurrency levels.
  8. Auto-scaling Based on Concurrency: When concurrency exceeds the configured target, the autoscaler spins up new pods to handle the increased concurrent requests (see the annotation sketch after this list).
  9. Response: After processing the request, your service generates a response, which is sent back through the same flow to the user who made the initial request.
  10. Scaling Down: If there is no more traffic or if the traffic decreases significantly, Knative Serving may scale down the number of running instances to save resources.
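
The "configured target" in step 8 can be tuned per Revision through the standard autoscaling.knative.dev/target annotation, with containerConcurrency acting as the hard limit enforced by the queue-proxy. The sketch below declares a soft target of 10 concurrent requests per pod and applies the Service with the Kubernetes Python client; the service name and image are placeholders.

```python
# set_concurrency_target.py - sketch: declare a per-pod concurrency target of 10
# so the autoscaler adds pods once concurrency rises above it.
from kubernetes import client, config

service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "hello", "namespace": "default"},  # placeholder name
    "spec": {
        "template": {
            "metadata": {
                # Soft concurrency target used by the autoscaler for scaling decisions.
                "annotations": {"autoscaling.knative.dev/target": "10"}
            },
            "spec": {
                # Hard limit enforced by the queue-proxy (0 means unlimited).
                "containerConcurrency": 10,
                "containers": [{"image": "ghcr.io/example/hello:latest"}],  # placeholder image
            },
        }
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev", version="v1",
    namespace="default", plural="services", body=service,
)
```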

Revisions

  • Revisions are Knative Serving resources representing snapshots of application code and configuration; traffic can be split across them (see the sketch after this list).
  • They are created automatically in response to updates in a Configuration spec.
  • Revisions cannot be directly created or updated; they are managed through Configuration changes.
  • Deletion of Revisions can be forced to handle resource leaks or remove problematic Revisions.
  • Revisions are generally immutable, but may reference mutable Kubernetes resources like ConfigMaps and Secrets.
  • Changes to cluster-wide Revision defaults can result in syntactic changes to existing Revisions without altering their core behavior.
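
Because Revisions are addressable snapshots, traffic can be split between them, for example during a progressive rollout. The sketch below pins 90% of traffic to an older Revision and sends 10% to the latest ready one; the service and revision names are placeholders, and the patch is applied with the Kubernetes Python client.

```python
# split_traffic.py - sketch: route 90% of traffic to a pinned Revision and 10%
# to the latest ready Revision of the same Service.
from kubernetes import client, config

traffic_patch = {
    "spec": {
        "traffic": [
            {"revisionName": "hello-00001", "percent": 90, "tag": "stable"},  # placeholder revision
            {"latestRevision": True, "percent": 10, "tag": "candidate"},
        ]
    }
}

config.load_kube_config()
client.CustomObjectsApi().patch_namespaced_custom_object(
    group="serving.knative.dev", version="v1",
    namespace="default", plural="services", name="hello",  # placeholder service
    body=traffic_patch,
)
```

Each tag also gets its own addressable URL, which is handy for smoke-testing a candidate Revision before shifting more traffic to it.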

Autoscaling

Kubernetes Autoscaling Options


Knative Serving provides automatic scaling, or autoscaling, for applications to match incoming demand. This is provided by default using the Knative Pod Autoscaler (KPA).

For example, if an application is receiving no traffic and scale to zero is enabled, Knative Serving scales the application down to zero replicas. If scaling to zero is disabled, the application is scaled down to the minimum number of replicas specified for applications on the cluster. Replicas are scaled up to meet demand if traffic to the application increases.

Supported Autoscaler types

Knative Serving supports both the Knative Pod Autoscaler (KPA) and the Kubernetes Horizontal Pod Autoscaler (HPA).

  • Knative Pod Autoscaler (KPA)
    • Part of the Knative Serving core and enabled by default once Knative Serving is installed.
    • Supports scale to zero functionality.
    • Does not support CPU-based autoscaling.
  • Horizontal Pod Autoscaler (HPA)
    • Not part of the Knative Serving core; it must be enabled separately after Knative Serving is installed.
    • Does not support scale to zero functionality.
    • Supports CPU-based autoscaling (see the annotation sketch after this list).
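
Opting a workload into the HPA instead of the default KPA is done per Revision with the autoscaling class and metric annotations. A sketch follows; it assumes the HPA extension is installed in the cluster, and the service name and image are placeholders.

```python
# hpa_service.py - sketch: opt a Service into CPU-based autoscaling via the HPA class.
from kubernetes import client, config

service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "cpu-bound", "namespace": "default"},  # placeholder name
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    "autoscaling.knative.dev/class": "hpa.autoscaling.knative.dev",
                    "autoscaling.knative.dev/metric": "cpu",
                    "autoscaling.knative.dev/target": "70",  # target 70% CPU utilization
                }
            },
            "spec": {"containers": [{"image": "ghcr.io/example/cpu-bound:latest"}]},  # placeholder image
        }
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev", version="v1",
    namespace="default", plural="services", body=service,
)
```

Keep in mind that a workload using the HPA class will not scale to zero.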

Knative Serving Autoscaling System

APIs

  1. PodAutoscaler (PA):
    • API: podautoscalers.autoscaling.internal.knative.dev
    • It's an abstraction that encompasses all possible PodAutoscalers, with the default implementation being the Knative Pod Autoscaler (KPA).
    • The PodAutoscaler manages the scaling target, the metric used for scaling, and other relevant inputs for the autoscaling decision-making process.
      1. Scaling Target: The PodAutoscaler determines which resource it should scale, typically the Deployment that backs a Revision.
      2. Metric for Scaling: It specifies which metric or metrics should be used to make scaling decisions. For example, it might use CPU utilization to decide when to add or remove pods based on workload demand.
      3. Other Inputs: The PodAutoscaler considers additional factors beyond just the scaling metric. These could include constraints, policies, or thresholds that influence scaling decisions. For instance, it might have rules to prevent scaling beyond a certain limit or to ensure a minimum number of pods are always running.
    • PodAutoscalers are automatically created from Revisions by default.
  2. Metric:
    • API: metrics.autoscaling.internal.knative.dev
    • This API controls the collector of the autoscaler, determining which service to scrape data from, how to aggregate it, and other related aspects.
      1. Collector Control: The API controls the collector component of the autoscaler. The collector is responsible for gathering data related to the performance and behavior of the services being monitored for autoscaling.
      2. Data Scraping: It determines which service or services the autoscaler should scrape data from. This involves collecting relevant metrics such as CPU utilization, request latency, or throughput from the specified services.
      3. Aggregation: The API defines how the collected data should be aggregated. This could involve calculating averages, sums, or other statistical measures over a specific time window to provide a meaningful representation of the service's performance.
      4. Other Related Aspects: Beyond data collection and aggregation, the Metric resource carries related settings, such as the aggregation windows that feed into the autoscaler's decision-making.
    • Metrics are automatically generated from PodAutoscalers by default.
  3. ServerlessServices (SKS):
    • API: serverlessservices.networking.internal.knative.dev
    • It's an abstraction layer built on top of Kubernetes Services, managing the data flow and the switch between using the activator as a buffer or routing directly to application instances.
    • SKS creates two Kubernetes services for each revision: a public service and a private service.
    • The private service points to the application instances, while the public service endpoints are managed directly by the SKS reconciler.
    • SKS operates in two modes: Serve and Proxy.
      1. In Serve mode, traffic flows directly to the revision's pods.
      2. In Proxy mode, traffic is directed to activators.
    • ServerlessServices are created from PodAutoscalers (see the inspection sketch after this list).
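
These internal resources can be inspected directly, which is a convenient way to watch the autoscaler at work, for instance to see whether a Revision's SKS is currently in Serve or Proxy mode. The sketch below lists them with the Kubernetes Python client; the v1alpha1 version string is an assumption based on current Knative releases, so check what your installation serves.

```python
# inspect_autoscaling.py - sketch: list Knative's internal autoscaling resources
# to observe PodAutoscalers and the Serve/Proxy mode of ServerlessServices.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

pas = api.list_namespaced_custom_object(
    group="autoscaling.internal.knative.dev", version="v1alpha1",
    namespace="default", plural="podautoscalers",
)
for pa in pas["items"]:
    print("PodAutoscaler:", pa["metadata"]["name"])

skss = api.list_namespaced_custom_object(
    group="networking.internal.knative.dev", version="v1alpha1",
    namespace="default", plural="serverlessservices",
)
for sks in skss["items"]:
    print("SKS:", sks["metadata"]["name"], "mode:", sks["spec"].get("mode"))
```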

Scaling up and down (steady state)


  • Steady State Operation:
    • The autoscaler operates continuously at a steady state.
    • It regularly scrapes data from the currently active revision pods to monitor their performance.
  • Dynamic Adjustment:
    • As incoming requests flow into the system, the scraped values of performance metrics change accordingly.
    • Based on these changing metrics, the autoscaler dynamically adjusts the scale of the revision.
  • SKS Functionality:
    • The ServerlessServices (SKS) component keeps track of changes to the deployment's size.
    • It achieves this by monitoring the private service associated with the deployment.
  • Public Service Update:
    • SKS updates the public service based on the changes detected in the deployment's size.
    • This ensures that the public service endpoints accurately reflect the available instances of the revision.

Scaling to zero


  • Scaling to Zero Process (1):
    • A revision scales down to zero when there are no more requests in the system.
    • All data collected by the autoscaler from revision pods and the activator reports zero concurrency, indicating no active requests.
  • Activator Preparation:
    • Before removing the last pod of the revision, the system ensures that the activator is in the path and reachable.
  • Proxy Mode Activation (4.1):
    • The autoscaler, which initiated the decision to scale to zero, directs the SKS to switch to Proxy mode.
    • In Proxy mode, all incoming traffic is routed to the activators.
  • Public Service Probing:
    • The SKS's public service is probed continuously to ensure it returns responses from the activator.
    • Scaling down proceeds once the public service reliably returns responses from the activator and a configurable grace period (set via scale-to-zero-grace-period) has elapsed.
  • Final Scaling Down (5):
    • The last pod of the revision is removed, marking the successful scaling down of the revision to zero instances.
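
The grace period mentioned above lives in the config-autoscaler ConfigMap in the knative-serving namespace, alongside the flag that enables scale to zero. Below is a sketch of adjusting both with the Kubernetes Python client; the 60s value is only an example (the default is 30s).

```python
# tune_scale_to_zero.py - sketch: adjust scale-to-zero behaviour cluster-wide via
# the config-autoscaler ConfigMap in the knative-serving namespace.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().patch_namespaced_config_map(
    name="config-autoscaler",
    namespace="knative-serving",
    body={"data": {
        "enable-scale-to-zero": "true",        # allow revisions to drop to zero pods
        "scale-to-zero-grace-period": "60s",   # example value; default is 30s
    }},
)
```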

Scaling from zero


  • Scaling Up Process:
    • If a revision is scaled to zero and a request arrives for it, the system needs to scale it up.
    • As the SKS is in Proxy mode, the request reaches the activator.
  • Request Handling:
    • The activator counts the incoming request and reports its appearance to the autoscaler (2.1).
    • It then buffers the request and monitors the SKS's private service for endpoints to appear (2.2).
  • Autoscaling Cycle (3):
    • The autoscaler receives the metric from the activator and initiates an autoscaling cycle.
    • This process determines the desired number of pods based on the incoming request.
  • Scaling Decision (4):
    • The autoscaling process concludes that at least one pod is needed to handle the incoming request.
  • Scaling Up Instructions (5.1):
    • The autoscaler instructs the revision's deployment to scale up to N > 0 replicas to accommodate the increased demand.
  • Serve Mode Activation (5.2):
    • The autoscaler switches the SKS into Serve mode, directing traffic to the revision's pods directly once they are up.
  • Endpoint Probing:
    • The activator monitors the SKS's private service for the appearance of endpoints.
    • Once the endpoints come up and pass the probe successfully, the respective address is considered healthy and used to route the buffered request and any additional requests that arrived in the meantime (8.2).
  • Successful Scaling Up:
    • The revision has successfully scaled up from zero to handle the incoming request. (A sketch for keeping replicas warm and avoiding cold starts follows below.)
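
If the cold start of scaling from zero is too costly for a latency-sensitive Revision, a minimum replica count keeps pods warm. The sketch below patches a Service with the autoscaling.knative.dev/min-scale annotation; the service name and the values are illustrative.

```python
# keep_warm.py - sketch: keep at least one replica running to avoid cold starts.
from kubernetes import client, config

patch = {
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    "autoscaling.knative.dev/min-scale": "1",   # never scale below one pod
                    "autoscaling.knative.dev/max-scale": "10",  # optional upper bound
                }
            }
        }
    }
}

config.load_kube_config()
client.CustomObjectsApi().patch_namespaced_custom_object(
    group="serving.knative.dev", version="v1",
    namespace="default", plural="services", name="hello",  # placeholder name
    body=patch,
)
```

Because the annotations live on the Revision template, this patch creates a new Revision with the warm-pod guarantee.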

Conclusion

In summary, we've explored the core concepts of Knative Serving, from its architecture to scaling mechanisms. Next, we'll dive into practical implementation in our upcoming blog. Also, stay tuned for the integration of the serverless component into the WeDAA Platform, making prototyping and deployment faster and easier than ever.

Dive into Knative—Explore Serverless with Kubernetes

What is serverless?

Serverless is a cloud-native development model that allows developers to build and run applications without having to manage servers.

There are still servers in serverless, but they are abstracted away from app development. A cloud provider handles the routine work of provisioning, maintaining, and scaling the server infrastructure. Developers can simply package their code in containers for deployment.

Once deployed, serverless apps respond to demand and automatically scale up and down as needed.

Serverless Computing: A Catering Service Analogy


Imagine you're hosting a dinner party. In a traditional hosting scenario, you'd have to plan everything from cooking the food to setting the table and serving your guests. This is like managing servers in traditional computing – you have to handle all the details yourself.

Now, consider a serverless approach as hiring a catering service for your party. You tell them what you need, and they take care of everything – from cooking the food to setting up and serving. You don't have to worry about the kitchen logistics or cleaning up afterward; you can focus on enjoying the party with your guests. Similarly, in serverless computing, you provide your code, and the cloud provider takes care of the infrastructure, scaling, and management, allowing you to focus on writing and improving your application.

Kubernetes-Powered Serverless: Introducing Knative


In the rapidly evolving landscape of cloud computing, serverless technology has become increasingly popular for its simplicity in deploying applications without worrying about infrastructure. Knative, built on top of Kubernetes (k8s), extends the power of Kubernetes to manage serverless workloads seamlessly. While major cloud providers like AWS, Google Cloud, and Microsoft Azure offer their serverless solutions, Knative stands out as an open-source, platform-agnostic framework.

Collaboratively developed by industry leaders like Google and Red Hat, Knative abstracts away the complexities of deploying, scaling, and managing containerized applications, allowing developers to focus solely on writing code without worrying about infrastructure management. Knative simplifies serverless deployments across diverse cloud environments, revolutionizing the way applications are developed and deployed in modern cloud-native architectures.

Exploring Knative Features: Simplifying Serverless Deployment

Serverless refers to running back-end programs and processes in the cloud. Serverless works on an as-used basis, meaning that companies pay only for what they use. Knative is a platform-agnostic solution for running serverless deployments.

Knative Features

  • Simpler Abstractions: simplifies the YAML configuration process by providing custom CRDs (Custom Resource Definitions), streamlining the abstraction layers and making development workflows more straightforward.

  • Autoscaling: The autoscaling feature seamlessly adjusts resource allocation, scaling applications down to zero and back up based on demand.

  • Progressive Rollouts: Customize your rollout strategy with Knative's Progressive Rollouts feature, offering flexibility to select the ideal approach based on your specific requirements.

  • Event Integrations: Easily manage events from diverse sources with Knative's Event Integrations, streamlining event handling for seamless integration.

  • Handle Events: Effortlessly trigger handlers from the event broker with Knative's event handling capabilities, ensuring seamless integration and streamlined workflow.

  • Pluggable: Knative's pluggable architecture ensures seamless integration and extension within the Kubernetes ecosystem, providing flexibility and scalability for diverse use cases.

Knative Components

Knative has two main components that empower teams working with Kubernetes. Serving and Eventing work together to automate and manage tasks and applications.


  • Knative Serving: Allows running serverless containers in Kubernetes with ease. Knative takes care of the details of networking, autoscaling (even to zero), and revision tracking. Teams can focus on core logic using any programming language.
  • Knative Eventing: Allows universal subscription, delivery and management of events. Build modern apps by attaching compute to a data stream with declarative event connectivity and developer friendly object models.

Knative Serving

Knative Serving defines a set of objects as Kubernetes Custom Resource Definitions (CRDs). These objects are used to define and control how your serverless workload behaves on the cluster (a minimal example of creating a Service follows the list below):

Diagram: Knative Serving objects (Savita Ashture, CC BY-SA 4.0)

  • Service: A Knative Service describes a combination of a Route and a Configuration, as shown above. It is a higher-level entity that does not provide any additional functionality of its own; it simply makes it easier to deploy an application quickly and make it available. You can define the service to always route traffic to the latest revision or to a pinned revision.

  • Route: The Route describes how a particular application gets called and how the traffic gets distributed across the different revisions. Several revisions may be active in the system at any given time, depending on the use case; it is the Route's responsibility to split traffic and assign it to those revisions.

  • Configuration: The Configuration describes what the corresponding deployment of the application should look like. It provides a clean separation between code and configuration and follows the Twelve-Factor App methodology. Modifying a configuration creates a new revision.

  • Revision: The Revision represents the state of a configuration at a specific point in time. A revision, therefore, gets created from the configuration. Revisions are immutable objects, and you can retain them for as long as useful. Several revisions per configuration may be active at any given time, and you can automatically scale up and down according to incoming traffic.
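
To make the Service object concrete, here is a minimal sketch that creates one with the Kubernetes Python client and lets Knative derive the Configuration, Revision, and Route automatically. The name, image, and environment variable are placeholders; the equivalent YAML could just as well be applied with kubectl or the kn CLI.

```python
# create_service.py - sketch: a minimal Knative Service; Knative creates the
# Configuration, Revision, and Route objects on our behalf.
from kubernetes import client, config

service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "greeter", "namespace": "default"},  # placeholder name
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "image": "ghcr.io/example/greeter:latest",        # placeholder image
                    "env": [{"name": "GREETING", "value": "Hello"}],  # illustrative env var
                }]
            }
        }
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev", version="v1",
    namespace="default", plural="services", body=service,
)
```

By default the Route sends all traffic to the latest ready Revision; pinning traffic to a specific Revision is done through the Service's traffic block.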

Knative Serving focuses on:

  • Rapid deployment of serverless containers.
  • Autoscaling, including scaling pods down to zero.
  • Support for multiple networking layers such as Ambassador, Contour, Kourier, Gloo, and Istio for integration into existing environments.
  • Point-in-time snapshots of deployed code and configurations.

Knative Eventing

Knative Eventing is a collection of APIs that enable you to use an event-driven architecture with your applications. You can create components that route events from event producers to event consumers, known as sinks, that receive events.

Use-cases

General areas of application are:

  • Publishing an event without creating a consumer. You can send events to a broker as an HTTP POST (see the sketch after this list), and use binding to decouple the destination configuration from your application that produces events.

  • Consuming an event without creating a publisher. You can use a trigger to consume events from a broker based on event attributes. The application receives events as an HTTP POST.

  • IoT, network monitoring, application monitoring, website testing and validation, and mobile app front-end processes that act as event generators.

  • Routing events from event producers to event consumers, known as sinks. Sinks can also be configured to respond to HTTP requests by sending a response event.
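
Publishing without a consumer boils down to an HTTP POST of a CloudEvent to the broker's ingress. The sketch below uses binary content mode, where the CloudEvents attributes travel as ce-* headers; the broker URL is a placeholder for the address exposed in the Broker's status, and the event payload is invented for illustration.

```python
# publish_event.py - sketch: POST a CloudEvent (binary content mode) to a Broker.
import json
import urllib.request

BROKER_URL = "http://broker-ingress.knative-eventing.svc.cluster.local/default/default"  # placeholder

event_body = json.dumps({"orderId": 42, "status": "created"}).encode()
request = urllib.request.Request(
    BROKER_URL,
    data=event_body,
    headers={
        "Content-Type": "application/json",
        # Required CloudEvents attributes, carried as ce-* headers in binary mode.
        "ce-specversion": "1.0",
        "ce-id": "order-42-created",
        "ce-source": "example/orders",
        "ce-type": "com.example.order.created",
    },
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print("Broker accepted event with status", response.status)
```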


Eventing Components


  • Sources: Knative eventing sources are objects that generate events and send them to a sink. They are created by instantiating a custom resource (CR) from a source object. There are different types of sources, such as PingSource, ApiServerSource, KafkaSource, etc., depending on the event producer.

  • Sinks: Knative eventing sinks are objects that receive events from sources or other components. They can be Addressable or Callable resources that have an address defined in their status.address.url field. Addressable sinks can receive and acknowledge an event delivered over HTTP, while Callable sinks can also respond to HTTP requests by sending a response event. Knative Services, Channels, and Brokers are all examples of sinks.

  • Brokers: Knative eventing brokers are objects that define an event mesh for collecting a pool of events. Brokers provide a discoverable endpoint for event ingress and use triggers for event delivery. Event producers can send events to a broker by POSTing the event to that endpoint.

  • Channels: Channels are custom resources that define a single event-forwarding and persistence layer. You can connect channels to various backends for sourcing events, such as In-Memory, Kafka, or GCP PubSub. You can also fan-out received events, through subscriptions, to multiple destinations, or sinks. Examples of sinks include brokers and Knative services.

  • Subscriptions: Knative subscriptions are objects that enable event delivery from a channel to an event sink, also known as a subscriber. A subscription specifies the channel and the sink to deliver events to, as well as some sink-specific options, such as how to handle failures.

  • Triggers: Knative Triggers specify which events a subscriber receives from a Broker. A Trigger filters events based on their attributes and delivers matching events to the specified subscriber, enabling applications to react dynamically to incoming events in scalable, event-driven architectures (see the sketch after this list).
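
As a concrete example of wiring a consumer to a Broker, the sketch below creates a Trigger that filters on the event type used in the publishing example above and delivers matching events to a Knative Service named order-processor; the trigger name, broker name, and subscriber are placeholders.

```python
# create_trigger.py - sketch: subscribe a Knative Service to one event type on a Broker.
from kubernetes import client, config

trigger = {
    "apiVersion": "eventing.knative.dev/v1",
    "kind": "Trigger",
    "metadata": {"name": "order-created", "namespace": "default"},  # placeholder name
    "spec": {
        "broker": "default",
        # Deliver only events whose type attribute matches this filter.
        "filter": {"attributes": {"type": "com.example.order.created"}},
        "subscriber": {
            "ref": {
                "apiVersion": "serving.knative.dev/v1",
                "kind": "Service",
                "name": "order-processor",  # placeholder consumer
            }
        },
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="eventing.knative.dev", version="v1",
    namespace="default", plural="triggers", body=trigger,
)
```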

Conclusion

In this overview, we've explored serverless computing with Knative on Kubernetes, covering core concepts, features, and components. Stay tuned for practical implementations and real-world use cases in upcoming blogs, unlocking Knative's full potential for your projects. With Knative, the future of serverless on Kubernetes is brighter than ever.

Furthermore, I'm excited to announce that our platform, WeDAA, will be hosting these upcoming blogs. WeDAA is committed to providing innovative solutions, and soon, we'll be incorporating serverless capabilities into our platform. Keep an eye out for our future updates, as we continue to evolve and enhance our services to meet your needs.

Continue your exploration of Knative by diving into our next blog on Knative Serving: Definitive Guide to Knative Serving—A Deep Dive into Theory and Architecture!