Data plane
The data plane runs entirely within the customer’s cloud account on a Kubernetes cluster. It is where all computation occurs and where customer data is stored at rest (see Data classification and residency). In the self-managed model, the customer operates the data plane independently. In the BYOC model, Union.ai manages the Kubernetes cluster on the customer’s behalf, but it still runs in the customer’s cloud account. See Deployment models for the differences.
Components
The data plane consists of several components, each handling a specific aspect of task execution and data management.
Executor is a Kubernetes controller that watches for TaskAction custom resources created by the control plane. When a TaskAction appears, the Executor reconciles its lifecycle: creating task pods, monitoring their status, and reporting state transitions back to the control plane. If connectivity to the control plane is lost, in-flight pods continue running and state reconciles when the connection is restored.
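The TaskAction CRD is internal to Union, and its exact resource name and schema are not documented here. Assuming the resource is registered under a name like taskactions (an assumption, not a documented name), an operator can observe the reconcile loop from the cluster side:

```shell
# Discover the actual resource name; "taskaction" here is an assumption
kubectl api-resources | grep -i taskaction

# Watch TaskAction resources move through their lifecycle
kubectl get taskactions -n union --watch
```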
Object Store Service handles data access operations on the customer’s object store. It issues presigned URLs for bulk data (files, directories, DataFrames, code bundles, and reports) and serves the object read/write operations used by the control plane for structured task I/O.
Log Provider serves task logs through two channels. For running tasks, it streams live logs from the Kubernetes API. For completed tasks, it retrieves logs from the cloud provider’s log aggregator (CloudWatch, Cloud Logging, or Azure Monitor). There is no content filtering or redaction; any sensitive data (secrets, PII, stack traces) that applications write to stdout/stderr is included in the stream unmodified.
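Both channels map onto familiar tooling. As a rough sketch (the pod placeholder and log group name are hypothetical), the live path corresponds to the Kubernetes logs API and the completed path to the cloud provider’s aggregator:

```shell
# Live logs for a running task pod, read through the Kubernetes API
kubectl logs -f <task-pod> -n union

# Historical logs for a completed task on AWS (log group name is hypothetical)
aws logs filter-log-events --log-group-name /union/tasks --limit 50
```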
Image Builder uses BuildKit running on the customer’s Kubernetes cluster to build container images from user-submitted Image specifications. Source code and built images never leave the customer’s infrastructure. Base images are pulled from customer-configured registries, and built images are pushed to the customer’s container registry (ECR, GCR, or ACR).
Tunnel Service maintains the outbound-only encrypted Cloudflare Tunnel from the data plane to the control plane. This service initiates the tunnel (no inbound ports required), performs health checks and heartbeats, and automatically reconnects if the connection drops.
In addition to the tunnel, the data plane operator establishes a separate outbound gRPC connection (TLS) to the regional control plane endpoint for orchestration RPCs (cluster registration, action lifecycle, event reporting, catalog and artifact lookups, admin RPCs). Both channels are outbound-initiated; see Network architecture for what each carries.
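The Tunnel Service manages this connection itself; purely to illustrate the outbound-only pattern, a Cloudflare Tunnel is dialed out from inside the network and requires no inbound firewall rules:

```shell
# Illustration only — the data plane's Tunnel Service handles this internally.
# cloudflared makes outbound connections to Cloudflare; no inbound ports are opened.
cloudflared tunnel run --token <tunnel-token>
```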
Apps & Serving provides model and application serving capabilities using Knative with a Kourier gateway. All serving infrastructure runs within the customer’s cluster. Authentication is enforced on all endpoints by default (SSO for browser access, API keys for programmatic access), with an option to allow anonymous access on specific endpoints. See Apps & Serving security below for details.
For how each of these pathways handles data in transit, see Data flow.
Object store layout
Each data plane cluster uses two object store buckets: a metadata bucket for execution metadata and a fast-registration bucket for rapid code deployment artifacts. Within these buckets, objects are organized by namespace: org/project/domain/run-name/action-name/. This layout provides isolation: IAM policies and bucket policies can scope access to specific organizational boundaries.
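This makes prefix-scoped access straightforward. As a sketch, an IAM policy can grant a team read/write access only to its own project’s namespace; the bucket, org, and project names below are hypothetical:

```shell
# Sketch: scope S3 access to one project's prefix (all names are hypothetical)
cat > project-scope-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::union-metadata/acme/fraud-detection/*"
  }]
}
EOF
aws iam put-role-policy --role-name <team-role> \
  --policy-name project-scope --policy-document file://project-scope-policy.json
```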
Kubernetes security
The data plane enforces several layers of Kubernetes security to protect workloads and limit blast radius.
Workload identity federation eliminates the need for static cloud credentials on the data plane. See IAM and workload identity below for details.
Kubernetes RBAC restricts what each service account can do within the cluster. Platform components have scoped permissions for their specific functions, and task pods run under service accounts with minimal privileges.
Network policies control pod-to-pod communication within the cluster, limiting lateral movement in the event of a container compromise.
Resource quotas and limit ranges prevent any single workload from consuming all cluster resources, providing both stability and a degree of isolation between tenants and projects.
Pod security contexts enforce non-root execution for platform components, reducing the impact of container escape vulnerabilities.
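The last three mechanisms are standard Kubernetes objects, so they can be expressed declaratively. A minimal sketch, with hypothetical labels and limits:

```shell
kubectl apply -f - <<'EOF'
# Restrict ingress to task pods to traffic from platform components
# (the component labels are hypothetical)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: task-pod-ingress
  namespace: union
spec:
  podSelector:
    matchLabels:
      union/component: task
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              union/component: platform
---
# Cap aggregate resource consumption in the namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: task-quota
  namespace: union
spec:
  hard:
    requests.cpu: "64"
    requests.memory: 256Gi
EOF
```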
Container security
When a user defines an Image specification, source code is uploaded to the customer’s object store via presigned URL and fetched by the builder; it never transits through the control plane.
Base images are pulled from registries configured by the customer, allowing the use of hardened or pre-approved base images. Customers can apply their own image tagging conventions, vulnerability scanning policies, and registry access controls.
Task pods fetch code bundles via presigned URLs with a limited time-to-live (TTL). The URLs expire after a short window, limiting the exposure if one is intercepted.
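On AWS, for instance, a presigned URL with a ten-minute TTL can be generated like this (bucket and key are hypothetical):

```shell
# Generate a presigned GET URL that expires in 600 seconds
aws s3 presign s3://union-fast-registration/acme/bundle.tgz --expires-in 600
```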
IAM and workload identity
The data plane uses two IAM roles to separate platform-level and user-level access:
adminflyterole is used by platform services (Executor, Object Store Service, Log Provider). It has read/write access to the object store buckets, access to the secrets manager for retrieving user-defined secrets, and read access to persisted logs. This role is bound to platform service accounts via workload identity federation.
userflyterole is used by task pods (the containers running user code). It has read/write access to the object store buckets for reading inputs and writing outputs. It does not have access to the secrets manager or platform-level resources.
Both roles use cloud-native workload identity federation: IRSA (IAM Roles for Service Accounts) on AWS, Workload Identity on GCP, and Azure Workload Identity on Azure. No static credentials are created, stored, or rotated. The Kubernetes service account annotations bind each pod to the appropriate IAM role automatically.
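On AWS, the binding is an IRSA annotation on the Kubernetes service account; the account ID and service account name below are placeholders:

```shell
kubectl apply -f - <<'EOF'
# IRSA: pods running under this service account assume adminflyterole
# (account ID and service account name are placeholders)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: union-executor
  namespace: union
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/adminflyterole
EOF
```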
Apps & Serving security
App and serving traffic flows entirely within the customer’s infrastructure. No application code, data, or serving requests pass through the control plane.
Inbound traffic reaches the serving endpoints through Cloudflare, which provides DDoS protection, before routing to the Kourier ingress gateway running in the customer’s cluster. Authentication is enforced by default on all endpoints: browser-based access uses SSO, and programmatic access uses API keys. Individual endpoints can be configured for anonymous access when required (for example, public-facing model endpoints).
RBAC controls govern which users and service accounts can deploy applications and access specific endpoints, scoped per project. All serving infrastructure (Knative, Kourier, and the Union Operator) runs within the customer’s Kubernetes cluster. In the BYOC model, Union.ai manages the lifecycle of this serving infrastructure (upgrades, scaling, configuration), but the infrastructure itself resides in the customer’s account.
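Because serving is built on Knative inside the customer’s cluster, its footprint is visible with ordinary kubectl commands (the Kourier namespace may differ by installation):

```shell
# Knative Services (ksvc) and the Kourier gateway run in-cluster
kubectl get ksvc -A
kubectl get pods -n kourier-system
```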
Verification
Components
Reviewer focus: Confirm that the described components are running in the customer’s cluster and match the documented architecture.
How to verify:
- List data plane pods and deployments:

  ```shell
  kubectl get pods -n union
  kubectl get deployments -n union -o wide
  ```

  Confirm that the Executor, Object Store Service, Tunnel Service, and other components are present.

- Inspect a specific component:

  ```shell
  kubectl describe pod <executor-pod> -n union
  ```

  Verify that the container image, service account, and resource configuration match expectations.
Kubernetes security
Reviewer focus: Confirm that Kubernetes RBAC, network policies, resource quotas, and pod security contexts are in place and correctly scoped.
How to verify:
- Review cluster role bindings for Union components:

  ```shell
  kubectl get clusterrolebindings | grep union
  ```

- Check network policies across namespaces:

  ```shell
  kubectl get networkpolicies -A
  ```

- Verify resource quotas:

  ```shell
  kubectl get resourcequotas -A
  ```

- Inspect pod security contexts:

  ```shell
  kubectl get pods -n <namespace> -o jsonpath='{.items[0].spec.securityContext}'
  ```

  Confirm `runAsNonRoot: true` or equivalent non-root settings on platform pods.
Container security
Reviewer focus: Confirm that image builds execute entirely within the customer’s infrastructure and that built images never leave the customer’s registry.
How to verify:
- Trigger an image build by submitting a workflow with an `Image` specification.

- Observe the build pod:

  ```shell
  kubectl get pods -n union | grep build
  kubectl logs <buildkit-pod> -n union
  ```

  Confirm that the build pulls base images from the customer’s configured registry and pushes the result to the customer’s container registry.

- Verify the image in the customer’s registry:

  ```shell
  aws ecr describe-images --repository-name <repo> --image-ids imageTag=<tag>
  ```

  (Or the equivalent `gcloud`/`az` command for GCP/Azure.)
IAM and workload identity
Reviewer focus: Confirm that the two IAM roles exist with the documented permissions, that workload identity federation is in use, and that no static credentials are present.
How to verify:
- Inspect the IAM roles and their policies:

  ```shell
  aws iam get-role --role-name adminflyterole
  aws iam list-role-policies --role-name adminflyterole
  aws iam list-attached-role-policies --role-name adminflyterole
  aws iam get-role --role-name userflyterole
  aws iam list-role-policies --role-name userflyterole
  aws iam list-attached-role-policies --role-name userflyterole
  ```

  Confirm that `adminflyterole` has object store, secrets manager, and log access. Confirm that `userflyterole` has only object store access.

- Verify workload identity annotations on service accounts:

  ```shell
  kubectl get sa -n union -o yaml | grep role-arn
  ```

  Each service account should have an annotation binding it to the appropriate IAM role via IRSA (or the equivalent for GCP/Azure).

- Confirm no static credentials exist:

  ```shell
  kubectl get secrets -n union -o name | grep -i aws
  ```

  There should be no secrets containing static AWS access keys; workload identity federation eliminates the need for them.
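For completeness, the annotations to look for on the other clouds are the GKE Workload Identity and Azure Workload Identity bindings:

```shell
# GCP: GKE Workload Identity binds a Kubernetes service account to a Google service account
kubectl get sa -n union -o yaml | grep iam.gke.io/gcp-service-account

# Azure: Azure Workload Identity binds via the client ID of a managed identity
kubectl get sa -n union -o yaml | grep azure.workload.identity/client-id
```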