Reference · Commercial Kafka Distribution

Confluent Platform & Cloud — Component Map, Licensing, and Public APIs

A reference for what Confluent actually ships on top of Apache Kafka — Confluent Platform (the on-prem distribution) and Confluent Cloud (the managed SaaS) — with each component tagged by its licence (Apache 2.0, Confluent Community Licence, or proprietary) and notes on the public APIs and semantics that show up when integrating with Schema Registry, ksqlDB, Cluster Linking, and Tiered Storage. Source-level detail is intentionally absent: the broker and Connect framework are Apache 2.0, but most of Confluent's added value ships under Confluent Community Licence or fully proprietary, and this page sticks to published behaviour.

1. Confluent Platform vs Confluent Cloud

Confluent ships two distinct products built around the same Apache Kafka broker. Confluent Platform (CP) is a self-managed distribution: you install brokers, controllers, Schema Registry, Connect, ksqlDB, and Control Center on your own VMs or Kubernetes (via Confluent for Kubernetes / CFK), and you operate them. Confluent Cloud is a managed multi-tenant service hosted on AWS, GCP, and Azure, where Confluent runs the brokers and most of the add-ons for you behind a unified control plane. The two share a name, the broker, the wire protocols, and most client libraries, but they diverge sharply in operating model, feature gating, networking options, and licensing.

Functionally, Confluent Platform is "Apache Kafka + Confluent's add-on components, packaged and supported." Confluent Cloud is a SaaS whose API surface is a superset (with quotas) of Apache Kafka, including Confluent-only managed services such as Stream Governance, Flink-as-a-Service, and Cloud-only enterprise features.

2. Licensing — what is OSS, what is CCL, what is proprietary

This is the single most important thing to understand before adopting any Confluent component. The Apache Kafka broker itself is and remains Apache 2.0. Several of Confluent's add-ons were Apache 2.0 originally and were relicensed in 2018 under the Confluent Community Licence (CCL) — source remains visible on GitHub, but the licence forbids offering the software as a competing managed service. Other components have always been proprietary and are only available with a Confluent Enterprise subscription or as part of Confluent Cloud.

Component | Licence | Notes
Apache Kafka broker, AdminClient, KRaft controller, Streams API, MirrorMaker 2 | Apache 2.0 | The Apache project. Confluent Platform repackages this; Confluent contributes back to upstream.
Kafka Connect framework | Apache 2.0 | Framework is Apache; individual connectors vary (see Confluent Hub).
Schema Registry | Confluent Community Licence | Relicensed from Apache 2.0 in 2018. Source on GitHub at confluentinc/schema-registry.
REST Proxy | Confluent Community Licence | Relicensed from Apache 2.0 in 2018.
ksqlDB | Confluent Community Licence | The product formerly known as KSQL. Relicensed in 2018.
Confluent Hub connectors | Varies (Apache 2.0, CCL, or proprietary per connector) | Each connector page on Confluent Hub states its licence; assume nothing without checking.
Replicator | Proprietary | Cross-cluster replication tool. Largely superseded by Cluster Linking.
Cluster Linking | Proprietary | Available in Confluent Platform under enterprise subscription and in Confluent Cloud.
Tiered Storage (Confluent's implementation) | Proprietary | Confluent's tiered storage predates the Apache project's own KIP-405 implementation; they are distinct codebases.
Self-Balancing Clusters (SBC) | Proprietary | Confluent's equivalent of Cruise Control, integrated into the broker.
Confluent Control Center | Proprietary | The operational UI for Confluent Platform.
RBAC + Metadata Service (MDS) | Proprietary | Role-based access on clusters, topics, schemas, connectors.
Confluent for Kubernetes (CFK) | Proprietary | The operator that replaced the older Confluent Operator. The open-source equivalent ecosystem here is Strimzi.
Confluent Cloud (the SaaS itself) | Proprietary | You consume it as a service; the runtime is not delivered as software.

The practical reading: if a customer says "we are on Confluent," ask which components. A team using only the Apache 2.0 broker plus open-source clients can migrate off Confluent with minimal friction. A team using ksqlDB, Schema Registry, RBAC, Cluster Linking, and Tiered Storage has built on a stack whose value-added pieces are not available as a hosted service from a non-Confluent vendor. For the CCL components that is the deliberate effect of the 2018 relicense; the proprietary components were never available elsewhere in the first place.

3. Component Map

A rough mental model of what sits where:

[Figure: component map. Confluent Cloud, the managed SaaS wrapper (proprietary), is drawn as an outer box around the whole stack, which it operates as a managed service. The components inside it, grouped by licence:]

  • Apache 2.0: Apache Kafka broker (KRaft controller, AdminClient, Streams, MirrorMaker 2); Kafka Connect framework (connectors vary by licence).
  • Confluent Community Licence: Schema Registry (subjects, compatibility); REST Proxy (HTTP → produce / fetch); ksqlDB (streams / tables / pull).
  • Proprietary: Cluster Linking (offset-preserving mirror); Tiered Storage (hot / cold object store); Self-Balancing Clusters (partition rebalance); RBAC + MDS, Control Center, and CFK (authn/z, UI, Kubernetes operator).

4. Schema Registry — subjects, compatibility, wire format

Schema Registry stores Avro, JSON Schema, and Protobuf schemas and assigns each a globally unique integer ID. Producers fetch (or register) the schema ID for a given record before serialising; consumers look up the schema by ID before deserialising. Two things matter most: how schemas are grouped under subjects, and how compatibility is enforced when a schema evolves.

Subjects and naming strategies

A subject is the version namespace for a schema. The default TopicNameStrategy produces two subjects per topic, <topic>-key and <topic>-value, so a topic named orders with both key and value schemas registers orders-key and orders-value. RecordNameStrategy uses the fully qualified record name instead of the topic, so two topics carrying the same record type share a subject. TopicRecordNameStrategy combines the two as <topic>-<record name>. Pick the strategy at producer config time; mixing strategies within a single topic causes look-ups to miss.
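As a rough illustration of the three strategies, the sketch below derives the subject name the way the paragraph above describes. The function name and its arguments are hypothetical helpers for this page, not part of any Confluent client library.

```python
def subject_name(strategy: str, topic: str, record_name: str, is_key: bool = False) -> str:
    """Return the subject a serializer would use under the given naming strategy.

    `record_name` is the fully qualified record name, e.g. "com.example.Order".
    """
    suffix = "key" if is_key else "value"
    if strategy == "TopicNameStrategy":
        # One subject per topic and per key/value side.
        return f"{topic}-{suffix}"
    if strategy == "RecordNameStrategy":
        # The topic is ignored, so topics carrying the same record share a subject.
        return record_name
    if strategy == "TopicRecordNameStrategy":
        # Scoped to both the topic and the record type.
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown strategy: {strategy}")


print(subject_name("TopicNameStrategy", "orders", "com.example.Order"))
print(subject_name("RecordNameStrategy", "orders", "com.example.Order"))
print(subject_name("TopicRecordNameStrategy", "orders", "com.example.Order"))
```

Two producers that disagree on the strategy will register and look up schemas under different subjects for the same topic, which is the look-up miss mentioned above.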

Compatibility modes

Compatibility is the contract for what kinds of schema changes are allowed and which direction of clients (old or new) must continue to work after the change. The Schema Registry's enforcement modes:

  • BACKWARD (default) — a new schema can read data written with the previous schema. Adding optional fields (fields with a default) and deleting fields are fine; adding a required field without a default, or renaming a field, is not. Lets consumers upgrade first.
  • BACKWARD_TRANSITIVE — same rule, but checked against every prior version, not just the previous one.
  • FORWARD — a previous schema can read data written with the new schema. Adding fields and deleting optional fields are fine. Lets producers upgrade first.
  • FORWARD_TRANSITIVE — forward checked against every prior version.
  • FULL — both BACKWARD and FORWARD against the previous version.
  • FULL_TRANSITIVE — both, against every prior version.
  • NONE — no compatibility check. Use only when you control both producer and consumer roll-outs in lockstep.

Per-subject configuration overrides the global setting. PUT /config/<subject> with {"compatibility":"FORWARD"} changes the rule for that subject only.
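The direction of each mode is easier to see on a toy model. The sketch below is a deliberate simplification, not Schema Registry's actual checker: a schema is reduced to a mapping from field name to "has a default", type changes and aliases are ignored, and only the non-transitive modes are covered. All names are hypothetical.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be decoded with `reader`.

    Each schema maps field name -> whether the field has a default.
    A reader field that the writer never wrote must have a default.
    """
    return all(has_default for name, has_default in reader.items() if name not in writer)


def is_compatible(mode: str, new: dict, previous: dict) -> bool:
    backward = can_read(reader=new, writer=previous)   # new schema reads old data
    forward = can_read(reader=previous, writer=new)    # old schema reads new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    if mode == "NONE":
        return True
    raise ValueError(f"unsupported mode in this sketch: {mode}")


v1 = {"order_id": False, "amount": False}
add_optional = {"order_id": False, "amount": False, "note": True}
add_required = {"order_id": False, "amount": False, "region": False}

print(is_compatible("BACKWARD", add_optional, v1))
print(is_compatible("BACKWARD", add_required, v1))
print(is_compatible("FORWARD", add_required, v1))
```

The transitive modes would run the same check against every earlier version rather than only the previous one.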

Wire format

byte 0           1-byte magic, always 0x00
bytes 1..4       4-byte schema ID (big-endian int32)
bytes 5..end     the serialised payload (Avro binary, JSON, or Protobuf; Protobuf payloads start with a message-index list)

This 5-byte prefix is what separates "a Kafka record with Schema Registry" from "a raw Avro/JSON/Protobuf byte array." A consumer that decodes the payload without stripping the prefix will fail with a corrupt-record error; that is one of the most common Schema Registry misconfigurations in practice.
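A minimal sketch of the framing described above, using only the Python standard library. The function names are hypothetical, and the payload is treated as opaque bytes; real serializers also handle schema lookup and the format-specific encoding.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # the leading byte is always 0x00


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prepend the 5-byte prefix: 1-byte magic + big-endian int32 schema ID."""
    return struct.pack(">bi", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed record value into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message shorter than the 5-byte prefix")
    magic, schema_id = struct.unpack(">bi", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}; not a Schema Registry frame")
    return schema_id, message[5:]


framed = frame(42, b"avro-bytes")
print(framed[:5].hex())
print(unframe(framed))
```

Checking the magic byte before decoding gives a clearer error than handing the whole value, prefix included, to an Avro or Protobuf decoder.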

REST API surface

# List subjects, list versions, fetch a schema by ID
GET  /subjects
GET  /subjects/<subject>/versions
GET  /schemas/ids/<id>

# Register a new schema version under a subject
POST /subjects/<subject>/versions
  { "schema": "{...Avro JSON...}", "schemaType": "AVRO" }

# Check whether a candidate schema is compatible with the latest
POST /compatibility/subjects/<subject>/versions/latest
  { "schema": "{...}" }

# Read or change compatibility config
GET  /config
PUT  /config              { "compatibility": "BACKWARD" }
PUT  /config/<subject>    { "compatibility": "FORWARD" }
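One detail that is easy to miss in the register call above is that the schema travels as a JSON string inside the JSON body, so it is encoded twice. The sketch below builds (but does not send) such a request with the standard library; the base URL, subject, and helper name are placeholders chosen for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_register_request(base_url: str, subject: str, avro_schema: dict) -> urllib.request.Request:
    """Build POST /subjects/<subject>/versions for an Avro schema."""
    body = json.dumps({
        "schema": json.dumps(avro_schema),  # the schema itself is a JSON-encoded string
        "schemaType": "AVRO",
    }).encode("utf-8")
    path = f"/subjects/{urllib.parse.quote(subject, safe='')}/versions"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "order_id", "type": "string"}],
}
req = build_register_request("http://localhost:8081", "orders-value", order_schema)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a registry.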

5. ksqlDB — push, pull, and materialised state

ksqlDB is a SQL-like layer over Kafka Streams. It runs as a separate server (or cluster of servers), reads from Kafka topics, processes records, and writes back to other Kafka topics; the long-running computations are Kafka Streams topologies under the hood. There are two ideas to keep straight: streams vs tables, and push vs pull queries.

Streams and tables

  • STREAM — an append-only sequence of records, backed by a Kafka topic. Insertions produce new records; there is no concept of update.
  • TABLE — the materialised view of the latest value per key. Backed by a compacted Kafka topic and, on the ksqlDB server, by a local state store (RocksDB).

CREATE STREAM orders (
  order_id   STRING KEY,
  customer   STRING,
  amount     DOUBLE
) WITH (
  KAFKA_TOPIC='orders',
  VALUE_FORMAT='AVRO'
);

CREATE TABLE customers (
  customer_id STRING PRIMARY KEY,
  tier        STRING
) WITH (
  KAFKA_TOPIC='customers',
  VALUE_FORMAT='AVRO'
);
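The difference between the two declarations can be sketched outside ksqlDB with plain Python collections. This is only an analogy for the semantics, with made-up sample data: the same sequence of keyed records is kept in full when read as a stream and collapsed to the latest value per key when read as a table.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


def as_stream(records: List[Record]) -> List[Record]:
    """A stream keeps every record in arrival order; nothing is overwritten."""
    return list(records)


def as_table(records: List[Record]) -> Dict[str, str]:
    """A table keeps only the most recent value seen for each key."""
    table: Dict[str, str] = {}
    for key, value in records:
        table[key] = value
    return table


changes = [("c-1", "bronze"), ("c-2", "gold"), ("c-1", "silver")]
print(as_stream(changes))
print(as_table(changes))
```

A point lookup by key against the table view is the Python counterpart of asking for the current value of one customer.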

Push queries (continuous output)

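A SELECT ... EMIT CHANGES stays open and streams a row to the client every time the underlying stream or table changes. Wrapping the same SELECT in CREATE STREAM ... AS (or CREATE TABLE ... AS) turns it into a persistent query that keeps running on the server and writes its output to a new topic. This is how you build a streaming pipeline whose result is itself a new topic.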

-- Continuously project a join of orders + customers, write to a new topic
CREATE STREAM enriched_orders AS
SELECT o.order_id, o.amount, c.tier
FROM   orders o
LEFT JOIN customers c ON o.customer = c.customer_id
EMIT CHANGES;

Pull queries (point lookup)

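A pull query returns a single answer from the materialised state of a table. It is a one-shot SQL query, not a stream subscription. Useful for "look up the current value for this key" without standing up a separate cache. The table has to be materialised on the ksqlDB server for this to work; depending on the ksqlDB version, a table declared directly over a topic may first need to be derived with CREATE TABLE ... AS SELECT (or declared as a source table) before it can be queried this way.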

SELECT tier FROM customers WHERE customer_id = 'c-42';

Windowing

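Aggregations over a stream can be windowed: TUMBLING (fixed-size, non-overlapping), HOPPING (fixed-size, overlapping, advancing by a smaller interval), or SESSION (gap-based, closing after a period of inactivity per key). Without a window clause the aggregate runs over the whole stream. Window state lives in the local RocksDB store and is backed by a changelog topic so a restarted ksqlDB server can rebuild it.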
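To make the tumbling and hopping definitions concrete, the sketch below computes which windows a record timestamp falls into. It assumes windows are half-open intervals [start, end) aligned to the epoch, which is how Kafka Streams time windows behave; the function names are illustrative only.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start_ms inclusive, end_ms exclusive)


def tumbling_window(ts_ms: int, size_ms: int) -> Window:
    """Each timestamp belongs to exactly one fixed-size window."""
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)


def hopping_windows(ts_ms: int, size_ms: int, advance_ms: int) -> List[Window]:
    """A timestamp belongs to every window of `size_ms` that starts on an
    `advance_ms` boundary and still covers it."""
    windows: List[Window] = []
    start = ts_ms - (ts_ms % advance_ms)  # latest window start at or before ts
    while start > ts_ms - size_ms:
        if start >= 0:
            windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


print(tumbling_window(125_000, 60_000))
print(hopping_windows(125_000, 60_000, 30_000))
```

With a 60-second size and a 30-second advance every record lands in two windows, so a hopping aggregate holds roughly twice the state of the equivalent tumbling one.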

6. Cluster Linking & Tiered Storage

The two most distinctive proprietary features are worth understanding in their own right, because they make decisions that change how you architect the surrounding system.

Cluster Linking

Cluster Linking is a broker-level mirror that copies records from a source topic in one cluster to a mirror topic in a destination cluster, preserving offsets. That offset preservation is the key difference from MirrorMaker 2: MM2 consumes from the source and re-produces to the destination, so records receive new offsets there, and a consumer that fails over has to translate its committed offsets using MM2's checkpoint and offset-sync topics. With Cluster Linking, a consumer that fails over reads from the same offset it was at, which makes active / passive disaster-recovery much simpler.

On the destination side, mirror topics are read-only by default. To use the destination as a failover target you promote the mirror (or fail it over if the source is unreachable), which stops mirroring for that topic and converts it to a normal writeable topic. Cluster Linking is configured as a top-level cluster object (a ClusterLink with credentials for the source) rather than as per-topic configuration, and individual topics are added to or removed from the link.

Tiered Storage

Confluent's Tiered Storage moves older log segments off the broker's local disk onto object storage (S3, GCS, Azure Blob). Closed segments are uploaded to the configured object store in the background, while the hot tier (anything inside confluent.tier.local.hotset.ms / ...bytes) also stays on the broker. Once a segment ages out of the hotset, the broker deletes the local copy and the object store holds the only one. When a consumer fetches an offset that lives in the cold tier, the broker fetches the relevant segment back from object storage transparently and serves it.

The effect on operations: per-broker disk size is sized for the hot working set, not for the entire retention. Retention can be measured in weeks or months without scaling broker disks. The trade-off is fetch latency for cold reads (an S3 GET is much slower than a local page-cache read) and the operational overhead of the object store itself (credentials, lifecycle policies, egress costs).
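A back-of-the-envelope calculation shows the size of the effect. The figures below (50 MB/s ingress, replication factor 3, six brokers, 30-day retention, 6-hour hotset) are invented for illustration, and the formula ignores headroom, indexes, and uneven partition placement.

```python
def local_disk_gb_per_broker(ingress_mb_s: float, local_hours: float,
                             replication_factor: int, brokers: int) -> float:
    """Approximate local disk per broker needed to hold `local_hours` of data.

    Uses decimal units (1 GB = 1000 MB) and assumes data is spread evenly.
    """
    total_mb = ingress_mb_s * 3600 * local_hours * replication_factor
    return total_mb / 1000 / brokers


# Without tiering, local disk must hold the full 30-day retention.
full_retention = local_disk_gb_per_broker(50, 30 * 24, replication_factor=3, brokers=6)

# With tiering, local disk only has to hold the 6-hour hotset.
hotset_only = local_disk_gb_per_broker(50, 6, replication_factor=3, brokers=6)

print(f"full retention on local disk: {full_retention:,.0f} GB per broker")
print(f"hotset only:                  {hotset_only:,.0f} GB per broker")
```

Under these assumptions the per-broker disk requirement drops by the ratio of retention to hotset, here 120x, with the remainder paid for as object storage.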

Note that Apache Kafka has since shipped its own tiered storage implementation under KIP-405. The two are not the same codebase, and a Confluent Platform install does not silently switch to the upstream implementation when you upgrade.

7. Confluent Cloud — cluster types, networking, CKU / eCKU

Confluent Cloud sells Kafka clusters in tiers. The publicly documented tiers, with the rough operational shape of each:

  • Basic — multi-tenant, pay-as-you-go. Limited features; intended for development and small workloads.
  • Standard — multi-tenant with stronger throughput guarantees and broader feature set (Cluster Linking on the destination side, RBAC).
  • Dedicated — single-tenant cluster provisioned in CKUs (Confluent Units for Kafka). Full feature set, private networking options, multi-zone HA.
  • Enterprise — newer tier with enterprise-grade SLA and dedicated networking. Treat the exact feature gate as something to check against the current Confluent docs rather than something to encode in your design.

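CKU and eCKU sizing

A CKU (Confluent Unit for Kafka) is the unit in which Dedicated clusters are provisioned: each CKU carries documented limits of roughly tens of MB/s of produce and fetch throughput, a partition count, and a connection budget. Dedicated clusters are resized by adding CKUs (and removing them, with some restrictions) without a rolling broker upgrade visible to clients. The eCKU (Elastic Confluent Unit for Kafka) is the elastic counterpart used by the autoscaling tiers such as Basic, Standard, and Enterprise, where capacity follows load up to a documented ceiling instead of being provisioned up front. The pricing model is per CKU-hour or eCKU-hour plus data transfer and storage; the public docs are the source of truth and the exact numbers move over time.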

Networking

  • Public — TLS over the open Internet, IP-allowlist optional. Cheapest, available on all tiers.
  • Private networking — VPC Peering, AWS PrivateLink, GCP Private Service Connect, Azure Private Link, AWS Transit Gateway. Available on Dedicated and Enterprise tiers depending on the cloud and region. Required when corporate policy forbids data over the public Internet.

What you do not get

Confluent Cloud does not expose the broker JVM, broker config files, JMX, or shell access. Tuning is via documented config knobs through the API; observability is via Confluent's metrics API and Stream Lineage UI. If a debugging path on Apache Kafka requires SSH into the broker, the Cloud equivalent is "raise a support ticket and provide context." This is the standard trade-off of every managed service, but it is worth being explicit about it during a vendor evaluation.

8. CLI & REST surface

Two CLIs and two REST APIs cover most day-to-day work.

confluent CLI

The confluent CLI is the user-facing tool for both Confluent Cloud and Confluent Platform RBAC-aware operations. It speaks to the Confluent Cloud control plane (environments, clusters, schemas, API keys) and to a CP cluster's REST endpoints.

# Confluent Cloud login + cluster bootstrap
confluent login --save
confluent environment list
confluent environment use env-XXXXX
confluent kafka cluster list
confluent kafka cluster use lkc-XXXXX

# Topic and schema operations
confluent kafka topic create orders --partitions 6
confluent kafka topic list
confluent schema-registry schema describe --subject orders-value --version latest
confluent schema-registry compatibility validate --subject orders-value --type avro --schema /tmp/orders.avsc

# Service accounts and API keys
confluent iam service-account create payments-svc --description "payments service"
confluent api-key create --resource lkc-XXXXX --service-account sa-YYYY

Schema Registry REST

Covered above. Any HTTP client works; curl against a Cloud Schema Registry endpoint with the API-key basic-auth header is the simplest portable form. Most CI pipelines that gate schema changes use these endpoints directly rather than the CLI.
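A CI gate of the kind mentioned above boils down to one authenticated POST and one boolean in the response. The sketch below builds the request and interprets a response body without performing any network call; the endpoint URL, key, and secret are placeholders, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def basic_auth_header(api_key: str, api_secret: str) -> str:
    """HTTP basic-auth header value for a Schema Registry API key pair."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_compatibility_request(base_url: str, subject: str, schema: dict,
                                api_key: str, api_secret: str) -> urllib.request.Request:
    """Build POST /compatibility/subjects/<subject>/versions/latest."""
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/compatibility/subjects/{subject}/versions/latest",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.schemaregistry.v1+json",
            "Authorization": basic_auth_header(api_key, api_secret),
        },
    )


def response_is_compatible(response_body: bytes) -> bool:
    """The check endpoint answers with {"is_compatible": true|false}."""
    return bool(json.loads(response_body).get("is_compatible", False))


print(basic_auth_header("key", "secret"))
print(response_is_compatible(b'{"is_compatible": false}'))
```

A pipeline step would send the request with urllib.request.urlopen and fail the build when response_is_compatible returns False.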

ksqlDB REST + ksqldb-cli

ksqlDB exposes a REST API for issuing statements and pull queries, and a CLI (ksqldb-cli) for an interactive session. Push queries are streamed over a long-lived HTTP/2 connection — relevant if you sit ksqlDB behind a proxy or load balancer that defaults to short HTTP timeouts.
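For orientation, the sketch below builds the JSON bodies for the two ksqlDB endpoints most integrations touch: /ksql for statements and /query-stream for queries. It only constructs payloads, since Python's standard library has no HTTP/2 client; the helper names are hypothetical, and the field names should be checked against the REST reference for the ksqlDB version in use.

```python
import json
from typing import Tuple

KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json"


def _terminated(sql: str) -> str:
    """ksqlDB statements must end with a semicolon."""
    stmt = sql.strip()
    return stmt if stmt.endswith(";") else stmt + ";"


def statement_request(ksql: str) -> Tuple[str, str]:
    """Path and body for POST /ksql (DDL, persistent queries, SHOW / DESCRIBE)."""
    return "/ksql", json.dumps({"ksql": _terminated(ksql), "streamsProperties": {}})


def query_stream_request(sql: str) -> Tuple[str, str]:
    """Path and body for POST /query-stream (pull queries and push queries)."""
    return "/query-stream", json.dumps({"sql": _terminated(sql), "properties": {}})


print(statement_request("SHOW STREAMS"))
print(query_stream_request("SELECT tier FROM customers WHERE customer_id = 'c-42'"))
```

A pull query sent this way returns and closes; a query ending in EMIT CHANGES keeps the response stream open, which is where proxy idle timeouts start to matter.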

9. Versioning & release model

Confluent Platform versions track Apache Kafka versions with a small lag. CP X.Y bundles a specific Apache Kafka version, plus the matching Schema Registry, REST Proxy, ksqlDB, Connect, Control Center, CFK, and proprietary feature releases. The Confluent docs publish a compatibility matrix per CP release: which Apache Kafka version is bundled, which JDK is supported, which Schema Registry version pairs with which CP. Treat that matrix as authoritative — particularly when planning an upgrade across a CCL/proprietary feature boundary.

Confluent Cloud has no version that you select; the control plane decides when to roll a new minor across the multi-tenant pool, and Dedicated clusters get advance notice. Cloud-only features (Stream Governance, managed Flink, Stream Lineage) ship continuously and are documented on Confluent's release-notes page rather than as discrete version tags.
