Spread INSERTs among cluster shards

Usually INSERTs are sent from app servers located in a limited number of subnetworks. INSERTs from other subnetworks must be denied.

All the INSERTs may be routed to a distributed table on a single node. But this increases resource usage (CPU and network) on the node comparing to other nodes, since it must parse each row to be inserted and route it to the corresponding node (shard).

It would be better to spread INSERTs among available shards and to route them directly to per-shard tables instead of distributed tables. The routing logic may be embedded either directly into applications generating INSERTs or may be moved to a proxy. Proxy approach is better since it allows re-configuring ClickHouse cluster without modification of application configs and without application downtime. Multiple identical proxies may be started on distinct servers for scalability and availability purposes.

The following minimal chproxy config may be used for this use case:

server:
  http:
      listen_addr: ":9090"

      # Networks with application servers.
      allowed_networks: ["10.10.1.0/24"]

users:
  - name: "insert"
    to_cluster: "stats-raw"
    to_user: "default"

clusters:
  - name: "stats-raw"

    # Requests are spread in `round-robin` + `least-loaded` fashion among nodes.
    # Unreachable and unhealthy nodes are skipped.
    nodes: [
      "10.10.10.1:8123",
      "10.10.10.2:8123",
      "10.10.10.3:8123",
      "10.10.10.4:8123"
    ]
Edit this page on GitHub Updated at Tue, Nov 29, 2022