Spread SELECTs from reporting apps among cluster nodes

Reporting apps usually generate various customer reports from SELECT query results. The load generated by such SELECTs on ClickHouse cluster may vary depending on the number of online customers and on the generated report types. It is obvious that the load must be limited in order to prevent cluster overload.

All the SELECTs may be routed to a distributed table on a single node. But this increases resource usage (RAM, CPU and network) on the node comparing to other nodes, since it must do final aggregation, sorting and filtering for the data obtained from cluster nodes (shards).

It would be better to create identical distributed tables on each shard and spread SELECTs among all the available shards.

The following minimal chproxy config may be used for this use case:

server:
  http:
      listen_addr: ":9090"

      # Networks with reporting servers.
      allowed_networks: ["10.10.2.0/24"]

users:
  - name: "report"
    to_cluster: "stats-aggregate"
    to_user: "readonly"
    max_concurrent_queries: 6
    max_execution_time: 1m

clusters:
  - name: "stats-aggregate"
    nodes: [
      "10.10.20.1:8123",
      "10.10.20.2:8123"
    ]
    users:
      - name: "readonly"
        password: "****"
Edit this page on GitHub Updated at Tue, Nov 29, 2022