You can provision ClickHouse in one of three ways:
- LangSmith-managed ClickHouse
- ClickHouse Cloud, provisioned either directly or through a cloud provider marketplace
- Self-managed ClickHouse on a VM in your cloud provider
Using either of the first two options (LangSmith-managed ClickHouse or ClickHouse Cloud) will provision a ClickHouse service OUTSIDE of your VPC. However, both options support private endpoints, meaning that you can direct traffic to the ClickHouse service without exposing it to the public internet (e.g., via AWS PrivateLink or GCP Private Service Connect). Additionally, LangSmith can be configured so that sensitive information is not stored in ClickHouse. Please contact support via support.langchain.com for more information.
Requirements
- A provisioned ClickHouse instance that your LangSmith application will have network access to (see above for options).
- A user with admin access to the ClickHouse database. This user will be used to create the necessary tables, indexes, and views.
- We support both standalone ClickHouse and externally managed clustered deployments. For clustered deployments, ensure all nodes are running the same version. Note that clustered setups are not supported with bundled ClickHouse installations.
- We only support ClickHouse versions >= 23.9. Use of ClickHouse versions >= 24.2 requires LangSmith v0.6 or later.
- We rely on a few configuration parameters being set on your ClickHouse instance. These are detailed below.
HA Replicated ClickHouse Cluster
If you would like to use a multi-node ClickHouse cluster for HA, we support this with additional required configuration. This setup uses a ClickHouse cluster with multiple nodes where data is replicated via ZooKeeper or ClickHouse Keeper. For more information on ClickHouse replication, see the ClickHouse Data Replication docs. To set up LangSmith with a replicated multi-node ClickHouse deployment:
- You need a ClickHouse cluster that is set up with Keeper or ZooKeeper for data replication and the appropriate settings. See the ClickHouse Replication Setup docs.
- You need to set the `cluster` setting in the LangSmith Configuration section to match your ClickHouse cluster name. This will use the `Replicated` table engines when running the ClickHouse migrations (see the sketch after this list).
- If, in addition to HA, you would like to load balance among the ClickHouse nodes (to distribute reads or writes), we suggest using a load balancer or DNS load balancing to round robin among your ClickHouse servers.
- Note: You must enable your `cluster` setting before launching LangSmith for the first time and running the ClickHouse migrations. This is a requirement because the tables need to be created with a `Replicated` table engine rather than the non-replicated engine type.

With `cluster` enabled, the migrations will create the `Replicated` table engine flavor. This means that data will be replicated among the servers in the cluster. This is a master-master setup where any server can process reads, writes, or merges.
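As an illustration, a minimal sketch of this in the Helm `config.yaml` (the cluster name is a placeholder, and the exact key layout should be verified against your chart version):

```yaml
clickhouse:
  external:
    # Must match the cluster name defined in your ClickHouse server configuration.
    # Enable this before the first launch so migrations create Replicated table engines.
    cluster: "my_cluster"
```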
For an example setup of a replicated ClickHouse cluster, refer to the replicated ClickHouse section in the LangSmith Helm chart repo, under examples.
LangSmith-managed ClickHouse
- If using LangSmith-managed ClickHouse, you will need to set up a VPC peering connection between the LangSmith VPC and the ClickHouse VPC. Please contact support via support.langchain.com for more information.
- You will also need to set up Blob Storage. You can read more about Blob Storage in the Blob Storage documentation.
ClickHouse installations managed by LangSmith use the SharedMergeTree engine, which is automatically clustered and separates compute from storage.
Parameters
You will need to provide several parameters to your LangSmith installation to configure an external ClickHouse database. These parameters include:
- Host: The hostname or IP address of the ClickHouse database
- HTTP Port: The port that the ClickHouse database listens on for HTTP connections
- Native Port: The port that the ClickHouse database listens on for native connections
- Database: The name of the ClickHouse database that LangSmith should use
- Username: The username to use to connect to the ClickHouse database
- Password: The password to use to connect to the ClickHouse database
- Cluster (Optional): The name of the ClickHouse cluster if using an external ClickHouse cluster. When set, LangSmith will run migrations on the cluster and replicate data across instances.
Configuration
With these parameters in hand, you can configure your LangSmith instance to use the provisioned ClickHouse database. You can do this by modifying the `config.yaml` file for your LangSmith Helm chart installation or the `.env` file for your Docker installation.
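For example, a Helm `config.yaml` sketch using these parameters (all values are placeholders, and the exact key names under `clickhouse.external` should be verified against your chart version's values file):

```yaml
clickhouse:
  external:
    enabled: true
    host: "clickhouse.example.com"  # hostname or IP of your ClickHouse instance
    port: "8123"                    # HTTP port
    nativePort: "9000"              # native TCP port
    database: "default"             # database LangSmith should use
    user: "langsmith"               # connection username
    password: "password"            # connection password
    cluster: "my_cluster"           # optional: only for clustered deployments
```

For a Docker installation, the equivalent values are set as environment variables in the `.env` file.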
TLS with ClickHouse
Use this section to configure TLS for ClickHouse connections. For mounting internal/public CAs so LangSmith trusts your ClickHouse server certificate, see Configure custom TLS certificates.
Server TLS (one-way)
To enable TLS for ClickHouse connections:
- Set `tls: true` in your configuration (or use `tlsSecretKey` with an external secret).
- Use the appropriate TLS ports (typically `8443` for HTTP and `9440` for native TCP connections).
- Provide a CA bundle using `config.customCa.secretName` and `config.customCa.secretKey` if using an internal CA.
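A minimal sketch of these steps together in the Helm `config.yaml` (hostnames, ports, and Secret names are placeholders, and the exact placement of the `tls` key should be verified against your chart version):

```yaml
clickhouse:
  external:
    enabled: true
    host: "clickhouse.example.com"
    port: "8443"        # HTTPS port
    nativePort: "9440"  # TLS-enabled native TCP port
    tls: true           # enable TLS for ClickHouse connections
config:
  customCa:
    secretName: "clickhouse-ca"  # Secret containing your internal CA bundle
    secretKey: "ca.crt"          # key within the Secret that holds the PEM bundle
```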
Mutual TLS with client auth (mTLS)
As of LangSmith Helm chart version 0.12.29, we support mTLS for ClickHouse clients. For server-side authentication in mTLS, use the Server TLS steps (custom CA) in addition to the following client certificate configuration. If your ClickHouse server requires client certificate authentication:
- Provide a Secret with your client certificate and key.
- Reference it via `clickhouse.external.clientCert.secretName` and specify the keys with `certSecretKey` and `keySecretKey`.
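A sketch of the corresponding values (the Secret name and the key names within it are placeholders):

```yaml
clickhouse:
  external:
    clientCert:
      secretName: "clickhouse-client-tls"  # Secret holding the client certificate and key
      certSecretKey: "tls.crt"             # key for the client certificate
      keySecretKey: "tls.key"              # key for the client private key
```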
Non-TLS native port for migrations
By default, the migration job connects to port `9000` for migrations. If your ClickHouse instance uses a different non-TLS native port, you can configure it using the `CLICKHOUSE_MIGRATE_NATIVE_PORT` environment variable:
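For a Helm installation, one way to supply this is through the chart's common environment values (a sketch assuming your chart version exposes `commonEnv`; `9001` is a placeholder port):

```yaml
commonEnv:
  - name: CLICKHOUSE_MIGRATE_NATIVE_PORT
    value: "9001"  # the non-TLS native port your ClickHouse instance exposes
```

For a Docker installation, set `CLICKHOUSE_MIGRATE_NATIVE_PORT=9001` in your `.env` file instead.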
Pod security context for certificate volumes
The certificate volumes mounted for mTLS are protected by file access restrictions. To ensure all LangSmith pods can read the certificate files, you must set `fsGroup: 1000` in the pod security context.
You can configure this in one of two ways:
Option 1: Use commonPodSecurityContext
Set the `fsGroup` at the top level to apply it to all pods:
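```yaml
commonPodSecurityContext:
  fsGroup: 1000  # lets all LangSmith pods read the mounted certificate files
```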
Option 2: Set each pod's security context individually
Alternatively, add the `fsGroup` to each pod's security context individually. See the mTLS configuration example for a complete reference.