
Connect a project

Once you've installed Lightdash, you can connect it to your existing dbt project.

After you've set up the connection to your dbt project, you'll need to set up the connection to your warehouse (it's a short step, we promise 🤞).

We currently support:

  1. BigQuery
  2. Postgres
  3. Redshift
  4. Snowflake
  5. Databricks

If we don't support the warehouse you're using, feel free to reach out to us on GitHub! :)

dbt connection options


GitHub

Personal access token

This is used to access your repo. See the instructions for creating a personal access token here.

Select the repo scope when you're creating the token.

screenshot

Repository

This should be in the format my-org/my-repo, e.g. lightdash/lightdash-analytics.
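If you want to sanity-check the value before saving it, the Python sketch below tests whether a string has the my-org/my-repo shape. The helper `is_valid_repository` and its rule (exactly one slash, no whitespace) are illustrative assumptions, not Lightdash's own validation.

```python
import re

# Hypothetical rule: a non-empty org and repo name separated by exactly one "/",
# with no whitespace. Not Lightdash's actual validation.
_REPO_PATTERN = re.compile(r"[^/\s]+/[^/\s]+")


def is_valid_repository(value: str) -> bool:
    """Return True if value looks like 'my-org/my-repo'."""
    return _REPO_PATTERN.fullmatch(value.strip()) is not None
```

The check most often catches a full repository URL that was pasted in place of the org/repo pair.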

Branch

This is the branch in your GitHub repo that Lightdash should sync to, e.g. main, master, or dev.

By default, we've set this to main but you can change it to whatever you'd like.

Project directory path

This is the folder in the GitHub repository you entered above that contains your dbt_project.yml file.

  • Enter / if your dbt_project.yml file is in the root folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml).
  • If your dbt project is in a sub-folder of your repo, enter the path to that sub-folder. For example, if your project is at lightdash/lightdash-analytics/dbt/dbt_project.yml, enter /dbt.
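The rule above can be sketched in Python. `project_dir_path` is a hypothetical helper. It takes the location of dbt_project.yml relative to the repository root (without the my-org/my-repo prefix) and returns the value to enter in this field.

```python
from pathlib import PurePosixPath


def project_dir_path(dbt_project_file: str) -> str:
    """Turn the repo-relative location of dbt_project.yml into the field value.

    Hypothetical helper: 'dbt_project.yml' -> '/', 'dbt/dbt_project.yml' -> '/dbt'.
    """
    parent = PurePosixPath(dbt_project_file.strip("/")).parent
    # The parent of a file in the repo root is '.', which maps to '/'
    return "/" if str(parent) == "." else f"/{parent}"
```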

Host domain

If you use a custom domain for GitHub (for example, a GitHub Enterprise instance), enter that domain here.

By default, this is github.com.

Target name

The target contains information about your dbt connection to your warehouse.

In practice, it determines which dataset/schema in your data warehouse Lightdash looks in for your dbt models. By default, we set this to the default target in your profiles.yml file.

If you want to use a target other than the default one defined in dbt, enter the target of your choice here (for example, dbt_khindson).

To read more about dbt targets, check out the dbt docs here.

Environment variables

If you've used environment variables in your dbt profiles.yml file, you can add them to Lightdash here.

For each environment variable, add its key and value.

You'll normally find these values in a file called .env in your dbt project directory.

For example, your profiles.yml might contain something like:

profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      user: "{{ env_var('DBT_USER') }}"
      ....

And a .env file like:

export DBT_USER="myspecialuserkey123"

In Lightdash, you'd then add a new environment variable with the key DBT_USER and the value myspecialuserkey123.
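If your .env file holds several variables, a short Python sketch like this one can list the key and value pairs to enter in Lightdash. `parse_env_file` is a hypothetical helper, not part of dbt or Lightdash. It assumes simple KEY=value lines with an optional export prefix, and it does not handle variable expansion or multi-line values.

```python
import shlex


def parse_env_file(text: str) -> dict[str, str]:
    """Collect KEY=value pairs from .env-style text such as: export DBT_USER="abc"."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, raw_value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        # shlex.split strips surrounding quotes and trailing comments like a POSIX shell
        parts = shlex.split(raw_value, comments=True)
        pairs[key.strip()] = parts[0] if parts else ""
    return pairs
```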


GitLab

Personal access token

This is used to access your repo. See the instructions for creating a personal access token here.

Select the read_repository scope when you're creating the token.

Repository

This should be in the format my-org/my-repo, e.g. lightdash/lightdash-analytics.

Branch

This is the branch in your GitLab repo that Lightdash should sync to, e.g. main, master, or dev.

By default, we've set this to main but you can change it to whatever you'd like.

Project directory path

This is the folder in the GitLab repository you entered above that contains your dbt_project.yml file.

If your dbt_project.yml file is in the root folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml), you don't need to change anything here. Just leave the default value.

If your dbt project is in a sub-folder of your repo (e.g. lightdash/lightdash-analytics/dbt/dbt_project.yml), enter the path to that sub-folder (e.g. /dbt).

Host domain

If you use a custom domain for GitLab (for example, a self-managed GitLab instance), enter that domain here.

By default, this is gitlab.io.

Target name

The target contains information about your dbt connection to your warehouse.

In practice, it determines which dataset/schema in your data warehouse Lightdash looks in for your dbt models. By default, we set this to the default target in your profiles.yml file.

If you want to use a target other than the default one defined in dbt, enter the target of your choice here (for example, dbt_khindson).

To read more about dbt targets, check out the dbt docs here.

Environment variables

If you've used environment variables in your dbt profiles.yml file, you can add them to Lightdash here.

For each environment variable, add its key and value.

You'll normally find these values in a file called .env in your dbt project directory.

For example, your profiles.yml might contain something like:

profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      user: "{{ env_var('DBT_USER') }}"
      ....

And a .env file like:

export DBT_USER="myspecialuserkey123"

In Lightdash, you'd then add a new environment variable with the key DBT_USER and the value myspecialuserkey123.
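The key you enter must match the name inside env_var() exactly, because that name is what gets looked up when profiles.yml is rendered. The Python sketch below illustrates the idea with a regex. It is a simplified stand-in, since dbt actually renders profiles.yml with Jinja. `render_env_vars` is hypothetical, and it only handles env_var('NAME') and env_var('NAME', 'default') written with single quotes.

```python
import re

# Matches {{ env_var('NAME') }} and {{ env_var('NAME', 'default') }} (single quotes only).
_ENV_VAR = re.compile(r"\{\{\s*env_var\(\s*'([^']+)'\s*(?:,\s*'([^']*)'\s*)?\)\s*\}\}")


def render_env_vars(text: str, env: dict[str, str]) -> str:
    """Replace env_var() placeholders with values from `env` (simplified, not Jinja)."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default  # fall back to the second argument, if given
        raise KeyError(f"environment variable {name!r} is not set and has no default")

    return _ENV_VAR.sub(substitute, text)
```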


Azure DevOps

Personal access token

This is your secret token used to access Azure DevOps. See the instructions to create a personal access token. You must specify at least the Repo:Read scope.

Organization

This is the name of the organization that owns your repository.

Project

This is the name of the project that owns your repository.

Repository

This is the name of the repository. For many projects, this is the same as your project name above.
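If you have the repository's clone URL, you can read the organization, project, and repository names straight from it. Here is a minimal Python sketch. It assumes the https://dev.azure.com/{organization}/{project}/_git/{repository} URL shape and does not handle older *.visualstudio.com URLs. `parse_azure_devops_url` is a hypothetical helper.

```python
from urllib.parse import unquote, urlparse


def parse_azure_devops_url(url: str) -> dict[str, str]:
    """Split https://dev.azure.com/{organization}/{project}/_git/{repository}."""
    parsed = urlparse(url)
    if parsed.hostname != "dev.azure.com":
        raise ValueError(f"not a dev.azure.com URL: {url}")
    # e.g. ['my-org', 'My Project', '_git', 'my-repo']; %20 etc. are decoded
    segments = [unquote(s) for s in parsed.path.split("/") if s]
    if len(segments) < 4 or segments[2] != "_git":
        raise ValueError(f"expected /{{organization}}/{{project}}/_git/{{repository}}: {url}")
    organization, project, _, repository = segments[:4]
    return {"organization": organization, "project": project, "repository": repository}
```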

Branch

This is the branch in your repository that Lightdash should sync to, e.g. main, master, or dev.

By default, we've set this to main but you can change it to whatever you'd like.

Project directory path

This is the folder in the repository you entered above that contains your dbt_project.yml file.

If your dbt_project.yml file is in the root folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml), you don't need to change anything here. Just leave the default value.

If your dbt project is in a sub-folder of your repo (e.g. lightdash/lightdash-analytics/dbt/dbt_project.yml), enter the path to that sub-folder (e.g. /dbt).

Target name

The target contains information about your dbt connection to your warehouse.

In practice, it determines which dataset/schema in your data warehouse Lightdash looks in for your dbt models. By default, we set this to the default target in your profiles.yml file.

If you want to use a target other than the default one defined in dbt, enter the target of your choice here (for example, dbt_khindson).

To read more about dbt targets, check out the dbt docs here.

Environment variables

If you've used environment variables in your dbt profiles.yml file, you can add them to Lightdash here.

For each environment variable, add its key and value.

You'll normally find these values in a file called .env in your dbt project directory.

For example, your profiles.yml might contain something like:

profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      user: "{{ env_var('DBT_USER') }}"
      ....

And a .env file like:

export DBT_USER="myspecialuserkey123"

In Lightdash, you'd then add a new environment variable with the key DBT_USER and the value myspecialuserkey123.

Local dbt project

Prerequisite

This option is unsuitable for production and is only available for Lightdash instances installed on your local machine.

To start Lightdash with the option to connect to a local dbt project, you must specify the directory of the dbt project when you start docker compose:

# Specify the absolute path to your dbt project
# e.g. export DBT_PROJECT_DIR=/Users/elonmusk/mydbtproject
export DBT_PROJECT_DIR= # Enter your path here!
# Create (or recreate) and start the containers so they pick up DBT_PROJECT_DIR
docker compose up -d
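Docker Compose only sees whatever DBT_PROJECT_DIR holds when it runs, so it can help to check the value first. Here is a minimal Python sketch. It assumes you only need the path to be absolute and to contain a dbt_project.yml. `check_dbt_project_dir` is hypothetical and not something Lightdash ships.

```python
import os


def check_dbt_project_dir(path: str) -> list[str]:
    """Return problems with a DBT_PROJECT_DIR value (an empty list if it looks fine)."""
    if not path:
        return ["DBT_PROJECT_DIR is empty"]
    problems = []
    if not os.path.isabs(path):
        problems.append(f"{path!r} is not an absolute path")
    if not os.path.isfile(os.path.join(path, "dbt_project.yml")):
        problems.append(f"no dbt_project.yml found in {path!r}")
    return problems


if __name__ == "__main__":
    # Check the value currently exported in this shell
    for problem in check_dbt_project_dir(os.environ.get("DBT_PROJECT_DIR", "")):
        print(problem)
```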

Bitbucket

Username

This is the login name for your Bitbucket user. This is usually the same username you use to log in to Bitbucket. You can find your username in Bitbucket by:

  1. Clicking your avatar in the bottom left and selecting Personal settings.
  2. On the Account settings page, checking the value next to Username.

Alternatively, you can create a new user through the Bitbucket console with a username and password specifically for Lightdash to use.

HTTP access token

This is used by Lightdash to access your repo. See the instructions for creating an HTTP access token here.

Select the Project read and Repository read scopes when you're creating the token.

Repository

This should be in the format my-org/my-repo, e.g. lightdash/lightdash-analytics.

Branch

This is the branch in your Bitbucket repo that Lightdash should sync to, e.g. main, master, or dev.

By default, we've set this to main but you can change it to whatever you'd like.

Project directory path

This is the folder in the Bitbucket repository you entered above that contains your dbt_project.yml file.

  • Enter / if your dbt_project.yml file is in the root folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml).
  • If your dbt project is in a sub-folder of your repo, enter the path to that sub-folder. For example, if your project is at lightdash/lightdash-analytics/dbt/dbt_project.yml, enter /dbt.

Host domain

If you use a custom domain for your Bitbucket server, enter that domain here.

Target name

The target contains information about your dbt connection to your warehouse.

In practice, it determines which dataset/schema in your data warehouse Lightdash looks in for your dbt models. By default, we set this to the default target in your profiles.yml file.

If you want to use a target other than the default one defined in dbt, enter the target of your choice here (for example, dbt_khindson).

To read more about dbt targets, check out the dbt docs here.

Environment variables

If you've used environment variables in your dbt profiles.yml file, you can add them to Lightdash here.

For each environment variable, add its key and value.

You'll normally find these values in a file called .env in your dbt project directory.

For example, your profiles.yml might contain something like:

profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      user: "{{ env_var('DBT_USER') }}"
      ....

And a .env file like:

export DBT_USER="myspecialuserkey123"

In Lightdash, you'd then add a new environment variable with the key DBT_USER and the value myspecialuserkey123.


Warehouse connection

We always recommend giving Lightdash read-only permissions. That way, you ensure that no data can be manipulated. See each section below for warehouse-specific details.

BigQuery

Project

This is the ID of the Google Cloud Platform project that contains the data you want to connect Lightdash to.

To find your project ID, head to the BigQuery Cloud console.

Once you're in there, click on the project name in the top bar.

get to projects

You should then see all the projects in your organization that you have access to, along with their project IDs.

project id list

For the project you want to connect Lightdash to, copy its ID and paste it into the Project field in the Warehouse Connection form in Lightdash.

Data set

This is the default dataset used by dbt to compile and run your dbt project. You can find this in the dbt Cloud IDE or your local profiles.yml file.

If you're a dbt Cloud user, you can find this under your profile in the dbt Cloud IDE:

dbt-cloud-credentials

If you work with dbt locally, then check your profiles.yml file at ~/.dbt/profiles.yml and look for a field named dataset:

my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset # look for this one!
      ...

Location

This is the location of the BigQuery dataset that your dbt models write their output to.

location may be either a multi-regional location (e.g. EU, US) or a regional location (e.g. us-west2). Check out the BigQuery documentation for more information on dataset locations.

You can find the location of the dataset you're using for your dbt project in your dbt profiles.yml file, or in your BigQuery console.

data location

Key File

To connect to BigQuery, you'll need to have a service account that you can use with Lightdash. You can read more about creating and managing service accounts with Google BigQuery in their docs.

info

If you're creating a new service account, make sure that you save your JSON key file somewhere safe! You'll need it for connecting to Lightdash.

The service account you use with Lightdash will need to have the following roles in your GCP project:

  • roles/bigquery.dataViewer (to see data in your project)
  • roles/bigquery.jobUser (to run queries in your project)

If you need to provide access to data across multiple BigQuery projects, the service account will need to be granted roles/bigquery.dataViewer on each additional BigQuery project.

Once you have a service account all ready to go, you'll need to add its JSON key file to Lightdash in the key file section of the Warehouse Connection page.
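Before uploading, you might want to confirm you've picked the right file. Here is a minimal Python sketch that checks a few of the fields Google includes in a service account JSON key. `check_key_file` is a hypothetical helper. A file that passes this check is not guaranteed to have the roles listed above.

```python
import json

# A few of the fields found in a Google Cloud service account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(raw: str) -> list[str]:
    """Return problems found in a service account key file (an empty list if none)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # User credentials (e.g. type "authorized_user") are not a service account key
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"type is {key['type']!r}, expected 'service_account'")
    return problems
```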

Threads

This is the number of concurrent queries Lightdash will run against BigQuery. Use a higher number if you have a larger warehouse. More threads can improve performance when many users are using Lightdash at the same time.
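A thread count caps how many queries are in flight at once, and additional queries wait for a free slot. The Python sketch below illustrates that with `concurrent.futures.ThreadPoolExecutor` and fake queries. It is an analogy for the setting, not a description of how Lightdash schedules queries internally. `run_queries` is hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_queries(queries: list[str], threads: int) -> int:
    """Run fake queries with at most `threads` in flight; return the peak concurrency seen."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_query(sql: str) -> str:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for the warehouse doing work
        with lock:
            running -= 1
        return sql

    # The pool never runs more than `threads` fake_query calls at the same time
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fake_query, queries))
    return peak
```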

Timeout in seconds

BigQuery supports query timeouts. By default, the timeout is set to 300 seconds. If a query run by Lightdash takes longer than this timeout to complete, then BigQuery may cancel the query and issue the following error:

Operation did not complete within the designated timeout.

To change this timeout, use the Timeout in seconds configuration.

Priority

The priority for the BigQuery jobs that Lightdash executes can be configured with the priority setting in your Warehouse Connection settings. It can be set to either batch or interactive.

For more information on query priority, check out the BigQuery documentation.

Retries

The retries configuration specifies the number of times Lightdash should retry queries that result in unhandled server errors.

For example, setting retries to 5 means that Lightdash will retry BigQuery queries 5 times with a delay. If the query does not succeed after the fifth attempt, then Lightdash will raise an error.

By default, the number of retries is set to 3.
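The retry behavior described above can be sketched in Python. `run_with_retries` is hypothetical, and its details are assumptions. It counts `retries` as extra attempts after the first one, treats every exception as retryable, and doubles the delay after each failure (exponential backoff). Lightdash's actual delay and error handling may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(run_query: Callable[[], T], retries: int = 3, delay_seconds: float = 1.0) -> T:
    """Call run_query(); on failure, retry up to `retries` more times before giving up."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return run_query()
        except Exception:
            if attempt == retries:
                raise  # no retries left: surface the last error
            time.sleep(delay_seconds * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
    raise AssertionError("unreachable")
```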

Maximum bytes billed

If a value for Maximum bytes billed is set, queries executed by Lightdash will fail if they exceed the configured maximum bytes threshold. This configuration should be supplied as an integer number of bytes.

For example, if you set this to 1000000000 and a query would bill more than a gigabyte of data (e.g. 2 GB), BigQuery will reject the query and you'll get an error like this:

  Query exceeded limit for bytes billed: 1000000000. 2000000000 or higher required.
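Because the field takes a raw integer, it's easy to add or drop a zero. Here is a small Python sketch for converting a human-readable limit. It assumes decimal units (1 GB = 1000000000 bytes, as in the example above). `to_bytes` and `would_be_rejected` are hypothetical helpers.

```python
# Decimal units, so 1 GB is 1000000000 bytes as in the example above.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(amount: float, unit: str) -> int:
    """Convert a size such as (1, "GB") into the integer the field expects."""
    return round(amount * UNITS[unit.upper()])


def would_be_rejected(bytes_billed: int, maximum_bytes_billed: int) -> bool:
    """True if a query that bills `bytes_billed` exceeds the configured limit."""
    return bytes_billed > maximum_bytes_billed
```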

Postgres

You can see more details in the dbt documentation.

Host

This is the host where the database is running.

User

This is the database user name.

Password

This is the database user password.

DB name

This is the database name.

Schema

This is the default schema used by dbt to compile and run your dbt project. You can find this in the dbt Cloud IDE or your local profiles.yml file.

If you're a dbt Cloud user, you can find this under your profile in the dbt Cloud IDE:

dbt-cloud-credentials

If you work with dbt locally, then check your profiles.yml file at ~/.dbt/profiles.yml and look for a field named schema:

company-name:
  target: dev
  outputs:
    dev:
      type: postgres
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name]
      schema: [dbt schema] # look for this one!

Port

This is the port where the database is running.

Threads

This is the number of concurrent queries Lightdash will run against your Postgres database. Use a higher number if you have a larger database server. More threads can improve performance when many users are using Lightdash at the same time.

Keep alive idle (seconds)

This specifies the number of seconds with no network activity after which the operating system should send a TCP keepalive message to the database server. You can see more details in the postgresqlco documentation.

Search path

This controls the Postgres "search path". You can see more details in the dbt documentation.

SSL mode

This controls how dbt connects to Postgres databases using SSL. You can see more details in the dbt documentation.
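To see how the fields above line up with standard PostgreSQL connection parameters, here is a Python sketch that builds a libpq keyword/value connection string. It is meant for testing the same credentials outside Lightdash, not as a description of how Lightdash connects internally. `build_libpq_dsn` is hypothetical. host, port, dbname, user, password, keepalives_idle, and sslmode are libpq's own parameter names.

```python
from typing import Optional


def _libpq_quote(value: object) -> str:
    """Wrap a value in single quotes, backslash-escaping backslashes and quotes."""
    text = str(value)
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_libpq_dsn(
    host: str,
    port: int,
    dbname: str,
    user: str,
    password: str,
    keepalives_idle: Optional[int] = None,
    sslmode: Optional[str] = None,
) -> str:
    """Map the connection form fields onto a libpq keyword/value string."""
    params = {"host": host, "port": port, "dbname": dbname, "user": user, "password": password}
    if keepalives_idle is not None:
        params["keepalives_idle"] = keepalives_idle  # idle seconds before the first keepalive probe
    if sslmode is not None:
        params["sslmode"] = sslmode  # e.g. disable, prefer, require, verify-full
    return " ".join(f"{key}={_libpq_quote(value)}" for key, value in params.items())
```

You could pass the result to psycopg2.connect() or to psql to check that the credentials work before entering them in Lightdash.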


Redshift

You can see more details in the dbt documentation.

Host

This is the host where the database is running.

User

This is the database user name.

Password

This is the database user password.

DB name

This is the database name.

Schema

This is the default schema used by dbt to compile and run your dbt project. You can find this in the dbt Cloud IDE or your local profiles.yml file.

If you're a dbt Cloud user, you can find this under your profile in the dbt Cloud IDE:

dbt-cloud-credentials

If you work with dbt locally, then check your profiles.yml file at ~/.dbt/profiles.yml and look for a field named schema:

company-name:
  target: dev
  outputs:
    dev:
      type: redshift
      host: hostname.region.redshift.amazonaws.com
      user: username
      password: password1
      port: 5439
      dbname: analytics
      schema: analytics # look for this one!

Port

This is the port where the database is running.

Threads

This is the number of concurrent queries Lightdash will run against your Redshift cluster. Use a higher number if you have a larger warehouse. More threads can improve performance when many users are using Lightdash at the same time.

Keep alive idle (seconds)

This specifies the number of seconds with no network activity after which the operating system should send a TCP keepalive message to the database server.

If the database closes its connection while Lightdash is waiting for data, you may see the error SSL SYSCALL error: EOF detected. Lowering the keepalives_idle value may prevent this, because keepalive probes are then sent more frequently to keep the connection active.

By default, this value is set to 240 seconds, but can be configured lower (perhaps 120 or 60), at the cost of a chattier network connection.

SSL mode

This controls how dbt connects to Redshift using SSL.

RA3 Node

Allows dbt to use cross-database resources.


Snowflake

You can see more details in the dbt documentation.

Account

This is your Snowflake account identifier.

The easiest way to find this is to log in to Snowflake in your browser. Your account identifier is whatever comes before snowflakecomputing.com in the URL.

snowflake-account-url

For example, in the image above, the user logs in via https://aaa99827.snowflakecomputing.com/console/login#/, so the account identifier is aaa99827.

If you don't have access via the browser, you can use the following format <organization_name>-<account_name> where organization_name and account_name can be found by following any of the methods listed in Managing accounts in your organization.
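Both ways of building the identifier can be sketched in Python. `account_from_url` and `account_from_org` are hypothetical helpers. The first takes whatever comes before .snowflakecomputing.com in the login URL, and the second joins the organization and account names with a hyphen.

```python
from urllib.parse import urlparse

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def account_from_url(url: str) -> str:
    """Return the part of the login URL's host that comes before .snowflakecomputing.com."""
    host = urlparse(url).hostname or ""
    if not host.endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"not a Snowflake URL: {url}")
    return host[: -len(SNOWFLAKE_SUFFIX)]


def account_from_org(organization_name: str, account_name: str) -> str:
    """Build the <organization_name>-<account_name> form of the identifier."""
    return f"{organization_name}-{account_name}"
```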

User

This is the login name for your Snowflake user. This is usually the same username you use to log in to Snowflake.

Alternatively, you can create a new user through the Snowflake console with a username and password specifically for Lightdash to use.

If you're a Snowflake admin, you can list all users available in the Snowflake console:

snowflake-users-list

Password

This is the password for your Snowflake user. This is usually the same password you use to log in to Snowflake:

snowflake-login-form

Alternatively, you can create a new user through the Snowflake console with a username and password specifically for Lightdash to use.

If you're a Snowflake admin, you can list all users available in the Snowflake console:

snowflake-users-list

Role

This is the security role that you would like to use when running queries as the specified user. The role must have access to any warehouses, databases, schemas, and tables you want to use.

If you're a Snowflake admin, you can list all roles available in the Snowflake console:

snowflake-roles-list

You can configure your role to allow read access to all warehouses, databases, schemas, and tables by following the guide for Creating custom read-only roles.

Database

This is the name of your database. The specified user must be granted access to this database. You can see a list of databases available in the Snowflake console:

snowflake-database-list

Warehouse

This is the name of the warehouse you would like to use for running queries. The specified user must be granted access to use this warehouse. You can see a list of warehouses available in the Snowflake console:

snowflake-warehouse-list

Schema

This is the default schema used by dbt to compile and run your dbt project. You can find this in the dbt Cloud IDE or your local profiles.yml file.

If you're a dbt Cloud user, you can find this under your profile in the dbt Cloud IDE:

dbt-cloud-credentials

If you work with dbt locally, then check your profiles.yml file at ~/.dbt/profiles.yml and look for a field named schema:

my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: [account id]
      user: [username]
      password: [password]
      role: [user role]
      database: [database name]
      warehouse: [warehouse name]
      schema: [dbt schema] # Look for this one!

Threads

This is the number of concurrent queries Lightdash will run against your Snowflake warehouse. Use a higher number if you have a larger warehouse. More threads can improve performance when many users are using Lightdash at the same time.

Keep client session alive

This is intended to keep Snowflake sessions alive beyond the typical 4-hour timeout limit. You can see more details in the dbt documentation.

Query tag

A value to tag all queries with, so that you can search for them later in the QUERY_HISTORY view.


Databricks

The credentials needed to connect to your cluster can be found in the ODBC options in your Databricks account:

  1. Go to the Compute tab in the sidebar.
  2. Click the configuration tab for the cluster that you're connecting to Lightdash.
  3. Expand the Advanced options tab.
  4. Open the JDBC/ODBC tab.

databricks connect screenshot

Server hostname

Follow the instructions above to find your ODBC connection details.

HTTP Path

Follow the instructions above to find your ODBC connection details.

Port

Follow the instructions above to find your ODBC connection details.
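The JDBC/ODBC tab shows these values individually. If you've copied the JDBC URL instead, all three are embedded in it. Here is a minimal Python sketch. It assumes a URL shaped like jdbc:spark://<server-hostname>:<port>/default;...;httpPath=<http-path>;... and handles a jdbc:databricks:// prefix the same way. `parse_databricks_jdbc_url` is hypothetical.

```python
def parse_databricks_jdbc_url(url: str) -> dict[str, str]:
    """Pull the server hostname, port, and HTTP path out of a Databricks JDBC URL."""
    for prefix in ("jdbc:databricks://", "jdbc:spark://"):
        if url.startswith(prefix):
            rest = url[len(prefix):]
            break
    else:
        raise ValueError(f"unrecognized JDBC URL prefix: {url}")

    address, *options = rest.split(";")
    host_port = address.split("/", 1)[0]  # drop an optional /default database segment
    host, _, port = host_port.partition(":")
    # Option names are compared case-insensitively here, e.g. httpPath or HTTPPATH
    settings = {
        key.lower(): value
        for key, value in (opt.split("=", 1) for opt in options if "=" in opt)
    }
    return {"server_hostname": host, "port": port, "http_path": settings.get("httppath", "")}
```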

Personal Access Token

Your personal access token can be found in your user settings in Databricks:

  1. Open Settings by clicking the cog ⚙️ in the sidebar and selecting User settings.
  2. Click Generate token. You'll be asked to enter a name and expiry.
  3. Copy the token.

databricks access screenshot

Database

This is the default database name used by dbt for this connection. In Databricks/Spark, the database is also the schema.