Geo Data using the Cell Tower Dataset
Goal
In this guide you will learn how to:
- Load the OpenCelliD data in ClickHouse
- Connect Apache Superset to ClickHouse
- Build a dashboard based on data available in the dataset
Here is a preview of the dashboard created in this guide:
Get the dataset
This dataset is from OpenCelliD - The world's largest Open Database of Cell Towers.
As of 2021, it contains more than 40 million records about cell towers (GSM, LTE, UMTS, etc.) around the world with their geographical coordinates and metadata (country code, network, etc.).
OpenCelliD Project is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, and we redistribute a snapshot of this dataset under the terms of the same license. The up-to-date version of the dataset is available to download after sign in.
- ClickHouse Cloud
- Self-managed
Load the sample data
ClickHouse Cloud provides an easy-button for uploading this dataset from S3. Log in to your ClickHouse Cloud organization, or create a free trial at ClickHouse.cloud.
Select your service, followed by Data sources -> Predefined sample data.
Choose the Cell Towers dataset from the Sample data tab, and Load data:
Examine the schema of the cell_towers table
If you need a SQL client connection, your ClickHouse Cloud service has an associated web based SQL console; expand Connect to SQL console below for details.
Connect to SQL console
From your ClickHouse Cloud services list, click on a service.

This will redirect you to the SQL console.

This is the output of DESCRIBE.  Down further in this guide the field type choices will be described.
- Create a table:
- Import the dataset from a public S3 bucket (686 MB):
Run some example queries
- A number of cell towers by type:
- Cell towers by mobile country code (MCC):
Based on the above query and the MCC list, the countries with the most cell towers are: the USA, Germany, and Russia.
You may want to create a Dictionary in ClickHouse to decode these values.
Use case: incorporate geo data
Using the pointInPolygon function.
- Create a table where we will store polygons:
- ClickHouse Cloud
- Self-managed
- This is a rough shape of Moscow (without "new Moscow"):
- Check how many cell towers are in Moscow:
Review of the schema
Before building visualizations in Superset have a look at the columns that you will use. This dataset primarily provides the location (Longitude and Latitude) and radio types at mobile cellular towers worldwide. The column descriptions can be found in the community forum. The columns used in the visualizations that will be built are described below
Here is a description of the columns taken from the OpenCelliD forum:
| Column | Description | 
|---|---|
| radio | Technology generation: CDMA, GSM, UMTS, 5G NR | 
| mcc | Mobile Country Code: 204is The Netherlands | 
| lon | Longitude: With Latitude, approximate tower location | 
| lat | Latitude: With Longitude, approximate tower location | 
To find your MCC check Mobile network codes, and use the three digits in the Mobile country code column.
The schema for this table was designed for compact storage on disk and query speed.
- The radiodata is stored as anEnum8(UInt8) rather than a string.
- mccor Mobile country code, is stored as a- UInt16as we know the range is 1 - 999.
- lonand- latare- Float64.
None of the other fields are used in the queries or visualizations in this guide, but they are described in the forum linked above if you are interested.
Build visualizations with Apache Superset
Superset is easy to run from Docker.  If you already have Superset running, all you need to do is add ClickHouse Connect with pip install clickhouse-connect.  If you need to install Superset open the Launch Apache Superset in Docker directly below.
Launch Apache Superset in Docker
Superset provides installing Superset locally using Docker Compose instructions.  After checking out the Apache Superset repo from GitHub you can run the latest development code, or a specific tag.  We recommend release 2.0.0 as it is the latest release not marked as pre-release.
There are a few tasks to be done before running docker compose:
- Add the official ClickHouse Connect driver
- Obtain a Mapbox API key and add that as an environment variable (optional)
- Specify the version of Superset to run
The commands below are to be run from the  top level of the GitHub repo, superset.
Official ClickHouse connect driver
To make the ClickHouse Connect driver available in the Superset deployment add it to the local requirements file:
Mapbox
This is optional, you can plot location data in Superset without a Mapbox API key, but you will see a message telling you that you should add a key and the background image of the map will be missing (you will only see the data points and not the map background). Mapbox provides a free tier if you would like to use it.
Some of the sample visualizations that the guides have you create use location, for example longitude and latitude, data. Superset includes support for Mapbox maps. To use the Mapbox visualizations you need a Mapbox API key. Sign up for the Mapbox free tier, and generate an API key.
Make the API key available to Superset:
Deploy Superset version 2.0.0
To deploy release 2.0.0 run:
To build a Superset dashboard using the OpenCelliD dataset you should:
- Add your ClickHouse service as a Superset database
- Add the table cell_towers as a Superset dataset
- Create some charts
- Add the charts to a dashboard
Add your ClickHouse service as a Superset database
To connect to ClickHouse with HTTP(S) you need this information:
- 
The HOST and PORT: typically, the port is 8443 when using TLS or 8123 when not using TLS. 
- 
The DATABASE NAME: out of the box, there is a database named default, use the name of the database that you want to connect to.
- 
The USERNAME and PASSWORD: out of the box, the username is default. Use the username appropriate for your use case.
The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click Connect:
Choose HTTPS, and the details are available in an example curl command.
If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
In Superset a database can be added by choosing the database type, and then providing the connection details. Open Superset and look for the +, it has a menu with Data and then Connect database options.
Choose ClickHouse Connect from the list:
If ClickHouse Connect is not one of your options, then you will need to install it. The command is pip install clickhouse-connect, and more info is available here.
Add your connection details
Make sure that you set SSL on when connecting to ClickHouse Cloud or other ClickHouse systems that enforce the use of SSL.
Add the table cell_towers as a Superset dataset
In Superset a dataset maps to a table within a database.  Click on add a dataset and choose your ClickHouse service, the database containing your table (default), and choose the cell_towers table:
Create some charts
When you choose to add a chart in Superset you have to specify the dataset (cell_towers) and the chart type.  Since the OpenCelliD dataset provides longitude and latitude coordinates for cell towers we will create a Map chart.  The deck.gL Scatterplot type is suited to this dataset as it works well with dense data points on a map.
Specify the query used for the map
A deck.gl Scatterplot requires a longitude and latitude, and one or more filters can also be applied to the query. In this example two filters are applied, one for cell towers with UMTS radios, and one for the Mobile country code assigned to The Netherlands.
The fields lon and lat contain the longitude and latitude:
Add a filter with mcc = 204 (or substitute any other mcc value):
Add a filter with radio = 'UMTS' (or substitute any other radio value, you can see the choices in the output of DESCRIBE TABLE cell_towers):
This is the full configuration for the chart that filters on radio = 'UMTS' and mcc = 204:
Click on UPDATE CHART to render the visualization.
Add the charts to a dashboard
This screenshot shows cell tower locations with LTE, UMTS, and GSM radios. The charts are all created in the same way, and they are added to a dashboard.
The data is also available for interactive queries in the Playground.
This example will populate the username and even the query for you.
Although you cannot create tables in the Playground, you can run all of the queries and even use Superset (adjust the host name and port number).
