Visualizing Light Pollution Data with SQLite and Datasette
How I created data visualizations with Globe At Night's open light pollution dataset—and what they say about the night sky
Background
Citizen science projects allow ordinary people to collect or label data for use by scientists and researchers. This model is powerful because it allows the project’s leaders to gather data from all over the world with a fraction of the time or expense, and it helps scientists share their research to a broad audience.
The DarkSky International and the National Science Foundation’s NOIRLab have created a citizen science project called Globe At Night that allows users to submit observations of the night sky via their website.
To submit an observation, contributors go outside 0-2 hours after sunset but before the Moon rises, wait 10-15 minutes for their eyes to adjust, find a constellation in the night sky, and compare their view to a series of charts. The chart that appears closest to reality determines the limiting magnitude to submit. The form also includes questions about location, cloud cover, and extra comments if applicable. Then, the observation is added to Globe At Night’s data, which they release yearly.
If you want to submit your own observation, check out the submission form here. It only takes a few minutes, and it’ll help map light pollution across the world.
You can also find other citizen science projects with CitizenScience.gov’s project catalog.
Limiting Magnitude
Astronomers use a logarithmic stellar magnitude system to measure the relative brightness of stars in the night sky. The star Vega is the benchmark, with a magnitude of 0. Brighter objects have lower magnitudes on this scale, so the sun has the lowest magnitude and the dimmest stars have the highest magnitudes. The scale is logarithmic, so decreasing magnitude by 1 results in a 2.512x increase in brightness.
Limiting magnitude is the highest magnitude (the stellar magnitude of the faintest star) that you can see during an observation. As limiting magnitude increases, relative visibility increases because you can see dimmer stars. Therefore, an observation with a low limiting magnitude value represents one with a lot of light pollution.
Globe At Night allows contributors to report limiting magnitudes between 0 and 7. For context, here is a table of some common magnitudes:
Object | Stellar Magnitude |
---|---|
The Sun | -26.98 |
The Moon | -12.7 |
Venus | -4.92 |
Jupiter | -2.94 |
Mars | -2.94 |
Mercury | -2.48 |
Saturn | -0.55 |
Sirius (the brightest star in the night sky) | -1.46 |
Polaris (the North Star) | +1.98 |
The faintest stars visible to the naked eye | +6-7 |
Data source: Wikipedia
So, if your limiting magnitude is N, you should be able to see all stars with a stellar magnitude less than N. Globe At Night provides some reference charts that depict common constellations at various limiting magnitudes:
Read more about stellar magnitude here.
Ingesting The Data
Globe At Night publishes data by year in many different formats.
For the purposes of this project, I downloaded the data from each year as a CSV. Observations looked like this:
In total, the 18 files contained over 290,000 rows and weighed about 52 MB. I wanted to run my queries and visualizations on the entire dataset, which was over 290,000 rows, so I couldn’t just shove all the data into a Google Sheet. My first thought was to use SQLite to query and aggregate the data and then graph the results.
We now have a database table with 291,306 rows ready for us to make some queries.
Cleaning
There are a few odd things I’ve found with this dataset. Considering it’s citizen science, some errors are to be expected, but we have to make sure to get rid of them before making visualizations with them. For example, some people failed to add their location, so their latitude and longitude is (0, 0). Some others couldn’t determine a limiting magnitude, so the recorded value is a negative number.
Here are a few of the queries I ran to remove invalid rows:
We are now down to 251,138 rows (40,168 invalid rows!). I also dropped the country, location and sky comment columns since I didn’t need them and wanted to reduce the database’s file size:
The database was now 24.7 MB, which is less than half the size of the original CSVs!
Queries
Now, I’m going to go through some of the questions I had about the data and how I answered them with SQL queries on the observations
table.
The data on the left is the output of the corresponding SQL query.
1. What effect does elevation have on limiting magnitude?
2. Do different constellations have different average limiting magnitudes?
3. How many times was each limiting magnitude observed?
4. What was the average limiting magnitude across all observations?
5. How has the amount of observations recorded changed over the years?
6. Which constellations are the most popular to observe?
7. Has limiting magnitude changed over time?
This graph shows a disturbing trend. Over the past 18 years, the average limiting magnitude of Globe At Night observations has dropped from 3.68 to 2.42. That’s a drop of 1.26 points, an average of 0.07 points per year.
This shows that, assuming there has not been a significant change in data collection methadology or sampling biases over the past 18 years, light pollution globally is now worse than it was 18 years ago.
Datasette
Datasette is an open-source data visualization and analysis tool that I found while researching for this project. It allows you to quicky gain insights from any SQLite database.
I installed it with pip
:
Then, you can just run datasette <filename>
, where <filename>
is the name of the file that contains your database.
After that, I navigated to http://127.0.0.1:8001/ in my browser and clicked on the observations
table.
Datasette will suggest facets for you to add:
I clicked on a few. Facets allow you to quickly see the most common values for a given column in your table:
This immediately tells us that the most common constellation observed was Orion, the most common limiting magnitude was 3, and most observations did not include a cloud cover estimation (-1
instead of a rough numeric value out of 100
).
Datasette also gives us a sample of the data. Clicking on any of the column headers allows you to sort, facet by, or hide the column entirely.
The reason why Datasette is so powerful is that it can be easily extended with Python. It already has a large plugin repository, and installing plugins is as simple as a pip install
. I found one that will help me explore the Globe At Night data: datasette-cluster-map
. It looks for latitude and longitude columns in your table and creates a map of all the points in your dataset.
I installed it with this command:
After restarting Datasette, a map appeared above the data table:
Below, it offers an option to load every row in your table:
That’s more like it!
Clicking on one of the circles zooms you in and displays more, smaller clusters.
Eventually, you can get close enough to see individual observations (the blue map markers).
This plugin can be really useful for visualizing light pollution in areas near you where you can envision light sources causing reduces visibility. You can also compare nearby observations side-by-side to see if light pollution in one area varies or has changed over the years.
It also just looks really cool.
If you play around with this, you will notice that the limiting magnitude in big cities typically doesn’t surpass 2 or 3 due to light pollution caused by street lights and lights on buildings.
Making a globe animation with react-globe.gl
As part of a web page I was making about light pollution and Globe At Night, I wanted to create a heatmap of Globe At Night observations. That way, I could showcase where most of the observations were and explain that the project has contributors worldwide. For this project, I didn’t want to set up or maintain a server, so the site had to be completely static.
First, I searched for existing visualization libraries or implementations I could reference. I was lucky enough to find react-globe.gl, a Three.js globe with some prebuilt visualization layers. All I had to do was gather my data and convert it into the format the library expects.
I knew that I had a lot of data to visualize, so I had to send it to the browser in the most compact format possible. I tried configuring Vite with a Rollup plugin to import CSVs as ES Modules on the client, but the files were too large; the browser really struggled to parse and execute them.
This is when I thought of using SQLite in the browser. I had read about this previously in a Notion engineering blog post and had wanted to try it ever since. Essentially, SQLite can be compiled to WebAssembly and shipped to the browser, allowing you to query a full SQL database in the browser. You can see this as a successor to Web SQL, which was a browser feature that allowed websites to access a SQL database via JavaScript. Check out this excellent blog post for a history of Web SQL and why it eventually failed.
When visitors load the site, a script will:
- Download a precompiled SQLite WASM binary
- Download the full database file
- Query the database for a random sample of observations
- Convert the data into a format usable by
react-globe.gl
- Render a visualization
Is this inefficient? Yes. Would it have been a better user experience to bake the visualizations on the server and only send the necessary data to the client? Yes. But I only had so much time to finish the site, and the technology involved in this is really cool.
I used sql.js to handle loading the SQLite binary and importing the dataset. This made it really easy:
Then, with the data, we can create a visualization using react-globe.gl
:
The Earth texture I used was from here. Here’s what it looks like:
The real component is a bit more complicated, but this is all you need to create a 3D globe and display your data. The full source code is available here.
Here’s what it looks like:
You can see Globe At Night observations from all around the world. Most of them are concentrated in population centers, as you would expect, but the project has far more reach than I originally thought.
Now, let’s take a step back from the data and remember what this endeavor can teach us.
Why is reducing light pollution important?
Keeping light pollution levels low is important for a number of reasons. Here are a few of the most important ones:
- Light pollution affects the circadian rhythms of humans and other animals, which causes sleep disruption.
- Artificial light can confuse animals and throw off their behavioral and migratory patterns.
- Excessive light pollution makes it harder for astronomers to observe the night sky and diminishes its natural beauty.
- Light pollution is a side effect of unnecessary lighting, which is a waste of energy.
What can we do to reduce light pollution?
Luckily, reducing light pollution on a local scale is quite easy! Here are a few things you can do:
- Choose “warmer” LED lights that emit less blue light (more info, research article)
- Use light fixtures that low to the ground, point downward, and shielded to prevent light leakage
- Switch off outdoor lights when not in use, or consider installing timers or motion sensors that automatically turn off lights when they don’t need to be on
- Petition your HOA and/or local government to consider light pollution when choosing outdoor or street lighting
- Make a Globe At Night observation to help scientists map light pollution globally
- Sign petitions advocating for preventing companies from spoiling our night sky with satellite constellations
- Share this information with others!