Landsat images

Peter Ralph

2026-03-02

Data: Landsat images

The NASA/USGS Landsat program provides the longest continuous space-based record of Earth’s land in existence. Landsat data are essential for making informed decisions about our planet’s resources and environment.

picture of a landsat sattelite from NASA via Wikimedia Commons

Data (Landsat 8/9):

table of bands from Wikipedia

photo of Virginia Norwood from https://landsat.gsfc.nasa.gov/article/virginia-t-norwood-mother-landsat

Obtaining the data

It’s free: from earthexplorer.usgs.gov

as GeoTIFF raster files

and there’s a LOT of it: each scene is ~1G

$ ls -lath *.tar
-rw-rw-r-- 1 peter peter 948M Mar  1 08:33 LC08_L2SP_046029_20250602_20250607_02_T1.tar
-rw-rw-r-- 1 peter peter 962M Mar  1 08:29 LC08_L2SP_046029_20200604_20200825_02_T1.tar

screenshot of map from earthexplorer

Raster data in python

Some good references:

import pandas as pd
import geopandas as gpd
import numpy as np
import pyproj
import contextily
import json
import matplotlib.pyplot as plt
import shapely
import rasterio
import rasterio.plot
import rasterio.mask

Files

I’ve picked a day without a lot of clouds (Aug 25, 2020), downloaded the data above Eugene, unpackaged the tar archive and put the basic bands, cropped somewhat for file size, in data/landsat:

Metadata

You’ll need to get the data, linked on the previous slide or from github, and put it in the data/landsat/ subdirectory.

This comes with a JSON metadata file:

ddir = "data/landsat"
fname = "LC08_L2SP_046029_20200604_20200825_02_T1"
with open(f"{ddir}/{fname}_MTL.json", "r") as f:
    md = json.loads(f.read())

list(md['LANDSAT_METADATA_FILE'].keys())
['PRODUCT_CONTENTS',
 'IMAGE_ATTRIBUTES',
 'PROJECTION_ATTRIBUTES',
 'LEVEL2_PROCESSING_RECORD',
 'LEVEL2_SURFACE_REFLECTANCE_PARAMETERS',
 'LEVEL2_SURFACE_TEMPERATURE_PARAMETERS',
 'LEVEL1_PROCESSING_RECORD',
 'LEVEL1_MIN_MAX_RADIANCE',
 'LEVEL1_MIN_MAX_REFLECTANCE',
 'LEVEL1_MIN_MAX_PIXEL_VALUE',
 'LEVEL1_RADIOMETRIC_RESCALING',
 'LEVEL1_THERMAL_CONSTANTS',
 'LEVEL1_PROJECTION_PARAMETERS']

Of note: file names

md['LANDSAT_METADATA_FILE']['PRODUCT_CONTENTS']
{'ORIGIN': 'Image courtesy of the U.S. Geological Survey',
 'DIGITAL_OBJECT_IDENTIFIER': 'https://doi.org/10.5066/P9OGBGM6',
 'LANDSAT_PRODUCT_ID': 'LC08_L2SP_046029_20200604_20200825_02_T1',
 'PROCESSING_LEVEL': 'L2SP',
 'COLLECTION_NUMBER': '02',
 'COLLECTION_CATEGORY': 'T1',
 'OUTPUT_FORMAT': 'GEOTIFF',
 'FILE_NAME_BAND_1': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B1.TIF',
 'FILE_NAME_BAND_2': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B2.TIF',
 'FILE_NAME_BAND_3': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B3.TIF',
 'FILE_NAME_BAND_4': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B4.TIF',
 'FILE_NAME_BAND_5': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B5.TIF',
 'FILE_NAME_BAND_6': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B6.TIF',
 'FILE_NAME_BAND_7': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_B7.TIF',
 'FILE_NAME_BAND_ST_B10': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_B10.TIF',
 'FILE_NAME_THERMAL_RADIANCE': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_TRAD.TIF',
 'FILE_NAME_UPWELL_RADIANCE': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_URAD.TIF',
 'FILE_NAME_DOWNWELL_RADIANCE': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_DRAD.TIF',
 'FILE_NAME_ATMOSPHERIC_TRANSMITTANCE': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_ATRAN.TIF',
 'FILE_NAME_EMISSIVITY': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_EMIS.TIF',
 'FILE_NAME_EMISSIVITY_STDEV': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_EMSD.TIF',
 'FILE_NAME_CLOUD_DISTANCE': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_CDIST.TIF',
 'FILE_NAME_QUALITY_L2_AEROSOL': 'LC08_L2SP_046029_20200604_20200825_02_T1_SR_QA_AEROSOL.TIF',
 'FILE_NAME_QUALITY_L2_SURFACE_TEMPERATURE': 'LC08_L2SP_046029_20200604_20200825_02_T1_ST_QA.TIF',
 'FILE_NAME_QUALITY_L1_PIXEL': 'LC08_L2SP_046029_20200604_20200825_02_T1_QA_PIXEL.TIF',
 'FILE_NAME_QUALITY_L1_RADIOMETRIC_SATURATION': 'LC08_L2SP_046029_20200604_20200825_02_T1_QA_RADSAT.TIF',
 'FILE_NAME_ANGLE_COEFFICIENT': 'LC08_L2SP_046029_20200604_20200825_02_T1_ANG.txt',
 'FILE_NAME_METADATA_ODL': 'LC08_L2SP_046029_20200604_20200825_02_T1_MTL.txt',
 'FILE_NAME_METADATA_XML': 'LC08_L2SP_046029_20200604_20200825_02_T1_MTL.xml',
 'DATA_TYPE_BAND_1': 'UINT16',
 'DATA_TYPE_BAND_2': 'UINT16',
 'DATA_TYPE_BAND_3': 'UINT16',
 'DATA_TYPE_BAND_4': 'UINT16',
 'DATA_TYPE_BAND_5': 'UINT16',
 'DATA_TYPE_BAND_6': 'UINT16',
 'DATA_TYPE_BAND_7': 'UINT16',
 'DATA_TYPE_BAND_ST_B10': 'UINT16',
 'DATA_TYPE_THERMAL_RADIANCE': 'INT16',
 'DATA_TYPE_UPWELL_RADIANCE': 'INT16',
 'DATA_TYPE_DOWNWELL_RADIANCE': 'INT16',
 'DATA_TYPE_ATMOSPHERIC_TRANSMITTANCE': 'INT16',
 'DATA_TYPE_EMISSIVITY': 'INT16',
 'DATA_TYPE_EMISSIVITY_STDEV': 'INT16',
 'DATA_TYPE_CLOUD_DISTANCE': 'INT16',
 'DATA_TYPE_QUALITY_L2_AEROSOL': 'UINT8',
 'DATA_TYPE_QUALITY_L2_SURFACE_TEMPERATURE': 'INT16',
 'DATA_TYPE_QUALITY_L1_PIXEL': 'UINT16',
 'DATA_TYPE_QUALITY_L1_RADIOMETRIC_SATURATION': 'UINT16'}

Opening a raster:

band5 = rasterio.open(ddir + "/" + md['LANDSAT_METADATA_FILE']['PRODUCT_CONTENTS']['FILE_NAME_BAND_5'])
band5.shape
(2635, 7801)
band5.meta
{'driver': 'GTiff',
 'dtype': 'uint16',
 'nodata': 0.0,
 'width': 7801,
 'height': 2635,
 'count': 1,
 'crs': CRS.from_wkt('PROJCS["WGS 84 / UTM zone 10N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-123],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32610"]]'),
 'transform': Affine(30.0, 0.0, 393885.0,
        0.0, -30.0, 4938915.0)}

Viewing the raster:

fig, ax = plt.subplots()
rasterio.plot.show(band5, ax=ax)
ax.set_title("Band 5");

Viewing the raster, in context:

Note the crs!

bbox = gpd.GeoSeries(shapely.geometry.box(*band5.bounds), crs=band5.crs)

fig, ax = plt.subplots(figsize=(10,10))
bbox.plot(ax=ax, alpha=0)
contextily.add_basemap(ax, crs=band5.crs)
rasterio.plot.show(band5, ax=ax, alpha=0.5)
ax.set_title("Band 5");

Values:

raster.read() gets a numpy array of the values. Note the missing data:

plt.hist(
    band5.read().flatten(),
    bins=100,
);

Now, let’s explore!

Things we won’t talk about: (see the docs for a lot more!)

  • USGS provides lots of downstream data products! Sophisticated combinations of bands to help:

    • ignore clouds
    • get useful real-world measurements (snow cover, thermal upwelling…)
  • What the spectral bands “mean”

  • How these bands relate to what we see

  • What people usually use this data for

Goals:

Basically, “make a nice picture”.

Concretely: make a map where color communicates something important about land type.

Two possible workflows:

  1. Pick a few ‘different’ locations on the map; find some combinations of bands that distinguish those; map these to colors.

  2. Do PCA, and map the PCs to colors.

A quick reminder about “colorspace”:

  • Human retinas most commonly have three kinds of color-sensitive photoreceptors.
  • So, our color perception is “three-dimensional”.
  • Printing: RGB (e.g., #3A11FF;) or CMYK
  • Perception: HSL and HSV
  • Plus: transparancy, or alpha: #3A11FF00;
  • On computer monitors, RGB values range from 0 to 255.

So, by “make an image” we mean “make 1 to 3 rectangular arrays of integers between 0 and 255”.

screenshot of a color picker in HSV space

Brainstorming:

  1. Pick a few ‘different’ locations on the map; find some combinations of bands that distinguish those; map these to colors.

  2. Do PCA, and map the PCs to colors.

  • What is the difference between these?
  • How would we actually do (1)?
  • How would we actually do (2)?

Tools

We could just reduce the landsat images to arrays, but we’d like to maintain spatial context.

So, we’ll need to:

  • Extract values from all the bands at particular locations.
  • Extract values from all the bands averaged across polygons (e.g., a circle around a location).
  • Combine together bands arithmetically.
  • Turn the result into an image.

Get the bands in order:

Note: beware; trying to do some things with all of these (e.g., plot them all) might crash your browser.

band_names = [f"BAND_{k}" for k in range(1,7)]
file_names = {
    bn : ddir + "/" + md['LANDSAT_METADATA_FILE']['PRODUCT_CONTENTS']["FILE_NAME_" + bn]
    for bn in band_names
}
bands = { bn: rasterio.open(file_names[bn]) for bn in band_names }

Working with layers

First let’s look at mapping values to images, and rescaling.

Low contrast?

fig, ax = plt.subplots()
rasterio.plot.show(band5, ax=ax)
ax.set_title("Band 5");

Well, that makes sense:

values = band5.read(masked=True).squeeze()
print('data:', values.data.shape, 'mask:', values.mask.shape)
plt.hist(values.data[~values.mask], bins=100);
data: (2635, 7801) mask: (2635, 7801)

Let’s rescale!

And, let’s work with just numpy arrays for a bit.

plt.imshow(values)

Rescaling:

And, let’s work with just numpy arrays for a bit.

a, b = np.quantile(values.data[~values.mask], q=[0.05, 0.95])
values = (values - a) / (b - a)
values = np.minimum(1.0, np.maximum(0.0, values))
values = (255 * values).astype(int)
plt.imshow(values)

How about in color?

def norm_values(bn):
    values = bands[bn].read(masked=True)
    a, b = np.quantile(values.data[~values.mask], q=[0.05, 0.95])
    values = 255*(values - a)/(b - a)
    values[values<0] = 0
    values[values>255] = 255
    return values.astype(int)
    
values = np.ma.array([norm_values(bn) for bn in ["BAND_5", "BAND_6", "BAND_3"]]).squeeze()
values = np.moveaxis(values, [0, 1, 2], [2, 0, 1])
print('values:', values.shape)
plt.imshow(values);
values: (2635, 7801, 3)

Putting this back on a map:

All it takes is the right transform argument!

fig, ax = plt.subplots(figsize=(10,10))
bbox.plot(ax=ax, alpha=0)
contextily.add_basemap(ax, crs=bands["BAND_5"].crs)
rasterio.plot.show(np.moveaxis(values, [0,1,2], [1,2,0]), transform=bands["BAND_5"].transform, ax=ax, alpha=0.5);

Or, with ax.imshow:

fig, ax = plt.subplots(figsize=(10,10))
bbox.plot(ax=ax, alpha=0)
contextily.add_basemap(ax, crs=bands["BAND_5"].crs)
ax.imshow(values, extent=bbox.bounds[['minx', 'maxx', 'miny', 'maxy']].values[0], alpha=0.5);
ax.set_title("Band 5");

Extracting values

We’ll do this “by hand”; another method would use rasterstats.

First: pick some points:

npts = 10
pts = bbox.sample_points(npts, seed=123)
pd.concat([pts, bbox]).explore()
/tmp/ipykernel_382247/352313147.py:2: FutureWarning: The 'seed' keyword is deprecated. Use 'rng' instead.
  pts = bbox.sample_points(npts, seed=123)
Make this Notebook Trusted to load map: File -> Trust Notebook

Pour one out for the NAs

Some of the points will be outside the area of the raster:

bn = 'BAND_5'
xyz = pts.get_coordinates()
xyz[bn] = np.array(list(bands[bn].sample([xy[1:] for xy in xyz.itertuples()]))).flatten()
xyz
x y BAND_5
0 406480.733030 4.879229e+06 7540
0 435052.258031 4.918478e+06 19706
0 437033.534858 4.876763e+06 15259
0 445455.821025 4.925021e+06 18754
0 458611.706316 4.878197e+06 20870
0 553575.806556 4.900415e+06 14375
0 583939.477393 4.909662e+06 13602
0 585732.160050 4.923036e+06 15186
0 602146.586969 4.900826e+06 0
0 609975.429888 4.933177e+06 0

Rejection sampling:

Let’s sample more than we need and restrict to ones coved by the data:

pts = bbox.sample_points(2*npts, seed=123).explode(index_parts=True)[0]
xyz = pts.get_coordinates()
xyz[bn] = np.array(list(bands[bn].sample([xy[1:] for xy in xyz.itertuples()]))).flatten()
pts = pts.loc[xyz[bn] > 0][:npts]
pd.concat([pts, bbox]).explore()
/tmp/ipykernel_382247/1854403256.py:1: FutureWarning: The 'seed' keyword is deprecated. Use 'rng' instead.
  pts = bbox.sample_points(2*npts, seed=123).explode(index_parts=True)[0]
Make this Notebook Trusted to load map: File -> Trust Notebook

Extracting values from circles

A single pixel seems kinda… noisy? Let’s take averages within some buffer around each point.

pts = pts.buffer(5000)
pd.concat([pts, bbox]).explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

Of use: mask

im, mask = rasterio.mask.mask(
    bands['BAND_5'], 
    [pts.iloc[0]],
    crop=True,
    filled=False,
)
print("shape: {im.shape}")
plt.imshow(im.squeeze());
shape: {im.shape}

Let’s get some tooling together:

  1. Write a fuction that gets average values in each of a bunch of polygons from a band.
  2. Apply that function to each of our bands.
def zonal_means(pts, band):
    pass

point_data = gpd.GeoDataFrame(pts)