2026-03-09
For more than 15 years, we’ve asked volunteers to help us explore galaxies near and far, sampling a fraction of the roughly one hundred billion that are scattered throughout the observable Universe. Each one of the systems, containing billions of stars, has had a unique life, interacting with its surroundings and with other galaxies in many different ways; the aim of the Galaxy Zoo team is to try and understand these processes, and to work out what galaxies can tell us about the past, present and future of the Universe as a whole.
next slide from NASA/ESA/CSA James Webb Space Telescope
https://data.galaxyzoo.org/
The original Galaxy Zoo project ran from July 2007 until February 2009. It was replaced by Galaxy Zoo 2, Galaxy Zoo: Hubble, and Galaxy Zoo: CANDELS. In the original Galaxy Zoo project, volunteers classified images of Sloan Digital Sky Survey galaxies as belonging to one of six categories - elliptical, clockwise spiral, anticlockwise spiral, edge-on , star/don’t know, or merger.
Goals:

We’ll use HSV space:
Hue is circular (think rainbow)Saturation is white -> colorfulValue is black -> colorfulLet’s compare color values in the center to values near the edge:
Next, the average value across the image:
RGB = pd.DataFrame(
np.array([np.asarray(Image.open(fn)).mean(axis=(0,1)) for fn in filenames]),
columns=("R", "G", "B"),
)
HSV = pd.DataFrame(
np.array([np.asarray(Image.open(fn).convert("HSV")).mean(axis=(0,1)) for fn in filenames]),
columns=("H", "S", "V"),
)
colors = pd.concat([RGB, HSV], axis=1)def plot_extremes(filenames, x, n=3):
fig, axes = plt.subplots(2, n)
ix = np.argsort(x)
for ax, k in zip(axes[0], ix[:n]):
ax.imshow(Image.open(filenames[k]))
ax.set_title(f"{k}: {x[k]:.2f}")
ax.set_axis_off()
for ax, k in zip(axes[1], ix[-n:]):
ax.imshow(Image.open(filenames[k]))
ax.set_title(f"{k}: {x[k]:.2f}")
ax.set_axis_off();def im_to_array(fn, size):
im = Image.open(fn).resize(size=size)
return np.array([np.asarray(x) for x in im.split()]).flatten()
def images_to_array(filenames, size):
return np.array([im_to_array(fn, size=size) for fn in filenames])
size = (100, 110)
X = images_to_array(filenames, size=size).astype("float")
X -= X.mean(axis=1)[:,np.newaxis]
X.shape(5000, 33000)
pca = sklearn.decomposition.PCA(n_components=12).fit(X)
pcs = pd.DataFrame(
pca.transform(X), columns=[f"PC{k+1}" for k in range(pca.n_components_)]
)
ij = pd.Series([f"{x}.{i}.{j}" for x in ('r', 'g', 'b') for i in range(size[0]) for j in range(size[1])])
loadings = pd.concat([
pd.DataFrame(pca.components_.T, index=ij, columns=[f"PC{k+1}" for k in range(pca.n_components_)])
.reset_index(names='variable'),
pd.DataFrame.from_records(ij.str.split("."), columns=('rgb', 'row', 'col')),
], axis=1)
loadings['row'] = loadings['row'].astype('int')
loadings['col'] = loadings['col'].astype('int')
loadings| variable | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 | PC12 | rgb | row | col | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | r.0.0 | -0.002556 | -0.000460 | 0.000462 | -0.006387 | -0.002465 | 0.001673 | 0.007242 | -0.000868 | 0.001055 | 0.005432 | 0.010187 | -0.001327 | r | 0 | 0 |
| 1 | r.0.1 | -0.002568 | -0.000415 | 0.000409 | -0.006656 | -0.002117 | 0.001825 | 0.007844 | -0.001226 | 0.001306 | 0.006359 | 0.010463 | -0.002005 | r | 0 | 1 |
| 2 | r.0.2 | -0.002496 | -0.000585 | 0.000376 | -0.006825 | -0.001908 | 0.001726 | 0.008319 | -0.000801 | 0.001367 | 0.006715 | 0.011059 | -0.002117 | r | 0 | 2 |
| 3 | r.0.3 | -0.002542 | -0.000692 | 0.000187 | -0.007098 | -0.002292 | 0.001729 | 0.008218 | -0.000497 | 0.000891 | 0.006882 | 0.011823 | -0.002481 | r | 0 | 3 |
| 4 | r.0.4 | -0.002634 | -0.000789 | -0.000240 | -0.007402 | -0.002236 | 0.001496 | 0.008715 | 0.000083 | 0.000841 | 0.007138 | 0.012801 | -0.002597 | r | 0 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32995 | b.99.105 | -0.002404 | -0.000214 | 0.000206 | 0.004532 | -0.003090 | -0.004303 | 0.001218 | 0.004700 | 0.003450 | 0.000932 | -0.003604 | 0.002391 | b | 99 | 105 |
| 32996 | b.99.106 | -0.002451 | -0.000150 | 0.000306 | 0.004506 | -0.003063 | -0.004476 | 0.001253 | 0.004087 | 0.003394 | 0.001102 | -0.003609 | 0.001687 | b | 99 | 106 |
| 32997 | b.99.107 | -0.002275 | 0.000036 | 0.000373 | 0.004389 | -0.003284 | -0.004167 | 0.001216 | 0.004093 | 0.003440 | 0.001188 | -0.003503 | 0.001381 | b | 99 | 107 |
| 32998 | b.99.108 | -0.002260 | 0.000057 | 0.000450 | 0.003915 | -0.003304 | -0.003941 | 0.000727 | 0.004434 | 0.003355 | 0.000867 | -0.003392 | 0.001677 | b | 99 | 108 |
| 32999 | b.99.109 | -0.002334 | 0.000013 | 0.000459 | 0.003671 | -0.003350 | -0.003447 | 0.000463 | 0.003925 | 0.003687 | 0.000608 | -0.003066 | 0.001869 | b | 99 | 109 |
33000 rows × 16 columns