Motivated by a desire to support community members financially during the coronavirus pandemic, researchers employed 30 local citizen scientists in the Hubble Image Similarity Project. This project quantified the similarities between astronomical images, providing a way to test the results of image-search algorithms.
The Eagle Nebula, pictured here in an image from Kitt Peak National Observatory, is a star-forming region in the Milky Way. Credit: T.A. Rector (NRAO/AUI/NSF and NOIRLab/NSF/AURA) and B.A. Wolpa (NOIRLab/NSF/AURA); CC BY 4.0
Seeking Similarities
Say you have an image of a star-forming region, featuring eye-catching gas clouds, dense and dusty knots, and newborn stars. How would you go about finding other images that resemble yours?
You might start your search with an astronomical image database, using filters for object type or instrument to sift through thousands and thousands of options. But even filtering out everything but star-forming regions might yield vastly different results, given the widely varying shapes, colors, and sizes of these regions.
Or maybe you’ll feed your image into a neural network that has been trained to spot similar images. The results may seem promising, but how can you tell whether the algorithm has found the images that are the most similar? Would another algorithm do better?
An example of individual test images (green squares) extracted from a Hubble Legacy Archive image (red square). Low-contrast areas have been excluded, leaving the galaxy’s spiral arms for analysis. Credit: White & Peek 2025
The Hubble Image Similarity Project
Astronomical image collections rarely contain information about similarities between images in their metadata, and while neural networks appear to excel at gathering similar images, the results of these models are generally unverified. The Hubble Image Similarity Project, led by Richard White (Space Telescope Science Institute) and Josh Peek (Space Telescope Science Institute and Johns Hopkins University), addressed these issues with a team of citizen scientists who generated similarity information for astronomical images, providing a quantitative means to test the results of neural networks.
White and Peek began by amassing a sample of images from the Hubble Legacy Archive. This sample included many different object types, such as galaxies, planetary nebulae, star-forming regions, and star clusters. After trimming and binning the images, converting them to 8-bit grayscale, filtering out low-contrast images, and eliminating satellite trails, image artifacts, and repeated observations of the same patch of sky, 2,098 images of 666 objects remained.
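To make the preparation steps more concrete, here is a minimal sketch of binning, 8-bit conversion, and a contrast cut using NumPy. It is an illustration only, not the authors' pipeline: the function names, the linear min-max scaling, and the `min_std` threshold are all assumptions made for this example.

```python
import numpy as np


def bin_image(img, factor):
    """Block-average a 2D image by an integer factor, trimming ragged edges."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    trimmed = img[:h, :w]
    # Group pixels into factor x factor blocks and average each block.
    blocks = trimmed.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def to_uint8(img):
    """Linearly rescale pixel values to 0..255 (scaling choice is an assumption)."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        # A perfectly flat image carries no contrast; map it to all zeros.
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * (img - lo) / (hi - lo)).astype(np.uint8)


def has_enough_contrast(img8, min_std=10.0):
    """Hypothetical contrast test: keep images whose pixel scatter reaches min_std."""
    return float(img8.std()) >= min_std
```

In this sketch an image would be passed through `bin_image`, then `to_uint8`, and kept only if `has_enough_contrast` returns `True`. Removing satellite trails, artifacts, and repeated pointings would need additional steps that are not shown here.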
Examples of similar images according to the image similarity matrix. In the lower-right corner is a visualization of the similarity data. The semicircle of data points in the bottom half of this visualization represents galaxies, while star clusters occupy the small arc near the top and nebulae sit in the island in the center of the plot. Credit: Adapted from White & Peek 2025
Citizen Scientists, Assemble
White and Peek recruited 30 members of the community within walking distance of the Space Telescope Science Institute to identify similar astronomical images, and the reviewers were paid for their work. In the three phases of the project, reviewers considered test images one at a time and 1) selected all similar images from a set of 15 comparison images, 2) selected the most similar image from a narrowed-down set of 6 comparison images, and finally 3) selected the most similar image from a set of 3 comparison images.
The citizen science team ultimately compared 5.4 million pairs of images, and White and Peek used these comparisons to produce an image similarity matrix. The matrix describes the metaphorical “distance” between the images, with the most similar images being the smallest distance apart.
Similar images resemble one another in terms of structure, texture, and other factors that White and Peek say are “difficult even to describe in words” — for example, the diffuse glow of a galaxy interrupted by a bright star with diffraction spikes, or a nebula speckled with stars and dense dusty clumps. The similarity data from this study are available online and can be used to test the performance of image-search algorithms. In future work, the authors plan to carry out a similar project using images of the Martian landscape.
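One way a distance matrix like this could be used to check an image-search algorithm is to compare nearest neighbors: for each image, see how many of the algorithm's closest matches also appear among the human-derived closest matches. The sketch below assumes both sets of distances are available as square NumPy-compatible arrays. The function names and the top-k overlap metric are choices made for this example, not the method or data format of the published study.

```python
import numpy as np


def top_k_neighbors(dist, k):
    """For each image (row), return indices of the k nearest other images."""
    d = np.array(dist, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)      # never count an image as its own neighbor
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def mean_top_k_overlap(human_dist, algo_dist, k):
    """Average fraction of the algorithm's top-k found in the human top-k."""
    human = top_k_neighbors(human_dist, k)
    algo = top_k_neighbors(algo_dist, k)
    overlaps = [len(set(h) & set(a)) / k for h, a in zip(human, algo)]
    return float(np.mean(overlaps))
```

A score near 1 would mean the algorithm's nearest neighbors largely agree with the reviewers' judgments, while a score near 0 would mean they rarely do. Running the same comparison for two algorithms gives a simple way to ask which one tracks human perception more closely.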
By Kerry Hensley
Citation
“The Hubble Image Similarity Project,” Richard L. White and J. E. G. Peek 2025 AJ 169 306. doi:10.3847/1538-3881/adcb43




 
