Data Catalogs - Overview

Starplot has a Catalog class that represents a catalog of sky objects. Catalogs are currently supported for the following object types: stars, constellations, and deep sky objects (DSOs).

There are a few officially supported catalogs for each object type, but you can also build your own. The first time you plot an object type with a catalog, Starplot checks whether the catalog path exists. If it doesn't, and the catalog has a URL defined, the catalog is downloaded from that URL. All the officially supported catalogs have download URLs.
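The download-on-first-use behavior described above can be sketched roughly as follows. This is a minimal illustration and not Starplot's implementation: `ensure_catalog` and its injectable `fetch` callable are hypothetical names introduced here, and what Starplot does when the path is missing and no URL is set is not specified on this page.

```python
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlretrieve


def ensure_catalog(
    path: Path,
    url: Optional[str] = None,
    fetch: Callable[[str, Path], object] = urlretrieve,
) -> bool:
    """Download the catalog to `path` only if it is missing and a URL is set.

    Hypothetical helper: returns True if a download happened, False otherwise.
    """
    if path.exists():
        return False  # already on disk, nothing to do
    if url is None:
        return False  # no URL defined, so there is nothing to download
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, path)  # by default, urllib's urlretrieve(url, filename)
    return True
```

Passing `fetch` as a parameter keeps the sketch testable without network access; in real use you would rely on the catalog's own `download_if_not_exists()` method instead.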

All catalogs are stored as Parquet files to allow fast object lookup.

starplot.data.Catalog dataclass

Catalog of objects

path instance-attribute

path: Path | str

Path of the catalog. If using Hive partitions, this should be a glob (e.g. /data/**/*.parquet).
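To see why a glob is needed for Hive-partitioned catalogs, the sketch below builds a small partitioned directory layout in a temporary folder and shows which files a `**/*.parquet` pattern picks up. The partition column `constellation_id` and the file name `part-0.parquet` are made up for illustration; the files are empty placeholders, not real Parquet data.

```python
import tempfile
from pathlib import Path


def make_hive_layout(root: Path) -> None:
    """Create empty placeholder files in Hive-style `column=value` folders."""
    for constellation in ("ori", "uma"):
        part_dir = root / f"constellation_id={constellation}"
        part_dir.mkdir(parents=True)
        (part_dir / "part-0.parquet").touch()


def matching_files(root: Path) -> list[str]:
    """Return the paths (relative to root) matched by the recursive glob."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.parquet"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_hive_layout(root)
    print(matching_files(root))
    # ['constellation_id=ori/part-0.parquet', 'constellation_id=uma/part-0.parquet']
```

A plain file path would match only one of these files, which is why the catalog path should be a glob that spans every partition folder.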

url class-attribute instance-attribute

url: str = None

Remote URL of the catalog. If the catalog doesn't exist at the path then it'll be downloaded from this URL.

hive_partitioning class-attribute instance-attribute

hive_partitioning: bool = False

If the catalog uses Hive partitioning, set this to True.

healpix_nside class-attribute instance-attribute

healpix_nside: int = None

HEALPix resolution (NSIDE)
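As a rough guide to choosing an NSIDE value: a HEALPix grid divides the sky into 12 × NSIDE² equal-area pixels, so doubling NSIDE quadruples the pixel count. The helper names below are hypothetical and only illustrate that arithmetic; they are not part of Starplot.

```python
import math


def healpix_pixel_count(nside: int) -> int:
    """Number of equal-area pixels in a HEALPix grid: 12 * nside^2."""
    return 12 * nside * nside


def healpix_pixel_area_sq_deg(nside: int) -> float:
    """Approximate area of one pixel in square degrees."""
    full_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2  # about 41,253 sq deg
    return full_sky_sq_deg / healpix_pixel_count(nside)


for nside in (4, 8, 16):
    print(nside, healpix_pixel_count(nside), round(healpix_pixel_area_sq_deg(nside), 2))
```

For example, NSIDE = 8 gives 768 pixels of roughly 53.7 square degrees each. Smaller pixels narrow each spatial query but increase the number of distinct ids to track.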

spatial_query_method class-attribute instance-attribute

spatial_query_method: SpatialQueryMethod = GEOMETRY

Method to use for spatial querying on this catalog.

For relatively small catalogs (less than 1 million objects), the geometry method should have good performance.

For larger catalogs, you can substantially improve query performance by defining a healpix_nside on the catalog and setting this query method to SpatialQueryMethod.HEALPIX.
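The guidance above can be expressed as a small decision rule. This is only a sketch: the `SpatialQueryMethod` enum below is a local stand-in that mirrors the two documented values rather than an import from Starplot, and `suggest_query_method` is a hypothetical helper. The one-million threshold comes from the rule of thumb above and is not a hard limit.

```python
from enum import Enum
from typing import Optional


class SpatialQueryMethod(str, Enum):
    """Local stand-in mirroring the documented values (not Starplot's class)."""

    GEOMETRY = "geometry"
    HEALPIX = "healpix"


def suggest_query_method(
    object_count: int, healpix_nside: Optional[int] = None
) -> SpatialQueryMethod:
    """Suggest HEALPIX only for large catalogs that define a healpix_nside."""
    if object_count >= 1_000_000 and healpix_nside is not None:
        return SpatialQueryMethod.HEALPIX
    return SpatialQueryMethod.GEOMETRY


print(suggest_query_method(50_000).value)        # geometry
print(suggest_query_method(5_000_000, 8).value)  # healpix
```

Note that a large catalog without a healpix_nside still falls back to the geometry method, since the HEALPix method depends on that value being defined.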

exists

exists() -> bool

Returns True if the catalog path exists, otherwise False.

download

download(silent: bool = False)

Downloads the catalog from its URL to its path

download_if_not_exists

download_if_not_exists(silent: bool = False)

Downloads the catalog only if it doesn't already exist at its path

healpix_ids_from_extent

healpix_ids_from_extent(extent: Polygon | MultiPolygon) -> list[int]

Returns HEALPix ids from a given polygon or multipolygon

Parameters:

  • extent (Polygon | MultiPolygon) –

    Polygon or multipolygon to get the HEALPix ids for

Returns:

  • list[int]

    List of integer HEALPix ids that are in the geometry (inclusive)

build

build(
    objects: Iterable[SkyObject],
    chunk_size: int = 1000000,
    columns: list[str] = None,
    partition_columns: list[str] = None,
    sorting_columns: list[str] = None,
    compression: str = "snappy",
    row_group_size: int = 200000,
) -> None

Creates the catalog from an iterable of sky objects. Output is one or more Parquet files.

Parameters:

  • objects (Iterable[SkyObject]) –

    Iterable that contains the sky objects for the catalog

  • chunk_size (int, default: 1000000 ) –

    Max number of objects to write per file

  • columns (list[str], default: None ) –

    List of columns to include in the catalog. Only the columns in this list will be written to the Parquet files.

  • partition_columns (list[str], default: None ) –

    List of columns to create Hive partitions for

  • sorting_columns (list[str], default: None ) –

    List of columns to sort by

  • compression (str, default: 'snappy' ) –

    Type of compression to use -- this is passed directly to PyArrow's Parquet writer.

  • row_group_size (int, default: 200000 ) –

    Row group size for the Parquet files
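To illustrate how chunk_size caps the number of objects per output file, the sketch below splits an iterable into batches of at most chunk_size items without loading everything into memory first. Whether Starplot batches objects this way internally is an assumption; `chunked` is a hypothetical helper written only for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(objects: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `chunk_size` items from `objects`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(objects)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # iterable exhausted
        yield chunk


# Ten objects with chunk_size=4 would end up in three batches (files).
sizes = [len(chunk) for chunk in chunked(range(10), 4)]
print(sizes)  # [4, 4, 2]
```

Because the input is consumed lazily, a generator that yields objects can be passed as the iterable, which is helpful when the full catalog would not fit in memory.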

starplot.data.catalogs.SpatialQueryMethod

Options for spatial querying method

GEOMETRY class-attribute instance-attribute

GEOMETRY = 'geometry'

Use the geometry field

HEALPIX class-attribute instance-attribute

HEALPIX = 'healpix'

Use the healpix_index field