Data Catalogs - Overview

Starplot has a Catalog class that represents a catalog of sky objects. Catalogs are currently supported for the following object types: stars, constellations, and deep sky objects (DSOs).

There are a few officially supported catalogs for each object type, but you can also build your own. The first time you plot an object type with a catalog, Starplot downloads the catalog if its path doesn't already exist and it has a URL defined. All the officially supported catalogs have download URLs.

All catalogs are stored as Parquet files to allow fast object lookup.

starplot.data.Catalog `dataclass`

Catalog of objects

path `instance-attribute`

path: Path | str

Path of the catalog. If using Hive partitions, this should be a glob (e.g. /data/**/*.parquet).
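To illustrate why a glob is needed for Hive partitions, here is a minimal sketch using only the standard library. It creates empty placeholder files in a Hive-style directory layout and shows that a `**/*.parquet` pattern matches the files in every partition directory. The partition column name `constellation_id` and the file names are hypothetical, and this is not Starplot code.

```python
import tempfile
from pathlib import Path


def make_hive_layout(root: Path) -> None:
    """Create empty placeholder files in a Hive-style directory layout."""
    for value in ("ori", "uma"):  # hypothetical partition values
        part_dir = root / f"constellation_id={value}"
        part_dir.mkdir(parents=True)
        (part_dir / "part-0.parquet").touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_hive_layout(root)
    # "**/*.parquet" recurses into every partition directory under root
    matched = sorted(
        p.relative_to(root).as_posix() for p in root.glob("**/*.parquet")
    )

print(matched)
```

A path pointing at a single file would only ever see one partition, which is why the glob form is used for partitioned catalogs.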

url `class-attribute` `instance-attribute`

url: str = None

Remote URL of the catalog. If the catalog doesn't exist at the path then it'll be downloaded from this URL.

hive_partitioning `class-attribute` `instance-attribute`

hive_partitioning: bool = False

Set this to True if the catalog uses Hive partitioning.

healpix_nside `class-attribute` `instance-attribute`

healpix_nside: int = None

HEALPix resolution (NSIDE)

spatial_query_method `class-attribute` `instance-attribute`

spatial_query_method: SpatialQueryMethod = GEOMETRY

Method to use for spatial querying on this catalog.

For relatively small catalogs (fewer than 1 million objects), the geometry method should have good performance.

For larger catalogs, you can greatly improve query performance by defining a healpix_nside on the catalog and setting this query method to SpatialQueryMethod.HEALPIX.
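When choosing a healpix_nside, it may help to know how NSIDE relates to pixel count and size. In the standard HEALPix scheme the sphere is divided into 12 × NSIDE² equal-area pixels, so a larger NSIDE means more, smaller pixels. The helper functions below are illustrative only (their names are hypothetical) and say nothing about how Starplot uses the value internally.

```python
import math


def healpix_pixel_count(nside: int) -> int:
    """Total number of HEALPix pixels on the sphere for a given NSIDE."""
    return 12 * nside * nside


def healpix_pixel_area_sq_deg(nside: int) -> float:
    """Area of each equal-area pixel, in square degrees."""
    full_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2  # about 41,253
    return full_sky_sq_deg / healpix_pixel_count(nside)


for nside in (1, 8, 64):
    count = healpix_pixel_count(nside)
    area = healpix_pixel_area_sq_deg(nside)
    print(f"nside={nside}: {count} pixels, {area:.3f} sq deg each")
```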

exists

exists() -> bool

Returns True if the catalog path exists, otherwise False.

download

download(silent: bool = False)

Downloads the catalog from its URL to its path.

download_if_not_exists

download_if_not_exists(silent: bool = False)

Downloads the catalog only if it doesn't already exist at its path.
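The "download only if missing" behavior described here can be sketched with the standard library. This is an illustration of the general pattern, not Starplot's implementation: `download_if_missing` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP request so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def download_if_missing(path: Path, fetch: Callable[[], bytes]) -> bool:
    """Write fetched bytes to `path` only when nothing exists there yet.

    Returns True if a download happened, False if the file was already present.
    """
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return True


def fake_fetch() -> bytes:
    # Hypothetical stand-in for an HTTP request to the catalog's URL
    return b"placeholder catalog bytes"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "catalogs" / "stars.parquet"
    first = download_if_missing(target, fake_fetch)   # file missing: fetches
    second = download_if_missing(target, fake_fetch)  # file present: skips

print(first, second)
```

The second call is a no-op, so repeated calls against an existing path do not fetch the data again.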

healpix_ids_from_extent

healpix_ids_from_extent(extent: Polygon | MultiPolygon) -> list[int]

Returns HEALPix ids from a given polygon or multipolygon

Parameters:

extent (Polygon | MultiPolygon) –

Polygon or multipolygon to get the HEALPix ids for

Returns:

list[int] –

List of integer HEALPix ids that are in the geometry (inclusive)

build

build(
    objects: Iterable[SkyObject],
    chunk_size: int = 1000000,
    columns: list[str] = None,
    partition_columns: list[str] = None,
    sorting_columns: list[str] = None,
    compression: str = "snappy",
    row_group_size: int = 200000,
) -> None

Creates the catalog from an iterable of sky objects. Output is one or more Parquet files.

Parameters:

objects (Iterable[SkyObject]) –

Iterable that contains the sky objects for the catalog
chunk_size (int, default: 1000000 ) –

Max number of objects to write per file
columns (list[str], default: None ) –

List of columns to include in the catalog. Only the columns in this list will be written to the Parquet files.
partition_columns (list[str], default: None ) –

List of columns to create Hive partitions for
sorting_columns (list[str], default: None ) –

List of columns to sort by
compression (str, default: 'snappy' ) –

Type of compression to use. This is passed directly to PyArrow's Parquet writer.
row_group_size (int, default: 200000 ) –

Row group size for the Parquet files
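Since chunk_size caps the number of objects written per file, an iterable larger than chunk_size ends up split across several files. The sketch below shows one common way to batch an iterable lazily. It is an assumption about the general technique, not a description of how build is implemented, and `chunked` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `chunk_size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# 2,500 stand-in objects with a cap of 1,000 per chunk
chunk_lengths = [len(c) for c in chunked(range(2_500), chunk_size=1_000)]
print(chunk_lengths)  # [1000, 1000, 500]
```

Because the input is consumed one chunk at a time, a generator of sky objects never has to be held in memory all at once.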

starplot.data.catalogs.SpatialQueryMethod

Options for spatial querying method

GEOMETRY `class-attribute` `instance-attribute`

GEOMETRY = 'geometry'

Use the geometry field

HEALPIX `class-attribute` `instance-attribute`

HEALPIX = 'healpix'

Use the healpix_index field
