Data Catalogs - Overview
Starplot has a Catalog class that represents a catalog of sky objects. Catalogs are currently supported for the following object types: stars, constellations, and deep sky objects (DSOs).
There are a few officially supported catalogs for each object type, but you can also build your own. When you first plot an object type with a catalog, if no file exists at the catalog's path and the catalog has a URL defined, then the catalog is downloaded from that URL. All the officially supported catalogs have download URLs.
All catalogs are stored as Parquet files to allow fast object lookup.
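For example, a custom catalog can be defined by pointing the Catalog class at a local Parquet path and, optionally, a remote URL. This is a minimal sketch; the path and URL below are hypothetical placeholders, not an official Starplot catalog:

```python
from starplot.data import Catalog

# Hypothetical custom catalog: a local Parquet file plus a remote source.
my_stars = Catalog(
    path="data/my_stars.parquet",
    url="https://example.com/catalogs/my_stars.parquet",  # placeholder URL
)
```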
starplot.data.Catalog (dataclass)
Catalog of objects.
path (instance-attribute)
Path of the catalog. If using Hive partitions, this should be a glob (e.g. /data/**/*.parquet).
url (class-attribute, instance-attribute)
Remote URL of the catalog. If the catalog doesn't exist at the path, then it'll be downloaded from this URL.
hive_partitioning (class-attribute, instance-attribute)
If the catalog uses Hive partitioning, set this to True.
healpix_nside (class-attribute, instance-attribute)
HEALPix resolution (NSIDE).
spatial_query_method (class-attribute, instance-attribute)
spatial_query_method: SpatialQueryMethod = GEOMETRY
Method to use for spatial querying on this catalog. For relatively small catalogs (less than 1 million objects), the geometry method should have good performance. For larger catalogs, you can improve query performance tremendously by defining a healpix_nside on the catalog and setting this query method to SpatialQueryMethod.HEALPIX.
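As a sketch, a large Hive-partitioned catalog might be configured for HEALPix querying like this. The import location of SpatialQueryMethod and the NSIDE value are assumptions; check starplot's data module for the actual path:

```python
from starplot.data import Catalog, SpatialQueryMethod  # import path of SpatialQueryMethod assumed

# Sketch: a large, Hive-partitioned catalog configured for HEALPix queries.
big_catalog = Catalog(
    path="data/big_catalog/**/*.parquet",  # glob path, since Hive partitions are used
    hive_partitioning=True,
    healpix_nside=64,  # example resolution; must match how the data was built
    spatial_query_method=SpatialQueryMethod.HEALPIX,
)
```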
download_if_not_exists
Downloads the catalog only if it doesn't already exist at its path.
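For example, given the hypothetical my_stars catalog sketched earlier, a download is only triggered when nothing exists at its path yet:

```python
# No-op if data/my_stars.parquet already exists; otherwise fetches from `url`.
my_stars.download_if_not_exists()
```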
healpix_ids_from_extent
Returns HEALPix ids from a given polygon or multipolygon.

Parameters:
- extent (Polygon | MultiPolygon) – Polygon or multipolygon to get the HEALPix ids for

Returns:
- list[int] – List of integer HEALPix ids that are in the geometry (inclusive)
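A short sketch of how this might be used with a Shapely polygon. The coordinates here are hypothetical, and the expected coordinate system is an assumption; check your catalog's conventions:

```python
from shapely.geometry import Polygon

# Hypothetical rectangular extent (assumed RA/Dec degrees).
extent = Polygon([(80, -10), (90, -10), (90, 0), (80, 0)])

# All HEALPix cells that the extent touches (inclusive).
cell_ids = big_catalog.healpix_ids_from_extent(extent)
```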
build
build(
objects: Iterable[SkyObject],
chunk_size: int = 1000000,
columns: list[str] = None,
partition_columns: list[str] = None,
sorting_columns: list[str] = None,
compression: str = "snappy",
row_group_size: int = 200000,
) -> None
Creates the catalog from an iterable of sky objects. Output is one or more Parquet files.
Parameters:
- objects (Iterable[SkyObject], required) – Iterable that contains the sky objects for the catalog
- chunk_size (int, default: 1000000) – Max number of objects to write per file
- columns (list[str], default: None) – List of columns to include in the catalog. Only the columns in this list will be written to the Parquet files.
- partition_columns (list[str], default: None) – List of columns to create Hive partitions for
- sorting_columns (list[str], default: None) – List of columns to sort by
- compression (str, default: 'snappy') – Type of compression to use; this is passed directly to PyArrow's Parquet writer.
- row_group_size (int, default: 200000) – Row group size for the Parquet files
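Putting it together, building a custom catalog might look like the sketch below. Here load_my_sky_objects is a hypothetical helper standing in for however you assemble an iterable of SkyObject instances, and the column names are placeholders; use the columns that actually exist on your objects:

```python
from starplot.data import Catalog

# `my_objects` is assumed to be an iterable of SkyObject instances,
# e.g. loaded from another data source (hypothetical helper below).
my_objects = load_my_sky_objects()

catalog = Catalog(path="data/custom/**/*.parquet", hive_partitioning=True)
catalog.build(
    objects=my_objects,
    columns=["ra", "dec", "magnitude", "healpix"],  # hypothetical column names
    partition_columns=["healpix"],
    sorting_columns=["magnitude"],
)
```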