The four data classes
Every price in the catalog belongs to exactly one of these four classes, and the UI labels each row with a source badge so you never have to guess:
- LIVE
- Fetched directly from the provider's public pricing API and stored unmodified. AWS EC2 on-demand and reserved pricing is live via the AWS Bulk Pricing API, and AWS spot pricing is enriched from the EC2 `DescribeSpotPriceHistory` API when the refresh worker has AWS credentials with the required permission. AWS RDS pricing is in Preview while we verify imported RDS catalog coverage against the dashboard gate. Azure VMs, SQL, Blob, and spot are all live via the Azure Retail Prices API. GCP Compute on-demand, spot, 1-year, and 3-year pricing is live via the Cyclenerd/google-cloud-pricing-cost-calculator dataset (licensed under Apache 2.0 and CC-BY 4.0; credited in our attribution doc).
- DERIVED
- Computed from one or more LIVE rows by a documented formula. Examples: cost-per-vCPU, cost-per-GB-memory, region-delta percentages, cheapest-on-demand within a family. Every derived number carries a reference to the LIVE rows it was computed from so you can trace it back. We don't derive future prices, we don't derive prices for providers we don't fetch, and we don't derive anything that could reasonably be a matter of opinion.
- SEEDED
- Hand-curated from published vendor pricing pages because no public API exists, or because the API only returns partial data. Currently seeded: SaaS subscription reference pricing (Microsoft 365, Slack, Datadog, PagerDuty, etc., rotated quarterly) and AI model per-token inference rates. Seeded rows are clearly labeled `seeded` in the catalog UI. We will replace them with LIVE fetchers as vendor APIs become available.
- UNAVAILABLE
- Explicitly marked as `null` when the provider does not publish the price or when a provider requires an authenticated feed that has not been configured. Example: AWS `spotHourly` is only populated when the catalog refresh worker can call EC2's spot-history API; otherwise `spotHourly` stays `null` on AWS rows and the UI shows “—” instead of a number. We never invent a placeholder value just to fill a cell. Azure Cosmos commitment fields work the same way until Azure publishes them via its Retail Prices API.
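One way to picture the four classes is as a tag carried by each catalog row. The sketch below is illustrative only: the type names, the `source` and `derivedFrom` fields, and the `costPerVcpu` and `formatPrice` helpers are assumptions made for this example, not the catalog's actual schema. It shows a DERIVED value keeping references to the LIVE rows it came from, and an unavailable spot price staying `null` rather than being filled in.

```typescript
// Hypothetical shapes for illustration; the real catalog schema may differ.
type DataClass = "LIVE" | "DERIVED" | "SEEDED" | "UNAVAILABLE";

interface CatalogRow {
  id: string;
  source: DataClass;          // drives the source badge shown in the UI
  vcpus: number;
  onDemandHourly: number;     // USD per hour
  spotHourly: number | null;  // null when the provider feed is not available
}

interface DerivedValue {
  source: "DERIVED";
  value: number;
  derivedFrom: string[];      // ids of the LIVE rows used in the formula
}

// Cost per vCPU: on-demand hourly price divided by vCPU count.
function costPerVcpu(row: CatalogRow): DerivedValue {
  if (row.source !== "LIVE") {
    throw new Error(`cannot derive from a ${row.source} row`);
  }
  return {
    source: "DERIVED",
    value: row.onDemandHourly / row.vcpus,
    derivedFrom: [row.id],
  };
}

// What a UI cell might render: a dash placeholder for null, never an invented number.
function formatPrice(price: number | null): string {
  return price === null ? "\u2014" : `$${price.toFixed(4)}`;
}

// Example row with made-up numbers.
const exampleRow: CatalogRow = {
  id: "aws:example-instance:us-east-1",
  source: "LIVE",
  vcpus: 2,
  onDemandHourly: 0.1,
  spotHourly: null,
};
const perVcpu = costPerVcpu(exampleRow);
```

Carrying `derivedFrom` on the derived value itself is one simple way to keep the trace-back promise without a separate lookup table.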
How often each class refreshes
- LIVE: refreshed daily by the `catalog:refresh` cron job. Every row gets a new `fetched_at` timestamp and a `stale_after` window derived from the provider's own SLA on its pricing feed.
- DERIVED: recomputed on read, never stored. If a LIVE row refreshes at 2 AM, every derived value that uses it is fresh on the next request.
- SEEDED: rotated quarterly. Each seeded JSON file has a header comment documenting the date we last pulled each vendor's public pricing page.
- UNAVAILABLE: always `null`. If we ever wire up a fetcher for that field, the value flips to LIVE automatically.
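The LIVE refresh rule above implies a simple staleness test. The following is a minimal sketch, assuming `fetched_at` is stored as an ISO-8601 string and `stale_after` is a window expressed in hours; both representations, and the `isStale` helper itself, are assumptions for illustration rather than the catalog's real storage format.

```typescript
// Assumed shape: fetched_at is an ISO-8601 string, stale_after is a window in hours.
interface LiveRowFreshness {
  fetched_at: string;
  stale_after: number;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// A LIVE row is stale once more than stale_after hours have passed since fetched_at.
function isStale(row: LiveRowFreshness, now: Date): boolean {
  const fetchedMs = Date.parse(row.fetched_at);
  if (Number.isNaN(fetchedMs)) {
    // Treat an unparseable timestamp as stale rather than trusting it.
    return true;
  }
  return now.getTime() - fetchedMs > row.stale_after * MS_PER_HOUR;
}

// Example with made-up values: a row fetched at 2 AM with a 36-hour window.
const sampleRow: LiveRowFreshness = { fetched_at: "2024-01-01T02:00:00Z", stale_after: 36 };
const after24h = isStale(sampleRow, new Date("2024-01-02T02:00:00Z")); // 24 hours later
const after48h = isStale(sampleRow, new Date("2024-01-03T02:00:00Z")); // 48 hours later
```

With these sample values the row is still fresh after 24 hours and stale after 48. Passing `now` in explicitly keeps the check easy to test.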
Drift detection
Today, drift review is an operator workflow supported by catalog freshness checks, provider refresh history, and spot checks against upstream pricing sources. A daily validator that records structured drift events is planned, but it is not shipped yet. Until that validator exists, we do not claim an automated drift-event table or automated daily SKU sampling.
Freshness guardrail
The CI deploy pipeline runs a freshness check (scripts/check-catalog-freshness.ts) before every deploy. If the bundled catalog-generated.json is older than CATALOG_MAX_AGE_DAYS (default 14 days), the deploy fails. This ensures we never ship a stale snapshot to customers by accident.
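The core of such a guardrail can be sketched as two pure functions. This is not the contents of scripts/check-catalog-freshness.ts, which this page does not show; the function names and the assumption that the bundled catalog exposes a generation timestamp are hypothetical. Only the 14-day default and the CATALOG_MAX_AGE_DAYS name come from the description above.

```typescript
// Sketch only: the real scripts/check-catalog-freshness.ts is not shown in this document.
const DEFAULT_MAX_AGE_DAYS = 14;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse the CATALOG_MAX_AGE_DAYS value, falling back to the default when unset or invalid.
function parseMaxAgeDays(raw: string | undefined): number {
  const parsed = Number(raw);
  return raw !== undefined && Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_MAX_AGE_DAYS;
}

// Returns an error message when the catalog is too old, or null when it may ship.
// generatedAt is assumed to be an ISO-8601 timestamp read from the bundled catalog.
function checkCatalogFreshness(
  generatedAt: string,
  now: Date,
  maxAgeDays: number,
): string | null {
  const generatedMs = Date.parse(generatedAt);
  if (Number.isNaN(generatedMs)) {
    return `catalog timestamp is not a valid date: ${generatedAt}`;
  }
  const ageDays = (now.getTime() - generatedMs) / MS_PER_DAY;
  return ageDays > maxAgeDays
    ? `catalog is ${ageDays.toFixed(1)} days old (limit ${maxAgeDays})`
    : null;
}
```

In a CI script, the caller would read the timestamp from catalog-generated.json, pass the environment value through `parseMaxAgeDays`, and exit with a non-zero status whenever `checkCatalogFreshness` returns a message.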
Your own spend data
Separate from the catalog: when you connect a cloud account and run a sync, we pull your own billing records from the provider and store them in your workspace. AWS uses the Cost Explorer API and the Cost and Usage Report (CUR). Azure uses the Cost Management API. GCP uses the BigQuery billing export (when available). Your spend data is stored per workspace and is never mixed with any other customer's data, never used to train any model, and never shared.
If a workspace has no imported spend data, the spend dashboard shows a clear setup card explaining how to connect an account — never sample or placeholder numbers. Customer data dashboards should follow that same empty-state rule.
What we will never do
- Invent a placeholder price just because a cell in the UI looks empty.
- Show a DERIVED number without the user being able to trace it back to the LIVE source.
- Ship a LIVE rate without a `fetched_at` timestamp.
- Mix SEEDED data with LIVE data without labeling it.
- Use your own spend data to improve the product without opt-in consent.
- Auto-take any action on your cloud account. The cloud credentials we hold are read-only and side-effect-free.
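The data-related promises in this list lend themselves to a mechanical check. The sketch below shows how such a check might look; the `ProvenanceRow` shape, the `derivedFrom` field, and the `findViolations` helper are hypothetical names introduced for this example, and only `fetched_at` comes from the text above.

```typescript
// Hypothetical row shape; field names other than fetched_at are assumptions.
interface ProvenanceRow {
  id: string;
  source: "LIVE" | "DERIVED" | "SEEDED" | "UNAVAILABLE";
  value: number | null;
  fetched_at?: string;
  derivedFrom?: string[];
}

// Returns a list of rule violations; an empty list means the row passes.
function findViolations(r: ProvenanceRow): string[] {
  const problems: string[] = [];
  if (r.source === "LIVE" && !r.fetched_at) {
    problems.push(`${r.id}: LIVE rate has no fetched_at timestamp`);
  }
  if (r.source === "DERIVED" && (!r.derivedFrom || r.derivedFrom.length === 0)) {
    problems.push(`${r.id}: DERIVED value has no LIVE source references`);
  }
  if (r.source === "UNAVAILABLE" && r.value !== null) {
    problems.push(`${r.id}: UNAVAILABLE value must be null`);
  }
  return problems;
}
```

A check of this kind could run in the same pipeline step as the freshness guardrail, so that a row breaking one of these rules blocks a deploy instead of reaching the UI.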
Current coverage limits
- AWS spot depends on authenticated access to EC2's spot-history API. In environments where the refresh worker lacks that permission, `spotHourly` stays `null` on AWS rows.
- Azure Cosmos and Azure SQL commitment rates: left as `null` until Azure exposes them via its Retail Prices API (currently on-demand only).
- GCP BigQuery, Cloud SQL, Pub/Sub: seeded from our static snapshot until we replace the Cyclenerd dataset with a live GCP Cloud Billing Catalog fetcher.
- SaaS pricing: seeded quarterly. Will never be LIVE unless a SaaS vendor publishes a real pricing API (most don't).
Want to verify a specific number?
Every catalog row in the UI is clickable. Clicking shows the full provenance: the source URL or endpoint we fetched from, the fetch timestamp, the confidence level, and the stale-after window. If you find a number that looks wrong, email us and we'll investigate the specific row.