Pricing Methodology

How we source and display cloud pricing

We built this page because we owe you a straight answer about where every number on the catalog comes from. If you're going to hand us real cloud spend data, you need to know which numbers are real, which are derived, and which are deliberately marked as unavailable.

The four data classes

Every price in the catalog belongs to exactly one of these four classes, and the UI labels each row with a source badge so you never have to guess:

LIVE
Fetched directly from the provider's public pricing API and stored unmodified. AWS EC2 on-demand and reserved pricing is live via the AWS Bulk Pricing API, and AWS spot is enriched from the EC2 DescribeSpotPriceHistory API when the refresh worker has AWS credentials with the required permission. AWS RDS pricing is in Preview while we verify imported RDS catalog coverage against the dashboard gate. Azure VMs, SQL, Blob, and spot are all live via the Azure Retail Prices API. GCP Compute on-demand, spot, 1-year, and 3-year pricing is live via the Cyclenerd/google-cloud-pricing-cost-calculator dataset (Apache 2.0 + CC-BY 4.0 licensed; attributed in our attribution doc).
DERIVED
Computed from one or more LIVE rows by a documented formula. Examples: cost-per-vCPU, cost-per-GB-memory, region-delta percentages, cheapest-on-demand within a family. Every derived number carries a reference to the LIVE rows it was computed from so you can trace it back. We don't derive future prices, we don't derive prices for providers we don't fetch, and we don't derive anything that could reasonably be a matter of opinion.
SEEDED
Hand-curated from published vendor pricing pages because no public API exists, or because the API only returns partial data. Currently seeded: SaaS subscription reference pricing (Microsoft 365, Slack, Datadog, PagerDuty, and others; rotated quarterly), AI model per-token inference rates, and GCP BigQuery, Cloud SQL, and Pub/Sub pricing. Seeded rows are clearly labeled SEEDED in the catalog UI. We will replace them with LIVE fetchers as vendor APIs become available.
UNAVAILABLE
Explicitly marked as null when the provider does not publish the price or when the provider requires an authenticated feed that has not been configured. Example: AWS spotHourly is only populated when the catalog refresh worker can call EC2's spot-history API; otherwise spotHourly stays null on AWS rows and the UI shows “—” instead of a number. We never invent a placeholder value just to fill a cell. Azure Cosmos and Azure SQL commitment fields work the same way until Azure publishes them via the Retail Prices API.
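
To make the LIVE-to-DERIVED relationship concrete, here is a minimal sketch of how a derived cost-per-vCPU value could carry a reference back to the LIVE row it was computed from. It is an illustration under stated assumptions, not our actual schema: only spotHourly and fetched_at are names used on this page, while id, onDemandHourly, vcpus, derivedFrom, and costPerVcpu are hypothetical, and the price in the example is made up.

```typescript
// Hypothetical row shapes. Only spotHourly and fetched_at appear in the
// methodology text; every other field name here is an assumption.
interface LiveRow {
  source: "LIVE";
  id: string;                // hypothetical row identifier
  onDemandHourly: number;    // hypothetical field name, USD per hour
  spotHourly: number | null; // null when the spot feed is not configured
  vcpus: number;
  fetched_at: string;        // ISO 8601 timestamp of the fetch
}

interface DerivedValue {
  source: "DERIVED";
  metric: string;
  value: number;
  derivedFrom: string[];     // ids of the LIVE rows used in the formula
}

// cost-per-vCPU = on-demand hourly price divided by vCPU count.
function costPerVcpu(row: LiveRow): DerivedValue {
  if (row.vcpus <= 0) {
    throw new RangeError("vcpus must be positive");
  }
  return {
    source: "DERIVED",
    metric: "cost-per-vCPU",
    value: row.onDemandHourly / row.vcpus,
    derivedFrom: [row.id],
  };
}

// Made-up numbers for illustration only, not a real price.
const exampleRow: LiveRow = {
  source: "LIVE",
  id: "example-row-1",
  onDemandHourly: 0.1,
  spotHourly: null,
  vcpus: 2,
  fetched_at: "2024-01-01T02:00:00Z",
};

const perVcpu = costPerVcpu(exampleRow);
```

Because the derived value is a pure function of the LIVE row, it can be recomputed on every read and never needs to be stored.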

How often each class refreshes

  • LIVE — refreshed daily by the catalog:refresh cron job. Every row gets a new fetched_at timestamp and a stale_after window derived from the provider's own SLA on their pricing feed.
  • DERIVED — recomputed on read, never stored. If a LIVE row refreshes at 2 AM, every derived value using it is automatically fresh for the next request.
  • SEEDED — rotated quarterly. Each seeded JSON file has a header comment documenting the date we last pulled each vendor's public pricing page.
  • UNAVAILABLE — always null. If we ever wire up a fetcher for that field, the value flips to LIVE automatically.
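
The fetched_at and stale_after pair on LIVE rows lends itself to a simple staleness test. The sketch below assumes the stale_after window is stored as a duration in seconds (the hypothetical staleAfterSeconds field); the page does not specify the stored representation, so treat that and the function name isRowStale as assumptions.

```typescript
// Assumption: the stale_after window is a duration in seconds.
interface FreshnessFields {
  fetched_at: string;        // ISO 8601 timestamp set on every refresh
  staleAfterSeconds: number; // hypothetical representation of stale_after
}

// A row is stale once more time than its window has passed since the fetch.
function isRowStale(row: FreshnessFields, now: Date): boolean {
  const fetchedMs = Date.parse(row.fetched_at);
  if (Number.isNaN(fetchedMs)) {
    throw new Error(`Invalid fetched_at: ${row.fetched_at}`);
  }
  return now.getTime() - fetchedMs > row.staleAfterSeconds * 1000;
}

// Illustrative row fetched at 2 AM UTC with a 36-hour window.
const sampleRow: FreshnessFields = {
  fetched_at: "2024-01-01T02:00:00Z",
  staleAfterSeconds: 36 * 60 * 60,
};

const staleAfterOneDay = isRowStale(sampleRow, new Date("2024-01-02T02:00:00Z"));
const staleAfterTwoDays = isRowStale(sampleRow, new Date("2024-01-03T02:00:00Z"));
```

With a daily refresh, a window longer than 24 hours gives one missed run of slack before a row is flagged.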

Drift detection

Today, drift review is an operator workflow supported by catalog freshness checks, provider refresh history, and spot checks against upstream pricing sources. A daily validator that records structured drift events is planned, but it is not shipped yet. Until that validator exists, we do not claim an automated drift-event table or automated daily SKU sampling.

Freshness guardrail

The CI deploy pipeline runs a freshness check (scripts/check-catalog-freshness.ts) before every deploy. If the bundled catalog-generated.json is older than CATALOG_MAX_AGE_DAYS (default 14 days), the deploy fails. This ensures we never ship a stale snapshot to customers by accident.
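
The core of such a guardrail is a small age comparison. The following is a sketch of that logic only, not the contents of scripts/check-catalog-freshness.ts: the helper names are hypothetical, and a real script would additionally read the snapshot's generation time from catalog-generated.json, read CATALOG_MAX_AGE_DAYS from the environment, and exit non-zero on failure.

```typescript
const DEFAULT_MAX_AGE_DAYS = 14; // default stated in the methodology
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse the raw CATALOG_MAX_AGE_DAYS value, falling back to the default
// when it is unset, empty, non-numeric, or not positive.
function resolveMaxAgeDays(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_MAX_AGE_DAYS;
  }
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_AGE_DAYS;
}

// Age of the bundled snapshot in (fractional) days.
function catalogAgeDays(generatedAt: Date, now: Date): number {
  return (now.getTime() - generatedAt.getTime()) / MS_PER_DAY;
}

// The deploy should fail when this returns false.
function isCatalogFresh(generatedAt: Date, now: Date, maxAgeDays: number): boolean {
  return catalogAgeDays(generatedAt, now) <= maxAgeDays;
}

// Illustrative dates: a snapshot generated on 1 January.
const generatedAt = new Date("2024-01-01T00:00:00Z");
const freshAtNineDays = isCatalogFresh(generatedAt, new Date("2024-01-10T00:00:00Z"), resolveMaxAgeDays(undefined));
const freshAtNineteenDays = isCatalogFresh(generatedAt, new Date("2024-01-20T00:00:00Z"), resolveMaxAgeDays(undefined));
```

Keeping the comparison in pure functions, with the clock passed in, makes the check easy to unit test without touching the file system.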

Your own spend data

Separate from the catalog: when you connect a cloud account and run a sync, we pull your own billing records from the provider and store them in your workspace. AWS uses Cost Explorer API + CUR. Azure uses the Cost Management API. GCP uses the BigQuery billing export (when available). Your spend data is stored per workspace and is never mixed with any other customer's data, never used to train any model, and never shared.

If a workspace has no imported spend data, the spend dashboard shows a clear setup card explaining how to connect an account. It never shows sample or placeholder numbers. Any other dashboard that displays customer data is expected to follow the same empty-state rule.

What we will never do

  • Invent a placeholder price just because a cell in the UI looks empty.
  • Show a DERIVED number without the user being able to trace it back to the LIVE source.
  • Ship a LIVE rate without a fetched_at timestamp.
  • Mix SEEDED data with LIVE data without labeling it.
  • Use your own spend data to improve the product without opt-in consent.
  • Take any automated action on your cloud account. The cloud credentials we hold are read-only and side-effect-free.

Current coverage limits

  • AWS spot depends on authenticated access to EC2's spot-history API. In environments where the refresh worker lacks that permission, spotHourly stays null on AWS rows.
  • Azure Cosmos and Azure SQL commitment rates are stored as null until Azure exposes them via the Retail Prices API, which currently returns on-demand rates only.
  • GCP BigQuery, Cloud SQL, Pub/Sub — seeded from our static snapshot until we replace the Cyclenerd dataset with a live GCP Cloud Billing Catalog fetcher.
  • SaaS pricing — seeded quarterly. Will never be LIVE unless a SaaS vendor publishes a real pricing API (most don't).
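
For fields that stay null under the limits above, the display rule is to show a dash instead of a number. A minimal sketch of that rule follows; the function name formatHourly and the four-decimal dollar format are assumptions for illustration, not the catalog UI's actual formatter.

```typescript
// "\u2014" is the dash the UI shows in place of a missing price.
const UNAVAILABLE_PLACEHOLDER = "\u2014";

// Render an hourly rate, or the dash when the value is unavailable.
// A null is never replaced with zero or an estimate.
function formatHourly(value: number | null): string {
  if (value === null) {
    return UNAVAILABLE_PLACEHOLDER;
  }
  return `$${value.toFixed(4)}/hr`;
}

const missingSpot = formatHourly(null);
const knownRate = formatHourly(0.05); // made-up rate for illustration
```

Typing the field as number | null, not number, forces every caller to handle the unavailable case explicitly.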

Want to verify a specific number?

Every catalog row in the UI is clickable. Clicking shows the full provenance: the source URL or endpoint we fetched from, the fetch timestamp, the confidence level, and the stale-after window. If you find a number that looks wrong, email us and we'll investigate the specific row.