Pricing methodology

The four data classes

Every price in the catalog belongs to exactly one of these four classes, and the UI labels each row with a source badge so you never have to guess:

LIVE: Fetched directly from the provider's public pricing API and stored unmodified. AWS EC2 on-demand and reserved pricing is live via the AWS Bulk Pricing API, and AWS spot pricing is enriched from the EC2 DescribeSpotPriceHistory API when the refresh worker has AWS credentials with the required permission. AWS RDS is live via the RDS Bulk Pricing API. Azure VMs, SQL, Blob, and spot are all live via the Azure Retail Prices API. GCP Compute on-demand, spot, 1-year, and 3-year pricing is live via the Cyclenerd/google-cloud-pricing-cost-calculator dataset (Apache 2.0 + CC-BY 4.0 licensed; credited in our attribution doc).
DERIVED: Computed from one or more LIVE rows by a documented formula. Examples: cost-per-vCPU, cost-per-GB-memory, region-delta percentages, cheapest-on-demand within a family. Every derived number carries a reference to the LIVE rows it was computed from so you can trace it back. We don't derive future prices, we don't derive prices for providers we don't fetch, and we don't derive anything that could reasonably be a matter of opinion.
SEEDED: Hand-curated from published vendor pricing pages because no public API exists, or because the API only returns partial data. Currently seeded: SaaS subscription reference pricing (Microsoft 365, Slack, Datadog, PagerDuty, etc.; rotated quarterly), and AI model per-token inference rates. Seeded rows are clearly labeled SEEDED in the catalog UI. We will replace them with LIVE fetchers as vendor APIs become available.
UNAVAILABLE: Explicitly marked as null when the provider does not publish the price or when a provider requires an authenticated feed that has not been configured. Example: AWS spotHourly is only populated when the catalog refresh worker can call EC2's spot-history API; otherwise spotHourly stays null on AWS rows and the UI shows “—” instead of a number. We never invent a placeholder value just to fill a cell. Azure Cosmos commitment fields work the same way until Azure publishes them via the Retail Prices API.
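As a rough illustration of the four classes, a price cell could be modeled as a value plus a source badge, with null standing in for an unavailable price. This is a minimal sketch, not the catalog's actual schema: the PriceCell type, the renderCell helper, and the sample price are all hypothetical.

```typescript
// The four source badges described above.
type DataClass = "LIVE" | "DERIVED" | "SEEDED" | "UNAVAILABLE";

// Hypothetical cell shape (an assumption for illustration only).
interface PriceCell {
  source: DataClass;
  // null means "no price available"; it is never swapped for a placeholder number.
  value: number | null;
}

// The dash the UI shows in place of a missing number.
const EMPTY_CELL = "\u2014";

// Render a number with its badge when a value exists, the dash otherwise.
function renderCell(cell: PriceCell): string {
  if (cell.value === null) {
    return EMPTY_CELL;
  }
  return `$${cell.value.toFixed(4)} [${cell.source}]`;
}

// AWS spot when the refresh worker has no spot-history permission.
const awsSpotWithoutCredentials: PriceCell = { source: "UNAVAILABLE", value: null };
// Illustrative on-demand price, not a real quote.
const awsOnDemand: PriceCell = { source: "LIVE", value: 0.0416 };

console.log(renderCell(awsSpotWithoutCredentials));
console.log(renderCell(awsOnDemand));
```

Keeping null in the type forces every consumer to handle the missing case explicitly, which is one way to make "never invent a placeholder" hard to violate by accident.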

How often each class refreshes

LIVE — refreshed daily by the catalog:refresh cron job. Every row gets a new fetched_at timestamp and a stale_after window derived from the provider's own SLA on their pricing feed.
DERIVED — recomputed on read, never stored. If a LIVE row refreshes at 2 AM, every derived value using it is automatically fresh for the next request.
SEEDED — rotated quarterly. Each seeded JSON file has a header comment documenting the date we last pulled each vendor's public pricing page.
UNAVAILABLE — always null. If we ever wire up a fetcher for that field, the value flips to LIVE automatically.
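The refresh rules above can be sketched in a few lines: a DERIVED value is computed from a LIVE row at read time and carries the IDs of its sources, and a LIVE row is stale once its window has elapsed. The field names here (hourlyUsd, vcpus, fetchedAt, staleAfterHours) are assumptions chosen for the example, not the catalog's real column names.

```typescript
// Hypothetical LIVE row; only the idea of a fetch timestamp and a
// stale-after window comes from the text above.
interface LiveRow {
  id: string;
  hourlyUsd: number;
  vcpus: number;
  fetchedAt: Date;
  staleAfterHours: number;
}

// A derived number always records which LIVE rows produced it.
interface DerivedValue {
  value: number;
  formula: string;
  sourceRowIds: string[];
}

// Computed on read, never stored: a refreshed LIVE row is picked up
// by the very next call.
function costPerVcpu(row: LiveRow): DerivedValue {
  return {
    value: row.hourlyUsd / row.vcpus,
    formula: "hourlyUsd / vcpus",
    sourceRowIds: [row.id],
  };
}

// A row is stale when more than staleAfterHours have passed since the fetch.
function isStale(row: LiveRow, now: Date): boolean {
  const ageHours = (now.getTime() - row.fetchedAt.getTime()) / (60 * 60 * 1000);
  return ageHours > row.staleAfterHours;
}

// Illustrative row; the price is made up.
const exampleRow: LiveRow = {
  id: "example-row-1",
  hourlyUsd: 0.192,
  vcpus: 4,
  fetchedAt: new Date("2024-01-01T02:00:00Z"),
  staleAfterHours: 48,
};

console.log(costPerVcpu(exampleRow));
console.log(isStale(exampleRow, new Date("2024-01-02T02:00:00Z")));
```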

Drift detection

Each day, a drift validator samples ~100 SKUs across all LIVE providers and diffs them against the GCP Cloud Billing Catalog API (for providers where we have a cross-reference) or against the provider's own API re-fetched fresh. If any row drifted by more than 1% without a corresponding provider announcement, the validator writes a row to cloudcost_catalog_drift_events and sends an alert to the operator. You can see the current drift health on the admin catalog page.
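The core comparison in the drift check is a relative difference against a reference price. The sketch below shows only that step, with hypothetical names (PriceSample, relativeDrift, findDrifted). It does not model sampling, matching against provider announcements, writing to cloudcost_catalog_drift_events, or alerting.

```typescript
// One sampled SKU: the price in our catalog and the freshly fetched reference.
interface PriceSample {
  sku: string;
  catalogUsd: number;
  referenceUsd: number;
}

// 1% threshold, as described above.
const DRIFT_THRESHOLD = 0.01;

// Relative drift of the catalog price from the reference price.
function relativeDrift(sample: PriceSample): number {
  if (sample.referenceUsd === 0) {
    // Avoid dividing by zero: any non-zero catalog price counts as full drift.
    return sample.catalogUsd === 0 ? 0 : Infinity;
  }
  return Math.abs(sample.catalogUsd - sample.referenceUsd) / sample.referenceUsd;
}

// Keep only the samples whose drift exceeds the threshold.
function findDrifted(samples: PriceSample[], threshold: number = DRIFT_THRESHOLD): PriceSample[] {
  return samples.filter((sample) => relativeDrift(sample) > threshold);
}

// Illustrative samples with made-up prices.
const samples: PriceSample[] = [
  { sku: "sku-a", catalogUsd: 1.0, referenceUsd: 1.0 },
  { sku: "sku-b", catalogUsd: 1.02, referenceUsd: 1.0 },
  { sku: "sku-c", catalogUsd: 0.995, referenceUsd: 1.0 },
];

console.log(findDrifted(samples).map((sample) => sample.sku));
```

In this example only sku-b, at roughly 2% off the reference, exceeds the threshold; sku-c at 0.5% does not.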

Freshness guardrail

The CI deploy pipeline runs a freshness check (scripts/check-catalog-freshness.ts) before every deploy. If the bundled catalog-generated.json is older than CATALOG_MAX_AGE_DAYS (default 14 days), the deploy fails. This ensures we never ship a stale snapshot to customers by accident.
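The check itself amounts to comparing the snapshot's age against a limit. The following is a minimal sketch of that comparison, not the contents of scripts/check-catalog-freshness.ts. The function names are hypothetical, and it assumes the snapshot's generation time is already available as a Date; the real script would also need to read CATALOG_MAX_AGE_DAYS from the environment and exit non-zero on failure.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Default limit from the text above.
const DEFAULT_MAX_AGE_DAYS = 14;

// Age of the bundled snapshot in (fractional) days.
function catalogAgeDays(generatedAt: Date, now: Date): number {
  return (now.getTime() - generatedAt.getTime()) / MS_PER_DAY;
}

// The deploy should fail only when the snapshot is older than the limit.
function isCatalogFresh(
  generatedAt: Date,
  now: Date,
  maxAgeDays: number = DEFAULT_MAX_AGE_DAYS
): boolean {
  return catalogAgeDays(generatedAt, now) <= maxAgeDays;
}

// Illustrative dates only.
const generatedAt = new Date("2024-01-01T00:00:00Z");
console.log(isCatalogFresh(generatedAt, new Date("2024-01-10T00:00:00Z")));
console.log(isCatalogFresh(generatedAt, new Date("2024-01-20T00:00:00Z")));
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test.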

Your own spend data

Separate from the catalog: when you connect a cloud account and run a sync, we pull your own billing records from the provider and store them in your workspace. AWS uses Cost Explorer API + CUR. Azure uses the Cost Management API. GCP uses the BigQuery billing export (when available). Your spend data is stored per workspace and is never mixed with any other customer's data, never used to train any model, and never shared.

If a workspace has no imported spend data, the spend dashboard shows a clear setup card explaining how to connect an account — never sample or placeholder numbers. Every page in the product works this way.

What we will never do

Invent a placeholder price just because a cell in the UI looks empty.
Show a DERIVED number without the user being able to trace it back to the LIVE source.
Ship a LIVE rate without a fetched_at timestamp.
Mix SEEDED data with LIVE data without labeling it.
Use your own spend data to improve the product without opt-in consent.
Automatically take any action on your cloud account. The cloud credentials we hold are read-only and side-effect-free.

Current coverage limits

AWS spot depends on authenticated access to EC2's spot-history API. In environments where the refresh worker lacks that permission, spotHourly stays null on AWS rows.
Azure Cosmos and Azure SQL commitment rates stay null until Azure exposes them via the Retail Prices API (currently on-demand only).
GCP BigQuery, Cloud SQL, Pub/Sub — seeded from our static snapshot until we replace the Cyclenerd dataset with a live GCP Cloud Billing Catalog fetcher.
SaaS pricing — seeded quarterly. Will never be LIVE unless a SaaS vendor publishes a real pricing API (most don't).

Want to verify a specific number?

Every catalog row in the UI is clickable. Clicking shows the full provenance: the exact API endpoint we fetched from, the fetch timestamp, the confidence level, and the stale-after window. If you find a number that looks wrong, email us and we'll investigate the specific row.
