An automated valuation model is only as good as the data it learns from. In Canada, the binding constraint on AVM quality is rarely the algorithm. It is whether the training set contains enough verified repeat-sale pairs, tied to a persistent property identifier, with lifecycle history attached. BrightCat's Canadian Home Price Index dataset provides 194,167 such pairs, drawn from a weekly pipeline that has operated continuously since 2014.
An AVM produces a point estimate of a property's current value. To do that well, it needs four inputs: observed transaction prices, property-level attributes, temporal coverage sufficient to model market movement, and geographic coverage sufficient to model local variation. The first of these is where most Canadian AVM training data breaks down.
Individual sale prices are useful but limited. A single transaction tells the model what one property was worth on one date. The richer signal comes from watching the same property sell more than once. The price change between two sales of the same property, net of general market movement, is the cleanest observable evidence of what that specific property is worth relative to its peers.
That is a repeat-sale pair. It is the unit of learning for the time dimension of any serious valuation model.
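One way to make "net of general market movement" concrete is to subtract the log change in a market index from the log change in the property's own price. The sketch below is illustrative only: the function name, the sample prices, and the index levels are assumptions, not values from the dataset.

```python
import math

def excess_log_return(first_price, second_price, index_at_first, index_at_second):
    """Log price change of one property minus the log change in a market index."""
    property_change = math.log(second_price / first_price)
    market_change = math.log(index_at_second / index_at_first)
    return property_change - market_change

# Hypothetical pair: sold for 500,000, then for 650,000, while an
# illustrative market index moved from 100 to 120 over the same interval.
pair_signal = excess_log_return(500_000, 650_000, 100.0, 120.0)
print(round(pair_signal, 4))
```

A positive result means the property appreciated faster than the market between its two sales; a negative result means it lagged.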
Building repeat-sale pairs in Canadian data is harder than it sounds. The core difficulty is that a single property may carry different identifiers across its listing history. Listing numbers are reassigned when a property relists. Addresses are recorded with variations in punctuation, unit formatting, and directional suffixes. A property that sold in 2015, relisted in 2019 under a new listing number, and sold again in 2021 may appear as three unrelated records in any system that relies on the listing number as the join key.
To produce accurate repeat-sale pairs, the pipeline needs a persistent property identifier: a stable reference that links every record touching the same physical property, regardless of listing number, agent change, relist, or cosmetic address variation. That identifier is not something an AVM can generate from the data it sees at training time. It has to be produced upstream, in the pipeline that assembles the training set.
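The join-key problem can be shown with a toy example. The sketch below uses hypothetical listing numbers and addresses, and a deliberately small normalisation table. It is not a description of BrightCat's matching logic, and string normalisation on its own falls well short of a persistent identifier.

```python
import re
from collections import defaultdict

# Illustrative abbreviation table; a production matcher would need far more rules.
CANONICAL_TOKENS = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "APT": "UNIT", "SUITE": "UNIT",
}

def normalize_address(raw):
    """Uppercase, drop punctuation, and map common variants to one token."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", raw.upper()).split()
    return " ".join(CANONICAL_TOKENS.get(token, token) for token in tokens)

# Hypothetical listing records: (listing_number, address as entered).
listings = [
    ("X1001", "Unit 4, 12 Main St. W"),
    ("X2040", "Apt 4 12 Main Street West"),
    ("X3977", "unit 4 - 12 main st w"),
]

by_listing_number = defaultdict(list)
by_address_key = defaultdict(list)
for listing_number, address in listings:
    by_listing_number[listing_number].append(address)
    by_address_key[normalize_address(address)].append(listing_number)

print(len(by_listing_number), "groups when joined on listing number")
print(len(by_address_key), "group when joined on the normalised address")
```

Joined on listing number, the three records look like three unrelated properties. Joined on the normalised key, they collapse to one. Real data also has to handle typos, renamed streets, and agent changes, which is why the identifier has to be built and maintained upstream rather than derived at training time.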
BrightCat's pipeline produces that identifier as part of weekly processing. Every residential and commercial record flowing through the pipeline since 2014 has been assigned a persistent property identifier, reconciled across relists, address variations, and agent transitions. That work is what makes 194,167 verified pairs possible across the Canadian dataset. Without it, the same underlying transactions would produce a far smaller, noisier pair set.
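Once every sale carries a persistent identifier, forming candidate pairs is a grouping exercise. The following is a minimal sketch, assuming hypothetical `(property_id, sale_date, price)` tuples. It pairs consecutive sales of the same property and applies no verification of any kind, so its output is a set of candidates, not verified pairs.

```python
from collections import defaultdict
from datetime import date

def build_repeat_sale_pairs(sales):
    """Pair each sale with the next sale of the same property.

    sales: iterable of (property_id, sale_date, price) tuples.
    Returns a list of (property_id, first_date, first_price, second_date, second_price).
    """
    by_property = defaultdict(list)
    for property_id, sale_date, price in sales:
        by_property[property_id].append((sale_date, price))

    pairs = []
    for property_id, history in by_property.items():
        history.sort()  # chronological order, since each entry starts with the date
        for (d1, p1), (d2, p2) in zip(history, history[1:]):
            pairs.append((property_id, d1, p1, d2, p2))
    return pairs

# Hypothetical sales keyed by a persistent property identifier.
sample_sales = [
    ("P1", date(2015, 6, 1), 480_000),
    ("P2", date(2016, 3, 15), 310_000),
    ("P1", date(2021, 9, 30), 720_000),
    ("P1", date(2018, 2, 10), 560_000),
]
candidate_pairs = build_repeat_sale_pairs(sample_sales)
```

In this sample, `P1` yields two candidate pairs (2015 to 2018, and 2018 to 2021), while `P2` has sold only once and yields none.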
Every pair in the BrightCat Canadian Home Price Index dataset meets four conditions:
Pairs that cannot satisfy all four conditions are excluded from the published series. The result is a training set where the price signal is as clean as the underlying transaction record permits. For AVM teams building or retraining models on Canadian data, that filtering is not a detail. It is the difference between a pair set that trains a stable model and a pair set that introduces coincidental noise the model ends up memorising.
Canadian housing markets moved through several distinct regimes in the past decade: the long run-up from 2014 through early 2022, the rate-shock correction that followed, and the regional divergence that emerged afterward. A pair set drawn only from recent years captures just one part of that history. A pair set drawn from a pipeline running since 2014 lets the model see how the same property behaved across different market conditions.
BrightCat's underlying lifecycle dataset covers 5.8 million residential properties and 297,000 commercial properties, with listing and transaction activity tracked weekly over twelve years. The repeat-sale pair set is the subset of that history where the conditions above are all satisfied. As the pipeline adds weekly data, the pair set grows, existing pairs gain additional context from subsequent listing activity, and the geographic and temporal distribution thickens.
A repeat-sale pair shows the first and last sale. The useful context lives between them. A property that sold in 2017, relisted four times over the next five years at declining prices before selling again in 2022, is a different data point than a property that sold cleanly in 2017 and again in 2022 with no activity in between. The final prices may be identical. The signal to an AVM is not.
BrightCat's pair set retains the link to the underlying lifecycle record: every listing event, every price change, every drop, every relist, every status transition between the two sales. AVM teams that need this context can join it back through the persistent property identifier. Teams that just want the pair prices can use the pair set directly.
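The join back to lifecycle history might look like the sketch below. The tuple layouts and the event type names (`relist`, `price_drop`) are hypothetical, chosen for illustration only; they are not the schema of the actual pair table or lifecycle record.

```python
from collections import Counter
from datetime import date

def lifecycle_between(pair, events):
    """Count lifecycle events for one property strictly between its two sales.

    pair:   (property_id, first_sale_date, second_sale_date)
    events: iterable of (property_id, event_date, event_type)
    """
    property_id, first_sale, second_sale = pair
    return Counter(
        event_type
        for pid, event_date, event_type in events
        if pid == property_id and first_sale < event_date < second_sale
    )

# Hypothetical lifecycle events keyed by the persistent property identifier.
events = [
    ("P1", date(2018, 5, 1), "relist"),
    ("P1", date(2019, 4, 12), "price_drop"),
    ("P1", date(2020, 8, 3), "relist"),
    ("P1", date(2023, 1, 9), "relist"),      # after the second sale, ignored
    ("P2", date(2019, 7, 20), "price_drop"),  # different property, ignored
]
pair = ("P1", date(2017, 3, 1), date(2022, 6, 15))
context = lifecycle_between(pair, events)
```

Counts like these can be attached to each pair as model features, so that a pair with several failed relists is not treated the same as a pair with a clean interval between sales.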
The Canadian Home Price Index dataset spans all ten provinces, with residential coverage anchored by the weekly listing and sold pipeline. Pair density varies by province, driven by underlying transaction volume. Ontario, British Columbia, and Alberta together account for the largest share of pairs, consistent with their share of national residential transaction volume. Smaller provinces are represented in proportion to their market size. The full provincial breakdown is available on request.
The Canadian Home Price Index dataset and the underlying repeat-sale pair table are part of BrightCat Core. Delivery options include:
All four channels draw from the same weekly pipeline. Pair counts, lifecycle history, and property-level attributes are consistent across channels.
The dataset is not a published house price index series. Teams that need a single national or provincial index number for reporting should look at official statistical publications. BrightCat's strength is the underlying pair table: the raw inputs an AVM team, a portfolio analyst, or a quantitative research group would use to produce their own index or train their own model. The 194,167 pairs are the substrate, not the final aggregate.
Verified Canadian repeat-sale pairs with full lifecycle context.