
Methodology

Governance and limitations

AI governance: where AI is and is not involved

StreetSignal uses large language models in parts of its development pipeline. We are explicit about where AI is and is not involved in what you see on this platform.

AI does not generate or estimate data values

Every figure on a suburb page (median property value, CAGR, electricity consumption, proximity distances) is computed deterministically from a primary government dataset. No model estimates or interpolates these values. Where AI-assisted text is used in neighbourhood descriptions, every numeric value in that text is drawn from the computed data layer, never estimated by the model.
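One way to enforce this separation is to have the model write prose that contains only named placeholders, then fill every number from the computed record. The sketch below is illustrative only: the field names, sample values and the `render` and `check_no_model_numbers` helpers are hypothetical, not StreetSignal's actual code.

```python
import re
from string import Template

# Hypothetical computed data layer record; field names and values are invented.
computed = {
    "suburb": "Example Suburb",
    "median_value": "R1,250,000",
    "cagr_pct": "4.2",
}

# Hypothetical AI-assisted draft: prose with placeholders, no numbers of its own.
draft = Template(
    "$suburb has a median assessed value of $median_value, "
    "with a valuation CAGR of $cagr_pct%."
)


def check_no_model_numbers(template: Template) -> None:
    """Reject drafts in which the model typed a digit instead of a placeholder."""
    literal_text = re.sub(r"\$\{?[_a-zA-Z][_a-zA-Z0-9]*\}?", "", template.template)
    if re.search(r"\d", literal_text):
        raise ValueError("draft contains a number not sourced from the data layer")


def render(template: Template, values: dict) -> str:
    """Fill placeholders from computed values only.

    Template.substitute raises KeyError if a placeholder has no computed value,
    so a missing figure fails loudly instead of being guessed.
    """
    check_no_model_numbers(template)
    return template.substitute(values)


print(render(draft, computed))
```

Because both checks raise rather than fall back, a description either carries only computed figures or is not published at all.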

AI assists with development and written content

LLMs assist with code review, documentation, and exploratory analysis during development. All AI-assisted outputs are reviewed before publication. No LLM output reaches the data layer directly.

Why structured data must precede any AI layer

LLMs are probabilistic systems. Asked to describe a suburb's crime rate without ground-truth data, a model produces a plausible-sounding answer that may bear no relationship to reality. In a civic information context, where outputs influence where families choose to live, hallucinated statistics cause real harm. The deterministic ingestion pipeline exists specifically to prevent this failure mode. Grounding AI in structured, verified data is not a technical nicety. It is an ethical requirement.

The broader African context

The architecture built for Cape Town is designed to scale across other major African cities. Most African property markets have higher information asymmetry than Cape Town, not because the data doesn't exist, but because it is fragmented, poorly standardised, and inaccessible at scale. The core challenge is building deterministic ingestion pipelines that can harmonise multi-format municipal data before any analytical layer touches it. This ground-truth foundation is what separates a responsible civic data platform from a system that launders uncertainty through AI-generated confidence.
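A small, concrete piece of that harmonisation is producing one deterministic join key for suburb names that arrive with different casing, punctuation and spacing in different municipal files. This is a hypothetical sketch rather than StreetSignal's pipeline; genuinely different names (renamed or merged suburbs) would still need an explicit alias table, and the sample values are invented.

```python
import re
import unicodedata


def suburb_key(raw: str) -> str:
    """Hypothetical join key: fold accents, lowercase, and collapse runs of
    punctuation and whitespace into single spaces."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def join_on_suburb(left: dict, right: dict) -> dict:
    """Join two {suburb name: value} mappings on the normalised key.

    Unmatched left-hand suburbs get None, so gaps stay visible instead of
    being silently dropped.
    """
    right_by_key = {suburb_key(name): value for name, value in right.items()}
    return {
        suburb_key(name): (value, right_by_key.get(suburb_key(name)))
        for name, value in left.items()
    }


# Illustrative inputs only: values and precinct labels are invented.
valuations = {"SEA-POINT ": 1, "Observatory": 2, "Newlands": 3}
precincts = {"Sea Point": "Precinct X", "OBSERVATORY": "Precinct Y"}
print(join_on_suburb(valuations, precincts))
```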

Known limitations

01

SAPS precinct join is incomplete. A portion of the 744 suburbs in our dataset have not been successfully matched to a SAPS precinct. These suburbs display a neutral safety index of 50 and a precinct label of "Unassigned". This is a known data gap, not a computed result.
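Expressed in code, this rule is an explicit fallback that also flags the gap, so the interface can label it rather than present 50 as a measured score. A minimal sketch; the function name and the `is_data_gap` flag are hypothetical.

```python
from typing import Optional

NEUTRAL_SAFETY_INDEX = 50       # neutral placeholder for unmatched suburbs
UNASSIGNED_LABEL = "Unassigned"


def safety_display(precinct: Optional[str], index: Optional[float]) -> dict:
    """Hypothetical display record for a suburb's safety panel."""
    if precinct is None:
        # No SAPS precinct match: show the neutral placeholder and mark it
        # as a known data gap, not a computed result.
        return {
            "precinct": UNASSIGNED_LABEL,
            "safety_index": NEUTRAL_SAFETY_INDEX,
            "is_data_gap": True,
        }
    return {"precinct": precinct, "safety_index": index, "is_data_gap": False}


print(safety_display(None, None))
```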

02

Crime data lag. SAPS statistics are published quarterly with a reporting delay. The current dataset is Q3 2025/2026 (October-December 2025). Scores reflect this period, not current conditions.
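SAPS reports against the South African government financial year, which runs from 1 April to 31 March, so a label such as Q3 2025/2026 maps to calendar months as shown below. The helper is an illustrative sketch, not part of any SAPS tool or API.

```python
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def fiscal_quarter_months(fy_start_year: int, quarter: int) -> tuple:
    """Return (first month, last month) for quarter 1-4 of the financial
    year that starts on 1 April of fy_start_year."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    # Months counted from January of the start year: Q1 starts at month 4
    # (April); Q4 starts at month 13, i.e. January of the following year.
    start = 4 + 3 * (quarter - 1)
    year = fy_start_year + (start - 1) // 12
    first = (start - 1) % 12 + 1
    last = first + 2
    return (f"{MONTH_ABBR[first - 1]} {year}", f"{MONTH_ABBR[last - 1]} {year}")


print(fiscal_quarter_months(2025, 3))  # ('Oct 2025', 'Dec 2025')
```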

03

Reported crime is not experienced crime. SAPS data captures only crimes reported to police. Research consistently shows that a majority of crimes in South Africa go unreported. In some suburbs, household survey data on experienced victimisation diverges materially from the SAPS-derived safety index; this reflects structural under-reporting, not low actual risk. The safety index is a measure of relative reported crime pressure only.

04

GV2022 is a point-in-time legal assessment. Actual transaction prices may differ materially from the General Valuation. GV is assessed for municipal rating purposes, not as a live market index. CAGR figures are derived from changes in assessed municipal valuations, not from property sales transactions.
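For reference, a compound annual growth rate between two valuation rolls follows the standard formula CAGR = (V_end / V_start)^(1 / years) - 1. Which rolls and interval StreetSignal compares is not stated here, so the values and the four-year gap below are assumptions for illustration.

```python
def valuation_cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two assessed valuations."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Illustrative: R1,000,000 on an earlier roll, R1,464,100 four years later.
rate = valuation_cagr(1_000_000, 1_464_100, 4)
print(f"{rate:.1%}")  # 10.0%
```

The result describes growth in assessed values, so it inherits the limitations of the valuation roll rather than tracking sales prices.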

05

Proximity distances are straight-line only. All health, library, and park distances are Haversine (great-circle, straight-line) distances from the suburb centroid. Actual walking or road distances will be greater. In large suburbs or suburbs with irregular boundaries, a resident's distance to a facility may differ significantly from the centroid measure.
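The Haversine formula gives the great-circle distance between two latitude/longitude points on a sphere. A minimal sketch of a nearest-facility lookup from a suburb centroid follows; the mean Earth radius of 6,371 km and the sample coordinates are assumptions, since the exact constant used is not stated here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against floating-point values fractionally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_facility_km(centroid: tuple, facilities: list) -> float:
    """Distance from a suburb centroid to its closest facility."""
    lat, lon = centroid
    return min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)


# Illustrative coordinates only (roughly central Cape Town).
centroid = (-33.925, 18.424)
clinics = [(-33.930, 18.410), (-33.960, 18.470)]
print(round(nearest_facility_km(centroid, clinics), 2))
```

Because the measure ignores the road network and the suburb's shape, it is a lower bound on travel distance from the centroid, not a travel estimate.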

06

Electricity data covers prepaid connections only. The CoCT prepaid electricity dataset does not cover post-paid or credit meter connections. Wealthier suburbs with a higher proportion of post-paid connections will show lower kWh per capita figures than their actual consumption.
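A worked example shows the direction of the bias. It assumes the metric divides prepaid kWh by the suburb's full population (the exact denominator is not stated here) and that per-person consumption is similar on both meter types; under those assumptions the figure shrinks in proportion to the prepaid share. All numbers are invented for illustration.

```python
def prepaid_kwh_per_capita(prepaid_kwh: float, population: int) -> float:
    """Hypothetical metric: prepaid consumption divided by the whole population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return prepaid_kwh / population


population = 1_000
true_kwh_per_person = 300.0  # monthly, all meter types (invented)
prepaid_share = 0.4          # 60% of consumption is on post-paid meters (invented)

prepaid_kwh = population * true_kwh_per_person * prepaid_share
shown = prepaid_kwh_per_capita(prepaid_kwh, population)
print(round(shown, 1))  # 120.0, versus a true 300.0
```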

07

Household survey coverage is 407 of 744 suburbs. The remaining 337 suburbs do not have matched survey data above the reliability threshold. Household economic, food security, housing, and digital metrics are not available for these suburbs. Survey group matching uses geographic proximity; small suburbs may inherit indicative data from a neighbouring survey group.
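A minimal sketch of proximity-based matching with a reliability cut-off: each suburb takes the nearest survey group that meets a minimum response count, provided it lies within a maximum distance, and otherwise receives no survey data. The threshold values, the `SurveyGroup` structure and the use of group centroids are all assumptions; the actual reliability threshold and matching rule are not specified here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_KM = 6371.0
MIN_RESPONSES = 30    # hypothetical reliability threshold
MAX_MATCH_KM = 5.0    # hypothetical maximum matching distance


@dataclass(frozen=True)
class SurveyGroup:
    name: str
    lat: float        # group centroid latitude
    lon: float        # group centroid longitude
    responses: int    # completed household responses in the group


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def match_survey_group(lat: float, lon: float,
                       groups: List[SurveyGroup]) -> Optional[SurveyGroup]:
    """Nearest reliable survey group within MAX_MATCH_KM, else None."""
    reliable = [g for g in groups if g.responses >= MIN_RESPONSES]
    if not reliable:
        return None
    best = min(reliable, key=lambda g: haversine_km(lat, lon, g.lat, g.lon))
    if haversine_km(lat, lon, best.lat, best.lon) > MAX_MATCH_KM:
        return None  # too far away to inherit indicative data
    return best


groups = [
    SurveyGroup("Group A", -33.92, 18.42, responses=50),
    SurveyGroup("Group B", -33.93, 18.43, responses=10),  # below threshold
]
print(match_survey_group(-33.93, 18.43, groups).name)  # Group A
print(match_survey_group(-34.50, 18.90, groups))       # None
```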

08

Matric results coverage is 157 of 744 suburbs (21%). NSC matric data is only available where schools with matric cohorts could be matched via EMIS code to a suburb. The remaining 587 suburbs have no matric data displayed. Pass rates also reflect examination throughput, not educational quality, and are influenced by candidate selection decisions at school level.
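The EMIS join and the coverage figure can be sketched as follows. Aggregating a suburb's schools as total passes over total candidates is an assumption (the weighting actually used is not stated), and the EMIS numbers, suburbs and results are invented.

```python
from collections import defaultdict

# Invented EMIS numbers, suburbs and results, for illustration only.
emis_to_suburb = {"100001": "Suburb A", "100002": "Suburb A", "100003": "Suburb B"}
nsc_results = [
    {"emis": "100001", "wrote": 120, "passed": 96},
    {"emis": "100002", "wrote": 80, "passed": 60},
    {"emis": "100003", "wrote": 50, "passed": 45},
    {"emis": "999999", "wrote": 40, "passed": 30},  # no suburb match
]


def suburb_pass_rates(results, emis_map):
    """Candidate-weighted pass rate per suburb; unmatched schools are skipped."""
    totals = defaultdict(lambda: [0, 0])  # suburb -> [passed, wrote]
    for row in results:
        suburb = emis_map.get(row["emis"])
        if suburb is None:
            continue  # school could not be matched to a suburb via EMIS code
        totals[suburb][0] += row["passed"]
        totals[suburb][1] += row["wrote"]
    return {s: passed / wrote for s, (passed, wrote) in totals.items() if wrote > 0}


print(suburb_pass_rates(nsc_results, emis_to_suburb))  # {'Suburb A': 0.78, 'Suburb B': 0.9}

# Coverage arithmetic from the figures above.
print(f"{157 / 744:.0%}")  # 21%
```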

09

Commute mode coverage is 136 of 744 suburbs. Commute mode data from the CCT Household Survey is available for suburbs within the 163 surveyed areas. Taxi connectivity data, sourced from CCT minibus taxi route records, covers 637 suburbs and serves as the primary transport signal where commute mode data is unavailable.
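The precedence described here, household-survey commute mode first and taxi connectivity as the fallback, can be expressed as a small selection function. The function name, source labels and value shapes are hypothetical.

```python
from typing import Optional


def transport_signal(commute_mode: Optional[dict],
                     taxi_connectivity: Optional[float]) -> dict:
    """Pick the transport signal for a suburb (hypothetical helper)."""
    if commute_mode is not None:
        # Preferred: household-survey commute mode shares.
        return {"source": "commute_mode", "value": commute_mode}
    if taxi_connectivity is not None:
        # Fallback: minibus taxi route connectivity.
        return {"source": "taxi_connectivity", "value": taxi_connectivity}
    # Neither dataset covers this suburb: show no transport signal.
    return {"source": None, "value": None}


print(transport_signal(None, 0.7))  # {'source': 'taxi_connectivity', 'value': 0.7}
```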

10

StreetSignal currently covers Cape Town as its pilot city. The methodology is city-agnostic and requires three inputs: a municipal valuation roll, crime statistics with geographic boundaries, and a household survey. Do not use Cape Town-calibrated benchmarks to assess suburbs in other metros.
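A sketch of how the three required inputs and the per-city benchmark rule could be enforced in configuration. `CityInputs`, `require_inputs`, `get_benchmark` and the benchmark structure are hypothetical names, and the benchmark value is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityInputs:
    """The three inputs the methodology needs for a city (names hypothetical)."""
    city: str
    valuation_roll: str      # municipal valuation roll source
    crime_statistics: str    # crime statistics with geographic boundaries
    household_survey: str    # household survey source


def require_inputs(inputs: CityInputs) -> None:
    """Raise if any of the three required inputs is missing."""
    missing = [field for field in ("valuation_roll", "crime_statistics", "household_survey")
               if not getattr(inputs, field)]
    if missing:
        raise ValueError(f"{inputs.city}: missing {', '.join(missing)}")


def get_benchmark(benchmarks: dict, city: str, metric: str) -> float:
    """Return a benchmark calibrated for this city; never borrow another city's."""
    if city not in benchmarks:
        raise KeyError(f"no benchmarks calibrated for {city!r}")
    return benchmarks[city][metric]


benchmarks = {"Cape Town": {"median_value": 1.0}}  # placeholder value
print(get_benchmark(benchmarks, "Cape Town", "median_value"))
```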

11

No rental market or recent sales data. South African property transaction records are held by the Deeds Office and distributed through commercial data providers. Rental market data is held by specialist credit bureaux. Neither source publishes suburb-level data as open data. StreetSignal's property section reflects municipal valuations only, not market prices or rental yields.

12

No vacancy rate data. Residential vacancy rates at suburb level are not published as open data in South Africa. Commercial vacancy data is held by private providers and is not available for integration.

13

No development pipeline data. Planning applications, building plan approvals, and new development activity are managed through the City of Cape Town's e-services portal but are not published as a bulk open dataset. StreetSignal cannot show upcoming developments or construction activity.

Data freshness and update schedule

Every data display on StreetSignal includes the data period it covers. No metric is presented without its vintage. StreetSignal updates as soon as new releases become available from each source.
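One way to guarantee that no metric is shown without its vintage is to make the data period a required field of the display object. A minimal sketch; `Metric` and the sample value are hypothetical, while the period string matches the prepaid electricity entry in the update schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A displayable figure that cannot exist without its data period."""
    label: str
    value: str
    period: str  # the vintage shown next to the value

    def __post_init__(self) -> None:
        if not self.period.strip():
            raise ValueError(f"{self.label}: a data period is required")

    def display(self) -> str:
        return f"{self.label}: {self.value} ({self.period})"


print(Metric("Prepaid electricity", "210 kWh per capita", "December 2025").display())
```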

| Dataset | Source cadence | Typical lag | Version in use |
|---|---|---|---|
| CCT General Valuation | Every 4 years | 1-3 years | GV2022 |
| CCT Prepaid Electricity | Monthly (open data portal) | Weeks | December 2025 |
| SAPS Crime Statistics | Quarterly | ~6 weeks | Q3 2025/2026 (Oct-Dec 2025) |
| SAPS Annual Crime Report | Once per year (September) | 6-9 months | FY2023/24 |
| Stats SA Population | Decennial census | 1-2 years | Census 2022 |
| CCT Health Facilities | Irregular | Variable | 157 facilities (2025) |
| CCT Libraries | Irregular | Variable | 102 libraries (2025) |
| CCT Parks & Green Spaces | Irregular | Variable | 5,198 green spaces (2025) |
| DBE Schools & NSC Results | Annual | ~6 months | EMIS Q2 2025 / NSC 2025 |
| CCT Service Requests | Ongoing (bulk export) | Rolling | ~2.5M requests (2025) |
| CCT Household Survey | Every 2-3 years | Months | Feb-Oct 2024 (163 survey areas, 407 suburbs via group matching) |
| CCT Transport (commute mode) | Irregular | Variable | 163 survey areas; taxi connectivity: 637 suburbs |

Responsible use

StreetSignal data is published for personal decision-making by individuals considering where to live. It must not be used for insurance pricing, credit scoring, tenant screening, employment decisions, or any form of geographic discrimination. These restrictions apply equally to data accessed via the website, API, or embed widgets. See our terms of use for the full policy.

Questions about our methodology or data sources? Contact us at [email protected]. Full attribution and licence details are on our Attribution & Copyright page. This platform is not a substitute for professional property, legal, or safety advice.