Residential Water Efficiency and the California Data Quality Landscape

Context

The CaDC Efficiency Explorer is a planning and education tool for local water managers and the wider California water community. This tool is the result of a rapid first assessment of Governor Brown's Executive Order B-37-16, which calls for the development of water use targets customized to the unique conditions of each urban water agency as part of a new, permanent efficiency framework.

For version one of the Efficiency Explorer, the CaDC focused on the residential component of an agency's efficiency target. This target is calculated as the sum of indoor and outdoor residential production budgets, and can vary according to unique agency conditions as well as pending policy decisions. The parameters of these two budgets are displayed in the labeled panels below.
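As a rough illustration of how the two budgets add up, here is a minimal sketch. The formulas (population times a per-capita indoor standard, and landscaped area times reference evapotranspiration times an ET adjustment factor) are a common way to express such budgets and are an assumption here, not necessarily the Efficiency Explorer's exact calculation. All function names and the example parameter values are hypothetical.

```python
# Gallons needed to cover one square foot with one inch of water (144 / 231, rounded).
GALLONS_PER_SQFT_INCH = 0.62


def indoor_budget_gallons(population, gpcd, days):
    """Indoor budget: people x gallons per capita per day x days (assumed form)."""
    return population * gpcd * days


def outdoor_budget_gallons(landscaped_area_sqft, eto_inches, et_factor):
    """Outdoor budget: landscaped area x reference ET x ET factor (assumed form)."""
    return landscaped_area_sqft * eto_inches * et_factor * GALLONS_PER_SQFT_INCH


def residential_target_gallons(population, gpcd, days,
                               landscaped_area_sqft, eto_inches, et_factor):
    """Residential target as the sum of the indoor and outdoor budgets."""
    indoor = indoor_budget_gallons(population, gpcd, days)
    outdoor = outdoor_budget_gallons(landscaped_area_sqft, eto_inches, et_factor)
    return indoor + outdoor


# Illustrative values only: the per-capita standard and ET factor are
# policy choices, not values taken from this post.
target = residential_target_gallons(
    population=1000, gpcd=55, days=30,
    landscaped_area_sqft=1_000_000, eto_inches=5.0, et_factor=0.8,
)
```

Keeping the policy-dependent parameters as plain arguments makes it easy to see how a target shifts when a pending policy decision changes one of them.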

Data quality is an important dimension of this rapid first assessment. This post elaborates on the data quality concerns that arose here and that any subsequent target calculation effort must also contend with.

Distinct Types of Error

There are two distinct senses in which efficiency target calculations can deviate from ground truth: precision and accuracy.

Imprecision  

Parameter data used to calculate targets can be imprecise. Imprecision reflects statistical deviations around a true value. The Efficiency Explorer's graphs include gray confidence bands around each agency's calculated target to indicate the imprecision resulting from the compounded statistical error for all parameter data sources. Analogous to the relationship between the darts and the bullseye in figure (a) above, one should expect the ground truth efficiency target values to lie somewhere within the confidence bands (for agencies not flagged as showing evidence of systematic inaccuracy).

While this type of error is important to understand, it is not the focus of this post. For technical details on each component error source, please see our error model. Takeaway: while imprecise target calculations can be further refined, they are useful first approximations of ground truth.

Inaccuracy

As alluded to above, in certain situations parameter data used to calculate targets can be not only imprecise, but also inaccurate. Inaccuracy reflects a more systematic bias away from ground truth. Figure (b) above graphically illustrates this type of error. Non-random inaccuracies can arise from situations such as the prevalence of large rural residential parcels in certain districts, which would result in systematic overestimation of target calculations in those districts. The prevalence of brown lawns in other districts would result in systematic underestimation of target calculations in those districts. Targets flagged as systematically inaccurate should not be interpreted as useful approximations and have therefore been grayed out on the Efficiency Explorer map.

Roughly 20% of agency target calculations show evidence of this more problematic source of error. The focus of this post is to break down the component data quality concerns leading to systematic inaccuracies.

Data Quality Concerns

We identified five data quality concerns for our target calculations, each evaluated on an agency-by-agency basis. The five concerns are as follows:

  • CIMIS Proximity
  • Rural Residential Prevalence
  • Residential Parcel Accuracy
  • Census Place Coverage
  • Service Boundary

Let’s explore each in detail.

CIMIS Proximity

Description

Suppliers with greater environmental demands for evaporation and plant transpiration will feel a greater demand for outdoor water production. Reference evapotranspiration captures this demand and scales the outdoor budget accordingly. To measure reference evapotranspiration, data from the Department of Water Resources' CIMIS stations are used to calculate inverse distance-weighted averages of nearby station readings for each supplier as a function of time. However, not all suppliers are within close proximity to CIMIS stations. Suppliers with no CIMIS stations within 20 kilometers, or with obvious intervening obstructions such as mountain ranges, have been flagged for data quality concerns.
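The inverse distance-weighted average and the 20 kilometer proximity check described above can be sketched as follows. This is a minimal illustration, not the CaDC implementation: the weighting exponent, the use of every supplied station, the use of a single supplier point, and all function names are assumptions. The check for intervening obstructions such as mountain ranges is not captured here.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_STATION_DISTANCE_KM = 20.0  # proximity threshold stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_eto(supplier_lat, supplier_lon, stations, power=2.0):
    """Return (ETo estimate, proximity flag) for one supplier at one time step.

    stations is a list of (lat, lon, eto) tuples. The exponent `power`
    is a hypothetical choice; the post does not specify one.
    """
    if not stations:
        return None, True  # no readings at all: nothing to average, flag it
    pairs = [(haversine_km(supplier_lat, supplier_lon, lat, lon), eto)
             for lat, lon, eto in stations]
    # Flag the supplier when even the nearest station is beyond the threshold.
    flagged = min(d for d, _ in pairs) > MAX_STATION_DISTANCE_KM
    for d, eto in pairs:
        if d == 0.0:  # station sits exactly on the supplier point
            return eto, flagged
    weights = [1.0 / d ** power for d, _ in pairs]
    estimate = sum(w * eto for w, (_, eto) in zip(weights, pairs)) / sum(weights)
    return estimate, flagged
```

Repeating the call for each time step gives the estimate as a function of time. A supplier whose flag is raised still receives an estimate, but it is driven by distant stations and should be treated with caution.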

Example

Blue dots represent CIMIS stations. In certain parts of the state there is insufficient coverage to obtain reliable ET readings.

Moving Forward

Higher-resolution ET data is foundational to accurately assigning parcel-level ET and thus calculating an accurate, parcel-customized water budget. The Department of Water Resources' spatial CIMIS data can provide more robust evapotranspiration measurements at a higher resolution.

Importantly though, according to the latest spatial CIMIS methodology, that measurement incorporates only solar radiation in addition to local in situ sensor measurements. To help improve on those measurements CaDC staff has partnered with researchers at UCLA and NYU CUSP to integrate additional pertinent publicly available data sources such as wind, precipitation and physiography. The latest work on that project is open source (like all of CaDC's projects) and available on the CaDC GitHub.  In addition, Moulton Niguel Water District, a founding participant in the CaDC, has developed an applied R&D partnership with Jet Propulsion Laboratory to improve the spatial granularity of evapotranspiration estimates.

Rural Residential Prevalence

Description

Suppliers with more landscaped area may have greater outdoor residential water production demands. Their outdoor residential production budgets are scaled accordingly.

There are ongoing discussions to finalize a definition of landscaped area appropriate for setting an outdoor water production standard. In our calculations, landscaped area is defined as the sum of photosynthetically active turf, bushes, and trees. These data were collected through remote sensing in partnership with Claremont Graduate University.

While the landscapes of typical suburbs without brown lawns are captured well with the photosynthetically active remote sensing approach used, there generally exists greater data quality uncertainty for rural and wooded areas.

For these reasons, suppliers with a significant percentage of large, green, and un-irrigated residential parcels have been flagged for data quality concerns.

Example

Red boundaries represent residential parcel borders. The large wooded areas seen here are likely not irrigated, and including them will result in systematic overestimation of target calculations in this district and in others exhibiting rural residential prevalence.

Moving Forward

There seems to be an important space for human experts to manually inspect ambiguous areas to reconcile what specific category the area falls into. However, large-scale remote sensing has proved invaluable for addressing water efficiency in a way that can scale statewide. Even in areas where additional work must be done, remote sensing can be used to flag these challenging areas that need that special attention, allowing experts to operate more clinically.

Residential Parcel Accuracy

Description

Administrative boundary data were required to demarcate residential parcels from water-using parcels designated for other land uses. For this iteration, residential parcel data were obtained from the Office of Planning and Research's residential parcel dataset. However, administrative boundary data must be deliberately validated.

If parcel data suggest that there is less than one person per residential parcel in a district, that district's residential landscaped area will likely be systematically overestimated. Conversely, if parcel data indicate that there are too many people per residential parcel, landscaped area measurements will be systematically underestimated. For these reasons, suppliers with an unreasonable average number of people per residential parcel have been flagged for data quality concerns.
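The people-per-parcel check can be sketched as below. The lower bound of one person per parcel comes from the text. The upper bound is a hypothetical placeholder, since the post does not say what counts as too many people per parcel, and the function name and return labels are also assumptions.

```python
MIN_PEOPLE_PER_PARCEL = 1.0   # lower bound stated in the text
MAX_PEOPLE_PER_PARCEL = 10.0  # hypothetical upper bound, for illustration only


def parcel_accuracy_concern(population, residential_parcels,
                            low=MIN_PEOPLE_PER_PARCEL,
                            high=MAX_PEOPLE_PER_PARCEL):
    """Classify a supplier by its average number of people per residential parcel.

    Returns "overestimate" when there are too few people per parcel
    (landscaped area likely overstated), "underestimate" when there are
    too many (landscaped area likely understated), "no_parcels" when the
    ratio cannot be computed, and None when the ratio looks reasonable.
    """
    if residential_parcels <= 0:
        return "no_parcels"
    ratio = population / residential_parcels
    if ratio < low:
        return "overestimate"
    if ratio > high:
        return "underestimate"
    return None
```

For example, a district with 500 people and 1,000 residential parcels would come back as "overestimate", which matches the first case described above.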

Example

Residential blocks in the bottom left and upper right are not captured by OPR's residential parcel data.

Moving Forward

OPR’s statewide residential parcel data offers a great starting point that can be built upon.  Subsequent target calculations will need to improve parcel data to better align landscape area with customer type.

An approach enabled by CaDC's data sharing model is to match meters to parcels in a non-manual (read: scalable) way, highlighting areas with maintained landscapes that use retail water. The current CaDC member agencies (14 as of this publication) have already made the pioneering investment in common data infrastructure to integrate that metered use and parcel data on a voluntary basis.

Census Place Coverage

Description

Census Designated Places were used by Claremont Graduate University to focus computational resources used for landscaped area remote sensing calculations. If a large percentage of a service area's residential parcels are outside of a designated census place, landscaped area estimates will be off with this approach.
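One way to quantify this concern is to compute the share of a service area's residential parcels that fall outside every Census Designated Place. The sketch below is an illustration under simplifying assumptions: each parcel is represented by a single centroid point, each place is a simple polygon given as a list of (x, y) vertices in a planar coordinate system, and the function names are hypothetical. A production workflow would more likely rely on a GIS library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def share_outside_places(parcel_centroids, places):
    """Fraction of parcel centroids that fall in none of the place polygons."""
    if not parcel_centroids:
        return 0.0
    outside = sum(
        1 for (x, y) in parcel_centroids
        if not any(point_in_polygon(x, y, poly) for poly in places)
    )
    return outside / len(parcel_centroids)


# Illustrative data: one square place and four parcel centroids.
place = [(0, 0), (10, 0), (10, 10), (0, 10)]
centroids = [(5, 5), (15, 5), (2, 2), (20, 20)]
share = share_outside_places(centroids, [place])
```

A supplier could then be flagged when this share exceeds a chosen cutoff. The post does not state what that cutoff was, so none is hard-coded here.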

Example

Census Designated Places (purple) were used to focus computational resources on fly-over imagery tiles more likely to have people. However, residential parcels (red) that exist outside of such places were not captured.

Moving Forward

The Census Designated Places acted as a filter to focus limited computational resources on imagery tiles most likely to have people. This was an artifact of achieving statewide landscape area estimates for 4% of the State's budget in this initial iteration. In future iterations, this quality concern should be relatively straightforward to overcome.

With this in mind, it should be made explicit that whichever tools and data California uses to carry out and explore statewide target calculations should be agnostic to the landscape area data source. Following this design principle, CaDC staff ensured the digital infrastructure built for this pragmatic first approximation of statewide efficiency targets was robust and general enough to smoothly receive iterative refinements of source data for all budget parameters. 

Service Boundary

Description

Administrative boundary data were also required to demarcate supplier service areas in order to appropriately allocate landscaped area. For this iteration, data were obtained from the Department of Water Resources' Water Management Planning Tool as well as the California Environmental Health Tracking Program's Water System Map Viewer.

A service area's boundary data can overlap with neighboring boundaries, which introduces a data quality concern.

Examples

Moving Forward

Service area boundaries remain an important area for improvement. The California Environmental Health Tracking Program's water system boundary data needed additional processing, and selective supplanting with Department of Water Resources data, before it could be used. Encouragingly, the Health Tracking Program seems to have the digital infrastructure in place for local water system personnel to drive iterative improvements toward an authoritative service area boundary dataset.

Conclusions

The five concerns above were combined into an aggregate data quality assessment for each agency's target calculation. For the agency-level evaluations see our agency-by-agency breakdown, and for a more general look at the technical foundation of this work, please see the CaDC Statewide Efficiency Explorer Methodology v 1.1.
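A minimal sketch of how five per-agency flags might be rolled up into one assessment follows. The post does not describe the actual combination rule, so the rule used here (an agency is flagged if any single concern is raised) is an assumption, as are the class and function names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QualityFlags:
    """One boolean per data quality concern discussed in this post."""
    cimis_proximity: bool = False
    rural_residential_prevalence: bool = False
    residential_parcel_accuracy: bool = False
    census_place_coverage: bool = False
    service_boundary: bool = False


def raised_concerns(flags):
    """Names of the concerns that are raised for an agency."""
    return [f.name for f in fields(flags) if getattr(flags, f.name)]


def is_flagged(flags):
    """Assumed rule: any single raised concern flags the agency's target."""
    return bool(raised_concerns(flags))


# Hypothetical agency with two concerns raised.
example = QualityFlags(cimis_proximity=True, service_boundary=True)
```

Keeping the individual flags alongside the aggregate makes it possible to report which concerns apply to an agency, not just whether its target was flagged.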

While the CaDC's rapid first assessment of Governor Brown's long-term efficiency framework was the proximate cause of this exploration of the data quality landscape, it is important to keep in mind that these concerns (and their suggested remedies) apply to any subsequent target calculation effort.

The CaDC is currently scoping a version 2.0. It will expand the residential efficiency exploration to commercial, industrial and institutional water use, building on work pioneered by two current CaDC data scientists while they were at NYU CUSP in a research group led by Constantine Kontokosta. This integrated suite of version 2.0 tools will also look at supply in addition to demand. Stay tuned for more information!

Implementing this framework will inevitably require coordinating disparate data sources maintained by disparate institutions. By working together smartly and collaboratively, we can ensure water reliability no matter what the future holds. We invite all California water utilities to join the winning team and participate in the CaDC today!