
Location Affordability Portal


Data & Methodology (Version 1)

This page describes the data and methodology used for Version 1 of the Location Affordability Index, launched in November 2013. The data for Version 1 can be downloaded here.

There’s more to affordability than how much you pay for housing. Transportation costs are the second-biggest budget item for most families and have an important and robust relationship with housing costs and location. The Location Affordability Index sheds new light on affordability by showing users the combined cost of housing and transportation as a percentage of income.

Housing + Transportation = Location Affordability

Despite the simplicity of the concept, the Index is generated using an enormous amount of data and a series of complex analyses. This page explains in general language what the Index is and how it is produced (for a more technical description, please see the LAI Methodology).


The goal of the Location Affordability Index is to give consumers access to reliable, standardized data on the cost of location to make more informed decisions about where to live and work. There are four elements—explained in greater detail in the following sections—that must be grasped in order to fully understand what the Index is, what it says, and how it is produced. They are:

  • Data sources – the Index is calculated using data from a number of different sources
  • Geographical scale – as with any informational resource used by consumers, the level of specificity makes a difference in how it can be used. For instance, weather forecasts are precise to the zip code, but flooding advisories apply to entire counties, making them less precise and thus less useful.
  • Prediction method – because there is no existing source of data that tells us what we’re interested in (the housing and transportation costs for a location, regardless of who lives there), we need to use existing data to make estimates. We do this using statistical regression modeling.
  • Components – the Index is expressed in terms of housing costs, transportation costs, and income. These three components are derived in three different ways.
    • Housing costs – predicted using regression modeling
    • Transportation costs – calculated by predicting how much transportation people in a given neighborhood use—i.e. how many cars they own, how much they drive, and how much they use public transit—and then multiplying each of these quantities by the cost per use (e.g. annual cost per car).
    • Income – to make the Index as concrete and useful as possible, housing and transportation costs are calculated for eight different household profiles characterized by income level, household size and number of commuters. These costs are then divided by the income for each profile to give a percentage of a given family’s income associated with a given location. The incomes used are based on the median income levels in each region covered by the Index.

Data Sources

In all, the Index draws from six different Federal data sources, Illinois state odometer readings, and transit data compiled by the Center for Neighborhood Technology:

  • U.S. Census American Community Survey (ACS) – an ongoing survey that generates data on community demographics, income, employment, transportation use, and housing characteristics. Here we use the 2008-2012 survey data.
  • U.S. Census TIGER/Line Files – contains data on geographical features such as roads, railroads, and rivers, as well as legal and statistical geographic areas.
  • U.S. Census Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES) – detailed spatial distributions of workers' employment and residential locations and the relation between the two at the Census block level, with characteristic detail on age, earnings, industry distributions, and local workforce indicators (see overview here). Here we use LODES and OnTheMap Version 6, which are built on 2010 Census data.
  • National Transit Database – over 660 transit providers that are recipients or beneficiaries of Federal Transit Administration grants report annually on transit service and safety. We use the 2010 database, which corresponds to the middle year of the 2008-2012 ACS data.
  • Consumer Expenditure Survey – a set of ongoing surveys conducted by the Bureau of Labor Statistics (part of the Department of Labor) that provide information on the buying habits of American consumers, including data on their expenditures, income, and consumer unit (families and single consumers) characteristics.
  • AllTransit™ database – developed by the Center for Neighborhood Technology, this is a compilation of General Transit Feed Specification (GTFS) station and stop data for bus, rail, and ferry service for more than 75 percent of all metropolitan and micropolitan areas in the country with populations larger than 250,000 and 41 percent of those with populations of less than 250,000 (see detailed documentation for coverage area).
  • Illinois State odometer readings – as part of the smog check required for vehicles in the state’s non-attainment areas (the Chicago and St. Louis metro areas), the Illinois Environmental Protection Agency records odometer readings. Odometer data for 2007 and 2009 were compared to determine how many miles had been driven by location. To validate the use of these data for the entire country, they were compared with the 2009 National Household Travel Survey (maintained by the Federal Highway Administration).
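The odometer comparison can be illustrated with a short Python sketch that annualizes the miles driven between two dated readings of the same vehicle. The function name, the use of the actual number of days elapsed, and the sample readings are assumptions made for illustration. The Index's own procedure for cleaning the inspection records and aggregating them by location is described in the technical methodology, not here.

```python
from datetime import date


def annual_vmt(reading_start: float, date_start: date,
               reading_end: float, date_end: date) -> float:
    """Annualize the miles driven between two odometer readings (hypothetical helper)."""
    days = (date_end - date_start).days
    if days <= 0:
        raise ValueError("the second reading must be dated after the first")
    if reading_end < reading_start:
        # A decreasing odometer usually signals a rollover or a data-entry error.
        raise ValueError("odometer reading decreased between inspections")
    # Scale the miles driven over the elapsed days up to a 365.25-day year.
    return (reading_end - reading_start) * 365.25 / days


# Made-up vehicle inspected in 2007 and again in 2009
vmt = annual_vmt(41_000, date(2007, 6, 1), 63_000, date(2009, 6, 1))
print(f"{vmt:,.0f} miles per year")
```

Averaging values like this across the vehicles in an area would give one possible location-based VMT figure. This page does not say whether or how the Index weights or filters individual vehicles.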

These data contain information about the characteristics of every Census block group in the Index’s coverage area.

Geographic Scale

Given currently available data, we are able to reliably estimate housing and transportation costs at the Census block-group level. Census block groups have populations between 600 and 3,000 people. They vary in size depending on an area’s population density, ranging from only a few city blocks to the entirety of some rural counties. Block groups are the smallest geographical unit for which reliable data is available; they can generally be thought of as representing neighborhoods (bearing in mind the different ways people understand the concept of “neighborhood”). The Index covers 942 Core Based Statistical Areas (similar to metropolitan areas), accounting for 94% of the U.S. population.

Key Concept: Regression Modeling

To calculate the housing and transportation costs for a given location, we use data for demographics and features of the built environment that we know influence these costs: income, average household size, average commuters per household, population density, walkability, transit access, and employment access. Using these data and statistical regression – a widely used statistical technique that assesses the relationship between one or more inputs and an output – we generate a series of mathematical models for the relationship between all of these data points and housing and transportation costs. By plugging data into these models, we can estimate components of housing and transportation costs at the Census block-group level that can then be used to calculate the Index.

For an illustration of how this works, think about the relationship between driving and walkability. However you measure it, the greater a neighborhood’s walkability, the less its residents will drive, all else being equal. In order to use data on walkability to predict transportation costs, a researcher would need to model this relationship. He or she would do this by looking at existing data on vehicle miles traveled (VMT) and walkability for many block groups (there are almost 200,000 covered by the Index). Next, he or she would use statistical regression modeling to come up with the best possible approximation (or model) of that relationship, represented by an equation for a line through the middle of the data points. Finally, he or she would use this equation and data on walkability to estimate the VMT for different block groups, which is then used to calculate total transportation costs.

Figure 1: Regression Modeling in Concept


Modeling Transportation Behavior

To estimate transportation costs, we start by generating three regression models that predict the three major behaviors or decisions that determine transportation costs for a given block group: how many cars families own (average cars per household), how much they drive (average annual vehicle miles traveled, or VMT), and how much they use transit (percent of commuters using transit). The inputs used to generate these models (Table 1) are calculated from Federal and transit data to approximate key demographic characteristics and features of the built environment: income, average household size, average commuters per household, population density, walkability, transit access, and employment access. To capture walkability, for instance, we calculate a neighborhood’s block density (blocks per square mile) and intersection density (intersections per square mile), both of which are positively correlated with how much people walk in an area. (For more information about these model inputs, see the detailed methodology documentation.) Data for the dependent or outcome variables (i.e., the quantities each model will predict) are also required as inputs (Table 2).

Table 1: Regression Modeling Inputs: Independent (Predictor) Variables

| Variable | Description | Data Source |
|---|---|---|
| Gross Density | # of households / total acres | Census ACS, TIGER/Line files |
| Residential Density | # of households in residential blocks / total acres in residential blocks | Census ACS, TIGER/Line files |
| Block Density | # of blocks / total land area | Census TIGER/Line files |
| Intersection Density | # of intersections / total land area | Census TIGER/Line files |
| Transit Connectivity Index | Transit access as a function of transit service frequency and proximity to transit nodes, weighted by observed journey-to-work data | AllTransit™ database |
| Transit Access Shed | Optimal accessible area by public transportation within 30 minutes, allowing for one transfer | AllTransit™ database |
| Transit Frequency of Service | Service frequency within a Transit Access Shed | AllTransit™ database |
| Employment Access Index | Number of jobs in area block groups / squared distance to those block groups | Census LEHD-LODES |
| Job Diversity Index | Function of the correlation between employment in 20 different industry sectors and autos per household | Census LEHD-LODES |
| Average Median Commute Distance | Calculated from data on spatial distributions of workers' employment and residential locations and the relation between the two at the Census block level | Census LEHD-LODES |
| Median Household Income | – | Census ACS |
| Average Household Size | Calculated from data on Tenure and Total Population in Occupied Housing Units by Tenure | Census ACS |
| Per-capita Household Income | Median household income / average household size | Census ACS |
| Average Commuters per Household | Calculated using the total number of workers 16 years and over who do not work at home | Census ACS |
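Several predictors in Table 1 are simple ratios, and the Employment Access Index is described as jobs divided by squared distance. The Python sketch below computes two of the density measures and a gravity-style employment access score under stated assumptions:

* Block-group locations are projected x/y coordinates in miles.
* The score sums over every block group passed in.
* A hypothetical `min_distance` floor keeps a block group's own jobs (at distance zero) from causing a division by zero.

The Index's actual distance measure, search area, and handling of that case may differ.

```python
import math

Point = tuple[float, float]


def gross_density(households: int, total_acres: float) -> float:
    """Gross density: households per acre of total area."""
    return households / total_acres


def intersection_density(intersections: int, land_sq_miles: float) -> float:
    """Intersection density: intersections per square mile of land."""
    return intersections / land_sq_miles


def employment_access(origin: Point,
                      job_locations: list[tuple[Point, int]],
                      min_distance: float = 0.25) -> float:
    """Gravity-style access: sum of jobs / distance**2 over block groups.

    Coordinates are assumed to be in miles; min_distance (hypothetical)
    floors each distance so that co-located jobs do not divide by zero.
    """
    total = 0.0
    for location, jobs in job_locations:
        distance = max(math.dist(origin, location), min_distance)
        total += jobs / distance ** 2
    return total


block_groups = [((0.0, 0.0), 500), ((1.0, 0.0), 2_000), ((0.0, 3.0), 9_000)]
print(employment_access((0.0, 0.0), block_groups))  # 8000 + 2000 + 1000 = 11000.0
```

Because jobs are divided by squared distance, 9,000 jobs three miles away contribute less here than 2,000 jobs one mile away. That is the intuition behind a gravity measure.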

Table 2: Regression Modeling Outputs: Dependent (Outcome) Variables for Transportation Usage

| Variable | Data Source |
|---|---|
| Cars per household | Census ACS |
| Annual VMT per household | Illinois Environmental Protection Agency, National Household Travel Survey |
| Percentage of commuters using transit | Census ACS |

Once these models have been developed, we can use them to estimate the average autos per household, annual vehicle miles traveled, and percentage of commuters using transit for all 198,373 Census block groups covered by the Index. This is done by plugging the data for each block group's 14 predictor variables into each of the three models.
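Applying the three fitted models to every block group amounts to a loop over block groups and models. The Python sketch below makes two simplifications purely for illustration:

* Each model is linear, with hypothetical coefficients on only three of the 14 predictors.
* Predictor names such as `median_income_k` (median income in thousands of dollars) are made up.

The real models use all 14 predictors, and their forms and coefficients are given in the technical methodology. Predictions are floored at zero because a linear model can otherwise return negative car ownership, mileage, or transit share.

```python
# Hypothetical linear models: an intercept plus one coefficient per predictor.
# Placeholder numbers only; they are not the Index's estimated coefficients.
MODELS = {
    "autos_per_household": {"intercept": 1.9, "intersection_density": -0.002,
                            "median_income_k": 0.006, "transit_connectivity": -0.0001},
    "annual_vmt": {"intercept": 22_000, "intersection_density": -30,
                   "median_income_k": 60, "transit_connectivity": -1.0},
    "pct_commuters_transit": {"intercept": 2.0, "intersection_density": 0.03,
                              "median_income_k": -0.02, "transit_connectivity": 0.004},
}


def predict(model: dict[str, float], predictors: dict[str, float]) -> float:
    """Evaluate one linear model for one block group, floored at zero."""
    value = model["intercept"] + sum(
        coef * predictors[name] for name, coef in model.items() if name != "intercept"
    )
    return max(value, 0.0)


def estimate_usage(block_groups: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Plug each block group's predictors into each of the three models."""
    return {
        bg_id: {outcome: predict(model, x) for outcome, model in MODELS.items()}
        for bg_id, x in block_groups.items()
    }


usage = estimate_usage({
    "bg_001": {"intersection_density": 100, "median_income_k": 60,
               "transit_connectivity": 2_000},
})
print(usage["bg_001"])
```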

Figure 2: General Method for Estimating Transportation Usage from Regression Models


Most of the predictor variables come from data that describe features of a neighborhood that are the same regardless of who lives there: population density, walkability, and transit and employment access. For the demographic characteristics of a neighborhood (average household size, income, and number of commuters), using the actual averages for each block group wouldn’t produce a very useful Index. Since people tend to live in places they can afford, using actual demographic data would produce a map on which most neighborhoods look roughly equally affordable. Instead, we have chosen eight household profiles, characterized by the number of family members, income, and number of commuters, that represent a wide range of American families. We calculate these profiles for each region and then use them to generate eight different sets of outcome variables for each block group. This approach allows us to show how affordable every neighborhood in a region is for each specific household type, rather than only for the households that currently live there.

Table 3: Household Profiles Used for Estimating Transportation Usage

| Household type | Family members | Income | Commuters |
|---|---|---|---|
| Regional Typical | Average Household Size for Region | Median Income for Region | Average number of Commuters per Household for Region |
| Regional Moderate | Average Household Size for Region | 80% of Median Income for Region | Average number of Commuters per Household for Region |
| Low Income | 3 | 50% of Housing and Urban Development Area Median Family Income | 1 |
| Single Person Very Low Income | 1 | National Poverty Line | 1 |
| Single Professional | 1 | 200% of Per Capita Income for Region | 1 |
| Single Worker | 1 | Median Per Capita Income for Region | 1 |
| Dual-Income Family | 4 | 150% of Median Income for Region | 2 |
| Retirees | 2 | 80% of Median Income for Region | 0 |
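Table 3 can be expressed as a small function that turns regional statistics into the eight profiles. A second helper swaps a profile's demographics into a block group's predictors and leaves the built-environment predictors unchanged. This Python sketch is illustrative only:

* The parameter and predictor names are hypothetical.
* The regional figures in the usage example are made up.
* The per-capita income predictor is recomputed as income divided by household size, following its definition in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    household_size: float
    income: float
    commuters: float


def household_profiles(avg_hh_size: float, median_income: float, avg_commuters: float,
                       hud_area_median_family_income: float, per_capita_income: float,
                       median_per_capita_income: float, poverty_line: float) -> list[Profile]:
    """The eight household profiles of Table 3 for one region."""
    return [
        Profile("Regional Typical", avg_hh_size, median_income, avg_commuters),
        Profile("Regional Moderate", avg_hh_size, 0.8 * median_income, avg_commuters),
        Profile("Low Income", 3, 0.5 * hud_area_median_family_income, 1),
        Profile("Single Person Very Low Income", 1, poverty_line, 1),
        Profile("Single Professional", 1, 2.0 * per_capita_income, 1),
        Profile("Single Worker", 1, median_per_capita_income, 1),
        Profile("Dual-Income Family", 4, 1.5 * median_income, 2),
        Profile("Retirees", 2, 0.8 * median_income, 0),
    ]


def profile_predictors(block_group: dict[str, float], p: Profile) -> dict[str, float]:
    """Replace a block group's demographic predictors with the profile's values."""
    x = dict(block_group)  # built-environment predictors are kept as-is
    x.update(
        median_household_income=p.income,
        average_household_size=p.household_size,
        per_capita_household_income=p.income / p.household_size,
        average_commuters_per_household=p.commuters,
    )
    return x


profiles = household_profiles(2.6, 60_000, 1.3, 70_000, 30_000, 27_000, 11_000)
bg = {"intersection_density": 100.0, "median_household_income": 95_000.0}
for p in profiles:
    print(p.name, profile_predictors(bg, p)["median_household_income"])
```

The same block group thus yields eight predictor sets, one per profile. This is why the Index reports eight sets of outcomes per block group.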

Figure 3: Specific Method for Estimating Transportation Usage for Eight Different Household Profiles


Calculating Transportation Costs

Once the average transportation usage for each block group is estimated for each demographic profile, we can use those estimates to calculate total annual transportation costs.

Auto ownership and driving costs

The regression models we use estimate transportation usage, not total transportation costs. To calculate total automobile-related transportation costs, we multiply the estimated transportation usage (i.e., car ownership and vehicle miles traveled) by the cost per use (Figure 4). Ownership costs include all expenses, from the time of purchase on, that are required to keep the car roadworthy: purchase costs (spread over the length of ownership) or car payments, insurance, license and registration fees, taxes, and routine repairs and maintenance. Driving costs include the cost of gas and maintenance due to wear and tear.
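The multiplication described above is simple enough to sketch directly. In this Python example the usage figures stand in for model output. The per-auto and per-mile rates are hypothetical placeholders, not the CES-derived values the Index uses.

```python
def annual_auto_costs(autos_per_household: float, annual_vmt: float,
                      ownership_cost_per_auto: float,
                      driving_cost_per_mile: float) -> dict[str, float]:
    """Multiply estimated usage by cost per use (the approach of Figure 4)."""
    # Ownership: purchase or payments, insurance, fees, taxes, routine repairs
    ownership = autos_per_household * ownership_cost_per_auto
    # Driving: gas plus maintenance due to wear and tear
    driving = annual_vmt * driving_cost_per_mile
    return {"ownership": ownership, "driving": driving, "total": ownership + driving}


# Usage estimates as if from the models, with hypothetical cost rates
costs = annual_auto_costs(1.8, 18_000, ownership_cost_per_auto=4_000,
                          driving_cost_per_mile=0.15)
print(costs)
```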

Figure 4: Calculating Auto Ownership and Driving Costs


The car ownership cost and car use cost components of the Index are generated using the Consumer Expenditure Survey (CES) from the U.S. Bureau of Labor Statistics. New research undertaken for the development of this site represents a significant advance over previous measures, which focused primarily on autos less than five years old and used a single cost multiplier for all vehicle owners regardless of income (a summary of the analysis can be found here).
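The Python sketch below illustrates why a single cost multiplier can mislead. It contrasts one flat ownership cost per auto with a lookup that varies by household income. The income thresholds and dollar amounts are invented placeholders, not CES results. The actual analysis summarized for this site may not use a simple banded lookup at all.

```python
from bisect import bisect_right

# Hypothetical schedule: a household at or above a threshold falls into the
# next band. Dollar amounts are placeholders, not CES-derived costs.
INCOME_THRESHOLDS = [25_000, 50_000, 100_000]
COST_PER_AUTO = [3_000, 3_800, 4_600, 5_500]
FLAT_COST_PER_AUTO = 4_600  # a single multiplier applied to every owner


def ownership_cost_per_auto(income: float) -> float:
    """Look up the annual per-auto ownership cost for a household's income band."""
    return COST_PER_AUTO[bisect_right(INCOME_THRESHOLDS, income)]


for income in (20_000, 60_000, 150_000):
    print(income, ownership_cost_per_auto(income), FLAT_COST_PER_AUTO)
```

With these made-up numbers, the flat multiplier would overstate the lowest-income household's cost by $1,600 per car and understate the highest-income household's cost by $900.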

Transit costs

There is no existing data on the average number of transit trips or expenditures per commuter or per household at the block-group level. Using regression modeling, we estimate the percentage of workers in each block group commuting by transit. We then use these estimates along with data from the ACS, the National Transit Database (maintained by the Federal Transit Administration), and our household profiles to calculate estimated annual transit trips and expenditures per household using the following steps:

  1. Total transit commuters (metropolitan area)
    a. Commuters in each block group = [number of households] × [commuters per household]
    b. Transit commuters in each block group = [number of commuters] × [estimated percentage of commuters using transit]
    c. Transit commuters in each metropolitan area = sum of transit commuters in all block groups in the metropolitan area
  2. Annual transit trips and transit fares paid (metropolitan area) – Transit trips and transit revenue are reported to the National Transit Database by transit agency, not by metro area. As a result, we need to allocate transit agency trips and farebox revenue (which should equal the total fares paid by transit riders) to metro areas according to the proportion of the transit agency’s coverage area in each metro area. For instance, the Massachusetts Bay Transportation Authority (MBTA) commuter rail serves stops in the Boston, Worcester, and Providence metropolitan areas, so the commuter rail’s reported farebox revenues and trips are allocated to those three regions according to the proportion of total stops in each.
    a. Annual transit trips (metro area)
    b. Annual transit fares (metro area)
  3. Annual transit trips and transit fares per transit commuter (metropolitan area)
    a. Annual transit trips per transit commuter = [2a] / [1]
    b. Annual transit fares per transit commuter = [2b] / [1]
  4. Estimated transit commuters per household (each household profile @ block group) = [commuters per household] × [estimated percentage of commuters using transit]
  5. Estimated transit trips and transit fares per household (each household profile @ block group)
    a. Estimated annual transit trips per household = [4] × [3a]
    b. Estimated annual transit fares per household = [4] × [3b]
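Steps 1 through 5 can be sketched in Python as below. The dictionary field names and all input numbers are hypothetical, and percentages are stored on a 0 to 100 scale. Following the MBTA example, each agency's trips and farebox revenue are split across metro areas by its share of stops.

```python
from collections import defaultdict


def metro_transit_commuters(block_groups: list[dict[str, float]]) -> float:
    """Step 1: sum estimated transit commuters over a metro area's block groups."""
    return sum(bg["households"] * bg["commuters_per_hh"] * bg["pct_transit"] / 100
               for bg in block_groups)


def allocate_to_metros(agencies: list[dict]) -> tuple[dict[str, float], dict[str, float]]:
    """Step 2: split each agency's annual trips and fares by its share of stops per metro."""
    trips: dict[str, float] = defaultdict(float)
    fares: dict[str, float] = defaultdict(float)
    for agency in agencies:
        total_stops = sum(agency["stops_by_metro"].values())
        for metro, stops in agency["stops_by_metro"].items():
            share = stops / total_stops
            trips[metro] += agency["annual_trips"] * share
            fares[metro] += agency["farebox_revenue"] * share
    return trips, fares


def household_transit(commuters_per_hh: float, pct_transit: float,
                      trips_per_commuter: float,
                      fares_per_commuter: float) -> tuple[float, float]:
    """Steps 4 and 5: annual transit trips and fares for one profile in one block group."""
    transit_commuters = commuters_per_hh * pct_transit / 100  # step 4
    return transit_commuters * trips_per_commuter, transit_commuters * fares_per_commuter


# Made-up metro "A" with two block groups and one agency that also serves metro "B"
commuters_a = metro_transit_commuters([
    {"households": 1_000, "commuters_per_hh": 1.5, "pct_transit": 10},
    {"households": 500, "commuters_per_hh": 1.0, "pct_transit": 20},
])  # 150 + 100 = 250
trips, fares = allocate_to_metros([
    {"stops_by_metro": {"A": 80, "B": 20}, "annual_trips": 50_000, "farebox_revenue": 75_000},
])
trips_per_commuter = trips["A"] / commuters_a  # step 3a
fares_per_commuter = fares["A"] / commuters_a  # step 3b
print(household_transit(1.5, 10, trips_per_commuter, fares_per_commuter))
```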

This calculation relies on the assumption that the average transit trips and fares per household in a given block group are proportional to the percentage of commuters using transit for their journey to work in that block group, relative to the other block groups in the same metro area. This is a reasonable assumption given what we know about the proportion of transit trips that are work-related: the 2009 National Household Travel Survey puts the percentage of transit trips related to work at 33% (versus 20% of car trips; see Table 9), and an analysis of 150 separate on-board passenger surveys by the American Public Transportation Association found that 59.2% of transit trips are work-related. Nevertheless, as with the autos per household and annual vehicle miles traveled figures that appear in the Index, these numbers are averages and do not attempt to represent the exact transit expenses of any specific household.

Modeling Housing Costs

As with transportation usage, we model housing costs using statistical regression. In addition to the 14 variables from the transportation usage models, we include Regional Median Selected Monthly Ownership Costs (SMOC) when generating the block-group model for owner costs, and Regional Median Gross Rent when generating the model for renter costs. Since these models predict housing costs directly, no further calculation is needed.

Table 4: Regression Modeling Inputs Used for Housing Costs Only (all from Census ACS)

| Dependent Variable | Additional Independent Variable |
|---|---|
| Gross Rent | Regional Median Gross Rent |
| Selected Monthly Ownership Costs (SMOC) | Regional Median SMOC |

Figure 5: Method for Estimating Housing Costs for Renters and Owners for Eight Different Household Profiles


Bringing It All Together: The Location Affordability Index

After going through the steps described above, we have all the elements necessary for the Location Affordability Index: housing costs for renters and owners, transportation costs, and income for eight different household profiles for each block group covered by the Index. Selecting from the Household Type menu in the upper left allows the user to pull up the map for each household type.
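The final division can be sketched as follows. The sketch makes three assumptions:

* Housing costs are monthly figures, as ACS gross rent and SMOC are, so they are multiplied by 12.
* The annual transportation cost stands for the sum of auto ownership, driving, and transit costs.
* Both are divided by the profile's annual income.

The function name and sample values are hypothetical.

```python
def location_affordability(monthly_housing_cost: float,
                           annual_transportation_cost: float,
                           annual_income: float) -> dict[str, float]:
    """Housing, transportation, and combined costs as a percentage of income."""
    annual_housing_cost = 12 * monthly_housing_cost
    return {
        "housing_pct": 100 * annual_housing_cost / annual_income,
        "transportation_pct": 100 * annual_transportation_cost / annual_income,
        "combined_pct": 100 * (annual_housing_cost + annual_transportation_cost) / annual_income,
    }


# Made-up renter and owner costs for one profile in one block group
renter = location_affordability(1_200, 9_900, 60_000)
owner = location_affordability(1_500, 9_900, 60_000)
print(renter["combined_pct"], owner["combined_pct"])  # 40.5 46.5
```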

Atlanta in the LAI

U.S. Department of Housing and Urban Development
451 7th Street S.W., Washington, DC 20410
Telephone: (202) 708-1112     TTY: (202) 708-1455