Dogs and Dishwashers – The rental housing pricing problem.

The prices we place on the amenities we love

Hyatt Regency San Francisco, San Francisco, United States

Rental and real estate prices are among the most consequential components of the cost of living in America. The decision of how to price a property depends on a multitude of variables, yet these decisions are seldom subjected to critical analysis. The reverse question also matters: under what conditions do landlords seek to renovate properties and increase rental prices? The rental pricing problem is therefore of great importance to landlords who wish to price their units competitively, to renters seeking maximum value, and to public institutions seeking to understand patterns of investment. In this paper, multiple methods are evaluated against Chicago apartment listings scraped from Craigslist.

Housing remains unique among commonly traded goods and services in that nearly no geographic competition exists. This feature is often compounded by high transaction costs and low transaction frequency. The importance of location in urban centers is demonstrated by the extreme segregation of central business districts, which radiate outward into residential areas (Lucas and Rossi-Hansberg, 2002).

Apartment pricing in the residential real estate market remains stubbornly inefficient. With roughly half of US rentals owned by small, “mom and pop” investors, valuation errors are predictably commonplace. The anchoring effect has been shown to play a significant role in price setting: landlords who buy at a market’s peak systematically charge 2-3% more, and their units sit in inventory 6% longer, than counterparts who purchased at the market’s trough (Giacoletti and Parsons, 2022). This inefficiency is at least partially explained by the conventional wisdom of the “1% rule of real estate”, which holds that monthly rent should equal 1% of the purchase price. These peculiarities and inefficiencies underscore the need for a deeper understanding of price setting.
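The “1% rule” is a one-line formula, and writing it out makes the anchoring mechanism concrete: the rule ties the asking rent to what the landlord paid, not to the unit itself. The sketch below uses purely hypothetical purchase prices for two otherwise identical units. It illustrates the mechanism only and is not meant to reproduce the 2-3% figure reported above.

```python
def one_percent_rule_rent(purchase_price: float) -> float:
    """Monthly rent implied by the "1% rule": 1% of the purchase price."""
    return 0.01 * purchase_price


# Hypothetical purchase prices for two otherwise identical units.
bought_at_peak = 250_000
bought_at_trough = 200_000

for label, price in [("peak", bought_at_peak), ("trough", bought_at_trough)]:
    rent = one_percent_rule_rent(price)
    print(f"bought at {label} for ${price:,}: rule-of-thumb rent ${rent:,.0f}/month")
```

Because the purchase price is the only input, two identical units end up with different rule-of-thumb rents.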

The formal approach to this problem is known as hedonic modeling, which attempts to determine the extent to which each factor affects a property’s price. The goal of this paper is to develop such a model and use it to analyze the specific decision to renovate existing rental units. By investigating the wealth of information publicly available on apartment listing sites, we can construct a model that accurately explains the bulk of the variation in price between rental units. By mining data available through Craigslist, we can build both a predictive model of investor action and a prescriptive model for investors in the residential real-estate space.

The problem of hedonic regression has been approached in numerous other fields. Rosen (1974) outlines a method since adopted by numerous institutions to build explanatory models of real estate pricing. Su et al. (2021) used this approach to good effect in a paper with similar goals, using web-based classifieds to investigate the effect of landscape amenities on rental prices in China. The hedonic model decomposes a property into its constituent attributes: square footage, number of bathrooms, number of bedrooms, and so on. The advantage of this model is that it transparently accounts for the attributes of a property and is easily interpreted. Because of its ease of use and interpretation, the hedonic model has become the industry standard for valuation in the American housing market. Hedonic models are often limited, however, by poor generalizability.
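In its simplest linear form, a hedonic price function in the spirit of Rosen (1974) can be written as shown below. Here $P_i$ is the rent of unit $i$ and $z_{ik}$ is its $k$-th attribute (square footage, bedrooms, a 0/1 in-unit laundry indicator, and so on). Each coefficient $\beta_k$ is read as the implicit price of that attribute. The linear specification is one common choice shown for illustration; Rosen's framework itself does not require linearity.

```latex
P_i = \beta_0 + \sum_{k=1}^{K} \beta_k z_{ik} + \varepsilon_i,
\qquad
\beta_k = \frac{\partial P_i}{\partial z_{ik}}
```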

To address the problem of generalizability, the generalized additive model (GAM) sacrifices some comprehensibility in favor of better accuracy. Detailed by Mason and Quigley (1996), this model generates highly accurate estimates of price, though interpretability does suffer somewhat. The approach’s particular strength lies in its ability to generate smooth functions from discontinuous datasets. Rather than depending on a nearest-neighbors method, a GAM directly estimates a smooth function (and hence its gradient) for each variable, along with a confidence interval.
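The GAM keeps the additive structure of the hedonic model but replaces each linear term $\beta_k z_{ik}$ with a smooth function $f_k$ estimated from the data, typically a penalized spline. The sketch below shows the Gaussian, identity-link case, which is the setting of pygam's `LinearGAM`, using the same symbols as the linear hedonic form. Each $f_k$, plotted with its confidence band, is the partial-effect curve for attribute $k$.

```latex
P_i = \beta_0 + \sum_{k=1}^{K} f_k(z_{ik}) + \varepsilon_i,
\qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```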

Dataset Preparation

The data-gathering step was carried out using the Python requests library. By sending a GET request to the Craigslist API and parsing the JSON response, we generate a basic dataset containing price, neighborhood and, most importantly, a URL for the full listing. The next step iterates over the listing URLs, retrieves each full listing, and saves it to a local database. The attributes gathered are:
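The two-pass collection described above can be sketched as follows. The first pass parses a search-results payload into price, neighborhood, and URL. The second pass stores the rows locally. This is a sketch under stated assumptions, not the paper's scraper. The JSON field names (`items`, `price`, `neighborhood`, `url`), the table layout, and the helper names `parse_search_results`, `save_listings` and `fetch_json` are all hypothetical, and no real endpoint URL is given. SQLite stands in for "a local database", with the listing URL as the key so that repeated scrapes do not duplicate rows.

```python
import json
import sqlite3


# Assumption: "items", "price", "neighborhood" and "url" are placeholder field
# names; adapt them to whatever the endpoint actually returns.
def parse_search_results(payload: dict) -> list[tuple]:
    """Extract (price, neighborhood, url) rows from one page of search results."""
    rows = []
    for item in payload.get("items", []):
        if "url" in item:  # the listing URL is required for the second pass
            rows.append((item.get("price"), item.get("neighborhood"), item["url"]))
    return rows


def save_listings(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Store rows locally, keyed on URL so repeated scrapes do not duplicate listings."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(price INTEGER, neighborhood TEXT, url TEXT PRIMARY KEY)"
    )
    conn.executemany("INSERT OR IGNORE INTO listings VALUES (?, ?, ?)", rows)
    conn.commit()


def fetch_json(url: str) -> dict:
    """GET a URL with requests and decode its JSON body (endpoint not specified here)."""
    import requests  # third-party; imported lazily so the helpers above run without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Hypothetical payload standing in for a real API response (note the duplicate).
    sample = json.loads(
        '{"items": ['
        '{"price": 1850, "neighborhood": "Logan Square", "url": "https://example.org/1"},'
        '{"price": 2400, "neighborhood": "Lincoln Park", "url": "https://example.org/2"},'
        '{"price": 2400, "neighborhood": "Lincoln Park", "url": "https://example.org/2"}]}'
    )
    conn = sqlite3.connect(":memory:")
    save_listings(conn, parse_search_results(sample))
    print(conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0])  # 2
```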

‘price’, ‘beds’, ‘sqft’, ‘parking’, ‘baths’, ‘descript’, ‘adress’, ‘lat’, ‘lon’, ‘date’, ‘cats are OK – purrr’, ‘dogs are OK – wooof’, ‘air conditioning’, ‘furnished’, ‘w/d in unit’, ‘laundry on site’, ‘laundry in bldg’, ‘no laundry on site’, ‘no parking’, ‘street parking’, ‘off-street parking’, ‘detached garage’.

The variable “descript” is unique because it contains the entire free-text field provided by the listing entity. From this attribute, the fields “hardwood” and “stainless” were captured by matching those substrings against the descriptions.
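Turning each listing into model-ready columns might look like the sketch below. Amenity labels become 0/1 indicators, and the “hardwood” and “stainless” fields come from a case-insensitive substring test on the description. The helper `encode_listing` is hypothetical, and it assumes each listing's amenity labels arrive as a set of strings. Plain substring matching is deliberately crude: “stainless” will also fire on phrases such as “stainless-steel sink”.

```python
# Amenity labels copied from the attribute list above; the exact dash
# character used on the live site may differ, so normalise before matching.
AMENITY_TAGS = [
    "cats are OK – purrr", "dogs are OK – wooof", "air conditioning", "furnished",
    "w/d in unit", "laundry on site", "laundry in bldg", "no laundry on site",
    "no parking", "street parking", "off-street parking", "detached garage",
]
KEYWORDS = ["hardwood", "stainless"]


def encode_listing(tags: set[str], descript: str) -> dict[str, int]:
    """Turn a listing's amenity tags and free-text description into 0/1 features."""
    features = {tag: int(tag in tags) for tag in AMENITY_TAGS}
    text = descript.lower()  # case-insensitive substring match
    for word in KEYWORDS:
        features[word] = int(word in text)
    return features


row = encode_listing({"air conditioning", "w/d in unit"}, "Renovated unit with Hardwood floors")
print(row["air conditioning"], row["w/d in unit"], row["hardwood"], row["stainless"])  # 1 1 1 0
```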

By dividing price by reported square footage (where available), correlations between price per square foot and each attribute could be calculated. The strongest correlations were with in-unit washer/dryer, air conditioning, and proximity to Chicago’s downtown. Landlords reporting no laundry on site ask, on average, a stunning $1,100 per month less than those reporting in-unit laundry. Other indicators of renovation, namely the ‘stainless’ keyword and listed air conditioning, are also less common in laundry-absent properties. This supports the hypothesis that these attributes can largely be considered together as positive indicators of ongoing investment.

Price vs. square footage for collected properties

Linear model of collected attributes using scikit-learn's linear_model module. Predicted price on the x-axis, actual price on the y-axis.
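The linear model behind this figure is ordinary least squares. The sketch below uses NumPy's `lstsq` in place of scikit-learn's `LinearRegression`; both fit the same OLS model. Because the scraped dataset is not reproduced here, it runs on synthetic listings generated from assumed coefficients. R² summarizes how tightly the predicted-vs-actual scatter hugs the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the scraped data (assumed coefficients, not estimates).
sqft = rng.uniform(500, 1500, n)
ac = rng.integers(0, 2, n)
wd_in_unit = rng.integers(0, 2, n)
price = 400 + 1.5 * sqft + 150 * ac + 250 * wd_in_unit + rng.normal(0, 100, n)

# Design matrix with an intercept column; ordinary least squares via lstsq.
X = np.column_stack([np.ones(n), sqft, ac, wd_in_unit])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
predicted = X @ coef

# Coefficient of determination for predicted vs. actual price.
r2 = 1 - np.sum((price - predicted) ** 2) / np.sum((price - price.mean()) ** 2)
print("coefficients:", np.round(coef, 1))
print(f"R^2 = {r2:.3f}")
```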

Still unclear, though, is the actual hedonic value of adding any specific investment. To determine these marginal values in more detail, a linear generalized additive model from the pygam library was trained on the dataset. Plotting the resulting curves shows the effect of each variable in isolation, along with its 95% confidence interval.
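pygam fits penalized splines and reports these intervals directly. As a simplified stand-in that needs only NumPy, the sketch below treats each amenity as a 0/1 indicator. For a binary indicator, a GAM's partial effect takes only two values, a single step. The sketch then computes classical OLS 95% confidence intervals for those steps. This swaps ordinary least squares in for the paper's pygam model, and the data and "true" premiums are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Synthetic listings with assumed monthly premiums (illustrative only).
sqft = rng.uniform(500, 1500, n)
ac = rng.integers(0, 2, n)
wd_in_unit = rng.integers(0, 2, n)
price = 400 + 1.5 * sqft + 150 * ac + 250 * wd_in_unit + rng.normal(0, 120, n)

X = np.column_stack([np.ones(n), sqft, ac, wd_in_unit])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Classical OLS standard errors: sqrt(diag(sigma^2 (X'X)^-1)).
resid = price - X @ coef
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Normal-approximation 95% interval: 1.96 standard errors either side.
names = ["intercept", "sqft", "air conditioning", "w/d in unit"]
for name, b, s in zip(names, coef, se):
    print(f"{name:>16}: {b:8.2f}  95% CI [{b - 1.96 * s:8.2f}, {b + 1.96 * s:8.2f}]")
```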

From this point, it becomes possible to provide guidance that maximizes return on investment. The addition of air conditioning, for example, is valued at $75 to $200 per month. In-unit laundry provides a benefit of $100 to over $300 per month. Updated kitchens (not shown in the figure above) add a further $125 in value. As an example problem, assume a landlord would like to determine the additional value that adding air conditioning, in-unit laundry, and an updated kitchen would contribute to a rental unit. At the time of writing, an 8% APR home-equity loan is typical for well-qualified borrowers. The EPA recommends 20,000 BTUs for the typical 1,000-square-foot Chicago apartment. Prices for ductless split units vary widely, ranging from as little as $3,000 to upwards of $20,000. A washer/dryer set typically costs $700 to $1,800, and a stainless kitchen set (refrigerator, stove, dishwasher, and microwave) ranges from $2,300 to $11,000. This gives a total budget range of $6,000 to $32,800. The possible upside is in the range of $225 to $700 per month, or $2,700 to $8,400 per year.
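The budget and upside ranges above are sums of the quoted low and high figures. The short check below reproduces them using only numbers stated in this paragraph.

```python
# Cost ranges quoted in the text (USD).
costs = {
    "ductless split A/C": (3_000, 20_000),
    "washer/dryer set": (700, 1_800),
    "stainless kitchen set": (2_300, 11_000),
}
low = sum(lo for lo, _ in costs.values())
high = sum(hi for _, hi in costs.values())
print(f"budget: ${low:,} to ${high:,}")  # budget: $6,000 to $32,800

# Monthly upside range quoted in the text, annualised.
monthly_upside = (225, 700)
annual_upside = tuple(12 * m for m in monthly_upside)
print(f"upside: ${annual_upside[0]:,} to ${annual_upside[1]:,} per year")  # upside: $2,700 to $8,400 per year
```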

Using the confidence intervals provided by the GAM, we can simply add the per-upgrade grids to generate predicted values for the expected change in rental income. The 95% confidence interval for this return ranges from $280 to $887 per month, with a mean of $498 per month, or $5,976 annually and $59,760 over the ten-year lifetime of the loan.
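To compare this return with the cost of financing, the sketch below computes the level monthly payment on a fully amortizing loan at the 8% APR quoted earlier. It uses the standard annuity formula and covers both ends of the $6,000 to $32,800 budget. Two assumptions are made here: the whole upgrade cost is borrowed, and the term is ten years, as implied by $59,760 / $5,976 per year. The helper name `monthly_payment` is illustrative.

```python
def monthly_payment(principal: float, apr: float, years: int) -> float:
    """Level payment on a fully amortising loan with monthly compounding."""
    r = apr / 12      # monthly interest rate
    n = years * 12    # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


# Assumptions: full budget financed at 8% APR over a ten-year term.
for principal in (6_000, 32_800):
    pay = monthly_payment(principal, 0.08, 10)
    print(f"${principal:,} borrowed -> ${pay:,.2f}/month vs. mean uplift of $498/month")
```

Under these assumptions, even the top of the budget carries a payment below the $498 mean monthly uplift.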

The key drawback of this study is that, while it performs well at predicting the mean asking price of a property, it fails to take time on market into account. The monthly nature of most rental agreements implies that there is an optimal time on market for listings seeking to maximize revenue. Listing a property at too low a price may secure a tenant quickly but leave money on the table. Listing at a higher price corresponds to a longer time on market, yet in either scenario the move-in date typically remains the same. Landlords should therefore ask as much as they can without excessively risking the loss of a month’s rent. To investigate this further, 90 days or more of rental listing data could be used to track the rate of posting renewal (Craigslist posts are renewed weekly) and to learn from its correlation with this model’s predictions.

The model could also be enriched further. With advances in computer vision, it may be possible to train a neural network to perform regression on listing images and feed its estimates into an interpretable model such as a GAM.

References

Giacoletti, M., and Parsons, C. A. 2022. Peak-bust rental spreads. Journal of Financial Economics.

Goodman, L., and Mayer, C. 2018. Homeownership and the American Dream. Journal of Economic Perspectives.

Lucas, R. E., and Rossi-Hansberg, E. 2002. On the internal structure of cities. Econometrica.

Mason, C., and Quigley, J. M. 1996. Non-parametric hedonic housing prices. Housing Studies.

Osterbring, M., Mata, E., Thuvander, L., and Wallbaum, H. 2019. Explorative life-cycle assessment of renovating existing urban housing-stocks. Building and Environment.

Rosen, S. 1974. Hedonic prices and implicit markets: Product differentiation in pure competition. Journal of Political Economy.