Main


Last updated: 2022-12-01
Time frame: 2019-12-02 to 2022-12-01

Overview


This site is a very beta rendering of a model that relies on recent "New Building" (NB) filings with the NYC DOB and other publicly available data (via NYC OpenData). The model and site are the work of Josh Whitford and Kristen Akey, with funding support from ISERP and some guidance from conversations with personnel at the NYC DOB. The underlying code is in a private repository, but we are making the model output publicly available and running the model on the first of each month.

 

The model output predicts the eventual size of the built lot (which is often larger than indicated in initial NB filings), identifies adjacent lots, and provides some information about the buildings on those lots. The model output is displayed visually in the map, where you can either search by borough, block, and lot numbers, or zoom in. The initial NB filing is brown; a lot our model predicts will be in the eventual lot is orange; a lot that is yellow is adjacent to our inferred/predicted lot. The numbers indicate the number of filings in an area (shown in blue on mouseover); clicking will zoom in and divide the area into subareas, which follow the same logic (numbers, with blue areas on mouseover). Note that area boundaries are arbitrary/algorithmic.

 

The model's output can also be downloaded below as either a CSV or an XLSX file (simply click where it says "CSV" or where it says "Excel").

 

A more detailed write-up of the underlying data, logic, and so on can be found above, by clicking either "here," or on "More Info" above (they take you to the same place).

 

Datatable


New York City New Buildings Map

More Info

Page for More Info

 

More info

This website was produced by Josh Whitford and Kristen Akey. It uses a series of publicly available datasets and produces a dataframe (as CSV or Excel) and a shapefile (rendered as a map). The underlying logic of the model is rooted in the observation that in NYC, new building filings can occur years before previous buildings on the lot have been vacated, let alone taken down, and do not always represent the "true" lot size of the eventual structure once all approvals have taken place, permits have been granted, and ground has been broken.

 

The site is intended, when finished, to be of use to anybody who is interested in new development and construction in NYC. This can include personnel at the DOB, who may find it useful to have a database that regularly tracks when permits are "pulled," or neighborhood and tenants' groups wondering what is coming down the pike. It is, however, a relatively sparse tool: a means to begin further investigation, which can be done with a variety of resources. These include the Buildings Information System (BIS) website, which gives access to all permit filings; the NYC digital tax map is also a good resource if one is interested in looking at property records.

 

The model takes as its initial input the last three years of New Building (NB) filings with the NYC DOB, but then drops all those that have in fact been permitted and broken ground[1]. We also drop any NB filing on a lot that is an outlier in the city system (e.g. Roosevelt Island, which includes just a few lots with many buildings on each; some industrial areas; etc.).
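
The filtering described above can be sketched roughly as follows. This is only an illustration, not the code in the private repository: the record layout, the field names (job, bbl, job_type, pre_filing_date, job_status), the sample filings, and the outlier BBL are all invented for the example. The one detail taken from this page is that a filing is dropped when its Job Status is "R" (see note [1]).

```python
from datetime import date

# Hypothetical records; field names are illustrative, not the DOB schema.
filings = [
    {"job": "J1", "bbl": "1000010001", "job_type": "NB",
     "pre_filing_date": date(2021, 5, 3), "job_status": "J"},
    {"job": "J2", "bbl": "3012340056", "job_type": "NB",
     "pre_filing_date": date(2020, 1, 15), "job_status": "R"},   # permitted
    {"job": "J3", "bbl": "4000990007", "job_type": "NB",
     "pre_filing_date": date(2018, 7, 1), "job_status": "J"},    # too old
    {"job": "J4", "bbl": "1999990001", "job_type": "NB",
     "pre_filing_date": date(2022, 2, 2), "job_status": "J"},    # outlier lot
    {"job": "J5", "bbl": "2000500012", "job_type": "A1",
     "pre_filing_date": date(2022, 3, 3), "job_status": "J"},    # not an NB
    {"job": "J6", "bbl": "2000500013", "job_type": "NB",
     "pre_filing_date": date(2022, 6, 9), "job_status": "Q"},
]

# Hypothetical list of BBLs treated as outliers in the city system.
OUTLIER_BBLS = {"1999990001"}


def select_filings(records, window_start, window_end, outlier_bbls):
    """Keep NB filings inside the time window that are not fully
    permitted (status "R") and not on an outlier lot."""
    kept = []
    for rec in records:
        if rec["job_type"] != "NB":
            continue
        if not (window_start <= rec["pre_filing_date"] <= window_end):
            continue
        if rec["job_status"] == "R":
            continue
        if rec["bbl"] in outlier_bbls:
            continue
        kept.append(rec)
    return kept


selected = select_filings(
    filings, date(2019, 12, 2), date(2022, 12, 1), OUTLIER_BBLS
)
print([rec["job"] for rec in selected])
```

The window dates used here are the time frame shown on the main page; in a monthly run they would shift forward with each update.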

 

For the remainder, we use data available in the Automated City Register Information System (ACRIS) to ask whether any adjacent lots are jointly owned either by the party that filed the NB application, or the party with development rights to the underlying lot (i.e. the "BBL" or Borough-Block-Lot as defined by the NYC system for parceling real estate). We then define an "inferred lot" as the lot that includes the NB filing and any jointly owned adjacent lots; we also tag all lots that are adjacent to the inferred lot, as buildings on those lots can be at risk in construction [2].

 

These inferred lots are then the core input for the CSV file that can be downloaded on the main page. We move analytically from the lot to the structure—or, in data terms, from BBL to Building Identification Numbers, or BINs, as much of the information that is of potential interest is less about the land underneath than about the structures themselves. This is, also, because land and buildings are regulated differently, and by different entities. 

 

The spreadsheet/dataframe (CSV) that can be downloaded from the main page contains 19 columns, with some overlapping information, in what is perhaps not the most logical ordering (the site is, again, a work in progress). We include them all to allow for various ways of sorting/searching. Each column heading is listed in red boldface below, with some explanation.

 

Job Cluster: The job cluster is defined by the BIN of the NB filing. Its purpose is to provide a stable and unique ID to link rows associated with a particular NB filing (i.e. the NB, any lots our model identifies as part of the true (inferred) lot, and any lots adjacent to that inferred lot).

 

BBL: the full 10 digit Borough-Block-Lot number. The first digit is the borough; the next five are the block number; the final four are the lot number. 

 

Borough: (1) Manhattan; (2) Bronx; (3) Brooklyn; (4) Queens; (5) Staten Island

 

Block: Self-explanatory. See BBL.
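
The way the BBL, Borough, and Block columns fit together can be shown with a small sketch. The function name parse_bbl and the returned dictionary layout are our own choices for illustration; the digit positions and borough codes are the ones given above.

```python
# Borough codes as listed in the Borough column description.
BOROUGHS = {
    1: "Manhattan",
    2: "Bronx",
    3: "Brooklyn",
    4: "Queens",
    5: "Staten Island",
}


def parse_bbl(bbl):
    """Split a 10-digit BBL into borough, block, and lot.

    Digit 1 is the borough, digits 2-6 the block, digits 7-10 the lot.
    """
    text = str(bbl).strip()
    if len(text) != 10 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"BBL must be exactly 10 digits: {bbl!r}")
    borough = int(text[0])
    if borough not in BOROUGHS:
        raise ValueError(f"Unknown borough code in BBL: {bbl!r}")
    return {
        "borough": borough,
        "borough_name": BOROUGHS[borough],
        "block": int(text[1:6]),
        "lot": int(text[6:10]),
    }


# Hypothetical BBL used only as an example.
print(parse_bbl("3012340056"))
```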

 

Job Filing: This is the job filing type from the NYC Open Data DOB Job Application Filings. By design (and filter), this is either NB, to indicate the BBL which was part of the initial input to our model, or "-" for lots that are in the CSV because they are in the inferred lot and/or adjacent to that lot.

 

BBL Description: This column is, in effect, the listing of BBLs prior to the inference of our model. The lots we include in this CSV fall into three categories: (1) "Job Filing," meaning the lot associated with an NB filing; (2) "Adjacent," meaning a lot somewhere adjacent to the NB filing lot; or (3) "Block," meaning the lot is on the same block but is not adjacent to the initial filing.

 

Inferred Lots: This is the model output. Rows in the CSV are either: (1) "NB," meaning they are the lot associated with the initial filing; (2) "Inferred Lot," referencing a lot that is jointly owned by the NB filer and/or the owner of the lot associated with the NB filing (these are either "Adjacent" or "Block" in the BBL Description column); (3) "Adjacent," meaning the lot is adjacent to what our model identifies as the "true" demolition/construction site, where a building is potentially at some risk of damage[3]; or (4) "-" (i.e. blank). The blank value occurs in cases where zoning variances are requested and/or in process, where there is some mismatch in the data between BBLs and the associated BINs; this often occurs in multi-lot filings, with NB permits requested for multiple adjacent lots.

 

Address: This is reasonably self-explanatory. We use the Property Address Directory (PAD) available in NYC OpenData. If multiple addresses are listed for a BBL, we use the first address.

 

NB owner: This is the name associated with the NB filing and/or the name of the owner we have identified using ACRIS[4]. We list owners only for lots where there is an NB filing, or where we have an "inferred lot." We leave the column blank for adjacent buildings.

 

Adjacent Same Owner: This is a categorical variable, coded 1 if the BBL is identified by our model as part of the inferred lot without, however, any NB filing.

 

Multi-Lot Filing: The categories here are the same as in BBL description, but identify "true" lots that are larger than any single NB filing where the owner has filed separate NB applications for multiple adjacent lots. In these cases, we consider them part of the same "Job Cluster" (see first column) but do not refer to them as an "inferred lot" since we identify the cluster differently (i.e. by multiple filings, as opposed to joint ownership without filing). 

 

BIN: We use the Property Address Directory (PAD) to identify all the Buildings Identification Numbers on a lot. If there are multiple BINs, we include them all[5].

 

Pre-filing Date: This column gives the date of the "pre-filing" of the NB permit application. This is the date of the initial NB filing in the NYC DOB Job Application Filings dataset on NYC OpenData. It is associated by the DOB with a "Job#" but is, typically, not the only filing associated with that Job#. There are typically multiple rounds, each associated with different "document #" in the Job Applications database. We include this date in the CSV as it is the date when the DOB was notified of the intent to erect a new structure on the associated lot. 

 

Map Status: This indicates whether the BBL of an NB filing is in the Tax Lot shapefile available to us from NYC OpenData, and therefore whether the NB is on the map on our main page or not. The map is based on the shapefile, so if the lot has not been created by the Department of City Planning (perhaps because the NB filing has not been approved), we cannot show it.

 

DOB Applications: this column includes applications to the DOB associated with any BIN in the CSV, whether in the projected building site or adjacent to it, if filed after the pre-filing date of the associated Job Cluster. We include applications for New Building (NB), Alterations (A1, A2, or A3), and demolition (DM), in the order of "actions" (filings or decisions by the DOB). We also include the "Job Status Code" of each application, separated by a dash. The codes themselves can be found here; more explanation of different permit types here; and, general explanation of these different codes and types here. By way of example, NB-J means "New Building—Plan Exam-Disapproved"; whereas NB-Q means "New Building—Permit Issued-Partial Job"; and so on. 

 

Latest Action Date: We do not list the dates for all the permits filed, to avoid confusion. We include just the date of the most recent action (which will be the last permit-status combination in the previous column). Note that this allows the user to sort the CSV by date to obtain a listing of buildings in or adjacent to an NB site where there have been recent actions. This may be useful for coordination across units interested in the same structures.
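
As an illustration of sorting by this column, here is a minimal sketch using only Python's standard library. The exact header spellings in the downloaded file, the date format (assumed here to be YYYY-MM-DD), and the sample rows are assumptions made for the example; adjust them to match the actual CSV.

```python
import csv
import io
from datetime import datetime

# Invented sample standing in for the downloaded CSV.
SAMPLE_CSV = """BBL,Inferred Lots,Latest Action Date
1000010001,NB,2022-03-14
1000010002,Adjacent,2022-11-02
1000010003,Inferred Lot,
"""


def rows_by_latest_action(csv_text, date_format="%Y-%m-%d"):
    """Return rows that have a Latest Action Date, most recent first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dated = [row for row in rows if row["Latest Action Date"].strip()]
    dated.sort(
        key=lambda row: datetime.strptime(
            row["Latest Action Date"].strip(), date_format
        ),
        reverse=True,
    )
    return dated


for row in rows_by_latest_action(SAMPLE_CSV):
    print(row["BBL"], row["Latest Action Date"])
```

Rows with no recorded action are left out of the sorted listing rather than guessed at.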

 

DHCR Unit: This column identifies buildings in which there is a rent-regulated unit (Y/N) registered with New York's Division of Housing and Community Renewal (DHCR). There is no publicly available listing of which units are covered by New York State's rent regulation rules, but NYC's Rent Guidelines Board has published listings, by borough, of all buildings that had at least one unit registered as of 2018. Those listings, which we use to generate this flag, can be found here.

 

DOB Building Class: This column lists the "DoBBuildingClass" from the "Buildings Subject to HPD Jurisdiction" dataset on NYC OpenData (HPD is the NYC Department of Housing Preservation and Development). We list the Building Class for each BIN in our BIN column (so, if there are two BINs, there will be two Building Class listings, separated by a semi-colon). A listing of the different building classes can be found here, and the rules on who must register their buildings with HPD can be found here. This dataset includes all buildings subject to NYC's "Multiple Dwelling Law," if ever registered. As a result, there are many buildings (BINs) in the dataset that are not registered at present, where the DoBBuildingClass field lists "Not Available." These buildings are in the vast majority of cases 1-2 family homes that are not required to be registered with HPD if they are owner-occupied, but should otherwise be registered. They are often, in fact, still rental properties that have just not been registered with HPD. The building type can be obtained from other datasets, if desired (e.g. from the Department of Finance). So, to avoid confusion, we do not list the DoBBuildingClass field as "Not Available" as in the original dataset (since it is, in fact, available; it is just not in HPD data). We list the class instead as HPD NOT REGISTERED.

 

Construction Risk: This column flags six building classifications that, according to the Chief Engineer for Enforcement at the DOB, are disproportionately at risk if on or adjacent to a construction site. Note that these include buildings that are not registered with HPD, where the risk is rooted, we suspect, in the disproportionate tendency of landlords who do not register their rental properties also to underinvest in building maintenance. The categories we have flagged are: 

 COL – CONVERTED OLD LAW TENEMENT

 HCA – HERETOFORE CONVERTED CLASS A

 HCB – HERETOFORE CONVERTED CLASS B

 HPDNA – HPD NOT REGISTERED

 NL – NEW LAW TENEMENT

 OL – OLD LAW TENEMENT
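
A flag of this kind could be computed along the following lines. This is a sketch under stated assumptions: we assume the DOB Building Class column holds the descriptive class names, separated by semi-colons when a lot has several BINs, and the label "SOME OTHER CLASS" is a made-up placeholder for any class that is not flagged. The six flagged classes are the ones listed above.

```python
# The six flagged classifications, keyed by the abbreviations used above.
RISK_CLASSES = {
    "COL": "CONVERTED OLD LAW TENEMENT",
    "HCA": "HERETOFORE CONVERTED CLASS A",
    "HCB": "HERETOFORE CONVERTED CLASS B",
    "HPDNA": "HPD NOT REGISTERED",
    "NL": "NEW LAW TENEMENT",
    "OL": "OLD LAW TENEMENT",
}
RISK_NAMES = set(RISK_CLASSES.values())


def construction_risk(building_class_field):
    """Return True if any semi-colon separated class is a flagged class.

    Matching is on the whole class name, so "OLD LAW TENEMENT" and
    "CONVERTED OLD LAW TENEMENT" are treated as distinct classes.
    """
    classes = [part.strip().upper() for part in building_class_field.split(";")]
    return any(name in RISK_NAMES for name in classes)


print(construction_risk("SOME OTHER CLASS; HPD NOT REGISTERED"))
print(construction_risk("SOME OTHER CLASS"))
```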

 

 

Datasets:

DOB Job Application Filings 

Property Address Directory (PAD)

ACRIS - Real Property Legals 

ACRIS - Real Property Parties 

ACRIS - Real Property Master 

ACRIS - Document Control Codes 

NYC Digital Tax Map: Tax Block shapefile (DTM_Tax_Block_Polygon.shp); Tax Lot shapefile (DTM_Tax_Lot_Polygon.shp)

DHCR – listings published by NYC Rent Guidelines Board

HPD – Buildings Subject to HPD Jurisdiction

Outputs 

- CSV output, with:

  - BBL_Description - description of the lot (Job Filing, Adjacent, Block)

  - Joint_NB - joint NB status of lot (NB, Adjacent, Lot)

  - Address - address of NB or same-owner matched lot

  - NB_Owner - name of NB owner or name of same-owner matched lot owner

  - Same_Owner - indicator, 1=lot has same owner as NB, 0=otherwise

  - RPP_Owner_Not_Same - for the NB, was the RPP or NB filing name used? 1=RPP name(s) of NB used, 0=NB filing name used

  - doc_id - document ID of same-owner matched lot

  - Additional data, linking BBL to BIN via PAD as necessary, as described above.

- NYC Lot Shapefile, with:

  - BBL_Description - description of the lot (Job Filing, Adjacent, Block)

  - Joint_NB - joint NB status of lot (NB, Adjacent, Lot)

  - Same_Owner - indicator, 1=lot has same owner as NB, 0=otherwise

  - Owner - NB, Same_Owner, Block

 

Model Logic and Development 

Background

The model grew out of conversations between Josh Whitford and personnel at the DOB, who had been manually producing and updating a spreadsheet similar to the CSV that this website generates and updates at regular intervals using computational methods.

 

DOB personnel, working entirely with publicly available data, were keeping track of NB pre-filings, and then going through ACRIS property records to determine whether there was joint ownership with adjacent lots. This process is not entirely straightforward, since ownership may be assigned to different corporate holdings, some development rights may be sold but not others, and so on. But, since it is common prior to an NB filing to consolidate the necessary ownership and development rights under a single personal or corporate holding, it is often possible to infer the likely size of the "true" lot, even if not indicated in the pre-filings[6].

 

In these cases, DOB personnel would anticipate that subsequent filings will likely include an application to the Department of City Planning for a zoning variance to join the lots, and changes to the NB filings to build on the entire lot (if approved). They were also identifying adjacent buildings, noting whether they contained rent-regulated units (a factor that can, if there is new building going on, lead to tenant harassment), and identifying the building class.

 

Their aim was to help the different regulatory units that may have jurisdiction or responsibilities in the complex and sometimes multi-year process of permitting the demolition of older structures and building new ones. The New Building "pre-filing" is the first moment that the prospect of new construction in NYC is registered with government regulatory agencies and—as a consequence—becomes visible also to the public. 

 

Columbia contribution/model/etc 

Josh Whitford of the Columbia University Department of Sociology, with the aid of research assistants (Kristen Akey in particular) and seed funding from ISERP, relied on this manually produced spreadsheet as "training data" to automate this process using a more computational approach (Python and QGIS for the model; R and GitHub Actions to automate the updating of the site itself). We did not use machine learning techniques, but instead developed a series of queries of the relevant datasets, refining the method in a sort of iterated debugging. We did it this way in part because the manual methods had missed some (in our terms) "inferred lots" that our computational approach was able to identify. In those cases, we'd correct the training data, and iterate again. We were able ultimately to achieve levels of both precision and recall above 95%.
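
For readers unfamiliar with the terms, precision is the share of lots the model flags that are correct, and recall is the share of correct lots that the model finds. The toy calculation below uses invented lot labels and is not the evaluation actually run for this project; it only shows how the two numbers are computed when model output is compared against a manually produced list.

```python
def precision_recall(predicted, actual):
    """Compare predicted lots against a reference list.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference lots
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented labels: the model flags four lots, the manual list has four,
# and three of them coincide.
model_lots = {"lot_a", "lot_b", "lot_c", "lot_d"}
manual_lots = {"lot_a", "lot_b", "lot_c", "lot_e"}

precision, recall = precision_recall(model_lots, manual_lots)
print(precision, recall)  # 0.75 0.75
```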

 

We are not willing to share the specifics of the model at this time[7]. We have developed it with the dual intention of providing a tool that can be useful both to the DOB and to the public, and that we can use for purposes of our own research. We can, however, outline the underlying logic. We begin with our listing of NB filings and use QGIS and an up-to-date tax block shapefile to identify all adjacent lots. We then use a fuzzy name matching algorithm, with names from the NB filings and a subset of the documents available on ACRIS, to generate a likelihood that two lots are jointly held by a single party. If they are, we place them in the same "inferred lot," identify all lots that are adjacent to that larger lot, and repeat the process. The complexities lie in the fact that lots change over time, as zoning variances are granted (this changes what is next to what), and in identifying shared ownership for purposes of building, given the many different contractual forms that can take, the use of varying corporate names, and so on.
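
The outline above can be illustrated with a toy sketch. It is not the model itself, which remains private. Adjacency is supplied as a hand-written dictionary rather than derived from the tax shapefile, the lot labels and owner names are invented, and Python's standard-library difflib.SequenceMatcher with an arbitrary 0.85 threshold stands in for the fuzzy name matching and likelihood step.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase a name and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.upper())
    return " ".join(cleaned.split())


def same_owner(name_a, name_b, threshold=0.85):
    """Stand-in for the likelihood step: a plain string-similarity cutoff."""
    ratio = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
    return ratio >= threshold


def infer_lot(nb_lot, adjacency, owners, threshold=0.85):
    """Grow an inferred lot outward from the NB lot.

    Adds any adjacent lot whose owner name matches the NB owner, then
    repeats from the newly added lot. Returns (inferred, adjacent).
    """
    inferred = {nb_lot}
    frontier = [nb_lot]
    while frontier:
        current = frontier.pop()
        for neighbor in adjacency.get(current, ()):
            if neighbor in inferred or neighbor not in owners:
                continue
            if same_owner(owners[nb_lot], owners[neighbor], threshold):
                inferred.add(neighbor)
                frontier.append(neighbor)
    adjacent = set()
    for lot in inferred:
        adjacent |= set(adjacency.get(lot, ()))
    return inferred, adjacent - inferred


# Invented example: five lots in a row-like arrangement.
adjacency = {
    "lot1": {"lot2", "lot3"},
    "lot2": {"lot1", "lot4"},
    "lot3": {"lot1"},
    "lot4": {"lot2", "lot5"},
    "lot5": {"lot4"},
}
owners = {
    "lot1": "123 Main St. LLC",
    "lot2": "123 MAIN ST LLC",
    "lot3": "Jane Doe",
    "lot4": "123 Main Street LLC",
    "lot5": "Acme Holdings Inc",
}

inferred, adjacent = infer_lot("lot1", adjacency, owners)
print(sorted(inferred), sorted(adjacent))
```

In this toy case lot4 is not next to the NB lot, but it joins the inferred lot on the second pass because it touches lot2, which is the "repeat the process" step described above. A fixed similarity cutoff is far cruder than what real ownership records require, for the reasons given in the paragraph above.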

 

 

Future plans

To come…. In the short term, we have some ideas for refining the model and for testing it with historical data (i.e. using data that was available in, say, 2010 and comparing what our model generates to what was permitted and built in subsequent years). 

 

 

 

 

 

Notes

 [1] We drop an NB filing when Job Status = "R – Permit Issued – Entire Job/Work"

 

 [2] If a new structure is permitted to run across more than one lot, there will at some point be a successful application for a zoning variance to merge those lots (i.e. if lots 6 and 7 are merged, 6 will exist in NYC's data for historical purposes, but will be encompassed in 7)

 

 [3] This is particularly the case for some building classes (see DOB building class column) in NYC, where tenement housing may be close together and may depend, at least in part, on adjacent structures for its structural integrity.

 

 [4] Identifying the relevant owner of a building or lot is not always straightforward, as property rights are contracted in multiple ways. See discussion of model development below.

 

 [5] There can be multiple BINs on a BBL for a variety of reasons. Sometimes, this is because there are multiple structures (e.g. a garage), but particularly when there is an NB filing, it can occur because the "old" BIN remains active for some period of time, as it may be associated with violations, liens and so on; but the "new" BIN assigned to the structure to be built has already been associated with the lot. We simply list the BINs that are included in the PAD, which retires the old BIN at some point and, when PAD retires it, we drop it as well. 

 

 [6] There are a variety of reasons to do this, some more legitimate than others. An owner may not yet have obtained agreement from tenants with leases to leave a building they intend to demolish, for instance; but they may also be hiding their intentions from competitors; or they may simply be in planning stages, with ideas that are genuinely in flux. 

 

 [7] With that said, if you want to try to reverse engineer or to try a machine learning approach, feel free. The data is all public, and you can use our output as training data. We would appreciate it if you’d let us know, though, as we'd probably offer to collaborate.