Page for More Info
This website was produced by Josh
Whitford and Kristen Akey. Using a series of publicly available datasets,
it produces a dataframe (as CSV or Excel) and a shapefile
(rendered as a map). The underlying logic of the model
is rooted in the observation that in NYC, new building filings
can occur years before previous buildings on the lot have been vacated,
let alone taken down, and do not always represent the "true" lot size of
the eventual structure once all approvals have been obtained, permits
granted, and ground broken.
The site is intended, when finished,
to be of use to anybody who is interested in new development and
construction in NYC. This can include personnel at the DOB, who may find
it useful to have a database that regularly tracks when permits are
"pulled," or neighborhood and tenant groups wondering what is coming
down the pike. It is, however, a relatively sparse tool, a means to
begin further investigation, which can be done with a variety of
resources. These include the Buildings Information System (BIS)
website, which gives access to all permit filings, and the NYC Digital
Tax Map, which is also a good resource if one is interested in looking
at property records.
The model takes as its initial input
the last three years of New Building (NB) filings
with the NYC DOB, but
then drops all those that have in fact been permitted and broken
ground[1]. We also drop any NB filing on a lot that is an outlier in the
city system (e.g. Roosevelt Island, which includes just a few lots with
many buildings on each; some industrial areas; etc.).
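The filtering step can be sketched as follows. This is a minimal illustration, not the authors' code: the `Filing` record, its field names, and the hand-maintained `excluded_bbls` set of outlier lots are assumptions, and the three-year window is approximated as 3 × 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical record shape; the real DOB Job Application Filings columns differ.
@dataclass
class Filing:
    job: str
    bbl: str
    job_type: str
    job_status: str
    prefiling_date: date

# "R – Permit Issued – Entire Job/Work" (see footnote [1])
FULLY_PERMITTED = "R"

def candidate_filings(filings, today, excluded_bbls, years=3):
    """Keep NB filings from roughly the last `years` years that are not yet
    fully permitted and are not on a hand-excluded (outlier) lot."""
    cutoff = today - timedelta(days=365 * years)  # approximate window
    return [
        f for f in filings
        if f.job_type == "NB"
        and f.prefiling_date >= cutoff
        and f.job_status != FULLY_PERMITTED
        and f.bbl not in excluded_bbls
    ]
```

Keeping the outlier lots in an explicit set makes the exclusions easy to audit and revise by hand.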
For the remainder, we use data available in
the Automated City Register
Information System (ACRIS) to ask whether any adjacent lots are
jointly owned either by the party that filed the NB application, or the
party with development rights to the underlying lot (i.e. the "BBL"
or Borough-Block-Lot as defined by the NYC system for parceling
real estate). We then define an "inferred lot" as the lot that includes
the NB filing and any jointly owned adjacent lots; we also tag all lots
that are adjacent to the inferred lot, since buildings on those lots can be
at risk during construction[2].
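One way to express this definition in code is a breadth-first search outward from the NB lot, which also mirrors the later description of repeating the ownership check as the lot grows. This is a sketch under stated assumptions: the `neighbors` adjacency map (which the site derives from tax block geometry) and the `jointly_owned` predicate (a stand-in for the ACRIS ownership matching) are hypothetical inputs supplied by the caller.

```python
from collections import deque

def infer_lot(nb_bbl, neighbors, jointly_owned):
    """Grow an inferred lot outward from an NB lot.

    neighbors: dict mapping a BBL to the set of BBLs that touch it.
    jointly_owned: predicate (nb_bbl, other_bbl) -> bool.
    Returns (inferred_lot, adjacent_lots).
    """
    inferred = {nb_bbl}
    queue = deque([nb_bbl])
    while queue:
        current = queue.popleft()
        for other in neighbors.get(current, set()):
            if other not in inferred and jointly_owned(nb_bbl, other):
                inferred.add(other)
                queue.append(other)  # its own neighbors are checked in turn
    # Every lot touching the inferred lot but not part of it is "Adjacent".
    ring = set()
    for bbl in inferred:
        ring |= neighbors.get(bbl, set())
    return inferred, ring - inferred
```

On a row of four lots where the first three share an owner, starting from the first lot yields an inferred lot of three lots and a single adjacent lot.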
These inferred lots are then the core
input for the CSV file that can be downloaded on the main page. We move
analytically from the lot to the structure (or, in data terms, from BBLs to Building
Identification Numbers, or BINs), as much of the information that is of
potential interest is less about the land underneath than about the
structures themselves. This is also because land and buildings are
regulated differently, and by different entities.
The spreadsheet/dataframe (CSV) that
can be downloaded from the main page contains 19 columns, with some
overlapping information, in what is perhaps not the most logical
ordering (the site is, again, a work in progress). We include them all
to allow for various ways of sorting and searching. Each column heading is
listed below, with some explanation.
Job Cluster: The job
cluster is defined by the BIN of the NB filing. Its purpose is to
provide a stable and unique ID to link rows associated with a particular
NB filing (i.e. the NB, any lots our model identifies as part of the
true (inferred) lot, and any lots adjacent to that inferred lot).
BBL: The full 10-digit Borough-Block-Lot number. The first digit is the
borough, the next five are the block number, and the final four are the
lot number.
Borough: (1) Manhattan; (2) Bronx; (3) Brooklyn;
(4) Queens; (5) Staten Island
Block: Self-explanatory. See BBL.
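A small helper for splitting a BBL into its parts, assuming BBLs are stored as 10-character strings. The names `parse_bbl`, `format_bbl`, and `BOROUGHS` are illustrative, not part of any published API.

```python
from typing import NamedTuple

BOROUGHS = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}

class BBL(NamedTuple):
    borough: int
    block: int
    lot: int

def parse_bbl(bbl: str) -> BBL:
    """Split a 10-digit BBL: 1 borough digit, 5 block digits, 4 lot digits."""
    if len(bbl) != 10 or not bbl.isdigit():
        raise ValueError(f"expected a 10-digit BBL, got {bbl!r}")
    borough = int(bbl[0])
    if borough not in BOROUGHS:
        raise ValueError(f"unknown borough digit in {bbl!r}")
    return BBL(borough, int(bbl[1:6]), int(bbl[6:]))

def format_bbl(b: BBL) -> str:
    """Rebuild the zero-padded 10-digit string."""
    return f"{b.borough}{b.block:05d}{b.lot:04d}"
```

For example, `parse_bbl("3012340056")` gives borough 3 (Brooklyn), block 1234, lot 56, and `format_bbl` turns it back into the original string. Comparing the borough and block parts is enough to tell whether two lots are on the same block.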
Filing: This is the
job filing type from the NYC Open Data DOB Job Application Filings
dataset. By design (and by filter), this is either "NB," indicating a
BBL that was part of the initial input to our model, or "-" for lots
that are in the CSV because they are in the inferred lot and/or
adjacent to it.
BBL Description: This
column is, in effect, the listing of BBLs prior to the inference of our
model. The lots we include in this CSV fall into three categories:
(1) "Job Filing," meaning the lot associated with an NB filing; (2)
"Adjacent," meaning a lot adjacent to the NB filing lot; or
(3) "Block," meaning a lot on the same block that is not adjacent to
the initial filing.
Lots: This is the
model output. Rows in the CSV are either: (1) "NB," meaning they are the
lot associated with the initial filing; (2) "Inferred Lot," referencing
a lot that is jointly owned by the NB filer and/or the owner of the lot
associated with the NB filing (these are either "Adjacent" or "Block" in
the BBL Description column); (3) "Adjacent," meaning the lot is adjacent
to what our model identifies as the "true" demolition/construction site,
where a building is potentially at some risk of damage[3]; or (4) "-"
(i.e. blank). The last occurs where zoning variances have been requested
and/or are in process and there is some mismatch in the data between
BBLs and the associated BINs; this often happens in multi-lot filings,
with NB permits requested for multiple adjacent lots.
Address: This is reasonably self-explanatory. We
use the Property Address Directory (PAD) available on NYC Open Data. If multiple
addresses are listed for a BBL, we use the first address.
Owner: This is the
name associated with the NB filing and/or the name of the owner we have
identified using ACRIS[4]. We list owners only for lots where there is
an NB filing, or where we have an "inferred lot." We leave the column
blank for adjacent buildings.
Adjacent Same Owner: This is a
categorical variable, coded 1 if our model identifies the BBL as part
of the inferred lot even though the BBL has no NB filing of its own.
Filing: The categories
here are the same as in BBL Description, but identify "true" lots that
are larger than any single NB filing, where the owner has filed separate
NB applications for multiple adjacent lots. In these cases, we consider
them part of the same "Job Cluster" (see first column) but do not refer
to them as an "inferred lot," since we identify the cluster differently
(i.e. by multiple filings, as opposed to joint ownership without an NB
filing).
BIN: We use
the Property Address Directory
(PAD) to identify all
the Building Identification Numbers on a lot. If there are multiple
BINs, we include them all[5].
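The Address and BIN columns can both be derived from a single pass over directory rows. This sketch assumes PAD has already been flattened into `(bbl, bin, address)` tuples in directory order (the actual PAD files use their own layout), and the `"; "` separator for multiple BINs is an assumption; the function name is hypothetical.

```python
def summarize_pad(rows):
    """rows: iterable of (bbl, bin, address) tuples in directory order.

    Returns {bbl: (first_address, [bin, ...])}: the first address seen for
    each BBL, plus every distinct BIN on it, in order of appearance.
    """
    summary = {}
    for bbl, bin_, address in rows:
        # setdefault stores the first address only; later ones are ignored.
        _first_address, bins = summary.setdefault(bbl, (address, []))
        if bin_ not in bins:
            bins.append(bin_)
    return summary
```

A BIN column value can then be produced with `"; ".join(bins)`.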
Date: This column
gives the date of the "pre-filing" of the NB permit application, i.e.
the date of the initial NB filing in the NYC DOB Job Application
Filings dataset on NYC Open Data. The DOB associates it with a "Job#,"
but it is typically not the only filing associated with that Job#: there
are usually multiple rounds, each associated with a different
"document #" in the Job Applications database. We include this date in
the CSV as it is the date when the DOB was notified of the intent to
erect a new structure on the associated lot.
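If the pre-filing date is taken to be the earliest dated document under each Job# (an assumption about how the rounds are recorded), it can be recovered as below. The tuple layout and the function name are hypothetical.

```python
from datetime import date
from typing import Iterable

def prefiling_dates(documents: Iterable[tuple[str, str, date]]) -> dict[str, date]:
    """documents: (job_number, document_number, filing_date) tuples.

    Returns the earliest filing date seen for each Job#, across all rounds.
    """
    earliest: dict[str, date] = {}
    for job, _doc, filed in documents:
        if job not in earliest or filed < earliest[job]:
            earliest[job] = filed
    return earliest
```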
Status: This
distinguishes between NB filings that are on BBLs in the Tax Lot
shapefile available to us from NYC Open Data and those that are not. It
indicates whether an NB is on the map on our main page (the map is based
on the shapefile, so if the lot has not been created by the Department
of City Planning, perhaps because the NB filing has not been approved,
we cannot show it).
Applications: This
column includes applications to the DOB associated with any BIN in the
CSV, whether in the projected building site or adjacent to it, if filed
after the pre-filing date of the associated Job Cluster. We include
applications for New Building (NB), Alterations (A1, A2, or A3), and
Demolition (DM), in the order of "actions" (filings or decisions by the
DOB). We also include the "Job Status Code" of each application,
separated by a dash. The codes themselves can be found here;
more explanation of different permit types here; and a general
explanation of these different codes and types here. By way of example,
NB-J means "New Building – Plan Exam – Disapproved," whereas NB-Q means
"New Building – Permit Issued – Partial Job," and so on.
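The Applications column can be decoded with a pair of lookup tables. Only the status codes mentioned on this page are included (the DOB's full list is longer), and the semicolon separating successive applications within a cell is an assumption, as is the function name.

```python
# Labels for the permit types named on this page.
TYPE_LABELS = {
    "NB": "New Building",
    "A1": "Alteration Type 1",
    "A2": "Alteration Type 2",
    "A3": "Alteration Type 3",
    "DM": "Demolition",
}

# Only the Job Status Codes mentioned on this page.
STATUS_LABELS = {
    "J": "Plan Exam – Disapproved",
    "Q": "Permit Issued – Partial Job",
    "R": "Permit Issued – Entire Job/Work",
}

def parse_applications(cell, sep=";"):
    """Split a cell such as "NB-J; NB-Q" into (type, status, label) tuples,
    preserving the order of actions."""
    parsed = []
    for item in cell.split(sep):
        item = item.strip()
        if not item:
            continue
        job_type, _, status = item.partition("-")
        type_label = TYPE_LABELS.get(job_type, job_type)
        status_label = STATUS_LABELS.get(status, f"status {status}")
        parsed.append((job_type, status, f"{type_label} – {status_label}"))
    return parsed
```

Because the actions are listed in order, the last tuple corresponds to the most recent action.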
Latest Action Date: We do not list
the dates for all the permits filed, to avoid confusion. We include just
the date of the most recent action (which will be the last permit-status
combination in the previous column). Note that this allows the user to sort the
CSV by date to obtain a listing of buildings in or adjacent to an NB
site where there have been recent actions. This may be useful for
coordination across units interested in the same
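Sorting by this column is straightforward once the dates are parsed. The column name is taken from this page, but the `MM/DD/YYYY` format is an assumption about the CSV (adjust `fmt` if the file differs); rows would typically come from `csv.DictReader`.

```python
from datetime import datetime

def has_date(row, field):
    """True if the row has a non-blank value in `field`."""
    return bool((row.get(field) or "").strip())

def sort_by_latest_action(rows, field="Latest Action Date", fmt="%m/%d/%Y"):
    """Return rows (dicts) with the most recent action first; undated rows last."""
    dated = [r for r in rows if has_date(r, field)]
    undated = [r for r in rows if not has_date(r, field)]
    dated.sort(key=lambda r: datetime.strptime(r[field].strip(), fmt), reverse=True)
    return dated + undated
```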
Unit: This column
identifies buildings in which there is a rent-regulated unit (Y/N)
registered with New York's Division of Housing and
Community Renewal (DHCR). There is no publicly available
listing of which units are covered by New York State's rent regulation
rules, but NYC's Rent Guidelines Board has published listings, by
borough, of all buildings that had at least one unit registered as of
2018. Those listings, which we use to generate this flag, can be
found here.
DOB Building Class: This column
lists the "DoBBuildingClass" field from the "Buildings Subject to HPD
Jurisdiction" dataset
on NYC Open Data (HPD is the NYC Department of Housing Preservation and
Development). We list the Building Class for each BIN in our BIN column
(so, if there are two BINs, there will be two Building Class listings,
separated by a semicolon). A listing of the different building classes
can be found here, and the rules on who must register their
buildings with HPD can be found here. This dataset includes all
buildings subject to New York's "Multiple Dwelling Law" that have ever
been registered. As a result, there are many buildings (BINs) in the
dataset that are not registered at present, for which the
DoBBuildingClass field lists "Not available." In the vast majority of
cases, these buildings are 1-2 family homes that are not required to be
registered with HPD if they are owner-occupied, but should otherwise be
registered. They are often, in fact, still rental properties that have
just not been registered with HPD. The building type can be obtained
from other datasets, if desired (e.g. from the Department of Finance).
So, to avoid confusion, we do not list the DoBBuildingClass field as
"Not Available" as in the original dataset (since the class is, in fact,
available; it is just not in HPD data). We list the class instead as HPD
Risk: This column
flags six building classifications that, according to the Chief Engineer
for Enforcement at the DOB, are disproportionately at risk if on or
adjacent to a construction site. Note that these include buildings that
are not registered with HPD, where the risk is rooted, we suspect, in
the disproportionate tendency of landlords who do not register their
rental properties also to underinvest in building maintenance. The
categories we have flagged are:
NL –
OL –
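The semicolon-separated Building Class cell and the Risk flag can be sketched together. Because the six flagged classes are not fully reproduced on this page, the caller passes them in as `risky_classes`, and `missing_label` stands in for the label the site substitutes for "Not available"; both, like the function names, are hypothetical placeholders.

```python
def building_classes(bins, class_by_bin, missing_label):
    """Return one class per BIN, in BIN order.

    BINs absent from the HPD data, or listed there as "Not available",
    get `missing_label` instead.
    """
    classes = []
    for b in bins:
        cls = class_by_bin.get(b, "Not available")
        if cls.strip().lower() == "not available":
            cls = missing_label
        classes.append(cls)
    return classes

def building_class_cell(classes):
    """Join per-BIN classes into a single CSV cell."""
    return "; ".join(classes)

def risk_flag(classes, risky_classes):
    """True if any building on the lot falls in a flagged class."""
    return any(c in risky_classes for c in classes)
```

Passing the flagged set in as a parameter keeps the list of risky classes editable without touching the code.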
- DOB Job Application
- Property Address Directory
- ACRIS - Real Property
- ACRIS - Real Property
- ACRIS - Real Property
- ACRIS - Document Control
- From NYC Digital Tax Map: Tax Block shapefile - DTM_Tax_Block_Polygon.shp; Tax Lot shapefile -
- DHCR – listings published by NYC Rent Guidelines Board
- HPD – Buildings Subject to HPD Jurisdiction
- CSV output, with:
  - BBL_Description - description of the lot (Job Filing, Adjacent, Block)
  - Joint_NB - joint NB status of lot (NB, Adjacent, Lot)
  - Address - address of NB or same-owner matched lot
  - NB_Owner - name of NB owner or name of same-owner matched lot owner
  - Same_Owner - indicator, 1=lot has same owner as NB, 0=otherwise
  - RPP_Owner_Not_Same - for the NB, was RPP or NB filing name used? 1=RPP name(s) of NB used, 0=NB filing name
  - doc_id - document ID of same-owner matched lot
  - Additional data, linking BBL to BIN via PAD as necessary, as described above.
- NYC Lot Shapefile, with:
  - BBL_Description - description of the lot (Job Filing, Adjacent, Block)
  - Joint_NB - joint NB status of lot (NB, Adjacent, Lot)
  - Same_Owner - indicator, 1=lot has same owner as NB, 0=otherwise
  - Owner - NB, Same_Owner, Block
Model Logic and Development
The model grew out of conversations
between Josh Whitford and personnel at the DOB, who had been manually
producing and updating a spreadsheet similar to the CSV that this
website generates and updates at regular intervals using computations
DOB personnel, working entirely with
publicly available data, were keeping track of NB pre-filings, and then
going through ACRIS property records to determine whether there was
joint ownership with adjacent lots. This process is not entirely
straightforward, since ownership may be assigned to different corporate
holdings, some development rights may be sold but not others, and so on.
But, since it is common prior to an NB filing to consolidate the
necessary ownership and development rights under a single personal or
corporate holding, it is often possible to infer the likely size of the
"true" lot, even if not indicated in the pre-filings[6].
In these cases, DOB personnel would
anticipate that the filings would likely come to include an application
to the Department of City Planning for a zoning variance to join the
lots, and changes to the NB filings to build on the entire lot (if
approved). They were also identifying adjacent buildings, noting whether
they contained rent-regulated units (a factor that can, if there is new
building going on, lead to tenant harassment), and identifying the building
Their aim was to help the different
regulatory units that may have jurisdiction or responsibilities in the
complex and sometimes multi-year process of permitting the demolition of
older structures and building new ones. The New Building "pre-filing" is
the first moment that the prospect of new construction in NYC is
registered with government regulatory agencies and, as a
consequence, also becomes visible to the public.
Josh Whitford of the Columbia University Department of
Sociology, with the aid of research assistants (Kristen
Akey in particular) and
seed funding from ISERP, relied on this manually produced spreadsheet as
"training data" to automate this process using a more computational
approach (Python and QGIS for the model; R and GitHub Actions to
automate the updating of the site itself). We did not use machine
learning techniques, but instead developed a series of queries of the
relevant datasets, refining the method in a sort of iterated debugging.
We did it this way in part because the manual methods had missed some
(in our terms) "inferred lots" that our computational approach was able
to identify. In those cases, we'd correct the training data and iterate
again. We were ultimately able to achieve levels of both precision and
recall above 95%.
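For reference, the two measures can be computed by comparing the set of lots the model places in inferred lots with the set in the (corrected) training data. Precision is the share of the model's lots that the training data confirms; recall is the share of training-data lots the model recovers. This sketch treats each lot as a hypothetical `(cluster, bbl)` pair; it is not the authors' evaluation code.

```python
def precision_recall(predicted, actual):
    """predicted, actual: sets of (cluster, bbl) pairs tagged as inferred lots.

    Returns (precision, recall); each is 0.0 when its denominator is empty.
    """
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall
```

With four predicted lots, four true lots, and three in common, both measures are 0.75.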
We are not willing to share the
specifics of the model at this time[7]. We have developed it with the
dual intention of providing a tool that can be useful both to the DOB
and to the public, and that we can use for purposes of our own research.
We can, however, outline the underlying logic. We begin with our
listing of NB filings and use QGIS and an up-to-date tax block
shapefile to identify all adjacent lots. We then apply a fuzzy
name-matching algorithm to names from the NB filings and from a subset
of the documents available on ACRIS to generate a likelihood that two
lots are jointly held by a single party. If they are, we place them in
the same "inferred lot," identify all lots that are adjacent to that
larger lot, and repeat the process. The complexities lie in the fact
that lots change over time as zoning variances are granted (this
changes what is next to what), and in identifying shared ownership for
purposes of building, given the many different contractual forms that
this can take, the use of varying corporate names, and so on.
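To illustrate what fuzzy name matching involves, here is a sketch using Python's standard-library `difflib.SequenceMatcher` as a stand-in. The authors' actual algorithm, thresholds, and choice of ACRIS documents are not public; the suffix list and the 0.9 threshold below are arbitrary assumptions, and the score is a string similarity rather than a calibrated likelihood.

```python
import re
from difflib import SequenceMatcher

# Illustrative corporate suffixes dropped before comparison (assumption).
SUFFIXES = {"LLC", "INC", "CORP", "CORPORATION", "CO", "LP", "LTD", "COMPANY"}

def normalize_name(name):
    """Uppercase, drop periods and punctuation, and strip corporate suffixes."""
    tokens = re.findall(r"[A-Z0-9]+", name.upper().replace(".", ""))
    return " ".join(t for t in tokens if t not in SUFFIXES)

def name_similarity(a, b):
    """Score in [0, 1]; 1.0 means identical after normalization."""
    na, nb = normalize_name(a), normalize_name(b)
    if not na or not nb:
        return 0.0
    return SequenceMatcher(None, na, nb).ratio()

def likely_same_party(names_a, names_b, threshold=0.9):
    """True if any pair of names from the two lots scores above the threshold."""
    best = max((name_similarity(x, y) for x in names_a for y in names_b), default=0.0)
    return best >= threshold
```

Normalization does much of the work here: "123 Main St Owner LLC" and "123 MAIN ST. OWNER, L.L.C." both reduce to the same string and score 1.0.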
To come… In the short term, we have
some ideas for refining the model and for testing it with historical
data (i.e. using data that was available in, say, 2010 and comparing
what our model generates to what was permitted and built in subsequent
years).
[1] We drop an NB filing when Job Status =
"R – Permit Issued – Entire Job/Work"
[2] If a new structure is permitted to run
across more than one lot, there will at some point be a successful
application for a zoning variance to merge those lots (i.e. if lots 6
and 7 are merged, 6 will exist in NYC's data for historical purposes,
but will be encompassed in 7).
[3] This is particularly the case for some building classes (see DOB
Building Class column) in NYC, where tenement housing may be close
together and depend, at least in part, for its structural integrity
on adjacent structures.
[4] Identifying the relevant owner of a building or lot is not always
straightforward, as property rights are contracted in multiple ways. See
discussion of model development below.
[5] There can be multiple BINs on a BBL for a variety of reasons. Sometimes
this is because there are multiple structures (e.g. a garage), but,
particularly when there is an NB filing, it can occur because the "old"
BIN remains active for some period of time, as it may be associated with
violations, liens, and so on, while the "new" BIN assigned to the
structure to be built has already been associated with the lot. We
simply list the BINs that are included in the PAD. PAD retires the old
BIN at some point, and when it does, we drop it as well.
[6] There are a variety of reasons to do this, some more legitimate than
others. An owner may not yet have obtained agreement from tenants with
leases to leave a building they intend to demolish, for instance; but
they may also be hiding their intentions from competitors; or they may
simply be in planning stages, with ideas that are genuinely in
[7] With that said, if you want to try to reverse engineer or to try a
machine learning approach, feel free. The data is all public, and you
can use our output as training data. We would appreciate it if you’d let
us know, though, as we'd probably offer to collaborate.