Reference at Newman Library

NYCdata Presentation Dec 9th

NYCdata (http://www.baruch.cuny.edu/nycdata/) is a public web-based compendium of statistics about the City of New York. Created by the Weissman Center for International Business, this comprehensive resource is organized into categories of statistics (population, business, finance, trade, culture, etc.) and each table references its underlying sources. The data is updated on an ongoing basis.

Eugene Sherman, a Fellow at the Weissman Center, and Project Manager Kenya Williams are going to give a 30-minute presentation to the library on Monday Dec 9th at 10am in room 320a. The website recently went through a redesign, and they would like to illustrate the updates and generally promote the resource. All are welcome to attend.

MapPluto Tax Lot / Parcel GIS Data

Over the summer the NYC Dept of City Planning changed course and started providing its MapPluto product – a GIS dataset with boundaries of every tax parcel and detailed attributes like zoning, land use, land value, building descriptors, and administrative identifiers – for free. Previously the dataset cost $300 for each borough and had very tight restrictions on use. Now anyone can download the latest version from the department’s website.

We had purchased a copy of the 2008 MapPluto data for all five boroughs, and I handled the few requests I received via email. But now that the restrictions are off I’m providing free access to it on the Baruch Geoportal. The data is in shapefile format and can be used in any GIS software (ArcGIS, QGIS, etc.). There is one file for each borough. At this point the 2008 data is useful mainly for historical purposes; users who want the latest data should go directly to the City’s website.

China Data Center GIS Datasets

We have purchased some GIS datasets from the University of Michigan’s China Data Center. The datasets include boundaries for provinces, counties, prefectures, and cities, as well as current and historical census data that can be joined to these boundaries for mapping and analysis. The datasets also include geographic features like roads, railroads, and rivers. In addition to the national series we have some detailed collections of data for the provinces and municipalities of Beijing, Guangdong, Jiangsu, and Shanghai.

The data is in shapefile format and can be used with any GIS software (ArcGIS, QGIS, etc.). In some cases the data tables are also provided in Excel format, for patrons who are interested in working with the statistics without the mapping elements.

A list of the datasets is available on the Baruch Geoportal. Use is limited to current Baruch students, faculty, and staff for educational, non-commercial purposes. Since the data is copyrighted and I don’t have a secure method of distribution, anyone who is interested should contact me (using their Baruch email address) and I can send them the data, or make arrangements to give them copies.

Federal Government Shutdown Cuts Access to Data

Thanks to the ineptitude of our federal government, many public datasets have ceased to be available until further notice. These are just a few that we use pretty heavily, but it’s likely that access to data from many agencies will be affected:

  • The Census – their website Census.gov and the American Factfinder are ENTIRELY UNAVAILABLE (see this notice)
  • The Bureau of Labor Statistics – their website is still up but it (and their datasets) will not be updated
  • The SEC – EDGAR is still up and running, as the SEC funds many of its programs through license and user fees.

For alternatives to the Census you can steer students to the NYC Dept of City Planning for local data and to our databases (Social Explorer, Infoshare, Statista) and the NHGIS for local or national data.

Trial for PolicyMap

We’ve just re-activated our trial to PolicyMap, a US web-mapping database with statistical data of interest to people in business, public policy, and the social sciences. The last trial ran during the final exams period in the spring, and thus didn’t get much attention. The trial is active again from now until Oct 31st and is accessible on campus via our list of trial databases.

Feel free to take a look and ask faculty who may be interested to do the same, and send feedback to Mike.

A description from PolicyMap’s literature:

“PolicyMap provides access to thousands of data indicators that can be analyzed as layers, as well as data points, on interactive maps. PolicyMap’s data indicators are related to demographics, neighborhood conditions, real estate markets, federal program-eligible areas, money and income, lending activity, jobs and economy, education, health, and more. Data can be viewed on a census block or census tract level in many cases, city, county, zip, state, US, as well as by congressional district, school district, state house and state senate districts. PolicyMap’s data collection includes but also extends far beyond US Census data.”

“PolicyMap data can be presented as maps, tables, charts and reports that can be incorporated into papers, presentations, blogs and websites. In addition, students and faculty can upload unlimited amounts of their own address-based data for use in PolicyMap, and can share these maps with others.”

IRS Tax-Exempt Organizations in NYC

The IRS Statistics of Income division publishes a dataset that lists the names and addresses of all tax-exempt organizations in the country that are required to file with the IRS: the Exempt Organizations Business Master File Extract.

Since we get many questions about non-profits in the city, and since this resource is raw data that isn’t readily usable for all purposes (selecting just the records in NYC is actually quite a chore), I’ve created a subset that contains just the records for the five boroughs. The data is available in a spreadsheet file that contains: one metadata sheet, one sheet that lists all of the organizations, and one that summarizes them by borough and exempt organization subcode. The large majority of records in the file are classified as 501(c)(3) organizations, which include most public charities and private foundations.

I’ve modified the records by adding a ZIP-5 code, a county/borough code, and by cleaning up and standardizing the city name in the address field; otherwise the records are exact duplicates of what appears in the original IRS file. The records represent all tax-exempt organizations that filed a Form 990 with the IRS – they don’t represent all non-profits. Religious organizations, state and federal public institutions, and small charities with annual revenue less than $50,000 are not required to file (but some do anyway). Since the IRS extracted this information directly from forms submitted by filers, records may contain spelling or classification errors and could appear in duplicate. Users will need to bear this in mind, and may have to clean the data further based on their intended purpose.
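For anyone who wants to try a similar extraction on the raw IRS file, here is a minimal Python sketch of the general idea. It is not the process used to build the subset: the column names (NAME, CITY, ZIP), the ZIP-prefix lookup, and the city clean-up rule (collapse spaces, upper-case) are all assumptions made for illustration, and the prefix lookup is only a rough approximation of the borough boundaries.

```python
import csv
import io

# Assumed column names; check them against the header of the IRS extract.
ZIP_FIELD = "ZIP"
CITY_FIELD = "CITY"

# Rough lookup from 3-digit ZIP prefix to borough. This is an approximation:
# prefix 110 is shared between Queens and Nassau County, so it is left out.
PREFIX_TO_BOROUGH = {
    "100": "Manhattan", "101": "Manhattan", "102": "Manhattan",
    "103": "Staten Island",
    "104": "Bronx",
    "112": "Brooklyn",
    "111": "Queens", "113": "Queens", "114": "Queens", "116": "Queens",
}


def nyc_subset(rows):
    """Yield copies of the rows that fall in the five boroughs,
    with ZIP5 and BOROUGH fields added and the city name tidied."""
    for row in rows:
        zip5 = row[ZIP_FIELD].strip()[:5]          # drop any ZIP+4 suffix
        borough = PREFIX_TO_BOROUGH.get(zip5[:3])
        if borough is None:
            continue                               # not an NYC prefix
        cleaned = dict(row)
        cleaned["ZIP5"] = zip5
        cleaned["BOROUGH"] = borough
        # Hypothetical clean-up rule: collapse whitespace and upper-case.
        cleaned[CITY_FIELD] = " ".join(row[CITY_FIELD].split()).upper()
        yield cleaned


# Two made-up records standing in for the real file.
sample = io.StringIO(
    "NAME,CITY,STATE,ZIP\n"
    "EXAMPLE ARTS FUND,new  york,NY,10010-1234\n"
    "EXAMPLE UPSTATE TRUST,Albany,NY,12207-0000\n"
)
for record in nyc_subset(csv.DictReader(sample)):
    print(record["NAME"], record["CITY"], record["ZIP5"], record["BOROUGH"])
```

A complete ZIP-to-borough table would be more reliable than prefixes, and duplicate or misspelled records would still need to be handled separately.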

The file is available via the NYC Data LibGuide, on the Health and Human Services tab in a dedicated box I’ve created called IRS Tax-Exempt Organizations in NYC. If you have a LibGuide for a course or another subject feel free to *link* to this box in your guide if you find it relevant. I’ve created a process that enables me to easily update the data, which I plan on doing on a quarterly basis (the IRS updates its master file monthly). If you link to the box the contents will be automatically updated as I release new versions (whereas if you create a copy, the file will become stale as the copy is severed from the original box).

For more information and full documentation about the dataset you can visit the IRS Exempt Organization Master File page (I also provide a link to it from the LibGuide box).

NYC Geodatabase

The NYC Geodatabase (nyc_gdb) is a new resource I’ve created, designed for mapping and analyzing city-level features and data in GIS. The database comes in two formats: a Spatialite geodatabase built on SQLite that can be used in open source software like QGIS, and a personal geodatabase built on MS Access that can be used in ArcGIS.

The contents of the databases are identical and include geographic features and statistical areas from the US Census Bureau, transit and public facility point features from the City, and neighborhood-level census data. All features share a common coordinate system.

The databases will be updated twice a year, each winter and summer. While primarily designed for use with GIS and spatial database software, they can also be used to a limited extent with relational database software like MS Access and the SQLite Manager. I plan on creating a workshop around this resource in the near future.
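Because the Spatialite version is an ordinary SQLite file, its attribute tables can also be inspected from a script. The sketch below uses Python's built-in sqlite3 module to list the tables and views in a database. It runs against a small in-memory stand-in, and the table name in the demo is made up; the actual table names are described in the database documentation. Geometry columns and spatial functions still require the SpatiaLite extension or GIS software.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables and views in an open SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database with a made-up table.
# For the real file, connect to the downloaded database path instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE example_census_data (geoid TEXT, population INTEGER)")
print(list_tables(demo))
demo.close()
```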

The databases and associated documentation (including a tutorial and detailed metadata) are available through the Baruch Geoportal at http://www.baruch.cuny.edu/geoportal/nyc_gdb/. It is a public resource, licensed under Creative Commons, that anyone can access and download.

Data for Computer and Internet Use at Home

The Census Bureau’s Current Population Survey (CPS) is a monthly survey of 50k US households. Each year they include special topics alongside the basic questions that they usually ask; in 2010 they included questions on computer and internet use at home. The tables include household and individual characteristics by school enrollment, age, race, sex and Hispanic origin at the national and state level. You can view the announcement and access the tables here in Excel and CSV format.

Pew Internet Research Data

The Pew Research Center has started providing raw survey data that they collect from their Internet and American Life Project on their new Data Sets page. You can search by date or topic for surveys they’ve done back to 2003. Recent surveys include: The Social Side of the Internet, Purchasing Content Online, Health Tracking, and Cell Phone Use.

This is primarily raw data – people who want to do number crunching can download the data in SPSS or CSV format. This is survey data – variables are provided for weighting the data for use in analysis and for estimating total populations. Summary tables and the survey questionnaire are available in Word format.
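To give a sense of what weighting means in practice, here is a small Python sketch that computes a weighted share of respondents giving a particular answer. The field names (uses_cell, weight) and the three records are invented for illustration; the real variable names and the recommended weight variable are listed in each survey's documentation.

```python
def weighted_share(rows, answer_field, answer_value, weight_field):
    """Return the weighted proportion of rows whose answer matches answer_value."""
    total = 0.0
    matching = 0.0
    for row in rows:
        weight = float(row[weight_field])
        total += weight
        if row[answer_field] == answer_value:
            matching += weight
    if total == 0:
        raise ValueError("weights sum to zero")
    return matching / total


# Made-up responses; values are strings, as they would be when read from a CSV.
responses = [
    {"uses_cell": "yes", "weight": "1.5"},
    {"uses_cell": "no", "weight": "0.5"},
    {"uses_cell": "yes", "weight": "2.0"},
]
print(weighted_share(responses, "uses_cell", "yes", "weight"))  # 0.875
```

Unweighted, two of the three respondents answered yes (about 67%); with the weights applied the estimate is 87.5%, which is why the weight variables matter when estimating population totals.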