About Me

Geographer, graduated from Unicamp (2014), including a year of university exchange at the University of Wisconsin (USA). Experienced in geotechnologies, GIS and urban planning, having interned at Agemcamp, the American Red Cross and, currently, the Grupo de Apoio ao Plano Diretor da Unicamp.

Wednesday, February 6, 2013

Suitable Locations for a New School in NYC

1. Introduction 

The goal of this project is to find where a new school should be located in the urban area of New York City. To do so, it analyzes spatial differences in the distribution of schools, which can reflect how education is offered across different areas. This is the basic step toward a bigger goal: adding schools in areas that lack them, judged by the number of existing schools. The result should suggest potential initiatives for the city's planning department, aiming to increase the level of education offered. It should be especially useful for organizations dealing with urban management, since the area of interest is not a political boundary; instead, it follows the urban dynamics of the New York Metropolitan Area, which is used for this project. 


2. Data Sources 

For this project, four feature classes were obtained from the Environmental Systems Research Institute (ESRI) server geodatabase: urban areas, census tracts, parks and schools. The urban areas feature was used to select the New York urban region. The census tracts layer was used to obtain detailed demographic information within that region. The parks and schools layers were used to apply the defined criteria. 

There were concerns about the data quality, mainly because most of the data is out of date. The metadata was analyzed, and some of the datasets lack a clear description of when they were last updated. The tracts data were obtained 12 years ago, which is a problem because the age field is the main focus of the analysis; since then, the proportion of the population aged 5 to 17 may have changed considerably. The parks data is also old (obtained in 1997), but this is less of a concern because parks don't change often. The same applies to the schools feature: although it changes more often than parks, it doesn't change as much as the population data. The major problem with the schools feature, however, is that there is no clear description of when it was obtained or which categories of schools are included. 

There are no concerns about the scale of the dataset. Its minimum scale is around 1:50,000 to 1:500,000, and the results of the project are presented and analyzed at a much smaller scale, around 1:1,000,000, which maintains an appropriate level of quality. 

3. Methods 

First, the general data obtained from ESRI, which covers the whole country, is prepared to meet the specifications of the project. Using the “Select by Attributes” tool (a simple query), the New York area is selected from the urban areas feature and then exported as a feature class. Since the data covers a larger area than necessary, this new feature is used to clip all the other layers, reducing the processing load. The data comes in a geographic coordinate system (GCS) based on the WGS 1984 datum, which introduces a high level of distortion, so the dataset needs to be re-projected. Because the area of interest does not follow a political boundary, neither the State Plane Coordinate System nor a state coordinate system was appropriate. On the other hand, the entire area falls inside UTM zone 18N, which allows the use of the Universal Transverse Mercator coordinate system with the North American 1983 datum (Figure 1). 
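As a quick check on the choice of UTM zone, the zone number can be computed directly from longitude. The sketch below is only an illustration: the three longitudes are hypothetical values standing in for the western, central and eastern parts of the study area, not coordinates taken from the project data, and the formula ignores the special zone exceptions around Norway and Svalbard.

```python
import math

def utm_zone(longitude_deg: float) -> int:
    """Return the UTM zone number (1-60) for a longitude in decimal degrees."""
    # Zones are 6 degrees wide and are numbered eastward from 180 degrees West.
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude +180 is assigned to zone 60

# Hypothetical longitudes (assumed for illustration, not from the dataset).
sample_longitudes = [-74.9, -74.0, -72.6]
zones = {utm_zone(lon) for lon in sample_longitudes}
print(zones)
```

If every sampled longitude maps to the same zone, a single UTM projection covers the whole area without crossing a zone boundary.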

Figure 1 – Data Preparation



Next, the criteria for finding suitable locations are applied. The main goal is to find locations where a school is necessary, rather than only where one would be acceptable. The first step is therefore to find how many schools each tract has. A spatial join that summarizes the points falling inside each polygon creates a field in the tracts feature containing the count of schools within each tract. 
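The counting step can be pictured outside of the GIS software with a small point-in-polygon sketch. This is not the tool used in the project; it is a plain Python illustration with made-up square "tracts" and school points, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_schools_per_tract(tracts, schools):
    """Return {tract_id: number of school points inside that tract}."""
    counts = {tract_id: 0 for tract_id in tracts}
    for sx, sy in schools:
        for tract_id, polygon in tracts.items():
            if point_in_polygon(sx, sy, polygon):
                counts[tract_id] += 1
                break  # a school is counted in one tract only
    return counts

# Made-up tracts (squares) and school locations, for illustration only.
tracts = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "C": [(0, 10), (10, 10), (10, 20), (0, 20)],
}
schools = [(2, 3), (7, 8), (15, 5)]
school_counts = count_schools_per_tract(tracts, schools)
print(school_counts)
```

Tracts that receive no points keep a count of zero, which is what later produces the “NULL” ratios.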

However, this count alone can be deceiving, because each tract has a different school-age population. A Ratio field is therefore created, and the field calculator is used to compute the average number of students per school in each tract. Tracts without any school get “NULL” in this field because division by zero is not valid; these are dealt with later. A map symbolized by this field and classified by standard deviation helps to understand the school distribution (Figure 2). The white areas correspond to the tracts with “NULL” values. 
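The ratio calculation, including the division-by-zero case, can be sketched in a few lines. The tract values are invented for illustration, and `students_per_school` is a hypothetical helper name, not a field calculator expression from the project.

```python
def students_per_school(school_age_pop, school_count):
    """Average school-age residents per school, or None if the tract has no school."""
    if school_count == 0:
        return None  # mirrors the NULL left by the invalid division by zero
    return school_age_pop / school_count

# Invented tracts: (tract id, population aged 5-17, number of schools)
sample_tracts = [("T1", 1200, 2), ("T2", 900, 1), ("T3", 850, 0)]
ratios = {tid: students_per_school(pop, n) for tid, pop, n in sample_tracts}
print(ratios)
```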

Figure 2 – Lack of Schools in New York


A high number of students per school in a region indicates the need for new schools. To find the value above which a tract is considered to lack schools, the standard deviation is used. Tracts more than 0.5 standard deviations above the mean, which corresponds to more than 729 students per school, are considered in need of more schools. In other words, these are locations with too many school-age people and not enough schools. 
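The cutoff can be reproduced as the mean plus half a standard deviation of the ratio values. The sketch below uses invented ratios, so it does not yield the project's value of 729; it also assumes the population standard deviation, which may differ from what the mapping software uses for its classification.

```python
import statistics

def lack_threshold(ratios, k=0.5):
    """Mean plus k standard deviations, ignoring tracts with no ratio (None)."""
    values = [r for r in ratios if r is not None]
    # Assumption: population standard deviation (pstdev), not the sample one.
    return statistics.mean(values) + k * statistics.pstdev(values)

# Invented students-per-school ratios; None marks a tract without schools.
sample_ratios = [200, 400, 600, 800, 1000, None]
threshold = lack_threshold(sample_ratios)
flagged = [r for r in sample_ratios if r is not None and r > threshold]
print(round(threshold, 2), flagged)
```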

A query can select all the tracts where the ratio is higher than 729, but with this criterion alone the tracts containing “NULL” would not be selected, and they represent more than 1,000 entities. Thus, tracts with no school but with a school-age population higher than 729 are also included in the query (Figure 3). 
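The combined selection rule can be written as a small predicate. The dictionary keys (`ratio`, `school_age_pop`) are hypothetical names chosen for this sketch, and the sample tracts are invented; only the threshold of 729 comes from the analysis above.

```python
THRESHOLD = 729  # students per school, from the standard deviation step

def lacks_schools(tract):
    """True if the tract is considered to lack schools.

    Equivalent in spirit to a query such as:
    ratio > 729 OR (ratio IS NULL AND school_age_pop > 729)
    (field names are assumed for illustration).
    """
    if tract["ratio"] is None:
        # No school in the tract: fall back to the school-age population.
        return tract["school_age_pop"] > THRESHOLD
    return tract["ratio"] > THRESHOLD

# Invented tracts for illustration.
sample = [
    {"id": "T1", "ratio": 600.0, "school_age_pop": 1200},
    {"id": "T2", "ratio": 900.0, "school_age_pop": 900},
    {"id": "T3", "ratio": None, "school_age_pop": 850},
    {"id": "T4", "ratio": None, "school_age_pop": 300},
]
selected = [t["id"] for t in sample if lacks_schools(t)]
print(selected)
```

Without the second branch, T3 would be dropped even though it has a large school-age population and no school at all.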

Figure 3 – Dataflow model to find the suitable area


To account for children's recreation and a walking distance of up to one kilometer, a one-kilometer buffer is created around the parks feature. The intersection between this buffer and the tracts considered to lack schools gives the final result: the suitable areas for a new school in the urban area of New York. 
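The idea behind the buffer-and-intersect step can be approximated with a simple distance test. This is a deliberate simplification, not the polygon overlay used in the project: parks and candidate locations are reduced to single points with invented projected coordinates in meters, and a candidate is kept if it lies within 1,000 m of any park.

```python
import math

WALK_DISTANCE_M = 1000.0  # one-kilometer walking distance

def near_a_park(point, parks, max_dist=WALK_DISTANCE_M):
    """True if the point is within max_dist meters of at least one park point."""
    px, py = point
    return any(math.hypot(px - qx, py - qy) <= max_dist for qx, qy in parks)

# Invented projected coordinates in meters (assumed for illustration only).
parks = [(583000.0, 4507000.0)]
lacking_tracts = {
    "A": (583300.0, 4507400.0),  # 500 m from the park
    "B": (585000.0, 4507000.0),  # 2,000 m from the park
}
suitable = [tid for tid, pt in lacking_tracts.items() if near_a_park(pt, parks)]
print(suitable)
```

A distance test like this only makes sense in a projected coordinate system with metric units, which is one reason the data had to be re-projected first.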

4. Results 

The resulting area is limited mainly by the buffer created from the parks feature, rather than by the lack of schools. It is concentrated in the north and in some portions of Long Island, with little in New Jersey (Figure 4). The Bronx has a concentration of suitable areas even though it also has a high concentration of existing schools, which was noticed earlier in the project in the school point feature class. 

Figure 4 – Suitable locations for a new school in the New York urban area.



5. Conclusion 

A pattern can be noticed in which areas away from the center are in greater need of schools than others, and these also coincide with low-income areas. It is suggested, then, that these areas receive more attention and further analysis to improve their access to education. 

However, the results of this project need to be considered carefully, since there is a lot of room for improvement. An important element to take into account is the transportation dynamics of an extremely urbanized area. This project assumes that the children of each tract only attend schools located in the same tract, which is not true. Mobility in a city like New York is high, so children may attend schools far from their homes. Also, the tracts vary in size, so even if children attend schools close to home, those schools might not be in the same tract. This is even more true in the densest area, Manhattan, where the tracts are smaller. 

Furthermore, updated data would be essential to guarantee the consistency of the results related to population age. More detailed data about the schools, including their type (elementary, middle or high school), could also give more precise results, because each type could be related to the appropriate age group. However, the tracts would then also need more detailed age fields. 

The main challenge for this project is to determine what makes an area count as served or not served by the education system. The capacity of each school is not present in the data used; otherwise, the total school capacity in each tract could be compared with the number of school-age people. Because that wasn't possible, statistics based on standard deviation were used, but they give only a general picture that is not very specific. In a more complex project, it would be interesting to have data on the quality of the schools, perhaps quantified by SAT results or a similar evaluation. Then a deeper analysis of the distribution of school quality could be made, suggesting not only where a school is needed, but also where the existing schools need improvement.
