A vast number of datasets exist today, and by a commonly cited estimate about 80% of them have a spatial component. Datasets with a spatial component can be grouped into themes. Geospatial data typically combines geographic coordinates (e.g., latitude and longitude) with information about the features or phenomena at those locations. Understanding geospatial data is essential for performing spatial analysis, as it helps you examine patterns, relationships, and distributions across geographical areas. Geospatial data can be represented in two main forms, vector data and raster data, each serving different purposes and requiring different approaches for analysis.

You can watch this video and read this discussion on where to get the most up-to-date spatial data.

Types of Geospatial Data

1. Vector Data:

Vector data represents geographical features as points, lines, and polygons. These features are defined by their precise coordinates, making vector data ideal for representing discrete geographic objects like roads, rivers, cities, or boundaries. Vector data is highly efficient for modeling complex shapes with detailed attributes (e.g., a city boundary with population data). The most common types of vector data are:

  • Points: Represent specific locations, such as a city or a well.
  • Lines: Represent linear features, such as rivers, roads, or railway tracks.
  • Polygons: Represent areas, such as countries, lakes, or land parcels.
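
The three geometry types can be illustrated with a minimal sketch that uses only the Python standard library. The GeoJSON-like dictionaries, the coordinate values, and the helper names `line_length` and `ring_area` are assumptions made for illustration; real workflows would normally rely on a GIS library, and the measurements here are planar (no map projection is considered).

```python
import math

# Hypothetical features in a GeoJSON-like layout; all coordinates are made up.
point = {"type": "Point", "coordinates": (16.37, 48.21)}
line = {"type": "LineString", "coordinates": [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]}
polygon = {
    "type": "Polygon",
    # One exterior ring; the first and last vertex are identical to close it.
    "coordinates": [[(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]],
}

# Attributes travel with the geometry, e.g. a settlement point with a population.
settlement = {"geometry": point, "properties": {"name": "Example Town", "population": 1200}}


def line_length(coords):
    """Planar length of a line: sum of distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def ring_area(ring):
    """Planar area of a closed ring, computed with the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


print(line_length(line["coordinates"]))      # length of the hypothetical line
print(ring_area(polygon["coordinates"][0]))  # area of the hypothetical rectangle
```

The point carries a single coordinate pair, the line an ordered list of vertices, and the polygon a closed ring, which is why length only makes sense for lines and area only for polygons.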

2. Raster Data:

Raster data represents geographical features using a grid of cells or pixels, where each cell holds a value representing some characteristic (e.g., temperature, elevation, or land cover). Each pixel in the raster corresponds to a specific area of the Earth's surface. Raster data is ideal for continuous data, such as satellite imagery, climate data, or elevation models. Raster datasets are commonly used in environmental studies, remote sensing, and terrain analysis, as they can capture large-scale phenomena over broad regions.
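
The grid-of-cells idea can be sketched with a nested list, again using only the Python standard library. The elevation values, the origin coordinates, the 30-unit cell size, and the helper names `cell_center` and `cell_at` are all assumptions chosen for illustration; the sketch also assumes a north-up grid with square cells.

```python
# Hypothetical 3 x 4 elevation raster (metres); each inner list is one row of cells.
elevation = [
    [12.0, 14.5, 15.0, 13.0],
    [11.0, 13.5, 16.0, 14.0],
    [10.5, 12.0, 14.5, 13.5],
]

# Assumed georeferencing: map coordinates of the top-left corner and the cell size.
origin_x, origin_y = 500000.0, 4200000.0
cell_size = 30.0


def cell_center(row, col):
    """Map coordinates of the centre of cell (row, col); rows count down from the top."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


def cell_at(x, y):
    """Row and column of the cell that contains map coordinate (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


# Every cell holds one value, so summary statistics are plain loops over the grid.
highest = max(value for row in elevation for value in row)

row, col = cell_at(500075.0, 4199955.0)
print(row, col, elevation[row][col], highest)
```

Because position is implied by the row and column index, a raster only needs to store the origin and cell size once, rather than coordinates for every value.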

Common File Formats

Geospatial data is stored in various file formats, depending on the type of data and the software being used. Some of the most common file formats for vector and raster data include:

1. Shapefile:

The Shapefile format (.shp) is one of the most widely used formats for storing vector data. It stores geometric shapes (points, lines, or polygons) and their associated attribute data in separate files, which must be kept together. A Shapefile consists of a set of files with extensions such as .shp (geometry), .shx (shape index), and .dbf (attribute data). Shapefiles are supported by most GIS software, making them highly interoperable. Over the last 5-7 years, however, practitioners in the geospatial domain have advised against using the format. You can follow these discussions to understand why not to use shapefiles.

Learn more about other modern GIS file alternatives.
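
Because a Shapefile is a set of files that must travel together, a quick completeness check before sharing or loading one can be useful. The following is a minimal sketch, not a full validator: the function name `missing_shapefile_parts` and the file name `roads.shp` are hypothetical, and only the three extensions named above are checked.

```python
from pathlib import Path

# Extensions named in the text; real datasets often ship more sidecars (e.g. .prj).
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]


# Hypothetical usage: an empty list means all three parts are present.
missing = missing_shapefile_parts("roads.shp")
if missing:
    print("Incomplete Shapefile, missing:", ", ".join(missing))
```

The check compares file names exactly as written, so on case-sensitive file systems a part named `ROADS.DBF` would be reported as missing.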

2. GeoPackage:

The GeoPackage (.gpkg) is an open, standardized format for storing both vector and raster data. Unlike Shapefiles, which spread geometry, index, and attribute data across multiple files, GeoPackages store all the data in a single SQLite database file. This makes GeoPackage a more compact and efficient format for managing complex datasets. Learn more about GeoPackage here.
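
Since a GeoPackage is an SQLite database, its list of layers can be read with Python's built-in `sqlite3` module. The sketch below queries `gpkg_contents`, the table in which the GeoPackage standard registers each layer. To stay self-contained it builds an in-memory stand-in with a simplified version of that table (the real table has more columns); the layer names `flood_hp` and `elevation` and the function name `list_layers` are hypothetical.

```python
import sqlite3


def list_layers(connection):
    """Return (table_name, data_type) for each layer registered in gpkg_contents."""
    rows = connection.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return rows.fetchall()


# Stand-in for a real file: an in-memory database with a simplified gpkg_contents.
# With an actual GeoPackage you would open it with sqlite3.connect("path/to/file.gpkg").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("flood_hp", "features"), ("elevation", "tiles")],
)

layers = list_layers(conn)
for table_name, data_type in layers:
    print(table_name, data_type)
```

In this listing, `features` marks a vector layer and `tiles` marks tiled raster data, which is how a single file can hold both kinds of data side by side.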

3. GeoTIFF:

The GeoTIFF (.tif) format is a raster data format that stores geospatial information (such as coordinates, projection, and resolution) along with pixel values. This format is commonly used for satellite imagery, elevation models, and other geospatial raster data. Learn more about GeoTIFF here.

Overview of Relevant Data Layers

To apply our desired conceptual framework, it is essential to identify and align data layers with project needs. The datasets used in this workflow include:

1. Flood Hazard Data:

  • High Probability Floods (HP): Used to identify areas frequently at risk. Download instructions and resources: Download Flood HP Data
  • Medium Probability Floods (MP): Areas occasionally impacted by flooding. Download instructions and resources: Download Flood MP Data
  • Low Probability Floods (LP): Rarely impacted but crucial for extreme risk assessments. Download instructions and resources: Download Flood LP Data

2. Population Data:

  • Contains demographic breakdowns (gender, age) for settlements. Learn more and access the data: Population Dataset

3. Ecosystem Services Data:

4. Landscape Characterization:

5. Points of Interest (POI):

  • Includes infrastructure like schools, hospitals, and parks. Download from: POI Dataset

We have now covered the types of geospatial data, their common file formats, and the datasets used for our application. It is critical to select data layers that align with the specific needs of the project. In the next section, we will explore how to use these datasets effectively to implement the impact chain assessment framework.