Step 2: Search for Geospatial Data
Searching for data is not an easy task: despite the efforts of national spatial data infrastructures, there is no single Geospatial Data Discovery Portal or “georepository” where all available geospatial data can be found. Before beginning the search, keep in mind the results of the needs assessment step completed previously (see my previous post), which define:
- what we are searching for, i.e. the semantics of the objects → the What,
- the spatial extent → the Where,
- the temporal extent → the When,
- the intended analyses and uses → the Why, which make it possible to define the relevant What precisely and to determine How the data must be structured,
- the type of data needed (vector, raster, DTM, etc.), the range of scales or the spatial and temporal resolution used in data production, the coordinate system, the data format, the desired quality criteria, … → How the data is,
- the available human and monetary resources → the Means.
- What: A criterion that directs us toward a particular portal when searching for very specific themes such as standing forests or soil types. Generic themes such as roads and hydrography are described by virtually all portals. Most portals use a thesaurus (lists of synonyms and broader/narrower terms), which makes the What search easier.
- Where: A criterion that helps identify which portals to search. For worldwide coverage, for example, the Natural Earth portal offers free geospatial datasets on many themes; it replaces the Digital Chart of the World data server, whose content is out of date. For Canadian coverage, the Canadian Geospatial Data Infrastructure (CGDI) is a gateway to about 700 vector and raster datasets available in various formats. At the Quebec provincial level, the Québec géographique portal lists all the maps, atlases and geospatial data products available from Quebec government departments and other government organisations. Geospatial data can also be purchased from private companies such as NAVTEQ (worldwide coverage) and DMTI (Canadian coverage). Portals can be queried on the Where by place-name keywords, by coordinates, by map number or by drawing a region on a map.
- When: A criterion that is easy to identify for raster datasets, which represent the state of the territory at a specific date, or over a range of dates when several images are involved. The same holds for most large-scale datasets, whose data were acquired over a short period of time. On the other hand, for small-scale datasets covering a large area, such as the National Road Network (NRN on GeoBase), the temporal coverage of the complete dataset may be very wide (currently from 2003 to 2011 for the NRN). In this case, the When depends heavily on the Where, the What and the How: it varies from one place to another according to the data layer and the acquisition technique used (remote sensing imagery, aerial photography, ground survey, etc.). Under these conditions, the When must be analysed in more detail. It is not always possible to search by When in geoportals, so searching for historical data can be laborious.
- How: The How comprises many criteria. Most are easy to identify and are available as search criteria (data type, production scale), whereas others vary in time and space (spatial accuracy, completeness, thematic accuracy) and will have to be analysed in more detail in the next steps to ensure that they meet the requirements.
- Monetary resources: Some portals, such as GeoGratis and OpenStreetMap, promote themselves as free data sources. Others, such as CGDI and Statistics Canada, make it easy to tell free data apart from data you will have to pay for. Géoboutique Québec sells data, as private companies do. Geospatial data can also be found in some provincial government departments and municipalities, but most of them do not publish information about their data and have no portal for discovering it. You have to contact them to find out whether the data can be distributed and, if so, whether there is a cost. The risks of data misuse are still poorly understood, and data producers are cautious about distributing their data because of potential liability.
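The coarse screening described by these criteria can be sketched as a filter over candidate metadata records. This is a minimal Python sketch under stated assumptions: the record fields, the `THESAURUS` dictionary, the `screen` function and the example datasets are all hypothetical and do not reflect the metadata schema or API of any real portal. The Where test assumes bounding boxes in decimal degrees that do not cross the antimeridian, and the When test only checks that the whole dataset's period overlaps the period of interest; as noted for the NRN, the When may still vary within a dataset and needs a more detailed analysis.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thesaurus: each term maps to its synonyms and narrower (child) terms.
THESAURUS = {
    "hydrography": {"rivers", "lakes", "watercourses"},
    "roads": {"road network", "highways", "streets"},
}


@dataclass
class BBox:
    """Extent in decimal degrees (assumed not to cross the antimeridian)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Boxes overlap unless one lies entirely west, east, south or north of the other.
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)


@dataclass
class Needs:
    what: set[str]             # themes from the needs assessment
    where: BBox                # study area
    when: tuple[date, date]    # period of interest
    data_types: set[str]       # one part of the How, e.g. {"vector"}
    budget: float              # the Means


@dataclass
class Candidate:
    title: str
    keywords: set[str]
    extent: BBox
    period: tuple[date, date]  # temporal coverage of the whole dataset
    data_type: str
    price: float


def expand(terms: set[str]) -> set[str]:
    """Add thesaurus synonyms and child terms to the search terms."""
    expanded = set(terms)
    for term in terms:
        expanded |= THESAURUS.get(term, set())
    return expanded


def periods_overlap(a: tuple[date, date], b: tuple[date, date]) -> bool:
    """True if the two closed date intervals share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def screen(needs: Needs, candidates: list[Candidate]) -> list[str]:
    """Return titles of candidates that pass every coarse criterion."""
    terms = expand(needs.what)
    return [
        c.title
        for c in candidates
        if terms & c.keywords                       # What
        and needs.where.intersects(c.extent)        # Where
        and periods_overlap(needs.when, c.period)   # When (whole-dataset period only)
        and c.data_type in needs.data_types         # How (data type only)
        and c.price <= needs.budget                 # Means
    ]


# Hypothetical example: free vector road data for a study area in southern Quebec.
needs = Needs({"roads"}, BBox(-74.0, 45.0, -71.0, 47.0),
              (date(2008, 1, 1), date(2010, 12, 31)), {"vector"}, 0.0)
candidates = [
    Candidate("Dataset A", {"road network"}, BBox(-80.0, 44.0, -70.0, 50.0),
              (date(2003, 1, 1), date(2011, 12, 31)), "vector", 0.0),
    Candidate("Dataset B", {"highways"}, BBox(-123.0, 48.0, -120.0, 50.0),
              (date(2005, 1, 1), date(2009, 12, 31)), "vector", 0.0),
    Candidate("Dataset C", {"streets"}, BBox(-75.0, 45.0, -72.0, 46.0),
              (date(2010, 1, 1), date(2012, 12, 31)), "vector", 500.0),
]
print(screen(needs, candidates))  # prints ['Dataset A']
```

Dataset B is rejected on the Where (its extent lies far to the west) and Dataset C on the Means (it is not free). Criteria that vary in time and space, such as spatial accuracy or completeness, are deliberately left out of this screen because they cannot be checked reliably from catalogue metadata alone.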
Following this search, the selected datasets must be analysed in more detail to ensure that they meet the requirements. For geospatial datasets that must be purchased, it is often possible to obtain a sample. The What will be analysed in more detail in Step 3, in my next GeoPost, How to select the best Geospatial Data in 5 steps? – Content assessment of selected datasets, and the How and the Why in Step 4: Data quality assessment.