Step 3: Content assessment of selected datasets
Once we have selected some datasets that should satisfy our needs, we must evaluate them to determine which ones best meet our requirements. This step is crucial: even though we believe the selected data is the best fit for our needs, the evaluation identifies gaps to be corrected and necessary transformations at the beginning of the project instead of partway through it. This helps to plan budgets and schedules more accurately, to better manage the risk of inappropriate data usage, and even to save significant costs. Never assume that available data will de facto meet our needs, because the results can sometimes be surprising!
The first step of this assessment is to verify whether the content meets the requirements. In this post, we will assess whether the What, represented by the object semantics, the geometry shape (point, line, polygon, multipoint, etc.) and its geometric definition (digitizing specification), satisfies the requirements.
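The three facets of the What can be pictured as a small record attached to each object class. The sketch below is purely illustrative: the class and field names are assumptions for this post, not part of any real metadata standard.

```python
# Hypothetical structure capturing the three facets of the "What";
# class and field names are illustrative, not from a real specification format.
from dataclasses import dataclass

@dataclass
class WhatSpecification:
    semantics: str             # meaning of the object class (e.g. "public road")
    geometry_shape: str        # point, line, polygon, multipoint, ...
    geometric_definition: str  # digitizing specification: what exactly is captured

roads = WhatSpecification(
    semantics="Public and private roads open to vehicles",
    geometry_shape="line",
    geometric_definition="Centreline digitized from 1:20,000 orthophotos",
)
print(roads.geometry_shape)  # -> line
```

Filling such a record for every object class, before the project starts, is exactly the inventory exercise described next.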
To learn about the What, there is nothing better than an inventory of the existing data. The first step of this inventory is to define its scope using the results of the needs assessment analysis discussed in the post Step 1: Needs assessment. The level of detail of the inventory will vary from one object class to another depending on its relevance to the requirements. Evaluating this relevance is not always easy, especially when the needs are not well defined; in some cases, it may be necessary to clarify the needs with the client or to validate the relevance of particular object classes. The inventory then consists of examining each object class (digitized or not), each attribute and each domain value, with the aim of evaluating their relevance against the level of detail required to satisfy the needs. For example, if we only want to display the roads, without using them for spatial analysis, their precise geometric description is not necessary for them to be considered relevant. However, if we want to perform network analysis, we will have to know all the elements used for road segmentation (e.g. road intersections, municipal boundaries, road name changes and road width changes). If no precise specification exists and the roads are segmented inconsistently (e.g. after uncontrolled digitizing), this object class will be considered either not relevant, or relevant but requiring to be cleaned and re-segmented.
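The relevance decision described above boils down to comparing what a use case requires with what the dataset actually documents. The sketch below is a minimal illustration of that rule, using the road example; the function name, use-case labels and segmentation-element names are hypothetical, not part of any real tool.

```python
# Hypothetical sketch of the relevance check described above.
# Names and labels are illustrative assumptions, not a real API.

# Segmentation elements the road example says network analysis depends on.
REQUIRED_FOR_NETWORK_ANALYSIS = {
    "road_intersections", "municipal_boundaries",
    "road_name_changes", "road_width_changes",
}

def assess_relevance(use_case: str, documented_segmentation: set,
                     has_specification: bool) -> str:
    """Classify an object class as relevant, fixable, or not relevant."""
    if use_case == "display_only":
        # For display only, a precise geometric description is unnecessary.
        return "relevant"
    if use_case == "network_analysis":
        missing = REQUIRED_FOR_NETWORK_ANALYSIS - documented_segmentation
        if not missing:
            return "relevant"
        if has_specification:
            # A specification exists: the data can be cleaned and re-segmented.
            return "relevant, but needs cleaning and re-segmentation"
        # No specification and inconsistent segmentation: unusable as-is.
        return "not relevant"
    return "relevance unknown; clarify needs with the client"

# Roads digitized without a controlled specification:
print(assess_relevance("network_analysis", {"road_intersections"}, False))
# -> not relevant
```

The point of encoding the rule is not automation; it is that writing it down forces the requirement ("which segmentation elements do we actually need?") to be made explicit per object class.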
The biggest difficulty in this step is finding the information. A lack of information can lead users to make bad decisions because they are mistaken about what the data represents. The documentation provided with the datasets often does not define the What as well as it should: semantic definitions are too generic, the geometric shape is not well defined, and geometric definitions are often missing, as are the rules for passing from one geometry to another in the case of an alternate-geometry object class (e.g. a building with a ground surface area > 500 m² is represented by a polygon, otherwise by a point). Knowing the data well in relation to the requirements means communicating with the data producers to gather additional information (since the documentation is incomplete) and consulting the data itself in order to extract information such as spatial integrity constraints between object classes (e.g. roads segmented at railway junctions, at municipal boundaries, and according to some attribute values such as pavement, speed limit, number of lanes, etc.). Being extremely familiar with the raw material (the datasets) used in a new system is essential!
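To make concrete the kind of alternate-geometry rule that documentation so often omits, the sketch below encodes the building example from the text (polygon above 500 m² of ground surface, point otherwise). The function name is a hypothetical illustration; only the threshold and the two representations come from the example.

```python
# Minimal sketch of the alternate-geometry rule from the building example;
# the function name is a hypothetical illustration.

AREA_THRESHOLD_M2 = 500  # ground surface area separating the two representations

def building_geometry_kind(ground_area_m2: float) -> str:
    """Return the geometry shape used to represent a building of a given footprint."""
    return "polygon" if ground_area_m2 > AREA_THRESHOLD_M2 else "point"

print(building_geometry_kind(750.0))  # a large building -> polygon
print(building_geometry_kind(120.0))  # a small building -> point
```

When such a rule is undocumented, it can only be recovered the hard way: by inspecting the data and asking the producer, which is exactly why familiarity with the raw datasets matters.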
To complete this inventory, complementary information defining the How and the Why can be added for the object classes retained. This step will be described in the post Step 4: Quality assessment.