Integration of the Cartographic Material - Forschungszentrum Deutscher Sprachatlas

How maps are made available in REDE depends on the form in which their data exist.

If maps are available only in printed form, they are first scanned and then georeferenced. Finally, the data from each map are entered into the REDE database via an input mask (Type 1: integration of the modern regional atlases via scan, georeferencing, and database entry).
If the linguistic maps are available as electronic data records, they are imported into the REDE database via an ETL (extract, transform, load) process. In the process, the encoded character information is converted into phonetic or orthographic characters, and the maps can then be generated directly from the database (Type 2: integration of the modern regional atlases via database import).

Both procedures are described below.

1. Integration of the Modern Regional Atlases via Scan, Georeferencing, and Database Entry

(1) Entering the primary data into the database: Key information is entered into the database for each map so that REDE users can search for maps and data in a targeted manner. This includes the following:

atlas title
volume number
map number
map title.

This information serves as an overview of the maps. Each entry is linked to its map and acts as the interface to it; together, the entries form the basis of the search functions in REDE's user interface.
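The role of these entries as the interface to the linked maps can be sketched as a small lookup structure. This is a minimal sketch, assuming that atlas title, volume number and map number together identify a map uniquely; the class names, the map_file field and the sample atlas are hypothetical and not part of REDE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapKey:
    """Identifies one map by atlas, volume and map number."""
    atlas_title: str
    volume: int
    map_number: str  # kept as text, since map numbers may contain letters


@dataclass(frozen=True)
class MapEntry:
    """Primary data of one map plus the link to its image (hypothetical field)."""
    key: MapKey
    map_title: str
    map_file: str


class MapOverview:
    """Map overview: each primary-data entry is linked to exactly one map."""

    def __init__(self):
        self._entries = {}  # MapKey -> MapEntry, in insertion order

    def add(self, entry):
        if entry.key in self._entries:
            raise ValueError(f"map already registered: {entry.key}")
        self._entries[entry.key] = entry

    def open_map(self, atlas_title, volume, map_number):
        """Use the entry as the interface to its map: look up the linked file."""
        return self._entries[MapKey(atlas_title, volume, map_number)].map_file

    def titles(self, atlas_title):
        """(volume, map number, title) for every registered map of one atlas."""
        return [
            (e.key.volume, e.key.map_number, e.map_title)
            for e in self._entries.values()
            if e.key.atlas_title == atlas_title
        ]


overview = MapOverview()
overview.add(MapEntry(MapKey("Atlas A", 1, "12"), "house", "atlas_a/v1/12.ecw"))
overview.add(MapEntry(MapKey("Atlas A", 2, "7"), "geese", "atlas_a/v2/7.ecw"))
print(overview.open_map("Atlas A", 2, "7"))  # atlas_a/v2/7.ecw
print(overview.titles("Atlas A"))  # [(1, '12', 'house'), (2, '7', 'geese')]
```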

(2) Digitization: All maps are scanned at 600 dpi (partner firm: graphic sience), burned onto DVD, and then checked for quality.
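What a resolution of 600 dpi means in pixels can be estimated with simple arithmetic: pixels per side = side length in inches × 600. The sheet size (A3) and the 24-bit color depth below are assumptions chosen for illustration, not properties of the scanned atlases.

```python
MM_PER_INCH = 25.4


def scan_size_px(width_mm, height_mm, dpi=600):
    """Pixel dimensions of a sheet scanned at `dpi` dots per inch."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def uncompressed_mb(width_px, height_px, bytes_per_pixel=3):
    """Raw image size in megabytes (10**6 bytes); 3 bytes = 24-bit RGB."""
    return width_px * height_px * bytes_per_pixel / 1e6


# Hypothetical example: a map sheet in A3 format (297 x 420 mm).
w, h = scan_size_px(297, 420)
print(w, h)  # 7016 9921
print(round(uncompressed_mb(w, h), 1))  # 208.8
```

Raw sizes of this order are one reason why compressed formats are used for delivery.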

(3) Georeferencing: Next, the maps are processed with the program QGIS so that they can be superimposed and compared with one another regardless of their size, map section, and scale. First, coordinates are attributed to image pixels (geocoding); the geocoded maps are then georeferenced, which yields maps that carry geographic information. After that, the superfluous edges of the maps are "cut off" (polygonization). To allow faster access to the maps at various zoom levels, they are compressed and converted into ECW (Enhanced Compressed Wavelet) files before being made available to users. The same process was previously used in the DiWA project, where further information can be found. The georeferenced maps also form the basis for steps (6) through (8).
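The core of georeferencing, attributing geographic coordinates to image pixels, can be illustrated with ground control points (pixels whose real-world coordinates are known). The sketch below assumes a first-order polynomial (affine) transformation, one of several transformation types offered by the QGIS Georeferencer; the source does not say which type is used, and the control points are invented. It fits the six coefficients by least squares and then converts any pixel. Clipping (polygonization) and ECW conversion are not shown.

```python
def solve3(m, v):
    """Solve the 3x3 system m @ x = v (Gaussian elimination, partial pivoting)."""
    a = [row[:] + [b] for row, b in zip(m, v)]  # augmented matrix (copies m)
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(a[r][i]))  # pivot row
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, 3):
            f = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= f * a[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x


def fit_affine(gcps):
    """Least-squares affine (first-order polynomial) fit from control points.

    gcps: ((col, row), (x, y)) pairs; at least three, not all on one line.
    Returns ((a0, a1, a2), (b0, b1, b2)) with
    x = a0 + a1*col + a2*row and y = b0 + b1*col + b2*row.
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A
    atx = [0.0] * 3                      # A^T x
    aty = [0.0] * 3                      # A^T y
    for (col, row), (x, y) in gcps:
        basis = (1.0, col, row)
        for i in range(3):
            atx[i] += basis[i] * x
            aty[i] += basis[i] * y
            for j in range(3):
                ata[i][j] += basis[i] * basis[j]
    return tuple(solve3(ata, atx)), tuple(solve3(ata, aty))


def pixel_to_geo(coeffs, col, row):
    """Map an image pixel to geographic coordinates with fitted coefficients."""
    (a0, a1, a2), (b0, b1, b2) = coeffs
    return a0 + a1 * col + a2 * row, b0 + b1 * col + b2 * row


# Invented control points: pixel (col, row) -> (longitude, latitude).
gcps = [
    ((0, 0), (8.0, 51.0)),
    ((1000, 0), (8.1, 51.0)),
    ((0, 1000), (8.0, 50.92)),
    ((1000, 1000), (8.1, 50.92)),
]
coeffs = fit_affine(gcps)
print(pixel_to_geo(coeffs, 500, 500))  # approximately (8.05, 50.96)
```

With more than three control points the fit is overdetermined, so least squares spreads small digitizing errors across all points instead of forcing an exact match to any three.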

(4) Creating legends: The legend of each map is saved as a separate file so that it can be shown in a separate window. This simplifies working with the maps as well as entering the legend information in step (7).

(5) Creating character sets: To ensure that the data on the maps can be recorded and integrated into the REDE database, two fonts are created for every atlas; they represent the linguistic information and the symbols on the maps. From these fonts, two character sets are created for every atlas: one contains the inventory of the phonetic transcription, the other the inventory of the symbols used on the maps.
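The two inventories can be modelled as a small per-atlas structure against which recorded data are checked. This is a sketch under assumptions: the class, the symbol codes such as S01, and the IPA-style sample characters are hypothetical, and the check works on individual Unicode code points, so a combining diacritic has to be listed in the inventory as a character of its own.

```python
from dataclasses import dataclass, field


@dataclass
class AtlasCharacterSets:
    """The two character sets of one atlas (structure is an assumption)."""
    phonetic: set[str] = field(default_factory=set)        # transcription characters
    symbols: dict[str, str] = field(default_factory=dict)  # symbol code -> description

    def invalid_phonetic(self, transcription):
        """Code points of a transcription that are missing from the inventory."""
        return sorted({ch for ch in transcription if ch not in self.phonetic})

    def is_known_symbol(self, code):
        """True if the symbol code belongs to this atlas's symbol inventory."""
        return code in self.symbols


# Hypothetical inventories; real atlases use far larger ones.
cs = AtlasCharacterSets(
    phonetic=set("hasʊː"),
    symbols={"S01": "filled triangle", "S02": "open circle"},
)
print(cs.invalid_phonetic("haːs"))  # []
print(cs.invalid_phonetic("haus"))  # ['u']
print(cs.is_known_symbol("S03"))    # False
```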

(6) Creating a survey net: Before a symbol can be assigned to a particular location, a survey net must be created for each atlas. The survey net contains the geographic information for every survey location in the atlas.
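A survey net is essentially a lookup table from survey locations to coordinates. The sketch below assumes one point coordinate (longitude, latitude) per location and uses hypothetical location IDs; the two places and their rounded coordinates serve only as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyLocation:
    location_id: str  # hypothetical atlas-internal key
    name: str
    lon: float
    lat: float


class SurveyNet:
    """All survey locations of one atlas, indexed by location ID."""

    def __init__(self, locations):
        self._by_id = {}
        for loc in locations:
            if loc.location_id in self._by_id:
                raise ValueError(f"duplicate location ID {loc.location_id!r}")
            self._by_id[loc.location_id] = loc

    def coordinates(self, location_id):
        """(lon, lat) of a survey location; KeyError if it is not in the net."""
        loc = self._by_id[location_id]
        return loc.lon, loc.lat

    def __len__(self):
        return len(self._by_id)


net = SurveyNet([
    SurveyLocation("L01", "Marburg", 8.77, 50.81),  # approximate coordinates
    SurveyLocation("L02", "Gießen", 8.68, 50.58),
])
print(len(net))                # 2
print(net.coordinates("L02"))  # (8.68, 50.58)
```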

(7) Entering legend information: Using the character sets created in step (5), a legend containing all of the symbols on the map is created for every map. The legend data are entered via a form in which the characters needed for a map can be typed with a virtual keyboard.
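Checking legend entries against the atlas's symbol character set can be sketched as follows. The function name, the symbols and the variant forms are hypothetical; the source only says that characters are entered with a virtual keyboard, and rejecting anything outside the inventory is an assumption about what such a keyboard achieves.

```python
def build_legend(entries, symbol_inventory):
    """Create a map legend from (symbol, meaning) pairs.

    Only symbols from the atlas's symbol character set are accepted, which is
    the effect a virtual keyboard offering just those symbols would have.
    """
    legend = {}
    for symbol, meaning in entries:
        if symbol not in symbol_inventory:
            raise ValueError(f"symbol {symbol!r} is not in the atlas character set")
        if symbol in legend:
            raise ValueError(f"symbol {symbol!r} appears twice in the legend")
        legend[symbol] = meaning
    return legend


inventory = {"▲", "○", "■"}  # hypothetical symbol character set
legend = build_legend([("▲", "[haːs]"), ("○", "[hoːs]")], inventory)
print(legend)  # {'▲': '[haːs]', '○': '[hoːs]'}

try:
    build_legend([("◆", "[hus]")], inventory)
except ValueError as err:
    print(err)  # symbol '◆' is not in the atlas character set
```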

(8) Recording the symbols for each location: In this step, linguistic data are linked with geographic information (see step (6)). By the end of this step, each item of linguistic information has been assigned individually to a survey location. The edited data can then be displayed as thematic maps. To make this time-consuming step as smooth as possible, special tools were developed for the REDE SprachGIS.
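The linking performed in step (8) amounts to a join between the recorded symbols, the survey net and the legend. As an assumption for illustration, the result below is written as GeoJSON, a common exchange format for thematic point maps; the source does not say that the REDE SprachGIS uses it, and all IDs, coordinates and forms are hypothetical.

```python
import json


def thematic_map(recordings, survey_net, legend):
    """Join recorded symbols with survey-net coordinates as GeoJSON points.

    recordings: (location_id, symbol) pairs; a location may occur more than once
    survey_net: location_id -> (lon, lat)
    legend:     symbol -> meaning
    """
    features = []
    for loc_id, symbol in recordings:
        lon, lat = survey_net[loc_id]  # KeyError flags a location missing from the net
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"location": loc_id, "symbol": symbol, "meaning": legend[symbol]},
        })
    return {"type": "FeatureCollection", "features": features}


net = {"L01": (8.77, 50.81), "L02": (8.68, 50.58)}
legend = {"▲": "[haːs]", "○": "[hoːs]"}
fc = thematic_map([("L01", "▲"), ("L02", "○")], net, legend)
print(json.dumps(fc, ensure_ascii=False, indent=2))
```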

(9) Entering metadata into the database: To allow REDE users to search for maps and data in a targeted manner, further information must be recorded in the REDE database. In addition to the primary data, this includes the following:

category
mapped phenomenon
the linguistic level of the mapped phenomenon (sound, form, word, syntax map)
complete original classification (classification of a map in an atlas)
the context in which the mapped phenomenon was elicited
search term
...

Users can also access some basic information about each atlas such as the following:

editor
release date
scope
survey period
field workers
mapped linguistic levels
mapping methods.
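Some of the metadata fields listed in step (9) can be modelled as a record type with a simple filter, to show how they support targeted searches. Field names, the sample records and the exact-match rule for search terms are assumptions for illustration; the list in step (9) is open-ended, so further fields would be added in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Linguistic level of the mapped phenomenon."""
    SOUND = "sound"
    FORM = "form"
    WORD = "word"
    SYNTAX = "syntax"


@dataclass(frozen=True)
class MapMetadata:
    map_id: str  # hypothetical key linking to the primary data
    category: str
    phenomenon: str
    level: Level
    original_classification: str
    elicitation_context: str
    search_terms: tuple = ()


def find(metadata, level=None, term=None):
    """IDs of maps matching a linguistic level and/or a search term."""
    hits = []
    for m in metadata:
        if level is not None and m.level is not level:
            continue
        if term is not None and term.lower() not in {t.lower() for t in m.search_terms}:
            continue
        hits.append(m.map_id)
    return hits


# Hypothetical records.
records = [
    MapMetadata("A-12", "phonology", "MHG û", Level.SOUND, "vowels/long",
                "word list", ("house", "vowel")),
    MapMetadata("A-7", "lexicon", "goose", Level.WORD, "animals",
                "picture naming", ("goose",)),
]
print(find(records, level=Level.SOUND))  # ['A-12']
print(find(records, term="Goose"))       # ['A-7']
```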

(10) Checking for quality: The quality of each map is checked in two steps. First, student assistants check the metadata as well as the legends and map images. Second, members of the scientific staff spot-check the maps.
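The two-step check can be planned as a full pass followed by a random sample. How maps are selected for spot checks is not stated in the source; uniform random sampling and the 10 % default are assumptions, and the seed only serves to make a plan reproducible.

```python
import random


def plan_quality_checks(map_ids, spot_check_fraction=0.1, seed=None):
    """Return (stage 1, stage 2) lists of map IDs for the two-step check.

    Stage 1 covers every map (metadata, legend, map image); stage 2 is a
    random spot-check sample. The 10 % default is a hypothetical value.
    """
    if not 0 < spot_check_fraction <= 1:
        raise ValueError("spot_check_fraction must be in (0, 1]")
    stage1 = list(map_ids)
    k = max(1, round(len(stage1) * spot_check_fraction)) if stage1 else 0
    stage2 = sorted(random.Random(seed).sample(stage1, k))
    return stage1, stage2


stage1, stage2 = plan_quality_checks([f"map-{n:03d}" for n in range(1, 51)], seed=1)
print(len(stage1), len(stage2))  # 50 5
```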

2. Integration of the Modern Regional Atlases of German via Import

(1) Entering primary data into the database: See the procedures for type 1.

(2) Importing digital data into the REDE database: If electronic data records are available for a given set of maps, steps (2) to (4), (7), and (8) of type 1 no longer apply in the form described there, because the data records already contain their results. Since not all data records are structured in the same way, they must be adapted so that they can be imported into the database, where they can then be furnished with metadata (see step (9) of type 1).
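Adapting differently structured records to one import schema is the transform part of an ETL process. The sketch assumes CSV sources whose columns differ only in name, order and delimiter; the layouts, column names and records are hypothetical, and real source data would also need the decoding described in step (3).

```python
import csv
import io

# Hypothetical column layouts of two differently structured source files.
LAYOUTS = {
    "layout_a": {"location": "ort", "map": "karte", "value": "beleg"},
    "layout_b": {"location": "loc_id", "map": "map_no", "value": "form"},
}


def normalize(csv_text, layout, delimiter=","):
    """Rename a source file's columns to one common import schema."""
    mapping = LAYOUTS[layout]  # target field -> source column
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [
        {target: rec[source].strip() for target, source in mapping.items()}
        for rec in reader
    ]


a = normalize("ort;karte;beleg\nL01;12;haːs\n", "layout_a", delimiter=";")
b = normalize("loc_id,map_no,form\nL02,12,hoːs\n", "layout_b")
for row in a + b:
    print(row)
```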

(3) Preparing digital data: The data records are decoded and then converted so that they can be assigned character sets and geographic coordinates. This makes it possible to generate a legend automatically for every map: the legend is generated from the database based on information about where certain symbols occur.
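Generating a legend from the database, based on where symbols occur, can be sketched as a grouping and counting step. The record layout, the occurrence counts and the ordering by frequency are assumptions; the source only states that the legend is derived from information on where symbols occur.

```python
from collections import Counter, defaultdict


def generate_legends(records):
    """Derive one legend per map from the symbols that actually occur.

    records: (map_no, location_id, symbol, meaning) tuples from the database.
    Returns map_no -> [(symbol, meaning, number of occurrences)], most
    frequent first; ties keep the order in which symbols were first seen.
    """
    counts = defaultdict(Counter)
    meanings = {}
    for map_no, _location, symbol, meaning in records:
        counts[map_no][symbol] += 1
        meanings[(map_no, symbol)] = meaning  # meanings can differ between maps
    return {
        map_no: [(sym, meanings[(map_no, sym)], n) for sym, n in counter.most_common()]
        for map_no, counter in counts.items()
    }


# Hypothetical database rows.
records = [
    ("12", "L01", "▲", "[haːs]"),
    ("12", "L02", "▲", "[haːs]"),
    ("12", "L03", "○", "[hoːs]"),
    ("7", "L01", "■", "[gans]"),
]
for map_no, legend in generate_legends(records).items():
    print(map_no, legend)
# 12 [('▲', '[haːs]', 2), ('○', '[hoːs]', 1)]
# 7 [('■', '[gans]', 1)]
```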

(4) Entering metadata into the database: See the procedures for type 1.

(5) Checking for quality: See the procedures for type 1.