The data conversion XML document is used to convert the input data to match the data model defined in the lexicon document (Lexicon).
Examples of the inlined and by-level input data formats are given below. For a comprehensive description of all the possible content options for the document, see the schema document. Data continuation and abbreviated content are expressed as …
The root-level tag contains a reference to the schema document, which is used to validate the content of the XML document:
<SIMO_conversion_mapping xmlns="http://www.simo-project.org/simo"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.simo-project.org/simo
../schemas/conversion_mapping.xsd">
Data level definitions are nested as in the lexicon. Each input data row belongs to a certain data level, which is indicated by a value in a given position in the row.
The input data has two data levels, comp_unit and stratum. comp_unit is the top level, the simulation unit (e.g. a stand or a sample plot), and stratum is the lower level (sublevel). In the input data, the comp_unit data is on the rows that have the value 1 (rowtype_value) at position 1 (rowtype_rowpos), and the stratum data is on the rows where this value is 2. The id for each object is at position 0 for comp_unit objects and at position 2 for stratum objects (id_rowpos). Position values begin from 0:
…
<data_levels>
<level>
<name>comp_unit</name>
<id_rowpos>
<pos>0</pos>
</id_rowpos>
<rowtype_rowpos>1</rowtype_rowpos>
<rowtype_value>1</rowtype_value>
<date_rowpos>7</date_rowpos>
<sublevel>
<name>stratum</name>
<id_rowpos>
<pos>2</pos>
</id_rowpos>
<rowtype_rowpos>1</rowtype_rowpos>
<rowtype_value>2</rowtype_value>
</sublevel>
</level>
</data_levels>
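To illustrate how these definitions could be applied to inlined input, the following Python sketch classifies rows by the value at rowtype_rowpos and reads the object id from the position given for each level. It is not SIMO's importer: the function name `classify_row`, the `LEVELS` dictionary, the semicolon delimiter and the sample rows are all assumptions made for illustration.

```python
# Hypothetical sketch, not SIMO's actual import code.
# Level settings are copied from the <data_levels> example above.
LEVELS = {
    "1": {"name": "comp_unit", "id_pos": 0},  # rowtype_value 1
    "2": {"name": "stratum", "id_pos": 2},    # rowtype_value 2
}
ROWTYPE_POS = 1  # <rowtype_rowpos>; positions are counted from 0


def classify_row(line, delimiter=";"):
    """Return (level name, object id) for one inlined data row."""
    fields = line.split(delimiter)
    level = LEVELS[fields[ROWTYPE_POS]]
    return level["name"], fields[level["id_pos"]]


# Made-up rows; the delimiter and the field contents are assumptions.
rows = ["stand7;1;x;x", "stand7;2;s1;x"]
result = [classify_row(r) for r in rows]
```

In this sketch a row with an unknown row type raises a `KeyError`; how the real importer treats such rows is not stated here.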
Note that the <rowtype_value> tag can contain several values. If several rowtype values are specified, the smallest is taken to indicate the actual data row, and the others are treated as containing extra information. For example, the date information should be on the main data row.
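The "smallest value wins" rule can be sketched in Python as follows. The helper name `split_rowtypes` is hypothetical, and the sketch assumes that the values are whitespace-separated integers, in the same way as the lists in the other elements of this document.

```python
# Hypothetical sketch of the rule described above, not SIMO code.
def split_rowtypes(rowtype_value):
    """Return (main row type, extra row types) from the tag's text."""
    values = sorted(int(v) for v in rowtype_value.split())
    # The smallest value marks the actual data row; the rest are extras.
    return values[0], values[1:]


main_type, extra_types = split_rowtypes("3 1 2")
```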
Missing values in the input data are indicated either by no data between the delimiters (‘’) or by the value -1 (none_value_indicator). In this example, an input data row is rejected during data import if the SIMO variable MAIN_GROUP would get any of the values 4, 5, 6, 7 or 8 (object_rejection). The rejection variable must be from the highest data level, i.e. in this example from the comp_unit level:
…
<none_value_indicator>'' -1</none_value_indicator>
<object_rejection>
<SIMO_variable>
<name>MAIN_GROUP</name>
<reject_criterion oper="in">
<enum>4 5 6 7 8</enum>
</reject_criterion>
</SIMO_variable>
</object_rejection>
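As a rough illustration of these two settings, the Python sketch below checks a raw field against the missing-value indicators and a converted MAIN_GROUP value against the rejection criterion. The function names are hypothetical and the logic is only an assumed reading of the elements above, not SIMO's implementation.

```python
# Hypothetical sketch, not SIMO's actual import code.
NONE_INDICATORS = {"", "-1"}         # from <none_value_indicator>'' -1
REJECT_MAIN_GROUP = {4, 5, 6, 7, 8}  # from <reject_criterion oper="in">


def is_missing(raw):
    """True if the raw input field counts as a missing value."""
    return raw.strip() in NONE_INDICATORS


def is_rejected(main_group):
    """True if the object would be rejected (oper="in" read as set membership)."""
    return main_group in REJECT_MAIN_GROUP
```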
The variable “Pinta-ala” is converted into the SIMO variable “AREA” during import. The “Pinta-ala” value in the from element is for documentation purposes only; it has no effect on the import, because the imported value is determined by the row_type and row_position element values. Here the row_type value refers to the rowtype_value defined above in the data_levels definitions. A default value can be given for the variable in case of a missing value (none_to_value); in this example the element is left empty. A conversion factor is defined for numerical attributes (conversion_factor); during import the input data value is multiplied by it:
…
<variable>
<name>
<from>Pinta-ala</from>
<to>AREA</to>
</name>
<row_type>1</row_type>
<row_position>8</row_position>
<from_datatype>double</from_datatype>
<none_to_value/>
<numerical>
<conversion_factor>1</conversion_factor>
</numerical>
</variable>
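The numerical conversion can be sketched in Python as below. The function `convert_numeric` and its parameters are hypothetical. The sketch also assumes that a none_to_value default is returned as given, without applying the conversion factor; that detail is not stated above.

```python
# Hypothetical sketch, not SIMO's actual import code.
def convert_numeric(raw, conversion_factor=1.0, none_to_value=None,
                    none_indicators=("", "-1")):
    """Convert one raw input field into a numerical SIMO value."""
    if raw.strip() in none_indicators:
        # Missing value: fall back to the default (None if none is given).
        return none_to_value
    # Multiply the input data value by the conversion factor.
    return float(raw) * conversion_factor


area = convert_numeric("2.5", conversion_factor=1.0)
scaled = convert_numeric("1.5", conversion_factor=10.0)
```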
For categorical values, an explicit mapping from input values to SIMO attribute values is given (value_mapping). In this case, the input data value 1 at position 14 for row type 1 is converted to the PEAT attribute value 0, and the values 2, 3, 4 and 5 are converted to 1:
…
<variable>
<name>
<from>Alaryhmä</from>
<to>PEAT</to>
</name>
<row_type>1</row_type>
<row_position>14</row_position>
<from_datatype>int</from_datatype>
<none_to_value/>
<categorical>
<value_mapping>
<value>
<from>1</from>
<to>0</to>
</value>
<value>
<from>2 3 4 5</from>
<to>1</to>
</value>
</value_mapping>
</categorical>
</variable>
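The mapping above amounts to a lookup table. The Python sketch below builds one from (from, to) pairs copied from the example; the names `VALUE_MAPPING` and `build_lookup` are hypothetical, and the from values are assumed to be whitespace-separated integers.

```python
# Hypothetical sketch, not SIMO's actual import code.
# (from, to) pairs copied from the <value_mapping> example above.
VALUE_MAPPING = [("1", "0"), ("2 3 4 5", "1")]


def build_lookup(value_mapping):
    """Expand each whitespace-separated from-list into a flat dict."""
    lookup = {}
    for from_values, to_value in value_mapping:
        for value in from_values.split():
            lookup[int(value)] = int(to_value)
    return lookup


PEAT = build_lookup(VALUE_MAPPING)
```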
…
</name>
<row_type>1</row_type>
<row_position>15</row_position>
<from_datatype>date</from_datatype>
<none_to_value/>
<date>
<epoch_year>current</epoch_year>
</date>
</variable>
…
<variable>
<name>
<from></from>
<to>estate_name</to>
</name>
<row_type>1</row_type>
<row_position>16</row_position>
<from_datatype>string</from_datatype>
<none_to_value/>
<text/>
</variable>
</SIMO_conversion_mapping>
The conversion document for by-level data, where each data level is in a separate file, is similar to the one for inlined data except for the data level definitions:
<data_levels>
<level>
<name>comp_unit</name>
<id_rowpos>
<pos>1</pos>
</id_rowpos>
<link_id_rowpos/>
<rowtype_value>1</rowtype_value>
<sublevel>
<name>stratum</name>
<id_rowpos>
<pos>1</pos>
</id_rowpos>
<link_id_rowpos>
<link>
<level>comp_unit</level>
<pos>0</pos>
</link>
</link_id_rowpos>
<rowtype_value>2</rowtype_value>
</sublevel>
</level>
</data_levels>
Here the linking between the files is defined in the link_id_rowpos element of the stratum level: the first value (pos is 0) in each row of the stratum data file contains the id of the comp_unit (level) object that the stratum belongs to. This id is identical to the value found in the second position (id_rowpos is 1) in the rows of the comp_unit data file.
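The linking can be pictured as a simple join between the two files. The Python sketch below is an assumed illustration only: the function `link_strata` and the sample rows are made up, and the rows are taken to be already split into fields.

```python
# Hypothetical sketch, not SIMO's actual import code.
def link_strata(comp_unit_rows, stratum_rows):
    """Group stratum ids under the comp_unit id they link to."""
    # comp_unit id is at position 1 (<id_rowpos><pos>1</pos>).
    units = {row[1]: [] for row in comp_unit_rows}
    for row in stratum_rows:
        parent_id = row[0]   # link to comp_unit: <link><pos>0</pos>
        stratum_id = row[1]  # stratum id: <id_rowpos><pos>1</pos>
        # A stratum pointing at an unknown comp_unit raises KeyError here.
        units[parent_id].append(stratum_id)
    return units


# Made-up rows, already split into fields.
comp_units = [["x", "A"], ["x", "B"]]
strata = [["A", "s1"], ["A", "s2"], ["B", "s1"]]
linked = link_strata(comp_units, strata)
```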