Mapping file specification#
The mapping file enables transformation of raw data into FHIRflat format. Mapping is usually done using a Google Sheet, though using a local file is also possible.
Qualifiers for requirement levels (MUST, SHOULD) are interpreted as in RFC 2119.
Once a mapping file or a Google Sheet is prepared, the
fhirflat.convert_data_to_flat() function can be used to
transform the source data into a folder of FHIRflat resources in parquet format.
Note
The usual location for a mapping file is a Google Sheet, which has a URL like
https://docs.google.com/spreadsheets/d/{SHEET_URL}/edit
Metadata#
The first sheet of the Google Sheets document MUST be a table with the following
column headers: Resources [cell A1], Sheet ID [cell B1] and Resource Type
[cell C1].
Cell A2 MUST be =getAllSheetNames() which populates the column with the names
of the other sheets. Cell B2 MUST be =getOtherSheetIDs() which populates the
column with the sheet IDs of the other sheets.
The rest of the sheets MUST be named according to the FHIR resource they are
mapping, e.g. Patient, Encounter.
The last column, Resource Type indicates whether the mapping type and MUST be
one of these types:
one-to-one: Information in the mapping sheet pertains to one record of the corresponding FHIRflat resource. One row in the raw data generates a single resource.one-to-many: Information in the mapping sheet pertains to multiple records of the corresponding FHIRflat resource. One row in the raw data generates multiple resources.
FHIRflat resource mapping sheet#
Each resource mapping sheet MUST be named the same name as the corresponding FHIRflat resource.
These columns are mandatory:
raw_variable: Name of column in raw data to be transformedraw_response: Assignment toraw_variablefor which the transformation rule should be applied. Theraw_responseentry MUST be of the following patternresponse, commentwhereresponseMAY be an integer andcommentis free text, separated by a comma. For example ifraw_variableisgenderthenraw_responsecould be2, Femalewhich indicates that the transformation rule corresponding to this row should be invoked whengender=2.
Rules: Each row in a resource mapping sheet represents a transformation
rule, invoked when the source entry for the variable in raw_variable
matches that in raw_response. The transformation target variables are the rest
of the columns in the mapping sheet.
Successive raw_response entries corresponding to the same raw_variable MAY
keep the corresponding raw_variable entry blank, example:
raw_variable |
raw_response |
gender.code |
gender.system |
|---|---|---|---|
demog_gender |
1, Man |
446151000124109 |
|
2, Woman |
446141000124107 |
||
99, Unknown |
Column headers#
The rest of the column headers MUST correspond to assignments to FHIRflat
variables. Nested data in FHIR are represented as flattened versions in
FHIRflat, with the parent and child fields joined by a period (.):
"gender": {
"system": "http://snomed.info/sct",
"code": 446141000124107
}
would flatten to
{
"gender.system": "http://snomed.info/sct",
"gender.code": 446141000124107
}
ISARIC specific FHIR extensions MUST be specified
with an extension. prefix as required in the FHIR standard.
Column values#
Values in a column (other than raw_response and raw_variable) are constants
(such as the preceding example) or refer to column names in the raw data. Column
names MUST be delimited by angle brackets, such as <column>.
Concatenation with constants or other field names uses the plus (+)
operator. The special column name <FIELD> refers to the value in the column
referred to by raw_variable.
Examples:
For a patient ID field,
raw_variable = subjidandid = Patient/+<FIELD>which assigns the value of the subject ID to theidcolumn in FHIRflat with a namespace prefix.To combine date of admission and time of admission,
raw_variable = dates_admdate, andactualPeriod.start = <FIELD>+<dates_admtime>.
Conditional assignments#
Conditional assignments are made using the if not statement. A value of
<A> if not <B> selects field A if B is blank.
Creating list assignments#
FHIR resource entries can be lists, such as a set of codes. This is handled in
the mapping specification by allowing multiple matches for a (raw_variable,
raw_response) tuple. The corresponding resource assignments (such as
coding.code) are then collected into lists.