fhirflat
fhirflat is a library for transforming FHIR resources in NDJSON or native Python dictionaries to a flat structure that can be written to a Parquet file.
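To illustrate what a "flat structure" means here, the sketch below collapses a nested resource dictionary into a single level of dot-separated keys. It is a generic illustration, not fhirflat's own algorithm: the `flatten` helper, the dot separator and the sample resource are all assumptions, and the column names the library actually produces may differ.

```python
def flatten(resource: dict, prefix: str = "") -> dict:
    """Collapse nested dictionaries into one level of dot-separated keys."""
    flat = {}
    for key, value in resource.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            flat.update(flatten(value, name))
        else:
            # Scalars and lists are kept unchanged in this simplified sketch.
            flat[name] = value
    return flat


# Hypothetical, heavily trimmed Patient resource as a native Python dictionary.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "gender": "female",
    "managingOrganization": {"reference": "Organization/1"},
}

flat_patient = flatten(patient)
print(flat_patient)
```

Each flattened dictionary has only top-level keys, so a collection of them maps naturally onto the rows and columns of a table such as a Parquet file.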
- fhirflat.convert_data_to_flat(data: str, date_format: str, timezone: str, folder_name: str = 'fhirflat_output', mapping_files_types: tuple[dict, dict] | None = None, sheet_id: str | None = None, subject_id='subjid', validate: bool = True, compress_format: None | str = None, parallel: bool = False)
Takes raw clinical data (currently assumed to be in a one-row-per-patient format, such as REDCap exports) and produces a folder of FHIRflat files, one per resource. The mappings are supplied either as local mapping files or as the ID of a Google Sheet containing them.
- Parameters:
data – The path to the raw clinical data file.
date_format – The format of the dates in the data file, e.g. "%Y-%m-%d".
timezone – The timezone of the dates in the data file, e.g. "Europe/London".
folder_name – The name of the folder to store the FHIRflat files.
mapping_files_types – A tuple containing two dictionaries, one with the mapping files for each resource type and one with the mapping type (either one-to-one or one-to-many) for each resource type.
sheet_id – The Google Sheet ID containing the mapping files. The first sheet must contain the mapping types - one column listing the resource name, and another describing whether the mapping is one-to-one or one-to-many. The subsequent sheets must be named by resource, and contain the mapping for that resource.
subject_id – The name of the column containing the subject ID in the data file.
validate – Whether to validate the FHIRflat files after creation.
compress_format – Whether the output folder should be compressed and, if so, in which format. If None, the folder is left uncompressed.
parallel – Whether to parallelize the data conversion over different resources.
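The parameters above can be assembled as in the following sketch, which uses the Google Sheet route. The `build_conversion_kwargs` and `run_conversion` helpers, the input path and the sheet ID placeholder are hypothetical. Treating `mapping_files_types` and `sheet_id` as mutually exclusive is an assumption drawn from the "either ... or" wording of the description, and the date check is a convenience of this sketch, not something fhirflat is known to do.

```python
from datetime import datetime


def build_conversion_kwargs(data, date_format, timezone, *, mapping_files_types=None,
                            sheet_id=None, sample_date=None, **options):
    """Assemble keyword arguments for convert_data_to_flat after basic checks."""
    # Assumption: mappings come from local files OR a Google Sheet, not both.
    if (mapping_files_types is None) == (sheet_id is None):
        raise ValueError("Provide exactly one of mapping_files_types or sheet_id.")
    if sample_date is not None:
        # Raises ValueError if the sample does not match the declared format.
        datetime.strptime(sample_date, date_format)
    return {
        "data": data,
        "date_format": date_format,
        "timezone": timezone,
        "mapping_files_types": mapping_files_types,
        "sheet_id": sheet_id,
        **options,
    }


kwargs = build_conversion_kwargs(
    "raw_data.csv",               # hypothetical path to the raw data file
    "%Y-%m-%d",
    "Europe/London",
    sheet_id="<your-sheet-id>",   # placeholder, not a real Google Sheet ID
    sample_date="2021-03-04",     # a date copied from the data, used as a check
    folder_name="fhirflat_output",
    subject_id="subjid",
    validate=True,
)


def run_conversion() -> None:
    """Run the conversion; requires fhirflat and access to the mapping sheet."""
    import fhirflat  # imported lazily so the checks above work without the package

    fhirflat.convert_data_to_flat(**kwargs)
```

Calling `run_conversion()` performs the actual conversion. Checking a sample date first surfaces a mismatched `date_format` before any files are written.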
- fhirflat.validate(folder_name: str, compress_format: str | None = None)
Takes a folder containing (optionally compressed) FHIRflat files and validates them against the FHIR specification. File names must correspond to the FHIR resource types they represent, e.g. a file containing Patient resources must be named "patient.parquet".
- Parameters:
folder_name – The path to the folder containing the FHIRflat files, or to the compressed file.
compress_format – The format to compress the validated files into.
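Since validation relies on file names, it can help to check a folder's contents before calling `fhirflat.validate`. The sketch below is one way to do that, under stated assumptions: the `expected_file_name`, `unexpected_files` and `run_validation` helpers are hypothetical, and lowercasing the resource type is an extrapolation from the single "patient.parquet" example above, so the convention for multi-word resource types should be confirmed against the library.

```python
from pathlib import Path


def expected_file_name(resource_type: str) -> str:
    """File name assumed for a resource type, e.g. 'Patient' -> 'patient.parquet'."""
    return f"{resource_type.lower()}.parquet"


def unexpected_files(folder: str, resource_types: list[str]) -> list[str]:
    """List Parquet files in `folder` whose names match none of the given types."""
    expected = {expected_file_name(r) for r in resource_types}
    return sorted(
        p.name for p in Path(folder).glob("*.parquet") if p.name not in expected
    )


def run_validation(folder: str = "fhirflat_output") -> None:
    """Validate the folder; requires fhirflat to be installed."""
    import fhirflat  # imported lazily so the helpers above work without the package

    fhirflat.validate(folder)
```

For example, `unexpected_files("fhirflat_output", ["Patient", "Encounter"])` lists any Parquet files in the folder that match neither of the two assumed names, and `run_validation()` then hands the folder to fhirflat.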