Docs/Data Models

Data Models

Portiere's mapping pipeline produces structured data models that represent the results of schema mapping (source columns to target model columns) and concept mapping (source codes to standard vocabulary concepts). These models provide methods for reviewing, approving, rejecting, and exporting mappings programmatically.


Table of Contents

  1. Target Model Definitions
  2. Schema Mapping Models
  3. Concept Mapping Models
  4. Cross-Standard Mapping Models
  5. Approval Workflows
  6. Export Formats

Target Model Definitions

Target models define the clinical data standards that Portiere maps source data into. Standard definitions are stored as YAML files and loaded via the YAMLTargetModel class.

YAMLTargetModel

YAMLTargetModel is a generic loader that reads any standard definition from a YAML file. It replaces the need for separate Python classes per standard.

from portiere.models.target_model import get_target_model

# Load a built-in standard
model = get_target_model("omop_cdm_v5.4")

# Key methods
model.get_schema()              # {table: [columns]} or {resource: [fields]}
model.get_target_descriptions() # {table.column: description}
model.get_source_patterns()     # {pattern: (table, column)}
model.generate_ddl()            # SQL DDL for relational standards
model.validate_output(engine, path)  # Validate ETL output

Properties

PropertyTypeDescription
namestrStandard identifier (e.g., "omop_cdm_v5.4")
versionstrStandard version (e.g., "v5.4")
standard_typestrOne of "relational", "resource", "segment", "archetype"

Supported Standards

StandardIdentifierTypeDescription
OMOP CDM v5.4"omop_cdm_v5.4"relationalOHDSI Observational Medical Outcomes Partnership
FHIR R4"fhir_r4"resourceHL7 Fast Healthcare Interoperability Resources
HL7 v2.5.1"hl7v2_2.5.1"segmentHL7 Version 2 messaging
OpenEHR 1.0.4"openehr_1.0.4"archetypeopenEHR archetype-based EHR

Standard YAML Schema

Each standard definition follows this structure:

name: "omop_cdm_v5.4"
version: "v5.4"
standard_type: "relational"    # relational | resource | segment | archetype
organization: "OHDSI"

entities:
  person:                       # table / resource / segment / archetype
    description: "Patient demographics"
    fields:
      person_id:
        type: "integer"
        required: true
        description: "Unique patient identifier"
      gender_concept_id:
        type: "integer"
        required: true
        description: "Patient gender"
        vocabulary: "Gender"
    source_patterns:            # Patterns that match source column names
      - "patient_id"
      - "subject_id"
    embedding_descriptions:     # For semantic schema mapping
      person_id: "unique patient identifier, subject ID, MRN"

vocabulary_systems: {}          # vocabulary_id → URI mapping

Standard YAML files are located in packages/sdk/src/portiere/standards/. You can provide a custom YAML file via custom_standard_path in PortiereConfig or by passing "custom:/path/to/file.yaml" to get_target_model().

See Multi-Standard Support for a comprehensive guide.


Schema Mapping Models

Schema mapping connects source data columns to target data model tables and columns (e.g., mapping a CSV column patient_dob to the OMOP person.birth_datetime field).

MappingStatus

The MappingStatus enum tracks the lifecycle state of each mapping item. It is shared between schema and concept mapping.

from portiere.models import MappingStatus

class MappingStatus(str, Enum):
    AUTO_ACCEPTED = "auto_accepted"
    NEEDS_REVIEW  = "needs_review"
    APPROVED      = "approved"
    REJECTED      = "rejected"
    OVERRIDDEN    = "overridden"
    UNMAPPED      = "unmapped"
StatusDescription
AUTO_ACCEPTEDConfidence score met the auto-acceptance threshold. No human review required.
NEEDS_REVIEWConfidence score is in the review band. Human review recommended.
APPROVEDA reviewer has explicitly approved this mapping.
REJECTEDA reviewer has explicitly rejected this mapping.
OVERRIDDENA reviewer has overridden the suggested mapping with a different target.
UNMAPPEDConfidence score was too low to suggest a mapping, or no suitable candidates were found.

SchemaMappingItem

Represents a single source-column-to-target-column mapping.

from portiere.models import SchemaMappingItem

class SchemaMappingItem(BaseModel):
    source_column: str
    source_table: str = ""
    target_table: str
    target_column: str
    confidence: float
    status: MappingStatus
    candidates: List[dict] = []
    override_target_table: Optional[str] = None
    override_target_column: Optional[str] = None

Note: The source_table field defaults to an empty string. This allows tests and simple use cases to omit it when the source is a single file.

Fields

FieldTypeDefaultDescription
source_columnstr(required)Name of the column in the source data
source_tablestr""Name of the source table (optional for single-file sources)
target_tablestr(required)Suggested target table in the data model
target_columnstr(required)Suggested target column in the data model
confidencefloat(required)Mapping confidence score (0.0 to 1.0)
statusMappingStatus(required)Current mapping lifecycle status
candidatesList[dict][]Alternative mapping candidates with scores
override_target_tablestrNoneReviewer-specified target table (when overridden)
override_target_columnstrNoneReviewer-specified target column (when overridden)

Properties

item = SchemaMappingItem(
    source_column="patient_dob",
    target_table="person",
    target_column="birth_datetime",
    confidence=0.92,
    status=MappingStatus.AUTO_ACCEPTED,
)

# effective_target_table returns the override if set, otherwise the original target
print(item.effective_target_table)  # "person"

# effective_target_column returns the override if set, otherwise the original target
print(item.effective_target_column)  # "birth_datetime"

When an override is applied:

item.override_target_table = "observation"
item.override_target_column = "observation_date"

print(item.effective_target_table)   # "observation"
print(item.effective_target_column)  # "observation_date"

Methods

approve(target_table=None, target_column=None)

Approves the mapping. If target_table and target_column are provided, the mapping is overridden to the specified target; otherwise the current suggestion is approved as-is.

# Approve the suggested mapping
item.approve()
print(item.status)  # MappingStatus.APPROVED

# Approve with an override
item.approve(target_table="observation", target_column="observation_date")
print(item.status)  # MappingStatus.OVERRIDDEN
print(item.effective_target_table)   # "observation"
print(item.effective_target_column)  # "observation_date"

reject()

Rejects the mapping, marking it as rejected.

item.reject()
print(item.status)  # MappingStatus.REJECTED

SchemaMapping

A collection of SchemaMappingItem objects representing all column mappings for a source.

from portiere.models import SchemaMapping

class SchemaMapping(BaseModel):
    items: List[SchemaMappingItem]
    project: str
    source: str
    finalized: bool = False

Fields

FieldTypeDefaultDescription
itemsList[SchemaMappingItem](required)All mapping items for this source
projectstr(required)Project identifier
sourcestr(required)Source file or table identifier
finalizedboolFalseWhether the mapping has been finalized

Filter Methods

schema_mapping = project.map_schema(source=source)

# Get items that need human review
review_items = schema_mapping.needs_review()
# Returns: List[SchemaMappingItem] where status == NEEDS_REVIEW

# Get items that were auto-accepted
auto_items = schema_mapping.auto_accepted()
# Returns: List[SchemaMappingItem] where status == AUTO_ACCEPTED

Batch Operations

# Approve all items currently in NEEDS_REVIEW status
schema_mapping.approve_all()

Finalization

Finalization locks the mapping, preventing further changes. This signals that all mappings have been reviewed and the ETL stage can proceed.

schema_mapping.finalize()
print(schema_mapping.finalized)  # True

# Attempting to modify a finalized mapping raises an error

Summary

The summary() method returns a dictionary with aggregate statistics:

stats = schema_mapping.summary()
print(stats)
# {
#     "total": 25,
#     "auto_accepted": 18,
#     "needs_review": 5,
#     "approved": 0,
#     "unmapped": 2,
#     "auto_rate": 0.72,
# }
KeyTypeDescription
totalintTotal number of mapping items
auto_acceptedintItems with AUTO_ACCEPTED status
needs_reviewintItems with NEEDS_REVIEW status
approvedintItems with APPROVED status
unmappedintItems with UNMAPPED status
auto_ratefloatFraction of items that were auto-accepted (auto_accepted / total)

Concept Mapping Models

Concept mapping links source codes and descriptions to standard vocabulary concepts (e.g., mapping source code "250.00" with description "Diabetes mellitus" to SNOMED concept 201826 "Type 2 diabetes mellitus").

ConceptMappingMethod

The ConceptMappingMethod enum describes how a concept mapping was established.

from portiere.models import ConceptMappingMethod

class ConceptMappingMethod(str, Enum):
    AUTO     = "auto"
    REVIEW   = "review"
    MANUAL   = "manual"
    OVERRIDE = "override"
    UNMAPPED = "unmapped"

Important: This class is named ConceptMappingMethod, not MappingMethod.

MethodDescription
AUTOMapping was auto-accepted (confidence >= 0.95). Also set when approve() is called without candidates.
REVIEWMapping was flagged for review (confidence 0.70-0.95).
MANUALMapping requires manual intervention (confidence < 0.70).
OVERRIDEA reviewer has overridden the mapping with a manually specified concept.
UNMAPPEDNo suitable concept was found or the mapping was explicitly marked as unmappable.

ConceptCandidate

Represents a single candidate concept returned by the knowledge layer search.

from portiere.models import ConceptCandidate

class ConceptCandidate(BaseModel):
    concept_id: int
    concept_name: str
    vocabulary_id: str
    domain_id: str
    concept_class_id: str
    standard_concept: str
    score: float

Fields

FieldTypeDescription
concept_idintUnique concept identifier (e.g., OMOP concept_id)
concept_namestrHuman-readable concept name
vocabulary_idstrSource vocabulary (e.g., "SNOMED", "LOINC", "RxNorm")
domain_idstrConcept domain (e.g., "Condition", "Drug", "Measurement")
concept_class_idstrConcept class (e.g., "Clinical Finding", "Ingredient")
standard_conceptstrStandard concept flag ("S" = Standard, "C" = Classification)
scorefloatRelevance score from the knowledge layer (0.0 to 1.0)

Candidate Scoring

Candidates are returned sorted by score in descending order. The score combines signals from the knowledge layer backend:

  • BM25s/Elasticsearch: BM25 token-overlap score, normalized to [0, 1]
  • FAISS: Cosine similarity between source term and concept embeddings
  • Hybrid: RRF-fused score combining dense and sparse signals
  • After reranking: Cross-encoder relevance score replaces the initial score
# Access candidates for a mapping item
for candidate in item.candidates:
    print(f"  {candidate.concept_id}: {candidate.concept_name} "
          f"({candidate.vocabulary_id}) score={candidate.score:.3f}")

ConceptMappingItem

Represents a single source-code-to-concept mapping.

from portiere.models import ConceptMappingItem

class ConceptMappingItem(BaseModel):
    source_code: str
    source_description: str
    source_column: str
    source_count: int
    target_concept_id: Optional[int]
    target_concept_name: Optional[str]
    target_vocabulary_id: Optional[str]
    target_domain_id: Optional[str]
    confidence: float
    method: ConceptMappingMethod
    candidates: List[ConceptCandidate] = []
    provenance: Optional[dict] = None

Fields

FieldTypeDefaultDescription
source_codestr(required)Original code value from the source data
source_descriptionstr(required)Description associated with the source code
source_columnstr(required)Column in the source data containing this code
source_countint(required)Number of occurrences in the source data
target_concept_idintNoneMapped target concept ID
target_concept_namestrNoneMapped target concept name
target_vocabulary_idstrNoneTarget concept vocabulary
target_domain_idstrNoneTarget concept domain
confidencefloat(required)Mapping confidence score (0.0 to 1.0)
methodConceptMappingMethod(required)How this mapping was established
candidatesList[ConceptCandidate][]Candidate concepts from knowledge layer search
provenancedictNoneMetadata about how the mapping was produced (e.g., LLM verification details)

Properties

item = concept_mapping.items[0]

# is_mapped: True if a target concept has been assigned
print(item.is_mapped)  # True (if target_concept_id is not None)

# approved: True if method indicates the mapping has been approved
print(item.approved)

# rejected: True if the mapping has been rejected
print(item.rejected)

Methods

approve(candidate_index=0)

Approves the mapping using the candidate at the specified index. If no candidates exist, sets the method to AUTO.

# Approve using the top candidate (index 0)
item.approve()
print(item.method)             # ConceptMappingMethod.AUTO (if no candidates)
print(item.target_concept_id)  # Set from candidates[0].concept_id

# Approve using the second candidate
item.approve(candidate_index=1)
print(item.target_concept_id)  # Set from candidates[1].concept_id

reject()

Rejects the mapping, clearing the target concept.

item.reject()
print(item.rejected)  # True

override(concept_id, concept_name="", vocabulary_id="")

Overrides the mapping with a manually specified concept. Sets the method to OVERRIDE (not MANUAL).

item.override(
    concept_id=4029098,
    concept_name="Atrial fibrillation",
    vocabulary_id="SNOMED",
)
print(item.method)             # ConceptMappingMethod.OVERRIDE
print(item.target_concept_id)  # 4029098
print(item.target_concept_name)  # "Atrial fibrillation"

mark_unmapped()

Marks the item as having no valid mapping.

item.mark_unmapped()
print(item.method)  # ConceptMappingMethod.UNMAPPED

ConceptMapping

A collection of ConceptMappingItem objects representing all concept mappings for a source.

from portiere.models import ConceptMapping

class ConceptMapping(BaseModel):
    items: List[ConceptMappingItem]
    project: str
    source: str
    finalized: bool = False

Fields

FieldTypeDefaultDescription
itemsList[ConceptMappingItem](required)All concept mapping items for this source
projectstr(required)Project identifier
sourcestr(required)Source file or table identifier
finalizedboolFalseWhether the mapping has been finalized

Filter Methods

concept_mapping = project.map_concepts(
    source=source,
    schema_mapping=schema_mapping,
)

# Get items that need human review
review_items = concept_mapping.needs_review()
# Returns: List[ConceptMappingItem] where method == REVIEW

# Get items that were auto-mapped
auto_items = concept_mapping.auto_mapped()
# Returns: List[ConceptMappingItem] where method == AUTO

# Get items that have no mapping
unmapped_items = concept_mapping.unmapped()
# Returns: List[ConceptMappingItem] where method == UNMAPPED

Batch Operations

# Approve all items currently in REVIEW status using their top candidate
concept_mapping.approve_all()

Finalization

concept_mapping.finalize()
print(concept_mapping.finalized)  # True

Summary

The summary() method returns a dictionary with aggregate statistics:

stats = concept_mapping.summary()
print(stats)
# {
#     "total": 150,
#     "auto_mapped": 120,
#     "needs_review": 20,
#     "manual_required": 10,
#     "auto_rate": 0.80,
#     "coverage": 0.93,
# }

Important: The summary keys for ConceptMapping differ from SchemaMapping:

KeyTypeDescription
totalintTotal number of concept mapping items
auto_mappedintItems mapped automatically (method == AUTO)
needs_reviewintItems flagged for review (method == REVIEW)
manual_requiredintItems requiring manual mapping (method == MANUAL)
auto_ratefloatFraction auto-mapped (auto_mapped / total)
coveragefloatFraction of items with any mapping ((total - unmapped) / total)

Cross-Standard Mapping Models

Cross-standard mapping converts data already in one clinical standard to another standard (e.g., OMOP to FHIR, HL7 v2 to FHIR, FHIR to OpenEHR). This is a post-pipeline capability using CrossStandardMapper.

CrossStandardMapper

from portiere.local.cross_mapper import CrossStandardMapper

mapper = CrossStandardMapper("omop_cdm_v5.4", "fhir_r4")

# Map a single record
fhir_patient = mapper.map_record("person", {
    "person_id": 12345,
    "gender_concept_id": 8507,
    "birth_datetime": "1980-06-15",
})
# {"id": "12345", "gender": "male", "birthDate": "1980-06-15", ...}

# Map a list of records
fhir_patients = mapper.map_records("person", records_list)

# Map a DataFrame
fhir_df = mapper.map_dataframe("person", persons_df)

Crossmap YAML Schema

Cross-standard mappings are defined in YAML files under standards/crossmaps/:

source: "omop_cdm_v5.4"
target: "fhir_r4"

entity_map:
  person: "Patient"
  condition_occurrence: "Condition"

field_map:
  person.person_id:
    target: "Patient.id"
    transform: "str"
  person.gender_concept_id:
    target: "Patient.gender"
    transform: "omop_gender"

transforms:
  omop_gender:
    type: "value_map"
    mapping:
      8507: "male"
      8532: "female"
    default: "unknown"

Available Crossmaps

SourceTargetFile
OMOP CDM v5.4FHIR R4omop_to_fhir_r4.yaml
FHIR R4OMOP CDM v5.4fhir_r4_to_omop.yaml
HL7 v2.5.1FHIR R4hl7v2_to_fhir_r4.yaml
FHIR R4OpenEHR 1.0.4fhir_r4_to_openehr.yaml
OMOP CDM v5.4OpenEHR 1.0.4omop_to_openehr.yaml

Transform Types

Built-in transform types available in crossmap YAML:

TransformDescription
passthroughCopy value as-is
str / int / floatType casting
value_mapStatic lookup table
formatDate/string formatting
codeable_conceptWrap into FHIR CodeableConcept
fhir_referenceCreate FHIR Reference
dv_coded_textCreate openEHR DV_CODED_TEXT
dv_quantityCreate openEHR DV_QUANTITY
vocabulary_lookupCross-vocabulary mapping via VocabularyBridge

See Cross-Standard Mapping for the complete reference.

Project Integration

# Cross-map via the project object
fhir_data = project.cross_map(
    source_standard="omop_cdm_v5.4",
    target_standard="fhir_r4",
    source_entity="person",
    data=persons_df,
)

Approval Workflows

Schema Mapping Review Workflow

import portiere
from portiere.config import PortiereConfig
from portiere.engines import PolarsEngine

project = portiere.init(name="data_models_demo", engine=PolarsEngine(), config=PortiereConfig(...))
source = project.add_source("data/patients.csv")
profile = project.profile(source)
schema_mapping = project.map_schema(source=source)

# Step 1: Check the summary
print(schema_mapping.summary())
# {"total": 20, "auto_accepted": 15, "needs_review": 3, "approved": 0, "unmapped": 2, "auto_rate": 0.75}

# Step 2: Review items that need attention
for item in schema_mapping.needs_review():
    print(f"\nSource: {item.source_column}")
    print(f"Suggested: {item.target_table}.{item.target_column} "
          f"(confidence: {item.confidence:.2f})")
    print(f"Candidates: {item.candidates}")

    # Decision: approve, reject, or override
    if item.confidence > 0.85:
        item.approve()
    else:
        item.approve(
            target_table="measurement",
            target_column="value_as_number",
        )

# Step 3: Handle unmapped items
for item in [i for i in schema_mapping.items if i.status.value == "unmapped"]:
    print(f"Unmapped: {item.source_column}")
    # Either approve with a manual target or leave unmapped

# Step 4: Finalize
schema_mapping.finalize()

Concept Mapping Review Workflow

concept_mapping = project.map_concepts(
    source=source,
    schema_mapping=schema_mapping,
)

# Step 1: Check the summary
print(concept_mapping.summary())
# {"total": 150, "auto_mapped": 120, "needs_review": 20, "manual_required": 10,
#  "auto_rate": 0.80, "coverage": 0.93}

# Step 2: Review items flagged for review
for item in concept_mapping.needs_review():
    print(f"\nSource: {item.source_code} - {item.source_description}")
    print(f"Current mapping: {item.target_concept_id} - {item.target_concept_name}")
    print(f"Confidence: {item.confidence:.3f}")
    print("Candidates:")
    for i, c in enumerate(item.candidates):
        print(f"  [{i}] {c.concept_id}: {c.concept_name} "
              f"({c.vocabulary_id}, {c.domain_id}) score={c.score:.3f}")

# Step 3: Take action on each item
item = concept_mapping.needs_review()[0]

# Option A: Approve the top candidate
item.approve(candidate_index=0)

# Option B: Approve a different candidate
item.approve(candidate_index=2)

# Option C: Override with a known concept
item.override(
    concept_id=4029098,
    concept_name="Atrial fibrillation",
    vocabulary_id="SNOMED",
)

# Option D: Reject the mapping
item.reject()

# Option E: Mark as unmappable
item.mark_unmapped()

# Step 4: Batch approve remaining review items
concept_mapping.approve_all()

# Step 5: Finalize
concept_mapping.finalize()

Export Formats

to_source_to_concept_map()

The ConceptMapping class provides a to_source_to_concept_map() method that exports the mapping in a format compatible with the OMOP source_to_concept_map table:

source_to_concept_map = concept_mapping.to_source_to_concept_map()

# Returns a list of dictionaries, one per mapped item:
# [
#     {
#         "source_code": "250.00",
#         "source_concept_id": 0,
#         "source_vocabulary_id": "ICD9CM",
#         "source_code_description": "Diabetes mellitus type II",
#         "target_concept_id": 201826,
#         "target_vocabulary_id": "SNOMED",
#         "valid_start_date": "2024-01-01",
#         "valid_end_date": "2099-12-31",
#         "invalid_reason": None,
#     },
#     ...
# ]

This export format can be loaded directly into the OMOP CDM source_to_concept_map table or used to build ETL transformation logic.

import pandas as pd

# Convert to DataFrame for further processing
stcm_df = pd.DataFrame(source_to_concept_map)
stcm_df.to_csv("source_to_concept_map.csv", index=False)

See Also