
Quickstart: Getting Started in 5 Minutes

This guide walks you through installing Portiere, running your first mapping pipeline, and understanding what happens at each stage.


Table of Contents

  • Installation
  • Your First Mapping Pipeline
  • Step-by-Step Breakdown
  • Database Sources
  • Configuring Vocabularies
  • What Just Happened?
  • Adding Data Profiling and Validation
  • Next Steps

Installation

Base Install

pip install portiere-health

The base package includes schema mapping, concept mapping, ETL generation, and local-mode support with the Polars compute engine.

Optional Extras

Install additional capabilities as needed:

| Extra | Command | What It Adds |
|---|---|---|
| quality | pip install portiere-health[quality] | Great Expectations integration for data profiling and validation |
| faiss | pip install portiere-health[faiss] | FAISS vector search backend for the knowledge layer |
| all | pip install portiere-health[all] | Everything above, plus Spark and DuckDB engine support |

Verify Installation

python -c "import portiere; print(portiere.__version__)"

Your First Mapping Pipeline

The entire workflow is five method calls. No cloud account, no configuration files, and no API keys required -- Portiere runs fully locally by default.

Standardize: Raw Data to Target Standard

import portiere
from portiere.engines import PolarsEngine

# 1. Initialize a standardization project
project = portiere.init(
    name="Hospital OMOP Migration",
    task="standardize",                   # default
    target_model="omop_cdm_v5.4",        # default
    engine=PolarsEngine(),
)

# 2. Ingest a source file
source = project.add_source("patients.csv")

# 3. Map source columns to OMOP CDM tables and fields
schema_map = project.map_schema(source)

# 4. Map clinical codes to standard vocabularies
concept_map = project.map_concepts(codes=["E11.9", "I10"])

# 5. Generate ETL scripts and validate output
etl = project.run_etl(
    source,
    output_dir="./output",
    schema_mapping=schema_map,
    concept_mapping=concept_map,
)

Cross-Map: Transform Between Standards

import portiere
import polars as pl
from portiere.engines import PolarsEngine

# Initialize a cross-mapping project
project = portiere.init(
    name="OMOP to FHIR Export",
    task="cross_map",
    source_standard="omop_cdm_v5.4",
    target_model="fhir_r4",
    engine=PolarsEngine(),
)

# Cross-map — source/target inferred from project settings
omop_df = pl.read_csv("./omop_output/person.csv")
fhir_df = project.cross_map(source_entity="person", data=omop_df)

Step-by-Step Breakdown

Step 1: Initialize a Project

from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital OMOP Migration", engine=PolarsEngine())

portiere.init() creates a new project targeting OMOP CDM v5.4 by default. The task parameter (default "standardize") declares the project's purpose -- use "cross_map" for transforming between standards. The engine parameter is required and specifies the compute engine used for data processing and ETL execution.

All configuration is auto-discovered: Portiere looks for a portiere.yaml in your working directory, then falls back to environment variables, then built-in defaults. In local mode, project artifacts are stored under ~/.portiere/projects/.
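
The discovery order -- portiere.yaml, then environment variables, then built-in defaults -- can be sketched in plain Python. The helper and the PORTIERE_* variable naming below are illustrative assumptions, not Portiere's actual resolver:

```python
import os

# Built-in fallbacks, per the defaults noted in this guide
DEFAULTS = {"task": "standardize", "target_model": "omop_cdm_v5.4"}

def resolve_setting(key: str, yaml_config: dict) -> str:
    """Resolve one setting: portiere.yaml first, then env vars, then defaults."""
    if key in yaml_config:                        # 1. project-local portiere.yaml
        return yaml_config[key]
    env_value = os.environ.get(f"PORTIERE_{key.upper()}")
    if env_value is not None:                     # 2. environment variable
        return env_value
    return DEFAULTS[key]                          # 3. built-in default

# A YAML entry beats both the env var and the default
print(resolve_setting("task", {"task": "cross_map"}))  # cross_map
```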

Step 2: Ingest a Source File

source = project.add_source("patients.csv")

add_source() registers your data file with the project. Portiere auto-detects the format from the file extension (CSV, Parquet, JSON, etc.). The returned source dictionary contains metadata about the file -- column names, inferred types, and row counts -- which downstream stages use.
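
Extension-based detection amounts to a suffix lookup. The mapping below is illustrative, not Portiere's actual detection table:

```python
from pathlib import Path

# Illustrative extension-to-format table; the real detector covers more formats
FORMATS = {".csv": "csv", ".parquet": "parquet", ".json": "json"}

def detect_format(path: str) -> str:
    """Infer a source format from the file extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    return FORMATS[suffix]

print(detect_format("patients.csv"))  # csv
```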

Step 3: Map Schema

schema_map = project.map_schema(source)

map_schema() analyzes each column in your source and proposes mappings to OMOP CDM target tables and fields. Each mapping includes a confidence score. By default:

  • Scores >= 0.90 are auto-accepted.
  • Scores between 0.70 and 0.90 are flagged for review.
  • Scores below 0.70 require manual mapping.

The returned SchemaMapping object lets you inspect, approve, or override any mapping.
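
The default routing reduces to a threshold check. The status labels in this sketch are invented for illustration and are not Portiere's API:

```python
def route_schema_mapping(score: float) -> str:
    """Apply the default schema-mapping confidence thresholds."""
    if score >= 0.90:
        return "auto_accepted"       # high confidence
    if score >= 0.70:
        return "flagged_for_review"  # medium confidence
    return "manual_mapping"          # low confidence

print(route_schema_mapping(0.93))  # auto_accepted
```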

Step 4: Map Concepts

concept_map = project.map_concepts(codes=["E11.9", "I10"])

map_concepts() resolves clinical codes (ICD-10, SNOMED, LOINC, RxNorm, etc.) to OMOP standard concept IDs. You can pass individual codes as shown above, or point it at a source to map all code columns automatically -- no need to list codes yourself:

# Auto-detect and map all code columns in the source
concept_map = project.map_concepts(source=source)

# Or specify which columns contain codes
concept_map = project.map_concepts(source=source, code_columns=["diagnosis_code", "drug_code"])

The knowledge layer searches across all configured vocabularies (SNOMED, LOINC, RxNorm, ICD10CM by default) to find the best standard concept match for each source code. Confidence routing applies:

  • >= 0.95: auto-mapped
  • 0.80 -- 0.95: needs verification
  • 0.70 -- 0.80: needs review
  • < 0.70: requires manual resolution
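
Concept routing follows the same pattern as schema mapping with one extra tier. Labels here are again illustrative, not Portiere's API:

```python
def route_concept(score: float) -> str:
    """Apply the default concept-mapping confidence thresholds."""
    if score >= 0.95:
        return "auto_mapped"
    if score >= 0.80:
        return "needs_verification"
    if score >= 0.70:
        return "needs_review"
    return "manual_resolution"

print(route_concept(0.85))  # needs_verification
```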

Step 5: Run ETL

etl = project.run_etl(
    source,
    output_dir="./output",
    schema_mapping=schema_map,
    concept_mapping=concept_map,
)

run_etl() generates and executes ETL transformation scripts using the Polars engine (default). Output lands in the specified directory as OMOP-formatted tables.


Database Sources

Portiere can also read directly from databases. Instead of a file path, provide a connection_string with a table or query:

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital Migration", engine=PolarsEngine())

# Read from a PostgreSQL table
source = project.add_source(
    connection_string="postgresql://user:pass@localhost:5432/ehr_db",
    table="patients"
)

# Or use a custom SQL query
source = project.add_source(
    connection_string="postgresql://user:pass@localhost:5432/ehr_db",
    query="SELECT * FROM patients WHERE admission_date >= '2024-01-01'",
    name="recent_patients"
)

# The rest of the pipeline works the same
schema_map = project.map_schema(source)
concept_map = project.map_concepts(source=source, code_columns=["diagnosis_code"])

Supported connection URIs include PostgreSQL (postgresql://), MySQL (mysql://), SQLite (sqlite:///), and any database supported by your engine's connector.
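
If you are unsure how a connection string breaks down, the standard library can show you -- this is generic Python, not Portiere-specific:

```python
from urllib.parse import urlparse

parts = urlparse("postgresql://user:pass@localhost:5432/ehr_db")
print(parts.scheme)                # postgresql
print(parts.hostname, parts.port)  # localhost 5432
print(parts.path.lstrip("/"))      # ehr_db
```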


Configuring Vocabularies

The vocabularies parameter in portiere.init() controls which standard vocabularies Portiere searches when mapping clinical codes. It is optional -- if omitted, Portiere uses a sensible default set.

Default Vocabularies

# These two calls are equivalent
project = portiere.init(name="My Project", engine=PolarsEngine())
project = portiere.init(name="My Project", engine=PolarsEngine(), vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM"])

| Vocabulary | Covers | Examples |
|---|---|---|
| SNOMED | Clinical findings, procedures, body structures | Diabetes mellitus, Appendectomy |
| LOINC | Laboratory tests, clinical observations | Hemoglobin A1c, Blood pressure |
| RxNorm | Medications and drug ingredients | Metformin, Aspirin 325mg |
| ICD10CM | Diagnosis codes (US billing) | E11.9 (Type 2 diabetes), I10 (Hypertension) |

Customizing Vocabularies

Select only the vocabularies relevant to your data domain:

# Diagnosis-only mapping (no lab or drug codes)
project = portiere.init(
    name="Diagnosis Migration",
    engine=PolarsEngine(),
    vocabularies=["SNOMED", "ICD10CM"]
)

# Lab data only
project = portiere.init(
    name="Lab Results",
    engine=PolarsEngine(),
    vocabularies=["LOINC"]
)

# Include additional vocabularies
project = portiere.init(
    name="Full Migration",
    engine=PolarsEngine(),
    vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM", "CPT4", "HCPCS"]
)

How Vocabularies Work Locally

When running the pipeline locally, vocabularies are resolved through the knowledge layer backend. The vocabularies parameter acts as a filter -- it tells Portiere which vocabulary IDs to search within the knowledge layer index.
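
One way to picture the filter: candidate concepts carry a vocabulary ID, and only candidates from the configured vocabularies are considered. The records and field names below are invented for illustration; real knowledge-layer entries are richer:

```python
# Toy candidate records, for illustration only
candidates = [
    {"code": "44054006", "vocabulary_id": "SNOMED", "name": "Type 2 diabetes mellitus"},
    {"code": "4548-4",   "vocabulary_id": "LOINC",  "name": "Hemoglobin A1c"},
    {"code": "6809",     "vocabulary_id": "RxNorm", "name": "Metformin"},
]

def filter_by_vocabulary(records, vocabularies):
    """Keep only candidates whose vocabulary is in the configured set."""
    allowed = set(vocabularies)
    return [r for r in records if r["vocabulary_id"] in allowed]

hits = filter_by_vocabulary(candidates, ["SNOMED", "LOINC"])
print([r["vocabulary_id"] for r in hits])  # ['SNOMED', 'LOINC']
```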

The knowledge layer must be pre-indexed with vocabulary data before use. By default, Portiere uses a BM25s backend that ships with the base vocabulary set. For production use or custom vocabularies, configure a knowledge layer backend:

from portiere.config import PortiereConfig, KnowledgeLayerConfig

config = PortiereConfig(
    knowledge_layer=KnowledgeLayerConfig(
        backend="bm25s",
        bm25s_corpus_path="/path/to/vocabulary/corpus",
    )
)

project = portiere.init(
    name="My Project",
    engine=PolarsEngine(),
    vocabularies=["SNOMED", "LOINC"],
    config=config,
)

For a complete guide on downloading, preparing, and indexing vocabularies, see Vocabulary Setup.


What Just Happened?

Portiere executed a 5-stage pipeline behind the scenes:

Ingest --> Profile --> Schema Map --> Concept Map --> ETL + Validate

| Stage | What It Does | Method |
|---|---|---|
| Ingest | Registers source data, detects format, extracts metadata | add_source() |
| Profile | Analyzes data quality -- completeness, distributions, anomalies | profile() (optional, requires [quality] extra) |
| Schema Map | Maps source columns to OMOP CDM target tables and fields | map_schema() |
| Concept Map | Resolves clinical codes to OMOP standard concepts | map_concepts() |
| ETL + Validate | Generates transformation code, executes it, validates output | run_etl() / validate() |

The knowledge layer powers concept mapping with hybrid search: dense vector retrieval (SapBERT embeddings) combined with sparse keyword matching (BM25), fused via Reciprocal Rank Fusion (RRF). This delivers high accuracy across diverse clinical terminologies without requiring manual dictionary management.
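
Reciprocal Rank Fusion itself is compact: each candidate's score is the sum of 1/(k + rank) over every ranked list it appears in, with k a smoothing constant (60 is the conventional choice). A minimal sketch, independent of Portiere:

```python
def rrf(ranked_lists, k=60):
    """Fuse ranked result lists into a single ordering, best match first."""
    scores = {}
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["C1", "C2", "C3"]  # e.g. embedding-based retrieval
sparse = ["C2", "C4", "C1"]  # e.g. BM25 keyword retrieval
print(rrf([dense, sparse]))  # ['C2', 'C1', 'C4', 'C3']
```

C2 wins because it ranks highly in both lists, even though neither retriever put it first and second respectively by itself.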


Adding Data Profiling and Validation

If you installed the quality extra, you can add profiling and validation to the pipeline:

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital OMOP Migration", engine=PolarsEngine())
source = project.add_source("patients.csv")

# Profile source data quality before mapping
profile_report = project.profile(source)
print(f"Completeness: {profile_report['completeness']}")

# Run schema and concept mapping
schema_map = project.map_schema(source)
concept_map = project.map_concepts(source=source)

# Generate ETL output
etl = project.run_etl(source, output_dir="./output", schema_mapping=schema_map, concept_mapping=concept_map)

# Validate ETL output against OMOP CDM conformance rules
validation = project.validate(etl_result=etl)
print(f"Conformance: {validation['conformance']}")

Next Steps

| Topic | Document |
|---|---|
| Full SDK API reference with all method signatures and examples | 02-unified-api-reference.md |
| Configuration deep dive -- LLM providers, thresholds, engines, YAML config | 03-configuration.md |
| Local, cloud, and hybrid operating modes | 04-operating-modes.md |