
Quickstart: Getting Started in 5 Minutes

This guide walks you through installing Portiere, running your first mapping pipeline, and understanding what happens at each stage.


Table of Contents

  • Installation
  • Your First Mapping Pipeline
  • Step-by-Step Breakdown
  • Database Sources
  • Configuring Vocabularies
  • What Just Happened?
  • Adding Data Profiling and Validation
  • Next Steps

Installation

Base Install

pip install portiere-health

The base package includes schema mapping, concept mapping, ETL generation, and local-mode support with the Polars compute engine.

Optional Extras

Install additional capabilities as needed:

| Extra | Command | What It Adds |
|---|---|---|
| quality | pip install portiere-health[quality] | Great Expectations integration for data profiling and validation |
| faiss | pip install portiere-health[faiss] | FAISS vector search backend for the knowledge layer |
| all | pip install portiere-health[all] | Everything above, plus Spark and DuckDB engine support |

Verify Installation

python -c "import portiere; print(portiere.__version__)"

Your First Mapping Pipeline

The entire workflow is five method calls. No cloud account, no configuration files, and no API keys required -- Portiere runs fully locally by default.

Standardize: Raw Data to Target Standard

import portiere
from portiere.engines import PolarsEngine

# 1. Initialize a standardization project
project = portiere.init(
    name="Hospital OMOP Migration",
    task="standardize",                   # default
    target_model="omop_cdm_v5.4",        # default
    engine=PolarsEngine(),
)

# 2. Ingest a source file
source = project.add_source("patients.csv")

# 3. Map source columns to OMOP CDM tables and fields
schema_map = project.map_schema(source)

# 4. Map clinical codes to standard vocabularies
concept_map = project.map_concepts(codes=["E11.9", "I10"])

# 5. Generate ETL scripts and validate output
etl = project.run_etl(
    source,
    output_dir="./output",
    schema_mapping=schema_map,
    concept_mapping=concept_map,
)

Cross-Map: Transform Between Standards

import portiere
import polars as pl
from portiere.engines import PolarsEngine

# Initialize a cross-mapping project
project = portiere.init(
    name="OMOP to FHIR Export",
    task="cross_map",
    source_standard="omop_cdm_v5.4",
    target_model="fhir_r4",
    engine=PolarsEngine(),
)

# Cross-map — source/target inferred from project settings
omop_df = pl.read_csv("./omop_output/person.csv")
fhir_df = project.cross_map(source_entity="person", data=omop_df)

Step-by-Step Breakdown

Step 1: Initialize a Project

from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital OMOP Migration", engine=PolarsEngine())

portiere.init() creates a new project targeting OMOP CDM v5.4 by default. The task parameter (default "standardize") declares the project's purpose -- use "cross_map" for transforming between standards. The engine parameter is required and specifies the compute engine used for data processing and ETL execution.

All configuration is auto-discovered: Portiere looks for a portiere.yaml in your working directory, then falls back to environment variables, then built-in defaults. In local mode, project artifacts are stored under ~/.portiere/projects/.
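
The discovery order -- portiere.yaml, then environment variables, then built-in defaults -- can be sketched in plain Python. The helper and the PORTIERE_* variable naming below are illustrative assumptions, not Portiere's actual resolver:

```python
import os

# Built-in fallbacks, per the defaults noted in this guide
DEFAULTS = {"task": "standardize", "target_model": "omop_cdm_v5.4"}

def resolve_setting(key: str, yaml_config: dict) -> str:
    """Resolve one setting: portiere.yaml first, then env vars, then defaults."""
    if key in yaml_config:                        # 1. project-local portiere.yaml
        return yaml_config[key]
    env_value = os.environ.get(f"PORTIERE_{key.upper()}")
    if env_value is not None:                     # 2. environment variable
        return env_value
    return DEFAULTS[key]                          # 3. built-in default

# A YAML entry beats both the env var and the default
print(resolve_setting("task", {"task": "cross_map"}))  # cross_map
```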

Step 2: Ingest a Source File

source = project.add_source("patients.csv")

add_source() registers your data file with the project. Portiere auto-detects the format from the file extension (CSV, Parquet, JSON, etc.). The returned source dictionary contains metadata about the file -- column names, inferred types, and row counts -- which downstream stages use.
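
Extension-based detection amounts to a suffix lookup. The mapping below is illustrative, not Portiere's actual detection table:

```python
from pathlib import Path

# Illustrative extension-to-format table; the real detector covers more formats
FORMATS = {".csv": "csv", ".parquet": "parquet", ".json": "json"}

def detect_format(path: str) -> str:
    """Infer a source format from the file extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    return FORMATS[suffix]

print(detect_format("patients.csv"))  # csv
```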

Step 3: Map Schema

schema_map = project.map_schema(source)

map_schema() analyzes each column in your source and proposes mappings to OMOP CDM target tables and fields. Each mapping includes a confidence score. By default:

  • Scores >= 0.90 are auto-accepted.
  • Scores between 0.70 and 0.90 are flagged for review.
  • Scores below 0.70 require manual mapping.

The returned SchemaMapping object lets you inspect, approve, or override any mapping.
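
The default routing reduces to a threshold check. The status labels in this sketch are invented for illustration and are not Portiere's API:

```python
def route_schema_mapping(score: float) -> str:
    """Apply the default schema-mapping confidence thresholds."""
    if score >= 0.90:
        return "auto_accepted"       # high confidence
    if score >= 0.70:
        return "flagged_for_review"  # medium confidence
    return "manual_mapping"          # low confidence

print(route_schema_mapping(0.93))  # auto_accepted
```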

Step 4: Map Concepts

concept_map = project.map_concepts(codes=["E11.9", "I10"])

map_concepts() resolves clinical codes (ICD-10, SNOMED, LOINC, RxNorm, etc.) to OMOP standard concept IDs. You can pass individual codes as shown above, or point it at a source to map all code columns automatically -- no need to list codes yourself:

# Auto-detect and map all code columns in the source
concept_map = project.map_concepts(source=source)

# Or specify which columns contain codes
concept_map = project.map_concepts(source=source, code_columns=["diagnosis_code", "drug_code"])

The knowledge layer searches across all configured vocabularies (SNOMED, LOINC, RxNorm, ICD10CM by default) to find the best standard concept match for each source code. Confidence routing applies:

  • >= 0.95: auto-mapped
  • 0.80 -- 0.95: needs verification
  • 0.70 -- 0.80: needs review
  • < 0.70: requires manual resolution
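
Concept routing follows the same pattern as schema mapping with one extra tier. Labels here are again illustrative, not Portiere's API:

```python
def route_concept(score: float) -> str:
    """Apply the default concept-mapping confidence thresholds."""
    if score >= 0.95:
        return "auto_mapped"
    if score >= 0.80:
        return "needs_verification"
    if score >= 0.70:
        return "needs_review"
    return "manual_resolution"

print(route_concept(0.85))  # needs_verification
```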

Step 5: Run ETL

etl = project.run_etl(
    source,
    output_dir="./output",
    schema_mapping=schema_map,
    concept_mapping=concept_map,
)

run_etl() generates and executes ETL transformation scripts using the Polars engine (default). Output lands in the specified directory as OMOP-formatted tables.


Database Sources

Portiere can also read directly from databases. Instead of a file path, provide a connection_string with a table or query:

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital Migration", engine=PolarsEngine())

# Read from a PostgreSQL table
source = project.add_source(
    connection_string="postgresql://user:pass@localhost:5432/ehr_db",
    table="patients"
)

# Or use a custom SQL query
source = project.add_source(
    connection_string="postgresql://user:pass@localhost:5432/ehr_db",
    query="SELECT * FROM patients WHERE admission_date >= '2024-01-01'",
    name="recent_patients"
)

# The rest of the pipeline works the same
schema_map = project.map_schema(source)
concept_map = project.map_concepts(source=source, code_columns=["diagnosis_code"])

Supported connection URIs include PostgreSQL (postgresql://), MySQL (mysql://), SQLite (sqlite:///), and any database supported by your engine's connector.
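
If you are unsure how a connection string breaks down, the standard library can show you -- this is generic Python, not Portiere-specific:

```python
from urllib.parse import urlparse

parts = urlparse("postgresql://user:pass@localhost:5432/ehr_db")
print(parts.scheme)                # postgresql
print(parts.hostname, parts.port)  # localhost 5432
print(parts.path.lstrip("/"))      # ehr_db
```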


Configuring Vocabularies

The vocabularies parameter in portiere.init() controls which standard vocabularies Portiere searches when mapping clinical codes. It is optional -- if omitted, Portiere uses a sensible default set.

Default Vocabularies

# These two calls are equivalent
project = portiere.init(name="My Project", engine=PolarsEngine())
project = portiere.init(name="My Project", engine=PolarsEngine(), vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM"])

| Vocabulary | Covers | Examples |
|---|---|---|
| SNOMED | Clinical findings, procedures, body structures | Diabetes mellitus, Appendectomy |
| LOINC | Laboratory tests, clinical observations | Hemoglobin A1c, Blood pressure |
| RxNorm | Medications and drug ingredients | Metformin, Aspirin 325mg |
| ICD10CM | Diagnosis codes (US billing) | E11.9 (Type 2 diabetes), I10 (Hypertension) |

Customizing Vocabularies

Select only the vocabularies relevant to your data domain:

# Diagnosis-only mapping (no lab or drug codes)
project = portiere.init(
    name="Diagnosis Migration",
    engine=PolarsEngine(),
    vocabularies=["SNOMED", "ICD10CM"]
)

# Lab data only
project = portiere.init(
    name="Lab Results",
    engine=PolarsEngine(),
    vocabularies=["LOINC"]
)

# Include additional vocabularies
project = portiere.init(
    name="Full Migration",
    engine=PolarsEngine(),
    vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM", "CPT4", "HCPCS"]
)

How Vocabularies Work Locally

When running the pipeline locally, vocabularies are resolved through the knowledge layer backend. The vocabularies parameter acts as a filter -- it tells Portiere which vocabulary IDs to search within the knowledge layer index.
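
One way to picture the filter: candidate concepts carry a vocabulary ID, and only candidates from the configured vocabularies are considered. The records and field names below are invented for illustration; real knowledge-layer entries are richer:

```python
# Toy candidate records, for illustration only
candidates = [
    {"code": "44054006", "vocabulary_id": "SNOMED", "name": "Type 2 diabetes mellitus"},
    {"code": "4548-4",   "vocabulary_id": "LOINC",  "name": "Hemoglobin A1c"},
    {"code": "6809",     "vocabulary_id": "RxNorm", "name": "Metformin"},
]

def filter_by_vocabulary(records, vocabularies):
    """Keep only candidates whose vocabulary is in the configured set."""
    allowed = set(vocabularies)
    return [r for r in records if r["vocabulary_id"] in allowed]

hits = filter_by_vocabulary(candidates, ["SNOMED", "LOINC"])
print([r["vocabulary_id"] for r in hits])  # ['SNOMED', 'LOINC']
```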

The knowledge layer must be pre-indexed with vocabulary data before use. By default, Portiere uses a BM25s backend that ships with the base vocabulary set. For production use or custom vocabularies, configure a knowledge layer backend:

from portiere.config import PortiereConfig, KnowledgeLayerConfig

config = PortiereConfig(
    knowledge_layer=KnowledgeLayerConfig(
        backend="bm25s",
        bm25s_corpus_path="/path/to/vocabulary/corpus",
    )
)

project = portiere.init(
    name="My Project",
    engine=PolarsEngine(),
    vocabularies=["SNOMED", "LOINC"],
    config=config,
)

For a complete guide on downloading, preparing, and indexing vocabularies, see Vocabulary Setup.


What Just Happened?

Portiere executed a 5-stage pipeline behind the scenes:

Ingest --> Profile --> Schema Map --> Concept Map --> ETL + Validate

| Stage | What It Does | Method |
|---|---|---|
| Ingest | Registers source data, detects format, extracts metadata | add_source() |
| Profile | Analyzes data quality -- completeness, distributions, anomalies | profile() (optional, requires [quality] extra) |
| Schema Map | Maps source columns to OMOP CDM target tables and fields | map_schema() |
| Concept Map | Resolves clinical codes to OMOP standard concepts | map_concepts() |
| ETL + Validate | Generates transformation code, executes it, validates output | run_etl() / validate() |

The knowledge layer powers concept mapping with hybrid search: dense vector retrieval (SapBERT embeddings) combined with sparse keyword matching (BM25), fused via Reciprocal Rank Fusion (RRF). This delivers high accuracy across diverse clinical terminologies without requiring manual dictionary management.
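
Reciprocal Rank Fusion itself is compact: each candidate's score is the sum of 1/(k + rank) over every ranked list it appears in, with k a smoothing constant (60 is the conventional choice). A minimal sketch, independent of Portiere:

```python
def rrf(ranked_lists, k=60):
    """Fuse ranked result lists into a single ordering, best match first."""
    scores = {}
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["C1", "C2", "C3"]  # e.g. embedding-based retrieval
sparse = ["C2", "C4", "C1"]  # e.g. BM25 keyword retrieval
print(rrf([dense, sparse]))  # ['C2', 'C1', 'C4', 'C3']
```

C2 wins because it ranks highly in both lists, even though neither retriever put it first and second respectively by itself.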


Adding Data Profiling and Validation

If you installed the quality extra, you can add profiling and validation to the pipeline:

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital OMOP Migration", engine=PolarsEngine())
source = project.add_source("patients.csv")

# Profile source data quality before mapping
profile_report = project.profile(source)
print(f"Completeness: {profile_report['completeness']}")

# Run schema and concept mapping
schema_map = project.map_schema(source)
concept_map = project.map_concepts(source=source)

# Generate ETL output
etl = project.run_etl(source, output_dir="./output", schema_mapping=schema_map, concept_mapping=concept_map)

# Validate ETL output against OMOP CDM conformance rules
validation = project.validate(etl_result=etl)
print(f"Conformance: {validation['conformance']}")

Next Steps

| Topic | Document |
|---|---|
| Full SDK API reference with all method signatures and examples | 02-unified-api-reference.md |
| Configuration deep dive -- LLM providers, thresholds, engines, YAML config | 03-configuration.md |
| Local, cloud, and hybrid operating modes | 04-operating-modes.md |