Medical Data Mapping Automated
Transform clinical data into OMOP CDM or FHIR R4 with AI-powered schema and concept mapping. Review mappings with your team, then generate production ETL in minutes.
pip install portiere-health
Built for Clinical Data Engineers by Clinical Data Engineers
5-Stage Pipeline
Ingest, profile, schema-map, generate ETL, and validate — fully automated with AI-assisted confidence routing.
OMOP CDM + FHIR R4
Map to OMOP CDM v5.3/v5.4 or FHIR R4 with vocabulary-aware concept matching across SNOMED, LOINC, RxNorm, and more.
Hybrid Search
Dense vector (FAISS + SapBERT/OpenAI/Ollama) combined with BM25 lexical and Elasticsearch full-text search, fused with Reciprocal Rank Fusion. Choose your embedding and reranking provider.
BYO-LLM
Use OpenAI, Anthropic, Azure, Ollama, or AWS Bedrock. Your data stays under your control with any LLM provider.
9 Vector Store Backends
FAISS, BM25s, Elasticsearch, ChromaDB, PGVector, MongoDB Atlas, Qdrant, Milvus, or Hybrid — pick the backend that fits your infrastructure. All run locally.
100% Open Source
Apache 2.0-licensed. Run everything locally — your data never leaves your machine. No cloud dependency, no vendor lock-in, no usage limits.
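The Reciprocal Rank Fusion behind the Hybrid Search backend can be shown with a short standalone sketch (generic Python for illustration, not Portiere's internal implementation): each document earns 1/(k + rank) from every ranker it appears in, and the per-ranker scores are summed.

```python
# Reciprocal Rank Fusion (RRF): fuse ranked lists from multiple retrievers.
# score(d) = sum over rankers of 1 / (k + rank_of_d), with k commonly 60.

def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked result lists into one list, best-first."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: a lexical ranker and a dense-vector ranker partially agree;
# RRF rewards concepts that rank high in both lists.
lexical = ["C201826", "C378253", "C4099154"]   # BM25 order
semantic = ["C201826", "C4099154", "C9999999"]  # vector order
print(rrf_fuse([lexical, semantic]))
# ['C201826', 'C4099154', 'C378253', 'C9999999']
```

Because RRF uses only ranks, not raw scores, it fuses retrievers whose scores live on incompatible scales (BM25 vs. cosine similarity) without any normalization step.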
How It Works
Connect Your Data
Point Portiere at your CSV, Parquet, or database tables. Choose your engine — Spark for scale, Polars for speed, Pandas for simplicity.
import portiere
from portiere.engines import PolarsEngine
project = portiere.init(
name="Hospital Migration",
engine=PolarsEngine(),
target_model="omop_cdm_v5.4",
vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM"],
)
source = project.add_source("patients.csv")
print(source.profile())
# Source: patients.csv | 48,231 rows × 11 cols | engine: polars
AI Maps Everything
Schema mapping + concept mapping with confidence routing. High-confidence items auto-accept; the rest queue for human review.
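The confidence-routing idea itself is simple, and can be sketched in a few lines of generic Python (illustrative thresholds and bucket names, not Portiere's actual defaults or API):

```python
# Route mapping candidates by confidence score: high scores auto-accept,
# mid scores queue for human review, low scores require manual mapping.
# The thresholds below are illustrative, not Portiere's actual values.

AUTO_ACCEPT = 0.90
NEEDS_REVIEW = 0.50

def route(candidates):
    """Sort (name, confidence) pairs into routing buckets."""
    buckets = {"auto_accepted": [], "needs_review": [], "manual_required": []}
    for name, confidence in candidates:
        if confidence >= AUTO_ACCEPT:
            buckets["auto_accepted"].append(name)
        elif confidence >= NEEDS_REVIEW:
            buckets["needs_review"].append(name)
        else:
            buckets["manual_required"].append(name)
    return buckets

buckets = route([("patient_id", 0.99), ("lab_date", 0.72), ("X42.LOCAL", 0.10)])
print({k: len(v) for k, v in buckets.items()})
# {'auto_accepted': 1, 'needs_review': 1, 'manual_required': 1}
```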
schema_map = project.map_schema(source)
concept_map = project.map_concepts(source=diagnoses_source)
print(schema_map.summary())
# {'total': 11, 'auto_accepted': 9, 'needs_review': 2}
print(concept_map.summary())
# {'total': 15, 'auto_mapped': 12, 'needs_review': 3}
Review & Generate ETL
Approve, override, or reject mappings. Generate production-ready ETL scripts. Export for clinical SME review or load directly.
# Review and approve mappings
schema_map.approve("patient_id")
schema_map.override("lab_date",
target_table="measurement",
target_column="measurement_date")
concept_map.approve("E11.9")
# Generate ETL and validate
etl = project.run_etl(source, output_dir="./output")
report = project.validate(etl_result=etl)
print(report.summary())
# ✓ 11/11 columns mapped | 3 ETL scripts | 0 validation errors
# Export for SME review
project.export_concept_mapping("concept_review.csv")
Choose Your Search Backend
Plug in the retrieval strategy that fits your use case. All backends run locally — no data leaves your machine.
BM25s Lexical Search
Fast lexical search using BM25s. No external services — runs entirely in-process. Great for exact terminology matching and quick prototyping.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="bm25s", # Lexical search
bm25s_corpus_path="./vocab/concepts.json",
),
)
FAISS Semantic Search
Dense vector search with SapBERT biomedical embeddings. Best for finding semantically similar concepts even when terminology differs.
from portiere import PortiereConfig, KnowledgeLayerConfig
from portiere.config import EmbeddingConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="faiss", # Dense vector search
faiss_index_path="./vocab/concepts.index",
faiss_metadata_path="./vocab/concepts.meta.json",
),
embedding=EmbeddingConfig(
provider="huggingface", # Or "openai", "bedrock", "ollama"
model="cambridgeltl/SapBERT-from-PubMedBERT-fulltext",
),
)
Elasticsearch Full-Text
BM25 full-text search powered by an existing Elasticsearch cluster. Ideal for large vocabularies and teams that already run ES infrastructure.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="elasticsearch", # ES full-text search
elasticsearch_url="http://localhost:9200",
elasticsearch_index="omop_concepts",
),
)
ChromaDB
Embedded vector database with automatic persistence. No external services needed — perfect for local development and small-to-medium vocabularies.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="chromadb", # Embedded vector DB
chroma_persist_path="./vocab/chroma/",
),
)
PGVector
PostgreSQL-native vector search via the pgvector extension. Use your existing Postgres infrastructure for both relational data and vector similarity search.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="pgvector", # Postgres vector search
pgvector_connection_string="postgresql://user:pass@localhost:5432/vocab",
pgvector_table="concept_embeddings",
),
)
MongoDB Atlas Vector Search
Vector search on MongoDB Atlas. Ideal if your organization already runs MongoDB and you want a single platform for documents and embeddings.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="mongodb_atlas", # MongoDB Atlas vector search
mongodb_uri="mongodb+srv://user:pass@cluster.mongodb.net/",
mongodb_database="portiere",
mongodb_collection="concept_embeddings",
),
)
Qdrant
High-performance vector search engine with advanced filtering and payload indexing. Self-host for production-grade similarity search.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="qdrant", # Qdrant vector search
qdrant_url="http://localhost:6333",
qdrant_collection="omop_concepts",
),
)
Milvus
Distributed vector database built for scale. Handles billions of vectors with GPU acceleration. Deploy standalone or as a cluster.
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="milvus", # Milvus vector search
milvus_uri="http://localhost:19530",
milvus_collection="omop_concepts",
),
)
Hybrid Search + Reranking
Combine any two backends with Reciprocal Rank Fusion, then rerank with a cross-encoder. Configure which backends to fuse via hybrid_backends.
from portiere import PortiereConfig, KnowledgeLayerConfig
from portiere.config import EmbeddingConfig, RerankerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="hybrid",
hybrid_backends=["bm25s", "faiss"], # Any 2 backends
faiss_index_path="./vocab/concepts.index",
faiss_metadata_path="./vocab/concepts.meta.json",
bm25s_corpus_path="./vocab/concepts.json",
fusion_method="rrf", # Reciprocal Rank Fusion
rrf_k=60,
),
embedding=EmbeddingConfig(
provider="huggingface",
model="cambridgeltl/SapBERT-from-PubMedBERT-fulltext",
),
reranker=RerankerConfig(
provider="huggingface",
model="cross-encoder/ms-marco-MiniLM-L-6-v2",
),
)
Build Your Knowledge Layer from Athena
Download standard vocabularies from athena.ohdsi.org, then build searchable indexes in one call. Supports all 9 backends — BM25s, FAISS, ChromaDB, PGVector, MongoDB Atlas, Qdrant, Milvus, Elasticsearch, or hybrid — your concepts never leave your machine.
Step 1 — Build Indexes from Athena Download
Point build_knowledge_layer() at your Athena CSV directory. It parses CONCEPT.csv and CONCEPT_SYNONYM.csv, filters for standard concepts, and builds backend-specific indexes.
from portiere.knowledge import build_knowledge_layer
# Build indexes from your Athena download — pick any backend
paths = build_knowledge_layer(
athena_path="./data/athena/", # Directory with CONCEPT.csv
output_path="./data/vocab/",
backend="hybrid", # "bm25s", "faiss", "chromadb",
# "pgvector", "mongodb_atlas",
# "qdrant", "milvus", "elasticsearch",
# or "hybrid"
vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM"],
)
print(paths)
# {
# 'bm25s_corpus_path': './data/vocab/concepts.json',
# 'faiss_index_path': './data/vocab/concepts.index',
# 'faiss_metadata_path': './data/vocab/concepts.meta.json'
# }
Step 2 — Use in Your Project
Pass the returned paths straight into your project config. Portiere handles the rest — embedding, searching, reranking, and confidence routing.
import portiere
from portiere import PortiereConfig, KnowledgeLayerConfig
from portiere.config import EmbeddingConfig, RerankerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="hybrid",
bm25s_corpus_path=paths["bm25s_corpus_path"],
faiss_index_path=paths["faiss_index_path"],
faiss_metadata_path=paths["faiss_metadata_path"],
fusion_method="rrf",
rrf_k=60,
),
embedding=EmbeddingConfig(
provider="huggingface", # Or "openai", "bedrock", "ollama"
model="cambridgeltl/SapBERT-from-PubMedBERT-fulltext",
),
reranker=RerankerConfig(
provider="huggingface",
model="cross-encoder/ms-marco-MiniLM-L-6-v2",
),
)
project = portiere.init(
name="Hospital Migration",
config=config,
target_model="omop_cdm_v5.4",
)
concept_map = project.map_concepts(source=diagnoses_source)
print(concept_map.summary())
# {'total': 342, 'auto_mapped': 298, 'needs_review': 38, 'manual_required': 6}
Export for Clinical SME Review
Export AI-generated mappings to CSV for your clinical Subject Matter Experts to review in Excel or Google Sheets. Reload their edits back into Portiere to finalize.
Step 1 — Export Mappings for Review
Export schema and concept mappings to CSV. Items are pre-categorized by confidence — SMEs focus on needs_review rows while high-confidence mappings are auto-accepted.
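Because the export is plain CSV, SMEs and engineers can slice it with everyday tooling before opening it in a spreadsheet. A minimal sketch using only the standard library (the column names and the `auto`/`review`/`unmapped` method values follow the preview shown later in this section):

```python
# Filter an exported concept-mapping CSV down to the rows an SME must
# review. Column names mirror the preview in this section; "review"
# marks rows that confidence routing sent to human review.
import csv
import io

# A small inline stand-in for the exported concept_review.csv file.
exported = """source_code,target_concept_name,confidence,method
E11.9,Type 2 diabetes mellitus,0.98,auto
Z87.891,History of nicotine dep.,0.74,review
X42.LOCAL,,0.00,unmapped
"""

rows = list(csv.DictReader(io.StringIO(exported)))
needs_review = [r["source_code"] for r in rows if r["method"] == "review"]
print(needs_review)
# ['Z87.891']
```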
# Export mappings to CSV for SME review
project.export_concept_mapping("concept_review.csv")
# Or export to JSON (includes full candidate lists & provenance)
project.export_concept_mapping("mappings_full.json")
# Preview what SMEs will see
df = concept_map.to_dataframe()
print(df.head())
# source_code source_description source_column source_count target_concept_id target_concept_name target_vocabulary_id target_domain_id confidence method
# 0 E11.9 Type 2 diabetes mellitus diagnosis 42 201826 Type 2 diabetes mellitus SNOMED Condition 0.98 auto
# 1 R51 Headache diagnosis 18 378253 Headache SNOMED Condition 0.96 auto
# 2 Z87.891 Personal history of NTD diagnosis 7 4099154 History of nicotine dep. SNOMED Condition 0.74 review
# 3 X42.LOCAL Custom lab code lab_result 3 None None None None 0.00 unmapped
print(concept_map.summary())
# {'total': 342, 'auto_mapped': 298, 'needs_review': 38, 'manual_required': 6}
Step 2 — Reload SME Edits & Generate ETL
After your SMEs review the CSV — approving, rejecting, or overriding rows — reload it into the project and generate production ETL. For OMOP targets, export the standard source_to_concept_map table directly.
# Reload SME-reviewed CSV back into the project
reviewed = project.import_concept_mapping(path="concept_review_edited.csv")
print(reviewed.summary())
# {'total': 342, 'auto_mapped': 298, 'needs_review': 0, 'manual_required': 0}
# Export as OMOP source_to_concept_map for database loading
project.export_concept_mapping("source_to_concept_map.csv", omop_format=True)
# Generate ETL and validate
etl = project.run_etl(source, output_dir="./output")
report = project.validate(etl_result=etl)
print(report.summary())
# ✓ 342/342 concepts mapped | 5 ETL scripts | 0 validation errors
Explore the Documentation
Comprehensive guides covering everything from quickstart to production deployment.
Start Mapping Today
Open source, Apache 2.0-licensed, and free forever. Install with pip and start mapping clinical data in minutes.
pip install portiere-health