
Unified SDK API Reference

Complete reference for the Portiere SDK public API, covering all signatures, parameters, return types, and usage examples.



Module Entry Point: portiere.init()

Creates and returns a new Project instance.

Signature

def init(
    name: str,
    *,
    engine: AbstractEngine,
    task: str = "standardize",
    target_model: str = "omop_cdm_v5.4",
    source_standard: Optional[str] = None,
    vocabularies: Optional[list[str]] = None,
    config: Optional[PortiereConfig] = None
) -> Project

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | required | Human-readable project name. Used as the project identifier in local storage and cloud sync. |
| engine | AbstractEngine | required | Compute engine instance for data processing and ETL execution. Import from portiere.engines (e.g., PolarsEngine(), SparkEngine(spark), PandasEngine()). |
| task | str | "standardize" | Project task type. "standardize" maps raw source data to a target standard (full pipeline). "cross_map" transforms between two clinical data standards. |
| target_model | str | "omop_cdm_v5.4" | Target CDM version. For standardize: the target standard. For cross_map: the target standard to transform into. |
| source_standard | Optional[str] | None | Source standard for cross_map tasks (e.g., "omop_cdm_v5.4"). Required when task="cross_map". |
| vocabularies | Optional[list[str]] | ["SNOMED", "LOINC", "RxNorm", "ICD10CM"] | Standard vocabularies to use for concept mapping. |
| config | Optional[PortiereConfig] | None | Configuration object. When None, auto-discovered via PortiereConfig.discover(). |

Returns

Project -- A fully initialized project instance ready for pipeline operations.

Examples

Minimal initialization (all defaults):

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(name="My Hospital Migration", engine=PolarsEngine())

Custom vocabularies and target model:

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(
    name="Lab Data Migration",
    engine=PolarsEngine(),
    target_model="omop_cdm_v5.4",
    vocabularies=["LOINC", "SNOMED", "UCUM"]
)

Cross-map project:

import portiere
from portiere.engines import PolarsEngine

project = portiere.init(
    name="OMOP to FHIR Export",
    engine=PolarsEngine(),
    task="cross_map",
    source_standard="omop_cdm_v5.4",
    target_model="fhir_r4",
)

# source_standard and target_model are inferred from project settings
fhir_df = project.cross_map(source_entity="person", data=omop_df)

Explicit configuration (cloud pipeline):

import portiere
from portiere.config import PortiereConfig, LLMConfig
from portiere.engines import PolarsEngine

config = PortiereConfig(
    api_key="pt_sk_your_api_key",
    llm=LLMConfig(provider="openai", api_key="sk-...", model="gpt-4o")
)

project = portiere.init(name="Cloud-Assisted Migration", engine=PolarsEngine(), config=config)

Behavior

  1. If config is None, calls PortiereConfig.discover() to resolve configuration from (in order): portiere.yaml in the current directory, environment variables with PORTIERE_ prefix, built-in defaults.
  2. Registers the provided engine instance (an AbstractEngine subclass) as the compute engine for the project.
  3. Sets up the knowledge layer for concept search based on config.knowledge_layer.
  4. Creates or loads a local project directory under config.local_project_dir / <name>.
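
As an illustration of the first discovery step, a minimal portiere.yaml might look like the following. The field names mirror the PortiereConfig examples in this reference, but treat the exact key layout as a sketch; see 03-configuration.md for the authoritative schema.

```yaml
# Illustrative portiere.yaml, discovered from the current directory
api_key: pt_sk_your_api_key   # optional; enables cloud/hybrid mode
storage: local
llm:
  provider: openai
  model: gpt-4o
```

Equivalently, setting PORTIERE_API_KEY in the environment would be picked up in the second discovery step.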

Project Class

The Project class is the central orchestrator for all pipeline operations. It is a plain Python class (not a Pydantic model).

Important: Do not instantiate Project directly. Always use portiere.init().


Properties

engine

The compute engine instance used for ETL operations.

project.engine
# Returns the configured engine (Polars, Spark, DuckDB, Snowpark, or Pandas)

client

The API client for cloud operations. Only active when an api_key is configured (cloud or hybrid mode).

project.client
# Returns the Portiere API client, or None in pure local mode

storage

The storage backend managing project artifacts.

project.storage
# Returns the local or cloud storage handler

config

The resolved PortiereConfig for this project.

project.config
# Returns PortiereConfig instance
print(project.config.effective_mode)       # "local"
print(project.config.llm.model)  # "gpt-4o"

add_source()

Registers a data source with the project. Supports both file-based and database sources.

Signature

def add_source(
    path: Optional[str] = None,
    name: Optional[str] = None,
    format: Optional[str] = None,
    *,
    connection_string: Optional[str] = None,
    table: Optional[str] = None,
    query: Optional[str] = None,
) -> dict

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | Optional[str] | None | Path to the source data file (CSV, Parquet, JSON, etc.). Mutually exclusive with connection_string. |
| name | Optional[str] | None | Human-readable name for this source. Defaults to the filename stem or table name. |
| format | Optional[str] | None | File format override. Auto-detected from extension when None. Set to "database" automatically for database sources. |
| connection_string | Optional[str] | None | Database connection URI (e.g., postgresql://user:pass@host/db). Mutually exclusive with path. |
| table | Optional[str] | None | Database table name to read. Requires connection_string. |
| query | Optional[str] | None | SQL query to execute. Requires connection_string. |
Either path or connection_string must be provided (not both). Database sources require at least one of table or query.

Returns

dict -- Source metadata dictionary containing:

| Key | Type | Description |
|---|---|---|
| name | str | Source name |
| path | str | Resolved file path (file sources only) |
| format | str | Detected format ("csv", "parquet", "database", etc.) |
| connection_string | str | Database URI (database sources only) |
| table | str | Table name (database sources with table) |
| query | str | SQL query (database sources with query) |

Examples

Auto-detect format from extension:

source = project.add_source("patients.csv")
print(source["format"])
# "csv"

Explicit name and format:

source = project.add_source(
    "data/raw/encounters_2024.tsv",
    name="Emergency Encounters",
    format="csv"  # TSV is parsed as CSV with tab delimiter
)

Multiple sources in one project:

patients = project.add_source("patients.csv")
encounters = project.add_source("encounters.csv")
conditions = project.add_source("conditions.csv")

Database source — read a table:

source = project.add_source(
    connection_string="postgresql://user:pass@localhost:5432/ehr_db",
    table="patients"
)
# source["format"] == "database"
# source["name"] == "patients" (auto-derived from table name)

Database source — custom SQL query:

source = project.add_source(
    connection_string="postgresql://user:pass@localhost:5432/ehr_db",
    query="SELECT * FROM patients WHERE admission_date >= '2024-01-01'",
    name="recent_patients"
)

profile()

Runs data quality profiling on a source using Great Expectations. Analyzes completeness, distributions, type consistency, and anomalies.

Requires: pip install portiere-health[quality]

Signature

def profile(source: dict) -> dict

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| source | dict | required | Source metadata dictionary returned by add_source(). |

Returns

dict -- Profiling report containing:

| Key | Type | Description |
|---|---|---|
| completeness | float | Overall data completeness score (0.0 -- 1.0) |
| columns | list[dict] | Per-column profiling results (null rate, unique count, distribution stats) |
| anomalies | list[dict] | Detected data quality anomalies |
| expectations | list[dict] | Generated Great Expectations suite |

Example

source = project.add_source("patients.csv")
profile_report = project.profile(source)

print(f"Overall completeness: {profile_report['completeness']:.2%}")
# Overall completeness: 94.30%

for col in profile_report["columns"]:
    if col["null_rate"] > 0.1:
        print(f"  Warning: {col['name']} has {col['null_rate']:.1%} nulls")

map_schema()

Maps source columns to OMOP CDM target tables and fields using AI-assisted matching.

Signature

def map_schema(source: dict) -> SchemaMapping

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| source | dict | required | Source metadata dictionary returned by add_source(). |

Returns

SchemaMapping -- A mapping object containing all proposed column-to-CDM-field mappings with confidence scores and status.

Example

source = project.add_source("patients.csv")
schema_map = project.map_schema(source)

# Inspect mappings
for item in schema_map.items:
    print(f"{item.source_column} -> {item.target_table}.{item.target_column} "
          f"(confidence: {item.confidence:.2f}, status: {item.status})")

# Output:
# patient_id -> person.person_id (confidence: 0.98, status: APPROVED)
# birth_date -> person.birth_datetime (confidence: 0.95, status: APPROVED)
# gender -> person.gender_concept_id (confidence: 0.87, status: NEEDS_REVIEW)
# zip_code -> location.zip (confidence: 0.72, status: NEEDS_REVIEW)

Confidence Routing (Default Thresholds)

| Confidence | Status | Action |
|---|---|---|
| >= 0.90 | APPROVED | Auto-accepted |
| 0.70 -- 0.90 | NEEDS_REVIEW | Flagged for human review |
| < 0.70 | UNMAPPED | Requires manual mapping |

Thresholds are configurable via PortiereConfig.thresholds.schema_mapping. See 03-configuration.md.
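
The routing rule reduces to a threshold comparison. A stand-alone sketch of the logic under the default thresholds (the function name route_schema_confidence is illustrative, not SDK API):

```python
def route_schema_confidence(confidence: float,
                            approve_at: float = 0.90,
                            review_at: float = 0.70) -> str:
    """Map a schema-mapping confidence score to a status using the default thresholds."""
    if confidence >= approve_at:
        return "APPROVED"
    if confidence >= review_at:
        return "NEEDS_REVIEW"
    return "UNMAPPED"

print(route_schema_confidence(0.98))  # APPROVED
print(route_schema_confidence(0.87))  # NEEDS_REVIEW
print(route_schema_confidence(0.55))  # UNMAPPED
```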


map_concepts()

Maps clinical codes and terms to OMOP standard concepts using hybrid search (dense + sparse retrieval with RRF fusion).

Signature

def map_concepts(
    source: Optional[dict] = None,
    codes: Optional[list[str]] = None,
    code_columns: Optional[list[str]] = None,
    vocabularies: Optional[list[str]] = None
) -> ConceptMapping

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| source | Optional[dict] | None | Source metadata dictionary. When provided, maps all code columns found in the source. |
| codes | Optional[list[str]] | None | Explicit list of clinical codes to map (e.g., ["E11.9", "I10"]). |
| code_columns | Optional[list[str]] | None | Specific column names in the source to treat as code columns. |
| vocabularies | Optional[list[str]] | None | Vocabulary filter for this mapping. Overrides the project-level vocabulary list. |

At least one of source or codes must be provided.

Returns

ConceptMapping -- A mapping object containing resolved concept mappings with candidates, confidence scores, and approval status.

Examples

Auto-discover and map all codes from a source (recommended):

The simplest approach -- point map_concepts() at a source and let the knowledge layer find and map all clinical codes automatically. No need to list codes or specify target vocabularies; Portiere searches across all configured vocabularies (SNOMED, LOINC, RxNorm, ICD10CM by default).

source = project.add_source("conditions.csv")
concept_map = project.map_concepts(source=source)

summary = concept_map.summary()
print(summary)
# {"auto_mapped": 142, "needs_review": 18, "manual_required": 3}

Map specific code columns from a source:

source = project.add_source("encounters.csv")
concept_map = project.map_concepts(
    source=source,
    code_columns=["diagnosis_code", "procedure_code"]
)

Map specific columns with vocabulary filter:

source = project.add_source("lab_results.csv")
concept_map = project.map_concepts(
    source=source,
    code_columns=["loinc_code", "result_unit"],
    vocabularies=["LOINC", "UCUM"]
)

Map explicit codes (when you already know the codes):

concept_map = project.map_concepts(codes=["E11.9", "I10", "J45.0"])

summary = concept_map.summary()
print(summary)
# {"auto_mapped": 2, "needs_review": 1, "manual_required": 0}

Confidence Routing (Default Thresholds)

| Confidence | Category | Action |
|---|---|---|
| >= 0.95 | auto_mapped | Auto-accepted, no review needed |
| 0.80 -- 0.95 | needs_review | High confidence but should be verified |
| 0.70 -- 0.80 | needs_review | Medium confidence, review recommended |
| < 0.70 | manual_required | Low confidence, manual resolution required |
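
To show how these thresholds produce the counts reported by summary(), here is a stand-alone sketch (the route_concept_confidence helper is illustrative; the thresholds mirror the defaults above):

```python
from collections import Counter

def route_concept_confidence(confidence: float) -> str:
    """Bucket a concept-mapping confidence score using the default thresholds."""
    if confidence >= 0.95:
        return "auto_mapped"
    if confidence >= 0.70:
        return "needs_review"
    return "manual_required"

# Example: confidences for four mapped codes
confidences = [0.98, 0.95, 0.82, 0.60]
summary = Counter(route_concept_confidence(c) for c in confidences)
print(dict(summary))  # {'auto_mapped': 2, 'needs_review': 1, 'manual_required': 1}
```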

Approval and Override

# Approve a mapping without changing the candidate (sets method to AUTO)
concept_map.approve(code="E11.9")

# Override a mapping with a specific concept (sets method to OVERRIDE)
concept_map.override(code="I10", concept_id=320128, concept_name="Essential hypertension")

Important: approve() without candidates sets the mapping method to AUTO. override() sets the method to OVERRIDE (not MANUAL).


run_etl()

Generates and executes ETL transformation scripts that convert source data into OMOP CDM-formatted output.

Signature

def run_etl(
    source: dict,
    output_dir: str,
    schema_mapping: Optional[SchemaMapping] = None,
    concept_mapping: Optional[ConceptMapping] = None
) -> dict

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| source | dict | required | Source metadata dictionary returned by add_source(). |
| output_dir | str | required | Directory path for OMOP-formatted output files. |
| schema_mapping | Optional[SchemaMapping] | None | Schema mapping to apply. If None, loads the most recent mapping from the project. |
| concept_mapping | Optional[ConceptMapping] | None | Concept mapping to apply. If None, loads the most recent mapping from the project. |

Returns

dict -- ETL result dictionary containing:

| Key | Type | Description |
|---|---|---|
| output_path | str | Path to the generated output directory |
| tables | list[dict] | Per-table output metadata (name, row count, file path) |
| engine | str | Compute engine used |
| duration_seconds | float | Total ETL execution time |

Example

source = project.add_source("patients.csv")
schema_map = project.map_schema(source)
concept_map = project.map_concepts(source=source)

etl = project.run_etl(
    source,
    output_dir="./omop_output",
    schema_mapping=schema_map,
    concept_mapping=concept_map
)

print(f"ETL completed in {etl['duration_seconds']:.1f}s using {etl['engine']}")
for table in etl["tables"]:
    print(f"  {table['name']}: {table['row_count']} rows -> {table['file_path']}")

# ETL completed in 2.3s using polars
#   person: 15230 rows -> ./omop_output/person.parquet
#   condition_occurrence: 48102 rows -> ./omop_output/condition_occurrence.parquet

validate()

Validates ETL output against OMOP CDM conformance rules, completeness checks, and plausibility constraints.

Requires: pip install portiere-health[quality]

Signature

def validate(
    etl_result: Optional[dict] = None,
    output_path: Optional[str] = None
) -> dict

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| etl_result | Optional[dict] | None | ETL result dictionary returned by run_etl(). |
| output_path | Optional[str] | None | Direct path to OMOP output directory. Use when validating previously generated output. |

At least one of etl_result or output_path must be provided.

Returns

dict -- Validation report containing:

| Key | Type | Description |
|---|---|---|
| completeness | float | Data completeness score (0.0 -- 1.0) |
| conformance | float | CDM structural conformance score (0.0 -- 1.0) |
| plausibility | float | Clinical plausibility score (0.0 -- 1.0) |
| passed | bool | Whether all scores meet configured thresholds |
| details | list[dict] | Per-check results with pass/fail and messages |

Default Validation Thresholds

| Metric | Threshold |
|---|---|
| min_completeness | 0.95 |
| min_conformance | 0.98 |
| min_plausibility | 0.90 |
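
The passed flag is the conjunction of the three score checks. A minimal stand-alone sketch under the default thresholds above (the validation_passed helper is illustrative, not SDK API):

```python
DEFAULT_THRESHOLDS = {
    "completeness": 0.95,
    "conformance": 0.98,
    "plausibility": 0.90,
}

def validation_passed(report: dict, thresholds: dict = DEFAULT_THRESHOLDS) -> bool:
    """True when every score meets its configured minimum."""
    return all(report[metric] >= minimum for metric, minimum in thresholds.items())

print(validation_passed({"completeness": 0.97, "conformance": 0.99, "plausibility": 0.93}))  # True
print(validation_passed({"completeness": 0.97, "conformance": 0.96, "plausibility": 0.93}))  # False
```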

Example

etl = project.run_etl(source, output_dir="./output", schema_mapping=schema_map)

validation = project.validate(etl_result=etl)

if validation["passed"]:
    print("All validation checks passed.")
else:
    print("Validation issues found:")
    for check in validation["details"]:
        if not check["passed"]:
            print(f"  FAIL: {check['message']}")

# Validate a previously generated output directory
validation = project.validate(output_path="./output")

push()

Pushes the current local project (mappings, configurations, and metadata) to Portiere Cloud. Enables collaboration, cloud-based review, and hybrid workflows.

Open-source SDK: push() raises NotImplementedError in the open-source SDK. Cloud sync requires Portiere Cloud. See https://portiere.io for details.

Signature

def push() -> str

Parameters

None.

Returns

str -- The cloud project ID assigned to (or already associated with) this project.

Requirements

  • Portiere Cloud subscription (not available in the open-source SDK).
  • config.api_key must be set (via config, environment variable PORTIERE_API_KEY, or portiere.yaml).
  • The Portiere Cloud endpoint must be reachable.

Example

import portiere
from portiere.config import PortiereConfig
from portiere.engines import PolarsEngine

config = PortiereConfig(
    api_key="pt_sk_your_api_key",
    storage="local",  # Keep artifacts local
)
project = portiere.init(name="Hospital Migration", engine=PolarsEngine(), config=config)

# ... perform local mapping work ...

# Push to cloud for team review
cloud_id = project.push()
print(f"Project synced to cloud: {cloud_id}")
# Project synced to cloud: proj_a1b2c3d4

See 04-operating-modes.md for detailed hybrid sync workflows.


pull()

Pulls the latest project state from Portiere Cloud, updating local mappings and metadata. Used in hybrid workflows to sync changes made by collaborators or via the cloud review UI.

Open-source SDK: pull() raises NotImplementedError in the open-source SDK. Cloud sync requires Portiere Cloud. See https://portiere.io for details.

Signature

def pull() -> None

Parameters

None.

Returns

None. Updates the local project state in place.

Example

# Pull latest changes from cloud (e.g., after a reviewer approves mappings)
project.pull()

# Load the updated mappings
schema_map = project.load_schema_mapping()
concept_map = project.load_concept_mapping()

load_schema_mapping()

Loads the most recent schema mapping from the project's local storage. Useful for resuming work or applying previously computed mappings to a new ETL run.

Signature

def load_schema_mapping() -> SchemaMapping

Parameters

None.

Returns

SchemaMapping -- The most recently saved schema mapping for this project.

Example

from portiere.engines import PolarsEngine

# Resume work from a previous session
project = portiere.init(name="Hospital Migration", engine=PolarsEngine())

schema_map = project.load_schema_mapping()
print(f"Loaded {len(schema_map.items)} column mappings")

load_concept_mapping()

Loads the most recent concept mapping from the project's local storage.

Signature

def load_concept_mapping() -> ConceptMapping

Parameters

None.

Returns

ConceptMapping -- The most recently saved concept mapping for this project.

Example

from portiere.engines import PolarsEngine

project = portiere.init(name="Hospital Migration", engine=PolarsEngine())

concept_map = project.load_concept_mapping()
summary = concept_map.summary()
print(f"Auto-mapped: {summary['auto_mapped']}, Needs review: {summary['needs_review']}")

import_concept_mapping()

Imports an existing concept mapping table into the project. Use this when you already have a mapping table (e.g., from a previous migration or manual curation).

Signature

def import_concept_mapping(
    path: str | None = None,
    dataframe: Any = None,
    records: list[dict] | None = None,
) -> ConceptMapping

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str \| None | None | Path to a CSV or JSON file containing mappings. |
| dataframe | Any | None | A Pandas, Polars, or Spark DataFrame with mapping data. |
| records | list[dict] \| None | None | A list of dicts, each with at least source_code. |

Provide exactly one of path, dataframe, or records.

Returns

ConceptMapping -- The imported mapping, persisted to project storage.

Examples

# Import from CSV
concept_map = project.import_concept_mapping(path="my_mappings.csv")

# Import from a Polars DataFrame
concept_map = project.import_concept_mapping(dataframe=df)

# Import from records
concept_map = project.import_concept_mapping(records=[
    {"source_code": "E11.9", "target_concept_id": 201826, "target_concept_name": "Type 2 diabetes mellitus", "confidence": 0.98},
    {"source_code": "I10", "target_concept_id": 320128, "target_concept_name": "Essential hypertension", "confidence": 0.95},
])

export_concept_mapping()

Exports the project's concept mapping to a file.

Signature

def export_concept_mapping(
    path: str,
    *,
    omop_format: bool = False,
) -> str

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Output file path (.csv or .json). |
| omop_format | bool | False | If True, export as OMOP source_to_concept_map format. |

Returns

str -- The output file path.

Examples

# Export to CSV for SME review
project.export_concept_mapping("mappings_for_review.csv")

# Export to JSON
project.export_concept_mapping("mappings.json")

# Export in OMOP source_to_concept_map format
project.export_concept_mapping("source_to_concept_map.csv", omop_format=True)

SchemaMapping

Returned by project.map_schema() and project.load_schema_mapping().

Key Attributes

| Attribute | Type | Description |
|---|---|---|
| items | list[SchemaMappingItem] | Individual column mappings |

SchemaMappingItem Attributes

| Attribute | Type | Description |
|---|---|---|
| source_table | str | Source table name (defaults to "") |
| source_column | str | Source column name |
| target_table | str | OMOP CDM target table |
| target_column | str | OMOP CDM target column |
| confidence | float | Mapping confidence score (0.0 -- 1.0) |
| status | MappingStatus | Current status: APPROVED, NEEDS_REVIEW, or UNMAPPED |

ConceptMapping

Returned by project.map_concepts(), project.load_concept_mapping(), or project.import_concept_mapping().

Key Methods

| Method | Signature | Description |
|---|---|---|
| summary() | () -> dict | Returns {"auto_mapped": int, "needs_review": int, "manual_required": int} |
| approve() | (code: str) -> None | Approves a mapping. Sets method to AUTO when no candidates specified. |
| override() | (code: str, concept_id: int, concept_name: str) -> None | Overrides mapping with a specific concept. Sets method to OVERRIDE. |
| to_csv() | (path: str) -> None | Export mappings to CSV. |
| to_json() | (path: str) -> None | Export mappings to JSON. |
| to_dataframe() | () -> pd.DataFrame | Export mappings as a pandas DataFrame. |
| to_source_to_concept_map() | () -> list[dict] | Export in OMOP source_to_concept_map format. |

Class Methods (Import)

| Method | Signature | Description |
|---|---|---|
| from_csv() | (path: str) -> ConceptMapping | Import from CSV file. Handles column aliases. |
| from_json() | (path: str) -> ConceptMapping | Import from JSON file. |
| from_dataframe() | (df: Any) -> ConceptMapping | Import from Pandas, Polars, or Spark DataFrame. |
| from_records() | (records: list[dict]) -> ConceptMapping | Import from list of dicts. |

ConceptMappingMethod

| Value | When Set |
|---|---|
| AUTO | approve() without explicit candidates |
| OVERRIDE | override() with a specific concept |

Knowledge Layer Backends

The knowledge layer provides concept search via pluggable backends. All backends implement a common interface with search(), get_concept(), and index_concepts() methods.

build_knowledge_layer()

Factory function that creates and returns a configured knowledge layer backend instance.

Signature

def build_knowledge_layer(
    config: KnowledgeLayerConfig,
    *,
    embedding_gateway: Optional[EmbeddingGateway] = None,
    hybrid_backends: Optional[list[str]] = None,
    **backend_kwargs: Any,
) -> AbstractKnowledgeBackend

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| config | KnowledgeLayerConfig | required | Knowledge layer configuration specifying the backend and its settings. |
| embedding_gateway | Optional[EmbeddingGateway] | None | Embedding gateway instance for backends that require dense vectors. When None, a default gateway is created from the project config. |
| hybrid_backends | Optional[list[str]] | None | Override for config.hybrid_backends. Explicit list of sub-backends to combine in hybrid mode. |
| **backend_kwargs | Any | -- | Additional keyword arguments passed to the backend constructor. |

BM25sBackend

Sparse keyword-based retrieval using BM25s. Zero external dependencies.

class BM25sBackend(AbstractKnowledgeBackend):
    def __init__(self, corpus_path: Optional[str] = None): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

FAISSBackend

Dense semantic search using FAISS indexes.

class FAISSBackend(AbstractKnowledgeBackend):
    def __init__(self, index_path: str, metadata_path: str, embedding_gateway: EmbeddingGateway): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

ElasticsearchBackend

Full-text and structured search using Elasticsearch.

class ElasticsearchBackend(AbstractKnowledgeBackend):
    def __init__(self, url: str, index: str = "portiere_concepts"): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

ChromaDBBackend

Vector search using ChromaDB (embedded or persistent).

class ChromaDBBackend(AbstractKnowledgeBackend):
    def __init__(
        self,
        collection: str = "portiere_concepts",
        persist_path: Optional[Path] = None,
        embedding_gateway: Optional[EmbeddingGateway] = None,
    ): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

Install: pip install portiere-health[chromadb]

PGVectorBackend

PostgreSQL-native vector search using the pgvector extension.

class PGVectorBackend(AbstractKnowledgeBackend):
    def __init__(
        self,
        connection_string: str,
        table: str = "portiere_concepts",
        embedding_gateway: Optional[EmbeddingGateway] = None,
    ): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

Install: pip install portiere-health[pgvector]

MongoDBBackend

MongoDB Atlas Vector Search backend.

class MongoDBBackend(AbstractKnowledgeBackend):
    def __init__(
        self,
        connection_string: str,
        database: str = "portiere",
        collection: str = "concepts",
        embedding_gateway: Optional[EmbeddingGateway] = None,
    ): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

Install: pip install portiere-health[mongodb]

QdrantBackend

High-performance vector search using Qdrant.

class QdrantBackend(AbstractKnowledgeBackend):
    def __init__(
        self,
        url: str,
        collection: str = "portiere_concepts",
        api_key: Optional[str] = None,
        embedding_gateway: Optional[EmbeddingGateway] = None,
    ): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

Install: pip install portiere-health[qdrant]

MilvusBackend

Scalable vector database for large-scale deployments.

class MilvusBackend(AbstractKnowledgeBackend):
    def __init__(
        self,
        uri: str,
        collection: str = "portiere_concepts",
        embedding_gateway: Optional[EmbeddingGateway] = None,
    ): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...

Install: pip install portiere-health[milvus]

HybridBackend

Combines multiple backends using Reciprocal Rank Fusion (RRF) or weighted fusion.

class HybridBackend(AbstractKnowledgeBackend):
    def __init__(
        self,
        backends: list[AbstractKnowledgeBackend],
        fusion_method: Literal["rrf", "weighted"] = "rrf",
        rrf_k: int = 60,
    ): ...
    def search(self, query: str, top_k: int = 10, vocabularies: Optional[list[str]] = None) -> list[ConceptCandidate]: ...
    def get_concept(self, concept_id: int) -> Optional[ConceptRecord]: ...
    def index_concepts(self, concepts: list[ConceptRecord]) -> None: ...
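
To make the default fusion concrete, here is a stand-alone sketch of Reciprocal Rank Fusion over two ranked candidate lists. The rrf_fuse helper and the sample concept IDs are illustrative, not SDK API; k=60 matches the rrf_k default above.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse ranked lists of concept IDs: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, concept_id in enumerate(ranking, start=1):
            scores[concept_id] = scores.get(concept_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

sparse = [201826, 320128, 443732]   # e.g. keyword (BM25s) result order
dense  = [201826, 4322024, 320128]  # e.g. vector (FAISS) result order
print(rrf_fuse([sparse, dense]))
# [201826, 320128, 4322024, 443732] -- concepts ranked well by both lists rise to the top
```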

See 17-hybrid-mode.md for hybrid search configuration examples.


KnowledgeLayerConfig Reference

class KnowledgeLayerConfig(BaseModel):
    backend: Literal["bm25s", "faiss", "elasticsearch", "hybrid",
                     "chromadb", "pgvector", "mongodb", "qdrant", "milvus"] = "bm25s"

    # Existing backend settings
    faiss_index_path: Optional[str] = None
    faiss_metadata_path: Optional[str] = None
    elasticsearch_url: Optional[str] = None
    elasticsearch_index: str = "portiere_concepts"
    bm25s_corpus_path: Optional[str] = None

    # Hybrid settings
    hybrid_backends: list[str] = ["bm25s", "faiss"]
    fusion_method: Literal["rrf", "weighted"] = "rrf"
    rrf_k: int = 60

    # ChromaDB
    chroma_collection: str = "portiere_concepts"
    chroma_persist_path: Optional[Path] = None

    # PGVector
    pgvector_connection_string: Optional[str] = None
    pgvector_table: str = "portiere_concepts"

    # MongoDB
    mongodb_connection_string: Optional[str] = None
    mongodb_database: str = "portiere"
    mongodb_collection: str = "concepts"

    # Qdrant
    qdrant_url: Optional[str] = None
    qdrant_collection: str = "portiere_concepts"
    qdrant_api_key: Optional[str] = None

    # Milvus
    milvus_uri: Optional[str] = None
    milvus_collection: str = "portiere_concepts"

See 03-configuration.md for the full field reference table.


For full details on configuring the SDK behavior -- thresholds, LLM providers, compute engines, and more -- see 03-configuration.md.