Fidelity

`cifflow.fidelity.check`

Fidelity comparison for CIF sources.

`check_fidelity` compares two CIF sources (file paths or pre-parsed `CifFile` objects) by ingesting both into in-memory SQLite databases and comparing the resulting data at the row level.

Known limitations

ValueType for structured tables

`ValueType` is not stored for structured table columns; only the raw string value is persisted. `ValueType` fidelity for schema-known tags therefore cannot be checked. For `_cif_fallback`, `value_type` is stored and compared directly.

SU fidelity in `_cif_fallback`

For structured tables, SU columns are normalised with `Decimal.normalize()` so that `0.001` and `0.0010` compare equal. For `_cif_fallback`, SU values are embedded in the full `value(su)` string (e.g. `3.992(1)`) and are compared as raw strings. Equivalent SU representations such as `3.992(1)` and `3.9920(10)` will therefore compare as unequal.
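The difference between the two comparison modes can be illustrated with the standard `decimal` module. This is only a sketch: `normalise_su` and `parse_value_su` are hypothetical helpers written for this example, not part of cifflow, and the regular expression assumes plain decimal notation without exponents.

```python
import re
from decimal import Decimal


def normalise_su(su: str) -> str:
    """Canonical string form of an SU, with trailing zeros stripped."""
    return str(Decimal(su).normalize())


# Hypothetical pattern for "value(su)" strings such as "3.992(1)".
_VALUE_SU = re.compile(r'^([+-]?\d*\.?\d+)\((\d+)\)$')


def parse_value_su(text: str) -> tuple[Decimal, Decimal]:
    """Split a value(su) string into a normalised (value, su) pair."""
    match = _VALUE_SU.match(text)
    if match is None:
        raise ValueError(f'not a value(su) string: {text!r}')
    value = Decimal(match.group(1))
    # The SU digits apply to the last decimal places of the value,
    # so shift them by the value's exponent.
    su = Decimal(match.group(2)).scaleb(value.as_tuple().exponent)
    return value.normalize(), su.normalize()


# Structured-table style: normalised SU strings compare equal.
print(normalise_su('0.001') == normalise_su('0.0010'))

# Fallback style: the raw strings differ even though they are equivalent.
print('3.992(1)' == '3.9920(10)')

# Parsing both forms shows that they describe the same value and SU.
print(parse_value_su('3.992(1)') == parse_value_su('3.9920(10)'))
```

A parser along these lines is one way such a limitation could be lifted; the documentation above does not say how or whether cifflow plans to do so.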

Default-filled values (`_cif_synthetic`)

Values filled from `enumeration_default` during ingestion are excluded from comparison. An explicit value in one source and a default-filled value in the other will produce a `"row_content"` mismatch even if the two values are identical. `_cif_synthetic` is specified but not yet implemented in the ingestion layer, so this exclusion step is a no-op until it is.

`version` parameter

The `version` parameter is not yet propagated to the parser as a fallback default. Version detection uses the file magic line; files without a magic line are parsed as CIF 1.1 regardless of `version`.

UUID-keyed tables

When comparing sources where one uses natural primary keys and another uses generated UUID keys (e.g. `ALL_BLOCKS` output merging multiple CIF blocks), all PK columns of UUID-keyed tables and all FK columns pointing to those tables are stripped from the row representation in both connections. This allows content comparison without key-structure comparison.
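The idea of stripping key columns before comparing can be sketched with plain dictionaries. The helper `strip_key_columns`, the column names, and the sample rows below are all hypothetical and are not taken from cifflow's internals; the sketch only shows why rows with different key values can still compare equal on content.

```python
from collections import Counter


def strip_key_columns(rows, key_columns):
    """Drop key columns and return the rows as an order-independent multiset."""
    return Counter(
        tuple(sorted((col, val) for col, val in row.items() if col not in key_columns))
        for row in rows
    )


# Source A uses natural keys; source B uses generated UUID keys (made-up data).
rows_a = [
    {'id': '1', 'label': 'C1', 'x': '0.25'},
    {'id': '2', 'label': 'O1', 'x': '0.50'},
]
rows_b = [
    {'id': '0b0e7c1a-3f5d-4c7e-9a55-6f2d1c0a9e11', 'label': 'O1', 'x': '0.50'},
    {'id': '7d3c2f90-88aa-4b1e-b0f4-2a9c5e6d7f22', 'label': 'C1', 'x': '0.25'},
]

# With the key column stripped, the content matches.
print(strip_key_columns(rows_a, {'id'}) == strip_key_columns(rows_b, {'id'}))

# With the key column kept, every row differs.
print(strip_key_columns(rows_a, set()) == strip_key_columns(rows_b, set()))
```

Using a multiset rather than a set keeps duplicate rows significant, which matters once keys no longer distinguish them.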

`FidelityReport` `dataclass`

Result of a `check_fidelity` call.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `passed` | `bool` | `True` when no mismatches were found. |
| `mismatches` | `list[FidelityMismatch]` | Ordered list of all `FidelityMismatch` objects found. |

Source code in src/cifflow/fidelity/check.py

```python
@dataclass
class FidelityReport:
    """Result of a :func:`check_fidelity` call.

    Attributes
    ----------
    passed
        ``True`` when no mismatches were found.
    mismatches
        Ordered list of all :class:`FidelityMismatch` objects found.
    """

    passed: bool
    mismatches: list[FidelityMismatch]
```

`FidelityMismatch` `dataclass`

A single semantic difference found between two CIF sources.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `kind` | `str` | Machine-readable category (e.g. `'missing_block'`, `'value_mismatch'`). |
| `source` | `Literal['a', 'b', 'both']` | Which source the mismatch is tied to: `'a'`, `'b'`, or `'both'`. |
| `description` | `str` | Human-readable explanation of the difference. |

Source code in src/cifflow/fidelity/check.py

```python
@dataclass
class FidelityMismatch:
    """A single semantic difference found between two CIF sources.

    Attributes
    ----------
    kind
        Machine-readable category (e.g. ``'missing_block'``, ``'value_mismatch'``).
    source
        Which source the mismatch is tied to: ``'a'``, ``'b'``, or ``'both'``.
    description
        Human-readable explanation of the difference.
    """

    kind: str
    source: Literal['a', 'b', 'both']
    description: str
```
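Because `kind` is machine-readable, a report can be summarised by counting mismatches per category. The sketch below redefines both dataclasses locally as stand-ins so that it runs without cifflow installed; `summarise` and the sample mismatches are hypothetical and exist only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal


# Local stand-ins mirroring the dataclasses documented above.
@dataclass
class FidelityMismatch:
    kind: str
    source: Literal['a', 'b', 'both']
    description: str


@dataclass
class FidelityReport:
    passed: bool
    mismatches: list[FidelityMismatch]


def summarise(report: FidelityReport) -> dict[str, int]:
    """Count mismatches per kind (hypothetical helper)."""
    return dict(Counter(m.kind for m in report.mismatches))


# Made-up mismatches, using the example kinds from the attribute table.
report = FidelityReport(
    passed=False,
    mismatches=[
        FidelityMismatch('value_mismatch', 'both', 'values differ in row 1'),
        FidelityMismatch('missing_block', 'b', 'block present only in A'),
        FidelityMismatch('value_mismatch', 'both', 'values differ in row 7'),
    ],
)

for kind, count in summarise(report).items():
    print(f'{kind}: {count}')
```

Filtering on `source` works the same way, for example to list only the mismatches tied to source `'b'`.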

`check_fidelity(source_a, source_b, schema=None, *, version=CifVersion.CIF_2_0, report_file=None)`

Compare two CIF sources for semantic equivalence.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source_a` | `'str \| pathlib.Path \| CifFile'` | First CIF source to compare. May be a file path (`str` or `pathlib.Path`) or a pre-parsed `CifFile` object. | required |
| `source_b` | `'str \| pathlib.Path \| CifFile'` | Second CIF source to compare. Same accepted types as `source_a`. | required |
| `schema` | `'str \| pathlib.Path \| SchemaSpec \| dict \| None'` | Schema to use for ingestion. `None` compares only `_cif_fallback`. Accepts `SchemaSpec`, `.json` cache path, or `.dic` DDLm dictionary path. | `None` |
| `version` | `CifVersion` | Intended fallback CIF version for files without a magic line (not yet propagated to the parser; see Known limitations). | `CIF_2_0` |
| `report_file` | `'str \| pathlib.Path \| None'` | Optional path for a human-readable text report. If provided, the report is written (UTF-8) before returning, regardless of pass/fail. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `FidelityReport` | The comparison result. Parse and ingestion errors are recorded in the report as mismatches rather than raised; schema loading failures propagate as exceptions. |
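A typical call might look like the following sketch. It uses only the signature and attributes documented above; the wrapper `run_check`, its exit-code convention, and the file names are assumptions made for this example. The import is done inside the function so that the snippet can be loaded even where cifflow is not installed.

```python
import sys
from typing import Optional


def run_check(path_a: str, path_b: str, report_path: Optional[str] = None) -> int:
    """Compare two CIF files; return 0 if they match, 1 otherwise (hypothetical wrapper)."""
    # Lazy import: the module path is the one documented on this page.
    from cifflow.fidelity.check import check_fidelity

    # schema is left at its default of None, so only _cif_fallback is compared.
    report = check_fidelity(path_a, path_b, report_file=report_path)

    # Parse and ingestion errors arrive as mismatches, not exceptions.
    for m in report.mismatches:
        print(f'[{m.kind}] ({m.source}) {m.description}', file=sys.stderr)

    return 0 if report.passed else 1


if __name__ == '__main__' and len(sys.argv) == 3:
    # Example invocation: python compare.py original.cif roundtrip.cif
    sys.exit(run_check(sys.argv[1], sys.argv[2]))
```

Since schema loading failures propagate, a caller that passes a `schema` argument may want to wrap the call in its own `try`/`except`.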

Source code in src/cifflow/fidelity/check.py

```python
def check_fidelity(
    source_a: 'str | pathlib.Path | CifFile',
    source_b: 'str | pathlib.Path | CifFile',
    schema: 'str | pathlib.Path | SchemaSpec | dict | None' = None,
    *,
    version: CifVersion = CifVersion.CIF_2_0,
    report_file: 'str | pathlib.Path | None' = None,
) -> FidelityReport:
    """Compare two CIF sources for semantic equivalence.

    Parameters
    ----------
    source_a
        First CIF source to compare.  May be a file path (``str`` or
        ``pathlib.Path``) or a pre-parsed ``CifFile`` object.
    source_b
        Second CIF source to compare.  Same accepted types as *source_a*.
    schema
        Schema to use for ingestion.  ``None`` compares only
        ``_cif_fallback``.  Accepts ``SchemaSpec``, ``.json`` cache path, or
        ``.dic`` DDLm dictionary path.
    version
        Fallback CIF version for files without a magic line.  Default
        ``CIF_2_0``.
    report_file
        Optional path for a human-readable text report.  If provided, the
        report is written (UTF-8) before returning, regardless of pass/fail.

    Returns
    -------
    FidelityReport
        Parse and ingestion errors are captured in the report; never raises
        for data errors.  Schema loading failures propagate directly.
    """
    mismatches: list[FidelityMismatch] = []

    def _label(src: object) -> str:
        if isinstance(src, CifFile):
            return 'CifFile object'
        return str(src)

    label_a = _label(source_a)
    label_b = _label(source_b)

    def _finish(ms: list[FidelityMismatch]) -> FidelityReport:
        rep = FidelityReport(passed=len(ms) == 0, mismatches=ms)
        if report_file is not None:
            pathlib.Path(report_file).write_text(
                _format_report(rep, label_a, label_b, schema_spec), encoding='utf-8'
            )
        return rep

    # Schema loading: propagates on failure (programming error)
    schema_spec = _load_schema(schema)

    # --- Step 1: load and parse sources ---
    cif_a, parse_errors_a = _load_source(source_a, version)
    for e in parse_errors_a:
        loc = f' at line {e.line}' if e.line else ''
        mismatches.append(FidelityMismatch(
            kind='parse_error', source='a',
            description=f'{e.error_type} error in A{loc}: {e.message}',
        ))

    cif_b, parse_errors_b = _load_source(source_b, version)
    for e in parse_errors_b:
        loc = f' at line {e.line}' if e.line else ''
        mismatches.append(FidelityMismatch(
            kind='parse_error', source='b',
            description=f'{e.error_type} error in B{loc}: {e.message}',
        ))

    if any(m.kind == 'parse_error' for m in mismatches):
        return _finish(mismatches)

    # --- Step 1 (continued): ingest ---
    conn_a = duckdb.connect()
    conn_b = duckdb.connect()

    ingest_ok_a = True
    ingest_ok_b = True

    try:
        conn_a, errors_a = ingest(cif_a, conn_a, schema=schema_spec)
        for msg in errors_a:
            mismatches.append(FidelityMismatch(
                kind='ingest_error', source='a', description=msg,
            ))
    except Exception as exc:
        ingest_ok_a = False
        mismatches.append(FidelityMismatch(
            kind='ingest_error', source='a', description=str(exc),
        ))

    try:
        conn_b, errors_b = ingest(cif_b, conn_b, schema=schema_spec)
        for msg in errors_b:
            mismatches.append(FidelityMismatch(
                kind='ingest_error', source='b', description=msg,
            ))
    except Exception as exc:
        ingest_ok_b = False
        mismatches.append(FidelityMismatch(
            kind='ingest_error', source='b', description=str(exc),
        ))

    if not ingest_ok_a or not ingest_ok_b:
        return _finish(mismatches)

    # --- Step 2: detect UUID-keyed tables ---
    if schema_spec is not None:
        uuid_tbls = _uuid_pk_tables(conn_a, conn_b, schema_spec)
        uuid_fk_cols = _fk_to_uuid_cols(schema_spec, uuid_tbls)
    else:
        uuid_tbls = frozenset()
        uuid_fk_cols = {}

    # --- Step 3: compare structured tables ---
    if schema_spec is not None:
        mismatches.extend(
            _compare_structured(conn_a, conn_b, schema_spec, uuid_tbls, uuid_fk_cols)
        )

    # --- Step 4: compare _cif_fallback ---
    mismatches.extend(_compare_fallback(conn_a, conn_b))

    # --- Step 5: schema mismatch detection ---
    if schema_spec is not None:
        mismatches.extend(_compare_schema_mismatch(conn_a, conn_b, schema_spec))

    return _finish(mismatches)
```
