Output

`cifflow.output.emit`

CIF emission from a populated DuckDB database.

`emit(conn, schema, ...)` reads the structured tables and the `_cif_fallback` table and produces a valid CIF string.

Assumption: by emission time, all data in the database belongs to a single coherent dataset. Namespace conflicts (e.g. short identifiers from unrelated sources) are not detected or resolved by the output layer.

`emit(conn, schema, *, mode=EmitMode.ORIGINAL, version=CifVersion.CIF_2_0, plan=None, reconstruct_su=False, emit_defaults=True, line_ending='\n', pretty=True, line_limit=2048)`

Emit CIF text from a populated DuckDB database.

Parameters:

Name	Type	Description	Default
`conn`	`DuckDBPyConnection`	Open `duckdb.DuckDBPyConnection` populated by `ingest()`. Read-only during emission.	required
`schema`	`SchemaSpec`	The `SchemaSpec` used when the database was ingested.	required
`mode`	`EmitMode`	How the database is partitioned into CIF blocks.	`ORIGINAL`
`version`	`CifVersion`	CIF version to emit. Controls quoting strategy.	`CIF_2_0`
`plan`	`OutputPlan \| None`	Optional ordering and grouping specification. `None` uses default ordering.	`None`
`reconstruct_su`	`bool`	When `True`, paired `(col, col_su)` columns are merged into a single `value(su)` token. Default `False`.	`False`
`emit_defaults`	`bool`	When `True` (default), columns filled from `enumeration_default` are emitted normally. When `False`, they would be suppressed; this requires per-value provenance tracking which is not yet implemented, so the flag is currently accepted but has no effect.	`True`
`line_ending`	`str`	Line terminator sequence written between every line and at the end of the output. Use `'\n'` (default, Unix LF), `'\r\n'` (Windows CRLF), or `'\r'` (legacy CR). The line-length limit (`line_limit`) is measured on content before line endings are applied.	`'\n'`
`pretty`	`bool`	When `True` (default), tag–value pairs are column-aligned within each Set category and loop column values are padded to the widest value in that column. When `False`, output is compact (two spaces between tag and value / between tokens) — recommended for very large loop tables where the alignment pass would be expensive.	`True`
`line_limit`	`int \| None`	Maximum physical line length (in characters, before line endings are applied). Default `2048`. Use `None` to disable. Values below `40` are accepted but emit a `UserWarning`; very small limits may produce degenerate output for long tokens. When a content line inside a semicolon-delimited text field exceeds `line_limit`, the CIF 2.0 line-folding protocol (§5.3) is applied. When `'\n;'` is also present in the value, the text-prefix protocol (§5.2) is combined with folding. Inline scalar values whose formatted line (tag + separator + token) would exceed `line_limit` are converted to semicolon-delimited fields. Loop data rows that exceed `line_limit` are wrapped across multiple physical lines using greedy token packing (tokens cannot be split). CIF 1.1 block codes, data names, and frame codes are independently limited to 75 characters by the CIF 1.1 specification; an exception is raised if this limit would be violated.	`2048`
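
The loop-row wrapping described for `line_limit` can be pictured with a small sketch. This is not the library's implementation: `pack_tokens` and the two-space separator are illustrative assumptions. Tokens are appended to the current physical line while they fit, and a new line is started otherwise.

```python
def pack_tokens(tokens, line_limit, sep="  "):
    """Greedily pack tokens into physical lines of at most line_limit chars.

    Hypothetical helper: tokens are never split, so a single token longer
    than line_limit still ends up alone on one over-long line.
    """
    lines = []
    current = ""
    for tok in tokens:
        if not current:
            # First token on a fresh line is always accepted.
            current = tok
        elif len(current) + len(sep) + len(tok) <= line_limit:
            current += sep + tok
        else:
            lines.append(current)
            current = tok
    if current:
        lines.append(current)
    return lines


row = ["C1", "0.12345(6)", "0.23456(7)", "0.34567(8)", "Uani"]
wrapped = pack_tokens(row, line_limit=26)
# wrapped == ['C1  0.12345(6)  0.23456(7)', '0.34567(8)  Uani']
```

Greedy packing keeps the row order intact and needs only one pass over the tokens, at the cost of not balancing line lengths.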

Returns:

Type	Description
`str`	Complete CIF text including magic line, terminated with `line_ending`.
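
With `reconstruct_su=True`, a value column and its `_su` partner are merged into a single `value(su)` token. The sketch below shows the conventional notation only. It assumes plain decimal strings and an su quoted to the same decimal place as the value; `merge_su` is a hypothetical helper, and the emitter's actual rounding rules are not documented on this page.

```python
from decimal import Decimal


def merge_su(value: str, su: str) -> str:
    """Combine a value and its standard uncertainty into 'value(su)'.

    Assumption: both strings are plain decimals and su is given to the
    same decimal place as value.
    """
    exponent = Decimal(value).as_tuple().exponent  # e.g. -4 for '1.2345'
    # Express su in units of the value's last decimal place.
    su_units = int(Decimal(su).scaleb(-exponent))
    return f"{value}({su_units})"


merged = merge_su("1.2345", "0.0012")
# merged == '1.2345(12)'
```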

Source code in src/cifflow/output/emit.py

def emit(
    conn: duckdb.DuckDBPyConnection,
    schema: SchemaSpec,
    *,
    mode: EmitMode = EmitMode.ORIGINAL,
    version: CifVersion = CifVersion.CIF_2_0,
    plan: OutputPlan | None = None,
    reconstruct_su: bool = False,
    emit_defaults: bool = True,
    line_ending: str = '\n',
    pretty: bool = True,
    line_limit: int | None = 2048,
) -> str:
    r"""Emit CIF text from a populated DuckDB database.

    Parameters
    ----------
    conn:
        Open ``duckdb.DuckDBPyConnection`` populated by ``ingest()``.  Read-only
        during emission.
    schema:
        The ``SchemaSpec`` used when the database was ingested.
    mode:
        How the database is partitioned into CIF blocks.
    version:
        CIF version to emit.  Controls quoting strategy.
    plan:
        Optional ordering and grouping specification.  ``None`` uses default
        ordering.
    reconstruct_su:
        When ``True``, paired ``(col, col_su)`` columns are merged into a
        single ``value(su)`` token.  Default ``False``.
    emit_defaults:
        When ``True`` (default), columns filled from ``enumeration_default``
        are emitted normally.  When ``False``, they would be suppressed; this
        requires per-value provenance tracking which is not yet implemented,
        so the flag is currently accepted but has no effect.
    line_ending:
        Line terminator sequence written between every line and at the end of
        the output.  Use ``'\n'`` (default, Unix LF), ``'\r\n'`` (Windows
        CRLF), or ``'\r'`` (legacy CR).  The line-length limit (*line_limit*)
        is measured on content before line endings are applied.
    pretty:
        When ``True`` (default), tag–value pairs are column-aligned within
        each Set category and loop column values are padded to the widest
        value in that column.  When ``False``, output is compact (two spaces
        between tag and value / between tokens) — recommended for very large
        loop tables where the alignment pass would be expensive.
    line_limit:
        Maximum physical line length (in characters, before line endings are
        applied).  Default ``2048``.  Use ``None`` to disable.  Values below
        ``40`` are accepted but emit a ``UserWarning``; very small limits may
        produce degenerate output for long tokens.

        When a content line inside a semicolon-delimited text field exceeds
        *line_limit*, the CIF 2.0 line-folding protocol (§5.3) is applied.
        When ``'\n;'`` is also present in the value, the text-prefix protocol
        (§5.2) is combined with folding.

        Inline scalar values whose formatted line (tag + separator + token)
        would exceed *line_limit* are converted to semicolon-delimited fields.

        Loop data rows that exceed *line_limit* are wrapped across multiple
        physical lines using greedy token packing (tokens cannot be split).

        CIF 1.1 block codes, data names, and frame codes are independently
        limited to 75 characters by the CIF 1.1 specification; an exception
        is raised if this limit would be violated.

    Returns
    -------
    str
        Complete CIF text including magic line, terminated with ``line_ending``.
    """
    if line_limit is not None and line_limit < 40:
        _warnings.warn(
            f'line_limit={line_limit} is very small; output may be degenerate for long tokens',
            UserWarning,
            stacklevel=2,
        )

    magic = '#\\#CIF_2.0' if version == CifVersion.CIF_2_0 else '#\\#CIF_1.1'

    if mode == EmitMode.ONE_BLOCK:
        raw_blocks = _collect_one_block(conn, schema)
    elif mode == EmitMode.ALL_BLOCKS:
        raw_blocks = _collect_all_blocks(conn, schema, version, plan)
    elif mode == EmitMode.GROUPED:
        raw_blocks = _collect_grouped(conn, schema, version)
    else:  # ORIGINAL
        raw_blocks = _collect_original(conn, schema)

    if mode == EmitMode.ALL_BLOCKS:
        plan_spec = plan.specs[0] if plan and plan.specs else None
        ordered = [(b, plan_spec) for b in raw_blocks]
    elif mode == EmitMode.ORIGINAL:
        if plan is not None:
            _warnings.warn(
                'OutputPlan is ignored in ORIGINAL mode; use GROUPED mode for custom ordering.',
                UserWarning,
                stacklevel=2,
            )
        ordered = [(b, None) for b in raw_blocks]
    else:
        ordered = _sort_and_merge(raw_blocks, plan)

    # Disambiguate block names; collect all output lines flat.
    used_names: dict[str, int] = {}
    lines = [magic]
    for i, (data, spec) in enumerate(ordered):
        base = data.name
        count = used_names.get(base, 0) + 1
        used_names[base] = count
        name = f'{base}_{count}' if count > 1 else base

        if i > 0:
            lines.append('')
            lines.append('')
        lines.extend(_render_block(name, data, schema, version, spec, reconstruct_su, pretty, line_limit))

    return line_ending.join(lines) + line_ending
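
The block-name disambiguation loop at the end of `emit()` can be isolated as follows. This is a minimal sketch that mirrors the counting logic in the listing above; `disambiguate` is a hypothetical name, not a function exported by the package.

```python
def disambiguate(names):
    """Suffix repeated block names with _2, _3, ... in order of appearance."""
    used = {}
    out = []
    for base in names:
        count = used.get(base, 0) + 1
        used[base] = count
        # The first occurrence keeps its name; later ones get a numeric suffix.
        out.append(f"{base}_{count}" if count > 1 else base)
    return out


unique = disambiguate(["sample", "phase", "sample", "sample"])
# unique == ['sample', 'phase', 'sample_2', 'sample_3']
```

As far as this excerpt shows, a generated name such as `sample_2` is not checked against an original block that already carries that exact name.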

`cifflow.output.plan`

Output plan dataclasses and EmitMode enum.

`EmitMode`

Bases: Enum

Controls how the database is partitioned into CIF blocks.

`ONE_BLOCK`: All data collapsed into a single CIF block named `'output'`.

`ALL_BLOCKS`: One CIF block per schema category, plus one block per original `_cifflow_block_id` from `_cif_fallback`.

`ORIGINAL`: Rows are grouped into blocks by their original `_cifflow_block_id` value, reconstructing the CIF blocks as they were before ingestion. This is the simple inverse of ingestion and the default.

`GROUPED`: Rows are grouped by Set-category anchor key values. For each table the FK graph is searched (BFS) for the nearest Set-class ancestor:

- If a Set is reachable, that Set is the anchor. Tables with composite keys, where some FK paths lead to Loop tables and others lead to a Set, are correctly anchored to the Set even when the Set path is not the first FK in the list.
- If no Set is reachable (the FK chain terminates at Loop tables only), those tables fall back to `_cifflow_block_id` grouping (equivalent to `ORIGINAL` for those tables).
- Keyless Set categories (those whose primary key is `_cifflow_id` rather than a domain key) carry no cross-block identity; they also fall back to `_cifflow_block_id` grouping.

All tables that share the same Set anchor and the same anchor key values are emitted in a single output block, merging rows from multiple original data blocks that carry the same Set-level identity.
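
The anchor search used by `GROUPED` can be sketched as a breadth-first walk over foreign-key parents. This is an illustration under assumptions, not the package's code: `nearest_set_anchor`, the `fk_parents` mapping, and the table relationships in the example are all hypothetical.

```python
from collections import deque


def nearest_set_anchor(table, fk_parents, set_tables):
    """Return the closest Set-class ancestor of `table`, or None.

    fk_parents maps a table name to the tables its foreign keys point at;
    set_tables is the collection of Set-class table names.
    """
    seen = {table}
    queue = deque(fk_parents.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in set_tables:
            return current
        # Not a Set: keep climbing through this table's own FK parents.
        queue.extend(fk_parents.get(current, []))
    return None


# Illustrative graph only; not taken from a real dictionary.
fk_parents = {
    "refln": ["atom_type", "diffrn"],  # first FK leads to a Loop, second to a Set
    "atom_site_aniso": ["atom_site"],
    "atom_site": ["atom_type"],
}
set_tables = {"diffrn"}

anchored = nearest_set_anchor("refln", fk_parents, set_tables)
unanchored = nearest_set_anchor("atom_site_aniso", fk_parents, set_tables)
# anchored == 'diffrn'; unanchored is None (falls back to block-id grouping)
```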

Source code in src/cifflow/output/plan.py

class EmitMode(Enum):
    """Controls how the database is partitioned into CIF blocks.

    ONE_BLOCK
        All data collapsed into a single CIF block named ``'output'``.

    ALL_BLOCKS
        One CIF block per schema category, plus one block per original
        ``_cifflow_block_id`` from ``_cif_fallback``.

    ORIGINAL
        Rows are grouped into blocks by their original ``_cifflow_block_id`` value,
        reconstructing the CIF blocks as they were before ingestion.  This is
        the simple inverse of ingestion and the default.

    GROUPED
        Rows are grouped by Set-category anchor key values.  For each table
        the FK graph is searched (BFS) for the nearest Set-class ancestor:

        - If a Set is reachable, that Set is the anchor.  Tables with
          composite keys — where some FK paths lead to Loop tables and others
          lead to a Set — are correctly anchored to the Set even when the Set
          path is not the first FK in the list.
        - If no Set is reachable (the FK chain terminates at Loop tables only),
          those tables fall back to ``_cifflow_block_id`` grouping (equivalent to
          ORIGINAL for those tables).
        - Keyless Set categories (those whose primary key is ``_cifflow_id``
          rather than a domain key) carry no cross-block identity; they also
          fall back to ``_cifflow_block_id`` grouping.

        All tables that share the same Set anchor and the same anchor key
        values are emitted in a single output block, merging rows from
        multiple original data blocks that carry the same Set-level identity.
    """

    ONE_BLOCK = "one_block"
    ALL_BLOCKS = "all_blocks"
    ORIGINAL = "original"
    GROUPED = "grouped"

`BlockSpec` `dataclass`

Emission specification for a group of output blocks.

Attributes:

Name	Type	Description
`matches`	`MatchPredicate`	Predicate for block routing. Accepted forms: `None`: catch-all, matches any block. `str`: equivalent to `any_of(name)`; matches if the name is in the anchor frozenset (Set-category tables with rows). `set[str]` / `frozenset[str]`: equivalent to `all_of(*names)`; matches if every listed name is in the anchor frozenset. Two-argument callable `(anchors, tables) -> bool`: `anchors` is the frozenset of Set-category table names with rows; `tables` is the frozenset of all table names present (Set + Loop). `_Matcher`: returned by `only()`, `any_of()`, `all_of()`, `has()`; supports `.excluding()`, `\|`, `&`. First match wins across the ordered list in `OutputPlan.specs`.
`category_order`	`list[str \| list[str]]`	Categories in emission order within a block. A plain `str` names a single category. A `str` ending with `'*'` expands to that category plus all schema descendants, alphabetically. An inner `list[str]` is a merge group: compatible categories (sharing identical non-synthetic PK columns) are emitted as a single `loop_` via a FULL OUTER JOIN; incompatible categories fall back to plain loops in the listed order. Categories not listed are appended alphabetically (Set-class first) after those listed.
`single_block`	`bool`	When `False` (default), one output block is produced per unique combination of anchor key values matching this spec. When `True`, all data matching this spec is collapsed into a single output block; Set-category key columns are emitted as loop columns and FK-PK suppression does not apply. Mutually exclusive with `attach_to`.
`column_order`	`dict[str, list[str]]`	`category_name → [col_name, ...]`. Listed columns appear first within their category; remaining columns follow alphabetically.
`block_namer`	`Callable[[dict[str, list[str]]], str] \| None`	Optional per-spec block name override. Receives a dict mapping `'{category}.{object_id}'` → `[key_value, ...]` (single-element list when `single_block=False`; all values when `single_block=True`) and returns the desired block name as a plain string. Sanitization and disambiguation are still applied by the emitter. Falls back to `OutputPlan.block_namer`, then to the default construction rule.
`attach_to`	`MatchPredicate`	When set, this block is not emitted standalone. Instead its table rows are merged into the first already-resolved output block whose anchor and tables frozensets satisfy this predicate (same forms as `matches`). If no target is found, the block is emitted standalone with a `UserWarning`. Mutually exclusive with `single_block`.
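
The `column_order` rule (listed columns first, the rest alphabetically) amounts to a two-part sort. A minimal sketch, assuming that listed columns absent from the data are simply skipped; `order_columns` is a hypothetical helper.

```python
def order_columns(present, listed):
    """Listed columns first, in listed order; remaining columns alphabetically."""
    present_set = set(present)
    listed_set = set(listed)
    head = [c for c in listed if c in present_set]
    tail = sorted(c for c in present if c not in listed_set)
    return head + tail


present = ["fract_z", "label", "fract_x", "fract_y", "type_symbol"]
ordered = order_columns(present, ["label", "type_symbol"])
# ordered == ['label', 'type_symbol', 'fract_x', 'fract_y', 'fract_z']
```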

Source code in src/cifflow/output/plan.py

@dataclass
class BlockSpec:
    """Emission specification for a group of output blocks.

    Attributes
    ----------
    matches:
        Predicate for block routing.  Accepted forms:

        ``None``
            Catch-all; matches any block.
        ``str``
            Equivalent to ``any_of(name)`` — matches if the name is in the
            anchor frozenset (Set-category tables with rows).
        ``set[str]`` / ``frozenset[str]``
            Equivalent to ``all_of(*names)`` — matches if every listed name
            is in the anchor frozenset.
        Two-argument callable ``(anchors, tables) -> bool``
            *anchors* is the frozenset of Set-category table names with rows;
            *tables* is the frozenset of all table names present (Set + Loop).
        :class:`_Matcher`
            Returned by :func:`only`, :func:`any_of`, :func:`all_of`,
            :func:`has`; supports ``.excluding()``, ``|``, ``&``.

        First-match wins across the ordered list in ``OutputPlan.specs``.
    category_order:
        Categories in emission order within a block.  A plain ``str`` names a
        single category.  A ``str`` ending with ``'*'`` expands to that
        category plus all schema descendants, alphabetically.  An inner
        ``list[str]`` is a merge group: compatible categories (sharing
        identical non-synthetic PK columns) are emitted as a single
        ``loop_`` via a FULL OUTER JOIN; incompatible categories fall back to
        plain loops in the listed order.  Categories not listed are appended
        alphabetically (Set-class first) after those listed.
    single_block:
        When ``False`` (default), one output block is produced per unique
        combination of anchor key values matching this spec.  When ``True``,
        all data matching this spec is collapsed into a single output block;
        Set-category key columns are emitted as loop columns and FK-PK
        suppression does not apply.  Mutually exclusive with ``attach_to``.
    column_order:
        ``category_name → [col_name, ...]``.  Listed columns appear first
        within their category; remaining columns follow alphabetically.
    block_namer:
        Optional per-spec block name override.  Receives a dict mapping
        ``'{category}.{object_id}'`` → ``[key_value, ...]`` (single-element
        list when ``single_block=False``; all values when ``single_block=True``)
        and returns the desired block name as a plain string.  Sanitization
        and disambiguation are still applied by the emitter.  Falls back to
        ``OutputPlan.block_namer``, then to the default construction rule.
    attach_to:
        When set, this block is not emitted standalone.  Instead its table
        rows are merged into the first already-resolved output block whose
        anchor and tables frozensets satisfy this predicate (same forms as
        ``matches``).  If no target is found, the block is emitted standalone
        with a ``UserWarning``.  Mutually exclusive with ``single_block``.
    """

    matches: MatchPredicate = None
    category_order: list[str | list[str]] = field(default_factory=list)
    single_block: bool = False
    column_order: dict[str, list[str]] = field(default_factory=dict)
    block_namer: Callable[[dict[str, list[str]]], str] | None = None
    attach_to: MatchPredicate = None

    def __post_init__(self) -> None:
        """Normalise and validate fields after dataclass initialisation.

        Raises
        ------
        ValueError
            If both ``single_block=True`` and ``attach_to`` are set.
        """
        if isinstance(self.matches, str):
            self.matches = any_of(self.matches)
        elif isinstance(self.matches, (set, frozenset)):
            self.matches = all_of(*self.matches)
        if isinstance(self.attach_to, str):
            self.attach_to = any_of(self.attach_to)
        elif isinstance(self.attach_to, (set, frozenset)):
            self.attach_to = all_of(*self.attach_to)
        if self.single_block and self.attach_to is not None:
            raise ValueError("BlockSpec: 'attach_to' and 'single_block' are mutually exclusive")

`__post_init__()`

Normalise and validate fields after dataclass initialisation.

Raises:

Type	Description
`ValueError`	If both `single_block=True` and `attach_to` are set.

Source code in src/cifflow/output/plan.py

def __post_init__(self) -> None:
    """Normalise and validate fields after dataclass initialisation.

    Raises
    ------
    ValueError
        If both ``single_block=True`` and ``attach_to`` are set.
    """
    if isinstance(self.matches, str):
        self.matches = any_of(self.matches)
    elif isinstance(self.matches, (set, frozenset)):
        self.matches = all_of(*self.matches)
    if isinstance(self.attach_to, str):
        self.attach_to = any_of(self.attach_to)
    elif isinstance(self.attach_to, (set, frozenset)):
        self.attach_to = all_of(*self.attach_to)
    if self.single_block and self.attach_to is not None:
        raise ValueError("BlockSpec: 'attach_to' and 'single_block' are mutually exclusive")

`OutputPlan` `dataclass`

Optional ordering and grouping specification for `emit()`.

Attributes:

Name	Type	Description
`specs`	`list[BlockSpec]`	Ordered list of `BlockSpec` objects. For each output block the emitter evaluates specs in order and assigns the first matching spec (first-match wins). Blocks with no matching spec use default alphabetical category ordering. Emission order: all blocks assigned to `specs[0]` are emitted first, then `specs[1]`, etc. Unmatched blocks are emitted last in alphabetical order by block name. Within a single spec, multiple matching blocks are emitted in alphabetical order by block name. An empty list (default) means all blocks use default ordering.
`block_namer`	`Callable[[dict[str, list[str]]], str] \| None`	Global fallback block_namer (same signature as `BlockSpec.block_namer`) used when the matched `BlockSpec` has no `block_namer` of its own. When `None`, the default construction rule applies.
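
The emission order described for `specs` can be expressed as a single sort key. This sketch assumes each block is represented as a `(name, spec_index)` pair with `None` for unmatched blocks; that representation and the `emission_order` name are illustrative, not the emitter's internal data model.

```python
def emission_order(blocks):
    """Sort (name, spec_index) pairs: by spec index, unmatched last, then by name."""
    def key(block):
        name, spec_index = block
        unmatched = spec_index is None
        return (unmatched, 0 if unmatched else spec_index, name)

    return sorted(blocks, key=key)


blocks = [("zeta", 0), ("misc", None), ("alpha", 1), ("beta", 0), ("aux", None)]
ordered_blocks = emission_order(blocks)
# ordered_blocks == [('beta', 0), ('zeta', 0), ('alpha', 1), ('aux', None), ('misc', None)]
```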

Source code in src/cifflow/output/plan.py

@dataclass
class OutputPlan:
    """Optional ordering and grouping specification for :func:`emit`.

    Attributes
    ----------
    specs:
        Ordered list of :class:`BlockSpec` objects.  For each output block
        the emitter evaluates specs in order and assigns the first matching
        spec (first-match wins).  Blocks with no matching spec use default
        alphabetical category ordering.

        Emission order: all blocks assigned to ``specs[0]`` are emitted
        first, then ``specs[1]``, etc.  Unmatched blocks are emitted last
        in alphabetical order by block name.  Within a single spec, multiple
        matching blocks are emitted in alphabetical order by block name.

        An empty list (default) means all blocks use default ordering.
    block_namer:
        Global fallback block_namer (same signature as
        ``BlockSpec.block_namer``) used when the matched ``BlockSpec`` has no
        ``block_namer`` of its own.  When ``None``, the default construction
        rule applies.
    """

    specs: list[BlockSpec] = field(default_factory=list)
    block_namer: Callable[[dict[str, list[str]]], str] | None = None

    def match(
        self,
        anchors: frozenset[str],
        tables: frozenset[str],
    ) -> tuple[int, BlockSpec] | tuple[None, None]:
        """Return ``(index, spec)`` of the first matching spec, or ``(None, None)``.

        Parameters
        ----------
        anchors
            Frozenset of Set-category table names that have rows in the block.
        tables
            Frozenset of all table names present in the block (Set + Loop).

        Returns
        -------
        tuple[int, BlockSpec] | tuple[None, None]
            ``(index, spec)`` of the first matching spec, or ``(None, None)``
            if no spec matches.
        """
        for i, spec in enumerate(self.specs):
            if spec.matches is None or spec.matches(anchors, tables):
                return i, spec
        return None, None

`match(anchors, tables)`

Return `(index, spec)` of the first matching spec, or `(None, None)`.

Parameters:

Name	Type	Description	Default
`anchors`	`frozenset[str]`	Frozenset of Set-category table names that have rows in the block.	required
`tables`	`frozenset[str]`	Frozenset of all table names present in the block (Set + Loop).	required

Returns:

Type	Description
`tuple[int, BlockSpec] \| tuple[None, None]`	`(index, spec)` of the first matching spec, or `(None, None)` if no spec matches.

Source code in src/cifflow/output/plan.py

def match(
    self,
    anchors: frozenset[str],
    tables: frozenset[str],
) -> tuple[int, BlockSpec] | tuple[None, None]:
    """Return ``(index, spec)`` of the first matching spec, or ``(None, None)``.

    Parameters
    ----------
    anchors
        Frozenset of Set-category table names that have rows in the block.
    tables
        Frozenset of all table names present in the block (Set + Loop).

    Returns
    -------
    tuple[int, BlockSpec] | tuple[None, None]
        ``(index, spec)`` of the first matching spec, or ``(None, None)``
        if no spec matches.
    """
    for i, spec in enumerate(self.specs):
        if spec.matches is None or spec.matches(anchors, tables):
            return i, spec
    return None, None
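
First-match-wins routing is easy to see with plain callables. The sketch below reproduces the loop from `match()` on `(label, predicate)` pairs instead of real `BlockSpec` objects; the labels and category names are illustrative.

```python
def first_match(specs, anchors, tables):
    """Return (index, label) of the first spec whose predicate accepts the block."""
    for i, (label, predicate) in enumerate(specs):
        # A predicate of None is the catch-all form.
        if predicate is None or predicate(anchors, tables):
            return i, label
    return None, None


specs = [
    ("phase", lambda anchors, tables: "pd_phase" in anchors),
    ("any-loop", lambda anchors, tables: "atom_site" in tables),
    ("catch-all", None),
]

hit_phase = first_match(specs, frozenset({"pd_phase"}), frozenset({"pd_phase", "atom_site"}))
hit_loop = first_match(specs, frozenset(), frozenset({"atom_site"}))
hit_any = first_match(specs, frozenset(), frozenset())
no_hit = first_match(specs[:2], frozenset(), frozenset())
# hit_phase == (0, 'phase'); hit_loop == (1, 'any-loop')
# hit_any == (2, 'catch-all'); no_hit == (None, None)
```

Because evaluation stops at the first hit, a catch-all spec placed anywhere but last shadows every spec after it.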

`only(*categories)`

Match blocks whose anchor set is exactly the given set — no more, no less.

Source code in src/cifflow/output/plan.py

def only(*categories: str) -> _Matcher:
    """Match blocks whose anchor set is exactly the given set — no more, no less."""
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: anchors == cats)

`any_of(*categories)`

Match blocks containing at least one of `categories` in the anchor frozenset.

Source code in src/cifflow/output/plan.py

def any_of(*categories: str) -> _Matcher:
    """Match blocks containing at least one of *categories* in the anchor frozenset."""
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: bool(cats & anchors))

`all_of(*categories)`

Match blocks containing all of `categories` in the anchor frozenset.

Source code in src/cifflow/output/plan.py

def all_of(*categories: str) -> _Matcher:
    """Match blocks containing all of *categories* in the anchor frozenset."""
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: cats <= anchors)

`has(*categories)`

Match blocks containing at least one of `categories` in the full tables frozenset.

Checks the full tables frozenset (Set and Loop tables). Use this to route loop-only blocks that have no Set anchor without writing a lambda.

Source code in src/cifflow/output/plan.py

def has(*categories: str) -> _Matcher:
    """Match blocks containing at least one of *categories* in the full tables frozenset.

    Checks the full tables frozenset (Set **and** Loop tables).  Use this to
    route loop-only blocks that have no Set anchor without writing a lambda.
    """
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: bool(cats & tables))
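
The four predicate helpers differ only in which frozenset they inspect and how strictly. The sketch below re-creates them on top of a hypothetical `Matcher` class that stands in for the private `_Matcher`; its `|` and `&` behaviour is an assumption based on the description in `BlockSpec`, and `.excluding()` is left out because its semantics are not shown here.

```python
class Matcher:
    """Hypothetical stand-in for the private _Matcher class."""

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, anchors, tables):
        return self._fn(anchors, tables)

    def __or__(self, other):
        return Matcher(lambda a, t: self(a, t) or other(a, t))

    def __and__(self, other):
        return Matcher(lambda a, t: self(a, t) and other(a, t))


def only(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: anchors == cats)


def any_of(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: bool(cats & anchors))


def all_of(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: cats <= anchors)


def has(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: bool(cats & tables))


# Illustrative block: two Set anchors plus one Loop table.
anchors = frozenset({"diffrn", "pd_phase"})
tables = frozenset({"diffrn", "pd_phase", "refln"})

results = {
    "only": only("diffrn")(anchors, tables),              # extra anchor present
    "any_of": any_of("diffrn", "cell")(anchors, tables),
    "all_of": all_of("diffrn", "pd_phase")(anchors, tables),
    "any_of_loop": any_of("refln")(anchors, tables),      # refln is not an anchor
    "has_loop": has("refln")(anchors, tables),
    "or": (only("diffrn") | has("refln"))(anchors, tables),
    "and": (any_of("diffrn") & has("cell"))(anchors, tables),
}
```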

`namer(*keys, prefix='', suffix='', sep='_', fallback='?')`

Return a block_namer that builds a name from anchor key values.

Parameters:

Name	Type	Description	Default
`*keys`	`str`	Anchor key identifiers in `'{category}.{object_id}'` form. The first value of each key is extracted from the `kd` dict passed by the emitter. Keys absent from `kd` contribute `fallback`. For example, a block anchored to `diffrn` with `id='D1'` would receive `{'diffrn.id': ['D1']}`; a bridge block with both `pd_phase` and `pd_diffractogram` would receive `{'pd_diffractogram.id': ['D1'], 'pd_phase.id': ['Al2O3']}`.	`()`
`prefix`	`str`	String prepended to the result.	`''`
`suffix`	`str`	String appended to the result.	`''`
`sep`	`str`	Separator inserted between the extracted values. Default `'_'`.	`'_'`
`fallback`	`str`	Value used when a key is absent from `kd`. Default `'?'`.	`'?'`

Returns:

Type	Description
`Callable[[dict[str, list[str]]], str]`	A `block_namer` compatible with `BlockSpec` and `OutputPlan`.

Examples:

Single key with prefix:

>>> plan = OutputPlan(specs=[BlockSpec(matches='diffrn',
...                                   block_namer=namer('diffrn.id', prefix='structure_'))])
>>> plan.specs[0].block_namer({'diffrn.id': ['D1']})
'structure_D1'

Multi-key bridge block:

>>> namer('pd_phase.id', 'pd_diffractogram.id')({'pd_phase.id': ['Al2O3'], 'pd_diffractogram.id': ['D1']})
'Al2O3_D1'

Source code in src/cifflow/output/plan.py

def namer(*keys: str, prefix: str = '', suffix: str = '', sep: str = '_', fallback: str = '?') -> Callable[[dict[str, list[str]]], str]:
    """
    Return a block_namer that builds a name from anchor key values.

    Parameters
    ----------
    *keys
        Anchor key identifiers in ``'{category}.{object_id}'`` form.  The
        first value of each key is extracted from the ``kd`` dict passed by
        the emitter.  Keys absent from ``kd`` contribute *fallback*.

          For example, a block anchored to diffrn with id='D1' would receive: {'diffrn.id': ['D1']}
          A bridge block with both pd_phase and pd_diffractogram: {'pd_diffractogram.id': ['D1'], 'pd_phase.id': ['Al2O3']}
    prefix
        String prepended to the result.
    suffix
        String appended to the result.
    sep
        Separator inserted between the extracted values.  Default ``'_'``.
    fallback
        Value used when a key is absent from ``kd``.  Default ``'?'``.

    Returns
    -------
    Callable[[dict[str, list[str]]], str]
        A ``block_namer`` compatible with :class:`BlockSpec` and
        :class:`OutputPlan`.

    Examples
    --------
    Single key with prefix:

    >>> plan = OutputPlan(specs=[BlockSpec(matches='diffrn',
    ...                                   block_namer=namer('diffrn.id', prefix='structure_'))])
    >>> plan.specs[0].block_namer({'diffrn.id': ['D1']})
    'structure_D1'

    Multi-key bridge block:

    >>> namer('pd_phase.id', 'pd_diffractogram.id')({'pd_phase.id': ['Al2O3'], 'pd_diffractogram.id': ['D1']})
    'Al2O3_D1'
    """
    def _fn(kd: dict[str, list[str]]) -> str:
        parts = [kd.get(k, [fallback])[0] for k in keys]
        return prefix + sep.join(parts) + suffix
    return _fn
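
The fallback and first-value behaviour of `namer()` is easiest to see by calling the returned function directly. The block below repeats the function body from the listing above so that it runs on its own; the key names and values are illustrative.

```python
def namer(*keys, prefix="", suffix="", sep="_", fallback="?"):
    """Copy of the listed implementation, repeated here for a standalone run."""
    def _fn(kd):
        parts = [kd.get(k, [fallback])[0] for k in keys]
        return prefix + sep.join(parts) + suffix
    return _fn


# A key missing from kd contributes the fallback string.
missing = namer("pd_phase.id", "pd_diffractogram.id", sep="-")({"pd_phase.id": ["Al2O3"]})

# Only the first value of each key is used, even when several are present.
first_only = namer("diffrn.id", suffix="_all")({"diffrn.id": ["D1", "D2"]})

# missing == 'Al2O3-?'; first_only == 'D1_all'
```

Note that a key present in `kd` with an empty list would raise `IndexError` in this implementation, since the fallback applies only to absent keys.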

`cifflow.output.quote`

Value quoting for CIF output.

`quote(stored, version)` converts a value as stored in the DuckDB database back to a valid CIF token, selecting the least-restrictive delimiter that produces a correctly round-trippable result.

Storage encoding (from `ingest.encode_value`):

- PLACEHOLDER `.` / `?` → stored as `.` / `?` (length 1)
- Quoted `.` / `?` → stored as `"."` / `"?"` (length 3)
- Container (list / table) → stored as JSON text (CIF 2.0 only)
- Everything else → stored as raw string

`quote(stored, version)`

Return a valid CIF token for `stored`, suitable for the given `version`.

Parameters:

Name	Type	Description	Default
`stored`	`str`	The value as retrieved from the DuckDB database. Presence-state encoding from `encode_value` is decoded here: `'.'` or `'?'` (length 1) → PLACEHOLDER → returned unquoted. `'"."'` or `'"?"'` (length 3) → quoted dot/question-mark → the inner character is re-quoted as a regular string. All other values pass through the full quoting decision tree.	required
`version`	`CifVersion`	`CifVersion.CIF_2_0` or `CifVersion.CIF_1_1`. Controls which delimiter types are available (triple-quoted strings are CIF 2.0 only).	required

Returns:

Type	Description
`str`	A valid CIF token. Semicolon-delimited tokens begin with `'\n'` so the caller can distinguish them from inline tokens.

Source code in src/cifflow/output/quote.py

def quote(stored: str, version: CifVersion) -> str:
    r"""Return a valid CIF token for *stored*, suitable for the given *version*.

    Parameters
    ----------
    stored:
        The value as retrieved from the DuckDB database.  Presence-state
        encoding from ``encode_value`` is decoded here:

        - ``'.'`` or ``'?'`` (length 1) → PLACEHOLDER → returned unquoted.
        - ``'"."'`` or ``'"?"'`` (length 3) → quoted dot/question-mark →
          the inner character is re-quoted as a regular string.
        - All other values pass through the full quoting decision tree.

    version:
        ``CifVersion.CIF_2_0`` or ``CifVersion.CIF_1_1``.  Controls which
        delimiter types are available (triple-quoted strings are CIF 2.0 only).

    Returns
    -------
    str
        A valid CIF token.  Semicolon-delimited tokens begin with ``'\n'``
        so the caller can distinguish them from inline tokens.
    """
    if stored in ('.', '?'):
        return stored                          # PLACEHOLDER — always unquoted
    if stored in ('"."', '"?"'):
        return _quote_string(stored[1], version)   # quoted dot/question-mark
    if version == CifVersion.CIF_2_0 and stored.startswith(_CONTAINER_PREFIX):
        return _format_container(decode_container(stored), version)
    return _quote_string(stored, version)
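
The presence-state decoding at the top of `quote()` distinguishes a bare placeholder from a quoted dot or question mark purely by the stored string. A minimal sketch of that first decision step; `presence_state` and its return labels are hypothetical, and container handling and delimiter selection are deliberately left out.

```python
def presence_state(stored):
    """Classify a stored value the way the first two branches of quote() do."""
    if stored in (".", "?"):
        # Length-1 dot or question mark: a CIF placeholder, emitted unquoted.
        return ("placeholder", stored)
    if stored in ('"."', '"?"'):
        # Length-3 form: a literal '.' or '?' string that must be re-quoted.
        return ("literal", stored[1])
    return ("value", stored)


states = [presence_state(s) for s in (".", '"?"', "1.23")]
# states == [('placeholder', '.'), ('literal', '?'), ('value', '1.23')]
```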

`make_text_field(s, line_limit=None)`

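Produce a semicolon-delimited CIF text field for `s`.

Selects the correct wire format based on content requirements:

needs_prefix	needs_fold	Format used
`False`	`False`	plain semicolon
`True`	`False`	prefix-only semicolon
`False`	`True`	fold-only semicolon
`True`	`True`	prefix + fold semicolon

`needs_prefix` is `True` when `s` contains `'\n;'`, which would otherwise prematurely terminate the field.

`needs_fold` is `True` when `line_limit` is given and at least one content line in the text field would produce a physical line exceeding `line_limit` characters.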

Valid for both CIF 1.1 and CIF 2.0 (semicolon fields exist in both).

Source code in src/cifflow/output/quote.py

def make_text_field(s: str, line_limit: int | None = None) -> str:
    r"""Produce a semicolon-delimited CIF text field for *s*.

    Selects the correct wire format based on content requirements:

    +--------------+-------------+-----------------------------+
    | needs_prefix | needs_fold  | format used                 |
    +==============+=============+=============================+
    | False        | False       | plain semicolon             |
    | True         | False       | prefix-only semicolon       |
    | False        | True        | fold-only semicolon         |
    | True         | True        | prefix + fold semicolon     |
    +--------------+-------------+-----------------------------+

    *needs_prefix* is ``True`` when *s* contains ``'\n;'``, which would
    otherwise prematurely terminate the field.

    *needs_fold* is ``True`` when *line_limit* is given and at least one
    content line in the text field would produce a physical line exceeding
    *line_limit* characters.

    Valid for both CIF 1.1 and CIF 2.0 (semicolon fields exist in both).
    """
    needs_prefix = '\n;' in s
    needs_fold = False
    if line_limit is not None:
        if needs_prefix:
            # Physical line = '{_PREFIX}{content}', so content must fit in
            # line_limit - len(_PREFIX) chars.
            needs_fold = any(
                len(line) > line_limit - len(_PREFIX) for line in s.split('\n')
            )
        else:
            needs_fold = any(len(line) > line_limit for line in s.split('\n'))

    if needs_prefix and needs_fold:
        return _make_prefixed_folded_semicolon(s, line_limit)
    if needs_prefix:
        return _make_prefixed_semicolon(s)
    if needs_fold:
        return _make_folded_semicolon(s, line_limit)
    return _make_semicolon(s)
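
The format selection in `make_text_field()` depends on two flags only. The sketch below computes them the same way as the listing above and returns the name of the chosen format instead of building the field. `text_field_format` is a hypothetical helper, and `prefix_len=2` is an assumed value because the real `_PREFIX` constant is not shown on this page.

```python
def text_field_format(s, line_limit=None, prefix_len=2):
    """Name the wire format that would be chosen for s (assumed prefix length)."""
    needs_prefix = "\n;" in s
    needs_fold = False
    if line_limit is not None:
        # With a prefix, each content line has prefix_len fewer characters available.
        budget = line_limit - prefix_len if needs_prefix else line_limit
        needs_fold = any(len(line) > budget for line in s.split("\n"))

    if needs_prefix and needs_fold:
        return "prefix + fold semicolon"
    if needs_prefix:
        return "prefix-only semicolon"
    if needs_fold:
        return "fold-only semicolon"
    return "plain semicolon"


formats = [
    text_field_format("short text"),
    text_field_format("line one\n;line two"),
    text_field_format("x" * 50, line_limit=40),
    text_field_format("x" * 50 + "\n;tail", line_limit=40),
]
```

With `line_limit=None`, folding is never selected, so only the plain and prefix-only formats are possible.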
