Output

cifflow.output.emit

CIF emission from a populated DuckDB database.

emit(conn, schema, ...) reads structured tables and the _cif_fallback table and produces a valid CIF string.

Assumption: by emission time, all data in the database belongs to a single coherent dataset. Namespace conflicts (e.g. short identifiers from unrelated sources) are neither detected nor resolved by the output layer.

emit(conn, schema, *, mode=EmitMode.ORIGINAL, version=CifVersion.CIF_2_0, plan=None, reconstruct_su=False, emit_defaults=True, line_ending='\n', pretty=True, line_limit=2048)

Emit CIF text from a populated DuckDB database.

Parameters:

Name Type Description Default
conn DuckDBPyConnection

Open duckdb.DuckDBPyConnection populated by ingest(). Read-only during emission.

required
schema SchemaSpec

The SchemaSpec used when the database was ingested.

required
mode EmitMode

How the database is partitioned into CIF blocks.

ORIGINAL
version CifVersion

CIF version to emit. Controls quoting strategy.

CIF_2_0
plan OutputPlan | None

Optional ordering and grouping specification. None uses default ordering.

None
reconstruct_su bool

When True, paired (col, col_su) columns are merged into a single value(su) token. Default False.

False
emit_defaults bool

When True (default), columns filled from enumeration_default are emitted normally. When False, they would be suppressed; this requires per-value provenance tracking which is not yet implemented, so the flag is currently accepted but has no effect.

True
line_ending str

Line terminator sequence written between every line and at the end of the output. Use '\n' (default, Unix LF), '\r\n' (Windows CRLF), or '\r' (legacy CR). The line-length limit (line_limit, 2048 by default) is measured on content before line endings are applied.

'\n'
pretty bool

When True (default), tag–value pairs are column-aligned within each Set category and loop column values are padded to the widest value in that column. When False, output is compact (two spaces between tag and value / between tokens) — recommended for very large loop tables where the alignment pass would be expensive.

True
line_limit int | None

Maximum physical line length (in characters, before line endings are applied). Default 2048. Use None to disable. Values below 40 are accepted but emit a UserWarning; very small limits may produce degenerate output for long tokens.

When a content line inside a semicolon-delimited text field exceeds line_limit, the CIF 2.0 line-folding protocol (§5.3) is applied. When '\n;' is also present in the value, the text-prefix protocol (§5.2) is combined with folding.

Inline scalar values whose formatted line (tag + separator + token) would exceed line_limit are converted to semicolon-delimited fields.

Loop data rows that exceed line_limit are wrapped across multiple physical lines using greedy token packing (tokens cannot be split).

CIF 1.1 block codes, data names, and frame codes are independently limited to 75 characters by the CIF 1.1 specification; an exception is raised if this limit would be violated.

2048

Returns:

Type Description
str

Complete CIF text including magic line, terminated with line_ending.
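
The loop-row wrapping described for line_limit can be pictured with a small, self-contained sketch. It is not the emitter's code: the helper name `pack_tokens`, the two-space separator, and the sample row are assumptions made for illustration, and tokens that are themselves semicolon-delimited fields are not handled.

```python
from __future__ import annotations


def pack_tokens(tokens: list[str], line_limit: int | None, sep: str = "  ") -> list[str]:
    """Greedily pack whole tokens into physical lines of at most line_limit chars.

    Hypothetical helper: tokens are never split, so a single token longer
    than line_limit still occupies a line of its own.
    """
    if line_limit is None:
        return [sep.join(tokens)]
    lines: list[str] = []
    current = ""
    for tok in tokens:
        candidate = tok if not current else current + sep + tok
        if current and len(candidate) > line_limit:
            lines.append(current)  # the current line is full; close it
            current = tok          # the token starts the next physical line
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


row = ["Fe1", "0.12345(6)", "0.25000", "0.98765(4)", "Uani"]
wrapped = pack_tokens(row, line_limit=24)
# wrapped == ['Fe1  0.12345(6)  0.25000', '0.98765(4)  Uani']
```

Greedy packing keeps each physical line as full as possible without look-ahead, which is cheap but may leave a short trailing line.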

Source code in src/cifflow/output/emit.py
def emit(
    conn: duckdb.DuckDBPyConnection,
    schema: SchemaSpec,
    *,
    mode: EmitMode = EmitMode.ORIGINAL,
    version: CifVersion = CifVersion.CIF_2_0,
    plan: OutputPlan | None = None,
    reconstruct_su: bool = False,
    emit_defaults: bool = True,
    line_ending: str = '\n',
    pretty: bool = True,
    line_limit: int | None = 2048,
) -> str:
    r"""Emit CIF text from a populated DuckDB database.

    Parameters
    ----------
    conn:
        Open ``duckdb.DuckDBPyConnection`` populated by ``ingest()``.  Read-only
        during emission.
    schema:
        The ``SchemaSpec`` used when the database was ingested.
    mode:
        How the database is partitioned into CIF blocks.
    version:
        CIF version to emit.  Controls quoting strategy.
    plan:
        Optional ordering and grouping specification.  ``None`` uses default
        ordering.
    reconstruct_su:
        When ``True``, paired ``(col, col_su)`` columns are merged into a
        single ``value(su)`` token.  Default ``False``.
    emit_defaults:
        When ``True`` (default), columns filled from ``enumeration_default``
        are emitted normally.  When ``False``, they would be suppressed; this
        requires per-value provenance tracking which is not yet implemented,
        so the flag is currently accepted but has no effect.
    line_ending:
        Line terminator sequence written between every line and at the end of
        the output.  Use ``'\n'`` (default, Unix LF), ``'\r\n'`` (Windows
        CRLF), or ``'\r'`` (legacy CR).  The line-length limit (*line_limit*,
        2048 by default) is measured on content before line endings are applied.
    pretty:
        When ``True`` (default), tag–value pairs are column-aligned within
        each Set category and loop column values are padded to the widest
        value in that column.  When ``False``, output is compact (two spaces
        between tag and value / between tokens) — recommended for very large
        loop tables where the alignment pass would be expensive.
    line_limit:
        Maximum physical line length (in characters, before line endings are
        applied).  Default ``2048``.  Use ``None`` to disable.  Values below
        ``40`` are accepted but emit a ``UserWarning``; very small limits may
        produce degenerate output for long tokens.

        When a content line inside a semicolon-delimited text field exceeds
        *line_limit*, the CIF 2.0 line-folding protocol (§5.3) is applied.
        When ``'\n;'`` is also present in the value, the text-prefix protocol
        (§5.2) is combined with folding.

        Inline scalar values whose formatted line (tag + separator + token)
        would exceed *line_limit* are converted to semicolon-delimited fields.

        Loop data rows that exceed *line_limit* are wrapped across multiple
        physical lines using greedy token packing (tokens cannot be split).

        CIF 1.1 block codes, data names, and frame codes are independently
        limited to 75 characters by the CIF 1.1 specification; an exception
        is raised if this limit would be violated.

    Returns
    -------
    str
        Complete CIF text including magic line, terminated with ``line_ending``.
    """
    if line_limit is not None and line_limit < 40:
        _warnings.warn(
            f'line_limit={line_limit} is very small; output may be degenerate for long tokens',
            UserWarning,
            stacklevel=2,
        )

    magic = '#\\#CIF_2.0' if version == CifVersion.CIF_2_0 else '#\\#CIF_1.1'

    if mode == EmitMode.ONE_BLOCK:
        raw_blocks = _collect_one_block(conn, schema)
    elif mode == EmitMode.ALL_BLOCKS:
        raw_blocks = _collect_all_blocks(conn, schema, version, plan)
    elif mode == EmitMode.GROUPED:
        raw_blocks = _collect_grouped(conn, schema, version)
    else:  # ORIGINAL
        raw_blocks = _collect_original(conn, schema)

    if mode == EmitMode.ALL_BLOCKS:
        plan_spec = plan.specs[0] if plan and plan.specs else None
        ordered = [(b, plan_spec) for b in raw_blocks]
    elif mode == EmitMode.ORIGINAL:
        if plan is not None:
            _warnings.warn(
                'OutputPlan is ignored in ORIGINAL mode; use GROUPED mode for custom ordering.',
                UserWarning,
                stacklevel=2,
            )
        ordered = [(b, None) for b in raw_blocks]
    else:
        ordered = _sort_and_merge(raw_blocks, plan)

    # Disambiguate block names; collect all output lines flat.
    used_names: dict[str, int] = {}
    lines = [magic]
    for i, (data, spec) in enumerate(ordered):
        base = data.name
        count = used_names.get(base, 0) + 1
        used_names[base] = count
        name = f'{base}_{count}' if count > 1 else base

        if i > 0:
            lines.append('')
            lines.append('')
        lines.extend(_render_block(name, data, schema, version, spec, reconstruct_su, pretty, line_limit))

    return line_ending.join(lines) + line_ending
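
The block-name disambiguation and final join at the end of emit() can be reproduced in isolation. This is a minimal sketch under stated assumptions: `disambiguate` is a hypothetical helper that mirrors the counter logic above, and the `data_<name>` header stands in for whatever `_render_block` actually produces.

```python
def disambiguate(names: list[str]) -> list[str]:
    """Append _2, _3, ... to repeated block names, mirroring the counter above."""
    used: dict[str, int] = {}
    result = []
    for base in names:
        count = used.get(base, 0) + 1
        used[base] = count
        result.append(f"{base}_{count}" if count > 1 else base)
    return result


names = disambiguate(["sample", "phase", "sample", "sample"])
# names == ['sample', 'phase', 'sample_2', 'sample_3']

# Assemble a skeleton document: magic line, then blocks separated by two blank lines.
lines = ["#\\#CIF_2.0"]
for i, name in enumerate(names):
    if i > 0:
        lines += ["", ""]
    lines.append(f"data_{name}")  # assumed block header; real blocks carry data

line_ending = "\r\n"
text = line_ending.join(lines) + line_ending
```

The counter scheme shown does not check whether a generated name such as sample_2 collides with a block that already carries that name.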

cifflow.output.plan

Output plan dataclasses and EmitMode enum.

EmitMode

Bases: Enum

Controls how the database is partitioned into CIF blocks.

ONE_BLOCK All data collapsed into a single CIF block named 'output'.

ALL_BLOCKS One CIF block per schema category, plus one block per original _cifflow_block_id from _cif_fallback.

ORIGINAL Rows are grouped into blocks by their original _cifflow_block_id value, reconstructing the CIF blocks as they were before ingestion. This is the simple inverse of ingestion and the default.

GROUPED Rows are grouped by Set-category anchor key values. For each table the FK graph is searched (BFS) for the nearest Set-class ancestor:

- If a Set is reachable, that Set is the anchor. Tables with composite keys (where some FK paths lead to Loop tables and others lead to a Set) are anchored to the Set even when the Set path is not the first FK in the list.
- If no Set is reachable (the FK chain terminates at Loop tables only), those tables fall back to _cifflow_block_id grouping (equivalent to ORIGINAL for those tables).
- Keyless Set categories (those whose primary key is _cifflow_id rather than a domain key) carry no cross-block identity; they also fall back to _cifflow_block_id grouping.

All tables that share the same Set anchor and the same anchor key values are emitted in a single output block, merging rows from multiple original data blocks that carry the same Set-level identity.
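
The anchor search used by GROUPED can be sketched as a breadth-first walk over foreign-key edges. Everything here is illustrative: `nearest_set_anchor`, the `fk_parents` mapping, and the table names are hypothetical, and the real implementation may represent the FK graph differently.

```python
from collections import deque


def nearest_set_anchor(table, fk_parents, set_tables):
    """Return the nearest Set-class ancestor of `table`, or None if unreachable.

    fk_parents maps a table name to the tables its foreign keys point at,
    in declaration order (assumed representation).
    """
    queue = deque(fk_parents.get(table, []))
    seen = {table}
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        if parent in set_tables:
            return parent  # BFS guarantees no closer Set exists
        queue.extend(fk_parents.get(parent, []))
    return None


fk_parents = {
    "peak": ["peak_list", "diffrn"],  # composite key: Loop path first, Set path second
    "peak_list": ["scan"],
    "scan": [],
    "orphan": ["scan"],               # only Loop tables upstream
}
set_tables = {"diffrn"}

anchor_for_peak = nearest_set_anchor("peak", fk_parents, set_tables)      # 'diffrn'
anchor_for_orphan = nearest_set_anchor("orphan", fk_parents, set_tables)  # None
```

A `None` result corresponds to the fallback case, where rows are grouped by _cifflow_block_id instead.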
Source code in src/cifflow/output/plan.py
class EmitMode(Enum):
    """Controls how the database is partitioned into CIF blocks.

    ONE_BLOCK
        All data collapsed into a single CIF block named ``'output'``.

    ALL_BLOCKS
        One CIF block per schema category, plus one block per original
        ``_cifflow_block_id`` from ``_cif_fallback``.

    ORIGINAL
        Rows are grouped into blocks by their original ``_cifflow_block_id`` value,
        reconstructing the CIF blocks as they were before ingestion.  This is
        the simple inverse of ingestion and the default.

    GROUPED
        Rows are grouped by Set-category anchor key values.  For each table
        the FK graph is searched (BFS) for the nearest Set-class ancestor:

        - If a Set is reachable, that Set is the anchor.  Tables with
          composite keys — where some FK paths lead to Loop tables and others
          lead to a Set — are correctly anchored to the Set even when the Set
          path is not the first FK in the list.
        - If no Set is reachable (the FK chain terminates at Loop tables only),
          those tables fall back to ``_cifflow_block_id`` grouping (equivalent to
          ORIGINAL for those tables).
        - Keyless Set categories (those whose primary key is ``_cifflow_id``
          rather than a domain key) carry no cross-block identity; they also
          fall back to ``_cifflow_block_id`` grouping.

        All tables that share the same Set anchor and the same anchor key
        values are emitted in a single output block, merging rows from
        multiple original data blocks that carry the same Set-level identity.
    """

    ONE_BLOCK = "one_block"
    ALL_BLOCKS = "all_blocks"
    ORIGINAL = "original"
    GROUPED = "grouped"

BlockSpec dataclass

Emission specification for a group of output blocks.

Attributes:

Name Type Description
matches MatchPredicate

Predicate for block routing. Accepted forms:

- None: Catch-all; matches any block.
- str: Equivalent to any_of(name); matches if the name is in the anchor frozenset (Set-category tables with rows).
- set[str] / frozenset[str]: Equivalent to all_of(*names); matches if every listed name is in the anchor frozenset.
- Two-argument callable (anchors, tables) -> bool: anchors is the frozenset of Set-category table names with rows; tables is the frozenset of all table names present (Set + Loop).
- _Matcher: Returned by only, any_of, all_of, has; supports .excluding(), |, &.

First-match wins across the ordered list in OutputPlan.specs.

category_order list[str | list[str]]

Categories in emission order within a block. A plain str names a single category. A str ending with '*' expands to that category plus all schema descendants, alphabetically. An inner list[str] is a merge group: compatible categories (sharing identical non-synthetic PK columns) are emitted as a single loop_ via a FULL OUTER JOIN; incompatible categories fall back to plain loops in the listed order. Categories not listed are appended alphabetically (Set-class first) after those listed.

single_block bool

When False (default), one output block is produced per unique combination of anchor key values matching this spec. When True, all data matching this spec is collapsed into a single output block; Set-category key columns are emitted as loop columns and FK-PK suppression does not apply. Mutually exclusive with attach_to.

column_order dict[str, list[str]]

category_name → [col_name, ...]. Listed columns appear first within their category; remaining columns follow alphabetically.

block_namer Callable[[dict[str, list[str]]], str] | None

Optional per-spec block name override. Receives a dict mapping '{category}.{object_id}' → [key_value, ...] (single-element list when single_block=False; all values when single_block=True) and returns the desired block name as a plain string. Sanitization and disambiguation are still applied by the emitter. Falls back to OutputPlan.block_namer, then to the default construction rule.

attach_to MatchPredicate

When set, this block is not emitted standalone. Instead its table rows are merged into the first already-resolved output block whose anchor and tables frozensets satisfy this predicate (same forms as matches). If no target is found, the block is emitted standalone with a UserWarning. Mutually exclusive with single_block.
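
As an illustration of the column_order rule (listed columns first, the rest alphabetically), here is a small sketch. The helper `order_columns` and the column names are hypothetical, and skipping listed columns that are absent from the table is an assumption the attribute description does not spell out.

```python
def order_columns(present: list[str], listed: list[str]) -> list[str]:
    """Listed columns first (in listed order), remaining columns alphabetically."""
    present_set = set(present)
    # dict.fromkeys drops accidental duplicates while preserving order.
    head = [c for c in dict.fromkeys(listed) if c in present_set]
    head_set = set(head)
    tail = sorted(c for c in present if c not in head_set)
    return head + tail


column_order = {"atom_site": ["label", "type_symbol"]}
cols = order_columns(
    ["fract_z", "label", "fract_x", "type_symbol", "fract_y"],
    column_order.get("atom_site", []),
)
# cols == ['label', 'type_symbol', 'fract_x', 'fract_y', 'fract_z']
```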

Source code in src/cifflow/output/plan.py
@dataclass
class BlockSpec:
    """Emission specification for a group of output blocks.

    Attributes
    ----------
    matches:
        Predicate for block routing.  Accepted forms:

        ``None``
            Catch-all; matches any block.
        ``str``
            Equivalent to ``any_of(name)`` — matches if the name is in the
            anchor frozenset (Set-category tables with rows).
        ``set[str]`` / ``frozenset[str]``
            Equivalent to ``all_of(*names)`` — matches if every listed name
            is in the anchor frozenset.
        Two-argument callable ``(anchors, tables) -> bool``
            *anchors* is the frozenset of Set-category table names with rows;
            *tables* is the frozenset of all table names present (Set + Loop).
        :class:`_Matcher`
            Returned by :func:`only`, :func:`any_of`, :func:`all_of`,
            :func:`has`; supports ``.excluding()``, ``|``, ``&``.

        First-match wins across the ordered list in ``OutputPlan.specs``.
    category_order:
        Categories in emission order within a block.  A plain ``str`` names a
        single category.  A ``str`` ending with ``'*'`` expands to that
        category plus all schema descendants, alphabetically.  An inner
        ``list[str]`` is a merge group: compatible categories (sharing
        identical non-synthetic PK columns) are emitted as a single
        ``loop_`` via a FULL OUTER JOIN; incompatible categories fall back to
        plain loops in the listed order.  Categories not listed are appended
        alphabetically (Set-class first) after those listed.
    single_block:
        When ``False`` (default), one output block is produced per unique
        combination of anchor key values matching this spec.  When ``True``,
        all data matching this spec is collapsed into a single output block;
        Set-category key columns are emitted as loop columns and FK-PK
        suppression does not apply.  Mutually exclusive with ``attach_to``.
    column_order:
        ``category_name → [col_name, ...]``.  Listed columns appear first
        within their category; remaining columns follow alphabetically.
    block_namer:
        Optional per-spec block name override.  Receives a dict mapping
        ``'{category}.{object_id}'`` → ``[key_value, ...]`` (single-element
        list when ``single_block=False``; all values when ``single_block=True``)
        and returns the desired block name as a plain string.  Sanitization
        and disambiguation are still applied by the emitter.  Falls back to
        ``OutputPlan.block_namer``, then to the default construction rule.
    attach_to:
        When set, this block is not emitted standalone.  Instead its table
        rows are merged into the first already-resolved output block whose
        anchor and tables frozensets satisfy this predicate (same forms as
        ``matches``).  If no target is found, the block is emitted standalone
        with a ``UserWarning``.  Mutually exclusive with ``single_block``.
    """

    matches: MatchPredicate = None
    category_order: list[str | list[str]] = field(default_factory=list)
    single_block: bool = False
    column_order: dict[str, list[str]] = field(default_factory=dict)
    block_namer: Callable[[dict[str, list[str]]], str] | None = None
    attach_to: MatchPredicate = None

    def __post_init__(self) -> None:
        """Normalise and validate fields after dataclass initialisation.

        Raises
        ------
        ValueError
            If both ``single_block=True`` and ``attach_to`` are set.
        """
        if isinstance(self.matches, str):
            self.matches = any_of(self.matches)
        elif isinstance(self.matches, (set, frozenset)):
            self.matches = all_of(*self.matches)
        if isinstance(self.attach_to, str):
            self.attach_to = any_of(self.attach_to)
        elif isinstance(self.attach_to, (set, frozenset)):
            self.attach_to = all_of(*self.attach_to)
        if self.single_block and self.attach_to is not None:
            raise ValueError("BlockSpec: 'attach_to' and 'single_block' are mutually exclusive")

__post_init__()

Normalise and validate fields after dataclass initialisation.

Raises:

Type Description
ValueError

If both single_block=True and attach_to are set.

Source code in src/cifflow/output/plan.py
def __post_init__(self) -> None:
    """Normalise and validate fields after dataclass initialisation.

    Raises
    ------
    ValueError
        If both ``single_block=True`` and ``attach_to`` are set.
    """
    if isinstance(self.matches, str):
        self.matches = any_of(self.matches)
    elif isinstance(self.matches, (set, frozenset)):
        self.matches = all_of(*self.matches)
    if isinstance(self.attach_to, str):
        self.attach_to = any_of(self.attach_to)
    elif isinstance(self.attach_to, (set, frozenset)):
        self.attach_to = all_of(*self.attach_to)
    if self.single_block and self.attach_to is not None:
        raise ValueError("BlockSpec: 'attach_to' and 'single_block' are mutually exclusive")
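
The normalisation and validation performed in __post_init__ can be tried out with a stand-in dataclass. `MiniSpec` and `_normalise` are hypothetical names; the sketch uses plain lambdas where the real class builds matcher objects via any_of / all_of.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


def _normalise(pred: Any) -> Any:
    """Turn the str / set shorthand into a two-argument predicate; keep the rest."""
    if isinstance(pred, str):
        return lambda anchors, tables: pred in anchors           # any_of(name)
    if isinstance(pred, (set, frozenset)):
        wanted = frozenset(pred)
        return lambda anchors, tables: wanted <= anchors         # all_of(*names)
    return pred  # None or an existing callable is left untouched


@dataclass
class MiniSpec:
    """Stand-in for BlockSpec carrying only the fields validated here."""

    matches: Any = None
    single_block: bool = False
    attach_to: Any = None

    def __post_init__(self) -> None:
        self.matches = _normalise(self.matches)
        self.attach_to = _normalise(self.attach_to)
        if self.single_block and self.attach_to is not None:
            raise ValueError("'attach_to' and 'single_block' are mutually exclusive")


spec = MiniSpec(matches={"diffrn", "pd_phase"})
anchors = frozenset({"diffrn", "pd_phase", "cell"})
matched = spec.matches(anchors, anchors)  # True: both names are anchors

try:
    MiniSpec(single_block=True, attach_to="diffrn")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```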

OutputPlan dataclass

Optional ordering and grouping specification for emit().

Attributes:

Name Type Description
specs list[BlockSpec]

Ordered list of BlockSpec objects. For each output block the emitter evaluates specs in order and assigns the first matching spec (first-match wins). Blocks with no matching spec use default alphabetical category ordering.

Emission order: all blocks assigned to specs[0] are emitted first, then specs[1], etc. Unmatched blocks are emitted last in alphabetical order by block name. Within a single spec, multiple matching blocks are emitted in alphabetical order by block name.

An empty list (default) means all blocks use default ordering.

block_namer Callable[[dict[str, list[str]]], str] | None

Global fallback block_namer (same signature as BlockSpec.block_namer) used when the matched BlockSpec has no block_namer of its own. When None, the default construction rule applies.
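
The emission order described for specs can be expressed as a sort key. This is a sketch under stated assumptions: `emission_order` is a hypothetical helper, blocks are modelled as (name, anchors, tables) tuples, specs as bare predicates, and "alphabetical" is taken to mean plain string ordering.

```python
def emission_order(blocks, specs):
    """Order block names by index of first matching spec, then alphabetically.

    blocks: list of (name, anchors, tables); specs: list of predicates or None.
    Unmatched blocks sort after every matched block.
    """
    def first_match(anchors, tables):
        for i, pred in enumerate(specs):
            if pred is None or pred(anchors, tables):
                return i  # first-match wins
        return None

    keyed = []
    for name, anchors, tables in blocks:
        idx = first_match(anchors, tables)
        keyed.append((len(specs) if idx is None else idx, name))
    return [name for _, name in sorted(keyed)]


specs = [
    lambda anchors, tables: "pd_phase" in anchors,
    lambda anchors, tables: "diffrn" in anchors,
]
blocks = [
    ("zeta", frozenset({"diffrn"}), frozenset({"diffrn"})),
    ("misc", frozenset(), frozenset({"refln"})),
    ("beta", frozenset({"pd_phase"}), frozenset({"pd_phase"})),
    ("alpha", frozenset({"pd_phase", "diffrn"}), frozenset({"pd_phase", "diffrn"})),
    ("aux", frozenset(), frozenset({"refln"})),
]
order = emission_order(blocks, specs)
# order == ['alpha', 'beta', 'zeta', 'aux', 'misc']
```

Note that "alpha" lands under the first spec even though it also satisfies the second, because evaluation stops at the first match.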

Source code in src/cifflow/output/plan.py
@dataclass
class OutputPlan:
    """Optional ordering and grouping specification for :func:`emit`.

    Attributes
    ----------
    specs:
        Ordered list of :class:`BlockSpec` objects.  For each output block
        the emitter evaluates specs in order and assigns the first matching
        spec (first-match wins).  Blocks with no matching spec use default
        alphabetical category ordering.

        Emission order: all blocks assigned to ``specs[0]`` are emitted
        first, then ``specs[1]``, etc.  Unmatched blocks are emitted last
        in alphabetical order by block name.  Within a single spec, multiple
        matching blocks are emitted in alphabetical order by block name.

        An empty list (default) means all blocks use default ordering.
    block_namer:
        Global fallback block_namer (same signature as
        ``BlockSpec.block_namer``) used when the matched ``BlockSpec`` has no
        ``block_namer`` of its own.  When ``None``, the default construction
        rule applies.
    """

    specs: list[BlockSpec] = field(default_factory=list)
    block_namer: Callable[[dict[str, list[str]]], str] | None = None

    def match(
        self,
        anchors: frozenset[str],
        tables: frozenset[str],
    ) -> tuple[int, BlockSpec] | tuple[None, None]:
        """Return ``(index, spec)`` of the first matching spec, or ``(None, None)``.

        Parameters
        ----------
        anchors
            Frozenset of Set-category table names that have rows in the block.
        tables
            Frozenset of all table names present in the block (Set + Loop).

        Returns
        -------
        tuple[int, BlockSpec] | tuple[None, None]
            ``(index, spec)`` of the first matching spec, or ``(None, None)``
            if no spec matches.
        """
        for i, spec in enumerate(self.specs):
            if spec.matches is None or spec.matches(anchors, tables):
                return i, spec
        return None, None

match(anchors, tables)

Return (index, spec) of the first matching spec, or (None, None).

Parameters:

Name Type Description Default
anchors frozenset[str]

Frozenset of Set-category table names that have rows in the block.

required
tables frozenset[str]

Frozenset of all table names present in the block (Set + Loop).

required

Returns:

Type Description
tuple[int, BlockSpec] | tuple[None, None]

(index, spec) of the first matching spec, or (None, None) if no spec matches.

Source code in src/cifflow/output/plan.py
def match(
    self,
    anchors: frozenset[str],
    tables: frozenset[str],
) -> tuple[int, BlockSpec] | tuple[None, None]:
    """Return ``(index, spec)`` of the first matching spec, or ``(None, None)``.

    Parameters
    ----------
    anchors
        Frozenset of Set-category table names that have rows in the block.
    tables
        Frozenset of all table names present in the block (Set + Loop).

    Returns
    -------
    tuple[int, BlockSpec] | tuple[None, None]
        ``(index, spec)`` of the first matching spec, or ``(None, None)``
        if no spec matches.
    """
    for i, spec in enumerate(self.specs):
        if spec.matches is None or spec.matches(anchors, tables):
            return i, spec
    return None, None

only(*categories)

Match blocks whose anchor set is exactly the given set — no more, no less.

Source code in src/cifflow/output/plan.py
def only(*categories: str) -> _Matcher:
    """Match blocks whose anchor set is exactly the given set — no more, no less."""
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: anchors == cats)

any_of(*categories)

Match blocks containing at least one of categories in the anchor frozenset.

Source code in src/cifflow/output/plan.py
def any_of(*categories: str) -> _Matcher:
    """Match blocks containing at least one of *categories* in the anchor frozenset."""
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: bool(cats & anchors))

all_of(*categories)

Match blocks containing all of categories in the anchor frozenset.

Source code in src/cifflow/output/plan.py
def all_of(*categories: str) -> _Matcher:
    """Match blocks containing all of *categories* in the anchor frozenset."""
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: cats <= anchors)

has(*categories)

Match blocks containing at least one of categories in the full tables frozenset.

Checks the Set or Loop tables frozenset. Use this to route loop-only blocks that have no Set anchor without writing a lambda.

Source code in src/cifflow/output/plan.py
def has(*categories: str) -> _Matcher:
    """Match blocks containing at least one of *categories* in the full tables frozenset.

    Checks the Set **or** Loop tables frozenset.  Use this to route loop-only
    blocks that have no Set anchor without writing a lambda.
    """
    cats = frozenset(categories)
    return _Matcher(lambda anchors, tables: bool(cats & tables))
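
To see how the four predicates differ, the sketch below re-creates them on top of a minimal stand-in class. `Matcher` is hypothetical: the real `_Matcher` is not shown on this page, so only `|` and `&` are modelled and `.excluding()` is deliberately left out.

```python
class Matcher:
    """Minimal stand-in for _Matcher: a callable predicate with | and &."""

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, anchors, tables):
        return bool(self._fn(anchors, tables))

    def __or__(self, other):
        return Matcher(lambda a, t: self(a, t) or other(a, t))

    def __and__(self, other):
        return Matcher(lambda a, t: self(a, t) and other(a, t))


def only(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: anchors == cats)


def any_of(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: bool(cats & anchors))


def all_of(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: cats <= anchors)


def has(*categories):
    cats = frozenset(categories)
    return Matcher(lambda anchors, tables: bool(cats & tables))


# A block anchored to one Set category that also carries a Loop table.
anchors = frozenset({"diffrn"})
tables = frozenset({"diffrn", "refln"})

results = {
    "only": only("diffrn")(anchors, tables),                        # True
    "any_of": any_of("pd_phase", "diffrn")(anchors, tables),        # True
    "all_of": all_of("pd_phase", "diffrn")(anchors, tables),        # False
    "has": has("refln")(anchors, tables),                           # True
    "and": (all_of("diffrn") & has("refln"))(anchors, tables),      # True
    "or": (only("pd_phase") | has("refln"))(anchors, tables),       # True
}

# A loop-only block has no anchors, so only has() can route it.
loop_only = (frozenset(), frozenset({"refln"}))
routed_by_any_of = any_of("refln")(*loop_only)  # False
routed_by_has = has("refln")(*loop_only)        # True
```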

namer(*keys, prefix='', suffix='', sep='_', fallback='?')

Return a block_namer that builds a name from anchor key values.

Parameters:

Name Type Description Default
*keys str

Anchor key identifiers in '{category}.{object_id}' form. The first value of each key is extracted from the kd dict passed by the emitter. Keys absent from kd contribute fallback.

For example, a block anchored to diffrn with id='D1' would receive:
{'diffrn.id': ['D1']}
A bridge block with both pd_phase and pd_diffractogram would receive:
{'pd_diffractogram.id': ['D1'], 'pd_phase.id': ['Al2O3']}

()
prefix str

String prepended to the result.

''
suffix str

String appended to the result.

''
sep str

Separator inserted between the extracted values. Default '_'.

'_'
fallback str

Value used when a key is absent from kd. Default '?'.

'?'

Returns:

Type Description
Callable[[dict[str, list[str]]], str]

A block_namer compatible with BlockSpec and OutputPlan.

Examples:

Single key with prefix:

>>> plan = OutputPlan(specs=[BlockSpec(matches='diffrn',
...                                   block_namer=namer('diffrn.id', prefix='structure_'))])
>>> plan.specs[0].block_namer({'diffrn.id': ['D1']})
'structure_D1'

Multi-key bridge block:

>>> namer('pd_phase.id', 'pd_diffractogram.id')({'pd_phase.id': ['Al2O3'], 'pd_diffractogram.id': ['D1']})
'Al2O3_D1'
Source code in src/cifflow/output/plan.py
def namer(*keys: str, prefix: str = '', suffix: str = '', sep: str = '_', fallback: str = '?') -> Callable[[dict[str, list[str]]], str]:
    """
    Return a block_namer that builds a name from anchor key values.

    Parameters
    ----------
    *keys
        Anchor key identifiers in ``'{category}.{object_id}'`` form.  The
        first value of each key is extracted from the ``kd`` dict passed by
        the emitter.  Keys absent from ``kd`` contribute *fallback*.

          For example, a block anchored to diffrn with id='D1' would receive: {'diffrn.id': ['D1']}
          A bridge block with both pd_phase and pd_diffractogram: {'pd_diffractogram.id': ['D1'], 'pd_phase.id': ['Al2O3']}
    prefix
        String prepended to the result.
    suffix
        String appended to the result.
    sep
        Separator inserted between the extracted values.  Default ``'_'``.
    fallback
        Value used when a key is absent from ``kd``.  Default ``'?'``.

    Returns
    -------
    Callable[[dict[str, list[str]]], str]
        A ``block_namer`` compatible with :class:`BlockSpec` and
        :class:`OutputPlan`.

    Examples
    --------
    Single key with prefix:

    >>> plan = OutputPlan(specs=[BlockSpec(matches='diffrn',
    ...                                   block_namer=namer('diffrn.id', prefix='structure_'))])
    >>> plan.specs[0].block_namer({'diffrn.id': ['D1']})
    'structure_D1'

    Multi-key bridge block:

    >>> namer('pd_phase.id', 'pd_diffractogram.id')({'pd_phase.id': ['Al2O3'], 'pd_diffractogram.id': ['D1']})
    'Al2O3_D1'
    """
    def _fn(kd: dict[str, list[str]]) -> str:
        parts = [kd.get(k, [fallback])[0] for k in keys]
        return prefix + sep.join(parts) + suffix
    return _fn
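
The fallback and first-value behaviour of namer is easiest to see by calling the returned function directly. The block below copies the function body shown above so that it runs on its own; the key names and values are illustrative only.

```python
from typing import Callable


def namer(*keys: str, prefix: str = '', suffix: str = '', sep: str = '_',
          fallback: str = '?') -> Callable[[dict[str, list[str]]], str]:
    """Local copy of the function above, so the example is self-contained."""
    def _fn(kd: dict[str, list[str]]) -> str:
        parts = [kd.get(k, [fallback])[0] for k in keys]
        return prefix + sep.join(parts) + suffix
    return _fn


# Key missing from kd: the fallback string is substituted.
bridge = namer('pd_phase.id', 'pd_diffractogram.id', sep='-', fallback='NA')
missing = bridge({'pd_phase.id': ['Al2O3']})          # 'Al2O3-NA'

# Several values for one key (as with single_block=True): only the first is used.
first_only = namer('diffrn.id', prefix='structure_')({'diffrn.id': ['D1', 'D2']})
# 'structure_D1'
```

Because the first element is taken unconditionally, a key that is present but maps to an empty list would raise IndexError rather than use the fallback.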

cifflow.output.quote

Value quoting for CIF output.

quote(stored, version) converts a value as stored in the DuckDB database back to a valid CIF token, selecting the least-restrictive delimiter that produces a correctly round-trippable result.

Storage encoding (from ingest.encode_value):

- PLACEHOLDER . / ? → stored as . / ? (length 1)
- Quoted . / ? → stored as "." / "?" (length 3)
- Container (list / table) → stored as JSON text (CIF 2.0 only)
- Everything else → stored as raw string

quote(stored, version)

Return a valid CIF token for stored, suitable for the given version.

Parameters:

Name Type Description Default
stored str

The value as retrieved from the DuckDB database. Presence-state encoding from encode_value is decoded here:

  • '.' or '?' (length 1) → PLACEHOLDER → returned unquoted.
  • '"."' or '"?"' (length 3) → quoted dot/question-mark → the inner character is re-quoted as a regular string.
  • All other values pass through the full quoting decision tree.
required
version CifVersion

CifVersion.CIF_2_0 or CifVersion.CIF_1_1. Controls which delimiter types are available (triple-quoted strings are CIF 2.0 only).

required

Returns:

Type Description
str

A valid CIF token. Semicolon-delimited tokens begin with '\n' so the caller can distinguish them from inline tokens.

Source code in src/cifflow/output/quote.py
def quote(stored: str, version: CifVersion) -> str:
    r"""Return a valid CIF token for *stored*, suitable for the given *version*.

    Parameters
    ----------
    stored:
        The value as retrieved from the DuckDB database.  Presence-state
        encoding from ``encode_value`` is decoded here:

        - ``'.'`` or ``'?'`` (length 1) → PLACEHOLDER → returned unquoted.
        - ``'"."'`` or ``'"?"'`` (length 3) → quoted dot/question-mark →
          the inner character is re-quoted as a regular string.
        - All other values pass through the full quoting decision tree.

    version:
        ``CifVersion.CIF_2_0`` or ``CifVersion.CIF_1_1``.  Controls which
        delimiter types are available (triple-quoted strings are CIF 2.0 only).

    Returns
    -------
    str
        A valid CIF token.  Semicolon-delimited tokens begin with ``'\n'``
        so the caller can distinguish them from inline tokens.
    """
    if stored in ('.', '?'):
        return stored                          # PLACEHOLDER — always unquoted
    if stored in ('"."', '"?"'):
        return _quote_string(stored[1], version)   # quoted dot/question-mark
    if version == CifVersion.CIF_2_0 and stored.startswith(_CONTAINER_PREFIX):
        return _format_container(decode_container(stored), version)
    return _quote_string(stored, version)
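
The presence-state decoding at the top of quote() can be isolated as a small classifier. `decode_presence` is a hypothetical helper written for this illustration; it covers only the placeholder and quoted-placeholder cases and leaves container decoding and the full quoting decision tree aside.

```python
def decode_presence(stored: str) -> tuple[str, str]:
    """Classify a stored value by its presence-state encoding.

    Returns ('placeholder', ch) for a bare . or ?, ('quoted', ch) for the
    three-character forms "." and "?", and ('value', stored) otherwise.
    """
    if stored in ('.', '?'):
        return ('placeholder', stored)   # emitted unquoted
    if stored in ('"."', '"?"'):
        return ('quoted', stored[1])     # inner character must be re-quoted
    return ('value', stored)             # goes through normal quoting


examples = [decode_presence(v) for v in ['.', '"?"', '1.23(4)', '".."']]
# [('placeholder', '.'), ('quoted', '?'), ('value', '1.23(4)'), ('value', '".."')]
```

Keeping the two forms distinct in storage is what lets a literal string "?" survive a round trip without being mistaken for an unknown value.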

make_text_field(s, line_limit=None)

Produce a semicolon-delimited CIF text field for s.

Selects the correct wire format based on content requirements:

+--------------+-------------+-----------------------------+
| needs_prefix | needs_fold  | format used                 |
+==============+=============+=============================+
| False        | False       | plain semicolon             |
| True         | False       | prefix-only semicolon       |
| False        | True        | fold-only semicolon         |
| True         | True        | prefix + fold semicolon     |
+--------------+-------------+-----------------------------+

needs_prefix is True when s contains '\n;' (a newline immediately followed by a semicolon), which would otherwise prematurely terminate the field.

needs_fold is True when line_limit is given and at least one content line in the text field would produce a physical line exceeding line_limit characters.

Valid for both CIF 1.1 and CIF 2.0 (semicolon fields exist in both).

Source code in src/cifflow/output/quote.py
def make_text_field(s: str, line_limit: int | None = None) -> str:
    r"""Produce a semicolon-delimited CIF text field for *s*.

    Selects the correct wire format based on content requirements:

    +--------------+-------------+-----------------------------+
    | needs_prefix | needs_fold  | format used                 |
    +==============+=============+=============================+
    | False        | False       | plain semicolon             |
    | True         | False       | prefix-only semicolon       |
    | False        | True        | fold-only semicolon         |
    | True         | True        | prefix + fold semicolon     |
    +--------------+-------------+-----------------------------+

    *needs_prefix* is ``True`` when *s* contains ``'\n;'``, which would
    otherwise prematurely terminate the field.

    *needs_fold* is ``True`` when *line_limit* is given and at least one
    content line in the text field would produce a physical line exceeding
    *line_limit* characters.

    Valid for both CIF 1.1 and CIF 2.0 (semicolon fields exist in both).
    """
    needs_prefix = '\n;' in s
    needs_fold = False
    if line_limit is not None:
        if needs_prefix:
            # Physical line = '{_PREFIX}{content}', so content must fit in
            # line_limit - len(_PREFIX) chars.
            needs_fold = any(
                len(line) > line_limit - len(_PREFIX) for line in s.split('\n')
            )
        else:
            needs_fold = any(len(line) > line_limit for line in s.split('\n'))

    if needs_prefix and needs_fold:
        return _make_prefixed_folded_semicolon(s, line_limit)
    if needs_prefix:
        return _make_prefixed_semicolon(s)
    if needs_fold:
        return _make_folded_semicolon(s, line_limit)
    return _make_semicolon(s)
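
The format selection in make_text_field can be exercised without the private helpers. In this sketch `choose_format` is a hypothetical function that returns the name of the chosen format instead of the rendered field, and the prefix length of 2 is an assumption, since the value of `_PREFIX` is not shown on this page.

```python
from __future__ import annotations

ASSUMED_PREFIX_LEN = 2  # assumption: the real _PREFIX constant is not shown here


def choose_format(s: str, line_limit: int | None = None,
                  prefix_len: int = ASSUMED_PREFIX_LEN) -> str:
    """Return which semicolon-field format the decision table would select."""
    needs_prefix = '\n;' in s
    needs_fold = False
    if line_limit is not None:
        # With a prefix, each physical line is prefix + content, so the
        # content budget shrinks by the prefix length.
        budget = line_limit - prefix_len if needs_prefix else line_limit
        needs_fold = any(len(line) > budget for line in s.split('\n'))
    if needs_prefix and needs_fold:
        return 'prefix + fold'
    if needs_prefix:
        return 'prefix-only'
    if needs_fold:
        return 'fold-only'
    return 'plain'


chosen = [
    choose_format('short text'),                            # 'plain'
    choose_format('line one\n;line two'),                   # 'prefix-only'
    choose_format('x' * 100, line_limit=80),                # 'fold-only'
    choose_format('x' * 79 + '\n;tail', line_limit=80),     # 'prefix + fold'
]
```

The last case shows why the prefix matters for folding: a 79-character line fits an 80-character limit on its own, but no longer fits once the assumed two-character prefix is added.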