Fidelity
cifflow.fidelity.check
Fidelity comparison for CIF sources.
check_fidelity compares two CIF sources — files, paths, or pre-parsed
CifFile objects — by ingesting both into in-memory SQLite databases and
comparing the resulting data at the row level.
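The row-level comparison described above can be illustrated with a minimal sketch. This is not the library's implementation: the table name, columns, and helper functions below are hypothetical, and the sketch only shows the general idea of loading two sources into separate in-memory SQLite databases and comparing their rows as multisets.

```python
import sqlite3
from collections import Counter


def load(rows):
    """Create an in-memory database holding one illustrative table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and columns, chosen only for the example.
    conn.execute("CREATE TABLE atom_site (label TEXT, fract_x TEXT)")
    conn.executemany("INSERT INTO atom_site VALUES (?, ?)", rows)
    return conn


def row_diff(conn_a, conn_b, table):
    """Return the rows found only in a and only in b, ignoring row order."""
    query = f"SELECT * FROM {table}"  # table name is trusted in this sketch
    rows_a = Counter(conn_a.execute(query).fetchall())
    rows_b = Counter(conn_b.execute(query).fetchall())
    only_a = sorted((rows_a - rows_b).elements())
    only_b = sorted((rows_b - rows_a).elements())
    return only_a, only_b


conn_a = load([("C1", "0.25"), ("O1", "0.50")])
conn_b = load([("O1", "0.50"), ("C1", "0.26")])
only_a, only_b = row_diff(conn_a, conn_b, "atom_site")
print(only_a)  # [('C1', '0.25')]
print(only_b)  # [('C1', '0.26')]
```

Using a Counter rather than a set keeps duplicate rows significant, so a row that appears twice in one source and once in the other still shows up as a difference.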
Known limitations
ValueType for structured tables
ValueType is not stored for structured table columns; only the raw
string value is persisted. ValueType fidelity for schema-known tags
is therefore not checkable. For _cif_fallback, value_type is
stored and compared directly.
SU fidelity in _cif_fallback
For structured tables, SU columns are normalised with
Decimal.normalize() so that 0.001 and 0.0010 compare equal.
For _cif_fallback, SU values are embedded in the full value(su)
string (e.g. 3.992(1)) and are compared as raw strings. Equivalent
SU representations such as 3.992(1) and 3.9920(10) will compare
as unequal.
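The difference between the two behaviours can be shown with the standard library alone. The sketch below is illustrative and is not code from cifflow: split_value_su is a hypothetical helper that handles only plain decimal notation (no exponents), included to show why the two value(su) strings are numerically equivalent even though they differ as raw strings.

```python
import re
from decimal import Decimal

# Structured-table style: normalising removes trailing zeros.
print(str(Decimal("0.0010").normalize()))  # 0.001

# Hypothetical pattern for "value(su)" in plain decimal notation only.
_VALUE_SU = re.compile(r"^([+-]?\d+(?:\.(\d+))?)\((\d+)\)$")


def split_value_su(text):
    """Split 'value(su)' into normalised (value, su) Decimals."""
    match = _VALUE_SU.match(text)
    if match is None:
        raise ValueError(f"not a value(su) string: {text!r}")
    value = Decimal(match.group(1))
    decimals = len(match.group(2) or "")
    # The SU digits apply to the last decimal places of the value.
    su = Decimal(match.group(3)).scaleb(-decimals)
    return value.normalize(), su.normalize()


raw_equal = "3.992(1)" == "3.9920(10)"
parsed_equal = split_value_su("3.992(1)") == split_value_su("3.9920(10)")
print(raw_equal)     # False
print(parsed_equal)  # True
```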
Default-filled values (_cif_synthetic)
Values filled from enumeration_default during ingestion are excluded
from comparison. An explicit value in one source and a default-filled
value in the other will produce a "row_content" mismatch even if
identical. (_cif_synthetic is specced but not yet implemented in the
ingestion layer; this step is a no-op until it is.)
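One way to picture why an explicit value and an identical default-filled value still differ is the following sketch. It is an assumption about how exclusion could behave, not the library's code: the (value, synthetic) cell layout, the column names, and comparison_view are all hypothetical.

```python
def comparison_view(row):
    """Drop cells flagged as default-filled; keep explicit ones."""
    return {col: value for col, (value, synthetic) in row.items() if not synthetic}


# Each cell is (value, synthetic_flag); names are illustrative only.
row_a = {"label": ("C1", False), "calc_flag": ("d", False)}  # explicit in source a
row_b = {"label": ("C1", False), "calc_flag": ("d", True)}   # default-filled in source b

view_a = comparison_view(row_a)
view_b = comparison_view(row_b)
print(view_a)            # {'label': 'C1', 'calc_flag': 'd'}
print(view_b)            # {'label': 'C1'}
print(view_a == view_b)  # False
```

Because the default-filled cell is removed from only one side, the two rows no longer have the same content, which corresponds to the mismatch described above.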
version parameter
The version parameter is not yet propagated to the parser as a
fallback default. Version detection uses the file magic line; files
without a magic line are parsed as CIF 1.1 regardless of version.
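The documented behaviour can be mirrored in a few lines. This sketch is not the cifflow parser: effective_version is a hypothetical function, versions are represented as plain strings, and the only fact relied on is that a CIF 2.0 file begins with the magic code #\#CIF_2.0.

```python
CIF2_MAGIC = "#\\#CIF_2.0"  # the literal text #\#CIF_2.0


def effective_version(text, requested="2.0"):
    """Return the version a file would be parsed as under the limitation.

    `requested` is accepted but deliberately unused, mirroring the
    documented limitation that the fallback is not yet propagated.
    """
    # Ignore a leading byte-order mark, then inspect the first line only.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0].rstrip()
    if first_line.startswith(CIF2_MAGIC):
        return "2.0"
    return "1.1"


with_magic = effective_version("#\\#CIF_2.0\ndata_example\n")
without_magic = effective_version("data_example\n", requested="2.0")
print(with_magic)     # 2.0
print(without_magic)  # 1.1
```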
UUID-keyed tables
When comparing sources where one uses natural primary keys and another
uses generated UUID keys (e.g. ALL_BLOCKS output merging multiple CIF
blocks), all PK columns of UUID-keyed tables and all FK columns pointing
to those tables are stripped from the row representation in both
connections. This allows content comparison without key-structure
comparison.
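The key-stripping idea can be sketched as follows. The column names in KEY_COLUMNS and the helper content_only are hypothetical, and the library's own row representation may differ; the sketch only shows that rows with different keys compare equal once the key columns are removed from both sides.

```python
import uuid
from collections import Counter

KEY_COLUMNS = {"id", "type_id"}  # hypothetical PK and FK column names


def content_only(rows, key_columns=KEY_COLUMNS):
    """Represent rows as a multiset of their non-key (column, value) pairs."""
    return Counter(
        tuple(sorted((col, val) for col, val in row.items() if col not in key_columns))
        for row in rows
    )


# Source a uses natural keys; source b uses generated UUID keys.
rows_a = [{"id": "C1", "type_id": "C", "label": "C1", "fract_x": "0.25"}]
rows_b = [{
    "id": str(uuid.uuid4()),
    "type_id": str(uuid.uuid4()),
    "label": "C1",
    "fract_x": "0.25",
}]

same = content_only(rows_a) == content_only(rows_b)
print(rows_a == rows_b)  # False
print(same)              # True
```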
FidelityReport
dataclass
Result of a check_fidelity call.
Attributes:
| Name | Type | Description |
|---|---|---|
| passed | bool | |
| mismatches | list[FidelityMismatch] | Ordered list of all FidelityMismatch objects. |
Source code in src/cifflow/fidelity/check.py
FidelityMismatch
dataclass
A single semantic difference found between two CIF sources.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | str | Machine-readable category (e.g. "row_content"). |
| source | Literal['a', 'b', 'both'] | Which source the mismatch is tied to. |
| description | str | Human-readable explanation of the difference. |
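For readers who want to work with these results in code, the attribute tables can be mirrored with stand-in dataclasses. The classes below are not the library's definitions (MismatchSketch, ReportSketch, and count_by_kind are hypothetical names); they only reproduce the documented field names and types so that a small grouping helper can be demonstrated.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class MismatchSketch:
    """Stand-in with the documented FidelityMismatch fields."""
    kind: str
    source: Literal["a", "b", "both"]
    description: str


@dataclass
class ReportSketch:
    """Stand-in with the documented FidelityReport fields."""
    passed: bool
    mismatches: list[MismatchSketch] = field(default_factory=list)


def count_by_kind(report):
    """Count mismatches per machine-readable kind."""
    return Counter(m.kind for m in report.mismatches)


report = ReportSketch(
    passed=False,
    mismatches=[
        MismatchSketch("row_content", "a", "illustrative row only in a"),
        MismatchSketch("row_content", "b", "illustrative row only in b"),
    ],
)
counts = count_by_kind(report)
print(counts)  # Counter({'row_content': 2})
```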
Source code in src/cifflow/fidelity/check.py
check_fidelity(source_a, source_b, schema=None, *, version=CifVersion.CIF_2_0, report_file=None)
Compare two CIF sources for semantic equivalence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source_a | str \| pathlib.Path \| CifFile | First CIF source to compare. May be a file, a path, or a pre-parsed CifFile object. | required |
| source_b | str \| pathlib.Path \| CifFile | Second CIF source to compare. Same accepted types as source_a. | required |
| schema | str \| pathlib.Path \| SchemaSpec \| dict \| None | Schema to use for ingestion. | None |
| version | CifVersion | Fallback CIF version for files without a magic line. Not yet propagated to the parser (see Known limitations). | CIF_2_0 |
| report_file | str \| pathlib.Path \| None | Optional path for a human-readable text report. If provided, the report is written (UTF-8) before returning, regardless of pass/fail. | None |
Returns:
| Type | Description |
|---|---|
| FidelityReport | Parse and ingestion errors are captured in the report; never raises for data errors. Schema loading failures propagate directly. |
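A typical call might look like the sketch below. It assumes that check_fidelity is importable from cifflow.fidelity.check, as the module heading suggests; run_check and summarise are hypothetical helpers, and the import is deferred so the formatting helper can be tried without cifflow installed. The demo object at the end is a stand-in, not a real FidelityReport.

```python
from types import SimpleNamespace


def run_check(source_a, source_b, report_file=None):
    """Call check_fidelity with the documented signature (needs cifflow)."""
    from cifflow.fidelity.check import check_fidelity  # assumed import path
    return check_fidelity(source_a, source_b, report_file=report_file)


def summarise(report):
    """Format any object exposing .passed and .mismatches as text."""
    if report.passed:
        return "fidelity check passed"
    lines = [f"{len(report.mismatches)} mismatch(es):"]
    for m in report.mismatches:
        lines.append(f"  [{m.kind}] source={m.source}: {m.description}")
    return "\n".join(lines)


# Stand-in object carrying the documented attributes, for demonstration only.
demo = SimpleNamespace(
    passed=False,
    mismatches=[
        SimpleNamespace(
            kind="row_content",
            source="both",
            description="illustrative difference",
        )
    ],
)
summary = summarise(demo)
print(summary)
```

Since data errors are reported rather than raised, a caller only needs exception handling around schema loading; everything else can be read from the returned report.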
Source code in src/cifflow/fidelity/check.py