bidsschematools.validator

A partial implementation of schema-based validation in Python.

Functions

bidsschematools.validator.log_errors(validation_result)

Raise errors for validation result.

Parameters:

validation_result (dict) – A dictionary as returned by validate_all(), with keys including “schema_tracking”, “path_tracking”, “path_listing”, and, optionally, “itemwise”. The “itemwise” value, if present, should be a list of dictionaries with keys including “path”, “regex”, and “match”.
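To make the shape of validation_result more concrete, here is a minimal sketch of how such a dictionary might be turned into warning messages. It is not the library's implementation: summarize_result is a hypothetical helper, and the sketch assumes that “path_tracking” holds the paths left unmatched and that “schema_tracking” holds the unmatched schema entries as dictionaries with a “regex” key.

```python
def summarize_result(validation_result):
    """Return one warning line per unmatched path or regex (hypothetical helper)."""
    messages = []
    # Assumption: "path_tracking" lists the paths that matched no regex.
    for path in validation_result.get("path_tracking", []):
        messages.append(f"unmatched path: {path}")
    # Assumption: "schema_tracking" lists schema entries that were never matched,
    # each one a dict with a "regex" key.
    for entry in validation_result.get("schema_tracking", []):
        messages.append(f"unmatched regex: {entry['regex']}")
    return messages


# Hand-written example data, not output produced by the validator.
example = {
    "path_listing": ["/ds/sub-01/anat/sub-01_T1w.nii.gz", "/ds/notes.txt"],
    "path_tracking": ["/ds/notes.txt"],
    "schema_tracking": [{"regex": r".*/dataset_description\.json"}],
}

for line in summarize_result(example):
    print(line)
```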

bidsschematools.validator.select_schema_path(bids_version=None, bids_root=None, bids_reference_root=None)

Select schema directory, according to a priority logic whereby the schema path is either:

  1. a concatenation of bids_reference_root and bids_version, if the latter is specified, and the BIDS version schema is compatible with the validator,

  2. a concatenation of bids_reference_root and the detected version specification inside the BIDS root directory, if such a directory is provided and the BIDS version schema is compatible with the validator,

  3. None, expanded to the bundled schema supplied with the validator by bst.utils.get_bundled_schema_path.

Parameters:
  • bids_root (str or None, optional) – The path to the BIDS root for the paths to be validated.

  • bids_reference_root (str, optional) – Path where schema versions are stored, and which contains directories named exactly according to the respective schema version, e.g. “1.7.0”.

  • bids_version (str or None, optional) – BIDS version desired for validation. If empty, the dataset_description.json file will be queried for the dataset schema version.

Returns:

A string which is a path to the selected schema directory.

Return type:

str

Notes

  • This is a purely aspirational function, and is pre-empted by logic inside bst.validator.validate_bids(), and further contingent on better schema stability and ongoing work in: https://github.com/bids-standard/bids-schema

  • The default bids_reference_root value is based on the FHS and ideally should be enforced. Alternatively this could be handled by an environment variable, though that also requires enforcement on the package distribution side.
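The priority logic described for select_schema_path() can be sketched roughly as follows. This is an illustration under stated assumptions, not the function's actual code: pick_schema_dir and the is_compatible callback are hypothetical names, the detected version is read from the “BIDSVersion” field of dataset_description.json, and falling through from case 1 to case 2 when the requested version is incompatible is a guess.

```python
import json
import os


def pick_schema_dir(bids_version=None, bids_root=None, bids_reference_root=None,
                    is_compatible=lambda version: True):
    """Return a schema directory, or None to signal "use the bundled schema".

    Hypothetical sketch; `is_compatible` stands in for whatever compatibility
    check the validator applies to a schema version.
    """
    if bids_reference_root:
        # Case 1: an explicitly requested version takes priority.
        if bids_version and is_compatible(bids_version):
            return os.path.join(bids_reference_root, bids_version)
        # Case 2: fall back to the version recorded in the dataset itself.
        if bids_root:
            description = os.path.join(bids_root, "dataset_description.json")
            if os.path.isfile(description):
                with open(description) as handle:
                    detected = json.load(handle).get("BIDSVersion")
                if detected and is_compatible(detected):
                    return os.path.join(bids_reference_root, detected)
    # Case 3: None, which the caller expands to the bundled schema path.
    return None


# "/ref" is a placeholder reference root, not the library default.
print(pick_schema_dir(bids_version="1.7.0", bids_reference_root="/ref"))
```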

bidsschematools.validator.validate_all(paths_list, regex_schema)

Validate BIDS paths against a regex_schema, a list of dictionaries which include regexes.

Parameters:
  • paths_list (list or str) – A string pointing to a BIDS directory for which paths should be validated, or a list of strings pointing to individual files or subdirectories which all reside within one and only one BIDS directory root (i.e. nested datasets should be validated separately).

  • regex_schema (list of dict) – A list of dictionaries as generated by regexify_all().

Returns:

results – A dictionary reporting the target files for validation, the unmatched files and unmatched regexes, and optionally the itemwise comparison results. Keys include “schema_tracking”, “path_tracking”, “path_listing”, “match_listing”, and optionally “itemwise”.

Return type:

dict

Notes

  • Multi-source validation could be accomplished by distributing the resulting tracking_schema dictionary and further eroding it.

  • Currently only entities are captured in named groups; edit load_top_level() to name other groups as well.
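The tracking behaviour described above, where unmatched paths and unmatched regexes are what remains after matching, can be illustrated with a small standalone sketch. It does not reproduce validate_all(): match_paths is a hypothetical function, each schema entry is assumed to be a dictionary with a “regex” key, and the example regexes are hand-written rather than generated by regexify_all().

```python
import re


def match_paths(paths, regex_schema):
    """Match each path against the schema regexes and track what is left over."""
    schema_tracking = list(regex_schema)  # eroded as entries are matched
    path_tracking = list(paths)           # eroded as paths are matched
    match_listing = []
    for path in paths:
        for entry in regex_schema:
            match = re.fullmatch(entry["regex"], path)
            if match:
                # Named groups (e.g. entities) are kept alongside the path.
                match_listing.append({"path": path, **match.groupdict()})
                path_tracking.remove(path)
                if entry in schema_tracking:
                    schema_tracking.remove(entry)
                break
    return {
        "schema_tracking": schema_tracking,
        "path_tracking": path_tracking,
        "path_listing": list(paths),
        "match_listing": match_listing,
    }


schema = [
    {"regex": r"/sub-(?P<subject>[0-9a-zA-Z]+)/anat/sub-(?P=subject)_T1w\.nii\.gz"},
    {"regex": r"/dataset_description\.json"},
]
paths = ["/sub-01/anat/sub-01_T1w.nii.gz", "/README.txt"]
result = match_paths(paths, schema)
print(result["path_tracking"])    # paths that matched nothing
print(result["schema_tracking"])  # regexes that nothing matched
```

Because the leftover schema_tracking list is an ordinary value, it could in principle be passed on and eroded further by another run, which is the idea behind the multi-source note above.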

bidsschematools.validator.validate_bids(in_paths, dummy_paths=False, bids_reference_root=None, schema_path=None, bids_version=None, report_path=False, suppress_errors=False, accept_non_bids_dir=False, exclude_files=None)

Validate paths according to BIDS schema.

Parameters:
  • in_paths (str or list of str) – Paths to validate; these may be individual files or directories.

  • dummy_paths (bool, optional) – Whether to accept path strings which do not correspond to either files or directories.

  • bids_reference_root (str, optional) – Path where schema versions are stored, and which contains directories named exactly according to the respective schema version, e.g. “1.7.0”. Currently this is untested.

  • bids_version (str or None, optional) – Version of BIDS schema, or path to schema. This supersedes the specification detected in dataset_description.json and is itself superseded if schema_path is specified.

  • schema_path (str or None, optional) – If a path is given, this will be expanded and used directly, ignoring all other BIDS version specification logic. This is not relative to bids_reference_root.

  • report_path (bool or str, optional) – If True, a log will be written using the standard output path of .write_report(). If a string, the string will be used as the output path. If the variable evaluates as False, no log will be written.

  • accept_non_bids_dir (bool, optional)

  • exclude_files (str, optional) – Files which will not be indexed for validation. Use this if your data is in an archive standard which requires the presence of archive-specific files (e.g. DANDI requiring dandiset.yaml). Dot files (.*) do not need to be explicitly listed, as these are excluded by default.

Returns:

results – A dictionary reporting the target files for validation, the unmatched files and unmatched regexes, and optionally the itemwise comparison results. Keys include “schema_tracking”, “path_tracking”, “path_listing”, “match_listing”, and optionally “itemwise”.

Return type:

dict

Examples

from bidsschematools import validator
bids_paths = '~/.data2/datalad/000026/noncompliant'
validator.validate_bids(bids_paths)
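
The effect of exclude_files, together with the default exclusion of dot files, can be pictured with the standalone sketch below. It is only an approximation of the documented behaviour: filter_paths is a hypothetical helper, matching on the file's base name is an assumption, and a single string is treated as one file name.

```python
import os


def filter_paths(paths, exclude_files=()):
    """Drop dot files and explicitly excluded file names (hypothetical helper)."""
    # Accept a single file name as well as a collection of names.
    if isinstance(exclude_files, str):
        exclude_files = [exclude_files]
    kept = []
    for path in paths:
        name = os.path.basename(path)
        # Dot files are skipped without having to be listed.
        if name.startswith(".") or name in exclude_files:
            continue
        kept.append(path)
    return kept


candidates = [
    "/ds/dandiset.yaml",
    "/ds/.gitignore",
    "/ds/sub-01/anat/sub-01_T1w.nii.gz",
]
print(filter_paths(candidates, exclude_files="dandiset.yaml"))
```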

bidsschematools.validator.write_report(validation_result, report_path='~/.cache/bidsschematools/validator-report_{datetime}-{pid}.log', datetime_format='%Y%m%d%H%M%SZ')

Write a human-readable report based on the validation result.

Parameters:
  • validation_result (dict) – A dictionary as returned by validate_all(), with keys including “schema_tracking”, “path_tracking”, “path_listing”, and, optionally, “itemwise”. The “itemwise” value, if present, should be a list of dictionaries with keys including “path”, “regex”, and “match”.

  • report_path (str, optional) – A path under which the report is to be saved. The variables datetime and pid are available for string formatting, and will be expanded to the current datetime (as per the datetime_format parameter) and the process ID, respectively.

  • datetime_format (str, optional) – A datetime format, optionally used for the report path.

Notes

  • Not using f-strings in order to prevent arbitrary code execution.
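
How the report_path template might be expanded can be sketched with str.format(), which fills in the named fields without evaluating arbitrary expressions the way an f-string would. The helper below is a hypothetical illustration rather than the library's code; the now and pid arguments exist only to make the example reproducible, and the use of UTC and of os.path.expanduser() are assumptions.

```python
import datetime
import os


def expand_report_path(
    report_path="~/.cache/bidsschematools/validator-report_{datetime}-{pid}.log",
    datetime_format="%Y%m%d%H%M%SZ",
    now=None,
    pid=None,
):
    """Fill in the {datetime} and {pid} fields of a report path template."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    pid = os.getpid() if pid is None else pid
    # str.format() only substitutes the named fields supplied here.
    expanded = report_path.format(datetime=now.strftime(datetime_format), pid=pid)
    return os.path.abspath(os.path.expanduser(expanded))


fixed_time = datetime.datetime(2024, 1, 2, 3, 4, 5)
print(expand_report_path("/tmp/report_{datetime}-{pid}.log", now=fixed_time, pid=42))
```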