Specification details
A Specification is an agreed-upon standard that describes how a dataset should be organized. It includes the allowed file paths, the tags associated with a file, and the role those tags play in file name generation. This document details how to provide each of these options.
A minimal configuration of tags for a dataset looks like this:
tags:
  - name: filename
    pattern: "[/\\\\](.*)\\."
  - name: extension
    pattern: "(\\.[^/\\\\]+)$"
path_patterns:
  - "{filename}{extension}"
The minimal specification above can be used to build a path with build_path():
# Build path according to specification with tags as parameters
path = specification.build_path(filename="file", extension=".txt")
# Print the built path.
print(path)
# file.txt
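The tag patterns in the minimal configuration are regular expressions. The sketch below illustrates one plausible reading of them, assuming each pattern is searched against a file path and its first capture group becomes the tag value. The names TAG_PATTERNS and extract_tags are hypothetical and are not part of the library's API; the patterns are the ones from the configuration after YAML unescaping.

```python
import re

# Patterns copied from the minimal configuration (YAML escapes resolved).
# TAG_PATTERNS and extract_tags are illustrative names, not library API.
TAG_PATTERNS = {
    "filename": r"[/\\](.*)\.",
    "extension": r"(\.[^/\\]+)$",
}


def extract_tags(path):
    """Return the first capture group of every tag pattern that matches path."""
    tags = {}
    for name, pattern in TAG_PATTERNS.items():
        match = re.search(pattern, path)
        if match:
            tags[name] = match.group(1)
    return tags


print(extract_tags("/file.txt"))
# {'filename': 'file', 'extension': '.txt'}
```

Under this assumption, the values extracted from /file.txt are the same ones passed to build_path() in the example above.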
Top-level keys
path_patterns
All details regarding permissible file paths sit inside the path_patterns key. The path_patterns key consists of a sequence of paths relative to the dataset root. Usage of tag values in paths is supported.
A path can contain both template and ordinary patterns. The template patterns are:
[contents]
Used to indicate that the contents are optional. The path /dir[/subdir]/file will match both /dir/file and /dir/subdir/file.
{name<values>|default}
Used to indicate that the template will be filled in by a tag value. name refers to the name of the tag, values refers to the set of valid values separated by |, and default refers to the value chosen when the tag is not supplied while building a path from tags. values and default are optional. The path /dir/{filename<file1|file2>|file1} will match /dir/file1 and /dir/file2 but not /dir/file3. If no filename tag is provided during path building, file1 is chosen as the default.
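The two template patterns can be illustrated with a small sketch. It is not the library's implementation: pattern_to_regex, fill_placeholders, and PLACEHOLDER are hypothetical names, and the sketch assumes that a placeholder without a values list accepts any single path segment, that each tag name appears at most once per pattern, and that brackets are balanced.

```python
import re

# Matches {name<values>|default}; the <values> and |default parts are optional.
PLACEHOLDER = re.compile(r"\{(\w+)(?:<([^>]*)>)?(?:\|([^}]*))?\}")


def pattern_to_regex(path_pattern):
    """Translate a path pattern into a regular expression anchored at the end."""
    parts = []
    pos = 0
    while pos < len(path_pattern):
        placeholder = PLACEHOLDER.match(path_pattern, pos)
        char = path_pattern[pos]
        if placeholder:
            name, values, _default = placeholder.groups()
            if values:
                # Only the listed values are valid.
                allowed = "|".join(re.escape(v) for v in values.split("|"))
            else:
                # Assumption: any single path segment is accepted.
                allowed = "[^/]+"
            parts.append(f"(?P<{name}>{allowed})")
            pos = placeholder.end()
        elif char == "[":
            parts.append("(?:")  # start of an optional section
            pos += 1
        elif char == "]":
            parts.append(")?")  # end of an optional section
            pos += 1
        else:
            parts.append(re.escape(char))
            pos += 1
    return re.compile("".join(parts) + r"\Z")


def fill_placeholders(path_pattern, **tags):
    """Fill each placeholder from tags, falling back to its default."""
    def fill(match):
        name, values, default = match.groups()
        value = tags.get(name, default)
        if value is None:
            raise KeyError(f"no value or default for tag {name!r}")
        if values and value not in values.split("|"):
            raise ValueError(f"{value!r} is not a valid value for {name!r}")
        return value
    return PLACEHOLDER.sub(fill, path_pattern)


optional = pattern_to_regex("/dir[/subdir]/file")
print(bool(optional.match("/dir/file")))         # True
print(bool(optional.match("/dir/subdir/file")))  # True

choice = "/dir/{filename<file1|file2>|file1}"
print(bool(pattern_to_regex(choice).match("/dir/file3")))  # False
print(fill_placeholders(choice))                           # /dir/file1
print(fill_placeholders(choice, filename="file2"))         # /dir/file2
```

fill_placeholders leaves optional [contents] sections untouched, because the text above does not say how they are resolved during path building.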