User Guide¶
Installing¶
almirah can be installed with pip:
$ python -m pip install almirah
Organizing the dataset¶
The first step to using any dataset is getting it in shape to allow manual exploration or automated retrieval of a subset. To get started, import the almirah module:
import almirah
almirah relies on two components to organize a dataset:
a Specification object, which takes the contents of the Specification details config as an argument and describes the specification to be followed.
an Organization rules config, which lays down the rules that will be followed for organization.
spec = Specification.create_from_file("/path to details")
spec.organize("/path to rules")
Note
For modalities such as Eye tracking, and Genomics where the BIDS-specification is still in the proposal stage or is only present as a descriptor, a custom BIDS-like specification that mimics the general pattern of the specification is used in the config.
Indexing¶
The indexing operation crawls through the organized dataset and stores files and directories that have a matching path in the specification in a database. This enables easy querying and filtering.
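As a rough illustration of what an indexing pass involves, the sketch below crawls a directory, keeps the paths that match a pattern, and stores them in a SQLite table. It uses only the Python standard library and is not almirah's implementation: the pattern, the table name, and the index_directory helper are all hypothetical.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical pattern standing in for the paths a specification allows.
PATTERN = re.compile(r"sub-\w+/sub-\w+_T1w\.nii\.gz$")


def index_directory(root, connection):
    """Store every file under root whose relative path matches PATTERN."""
    connection.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
    for path in sorted(Path(root).rglob("*")):
        relative = path.relative_to(root).as_posix()
        if path.is_file() and PATTERN.search(relative):
            connection.execute("INSERT OR IGNORE INTO files VALUES (?)", (relative,))
    # Nothing is persisted until the transaction is committed.
    connection.commit()


with tempfile.TemporaryDirectory() as root:
    # Build a tiny throwaway dataset: one matching file and one stray file.
    subject_dir = Path(root) / "sub-A3456"
    subject_dir.mkdir()
    (subject_dir / "sub-A3456_T1w.nii.gz").touch()
    (subject_dir / "notes.txt").touch()

    connection = sqlite3.connect(":memory:")
    index_directory(root, connection)
    indexed = [row[0] for row in connection.execute("SELECT path FROM files")]
    connection.close()

print(indexed)
```

Only the file whose path matches the pattern ends up in the table; the stray notes.txt is ignored.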
At an abstract level, each dataset can be thought of as a Layout of files. Each File is associated with a set of tags. Each Tag is a name:value pair that is derived from the filename and from the metadata files associated with a file. To create an instance of Layout, pass the root directory path of the dataset and the Specification name to Layout:
lay = Layout(root="/path to dataset", specification_name="name")
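To make the idea of tags concrete, here is a minimal sketch of how name:value pairs could be derived from a BIDS-like filename. It assumes entities of the form key-value joined by underscores, followed by a suffix and an extension. The tags_from_filename helper is hypothetical and is not part of almirah, which derives its tags from the rules in the specification config.

```python
import re

# Assumed BIDS-like convention: "key-value" entities joined by "_", then a suffix.
ENTITY = re.compile(r"([a-zA-Z0-9]+)-([a-zA-Z0-9]+)")


def tags_from_filename(filename):
    """Return a dict of name:value tags parsed from a BIDS-like filename."""
    # Everything after the first dot is treated as the extension.
    stem, dot, extension = filename.partition(".")
    *entities, suffix = stem.split("_")
    tags = {}
    for entity in entities:
        match = ENTITY.fullmatch(entity)
        if match:
            tags[match.group(1)] = match.group(2)
    tags["suffix"] = suffix
    tags["extension"] = dot + extension
    return tags


tags = tags_from_filename("sub-A3456_ses-01_task-rest_bold.nii.gz")
print(tags)
```

The keys here are the short entity names found in the filename; a real specification may map them to longer tag names.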
almirah automatically retrieves a previous index if the layout is found. If not, the Layout can be indexed using index(). Index changes and additions are not written unless committed using commit().
Tip
Setting valid_only = False removes the restriction that only files with matching paths in the associated specification are indexed. This can act as a quick way to index the whole directory, or as a dirty trick when you do not have time to redefine the specification to accommodate a new path.
It is also possible to have a collection of layouts as a Dataset:
ds = Dataset(name="name")
ds.add(layout_1, layout_2,..., layout_n)
This way, parts of a dataset located at different paths can be virtually collected into one for querying.
Tip
Objects can be retrieved using get() once they are committed to the index, or if they are present in the current session. To retrieve the Layout with specification name bids, you can use Layout.get(specification_name='bids').
Filter and Query¶
To retrieve a subset of files that match certain tags, provide the criteria as keyword arguments to query(). The File objects of the matching files will be returned:
lay.query(subject="A3456", extension=".png")
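The filtering semantics described above, where a file passes only if every keyword criterion matches one of its tags, can be sketched in plain Python. This is a simplified stand-in and not almirah's query(): the query function below and the dictionaries representing files are hypothetical.

```python
def query(files, **criteria):
    """Return the files whose tags equal every given keyword argument."""
    return [
        file
        for file in files
        if all(file["tags"].get(name) == value for name, value in criteria.items())
    ]


# Hypothetical index entries; real File objects are richer than these dicts.
files = [
    {"path": "sub-A3456/face.png", "tags": {"subject": "A3456", "extension": ".png"}},
    {"path": "sub-A3456/scan.dcm", "tags": {"subject": "A3456", "extension": ".dcm"}},
    {"path": "sub-B0001/face.png", "tags": {"subject": "B0001", "extension": ".png"}},
]

matches = query(files, subject="A3456", extension=".png")
print([file["path"] for file in matches])
```

Adding more keyword arguments narrows the result, since all criteria must hold at once.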
Tip
If you do not know the possible tags, options() might be of help. options() is available for all objects and can be used as a look-up.
Converting file formats¶
Sometimes you want to convert the file format of a file. For example, from DICOM to NIfTI, or from EDF (Eyelink Data Format) to ASCII. This is done by providing the files to be converted, the desired output format, and the output directory as arguments to convert():
from almirah.utils.convert import convert
files = lay.query(extension = ".dcm")
convert(files, "NIfTI", <Layout of output dir>)
Currently, the following conversions are supported:
| Input extensions | Output formats | Datatype |
|---|---|---|
| dcm | NIfTI | Magnetic resonance imaging |
| bdf, cnt, data, edf, gdf, mat, mff, nxe, set, vhdr | BrainVision, EDF, FIF | Electroencephalography |
| edf | ASCII | Eye tracking |
| nirx | SNIRF | Functional near-infrared spectroscopy |
Interfacing with a Database¶
With Database, almirah can connect to databases supported by SQLAlchemy, Google Sheets, and URL endpoints that support retrieval of database contents. During object creation, name, host, and backend have to be provided to Database. Later, when querying, a connection needs to be established using connect() by providing the credentials:
import almirah
db = Database(name="db_name", host="db_host", backend="db_type")
# Create connection with database
db.connect("username", "password")
Only reading is supported in databases that are Google Sheets or URL endpoints. Operations such as table creation, writing, and metadata manipulation are only available in SQLAlchemy-valid databases.
To create a table in the database, the table schema is described by the Database mapping config and passed to create_table():
db.create_table({"mapping":"dict"})
To insert records into a table in the database, a pandas.DataFrame object whose columns match the table columns is provided as an argument to to_table() along with the table name:
db.to_table(df, "table name")
Important
If no table with that name is present, a table is created automatically. This might not be desirable if you would like to define relationships between tables, as the created table is vanilla and lacks these.
get_records() can retrieve records from a table in the database given the table name. A subset of table columns can be provided via cols if not all columns are to be retrieved.
db.get_records("table_name")
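To show what retrieving all columns versus a subset amounts to, the sketch below uses the standard library sqlite3 module in place of almirah. The fetch_records helper, the subjects table, and its values are made up for illustration and do not reflect how get_records() is implemented.

```python
import sqlite3


def fetch_records(connection, table, cols=None):
    """Hypothetical stand-in: fetch all rows, optionally only some columns."""
    # Identifiers are quoted by hand here, so only pass trusted names.
    selected = ", ".join(f'"{col}"' for col in cols) if cols else "*"
    cursor = connection.execute(f'SELECT {selected} FROM "{table}"')
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


connection = sqlite3.connect(":memory:")
# Made-up table and record, purely for illustration.
connection.execute("CREATE TABLE subjects (subject TEXT, age INTEGER, site TEXT)")
connection.execute("INSERT INTO subjects VALUES ('A3456', 34, 'north')")

all_columns = fetch_records(connection, "subjects")
subset = fetch_records(connection, "subjects", cols=["subject", "age"])
connection.close()

print(all_columns)
print(subset)
```

Requesting only the columns you need keeps the returned records smaller, which matters for wide tables.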
Reporting¶
High-level summaries of a dataset can be reported by using dataset.Database.report().
obj.report()
The tags on which the summary is based can be provided via the tags argument. subject is used if no values are provided.
Errors and Exceptions¶
almirah wraps built-in Python exceptions with appropriate messages, for example:
raise ValueError(f"Unsupported transform value {transform}")
See Exception for context.
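Because these are built-in exception types, they can be handled with an ordinary try/except block. The sketch below reproduces the pattern with a hypothetical apply_transform function that is not part of almirah. Only the error message format is taken from the example above.

```python
def apply_transform(value, transform):
    """Hypothetical helper that rejects unknown transform names."""
    if transform == "upper":
        return value.upper()
    raise ValueError(f"Unsupported transform value {transform}")


try:
    apply_transform("a3456", "reverse")
except ValueError as error:
    # The message carries the offending value, which helps when logging.
    message = str(error)
    print(f"Skipping record: {message}")
```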
Logging¶
If you are using the standard library logging module, almirah will emit several logs. In some cases, this can be undesirable. You can use the standard logger interface to change the log level for almirah's logger:
import logging
logging.getLogger("almirah").setLevel(logging.WARNING)
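The effect of raising the level can be checked with a small self-contained sketch. It attaches a list-backed handler to the logger named "almirah" and emits two stand-in messages directly, since the messages almirah itself emits are not listed here. The ListHandler class and the message text are made up for illustration.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so they can be inspected."""

    def emit(self, record):
        records.append(f"{record.levelname}:{record.getMessage()}")


logger = logging.getLogger("almirah")
logger.addHandler(ListHandler())
logger.setLevel(logging.WARNING)

# Stand-ins for messages the library might emit; the text is made up.
logger.info("indexing started")
logger.warning("file skipped")

print(records)
```

With the level set to WARNING, the INFO message is dropped before it reaches any handler, and only the warning is collected.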
The CALM-Brain Resource
If you would like to use almirah to access the CALM-Brain resource, visit the CALM-Brain wiki.