Database and DBManager¶

class DBManager[source]¶

Bases: object

Manages database connections and sessions.

Session = None¶

property connection: Connection¶: Provides a context-managed connection to the database.

property session: None¶: Provides a context-managed session for performing database operations.

Database functionality for data write and access.

class Database[source]¶

Bases: Component

Generic database representation.

id: Mapped[int]¶

name: Mapped[str]¶

host: Mapped[str]¶

backend: Mapped[str]¶

property connection¶

property worksheet¶

property meta¶

property token¶

build_column(name, dtype, **kwargs)[source]¶: Build SQLalchemy Column object given column description.

build_constraint(cols, links)[source]¶: Build SQLalchmy constraint objects given links.

create_table(table, cols, refs=[])[source]¶

Create or extend table within database given the description.

Parameters:

table_name (str) – Name of the table to create or extend.
columns (list of dict) – List of columns specifications to add to the table.
constraints (list of dict, optional) – List of table constraints.

connect(username=None, password=None, keyfile=None)[source]¶

Establish a connection to the database.

Parameters:

username (str) – The database username.
password (str) – The database password.
keyfile (str) – Path to the service account keyfile. Required if backend is gsheet.

Raises:

SQLAlchemyError – If the connection cannot be established at database.
ValueError – If the connection cannot be established at URL endpoint.

get_primary(table)[source]¶: Return priamry keys for table.

get_records(table, cols=None)[source]¶

Retrieve records from table in database as a DataFrame.

Parameters:

table (str) – Table in database from which to retrieve records.
cols (list of str) – Column names to select from table.

to_table(df, table, check_dups=True, resolve_dups=False, check_fks=True, resolve_fks=False, insert_ignore=False, drop_na=None, threshold=None, if_exists='append', index=False, insert_method=None, **kwargs)[source]¶

Write records in DataFrame to a table.

Parameters:

df (pandas.DataFrame) – DataFrame containing records.
table (str) – Table in database into which the records will be inserted.
check_dups (bool, default True) – Check for duplicates in records.
resolve_dups ([False, 'first', 'last'], default False) – Resolution method for duplicates if found.
check_fks (bool, default False) – Check if foreign keys present in parent.
resolve_fks (bool, default False) – Attempt to resolve missing foreign keys by inserting to parent.
insert_ignore (bool, default Flase) – Ignore insertion of records already present in table.
drop_na (list of df column names, default None) – If provided, records with na in all given columns are dropped.
threshold (int, None) – If non-NA values < threshold, then record dropped.
if_exists (['append', 'replace', 'fail'], default 'append') – Insert behavior in case table exists. - ‘append’ : Insert new values to the existing table. - ‘replace’ : Drop the table before inserting new values. - ‘fail’ : Raise a ValueError if table exists.
index (bool, default False) – Write DataFrame index as a column. Uses index_label as the column name in the table.
insert_method ({None, 'multi', callable}, optional) – Controls the SQL insertion clause used. - None : Uses standard SQL INSERT clause (one per row). - ‘multi’: Pass multiple values in a single INSERT clause. - callable with signature (pd_table, conn, keys, data_iter). Details and a sample callable implementation can be found on Insertion method section of Insertion method.
kwargs (key, value mappings) – Other keyword arguments are passed down to pandas.DataFrame.to_sql.

Returns:

rows – Number of rows affected by to_sql. None is returned if the callable passed into insert_method does not return an integer number of rows.

Return type:

None or int

Raises:

ValueError –
- When values provided are not sufficient for insert operation. - When the table already exists and if_exists is ‘fail’.
OperationalError –
- Most likely there are duplicates records in the DataFrame. Other reasons are related to the database operation and are detailed on OperationalError.

resolve_dups(df, table, resolve=False)[source]¶

Resolve duplicate primary keys.

Parameters:

df (pandas.DataFrame) – DataFrame to check for duplicate records.
table (str) – Table the records will be inserted into.
resolve ([False, 'first', 'last'], default False) – Determines resolution method. * False : Mark all duplicates as False. * ‘first’ : Mark duplicates as False except for first occurence. * ‘last’ : Mark duplicates as False except for last occurence.

Returns:

mask – Series of booleans showing whether each record in the Dataframe is not a duplicate.

Return type:

pandas.Series

Raises:

ValueError – When primary key duplicates are found and resolve is True.

resolve_fks(df, table, resolve=False)[source]¶

Resolve missing foreign keys.

Parameters:

df (pandas.DataFrame) – Non-duplicated records to be checked.
table (str) – Table the records will be inserted into.
resolve (bool, optional) – If True, attempt to resolve missing parent records by inserting.

Returns:

mask – Series of booleans showing whether each record in the DataFrame has all required parent records.

Return type:

pandas.Series

Warning

Resolution fails with a ValueError when a foreign key constraint on a table does not reference all the columns that provide the required values to insert records into the parent table.

report()[source]¶: Generate report for database.

query(returns=None, **filters)[source]¶: Return table records that fit filter criteria.

classmethod get(**filters) → Base | None¶

Return an instance from index if found, else None.

Parameters:: filters (dict) – Dictionary of filter conditions used for querying.
Return type:: An instance of cls or None if not found.

classmethod get_identifiers(**kwargs) → Dict[str, Any]¶

Return dictionary of class identifier attributes and their values.

Parameters:: kwargs (key, value pairs) – Key-value pairs of identifier attributes and their values.
Return type:: A dictionary of the identifiers with their respective values.

classmethod options(**filters) → List[Base]¶

Return all existing instances from index.

Parameters:: filters (dict) – Dictionary of filter conditions used for querying.
Return type:: A list of instances of cls that match the filters.

type: Mapped[str]¶

check_for_key(key, mapping)[source]¶: Return columns in table mapping which contain the key provided.

migrate(src, dst, mapping, dry_run=False, na_vals=[], dtype_kws=None, **kwargs)[source]¶

Transform and migrate records from one database to another.

Parameters:

src (str) – Connection established source Database instance from which data is to be transferred.
dst (str) – Connection established destination Database instance to which data is to be transferred.
mapping (dict) – Dictionary providing information on column mappings, type, transformations, and validation checks.
dry_run (bool) – If dry run, no records are inserted into target and tables are not created, but errors and invalid records are logged.
na_vals (list, optional) – List of values to be considered as record not available. By default, the values ‘’, ‘None’, ‘NONE’, ‘NA’, and ‘Not Applicable’ are considered.
dtype_kws (dict, optional) – Key, value pairs that will be passed to almirah.utils.df.convert_column_type() kwargs.
kwargs (key, value mappings) – Other keyword arguments are passed down to almirah.Database.to_table.

reshape(records, steps)[source]¶: Reshape records into appropriate shape.

replace_value(value, column, mapping, file)[source]¶: Return unique replacement for given value based on mapping in file.

replace_column(series, value, to, file, strict=True)[source]¶: Replace values in series based on mapping in file.

transform(records, dtype_kws, mapping)[source]¶: Transform table into appropriate format.

transform_column(series, dtype_kws={}, **kwargs)[source]¶: Transform column to appropriate datatype.

validate(records, mapping)[source]¶: Validate table and return mask where True means valid.

validate_column(series, **kwargs)[source]¶: Validate column and return mask where True means valid.