Database and DBManager¶
- class DBManager[source]¶
Bases:
object
Manages database connections and sessions.
- Session = None¶
- property connection: Connection¶
Provides a context-managed connection to the database.
- property session: None¶
Provides a context-managed session for performing database operations.
Database functionality for data write and access.
- class Database[source]¶
Bases:
Component
Generic database representation.
- id: Mapped[int]¶
- name: Mapped[str]¶
- host: Mapped[str]¶
- backend: Mapped[str]¶
- property connection¶
- property worksheet¶
- property meta¶
- property token¶
- build_column(name, dtype, **kwargs)[source]¶
Build SQLalchemy Column object given column description.
- create_table(table, cols, refs=[])[source]¶
Create or extend table within database given the description.
- Parameters:
table_name (str) – Name of the table to create or extend.
columns (list of dict) – List of columns specifications to add to the table.
constraints (list of dict, optional) – List of table constraints.
- connect(username=None, password=None, keyfile=None)[source]¶
Establish a connection to the database.
- Parameters:
username (str) – The database username.
password (str) – The database password.
keyfile (str) – Path to the service account keyfile. Required if backend is gsheet.
- Raises:
SQLAlchemyError – If the connection cannot be established at database.
ValueError – If the connection cannot be established at URL endpoint.
- get_records(table, cols=None)[source]¶
Retrieve records from table in database as a DataFrame.
- Parameters:
table (str) – Table in database from which to retrieve records.
cols (list of str) – Column names to select from table.
- to_table(df, table, check_dups=True, resolve_dups=False, check_fks=True, resolve_fks=False, insert_ignore=False, drop_na=None, threshold=None, if_exists='append', index=False, insert_method=None, **kwargs)[source]¶
Write records in DataFrame to a table.
- Parameters:
df (pandas.DataFrame) – DataFrame containing records.
table (str) – Table in database into which the records will be inserted.
check_dups (bool, default True) – Check for duplicates in records.
resolve_dups ([False, 'first', 'last'], default False) – Resolution method for duplicates if found.
check_fks (bool, default False) – Check if foreign keys present in parent.
resolve_fks (bool, default False) – Attempt to resolve missing foreign keys by inserting to parent.
insert_ignore (bool, default Flase) – Ignore insertion of records already present in table.
drop_na (list of df column names, default None) – If provided, records with na in all given columns are dropped.
threshold (int, None) – If non-NA values < threshold, then record dropped.
if_exists (['append', 'replace', 'fail'], default 'append') – Insert behavior in case table exists. - ‘append’ : Insert new values to the existing table. - ‘replace’ : Drop the table before inserting new values. - ‘fail’ : Raise a ValueError if table exists.
index (bool, default False) – Write DataFrame index as a column. Uses index_label as the column name in the table.
insert_method ({None, 'multi', callable}, optional) – Controls the SQL insertion clause used. - None : Uses standard SQL INSERT clause (one per row). - ‘multi’: Pass multiple values in a single INSERT clause. - callable with signature
(pd_table, conn, keys, data_iter)
. Details and a sample callable implementation can be found on Insertion method section of Insertion method.kwargs (key, value mappings) – Other keyword arguments are passed down to pandas.DataFrame.to_sql.
- Returns:
rows – Number of rows affected by to_sql. None is returned if the callable passed into insert_method does not return an integer number of rows.
- Return type:
None or int
- Raises:
ValueError –
When values provided are not sufficient for insert operation. - When the table already exists and if_exists is ‘fail’.
OperationalError –
Most likely there are duplicates records in the DataFrame. Other reasons are related to the database operation and are detailed on OperationalError.
- resolve_dups(df, table, resolve=False)[source]¶
Resolve duplicate primary keys.
- Parameters:
df (pandas.DataFrame) – DataFrame to check for duplicate records.
table (str) – Table the records will be inserted into.
resolve ([False, 'first', 'last'], default False) – Determines resolution method. * False : Mark all duplicates as False. * ‘first’ : Mark duplicates as False except for first occurence. * ‘last’ : Mark duplicates as False except for last occurence.
- Returns:
mask – Series of booleans showing whether each record in the Dataframe is not a duplicate.
- Return type:
pandas.Series
- Raises:
ValueError – When primary key duplicates are found and resolve is True.
- resolve_fks(df, table, resolve=False)[source]¶
Resolve missing foreign keys.
- Parameters:
df (pandas.DataFrame) – Non-duplicated records to be checked.
table (str) – Table the records will be inserted into.
resolve (bool, optional) – If True, attempt to resolve missing parent records by inserting.
- Returns:
mask – Series of booleans showing whether each record in the DataFrame has all required parent records.
- Return type:
pandas.Series
Warning
Resolution fails with a ValueError when a foreign key constraint on a table does not reference all the columns that provide the required values to insert records into the parent table.
- classmethod get(**filters) Base | None ¶
Return an instance from index if found, else None.
- Parameters:
filters (dict) – Dictionary of filter conditions used for querying.
- Return type:
An instance of cls or None if not found.
- classmethod get_identifiers(**kwargs) Dict[str, Any] ¶
Return dictionary of class identifier attributes and their values.
- Parameters:
kwargs (key, value pairs) – Key-value pairs of identifier attributes and their values.
- Return type:
A dictionary of the identifiers with their respective values.
- classmethod options(**filters) List[Base] ¶
Return all existing instances from index.
- Parameters:
filters (dict) – Dictionary of filter conditions used for querying.
- Return type:
A list of instances of cls that match the filters.
- type: Mapped[str]¶
- check_for_key(key, mapping)[source]¶
Return columns in table mapping which contain the key provided.
- migrate(src, dst, mapping, dry_run=False, na_vals=[], dtype_kws=None, **kwargs)[source]¶
Transform and migrate records from one database to another.
- Parameters:
src (str) – Connection established source Database instance from which data is to be transferred.
dst (str) – Connection established destination Database instance to which data is to be transferred.
mapping (dict) – Dictionary providing information on column mappings, type, transformations, and validation checks.
dry_run (bool) – If dry run, no records are inserted into target and tables are not created, but errors and invalid records are logged.
na_vals (list, optional) – List of values to be considered as record not available. By default, the values ‘’, ‘None’, ‘NONE’, ‘NA’, and ‘Not Applicable’ are considered.
dtype_kws (dict, optional) – Key, value pairs that will be passed to
almirah.utils.df.convert_column_type()
kwargs.kwargs (key, value mappings) – Other keyword arguments are passed down to almirah.Database.to_table.
- replace_value(value, column, mapping, file)[source]¶
Return unique replacement for given value based on mapping in file.