API Reference#

This section provides detailed documentation for all ACRO classes, functions, and modules.

Core Classes#

ACRO Class#

class acro.ACRO(config='default', suppress=False)[source]

Bases: Tables, Regression

ACRO: Automatic Checking of Research Outputs.

Attributes:
config : dict

Safe parameters and their values.

results : Records

The current outputs including the results of checks.

suppress : bool

Whether to automatically apply suppression.

Parameters:
  • config (str)

  • suppress (bool)

Methods

add_comments(output, comment)

Add a comment to an output.

add_exception(output, reason)

Add an exception request to an output.

crosstab(index, columns[, values, rownames, ...])

Compute a simple cross tabulation of two (or more) factors.

custom_output(filename[, comment])

Add an unsupported output to the results dictionary.

finalise([path, ext])

Create a results file for checking.

hist(data, column[, by_val, grid, ...])

Create a histogram from a single column.

logit(endog, exog[, missing, check_rank])

Fits Logit model.

logitr(formula, data[, subset, drop_cols])

Fits Logit model from a formula and dataframe.

ols(endog[, exog, missing, hasconst])

Fits Ordinary Least Squares Regression.

olsr(formula, data[, subset, drop_cols])

Fits Ordinary Least Squares Regression from a formula and dataframe.

pivot_table(data[, values, index, columns, ...])

Create a spreadsheet-style pivot table as a DataFrame.

print_outputs()

Print the current results dictionary.

probit(endog, exog[, missing, check_rank])

Fits Probit model.

probitr(formula, data[, subset, drop_cols])

Fits Probit model from a formula and dataframe.

remove_output(key)

Remove an output from the results.

rename_output(old, new)

Rename an output.

surv_func(time, status, output[, entry, ...])

Estimate the survival function.

survival_plot(survival_table, survival_func, ...)

Create the survival plot, applying suppression if enabled.

survival_table(survival_table, safe_table, ...)

Create the survival table, applying suppression if enabled.

Examples

>>> acro = ACRO()
>>> results = acro.ols(
...     y, x
... )
>>> results.summary()
>>> acro.finalise(
...     "MYFOLDER",
...     "json",
... )
__init__(config='default', suppress=False)[source]

Construct a new ACRO object and read parameters from config.

Parameters:
config : str

Name of a yaml configuration file with safe parameters.

suppress : bool, default False

Whether to automatically apply suppression.

Parameters:
  • config (str)

  • suppress (bool)

Return type:

None

finalise(path='outputs', ext='json')[source]

Create a results file for checking.

Parameters:
path : str

Name of a folder to save outputs.

ext : str

Extension of the results file. Valid extensions: {json, xlsx}.

Returns:
Records

Object storing the outputs.

Parameters:

path (str)

Return type:

Records | None

remove_output(key)[source]

Remove an output from the results.

Parameters:
key : str

Key specifying which output to remove, e.g., ‘output_0’.

Parameters:

key (str)

Return type:

None

print_outputs()[source]

Print the current results dictionary.

Returns:
str

String representation of all outputs.

Return type:

str

custom_output(filename, comment='')[source]

Add an unsupported output to the results dictionary.

Parameters:
filename : str

The name of the file that will be added to the list of the outputs.

comment : str

An optional comment.

Parameters:
  • filename (str)

  • comment (str)

Return type:

None

rename_output(old, new)[source]

Rename an output.

Parameters:
old : str

The old name of the output.

new : str

The new name of the output.

Parameters:
  • old (str)

  • new (str)

Return type:

None

add_comments(output, comment)[source]

Add a comment to an output.

Parameters:
output : str

The name of the output.

comment : str

The comment.

Parameters:
  • output (str)

  • comment (str)

Return type:

None

add_exception(output, reason)[source]

Add an exception request to an output.

Parameters:
output : str

The name of the output.

reason : str

The reason the output should be released.

Parameters:
  • output (str)

  • reason (str)

Return type:

None
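The output-management methods above are typically chained on a session after an analysis has been recorded. A minimal sketch (the session variable, the output key, and the file names are illustrative, and the analysis call is elided):

```python
# Illustrative sketch: managing recorded outputs in an ACRO session.
# Assumes the `acro` package is installed and at least one analysis
# has been run, producing an output keyed "output_0".
import acro

session = acro.ACRO(suppress=True)

# ... run analyses that record outputs, e.g. session.crosstab(...) ...

session.rename_output("output_0", "table_by_region")  # give it a meaningful name
session.add_comments("table_by_region", "Table 1 in the draft paper.")
session.add_exception("table_by_region", "Aggregated data; low disclosure risk.")
session.custom_output("plot.png", "Figure produced outside ACRO.")
session.print_outputs()
```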

crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False, show_suppressed=False)

Compute a simple cross tabulation of two (or more) factors.

By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.

To provide consistent behaviour with different aggregation functions, 'empty' rows or columns, i.e. those that are all NaN or 0 (for count/sum), are removed.

Parameters:
index : array-like, Series, or list of arrays/Series

Values to group by in the rows.

columns : array-like, Series, or list of arrays/Series

Values to group by in the columns.

values : array-like, optional

Array of values to aggregate according to the factors. Requires aggfunc be specified.

rownames : sequence, default None

If passed, must match number of row arrays passed.

colnames : sequence, default None

If passed, must match number of column arrays passed.

aggfunc : str, optional

If specified, requires values be specified as well.

margins : bool, default False

Add row/column margins (subtotals).

margins_name : str, default 'All'

Name of the row/column that will contain the totals when margins is True.

dropna : bool, default True

Do not include columns whose entries are all NaN.

normalize : bool, {'all', 'index', 'columns'}, or {0,1}, default False

Normalize by dividing all values by the sum of values.

  • If passed 'all' or True, will normalize over all values.

  • If passed 'index' will normalize over each row.

  • If passed 'columns' will normalize over each column.

  • If margins is True, will also normalize margin values.

show_suppressed : bool, default False

Controls how the totals are calculated when suppression is enabled.

Returns:
DataFrame

Cross tabulation of the data.

Parameters:
  • margins (bool)

  • margins_name (str)

  • dropna (bool)

Return type:

DataFrame
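A hedged example of a frequency crosstab (the DataFrame and its column names are made up for illustration):

```python
# Illustrative sketch: frequency table with automatic disclosure checks.
# Assumes the `acro` package is installed; df and its columns are made up.
import acro
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["north"] * 60 + ["south"] * 40,
        "grant_type": ["A", "B"] * 50,
    }
)

session = acro.ACRO(suppress=True)
safe_table = session.crosstab(df.region, df.grant_type)
print(safe_table)  # cells failing the checks would be suppressed
```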

hist(data, column, by_val=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, axis=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, backend=None, legend=False, filename='histogram.png', **kwargs)

Create a histogram from a single column.

The dataset and the column’s name should be passed to the function as parameters. If more than one column is used the histogram will not be calculated.

To save the histogram plot to a file, the user can specify a filename otherwise ‘histogram.png’ will be used as the filename. A number will be appended automatically to the filename to avoid overwriting the files.

Parameters:
data : DataFrame

The pandas object holding the data.

column : str

The column that will be used to plot the histogram.

by_val : object, optional

If passed, then used to form histograms for separate groups.

grid : bool, default True

Whether to show axis grid lines.

xlabelsize : int, default None

If specified changes the x-axis label size.

xrot : float, default None

Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.

ylabelsize : int, default None

If specified changes the y-axis label size.

yrot : float, default None

Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.

axis : Matplotlib axes object, default None

The axes to plot the histogram on.

sharex : bool, default True if ax is None else False

In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.

sharey : bool, default False

In case subplots=True, share y axis and set some y axis labels to invisible.

figsize : tuple, optional

The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.

layout : tuple, optional

Tuple of (rows, columns) for the layout of the histograms.

bins : int or sequence, default 10

Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin.

backend : str, default None

Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

legend : bool, default False

Whether to show the legend.

filename : str, default 'histogram.png'

The name of the file where the plot will be saved.

Returns:
matplotlib.Axes

The histogram.

str

The name of the file where the histogram is saved.
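A hedged sketch (the DataFrame and its column are made up; the two return values follow the Returns section above):

```python
# Illustrative sketch: histogram of a single column, saved to a file.
# Assumes the `acro` package is installed; df and "income" are made up.
import acro
import pandas as pd

df = pd.DataFrame({"income": [20_000 + 500 * i for i in range(100)]})

session = acro.ACRO()
ax, saved_path = session.hist(df, "income", bins=20, filename="income.png")
print(saved_path)  # a number is appended to avoid overwriting existing files
```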

logit(endog, exog, missing=None, check_rank=True)

Fits Logit model.

Parameters:
endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user.

missing : str | None

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

check_rank : bool

Check exog rank to determine model degrees of freedom. Default is True. Setting to False reduces model initialization time when exog.shape[1] is large.

Returns:
BinaryResultsWrapper

Results.

Parameters:
  • missing (str | None)

  • check_rank (bool)

Return type:

BinaryResultsWrapper
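Since an intercept is not included by default, one can be supplied with statsmodels' add_constant. A hedged sketch (y and x are placeholder arrays, as in the class example above):

```python
# Illustrative sketch: logit with an explicit intercept.
# Assumes the `acro` and `statsmodels` packages are installed;
# y (binary response) and x (regressors) are placeholders.
import acro
import statsmodels.api as sm

session = acro.ACRO()
x_const = sm.add_constant(x)         # add the intercept column
results = session.logit(y, x_const)  # disclosure checks are applied to the fit
print(results.summary())
```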

logitr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Logit model from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy:patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.

ols(endog, exog=None, missing='none', hasconst=None, **kwargs)

Fits Ordinary Least Squares Regression.

Parameters:
endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user.

missing : str

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

hasconst : None or bool

Indicates whether the RHS includes a user-supplied constant. If True, a constant is not checked for and k_constant is set to 1 and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.

**kwargs

Extra arguments that are used to set model properties when using the formula interface.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

olsr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Ordinary Least Squares Regression from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy:patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.
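A hedged sketch of the formula interface, which handles the intercept and categorical encoding via patsy (df and its columns are placeholders):

```python
# Illustrative sketch: OLS from a formula and dataframe.
# Assumes the `acro` package is installed; df and its columns are
# placeholders, as are y and x in the class example above.
import acro

session = acro.ACRO()
results = session.olsr("income ~ years_in_work + C(education)", data=df)
print(results.summary())
```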

pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)

Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

To provide consistent behaviour with different aggregation functions, 'empty' rows or columns, i.e. those that are all NaN or 0 (for count/sum), are removed.

Parameters:
data : DataFrame

The DataFrame to operate on.

values : column, optional

Column to aggregate, optional.

index : column, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.

columns : column, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.

aggfunc : str | list[str], default 'mean'

If list of strings passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves).

fill_value : scalar, default None

Value to replace missing values with (in the resulting pivot table, after aggregation).

margins : bool, default False

Add all row / columns (e.g. for subtotal / grand totals).

dropna : bool, default True

Do not include columns whose entries are all NaN.

margins_name : str, default 'All'

Name of the row / column that will contain the totals when margins is True.

observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

sort : bool, default True

Specifies if the result should be sorted.

Returns:
DataFrame

Cross tabulation of the data.

Parameters:
  • data (DataFrame)

  • margins (bool)

  • dropna (bool)

  • margins_name (str)

  • observed (bool)

  • sort (bool)

Return type:

DataFrame
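A hedged sketch aggregating a value column per group (the DataFrame and its columns are made up):

```python
# Illustrative sketch: mean and std of a value column per group.
# Assumes the `acro` package is installed; df and its columns are made up.
import acro
import pandas as pd

df = pd.DataFrame(
    {
        "grant_type": ["A", "B"] * 50,
        "inc_grants": [100.0 + i for i in range(100)],
    }
)

session = acro.ACRO(suppress=True)
table = session.pivot_table(
    df, values="inc_grants", index="grant_type", aggfunc=["mean", "std"]
)
print(table)
```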

probit(endog, exog, missing=None, check_rank=True)

Fits Probit model.

Parameters:
endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user.

missing : str | None

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

check_rank : bool

Check exog rank to determine model degrees of freedom. Default is True. Setting to False reduces model initialization time when exog.shape[1] is large.

Returns:
BinaryResultsWrapper

Results.

Parameters:
  • missing (str | None)

  • check_rank (bool)

Return type:

BinaryResultsWrapper

probitr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Probit model from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy:patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.

surv_func(time, status, output, entry=None, title=None, freq_weights=None, exog=None, bw_factor=1.0, filename='kaplan-meier.png')

Estimate the survival function.

Parameters:
time : array_like

An array of times (censoring times or event times).

status : array_like

Status at the event time: status==1 is the 'event' (e.g. death, failure), meaning that the event occurs at the given value in time; status==0 indicates that censoring has occurred, meaning that the event occurs after the given value in time.

output : str

A string determining the type of output. Available options are 'table' and 'plot'.

entry : array_like, optional

An array of entry times for handling left truncation (the subject is not in the risk set on or before the entry time).

title : str

Optional title used for plots and summary output.

freq_weights : array_like

Optional frequency weights.

exog : array_like

Optional, if present used to account for violation of independent censoring.

bw_factor : float

Band-width multiplier for kernel-based estimation. Only used if exog is provided.

filename : str

The name of the file where the plot will be saved. Only used if the output is a plot.

Returns:
DataFrame

The survival table.

Return type:

DataFrame
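A hedged sketch of a Kaplan-Meier estimate as a table (df and its time/status columns are placeholders):

```python
# Illustrative sketch: Kaplan-Meier survival estimate as a table.
# Assumes the `acro` package is installed; df, "futime", and "death"
# are placeholders.
import acro

session = acro.ACRO()
table = session.surv_func(df.futime, df.death, output="table")
print(table)
```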

survival_plot(survival_table, survival_func, filename, status, sdc, command, summary)

Create the survival plot, applying suppression if enabled.

survival_table(survival_table, safe_table, status, sdc, command, summary, outcome)

Create the survival table, applying suppression if enabled.

Record Management#

Record Classes#

class acro.record.Records[source]

Bases: object

Stores data related to a collection of output records.

Methods

add(status, output_type, properties, sdc, ...)

Add an output to the results.

add_comments(output, comment)

Add a comment to an output.

add_custom(filename[, comment])

Add an unsupported output to the results dictionary.

add_exception(output, reason)

Add an exception request to an output.

finalise(path, ext)

Create a results file for checking.

finalise_excel(path)

Write outputs to an Excel spreadsheet.

finalise_json(path)

Write outputs to a JSON file.

get(key)

Return a specified output from the results.

get_index(index)

Return the output at the specified position.

get_keys()

Return the list of available output keys.

print()

Print the current results.

remove(key)

Remove an output from the results.

rename(old, new)

Rename an output.

validate_outputs()

Prompt researcher to complete any required fields.

write_checksums(path)

Write checksums for each file to checksums folder.

__init__()[source]

Construct a new object for storing multiple records.

Return type:

None

add(status, output_type, properties, sdc, command, summary, outcome, output, comments=None)[source]

Add an output to the results.

Parameters:
status : str

SDC status: {“pass”, “fail”, “review”}

output_type : str

Type of output, e.g., “regression”

properties : dict

Dictionary containing structured output data.

sdc : dict

Dictionary containing SDC results.

command : str

String representation of the operation performed.

summary : str

String summarising the ACRO checks.

outcome : DataFrame

DataFrame describing the details of ACRO checks.

output : list[str] | list[DataFrame]

List of output DataFrames.

comments : list[str] | None, default None

List of strings entered by the user to add comments to the output.

Parameters:
  • status (str)

  • output_type (str)

  • properties (dict)

  • sdc (dict)

  • command (str)

  • summary (str)

  • outcome (DataFrame)

  • output (list[str] | list[DataFrame])

  • comments (list[str] | None)

Return type:

None

remove(key)[source]

Remove an output from the results.

Parameters:
key : str

Key specifying which output to remove, e.g., ‘output_0’.

Parameters:

key (str)

Return type:

None

get(key)[source]

Return a specified output from the results.

Parameters:
key : str

Key specifying which output to return, e.g., ‘output_0’.

Returns:
Record

The requested output.

Parameters:

key (str)

Return type:

Record

get_keys()[source]

Return the list of available output keys.

Returns:
list[str]

List of output names.

Return type:

list[str]

get_index(index)[source]

Return the output at the specified position.

Parameters:
index : int

Position of the output to return.

Returns:
Record

The requested output.

Parameters:

index (int)

Return type:

Record

add_custom(filename, comment=None)[source]

Add an unsupported output to the results dictionary.

Parameters:
filename : str

The name of the file that will be added to the list of the outputs.

comment : str | None, default None

An optional comment.

Parameters:
  • filename (str)

  • comment (str | None)

Return type:

None

rename(old, new)[source]

Rename an output.

Parameters:
old : str

The old name of the output.

new : str

The new name of the output.

Parameters:
  • old (str)

  • new (str)

Return type:

None

add_comments(output, comment)[source]

Add a comment to an output.

Parameters:
output : str

The name of the output.

comment : str

The comment.

Parameters:
  • output (str)

  • comment (str)

Return type:

None

add_exception(output, reason)[source]

Add an exception request to an output.

Parameters:
output : str

The name of the output.

reason : str

The reason the output should be released.

Parameters:
  • output (str)

  • reason (str)

Return type:

None

print()[source]

Print the current results.

Returns:
str

String representation of all outputs.

Return type:

str

validate_outputs()[source]

Prompt researcher to complete any required fields.

Return type:

None

finalise(path, ext)[source]

Create a results file for checking.

Parameters:
path : str

Name of a folder to save outputs.

ext : str

Extension of the results file. Valid extensions: {json, xlsx}.

Parameters:
  • path (str)

  • ext (str)

Return type:

None

finalise_json(path)[source]

Write outputs to a JSON file.

Parameters:
path : str

Name of a folder to save outputs.

Parameters:

path (str)

Return type:

None

finalise_excel(path)[source]

Write outputs to an Excel spreadsheet.

Parameters:
path : str

Name of a folder to save outputs.

Parameters:

path (str)

Return type:

None

write_checksums(path)[source]

Write checksums for each file to checksums folder.

Parameters:
path : str

Name of a folder to save outputs.

Parameters:

path (str)

Return type:

None

Record Module#

Records#

ACRO: Output storage and serialization.

acro.record.load_outcome(outcome)[source]

Return a DataFrame from an outcome dictionary.

Parameters:
outcome : dict

The outcome to load as a DataFrame.

Parameters:

outcome (dict)

Return type:

DataFrame

acro.record.load_output(path, output)[source]

Return a loaded output.

Parameters:
path : str

The path to the output folder (with results.json).

output : list[str]

The output to load.

Returns:
list[str] | list[DataFrame]

The loaded output field.

Parameters:
  • path (str)

  • output (list[str])

Return type:

list[str] | list[DataFrame]

class acro.record.Record(uid, status, output_type, properties, sdc, command, summary, outcome, output, comments=None)[source]

Stores data related to a single output record.

Attributes:
uid : str

Unique identifier.

status : str

SDC status: {“pass”, “fail”, “review”}

output_type : str

Type of output, e.g., “regression”

properties : dict

Dictionary containing structured output data.

sdc : dict

Dictionary containing SDC results.

command : str

String representation of the operation performed.

summary : str

String summarising the ACRO checks.

outcome : DataFrame

DataFrame describing the details of ACRO checks.

output : Any

List of output DataFrames.

comments : list[str]

List of strings entered by the user to add comments to the output.

exception : str

Description of why an exception to fail/review should be granted.

timestamp : str

Time the record was created in ISO format.

Parameters:
  • uid (str)

  • status (str)

  • output_type (str)

  • properties (dict)

  • sdc (dict)

  • command (str)

  • summary (str)

  • outcome (DataFrame)

  • output (list[str] | list[DataFrame])

  • comments (list[str] | None)

Methods

serialize_output([path])

Serialize outputs.

__init__(uid, status, output_type, properties, sdc, command, summary, outcome, output, comments=None)[source]

Construct a new output record.

Parameters:
uid : str

Unique identifier.

status : str

SDC status: {“pass”, “fail”, “review”}

output_type : str

Type of output, e.g., “regression”

properties : dict

Dictionary containing structured output data.

sdc : dict

Dictionary containing SDC results.

command : str

String representation of the operation performed.

summary : str

String summarising the ACRO checks.

outcome : DataFrame

DataFrame describing the details of ACRO checks.

output : list[str] | list[DataFrame]

List of output DataFrames.

comments : list[str] | None, default None

List of strings entered by the user to add comments to the output.

Parameters:
  • uid (str)

  • status (str)

  • output_type (str)

  • properties (dict)

  • sdc (dict)

  • command (str)

  • summary (str)

  • outcome (DataFrame)

  • output (list[str] | list[DataFrame])

  • comments (list[str] | None)

Return type:

None

serialize_output(path='outputs')[source]

Serialize outputs.

Parameters:
path : str, default 'outputs'

Name of the folder to which outputs will be written.

Returns:
list[str]

List of filepaths of the written outputs.

Parameters:

path (str)

Return type:

list[str]

acro.record.load_records(path)[source]

Load outputs from a JSON file.

Parameters:
path : str

Name of an output folder containing results.json.

Returns:
Records

The loaded records.

Parameters:

path (str)

Return type:

Records
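On the checker's side, a finalised folder can be reloaded and inspected. A hedged sketch (the folder name is illustrative):

```python
# Illustrative sketch: checker-side review of a finalised results folder.
# Assumes the `acro` package is installed and "outputs" contains a
# results.json produced by finalise().
from acro.record import load_records

records = load_records("outputs")
for key in records.get_keys():
    record = records.get(key)
    print(key, record.status, record.summary)
```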

Utilities#

Helper Functions#

ACRO: Utility Functions.

acro.utils.get_command(default, stack_list)[source]

Return the calling source line as a string.

Parameters:
default : str

Default string to return if unable to extract the stack.

stack_list : list[tuple]

A list of frame records for the caller’s stack. The first entry in the returned list represents the caller; the last entry represents the outermost call on the stack.

Returns:
str

The calling source line.

Parameters:
  • default (str)

  • stack_list (list[FrameInfo])

Return type:

str

acro.utils.prettify_table_string(table, separator=None)[source]

Add delimiters to table.to_string() to improve readability for onscreen display.

Splits fields on whitespace unless an optional separator is provided, e.g. ',' for CSV.

Parameters:
  • table (DataFrame)

  • separator (str | None)

Return type:

str

Function Reference by Category#

Output Management#

  • finalise() - Prepare outputs for review

  • remove_output() - Remove specific output

  • print_outputs() - Display current outputs

  • custom_output() - Add custom output

  • rename_output() - Rename an output

  • add_comments() - Add comments to output

  • add_exception() - Add exception request

Method Parameters#

Common Parameters#

Many ACRO methods share common parameters:

suppress : bool

Whether to suppress potentially disclosive outputs.

show_suppressed : bool

Whether to display suppressed values in output.

safe_threshold : int

Minimum cell count threshold for safety.

safe_dof_threshold : int

Minimum degrees of freedom for statistical models.

safe_nk_n : int

Minimum number of observations for the nk-dominance rule.

safe_nk_k : float

Threshold for the nk-dominance rule (0-1).

safe_p_threshold : float

P-value threshold for statistical significance.

Return Types#

Output Objects#

Most ACRO methods return the disclosure-controlled result directly (for example, crosstab() returns a DataFrame and ols() returns a results wrapper). In addition, each analysis is stored in the session as a record containing:

  • Status: the result of the SDC checks ({"pass", "fail", "review"})

  • Command: a string representation of the operation performed

  • Outcome: details of the applied disclosure checks

  • Output: the safe (possibly suppressed) result

# The safe table is returned directly
result = acro.crosstab(df.col1, df.col2)

# The stored records hold the status, checks, and metadata
acro.print_outputs()

Typical Workflow#

ACRO functions return results that are automatically checked for disclosure risks:

import acro

# Initialize ACRO
session = acro.ACRO(suppress=True)

# Results are automatically checked
result = session.crosstab(df.col1, df.col2)

# View outputs
session.print_outputs()

# Finalize for review
session.finalise("outputs/")

Version Information#

import acro
from acro.version import __version__
print(__version__)

Compatibility#

Python Version Support#

ACRO supports Python 3.9 and later versions.

Dependency Requirements#

  • pandas (>= 1.5.0): data manipulation and analysis

  • numpy (>= 1.21.0): numerical computing

  • statsmodels (>= 0.13.0): statistical modeling

  • openpyxl (>= 3.0.0): Excel file support

  • pyyaml (>= 5.4.0): configuration file handling

Configuration#

ACRO uses YAML configuration files to set safety parameters:

# Initialize with default config
session = acro.ACRO(config="default", suppress=True)

# Configuration is loaded from default.yaml
print(session.config)

Custom Configuration#

Create custom YAML files for different environments:

# custom.yaml
safe_threshold: 10
safe_nk_n: 2
safe_nk_k: 0.9
check_missing_values: true
zeros_are_disclosive: false
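The custom file can then be loaded by name. A hedged sketch (it is an assumption here that "custom" resolves to the custom.yaml file via ACRO's configuration lookup):

```python
# Illustrative sketch: loading the custom configuration above.
# Assumes the `acro` package is installed and that "custom" resolves
# to custom.yaml (the lookup rules are an assumption).
import acro

session = acro.ACRO(config="custom", suppress=True)
print(session.config)  # dict of safe parameters, e.g. safe_threshold
```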

See Also#