API Reference#

This section provides detailed documentation for all ACRO classes, functions, and modules.

Core Classes#

ACRO Class#

class acro.ACRO(config='default', suppress=False)[source]

Bases: Tables, Regression

ACRO: Automatic Checking of Research Outputs.

Attributes:
config : dict

Safe parameters and their values.

results : Records

The current outputs including the results of checks.

suppress : bool

Whether to automatically apply suppression.

Parameters:
  • config (str)

  • suppress (bool)

Methods

add_comments(output, comment)

Add a comment to an output.

add_exception(output, reason)

Add an exception request to an output.

crosstab(index, columns[, values, rownames, ...])

Compute a simple cross tabulation of two (or more) factors.

custom_output(filename[, comment])

Add an unsupported output to the results dictionary.

finalise([path, ext])

Create a results file for checking.

hist(data, column[, by_val, grid, ...])

Create a histogram from a single column.

logit(endog, exog[, missing, check_rank])

Fits Logit model.

logitr(formula, data[, subset, drop_cols])

Fits Logit model from a formula and dataframe.

ols(endog[, exog, missing, hasconst])

Fits Ordinary Least Squares Regression.

olsr(formula, data[, subset, drop_cols])

Fits Ordinary Least Squares Regression from a formula and dataframe.

pivot_table(data[, values, index, columns, ...])

Create a spreadsheet-style pivot table as a DataFrame.

print_outputs()

Print the current results dictionary.

probit(endog, exog[, missing, check_rank])

Fits Probit model.

probitr(formula, data[, subset, drop_cols])

Fits Probit model from a formula and dataframe.

remove_output(key)

Remove an output from the results.

rename_output(old, new)

Rename an output.

surv_func(time, status, output[, entry, ...])

Estimate the survival function.

survival_plot(survival_table, survival_func, ...)

Create the survival plot, applying suppression if enabled.

survival_table(survival_table, safe_table, ...)

Create the survival table, applying suppression if enabled.

Examples

>>> acro = ACRO()
>>> results = acro.ols(
...     y, x
... )
>>> results.summary()
>>> acro.finalise(
...     "MYFOLDER",
...     "json",
... )
__init__(config='default', suppress=False)[source]

Construct a new ACRO object and read parameters from config.

Parameters:
config : str

Name of a yaml configuration file with safe parameters.

suppress : bool, default False

Whether to automatically apply suppression.

Parameters:
  • config (str)

  • suppress (bool)

Return type:

None

finalise(path='outputs', ext='json')[source]

Create a results file for checking.

Parameters:
path : str

Name of a folder to save outputs.

ext : str

Extension of the results file. Valid extensions: {json, xlsx}.

Returns:
Records

Object storing the outputs.

Parameters:

path (str)

Return type:

Records | None

remove_output(key)[source]

Remove an output from the results.

Parameters:
key : str

Key specifying which output to remove, e.g., ‘output_0’.

Parameters:

key (str)

Return type:

None

print_outputs()[source]

Print the current results dictionary.

Returns:
str

String representation of all outputs.

Return type:

str

custom_output(filename, comment='')[source]

Add an unsupported output to the results dictionary.

Parameters:
filename : str

The name of the file that will be added to the list of the outputs.

comment : str

An optional comment.

Parameters:
  • filename (str)

  • comment (str)

Return type:

None

rename_output(old, new)[source]

Rename an output.

Parameters:
old : str

The old name of the output.

new : str

The new name of the output.

Parameters:
  • old (str)

  • new (str)

Return type:

None

add_comments(output, comment)[source]

Add a comment to an output.

Parameters:
output : str

The name of the output.

comment : str

The comment.

Parameters:
  • output (str)

  • comment (str)

Return type:

None

add_exception(output, reason)[source]

Add an exception request to an output.

Parameters:
output : str

The name of the output.

reason : str

The reason the output should be released.

Parameters:
  • output (str)

  • reason (str)

Return type:

None
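The output-management methods above are typically chained on a session after an analysis has been recorded. A minimal sketch (the session variable, the output key, and the file names are illustrative, and the analysis call is elided):

```python
# Illustrative sketch: managing recorded outputs in an ACRO session.
# Assumes the `acro` package is installed and at least one analysis
# has been run, producing an output keyed "output_0".
import acro

session = acro.ACRO(suppress=True)

# ... run analyses that record outputs, e.g. session.crosstab(...) ...

session.rename_output("output_0", "table_by_region")  # give it a meaningful name
session.add_comments("table_by_region", "Table 1 in the draft paper.")
session.add_exception("table_by_region", "Aggregated data; low disclosure risk.")
session.custom_output("plot.png", "Figure produced outside ACRO.")
session.print_outputs()
```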

crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False, show_suppressed=False)

Compute a simple cross tabulation of two (or more) factors.

By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.

To provide consistent behaviour with different aggregation functions, 'empty' rows or columns, i.e. those that are all NaN or 0 (for count/sum), are removed.

Parameters:
index : array-like, Series, or list of arrays/Series

Values to group by in the rows.

columns : array-like, Series, or list of arrays/Series

Values to group by in the columns.

values : array-like, optional

Array of values to aggregate according to the factors. Requires aggfunc be specified.

rownames : sequence, default None

If passed, must match number of row arrays passed.

colnames : sequence, default None

If passed, must match number of column arrays passed.

aggfunc : str, optional

If specified, requires values be specified as well.

margins : bool, default False

Add row/column margins (subtotals).

margins_name : str, default 'All'

Name of the row/column that will contain the totals when margins is True.

dropna : bool, default True

Do not include columns whose entries are all NaN.

normalize : bool, {'all', 'index', 'columns'}, or {0,1}, default False

Normalize by dividing all values by the sum of values.

  • If passed 'all' or True, will normalize over all values.

  • If passed 'index' will normalize over each row.

  • If passed 'columns' will normalize over each column.

  • If margins is True, will also normalize margin values.

show_suppressed : bool, default False

Controls how the totals are calculated when suppression is enabled.

Returns:
DataFrame

Cross tabulation of the data.

Parameters:
  • margins (bool)

  • margins_name (str)

  • dropna (bool)

Return type:

DataFrame
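A hedged example of a frequency crosstab (the DataFrame and its column names are made up for illustration):

```python
# Illustrative sketch: frequency table with automatic disclosure checks.
# Assumes the `acro` package is installed; df and its columns are made up.
import acro
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["north"] * 60 + ["south"] * 40,
        "grant_type": ["A", "B"] * 50,
    }
)

session = acro.ACRO(suppress=True)
safe_table = session.crosstab(df.region, df.grant_type)
print(safe_table)  # cells failing the checks would be suppressed
```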

hist(data, column, by_val=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, axis=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, backend=None, legend=False, filename='histogram.png', **kwargs)

Create a histogram from a single column.

The dataset and the column’s name should be passed to the function as parameters. If more than one column is used the histogram will not be calculated.

To save the histogram plot to a file, the user can specify a filename otherwise ‘histogram.png’ will be used as the filename. A number will be appended automatically to the filename to avoid overwriting the files.

Parameters:
data : DataFrame

The pandas object holding the data.

column : str

The column that will be used to plot the histogram.

by_val : object, optional

If passed, then used to form histograms for separate groups.

grid : bool, default True

Whether to show axis grid lines.

xlabelsize : int, default None

If specified changes the x-axis label size.

xrot : float, default None

Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.

ylabelsize : int, default None

If specified changes the y-axis label size.

yrot : float, default None

Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.

axis : Matplotlib axes object, default None

The axes to plot the histogram on.

sharex : bool, default True if ax is None else False

In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.

sharey : bool, default False

In case subplots=True, share y axis and set some y axis labels to invisible.

figsize : tuple, optional

The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.

layout : tuple, optional

Tuple of (rows, columns) for the layout of the histograms.

bins : int or sequence, default 10

Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin.

backend : str, default None

Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

legend : bool, default False

Whether to show the legend.

filename : str, default 'histogram.png'

The name of the file where the plot will be saved.

Returns:
matplotlib.Axes

The histogram.

str

The name of the file where the histogram is saved.
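A hedged sketch (the DataFrame and its column are made up; the two return values follow the Returns section above):

```python
# Illustrative sketch: histogram of a single column, saved to a file.
# Assumes the `acro` package is installed; df and "income" are made up.
import acro
import pandas as pd

df = pd.DataFrame({"income": [20_000 + 500 * i for i in range(100)]})

session = acro.ACRO()
ax, saved_path = session.hist(df, "income", bins=20, filename="income.png")
print(saved_path)  # a number is appended to avoid overwriting existing files
```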

logit(endog, exog, missing=None, check_rank=True)

Fits Logit model.

Parameters:
endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user.

missing : str | None

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

check_rank : bool

Check exog rank to determine model degrees of freedom. Default is True. Setting to False reduces model initialization time when exog.shape[1] is large.

Returns:
BinaryResultsWrapper

Results.

Parameters:
  • missing (str | None)

  • check_rank (bool)

Return type:

BinaryResultsWrapper
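Since an intercept is not included by default, one can be supplied with statsmodels' add_constant. A hedged sketch (y and x are placeholder arrays, as in the class example above):

```python
# Illustrative sketch: logit with an explicit intercept.
# Assumes the `acro` and `statsmodels` packages are installed;
# y (binary response) and x (regressors) are placeholders.
import acro
import statsmodels.api as sm

session = acro.ACRO()
x_const = sm.add_constant(x)         # add the intercept column
results = session.logit(y, x_const)  # disclosure checks are applied to the fit
print(results.summary())
```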

logitr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Logit model from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy:patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.

ols(endog, exog=None, missing='none', hasconst=None, **kwargs)

Fits Ordinary Least Squares Regression.

Parameters:
endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user.

missing : str

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

hasconst : None or bool

Indicates whether the RHS includes a user-supplied constant. If True, a constant is not checked for and k_constant is set to 1 and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.

**kwargs

Extra arguments that are used to set model properties when using the formula interface.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

olsr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Ordinary Least Squares Regression from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy:patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.
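A hedged sketch of the formula interface, which handles the intercept and categorical encoding via patsy (df and its columns are placeholders):

```python
# Illustrative sketch: OLS from a formula and dataframe.
# Assumes the `acro` package is installed; df and its columns are
# placeholders, as are y and x in the class example above.
import acro

session = acro.ACRO()
results = session.olsr("income ~ years_in_work + C(education)", data=df)
print(results.summary())
```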

pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)

Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

To provide consistent behaviour with different aggregation functions, 'empty' rows or columns, i.e. those that are all NaN or 0 (for count/sum), are removed.

Parameters:
data : DataFrame

The DataFrame to operate on.

values : column, optional

Column to aggregate, optional.

index : column, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.

columns : column, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.

aggfunc : str | list[str], default 'mean'

If list of strings passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves).

fill_value : scalar, default None

Value to replace missing values with (in the resulting pivot table, after aggregation).

margins : bool, default False

Add all row / columns (e.g. for subtotal / grand totals).

dropna : bool, default True

Do not include columns whose entries are all NaN.

margins_name : str, default 'All'

Name of the row / column that will contain the totals when margins is True.

observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

sort : bool, default True

Specifies if the result should be sorted.

Returns:
DataFrame

Cross tabulation of the data.

Parameters:
  • data (DataFrame)

  • margins (bool)

  • dropna (bool)

  • margins_name (str)

  • observed (bool)

  • sort (bool)

Return type:

DataFrame
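A hedged sketch aggregating a value column per group (the DataFrame and its columns are made up):

```python
# Illustrative sketch: mean and std of a value column per group.
# Assumes the `acro` package is installed; df and its columns are made up.
import acro
import pandas as pd

df = pd.DataFrame(
    {
        "grant_type": ["A", "B"] * 50,
        "inc_grants": [100.0 + i for i in range(100)],
    }
)

session = acro.ACRO(suppress=True)
table = session.pivot_table(
    df, values="inc_grants", index="grant_type", aggfunc=["mean", "std"]
)
print(table)
```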

probit(endog, exog, missing=None, check_rank=True)

Fits Probit model.

Parameters:
endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user.

missing : str | None

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

check_rank : bool

Check exog rank to determine model degrees of freedom. Default is True. Setting to False reduces model initialization time when exog.shape[1] is large.

Returns:
BinaryResultsWrapper

Results.

Parameters:
  • missing (str | None)

  • check_rank (bool)

Return type:

BinaryResultsWrapper

probitr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Probit model from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy:patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.

surv_func(time, status, output, entry=None, title=None, freq_weights=None, exog=None, bw_factor=1.0, filename='kaplan-meier.png')

Estimate the survival function.

Parameters:
time : array_like

An array of times (censoring times or event times).

status : array_like

Status at the event time: status==1 is the 'event' (e.g. death, failure), meaning that the event occurs at the given value in time; status==0 indicates that censoring has occurred, meaning that the event occurs after the given value in time.

output : str

A string determining the type of output. Available options are 'table' and 'plot'.

entry : array_like, optional

An array of entry times for handling left truncation (the subject is not in the risk set on or before the entry time).

title : str

Optional title used for plots and summary output.

freq_weights : array_like

Optional frequency weights.

exog : array_like

Optional, if present used to account for violation of independent censoring.

bw_factor : float

Band-width multiplier for kernel-based estimation. Only used if exog is provided.

filename : str

The name of the file where the plot will be saved. Only used if the output is a plot.

Returns:
DataFrame

The survival table.

Return type:

DataFrame
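A hedged sketch of a Kaplan-Meier estimate as a table (df and its time/status columns are placeholders):

```python
# Illustrative sketch: Kaplan-Meier survival estimate as a table.
# Assumes the `acro` package is installed; df, "futime", and "death"
# are placeholders.
import acro

session = acro.ACRO()
table = session.surv_func(df.futime, df.death, output="table")
print(table)
```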

survival_plot(survival_table, survival_func, filename, status, sdc, command, summary)

Create the survival plot, applying suppression if enabled.

survival_table(survival_table, safe_table, status, sdc, command, summary, outcome)

Create the survival table, applying suppression if enabled.

Record Management#

Record Classes#

class acro.record.Records[source]

Bases: object

Stores data related to a collection of output records.

Methods

add(status, output_type, properties, sdc, ...)

Add an output to the results.

add_comments(output, comment)

Add a comment to an output.

add_custom(filename[, comment])

Add an unsupported output to the results dictionary.

add_exception(output, reason)

Add an exception request to an output.

finalise(path, ext)

Create a results file for checking.

finalise_excel(path)

Write outputs to an Excel spreadsheet.

finalise_json(path)

Write outputs to a JSON file.

get(key)

Return a specified output from the results.

get_index(index)

Return the output at the specified position.

get_keys()

Return the list of available output keys.

print()

Print the current results.

remove(key)

Remove an output from the results.

rename(old, new)

Rename an output.

validate_outputs()

Prompt researcher to complete any required fields.

write_checksums(path)

Write checksums for each file to checksums folder.

__init__()[source]

Construct a new object for storing multiple records.

Return type:

None

add(status, output_type, properties, sdc, command, summary, outcome, output, comments=None)[source]

Add an output to the results.

Parameters:
status : str

SDC status: {“pass”, “fail”, “review”}

output_type : str

Type of output, e.g., “regression”

properties : dict

Dictionary containing structured output data.

sdc : dict

Dictionary containing SDC results.

command : str

String representation of the operation performed.

summary : str

String summarising the ACRO checks.

outcome : DataFrame

DataFrame describing the details of ACRO checks.

output : list[str] | list[DataFrame]

List of output DataFrames.

comments : list[str] | None, default None

List of strings entered by the user to add comments to the output.

Parameters:
  • status (str)

  • output_type (str)

  • properties (dict)

  • sdc (dict)

  • command (str)

  • summary (str)

  • outcome (DataFrame)

  • output (list[str] | list[DataFrame])

  • comments (list[str] | None)

Return type:

None

remove(key)[source]

Remove an output from the results.

Parameters:
key : str

Key specifying which output to remove, e.g., ‘output_0’.

Parameters:

key (str)

Return type:

None

get(key)[source]

Return a specified output from the results.

Parameters:
key : str

Key specifying which output to return, e.g., ‘output_0’.

Returns:
Record

The requested output.

Parameters:

key (str)

Return type:

Record

get_keys()[source]

Return the list of available output keys.

Returns:
list[str]

List of output names.

Return type:

list[str]

get_index(index)[source]

Return the output at the specified position.

Parameters:
index : int

Position of the output to return.

Returns:
Record

The requested output.

Parameters:

index (int)

Return type:

Record

add_custom(filename, comment=None)[source]

Add an unsupported output to the results dictionary.

Parameters:
filename : str

The name of the file that will be added to the list of the outputs.

comment : str | None, default None

An optional comment.

Parameters:
  • filename (str)

  • comment (str | None)

Return type:

None

rename(old, new)[source]

Rename an output.

Parameters:
old : str

The old name of the output.

new : str

The new name of the output.

Parameters:
  • old (str)

  • new (str)

Return type:

None

add_comments(output, comment)[source]

Add a comment to an output.

Parameters:
output : str

The name of the output.

comment : str

The comment.

Parameters:
  • output (str)

  • comment (str)

Return type:

None

add_exception(output, reason)[source]

Add an exception request to an output.

Parameters:
output : str

The name of the output.

reason : str

The reason the output should be released.

Parameters:
  • output (str)

  • reason (str)

Return type:

None

print()[source]

Print the current results.

Returns:
str

String representation of all outputs.

Return type:

str

validate_outputs()[source]

Prompt researcher to complete any required fields.

Return type:

None

finalise(path, ext)[source]

Create a results file for checking.

Parameters:
path : str

Name of a folder to save outputs.

ext : str

Extension of the results file. Valid extensions: {json, xlsx}.

Parameters:
  • path (str)

  • ext (str)

Return type:

None

finalise_json(path)[source]

Write outputs to a JSON file.

Parameters:
path : str

Name of a folder to save outputs.

Parameters:

path (str)

Return type:

None

finalise_excel(path)[source]

Write outputs to an Excel spreadsheet.

Parameters:
path : str

Name of a folder to save outputs.

Parameters:

path (str)

Return type:

None

write_checksums(path)[source]

Write checksums for each file to checksums folder.

Parameters:
path : str

Name of a folder to save outputs.

Parameters:

path (str)

Return type:

None

Record Module#

Records#

ACRO: Output storage and serialization.

acro.record.load_outcome(outcome)[source]

Return a DataFrame from an outcome dictionary.

Parameters:
outcome : dict

The outcome to load as a DataFrame.

Parameters:

outcome (dict)

Return type:

DataFrame

acro.record.load_output(path, output)[source]

Return a loaded output.

Parameters:
path : str

The path to the output folder (with results.json).

output : list[str]

The output to load.

Returns:
list[str] | list[DataFrame]

The loaded output field.

Parameters:
  • path (str)

  • output (list[str])

Return type:

list[str] | list[DataFrame]

class acro.record.Record(uid, status, output_type, properties, sdc, command, summary, outcome, output, comments=None)[source]

Stores data related to a single output record.

Attributes:
uid : str

Unique identifier.

status : str

SDC status: {“pass”, “fail”, “review”}

output_type : str

Type of output, e.g., “regression”

properties : dict

Dictionary containing structured output data.

sdc : dict

Dictionary containing SDC results.

command : str

String representation of the operation performed.

summary : str

String summarising the ACRO checks.

outcome : DataFrame

DataFrame describing the details of ACRO checks.

output : Any

List of output DataFrames.

comments : list[str]

List of strings entered by the user to add comments to the output.

exception : str

Description of why an exception to fail/review should be granted.

timestamp : str

Time the record was created in ISO format.

Parameters:
  • uid (str)

  • status (str)

  • output_type (str)

  • properties (dict)

  • sdc (dict)

  • command (str)

  • summary (str)

  • outcome (DataFrame)

  • output (list[str] | list[DataFrame])

  • comments (list[str] | None)

Methods

serialize_output([path])

Serialize outputs.

__init__(uid, status, output_type, properties, sdc, command, summary, outcome, output, comments=None)[source]

Construct a new output record.

Parameters:
uid : str

Unique identifier.

status : str

SDC status: {“pass”, “fail”, “review”}

output_type : str

Type of output, e.g., “regression”

properties : dict

Dictionary containing structured output data.

sdc : dict

Dictionary containing SDC results.

command : str

String representation of the operation performed.

summary : str

String summarising the ACRO checks.

outcome : DataFrame

DataFrame describing the details of ACRO checks.

output : list[str] | list[DataFrame]

List of output DataFrames.

comments : list[str] | None, default None

List of strings entered by the user to add comments to the output.

Parameters:
  • uid (str)

  • status (str)

  • output_type (str)

  • properties (dict)

  • sdc (dict)

  • command (str)

  • summary (str)

  • outcome (DataFrame)

  • output (list[str] | list[DataFrame])

  • comments (list[str] | None)

Return type:

None

serialize_output(path='outputs')[source]

Serialize outputs.

Parameters:
path : str, default 'outputs'

Name of the folder to which outputs will be written.

Returns:
list[str]

List of filepaths of the written outputs.

Parameters:

path (str)

Return type:

list[str]

acro.record.load_records(path)[source]

Load outputs from a JSON file.

Parameters:
path : str

Name of an output folder containing results.json.

Returns:
Records

The loaded records.

Parameters:

path (str)

Return type:

Records
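On the checker's side, a finalised folder can be reloaded and inspected. A hedged sketch (the folder name is illustrative):

```python
# Illustrative sketch: checker-side review of a finalised results folder.
# Assumes the `acro` package is installed and "outputs" contains a
# results.json produced by finalise().
from acro.record import load_records

records = load_records("outputs")
for key in records.get_keys():
    record = records.get(key)
    print(key, record.status, record.summary)
```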

Utilities#

Helper Functions#

ACRO: Utility Functions.

acro.utils.get_command(default, stack_list)[source]

Return the calling source line as a string.

Parameters:
default : str

Default string to return if unable to extract the stack.

stack_list : list[tuple]

A list of frame records for the caller’s stack. The first entry in the returned list represents the caller; the last entry represents the outermost call on the stack.

Returns:
str

The calling source line.

Parameters:
  • default (str)

  • stack_list (list[FrameInfo])

Return type:

str

acro.utils.prettify_table_string(table, separator=None)[source]

Add delimiters to table.to_string() to improve readability for onscreen display.

Splits fields on whitespace unless an optional separator is provided, e.g. ',' for CSV.

Parameters:
  • table (DataFrame)

  • separator (str | None)

Return type:

str

Function Reference by Category#

Output Management#

  • finalise() - Prepare outputs for review

  • remove_output() - Remove specific output

  • print_outputs() - Display current outputs

  • custom_output() - Add custom output

  • rename_output() - Rename an output

  • add_comments() - Add comments to output

  • add_exception() - Add exception request

Method Parameters#

Common Parameters#

Many ACRO methods share common parameters:

suppress : bool

Whether to suppress potentially disclosive outputs.

show_suppressed : bool

Whether to display suppressed values in output.

safe_threshold : int

Minimum cell count threshold for safety.

safe_dof_threshold : int

Minimum degrees of freedom for statistical models.

safe_nk_n : int

Minimum number of observations for the nk-dominance rule.

safe_nk_k : float

Threshold for the nk-dominance rule (0-1).

safe_p_threshold : float

P-value threshold for statistical significance.

Return Types#

Output Objects#

Most ACRO methods return the disclosure-controlled result directly (for example, crosstab() returns a DataFrame and ols() returns a results wrapper). In addition, each analysis is stored in the session as a record containing:

  • Status: the result of the SDC checks ({"pass", "fail", "review"})

  • Command: a string representation of the operation performed

  • Outcome: details of the applied disclosure checks

  • Output: the safe (possibly suppressed) result

# The safe table is returned directly
result = acro.crosstab(df.col1, df.col2)

# The stored records hold the status, checks, and metadata
acro.print_outputs()

Typical Workflow#

ACRO functions return results that are automatically checked for disclosure risks:

import acro

# Initialize ACRO
session = acro.ACRO(suppress=True)

# Results are automatically checked
result = session.crosstab(df.col1, df.col2)

# View outputs
session.print_outputs()

# Finalize for review
session.finalise("outputs/")

Version Information#

import acro
from acro.version import __version__
print(__version__)

Compatibility#

Python Version Support#

ACRO supports Python 3.9 and later versions.

Dependency Requirements#

  • pandas (>= 1.5.0): data manipulation and analysis

  • numpy (>= 1.21.0): numerical computing

  • statsmodels (>= 0.13.0): statistical modeling

  • openpyxl (>= 3.0.0): Excel file support

  • pyyaml (>= 5.4.0): configuration file handling

Configuration#

ACRO uses YAML configuration files to set safety parameters:

# Initialize with default config
session = acro.ACRO(config="default", suppress=True)

# Configuration is loaded from default.yaml
print(session.config)

Custom Configuration#

Create custom YAML files for different environments:

# custom.yaml
safe_threshold: 10
safe_nk_n: 2
safe_nk_k: 0.9
check_missing_values: true
zeros_are_disclosive: false
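The custom file can then be loaded by name. A hedged sketch (it is an assumption here that "custom" resolves to the custom.yaml file via ACRO's configuration lookup):

```python
# Illustrative sketch: loading the custom configuration above.
# Assumes the `acro` package is installed and that "custom" resolves
# to custom.yaml (the lookup rules are an assumption).
import acro

session = acro.ACRO(config="custom", suppress=True)
print(session.config)  # dict of safe parameters, e.g. safe_threshold
```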

See Also#