Quick Start

This guide will get you up and running with ACRO in just a few minutes.

Installation

pip install acro
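
To confirm that the install worked, a quick check from Python can help. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example and is not part of ACRO.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the Python path."""
    return importlib.util.find_spec(package) is not None


if is_installed("acro"):
    print("acro is available")
else:
    print("acro not found; run: pip install acro")
```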

Core ACRO Class

class acro.ACRO(config='default', suppress=False)

ACRO: Automatic Checking of Research Outputs.

Attributes:
config : dict

Safe parameters and their values.

results : Records

The current outputs including the results of checks.

suppress : bool

Whether to automatically apply suppression.

Parameters:
  • config (str)

  • suppress (bool)

Methods

add_comments(output, comment)

Add a comment to an output.

add_exception(output, reason)

Add an exception request to an output.

crosstab(index, columns[, values, rownames, ...])

Compute a simple cross tabulation of two (or more) factors.

custom_output(filename[, comment])

Add an unsupported output to the results dictionary.

finalise([path, ext])

Create a results file for checking.

hist(data, column[, by_val, grid, ...])

Create a histogram from a single column.

logit(endog, exog[, missing, check_rank])

Fits Logit model.

logitr(formula, data[, subset, drop_cols])

Fits Logit model from a formula and dataframe.

ols(endog[, exog, missing, hasconst])

Fits Ordinary Least Squares Regression.

olsr(formula, data[, subset, drop_cols])

Fits Ordinary Least Squares Regression from a formula and dataframe.

pivot_table(data[, values, index, columns, ...])

Create a spreadsheet-style pivot table as a DataFrame.

print_outputs()

Print the current results dictionary.

probit(endog, exog[, missing, check_rank])

Fits Probit model.

probitr(formula, data[, subset, drop_cols])

Fits Probit model from a formula and dataframe.

remove_output(key)

Remove an output from the results.

rename_output(old, new)

Rename an output.

surv_func(time, status, output[, entry, ...])

Estimate the survival function.

survival_plot(survival_table, survival_func, ...)

Create the survival plot according to the suppression status.

survival_table(survival_table, safe_table, ...)

Create the survival table according to the suppression status.

Examples

>>> from acro import ACRO
>>> acro = ACRO()
>>> # y (endog) and x (exog) are assumed to be defined already
>>> results = acro.ols(y, x)
>>> results.summary()
>>> acro.finalise("MYFOLDER", "json")

__init__(config='default', suppress=False)

Construct a new ACRO object and read parameters from config.

Parameters:
config : str

Name of a YAML configuration file with safe parameters.

suppress : bool, default False

Whether to automatically apply suppression.

Return type:

None

finalise(path='outputs', ext='json')

Create a results file for checking.

Parameters:
path : str

Name of a folder to save outputs.

ext : str

Extension of the results file. Valid extensions: {json, xlsx}.

Returns:
Records

Object storing the outputs.

Return type:

Records | None

remove_output(key)

Remove an output from the results.

Parameters:
key : str

Key specifying which output to remove, e.g., ‘output_0’.

Return type:

None

print_outputs()

Print the current results dictionary.

Returns:
str

String representation of all outputs.

Return type:

str

custom_output(filename, comment='')

Add an unsupported output to the results dictionary.

Parameters:
filename : str

The name of the file that will be added to the list of outputs.

comment : str

An optional comment.

Return type:

None

rename_output(old, new)

Rename an output.

Parameters:
old : str

The old name of the output.

new : str

The new name of the output.

Return type:

None

add_comments(output, comment)

Add a comment to an output.

Parameters:
output : str

The name of the output.

comment : str

The comment.

Return type:

None

add_exception(output, reason)

Add an exception request to an output.

Parameters:
output : str

The name of the output.

reason : str

The reason for the exception request.

Return type:

None
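
The output-management methods above are typically used together just before finalising. The following is a minimal sketch, not a definitive recipe: it assumes `session` is an `acro.ACRO` instance that already holds outputs under the keys 'output_0' and 'output_1', and the helper name `tidy_outputs`, the comment text, and the filename are all hypothetical.

```python
def tidy_outputs(session):
    """Annotate, rename, and prune outputs before finalising.

    `session` is expected to be an acro.ACRO instance that already holds
    outputs named 'output_0' and 'output_1' (assumed keys).
    """
    # Tell the output checker what the first output shows.
    session.add_comments("output_0", "Grant type by year, all charities.")
    # Request an exception for this output, giving the reason.
    session.add_exception("output_0", "Small cells are structural zeros.")
    # Replace the default key with a more meaningful name.
    session.rename_output("output_0", "grants_by_year")
    # Drop an exploratory output that should not be released.
    session.remove_output("output_1")
    # Register a file produced outside ACRO (hypothetical filename,
    # assumed to exist on disk already).
    session.custom_output("figure_1.png", "Plot created outside ACRO.")
```

Comments and exceptions are added before the rename here, so every call refers to a key that still exists at that point.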

Essential Methods

Data Analysis

ACRO.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False, show_suppressed=False)

Compute a simple cross tabulation of two (or more) factors.

By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.

To provide consistent behaviour with different aggregation functions, ‘empty’ rows or columns, i.e. those that are all NaN or 0 (count, sum), are removed.

Parameters:
index : array-like, Series, or list of arrays/Series

Values to group by in the rows.

columns : array-like, Series, or list of arrays/Series

Values to group by in the columns.

values : array-like, optional

Array of values to aggregate according to the factors. Requires aggfunc to be specified.

rownames : sequence, default None

If passed, must match number of row arrays passed.

colnames : sequence, default None

If passed, must match number of column arrays passed.

aggfunc : str, optional

If specified, requires values to be specified as well.

margins : bool, default False

Add row/column margins (subtotals).

margins_name : str, default ‘All’

Name of the row/column that will contain the totals when margins is True.

dropna : bool, default True

Do not include columns whose entries are all NaN.

normalize : bool, {‘all’, ‘index’, ‘columns’}, or {0, 1}, default False

Normalize by dividing all values by the sum of values.

  • If passed ‘all’ or True, will normalize over all values.

  • If passed ‘index’ will normalize over each row.

  • If passed ‘columns’ will normalize over each column.

  • If margins is True, will also normalize margin values.

show_suppressed : bool, default False

Controls how the totals are calculated when suppression is applied.

Returns:
DataFrame

Cross tabulation of the data.

Return type:

DataFrame
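
As an illustration of passing `values` together with `aggfunc`, here is a minimal sketch. It assumes `session` is an `acro.ACRO` instance and `df` is a pandas DataFrame; the helper name `mean_grant_table` and the column names `year`, `grant_type`, and `inc_grants` are hypothetical.

```python
def mean_grant_table(session, df):
    """Cross-tabulate mean grant income by year and grant type.

    `session` is expected to be an acro.ACRO instance and `df` a pandas
    DataFrame; the column names are hypothetical.
    """
    return session.crosstab(
        df["year"],               # values to group by in the rows
        df["grant_type"],         # values to group by in the columns
        values=df["inc_grants"],  # values to aggregate
        aggfunc="mean",           # required whenever `values` is given
        margins=True,             # add row/column margins
    )
```

Leaving out both `values` and `aggfunc` gives the default frequency table instead.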

ACRO.olsr(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Fits Ordinary Least Squares Regression from a formula and dataframe.

Parameters:
formula : str or generic Formula object

The formula specifying the model.

data : array_like

The data for the model. See Notes.

subset : array_like

An array-like object of booleans, integers, or index values that indicate the subset of data to use in the model. Assumes data is a pandas.DataFrame.

drop_cols : array_like

Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.

*args

Additional positional arguments that are passed to the model.

**kwargs

These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment, set eval_env=-1.

Returns:
RegressionResultsWrapper

Results.

Return type:

RegressionResultsWrapper

Notes

data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as in statsmodels.
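
A formula-based fit might look like the sketch below. It assumes `session` is an `acro.ACRO` instance and `df` is a pandas DataFrame; the helper name `fit_income_model` and the column names in the formula are hypothetical.

```python
def fit_income_model(session, df):
    """Fit an OLS model from a formula and return the results object.

    `session` is expected to be an acro.ACRO instance; the column names
    in the formula are hypothetical and must exist in `df`.
    """
    results = session.olsr(
        formula="inc_activity ~ inc_grants + inc_donations",
        data=df,
    )
    return results
```

The returned object can then be inspected with `results.summary()`, as in the class-level example.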

Output Management

ACRO.finalise(path='outputs', ext='json')

Create a results file for checking.

Parameters:
path : str

Name of a folder to save outputs.

ext : str

Extension of the results file. Valid extensions: {json, xlsx}.

Returns:
Records

Object storing the outputs.

Return type:

Records | None

ACRO.print_outputs()

Print the current results dictionary.

Returns:
str

String representation of all outputs.

Return type:

str

Quick Workflow

  1. Install: pip install acro

  2. Initialize: Create an ACRO session with suppress=True

  3. Analyze: Use ACRO methods for statistical analysis

  4. Review: Check outputs with print_outputs()

  5. Finalize: Export results with finalise()
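
Steps 2 to 5 above can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed pattern: the helper names `new_session` and `quick_workflow` and the column names `year` and `grant_type` are hypothetical, and `df` is assumed to be a pandas DataFrame.

```python
def new_session():
    """Step 2: create an ACRO session with automatic suppression enabled."""
    from acro import ACRO  # imported here so the sketch loads without acro

    return ACRO(suppress=True)


def quick_workflow(session, df, out_dir="outputs"):
    """Steps 3 to 5: analyse, review, and export.

    `session` is expected to be an acro.ACRO instance; the column names
    are hypothetical.
    """
    # Step 3: run the analysis through the session so it is checked.
    table = session.crosstab(df["year"], df["grant_type"])
    # Step 4: review what has been recorded so far.
    session.print_outputs()
    # Step 5: write the results file for the output checker.
    session.finalise(out_dir, "json")
    return table
```

With a DataFrame `df` in hand, a call such as `quick_workflow(new_session(), df)` runs the whole sequence and writes to the default `outputs` folder.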

Next Steps