Quick Start
This guide will get you up and running with ACRO in just a few minutes.
Installation
pip install acro
Core ACRO Class
- class acro.ACRO(config='default', suppress=False)
ACRO: Automatic Checking of Research Outputs.
- Attributes:
- config : dict
Safe parameters and their values.
- results : Records
The current outputs including the results of checks.
- suppress : bool
Whether to automatically apply suppression.
- Parameters:
config (str)
suppress (bool)
Methods
- add_comments(output, comment): Add a comment to an output.
- add_exception(output, reason): Add an exception request to an output.
- crosstab(index, columns[, values, rownames, ...]): Compute a simple cross tabulation of two (or more) factors.
- custom_output(filename[, comment]): Add an unsupported output to the results dictionary.
- finalise([path, ext]): Create a results file for checking.
- hist(data, column[, by_val, grid, ...]): Create a histogram from a single column.
- logit(endog, exog[, missing, check_rank]): Fits Logit model.
- logitr(formula, data[, subset, drop_cols]): Fits Logit model from a formula and dataframe.
- ols(endog[, exog, missing, hasconst]): Fits Ordinary Least Squares Regression.
- olsr(formula, data[, subset, drop_cols]): Fits Ordinary Least Squares Regression from a formula and dataframe.
- pivot_table(data[, values, index, columns, ...]): Create a spreadsheet-style pivot table as a DataFrame.
- print_outputs(): Print the current results dictionary.
- probit(endog, exog[, missing, check_rank]): Fits Probit model.
- probitr(formula, data[, subset, drop_cols]): Fits Probit model from a formula and dataframe.
- remove_output(key): Remove an output from the results.
- rename_output(old, new): Rename an output.
- surv_func(time, status, output[, entry, ...]): Estimate the survival function.
- survival_plot(survival_table, survival_func, ...): Create the survival plot according to the status of suppressing.
- survival_table(survival_table, safe_table, ...): Create the survival table according to the status of suppressing.
Examples
>>> acro = ACRO()
>>> results = acro.ols(y, x)
>>> results.summary()
>>> acro.finalise("MYFOLDER", "json")
- __init__(config='default', suppress=False)
Construct a new ACRO object and read parameters from config.
- Parameters:
- config : str
Name of a yaml configuration file with safe parameters.
- suppress : bool, default False
Whether to automatically apply suppression.
- Return type:
None
- finalise(path='outputs', ext='json')
Create a results file for checking.
- Parameters:
- path : str
Name of a folder to save outputs.
- ext : str
Extension of the results file. Valid extensions: {json, xlsx}.
- Returns:
- Records
Object storing the outputs.
- Return type:
Records | None
- remove_output(key)
Remove an output from the results.
- Parameters:
- key : str
Key specifying which output to remove, e.g., 'output_0'.
- Return type:
None
- print_outputs()
Print the current results dictionary.
- Returns:
- str
String representation of all outputs.
- Return type:
str
- custom_output(filename, comment='')
Add an unsupported output to the results dictionary.
- Parameters:
- filename : str
The name of the file that will be added to the list of outputs.
- comment : str
An optional comment.
- Return type:
None
- rename_output(old, new)
Rename an output.
- Parameters:
- old : str
The old name of the output.
- new : str
The new name of the output.
- Return type:
None
- add_comments(output, comment)
Add a comment to an output.
- Parameters:
- output : str
The name of the output.
- comment : str
The comment.
- Return type:
None
- add_exception(output, reason)
Add an exception request to an output.
- Parameters:
- output : str
The name of the output.
- reason : str
The reason for the exception request.
- Return type:
None
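The output-management methods above are usually called together once an analysis has produced some outputs. The following is a minimal sketch, assuming `session` is an `acro.ACRO` instance that already holds an output under `old_name`. The helper name `annotate_output` and all example strings are hypothetical; only `rename_output`, `add_comments`, and `add_exception` as documented above are called.

```python
def annotate_output(session, old_name, new_name, comment, reason=None):
    """Rename an output and attach notes for the output checker.

    `session` is assumed to be an acro.ACRO instance (hypothetical helper;
    it only calls methods documented above).
    """
    # Give the output a meaningful name instead of the generated key.
    session.rename_output(old_name, new_name)
    # Attach a free-text comment to the renamed output.
    session.add_comments(new_name, comment)
    if reason is not None:
        # Request an exception only when the analyst supplies a reason.
        session.add_exception(new_name, reason)
    return new_name
```

A call might look like `annotate_output(acro, "output_0", "grants_by_year", "Counts of grants per year")`, where the output names are placeholders.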
Essential Methods
Data Analysis
- ACRO.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False, show_suppressed=False)
Compute a simple cross tabulation of two (or more) factors.
By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.
To provide consistent behaviour with different aggregation functions, 'empty' rows or columns, i.e. those that are all NaN or 0 (for count and sum), are removed.
- Parameters:
- index : array-like, Series, or list of arrays/Series
Values to group by in the rows.
- columns : array-like, Series, or list of arrays/Series
Values to group by in the columns.
- values : array-like, optional
Array of values to aggregate according to the factors. Requires aggfunc to be specified.
- rownames : sequence, default None
If passed, must match the number of row arrays passed.
- colnames : sequence, default None
If passed, must match the number of column arrays passed.
- aggfunc : str, optional
If specified, requires values to be specified as well.
- margins : bool, default False
Add row/column margins (subtotals).
- margins_name : str, default 'All'
Name of the row/column that will contain the totals when margins is True.
- dropna : bool, default True
Do not include columns whose entries are all NaN.
- normalize : bool, {'all', 'index', 'columns'}, or {0, 1}, default False
Normalize by dividing all values by the sum of values.
- If passed 'all' or True, will normalize over all values.
- If passed 'index', will normalize over each row.
- If passed 'columns', will normalize over each column.
- If margins is True, will also normalize margin values.
- show_suppressed : bool, default False
Controls how the totals are calculated when suppression is applied.
- Returns:
- DataFrame
Cross tabulation of the data.
- Return type:
DataFrame
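The documentation above states that `values` and `aggfunc` must be supplied together. A minimal sketch of a wrapper that enforces this before calling `crosstab` is shown here; it assumes `session` is an `acro.ACRO` instance and `data` is a pandas DataFrame (or any mapping of column name to values). The helper name `tabulate` and the column names in the usage note are hypothetical.

```python
def tabulate(session, data, row, col, value=None, aggfunc=None):
    """Cross-tabulate two columns through an ACRO session.

    Hypothetical helper: `session` is assumed to be an acro.ACRO instance,
    `data` a DataFrame-like object indexed by column name.
    """
    # crosstab() documents that values and aggfunc require each other.
    if (value is None) != (aggfunc is None):
        raise ValueError("values and aggfunc must be supplied together")
    if value is None:
        # Plain frequency table of the two factors.
        return session.crosstab(data[row], data[col])
    # Aggregate `value` within each (row, col) cell, e.g. aggfunc="mean".
    return session.crosstab(
        data[row], data[col], values=data[value], aggfunc=aggfunc
    )
```

For example, `tabulate(acro, df, "year", "grant_type")` would request a frequency table, and `tabulate(acro, df, "year", "grant_type", value="income", aggfunc="mean")` an aggregated one, with `df` and its columns standing in for real data.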
- ACRO.olsr(formula, data, subset=None, drop_cols=None, *args, **kwargs)
Fits Ordinary Least Squares Regression from a formula and dataframe.
- Parameters:
- formula : str or generic Formula object
The formula specifying the model.
- data : array_like
The data for the model. See Notes.
- subset : array_like
An array-like object of booleans, integers, or index values that indicate the subset of data to use in the model. Assumes data is a pandas.DataFrame.
- drop_cols : array_like
Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.
- *args
Additional positional arguments that are passed to the model.
- **kwargs
These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a "clean" environment, set eval_env=-1.
- Returns:
- RegressionResultsWrapper
Results.
- Return type:
RegressionResultsWrapper
Notes
data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation. Arguments are passed in the same order as statsmodels.
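As an illustration of the formula interface, the sketch below builds a formula string from column names and passes it to `olsr`. It assumes `session` is an `acro.ACRO` instance, that `data` satisfies the requirement in the Notes, and that the column names are valid Python identifiers so they can appear unquoted in a formula. The helper name `fit_ols_from_columns` and the column names in the usage note are hypothetical.

```python
def fit_ols_from_columns(session, data, response, predictors):
    """Fit an OLS model via ACRO from a response and predictor names.

    Hypothetical helper: `session` is assumed to be an acro.ACRO instance.
    Returns the formula that was used together with the fitted results.
    """
    if not predictors:
        raise ValueError("at least one predictor is required")
    # Build a formula of the form "y ~ x1 + x2".
    formula = f"{response} ~ {' + '.join(predictors)}"
    # olsr(formula, data) is the documented call; no extra kwargs are passed.
    results = session.olsr(formula, data)
    return formula, results
```

A call such as `fit_ols_from_columns(acro, df, "inc_activity", ["inc_grants", "inc_donations"])` would use the formula `inc_activity ~ inc_grants + inc_donations`; the returned results object can then be inspected with `results.summary()`, as in the class example.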
Output Management
- ACRO.finalise(path='outputs', ext='json')
Create a results file for checking.
- Parameters:
- path : str
Name of a folder to save outputs.
- ext : str
Extension of the results file. Valid extensions: {json, xlsx}.
- Returns:
- Records
Object storing the outputs.
- Return type:
Records | None
- ACRO.print_outputs()
Print the current results dictionary.
- Returns:
- str
String representation of all outputs.
- Return type:
str
Quick Workflow
- Install: pip install acro
- Initialize: Create an ACRO session with suppress=True
- Analyze: Use ACRO methods for statistical analysis
- Review: Check outputs with print_outputs()
- Finalize: Export results with finalise()
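The analyze, review, and finalize steps can be sketched as a single function. This is an illustrative outline rather than a prescribed pattern: it assumes `session` is an `acro.ACRO` instance created with `suppress=True`, and it uses `crosstab` as a stand-in for whichever analysis method is needed. The helper name `quick_workflow` and the `VALID_EXTENSIONS` constant are hypothetical; the extension check mirrors the valid values listed for `finalise()`.

```python
VALID_EXTENSIONS = {"json", "xlsx"}  # as listed in the finalise() documentation


def quick_workflow(session, index, columns, folder="outputs", ext="json"):
    """Run one analysis, review the outputs, and export them.

    Hypothetical helper: `session` is assumed to be an acro.ACRO instance.
    """
    if ext not in VALID_EXTENSIONS:
        raise ValueError(f"ext must be one of {sorted(VALID_EXTENSIONS)}")
    # Analyze: a disclosure-checked cross tabulation.
    table = session.crosstab(index, columns)
    # Review: inspect the recorded outputs and their check results.
    summary = session.print_outputs()
    # Finalize: write the results file for the output checker.
    session.finalise(folder, ext)
    return table, summary
```

With the library installed, this could be invoked as `quick_workflow(ACRO(suppress=True), df["year"], df["grant_type"])`, where `df` and its column names are placeholders for real data.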
Next Steps
Configuration - Learn configuration options
API Reference - Explore the full API reference
Notebook Examples - Interactive Jupyter notebook examples