GitHub - emjun/tisane: Specification language for generating Generalized Linear Models (with or without mixed effects) from conceptual models (original) (raw)

tisane

Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

TL;DR: Analysts can use Tisane to author generalized linear models with or without mixed effects. Tisane infers statistical models from variable relationships (from domain knowledge) that analysts specify. By doing so, Tisane helps analysts avoid common threats to external and statistical conclusion validity. Analysts do not need to be statistical experts!

Jump to see a tutorial here or see some examples here. Below, we provide an overview of the API and language primitives.


Tisane provides (i) a graph specification language for expressing relationships between variables and (ii) an interactive query and compilation process for inferring a valid statistical model from a set of variables in the graph.

Graph specification language

Variables

There are three types of variables: (i) Units, (ii) Measures, and (iii) SetUp, or environmental, variables.

# There are 386 adults participating in a study on weight loss.
adult = ts.Unit("member", cardinality=386)
# Adults have motivation levels.
motivation_level = adult.ordinal("motivation", order=[1, 2, 3, 4, 5, 6])
# Adults have pounds lost. 
pounds_lost = adult.numeric("pounds_lost")
# Adults have one of four racial identities in this study. 
race = adult.nominal("race group", cardinality=4)
# Researchers collected 12 weeks of data in this study. 
week = ts.SetUp("Week", order=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

Design rationale: We derived this type system from how other software tools focused on study design separate their concerns.

Relationships between variables

Analysts can use Tisane to express (i) conceptual and (ii) data measurement relationships between variables.

There are three different types of conceptual relationships.

These relationships are used to construct an internal graph representation of variables and their relationships with one another.

Internally, Tisane constructs a graph representing these relationships. Graph representation is useufl for inferring statistical models (next section).

For example, the below graph represents the above relationships. Rectangular nodes are units. Elliptical nodes are measures and set-up variables. The colored node is the dependent variable in the query.The dotted edges connect units to their measures. The solid edges represent conceptual relationships, as labeled.A graph representation created using DOT

A graph representation created using TikZ

Interactive query and compilation

Analysts query the relationships they have specified (technically, the internal graph represenation) for a statistical model. For each query, analysts must specify (i) a dependent variable to explain using (ii) a set of independent variables.

design = ts.Design(dv=pounds_lost, ivs=[treatment_approach, motivation_level]).assign_data(df)
ts.infer_statistical_model_from_design(design=design)

Query validation: To be a valid query, Tisane verifies that the dependent variable does not cause an independent variable. It would be conceptually incorrect to explain a cause from an effect.

Interaction model

A key aspect of Tisane that distinguishes it from other systems, such as Tea, is the importance of user interaction in guiding the statistical model that is inferred as output and ultimately fit.

Tisane generates a space of candidate statistical models and asks analysts disambiguation questions for (i) including additional main or interaction effects and, if applicable, correlating (or uncorrelating) random slopes and random intercepts as well as (ii) selecting among viable family/link function pairs.

To help analysts, Tisane provides text explanations and visualizations. For example, to show possible family functions, Tisane simulates data to fit a family function and visualizes it on top of a histogram of the analyst's data and explains to the how to use the visualization to compare family functions.

Statistical model inference

After validating a query, Tisane traverses the internal graph representation in order to generate candidate generalized linear models with or without mixed effects. A generalized linear model consists of a model effects structure and a family/link function pair.

Query

Analysts query the relationships they have specified (technically, the internal graph represenation) for a statistical model. For each query, analysts must specify (i) a dependent variable to explain using (ii) a set of independent variables.

Query validation: To be a valid query, Tisane verifies that the dependent variable does not cause an independent variable. It would be conceptually incorrect to explain a cause from an effect.

Statistical model inference

After validating a query, Tisane traverses the internal graph representation in order to generate candidate generalized linear models with or without mixed effects. A generalized linear model consists of a model effects structure and a family/link function pair.

Model effects structure

Tisane generates candidate main effects, interaction effects, and, if applicable, random effects based on analysts' expressed relationships.

INFERENCE.md explains all inference rules in greater detail.

Family and link functions depend on the data types of dependent variables and their distributions.

Based on the data type of the dependent variable, Tisane suggests matched pairs of possible family and link functions to consider. Tisane ensures that analysts consider only valid pairs of family and link functions.


Limitations


Benefits

Tisane helps analysts avoid common threats to statistical conclusion and external validity.

Specifically, Tisane helps analysts

These are four of the 37 threats to validity Shadish, Cook, and Campbell outline across internal, external, statistical conclusion, and construct validity [1].


Examples

Check out examples here!

References

[1] Cook, T. D., Campbell, D. T., & Shadish, W. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.