Allow users to specify arbitrary branch & clade labels by jameshadfield · Pull Request #728 · nextstrain/augur (original) (raw)

jameshadfield

jameshadfield

huddlej

jameshadfield

jameshadfield added a commit to nextstrain/ncov that referenced this pull request

Jun 10, 2021

@jameshadfield

This commit is a WIP commit to test the new functionality being introduced in augur PR 728 [1]. This allows us to simplify the nCoV workflow as we can explicitly define the attribute names used for clade membership and branch labelling.

These changes have only been tested for the "open" build, which itself is a WIP.

[1] nextstrain/augur#728

huddlej

trvrb

@jameshadfield

Our current implementation of read_node_data requires that every node in the tree is specified in the (merged) node_data files. For mutations this is overkill -- many nodes don't have mutations and it's overkill to require node_data JSONs to specify things like "node_name": {"muts": []}.

This may well be the general behaviour we want, but i didn't want to modify the read_node_data function which sees extensive use.

A welcome side effect of these changes is that we no longer have to supply both nuc and aa_muts.

@jameshadfield

See comments in tests/functional/clades.t

Also adds / updates comments and docstrings which were noticed as I worked through the code relating to these tests.

@jameshadfield

Workflows may be using this so I elected to hide it rather than remove it (and warn people it's a no-op if they do happen to be using it)

@jameshadfield

This function had a few subtle bugs in it which are fixed here, as well as improving the warning message to explain how this may affect clade inference.

Note that the presence of sequences on nodes other than the root is not considered by augur clades.

@jameshadfield

We could check all of these up-front instead of exiting upon the first error, and such a check should be part of validation within augur clades, but this commit is a simple solution to fix a reported bug.

Closes #965

@jameshadfield

@jameshadfield

A fatal error is raised if no clades are defined, but if a clade is not found on the tree it's only a warning. Suggested in #735

@jameshadfield

Multiple mutations at the same position on a single branch are now a fatal error. Previous behaviour was to overwrite such mutations when parsing. Suggested by #735.

@jameshadfield

Multiple improvements to augur clades

corneliusroemer added a commit that referenced this pull request

May 15, 2023

@corneliusroemer

In PR #728, extra node data validation was introduced. In particular, files without information for either nodes or branches caused erroring.

This is problematic for test scripts that may produce empty node data in test cases.

This PR removes the eager validation. In the future we could reintroduce it as a warning. And possibly an error but with opt-out.

This was referenced

May 15, 2023

corneliusroemer added a commit that referenced this pull request

May 15, 2023

@corneliusroemer

Resolves #1215

Warn instead error when no nodes in a node data json, fixing issue introduced recently in PR #728

In PR #728, extra node data validation was introduced. In particular, files without information for either nodes or branches caused erroring.

This is problematic for test scripts that may produce empty node data in test cases.

This PR removes the eager validation. In the future we could reintroduce it as a warning. And possibly an error but with opt-out.

This type of node data json was previously errored on by augur export, it is now accepted again:

{
  "nodes": {},
  "rbd_level_details": {}
}

Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment)

What steps should be taken to test the changes you've proposed? If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

jameshadfield added a commit to nextstrain/ncov that referenced this pull request

May 16, 2023

@jameshadfield

This updates the workflow to use the new clades interface from augur v22 (see nextstrain/augur#728). In the process we can remove two rules from the workflow.

If this workflow is run with augur prior to v22, the emerging_lineages rule will error due to unknown arguments.

The script add_branch_labels.py is no longer used, but not removed here, as it contains logic to export spike mutations as branch labels which may be useful at some point. If we do use this, it would be better to produce an intermediate node-data JSON with a custom branch label to avoid modifying the auspice JSON after export.

jameshadfield added a commit to nextstrain/ncov that referenced this pull request

May 16, 2023

@jameshadfield

This updates the workflow to use the new clades interface from augur v22 (see nextstrain/augur#728). In the process we can remove two rules from the workflow.

If this workflow is run with augur prior to v22, the emerging_lineages rule will error due to unknown arguments.

The script add_branch_labels.py is no longer used, but not removed here, as it contains logic to export spike mutations as branch labels which may be useful at some point. If we do use this, it would be better to produce an intermediate node-data JSON with a custom branch label to avoid modifying the auspice JSON after export.

jameshadfield added a commit that referenced this pull request

May 16, 2023

@jameshadfield

The intention of the coloring logic is that if an auspice-config provides the clade_membership key then it is exported at that position in the colorings list. If clade_membership is not explicitly set in the config (but is present in a node-data file) then we have (for a very long time) added it as the very first entry in the colorings list.

PR #728 (augur v22.0.0) erroneously modified the behavior of the second case described above, which has now been restored by this commit.

jameshadfield added a commit that referenced this pull request

May 16, 2023

@jameshadfield

The intention of the coloring logic is that if an auspice-config provides the clade_membership key then it is exported at that position in the colorings list. If clade_membership is not explicitly set in the config (but is present in a node-data file) then we have (for a very long time) added it as the very first entry in the colorings list.

PR #728 (augur v22.0.0) erroneously modified the behavior of the second case described above, which has now been restored by this commit.

jameshadfield added a commit to nextstrain/ncov that referenced this pull request

May 16, 2023

@jameshadfield

This updates the workflow to use the new clades interface from augur v22.0.1 (see nextstrain/augur#728). In the process we can remove two rules from the workflow.

If this workflow is run with augur prior to v22, the emerging_lineages rule will error due to unknown arguments.

The script add_branch_labels.py is no longer used, but not removed here, as it contains logic to export spike mutations as branch labels which may be useful at some point. If we do use this, it would be better to produce an intermediate node-data JSON with a custom branch label to avoid modifying the auspice JSON after export.

jameshadfield added a commit to nextstrain/ncov that referenced this pull request

May 16, 2023

@jameshadfield

This updates the workflow to use the new clades interface from augur v22 (see nextstrain/augur#728). In the process we can remove two rules from the workflow. The minimum augur version is bumped to 22.0.1, as that includes a couple of important bug-fixes.

If this workflow is run with augur prior to v22, the emerging_lineages rule will error due to unknown arguments.

The script add_branch_labels.py is no longer used and thus removed here (as recommended in code review: #1000 (comment)) Note that it contained unused functionality to export spike mutations; if we reinstate this in the future we should update the output format to produce a node-data JSON with a custom branch label to avoid modifying the auspice JSON after export.

joverlee521 added a commit to nextstrain/ncov that referenced this pull request

Jan 29, 2024

@joverlee521

The JSON output from augur clades was updated to separate nodes and branches in nextstrain/augur#728 so now the assign_rbd_levels script needs to parse the branches in order to find the basal node.

joverlee521 added a commit to nextstrain/ncov that referenced this pull request

Jan 29, 2024

@joverlee521

The JSON output from augur clades was updated to separate nodes and branches in nextstrain/augur#728 so now the assign_rbd_levels script needs to parse the branches in order to find the basal node.

joverlee521 added a commit to nextstrain/ncov that referenced this pull request

Jan 30, 2024

@joverlee521

The JSON output from augur clades was updated to separate nodes and branches in nextstrain/augur#728 so now the assign_rbd_levels script needs to parse the branches in order to find the basal node.

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})