Support more styles for xlsxwriter by jnothman · Pull Request #16149 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation30 Commits16 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

jnothman

I was surprised to find that despite the interchangeable representation of Excel styles, xlsxwriter did not have good style support.

I've not added direct tests for this functionality, but test some of it through test_styler_to_excel.

@jnothman

There is a consistent test failure, but not one I've managed to replicate locally.

@codecov

Codecov Report

Merging #16149 into master will decrease coverage by <.01%.
The diff coverage is 89.65%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #16149 +/- ##

Flag Coverage Δ
#multiple 88.61% <89.65%> (-0.01%) ⬇️
#single 40.3% <3.44%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/excel.py 80.55% <89.65%> (-0.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b80ed3...c935f5d. Read the comment docs.

@codecov

@jnothman

Finally passing tests after identifying a recently fixed openpyxl bug that got in the way

@jnothman

jreback

if num_format_str is not None:
xl_format.set_num_format(num_format_str)
props['num_format'] = num_format_str

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move more of this logic out of here an into the formats dir somewhere?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean explicitly moving the number format logic out? Yes, perhaps that's a worthwhile refactoring. I think we should also be calculating the number format from the display.precision config. For which reason, I believe all of those changes belong in a different PR.

Or are you talking about moving this mapping logic out? Well currently we assume nested style dicts as an interchange format, which are well-suited to openpyxl but need conversion for all writers. The stuff in formats/ should remain relatively writer-agnostic.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally all of this logic would just be a single call here and the logic elsewhere.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've grokked your vision, given that this is writer specific. Do you mean that there should be more refactoring across writers? Except for this number formatting, it's already quite factored, as they each have different syntaxes for creating and formatting cells.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i think think excel should be refactored into a subdir of writer code and style things should live there

maybe make an issue about this
it's a bit of work to split it then adding things like style should be easy

@jnothman

I'm happy to make an issue aiming to refactor excel writing code. But how do you feel about this PR?

@jreback

@jnothman

@jnothman

@jnothman

I've moved the what's new to 0.20.

@jnothman

jreback

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')
for src, dst in self.STYLE_MAPPING:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this only is triggered if there is styling (IOW this won't cause a perf issue for 'regular' excel)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing a faster variant.

@@ -1609,6 +1609,68 @@ def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0,
startcol + cell.col,
val, style)
# Map from openpyxl-oriented styles to flatter xlsxwriter representation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code would be simpler to make this style formatting into a separate class (rather than have it live in functions sitting in the main excel code). can you refactor to make this cleaner.

@jreback

this looks reasonable, can you rebase

@jnothman

Yes, sorry I've not managed to do the refactoring you'd like to see. I've been unsure what you would like, and have had my attentions elsewhere.

@jnothman

@jnothman

jnothman

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppVeyor failure looks like someone else's problem.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')
for src, dst in self.STYLE_MAPPING:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')
for src, dst in self.STYLE_MAPPING:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing a faster variant.

@jnothman

@jreback

@jnothman

Merge into 0.21 and avoid future what's new conflicts?

@jreback

you need to rebase and some comments to respond

@jnothman

Well, your comments suggested a refactor, and I've previously commented that I do not understand your vision of a refactor in this space, and certainly have not found capacity within my other FOSS commitments to to do a general code quality improvement here.

Since those comments, you announced "this looks reasonable". I'll have another very brief look at refactoring.

@jnothman

@jnothman

@jnothman

I made the requested change as I interpret it. I do not find it improves code quality.

@jnothman

I haven't understood the cause of the travis failure, apparently in lint.sh but without error message that I can see.

I've merged in master and moved what's new to the next version.

@jreback

Linting *.py
pandas/io/excel.py:1794:1: E305 expected 2 blank lines after class or function definition, found 1

you can check this locally via:

make lint-diff (or just directly run flake8 on that file)

@jnothman

@jnothman

TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clipboard failure on travis looks unrelated.

All good @jreback?

jreback

jreback

@@ -22,7 +22,7 @@ New features
Other Enhancements
^^^^^^^^^^^^^^^^^^
-
- Better support for ``Dataframe.style.to_excel()`` output with the ``xlsxwriter`` engine. (:issue:`16149`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now. it might make sense to enhance the excel docs with what is better now? (or maybe a listing of styles that work). but for a followup.

@jreback

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request

Oct 31, 2017

@jnothman @peterpanmj

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request

Nov 28, 2017

@jnothman @No-Stream