Support more styles for xlsxwriter by jnothman · Pull Request #16149 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation30 Commits16 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

I was surprised to find that despite the interchangeable representation of Excel styles, xlsxwriter did not have good style support.

I've not added direct tests for this functionality, but test some of it through test_styler_to_excel.

~~closes #xxxx~~
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry

There is a consistent test failure, but not one I've managed to replicate locally.

Codecov Report

Merging #16149 into master will decrease coverage by <.01%.
The diff coverage is 89.65%.

@@ Coverage Diff @@ ## master #16149 +/- ##

Coverage 90.83% 90.83% -0.01%
Files 159 159
Lines 50796 50809 +13

Hits 46143 46153 +10

Misses 4653 4656 +3

Flag	Coverage Δ
#multiple	88.61% <89.65%> (-0.01%)	⬇️
#single	40.3% <3.44%> (-0.01%)	⬇️

Impacted Files	Coverage Δ
pandas/io/excel.py	80.55% <89.65%> (-0.07%)	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b80ed3...c935f5d. Read the comment docs.

Finally passing tests after identifying a recently fixed openpyxl bug that got in the way


if num_format_str is not None:
xl_format.set_num_format(num_format_str)
props['num_format'] = num_format_str

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move more of this logic out of here an into the formats dir somewhere?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean explicitly moving the number format logic out? Yes, perhaps that's a worthwhile refactoring. I think we should also be calculating the number format from the display.precision config. For which reason, I believe all of those changes belong in a different PR.

Or are you talking about moving this mapping logic out? Well currently we assume nested style dicts as an interchange format, which are well-suited to openpyxl but need conversion for all writers. The stuff in formats/ should remain relatively writer-agnostic.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally all of this logic would just be a single call here and the logic elsewhere.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've grokked your vision, given that this is writer specific. Do you mean that there should be more refactoring across writers? Except for this number formatting, it's already quite factored, as they each have different syntaxes for creating and formatting cells.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i think think excel should be refactored into a subdir of writer code and style things should live there

maybe make an issue about this
it's a bit of work to split it then adding things like style should be easy

I'm happy to make an issue aiming to refactor excel writing code. But how do you feel about this PR?

I've moved the what's new to 0.20.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this only is triggered if there is styling (IOW this won't cause a perf issue for 'regular' excel)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

store STYLE_MAPPING as a trie and descend recursively only where a prefix is matched.
flatten style_dict and store STYLE_MAPPING as a dict so that their keys match. But to be deterministic in case of multiple competing styles, STYLE_MAPPING would need to store the matched index, and the results would need to be sorted.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing a faster variant.

@@ -1609,6 +1609,68 @@ def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0,
startcol + cell.col,
val, style)

# Map from openpyxl-oriented styles to flatter xlsxwriter representation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code would be simpler to make this style formatting into a separate class (rather than have it live in functions sitting in the main excel code). can you refactor to make this cleaner.

this looks reasonable, can you rebase

Yes, sorry I've not managed to do the refactoring you'd like to see. I've been unsure what you would like, and have had my attentions elsewhere.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppVeyor failure looks like someone else's problem.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

store STYLE_MAPPING as a trie and descend recursively only where a prefix is matched.
flatten style_dict and store STYLE_MAPPING as a dict so that their keys match. But to be deterministic in case of multiple competing styles, STYLE_MAPPING would need to store the matched index, and the results would need to be sorted.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing a faster variant.

Merge into 0.21 and avoid future what's new conflicts?

you need to rebase and some comments to respond

Well, your comments suggested a refactor, and I've previously commented that I do not understand your vision of a refactor in this space, and certainly have not found capacity within my other FOSS commitments to to do a general code quality improvement here.

Since those comments, you announced "this looks reasonable". I'll have another very brief look at refactoring.

I made the requested change as I interpret it. I do not find it improves code quality.

I haven't understood the cause of the travis failure, apparently in lint.sh but without error message that I can see.

I've merged in master and moved what's new to the next version.

Linting *.py
pandas/io/excel.py:1794:1: E305 expected 2 blank lines after class or function definition, found 1

you can check this locally via:

make lint-diff (or just directly run flake8 on that file)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clipboard failure on travis looks unrelated.

All good @jreback?

@@ -22,7 +22,7 @@ New features
Other Enhancements
^^^^^^^^^^^^^^^^^^

-
- Better support for ``Dataframe.style.to_excel()`` output with the ``xlsxwriter`` engine. (:issue:`16149`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now. it might make sense to enhance the excel docs with what is better now? (or maybe a listing of styles that work). but for a followup.

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request

Oct 31, 2017

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request

Nov 28, 2017

Support more styles for xlsxwriter by jnothman · Pull Request #16149 · pandas-dev/pandas (original) (raw)

Conversation

Codecov Report

@@ Coverage Diff @@ ## master #16149 +/- ##

Coverage 90.83% 90.83% -0.01%

Files 159 159Lines 50796 50809 +13

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Files 159 159
Lines 50796 50809 +13