[1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial by zhangguanheng66 · Pull Request #1352 · pytorch/tutorials
Conversation
In the torchtext 0.9.0 release, we will include the raw text datasets as a beta release. This PR updates the text classification tutorial to use the new torchtext library.
This PR should be tested against the PyTorch 1.8.0 RC and torchtext 0.9.0 RC.
zhangguanheng66 changed the title from "[WIP][DO NOT REVIEW] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial" to "[WIP][DO NOT REVIEW][1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial"
zhangguanheng66 changed the title from "[WIP][DO NOT REVIEW][1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial" to "[1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial"
    text = torch.cat(text)
    return text, offsets, label

train_iter = AG_NEWS(split='train')
num_class = len(set([label for (label, text) in train_iter]))
Here we're materializing the dataset again, but this already happened earlier in the context of DataLoader. We can just assign list(train_iter) to a variable to avoid this. We should probably also add the number of labels to our dataset documentation, which would be much more efficient to use than this. I'll add this as a task.
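The reviewer's suggestion can be sketched as follows. This is a minimal illustration, not the tutorial's final code; `sample_data` is a hypothetical in-memory stand-in for the `(label, text)` pairs yielded by `AG_NEWS(split='train')`, used so the snippet runs without downloading the dataset.

```python
# Materialize the iterator once and reuse the list, instead of
# exhausting the AG_NEWS iterator a second time.
sample_data = iter([(1, "world news"), (2, "sports"),
                    (1, "more world news"), (3, "business")])

train_list = list(sample_data)  # materialize exactly once
# Reuse the list for every later pass, e.g. counting distinct labels:
num_class = len(set(label for (label, _) in train_list))
print(num_class)  # → 3
```

Because `train_list` is a plain list, it can also be handed directly to a DataLoader without re-creating the dataset iterator.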
Guanheng Zhang added 2 commits
Base automatically changed from master to main
Base automatically changed from main to master
Guanheng Zhang added 2 commits
Guanheng Zhang added 3 commits
@@ -2,7 +2,7 @@
 Text classification with the torchtext library
 ==================================
-In this tutorial, we will show how to use the new torchtext library to build the dataset for the text classification analysis. In the nightly release of the torchtext library, we provide a few prototype building blocks for data processing. Users will have the flexibility to
+In this tutorial, we will show how to use the new torchtext library to build the dataset for the text classification analysis. Users will have the flexibility to
We don't need to say "new" torchtext library anymore, because the datasets are now part of the top folder.
Fixed.
Guanheng Zhang added 2 commits
# computes the mean value of a “bag” of embeddings. The text entries here
# have different lengths. ``nn.EmbeddingBag`` requires no padding here
# since the text lengths are saved in offsets.
# The model is composed of the `nn.EmbeddingBag <https://pytorch.org/docs/stable/nn.html?highlight=embeddingbag#torch.nn.EmbeddingBag>`__ layer plus a linear layer for the classification purpose. ``nn.EmbeddingBag`` computes the mean value of a “bag” of embeddings. Although the text entries here have different lengths, the ``nn.EmbeddingBag`` module requires no padding here since the text lengths are saved in offsets.
I think EmbeddingBag provides a 'mode' option, where 'mean' is just the default. So perhaps it's better to be explicit about that, instead of stating that EmbeddingBag takes the mean to combine embeddings.
Updated the text to explicitly say that the default mode is mean.
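The reduction the review thread is discussing can be illustrated without torch: the following is a pure-Python sketch of what ``nn.EmbeddingBag`` computes with its default ``mode='mean'``. The function name ``embedding_bag_mean`` and the toy embedding table are illustrative, not part of the tutorial; passing ``mode='sum'`` or ``mode='max'`` to the real module would change the reduction.

```python
def embedding_bag_mean(table, indices, offsets):
    """table: list of embedding vectors; indices: flat token ids for all
    bags concatenated; offsets: start index of each bag within `indices`.
    Returns the element-wise mean of each bag's embedding rows, mirroring
    nn.EmbeddingBag's default mode='mean'."""
    bags = []
    bounds = list(offsets) + [len(indices)]
    for start, end in zip(bounds, bounds[1:]):
        rows = [table[i] for i in indices[start:end]]
        # element-wise mean over the rows belonging to this bag
        bags.append([sum(col) / len(rows) for col in zip(*rows)])
    return bags

table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Two bags of different lengths: tokens [0, 1] and token [2]; no padding
# is needed because the offsets delimit the bags.
print(embedding_bag_mean(table, [0, 1, 2], [0, 2]))  # → [[2.0, 3.0], [5.0, 6.0]]
```

This is why the tutorial's model needs no padding: variable-length texts are concatenated into one flat index tensor, and the offsets mark where each entry begins.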
                  '| accuracy {:8.3f}'.format(epoch, idx, len(dataloader),
                                              total_acc/total_count))
            total_acc, total_count = 0, 0
            start_time = time.time()
unused variable?
Here we reset the start_time variable so that the next logging interval is timed from this point.
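The pattern under discussion is interval logging: start_time is reset after each log line so the elapsed time covers only the latest interval rather than the whole epoch. A minimal sketch, in which `log_interval` and the `time.sleep` stand in for the tutorial's real batch loop:

```python
import time

log_interval = 3
start_time = time.time()
for idx in range(1, 10):
    time.sleep(0.01)  # stand-in for one training step
    if idx % log_interval == 0:
        elapsed = time.time() - start_time
        print(f"batches {idx:3d} | elapsed {elapsed:6.3f}s")
        start_time = time.time()  # reset: time only the *next* interval
```

Without the reset, each printed `elapsed` would grow monotonically and stop reflecting per-interval throughput, so the variable is not unused.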
#
from torch.utils.data import DataLoader
import time
Perhaps this can be imported in the next code snippet, since it might not be used here?
It's used in L195?
Guanheng Zhang added 2 commits
brianjo changed the base branch from master to 1.8-RC5-TEST
brianjo added a commit that referenced this pull request
Update build.sh
Update audio tutorial for release pytorch 1.8 / torchaudio 0.8 (#1379)
[wip] replace audio tutorial
Update
Update
Update
fixup
Update requirements.txt
update
Update
Co-authored-by: Brian Johnson <brianjo@fb.com>
[1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial (#1352)
switch to the new dataset API
checkpoint
checkpoint
checkpoint
update docs
checkpoint
switch to legacy vocab
update to follow the master API
checkpoint
checkpoint
address reviewer's comments
Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: Brian Johnson <brianjo@fb.com>
[1.8 release] Switch to LM dataset in torchtext 0.9.0 release (#1349)
switch to raw text dataset in torchtext 0.9.0 release
follow the new API in torchtext master
Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: Brian Johnson <brianjo@fb.com>
- [WIP][FX] CPU Performance Profiling with FX (#1319)
Co-authored-by: Brian Johnson <brianjo@fb.com>
[FX] Added fuser tutorial (#1356)
Added fuser tutorial
updated index.rst
fixed conclusion
responded to some comments
responded to comments
respond
Co-authored-by: Brian Johnson <brianjo@fb.com>
Update numeric_suite_tutorial.py
Tutorial combining DDP with Pipeline Parallelism to Train Transformer models (#1347)
Tutorial combining DDP with Pipeline Parallelism to Train Transformer models.
Summary: Tutorial which places a pipe on GPUs 0 and 1 and another Pipe on GPUs 2 and 3. Both pipe replicas are replicated via DDP. One process drives GPUs 0 and 1 and another drives GPUs 2 and 3.
Polish out some of the docs.
Add thumbnail and address some comments.
Co-authored-by: pritam <pritam.damania@fb.com>
More updates to numeric_suite
Even more updates
Update numeric_suite_tutorial.py
Hopefully that's the last one
- Update numeric_suite_tutorial.py
Last one
- Update build.sh
Co-authored-by: moto <855818+mthrok@users.noreply.github.com>
Co-authored-by: Guanheng George Zhang <6156351+zhangguanheng66@users.noreply.github.com>
Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: James Reed <jamesreed@fb.com>
Co-authored-by: Horace He <horacehe2007@yahoo.com>
Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com>
Co-authored-by: pritam <pritam.damania@fb.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
rodrigo-techera pushed a commit to Experience-Monks/tutorials that referenced this pull request