spaCy · Discussion #8226

This is a guide to issues that multiple people have experienced. Some of them are basic Python environment issues, some are recent bugs, in spaCy or in other packages, that we're dealing with, and some we're not sure about yet.

Be sure to also check out the troubleshooting section in the docs!

How to ask a good question

If your question isn't answered here, feel free to open a new discussion. Some things to keep in mind to help us help you:

don't post screenshots of code, errors, or terminal output - paste text as text with appropriate markdown formatting
include the full stack trace for any errors, not just the error message
include the output of `spacy info` and the versions of any related packages you are using
when you include code, provide a full snippet we can copy/paste and run
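
For the version details, `python -m spacy info --markdown` prints a summary you can paste as-is. For related packages, a rough standard-library sketch like the one below can collect installed versions. The package names in the list are only examples, not a required set, so swap in whatever you actually use.

```python
from importlib.metadata import PackageNotFoundError, version

# Example package names only (an assumption); replace with what you use.
PACKAGES = ["spacy", "thinc", "spacy-transformers", "torch"]


def collect_versions(packages):
    """Return {package name: installed version, or "not installed"}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


for name, ver in collect_versions(PACKAGES).items():
    print(f"{name}: {ver}")
```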

Are you using an old spaCy version?

Be sure to upgrade to the most recent spaCy version to get fixes for older bugs. If you're using v2.x, we're still releasing updates for it, so try those too. If you can't upgrade your version for some reason, let us know why so we understand your situation better.

Installing trained pipelines

FAQ: Trouble installing models #8575: Check our list of common causes of issues installing pipeline packages.

Jupyter and Google Colab

I installed spaCy in Jupyter but I get the error "no such module"

AttributeError: type object 'Language' has no attribute 'factory' #7435: Sometimes Jupyter isn't using the Python you think it is.
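
A quick way to check this from inside a notebook is to print the interpreter the kernel is running and where it would import spaCy from. This is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_environment(package="spacy"):
    """Report the running interpreter and where `package` would be imported from."""
    spec = importlib.util.find_spec(package)  # None if the package is not importable
    return {
        "executable": sys.executable,
        "package_location": spec.origin if spec is not None else None,
    }


info = describe_environment()
print("Python executable:", info["executable"])
print("spacy found at:", info["package_location"])
```

If the executable is not the environment you installed spaCy into, running `!{sys.executable} -m pip install spacy` in a notebook cell installs it for the interpreter the kernel is actually using.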

GPU Issues on Google Colab

Spacy 3.0 not working in colab pro #7912: If you get `KeyError: 'packaging'`, it seems to be possible to fix this by using a clean environment and not installing cupy.
Can not connect GPU with Google Colab #8747: It seems that installing your own cupy no longer gives an error, but it still doesn't work. The fix is still to not install any CUDA extras.

GPU Support

I have an AMD card...

We do not currently have a guide for this. CuPy has experimental support for AMD GPUs that you can try. If you've gotten spaCy working with an AMD card, please let us know! You can open a Discussion on the topic.

Training an ML model

Check the Example Projects!

Did you know we have example projects? These are complete examples of using the NER, Text Categorizer, Entity Linking, and other models. They include all necessary training data, so you can check out the code, train a model, and use their configs and conversion scripts as reference for your own models. If you have a question about how everything fits together in practice, check here first!

Preprocessing Text

How should I preprocess text for spaCy? #10243: You generally don't need to preprocess text for spaCy - see this post for details.

My retrained model forgot pretrained entities

Forgetting pre-trained labels after training ner model with 'en_core_web_sm' #7666 (comment): Theoretical background on this "catastrophic forgetting" problem
Extracting entity relations with newly trained (from pretrained) named entity recognizer #5134 (comment): Practical advice on how to measure/prevent it

Can I continue training from the latest epoch of the previous training run?

Is It Possible to Resume Training Via CLI in Spacy v3 (transformers)? #8176: Yes - you can source the previously trained model in your config
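
As a minimal sketch of what sourcing looks like in the config, assuming the earlier run saved its output to `training/model-last` (a hypothetical path) and the component you want to keep training is named `ner`:

```ini
[components.ner]
# Reuse the trained component, including its weights, instead of building it from a factory
source = "training/model-last"
```

This restores the component's weights. It most likely does not restore training state such as the optimizer, so don't expect results identical to an uninterrupted run.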

I'm having trouble with binary classification in textcat

Self-contradictory summary in spacy debug data #8035 (comment): You should structure the binary classification as a two-label classification problem.
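
To illustrate the two-label layout, here is a small sketch that turns a yes/no flag into a `cats` dictionary with two mutually exclusive labels. The label names and the helper `binary_cats` are assumptions for illustration. Use whatever labels fit your data.

```python
def binary_cats(is_positive, positive_label="POSITIVE", negative_label="NEGATIVE"):
    """Build a two-label `cats` dict in which exactly one label is 1.0."""
    return {
        positive_label: 1.0 if is_positive else 0.0,
        negative_label: 0.0 if is_positive else 1.0,
    }


# Hypothetical (text, is_positive) pairs, purely for illustration.
raw_examples = [("I loved it", True), ("Not for me", False)]
train_data = [(text, {"cats": binary_cats(flag)}) for text, flag in raw_examples]
print(train_data)
```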

I want to add non-textual features

As of 3.2, nlp() and nlp.pipe() accept Docs as input, so what you can do is create a simple Doc and add your data as custom attributes, and then pass that Doc to another pipeline.
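
A minimal sketch of that approach, assuming spaCy v3.2 or later. The extension name `page_number` and the sample text are made up for illustration, and a blank pipeline stands in for your real one.

```python
import spacy
from spacy.tokens import Doc

# Hypothetical attribute name; force=True lets the cell be re-run without errors.
Doc.set_extension("page_number", default=None, force=True)

# A blank pipeline stands in for the pipeline that should see the extra data.
nlp = spacy.blank("en")

# Build the Doc with the pipeline's own tokenizer so it shares the same vocab.
doc = nlp.make_doc("Invoice total: 42 EUR")
doc._.page_number = 3  # non-textual data attached to the Doc

# spaCy v3.2+: a Doc can be passed where a string would normally go.
processed = nlp(doc)
print(processed._.page_number)
```

Components later in the pipeline can then read `doc._.page_number` like any other custom attribute.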

Here are some older workarounds for this:

Can´t use Extension attributes in Tok2Vec #6527 (comment): Has one example approach to adding custom token features.
Pass an argument in the Language __call__() besides the text #8194: Shows how to modify make_doc to pass arbitrary extra data.

What are iterations, steps, epochs....? When does training stop?

Unclear what the column 'E' is outputting in the console output during training #7731: Explains training output such as E and #.
How to configure when model training ends? #7465: Explains the logic around when training stops, including patience/early stopping.
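
To make the stopping logic concrete, here is a simplified sketch of patience-based early stopping. It illustrates the idea and is not spaCy's actual training loop. The defaults mirror what we believe are the `patience` and `max_steps` values in the default `[training]` config, and the scores are invented.

```python
def should_stop(step, best_step, patience=1600, max_steps=20000):
    """Return True once training should end. A value of 0 disables that check."""
    if max_steps and step >= max_steps:
        return True
    if patience and (step - best_step) >= patience:
        return True
    return False


def find_stop_step(evaluations, patience, max_steps):
    """Walk through (step, score) pairs and return the step where training stops."""
    best_score, best_step = float("-inf"), 0
    for step, score in evaluations:
        if score > best_score:
            best_score, best_step = score, step
        if should_stop(step, best_step, patience=patience, max_steps=max_steps):
            return step
    return None  # never stopped within the evaluations given


# Hypothetical dev scores, one per evaluation (every 200 steps).
evaluations = [(200, 0.61), (400, 0.70), (600, 0.69), (800, 0.68), (1000, 0.66)]
print(find_stop_step(evaluations, patience=400, max_steps=0))
```

Here the best score is at step 400, so with a patience of 400 steps training ends at step 800, the first evaluation that is 400 steps past the best one.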

Hyperparameters

FAQ: Guide to understanding hyperparameters in spaCy #10625

Performance

Incorrect pre-trained model predictions

📚 Inaccurate pre-trained model predictions master thread #3052

spaCy is too slow

FAQ: What to do when spaCy is too slow? #8402: This is a quick guide to techniques for speeding up inference.

I'm getting Out of Memory errors

parameter batch_size vs max_length vs batcher.size #8600 (comment): You can modify the batch size and max-length settings in the config.
For a transformer pipeline in spaCy v3.2.1+, you can add a `doc_cleaner` component at the end of the pipeline to automatically remove `doc._.trf_data`. This reduces the memory required during the evaluation steps in training, as well as the size of any saved output docs.
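
As a sketch of the `doc_cleaner` suggestion, assuming spaCy v3.2.1 or later, with a blank pipeline standing in for a real transformer pipeline:

```python
import spacy

# A blank pipeline stands in for a real transformer pipeline here.
nlp = spacy.blank("en")

# "doc_cleaner" is a built-in factory in spaCy v3.2.1+. Adding it last means it
# runs after every other component has finished with the Doc.
nlp.add_pipe("doc_cleaner", last=True)
print(nlp.pipe_names)
```

For training, the equivalent is a `[components.doc_cleaner]` block with `factory = "doc_cleaner"` in the config, with the component listed last in the pipeline.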

Windows

I get the error `Microsoft Visual C++ 14.0 is required`

update error messaging 'Microsoft Visual C++ 14.0 is required.' on Windows-- 64 bit is a better solution #5869: You're probably using 32-bit Python, which is strongly discouraged. Use 64-bit Python instead.
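
If you're not sure which one you have, this standard-library check reports the bitness of the interpreter that runs it. `python_bitness` is a hypothetical helper name.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


print(f"{python_bitness()}-bit Python at {sys.executable}")
```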

I get the error `ImportError: DLL load failed: The specified module could not be found.`

Windows .pyd files sneakily depend on msvcp140.dll #5332: Try installing the latest Microsoft Visual C++ Redistributable