Read First: FAQ / Common Issues / Troubleshooting Guide · explosion/spaCy · Discussion #8226
This is a guide to issues that multiple people have experienced. Some are basic Python environment issues, some are recent bugs in spaCy or in other packages that we're dealing with, and some we're not sure about yet.
Be sure to also check out the troubleshooting section in the docs!
How to ask a good question
If your question isn't answered here, feel free to open a new discussion. Some things to keep in mind to help us help you:
- don't post screenshots of code, errors, or terminal output - paste text as text with appropriate markdown formatting
- include the full stack trace for any errors, not just the error message
- include the output of `spacy info` and the versions of any related packages you are using
- when you include code, provide a full snippet we can copy/paste and run
Are you using an old spaCy version?
Be sure to upgrade to the most recent spaCy version to get fixes for older bugs. If you're using v2.x, we're still releasing updates for it, so try those too. If you can't upgrade your version for some reason, let us know why so we understand your situation better.
Installing trained pipelines
- FAQ: Trouble installing models #8575: Check our list of common causes of issues installing pipeline packages.
Jupyter and Google Colab
I installed spaCy in Jupyter but I get the error "no such module"
- AttributeError: type object 'Language' has no attribute 'factory' #7435: Sometimes Jupyter isn't using the Python you think it is.
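A quick way to see which interpreter a notebook kernel is actually running, and whether that interpreter can see spaCy, is to ask Python directly. This is a minimal stdlib-only sketch; `describe_interpreter` is a hypothetical helper name, not part of spaCy.

```python
import importlib.util
import sys


def describe_interpreter():
    """Report which Python this kernel runs and whether it can import spaCy."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "version": sys.version.split()[0],
        # find_spec returns None when the package isn't on this interpreter's path
        "spacy_importable": importlib.util.find_spec("spacy") is not None,
    }


print(describe_interpreter())
```

If `spacy_importable` is `False` even though you installed spaCy, you most likely installed it into a different Python. Installing from a notebook cell with `%pip install spacy`, or running `python -m pip install spacy` with the interpreter path shown in `executable`, targets the kernel's own environment.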
GPU Issues on Google Colab
- Spacy 3.0 not working in colab pro #7912: `KeyError: 'packaging'`. It seems to be possible to fix this by using a clean environment and not installing CuPy.
- Can not connect GPU with Google Colab #8747: It seems like installing your own CuPy no longer gives an error, but it still doesn't work. The fix is still not to install any CUDA extras.
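Both workarounds come down to not adding your own CuPy or CUDA extras on top of what Colab already provides. Before changing anything, it can help to see which CuPy packages are installed. This stdlib-only sketch lists installed distributions whose names start with `cupy` (for example `cupy-cuda11x`); the helper name is hypothetical. CuPy's own installation docs warn against having more than one CuPy package installed at the same time.

```python
from importlib import metadata


def installed_cupy_packages():
    """Return the sorted names of installed distributions whose name starts with 'cupy'."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Some broken or partial installs have no Name field; skip those
        if name and name.lower().startswith("cupy"):
            names.add(name)
    return sorted(names)


print(installed_cupy_packages())
```

An empty list means no CuPy is installed; more than one entry is worth cleaning up before retrying.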
GPU Support
I have an AMD card...
We do not currently have a guide for this. CuPy has experimental support for AMD GPUs (through ROCm) that you can try. If you've gotten spaCy working with an AMD card, please let us know! You can open a Discussion on the topic.
Training a ML model
Check the Example Projects!
Did you know we have example projects? These are complete examples of using the NER, Text Categorizer, Entity Linking, and other models. They include all necessary training data, so you can check out the code, train a model, and use their configs and conversion scripts as reference for your own models. If you have a question about how everything fits together in practice, check here first!
Preprocessing Text
- How should I preprocess text for spaCy? #10243: You generally don't need to preprocess text for spaCy - see this post for details.
My retrained model forgot pretrained entities
- Forgetting pre-trained labels after training ner model with 'en_core_web_sm' #7666 (comment): Theoretical background on this "catastrophic forgetting" problem
- Extracting entity relations with newly trained (from pretrained) named entity recognizer #5134 (comment): Practical advice on how to measure/prevent it
Can I continue training from the latest epoch of the previous training run?
- Is It Possible to Resume Training Via CLI in Spacy v3 (transformers)? #8176: Yes - you can source the previously trained model in your config
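Sourcing means pointing a component's config block at the earlier training output instead of a factory. The excerpt below is a minimal sketch that assumes the previous run saved its output to `training/model-last` (a hypothetical path). It is parsed with thinc's `Config` only to check that it's well-formed. In a full config, the components must also be listed in `[nlp] pipeline`. If `ner` listens to a shared `tok2vec`, source `tok2vec` as well, as shown here.

```python
from thinc.api import Config

# Hypothetical config excerpt: "training/model-last" is an assumed output path
CONFIG_EXCERPT = """
[components]

[components.tok2vec]
source = "training/model-last"

[components.ner]
source = "training/model-last"
"""

config = Config().from_str(CONFIG_EXCERPT)
print(config["components"]["ner"]["source"])
```

In Python code, the equivalent is `nlp.add_pipe("ner", source=trained_nlp)`, where `trained_nlp` is the previously trained pipeline loaded with `spacy.load`.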
I'm having trouble with binary classification in textcat
- Self-contradictory summary in spacy debug data #8035 (comment): You should structure binary classification as a two-label classification problem.
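This is a minimal sketch of a binary task set up as two mutually exclusive labels with the `textcat` component. The `POSITIVE`/`NEGATIVE` labels and the two-example dataset are made up for illustration, and a real project would normally train with `spacy train` and a config file. The key point is that every example gives a score for both labels.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
# The "textcat" component treats its labels as mutually exclusive
textcat = nlp.add_pipe("textcat")

# Hypothetical toy data: a binary task written as two labels,
# with a score for BOTH labels on every example
train_data = [
    ("I loved this film", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Utterly boring and far too long", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in train_data
]

# initialize() reads the labels from the examples and returns an optimizer
optimizer = nlp.initialize(get_examples=lambda: examples)
for _ in range(5):
    nlp.update(examples, sgd=optimizer)

doc = nlp("What a great film")
print(doc.cats)  # one score per label
```

Because `textcat` treats the labels as exclusive, the two scores in `doc.cats` sum to 1. You can read either one as the probability for its class.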
I want to add non-textual features
As of v3.2, `nlp()` and `nlp.pipe()` accept `Doc` objects as input. You can create a simple `Doc`, store your extra data in custom attributes, and then pass that `Doc` to your processing pipeline.
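Here is a minimal sketch of that approach. It assumes a hypothetical `metadata` extension and uses a blank pipeline with a `sentencizer` as a stand-in for your real pipeline. The data simply travels with the `Doc`. Pipeline components can read `doc._.metadata`, but none of them use it automatically.

```python
import spacy
from spacy.tokens import Doc

# Hypothetical custom attribute for non-textual data (force=True allows re-running)
Doc.set_extension("metadata", default=None, force=True)

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Build the Doc yourself and attach the extra data before processing
doc = nlp.make_doc("Order shipped late. Customer is unhappy.")
doc._.metadata = {"customer_tier": "gold", "order_value": 149.99}

# spaCy v3.2+: the Doc goes through the pipeline and keeps its custom attributes
processed = nlp(doc)
print(processed._.metadata, [sent.text for sent in processed.sents])
```

`nlp.pipe()` accepts an iterable of such `Doc` objects in the same way.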
Here are some older workarounds for this:
- Can´t use Extension attributes in Tok2Vec #6527 (comment): Shows one approach to adding custom token features.
- Pass an argument in the Language __call__() besides the text #8194: Shows how to modify `make_doc` to pass arbitrary extra data.
What are iterations, steps, and epochs? When does training stop?
- Unclear what the column 'E' is outputting in the console output during training #7731: Explains training output such as `E` and `#`.
- How to configure when model training ends? #7465: Explains the logic around when training stops, including patience/early stopping.
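As a rough illustration of the two main stop conditions, this sketch stops either when the score hasn't improved for `patience` steps or when the run reaches `max_steps`. It is a simplified, hypothetical re-creation, not spaCy's training loop, and `should_stop` and its arguments are invented for illustration. In spaCy these settings live in the `[training]` block of the config, together with `max_epochs` and `eval_frequency`, which controls how often evaluation scores are recorded.

```python
from typing import List, Tuple


def should_stop(
    step: int,
    eval_results: List[Tuple[float, int]],
    patience: int,
    max_steps: int,
) -> bool:
    """Simplified stop check. eval_results holds (score, step) pairs from each
    evaluation so far; a value of 0 disables patience or max_steps."""
    if eval_results and patience:
        # Best score wins; on ties this sketch picks the EARLIER step
        # because -step is larger for smaller steps
        _, neg_best_step = max((score, -s) for score, s in eval_results)
        best_step = -neg_best_step
        # No improvement for `patience` steps since the best evaluation
        if step - best_step >= patience:
            return True
    # Hard limit on the total number of update steps
    if max_steps and step >= max_steps:
        return True
    return False


results = [(0.5, 200), (0.7, 400), (0.7, 600)]
print(should_stop(2000, results, patience=1600, max_steps=20000))
```

Here the best score was first reached at step 400, so by step 2000 there have been 1600 steps without improvement and the sketch stops.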
Hyperparameters
Performance
Incorrect pre-trained model predictions
spaCy is too slow
- FAQ: What to do when spaCy is too slow? #8402: This is a quick guide to techniques for speeding up inference.
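Two of the most common techniques are processing texts as a batched stream with `nlp.pipe()` instead of calling `nlp()` in a loop, and disabling components you don't need. This minimal sketch shows both. A blank English pipeline with a `sentencizer` stands in for a trained pipeline. With a trained pipeline, you'd typically disable components like `parser` or `ner`, either with `select_pipes` or with `spacy.load(name, disable=[...])`.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

texts = ["This is a sentence. Here is another."] * 1000

# Stream texts through the pipeline in batches rather than one nlp() call per text
docs = list(nlp.pipe(texts, batch_size=256))

# Temporarily disable components whose output you don't need
with nlp.select_pipes(disable=["sentencizer"]):
    token_counts = [len(doc) for doc in nlp.pipe(texts, batch_size=256)]

print(len(docs), token_counts[0])
```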
I'm getting Out of Memory errors
- parameter batch_size vs max_length vs batcher.size #8600 (comment): You can modify the batch size and max-length settings in the config.
- For a transformer pipeline in spaCy v3.2.1+, you can add a `doc_cleaner` component at the end of the pipeline to automatically clear `doc._.trf_data`. This reduces the memory required during the evaluation steps in training and the size of any saved output docs.
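Here is a minimal sketch of the `doc_cleaner` setup. So that it runs without spacy-transformers, it uses a blank pipeline and registers a stand-in `trf_data` extension only if none is registered yet. In a real transformer pipeline, spacy-transformers provides the extension and `doc_cleaner` goes last, after the components that use the transformer output.

```python
import spacy
from spacy.tokens import Doc

# spacy-transformers normally registers Doc._.trf_data; add a stand-in only if
# it is missing so this sketch runs without spacy-transformers installed
if not Doc.has_extension("trf_data"):
    Doc.set_extension("trf_data", default=None)

nlp = spacy.blank("en")
# doc_cleaner (spaCy v3.2.1+) sets each listed attribute to the given value;
# add it as the LAST component so earlier components can still use the data
nlp.add_pipe("doc_cleaner", config={"attrs": {"_.trf_data": None}})

doc = nlp.make_doc("A long document that would carry large transformer output")
doc._.trf_data = ["stand-in for large transformer output"]

cleaned = nlp(doc)  # passing a Doc to nlp() requires spaCy v3.2+
print(cleaned._.trf_data)
```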
Windows
I get the error Microsoft Visual C++ 14.0 is required
- update error messaging 'Microsoft Visual C++ 14.0 is required.' on Windows-- 64 bit is a better solution #5869: You're probably using 32-bit Python, which is strongly discouraged. Use 64-bit Python instead.
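To check whether the Python you're running is 32-bit or 64-bit, look at the size of a C pointer in the interpreter. This is a stdlib-only sketch, and `python_bitness` is a hypothetical helper name.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64: the size of a C pointer in this interpreter, in bits."""
    return struct.calcsize("P") * 8


# sys.maxsize > 2**32 is another common way to detect a 64-bit build
print(python_bitness(), sys.maxsize > 2**32)
```

If this prints `32`, install a 64-bit Python and recreate your environment before installing spaCy.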
I get the error ImportError: DLL load failed: The specified module could not be found.
- Windows .pyd files sneakily depend on msvcp140.dll #5332: Try installing the latest Microsoft Visual C++ Redistributable
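As a rough check of whether `msvcp140.dll` can be found, you can ask `ctypes`. On Windows, `ctypes.util.find_library` searches the directories on `PATH`, which approximates but doesn't exactly match how Windows resolves DLLs. Treat a `None` result as a hint to install the Redistributable, not as proof that the DLL is missing. `find_msvcp140` is a hypothetical helper name.

```python
import ctypes.util
import sys


def find_msvcp140():
    """On Windows, return the path of msvcp140.dll if found on PATH, else None.

    On other platforms this always returns None, since the DLL is Windows-only.
    """
    if sys.platform != "win32":
        return None
    return ctypes.util.find_library("msvcp140")


print(find_msvcp140())
```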