Add fast tokenizer for BARTpho by datquocnguyen · Pull Request #17254 · huggingface/transformers (original) (raw)
The ModelArguments
should have trust_remote_code_model
and trust_remote_code_tokenizer
separately for the model and tokenizer loading, respectively, shouldn't it? For example:
tokenizer_name_or_path = model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path
if config.model_type in {"bloom", "gpt2", "roberta"}:
tokenizer = AutoTokenizer.from_pretrained(
tokenizer_name_or_path,
cache_dir=model_args.cache_dir,
use_fast=True,
revision=model_args.model_revision,
trust_remote_code=trust_remote_code_tokenizer, # For tokenizer
use_auth_token=True if model_args.use_auth_token else None,
add_prefix_space=True,
)
else:
tokenizer = AutoTokenizer.from_pretrained(
tokenizer_name_or_path,
cache_dir=model_args.cache_dir,
use_fast=True,
revision=model_args.model_revision,
trust_remote_code=trust_remote_code_tokenizer, # For tokenizer
use_auth_token=True if model_args.use_auth_token else None,
)
model = AutoModelForTokenClassification.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
trust_remote_code=trust_remote_code_model, # For model
use_auth_token=True if model_args.use_auth_token else None,
ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
)