torch_frame.datasets.MultimodalTextBenchmark — pytorch-frame documentation
class MultimodalTextBenchmark(root: str, name: str, text_stype: torch_frame.stype = stype.text_embedded, col_to_text_embedder_cfg: dict[str, TextEmbedderConfig] | TextEmbedderConfig | None = None, col_to_text_tokenizer_cfg: dict[str, TextTokenizerConfig] | TextTokenizerConfig | None = None)
Bases: Dataset
The benchmark datasets of tabular data with text columns used by “Benchmarking Multimodal AutoML for Tabular Data with Text Fields”. For some regression datasets, the target column is transformed from log scale back to the original scale.
Parameters:
- name (str) – The name of the dataset to download.
- text_stype (torch_frame.stype) – Text stype to use for text columns in the dataset. (default: torch_frame.text_embedded)
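The parameters above can be put together as in the following minimal sketch. It assumes that `TextEmbedderConfig` is importable from `torch_frame.config.text_embedder` and accepts a `text_embedder` callable (list of strings in, 2-D tensor out) plus a `batch_size`. The helpers `hash_features`, `toy_text_embedder` and `load_benchmark`, and the value of `EMBED_DIM`, are hypothetical names introduced for illustration only; the hashing embedder is a stand-in for a real language model, not the method used by the benchmark.

```python
import hashlib

EMBED_DIM = 16  # assumption: arbitrary toy embedding size


def hash_features(sentences, dim=EMBED_DIM):
    """Toy bag-of-words hashing: count each token into one of `dim` buckets."""
    rows = []
    for sentence in sentences:
        vec = [0.0] * dim
        for token in str(sentence).lower().split():
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dim] += 1.0
        rows.append(vec)
    return rows


def toy_text_embedder(sentences):
    """Hypothetical embedder: list of strings -> float tensor [len, EMBED_DIM]."""
    import torch  # imported lazily so hash_features stays dependency-free

    return torch.tensor(hash_features(sentences), dtype=torch.float32)


def load_benchmark(root, name="imdb_genre_prediction"):
    """Sketch: download one benchmark dataset and embed its text columns."""
    import torch_frame
    from torch_frame.config.text_embedder import TextEmbedderConfig
    from torch_frame.datasets import MultimodalTextBenchmark

    dataset = MultimodalTextBenchmark(
        root=root,
        name=name,
        text_stype=torch_frame.text_embedded,
        # A single config is applied to every text column; a dict keyed by
        # column name can be passed instead, as the signature indicates.
        col_to_text_embedder_cfg=TextEmbedderConfig(
            text_embedder=toy_text_embedder,
            batch_size=32,
        ),
    )
    dataset.materialize()  # runs the embedder over the text columns
    return dataset
```

Calling `load_benchmark("/tmp/multimodal")` would download the dataset on first use, so it needs network access. In practice the toy embedder would be replaced by a sentence-embedding model wrapped in the same callable interface.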
STATS:
Name | #rows | #cols (numerical) | #cols (categorical) | #cols (text) | #cols (other) | #classes | Task | Missing value ratio |
---|---|---|---|---|---|---|---|---|
product_sentiment_machine_hack | 6,364 | 0 | 1 | 1 | 0 | 4 | multiclass_classification | 0.0% |
jigsaw_unintended_bias100K | 125,000 | 29 | 0 | 1 | 0 | 2 | binary_classification | 41.4% |
news_channel | 25,355 | 14 | 0 | 1 | 0 | 6 | multiclass_classification | 0.0% |
wine_reviews | 105,154 | 2 | 2 | 1 | 0 | 30 | multiclass_classification | 1.0% |
data_scientist_salary | 19,802 | 0 | 3 | 2 | 1 | 6 | multiclass_classification | 12.3% |
melbourne_airbnb | 22,895 | 26 | 47 | 13 | 3 | 10 | multiclass_classification | 9.6% |
imdb_genre_prediction | 1,000 | 7 | 1 | 2 | 1 | 2 | binary_classification | 0.0% |
kick_starter_funding | 108,128 | 1 | 3 | 3 | 2 | 2 | binary_classification | 0.0% |
fake_job_postings2 | 15,907 | 0 | 3 | 2 | 0 | 2 | binary_classification | 23.8% |
google_qa_answer_type_reason_explanation | 6,079 | 0 | 1 | 3 | 0 | 1 | regression | 0.0% |
google_qa_question_type_reason_explanation | 6,079 | 0 | 1 | 3 | 0 | 1 | regression | 0.0% |
bookprice_prediction | 6,237 | 2 | 3 | 3 | 0 | 1 | regression | 1.7% |
jc_penney_products | 13,575 | 2 | 1 | 2 | 0 | 1 | regression | 13.7% |
women_clothing_review | 23,486 | 1 | 3 | 2 | 0 | 1 | regression | 1.8% |
news_popularity2 | 30,009 | 3 | 0 | 1 | 0 | 1 | regression | 0.0% |
ae_price_prediction | 28,328 | 2 | 5 | 1 | 3 | 1 | regression | 6.1% |
california_house_price | 47,439 | 18 | 8 | 2 | 11 | 1 | regression | 13.8% |
mercari_price_suggestion100K | 125,000 | 0 | 6 | 2 | 1 | 1 | regression | 3.4% |
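When choosing a value for `name`, it can help to group the datasets by task. The sketch below copies the Name and Task columns from the table above into a plain dictionary; `TASK_BY_NAME` and `names_for_task` are hypothetical helpers for illustration and are not part of torch_frame.

```python
# Name -> Task, copied from the STATS table above.
TASK_BY_NAME = {
    "product_sentiment_machine_hack": "multiclass_classification",
    "jigsaw_unintended_bias100K": "binary_classification",
    "news_channel": "multiclass_classification",
    "wine_reviews": "multiclass_classification",
    "data_scientist_salary": "multiclass_classification",
    "melbourne_airbnb": "multiclass_classification",
    "imdb_genre_prediction": "binary_classification",
    "kick_starter_funding": "binary_classification",
    "fake_job_postings2": "binary_classification",
    "google_qa_answer_type_reason_explanation": "regression",
    "google_qa_question_type_reason_explanation": "regression",
    "bookprice_prediction": "regression",
    "jc_penney_products": "regression",
    "women_clothing_review": "regression",
    "news_popularity2": "regression",
    "ae_price_prediction": "regression",
    "california_house_price": "regression",
    "mercari_price_suggestion100K": "regression",
}


def names_for_task(task):
    """Return the dataset names whose Task column equals `task`, sorted."""
    return sorted(n for n, t in TASK_BY_NAME.items() if t == task)


if __name__ == "__main__":
    for task in sorted(set(TASK_BY_NAME.values())):
        print(task, len(names_for_task(task)))
```

Each returned name is a string that can be passed as the `name` argument of `MultimodalTextBenchmark`.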