Luca Di Liello - Academia.edu

Papers by Luca Di Liello

Chapter 22. Semantic Loss Functions for Neuro-Symbolic Structured Prediction

Frontiers in Artificial Intelligence and Applications, Jul 21, 2023

Effective Pretraining Objectives for Transformer-based Autoencoders

Findings of the Association for Computational Linguistics: EMNLP 2022

In this paper, we study trade-offs between efficiency, cost, and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus greatly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.
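
The light-generator idea can be illustrated with a short, hypothetical sketch (not the authors' released code): replacement tokens for an ELECTRA-style corruption step are drawn from a simple unigram distribution instead of being produced by a trained generator. The uniform vocabulary, batch shapes, and function name below are placeholders.

```python
# Hypothetical sketch of a statistical token generator for ELECTRA-style
# pre-training: replacements come from a unigram distribution, and a
# discriminator would later predict which tokens were replaced.
import torch

def corrupt_with_unigram_generator(input_ids, unigram_probs, replace_prob=0.15):
    """Return corrupted token ids and the 0/1 labels for the discriminator."""
    # Positions selected for corruption.
    corrupt_mask = torch.rand(input_ids.shape) < replace_prob
    # Sample replacement tokens from the (corpus-level) unigram distribution.
    sampled = torch.multinomial(
        unigram_probs, num_samples=input_ids.numel(), replacement=True
    ).view(input_ids.shape)
    corrupted = torch.where(corrupt_mask, sampled, input_ids)
    # A token counts as "replaced" only if it actually changed.
    labels = (corrupted != input_ids).long()
    return corrupted, labels

# Toy usage: vocabulary of 10 tokens, a uniform distribution as a stand-in
# for real corpus frequencies, and a batch of 2 sequences of length 8.
unigram_probs = torch.full((10,), 0.1)
input_ids = torch.randint(0, 10, (2, 8))
corrupted_ids, disc_labels = corrupt_with_unigram_generator(input_ids, unigram_probs)
```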

Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

An important task for designing QA systems is answer sentence selection (AS2): selecting the sentence containing (or constituting) the answer to a question from a set of retrieved relevant documents. In this paper, we propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2 and mitigate the requirement of large labeled datasets. Specifically, the model is tasked to predict whether: (i) two sentences are extracted from the same paragraph, (ii) a given sentence is extracted from a given paragraph, and (iii) two paragraphs are extracted from the same document. Our experiments on three public and one industrial AS2 dataset demonstrate the empirical superiority of our pre-trained transformers over baseline models such as RoBERTa and ELECTRA for AS2.
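
A rough sketch of how binary training examples for these three objectives could be assembled is shown below; it is an illustration under simplifying assumptions (each document has at least two paragraphs, each paragraph at least two sentences), not the paper's actual data pipeline.

```python
# Illustrative construction of binary examples for the three sentence-level
# objectives; a "document" is a list of paragraphs, a paragraph a list of sentences.
import random

def make_pretraining_examples(documents):
    doc_a, doc_b = random.sample(documents, 2)
    para_a, para_b = random.sample(doc_a, 2)
    para_c = random.choice(doc_b)
    examples = []

    # (i) Are two sentences extracted from the same paragraph?
    s1, s2 = random.sample(para_a, 2)
    examples.append((s1, s2, 1))
    examples.append((random.choice(para_a), random.choice(para_b), 0))

    # (ii) Is a given sentence extracted from a given paragraph?
    examples.append((random.choice(para_a), " ".join(para_a), 1))
    examples.append((random.choice(para_b), " ".join(para_a), 0))

    # (iii) Are two paragraphs extracted from the same document?
    examples.append((" ".join(para_a), " ".join(para_b), 1))
    examples.append((" ".join(para_a), " ".join(para_c), 0))
    return examples
```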

Context-Aware Transformer Pre-Training for Answer Sentence Selection

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Answer Sentence Selection (AS2) is a core component for building an accurate Question Answering pipeline. AS2 models rank a set of candidate sentences based on how likely they answer a given question. The state of the art in AS2 exploits pre-trained transformers by transferring them to large annotated datasets, while using local contextual information around the candidate sentence. In this paper, we propose three pre-training objectives designed to mimic the downstream fine-tuning task of contextual AS2. This allows for specializing LMs when fine-tuning for contextual AS2. Our experiments on three public and two large-scale industrial datasets show that our pre-training approaches (applied to RoBERTa and ELECTRA) can improve baseline contextual AS2 accuracy by up to 8% on some datasets.
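
As a concrete, hypothetical picture of the input format, the snippet below encodes a question together with a candidate and its local context for a RoBERTa-style cross-encoder; the exact context window and separator layout used in the paper may differ.

```python
# Sketch of encoding (question, candidate + local context) for contextual AS2
# with a standard Hugging Face tokenizer; the field layout is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def encode_contextual_as2(question, candidate, prev_sentence, next_sentence):
    # The candidate is surrounded by its local context from the source document.
    context = " ".join([prev_sentence, candidate, next_sentence])
    return tokenizer(question, context, truncation=True, max_length=256,
                     return_tensors="pt")

batch = encode_contextual_as2(
    "Who wrote the Divine Comedy?",
    "It was begun by Dante Alighieri around 1308 and completed around 1321.",
    "The Divine Comedy is an Italian narrative poem.",
    "It is widely considered a masterpiece of world literature.",
)
```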

TorchMetrics - Measuring Reproducibility in PyTorch

Journal of Open Source Software

A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate schedulers, or early stopping, which will influence the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation (Heusel et al., 2017) will differ based on the specific interpolation method used.
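
For reference, here is a minimal usage sketch of the FID metric mentioned above as exposed by TorchMetrics; the random uint8 images are stand-ins for real and generated data, and the torch-fidelity backend must be installed.

```python
# Usage sketch: computing Fréchet inception distance with TorchMetrics.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=64)  # smaller Inception feature layer for speed

# Stand-in image batches; in practice these are real and generated images.
real_images = torch.randint(0, 255, (100, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 255, (100, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
score = fid.compute()
```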

Effective Pre-Training Objectives for Transformer-based Autoencoders

arXiv (Cornell University), Oct 24, 2022

In this paper, we study trade-offs between efficiency, cost, and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus greatly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.

Language Transfer for Identifying Diagnostic Paragraphs in Clinical Notes

Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

English. This paper aims at uncovering the structure of clinical documents, in particular, identifying paragraphs describing "diagnosis" or "procedures". We present transformer-based architectures for approaching this task in a monolingual setting (English), exploring a weak supervision scheme. We further extend our contribution to a cross-lingual scenario, mitigating the need for expensive manual data annotation and taxonomy engineering for Italian. Italian. In this work we studied the structure of clinical documents in depth and, in particular, built automatic systems for extracting paragraphs containing diagnoses and procedures. Using models based on the transformer architecture, we extracted diagnoses and procedures in the monolingual (English) setting. We then extended our research to the multilingual scenario, reducing the need for large manually annotated Italian datasets through machine translation and transfer learning.

Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection

arXiv (Cornell University), May 20, 2022

An important task for designing QA systems is answer sentence selection (AS2): selecting the sentence containing (or constituting) the answer to a question from a set of retrieved relevant documents. In this paper, we propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2 and mitigate the requirement of large labeled datasets. Specifically, the model is tasked to predict whether: (i) two sentences are extracted from the same paragraph, (ii) a given sentence is extracted from a given paragraph, and (iii) two paragraphs are extracted from the same document. Our experiments on three public and one industrial AS2 dataset demonstrate the empirical superiority of our pre-trained transformers over baseline models such as RoBERTa and ELECTRA for AS2.

Paragraph-based Transformer Pre-training for Multi-Sentence Inference

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Inference tasks such as answer sentence selection (AS2) or fact verification are typically solved by fine-tuning transformer-based models as individual sentence-pair classifiers. Recent studies show that these tasks benefit from modeling dependencies across multiple candidate sentences jointly. In this paper, we first show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks. We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences. Our evaluation on three AS2 datasets and one fact verification dataset demonstrates the superiority of our pre-training technique over the traditional ones for transformers used as joint models for multi-candidate inference tasks, as well as when used as cross-encoders for sentence-pair formulations of these tasks.
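
The contrast between the usual sentence-pair cross-encoder and the joint multi-candidate formulation can be sketched at the input level, as below; the packing strategy and separators are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: pairwise vs. joint input packing for multi-candidate inference (AS2).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def encode_pairwise(question, candidates):
    # Baseline formulation: each (question, candidate) pair is scored independently.
    return tokenizer([question] * len(candidates), candidates,
                     padding=True, truncation=True, return_tensors="pt")

def encode_joint(question, candidates, max_length=512):
    # Joint formulation: the question and all candidates share one input, so the
    # encoder can model dependencies across candidate sentences.
    packed = tokenizer.sep_token.join(candidates)
    return tokenizer(question, packed, truncation=True,
                     max_length=max_length, return_tensors="pt")
```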

PyTorchLightning/metrics: New NLP metrics and improved API

We are excited to announce that TorchMetrics v0.7 is now publicly available. This release is pretty significant. It includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and some other great features. TorchMetrics now has over 60 metrics, and the package is more user-friendly than ever. NLP metrics - Text package: the text package has been part of TorchMetrics since v0.5. With the growing capability of language generation models, there is a real need for reliable evaluation metrics. With several added metrics and a unified API, TorchMetrics makes using various metrics even easier. TorchMetrics v0.7 newly includes several machine translation metrics, such as chrF, chrF++, Translation Edit Rate, and Extended Edit Distance. Furthermore, it also supports other metrics: Match Error Rate, Word Information Lost, Word Information Preserved, and the SQuAD evaluation metrics. Last but not least, we also made possible the evaluation…
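
A quick usage sketch of two of the new translation metrics; the import path follows recent TorchMetrics releases and may differ slightly in v0.7 itself.

```python
# Usage sketch of two machine translation metrics added around v0.7;
# preds are hypothesis strings, target holds one or more references per prediction.
from torchmetrics.text import CHRFScore, TranslationEditRate

preds = ["the cat sat on the mat"]
target = [["a cat sat on the mat", "the cat was sitting on the mat"]]

chrf = CHRFScore()
ter = TranslationEditRate()

print(chrf(preds, target))  # character n-gram F-score
print(ter(preds, target))   # proportion of edits needed to match a reference
```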

PyTorchLightning/metrics: Feature teaser

Machine learning metrics for distributed, scalable PyTorch applications.

Efficient Generation of Structured Objects with Constrained Adversarial Networks

arXiv, 2020

Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow the learned constraints to be turned on and off at inference time. CANs handle arbitrary logical constraints and leverage knowledge compilation techniques to efficiently evaluate the disagreement between the model and the constraints. Our setup is further extended to hybrid logical-neural constraints for capturing very complex constraints…
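
The penalty term can be made concrete on a toy constraint. The sketch below computes a semantic-loss-style quantity, -log P(constraint satisfied), for the constraint "exactly one output variable is on", assuming the generator emits independent Bernoulli probabilities; actual CANs evaluate this probability for arbitrary logical constraints via knowledge compilation rather than a hand-derived formula.

```python
# Toy semantic-loss-style penalty for a generator that outputs independent
# Bernoulli probabilities: -log P(exactly one variable is on).
import torch

def exactly_one_semantic_loss(probs, eps=1e-12):
    """probs: (batch, n) probabilities; returns one penalty per example."""
    # P(exactly one on) = sum_i p_i * prod_{j != i} (1 - p_j), computed in log space.
    log_off = torch.log1p(-probs.clamp(max=1 - 1e-6))   # log(1 - p_j)
    total_off = log_off.sum(dim=-1, keepdim=True)        # sum_j log(1 - p_j)
    log_terms = torch.log(probs.clamp(min=eps)) + total_off - log_off
    p_valid = log_terms.exp().sum(dim=-1)
    return -torch.log(p_valid.clamp(min=eps))

probs = torch.sigmoid(torch.randn(4, 6))            # stand-in generator outputs
penalty = exactly_one_semantic_loss(probs).mean()   # added to the generator loss
```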

Cross-Language Transformer Adaptation for Frequently Asked Questions

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020, 2020

Transfer learning has been proven to be effective, especially when data for the target domain/task is scarce. Sometimes data for a similar task is only available in another language because it may be very specific. In this paper, we explore the use of machine-translated data to transfer models to a related domain. Specifically, we transfer models from the question duplication task (QDT) to similar FAQ selection tasks. The source domain is the well-known English Quora dataset, while the target domain is a collection of small Italian datasets for real case scenarios, consisting of FAQ groups retrieved by pivoting on common answers. Our results show great improvements in the zero-shot learning setting and modest improvements using the standard transfer approach for direct in-domain adaptation.

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Efficient pre-training objectives for Transformers

arXiv (Cornell University), Apr 19, 2021

Transformer-based neural networks have heavily impacted the field of natural language processing, outperforming most previous state-of-the-art models. However, well-known models such as BERT, RoBERTa, and GPT-2 require a huge compute budget to create high-quality contextualised representations. In this paper, we study several efficient pre-training objectives for Transformer-based models. By testing these objectives on different tasks, we determine which of the ELECTRA model's new features is the most relevant: (i) Transformer pre-training can be improved when the input is not altered with artificial symbols, e.g., masked tokens; and (ii) loss functions computed using the whole output reduce training time. (iii) Additionally, we study efficient models composed of two blocks: a discriminator and a simple generator (inspired by the ELECTRA architecture). Our generator is based on a much simpler statistical approach, which minimally increases the computational cost. Our experiments show that it is possible to efficiently train BERT-like models using a discriminative approach as in ELECTRA, but without a complex generator. Finally, we show that ELECTRA largely benefits from a deep hyper-parameter search.
