Luca Di Liello - Academia.edu
Papers by Luca Di Liello
Chapter 22. Semantic Loss Functions for Neuro-Symbolic Structured Prediction
Frontiers in Artificial Intelligence and Applications, Jul 21, 2023
Findings of the Association for Computational Linguistics: EMNLP 2022
In this paper, we study trade-offs between efficiency, cost, and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new, effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus substantially reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.
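The core mechanism can be pictured with a short sketch, not taken from the paper's code: a unigram (statistical) token generator replaces a fraction of the input tokens, and a discriminator is then trained to spot the replacements, ELECTRA-style. Names such as `build_unigram_sampler` and `corrupt_batch` are illustrative assumptions.

```python
# Hypothetical sketch of an ELECTRA-style corruption step that swaps the
# learned generator for a simple unigram (statistical) token generator.
import torch

def build_unigram_sampler(token_ids, vocab_size):
    """Estimate a unigram distribution over the vocabulary from a corpus sample."""
    counts = torch.bincount(token_ids.flatten(), minlength=vocab_size).float()
    return torch.distributions.Categorical(probs=counts / counts.sum())

def corrupt_batch(input_ids, sampler, replace_prob=0.15):
    """Replace a random subset of tokens with unigram samples.

    Returns the corrupted inputs and the binary labels the discriminator
    is trained to predict (1 = token was replaced, 0 = original).
    """
    replace_mask = torch.rand(input_ids.shape) < replace_prob
    random_tokens = sampler.sample(input_ids.shape)
    corrupted = torch.where(replace_mask, random_tokens, input_ids)
    # A sampled token equal to the original is not a detectable replacement.
    labels = (corrupted != input_ids).long()
    return corrupted, labels
```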
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
An important task for designing QA systems is answer sentence selection (AS2): selecting the sentence containing (or constituting) the answer to a question from a set of retrieved relevant documents. In this paper, we propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2 and mitigate the requirement of large labeled datasets. Specifically, the model is tasked to predict whether: (i) two sentences are extracted from the same paragraph, (ii) a given sentence is extracted from a given paragraph, and (iii) two paragraphs are extracted from the same document. Our experiments on three public and one industrial AS2 dataset demonstrate the empirical superiority of our pre-trained transformers over baseline models such as RoBERTa and ELECTRA for AS2.
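A minimal sketch of how positive training pairs for the three objectives could be assembled from a corpus organised as documents → paragraphs → sentences; this is an illustration under assumed data structures, not the authors' implementation (negatives, omitted here, would pair spans drawn from different paragraphs or documents).

```python
# Assumed input: documents is a list of documents, each document a list of
# paragraphs, each paragraph a non-empty list of sentence strings.
import random

def make_pretraining_pairs(documents):
    examples = []
    for doc in documents:
        for p_idx, paragraph in enumerate(doc):
            if len(paragraph) >= 2:
                # (i) two sentences from the same paragraph -> positive
                s1, s2 = random.sample(paragraph, 2)
                examples.append(("same_paragraph", s1, s2, 1))
            # (ii) a sentence paired with its own paragraph -> positive
            sent = random.choice(paragraph)
            examples.append(("sentence_in_paragraph", sent, " ".join(paragraph), 1))
            if len(doc) >= 2:
                # (iii) two paragraphs from the same document -> positive
                other = doc[(p_idx + 1) % len(doc)]
                examples.append(("same_document", " ".join(paragraph), " ".join(other), 1))
    return examples
```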
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Answer Sentence Selection (AS2) is a core component for building an accurate Question Answering pipeline. AS2 models rank a set of candidate sentences based on how likely they are to answer a given question. The state of the art in AS2 exploits pre-trained transformers by transferring them to large annotated datasets, while using local contextual information around the candidate sentence. In this paper, we propose three pre-training objectives designed to mimic the downstream fine-tuning task of contextual AS2. This allows for specializing LMs when fine-tuning for contextual AS2. Our experiments on three public and two large-scale industrial datasets show that our pre-training approaches (applied to RoBERTa and ELECTRA) can improve baseline contextual AS2 accuracy by up to 8% on some datasets.
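For illustration, the downstream contextual AS2 setup that these objectives mimic can be sketched as a cross-encoder scoring a question against a candidate sentence plus its local context; the checkpoint name and the separator layout below are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of contextual AS2 scoring with a generic cross-encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

def score_candidate(question, prev_sent, candidate, next_sent):
    """Return P(candidate answers question) given the candidate's local context."""
    context = " ".join(filter(None, [prev_sent, candidate, next_sent]))
    inputs = tokenizer(question, context, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()
```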
Journal of Open Source Software
A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning-rate scheduling, or early stopping, which influence the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation (Heusel et al., 2017) will differ based on the specific interpolation method used.
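Using a single shared implementation removes this source of variance. A minimal usage sketch with the TorchMetrics FID metric (requires the torch-fidelity backend; shapes and the uint8 convention follow the library's documentation):

```python
# Computing FID with TorchMetrics so that backbone and preprocessing choices
# are fixed by the library rather than varying per paper.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Placeholder tensors standing in for real and generated image batches.
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())
```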
arXiv (Cornell University), Oct 24, 2022
In this paper, we study trade-offs between efficiency, cost, and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new, effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus substantially reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.
Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021
English. This paper aims at uncovering the structure of clinical documents, in particular, identifying paragraphs describing "diagnosis" or "procedures". We present transformer-based architectures for approaching this task in a monolingual setting (English), exploring a weak supervision scheme. We further extend our contribution to a cross-lingual scenario, mitigating the need for expensive manual data annotation and taxonomy engineering for Italian. Italian (translated). In this work we studied the structure of clinical documents in depth and, in particular, built automatic systems for extracting paragraphs containing diagnoses and procedures. Using models based on the transformer architecture, we extracted diagnoses and procedures in the monolingual setting (English). We then extended our research to the multilingual scenario, reducing the need for large manually annotated Italian datasets through machine translation and transfer learning.
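As a purely illustrative stand-in for the weakly supervised classifier described in the paper, paragraph tagging with a multilingual encoder could look like the following; the zero-shot formulation and the checkpoint are assumptions for the sketch, not the authors' pipeline.

```python
# Hypothetical sketch: tag clinical paragraphs as diagnosis / procedure / other
# with a multilingual model, which is what makes cross-lingual transfer possible.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")
labels = ["diagnosis", "procedure", "other"]

paragraph = "The patient underwent laparoscopic cholecystectomy without complications."
print(classifier(paragraph, candidate_labels=labels))
```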
arXiv (Cornell University), May 20, 2022
An important task for designing QA systems is answer sentence selection (AS2): selecting the sentence containing (or constituting) the answer to a question from a set of retrieved relevant documents. In this paper, we propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2 and mitigate the requirement of large labeled datasets. Specifically, the model is tasked to predict whether: (i) two sentences are extracted from the same paragraph, (ii) a given sentence is extracted from a given paragraph, and (iii) two paragraphs are extracted from the same document. Our experiments on three public and one industrial AS2 dataset demonstrate the empirical superiority of our pre-trained transformers over baseline models such as RoBERTa and ELECTRA for AS2.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Inference tasks such as answer sentence selection (AS2) or fact verification are typically solved by fine-tuning transformer-based models as individual sentence-pair classifiers. Recent studies show that these tasks benefit from modeling dependencies across multiple candidate sentences jointly. In this paper, we first show that popular pre-trained transformers perform poorly when fine-tuned on multi-candidate inference tasks. We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences. Our evaluation on three AS2 datasets and one fact verification dataset demonstrates the superiority of our pre-training technique over traditional ones for transformers used as joint models for multi-candidate inference tasks, as well as when used as cross-encoders for sentence-pair formulations of these tasks.
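A hedged sketch of the joint (multi-candidate) formulation: the question and several candidates are packed into one sequence and each candidate is scored in a single forward pass. The separator layout and the per-candidate scoring head are assumptions for illustration, not the released architecture.

```python
# Illustrative joint scoring of multiple answer candidates with one encoder pass.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")
scorer = torch.nn.Linear(encoder.config.hidden_size, 1)  # assumed per-candidate head

def joint_scores(question, candidates):
    # Pack question and candidates into a single separator-delimited sequence.
    text = f" {tokenizer.sep_token} ".join([question] + candidates)
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state            # (1, seq_len, hidden)
    # Score each candidate from the separator token that precedes it.
    sep_positions = (inputs["input_ids"][0] == tokenizer.sep_token_id).nonzero().squeeze(-1)
    cand_states = hidden[0, sep_positions[: len(candidates)]]
    return scorer(cand_states).squeeze(-1)                  # one score per candidate
```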
PyTorchLightning/metrics: New NLP metrics and improved API
We are excited to announce that TorchMetrics v0.7 is now publicly available. This is a significant release: it includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and some other great features. TorchMetrics now has over 60 metrics, and the package is more user-friendly than ever. NLP metrics - Text package. The text package has been part of TorchMetrics since v0.5. With the growing capability of language generation models, there is a real need for reliable evaluation metrics. With several added metrics and a unified API, TorchMetrics makes the usage of various metrics even easier! TorchMetrics v0.7 adds several machine translation metrics, such as chrF, chrF++, Translation Edit Rate, and Extended Edit Distance. Furthermore, it also supports other metrics - Match Error Rate, Word Information Lost, Word Information Preserved, and the SQuAD evaluation metrics. Last but not least, we also made possible the evaluatio...
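A quick usage sketch for a few of the new text metrics; the class names and the (preds, target) argument order follow the current TorchMetrics documentation and may differ slightly in older releases.

```python
# Illustrative calls to some of the text metrics mentioned above.
from torchmetrics.text import CHRFScore, TranslationEditRate, WordInfoLost

preds = ["the cat sat on the mat"]
refs = [["a cat sat on the mat"]]   # one list of references per prediction

print("chrF:", CHRFScore()(preds, refs))
print("TER: ", TranslationEditRate()(preds, refs))
print("WIL: ", WordInfoLost()(preds, ["a cat sat on the mat"]))
```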
PyTorchLightning/metrics: Feature teaser
Machine learning metrics for distributed, scalable PyTorch applications.
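A minimal sketch of the core API this tagline refers to: a metric object accumulates state over batches (and across processes under DDP), and compute() returns the aggregate. The task argument reflects the current Accuracy interface and may be absent in older versions.

```python
# Accumulating a metric across mini-batches with TorchMetrics.
import torch
from torchmetrics import Accuracy

metric = Accuracy(task="multiclass", num_classes=3)
for _ in range(4):                                  # pretend mini-batches
    preds = torch.randn(8, 3).softmax(dim=-1)
    labels = torch.randint(0, 3, (8,))
    metric.update(preds, labels)
print("accuracy over all batches:", metric.compute())
```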
ArXiv, 2020
Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow the learned constraints to be turned on and off at inference time. CANs handle arbitrary logical constraints and leverage knowledge compilation techniques to efficiently evaluate the disagreement between the model and the constraints. Our setup is further extended to hybrid logical-neural constraints for capturing very complex constraint...
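The penalty term can be sketched for a toy discrete case: the generator is penalised through -log P(valid), which grows as more probability mass falls on invalid structures. This is a simplified semantic-loss-style illustration, not the paper's knowledge-compilation implementation.

```python
# Toy constraint penalty over an explicit distribution of discrete structures.
import torch

def constraint_penalty(probs, is_valid):
    """probs: (batch, n_structures) generator distribution over structures.
    is_valid: (n_structures,) boolean mask of structures satisfying the constraints.
    Returns -log P(valid), which increases with mass placed on invalid structures."""
    p_valid = (probs * is_valid.float()).sum(dim=-1).clamp_min(1e-12)
    return -torch.log(p_valid).mean()

# Example: 4 possible structures, only the first two are valid.
probs = torch.softmax(torch.randn(3, 4), dim=-1)
valid = torch.tensor([True, True, False, False])
loss = constraint_penalty(probs, valid)   # added to the usual GAN generator loss
```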
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020, 2020
Transfer learning has been proven to be effective, especially when data for the target domain/task is scarce. Sometimes data for a similar task is only available in another language because it may be very specific. In this paper, we explore the use of machine-translated data to transfer models to a related domain. Specifically, we transfer models from the question duplication task (QDT) to similar FAQ selection tasks. The source domain is the well-known English Quora dataset, while the target domain is a collection of small Italian datasets for real-case scenarios, consisting of FAQ groups retrieved by pivoting on common answers. Our results show great improvements in the zero-shot learning setting and modest improvements using the standard transfer approach for direct in-domain adaptation.
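A hedged sketch of the transfer recipe: fine-tune a sentence-pair (duplicate-question) classifier on machine-translated Quora data, then apply it zero-shot to Italian FAQ selection by scoring (user question, FAQ question) pairs. The checkpoint name and the ranking helper are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative zero-shot FAQ selection with a sentence-pair classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-italian-uncased", num_labels=2
)
# ... fine-tune `model` on Italian machine-translated Quora duplicate pairs ...

def rank_faqs(user_question, faq_questions):
    """Rank FAQ entries by predicted duplicate probability."""
    inputs = tokenizer([user_question] * len(faq_questions), faq_questions,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[:, 1]
    return sorted(zip(faq_questions, probs.tolist()), key=lambda x: -x[1])
```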
arXiv (Cornell University), Apr 19, 2021
Transformer-based neural networks have heavily impacted the field of natural language processing, outperforming most previous state-of-the-art models. However, well-known models such as BERT, RoBERTa, and GPT-2 require a huge compute budget to create high-quality contextualised representations. In this paper, we study several efficient pre-training objectives for Transformer-based models. By testing these objectives on different tasks, we determine which of the ELECTRA model's new features is the most relevant: (i) Transformer pre-training can be improved when the input is not altered with artificial symbols, e.g., masked tokens; and (ii) loss functions computed over the whole output reduce training time. Additionally, we study efficient models composed of two blocks: a discriminator and a simple generator (inspired by the ELECTRA architecture). Our generator is based on a much simpler statistical approach, which minimally increases the computational cost. Our experiments show that it is possible to efficiently train BERT-like models using a discriminative approach as in ELECTRA but without a complex generator. Finally, we show that ELECTRA largely benefits from a deep hyper-parameter search.
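Point (ii) can be made concrete with a small sketch: MLM computes its loss only on the roughly 15% of masked positions, whereas an ELECTRA-style discriminative loss is computed on every output position. Shapes and label layouts below are illustrative, not the paper's code.

```python
# Comparing loss coverage: MLM ignores unmasked positions, the discriminative
# replaced-token objective uses every position.
import torch
import torch.nn.functional as F

logits_mlm = torch.randn(8, 128, 30522)       # (batch, seq, vocab)
mlm_labels = torch.full((8, 128), -100)       # -100 = unmasked position, ignored
mlm_labels[:, ::7] = 42                       # pretend ~15% of positions are masked
mlm_loss = F.cross_entropy(logits_mlm.view(-1, 30522), mlm_labels.view(-1),
                           ignore_index=-100)

logits_disc = torch.randn(8, 128)             # one replaced/original logit per token
disc_labels = torch.randint(0, 2, (8, 128)).float()
disc_loss = F.binary_cross_entropy_with_logits(logits_disc, disc_labels)  # all positions
```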