Computing Semantic Similarity between Skill Statements for Approximate Matching
International Journal of Artificial Intelligence in Education, 2019
Describing the competencies required by a profession is essential for aligning the online profiles of job seekers with job advertisements. The competencies described in each context have typically not been compared, which has created a disconnect in the language used by the two sides. This work presents an approach for aligning online profiles and job advertisements, according to knowledge and skills, using measures of lexical, syntactic and taxonomic similarity. In addition, we use a ranking that aligns the profiles with the topics of a thesaurus that defines competencies. The results are promising: combining the similarity measures with alignment to competence thesauri makes the generation of professional competence descriptions more robust. This combination handles the common problems of synonymy, homonymy, hypernymy/hyponymy and meronymy of terms in Spanish. The research uses natural language processing to offer a novel approach for assessing how well the competencies described by applicants match those described by employers, even when they use different terminology. The resulting approach, while developed in Spanish for computer science jobs, can be extended to other languages and domains, such as recruitment, where it can contribute to better tools that give job seekers feedback on how to align their competencies with job opportunities.
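The abstract does not specify the individual measures or how they are weighted, so the following is only a minimal sketch of combining lexical, syntactic and taxonomic similarity to score a job ad against a profile. The token-Jaccard, character n-gram, thesaurus dictionary and weights are all illustrative assumptions, not the paper's method.

```python
# Minimal sketch: combine lexical, syntactic and taxonomic similarity scores.
# Measures, thesaurus and weights are assumptions for illustration only.

def lexical_sim(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def syntactic_sim(a: str, b: str, n: int = 3) -> float:
    """Character n-gram overlap, a rough proxy for surface-form similarity."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

# Hypothetical thesaurus mapping surface terms to competence topics.
TOPICS = {"java": "programming", "python": "programming", "sql": "databases"}

def taxonomic_sim(a: str, b: str) -> float:
    """Overlap of competence topics reached through the thesaurus."""
    topics = lambda s: {TOPICS[t] for t in s.lower().split() if t in TOPICS}
    ta, tb = topics(a), topics(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def combined_sim(profile: str, ad: str, w=(0.4, 0.2, 0.4)) -> float:
    return (w[0] * lexical_sim(profile, ad)
            + w[1] * syntactic_sim(profile, ad)
            + w[2] * taxonomic_sim(profile, ad))

print(combined_sim("experienced python and sql developer",
                   "looking for a java and sql engineer"))
```

In this toy run the taxonomic component bridges the vocabulary gap between "python" and "java" (both map to the same topic), which is the kind of synonymy/hypernymy problem the paper targets.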
On Constructing, Grouping and Using Topical Ontology for Semantic Matching
2009
An ontology topic is used to group concepts from different contexts (or even from different domain ontologies). This paper presents a pattern-driven modeling methodology for constructing and grouping topics in an ontology (the PAD-ON methodology), which is used for matching similarities between competences in the human resource management (HRM) domain. The methodology is supported by a tool, also called PAD-ON. The paper reports recent results from the EC Prolix project, and the approach is applied to training processes at British Telecom as a test bed.
Identifying Competences in IT Professionals through Semantics
Analyzing the Future
In current organizations, the importance of knowledge and competence is unquestionable. In Information Technology (IT) companies, which are by definition knowledge intensive, this importance is critical. In such organizations, the models of knowledge exploitation include specific processes and elements that drive the production of knowledge aimed at satisfying organizational objectives. However, collecting competence evidence is a labor-intensive and time-consuming task, and it is the key point for this system. SeCEC-IT is a tool based on software artifacts that extracts relevant information using natural language processing techniques. It enables competence evidence detection by deducing competence facts from documents in an automated way. Its technological components include semantic technologies, natural language processing, and human resource communication standards (HR-XML).
A syntactic approach for searching similarities within sentences
Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02, 2002
Textual data is the main electronic form of knowledge representation. Sentences, understood as logical units of meaningful word sequences, can be considered its backbone. In this paper, we propose a purely syntactic solution for searching similarities within sentences, named approximate subsequence matching. Because this process is very time consuming, efficiency in retrieving the most similar parts available in large repositories of textual data is ensured by new filtering techniques. As far as the design of the system is concerned, we chose a solution that allows us to deploy approximate subsequence matching without changing the underlying database.
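As a rough illustration of the idea (not the paper's algorithm, and without its filtering techniques), approximate subsequence matching can be sketched as sliding a query word sequence over a sentence and accepting windows whose word-level edit distance stays within an error budget.

```python
# Minimal sketch of approximate word-level subsequence matching.
# Not the paper's algorithm; its database filtering step is omitted.

def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (wa != wb))  # substitution
    return dp[-1]

def approx_subsequence_match(query: str, sentence: str, max_errors: int = 1):
    """Return windows of the sentence within max_errors edits of the query."""
    q, s = query.lower().split(), sentence.lower().split()
    hits = []
    for start in range(len(s) - len(q) + 1):
        window = s[start:start + len(q)]
        if edit_distance(q, window) <= max_errors:
            hits.append(" ".join(window))
    return hits

print(approx_subsequence_match("relational database design",
                               "strong skills in relational databases design and sql"))
```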
Leveraging Grammatical Roles for Measuring Semantic Similarity Between Texts
IEEE Access, 2021
Semantic similarity between texts can be defined based on their meaning. Assessing textual similarity is a prerequisite in almost all applications in the fields of language processing and information retrieval. However, the diversity of sentence structure makes it difficult to estimate similarity. Some sentence pairs are lexicographically similar but semantically dissimilar, which is why trivial lexical overlap is not enough for measuring similarity. To capture the semantics of sentences, the context of the words and the structure of the sentence should be considered. In this paper, we propose a new method for capturing the semantic similarity between sentences based on their grammatical roles through word semantics. First, the sentences are divided grammatically into different parts, where each part is considered a grammatical role. Then multiple new measures are introduced to estimate the role-based similarity, exploiting word semantics and considering the se...
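To make the role-based idea concrete, the following is a minimal sketch, not the paper's measures: tokens are bucketed into coarse grammatical roles via a spaCy dependency parse and the roles are compared with simple lemma overlap. The model name "en_core_web_sm", the role mapping and the overlap measure are assumptions.

```python
# Minimal sketch of role-based sentence similarity (not the paper's measures):
# bucket tokens into coarse roles by dependency label, then compare role-wise.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model

ROLE_OF_DEP = {"nsubj": "subject", "nsubjpass": "subject",
               "dobj": "object", "pobj": "object", "attr": "object",
               "ROOT": "predicate"}

def roles(sentence: str):
    """Group lower-cased lemmas by the coarse grammatical role of each token."""
    buckets = {}
    for tok in nlp(sentence):
        role = ROLE_OF_DEP.get(tok.dep_)
        if role:
            buckets.setdefault(role, set()).add(tok.lemma_.lower())
    return buckets

def role_similarity(s1: str, s2: str) -> float:
    """Average Jaccard overlap over the roles the two sentences share."""
    r1, r2 = roles(s1), roles(s2)
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    return sum(len(r1[r] & r2[r]) / len(r1[r] | r2[r]) for r in shared) / len(shared)

print(role_similarity("The company hires skilled engineers.",
                      "The company recruits engineers."))
```

The paper's word-semantics component would replace the crude lemma overlap inside each role with a proper word-similarity measure.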
JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching
arXiv (Cornell University), 2024
Recent approaches in skill matching, employing synthetic training data for classification or similarity model training, have shown promising results, reducing the need for time-consuming and expensive annotations. However, previous synthetic datasets have limitations, such as featuring only one skill per sentence and generally comprising short sentences. In this paper, we introduce JOBSKAPE, a framework for generating synthetic data that tackles these limitations, specifically designed to enhance skill-to-taxonomy matching. Within this framework, we create SKILLSKAPE, a comprehensive open-source synthetic dataset of job postings tailored for skill-matching tasks. We introduce several offline metrics showing that our dataset resembles real-world data. Additionally, we present a multi-step pipeline for skill extraction and matching using large language models (LLMs), benchmarking against known supervised methodologies. We show that the downstream evaluation results on real-world data beat the baselines, underscoring the dataset's efficacy and adaptability.
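The skill-to-taxonomy matching step can be illustrated with a small embedding-based sketch; this is not the JOBSKAPE pipeline, and the model name "all-MiniLM-L6-v2" and the toy taxonomy entries are assumptions.

```python
# Minimal sketch of skill-to-taxonomy matching with sentence embeddings.
# Not the JOBSKAPE pipeline; model name and taxonomy are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

taxonomy = ["software development", "database administration",
            "project management", "data analysis"]
taxonomy_emb = model.encode(taxonomy, convert_to_tensor=True)

def match_skill(skill_span: str, top_k: int = 2):
    """Return the top_k taxonomy entries closest to an extracted skill span."""
    span_emb = model.encode(skill_span, convert_to_tensor=True)
    scores = util.cos_sim(span_emb, taxonomy_emb)[0]
    ranked = sorted(zip(taxonomy, scores.tolist()), key=lambda p: -p[1])
    return ranked[:top_k]

print(match_skill("writing complex SQL queries and tuning indexes"))
```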
Semantic text similarity using corpus-based word similarity and string similarity
ACM Transactions on Knowledge Discovery From Data, 2008
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
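The string-similarity half of the method can be sketched with a normalized word-level LCS; the corpus-based word-similarity component described in the abstract is omitted here, and the normalization used below (squared LCS length over the product of sentence lengths) is one common choice rather than necessarily the paper's.

```python
# Minimal sketch of a normalized Longest Common Subsequence (LCS) similarity
# over words; the paper's corpus-based word-similarity component is omitted.

def lcs_length(a, b):
    """Word-level LCS length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a, 1):
        for j, wb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if wa == wb else max(dp[i - 1][j],
                                                                 dp[i][j - 1])
    return dp[-1][-1]

def normalized_lcs(s1: str, s2: str) -> float:
    a, b = s1.lower().split(), s2.lower().split()
    if not a or not b:
        return 0.0
    l = lcs_length(a, b)
    # One common normalization: squared LCS length over the length product.
    return (l * l) / (len(a) * len(b))

print(normalized_lcs("measuring semantic similarity of short texts",
                     "a method for measuring the similarity of short sentences"))
```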
Skill2vec: Machine Learning Approaches for Determining the Relevant Skill from Job Description
2017
Unsupervised learned word embeddings have seen tremendous success in numerous Natural Language Processing (NLP) tasks in recent years. The main contribution of this paper is a technique called Skill2vec, which applies machine learning to recruitment in order to improve the search for candidates who possess the right skills. Skill2vec is a neural network architecture inspired by Word2vec, developed by Mikolov et al. in 2013, that maps a skill into a new vector space. In this vector space, skills can be compared and their relationships represented. We conducted an experiment using A/B testing at a recruitment company to demonstrate the effectiveness of our approach.
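The core idea can be sketched with gensim's Word2Vec trained on skill lists, where each "sentence" is the set of skills from one job ad; this is an assumed stand-in, not the actual Skill2vec architecture or training data.

```python
# Minimal sketch of the Skill2vec idea using gensim's Word2Vec as a stand-in.
# The real architecture, corpus and hyperparameters are not reproduced here.
from gensim.models import Word2Vec

# Each inner list stands for the skills mentioned in one job description.
job_skill_lists = [
    ["python", "sql", "machine_learning"],
    ["java", "spring", "sql"],
    ["python", "pandas", "data_analysis"],
    ["machine_learning", "deep_learning", "python"],
]

# Skills that co-occur in job ads end up close in the learned vector space.
model = Word2Vec(sentences=job_skill_lists, vector_size=50, window=5,
                 min_count=1, sg=1, epochs=100, seed=1)

print(model.wv.most_similar("python", topn=3))
```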
Verb similarity on the Taxonomy of WordNet
2006
In this paper, we introduce two word similarity algorithms to investigate the capability of WordNet for measuring verb similarity. Both are tested on two noun and two verb data sets. The noun sets are standard; in the absence of a standard verb set, we propose a comparable verb set and report both human judgments and computed results on it.
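For reference, taxonomy-based verb similarity over WordNet can be computed with NLTK's standard path-based and Wu-Palmer measures; this is a generic sketch comparing first senses, not the algorithms proposed in the paper.

```python
# Minimal sketch of verb similarity over the WordNet taxonomy using NLTK.
# Requires the WordNet corpus: nltk.download("wordnet"). Not the paper's
# algorithms; only standard path and Wu-Palmer measures on first senses.
from nltk.corpus import wordnet as wn

def verb_similarity(v1: str, v2: str):
    s1 = wn.synsets(v1, pos=wn.VERB)
    s2 = wn.synsets(v2, pos=wn.VERB)
    if not s1 or not s2:
        return None
    # Compare the first (most frequent) sense of each verb.
    a, b = s1[0], s2[0]
    return {"path": a.path_similarity(b), "wup": a.wup_similarity(b)}

print(verb_similarity("buy", "purchase"))
print(verb_similarity("run", "write"))
```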