Uncovering Trends in Human Trafficking and Migrant Smuggling Activities: A Natural Language Processing Approach to UNODC SHERLOC Database

Global human trafficking seen through the lens of semantics and text analytics

Proceedings of the Association for Information Science and Technology, 2017

Human trafficking is understood as a modern-day form of slavery. It involves the recruitment, transportation, transfer, harboring, or receipt of persons by improper means (such as force, abduction, fraud, or coercion) for an improper purpose, including forced labor or sexual exploitation. Human trafficking is global: it is found in every country and affects people of all genders and ages. While pervasive, it is also invisible, and quantitative and qualitative research into human trafficking faces significant challenges. This program presents collaborative research by the U.S. Department of State and Georgetown University into the use of text analytics and semantic analysis methods to map trafficking, to identify trafficking hubs around the world, and to expose human trafficking.

Enhancing the Detection of Criminal Organizations in Mexico using ML and NLP

2020 International Joint Conference on Neural Networks (IJCNN)

This paper relies on Machine Learning (ML) and supervised Natural Language Processing (NLP) to generate a geo-referenced database on the violent presence of Mexican Criminal Organizations (MCOs) between 2000-2018. This application responds to the need for high-quality data on criminal groups to inform academic and policy analysis in a context of intense violence such as Mexico. Powered by ML and NLP tools, this computational social science application processes a vast collection of news stories written in Spanish to track MCOs' violent presence. The unprecedented granularity of the data allows disaggregating daily-municipal information for 10 main MCOs comprising more than 200 specific criminal cells.
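The tagging step behind such a database can be approximated at toy scale with a dictionary-based extractor. The alias and municipality lists below are illustrative placeholders, not the paper's lexicons, and a production pipeline would use the trained, supervised NLP model the authors describe rather than substring matching:

```python
import re
from collections import namedtuple

# Hypothetical alias lexicons; a real system would use a supervised NER model.
ORG_ALIASES = {
    "cartel del golfo": "Gulf Cartel",
    "los zetas": "Zetas",
}
MUNICIPALITIES = {"reynosa", "matamoros", "monterrey"}

Record = namedtuple("Record", "date municipality organization")

def extract_records(date, text):
    """Return (date, municipality, organization) records found in one news story."""
    lowered = text.lower()
    orgs = [name for alias, name in ORG_ALIASES.items() if alias in lowered]
    places = [m for m in MUNICIPALITIES if m in lowered]
    return [Record(date, p.title(), o) for p in places for o in orgs]

story = "Enfrentamiento en Reynosa atribuido al Cartel del Golfo."
records = extract_records("2011-03-05", story)
```

Aggregating such records by day and municipality yields exactly the kind of daily-municipal panel the abstract describes.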

Semi-automated knowledge discovery: identifying and profiling human trafficking

International Journal of General Systems, 2012

We propose an iterative and human-centred knowledge discovery methodology based on Formal Concept Analysis (FCA). The proposed approach recognizes the important role of the domain expert in mining real-world enterprise applications and makes use of specific domain knowledge, including human intelligence and domain-specific constraints. Our approach was empirically validated at the Amsterdam-Amstelland police to identify suspects and victims of human trafficking in 266,157 suspicious activity reports. Based on guidelines of the Attorneys General of the Netherlands, we first defined multiple early warning indicators.
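The FCA machinery underlying this approach can be sketched at toy scale: given a formal context of reports (objects) and early-warning indicators (attributes), every formal concept is a maximal (extent, intent) pair, and all of them can be enumerated by closing attribute subsets. The report names and indicator labels below are illustrative, not the actual Dutch guidelines:

```python
from itertools import chain, combinations

# Toy formal context: police reports x early-warning indicators.
context = {
    "report1": {"no_id_papers", "controlled_by_other"},
    "report2": {"no_id_papers", "works_in_prostitution"},
    "report3": {"no_id_papers", "controlled_by_other", "works_in_prostitution"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes shared by every object in objs."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def concepts():
    """Enumerate all formal concepts (extent, intent) by closing attribute subsets."""
    found = set()
    subsets = chain.from_iterable(
        combinations(attributes, r) for r in range(len(attributes) + 1))
    for subset in subsets:
        e = extent(set(subset))
        found.add((frozenset(e), frozenset(intent(e))))
    return found

lattice = concepts()
```

The resulting concept lattice is what the analyst browses: each concept groups the reports that share a combination of indicators, e.g. all reports matching both "no_id_papers" and "controlled_by_other".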

Effective Analysis of Relevant Legal Material for War Crime Prosecutions using Natural Language Processing Technologies

Proceedings for war crimes and crimes against humanity in the International Criminal Court (ICC) take place over a long span of time and often involve the scrupulous examination of legal material and court records such as testimonies, prosecution submissions, and news records. The manual analysis of the legal material relevant to a particular case, in preparation for the same or forthcoming cases, can be a cumbersome task; this process can, however, be streamlined with the help of Natural Language Processing (NLP) technologies. This paper presents a methodical study on the analysis of legal material relevant to the proceedings of war crimes in the ICC, with the aim of speeding up and improving the process of gaining insights from those materials. Using Latent Dirichlet Allocation (LDA), a topic-discovery method in NLP (Blei, Ng and Jordan 2003), a toolkit of keywords and crucial phrases, as well as visual and quantitative indices such as dendrograms, is generated from legal documents pertinent to the prosecutor's office from three landmark ICC cases, each of a unique category. The toolkit and indices aid case briefings and discernment of the judgement of the cases they relate to, and also provide objective insights into the abstract category of cases they belong to. The analysis indicates that the type of topics varies greatly depending on the category and duration of the cases, which is consistent with empirical evidence (Bielen et al. 2016; Spurr 1997). This system has a plethora of other potential applications in the field of law, such as the analysis of evidence and testimonies, to make court proceedings more efficient.
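At toy scale, the LDA step can be sketched with a collapsed Gibbs sampler over a handful of illustrative documents; the corpus, vocabulary, and hyperparameters below are placeholders, not the paper's setup:

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA (Blei, Ng and Jordan 2003), toy scale."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]   # token-topic labels
    ndk = [[0] * n_topics for _ in docs]                       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]                   # topic-word counts
    nk = [0] * n_topics                                        # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], widx[w]
                # remove this token, resample its topic, then add it back
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    top_words = [[vocab[i] for i in sorted(range(V), key=lambda i: -nkw[t][i])[:3]]
                 for t in range(n_topics)]
    return top_words, ndk

docs = [
    "witness testimony village attack".split(),
    "testimony witness attack militia".split(),
    "appeal chamber judgement sentence".split(),
    "sentence appeal judgement chamber".split(),
]
top_words, doc_topics = lda_gibbs(docs, n_topics=2)
```

The top words per topic play the role of the "toolkit of keywords and crucial phrases" described above; a practical system would use an optimized library implementation rather than this sampler.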

Performance Evaluation of a Natural Language Processing Approach Applied in White Collar Crime Investigation

Lecture Notes in Computer Science, 2014

In today's world we are confronted with increasing amounts of information every day, coming from a large variety of sources. People and corporations produce data on a large scale, and since the rise of the internet, e-mail, and social media, the amount of produced data has grown exponentially. From a law enforcement perspective, these huge amounts of data must be dealt with when a criminal investigation is launched against an individual or company. Relevant questions need to be answered: who committed the crime, who was involved, what happened and when, who was communicating, and about what? Not only has the amount of data available for investigation increased enormously, but so has its complexity. When these communication patterns need to be combined with, for instance, a seized financial administration or corporate document shares, a complex investigation problem arises. Criminal investigators thus face a huge challenge when evidence of a crime needs to be found in a Big Data environment of large and complex datasets, especially in financial and fraud investigations. To tackle this problem, a financial and fraud investigation unit of a European country has developed a new tool named LES that uses Natural Language Processing (NLP) techniques to help criminal investigators handle large amounts of textual information more efficiently and quickly. In this paper, we briefly present this tool and focus on evaluating its performance against the requirements of forensic investigation: being faster, smarter, and easier for investigators. To evaluate the LES tool, we use different performance metrics, and we show experimental results of our evaluation on large and complex datasets from a real-world application.
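The retrieval-quality side of such an evaluation typically reduces to standard information-retrieval metrics over a labelled ground truth; a minimal sketch with hypothetical document IDs (the metrics are standard, not specific to LES):

```python
def precision_recall_f1(retrieved, relevant):
    """Standard IR metrics for evaluating an investigative search tool."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)                 # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical run: documents the tool flagged vs. ground-truth relevant documents.
p, r, f = precision_recall_f1({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d5"})
```

Speed, the other requirement named above, would be measured separately, e.g. as documents processed per second on the same corpus.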

Supporting Law Enforcement in Digital Communities through Natural Language Analysis

Lecture Notes in Computer Science, 2008

Recent years have seen an explosion in the number and scale of digital communities (e.g. peer-to-peer file sharing systems, chat applications and social networking sites). Unfortunately, digital communities are host to significant criminal activity including copyright infringement, identity theft and child sexual abuse. Combating this growing level of crime is problematic due to the ever increasing scale of today's digital communities. This paper presents an approach to provide automated support for the detection of child sexual abuse related activities in digital communities. Specifically, we analyze the characteristics of child sexual abuse media distribution in P2P file sharing networks and carry out an exploratory study to show that corpus-based natural language analysis may be used to automate the detection of this activity. We then give an overview of how this approach can be extended to police chat and social networking communities.
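One simple form of corpus-based natural language analysis is a log-likelihood-ratio score measuring how much more common each term is in a labelled corpus of flagged search strings than in background traffic; new queries are then scored by summing their terms. The tokens below are deliberately innocuous placeholders for the real indicator terms such a system would learn:

```python
from collections import Counter
from math import log

def build_scores(flagged, background, smoothing=1.0):
    """Per-term log-likelihood ratio: flagged-corpus rate vs. background rate."""
    f = Counter(w for q in flagged for w in q.split())
    b = Counter(w for q in background for w in q.split())
    vocab = set(f) | set(b)
    nf, nb = sum(f.values()), sum(b.values())
    return {w: log((f[w] + smoothing) / (nf + smoothing * len(vocab)))
             - log((b[w] + smoothing) / (nb + smoothing * len(vocab)))
            for w in vocab}

def score_query(query, scores):
    """Sum of per-term scores; high totals suggest activity worth review."""
    return sum(scores.get(w, 0.0) for w in query.split())

# Placeholder tokens stand in for indicator terms from a labelled corpus.
flagged = ["indicatorA indicatorB", "indicatorA indicatorC"]
background = ["holiday photos", "indicatorB music"]
scores = build_scores(flagged, background)
```

In practice such scores would only triage material for human review, not make determinations on their own.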

Identifying Victims of Human Trafficking at Hotspots by Focusing on People Smuggled to Europe

Research has shown that the smuggling of migrants is associated with human trafficking. Hence, victims of human trafficking amongst smuggled migrants should be identified by EU Member States at the hotspots established by the European Commission to address the migrant and refugee crisis. Identified victims should be given a visa and a programme of protection to escape their traffickers. To achieve these objectives, research suggests that EU law on migrant smuggling should be amended and that the Temporary Protection Directive should be applied to smuggled persons when there is an indication that they may be victims of human trafficking. This approach should be adopted by the EASO in cooperation with police forces investigating smuggling and trafficking at hotspots.

Knowledge Management and Human Trafficking: Using Conceptual Knowledge Representation, Text Analytics and Open-Source Data to Combat Organized Crime

Lecture Notes in Computer Science, 2014

Globalization, the ubiquity of mobile communications, and the rise of the web have all expanded the environment in which organized criminal entities conduct their illicit activities, and as a result the environment that law enforcement agencies have to police. This paper triangulates the capability of open-source data analytics, ontological knowledge representation, and the wider notion of knowledge management (KM) in order to provide an effective, interdisciplinary means to combat such threats, thus providing law enforcement agencies (LEAs) with a foundation of competitive advantage over human trafficking and other organized crime.

Leveraging Publicly Available Data to Discern Patterns of Human-Trafficking Activity

We present a few data analysis methods that can be used to process advertisements for escort services available in public areas of the Internet. These data provide readily available proxy evidence for modeling and discerning human-trafficking activity. We show how they can be used to identify advertisements that likely involve such activity. We demonstrate their utility in identifying and tracking entities in the Web-advertisement data even when strongly identifiable features are sparse. We also show a few possible ways to perform community- and population-level analyses, including behavioral summaries stratified by various types of activity and the detection of emerging trends and patterns.
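Entity tracking over sparse identifiers can be sketched as union-find clustering of ads that share a hard identifier such as a phone number; the ads and numbers below are fabricated for illustration, and a real pipeline would combine many weaker features as well:

```python
import re

# Fabricated ads; any shared phone number links two ads to one entity.
ADS = [
    {"id": 1, "text": "New in town, call 555-0101"},
    {"id": 2, "text": "Available tonight 555-0101 or 555-0199"},
    {"id": 3, "text": "Visiting this week, 555-0199"},
    {"id": 4, "text": "Independent, text 555-0777"},
]

PHONE = re.compile(r"\b\d{3}-\d{4}\b")

def cluster_by_shared_phones(ads):
    """Union-find over ads: ads sharing any phone number join one entity cluster."""
    parent = {ad["id"]: ad["id"] for ad in ads}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    first_seen = {}
    for ad in ads:
        for phone in PHONE.findall(ad["text"]):
            if phone in first_seen:
                union(ad["id"], first_seen[phone])
            else:
                first_seen[phone] = ad["id"]
    clusters = {}
    for ad in ads:
        clusters.setdefault(find(ad["id"]), set()).add(ad["id"])
    return sorted(clusters.values(), key=min)

clusters = cluster_by_shared_phones(ADS)
```

Note that ads 1 and 3 share no identifier directly, yet land in one cluster via ad 2, which is what makes transitive linking useful when identifiable features are sparse.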

An Investigative Search Engine for the Human Trafficking Domain

Enabling intelligent search systems that can navigate and facet on entities, classes, and relationships, rather than plain text, to answer questions in complex domains is a longstanding aspect of the Semantic Web vision. This paper presents an investigative search engine that meets some of these challenges, at scale, for a variety of complex queries in the human trafficking domain. The engine provides a real-world case study of synergy between technology derived from research communities as diverse as the Semantic Web (investigative ontologies, SPARQL-inspired querying, Linked Data), Natural Language Processing (knowledge graph construction, word embeddings), and Information Retrieval (fast, user-driven relevance querying). The search engine has been rigorously prototyped as part of the DARPA MEMEX program and has been integrated into the latest version of the Domain-specific Insight Graph (DIG) architecture, currently used by hundreds of US law enforcement agencies for investigating human trafficking. Over a hundred million ads have been indexed. The engine is also being extended to other challenging illicit domains, such as securities and penny-stock fraud, illegal firearm sales, and patent trolling, with promising results.
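The entity-centric querying style can be sketched with a minimal in-memory triple store and a conjunctive attribute query, loosely in the spirit of SPARQL-inspired faceting; the predicates and data below are illustrative, not DIG's actual schema:

```python
# Tiny knowledge graph as (subject, predicate, object) triples; data is fabricated.
TRIPLES = [
    ("ad:1", "phone", "555-0101"),
    ("ad:1", "city", "Houston"),
    ("ad:2", "phone", "555-0101"),
    ("ad:2", "city", "Dallas"),
    ("ad:3", "phone", "555-0777"),
    ("ad:3", "city", "Houston"),
]

def query(constraints):
    """Return subjects whose triples satisfy every (predicate, value) constraint."""
    by_subject = {}
    for s, p, o in TRIPLES:
        by_subject.setdefault(s, set()).add((p, o))
    return sorted(s for s, po in by_subject.items() if set(constraints) <= po)

# Investigative-style conjunctive query: ads in Houston using a given phone number.
hits = query([("city", "Houston"), ("phone", "555-0101")])
```

A production system would back this with an inverted index over an extracted knowledge graph; the point here is only the shift from keyword search to querying over entities and their attributes.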