What is entity extraction? A beginner’s guide

What is considered an entity?

In the context of entity extraction, an "entity" refers to a specific piece of information or an object within a text that holds particular significance. These are often real-world concepts or specific mentions that systems can identify and categorize. Think of them as the key nouns or noun phrases that convey factual information.

Common types of entities include:

People: Names of individuals (for example, "Sundar Pichai," "Dr. Jane Doe")
Organizations: Names of companies, institutions, government agencies, or other structured groups (for example, "Google," "World Health Organization")
Locations: Geographical places, addresses, or landmarks (for example, "New York," "Paris," "United States")
Dates and times: Specific dates, date ranges, or time expressions (for example, "yesterday," "5th May 2025," "2006")
Quantities and monetary values: Numerical expressions related to amounts, percentages, or money (for example, "300 shares," "50%," "$100")
Products: Specific goods or services (for example, "iPhone," "Google Cloud")
Events: Named occurrences such as conferences, wars, or festivals (for example, "Olympic Games," "World War II")
Other specific categories: Depending on the application, entities can also include job titles (for example, "CEO"), phone numbers, email addresses, medical codes, or any custom-defined terms relevant to a particular domain
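Some of these types, such as monetary values, percentages, and email addresses, follow a predictable surface form, so even simple patterns can pick them out. The sketch below illustrates that rule-based idea. It assumes three hypothetical patterns (`PATTERNS`) and a helper named `extract_pattern_entities`, neither of which comes from any particular product. The patterns are deliberately simple and will miss many real-world formats.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# They will miss formats such as "USD 100" or "50 percent".
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_pattern_entities(text):
    """Return (label, matched_text, start, end) tuples sorted by start offset."""
    found = []
    for label, pattern in PATTERNS.items():
        # finditer yields every non-overlapping match of this pattern.
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start(), match.end()))
    # Sort by start offset so results follow the order of the text.
    return sorted(found, key=lambda entity: entity[2])


if __name__ == "__main__":
    sample = "Shares rose 50% after the $100 offer; contact ir@example.com."
    for entity in extract_pattern_entities(sample):
        print(entity)
```

Running the script prints one tuple per match, ordered by position in the text. Types like people or organizations have no fixed shape, which is why pattern rules alone are rarely enough for them.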

The goal is to identify these significant mentions and assign them to a predefined category, transforming unstructured text into data that a computer can process and interpret.
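To make "data that a computer can process" concrete, the following toy sketch looks up names from a small hand-made list (a gazetteer) and returns each mention as a record holding its text, category, and character offsets, then serializes the records as JSON. The `GAZETTEER` contents, the `Entity` record layout, and the `extract_entities` function are hypothetical choices for illustration. Practical extractors usually rely on trained statistical models rather than fixed lists.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class Entity:
    text: str   # the mention exactly as it appears in the input
    label: str  # the predefined category assigned to the mention
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention's last character


# Hypothetical gazetteer mapping known names to categories.
GAZETTEER = {
    "Sundar Pichai": "PERSON",
    "Google": "ORGANIZATION",
    "World Health Organization": "ORGANIZATION",
    "New York": "LOCATION",
    "Olympic Games": "EVENT",
}


def extract_entities(text):
    """Find every gazetteer name in text and return Entity records by position."""
    entities = []
    for name, label in GAZETTEER.items():
        # \b prevents "Google" from matching inside a longer word like "Googleplex".
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            entities.append(Entity(match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity.start)


if __name__ == "__main__":
    sentence = "Sundar Pichai spoke about Google in New York."
    records = [asdict(entity) for entity in extract_entities(sentence)]
    print(json.dumps(records, indent=2))
```

Keeping character offsets alongside each label lets downstream code trace every extracted value back to its position in the source text. This sketch does not resolve overlapping matches; if both "Google" and "Google Cloud" were listed, it would report both.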