What is entity extraction? A beginner’s guide
What is considered an entity?
In entity extraction, an "entity" is a span of text that refers to a specific real-world thing or value, such as a person, a place, or a date, and that a system can identify and assign to a category. Think of entities as the key nouns or noun phrases that carry the factual content of a text.
Common types of entities include:
- People: Names of individuals (for example, "Sundar Pichai," "Dr. Jane Doe")
- Organizations: Names of companies, institutions, government agencies, or other structured groups (for example, "Google," "World Health Organization")
- Locations: Geographical places, addresses, or landmarks (for example, "New York," "Paris," "United States")
- Dates and times: Specific dates, date ranges, or time expressions (for example, "yesterday," "5th May 2025," "2006")
- Quantities and monetary values: Numerical expressions related to amounts, percentages, or money (for example, "300 shares," "50%," "$100")
- Products: Specific goods or services (for example, "iPhone," "Google Cloud")
- Events: Named occurrences such as conferences, wars, or festivals (for example, "Olympic Games," "World War II")
- Other specific categories: Depending on the application, entities can also include job titles (for example, "CEO"), phone numbers, email addresses, medical codes, or any custom-defined terms relevant to a particular domain
The goal is to identify these mentions and assign each one to a predefined category, transforming unstructured text into structured data that a computer can process and interpret.
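To make "text in, categorized data out" concrete, here is a minimal sketch in Python. It is not how production systems work: those typically rely on trained language models, and regular expressions cannot reliably find people or organizations. The `Entity` record, the label names (`EMAIL`, `MONEY`, `PERCENT`, `YEAR`), and the patterns are all assumptions chosen for illustration. They cover only a few of the rigidly formatted categories listed above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # the exact mention as it appears in the input
    label: str  # the predefined category assigned to the mention
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


# Hypothetical, deliberately narrow patterns: one per category.
# Real-world text needs far more robust rules or a trained model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def extract_entities(text: str) -> list[Entity]:
    """Return every pattern match as a labeled Entity, in reading order."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(match.group(), label, match.start(), match.end()))
    return sorted(found, key=lambda entity: entity.start)


sample = "In 2006 the fund paid $100 for a 50% stake; write to ir@example.com."
entities = extract_entities(sample)
for entity in entities:
    print(entity.label, repr(entity.text), entity.start, entity.end)
```

Each result pairs the mention with its category and its character offsets, so downstream code can filter by label or highlight the mention in the original text without re-parsing it.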