How automated content tagging improves findability
People save content in a way that makes sense to them, but when other people can't easily find the content, it becomes a problem.
Properly tagged content can solve this problem. Metadata tagging is one of the most powerful features of content management systems, and one of the areas where AI can help. Tagging enables content managers to attribute meaning and context to content. Enterprise search software can assist in content tagging efforts and help business users find relevant information.
Businesses must identify content by more than the basic metadata fields, such as author and title. Content should include specific -- and useful -- metadata, such as document type, project name, effective date and other contextually relevant information.
Automated content tagging is one approach to solving the challenge of finding the right content. The concept is simple: instead of requiring business users to add metadata to content, an AI engine tags every piece of content.
What can AI do?
The simplest form of automated content tagging is extracting data from forms. Form extraction has long been a feature of scanning software: tagging engines can scan incoming forms of various content types and extract key metadata with minimal training of the AI engine. This leads to short startup cycles and a fast return on investment for the software.
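To make form extraction concrete, here is a minimal sketch that assumes the form has already been scanned into text with one "Label: value" line per field. The invoice template (`INVOICE_FIELDS`) and the function name `extract_form_fields` are hypothetical; real tagging engines learn field locations from layout and training samples rather than from a hardcoded label list.

```python
import re

# Hypothetical form template: the metadata fields expected on an invoice form.
INVOICE_FIELDS = ["Invoice Number", "Vendor", "Invoice Date", "Total"]


def extract_form_fields(form_text: str, fields: list[str]) -> dict[str, str]:
    """Pull 'Label: value' pairs for known fields out of scanned or digital form text."""
    metadata = {}
    for field in fields:
        # Match the label at the start of a line, then capture the rest of that line.
        pattern = rf"^\s*{re.escape(field)}\s*:\s*(.+?)\s*$"
        match = re.search(pattern, form_text, flags=re.MULTILINE | re.IGNORECASE)
        if match:
            metadata[field] = match.group(1)
    return metadata


sample = """Invoice Number: INV-1042
Vendor: Contoso Ltd
Invoice Date: 2021-03-15
Total: 1,250.00
"""

print(extract_form_fields(sample, INVOICE_FIELDS))
```

Because every form of the same type carries the same labels, a fixed template like this needs little training, which is why form extraction tends to pay off quickly.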
That approach works well in case management scenarios, where AI tagging engines routinely process forms. The greater challenge for businesses is doing the same for free-form content. Extracting information from content such as an article and using it to populate metadata improves the findability of that content. Sometimes the extracted values are single keywords added to a list of relevant terms; other times they are paired values, such as a custom property name paired with its value. For instance, a "Social Security number" property paired with the number itself.
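The two kinds of free-form output described above, keyword lists and name-value pairs, can be sketched as follows. This is a deliberately simple illustration, not how any particular vendor works: keywords are just the most frequent non-stop-words, and the only name-value pair recognized is an assumed U.S. Social Security number pattern (123-45-6789). `STOP_WORDS`, `keyword_tags` and `name_value_tags` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tagging engines use far richer language models.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "is", "with", "this", "our"}

# Assumed pattern for a U.S. Social Security number written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def keyword_tags(text: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stop-words as candidate keyword tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def name_value_tags(text: str) -> dict[str, str]:
    """Pair a custom property name with the value found in the text, if any."""
    match = SSN_PATTERN.search(text)
    return {"Social Security number": match.group()} if match else {}


note = ("Proposal for the budget review. The budget covers staffing and the "
        "budget timeline. Staffing costs rise. Employee SSN 123-45-6789 on file.")
print(keyword_tags(note))
print(name_value_tags(note))
```

The keyword list is loose and approximate, while the name-value pair is precise but only works for values with a recognizable shape, which mirrors the trade-off between keyword tags and paired values.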
A third scenario for automated content tagging is grouping. AI engines can analyze content and group similar items together. With forms, this means grouping forms of the same type together.
For less structured content, the AI tool assigns new content the tags already applied to existing content on similar topics. In the case of this article, the tagging engine might automatically apply the tags "artificial intelligence" and "content management" because the article shares many terms and topics with existing articles that already carry those tags.
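One simple way to implement "tag it like the similar content" is to compare word-count vectors with cosine similarity and copy tags from existing items that score above a threshold. The sketch below assumes a tiny hypothetical corpus (`TAGGED_ARTICLES`) and an arbitrary 0.2 threshold; commercial engines use richer features and trained models, so treat this only as an illustration of the idea.

```python
import math
import re
from collections import Counter

# Hypothetical corpus of existing articles whose tags were already applied.
TAGGED_ARTICLES = [
    ("machine learning models classify documents and content automatically",
     {"artificial intelligence", "content management"}),
    ("quarterly budget forecast and expense report for finance",
     {"finance"}),
]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for the richer features real engines use."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 when either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_tags(new_text: str, threshold: float = 0.2) -> set[str]:
    """Copy tags from every existing article whose similarity meets the threshold."""
    new_vec = vectorize(new_text)
    suggested: set[str] = set()
    for text, tags in TAGGED_ARTICLES:
        if cosine(new_vec, vectorize(text)) >= threshold:
            suggested |= tags
    return suggested


print(suggest_tags("AI tagging helps content teams classify documents automatically"))
```

The quality of the suggestions depends entirely on the tags already in the corpus, which is why correctly tagged existing content matters so much.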
Preparing for success
AI works better when it is trained, whether by users or through built-in training. In practice, this means an organization needs properly tagged content to feed into the AI engine. The more correctly tagged content a business has, the more accurately the AI engine performs.
An established taxonomy is also important for automatically tagging content. An agreed-upon taxonomy for storing content, at least at a high level, enables teams to store content in locations that make sense to them. Using that hierarchical content structure as a starting point, the AI can derive metadata values from the names of the folder and parent folders where people store content.
For example, AI may automatically tag content in a budget folder as a "financial document." If that budget folder sits in a project collaboration site, the AI can also tag the content with the project and client names. When the finance department needs to find project budgets for a certain client, these automatically generated tags let the finance team pull documents from multiple projects in a single search.
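The budget-folder example can be sketched as path-based tagging. This assumes a hypothetical folder layout of `/projects/<client>/<project>/<folder>/<file>` and a hypothetical mapping from folder names to document types (`FOLDER_DOC_TYPES`); an actual system would derive these from the organization's own taxonomy. The `find_documents` helper, also hypothetical, stands in for the finance team's single search across projects.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from taxonomy folder names to document types.
FOLDER_DOC_TYPES = {"budget": "financial document", "contracts": "legal document"}


def tags_from_path(path: str) -> dict[str, str]:
    """Derive tags from a file's folder and parent folders.

    Assumes a hypothetical layout: /projects/<client>/<project>/<folder>/<file>.
    """
    parts = PurePosixPath(path).parts  # e.g. ('/', 'projects', client, project, folder, file)
    tags: dict[str, str] = {}
    if len(parts) == 6 and parts[1] == "projects":
        tags["client"] = parts[2]
        tags["project"] = parts[3]
        doc_type = FOLDER_DOC_TYPES.get(parts[4].lower())
        if doc_type:
            tags["doc_type"] = doc_type
    return tags


def find_documents(paths: list[str], **criteria: str) -> list[str]:
    """Return paths whose derived tags match every criterion, such as client and doc_type."""
    return [p for p in paths
            if all(tags_from_path(p).get(key) == value for key, value in criteria.items())]


library = [
    "/projects/Contoso/Website Redesign/Budget/q3-estimate.xlsx",
    "/projects/Contoso/Data Migration/Budget/migration-costs.xlsx",
    "/projects/Contoso/Data Migration/Contracts/sow.docx",
    "/projects/Fabrikam/Office Move/Budget/move-budget.xlsx",
]

print(find_documents(library, client="Contoso", doc_type="financial document"))
```

The search returns budgets from two different Contoso projects without anyone having tagged those files by hand; the tags come entirely from where the files were stored.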
Vendors that perform AI tagging
When evaluating AI tagging vendors, an organization should consider how its existing content is structured, as well as how it wants to store content, such as in what formats and with what types of tags.
ActiveNav and Everteam are file analysis tools that use AI to dig into existing content stores and classify the content. Their strength is processing large content backlogs, often as a prelude to moving the content into a managed content store. For organizations that aren't sure what content they have, talking to a file analysis vendor is a good place to start.
When an organization needs to extract metadata from forms and other structured content -- such as resumes -- Adlib Software and the open source Ephesoft are vendors that address that problem. Moreover, long-established document scanning software providers, such as Kofax, have nonscanning products that extract metadata as well. These providers are ideal for tagging structured content, whether from active scanning activities or digital forms.
Organizations looking to tag day-to-day free-form content -- including proposals, resumes, presentations and meeting notes -- should start by checking what options or features their content management vendor offers. If the content management vendor doesn't provide tagging software, it may have software partners that can help. A good place to find one is at the vendor's user events.
Other vendors that provide AI tagging assistance include:
- IBM Watson Content Hub. IBM Watson can propose relevant tags based on content.
- M-Files. Smart subjects provide tag suggestions based on document content.
- Microsoft SharePoint and Office 365. SharePoint and Office 365 tag images based on object recognition and geolocation data.
- OpenText. Magellan's AI capabilities include speech and text analytics based on contextual hypotheses and meaning deduction.
Limitations of automated content tagging
Like any software, automated tagging has limitations. AI is only as good as the training data it receives. The more unstructured the content, the less specific the metadata values the AI can extract. Forms will likely yield very specific name-value pairs, such as names and addresses. Proposals and other free-form content are more challenging and tend to lend themselves to lists of keyword tags.
When thinking about these limitations, keep in mind that people make mistakes and often choose not to set metadata values at all. The goal is to improve on the current state of metadata tagging. Perfect metadata tagging does not exist, even in fully automated systems, but businesses that employ AI technologies come closer to it than organizations that rely purely on staff to tag everything manually.
Moving forward
Organizations should begin by clearly defining the challenge they are trying to solve. First, businesses should consider whether their focus is on incoming content or existing content. Next, if they tag all of their existing content, they need to make sure they also tag all new content on creation so that they don't create more untagged content. Once organizations address those considerations, they can identify, evaluate and select the right type of software for their needs.
Once the organization trains the AI, the automated tagging process can make content findable. Organizations can then gain greater understanding of what is taking place in their business.