Latent Semantic Indexing
Bringing Order to Unstructured Data
Industry research suggests that more than half of an organization’s data is unstructured, and that this share is growing twice as fast as structured data. Finding relevant information and insight in unstructured data is challenging at best and virtually impossible in terabyte-size repositories. What is too often missing is an easy and intuitive way to search, categorize and analyze unstructured data.
Agilex Information Discovery, our solution for Latent Semantic Indexing (LSI), addresses this void. Agilex Information Discovery allows users to search for and analyze text in terms of key concepts rather than keywords. This makes it possible to locate relevant information more readily and precisely while uncovering additional insight and a richer understanding of a document’s ‘true’ meaning. LSI is being used within the national security and criminal justice arenas for intelligence analysis and surveillance, to address information management requirements like FOIA compliance and e-Discovery, and to accelerate scientific research.
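To make the difference between concept search and keyword search concrete, here is a minimal sketch of the general LSI technique: build a term-document matrix, keep the top singular vectors, and compare a query with documents in that reduced space. It is not the Agilex or CAAT implementation. The toy corpus, the function names and the choice of two concepts (`k=2`) are assumptions made purely for illustration, and the sketch relies on NumPy.

```python
import numpy as np

# Toy corpus (an assumption for illustration): three vehicle documents, two fruit documents.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "fruit apple orchard",
    "apple fruit juice",
]


def build_term_doc_matrix(texts):
    """Return (row index per term, term-by-document count matrix)."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], j] += 1.0
    return index, matrix


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def concept_search(query, texts, k=2):
    """Score every document against the query in a k-dimensional concept space."""
    index, matrix = build_term_doc_matrix(texts)
    # Truncated SVD: the top-k left singular vectors span the concept space.
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    basis = u[:, :k]
    # Turn the query into a term-count vector, ignoring unknown words.
    q = np.zeros(matrix.shape[0])
    for word in query.split():
        if word in index:
            q[index[word]] += 1.0
    # Project the query and the documents onto the same concept basis.
    q_concept = basis.T @ q
    doc_concepts = basis.T @ matrix  # shape: (k, number of documents)
    return [cosine(q_concept, doc_concepts[:, j]) for j in range(len(texts))]


scores = concept_search("automobile", docs)
for text, score in zip(docs, scores):
    print(f"{score:6.3f}  {text}")
```

In this toy corpus, "car engine repair" never mentions the word "automobile", yet it scores close to 1 because it shares vocabulary with documents that do, while the two fruit documents score close to 0. A pure keyword match would have missed it.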
Patented Technology Backed by Patent Holders
Agilex is a Content Analyst reseller. Based on our exclusive alliance with Content Analyst®, a leading provider of patented LSI technology, our customized solutions build upon a powerful conceptual mapping engine, CAAT from Content Analyst, as the basis for a multitude of scalable, multi-lingual, schema-less applications for document analysis, content analysis, and secure data sharing. This allows us to dramatically improve the effectiveness and productivity of analysts, librarians and researchers by:
- Reducing document search times by 65% or more
- Allowing users to sort and categorize millions of documents per day 70% faster
- Identifying conceptually related documents automatically without the use of an external dictionary or thesaurus
- Searching simultaneously for a key concept across multiple languages
- Summarizing key documents automatically
The Agilex LSI team is led by industry veterans with decades of applied experience and multiple patents in semantic processing to their name. Working with a group of cleared professionals in information retrieval and management, they can support even the most sensitive assignment.
Key features of our LSI solutions include:
- Conceptual Search — Virtually ‘reads’ every document it indexes and identifies all of the concepts instead of just tags and keywords. Not only is this faster than other approaches, it also finds information that those techniques can't, and can even use a long document as the ‘search string’ itself.
- Automatic Categorization — Uses small sets of examples – 20 or so per category – to find all related or similar documents in large collections. This allows a user to categorize millions of documents per server per day, identifying only those documents that are extremely relevant to their query.
- Automatic Clustering — ‘Reads’ a previously uncategorized collection of documents and automatically sorts them into logical groups or clusters. This eliminates the need for the reviewer to understand the subject matter.
- Automatic Summarization — Creates an ‘ad hoc’ summary – regardless of the document’s length or complexity – using the document’s key sentences. In addition to being significantly faster, use of the actual content eliminates the potential bias of paraphrasing.
- Cross-Lingual Analysis — Analyzes documents in their native languages without translation for search, categorization, taxonomy generation, and summarization. Support for sixteen native languages, including many Asian and Middle Eastern ones, eliminates the need for additional translation and gives reviewers the ability to analyze documents in their original language.
- Contextual Explanation — When a reviewer clicks on a new or unfamiliar term, related or similar terms are automatically highlighted in the text. This facilitates self-learning as reviewers explore new topic areas.
- Conceptual Comparison — Automatically maps all of a document’s text – ranging from just a single word to an entire book – into a conceptual grid so that relationships can be more easily uncovered and explored. This is a critical feature that helps users deconstruct and reassemble complex relationships.
- Name Tracking and Disambiguation — Tracks an individual who may use multiple names, nicknames or aliases. This improves accuracy and confidence.
- Relationship Discovery — Automatically identifies subtle relationships between words, content and concepts. This allows users to gain additional insight, such as the alias an individual commonly uses.
- Taxonomy Generation — Groups of documents can be automatically categorized into hierarchical subsets. This enables faster discovery as reviewers can prioritize their analysis more effectively.
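As a rough illustration of the example-driven categorization described in the list above, the sketch below places a handful of labeled example documents and some unlabeled ones in a shared LSI concept space, then assigns each unlabeled document to the category whose example centroid it most resembles. This is a generic nearest-centroid approach and an assumption on our part, not a description of how CAAT works internally. The corpus, category names and function names are hypothetical, and only two examples per category are used here rather than the 20 or so mentioned above.

```python
import numpy as np

# Hypothetical example sets and unlabeled documents, for illustration only.
examples = {
    "vehicles": ["car engine repair", "automobile engine repair"],
    "produce": ["fruit apple orchard", "apple fruit juice"],
}
unlabeled = ["car automobile dealer", "apple orchard juice"]


def term_doc_matrix(texts):
    """Build a term-by-document count matrix over whitespace-split words."""
    vocab = sorted({word for text in texts for word in text.split()})
    row = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[row[word], j] += 1.0
    return matrix


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def categorize(example_sets, new_texts, k=2):
    """Assign each new text to the category with the most similar example centroid."""
    labels = [cat for cat, texts in example_sets.items() for _ in texts]
    labeled_texts = [text for texts in example_sets.values() for text in texts]
    matrix = term_doc_matrix(labeled_texts + new_texts)
    # Project every document onto the top-k left singular vectors.
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    coords = (u[:, :k].T @ matrix).T  # one row of concept coordinates per document
    # One centroid per category, averaged over that category's examples.
    centroids = {}
    for cat in example_sets:
        rows = [i for i, label in enumerate(labels) if label == cat]
        centroids[cat] = coords[rows].mean(axis=0)
    # Pick the closest centroid for each unlabeled document.
    assignments = {}
    for text, vec in zip(new_texts, coords[len(labeled_texts):]):
        assignments[text] = max(centroids, key=lambda cat: cosine(vec, centroids[cat]))
    return assignments


assignments = categorize(examples, unlabeled)
for text, category in assignments.items():
    print(f"{category:10s} {text}")
```

A production system would typically also apply a similarity threshold so that documents resembling no category are left unassigned. That step is omitted here to keep the sketch short.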
As our solutions are based upon an open framework, they can be easily customized to meet the unique requirements of each client. Furthermore, due to the modular architecture of the underlying CAAT technology, new components can be readily added to the solution to further refine and extend performance.
