When advising clients on their document collections, one of our key goals is to reduce the volume of records before the manual part of the review begins. The rationale is to cut down the amount of irrelevant data collected, which in turn reduces the labour-intensive and costly human review of those records. While filtering by custodian, date, document type and keywords is the typical starting point, many newer and more advanced filtering methods are now available, and they are useful for organizing and reducing large and complex document collections.
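As a rough illustration of those conventional first-pass filters, the sketch below keeps only records that match a set of custodians, fall within a date range, have an allowed document type and contain at least one keyword. The `Record` fields, the `cull` function and the sample data are hypothetical; review platforms expose these filters through their own interfaces, and real keyword searching is usually far more sophisticated than the simple substring test used here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical metadata fields for a collected document.
    custodian: str
    sent: date
    doc_type: str
    text: str


def cull(records, custodians, start, end, doc_types, keywords):
    """Keep only records that pass every filter (illustrative criteria)."""
    kws = [k.lower() for k in keywords]
    kept = []
    for r in records:
        if r.custodian not in custodians:
            continue  # not one of the relevant custodians
        if not (start <= r.sent <= end):
            continue  # outside the relevant date range (inclusive)
        if r.doc_type not in doc_types:
            continue  # document type not in scope
        body = r.text.lower()
        if not any(k in body for k in kws):
            continue  # no keyword hit (simple substring match)
        kept.append(r)
    return kept


if __name__ == "__main__":
    records = [
        Record("Smith", date(2010, 3, 1), "email", "Re: supply contract pricing"),
        Record("Smith", date(2008, 1, 5), "email", "Supply contract draft"),
        Record("Jones", date(2010, 4, 2), "email", "Lunch on Friday?"),
        Record("Smith", date(2010, 5, 9), "spreadsheet", "Contract pricing model"),
    ]
    kept = cull(records, {"Smith", "Jones"}, date(2009, 1, 1), date(2010, 12, 31),
                {"email", "spreadsheet"}, ["contract"])
    print(len(kept))  # 2: one record is out of range, another has no keyword hit
```

Each filter removes records independently, so the order of the checks affects only speed, not the final result.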
Take, for instance, probability tracking. Probability tracking assigns a value to words based on their relationships, their proximity to one another and their frequency, in order to map relationships within data whose content is not yet known. Once these relationships are mapped, irrelevant data can be more readily identified and culled from the collection.
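The article does not describe how any particular probability-tracking product weights words, so the following is only a minimal sketch of the general idea under stated assumptions. It scores word pairs by how often and how closely they co-occur: each co-occurrence within a hypothetical `window` of tokens adds `1 / distance`, so frequent, close pairs score highest. The function names `association_scores` and `related_terms` and the weighting scheme are illustrative inventions, not a vendor's method.

```python
import re
from collections import Counter


def tokenize(text):
    # Lower-case alphabetic tokens only; a real tool would do much more.
    return re.findall(r"[a-z]+", text.lower())


def association_scores(docs, window=3):
    """Score word pairs by frequency and proximity of co-occurrence.

    Assumed weighting: each co-occurrence within `window` tokens adds
    1 / distance, so pairs that appear often and close together score highest.
    """
    scores = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, a in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                b = tokens[j]
                if a == b:
                    continue
                pair = tuple(sorted((a, b)))  # order-independent key
                scores[pair] += 1.0 / (j - i)
    return scores


def related_terms(scores, term, top_n=3):
    """Return the words most strongly associated with `term`."""
    neighbours = Counter()
    for (a, b), s in scores.items():
        if a == term:
            neighbours[b] += s
        elif b == term:
            neighbours[a] += s
    return [w for w, _ in neighbours.most_common(top_n)]


if __name__ == "__main__":
    docs = ["pipeline contract pricing", "contract pricing dispute", "holiday party invitation"]
    scores = association_scores(docs)
    print(related_terms(scores, "contract", 2))  # ['pricing', 'pipeline']
```

In a review setting, one could imagine treating documents whose vocabulary has no strong association with a set of seed terms as candidates for closer scrutiny before culling; the threshold for that decision would be a matter of judgment.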
Clustering tools analyze the content of documents by comparing how often different words appear in each one, and then group the documents into a specified number of clusters. Concept learning technologies are a step up in complexity from probability tracking. They identify related words and analyze their relationships, so that documents that do not share the same words can nevertheless be identified as covering similar topics. These techniques become even more powerful when combined with thesauri, taxonomies and ontologies.
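The paragraph does not name the algorithm behind any given clustering tool. The sketch below uses k-means, one common choice, as a stand-in: each document becomes a unit-length vector of word counts over a shared vocabulary, and k-means groups the vectors into `k` clusters. Seeding the centroids with the first `k` documents is a simplifying assumption made here so that the result is deterministic; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vectors(docs):
    """Turn each document into a unit-length word-count vector over a shared vocabulary."""
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k documents seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of the documents assigned to it.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    docs = [
        "contract pricing contract",
        "holiday party",
        "pricing contract terms",
        "party invitation holiday",
    ]
    print(kmeans(term_vectors(docs), k=2))  # [0, 1, 0, 1]
```

Production tools commonly refine this with term weighting such as TF-IDF and smarter seeding such as k-means++, but the core step is the same: documents with similar word frequencies end up in the same cluster.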
Complex tools such as these, based on mathematical probability and statistics, can be paired with programs that present the information in a meaningful way for human examination and review. Representing collections of data as tables, trees, clusters and threads helps us understand the relationships among the records, which speeds up the review process.
For more information on probability tracking and clustering tools, contact Wortzman Nickle.