Backstop Predictive Coding

Preferred predictive coding solution for multilingual, large-scale, complex data sets

Not all predictive coding software is created equal. The traditional standard for predictive coding software has been how accurately it can emulate human coding decisions with a minimal number of training, or seed, documents. But in the world of large-scale, complex matters comprising documents in multiple languages, our predictive coding software has to do more.

Backstop has been used on matters of all shapes and sizes since 2007. This varied experience has hardened the Backstop software and attuned it to the needs of the complex data sets common to Consilio’s clients.

Here is how Backstop predictive coding proves its superiority:


  • Supports multilingual data sets, including those in Japanese, Chinese, Korean and more
  • Requires fewer documents to accurately “train” the software
  • Scales to support large matters without sacrificing speed

Excellence with Multilingual Data Sets

The Backstop predictive coding software has been applied against data sets commonly found in multinational corporations’ collections, including those in Japanese, Chinese, Italian, Swedish, Portuguese, German and more. Better yet, Backstop has excelled in cases where multiple languages are commingled into a single document population, without requiring pre-separation of the corpus by language for successful analysis – a common practice with other predictive coding software.

Special focus and enhancements to Backstop’s Feature Classifier allow for highly accurate language identification, superior modelling of feature concepts within the documents and a self-optimising algorithm attuned to the languages found in the document population. The Backstop predictive coding software is deployed within Consilio’s worldwide data centres, so it is available in the locations where non-English documents are most likely to be collected.
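As a rough illustration of the language-identification step – a minimal sketch using character n-gram profiles, a common general technique, and not a description of Backstop’s actual Feature Classifier – a classifier can compare a document’s character-pair distribution against reference profiles for each candidate language:

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=2):
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two frequency profiles."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Tiny illustrative reference samples; a real system would build
# profiles from large per-language corpora.
profiles = {
    "en": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "de": char_ngrams("der schnelle braune fuchs springt über den faulen hund"),
}

def identify_language(text):
    """Return the language whose n-gram profile best matches the text."""
    p = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(p, profiles[lang]))

print(identify_language("the dog jumps over the fox"))   # → en
print(identify_language("der faule hund springt"))       # → de
```

Because the profile comparison is per-document, a commingled corpus does not need to be pre-separated by language before analysis.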

It all adds up to software that performs exceedingly well for a commingled, multilingual document corpus.

Better Accuracy with Fewer Training Documents

“How many documents do I need to review in order to train the computer?” That’s the question on the tip of every lawyer’s tongue at the outset of a predictive coding workflow. Backstop’s predictive coding software requires a relatively low threshold of seed documents – an answer borne out by multiple side-by-side, apples-to-apples comparisons.

Backstop predictive coding utilises Dynamic Parameter Optimisation and superior Feature Extraction to get more leverage out of every coded training document, so the computer models stabilise quickly at the desired margin of error. This means that Backstop will arrive at a high-accuracy model – both in recall and precision – with fewer training documents than comparable software. Typically, after review of only 1,000 documents,* the software achieves its target recall and directs the case team to highly relevant documents in the corpus for prioritisation.
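The recall and precision measures mentioned above have standard definitions. The sketch below shows how a control set can be used to track them across training rounds until a target recall is reached; the counts and the 0.90 target are hypothetical illustrations, not Backstop benchmarks:

```python
def recall(tp, fn):
    """Fraction of all truly relevant documents the model found."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of the model's positive calls that were actually relevant."""
    return tp / (tp + fp)

# Hypothetical control-set counts after successive training rounds
# (tp = true positives, fp = false positives, fn = false negatives).
rounds = [
    {"tp": 120, "fp": 80, "fn": 80},   # early model: noisy
    {"tp": 170, "fp": 40, "fn": 30},   # improving
    {"tp": 190, "fp": 15, "fn": 10},   # stabilised
]

TARGET_RECALL = 0.90
for i, r in enumerate(rounds, start=1):
    rec, prec = recall(r["tp"], r["fn"]), precision(r["tp"], r["fp"])
    print(f"round {i}: recall={rec:.2f} precision={prec:.2f}")
    if rec >= TARGET_RECALL:
        print(f"target recall reached after round {i}")
        break
```

Stabilising quickly, as described above, simply means the model crosses the recall target after fewer coded training documents.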

Speed and Scalability to Support Your Matter

Predictive coding software needs to model and deliver predictions quickly regardless of the size of the aggregate training set, and to score the balance of the document population rapidly, however large the corpus. Case teams are waiting on results, and review teams are waiting in the wings: time is money.

Backstop predictive coding software is built on a sharded data-store architecture with high-throughput parallelisation to facilitate rapid generation of predictions. The software is built to spool up additional processor cores as needed to generate predictions – usually in less than an hour. The Backstop software has performed quickly in matters as large as 50 million documents, generating scores for all documents in fewer than six hours.
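The general pattern behind sharded, parallel scoring can be sketched in a few lines – this is a minimal illustration of the idea, with a stand-in scoring function, not Backstop’s implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def score_shard(shard):
    """Score every document in one shard.

    A stand-in for real model inference: here the 'score' is derived
    from the text length purely for illustration.
    """
    return [(doc_id, len(text) % 100 / 100.0) for doc_id, text in shard]

# A corpus split into shards so each shard can be scored independently.
corpus = [(i, f"document {i} text") for i in range(10_000)]
shard_size = 2_500
shards = [corpus[i:i + shard_size] for i in range(0, len(corpus), shard_size)]

# Workers score shards in parallel; map preserves shard order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [s for scored in pool.map(score_shard, shards) for s in scored]

print(len(results))  # → 10000
```

Because shards have no dependencies on one another, adding workers (or, at scale, machines and processor cores) shortens the wall-clock time to score the full corpus roughly in proportion.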