MT>3

Did the U.S. Election Invalidate TAR?

10/11/2016

 
We all know the outcome. Donald Trump will be the next US president, even though almost all polls gave him practically no chance of winning. Does this mean that statistics don’t work? Hardly.
 
Forecasts based on the polls gave Hillary Clinton between a 70% and 98% chance of winning the election. Among the major polls, the Los Angeles Times Daybreak poll stood nearly alone in saying that Donald Trump was more likely to win. In the so-called swing states, the same polls missed badly.
 
The polls sampled a small percentage of the population and then used statistical inference to predict how the whole population would vote. TAR (Technology Assisted Review), the acronym that covers predictive coding and other cost-saving e-discovery techniques, relies on a similar method. So, what do the election results say about the validity of using TAR?
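
To make the sampling idea concrete, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample and a 95% confidence level; the function name and the sample sizes are ours, chosen only for illustration.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample. z=1.96 corresponds to a 95% confidence level;
    proportion=0.5 is the worst case (widest margin)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Illustrative sample sizes only.
for n in (400, 1000, 2400):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
```

With these inputs the margin shrinks from about 4.9 points to about 3.1 and then 2.0. The formula says nothing about whether the sample was drawn fairly, so a small margin of error on a skewed sample is still a wrong answer.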
 
Based on what we have gathered so far, it appears the polls failed because the pollsters did not use representative samples. Statistical inference assumes that the sample is a fair cross-section of the whole population. For voting patterns, this means the sample needs to include members of every relevant social group in proportion to their share of the electorate: men, women, young, old, ethnic minorities, gay, straight, etc. More importantly, the pollsters needed to consider whether the people polled would actually vote, since those who would not vote should not have been included.
 
The unrepresentative nature of the samples is where the pollsters failed, and there were several reasons for it. One was the way the samples were taken. Many of the polls were conducted online, and older people are generally less likely to answer polls online. The polls were also conducted primarily in populated areas. The results show that turnout was lower in urban areas than in rural areas. This means that, for the samples to be representative of actual voters, more rural people and fewer urban people should have been included.
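
A small worked example shows how much an urban-heavy sample can move the headline number. Everything below is hypothetical: the respondent counts, the support figures, and the turnout shares are invented for illustration, and the correction shown (weighting each group's result by its share of actual voters, usually called post-stratification) is one common remedy, not a description of what any pollster did.

```python
# Hypothetical poll of 1,000 respondents that over-represents urban areas.
respondents = {"urban": 750, "rural": 250}
support_for_a = {"urban": 450, "rural": 90}  # respondents backing candidate A

# Hypothetical share of each group among people who actually voted.
voter_share = {"urban": 0.55, "rural": 0.45}

def raw_estimate(respondents, supporters):
    """Headline figure: supporters divided by all respondents."""
    return sum(supporters.values()) / sum(respondents.values())

def weighted_estimate(respondents, supporters, target_share):
    """Weight each group's support rate by its share of actual voters."""
    return sum(
        target_share[group] * supporters[group] / respondents[group]
        for group in respondents
    )

raw = raw_estimate(respondents, support_for_a)
weighted = weighted_estimate(respondents, support_for_a, voter_share)
print(f"raw: {raw:.1%}, weighted: {weighted:.1%}")
```

With these made-up numbers the raw poll shows candidate A winning with 54%, while the weighted figure is about 49%. The same respondents give a different answer once the sample is brought in line with who actually voted.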
 
When sampling discovery data, the same requirement for a representative sample holds. Statistics tell us that, if the collection is large enough and the different kinds of records are spread fairly evenly through it, a random selection of a few thousand records will likely give you a representative sample. Unlike the pollsters, in e-discovery we have a way to check that our sample is representative: validation. After we use our sample to separate the records into relevant and not relevant, we can go back and sample the not relevant set to see if we missed anything. If our initial sample was not representative, this second validation sample will very likely turn up relevant records.
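
The validation step described above is often called an elusion test, and the arithmetic behind it is short. The sketch below is ours and rests on assumptions: the document counts are invented, the validation sample is assumed to be drawn at random from the not relevant set, and the Wilson score interval is just one reasonable way to put a range around the observed rate.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (95% level when z=1.96)."""
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def elusion_check(null_set_size, sample_size, relevant_in_sample, relevant_found):
    """Estimate how many relevant records were left in the not relevant set."""
    rate = relevant_in_sample / sample_size
    low, high = wilson_interval(relevant_in_sample, sample_size)
    missed = rate * null_set_size
    return {
        "elusion_rate": rate,
        "elusion_interval": (low, high),
        "estimated_missed": missed,
        "estimated_recall": relevant_found / (relevant_found + missed),
    }

# Hypothetical review: 10,000 records coded relevant, 90,000 coded not
# relevant, and a random validation sample of 1,500 from the latter.
result = elusion_check(
    null_set_size=90_000,
    sample_size=1_500,
    relevant_in_sample=15,
    relevant_found=10_000,
)
print(result)
```

In this made-up case, 15 relevant records in a sample of 1,500 gives an elusion rate of 1%, which suggests roughly 900 relevant records were missed and a recall of a little under 92%. A much higher rate would be the warning sign the paragraph above describes.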
 
The theory behind statistics and probability has been proven valid. Used correctly, it returns defensible results. Even though the election results may have surprised you, there is no reason to worry about the value of TAR.

130 Adelaide Street West Suite 2020
Toronto, Ontario M5H 3P5
​ ​
t: 416-642-2220  
tf: 1-877-642-2220  
f: 416-642-9021

Contact MT>3
@MT>3 2018. All Rights Reserved