litscope

E-Discovery Search 101

These days, it is becoming harder and harder to stay on top of the various search practices that may be applied during the e-discovery lifecycle. It is widely agreed that search is the most efficient way to reduce the mind-boggling volumes of data, and with them the mind-boggling costs, of e-discovery. Which search technology and methodology is best, however, is vigorously contested. Not surprisingly, where one stands on the topic typically falls in lockstep with the company one represents. In this article, I will leave the bias aside and simply provide a basic overview and definition of the most common search technologies and methodologies used in e-discovery.


Over-inclusive vs. Under-inclusive

These self-explanatory terms apply in several contexts within the legal domain. With respect to e-discovery search, they refer to search practices that either (A) do not return all the documents that are relevant to a particular search (under-inclusive), or (B) return more than just the documents that are relevant to that search (over-inclusive). While it is generally accepted that every search methodology will exhibit some degree of each, too much of either is a recipe for disaster. Judge Paul Grimm’s opinion last year in the Victor Stanley case provides an interesting perspective on the over-inclusive versus under-inclusive issue, while also providing useful insight on how to enhance the defensibility of search in general.
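
To make the two terms concrete, here is a minimal sketch that compares what a search returned against what a reviewer judged relevant. The function name, the document IDs, and the use of the standard information-retrieval measures "precision" and "recall" are my own illustrative assumptions; they are not drawn from any particular e-discovery tool or from the Victor Stanley opinion.

```python
def inclusiveness_report(returned, relevant):
    """Compare a search result set against the set judged relevant.

    Hypothetical helper for illustration only.
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    return {
        # Relevant documents the search failed to return (under-inclusive).
        "missed": sorted(relevant - returned),
        # Returned documents that are not relevant (over-inclusive).
        "extra": sorted(returned - relevant),
        # Share of returned documents that are relevant.
        "precision": len(hits) / len(returned) if returned else 0.0,
        # Share of relevant documents that were returned.
        "recall": len(hits) / len(relevant) if relevant else 0.0,
    }


# Made-up document IDs: the search returned four documents,
# while a reviewer judged three documents relevant.
report = inclusiveness_report(
    returned=["doc1", "doc2", "doc3", "doc4"],
    relevant=["doc2", "doc3", "doc5"],
)
print(report)
```

A low recall figure signals an under-inclusive search, and a low precision figure signals an over-inclusive one. In practice the "relevant" set is never fully known, so figures like these are normally estimated from a reviewed sample.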


Keyword Search

Thanks to our friends at Google and Yahoo!, most of us are more or less familiar with how keyword search works. Simply type in a word or phrase, and the search engine returns the documents that contain that word or phrase. No rocket science there. Of course, even keyword search has its complexities. Boolean operators (AND, OR, NOT) are relatively straightforward. However, “wildcard” or “star” searching (e.g., “infra*” returns any document containing a word that begins with those characters), stemming (e.g., “jog” with stemming returns any document containing “jog”, “jogs”, “jogger”, “jogged”, or “jogging”), and “fuzzy” searching (returns any document containing various spellings and/or misspellings of a search term) have made keyword search a lot more susceptible to some of the issues that make search such a controversial and scrutinized subject within the e-discovery space.
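
The operators described above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how any commercial search engine works: the three sample documents are made up, wildcard matching uses the standard library's `fnmatch`, and "fuzzy" matching is approximated with `difflib` string similarity at an arbitrary cutoff of 0.8.

```python
import re
from difflib import get_close_matches
from fnmatch import fnmatchcase

# Made-up sample documents; "budgit" is a deliberate misspelling.
DOCS = {
    "doc1": "The infrastructure budget was approved",
    "doc2": "Infrared sensors and budget overruns",
    "doc3": "She went jogging before the budgit meeting",
}


def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Map each document ID to the set of words it contains.
INDEX = {doc_id: set(tokens(text)) for doc_id, text in DOCS.items()}


def keyword(term):
    """Documents containing the exact term."""
    return {d for d, words in INDEX.items() if term.lower() in words}


def wildcard(pattern):
    """Documents containing any word that matches a pattern such as 'infra*'."""
    p = pattern.lower()
    return {d for d, words in INDEX.items()
            if any(fnmatchcase(w, p) for w in words)}


def fuzzy(term, cutoff=0.8):
    """Documents containing a word spelled similarly to the term."""
    return {d for d, words in INDEX.items()
            if get_close_matches(term.lower(), words, n=1, cutoff=cutoff)}


# Boolean operators map naturally onto set operations.
print(keyword("budget") & wildcard("infra*"))   # AND
print(keyword("budget") | keyword("jogging"))   # OR
print(keyword("budget") - keyword("infrared"))  # NOT
print(fuzzy("budget"))
```

Note that `wildcard("infra*")` picks up the document about infrared sensors as well as the one about infrastructure, while `fuzzy("budget")` catches the misspelled "budgit" that an exact keyword search misses. Each convenience trades one kind of error for the other.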


Concept Search

Concept search, which has been getting a lot of buzz in the industry lately, tries to overcome the under-inclusive issue that is perceived to be inherent in traditional keyword search. The theory is that by simply looking for specific words, keyword search will miss documents regarding the same subject matter but containing synonyms of the keywords rather than the keywords themselves. For example, one document may refer to an employee as having been “fired”, while another may refer to the employee as having been “terminated.” While these documents clearly address the same topic, it’s easy to see how keyword search could miss one or the other. That said, broadening a search to cover related terms invites the flip side of the under-inclusive coin: over-inclusiveness, which can lead to many other problems, including skyrocketing costs.
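
The “fired” versus “terminated” example can be sketched as follows. Real concept search engines generally rely on statistical or linguistic models; this toy version swaps in a hand-built synonym table, and the table, the memos, and the function names are all hypothetical.

```python
import re

# Hand-built synonym table (an assumption for illustration only).
SYNONYMS = {"fired": {"fired", "terminated", "dismissed"}}

# Made-up sample documents.
DOCS = {
    "memo1": "Smith was fired on Friday.",
    "memo2": "Smith was terminated after the audit.",
    "memo3": "The contract terminated in March.",
}


def words(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term):
    """Plain keyword search: exact word match only."""
    return {d for d, text in DOCS.items() if term in words(text)}


def expanded_search(term):
    """Search for the term plus every synonym listed for it."""
    related = SYNONYMS.get(term, {term})
    return {d for d, text in DOCS.items() if related & words(text)}


print(keyword_search("fired"))
print(expanded_search("fired"))
```

The plain keyword search finds only memo1 and misses memo2 (under-inclusive). The expanded search finds memo2, but it also pulls in memo3, which concerns a contract rather than an employee (over-inclusive).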


Concept Categorization

Concept categorization, also known as concept or document clustering, gained mainstream prominence in the e-discovery space with the success of Attenex and Stratify. What these tools and others like them do is automatically bundle documents of similar subject matter together, which in turn increases reviewer productivity. While this notion certainly has merit, at some point the industry began to think of concept categorization as a search process, which it is not. Concept categorization looks at each document, applies some logic (e.g., Bayesian), and based on that logic determines which documents have similar subject matter. It does not take in a search term and retrieve documents based on that parameter.
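
To show what grouping without a search term looks like, here is a minimal sketch. It does not reproduce how Attenex, Stratify, or any Bayesian tool works; I have swapped in a much simpler technique (word-overlap, or Jaccard, similarity with a greedy grouping pass), and the stopword list, the 0.25 threshold, and the sample documents are assumptions of mine.

```python
import re

# Small, arbitrary stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "of", "to", "and", "for", "in", "on", "is", "was"}


def terms(text):
    """Return the set of meaningful words in a document."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def jaccard(a, b):
    """Word overlap between two term sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.25):
    """Greedily group documents whose terms overlap with a group's first member."""
    groups = []
    for doc_id, text in docs.items():
        doc_terms = terms(text)
        for group in groups:
            if jaccard(doc_terms, terms(docs[group[0]])) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([doc_id])
    return groups


docs = {
    "a": "Quarterly revenue forecast for the sales team",
    "b": "Lunch menu for the office party",
    "c": "Revised revenue forecast and sales targets",
    "d": "Office party lunch order",
}
print(cluster(docs))
```

Notice that `cluster` takes only the documents as input. No search term is supplied at any point, which is exactly why categorization is better thought of as a review aid than as a search process.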


“Black-Box”

Traditionally, search companies like the aforementioned Google and Yahoo! guard their search algorithms as extremely confidential trade secrets. This approach also applies to enterprise search companies, many of whom have had their technology adopted by the e-discovery industry. The problem is that secrecy is not a good thing when it comes to the question of how evidence was found. Imagine a homicide proceeding where the prosecution answers the question of how they found the murder weapon with, “Sorry Judge, it’s a secret.” Needless to say, that probably wouldn’t sit well with most judges. Unfortunately, that’s essentially what attorneys who use “black-box” search tools have to say when the same question is posed regarding documents in an e-discovery case. Of course, there are ways to mitigate this issue, and the key is to consult with opposing counsel and agree on a search methodology beforehand.


“Transparent”

The idea behind transparent search is to overcome the exact issue presented by black-box search algorithms. By bringing the operations that search engines typically perform behind the scenes to the forefront and, in some cases, allowing users to tweak those operations, transparent search is supposedly better suited for e-discovery and the defensibility requirement inherent in any litigation. Take, for example, the stemming capabilities offered by many keyword search tools. In a typical black-box search tool, a stemming-enabled search for the term “jog” might automatically return any documents containing the words “jog”, “jogs”, “jogger”, “jogged”, and “jogging.” These stem variants would be largely unknown to the user, who would have no ability to tweak them. Transparent search, on the other hand, would allow users to adjust the stem variants as necessary, thereby limiting the search to only the desired terms and providing full “transparency” into the process. While the benefits of transparent search are clear, the technology is quite new and has yet to be proven in the mainstream e-discovery space. Furthermore, transparent search creates more complexity and additional areas of debate between opposing counsel. Time will tell if the promise of transparent search truly comes to fruition.
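
The stemming example might look like this in a transparent tool. This is only a sketch under stated assumptions: the variant table is hand-written rather than produced by a real stemmer, and the function names, the `exclude` parameter, and the sample documents are hypothetical.

```python
import re

# Hand-written variant table standing in for a real stemmer (an assumption).
STEM_VARIANTS = {"jog": ["jog", "jogs", "jogger", "jogged", "jogging"]}


def expand(term, exclude=()):
    """List the stem variants that will be searched, minus any the user drops."""
    dropped = set(exclude)
    return [v for v in STEM_VARIANTS.get(term, [term]) if v not in dropped]


def search(docs, term, exclude=()):
    """Run the search and report exactly which variants were used."""
    active = expand(term, exclude)
    hits = sorted(
        doc_id for doc_id, text in docs.items()
        if set(re.findall(r"[a-z]+", text.lower())) & set(active)
    )
    # Returning the variant list alongside the hits is the "transparent" part.
    return {"term": term, "variants_used": active, "hits": hits}


docs = {
    "d1": "He went jogging.",
    "d2": "The jogger stopped.",
    "d3": "Walk, don't run.",
}
print(search(docs, "jog"))
print(search(docs, "jog", exclude=["jogger"]))
```

Because every result carries the list of variants that produced it, the search can be documented, repeated, and negotiated with the other side. That record is what a black-box tool cannot supply.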


E-discovery search is a complex and ever-evolving topic. As with so much of the technology in the industry, e-discovery has had to make do with search tools that were not designed for use in litigation or a legal setting, which has led to lessons being learned the hard way. However, with the development of e-discovery-specific search technology, together with more refined search methodologies, the future is bright, and search promises to play an increasingly important role in e-discovery.
