26 October 2017 | Americas | Ellen Sherin

Searching for sequences: a high stakes game

The intent of patent searching is to ask: “What’s out there that is related to my invention?” It could be prior art affecting patentability, or it could be claimed IP (granted or not) with its associated risk of patent infringement. The search results, and the IP practitioner’s interpretation of them, are key factors in deciding the fate of the invention or technology being searched.

The potential financial implications of this type of decision can be enormous. According to the AIPLA 2015 Report of the Economic Survey, the median costs of patent litigation through trial and appeal ranged from $600,000 when less than $1 million was at risk to more than $5 million when more than $25 million was at risk. Those are legal costs alone, before any damages for infringement. The litigation between Idenix (Merck) and Gilead over infringement and invalidity of multiple patents for hepatitis C drugs is an example: Gilead’s August 2017 Form 10-Q estimates its potential liability across the various legal actions at up to $9 billion.

In one case (now on appeal), Merck was awarded a $2.5 billion judgment, with the potential for treble damages in future. But Merck also lost other cases: two Merck patents were invalidated in favour of Gilead’s patents, one for lack of enablement and the other for prosecutorial misconduct. The real winners are ultimately the law firms; The American Lawyer’s July 18, 2017 edition reports a $12.5 million legal fees award in just one of these cases.

Genetic patents are not exempt. In March 2017, an appeal confirmed Bayer’s $469 million award against Dow for patent infringement regarding use of a gene conferring glufosinate resistance in plants.

These cases exemplify the high stakes in IP law and the need to “get it right” the first time, whether determining freedom to operate or patentability, deciding whether to kill or advance a research project, or forming an opinion on the validity of a given patent.

Let’s take a closer look at how sequences are addressed in patents. At a very basic level, they are treated as text strings, and the descriptor “percent identity” expresses what proportion of the sequence letters match. Percent identity is often used in claims and is a key screening parameter when searching.
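To make the “text string” view concrete, here is a minimal Python sketch that compares two sequences position by position, with no alignment step at all. The function name and the 24-base example sequence are hypothetical illustrations, not any tool’s or patent’s method; the point is how fragile this naive view becomes once the strings differ.

```python
def naive_percent_identity(a: str, b: str) -> float:
    """Position-by-position comparison with no alignment step (illustration only)."""
    length = max(len(a), len(b))
    if length == 0:
        raise ValueError("both sequences are empty")
    # Count positions where the two strings carry the same letter.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / length


ref = "ATGGCCATTGTAATGGGCCGCTGA"        # hypothetical 24-base sequence
variant = ref[:3] + "T" + ref[3:]       # same sequence with one base inserted after ATG
print(naive_percent_identity(ref, ref))      # identical strings
print(naive_percent_identity(ref, variant))  # one insertion shifts every later position
```

With this example, a single inserted base shifts every downstream position, and the naive score falls from 100% to 36% even though the two sequences are almost the same.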

Not the whole story

Of course, it isn’t quite that simple. The complexity arises when matching up these long strings where they differ. That’s where sequence search algorithms come in. Different algorithms handle matching differently, so the percent identity also varies with the choice of algorithm and its parameters.

Evaluating a percent identity claim defined using the Smith-Waterman algorithm against a BLAST search result may lead to erroneous conclusions. Furthermore, sequence search algorithms have settings that affect the percent identity, and even which sequences they find (or miss). Alignments created with the same algorithm but different parameters often give different results.
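The effect of parameters can be sketched with a plain textbook Smith-Waterman local alignment using a linear gap penalty, followed by identity counted over alignment columns. This is not BLAST, and not any patent’s or vendor’s configuration; the scoring values (match +2, mismatch -3, gap -1 or -10) and the two toy sequences are arbitrary assumptions chosen only to show that changing one parameter changes the reported identity.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -3,
                   gap: int = -1) -> tuple[str, str]:
    """Textbook Smith-Waterman with a linear gap penalty; returns the gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x) if x else 0.0


query = "AAAAACCCCC"
subject = "AAAAAGGGCCCCC"   # same two blocks, separated by three extra bases
for gap in (-1, -10):
    aln_q, aln_s = smith_waterman(query, subject, gap=gap)
    print(f"gap={gap}: {aln_q} / {aln_s}  identity={percent_identity(aln_q, aln_s):.1f}%")
```

With the cheap gap penalty the alignment spans both blocks and reports 10 identities over 13 columns (about 76.9%); with the costly one it reports a single five-base block at 100%. Neither number is wrong; they answer different questions, which is why the algorithm and parameters behind a claimed percent identity matter.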

The percent identity of a search result, calculated according to a patent’s definition, may fall within a claim’s scope; calculated with a different algorithm or different parameters, the same result may appear to fall outside it, or vice versa. As a result, either an incorrect clearance may be given or a promising project may be cancelled.
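As a further illustration, here is a sketch of how the choice of denominator alone can move one alignment across a claim threshold. The three conventions, the 90% threshold and the sequences are hypothetical; real patents define percent identity in their own terms, and that definition is the one that controls.

```python
def identity_under_conventions(aln_q: str, aln_s: str, query_len: int) -> dict[str, float]:
    """Percent identity of one pairwise alignment under three common denominators."""
    if len(aln_q) != len(aln_s):
        raise ValueError("aligned strings must have equal length")
    columns = list(zip(aln_q, aln_s))
    matches = sum(1 for q, s in columns if q == s and q != "-")
    ungapped = sum(1 for q, s in columns if q != "-" and s != "-")
    return {
        "alignment_length": 100.0 * matches / len(columns),  # gap columns included
        "ungapped_columns": 100.0 * matches / ungapped,       # gap columns excluded
        "query_length": 100.0 * matches / query_len,          # whole query, aligned or not
    }


CLAIM_THRESHOLD = 90.0   # hypothetical claim: "at least 90% identity"

aln_q = "ACGTACGTACGTACGTACGT"   # 20 aligned query bases...
aln_s = "ACGTACG-ACGTTCGTA-GT"   # ...against a subject with one mismatch and two gaps
# Hypothetical 24-base query: 4 bases fall outside the local alignment.
results = identity_under_conventions(aln_q, aln_s, query_len=24)
for convention, pid in results.items():
    verdict = "inside" if pid >= CLAIM_THRESHOLD else "outside"
    print(f"{convention:>16}: {pid:5.1f}%  ({verdict} the claimed range)")
```

With these made-up inputs the same alignment scores 85.0% against alignment length (outside the hypothetical claim), 94.4% against ungapped columns (inside), and about 70.8% against the full 24-base query (outside).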

Missed hits

BLAST is the algorithm most commonly used in sequence claims and is also frequently used for sequence searching, so, depending on the query sequence’s characteristics, it is often a good starting point. However, some types of query sequence require either BLAST parameter adjustments or different algorithms for the most complete results.

Short sequences such as complementarity-determining regions (CDRs), probes and primers are better searched with an algorithm like GenePast, because BLAST misses hits on short queries unless its parameters are adjusted for them.
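Why short queries get missed can be sketched with a simplified model of BLAST-style seeding: a hit is only pursued if query and subject share an exact word of some minimum length. This toy function ignores extension, scoring and E-values, so it illustrates the principle rather than BLAST’s actual behaviour. For reference, the NCBI BLAST+ documentation lists a default word size of 11 for the blastn task and 7 for the blastn-short task, which is intended for queries shorter than 50 bases; the 20-base primer and its two-mismatch target below are hypothetical.

```python
def shares_seed(query: str, subject: str, word_size: int) -> bool:
    """True if query and subject share at least one exact word of length word_size."""
    if len(query) < word_size:
        return False  # query too short to form even one word
    query_words = {query[i:i + word_size] for i in range(len(query) - word_size + 1)}
    return any(
        subject[i:i + word_size] in query_words
        for i in range(len(subject) - word_size + 1)
    )


primer = "GATTACACCGTAGCTTGCAA"   # hypothetical 20-base primer
target = "GATTACAGCGTAGCATGCAA"   # same sequence with substitutions at positions 8 and 15
for word_size in (11, 7):
    print(word_size, shares_seed(primer, target, word_size))
```

Here the two mismatches leave no exact 11-base run, so an 11-base word finds nothing, while a 7-base word still catches the intact first seven bases. Short queries also tend to produce weak E-values, which this sketch does not model.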

Alignments of genomic sequences against non-genomic DNA often break into pieces, known as multiple high-scoring segment pairs (mHSPs). Sequences with significant insertions or deletions may also break up this way, as will sequences with relatively long mismatched regions between matched regions. Depending on the searcher’s knowledge and the search product used, mHSPs may be detectable, but both the searcher and the IP practitioner must understand how to identify and evaluate these results.
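A first step in spotting mHSPs can be sketched as grouping the HSPs from one search by subject and keeping subjects hit by several HSPs that run in the same order on query and subject, as exons of a genomic query would against an mRNA. The `HSP` record, its fields and the example coordinates are hypothetical and plus-strand only; this is not how any particular search product groups hits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    subject_id: str
    q_start: int   # 1-based start on the query
    q_end: int
    s_start: int   # 1-based start on the subject (plus strand assumed)
    s_end: int


def colinear_groups(hsps: list[HSP]) -> dict[str, list[HSP]]:
    """Subjects hit by 2+ HSPs whose order agrees on query and subject."""
    by_subject: dict[str, list[HSP]] = defaultdict(list)
    for h in hsps:
        by_subject[h.subject_id].append(h)
    groups: dict[str, list[HSP]] = {}
    for subject_id, hits in by_subject.items():
        hits.sort(key=lambda h: h.q_start)
        in_order = all(
            prev.q_start < nxt.q_start and prev.s_start < nxt.s_start
            for prev, nxt in zip(hits, hits[1:])
        )
        if len(hits) > 1 and in_order:
            groups[subject_id] = hits
    return groups


hits = [
    HSP("mRNA_A", 1, 120, 1, 120),        # exon-like piece 1
    HSP("mRNA_A", 401, 560, 121, 280),    # piece 2, after an intron-like gap on the query
    HSP("mRNA_A", 901, 1000, 281, 380),   # piece 3
    HSP("other_B", 50, 90, 700, 740),     # lone hit
    HSP("shuffled_C", 1, 50, 500, 550),   # two hits in opposite order
    HSP("shuffled_C", 100, 150, 10, 60),
]
for subject_id, group in colinear_groups(hits).items():
    print(subject_id, [(h.q_start, h.q_end) for h in group])
```

The three exon-like HSPs against `mRNA_A` come back as one candidate group; the single hit on `other_B` and the out-of-order pair on `shuffled_C` do not. Whether such a group is meaningful still needs the practitioner’s judgement.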

One approach is GenomeQuest’s “query % HSP coverage” field, which combines the separate pieces into a group with an overall calculated percent identity, giving the practitioner a preliminary idea of the potential relevance of each group of HSPs.
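GenomeQuest’s exact calculation for this field is not given here, so the following is only a hypothetical sketch of the general idea: measure how much of the query the union of a group’s HSPs covers, and pool identities across the group. The `HSP` fields, the pooling rule and the numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    q_start: int      # 1-based, inclusive start on the query
    q_end: int        # 1-based, inclusive end on the query
    identities: int   # identical positions within this HSP
    align_len: int    # alignment columns in this HSP, gaps included


def group_summary(hsps: list[HSP], query_len: int) -> tuple[float, float]:
    """Return (% of query covered by the union of HSPs, pooled % identity)."""
    if not hsps:
        raise ValueError("empty HSP group")
    covered = 0
    ordered = sorted(hsps, key=lambda h: h.q_start)
    cur_start, cur_end = ordered[0].q_start, ordered[0].q_end
    for h in ordered[1:]:
        if h.q_start > cur_end + 1:
            # Disjoint interval: close the current run and start a new one.
            covered += cur_end - cur_start + 1
            cur_start, cur_end = h.q_start, h.q_end
        else:
            # Overlapping or adjacent interval: extend the current run.
            cur_end = max(cur_end, h.q_end)
    covered += cur_end - cur_start + 1
    coverage = 100.0 * covered / query_len
    identity = 100.0 * sum(h.identities for h in hsps) / sum(h.align_len for h in hsps)
    return coverage, identity


group = [
    HSP(1, 125, identities=122, align_len=125),
    HSP(121, 280, identities=156, align_len=160),
    HSP(281, 400, identities=117, align_len=120),
]
coverage, identity = group_summary(group, query_len=400)
print(f"query coverage {coverage:.1f}%, pooled identity {identity:.1f}%")
```

Merging overlapping intervals before measuring coverage matters: summing the three HSP lengths naively would claim 405 bases of a 400-base query. The pooled identity here still counts the small overlap twice, a simplification a real implementation might handle differently.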
