Minesoft’s Chemical Explorer helps users to find chemical information in patents issued by multiple authorities. Robert Poolman explains how it works.
‘Open innovation’ was a term coined by organisational theorist Henry Chesbrough to emphasise that innovators should use internal and external ideas to create value and advance their technology. Patents are an essential source of external ideas and are often the first and only source of public disclosure of a new invention, especially in the world of chemistry.
When a chemical entity is also disclosed in the scientific literature, this happens on average one to three years after its publication in a patent document. Ignoring patents as an information source will therefore, at best, delay innovation and, at worst, prevent it.
Until recently, chemical information in patents was available only through a number of well-established databases, such as Chemical Abstracts Service's SciFinder, Elsevier's Reaxys and Thomson Reuters' Cortellis, which rely on costly manual indexing. Access to these premium-priced databases was therefore limited mainly to industrial scientists. Furthermore, although manual indexing can provide accurate data, the ever-increasing volume of patents published each year has forced compromises in coverage and turnaround time.
Advances in chemical named entity recognition (CNER) technology have enabled large-scale, automated data mining. SureChEMBL and IBM's Strategic IP Insight Platform are examples of databases that use this technology, but they have limitations in country coverage, language recognition and data timeliness.
Extract and identify
Building on the success of its full-text patent database PatBase, Minesoft launched Chemical Explorer in November 2015 to give users an alternative resource. Using CNER technology and updated daily, Chemical Explorer extracts and identifies chemical entities disclosed in English, French, German, Chinese, Japanese and Korean in the full text of patents issued by more than 12 authorities, and makes them available at a reasonable, all-inclusive subscription cost.
Minesoft's Chemical Explorer covers patent-issuing authorities including the US, UK, Australia, Israel, India, the European Patent Office and the Patent Cooperation Treaty, and it also extracts chemical information from the original non-Latin text of Chinese, Japanese and Korean patents.
This is important for two reasons. First, non-Latin patents now account for more than half of all national patent filings and are therefore increasingly important when determining whether a new compound is novel and hence patentable. Second, non-Latin chemical nomenclature differs considerably from Latin nomenclature, so CNER achieves significantly higher precision and recall on the original non-Latin text than on its machine translation.
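For readers unfamiliar with the terms, precision and recall are the standard measures of extraction quality. Precision is the share of extracted names that are genuine chemical entities; recall is the share of the genuine chemical entities in the text that the tool actually finds. Writing TP, FP and FN for true positives, false positives and false negatives (notation introduced here, not taken from the article):

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
```

For example, a tool that extracts 100 names, 90 of them correct, from a text containing 120 chemical entities has a precision of 90/100 = 0.90 and a recall of 90/120 = 0.75.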
As well as in text, chemical compounds are often disclosed as structural images. Since 2001, the US Patent and Trademark Office (USPTO) has required applicants to submit chemical structures as computer-readable MDL molfiles and ChemDraw files. To make its coverage more comprehensive, Chemical Explorer uses this data: it has extracted and indexed the chemicals disclosed as images in all US patents and applications from 2001 to date.
The use of CNER technology in Chemical Explorer has opened up a new world of chemical prior art where, compared to manual indexing, there is no limit to the number of chemicals identified per document, no indexing policy dictating what is identified and indexed, and previously unidentified non-Latin chemicals are made available to all.
Users can easily draw or import a chemical structure, or retrieve one from a generic name, trade name, International Union of Pure and Applied Chemistry (IUPAC) name or even a Chemical Abstracts Service (CAS) Registry Number, so both chemists and non-chemists can carry out a structure-based search.
After the user selects the type of search (eg, identity, similarity or substructure) and the section of the patent to search, structure hits are retrieved instantly, with details of each compound (including names, simplified molecular-input line-entry system (SMILES) strings, International Chemical Identifier (InChI) keys and molecular weight) and links to external resources such as PubChem. The number of patent documents disclosing each structure is also shown and linked directly to PatBase, so users can identify which publications within a patent family disclose the compound of interest.
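The article does not say how Chemical Explorer scores similarity hits. A common approach in cheminformatics, shown here purely as an assumed illustration, is to compare structural fingerprints with the Tanimoto coefficient and to use the same fingerprints as a quick pre-screen for substructure search. The fingerprints below are made-up sets of bit positions, and `similarity_search` and `substructure_screen` are hypothetical names, not part of any Minesoft API.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a compound's fingerprint is the set of "on"
# bit positions. Real systems derive these from the structure; the values
# used below are invented for illustration.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: bits set in both / bits set in either."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def similarity_search(
    query: Fingerprint,
    index: Dict[str, Fingerprint],
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (compound_id, score) pairs at or above threshold, best first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in index.items()]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def substructure_screen(query: Fingerprint, index: Dict[str, Fingerprint]) -> List[str]:
    """Pre-screen only: a compound can contain the query substructure only if
    every query bit is also set in its fingerprint. Survivors would still need
    an exact atom-by-atom graph match to confirm the hit."""
    return [cid for cid, fp in index.items() if query <= fp]


if __name__ == "__main__":
    index = {
        "cmpd-A": frozenset({1, 4, 9, 16, 25}),
        "cmpd-B": frozenset({1, 4, 9, 16}),
        "cmpd-C": frozenset({2, 3, 5, 7}),
    }
    query = frozenset({1, 4, 9, 16})
    print(similarity_search(query, index))    # [('cmpd-B', 1.0), ('cmpd-A', 0.8)]
    print(substructure_screen(query, index))  # ['cmpd-A', 'cmpd-B']
```

The screen is deliberately one-sided: it can rule compounds out cheaply but cannot confirm a substructure match on its own, which is why production systems follow it with a full graph comparison.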
Mining patent text
Chemical patents can run to hundreds or even thousands of pages, so finding where in a document the compound of interest is disclosed is a constant challenge. Being able to locate every instance of the chemical in a patent document quickly saves considerable time and makes reviewing multiple documents more efficient.
With this in mind, Minesoft has developed a new visualisation tool, TextMine. Used with Chemical Explorer, TextMine pinpoints the exact locations of the chemical in the full text of the patent document. TextMine can also extract and automatically highlight all the chemical, genetic, disease, polymer and engineering terms and physical parameters disclosed in a patent document, so it can be used more generally to aid the review of patent documents in PatBase.
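The article does not describe how TextMine works internally. As a rough, hypothetical illustration of the basic idea only, the sketch below takes a list of terms already recognised in a document (for example by CNER), finds the character offsets of every whole-word occurrence, and marks them up for review. The names `find_mentions` and `highlight` and the `[[...]]` markers are assumptions; a real tool would also have to handle name variants, tokenisation and OCR noise.

```python
import re
from typing import Iterable, List, Tuple

Mention = Tuple[str, int, int]  # (term, start offset, end offset)


def find_mentions(text: str, terms: Iterable[str]) -> List[Mention]:
    """Return every whole-word, case-insensitive occurrence of each term,
    ordered by position (longest match first when two start together)."""
    mentions: List[Mention] = []
    for term in terms:
        # re.escape protects brackets, commas and hyphens in systematic names;
        # the lookarounds stop "ethanol" matching inside "aminoethanol".
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for match in pattern.finditer(text):
            mentions.append((term, match.start(), match.end()))
    mentions.sort(key=lambda m: (m[1], -(m[2] - m[1])))
    return mentions


def highlight(text: str, mentions: List[Mention]) -> str:
    """Wrap each non-overlapping mention in [[...]] markers."""
    parts: List[str] = []
    pos = 0
    for _, start, end in mentions:
        if start < pos:  # overlaps a mention that is already highlighted
            continue
        parts.append(text[pos:start])
        parts.append("[[" + text[start:end] + "]]")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


if __name__ == "__main__":
    doc = "Aspirin is acetylsalicylic acid; aminoethanol is not ethanol."
    found = find_mentions(doc, ["aspirin", "acetylsalicylic acid", "ethanol"])
    print(highlight(doc, found))
    # [[Aspirin]] is [[acetylsalicylic acid]]; aminoethanol is not [[ethanol]].
```

Keeping the offsets separate from the markup means the same list of mentions could drive other views, such as a hit count per section or a jump-to-next-occurrence control.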
An additional advantage of the seamless linking between Chemical Explorer and PatBase is the ability to combine results from Chemical Explorer with the wealth of patent data available in PatBase. With more than 100 million patent publications from 100-plus patent-issuing authorities in a single database, PatBase allows searches to be carried out on patent documents from all sectors published anywhere in the world.
The ability to interrogate patent information from multiple angles, including keyword, assignee, inventor and patent classification as well as chemical structure, makes searches more comprehensive, so that meaningful decisions can be based on the results.
Regularly searching and reviewing the patent literature is crucial for survival in any innovation-driven industry. Many industries, including pharmaceutical, biotechnology, chemical, consumer, cosmetics and engineering, rely on chemicals, and for them access to a comprehensive database of global patent data is critical to building a strong patent portfolio, monitoring competitor activity and creating a long-term competitive advantage.
Minesoft’s continuing commitment to developing innovative products for the patent information market ensures that organisations are empowered to harness new technologies and exploit patent literature to boost innovation globally.
Robert Poolman is a senior manager at Minesoft. He has more than 14 years of experience in the information industry, including managing the NIBR search and analytics team at Novartis. He joined Minesoft in 2014 and focuses on spearheading future product development. You can find out more about Chemical Explorer at: www.minesoft.com