Patentopia

A multi-stage patent extraction platform with disambiguation for certain semantic challenges


Abstract

Bibliographic name disambiguation is a major semantic challenge, yet it is critical to social science studies of important intellectual assets. We contribute to innovation research in several ways. First, we show a significant synonym problem in author names and discuss how a heuristic pre-processing step that standardizes name variants helps. Homonyms arising from Chinese names, however, are particularly difficult to resolve and manifest in the associated location list. Second, we identify a new phenomenon of "onomastic profusion": the frequent use of certain words in firm names for semantic reasons, which can confound disambiguation clustering algorithms. We illustrate these concerns with Patentopia, our customized platform that accesses the PatentsView portal for the United States Patent and Trademark Office database and is available for free academic use. This multi-stage system uses heuristics in concert with the PatentsView clustering process and reports metadata to further assist analysis. As highly relevant use cases, we illustrate system performance with data derived from two important public innovation programs, I-Corps and Small Business Innovation Research (SBIR). We close with implications for bibliometric analysis of current patent data.
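
To make the synonym problem concrete, the following is a minimal sketch of what a heuristic step that standardizes name variants might look like. It is not Patentopia's actual implementation: the function names `standardize_name` and `group_variants`, the suffix list, and the specific normalization rules are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical suffix list for illustration; not taken from Patentopia.
_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def standardize_name(raw: str) -> str:
    """Return a normalized key for a personal name (illustrative heuristic)."""
    # Reorder "Last, First" into "First Last" using the first comma only.
    if "," in raw:
        last, _, first = raw.partition(",")
        raw = f"{first} {last}"
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace anything that is not a letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Drop generational suffixes and collapse repeated whitespace.
    tokens = [tok for tok in text.split() if tok not in _SUFFIXES]
    return " ".join(tokens)


def group_variants(names):
    """Group raw name strings that share the same standardized key."""
    groups = {}
    for name in names:
        groups.setdefault(standardize_name(name), []).append(name)
    return groups
```

A step of this kind only merges spelling variants of one name (synonyms). It cannot separate distinct people who share a name (homonyms), which is why the abstract treats that case as needing further evidence such as location.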

Files

Patentopia_A_multi_stage_paten... (pdf)
(pdf | 1.3 Mb)
- Embargo expired on 26-07-2023