Updating arXiv papers

CERMINE, RefExtract, and GROBID each excelled at different fields in the metadata.

We also added a few extra extraction steps of our own to be sure that we caught arXiv identifiers, and to supplement DOI detection. We then integrate those extractions using a likelihood-based approach.

Deciding how to present extracted references to readers is by far the most challenging and complex part of this project. For example, presentation of cited references on the abstract page has become a fairly standard practice for bibliographic databases (they are “metadata”, after all), but some worry that displaying them apart from the text obscures important context (e.g. is the author criticizing a work, or building on it?). We also know that some users link directly to the PDF, bypassing the abstract page entirely.

One of the decisions that we made early on is that, wherever the reader is ultimately directed, all of the links will point back to a resolver endpoint that we control.
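To make the integration step a little more concrete, here is a minimal Python sketch of how extractions from several tools might be merged field by field. The likelihood model itself is not spelled out here, so the sketch substitutes a simple reliability-weighted vote as a stand-in. The `RELIABILITY` table, its numbers, the field names, and the `integrate` and `normalize` helpers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical per-extractor, per-field reliabilities.
# These values are illustrative assumptions, not measured results.
RELIABILITY = {
    "cermine": {"title": 0.6, "year": 0.8, "doi": 0.5},
    "refextract": {"title": 0.4, "year": 0.6, "doi": 0.9},
    "grobid": {"title": 0.9, "year": 0.7, "doi": 0.6},
}
DEFAULT_RELIABILITY = 0.5  # used for unknown extractors or fields


def normalize(value):
    """Lowercase and collapse whitespace so near-identical values agree."""
    return " ".join(str(value).lower().split())


def integrate(extractions):
    """Merge {extractor: {field: value}} into a single {field: value} record.

    Each extractor votes for its value with a weight equal to its assumed
    reliability for that field; the value with the highest total wins.
    """
    scores = defaultdict(lambda: defaultdict(float))
    originals = {}  # first-seen original spelling for each normalized value
    for extractor, fields in extractions.items():
        for field, value in fields.items():
            if value in (None, ""):
                continue  # the extractor found nothing for this field
            key = normalize(value)
            weight = RELIABILITY.get(extractor, {}).get(field, DEFAULT_RELIABILITY)
            scores[field][key] += weight
            originals.setdefault((field, key), value)

    merged = {}
    for field, candidates in scores.items():
        best = max(candidates, key=candidates.get)
        merged[field] = originals[(field, best)]
    return merged


# Made-up extractions of one cited reference from three tools.
extractions = {
    "cermine": {"title": "Deep learning", "year": "2015", "doi": None},
    "refextract": {"title": "Deep learnin", "year": "2015", "doi": "10.1038/nature14539"},
    "grobid": {"title": "Deep Learning", "year": "2051"},
}
merged = integrate(extractions)
print(merged)
```

In this toy input, two tools agree on the title and on the year, so their combined weight outvotes the third, while the DOI comes from the only tool that found one. A real likelihood-based scheme would presumably estimate such weights from labeled data rather than hard-coding them.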

