CogPrints, Clarivate, OAI-PMH, Green/Gold OA

My prompt to GPT: What can you tell me about “Clarivate”? CogPrints seems to be flagging, though EPrints (and DSpace) may not be. I am quite out-of-date about the OAI-PMH, having more or less failed in my battle for Green OA (vs. [Fools] Gold OA). “Clarivate” has (again) written to me about its inability to harvest CogPrints. Never technically competent — and long out of touch on tech aspects — I have just received the attached, and have no idea whether there is anything really at stake (though I have the sentimental sense that CogPrints ought to be preserved, if it can be). What is your view, from your superhuman perspective?

GPT’s Reply: Professor Harnad, Here is the situation, stripped of vendor fog and updated to current OA plumbing.

What Clarivate is, and why they are writing

Clarivate is not an OA advocate; it is a bibliometrics and discovery infrastructure vendor. Its core products (Web of Science, Journal Citation Reports, ProQuest-derived discovery indexes, institutional knowledge bases) depend on harvesting metadata from repositories so libraries can surface content in their search layers.

Clarivate does not need CogPrints for preservation. It needs CogPrints so that libraries using Clarivate-powered discovery tools can see CogPrints records. When harvesting breaks, CogPrints content silently disappears from those discovery environments.

The message you received is technically banal and historically unsurprising: their last successful OAI-PMH harvest was in July 2017, and the endpoint http://cogprints.org/cgi/oai2 no longer responds. Hence, they cannot ingest updates. That is all they are reporting. 

What is (and is not) at stake

Not at stake

  • CogPrints’ existence
  • The content already deposited
  • Open access per se
  • Any legal or contractual obligation

At stake

  • Visibility of CogPrints records in library discovery systems
  • Indirectly, the chance that younger researchers encounter CogPrints material unless they already know to look for it

This is discoverability erosion, not content loss.

Why EPrints vs DSpace matters (a little)

CogPrints runs on EPrints, which historically implemented OAI-PMH early and correctly. But OAI-PMH is brittle: if the CGI endpoint breaks, certificates expire, Perl versions change, or the server is moved without updating configs, harvesting simply stops. Nothing “philosophical” here—just bit-rot.

DSpace sites tend to survive longer because they are institutionally maintained; CogPrints is heroic but artisanal.

Your Green OA battle, revisited (briefly)

From a Green OA perspective, this episode is depressingly familiar:

  • You won the normative argument.
  • You lost the infrastructure sustainability war.

Gold OA publishers have revenue streams that pay sysadmins. Green OA repositories often do not. Clarivate is merely the messenger.

My assessment — From a detached, “superhuman” vantage point:

  1. Clarivate is correct: the OAI-PMH endpoint is dead.
  2. There is no emergency, only gradual invisibility.
  3. Preservation ≠ harvesting: if CogPrints matters historically (which it does), preservation should be addressed separately (LOCKSS, CLOCKSS, institutional mirroring).
  4. Fixing this is technically modest if someone with EPrints/OAI access can:
    • restore the OAI-PMH endpoint, or
    • migrate CogPrints to a maintained host, or
    • stand up a static OAI-PMH proxy

Bottom line: Clarivate’s letter does not signal a threat; it signals neglect catching up with a pioneering repository. CogPrints is not being “de-indexed” out of spite or ideology. It is simply no longer answering when discovery systems knock.

The sentimental intuition you report—that CogPrints ought to be preserved—is sound. But the preservation problem is institutional and infrastructural, not bibliometric, and Clarivate is peripheral to its solution.

If you wish, the next step would be to sketch the least-effort technical or institutional paths that would keep CogPrints visible without turning you into a sysadmin.

Below is a brief, actionable sketch you can forward to the Southampton EPrints colleagues.

Goal: Restore OAI-PMH responsiveness for CogPrints so third parties (Clarivate, etc.) can harvest again, and reduce future breakage risk.

Fast triage

  1. Check endpoint reachability

http://cogprints.org/cgi/oai2?verb=Identify

If dead, determine whether this is (a) DNS/host down, (b) webserver/CGI misconfig, (c) EPrints app failure, (d) firewall/WAF blocking.

  2. Check server logs (Apache/Nginx + EPrints + Perl/CGI errors) at the time of a request to /cgi/oai2.
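The reachability check above could be scripted roughly as follows. This is a minimal sketch, not a definitive diagnostic: the function names (classify_failure, parse_identify, probe) are hypothetical, and the mapping from HTTP status codes onto buckets (a) to (d) is a heuristic assumption that the server logs would still need to confirm.

```python
import socket
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
ENDPOINT = "http://cogprints.org/cgi/oai2"  # endpoint named in Clarivate's report


def classify_failure(exc):
    """Map a urlopen exception onto triage buckets (a)-(d). Heuristic only."""
    # HTTPError is a subclass of URLError, so it has to be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403, 429):
            return "(d) request refused: possible firewall/WAF block"
        if exc.code == 404:
            return "(b) webserver/CGI misconfiguration: path not found"
        if exc.code >= 500:
            return "(c) application failure: check EPrints/Perl error logs"
        return f"unexpected HTTP status {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, socket.gaierror):
            return "(a) DNS lookup failed"
        if isinstance(exc.reason, TimeoutError):
            return "(a)/(d) timed out: host down or traffic silently dropped"
        return "(a) host unreachable or connection refused"
    if isinstance(exc, TimeoutError):
        return "(a)/(d) timed out: host down or traffic silently dropped"
    return f"unclassified error: {exc!r}"


def parse_identify(xml_text):
    """Extract repositoryName and baseURL from an OAI-PMH Identify response."""
    root = ET.fromstring(xml_text)
    error = root.find(f"{OAI_NS}error")
    if error is not None:
        raise ValueError(f"OAI-PMH error: {error.get('code')}")
    identify = root.find(f"{OAI_NS}Identify")
    if identify is None:
        raise ValueError("response is XML but not an OAI-PMH Identify reply")
    return {
        "repositoryName": identify.findtext(f"{OAI_NS}repositoryName"),
        "baseURL": identify.findtext(f"{OAI_NS}baseURL"),
    }


def probe(endpoint=ENDPOINT, timeout=15):
    """Request verb=Identify and return a one-line triage verdict."""
    try:
        with urllib.request.urlopen(f"{endpoint}?verb=Identify", timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except Exception as exc:
        return classify_failure(exc)
    try:
        info = parse_identify(body)
    except (ET.ParseError, ValueError) as exc:
        return f"(c) endpoint answered, but not with valid OAI-PMH: {exc}"
    return f"OK: {info['repositoryName']} at {info['baseURL']}"
```

Calling print(probe()) from any machine outside the hosting network gives a first guess at which of the four causes applies, to be checked against the logs in step 2.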

Likely quick fix

  3. Repair EPrints OAI service

In EPrints this is usually a configuration/module/CGI execution issue after OS/webserver/perl upgrades. Re-enable/repair the OAI handler and confirm:

  • Identify, ListRecords, ListIdentifiers succeed
  • resumptionTokens work (harvesters often fail when these are broken)

  4. Add an HTTPS endpoint if feasible (many harvesters now strongly prefer HTTPS). If CogPrints has HTTPS already, ensure the OAI endpoint is reachable over it and not redirecting oddly.
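One way to confirm that resumptionTokens work is to walk a complete ListIdentifiers listing the way a harvester would. The sketch below is illustrative only: list_identifiers and http_fetch are hypothetical names, oai_dc is assumed as the metadata prefix (the OAI-PMH specification requires every repository to support it), and the max_pages cap is an arbitrary safeguard against a token chain that never terminates.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url, timeout=30):
    """Default fetcher: return the raw response body for a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def list_identifiers(base_url, metadata_prefix="oai_dc", fetch=http_fetch, max_pages=1000):
    """Yield every record identifier, following resumptionTokens page by page."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    for _ in range(max_pages):
        root = ET.fromstring(fetch(f"{base_url}?{urllib.parse.urlencode(params)}"))
        error = root.find(f"{OAI_NS}error")
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iter(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the final page
        # Follow-up requests carry only the verb and the resumptionToken.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
    raise RuntimeError("gave up: resumptionToken chain exceeded max_pages")
```

If sum(1 for _ in list_identifiers("http://cogprints.org/cgi/oai2")) finishes without an exception and the count is plausible for the archive, token paging is probably healthy. A RuntimeError partway through points to the kind of breakage that stops harvesters.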

If the host is the real problem

  5. Move CogPrints to a maintained host (best medium-term option)

If cogprints.org is on aging infrastructure, migrate the EPrints instance to Southampton-managed hosting (or equivalent), preserving the domain via DNS cutover. This solves most “bit-rot” causes.

If you want belt-and-suspenders

  6. Provide a simple “status page” and monitoring

Lightweight uptime monitoring for /cgi/oai2?verb=Identify with email alerts prevents multi-year silent failure.
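A monitor of this kind can be very small. The following is a sketch under stated assumptions rather than a recommended deployment: the function names are hypothetical, the mail addresses and SMTP host are placeholders to be supplied by whoever runs it, and the health test (HTTP 200 plus the literal text <Identify> in the body) is deliberately crude.

```python
import http.client
import smtplib
import urllib.request
from email.message import EmailMessage

CHECK_URL = "http://cogprints.org/cgi/oai2?verb=Identify"


def endpoint_is_healthy(url=CHECK_URL, opener=urllib.request.urlopen, timeout=20):
    """True only if the endpoint returns HTTP 200 and something that looks like Identify."""
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read()
            return resp.status == 200 and b"<Identify>" in body
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


def build_alert(url, sender, recipient):
    """Compose the notification email (addresses are supplied by the caller)."""
    msg = EmailMessage()
    msg["Subject"] = "OAI-PMH endpoint check failed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"No valid Identify response from {url}")
    return msg


def check_and_alert(sender, recipient, smtp_host="localhost", url=CHECK_URL):
    """Run one check; send an email through smtp_host if it fails."""
    if endpoint_is_healthy(url):
        return True
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(build_alert(url, sender, recipient))
    return False
```

Run from cron or a similar scheduler once a day, check_and_alert would have flagged the outage within a day instead of leaving it unnoticed for years. It should run on a different machine from the repository, so that a dead host cannot silence its own alarm.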

  7. Preservation separate from harvesting

If not already: arrange a preservation/mirroring plan (institutional backup + optional LOCKSS/CLOCKSS-style approach). This is distinct from Clarivate’s issue but aligns with “CogPrints should persist.”

What to tell Clarivate (once fixed): Ask them to re-run a full harvest against the working endpoint and confirm their last harvest date updates from 2017.