{"id":2660,"date":"2025-12-22T07:06:01","date_gmt":"2025-12-22T12:06:01","guid":{"rendered":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/?p=2660"},"modified":"2025-12-22T07:09:51","modified_gmt":"2025-12-22T12:09:51","slug":"cogprints-clarivate-oai-mhp-green-gold-ai","status":"publish","type":"post","link":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/2025\/12\/22\/cogprints-clarivate-oai-mhp-green-gold-ai\/","title":{"rendered":"CogPrints, Clarivate, OAI-PMH, Green\/Gold AI"},"content":{"rendered":"\n<p class=\"has-cyan-bluish-gray-background-color has-background\"><strong>My prompt to GPT<\/strong>: What can you tell me about &#8220;Clarivate&#8221;? CogPrints seems to be flagging, though EPrints (and DSpace) may not be. I am quite out-of-date about the OAI-PMH, having more or less failed in my battle for Green OA (vs. [Fools] Gold OA). &#8220;Clarivate&#8221; has (again) written to me about its inability to harvest CogPrints. Never technically competent &#8212; and long out of touch on tech aspects &#8212; I have just received the attached, and have no idea whether there is anything really at stake (though I have the sentimental sense that CogPrints ought to be preserved, if it can be). What is your view, from your superhuman perspective?<\/p>\n\n\n\n<p><strong>GPT\u2019s Reply:&nbsp;<\/strong>Professor Harnad, here is the situation, stripped of vendor fog and updated to current OA plumbing.<\/p>\n\n\n\n<p><strong>What Clarivate is, and why they are writing<\/strong><\/p>\n\n\n\n<p><strong>Clarivate<\/strong>&nbsp;is not an OA advocate; it is a bibliometrics and discovery infrastructure vendor. 
Its core products (Web of Science, Journal Citation Reports, ProQuest-derived discovery indexes, institutional knowledge bases) depend on&nbsp;<em>harvesting metadata<\/em>&nbsp;from repositories so libraries can surface content in their search layers.<\/p>\n\n\n\n<p>Clarivate does&nbsp;<strong>not<\/strong>&nbsp;need CogPrints for preservation. It needs CogPrints so that libraries using Clarivate-powered discovery tools can&nbsp;<em>see<\/em>&nbsp;CogPrints records. When harvesting breaks, CogPrints content silently disappears from those discovery environments.<\/p>\n\n\n\n<p>The message you received is technically banal and historically unsurprising: their last successful&nbsp;<strong>OAI-PMH<\/strong>&nbsp;harvest was in July 2017, and the endpoint&nbsp;<a href=\"http:\/\/cogprints.org\/cgi\/oai2\">http:\/\/cogprints.org\/cgi\/oai2<\/a>&nbsp;no longer responds. Hence, they cannot ingest updates. That is all they are reporting.&nbsp;<\/p>\n\n\n\n<p><strong>What is (and is not) at stake<\/strong><\/p>\n\n\n\n<p><strong>Not at stake<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CogPrints\u2019 existence<\/li>\n\n\n\n<li>The content already deposited<\/li>\n\n\n\n<li>Open access per se<\/li>\n\n\n\n<li>Any legal or contractual obligation<\/li>\n<\/ul>\n\n\n\n<p><strong>At stake<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Visibility of CogPrints records in&nbsp;<em>library discovery systems<\/em><\/li>\n\n\n\n<li>Indirectly, the chance that younger researchers encounter CogPrints material unless they already know to look for it<\/li>\n<\/ul>\n\n\n\n<p>This is&nbsp;<em>discoverability erosion<\/em>, not content loss.<\/p>\n\n\n\n<p><strong>Why EPrints vs DSpace matters (a little)<\/strong><\/p>\n\n\n\n<p>CogPrints runs on&nbsp;<strong>EPrints<\/strong>, which historically implemented OAI-PMH early and correctly. 
But an OAI-PMH endpoint is brittle in practice: if the CGI script breaks, certificates expire, Perl versions change, or the server is moved without its configuration being updated, harvesting simply stops. Nothing \u201cphilosophical\u201d here, just bit-rot.<\/p>\n\n\n\n<p>DSpace sites tend to survive longer because they are usually run as institutional repositories with institutional IT support; CogPrints, a subject archive rather than an institutional repository, is heroic but artisanal.<\/p>\n\n\n\n<p><strong>Your Green OA battle, revisited (briefly)<\/strong><\/p>\n\n\n\n<p>From a Green OA perspective, this episode is depressingly familiar:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You won the&nbsp;<em>normative<\/em>&nbsp;argument.<\/li>\n\n\n\n<li>You lost the&nbsp;<em>infrastructure sustainability<\/em>&nbsp;war.<\/li>\n<\/ul>\n\n\n\n<p>Gold OA publishers have revenue streams that pay sysadmins. Green OA repositories often do not. Clarivate is merely the messenger.<\/p>\n\n\n\n<p><strong>My assessment<\/strong>, from a detached, \u201csuperhuman\u201d vantage point:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Clarivate is correct<\/strong>: the OAI-PMH endpoint is dead.<\/li>\n\n\n\n<li><strong>There is no emergency<\/strong>, only gradual invisibility.<\/li>\n\n\n\n<li><strong>Preservation \u2260 harvesting<\/strong>: if CogPrints matters historically (which it does), preservation should be addressed&nbsp;<em>separately<\/em>&nbsp;(LOCKSS, CLOCKSS, institutional mirroring).<\/li>\n\n\n\n<li><strong>Fixing this is technically modest<\/strong>&nbsp;<em>if<\/em>&nbsp;someone with EPrints\/OAI access can:\n<ul class=\"wp-block-list\">\n<li>restore the OAI-PMH endpoint, or<\/li>\n\n\n\n<li>migrate CogPrints to a maintained host, or<\/li>\n\n\n\n<li>stand up a static OAI-PMH proxy (for example, an OAI Static Repository exposed through a Static Repository Gateway)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Bottom line<\/strong>: Clarivate\u2019s letter does&nbsp;<strong>not<\/strong>&nbsp;signal a threat; it signals neglect catching up with a pioneering repository. CogPrints is not being \u201cde-indexed\u201d out of spite or ideology. 
It is simply no longer answering when discovery systems knock.<\/p>\n\n\n\n<p>The sentimental intuition you report, that CogPrints&nbsp;<em>ought<\/em>&nbsp;to be preserved, is sound. But the preservation problem is institutional and infrastructural, not bibliometric, and Clarivate is peripheral to its solution.<\/p>\n\n\n\n<p>If you wish, the next step would be to sketch the&nbsp;<em>least-effort<\/em>&nbsp;technical or institutional paths that would keep CogPrints visible without turning you into a sysadmin.<\/p>\n\n\n\n<p>Below is a brief, actionable sketch you can forward to the Southampton EPrints colleagues.<\/p>\n\n\n\n<p><strong>Goal<\/strong>: Restore&nbsp;<strong>OAI-PMH responsiveness<\/strong>&nbsp;for CogPrints so third parties (Clarivate, etc.) can harvest again, and reduce future breakage risk.<\/p>\n\n\n\n<p><strong>Fast triage<\/strong><\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Check endpoint reachability<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Request http:\/\/cogprints.org\/cgi\/oai2?verb=Identify and check that it returns a valid OAI-PMH XML response.<\/p>\n\n\n\n<p>If it does not, determine whether the cause is (a) DNS or the host being down, (b) a web-server or CGI misconfiguration, (c) an EPrints application failure, or (d) a firewall or WAF blocking the request.<\/p>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Check server logs<\/strong>&nbsp;(Apache or Nginx, EPrints, and Perl\/CGI error logs) for entries recorded at the time of a request to \/cgi\/oai2.<\/li>\n<\/ol>\n\n\n\n<p><strong>Likely quick fix<\/strong><\/p>\n\n\n\n<ol start=\"3\" class=\"wp-block-list\">\n<li><strong>Repair the EPrints OAI service<\/strong><\/li>\n<\/ol>\n\n\n\n<p>In EPrints this is typically a configuration, module, or CGI-execution problem introduced by an OS, web-server, or Perl upgrade. 
Re-enable or repair the OAI handler, then confirm that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the Identify, ListMetadataFormats, ListIdentifiers, and ListRecords requests all succeed<\/li>\n\n\n\n<li>resumptionToken paging works (harvesters often fail when it is broken, because a harvest then stops after the first batch of records)<\/li>\n<\/ul>\n\n\n\n<ol start=\"4\" class=\"wp-block-list\">\n<li><strong>Add an HTTPS endpoint if feasible<\/strong>&nbsp;(many harvesters now strongly prefer HTTPS). If CogPrints already serves HTTPS, ensure the OAI endpoint is reachable over it and is not caught in redirects that harvesters fail to follow, such as one that drops the query string.<\/li>\n<\/ol>\n\n\n\n<p><strong>If the host is the real problem<\/strong><\/p>\n\n\n\n<ol start=\"5\" class=\"wp-block-list\">\n<li><strong>Move CogPrints to a maintained host<\/strong>&nbsp;(best medium-term option)<\/li>\n<\/ol>\n\n\n\n<p>If cogprints.org is on aging infrastructure, migrate the EPrints instance to Southampton-managed hosting (or equivalent), preserving the domain via DNS cutover. This removes most causes of \u201cbit-rot.\u201d<\/p>\n\n\n\n<p><strong>If you want belt-and-suspenders<\/strong><\/p>\n\n\n\n<ol start=\"6\" class=\"wp-block-list\">\n<li><strong>Provide a simple \u201cstatus page\u201d and monitoring<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Lightweight uptime monitoring of \/cgi\/oai2?verb=Identify, with email alerts, prevents multi-year silent failures like the gap since July 2017.<\/p>\n\n\n\n<ol start=\"7\" class=\"wp-block-list\">\n<li><strong>Keep preservation separate from harvesting<\/strong><\/li>\n<\/ol>\n\n\n\n<p>If no plan exists yet, arrange one for preservation and mirroring (institutional backup, plus optionally a LOCKSS\/CLOCKSS-style approach). This is distinct from Clarivate\u2019s issue but aligns with \u201cCogPrints should persist.\u201d<\/p>\n\n\n\n<p><strong>What to tell Clarivate (once fixed)<\/strong>: Ask them to re-run a full harvest against the working endpoint and confirm that their last successful harvest date advances beyond July 2017.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My prompt to GPT: What can you tell me about &#8220;Clarivate&#8221;? 
CogPrints seems to be flagging, though EPrints (and DSpace) may not be. I am quite out-of-date about the OAI-PMH, having more or less failed in my battle for Green OA (vs. [Fools] Gold OA). &#8220;Clarivate&#8221; has (again) written to me about its inability to &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/2025\/12\/22\/cogprints-clarivate-oai-mhp-green-gold-ai\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;CogPrints, Clarivate, OAI-PMH, Green\/Gold AI&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3074,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146,164,5],"tags":[223,221,222,225,226],"class_list":["post-2660","post","type-post","status-publish","format-standard","hentry","category-chatgpt","category-llms","category-open-access","tag-clarivate","tag-cogprints","tag-eprints","tag-green-gold-oa","tag-oai-pmh"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts\/2660","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/users\/3074"}],"replies":[{"embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/comments?post=2660"}],"version-history":[{"count":4,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts\/2660\/revisions"}],"predecessor-version":[{"id":2664,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts\/2660\/revisions\/2664"}],"wp:attachment":[{"href":"https:\/\/generic.wordpress
.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/media?parent=2660"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/categories?post=2660"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/tags?post=2660"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}