Could portal.hdfgroup.org be made searchable by Googlebot?


#1

Hi again :wink:

Entering this search string into Google:

site:portal.hdfgroup.org hdf5

turns up no results.

Is there a compelling reason that the HDF5 documentation is not searchable (presumably because Google’s crawler has been blocked from this website)? This makes it extremely difficult to find HDF5 documentation on specific topics unless one already knows exactly where to look. It would be greatly appreciated if this could be fixed!

Thanks for considering,

Kevin


#2

Hi Kevin,

Unfortunately, we are aware of this problem. While we have identified a few potential causes and solutions, we have not had the resources to devote to this issue. We hope to work on it during this calendar year so that the issue is resolved and our documentation is in a better state for users. As always, we welcome feedback and/or resources from our community to resolve this.

Lori Cooper
Product Marketing Associate


#3

Thanks for your reply, Lori!

I noticed since first posting that this issue was already reported an entire year ago:

Respectfully, I suggest that prioritizing a fix would benefit both current HDF5 library users and the HDF Group. The current lack of searchable docs presents both a challenge to existing users and a barrier to uptake by prospective users.

I have no insight into the technical issues you face, but maybe a workaround would be to generate a static mirror of the website (daily?) and allow that mirror to be searchable? This mirror could even be maintained by some enterprising person outside the HDF Group if permission to create a mirror could be granted. (I am afraid that I cannot be that person, since maintaining a mirror site would be far too time-intensive for me.)

Best regards,
Kevin


#4

It looks like the robots.txt file (https://portal.hdfgroup.org/robots.txt) creates a redirect loop: attempting to fetch it results in a 302 redirect back to the same address. Since responsible crawlers use that file to determine what they are allowed to look at, I wonder if they respond to this brokenness by assuming the entire domain is off-limits.
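
For anyone who wants to reproduce the redirect loop, here is a rough Python sketch that follows redirects by hand and reports when a URL redirects back to an address already visited. The names `trace_redirects` and `http_fetch` are my own, not part of any HDF Group tooling, and I am assuming the server answers a plain GET the same way it answers a crawler; it may treat Googlebot's user agent differently.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_fetch(url: str) -> Tuple[int, Optional[str]]:
    """Issue one GET without following redirects (http.client never follows them)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, fetch: Fetcher = http_fetch,
                    max_hops: int = 10) -> Tuple[str, List[str]]:
    """Follow redirects manually and return (outcome, list of URLs visited)."""
    chain: List[str] = []
    current = url
    for _ in range(max_hops):
        chain.append(current)
        status, location = fetch(current)
        if status not in REDIRECT_CODES or location is None:
            return f"final status {status}", chain
        # Location may be relative, so resolve it against the current URL.
        target = urljoin(current, location)
        if target in chain:
            chain.append(target)
            return "redirect loop", chain
        current = target
    return "too many redirects", chain
```

Calling `trace_redirects("https://portal.hdfgroup.org/robots.txt")` should report a redirect loop if the behaviour described above is still in place. The fetcher is passed in as a parameter so the loop detection can be tried against a fake server response without touching the network.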