Log File Analysis — please add meta-webindexer (Meta AI Search crawler) to bot classification
Robert Pielanen
Hi,
I'm using Log File Analysis and hit a bot-classification gap worth fixing.
Problem: In the Bot Activity Timeline, the single largest crawler on my site is lumped under the generic label +https: instead of appearing by name. In one week of logs for a single site, it accounts for ~491,000 requests, which is more than Googlebot, bingbot, and ChatGPT-User combined.
Cause: It's meta-webindexer, which isn't in your known-bot list. When the product token isn't recognized, your parser appears to fall back to the +https://… URL fragment in the user-agent and use its scheme (+https:) as the label. As a result, this bot and several other unrecognized ones all collapse into one meaningless +https: line.
The crawler:
meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
This is Meta's AI Search crawler. Per Meta's web-crawler docs, it indexes content for citation and linking in Meta AI search answers. That makes it directly relevant to the AEO/AI-search use case your log tool targets, so it should be a first-class entry.
Requests:
- Add meta-webindexer to the known-bot list, grouped under Meta. Ideally, add the full current Meta family while you're at it:
  - meta-webindexer: Meta AI Search
  - meta-externalagent: Meta AI training
  - meta-externalfetcher: Meta user-initiated fetch
  - facebookexternalhit: link previews
  - FacebookBot: speech/LM training
- Parser improvement: for unrecognized user-agents, fall back to the product token (the name before /version) rather than the +https:// URL scheme. Any unknown crawler would then show up under its real name instead of everything piling into +https:.
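To make both requests concrete, here is a minimal Python sketch. It is not your actual parser, which I can't see. The table name KNOWN_BOTS, the functions product_token and classify, and the "Operator: token (purpose)" label format are all hypothetical. The purposes in the table come from the list above.

The fallback works in three steps. First, it strips "+http(s)://..." URLs so that a scheme like "https:" can never become a label. Next, it takes the first name/version token. Because most browser-style crawler strings start with the "Mozilla/5.0" compatibility prefix, that token is skipped so the real bot name, such as Googlebot, is found instead.

```python
import re
from typing import Optional

# Hypothetical known-bot table: lowercased product token -> (operator, purpose).
KNOWN_BOTS = {
    "meta-webindexer": ("Meta", "Meta AI Search"),
    "meta-externalagent": ("Meta", "Meta AI training"),
    "meta-externalfetcher": ("Meta", "Meta user-initiated fetch"),
    "facebookexternalhit": ("Meta", "link previews"),
    "facebookbot": ("Meta", "speech/LM training"),
}

# Info URLs such as "+https://developers.facebook.com/..." are removed before matching.
URL = re.compile(r"\+?https?://\S+")
# A product token: a name, then "/", then a version that starts with a digit.
TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9._-]*)/\d[^\s;()]*")
# Compatibility prefixes that say nothing about which bot is calling (assumption).
SKIP_TOKENS = {"mozilla"}


def product_token(user_agent: str) -> Optional[str]:
    """Return the first meaningful product name (the part before /version), or None."""
    cleaned = URL.sub(" ", user_agent)
    for match in TOKEN.finditer(cleaned):
        name = match.group(1)
        if name.lower() not in SKIP_TOKENS:
            return name
    return None


def classify(user_agent: str) -> str:
    """Label a user-agent: known bots get operator and purpose, others their own name."""
    token = product_token(user_agent)
    if token is None:
        return "Unknown"  # nothing name-like left, but still better than "+https:"
    known = KNOWN_BOTS.get(token.lower())
    if known is not None:
        operator, purpose = known
        return f"{operator}: {token} ({purpose})"
    return token


print(classify("meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"))
# Meta: meta-webindexer (Meta AI Search)
print(classify("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
# Googlebot
```

With this logic, an unlisted crawler such as a hypothetical "ExampleBot/0.9 (+https://example.com/bot)" would appear as ExampleBot rather than +https:. For ordinary browser strings, the first non-Mozilla token is usually an engine name like AppleWebKit. That is acceptable for a bot-only fallback but not as a browser classifier.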
Thanks