For content in the article details section, Marfeel uses an incremental extraction strategy that adapts to how often the publisher changes their content.

Marfeel's extraction frequency is finely tuned. Crawling starts out very frequent, and the interval between crawls then grows gradually. For the first 10 minutes, the article details crawl interval grows by a factor of 1.1 every minute. From 10 minutes to 1 hour, the growth factor stays constant at 1.5. From 1 hour to the 24-hour mark, the factor increases to 2. At 24 hours, the crawling frequency plateaus.
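As a rough illustration, the schedule above can be sketched in a few lines of Python. This is not Marfeel's implementation: the function names, the one-minute starting interval, and the choice to apply the factor after each crawl are all assumptions made for the example. Only the factors (1.1, 1.5, 2) and the time brackets come from the description above.

```python
MINUTES_PER_DAY = 24 * 60


def growth_factor(age_minutes: float) -> float:
    """Factor applied to the crawl interval, based on minutes elapsed."""
    if age_minutes < 10:
        return 1.1
    if age_minutes < 60:
        return 1.5
    if age_minutes < MINUTES_PER_DAY:
        return 2.0
    return 1.0  # plateau: the interval stops growing after 24 hours


def crawl_times(initial_interval: float = 1.0,
                horizon_minutes: float = MINUTES_PER_DAY) -> list[float]:
    """Return hypothetical crawl times (in minutes) up to the horizon.

    Assumption: the first crawl happens after `initial_interval` minutes
    and the interval is multiplied by the growth factor after every crawl.
    """
    elapsed = 0.0
    interval = initial_interval
    times = []
    while elapsed < horizon_minutes:
        elapsed += interval
        times.append(elapsed)
        interval *= growth_factor(elapsed)
    return times


if __name__ == "__main__":
    for t in crawl_times():
        print(f"crawl at {t:8.1f} min")
```

Under these assumptions the crawls are packed closely together at first and spread further apart as time passes, which matches the behavior described above.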

These time values denote when Marfeel invalidates expired items and crawls the publisher's article details for new content. 

If a 'hot' article with very rapid updates is detected, such as minute-by-minute coverage of a live event, the crawling frequency is capped at once per minute.

When new content is detected, it is extracted, and the interval until the next crawl decreases by a factor of 0.7. This reduction is applied every time new content is detected.
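The adjustment can be pictured with a short Python sketch. It is an illustration under assumptions, not Marfeel's code: the function name, the parameter names, and the idea of passing the current growth factor in as `idle_factor` are hypothetical. The 0.7 factor and the one-minute limit for 'hot' articles are taken from the text above.

```python
MIN_INTERVAL_MINUTES = 1.0  # 'hot' articles are crawled at most once a minute
SPEEDUP_FACTOR = 0.7        # applied each time new content is detected


def interval_after_crawl(interval: float,
                         content_changed: bool,
                         idle_factor: float) -> float:
    """Return the next crawl interval in minutes.

    interval:        the interval that led to this crawl
    content_changed: whether the crawl found new content
    idle_factor:     growth factor used when nothing changed
                     (for example 1.1, 1.5 or 2; assumed to be supplied
                     by the caller)
    """
    if content_changed:
        # Shorten the interval, but never go below one minute.
        return max(MIN_INTERVAL_MINUTES, interval * SPEEDUP_FACTOR)
    # Nothing new: back off and crawl less often.
    return interval * idle_factor
```

For instance, an article crawled every 10 minutes that keeps changing would, under these assumptions, move to roughly 7 minutes, then about 4.9, and so on until it reaches the one-minute limit.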

This is especially beneficial for legacy or archived content: we stop calling for updates, so our publishers' servers don't receive these requests and we can focus traffic on active queries, while still providing this real-time extraction mechanism.

Crawling Process

Marfeel's crawling policy and algorithm are designed to deliver an optimal user experience, along with the UX features that increase traffic and revenue. The process follows these two steps:

  1. Marfeel sanitizes the HTML and minimizes its depth for optimization.

  2. The Marfeel platform then replaces the widgets and elements with their lazy-loading counterparts. Components are loaded as late as possible, and Marfeel pre-fetches everything as early as possible to reduce loading times and provide an instantaneous experience.

    Normally, all text files are delivered from the domain through HTTP/2 over HTTPS, which also helps improve loading times.