Instead of relying on feeds such as RSS, Marfeel uses a crawler that browses our publishers' sites for new content, extracts it, and updates their Marfeel Progressive Web Apps (PWAs). The benefits of using a crawler are:
- There's no integration needed on the customer's side
- It provides an automated, out-of-the-box process that extracts content directly from a client's website
- Publishers who use several CMSs behind the scenes can find it very difficult to consolidate their content into a single mobile-friendly output. Marfeel's crawler handles this complexity because it is agnostic to the underlying system.
Content is extracted together with its context, which means Marfeel displays articles in a way that reflects their prominence on the publisher's current site, rather than simply showing the most recent first.
How Marfeel crawls a client's website for new content
Marfeel periodically crawls a publisher's site to obtain the content.
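As a rough illustration of what a crawl pass for new content could look like, the sketch below collects links from a page in document order and reports the ones not seen before, together with their position on the page. It is not Marfeel's crawler: the names `LinkCollector` and `find_new_articles`, the sample markup, and the `publisher.example` domain are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of <a> tags in the order they appear on the page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:           # keep the first occurrence only
                self.links.append(url)


def find_new_articles(html, base_url, seen):
    """Return (position, url) pairs for links that are not in `seen`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [(pos, url) for pos, url in enumerate(collector.links) if url not in seen]


# Hypothetical home page: position on the page stands in for editorial prominence.
SAMPLE_HOME = """
<main>
  <a href="/politics/budget-vote">Budget vote</a>
  <a href="/sports/cup-final">Cup final</a>
  <a href="/politics/budget-vote">Read more</a>
  <a href="/culture/film-review">Film review</a>
</main>
"""

already_crawled = {"https://publisher.example/sports/cup-final"}
new_articles = find_new_articles(
    SAMPLE_HOME, "https://publisher.example/", already_crawled
)
```

Keeping the position alongside each URL is one simple way to carry the page's ordering forward, so that articles can later be shown by prominence rather than by publication time.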
Once the original HTML is obtained, it's cleaned up by removing unnecessary and repeated elements such as navigation bars, footers, and traditional desktop ad placements. This process allows Marfeel to optimize the markup and enable lazy loading on media elements such as images, videos, and audio, and on third-party widgets such as Twitter or Instagram embeds.
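The clean-up step can be pictured with a small sketch like the one below, which drops a few boilerplate containers and adds the standard `loading="lazy"` attribute to images and iframes. This is a minimal illustration using Python's standard library, not Marfeel's actual pipeline: the `ArticleCleaner` class, the choice of tags to strip, and the sample markup are assumptions, and the code is not hardened for scripts, comments, or malformed markup.

```python
from html import escape
from html.parser import HTMLParser

STRIP_TAGS = {"nav", "footer", "aside"}  # assumed boilerplate containers
LAZY_TAGS = {"img", "iframe"}            # elements that accept loading="lazy"


class ArticleCleaner(HTMLParser):
    """Re-emit HTML without boilerplate blocks and with lazy-loaded media."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a stripped container

    def handle_starttag(self, tag, attrs):
        if tag in STRIP_TAGS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        attrs = list(attrs)
        if tag in LAZY_TAGS and "loading" not in dict(attrs):
            attrs.append(("loading", "lazy"))
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img/> have no separate end tag to emit.
        if tag not in STRIP_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag in STRIP_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # The parser decodes entities, so re-escape text on the way out.
            self.out.append(escape(data, quote=False))


RAW = (
    '<nav><a href="/">Home</a></nav>'
    '<article><h1>Cup final</h1>'
    '<img src="/img/goal.jpg" alt="Goal">'
    '<p>Fish &amp; chips at half-time.</p></article>'
    '<footer>About us</footer>'
)

cleaner = ArticleCleaner()
cleaner.feed(RAW)
cleaner.close()
cleaned = "".join(cleaner.out)
```

In this sample, `cleaned` keeps only the `<article>` element, and its image carries `loading="lazy"` so the browser can defer fetching it until it is near the viewport.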
The last step in the crawling process is to minify, compress, and package the content. Once the final transformed HTML of each article is ready, it's minified and packaged with other articles that are very likely to be consumed together. This reduces loading times and gives the user a sense of immediacy.
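To make the minify, compress, and package step concrete, here is a minimal sketch that bundles several articles into a single gzip-compressed JSON payload. The bundle format, the function names, and the very rough regex-based minifier are assumptions for illustration only, and the grouping of articles (here, two pieces from the same section) is taken as given, because this article does not describe how Marfeel decides which articles belong together.

```python
import gzip
import json
import re


def minify_html(html):
    """Rough minifier: drop whitespace between tags and collapse repeated spaces.

    Note: this would also alter whitespace inside <pre> blocks.
    """
    html = re.sub(r">\s+<", "><", html.strip())
    return re.sub(r"\s{2,}", " ", html)


def package_articles(articles):
    """Bundle a {path: html} mapping into one gzip-compressed JSON payload."""
    bundle = {path: minify_html(html) for path, html in articles.items()}
    raw = json.dumps(bundle, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack_articles(payload):
    """Inverse of package_articles, as a client would apply it."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Hypothetical bundle of articles assumed likely to be read together.
section_bundle = {
    "/sports/cup-final": "<article>\n  <h1>Cup   final</h1>\n  <p>Report</p>\n</article>",
    "/sports/transfer-news": "<article>\n  <h1>Transfers</h1>\n</article>",
}

payload = package_articles(section_bundle)
restored = unpack_articles(payload)
```

Shipping one payload for several articles means that when a reader moves to the next article in the bundle, its content is already on the device.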
The transformed HTML is then hosted on Marfeel's elastic infrastructure and delivered through a content delivery network (CDN), which pre-caches content and speeds up delivery.
Marfeel has a crawling API that forces an article or section to be crawled on demand.
This is useful when:
- There's an editorial mistake in the content that needs to be corrected
- The crawl needs to be integrated with a client's CMS
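A CMS integration could, for example, request a recrawl from a publish or update hook. The sketch below only shows the general shape of such a call. The endpoint URL, the JSON body, and the bearer-token header are hypothetical placeholders, not Marfeel's actual API; the real endpoint, parameters, and authentication must be taken from Marfeel's API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder: NOT the real Marfeel endpoint.
CRAWL_ENDPOINT = "https://crawl-api.example.invalid/recrawl"


def build_recrawl_request(url, api_token):
    """Build (but do not send) a POST asking for `url` to be crawled again."""
    body = json.dumps({"url": url}).encode("utf-8")  # assumed request shape
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
    )


def on_article_published(url, api_token, send=urllib.request.urlopen):
    """CMS publish/update hook: request a recrawl and return the HTTP status."""
    request = build_recrawl_request(url, api_token)
    with send(request, timeout=10) as response:
        return response.status


# Building the request has no side effects; nothing is sent here.
request = build_recrawl_request(
    "https://publisher.example/sports/cup-final", "TOKEN"
)
```

Passing `send` as a parameter keeps the hook easy to test without network access, and calling it right after an editor saves a correction covers the editorial-mistake scenario as well.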