Article Details Metadata

Marfeel uses Boilerpipe to detect a publisher's metadata and extract it into the article details. The customer's page is scraped tag by tag so that the metadata can be identified, extracted, and stored with the article details.

In Boilerpipe, metadata extraction is implemented as a detector. Whenever the page contains a script, Marfeel scans it and parses the relevant elements according to the heuristics in place.

When a metadataProvider is detected, Marfeel extracts the information and stores it at the beginning of the article.
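The detection flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Marfeel's or Boilerpipe's actual API: the ScriptMetadataDetector interface, the ScriptMetadataScanner class, and the regex-based script extraction are all hypothetical. Each detector receives the body of every script on the page and returns the key/value pairs it recognizes; the scanner groups the results by detector name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical contract: a detector inspects one script body and returns the
// key/value pairs it recognizes (an empty map if nothing matches).
interface ScriptMetadataDetector {
    String getName();
    Map<String, String> detect(String scriptContent);
}

// Hypothetical runner: finds every <script> element in the page, hands its body
// to each detector, and keeps the results in discovery order.
class ScriptMetadataScanner {
    // Naive script extraction for illustration only; a real scraper would use an HTML parser.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script[^>]*>(.*?)</script>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final List<ScriptMetadataDetector> detectors;

    ScriptMetadataScanner(List<ScriptMetadataDetector> detectors) {
        this.detectors = new ArrayList<>(detectors);
    }

    Map<String, Map<String, String>> scan(String html) {
        Map<String, Map<String, String>> found = new LinkedHashMap<>();
        Matcher script = SCRIPT.matcher(html);
        while (script.find()) {
            String body = script.group(1);
            for (ScriptMetadataDetector detector : detectors) {
                Map<String, String> result = detector.detect(body);
                if (!result.isEmpty()) {
                    // Merge results from several scripts under the detector's name.
                    found.computeIfAbsent(detector.getName(), k -> new LinkedHashMap<>())
                         .putAll(result);
                }
            }
        }
        return found;
    }
}
```

Keying the results by detector name mirrors the getName() method implemented by the detector in the following example. Storing the collected values at the beginning of the article is left out of this sketch.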

Example

The following detector extracts Google Analytics custom dimensions from calls such as ga('set', 'dimension1', 'value') in a page's scripts:

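```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// AbstractCustomDimensionMetadataDetector and CustomDimension come from Marfeel's codebase; their imports are omitted.
public class DefaultCustomDimensionDetector extends AbstractCustomDimensionMetadataDetector {
    // Matches ga('set', 'dimensionN', 'value') or _gaTracker('set', 'dimensionN', 'value').
    // (?:ga|_gaTracker) is an alternation; [ga|_gaTracker] would match only a single character.
    // Group 1 captures the dimension name, group 2 its value.
    private static final Pattern SCRIPT_INFO_ELEMENT = Pattern.compile(
            "\\b(?:ga|_gaTracker)\\(['\"]set['\"],\\s*['\"](dimension\\d+)['\"],\\s*['\"](.*?)['\"]",
            Pattern.CASE_INSENSITIVE);

    @Override
    protected CustomDimension getCustomDimensions(String content) {
        CustomDimension customDimension = new CustomDimension();
        Matcher matcher = SCRIPT_INFO_ELEMENT.matcher(content);

        // Add every dimension/value pair set in the script.
        while (matcher.find()) {
            customDimension.add(matcher.group(1), matcher.group(2));
        }

        return customDimension;
    }

    @Override
    public String getName() {
        return "defaultGACustomDimension";
    }
}
```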
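To try the matching logic outside Marfeel, the standalone sketch below applies the same kind of pattern to a sample script and collects the pairs in a LinkedHashMap instead of CustomDimension. The class name CustomDimensionDemo and the sample script are hypothetical. The alternation (?:ga|_gaTracker) matches either function name as a whole; a character class such as [ga|_gaTracker] would instead match any single one of those characters, so a call like tracker('set', 'dimension3', ...) would also be picked up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustomDimensionDemo {
    // Captures the dimension name (group 1) and its value (group 2) from
    // ga('set', ...) or _gaTracker('set', ...) calls.
    static final Pattern SET_DIMENSION = Pattern.compile(
            "\\b(?:ga|_gaTracker)\\(\\s*['\"]set['\"]\\s*,\\s*['\"](dimension\\d+)['\"]\\s*,\\s*['\"](.*?)['\"]",
            Pattern.CASE_INSENSITIVE);

    static Map<String, String> extract(String script) {
        // LinkedHashMap keeps the dimensions in the order they appear in the script.
        Map<String, String> dimensions = new LinkedHashMap<>();
        Matcher m = SET_DIMENSION.matcher(script);
        while (m.find()) {
            dimensions.put(m.group(1), m.group(2));
        }
        return dimensions;
    }

    public static void main(String[] args) {
        String script = "ga('create', 'UA-XXXX-Y', 'auto');\n"
                + "ga('set', 'dimension1', 'sports');\n"
                + "_gaTracker(\"set\", \"dimension2\", \"premium\");\n"
                + "tracker('set', 'dimension3', 'ignored');";
        System.out.println(extract(script)); // prints {dimension1=sports, dimension2=premium}
    }
}
```

The ga('create', ...) call is skipped because it does not use 'set', and tracker(...) is skipped because it is neither ga nor _gaTracker.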