WhiteCollar 2.0

In most cases, the section mosaic is the first page a reader lands on when accessing a publisher's mobile site. For this reason, section mosaics need to be compelling enough to engage the user and maximize content exploration, time spent on site, and ad revenue. whiteCollar is the JavaScript code Marfeel has engineered to extract the content from a publisher's desktop website for the section mosaics of their Marfeel Progressive WebApps (PWA). The latest version of whiteCollar was developed to eliminate code duplication.

This article presents the usage guidelines that Marfeel follows for whiteCollar.

whiteCollar Template

document.whiteCollar = (function () {
    // Put local variables here to avoid polluting the global scope

    return {
        removeDuplicates: true,
        numColumns: 10,
        getItems: [
            {
                selector: '',
                extractors: {
                    title: 'h2 > a',
                    uri: 'h2 > a',
                    media: 'img',
                    excerpt: '',
                    date: '',
                    subtitle: '',
                    author: '',
                    pocket: {}
                },
                modifiers: []
            }
        ],
        modifiers: []
    };
})();

Usage example whiteCollar 2.0

WC is the library name, just like $ for jQuery.

document.whiteCollar = {
    getItems: [
        {
            selector: 'article',
            extractors: {
                title: 'h1 > a',
                uri: 'h1 > a',
                media: 'img',
                excerpt: function (articleNode) {
                    var excerpt = WC.qs('.excerpt', articleNode);
                    // Keep only the first 50 characters of the excerpt
                    return excerpt.textContent.trim().slice(0, 50);
                },
                date: '.date'
            },
            modifiers: WC.applyBlacklist(['blacklistedStringInURL'])
        }
    ],
    modifiers: [WC.limitArticles(30), WC.filterEqConsecutiveArticles()]
};

To use whiteCollar 2.0, Marfeel engineers need to pass an array to getItems instead of a function. They no longer need to define such functions, because everything is already specified inside getItems.

getItems is an Array of Objects, and each Object defines a different group of items.

Selector

The first property of each Object is selector, which specifies the selector used to query the items. This String is passed to document.querySelectorAll under the hood, so anything that is valid in querySelectorAll is valid here.

Even though several comma-separated selectors can be placed here, it's better to keep the selector short and simple.

Extractors

The next property is extractors. This is where whiteCollar is instructed on how to extract the item properties.

The default and simplest method is to specify a selector String, just like with the selector property. In this case, the String is passed under the hood to querySelector rather than querySelectorAll. The same recommendations specified for the selector property apply here.

The other method is to pass a function, in which you can manipulate the articleNode to produce the item property. Exercise caution here: when a selector is used, Gutenberg takes care of everything, but when a function is passed, Gutenberg assumes that you know what you are doing. It passes the articleNode to the function and expects it to return the final property value, as the excerpt extractor does in the example above.
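As a hedged illustration of the function form, the sketch below shows a defensive excerpt extractor. It assumes a hypothetical ".excerpt" element and a 50-character cap, and it calls the standard querySelector on the node instead of WC.qs so that it can run outside Gutenberg. The name extractExcerpt is illustrative and is not part of the library.

```javascript
// Hypothetical function extractor (not part of wcLibrary.js).
// Returns a trimmed excerpt of at most maxLength characters,
// or an empty string when the excerpt element is missing.
function extractExcerpt(articleNode, maxLength) {
    var limit = typeof maxLength === 'number' ? maxLength : 50;
    var node = articleNode.querySelector('.excerpt');
    if (!node || !node.textContent) {
        return '';
    }
    return node.textContent.trim().slice(0, limit);
}
```

Guarding against a missing node matters because a function extractor bypasses the handling that Gutenberg applies to plain selectors.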

Not all the extractors behave like this, as presented in the following detailed list:

  • title: String (selector) or Function

  • subtitle: String (selector) or Function

  • media: Empty String (null media), String (selector), Object (like in the default whiteCollar), or undefined, in which case it defaults to the "img" selector

  • uri: String (selector) or Function

  • date: String (selector) or Function

  • excerpt: String (selector) or Function

  • author: String (selector) or Function

  • isExtractable: Boolean or Function

  • allowDifferentHost: Boolean or Function

  • pocket: Object or Function

Modifiers

The last property is modifiers. It is a property of both the getItems Objects and of document.whiteCollar.

It accepts either an array of functions or a single function, each of which manipulates the array of items. A single modifier can be passed directly; several modifiers should be passed inside an Array.

The modifiers passed to a getItems Object are applied only to the items matched by that Object's selector, while the ones passed to document.whiteCollar are applied to all the items.
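One way to picture the two scopes is the sketch below. It is not the Gutenberg implementation: applyModifiers and runPipeline are hypothetical names, the groups are assumed to carry already-extracted items, and the order (group modifiers first, then global ones) is an assumption drawn from the description above.

```javascript
// Hypothetical helper: accepts a single modifier or an array of modifiers
// and applies them to the items in order.
function applyModifiers(items, modifiers) {
    var list = [].concat(modifiers || []);
    return list.reduce(function (current, modifier) {
        return modifier(current);
    }, items);
}

// Hypothetical pipeline: group modifiers only see their own group's items,
// global modifiers see the concatenation of every group.
function runPipeline(groups, globalModifiers) {
    var allItems = groups.reduce(function (acc, group) {
        return acc.concat(applyModifiers(group.items, group.modifiers));
    }, []);
    return applyModifiers(allItems, globalModifiers);
}
```

Normalizing with [].concat is what lets the same property accept either a lone function or an Array.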

The following is an example of a more complex scenario:


document.whiteCollar = {
    getItems: [
        {
            selector: 'article',
            extractors: {
                title: 'h1 > a',
                uri: 'h1 > a',
                media: 'img',
                excerpt: function (articleNode) {
                    var excerpt = WC.qs('.excerpt', articleNode);
                    // Keep only the first 50 characters of the excerpt
                    return excerpt.textContent.trim().slice(0, 50);
                },
                date: '.date'
            }
        },
        {
            selector: 'article.carousel',
            extractors: {
                title: 'h1 > a',
                uri: 'h1 > a',
                // Pass an empty string if you want media to be null; otherwise
                // it falls back to the default media selector, which is 'img'
                media: '',
                author: '.author',
                date: '.date'
            },
            modifiers: [
                WC.limitArticles(10),
                WC.applyBlacklist(['sport', 'news'])
            ]
        },
        {
            selector: 'article.latest',
            extractors: {
                title: 'h1 > a',
                uri: 'h1 > a',
                author: '.author',
                date: '.date'
            }
        },
        {
            selector: '#widget',
            extractors: {
                title: 'h1 > a',
                uri: 'h1 > a',
                pocket: {
                    className: "mrf-widget",
                    widget: "homeWidget"
                }
            }
        }
    ],
    modifiers: WC.applyBlacklist(['blacklistedStringInURL'])
};

WC API

Modifiers

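Every modifier is a function that returns a function: the outer call receives the configuration (for example, a limit or a blacklist), and the returned function manipulates the array of items. Keep this in mind when you build your own.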
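A custom modifier following this shape could look like the sketch below. requireProperty is a hypothetical name chosen for illustration and does not exist in the library; it drops every item that lacks a given property.

```javascript
// Hypothetical custom modifier (not part of wcLibrary.js).
// Outer function: receives the configuration.
// Returned function: receives the array of items and returns a new array.
function requireProperty(propertyName) {
    return function (items) {
        return items.filter(function (item) {
            return Boolean(item[propertyName]);
        });
    };
}
```

It would then be passed like any built-in modifier, for instance modifiers: [requireProperty('title')].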

limitArticles

Limits the number of the extracted articles according to the specified number.

WC.limitArticles(Number limit)

filterEqConsecutiveArticles

Filters out consecutive articles that share the same value for the property specified as the argument.

If no argument is specified, it defaults to "uri".

WC.filterEqConsecutiveArticles(String propertyName [default "uri"])
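The behaviour described above can be sketched as follows. This is not the wcLibrary.js source: the name filterEqConsecutiveSketch is hypothetical, and keeping the first item of each run of equal values is an assumption.

```javascript
// Sketch of the described behaviour (hypothetical, not the library source).
// Drops an item when its property value equals that of the item right
// before it; the first item of every run is kept (assumption).
function filterEqConsecutiveSketch(propertyName) {
    var key = propertyName || 'uri';
    return function (items) {
        return items.filter(function (item, index) {
            return index === 0 || item[key] !== items[index - 1][key];
        });
    };
}
```

Only adjacent repetitions are affected, so an item whose value reappears later in the list, after a different one, is kept.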

uniqueBy

notExtractableIf

WC.notExtractableIf(Function checker)

Sets the item's "isExtractable" property to false when the checker function returns true.

For example:

WC.notExtractableIf(function(item) {
    return item.uri === "http://subdomain.tenant.com";
});
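To make the effect on the items concrete, here is a minimal sketch of the described behaviour. notExtractableIfSketch is a hypothetical name, and returning copies instead of mutating the original items is an assumption, not something the library is known to do.

```javascript
// Sketch of the described behaviour (hypothetical, not the library source).
// Items for which checker returns true get isExtractable set to false;
// all other items pass through untouched.
function notExtractableIfSketch(checker) {
    return function (items) {
        return items.map(function (item) {
            if (!checker(item)) {
                return item;
            }
            // Assumption: work on a shallow copy rather than mutating the input.
            return Object.assign({}, item, { isExtractable: false });
        });
    };
}
```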

applyBlacklist

Filters out every item whose URI contains any of the strings specified in the blacklist Array.

WC.applyBlacklist(Array blacklistedStrings)
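A rough sketch of this filtering is shown below. applyBlacklistSketch is a hypothetical name, and the case-sensitive substring match is an assumption about how "contains" is interpreted.

```javascript
// Sketch of the described behaviour (hypothetical, not the library source).
// Removes every item whose uri contains at least one blacklisted string.
function applyBlacklistSketch(blacklistedStrings) {
    return function (items) {
        return items.filter(function (item) {
            return !blacklistedStrings.some(function (str) {
                // Assumption: plain, case-sensitive substring match.
                return String(item.uri).indexOf(str) !== -1;
            });
        });
    };
}
```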

Utility Functions

contains

Boolean WC.contains(Array or String container)(Any content)

The curried function checks if "content" is contained by "container."

For example:

var uriContains = WC.contains(item.uri);
var isSport = uriContains("sport"); // true when item.uri contains "sport"
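The currying can be illustrated with a standalone sketch. containsSketch is a hypothetical stand-in, not the library function; it relies on indexOf, which exists on both Arrays and Strings.

```javascript
// Hypothetical curried helper (not the library source).
// First call fixes the container, second call checks the content.
function containsSketch(container) {
    return function (content) {
        return container.indexOf(content) !== -1;
    };
}
```

Fixing the container first makes it convenient to run several checks against the same URI or list.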

getSectionName

WC.getSectionName(Function extractor)

If no argument is specified, it uses the default extractor which takes the first string of the pathname.

For example:

// current page URI: "http://example.tenant.com/this/is/pathname"
var sectionName = WC.getSectionName(); // sectionName will be "this"
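The default extraction rule, the first segment of the pathname, can be approximated as below. sectionNameFromUrl is a hypothetical function that takes the URL as an argument so it can run without a browser, and returning an empty string for the root path is an assumption.

```javascript
// Sketch of the default rule (hypothetical, not the library source):
// take the first non-empty segment of the URL's pathname.
function sectionNameFromUrl(url) {
    var pathname = new URL(url).pathname;
    var segments = pathname.split('/').filter(Boolean);
    // Assumption: an empty string is returned when there is no segment.
    return segments.length > 0 ? segments[0] : '';
}
```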

getPageNumber

WC.getPageNumber(Function extractor)

If no argument is specified, it uses the default extractor, which takes the number from "/page/(number)" and falls back to 1 when the URI contains no page number.

For example:

// current page URI: "http://example.tenant.com/home/page/3"
var pageNumber = WC.getPageNumber(); // pageNumber will be 3

// current page URI: "http://example.tenant.com/home"
var pageNumber = WC.getPageNumber(); // pageNumber will be 1
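A minimal sketch of such a default extractor, assuming it works on the pathname with a regular expression, is given below. pageNumberFromPath is a hypothetical name and the actual library code may differ.

```javascript
// Sketch of the default rule (hypothetical, not the library source):
// read the number after "/page/" and fall back to 1 when it is absent.
function pageNumberFromPath(pathname) {
    var match = /\/page\/(\d+)/.exec(pathname);
    return match ? parseInt(match[1], 10) : 1;
}
```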


filterFalsy

Filters all items of an array which are falsy values ("", 0, NaN, null...).
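In plain JavaScript the same effect can be obtained with Array.prototype.filter and the Boolean constructor, as in this sketch. filterFalsySketch is a hypothetical name used only for illustration.

```javascript
// Hypothetical equivalent (not the library source):
// Boolean(value) is false for "", 0, NaN, null, undefined and false,
// so filter(Boolean) keeps only truthy values.
function filterFalsySketch(values) {
    return values.filter(Boolean);
}
```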

qs

Node WC.qs(String selector, Node node[default document])

qsAll

Array WC.qsAll(String selector, Node node[default document])

convertToArray

Converts array-like objects to Array.
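Array-like objects, such as the NodeList returned by querySelectorAll or the arguments object, have a length and indexed entries but lack Array methods. A common way to convert them, shown here as a hypothetical convertToArraySketch and not as the library source, is to borrow Array.prototype.slice.

```javascript
// Hypothetical equivalent (not the library source):
// slice copies the indexed entries of any array-like object into a real Array.
function convertToArraySketch(arrayLike) {
    return Array.prototype.slice.call(arrayLike);
}
```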

Workflow

Identify all the groups of items with the same structure in the tenant page. Every group will be mapped into an Object of the Array passed to getItems.

Once you have defined the selector of every group, you can proceed to define the extractors of every group.

If you find yourself writing some custom function that you think could be useful to someone else in the future, be sure to add it to the library. For more information, see the How to contribute section.

How to contribute

All the library functions are defined in "wcLibrary.js" in Gutenberg, which makes significant use of Ramda.js. To better understand the source code, see the Ramda.js documentation. In addition, remember that both Ramda and jQuery can be used inside whiteCollar.

The purpose of this new version is to avoid code duplication, so whenever you write a custom function that is already used in another file or is likely to be reused in the future, be sure to add it to wcLibrary.js.