Introducing PPC information retrieval

Computational advertising is a fairly newly coined term that comprises three elements:

  1. Sponsored search (SS) places ads on the search engine results page
  2. Contextual advertising (ConAd) places ads on third-party web pages
  3. Social advertising (SA), the newest type, places ads on the personalised pages of a user’s social network profile

Google AdWords T-shirt

Whichever element is used, computational advertising relies on specifically selected, personalised and therefore relevant ads in order to:

  • Immediately spark user interest, increasing the chance of clicks
  • Create a better user experience (UX)
  • Maximise revenue for the search engine, since clicks and good UX both drive it

    Sponsored search, also called PPC, paid or non-organic search, touches on all of these points and draws on a variety of disciplines, such as information retrieval, machine learning, statistical modelling and microeconomics. All major engines use estimated click-through rates (CTRs) alongside bids to rank PPC ads. CTRs can be estimated in many ways; the most popular include:

    • Hierarchical clustering of bid terms, where bid terms are organised into clusters (‘pools’) that are retrieved afresh each time
    • Information derived from ad text
    • Query-ad word pair indicator features
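To make the idea of ranking by estimated CTR alongside bids concrete, here is a minimal sketch. It assumes the simplest possible combination, bid multiplied by estimated CTR (expected revenue per impression); the `Ad` class, the `rank_ads` function and the figures are all hypothetical, and real engines combine these signals in ways they do not publish.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float            # advertiser's maximum cost per click
    estimated_ctr: float  # engine's predicted click-through rate (0 to 1)


def rank_ads(ads):
    """Order ads by bid * estimated CTR, highest first (an assumed scoring rule)."""
    return sorted(ads, key=lambda ad: ad.bid * ad.estimated_ctr, reverse=True)


# Illustrative figures only.
ads = [
    Ad("high bid, low CTR", bid=2.00, estimated_ctr=0.01),
    Ad("low bid, high CTR", bid=0.80, estimated_ctr=0.05),
    Ad("middling", bid=1.20, estimated_ctr=0.02),
]

ranked = rank_ads(ads)
for ad in ranked:
    print(ad.name, round(ad.bid * ad.estimated_ctr, 4))
```

Under this assumed rule the ad with the lowest bid ranks first, because its higher estimated CTR more than makes up for the smaller bid.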

    Engines frequently run thorough experiments with machine learning models for PPC ads (e.g. Support Vector Machines, Maximum Entropy, AdaBoost decision trees). Google, Bing and Yahoo all use vector space modelling to construct their index. None of them is a semantic search engine.

    3 secret PPC IR elements uncovered

    Every ad comprises three elements: a title, a description and a display URL. Engines analyse this information and assign a score, such as Quality Score on Google. Does one display URL get more clicks on mobile than, say, another (a responsive site)? PPC teams should use display URLs deliberately and test them regularly.

    1. Word and character overlap
    A bigram is a pair of adjacent units in a text, such as two letters or two words. Some bigrams occur far more often than others: in the English language, for instance, the letters “th” form the most common bigram. Engines also use this overlap data to improve PPC relevancy. Is it the case that the more common a word is, the more should be charged for it?
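As a rough illustration of word and character overlap, the sketch below extracts bigrams from a query and an ad title and compares the two sets. Scoring the overlap with the Jaccard coefficient is an assumption made for this example, not something the engines document, and the function names are hypothetical.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams in text (lower-cased, spaces kept)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def word_ngrams(text, n=2):
    """Set of word n-grams in text (lower-cased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b):
    """Jaccard overlap of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


query = "cheap running shoes"
title = "Cheap Running Shoes Sale"

word_score = overlap(word_ngrams(query), word_ngrams(title))
char_score = overlap(char_ngrams(query), char_ngrams(title))
print(f"word bigram overlap: {word_score:.2f}")
print(f"character bigram overlap: {char_score:.2f}")
```

Character bigrams are more forgiving of small spelling differences than word bigrams, which is one reason both kinds of overlap are worth computing.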
    2. String edit distance
    Edit distance is the basis of how spell check works on an engine: it compares two words and counts the minimum number of single-character operations (typically insertions, deletions and substitutions) required to transform one string into the other. This is how engines can match British and American spellings (e.g. capitalise vs capitalize).
    For SEO, engines also stem keywords in their index, so [swim] triggers [swimming].
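A minimal sketch of edit distance follows, assuming the common Levenshtein variant in which insertions, deletions and substitutions each cost one operation. Whether a given engine uses exactly this variant is not stated in the source.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(edit_distance("capitalise", "capitalize"))  # 1: one substitution (s -> z)
print(edit_distance("swim", "swimming"))          # 4: four insertions
```

The spelling variants sit one operation apart, while [swim] and [swimming] sit four apart. That gap suggests why stemming, rather than edit distance, is the natural tool for linking a word to its inflected forms.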
    3. Cosine similarity
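    Since engines use vector space modelling, every document and query is represented as a vector of term weights. Cosine similarity measures the angle between two such vectors and so indicates how similar two documents are likely to be in terms of their subject matter. Does this document match the searcher’s keywords?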
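Cosine similarity can be sketched in a few lines. The example assumes raw word counts as the term weights, which is a simplification: production systems usually apply some weighting scheme (TF-IDF, for instance), and the function names here are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: each lower-cased word mapped to its count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = term_vector("running shoes")
title = term_vector("cheap running shoes")
unrelated = term_vector("garden furniture")

print(round(cosine_similarity(query, title), 3))      # shares two of three terms
print(round(cosine_similarity(query, unrelated), 3))  # shares no terms
```

A score near 1 means the two texts use much the same vocabulary, and a score of 0 means they share no terms at all.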

    Calculating Quality Score

    Every engine can potentially use the above elements to boost click-through rates: CTRs are boosted by improving ad quality via relevancy, which again reinforces why Google rejects some PPC ads. CTRs are a relevance measure and are therefore used to rank PPC ads on all the major search engines. To estimate them, engines can use features such as:

    • Cosine similarity between query and title
    • Character overlap between query and abstract
    • Character overlap between query and (display) URL
    • Query length
    • The number of bigrams in the query whose word order is preserved in the title
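The last two features in the list are perhaps the least self-explanatory, so here is a small sketch of both. It rests on two assumptions: that query length is counted in words, and that a bigram is "preserved" when the same two words appear adjacent and in the same order in the title. The function names are hypothetical.

```python
def word_bigrams(text):
    """Adjacent word pairs, in order, from lower-cased text."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def preserved_bigram_count(query, title):
    """Number of query bigrams that also occur, in the same order, in the title."""
    title_bigrams = set(word_bigrams(title))
    return sum(1 for bigram in word_bigrams(query) if bigram in title_bigrams)


def query_length(query):
    """Query length in words (assumed unit; characters would also be plausible)."""
    return len(query.split())


query = "buy cheap running shoes"
print(query_length(query))                                          # 4
print(preserved_bigram_count(query, "Cheap Running Shoes Online"))  # 2
print(preserved_bigram_count(query, "Shoes Running Cheap"))         # 0
```

The second title contains the same words as the query, but none of the word pairs keep their order, so it scores zero on this feature.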

    It is also worth noting that engines now place weight on UX, so your website needs, at the very least, to be fast loading, easy to use and scan, and clearly and correctly labelled.

    UX and relevancy checks also examine which landing page the ad sends searchers to. As a result, a URL redirect is also a Quality Score signal.

    Clicks alone can be a weak indicator of relevance because of click fraud, accidental or exploratory clicks, and position bias. For this reason engines also take note of personal records, or histories if you like, to better understand your behaviour.

    Image: Copyright granted, reused, unmodified.


    Dave, K. and Varma, V. (2011) Computational Advertising: Leveraging User Interaction & Contextual Factors for Improved Ad Retrieval & Ranking. Ph.D. Symposium. Hyderabad, India.