**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
* @Site: locatetv.com
* @MinSWversion: 1.1.1/53
* @Revision 2 - [26/06/2014] Francis De Paemeleere
* - make .channels.xml generation generic for (US/UK/IE)
* @Revision 1 - [25/06/2014] Jan van Straaten/Hicks
* - update to site
* @Revision 0 - [21/06/2014] Jan van Straaten
* - creation
* @Remarks: readme.txt for customization
* @header_end
**------------------------------------------------------------------------------------------------
*
site {url=locatetv.com|timezone=US/Pacific|maxdays=12|cultureinfo=en-US|charset=utf-8|titlematchfactor=90}
*url_index{url|http://www.locatetv.com/listings/|channel|#|urldate}*http://www.locatetv.com/listings/cnbc-hd#23-Jun-2014
*urldate.format{datestring|dd-MMM-yyyy}
*
url_index{url|http://www.locatetv.com|channel|?offset=|urldate}
urldate.format{daycounter|0}
url_index.headers {customheader=Accept-Encoding=gzip,deflate}
url_index.headers {customheader=X-Requested-With=XMLHttpRequest}
*
index_urlchannellogo {url||
|style="width: 0px"> }
index_showsplit.modify {cleanup(removeduplicates)} * simple removeduplicates sufficient?
scope.range {(indexshowdetails)|end}
index_start.scrub {single(excludeblock="min""now")|
||}
*
index_temp_1.scrub {single|
||} * start value without excludeblock
* in case: 17 mins ago
index_temp_2.modify {substring(type=regex)|'index_temp_1' "\A(\d{1,2}) min"} * the minutes 'ago'
index_temp_2.modify {addstart(not "")|00:} *timespan format
* in case: in 17 mins
index_temp_3.modify {substring(type=regex)|'index_temp_1' "\Ain (\d{1,2})"} * the minutes to go
index_temp_3.modify {addstart(not "")|00:} *timespan format
* calculate the start time from the 'now' (index_variable_element) value
* if ago
index_start.modify {calculate('index_temp_2' not "" format=time)|'index_variable_element' 'index_temp_2' -}
* if to go
index_start.modify {calculate('index_temp_3' not "" format=time)|'index_variable_element' 'index_temp_3' +}
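* worked example of the two calculations above (hypothetical values, for illustration only):
*   assume the 'now' value in index_variable_element is 14:30
*   "17 mins ago" -> index_temp_2 = 00:17 -> index_start = 14:30 - 00:17 = 14:13
*   "in 17 mins"  -> index_temp_3 = 00:17 -> index_start = 14:30 + 00:17 = 14:47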
*
index_showicon.scrub{single|>
|"/>}
*
index_temp_4.scrub {single|
|">||}
index_temp_5.scrub {single||">||}
index_title.modify {addstart('index_temp_5' "")|'index_temp_4'}
index_title.modify {addstart('index_temp_5' not "")|'index_temp_5'}
index_subtitle.modify {addstart('index_temp_5' not "")|'index_temp_4'}
* episode, two cases :
* subtitle starts with Season 3 Episode 5: ....
index_episode.modify {substring(type=regex)|'index_subtitle' "\A(.+?Episode \d{1,}):"}
* subtitle starts with EPISODE: 25
index_episode.modify {substring('index_episode' "" type=regex)|'index_subtitle' "\A(EPISODE: \d{1,})"}
*
index_subtitle.modify {remove(type=regex)|"\A('index_episode': )"}
index_subtitle.modify {remove(type=regex)|"\A('index_episode')"}
index_episode.modify {remove|:}
index_episode.modify {cleanup(style=lower)}
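* worked example of the episode handling above (hypothetical subtitles; assumes the regex substring returns the captured group):
*   subtitle "Season 3 Episode 5: The Reunion" -> index_episode "season 3 episode 5", index_subtitle "The Reunion"
*   subtitle "EPISODE: 25"                     -> index_episode "episode 25", index_subtitle empty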
index_description.scrub {single|||
|}
*
* details and subdetails
*index_urlshow.modify {clear}
*index_urlsubdetail.modify {clear}
*index_temp_1.modify {clear}
*index_temp_2.modify {clear}
*
* if there is a 'star appendLink series'
* get title, desc, cat
* get the urlshow from the 'star appendLink series'
index_urlshow.scrub {single||href="|">|}
index_urlshow.modify {addstart('index_urlshow' not "")|http://www.locatetv.com}
*
* if there is no 'star appendLink series' the details are in the appendLink
* in that case we can clear the index_description because the same is also in the details
* get the urlshow from the 'star appendLink', but only if urlshow is still "" (no 'star appendLink series')
index_description.modify {clear('index_urlshow' "")}
index_temp_6.scrub {single||href="|">|}
index_urlshow.modify {addstart('index_urlshow' "")|http://www.locatetv.com'index_temp_6'}
index_urlshow.headers {customheader=Accept-Encoding=gzip,deflate}
end_scope
*
title.scrub {single||
|
|}
description.scrub {single||" />}
category.scrub {single||||
}
productiondate.scrub {single||||
}
category.modify {remove| ('productiondate')}
actor.scrub {multi|Cast
|)"}
*keith-david/21527">Keith David Stappleton
* role?
* remove role:
*actor.modify {remove(type=regex)|".+?(.*?)\Z"}
* alternative: add the word 'as' between name and role
actor.modify {replace(type=regex)|"().+"| as}
actor.modify {cleanup}
actor.modify {substring(type=element)|0 8} * limits to 8 actors
*
* the cast is also on a subdetail page, which additionally lists director and producer:
* enable the next two lines to grab that page
*urlsubdetail.modify {addstart('index_urlshow' not "")|'index_urlshow'/cast}
*urlsubdetail.headers {customheader=Accept-Encoding=gzip,deflate}
subdetail_temp_1.scrub {multi|Credits
|)"}
subdetail_director.modify {substring(type=regex)|'subdetail_temp_1' "\A(.+?) Director"}
subdetail_producer.modify {substring(type=regex)|'subdetail_temp_1' "\A(.+?) Producer"}
subdetail_producer.modify {substring(type=regex)|'subdetail_temp_1' "\A(.+?) Executive-Producer"}
*
* actor is already in the detail page, but just in case:
**subdetail_actor.scrub {multi|Cast
|)"}
**subdetail_actor.modify {cleanup(tags="<"">")}
**subdetail_actor.modify {substring(type=element)|0 8} * limits to 8 actors
**subdetail_actor.modify {substring(type=regex)|"\A(.+?) as "} * optional removal of role
** _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
** ##### CHANNEL FILE CREATION (only to create the locate.channel.xml file)
**
** @auto_xml_channel_start
*site {loadcookie=locatetv.com_cookies.txt}
*subpage.format{list(format=F0 step=1 count=25)|1}
*url_index {url|http://www.locatetv.com/listings/?start=&page=|subpage}
*index_site_channel.scrub {regex||]*class="channel"[^>]*data-name="([^>]*)">||}
*index_site_id.scrub {regex||]*class="channel".*?