swisscom.ch

DeBaschdi
Offline
Has donated long time ago
Joined: 3 years
Last seen: 1 year
Hi, I'm trying to create an ini for swisscom.ch,

but I'm stuck on "error downloading page: Index was outside the bounds of the array."

Maybe someone can help?

ini:

**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
* @Site: services.sg1.etvp01.sctv.ch
* @MinSWversion: V2.1.5
* @Revision 1 - [25/03/2019] DeBaschdi
* -Creation
* @Remarks:
* @header_end
**------------------------------------------------------------------------------------------------
site {url=swisscom.ch|timezone=UTC|maxdays=14.1|cultureinfo=de-DE|charset=UTF-8|titlematchfactor=50}
*

url_index{url(debug)|https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/ids=|channel|;level=enorm;start=|urldate|}
urldate.format {datestring|yyyyMMddHHmm}

url_index.headers {customheader=Accept-Encoding: gzip, deflate, br}
url_index.headers {accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8}
url_index.headers {accept=application/json; charset=utf-8}

*index_urlshow.modify {cleanup(style=jsondecode)}

*
index_showsplit.scrub {(debug)multi|"Content":|"Channel"}
*index_showsplit.modify {cleanup(style=jsondecode)}

index_start.scrub {single(pattern="yyyy-MM-dd-HH:mm:ss")|"Start":"||"|"}
index_stop.scrub {single(pattern="yyyy-MM-dd-HH:mm:ss")|"End":"||"|"}

log:

[ Info ] ( 1/1 ) SWISSCOM.CH -- chan. (xmltv_id=ard) -- mode Force
[ Debug ] debugging information siteini; urlindex builder
[ Debug ] siteini entry :
[ Debug ] urldate format type: datestring, value: |yyyyMMddHHmm
[ Debug ] https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/ids=|channel|;level=enorm;start=|urldate
[ Debug ] url_index created:
[ Debug ] https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/ids=25;leve...
[Error ] error downloading page: Index was outside the bounds of the array. (10sec)
[Error ] retry 1 of 4 times
[Error ] error downloading page: Index was outside the bounds of the array. (20sec)
[Error ] retry 2 of 4 times
[Error ] error downloading page: Index was outside the bounds of the array. (30sec)
[Error ] retry 3 of 4 times
[Error ] error downloading page: Index was outside the bounds of the array. (40sec)
[Error ] retry 4 of 4 times
[Error ] Unable to update channel ard
[Critical] Generic syntax exception:
[Critical] message:
[Error ] no index page data received from ard
[Error ] unable to update channel, try again later
[ Info ] Existing guide data restored!
[ Debug ]
[ Debug ] 0 shows in 1 channels
[ Debug ] 0 updated shows
[ Debug ] 0 new shows added
[ Info ]
[ Info ]
[ ] Job finished at 29/03/2019 06:11:31 done in 0s

I think my problem is finding the right index headers?

DeBaschdi

After a little bit of testing with headers:

url_index.headers {host=services.sg1.etvp01.sctv.ch}
url_index.headers {accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8}
url_index.headers {customheader=Accept-Encoding=gzip,deflate,br}
url_index.headers {customheader=Accept-Language=de,en-US;q=0.7,en;q=0.3}
url_index.headers {customheader=Upgrade-Insecure-Requests=1}

The log now says:
[ Debug ] url_index created:
[ Debug ] https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/ids=25;leve...
[Warning ] error downloading page: Error: SecureChannelFailure (The authentication or decryption has failed.)

mat8861
Offline
WG++ Team member, Donator
Joined: 6 years
Last seen: 8 hours

You need to read the manual. There are a lot of things wrong. As a first step, use the plain URL https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/(end=201903300500;ids=401;level=normal;sa=true;start=201903290500)
That way you will see whether you are actually grabbing the page or whether something else needs to be done.
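
For anyone following along, here is a minimal Python sketch of how a URL of that shape can be assembled for one channel and one 24-hour window. The function name `build_index_url` and the `hours` parameter are made up for illustration; only the base URL and the `end`, `ids`, `level`, `sa` and `start` fields come from the URL above, and the sketch assumes the timestamps are laid out as yyyyMMddHHmm, as they appear to be there.

```python
from datetime import datetime, timedelta

# Base taken from the URL quoted above.
BASE = "https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/"


def build_index_url(channel_id: str, start: datetime, hours: int = 24) -> str:
    """Return a guide URL for one channel covering `hours` from `start`.

    Hypothetical helper: the field order mirrors the URL in the post.
    """
    fmt = "%Y%m%d%H%M"  # assumed layout, e.g. 201903290500
    end = start + timedelta(hours=hours)
    params = (
        f"end={end.strftime(fmt)};ids={channel_id};"
        f"level=normal;sa=true;start={start.strftime(fmt)}"
    )
    return f"{BASE}({params})"


print(build_index_url("401", datetime(2019, 3, 29, 5, 0)))
```

Pasting the printed URL into a browser is a quick way to confirm that the server answers before any scrubbing is attempted.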

DeBaschdi

Thanks for your response, but I always get "SecureChannelFailure".
Going via PHP works.

Blackbear199
Offline
WG++ Team member, Donator
Joined: 6 years
Last seen: 1 week

"SecureChannelFailure"

1. If you're on Windows, you need at least WebGrab+Plus V2.1.5 and/or the .NET Framework updated to the latest release for your Windows version. It could also be that TLS 1.2 isn't enabled (on Windows 7, for example), so you may have to search for how to check that as well.
2. If you're on Linux, you need Mono > 5.0.0.

mat8861
DeBaschdi wrote:

Thanks for your response, but I always get "SecureChannelFailure".
Going via PHP works.

PHP... is that the current WebGrab fashion? Out of 100 sites, only 1 needs PHP.
tvair.swisscom.ch is one of the other 99.
So remove those crap url_index headers and start checking what is actually downloaded (first check what BB199 said).
Here is a start:
url_index{url|https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/(end=201903300500;ids=401;level=normal;sa=true;start=201903290500)}
url_index.headers {customheader=Accept-Encoding=gzip,deflate}
*
urldate.format {datestring|yyyyMMdd}

DeBaschdi

Thanks again for the reply, but I still get the "secure channel" error.
I'm on Ubuntu 18.04 with Mono 4.6.2.7+dfsg-1ubuntu1.

Maybe that's the problem BB mentioned.

I'll try to upgrade my Mono version.

**edit

Yay, thanks BB!
With Mono JIT compiler version 5.18.1.0 (tarball Fri Mar 15 20:41:32 UTC 2019)
there is no more "secure channel" problem.

mat8861

Good, now next step ;)

DeBaschdi

OK, the secure channel problem has gone away with Mono > 5,
but I'm not able to receive the html.source.htm for "debugging".

Maybe I should finish the PHP version, and you pros can fix it up for "normal" usage :)

DeBaschdi

OK, I'm stuck on the next step: scraping the start and stop times.

I have already separated my blocks, one per programme (see the attached log).

Is this "rule" for scraping the start time wrong?
[{"AvailabilityStart":"2019-03-29T01:08:00Z","AvailabilityEnd":"2019-03-29T01:10:00Z"}]

index_start.scrub {single(debug)(pattern="yyyy-MM-ddHH:mm:ss")|"AvailabilityStart":"||"|"}
index_start.modify {remove|T}
index_start.modify {remove|Z}

I ask because nothing shows up in the log for "debugging".
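
As a sanity check outside WebGrab, here is a small Python sketch that applies the same idea to the sample block: strip the "T" and "Z" and parse what is left with the counterpart of the yyyy-MM-ddHH:mm:ss pattern. The helper name `parse_time` is made up for illustration, and the sketch says nothing about the order in which WebGrab itself applies the scrub pattern and the modify lines; it only shows that the cleaned string and the pattern agree with each other.

```python
import json
from datetime import datetime, timezone

# The sample block from the post.
sample = '[{"AvailabilityStart":"2019-03-29T01:08:00Z","AvailabilityEnd":"2019-03-29T01:10:00Z"}]'


def parse_time(value: str) -> datetime:
    """Strip the 'T' and 'Z' markers, then parse the remainder as UTC."""
    cleaned = value.replace("T", "").replace("Z", "")
    # %Y-%m-%d%H:%M:%S is the strptime counterpart of yyyy-MM-ddHH:mm:ss
    parsed = datetime.strptime(cleaned, "%Y-%m-%d%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


entry = json.loads(sample)[0]
start = parse_time(entry["AvailabilityStart"])
stop = parse_time(entry["AvailabilityEnd"])
print(start.isoformat(), stop.isoformat())
```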

mat8861

What do you mean by PHP version? There is no PHP involved here. If you want to check whether the page is received, add this after the 3 lines above: index_showsplit.scrub{multi(debug)||||}. This command should show the downloaded page in webgrablog.txt and will also save the html page. Forget PHP... nothing on this site involves PHP.
Your showsplit is wrong... which means your approach is confused. Read pages 60-61... then proceed with the times and the other stuff.

DeBaschdi

I know, but the page doesn't download...?!
Maybe it's one more bug in my WebGrab version...

With my PHP helper file:

<?php
// Local helper: fetches the Swisscom JSON with cURL and echoes it back to WebGrab+Plus.
$agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0';
$dir_path = dirname(__FILE__);
$type = $_GET['type'];   // 1 = guide data, 2 = channel list
$start = $_GET['date'];  // first day, as passed in by |urldate|
$time = $_GET['time'];   // number of days to add to $start
$stop = date("Ymd", strtotime("$start +$time days"));

if ($type == '1') {
    $channel = $_GET['channel'];
    $url2 = 'https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/(end=' . $stop . '2359;ids=' . $channel . ';level=normal;start=' . $start . '0000)';
    $ch = curl_init($url2);
    // Each CURLOPT_HTTPHEADER call replaces the previous list, so set the header once.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/json; charset=utf-8'));
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Certificate verification is switched off here; acceptable only for local testing.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $dir_path . '/swisscom.ch.cookies.txt');
    $output2 = curl_exec($ch);
    curl_close($ch);
    echo $output2;
} elseif ($type == '2') {
    $url3 = 'https://services.sg2.etvp01.sctv.ch/portfolio/tv/channels';
    $ch = curl_init($url3);
    // Both headers go into one array for the same reason as above.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: application/json, text/javascript, */*; q=0.01',
        'Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
    ));
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $dir_path . '/swisscom.ch.cookies.txt');
    $output3 = curl_exec($ch);
    curl_close($ch);
    echo $output3;
}
?>

and this in my ini:
url_index{url|http://127.0.0.1:92/wordpress/swisscom_ch.php?channel=|channel|&date=|urldate|&time=##time##&type=1}
url_index.headers {customheader=Accept-Encoding=gzip}

everything is fine...

mat8861

That is all stuff you don't need. WebGrab downloads the page itself; you don't need any of that. I have already told you 3 times: NO PHP is needed.

DeBaschdi

Maybe it's a bug in my Mono version... Maybe I'll finish the PHP version, and if you want, you can modify it for "normal usage" and we test again :)

Anyway,
can you help me find the correct way to scrub the start time?

mat8861

Start with the ini above... don't make things complex. Forget PHP, WebGrab does it for you.

DeBaschdi

OMG, I'm an idiot, you're absolutely right.
It works without the PHP file; it was just a typo in my ini :D

OK, back to the roots: I still have the problem of scraping the start time :)

mat8861

Like I said above, try with one channel line (URL) so you understand it. You are actually going too fast. You should start by getting the channel list, see what is needed in the URL (channel_id, start_time and end_time, or whatever else is needed), make up your channel list, and only after that work on the URL.

DeBaschdi

Aaaaaah,
it's because of this "inblock":
[{"AvailabilityStart":"2019-03-29T01:08:00Z","AvailabilityEnd":"2019-03-29T01:10:00Z"}]

So (includeblock="AvailabilityStart") needs to be defined, right?

Thank you very much, mat. I learned a lot today :)

DeBaschdi

Sometimes you are pretty puffed up. Does it have to be that way?

DeBaschdi

I don't know what you mean.
There is no copy-paste, and start and stop are defined and used, so what you say is absolutely wrong. And I know what I'm doing.
Defined:

$start = $_GET['date'];
$time = $_GET['time'];
$stop = date("Ymd", strtotime("$start +$time days"));

And used here:

$url2 = 'https://services.sg1.etvp01.sctv.ch/catalog/tv/channels/list/(end=' . $stop . '2359;ids=' . $channel . ';level=normal;start=' . $start . '0000)';

Also... I don't understand everything in WebGrab+, but I'm ready to learn.

mat8861

It sure is surprising how you can set up something like that without knowing the basics... a bit strange.

DeBaschdi

What do cURL and PHP have to do with WebGrab?
I'm new to WebGrab, but I have experience in other areas.
And when I see a command in an ini, I understand it after a little trial and error too.
But mat is right: I should first read the "instructions" completely and then experiment.
Nevertheless, I have managed to create a working ini for swisscom, with mat's help of course. It is not that difficult if you really put your mind to it.

DeBaschdi

Hi guys,
the ini for swisscom is almost done.
I have a problem with the channel creation:
the channelshapes block also contains "Identifier":,
and after hours of tinkering I have not found a usable solution.
Can you please look over it?

This is my first ini, so certainly much of it can be solved "more professionally". Maybe you could comment on the individual points and give me tips.
Thank you very much in advance.

Blackbear199

That's where time and experience come into play. There are a few ways you can do it once you get used to the tools WebGrab provides.

index_site_id.scrub {multi(exclude="-")|{"Identifier":"||",|",}
or
index_site_id.scrub {regex||\{"Identifier":"(\d+)",||}

The first, using the separator string method, only keeps the identifiers that don't have a "-" in them. This should keep only the ones that really are channel ids.

The second does much the same thing using regex, by saying that the value between the two quotes must be all digits, so any identifier with a letter, a "-" or anything else that is not a number will be excluded.
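
To see the difference between the two filters outside WebGrab, here is a small Python sketch. The sample JSON and its identifier values are invented for illustration, and the "no dash" filter only approximates what the exclude argument does; the regex is the same one as in the second line above.

```python
import re

# Hypothetical sample: one real channel id, one identifier with a dash,
# and one identifier with letters but no dash.
sample = (
    '[{"Identifier":"401","Title":"A"},'
    '{"Identifier":"a1b2-c3","Title":"B"},'
    '{"Identifier":"tv9","Title":"C"}]'
)

# Same pattern as the regex scrub: the value must consist of digits only.
numeric_ids = re.findall(r'\{"Identifier":"(\d+)",', sample)

# Rough stand-in for the separator method with exclude="-":
# take every identifier, then drop the ones that contain a dash.
all_ids = re.findall(r'\{"Identifier":"(.*?)",', sample)
no_dash_ids = [value for value in all_ids if "-" not in value]

print(numeric_ids)
print(no_dash_ids)
```

On this made-up data the regex keeps only "401", while the dash filter also lets "tv9" through, so the regex is the stricter of the two.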

DeBaschdi

Thanks Blackbear, the regex solution works :)

cixxo
Offline
Donator
Joined: 3 years
Last seen: 10 months

Hi, I'm interested in the ini file. Would you like to share it once it's completed?

DeBaschdi

Sure, feel free to use it.
It's almost bug-free :)

cixxo

Thanks :-)

