
Trying to Improve an INI

hshah
Has donated long time ago

I have been spending loads of time trying to understand how all this works and make improvements to the existing RadioTimes UK INI.

I just about understand regex but can't seem to get my head around how it works with WebGrab++. I have still managed to make some improvements, but I prefer to learn and do more before I post it here for others.

So a few questions:

1) Let's say I am looking for a URL in the JSON like:
"uri":"http://abc.com/blah.jpg?quality=60"

I have the below so far, based on what I copied from elsewhere, but a) I am not sure it is right, and b) how do I make sure I only get back the first occurrence, knowing there will be more than one?
index_temp_1.scrub {regex|"uri":"||","|","}

2) Once the above is working, and I get back something like:
http://abc.com/blah.jpg?quality=60

I have this regex to extract the filename, but no idea how I would use it in the context of my ini:
([^/]+)\.(jpg|bmp|jpeg|gif|png|tif)
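As a quick check outside WebGrab, the filename regex can be exercised with Python's `re` module. This is only a sketch for testing the pattern; `image_filename` is a hypothetical helper, not part of WebGrab++.

```python
import re
from typing import Optional

# Filename pattern from the question, with the dot before the extension
# escaped so it only matches a literal "." (an unescaped "." matches any character).
FILENAME_RE = re.compile(r"([^/]+)\.(jpg|bmp|jpeg|gif|png|tif)")


def image_filename(url: str) -> Optional[str]:
    """Return "name.ext" for the first image filename found in url, or None."""
    m = FILENAME_RE.search(url)
    if m is None:
        return None
    return f"{m.group(1)}.{m.group(2)}"


print(image_filename("http://abc.com/blah.jpg?quality=60"))  # blah.jpg
```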

3) If a show title matches, let's say, "ABC" (exact match), how can I remove the show completely from the resulting file? If I use the line below, I assume it makes the title null, but is that sufficient?
index_title.modify {remove|ABC}

4) Is it possible to have multiple icons (index_showicon) with various sizes? I noticed it was a single value, so I assume not, but I swear I have seen XMLTV files with multiple icons.

Blackbear199
WG++ Team member, Donator

To try to answer...

1. If the JSON data has:

"uri":"http://abc.com/blah.jpg?quality=60"

then this is wrong for regex: index_temp_1.scrub {regex|"uri":"||","|","}

I assume you meant the separator string method:

index_temp_1.scrub {single|"uri":"||","|","}

A single scrub only keeps the first element it finds, so that also takes care of getting only the first occurrence.

2. You don't need to use regex again if you only want everything before the ?quality=60; it can be done on the same line as the scrub:

index_temp_1.scrub {single(separator="?" include=first)|"uri":"||","|","}
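To see what value ends up in the element, the single scrub with separator="?" include=first can be modelled roughly in Python. This is a simplified, hypothetical model: `scrub_single` is not a WebGrab function, and it folds the four-part separator string into just a start string and an end string. WebGrab's own matching rules may differ in detail.

```python
from typing import Optional


def scrub_single(text: str, start: str, end: str,
                 separator: Optional[str] = None,
                 include_first: bool = False) -> Optional[str]:
    """Rough model of a single separator-string scrub.

    Returns the text between the first occurrence of `start` and the next
    `end`. If `separator` is given with include_first=True, the value is
    split on it and only the first piece is kept.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    value = text[i:j]
    if separator is not None and include_first:
        value = value.split(separator)[0]
    return value


json_text = '"uri":"http://abc.com/blah.jpg?quality=60","other":"x"'
print(scrub_single(json_text, '"uri":"', '","'))
# http://abc.com/blah.jpg?quality=60
print(scrub_single(json_text, '"uri":"', '","', separator="?", include_first=True))
# http://abc.com/blah.jpg
```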

3. Once a show is created during the showsplit, it cannot be removed. Yes, you can remove the title and the show will not appear in your XML file, but your WebGrab log file will have a message saying "skipped a show without a title at xxxxx". If for some reason you want to omit a show, it has to be removed during the showsplit stage.

4. Yes, you can have multiple icons; it may or may not be simple to do, depending on how the data is formatted.
To give a simple example, I will use the data you have above.

Say you had multiple "uri":"xxxx"," entries:

"uri":"http://abc.com/blah1.jpg?quality=60","uri":"http://abc.com/blah2.jpg?quality=60","uri":"http://abc.com/blah3.jpg?quality=60","

All you have to do is change it from a single element scrub to a multi:

index_showicon.scrub {multi(separator="?" include=first)|"uri":"||","|","}

The only difference between using

index_showicon.scrub {single
and
index_showicon.scrub {multi

is that with single WebGrab only keeps the first element, whereas with multi it keeps them all.
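The single/multi difference can be sketched the same way: collect every value between the start and end strings, keep only the first for single, and keep them all for multi. The "|" join mirrors WebGrab's internal separator for multi-value elements; `scrub_values` is a hypothetical helper, not WebGrab code.

```python
import re
from typing import List, Optional


def scrub_values(text: str, start: str, end: str,
                 separator: Optional[str] = None,
                 include_first: bool = False) -> List[str]:
    """Return every value found between `start` and `end`, in order."""
    # The end string sits in a lookahead so it is not consumed; otherwise
    # the '","' ending one value would swallow the opening quote of the
    # next '"uri":"' and every second value would be skipped.
    pattern = re.escape(start) + r"(.*?)(?=" + re.escape(end) + r")"
    values = re.findall(pattern, text)
    if separator is not None and include_first:
        values = [v.split(separator)[0] for v in values]
    return values


data = ('"uri":"http://abc.com/blah1.jpg?quality=60",'
        '"uri":"http://abc.com/blah2.jpg?quality=60",'
        '"uri":"http://abc.com/blah3.jpg?quality=60","')

values = scrub_values(data, '"uri":"', '","', separator="?", include_first=True)
single = values[0]        # single: only the first element is kept
multi = "|".join(values)  # multi: all elements, "|"-separated
print(single)  # http://abc.com/blah1.jpg
print(multi)
```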

For other scenarios it would be more complicated.

hshah

Thank you very much for the detailed post. I can't say I understand all of it perfectly just yet, but I definitely have enough to continue tinkering/testing lol.

I have one more question based on what I've seen already being scrubbed in another ini file:

5) Assuming that there are multiple images but the first one referenced by "Image" has already been scrubbed using:

index_showicon.scrub {single|"Image":"||","|","}

I don't want the other image references (the first one is fine). How can I then duplicate "index_showicon", let's say 3 times, and make it into a multi?

The plan is to then add a different string to the end of each URL, which I am assuming I can just use .modify or .replace for, but I haven't worked out how to actually make it into a multi and then how to reference each item of the array with .modify or .replace.

I will let you know how I get on with the rest of questions you answered :)

Blackbear199

index_showicon.scrub {single|"Image":"||","|","}
index_showicon.modify {set(not "")|'index_showicon'[what you want to add here]\|'index_showicon'[what you want to add here]\|'index_showicon'[what you want to add here]}

Just keep adding \|'index_showicon'[what you want to add here] to the end as many times as you want.

WebGrab uses | internally as the separator for multi-value elements, but when we add to a single-value element (to make it multi) we must escape the |, so it becomes \|.

If you want the original image URL untouched, don't add anything to the first element.
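The set line can be sketched as plain string substitution: each 'index_showicon' reference is replaced by the element's current value, and every escaped \| starts another value. This is a simplified, hypothetical model (`set_multi` is not WebGrab code), and the "?width=..." suffixes are made-up stand-ins for "[what you want to add here]".

```python
import re
from typing import Dict, List


def set_multi(template: str, elements: Dict[str, str]) -> List[str]:
    """Rough model of modify {set|...} turning one value into several."""
    # Substitute each 'name' reference with that element's current value.
    # A function replacement inserts the value literally, with no escape
    # processing of backslashes.
    filled = re.sub(r"'(\w+)'", lambda m: elements[m.group(1)], template)
    # The escaped \| in the ini text is what separates the values.
    return filled.split("\\|")


elements = {"index_showicon": "http://abc.com/blah.jpg"}
# First entry left untouched, the others get a hypothetical suffix.
template = r"'index_showicon'\|'index_showicon'?width=400\|'index_showicon'?width=800"
icons = set_multi(template, elements)
print(icons)
```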

hshah

Woo Hoo! Probably not as efficient, but to help get my head around things and make future changes easier (mostly for myself), I ended up scrubbing the showicon multiple times and assigning separate temp_X variables. I did slightly different modifying for each one, and then combined the lot with the following:

index_showicon.modify {set(not "")|'index_temp_5'(height=576)(width=384)\|'index_temp_6'(height=720)(width=1280)}

How to add the height and width wasn't clear in the documentation, nor could I find anything online, but I managed to figure it out with some trial and error :)

I've noticed that the RadioTimes JSON contains IMDB references. Is it feasible to include them in the XMLTV so that any post processing can use this value rather than having to search/match based on other details?

Blackbear199

Post-processing doesn't work that way.

The only thing that could be done is adding the IMDb URL as a subdetails page, and several of these would be needed.
It isn't recommended, as it would slow the grabbing down to a snail's pace.

The site already has everything you would get using the post-processor, plus more.
If you're not getting everything, you need to fix your ini.

Example: Channel 4 for today (attached).

hshah

Hmm, you make it look easy and have also pointed out the bleeding obvious... other than the possibility of better artwork, I have absolutely no idea why I have spent so long faffing around with the IMDB thing.

Funnily enough, I actually just got the XMLTV tv_imdb tool working, and it augmented 57.9% of my XMLTV, but having looked at it in more detail, there is nothing of value being added lol!

On the plus side, I have images working nicely: there is a portrait and a landscape version, so that in Plex/Emby it looks decent on the majority of screens. I also managed to remove the Closed / Channel Off Air entries so that they do not appear in the EPG. Making some sort of progress at least :D

I have to ask: is the manual something that most people understand, or am I missing something? I tend to find the part I am looking for, but then I can't work out how it all works. For example, I am now trying to add premiere and previously-shown, but don't understand what the manual means about the value having to be true. I also get back a mixture of true and false, which I found when I tried something like this (copied bits and pieces from here and there):

premiere.scrub {regex(debug)||"premier":(\w+),"||}
premiere.modify {cleanup(debug)(removeduplicates=equal,100)}
premiere.modify {replace(debug)(not "")|'premiere'|true}

Having looked at another line which was already in the ini:
index_starrating.scrub {regex(not "null")||"FilmStarRating":(.*?),"||}

Together with the (not "") bit you mentioned yesterday, I took that to mean the operation should only be performed if the value is not null or "". However, when I debugged the starrating regex above (which was already present in the ini), I was seeing null in the XMLTV whenever there wasn't a number present.

Have I misunderstood how this part works? I tested not "xxxx" with true, false, null, "", "null" etc., but the outcome was always the same, so I can't quite tell what it is doing :(

Blackbear199

The premiere element has been broken since WG version 2.1 and still hasn't been fixed,
so even if your scrub did work, nothing would be written to your guide.xml.

If you read the manual on this element, you will see its value is boolean (the only valid values are true or false).

When you add the argument (not "") or (not "xxx"), with xxx being null or any other value you want to use, it means the operation is only performed if the element being used is not empty (has a value), or, for (not "xxx"), if its value is not the one you specify in "xxx".
(not "") means the same thing as the argument (null).
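A pre-conditional (not "xxx") check can be modelled in a few lines: the element's current value is compared before anything runs, and the operation is skipped when it equals "xxx". `apply_unless` and `add_suffix` are hypothetical helpers used only to illustrate the idea.

```python
from typing import Callable


def apply_unless(value: str, forbidden: str, operation: Callable[[str], str]) -> str:
    """Model of a (not "forbidden") pre-condition.

    The check looks at the element's value *before* the operation; if it
    equals `forbidden`, the operation is skipped and the value is unchanged.
    """
    if value == forbidden:
        return value
    return operation(value)


def add_suffix(v: str) -> str:
    """Hypothetical operation: append a query string."""
    return v + "?quality=90"


print(repr(apply_unless("", "", add_suffix)))          # '' (empty, so skipped)
print(apply_unless("http://abc.com/a.jpg", "", add_suffix))
print(repr(apply_unless("null", "null", add_suffix)))  # 'null' (skipped)
```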

Don't confuse this with the argument (notnull). From the manual:

4.6.2.2 Post-Conditional arguments
These conditions will only be evaluated if the pre-conditions are true (or left out).
Values: either anycase, null or notnull
- anycase (default): the operation will be performed regardless of any of the conditions described below.
- null: the operation will only be performed if the element is empty.
- notnull:
  - in the case of addstart and addend commands: the operation will only be performed if the element is not empty;
  - in the case of replace and remove commands: the operation will not be performed if the element will become empty during/through the operation.

Blackbear199

To give a practical example of what notnull is useful for:
say index_title has the year at the end (a 4-digit number).

index_title.modify {remove(notnull type=regex)|"\s*\d{4}"}

So say you had these 2 titles:

My Movie 2019
and
2012

My Movie 2019 ==> My Movie
2012 ==> 2012

So for the second one the 4-digit number is not removed, because that would leave the title empty; notnull tells WebGrab not to perform the operation if that would happen.
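The notnull behaviour in this example can be reproduced with Python's `re` module: perform the removal, then throw the result away if it would leave the title empty. This is a sketch; `remove_notnull` is hypothetical, and it removes every match of the pattern, which may not be exactly how WebGrab's remove treats multiple matches.

```python
import re

YEAR = r"\s*\d{4}"  # same pattern as the index_title.modify line


def remove_notnull(value: str, pattern: str) -> str:
    """Model of modify {remove(notnull type=regex)|pattern}."""
    result = re.sub(pattern, "", value)
    # Post-condition: if the removal would empty the element, skip it.
    return value if result == "" else result


for title in ["My Movie 2019", "2012"]:
    print(f"{title} ==> {remove_notnull(title, YEAR)}")
```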

hshah

Ohhh! I was thinking about it the other way around. I am generally a technical person, but regex and this thing really do make my head hurt lol.

Looking at the RadioTimes ini that I started with, someone put this line:
subtitle.scrub {single(not "null")|"subtitle":||}|}}

Is that basically saying not to bother scrubbing subtitle if it is going to be:
"subtitle":"null"

And if it was one of these two (not sure which is the right way of doing it or if they are even valid):
subtitle.scrub {single(notnull)|"subtitle":||}|}}
subtitle.scrub {single(not "")|"subtitle":||}|}}

Does it mean:
"subtitle":""
"subtitle":

So going back to one of the points I mentioned, this was already present in the ini:
index_starrating.scrub {regex(not "null")||"FilmStarRating":(.*?),"||}

I was getting this in the XML:

null

The part in the JSON was "FilmStarRating":null, so is that why not "null" never worked (because it wasn't matching a string), and it should instead be one of the two I mentioned above?

Blackbear199

subtitle.scrub {single(notnull)|"subtitle":||}|}}
subtitle.scrub {single(not "")|"subtitle":||}|}}

notnull means don't scrub the value if the element (subtitle) will be empty after the scrub.
This is called a post-conditional argument (it is evaluated after the operation, the scrub in this case, is performed).

not "" means only scrub the value if the element (subtitle) is not (currently) empty.
This is a pre-conditional argument (it is evaluated before the operation, the scrub in this case, is performed).

Neither argument makes sense in this case.
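The difference between the two kinds of argument on a scrub can be sketched as follows: the pre-condition looks at the element before the scrub, when it is normally still empty, and the post-condition looks at the scrubbed result. `scrub_with_conditions` is a hypothetical model, not WebGrab code, and it assumes the element starts out empty, as it does before its first scrub.

```python
from typing import Optional


def scrub_with_conditions(current: str, scraped: str,
                          not_value: Optional[str] = None,
                          notnull: bool = False) -> str:
    """Model of element.scrub {single(<args>)|...}.

    current   -- element value before the scrub (pre-condition input)
    scraped   -- value the scrub would produce
    not_value -- pre-condition (not "not_value"); None means no pre-condition
    notnull   -- post-condition: discard the result if it is empty
    """
    if not_value is not None and current == not_value:
        return current  # pre-condition failed: the scrub never runs
    result = scraped
    if notnull and result == "":
        return current  # post-condition: keep the old value instead
    return result


# Before its first scrub the element is empty, so (not "") always skips:
print(repr(scrub_with_conditions("", "Episode 1", not_value="")))  # ''
# (not "null") compares against "", not the scraped "null", so it runs:
print(repr(scrub_with_conditions("", "null", not_value="null")))   # 'null'
# notnull on an empty scrub leaves the element empty, as it already was:
print(repr(scrub_with_conditions("", "", notnull=True)))           # ''
```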

index_starrating.scrub {regex(not "null")||"FilmStarRating":(.*?),"||}

Here you're saying don't scrub the starrating if its value is "null".
Again, this is a pre-conditional argument (it is evaluated before the operation, the scrub in this case, is performed).

So you can't use it to skip the starrating when its value is "null", because you don't know what the value will be until after the scrub happens.

What would make sense is:

index_starrating.scrub {regex||"FilmStarRating":(.*?),"||}
index_starrating.modify {clear("null")}

So after you scrub the value, you check if it's "null" and, if so, delete it, as it's not a real starrating value.
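The scrub-then-clear approach can be checked in Python using the same regex as the index_starrating line. This is a sketch under assumptions: `star_rating` is a hypothetical helper, and returning None is just this model's way of saying "no star rating is written".

```python
import re
from typing import Optional

# Same pattern as the index_starrating scrub above.
STAR_RE = re.compile(r'"FilmStarRating":(.*?),"')


def star_rating(json_text: str) -> Optional[str]:
    """Scrub FilmStarRating, then clear it if it is the JSON literal null."""
    m = STAR_RE.search(json_text)
    value = m.group(1) if m else ""
    if value == "null":  # equivalent of modify {clear("null")}
        value = ""
    return value or None  # None: no star rating in this model


print(star_rating('{"FilmStarRating":4,"Title":"A Film"}'))     # 4
print(star_rating('{"FilmStarRating":null,"Title":"A Show"}'))  # None
```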


Brought to you by Jan van Straaten

Program Development - Jan van Straaten ------- Web design - Francis De Paemeleere
Supported by: servercare.nl