
I updated tvsou.com.ini (CN)

Netuddki

I fixed tvsou.com.ini so that channel list creation now names each channel by its actual name instead of by its URL.

Updated the ini for the new website on 16 Jan. 2017.

Edit 2: I replaced the ini again because the channel list creation was not right.

EDIT 3 (19. Jan. 2017):

Updated the ini and the channel list.
There are far more channels than I initially thought, including channels from Tibet, Taiwan, Hong Kong and Macau.
Probably still not all of them.

There are even more channels on the site, but I am not sure whether those are really different channels or just duplicates.

Chinese people are either crazy about TV and really have thousands of channels, or the website just lists everything several times.

The ini now removes duplicates based on channel ID instead of channel name, because channel "names" are often just a generic description, like "Public TV" or "News channel".
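The difference between deduplicating by ID and by name can be sketched in a few lines of Python. This is a minimal illustration, not the ini's actual logic: the function name and the (channel_id, name) sample pairs are made up.

```python
def dedupe_by_id(channels):
    """Keep the first entry seen for each channel ID, preserving order.

    `channels` is an iterable of (channel_id, name) tuples.
    """
    seen = set()
    unique = []
    for channel_id, name in channels:
        if channel_id in seen:
            continue  # same ID already kept: treat as a duplicate listing
        seen.add(channel_id)
        unique.append((channel_id, name))
    return unique


# Made-up sample data: two different channels share the generic name
# "News channel", and one channel is listed twice under slightly different names.
channels = [
    ("cctv1", "CCTV-1"),
    ("sh-news", "News channel"),
    ("gd-news", "News channel"),
    ("cctv1", "CCTV 1"),
]

for channel_id, name in dedupe_by_id(channels):
    print(channel_id, name)
```

Deduplicating by name instead would wrongly drop one of the two "News channel" entries while keeping both spellings of CCTV-1.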

chen0928

The site has been updated and the data can no longer be collected. Could you please look into a fix? Thank you

Netuddki

Thanks!

I will check it!

chen0928

Hello, could you help me grab the site http://www.tvmao.com/program? This site has wider data coverage. Thank you

Netuddki

I updated the ini in the original post.


chen0928 wrote:

Hello, could you help me grab the site http://www.tvmao.com/program? This site has wider data coverage. Thank you

I will look at it.

Netuddki

I updated the ini again.

See the explanation in the OP.

Netuddki
chen0928 wrote:

Hello, could you help me grab the site http://www.tvmao.com/program? This site has wider data coverage. Thank you

I started creating the ini, but tvmao banned me temporarily, probably because I was scraping the site too often :-)

I will continue tomorrow.
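One way to reduce the chance of such a ban while developing an ini is to space out requests. A minimal sketch, assuming pages are fetched from your own Python script rather than by WebGrab; the `Throttle` class and the 5-second interval are hypothetical choices, not limits published by tvmao.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage (hypothetical URL list):
# throttle = Throttle(5.0)
# for url in channel_urls:
#     throttle.wait()
#     html = urllib.request.urlopen(url).read()
```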

Netuddki
chen0928 wrote:

Hello, could you help me grab the site http://www.tvmao.com/program? This site has wider data coverage. Thank you


I give up.

As far as I understand, the page source contains only the morning schedule up to noon; everything after that is generated by JavaScript on the fly, so WebGrab only gets the morning programs.
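A quick way to confirm this is to look for the latest clock time in the raw page source, which is what WebGrab sees. A minimal sketch, assuming the schedule times appear in the markup as HH:MM; the function names, the noon cutoff and the sample HTML are illustrative assumptions.

```python
import re

# Matches clock times such as 08:30 or 23:05 (assumed format of the schedule times).
TIME_RE = re.compile(r"\b([01]\d|2[0-3]):([0-5]\d)\b")


def latest_time(html):
    """Return the latest (hour, minute) found in the raw HTML, or None."""
    times = [(int(h), int(m)) for h, m in TIME_RE.findall(html)]
    return max(times) if times else None


def looks_morning_only(html, cutoff_hour=12):
    """True if the page source contains no time at or after cutoff_hour."""
    latest = latest_time(html)
    return latest is None or latest[0] < cutoff_hour


static = "<li><span>06:30</span>News</li><li><span>11:45</span>Drama</li>"
rendered = static + "<li><span>19:00</span>Evening news</li>"
print(latest_time(static), looks_morning_only(static))      # (11, 45) True
print(latest_time(rendered), looks_morning_only(rendered))  # (19, 0) False
```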

I assume one would have to load the whole page first (so that the JavaScript runs), save a copy of the rendered page and then grab that copy to get all the shows, but my knowledge isn't enough for that.
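For anyone who wants to try the load-then-save approach, a headless browser can run the page's JavaScript and write out the rendered HTML. A minimal sketch, assuming the third-party Playwright package is installed (`pip install playwright`, then `playwright install chromium`); the function name and output file name are hypothetical, and whether WebGrab can then grab the saved file is not covered here.

```python
from pathlib import Path


def save_rendered_page(url, out_file, timeout_ms=30000):
    """Load `url` in headless Chromium, let scripts run, and save the resulting HTML."""
    # Imported lazily so this module can be loaded even where Playwright is missing.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network traffic has stopped for a moment,
            # a rough signal that the schedule has been filled in by JavaScript.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            html = page.content()
        finally:
            browser.close()

    path = Path(out_file)
    path.write_text(html, encoding="utf-8")
    return path


# Usage (hypothetical output file name):
# save_rendered_page("http://www.tvmao.com/program", "tvmao_rendered.html")
```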

The mobile site m.tvmao.com is even worse, because its page source contains no schedule at all.

I'm sorry.

I was able to generate the channel list. It won't do anything on its own, but here it is.

chen0928

Perhaps this site is deliberately designed so that its pages cannot be crawled.

Netuddki

Probably yes.

It is the website of a big EPG provider that sells EPG data services, so it is most probably intentional that the site cannot be crawled.

chen0928

tvsou.com: the grabbed content is not right. I checked carefully, and it is always the content of the same day that gets grabbed. I hope this can be fixed, thank you.

Netuddki

Can you explain it in more detail?

It's hard to check, because I don't speak Chinese, and if I don't know exactly what is wrong, I don't know what to look for. :-)

chen0928

I was a continuous collection of more than 5 days of the channel and found that all the channels are collected on the first day of information, the acquisition of the first 5 days of the day

Netuddki

I updated the ini and the channel list.

There are far more channels on the site and now they are grabbed as well.

There are even more, but I am not sure whether those are really different channels or just duplicates.

Chinese people are either crazy about TV and really have thousands of channels, or the website just lists everything several times.

The ini now removes duplicates based on channel ID instead of channel name, because channel "names" are often just a generic description, like "Public TV" or "News channel".

chen0928 wrote:

I was a continuous collection of more than 5 days of the channel and found that all the channels are collected on the first day of information, the acquisition of the first 5 days of the day

I still don't understand what you are trying to say, sorry.
English is my third language and probably not your first either, so we have a problem here.

As far as I understand, you are saying that only the first day is grabbed and those programs are then copied to every day. Is that correct?

I looked at the generated xml and I don't see a problem, at least not with CCTV-1 and Dragon channel.
As far as I can compare without knowing Chinese, the programs in the xml are the same as on the site. Channels often have the same programming every day.

If you could post an example of your problem, copied from the XML, or some screenshots, maybe I could make sense of it.
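One way to make the "every day is the same" question checkable is to group the grabbed XMLTV programmes of one channel by date and compare consecutive days. A minimal sketch with Python's standard library; the function names, the channel ID `CCTV1` and the sample titles are hypothetical (use the channel id from your own guide file). Identical consecutive days are not a bug by themselves, since channels often repeat their programming.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def titles_per_day(xmltv, channel_id):
    """Map 'YYYYMMDD' -> programme titles for one channel, in file order.

    `xmltv` may be str or bytes (e.g. Path("guide.xml").read_bytes()).
    """
    root = ET.fromstring(xmltv)
    days = defaultdict(list)
    for prog in root.iter("programme"):
        if prog.get("channel") != channel_id:
            continue
        day = prog.get("start", "")[:8]  # start looks like "20170119003000 +0800"
        days[day].append(prog.findtext("title", default=""))
    return dict(days)


def identical_consecutive_days(days):
    """Return (day, next_day) pairs whose title lists are exactly the same."""
    keys = sorted(days)
    return [(a, b) for a, b in zip(keys, keys[1:]) if days[a] == days[b]]


# Made-up sample guide with two days for one channel.
sample = """<tv>
  <programme start="20170119000000 +0800" channel="CCTV1"><title>News</title></programme>
  <programme start="20170119080000 +0800" channel="CCTV1"><title>Drama</title></programme>
  <programme start="20170120000000 +0800" channel="CCTV1"><title>News</title></programme>
  <programme start="20170120080000 +0800" channel="CCTV1"><title>Sports</title></programme>
</tv>"""

days = titles_per_day(sample, "CCTV1")
print(identical_consecutive_days(days))  # []
```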

chen0928

I'm trying, but I use a translator, so some expressions are not correct. Previously, the grabbed content was from the first day.

chen0928

The problem persists: all the grabbed content is from the same day!

chen0928

The problem remains: the grabbed contents are all from the same day!

Netuddki

Today is the 19th of January.

The xml shows exactly the same data as the website.

The first show starts at 0:00 on the website and in the xml.

On the attached image, the date is marked green and the actual program is marked red.

I still don't see a problem.

I really don't want to sound rude, but copy+pasting the same sentence translated with Google will not get us anywhere.

Please explain your problem to a friend who speaks English, ask them to write it down in clear English, and paste that here.

I really want to help, but I can't until you explain the problem in enough detail for me to understand it.

If you mean that the previous days are not collected, that's not a problem with the ini.
That's how WebGrab works: it collects data from today (when the script started running) and the following days.
You can't collect data from yesterday.
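The rule above can be sketched as a date window that starts on the day the run started. A minimal illustration; the function name and the `days` count are hypothetical, and in practice the number of days comes from your own WebGrab configuration.

```python
from datetime import date, datetime, timedelta


def grab_window(run_started, days):
    """Dates covered by a grab: the day the run started plus the following days.

    Nothing before run_started.date() is included, so yesterday's
    schedule can never appear in the output.
    """
    first = run_started.date()
    return [first + timedelta(days=i) for i in range(days)]


window = grab_window(datetime(2017, 1, 19, 9, 30), 3)
print(window[0], window[-1])  # 2017-01-19 2017-01-21
```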

chen0928

The content is from the day before.

Netuddki

I just grabbed the channel and everything came out right.

The problem must be on your side.

Maybe it has something to do with time shift or time zones.

The ini grabs the schedule in Shanghai time.
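One possible explanation, and it is only a guess: if the times are stored as Shanghai time (UTC+8) and the viewer's software displays them in a zone further west, every show moves back across midnight, so the first show of a day appears on the previous date. A minimal sketch with Python's standard `zoneinfo` module (Python 3.9+); Europe/Berlin is just an example viewer zone and the function name is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SHANGHAI = ZoneInfo("Asia/Shanghai")


def to_viewer_time(shanghai_wall_time, viewer_zone):
    """Treat a naive schedule time as Shanghai time and convert it to viewer_zone."""
    return shanghai_wall_time.replace(tzinfo=SHANGHAI).astimezone(ZoneInfo(viewer_zone))


first_show = datetime(2017, 1, 19, 0, 0)  # 00:00 on the site, Shanghai time
print(first_show.replace(tzinfo=SHANGHAI).strftime("%Y%m%d%H%M%S %z"))  # 20170119000000 +0800
print(to_viewer_time(first_show, "Europe/Berlin"))  # 2017-01-18 17:00:00+01:00
```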

zjly87904756

All programs are grabbed with a time in the format 201905240000000.
The tvsou.com.ini version I'm using is Revision 3 - [19/01/2017] Netuddki.
Any comments are welcome!
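For comparison, XMLTV start times are written as `YYYYMMDDhhmmss` followed by a UTC offset, e.g. `20190524193000 +0800`. If every programme ends up with a zero time part, that suggests (an assumption, not confirmed in this thread) that the start time is not being scrubbed and only the date is used. The helper names below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time, UTC+8, no daylight saving


def xmltv_time(dt):
    """Format an aware datetime as an XMLTV timestamp: YYYYMMDDhhmmss +hhmm."""
    return dt.strftime("%Y%m%d%H%M%S %z")


def all_at_midnight(starts):
    """True if every start string has 00:00:00 as its time-of-day part."""
    return all(s[8:14] == "000000" for s in starts)


print(xmltv_time(datetime(2019, 5, 24, 19, 30, tzinfo=CST)))  # 20190524193000 +0800
```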

zjly87904756

The other fields have problems too; for now I'm only testing these two.
I can't understand why "index_title.scrub {single (separator="("include=1) | target="_blank |"> }" works properly,
while "index_start.scrub {single | class=" relative cur | < span > < / span > < / Li >}" does not!
Is the index_start.scrub syntax wrong,
or is the problem in index_showsplit.scrub {multi | < ol class= "font-14 Color-3 | < Li | < / Li > | < / OL >},
or is it something else?
My real goal is to learn how to write an ini by studying this case. Please give me some advice. Thank you!
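A useful first check is whether each literal string in a scrub actually occurs in the page source, in the order it is written. For example, a string with an extra space such as `class=" relative cur` can never match HTML that reads `class="relative cur"`. A minimal diagnostic sketch in Python, not a model of how WebGrab evaluates scrubs; the function name and sample HTML are hypothetical.

```python
def check_delimiters(html, delimiters):
    """Find each delimiter in order, each search starting after the previous match.

    Returns a list of (delimiter, index) pairs; index is -1 when not found,
    in which case the next search starts from the last successful position.
    """
    report = []
    pos = 0
    for d in delimiters:
        idx = html.find(d, pos)
        report.append((d, idx))
        if idx >= 0:
            pos = idx + len(d)
    return report


html = '<li class="relative cur"><span>19:30</span></li>'
for delimiter, index in check_delimiters(html, ['class=" relative cur', "<span>"]):
    print(repr(delimiter), index)
```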


Brought to you by Jan van Straaten

Program Development - Jan van Straaten ------- Web design - Francis De Paemeleere
Supported by: servercare.nl