
2007-10-04 23:06

Legitimate Content Scraping


I know, it doesn't sound good. Content scraping is usually frowned upon, especially by bloggers, but I have found a legitimate reason to write content scraping software. If you don't know what content scraping is, it's the act of taking a web page or web feed, skimming off the content (the article, the post, the breaking news story, etc.), and putting it to some other use. In most contexts, content scraping is used to steal web content. New sites appear every day that do nothing but scrape fresh content from the web and pass it off as original - sometimes causing trouble for the original author in search engine results. The scrapers are usually trying to monetize other people's hard work and reap the rewards of search engine traffic.

So, what legitimate purpose does content scraping hold? Retrieval of forum data. Long ago my sister started a forum on a free forum host, forumer.com. Now that it has grown to a decent size and built its own community, she would like to move away from forumer.com for better branding and customization. The only problem is that forumer.com is not cooperative when it comes to handing forum databases over to their customers. That leaves us with no option other than to scrape the user, thread, post, and other data from the forumer forum and insert it into the database of her newly set up forum.

My hope was that other people had already run into this problem and had either come up with a solution or written software for it. Unfortunately, it doesn't seem like anyone has. So, as I continue to search for forum scraping software, I am beginning to mentally organize just how to write the software myself, as that might be where this project ends up.

If you are ever planning to start a forum, think twice before using a free hosting service, and consider investing in hosting it yourself - it may be worth it down the road. If you are like my sister and are stuck in a bad situation, you can always consider scraping your own forum's content.
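
To give a rough idea of what the scrape-and-insert step might look like, here is a minimal Python sketch using only the standard library. It is not based on forumer.com's real markup: the `post`, `author`, and `body` class names, the `posts` table, and the `PostScraper` and `save_posts` names are all made up for illustration, and the sketch assumes each field holds plain text with no nested tags. A real forum page would need its own selectors, plus a step to download each thread page first.

```python
import sqlite3
from html.parser import HTMLParser


class PostScraper(HTMLParser):
    """Collects author/body pairs from one thread page.

    Hypothetical layout (an assumption, not forumer.com's real HTML):
    <div class="post"><span class="author">..</span><div class="body">..</div></div>
    """

    def __init__(self):
        super().__init__()
        self.posts = []        # finished posts, in page order
        self._current = None   # post being built, or None outside a post
        self._field = None     # "author" or "body" while inside that element

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "div" and cls == "post":
            self._current = {"author": "", "body": ""}
        elif self._current is not None and cls in ("author", "body"):
            self._field = cls

    def handle_endtag(self, tag):
        if self._field is not None:
            # Closing the author/body element (assumed to hold plain text only).
            self._field = None
        elif tag == "div" and self._current is not None:
            # Closing the outer post container.
            self.posts.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] += data


def save_posts(conn, posts):
    """Insert scraped posts into a hypothetical `posts` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS posts (author TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (author, body) VALUES (:author, :body)", posts
    )
    conn.commit()


if __name__ == "__main__":
    sample = (
        '<div class="post"><span class="author">alice</span>'
        '<div class="body">Hello</div></div>'
    )
    scraper = PostScraper()
    scraper.feed(sample)
    conn = sqlite3.connect(":memory:")  # stand-in for the new forum's database
    save_posts(conn, scraper.posts)
    print(conn.execute("SELECT author, body FROM posts").fetchall())
```

Users and threads would presumably follow the same pattern, each with its own parser and table. The in-memory SQLite database is only a stand-in here, since the new forum software will have its own schema to map onto.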

- Kevin
Kevin (at) Upcsite (dot) Net







