User-generated content forms an important domain for mining knowledge. In this paper, we address the task of blog feed search: to find blogs that are principally devoted to a given topic, as opposed to blogs that merely happen to mention the topic in passing. The large number of blogs makes the blogosphere a challenging domain, both in terms of effectiveness and of storage and retrieval efficiency. We examine the effectiveness of an approach to blog feed search that is based on individual posts as indexing units (instead of full blogs). Working in the setting of a probabilistic language modeling approach to information retrieval, we model the blog feed search task by aggregating over a blogger's posts to collect evidence of relevance to the topic and persistence of interest in the topic. This approach achieves state-of-the-art performance in terms of effectiveness. We then introduce a two-stage model where a pre-selection of candidate blogs is followed by a ranking step. The model integrates aggressive pruning techniques as well as very lean representations of the contents of blog posts, resulting in substantial gains in efficiency while maintaining effectiveness at a very competitive level.

We increasingly live our lives online: we keep in touch with friends on Facebook, expand our network using LinkedIn, quickly post messages on Twitter, comment on news events on online newspaper sites, help others on forums, mailing lists, or community question-answer sites, and report on experiences or give our opinions in blogs. All of these activities involve the creation of content by the end users of these platforms, as opposed to editors or webmasters. This content, i.e., user-generated content, is particularly valuable as it offers an insight into what people do, think, need to know, or care about. Organizations look for ways of mining the information that is available in these user-generated sources, and to do so, tools and techniques need to be developed that are capable of handling this type of content. A blog is the unedited, unregulated voice of an individual (Mishne 2007), as published on a web page containing time-stamped entries (blog posts) in reverse chronological order (i.e., last entry displayed first).
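To make the post-based approach described above more concrete, here is a minimal sketch of how blog scores might be computed from a post index. It is not the paper's implementation: the Jelinek-Mercer smoothing, the uniform weighting of posts within a blog, the top-N pre-selection rule, and the names `post_likelihood`, `rank_blogs`, `lam`, and `top_n_posts` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def post_likelihood(query, post_tokens, coll_counts, coll_len, lam=0.5):
    """Query likelihood P(q | post) with Jelinek-Mercer smoothing (assumed)."""
    tf = Counter(post_tokens)
    n = len(post_tokens)
    p = 1.0
    for term in query:
        p_post = tf[term] / n if n else 0.0
        p_coll = coll_counts[term] / coll_len
        p *= (1 - lam) * p_post + lam * p_coll
    return p


def rank_blogs(query, posts, top_n_posts=None, lam=0.5):
    """Rank blogs from a list of (blog_id, tokens) posts.

    Stage 1 (optional): keep only blogs that own one of the top-N posts.
    Stage 2: average post likelihoods per blog, so a blog with one
    on-topic post among many off-topic ones scores lower than a blog
    that writes about the topic persistently.
    """
    coll_counts = Counter(t for _, toks in posts for t in toks)
    coll_len = sum(coll_counts.values())
    scored = [
        (blog, post_likelihood(query, toks, coll_counts, coll_len, lam))
        for blog, toks in posts
    ]
    blog_sizes = Counter(blog for blog, _ in posts)

    # Stage 1: hypothetical pre-selection of candidate blogs.
    if top_n_posts is None:
        candidates = set(blog_sizes)
    else:
        top = sorted(scored, key=lambda x: x[1], reverse=True)[:top_n_posts]
        candidates = {blog for blog, _ in top}

    # Stage 2: aggregate post scores with uniform P(post | blog).
    totals = defaultdict(float)
    for blog, score in scored:
        if blog in candidates:
            totals[blog] += score / blog_sizes[blog]
    return sorted(totals.items(), key=lambda x: x[1], reverse=True)


posts = [
    ("cooking_blog", ["pasta", "recipe", "sauce"]),
    ("cooking_blog", ["bread", "recipe", "oven"]),
    ("mixed_blog", ["pasta", "recipe", "dinner"]),
    ("mixed_blog", ["football", "match", "goal"]),
    ("mixed_blog", ["movie", "review", "actor"]),
    ("sports_blog", ["football", "goal", "league"]),
]
for blog, score in rank_blogs(["recipe"], posts, top_n_posts=3):
    print(blog, round(score, 4))
```

In this toy collection, averaging over each blog's posts is what separates a blog devoted to the topic from one that mentions it once, and the pre-selection step limits how many blogs need to be scored at all.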