Tuesday, December 20, 2011

Web Browser Behavior Exploitation

Background
Web browsing behavior data is an important source of customer feedback. This feedback can be divided into two groups, explicit and implicit. The main characteristics of each are listed below.
Explicit feedback
  • specified keywords
  • selected & marked documents
  • rated items
Implicit feedback
  • natural interaction with the system
  • no extra cost
  • less accurate
  • can be combined with explicit feedback
  • Potential observable behavior
  • “modeling information content using observable behavior” (ASIST, 2003).
This feedback has been analyzed year after year. Within implicit feedback, web browsing behavior on the World Wide Web (WWW) has received wide attention. Researchers have studied web browsing behavior since 1995 (the WWW itself started in 1991).

How to collect data
Data can be collected through the WWW system via link hits (URL, content, history, etc.), page operations, page views per frame, keywords, mouse activity, button use, eye tracking (movement, gaze point), talk-aloud sessions recorded on video, diaries, and interviews.

The reading time of an article can be analyzed with the following techniques:
  • information filtering based on user behavior analysis and best text match
  • a correlation between article and reading time
  • article vs reading time in graph data
  • Using tools such as gnus/emacs block
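As an illustration of the correlation idea in the list above, the following minimal sketch computes a Pearson correlation between reading time and an explicit rating. The sample numbers and the choice of a 1-5 rating as the "interest" signal are assumptions made for this example only; they are not data from the studies mentioned.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical observations: seconds spent on each article and the
# explicit 1-5 rating the same reader gave afterwards.
reading_seconds = [5, 12, 30, 45, 60]
ratings = [1, 2, 3, 4, 5]

r = pearson(reading_seconds, ratings)
print(f"correlation between reading time and rating: {r:.2f}")
```

A value close to 1 would suggest that reading time can stand in for an explicit rating, which is the premise behind using it as implicit feedback.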
The data described above can be collected through the WWW system at the server, at the client, or at a proxy. Each collection point has its own trade-offs:
  • server: data are easily collected
  • client: with special software installed, everything can be recorded
  • proxy: enabled by ticking the proxy option in a web browser (IE, Mozilla, Safari, Chrome, etc.); it speeds up access and logs usage, so data can be collected transparently with no special software, but the system may stall if the number of users grows large
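Server-side collection usually starts from an access log. The sketch below assumes the log is written in the Common Log Format and counts successful hits per URL. The sample lines, the `parse_line` and `hits_per_url` helper names, and the decision to count only status 200 are assumptions for illustration.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_url(lines):
    """Count successful (status 200) requests for each URL."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["status"] == "200":
            counts[record["url"]] += 1
    return counts


# Hypothetical log lines, not taken from a real server.
sample_log = [
    '192.0.2.1 - - [20/Dec/2011:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 1043',
    '192.0.2.1 - - [20/Dec/2011:10:00:12 +0900] "GET /news.html HTTP/1.1" 200 2310',
    '192.0.2.7 - - [20/Dec/2011:10:01:05 +0900] "GET /index.html HTTP/1.1" 200 1043',
    '192.0.2.7 - - [20/Dec/2011:10:01:09 +0900] "GET /missing.html HTTP/1.1" 404 210',
]

hits = hits_per_url(sample_log)
print(hits.most_common())
```

The same parsing step could feed a proxy log as well, since proxies often write a similar line-per-request format.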
Besides the WWW system, data can also be obtained from other devices, such as:
  • eye tracking device → sensor/module to capture eye movement
  • mouse → user interaction capture module
  • display → screen capture module

How to use data
After the data are collected, the next step is to use them to achieve goals such as evaluating and analyzing user characteristics. Machine learning and data mining algorithms are used to exploit the data. These algorithms include, but are not limited to:
  • collaborative filtering
  • clustering
  • support vector machine
  • bayesian network
  • neural network
  • association rule mining

These algorithms are among the most commonly used by the community. For example, collaborative filtering is illustrated in Fig 1.


Fig 1. Collaborative filtering: 5 should be recommended to A

As shown in Fig 1, if one group (group A) is highly similar to another group (group B), then what group B has done should be recommended to group A. This makes it possible to improve the service offered to group A and, in turn, to gain benefits.
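The idea in Fig 1 can be sketched in a few lines. The example below is a minimal user-based variant that measures similarity with the Jaccard index over sets of accessed items. The choice of Jaccard similarity, the `recommend` function, and the sample histories are assumptions for illustration, arranged so that item 5 comes out as the recommendation for A.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Recommend items the target has not seen, weighted by neighbour similarity."""
    seen = histories[target]
    scores = {}
    for other, items in histories.items():
        if other == target:
            continue
        similarity = jaccard(seen, items)
        # Every unseen item inherits the similarity of the neighbour who used it.
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical access histories: A and B overlap heavily, C does not.
histories = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4, 5},
    "C": {6, 7},
}

recommendations = recommend("A", histories)
print(recommendations)
```

Because C shares nothing with A, its items receive a score of zero and are filtered out, so only the item from the similar neighbour B is suggested.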

Researchers have carried out concrete studies. Examples of previous, current, and future studies are as follows:
  • There are many studies using web browsing data
  • For retrieval, some form of filtering should be proposed
  • web page recommendation
  • mining navigation history for recommendation
  • histories of accessed pages are collected
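Mining navigation history for recommendation can be sketched as counting which page tends to follow which. The example below is one simple approach under stated assumptions: sessions are already split per visit, and only the immediately preceding page is considered. The session data and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def recommend_next(transitions, page, top_n=2):
    """Return the pages most often visited right after the given page."""
    counts = transitions.get(page)
    if not counts:
        return []
    return [next_page for next_page, _ in counts.most_common(top_n)]


# Hypothetical navigation histories, one list of pages per session.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

transitions = build_transitions(sessions)
next_pages = recommend_next(transitions, "/products")
print(next_pages)
```

Longer histories or association rule mining would capture richer patterns; the first-order count is only the smallest useful starting point.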
For web search studies, the following techniques and factors are important:
  • implicit user modeling for personalized search (CIKM, 2005)
  • A tourist and a programmer may use “java” to search for different information
  • Basic user actions are considered
  • submitting a keyword query
  • gaze position from mouse movement

Examples of these techniques are shown in the following screenshots.
For picture data, tag recommendation is used, based on the following points:
  • a large number of photos are taken and shared
  • on Flickr, users can upload photos and apply tags
  • selecting appropriate words
A picture or image should be accompanied by recommended keyword tags. Examples of keyword tag recommendations are shown in the last two figures. The first figure is a football stadium called diamond stadium, so the tags are: diamond stadium, sports game, world cup, football. The second is a picture of Tokyo Tower at night, so the tags should be: city, Tokyo, trip, tower, night, Japan.
Tag: diamond stadium, sports game, world cup, football

Tag: city, Tokyo, trip, tower, Skytree, Japan
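One simple way to produce tag recommendations like those above is to suggest tags that frequently co-occur with the tags a user has already entered. The sketch below assumes this co-occurrence approach; it is not necessarily the method used by Flickr or by the studies mentioned, and the photo collection is made up for the example.

```python
from collections import Counter


def build_cooccurrence(tagged_photos):
    """For every tag, count how often each other tag appears on the same photo."""
    cooccurrence = {}
    for tags in tagged_photos:
        for tag in tags:
            for other in tags:
                if other != tag:
                    cooccurrence.setdefault(tag, Counter())[other] += 1
    return cooccurrence


def recommend_tags(cooccurrence, given, top_n=3):
    """Suggest tags that most often accompany the tags already given."""
    scores = Counter()
    for tag in given:
        for other, count in cooccurrence.get(tag, {}).items():
            if other not in given:
                scores[other] += count
    # Sort by descending count, then alphabetically, to keep the result stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Hypothetical collection of already-tagged photos.
tagged_photos = [
    ["Tokyo", "tower", "night", "Japan"],
    ["Tokyo", "Japan", "trip"],
    ["Tokyo", "tower", "city"],
    ["football", "stadium", "world cup"],
]

cooccurrence = build_cooccurrence(tagged_photos)
suggested = recommend_tags(cooccurrence, ["Tokyo"])
print(suggested)
```

A user who types only "Tokyo" would then be offered the tags that other photos most often pair with it, which reduces the effort of selecting appropriate words.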

Conclusion
Knowing the user's interests makes it possible to provide better service, and better service means greater benefits. To achieve this goal, web browsing behavior is observed, analyzed, and exploited with various algorithms.

Several directions remain for future studies. Examples are:
  • What data should be collected
  • Many issues remain open for investigation

    by Bagus Tris Atmaja - 114D9818
    Written as homework for the Current Science and Technology Class
    Graduate School of Science and Technology, Kumamoto University
    Japan 2011