Tag Based Simple Correlation

Earlier approaches to computing how closely two pieces of content are related required first building something like a search engine: word segmentation, indexing, and then a statistical comparison of TF/IDF values to judge the relevance of the contents. A good reference on this technique is the article Beauty of Number, written by Mr. Wu Jun, a researcher at Google.

Tags are an excellent complement to this approach. They shift keyword extraction from a machine (robot) to a person describing the content, and keywords chosen by people are generally more accurate and more applicable than those extracted by a machine.

In a simple system, the tag feature is often implemented with three tables: a content table, a tag table, and a relation table. For example, take a content table named Blogs, a tag table named Tags, and a relation table named Marks. Both Blogs and Tags have a primary key id, and Marks links the two through its blog_id and tag_id columns:

<pre class=brush:sql> CREATE TABLE `blogs` (
  `id` int(10) auto_increment,
  `title` varchar(255),
  PRIMARY KEY (`id`)
);

CREATE TABLE `tags` (
  `id` int(10) auto_increment,
  `tag` varchar(100),
  `count` int(10),
  PRIMARY KEY (`id`)
);

CREATE TABLE `marks` (
  `tag_id` int(10),
  `blog_id` int(10),
  PRIMARY KEY (`tag_id`, `blog_id`)
); </pre>

In such a system, to find the content related to article A, we first retrieve all the tags of that article and then retrieve the other articles associated with those tags. For instance, to find the blogs related to the blog with id 1, we can <a href="http://en.wikipedia.org/wiki/Join_(SQL)">left join</a> the Marks table with itself:

<pre class=brush:sql> SELECT B.blog_id, B.tag_id
FROM marks AS A
LEFT JOIN marks AS B ON A.tag_id = B.tag_id
WHERE A.blog_id = 1 AND B.blog_id != 1 </pre>

The blog_id column of the result lists every blog that shares at least one tag with the blog whose id is 1. A blog that shares several tags appears once per shared tag.
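The self-join above can be tried on a few rows. The following is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL; the helper name `related_blogs` and the sample rows are hypothetical and only illustrate what the query returns.

```python
import sqlite3


def related_blogs(conn, blog_id):
    """Return (blog_id, tag_id) pairs for blogs sharing a tag with blog_id."""
    sql = """
        SELECT B.blog_id, B.tag_id
        FROM marks AS A
        LEFT JOIN marks AS B ON A.tag_id = B.tag_id
        WHERE A.blog_id = ? AND B.blog_id != ?
        ORDER BY B.blog_id, B.tag_id
    """
    return conn.execute(sql, (blog_id, blog_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE marks (tag_id INTEGER, blog_id INTEGER, "
    "PRIMARY KEY (tag_id, blog_id))"
)

# Hypothetical sample data as (tag_id, blog_id) pairs:
# blog 1 has tags 1 and 2, blog 2 has tags 1 and 2,
# blog 3 has tags 2 and 3, blog 4 has only tag 3.
conn.executemany(
    "INSERT INTO marks (tag_id, blog_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4)],
)

# Blog 2 shares two tags with blog 1, so it appears twice; blog 4 shares none.
print(related_blogs(conn, 1))  # [(2, 1), (2, 2), (3, 2)]
```

Because the WHERE clause filters on B.blog_id, rows without a match are dropped, so on this data the left join returns the same rows an inner join would.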

Of course, this approach has some disadvantages. First of all, it treats all tags as equally important, whereas in practice each tag carries its own weight. For instance, in EASTHV's internal tech blog, 12 articles are tagged "Javascript" and 2 articles are tagged "Apache". If every article had only one tag, the articles tagged "Apache" would obviously be more closely related to each other than the articles tagged "Javascript", because the rarer tag is more specific.

We can therefore use the concept of TF/IDF (specifically its inverse document frequency term) to calculate the weight of a tag. If a blog has 100 articles, the weight of the tag "Javascript" is ln(100/12) = 2.120 and the weight of the tag "Apache" is ln(100/2) = 3.912. Expressed in the SQL query:

<pre class=brush:sql> SELECT B.blog_id, B.tag_id, LOG( 100 / C.count ) AS weight
FROM marks AS A
LEFT JOIN marks AS B ON A.tag_id = B.tag_id
LEFT JOIN tags AS C ON B.tag_id = C.id
WHERE A.blog_id = 1 AND B.blog_id != 1 </pre>

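We get the weight of each tag by joining the Tags table and using its count column (the number of articles carrying that tag). In MySQL, LOG() with a single argument returns the natural logarithm, which matches the Ln values above.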
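The two weights quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `tag_weight` is hypothetical, and the figures (100 articles, 12 and 2 tagged articles) are the ones from the example.

```python
import math


def tag_weight(total_articles, tagged_articles):
    """Weight of a tag: natural log of (all articles / articles with the tag)."""
    return math.log(total_articles / tagged_articles)


print(f"{tag_weight(100, 12):.3f}")  # Javascript: 2.120
print(f"{tag_weight(100, 2):.3f}")   # Apache: 3.912
```

The rarer a tag is, the larger the ratio and therefore the larger its weight; a tag carried by every article would get a weight of 0.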

The next question is that an article usually has more than one tag. For example:

Article A has the tags "Javascript", "Apache", "Css", "Ajax"; article B has the tags "Javascript", "Apache", "Google"; article C has the tags "Json", "Apache", "XML".

If all tags had the same weight, the correlations would obviously rank as AB > AC, because A and B share two tags while A and C share only one.
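With equal weights, the comparison reduces to counting shared tags. A minimal sketch using Python sets, with the tag lists of articles A, B and C taken from the example above (the helper name `shared_tag_count` is hypothetical):

```python
# Tags of the three example articles.
article_tags = {
    "A": {"Javascript", "Apache", "Css", "Ajax"},
    "B": {"Javascript", "Apache", "Google"},
    "C": {"Json", "Apache", "XML"},
}


def shared_tag_count(x, y):
    """Number of tags two articles have in common (every tag counts as 1)."""
    return len(article_tags[x] & article_tags[y])


print(shared_tag_count("A", "B"))  # 2 (Javascript, Apache)
print(shared_tag_count("A", "C"))  # 1 (Apache)
```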

Applying this in the SQL join query, with the tag weights taken into account, requires only a few changes:

<pre class=brush:sql> -- B.tag_id is left out of the SELECT list: it is not in the GROUP BY,
-- and MySQL rejects such columns when ONLY_FULL_GROUP_BY is enabled.
SELECT B.blog_id, SUM( LOG( 100 / C.count ) ) AS weight
FROM marks AS A
LEFT JOIN marks AS B ON A.tag_id = B.tag_id
LEFT JOIN tags AS C ON B.tag_id = C.id
WHERE A.blog_id = 1 AND B.blog_id != 1
GROUP BY B.blog_id
ORDER BY weight DESC </pre>

Here we use the SUM function together with a GROUP BY clause, so the weights of all shared tags are added up for each blog.

Thus a single SQL query returns the related blogs sorted by correlation, from the most related to the least.
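The ranked query can be exercised end to end on articles A, B and C from the example. The following is a sketch under several assumptions: it uses Python's sqlite3 module instead of MySQL; the ids 1, 2, 3 stand for A, B, C; the counts for "Javascript" (12) and "Apache" (2) come from the text, while the other counts are made up; and a natural-log function named LN is registered by hand because SQLite's own log(), where available, is base 10.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a natural-log SQL function to mirror MySQL's single-argument LOG().
conn.create_function("LN", 1, math.log)

conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT, `count` INTEGER);
    CREATE TABLE marks (tag_id INTEGER, blog_id INTEGER,
                        PRIMARY KEY (tag_id, blog_id));
""")

# Counts for Javascript and Apache are from the text; the rest are hypothetical.
conn.executemany(
    "INSERT INTO tags (id, tag, `count`) VALUES (?, ?, ?)",
    [(1, "Javascript", 12), (2, "Apache", 2), (3, "Css", 5), (4, "Ajax", 4),
     (5, "Google", 3), (6, "Json", 6), (7, "XML", 8)],
)

# Blog 1 = article A, blog 2 = article B, blog 3 = article C.
blog_tags = {1: [1, 2, 3, 4], 2: [1, 2, 5], 3: [6, 2, 7]}
conn.executemany(
    "INSERT INTO marks (tag_id, blog_id) VALUES (?, ?)",
    [(tag_id, blog_id) for blog_id, ids in blog_tags.items() for tag_id in ids],
)


def rank_related(conn, blog_id, total_articles=100):
    """Return (blog_id, weight) rows, most related first."""
    # "* 1.0" forces floating-point division; SQLite would otherwise
    # truncate 100 / 12 to 8, unlike MySQL.
    sql = """
        SELECT B.blog_id, SUM(LN(? * 1.0 / C.`count`)) AS weight
        FROM marks AS A
        LEFT JOIN marks AS B ON A.tag_id = B.tag_id
        LEFT JOIN tags AS C ON B.tag_id = C.id
        WHERE A.blog_id = ? AND B.blog_id != ?
        GROUP BY B.blog_id
        ORDER BY weight DESC
    """
    return conn.execute(sql, (total_articles, blog_id, blog_id)).fetchall()


ranking = rank_related(conn, 1)
for related_id, weight in ranking:
    print(related_id, f"{weight:.3f}")
# 2 6.032   (B shares Javascript and Apache: 2.120 + 3.912)
# 3 3.912   (C shares only Apache)
```

In this sample the weighted order matches the equal-weight order (B before C), but the two can differ: a single shared rare tag can outweigh several shared common ones.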
