Source Data
As input for a tag cloud, you need a dataset consisting of at least three columns:
- Text (to display).
- Weight (to determine the font size).
- Identifier (something to support navigation).
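As a minimal sketch, such a dataset could be held in a DataTable like the one below. The column names ID, Text, and Count follow the listing later in this section; the helper name CreateTagTable and the sample rows are invented for illustration. The weight column is typed as Double here so that fractional weights also fit.

```vbnet
Imports System
Imports System.Data

Module TagCloudSample
    ' Hypothetical helper: builds an empty table with the three
    ' columns a tag cloud needs.
    Public Function CreateTagTable() As DataTable
        Dim table As New DataTable("Tags")
        ' Identifier, used to support navigation.
        table.Columns.Add("ID", GetType(Integer))
        ' Text to display.
        table.Columns.Add("Text", GetType(String))
        ' Weight, used to determine the font size.
        table.Columns.Add("Count", GetType(Double))
        Return table
    End Function

    Sub Main()
        Dim table As DataTable = CreateTagTable()
        ' Sample rows, made up for illustration only.
        table.Rows.Add(1, "Austen", 42.0)
        table.Rows.Add(2, "Dickens", 17.5)
        For Each row As DataRow In table.Rows
            Console.WriteLine("{0}: {1} ({2})", _
                row("ID"), row("Text"), row("Count"))
        Next
    End Sub
End Module
```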
The weight in the dataset often represents some frequency: the number of times a text is used as a search term, or the number of items sold of a product. However, the weight is not always an integer value. You can also consider, say, election results consisting of political parties and their percentages, earthquakes and their intensity, or movie stars and their IQ. In fact, there is technically little difference between tag clouds, histograms, line graphs, and pie charts. (I wouldn't be surprised to find the tag cloud as just another type of standard chart in Excel 2010.)
While constructing the source data for a tag cloud, you can impose restrictions on the raw data in the system in three ways:
- As with other graphs, there is a limit to the density of information in a tag cloud. According to Wikipedia, tag clouds generally contain between 30 and 150 tags. Usability clearly sets an upper limit on the number of tags. Moreover, the page layout can restrict the space available for the tag cloud. It is therefore necessary to take into account an imposed maximum length for the dataset.
- Some texts may not be interesting to users and should be omitted from the tag cloud. This is the case for articles and other small words that are considered to be "noise" by search algorithms. If there are tags such as these in your data, you might want to filter the results.
- Many tag clouds present information calculated over a period of time, such as the number of times that search terms have been used in the last 24 hours. Depending on the data, your function may contain extra parameters with which you restrict the aggregation of data to a (progressive) subset.
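The three restrictions above can be sketched in memory with LINQ, assuming the raw data is a sequence of individual search-term uses. Everything named here is hypothetical: the TermUse class, the BuildTags function, and the short noise-word list. The date window is treated as half-open (fromDate inclusive, toDate exclusive), which is an assumption rather than something the text prescribes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module TagCloudFilter
    ' Hypothetical raw record: one use of a search term at a point in time.
    Public Class TermUse
        Public Text As String
        Public UsedAt As DateTime
    End Class

    ' Illustrative noise list; a real one would come from the search
    ' engine or from configuration.
    Private ReadOnly NoiseWords As New HashSet(Of String)( _
        New String() {"a", "an", "the", "of", "and"}, _
        StringComparer.OrdinalIgnoreCase)

    ' Steps: restrict to the time window, drop noise words if requested,
    ' count uses per text, keep the maxCount most frequent texts,
    ' and finally sort the survivors alphabetically.
    Public Function BuildTags(ByVal uses As IEnumerable(Of TermUse), _
        ByVal maxCount As Integer, ByVal ignoreNoise As Boolean, _
        ByVal fromDate As DateTime, ByVal toDate As DateTime) _
        As List(Of KeyValuePair(Of String, Integer))
        Return uses _
            .Where(Function(u) u.UsedAt >= fromDate AndAlso u.UsedAt < toDate) _
            .Where(Function(u) Not (ignoreNoise AndAlso NoiseWords.Contains(u.Text))) _
            .GroupBy(Function(u) u.Text, StringComparer.OrdinalIgnoreCase) _
            .Select(Function(g) New KeyValuePair(Of String, Integer)(g.Key, g.Count())) _
            .OrderByDescending(Function(p) p.Value) _
            .Take(maxCount) _
            .OrderBy(Function(p) p.Key, StringComparer.OrdinalIgnoreCase) _
            .ToList()
    End Function
End Module
```

Calling BuildTags(uses, 50, True, DateTime.Now.AddHours(-24), DateTime.Now) would then return at most 50 tags for the last 24 hours. In practice, the same filtering is usually better done in the database query, so that only the final rows travel to the application.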
Eventually, you will create one or more functions that resemble Listing One. Your architecture for data access is hopefully more sophisticated than this simple example. But if you separate the construction of source data from the remaining functional layers of the tag cloud, then you already have a better design than the average tag cloud example found on the Internet.
Public Function GetWriters(ByVal maxCount As Integer, _
    ByVal ignoreNoise As Boolean, ByVal fromDate As DateTime, _
    ByVal toDate As DateTime) As DataTable
    Dim query As String = String.Format( _
        "SELECT * FROM (SELECT TOP {0} ID, Text, " & _
        "Count FROM Writers ORDER BY Count DESC) sub " & _
        "ORDER BY Text ASC", maxCount)
    'TODO: also filter on ignoreNoise, fromDate and toDate
    Dim adapter As New SqlDataAdapter(query, _ConnectionString)
    Dim table As New DataTable
    adapter.Fill(table)
    Return table
End Function
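One way the TODO in the listing might be filled in is sketched below, using SQL parameters instead of string formatting. The schema is an assumption made purely for illustration: a Sales table with WriterID and SoldOn columns, and an IsNoise bit column on Writers, do not appear in the original example. The sketch also assumes SQL Server 2005 or later (for TOP with a parameter) and the System.Data.SqlClient provider.

```vbnet
Imports System
Imports System.Data
Imports System.Data.SqlClient

Public Class WriterData
    Private ReadOnly _ConnectionString As String

    Public Sub New(ByVal connectionString As String)
        _ConnectionString = connectionString
    End Sub

    ' Sketch only: Sales(WriterID, SoldOn) and Writers.IsNoise are
    ' hypothetical and stand in for whatever the real schema provides.
    Public Function GetWriters(ByVal maxCount As Integer, _
        ByVal ignoreNoise As Boolean, ByVal fromDate As DateTime, _
        ByVal toDate As DateTime) As DataTable
        ' Inner query: count sales per writer inside the date window and
        ' keep the top maxCount. Outer query: sort those alphabetically.
        Dim query As String = _
            "SELECT * FROM (SELECT TOP (@maxCount) w.ID, w.Text, " & _
            "COUNT(*) AS [Count] FROM Writers w " & _
            "JOIN Sales s ON s.WriterID = w.ID " & _
            "WHERE s.SoldOn >= @fromDate AND s.SoldOn < @toDate " & _
            "AND (@ignoreNoise = 0 OR w.IsNoise = 0) " & _
            "GROUP BY w.ID, w.Text ORDER BY COUNT(*) DESC) sub " & _
            "ORDER BY Text ASC"
        Dim table As New DataTable
        Using adapter As New SqlDataAdapter(query, _ConnectionString)
            With adapter.SelectCommand.Parameters
                .Add("@maxCount", SqlDbType.Int).Value = maxCount
                .Add("@ignoreNoise", SqlDbType.Bit).Value = ignoreNoise
                .Add("@fromDate", SqlDbType.DateTime).Value = fromDate
                .Add("@toDate", SqlDbType.DateTime).Value = toDate
            End With
            ' Fill opens and closes the connection itself.
            adapter.Fill(table)
        End Using
        Return table
    End Function
End Class
```

Passing the values as parameters keeps the query text constant and leaves type conversion to the provider; whether the aggregation belongs in a join like this or in a precomputed Count column depends on the actual data.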