<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Cleaning up a set of tags, part 1</title>
	<atom:link href="http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/</link>
	<description></description>
	<lastBuildDate>Wed, 10 Feb 2010 18:19:37 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: david</title>
		<link>http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/comment-page-1/#comment-591</link>
		<dc:creator>david</dc:creator>
		<pubDate>Wed, 28 Jan 2009 17:09:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.drmaciver.com/?p=373#comment-591</guid>
		<description>Also I&#039;m glad you&#039;re finding it interesting. :-) I don&#039;t expect that much of what I&#039;m doing will see use on these sites - it&#039;s not really for that. It&#039;s more for building data analysis on top of a noisy tag set.</description>
		<content:encoded><![CDATA[<p>Also I&#8217;m glad you&#8217;re finding it interesting. :-) I don&#8217;t expect that much of what I&#8217;m doing will see use on these sites &#8211; it&#8217;s not really for that. It&#8217;s more for building data analysis on top of a noisy tag set.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/comment-page-1/#comment-590</link>
		<dc:creator>david</dc:creator>
		<pubDate>Wed, 28 Jan 2009 17:08:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.drmaciver.com/?p=373#comment-590</guid>
		<description>It&#039;s true that stripping off pluralisation can change the meaning of the tag - my expectation is that these cases will be sufficiently rare compared to the number of cases where it helps that I can live with the slight loss of useful information. Generally the meaning will be preserved closely enough that it&#039;s tolerable.

I&#039;m going to do some analysis on usage later when I try to clean up things further, and when I do I&#039;ll see if I can spot any cases where it&#039;s actually breaking things.</description>
		<content:encoded><![CDATA[<p>It&#8217;s true that stripping off pluralisation can change the meaning of the tag &#8211; my expectation is that these cases will be sufficiently rare compared to the number of cases where it helps that I can live with the slight loss of useful information. Generally the meaning will be preserved closely enough that it&#8217;s tolerable.</p>
<p>I&#8217;m going to do some analysis on usage later when I try to clean up things further, and when I do I&#8217;ll see if I can spot any cases where it&#8217;s actually breaking things.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joshua Drake</title>
		<link>http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/comment-page-1/#comment-589</link>
		<dc:creator>Joshua Drake</dc:creator>
		<pubDate>Wed, 28 Jan 2009 17:00:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.drmaciver.com/?p=373#comment-589</guid>
		<description>One comment on pluralization. Although it sounds like this did not apply to the data set you normalized, it is possible that Statistic applies to a single fact or piece of information, while Statistics refers to the field, and there may be additional cases where the plural and singular are usefully different entities.

That said, I appreciate your work on this, especially the details given on your approach. I hope that because of work like yours, sites will consider cleaning up their tags. Only two sites I can think of attempt to control their tags to any degree: Amazon, which presents previously used tags, and Stack Overflow, which handles this with tag auto-completion and a reputation requirement for creating new tags.</description>
		<content:encoded><![CDATA[<p>One comment on pluralization. Although it sounds like this did not apply to the data set you normalized, it is possible that Statistic applies to a single fact or piece of information, while Statistics refers to the field, and there may be additional cases where the plural and singular are usefully different entities.</p>
<p>That said, I appreciate your work on this, especially the details given on your approach. I hope that because of work like yours, sites will consider cleaning up their tags. Only two sites I can think of attempt to control their tags to any degree: Amazon, which presents previously used tags, and Stack Overflow, which handles this with tag auto-completion and a reputation requirement for creating new tags.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/comment-page-1/#comment-588</link>
		<dc:creator>david</dc:creator>
		<pubDate>Wed, 28 Jan 2009 06:39:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.drmaciver.com/?p=373#comment-588</guid>
		<description>I don&#039;t mind at all. :-) It was an interesting read. Thanks.

I will, however, probably continue using Ruby for these tasks. In particular, I think you&#039;re going to be completely unable to port the second one to awk, because that&#039;s where it stops being remotely line-oriented and where I start making use of more general-purpose libraries. I could start out with awk and switch to Ruby when that happens, but frankly I&#039;m much more comfortable keeping it in the same language throughout.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t mind at all. :-) It was an interesting read. Thanks.</p>
<p>I will, however, probably continue using Ruby for these tasks. In particular, I think you&#8217;re going to be completely unable to port the second one to awk, because that&#8217;s where it stops being remotely line-oriented and where I start making use of more general-purpose libraries. I could start out with awk and switch to Ruby when that happens, but frankly I&#8217;m much more comfortable keeping it in the same language throughout.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Porges</title>
		<link>http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/comment-page-1/#comment-587</link>
		<dc:creator>Porges</dc:creator>
		<pubDate>Wed, 28 Jan 2009 03:28:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.drmaciver.com/?p=373#comment-587</guid>
		<description>As you can see from the pingback I&#039;ve written a reply semi-tutorial on how to use Awk to do the same task. Hope you don&#039;t mind :)</description>
		<content:encoded><![CDATA[<p>As you can see from the pingback I&#8217;ve written a reply semi-tutorial on how to use Awk to do the same task. Hope you don&#8217;t mind :)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
