<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David R. MacIver &#187; clustering</title>
	<atom:link href="http://www.drmaciver.com/tag/clustering/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.drmaciver.com</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 11:12:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Determining logical project structure from commit logs</title>
		<link>http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs/</link>
		<comments>http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 16:35:04 +0000</pubDate>
		<dc:creator>david</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[markov clustering]]></category>
		<category><![CDATA[scala]]></category>

		<guid isPermaLink="false">http://www.drmaciver.com/?p=505</guid>
		<description><![CDATA[In a bored 5 minutes at work I threw the following together: Logical source file groupings in the Scala repo The largest cluster is clearly noisy and random. I more or less expected that. But the small and medium ones often make a lot of sense. The basic technique is straightforward: We use a trivial [...]]]></description>
			<content:encoded><![CDATA[<p>In a bored 5 minutes at work I threw the following together: <a href="http://drmaciver.com/scala_logical_modules">Logical source file groupings in the Scala repo</a></p>
<p>The largest cluster is clearly noisy and random. I more or less expected that. But the small and medium ones often make a lot of sense.</p>
<p>The basic technique is straightforward: we use <a href="http://gist.github.com/103249">a trivial script</a> to scrape the SVN logs for the list of files changed in each commit. From these observations we calculate the <a href="http://github.com/DRMacIver/binary-pearsons/tree/master">binary Pearson correlation</a> between each pair of files, which gives a similarity measure between -1 and 1 (we throw away anything <= 0). We then use <a href="http://www.micans.org/mcl/">Markov clustering</a> to cluster the results into distinct groupings.</p>
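<p>As a rough illustration of the similarity step, here is a minimal Python sketch of the Pearson correlation for binary "changed / not changed" observations (also known as the phi coefficient). It assumes the commits have already been parsed into sets of file names; the function name <code>binary_pearson</code> and the data layout are made up for this example and are not the API of the linked binary-pearsons code.</p>

```python
from itertools import combinations
from math import sqrt


def binary_pearson(commits):
    """Map each positively correlated file pair to its phi coefficient.

    `commits` is a list of sets, one per commit, holding the changed files.
    (Hypothetical input format, chosen for this sketch.)
    """
    n = len(commits)
    changed = {}   # file -> number of commits that touch it
    together = {}  # (file_a, file_b) -> number of commits touching both
    for files in commits:
        for f in files:
            changed[f] = changed.get(f, 0) + 1
        for pair in combinations(sorted(files), 2):
            together[pair] = together.get(pair, 0) + 1

    # Pairs that never change together always score below zero,
    # so only co-occurring pairs need to be examined.
    weights = {}
    for (a, b), both in together.items():
        na, nb = changed[a], changed[b]
        denom = sqrt(na * (n - na) * nb * (n - nb))
        if denom == 0:
            continue  # a file changed in every commit carries no signal
        phi = (n * both - na * nb) / denom
        if phi > 0:  # throw away non-positive correlations
            weights[(a, b)] = phi
    return weights


example_commits = [{"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}]
example_weights = binary_pearson(example_commits)
```

<p>The resulting dictionary can be read as a weighted, undirected graph over files, which is the kind of input a graph clustering algorithm expects.</p>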
<p>The results are obviously far from perfect. But equally obviously there&#8217;s a lot of interesting information in them, and the technique could certainly be refined, e.g. by weighting each file by the size of its diff rather than a simple 0/1 changed flag, or by experimenting with other clustering algorithms. Maybe something worth pursuing?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>More kittens: Improving edge weight calculations</title>
		<link>http://www.drmaciver.com/2009/04/more-kittens-improving-edge-weight-calculations/</link>
		<comments>http://www.drmaciver.com/2009/04/more-kittens-improving-edge-weight-calculations/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 08:30:27 +0000</pubDate>
		<dc:creator>david</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[kittens]]></category>
		<category><![CDATA[markov clustering]]></category>

		<guid isPermaLink="false">http://www.drmaciver.com/?p=472</guid>
		<description><![CDATA[One of the problems with my old kitten clustering was that inside the black bits (text, the kitten, etc) the clustering descended into a chaotic rainbow of teeny tiny clusters. This turns out to be little to do with the clustering and more to do with the slightly moronic way I&#8217;d calculated edge weights, which was [...]]]></description>
			<content:encoded><![CDATA[<p>One of the problems with my old kitten clustering was that inside the black bits (text, the kitten, etc) the clustering descended into a chaotic rainbow of teeny tiny clusters. This turns out to have little to do with the clustering and more to do with the slightly moronic way I&#8217;d calculated edge weights, which was doing a bad job between dark colours. I&#8217;ve simplified the calculation, so the following should be more indicative of how Markov clustering clusters this sort of thing:</p>
<p><img class="aligncenter size-full wp-image-473" title="segmented2" src="http://www.drmaciver.com/wp-content/uploads/2009/04/segmented2.png" alt="segmented2" width="640" height="400" /></p>
<p>The clusters are still very small, but now recognisable rather than just dots.</p>
<p>In case you&#8217;re wondering, this isn&#8217;t going anywhere in particular. I&#8217;m just experimenting and figured I may as well do it in public.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.drmaciver.com/2009/04/more-kittens-improving-edge-weight-calculations/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Segmenting kittens: Experiments with clustering image contents</title>
		<link>http://www.drmaciver.com/2009/04/segmenting-kittens-experiments-with-clustering-image-contents/</link>
		<comments>http://www.drmaciver.com/2009/04/segmenting-kittens-experiments-with-clustering-image-contents/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 21:01:02 +0000</pubDate>
		<dc:creator>david</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[clustering]]></category>

		<guid isPermaLink="false">http://www.drmaciver.com/?p=467</guid>
		<description><![CDATA[I&#8217;ve been experimenting with using Markov Clustering at work. It&#8217;s a very nice algorithm for clustering certain classes of symmetric graphs. I had a vaguely interesting thought: What, thought I, if I took an image and built a graph out of its pixels? The edges would go between nearby pixels and would be weighted according [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been experimenting with using <a href="http://www.micans.org/mcl/">Markov Clustering</a> at work. It&#8217;s a very nice algorithm for clustering certain classes of symmetric graphs.</p>
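<p>For readers who have not met it, the core loop of Markov clustering can be sketched in a few lines of Python. This is a simplified, dense-matrix illustration of the expansion and inflation steps, not the MCL implementation linked above: the function names, the default inflation of 2, the fixed iteration count and the threshold <code>eps</code> are all assumptions made for the example.</p>

```python
def normalise_columns(m):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster a symmetric weighted adjacency matrix (list of lists).

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Add self-loops, then turn the weights into column-wise probabilities.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix, i.e. take two random-walk steps.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalise, which
        # strengthens strong connections and starves weak ones.
        m = normalise_columns([[x ** inflation for x in row] for row in m])
    # Each row that still holds mass describes one cluster: the columns
    # (nodes) that flow into it.
    clusters = {frozenset(j for j in range(n) if row[j] > eps) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Two triangles joined by one weak edge between nodes 2 and 3.
demo = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(demo))
```

<p>A real implementation would use sparse matrices and prune tiny entries as it goes; the dense version above is only practical for very small graphs.</p>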
<p>I had a vaguely interesting thought: What, thought I, if I took an image and built a graph out of its pixels? The edges would go between nearby pixels and would be weighted according to the similarity of their colours. We could then Markov cluster that and see what the results looked like.</p>
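<p>To make the construction concrete, here is a small Python sketch of one way to build such a graph. The details are assumptions for illustration only: "nearby" is taken to mean the four directly adjacent pixels, and the weight is one minus the normalised Euclidean distance between RGB values. The code linked at the end of this post may do either differently.</p>

```python
from math import sqrt


def pixel_graph(image):
    """Build weighted edges between 4-connected neighbouring pixels.

    `image` is a list of rows, each a list of (r, g, b) tuples in 0..255.
    Returns {((x1, y1), (x2, y2)): weight} with weights between 0 and 1,
    where 1 means identical colours.
    """
    max_dist = sqrt(3 * 255 ** 2)  # distance between black and white
    height, width = len(image), len(image[0])
    edges = {}
    for y in range(height):
        for x in range(width):
            # Looking only right and down visits every neighbouring
            # pair exactly once.
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx >= width or ny >= height:
                    continue
                dist = sqrt(sum((a - b) ** 2
                                for a, b in zip(image[y][x], image[ny][nx])))
                edges[((x, y), (nx, ny))] = 1.0 - dist / max_dist
    return edges


black, white = (0, 0, 0), (255, 255, 255)
tiny_image = [[black, black],
              [white, white]]
tiny_edges = pixel_graph(tiny_image)
```

<p>In the tiny example the two horizontal edges join identical colours and get weight 1, while the two vertical edges join black to white and get weight 0.</p>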
<p>Well&#8230; the results are amusing. There&#8217;s some interest in them, but mainly in what they say about how Markov clustering works. Behold, as we turn this:</p>
<p><img class="aligncenter size-full wp-image-468" title="kittens" src="http://www.drmaciver.com/wp-content/uploads/2009/04/kittens.jpg" alt="kittens" width="640" height="400" /></p>
<p>Into this:</p>
<p><img class="aligncenter size-full wp-image-470" title="segmented1" src="http://www.drmaciver.com/wp-content/uploads/2009/04/segmented1.png" alt="segmented1" width="640" height="400" /></p>
<p>Hmm. Back to the drawing board.</p>
<p>I&#8217;ve dumped the code <a href="http://github.com/DRMacIver/image-segmentation/tree/master">here</a> if you&#8217;re at all curious.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.drmaciver.com/2009/04/segmenting-kittens-experiments-with-clustering-image-contents/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

