<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David R. MacIver &#187; markov clustering</title>
	<atom:link href="http://www.drmaciver.com/tag/markov-clustering/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.drmaciver.com</link>
	<description></description>
	<lastBuildDate>Wed, 18 Aug 2010 13:56:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Determining logical project structure from commit logs</title>
		<link>http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs/</link>
		<comments>http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 16:35:04 +0000</pubDate>
		<dc:creator>david</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[markov clustering]]></category>
		<category><![CDATA[scala]]></category>

		<guid isPermaLink="false">http://www.drmaciver.com/?p=505</guid>
		<description><![CDATA[In a bored 5 minutes at work I threw the following together: Logical source file groupings in the Scala repo The largest cluster is clearly noisy and random. I more or less expected that. But the small and medium ones often make a lot of sense. The basic technique is straightforward: We use a trivial [...]]]></description>
			<content:encoded><![CDATA[<p>In a bored 5 minutes at work I threw the following together: <a href="http://drmaciver.com/scala_logical_modules">Logical source file groupings in the Scala repo</a></p>
<p>The largest cluster is clearly noisy and random. I more or less expected that. But the small and medium ones often make a lot of sense.</p>
<p>The basic technique is straightforward. We use <a href="http://gist.github.com/103249">a trivial script</a> to scrape the SVN logs for the list of files changed in each commit. From these observations we calculate the <a href="http://github.com/DRMacIver/binary-pearsons/tree/master">binary Pearson correlation</a> between each pair of files, which gives a measure of how similar the two files are (a number between -1 and 1, though we throw away anything <= 0). We then use <a href="http://www.micans.org/mcl/">Markov clustering</a> to cluster the results into distinct groupings.</p>
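<p>As a rough illustration of that pipeline, here is a minimal pure-Python sketch. It is not the code behind the linked results: it assumes each commit has already been parsed into a set of file names, computes the Pearson correlation of two 0/1 "changed in this commit" indicators (the phi coefficient), keeps only positive weights, and then runs a deliberately naive Markov clustering loop rather than the real <code>mcl</code> program. The function names, the inflation value of 2 and the fixed iteration count are all assumptions made for the example.</p>

```python
from itertools import combinations
from math import sqrt


def binary_pearson(commits, a, b):
    """Pearson correlation of two 0/1 'file changed in commit' indicators."""
    n = len(commits)
    n_a = sum(1 for files in commits if a in files)
    n_b = sum(1 for files in commits if b in files)
    n_ab = sum(1 for files in commits if a in files and b in files)
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        # One of the files changed in every commit or in none: no signal.
        return 0.0
    return (n * n_ab - n_a * n_b) / denom


def similarity_edges(commits):
    """Weighted edges between files, discarding non-positive correlations."""
    files = sorted(set().union(*commits))
    edges = {}
    for a, b in combinations(files, 2):
        weight = binary_pearson(commits, a, b)
        if weight > 0:
            edges[(a, b)] = weight
    return edges


def markov_cluster(files, edges, inflation=2.0, iterations=50):
    """Naive dense Markov clustering: alternate expansion and inflation."""
    n = len(files)
    index = {name: i for i, name in enumerate(files)}
    # Start from the weighted adjacency matrix with self-loops added.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), weight in edges.items():
        m[index[a]][index[b]] = weight
        m[index[b]][index[a]] = weight

    def normalise(matrix):
        # Rescale every column so that it sums to 1.
        sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
        return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]

    m = normalise(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two steps of the random walk).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalise columns.
        m = normalise([[x ** inflation for x in row] for row in m])

    # Each row that still carries weight describes one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(files[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

<p>Calling <code>similarity_edges</code> on the parsed commits and passing the result to <code>markov_cluster</code> gives a set of file groupings. The dense matrix makes this cubic in the number of files per iteration, so it is only suitable for toy inputs; a real repository needs a sparse implementation.</p>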
<p>The results are obviously far from perfect. But equally obviously there&#8217;s a lot of interesting information in them, and the technique could certainly be refined, e.g. by using the size of the diff on each file rather than a simple 0/1 &#8220;changed&#8221; flag, or by experimenting with other clustering algorithms. Maybe something worth pursuing?</p>
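<p>To make the diff-size idea concrete, here is a small hypothetical sketch. It assumes each commit has been parsed into a dict mapping file name to number of lines changed (a format invented for this example, not something the scraping script produces), and it simply swaps the 0/1 indicator for the ordinary Pearson correlation over those sizes. Whether this actually gives better clusters is untested.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A file whose diff size never varies carries no signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def diff_size_similarity(commits, a, b):
    """Correlate lines changed per commit; untouched files count as 0."""
    xs = [sizes.get(a, 0) for sizes in commits]
    ys = [sizes.get(b, 0) for sizes in commits]
    return pearson(xs, ys)
```

<p>With sizes restricted to 0 and 1 this reduces to the binary measure, so it can be dropped in as an alternative edge weight and compared directly.</p>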
]]></content:encoded>
			<wfw:commentRss>http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>More kittens: Improving edge weight calculations</title>
		<link>http://www.drmaciver.com/2009/04/more-kittens-improving-edge-weight-calculations/</link>
		<comments>http://www.drmaciver.com/2009/04/more-kittens-improving-edge-weight-calculations/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 08:30:27 +0000</pubDate>
		<dc:creator>david</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[kittens]]></category>
		<category><![CDATA[markov clustering]]></category>

		<guid isPermaLink="false">http://www.drmaciver.com/?p=472</guid>
		<description><![CDATA[One of the problems with my old kitten clustering was that inside the black bits (text, the kitten, etc) the clustering descended into a chaotic rainbow of teeny tiny clusters. This turns out to have little to do with the clustering and more to do with the slightly moronic way I&#8217;d calculated edge weights, which was [...]]]></description>
			<content:encoded><![CDATA[<p>One of the problems with my old kitten clustering was that inside the black bits (text, the kitten, etc) the clustering descended into a chaotic rainbow of teeny tiny clusters. This turns out to have little to do with the clustering and more to do with the slightly moronic way I&#8217;d calculated edge weights, which did a bad job of distinguishing between dark colours. I&#8217;ve simplified the calculation, so the following should be more indicative of how Markov clustering handles this sort of thing:</p>
<p><img class="aligncenter size-full wp-image-473" title="segmented2" src="http://www.drmaciver.com/wp-content/uploads/2009/04/segmented2.png" alt="segmented2" width="640" height="400" /></p>
<p>The clusters are still very small, but now recognisable rather than just dots.</p>
<p>In case you&#8217;re wondering, this isn&#8217;t going anywhere in particular. I&#8217;m just experimenting and figured I may as well do it in public.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.drmaciver.com/2009/04/more-kittens-improving-edge-weight-calculations/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
