Stable serialization and cryptographic hashing for tracking seen objects

This is a clever trick I figured out yesterday. It’s in retrospect moderately obvious, but I’ve never seen it before and it’s taken Hypothesis example tracking down from an expected space complexity of O(mn) and time complexity of O(god I don’t even know) to a deterministic O(n) space complexity and an O(mn) time complexity. It’s also vastly improved the constant factors by moving a bunch of stuff from pure python to native code.

(Actually there’s a bit of a wrinkle in that it’s not strictly deterministic because n is a random function of the number of examples you’re looking for based on the duplicate example rate, but in practice it’s rarely more than a few times larger than the number of examples).

First: The problem. Due to the way its data is generated, Hypothesis has a decent chance of producing the same example more than once (if it didn’t there would be a lot of other interesting examples it struggled to produce). Rather than run the same example multiple times, Hypothesis keeps a record of which examples it’s already tried and doesn’t try them again. This is also used during simplification to avoid loops in the simplify graph (it’s a lot easier to write simplifies that might occasionally have loops and let Hypothesis sort that out than it is for every simplify method to do a lot of careful bookkeeping).

Previously I was using the obvious data structure to track which examples we’ve already seen: A hash table. Stick the object in a hash table when it’s seen, check in the hash table to see if we’ve already seen it.

There are two problems with this. One is intrinsic, the other implementation specific.

The intrinsic one is that we’re keeping all these objects around! That uses up a lot of memory. Some of these objects may be large nested lists of things, and even if they aren’t, Python is not exactly good at compact object representation.

The second problem is that Python’s default equality and hashing are completely unsuitable for this. We don’t want to consider {1} and frozenset({1.0}) equal for these purposes, and we do want to consider nan and nan equal for these purposes. Additionally we need to be able to track things like lists which aren’t hashable. So this means we need our own hashing and equality logic. This was written in pure python and thus was very slow when dealing with large objects.
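To make the mismatch concrete, here is a small check of how Python’s built-in equality and hashing behave. This is plain standard-library behaviour rather than anything Hypothesis-specific, and the variable names are only for illustration:

```python
import math

# Default equality treats these as the same value, although they differ in
# both container type and element type.
assert {1} == frozenset({1.0})
assert hash(1) == hash(1.0)

# Two separately produced NaNs never compare equal...
nan_a = float("nan")
nan_b = float("inf") - float("inf")
assert math.isnan(nan_a) and math.isnan(nan_b)
assert nan_a != nan_b

# ...and a list cannot be stored in a set (or used as a dict key) at all.
try:
    seen = {[1, 2, 3]}
except TypeError as error:
    unhashable_message = str(error)
```

So a plain `set` or `dict` of examples both merges things we want kept apart and rejects or duplicates things we want tracked once.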

This is very sub-optimal, but it’s not clear how I could have done it better when tracking arbitrary python objects.

…which is the clue. These days Hypothesis doesn’t need to track arbitrary Python objects. It tracks templates, which I can require to have a much more specific type than the objects tracked. In particular I can require them to be marshallable.

In the end I decided on a slightly laxer requirement because marshalling doesn’t handle certain types I wanted to be able to use (specifically namedtuple instances). Every template needs to be either:

  1. Marshallable
  2. An iterable type containing other templates

Combining these by flattening out collections into tuples + a type tag and then marshalling the whole lot lets us generate a binary string for every template value such that if two binary strings are equal then the templates are equivalent (and generally speaking the converse too, though there are some special cases where that might not quite work if the iteration order is unstable, and we just accept the slight duplication).
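A minimal sketch of that flattening step, assuming templates are built only from a handful of leaf types plus iterables of other templates. This is not the actual Hypothesis code: `LEAF_TYPES`, `flatten`, `serialize` and `Point` are names made up for the illustration, and pinning marshal format version 2 is my own choice here, on the assumption that the older format does not emit object-reference markers and so depends only on the values.

```python
import marshal
from collections import namedtuple

# Leaf types that marshal can serialize directly (an assumed list).
LEAF_TYPES = (type(None), bool, int, float, complex, str, bytes)


def flatten(template):
    """Convert a template into leaves and plain tuples led by a type tag."""
    if type(template) in LEAF_TYPES:
        return template
    kind = type(template)
    tag = kind.__module__ + "." + kind.__qualname__
    # Anything else must be an iterable of templates, e.g. a list, tuple,
    # frozenset or namedtuple instance; non-iterables raise TypeError here.
    return (tag,) + tuple(flatten(child) for child in template)


def serialize(template):
    """Return a byte string that is equal for equivalent templates."""
    return marshal.dumps(flatten(template), 2)


# A namedtuple is not marshallable on its own, but flattens fine.
Point = namedtuple("Point", ["x", "y"])
point_bytes = serialize(Point(1, 2))
```

With this, `serialize([1, 2])` and `serialize((1, 2))` differ because of the type tag, as do `serialize({1})` and `serialize(frozenset({1.0}))`, while two separately built `Point(1, 2)` values serialize identically.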

This is already a pretty great saving: We have native equality and hashing which does what we want it to do, and the representation is so much more compact than the Python object representation.

But we can do better! Sometimes (often) the strings we get here will be quite long. When a string is at least twenty bytes long we replace it with its sha1 hash. This gives us yet another space saving (in theory it also gives us a time saving, but the hash for binary strings is quite good and with random data equality will tend to return False early, so in practice I don’t expect equality comparisons between long strings would have been a noticeable time cost).
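Roughly, the tracking side then looks like the sketch below. Again this is an illustration rather than the real implementation: `compact_key` and `Tracker` are hypothetical names, and the input is assumed to already be the serialized byte string for a template.

```python
import hashlib


def compact_key(serialized):
    """Replace byte strings of 20 bytes or more with their SHA-1 digest."""
    if len(serialized) >= 20:
        return hashlib.sha1(serialized).digest()
    return serialized


class Tracker:
    """Remember which serialized templates have been seen so far."""

    def __init__(self):
        self._seen = set()

    def track(self, serialized):
        """Record a value, returning True if it had been seen before."""
        key = compact_key(serialized)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


tracker = Tracker()
first_time = tracker.track(b"x" * 1000)
second_time = tracker.track(b"x" * 1000)
```

One nice side effect of the twenty byte threshold: a SHA-1 digest is exactly 20 bytes, and every string kept verbatim is at most 19 bytes, so a stored digest can never be confused with a stored short string. Every key is at most 20 bytes, which is where the predictable memory usage comes from.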

The result of this change has been that some parts of my build are now nearly ten times faster than they used to be, and the memory usage has become much more predictable. It’s also meant that I could ditch a ton of code that I really hated – the custom hashing and equality code used to have other use cases, but this was the last remaining one. Over all, this has been one of my better ideas.

This entry was posted in Hypothesis, Uncategorized on by .

An alternate timeline for Stargate SG1

Advance warning: All my non-fiction-writing brain is taken up by writing documentation for Hypothesis, so instead you’re getting teased with another piece of fiction I’ll never actually write.

So I watch actually quite a lot of TV. I rarely sit down and watch TV, but it’s great background for while I’m doing other things – cooking, cleaning, doing exercises, etc.

One of my all time favourite TV series is Stargate SG1. It’s goofy as hell, but it’s a lot of fun and I love the characters.

But… you know, every now and then they have these episodes where Evil Civilian Regulators come in and point out that actually Stargate Command is pretty incompetent and look at all these things they did wrong and maybe there should be some competent people in charge instead?

Obviously these people are civilians and thus secretly evil, and a suitable deus ex machina like the thunder god Thor descending from the heavens and telling them to back off (this is not a hypothetical example) occurs and the evil civilian attempt to instil some competence is thwarted.

As you can tell, I think the evil civilians have a point. For example, it does not occur to anyone until they nearly unleash a devastating plague that could wipe out all of human civilization that maybe they should have quarantine procedures in place for people coming back from human populations that have been isolated from Earth for thousands of years. This takes precisely 4 episodes to bite them.

So I’d like to propose a Stargate AU. Competent!Stargate you might call it. No-one has had a massive intelligence upgrade, the whole thing is not a massive game of Xanatos speed chess, it’s just… everyone behaves at a level of competence and strategic thinking that you would expect from their character and role.

Oh, that and nobody off Earth speaks English because argh seriously?

It’s similar to this alternate starting point for Atlantis, but not compatible with it.

Why is it not compatible with it? Well, easy. Stargate Command couldn’t possibly spare Doctor Weir from her role as head of the civilian half of Stargate Command, a position she’s held since Catherine retired from it.

Uh, let me start from the beginning.

It starts with a very simple point of departure. A conversation between Colonel O’Neill (the actual point of departure is that he’s had two ls from the beginning which put him in a much better state of mind) and General West.

O’Neill: General, we detonated the bomb and eliminated the threat, as per orders.
West: And is the Stargate on the other side destroyed?
O’Neill: Well, no.
West: All right, we’ll send through another bomb.
O’Neill: You can’t do that, General.
West: And why not exactly?
O’Neill: Well because on the other side there’s no threat, but there are an awful lot of innocent civilians. Oh and also a large quantity of the mineral that the Stargate is made out of which by all accounts is an amazing power source.
West: Hmm. A power source you say?
O’Neill: Ra said it would enhance our nuke by, uh, a lot.
West: I do like enhanced nuclear weapons…

An expedition is sent back through the gate, where they’re met by Daniel Jackson and a bunch of the Abydonians. It is politely explained to them that while they are welcome guests they are on the Abydonians’ land and will not be allowed to just start mining Naquadah. Daniel requests Catherine be involved in negotiating rights.

Between Daniel and Catherine they negotiate a deal for a provisional survey in exchange for food, medicines and a variety of other things that the Abydonians could really use from modern Earth civilizations. Captain Carter comes through the gate and is so ridiculously excited by the Naquadah that she talks General West, and through him the US government, into realising they basically have to have these mining rights.

Daniel Jackson is not very keen on the military at this point, so part of the bargaining is that he really wants to deal with civilians, and preferably ones he trusts. Eventually a compromise agreement is reached where Stargate Command will be a joint military/civilian operation with Catherine in charge of the civilian side, especially focused around coordinating the scientific and archaeological research teams on Abydos and liaisons with the Abydonians.

The US is leased mineral rights to mine for Naquadah, plus the assistance of several Abydonians who are experienced in its mining, in exchange for a merely extortionate fee and assistance with converting that fee into personnel and goods.

So they get cracking on this. Carter heads up a team researching the various applications of Naquadah. Daniel assists with the archaeologists investigating the pyramid.

After some months of investigation they discover the Abydos cartouche and Sam rapidly figures out the program she wrote in canon SG1 to convert these into coordinates they can dial. The SGC’s exploration program begins 8 months early, long before Apophis has discovered the coordinates to Earth or Abydos.

Colonel O’Neill asks to head up SG1 as he’s really bored with the military side of the SGC – it’s all depressingly quiet, and he wants to see the universe. Several members of his old team join him. Daniel Jackson and Sam Carter are way too busy doing important research and it’s not like they’ve been incompetent enough to get anyone they cared about abducted by aliens, so why would they come along?

After a planet or two of cautious exploration, they discover that basically Goa’uld is the lingua franca of the galaxy. It’s not that far from the dialect they speak on Abydos, but none of the US military personnel have bothered to learn more than a couple words like “moonshine” in it, so this doesn’t go so well.

A new plan is formed: There are enough Abydonians who speak tolerable English and who want to see the universe who are more than happy to go exploring. Each SG team is equipped with one of them as a translator. Additionally, the SGC puts on regular classes in Goa’uld. Everyone who wishes to go offworld is required to attend these classes – the translators are a sufficient solution for now, so exploration continues, but being in an SG team and not trying to learn to speak the language is not an option.

Jack picks it up surprisingly quickly, with help from Skaara (they play baseball together on weekends), though his accent is and will remain atrocious.

After some months of exploration and talking to people they learn about the Goa’uld in more detail. They’ve yet to actually encounter any, but have figured out that Ra was definitely not the last of his race and that they should be prepared.

As part of this preparedness they realise they need better security to prevent invasion through the gate. The SGC installs Irises on both the Earth and Abydos gates (the latter under the control of the Abydonians – they insisted, and Earth was keen enough to protect their Naquadah mine that they didn’t really have the option to say no).

After some less than responsible behaviour from people when confronted by natives who think they are gods, the SGC institutes a program of extremely strict psychological screening for teams. Jack is raised as a potential issue but persuades the psychologists that he is dealing well with his grief. Jonas Hanson is flagged as unsuitable for first contact situations.

Carter and her team meanwhile are making extremely good progress figuring out Naquadah power sources, given the ample supply of raw material and Goa’uld technology left on Abydos. She does miss being in the field but it would be cutting into way too much valuable research time. She’s the leading expert on this stuff.

Daniel on the other hand feels that the rest of the archaeological team has the study of the history of Abydos well in hand and requests to join SG1 with Jack. Jack is enthusiastic about the idea and his membership is accepted.

The SGC exploration teams discover Avnil and its abandoned Naquadah mines. After the aforementioned entirely psychologically stable team comes through, informs the locals that they are not gods, and does some investigation they find the radiation shield and turn it on (ably assisted by Carter’s team of scientists and engineers who, while they haven’t got the underlying principles very clear yet, are more than capable of finding the on switch). The locals are ridiculously grateful and are more than happy to let them set up a modern mining operation here. The mine is pretty close to depleted, so it’s not a patch on the Abydos mining operation, but the SGC are keen on having a redundant operation which they have much less ambiguous rights to.

They install an iris on the Avnil stargate and declare it to be their beta site. A garrison is established there and SG teams are routed through it when exploring so as to provide a secondary quarantine measure.

Apophis finds Ra’s cache of hidden addresses and dials both Earth and Abydos. He does not have the relevant codes, so the Irises do not open and he concludes the gates must be buried.

The SGC encounter Apophis’s forces. They attempt peaceful negotiation, but Apophis is having none of these uppity humans who think they are his equals.

A guerrilla war between Apophis’s forces and the SGC begins. The SGC’s goal is generally to avoid conflict, but where conflict occurs they wipe the floor with Apophis’s forces. They’re better at tactics and have better weapons (it’s basically canon that Earth projectile weapons are vastly more effective than the staffs the Jaffa are typically armed with).

Teal’c and Bra’tac hear word of the fact that there is a human force out there engaged in regular stargate travel and capable of taking on the Goa’uld. They start quietly sounding out people for starting the Jaffa rebellion. They gradually grow their network, with instructions to try to make contact with anyone from the SGC they run into.

However first, SG1 are met by a man with glowing eyes who tells them that not all Goa’uld are as they think. Although some are parasites, others achieve true symbiosis – the host and the symbiote sharing the body and making decisions together. They speak with the host as well, and he confirms this story.

That Goa’uld’s name? Ba’al.

Obviously they’re not stupid enough to do anything that would give Ba’al the coordinates of Earth, but they set up yet another off world base with an iris and give Ba’al the access codes for that. This becomes their point of coordination for the new offensive against Apophis.

Assisted by Ba’al, who provides them with ships and technology while they provide the manpower, they rapidly put Apophis on the defensive.

Jack O’Neill leads the final invasion of Apophis’s palace. There he faces Teal’c, who tells him that he has important information about their new “ally” Ba’al, and surrenders to O’Neill. Teal’c is taken back to Earth to be debriefed.

Apophis however is nowhere to be found, having escaped through the Stargate before the invasion.

Meanwhile, Ba’al’s forces have successfully invaded Apophis’s private sanctum, where they find, among other things, his private stash of Stargate addresses, including the ones he obtained from Ra.

Acting on this information, Ba’al and his servants add these addresses to their program of exploration. They are particularly interested in this one for the Tau’ri given their interesting new allies with the remarkably populous planet.

Season 1 ends.

Notes:

  • I really like Sam as a character, but it just makes no sense at all for her to be on the front line given her expertise.
  • Daniel on the other hand actually is a field researcher, so has reason to join SG1; he’s just got not quite as strong an incentive as in canon.
  • In Competent!Stargate it’s really hard to get Teal’c into contact with SG1, which is why it takes until end of season 1.
  • Ba’al’s appearance is brought forward four seasons. This is partly because he’s the only actually credibly competent threat amongst the Goa’uld and partly because as a competent Goa’uld there’s no way he wouldn’t take this opportunity to cement his power amongst the system lords (in canon the year after Ra’s death is basically one giant power struggle amongst the system lords, with Ba’al, Apophis and several others coming out on top).
  • Technological advancement proceeds a little faster than in canon because of the much greater availability of Naquadah and Goa’uld technology from Abydos, and later from Ba’al. Also because we actually have Sam Carter and her team working on it as a full time job. It doesn’t proceed ridiculously fast though because they’re still trying to reverse engineer a vastly more advanced alien technology – e.g. it will probably still take contact with the Orbanians before there are portable Naquadah generators.

 

 


The downside of making accurate inferences about people

This is a point that I’ve been thinking about on and off for a while and never really come up with or otherwise seen a satisfying conclusion to. It’s in the general category of “sometimes my politics and my epistemology are hard to make play well together”.

Suppose you meet a woman at a conference. Based solely on the fact that she’s a woman, you are 90% sure she’s a recruiter (Note: All numbers in this post are made up for making a clear point and should not be considered accurate). This isn’t you being prejudiced – you’ve met a lot of women at previous similar events and even this one and of those 90% were recruiters. Your judgement of the a priori chance that she’s a recruiter is an entirely accurate one.

(Reminder that I have no idea what the actual numbers are and that in most circumstances 90% is going to be a ridiculous overestimate)

The problem is not this judgement, but how you act on it. Maybe you really don’t want to interact with a recruiter right now so you avoid her, judging that the 10% chance that she’s a dev isn’t worth the 90% chance that you’ll have yet another awkward recruitment conversation. Maybe you do want to interact with a recruiter and you go up to her and talk excitedly about how you’re looking for a job and she’s like “Uh, great? I’m here to talk about consistency in distributed databases. I don’t really have any hiring power”. Maybe you just act surprised when you find out she’s actually a developer.

A useful feminist concept is that of the microaggression: an interaction where each individual instance is a minor thing, but which serves to reinforce roles and express prejudice in the aggregate. All of the above are examples.

The fact that they’re minor in individual instances but major in the aggregate is part of why microaggressions are so insidious. Because each individual interaction is not in and of itself a big deal, if you only see a few of them you probably don’t perceive this as any real problem. e.g. in the above you might have deprived the pair of you of the chance of an interesting interaction, you might have slightly annoyed someone, etc. All of these are strictly worse than not doing them, but they’re also not the end of the world.

The problem is that everyone else is making more or less the same judgement as you. In practice people’s judgement will be inaccurate (usually tending to the overconfident), but in an epistemically optimal world where everyone has perfect reasoning, most people will come to something like that 90% number.

And this is pretty rough for the 10%. They’re now on the receiving end of a constant stream of microaggressions caused by these accurate judgements: The vast majority of people are treating them as if they’re something they’re not, or assuming them to be less competent at their speciality than they actually are.

(Aside: This being a problem does not require you to think that being a recruiter is in any way a bad thing. Recruiters sadly have a bad rep, but the problem here exists regardless of that: Being constantly assumed to be something other than what you are is grating)

Which will tend to mean they stop coming to conferences, and that number is going to get more extreme.

(Additional parenthetical disclaimer: Obviously this is not the only source of problem for women developers at conferences. It is likely swamped by other more serious problems. I have however definitely heard plenty of women developers complaining about things that sound awfully like being on the receiving end of this, so I don’t think this is just empty theorising)

This is the core problem of making accurate judgements about people: Whatever judgement you make will tend to reinforce itself, because it will be based on broad statistical trends, and this will tend to add friction to interactions with people who buck those trends, which will tend to discourage them, and thus counterexamples to the trends you base your judgement on will tend to disappear faster than those who fit the pattern.

And I’m not really sure what to do about this.

Oh, in the conference setting it’s easy enough. The benefits of accurate judgement are low enough that just going “Don’t do that then” is basically enough of a solution. Don’t form preconceptions about what people do based on their gender (or their race, or any of countless other categories) and try to treat everyone the same and you’ll probably just do fine in this case.

But a lot of the time when you’re making judgements about people it’s actually much more important and you do need to make accurate judgements. Consider for example hiring people.

Obviously you should not make judgements like “You are a woman therefore you are less likely to be good at this job therefore I won’t bother to interview you”. Even if this were true (it’s not) it would still be a terrible thing to do.

But a lot of people make judgements like “You do not have a Github profile with lots of open source code on it therefore you are less likely to be good at this job and therefore I won’t bother to interview you”. And guess what: Open source contributions are significantly gendered, due to a variety of cultural problems (women tend to have less free time due to greater expectation of doing house work, child care, etc, and open source is not exactly an inclusive environment). This is somewhat related to what I’ve written about false proxies previously, but is more insidious: It’s almost impossible to come up with metrics that are completely oblivious to certain boundaries (even the “Hire them and work with them for several years and see how you find it” metric isn’t: What if your company is secretly a bit racist and you just haven’t noticed because you’re white? The black colleague you hired is having a much harder time of it than the white one and so you will tend to judge them more harshly even if you yourself are completely ignoring their race).

About the best thing you can do that I know of is screen off certain questions at the individual level when making these decisions (make as many decisions as you can without even knowing about the person’s race, gender, etc. and where you do know it do your best to ignore it), then later go back and calibrate: This question that we screened off… are we actually screening it off? Do we get significantly different results in our process for men and women? Or for different ethnicities?

This is worth doing when you can, but a lot of the time it’s impossible to do. If you’re a small company you probably don’t have the numbers to get good stats. If you’re an individual trying to form opinions about people you can’t do this sort of statistical analysis – you’re not gathering the data, you probably can’t gather the data, and a lot of the time you’re not even aware you’re asking the question.

Which leads me back to “I don’t know what to do”, which is a pretty depressing point to end this piece on. I value both accurate judgements (not just for the sake of them: they’re also necessary for making good decisions and helping people) and not reinforcing structural prejudice, and it’s completely unclear to me how to balance the two. My current solutions are basically just a bunch of patchwork and special cases and I’ve no real idea whether I’m missing important areas or not.

If you’ve got any good ideas, I’d appreciate hearing them.


Topological compactness is an induction principle

Compactness is one of the most important concepts in point-set topology. However when you first come across it it might not necessarily be obvious why. Paul Crowley asked me yesterday to do a follow-up post to my one about continuous functions to explain compactness, so I spent some time this morning thinking about how one might make it seem more intuitive.

I’m not sure this is that post, but it’s an interesting insight I had while trying to figure that post out. Another more comprehensive one may follow.

There are a variety of induction principles. The most well known one is that if you prove that something is true for 0 and that if it’s true for n it’s true for n + 1 then it must be true for all natural numbers. This is a special case of the induction principle for well-ordered sets: In a well-ordered set, if for every x, p being true for all y < x implies that p is true for x, then p is true for all x.

The common thread is that induction allows you to conclude that something is true for the larger case by showing it’s true for smaller pieces and that the set of things for which it is true is closed under some operation.

I noticed this morning that there is a topological induction principle that’s fairly straightforwardly equivalent to compactness. What it essentially says is that compactness is the property that lets you conclude that things that are true locally are true globally.

Theorem: A topological space \(X\) is compact if and only if every property of sets \(p\) satisfying the following two conditions also satisfies \(p(X)\):

  1. For all \(x \in X\) there is some open set \(U\) with \(x \in U\) and \(p(U)\).
  2. If \(p(A_1), \ldots, p(A_n)\) then \(p(\bigcup A_i)\).

Proof:

First assume \(X\) is compact. By property 1 the set of open \(U\) such that \(p(U)\) is an open cover. By compactness it has a finite subcover. i.e. we can find \(U_1, \ldots, U_n\) such that \(p(U_i)\) and \(\bigcup U_i = X\). Thus by property 2, \(p(X)\).

Now assume the induction principle holds. Let \(\mathcal{U}\) be an open cover of \(X\). Let \(p(A)\) be the property that \(A\) is covered by a finite union of elements of \(\mathcal{U}\). This satisfies the requisite properties: every point is contained in a single element of \(\mathcal{U}\), which suffices for property 1, and if each of \(A_1, \ldots, A_n\) is covered by finitely many elements of \(\mathcal{U}\) then so is their union (a finite union of finite sets is finite), so it satisfies property 2. Therefore by the induction principle, \(X\) is also covered by a finite union of elements of \(\mathcal{U}\), i.e. a finite subcover. Therefore \(X\) is compact.

QED

Note the restriction to finite unions. If we’re allowed to take arbitrary unions then this is just true of all topological spaces.
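As a quick sanity check that finiteness really matters (an illustrative example, not part of the proof above), take \(X = \mathbb{R}\) with the usual metric and let \(p(A)\) mean “\(A\) is bounded”. Both hypotheses of the theorem hold, but the conclusion does not:

\[
p\big((x - 1,\, x + 1)\big) \text{ for every } x \in \mathbb{R},
\qquad
p(A_1), \ldots, p(A_n) \implies p\Big(\bigcup_{i=1}^{n} A_i\Big),
\qquad
\text{but not } p(\mathbb{R}).
\]

So the induction principle fails for \(\mathbb{R}\), exactly as the theorem says it must for a non-compact space. If arbitrary unions were allowed in property 2 this \(p\) would no longer qualify, since \(\mathbb{R} = \bigcup_{x} (x - 1, x + 1)\) is a union of bounded sets that is not bounded.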

I feel like there’s probably a similar induction principle lurking in the intersections of closed sets equivalent form. Perhaps something that looks more like recursive function definition? I’m not totally sure.

For an example of how you could use this, here is a reframing of the proofs of two classic results:

Theorem: A compact metric space is bounded.

Proof: A finite union of bounded sets is bounded, and the open ball \(B(x, 1)\) is a bounded open set containing \(x\). Therefore by topological induction \(X\) must also be bounded. QED

Theorem: A compact subset of a Hausdorff topological space is closed.

Proof: Let \(X\) be Hausdorff and \(Y \subseteq X\) be compact. Let \(x \in X \setminus Y\).

Let \(p(A)\) be true if \(x\) is not in the closure of \(A\) considered as a subset of \(X\).

Because \(X\) is Hausdorff, for every \(y \in Y\) we can find \(U, V\) open and disjoint with \(y \in U\) and \(x \in V\). So \(\overline{U} \subseteq V^c \subseteq X \setminus \{x\}\) and thus \(p(U)\), and hence also \(p(U \cap Y)\), where \(U \cap Y\) is an open subset of \(Y\) containing \(y\). The closure of a finite union of sets is the union of the closures of the sets, so if \(p(A_1), \ldots, p(A_n)\) then \(p(\bigcup A_i)\). Therefore, by topological induction on \(Y\), \(p(Y)\), i.e. \(x\) is not in the closure of \(Y\). But \(x\) was an arbitrary point not in \(Y\), hence \(Y\) must be closed. QED

These aren’t really very different from the normal proofs. Virtually identical even. I think they come out slightly more nicely this way, but there’s not much in it. Hopefully it helps clear up some of the intuition around compactness though.


Continuous functions are those which preserve approximate measurements

I like the point-set topology definition of continuous function. It’s elegant, generalises well, and I think puts a bunch of things on firmer foundations than epsilon-delta definitions.

But it’s also confusing to some people. Why open sets? Why is it that the pre-image of an open set under a continuous function is open rather than the image?

One way to fix this is to start with different but equivalent definitions of topological spaces. This is fine, but it’s a little unsatisfying. The open set formulation is widely used because it’s quite powerful. It would be nice to be able to make intuitive sense of it. Additionally, the same sort of definition crops up elsewhere – e.g. a measurable function is one where the pre-image of every measurable set is measurable.

So I’d like to give you some intuition as to why open sets make sense and why given that intuition the definition of the continuous function is the “obvious” one.

I suggest that the intuitive concept you should attach to an open set is that an open set is an approximate measurement.

What does this mean?

Well, first let me pin down what I mean by the words individually.

A “measurement” does not here mean something like “this rod is exactly 1.23 meters long”. “This rod is less than a mile long” or “this rod is between 1 and 2 meters long” are also measurements. “The length of this rod is no more than 100 times its diameter” is also a measurement. A measurement in this case is anything that helps you pin down the range of possible objects.

And “approximate” does not mean “I guessed”. It means “you do not need to know the exact value arbitrarily well in order to validate this measurement”. You can easily validate that the rod is between 1 and 2 meters long with a tape measure. You can’t validate that it’s exactly 1.23 meters long with a tape measure (but you can validate that it’s not).

An approximate measurement is, more or less, one where you only need a finite amount of information to validate it.

Note that you might need an infinite amount of information to refute it. If I tell you that the rod is less than one meter long and it turns out that the rod is exactly one meter long down to such a subquantum scale that it turns out we’re all living in a simulation of a platonic euclidean universe then you need to measure its length infinitely precisely in order to tell me I’m wrong – even if you measure it down to the nearest micron it might be half a micron short of one meter.
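Here is a small sketch of that asymmetry in code. It assumes a real number is handed to us as a function `approximate(bits)` returning a rational within \(2^{-\text{bits}}\) of the true value; that representation, and the names `in_open_interval` and `constant`, are made up for this illustration.

```python
from fractions import Fraction


def in_open_interval(approximate, low, high, max_bits=64):
    """Try to prove low < x < high from finitely many approximations.

    `approximate(bits)` must return a Fraction within 2**-bits of x.
    Returns True once membership is proven, or None if the budget of
    max_bits ran out without a proof.
    """
    for bits in range(max_bits + 1):
        error = Fraction(1, 2 ** bits)
        estimate = approximate(bits)
        # The true value lies within [estimate - error, estimate + error].
        if low < estimate - error and estimate + error < high:
            return True
    return None


def constant(value):
    """A real number that we happen to know exactly."""
    return lambda bits: Fraction(value)


rod = constant(Fraction(123, 100))
between_one_and_two = in_open_interval(rod, 1, 2)
exactly_one_metre = in_open_interval(constant(1), 0, 1)
```

The 1.23 metre rod is confirmed to be between 1 and 2 metres after a few bits of precision, so `between_one_and_two` is `True`. A rod of exactly one metre is never confirmed to be shorter than one metre (correctly), but no finite budget ever rules it out either, so `exactly_one_metre` is `None` however large `max_bits` gets.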

So this is our intuitive and imprecise definition of an open set: An open set is one where for any member of the set we can prove that it’s a member of that set with a finite amount of information.

This is of course nonsense. How does this give rise to different topologies? And what constitutes information?

What questions we are allowed to ask then determines our topology. They don’t need to actually correspond to any notion of finiteness (for example we could simply define the discrete topology, in which all of the questions “Is it this point?” are permitted), but many classic ones do: e.g. you only need to evaluate a real number to a finite number of decimal places to prove that it’s in an open set.

Essentially these two resolve themselves together: Topologies correspond to different sorts of questions we can ask, and then “finite amount of information” just means that for every member we can prove that it’s a member by only asking a finite number of those questions.

This intuition corresponds nicely to the topology axioms: You only need 0 questions to determine if a member of the whole set is a member of the whole set, and the empty set satisfies the property vacuously. If you have an arbitrary union \(\bigcup U_i\) then for \(x \in \bigcup U_i\), \(x \in U_j\) for some \(j\) and you only need a finite set of questions to prove that. If \(x \in U \cap V\) then you can take the finite proof that \(x \in U\) and the finite proof that \(x \in V\) and union them together.

You can make all this formal and get yet another characterisation of topological spaces but it’s not very interesting and ends up mostly corresponding to existing notions.

With that notion of approximate measures hand waved, we can now hand wave our notion of a continuous function:

If you apply a continuous function to some input and make an approximate measurement of the result, this gives you an approximate measurement of the input.

So for example if we just consider the length of a rod and make an approximate measurement of that, this gives us an approximate measurement of the whole rod: It still constrains the space of possible objects in a way we only need to ask finitely many questions to answer.

And this is precisely what “the preimage of an open set is open” means: If we make some measurement \(V\) and constrain \(f(x) \in V\) then this precisely corresponds to \(x \in f^{-1}(V)\). So “an approximate measurement of the result of a continuous function gives an approximate measurement of its input” is exactly “The preimage of an open set under a continuous function is open”.

But why does that match what we would intuitively think of as “continuity”?

Well, in some cases it doesn’t really, but that’s OK. For examples where we have more intuition about what continuous should mean it matches quite nicely:

Consider e.g. \(f\) with \(f(0) = 1\) and \(f(x) = 0\) otherwise. Now consider the measurement \(f(x) > \frac{1}{2}\). In order to know whether this holds for \(x\) we’re back in the “this rod is exactly one meter long” territory – no matter how precisely you measure \(x\) it might be just a bit closer to zero than that but still non-zero.

This works in more generality: At any point of discontinuity \(x\) you will find open sets where, to determine whether a nearby point \(y\) belongs to the preimage, you need to know \(y\) arbitrarily well in order to distinguish it from \(x\).

Note also that an approximate measurement of the input to a continuous function does not give you an approximate measurement of the output. Consider e.g. the constant function \(f(x) = 1\). Then given some non-empty open set \(U\), in order to determine if \(y \in f(U)\) we need to test if \(y = 1\). This requires infinitely many decimal places of \(y\) and thus is not an approximate measurement.

Anyway, that’s enough hand waving. I don’t know if this actually clears things up for anyone (I figured this representation out long after I’d already internalized the rules of topology), but hopefully it’s given a different perspective on it.
