This is a discussion between John Ousterhout and Robert C. Martin, who advocated in “Clean Code” omitting comments and splitting code into extremely small functions. Ousterhout goes to town on that by asking Martin to explain an algorithm which Martin presented in “Clean Code”: an algorithm that generates a list of prime numbers. It turns out that Martin essentially does not understand his own code because of the way it is written, and he even introduces a performance regression!

Ousterhout: Do you agree that there should be comments to explain each of these two issues?

Martin: I agree that the algorithm is subtle. Setting the first prime multiple as the square of the prime was deeply mysterious at first. I had to go on an hour-long bike ride to understand it.

[…] The next comment cost me a good 20 minutes of puzzling things out.

[…] I refactored that old algorithm 18 years ago, and I thought all those method and variable names would make my intent clear – because I understood that algorithm.

[Martin presents a rewrite of the algorithm]

Ousterhout: Unfortunately, this revision of the code creates a serious performance regression: I measured a factor of 3-4x slowdown compared to either of the earlier revisions. The problem is that you changed the processing of a particular candidate from a single loop to two loops (the increaseEach… and candidateIsNot… methods). In the loop from earlier revisions, and in the candidateIsNot method, the loop aborts once the candidate is disqualified (and most candidates are quickly eliminated). However, increaseEach… must examine every entry in primeMultiples. This results in 5-10x as many loop iterations and a 3-4x overall slowdown.
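
The difference Ousterhout describes can be sketched in Java. This is neither Martin's code nor the code from the discussion: the class `PrimeTableSketch`, its method names, and the `entriesVisited` counter are all made up for illustration, and the sketch only assumes the general table-of-multiples approach (each tracked prime starts at its square). One variant advances each multiple lazily and stops at the first hit; the other first advances every entry and only then searches.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
final class PrimeTableSketch {

    /** Table entries visited by the most recent call; kept only to compare the variants. */
    static long entriesVisited;

    /** One loop per candidate: advance lazily and stop at the first hit. */
    static List<Integer> primesSingleLoop(int count) {
        return generate(count, false);
    }

    /** Two loops per candidate: advance every entry first, then search for a hit. */
    static List<Integer> primesTwoLoops(int count) {
        return generate(count, true);
    }

    private static List<Integer> generate(int count, boolean twoLoops) {
        entriesVisited = 0;
        List<Integer> primes = new ArrayList<>();
        if (count <= 0) {
            return primes;
        }
        primes.add(2);
        // multiples.get(i) is an odd multiple of primes.get(i + 1), starting at its square.
        // A smaller odd multiple k*p (1 < k < p) has a prime factor below p,
        // so an earlier table entry already rules it out.
        List<Integer> multiples = new ArrayList<>();

        for (int candidate = 3; primes.size() < count; candidate += 2) {
            int next = multiples.size() + 1; // index of the next prime to start tracking
            if (next < primes.size()
                    && candidate == primes.get(next) * primes.get(next)) {
                multiples.add(candidate);
            }
            if (twoLoops) {
                // Pass 1: touches every entry, even if the candidate fails early.
                for (int i = 0; i < multiples.size(); i++) {
                    entriesVisited++;
                    advance(multiples, i, primes.get(i + 1), candidate);
                }
            }
            boolean composite = false;
            for (int i = 0; i < multiples.size() && !composite; i++) {
                entriesVisited++;
                if (!twoLoops) {
                    advance(multiples, i, primes.get(i + 1), candidate);
                }
                composite = multiples.get(i) == candidate;
            }
            if (!composite) {
                primes.add(candidate);
            }
        }
        return primes;
    }

    /** Moves entry i forward in steps of 2*prime until it is at least candidate. */
    private static void advance(List<Integer> multiples, int i, int prime, int candidate) {
        int m = multiples.get(i);
        while (m < candidate) {
            m += 2 * prime; // stay on odd multiples
        }
        multiples.set(i, m);
    }
}
```

Calling `primesSingleLoop(1000)` and `primesTwoLoops(1000)` and reading `entriesVisited` after each call shows that the two-loop variant visits more table entries for the same result. The ratio in this simplified sketch need not match the 3-4x slowdown Ousterhout measured on the real code.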

It gets even more hilarious when one considers where Martin took the algorithm from, and who designed it originally:

Martin took it from Donald E. Knuth’s seminal 1984 article on Literate Programming:

http://www.literateprogramming.com/knuthweb.pdf

In this article, Knuth explains that the source code of a program should be ideally understood as a by-product of an explanation which is directed at humans, explaining reasoning, design, invariants and so on. He presents a system which can automatically extract and assemble program source code from such a text.

Even more interesting, the algorithm was not invented by Knuth himself. It was published in 1970 by Edsger Dijkstra in his “Notes on Structured Programming” (with a second edition in 1972).

In this truly fascinating and timeless text, Dijkstra writes on software design by top-down problem decomposition, proving properties of program modules by analysis, using invariants to compose larger programs from smaller algorithms and to design new data types, and on how all of this makes software maintainable. He uses the prime number generation algorithm as an extended example. He stresses multiple times that both architecture and invariants need to be documented on their own to make the code understandable. (If you want the feeling of standing on the shoulders of giants, you should read what Dijkstra, Knuth, Tony Hoare, and Niklaus Wirth wrote.)

So, Robert Martin is proven wrong here. He does not even understand, and could not properly maintain, the code from his own book. Nor did he understand that his code is hard to understand for others.

(I would highly recommend Ousterhout’s book.)

  • squaresinger@lemmy.world
    23 days ago

    I’m totally with Ousterhout here! Thanks for posting this great discussion!

    The problem with the “Clean code” approach of overdecomposition is that it doesn’t abstract the code away in meaningful ways. The code is still there, and to debug or avoid bugs you still need to know all of it if the methods are entangled. So I still need to keep 500 lines of code in mind, but now they aren’t all in one file where I can easily follow them; instead they are spread over 40 files, each containing just 1-2 line methods.

    I’m also very much against “Clean code”'s recommendations on comments. In the end it leads either to no documentation or to documentation lost somewhere in Confluence that nobody ever reads or updates because it’s not where it’s needed.

    Getting developers to read and update documentation is not an easy task, so the easier it is to find and update the documentation the more likely it is that the documentation is actually used. And there is no easier-to-access place for documentation than in comments right in the code. I really like Javadoc-style documentation since it easily explains the interface right where it’s needed and neatly integrates with IDEs.

    • Thorry@feddit.org
      23 days ago

      There are a couple of things I do agree with in regards to the comments in code. They aren’t meant as a replacement for documentation. Documentation is still required to explain more abstract, overview-level things, known limitations etc. If your class has 3 pages of text in comments at the top, that would probably be better off in the documentation. When working with large teams there are often people who need to understand what the code can and can’t do, how edge cases are handled etc. but can’t read actual code. Proper documentation avoids a lot of questions and often gives coders a better understanding of the system as well. Writing doc blocks in a manner that can be extracted into documentation helps a lot too, but I feel that it provides an easy way out of writing actual documentation. Of course, depending on the situation this might not matter or one might not care; it’s something that comes up more when working in large teams.

      Just like writing code, writing proper comments is a bit of an art. I’ve very often seen developers be way too verbose, commenting almost every line with the literal thing the next line does. Anyone who can read the code can see what it does. What we can’t see is why it does this or why it doesn’t do it in some other obvious way. This is something you see a lot with AI generated code, probably because a lot of their training was done on tutorials where every line was explained so people learning can follow along.

      This also ties in with keeping comments updated and accurate when changing code. If the comment and the code don’t match, which one is true? I’ve in the past worked on legacy codebases where the comments were almost always traps. The code didn’t match the comments at all, sometimes obviously so, most times only very subtly. We were always guessing: was the implementation meant to do what the comment says, and the difference just a mistake? The codebase was riddled with bugs, so it’s likely. Or was the code changed at a later point on purpose and the comments neglected?

      Luckily these days we have good tools in regards to source control, with things like feature branches, pull requests with tools that allow for discussion and annotation. That way at least usually the origin of a change is traceable. And code review can be applied before the change is merged, so mistakes like neglecting comments can be caught.

      Now I don’t agree with the principle of no comments at all. Just because a tool has some issues and limitations doesn’t mean it gets banned from our toolbox. But writing actual useful comments is very hard and can be just as hard as writing good code. Comments also aren’t a cheat card for writing bad code, the code needs to stand on its own and be enhanced by the comments.

      It’s one of those things we’ve been arguing about over my entire 40 year career. I don’t think there is a right way. Whatever is best depends on the person, the team, the system etc. And like with many things, there are people who are good and people who suck. That’s just the way the cookie crumbles.

      • squaresinger@lemmy.world
        23 days ago

        You are obviously right about the things you are saying. I was specifically talking about code documentation on a class/method level. User documentation, architecture documentation or other high-level documentation doesn’t make sense in the code, of course.

        I have seen similar levels of documentation as you talk about (every line, every call documented), but in flow charts in Confluence. That has the same issues as documenting every line of code in comments, but worse.

        “Just because a tool has some issues and limitations doesn’t mean it gets banned from our toolbox.”

        This is very much it. Every tool can be abused and no tool is perfect. Code can have bugs and can be bad (and often both things happen). Should we now ban writing code?

        “If the comment and the code don’t match, which one is true?”

        This can be true even with code alone. A while ago I found a bug in an old piece of code written by someone who left the company years ago.

        The method causing the bug was named something like isNotX(). In the function it returned isX. About half the places where the function was called, the returned value was assigned to a variable named isX and in the other half of the places the variable was named isNotX. So which is true?

        A Javadoc-style comment could have acted as a parity check. Since comments are simpler to write than code, it’s easier to correctly explain the purpose of a function in there than in code.

        While in the example I referenced it was quite clear that something was wrong, this might not always be the case. Often the code looks consistent while actually being wrong. A comment can help to discern what’s going on there.
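
A minimal sketch of that parity idea, in Java. The class `Ticket`, the field `closed`, and the method `isNotClosed` are hypothetical stand-ins for the anonymized `isNotX()`; the original code is not shown in this thread. The Javadoc states the contract in plain words, so when name and body disagree there is a third data point to break the tie.

```java
/** Hypothetical example class; stands in for the anonymized code described above. */
final class Ticket {
    private final boolean closed;

    Ticket(boolean closed) {
        this.closed = closed;
    }

    /**
     * Reports whether this ticket is still open.
     *
     * @return {@code true} if the ticket has NOT been closed,
     *         {@code false} if it has been closed
     */
    boolean isNotClosed() {
        // The buggy pattern described above would be `return closed;` here:
        // the name says "not", the body says the opposite, and without the
        // Javadoc a reader cannot tell which of the two was intended.
        return !closed;
    }
}
```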

        Another example of that that we had at the same project:

        In the project there were bookings and prebookings. We had a customer-facing REST endpoint called “getSomeSpecialBookings” (it wasn’t called that, but the important thing was that this function would return a special subset of bookings). Other “get…Bookings” endpoints would return only bookings and not prebookings, but this special endpoint would return both bookings and prebookings. A customer complained about that, so we fixed the “bug” and now this endpoint only returned bookings.

        (There was no comment anywhere and we couldn’t find anything relevant in Confluence.)

        Directly after the release, another customer created a highest-priority escalation because this change broke their workflow.

        Turns out, that endpoint only existed because that customer asked for it and the dev who implemented that endpoint just implemented it as the customer requested without documenting it anywhere.

        A comment would have been enough to explain that what this endpoint was doing was on purpose.
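
Such a comment might have looked roughly like the sketch below. Only the placeholder name `getSomeSpecialBookings` comes from the story; the class `BookingEndpointSketch`, the `Entry` type, the `Kind` enum, and the `special` flag are assumptions made up for illustration, and the real endpoint's selection logic is unknown.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; types and fields are hypothetical. */
final class BookingEndpointSketch {

    enum Kind { BOOKING, PREBOOKING }

    static final class Entry {
        final String id;
        final Kind kind;
        final boolean special; // assumed marker for the "special subset"

        Entry(String id, Kind kind, boolean special) {
            this.id = id;
            this.kind = kind;
            this.special = special;
        }
    }

    /**
     * Returns the special subset of entries.
     *
     * <p>NOTE: unlike the other get...Bookings endpoints, this one
     * intentionally returns prebookings as well as bookings. It was added
     * at the request of a customer whose workflow depends on that, so do
     * not "fix" it without checking with that customer first.
     */
    static List<Entry> getSomeSpecialBookings(List<Entry> all) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : all) {
            if (e.special) { // deliberately no filter on e.kind, see Javadoc
                result.add(e);
            }
        }
        return result;
    }
}
```

The comment records why the behaviour differs, which is exactly the information that could not be recovered from the code or from Confluence.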

        We all know that code tends to be bad, especially after the project has been running for a few years and has been through a few hands.

        Why would anyone think that code is good enough to be the documentation?

        “Luckily these days we have good tools in regards to source control, with things like feature branches, pull requests with tools that allow for discussion and annotation. That way at least usually the origin of a change is traceable.”

        Sadly, we also have non-technical people running procurement, and thus we keep switching tools because one is marginally cheaper or because cloud is cool right now (or not cool anymore right now). Migrations suck, and then we end up with lost history.

          • squaresinger@lemmy.world
            23 days ago

            A year or so before I started my current job, the team working on the project got split. Someone then decided that the two teams should use different Jira prefixes for their tickets. So they took all issues, automatically split them into two prefixes based on the people who implemented each ticket, and renumbered everything. But they didn’t do the same in GitLab merge requests, and they didn’t do it in git commit messages either.

            So now git and GitLab reference all old tickets by their old numbering system, but there’s no trace of these old numbers in Jira. It’s close to impossible to find the Jira ticket mentioned in a git commit message.

            Oh, and of course, nobody ever managed to properly link Jira and GitLab (so that Jira tickets contain the GitLab MRs, branches and commits), because for that you need a free Jira plugin, and procurement wants a multi-page description of why this is needed, which has to be signed off by 5 people including the department lead and go through the whole procurement process before we can install that plugin.

      • HaraldvonBlauzahn@feddit.org (OP)
        23 days ago

        “Anyone who can read the code can see what it does. What we can’t see is why it does this or why it doesn’t do it in some other obvious way. This is something you see a lot with AI generated code, probably because a lot of their training was done on tutorials where every line was explained so people learning can follow along.”

        This is also an important difference between C++ and Rust: Rust ensures correctness of ownership semantics, mutating xor sharing values, absence of race conditions, and so on. In that sense, Rust has stricter syntax: It puts things into the code which are “meta-code” in C++.

        Because of that, many or perhaps most correct Rust programs could be rewritten verbatim as correct C++ programs, with the invariants that ensure correctness put into comments. But in practice it is extremely hard to write multi-threaded C++ programs that are correct from the start that way. One reason for this is that many C++ programmers lack both the means and the culture to annotate the correctness of their code. Of course invariants and pre-conditions can be annotated in comments, but in reality it is rarely done.