
Pounding stone seems reasonable to me. Obviously I don't have any proof or even strong evidence, but I saw a video that changed my perception of what is possible. It showed two old men making a millstone with hand tools: https://www.youtube.com/watch?v=lscs5ZgNQrE. The amount of labour involved and the quality of the finished item were astonishing to me. Maybe you'll think that the hideous amount of labour needed to make a simple geometric shape makes you even more convinced the Inca had some other way to achieve their even harder task. But it is a fun video anyway.

Similarly astonishing to me is that Michelangelo's David was carved from a single piece of marble with a hammer and chisel. I mean, just look at it: https://en.wikipedia.org/wiki/David_(Michelangelo)


The video does not counter the parent's argument about measuring fit.

What the masons in the video do is certainly impressive. Cutting organic shapes that fit perfectly together, as if they once were elastic, is another level.

Perhaps they did something similar to what dentists do when building on teeth, so that the added material is not the only contact point when the jaws are closed. That is, a contact sheet that leaves contact marks.


> The video does not counter the parent's argument about measuring fit.

I know. I mainly just wanted to link that video because it is awesome.

The article does explain how the Inca did it - only the front edges are tight fitting. The gaps between the inside surfaces are filled with mortar. They sat the stone where it was to be placed, but with the front edge raised up by resting on some spacers, then just incrementally improved the fit of the edge and re-tried the fit. I'd have still thought that was impossible without seeing something like the video I linked - my intuition of what can be achieved with hammer and chisel was wrong.


Edit: I think that was too strong. I don't have any real knowledge of this subject. The explanation in the article seemed reasonable to me. That is all.

> Perhaps they did something similar to what dentists do when building on teeth, so that the added material is not the only contact point when the jaws are closed. That is, a contact sheet that leaves contact marks.

The article linked in this post mentions the possibility of "red clay" being used for this purpose, as well as being a mortar.


The examples are fun, but rather than yet another article saying how amazing optimizing compilers are (they are, I already know), I'd probably benefit more from an article explaining when obvious optimizations are missed and what to do about it.

Some boring examples I've just thought of...

eg 1:

    int bar(int num) { return num / 2; }
Doesn't get optimized to a single shift right, because that won't work if num is negative. In this case we can change the ints to unsigneds to tell the compiler we know the number isn't negative. But it isn't always easy to express to the compiler everything you know about your data and use case. There is an art in knowing what kinds of things you need to tell the compiler in order to unlock optimizations.
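
For example, the unsigned version (a minimal sketch) does compile down to a single logical shift on gcc and clang:

    /* assumption: the caller knows num is never negative */
    unsigned bar(unsigned num) { return num / 2; }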

eg 2:

    int foo(void) { return strlen("hello"); }
We all know that strlen will return 5, but some compilers don't: https://godbolt.org/z/M7x5qraE6

eg 3:

    int foo(char const *s) {
      if (strlen(s) < 3) return 0;
      if (strcmp(s, "hello") == 0)
        return 1;
      return 0;
    }
This function returns 1 if s is "hello", and 0 otherwise. I've added a pointless strlen(). It seems like no compiler is clever enough to remove it. https://godbolt.org/z/Koj65eo5K. I can think of many reasons the compiler isn't able to spot this.


> We all know that strlen will return 5, but some compilers don't: https://godbolt.org/z/M7x5qraE6

I feel like it is unfair to blame the compiler when you've explicitly asked for `/O1`. If you change this to `/O2` or `/Ox` then MSVC will optimize this into a constant 5, proving that it does "know" that strlen will return 5 in this case.


Fair point. It doesn't do the optimization if you ask to optimize for size '/Os' either.


Yeah, this one as well:

  bool is_divisible_by_6(int x) {
      return x % 2 == 0 && x % 3 == 0;
  }

  bool is_divisible_by_6_optimal(int x) {
      return x % 6 == 0;
  }
Mathematically, x % 2 == 0 && x % 3 == 0 is exactly the same as x % 6 == 0 for all C/C++ int values, but the compiler doesn't see them as identical and produces less optimal code for is_divisible_by_6 than for is_divisible_by_6_optimal.


Mhm, this is one of those cases where I'd prefer a benchmark to be sure. Checking % 2 is very performant, actually just a single bit check. I can also imagine some CPUs having a special code path for % 3. In practice I would not be surprised if the double operation is actually faster than the % 6. I am on mobile at the moment, so not able to verify.


But if % 2 && % 3 is better, then isn't there still a missed optimization in this example?


Let's throw this into godbolt: https://clang.godbolt.org/z/qW3qx13qT

    is_divisible_by_6(int):
        test    dil, 1
        jne     .LBB0_1
        imul    eax, edi, -1431655765
        add     eax, 715827882
        cmp     eax, 1431655765
        setb    al
        ret
    .LBB0_1:
        xor     eax, eax
        ret

    is_divisible_by_6_optimal(int):
        imul    eax, edi, -1431655765
        add     eax, 715827882
        ror     eax
        cmp     eax, 715827883
        setb    al
        ret
By themselves, the mod 6 and mod 3 operations are almost identical -- in both cases the compiler used the reciprocal trick to transform the modulo into an imul+add+cmp, the only practical difference being that the %6 has one extra bit shift.
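
For reference, the unsigned flavour of that reciprocal trick can be written out in C directly (a sketch, not the compiler's exact output: 0xAAAAAAAB is the multiplicative inverse of 3 mod 2^32, so multiples of 3 land in the bottom third of the unsigned range; the signed versions above additionally add a bias, the 715827882, to shift negative inputs into range):

    #include <stdint.h>

    /* x is divisible by 3 iff x * inv(3) wraps to a value
       no larger than floor((2^32 - 1) / 3) = 0x55555555 */
    int divisible_by_3(uint32_t x) {
        return x * 0xAAAAAAABu <= 0x55555555u;
    }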

But note the branch in the first function! The original code uses the && operator, which is short-circuiting -- so from the compiler's perspective, perhaps the programmer expects that x % 2 will usually be false, and so we can skip the expensive % 3 most of the time. The "suboptimal" version is potentially quite a bit faster in the best case, but also potentially quite a bit slower in the worst case (since that branch could be mispredicted). There's not really a way for the compiler to know which version is "better" without more context, so deferring to "what the programmer wrote" makes sense.

That being said, I don't know that this is really a case of "the compiler knows best" rather than just not having that kind of optimization implemented. If we write 'x % 6 && x % 3', the compiler pointlessly generates both operations. And GCC generates branchless code for 'is_divisible_by_6', which is just worse than 'is_divisible_by_6_optimal' in all cases.


I also tried this

  bool is_divisible_by_15(int x) {
      return x % 3 == 0 && x % 5 == 0;
  }

  bool is_divisible_by_15_optimal(int x) {
      return x % 15 == 0;
  }
is_divisible_by_15 still has a branch, while is_divisible_by_15_optimal does not:

  is_divisible_by_15(int):
        imul    eax, edi, -1431655765
        add     eax, 715827882
        cmp     eax, 1431655764
        jbe     .LBB0_2
        xor     eax, eax
        ret
  .LBB0_2:
        imul    eax, edi, -858993459
        add     eax, 429496729
        cmp     eax, 858993459
        setb    al
        ret

  is_divisible_by_15_optimal(int):
        imul    eax, edi, -286331153
        add     eax, 143165576
        cmp     eax, 286331153
        setb    al
        ret


Nice.

Is the best way to think of optimizing compilers, "I wonder if someone hand-wrote a rule for the optimizer that fits this case"?


Probably not, because a lot of the power of optimizing compilers comes from composing optimizations. Also a lot comes from being able to rule out undefined behavior.


> int bar(int num) { return num / 2; }

> Doesn't get optimized to a single shift right, because that won't work if num is negative.

Nit: some might think the reason this doesn't work is because the shift would "move" the sign bit, but actually arithmetic shifting instructions exist for this exact purpose. The reason they are not enough is because shifting provides the wrong kind of division rounding for negative numbers. This can however be fixed up by adding 1 if the number is negative (this can be done with an additional logical shift for moving the sign bit to the rightmost position and an addition).
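
Concretely, a branchless version of that fix-up might look like this (a sketch assuming 32-bit int; note that right-shifting a negative value is technically implementation-defined in C, though it is an arithmetic shift on all mainstream compilers):

    int div2(int num) {
        /* logical shift brings the sign bit down to bit 0,
           so bias is 1 for negative num and 0 otherwise */
        int bias = (int)((unsigned)num >> 31);
        /* the bias makes the arithmetic shift round toward
           zero, matching the semantics of num / 2 */
        return (num + bias) >> 1;
    }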


Good point. I guess there are more cases than just this one where I'd like to be able to tell the compiler I don't care about rounding behaviour and would prefer the fastest code. Like -ffast-math but for integer operations. I don't think that exists. I wonder why.


Will shift, shift, and add be slower or faster than a divide instruction on machines with a divide instruction?


Most likely not slower; division instructions generally take as long as 10-20 other arithmetic/logic instructions.


> won't work if num is negative

I remember reading (although I can't find it now) a great analysis of all the optimizations that Javascript compilers _can't_ do because of the existence of the "eval" function.


The extra fun thing about this is that eval has different semantics if it's assigned to a different name, in order to allow JavaScript implementations to apply extra optimizations to code that doesn't call a function literally named "eval": https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Andy Wingo (of course!) has a good explanation of this: https://wingolog.org/archives/2012/01/12/javascript-eval-con...


Could this perhaps be it? https://janvitek.org/pubs/ecoop11.pdf


A JIT can do any optimization it wants, as long as it can deoptimize if it turns out it was wrong.


You also want to prove that the "optimization" doesn't make things slower.


I don't think anyone actually bothers to do that tbh.


The compiler doesn't know the implementation of strlen; it only has its header. At runtime it might be different from what it was at compile time (e.g. LD_PRELOAD=...). For this to be optimized you need link time optimization.


Both clang and gcc do optimize it though - https://godbolt.org/z/cGG9dq756. You need -fno-builtin or similar to get them to not.


No, the compiler may assume that the behavior of standard library functions is standards-conformant.


> No, the compiler may assume that the behavior of standard library functions is standards-conformant.

Why?

What happens if it isn't?


Sadness. Tons of functions from the standard library are special-cased by the compiler. The compiler can elide malloc calls if it can prove it doesn't need them, even though strictly speaking malloc has side effects by changing the heap state. Just not useful side effects.

memcpy will get transformed and inlined for small copies all the time.
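
For example (my sketch, not from the thread): a small fixed-size copy like this typically compiles to a single 8-byte load/store pair at -O2, with no call emitted:

    #include <string.h>

    /* gcc and clang lower this to one mov pair */
    void copy8(void *dst, const void *src) {
        memcpy(dst, src, 8);
    }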


What happens when you change random functions in your C compiler? The C standard library and compiler are not independent; together they make up a C implementation, whose behaviour is described by the C standard.


Yes, though it's worth stating that it's a little more nuanced than that, since (for historical, path-dependent reasons) the compiler and libc are often independent projects (and libc often includes a bunch of other stuff beyond what the standard/compiler need).

This is the case, for example, on macOS, FreeBSD, and Linux.

Cute username, BTW.


You are right, it depends on whether you write C (from the standard) or a specific dialect from your vendor (everybody does the latter in practice). In that case, you need to know the rules of the compiler. But to allow optimization, these rules are often similar, so the compiler assumes the functions have the behaviour of the implementation that the compiler is tailored against.

> Cute username, BTW.

Thanks, I was too lazy to think of a real name, so it's the timestamp at which I created the account.


> What happens if it isn't?

§6.4.2.1: "If the program defines a reserved identifier [...] the behavior is undefined."


The most common reason is to do optimizations such as replacing strlen("hello") with 5 or open-coding strlen (or, more commonly, memcpy or memcmp). If you're linking with a non-conformant strlen (or memcpy or whatever) the usual thing that happens is that you get standards-compliant behavior when the compiler optimizes away the call, but you get the non-conformant behavior you presumably wanted when the compiler compiles a call to your non-conformant function.

But the orthodox answer to such questions is that demons fly out of your nose.


Because that's what it means to compile a specific dialect of a specific programming language?

If you want a dialect where they aren't allowed to assume that, you would have to make your own.


It does. The meaning of certain functions are prescribed by the C standard and the compiler is allowed to expect them to have certain implementations. It can replace them with intrinsics or even remove them entirely. It is of course different for a freestanding implementation.


Hmmm, really? Switching compiler seems sufficient: https://godbolt.org/z/xnevov5d7

BTW, the case of it not optimizing was MSVC targeting Windows (which doesn't support LD_PRELOAD, but maybe has something similar?).


> I've added a pointless strlen(). It seems like no compiler is clever enough to remove it.

For that you could at least argue that, if the libc's strlen is faster than strcmp, it improves performance when the programmer expects the function to usually be called with a short input.

That said, after changing it to `if (strlen(s) == 5) return 0;` it still doesn't get optimized (https://godbolt.org/z/7feWWjhfo), even though the entire function is then completely equivalent to just `return 0;`.


> but some compilers don't: https://godbolt.org/z/M7x5qraE6

You've misrepresented the situation. Turn up the optimiser to `/O2` and MSVC returns 5 directly, too.

> This function returns 1 if s is "hello". 0 otherwise. I've added a pointless strlen(). It seems like no compiler is clever enough to remove it.

It's funny how sometimes operating at a higher level of abstraction allows the compiler to optimise the code better: https://godbolt.org/z/EYP5764Mv

In this, the string literal "hello" is lowered not merely into a static string, but a handful of integral immediates that are directly inline in the assembly, no label-dereferencing required, and the 'is equal to "hello"' test is cast as the result of some sign extends and a bitwise-xor.

Of course, one could argue that std::string_view::size() is statically available, but then my counter-argument is that C's zero-terminated strings are a massive pessimisation (which is why the compiler couldn't 'see' what we humans can), and should always be avoided.


eg 4:

   int foo(char const *s) {
     if (s[0] == 'h' && s[1] == 'e' && s[2] == 'l' && s[3] == 'l')
        return 1;
     return 0;
   }
This outputs 4 cmp instructions, even though I'd have thought 1 was sufficient. https://godbolt.org/z/hqMnbrnKe


`s[0] == 'h'` isn't sufficient to guarantee that `s[3]` can be accessed without a segfault, so the compiler is not allowed to perform this optimization.

If you use `&` instead of `&&` (so that all array elements are accessed unconditionally), the optimization will happen: https://godbolt.org/z/KjdT16Kfb

(also note you got the endianness wrong in your hand-optimized version)


> If you use `&` instead of `&&` (so that all array elements are accessed unconditionally), the optimization will happen

But then you're accessing four elements of a string that could have a strlen of less than 3. If the strlen is 1 then the short circuit case saves you because s[1] will be '\0' instead of 'e' and then you don't access elements past the end of the string. The "optimized" version is UB for short strings.


Yes, so that's why the compiler can't and doesn't emit the optimized version if you write the short circuited version - because it behaves differently for short strings.


UB doesn't exist in the processor (it does, but not here). If the compiler knows the pointer is aligned it can do the transformation.


For the compiler to know the pointer is aligned it would have to actually be aligned and there is no guarantee that it is.


This is fantastic, thanks! This is the approach I use in httpdito to detect the CRLFCRLF that terminates an HTTP/1.0 GET request, but I'm doing it in assembly.


Ooo, I'd never thought of using & like that. Interesting.

> (also note you got the endianness wrong in your hand-optimized version)

Doh :-)


Matt Godbolt's talk on ray tracers shows how effective that change can be. I think it was that talk, anyway.

https://www.youtube.com/watch?v=HG6c4Kwbv4I


good ol' short circuiting


If you want to tell the compiler not to worry about the possible buffer overrun then you can try `int foo(char const s[static 4])`. Or use `&` instead of `&&` to ensure that there is no short-circuiting, e.g. `if ((s[0] == 'h') & (s[1] == 'e') & (s[2] == 'l') & (s[3] == 'l'))` Either way, this then compiles down to a single 32-bit comparison.
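
Spelled out, the `static 4` variant looks like this (just the parent's example with the declarator changed; the caller now promises that s points to at least 4 readable chars):

    int foo(char const s[static 4]) {
      if (s[0] == 'h' && s[1] == 'e' && s[2] == 'l' && s[3] == 'l')
        return 1;
      return 0;
    }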

Interestingly, it is comparing against a different 32-bit value than `bar` does. I think this is because you accidentally got the order backwards in `bar`.

The code in `bar` is probably not a good idea on targets that don't like unaligned loads.


That's because the 1-instruction variant may read past the end of the array. Let's say s is a single null byte at 0x2000fff, for example (and that memory is only mapped through 0x2001000); the function as written is fine, but the optimized version may page fault.


Ah, yes, good point. I think this is a nice example of "I didn't notice I needed to tell the compiler a thing I know so it can optimize".


`s` may be null, and so the strlen may seg fault.


But that's undefined behavior, so the compiler is free to ignore that possibility.


> so the compiler is free to ignore that possibility

And that's what is wrong. This is the most unfriendly behavior towards the programmer.


Since the optimiser is allowed to assume you're not invoking UB, and strlen of null is UB, I don't believe that it would consider that case when optimising this function.


I understand that, but I don't agree that such optimizer behavior is worth it and I won't put it in my compilers.


I appreciate that greatly.


The notion is that, because it is undefined behavior, the compiler is free to replace it with anything up to and including "launch nuclear missiles". This is just nuts.

If I program it to cause a null pointer seg fault, I expect a null pointer seg fault. If I program it to cause a two's complement overflow, I want a two's complement overflow.


Yeah, I feel the same way. It's refreshing to hear that that's not just because I'm insane. I think C compiler teams are sort of forced into this stupid shit because they don't have new CPU architectures to port to anymore, so, unless they want to go find new jobs, they're forced to waste their time and everyone else's by "improving" the compilers by increasing performance in riskier and riskier ways.


FTA, "A Gaussian splat is essentially a bunch of blurry ellipsoids. Each one has a view-dependent color". Does that explain it?


While we're nit-picking the title, what does the "real-time" part mean? How would it be different if it wasn't real-time?

Dictionary.com defines "real-time" as "the actual time during which a process or event occurs", e.g. "along with much of the country, he watched events unfolding in real time on TV". Or in the domain of computing, "relating to a system in which input data is processed within milliseconds so that it is available virtually immediately as feedback to the process from which it is coming", e.g. a missile guidance system might have "real-time signal processing".

Neither definition works here. It seems like they took a sequence of pictures very quickly, and then, some time later, played them back at an enormously slowed-down rate.


The opposite of "real-time" in this context would be "sampling". It means that the capture represents the high-resolution time history of one particular event (one explosion) instead of fast and successively offset captures from as many events.


Or academics in 1986: https://dl.acm.org/doi/abs/10.1145/13310.13338

The idea of optimizations running at different stages in the build, with different visibility of the whole program, was discussed in 1979, but the world was so different back then that the discussion seems foreign. https://dl.acm.org/doi/pdf/10.1145/872732.806974


> I would have guessed that any kind of forest has quite a limited cap on how much carbon it could retain in dead wood

The article says, "We found that a forest that's developing toward old-growth condition is accruing more wood in the stream than is being lost through decomposition" and "The effect will continue in coming decades, Keeton said, because many mature New England forests are only about halfway through their long recovery from 19th- and 20th-century clearing for timber and agriculture".


Ah, I overlooked that they actually acknowledge the "cap" directly in the preceding paragraph, and even put it into a "coming decades" time frame. Makes much more sense now, thanks for the pointer!

Still a bit confused about the emphasis on wood deposits in "streams" – reportedly way more effective, but I'd guess with very limited capacity to really "lock" the mass – compared to regular humus – not that effective, but for a forest with a couple of centuries of growth ahead, I'd guess way more capacious. Good news either way, though!


Reading between the lines in the article (which is of course always subject to incorrect interpretation) I think the reason for the focus on streams is just that nobody else has looked at that before and thus it is a factor not previously accounted for. Other sources have already been accounted for - they may be worth more than what is in streams, but it is already known so the article didn't mention them.


“Coming decades” is an understatement. It depends on local conditions, but Douglas firs in the PNW take 200-300 years to decay completely, so that’s centuries more of carbon capture as long as we let our forests rewild. Realistically, a forest becomes old growth once there are at least three generations of trees in various states of decay. That may take decades in warmer climates but much longer in the north.


I skimmed https://en.wikipedia.org/wiki/Sinoatrial_node#Function. Here's my guess at what is going on:

In humans (and I guess many animals), the thing that controls the heart beat is a structure in the heart called the Sinoatrial node. Each cell in the SA node has an ability to generate its own rhythmic electrical impulse. I imagine that when one of these cells thaws out in a Wood frog, it immediately starts producing its rhythmic pulse. It has to get in sync with the rest of the cells in the Sinoatrial node before the heart will beat correctly, so the cells have a mechanism to communicate their rhythm with their neighbours. I guess each cycle, each cell adjusts its phase a little towards the average phase of its neighbours and thus a consensus will be reached.
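
As a toy illustration of that consensus mechanism (pure speculation on my part, not physiology): a handful of oscillators that each nudge their phase toward the circular mean of the group lock onto a shared rhythm within a few dozen cycles:

    #include <math.h>
    #include <stdio.h>

    #define PI 3.14159265358979323846
    #define N  8     /* number of pacemaker cells */
    #define K  0.3   /* coupling strength (arbitrary value) */

    int main(void) {
        double phase[N] = {0.1, 1.3, 2.9, 4.0, 5.1, 0.7, 3.3, 2.2};
        for (int step = 0; step < 200; step++) {
            double sx = 0, sy = 0;
            for (int i = 0; i < N; i++) { sx += cos(phase[i]); sy += sin(phase[i]); }
            double mean = atan2(sy, sx);       /* circular mean of all phases */
            for (int i = 0; i < N; i++)        /* nudge toward the mean, then tick */
                phase[i] += K * remainder(mean - phase[i], 2 * PI) + 0.2;
        }
        for (int i = 0; i < N; i++)  /* differences from cell 0 end up near zero */
            printf("cell %d: %+.4f\n", i, remainder(phase[i] - phase[0], 2 * PI));
        return 0;
    }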


Fireflies eventually manage to more or less sync up and they’re completely separate organisms - tiny cells with physical connections inside the body should be able to make it work.


Yep. When I walk into a room containing my dog and the poo it did on the carpet two hours earlier, it is sorry about what it did.


Pavlovian response. The dog associates pooing on the carpet with you punishing it/being upset with it.


> When I walk into a room containing my dog and the poo it did on the carpet two hours earlier, it is sorry about what it did.

This shows that the dog knows it did something you didn't want it to do, yes. I'm not sure it shows "thinking about ourselves, scrutinising and evaluating what we do and why we do it", which is how the article this discussion is about described "mental reflection". Our dogs have made messes in our house and have shown evidence that they know they're not supposed to, but I see no evidence that they have done any reflection on why they did it.

Of course "mental reflection" is not an all or nothing thing, obviously there is a continuum of possibilities. So a more precise phrasing of my question might be: is there any evidence that dogs can perform mental reflection at a point on that continuum anywhere near the point where humans do it? Or are they only capable of it at a point on the continuum much, much closer to the other end, the "no reflection at all" end?


What would a metric for measuring distance on this continuum look like?


I'm not sure since we don't have any good way of quantifying "amount of mental reflection". But that doesn't mean there isn't a large difference between dogs and humans in this regard.


Another example is Low Density Parity Check Codes [1]. Discovered in 1962 by Robert Gallager but abandoned and forgotten about for decades due to being computationally impractical. It looks like there was a 38 year gap in the literature until rediscovered by David MacKay [2].

The first mainstream use was in 2003. It is now used in WiFi, Ethernet and 5G.

[1] https://en.wikipedia.org/wiki/Low-density_parity-check_code

[2] https://scholar.google.com/scholar?q=%22low+density+parity+c...


An article about that from 2000. https://www.theregister.com/2000/04/17/playstation_2_exports...

Brilliantly it says:

Register readers with very long memories indeed will recall similar concerns being raised over Sir Clive Sinclair's ZX-81. The fear then was that the sneaky Sovs would try to buy heaps of ZX-81s for their Zilog Z80-A CPUs and mighty 1KB RAM to upgrade their nuclear missile guidance systems.

