The better error messages alone make it worthwhile to upgrade. The trend started in 3.10, and it already made a difference for me, my coworkers and students.
But remember: while it's great to play with it fresh out of the oven, and you might want to test your projects/libs against it, we should wait a bit before migrating production.
Indeed, every first release of a new major version of Python eventually reveals important bugs that get ironed out in a later patch. Also, some libs on PyPI may simply not be compatible with it yet, breaking your pip install.
I usually wait until the 3rd patch myself, after many years of paying the price of greedy upgrades.
Maybe the env setup I have in mind is different, but 1) lower envs might not see the same load or edge cases as prod, and 2) lower envs are supposed to be as similar as possible to prod. Is it a good idea to test things with Python 3.11 and run prod on a different Python version? It might even force you to use different versions of a lib in each env, which means no confidence for releases to prod based on lower-env tests. (One way around this would be to have two lower envs: one for the next Python version and another for the current one, as in prod. But #1 is my main issue.)
Once everything gets wheels/bumped, it'll be a lot easier. The last few major versions have been fairly straightforward to upgrade once they're all in place, and the nice thing is this should hopefully fix any remaining packages that aren't built for the arm64 Macs.
About this new asyncio.TaskGroup thing, I found this from Guido on the related GH issue*
> After some conversations with Yury, and encouraged by the SC's approval of PEP-654, I am proposing to add a new class, asyncio.TaskGroup, which introduces structured concurrency similar to nurseries in Trio.
I have never used Trio, but I have been told that its nurseries make it much easier to handle exceptions in asyncio tasks. Can someone more knowledgeable tell whether this will help? Looking at the docs*, this only seems to be a helper when you want to await several tasks at once, so I am not sure this changes much for exception handling.
As an empirical point, I moved from asyncio to Trio and it was transformative. This will help bring asyncio almost up to parity, but it's a pity that it's still possible to make tasks that don't belong to a task group - in Trio, the only way to start a task is to run it in a specified nursery. (But of course that's understandable for backwards compatibility.)
> this only seems to be a helper when you want to await several tasks at once
Sort of. It's a helper for when you want to run multiple tasks at once, not necessarily await them. And you're definitely running multiple tasks at once, otherwise you wouldn't be using asyncio in the first place.
Task groups do require you to wait for the tasks - after all, you have to start the task in a task group, and then implicitly await the tasks in it (by falling off the end of the task group context block). But you can always have an outer task group representing tasks that you intend to run indefinitely in the background. In that way, task groups force you to think about when a task would cancel other tasks, representing the overall structure of your program.
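For anyone who hasn't tried it yet, here's a minimal sketch of the 3.11 API (the fetch coroutine and its arguments are made up for illustration):

    import asyncio

    async def fetch(name: str, delay: float) -> str:
        # hypothetical workload standing in for real I/O
        await asyncio.sleep(delay)
        return name

    async def main() -> None:
        async with asyncio.TaskGroup() as tg:  # new in 3.11
            t1 = tg.create_task(fetch("a", 0.1))
            t2 = tg.create_task(fetch("b", 0.2))
            # leaving the block implicitly awaits every task; if one raises,
            # the siblings are cancelled and the failures are re-raised
            # together as an ExceptionGroup
        print(t1.result(), t2.result())

    asyncio.run(main())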
> This highlighting will occur for every frame in the traceback. For instance, if a similar error is part of a complex function call chain, the traceback would display the code associated to the current instruction in every frame:
Traceback (most recent call last):
File "test.py", line 14, in <module>
lel3(x)
^^^^^^^
File "test.py", line 12, in lel3
return lel2(x) / 23
^^^^^^^
File "test.py", line 9, in lel2
return 25 + lel(x) + lel(x)
^^^^^^
File "test.py", line 6, in lel
return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable
I sometimes sacrifice readability just because I hate creating variables. But then if it affects debugging times, my boss would be furious. As such, I use a full debugger anyway so I can trace quickly.
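To make that tradeoff concrete, a small sketch (hypothetical data) comparing chained access with intermediate variables; with 3.11's fine-grained carets the chained version is much less painful to debug than it used to be:

    x = {"z": {"x": None}}  # hypothetical nested data

    try:
        # chained access: 3.11 underlines the exact failing subscript in the traceback
        value = x["z"]["x"]["y"]["z"]
    except TypeError as exc:
        print("chained:", exc)

    try:
        # intermediate variables: the failing line alone tells you which lookup failed
        z = x["z"]
        zx = z["x"]
        zxy = zx["y"]
    except TypeError as exc:
        print("stepwise:", exc)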
What's your plan for redframes? It looks really new?
I'm the co-author of O'Reilly's High Performance Python so I'm always on the lookout for pandas alternatives. Are you looking at speed implications too? Bigger-than-RAM use cases? "Easier than pandas" only (which, of course, is totally huge in its own right)?
"Easier than pandas" is the most important consideration for right now! If pandas is "Jira", I consider redframes to be "Trello" (appropriate for most, but not all use cases).
I actually was inspired to build the library after teaching a one week intensive pandas course to a couple of Data Scientists @ a Fortune 500... (pandas is really hard for beginners!)
While performance and OOM aren't priorities right now, I'd love to one day replace the pandas "backend" with Arrow (or something else) once I nail the API :)
I interviewed author Ritchie Vink on my newsletter (NotANumber) some months back, he's smart and the library has a nice design.
I still barely know anyone trying it, it did get a write up just recently here: https://news.ycombinator.com/item?id=32998040
On one of my recent Higher Performance training courses a hedge fund person said they'd tried it with mixed success - some things faster than Pandas, others more expensive on RAM.
I'm hoping to have a play soon but have only lightly tinkered so far.
The parallel-by-design nature is nice, but I think the API is still evolving rapidly making it harder to develop with.
In addition to ways of reducing nogil's overhead, he added a lot of unrelated speed improvements so that Python without the GIL would still be faster, not slower, in single-threaded mode. They seem to have merged those performance patches first, which means that if they add his GIL-removal patches in, say, Python 3.12, it will still be substantially slower than 3.11, although faster than 3.10. I hope that doesn't stop them from removing the GIL (at least by default).
It's not a language feature, but I wanted to point out a new aspect of how Python is released: releases are now signed with Sigstore[1], producing a certificate and signature for each source distribution/build in the release table[2].
This is intended to supplement the current PGP signatures, giving Python distributors (and the packagers themselves) a similar degree of authenticity/identity without needing to perform PGP keyring maintenance.
For anyone browsing on Android and confused, the sigstore website has a major design issue hiding the menu button on some devices. You need to scroll the page to the right: https://github.com/sigstore/sigstore-website/issues/132
‘di said it, but to emphasize: with sigstore, there is no key management whatsoever. The keys in question are ephemeral and never leave your machine; the entire idea of the project is to bind an identity (like an email or GitHub username) to short-lived signing certificates.
So in addition to getting the file from python.org over HTTPS, you get a certificate attesting that sigstore automatically checked that it was released by the people who owned python.org? What security does this add?
PGP has a web-of-trust aspect, allowing people to trust people. What is the point of something doing automated verification of identity, on top of the one done for HTTPS certificate issuance?
The certificate specifically demonstrates that the <release-manager>@python.org identity signed the artifact.
So there's a) no long-lived private key for them to lose (because it's never stored after signing) and b) a consumer doesn't need to find the right PGP key ID, verify (somehow) that that key ID is associated with a given release manager -- they can just trust that the release manager is in control of their @python.org identity.
Additionally, with PGP, you have no idea if your private key is being used somewhere else to generate valid signatures maliciously. With Sigstore, in order for the signature to be valid, it must be published in a transparency log, which is continuously monitored. So in the event that the key/identity is compromised, the identity owner can be made aware immediately and the signature revoked.
Not the people who own python.org, but the CPython release maintainers.
They’re different groups of people, which points to one of the potential benefits of sigstore verification here: people who download CPython from python.org can now additionally verify that the artifact was not tampered with on the server. They can, furthermore, mirror the artifact and its signing material on their own. In short, TLS provides delivery authenticity while sigstore provides publisher authenticity.
Sigstore has no idea who the publishers should be, they do automated verification based on a domain or GitHub account. The person owning the GitHub org might be some admin, just like with the domain. The automated process doesn't know and doesn't care.
That is where the concept of policies come in: an actual deployment of sigstore for verification purposes would establish the artifact’s identities via another mechanism. That mechanism is going to depend on the sigstore deployment: in the context of most package indices, for example, it probably makes sense to have the verification map to the index’s account metadata, package metadata, etc. That could, in turn, be made into a public commitment via a TUF repository.
However, that complexity does not apply to simple cases like the CPython one: for this case, you can verify that the identity matches one of the public email identities of the CPython release team. This is no more complex than PGP identity verification, and is much more resilient (since anybody can publish a claimant key for an identity in PGP).
You're comparing Sigstore with the log and assuming ultimate trust in their verification process and CA to GPG without the web-of-trust. This is obviously apples-to-oranges.
You also linked to this random python.org HTTPS page which contains the list of people you are supposed to expect to have signed the Python releases. If this is the root of trust... it might as well have had PGP fingerprints.
The truth is that you login with OIDC and Sigstore signs your artifacts, giving you an attestation that the owner of that email/GitHub/... identity made that artifact, and publishes that to a persistent log. This makes the whole thing great for automation (both of publishing and verifying), but claiming that this adds to security is false.
Their Security Model page clearly outlines the limits of their system and is consistent with my characterization https://docs.sigstore.dev/security/:
> If an OIDC identity or OIDC provider is compromised, Fulcio might issue unauthorized certificates
> If Fulcio is compromised, it might issue unauthorized certificates
You have to trust OIDC providers, you have to trust the CA, and the presence of logs only allows those people to notice unauthorized issuance, not end users.
I think we're going in circles, because I've identified specific ways in which this adds to the security model that self-custodial PGP keys cannot:
* Sigstore uses short-lived keys and short-lived certificates, eliminating an entire common risk class where maintainers accidentally disclose their signing keys. This property alone eliminates the single largest source of illegitimate signing events in ecosystems like Windows software.
* The logs in question are public CT logs. In other words: anybody can audit them for unauthorized issuance, including the legitimate publishing identity. It's not particularly useful for the end (installing) user to audit the log, but it was never claimed that they would find it useful.
For the specific case of CPython, you're missing the point: CPython is an easy case, since the email identities of the release managers are well-known facts that can be cross-checked across python.org, GitHub, etc. Python.org is not currently a root of trust for sigstore, but it is for PGP (again, because anybody can claim an identity in PGP).
There are, of course, limitations. But these limitations are no worse than trusting the CA and IdP ecosystems that you're already trusting, which makes them strictly better than mystery-meat PGP keys.
You're just replacing a "mystery meat" PGP key with a "mystery meat" email address or OIDC handle. As you point out, committing to one of those can easily be done by posting it on python.org, GitHub, etc with the major difference that PGP fingerprints are cryptographically tied to an identity rather than require a third-party like Sigstore to attest that the person had control of it at some point in the past.
It is also much more likely that someone managed to click one link in a developer's inbox once to complete the automated Sigstore verification, rather than they managed to steal their PGP keyring and passphrase.
I am not a fan of having to trust in developer's key-management abilities but this just shifts the problem very slightly, at significant cost.
The single advantage is obvious: this allows easy automated signing and verification, allowing enterprises to easily check boxes in their supply-chain-security checklist. This is valuable in itself, and I am all for automation, but I don't know why we have to claim that it is "more secure".
> PGP fingerprints are cryptographically tied to an identity rather than require a third-party like Sigstore to attest that the person had control of it at some point in the past.
A PGP fingerprint is tied to a PGP key, which is tied to a claimed identity. Anybody can claim to be you, me, or the President of the United States in the PGP ecosystem. Some keyservers will "verify" email-looking identities by doing a clickback challenge, but that's neither standard nor common.
In theory, you trust PGP identities because of the Web of Trust: you trust Bob and Bob trusts Sue, so you trust Sue. But it turns out nobody actually uses that, because it's (1) unergonomic and doesn't handle any of the normal failure cases that happen when codesigning (like rotation), and (2) it's been dead because of network abuse for years anyways[1].
> It is also much more likely that someone managed to click one link in a developer's inbox once to complete the automated Sigstore verification, rather than they managed to steal their PGP keyring and passphrase.
That's not how Sigstore does email identity verification; it uses a standard interactive OAuth flow. Those aren't flawless, but they're significantly better than a secret URL and fundamentally avoid the problem of secure key storage. Which, again, is actually where most codesigning failures occur.
OAuth flow is even worse, if you find someone's browser open and click the link, it will complete as long as they are currently logged into GitHub/Gmail/whatever provider. I am not claiming that key management is easy or foolproof, but when this is what we're comparing to...
And again, you don't have to use web-of-trust. It is there, which is an advantage. If you don't/can't use that, you can find a PGP fingerprint on a random HTTPS page, which will be just as easy to copy-paste as the list of email addresses you showed me a couple posts up... with the advantage that I can use them for verification directly, rather than involving third-party authorities.
> OAuth flow is even worse, if you find someone's browser open and click the link, it will complete as long as they are currently logged into GitHub/Gmail/whatever provider. I am not claiming that key management is easy or foolproof, but when this is what we're comparing to...
And the same can be said for PGP keyholders. There are very, very few threat models in which an open, logged-in computer is not a "game over" scenario (which is also why most password managers and authentication agents don't consider it a case worth guarding against). In other words: Sigstore is no worse than PGP key management in this manner, but is better in the other ways that matter.
Looking up PGP fingerprints on random HTTPS pages is not a scalable or ergonomic solution, and not one that has ever succeeded. Remember: that is the status quo with both CPython and Python package distribution, and there is no evidence that either has gained any meaningful amount of adoption (either by packages or end users). The goal here is to enable users to sign packages without doing the things they've demonstrated they won't do.
(Also, we've focused on email identities. A separate goal is to allow GitHub Actions identities, which will require no interaction from a user's browser and has a threat model coextensive with the CI environment that many Python packages are already using to build and publish their distributions.)
> with the advantage that I can use them for verification directly, rather than involving third-party authorities.
I'm not sure what you mean by "third-party authorities" here. As a verifier, your operations can be entirely offline: you're verifying that the file, its signature, and certificates are consistent, that their claims are what you expect, and (optionally) that the entry has been included in the CT log. That latter part is the only online part, and it's optional (since you can opt for a weaker SET verification, demonstrating an inclusion promise).
> PGP has a web-of-trust aspect, allowing people to trust people
Must be great for the 5 people on the planet that maintain a personal web of trust with PGP while the rest of us just run whatever "curl|gpg --import" command the download page tells us to run, thus adding zero security on top of https.
The alternative is this, you get a nifty certificate from Sigstore that allows you to be sure that "somerando@outlook.com" has indeed authored the lib you depend on. How do you check if that's what you want? How much security do you get from knowing that the email is definitely correct for a person you know nothing about?
Anyone can generate a hash. Signing with a private key means that only the owner of that private key was able to generate a given signature. Signing with a private key which was bound to a known identity via a signing certificate proves that only that identity was in possession of the signing key, during a very short window.
In this specific case, we can say that the artifacts in question were verifiably signed by owner of the <release-manager>@python.org identity.
That's the point of the certificate authority and transparency log: it makes the public key known by publicly binding it to a verifiable identity.
A hash verifies integrity, but has no way to demonstrate any relationship to a signing identity. Signing is not just about integrity, but also being able to say _who_ generated the signature.
* You generate a keypair and use it to sign your Python installer (SP)
* Authority creates a signature of your public key (SA) and puts the whole thing in transparency logs (SP + SA)
* End users can check SA because they trust authority, can check SA is in transparency logs, therefore can trust your signature SP, therefore can trust your software
Why not just use a hash instead:
* You generate a hash of your Python installer (HP)
* Authority creates a signature of your hash (SA) and puts it in transparency logs (HP + SA)
* End users can check SA because they trust authority, can check SA is in transparency logs, therefore can trust your hash, therefore can trust your software
What matters is that ultimately the contents of your software are signed by the Authority and a commitment of that is in logs. Why add this level of keys, that can't possibly be trusted since they are ephemeral?
This proposed scheme lacks individual publishing identities, which are desirable. The goal here isn't just to churn out random signatures; it's to be able to associate specific identities (such as a person, GitHub repository, or specific CI workflow) with the production of an artifact.
It also requires the user to trust the signer to do proper secret generation, which weakens the scheme. With Sigstore, the entire CA and CT infrastructure can fail or be compromised, but the certificates (and the ephemeral keys that they bind to) remain sound. That too is desirable, which is why the TLS PKI ecosystem is the way it is.
Edit: To be clear, the PGP equivalent for your scheme would be "trust Joe Public to sign for everyone on PyPI, he's reliable." If you can see why that doesn't work, you should also be able to see why your alternative to Sigstore won't work.
The certificates remain sound? What does that mean? Those certificates are never to be reused (ephemeral). Do you mean the signature remains valid and secure?
And whether the CA signs "identity information + public key" or "identity information + software hash", I don't see the difference in "identities", no matter what that means to you.
Please, give a concrete example of information that is available/verifiable in one scheme and not the other. You both keep saying vague things like "it lacks identities" or "there's a binding" etc and I really don't see it.
The idea behind sigstore is to enable an end user (which, for CPython, might be someone who intends to build it from source for inclusion in a package manager) to verify that the artifact is the same one produced by a trusted entity, not just the same one downloaded from a server. A strong cryptographic hash only provides the latter.
The keypair is bound to a public identity via OIDC. That’s where the authenticity comes from; modulo a breach in GitHub’s IdP, you can be confident that the keypair corresponds to a signing action performed by the GitHub identity (either username or repository).
> The keypair is bound to a public identity via OIDC
It is not "bound" to anything cryptographically. Sigstore checks that you own the OIDC account, and if yes, it signs your public key and puts it in the log. Why not just sign your software's hash and put it in the log, "binding" it as you say?
This is true, but only because a cryptographic binding to the OIDC JWT would be meaningless. Fulcio could conceivably hash the JWT and add it as another certificate extension, but I don't see why it would (since nobody is expected to "burn" the JWT by publishing it after expiry).
> Why not just sign your software's hash and put it in the log, "binding" it as you say?
That's exactly what it's doing. Is the objection you have solely to the fact that it can be done with short-lived keys?
It’s very close to the existing PKI ecosystem for TLS: the CA is presented a possession proof for the locally held private key, and mints a signing certificate for it.
There is no singular “root certificate”: there’s a trust root for the CA, a separate root for the transparency log, etc.
Nope. The private key is generated within the client each time a signing event occurs, and that's what is used to sign the artifact. It doesn't come from the certificate.
The certificate just binds the public key to the identity at a given point in time, in a public way. This certificate is generated every time you sign something, and is put in the transparency log.
Exciting release. All useful additions. Love the variadic generics (embedding array layout into the type to avoid confusion). A surprisingly common issue in data science code.
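For anyone who hasn't seen PEP 646 yet, a minimal sketch of what that looks like (the axis names here are made up):

    from typing import Generic, TypeVarTuple, Unpack

    Shape = TypeVarTuple("Shape")

    class Array(Generic[Unpack[Shape]]):
        """Toy array whose axes live in its type, per PEP 646."""

    class Height: ...
    class Width: ...

    def transpose(a: Array[Height, Width]) -> Array[Width, Height]:
        ...  # a type checker can now flag code that mixes up the axes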
But... am I the only one who struggles to parse the exception groups?
Would it not have been better to left or right align the exception group id? Centering them just clobbers them with the actual error output and makes it a bit hard to parse.
That output looks super complicated, but if you get an error like that then I think you're in a super complicated situation to start with: you've started a hierarchy of tasks, of which 6 raised exceptions (only counting leaf-node exceptions) at 4 different levels of the hierarchy. I could believe that left aligning the exception group index could've made it a little simpler though.
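For readers who haven't met exception groups yet, a minimal flat sketch of the object behind that kind of output and the new except* syntax that consumes it (real task trees produce the nested variety; this hand-built group just stands in for one):

    # a hand-built group, standing in for what a failing task tree would raise
    eg = ExceptionGroup(
        "demo",
        [ValueError("bad input"), OSError("disk"), OSError("network")],
    )

    try:
        raise eg
    except* ValueError as g:
        # each except* clause receives a sub-group holding only its matches,
        # and several clauses can run for the same raised group
        print("value errors:", [str(e) for e in g.exceptions])
    except* OSError as g:
        print("os errors:", [str(e) for e in g.exceptions])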
If you're noticing that the numbers that form a list are right below each other in the same column, it kind of makes sense. Suddenly it seems a lot more ordered. Could be done differently though. Left alignment seems clearer:
I have dropped flake8 everywhere due to how hostile it has become to the rest of the python ecosystem. They pull a lot of nonsense like this, such as a refusal to fix their dependency pinning with no logical reasoning.
Besides… Between Black / Tan for cosmetic issues and Mypy / Pylance / Pyright for logical issues, flake8 has never caught a concrete problem in my codebase and has solely been a source of things to disable or work around.
I find Pylint to be great, catches a lot, integrates well enough into pyproject, and the new standalone vscode extension is solid. If only I didn't have to restart the Pylint server every time I update a signature...
My key problem with pylint - and the main reason why I have been using flake8 - is its terrible defaults; it takes a huge amount of configuration to get a sane set of reports. But I might need to look into it again...
I've never tried adding it to a large project, but in starting with it early I'm continually impressed with what it finds. "Unnecessary comprehension in any/all" was the latest surprise.
I have just three configs:
- ignore TODOs (but only in pre-commit so I still get the IDE squiggles)
I think Python 3.11 has effectively killed off both PyPy and Pyston. Now that the CPython team has finally shown both willingness and ability to deal with performance problems, few people are going to fool around with some esoteric version of Python for an increasingly questionable performance-gains/headache ratio. Especially given how painful it already is to package and deploy normal Python code and how hostile Guido has always been to alternative implementations. I don't think being maybe 2x faster right now is anywhere near good enough to justify the additional risks and hassle, and it looks like the performance gap might shrink further with 3.12.
Pyston may be considered esoteric, but PyPy is pretty well-established already and is still a good deal faster. It could be that CPython starts to eat into its user base as it accumulates more performance gains, but PyPy is definitely not done yet.
Pyston is already at the "our incredible journey" stage ("We’re very excited about these changes and [...] Marius and I (Kevin) are planning to gradually reduce our time investment in the project").
Pypy, as a practical software deployment runtime has been and will remain esoteric (I do absolutely think that they had a positive impact on the wider python community, both in terms of dissemination of ideas and also practical engineering artifacts). But what's their market share relative to CPython? A thousandth of a percent or less? Has anyone actually built a significant business on top of Pypy?
There is, IMO, no realistic path at this point for PyPy to become a viable CPython alternative. They are effectively competing with a hostile platform they need to maintain an extremely high degree of compatibility with, and one that can and does move in directions that invalidate some of the fundamental engineering choices they make. Practical end results include being stuck with a lot of crippling design decisions (GIL, FFI API, etc.), a core part that is still in 2.7 land (RPython), and a history of mostly only being compatible with very outdated versions of Python. This has improved a lot, but the next time CPython throws another curve ball, the same thing is bound to happen again.
The only chance they really had was to be compellingly enough faster or otherwise superior that community pressure would have forced the CPython team into adopting a much more collaborative stance. That seems very unlikely to happen, now after all this time, given that now CPython is catching up and they have the albatross of the C extension ecosystem around their neck. Almost anyone who cares about python performance outside of algorithmic programming competitions will be using C extensions where Pypy offers no compelling advantage, some disadvantages, and by the momentum of the existing eco-system is unable to develop a superior alternative.
I just didn't think PyPy belonged in the same category as Pyston. There have been a few similar projects - Pyjion, Unladen Swallow and Nuitka also come to mind - each of them an impressive piece of engineering, worthy of our admiration. But each of them also seems to have hit a wall of some kind and fallen by the wayside, as you said Pyston has. Meanwhile PyPy has stuck around for fifteen years; it may not be CPython, but it'll still be around for a while yet, I imagine.
At least among the scientific community, PyPy's FFI limitations are the main stumbling block. I'm not sure the other factors dissuade much from its adoption in that community (people have just started migrating to 3.6 from 2.7).
I have been professionally programming Python since, I guess, 2012, and PyPy is an interesting one. PyPy seems to be used mostly in fairly specialized Python applications, like research code that is 'too big to rewrite'; there are some legacy Python applications I've heard of that have run successfully on PyPy for years, and you can get professional support for onboarding Python programs onto PyPy. So PyPy is often used in software that cannot continue to be operated on CPython for performance reasons and where rewriting is not feasible / desirable.
Fond memories: I did use pypy or a predecessor in like 2004 I think when I took part in a student computer science competition and my algorithm searching for subgraphs wasn't performing well enough to terminate in time.
Ha funny, we have both (ab)used Pypy in exactly the same way :) I had an Advent of Code solution that I implemented horribly in Python, I knew it would spit out the right answer eventually but it was just taking too long. Rather than do a proper rewrite I figured I'd at least try it with Pypy just to see if I could be lazy, and sure enough I got my answer fairly quickly :D
Incidentally, I was working on a 15 year old Python 2 project for a client last week that used Psyco (the predecessor of PyPy). The cool thing about Psyco was that you could just import the library and it would make many operations much faster.
If PyPy had a similar mode where you could load it as a library it would have a much easier time gaining traction.
When you're dealing with one of the most popular languages in the world and the cost of getting it wrong and breaking things is astronomical, being risk averse and starting small seems like the way to go.
Hasn't Microsoft experimented with disabling the JIT for security reasons?[1] Doesn't have much relevance for Python at the moment, but I'm just mentioning it to underline that a modern JIT can be very complex thing and keeping it simple for 3.11 seems like a very reasonable requirement.
JavaScript has multiple companies with valuations measured in billions of USD competing. Optimizing JavaScript is just as hard but they can throw massive development teams at the problem. Python isn’t without resources but the level of scale is dramatically different.
Yeah, I thought about that when I was writing that but didn’t want to get into debates about what fraction of those resources they put into web stuff. Even if you ignore the big ones however, Mozilla is just web stuff and they’re still much bigger than the core Python team.
(Mozilla does support a ton of OSS, too, so read that as everyone else needing to step up rather than an attack on them)
Exactly: V8 benefits almost everyone using JavaScript, TF doesn’t benefit anyone not doing ML with that stack.
All I’m saying is that asking why JS performs faster than Python in some cases is less about the languages and more what you could do with hundreds of engineers working for years. If the stars had aligned differently and that effort had gone into Python (or Ruby, etc.) I’d expect a similar delta.
Because those investing into machine learning frameworks in Python rather spend their resources rewriting their libraries in C, C++ and Fortran while calling it "Python".
Python's ease of calling C libraries is both a blessing and a curse.
My opinion is that it's mostly a curse, since you are hampering the language's evolution and growth for a mere temporary benefit.
Not to mention the horror of distributing compiled libraries, which is one of the biggest reasons why packaging in Python is still such a nightmare.
Making CPython faster by getting rid of the GIL will do wonders for this language and its community. It will make it much more portable, too. Think of Java-level portability, but in a much nicer package.
Sure, my point was just that the amount of corporate investment in the core Python language is smaller. Investing in an ML framework isn’t a bad thing, but it doesn’t support anyone working on the core language the way Google, Apple, Microsoft, Samsung, etc. contribute directly to Chromium/WebKit development.
I’ve heard this “no C API” thing echoed by a couple people and it’s baffling. Do folks really think three major JS engines all written in C++ wouldn’t have an interface to interact with C?
The problem is evolving the interpreter in a way that doesn't break bazillions of third party bindings -- Win32, OpenGL, Fortran for NumPy/Pandas, databases, GPUs, every C and C++ library ever, etc.
Python's C API exposes ref counting and the GIL. It's also very large
JS doesn't have that problem -- more code is written in pure JS, there are no C/C++ bindings in the browser.
There are C/C++ bindings in node.js for v8, but as far as I know they are discouraged and not used very much. The bindings are more "first party" in node.js than third party.
They have issues, but not the same ones as CPython, because the API is very different
JS VMs must be re-entrant because they're embedded in browsers. That was never the case for CPython
This is unfortunately a bit misinformed. You are thinking of a specific JavaScript implementation (browsers). The language has heavy adoption in the native space. Check out Node.js!
JavaScript does interact with random C libraries :)
Numba focuses on scientific use cases, speeding up most of numpy and some of scipy. Whilst it can compile some parts of pure Python (eg numeric array loops), generally you have to copy the data into the Numba side, which can be slow at volume.
Numba isn't really meant for pure Python speed-ups, it gets rid of numpy specific inefficiencies.
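For context, a minimal sketch of the kind of numeric loop Numba is aimed at (assumes numba and numpy are installed; the function itself is made up):

    import numpy as np
    from numba import njit

    @njit  # compiled to machine code on the first call
    def sum_of_squares(a):
        total = 0.0
        for i in range(a.shape[0]):  # a plain Python loop, but JIT-compiled
            total += a[i] * a[i]
        return total

    data = np.random.rand(1_000_000)
    print(sum_of_squares(data))  # later calls skip compilation and the bytecode interpreter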
Personally I'm pretty excited to see a fresh JIT working its way into core CPython (I'm the co-author of O'Reilly's High Performance Python).
The low-hanging fruit is to implement something like a built-in Cython without any type annotations. You get a speedup from not having bytecode but still have all the PyObject attribute lookups performed from native code.
Are there any plans to change Python the language to make it faster? AFAIK most of the slowness comes from dynamic overhead, like object attributes may change or disappear in the middle of a loop and so on.
The hardest part about optimizing Python is the fact that there are so many packages out there relying on every single documented and undocumented aspect of the C/C++ interface, which is incredibly broad and directly tied to the internals of the language. If they change anything in a non-backwards compatible manner many of those packages will break, and those packages are the reason so many people use Python to begin with.
Javascript never had this problem, as all code was always written in Javascript itself by necessity, so it was far easier to optimize as you did not have to worry about backwards compatibility of the internals.
That's an excellent point. That ecosystem is also one of the main reasons packaging in Python is a bit of a clusterf*ck.
For C/C++ extensions, I think there may be hope of supporting a slow/emulating C API and a faster, less internals-leaking new API that extensions could adopt. It would take years for the migration, but if the speed gains were, say, 5x, I think it could be realistic.
pypy managed to emulate the C API fairly well, after all. E.g. you can build numpy and pandas on top of pypy and it actually kinda works.
Our sloppy container spec bit us today though. We had
FROM python:3-slim
with a bunch of pip requirements following. Some of those were not 3.11 ready, eg scipy==1.8.0, and our build broke. Our answer was to not be sloppy and pin until everything catches up, eg
FROM python:3.10.8-slim
and we're good. Hope someone sees this that needs reminding.
Good question and thank you for raising that possibility. I am ignorant here. Of course I'd prefer patches applied asap but...
We are almost daily discovering upstream changes like this one that break something N components removed, so our knee-jerk response is usually to pin aggressively when found and periodically upgrade deps for a whole component.
What are the chances I have some dep somewhere that says python<=3.10.8 and is working today, but will break when that 3.10-slim spec lets 3.10.8 turn into 3.10.9? That's what happened today with scipy, but on the second digit rather than the third, because we had started with 3-slim.
A related note is for any requirements files. Something like this bit me the other day.
Libraryname >=3.1
After a few years, the package was updated substantially and has lots of breaking changes in the recent branch. The fix was to pin ==3.1 until we work out the next step.
I was bitten once a few years ago and have always pinned everything since then.
And of course last year I was bitten by a ~ in a package.json that I hadn't got around to pinning in a code base I'd inherited.
Not a great idea, IMHO. Lots of package maintainers are not at all interested in explicitly supporting unreleased versions of Python. I would certainly be one of them.
That said, Python releases generally don't introduce syntax-breaking changes. APIs are sometimes deprecated, but these have large windows that give maintainers plenty of advance notice. Years, typically. Practically all 3.10 code should run fine on 3.11, including your pyarrow.
Another thing to consider: the Python community is historically really bad at updating to the "latest and greatest". This [1] resource suggests that 80% of Python versions in the wild are Python 3.6-3.8!!
I think this site is specifically keying in on the "Programming Language :: Python :: 3.11" PyPI classifier... While many packages haven't been updated to include this classifier, if you were to run all of the unit tests for all of the "Top 360" libraries on 3.11, I'd be willing to bet 350+ would pass without failure.
Many of the gray projects listed never even specify the minor number. Looking at setuptools, they only list Python :: 3.
If you go back to https://pyreadiness.org/3.8/ there are 20% of packages not explicitly supported, many from the top-100 list; anyone using those packages knows this is not a concern. (Going back even further in versions, you start to see dropped support instead.)
Deducting these 20% should give a more accurate picture, and even then I’d bet most other packages work anyway, like you say.
> Big ask, but can we wait to release until X number of packages are supported before releasing? And stay in RC until then?
No, that’s silly.
(1) There’s no reason for something that otherwise is ready for stable to stay in RC because an arbitrary number of external packages aren’t ready to say they are ready for it, and
(2) No user actually benefits from X number of packages being ready; they benefit from the specific packages they depend on being ready. So it’s best for them to track those, and upgrade only when all the specific packages they need support the new version.
This seems like a nice release. It would be even nicer if they summarized the major new features with some examples and short descriptions instead of linking to PEPs and GitHub issues. I don't think many people have the time to read all of those to get a gist of the new features and changes.
I hate that both the first (3.11.0) and the last (3.11.9) releases of a minor version branch are called "final".
I get it, 3.11.0 is "final" in the sense of "definitive" from the development team's point of view, the final one of the pre-releases. But 3.11.9 is also called "the ninth and final 3.11 bugfix update" in the schedule [1], the actual final one from the maintenance team's point of view, in the sense there will be no more.
Can't we find better terms, that work for everyone? 3.11.0 stable? 3.11.0 actual? For anyone but the dev team, this is in no way a "final" release, this is the "first" release.
Maybe "3.11.0 public" or "3.11.0 release" would be better suited to distinguish between the various dev-builds and the "final" release of the minor version
That the stdlib added support for parsing TOML but not for serializing to it (at least that was the case I read about when 3.11 was upcoming) sounds really short-sighted - couldn't they just add that too?
There was a long debate about this, and the conclusion was:
There is one way to read toml, but a lot of ways to write it (do you preserve formatting, what do you do with comments, do you allow partial updates...). Therefore reading is easy to get right, but writing less so, and once it's in the stdlib, we can't make quick changes to the API.
Since reading is already very useful, and the tomli author is ready to provide that for free, let's include it now and look at writing later.
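For reference, a minimal sketch of the read-only API that did land in 3.11 (the file path here is just an example):

    import tomllib  # new in 3.11; read-only by design

    # tomllib.load expects a file opened in binary mode
    with open("pyproject.toml", "rb") as f:
        config = tomllib.load(f)

    # tomllib.loads parses a string
    doc = tomllib.loads('[tool.demo]\nname = "example"\n')
    print(doc["tool"]["demo"]["name"])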
>There is one way to read toml, but a lot of ways to write it (do you preserve formatting, what do you do with comments, do you allow partial updates..
Then don't use TOML for internal stuff?
Or just settle on one way your lib writes it; if every parser out there can read it anyway and gets the same deserialized structure, it doesn't matter exactly which way you write it...
>As for it was a good idea, this has already been debated and acted upon, there is no need to repeat exactly the same arguments.
Well, the same debates happened over many other decisions debated, acted upon, and later regretted and reverted. At some point the GIL seemed like a good decision too!
The GIL is a good decision and has been a sound engineering tradeoff for decades. It's only been the past few years that running the interpreter without a GIL has been even remotely feasible (cf the gilectomy project). The GIL is a relatively straightforward way to hold back a tremendous amount of complexity.
Similarly, TOML writing is a ton of complexity. The org isn't fundamentally opposed to adding TOML writing to the stdlib; they just aren't rushing it and want to hammer out all of the grossness first.
Obviously the complexity level is different, but to me you sound like someone complaining about Linux adding support for a new filesystem, but read-only at first. Why all the hostility?
I have plenty of cases with yaml where I care about the formatting of the serialized output. I have to use obscure hooks into pyyaml's inner workings to tweak things just right. In some cases, I end up having to do some of the serialization myself. While technically it doesn't matter because it's all equivalent once in memory, there are externalized costs once you're integrated into a large system. The whole point of using something like yaml or toml is to be human friendly.
Parsing TOML is clearly useful, but the case for serialization is much less clear.
It's meant to be human-writable, and offers multiple ways to express the same table data; what format should a TOML serializer use?
I can see that you might want to programmatically edit a TOML file, preserving the rest of its layout unchanged. That's a bit fiddly to get right, and needs a different interface from a pure serializer. If the correct design isn't obvious, better to leave it out for now. It could still be added later.
>Parsing TOML is clearly useful, but the case for serialization is much less clear.
I never met any format where parsing it is useful but serializing to it is not. Maybe some binary format where I only care for e.g. playing a video or decoding an image. For a configuration format though?
The obvious use case is reading, altering AND writing back.
>It's meant to be human-writable, and offers multiple ways to express the same table data; what format should a TOML serializer use?
It should just pick one and stick with it?
If N versions of the format express the same data (deserialize into the same structure) then it doesn't really matter from a functional way which they pick.
Users would still need to pick a TOML lib to write data - so they're already OK with that lib picking a specific way. Why wouldn't they be OK with Python doing it?
If they don't want anybody to mess with their hand-written TOML files, they can always just not write them back from Python.
That was how I interpreted this (but I didn’t say “in general”):
If N versions of the format express the same data (deserialize into the same structure) then it doesn't really matter from a functional way which they pick.
To clarify my point, if you’re using TOML, presumably you want to be able to format things nicely by hand because that’s the whole point of TOML. If you don’t need to do that, just use JSON. So I don’t see a whole lot of value in a tool for serialising data into boilerplate TOML (or YAML, etc).
>presumably you want to be able to format things nicely by hand because that’s the whole point of TOML.
For me the whole point of TOML is as a stricter, saner, YAML-type (human readable that is) config first, and a format used in several places, including upcoming Python standards, second.
Couldn't care less about formatting things nicely by hand.
> The obvious use case is reading, altering AND writing back
That’s actually a hard case (if what you are reading is human-edited input that you might transform but expect to remain human-edited), because then you want to keep the entered format as much as possible while accommodating changes. That is easy to evaluate subjectively but seems potentially hard to express mechanically in a way which would provide generally optimal UX. You don’t want to hand-edit config, also have a UI, and end up with an “I touched it with the UI and made a couple small changes out of dozens of entries and now it's unrecognizable”.
If you spend a few minutes thinking about practical implications of a piece of code parsing toml and writing it back with stylistic changes, I'm sure you'll find many such situations.
I covered that though: this would be the case for ANY existing TOML serializing library. People already use those, and are OK with them picking a specific output style. So, they could use the functionality in the stdlib too.
Oh look, TOML support. It feels good to see better support for that, as I had to argue with managers and the like to use TOML and to convince people that it was one of the ways forward wayyyyy back in the middle of covid.
Let‘s hope that Python in itself does not become a Kerr singularity. I don’t want to become an observer of this. And thinking of the image of a snake that eats its tail (forming a ring) makes me even fearful that there is some hidden message in this part of the release notes…
Python has a 12 month release schedule[0], so 3.14.0 should be released in October 2025. Of the Python 3 versions, only 3.6 and 3.7 got to .15, about 4 and 5 years after their releases[1]. But 3.9.14 was released last month, less than two years after its initial release. So somewhere between 2027 and 2030?
I’ve settled for pyenv and poetry for my projects.
As a former Homebrew maintainer, allow me to stress another thing:
Don’t use your system’s (or even your system-level package manager’s) Python environment for your Python projects.
That Python environment isn’t for you. It exists primarily for one single reason: to make other packages work that happen to depend on Python.
The same goes for Node.js, Ruby and other fast-evolving platforms.
Now, if you _do_ use that environment, the packaging police isn’t exactly coming for you. Just be aware that maintainers are free to version-bump or even remove the environment at any time without notice. That’s why you’re going to be happier and safer if you use *env- (pyenv, nodenv, …) managed installations.
This is a relatively new and dangerous way of thinking about operating systems. Due to the massive future shock of libs rapidly changing underfoot, people have had to switch to containerization as a mitigation. And now people have been containerizing so long they are starting to believe it is the proper way to do things.
It's not. Giving up the idea of an operating system with system libraries is very bad. The idea that you have to set up an entirely new lib environment to run every single script is absurd, and it has dire consequences for software longevity and portability. With no OS as a common base, a system is fractured into literally innumerable possibilities. Gone is the idea of a distro being the same for everyone using it. Gone is the ability to just install things. And we're left with a pile of containers that make debugging things when they go wrong even harder.
Nix is taking this concept to the extreme and absurd, but using pyenv for every script is almost as bad. pyenv is not a version-management approach to Python. It's a band-aid that doesn't address the actual issue.
> This is a relatively new and dangerous way of thinking about operating systems.
One could as well say: this is a response to faster-than-ever evolving platforms.
> Giving up the idea of an operating system with system libraries is very bad.
Good point. I don’t like the situation either.
I’m not sure there’s a good remedy. For example, how are system package maintainers supposed to know that your personal script is now ready to migrate, so they can finally bump system Python?
> using pyenv for every script is almost as bad. pyenv is not a version-management approach to Python. It's a band-aid that doesn't address the actual issue.
Not sure if I’m understanding you correctly here. You’re saying that using pyenv for every script is bad. Are you against venvs, too? Because you could apply a similar argument to those.
Firstly, this is about the core platform, not third-party libraries.
Secondly, I absolutely do keep all my projects in lock-step with the latest stable platform version.
My point is that I do that deliberately and in a controlled way, decoupled from my system package manager’s decisions.
I certainly don’t appreciate waking up to dozens of unexpected compile errors and warnings due to some random system Python version bump.
> Don’t use your system’s (or even your system-level package manager’s) Python environment for your Python projects.
I get the reasoning but I disagree with this as a blanket statement.
For my own stuff, I stick with certain OS versions that I know and trust. If I standardize on Ubuntu 22.04 for example, I know that it will only ever ship Python 3.10 (plus patch releases). If I ever need another version, that's what the deadsnakes repositories are for. Ubuntu is never going to spontaneously upgrade the `python3` package to 3.11, especially not in an LTS release.
I understand the situation is different on Mac (and possibly Windows?) as they have a history of shipping outdated (and/or broken) Python environments and any system upgrade has the potential to bump your Python version.
Python's virtual environments may be somewhat clunky but it's entirely possible (and not even that hard) to keep project libraries and dependencies completely isolated from the system's with the various venv tools that exist these days.
At work, we have a large in-house ecosystem of scripts, modules, and packages written in Python so it's easier to tell developers, "pyenv is our supported Python environment, use anything else at your own risk."
You’re right of course; I’ve never really used Ubuntu, so I’ve never come to enjoy that level of stability.
I used to have a Mac, which is shipping with outdated platforms, and used Homebrew on it, which has a rolling-release model. I’ve switched to Arch since, which also happens to be a rolling release. That’s the context I’m coming from.
> Ubuntu is never going to spontaneously upgrade the `python3` package to 3.11, especially not in an LTS release.
Even then, your teammate may use a different LTS or even different distro than you do, ending up using a different Python version on your project than you do. I wouldn’t be willing to deal with that drift.
> Python's virtual environments may be somewhat clunky but it's entirely possible (and not even that hard) to keep project libraries and dependencies completely isolated from the system's with the various venv tools that exist these days.
Absolutely. However, even when using venvs, I still don’t want my venv to point to /usr/bin/python. Hence the `pyenv install -s && pyenv exec pip install poetry && pyenv exec poetry install` instead of just `poetry install`.
Windows doesn't ship Python out of the box at all, so there's no "system-level Python" there. The closest you can get to that is installing it from the app Store, but each major Python version gets a separate package there.
If you mean finding an entry-level engineering position, then it's the same as any other job search. Leverage your network to find openings, build something that shows you know enough Python to be dangerous and put it on GitHub, etc.
> Python 3.11 is up to 10-60% faster than Python 3.10.
Horrayy!!
Does anyone have experience working with Python at a very large scale?
Most of the big tech companies I worked with use Java. I remember that working with large-scale JS projects was a nightmare to debug; TS came along and really saved the working experience and scale of JS/TS projects. I've seen TS adopted almost everywhere in big tech companies; some even create microservices with Node.
When choosing a tech stack for very small projects I tend to use Django, but for the large-scale business I want to create one day, I'm quite afraid to use it because of the typing problems; I might choose Node/NestJS instead. I just wonder: how does it work in large-scale businesses that use Python mainly? Is it a nightmare to debug?
I work on a reasonably large Django project (1M+ LOC) with approximately 100 developers and mostly have a reasonable time debugging. We use mypy, lots of linters (some custom flake8 plugins), and a fairly strict layered architecture[0] that's enforced with tooling[1].
Without the tooling it's a nightmare. My previous Django project with approx 500k LOC had linting and some typing and that was a mess.
Is there any chance of making Octopus's vendor.bundle.js less than 15MB? It takes time to load on any device, and the size makes the site unusable on mobile.
MyPy provides static analysis that uses the type hints to actually, well, do things with them. Type hints are the specification, but MyPy/Pylance/Pyright/Pyre use those hints for type checking.
~500k LOC Python project here. Detest it. For being such an old and established language, the ecosystem/community is very immature. Most stuff is unstable, there are thousands of ways of doing anything, and no one has settled on which tooling to use, so all projects are a mishmash of what was in vogue the month they got started. Typing is almost useless, like TypeScript in the early days where you really cannot trust it not to blow up at runtime even though you think you have typed everything. Especially since most libraries do stuff with (*args, **kwargs) and the type checker has to give up, or decorators don't propagate types correctly, or the Django ORM lies about types. You think you're safe, but secretly everything is just typed as Any. No safety is enforced between modules, so every piece of code you write can be considered part of a module's public API, as people will import and use it. A really lacking stdlib makes fugly list comprehensions with mutations the standard way of solving things that no one will understand in a year's time. Dependencies are hell, everything breaks all the time, and "works on my computer" can lead to days of trying to figure out the setup on a coworker's computer where it doesn't work.
I never really understood this distinction between large and small projects.
Why not invest into good boundaries and turn your large project into a group of small projects?
500k LOC project should have plenty of natural boundaries. A team should recognize and draw those, regardless of the language being used.
I recently worked on a ~200k LOC Django project: the code was far from perfect, and yet I had no trouble onboarding new team members and making them productive. Here's an isolated 20k LOC domain, you'll grasp it in a week, you'll ship on your first day and then almost every day afterwards, and eventually your knowledge will extend to other areas. Isn't that how every big project should be managed?
Sure, things like strong typing do make the monolithic ball of mud more maintainable. But how about not building big ball of mud in the first place?
A better language would enforce those boundaries. In Python you can't, not without making it a completely new app. And once you're in that situation, it's virtually impossible to separate things, compared to, for instance, Java.
It's always easy to say "just be better and more diligent programmers", but that doesn't work. If the language promotes spaghetti, spaghetti will be written.
> It's always easy to say "just be better and more diligent programmers", but that doesn't work. If the language promotes spaghetti, spaghetti will be written.
Oh, I completely agree and I would never say that.
But at the same time, Java promotes complexity and overengineering. I've seen 10+ nested classes for something that was a 5-line function in Python.
The big difference for me is that when I talk to Python engineers, they agree that their spaghetti sucks. They want to evolve out of it, they just haven't found a way yet.
It is much harder to convince Java folks that their class hierarchies are useless.
Fixing Python spaghetti is way easier than fixing the Java folks' mindset.
> But at the same time, Java promotes complexity and overengineering
Nothing in Java inherently does that. It's actually improved quite dramatically since Java 8, with many features like records, pattern matching, lambdas, SAMs, etc.
I agree with you that using Python for very large projects might not be the best choice. I love programming in Common Lisp, and there are similar issues as Python.
For huge projects, I still think that Java is a good choice, and although I have only professionally worked on one Haskell project (medium size), I think that Haskell might be good if a team is in place who can use it. A new friend of mine in town is enthusiastic about OCaml, and after a few evenings of studying, I wish that about 8 years ago when I started Haskell I had chosen OCaml for a production typed language.
For Python: I really like Python for deep learning, reinforcement learning, quick and small semantic web apps, etc. The common thread here is that I am not writing much Python myself, instead I am exploiting large well tested libraries.
> I wish that about 8 years ago when I started Haskell I had chosen OCaml for a production typed language.
Do you mind writing a bit more about why? I have been a curious bystander in OCaml land but some of the differences with Haskell, like the lack of type classes, have pushed me toward the latter.
I feel very similar but have struggled to set aside enough time to find a better replacement. For work I often build one-off scripts, web scrapers/automators, data tools, and backend web apps/APIs. While I don't disagree with your comments about the ecosystem, I find myself very dependent on it to do the aforementioned work (playwright + beautifulsoup, peewee/native sqlite3 lib, numpy + scikit, Flask/Django), and that's probably the main reason I've continued using it.

Does anyone have recommendations for some directions I could research? Go and/or Rust seem to be clear contenders but I'm not sure their ecosystems have equivalents, or at least mature-enough equivalents, for the libraries I use. I'm very open to learning about other languages too but am simply out of the loop. Something with a great type system and some reasonable flexibility would be amazing (e.g. I like that I can mix classes and functions in modules easily in Python, compared to say old-school Java where everything is a class). I'm also not looking for a language that's primarily functional at this time, too much to learn right now on top of a new language, but it's on my long-term to-do list.
I think the issues you encountered may be due to the specific libraries. Lots of the pre-typing libraries haven’t adopted static typing, like Django and Celery, and when your project is 95% Django and Celery you’re SOL.
I’m not even sure it’s possible to have Django typed without reworking the ORM; I’m thinking of reverse relations, .annotate(), etc. (sketch below).
Yes, there are type stubs for these libraries but they’re either forced to be more strict, preventing use of dynamism, or opt for being less strict but allowing you to use all the library features, at the cost of safety.
I think in the end, new libraries built with static typing in mind, like Pydantic, FastAPI, and Edgedb, are the answer.
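A hedged illustration of the `.annotate()` point, with hypothetical models (assuming an `Author` model with a reverse FK named `books`):

```python
# Hypothetical models, sketching why .annotate() defeats static typing.
from django.db.models import Count

from myapp.models import Author  # assume Author has a reverse FK "books"

authors = Author.objects.annotate(book_count=Count("books"))
for author in authors:
    # "book_count" only exists at runtime. The stubs roughly type
    # .annotate() as still returning QuerySet[Author], so to a checker
    # this attribute access is either an error or silently Any.
    print(author.name, author.book_count)
```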
> Yes, there are type stubs for these libraries but they’re either forced to be more strict, preventing use of dynamism, or opt for being less strict but allowing you to use all the library features, at the cost of safety.
The problem is that you lose all help from tooling/IDEs. Like in Celery, the definition is "shared_task(*args, **kwargs)". This gives you no indication of what parameters you actually can use. Opening up the code doesn't help, as it's many layers down. The decorated function ends up untyped, but with some new methods on it that again are untyped. But originalfunction.delay(...) should have the params of the original decorated function. No, all that is lost. Just pray that the docs are correct.*
While it's of course not ideal, stub files can help with this issue. For example you can get stubs for Celery that make both `shared_task` and `delay` properly typed: https://github.com/sbdchd/celery-types
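Assuming the stubs work as advertised, the difference looks roughly like this (hypothetical `send_invoice` task):

```python
# Hypothetical task. With the celery-types stubs installed, checkers see
# through the decorator instead of collapsing everything to Any.
from celery import shared_task


@shared_task
def send_invoice(customer_id: int, *, resend: bool = False) -> None:
    ...


send_invoice.delay(42, resend=True)      # OK
send_invoice.delay("42", resend="yes")   # flagged by mypy/Pyright once the stubs are in place
```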
I currently work at a startup that uses Python for everything, since we're data-science oriented (and Python is the only programming language the data scientists kinda-sorta know), and the CTO fears having to hire for more than one backend language (even though we have more people with prior Java experience than Python experience, and all of the backend API developers hate working with Python).
We get by, but it's pretty awful. People will give you all sorts of arguments for why loose typing and an NPM-like sloppy ecosystem are advantages for Python. And for the domains where Python really shines (i.e. devops, and manual data-wrangling), maybe these are advantageous. But what they DON'T tell you is that best practices for large-scale backend development call for bolting-on so many extra things, and accepting so many constraints, that it wipes out the purported advantages.
You'll need to use at least one linter, if not two, and a style checker to enforce the use of type hints. This gets you to something that is maybe 50-75% as effective as a Java compiler, but will never be any better than that, and will never have anywhere near the same level of IDE integration.
If you have a lot of dependencies (and you will), then you'll need a proper build system like Poetry. Which puts you right back in that Maven world, escaping from which was supposed to be one of the advantages. And the dependency ecosystem is SO... FREAKING... BAD. Unlike Java, where most of the major libraries are professional in nature with corporate financial backing, most Python libraries are "pure" community and written by unpaid volunteers or hobbyists. Many of whom have NO understanding of semantic versioning, so you find yourself having to pin ALL of your dependencies to specific fixed versions so the spider web doesn't tangle up. Upgrading anything is a nightmare.
You'll be told that it's easier to hire for. But then you'll find that the candidate pool is largely ops or QA/QE people looking to transition into engineering, boot camp grads, and other people with no professional experience in backend API development or even Python itself. Most capable candidates that you hire will take the job for other reasons (e.g. wanting some exposure to data science or ML), and will complain constantly about having to work with Python for general backend services.
You can get by, but as the project or company grows, you'll probably find yourself either re-writing Python portions, or at least deprecating them as legacy and eventually migrating to some other new greenfield thing. I can't imagine any sane reason to migrate a Java or .NET project in the opposite direction.
I've seen all-Python shops before and as the codebase grows they tend to regret it. The one that came to mind migrated to Java. I am one of the former devops guys myself and initially focused on Python to get into fulltime software, so I understood your comment completely.
I have years under my belt now, and what I ended up preferring is C# and TypeScript. I would only build a product on one of those two. This random person's opinion is that your company would probably be best served by using Python for the data team and TypeScript for front and backend.
I've been on large projects written in Python, it's no good. (I say this as a Python fan!)
For scripts and small projects or rapid prototyping Python is pretty much king. (Although Red/Rebol is a contender. Check out their GUI examples!)
For medium-sized projects (100K LoC, 1-10 devs) you might use Go or you could experiment with the usual suspects: Lisp or Nim or Haskell or OCaml or whatever you want.
For a large project (millions of LoC, 100's of devs) I would use Ada or Java unless the problem was very Erlang-shaped (in which case you would use Erlang (or Elixir.))
For front-end work I'll personally never use anything but Elm for the foreseeable future. Based on economics. Compared to Elm the entire JS ecosystem is an incredibly massive boondoggle, a total waste. (I'm cranky this morning and I'm practically trolling here. Apologies to those who are not entertained. I mean it though: in my considered opinion Elm mocks JS, brutally.)
What exactly happens between 20k and 100k LOC (or 2 devs and 10 devs) that makes Go more suitable than Python?
What exactly happens between 100k and 1kk LOC (or 100 devs and 500 devs) that makes Java more appropriate than Go?
How do you even find this boundary when same apps in Java and Python might have 2-3x LOC count difference?
I've seen this mantra repeated for decades (technologies X/Y/Z for small/medium/large projects) and could never make sense of it. For example Angular was always sold as a technology for "large" projects (as opposed to React). Where is Angular now? All those "large" projects are now legacy that everyone despises. And if you have 100+ developers on a "large" Angular project I bet you're feeling fucked now.
Since everything breaks at scale, I'm actually into the opposite mantra: only the tech stacks that are great at small scale are capable of being great at large scale. Note: capable, not guaranteed.
I always prefer layering complexity on top of a simple technology, instead of praying that complexity inside of a complex technology will perfectly match my needs.
> What exactly happens between 20k and 100k LOC (or 2 devs and 10 devs) that makes Go more suitable than Python?
> What exactly happens between 100k and 1kk LOC (or 100 devs and 500 devs) that makes Java more appropriate than Go?
> How do you even find this boundary when same apps in Java and Python might have 2-3x LOC count difference?
I've been thinking about those questions for roughly thirty years, and I still don't have exact answers. (If I did I'd be famous and hopefully rich, but that's another story, eh?) Speaking in broad generalities, the main thing seems to be something like the number of separate concepts you have to keep in mind to make (non-breaking) changes to the system. The more that the system you're using can do for you, the more suitable it is for larger-scale projects.
> I've seen this mantra repeated for decades (technologies X/Y/Z for small/medium/large projects) and could never make sense of it.
I gotta say, it's not a "mantra", it's experience. Have you not worked with various languages on various sized projects?
(In re: Angular, to me that was an obvious shitshow from day one. I would never hire anyone who admitted to ever thinking Angular was a good idea. Same thing in re: Heroku, while I'm at it. Lots of people do lots of foolish things when computers get in the mix.)
> I'm actually into the opposite mantra: only the tech stacks that are great at small scale are capable of being great at large scale. Note: capable, not guaranteed.
I don't see how that follows? (I also don't see how that's "opposite" to the other "mantra".)
Kotlin may have many advantages over Python as a language (subjective), but the Java ecosystem is not an attraction over the Python ecosystem to me (both have flaws, but the Java world has a lot of bloated stuff).
After using Kotlin in Spring projects for a couple of years, I rewrote most of it to Java 17 and stick to Kotlin strictly in Android work. It's not that much different anymore with all the recent additions to Java, and as Java progresses (and the main JVM implementation which is first and foremost being developed for Java) and diverges in other directions in some very important areas (value types, Loom), Kotlin starts to feel more and more like a second-class citizen. I mean, that's what some old-timers were preaching right here on HN from the very start.
Things like Hibernate require some pretty ugly (IMHO) hacks to work properly (you need something like three compiler plugins, and you have to spread keywords around like there's no tomorrow).
The compiler was also very slow compared to Java. The same amount of code written in the same style took 2× as long to build. This may have improved since.
Probably the only thing I miss are nullability annotations (after getting used to them in C#, not Kotlin).
Edit: after re-reading what I wrote, the tone feels way too critical. Kotlin is a nice language, and certainly easier & more fun to both read and write (although it's trivial to write unreadable mess by going too far with its features). It just doesn't go far enough (for me) to warrant using a separate language with all that entails.
You raise an interesting point: Maybe Kotlin was the push Java core devs needed to increase their rate of innovation ("release cadence"). I saw similar with Netscape vs IE, Chrome vs IE, Clang vs GCC, Boost vs C++ foundation library, .NET vs Java, Java+.NET vs C++, Node vs Deno. Kotlin can continue to experiment with new and interesting language features. Those that prove most popular can be adopted by Java. Also, as a secondary effect, since Jetbrains wrote Kotlin and had the best IDE, they had a head-start to build the community. That also helped to sell more IDE licenses. (I have nothing against this 1980s spy movie "master plan": They are great company with many great products.)
Java has adopted maybe 20% of the things that make Kotlin better, but due to backwards compatibility it'll never catch up.
Personally I think it'd almost never make sense to adopt Java over Kotlin for any greenfield JVM project, unless the devs you hire are truly so low-skill they can't pick it up.
> Java has adopted maybe 20% of the things that make Kotlin better, but due to backwards compatibility it'll never catch up.
Kotlin supports neither structural pattern matching nor guarded patterns as they are available in both Java 18 and Scala. It's Kotlin that needs catching up nowadays.
In Java 18, this is the pattern matching you're talking about.
`if (o instanceof String s) {`
Not much benefit over Kotlin's smart casts.
Guarded patterns with this version of pattern matching should be translatable to adding another "and" condition. The Kotlin compiler should have no problem reasoning with that.
---
I find positional destructuring in Scala (which works similarly in Java 19) a bad design.
What's the point of Kotlin, aside from the absence of semicolons and the 'val' keyword, if it plays the catch-up game, with the rest of the features becoming available first in Java?
> They can't have it instantly.
Why not if Scala did it way before pattern matching was introduced into Java?
Read-only interface by default - `MutableList` vs `List`.
if-expression, they just look nicer than `a ? b : c`.
---
You're right that Kotlin is not an "abstraction atop Java (the language)". It's meant to be a better Java that feels familiar to Java developers. With that rationale in mind, it's reasonable that they left some features from FP languages out of Kotlin.
As I have said in another reply, it was a great decision to not follow what Scala did with pattern matching.
The gap has closed somewhat for sure, but there are still quite a few nice features in Kotlin that Java doesn't have. https://gist.github.com/makosblade/8fb1f577d81f02025784093d2... does a good job of identifying most. I'd add that the semantics around lambdas are much nicer in Kotlin.
With the passionate hatred both languages inspire, I'm not sure which is which in your statement.
One could argue C++ gets even more done, since it is appropriate for use in more contexts, like firmwares/drivers, native SDKs, games, etc. Java's main use cases have strong competition in languages like C#, Go, C++, Python, etc.
Yeah, I've been on multiple, and for me it felt like a nightmare (maybe I'm just not smart enough). I'll use Python for small tools, depending on the planned deployment. But as a general rule, everything north of 2 months of developer time (be it just me, or multiple devs) gets a language with static types, at least for the non-glue main components. Glad that JS got TS types for that. My approach has served me well so far.
I find mypy to be underwhelming in comparison to Typescript, but maybe it's just the ecosystem. Most JS libraries nowadays are either written in Typescript or have type definitions, but some big Python libraries don't (or the types are not complete).
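One partial workaround, for what it's worth: you can write a local stub file for the worst offenders and point mypy at it. A minimal sketch with a hypothetical untyped package `legacylib`:

```python
# stubs/legacylib/__init__.pyi  -- hypothetical untyped third-party package.
# Point mypy at the stubs directory (mypy_path = "stubs" in the config, or
# the MYPYPATH env var) and the missing annotations stop leaking Any into
# your own code.
def fetch_report(account_id: int, *, fmt: str = ...) -> bytes: ...
def parse_report(raw: bytes) -> dict[str, float]: ...
```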
That's a bad idea given the sheer amount of "undefined behavior" in the language. As a reminder, UB means that your app can do what you expect, or it might delete all your files... or it might cause a time-travel paradox erasing your very existence to avoid the offending code from ever being written in the first place. You never know!