~500k LOC Python project here. Detest it. For such an old and established language, the ecosystem/community is very immature. Most stuff is unstable, there are thousands of ways of doing anything, and no one has settled on which tooling to use, so every project is a mishmash of whatever was in vogue the month it got started. Typing is almost useless, like TypeScript in the early days, when you really couldn't trust it not to blow up at runtime even though you thought you had typed everything. Especially since most libraries do stuff with (*args, **kwargs) and the type checker has to give up, or decorators don't propagate types correctly, or the Django ORM lies about types. You think you're safe, but secretly everything is just typed as Any. No safety is enforced between modules, so every piece of code you write can be considered part of a module's public API, as people will import and use it. The stdlib is really lacking, which makes fugly list comprehensions with mutations the standard way of solving things, and no one will understand them in a year's time. Dependencies are hell, everything breaks all the time, and "works on my computer" can lead to days of trying to figure out the setup on a coworker's computer where it doesn't work.
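To make the "secretly everything is Any" complaint concrete, here is a minimal sketch. It assumes a hypothetical `logged` decorator annotated the loose way many pre-typing libraries are; the names are made up for illustration and not taken from any real library.

```python
import functools
from typing import Any, Callable


def logged(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: accepts any callable, returns 'any callable'."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    return wrapper


@logged
def add(a: int, b: int) -> int:
    return a + b


# To a static checker, `add` is now Callable[..., Any], so the annotations on
# the original function no longer constrain callers. The bad call below is
# accepted statically and only fails when it actually runs.
try:
    add("1", 2)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True
```

The function looks fully annotated at the definition site, yet the decorator's return type is what callers are checked against.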
I never really understood this distinction between large and small projects.
Why not invest in good boundaries and turn your large project into a group of small projects?
A 500k LOC project should have plenty of natural boundaries. A team should recognize and draw those, regardless of the language being used.
I recently worked on a ~200k LOC Django project: the code was far from perfect, and yet I had no trouble onboarding new team members and making them productive. Here's an isolated 20k LOC domain, you'll grasp it in a week, you'll ship on your first day and then almost every day afterwards, and eventually your knowledge will extend to other areas. Isn't that how every big project should be managed?
Sure, things like strong typing do make the monolithic ball of mud more maintainable. But how about not building a big ball of mud in the first place?
A better language would enforce those boundaries. In Python you can't, not without making it a completely new app. And once you are in that situation, it's virtually impossible to separate things, compared to, for instance, Java.
It's always easy to say "just be better and more diligent programmers", but that doesn't work. If the language promotes spaghetti, spaghetti will be written.
> It's always easy to say "just be better and more diligent programmers", but that doesn't work. If the language promotes spaghetti, spaghetti will be written.
Oh, I completely agree and I would never say that.
But at the same time, Java promotes complexity and overengineering. I've seen 10+ nested classes for something that was a 5-line function in Python.
The big difference for me is that when I talk to Python engineers, they agree that their spaghetti sucks. They want to evolve out of it; they just haven't found a way yet.
It is much harder to convince Java folks that their class hierarchies are useless.
Fixing Python spaghetti is way easier than fixing the Java folks' mindset.
> But at the same time, Java promotes complexity and overengineering
Nothing in Java inherently does that. It's actually improved quite dramatically since Java 8, with many features like records, pattern matching, lambdas, SAMs, etc.
I agree with you that using Python for very large projects might not be the best choice. I love programming in Common Lisp, and it has similar issues to Python.
For huge projects, I still think that Java is a good choice, and although I have only professionally worked on one Haskell project (medium size), I think that Haskell might be good if a team is in place who can use it. A new friend of mine in town is enthusiastic about OCaml, and after a few evenings of studying, I wish that about 8 years ago when I started Haskell I had chosen OCaml for a production typed language.
For Python: I really like Python for deep learning, reinforcement learning, quick and small semantic web apps, etc. The common thread here is that I am not writing much Python myself; instead, I am exploiting large, well-tested libraries.
> I wish that about 8 years ago when I started Haskell I had chosen OCaml for a production typed language.
Do you mind writing a bit more about why? I have been a curious bystander in OCaml land but some of the differences with Haskell, like the lack of type classes, have pushed me toward the latter.
I feel very similarly but have struggled to set aside enough time to find a better replacement. For work I often build one-off scripts, web scrapers/automators, data tools, and backend web apps/APIs. While I don't disagree with your comments about the ecosystem, I find myself very dependent on it to do the aforementioned work (playwright + beautifulsoup, peewee/native sqlite3 lib, numpy + scikit, Flask/Django), which is probably the main reason I've continued using it. Does anyone have recommendations for some directions I could research? Go and/or Rust seem to be clear contenders, but I'm not sure their ecosystems have equivalents, or at least mature-enough equivalents, for the libraries I use. Very open to learning about other languages too, but I'm simply out of the loop. Something with a great type system and some reasonable flexibility would be amazing (e.g. I like that I can mix classes and functions in modules easily in Python, compared to, say, old-school Java where everything is a class). I'm also not looking for a language that's primarily functional at this time; that's too much to learn right now on top of a new language, but it's on my long-term to-do list.
I think the issues you encountered may be due to the specific libraries. Lots of the pre-typing libraries haven’t adopted static typing, like Django and Celery and then when your project is 95% Django and Celery you’re SOL.
I’m not even sure it’s possible to have Django typed without reworking the ORM, I’m thinking about reverse relations, .annotate(), etc.
Yes, there are type stubs for these libraries but they’re either forced to be more strict, preventing use of dynamism, or opt for being less strict but allowing you to use all the library features, at the cost of safety.
I think in the end, new libraries built with static typing in mind, like Pydantic, FastAPI, and EdgeDB, are the answer.
> Yes, there are type stubs for these libraries but they’re either forced to be more strict, preventing use of dynamism, or opt for being less strict but allowing you to use all the library features, at the cost of safety.
The problem is that you lose all help from tooling/IDEs. In Celery, for example, the definition is "shared_task(*args, **kwargs)". This gives you no indication of which parameters you can actually use. Opening up the code doesn't help, as the real signature is many layers down. The decorated function ends up untyped, with some new methods on it that are again untyped. Something like originalfunction.delay(...) should have the params of the original decorated function. But no, all of that is lost. Just pray that the docs are correct.
While it's of course not ideal, stub files can help with this issue. For example you can get stubs for Celery that make both `shared_task` and `delay` properly typed: https://github.com/sbdchd/celery-types