Normally, my blog posts are a bit more planned than this one. I might have an idea for something I want to write (and tell the world) about, and I generally know how I want to approach it. It might be a project of mine that I want to describe in more detail than I usually put in its README, or it might be a particularly interesting question that's made me put a lot of thought into an answer (or into why there are no answers).
This one is more of the "interesting question" type, though it's something that just hit me out of the blue. So with the preamble out of the way...
The disclaimer (because I feel like this post needs it)
I'll start by being the first to point out that I don't program professionally. I've made some money from the things I've built a few times, and that's great. I've also never had any formal education in programming, databases, computer science or anything of that nature - I'm entirely self-taught through experimentation, headaches, lots of googling and reading lots of code.
In this post I dig a little deeper into programming abstractions and libraries, as well as why we use them (and what I'm worried the results of that might be). It's the first post of this kind I've done, so (when I fix the comments) please chime in if you think something is incorrect or could be worded better, or if you have a differing opinion (but please be civil about it).
The background (such as it is)
As I'm typing this, not 10 minutes ago I was planning a small project I was (and still am) about to code. Generally when I do this, I do a lot of research to see what libraries are already out there that I can use or adapt to my needs (as an example, mqn uses paho-mqtt to receive MQTT messages). This is a really common occurrence here in code-stuff-land; programmers in general (I feel confident enough to say) hate duplicating functionality, and for good reason. Why code your own HTTP library for an app when there are already libraries that do things better than your hacked-up, four-hour, untested module that only implements the things you "need right now"? There are also a lot of other arguments for why programmers should use libraries instead of "rolling their own", but I won't go into them here. The general consensus I've noticed is "libraries are good most of the time, full stop."
When deciding what to use in my new project, I took a step back for a second and wondered "why is this my first thought - to use some code other people wrote to do something?". The obvious and immediate answer I came up with was "because it probably does that thing better than you could given a day or so to come up with something, and it's available now and does far more that you might use in the future".
This sparked the question that's the focus of this blog post: do high-level abstractions like libraries really make programming easier for most of us?
The question, in detail
Before your possible immediate response of something like "of course they do, how could you seriously ask that!", take a second to stop and think about the following:
- Do you really understand how a package or library works? So you've read its documentation, and depending on just how well documented a particular package or lib is, you might have picked up a thing or three about how it works, its internals, the exceptions or errors it throws and when, etc. That's all fine and even necessary in programs that use it for more than just "call x; parse or just store its output; do something with output", but... What about when you hit that "edge case" race condition? What about when something breaks that isn't just a Google search or a GitHub issue search away? Could you fix it? Even more important for those who program professionally - can you fix it (or temporarily work around it) in a short amount of time, say a few hours?
- Do you understand how a particular package or lib does what it does? This might sound similar to the previous question, but it isn't always. As an example, one of my choices for a library I'm going to use in my project is SQLAlchemy, which is an excellent database toolkit for Python. I understand (mostly) how SQLAlchemy internally does what it does, but it's been a few years since I've done anything serious with raw SQL, so most of the things I can do off the top of my head with SQLAlchemy I can't do as easily with raw SQL. That doesn't mean that I couldn't do everything I use SA for in SQL, just that it would take me some googling to find the syntax.
This sort of thing (where coders understand more about the abstraction layer they're using than they do about what it's abstracting) scares me now that I seriously think about it. Abstractions are supposed to take away complexity from a system that we understand, while making it easier to do things with that system. An example might be the Python requests library, which handles making HTTP requests of all types. With it, you can send a GET request in one line (once the library is imported):
response = requests.get('https://github.com')
A lot of things are happening behind the scenes (a request object is prepared, it gets set with requests' default user agent header, it adds any cookies you've given it, URL-encodes any URL parameters you've provided and more), and that's before a socket is ever opened and the request gets transmitted. It then does several things with the response (including following redirects if you haven't told it otherwise, trying to guess the encoding, checking certificates if the request is over HTTPS, parsing header links, and probably more that I'm unaware of).
My point: you don't need to know all of that - you can just run that one line of code and (like magic), usually the response variable will be populated with a response, ready for whatever you want to do with it. But does not knowing all of that (including the nitty-gritty of the HTTP specification) make things harder for you in the future when bugs pop up?
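To give a feel for just the "before a socket is ever opened" part, here's a rough sketch of a few of those preparation steps done by hand with Python's standard library. This is not how requests implements it; the build_get_request helper, the example-client/0.1 user agent and the search URL are all made up for illustration, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(url, params=None, user_agent="example-client/0.1"):
    """Build (but do not send) a GET request, doing a few prep steps by hand.

    Hypothetical helper for illustration only.
    """
    if params:
        # URL-encode the query parameters and append them to the URL.
        url = f"{url}?{urlencode(params)}"
    # Attach a default User-Agent header, as an HTTP library typically would.
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


req = build_get_request("https://github.com/search", {"q": "paho mqtt"})
print(req.full_url)                  # the URL with the encoded query string
print(req.get_header("User-agent"))  # urllib stores header names capitalized this way
```

Cookies, redirects, encoding detection and certificate checks would all still be left to do, which is a good chunk of what that one line hides.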
The answer I came up with
Yes, to a degree.
We use abstractions in the form of libraries so frequently because they work. They're (hopefully) easy to use and well documented, and that translates to less time we have to spend on things that aren't directly related to the thing that's being worked on.
To use the project which got me thinking about this as an example of how useful abstractions can be: it will probably be done in a few days of casual work; it will be capable of connecting to and using more than 10 types of databases (I don't remember off the top of my head how many database backends SQLAlchemy supports), will work on a single self-hosted server or on an automatically scaling cloud, will have an HTTPS API (probably with rate limiting), will have its forms CSRF protected by default, and will be able to send and receive emails. Hell, I don't even really need to worry about escaping HTML tags in user-submitted input; there are of course still security concerns, but far fewer than if I were doing all of this with my own code, rather than code that's likely been tested by thousands of people and improved for years.
If that sounds like nothing special and pretty much every other web app ever, keep in mind here that the only code I need to write is specific to my app. I don't need to worry about connecting to databases, executing commands in specific ways on each one, what syntax they support, what the fastest way to do a particular operation is on any of them, what backend column type to use for the best results on a particular database, writing specific SQL queries to get data out or put data in... SQLAlchemy handles all of that for me, and all I really need to do is give it the URI of the database and tell it what tables and columns I want (and even that is easier).
I also don't need to worry about making my app WSGI compliant, routing requests to specific URL routes and functions, properly creating, formatting, and parsing returned cookies, implementing the 'HEAD' and 'OPTIONS' HTTP verbs for every endpoint, returning a properly-formatted MIME type (at least for the most part)... You get the idea.
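For a sense of what that saves, here's a rough sketch of the by-hand version against a single backend, using Python's built-in sqlite3 module. The messages table and its columns are made up for illustration, and this isn't what SQLAlchemy does internally; it's just the kind of SQL you'd otherwise be writing (and adjusting) yourself for each database.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table; with a toolkit you'd declare this once as a model instead.
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, topic TEXT NOT NULL, body TEXT)"
)

# Parameterized inserts (the ? placeholders are sqlite3's style; other drivers differ).
conn.executemany(
    "INSERT INTO messages (topic, body) VALUES (?, ?)",
    [("home/temp", "21.5"), ("home/door", "open"), ("home/temp", "22.0")],
)
conn.commit()

# Count messages per topic, written out as raw SQL.
rows = conn.execute(
    "SELECT topic, COUNT(*) FROM messages GROUP BY topic ORDER BY topic"
).fetchall()
print(rows)  # [('home/door', 1), ('home/temp', 2)]
conn.close()
```

None of it is hard, but placeholder styles, column types and bits of syntax change from one database to the next, and that's the part I'd have to go googling for.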
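As a small taste of what sits underneath, here's a minimal sketch of a bare WSGI app with a hand-rolled route table, using only the standard library. The ROUTES dict, the hello handler and the call helper are hypothetical names for illustration; a real framework does far more than this (cookies, HEAD and OPTIONS handling, and so on).

```python
from wsgiref.util import setup_testing_defaults


def hello(environ):
    # A handler returns a status line and a text body.
    return "200 OK", "hello\n"


ROUTES = {"/": hello}  # hypothetical route table: path -> handler


def app(environ, start_response):
    """A bare WSGI callable: look up the path, send headers, return the body."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, body = "404 Not Found", "not found\n"
    else:
        status, body = handler(environ)
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call(path):
    """Invoke the app directly with a fake environ, no server needed."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call("/"))
print(call("/missing"))
```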
As someone who's curious about these types of things, I generally know enough to understand at a high level what these libraries are doing (and just how much time they're saving me so I don't have to implement it all myself), and I would imagine a good portion of programmers know about the same (and probably more). But as more and more libraries, frameworks, microframeworks, nanoframeworks (do those even exist?) are published to PyPI (the Python Package Index) and similar places... Do they make programming harder for some people (particularly new programmers)? When they hit a bug in a library, will they have any idea of what it could be? Will they even know what to check or what's most likely? Or will they rely so heavily on the documentation for their abstraction layer that the error message might as well be in a different language?
Of course there's always Google and friends for situations like this, but how much of that googling will be shots in the dark, hoping for a Stack Overflow post that explains exactly how to fix that error, versus searches by people who understand the systems involved and are just looking for implementation details, and maybe for where the package they're using keeps its contributing guidelines so they can "fix this damn bug and get on with the day"?
With these frameworks and libraries, are we making it easier for everyone to use certain systems, standards and algorithms? Or are we causing false confidence in an abstraction layer that was meant for those who know what it's abstracting (and how)?
I think the answer to this comes down to the classic "too much of a good thing is a bad thing". By all means use the massive framework (looking at you, Django), or the microframework, or your favorite database toolkit, or whatever. But take some time out of the day to read up on the problems these libraries solve and how they do it, if you're unsure or just have no idea. It might just save you an hour or two the next time you find a bug, and sometimes that can be the difference between proudly showing off your creation (and maybe making some cash or landing a job!), or relegating it to the big bucket of bytes in the sky.