Why TRUE + TRUE = 2: Data Types


In the early days of computing, programmers needed to be very sure about the
data they were operating on. If an operation was fed a number when it was
expecting a letter, a character: at best you might get a garbled response, and at worst you could break the system.
Maybe even physically. Now, at the low level of coding,
yeah, that’s still true. But these days we have
programming languages where we don’t always need to be
so rigorous in defining the data, and we can let the computer figure it out. But for something that seems so technical,
this can be controversial. When you write a computer program,
you use variables, which are basically just labelled
buckets of memory. Inside that bucket is some data,
and you can change it, you can vary it.
Hence, variable. I know it’s a massive simplification, but computer memory is a bit like
an enormous rack of switches storing ones and zeros that represent other
things like letters and numbers. But if we look into a region of memory, there is nothing in there to indicate what
those ones and zeros are actually representing. So in the code, we declare that the variable
is a particular type. What’s contained in that variable,
in that bucket? It’s an integer. What’s in that one?
It’s a string of characters. That tells the computer how to interpret those
ones and zeros in memory. The types that you get to use can
differ a bit between languages. But in general, you’ll at least have:
Integer or INT. That’s a whole number that can’t have
anything after the decimal point. And those are extremely useful
for storing things like the number of times you’ve looped
through some code, or how many points
your player’s clocked up, or how many pennies there are
in someone’s account. Then you’ve got character or CHAR. These are letters, numbers,
punctuation, and whitespaces, like the space between words,
and instructions to start a new line. And in most high-level languages,
you’ll probably be using a STRING instead, which is just a string of characters. Then you’ve got Boolean, or BOOL, named after
George Boole, an English mathematician. That’s very simple: it’s true or false. A boolean only contains either
a zero or a one. A yes or a no. A no or a yes. Nothing more. Then there’s floating-point numbers, or
FLOATs. Floats are complicated and messy and a whole
other video, but in short, they let you store numbers with decimals, although you might lose a very small bit of
precision as you do it. There are others, other types,
in a lot of languages, I know it’s more complicated than this:
but this is just the basics. So. Most languages use
“explicit type declaration”. So when you declare a variable,
when you set up that bucket, you have to also declare its type. So, x is an integer, it can only hold integers,
and right now, that integer is 2. But in some languages, including some popular ones that people
tend to get started with, and that I like, you don’t need to actually declare that. It just gets figured out from your code.
That’s called “implicit declaration”. So in JavaScript, you can just type
x=1.5 and it’ll know, that’s a number. Put the 1.5 in quotes,
and it’ll go, ah, it’s a string. So, okay, it’s storing 1.5 and “1.5”
as different ones and zeros. Why does that matter? Well, in JavaScript, the plus sign means
two different things. It’s the addition operator,
for adding two numbers together. But it’s also the concatenation operator,
for combining two strings together. So if x is 1.5, you ask for x + x…
it returns 3. But if either of those xs is “1.5”, a string, it’ll return “1.51.5”. And that’s called “type casting”;
converting from one data type to another. Some languages require the programmer
to explicitly request the conversion in code. Other languages, like JavaScript there,
do it automatically. JavaScript is referred to as having “weak
typing” as opposed to “strong typing”. And it’s weak because, even if that 1.5
is a string, and you ask for it multiplied by 2… it’ll return 3. Unlike the plus sign, that asterisk can only
mean ‘multiply’, so it can only handle an integer
or a floating-point number. Give it a string, though, and it won’t throw an error like a
strongly-typed language would. It’ll just convert it for you on the fly. Really convenient.
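
The behaviour described so far can be sketched in a few lines. This is a minimal illustration, assuming a standard JavaScript engine such as Node.js; the variable names are made up for the example and are not from the video.

```javascript
// Illustrative names only; assumes a standard JavaScript engine.
const asNumber = 1.5;    // inferred as a number
const asString = "1.5";  // inferred as a string

// Between two numbers, + is addition.
const sum = asNumber + asNumber;                  // 3

// If either operand is a string, + concatenates instead.
const joined = asNumber + asString;               // "1.51.5"

// * can only mean multiplication, so the string is converted to a number.
const product = asString * 2;                     // 3

// An explicit cast makes the intended conversion visible in the code.
const explicitSum = Number(asString) + asNumber;  // 3

console.log(typeof asNumber, typeof asString);    // number string
console.log(sum, joined, product, explicitSum);
```

Writing the cast out with `Number(...)` is one way to get the strongly-typed habit in a weakly-typed language: the conversion still happens, but it is requested rather than implied.
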
Really easy to program with. Really easy to accidentally screw things up and create a bug that’ll take you hours
to track down. Or worse, create a bug that you
don’t even notice until much, much later. In a lot of languages, you can also cast
to and from boolean values. Which is called “truthiness”, and experienced programmers who are
watching this may already be grimacing. Truthiness is a great shorthand. If you convert an empty string to a boolean, it generally comes out as false. Anything else, true. So you can just test for an empty string with
if(x). But that also means that in JavaScript, you can ask for true + true and
it’ll tell you that the answer to that is 2, because when you cast ‘true’ to a number
you get 1. In PHP, a language notable for many
questionable design decisions, even a string with just a single zero in it
will get converted to a boolean false, there’s a special case just for that string. Which can cause a lot of unexpected bugs. Now, there is a workaround for that in
loosely-typed languages. Normally, if you want to
compare two variables, you use two equals signs, like this. You can’t use a single one, because that’s
used for assigning variables. I’ve been coding for about thirty years and I
still absent-mindedly screw that up sometimes. Now, if you ask if 1.5 is equal to “1.5”
with two equals signs in JavaScript or PHP, you’ll get “true”. But if you add a third equals sign,
then you’re asking for strict equality. If the data types don’t match,
any comparison will automatically fail. So why is all this controversial? Well, languages like JavaScript and PHP can
get a bad reputation because they use weak typing. If you see yourself as a Real Programmer —
and I’m using that sarcastically, but if you see yourself as the kind of programmer
where you are in control of everything, then… yeah, you can see that weak typing
is like training wheels, something that introduces sloppy
coding practices and bugs and shorthand. And that’s not unfair. But weak typing also makes programming
easier to learn and easier to do, it can reduce frustration and
just make programmers’ lives easier. It is a trade-off: even if it is a controversial one. This series of The Basics is sponsored by
Dashlane, the password manager. It’s kinda obvious that passwords are more
secure the longer they are, but it can be difficult to get an intuitive
sense of just how much more secure. So let’s say you’ve signed up to Dashlane, and now you’re able to use long, complicated,
symbol-filled passwords everywhere because Dashlane remembers,
synchronises and autofills them for you. How much more secure are those passwords? Well, let’s do the maths. The characters that most web sites tend to
accept for passwords are: uppercase and lowercase letters, numbers,
and let’s say thirty punctuation marks. That’s 92 characters. There’s an argument that you should be allowed
to use any Unicode character in a password, but it’s definitely an argument, so let’s limit it to those 92 characters. First, because we’re using random characters,
we can ignore ‘wordlists’ that make password cracking orders of magnitude
easier by guessing dictionary words first including, like, using an exclamation mark
instead of a 1, or deliberately swapping a couple of letters. Those tricks won’t work here. Now, let’s say that an attacker
is able to test a billion possible passwords
every single second. That’s not unreasonable if the encryption
is weak and their computer is fast. If you have a six-character password, it will take them
about ten minutes to try every single possible one of the
606 billion combinations. But every letter you add
multiplies that time by 92. Seven characters works out to
15 and a half hours. Eight characters, about two months. Nine characters, 14 years. Ten characters, more than a millennium. Eleven characters is a hundred thousand years. By the time you’ve got to fifteen characters, it’s about a thousand times longer than
the universe has existed. Long, complicated, random passwords aren’t
a cure-all. But they’re certainly better than the alternative. And to help you with them: dashlane.com/tomscott for a 30-day free trial
of Dashlane Premium, which includes unlimited password storage
and sync, plus a load of other features. And if you like it, you can use the code
“tomscott” for 10% off.
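
The password arithmetic in the sponsor segment can be checked with a rough sketch like the one below. It uses only the figures stated in the transcript (92 possible characters, a billion guesses per second); the constant and function names are illustrative, and the year length of 365.25 days is an assumption.

```javascript
// Figures from the transcript; names are illustrative.
const ALPHABET_SIZE = 92;
const GUESSES_PER_SECOND = 1e9;

// Assumption: a year of 365.25 days, for rough conversion only.
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

// Seconds needed to try every possible password of the given length.
function secondsToExhaust(length) {
  return Math.pow(ALPHABET_SIZE, length) / GUESSES_PER_SECOND;
}

for (const length of [6, 7, 8, 9, 10, 11, 15]) {
  const seconds = secondsToExhaust(length);
  const years = seconds / SECONDS_PER_YEAR;
  console.log(
    `${length} characters: ${seconds.toExponential(2)} s, ${years.toExponential(2)} years`
  );
}
```

Six characters gives 92^6 = 606,355,001,344 combinations, which is about 606 seconds at this rate, and each extra character multiplies the time by 92.
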

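The truthiness and strict-equality behaviour described in the main part of the transcript can be sketched the same way. Again this assumes a standard JavaScript engine, and `isNonEmpty` is a hypothetical helper written only for this example.

```javascript
// Truthiness: an empty string converts to false, a non-empty one to true.
console.log(Boolean(""));       // false
console.log(Boolean("hello"));  // true

// Hypothetical helper showing the if (x) shorthand for "is this non-empty?".
function isNonEmpty(text) {
  if (text) {
    return true;
  }
  return false;
}

// Casting true to a number gives 1, so true + true is 2.
console.log(Number(true));      // 1
console.log(true + true);       // 2

// Loose equality converts types before comparing; strict equality does not.
console.log(1.5 == "1.5");      // true
console.log(1.5 === "1.5");     // false
```
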
100 Comments

  1. And that's the last in this run of the Basics! Thanks to Dashlane for sponsoring all these: you can get their password manager for free on your first device at https://www.dashlane.com/tomscott – there's a 30-day free trial, and 10% off with my code, "tomscott".

  2. Our comp sci class watched this video in lesson :D. Thanks for creating content like this, it really helps me out learning to program and getting ahead in lessons. keep up the great work :).

  3. This is off topic, but I've been agonizing over how to do math on a number that JavaScript thinks is a string. I had absolutely no idea you could just convert the effing things back and forth. Finally finished setting up an applet on IFTTT to schedule my PC to shutdown from Google Home.

  4. Funny I'd actually make the opposite argument. Strict typing feels like you have training wheels on. Java anyone? After all if your API is perfect duck typing is the ideal. No special type conversions or exception handling. If your data exists.. it's right. You need to check for type? Save more data about the type. It's all this type declaration and compiler level automation that's the training wheels.

  5. I do the "use strict" thing in a lot of my javascript. It makes it a bit stricter.
    BTW:
    C++ is sort of a strongly typed language and sort of not strongly at all. Because of the ability to make your own types and operators:
    A = B + C;
    Could mean to take the two numbers B and C and add them or to concatenate the strings B and C or perhaps to reformat your hard drive. The "+" operator can be defined as anything the programmer happens to want it to mean.

  6. I strongly prefer strongly typed languages. I do a lot of coding in PHP and JavaScript, and I always feel better when I get to sit down with c#.

  7. I feel like you're conflating dynamic typing with type inference. OCaml doesn't need explicit type declarations and it's statically typed

    edit: also dynamic typing and weak typing. iirc Lisp and Python are dynamically typed but strongly typed

  8. When I did my app, I stored all my numbers as integers. And that was just to store them, but when they went to variables (which they did a LOT) they stayed that way. This is except for really long numbers that I stored as text.

    So far, not sorry. It's been working.

  9. Gotta disagree with you on one point. I'm a "real programmer". Been programming for close to 45 years, (professionally for 35), and have always strongly preferred (and spent most of my career using) weakly typed languages. To me, a strongly typed language is like driving a manual transmission car. Why would I want to have to tediously define stuff? The whole point of computers is to automate stuff. We made assemblers so we wouldn't have to program in machine code, and higher level languages so we wouldn't have to program in assembly. So why this giant step backward? And weakly typed languages translate into more power and flexibility.

    (Also. your graphics were botched for *. That's 00101010, not 00010101 as you had it.)

    (PS the fact that javascript interprets "1.5"+"1.5" as "1.51.5" is based on their questionable design decision to overload an operator. In MUMPS, for example + means addition, _ means concatenation, so "1.5 sheep"+"20 stars" = "35", not "1.5 sheep20 stars" (technically everything is treated as a string in MUMPS except when actually doing math) (OTOH, they use a single = for both assignment and equality, since context tells them apart))

  10. I can appreciate this … I used to do horrible things with embedded boolean statements in calculations and then use sign as a branching or formatting conditional

  11. Before watching the video: It doesn't. Plus isn't an appropriate operator for true or any boolean. What you're describing is a fluke in older compilers and languages that produces a nonsensical result— similar to how integer values can overflow, and floating point values can reach a point where incrementing them makes them infinity. Valid syntax or not, the behavior is a bad code smell and should always be avoided. You're trying to turn a loophole in how computer programming worked before we had the CPU power to enforce correct syntax at compile time into some kind of crazy magic trick. Just stop.

    Again, I haven't watched the video yet so maybe you explain that it's a quirk and spin it into talking about type systems. But that doesn't make the title any less of a distortion of reality and effectively and practically false (while technically true decades ago, but no longer). You may have a CS degree or whatever Tom, but sometimes your spin comes off really sour, throwing egg on the faces of those of us who do this professionally and care about our craft.

  12. Training wheels help you stay upright and move straight forward until you can do so by yourself. Usually you remove the training wheels when you don't need them anymore: you learned to ride the bike. Weak typing with implicit conversions is more like having an interpreter follow you around, translating what you say for others, so you don't have to learn how to be understood and be clear about what you mean, but sometimes they mess up the translation into something radically different from what you meant and you're in trouble.

  13. I think in C.
    And am fluent in hexadecimal.
    Those other language rules seem like a made up school-yard game designed to annoy participants

  14. On the conversion of 0 and 1 to data types, the Operating System doesn't know what type of file it's handling without it having an extension, like a JPEG image is marked with .jpg, a Wave audio file as .wav, or a Matroska video as .mkv.
    If the extension is mismatched, the OS doesn't know what program to use. But with this in mind, you can take an image, change its extension to .mp3 and your system will think it's an audio file, even though it won't be able to play it, it will try to.

  15. You could do a whole series of hilarious weird traps in Javascript. Javascript is just extremely goofy. Starting with the fact that it doesn't have integers, every number is a float, and that is just the start of it…

  16. I hate weak typing and every language that does it.
    Note that Python is a great language, and has strong typing. The only reason that True + True == 2, is because Boolean is a subtype of integer, not because a value is cast.

  17. I'd argue that explicit conversion and weakly typing will take infinitely longer to debug than you saved on writing it…

  18. If anyone is responsible for a newsletter at a CS club or something, you can have this name for free:
    Weekly typed

  19. Lua has type casting, but they call it "coercion". I spent a long time trying to look up how to convert a string to an integer, only to eventually find that I didn't have to.

  20. Personally, I like my passwords to be phrased. "The purple dinosaur jumped over the 3 little puppies!" will take 87 OCTOVIGINTILLION years to crack. I will also never forget it.

  21. What I like about implicit typing in languages like JavaScript, is not needing to cast for my arrays. You can store multiple data types in an array nice and easily, without worrying about typecasting.

  22. JavaScript is not implicitly typed. It's dynamically typed. An implicitly typed language would be where it figures out what type the variable is when you initially declare it, but then it never changes after that. In JavaScript, you can set x to 1.5 and then immediately change it to "hello world"–the type is dynamic because it changes.

  23. Good overview of the basics of data typing, Tom!

    I've spent the past six years coding in a very old language (Cache Object Script) that was written in the 1960's. It's pre-Unix and pre-"C" so the syntax and typing is completely alien – assigning a variable needs a "SET" and lines don't end in a semicolon like nearly all descendants of "C". What makes it more difficult is that I'm also coding in PHP and JavaScript so there's a lot of swapping syntax back and forth as I go from system to system. Fortunately we're replacing the Content Management System that's based on Cache Script and going to PHP so I get to only program in two languages – it'll be like a mental vacation!

    The bottom line, though, is that if you learn the fundamentals of programming then for the most part the languages are just the particular dialect you need to get something done. Loops are still loops, Boolean logic is still AND and OR, data storage, string manipulation and more are all conceptually the same, it's just the phrasing used to accomplish the goals that changes. Over the course of my career I've worked in over 30 different languages – from machine code to object oriented languages and the fundamentals are always the same…

  24. Another useful thing is if you need your magic web application to not break under ANY circumstance, even if it outputs a potentially wrong or unusual answer. I mean of course there is almost a better way to do it regardless if you think hard enough, but thinking too much increases the entropy of the universe so…

  25. Weak typing is easier to learn, but when you are writing complex programs, it also makes it easier to introduce subtle bugs. Autoconverting between numbers and strings is especially problematic. In both PHP and JavaScript, which operate on the open Internet and pass/receive a lot of information as strings, it can result in security vulnerabilities: "this is a number, so I don't need to escape it in the SQL statement"… and then… boom! SQL injection vulnerability!

  26. For those who consider themselves "real programmers," it should be noted that the assembler barely recognizes types at all. Of course TRUE + TRUE = 2. "TRUE" is represented as a 1. And if you decide for the moment that that is an integer, 1 + 1 = 2;

  27. Why couldn't you make these videos last year when I had to do all this sorta stuff for my HND 😭 this is miles more useful than any of my 'teachers'

  28. Silly. "true + true" should clearly equal 3. Since a boolean isn't an integer, the plus sign should concatenate the two, resulting in binary 11.

  29. I like explicit typing because it can prevent my dumb arse from putting in stupid bugs. Also when you said you still mess up = and ==, I felt that

  30. Programmer here too. I use strict ones like Objective C, Java, Typescript, but I also work with weak ones like Swift, JavaScript, NodeJS, and mostly PHP. It's so hard to switch around some times. And frustrating working with weak ones and not know what went wrong. No error prompts on JavaScript, which is why I prefer coding in Typescript.

  31. I would add that interpreted languages actually offer MORE control and freedom over your programs. Anyone who says "not a real programmer" in 2020 needs a wider scope.

  32. I don't think weak typing is training wheels; on the contrary, you have to be all the more careful with it. It's more like removing security equipment to rush even faster at your own risk. It always makes me tick when people suggest JavaScript as a beginner language.

  33. "…in PHP, a language notable for many questionable design decisions…" I won't point out the obvious misstatement here, I will just say (IMHO) that the questionable (meta) design decision is choosing PHP.

  34. in most machines:
    declare false 0;
    declare true !false;
    I always thought that was clever. it prevents 2 and -1 from being an undefined behavior.

  36. Still not quite 1. True + True = (binary) 1+1 = 10, or if we are using fixed memory, 0. -1 is unusable due to ONE bit.

  37. now using the $(( )) in shell scripts you get some real fun: $(( true + true )) equals 0 as any lowercase letter is counted as 0.
    Now using something like bc (on linux specifically) and the real fun starts true + true = 0 while TRUE + TRUE = 19998.

    Essentially the characters are counted up so A represents 10 and z represents 35 you would expect AA to be 100 however it just remains 99 just like ZZ.
    Moral of the story here don't use letters in bc. Even going for hex does not work as FFF does not equal 255

  38. TRUE + TRUE doesn't equal anything, you can't title this as if it's about data types and act like data is universally processed in all cases. C literally has no concept of a boolean traditionally, treating 0 as false and anything else as true. You can't add booleans because they don't exist, and TRUE+TRUE is literally the universal set of all assignable data values, because -1 + 1 is TRUE + TRUE and = FALSE

  39. 5:18
    Hey, sorry. I have about a year of programming experience and only with Python.
    Is that also true with it?
    Could someone explain a bit more?

  40. So, wouldn't implicit types be "error proof" if the language had explicit operators instead?

    Like "+" only for summing while "&" only for concatenating? Use an hypothetical "#=" for comparing as values, "$=" for comparing as strings, "%=" as booleans, and so on…?

    It seems to me the problem with implicit types arise because the same operators are used to do different things for different data types. They accumulate different functions under the same symbol and cause havoc.

  41. There are so many arguments about "weak typing" vs "strong typing", but the argument always seems to apply exclusively to low-level internals / language builtins. Once you start getting even slightly higher level than that, almost everyone is happy to weakly type everything as those pre-existing low-level types.

    a "string", for example, could contain: a YouTube comment, a person's name, an XML document, etc, etc, etc. But these are almost always treated as "just plain strings" until they pass through some critical point of converting them into an entirely different representation.

    a string which describes an XML document should rarely be the same high-level type as a string containing a person's name, but until someone says parseXml(TheDocumentString), that variable is almost certainly being called a "String".

    I understand there are reasons, both in terms of performance and in terms of programmer sanity, that it's just "better" to use the simple solution of sticking data into whatever low-level type it will fit into, when it will fit into that type. But I hate that that sort of thing is considered to be "strongly typed" so long as it considers a string and an integer to be different things.

  42. True + True isn't always 2.
    In C "not-zero" is considered true so you could have 255 internally (true) added to 1 (another true) and get 0 as a result (Which would be false)
