The lost art of telnet


A long time ago, on an operating system far,
far away...




It was a period of protocol war.
Raw text connections sent
over the internet had
gained a strong following.

During this time, greater
needs and complexity arose
eventually leading to the
wide adoption of the
HyperText Transfer Protocol.

Knowledge of how the web worked
on these lower levels soon
vanished. It is up to you to
learn the old ways and restore
freedom to the internet...



There's something we used to do back in the early days of the wild wild web, before HTTP was the only game in town. It was called telnetting. Today it's known only in passing to some programmers and to a handful of old-school hackers who might still use it.

We lost something when we stopped using telnet. We stopped understanding that protocols are higher abstractions over raw text, how fragile things are, and how simple they might be.

What is telnet?

Okay, so I'll admit telnet is actually the wrong technical term for it, but it's how a lot of early hackers learnt about protocols and did neat things. For all intents and purposes we're talking about sending raw text over TCP/IP. To be totally honest, this is a bit about internet protocols.
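To make "sending raw text over TCP/IP" concrete, here's a minimal sketch in JavaScript. It assumes Node.js and its built-in `net` module, and the helper name `rawExchange` is hypothetical. The helper opens a connection, writes your text exactly as given, and hands back whatever the server says until it hangs up. That's essentially what you're doing when you telnet to a port by hand:

```javascript
const net = require("net");

// Open a plain TCP connection, write `text` exactly as given, and resolve
// with everything the server sends back before it closes the connection.
function rawExchange(host, port, text) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = net.connect(port, host, () => socket.write(text));
    socket.on("data", (chunk) => chunks.push(chunk));
    socket.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    socket.on("error", reject);
  });
}
```

Pointed at a web server's port 80 with a hand-typed request, this should return the raw response, headers and all. It only resolves once the server closes the connection, so for HTTP you'd include a `Connection: close` header.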

Fun and games

One of the coolest things I remember, even around the turn of the millennium, was something called Multi-User-*, where the * was a placeholder for "Created Kingdoms" or "Dungeons". They were called MUDs, MOOs, MUSHes and MUCKs. They're a dying breed, born in the late 1970s. By now, with fancy web scripts and CSS tricks, most people have entirely lost interest in them.

There were no graphics or special data encodings, just a dumb box of text that you got to play with (for hours and hours). It may be hard to see why people would bother with these things. Eventually a handful of the servers did branch out and add images, but generally they weren't a feature.

I really got into them at an early age because one particular type of server exposed several programming languages to the player. You could log in and start writing code that manipulated the game world, and other users could walk into your part of the world and start to play with it. You could program anything you wanted while playing the game with others, and then export that code to another server somewhere. I actually ran one of the most popular sites on something called MPI back when I was about 12 years old, but that's long defunct now. Others played the game for the adventure of user-created worlds and the plots other players wrote. That's right: user-created content was the focus of the internet back in the 1970s, before I was even born. Web 2.0 and MMORPGs: go eat your heart out.


Well, wash my arm! Stren Withel will rue the day he eyed me over and started flipping tables at me!

It was a great environment for me to learn in, because it was both a chat room (like IRC) and a game world that I could explore and interact with.

The above is a snapshot of the Discworld MUD, where the theme is based on the famous Discworld series by Terry Pratchett! It's been on-line for almost 20 years now, and there are roughly 50-150 players logged in at any time.

The MUCK I was a big part of was actually the NMC project (a fork of a popular MUCK server). As far as I know, the only NMC server still around is Redwall MUCK. You might enjoy it, as it has a web-based interface that lets you connect from a browser. One of the really cool things about that MUCK is that Brian Jacques, author of the Redwall series, gave his permission for it to operate before his passing last year.

We don't really have this sort of thing on the Web as we know it. I mean, yeah sure, we have crummy phpBB installs where some people role play or do fantasy football. If you want a cheap laugh you can hop over to Kongregate and play Don't shit your pants, but you know what? It's not quite the same thing.


Sweet, I got a crown after unlocking all achievements.


There was something more engaging about using my imagination with 10-150 people, all of us sharing and creating. Somewhere along the line, people came to enjoy the Skinner-box rewards of killing sheep in MMORPGs, at ten to fifteen bucks a month, more than a text-based game where you have to create your own world and plot. Anyone who's experienced an addiction to NetHack knows just how fun these games can be.

Knowledge

One of the really cool things that existed was the gopher network. Before Wikipedia, a network of gopher servers published menus and documents to plain gopher clients. We still have a few today.

It sounds a bit silly, but I wish the idea behind gopher had won. If it had, we wouldn't be in the "over-design-everything" mess we're in now. You can check out the gopher manifesto over here, which honestly reads more like a post-mortem than anything. To summarize both of those: if we had gopher, we would have websites that deliver pure content without any of the design chrome or spam.

Pure media (sounds, videos, graphics, text, whatever) without forcing a heavy markup layout on us just to send the content. It's what the static web needs to be: just the knowledge. Not the sparkling prettiness of bad web design, and not the spam-filled "news" sites that flood the net today (one paragraph per page with ten ads and irrelevant noise). It's tragic the protocol died so long ago. Had it evolved with the same effort HTTP received, we might have seen a very different net today, a cleaner one with less spam and fewer marketing ads flying around. Perhaps the coolest thing of all is that it would have made this whole "mobile site" thing a total non-issue. We'd only be delivering the content, not the flood of menus, pretty backgrounds, background music, and flashing animated advertisements.

There are still some gopher servers out there. Unfortunately, the technology has remained in near-stasis since 1991. To appreciate it, imagine you're back on a dial-up modem and JavaScript won't be invented for another four years.
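Part of gopher's charm is how little there is to it. Per RFC 1436, a client connects (usually to port 70) and sends a selector string followed by CRLF. A menu comes back as tab-separated lines, ending with a line holding a lone ".". Here's a minimal sketch of building a request and parsing such a menu in JavaScript. The function names and the sample menu are made up for illustration:

```javascript
// What a gopher client sends: just the selector string and a CRLF.
// An empty selector asks for the server's top-level menu.
function gopherRequest(selector = "") {
  return selector + "\r\n";
}

// One menu line: a one-character item type plus display text,
// then selector, host and port, all separated by tabs.
function parseMenuLine(line) {
  const [typeAndDisplay, selector = "", host = "", port = ""] = line.split("\t");
  return {
    type: typeAndDisplay.charAt(0), // e.g. "0" = text file, "1" = another menu
    display: typeAndDisplay.slice(1),
    selector,
    host,
    port: Number(port),
  };
}

// A whole menu: every line up to the lone "." that marks the end.
function parseMenu(text) {
  const lines = text.split("\r\n");
  const end = lines.indexOf(".");
  return (end === -1 ? lines : lines.slice(0, end))
    .filter((line) => line !== "")
    .map(parseMenuLine);
}

const menu = parseMenu(
  "0About this server\t/about.txt\tgopher.example.org\t70\r\n" +
    "1Comics\t/comics\tgopher.example.org\t70\r\n" +
    ".\r\n"
);
console.log(menu[1].type, menu[1].display, menu[1].selector); // 1 Comics /comics
```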

Using lynx to browse a gopher site for everyone's favourite web comic.

There are a few knowledge- and communication-based protocols floating around. One is IRC, which has survived very well. Most technically inclined people have likely used it before, so I won't go into detail there.

Email is another great one. What could be simpler than SMTP? It has an interactive help system, and you literally greet the server with "HELO". It also lets you write an email on the spot, line by line, rather than in a single burst of data.
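To show how little is hiding behind an email client, here's a sketch of the lines a client sends to an SMTP server to deliver one message. It follows the command names in RFC 5321. The helper names and addresses are hypothetical. A real session also waits for the server's numeric reply after each command (such as 250, or 354 after DATA), which this sketch skips:

```javascript
// A body line starting with "." gets an extra "." so the server doesn't
// mistake it for the end-of-message marker (RFC 5321 "transparency").
function dotStuff(body) {
  return body
    .split("\r\n")
    .map((line) => (line.startsWith(".") ? "." + line : line))
    .join("\r\n");
}

// The commands for one message, in order; each is sent followed by CRLF.
function smtpCommands(clientName, from, to, message) {
  return [
    `HELO ${clientName}`,
    `MAIL FROM:<${from}>`,
    `RCPT TO:<${to}>`,
    "DATA",
    dotStuff(message) + "\r\n.", // a line holding only "." ends the message
    "QUIT",
  ];
}

const session = smtpCommands(
  "client.example.org",
  "me@example.org",
  "you@example.com",
  "Subject: Hello\r\n\r\nSent by hand, one line at a time."
);
console.log(session.join("\r\n") + "\r\n");
```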

For fun, for profit.

Knowing all of this is interesting, sure, but what do you really get out of it?

Well, for starters, you gain an appreciation for how the internet is not the web, and especially not the web browser. It's so much more, so much richer, and has so much greater potential than the web! If you understand telnet, you understand that the internet isn't a magic box that renders HTML strings into pretty pictures. That's only what happens after the internet has done its job.

You understand the internet. You understand that every time you open a web address, you send raw text that looks like "GET /Resource" at its very core. There is no magic going on, and it's not a closed black box beyond your understanding. It's so dead-simple you could make something just like it.
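As a sketch of that raw text, here's a hypothetical JavaScript helper that builds a minimal HTTP/1.1 GET request. HTTP/1.1 requires the Host header. "Connection: close" is an optional header that asks the server to hang up after replying, which keeps a hand-rolled client simple:

```javascript
// The raw text of a minimal HTTP/1.1 GET request. Lines end in CRLF,
// and an empty line marks the end of the headers.
function buildGetRequest(host, path = "/") {
  return [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    "Connection: close",
    "",
    "",
  ].join("\r\n");
}

console.log(JSON.stringify(buildGetRequest("example.com", "/Resource")));
// "GET /Resource HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
```

If you type that text into a raw connection to port 80 of a web server, you get an HTTP response back. A browser sends the same kind of request, just with more headers.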

Wireshark is a great tool for inspecting network traffic.
This is some of the raw text that was sent when I loaded Reddit.

So why did we suddenly stop writing cool and inventive protocols and start stuffing everything into an HTTP service? There are a few good reasons. All of these mixed protocols led to total madness on the web. We didn't really have a common ground to do cool things on, and you needed lots of different software to speak lots of different protocols.

Of course, the web browser was originally meant to be the common user interface for many protocols, but it ended up being the HTTP/FTP browser. HTTP didn't end up being the best tool for everything, merely a tool for everything. Today we have the SOAP monster and RESTful API services. We also have some strange APIs that are neither but still run if the sun and the moon line up the right way. HTTP is nice in that it's brought a lot of normalization to the internet. But it has also halted innovation in anything that isn't an HTTP/HTML-driven web. You may find it amazing that over twenty years ago, some smart people at Xerox PARC were talking about social media and personalized user interfaces. Both have only recently appeared on the web. They were talking about doing this at the protocol level with something called WAIS. If you have an hour, you should watch this lecture from 1991 (right when the web started) and ask yourself why it has taken so long to finally get some of these things.

We don't always need to write code for the HTTP/HTML internet to do fun hacking projects, but it rarely happens any other way. My guess is that most people simply don't realize they can make their own protocols. Or they can't see a reason to step outside HTTP and accomplish something in a different way, the way FTP, Gopher, SMTP, or even IRC do. After all, the functionality of all those protocols can be emulated over HTTP with a RESTful or SOAP API.

Not enough people understand them

Not because they're complicated. As you've seen here, some are dead-simple. They aren't understood because they aren't thought about. They're out of sight, tucked away inside the internals of your browser, email client, or other software. It just works. Around 2005, HTTP/HTML was reaching a tipping point. It was hard to write interactive, desktop-like apps on the web, even though the advantages of the web were promising. If AJAX and JavaScript hadn't come in and matured the user experience of the web as we knew it, we may very well have ended up with a new protocol by now. That protocol would have done animations and rich interfaces as part of the protocol and encoding.

What am I suggesting? Learn the protocols. I'm willing to write about any specific protocol in more detail if there's a high demand for it. You can write a POP3 email client in less than 250 lines of code.
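To give a taste of how small POP3 is, here's a sketch (not a full client) of the response-parsing half in JavaScript, following RFC 1939:

- Single-line replies start with "+OK" or "-ERR".
- STAT answers with a message count and a total mailbox size.
- Multi-line replies such as RETR end with a lone "." line. The server adds an extra "." to any other line that starts with ".".

The function names are hypothetical:

```javascript
// Single-line POP3 replies start with "+OK" (success) or "-ERR" (failure).
function parseStatus(line) {
  if (line.startsWith("+OK")) return { ok: true, text: line.slice(3).trim() };
  if (line.startsWith("-ERR")) return { ok: false, text: line.slice(4).trim() };
  throw new Error("not a POP3 status line: " + line);
}

// STAT replies "+OK <message count> <mailbox size in octets>".
function parseStat(line) {
  const { ok, text } = parseStatus(line);
  if (!ok) throw new Error("STAT failed: " + text);
  const [count, size] = text.split(" ").map(Number);
  return { count, size };
}

// Multi-line replies (LIST, RETR, ...) end with a line holding only ".".
// Any other line starting with "." had one extra "." added, so strip it.
function parseMultiline(reply) {
  const [first, ...rest] = reply.split("\r\n");
  const lines = [];
  for (const line of rest) {
    if (line === ".") break;
    lines.push(line.startsWith(".") ? line.slice(1) : line);
  }
  return { ...parseStatus(first), lines };
}

console.log(parseStat("+OK 2 320")); // { count: 2, size: 320 }
```

The sending side is just as plain. Commands such as `USER`, `PASS`, `STAT`, `RETR 1` and `QUIT` are each followed by CRLF.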

What would I really like to see? More hacker-inspired innovation in the protocol arena, where people invent things just to push boundaries and see how something else might work. I want this for a few reasons. Politicians have decided hyperlinking to pirated content is illegal, some decide blocking DNS makes sense, others say encryption is illegal... Those of us who truly understand the internet understand how asinine these policies are. The other reason I want to see more playing around is simply so we don't keep working ourselves into one corner of technology. The web is slowly collapsing into a single protocol that was never designed with the entire web in mind. We're creating feature creep simply because we push so hard to get everything into a browser that only supports HTTP and HTML.

What else might we have if we didn't have the HTML/HTTP web?
  • We could build faster websites if only we had a protocol that sent pre-computed DOM trees rather than HTML documents.
  • We could send multiple documents at once, rather than sending one document that tells the browser which further documents it needs to request from the server.
  • We could make a protocol specifically for discussion sites, so places like Reddit might exist in Telnet, unwrapped from the chrome of the web, and can work as a messaging service.
  • We could make everything a JSON object, because that's the hip thing right now.
  • We could send content in TeX format.
  • What else would you do?
We don't have new protocols because of the inertia HTTP has built up, but that doesn't mean you, me, and all the enthusiastic hackers around here can't take our internet back from the people who don't understand it.






What's in a name database?

All too often I see an application cause users to scream bloody murder because its author didn't know about some of the material I'm about to cover here. It took me a lot of time and reading to fully realize these things. They aren't intuitive, and they're also culture dependent. Hopefully you can follow my findings and save yourself the pain of bad design, and the time spent reading about cultural differences.

Many websites ask us for our name, under the assumption that they need this sort of information. Of course they do. How else would they personalize emails to us, as if they were our dear friends who care so deeply about us?


Does anyone actually feel special when getting emails from some random forum they posted a question on 5 years ago? I'm glad they reminded me they're #1 on my birthday. Well wash my arm folks, I'm touched!

Nobody on the planet can justify requiring a name for sending these lame emails.

So, what's your name?

Surprisingly, this is actually the wrong question to ask someone. Let me quickly explain what a name really is. In German (one of the languages that helped form English), you don't normally ask someone "What is your name?". More literally, you ask them "What are you called?". A name is simply some sounds and letters we use to refer to someone. Historically, in many cultures it wasn't unheard of to go by several names at the same time, or to take new ones later. You may have heard William Shakespeare referred to as The Bard, or of William Shatner going by Bill with closer friends. Such nicknames are still very common today, if sometimes goofy.

It's not unheard of for people to change their names either, obviously so with married persons in modern western cultures, but even historically thousands of years ago, entire names would change to signify a major life event. Names are not always inherited either, such as the tradition of ancient Hebrew cultures where there was no "last name" or "family name," you were simply David, son of Jesse, or in the Sikh tradition where men will take the last name Singh upon baptism.

The question I'd like to go back to is, why are you asking for this person's name?

What is a name?

A name is a reference to a person relative to another group of people at a specific time. My friends call me BDG and write it on my birthday card, my boss calls me Brian, the government calls me Mr. Graham. If I change my name, my friends will still probably call me BDG, my boss might call me both names, and the government will bill me for the name-change processing fees.

So, again, a name is flexible, it changes, and it can be a number of things, to anyone, at any time. So before you ask someone for their name, you need to tell yourself first why you need their name. That's right, scope out requirements rather than mindlessly collect data.

If you've been paying attention to registration forms, you'll see they come in all types. Facebook has changed its signup form and run its big data crunching over names hundreds of times. Twitter's needs are actually very straightforward: you give a "Full Name", a Twitter username, an email, and a password.


Speaking strictly as an end-user for a moment here. I don't know you or your service/company on any personal level, so I don't really trust you. I don't want to tell you anything about me because you're just going to spam me on my birthday with some email. Why you would ever need my name is frankly beyond me. You'd better have a great reason that's valuable to me as an end-user, and not you as an advertising or data-collecting agency.

Sites that treat my "Government of Canada name" as important without giving me a clear reason simply get a fake name, such as "Bruce Oxford". That will happen to you when you ask for my name only so your emails can say "Dear Bruce" in the subject line, rather than for a valid reason I care about. Email greetings generated by a robot are stupid; my name on a parcel is not.

You don't need my name, stop asking for it. Provide me whatever service it is you do and leave me the hell alone after that. Nobody on this planet is naïve enough to think that some website mass email script included me because they want to be my friend, and making money had nothing to do with it.

I actually do need your name.

I once saw an email a customer sent to a travel agency. The agency was collecting the data needed to book individual flights and hotels. It might seem obvious to us that he would need to give them his name and address, for things like processing entry visa requirements for China. He didn't understand this, and simply replied with a very short email asking "Are you totally high?! I'm not giving that out!!". Yes, he was an extreme case, but there's something to be learnt here. People care about their information, so explain to them why you need it, even if it's obvious to you. Let them know you're not selling it to some spam farm in Russia for twenty kopeks.

Ask for their name after you've ensured there's a value to them in having it (and doing mental gymnastics doesn't count, it's a strong sign you need to immediately stop the mistake you're about to throw onto your customers).

Ask them for the name you need. You may need the most widely recognized government name, or simply a pseudonym like "Incognito". If you don't actually need it for any regulatory purpose, let them pick something. Even better, don't use names at all. The emails from a service I pay money for don't include my name, and I couldn't be more thrilled about it.



Okay, so we can now ask them for their real name?

You have a real reason now that actually provides value to your users? Of course you can! So make your database tables with your three fields like you were about to.
  • First name
  • Middle name
  • Last name
And now let's have the former German Minister of Defence come along and type in his name: Karl Theodor Maria Nikolaus Johann Jacob Philipp Franz Joseph Sylvester Freiherr von und zu Guttenberg.

Yes, our friend Karl is an exceptional case, being of nobility. So please ensure your database cannot accept any members of the nobility, and stick with the simple, traditional, proper, modern Anglo-Saxon naming convention of a first name, a middle name, and a last name. Anyone who doesn't fit this naming scheme is a person whose business you most certainly don't want.

Over in Hungary, names are different from how you might expect them to be. If your name is Bob Smith Magyar, in Hungary you would be known as Magyar Bob Smith. Their culture expects the family name to come before the given names. Many Asian cultures have similar conventions. Take my friend Dai Wai, who commonly goes by Dave: he doesn't write his name in Chinese characters over here, although back home and on his passport he does. In some Malay, Korean and Arab cultures, it may be the custom to refer to a person as the parent of their children. Rather than the Hebrew tradition of "David, son of Jesse", one could expect "David's father".

I'm not the only person saying this stuff either. The W3C has a lengthy document describing how names differ around the world, what the implications are, and how to design for this concept.

Now, if for some reason you actually agree that a naming convention is a silly reason to not just take someone's money they're willing to give you, keep reading.

We can't do three names, I get it. What do I do?

We have two options here. The first is to add at least fifty more fields to the database to deal with this. (By the way, "von und zu Guttenberg" is German for "from and to Guttenberg", so the spaces don't mean it's another name.) The second is to go back to the question I keep telling you to answer: why do you need this name?

Most of you don't need any names from an end-user. You can work off some random user-handle or something and let us keep what shred of privacy we have left. Just let me change it later if I need to signify a change in my life.

So let's accept that there is no such thing in the international world as a "first" name. Hungarian first names are the family name; English first names are the given name. There's commonly a Given Name (that is, the name that was given to you, or that you gave yourself) and a Family Name (more formally, the Surname).

So, ask for, only when needed, a place for the following:
  • Given Names (all of the names you were given)
  • Surname (the name of a family you belong to)
You can't make them required. People living in Indonesia, Afghanistan, Tibet, and South India commonly have no surname at all. Here in Canada, we have a senator named "Nancy Ruth", where Ruth is actually not her surname. Nancy formally renounced her surname of "Jackman", leaving only her given names. Nor is this unheard of: many people all over the globe renounce their surname on paper.

That's it. Just ask for those two names. If an entry doesn't meet your guidelines, have a flag go off so someone can review it. Let people enter all the accents, spaces, and other marks they need to.
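Here's a minimal JavaScript sketch of that idea. The record shape, the `reviewFlags` checks, and the 200-character threshold are all hypothetical choices for illustration. The point is threefold: names are stored exactly as typed, neither part is required, and odd input gets flagged for a human rather than rejected. A display helper with a family-name-first option covers conventions like the Hungarian one above:

```javascript
// Both parts are optional and kept exactly as typed: no trimming,
// no case changes, no pattern matching.
function makeNameRecord(givenNames = "", surname = "") {
  return { givenNames, surname };
}

// Reasons a person might want to look at this record. Nothing here
// blocks the signup; an empty list means nothing looked unusual.
function reviewFlags({ givenNames, surname }, maxLength = 200) {
  const flags = [];
  const all = givenNames + surname;
  if (all.trim() === "") flags.push("no name given");
  if (all.length > maxLength) flags.push("unusually long");
  for (const ch of all) {
    const code = ch.codePointAt(0);
    if (code < 0x20 || code === 0x7f) {
      flags.push("contains control characters");
      break;
    }
  }
  return flags;
}

// Ordering is a display choice only; the stored record never changes.
function displayName({ givenNames, surname }, familyNameFirst = false) {
  const parts = familyNameFirst ? [surname, givenNames] : [givenNames, surname];
  return parts.filter((part) => part !== "").join(" ");
}

console.log(displayName(makeNameRecord("Bob Smith", "Magyar"), true)); // Magyar Bob Smith
console.log(reviewFlags(makeNameRecord("Nancy Ruth")));                 // []
```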

Don't be dumb

I want to warn you here, because you might not see it elsewhere. I see a lot of people bang out some code they think is suitable for taking name input, only to have their database crash when Jack O'Brian comes along and enters his name. That happens because the apostrophe ends up inside a hand-built SQL string. You need to understand SQL injection, so go figure that out (start searching the web; it's a big topic). Don't pass names through a regex, and don't force capitalization or lower case: "McDonald" is a valid name, and so is "DeLuca". Don't play around with people's names once they give them to you.

Don't use real names when testing your database, either. Make sure you have fake test data. The last thing you want is a real person's data (especially your own) being abused at some point. One service I use is the Fake Name Generator. It doesn't specialize in edge cases that push the limits of what a name is, but it's an excellent tool for generating large sets of fake people.

One last note about Honorifics.

There's a huge mistake I see all the time, where programmers don't know what Honorifics are. You may have never heard of the word, but you do know some of them.

If you see the name "Mr. Oxford", the "Mr." is the honorific. If you read through this post and thought names were complicated, you have no idea how bad these get. My advice, much like with names, is to avoid dealing with them.

Sometimes, you might need to because of a regulation. My advice in this case is stick with the normal honorific system of your country of operation. If you're in Japan, you shouldn't worry about Mr. Oxford, or if this is Oxford-sama or Oxford-chan or any of the millions of combinations over all of the cultures out there.

The honorific is truly worse than a name. It changes depending on age, social status, marital status, and perhaps other things. Keep it to a simple list that meets the minimum requirements of whatever regulation you're working with. If you don't need them, don't do it.

I've seen forms list this as a "salutation" input for some reason, and no, it's an honorific. I understand honorific is a long word, so simply put "title" which is close enough that English readers understand the context.

Tl;dr
  • Don't ever ask me for a name
  • Don't ever ask me for a title
  • Seriously, don't ask.
  • What type of name do you need from me?
  • Okay, explain why you need it and only if you really do need it
  • Support any text as "Given Names" and "Surname". Ideally you support all proper names.
  • Support as few honorific titles as you need to. Ideally you support 0.
  • The scheme for names changes quite frequently through history, and culture.








ProjectEuler: 001

Project Euler

I've decided to take on Project Euler, as a way for me to improve my programming skills, force me to grow math skills, and train myself to think more about algorithms.

My purpose is to push myself, not to take the "yeah, I can do that" specific approach here. I want to solve the general problem, not the specific one. Solving a fizz buzz is easy, but solving the general problem effectively requires a bit more thinking.

If you're reading this and interested in writing code, I suggest you try the problems before reading my notes. Problem solving is not a spectator sport.

If I wanted to do this the easy way, in which I learnt nothing, I would simply paste the first dozen or so problems into Wolfram Alpha, because it calculates the answer for us.

Problem 001
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
This looks like a simple "fizz buzz" problem, but I'm curious whether I can solve it with an equation rather than a loop. The loop, to me, would be taking the easy way out. I'm better at writing code to solve problems than I am at writing math to solve them.

My old way

To do it the old way, I would simply create one loop that adds each number to a running sum if it's divisible by 3 or by 5, adding numbers divisible by both only once. I like JavaScript, especially because you can show it online.

This took me 5 minutes to write out, and you can see the live code at JSFiddle.net
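The JSFiddle isn't reproduced here, so here's a minimal sketch of that single loop in JavaScript (`sumOf3sAnd5sLoop` is a hypothetical name):

```javascript
// Brute force: walk every number below `limit` and add the ones divisible
// by 3 or by 5. The single `||` test means 15, 30, ... are only added once.
function sumOf3sAnd5sLoop(limit) {
  let sum = 0;
  for (let i = 1; i < limit; i++) {
    if (i % 3 === 0 || i % 5 === 0) sum += i;
  }
  return sum;
}

console.log(sumOf3sAnd5sLoop(10));   // 23
console.log(sumOf3sAnd5sLoop(1000)); // 233168
```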

Solving every set of multiple numbers 
(between 1 and any number greater than the lowest multiple)

Like I said, that's my easy way out and I've learnt nothing. I could do some really neat code re-use patterns here to make this capable of any combination of multiples.
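Here's a sketch of that generalized loop in JavaScript (the name `sumOfMultiplesBruteForce` is hypothetical). It accepts any list of multiples, at the cost of touching every number below the limit:

```javascript
// Brute force for any list of multiples: add n when at least one of them
// divides it. Work grows with limit × multiples.length.
function sumOfMultiplesBruteForce(multiples, limit) {
  let sum = 0;
  for (let n = 1; n < limit; n++) {
    if (multiples.some((m) => n % m === 0)) sum += n;
  }
  return sum;
}

console.log(sumOfMultiplesBruteForce([3, 5], 1000)); // 233168
console.log(sumOfMultiplesBruteForce([4, 6], 13));   // 30 (4 + 6 + 8 + 12)
```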

Pushing myself to think a little

So how else might I try to solve this? Thinking back to high-school math, I remember sums of arithmetic and geometric series. Arithmetic series are useful for things like "add all the numbers in 1..n together". This may be useful, because 3, 6, 9, 12, 15... and 5, 10, 15, 20... are both arithmetic sequences, progressing by 3 and 5 respectively.

Now, we can calculate them separately, but I'm not sure how to count them so that numbers like 15 aren't counted twice. Generally, the sum of an arithmetic sequence of n terms, from a first term a_1 to a last term a_n, is:

$$S_n = \frac{n}{2}\left(a_1 + a_n\right)$$

Basically, you add the first and last terms of a series and multiply by half the number of elements. So "one plus one hundred, times fifty (half of one hundred)" is the sum of all the numbers from one to one hundred. That's quite a bit easier than writing out 100 numbers and adding them.
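As a quick check of that formula, here's a tiny JavaScript helper (the name `arithmeticSeriesSum` is hypothetical):

```javascript
// Sum of an arithmetic series from its first term, last term and term count:
// count / 2 × (first + last). Multiplying before dividing keeps the result
// an exact integer for integer series.
function arithmeticSeriesSum(first, last, count) {
  return (count * (first + last)) / 2;
}

console.log(arithmeticSeriesSum(1, 100, 100)); // 5050
console.log(arithmeticSeriesSum(3, 999, 333)); // 166833 (the multiples of 3 below 1000)
```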

We could add the series for 3 and the series for 5. But then we'd add the overlapping numbers, the multiples of both 3 and 5, twice. So how can we subtract those? What numbers are common to 3 and 5? A number that's a multiple of both 3 and 5 is always a multiple of 15. So we can add the 3s and the 5s, then subtract the 15s to compensate.

Now here we are, using no loops or IF statements, simple functions and mathematics to figure out the sum.
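The original code isn't reproduced here, so here's a minimal sketch of that idea in JavaScript, with hypothetical function names. There are n = floor((limit - 1) / k) multiples of k below the limit. Plugging first term k and last term nk into the series formula gives k × n × (n + 1) / 2:

```javascript
// Sum of the multiples of k strictly below limit, with no loop.
// There are n = floor((limit - 1) / k) of them: k, 2k, ..., nk.
// The series formula n / 2 × (k + nk) simplifies to k × n × (n + 1) / 2.
function sumOfMultiplesOf(k, limit) {
  const n = Math.floor((limit - 1) / k);
  return (k * n * (n + 1)) / 2;
}

// Multiples of 3 plus multiples of 5, minus the multiples of 15 that
// were counted in both.
function sumOf3sAnd5s(limit) {
  return sumOfMultiplesOf(3, limit) + sumOfMultiplesOf(5, limit) - sumOfMultiplesOf(15, limit);
}

console.log(sumOf3sAnd5s(10));   // 23
console.log(sumOf3sAnd5s(1000)); // 233168
```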

How do I solve all of the combinations like I did previously?

Well, in the above example, 15 is the least common multiple of both 3 and 5 (multiplied, they are 15), so when we sum the small sets and negate the over-lapping numbers once, we find our value.

Thinking back to set theory and Venn diagrams, I imagine that to make that function expand to any set of multiples, we would need to sum all the individual series and then correct for every combination of series that overlaps. In other words, if we were adding the multiples of a, b, and c, we would add the series sums of a, b, and c. Then we would subtract the series sums of ab, ac, and bc. Finally, we would add back the series sum of abc, because the subtractions removed it once too often.

I understand how to do that using two for loops, but again, that's the easy way out. We aren't going to make the computer work harder because I'm ignorant of combinations and set theory. So, let's bring search engines up to maximum thrust.

I found two answers on my favorite Q&A community:
Most of the answers on the programmer site seem to be using loops or recursive solutions. I decided to ask my own question and I got a really nice answer back. In essence, these all can be thought of as binary lists of values, and the only thing I'm doing is subtracting parts of the total set to find another part.

This really made me think of something interesting.

Instead of adding the first set and subtracting the second, I should sum over all 2^k - 1 non-empty combinations, where k is the number of multiples. Each combination represents its own subsection of these sets, as if they were on a Venn diagram.

Let's say I have the set of multiples {3, 5, 7} for the values 1..999 inclusive:
(Venn diagram of the multiples of 3, 5 and 7, generated by the Google Charts API)

Let's abstract the numbers out, so instead of {3, 5, 7} the list is {A, B, C}. My binary list would then be:

001 = C
010 = B
011 = BC
100 = A
101 = AC
110 = AB
111 = ABC

The individual circles are the single values. Intersections of circles are intersections of the values. The center of all three circles is ABC.

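Using this concept, I could combine the values in each combination into a single multiple. For 3, 5 and 7, which share no factors, that's just their product; in general it's their least common multiple. Then I could do any calculation on any region of the Venn diagram I want, even picking and choosing specific intersections.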

I just figured out something really cool there. Think about it: rather than checking every single number in a range against every single divisor, all I'm doing is working out each series' sum without walking through it, and adding those sums together. So let's put it into code.

What's really exciting here is that I have a good reason to think about bitwise operators for the first time since I wrote x86 assembly code.

I've written a bit of code here that shows how we can generate these combinations. I'd like to see if I can improve on it somehow, as it seems the second for-loop could be removed. It takes ["a", "b", "c", "d"] and returns ["a", "b", "ab", "c", "ac", "bc", "abc", "d", "ad", "bd", "abd", "cd", "acd", "bcd", "abcd"] (all non-empty combinations of ABCD). The problem now is how to combine these values correctly.
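The post's own code isn't reproduced here. As a minimal sketch of the same technique in JavaScript, count from 1 to 2^n - 1 and treat each count as a bitmask; this produces exactly that ordering. A few notes:

- Bit 0 stands for the first item here, the reverse of the A-is-the-high-bit table above.
- `1 << n` limits this to about 30 items.
- `getCombinations` is a hypothetical reconstruction, not the original function.

```javascript
// Every non-empty subset of `items`, found by counting from 1 to 2^n - 1
// and reading each counter value as a bitmask: bit i set means items[i] is in.
function getCombinations(items) {
  const result = [];
  for (let mask = 1; mask < 1 << items.length; mask++) {
    let combo = "";
    for (let i = 0; i < items.length; i++) {
      if (mask & (1 << i)) combo += items[i];
    }
    result.push(combo);
  }
  return result;
}

console.log(getCombinations(["a", "b", "c"])); // [ 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc' ]
```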

The inclusion-exclusion principle, as it turns out, perfectly describes how to find the value for the union of all the sets while discounting the duplicate data correctly. The equation on the Wikipedia page is somewhat complicated for someone who isn't literate in all the symbols, so I will attempt to rewrite it and explain what's going on.


On Wikipedia, there is a "general" formula for the size of the union of any number of sets. The cup-shaped symbol, ∪, represents a "union." For a union of n sets, A_1 through A_n, the compact form of the equation is:

$$\left|\bigcup_{i=1}^{n} A_i\right| = \sum_{k=1}^{n} (-1)^{k+1} \left( \sum_{1 \le i_1 < \cdots < i_k \le n} \left|A_{i_1} \cap \cdots \cap A_{i_k}\right| \right)$$

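That equation says the size of the union of n sets is a sum over every combination of the sets. Each combination's intersection is added or subtracted depending on its cardinality, meaning how many sets are in the combination. That sounds complicated, so let's expand it and write it out the long way:

$$\left|\bigcup_{i=1}^{n} A_i\right| = \sum_{i=1}^{n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| - \cdots + (-1)^{n-1} \left|A_1 \cap \cdots \cap A_n\right|$$
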
Okay, it's still a bit confusing, so let's walk through it with four sets, A, B, C and D. The union on the left is the same as above. The upside-down cup, ∩, is a cap, meaning intersection. The terms on the right work like this:

  • The first sum adds the individual sets: A + B + C + D.
  • The second subtracts the pairwise intersections: AB, AC, AD, BC, BD and CD.
  • The third adds back the triple intersections: ABC, ABD, ACD and BCD.
  • The last term subtracts ABCD.

The pattern is that we alternate between adding and subtracting a sum based on the cardinality, meaning how many sets are in the combination. Odd cardinalities are added, even cardinalities are subtracted, and within each cardinality we sum every combination. The same pattern holds when the "size" of a set is the sum of its elements rather than a count of them, which is exactly what this problem needs.
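Here is the {3, 5, 7} example worked through for values below 1000. Let S(k) be the sum of the multiples of k below 1000, computed with the arithmetic series formula as S(k) = k·n(n+1)/2 with n = ⌊999/k⌋. Because 3, 5 and 7 share no factors, each intersection is simply the multiples of their product. The three-set form of the principle then gives:

```latex
\begin{aligned}
\text{sum} &= S(3) + S(5) + S(7) - S(15) - S(21) - S(35) + S(105) \\
           &= 166833 + 99500 + 71071 - 33165 - 23688 - 14210 + 4725 \\
           &= 271066
\end{aligned}
```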


So we need to express this process in code. I could iterate over each set, but I know another trick. Each bitmask already encodes a combination, so its cardinality is just the number of bits that are switched on. I wasn't sure how to express why this works mathematically, so I posed a question on the Math Stack Exchange site. I can show that it works, and if you look at it, it makes sense. Writing the equation's general form is presently beyond me, but writing the code is dead-simple.

So all we do is add or subtract a value based on whether the count of "on bits" is odd (add) or even (subtract).
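A minimal sketch of that bit trick (the function names are my own, not from the original code): every integer from 1 to 2^n - 1 selects one non-empty combination of n items, and the parity of its on bits decides the sign.

```javascript
// Count the "on" bits in a non-negative integer.
// x & (x - 1) clears the lowest set bit, so the loop runs once per on bit.
function countOnBits(x) {
  let count = 0;
  while (x !== 0) {
    x &= x - 1;
    count++;
  }
  return count;
}

// List every non-empty combination of `items` with the sign
// inclusion-exclusion gives it: +1 for an odd count, -1 for an even one.
// Fine for up to 30 items; 1 << 31 overflows into a negative 32-bit int.
function signedCombinations(items) {
  const result = [];
  const total = 1 << items.length; // 2^n masks; mask 0 is the empty set
  for (let mask = 1; mask < total; mask++) {
    // Bit i of `mask` being on means items[i] is part of this combination.
    const combo = items.filter((_, i) => (mask & (1 << i)) !== 0);
    const sign = countOnBits(mask) % 2 === 1 ? 1 : -1;
    result.push({ combo, sign });
  }
  return result;
}

// signedCombinations(["A", "B", "C"])[2] → { combo: ["A", "B"], sign: -1 }
```

For three items this produces seven combinations in the order A, B, AB, C, AC, BC, ABC, with the pairs negative and the singles and the triple positive.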

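Let's put that together: generate every combination of the multiples, find each combination's least common multiple (the greatest common divisor helps with that), and add or subtract the sum of its multiples depending on whether the combination has an odd or even number of members. It should look something like this.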
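Here's a rough sketch of how the pieces could fit together, assuming the goal is the sum of every number below a limit that is a multiple of at least one of the given numbers (the 3s and 5s below 1000 case). It's my reconstruction, not the original jsFiddle code. The sum of the multiples of m below the limit comes from the arithmetic-series formula m·k(k+1)/2 with k = ⌊(limit - 1)/m⌋, and a combination's intersection is just the multiples of its least common multiple.

```javascript
// Greatest common divisor via the Euclidean algorithm.
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Numbers divisible by both a and b are exactly the multiples of lcm(a, b).
function lcm(a, b) {
  return (a / gcd(a, b)) * b;
}

// Sum of m, 2m, 3m, ... strictly below `limit`, in constant time.
// Exact only while m * k * (k + 1) stays below Number.MAX_SAFE_INTEGER.
function sumOfMultiplesOf(m, limit) {
  const k = Math.floor((limit - 1) / m); // how many multiples fit
  return (m * k * (k + 1)) / 2;
}

// Inclusion-exclusion over every non-empty combination of `divisors`:
// add the combination's sum when it has an odd number of members,
// subtract it when the count is even.
function sumOfMultiples(limit, divisors) {
  let total = 0;
  const masks = 1 << divisors.length;
  for (let mask = 1; mask < masks; mask++) {
    let common = 1;
    let members = 0;
    for (let i = 0; i < divisors.length; i++) {
      if (mask & (1 << i)) {
        common = lcm(common, divisors[i]);
        members++;
      }
    }
    total += (members % 2 === 1 ? 1 : -1) * sumOfMultiplesOf(common, limit);
  }
  return total;
}

console.log(sumOfMultiples(1000, [3, 5])); // 233168
```

For 3 and 5 below 1000 that's 166833 + 99500 - 33165 = 233168, and the work depends only on how many divisors there are, not on how big the limit is.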

Better, smarter, faster, stronger?

Who's faster?



For the small data set, the range 1..999 with two sets of multiples, the brute-force method is roughly 60% slower. On a huge data set, the brute-force method doesn't even register (it came in at 0.01 operations per second, meaning a single run took over a minute and a half), yet on the same data set the math-oriented approach manages nearly 3,248 operations per second. Its performance doesn't depend on the size of the range at all, only on the number of multiples. Firefox must be doing something interesting with its JavaScript engine optimization.
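For comparison, a brute-force version is just a loop that checks every number in the range. This is my own sketch of the idea (hypothetical function name), not the code that was benchmarked, but it shows why its running time grows with the size of the range while the inclusion-exclusion version only grows with the number of multiples.

```javascript
// Brute force: test every number below `limit` against every divisor.
// Roughly limit × divisors.length checks, so it slows down as the range grows.
function sumOfMultiplesBruteForce(limit, divisors) {
  let total = 0;
  for (let n = 1; n < limit; n++) {
    // `some` stops at the first matching divisor, so a number that is a
    // multiple of several divisors is still only counted once.
    if (divisors.some((d) => n % d === 0)) {
      total += n;
    }
  }
  return total;
}

console.log(sumOfMultiplesBruteForce(1000, [3, 5])); // 233168
```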

What did I learn?
  • Speed is way better when you don't just brute-force it, unless a compiler does something funky.
  • Brute force ended up crashing on larger ranges.
  • Math-based approach.
  • A bit more about set theory, and a cool way to use bitwise operations to generate combinations
  • Some cool bitwise techniques.
  • The math approach's execution time only increases when I add more multiples, so it can solve 3's and 5's quickly for any data set imaginable; the brute force cannot do this.
  • Inclusion-exclusion principle
  • Basic LaTeX for math notation
  • How to read some more math
What did I learn that I need to learn?
  • Set theory
  • Euclidean theorem
  • More bitwise stuff
  • More LaTeX
  • Better understanding of math.
  • Why my getCombinations function has some quirks in it; perhaps a recursive approach would be better.
  • Maybe I should learn how to properly translate my algorithms into Big-O notation so I can sound fancy; it might also help me find bottlenecks in my code.
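On the recursion idea, a recursive generator might look like the sketch below. It's a hypothetical alternative (the original getCombinations isn't shown here), and it produces the same non-empty combinations as a bitmask loop, just in a different order.

```javascript
// Recursively build every non-empty combination of `items`.
// Each loop iteration picks the next item to add (only items after the
// last one chosen, so no combination repeats), records the result,
// and recurses to extend it further.
function combinationsRecursive(items, start = 0, current = [], out = []) {
  for (let i = start; i < items.length; i++) {
    const next = [...current, items[i]];
    out.push(next); // record this combination
    combinationsRecursive(items, i + 1, next, out); // try extending it
  }
  return out;
}

console.log(combinationsRecursive(["A", "B", "C"]));
// [["A"], ["A","B"], ["A","B","C"], ["A","C"], ["B"], ["B","C"], ["C"]]
```

The recursion depth is at most the number of items, so it's easy to read, though the bitmask version avoids building intermediate arrays.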
I'm excited to look at this post in a year and find all the improvements I could have made, or other approaches I could have taken. Anyone who has looked at their own year-old code will understand.

I've also backed up all the code on GitHub, in case jsFiddle ever loses its database.








GET and POST security



I came across a question on Stack Overflow asked many years back:
...between a http POST and GET, what are the differences from a security perspective? Is one inherently more secure then another? I realize that POST doesn't expose information on the URL but is there any real value in that or is it just security through obscurity? What is the best practice here?
So, this is a great and valid question, but unfortunately I see the wrong ideas about it propagating all over the place, to the point where some people hold the belief that GET is inherently insecure simply because variables are shown in the address bar. I decided to post an answer to that question even though it was two years old, because I didn't entirely agree with the accepted answer: it was very short on detail, and agreeing with things dogmatically isn't helpful. If we don't know what the issues related to these requests are, we can't put them in context, and we end up flatly agreeing with something rather than understanding why.

First, let's discuss what these two methods are. 

In the HTTP protocol, there are multiple methods of interacting with a resource (for example, index.html could be a resource), and you specify the particulars in different ways depending on the method and what you're trying to do. In general, web browsers only deal with two of the eight methods:
  • OPTIONS
  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • TRACE
  • CONNECT
All of these methods do their own special job within HTTP, as specified in RFC 2616, section 9, but I'm only going to talk about GET and POST; feel free to research the others on your own. Most browsers ignore the rest of these methods in everyday use; a plain HTML form, for example, can only submit a GET or a POST.

When your web browser wants to load a resource, it will generally send a GET request. While you may request data through a POST, that generally goes against convention: POST is intended to submit data to the server for processing.

Now, let's pretend we have a very basic form that looks like this:



Your browser doesn't use magic to get that resource; it sends something that looks like this, in raw text, to the server:





To quickly summarize: your GET request puts the resource in the first line. It's asking for the root and submitting the variable username=swordfish, another, password=hunter2, and extra=lolcatz. The rest of the content tells the server what host we're connecting to, what data we accept, what our browser is, the character set, and a bunch of other information that is really outside of what we're talking about here.
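To make that concrete, here's a small sketch that assembles the raw text of such a GET request. The function name, the host, and the single Accept header are illustrative assumptions; a real browser sends more headers, such as User-Agent and Accept-Charset. The point is that the form fields end up URL-encoded right in the first line.

```javascript
// Build the raw text of an HTTP/1.1 GET request for a form submission.
// The form fields are URL-encoded into the query string of the request line.
function buildRawGet(host, path, fields) {
  const query = new URLSearchParams(fields).toString();
  return [
    `GET ${path}?${query} HTTP/1.1`,
    `Host: ${host}`,
    "Accept: text/html", // illustrative; browsers send many more headers
    "", // blank line marks the end of the headers
    "", // a GET carries no body
  ].join("\r\n");
}

console.log(
  buildRawGet("example.com", "/", {
    username: "swordfish",
    password: "hunter2",
    extra: "lolcatz",
  })
);
```

The first line it prints is `GET /?username=swordfish&password=hunter2&extra=lolcatz HTTP/1.1`.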


That's GET, what about POST?





So, to explain the above: we're requesting the root (the / after POST means root, which is like requesting example.com directly), then we're sending the same sorts of information about who we are, and lastly we're sending exactly the same data we sent in the GET request; it's just way down at the bottom, in the request body.
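The same kind of sketch for POST (again, the function name and headers are illustrative assumptions): the request line carries only the path, and the identical URL-encoded fields move into the body after the blank line, announced by the Content-Type and Content-Length headers.

```javascript
// Build the raw text of an HTTP/1.1 POST request for a form submission.
// The fields are encoded exactly as for GET, but sent as the body.
function buildRawPost(host, path, fields) {
  const body = new URLSearchParams(fields).toString();
  return [
    `POST ${path} HTTP/1.1`,
    `Host: ${host}`,
    "Content-Type: application/x-www-form-urlencoded",
    // Content-Length counts bytes, not characters.
    `Content-Length: ${Buffer.byteLength(body, "utf8")}`,
    "", // blank line marks the end of the headers
    body, // the same data the GET put in its first line
  ].join("\r\n");
}

console.log(
  buildRawPost("example.com", "/", {
    username: "swordfish",
    password: "hunter2",
    extra: "lolcatz",
  })
);
```

It prints `Content-Length: 49` and ends with the line `username=swordfish&password=hunter2&extra=lolcatz`, which is the same string the GET carried in its address.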


So what's the difference between GET and POST?


A GET request is considered safe and idempotent. If you read through the RFC I mentioned at the start, you'd know that "safe" means it isn't intended to take any action other than retrieval, so the request shouldn't have side effects, and "idempotent" means that repeating the request has the same effect as sending it once. This lets web browsers request a resource without an "oops" taking place, such as deleting a hundred records. 


For example, the URL http://example.com/?deleteUserID=1337 should be considered unsafe because it performs an action, which is generally reserved for POST. 


Now, what about http://example.com/?viewUserProfileID=1337, which should let us view a profile? Well, that's fine: it doesn't take an action, it just returns data.
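One way to respect that convention on the server side is to refuse state-changing actions unless they arrive as POST. This is a hypothetical handler; the parameter names are borrowed from the example URLs above and the in-memory `users` map is made up. It only illustrates safe methods, and as the rest of this post argues, it is not a security measure by itself.

```javascript
// Hypothetical request handler: retrieval (viewUserProfileID) is allowed
// on GET, but anything with side effects (deleteUserID) requires POST.
// For brevity the parameters are read from the URL for both methods.
function handleRequest(method, url, users) {
  const params = new URL(url).searchParams;

  if (params.has("viewUserProfileID")) {
    const id = params.get("viewUserProfileID");
    return { status: 200, body: users.get(id) ?? null }; // retrieval only
  }

  if (params.has("deleteUserID")) {
    if (method !== "POST") {
      // 405 Method Not Allowed: a GET must stay free of side effects.
      return { status: 405, body: "Use POST to delete a user" };
    }
    users.delete(params.get("deleteUserID"));
    return { status: 200, body: "deleted" };
  }

  return { status: 404, body: "not found" };
}
```

A bookmarked or prefetched GET of the delete URL now gets a 405 instead of quietly removing someone, which is the "oops" the safe-method rule is about.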


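Okay, yes, RFC 2616 does describe security concerns in section 15.1.3, but what it explains has nothing to do with using POST as security. It says that many existing servers, proxies, and user agents log the request URI somewhere third parties might see it, and since the data from a GET form becomes part of the URI, it can end up in those logs.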


Why am I contradicting the IETF?


I'm not. The wording of section 15.1.3 may be confusing, which explains why some people take it to mean something else. The section is titled "Encoding Sensitive Information in URI's", and it doesn't say security is granted through POST; it says URIs aren't meant to contain sensitive data, because they are resource links. Sensitive data is anything that might cause an action to be performed, or that contains exploitable data. 


This doesn't mean "never" include it: if you read RFC 2119, you'll see a clear definition of "SHOULD NOT", namely that there may be valid reasons in particular circumstances to do it anyway, as long as the full implications are understood. Saying POST is more secure than GET because URIs are advised not to contain sensitive information assumes that web servers, programs on a server, and web browsers all behave the way the RFC describes, so that only GET is really a threat because it's a common convention to log the entire URI in log files.


Well, I've got news for you:
  • My web server logs could just as easily track all the data, regardless of it being POST or GET
  • My programs on the server could easily log all the data sent in the request, regardless of it being POST or GET
  • My web browser ("user agent") doesn't have to be one of these fancy popular things; I could have written it myself to maliciously log everything, regardless of it being POST or GET
  • Regular HTTP requests that are not sent over SSL can be eavesdropped on and/or modified between my machine and the web server (commonly called a man-in-the-middle attack), regardless of them being POST or GET
The only 'security' you're accomplishing here is preventing someone from checking out your browser history, or from running off with the logs of a popular web server (such as Apache or IIS) that doesn't log POST data. Keep in mind that my PHP file (or a script in any other language) can very easily keep its own log file of your POST requests.
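To show how little the method matters to whatever code receives the request, here's a sketch of a hypothetical logging helper that pulls every field out of a raw request, whether it arrived in the query string (GET) or in the body (POST). It assumes simple form-encoded requests with CRLF line endings; it's not a full HTTP parser.

```javascript
// Extract every form field from a raw HTTP request, GET or POST alike.
// Assumes application/x-www-form-urlencoded data and CRLF line endings.
function extractFields(rawRequest) {
  const split = rawRequest.indexOf("\r\n\r\n"); // end of the headers
  const head = split === -1 ? rawRequest : rawRequest.slice(0, split);
  const body = split === -1 ? "" : rawRequest.slice(split + 4);

  const requestLine = head.split("\r\n")[0]; // e.g. "GET /?a=1 HTTP/1.1"
  const target = requestLine.split(" ")[1]; // e.g. "/?a=1"
  const queryIndex = target.indexOf("?");
  const query = queryIndex === -1 ? "" : target.slice(queryIndex + 1);

  // Query-string fields (GET) and body fields (POST) parse identically.
  const fields = {};
  for (const source of [query, body]) {
    for (const [key, value] of new URLSearchParams(source)) {
      fields[key] = value;
    }
  }
  return fields;
}
```

Feed it the same login as a GET and as a POST and you get the same `{ username, password }` object back either way, ready to be written to any log file the server's owner likes.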


What if I use SSL?


Here's what those requests above would look like if sent to encrypted.google.com over SSL, between my machine and the Google web server:





So what have we actually secured against?
  • Man-in-the-middle attacks
  • Eavesdropping
  • Some other, more complex things that SSL takes care of
So, the entire block of data is encrypted in transit between here and there, but that doesn't stop you from bookmarking https://example.com/?deleteUserID=1337 in a normal browser and making that request, which goes back to the concept of safe request methods in HTTP: methods that take no action.

What if I wasn't using a normal browser? What if I could replay a POST request? I could just as easily delete that person. This isn't security.

If your browser isn't doing what the W3C wants it to do, a request isn't safe just because it's a GET or a POST. HTTP works regardless of your browser: you can open a telnet session and send raw text to any web server to do whatever you want.
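Here's roughly what that telnet session amounts to in code: a sketch using Node's built-in net module to open a plain TCP connection and write the request by hand. To keep it self-contained, it talks to a throwaway local server started with Node's http module instead of a real site; the function names are mine, and the deleteUserID URL is the hypothetical example from above.

```javascript
// "Telnet" in code: open a plain TCP socket, write raw request text,
// and collect whatever raw text comes back until the server hangs up.
const net = require("net");
const http = require("http");

function sendRawHttp(host, port, rawRequest) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = net.createConnection({ host, port }, () => {
      socket.write(rawRequest); // exactly what you would type into telnet
    });
    socket.on("data", (chunk) => chunks.push(chunk));
    socket.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    socket.on("error", reject);
  });
}

// Demo against a throwaway local server so nothing leaves this machine.
// The server just echoes back the method and URL it received.
function runDemo() {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => {
      res.end(`${req.method} ${req.url}\n`);
    });
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address();
      const raw =
        "GET /?deleteUserID=1337 HTTP/1.1\r\n" +
        "Host: 127.0.0.1\r\n" +
        "Connection: close\r\n" + // ask the server to hang up when done
        "\r\n";
      sendRawHttp("127.0.0.1", port, raw)
        .then(resolve, reject)
        .finally(() => server.close());
    });
  });
}

const demo = runDemo();
demo.then((response) => console.log(response));
```

Nothing about the request is tied to a browser: swap the text for a POST and the server can't tell the difference between this script and Firefox. (For an https:// site you'd reach for Node's tls module instead of net, since the bytes have to be encrypted first.)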


Once those SSL requests get to the web server, they're decrypted, and the PHP file can just as easily log all the data you sent (which only makes sense, since it's the one that needs the decrypted data to do something with it). On your end, your browser or a virus on your computer could just as easily log all that data and send it somewhere else.


Using SSL is a great addition to security, but thinking you're even a fraction more secure using POST requests than GET is completely naive:
  • You're only protecting against what's common, not what's possible.
  • You're only preventing the very uninformed from logging into your https://email.example.com/?user=awesome&pass=hunter2
    • You haven't stopped hackers, viruses, the website admin, your system admin, someone looking over your shoulder, or the guy sitting in Starbucks running Firesheep. And you won't, regardless of it being POST or GET.
Final thoughts and common objections

All this offers is obfuscation. Security through obfuscation presents no genuine security, code, or business advantages to your website.
"POST is less susceptible to phishing and XSS"
Not really. In fact, blog posts from five years ago (perhaps earlier) give clear instructions on how to do just that using POST. Turning a GET request into a POST request, or vice versa, is a trivial task that should take a novice with PHP less than five minutes. This claim is absurd.
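As for how simple that conversion is, here's a sketch with hypothetical function names that turns a GET URL into the pieces of an equivalent form POST and back again, using only the standard URL and URLSearchParams classes:

```javascript
// Turn a GET URL into an equivalent form POST: same address without the
// query string, and the query string becomes the url-encoded body.
function getToPost(getUrl) {
  const url = new URL(getUrl);
  const body = url.searchParams.toString();
  url.search = ""; // drop the query from the address
  return {
    method: "POST",
    url: url.toString(),
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// And the reverse: fold a url-encoded POST body back into a GET URL.
function postToGet(postUrl, body) {
  const url = new URL(postUrl);
  url.search = new URLSearchParams(body).toString();
  return url.toString();
}
```

The object from getToPost has the same shape as the options `fetch(url, { method, headers, body })` takes in browsers and recent Node versions, so replaying a GET link as a POST is a couple of lines.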
"Security isn't an all-or-nothing game, it's a best effort"
Yes, and some efforts are entirely foolish to try. If you try to write code for PCI DSS, they don't care (anywhere whatsoever) what your HTTP request method was. You will flunk the requirements if you try to claim it's "more" secure because someone can't cut and paste the link from the URL bar. Yes, this will stop the lowest-level script kiddies; meanwhile, Yuri is going to sell all of your credit card numbers for $1.50 on the black market, no credit card issuer will deal with you, and your silly attitude has ruined your reputation and business. Have a nice day? This is a classic example of variable manipulation.

If a site becomes infected, as is very common, it may simply start logging everything and reporting it back to a command-and-control server.


Summary
  • Don't use GET to do anything that performs an action (such as logging in), as it's against the design. 
  • POST has its place, and can be used as a best practice to help your user.
  • A hacker will find out what's in your inbox regardless of your login being POST.
  • Use SSL to get real security, not imaginary security.
  • Make sure you understand where the security really is.
  • Trust nobody.
  • Never become so deluded that you think obscurity is security.
So please, stop thinking you're protecting someone's information because "most browsers don't log POST." That's entirely asinine from a security perspective: you're not making it secure, you're simply making it harder for people who don't understand the technology in the first place.







