HarvardX CS50 - Week 5

CS50 is a first year university course offered by Harvard University and more recently, Yale University, which dives into the basics of computer science, exploring algorithmic thinking and teaching students how to code in a variety of languages including Scratch, C, and Python, amongst others. In the last few weeks of the course, it offers a choice of three tracks - game, web, or mobile development that students can explore and create their final project which is encouraged to be something which is useful beyond the course.

The online course which is known as CS50x offers a stripped down experience of the real course. The assignments, known as problem sets (or psets), are marked against certain criteria and a number of test cases by an automated program. All of your coursework is pushed into individual branches of your own CS50 GitHub repository. I've copied these branches into a single repository for ease of perusal, which can be found here.

Week 5

This was a big one. The lecture was pretty mindbending and whilst a lot of these things made sense in theory, they were much much harder to implement. 

The theme of the week was data structures with the lecture discussing singly-linked lists, hash tables, and tries. All these data structures are heavily doused in pointers and structs. Tricky stuff given Week 4 had only just introduced pointers. More importantly, this week was the final week of C - something I was very happy about, being used to coding in more modern languages.

So the problem set in Week 5 just drops you into the thunderdome without so much as giving you linked list floaties. There's distribution code provided that nicely provides the template for the data structure, the prototypes for the functions and a few other bits and pieces. The problem to be implemented was a hash table and lookup as a spell checker - the logic of the hashing function, lookups, dictionary load, etc. are to be written by you. 

The first problem I tackled was loading in the dictionary into the hash table however it wasn't long before I realised I needed to be able to hash my data first before placing it in the table, so I searched out and used a hash function I found online (permitted by the course). Once I'd done this, I had the difficult task of loading the dictionary file in, line by line, into my hash table. I fed each word into the hash function, which generated an index for the array of linked lists I was building. If there was already something in my array, I shifted the existing linked list and inserted the item, connecting the dots as I went. This was definitely not something that was easy for me as I kept losing track of my pointers so I did get a little help from my partner.

One rather amusing bug I did run into was that I was hashing upper case words and comparing them against the lower case one stored in the dictionary, which was a bit frustrating till I figured it out. The check function for this reason does a case-insensitive string comparison.

I implemented my size function a bit stupidly, in that I actually looped through the entire structure to count rather than counting into a global variable as I added elements for size to use later, but I think I assumed I couldn't do something like that. I found the unload function pretty easy to implement once I understood how to loop over a hash table, as I just went through and popped elements out, moving pointers around.

This problem actually had a benchmarking component that would tell you the runtime of your various functions once the program executed. This was pretty cool, since I'd never really thought about the performance of my programs before. There was also a leaderboard at the end that you could submit your code to and see how it benched against everyone else's. I got 40th on the leaderboard when I submitted, I did try to tweak a few things but immediately dropped to something like 200th - I then reverted the change but never got back anywhere near 40th, so I guess it's a fickle thing.

Honestly, whilst learning C has been a good experience to learn how things work under the hood, I am somewhat glad to be back in my comfort zone, especially when it comes to memory management. Onwards to Python...