I’m really grateful for you taking the time to explain this. You’ve made me realise what I’m striving for: wisdom rather than knowledge. You’ve unblocked a problem with writing my book. I’m “no wiser” but you have given me greater confidence in my knowledge by increasing my confidence that such knowledge does exist, somewhere in my culture. I could be satisfied with “it can be proved”, now. That’s a higher level of unproven belief.
I can see that the slope of the tangent at that point is ‘at 45 degrees’. Why is that the tipping point?
Sorry, it’s my mental model leaking into things and making a mess. I tend to think in terms of ratios, so from zero to one the ratio grows slowly, arithmetically. After that, it’s heading towards infinity (vertical), if the rate of change of the ratio isn’t decreasing. (That’s clearly not true in every case. It’s just the model I keep in my head.)
That mirrors my experience before uni. (There are plenty of proofs out there now, which I’m sure you’ve found. e.g. http://www.math.com/tables/derivatives/more/x^n.htm)
At university, everything was taught from first principles – stuff like calculus was presumed, of course. So, you ended up with a string of theorems and their proofs that would establish a result. Then, if you got stuck, you were able to go back to first principles to establish what was going on. It’s not possible to truly comprehend something without understanding what it’s built on. Hence why popular physics books rarely convey their subject. Heck, academic ones rarely do!
@stevejalim This is a bit – well, more than a bit – ahead of where we are on the course, but I’ll leave it here for reference:
Thanks. I hadn’t found anything yet but was planning to look. Today, I have mostly been doing abstraction.
"It’s not possible to truly comprehend something without understanding what it’s built on."
Also why computer science was tricky. I did a programming course. They were desperately trying to find some substance to hang everything on, but I don’t think many people knew why they were doing things. It was craft. Cambridge seemed to be completely different: all number theory and algorithms.
I really wish I’d met SICP when it first came out and I’m a bit annoyed at myself for letting Lisp get away, because I might have done, if I’d tracked the right flavour of Lisp. I followed Common Lisp for years but stopped programming before I got back in touch. Deciding not to learn emacs seems to have been a bad decision too. I guess it saved me from decades of knowing everyone else was wrong and whining about it.
And that’s week two done and dusted.
This week we moved from one variable in our data to many via the fancily titled multivariate linear regression. This entails the additional step of feature scaling, which brings the variables into a similar range, lest they swamp the regression algorithms (making them slower rather than failing, I believe).
Edit: What this enables is finding a straight line through a set of data so that other values can be determined, e.g. determining property price from size. One variable is a weak indicator, of course, so the multivariate approach improves on this: determine property price from size, and number of bedrooms, and so on. This can only get more complex (with, say, location), thus creating lots of local minima, so I guess we’ll get around to that at some point.
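The feature scaling step mentioned above can be sketched quite simply. This is a minimal illustration in Python rather than Octave (the course's environment), using mean normalisation: shift each feature by its mean and divide by its range, so very differently scaled features end up in a similar interval around zero.

```python
# Mean-normalisation feature scaling: a toy sketch, not the course's code.
# Each feature is shifted by its mean and divided by its range, so
# features with wildly different raw scales end up comparable.

def scale_feature(values):
    """Return a mean-normalised copy of `values`."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]

sizes = [1000, 1500, 2000, 2500, 3000]   # e.g. property sizes in sq ft
bedrooms = [1, 2, 3, 4, 5]               # a much smaller raw scale

print(scale_feature(sizes))     # [-0.5, -0.25, 0.0, 0.25, 0.5]
print(scale_feature(bedrooms))  # same interval, despite the different raw scale
```

Both features land in the same interval afterwards, which is exactly what stops the larger-valued one from swamping the descent steps.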
To solve the regression problems, we’ve been using the gradient descent method (a numerical method for finding local minima). This week, the normal equation was introduced, although we’ve not done anything with it yet. It looks much easier to use (or less bother), but at O(n^3), it can be expensive on large data sets. (Edit: This was used in the optional work and is indeed w-a-y simpler to deal with. \o/ powerful machines.)
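For anyone who wants to see gradient descent in motion, here’s a toy sketch for one-variable linear regression, in Python rather than Octave, purely for illustration. The learning rate and data are made up; the point is the simultaneous update of both parameters against the gradient of the squared-error cost.

```python
# Batch gradient descent for y = t0 + t1 * x: an illustrative sketch.
# Each iteration computes the prediction errors, then steps both
# parameters simultaneously against the cost gradient.

def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    t0, t1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        errors = [(t0 + t1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        t0 -= alpha * grad0          # simultaneous update of both parameters
        t1 -= alpha * grad1
    return t0, t1

# Data lying exactly on y = 1 + 2x should recover those parameters.
t0, t1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(t0, 3), round(t1, 3))  # close to 1.0 and 2.0
```

The normal equation would hand you the same answer in one (O(n^3)) matrix solve, which is why it feels like less bother on small problems.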
We also got an introduction to Octave and its matrix & vector handling. The basic syntax and usage is straightforward, but the matrix and vector handling is new to me, which makes problem solving tricky atm.
This week was a lot more intense than last week, mostly due to having to learn a new programming environment with unfamiliar data types in a few hours, then being tested on it in a fairly wonky test/runtime environment. It’s been an interesting and challenging week.
For your delectation, here’s the homework output:
Erk - done after a week from hell. But, done…
EDIT (after some sleep) I agree that this was quite a full on week. For me it wasn’t the Octave/MATLAB aspect - it was the maths in general that forced me to be super-attentive throughout the lectures. Not being blessed with a relatively recent degree in mathematics ( ;o) @ auxbuss ) my concern was that I’d get left behind, fast. However, I was pleasantly surprised how much it all held together with only dusty A-level memories (and only having done matrix manipulation at GCSE many, many moons ago).
Related to that, I thought Prof Ng judged the learning curve well, in terms of building up layers of techniques, which kept it away from being intimidating.
However, I did find that (for me) I’d covered so much mental ground by the time we reached the actual programming exercise, which combined everything the week had covered, that I had a can’t-see-wood-for-trees moment and had to go back and review it all to find the earlier aspects. In the end, I went with vectorised implementations, though the time sunk into iterative work was useful consolidation.
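The iterative-versus-vectorised distinction is easy to show with the linear regression cost function. A sketch in Python for illustration (in Octave the vectorised version would be a single matrix expression, something like `J = (X*theta - y)' * (X*theta - y) / (2*m)`); here “vectorised” just means no explicit index loops:

```python
# The same squared-error cost, written two ways: an explicit double
# loop, and a loop-free "vectorised" style. Illustrative only.

def cost_iterative(X, y, theta):
    m = len(y)
    total = 0.0
    for i in range(m):
        h = 0.0
        for j in range(len(theta)):      # h = theta' * x_i, term by term
            h += theta[j] * X[i][j]
        total += (h - y[i]) ** 2
    return total / (2 * m)

def cost_vectorised(X, y, theta):
    m = len(y)
    h = [sum(t * x for t, x in zip(theta, row)) for row in X]   # h = X * theta
    return sum((hi - yi) ** 2 for hi, yi in zip(h, y)) / (2 * m)

X = [[1, 0], [1, 1], [1, 2]]   # first column of ones for the intercept term
y = [1, 3, 5]
theta = [1, 2]                 # exact fit, so the cost should be zero
print(cost_iterative(X, y, theta), cost_vectorised(X, y, theta))
```

Both give the same number, but the vectorised form reads as one expression, which is what makes it the idiomatic choice in Octave.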
I do need to make time to do the optional stuff, as well – Week 3 looks lighter in terms of code, so hopefully I’ll have time.
PS: on top of the course, add: rotavirus, half term, and stacks of client work. But I made it.
Saw this jokey/cynical tweet:
and started to wonder whether a chunk of devs is already rejecting ML techniques. And, if so, what the reasons might be.
Speculation: maybe it’s a case of courses like this being a) interesting and b) plugged as being a route to a greater income (apparently), which leads to more people carrying ML-shaped hammers in their mental toolbox. Fast forward to ‘everything looks like a nail’.
Probably the usual reason: people with insufficient brain power try to do it and screw up, then they or others try to give the whole area a bad reputation, either because it’s too dangerous, or because if they can’t understand it, it must be impossible: ‘Reducing inequality can’t work because Stalin’.
This week, we started by tackling classification problems. That is, putting solutions of a problem into discrete buckets rather than reading off a solution from a continuous graph.
For example, for an email: spam or not spam. Or for the weather: sunny, rainy, cloudy.
At some point, you reach a decision boundary where one solution is preferred over another. Yes, it can be sunny, rainy, and cloudy simultaneously, but only one prevails in ML-land.
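The decision boundary idea can be sketched in a few lines. This is an illustrative Python version of the logistic-regression rule (the course uses Octave): the sigmoid squashes a weighted sum into (0, 1), and the boundary sits exactly where the probability crosses 0.5, i.e. where the weighted sum is zero. The theta values here are made up.

```python
import math

# The logistic decision rule: sigmoid of a weighted sum, thresholded
# at 0.5. Illustrative parameter values, not from the course exercise.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0   # e.g. 1 = spam, 0 = not spam

theta = [-4.0, 1.0]            # boundary where -4 + x = 0, i.e. x = 4
print(predict(theta, [1, 3]))  # z = -1, probability ~0.27 -> 0
print(predict(theta, [1, 5]))  # z =  1, probability ~0.73 -> 1
```

Either side of the boundary, only one answer prevails, which is the discrete-buckets point above.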
My notes are full of equations about how these things are solved, but I will spare you the details.
More interesting for me was the look at overfitting. Since the various numerical methods look at finding best-fit equations for solutions, the fact that an equation might be over-zealous is a real problem. So, we looked at various methods to tackle this problem. More equations. Even as a maths dude, I was finding this a little tedious.
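One of those anti-overfitting equations boils down to something quite small: add a penalty on the sizes of the parameters (excluding the intercept) to the usual cost, scaled by a lambda. A sketch in Python for illustration, with made-up data; larger lambda pushes the fit towards simpler, smaller-coefficient hypotheses.

```python
# Regularised squared-error cost: the usual fit term plus a penalty on
# the parameter magnitudes (skipping theta_0). Illustrative sketch.

def regularised_cost(X, y, theta, lam):
    m = len(y)
    h = [sum(t * x for t, x in zip(theta, row)) for row in X]
    fit_term = sum((hi - yi) ** 2 for hi, yi in zip(h, y)) / (2 * m)
    penalty = lam * sum(t ** 2 for t in theta[1:]) / (2 * m)   # skip theta_0
    return fit_term + penalty

X = [[1, 0], [1, 1], [1, 2]]
y = [1, 3, 5]
theta = [1, 2]                              # fits the data exactly
print(regularised_cost(X, y, theta, 0.0))   # 0.0: perfect fit, no penalty
print(regularised_cost(X, y, theta, 3.0))   # 2.0: penalty term only
```

With lambda at zero you get the plain cost back; crank it up and even a perfect fit gets charged for large coefficients.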
Next week, we move into neural network territory. Hopefully, this isn’t explained as yet another series of (frankly dull) equations.
I think classification is a problem too: “Everything is deeply intertwingled.”
Also: I keep telling everyone I failed at Lisp. I actually failed at Lisp while working on neural networks.
Belated remembrance: I was supposed to be extending a system written in MacLisp with a macro-system library called Fuzzy, which added fuzzy logic, because the borders of categories really are fuzzy. The only documentation I had for Fuzzy was a printout, and I didn’t know Lisp.
Spoiler from next week’s programming exercise:
In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits.
So, def more equations coming our way, but hopefully more exciting results / a greater feeling of wielding some magic lamp.
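The one-vs-all part of that spoiler has a simple shape: train one logistic classifier per class, then predict the class whose classifier reports the highest probability. A hypothetical sketch in Python, with made-up thetas, not the actual exercise:

```python
import math

# One-vs-all prediction: run every per-class classifier on the same
# input and pick the class with the highest probability. The thetas
# below are invented for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(all_theta, x):
    scores = [sigmoid(sum(t * xi for t, xi in zip(theta, x)))
              for theta in all_theta]          # one probability per class
    return max(range(len(scores)), key=scores.__getitem__)

all_theta = [
    [2.0, -1.0],   # classifier for class 0
    [0.0,  0.5],   # classifier for class 1
    [-3.0, 1.0],   # classifier for class 2
]
print(predict_one_vs_all(all_theta, [1, 0.5]))  # the largest score wins
```

For the digit exercise it’s the same idea with ten classifiers, one per digit.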
Reflecting on this a bit more, I think it’s the continued use of the same long, intricate equations that’s bugging me. I suspect it’s done to reinforce familiarity. For me, it does the opposite. So, instead of becoming familiar with the underlying meaning, I’m familiar with a few long equations.
It’s not dissimilar to putting small pieces of code into functions named for what they do. It’s easier to read than having to wade through the longer code each time.
For sure there’s going to be a bunch more maths. However, I think I’ll start using my own notation.
I LOL’d this week when Ng explained some derivation and said something to the effect of “and we do a whole bunch of maths here to get…” I imagined @Woo muttering into his beard at that
Like this, you mean?
Always interesting to see how others perceive you from the things you say and my subsequent thinking about that may have increased my self-knowledge:
I was never one of those kids who took things apart to see how they worked, but give me a black box with four buttons, tell me not to worry about what the last one does, and I can’t think about anything else. I can accept levels of abstraction that I don’t need to worry about, but not being kept in the dark about an exposed interface. Partial knowledge of a level, without understanding, is what makes me crazy. I could use a library routine and leave out a parameter if someone I trusted told me to, but I could never use it again, on my own, until I knew what it was for. How do I know it is safe not to do ‘it’ in my current context, if I don’t know what ‘it’ is?
I feel most exposed when I’m following someone’s lead and they won’t tell me about the fourth switch and when I refuse to go any further without an explanation, they reveal that they don’t know either. I then know my fate is tied to the decisions of a bluffer and I’ve had the bad judgement to follow them this far. I respect people who admit what they don’t know, far more than bluffers. This doesn’t seem to be fashionable but I think a team that ‘knows what it doesn’t know’ is infinitely safer and more scientific.
Couldn’t agree more.
Just to be clear, my comment wasn’t meant as a judgement; more empathy (re your comment above re differentiation).
I took it as an analysis that I felt I needed to understand everything, which would be fair from what I’d said; so made me question why that didn’t feel right. I found the process useful. I’ve been aware for ages that I tend to abstract things more than most people, so it was informative to work out what I intuitively do. I never mind fair criticism. It’s how we learn.
I watched an interesting video on functional languages by a guy who had clearly taken the Microsoft shilling. He argued that some languages started at high functional purity (e.g. Lisp, Haskell) and went down to get more efficient, while others started low (e.g. C, C#) and went up to get to functions. He (said he) thought F# was the perfect compromise. I try not to be compromised.
I think I stay as abstract as I can get to work, but I worry too much about detail, too early. My analyst/agilist Chi is still a bit out of whack.