High Dimensional Vectors / Vectorizing Data

Chris Tralie

It is very straightforward to generalize the vector rules we've seen so far to higher dimensions. Let's start in 3D. In this case, the vector has 3 components: one along the x-axis, one along the y-axis, and one along the z-axis. We can think of each component as being the side of a box, as drawn below:

Actually, we can see right away that the magnitude rule generalizes what we saw in 2D. Let's consider the length of the projection of this vector onto the XZ plane, which I'll call u. This vector makes a right angle with the vector from the tip of u to (a, b, c), which I'll call v. The picture below shows this.

u can really be thought of as just a 2D vector in the XZ plane, and we already know how to compute its magnitude

$|\vec{u}| = \sqrt{a^2 + c^2}$

v is even simpler because it's just a straight line along the y-axis, so its magnitude is the length b. Since u is perpendicular to v, we can apply the Pythagorean theorem again to get the length of (a, b, c)

$|(a, b, c)| = \sqrt{|\vec{u}|^2 + |\vec{v}|^2} = \sqrt{(\sqrt{a^2 + c^2})^2 + b^2} = \sqrt{a^2 + b^2 + c^2}$
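As a quick sanity check, consider the vector $(1, 2, 2)$: its projection onto the XZ plane has length $|\vec{u}| = \sqrt{1^2 + 2^2} = \sqrt{5}$, the piece along the y-axis has length $|\vec{v}| = 2$, and so the overall magnitude is $\sqrt{5 + 4} = 3$.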

In fact, this can generalize to any Euclidean dimension if we keep applying the Pythagorean theorem inductively. So, for example, in 4D, the magnitude of the vector

$\vec{v} = (a, b, c, d)$

is

$|\vec{v}| = \sqrt{a^2+b^2+c^2+d^2}$

and the magnitude of a $d$-dimensional vector $\vec{v}$ is

$|\vec{v}| = \sqrt{ \sum_{i = 1}^d v_i^2 }$

Vector addition and subtraction are exactly the same as well. For instance,

$(a, b, c, d) - (e, f, g, h) = (a-e, b-f, c-g, d-h)$

Even though we can't visualize such vectors directly, we can still compute all of these things about them.
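For instance, here's a minimal numpy sketch of these operations on 4D vectors (numpy is also what we'll use on the data below):

```python
import numpy as np

# Two 4D vectors (a, b, c, d) and (e, f, g, h)
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.5, 1.5, 2.5, 3.5])

# Magnitude: square each component, add them up, and take the square root
mag_u = np.sqrt(np.sum(u**2))
print(mag_u)  # sqrt(1 + 4 + 9 + 16) = sqrt(30), about 5.48

# Subtraction happens component by component
print(u - v)  # [0.5 0.5 0.5 0.5]
```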

Features: Vectorizing Data

A lot of data that we deal with either comes with many dimensions, or we can summarize it with our own choice of dimensions, each of which is referred to as a feature. Let's return to the grad school admissions data that we applied Naive Bayes to in class. This time, we'll think of it in more geometric terms. We have 6 features:

["GRE Score", "TOEFL Score", "University Rating", "SOP", "LOR ", "CGPA"]

So we can think of each student as a vector in 6 dimensional space. When we have many such vectors, it's a convention in AI/ML to order them in a 2D matrix where each row is a unique data entry and the columns correspond to different dimensions. I will usually refer to this matrix as a capital X in this class. For example, here's how we would organize a set of n points in 4D
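One way to write this layout out, using $x_{i,j}$ for the $j^{\text{th}}$ coordinate of the $i^{\text{th}}$ point (a notation chosen just for this sketch), is

$X = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} \\ \vdots & \vdots & \vdots & \vdots \\ x_{n,1} & x_{n,2} & x_{n,3} & x_{n,4} \end{bmatrix}$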

In general, n points in d dimensions would be stored in an $n \times d$ matrix.

Let's plot the first 3 dimensions/columns of the grad school data below. We'll just plot the tip of each vector as a point, which we will color by the chance of being admitted. The brighter the point, the more likely it is that the student will be admitted.
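As a rough sketch of how such a plot could be made with matplotlib, something like the following would work; the file name Admission_Predict.csv and the "Chance of Admit " column name are assumptions about the particular copy of the dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# File name and column names are assumptions; adjust them to your copy of the data
df = pd.read_csv("Admission_Predict.csv")
feats = ["GRE Score", "TOEFL Score", "University Rating", "SOP", "LOR ", "CGPA"]
X = df[feats].to_numpy()                    # n x 6 data matrix
chance = df["Chance of Admit "].to_numpy()  # probability of admission for each student

# Scatter the first 3 dimensions, colored by chance of admission
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
p = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=chance, cmap="magma")
ax.set_xlabel(feats[0])
ax.set_ylabel(feats[1])
ax.set_zlabel(feats[2])
fig.colorbar(p, label="Chance of admission")
plt.show()
```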

Actually, it will be good to normalize the dimensions first, since each dimension has a different range (e.g. CGPA is in the range 1-10, while GRE score is in the range 290-340). To make sure that no dimension has an outsized influence, we'll put each dimension into the range [0, 1] by subtracting off the min of each dimension and then dividing by the max that remains.
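A minimal sketch of that normalization, written as a helper applied column by column (the small matrix here is made-up data just to show the idea):

```python
import numpy as np

def normalize_columns(X):
    """Put each column (dimension) of X into the range [0, 1] by subtracting
    off the column's min and then dividing by the max that remains."""
    X = X - np.min(X, axis=0)
    X = X / np.max(X, axis=0)
    return X

# Made-up numbers in the spirit of the CGPA and GRE ranges
X = np.array([[8.0, 300.0],
              [9.5, 330.0],
              [7.2, 310.0]])
print(normalize_columns(X))
```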

Now, if we think about the magnitude of our vectors, we'd expect it to be larger for students with a higher chance of being admitted; in other words, each dimension contributes positively towards the chance of being admitted. Let's compute the magnitude of each student's vector and see if there is such a correlation.
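One direct way to do this is a loop over the rows; in the sketch below, X and chance are random placeholders standing in for the normalized data matrix and the admission probabilities so that the snippet runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for the normalized n x 6 matrix and the admission probabilities
X = np.random.rand(100, 6)
chance = np.random.rand(100)

# Magnitude of each row (each student's vector), one at a time
mags = np.zeros(X.shape[0])
for i in range(X.shape[0]):
    mags[i] = np.sqrt(np.sum(X[i, :]**2))

# Plot magnitude against chance of admission to look for a correlation
plt.scatter(mags, chance)
plt.xlabel("Vector magnitude")
plt.ylabel("Chance of admission")
plt.show()
```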

We indeed see that the chance of getting in goes up with magnitude! There is a shorter way to do this with "numpy broadcasting," though: we can square all of the elements of X, then sum them across each row, and finally take the square root of what's left.
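That broadcasting version might look like this (again with a random stand-in for X):

```python
import numpy as np

X = np.random.rand(100, 6)  # stand-in for the normalized data matrix

# Square every element, sum across each row (axis=1), then take the square root
mags = np.sqrt(np.sum(X**2, axis=1))
```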