Empirical Methods in Natural Language Processing Course

April 12th, 2009  |  Published in Machine Learning  |  11 Comments


I came across this course in Natural Language Processing today while doing some research. It is currently being taught at The University of Edinburgh School of Informatics.

This course is an introduction to data-driven methods applied to natural language processing. The emphasis is on methods, but we will survey applications such as syntactic parsing, text classification, information extraction, tagging, summarization. The final lectures will deal with statistical machine translation.

See the lecture notes here.

Responses

  1. aresh says:

    January 25th, 2010 at 7:08 am

    http://www.codeproject.com/KB/recipes/LinearCorrelation.aspx

  2. Jorge Fernandes says:

    February 9th, 2010 at 1:34 pm

    I’m currently interested in implementing LSI, and I’m in need of an SVD implementation.

    Reading your post, I tried SmartMathLibrary, using your test code and the downloaded .dll.

    Unfortunately I get a System.TypeLoadException related to SmartMathLibrary.AbstractMatrix (I can’t copy the full report exactly since it’s in a foreign language).

    Any ideas why?

    Thanks in advance

  3. Eric says:

    February 9th, 2010 at 2:23 pm

    Hey Jorge,

    Thanks for visiting the site! I have one idea why this may not be working, as I ran into something similar. First, a System.TypeLoadException means the runtime could not load a type, which in practice is much like a “Could not load file or assembly” error. When I was testing out numerous SVD libraries I ran into the same error with LatoolNet and posted it here: http://latoolnet.codeplex.com/WorkItem/View.aspx?WorkItemId=5332. At the time, the error was due to the fact that I was on an x64 (64-bit) system.

    So if this is the same error, you can do one of two things. One, download the full SmartMathLibrary project, recompile all the DLLs and import them into your project. Two, try LatoolNet, as it is also very good (and I believe they now have DLLs for both 64-bit and 32-bit systems); its SVD implementation is very similar, so it would probably take you very little time to make the small adjustments.

    Hope this helps. Please let me know how it works out or if you have any other questions.

    Eric

  4. Jorge Fernandes says:

    February 10th, 2010 at 11:26 am

    Thanks for the tip, it solved my previous problem, and this example code runs without a problem.

    So my next step was trying to rebuild the original holeDifficulty matrix by multiplying the SVD components, so I added:

    Matrix u = SVD.U.Copy();
    Matrix s = SVD.S.ToMatrix();
    Matrix v = SVD.V.Copy();

    Matrix aux = u * s * v.Transpose();

    And I get another exception:

    SmartMathLibrary.IllegalArithmeticException was unhandled
    Message=”The number of columns of the one side operator and the number of rows of the other side operator have to be even”
    Source=”SmartMathLibrary”

    So I compared the printed u, s and v with the results of doing an SVD in Wolfram Alpha (http://www.wolframalpha.com/input/?i=SVD+{{2%2C1%2C0%2C0}%2C{4%2C3%2C0%2C0}}) and indeed the matrices have the wrong dimensions.

    Is it a bug in this lib version? At least I don’t see a mistake on my part, especially since it’s mostly copied from your example.

  5. Eric says:

    February 10th, 2010 at 1:42 pm

    Well, there may be two things going wrong here with the code you’ve provided.

    First, if you print out the matrices U, S, V you will notice something a bit odd about S or sigma – it looks like the following:


    1
    2
    3

    instead of


    1 0 0
    0 2 0
    0 0 3

    So when you multiply matrices, the inner dimensions have to agree: the number of columns of the left matrix must equal the number of rows of the right one. Sigma, as a single column of values, does not have the width needed to sit between U and V.Transpose(). That’s why I have the CreateSigmaMatrix method: when you pass in a height and width, it gives back a sigma matrix that fits between U and V, like the one in the second example. In short, the sigma matrix is off, as you correctly noted from the Wolfram Alpha link.
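
    The body of CreateSigmaMatrix is not shown in this thread, so the following is only a minimal sketch of the idea, written against plain double[,] arrays rather than SmartMathLibrary types. The class name SigmaSketch and the method ExpandSigma are hypothetical names chosen for illustration.

    ```csharp
    using System;

    public static class SigmaSketch
    {
        // Hypothetical helper (not part of SmartMathLibrary): places the singular
        // values on the main diagonal of a rows x cols matrix of zeros.
        public static double[,] ExpandSigma(double[] singularValues, int rows, int cols)
        {
            var sigma = new double[rows, cols]; // new arrays start out filled with zeros
            int count = Math.Min(singularValues.Length, Math.Min(rows, cols));
            for (int i = 0; i < count; i++)
            {
                sigma[i, i] = singularValues[i];
            }
            return sigma;
        }
    }
    ```

    Calling SigmaSketch.ExpandSigma(new[] { 1.0, 2.0, 3.0 }, 3, 3) yields the 3 x 3 diagonal matrix from the second example above. Passing different row and column counts gives a rectangular sigma, which is what a non-square input matrix needs.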

    Second, when performing Latent Semantic Indexing you have to reduce (shrink) these matrices, and this can also give this error. It is most often caused by an invalid k value.
    I am not exactly sure where things go wrong, but I do have a workaround for you that tests out the wordVector or ‘u’ matrix. See the following code:


    private int FindK(Matrix wordVector)
    {
        int k = 0;

        for (int i = 2; i < 10; i++)
        {
            Matrix reducedWordVector = CopyMatrix(wordVector, wordVector.Rows, i - 1);
            if (reducedWordVector.Columns == 2) // change this if you want a different value for k
            {
                k = i;
            }
        }
        return k;
    }

    /// <summary>
    /// Gets the document word plots.
    /// </summary>
    /// <param name="myMatrix">My matrix.</param>
    private void GetDocumentWordPlots(Matrix myMatrix)
    {
        // Run singular value decomposition
        var svd = new SingularValueDecomposition(myMatrix);
        svd.ExecuteDecomposition();

        // Put components into individual matrices
        Matrix wordVector = svd.U.Copy();
        Matrix sigma = svd.S.ToMatrix();
        Matrix documentVector = svd.V.Copy();

        // get value of k (the square-root estimate is overridden by the FindK workaround)
        var k = (int) Math.Floor(Math.Sqrt(myMatrix.Columns));
        k = FindK(wordVector);

        // reduce the vectors
        Matrix reducedWordVector = CopyMatrix(wordVector, wordVector.Rows, k - 1);
        Matrix reducedSigma = CreateSigmaMatrix(sigma, k - 1, k - 1);
        Matrix reducedDocumentVector = CopyMatrix(documentVector, documentVector.Rows, k - 1);

        //etc.. etc...
        Matrix a = reducedWordVector * reducedSigma * reducedDocumentVector.Transpose();
    }

    Mainly, if you run into this error, print out your matrices and check where things have gone wrong, in particular whether their dimensions match the examples above. Hope this helps and is clear – please feel free to drop me a line if you need any other help!

    Take Care,
    Eric

  6. Jorge Fernandes says:

    February 10th, 2010 at 3:26 pm

    At the moment I just wanted to test whether the SVD would rebuild the original matrix when its components are multiplied back together after the transformation, just to know if the SVD was working correctly.

    That detail about the sigma matrix eluded me, since I assumed the lib would convert the general vector to the properly sized matrix. After using your CreateSigmaMatrix method with the general vector Count (instead of k) and multiplying the components according to the formula, I got the original {{2,1,0,0},{4,3,0,0}} matrix.
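
    The same sanity check can be sketched without the library, assuming the three factors are available as plain double[,] arrays. Everything below (the class ReconstructionSketch and its methods) is a hypothetical illustration, not SmartMathLibrary code: it rebuilds U * Sigma * V^T and reports the largest element-wise difference from the original, which should be close to zero for a correct decomposition.

    ```csharp
    using System;

    public static class ReconstructionSketch
    {
        // Plain matrix product; throws when the inner dimensions disagree.
        public static double[,] Multiply(double[,] left, double[,] right)
        {
            int rows = left.GetLength(0), inner = left.GetLength(1), cols = right.GetLength(1);
            if (inner != right.GetLength(0))
            {
                throw new ArgumentException("Columns of left must equal rows of right.");
            }
            var product = new double[rows, cols];
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    for (int n = 0; n < inner; n++)
                        product[i, j] += left[i, n] * right[n, j];
            return product;
        }

        public static double[,] Transpose(double[,] m)
        {
            var t = new double[m.GetLength(1), m.GetLength(0)];
            for (int i = 0; i < m.GetLength(0); i++)
                for (int j = 0; j < m.GetLength(1); j++)
                    t[j, i] = m[i, j];
            return t;
        }

        // Largest absolute element-wise difference; assumes both matrices have the same shape.
        public static double MaxAbsDifference(double[,] a, double[,] b)
        {
            double max = 0.0;
            for (int i = 0; i < a.GetLength(0); i++)
                for (int j = 0; j < a.GetLength(1); j++)
                    max = Math.Max(max, Math.Abs(a[i, j] - b[i, j]));
            return max;
        }

        // Rebuilds U * Sigma * V^T and reports how far it is from the original.
        public static double ReconstructionError(double[,] original, double[,] u,
                                                 double[,] sigma, double[,] v)
        {
            double[,] rebuilt = Multiply(Multiply(u, sigma), Transpose(v));
            return MaxAbsDifference(original, rebuilt);
        }
    }
    ```

    With real SVD output the error will rarely be exactly zero because of floating-point rounding, so comparing it against a small tolerance such as 1e-9 (an arbitrary choice) is more robust than testing for equality.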

    So I’ll look into LSI from here. With the rest of your explanations I hope to get working results soon; then I just need to figure out how exactly LSI can be used to categorize documents.

    Thanks for everything, I hope I won’t need to bother you again,

    Jorge

  7. Eric says:

    February 10th, 2010 at 3:52 pm

    Cool – glad it worked! And if you have any other questions please feel free to just post/email me a message – it’s no problem. Eric

  8. Jorge Fernandes says:

    February 17th, 2010 at 11:40 am

    Good afternoon,

    I’m sorry to bother you again, but I just returned from some holidays and I’m back to the problem.

    I am now looking at LSI, and I have a question: didn’t you make a mistake in the lsi1 image? You display V’s first two rows when you should display V.Transpose’s.

    If the idea was to actually display V, shouldn’t it display the first two columns instead?

    It’s just a minor detail, but I thought I should let you know.

  9. Eric says:

    February 17th, 2010 at 12:46 pm

    Hey Jorge!

    You are absolutely right! Good catch and thanks for letting me know! It is just V and not V.Transposed. For a correct walkthrough of the matrix values, check out the “Indexing by Latent Semantic Analysis” (Deerwester et al.) paper, starting on page 26: http://lsa.colorado.edu/papers/JASIS.lsi.90.pdf

    I think in my haste I just slapped it together because I was a bit excited that I got it to work. :-)

    Eric

  10. Jorge Fernandes says:

    February 18th, 2010 at 1:12 pm

    Hey,

    did you by any chance try implementing a way to query the resulting end matrix?

    From what I read, it should be a simple cosine similarity between the documents in the LSI matrix and the query translated into the latent semantic space, something like:

    latentQuery = (reducedSigma)^-1 * reducedWordVector.Transpose() * query

    or:

    latentQuery = query.Transpose() * reducedWordVector * (reducedSigma)^-1
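
    The two formulas are transposes of each other (column vector versus row vector), so they describe the same projection. Because the reduced sigma is diagonal, multiplying by its inverse amounts to dividing each component by the matching singular value, so no general matrix inversion is needed. The sketch below assumes plain arrays instead of SmartMathLibrary types, non-zero singular values, and non-zero vectors; LatentQuerySketch, FoldQuery and CosineSimilarity are hypothetical names.

    ```csharp
    using System;

    public static class LatentQuerySketch
    {
        // Computes Sigma_k^-1 * U_k^T * q, where reducedWordVector is U_k (terms x k),
        // singularValues holds the k diagonal entries of Sigma_k, and query has one
        // entry per term.
        public static double[] FoldQuery(double[,] reducedWordVector,
                                         double[] singularValues, double[] query)
        {
            int terms = reducedWordVector.GetLength(0);
            int k = reducedWordVector.GetLength(1);
            if (query.Length != terms || singularValues.Length != k)
            {
                throw new ArgumentException("Dimensions of U_k, sigma and the query do not match.");
            }
            var latent = new double[k];
            for (int j = 0; j < k; j++)
            {
                double dot = 0.0;
                for (int i = 0; i < terms; i++)
                {
                    dot += reducedWordVector[i, j] * query[i];
                }
                latent[j] = dot / singularValues[j]; // assumes a non-zero singular value
            }
            return latent;
        }

        // Cosine of the angle between two vectors of equal length (both assumed non-zero).
        public static double CosineSimilarity(double[] a, double[] b)
        {
            double dot = 0.0, normA = 0.0, normB = 0.0;
            for (int i = 0; i < a.Length; i++)
            {
                dot += a[i] * b[i];
                normA += a[i] * a[i];
                normB += b[i] * b[i];
            }
            return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
        }
    }
    ```

    The folded query would then be compared, via CosineSimilarity, against each document's latent vector. Which matrix supplies those vectors (for example the rows of the reduced document matrix) is an assumption to check against the Deerwester et al. paper.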

    I wanted to try it out, but I couldn’t find which method in SmartMathLibrary inverts a Matrix, since in the documentation I only found one for the ComplexMatrix.

    Any pointers you can give me regarding this issue?

    Many thanks

  11. Eric says:

    February 18th, 2010 at 3:18 pm

    I haven’t gotten into querying the end matrix yet. I was going more in the direction of clustering documents via k-means or fuzzy c-means. As far as measuring similarity goes, the only things I have tried on the reduced matrices are Euclidean distance, Manhattan distance and the dot product.

    Do you have anything I could read? I am pretty interested in what you are trying to do. I am not sure what a query matrix would look like or what its values are.

    After looking through SmartMathLibrary, it does have an inverse function, but I have no idea how to access it, so I quickly rewrote it for you. It seems to work on a couple of tutorials that I tested it on. FYI, only a square matrix can be inverted. I didn’t know this, but it shouldn’t be a problem because the reduced Sigma is always square, if I remember correctly.


    /// <summary>
    /// Inverses the matrix.
    /// </summary>
    /// <param name="a">A.</param>
    /// <returns>The inverse of the matrix.</returns>
    public Matrix InverseMatrix(Matrix a)
    {
        // Check for null first; reading a.Rows on a null reference would throw.
        if (a == (Matrix)null)
        {
            throw new ArgumentNullException("a");
        }

        if (a.Rows != a.Columns)
        {
            throw new ArgumentException("The specified matrix has to be a square matrix.");
        }

        Matrix tempuri = Matrix.AppendIdentityMatrix(a);
        Matrix tempResult = new Matrix(tempuri.Rows, tempuri.Columns);
        Matrix resultFinal = new Matrix(a.Rows, a.Columns);

        // Set up
        for (int i = 0; i < tempuri.Rows; i++)
        {
            for (int j = 0; j < tempuri.Columns; j++)
            {
                tempResult.SetValueAtPosition(new MatrixPosition(i, j),
                    tempuri.GetValueAtPosition(new MatrixPosition(i, j)));
            }
        }

        // Run inverse calculations
        tempuri.CreateUpperTriangularMatrix();
        tempuri.CreateDiagonalMatrix();
        tempuri.CreateIdentityMatrix();

        // Copy the right-hand half of the augmented matrix into the left-hand columns
        for (int i = 0; i < tempuri.Rows; i++)
        {
            for (int j = tempuri.Rows; j < tempuri.Columns; j++)
            {
                tempResult.SetValueAtPosition(new MatrixPosition(i, j - tempuri.Rows),
                    tempuri.GetValueAtPosition(new MatrixPosition(i, j)));
            }
        }

        // Shrink down matrix
        for (int i = 0; i < a.Rows; i++)
        {
            for (int j = 0; j < a.Columns; j++)
            {
                resultFinal.MatrixData[i, j] = tempResult.MatrixData[i, j];
            }
        }

        return resultFinal;
    }
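
    For the specific case of Sigma, a general inverse may be more than is needed: a square diagonal matrix is inverted by taking the reciprocal of each diagonal entry. The following is a minimal sketch of that shortcut using a plain double[,] array; DiagonalInverseSketch and InvertDiagonal are hypothetical names, and the off-diagonal entries are assumed to be zero rather than checked.

    ```csharp
    using System;

    public static class DiagonalInverseSketch
    {
        // Inverts a square diagonal matrix by taking reciprocals of the diagonal.
        // Off-diagonal entries are assumed to be zero and are ignored.
        public static double[,] InvertDiagonal(double[,] diagonal)
        {
            if (diagonal == null)
            {
                throw new ArgumentNullException("diagonal");
            }
            int size = diagonal.GetLength(0);
            if (size != diagonal.GetLength(1))
            {
                throw new ArgumentException("The specified matrix has to be a square matrix.");
            }
            var inverse = new double[size, size];
            for (int i = 0; i < size; i++)
            {
                if (diagonal[i, i] == 0.0)
                {
                    throw new ArgumentException("A zero on the diagonal makes the matrix non-invertible.");
                }
                inverse[i, i] = 1.0 / diagonal[i, i];
            }
            return inverse;
        }
    }
    ```

    This only applies when k is chosen so that every retained singular value is non-zero; a zero singular value means the reduced Sigma has no inverse.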

    Hope this helps!

    Eric
