I am working on natural language processing algorithms in a C# 3.5 environment. I did not find any open source NLP packages for C# or VB.NET.
NLTK is a great open source NLP package written in Python. It comes with an online book. I decided to try to embed IronPython under C# and run NLTK from there. Here are a few thoughts about the experience.
Problems with embedding IronPython and NLTK
- Some libraries that NLTK uses, e.g. zlib and numpy, are not installed in IronPython; you can mostly patch this up
- You need a good understanding of how embedded IronPython works
- The connection between Python and C# is not seamless
- Sending data between Python and C# takes work
- NLTK is pretty slow at starting up
- Doing large scale machine learning in NLTK is slow
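One way to keep the data exchange in the list above manageable is to pass only plain JSON across the boundary, so the C# side never has to handle Python objects. The following is a minimal sketch of the Python side under that assumption. The names `analyze`, `run` and `TOKEN_RE` and the JSON field names are hypothetical, and the regular-expression tokenizer is only a standard-library stand-in for whatever NLTK call you would really use.

```python
import json
import re
import sys
from collections import Counter

# Stand-in tokenizer built on the standard library only. In a real worker
# this is where the NLTK processing would go (assumption, not shown here).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def analyze(text):
    """Return a summary of the text made only of JSON-friendly types."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    counts = Counter(tokens)
    return {
        "token_count": len(tokens),
        # Lists instead of tuples, so the value survives a JSON round trip.
        "top_tokens": [[word, n] for word, n in counts.most_common(5)],
    }


def run(in_stream, out_stream):
    """Read all text from in_stream and write the analysis as JSON."""
    json.dump(analyze(in_stream.read()), out_stream)


# In a worker script, the last line would be: run(sys.stdin, sys.stdout)
```

Because the result contains only strings, numbers, lists and dictionaries, the C# caller can read it with any JSON parser, whether the script runs as a separate CPython process or the function is called from embedded IronPython.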
C# and IronPython
IronPython is a very good implementation of Python, but in C# 3.5 there is still a mismatch between C# and Python; this becomes an issue when you are dealing with a library as big as NLTK.
The integration between IronPython and C# is going to improve with C# 4.0. How much remains to be seen.
To embed or not to embed
When is embedding IronPython and NLTK inside C# a good idea?
Separate processes for NLTK under CPython and C#
If your C# tasks and your NLP tasks do not interact much, it might be simpler to have a C# program call an NLP CPython program as an external process. For example, say you want to analyze the content of a Word document. You would open the Word document in C#, create a Python process, pipe the text into it, read the result back as JSON or XML, and display it in ASP, WPF or WinForms.
Small NLP tasks
There is a learning curve for both NLTK and embedded IronPython that slows you down when you start work.
Medium-sized NLP projects
The setup cost is not an issue here, so embedding IronPython and NLTK could work very well.
Big NLP projects
The setup cost is not an issue, but at some point the mismatch between Python and C# will start to outweigh the advantages you get.
Prototyping in NLTK
Start writing your application in NLTK either under CPython or IronPython. This should improve development time substantially. You might find that your prototype is good enough and you do not need to port it to C#; or you will have a working program that you can port to C#.
References
- Post about running NLTK from IronPython
- Chapter 15 of IronPython in Action is about embedding IronPython in C# or VB.NET
- Source code examples from IronPython in Action
- Here is a short intro to embedding IronPython by Michael Foord
- I tried loading Jeff Hardy's IronPython.Zlib.dll using Assembly.LoadFile. That did not work, but I could add it with clr.AddReference from the embedded Python code
-Sami Badawi