Resume-Classification-using-NLP-and-Machine-Learning-techniques
February 1, 2016
IMPORTANT TO READ!!!
– Notebook I –
- This notebook loads and accesses the data directly from the files by specifying their path names. The input files are in JSON format.
- It loads the JSON files and retrieves the job titles and work experience.
- It removes redundant non-alphanumeric values and stores the results in two separate dataframes.
- Finally, it writes the values stored in the dataframes to a .csv file, ‘job_desc.csv’.
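The steps of Notebook I can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the field names `job_title` and `work_experience` are assumptions (the real JSON schema is not shown here), and the sketch swaps in the standard-library `csv` module in place of the dataframes the notebook uses, so that it runs without extra packages.

```python
import csv
import json
import re

# Hypothetical field names; the real resume JSON schema is not shown in this README.
TITLE_KEY = "job_title"
EXPERIENCE_KEY = "work_experience"


def clean_text(value):
    """Replace each run of non-alphanumeric characters with a single space."""
    return re.sub(r"[^A-Za-z0-9]+", " ", str(value)).strip()


def load_records(json_paths):
    """Read each JSON file and collect cleaned title/experience pairs."""
    rows = []
    for path in json_paths:
        with open(path, encoding="utf-8") as handle:
            record = json.load(handle)
        rows.append({
            "title": clean_text(record.get(TITLE_KEY, "")),
            "experience": clean_text(record.get(EXPERIENCE_KEY, "")),
        })
    return rows


def write_csv(rows, out_path="job_desc.csv"):
    """Write the cleaned rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["title", "experience"])
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_csv(load_records(["resume_1.json"]))` would then produce ‘job_desc.csv’; the file name `resume_1.json` is only a placeholder.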
– Notebook II –
1. This notebook applies several machine learning packages and shows one way to vectorize and split the data.
  - Reading the data from the csv file
  - Lemmatization and transformation of the data
  - Splitting and vectorizing the data
2. Using LinearSVC, part of the scikit-learn SVM package, for classification
  - Building a LinearSVC model
  - Comparing the LinearSVC model with logistic regression on different metrics
  - Finding the top positively and top negatively correlated features, along with their importance in the model
  - Lastly, building a word cloud to visualize the data
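The Notebook II workflow might look something like the sketch below. It is an illustration under assumptions rather than the notebook's own code: the choice of TF-IDF as the vectorizer, the two metrics (accuracy and macro F1), and the function names `train_and_compare` and `top_features` are all hypothetical, and the lemmatization step is left out for brevity.

```python
def top_features(feature_names, coefficients, k=5):
    """Return the k most positive and k most negative (name, weight) pairs."""
    ranked = sorted(zip(feature_names, coefficients), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest weights first
    positive = ranked[::-1][:k]    # largest weights first
    return positive, negative


def train_and_compare(texts, labels, test_size=0.25, seed=0):
    """Split, vectorize, then fit LinearSVC and logistic regression."""
    # scikit-learn is imported here so top_features stays usable without it.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed
    )

    # Fit the vectorizer on the training split only, then reuse it on the test split.
    vectorizer = TfidfVectorizer(stop_words="english")
    train_matrix = vectorizer.fit_transform(x_train)
    test_matrix = vectorizer.transform(x_test)

    models = {
        "LinearSVC": LinearSVC(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, model in models.items():
        model.fit(train_matrix, y_train)
        predicted = model.predict(test_matrix)
        scores[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "macro_f1": f1_score(y_test, predicted, average="macro"),
        }
    return vectorizer, models, scores
```

After training, one row of the fitted model's `coef_` array can be passed to the helper, for example `top_features(vectorizer.get_feature_names_out(), models["LinearSVC"].coef_[0])`, to list the features with the largest positive and negative weights for that class.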
What is a word cloud?
You have probably seen an image filled with words in different sizes, where the size of each word represents its frequency or importance. This is called a tag cloud or word cloud. In simple terms, a word cloud is a data visualization technique for representing text data. Word clouds are widely used for analyzing data from social network websites.
To generate a word cloud in Python, the required modules are matplotlib, pandas and wordcloud. To install these packages, run the following commands:
- pip install matplotlib
- pip install pandas
- pip install wordcloud
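Once the packages are installed, a word cloud could be produced along the lines of the sketch below. The function names `word_frequencies` and `draw_word_cloud`, the image size, and the output file name are assumptions chosen for illustration; the notebook itself may build the cloud differently.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase alphanumeric tokens across all input strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def draw_word_cloud(frequencies, out_path="wordcloud.png"):
    """Render the frequencies as a word cloud image and save it to disk."""
    # Imported here so word_frequencies works even without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud.to_array(), interpolation="bilinear")
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight")
    plt.close()
```

For example, `draw_word_cloud(word_frequencies(["Python developer", "SQL developer"]))` would draw "developer" larger than the other words, since it occurs twice.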