[Teachable NLP] Résumé for SW Developers 📄

dleunji · April 29, 2021, 2:45am

Teachable NLP : Training the Model
TabTab : Write your own Résumé
Ainize : Coming soon…

1. Background

If you are preparing to get a new job, writing a résumé is inevitable. In a résumé, you have to briefly explain your history and achievements on a single sheet of paper. However, I never know what to write or how to write it.

So I made a résumé generator for developers who find it harder to introduce themselves in writing than in code. It was made possible by Teachable NLP, which fine-tunes GPT-2 on a text file of résumés. It is super easy if you follow the steps below.

My résumé was written in only a few minutes.

- Web Frontend with HTML, CSS
- Creating Web RESTful API
- Taking part in preprocessing steps of machine learning mainly missing value treatment, outlier detection, encoding, scaling, feature selection.
- Testing machine learn algorithms in python. optimizing of existing algorithms.

Isn’t it interesting? Let me show you how to make the résumé generator!

2. Acquiring Dataset

I got an up-to-date resume dataset from Kaggle and used it to train GPT-2 in Teachable NLP. The file is a .csv containing a table with 2 columns, ‘Category’ and ‘Resume’.

3. Preprocessing Text

I used the Python package Pandas and did the preprocessing in a Jupyter notebook.

First of all, I loaded the table into a DataFrame and checked the basics. There is enough resume data for developers, and there are no null values in the table. If you find null values, please remove them. Fortunately, in my data, both columns are non-null.
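
If your copy of the dataset does contain null values, removing them could look roughly like the sketch below. The tiny `sample` table and the name `sample_clean` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical miniature of the resume table; the real data comes from the CSV file.
sample = pd.DataFrame({
    "Category": ["Java Developer", "Testing", "HR"],
    "Resume": ["Skills: Java, Spring", None, "Recruitment and onboarding"],
})

# Count missing values per column.
null_counts = sample.isna().sum()
print(null_counts)

# Drop rows whose 'Resume' is missing and renumber the index.
sample_clean = sample.dropna(subset=["Resume"]).reset_index(drop=True)
print(len(sample_clean))
```

With the real data you would call `dropna` on `data` in the same way, before any cleaning step.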

import pandas as pd
import numpy as np
# Read File
data = pd.read_csv('/opt/notebooks/UpdatedResumeDataSet.csv')

# Check categories
print(data['Category'].unique())

"""
['Data Science' 'HR' 'Advocate' 'Arts' 'Web Designing'
 'Mechanical Engineer' 'Sales' 'Health and fitness' 'Civil Engineer'
 'Java Developer' 'Business Analyst' 'SAP Developer' 'Automation Testing'
 'Electrical Engineering' 'Operations Manager' 'Python Developer'
 'DevOps Engineer' 'Network Security Engineer' 'PMO' 'Database' 'Hadoop'
 'ETL Developer' 'DotNet Developer' 'Blockchain' 'Testing']
"""
# Check the numbers of data
print(data['Category'].value_counts())

"""
Java Developer               84
Testing                      70
DevOps Engineer              55
Python Developer             48
Web Designing                45
HR                           44
...
"""
# Check null value
data['Resume'].isna().sum()
"""
0
"""

Then, by following the two steps below, you can get data specialized for developers.
A) Remove Unnecessary Words
B) Extract Resume Specialized For Developer

A) Remove Unnecessary Words

In the cleaning stage, numbers, stopwords (word tokens that carry little meaning) and extremely short words are usually removed. However, I skipped these steps. When I removed them and trained GPT-2 on the resulting file, the resume formatting was lost and readability became poor. For example, the sentence HTML Experience - Less than 3 months becomes html experience less than months after cleaning, which sounds a little bit weird. Also, developers use lots of short abbreviations (e.g. nltk, api), so dropping words based on their length was not a good fit for this data.
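
To make the problem concrete, here is a small sketch of that kind of "usual" cleaning. The function `aggressive_clean` and its `min_len` parameter are hypothetical names for illustration, and stopword removal is left out to keep the example dependency-free.

```python
import re

def aggressive_clean(text, min_len=1):
    """Illustrative 'typical' cleaning: lowercase, keep letters only, drop short words."""
    text = text.lower()
    # Replace everything except lowercase letters and whitespace
    # (digits, punctuation) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Drop words shorter than min_len; split/join also collapses extra whitespace.
    words = [w for w in text.split() if len(w) >= min_len]
    return " ".join(words)

print(aggressive_clean("HTML Experience - Less than 3 months"))
# html experience less than months
print(aggressive_clean("Built REST API with nltk", min_len=4))
# built rest with nltk
```

The first call loses the duration, and the second silently drops "api", which is exactly the kind of information a developer resume needs to keep.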

As an example, here is the first resume in the DataFrame. I have to remove the * that marks unordered list items, and the characters that cause encoding errors.

I considered removing parentheses and commas, but I didn’t, because I thought the grouping of libraries, packages, and frameworks would be lost without them. So I kept them.
Rather, I thought that numbers, - , ( , ) and , would let users know the format of a resume.

"Skills * Programming Languages: Python (pandas, numpy, scipy, scikit-learn, matplotlib), Sql, Java, JavaScript/JQuery. * Machine learning: Regression, SVM, Na횄짱ve Bayes, KNN, Random Forest, Decision Trees, Boosting techniques, Cluster Analysis, Word Embedding, Sentiment Analysis, Natural Language processing, Dimensionality reduction, Topic Modelling (LDA, NMF), PCA & Neural Nets. * Database Visualizations: Mysql, SqlServer, Cassandra, Hbase, ElasticSearch D3.js, DC.js, Plotly, kibana, matplotlib, ggplot, Tableau. * Others: Regular Expression, HTML, CSS, Angular 6, Logstash, Kafka, Python Flask, Git, Docker, computer vision - Open CV and understanding of Deep learning.Education Details 

Data Science Assurance Associate 

Data Science Assurance Associate - Ernst & Young LLP
Skill Details 
JAVASCRIPT- Exprience - 24 months
jQuery- Exprience - 24 months
Python- Exprience - 24 monthsCompany Details 
.
.
.
MULTIPLE DATA SCIENCE AND ANALYTIC PROJECTS (USA CLIENTS)
TEXT ANALYTICS - MOTOR VEHICLE CUSTOMER REVIEW DATA * Received customer feedback survey data for past one year. Performed sentiment (Positive, Negative & Neutral) and time series analysis on customer comments across all 4 categories.
* Created heat map of terms by survey category based on frequency of words * Extracted Positive and Negative words across all the Survey categories and plotted Word cloud.
* Created customized tableau dashboards for effective reporting and visualizations.
CHATBOT * Developed a user friendly chatbot for one of our Products which handle simple questions about hours of operation, reservation options and so on.
* This chat bot serves entire product related questions. Giving overview of tool via QA platform and also give recommendation responses so that user question to build chain of relevant answer.
* This too has intelligence to build the pipeline of questions as per user requirement and asks the relevant /recommended questions.

.
.
.
창짖 FAP is a Fraud Analytics and investigative platform with inbuilt case manager and suite of Analytics for various ERP systems.
* It can be used by clients to interrogate their Accounting systems for identifying the anomalies which can be indicators of fraud by running advanced analytics
Tools & Technologies: HTML, JavaScript, SqlServer, JQuery, CSS, Bootstrap, Node.js, D3.js, DC.js"

The preprocessing is implemented in Python.

import re

def clean_text(text):
    text = text.lower()
    # remove any numeric characters (disabled on purpose: numbers are part of the resume format)
    # text = ''.join([ch for ch in text if not ch.isdigit()])
    # remove * (asterisk)
    text = re.sub(r'\*', '', text)
    # replace each non-ASCII character with a space
    text = re.sub(r'[^\x00-\x7f]', ' ', text)
    # collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    return text

data['cleaned_text'] = data['Resume'].apply(clean_text)

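You can clean the text with regex (regular expressions). They look complicated, but the three patterns used here are simple: \* matches a literal asterisk, [^\x00-\x7f] matches any character outside the ASCII range, and \s+ matches a run of one or more whitespace characters.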
[Image: Screenshot 2021-04-29, 11.35.17 AM]

I added the preprocessed text to the DataFrame as a new column, cleaned_text, using the apply function.
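
To see what each regex does, it can help to run the same substitutions one at a time. The `sample` string below is invented for illustration (loosely modelled on the resume shown earlier), and the `step1` to `step4` names are only for this walkthrough.

```python
import re

# Hypothetical input; "\u00ef" is the non-ASCII character "ï" as in "Naïve".
sample = "Skills * Machine learning: Na\u00efve Bayes,   KNN * Others: HTML, CSS"

step1 = sample.lower()
# \* matches a literal asterisk (the bullet marker in the raw resumes).
step2 = re.sub(r"\*", "", step1)
# [^\x00-\x7f] matches any single character outside the ASCII range.
step3 = re.sub(r"[^\x00-\x7f]", " ", step2)
# \s+ matches a run of whitespace, which is collapsed to one space.
step4 = re.sub(r"\s+", " ", step3)

print(step4)
# skills machine learning: na ve bayes, knn others: html, css
```

Note that the non-ASCII character is replaced rather than repaired, so "Naïve" ends up as "na ve". Punctuation such as `:` and `,` survives, which is what keeps the resume format recognizable.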

B) Extract Resume Specialized For Developer

The Category column contains many kinds of jobs, including HR, Arts, and Mechanical Engineer. I kept only the resumes whose Category is a developer role, and then saved them to a text file.

java = data['Category'] == 'Java Developer'
testing = data['Category'] == 'Testing'
devops = data['Category'] == 'DevOps Engineer'
python = data['Category'] == 'Python Developer'
hadoop = data['Category'] == 'Hadoop'
etl = data['Category'] == 'ETL Developer'
block = data['Category'] == 'Blockchain'
dt = data['Category'] == 'Data Science'
database = data['Category'] == 'Database'
dn = data['Category'] == 'DotNet Developer'
network = data['Category'] == 'Network Security Engineer'
sap = data['Category'] == 'SAP Developer'

cleaned_data = data[java|testing|devops|python|hadoop|etl|block|dt|database|dn|network|sap]

# Make the resume as one text
result = ""
for idx, row in cleaned_data.iterrows():
    result = result + row['cleaned_text'] + " "

# Save the text to a file (the with block closes the file automatically)
with open("/opt/notebooks/developer.txt", "w") as f:
    f.write(result)
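
As an alternative, the same filtering can be written more compactly with `isin` and `str.join`. This is only a sketch: `DEVELOPER_CATEGORIES`, `build_corpus`, and the `demo` table are hypothetical names introduced here, and unlike the loop in the original code it does not leave a trailing space at the end of the text.

```python
import pandas as pd

# Category names copied from the developer categories selected in this post.
DEVELOPER_CATEGORIES = [
    "Java Developer", "Testing", "DevOps Engineer", "Python Developer",
    "Hadoop", "ETL Developer", "Blockchain", "Data Science", "Database",
    "DotNet Developer", "Network Security Engineer", "SAP Developer",
]

def build_corpus(df):
    """Join the cleaned resumes of developer categories into one string."""
    mask = df["Category"].isin(DEVELOPER_CATEGORIES)
    return " ".join(df.loc[mask, "cleaned_text"])

# Hypothetical rows standing in for the real DataFrame.
demo = pd.DataFrame({
    "Category": ["Java Developer", "HR", "Hadoop"],
    "cleaned_text": ["java spring", "recruiting", "hive spark"],
})

corpus = build_corpus(demo)
print(corpus)
# java spring hive spark
```

Keeping the categories in one list makes it easier to add or remove a job type later than editing a chain of boolean masks.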

4. Teachable NLP

Teachable NLP is a program that fine-tunes GPT-2 on a text (.txt) file without requiring you to write any NLP code. You upload the preprocessed text file, and the GPT-2 model is fine-tuned on it. I was worried that the dataset was not large enough, so I chose the medium model size and set the number of epochs to 3. In TabTab, you can test the model and generate a resume.

Write your own perfect Résumé by choosing the most appropriate expression out of the 5 candidate sentences. Then show me your résumé in the Forum!

Laeyoung · April 29, 2021, 8:23am

I got this output!
