Parser API

A collection of API endpoints that expose features of the JANZZ Deep Learning tools, including job parsing, CV parsing, occupation extraction, OJA classifier, multiple-entity job posting classifier and term similarity.

job parsing

Parsing an unstructured job description in order to extract relevant types of entities.

Special notes

  • Each supported language is backed by trained deep learning models, specific for that language.

  • Currently, English, Spanish, Norwegian, Arabic, German, Japanese, Portuguese, French, Italian, Chinese and Dutch are supported with higher recall/precision.

  • Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Polish, Romanian, Slovak, Slovenian, Swedish, Catalan, Basque, Thai, Indonesian, Tagalog, Hindi, Urdu, Malay and Vietnamese are also covered with lower recall/precision.

  • The parser input can be free text or a file. Supported file types are: pdf, doc, docx, csv, tsv, txt, rtf, odf, xml, html and xlsx.

  • Input text does not need to be pre-processed or normalized: it is tokenized during parsing, which removes extra newlines, spaces, punctuation, etc. The language does not need to be provided either, as it is detected automatically.

  • Available entity classes:

    • Occupation: Teacher, Engineer, Medical doctor, etc.
    • Specialization: Specific field, for example: Surgery, Administration, etc.
    • Function: Senior director, Manager, etc.
    • Contract_type: Part-time, Full-time, etc.
    • Localization: Switzerland, Toronto ON Canada, etc.
    • Supervisor: Unlike occupation, this identifies the occupation/function of the supervisor for this job
    • Skills: Java, Antenna design, etc.
    • Softskills: Friendly, Fast reaction time, etc.
    • Languages: English, Fluent in Spanish, etc.
    • Industry: Finance, Aerospace, etc.
    • Experience: 5 years, 2 years experience in accounting, etc.
    • Availability: Start date, specific date range for job, ASAP, etc.
    • Salary: Salary range or description such as “competitive salary”
    • Authorizations: HR certifications, IT certifications, Security clearances, etc.
    • Education: University degrees, courses, etc.
    • Working_conditions: Lifting heavy objects, shift work, etc.
    • Company: Company names
    • Number of Vacancies: Amount of open positions
    • Social tags: Special treatments for people with special conditions, first job, over 50 years old, etc.

SINGLE JOB DESCRIPTION:

URL

https://www.janzz.jobs/japi/parser/parse_job/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "job title...",
    "body": "job description..."
}

Or

multipart/form-data

{
    "file": open("document.pdf", "rb")
}

description of input fields

  • title
    • format: string, optional
    • effect: search for entities of occupation class in the title.
  • body
    • format: string, newlines represented by \n
    • effect: search for entities of all classes in the body.
  • file
    • format: document; sending files in binary mode is recommended.
    • effect: search for entities of all classes within the text extracted from the document.
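
As a minimal sketch of the JSON body format above, the following Python snippet builds (but does not send) a POST request using only the standard library. The helper name build_parse_job_request and the headers argument are illustrative assumptions; any credentials the service requires are not described in this section and would have to be supplied by the caller.

```python
import json
import urllib.request

PARSE_JOB_URL = "https://www.janzz.jobs/japi/parser/parse_job/"


def build_parse_job_request(body, title=None, headers=None):
    """Build (without sending) a POST request for a single job description.

    Hypothetical helper: `headers` is a placeholder for whatever
    credentials the service expects, which this section does not specify.
    """
    payload = {"body": body}
    if title is not None:  # "title" is optional
        payload["title"] = title
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})
    return urllib.request.Request(
        PARSE_JOB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


# Newlines in the body are ordinary "\n" characters; json.dumps escapes them.
request = build_parse_job_request(
    "We are hiring.\nFull-time position in Paris.",
    title="Senior Java developers wanted",
)
```

Sending is left to the caller, for example with urllib.request.urlopen(request).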

Query parameters

  • want_cids
    • format: providing one of (true, 1) will return the concept id instead of true / false. If no concept exists, null will be returned.
  • compact
    • format: providing one of (true, 1) will omit the “text” field from the JSON output. This makes responses considerably smaller if the text is not required.
  • output
    • format: switches between the processing modes of the API. Options: “standard”, “normalized” or “[customer name]”; the default is “standard”. For JANZZ customers, we offer customized parser outputs based on individual requirements. To set up a customized version of the API, contact our sales team at info@janzz.technology; the results are then available by passing your “[customer name]” as the “output” parameter.
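
The query parameters can be appended to the endpoint URL as in the sketch below. The function name with_query is an assumption made for illustration, and sending "true" (rather than 1) is just one of the two accepted values listed above.

```python
from urllib.parse import urlencode


def with_query(url, want_cids=False, compact=False, output="standard"):
    """Append the documented query parameters to an endpoint URL.

    Parameters left at their defaults are omitted from the query string.
    """
    params = {}
    if want_cids:
        params["want_cids"] = "true"
    if compact:
        params["compact"] = "true"
    if output != "standard":  # "standard" is the documented default
        params["output"] = output
    return url + ("?" + urlencode(params) if params else "")


url = with_query(
    "https://www.janzz.jobs/japi/parser/parse_job/",
    want_cids=True,
    output="normalized",
)
```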

description of output fields

  • id
    • format: integer
    • description: will always be null, used internally for other processes
  • lang
    • format: 2-character string
    • description: the ISO 639-1 language code detected during parsing
  • title
    • format: string
    • description: the title provided in the input
  • json
    • format: JSON object

    • description: all identified entities

    • format:

      "Entity class" : [list of identified texts],
      "Description" : "tokenized title and description, as used in parser, such that every identified entity appears in its exact form in the description"
      
      • the Entity Class list is a list that contains all the different entity types that the parser can detect and their extracted instances, each entity will have a different level of normalization depending on whether the parser mode is set to “standard” or “normalized”.

standard output

The standard output is, for each entity class, a list of pairs [String, boolean], where String is the entity text and boolean is true if the entity exists in the respective branch of the JANZZ concept graph. Entities that do not exist in the concept graph can still be identified during parsing; the output is not limited to existing terms. The JSON response has the following format:

{
    "id": null,
    "lang": "en",
    "title": "senior Java developers wanted",
    "json": {
        "Function": [],
        "Supervisor": [],
        "Specialization": [],
        "Localization": [
            [
                "Paris",
                false
            ]
        ],
        "Skills": [
            [
                "Java",
                true
            ],
            [
                "C++",
                true
            ],
            [
                "some unknown skill",
                false
            ]
        ],
        "Industry": [],
        "Softskills": [],
        "Experience": [],
        "Availability": [],
        "Languages": [],
        "Description": "the job title and description",
        "Salary": [],
        "Contract_type": [
            [
                "Full - time",
                true
            ],
            [
                "Part - time",
                true
            ],
            [
                "Full - time including weekends",
                false
            ]
        ],
        "Authorizations": [],
        "Education": [],
        "Working_conditions": [],
        "Company": [],
        "Occupation": [
            [
                "Senior Java developers",
                false
            ],
            [
                "Software engineer",
                true
            ]
        ]
    }
}
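
A response in this format can be post-processed as in the following sketch, which separates entities found in the concept graph from those that were not. The function name is hypothetical, and the code assumes the default mode (without want_cids), where the second element of each pair is a boolean.

```python
def split_by_concept_graph(response):
    """Split standard-mode entities into known and unknown concepts.

    Returns two dicts mapping entity class -> list of entity texts.
    """
    known, unknown = {}, {}
    for entity_class, items in response["json"].items():
        if not isinstance(items, list):  # e.g. "Description" is a plain string
            continue
        for text, in_graph in items:
            target = known if in_graph else unknown
            target.setdefault(entity_class, []).append(text)
    return known, unknown


# Trimmed-down version of the example response above.
sample = {
    "id": None,
    "lang": "en",
    "title": "senior Java developers wanted",
    "json": {
        "Localization": [["Paris", False]],
        "Skills": [["Java", True], ["some unknown skill", False]],
        "Industry": [],
        "Description": "the job title and description",
    },
}
known, unknown = split_by_concept_graph(sample)
```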

normalized output

This mode provides a more detailed parser output. For elements such as languages and skills, it returns the proficiency level on a scale of 1 to 5, with 1 being a beginner and 5 an expert. For languages, the proficiency is split into oral and written levels. For education, it returns the education level of a given title in addition to the cid of the concept; e.g. for “Bachelor’s degree in Technical Engineering” the corresponding level is “Bachelor/university primary degree/degree”. Various normalizations are also applied to elements such as contract types, experience and salary, so this mode takes longer to process a query.

{
    "id": null,
    "title": "Job title...",
    "lang": "en",
    "json": {
        "title": "",
        "Industry": ["Freight transportation, logistics and stockholding", "Commercial activities, administration and management"],
        "Company": [
            "First Transit",
            "FirstGroup"
        ],
        "Contract information": [
            {
                "contract_type": ["Full-time"],
                "duration": ["Permanent"],
                "working percentage": {
                    "min": 80,
                    "max": 100
                },
                "workload": {
                    "amount": 40,
                    "lower_unit": "hour",
                    "higher_unit": "week"
                }
            }
        ],
        "Salary": [
            {
                "amount": {
                    "min": 25.0,
                    "max": 25.0
                },
                "currency": "USD",
                "period": "hour"
            }
        ],
        "Benefits": [
            "Health Benefits",
            "Uniforms",
            "Medical, Dental, Vision",
            "401 (k)",
            "Uniforms provided",
            "Safety glasses supplied"
        ],
        "Social tags": [],
        "Working_conditions": [
            "Lift up to 65 lbs",
            "Work in a crouched position",
            "Subjected to dust, dirt, and grease conditions"
        ],
        "Localizations": ["Austin, Texas"],
        "Number of Vacancies": [2],
        "Availability": ["Starting in summer"],
        "Occupation": [
            {
                "text": "Diesel Mechanics",
                "concept": "Diesel Engine Mechanic",
                "cid": "95500",
                "level": "Individual_Contributor_Experienced"
            }
        ],
        "Function": [],
        "Specialization": ["Oil engines"],
        "Education": [{"text": "High School Diploma", "concept": "A Level/High School Diploma/International Baccalaureate (IB)", "cid": 11761, "level": "A Level/High School Diploma/International Baccalaureate (IB)", "specified education field": false}],
        "Experience": [
            {
                "text": "3 + years hands - on diesel maintenance and repair experience",
                "concept": null,
                "cid": null,
                "industry": [
                    "Plant, machine and metal construction"
                ],
                "number_of_years": 3
            }
        ],
        "Skills": [
            {
                "text": "Perform vehicle maintenance",
                "concept": "Vehicle maintenance",
                "cid": "36436",
                "level": 3
            },
            {
                "text": "Computer skills",
                "concept": "Use computers",
                "cid": "36329",
                "level": 3
            },
            {
                "text": "Maintenance Shop",
                "concept": "Shop maintenance",
                "cid": "1116006",
                "level": 3
            },
            {
                "text": "Diesel maintenance and repair",
                "concept": "Heavy diesel repair and maintenance",
                "cid": "1374680",
                "level": 3
            }
        ],
        "Softskills": [
            {
                "text": "Work with little to no supervision",
                "concept": "Work with little to no supervision",
                "cid": "1150640",
                "level": 3
            },
            {
                "text": "Self - motivating",
                "concept": "Self motivation",
                "cid": "557689",
                "level": 3
            },
            {
                "text": "Communicate professionally",
                "concept": "Communication skills",
                "cid": "34140",
                "level": 3
            }
        ],
        "Languages": [{"text": "English", "concept": "English", "cid": "148", "oral level": 4, "written level": 4}],
        "Authorizations": [
            {
                "text": "Background Check",
                "concept": "Background check",
                "cid": "411228"
            },
            {
                "text": "Drug Screen",
                "concept": "Drug and Alcohol Screening",
                "cid": "411447"
            },
            {
                "text": "Motor Vehicle Records Check",
                "concept": "Driving record check",
                "cid": "2339946"
            }
        ],
        "Supervisor": ["Chief of Power Plant maintenance"],
        "Contact Information": {
            "Name": ["John Doe"],
            "Function": ["Sr.Recruiter"],
            "Address": ["Somewhere in ...\"the Mancha\""],
            "Email": ["p.rocinante@lgmail.com"],
            "Number": ["+411234345709"],
            "Online Information": ["psuplyforall.com"]
        },
        "Description": "Job description..."
    }
}
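
As a sketch of how the normalized fields might be consumed, the snippet below reads the skill levels and formats the salary entries. Both function names are illustrative assumptions; the field names are taken from the example above.

```python
def skill_levels(response):
    """Map each normalized skill concept to its proficiency level (1 to 5)."""
    return {
        skill["concept"]: skill["level"]
        for skill in response["json"].get("Skills", [])
        if skill.get("concept")
    }


def salary_summary(response):
    """Render each normalized salary entry as a short human-readable string."""
    lines = []
    for salary in response["json"].get("Salary", []):
        low, high = salary["amount"]["min"], salary["amount"]["max"]
        amount = f"{low:g}" if low == high else f"{low:g}-{high:g}"
        lines.append(f"{amount} {salary['currency']} per {salary['period']}")
    return lines


# Trimmed-down version of the example response above.
sample = {
    "json": {
        "Salary": [
            {"amount": {"min": 25.0, "max": 25.0}, "currency": "USD", "period": "hour"}
        ],
        "Skills": [
            {"text": "Computer skills", "concept": "Use computers", "cid": "36329", "level": 3}
        ],
    }
}
levels = skill_levels(sample)
salaries = salary_summary(sample)
```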

MULTIPLE JOB DESCRIPTIONS:

URL

https://www.janzz.jobs/japi/parser/parse_job_batch/

ALLOWABLE METHODS

POST

JSON body format

[
    {"title": title1, "body": description1},
    {"title": title2, "body": description2},
    {"title": title3, "body": description3},
    ...
]

description of input fields

  • json
    • format: list of dictionaries. The input must be a list of job titles and descriptions in the same format as used for single job parsing. The maximum number of jobs that can be processed in a single call is 50.
    • effect: search all different types of entities inside the job titles and descriptions.
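
Because a single call accepts at most 50 jobs, larger collections have to be split on the client side. The following is a minimal sketch of such chunking; the names MAX_JOBS_PER_CALL and batches are illustrative assumptions.

```python
MAX_JOBS_PER_CALL = 50  # limit stated in the field description above


def batches(jobs, size=MAX_JOBS_PER_CALL):
    """Yield consecutive slices of `jobs`, each with at most `size` items."""
    for start in range(0, len(jobs), size):
        yield jobs[start:start + size]


jobs = [{"title": f"title{i}", "body": f"description{i}"} for i in range(120)]
chunk_sizes = [len(chunk) for chunk in batches(jobs)]
```

Each chunk can then be sent as the JSON body of one batch request.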

Query parameters

  • want_cids
    • format: providing one of (true, 1) will return the concept id instead of true / false. If no concept exists, null will be returned.
  • compact
    • format: providing one of (true, 1) will omit the “text” field from the JSON output. This makes responses considerably smaller if the text is not required.
  • output
    • format: will switch between the different processing modes of the API, options: “standard”, “normalized” or “[customer name]”, default=”standard”.

returns

A JSON response with the following format:

[
{"success": true, "id": null, "lang": language of the 1st job, "title": title1, "json": { "Occupation": [...], "Skills": [...], "Education": [...], ...}},
{"success": false, "id": null, "detail": "We currently do not provide support for this language"},
{"success": true, "id": null, "lang": language of the 3rd job, "title": title3, "json": { "Occupation": [...], "Skills": [...], "Education": [...], ...}},
...
]

description of output fields

  • json
    • format: list of dictionaries
    • description: a list of processed jobs, returned in the same order as the input. Each element includes a “success” key indicating whether the job could be parsed. If it could not, the dictionary contains an additional field “detail” describing the error. Successfully processed jobs include the same fields as described for single job parsing: ({“id”, “lang”, “title”, “json”}).
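
Since results come back in input order, they can be matched to the submitted jobs by position. The sketch below separates parsed jobs from failures; the function name pair_results is a hypothetical choice, and the length check is an added safety assumption rather than documented behaviour.

```python
def pair_results(jobs, results):
    """Pair each submitted job with its result, split by the "success" flag."""
    if len(jobs) != len(results):
        raise ValueError("expected one result per submitted job")
    parsed, failed = [], []
    for job, result in zip(jobs, results):
        if result.get("success"):
            parsed.append((job, result))
        else:
            failed.append((job, result.get("detail")))
    return parsed, failed


jobs = [
    {"title": "title1", "body": "description1"},
    {"title": "title2", "body": "description2"},
]
results = [
    {"success": True, "id": None, "lang": "en", "title": "title1", "json": {}},
    {"success": False, "id": None,
     "detail": "We currently do not provide support for this language"},
]
parsed, failed = pair_results(jobs, results)
```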

cv parsing

Parsing an unstructured resume text in order to extract relevant types of entities.

Special notes

  • Each supported language is backed by trained deep learning models specific for that language.
  • Currently, English, Spanish, German, Japanese, Portuguese and French are supported with higher recall/precision.
  • Bulgarian, Croatian, Czech, Danish, Dutch, Norwegian, Italian, Arabic, Chinese, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Polish, Romanian, Slovak, Slovenian, Swedish, Catalan, Basque, Thai, Indonesian, Tagalog, Hindi, Urdu, Malay and Vietnamese are also covered with lower recall/precision.
  • The parser input can be free text or a file. Supported file types are: pdf, doc, docx, csv, tsv, txt, rtf, odf, xml, html and xlsx.
  • Input text does not need to be pre-processed or normalized: it is tokenized during parsing, which removes extra newlines, spaces, punctuation, etc. The language does not need to be provided either, as it is detected automatically.
  • The entities may relate to the candidate’s personal profile, such as:
    • Name: John
    • Surname: Smith
    • Marital_status: Single, married, divorced, widowed, etc.
    • Birthplace: London, UK
    • Religion: Christian, Muslim, Hindu, etc.
    • Title or Degree: Dr., Prof., etc.
    • Grade_point_average: 4.5
    • Email: johnsmith@gmail.com
    • Interests: Swimming, listening to Jazz, stamp collecting, etc.
    • Citizenships: US Citizen, German, etc.
    • Birthday: 01.12.1975
    • Address: 23 Park Road, Louisville, KY
    • Achievements: Employee of the year 2019, design award recipient, etc.
    • Gender: Male, female, etc.
    • Age: Current age, e.g. 45
    • Publications: Smith J (2020). “What is the future of job matching?” J Int HR 35 (4): 125-148
    • Memberships: Member of the UK Engineering Council, etc.
    • ID: ID card/passport number, e.g. Passport No: 8459214
    • Social tags: Tags for use within JANZZ, e.g. first job, equal opportunity employment, etc.
    • References: Contact details of personal references, e.g. Jane Doe, former Line Manager, j.doe@gmail.com, 458-7896
    • Characteristics: Height: 174cm, brown eyes, etc.
    • Telephone: (541) 754-3010
    • Social_media: LinkedIn, Facebook, Twitter, etc., e.g. https://www.linkedin.com/in/john-smith-123/
    • Portfolio: Samples of past work, e.g. www.beautifulwebsite.com
  • They may also relate to the candidate’s professional career, such as:
    • Occupation: Teacher, Engineer, Medical doctor, etc.
    • Company: company names
    • Specialization: Specific field, for example: Surgery, Administration, etc.
    • Function: Senior director, Manager, etc.
    • Contract_type: Part-time, Full-time, etc.
    • Localization: Switzerland, Toronto ON Canada, etc.
    • Skills: Java, Antenna design, etc.
    • Softskills: Friendly, Fast reaction time, etc.
    • Languages: English, Fluent in Spanish, etc.
    • Industry: Finance, Aerospace, etc.
    • Experience: 5 years, 2 years experience in accounting, etc.
    • Availability: Start date, specific date range for job, ASAP, etc.
    • Salary: Salary range or description such as “competitive salary”
    • Authorizations: HR certifications, IT certifications, Security clearances, etc.
    • Working_Permission: EU Work Permit, US Green Card etc.
    • Education: University degrees, courses, etc.
    • Schools_or_training_companies: State University of New York, Oakfield High School, etc.
    • Working_conditions: Lifting heavy objects, shift work, etc.

SINGLE RESUME:

URL

https://www.janzz.jobs/japi/parser/parse_cv/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "resume title...",
    "body": "resume description..."
}

Or

multipart/form-data

{
    'file': open("document.pdf", 'rb')
}

description of input fields

  • title
    • format: string, optional
    • effect: search for entities of occupation class in the title.
  • body
    • format: string, newlines represented by \n
    • effect: search for entities of all classes in the body.
  • file
    • format: document; sending files in binary mode is recommended.
    • effect: search for entities of all classes within the text extracted from the document.
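
For the multipart/form-data variant, the following sketch encodes a single file field by hand with the standard library. The helper name encode_multipart_file and the application/octet-stream part type are assumptions made for illustration; with the third-party requests library, passing files={"file": open("document.pdf", "rb")} produces an equivalent body automatically.

```python
import uuid


def encode_multipart_file(filename, content, field="file"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


# In practice the bytes would come from open("document.pdf", "rb").read().
body, content_type = encode_multipart_file("document.pdf", b"%PDF-1.4 example bytes")
```

The returned content type must be sent as the Content-Type header so the server can locate the boundary.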

Query parameters

  • want_cids
    • format: providing one of (true, 1) will return the concept id instead of true / false. If no concept exists, null will be returned.
  • compact
    • format: providing one of (true, 1) will omit the “text” field from the JSON output. This makes responses considerably smaller if the text is not required.
  • output
    • format: switches between the processing modes of the API. Options: “standard”, “normalized” or “[customer name]”; the default is “standard”. For JANZZ customers, we offer customized parser outputs based on individual requirements. To set up a customized version of the API, contact our sales team at info@janzz.technology; the results are then available by passing your “[customer name]” as the “output” parameter.
  • get_company_info
    • format: if activated, the industry is obtained from the company where the candidate completed each experience, and additional information is provided for each company. If deactivated, the industry is obtained only from the activity/occupation of each individual experience. This service uses third-party APIs to search a company database, so processing is slower (default=“True”).

description of output fields


  • id
    • format: integer
    • description: will always be null, used internally for other processes
  • lang
    • format: 2-character string
    • description: the ISO 639-1 language code detected during parsing
  • title
    • format: string
    • description: the title provided in the input
  • json
    • format: JSON object

    • description: all identified entities

    • format:

      "Entity class" : [list of identified texts],
      "Description" : "tokenized title and description, as used in parser, such that every identified entity appears in its exact form in the description"
      
      • the Entity Class list is a list that contains all the different entity types that the parser can detect and their extracted instances, each entity will have a different level of normalization depending on whether the parser mode is set to “standard” or “normalized”.

standard output

The standard output is, for each entity class, a list of pairs [String, boolean], where String is the entity text and boolean is true if the entity exists in the respective branch of the JANZZ concept graph. Entities that do not exist in the concept graph can still be identified during parsing; the output is not limited to existing terms. The JSON response has the following format:

{
    "id": null,
    "lang": "en",
    "title": "Project Manager",
    "json": {
        "Birthplace": [],
        "Authorizations": [],
        "Specialization": [],
        "Marital_status": [],
        "Softskills": [
            [
                "Communication skills",
                true
            ],
            [
                "Integrity",
                true
            ]
        ],
        "Working_conditions": [],
        "Languages": [
            [
                "English",
                true
            ],
            [
                "German",
                true
            ]
        ],
        "Religion": [],
        "Title or Degree": [],
        "Grade_point_average": [],
        "Email": [
            [
                "jogn***@********. com",
                false
            ]
        ],
        "Interests": [
            [
                "Traveling",
                false
            ],
            [
                "Cooking",
                false
            ]
        ],
        "Localization": [],
        "Company": [
            [
                "Realizer AG",
                false
            ],
            [
                "Xerox Corporation",
                false
            ]
        ],
        "Birthday": [],
        "Citizenships": [],
        "Address": [],
        "Industry": [],
        "Salary": [],
        "Achievements": [],
        "Function": [],
        "Name": [],
        "Skills": [
            [
                "SWIFT",
                true
            ],
            [
                "Photoshop",
                true
            ],
            [
                "some unknown skill",
                false
            ]
        ],
        "Gender": [],
        "Age": [],
        "Experience": [
            [
                "Business Process Outsourcing Operator",
                true
            ],
            [
                "Banking, financial services and financial management",
                true
            ]
        ],
        "Memberships": [],
        "Contract_type": [],
        "Publications": [],
        "Portfolio": [],
        "ID": [],
        "Surname": [],
        "Social tags": [],
        "Characteristics": [],
        "Working_Permission": [],
        "Telephone": [
            [
                "+417 * * * * * * * *",
                false
            ]
        ],
        "References": [],
        "Schools_or_training_companies": [
            [
                "Informatic Technologies University Titu Maiorescu Bucharest",
                false
            ],
            [
                "School of Art - Design and Ceramics",
                false
            ]
        ],
        "Social_media": [],
        "Education": [
            [
                "BSc in Business Management",
                true
            ],
            [
                "Senior Secondary School Certificate",
                true
            ]
        ],
        "Availability": [],
        "Occupation": [
            [
                "Head of Business Applications",
                true
            ]
        ]
    }
}

normalized output

This mode provides a more detailed parser output. For education, for example, it provides the start and end date, school name, localization, etc. Likewise, for each experience it provides the associated company, date range, industries, specificity, etc. Various normalizations are also applied to elements such as languages, skills and softskills, so this mode takes longer to process a single query.

{
    "lang": "en",
    "title": "Resume title...",
    "competencies": {
        "Occupation": [],
        "Desired Occupation": [],
        "Experience": [
            {
                "text": "Operations Manager",
                "concept": "Operations Manager",
                "cid": "20165",
                "level": "Experienced_Manager",
                "activity_industry": [
                    "Commercial activities, administration and management",
                    "Consumer goods, food, beverage and tobacco manufacturing"
                ],
                "company": "SFuture.ltd",
                "company_industry": "Services - Management services",
                "company_size": 424,
                "localization": "Austin, Texas",
                "start_date": "2022-01-00",
                "end_date": "2019-01-00",
                "number_of_years": 3.0
            },
            {
                "text": "Cannabis Sales Representative",
                "concept": "Cannabis Sales Representative",
                "cid": "2301072",
                "level": "Individual_Contributor_Experienced",
                "activity_industry": ["Retail"],
                "company": "AUTO.Freeway",
                "company_industry": "Manufacturing - Mfg motor vehicle/car bodies",
                "company_size": 54411,
                "localization": "Zurich, Switzerland",
                "start_date": "2019-01-00",
                "end_date": "2016-01-00",
                "number_of_years": 5.0
            }
        ],
        "Skills": [
            {
                "text": "Customer Service",
                "concept": "Customer service",
                "cid": "2569",
                "level": 5
            },
            {
                "text": "Built and designed Website",
                "concept": "Designed and built a website",
                "cid": "619327",
                "level": 4
            },
            {
                "text": "Landing pages",
                "concept": "Landing Pages",
                "cid": "158054",
                "level": 3
            },
            {
                "text": "Kitchen Safety",
                "concept": "Kitchen Safety",
                "cid": "2125716",
                "level": 3
            },
            {
                "text": "Event Planning",
                "concept": "Event planning",
                "cid": "44383",
                "level": 3
            },
            {
                "text": "WordPress",
                "concept": "WordPress (WCMS)",
                "cid": "27288",
                "level": 3
            },
            {
                "text": "Google Analytics",
                "concept": "Google Analytics",
                "cid": "27607",
                "level": 3
            },
            {
                "text": "Proofreading",
                "concept": "Proofreading",
                "cid": "53115",
                "level": 3
            },
            {
                "text": "Adobe InDesign",
                "concept": "Adobe InDesign",
                "cid": "19403",
                "level": 3
            }
        ],
        "Education": [
            {
                "text": "Bachelor's degree in English",
                "concept": "Bachelor of Arts (B.A.) - English",
                "cid": "66432",
                "level": "Bachelor's degree/University primary qualification/Undergraduate degree",
                "specified education field": true,
                "school or company": "Hochschule Bremerhaven",
                "grade": 5.7,
                "localization": "Bremerhaven",
                "start_date": "2012-09-00",
                "end_date": "2015-09-00",
                "number_of_years": 3.0
            }
        ],
        "Softskills": [
            {
                "text": "Can Do Attitude",
                "concept": "Can-do attitude",
                "cid": "35290",
                "level": 3
            },
            {
                "text": "Problem Solver",
                "concept": "Problem-solving skills",
                "cid": "34620",
                "level": 3
            },
            {
                "text": "Team Oriented",
                "concept": "Teamwork skills",
                "cid": "29618",
                "level": 4
            },
            {
                "text": "Accountable",
                "concept": "Willingness to take responsibilities",
                "cid": "176422",
                "level": 3
            }

        ],
        "Languages": [
            {
                "text": "English",
                "concept": "English",
                "cid": "148",
                "oral level": 4,
                "written level": 4
            },
            {
                "text": "German",
                "concept": "German",
                "cid": "143",
                "oral level": 5,
                "written level": 5
            }
        ],
        "Function": [],
        "Specialization": [],
        "Desired Industry": [],
        "Availability": ["As soon as possible"],
        "Authorizations": [
            {
                "text": "CDL License",
                "concept": "CDL Class A - United States",
                "cid": "1903687"
            }]
    },
    "profile": {
        "title": "B.A.",
        "first_name": "John",
        "last_name": "Doe",
        "address": "81081, Grand Canyon, Colorado",
        "id": null,
        "gender": "m",
        "status": "married",
        "age": "40",
        "birthday": "1982-02-29",
        "birthplace": "Basel, Switzerland",
        "phones_list": [
            "+xxx - xxx - xxx"
        ],
        "email": "johndoe@gmail.com",
        "onlinelinks": [{"url": "https://www.linkedin.com/in/candidatepedro", "type": "li"}, {"url": "https://twitter.com/candidatepedro", "type": "tw"}],
        "nationalities": [{"text": "Swiss", "ISO 3166": "CH"}],
        "work_permits": [
            "Authorized to work in the US for any employer"
        ],
        "memberships": ["Tolkien fellowship of the rings"],
        "honors_&_awards": ["Bent Spoon Award"],
        "driver_licenses": ["A Driver License, United States"],
        "highest_education_level": 5,
        "religion": ["Jewish", "Youruba"],
        "characteristics": ["1.85m", "Dark hair", "Blue eyes"],
        "social tags": ["I am deaf and hear via a cochlear implant", "first job"],
        "interests": ["Basketball", "E-sports"],
        "references": ["Under request"],
        "publications": ["The Amazing Spongebob, Vol.21"],
        "desired_localizations": [{"name": "Bali, Indonesia", "cc": "ID", "type": "administrative_area_level_1", "r": 0.3, "lat": -8.409518, "lon": 115.188916}],
        "desired_contract": [{"contract_type": ["Full-time"], "duration": ["Permanent"], "working percentage": {"min": 80, "max": 100}, "workload": {"amount": 40, "lower_unit": "hour", "higher_unit": "week"}}],
        "desired_salary": [{"amount": {"min": 30.0, "max": 35.0}, "currency": "USD", "period": "hour"}],
        "desired_benefits": ["Health Insurance"],
        "desired_working_conditions": ["Work-remotely"],
        "contact_information": {"Name": ["Ms. Shina Parker"], "Function": ["Senior Recruitment Advisor"], "Address": ["Los Angeles, CA, 63103"], "Email": ["s.parker@rechr.com"], "Number": ["+xxx - xxx - xxx"], "Online Information": ["https://www.rechr.com/id=856942"]}
        }
    }
}

MULTIPLE RESUMES:

URL

https://www.janzz.jobs/japi/parser/parse_cv_batch/

ALLOWABLE METHODS

POST

JSON body format

[
    {"title": title1, "body": description1},
    {"title": title2, "body": description2},
    {"title": title3, "body": description3},
    ...
]

Or

description of input fields

  • json
    • format: list of dictionaries. The input must be a list of resume titles and descriptions, each in the same format as used for single resume parsing. The maximum number of CVs that can be processed in a single call is 50.
    • effect: search for all types of entities within the resume titles and descriptions.

Query parameters

  • want_cids
    • format: providing one of (true, 1) will return the concept id instead of true / false. If no concept exists, null will be returned.
  • compact
    • format: providing one of (true, 1) will omit the “text” field from the JSON output. This makes responses considerably smaller if the text is not required.
  • output
    • format: will switch between the different processing modes of the API, options: “standard”, “normalized” or “[customer name]”, default=”standard”.
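
The query parameters above could be attached to the batch URL as in the following minimal sketch. The helper name `build_batch_url` is hypothetical, and sending "true" (rather than 1) is just one of the two accepted values listed above.

```python
from urllib.parse import urlencode

BATCH_URL = "https://www.janzz.jobs/japi/parser/parse_cv_batch/"


def build_batch_url(want_cids=False, compact=False, output="standard"):
    """Return the batch URL with only the non-default query parameters set.

    Hypothetical helper: parameter names come from the list above,
    everything else is an illustration.
    """
    params = {}
    if want_cids:
        params["want_cids"] = "true"
    if compact:
        params["compact"] = "true"
    if output != "standard":  # "standard" is the documented default
        params["output"] = output
    if not params:
        return BATCH_URL
    return BATCH_URL + "?" + urlencode(params)
```

Leaving default values out of the query string keeps the URL short and makes it obvious which options a caller actually changed.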

returns

A JSON response with the following format:

[
{"success": true, "id": null, "lang": language of the 1st resume, "title": title1, "json": { "Occupation": [...], "Skills": [...], "Education": [...], ...}},
{"success": false, "id": null, "detail": "We currently do not provide support for this language"},
{"success": true, "id": null, "lang": language of the 3rd resume, "title": title3, "json": { "Occupation": [...], "Skills": [...], "Education": [...], ...}},
...
]

description of output fields

  • json
    • format: list of dictionaries,
    • description: a list of processed resumes, returned in the same order as the input. Each element includes a “success” key that indicates whether the CV could be parsed. If parsing failed, the dictionary contains a “detail” field describing the error. Successfully processed resumes include the same fields and characteristics as described for individual CV parsing: {“id”, “lang”, “title”, “json”}.
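
Because a single call accepts at most 50 CVs and results come back in input order, a client will typically split its input into chunks and then pair each result with its source. The sketch below illustrates this under those two documented properties; the function names are hypothetical and no network call is made.

```python
MAX_BATCH_SIZE = 50  # limit stated in the input field description


def chunk_resumes(resumes, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` resumes."""
    for start in range(0, len(resumes), size):
        yield resumes[start:start + size]


def split_results(resumes, results):
    """Pair each input with its result (same order) and separate failures.

    Returns (parsed, failed): parsed holds (resume, result) tuples,
    failed holds (resume, detail) tuples taken from the "detail" field.
    """
    parsed, failed = [], []
    for resume, result in zip(resumes, results):
        if result.get("success"):
            parsed.append((resume, result))
        else:
            failed.append((resume, result.get("detail")))
    return parsed, failed
```

Keeping the failed inputs together with their "detail" message makes it straightforward to log or retry them separately.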

janzzs creation

Produces a structured representation from an unstructured job description or resume, which can then be used to generate matches between job seekers and offers, statistical analysis, educational upskilling, etc.

Special notes

  • Each supported language is backed by a pipeline of different deep learning models, each performing a specific NLP task, to produce the final output.
  • Currently, English, Spanish, Norwegian, Arabic, German, Portuguese, Japanese, French, Italian, Chinese and Dutch are supported.
  • The input for creating a janzz can be free text or a file. The supported file types are: pdf, doc, docx, csv, tsv, txt, rtf, odf, xml, html and xlsx.
  • Input text does not need to be pre-processed or normalized, as it will be tokenized during parsing, so extra newlines, spaces, punctuation, etc. will be removed, and language does not need to be provided, as it will be automatically detected.

Janzz from jobs

URL

https://www.janzz.jobs/japi/parser/job_to_janzz/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "job title...",
    "text": "job description..."
}

Or

multipart/form-data

{
    'file': open("document.pdf", 'rb')
}

description of input fields

  • title
    • format: string, optional
    • effect: search for entities of all classes in the title and normalize them.
  • text
    • format: string, newlines represented by \n
    • effect: search for entities of all classes in the text and normalize them.
  • file
    • format: document; sending files in binary mode is recommended.
    • effect: search and normalize entities of all classes within the text extracted from the document.

returns

The output follows the standard janzz format, which is described here: https://www.janzz.jobs/static/doc/apiv1/janzz_api.html#janzz-format
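
As an illustration, the JSON variant of this request could be assembled with the Python standard library as sketched below. The helper name is hypothetical, and authentication headers are deliberately left out because this section does not describe them; the request object is only built here, not sent.

```python
import json
from urllib.request import Request

JOB_TO_JANZZ_URL = "https://www.janzz.jobs/japi/parser/job_to_janzz/"


def build_job_to_janzz_request(text, title=None):
    """Build (but do not send) a POST request with the JSON body format above.

    Hypothetical helper; any credentials your account requires
    would have to be added to the headers.
    """
    payload = {"text": text}
    if title:  # the title is optional
        payload["title"] = title
    return Request(
        JOB_TO_JANZZ_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the required credentials have been added.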

Janzz from resumes

URL

https://www.janzz.jobs/japi/parser/cv_to_janzz/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "resume's title, (if available)",
    "text": "resume's text..."
}

Or

multipart/form-data

{
    'file': open("document.pdf", 'rb')
}

description of input fields

  • title
    • format: string, optional
    • effect: search for entities of all classes in the title and normalize them.
  • text
    • format: string, newlines represented by \n
    • effect: search for entities of all classes in the text and normalize them.
  • file
    • format: document; sending files in binary mode is recommended.
    • effect: search and normalize entities of all classes within the text extracted from the document.

returns

The output is split into two sections: “profile” and “janzz”. The “profile” section contains the candidate’s personal information, such as first name, last name and address, while the “janzz” section contains all of the candidate’s competencies and follows the standard janzz format, which is described here: https://www.janzz.jobs/static/doc/apiv1/janzz_api.html#janzz-format

occupation extract

Parse a free-text job ad in order to identify the most probable occupations.

Special notes

  • Extracted occupations either appear in the job description or are similar occupations from the ontology, even if the latter do not actually appear in the input text.
  • Currently, English, Spanish, Norwegian, Arabic, German, Portuguese, French, Italian, Chinese and Dutch are supported.
  • Input text does not need to be pre-processed or normalized, as it will be tokenized during parsing, so extra newlines, spaces, punctuation, etc. will be removed, and language does not need to be provided, as it will be automatically detected.

URL

https://www.janzz.jobs/japi/parser/occupation_extract/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "job title...",
    "body": "job description..."
}

Or

Multi-Part request

{
    'file': open("document.pdf", 'rb')
}

description of input fields

  • title
    • format: string, optional
    • effect: search for entities of occupation class in the title.
  • body
    • format: string, newlines represented by \n
    • effect: search for entities of all classes in the body.

returns

A JSON response with the following format:

{
    "lang": "en",
    "results": [
        {
            "text": "Java Developer",
            "in_text": false,
            "score": 0.93,
            "cid": 22432
        },
        {
            "text": "Java developer web",
            "in_text": false,
            "score": 0.91,
            "cid": 142936
        },
        {
            "text": "Agile Java Developer",
            "in_text": false,
            "score": 0.9,
            "cid": 386003
        },
        {
            "text": "Junior Java Developer",
            "in_text": false,
            "score": 0.89,
            "cid": 552488
        },
        ...
    ]
}

description of output fields

  • lang
    • format: 2-character string
    • description: the ISO 639-1 language code detected during parsing
  • results
    • format: list
    • description: list of most likely occupations
    • result format:
      • "text": the occupation
      • "cid": the concept ID of the occupation, or null if the occupation does not exist in the JANZZ Ontology
      • "score": probability of correctness, between 0 and 1
      • "in_text": boolean, whether or not the occupation appears in the input text
      
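
A client will often want to keep only confident occupations that exist in the ontology. The following sketch filters a response of the shape shown above; the function name and the default threshold of 0.9 are arbitrary illustrations, not recommendations from the API.

```python
def select_occupations(response, min_score=0.9, require_cid=True):
    """Return result items with score >= min_score, best first.

    Hypothetical helper operating on the documented fields
    "text", "cid", "score" and "in_text".
    """
    selected = []
    for item in response.get("results", []):
        if item["score"] < min_score:
            continue
        if require_cid and item.get("cid") is None:
            continue  # occupation not present in the JANZZ Ontology
        selected.append(item)
    return sorted(selected, key=lambda item: item["score"], reverse=True)
```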

OJA Classifier

Online job advertisement classifier

Classify a job advertisement in multiple international taxonomies. For each requested classification, a list of the codes assigned within that classification is returned, ordered by probability. Each element contains a specific code, a matching concept in the JANZZ Ontology, the cid of that concept and the score value.

For the list of all available classifications, see: https://www.janzz.jobs/static/doc/apiv1/classifications.html#classifications-and-taxonomies

Special notes

  • The maximum number of codes returned for each classification is 5. However, results are not filtered by a minimum score value; this is left to the user to decide.
  • Currently, English, Spanish, Norwegian, Arabic, German, Dutch, Portuguese, French, Italian and Chinese are supported.
  • Input text does not need to be pre-processed or normalized, as it will be tokenized during parsing, so extra newlines, spaces, punctuation, etc. will be removed.
  • If the job description is not empty, the language is detected automatically. If the job description is empty, the title alone is not enough to recognize the correct language, so the language should be specified in advance.

URL

https://www.janzz.jobs/japi/test_classifier/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "job title...",
    "text": "job description...",
    "classifications" : ["list of classifications"],
    "lang": "language"
}

description of input fields

  • title
    • format: string, required
    • effect: search for entities of occupation class in the title.
  • text
    • format: string, newlines represented by \n, optional
    • effect: search for entities of occupation class in the job description.
  • classifications
    • format: list of strings, required
    • effect: classifies and ranks the occupations present in the title and in the text according to these classifications.
  • lang
    • format: string, newlines represented by \n, required if no job description is provided
    • effect: search for concept labels only in this language.

Query parameters

  • show_empty
    • format: providing true will also return concepts which do not have a value for requested classifications.
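
The input rules above (title and classifications required, lang required only when no job description is given) can be checked on the client before sending anything. This is a minimal sketch with a hypothetical helper name; the server may enforce these rules differently.

```python
def build_classifier_payload(title, classifications, text="", lang=None):
    """Assemble the JSON body for the classifier and validate required fields.

    Hypothetical helper based on the input field descriptions above.
    """
    if not title:
        raise ValueError("title is required")
    if not classifications:
        raise ValueError("at least one classification is required")
    payload = {"title": title, "classifications": list(classifications)}
    if text.strip():
        payload["text"] = text
    elif not lang:
        # Without a description the language cannot be detected automatically.
        raise ValueError("lang is required when no job description is provided")
    if lang:
        payload["lang"] = lang
    return payload
```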

returns

A JSON response with the following format:

{
    "classifications": {
        "ISCO-08": [
            {"concept": "SOFTWARE ENGINEER", "code": "2512", "cid": "22399", "score": 1.0},
            {"concept": "SOFTWARE ENGINEER", "code": "2514", "cid": "22399", "score": 1.0},
            {"concept": "Engineer Technical Software", "code": "2511", "cid": "213637", "score": 0.93},
            {"concept": "Software Engineer QA", "code": "2519", "cid": "161733", "score": 0.88},
            {"concept": "Java-/Web-Software Engineer/Architect", "code": "2513", "cid": "142936", "score": 0.85}
        ],
        "ESCO": [
            {"concept": "SOFTWARE ENGINEER", "code": "http://data.europa.eu/esco/isco/C2512", "cid": "22399", "score": 1.0},
            {"concept": "Software Design Engineer", "code": "http://data.europa.eu/esco/occupation/f2b15a0e-e65a-438a-affb-29b9d50b77d1", "cid": "66623", "score": 0.94},
            {"concept": "Embedded Software Engineer", "code": "http://data.europa.eu/esco/occupation/57af9090-55b4-4911-b2d0-86db01c00b02", "cid": "22450", "score": 0.92},
            {"concept": "Software Engineer (IAM)", "code": "http://data.europa.eu/esco/isco/C2514", "cid": "216403", "score": 0.89},
            {"concept": "SOFTWARE DEVELOPMENT ENGINEER IN TEST / QA ANALYST", "code": "http://data.europa.eu/esco/occupation/7086d0ca-1e77-4690-89c9-7ed1a0478fa3", "cid": "54978", "score": 0.8}
        ],
        "ROME V3": [
            {"concept": "SOFTWARE ENGINEER", "code": "M1805", "cid": "22399", "score": 1.0}
        ]
    }
}

description of output fields

  • classifications

    • format: JSON object
    • description: one entry per requested classification, each holding a list of at most 5 elements that correspond to the best codes assignable to this combination of job title and description. The list is ordered by the probability that each code applies.
    • each element in the list is a dictionary: {‘concept’: string, ‘code’: string, ‘cid’: string, ‘score’: float}
    • concept is the term found in the JANZZ Ontology that carries this code, code is the value within the classification to which the element belongs, cid is the ontology id of the concept found, and score is the probability that this classification is correct.
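
Since the API returns up to 5 codes per classification without applying a minimum score, the caller has to pick a cut-off. The sketch below keeps, per classification, the codes at or above a caller-chosen threshold; the function name and the default of 0.9 are illustrative assumptions.

```python
def codes_above_threshold(response, min_score=0.9):
    """Map each classification name to its codes with score >= min_score.

    Hypothetical helper; the order returned by the API (most probable
    first) is preserved.
    """
    selected = {}
    for name, candidates in response.get("classifications", {}).items():
        selected[name] = [c["code"] for c in candidates if c["score"] >= min_score]
    return selected
```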

MEJPClassifier

Multiple Entity Job Posting Classifier

Parse a free-text job description to identify all known entities and classify certain entities, such as occupation, industry, education, contract and salary, into a set of predefined classes.

Special notes

  • The title of the advertisement must contain an occupation, which is classified according to its Seniority Degree, O*Net, RIASEC, ESCO and ISCO-08 classifications. If the title is empty, all occupations extracted from the job description are classified and the classifications of all of them are returned.
  • Industry entities will be classified in the ISIC taxonomy and education will be classified in the ISCED classification.
  • The contract terms extracted will be structured according to their type, duration, workload and working percentage, and the salary terms will be described according to their currency, maximum, minimum and period of payment.
  • Supervised deep learning models were trained as classifiers for each of the entities.
  • Currently, English, Spanish, Norwegian, Arabic, German, Dutch, Portuguese, French, Italian and Chinese are supported.
  • Input text does not need to be pre-processed or normalized, as it will be tokenized during parsing, so extra newlines, spaces, punctuation, etc. will be removed, and language does not need to be provided, as it will be automatically detected.

URL

https://www.janzz.jobs/japi/customer_igb/

ALLOWABLE METHODS

POST

JSON body format

{
    "title": "job title...",
    "text": "job description..."
}

description of input fields

  • title
    • format: string, optional
    • effect: search for entities of occupation class in the title.
  • text
    • format: string, newlines represented by \n
    • effect: search for entities of all classes in the body.

returns

A JSON response with the following format:

{
        "Authorizations": [],

        "ESCO Function": ['http://data.europa.eu/esco/occupation/bd272aee-adc9-4a06-a15c-a73b4b4a46a7'],

        "Softskills": ['Innovative', 'Self - motivated','Flexible'],

        "RIASEC Codes": ['IRE'],

        "Specializations": [],

        "Languages": ['English'],

        "Working_conditions": [],

        "O*Net": {'Interests': ['IRC', 'IC', 'ICR'], 'Job Zones': ['4'], 'Codes': ['15-1132.00'], 'Work Values': ['Working Conditions', 'Recognition', 'Achievement', 'Independence'], 'Job Family': ['Computer and Mathematical']},

        "Number of Vacancies": [],

        "Experiences": ['Two (2) years of experience in the job offered or in any related position (s)'],

        "Localizations": ['New York, NY'],

        "Salary": [{'currency': 'USD', 'amount': {'max': 90000.0, 'min': 74000.0}, 'period': 'year'}],

        "Benefits" : [],

        "Skills" : ['JavaScript', 'TypeScript', 'Java', 'Go',  'Angular'],

        "Companies" : ['GLMX'],

        "Social tags" : [],

        "ISIC Industries" : [],

        "Career level" : ['Individual_Contributor_Senior'],

        "Supervisor" : [],

        "ISCO Functiongroup" : ['2512'],

        "Occupations" : ['Senior Software Engineer'],

        "Vacancy language": "en",

        "Contract information" : [{'contract_type': ['Full-Time'], 'duration': [], 'workload': {'amount': 40, 'lower_unit':  'hour', 'higher_unit': 'week'}, 'working percentage': {'max': 100, 'min': 80}}],

        "ISCED Educations" : [{'Education': "Bachelor's degree in Computer Science, Computer Applications, Software or Computer Engineering or any related IT or Engineering field of study", 'ISCED_code': 'ISCED_6'}],

        "Availability": []
}

description of output fields

  • Vacancy language
    • format: 2-character string
    • description: the detected language used during parsing.
  • Occupations, Skills, Softskills, Specializations, Languages, Working_conditions, Number of Vacancies, Experiences, Localizations, Authorizations, Benefits, Companies, Social tags, Supervisor, Availability
    • format: list of strings
    • description: list of entities extracted from each of these categories
  • Career level
    • format: list of strings
    • description: seniority degree of the job title if it is not empty; otherwise, all occupations extracted from the description are classified and their associated degrees are returned.
    • list of possible seniority levels: ‘Individual_Contributor_Experienced’, ‘Individual_Contributor_Entry_Level’, ‘Individual_Contributor_Senior’, ‘Entry_Level_Manager’, ‘Experienced_Manager’, ‘Executive’.
  • ESCO Function, RIASEC Codes, ISCO Functiongroup
    • format: list of strings
    • description: list of codes within the ESCO, ISCO-08 or RIASEC classifications associated with the job title if it is not empty; otherwise, all occupations extracted from the description are classified and the predicted codes for each of them are returned.
  • O*Net
    • format: dictionary
    • description: O*Net codes associated with the job title if it is not empty; otherwise, all occupations extracted from the description are classified and the predicted codes for each of them are returned. The result is a dictionary: {‘Interests’: List of strings, ‘Job Zones’: List of strings, ‘Codes’: List of strings, ‘Work Values’: List of strings, ‘Job Family’: List of strings}
  • ISIC Industries
    • format: list of strings
    • description: If the industries to which the job advertisement belongs are explicitly mentioned in the text of the job description, they will be extracted and for each one their respective ISIC codes will be predicted.
  • ISCED Educations
    • format: list of dictionaries
    • description: for each education requirement of the job advertisement that is present in the text of the job description, the corresponding ISCED code is predicted. Each result is a dictionary: {‘Education’: Education string, ‘ISCED_code’: code string}
  • Salary
    • format: list of dictionaries
    • description: a list of all salaries found in the description with their minimum and maximum amounts, currency and payment period. Each result is a dictionary: {‘currency’: code string, ‘amount’: {‘max’: float, ‘min’: float}, ‘period’: duration string}
    • the currency code is a 3-letter ISO 4217 code,
    • the current list of payment period types is: [“hour”, “day”, “week”, “month”, “year”]
  • Contract information
    • format: list of dictionaries
    • description: the contract terms that the job advertisement offers and are present in the text of the job description are grouped and classified in a fixed set of categories, the result is a dictionary: {‘contract_type’: list of strings, ‘duration’: list of strings, ‘workload’: {‘amount’: float , ‘lower_unit’: string , ‘higher_unit’: string }, ‘working percentage’: {‘max’: float, ‘min’: float}}
    • list of all possible types of contract: ‘Part-Time’, ‘Full-Time’, ‘Trial apprenticeship’, ‘Traineeship/Apprenticeship’, ‘Internship’, ‘Student Job’, ‘Interim Management’, ‘Mandate/ invoice’, ‘Supply of staff’.
    • list of possible duration types: ‘Per Day’, ‘Temporary/Fixed-Term’, ‘Seasonal’, ‘Permanent’, ‘Volunteer’, ‘Work remotely’, ‘Freelance/Service Contract/Project’, ‘Dissertation / Thesis’, ‘Mini Job/Micro Job/Odd job’, ‘Performance/artist contract’.
    • the field ‘working percentage’ represents the percentage of a general workday that the job should occupy. If present, it includes minimum and maximum values on a scale of 1 to 100.
    • the workload is the nominal amount of time, expressed in the same units of measurement that the employer uses, for example “40 hours per week”. The available time units are the same as those used for salary periods.
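
To show how the structured salary entries might be consumed, the sketch below turns one element of the “Salary” list into a short human-readable string. The function name and output layout are hypothetical; the bounds are sorted defensively, so the function does not rely on ‘min’ being smaller than ‘max’.

```python
def describe_salary(salary):
    """Format one "Salary" entry, e.g. as "USD 10.00-20.00 per hour".

    Hypothetical helper; returns None when no amount is present.
    """
    amount = salary.get("amount") or {}
    bounds = sorted(
        value for value in (amount.get("min"), amount.get("max"))
        if value is not None
    )
    if not bounds:
        return None
    if bounds[0] == bounds[-1]:
        value = f"{bounds[0]:,.2f}"
    else:
        value = f"{bounds[0]:,.2f}-{bounds[-1]:,.2f}"
    currency = salary.get("currency") or "?"
    period = salary.get("period") or "?"
    return f"{currency} {value} per {period}"
```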

jobs sectionizer

Sectionize the text of a job description into several main sections in order to structure the offer in an organized manner. The segmentation is based entirely on the semantic context of each region; no rules or heuristics are applied. Instead, deep learning models have been trained for this purpose, providing robustness to multiple layouts, writing styles and domains of job advertisements.

Special notes

  • The full list of available sections is: Title, Introduction, Duties, Qualifications, Offer & Benefits, Contact Information.
  • For each of the sections, the principal section headline is also identified and extracted.
  • The input language does not need to be provided, as it is automatically detected during preprocessing.
  • This feature is currently available in English and German; more languages are under development.
  • The sectionizer also supports plain HTML content as input. The tags are preprocessed and used to assemble the text of the job description in a way that reflects the original website layout.

URL

https://www.janzz.jobs/japi/job_sectionizer/

ALLOWABLE METHODS

POST

JSON body format

{
    "text": "Job description..."
}

description of input fields

  • text
    • format: string, newlines represented by \n
    • effect: search each of the sections in the job offer.

returns

A JSON response with the following format:

{'Title': 'Senior Software Developer',

 'Introduction': {'header': 'About the team', 'text': 'We proudly drive our mission with unique team culture & engineering excellence where we help more than 400,000 customers worldwide. As a cloud company with 200 million users and more than 100,000 employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. We are looking for you, a motivated and well-versed Senior Software Developer who has an excellent track record in developing enterprise software in the cloud...'},

 'Duties': {'header': 'Your Reponsabilities:', 'text': 'With your personal engagement and ability to develop modern products based on newest technologies you will support us in successfully carrying out our mission to help our customers digitize their business processes. By building, delivering and operating state of the art microservices you will enable them to build, run, manage and improve workflows, from simple approvals to complex end-to-end processes used across applications, and running across organizations.'},

 'Qualifications': {'header': 'What we expect from you:', 'text': 'Are you passionate to apply agile software engineering practices and develop high quality end-to-end software? You hold a Bachelor/Master degree in Computer/Natural Science, Engineering or related. You have 5 or more years of experience in professional software development. You bring very good software development skills in ...'},

 'Offer & Benefits': {'header': 'What we offer:', 'text': 'A company environment and culture that is focused on helping our employees enable innovation by building breakthroughs together. A highly collaborative, caring team environment with a strong focus on learning and development, recognition for your individual contributions, and a variety of benefit options for you to choose from ...'},

 'Contact Information': {'header': 'Interested?', 'text': 'If we have caught up your attention and you are eager to join the crew, please send an e-mail with your request to our recruiting operations team: Americas at cloud@supercompany.com. Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, age, gender (including pregnancy, childbirth, et al), sexual orientation, gender identity or expression, protected veteran status, or disability...'}}

description of output fields

  • Title
    • format: string
    • description: the title of the job description. If several titles are present in the text, the one with the highest confidence score is returned.
  • Introduction:
    • format: dictionary containing “header” and “text”
    • description: The “header” field contains the introductory phrase of the section; if multiple headers are detected, only the first one is returned. The “text” field contains information about the background and vision of the company, as well as an overview of the type of candidate they are looking for.
  • Duties:
    • format: dictionary containing “header” and “text”
    • description: The “text” field describes the candidate’s tasks and responsibilities in the future role.
  • Qualifications:
    • format: dictionary containing “header” and “text”
    • description: The “text” field describes what qualifications the applicant must possess to apply for the given position.
  • Offer & Benefits:
    • format: dictionary containing “header” and “text”
    • description: The “text” field includes the salary and benefits, as well as the work environment and company culture.
  • Contact Information:
    • format: dictionary containing “header” and “text”
    • description: The “text” field contains information and conditions related to the application process that are explicitly mentioned in the text of the job description.
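
As a usage illustration, a sectionizer response of the shape described above could be rendered back into an ordered plain-text outline. The helper below is a hypothetical sketch: it assumes the section keys are spelled as in the example response and falls back to the section name when no header was detected.

```python
SECTION_ORDER = [
    "Introduction",
    "Duties",
    "Qualifications",
    "Offer & Benefits",
    "Contact Information",
]


def render_sections(response):
    """Return the title followed by each non-empty section as plain text."""
    lines = []
    title = response.get("Title")
    if title:
        lines.append(title)
    for name in SECTION_ORDER:
        section = response.get(name)
        if not section or not section.get("text"):
            continue  # section missing or empty in this job offer
        lines.append("")
        lines.append(section.get("header") or name)
        lines.append(section["text"])
    return "\n".join(lines)
```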

terms similarity

Search for the most similar concepts to the input query in the JANZZ Ontology, and sort them according to their semantic similarity.

URL

https://www.janzz.jobs/japi/parser/similarity/

ALLOWABLE METHODS

POST

JSON body format

{
    "term": "java coding language",
    "branch": "skill",
    "lang": "en"
}

description of input fields

  • term
    • format: string
    • effect: search for concepts similar to this free-text string.
  • branch
    • format: string
    • allowed values: occupation, function, specialization, skill, softskill, industry, education, authorization
    • effect: the branch to search for similar concepts in.
  • lang
    • format: 2-character string, ISO 639-1 language code
    • effect: will search only in this language.

returns

A JSON response with the following format:

{
    "results": [
        [
            19523,
            "JAVA programming language",
            0.82
        ],
        [
            42158,
            "Java-script programming language",
            0.8
        ]
    ]
}

description of output fields

  • results
    • format: json list
    • description: all similar concepts, sorted by most-similar first
    • result format
      • concept-id
      • closest matching label
      • similarity score from 0-1
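
Each result is a positional triple, which is easy to misread in client code. A minimal sketch that gives the three positions names and applies an optional score cut-off is shown below; the class and function names are hypothetical.

```python
from typing import NamedTuple


class SimilarConcept(NamedTuple):
    cid: int      # concept-id
    label: str    # closest matching label
    score: float  # similarity score from 0 to 1


def parse_similarity(response, min_score=0.0):
    """Convert the "results" triples into named tuples, keeping the API order."""
    concepts = [SimilarConcept(*row) for row in response.get("results", [])]
    return [concept for concept in concepts if concept.score >= min_score]
```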