File size: 4,576 Bytes
93daa38
 
 
 
 
 
 
 
 
 
 
 
 
85b8d58
 
 
cf9329c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4956b17
cf9329c
e31d2da
cf9329c
e31d2da
 
 
 
 
cf9329c
 
e31d2da
 
 
 
 
 
 
 
cf9329c
e31d2da
 
cf9329c
 
e31d2da
 
 
 
cf9329c
 
 
 
 
e31d2da
cf9329c
 
 
 
 
 
 
 
 
 
 
057a19f
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
<div style="display: flex; align-items: center; justify-content: center;">
  <div style="margin-right: 20px;">
    <img src="https://cdn-lfs-us-1.hf.co/repos/de/fb/defb007867acd8852f4a283e9b06a933778826b18ed58ade01da945f5903795d/8b7831230df7d554c74f5e249e23be57165d143fea0ea7b5dde56dde5c13c95b?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27turing-test.gif%3B+filename%3D%22turing-test.gif%22%3B&response-content-type=image%2Fgif&Expires=1730008247&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczMDAwODI0N319LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zL2RlL2ZiL2RlZmIwMDc4NjdhY2Q4ODUyZjRhMjgzZTliMDZhOTMzNzc4ODI2YjE4ZWQ1OGFkZTAxZGE5NDVmNTkwMzc5NWQvOGI3ODMxMjMwZGY3ZDU1NGM3NGY1ZTI0OWUyM2JlNTcxNjVkMTQzZmVhMGVhN2I1ZGRlNTZkZGU1YzEzYzk1Yj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=GBUn-4z3PMBTqT0NdT3H-NyZxNMGcN4zDNzK8ql%7ESwLF8pXzkH783GSCZQQYWwE-v1g90JTulsOt7z5szigK49ApFju6bkS2zwUAYxNttcl3c-VYrxGuFWYnkHpTQ73qbs3ELF2-5LzDy1ARpj3BOlSEXtH9ShwCRm-R0llQJ6EDx2eOyBIDg-Pgrx%7EKIxrdAZCNln9tJk74TrSN5survdIvcSZrSIGXc3tpFLm-BwpY6qtID3ltrPEHYWDrQ5ALV8lXqKmpVlFSq3lOEFlSa-opFJwe%7E8FIIwP5mJgtCZzlQQylRhsVLxDQ2cJYpTbZSvEVkfjyTxOP4dc%7EDz1tVQ__&Key-Pair-Id=K24J24Z295AEI9" 
         alt="AI App Icon" width="100" height="50" 
         style="border-radius: 20px; border: 2px solid #333;">
  </div>
  <div>
    <p style="font-size: 50px; font-weight: bold; text-align: center; margin: 0;">
      Spacy Model Creator
    </p>
  </div>
</div>
<hr>

<hr>

# Overview:
This project is a comprehensive Resume Parsing tool built using Python,
integrating the Mistral-Nemo-Instruct-2407 model for primary parsing.

# Installation Guide:

1. Create and Activate a Virtual Environment
    python -m venv venv
    source venv/bin/activate  # For Linux/Mac
    # or
    venv\Scripts\activate  # For Windows

    # NOTE: If the virtual environment (venv) is already created, you can skip the creation step and just activate.
        - For Linux/Mac:
            source venv/bin/activate
        - For Windows:
            venv\Scripts\activate

2. Install Required Libraries
    pip install -r requirements.txt

    # Ensure the following dependencies are included:
    - Flask
    - spaCy
    - huggingface_hub
    - PyMuPDF
    - python-docx
    - Tesseract-OCR (for image-based parsing)

; NOTE : If any model or library is not installed, you can install it using:
    pip install <model_name>
    _Replace <model_name> with the specific model or library you need to install_

3. Set up Hugging Face Token
    - Add your Hugging Face token to the .env file as:
    HF_TOKEN=<your_huggingface_token>


# File Structure Overview:
    Spacy_Model_creator/
    β”‚
    β”œβ”€β”€ Models/
    β”‚   └── ner_model_05_3  # Pretrained spaCy model directory for resume parsing
    β”‚    
    β”œβ”€β”€ data/
    β”‚   └── Json_data.json 
    β”‚   └── resume_text.txt
    β”‚   └── Spacy_data.spacy
    β”‚
    β”œβ”€β”€ templates/
    β”‚   β”œβ”€β”€ anoter.html  
    β”‚   └── result.html   
    β”‚   └── guide.html
    β”‚   └── savejson.html
    β”‚   └── savespacy.html
    β”‚   └── text.html
    β”‚   └── upload.html
    β”‚   └── data_files.html
    β”‚
    β”œβ”€β”€ JSON/ 
    β”‚   └── Json_data.json 
    β”‚
    β”œβ”€β”€ utils/
    β”‚   β”œβ”€β”€ model.py  # Code for calling Mistral API and handling responses
    β”‚   β”œβ”€β”€ json_to_spacy.py  # spaCy fallback model for parsing resumes
    β”‚   β”œβ”€β”€ anoter_to_json.py  # Error handling utilities
    β”‚   └── file_To_text.py  # Functions to extract text from different file formats (PDF, DOCX, etc.)
    β”‚
    β”œβ”€β”€ venv/  # Virtual environment
    β”‚
    β”œβ”€β”€ .env  # Environment variables file (contains Hugging Face token)
    β”‚
    β”œβ”€β”€ app.py  # Flask app handling API routes for uploading and processing resumes
    β”‚
    └── requirements.txt  # Dependencies required for the project

# References:

- [Flask Documentation](https://flask.palletsprojects.com/)
- [spaCy Documentation](https://spacy.io/usage)
- [Hugging Face Hub API](https://huggingface.co/docs/huggingface_hub/index)
- [PyMuPDF (MuPDF) Documentation](https://pymupdf.readthedocs.io/en/latest/)
- [python-docx Documentation](https://python-docx.readthedocs.io/en/latest/)
- [Tesseract OCR Documentation](https://github.com/UB-Mannheim/tesseract/wiki)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)