Parsing study regulations of German universities and making them searchable.
Stefan Naumann 0fef2ee6e4 directory mode and recursive scan for input files 1 year ago


Ideal enigma

Parsing and searching module descriptions, courses, etc. from the study regulations or module catalogue of a university.

Why?

Because.

Well, actually I want to be able to search for modules and courses at my university. There is no central tool or database for that inside the university, so I decided to create one myself. And since I didn’t want to copy over every single module by hand, I created a parsing tool.

How?

This section describes how to use the tool.

Requirements

  • Parser
    • PyMySQL
    • MySQL or MariaDB server
  • Web interface
    • Apache2 + mod_wsgi
    • MySQL or MariaDB
    • PyMySQL
    • Flask

Usage

Step 0: Create text files from the PDF

# either
gs -sDEVICE=txtwrite -o out.txt in.pdf
# or
cd pdftotext
bash pdf2text.sh in.pdf
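If there is a whole folder of PDFs, the Ghostscript variant can be scripted. The following Python sketch is not part of the repository; the function names `gs_command` and `convert_all` are hypothetical, and it assumes `gs` is installed and on the `PATH`.

```python
import subprocess
from pathlib import Path


def gs_command(pdf: Path, txt: Path) -> list[str]:
    """Build the Ghostscript call that extracts the text of `pdf` into `txt`."""
    # -o sets the output file and implies -dBATCH -dNOPAUSE
    return ["gs", "-sDEVICE=txtwrite", "-o", str(txt), str(pdf)]


def convert_all(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every *.pdf below pdf_dir (recursively) into a .txt file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(pdf_dir.rglob("*.pdf")):
        txt = out_dir / (pdf.stem + ".txt")
        # check=True raises CalledProcessError if Ghostscript fails
        subprocess.run(gs_command(pdf, txt), check=True)
        written.append(txt)
    return written
```

For example, `convert_all(Path("regulations"), Path("txt"))` would write one text file per PDF into `txt/`. PDFs with the same file name in different subfolders would overwrite each other.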

Parser:

python enigma.py [parameters] {path to txt-files}

Parameters:

  • --mysql-user <user>
  • --mysql-db <db>
  • --mysql-passwd <passwd>
  • --mysql-host <host/IP>
  • --verbose - make the parser more talkative
  • --driver <drivername> - select the driver that parses the module catalogue
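As a rough illustration of how these parameters could be read, here is a sketch using Python's `argparse`. It is not the actual code of enigma.py: the defaults (host `localhost`, empty password) and the helper names `build_arg_parser`, `collect_input_files` and `connection_kwargs` are assumptions. It also assumes the path may be either a single text file or a directory that is scanned recursively.

```python
import argparse
from pathlib import Path


def build_arg_parser() -> argparse.ArgumentParser:
    """Options as listed above; the defaults are assumptions."""
    p = argparse.ArgumentParser(prog="enigma.py")
    p.add_argument("--mysql-user")
    p.add_argument("--mysql-db")
    p.add_argument("--mysql-passwd", default="")
    p.add_argument("--mysql-host", default="localhost")
    p.add_argument("--verbose", action="store_true",
                   help="make the parser more talkative")
    p.add_argument("--driver", help="driver that parses the module catalogue")
    p.add_argument("path", type=Path, help="text file or directory of text files")
    return p


def collect_input_files(path: Path) -> list[Path]:
    """A single file is used as-is; a directory is scanned recursively for *.txt."""
    if path.is_dir():
        return sorted(path.rglob("*.txt"))
    return [path]


def connection_kwargs(args: argparse.Namespace) -> dict:
    """Map the --mysql-* options to keyword arguments of pymysql.connect()."""
    # argparse turns "--mysql-user" into the attribute "mysql_user"
    return {
        "host": args.mysql_host,
        "user": args.mysql_user,
        "password": args.mysql_passwd,
        "database": args.mysql_db,
    }
```

The resulting dictionary could then be passed on as `pymysql.connect(**connection_kwargs(args))`.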

Web Interface

Work in Progress

Drivers?

I wanted the parser to work for several versions of study regulations and for several universities, so I created an interface, BaseDriver, in driver/base.py. A driver has the following methods:

  • parseModuleCatalogue - go through the text listing of a module catalogue and identify the modules and their parts
  • parseCredits - find the number of credits per module
  • parseExams - find the exams per module
  • parseCourses - find the courses per module
  • parseProgrammeMeta - find meta information about a programme from the study regulations text
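In Python such an interface could be expressed as an abstract base class. This is only a sketch: the method names are the ones listed above, but the parameters, return types and the convenience method `parse` are assumptions, not the contents of driver/base.py.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseDriver(ABC):
    """Interface every catalogue driver implements (signatures assumed)."""

    @abstractmethod
    def parseModuleCatalogue(self, text: str) -> list[str]:
        """Split the catalogue text into one text chunk per module."""

    @abstractmethod
    def parseCredits(self, module_text: str) -> Optional[int]:
        """Return the credits of one module, or None if not found."""

    @abstractmethod
    def parseExams(self, module_text: str) -> list[str]:
        """Return the exams of one module."""

    @abstractmethod
    def parseCourses(self, module_text: str) -> list[str]:
        """Return the courses of one module."""

    @abstractmethod
    def parseProgrammeMeta(self, regulations_text: str) -> dict[str, str]:
        """Return meta information about the programme."""

    def parse(self, catalogue_text: str) -> list[dict]:
        """Hypothetical helper: run the per-module methods on every module."""
        return [
            {
                "credits": self.parseCredits(chunk),
                "exams": self.parseExams(chunk),
                "courses": self.parseCourses(chunk),
            }
            for chunk in self.parseModuleCatalogue(catalogue_text)
        ]
```

Because the methods are abstract, a driver that forgets to implement one of them cannot be instantiated.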

The user selects a driver on invocation (--driver); an instance of it is then created and used to parse the input string (the content of the input files).
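Selecting a driver by name can be as simple as a dictionary lookup. The sketch below is self-contained and hypothetical: `ExampleDriver`, the registry `DRIVERS` and `parse_input` are invented names, and the catalogue layout it handles (each module starting with a line beginning with `Modulnummer` and stating `Leistungspunkte: N`) is made up for illustration rather than taken from a real catalogue.

```python
import re
from typing import Optional


class ExampleDriver:
    """Hypothetical driver for a made-up catalogue layout."""

    def parseModuleCatalogue(self, text: str) -> list[str]:
        # split right before every line that starts a new module
        chunks = re.split(r"(?m)^(?=Modulnummer)", text)
        # drop the preamble in front of the first module
        return [c for c in chunks if c.startswith("Modulnummer")]

    def parseCredits(self, module_text: str) -> Optional[int]:
        match = re.search(r"Leistungspunkte:\s*(\d+)", module_text)
        return int(match.group(1)) if match else None


# value of --driver -> driver class
DRIVERS = {"example": ExampleDriver}


def parse_input(driver_name: str, text: str) -> list[Optional[int]]:
    """Instantiate the selected driver and return the credits of each module."""
    if driver_name not in DRIVERS:
        raise ValueError(f"unknown driver: {driver_name}")
    driver = DRIVERS[driver_name]()
    modules = driver.parseModuleCatalogue(text)
    return [driver.parseCredits(m) for m in modules]
```

A real driver would implement all methods of BaseDriver; the example only shows parseModuleCatalogue and parseCredits.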

Terminology

The whole project is centered around study regulations of German universities or, to be more exact, the study regulations of TU Chemnitz. What constitutes a module is regulated by, among others, the Kultusministerkonferenz (KMK, Standing Conference of the Ministers of Education and Cultural Affairs). An interesting read on this is the Ländergemeinsame Strukturvorgaben zur Akkreditierung von Bachelor- und Masterstudiengängen (joint structural guidelines of the German states for the accreditation of Bachelor's and Master's programmes). For the scope of this project, the terms are defined as follows.

A programme, i.e. a programme of study, is a structured list of modules. The structure defines certain blocks in which a student can take modules subject to certain constraints, like achieving a certain number of credits or completing a certain number of modules (e.g. “three out of five modules” or “reach 12 credit points”).
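The block constraints can be modelled with a small data structure. The following Python sketch uses invented class and field names (`Block`, `Programme`, `min_modules`, `min_credits`) and assumes that a block is satisfied only when all of its constraints are met.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One block of a programme with its constraints."""
    name: str
    modules: dict[str, int]   # module name -> credits
    min_modules: int = 0      # e.g. "three out of five modules"
    min_credits: int = 0      # e.g. "reach 12 credit points"

    def is_satisfied(self, taken: set[str]) -> bool:
        # only modules that belong to this block count towards it
        chosen = [m for m in taken if m in self.modules]
        credits = sum(self.modules[m] for m in chosen)
        return len(chosen) >= self.min_modules and credits >= self.min_credits


@dataclass
class Programme:
    """A programme of study: a structured list of blocks."""
    name: str
    blocks: list[Block] = field(default_factory=list)

    def is_complete(self, taken: set[str]) -> bool:
        return all(b.is_satisfied(taken) for b in self.blocks)
```

For example, `Block("Electives", {"A": 5, "B": 5, "C": 5, "D": 5, "E": 5}, min_modules=3)` expresses “three out of five modules”, and `min_credits=12` expresses “reach 12 credit points”.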

A module is a well-defined unit of a programme. A module is described by its contents and the competences students acquire, as well as by its number of credits, its name and its exams. A module can consist of courses like lectures, tutorials, seminars, etc., and it may conclude with exams, like a written report, a written exam, an oral exam, etc.
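Correspondingly, a module could be represented like this. Again, this is a sketch with hypothetical names that is not taken from the project's code; the enumerations only contain the course and exam types named above.

```python
from dataclasses import dataclass, field
from enum import Enum


class CourseType(Enum):
    LECTURE = "lecture"
    TUTORIAL = "tutorial"
    SEMINAR = "seminar"


class ExamType(Enum):
    WRITTEN_REPORT = "written report"
    WRITTEN_EXAM = "written exam"
    ORAL_EXAM = "oral exam"


@dataclass
class Module:
    name: str
    credits: int
    contents: str = ""       # what the module teaches
    competences: str = ""    # what students are able to do afterwards
    # default_factory gives every module its own, initially empty lists
    courses: list[CourseType] = field(default_factory=list)
    exams: list[ExamType] = field(default_factory=list)
```

A hypothetical module with a lecture, a tutorial and a written exam would then be `Module("Algorithms", 5, courses=[CourseType.LECTURE, CourseType.TUTORIAL], exams=[ExamType.WRITTEN_EXAM])`.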