Fact sheet PDF reader

The app extracts relevant information from many fund fact sheets in PDF format.

To do:

  1. Reading PDF text
    • reading the whole text
    • checking the fund provider (BlackRock iShares, Vanguard, Invesco, Xtrackers, State Street Global Advisors SPDR, etc.)
    • checking the fund's asset class (equity, bonds, precious metals, commodities, cryptocurrencies, real estate, money market, multi asset)
    • checking the fund's exposure (region, country, industry)
    • checking the fund's strategy (market-cap-weighted, factor investing, market-cap-weighted with a capped maximum allocation)
    • formatting the text into JSON
  2. Handling multiple files at once
    • creating folders for the input PDFs and for the outputs
    • processing multiple files in parallel (multiprocessing)
  3. Exporting data to files
    • JSON
    • CSV
    • Excel workbook with multiple charts
    • trying a Python charting library that is new to me for good-looking charts
  4. API
    • creating a test API
    • pushing data to the API, checking it, and getting it back
  5. Detecting two files with the same ISIN
    • asking whether to merge the two data sets into one
    • merging them and creating a 100% stacked area chart in Excel to show how exposure changes over time
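For step 1, here is a minimal sketch of turning extracted fact-sheet text into a JSON-ready dict. It assumes the third-party pypdf package (`PdfReader` and `page.extract_text()`) for reading the PDF; the keyword tables, the priority order of strategies and the output field names are hypothetical first guesses that would need tuning against real fact sheets. Exposure parsing (region, country, industry) is left out because it depends on each provider's table layout.

```python
import json
import re

# Hypothetical keyword tables: a first guess, to be tuned on real fact sheets.
PROVIDERS = {
    "iShares": ["ishares", "blackrock"],
    "Vanguard": ["vanguard"],
    "Invesco": ["invesco"],
    "Xtrackers": ["xtrackers"],
    "SPDR": ["spdr", "state street"],
}
ASSET_CLASSES = {
    "equity": ["equity", "equities", "stocks"],
    "bonds": ["bond", "fixed income"],
    "precious metals": ["gold", "silver", "platinum"],
    "commodities": ["commodity", "commodities"],
    "cryptocurrencies": ["bitcoin", "ethereum", "crypto"],
    "real estate": ["real estate", "property"],
    "money market": ["money market"],
    "multi asset": ["multi asset", "multi-asset"],
}
# Checked in order, so the most specific strategy wins.
STRATEGIES = [
    ("market cap weighted, capped", ["capped"]),
    ("factor investing", ["factor", "momentum", "minimum volatility"]),
    ("market cap weighted", ["market cap", "market-cap"]),
]
# ISIN format: 2-letter country code, 9 alphanumeric characters, 1 check digit.
ISIN_RE = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")


def read_pdf_text(path):
    """Concatenate the text of all pages (needs the third-party pypdf package)."""
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def best_match(text, table):
    """Return the label whose keywords occur most often in text, or None."""
    scores = {label: sum(text.count(k) for k in keys) for label, keys in table.items()}
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score > 0 else None


def first_match(text, rules):
    """Return the first label whose keywords appear in text, or None."""
    for label, keys in rules:
        if any(k in text for k in keys):
            return label
    return None


def parse_fact_sheet(text):
    """Extract a few fields from fact-sheet text into a JSON-serializable dict."""
    lowered = text.lower()
    isin = ISIN_RE.search(text)  # ISINs are upper case, so search the original text
    return {
        "isin": isin.group(0) if isin else None,
        "provider": best_match(lowered, PROVIDERS),
        "asset_class": best_match(lowered, ASSET_CLASSES),
        "strategy": first_match(lowered, STRATEGIES),
    }
```

Usage would be `json.dumps(parse_fact_sheet(read_pdf_text("factsheet.pdf")), indent=2)`, where the file name is only an example. Counting keyword hits (rather than taking the first hit) keeps an equity fact sheet that mentions bonds once from being classified as a bond fund.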
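For step 2, a possible shape for the folder handling is sketched below with the standard library's `ProcessPoolExecutor`, since PDF parsing is CPU-bound and benefits from separate processes. The folder names `pdfs` and `output` and the function `process_folder` are hypothetical; `worker` is any module-level function that takes a PDF path and returns a dict.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Hypothetical folder names for the input PDFs and the generated outputs.
INPUT_DIR = Path("pdfs")
OUTPUT_DIR = Path("output")


def process_folder(worker, input_dir=INPUT_DIR, output_dir=OUTPUT_DIR, workers=None):
    """Run worker(pdf_path) -> dict on every PDF and save one JSON file per PDF.

    worker must be a module-level function so it can be pickled and sent to
    the worker processes. workers=1 runs everything in this process, which is
    easier to debug; workers=None lets the pool use one process per CPU.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Match .pdf case-insensitively, in a stable (sorted) order.
    pdfs = sorted(
        p for p in input_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    if workers == 1:
        results = [worker(p) for p in pdfs]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(worker, pdfs))  # keeps input order
    written = []
    for pdf, data in zip(pdfs, results):
        out_path = output_dir / f"{pdf.stem}.json"
        out_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(out_path)
    return written
```

When `workers` is not 1, the call should sit under `if __name__ == "__main__":`, because on platforms that start worker processes with spawn the main module is imported again in each child.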
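For the JSON and CSV outputs in step 3, the standard library is enough. This sketch assumes each fund is a dict of extracted fields, possibly with a nested `exposure` dict (a hypothetical shape); nested keys are flattened into dotted column names so that one CSV row holds one fund. The Excel output with charts would need a third-party package such as openpyxl and is not shown here.

```python
import csv
import json
from pathlib import Path


def flatten(record, prefix=""):
    """Turn nested dicts into dotted keys: {"exposure": {"USA": 70}} -> {"exposure.USA": 70}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def export(records, json_path, csv_path):
    """Write the same records as a JSON array and as a CSV table."""
    Path(json_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    rows = [flatten(r) for r in records]
    # Union of all columns in first-seen order; cells a fund lacks stay empty.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Taking the union of columns matters because two funds rarely list the same countries or industries.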
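For the test API in step 4, one lightweight option is the standard library's `http.server`, sketched here with an in-memory store. The routes `POST /funds` and `GET /funds/<ISIN>`, the handler name and the default port 8000 are all hypothetical choices for a local test server, not a production design.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-memory store standing in for a real database: {isin: fund_record}.
FUNDS = {}


class FundHandler(BaseHTTPRequestHandler):
    """Hypothetical test API: POST /funds stores a record, GET /funds/<ISIN> returns it."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        if self.path != "/funds":
            return self._send_json(404, {"error": "not found"})
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            return self._send_json(400, {"error": "invalid JSON"})
        if not isinstance(record, dict) or not record.get("isin"):
            return self._send_json(400, {"error": "record needs an 'isin'"})
        FUNDS[record["isin"]] = record  # a later upload overwrites the earlier one
        self._send_json(201, record)

    def do_GET(self):
        prefix = "/funds/"
        isin = self.path[len(prefix):] if self.path.startswith(prefix) else None
        if isin in FUNDS:
            self._send_json(200, FUNDS[isin])
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format, *args):
        pass  # keep the console quiet during tests


def make_server(port=8000):
    """Create a local server; port=0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), FundHandler)
```

To try it by hand, run `make_server().serve_forever()` and POST a JSON record to `http://127.0.0.1:8000/funds`; checking the data is then a GET on `/funds/` followed by the ISIN.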
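Step 5 can be sketched as grouping records by ISIN, asking before each merge, and laying the merged history out as a date-by-exposure table. The `date` (ISO string) and `exposure` fields are assumed record fields, and the `ask` parameter defaults to `input` so the prompt can be replaced in tests. The optional Excel step uses openpyxl's `AreaChart` with `grouping = "percentStacked"`, which draws a 100% stacked area chart; it needs the third-party openpyxl package.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group records by ISIN and keep only ISINs that occur more than once."""
    groups = defaultdict(list)
    for record in records:
        groups[record["isin"]].append(record)
    return {isin: group for isin, group in groups.items() if len(group) > 1}


def merge_history(group):
    """Return a table: header row, then one row of exposure weights per date (oldest first)."""
    group = sorted(group, key=lambda r: r["date"])
    labels = list(dict.fromkeys(k for r in group for k in r["exposure"]))
    rows = [[r["date"], *(r["exposure"].get(k, 0.0) for k in labels)] for r in group]
    return [["date", *labels], *rows]


def merge_duplicates(records, ask=input):
    """Ask once per duplicated ISIN; return {isin: table} for the accepted merges."""
    merged = {}
    for isin, group in find_duplicates(records).items():
        answer = ask(f"Found {len(group)} fact sheets for {isin}. Merge them? [y/N] ")
        if answer.strip().lower() == "y":
            merged[isin] = merge_history(group)
    return merged


def write_area_chart(table, path):
    """Write the table to an Excel file with a 100% stacked area chart (needs openpyxl)."""
    from openpyxl import Workbook
    from openpyxl.chart import AreaChart, Reference

    wb = Workbook()
    ws = wb.active
    for row in table:
        ws.append(row)
    chart = AreaChart()
    chart.grouping = "percentStacked"  # each date's exposures fill 100% of the height
    chart.title = "Exposure over time"
    n_rows, n_cols = len(table), len(table[0])
    data = Reference(ws, min_col=2, min_row=1, max_col=n_cols, max_row=n_rows)
    dates = Reference(ws, min_col=1, min_row=2, max_row=n_rows)
    chart.add_data(data, titles_from_data=True)  # first row holds the series names
    chart.set_categories(dates)
    ws.add_chart(chart, f"A{n_rows + 3}")  # place the chart below the data
    wb.save(path)
```

Missing exposures are filled with 0.0 so every date has a value for every region or industry, which the stacked chart needs.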

Notes:

  • compare the speed of other PDF-reading libraries
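A small timing harness could drive that comparison. This sketch assumes each library is wrapped in a function that reads one PDF (for example wrappers around pypdf, pdfplumber or PyMuPDF, which are hypothetical and not shown); it reports the median of several runs so that one slow run caused by disk caching does not dominate.

```python
import statistics
import time


def benchmark(readers, path, repeat=5):
    """Time each reader(path) call; return {name: median seconds}, fastest first.

    readers maps a label to a function that extracts text from one PDF.
    """
    results = {}
    for name, read in readers.items():
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            read(path)
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return dict(sorted(results.items(), key=lambda item: item[1]))
```

Running it on several fact sheets from different providers gives a fairer picture than a single file, since text extraction speed depends on the PDF layout.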
