2-INF-185 Integration of Data Sources 2016/17

Submit HW10 and HW11 by Tuesday, 30 May, 9:00.
Project submission deadlines:
1st deadline: Sunday, 11 June, 22:00
2nd deadline: Wednesday, 21 June, 22:00
Both deadlines are regular; the first is intended for students who are finishing their studies or who want to complete the course earlier. In both cases, short in-person meetings with the instructors (a discussion of the project and finalizing of grades) will take place a few days after submission. The exact days and times will be arranged later. Submit projects the same way as homework, to /submit/projekt


HW10

L10

  • Submit the protocol, scripts and required output files.
  • Before running the hadoop command, source a script with the necessary settings / paths:
source ~yoyo/hadoop/hadoop-env.sh
  • The directory /input/portage/logs contains a large number of log files from a Gentoo installation.
  • Installation of files is recorded in the logs as follows:
--- /usr/bin/
>>> /usr/bin/make -> gmake
>>> /usr/bin/gmake

The first line represents a directory, the second records a symlink being created (we want to include these in the list of files), the third line represents a regular file being installed.
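The rule above can be sketched as a small line parser. This is only an illustration, not a required solution: the names INSTALL_RE and installed_path are hypothetical, and the sketch assumes that install records always start with ">>> " followed by an absolute path, exactly as in the three sample lines.

```python
import re

# Matches lines such as ">>> /usr/bin/make -> gmake" or ">>> /usr/bin/gmake".
# The optional " -> target" suffix marks a symlink; only the link name is kept.
# Assumption: install records always start with ">>> " and an absolute path.
INSTALL_RE = re.compile(r"^>>> (/.*?)(?: -> .*)?$")


def installed_path(line):
    """Return the installed file path recorded on this log line, or None."""
    match = INSTALL_RE.match(line.rstrip("\n"))
    if match is None:
        # Directory lines ("--- /usr/bin/") and all other log lines are skipped.
        return None
    return match.group(1)
```

A mapper could call installed_path on every input line and emit the path whenever the result is not None. Real logs may contain variations not shown in the sample (for example terminal color codes), so the pattern may need adjusting.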

Task A

  • Find the names of all files that were ever installed and write them to the file installed_all.txt.

Task B

  • Find the 10 most frequently rewritten / reinstalled files (you can use sort / head to postprocess the results file). Write the results (most reinstalled file first) to the file top_reinstalled.txt.
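One possible shape of the counting step is sketched below. It assumes a streaming-style setup in which the mapper emits one "path<TAB>1" line per installation and the reducer receives those lines sorted by key; the function names count_sorted_keys and top_n are hypothetical. The top_n helper does in Python what sort / head would do on the results file.

```python
from itertools import groupby


def count_sorted_keys(lines):
    """Yield (key, count) pairs from 'key<TAB>value' lines sorted by key.

    Assumption: every line contains exactly one tab-separated integer value.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    # groupby only merges adjacent equal keys, hence the sorted-input assumption.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(int(value) for _, value in group)


def top_n(counts, n=10):
    """Return the n pairs with the highest count, ties broken by key."""
    return sorted(counts, key=lambda kc: (-kc[1], kc[0]))[:n]
```

For example, feeding the lines "/a\t1", "/a\t1", "/b\t1" to count_sorted_keys yields ("/a", 2) and ("/b", 1).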

Task C

  • For each package, find the number of errors that ever happened during its installation. Write the results to the file package_errors.txt in the form package_name numErrors, sorted so that the package with the most errors is at the top.
  • Treat any occurrence of the string error: as an error (including the colon, case insensitive)
  • Packages are marked at the start of each log file with a line (note the space at the start of the line):
 * Package:    sys-devel/make-4.1-r1
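The per-file logic for Task C might look like the sketch below. It assumes the function sees all lines of a single log file in order; how to arrange that within a Hadoop job is not addressed here. The names PACKAGE_RE, ERROR_RE and package_errors are hypothetical, and the package name is taken verbatim from the marker line, including the version suffix.

```python
import re

# Package marker line, e.g. " * Package:    sys-devel/make-4.1-r1"
# (note the leading space before the asterisk).
PACKAGE_RE = re.compile(r"^ \* Package:\s+(\S+)")
# Any occurrence of "error:" counts, regardless of case.
ERROR_RE = re.compile(r"error:", re.IGNORECASE)


def package_errors(lines):
    """Return (package_name, error_count) for the lines of one log file.

    package_name is None if no package marker line is found.
    """
    package = None
    errors = 0
    for line in lines:
        if package is None:
            match = PACKAGE_RE.match(line)
            if match:
                package = match.group(1)
        # Count every occurrence, so a line with two matches adds two.
        errors += len(ERROR_RE.findall(line))
    return package, errors
```

Since a package can be installed many times, the counts from the individual log files still have to be summed per package name before sorting.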