2-INF-185 Integrácia dátových zdrojov (Data Source Integration) 2016/17
- Submit the protocol, scripts and required output files.
- Before running the hadoop command, source a script with the necessary settings / paths:
- The directory /input/portage/logs contains a large number of log files from a Gentoo installation.
- Installation of files is recorded in the logs as follows:
--- /usr/bin/
>>> /usr/bin/make -> gmake
>>> /usr/bin/gmake
The first line represents a directory, the second records a symlink being created (we want to include these in the list of files), the third line represents a regular file being installed.
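The three line shapes above can be told apart with a small parser. The following is only a sketch in Python: the function name `installed_path` and the regular expression are illustrative assumptions, and it assumes that install lines start at column 0 with `>>> ` followed by an absolute path, and that any path ending in `/` is a directory.

```python
import re

# Assumed line shapes, taken from the three example lines in the assignment:
#   "--- /usr/bin/"               directory (ignored)
#   ">>> /usr/bin/make -> gmake"  symlink (keep the link path, drop the target)
#   ">>> /usr/bin/gmake"          regular file
# Requiring "/" right after ">>> " skips other ">>> " lines that are not paths.
INSTALL_RE = re.compile(r"^>>> (/\S.*?)(?: -> .*)?$")


def installed_path(line):
    """Return the installed file path for a log line, or None otherwise."""
    match = INSTALL_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    path = match.group(1)
    if path.endswith("/"):
        # Assumption: a trailing slash marks a directory, which is not a file.
        return None
    return path
```

For example, `installed_path(">>> /usr/bin/make -> gmake")` gives `/usr/bin/make`, while the `--- /usr/bin/` line gives `None`. The same logic could serve as the body of a mapper that emits one path per matching line.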
- Find the names of all files that were ever installed and write them to a file
- Find the 10 most rewritten / reinstalled files (you can use head to postprocess the results file). Write the results (most reinstalled file first) to the file
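Both file tasks reduce to counting how often each path occurs, which is the classic word-count pattern. The sketch below shows that counting step locally in Python rather than as a Hadoop job; `top_reinstalled` is a hypothetical helper name, and breaking ties alphabetically is an assumption the assignment does not specify.

```python
from collections import Counter


def top_reinstalled(paths, n=10):
    """Return the n most frequent paths as (path, count), most frequent first."""
    counts = Counter(paths)
    # Sort by descending count, then by path so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Usage with a tiny made-up list of installed paths:
paths = ["/a", "/b", "/a", "/c", "/b", "/a"]
unique_names = sorted(set(paths))      # list of all files ever installed
most_common = top_reinstalled(paths, 2)
```

In a Hadoop job the counting would normally happen in the reducer, and the final ordering could be done afterwards with sort and head on the results file.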
- For each package, find the number of errors that ever occurred during its installation. Write the results to the file package_errors.txt in the form package_name numErrors, sorted with the package with the most errors at the top.
- Treat any occurrence of the string error: as an error (including the colon, case insensitive)
- Packages are marked at the start of each log file with a line (note the space at the start of the line):
 * Package: sys-devel/make-4.1-r1
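The error-counting rules can be sketched as follows. This is an illustration in Python under several assumptions: each log file is processed as a whole, the package name is taken verbatim from the Package line (including the version), every occurrence of error: on a line counts separately, and packages with zero errors are still listed. The names `count_errors` and `format_report` are hypothetical.

```python
import re
from collections import Counter

# Matches " * Package: sys-devel/make-4.1-r1" (note the leading space).
PACKAGE_RE = re.compile(r"^ \* Package:\s+(\S+)")


def count_errors(lines):
    """Return (package_name, error_count) for the lines of one log file."""
    package = None
    errors = 0
    for line in lines:
        if package is None:
            match = PACKAGE_RE.match(line)
            if match:
                package = match.group(1)
        # Case-insensitive count of the literal string "error:".
        errors += line.lower().count("error:")
    return package, errors


def format_report(per_log_results):
    """Sum errors per package and format lines, most errors first."""
    totals = Counter()
    for package, errors in per_log_results:
        if package is not None:
            totals[package] += errors
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [f"{package} {errors}" for package, errors in ranked]
```

Since one package can be installed many times, the per-file counts have to be summed by package name before sorting; in a Hadoop job that summation would be the reducer's task.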