[WORK IN PROGRESS] - preparsing job recognises uploaded archive better #492
base: master
Changes from 2 commits
08e2c4b
276b593
2485376
9369ad0
610c56f
b5b38a0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,14 @@ | ||
| # frozen_string_literal: true | ||
| require 'zip' | ||
| require 'zlib' | ||
| require 'rubygems/package' | ||
| require 'digest' | ||
|
|
||
| class Nested | ||
| # a weird hack to get a nested method call working like I want it to | ||
| def myreader(zipfile) | ||
|
|
||
|
|
||
| class Preparsing | ||
| include Sidekiq::Worker | ||
| # only retry 10 times - after that, the genotyping probably has already been deleted | ||
|
|
@@ -10,11 +17,41 @@ class Preparsing | |
| def perform(genotype_id) | ||
| genotype = Genotype.find(genotype_id) | ||
|
|
||
| logger.info "Starting preparse" | ||
| biggest = '' | ||
| biggest_size = 0 | ||
| begin | ||
| Zip::File.open(genotype.genotype.path) do |zipfile| | ||
| logger.info "Starting preparse on #{genotype.genotype.path}" | ||
| # First, we need to find out which archive or flat text our uploaded file is! | ||
| # We use the bash tool file for that | ||
| # | ||
| # There are two possible outcomes - file is a collection of files (tar, tar.gz, zip) | ||
| # or file is a single file (ASCII, gz) | ||
| filetype = %x{file #{genotype.genotype.path}} | ||
| case filetype | ||
| when /ASCII text/ | ||
| logger.info "File is flat text" | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| reader = File.method("open") | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| is_collection = false | ||
| when /gzip compressed data, was/ | ||
| reader = Zlib::GzipReader.method("open") | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| logger.info "File is gz" | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| is_collection = false | ||
| when /gzip compressed data, last modified/ | ||
| reader = lambda { |zipfile| Gem::Package::TarReader.new(Zlib::GzipReader.open(zipfile)) } | ||
|
Collaborator
Use the -> { ... } lambda literal syntax for single line lambdas. |
||
| is_collection = true | ||
| when /POSIX tar archive/ | ||
| logger.info "File is tar" | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| reader = Gem::Package::TarReader.method("new") | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| is_collection = true | ||
| when /Zip archive data/ | ||
| logger.info "File is zip" | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| reader = Zip::File.method("open") | ||
|
Collaborator
Prefer single-quoted strings when you don't need string interpolation or special symbols. |
||
| is_collection = true | ||
| end | ||
|
|
||
|
|
||
|
Collaborator
Extra blank line detected. |
||
| if is_collection | ||
| # Find the biggest file in the archive | ||
| biggest = '' | ||
| biggest_size = 0 | ||
| reader.call genotype.genotype.path do |zipfile| | ||
| # find the biggest file, since that's going to be the genotyping | ||
| zipfile.each do |entry| | ||
| if entry.size > biggest_size | ||
|
|
@@ -27,14 +64,13 @@ def perform(genotype_id) | |
| system("mv #{Rails.root}/tmp/#{genotype.fs_filename}.csv #{Rails.root}/public/data/#{genotype.fs_filename}") | ||
| logger.info "copied file" | ||
| end | ||
|
|
||
| rescue | ||
| logger.info "nothing to unzip, seems to be a text-file in the first place" | ||
| else | ||
| system("cp #{genotype.genotype.path} #{Rails.root}/public/data/#{genotype.fs_filename}") | ||
|
Collaborator
Please use Rails.root.join('path', 'to') instead. |
||
| end | ||
|
|
||
| # now that they are unzipped, check if they're actually proper files | ||
| file_is_ok = false | ||
| fh = File.open(genotype.genotype.path) | ||
|
Member
Author
How did this ever work? As written, this parses the original uploaded file, which can be anything before extraction. |
||
| fh = File.open "#{Rails.root}/public/data/#{genotype.fs_filename}" | ||
|
Collaborator
Please use Rails.root.join('path', 'to') instead. |
||
| l = fh.readline() | ||
| # some files, for some reason, start with the UTF-BOM-marker | ||
| l = l.sub("\uFEFF","") | ||
|
|
||
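The review comments on path building and the author's note about reading the extracted file could be addressed along the following lines. This is only a sketch, not the code in this PR: `data_path`, `copy_upload` and `first_line` are hypothetical helper names, and `root` stands in for `Rails.root`, which is a `Pathname` in a Rails application.

```ruby
require 'fileutils'
require 'pathname'

# Hypothetical helper: builds <root>/public/data/<fs_filename> with
# Pathname#join instead of string interpolation.
def data_path(root, fs_filename)
  root.join('public', 'data', fs_filename)
end

# Hypothetical helper: copies the upload into place with FileUtils,
# so no shell is involved and odd characters in the path are harmless.
def copy_upload(source, root, fs_filename)
  target = data_path(root, fs_filename)
  FileUtils.mkdir_p(target.dirname)
  FileUtils.cp(source, target)
  target
end

# Hypothetical helper: reads the first line of the copied file and
# drops a leading UTF-8 byte order mark if there is one.
def first_line(path)
  File.open(path, 'r:UTF-8', &:readline).delete_prefix("\uFEFF")
end
```

In the worker this might be called as `first_line(copy_upload(genotype.genotype.path, Rails.root, genotype.fs_filename))`, so that the header check runs on the file under public/data rather than on the raw upload.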
Use backticks around command string.
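As an alternative to interpolating the path into a shell command at all, the detection step could be split into two small pieces: one that runs `file` without a shell, and one that maps its output to a format. The sketch below is an assumption about how that might look, not part of this PR. `FILE_PATTERNS`, `classify_file_output` and `describe_file` are hypothetical names; the regular expressions are copied from the case statement in the diff.

```ruby
require 'open3'

# Patterns copied from the case statement in the diff. Each row is:
# pattern, format symbol, and whether the upload holds several files.
FILE_PATTERNS = [
  [/ASCII text/,                          :text,   false],
  [/gzip compressed data, was/,           :gzip,   false],
  [/gzip compressed data, last modified/, :tar_gz, true],
  [/POSIX tar archive/,                   :tar,    true],
  [/Zip archive data/,                    :zip,    true]
].freeze

# Hypothetical helper: maps the description printed by `file` to a
# format and a collection flag. Unlike the case statement, an
# unrecognised type is reported explicitly instead of leaving the
# variables unset.
def classify_file_output(description)
  FILE_PATTERNS.each do |pattern, format, collection|
    return { format: format, collection: collection } if pattern.match?(description)
  end
  { format: :unknown, collection: false }
end

# Hypothetical helper: runs `file --brief <path>` with the path passed
# as a separate argument, so spaces or quotes in it are not interpreted
# by a shell.
def describe_file(path)
  output, status = Open3.capture2('file', '--brief', path)
  raise "file exited with status #{status.exitstatus}" unless status.success?

  output.strip
end
```

A call such as `classify_file_output(describe_file(genotype.genotype.path))` would then replace the `%x{...}` line and the case statement. Keeping the classification as a pure function of the description string also makes it easy to unit-test without real archives.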