This repository was archived by the owner on Jan 14, 2026. It is now read-only.
Open
Changes from 2 commits
54 changes: 45 additions & 9 deletions app/workers/preparsing.rb
@@ -1,7 +1,14 @@
# frozen_string_literal: true
require 'zip'
require 'zlib'
require 'rubygems/package'
require 'digest'

class Nested
# a weird hack to get a nested method call working like I want it to
def myreader(zipfile)


class Preparsing
include Sidekiq::Worker
# only retry 10 times - after that, the genotyping probably has already been deleted
@@ -10,11 +17,41 @@ class Preparsing
def perform(genotype_id)
genotype = Genotype.find(genotype_id)

logger.info "Starting preparse"
biggest = ''
biggest_size = 0
begin
Zip::File.open(genotype.genotype.path) do |zipfile|
logger.info "Starting preparse on #{genotype.genotype.path}"
# First, we need to find out which archive or flat text our uploaded file is!
# We use the bash tool file for that
#
# There are two possible outcomes - file is a collection of files (tar, tar.gz, zip)
# or file is a single file (ASCII, gz)
filetype = %x{file #{genotype.genotype.path}}

Collaborator:
Use backticks around command string.
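
One way to read this suggestion, as a minimal sketch: backticks replace `%x{}`, and the path is shell-escaped first so that an upload path containing spaces or shell metacharacters cannot break the command. The escaping goes beyond what the comment asks for, and `file_command` and `detect_filetype` are hypothetical helper names that do not appear in this PR.

```ruby
require 'shellwords'

# Hypothetical helper (not in the PR): builds the command line for the `file` tool,
# escaping the path so spaces and shell metacharacters are passed through literally.
def file_command(path)
  "file #{Shellwords.escape(path)}"
end

# Runs the command with backticks and returns its output as a string.
# Assumes the `file` utility is available on PATH.
def detect_filetype(path)
  `#{file_command(path)}`
end
```

In the worker this would be called as `filetype = detect_filetype(genotype.genotype.path)`.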

case filetype
when /ASCII text/
logger.info "File is flat text"

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

reader = File.method("open")

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

is_collection = false
when /gzip compressed data, was/
reader = Zlib::GzipReader.method("open")

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

logger.info "File is gz"

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

is_collection = false
when /gzip compressed data, last modified/
reader = lambda { |zipfile| Gem::Package::TarReader.new(Zlib::GzipReader.open(zipfile)) }

Collaborator:
Use the -> { ... } lambda literal syntax for single line lambdas.
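
For reference, a small sketch of the two spellings side by side. Both build the same kind of object, so the change is purely stylistic. The block parameter is renamed to `path` here for clarity; that rename is an assumption and not part of the PR.

```ruby
require 'zlib'
require 'rubygems/package'

# Form used in the diff: the `lambda` method with a block.
reader_old = lambda { |path| Gem::Package::TarReader.new(Zlib::GzipReader.open(path)) }

# Form the comment suggests: the lambda literal.
reader = ->(path) { Gem::Package::TarReader.new(Zlib::GzipReader.open(path)) }
```

Either object responds to `call`, so the later `reader.call genotype.genotype.path` stays the same.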

is_collection = true
when /POSIX tar archive/
logger.info "File is tar"

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

reader = Gem::Package::TarReader.method("new")

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

is_collection = true
when /Zip archive data/
logger.info "File is zip"

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

reader = Zip::File.method("open")

Collaborator:
Prefer single-quoted strings when you don't need string interpolation or special symbols.

is_collection = true
end


Collaborator:
Extra blank line detected.
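
Separately from the style note, the `case` statement above has no `else` branch, so an unrecognised `file` output leaves `reader` and `is_collection` unset. A rough sketch of how the classification could be pulled into a pure function with an explicit fallback, using the same patterns as the diff. `classify` and the symbol names are hypothetical, and the `:unknown` fallback is an assumption about the desired behaviour.

```ruby
# Hypothetical helper (not in the PR): maps the output of `file` to
# [kind, is_collection]. Patterns and their order are taken from the diff;
# the "was" pattern is checked before "last modified", as in the original.
def classify(file_output)
  case file_output
  when /ASCII text/
    [:text, false]
  when /gzip compressed data, was/
    [:gz, false]
  when /gzip compressed data, last modified/
    [:tar_gz, true]
  when /POSIX tar archive/
    [:tar, true]
  when /Zip archive data/
    [:zip, true]
  else
    # Assumption: anything unrecognised is treated as a single, non-archive file.
    [:unknown, false]
  end
end
```

Keeping the mapping free of side effects makes it possible to test each branch with a literal string instead of a real upload.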

if is_collection
# Find the biggest file in the archive
biggest = ''
biggest_size = 0
reader.call genotype.genotype.path do |zipfile|
# find the biggest file, since that's going to be the genotyping
zipfile.each do |entry|
if entry.size > biggest_size
@@ -27,14 +64,13 @@ def perform(genotype_id)
system("mv #{Rails.root}/tmp/#{genotype.fs_filename}.csv #{Rails.root}/public/data/#{genotype.fs_filename}")
logger.info "copied file"
end

rescue
logger.info "nothing to unzip, seems to be a text-file in the first place"
else
system("cp #{genotype.genotype.path} #{Rails.root}/public/data/#{genotype.fs_filename}")

Collaborator:
Please use Rails.root.join('path', 'to') instead.
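
A sketch of what that could look like. In a Rails app `Rails.root` is a `Pathname`, so a plain `Pathname` stands in for it here to keep the example runnable outside Rails. `APP_ROOT`, `data_path`, and `copy_to_data` are hypothetical names, and replacing `system("cp ...")` with `FileUtils.cp` is an additional assumption that the comment does not ask for.

```ruby
require 'pathname'
require 'fileutils'

# Stand-in for Rails.root (hypothetical value; Rails.root is a Pathname).
APP_ROOT = Pathname.new('/srv/app')

# Hypothetical helper: builds the destination path segment by segment
# instead of interpolating it into a string.
def data_path(root, fs_filename)
  root.join('public', 'data', fs_filename)
end

# Hypothetical helper: copies the upload without spawning a shell,
# so the file names never pass through shell parsing.
def copy_to_data(source, root, fs_filename)
  target = data_path(root, fs_filename)
  FileUtils.mkdir_p(target.dirname)
  FileUtils.cp(source, target)
  target
end
```

In the worker, the equivalent call would be `Rails.root.join('public', 'data', genotype.fs_filename)`.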

end

# now that they are unzipped, check if they're actually proper files
file_is_ok = false
fh = File.open(genotype.genotype.path)

Member Author:
How did this ever work? As written, it parses the original uploaded file, which can be anything before extraction.

fh = File.open "#{Rails.root}/public/data/#{genotype.fs_filename}"

Collaborator:
Please use Rails.root.join('path', 'to') instead.

l = fh.readline()
# some files, for some reason, start with the UTF-BOM-marker
l = l.sub("\uFEFF","")
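
As an alternative to stripping the marker by hand, Ruby can skip a leading UTF-8 BOM when the file is opened with the `BOM|UTF-8` encoding. A minimal sketch, assuming the extracted file is UTF-8 text; `first_line` is a hypothetical helper name and not part of this PR.

```ruby
# Hypothetical helper: returns the first line of the file at `path`,
# with a leading UTF-8 byte order mark skipped if one is present.
# The block form closes the file handle when done.
# Note: readline raises EOFError on an empty file.
def first_line(path)
  File.open(path, 'r:BOM|UTF-8') { |fh| fh.readline }
end
```

Files without a BOM are read unchanged, so the same call covers both cases.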