Be Excellent To Each Other
https://www.beexcellenttoeachother.com/forum/

searching large text files for certain text...
https://www.beexcellenttoeachother.com/forum/viewtopic.php?f=3&t=10144
Page 1 of 1

Author:  Malc [ Tue Jul 22, 2014 18:01 ]
Post subject:  searching large text files for certain text...

I have a very large text file (about half a million lines).

Each line begins:
Code:
NN YYYYYMMDDHHmmSS


Where:
NN is a number between 01 and 99
YYYY is the year
MM is the month
DD is the date
HH is the hour
mm is the minutes
SS is the seconds

What I need to do is find any line where NN equals 61, 63 or 64 and remove it, along with the lines directly above and below it. I can tell you that the year is always 2009. Can anyone think of a way of doing that easily?

At the moment I am looking for
Code:
61 2009
and selecting the lines above and below and deleting them. The trouble is there are almost 2,000 of them so it's going to take me some time, so all advice appreciated.

Thanks

Malc

Author:  ApplePieOfDestiny [ Tue Jul 22, 2014 18:08 ]
Post subject:  Re: searching large text files for certain text...

Pass it to workie
Tell them it's urgent and business critical
Leave for the pub.

Author:  Mr Russell [ Tue Jul 22, 2014 18:31 ]
Post subject:  Re: searching large text files for certain text...

Stick it in Excel and use filters on the columns?

Use regular expressions (somehow) combined with find and replace using TextCrawler?

Log files I'm guessing?
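
For what it's worth, the regular expression idea could look roughly like the sketch below. This is only an illustration, not something tested on the real log: the names BLOCK and strip_blocks are made up, it assumes every line starts with NN followed by a space, and it reads the whole text into memory. The pattern grabs one optional line above, a run of one or more 61/63/64 lines, and one optional line below, then deletes the lot.

```python
import re

# One optional line above, one or more consecutive 61/63/64 lines,
# then one optional line below. Assumes each line starts with "NN ".
BLOCK = re.compile(r"^(?:.*\n)?(?:(?:61|63|64) .*\n)+(?:.*\n)?", re.MULTILINE)

def strip_blocks(text):
    """Return text with every 61/63/64 line and its two neighbours removed."""
    if text and not text.endswith("\n"):
        text += "\n"  # so the last line can be matched like the others
    return BLOCK.sub("", text)

# Made-up sample data in the NN YYYYMMDDHHmmSS layout.
sample = (
    "10 20090101000000\n"
    "11 20090101000001\n"
    "61 20090101000002\n"
    "12 20090101000003\n"
    "13 20090101000004\n"
)
print(strip_blocks(sample), end="")
```

Matching a run of hit lines in one go matters: if two 61/63/64 lines sit next to each other, the line below the second one still has to go.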

Author:  Malc [ Tue Jul 22, 2014 18:40 ]
Post subject:  Re: searching large text files for certain text...

I managed to find a macro that seems to work on a small sample, so now to try it on the large file.

And yeah log files.

Malc

Author:  Bobbyaro [ Tue Jul 22, 2014 18:50 ]
Post subject:  Re: searching large text files for certain text...

You can do it simply with a series of if statements in successive columns, then filter by the final column. It is crude, but it works. (I think)
Code:
NN   YYYYYMMDDHHmmSS   =IF(A63=61, 1, 0)   =IF(A63=63, 1, 0)   =IF(A63=64, 1, 0)   =IF(SUM(C63:E63)>0, 1, 0)   =IF(F62=1, 1, 0)   =IF(F64=1, 1, 0)   =IF(SUM(F63:H63)>0, 1, 0)
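
The same flag-column idea can be written outside Excel too. Below is a rough Python sketch of it, assuming the lines are already in a list and that NN is the first two characters of each line. The function name rows_to_drop and the sample data are invented for the example.

```python
def rows_to_drop(lines):
    """Return a 0/1 flag per line; 1 means the line should be filtered out."""
    # First flag column: 1 where NN is 61, 63 or 64, else 0.
    hit = [1 if line[:2] in ("61", "63", "64") else 0 for line in lines]
    # Final column: 1 where this row, the row above or the row below is a hit.
    drop = []
    for i in range(len(lines)):
        above = hit[i - 1] if i > 0 else 0
        below = hit[i + 1] if i + 1 < len(lines) else 0
        drop.append(1 if hit[i] + above + below > 0 else 0)
    return drop

# Made-up sample data in the NN YYYYMMDDHHmmSS layout.
sample = [
    "10 20090101000000",
    "11 20090101000001",
    "61 20090101000002",
    "12 20090101000003",
    "13 20090101000004",
]
flags = rows_to_drop(sample)
# "Filter by the final column": keep only the rows flagged 0.
kept = [line for line, flag in zip(sample, flags) if flag == 0]
print(flags)
print(kept)
```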

Author:  Malc [ Tue Jul 22, 2014 19:04 ]
Post subject:  Re: searching large text files for certain text...

I think Excel with filters would work, as long as it didn't change the structure of each line.

Malc

Author:  MrD [ Tue Jul 22, 2014 19:20 ]
Post subject:  Re: searching large text files for certain text...

You could do this pretty easily in Python. I'd prefer the 'load whole file and output clean file' route. It'd look like:

Code:
def condition(some_text):
  # True when the line's NN field (its first two characters) is 61, 63 or 64.
  return some_text[:2] in ("61", "63", "64")

original_lines = []

with open("input_file.txt", "r") as f:
  for line_incoming in f:
    clean_line_incoming = line_incoming.strip()
    if clean_line_incoming:
      original_lines.append(clean_line_incoming)

# original_lines is now a list of the lines from the original file, without any blanks.

with open("output_file.txt", "w") as s:
  for i in range(len(original_lines)):
    prev_i = i-1
    next_i = i+1
    prevline = original_lines[prev_i] if prev_i >= 0 else ""
    nextline = original_lines[next_i] if next_i < len(original_lines) else ""
    thisline = original_lines[i]

    # As i goes up the indices of original_lines, prevline, thisline and
    # nextline are the contents of the lines above, at and below i.

    # We write only the lines which are NOT 61/63/64 lines themselves and
    # DON'T have one directly above or below them.
    if not (condition(prevline) or condition(thisline) or condition(nextline)):
      s.write(thisline)
      s.write("\n")

Author:  Pod [ Tue Jul 22, 2014 22:03 ]
Post subject:  Re: searching large text files for certain text...

You've put 5 Y's for the year? I assume you meant 4?

Probably something like:
Code:
grep -v -A2 -E '((61)|(63)|(64)) 2009..........'


edit: Nope, tested that, and although the pattern is fine, -v doesn't invert the -A2, which is a shame. My next choice would have been a python script, though it would be better than MrD's.

Author:  MrD [ Wed Jul 23, 2014 0:08 ]
Post subject:  Re: searching large text files for certain text...

I respectfully disagree.

You might be able to do it better, but not faster, 'cause I've already done it and you ain't.

A version that doesn't load the entire list would have a two line buffer of received lines and a bunch of counters and flags that can invalidate lines that have been read but are killed by NN lines read in the future. If a line is 2 lines old when a new line enters, it's written. Two blank lines would probably have to be simulated at the end of the reading too.
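
A rough sketch of that streaming idea is below. It is a simplification rather than exactly the design described above: for a "one line above, one line below" rule, holding back a single line plus one flag seems to be enough, so that is what it does. The names is_hit and filter_stream are invented for the example, it assumes NN is the first two characters of each line, and unlike the load-everything version it does not skip blank lines.

```python
def is_hit(line):
    """True when the line's NN field is 61, 63 or 64."""
    return line[:2] in ("61", "63", "64")

def filter_stream(lines):
    """Yield the lines that survive, reading the input one line at a time."""
    pending = None      # last safe-so-far line, held until we see what follows it
    kill_next = False   # True when the previous line was a 61/63/64 line
    for line in lines:
        if is_hit(line):
            pending = None     # the held line sits directly above a hit
            kill_next = True   # the next line sits directly below a hit
            continue
        if pending is not None:
            yield pending      # its successor is not a hit, so it is safe
        # Hold this line unless it sits directly below a hit.
        pending = None if kill_next else line
        kill_next = False
    if pending is not None:
        yield pending          # end of input: nothing can kill the held line

# Made-up sample data in the NN YYYYMMDDHHmmSS layout.
sample = [
    "10 20090101000000\n",
    "11 20090101000001\n",
    "61 20090101000002\n",
    "12 20090101000003\n",
    "13 20090101000004\n",
]
print("".join(filter_stream(sample)), end="")
```

Because an open file is itself an iterable of lines, something like `dst.writelines(filter_stream(src))` with two open file objects should handle the real log without holding it all in memory.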

All times are UTC [ DST ]