Be Excellent To Each Other

And, you know, party on. Dude.

All times are UTC [ DST ]




Reply to topic  [ 9 posts ] 
Author Message
 Post subject: searching large text files for certain text...
Posted: Tue Jul 22, 2014 18:01
Isn't that lovely?

Joined: 30th Mar, 2008
Posts: 10910
Location: Devon
I have a very large text file (about half a million lines).

Each line begins:
Code:
NN YYYYYMMDDHHmmSS


Where:
NN is a number between 01 and 99
YYYY is the year
MM is the month
DD is the date
HH is the hour
mm is the minutes
SS is the seconds

What I need to do is find any line where NN equals 61, 63 or 64 and remove it, along with the line directly above it and the line directly below it. I can tell you that the year is always 2009. Can anyone think of an easy way of doing that?
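As a quick check of the format, here's a minimal Python sketch that splits a line's header into NN and a timestamp. It assumes the year has four digits (YYYY, as in the list above) and that NN and the timestamp are separated by a single space; `parse_header` is a hypothetical helper name, not something from this thread.

```python
from datetime import datetime


def parse_header(line):
    """Split 'NN YYYYMMDDHHmmSS ...' into (NN as int, timestamp as datetime).

    Assumes a two-digit NN, one space, then a 14-digit timestamp.
    """
    nn = int(line[0:2])
    stamp = datetime.strptime(line[3:17], "%Y%m%d%H%M%S")
    return nn, stamp


nn, stamp = parse_header("61 20090315142530 rest of the log line")
print(nn, stamp)  # 61 2009-03-15 14:25:30
```

With NN as an integer, the test for the lines to remove is then just `nn in (61, 63, 64)`.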

At the moment I am looking for
Code:
61 2009
and then selecting the lines above and below and deleting them. The trouble is there are almost 2,000 of them, so it's going to take me some time. All advice appreciated.

Thanks

Malc

_________________
Where's the Kaboom? I was expecting an Earth shattering Kaboom!


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 18:08
Filthy Junkie Bitch

Joined: 17th Dec, 2008
Posts: 8293
Pass it to workie
Tell them it's urgent and business critical
Leave for the pub.


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 18:31
Awesome
Yes

Joined: 6th Apr, 2008
Posts: 12240
Stick it in Excel and use filters on the columns?

Use regular expressions (somehow) combined with find and replace using TextCrawler?

Log files I'm guessing?
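Along the lines of the regular-expression suggestion, here's a hedged Python sketch using the standard `re` module rather than TextCrawler. The pattern assumes NN sits at the very start of the line, followed by one space, `2009`, and ten more digits for MMDDHHmmSS; `TARGET` and `is_target` are illustrative names. It only identifies the lines to remove; dropping the neighbours above and below still needs a second step.

```python
import re

# Illustrative pattern: NN of 61, 63 or 64 at the start of the line,
# one space, the year 2009, then ten digits for MMDDHHmmSS.
TARGET = re.compile(r"^(?:61|63|64) 2009\d{10}")


def is_target(line):
    """True if the line starts with one of the NN values to remove."""
    return TARGET.match(line) is not None


sample = [
    "12 20090101000000 ok",
    "61 20090101000001 bad",
    "02 20090613061500 ok",  # contains '61' mid-line, but not as NN
    "64 20090101000003 bad",
]
print([is_target(s) for s in sample])  # [False, True, False, True]
```

Anchoring at the start of the line matters: a plain search for `61 2009` could, in principle, also hit a timestamp that happens to contain those characters.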

_________________
Always proof read carefully in case you any words out


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 18:40
Isn't that lovely?

Joined: 30th Mar, 2008
Posts: 10910
Location: Devon
I managed to find a macro that seems to work on a small sample, so now to try it on the large file.

And yeah log files.

Malc

_________________
Where's the Kaboom? I was expecting an Earth shattering Kaboom!


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 18:50
Ticket to Ride World Champion

Joined: 18th Apr, 2008
Posts: 11843
You can do it simply with a series of if statements in successive columns, then filter by the final column. It is crude, but it works. (I think)
Code:
NN   YYYYYMMDDHHmmSS
E63: =IF(A63=61, 1, 0)
F63: =IF(A63=63, 1, 0)
G63: =IF(A63=64, 1, 0)
H63: =IF(SUM(E63:G63)>0, 1, 0)
I63: =IF(H62=1, 1, 0)
J63: =IF(H64=1, 1, 0)
K63: =IF(SUM(H63:J63)>0, 1, 0)
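For comparison, the same helper-column logic can be sketched in Python. This assumes the NN values have already been pulled out into a list of integers, one per row; `rows_to_delete` is a hypothetical name, and the comments map each step back to the spreadsheet formulas.

```python
def rows_to_delete(nn_values, targets=(61, 63, 64)):
    """Return one True/False per row: True means the row should be deleted."""
    # Like =IF(SUM(E63:G63)>0, 1, 0): is NN on this row one of the targets?
    flag = [nn in targets for nn in nn_values]
    n = len(flag)
    return [
        flag[i]
        or (i > 0 and flag[i - 1])        # like =IF(H62=1, 1, 0): row above flagged
        or (i + 1 < n and flag[i + 1])    # like =IF(H64=1, 1, 0): row below flagged
        for i in range(n)
    ]


print(rows_to_delete([10, 61, 20, 30, 40, 64]))
# [True, True, True, False, True, True]
```

The bounds checks on `i - 1` and `i + 1` play the role of the empty cells above the first row and below the last one.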

_________________
No, it was a giant robot castle!


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 19:04
Isn't that lovely?

Joined: 30th Mar, 2008
Posts: 10910
Location: Devon
I think Excel with filters would work, as long as it doesn't change the structure of each line.

Malc

_________________
Where's the Kaboom? I was expecting an Earth shattering Kaboom!


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 19:20
What-ho, chaps!

Joined: 30th Mar, 2008
Posts: 2138
You could do this pretty easily in Python. I'd prefer the 'load whole file and output clean file' route. It'd look like:

Code:
def condition(some_text):
  # True if the line starts with one of the NN values we want removed
  return some_text[:2] in ("61", "63", "64")

original_lines = []

with open("input_file.txt", "r") as f:
  for line_incoming in f:
    clean_line_incoming = line_incoming.strip()
    if clean_line_incoming:
      original_lines.append(clean_line_incoming)

# original_lines is now a list of lines from the original file, without any blanks.

with open("output_file.txt", "w") as s:
  for i in range(len(original_lines)):
    # As i goes up the indices of original_lines, prevline, thisline and
    # nextline are the lines above, at and below position i.
    prevline = original_lines[i - 1] if i > 0 else ""
    nextline = original_lines[i + 1] if i + 1 < len(original_lines) else ""
    thisline = original_lines[i]

    # Write only the lines that are not 61/63/64 lines themselves and
    # don't have one directly above or below them.
    if not (condition(prevline) or condition(thisline) or condition(nextline)):
      s.write(thisline)
      s.write("\n")

_________________
[www.mrdictionary.net]


 Post subject: Re: searching large text files for certain text...
Posted: Tue Jul 22, 2014 22:03
MR EXCELLENT FACE

Joined: 30th Mar, 2008
Posts: 2568
You've put 5 Y's for the year? I assume you meant 4?

Probably something like:
Code:
grep -v -A2 -E '((61)|(63)|(64)) 2009..........'


edit: Nope, tested that, and although the pattern is fine, -v doesn't invert the -A2 context lines, which is a shame. My next choice would have been a Python script, though it would be better than MrD's.

_________________
This man is bound by law to clear the snow away


 Post subject: Re: searching large text files for certain text...
Posted: Wed Jul 23, 2014 0:08
What-ho, chaps!

Joined: 30th Mar, 2008
Posts: 2138
I respectfully disagree.

You might be able to do it better, but not faster, 'cause I've already done it and you ain't.

A version that doesn't load the entire file would keep a two-line buffer of received lines, plus a bunch of counters and flags that can invalidate buffered lines that turn out to be killed by an NN line read later. When a new line arrives and a buffered line is two lines old, that line is written (unless it has been invalidated). Two blank lines would probably have to be simulated at the end of the input too, so the last buffered lines get flushed.
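A minimal sketch of that streaming idea in Python, assuming the input can be read line by line and that a function such as the `is_target` test below decides which lines carry NN = 61, 63 or 64 (both `filter_stream` and `is_target` are hypothetical names). Instead of counters and flags it keeps a three-line window (above, current, below) and decides about the middle line once the line below it has arrived; a trailing `None` plays the part of the simulated blank line at the end.

```python
from itertools import chain


def filter_stream(lines, is_target):
    """Yield each line that is not a target and has no target directly above or below."""

    def hit(line):
        # None means "no line here" (before the first or after the last line).
        return line is not None and is_target(line)

    prev = cur = None
    # The trailing None acts as the simulated end-of-input line, so the
    # last real line still gets a "line below" to be judged against.
    for nxt in chain(lines, [None]):
        if cur is not None and not (hit(prev) or hit(cur) or hit(nxt)):
            yield cur
        prev, cur = cur, nxt


def is_target(line):
    # Hypothetical test: NN is the first two characters of the line.
    return line[:2] in ("61", "63", "64")


sample = ["a1", "61 x", "b1", "c1", "d1", "64 y", "e1"]
print(list(filter_stream(sample, is_target)))  # ['c1']
```

To run it over a real file, the generator can be fed `(l.rstrip("\n") for l in src)` from an open input file and each yielded line written out with a trailing newline, so memory use stays at three lines however big the log is. Unlike the load-everything script earlier in the thread, this keeps blank lines where they are instead of dropping them.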

_________________
[www.mrdictionary.net]


You are using the 'Ted' forum. Bill doesn't really exist any more. Bogus!
Want to help out with the hosting / advertising costs? That's very nice of you.
Are you on a mobile phone? Try http://beex.co.uk/m/
RIP, Owen. RIP, MrC.

Powered by a very Grim... version of phpBB © 2000, 2002, 2005, 2007 phpBB Group.