Be Excellent To Each Other https://www.beexcellenttoeachother.com/forum/ |
|
searching large text files for certain text... https://www.beexcellenttoeachother.com/forum/viewtopic.php?f=3&t=10144 |
Page 1 of 1 |
Author: | Malc [ Tue Jul 22, 2014 18:01 ] |
Post subject: | searching large text files for certain text... |
I have a very large text file (about half a million lines). Each line begins:
Code:
NN YYYYYMMDDHHmmSS
Where:
NN is a number between 01 and 99
YYYY is the year
MM is the month
DD is the date
HH is the hour
mm is the minutes
SS is the seconds
What I need to do is find any line where NN equals 61, 63 or 64 and remove it, along with the line directly above it and the line directly below it. I can tell you that the year is always 2009. Can anyone think of a way of doing that easily? At the moment I am searching for
Code:
61 2009
and selecting that line plus the lines above and below it, then deleting them. The trouble is there are almost 2,000 of them, so it's going to take me some time. All advice appreciated. Thanks Malc |
Author: | ApplePieOfDestiny [ Tue Jul 22, 2014 18:08 ] |
Post subject: | Re: searching large text files for certain text... |
Pass it to workie. Tell them it's urgent and business critical. Leave for the pub. |
Author: | Mr Russell [ Tue Jul 22, 2014 18:31 ] |
Post subject: | Re: searching large text files for certain text... |
Stick it in Excel and use filters on the columns? Use regular expressions (somehow) combined with find and replace using TextCrawler? Log files I'm guessing? |
Author: | Malc [ Tue Jul 22, 2014 18:40 ] |
Post subject: | Re: searching large text files for certain text... |
I managed to find a macro that seems to work on a small sample, so now to try it on the large file. And yeah log files. Malc |
Author: | Bobbyaro [ Tue Jul 22, 2014 18:50 ] |
Post subject: | Re: searching large text files for certain text... |
You can do it simply with a series of IF statements in successive columns, then filter by the final column. It is crude, but it works. (I think)
Code:
NN YYYYYMMDDHHmmSS
=IF(A63=61, 1, 0)
=IF(A63=63, 1, 0)
=IF(A63=64, 1, 0)
=IF(SUM(E63:G63)>0, 1, 0)
=IF(H62=1, 1, 0)
=IF(H64=1, 1, 0)
=IF(SUM(H63:J63)>0, 1, 0)
|
Author: | Malc [ Tue Jul 22, 2014 19:04 ] |
Post subject: | Re: searching large text files for certain text... |
I think Excel with filters would work, as long as it didn't change the structure of each line. Malc |
Author: | MrD [ Tue Jul 22, 2014 19:20 ] |
Post subject: | Re: searching large text files for certain text... |
You could do this pretty easily in Python. I'd prefer the 'load whole file and output clean file' route. It'd look like:
Code:
def condition(some_text):
    # True if the line starts with one of the NN values we want to remove.
    if some_text[:2] == "61":
        return True
    if some_text[:2] == "63":
        return True
    if some_text[:2] == "64":
        return True
    return False

original_lines = []
with open("input_file.txt", "r") as f:
    for line_incoming in f:
        # Note: strip() also removes leading/trailing whitespace from each line.
        clean_line_incoming = line_incoming.strip()
        if clean_line_incoming:
            original_lines.append(clean_line_incoming)

# original_lines is now a list of lines from the original file without any blanks.

with open("output_file.txt", "w") as s:
    for i in range(len(original_lines)):
        prev_i = i - 1
        next_i = i + 1
        prevline = original_lines[prev_i] if prev_i >= 0 else ""
        nextline = original_lines[next_i] if next_i < len(original_lines) else ""
        thisline = original_lines[i]
        # As i goes up the indices of original_lines, prevline, thisline and
        # nextline are the contents of the cells above, centre and below.
        # We write only the lines which AREN'T a 61/63/64 line themselves and
        # DON'T have one directly above or below them.
        if (not condition(prevline)
                and not condition(thisline)
                and not condition(nextline)):
            s.write(thisline)
            s.write("\n")
|
Author: | Pod [ Tue Jul 22, 2014 22:03 ] |
Post subject: | Re: searching large text files for certain text... |
You've put 5 Y's for the year? I assume you meant 4? Probably something like:
Code:
grep -v -A2 -E '((61)|(63)|(64)) 2009..........'
edit: Nope, tested that, and although the pattern is fine, -v doesn't invert the -A2, which is a shame. My next choice would have been a Python script, though it would be better than MrD's. |
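The grep pattern above can be reused from Python's re module, which sidesteps the -v/-A2 problem: record the index of every matching line plus its two neighbours, then keep everything else. This is only a rough sketch, assuming the whole file fits in memory. The names drop_with_neighbours and filter_file are made up for illustration, and the pattern is anchored to the start of the line and shortened to "NN 2009", which is an assumption about the data rather than something tested on the real log.

```python
import re

# Assumed pattern: NN of 61, 63 or 64 at the start of the line, a space, then 2009.
PATTERN = re.compile(r"^(61|63|64) 2009")


def drop_with_neighbours(lines):
    """Return the lines with every matching line and its two neighbours removed."""
    doomed = set()
    for i, line in enumerate(lines):
        if PATTERN.match(line):
            # Mark the line above, the match itself, and the line below.
            doomed.update((i - 1, i, i + 1))
    # Out-of-range indices (-1 or len(lines)) are simply never looked up.
    return [line for i, line in enumerate(lines) if i not in doomed]


def filter_file(in_path, out_path):
    """Hypothetical wrapper: read in_path, write the surviving lines to out_path."""
    with open(in_path) as src:
        kept = drop_with_neighbours(src.read().splitlines())
    with open(out_path, "w") as dst:
        dst.writelines(line + "\n" for line in kept)
```

Calling filter_file("input_file.txt", "output_file.txt") would write the cleaned copy and leave the original untouched, so the result can be checked against it before anything is thrown away.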
Author: | MrD [ Wed Jul 23, 2014 0:08 ] |
Post subject: | Re: searching large text files for certain text... |
I respectfully disagree. You might be able to do it better, but not faster, 'cause I've already done it and you ain't. A version that doesn't load the entire list would have a two line buffer of received lines and a bunch of counters and flags that can invalidate lines that have been read but are killed by NN lines read in the future. If a line is 2 lines old when a new line enters, it's written. Two blank lines would probably have to be simulated at the end of the reading too. |
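A streaming version along those lines might look something like the sketch below. It is a simplification of the description above, not the same design: it holds back one line and one flag instead of a two-line buffer with counters, and it needs no simulated blank lines at the end because the held line is flushed after the loop. The names matches, stream_filter and filter_file are made up for illustration, and unlike the whole-file version it does not strip lines or skip blank ones.

```python
def matches(line):
    # Hypothetical helper: True when the NN field is 61, 63 or 64.
    return line[:2] in ("61", "63", "64")


def stream_filter(lines):
    """Yield lines, dropping each matching line plus the line before and after it."""
    held = None        # most recent line that a following match could still kill
    drop_next = False  # True when the previous line was a match
    for line in lines:
        if matches(line):
            held = None        # kills the line above (if any) and the match itself
            drop_next = True   # the line below must go too
        elif drop_next:
            drop_next = False  # this is the line below a match: drop it
        else:
            if held is not None:
                yield held     # the held line is now one line old and safe
            held = line
    if held is not None:
        yield held             # nothing can follow the last line, so flush it


def filter_file(in_path, out_path):
    """Hypothetical wrapper: stream in_path to out_path one line at a time."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(stream_filter(src))
```

Only one line is ever held in memory, so the size of the log stops mattering. For half a million lines the load-everything approach is perfectly workable too.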
Page 1 of 1 | All times are UTC [ DST ] |