Python Line Processing Pattern
Processing a file (or a stream) is as old as the hills; learn this pattern with Python.
Created:
Sometimes you want to write a program that has to modify line(s) in a
text file. The “UNIX” way of doing this is of course to use `sed`
(stream editor) or `awk`. Since I can type faster than I can think in `awk`,
I write Python instead.
The common pattern for such programs is:
```
for file in matching_files:
    for line in file:
        if line matches pattern:
            modify the line
            write the modified line
        else:
            write the line unchanged
```
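The pseudocode maps fairly directly onto a Python generator. This is a minimal sketch, not the script used later in this post: `process_lines`, its `pattern` argument (assumed here to be a regular expression) and the `modify` callback are hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Iterable, Iterator


def process_lines(
    lines: Iterable[str],
    pattern: str,
    modify: Callable[[str], str],
) -> Iterator[str]:
    """Yield every line, passing the ones that match `pattern` through `modify`."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            # The line matches: emit the modified version.
            yield modify(line)
        else:
            # No match: emit the line unchanged.
            yield line


if __name__ == "__main__":
    text = ["title: AWK\n", "date: 2020-02-12T05:51:45\n"]
    # Keep 'date: ' plus the 10-character date, drop the time part.
    out = process_lines(text, r"^date: ", lambda line: line[:16] + "\n")
    print("".join(out), end="")
    # prints:
    # title: AWK
    # date: 2020-02-12
```

Because it is a generator, lines flow through one at a time, so the same function works on a list, an open file, or `sys.stdin`.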
One twist to the above: you don't want to overwrite the file until you know the rewrite can complete successfully.
The defensive way to handle this is to put the source files in a git
repo, so that you can always revert the changes. This is also
useful for `git diff`-ing the changes, so that you can verify
your program actually does what you wanted.
In addition, you can write the changed contents to a temporary file and rename that file over the original only if there were no errors. Essentially:

```shell
$ ./processing_script original.txt > temp.txt && mv temp.txt original.txt
```
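The same write-then-rename idea can be done from inside Python. This is a sketch under stated assumptions, not the script shown later: `rewrite_file` and its `transform` callback are hypothetical names. It creates the temporary file in the same directory as the original (so `os.replace` is a plain rename on one filesystem), copies the original's permission bits with `shutil.copymode` (temporary files are created readable and writable only by their owner), and deletes the temporary file if anything fails.

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable


def rewrite_file(path: str, transform: Callable[[Iterable[str]], Iterable[str]]) -> None:
    """Rewrite `path` through a temporary file; replace the original only on success."""
    # Put the temp file next to the original so os.replace() is a same-filesystem rename.
    directory = os.path.dirname(os.path.abspath(path))
    temp = tempfile.NamedTemporaryFile("w", dir=directory, suffix=".tmp", delete=False)
    try:
        with temp, open(path) as inp:
            for line in transform(inp):
                temp.write(line)
        # Temp files are created with owner-only permissions; copy the original's mode.
        shutil.copymode(path, temp.name)
        # Only reached if every line was processed and written without an exception.
        os.replace(temp.name, path)
    except BaseException:
        # Something went wrong: discard the partial output, keep the original file.
        os.unlink(temp.name)
        raise
```

For example, `rewrite_file('notes.txt', lambda lines: (line.upper() for line in lines))` would upper-case every line of a hypothetical `notes.txt`, and leave it untouched if an exception were raised part-way through.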
I recently learnt about `tempfile.NamedTemporaryFile` (a function in the
`tempfile` module), and I used it to do the following.
```python
#!/usr/bin/env python
"""
This program will rewrite the following:
---
title: AWK programming language
kind: notebook
tags: programming-language
date: 2020-02-12T05:51:45
---
to:
---
title: AWK programming language
kind: notebook
tags: programming-language
date: 2020-02-12
updated: 2020-02-12T05:51:45
---
"""
import glob
import shutil
import tempfile


def main():
    files = glob.glob('*.md')
    for mdfile in files:
        # Write the new contents to a temporary file first; delete=False keeps
        # it on disk after close() so it can be moved over the original.
        temp = tempfile.NamedTemporaryFile('w', delete=False)
        with open(mdfile) as inp:
            for line in inp:
                if line.startswith('date: '):
                    # e.g. '2020-02-12T05:51:45\n' (trailing newline kept)
                    timestamp = line.split('date: ')[1]
                    dt = timestamp[:10]  # just the date part: '2020-02-12'
                    print(f'{mdfile} {dt}')
                    temp.write(f'date: {dt}\n')
                    temp.write(f'updated: {timestamp}')
                else:
                    temp.write(line)
        temp.close()
        # Only reached if the whole file was processed without an exception.
        shutil.move(temp.name, mdfile)


if __name__ == '__main__':
    main()
```
Update: David Glick pointed out that you can skip `lines = inp.readlines()` and just iterate over `inp`; it gives you one line at a time.
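A small illustration of that point: a file object is its own line iterator. `numbered_lines` is a hypothetical helper, and `io.StringIO` stands in for a real file so the snippet runs without touching the disk; a file returned by `open()` iterates the same way.

```python
import io
from typing import Iterator, TextIO


def numbered_lines(inp: TextIO) -> Iterator[str]:
    # Iterating over the file object yields one line at a time (trailing
    # newline included), so no readlines() call is needed.
    for number, line in enumerate(inp, start=1):
        yield f"{number}: {line}"


for out in numbered_lines(io.StringIO("first\nsecond\n")):
    print(out, end="")
# prints:
# 1: first
# 2: second
```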
Update-2 (2020/5/20): If you want to know more about `awk`,
which is really the best-known “line processing” language, see
this video by Ben Porter from Apr 2020.