How do I merge lines from a file I import in Python?

I have a file (nameOfTheFile.txt) containing the following data :
-200.0000 23.0786 0.2402 0.9807 2.7610 0.7627 0.3168 1.4249
0.8745 0.4953 1.4652 5.9483 0.0000 0.6919 2.2648 0.3407
0.0000 0.6958 0.5775 0.0000 0.6171 2.6211
-199.9800 23.0706 0.2401 0.9804 2.7598 0.7632 0.3167 1.4246
0.8743 0.4952 1.4646 5.9452 0.0000 0.6917 2.2638 0.3407
0.0000 0.6955 0.5774 0.0000 0.6170 2.6203
I import the data into a list that I would like to use later. However, when I do the following:
import os
import sys
with open(os.path.join(sys.path[0], 'nameOfTheFile.txt'), 'r') as file:
    lines = file.readlines()
tab = []
for i in range(len(lines)):
    tab.append(lines[i])
print(tab)
I get this
['-200.0000 23.0786 0.2402 0.9807 2.7610 0.7627 0.3168 1.4249\n', ' 0.8745 0.4953 1.4652 5.9483 0.0000 0.6919 2.2648 0.3407\n', ' 0.0000 0.6958 0.5775 0.0000 0.6171 2.6211\n', ' -199.9800 23.0706 0.2401 0.9804 2.7598 0.7632 0.3167 1.4246\n', ' 0.8743 0.4952 1.4646 5.9452 0.0000 0.6917 2.2638 0.3407\n', ' 0.0000 0.6955 0.5774 0.0000 0.6170 2.6203\n']
I know how to get rid of the \n but when I do so, I still get this output :
['-200.0000 23.0786 0.2402 0.9807 2.7610 0.7627 0.3168 1.4249', ' 0.8745 0.4953 1.4652 5.9483 0.0000 0.6919 2.2648 0.3407', ' 0.0000 0.6958 0.5775 0.0000 0.6171 2.6211', ' -199.9800 23.0706 0.2401 0.9804 2.7598 0.7632 0.3167 1.4246', ' 0.8743 0.4952 1.4646 5.9452 0.0000 0.6917 2.2638 0.3407', ' 0.0000 0.6955 0.5774 0.0000 0.6170 2.6203']
The lines are still separated by commas, so each line of nameOfTheFile.txt is one item of the list. Therefore, when I print:
print(tab[0],tab[1])
I obtain
-200.0000 23.0786 0.2402 0.9807 2.7610 0.7627 0.3168 1.4249 0.8745 0.4953 1.4652 5.9483 0.0000 0.6919 2.2648 0.3407
What I would like is for each value, not each line of nameOfTheFile.txt, to be an item, so that the same print would give me:
-200.0000 23.0786
Is there a way to do so please ?
Thanks !

Your file consists of lines of text, but you want numbers. You have to do that conversion yourself.
import os
import sys
tab = []
with open(os.path.join(sys.path[0], 'nameOfTheFile.txt'), 'r') as file:
    for line in file:
        # split the line on whitespace and convert every token to a float
        tab += [float(n) for n in line.strip().split()]
print(tab)

Looks like you need list.extend.
Example:
import os
import sys
tab = []
with open(os.path.join(sys.path[0], 'nameOfTheFile.txt')) as infile:
    for line in infile:
        tab.extend(line.strip().split())  # strip the newline and split on whitespace
print(tab[0], tab[1])
Output:
-200.0000 23.0786

Just read the entire file and split its content:
with open('nameOfTheFile.txt', 'r') as file:
    text = file.read()
tab = text.split()
print(tab)
print(tab[0], tab[1])
Or use
tab = [float(i) for i in text.split()]
if you need numbers instead of strings.
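If the three physical lines really form one record (8 + 8 + 6 = 22 values in the sample above), the flat list can be regrouped afterwards. A minimal sketch, assuming every record has exactly 22 values; `parse_records` and `values_per_record` are illustrative names, not part of the answers above:

```python
def parse_records(text, values_per_record=22):
    """Split whitespace-separated numbers into fixed-size records."""
    values = [float(token) for token in text.split()]
    if len(values) % values_per_record:
        raise ValueError("file does not contain a whole number of records")
    return [values[i:i + values_per_record]
            for i in range(0, len(values), values_per_record)]


# Sample data copied from the question; in practice this would be file.read().
sample = """-200.0000 23.0786 0.2402 0.9807 2.7610 0.7627 0.3168 1.4249
 0.8745 0.4953 1.4652 5.9483 0.0000 0.6919 2.2648 0.3407
 0.0000 0.6958 0.5775 0.0000 0.6171 2.6211
 -199.9800 23.0706 0.2401 0.9804 2.7598 0.7632 0.3167 1.4246
 0.8743 0.4952 1.4646 5.9452 0.0000 0.6917 2.2638 0.3407
 0.0000 0.6955 0.5774 0.0000 0.6170 2.6203
"""

records = parse_records(sample)
print(records[0][0], records[0][1])   # first two values of the first record
print(len(records), len(records[1]))  # number of records, values per record
```

If the record length varies, this fixed-size grouping does not apply and the flat list is the safer representation.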

Related

Why is this regular expression not returning anything?

I've recently been working on a project in the field of machine learning, and I'm having some trouble with a regular expression that is needed for my data. For some reason, instead of returning a decimal value as it should, it ends up returning a value of None. Any suggestions? Below is my code, and the file I'm trying to get data out of.
for list_num in range(0,92):
    mach = [3,4,5,6,7]
    for j in range(0,4):
        if list_num >= 0 and list_num <= 4:
            if os.path.isfile('C://Users/avickers/Desktop/XFOIL_Training_Data/listofoutputs/listofoutputs0006'+str(mach[j])+'.pol'):
                tempaccess = os.listdir('C://Users/avickers/Desktop/XFOIL_Training_Data/listofoutputs')
                with open('C://Users/avickers/Desktop/XFOIL_Training_Data/listofoutputs/listofoutputs0006'+str(mach[j])+'.pol','r') as tempfile:
                    lines = tempfile.readlines()
                    file = tempfile.read()
                    name = '0006'
                    f = open('C://Users/avickers/Desktop/XFOIL_Training_Data/listofoutputs/data.pol','a+')
                    Re = re.search(r'Re\s*=\s*(\d\.\d+)',file)
                    AOA = []
                    Dl = []
                    Dd = []
                    num=0
                    for line in lines[12:]:
                        columns = line.split()
                        AOA.append(columns[0])
                        Dl.append(columns[1])
                        Dd.append(columns[2])
                        num += 1
                    for i in range(0,num):
                        f.write(name+' ')
                        f.write(str(mach[j]/10)+' ')
                        f.write(str(Re)+' ')
                        f.write(AOA[i]+' ')
                        f.write(Dl[i]+' ')
                        f.write(Dd[i]+'\n')
The file:
XFOIL Version 6.99
Calculated polar for: NACA 0006
1 1 Reynolds number fixed Mach number fixed
xtrf = 1.000 (top) 1.000 (bottom)
Mach = 0.300 Re = 2.213 e 6 Ncrit = 9.000
alpha CL CD CDp CM Cpmin XCpmin Top_Xtr
Bot_Xtr
------ -------- --------- --------- -------- -------- -------- -------- ----
-10.000 -0.7244 0.11348 0.11239 0.0360 -3.2095 0.0001 1.0000
0.0032
-9.500 -0.7284 0.10263 0.10156 0.0311 -3.6224 0.0001 1.0000
0.0033
-9.000 -0.7330 0.09190 0.09085 0.0257 -3.9746 0.0001 1.0000
0.0036
The specific value I'm trying to extract is the Re. Thanks!
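re.search returns None because file is an empty string.
lines = tempfile.readlines()
file = tempfile.read()
Once readlines() has consumed the file, the file position is at the end, so the following read() returns an empty string. You cannot read from the same handle again without rewinding it: reopen the file, call tempfile.seek(0) first, or build the text from lines.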
x="""XFOIL Version 6.99
Calculated polar for: NACA 0006
1 1 Reynolds number fixed Mach number fixed
xtrf = 1.000 (top) 1.000 (bottom)
Mach = 0.300 Re = 2.213 e 6 Ncrit = 9.000
alpha CL CD CDp CM Cpmin XCpmin Top_Xtr
Bot_Xtr
------ -------- --------- --------- -------- -------- -------- -------- ----
-10.000 -0.7244 0.11348 0.11239 0.0360 -3.2095 0.0001 1.0000
0.0032
-9.500 -0.7284 0.10263 0.10156 0.0311 -3.6224 0.0001 1.0000
0.0033
-9.000 -0.7330 0.09190 0.09085 0.0257 -3.9746 0.0001 1.0000
0.0036"""
import re
print(re.search(r"Re\s*=\s*(\d\.\d+)", x).group(1))
Use group(1) to get the captured text rather than the match object.
It prints 2.213
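Both points can be combined by reading the file once and deriving everything from that single read. A minimal sketch: `read_reynolds` is an illustrative helper name, the demo file is a throwaway stand-in for the real .pol file, and the pattern is the one from the question.

```python
import os
import re
import tempfile

RE_PATTERN = re.compile(r'Re\s*=\s*(\d\.\d+)')


def read_reynolds(path):
    """Return (captured Re text or None, list of lines) from one read of the file."""
    with open(path, 'r') as polar:
        text = polar.read()        # read once...
    lines = text.splitlines()      # ...and derive the lines from the same text
    match = RE_PATTERN.search(text)
    # search() returns None when nothing matches, so check before calling group(1)
    return (match.group(1) if match else None), lines


# Demo on a temporary file holding the header line from the question.
with tempfile.NamedTemporaryFile('w', suffix='.pol', delete=False) as tmp:
    tmp.write(" Mach = 0.300 Re = 2.213 e 6 Ncrit = 9.000\n")
reynolds, lines = read_reynolds(tmp.name)
os.remove(tmp.name)
print(reynolds)
```

Note that the pattern captures only the mantissa; the `e 6` exponent in the header is not part of the captured group.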

Python 3 Import Text File with Floating Point Numbers and Mixed Content

I have a section of code that takes this input:
LOEWDIN ATOMIC CHARGES
----------------------
0 C : -0.780631
1 H : 0.114577
2 Br: 0.309802
3 Cl: 0.357316
4 F : -0.001065
And this is the code:
import numpy as np
name = input("Enter Molecule ID: ")
name_in = name+'.lac.dat'
print(name_in)
atm_chg = []
with open(name_in) as f:
    # skip two lines
    f.readline()
    f.readline()
    for line in f.readlines():
        if line.strip(): # is the line non-empty, ignoring white space
            atm_chg.append(float( line.split()[-1] ))
np.savetxt('atm_chg',atm_chg,delimiter=',')
The above produces:
-7.806309999999999638e-01
1.145769999999999983e-01
3.098020000000000218e-01
3.573160000000000225e-01
-1.064999999999999974e-03
I want to do the same thing with this text file:
* MAYER POPULATION ANALYSIS *
*****************************
NA - Mulliken gross atomic population
ZA - Total nuclear charge
QA - Mulliken gross atomic charge
VA - Mayer's total valence
BVA - Mayer's bonded valence
FA - Mayer's free valence
ATOM NA ZA QA VA BVA FA
0 C 5.8816 6.0000 0.1184 4.2631 4.2631 0.0000
1 H 0.8495 1.0000 0.1505 0.9510 0.9510 0.0000
2 Br 35.0064 35.0000 -0.0064 1.2192 1.2192 -0.0000
3 Cl 17.0401 17.0000 -0.0401 1.2405 1.2405 -0.0000
4 F 9.2225 9.0000 -0.2225 1.0449 1.0449 -0.0000
So far everything that I've tried has failed, and I really don't know what to try next.
I would appreciate a shove in the right direction.
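One possible direction, offered as a sketch rather than a tested solution for real output files: skip everything up to the header row that starts with ATOM, then read one column from each table row. It assumes each row holds an index, an element symbol and one number per header column, and that the table ends at the first blank or differently shaped line. `parse_mayer` and the choice of the QA column are illustrative.

```python
def parse_mayer(lines, column="QA"):
    """Collect one numeric column from the Mayer population table."""
    values = []
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            if header is not None and values:
                break                    # a blank line after the rows ends the table
            continue
        if header is None:
            if fields[0] == "ATOM":      # the header row names the columns
                header = fields[1:]      # ['NA', 'ZA', 'QA', 'VA', 'BVA', 'FA']
            continue
        # data row: index, element symbol, then one number per header column
        if len(fields) != len(header) + 2:
            break                        # any other shape ends the table
        values.append(float(fields[2 + header.index(column)]))
    return values


# Sample lines copied from the question; in practice pass the open file object.
sample = """\
 VA - Mayer's total valence
 FA - Mayer's free valence
 ATOM NA ZA QA VA BVA FA
 0 C 5.8816 6.0000 0.1184 4.2631 4.2631 0.0000
 1 H 0.8495 1.0000 0.1505 0.9510 0.9510 0.0000
 2 Br 35.0064 35.0000 -0.0064 1.2192 1.2192 -0.0000
 3 Cl 17.0401 17.0000 -0.0401 1.2405 1.2405 -0.0000
 4 F 9.2225 9.0000 -0.2225 1.0449 1.0449 -0.0000
""".splitlines()

charges = parse_mayer(sample)
print(charges)
```

The resulting list can be passed to np.savetxt in the same way as atm_chg in the code above.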

splitting up lines of a file using python

I have a file containing:
0.0000 6 G01 G03 G04 G11 G28 G32
42.750 38.750 44.250 36.000 39.000 42.750
I am trying to split up the 2nd and 3rd lines so that I have a list of:
(0.0000, 6, G01, G03, G04, G11, G28, G32) for the 2nd line and,
(42.750, 38.750, 44.250, 36.000, 39.000, 42.750) for the 3rd line.
so far I have:
for line in file:
    if ('G') in line:
        sats = line.split("/t")
    elif ('-1') in line:
        epoch = line.split("/t")
However, they are not being split up properly and all that I get is:
sats= [' 0.0000 6 G01 G03 G04 G11 G28 G32\n' ]
epoch= [' 1.0000 -1\n']
Can anybody help?
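Replace split("/t") with split(). "/t" is a literal slash followed by the letter t, not the tab escape "\t", so it never matches anything in the line. split() with no argument splits on any run of whitespace.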
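The difference is easy to check in a short sketch. The sample line assumes a tab and single spaces as separators, which the question does not confirm:

```python
line = " 0.0000\t6 G01 G03 G04 G11 G28 G32\n"

print(line.split("/t"))   # separator never found: one-element list, line unchanged
print(line.split("\t"))   # splits only at the real tab character
print(line.split())       # splits on any run of whitespace and drops the newline

sats = line.split()
```

Because split() with no argument also discards leading and trailing whitespace, no separate strip() call is needed.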

python skip useless data and read specific line [closed]

My text file is 20 pages long and I need to print specific data.
The file looks like this:
123mcx version 1.5.0 ld=fri Apr 09 08:00:00 MST 2008 12/10/12 11:59:03
***************************************************************************************
1- c ==== CELLS ====
2- 1 0 1 $ outside
3- 2 102 -0.001 -1 23 51
4- c
5- 21 3 -4.15e-4 -21 $ detector
6- 22 5 -11.34 -22 21 $ Pb
7- 23 6 -7.87 -23 22 $ Fe tube
8- c
9- 50 7000 -1.7 -51 41
multiplier bins
att constant material reactions or material-rho*x pairs
1.02400E+00 3 103
time bins
-i to 5.00000E+02 shakes
5.00000E+02 to 1.06000E+03 shakes
1.06000E+03 to 1.69000E+03 shakes
1.69000E+03 to 2.40000E+03 shakes
2.40000E+03 to 3.19000E+03 shakes
3.19000E+03 to 4.08000E+03 shakes
4.08000E+03 to 5.08000E+03 shakes
5.08000E+03 to 6.19000E+03 shakes
6.19000E+03 to 7.43000E+03 shakes
7.43000E+03 to 8.84000E+03 shakes
multiplier bin: 1.02400E+00 3 103
time
5.0000E+02 5.54627E-06 0.0004-------- [I only need this data start here]
1.0600E+03 2.40573E-06 0.0018
1.6900E+03 2.11609E-06 0.0026
2.4000E+03 2.04138E-06 0.0033
3.1900E+03 2.01640E-06 0.0038
4.0800E+03 2.07022E-06 0.0043
5.0800E+03 2.11266E-06 0.0047
6.1900E+03 2.16806E-06 0.0050
7.4300E+03 2.24147E-06 0.0053
8.8400E+03 2.32872E-06 0.0056
1.0400E+04 2.36765E-06 0.0060
1.2200E+04 2.50930E-06 0.0061
1.4100E+04 2.43235E-06 0.0065
1.6400E+04 2.69267E-06 0.0066-----[end]
1analysis of the results in the tally fluctuation chart bin
(tfc) for tally 14 with nps =1598425200 print table 160
normed average 7.174350E-05 unnormed history = 8.85335E-01
estimated error = 0.0014 estimated variance of the variance = 0.0000
I need to skip all the other data and print only the part I need,
like:
5.0000E+02 5.54627E-06 0.0004
1.0600E+03 2.40573E-06 0.0018
1.6900E+03 2.11609E-06 0.0026
2.4000E+03 2.04138E-06 0.0033
3.1900E+03 2.01640E-06 0.0038
4.0800E+03 2.07022E-06 0.0043
5.0800E+03 2.11266E-06 0.0047
6.1900E+03 2.16806E-06 0.0050
7.4300E+03 2.24147E-06 0.0053
8.8400E+03 2.32872E-06 0.0056
1.0400E+04 2.36765E-06 0.0060
1.2200E+04 2.50930E-06 0.0061
1.4100E+04 2.43235E-06 0.0065
1.6400E+04 2.69267E-06 0.0066
After that I need to add up only the second-column numbers, like
(5.54627E-06 + 2.40573E-06 + 2.11609E-06 + ...+ 2.69267E-06) = 3.504897E-05
Please help me understand how to skip the data and print only what I want.
Thank you.
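Use a regular expression on each line. A pattern for a line such as
5.0000E+02 5.54627E-06 0.0004 would look something like:
import re
# compile the pattern so it has a .match() method; \s+ accepts any amount of spacing
goodLineRegex = re.compile(r'\s*\d+\.\d+E[+-]\d+\s+\d+\.\d+E[+-]\d+\s+\d+\.\d+')
for line in file:  # file is the already opened file object
    m = goodLineRegex.match(line)
    if m is not None:
        do_something(line)  # placeholder for whatever you do with a wanted line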
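You could use regular expressions to find the text that matches your search criteria.
Another option is numpy.genfromtxt, replacing each ? with the number of lines to skip in your file:
import numpy
data = numpy.genfromtxt('filename', skip_header=?, skip_footer=?, usecols=[1])
print(data.sum())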
skip_header : int, optional
The numbers of lines to skip at the beginning of the file.
skip_footer : int, optional
The numbers of lines to skip at the end of the file
usecols : sequence, optional
Which columns to read, with 0 being the first. For example,
usecols = (1, 4, 5) will extract the 2nd, 5th and 6th columns.
You could also combine this with the regex based approaches, since genfromtxt will take a generator in place of a file.
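A minimal sketch of the regular-expression approach, assuming the wanted rows are the only lines made of exactly three numbers in this form (two in E-notation followed by one plain decimal). `sum_second_column` is an illustrative name, and the sample lines are a shortened copy of the data in the question:

```python
import re

# E-notation number, E-notation number, plain decimal, and nothing else on the line
ROW = re.compile(r'\s*(\d+\.\d+E[+-]\d+)\s+(\d+\.\d+E[+-]\d+)\s+(\d+\.\d+)\s*$')


def sum_second_column(lines):
    """Return (matching rows, sum of their second column)."""
    rows = []
    total = 0.0
    for line in lines:
        m = ROW.match(line)
        if m:
            rows.append(line.strip())
            total += float(m.group(2))
    return rows, total


sample = """\
 1.02400E+00 3 103
 5.00000E+02 to 1.06000E+03 shakes
 time
 5.0000E+02 5.54627E-06 0.0004
 1.0600E+03 2.40573E-06 0.0018
 1.6900E+03 2.11609E-06 0.0026
 normed average 7.174350E-05 unnormed history = 8.85335E-01
""".splitlines()

rows, total = sum_second_column(sample)
for row in rows:
    print(row)
print("sum of column 2: %.5E" % total)
```

For the real file, pass the open file object instead of `sample`. If other tables in the output use the same three-number layout, the pattern alone is not enough and the search has to be limited to the lines after the relevant "time" marker.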

Parsing a text file into a list in python

I'm completely new to Python, and I'm trying to read in a txt file that contains a combination of words and numbers. I can read in the txt file just fine, but I'm struggling to get the string into a format I can work with.
import matplotlib.pyplot as plt
import numpy as np
from numpy import loadtxt
f= open("/Users/Jennifer/Desktop/test.txt", "r")
lines=f.readlines()
Data = []
list=lines[3]
i=4
while i<12:
    list=list.append(line[i])
    i=i+1
print list
f.close()
I want a list that contains all the elements in lines 3-12 (starting from 0), which is all numbers. When I do print lines[1], I get the data from that line. When I do print lines, or print lines[3:12], I get each character preceded by \x00. For example, the word "Plate" becomes: ['\x00P\x00l\x00a\x00t\x00e. Using lines = [line.strip() for line in f] gets the same result. When I try to put individual lines together in the while loop above, I get the error "AttributeError: 'str' object has no attribute 'append'."
How can I get a selection of lines from a txt file into a list? Thank you so much!!!
Edit: The txt file looks like this:
BLOCKS= 1
Plate: Phosphate Noisiness Assay 2000x 1.3 PlateFormat Endpoint Absorbance Raw FALSE 1 1 650 1 12 96 1 8
Temperature(°C) 1 2 3 4 5 6 7 8 9 10 11 12
21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227
0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197
0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451
0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235
0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191
0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273
0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038
0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062
~End
Original Filename: 2013-08-06 Phosphate Noisiness; Date Last Saved: 8/6/2013 7:00:55 PM
Update
I used this code:
f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()
first_twelve = file_list[3:11]
data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]
print data
to get this result:
[' 21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227 ', ' 0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197 ', ' 0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451 ', ' 0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235 ', ' 0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191 ', ' 0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273 ', ' 0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038 ', ' 0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062 ']
Which is (correct me if I'm wrong, very new to Python!) a list of lists, which I should be able to work with. Thank you so much to everyone who responded!!!
When you write lines = f.readlines(), a list of lines is returned to you. lines[3] is then a single item of that list: the fourth line (index 3), which is a string, not a list. That's why you end up with a single string and its individual characters.
All you need to do is say
files = open("Your File.txt")
file_list = files.readlines()
first_twelve = file_list[0:12] #returns a list with the first 12 lines
Once you've got the first_twelve list you can do whatever you want with it.
To print each line you would do:
for each_line in first_twelve:
    print(each_line)
That should work for you.
You have the line list=lines[3] in your source code.
Two issues here.
Don't use list as a variable name. You silently overwrote the built-in list constructor when you did that.
When you take one item from a list lines[3] now you only have that object -- in this case a string. When you try to append to it you can't -- it isn't a list.
You can demonstrate your bug easily in the console:
>>> li=['1']
>>> li.append('2')
>>> li
['1', '2']
>>> st='1'
>>> st.append('2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'append'
Other comments, in general, on your code.
Assume you have a text file called '/tmp/test.txt' that contains this text:
Line 1
Line 2
...
Line 19
Reading the contents of that file is as simple as this:
with open('/tmp/test.txt', 'r') as fin:
    lines=fin.readlines()
If you want a subset of the lines, you can use a slice:
subset=lines[3:12]
If you want to process each line for something, like strip the carriage return, use the file object as an iterator:
with open('/tmp/test.txt', 'r') as fin:
    lines=[]
    for line in fin:
        lines.append(line.strip())
For your specific problem of having NULs in the data, perhaps you are reading a binary file masquerading as text? You need to post an example of the file.
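A `\x00` next to every character is what UTF-16 text typically looks like when it is read one byte at a time, so one thing worth trying is to decode the file explicitly. This is a guess about the file's encoding, not something the question confirms. `read_utf16_lines` is an illustrative name and the demo file is a stand-in for the real one:

```python
import io
import os
import tempfile


def read_utf16_lines(path):
    """Decode the file as UTF-16 and return its lines without line endings."""
    with io.open(path, 'r', encoding='utf-16') as fin:
        return [line.rstrip('\r\n') for line in fin]


# Demo: write a small UTF-16 file (with byte-order mark), then read it back.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmp.write(u"Plate:\tAssay\r\n21.4\t0.4977\t0.5074\r\n".encode('utf-16'))
lines = read_utf16_lines(tmp.name)
os.remove(tmp.name)

print(lines)
numbers = [float(x) for x in lines[1].split()]  # tab-separated values on one line
print(numbers)
```

If decoding fails or the result still looks wrong, the file is probably in a different encoding and the guess above does not hold.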
Edit
Your file contains non-ASCII characters (right after 'Temperature'), which may be some of the odd characters you are seeing. If you are only interested in the lines with numbers, you can ignore them.
You do not YET have a list of lists, but it is easy to get:
data = []  # will hold the lines of the file
with open(ur_file) as fin:  # ur_file is the path to your file
    for line in fin:  # for each line of the file
        line = line.strip()  # remove CR/LF
        if line:  # skip blank lines
            data.append(line)
print(data)  # list of STRINGS separated by spaces
matrix = [[float(x) for x in line.split()] for line in data[3:11]]  # convert the eight numeric lines
print(matrix)  # NOW you have a list of lists of floats...
The tweak below might help you get rid of the \x00 characters embedded in your data:
f = open("/Users/Jennifer/Desktop/test.txt", "r")
lines = f.readlines()
f.close()
lines = [x.replace('\x00','') for x in lines]
l = []  # create the list once, before the loop, so it is not reset on every pass
for i in range(3,12):
    l.append(lines[i])
I am not sure if your data has other delimiters (say comma or space) to separate the numbers. If so, a simple split will help to convert the line into a list:
line = '123.00,456.00,789.00'
l = line.split(',') # list will become ['123.00','456.00','789.00']
Edit
Continuing from Rachel's updated code:
f = open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()
f.close()
first_twelve = file_list[3:11]
data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]
items = []
for dataline in data:
    items += dataline.split(' ')
items = [float(x) for x in items if len(x) > 0]  # remove dummy items left in the list
print(items)
