Populate new rows while iterating some elements in the dataframe using pandas - python

I am trying to read a csv file and pick up a cell (which comes after skipping 28 rows), using the following code:
import os, csv
import pandas as pd
file="testfile.csv"
df = pd.read_csv(file, skiprows=28, delimiter=";")
df.iloc[0:0]
cell A29 is read, which outputs a list:
[SS_1; FC2; Computer_1995.SFX; N; -0.6; testfile]
I want to have the following output from cells A30 to A61
[SS_2; FC2; Computer_1995.SFX; FILE_Year00_-0.6%modrate.csv; -0.6;]
[SS_3; FC2; Computer_1995.SFX; FILE_Year01_-0.3%modrate.csv; -0.3;]
[SS_4; FC2; Computer_1995.SFX; FILE_Year02_0%modrate.csv; 0;]
[SS_5; FC2; Computer_1995.SFX; FILE_Year03_0.3%modrate.csv; 0.3;]
[SS_6; FC2; Computer_1995.SFX; FILE_Year04_0.6%modrate.csv; 0.6;]
.
.
.
[SS_32; FC2; Computer_1995.SFX; N; FILE_Year30_8.4%modrate.csv; 8.4;]
As you can see, there is iteration involved in the elements SS_, FILE_Yearxx_xx%modrate.csv and -0.6.
Can anyone suggest the easiest way to do this with the existing dataframe, or should I build a new df with a for loop?
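A minimal sketch of one way to build those rows with a loop and append them to the existing dataframe (the column names name, fc, sfx, file and rate are hypothetical placeholders; the SS_ counter, the Year index and the rate running from -0.6 to 8.4 in steps of 0.3 follow the pattern shown above):
import pandas as pd

rows = []
for i in range(31):                      # Year00 .. Year30
    rate = round(-0.6 + 0.3 * i, 1)      # -0.6, -0.3, 0.0, ..., 8.4
    rows.append({
        'name': f'SS_{i + 2}',           # SS_2 .. SS_32
        'fc': 'FC2',
        'sfx': 'Computer_1995.SFX',
        'file': f'FILE_Year{i:02d}_{rate:g}%modrate.csv',
        'rate': rate,
    })
new_df = pd.DataFrame(rows)
# append to the existing df if needed:
# df = pd.concat([df, new_df], ignore_index=True)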

Related

How to extract column from excel file?

What would be the best method to extract a set of columns of data? I have MATLAB code for this data analysis, but I want to use Python.
How would I extract individual columns and put them into a column vector in Python? For example, say I want to extract column B, rows 3 to 26.
The code for reading in the excel file is below:
# importing libraries
import numpy as np
import pandas as pd
# reads in excel data
cylinder_data_file = pd.ExcelFile('FriDataCylinder.xlsx')
cylinder_data_file.sheet_names
data = cylinder_data_file.parse('Sheet1')
I am using Python 3.6 as well.
I'm not sure I understand your question, but given a pandas.DataFrame df, you can slice a column by using the column label
column = df.column_label
or the column integer index.
column = df.iloc[:, 2]
You could collect columns in a list, e.g.
column_list = []
column_list.append(column)
and finally create a new DataFrame by concatenating the list along the column axis, like
new_df = pd.concat(column_list, axis=1)
If you want a more in-depth answer, you will need to provide some data.
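For the concrete example (column B, rows 3 to 26), a minimal sketch, assuming row 1 of the sheet holds the headers:
import pandas as pd

# read the first sheet of the workbook
data = pd.read_excel('FriDataCylinder.xlsx', sheet_name='Sheet1')
# column B is position 1; spreadsheet rows 3-26 sit at positions 1-24
# once the header row has been consumed, so slice with iloc
column_b = data.iloc[1:25, 1].to_numpy()   # 1-D vector of 24 values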

Add new column to dataframe using Pandas but it goes beyond the length of the data

I'm trying to add a new column to a pandas dataframe. I am also trying to give a name to the index so that it is printed out in Excel when I export the data:
import pandas as pd
import csv
#read csv file
file='RALS-04.csv'
df=pd.read_csv(file)
#select the columns that I want
column1=df.iloc[:,0]
column2=df.iloc[:,2]
column3=df.iloc[:,3]
column1.index.name="items"
column2.index.name="march2012"
column3.index.name="march2011"
df=pd.concat([column1, column2, column3], axis=1)
#create a new column with 'RALS' as a defaut value
df['comps']='RALS'
#writing them back to a new CSV file
with open('test.csv', 'a') as f:
    df.to_csv(f, index=False, header=True)
The problem is that the 'RALS' value I added to the dataframe goes down to row 2000, while the data stops at row 15. How can I constrain 'RALS' so that it doesn't go beyond the length of the data being exported? I would also prefer a more elegant, automated way rather than specifying the row at which the default value should stop.
The second question is that the labels I assigned to the columns using index.name do not appear in the output; instead they are replaced by 0 and 1. Please advise.
Thanks so much for your input.
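A minimal sketch of one possible fix, assuming the extra rows come from blank lines at the bottom of RALS-04.csv and that the goal is column labels rather than an index label (index.name names the index, not the columns):
import pandas as pd

df = pd.read_csv('RALS-04.csv')
# keep the three columns of interest and give them the desired labels
sub = df.iloc[:, [0, 2, 3]].copy()
sub.columns = ['items', 'march2012', 'march2011']
# drop rows that are completely empty so the default value
# is only written next to real data
sub = sub.dropna(how='all')
sub['comps'] = 'RALS'
sub.to_csv('test.csv', index=False)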

Selecting specific excel rows for analysis in pandas/ipython?

This question is probably quite elementary, but I am totally stuck, so I would appreciate any help: Is there a way to extract data for analysis from an Excel file by selecting specific row numbers? For example, if I have an Excel file with 30 rows and I want to add up the values of rows 5, 10, 21 and 27?
I only managed to learn how to select adjacent ranges with the iloc function like this:
import pandas as pd
df = pd.read_excel("example.xlsx")
df.iloc[1:5]
If this is not possible in Pandas, I would appreciate advice how to copy selected rows from a spreadsheet into a new spreadsheet via openpyxl, then I could just load the new worksheet into Pandas.
You can do it like so, passing a list of indices:
df.iloc[[4,9,20,26]].sum()
Mind that Python uses 0-indexing, so these indices are one below the desired row numbers.
import pandas as pd
df = pd.read_excel("example.xlsx")
sum(df.data[i - 1] for i in [5, 10, 21, 27])
My df:
data
0 1
1 2
2 3
3 4
4 5
...
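If only one column should be summed, the row list and the column position can be combined in a single iloc call (a sketch, assuming the values sit in the first column):
import pandas as pd

df = pd.read_excel("example.xlsx")
# rows 5, 10, 21 and 27 of the first column (0-based positions 4, 9, 20, 26)
total = df.iloc[[4, 9, 20, 26], 0].sum()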

How to extract the specific columns from a csv file and write a new csv for it, in python

I have a csv file and I want to extract some specific columns from it. How can I do that?
I have a dictionary of headings and the cell location like:
dict = {'Col1' : [(4,5)], 'Col2' : [(4,7)], 'Col3' : [(4,9)]}
I want to extract the data starting from the cell locations in the dict, down to the end of the csv file.
For example:
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,Col0,Col1,,Col2,,Col3,Col4,
,,,bgr,abc,,efg,,hij,123,
,,,cde,klm,,nop,,qrs,123,
,,,asd,tuv,,wxy,,zzz,456,
,,,,,,,,,,
,,,,,,,,,,
I want to extract
Col1,Col2,Col3
abc,efg,hij
klm,nop,qrs
tuv,wxy,zzz
and write it to a new csv file. Please help me do this!
I want to handle this situation efficiently.
Pandas is a library with a powerful method for reading csv files.
In case you want to read each column starting from the same row, the following script will do the work (note that only two Python lines do the real work):
import pandas as pd

# Give the names of the columns
colnames = ('skip1', 'skip2', 'skip3', 'Col0', 'Col1', 'skip4', 'Col2', 'skip5', 'Col3', 'Col4', 'skip6')
# Give the number of lines to skip
nbskip = 4
# Give the number of rows to read (you can also filter rows after reading and remove the empty ones)
nrows = 3
# List of columns to keep
keep_only = ('Col1', 'Col2', 'Col3')
# Read the csv
df = pd.read_csv('test.csv',
                 header=None,
                 skiprows=nbskip,
                 names=colnames,
                 nrows=nrows,  # Remove if you prefer to filter rows
                 usecols=keep_only)
# If the number of lines to keep is unknown,
# you can remove empty lines here
# Save the csv
df.to_csv('result.csv', index=False)
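If the number of data rows is not known in advance, a sketch of the filtering approach hinted at in the comments above: drop nrows and remove the blank lines afterwards.
import pandas as pd

colnames = ('skip1', 'skip2', 'skip3', 'Col0', 'Col1', 'skip4', 'Col2', 'skip5', 'Col3', 'Col4', 'skip6')
keep_only = ('Col1', 'Col2', 'Col3')
df = pd.read_csv('test.csv', header=None, skiprows=4,
                 names=colnames, usecols=keep_only)
# remove the trailing blank lines instead of relying on nrows
df = df.dropna(how='all')
df.to_csv('result.csv', index=False)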

pandas reading csv orientation

Hi, I'm trying to read in pandas the csv file you can download from here (Euribor rates; I think you can imagine why I would like to have this file!). The file is a CSV file, but it is oriented in a strange way. If you import it in Excel, the file has the format
02/01/2012,03/01/2012,04/01/2012,,,,
1w,0.652,0.626,0.606,,,,
2w,0.738,0.716,0.700,,,,
etc., with the first column going up to 12m (but I have given you the link where you can download a sample). I would like to read it in pandas, but I'm not able to read it in the correct way. Pandas has a built-in function for reading csv files, but somehow it expects the file to be row oriented rather than column oriented. What I would like to do is obtain the information in the row labeled 3m, with the values and the dates, in order to plot the time variation of this index. But I can't handle this problem. I know I can read the data with
import pandas
data = pandas.read_csv("file.csv", parse_dates=True)
but it would only work if the csv file were somehow transposed.
A pandas dataframe has a .transpose() method, but it doesn't like all the empty rows in this file. Here's how to get it cleaned up:
df = pandas.read_csv("hist_EURIBOR_2012.csv") # Read the file
df = df[:15] # Chop off the empty rows beyond 12m
df2 = df.transpose()
df2 = df2[:88] # Chop off what were empty columns (I guess you should increase 88 as more data is added).
Of course, you can chain these together:
df2 = pandas.read_csv("hist_EURIBOR_2012.csv")[:15].transpose()[:88]
Then df2['3m'] is the data you want, but the dates are still stored as strings. I'm not quite sure how to convert it to a DateIndex.
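One possible way to finish the date conversion, a sketch assuming the transposed frame's index now holds the date strings in day/month/year order:
import pandas

df2 = pandas.read_csv("hist_EURIBOR_2012.csv")[:15].transpose()[:88]
# the transposed index labels are the original date headers, stored as strings;
# errors='coerce' turns any non-date label into NaT instead of raising
df2.index = pandas.to_datetime(df2.index, dayfirst=True, errors='coerce')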
I've never used pandas for csv processing. I just use the standard Python lib csv functions as these use iterators.
import csv

myCSVfile = r"c:/Documents and Settings/Jason/Desktop/hist_EURIBOR_2012.csv"
f = open(myCSVfile, "r")
reader = csv.reader(f, delimiter=',')
data = []
for l in reader:
    if l[0].strip() == "3m":
        data.append(l)
f.close()
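To line the collected 3m values up with the dates for plotting, one possible follow-up (a sketch, assuming the layout of the snippet above, where the first line holds the dates and every later line starts with the tenor label):
import csv
import pandas as pd

with open("hist_EURIBOR_2012.csv", "r") as f:
    rows = list(csv.reader(f, delimiter=','))

dates = [d for d in rows[0] if d.strip()]                      # first line: the dates
three_m = next(r for r in rows if r and r[0].strip() == "3m")  # the 3m row
values = [float(v) for v in three_m[1:1 + len(dates)]]         # values follow the label
series = pd.Series(values, index=pd.to_datetime(dates, dayfirst=True))
series.plot()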
