Exporting text with double quotes - Python

I'm pretty new to Python and I'm facing the problem below.
I'm importing an .xls file into Python; one of its columns contains:
<a href="myhyperlink" target='_blank' >text displayed</a>
After processing the file, I need to export it to a .txt file.
When I use this line:
file_to_export.to_csv('my/path' +"nameofthefile.txt", header=True, index=False, quoting=csv.QUOTE_NONE, sep='\t')
I get the following error:
need to escape, but no escapechar set
When I remove the quoting=csv.QUOTE_NONE argument, the file is exported without any errors, but in the column mentioned above every " is doubled and the whole field is wrapped in quotes:
"<a href=""myhyperlink"" target=""_blank"" >text displayed</a>"
I checked the intermediate steps and these extra " are not there.
Can anyone assist?
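Both behaviours come from Python's csv module, which pandas' to_csv writes through. By default (QUOTE_MINIMAL) a field containing the quote character is wrapped in quotes and each embedded " is doubled, which is standard CSV escaping. With QUOTE_NONE, " is still the quote character, so it has to be escaped, and without an escapechar the writer raises "need to escape, but no escapechar set". A minimal standard-library sketch, assuming a hypothetical two-row table (the column names are made up), shows the three cases; passing quotechar=None together with QUOTE_NONE removes the special meaning of " so the text is written as-is.

```python
import csv
import io

# Hypothetical rows: a header plus one row holding the HTML anchor from the question.
rows = [
    ["name", "link"],
    ["example", '<a href="myhyperlink" target=\'_blank\' >text displayed</a>'],
]

def write_tsv(rows, **kwargs):
    """Write rows as tab-separated text and return the resulting string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n", **kwargs)
    writer.writerows(rows)
    return buf.getvalue()

# Default QUOTE_MINIMAL: the field is quoted and each embedded " is doubled.
print(write_tsv(rows))

# QUOTE_NONE alone: '"' is still the quote character, so it must be escaped,
# and with no escapechar set the writer raises csv.Error.
try:
    write_tsv(rows, quoting=csv.QUOTE_NONE)
except csv.Error as exc:
    print("error:", exc)

# QUOTE_NONE plus quotechar=None: '"' is an ordinary character and is written unchanged.
print(write_tsv(rows, quoting=csv.QUOTE_NONE, quotechar=None))
```

Note that with QUOTE_NONE a tab or newline inside a field would still raise the same error, because those characters cannot be written unquoted and unescaped in a tab-separated file.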

Related

Encoding/importing an AdWords .CSV into Power Query

Basics of the question: how are .csv reports from Google AdWords encoded?
Details: I'm trying to import a .csv from AdWords using Power Query, and for the life of me I can't get the "," (comma) characters to appear in my import.
My Code:
let
    // Get raw file data as text
    fnRawFileContents = (fullpath) as table =>
        let
            EveryLine = Lines.FromBinary(File.Contents(fullpath), 1, true, 1200),
            Value = Table.FromList(EveryLine, Splitter.SplitByNothing())
        in
            Value,
    // Use functions to load contents
    Source = fnRawFileContents("C:\Users\Jamie.Marshall\Desktop\Emma\adwordsDoc.csv"),
    #"Removed Top Rows" = Table.Skip(Source, 1)
in
    #"Removed Top Rows"
Facts:
AdWords documentation says they use UTF-16LE.
UTF-16LE in M is code page 1200.
I cannot open the AdWords .csv in Notepad under any encoding setting (Unicode, Unicode Big Endian, UTF-8, ANSI).
If I resave the file in Excel as Unicode Text, I can open it in Notepad as Unicode Big Endian with line breaks, but with no commas (",").
How can I verify the encoding on these docs?
What other encoding could this be?
Any help on this will be much appreciated.
Why do you use Lines instead of the native CSV parser?
Use Csv.Document(file_content, [Delimiter="#(tab)", Columns=10, Encoding=1200, QuoteStyle=QuoteStyle.None]).
Like this:
let
    file_path = "C:\Users\Jamie.Marshall\Desktop\Emma\adwordsDoc.csv",
    file_content = File.Contents(file_path),
    csv = Csv.Document(file_content, [Delimiter="#(tab)", Columns=10, Encoding=1200, QuoteStyle=QuoteStyle.None]),
    skip_1_row = Table.Skip(csv, 1),
    promote_header = Table.PromoteHeaders(skip_1_row),
    remove_last_2_rows = Table.RemoveLastN(promote_header, 2)
in
    remove_last_2_rows
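For the "how can I verify the encoding" part, one quick check outside Power Query is to look at the first bytes of the file: UTF-16LE with a byte-order mark starts with FF FE. The Python sketch below assumes a small, made-up report that is tab-delimited and stored as UTF-16LE with a BOM (matching the Delimiter="#(tab)" and Encoding=1200 settings above); once decoded with the right codec and split on tabs, commas inside values such as "Brand, exact" survive. If the real file has no BOM, this check is inconclusive.

```python
import codecs
import csv
import io

def sniff_bom(raw):
    """Return a codec name based on a leading byte-order mark, or None if absent."""
    # Note: a UTF-32-LE BOM also begins with FF FE; this sketch ignores UTF-32.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM, picks the byte order and drops the BOM
    return None

# Made-up report: a title row, a header row and one data row,
# tab-delimited and stored as UTF-16LE with a BOM.
text = "Campaign report\r\nCampaign\tClicks\r\nBrand, exact\t1,234\r\n"
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# For a real file: raw = open(path, "rb").read()

encoding = sniff_bom(raw)
rows = list(csv.reader(io.StringIO(raw.decode(encoding), newline=""), delimiter="\t"))
header, data = rows[1], rows[2:]  # skip the title row, like Table.Skip(csv, 1)
print(encoding, header, data)
```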

How to read a file into a string without escaping characters

I'm using the following code to read a file into a string, but I want to read it as is, without escaping any characters.
for sheet in style_sheet_list:
    with open(DJANGO_ROOT + "/assets/css/" + sheet + ".css", "r") as myfile:
        style_sheets += myfile.read()
        style_sheets += "\n"
return style_sheets
For example, I get:
&quot;\f05c&quot;;
Instead of:
"\f05c";
How can I do this?
This is not a Python problem; this is Django escaping your template variable. Use the safe filter to correct the template output.
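The string read from disk is unchanged; the escaping happens when the template renders it. As far as I know, Django 3.0+ implements django.utils.html.escape with the standard library's html.escape, so the sketch below uses that to mimic autoescaping on a made-up CSS rule. In the template, {{ style_sheets|safe }} (or wrapping the value with mark_safe() in the view) skips this step.

```python
import html

# Made-up stylesheet content as read from disk: plain double quotes,
# nothing escaped by open()/read().
css = 'a:before { content: "\\f05c"; }'

# Roughly what autoescaping does to {{ style_sheets }}: " becomes &quot;
rendered = html.escape(css)
print(rendered)  # a:before { content: &quot;\f05c&quot;; }

# With {{ style_sheets|safe }} the template emits css unchanged instead.
```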

Writing to an xlsx file - newline causes problems

I'm trying to write some text into an .xlsx file using the openpyxl module. It works well until I try to write multi-line text (text = '\n'.join(list_of_sentences)).
No error is raised, so it seems to be written correctly, but when I try to open the file, OpenOffice freezes. The file is somehow corrupted.
I've already tried text = r'\n'.join(list_of_sentences), but it did not help.
Maybe it is because of this warning:
C:\Python27\lib\site-packages\openpyxl\styles\styleable.py:111: UserWarning: Use formatting objects such as font directly
warn("Use formatting objects such as font directly")
Could you give me advice on how to do this correctly?
EDIT - THE CODE:
with open(file) as f:
    for line in f:
        i+=1
        splitted = line.strip('\n').split('::')
        name = splitted[0]
        text = splitted[1].split('***')
        text_xlsx = '\n'.join(text)
        # text_xlsx.replace('_x000D_','\n')
        worksheet.cell('B{}'.format(i)).style.alignment.wrap_text = True
        worksheet.cell('B{}'.format(i)).value = text_xlsx
Example of an input line (before it is split into text_xlsx):
'dir\\dir\\dir\\dir\\dir\\.file_name.pl-pl::\xef\xbb\xbf<p><strong class="title">€this</strong></p>***<p><span class="text">Please add Peter\'s account €1!.<br /> <a target="_blank" href="some url">click to redirect</a></span></p>\n'
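For comparison, here is a minimal sketch of the same write using current openpyxl style objects: the cell is addressed with ws.cell(row=..., column=...) and wrapping is set by assigning a new Alignment object, instead of mutating cell.style.alignment, which appears to be what the "Use formatting objects such as font directly" warning is about. The input line below is a simplified, made-up version of the example above, and the workbook is saved to an in-memory buffer only so the sketch is self-contained; this does not claim to explain the OpenOffice freeze.

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.styles import Alignment

# Made-up input in the question's "name::part1***part2" format.
line = "dir\\file_name.pl-pl::<p>first sentence</p>***<p>second sentence</p>\n"
name, raw_text = line.strip("\n").split("::", 1)
text_xlsx = "\n".join(raw_text.split("***"))

wb = Workbook()
ws = wb.active

row = 1
cell = ws.cell(row=row, column=2, value=text_xlsx)  # column 2 is column B
# Assign a formatting object directly rather than going through cell.style.
cell.alignment = Alignment(wrap_text=True)

buffer = BytesIO()  # in practice, pass a file path such as "out.xlsx"
wb.save(buffer)
```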

Converting .results file to .xlsx file with Python

I am trying to convert a .results file to .xlsx format so that I can read it with Python. If I open the .results file, select "save as", and choose a .xlsx extension, the file can be read in perfectly using
from pandas import read_excel
dataframe = read_excel(r"C:\survey\Results.xlsx",'CTsurvey')
However, I would like to convert it to .xlsx format programmatically, so I tried this:
import os
ResultsFile = r"C:\survey\Results.results"
base = os.path.splitext(ResultsFile)[0]
os.rename(ResultsFile, base + ".xlsx")
However, when I try to read in the file, I get an error saying:
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'"hitid"\t'
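Renaming a file changes only its name, not its contents. xlrd expects the BOF (beginning-of-file) record that a binary Excel workbook starts with, but the error shows the file actually begins with b'"hitid"\t', which suggests quoted, tab-separated text. Assuming that is the case, the file can be parsed directly as TSV with no conversion step; the sample rows and the "answer" column below are made up. pandas.read_csv(path, sep="\t") should be the equivalent if a DataFrame is needed.

```python
import csv
import io

# Made-up sample in the format suggested by the error: quoted, tab-separated text.
sample = '"hitid"\t"answer"\r\n"H1"\t"yes"\r\n"H2"\t"no"\r\n'

def read_results(stream):
    """Parse tab-separated .results text into a list of dicts keyed by header."""
    return list(csv.DictReader(stream, delimiter="\t"))

rows = read_results(io.StringIO(sample, newline=""))
print(rows[0]["hitid"], rows[0]["answer"])  # H1 yes

# For the real file (the encoding is an assumption; adjust if needed):
# with open(r"C:\survey\Results.results", newline="", encoding="utf-8") as f:
#     rows = read_results(f)
```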

Parse HTML uploaded CSV file in Python

I am using GAE to host a website that needs input from a CSV file. After the CSV file has been uploaded, I convert it to a table. However, I've run into a Mac/Windows compatibility issue: a CSV file generated on a Mac is not recognized, and I get the error:
new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Here is my Python code:
import csv

def loop_html(thefile):
    reader = csv.reader(thefile.file)
    header = next(reader)  # skip the header row
    i = 1
    iter_html = ""
    for row in reader:
        iter_html = iter_html + html_table(row, i)  # generate inputs table
        i = i + 1
    return iter_html

def html_table(row_inp, iter):
    mai_temp = float(row_inp[0])
    Input_header = """<table border="1">
<tr><H3>Batch Calculation of Iteration %s</H3></tr><br>
<tr>
<td><b>Input Name</b></td>
<td><b>Input value</b></td>
<td><b>Unit</b></td>
</tr>""" % (iter)
    Input_mai = """<tr>
<td>Mass of Applied Ingredient Applied to Paddy</td>
<td>%s</td>
<td>kg</td>
</tr>""" % (mai_temp)
    Inout_table = Input_header + Input_mai
    return Inout_table
Later I changed the line 'reader = csv.reader(thefile.file)' to
'reader = csv.reader(open(thefile.file, 'U'))', which gave me a different error:
TypeError: coercing to Unicode: need string or buffer, cStringIO.StringO found
Can anyone take a look at my code and give me some suggestions? Thanks!
I just found a solution: splitlines() handles the newline issue. Here is the code:
reader = csv.reader(thefile.file.read().splitlines())
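A Python 3 sketch of why this works, assuming the Mac-generated upload ends its lines with a bare carriage return (\r), which is the usual trigger for the "new-line character seen in unquoted field" error. str.splitlines() splits on \r, \n and \r\n alike, so csv.reader receives one string per record whatever platform produced the file. The upload bytes, column names and the UTF-8 encoding are assumptions; io.BytesIO stands in for thefile.file.

```python
import csv
import io

# Made-up upload with bare \r line endings.
mac_csv = b"mai,unit\r1.5,kg\r2.0,kg\r"
upload = io.BytesIO(mac_csv)  # stands in for thefile.file

# splitlines() treats \r, \n and \r\n as line breaks, so each record becomes one string.
lines = upload.read().decode("utf-8").splitlines()
reader = csv.reader(lines)
header = next(reader)
rows = list(reader)
print(header, rows)  # ['mai', 'unit'] [['1.5', 'kg'], ['2.0', 'kg']]
```

One caveat of splitlines(): a quoted field containing a line break would be split into two records. Passing io.StringIO(text, newline="") to csv.reader instead should keep such fields intact while still accepting \r line endings.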
