I'm pretty new to Python and I'm facing the problem below.
I'm importing an .xls file into Python; one of its columns contains:
<a href="myhyperlink" target='_blank' >text displayed</a>
After processing the file, I need to export it to a .txt file.
When I use this line:
file_to_export.to_csv('my/path' +"nameofthefile.txt", header=True, index=False, quoting=csv.QUOTE_NONE, sep='\t')
I get the following error:
need to escape, but no escapechar set
When I remove the quoting=csv.QUOTE_NONE argument, the file is exported without any errors, but in the column mentioned above every " is doubled:
"<a href=""myhyperlink"" target=""_blank"" >text displayed</a>"
I checked the intermediate steps, and these extra " characters are not there.
Can anyone assist?
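For what it's worth, the message comes from Python's csv writer, which (as far as I know) pandas uses underneath to_csv. Below is a minimal sketch using only the standard csv module, so it is not the pandas call itself; write_row and the sample row are made up for illustration. It shows QUOTE_NONE failing when a field contains a character the writer wants to escape, an escapechar letting the write go through, quotechar=None leaving the text untouched, and the default quoting doubling the quotes.

```python
import csv
import io

# Hypothetical sample row; the first field mimics the hyperlink column.
row = ['<a href="myhyperlink" target=\'_blank\' >text displayed</a>', "other"]


def write_row(**fmt):
    """Write the sample row as one tab-separated line and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n", **fmt)
    writer.writerow(row)
    return buf.getvalue()


# 1. QUOTE_NONE with the default quotechar: the writer has to escape the
#    double quotes but has no escapechar, so it raises csv.Error.
try:
    write_row(quoting=csv.QUOTE_NONE)
    raised = False
except csv.Error:
    raised = True

# 2. QUOTE_NONE plus an escapechar: each double quote gets a backslash.
escaped = write_row(quoting=csv.QUOTE_NONE, escapechar="\\")

# 3. QUOTE_NONE with no quotechar at all: the field is written verbatim
#    (this assumes no field contains a tab or a newline).
verbatim = write_row(quoting=csv.QUOTE_NONE, quotechar=None)

# 4. Default quoting (QUOTE_MINIMAL): the field is wrapped in quotes and
#    every embedded double quote is doubled.
doubled = write_row()
```

If the consumer of the .txt file does not understand CSV quoting, the third variant is the one that keeps the HTML as it was, but it only works while no field contains the separator or a line break.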
The basic question: how are .csv reports from Google AdWords encoded?
Details: I'm trying to import a .csv from AdWords using Power Query, and for the life of me I can't get the "," (comma) characters to appear in my import.
My code:
let
    // Get raw file data as txt file,
    fnRawFileContents = (fullpath) as table =>
        let
            EveryLine = Lines.FromBinary(File.Contents(fullpath),1,true,1200),
            Value = Table.FromList((EveryLine),Splitter.SplitByNothing())
        in
            Value,
    // Use functions to load contents
    Source = fnRawFileContents("C:\Users\Jamie.Marshall\Desktop\Emma\adwordsDoc.csv"),
    #"Removed Top Rows" = Table.Skip(Source,1)
in
    #"Removed Top Rows"
Facts:
The AdWords documentation says they use UTF-16LE
UTF-16LE in M is code page 1200
I cannot open the AdWords .csv in Notepad under any encoding setting (Unicode, Unicode Big Endian, UTF-8, ANSI)
If I resave the file in Excel as Unicode Text, I can open it in Notepad as Unicode Big Endian with line breaks, but no commas (",").
How can I verify the encoding on these docs?
What other encoding could this be?
Any help on this will be much appreciated.
Why are you using Lines instead of the native CSV parser?
Use Csv.Document(file_content, [Delimiter="#(tab)", Columns=10, Encoding=1200, QuoteStyle=QuoteStyle.None])
Like this:
let
    file_path = "C:\Users\Jamie.Marshall\Desktop\Emma\adwordsDoc.csv",
    file_content = File.Contents(file_path),
    csv = Csv.Document(file_content, [Delimiter="#(tab)", Columns=10, Encoding=1200, QuoteStyle=QuoteStyle.None]),
    skip_1_row = Table.Skip(csv,1),
    promote_header = Table.PromoteHeaders(skip_1_row),
    remove_last_2_rows = Table.RemoveLastN(promote_header,2)
in
    remove_last_2_rows
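On the "how can I verify the encoding" part: one way to check outside Power Query is to look at the first bytes of the file for a byte-order mark. Here is a rough sketch in Python; guess_encoding_from_bom and the sample bytes are made up for illustration, and a file without a BOM simply yields None, so treat it as a hint rather than proof. Counting tabs versus commas in the decoded text then shows which delimiter the report really uses.

```python
import codecs

# Byte-order marks to look for. UTF-32 is checked before UTF-16 because
# the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_bom(head):
    """Return an encoding name if `head` starts with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


# Hypothetical stand-in for the first bytes of the report. For a real file,
# use: with open(path, "rb") as f: head = f.read(4)
sample = codecs.BOM_UTF16_LE + "Campaign\tClicks\n".encode("utf-16-le")

detected = guess_encoding_from_bom(sample[:4])

# The generic "utf-16" codec reads the BOM to pick the byte order and drops it.
text = sample.decode("utf-16")
tab_count = text.count("\t")
comma_count = text.count(",")
```

If the tab count is high and the comma count is zero, the file is tab-delimited despite the .csv extension, which matches the Delimiter="#(tab)" setting above.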
I'm using the following code to read a file into a string, but I want to read it as is, without escaping any characters.
for sheet in style_sheet_list:
    with open(DJANGO_ROOT + "/assets/css/" + sheet + ".css", "r") as myfile:
        style_sheets += myfile.read()
        style_sheets += "\n"
return style_sheets
For example, I get:
&quot;\f05c&quot;;
Instead of:
"\f05c";
How can I do this?
This is not a Python problem; this is django escaping your template variable. Use the safe filter to correct the template output.
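To see that file reading is not the culprit, here is a small sketch using the standard library's html.escape as a stand-in for template autoescaping. That is an assumption: I believe Django's escaping does much the same replacement, but this code does not call Django. The css_rule string is a made-up example.

```python
import html

# What myfile.read() would hand back: the CSS text, unchanged.
# (Hypothetical rule, written as a raw string so the backslash is literal.)
css_rule = r'content: "\f05c";'

# HTML escaping, as applied when a variable is rendered into a page,
# replaces the double quotes with entities.
escaped = html.escape(css_rule)

# Undoing the escaping gives back exactly the original text, which shows
# the file content itself was never altered.
restored = html.unescape(escaped)
```

In the template itself, the fix would be something like {{ style_sheets|safe }}, assuming the string is passed to the template under that name. Only do this for content you trust, since safe turns escaping off for that variable.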
I'm trying to write some text into an .xlsx file using the openpyxl module. It works fine until I try to write multi-line text (text = '\n'.join(list_of_sentences)).
It seems to be written correctly (no error is raised), but when I try to open the file, OpenOffice freezes. The file is somehow corrupted.
I've already tried text = r'\n'.join(list_of_sentences), but it did not help.
Maybe it is because of this warning:
C:\Python27\lib\site-packages\openpyxl\styles\styleable.py:111: UserWarning: Use formatting objects such as font directly
warn("Use formatting objects such as font directly")
Could you advise me on how to do this correctly?
EDIT - THE CODE:
with open(file) as f:
    for line in f:
        i += 1
        splitted = line.strip('\n').split('::')
        name = splitted[0]
        text = splitted[1].split('***')
        text_xlsx = '\n'.join(text)
        # text_xlsx.replace('_x000D_','\n')
        worksheet.cell('B{}'.format(i)).style.alignment.wrap_text = True
        worksheet.cell('B{}'.format(i)).value = text_xlsx
Example of the text_xlsx:
'dir\\dir\\dir\\dir\\dir\\.file_name.pl-pl::\xef\xbb\xbf<p><strong class="title">€this</strong></p>***<p><span class="text">Please add Peter\'s account €1!.<br /> <a target="_blank" href="some url">click to redirect</a></span></p>\n'
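One thing visible in that example: the bytes \xef\xbb\xbf right after the '::' are a UTF-8 byte-order mark that ends up inside the cell text. I can't say whether that is what makes OpenOffice choke, but it is cheap to rule out. Below is a minimal Python 3 sketch of the parsing step only (no openpyxl involved); parse_line and the shortened sample line are made up for illustration, and it assumes the input file is UTF-8.

```python
def parse_line(raw_line):
    """Split one 'name::part***part' line into (name, multi-line text).

    Assumes raw_line is UTF-8 encoded bytes, e.g. read from a file
    opened in binary mode.
    """
    line = raw_line.decode("utf-8").rstrip("\r\n")
    name, _, body = line.partition("::")
    # Drop a byte-order mark that ended up in the middle of the line.
    body = body.replace("\ufeff", "")
    # Join the '***'-separated parts with real newlines for the cell.
    return name, "\n".join(body.split("***"))


# Shortened, hypothetical version of the example line above.
raw = (
    b"dir\\dir\\.file_name.pl-pl::\xef\xbb\xbf"
    b'<p><strong class="title">this</strong></p>***'
    b'<p><span class="text">click</span></p>\n'
)

name, text_xlsx = parse_line(raw)
```

The resulting text_xlsx is an ordinary string with one '\n' between the parts and no BOM, which is what you would then assign to the cell value.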
I am trying to convert a .results file to .xlsx format so that I can read it with Python. If I open the .results file, select "save as", and choose a .xlsx extension, the file can be read in perfectly using
from pandas import read_excel
dataframe = read_excel(r"C:\survey\Results.xlsx",'CTsurvey')
However, I would like to convert it to .xlsx format programmatically, so I tried this:
import os
ResultsFile = r"C:\survey\Results.results"
base = os.path.splitext(ResultsFile)[0]
os.rename(ResultsFile, base + ".xlsx")
However, when I try to read in the file, I get an error saying:
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'"hitid"\t'
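The error message is a useful hint here: the file starts with b'"hitid"\t', i.e. a quoted column name followed by a tab, so the .results file looks like tab-separated text rather than a real Excel workbook. Renaming only changes the extension, not the content. Below is a minimal sketch that reads such data with the standard csv module; only the "hitid" column name comes from the error above, the other column and the values are made up.

```python
import csv
import io

# Hypothetical stand-in for the contents of Results.results.
sample = '"hitid"\t"answer"\n"H1"\t"yes"\n"H2"\t"no"\n'

# For the real file, replace io.StringIO(sample) with
# open(r"C:\survey\Results.results", newline="").
with io.StringIO(sample) as handle:
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)

# Each row is a mapping from column name to the unquoted cell text.
hit_ids = [r["hitid"] for r in rows]
```

With pandas, the analogous call would be read_csv with sep="\t" on the original .results file, which gives a DataFrame directly without any conversion to .xlsx.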
I am using GAE to host a website that needs input from a CSV file. After the CSV file has been uploaded, I convert it to a table. However, I ran into a Mac/Windows compatibility problem: a CSV file generated on a Mac is not recognized, and I get the error:
new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Here is my Python code:
def loop_html(thefile):
    reader = csv.reader(thefile.file)
    header = reader.next()
    i = 1
    iter_html = ""
    for row in reader:
        iter_html = iter_html + html_table(row, i)  # generate inputs table
        i = i + 1

def html_table(row_inp, iter):
    mai_temp = float(row_inp[0])
    Input_header = """<table border="1">
<tr><H3>Batch Calculation of Iteration %s</H3></tr><br>
<tr>
<td><b>Input Name</b></td>
<td><b>Input value</b></td>
<td><b>Unit</b></td>
</tr>""" % (iter)
    Input_mai = """<tr>
<td>Mass of Applied Ingredient Applied to Paddy</td>
<td>%s</td>
<td>kg</td>
</tr>""" % (mai_temp)
    Inout_table = Input_header + Input_mai
    return Inout_table
Later I changed the line 'reader = csv.reader(thefile.file)' to
'reader = csv.reader(open(thefile.file,'U'))', which gave me a different error:
TypeError: coercing to Unicode: need string or buffer, cStringIO.StringO found
Can anyone take a look at my code and give me some suggestions? Thanks!
I just found a solution. 'splitlines()' will handle the new line issue. Here is the source.
reader = csv.reader(thefile.file.read().splitlines())
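A small self-contained sketch of why this works, written for Python 3 (so next(reader) instead of reader.next()). The sample text is made up and uses bare '\r' line endings, as older Mac tools produce. csv.reader accepts any iterable of strings, and splitlines() splits on '\r', '\n' and '\r\n' alike.

```python
import csv

# Hypothetical upload content with classic Mac line endings ('\r' only).
mac_csv = "mai,unit\r1.5,kg\r2.0,kg\r"

# splitlines() turns the text into a list of lines without line endings,
# so the reader never sees a bare '\r' inside a field.
reader = csv.reader(mac_csv.splitlines())
header = next(reader)
rows = list(reader)

# The first column can then be converted as in html_table above.
values = [float(row[0]) for row in rows]
```

One caveat: splitlines() also splits on line breaks inside quoted fields, so this approach assumes no cell contains an embedded newline.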