Preserve unicode strings in python dictionary [duplicate] - python

This question already has an answer here:
Why do backslashes appear twice?
I am trying to preserve a unicode string when it is stored in a dict.
Here is what the string looks like:
password = r"abc\3]xyz"
print(password)
output:
abc\3]xyz
But when I use the same variable in dict it's adding an escape character:
id_pass = { "id" : "username", "password" : password }
print(id_pass)
output:
{ u'id' : u'username', u'password' : u'abc\\3]xyz' }
Expected:
{ u'id' : u'username', u'password' : u'abc\3]xyz' }
I am not able to figure out a way.

It's not changing the value of your string, it's merely printing its repr() value, which shows the escape value.

It only looks as if an escape character is being added. Printing a dict displays the repr() of each value, and repr() shows a single backslash as \\. The string itself is not changed.
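A minimal sketch of the point both answers make, written for Python 3 (the question's output comes from Python 2, where the u prefix also appears). The variable names are the ones from the question.

```python
password = r"abc\3]xyz"
id_pass = {"id": "username", "password": password}

# print() on a string shows its contents: one backslash.
print(password)

# print() on a dict shows the repr() of each key and value,
# and repr() writes a single backslash as an escaped pair.
print(id_pass)

# The stored value is unchanged: it still holds exactly one backslash.
print(id_pass["password"])
print(id_pass["password"] == password, password.count("\\"))
```

Printing the value itself, or writing it to a file or socket, uses the real one-backslash string; only the dict's display form shows the doubled backslash.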

Related

Opposite single/double quotes using pymongo command

I am trying to take user input, create a URI, and add it to a collection in Pymongo, but whenever I try, the formatting gets messed up and I can't figure out how to fix it.
When running the line:
print(db.command("create", "storage", someStorage={ "URI": {FS_URI}}))
where "Storage" is the collection,
I want the object to be {"fs" : "something://a:b"} or {'fs' : 'something://a:b'}
FS_URI = ('\"fs\" : \"'+URI+'\"')
gives the error: Cannot encode object: {'"fs" : "something://a:b"'}
FS_URI = ("fs\" : \"%s" % URI)
gives the error: Cannot encode object: {'fs" : "something://a:b'}
FS_URI = ("fs\' : \'%s" % URI)
gives the error: Cannot encode object: {"fs' : 'something://a:b"}
The quotes never match, or there are extra quotes around them.
I have tried the command with the actual URI in the quote format I want, and it runs perfectly.
I found that using a dict solved this problem. I changed
FS_URI = ("fs\" : \"%s" % URI)
from a string to a dict:
FS_URI = {"fs": "{}".format(URI)}
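The error messages in the question suggest why the string versions fail: wrapping a single string in braces, as in {FS_URI}, builds a Python set with one element, not a key/value mapping. The sketch below shows only that difference in plain Python, without calling Pymongo; the URI value is the placeholder from the question.

```python
URI = "something://a:b"  # placeholder value taken from the question

# A string wrapped in braces is a set with one element, not a mapping.
as_string = '"fs" : "%s"' % URI
wrapped = {as_string}
print(type(wrapped).__name__)

# A dict keeps the key and the value as separate objects,
# so no quote characters have to be assembled by hand.
FS_URI = {"fs": URI}
print(type(FS_URI).__name__)
print(FS_URI["fs"])
```

Whichever quote style appears when the dict is printed is just its display form; the key and value contain no quote characters.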

Javascript variable to json in Python

I have a string in Python that looks like a JavaScript object literal, '{name : "John" , age : 17}'. How can I convert it to JSON like this:
{
"name" : "John" ,
"age" : 17
}
or how can I add double quotes around the field names name and age?
json.loads couldn't help me; it returns a string, not JSON.
Are you sure this is proper JSON? The keys in your example, name and age, are not surrounded by quotes, which makes the data invalid.
If that is just a typo in the question, the method to use is dumps():
import json
py_dict = {"name": "John" , "age": 17}
json_val = json.dumps(py_dict) # string json representation of the dictionary
json_val2 = json.dumps(json.loads('{"name" : "John" , "age" : 17}'))  # parse a string into a dictionary, then dump it back to a JSON-formatted string
As the second example shows, loads() takes a JSON string and creates a dictionary from it.
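To make the distinction concrete, here is a small sketch using only the standard json module. It shows that the unquoted-key string from the question is rejected, while the quoted version round-trips; the variable names are illustrative.

```python
import json

# Unquoted keys are not valid JSON, so loads() rejects the string.
js_like = '{name : "John" , age : 17}'
try:
    json.loads(js_like)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)

# With quoted keys, loads() returns a dict and dumps() returns a str.
py_dict = json.loads('{"name" : "John" , "age" : 17}')
json_text = json.dumps(py_dict)
print(type(py_dict).__name__, type(json_text).__name__)
print(json_text)
```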
Answering the updated question: any attempt to parse and fix the source with string replacement or a regex is error-prone.
As YAML is a superset of JSON, and your example does look like YAML, a YAML parser might be able to parse it:
import yaml, json  # yaml is the third-party PyYAML package
val = '{name: "John" , age: 17}'
py_dict = yaml.safe_load(val) # it is now a proper python dictionary
print(json.dumps(py_dict))
# prints
# {"name": "John", "age": 17}

Format String of Dictionary

I have a string representation of a dictionary, as follows:
CREDENTIALS = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"cisco123\", \"name\": \"admin\"}}}"
Now I want to format this string to replace the pwd and name dynamically. What I've tried is:
CREDENTIALS = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"{0}\", \"name\": \"{1}\"}}}".format('password', 'username')
But this gives the following error:
Traceback (most recent call last):
File ".\ll.py", line 4, in <module>
CREDENTIALS = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"{0}\", \"name\": \"{1}\"}}}".format('password', 'username')
KeyError: '"aaaUser"'
It is possible to load the string as a dict using json.loads() and then set the attributes as required, but this is not what I want. I want to format the string so that I can use it in other files/modules.
What am I missing here? Any help would be appreciated.
Don't try to work with the JSON string directly; decode it, update the data structure, and re-encode it:
import json
# Use single quotes instead of escaping all the double quotes
CREDENTIALS = '{"aaaUser": {"attributes": {"pwd": "cisco123", "name": "admin"}}}'
d = json.loads(CREDENTIALS)
attributes = d["aaaUser"]["attributes"]
attributes["name"] = username  # username and password hold the new values
attributes["pwd"] = password
CREDENTIALS = json.dumps(d)
With string formatting, you would need to change your string to look like
CREDENTIALS = '{{"aaaUser": {{"attributes": {{"pwd": "{0}", "name": "{1}"}}}}}}'
doubling all the literal braces so that the format method doesn't mistake them for placeholders.
However, formatting also means that the password needs to be pre-escaped if it contains anything that could be mistaken for JSON syntax, such as a double quote.
# This produces invalid JSON
NEW_CREDENTIALS = CREDENTIALS.format('new"password', 'bob')
# This produces valid JSON
NEW_CREDENTIALS = CREDENTIALS.format('new\\"password', 'bob')
It's far easier and safer to just decode and re-encode.
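The contrast between the two approaches can be checked directly. The following sketch reuses the doubled-brace template from this answer and a password containing a double quote; the names template and tricky_password are illustrative, not from the question.

```python
import json

template = '{{"aaaUser": {{"attributes": {{"pwd": "{0}", "name": "{1}"}}}}}}'
tricky_password = 'new"password'

# Formatting pastes the raw text in, so the embedded quote breaks the JSON.
formatted = template.format(tricky_password, "bob")
try:
    json.loads(formatted)
    formatted_is_valid = True
except json.JSONDecodeError:
    formatted_is_valid = False
print(formatted_is_valid)

# Decoding, updating, and re-encoding lets json.dumps() do the escaping.
d = json.loads(template.format("cisco123", "admin"))
d["aaaUser"]["attributes"]["pwd"] = tricky_password
d["aaaUser"]["attributes"]["name"] = "bob"
encoded = json.dumps(d)
print(encoded)
print(json.loads(encoded)["aaaUser"]["attributes"]["pwd"] == tricky_password)
```

The re-encoded string is still a plain str, so it can be passed to other files or modules just like the formatted one.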
str.format treats braces {} as the delimiters of replacement fields. Here CREDENTIALS starts with a brace, so str.format reads what follows as a field name. The text up to the first colon, "aaaUser" including its quotes, is taken as the name of a keyword argument, and since no such argument was passed, it raises the KeyError.
The string on which this method is called can contain literal text or replacement fields delimited by braces {}.
To keep a literal brace and substitute only the intended fields, double the brace, like this:
'{{ Hey Escape }} {0}'.format(12) # O/P '{ Hey Escape } 12'
If you double the outer (parent and grandparent) braces, it will work.
Example:
'{{Escape Me {n} }}'.format(n='Yes') # {Escape Me Yes }
So, following the str.format rules, I escape each literal brace by adding one extra brace:
"{{\"aaaUser\": {{\"attributes\": {{\"pwd\": \"{0}\", \"name\": \"{1}\"}}}}}}".format('password', 'username')
#O/P '{"aaaUser": {"attributes": {"pwd": "password", "name": "username"}}}'
Now to the other way of making string formatting work. It is not recommended in your case, because you need to make sure the input always has the format you mentioned; otherwise the result could change drastically.
The solution here uses a string replacement to convert each placeholder from {0} to %(0)s, so that %-style formatting works and does not care about braces.
'Hello %(0)s' % {'0': 'World'} # Hello World
Here I'm using re.sub to replace all occurrences:
import re

def myReplace(obj):
    # Turn a matched "{0}"-style placeholder into "%(0)s"
    found = obj.group(0)
    if found:
        found = found.replace('{', '%(')
        found = found.replace('}', ')s')
    return found

template = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"{0}\", \"name\": \"{1}\"}}}"
CREDENTIALS = re.sub(r'\{\d{1}\}', myReplace, template) % {'0': 'password', '1': 'username'}
print(CREDENTIALS)  # It should print the desired result

django python convert unicode in queryset

How do I convert u'[<Car: { surname :yass name : zazadz } >]' into [<Car: { surname :yass name : zazadz } >]?
In other words, how do I convert a unicode string into a django.db.models.query.QuerySet?
This is the function I wrote to convert unicode into a string, which I then process as needed.
_in_unicode is the parameter passed to this function; it takes an input of type unicode.
import unicodedata

def unicode_to_string(_in_unicode):
    return unicodedata.normalize('NFKD', _in_unicode).encode('ascii', 'ignore')
You will have to write a parser to process that string and then convert it into proper format for data processing.
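As a rough illustration of what the function above does, here is the same normalize-then-encode step applied to a sample string under Python 3, where the result is a bytes object. The sample text is made up for the example, and the conversion is lossy: characters with no ASCII decomposition are dropped.

```python
import unicodedata

def unicode_to_string(text):
    # NFKD splits accented letters into a base letter plus a combining mark;
    # encoding to ASCII with errors="ignore" then drops the marks.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore")

result = unicode_to_string(u"caf\u00e9 na\u00efve")
print(result)
```

Note that this only changes the text type; it does not turn the printed representation of a QuerySet back into a QuerySet.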
P.S: If you are downvoting, please add comment for the reason.

Pymongo escaping unicode characters in field names

I have the following example key-value pair stored in MongoDB (and a number of similar pairs):
"Cl\uff0eG_bibcode": 'some value'
The reason it is stored like this is because MongoDB doesn't accept dots in the key name. Using the \uff0e unicode version of a dot has worked no problem thus far, but I have started using pymongo to pull in data from my db and it seems to be escaping the \uff0e. So when I look at it in my code it comes in like this:
"Cl\\uff0eG_bibcode": 'some value'
I've noticed it doesn't do this if the \uff0e is in the value, only if it's in the key. I'm also not manipulating the data at any point between Mongo and my code. This is all I'm doing:
from pymongo import MongoClient

url = 'mongodb://user:passwd@host/database'
client = MongoClient(url)
db = client.get_default_database()
collection = db['my_collection']
results = collection.find().limit(1) #just grabbing any record to test
I'm looking for some insight into how to get pymongo to stop escaping unicode characters in the key name. I'd really like to not have to go through all my results and get rid of it manually.
The previous driver you used may be escaping during insert. When done in my mongo shell, the following shows no encoding.
db.Junk.insert({"Cl\uff0eG_bibcode": "some value"})
db.Junk.find()
{ "_id" : ObjectId("5487e53c64316c4cb2442578"), "Cl.G_bibcode" : "some value" }
Escaping the backslash on insert, as below, does reproduce your result in the shell.
db.Junk.insert({"Cl\\uff0eG_bibcode": "some value"})
db.Junk.find()
{ "_id" : ObjectId("5487e5fc64316c4cb2442579"), "Cl\uff0eG_bibcode" : "some value" }
Have you inserted the data directly via the shell?
By the way, dots are accepted in the key name, but they are interpreted as dot notation and therefore treated as a path into a subdocument. For example, a key of "user._id" is interpreted as the _id key inside the object (value) associated with the user key:
{ _id: value,
key1: "str_value",
key2: 12345,
user: { _id: xxx, name: xxxx}
}
Two suggestions:
a) Why not use a pipe | or some other simple token that is unlikely to appear in any normal key?
b) I would suggest hacking the BSON files and building in an encoder/decoder for your funky key, possibly in this file: https://github.com/mongodb/mongo-python-driver/blob/master/bson/init.py
Assuming you're using python 2.x here, use unicode keys:
>>> c.foo.bar.insert({u'Cl\uff0eG_bibcode': 'some value'})
ObjectId('5488fe8dfba52249d72069bf')
>>> doc = c.foo.bar.find_one()
>>> doc
{u'_id': ObjectId('5488fe8dfba52249d72069bf'), u'Cl\uff0eG_bibcode': u'some value'}
>>> for key in doc:
... print key.encode("utf-8")
...
_id
Cl.G_bibcode
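The difference the answers above describe, a key holding the real U+FF0E character versus a key holding a literal backslash sequence, can be seen in plain Python without a database. This is a sketch under the assumption that the keys were stored in the escaped form and are otherwise pure ASCII; the unicode_escape codec used for the repair is part of the standard library.

```python
real_key = u"Cl\uff0eG_bibcode"      # contains the single character U+FF0E
escaped_key = u"Cl\\uff0eG_bibcode"  # contains a backslash followed by "uff0e"

# The two keys differ in length: one character versus six.
print(len(real_key), len(escaped_key))

# For an ASCII-only key, the unicode_escape codec turns the literal
# backslash sequence back into the real character.
repaired = escaped_key.encode("ascii").decode("unicode_escape")
print(repaired == real_key)
```

If the repaired keys match what is expected, the data was most likely escaped at insert time rather than by pymongo on read.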
