This writeup is based on my adventures getting Apache Solr working for the first time. Solr is a search server built on top of Lucene, so to get it working you need some basic knowledge of Lucene. This tutorial helped me in understanding the basics of Lucene.
I started by following the getting-started tutorial on the Solr site. I downloaded the nightly build and unzipped it; the Solr war file is in the dist folder. I wanted to host Solr in resin, whereas the tutorial gives the steps for Jetty.
I did the following to start Solr in resin:
- Put the Solr war file in resin webapps folder.
- Renamed the war file to solr.war.
- Created a folder called solr, copied the bin and conf folders from the Solr package into it, and made this my Solr home.
- When you fire up Solr, you need to tell it where to find its home folder, so I added a system-property element to resin.conf under the host element.
The host element looked as below:
<host id='*'>
  <root-directory>./solr</root-directory>
  <web-app id="solr/">
    <document-directory>webapps/solr</document-directory>
  </web-app>
  <system-property solr.solr.home="E:\software\solr"/>
</host>
- The tutorial assumes that your Solr is listening on port 8983. I changed mine to 8080 in resin.conf file.
<http server-id="" host="*" port="8080"/>
- Bounced resin and entered the url http://localhost:8080/solr/ in mozilla to view the Solr welcome page.
Wow…first hurdle passed.
In case you do not want the "solr" string in your URI (i.e. no context path), change the web-app id in resin.conf from <web-app id="solr/"> to <web-app id="/"> and bounce resin. Now you do not have to type solr in all your URLs. http://localhost:8080/ should get you the Solr welcome page.
I already had one application running in my resin and did not want to change that. So I set up resin to host two applications, one per host name, by adding entries for the host names to the Windows hosts file present in <windows folder>\system32\drivers\etc.
My resin.conf file looked as below:
<host id='mumbai.brp.com'>
  <root-directory>.</root-directory>
  <web-app id="/">
    <document-directory>webapps/burrp</document-directory>
  </web-app>
</host>
<host id='solr.com'>
  <root-directory>./solr</root-directory>
  <web-app id="solr/">
    <document-directory>webapps/solr</document-directory>
  </web-app>
  <system-property solr.solr.home="E:\software\solr"/>
</host>
To get the above working, you have to create a directory called solr in your resin home folder, create a webapps folder inside it, and place the solr.war file in that webapps folder.
Now comes the part where I need to provide solr with Lucene documents for it to index.
The tutorial has steps for uploading Lucene documents in XML format using a command line tool, post.jar, that comes along with the download. But this tries to upload the document to http://localhost:8983/solr/update. I could not use this as I had changed the Solr URL as well as the port. I tried to find the source of post.jar so that I could make appropriate changes to it, but could not get hold of it. As Solr uses HTTP, I tried cURL, which was also a first for me.
My first goal was to index the data in the monitor.xml file from the Solr example docs folder. I tried various cURL options for uploading a file by name. The one I found sends the file with HTTP PUT rather than as a multipart form upload, and that did not work because Solr does not support HTTP PUT. Got some pointers as to how to do it here. I tried the curl command cited in the link with the contents of monitor.xml and my own Solr link.
curl http://solr.com:8080/solr/update -H "Content-Type: text/xml" --data-binary '<contents of monitor.xml>'
I got the following error message.
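"< was unexpected at this time."
After some experimenting I got the above working by replacing the double quotes inside monitor.xml with single quotes and wrapping the whole payload in double quotes. The likely cause of the error is that the Windows command prompt does not treat single quotes as quoting characters, so the < in the payload was parsed as input redirection.
curl http://solr.com:8080/solr/update -H "Content-Type: text/xml" --data-binary "<contents of monitor.xml with single quotes>"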
Once you do this you have to commit the indexing changes. This I did using the below:
curl http://solr.com:8080/solr/update -H "Content-Type: text/xml" --data-binary "<commit waitFlush='false' waitSearcher='false'/>"
Yippee…Now I could search the contents of monitor.xml from the admin search interface.
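For anyone who would rather avoid command prompt quoting altogether, here is a minimal Python sketch of the same two requests: an XML update followed by a commit. It assumes the update URL used above and that the XML lives in a file on disk. The function names are my own, not part of Solr, and I have used a plain <commit/> message here instead of the attributes shown above.

```python
import urllib.request

# Update URL from the curl commands above; change host and port for your setup.
SOLR_UPDATE_URL = "http://solr.com:8080/solr/update"


def build_update_request(xml_bytes, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def build_commit_request(url=SOLR_UPDATE_URL):
    """Build the request that asks Solr to commit pending index changes."""
    return build_update_request(b"<commit/>", url)


def post_file(path, url=SOLR_UPDATE_URL):
    """Post an XML file as-is, then commit. Returns both response bodies."""
    with open(path, "rb") as handle:
        payload = handle.read()  # raw bytes, so no quoting or escaping issues
    bodies = []
    for request in (build_update_request(payload, url), build_commit_request(url)):
        with urllib.request.urlopen(request) as response:
            bodies.append(response.read())
    return bodies
```

Calling post_file("monitor.xml") would send the file unchanged, so the double quotes inside it do not need to be replaced.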
Now I wanted to upload my own data for indexing. But my XML data was too large to paste into the command prompt, and I had not figured out a way to upload files by name through cURL. So I whipped up an HTML page with a file submit form, with the action as my Solr update link.
<html>
  <head> </head>
  <body>
    <form action="http://solr.com:8080/solr/update" enctype="multipart/form-data" method="post">
      <input type="file">
      <input type="submit" value="Send">
    </form>
  </body>
</html>
I configured Solr for auto commit by changing the options in solrconfig.xml, present in the Solr home conf folder. This section comes commented out in the Solr download.
<autoCommit>
  <maxDocs>10</maxDocs>
</autoCommit>
I uploaded a document and tried searching for it, but the query did not return any result. After lots of fidgeting around the net, I came to know that you have to declare the fields present in the document you upload in the schema.xml file present in the Solr home conf folder.
My xml document looked as below:
<add>
  <doc>
    <field name="name">foo</field>
    <field name="category">foocat</field>
    <field name="id">0</field>
  </doc>
  <doc>
    <field name="name">bar</field>
    <field name="category">barCat</field>
    <field name="id">1</field>
  </doc>
  . . . .
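If the data starts out somewhere other than XML, a document in this shape can be generated instead of typed by hand. The following is a rough sketch using Python's standard library. The function name and the list-of-dicts input are my own assumptions for illustration; only the add/doc/field layout comes from the document above.

```python
import xml.etree.ElementTree as ET


def build_add_message(records):
    """Turn a list of dicts into an <add> message with one <doc> per dict."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"name": "foo", "category": "foocat", "id": 0},
        {"name": "bar", "category": "barCat", "id": 1},
    ]
    print(build_add_message(sample))
```

Letting the library do the escaping avoids malformed XML when a value contains characters such as & or <.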
So added the following under the fields element present in schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" required="true" />
<field name="category" type="string" indexed="true" stored="true" required="true" />
After making these changes you have to bounce resin, otherwise they will not be picked up. My searches were still coming back blank. More fidgeting around, and I learned that the fields also need copyField entries that copy them into the text field, presumably because the example schema searches text by default when a query names no field.
Added the below to schema.xml
<copyField source="id" dest="text"/>
<copyField source="category" dest="text"/>
<copyField source="name" dest="text"/>
Bounced resin. Yippeeee. My searches started working.
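The same searches can be run without the admin page by requesting the select URL directly. Here is a small sketch that only builds the URL. It assumes the standard /select handler sits under the same host and context used for the update link above, and the function name is mine.

```python
from urllib.parse import urlencode

# Assumed to sit next to the update URL used earlier; adjust for your setup.
SOLR_SELECT_URL = "http://solr.com:8080/solr/select"


def build_select_url(query, rows=None, base=SOLR_SELECT_URL):
    """Build a search URL; q is the query string, rows caps the result count."""
    params = {"q": query}
    if rows is not None:
        params["rows"] = rows
    return base + "?" + urlencode(params)  # urlencode escapes ':' and spaces


if __name__ == "__main__":
    print(build_select_url("foo"))              # searches the default field
    print(build_select_url("category:foocat"))  # searches one named field
```

A query with a field prefix such as category:foocat targets that field directly, while a bare term relies on the default search field.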
One day of adventure with Lucene and Solr comes to an end. Will keep you guys posted as I tread deep into Apache Lucene and Solr.