
FP7-SME-2008-2 243768
OPEN-SME
Open-Source Software Reuse Service for SMEs
Deliverable D3.1b
Open Source Search Engine (v.2)


Deliverable Type: PU*
Nature of the Deliverable: P**
Date: March 5, 2012
Distribution: WP3
Code: OPEN-SME/AUTH/WP3/D3.1b
Editor: AUTH
Contributors: AUTH, TTEL

* Deliverable Type: PU= Public, RE= Restricted to a group specified by the Consortium, PP= Restricted to other program participants (including the Commission services), CO= Confidential, only for members of the Consortium (including the Commission services)
** Nature of the Deliverable: P= Prototype, R= Report, S= Specification, T= Tool, O= Other

Abstract: This is a brief report accompanying the OCEAN tool prototype (version 2), already available to the consortium, with respect to the specifications and functionality already achieved.

Copyright by the OPEN-SME Consortium.

The OPEN-SME Consortium consists of:

Enosi Mihanikon Pliroforikis & Epikinonion Ellados | Project Coordinator | Greece
Drustvo za informacione sisteme I racunarske mreze-Informaciono drustvo Srbije | Partner | Serbia
Epistimoniko Techniko Epimelitirio Kyprou (Technical Chamber of Cyprus) | Partner | Cyprus
Teknikbyn Science Park Vasteras AB | Partner | Sweden
SOLINET GmbH Telecommunications | Partner | Germany
GNOMON Informatics SA | Partner | Greece
Maelardalens Hoegskola | Partner | Sweden
Teletel S.A. - Telecommunications and Information Technology | Partner | Greece
Aristotelio Panepistimio Thessalonikis | Partner | Greece
Universiteit Maastricht | Partner | Netherlands

Deliverable D3.1b: Open Source Search Engine (version 1)


OPEN-SME/AUTH/WP3/D3.1b

OPEN-SME Consortium August 2011


Table of Contents
ABBREVIATIONS
1. INTRODUCTION
1.1 DELIVERABLE SCOPE
2. TECHNOLOGY PLATFORM
3. API DEVELOPMENT
3.1 EXTERNAL QUERY SERVICES
3.2 DATABASE SERVICE API
4. USER INTERFACE AND USE CASE DEVELOPMENT
5. SCREENSHOTS


ABBREVIATIONS
CBD : Component Based Development
CBSE : Component Based Software Engineering
CMMI : Capability Maturity Model Integration
COMPARE : Component Repository and Search Engine
COTS : Commercial Off The Shelf
CPU : Central Processing Unit
EFP : Extra Functional Property
ISO : International Standard Organisation
JSP : Java Server Pages
PI : Provided Interface
ProCom : Progress Component Model
QoS : Quality of Service
RCP : Rich Control Platform
RI : Required Interface
RTOS : Real-time Operating System
RUP : Rational Unified Process
SME : Small and Medium scale Enterprise
SWEET : Swedish Worst Case Execution Time Tool
V&V : Verification and Validation
WCET : Worst Case Execution Time


1. INTRODUCTION
1.1 DELIVERABLE SCOPE
This is a brief report accompanying the OCEAN tool prototype (version 2), already available to the consortium, with respect to the specifications and functionality already achieved.


2. TECHNOLOGY PLATFORM
An instance of Liferay, the open source content management system and enterprise portal, has been deployed and is accessible at http://ocean.gnomon.com.gr. For administrator access, use the account (username, password) = (root, test). Basic user, role and security management has been implemented. Access to the tools is restricted to registered users, and the registration functionality and process are in place.

After the development and during the testing phase of OCEAN v1.0, it became evident that the fundamental assumption, namely that a number of external APIs could be called from the internal search API in order to fetch results from open source search engines, was no longer valid. During the early test phase (Q3-Q4 2011) the Merobase API service stopped working, leaving the Google Code search API as the single working source for OCEAN. This already problematic situation (a meta-search engine with only one source) quickly became a dead end, as Google Code ceased service as of January 15th, 2012. Thus there was an urgent need to redefine the functionality and basic design of OCEAN.

The OCEAN team returned to the drawing board and came back in a very short time with an alternative system design: instead of calling external APIs, the meta-search engine calls a new HTTP-based web service, running on a Debian Linux server at the Aristotle University of Thessaloniki, that queries standard HTML-based open source search sites and scrapes the first N results returned from their native web interface. To quickly achieve the desired functionality, the team used the free web data extraction tool DEiXTo [http://deixto.com] and custom Perl CGI scripts capable of searching Koders and Krugle in real time. As far as Merobase is concerned, after communication with its creators, access to a brand new API was provided through a JAR search client. A Perl web service (running on the same server) was therefore also written; it uses this API and returns the results for a user-specified query in a suitable XML format. The revised OCEAN architecture is shown in Figure 1.


Figure 1: Revised OCEAN Architecture

Finally, in addition to the new architecture, the OCEAN user interface has been improved: basic search parameters such as language, license and return type have been added to the main page, and user preferences management has been improved (see Screenshots).


3. API DEVELOPMENT
Internal Query API: The following API methods have been developed and tested according to the specifications:

#   Method
1   searchOSS(textToSearch:Text, searchBase:Text, engines:List, licences:List, metrics:List, userDefinedOptions:List, resultGranularity:List, async:Boolean, timeout:function, complete:function) : searchResults:List
2   getEngines(void) : engines:List
3   getEngine(engine:String) : result:Engine
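Methods 2 and 3 of the table can be pictured as a simple registry lookup. The Java sketch below is illustrative only and is not the OCEAN implementation: the fields of `Engine`, the `EngineRegistry` class, its `register` method and the case-insensitive lookup are all assumptions made for the example, since the report lists only the method signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical engine descriptor; the report does not show the fields of Engine. */
final class Engine {
    final String name;
    final String serviceUrl;

    Engine(String name, String serviceUrl) {
        this.name = name;
        this.serviceUrl = serviceUrl;
    }
}

/** In-memory stand-in for getEngines() and getEngine(engine) from the table above. */
final class EngineRegistry {
    // Keyed by lower-cased name so that lookups are case-insensitive (an assumption).
    private final Map<String, Engine> engines = new LinkedHashMap<>();

    /** Hypothetical helper used to populate the registry. */
    void register(Engine engine) {
        engines.put(engine.name.toLowerCase(Locale.ROOT), engine);
    }

    /** Returns all known engines in registration order. */
    List<Engine> getEngines() {
        return new ArrayList<>(engines.values());
    }

    /** Returns the engine with the given name, or null if it is not registered. */
    Engine getEngine(String name) {
        return name == null ? null : engines.get(name.toLowerCase(Locale.ROOT));
    }
}
```

A caller would register one `Engine` per search source and then use `getEngine("Koders")` to look up the service to query.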

External Query and Fetch APIs: The API has been implemented for the Google Code (http://www.google.com/codesearch) and Merobase (http://merobase.com/#main) search engines.

Database and DB access API: The specified DB schema has been defined and implemented in the MySQL database that is part of the deployed Liferay instance. The method storeResults(userID:Number, searchResults:List):void has been implemented and tested.

3.1 EXTERNAL QUERY SERVICES

Typically there are two main mechanisms to search and retrieve data from a website: either through an Application Programming Interface (API), if one is available, or via screen scraping. The first is preferable, being faster and more reliable. However, a search API is not always available. In such cases, web robots (also called agents) are usually used to simulate a person searching the target website through a web browser and to capture the bits of interest with scraping techniques.

So, for the open source code search engines Koders and Krugle, which do not offer an API, we deployed DEiXTo-based wrappers in order to scrape in real time the results returned from their native web interface. Custom Perl scripts were written and installed on a Debian Linux server at the premises of the Aristotle University of Thessaloniki, so that OCEAN is able to search the two websites through an external web service. It should be noted, though, that for Krugle the open source web browser automation tool Selenium (http://seleniumhq.org/) was also used. For Merobase, a JAR search client was utilized: after communicating with the Merobase core developer, we got access to their brand new API. Thus, a third Perl service was created that returns Merobase results in a suitable XML format. More specifically:

Koders

The koders script is based upon DEiXToBot (a Mechanize agent object capable of executing extraction rules previously built with the DEiXTo GUI tool). The service supports 4 URL parameters: s (for the search keyword), li (for license), la (for language) and n (for the number of results requested).

Example: http://swserv2.csd.auth.gr/cgi-bin/koders.pl?li=*&la=*&s=perl&n=20

This HTTP request would result in the following native HTTP request:

http://www.koders.com/default.aspx?s=perl&la=*&li=*&p=0

The XML response file returned by our service is depicted in Figure 2:

Figure 2: Example XML response from Koders

Krugle

The krugle script/service has two pillars: a) the Selenium Server (version 2.20.0) and b) DEiXToBot. Selenium allows us to launch a Firefox instance in order to programmatically simulate the process of searching on Krugle's website. DEiXToBot, in turn, facilitates the parsing of the results data in the HTML result pages and their transformation into XML. The script supports 4 URL parameters: s (for the search keyword), project, license and n (for the number of results requested); the example below additionally passes an empty language parameter.

Example: http://swserv2.csd.auth.gr/cgi-bin/krugle.pl?s=java&n=10&project=&language=&license=

An example of the XML response is depicted in Figure 3:


Figure 3: Example XML response from Krugle

The HTTP request above would result in this native request:

http://opensearch.krugle.org/document/?query=java&project=&language=&license=&search_type=advanced_search

Merobase

The Merobase Perl script (harnessing a Java search client) can submit queries in real time to Merobase through its API. It supports 2 parameters: s (for the search keyword) and n (for the number of results requested).

Example: http://swserv2.csd.auth.gr/cgi-bin/merobase.pl?s=java&n=25

This would yield the XML response depicted in Figure 4:

Figure 4: Example XML response from Merobase
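The three request formats described in this section can be sketched as a small URL builder. This is a minimal Java illustration, not code from the OCEAN portlet: the class name `OceanServiceUrls` and its method names are hypothetical, while the base URL, script names and parameter names are copied from the examples above. The `language` parameter for Krugle is included only because it appears in the example URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds request URLs for the three Perl services. */
final class OceanServiceUrls {
    // Base URL as it appears in the examples of this section.
    static final String BASE = "http://swserv2.csd.auth.gr/cgi-bin/";

    /** koders.pl: li (license), la (language), s (keyword), n (number of results). */
    static String koders(String keyword, String license, String language, int n) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("li", license);
        params.put("la", language);
        params.put("s", keyword);
        params.put("n", Integer.toString(n));
        return build("koders.pl", params);
    }

    /** krugle.pl: s, n, project, language (taken from the example URL), license. */
    static String krugle(String keyword, String project, String language, String license, int n) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("s", keyword);
        params.put("n", Integer.toString(n));
        params.put("project", project);
        params.put("language", language);
        params.put("license", license);
        return build("krugle.pl", params);
    }

    /** merobase.pl: s (keyword) and n (number of results). */
    static String merobase(String keyword, int n) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("s", keyword);
        params.put("n", Integer.toString(n));
        return build("merobase.pl", params);
    }

    // Joins the parameters in insertion order and URL-encodes each value.
    private static String build(String script, Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE).append(script);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator).append(entry.getKey()).append('=').append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value == null ? "" : value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With these helpers, `OceanServiceUrls.koders("perl", "*", "*", 20)` reproduces the Koders example URL from this section, and `OceanServiceUrls.merobase("java", 25)` reproduces the Merobase one. Fetching the URL and parsing the XML response are left out because the response format is only shown in the figures.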

3.2 DATABASE SERVICE API


public void storeResults(long userId, String title, String description, String[] tags, List<OceanSearchResult> oceanResults)

Description
This call stores a list of OceanSearchResults in the database, characterized by a title, a description and a number of tags. Additionally, the userId of the user who requests the operation is required.

Argument : Description
userId : Id of the user requesting the operation.
title : The title of the query.
description : The description of the query.
tags : The tags associated with the query.
oceanResults : List of result records.

public Queries getQuery(String queryId)

Description
Returns the Queries object stored in the database under the given queryId. A query is specified by a title, a description, an array of tags and a list of QueryResults; it groups QueryResults that share common search criteria.

Argument : Description
queryId : The Id of the query that is stored in the database.

public void deleteQuery(String queryId)

Description
Deletes the Queries object with the given queryId from the database. All children associated with it (Metrics, Metadata, etc.) are also deleted.

Argument : Description
queryId : The Id of the query that is stored in the database.

public void storeQuery(long userId, Queries query, String title, String description)

Description
A Queries object is created in the database with a title and a description. The userId of the user who requests the operation is required.

Argument : Description
userId : Id of the user requesting the operation.
title : The title of the query.
description : The description of the query.

public void storeEntries(List<OceanSearchResult> oceanResults, Queries query)

Description
A list of OceanSearchResult objects is created in the database and attached to a Queries object that already exists in the database.

Argument : Description
oceanResults : List of result records.
query : The query with which the results are associated.

public void storeMetrics(QueryResults queryResults, OceanSearchResult searchResult)

Description
A QueryMetrics object is created in the database and attached to an already stored queryResults object, by retrieving the appropriate metrics values from the searchResult object.

Argument : Description
queryResults : A QueryResults object that belongs to a Queries object. Both already exist in the database.
searchResult : A result that has been returned by the search interface.

public void storeMetadata(QueryResults queryResults, OceanSearchResult searchResult)

Description
A QueryMetadata object is created in the database and attached to an already stored queryResults object, by retrieving the appropriate metadata values from the searchResult object.

Argument : Description
queryResults : A QueryResults object that belongs to a Queries object. Both already exist in the database.
searchResult : A result that has been returned by the search interface.

public void storeTags(String[] tags, Queries query)

Description
An array of tags is stored in the database and associated with the query object that already exists in the database.

Argument : Description
tags : A list of tags that are associated with the query object.
query : The query object with which the tags are associated.


public List<QueryResults> listQueryResults(String queryId)

Description
A list of QueryResults is returned based on the queryId of the parent Queries object.

Argument : Description
queryId : The Id of the query that is stored in the database.

public List<QueryMetrics> listQueryMetrics(String queryId)

Description
A list of QueryMetrics is returned based on the queryId of the parent Queries object.

Argument : Description
queryId : The Id of the query that is stored in the database.

public List<ResultMetadata> listResultMetadata(String queryId)

Description
A list of ResultMetadata is returned based on the queryId of the parent Queries object.

Argument : Description
queryId : The Id of the query that is stored in the database.
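The store, list and delete calls of this section can be illustrated with a small in-memory stand-in. The Java sketch below is not the OCEAN database service, which persists to the MySQL database of the Liferay instance: `SavedQuery` and `InMemoryQueryStore` are hypothetical names, results are reduced to plain strings instead of OceanSearchResult objects, and `storeResults` returns the generated query id (the documented method returns void) purely so the example can refer to the stored query afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical stand-in for a stored query (the Queries entity). */
final class SavedQuery {
    final String queryId;
    final long userId;
    final String title;
    final String description;
    final List<String> tags;
    final List<String> results; // plain strings standing in for QueryResults rows

    SavedQuery(String queryId, long userId, String title, String description,
               List<String> tags, List<String> results) {
        this.queryId = queryId;
        this.userId = userId;
        this.title = title;
        this.description = description;
        this.tags = new ArrayList<>(tags);
        this.results = new ArrayList<>(results);
    }
}

/** In-memory imitation of a subset of the database service API. */
final class InMemoryQueryStore {
    private final Map<String, SavedQuery> queries = new LinkedHashMap<>();
    private long nextId = 1;

    /** Like storeResults, but returns the new query id (an assumption for this demo). */
    String storeResults(long userId, String title, String description,
                        String[] tags, List<String> results) {
        String queryId = Long.toString(nextId++);
        queries.put(queryId, new SavedQuery(queryId, userId, title, description,
                Arrays.asList(tags), results));
        return queryId;
    }

    /** Returns the stored query, or null if the id is unknown. */
    SavedQuery getQuery(String queryId) {
        return queries.get(queryId);
    }

    /** Returns the results of the parent query, or an empty list if it does not exist. */
    List<String> listQueryResults(String queryId) {
        SavedQuery query = queries.get(queryId);
        return query == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(query.results);
    }

    /** Removing the parent also drops its tags and results, imitating the cascading delete. */
    void deleteQuery(String queryId) {
        queries.remove(queryId);
    }
}
```

A typical sequence is to call `storeResults` once per saved search, read it back with `getQuery` or `listQueryResults`, and remove it with `deleteQuery`, after which both lookups come back empty.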


Figure 5: Database Schema


4. USER INTERFACE AND USE CASE DEVELOPMENT


The following table summarizes the status of the development of the different use cases defined for the tool.

Use Case 1: Create user account
    Status: Completed & tested
Use Case 2: Approve user account request
    Status: Completed & tested
Use Case 3: User Login
    Status: Completed & tested
Use Case 4: User Log Out
    Status: Completed & tested
Use Case 5: Request forgotten password
    Status: Completed & tested
Use Case 6: Create/Edit User profile
    Status: Completed & tested
Use Case 7: Perform search (Navigational)
    Status: Delayed until the first week of April due to the other priorities described in Section 2, since additional resources had to be allocated to the integration of the DEiXTo tool into the OCEAN architecture. It is currently under development and testing.
Use Case 8: Perform search (Freetext)
    Status: Completed & tested
Use Case 9: Perform search (Advanced)
    Status: Completed & tested
Use Case 10: Store search results
    Status: Completed & tested
Use Case 11: View saved queries
    Status: Completed & tested
Use Case 12: Subscribe to search notification service
    Status: Delayed until the first week of April due to the other priorities described in Section 2, since additional resources had to be allocated to the integration of the DEiXTo tool into the OCEAN architecture. It is currently under development and testing.


5. SCREENSHOTS

Figure 6: Sign-in welcome page

Figure 7: Basic search functionality using the Merobase and Krugle search engines


Figure 8: Account Management page

Figure 9: Create/Edit User Profile.


Figure 10: Selection of results to be saved.

Figure 11: Title, Description and Tags of saved query.


Figure 12: List of saved queries.

Figure 13: Details of the saved query.
